{ "info": { "author": "Max Hollmann", "author_email": "", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Environment :: Console", "License :: OSI Approved :: MIT License", "Programming Language :: Python", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3 :: Only", "Topic :: Multimedia", "Topic :: Multimedia :: Sound/Audio", "Topic :: Multimedia :: Sound/Audio :: Speech" ], "description": "# voxceleb-luigi\n[![pypi version](http://img.shields.io/pypi/v/voxceleb_luigi.svg?style=flat)](https://pypi.python.org/pypi/voxceleb_luigi)\n\nLuigi pipeline to download VoxCeleb audio from YouTube and extract speaker segments.\n\nThis pipeline can download both [VoxCeleb](http://www.robots.ox.ac.uk/~vgg/data/voxceleb/vox1.html) and [VoxCeleb2](http://www.robots.ox.ac.uk/~vgg/data/voxceleb/vox2.html).\n\n## Installation\n\n pip install voxceleb_luigi\n\nYou need to have `ffmpeg` and `youtube-dl` installed. On systems with `apt`, you can simply run:\n\n sudo apt install ffmpeg youtube-dl\n\n\n## Usage\n\nDownload and unpack at least one of the metadata directories with the YouTube URLs and timestamps (VC1 [dev](http://www.robots.ox.ac.uk/~vgg/data/voxceleb/data/vox1_dev_txt.zip)/[test](http://www.robots.ox.ac.uk/~vgg/data/voxceleb/data/vox1_test_txt.zip), VC2 [dev](http://www.robots.ox.ac.uk/~vgg/data/voxceleb/data/vox2_dev_txt.zip)/[test](http://www.robots.ox.ac.uk/~vgg/data/voxceleb/data/vox2_test_txt.zip)).\n\nStart luigid, the central scheduler:\n\n luigid --background\n\nThen start the worker process:\n\n luigi --module voxceleb_luigi \\\n --workers 4 \\\n voxceleb.ProcessDirectory \\\n --path /path/to/metadata\n\nThe pipeline will recursively search `/path/to/metadata` for the segment files (by looking for files called like `00001.txt` etc.), download the audio of their source videos, and extract the speaker segments.\n\nBy default, the segment audio files are stored in parallel to 
the metadata in directories called `wav` that get created next to the `txt` directories. Suppose you have the metadata of the dev and test sets of VoxCeleb1 in `~/vc1` with paths looking like `~/vc1/dev/txt/idX/videoX/XXXXX.txt`. If you pass `--path ~/vc1` to the pipeline, segments will end up in `~/vc1/dev/wav/idX/videoX/XXXXX.wav`. Other pipeline outputs (full audio of videos, soft failures, dummy outputs for completed directories) are stored in `./voxceleb-luigi-files` by default.\n\n\n## Configuration\n\nBoth the location where the dataset gets created and the storage directory for the pipeline can be changed through parameters in `luigi.cfg` (by default, this file is read from the current working directory; you can override this via the `LUIGI_CONFIG_PATH` environment variable):\n\n [voxceleb.Config]\n # Required\n output_dir=/where/to/store/wav/segments\n pipeline_dir=/where/to/put/pipeline/stuff\n\n ## Only necessary if youtube-dl, ffmpeg, and ffprobe are not in your PATH:\n #ffmpeg_bin=/ffmpeg-dir/ffmpeg\n #ffmpeg_directory=/ffmpeg-dir # passed on to youtube-dl via --ffmpeg-location\n #youtube_dl_bin=/path/to/youtube-dl\n\n [voxceleb.ProcessDirectory]\n ## alternative to the --path command line option\n #path=/path/to/metadata\n\nWhen `output_dir` is set, the directory structure of the metadata is mirrored in this directory. 
In this case, the `txt` directories are not replaced by `wav`, but removed from the path.\n\n\n", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/maxhollmann/voxceleb-luigi", "keywords": "", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "voxceleb_luigi", "package_url": "https://pypi.org/project/voxceleb_luigi/", "platform": "", "project_url": "https://pypi.org/project/voxceleb_luigi/", "project_urls": { "Homepage": "https://github.com/maxhollmann/voxceleb-luigi" }, "release_url": "https://pypi.org/project/voxceleb_luigi/0.2.0/", "requires_dist": [ "luigi" ], "requires_python": "", "summary": "Luigi pipeline to download VoxCeleb audio from YouTube and extract speaker segments", "version": "0.2.0" }, "last_serial": 4500317, "releases": { "0.1.2": [ { "comment_text": "", "digests": { "md5": "4c15e708c1291ea84455014f55f33f8e", "sha256": "cb3702aed151f6d4e54211a70bdf09885ce4be692cf9a99622968a994a798fc5" }, "downloads": -1, "filename": "voxceleb_luigi-0.1.2.linux-x86_64.tar.gz", "has_sig": false, "md5_digest": "4c15e708c1291ea84455014f55f33f8e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5856, "upload_time": "2018-04-14T18:43:39", "url": "https://files.pythonhosted.org/packages/ca/6f/36027319853ed4c04225a103710caaf7890dfb02e5c9c940e983d4ea4cde/voxceleb_luigi-0.1.2.linux-x86_64.tar.gz" } ], "0.2.0": [ { "comment_text": "", "digests": { "md5": "3934b3e7f70c5e574b02fed456551c8d", "sha256": "684c5554f35b6a6af12b3a64b051f01cadf0d1e29bc6324e1c4c3c226c61ff7e" }, "downloads": -1, "filename": "voxceleb_luigi-0.2.0-py3-none-any.whl", "has_sig": false, "md5_digest": "3934b3e7f70c5e574b02fed456551c8d", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 14451, "upload_time": "2018-11-18T17:35:57", "url": 
"https://files.pythonhosted.org/packages/6c/a0/66f8a57abc6bfa1a992ed1027e725a4acfbc8f751ea117805ff908673f20/voxceleb_luigi-0.2.0-py3-none-any.whl" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "3934b3e7f70c5e574b02fed456551c8d", "sha256": "684c5554f35b6a6af12b3a64b051f01cadf0d1e29bc6324e1c4c3c226c61ff7e" }, "downloads": -1, "filename": "voxceleb_luigi-0.2.0-py3-none-any.whl", "has_sig": false, "md5_digest": "3934b3e7f70c5e574b02fed456551c8d", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 14451, "upload_time": "2018-11-18T17:35:57", "url": "https://files.pythonhosted.org/packages/6c/a0/66f8a57abc6bfa1a992ed1027e725a4acfbc8f751ea117805ff908673f20/voxceleb_luigi-0.2.0-py3-none-any.whl" } ] }