{
    "info": {
        "author": "Max Hollmann",
        "author_email": "",
        "bugtrack_url": null,
        "classifiers": [
            "Development Status :: 4 - Beta",
            "Environment :: Console",
            "License :: OSI Approved :: MIT License",
            "Programming Language :: Python",
            "Programming Language :: Python :: 3",
            "Programming Language :: Python :: 3.6",
            "Programming Language :: Python :: 3 :: Only",
            "Topic :: Multimedia",
            "Topic :: Multimedia :: Sound/Audio",
            "Topic :: Multimedia :: Sound/Audio :: Speech"
        ],
        "description": "# voxceleb-luigi\n[![pypi version](http://img.shields.io/pypi/v/voxceleb_luigi.svg?style=flat)](https://pypi.python.org/pypi/voxceleb_luigi)\n\nLuigi pipeline to download VoxCeleb audio from YouTube and extract speaker segments.\n\nThis pipeline can download both [VoxCeleb](http://www.robots.ox.ac.uk/~vgg/data/voxceleb/vox1.html) and [VoxCeleb2](http://www.robots.ox.ac.uk/~vgg/data/voxceleb/vox2.html).\n\n## Installation\n\n    pip install voxceleb_luigi\n\nYou need to have `ffmpeg` and `youtube-dl` installed. On systems with `apt`, you can simply run:\n\n    sudo apt install ffmpeg youtube-dl\n\n\n## Usage\n\nDownload and unpack at least one of the metadata directories with the YouTube URLs and timestamps (VC1 [dev](http://www.robots.ox.ac.uk/~vgg/data/voxceleb/data/vox1_dev_txt.zip)/[test](http://www.robots.ox.ac.uk/~vgg/data/voxceleb/data/vox1_test_txt.zip), VC2 [dev](http://www.robots.ox.ac.uk/~vgg/data/voxceleb/data/vox2_dev_txt.zip)/[test](http://www.robots.ox.ac.uk/~vgg/data/voxceleb/data/vox2_test_txt.zip)).\n\nStart luigid, the central scheduler:\n\n    luigid --background\n\nThen start the worker process:\n\n    luigi --module voxceleb_luigi \\\n        --workers 4 \\\n        voxceleb.ProcessDirectory \\\n        --path /path/to/metadata\n\nThe pipeline recursively searches `/path/to/metadata` for the segment files (files with names like `00001.txt`), downloads the audio of their source videos, and extracts the speaker segments.\n\nBy default, the segment audio files are stored alongside the metadata, in `wav` directories created next to the `txt` directories. Suppose you have the metadata of the dev and test sets of VoxCeleb1 in `~/vc1`, with paths like `~/vc1/dev/txt/idX/videoX/XXXXX.txt`. If you pass `--path ~/vc1` to the pipeline, segments will end up in `~/vc1/dev/wav/idX/videoX/XXXXX.wav`. 
Other outputs of the pipeline (full audio of videos, soft failures, dummy outputs for completed directories) are stored in `./voxceleb-luigi-files` by default.\n\n\n## Configuration\n\nBoth the location where the dataset gets created and the storage directory for the pipeline can be changed through parameters in `luigi.cfg` (by default read from the current working directory; you can override this via the `LUIGI_CONFIG_PATH` environment variable):\n\n    [voxceleb.Config]\n    # Required\n    output_dir=/where/to/store/wav/segments\n    pipeline_dir=/where/to/put/pipeline/stuff\n\n    ## Only necessary if youtube-dl, ffmpeg, and ffprobe are not in your PATH:\n    #ffmpeg_bin=/ffmpeg-dir/ffmpeg\n    #ffmpeg_directory=/ffmpeg-dir # passed on to youtube-dl via --ffmpeg-location\n    #youtube_dl_bin=/path/to/youtube-dl\n\n    [voxceleb.ProcessDirectory]\n    ## alternative to the --path command line option\n    #path=/path/to/metadata\n\nWhen `output_dir` is set, the directory structure of the metadata is mirrored in this directory. In this case, the `txt` directories are not replaced by `wav` but are removed from the path.\n\n\n",
        "description_content_type": "",
        "docs_url": null,
        "download_url": "",
        "downloads": {
            "last_day": -1,
            "last_month": -1,
            "last_week": -1
        },
        "home_page": "https://github.com/maxhollmann/voxceleb-luigi",
        "keywords": "",
        "license": "MIT",
        "maintainer": "",
        "maintainer_email": "",
        "name": "voxceleb_luigi",
        "package_url": "https://pypi.org/project/voxceleb_luigi/",
        "platform": "",
        "project_url": "https://pypi.org/project/voxceleb_luigi/",
        "project_urls": {
            "Homepage": "https://github.com/maxhollmann/voxceleb-luigi"
        },
        "release_url": "https://pypi.org/project/voxceleb_luigi/0.2.0/",
        "requires_dist": [
            "luigi"
        ],
        "requires_python": "",
        "summary": "Luigi pipeline to download VoxCeleb audio from YouTube and extract speaker segments",
        "version": "0.2.0"
    },
    "last_serial": 4500317,
    "releases": {
        "0.1.2": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "4c15e708c1291ea84455014f55f33f8e",
                    "sha256": "cb3702aed151f6d4e54211a70bdf09885ce4be692cf9a99622968a994a798fc5"
                },
                "downloads": -1,
                "filename": "voxceleb_luigi-0.1.2.linux-x86_64.tar.gz",
                "has_sig": false,
                "md5_digest": "4c15e708c1291ea84455014f55f33f8e",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 5856,
                "upload_time": "2018-04-14T18:43:39",
                "url": "https://files.pythonhosted.org/packages/ca/6f/36027319853ed4c04225a103710caaf7890dfb02e5c9c940e983d4ea4cde/voxceleb_luigi-0.1.2.linux-x86_64.tar.gz"
            }
        ],
        "0.2.0": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "3934b3e7f70c5e574b02fed456551c8d",
                    "sha256": "684c5554f35b6a6af12b3a64b051f01cadf0d1e29bc6324e1c4c3c226c61ff7e"
                },
                "downloads": -1,
                "filename": "voxceleb_luigi-0.2.0-py3-none-any.whl",
                "has_sig": false,
                "md5_digest": "3934b3e7f70c5e574b02fed456551c8d",
                "packagetype": "bdist_wheel",
                "python_version": "py3",
                "requires_python": null,
                "size": 14451,
                "upload_time": "2018-11-18T17:35:57",
                "url": "https://files.pythonhosted.org/packages/6c/a0/66f8a57abc6bfa1a992ed1027e725a4acfbc8f751ea117805ff908673f20/voxceleb_luigi-0.2.0-py3-none-any.whl"
            }
        ]
    },
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "3934b3e7f70c5e574b02fed456551c8d",
                "sha256": "684c5554f35b6a6af12b3a64b051f01cadf0d1e29bc6324e1c4c3c226c61ff7e"
            },
            "downloads": -1,
            "filename": "voxceleb_luigi-0.2.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "3934b3e7f70c5e574b02fed456551c8d",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 14451,
            "upload_time": "2018-11-18T17:35:57",
            "url": "https://files.pythonhosted.org/packages/6c/a0/66f8a57abc6bfa1a992ed1027e725a4acfbc8f751ea117805ff908673f20/voxceleb_luigi-0.2.0-py3-none-any.whl"
        }
    ]
}