{
    "info": {
        "author": "Mike C. Fletcher",
        "author_email": "mcfletch@vrplumber.com",
        "bugtrack_url": null,
        "classifiers": [
            "Intended Audience :: Developers",
            "License :: OSI Approved :: MIT License",
            "Natural Language :: English",
            "Programming Language :: Python :: 2",
            "Programming Language :: Python :: 2.6",
            "Programming Language :: Python :: 2.7",
            "Programming Language :: Python :: 3",
            "Programming Language :: Python :: 3.3",
            "Programming Language :: Python :: 3.4",
            "Programming Language :: Python :: 3.5"
        ],
        "description": "==============\nAudio Datasets\n==============\n\n\n.. image:: https://img.shields.io/pypi/v/audiodatasets.svg\n        :target: https://pypi.python.org/pypi/audiodatasets\n\n.. image:: https://img.shields.io/travis/mcfletch/audiodatasets.svg\n        :target: https://travis-ci.org/mcfletch/audiodatasets\n\n.. image:: https://readthedocs.org/projects/audiodatasets/badge/?version=latest\n        :target: https://audiodatasets.readthedocs.io/en/latest/?badge=latest\n        :alt: Documentation Status\n\n.. image:: https://pyup.io/repos/github/mcfletch/audiodatasets/shield.svg\n     :target: https://pyup.io/repos/github/mcfletch/audiodatasets/\n     :alt: Updates\n\n\nPulls and pre-processes major Open Source datasets for spoken audio\n\n* Supported Datasets:\n\n  * `Librispeech <http://www.openslr.org/resources/12/>`_ (60GB)\n  * `TEDLIUM_release2 <http://www.openslr.org/resources/19/>`_ (35GB)\n  * `VCTK-Corpus <http://homepages.inf.ed.ac.uk/jyamagis/release/>`_ (11GB)\n\n* This is intended for use on Linux servers and it is expected that you will be using the \n  library to feed a machine learning system (not necessary, but that's sort of the point of \n  collecting these datasets)\n* MIT license for the software, but please note that the datasets themselves are \n  generally for non-commercial use only\n\nFeatures\n--------\n\n* Downloads common Open Source datasets and performs basic preprocessing on them\n* Provides iterables that produce Numpy arrays from the audio data in common formats\n* Uses `sphfile` to directly accesses sph files instead of needing to convert to `wav` first\n* Uses a single shared location for the datasets intended to be used by multiple projects\n\nInstallation/Setup\n------------------\n\nYou need to create the download directory and make it writable by the running user. Preferably\nyou will do that via group-based permissions to allow sharing, but we will here show creation\nof a user-specific ownership::\n\n    $ mkdir -p /var/datasets\n    $ chown user:group /var/datasets\n    $ chmod g+rw /var/datasets\n\nif `/var/datasets` doesn't exist, or isn't writable, the downloader will instead populate\n`~/.config/datasets` with the data. You may wish to link that directory to `/var/datasets`\nso that you can use default instantiations of the corpora::\n\n    $ ln -s /var/datasets ~/.config/datasets\n\nNote that the downloader expects that you have the following available, this may not\nyet be the case in a docker or minimal OS installation:\n\n    * `tar`\n    * `wget`\n\nNow you can download the datasets.\n\n.. note::\n\n    The datasets are big (100+GB)!\n    \n    If you are paying for data or are working on a slow connection you will\n    likely want to arrange to do this step during a low-rated period or on a \n    separate data connection.\n\nFrom a command prompt::\n\n    $ pip install audiodatasets\n    # this will download 100+GB and then unpack it on disk, it will take a while...\n    $ audiodatasets-download \n\nCreating MFCC data-files::\n\n    # this will generate Multi-frequency Cepestral Coefficient (MFCC) summaries for the \n    # audio datasets (and download them if that hasn't been done). This isn't necessary\n    # if you are doing only raw-audio processing\n    $ audiodatasets-preprocess \n\nPlaying some audio::\n\n    # this will iterate through playing every utterance that includes 'moon' in the transcript\n    $ audiodatasets-search 'moon'\n\nUsage\n-------\n\nOnce setup, you likely want to iterate over the data-sets using, for instance, a partition to \nseparate out test/train/validate data. To iterate over the raw audio:\n\n.. code:: python\n\n    from audiodatasets.corpora import build_corpora, partition\n    import random\n\n    def train_valid_test():\n        \"\"\"Create training, validation and tests datasets\n        \n        returns three iterators yielding (array[10:512],transcript) batches\n        \"\"\"\n        utterances = []\n        for corpus in build_corpora():\n            utterances.extend( corpus.iter_utterances())\n        random.shuffle(utterances)\n        train, test,valid = partition( utterances, (3,1,1) )\n        def generation( utterances ):\n            while True:\n                offset = random.randint(0,511)\n                for name,transcript,audio_file in utterances:\n                    for batch in t.iter_batches( audio_file, batch_size=10, input=512, offset=offset ):\n                        yield batch,transcript\n        return generation(train),generation(test),generation(valid)\n\nTo iterate over the 10ms MFCC preprocessed data, which yields 20 frequency batches per \nprocessing window (10ms):\n\n.. code:: python\n\n    from audiodatasets.corpora import build_corpora, partition\n    import random\n\n    def train_valid_test():\n        \"\"\"Create training, validation and tests datasets\n\n        Note: the batches vary in *time* at highest frequency, while\n        the frequency bins are the second-highest frequency.\n\n        See: `LibRosa MFCC <https://librosa.github.io/librosa/generated/librosa.feature.mfcc.html>`_\n        \n        returns three iterators yielding (array[10:20:63],transcript) batches\n        \"\"\"\n        utterances = []\n        for corpus in build_corpora():\n            utterances.extend( corpus.mfcc_utterances())\n        random.shuffle(utterances)\n        train, test,valid = partition( utterances, (3,1,1) )\n        def generation( utterances ):\n            while True:\n                offset = random.randint(0,62)\n                for name,transcript,audio_file in utterances:\n                    for batch in t.iter_batches( audio_file, batch_size=10, input=63, offset=offset ):\n                        yield batch,transcript\n        return generation(train),generation(test),generation(valid)\n",
        "description_content_type": null,
        "docs_url": null,
        "download_url": "",
        "downloads": {
            "last_day": -1,
            "last_month": -1,
            "last_week": -1
        },
        "home_page": "https://github.com/mcfletch/audiodatasets",
        "keywords": "audiodatasets",
        "license": "MIT license",
        "maintainer": "",
        "maintainer_email": "",
        "name": "audiodatasets",
        "package_url": "https://pypi.org/project/audiodatasets/",
        "platform": "",
        "project_url": "https://pypi.org/project/audiodatasets/",
        "project_urls": {
            "Homepage": "https://github.com/mcfletch/audiodatasets"
        },
        "release_url": "https://pypi.org/project/audiodatasets/1.0.0/",
        "requires_dist": null,
        "requires_python": "",
        "summary": "Pulls and pre-processes major Open Source (non-commercial mostly) datasets for spoken audio",
        "version": "1.0.0"
    },
    "last_serial": 2921436,
    "releases": {
        "1.0.0": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "ab5ef00365338c1f6b66bfd87357dad6",
                    "sha256": "c176afd3190f93a3da7ed7701b058156b361cecbb1bb04ac10bce18775109856"
                },
                "downloads": -1,
                "filename": "audiodatasets-1.0.0.tar.gz",
                "has_sig": false,
                "md5_digest": "ab5ef00365338c1f6b66bfd87357dad6",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 17053,
                "upload_time": "2017-06-02T20:43:20",
                "url": "https://files.pythonhosted.org/packages/eb/ca/db5183673bfb710d0d6d2e47633d3e3f48ad462c4cf16ee11787c630c010/audiodatasets-1.0.0.tar.gz"
            }
        ]
    },
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "ab5ef00365338c1f6b66bfd87357dad6",
                "sha256": "c176afd3190f93a3da7ed7701b058156b361cecbb1bb04ac10bce18775109856"
            },
            "downloads": -1,
            "filename": "audiodatasets-1.0.0.tar.gz",
            "has_sig": false,
            "md5_digest": "ab5ef00365338c1f6b66bfd87357dad6",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 17053,
            "upload_time": "2017-06-02T20:43:20",
            "url": "https://files.pythonhosted.org/packages/eb/ca/db5183673bfb710d0d6d2e47633d3e3f48ad462c4cf16ee11787c630c010/audiodatasets-1.0.0.tar.gz"
        }
    ]
}