{ "info": { "author": "Cl\u00e9ment Lafont", "author_email": "lafont.clem@gmail.com", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python :: 3", "Topic :: Scientific/Engineering :: Artificial Intelligence" ], "description": "# Dataset Downloader\n\n## Preview\n\nDataset_downloader allow you to download large dataset from multiple list of url, from [image-net](http://image-net.org) for example.\nYou can split the download into 2 folders, one for the training and one for the testing.\nFile are save into their class name, perfect for model training. It looks something like that:\n\n```\nroot:.\n|\n\u00e2\u201d\u0153\u00e2\u201d\u20ac\u00e2\u201d\u20ac\u00e2\u201d\u20actest\n\u00e2\u201d\u201a \u00e2\u201d\u0153\u00e2\u201d\u20ac\u00e2\u201d\u20ac\u00e2\u201d\u20acaccerola\n\u00e2\u201d\u201a \u00e2\u201d\u0153\u00e2\u201d\u20ac\u00e2\u201d\u20ac\u00e2\u201d\u20acapple\n\u00e2\u201d\u201a \u00e2\u201d\u201d\u00e2\u201d\u20ac\u00e2\u201d\u20ac\u00e2\u201d\u20aclemon\n\u00e2\u201d\u0153\u00e2\u201d\u20ac\u00e2\u201d\u20ac\u00e2\u201d\u20actrain\n\u00e2\u201d\u201a \u00e2\u201d\u0153\u00e2\u201d\u20ac\u00e2\u201d\u20ac\u00e2\u201d\u20acaccerola\n\u00e2\u201d\u201a \u00e2\u201d\u0153\u00e2\u201d\u20ac\u00e2\u201d\u20ac\u00e2\u201d\u20acapple\n\u00e2\u201d\u201a \u00e2\u201d\u201d\u00e2\u201d\u20ac\u00e2\u201d\u20ac\u00e2\u201d\u20aclemon\n```\n\n## Installation\n\nSimply install from pip:\n```\npip install dataset_downloader\n```\n\n## Config\n\nCreate a `dataset.json` file with the following content:\n\n```json\n{\n \"outputTrain\": \"...\",\n \"outputTest\": \"...\",\n \"ratio\": ...,\n \"classes\": {\n \"class1\": [\n \"http://url1\",\n \"http://url2\"\n ],\n \"class2\": [\n \"http://url1\",\n \"http://url2\"\n ],\n \"class3\": \"list_images.txt\"\n }\n}\n```\n\n* `outputTrain`: Output folder of the training images\n* `outputTest`: Output folder of the testing images\n* 
`ratio`: The ratio of training to testing images. For example, 0.8 means 80% of the images go into the training set.\n* `classes`: List of classes with their URLs. The value for each class can be a list of URLs, a path to a file containing a list of URLs, or a URL pointing to a list of URLs\n\nAn example file on a Windows computer:\n\n```json\n{\n \"outputTrain\": \"D:/dataset/train\",\n \"outputTest\": \"D:/dataset/test\",\n \"ratio\": 0.8,\n \"classes\": {\n \"accerola\": [\n \"http://tiachea.files.wordpress.com/2008/10/acerolas.jpg\",\n \"http://www.jardimdeflores.com.br/floresefolhas/JPEGS/A56acerola5.JPG\",\n \"http://farm2.staticflickr.com/1353/4602150961_177e096984_z.jpg\"\n ],\n \"apple\": [\n \"http://www.naturalhealth365.com/images/apple.jpg\",\n \"http://urbanext.illinois.edu/fruit/images/apple1.jpg\",\n \"https://www.aroma-zone.com/cms//sites/default/files/plante-acerola.jpg\"\n ],\n \"lemon\": \"list_images.txt\",\n \"watermelon\": \"https://gist.githubusercontent.com/johnrazeur/645787bc08a5aedd82da9573fbfa169a/raw/49cea1ee1438cecef8ac213b20f24e5ae02d4d78/watermelon.txt\"\n }\n}\n```\n\n## Run\n\nSimply call the `dataset_downloader` command:\n\n```bash\ncd yourdirectory\n# The dataset.json file must already exist in this directory\ndataset_downloader\n```\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/johnrazeur/dataset_downloader", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "dataset-downloader", "package_url": "https://pypi.org/project/dataset-downloader/", "platform": "", "project_url": "https://pypi.org/project/dataset-downloader/", "project_urls": { "Homepage": "https://github.com/johnrazeur/dataset_downloader" }, "release_url": "https://pypi.org/project/dataset-downloader/1.0.0/", "requires_dist": [ "Click (==7.0)", "requests (==2.20.0)" ], "requires_python": "", "summary": "Tool to download a large dataset from a list of URLs", "version": "1.0.0" 
}, "last_serial": 4408477, "releases": { "1.0.0": [ { "comment_text": "", "digests": { "md5": "555dcb85b6d9cbd0fe2587b296fac6af", "sha256": "86f624443d9ac7deb0d908176549adf68ee7f1e757042e76ff7f124859f587da" }, "downloads": -1, "filename": "dataset_downloader-1.0.0-py3-none-any.whl", "has_sig": false, "md5_digest": "555dcb85b6d9cbd0fe2587b296fac6af", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 5695, "upload_time": "2018-10-23T22:28:09", "url": "https://files.pythonhosted.org/packages/17/47/d923bf433df1831f928c58062215d00f3d8857f2ebbaa1dd0055aa137d94/dataset_downloader-1.0.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "fc9c46498b8da95298dc5611edc449aa", "sha256": "f45ad3dafb0e1bfa5cd1346b6972941137d51dff166d2246edf9926341027157" }, "downloads": -1, "filename": "dataset_downloader-1.0.0.tar.gz", "has_sig": false, "md5_digest": "fc9c46498b8da95298dc5611edc449aa", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3898, "upload_time": "2018-10-23T22:28:10", "url": "https://files.pythonhosted.org/packages/e6/03/9379a450c9978b94231640715b4845a8c59c5afa2c27156949c4d55bf853/dataset_downloader-1.0.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "555dcb85b6d9cbd0fe2587b296fac6af", "sha256": "86f624443d9ac7deb0d908176549adf68ee7f1e757042e76ff7f124859f587da" }, "downloads": -1, "filename": "dataset_downloader-1.0.0-py3-none-any.whl", "has_sig": false, "md5_digest": "555dcb85b6d9cbd0fe2587b296fac6af", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 5695, "upload_time": "2018-10-23T22:28:09", "url": "https://files.pythonhosted.org/packages/17/47/d923bf433df1831f928c58062215d00f3d8857f2ebbaa1dd0055aa137d94/dataset_downloader-1.0.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "fc9c46498b8da95298dc5611edc449aa", "sha256": "f45ad3dafb0e1bfa5cd1346b6972941137d51dff166d2246edf9926341027157" }, "downloads": -1, 
"filename": "dataset_downloader-1.0.0.tar.gz", "has_sig": false, "md5_digest": "fc9c46498b8da95298dc5611edc449aa", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3898, "upload_time": "2018-10-23T22:28:10", "url": "https://files.pythonhosted.org/packages/e6/03/9379a450c9978b94231640715b4845a8c59c5afa2c27156949c4d55bf853/dataset_downloader-1.0.0.tar.gz" } ] }