{ "info": { "author": "Johannes Filter", "author_email": "hi@jfilter.de", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: MIT License", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7", "Topic :: Scientific/Engineering :: Artificial Intelligence", "Topic :: Utilities" ], "description": "# Split Folders [![Build Status](https://travis-ci.com/jfilter/split-folders.svg?branch=master)](https://travis-ci.com/jfilter/split-folders) [![PyPI](https://img.shields.io/pypi/v/split-folders.svg)](https://pypi.org/project/split-folders/) [![PyPI - Python Version](https://img.shields.io/pypi/pyversions/split-folders.svg)](https://pypi.org/project/split-folders/)\n\nSplit folders with files (e.g. images) into train, validation and test (dataset) folders.\n\nThe input folder should have the following format:\n\n```\ninput/\n class1/\n img1.jpg\n img2.jpg\n ...\n class2/\n imgWhatever.jpg\n ...\n ...\n```\n\nIn order to give you this:\n\n```\noutput/\n train/\n class1/\n img1.jpg\n ...\n class2/\n imga.jpg\n ...\n val/\n class1/\n img2.jpg\n ...\n class2/\n imgb.jpg\n ...\n test/\n class1/\n img3.jpg\n ...\n class2/\n imgc.jpg\n ...\n```\n\nThis should get you started to do some serious deep learning on your data. 
[Read here](https://stats.stackexchange.com/questions/19048/what-is-the-difference-between-test-set-and-validation-set) why it's a good idea to split your data into three different sets.\n\n- You may also split into only a training and a validation set.\n- The data gets split before it gets shuffled.\n- A [seed](https://docs.python.org/3/library/random.html#random.seed) lets you reproduce the splits.\n- Works on any file type.\n- Allows randomized [oversampling](https://en.wikipedia.org/wiki/Oversampling_and_undersampling_in_data_analysis) for imbalanced datasets.\n- (Should) work on all operating systems.\n\n## Install\n\n```bash\npip install split-folders\n```\n\nIf you are working with a large number of files, you may want to get a progress bar. Install [tqdm](https://github.com/tqdm/tqdm) in order to get updates when copying the files into the new folders.\n\n```bash\npip install split-folders tqdm\n```\n\n## Usage\n\nYou can use `split_folders` as a Python module or as a Command Line Interface (CLI).\n\nIf your dataset is balanced (each class has the same number of samples), choose `ratio`, otherwise `fixed`. NB: oversampling is turned off by default.\n\n### Module\n\n```python\nimport split_folders\n\n# Split with a ratio.\n# To only split into training and validation set, pass a two-element tuple to `ratio`, e.g., `(.8, .2)`.\nsplit_folders.ratio('input_folder', output=\"output\", seed=1337, ratio=(.8, .1, .1)) # default values\n\n# Split val/test with a fixed number of items e.g. 100 for each set.\n# To only split into training and validation set, pass a single number to `fixed`, e.g., `10`.\nsplit_folders.fixed('input_folder', output=\"output\", seed=1337, fixed=(100, 100), oversample=False) # default values\n```\n\n### CLI\n\n```\nUsage:\n split_folders folder_with_images [--output] [--ratio] [--fixed] [--seed] [--oversample]\nOptions:\n --output path to the output folder. defaults to `output`. Gets created if non-existent.\n --ratio the ratio to split. e.g. 
for train/val/test `.8 .1 .1` or for train/val `.8 .2`.\n --fixed set the absolute number of items per validation/test set. The remaining items constitute\n the training set. e.g. for train/val/test `100 100` or for train/val `100`.\n --seed set seed value for shuffling the items. defaults to 1337.\n --oversample enable oversampling of imbalanced datasets, works only with --fixed.\nExample:\n split_folders imgs --ratio .8 .1 .1\n```\n\n## License\n\nMIT.\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/jfilter/split-folders", "keywords": "", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "split-folders", "package_url": "https://pypi.org/project/split-folders/", "platform": "", "project_url": "https://pypi.org/project/split-folders/", "project_urls": { "Homepage": "https://github.com/jfilter/split-folders" }, "release_url": "https://pypi.org/project/split-folders/0.3.1/", "requires_dist": null, "requires_python": ">=3.6", "summary": "\ud83d\uddc2 Split folders with files (e.g. 
images) into training, validation and test (dataset) folders.", "version": "0.3.1" }, "last_serial": 5609506, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "md5": "e83a834cbc1c886f4bc4010ab5696a9d", "sha256": "1ef097d4cb15b68d14b85759eaa22e6836d304bc7c5e6aa51d501e4e4835142f" }, "downloads": -1, "filename": "split_folders-0.1.0-py3-none-any.whl", "has_sig": false, "md5_digest": "e83a834cbc1c886f4bc4010ab5696a9d", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 5304, "upload_time": "2018-10-04T21:24:00", "url": "https://files.pythonhosted.org/packages/e1/3d/c402a2dc38010e34d51227ba5e227e8e73a41cd17ee7b424dd4ae947d7cb/split_folders-0.1.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "fba66cfb4cc55cb01b33197279c12c95", "sha256": "9ec88fd6a4f965d6739a72951f83f85abb24c92926a7148b72a465d69bce3095" }, "downloads": -1, "filename": "split_folders-0.1.0.tar.gz", "has_sig": false, "md5_digest": "fba66cfb4cc55cb01b33197279c12c95", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3739, "upload_time": "2018-10-04T21:24:02", "url": "https://files.pythonhosted.org/packages/96/62/6946b29e5c99d51a9370a3c75a5716f8626676685129598a4304f62e5ee5/split_folders-0.1.0.tar.gz" } ], "0.2.0": [ { "comment_text": "", "digests": { "md5": "9ec48a8aa4ef7ffd2aad6aeea3bcd700", "sha256": "a74317d5402854519bcdc0497d15886bf4fc86ffb5b661d522e00e6cf9a7c01a" }, "downloads": -1, "filename": "split_folders-0.2.0-py3.7.egg", "has_sig": false, "md5_digest": "9ec48a8aa4ef7ffd2aad6aeea3bcd700", "packagetype": "bdist_egg", "python_version": "3.7", "requires_python": null, "size": 4647, "upload_time": "2018-11-09T17:18:41", "url": "https://files.pythonhosted.org/packages/e0/7e/5b2caeb3660dd23500bb58207d7bb096bb10a7f885515769a127bf497a93/split_folders-0.2.0-py3.7.egg" }, { "comment_text": "", "digests": { "md5": "ad1aa51ff4428df0b545367598561410", "sha256": 
"980f55301d444e55d00233c619ed9a5a903e6910fb6224a7b60022c2bfce6d0f" }, "downloads": -1, "filename": "split_folders-0.2.0-py3-none-any.whl", "has_sig": false, "md5_digest": "ad1aa51ff4428df0b545367598561410", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 5754, "upload_time": "2018-10-18T18:31:36", "url": "https://files.pythonhosted.org/packages/b2/a6/dd76ca87cb23f84c998ee8ba2d56790c519c92058e651a7cac95d7f12a1b/split_folders-0.2.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "95d79a141b006e59a5ea2c94d5f9b6ee", "sha256": "586ffdb2cb830d379041b0f7ae64f2432df52cf1249e62254bb1f54a68c6275b" }, "downloads": -1, "filename": "split_folders-0.2.0.tar.gz", "has_sig": false, "md5_digest": "95d79a141b006e59a5ea2c94d5f9b6ee", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4235, "upload_time": "2018-10-18T18:31:38", "url": "https://files.pythonhosted.org/packages/17/db/0e64dec5d6c94b12d1d18ffd6cebf6159d7b2bb11e6b24dddddc0f600985/split_folders-0.2.0.tar.gz" } ], "0.2.1": [ { "comment_text": "", "digests": { "md5": "ac5bbe52f780b69912528c57932656e4", "sha256": "1f9818aeb5500434cb2950810cda7788a116bea72db57bc62713aa049871fbba" }, "downloads": -1, "filename": "split_folders-0.2.1-py3-none-any.whl", "has_sig": false, "md5_digest": "ac5bbe52f780b69912528c57932656e4", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 5757, "upload_time": "2018-11-09T17:18:40", "url": "https://files.pythonhosted.org/packages/ff/95/000c77bad0fbf0825454b7ff8670216449ee01c39e83328c7ce7cd9895c0/split_folders-0.2.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "eaeadaf38e79fbe8dfe2e0bde46e2192", "sha256": "ff395e9e8737fcb36e7525d5e7f6da540815668d32ffff72279e9994005a9ef7" }, "downloads": -1, "filename": "split_folders-0.2.1.tar.gz", "has_sig": false, "md5_digest": "eaeadaf38e79fbe8dfe2e0bde46e2192", "packagetype": "sdist", "python_version": "source", 
"requires_python": null, "size": 4250, "upload_time": "2018-11-09T17:18:43", "url": "https://files.pythonhosted.org/packages/c6/6a/0ea62ce52646ac56f95b2496a54246103d95893dd1728c1fcb96593d550d/split_folders-0.2.1.tar.gz" } ], "0.2.2": [ { "comment_text": "", "digests": { "md5": "7c39b7caffa64027cd1d13985ae9ebaa", "sha256": "b4171fe76769ec5ca79c09ad45cef0dd86895656581d9e98764910f07b494ff7" }, "downloads": -1, "filename": "split_folders-0.2.2-py3-none-any.whl", "has_sig": false, "md5_digest": "7c39b7caffa64027cd1d13985ae9ebaa", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 5904, "upload_time": "2019-05-12T12:42:15", "url": "https://files.pythonhosted.org/packages/94/fd/29ffa8c9495eec46b17c939db2b6d0bdc6b768c1438d7b8bd0802b040d2b/split_folders-0.2.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "2b6df6b7bbd0650a6ccd5844583d3005", "sha256": "ba88e3b313c581d7a2225a296ea4a7bfbf090256115e7daea1a9c6092865eb6c" }, "downloads": -1, "filename": "split_folders-0.2.2.tar.gz", "has_sig": false, "md5_digest": "2b6df6b7bbd0650a6ccd5844583d3005", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4360, "upload_time": "2019-05-12T12:42:20", "url": "https://files.pythonhosted.org/packages/ad/88/91cf7ffffc5993b5448ea03ac31b3a57fdb601c506aef5b69a66b572dec0/split_folders-0.2.2.tar.gz" } ], "0.2.3": [ { "comment_text": "", "digests": { "md5": "2e0186c532a1b5624bbb23e5fc4a7782", "sha256": "da182d02210bfa0b7228ca674126093ecc39d449842d16a3ddc8efa8537a0f9f" }, "downloads": -1, "filename": "split_folders-0.2.3-py3-none-any.whl", "has_sig": false, "md5_digest": "2e0186c532a1b5624bbb23e5fc4a7782", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 5927, "upload_time": "2019-07-05T21:41:13", "url": "https://files.pythonhosted.org/packages/32/d3/3714dfcf4145d5afe49101a9ed36659c3832c1e9b4d09d45e5cbb736ca3f/split_folders-0.2.3-py3-none-any.whl" }, { "comment_text": "", 
"digests": { "md5": "db0c6f681d3a25ec0fe8365583650ce1", "sha256": "ccf9e7409e6ff332feb870fcf65ca23f64e1472462fec949498de4a81a7c86f7" }, "downloads": -1, "filename": "split_folders-0.2.3.tar.gz", "has_sig": false, "md5_digest": "db0c6f681d3a25ec0fe8365583650ce1", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4396, "upload_time": "2019-07-05T21:41:18", "url": "https://files.pythonhosted.org/packages/31/09/e0a2b08f00039ecac5701f7ca9e4cdd4c40c2d5f2382deb16605c8d11a52/split_folders-0.2.3.tar.gz" } ], "0.3.1": [ { "comment_text": "", "digests": { "md5": "018a1375d3c58835db34873e15f3b6c4", "sha256": "0252f36a93af05cb93080e4236aa602ff59af4e1ab62932a7545ac5ab5097827" }, "downloads": -1, "filename": "split_folders-0.3.1-py3-none-any.whl", "has_sig": false, "md5_digest": "018a1375d3c58835db34873e15f3b6c4", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 6232, "upload_time": "2019-07-30T18:51:21", "url": "https://files.pythonhosted.org/packages/20/67/29dda743e6d23ac1ea3d16704d8bbb48d65faf3f1b1eaf53153b3da56c56/split_folders-0.3.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "8c88eff78f5dccd17bafa19ab4ea5b1b", "sha256": "98f32fbff02702529db3c11e5f7c049fb030a7911876b653a40796a2ae3401b6" }, "downloads": -1, "filename": "split_folders-0.3.1.tar.gz", "has_sig": false, "md5_digest": "8c88eff78f5dccd17bafa19ab4ea5b1b", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 4719, "upload_time": "2019-07-30T18:51:29", "url": "https://files.pythonhosted.org/packages/43/35/196590b7054028e68d6796884f3157713e092d66e6e74cd4afdeb4b898ea/split_folders-0.3.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "018a1375d3c58835db34873e15f3b6c4", "sha256": "0252f36a93af05cb93080e4236aa602ff59af4e1ab62932a7545ac5ab5097827" }, "downloads": -1, "filename": "split_folders-0.3.1-py3-none-any.whl", "has_sig": false, "md5_digest": 
"018a1375d3c58835db34873e15f3b6c4", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 6232, "upload_time": "2019-07-30T18:51:21", "url": "https://files.pythonhosted.org/packages/20/67/29dda743e6d23ac1ea3d16704d8bbb48d65faf3f1b1eaf53153b3da56c56/split_folders-0.3.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "8c88eff78f5dccd17bafa19ab4ea5b1b", "sha256": "98f32fbff02702529db3c11e5f7c049fb030a7911876b653a40796a2ae3401b6" }, "downloads": -1, "filename": "split_folders-0.3.1.tar.gz", "has_sig": false, "md5_digest": "8c88eff78f5dccd17bafa19ab4ea5b1b", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 4719, "upload_time": "2019-07-30T18:51:29", "url": "https://files.pythonhosted.org/packages/43/35/196590b7054028e68d6796884f3157713e092d66e6e74cd4afdeb4b898ea/split_folders-0.3.1.tar.gz" } ] }