{ "info": { "author": "Reid Sanders", "author_email": "reid@reidsanders.net", "bugtrack_url": null, "classifiers": [], "description": "# Danbooru Utility \n\nDanbooru Utility is a simple python script for working with gwern's Danbooru2018 dataset. It can explore the dataset, filter by tags, rating, and score, detect faces, and resize the images. I've been using it to make datasets for gan training.\n\n\n## Install\n\n```sh\npip3 install danbooru-utility\n```\n\nMake sure you have downloaded [Danbooru2018](https://www.gwern.net/Danbooru2018). It's ~3.3M annotated anime images, so downloading may take a long time.\n\n## Usage\n\nFirst let's search for something fairly particular.\n\n```sh\n$ danbooru-utility \\\n--directory ~/datasets/danbooru-gwern/danbooru2018/ \\\n--rating \"s\" \\\n--required_tags \"archer,toosaka_rin,hug\" \\\n--max_examples 3 \\\n--img_size 256\n\nProcessed 3 files. Added 3 images. It took 14.39 sec\n```\n\nThis will find three images with the required tags, and resize them to 256x256. Note this took a long time since the filtering is just done in a loop. Let's check what this produced in `out-images`:\n\n![Rin Archer example 1](./img/rin_archer1.jpg)\n![Rin Archer example 2](./img/rin_archer2.jpg)\n![Rin Archer example 3](./img/rin_archer3.png)\n\nNow let's run the same command but with face detection:\n\n```sh\n$ danbooru-utility \\\n--directory ~/datasets/danbooru-gwern/danbooru2018/ \\\n--rating \"s\" \\\n--required_tags \"archer,toosaka_rin,hug\" \\\n--max_examples 3 \\\n--img_size 256 \\\n--faces\n\nProcessed 3 files. Added 1 images. It took 12.48 sec\n```\n\nThat produced:\n\n![Rin Archer face example](./img/rin_archer_face_default.jpg)\n\nSo it cropped with the face in the upper center of the image.\n\nLet's change the `face_scale` parameter. This controls how much of the image around the face is included in the crop.\n\n```sh\n$ danbooru-utility \\\n--directory ~/datasets/danbooru-gwern/danbooru2018/ \\\n--rating \"s\" \\\n--required_tags \"archer,toosaka_rin,hug\" \\\n--max_examples 3 \\\n--img_size 256 \\\n--faces \\\n--overwrite \\\n--face_scale 1.8\n\nProcessed 3 files. Added 1 images. It took 12.49 sec\n```\n\n![Rin Archer face scale example](./img/rin_archer_face_scale.jpg)\n\nA little tighter crop.\n\nIf you have already processed some images this utility will check and not reproduce them, unless you use `--overwrite`. So if you change image generation parameters you should use this flag. You can also specify a `--link_dir` to symlink to. So you can, for instance, resize a large number of images, and then create datasets for specific tags quickly.\n\nSo for GAN training I would use something like this to generate a training set:\n\n```sh\n$ danbooru-utility \\\n--directory ~/datasets/danbooru-gwern/danbooru2018/ \\\n--rating \"s,q\" \\\n--banned_tags \"photo,comic\" \\\n--max_examples 1000000000 \\\n--img_size 256 \\\n--faces\n\nProcessed 100 files. It took 10.36 sec\nProcessed 200 files. It took 20.06 sec\nProcessed 300 files. It took 39.16 sec\n...\n```\n\n## Config\n\nFor details on parameters check help.\n\n```sh\n$ danbooru-utility -h\n```\n```\nusage: danbooru-utility [-h] [-d DIRECTORY] [--metadata_dir METADATA_DIR]\n [--save_dir SAVE_DIR] [--link_dir LINK_DIR]\n [-r REQUIRED_TAGS] [-b BANNED_TAGS] [-a ATLEAST_TAGS]\n [--ratings RATINGS] [--score_range SCORE_RANGE]\n [-n ATLEAST_NUM] [--overwrite [OVERWRITE]]\n [--preview [PREVIEW]] [--faces [FACES]]\n [--face_scale FACE_SCALE]\n [--max_examples MAX_EXAMPLES] [--img_size IMG_SIZE]\n\ndanbooru2018 utility script\n\noptional arguments:\n -h, --help show this help message and exit\n -d DIRECTORY, --directory DIRECTORY\n Danbooru dataset directory.\n --metadata_dir METADATA_DIR\n Metadata path below base directory. Will load all json\n files here.\n --save_dir SAVE_DIR Directory processed images are saved to.\n --link_dir LINK_DIR Directory with already processed images. Used to\n symlink to if the files exist.\n -r REQUIRED_TAGS, --required_tags REQUIRED_TAGS\n Tags required.\n -b BANNED_TAGS, --banned_tags BANNED_TAGS\n Tags disallowed.\n -a ATLEAST_TAGS, --atleast_tags ATLEAST_TAGS\n Requires some number of these tags.\n --ratings RATINGS Only include images with these ratings. \"s,q,e\" are\n the possible entries, and represent\n \"safe,questionable,explicit\".\n --score_range SCORE_RANGE\n Only include images inside this score range.\n -n ATLEAST_NUM, --atleast_num ATLEAST_NUM\n Minimum number of atleast_tags required.\n --overwrite [OVERWRITE]\n Overwrite images in save directory.\n --preview [PREVIEW] Preview images.\n --faces [FACES] Detect faces and try to include them in top of image.\n --face_scale FACE_SCALE\n Height and width multiplier over size of face.\n --max_examples MAX_EXAMPLES\n Maximum number of files to load.\n --img_size IMG_SIZE Size of side for resized images.\n\n```\n\nHere's an example metadata entry in Danbooru2018:\n\n```python\n{'approver_id': '0',\n 'created_at': '2016-10-26 09:32:42.38506 UTC',\n 'down_score': '0',\n 'favs': ['12082', '334419', '496852', '516035', '487870'],\n 'file_ext': 'jpg',\n 'file_size': '753165',\n 'has_children': False,\n 'id': '2524919',\n 'image_height': '874',\n 'image_width': '1181',\n 'is_banned': False,\n 'is_deleted': False,\n 'is_flagged': False,\n 'is_note_locked': False,\n 'is_pending': False,\n 'is_rating_locked': False,\n 'is_status_locked': False,\n 'last_commented_at': '1970-01-01 00:00:00 UTC',\n 'last_noted_at': '1970-01-01 00:00:00 UTC',\n 'md5': 'a9260780fbf5cfd661878f92a268124e',\n 'parent_id': '2524918',\n 'pixiv_id': '54348754',\n 'pools': [],\n 'rating': 's',\n 'score': '3',\n 'source': 'http://i3.pixiv.net/img-original/img/2015/12/31/13/31/23/54348754_p13.jpg',\n 'tags': [{'category': '0', 'id': '540830', 'name': '1boy'},\n\t\t {'category': '0', 'id': '470575', 'name': '1girl'},\n\t\t {'category': '1', 'id': '1332557', 'name': 'akira_(ubw)'},\n\t\t {'category': '4', 'id': '396', 'name': 'archer'},\n\t\t {'category': '0', 'id': '13200', 'name': 'black_hair'},\n\t\t {'category': '0', 'id': '3389', 'name': 'blush'},\n\t\t {'category': '0', 'id': '4563', 'name': 'bow'},\n\t\t {'category': '0', 'id': '465619', 'name': 'closed_eyes'},\n\t\t {'category': '0', 'id': '71730', 'name': 'dark_skin'},\n\t\t {'category': '0', 'id': '610236', 'name': 'dark_skinned_male'},\n\t\t {'category': '3', 'id': '5', 'name': 'fate/stay_night'},\n\t\t {'category': '3', 'id': '662939', 'name': 'fate_(series)'},\n\t\t {'category': '0', 'id': '374938', 'name': 'frown'},\n\t\t {'category': '0', 'id': '374844', 'name': 'hair_bow'},\n\t\t {'category': '0', 'id': '5126', 'name': 'hug'},\n\t\t {'category': '0', 'id': '1815', 'name': 'smile'},\n\t\t {'category': '0', 'id': '125238', 'name': 'sweatdrop'},\n\t\t {'category': '4', 'id': '400140', 'name': 'toosaka_rin'},\n\t\t {'category': '0', 'id': '652604', 'name': 'two_side_up'},\n\t\t {'category': '0', 'id': '16581', 'name': 'white_hair'}],\n 'up_score': '3',\n 'updated_at': '2018-06-05 05:37:49.87865 UTC',\n 'uploader_id': '39276'}\n\n```\n\nYou can explore the metadata and find what tags are associated with each image using `--preview`.\n\n## Improvements\n\nThis could load the dataset into a relational database, allowing much more efficient and powerful querying.\n\nThe face detection has room for improvement. It has rare false positives, and a fair number of false negatives.\n\nI'm happy to consider pull requests.\n\n## Acknowledgements\n\nThanks to [gwern](https://gwern.net) for the excellent danbooru dataset.\n\nThanks to [nagadomi](https://github.com/nagadomi/lbpcascade_animeface) for the anime face detection model.\n\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/reidsanders/danbooru-utility.git", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "danbooru-utility", "package_url": "https://pypi.org/project/danbooru-utility/", "platform": "", "project_url": "https://pypi.org/project/danbooru-utility/", "project_urls": { "Homepage": "https://github.com/reidsanders/danbooru-utility.git" }, "release_url": "https://pypi.org/project/danbooru-utility/0.1.20/", "requires_dist": [ "numpy (>=1.15.4)", "opencv-python (>=3.4.3.18)", "python-resize-image (>=1.1.18)", "Pillow (>=5.4.1)" ], "requires_python": ">=3.6", "summary": "Utility for working with danbooru2018 dataset", "version": "0.1.20" }, "last_serial": 5009910, "releases": { "0.1.19": [ { "comment_text": "", "digests": { "md5": "d5c8d41c0a78e6c01febfa97ebd60f78", "sha256": "b2a9f02a71341cc06347f59d548fa9ae4c3a5eadc82bff1f3032b0ac82861f61" }, "downloads": -1, "filename": "danbooru_utility-0.1.19-py3-none-any.whl", "has_sig": false, "md5_digest": "d5c8d41c0a78e6c01febfa97ebd60f78", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 135838, "upload_time": "2019-03-31T02:10:14", "url": "https://files.pythonhosted.org/packages/14/ea/c9c3ba8687f69b2af7d069991fa56e583f1541c5465cb7c241071f02452c/danbooru_utility-0.1.19-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "b7f52159878fff0fb09a525033912521", "sha256": "2a3884535135d961ce42c3a32af687bb587a08f411cb7e7c80b7fce6ca74cce6" }, "downloads": -1, "filename": "danbooru-utility-0.1.19.tar.gz", "has_sig": false, "md5_digest": "b7f52159878fff0fb09a525033912521", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 70684, "upload_time": "2019-03-31T02:10:16", "url": "https://files.pythonhosted.org/packages/58/0b/5a734d59589225d3f4a0e88a87e53ced1d36e220cd592ceb49a8464f38f0/danbooru-utility-0.1.19.tar.gz" } ], "0.1.20": [ { "comment_text": "", "digests": { "md5": "6b624aaeda9e609c0a0a9140db144458", "sha256": "24726d0ecc172457429432bcc91c596d69f36e2b11838eca7b45d7127715d54b" }, "downloads": -1, "filename": "danbooru_utility-0.1.20-py3-none-any.whl", "has_sig": false, "md5_digest": "6b624aaeda9e609c0a0a9140db144458", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 135998, "upload_time": "2019-03-31T17:09:51", "url": "https://files.pythonhosted.org/packages/43/5c/bd840236e823f5082074ca54c5e230f4216a1e91a35b35093ec5367b0ecc/danbooru_utility-0.1.20-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "02068918c706cc57b627707993971ffb", "sha256": "6a07a24470bdd53acb778ceb0bbc669ff4988f3490bd5eb6045953bed9da81c2" }, "downloads": -1, "filename": "danbooru-utility-0.1.20.tar.gz", "has_sig": false, "md5_digest": "02068918c706cc57b627707993971ffb", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 71037, "upload_time": "2019-03-31T17:09:52", "url": "https://files.pythonhosted.org/packages/f8/9e/509b0d71f03a2743eee55ab743e1070e5fb60809b0062e0e965235ffd791/danbooru-utility-0.1.20.tar.gz" } ], "0.1.9": [ { "comment_text": "", "digests": { "md5": "a41b1a6dee9399e9a940b33928a5384d", "sha256": "03ee200805750814b2f77708ee2f364df3277a0eb79cd5e457b05343db6007a3" }, "downloads": -1, "filename": "danbooru_utility-0.1.9-py3-none-any.whl", "has_sig": false, "md5_digest": "a41b1a6dee9399e9a940b33928a5384d", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 8901, "upload_time": "2019-03-30T23:51:32", "url": "https://files.pythonhosted.org/packages/3d/85/610924d291e44640e69fde1fd073bcc8b1298007b0b3350d212a3041ab39/danbooru_utility-0.1.9-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "abcc58e5913cf6ad934588b0b7035af4", "sha256": "571256b6ae1db72e7c6829d40e8fe0cc62019a4f92e46fa7337cbab9cd0860cf" }, "downloads": -1, "filename": "danbooru-utility-0.1.9.tar.gz", "has_sig": false, "md5_digest": "abcc58e5913cf6ad934588b0b7035af4", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 7852, "upload_time": "2019-03-30T23:51:34", "url": "https://files.pythonhosted.org/packages/68/c4/aa73fd6341407a6e34056fdc1fccc7f6d8db11534209620b3dbbbe342009/danbooru-utility-0.1.9.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "6b624aaeda9e609c0a0a9140db144458", "sha256": "24726d0ecc172457429432bcc91c596d69f36e2b11838eca7b45d7127715d54b" }, "downloads": -1, "filename": "danbooru_utility-0.1.20-py3-none-any.whl", "has_sig": false, "md5_digest": "6b624aaeda9e609c0a0a9140db144458", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 135998, "upload_time": "2019-03-31T17:09:51", "url": "https://files.pythonhosted.org/packages/43/5c/bd840236e823f5082074ca54c5e230f4216a1e91a35b35093ec5367b0ecc/danbooru_utility-0.1.20-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "02068918c706cc57b627707993971ffb", "sha256": "6a07a24470bdd53acb778ceb0bbc669ff4988f3490bd5eb6045953bed9da81c2" }, "downloads": -1, "filename": "danbooru-utility-0.1.20.tar.gz", "has_sig": false, "md5_digest": "02068918c706cc57b627707993971ffb", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 71037, "upload_time": "2019-03-31T17:09:52", "url": "https://files.pythonhosted.org/packages/f8/9e/509b0d71f03a2743eee55ab743e1070e5fb60809b0062e0e965235ffd791/danbooru-utility-0.1.20.tar.gz" } ] }