{ "info": { "author": "Xtract AI", "author_email": "info@xtract.ai", "bugtrack_url": null, "classifiers": [ "Operating System :: OS Independent", "Programming Language :: Python :: 3" ], "description": "# xt-cvdata\n\n## Description\n\nThis repo contains utilities for building and working with computer vision datasets, developed by [Xtract AI](https://xtract.ai/).\n\nSo far, APIs for the following open-source datasets are included:\n1. COCO 2017 (detection and segmentation): `xt_cvdata.apis.COCO`\n1. Open Images V5 (detection and segmentation): `xt_cvdata.apis.OpenImages`\n1. Visual Object Tagging Tool (VoTT) CSV output (detection): `xt_cvdata.apis.VoTTCSV`\n\nMore to come.\n\n## Installation\n\nFrom PyPI:\n```bash\npip install xt-cvdata\n```\n\nFrom source:\n```bash\ngit clone https://github.com/XtractTech/xt-cvdata.git\npip install ./xt-cvdata\n```\n\n## Usage\n\nFor help on a specific dataset class, use `help`, e.g., `help(xt_cvdata.apis.COCO)`.\n\n#### Building a dataset\n\n```python\nfrom xt_cvdata.apis import COCO, OpenImages\n\n# Build an object populated with the COCO image list, categories, and annotations\ncoco = COCO('/nasty/data/common/COCO_2017')\nprint(coco)\nprint(coco.class_distribution)\n\n# Same for Open Images\noi = OpenImages('/nasty/data/common/open_images_v5')\nprint(oi)\nprint(oi.class_distribution)\n\n# Get just the person classes\ncoco.subset(['person'])\noi.subset(['Person']).rename({'Person': 'person'})\n\n# Merge and build\nmerged = coco.merge(oi)\nmerged.build('./data/new_dataset_dir')\n```\n\nThis package follows PyTorch chaining rules, meaning that methods operating on an object modify it in place but also return the modified object. The exception is the `merge()` method, which does not modify either dataset in place and instead returns a new merged object. 
Hence, the above operations can also be completed using:\n\n```python\nfrom xt_cvdata.apis import COCO, OpenImages\n\nmerged = (\n COCO('/nasty/data/common/COCO_2017')\n .subset(['person'])\n .merge(\n OpenImages('/nasty/data/common/open_images_v5')\n .subset(['Person'])\n .rename({'Person': 'person'})\n )\n)\nmerged.build('./data/new_dataset_dir')\n```\n\nIn practice, a style somewhere between the two approaches will probably be most readable.\n\nThe current set of dataset operations is:\n* `analyze`: recalculate dataset statistics (e.g., class distributions, train/val split)\n* `verify_schema`: check whether class attributes follow the required schema\n* `subset`: remove all but a subset of classes from the dataset\n* `rename`: rename/combine dataset classes\n* `sample`: sample a specified number of images from the train and validation sets\n* `split`: define the proportion of data in the validation set\n* `merge`: merge two datasets together, returning the merged dataset\n* `build`: create the currently defined dataset using either symlinks or by copying images\n\n#### Implementing a new dataset type\n\nNew dataset types should inherit from the base `xt_cvdata.Builder` class. Use the `Builder`, `COCO`, and `OpenImages` classes as a guide. Specifically, the class initializer should define `info`, `licenses`, `categories`, `annotations`, and `images` attributes such that `self.verify_schema()` runs without error. 
This ensures that all of the methods defined in the `Builder` class will operate correctly on the inheriting class.\n\n## Data Sources\n\n[descriptions and links to data]\n\n## Dependencies/Licensing\n\n[list of dependencies and their licenses, including data]\n\n## References\n\n[list of references]\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/XtractTech/xt-cvdata", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "xt-cvdata", "package_url": "https://pypi.org/project/xt-cvdata/", "platform": "", "project_url": "https://pypi.org/project/xt-cvdata/", "project_urls": { "Homepage": "https://github.com/XtractTech/xt-cvdata" }, "release_url": "https://pypi.org/project/xt-cvdata/0.4.0/", "requires_dist": [ "numpy", "pandas", "torch", "h5py", "pillow", "matplotlib", "tqdm" ], "requires_python": "", "summary": "Utilities for building and working with computer vision datasets", "version": "0.4.0" }, "last_serial": 5946645, "releases": { "0.3.0": [ { "comment_text": "", "digests": { "md5": "f957e7961154dfdeaa0ba7129bfbc8c9", "sha256": "e3f9534f0afb42048c674fc3483eb9baf0110b2cf55a8c1e5c6d59bd4fdb1efa" }, "downloads": -1, "filename": "xt_cvdata-0.3.0-py3-none-any.whl", "has_sig": false, "md5_digest": "f957e7961154dfdeaa0ba7129bfbc8c9", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 16111, "upload_time": "2019-10-08T20:48:15", "url": "https://files.pythonhosted.org/packages/dc/1a/680db02ca63c4ecd222ca7f9e0efa00e7c6d9f4e53ccfa7769a1118d7590/xt_cvdata-0.3.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "0fd76281f96bec48504cf24634f8cb61", "sha256": "a098247c2af19f19ca18b62bf7e2ec0f028c031c8363dbf3607e1f351b95e4bc" }, "downloads": -1, "filename": "xt-cvdata-0.3.0.tar.gz", "has_sig": false, "md5_digest": "0fd76281f96bec48504cf24634f8cb61", 
"packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 13434, "upload_time": "2019-10-08T20:48:18", "url": "https://files.pythonhosted.org/packages/77/fd/26d2356d9fc92f8b4004e5836add8fd2bd01db7305cbe89a5b39398ba1c7/xt-cvdata-0.3.0.tar.gz" } ], "0.4.0": [ { "comment_text": "", "digests": { "md5": "c89b112b17c5a7a478c7c7d77649ccb0", "sha256": "d9608fd2b4839121f7a5267caabe0471c952ea29d757919da58c66e40cf24d89" }, "downloads": -1, "filename": "xt_cvdata-0.4.0-py3-none-any.whl", "has_sig": false, "md5_digest": "c89b112b17c5a7a478c7c7d77649ccb0", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 16119, "upload_time": "2019-10-08T20:51:55", "url": "https://files.pythonhosted.org/packages/06/bf/91d3a5b9ed2ad417e4832f8e21ad2172c84c7558652a7515c81c1205be95/xt_cvdata-0.4.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "97462ffb233ac9a27ad8d29a2e82660f", "sha256": "46517eb0aafbca4e1cd926ebec5efed9ee905cd13b812a1a9d14d514d9e00206" }, "downloads": -1, "filename": "xt-cvdata-0.4.0.tar.gz", "has_sig": false, "md5_digest": "97462ffb233ac9a27ad8d29a2e82660f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 13447, "upload_time": "2019-10-08T20:51:57", "url": "https://files.pythonhosted.org/packages/d6/17/9e6273d68282331f687a8ed4f74b3c4771f1015a40c4a0d8bd6e2d8a241c/xt-cvdata-0.4.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "c89b112b17c5a7a478c7c7d77649ccb0", "sha256": "d9608fd2b4839121f7a5267caabe0471c952ea29d757919da58c66e40cf24d89" }, "downloads": -1, "filename": "xt_cvdata-0.4.0-py3-none-any.whl", "has_sig": false, "md5_digest": "c89b112b17c5a7a478c7c7d77649ccb0", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 16119, "upload_time": "2019-10-08T20:51:55", "url": 
"https://files.pythonhosted.org/packages/06/bf/91d3a5b9ed2ad417e4832f8e21ad2172c84c7558652a7515c81c1205be95/xt_cvdata-0.4.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "97462ffb233ac9a27ad8d29a2e82660f", "sha256": "46517eb0aafbca4e1cd926ebec5efed9ee905cd13b812a1a9d14d514d9e00206" }, "downloads": -1, "filename": "xt-cvdata-0.4.0.tar.gz", "has_sig": false, "md5_digest": "97462ffb233ac9a27ad8d29a2e82660f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 13447, "upload_time": "2019-10-08T20:51:57", "url": "https://files.pythonhosted.org/packages/d6/17/9e6273d68282331f687a8ed4f74b3c4771f1015a40c4a0d8bd6e2d8a241c/xt-cvdata-0.4.0.tar.gz" } ] }