{
    "info": {
        "author": "Xtract AI",
        "author_email": "info@xtract.ai",
        "bugtrack_url": null,
        "classifiers": [
            "Operating System :: OS Independent",
            "Programming Language :: Python :: 3"
        ],
        "description": "# xt-cvdata\n\n## Description\n\nThis repo contains utilities for building and working with computer vision datasets, developed by [Xtract AI](https://xtract.ai/).\n\nSo far, APIs for the following open-source datasets are included:\n1. COCO 2017 (detection and segmentation): `xt_cvdata.apis.COCO`\n1. Open Images V5 (detection and segmentation): `xt_cvdata.apis.OpenImages`\n1. Visual Object Tagging Tool (VoTT) CSV output (detection): `xt_cvdata.apis.VoTTCSV`\n\nMore to come.\n\n## Installation\n\nFrom PyPI:\n```bash\npip install xt-cvdata\n```\n\nFrom source:\n```bash\ngit clone https://github.com/XtractTech/xt-cvdata.git\npip install ./xt-cvdata\n```\n\n## Usage\n\nFor help on a specific dataset class, use `help`, e.g., `help(xt_cvdata.apis.COCO)`.\n\n#### Building a dataset\n\n```python\nfrom xt_cvdata.apis import COCO, OpenImages\n\n# Build an object populated with the COCO image list, categories, and annotations\ncoco = COCO('/nasty/data/common/COCO_2017')\nprint(coco)\nprint(coco.class_distribution)\n\n# Same for Open Images\noi = OpenImages('/nasty/data/common/open_images_v5')\nprint(oi)\nprint(oi.class_distribution)\n\n# Get just the person classes\ncoco.subset(['person'])\noi.subset(['Person']).rename({'Person': 'person'})\n\n# Merge and build\nmerged = coco.merge(oi)\nmerged.build('./data/new_dataset_dir')\n```\n\nThis package follows PyTorch-style chaining conventions: methods operating on an object modify it in place and also return the modified object. The exception is the `merge()` method, which does not modify either dataset in place and instead returns a new merged object. Hence, the above operations can also be completed using:\n\n```python\nfrom xt_cvdata.apis import COCO, OpenImages\n\nmerged = (\n    COCO('/nasty/data/common/COCO_2017')\n        .subset(['person'])\n        .merge(\n            OpenImages('/nasty/data/common/open_images_v5')\n                .subset(['Person'])\n                .rename({'Person': 'person'})\n        )\n)\nmerged.build('./data/new_dataset_dir')\n```\n\nIn practice, a style somewhere between the two approaches will probably be most readable.\n\nThe current set of dataset operations is:\n* `analyze`: recalculate dataset statistics (e.g., class distributions, train/val split)\n* `verify_schema`: check whether class attributes follow the required schema\n* `subset`: remove all but a subset of classes from the dataset\n* `rename`: rename/combine dataset classes\n* `sample`: sample a specified number of images from the train and validation sets\n* `split`: define the proportion of data in the validation set\n* `merge`: merge two datasets together, returning the merged dataset\n* `build`: create the currently defined dataset, either by symlinking or by copying images\n\n#### Implementing a new dataset type\n\nNew dataset types should inherit from the base `xt_cvdata.Builder` class. See the `Builder`, `COCO`, and `OpenImages` classes as a guide. Specifically, the class initializer should define the `info`, `licenses`, `categories`, `annotations`, and `images` attributes such that `self.verify_schema()` runs without error. This ensures that all of the methods defined in the `Builder` class will operate correctly on the inheriting class.\n\n## Data Sources\n\n[descriptions and links to data]\n\n## Dependencies/Licensing\n\n[list of dependencies and their licenses, including data]\n\n## References\n\n[list of references]\n\n\n",
        "description_content_type": "text/markdown",
        "docs_url": null,
        "download_url": "",
        "downloads": {
            "last_day": -1,
            "last_month": -1,
            "last_week": -1
        },
        "home_page": "https://github.com/XtractTech/xt-cvdata",
        "keywords": "",
        "license": "",
        "maintainer": "",
        "maintainer_email": "",
        "name": "xt-cvdata",
        "package_url": "https://pypi.org/project/xt-cvdata/",
        "platform": "",
        "project_url": "https://pypi.org/project/xt-cvdata/",
        "project_urls": {
            "Homepage": "https://github.com/XtractTech/xt-cvdata"
        },
        "release_url": "https://pypi.org/project/xt-cvdata/0.4.0/",
        "requires_dist": [
            "numpy",
            "pandas",
            "torch",
            "h5py",
            "pillow",
            "matplotlib",
            "tqdm"
        ],
        "requires_python": "",
        "summary": "Utilities for building and working with computer vision datasets",
        "version": "0.4.0"
    },
    "last_serial": 5946645,
    "releases": {
        "0.3.0": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "f957e7961154dfdeaa0ba7129bfbc8c9",
                    "sha256": "e3f9534f0afb42048c674fc3483eb9baf0110b2cf55a8c1e5c6d59bd4fdb1efa"
                },
                "downloads": -1,
                "filename": "xt_cvdata-0.3.0-py3-none-any.whl",
                "has_sig": false,
                "md5_digest": "f957e7961154dfdeaa0ba7129bfbc8c9",
                "packagetype": "bdist_wheel",
                "python_version": "py3",
                "requires_python": null,
                "size": 16111,
                "upload_time": "2019-10-08T20:48:15",
                "url": "https://files.pythonhosted.org/packages/dc/1a/680db02ca63c4ecd222ca7f9e0efa00e7c6d9f4e53ccfa7769a1118d7590/xt_cvdata-0.3.0-py3-none-any.whl"
            },
            {
                "comment_text": "",
                "digests": {
                    "md5": "0fd76281f96bec48504cf24634f8cb61",
                    "sha256": "a098247c2af19f19ca18b62bf7e2ec0f028c031c8363dbf3607e1f351b95e4bc"
                },
                "downloads": -1,
                "filename": "xt-cvdata-0.3.0.tar.gz",
                "has_sig": false,
                "md5_digest": "0fd76281f96bec48504cf24634f8cb61",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 13434,
                "upload_time": "2019-10-08T20:48:18",
                "url": "https://files.pythonhosted.org/packages/77/fd/26d2356d9fc92f8b4004e5836add8fd2bd01db7305cbe89a5b39398ba1c7/xt-cvdata-0.3.0.tar.gz"
            }
        ],
        "0.4.0": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "c89b112b17c5a7a478c7c7d77649ccb0",
                    "sha256": "d9608fd2b4839121f7a5267caabe0471c952ea29d757919da58c66e40cf24d89"
                },
                "downloads": -1,
                "filename": "xt_cvdata-0.4.0-py3-none-any.whl",
                "has_sig": false,
                "md5_digest": "c89b112b17c5a7a478c7c7d77649ccb0",
                "packagetype": "bdist_wheel",
                "python_version": "py3",
                "requires_python": null,
                "size": 16119,
                "upload_time": "2019-10-08T20:51:55",
                "url": "https://files.pythonhosted.org/packages/06/bf/91d3a5b9ed2ad417e4832f8e21ad2172c84c7558652a7515c81c1205be95/xt_cvdata-0.4.0-py3-none-any.whl"
            },
            {
                "comment_text": "",
                "digests": {
                    "md5": "97462ffb233ac9a27ad8d29a2e82660f",
                    "sha256": "46517eb0aafbca4e1cd926ebec5efed9ee905cd13b812a1a9d14d514d9e00206"
                },
                "downloads": -1,
                "filename": "xt-cvdata-0.4.0.tar.gz",
                "has_sig": false,
                "md5_digest": "97462ffb233ac9a27ad8d29a2e82660f",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 13447,
                "upload_time": "2019-10-08T20:51:57",
                "url": "https://files.pythonhosted.org/packages/d6/17/9e6273d68282331f687a8ed4f74b3c4771f1015a40c4a0d8bd6e2d8a241c/xt-cvdata-0.4.0.tar.gz"
            }
        ]
    },
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "c89b112b17c5a7a478c7c7d77649ccb0",
                "sha256": "d9608fd2b4839121f7a5267caabe0471c952ea29d757919da58c66e40cf24d89"
            },
            "downloads": -1,
            "filename": "xt_cvdata-0.4.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "c89b112b17c5a7a478c7c7d77649ccb0",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 16119,
            "upload_time": "2019-10-08T20:51:55",
            "url": "https://files.pythonhosted.org/packages/06/bf/91d3a5b9ed2ad417e4832f8e21ad2172c84c7558652a7515c81c1205be95/xt_cvdata-0.4.0-py3-none-any.whl"
        },
        {
            "comment_text": "",
            "digests": {
                "md5": "97462ffb233ac9a27ad8d29a2e82660f",
                "sha256": "46517eb0aafbca4e1cd926ebec5efed9ee905cd13b812a1a9d14d514d9e00206"
            },
            "downloads": -1,
            "filename": "xt-cvdata-0.4.0.tar.gz",
            "has_sig": false,
            "md5_digest": "97462ffb233ac9a27ad8d29a2e82660f",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 13447,
            "upload_time": "2019-10-08T20:51:57",
            "url": "https://files.pythonhosted.org/packages/d6/17/9e6273d68282331f687a8ed4f74b3c4771f1015a40c4a0d8bd6e2d8a241c/xt-cvdata-0.4.0.tar.gz"
        }
    ]
}