{ "info": { "author": "MIT Data To AI Lab", "author_email": "dailabmit@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 2 - Pre-Alpha", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Natural Language :: English", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7" ], "description": "

\n\u201cDAI-Lab\u201d\nAn open source project from Data to AI Lab at MIT.\n

\n\n[![PyPI Shield](https://img.shields.io/pypi/v/d3m-dataset-manager.svg)](https://pypi.python.org/pypi/d3m-dataset-manager)\n[![Downloads](https://pepy.tech/badge/d3m-dataset-manager)](https://pepy.tech/project/d3m-dataset-manager)\n[![Travis CI Shield](https://travis-ci.org/HDI-Project/d3m-dataset-manager.svg?branch=master)](https://travis-ci.org/HDI-Project/d3m-dataset-manager)\n\n\n# D3M Dataset Manager\n\nThe D3M Dataset Manager is a command line tool and python package to generate and manage\ndatasets in the D3M format.\n\n- Documentation: https://HDI-Project.github.io/d3m-dataset-manager\n- Homepage: https://github.com/HDI-Project/d3m-dataset-manager\n\n# Overview\n\nThe D3M Dataset Manager is a command line tool and python package to generate and manage\ndatasets in the D3M format.\n\nIt supports:\n\n* downloading datasets from the D3M web repository or from S3 buckets\n* uploading datasets to S3 buckets\n* loading or saving datasets to local filesystem\n* spliting datasets into TRAIN, TEST and SCORE subsets following the dataSplits.csv indexes\n\n## Data Format\n\nThe D3M Dataset Schema, developed by MIT Lincoln Labs Laboratory for the DARPA's Data Driven\nDiscovery of Models Program, requires the data to be in plainly readable formats such as CSV files\nor JPG images, and to be set within a folder hierarchy alongside some metadata specifications\nin JSON format, which include information about all the data contained, as well as the problem\nthat we are trying to solve.\n\nFor more details about the schema and about how to format your data to be compliant with it,\nplease have a look at the [Schema Documentation](https://github.com/mitll/d3m-schema/tree/master/documentation)\n\n# Install\n\n## Install from PyPI\n\nThe easiest and recommended way to install the **D3M Dataset Manager** is using\n[pip](https://pip.pypa.io/en/stable/):\n\n```bash\npip install d3m-dataset-manager\n```\n\nThis will pull and install the latest stable release from [PyPI](https://pypi.org/).\n\n## Install from source\n\nIf you want to install the project from its sources, you can clone the repository and install it\nby running `make install` on the `stable` branch:\n\n```bash\ngit clone git@github.com:HDI-Project/d3m-dataset-manager.git\ncd d3m-dataset-manager\ngit checkout stable\nmake install\n```\n\n## Install for Development\n\nIf you want to contribute to the project, a few more steps are required to make the project ready\nfor development.\n\nPlease head to the [Contributing Guide](https://HDI-Project.github.io/d3m-dataset-manager/contributing.html#get-started)\nfor more details about this process.\n\n# Usage\n\n## Configuration\n\n### D3M Repository\n\nIn order to interact with the D3M repository you will need the user and the password\nuser to log into https://datadrivendiscovery.org/data\n\n### S3 Bucket\n\nIn order to interact with the S3 buckets, you will need to configure your S3 access\nfollowing the instructions from http://boto3.readthedocs.io/en/latest/guide/quickstart.html\n\nIn most cases, it will be enough to create the file `~/.aws/credentials:`\nwith the following contents:\n\n```\n[default]\naws_access_key_id = YOUR_ACCESS_KEY\naws_secret_access_key = YOUR_SECRET_KEY\n```\n\n## Command Line Options\n\nThe main element of the **D3M Dataset Manager** is the commadn `d3mdm`, which will be available\nin your command line after installing the package.\n\nThis command supports the following options:\n\n- **-i, --input** - D3M website, IPFS, S3 bucket or local folder.\n- **-o, --output** - S3 bucket or local folder.\n- **-l, --list** - List all available datasets in the indicated input.\n- **-a, --all** - Get and process all available datasets in the indicated input.\n- **-s, --split** - Split the dataset using the dataSplits.csv indexes.\n- **-r, --raw** - Do not download the splitted subsets. `-s` option implicitly enables this one.\n- **-f, --force** - Overwrite any existing datasets. If not enabled, existing datasets will be skipped.\n- **-d, --dry-run** - Do not perform any real action. Only list them.\n- **dataset names** - Name of the datasets o download. The `-a` option overrides them.\n\n### Input and Output\n\nThe Input and Output options implicitely point at different locations depending on the format:\n\n* **D3M**: `d3m:username:passsword`: password can be omitted, as well as username. Accepted only as Input.\n If omitted, the user will be asked to insert them later on.\n* **IPFS**: `ipfs`: The datasets will be downloaded using an IPFS mirror of the D3M repository.\n* **S3**: `s3://bucket-name/folder`: The datasets will be stored as a `.tar.gz` archive. If\n `folder` is not specified it defaults to `datasets`.\n* **Local filesystem**: `local/filesystem/path`: The path must exist, otherwise it raises an error.\n\n## Usage Examples\n\nDownload all datasets from D3M and store them as they are into S3 bucket named `d3m-data-dai`.\nThis will skip existing datasets.\n\n```\nd3m-dataset-manager -i d3m:a_username:a_password -o s3:d3m-data-dai -a\n```\n\nDownload all datasets from the IPFS mirror, split them and store them in a local folder\n`datasets`, overwriting any existing data.\n\nThis will prompt the user for the d3m password.\n\n```\nd3m-dataset-manager -i ipfs -o datasets -a -s -f\n```\n\nDownload the datasets `185_baseball` and `32_wikiqa` from S3 bucket `bucket-name`\ninto local folder `data/datasets`. Overwrite the existing data.\n\n```\nd3m-dataset-manager -i s3://bucket-name -o data/datasets -f 185_baseball 32_wikiqa\n```\n\n# What's next?\n\nFor more details about **D3M Dataset Manager** and all its possibilities\nand features, please check the [documentation site](https://HDI-Project.github.io/d3m-dataset-manager/).\n\n\n# History\n\n## v0.1.0 - 2019-10-09\n\nInitial Release.\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/HDI-Project/d3m-dataset-manager", "keywords": "d3mdm d3m-dataset-manager D3M Dataset Manager", "license": "MIT license", "maintainer": "", "maintainer_email": "", "name": "d3m-dataset-manager", "package_url": "https://pypi.org/project/d3m-dataset-manager/", "platform": "", "project_url": "https://pypi.org/project/d3m-dataset-manager/", "project_urls": { "Homepage": "https://github.com/HDI-Project/d3m-dataset-manager" }, "release_url": "https://pypi.org/project/d3m-dataset-manager/0.1.0/", "requires_dist": [ "bs4 (>=0.0.1)", "pandas (>=0.22.0)", "requests (>=2.18.4)", "boto3 (>=1.5.22)", "scikit-learn (>=0.21.2)", "bumpversion (>=0.5.3) ; extra == 'dev'", "pip (>=9.0.1) ; extra == 'dev'", "watchdog (>=0.8.3) ; extra == 'dev'", "m2r (>=0.2.0) ; extra == 'dev'", "Sphinx (>=1.7.1) ; extra == 'dev'", "sphinx-rtd-theme (>=0.2.4) ; extra == 'dev'", "autodocsumm (>=0.1.10) ; extra == 'dev'", "flake8 (>=3.7.7) ; extra == 'dev'", "isort (>=4.3.4) ; extra == 'dev'", "autoflake (>=1.2) ; extra == 'dev'", "autopep8 (>=1.4.3) ; extra == 'dev'", "twine (>=1.10.0) ; extra == 'dev'", "wheel (>=0.30.0) ; extra == 'dev'", "coverage (>=4.5.1) ; extra == 'dev'", "tox (>=2.9.1) ; extra == 'dev'", "pytest (>=3.4.2) ; extra == 'dev'", "pytest-cov (>=2.6.0) ; extra == 'dev'", "pytest (>=3.4.2) ; extra == 'test'", "pytest-cov (>=2.6.0) ; extra == 'test'" ], "requires_python": ">=3.4", "summary": "Command line tool and python package to generate and manage datasets in the D3M format.", "version": "0.1.0" }, "last_serial": 5950567, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "md5": "744e4b8bf3fe2e13e13c4296630211db", "sha256": "0c6255b0cf7a1070e65a9d24289f0a1e341a7b1fffc3abfada4263f5c283ca0f" }, "downloads": -1, "filename": "d3m_dataset_manager-0.1.0-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "744e4b8bf3fe2e13e13c4296630211db", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": ">=3.4", "size": 13118, "upload_time": "2019-10-09T15:35:54", "url": "https://files.pythonhosted.org/packages/12/3b/b953b1761dd2cfe5d5a0b953cebb5c0adc0e76c694bdc5ae6859d2006bba/d3m_dataset_manager-0.1.0-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "8e711c2707451371d81d88e32ad75ab9", "sha256": "a3f8f37ff0e813df1b67b98dd096819bf6f140717dd352d9e4d1e3eef44fc9b7" }, "downloads": -1, "filename": "d3m-dataset-manager-0.1.0.tar.gz", "has_sig": false, "md5_digest": "8e711c2707451371d81d88e32ad75ab9", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.4", "size": 55048, "upload_time": "2019-10-09T15:35:55", "url": "https://files.pythonhosted.org/packages/65/f8/a598bd25012c9c5caa413317519f3cd0c70005f0851d20bc7d84d19f8354/d3m-dataset-manager-0.1.0.tar.gz" } ], "0.1.0.dev0": [ { "comment_text": "", "digests": { "md5": "377b1b5046ba1b44545cde1ae39c4c73", "sha256": "2048e75a1fb7ee5e6f9e469b209a58163c17d1b45c1c65aa2d33ce4f4f14e06c" }, "downloads": -1, "filename": "d3m_dataset_manager-0.1.0.dev0-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "377b1b5046ba1b44545cde1ae39c4c73", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": ">=3.4", "size": 13173, "upload_time": "2019-10-09T15:18:27", "url": "https://files.pythonhosted.org/packages/1b/8a/a42c3ad12845e3bdeb66e81c198bfcf0d94ab6f701f36e13d8cd4ece8c4c/d3m_dataset_manager-0.1.0.dev0-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "69c59f0c73dff86ef6e1b55776f9895e", "sha256": "4c7c6d8886dad0ed19ca89a836d627e8c7afb41a4004badde2b73aa881ac908c" }, "downloads": -1, "filename": "d3m-dataset-manager-0.1.0.dev0.tar.gz", "has_sig": false, "md5_digest": "69c59f0c73dff86ef6e1b55776f9895e", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.4", "size": 54998, "upload_time": "2019-10-09T15:18:30", "url": "https://files.pythonhosted.org/packages/81/1c/beb1407449f9db677003a334b9b58c69a077c2f9b6b8f0349341df896b0b/d3m-dataset-manager-0.1.0.dev0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "744e4b8bf3fe2e13e13c4296630211db", "sha256": "0c6255b0cf7a1070e65a9d24289f0a1e341a7b1fffc3abfada4263f5c283ca0f" }, "downloads": -1, "filename": "d3m_dataset_manager-0.1.0-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "744e4b8bf3fe2e13e13c4296630211db", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": ">=3.4", "size": 13118, "upload_time": "2019-10-09T15:35:54", "url": "https://files.pythonhosted.org/packages/12/3b/b953b1761dd2cfe5d5a0b953cebb5c0adc0e76c694bdc5ae6859d2006bba/d3m_dataset_manager-0.1.0-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "8e711c2707451371d81d88e32ad75ab9", "sha256": "a3f8f37ff0e813df1b67b98dd096819bf6f140717dd352d9e4d1e3eef44fc9b7" }, "downloads": -1, "filename": "d3m-dataset-manager-0.1.0.tar.gz", "has_sig": false, "md5_digest": "8e711c2707451371d81d88e32ad75ab9", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.4", "size": 55048, "upload_time": "2019-10-09T15:35:55", "url": "https://files.pythonhosted.org/packages/65/f8/a598bd25012c9c5caa413317519f3cd0c70005f0851d20bc7d84d19f8354/d3m-dataset-manager-0.1.0.tar.gz" } ] }