{ "info": { "author": "MIT Data To AI Lab", "author_email": "dailabmit@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Natural Language :: English", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7", "Topic :: Scientific/Engineering :: Artificial Intelligence" ], "description": "

\n\u201cAutoBazaar\u201d\nAn open source project from Data to AI Lab at MIT.\n

\n\n\n[![Travis](https://travis-ci.org/HDI-Project/AutoBazaar.svg?branch=master)](https://travis-ci.org/HDI-Project/AutoBazaar)\n[![PyPi Shield](https://img.shields.io/pypi/v/autobazaar.svg)](https://pypi.python.org/pypi/autobazaar)\n\n\n# AutoBazaar\n\n- License: MIT\n- Documentation: https://HDI-Project.github.io/AutoBazaar/\n- Homepage: https://github.com/HDI-Project/AutoBazaar\n\n# Overview\n\nAutoBazaar is an AutoML system created to execute the experiments associated with\n[The Machine Learning Bazaar Paper: Harnessing the ML Ecosystem for Effective System\nDevelopment](https://arxiv.org/pdf/1905.08942.pdf)\nby the [Human-Data Interaction (HDI) Project](https://hdi-dai.lids.mit.edu/) at LIDS, MIT.\n\nIt comes in the form of a Python library which can be used directly inside any other Python\nproject, as well as a CLI which allows searching for pipelines to solve a problem directly\nfrom the command line.\n\n# Install\n\n## Requirements\n\n**AutoBazaar** has been developed and tested on [Python 3.5, 3.6 and 3.7](https://www.python.org/downloads/).\n\nAlthough it is not strictly required, the usage of a\n[virtualenv](https://virtualenv.pypa.io/en/latest/) is highly recommended to avoid\ninterfering with other software installed in the system where **AutoBazaar** is run.\n\nThese are the minimum commands needed to create a virtualenv using python3.6 for **AutoBazaar**:\n\n```bash\npip install virtualenv\nvirtualenv -p $(which python3.6) autobazaar-venv\n```\n\nAfterwards, activate the virtualenv by executing:\n\n```bash\nsource autobazaar-venv/bin/activate\n```\n\nRemember to execute this command every time you start a new console session to work on **AutoBazaar**!\n\n## Install with pip\n\nAfter creating the virtualenv and activating it, we recommend using\n[pip](https://pip.pypa.io/en/stable/) to install **AutoBazaar**:\n\n```bash\npip install autobazaar\n```\n\nThis will pull and install the latest stable release 
from [PyPi](https://pypi.org/).\n\n## Install from source\n\nAlternatively, with your virtualenv activated, you can clone the repository and install it from\nsource by running `make install` on the `stable` branch:\n\n```bash\ngit clone git@github.com:HDI-Project/AutoBazaar.git\ncd AutoBazaar\ngit checkout stable\nmake install\n```\n\nFor development, you can use `make install-develop` instead, which also installs all\nthe dependencies required for testing and code linting.\n\n# Data Format\n\nAutoBazaar works with datasets in the [D3M Schema Format](https://github.com/mitll/d3m-schema)\nas input.\n\nThis dataset schema, developed by MIT Lincoln Laboratory for DARPA's Data Driven Discovery\nof Models Program, requires the data to be in plainly readable formats such as CSV files or\nJPG images, arranged within a folder hierarchy alongside metadata specifications\nin JSON format, which describe both the data contained and the problem\nthat we are trying to solve.\n\nFor more details about the schema and about how to format your data to be compliant with it,\nplease have a look at the [Schema Documentation](https://github.com/mitll/d3m-schema/tree/master/documentation).\n\nAs an example, you can browse some datasets which have been included in this repository for\ndemonstration purposes:\n- [185_baseball](https://github.com/HDI-Project/AutoBazaar/tree/master/data/185_baseball): Single Table Classification\n- [196_autoMpg](https://github.com/HDI-Project/AutoBazaar/tree/master/data/196_autoMpg): Single Table Regression\n\nAdditionally, you can find a collection of ~500 datasets already formatted in the\n[d3m-data-dai S3 Bucket in AWS](https://d3m-data-dai.s3.amazonaws.com/index.html).\n\n# Quickstart\n\nIn this short tutorial we will guide you through a series of steps that will help you get\nstarted with **AutoBazaar** using its CLI command `abz`.\n\nFor more details about its usage and the available options, please execute 
`abz --help`\non your command line.\n\n## 1. Prepare your Data\n\nMake sure to have your data prepared in the [Data Format](#data-format) explained above, inside\nan uncompressed folder in a filesystem directly accessible by **AutoBazaar**.\n\nTo check whether your dataset is available and ready to use, you can execute\nthe `abz` command in your command line with its `list` subcommand.\nIf your dataset is anywhere other than a folder called `data` within your\ncurrent working directory, do not forget to add the `-i` argument to your command, indicating\nthe path to the folder that contains your dataset.\n\n```bash\n$ abz list -i /path/to/your/datasets/folder\n```\n\nThe output should be a table which includes the details of all the datasets found inside\nthe indicated directory:\n\n```\n data_modality task_type task_subtype metric size_human train_samples\ndataset\n185_baseball single_table classification multi_class f1Macro 148K 1073\n196_autoMpg single_table regression univariate meanSquaredError 32K 298\n30_personae text classification binary f1 1,4M 116\n32_wikiqa multi_table classification binary f1 4,9M 23406\n60_jester single_table collaborative_filtering meanAbsoluteError 44M 880719\n```\n\n**Note:** If you see an error saying `No matching datasets found`, please review your\ndataset format and make sure you have indicated the right path.\n\nFor the rest of this quickstart, we will be using the `185_baseball` dataset that you can\nfind inside the [data folder](https://github.com/HDI-Project/AutoBazaar/tree/master/data)\ncontained in this repository.\n\n## 2. 
Start the search process\n\nOnce your data is ready, you can start the AutoBazaar search process using the `abz search`\ncommand.\nTo do this, you will need to provide the path to the folder that contains your datasets, as\nwell as the name of the dataset that you want to process.\n\n```bash\n$ abz search -i /path/to/your/datasets/folder name_of_your_dataset\n```\n\nThis will evaluate the default pipeline without performing any additional tuning iterations on it.\n\nIn order to start an actual tuning process, you will need to provide at least one of the\nfollowing additional options:\n\n* `-b, --budget`: Maximum number of tuning iterations to perform.\n* `-t, --timeout`: Maximum time the search is allowed to run, in seconds.\n* `-c, --checkpoints`: Comma separated string containing the different checkpoints where\n the best pipeline so far must be stored and evaluated against the test dataset. There must be\n no spaces between the checkpoint times. For example, to store the best pipeline every 10 minutes\n until 30 minutes have passed, you would use the option `-c 600,1200,1800`.\n\nFor example, to search over the `185_baseball` dataset for 30 seconds, evaluating the\nbest pipeline so far every 10 seconds, with a maximum of 10 tuning iterations, we would\nuse the following command:\n\n```bash\nabz search 185_baseball -c10,20,30 -b10\n```\n\nFor further details about the available options, please execute `abz search --help` in your\nterminal.\n\n## 3. 
Explore the results\n\nOnce **AutoBazaar** has finished searching for the best pipeline, a table will be printed\nto stdout with a summary of the best pipeline found for each dataset.\nIf multiple checkpoints were provided, details about the best pipeline at each checkpoint\nwill also be included.\n\nThe output will be a table similar to this one:\n\n```\n pipeline score rank cv_score metric data_modality task_type task_subtype elapsed iterations load_time trivial_time fit_time cv_time error step\ndataset\n185_baseball fce28425-e45c-4620-9d3c-d329b8684bea 0.316961 0.682957 0.317043 f1Macro single_table classification multi_class 10.024457 0.0 0.011041 0.026212 NaN NaN None None\n185_baseball f7428924-79ee-439d-bc32-998a9efea619 0.675132 0.390927 0.609073 f1Macro single_table classification multi_class 21.412262 1.0 0.011041 0.026212 9.99484 NaN None None\n185_baseball 397780a5-6bf6-48c9-9a85-06b0d08c5a9d 0.675132 0.357361 0.642639 f1Macro single_table classification multi_class 31.712946 2.0 0.011041 0.026212 9.99484 12.618179 None None\n```\n\nAlternatively, a `-r` option can be passed with the name of a CSV file, and the results will\nbe stored there:\n\n```bash\nabz search 185_baseball -c10,20,30 -b10 -r results.csv\n```\n\n## What's next?\n\nFor more details about **AutoBazaar** and all its possibilities and features, please check the\n[project documentation site](https://HDI-Project.github.io/AutoBazaar/)!\n\n# Credits\n\nAutoBazaar is an Open Source project from the Data to AI Lab at MIT built by the following team:\n\n* Carles Sala \n* Micah Smith \n* Max Kanter \n* Kalyan Veeramachaneni \n\n## Citing AutoBazaar\n\nIf you use AutoBazaar for your research, please consider citing the following paper (https://arxiv.org/pdf/1905.08942.pdf):\n\n```\n@article{smith2019mlbazaar,\n author = {Smith, Micah J. 
and Sala, Carles and Kanter, James Max and Veeramachaneni, Kalyan},\n title = {The Machine Learning Bazaar: Harnessing the ML Ecosystem for Effective System Development},\n journal = {arXiv e-prints},\n year = {2019},\n eid = {arXiv:1905.08942},\n pages = {arXiv:1905.08942},\n archivePrefix = {arXiv},\n eprint = {1905.08942},\n}\n```\n\n\n# History\n\n## 0.1.0 - 2019-06-24\n\nFirst Release.\n\nThis is a slightly cleaned up version of the software used to generate the results\nexplained in [The Machine Learning Bazaar Paper](https://arxiv.org/pdf/1905.08942.pdf).\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/HDI-project/AutoBazaar", "keywords": "automl machine learning hyperparameters tuning classification regression autobazaar", "license": "MIT license", "maintainer": "", "maintainer_email": "", "name": "autobazaar", "package_url": "https://pypi.org/project/autobazaar/", "platform": "", "project_url": "https://pypi.org/project/autobazaar/", "project_urls": { "Homepage": "https://github.com/HDI-project/AutoBazaar" }, "release_url": "https://pypi.org/project/autobazaar/0.1.0/", "requires_dist": [ "absl-py (==0.4.0)", "astor (==0.7.1)", "baytune (==0.2.1)", "boto (==2.48.0)", "boto3 (==1.9.27)", "botocore (==1.12.28)", "certifi (==2018.8.13)", "chardet (==3.0.4)", "click (==6.7)", "cloudpickle (==0.4.0)", "cycler (==0.10.0)", "dask (==0.18.2)", "decorator (==4.3.0)", "distributed (==1.22.1)", "docutils (==0.14)", "featuretools (==0.3.1)", "future (==0.16.0)", "gast (==0.2.0)", "grpcio (==1.12.1)", "h5py (==2.8.0)", "HeapDict (==1.0.0)", "idna (==2.6)", "iso639 (==0.1.4)", "jmespath (==0.9.3)", "Keras (==2.1.6)", "Keras-Applications (==1.0.6)", "Keras-Preprocessing (==1.0.5)", "kiwisolver (==1.0.1)", "langdetect (==1.0.7)", "lightfm (==1.15)", "matplotlib (==2.2.3)", "mit-d3m (==0.1.1)", "mlblocks (==0.2.3)", "mlprimitives 
(==0.1.3)", "msgpack (==0.5.6)", "networkx (==2.1)", "nltk (==3.3)", "numpy (==1.15.2)", "opencv-python (==3.4.2.17)", "pandas (==0.23.4)", "Pillow (==5.1.0)", "protobuf (==3.6.1)", "psutil (==5.4.7)", "pymongo (==3.7.2)", "pyparsing (==2.2.0)", "python-dateutil (==2.7.3)", "python-louvain (==0.10)", "pytz (==2018.5)", "PyWavelets (==0.5.2)", "PyYAML (==3.12)", "requests (==2.20.0)", "s3fs (==0.1.5)", "s3transfer (==0.1.13)", "scikit-image (==0.14.0)", "scikit-learn (==0.20.0)", "scipy (==1.1.0)", "six (==1.11.0)", "sortedcontainers (==2.0.4)", "setuptools (==39.1.0)", "tblib (==1.3.2)", "tensorboard (==1.11.0)", "tensorflow (==1.11.0)", "termcolor (==1.1.0)", "toolz (==0.9.0)", "tornado (==5.1)", "tqdm (==4.24.0)", "urllib3 (==1.23)", "Werkzeug (==0.14.1)", "xgboost (==0.72.1)", "zict (==0.1.3)", "bumpversion (>=0.5.3) ; extra == 'dev'", "pip (>=10.0.0) ; extra == 'dev'", "watchdog (>=0.8.3) ; extra == 'dev'", "m2r (>=0.2.0) ; extra == 'dev'", "Sphinx (>=1.7.1) ; extra == 'dev'", "sphinx-rtd-theme (>=0.2.4) ; extra == 'dev'", "autodocsumm (>=0.1.10) ; extra == 'dev'", "flake8 (>=3.5.0) ; extra == 'dev'", "isort (>=4.3.4) ; extra == 'dev'", "autoflake (>=1.1) ; extra == 'dev'", "autopep8 (>=1.3.5) ; extra == 'dev'", "twine (>=1.10.0) ; extra == 'dev'", "wheel (>=0.30.0) ; extra == 'dev'", "tox (>=2.9.1) ; extra == 'dev'", "coverage (>=4.5.1) ; extra == 'dev'", "pytest (>=3.4.2) ; extra == 'dev'", "pytest-cov (>=2.6.0) ; extra == 'dev'", "pytest (>=3.4.2) ; extra == 'tests'", "pytest-cov (>=2.6.0) ; extra == 'tests'" ], "requires_python": ">=3.4", "summary": "The Machine Learning Bazaar", "version": "0.1.0" }, "last_serial": 5442343, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "md5": "9f12b627af19cbf55b7fc0177407bea2", "sha256": "819cdd9b97bf4df2e3a77ad5c633b0bab77f87243121da299c524d8b5fd8f03e" }, "downloads": -1, "filename": "autobazaar-0.1.0-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "9f12b627af19cbf55b7fc0177407bea2", "packagetype": 
"bdist_wheel", "python_version": "py2.py3", "requires_python": ">=3.4", "size": 20086, "upload_time": "2019-06-24T19:54:13", "url": "https://files.pythonhosted.org/packages/21/d0/1b59be7690f157baa39ec99ad0b41538f8c564bbf7f8e2f24e488784dc57/autobazaar-0.1.0-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "915a0acd985183df93ea8252765e4d44", "sha256": "cc50aae6e3a42c62e593bf242b6565cbbc055757a2e22967cdf34ad688f51c45" }, "downloads": -1, "filename": "autobazaar-0.1.0.tar.gz", "has_sig": false, "md5_digest": "915a0acd985183df93ea8252765e4d44", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.4", "size": 63514, "upload_time": "2019-06-24T19:54:16", "url": "https://files.pythonhosted.org/packages/d9/20/17cabf515546923c03cf451af23590400629a249ff777525ac841c804f75/autobazaar-0.1.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "9f12b627af19cbf55b7fc0177407bea2", "sha256": "819cdd9b97bf4df2e3a77ad5c633b0bab77f87243121da299c524d8b5fd8f03e" }, "downloads": -1, "filename": "autobazaar-0.1.0-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "9f12b627af19cbf55b7fc0177407bea2", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": ">=3.4", "size": 20086, "upload_time": "2019-06-24T19:54:13", "url": "https://files.pythonhosted.org/packages/21/d0/1b59be7690f157baa39ec99ad0b41538f8c564bbf7f8e2f24e488784dc57/autobazaar-0.1.0-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "915a0acd985183df93ea8252765e4d44", "sha256": "cc50aae6e3a42c62e593bf242b6565cbbc055757a2e22967cdf34ad688f51c45" }, "downloads": -1, "filename": "autobazaar-0.1.0.tar.gz", "has_sig": false, "md5_digest": "915a0acd985183df93ea8252765e4d44", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.4", "size": 63514, "upload_time": "2019-06-24T19:54:16", "url": 
"https://files.pythonhosted.org/packages/d9/20/17cabf515546923c03cf451af23590400629a249ff777525ac841c804f75/autobazaar-0.1.0.tar.gz" } ] }