{ "info": { "author": "MIT Data to AI Lab", "author_email": "dai-lab-trane@mit.edu", "bugtrack_url": null, "classifiers": [ "Development Status :: 2 - Pre-Alpha", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Natural Language :: English", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6" ], "description": "# Trane\nTrane is a software package for automatically generating prediction problems and generating labels for supervised learning. Trane is a system designed to advance the automation of the machine learning problem-solving pipeline.\n\n## Prediction Problems\nIn data science, we often have historical records for an entity and want to predict what will happen to that entity in the future. Trane is designed to generate such time-related prediction problems. Trane transforms metadata describing the dataset into lists of relevant prediction problems and cutoff times. Prediction problems are expressed in a formal language built from the operations described in Built-in Operations below. A cutoff time is the last point in time whose data is used to train the classifier. Data after the cutoff time is used to evaluate the classifier's accuracy. Cutoff times are necessary to prevent the classifier from training on test data.\n\n### Example\nA bank wants to predict how many transactions over $100 a customer will make in the next year. Assume we have all the transaction records for each client from 2015 to 2017. We want to build a machine learning model to solve this prediction problem. Here is the example database:\n\n|User_id|Time|Transaction_id|Amount|\n|:--:|:--:|:--:|:--:|\n| u1 | 2015 | 1-2015-1 | 10 |\n| u1 | 2015 | 1-2015-2 | 200 |\n| u2 | 2015 | 2-2015-1 | 50 |\n| u1 | 2016 | 1-2016-1 | 10 |\n| u1 | 2017 | 1-2017-1 | 1000|\n| u1 | 2017 | 1-2017-2 | 20 |\n| u2 | 2017 | 2-2017-1 | 10 |\n\nFirst, we separate the data by entity. Here the entity is `User_id`. 
User u1, for example, has:\n\n|User_id|Time|Transaction_id|Amount|\n|:--:|:--:|:--:|:--:|\n| u1 | 2015 | 1-2015-1 | 10 |\n| u1 | 2015 | 1-2015-2 | 200 |\n| u1 | 2016 | 1-2016-1 | 10 |\n| u1 | 2017 | 1-2017-1 | 1000|\n| u1 | 2017 | 1-2017-2 | 20 |\n\nLet's consider a **cutoff time** equal to 2016. Data from 2015 through 2016 will be used to train the machine learning model. Data after 2016, that is, data from 2017, will be used to evaluate the trained model. Trane outputs a tuple of (entity, cutoff, label) for each prediction problem. A prediction problem is applied to an entity's data to generate the label. For u1, only one 2017 transaction (1000) exceeds $100, so the tuple for this problem would be (u1, 2016, 1). The data from Trane can be fed directly into Featuretools to perform feature engineering.\n\n### Prediction Problem Generation\nAs shown in the example, a prediction problem consists of a sequence of operations applied to the data, together with a cutoff time.\n\nIn Trane, we generate prediction problems with four types of operations: Filter Operations, Row Operations, Transformation Operations and Aggregation Operations. Filter operations are applied to the filter\\_column. Row, Transformation and Aggregation Operations are applied to the label\\_generating\\_column.\n\n## Workflow\n\nThe workflow for using Trane on a database is as follows:\n\n- The data scientist writes a `meta.json` describing the columns and data types in the new database.\n- `PredictionProblemGenerator` reads the metadata and generates possible prediction problems. 
The prediction problems are saved to `problems.json`.\n- The data scientist can adjust the parameters of the prediction problems in `problems.json`.\n- The `labeler` applies the prediction problems in `problems.json` to the database `data.csv`.\n\n\n## Built-in Operations\n- FilterOp\n  - IdentityFilterOp\n  - GreaterFilterOp\n- RowOp\n  - IdentityRowOp\n  - GreaterRowOp\n- TransformationOp\n  - IdentityTransformationOp\n  - DiffTransformationOp\n- AggregationOp\n  - FirstAggregationOp\n  - CountAggregationOp\n  - SumAggregationOp\n  - LastAggregationOp\n  - LMFAggregationOp\n\n## Unit Testing\nWe use `pytest` to automatically collect and run unit tests and `pytest-cov` to measure test coverage. The application code is in `Trane/trane/`. The unit tests are in `Trane/tests/`. To run all unit tests, change into the `Trane` directory and execute\n\n```\n> pytest --cov=trane tests\n```\n\n\n## Setup/Install\n### Clone from Git\n```\n> git clone https://github.com/HDI-Project/Trane.git\n```\n### Run pip install\n```\n> pip3 install Trane/\n```\n\n## Quick Usage\nWe have [a tutorial notebook here](https://github.com/HDI-Project/Trane/blob/master/Tutorial.ipynb).\n\n\n## TODO\n- Need an easier way to add custom operations. Currently, external plugin operations are not allowed. The bottleneck is that we need to maintain a list of operations so that we can save, load, and iterate over operations, and it's not easy to add an external operation to that list.\n- Currently, all operations are in-place operations. The aggregation ops simply take a record, change the value in the column, and return it. This may not be a good design.\n- API for setting thresholds.\n- Some methods still raise `NotImplementedError`.\n- The natural-language (NL) system should be independent of Trane. 
Seems better to generate NL from JSON.\n\n\n# History\n\n## 0.1.0 (2018-04-12)\n\n* First release on PyPI.", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/HDI-Project/Trane", "keywords": "trane", "license": "MIT license", "maintainer": "", "maintainer_email": "", "name": "trane", "package_url": "https://pypi.org/project/trane/", "platform": "", "project_url": "https://pypi.org/project/trane/", "project_urls": { "Homepage": "https://github.com/HDI-Project/Trane" }, "release_url": "https://pypi.org/project/trane/0.1.0/", "requires_dist": null, "requires_python": "", "summary": "Trane is a software package for automatically generating prediction problems and generating labels for supervised learning.", "version": "0.1.0" }, "last_serial": 3817427, "releases": { "0.0": [ { "comment_text": "", "digests": { "md5": "b335e9c906204aaf2170106d9bddfed9", "sha256": "a63cfa829b36c213045df2711f8487a573c5a9c4af45cdf75b9bad565cb814c9" }, "downloads": -1, "filename": "trane-0.0-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "b335e9c906204aaf2170106d9bddfed9", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 28951, "upload_time": "2018-04-28T18:41:23", "url": "https://files.pythonhosted.org/packages/2d/a0/b3c00d6c09022de69791285745245c1c42dede40c3913a1604d5e66dd574/trane-0.0-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "efbe166c866307704fd76458f378050a", "sha256": "ddf93a07c629c98cd3f02a66d10e5806e1c2496fae626e005c4c542d0364948a" }, "downloads": -1, "filename": "trane-0.0.tar.gz", "has_sig": false, "md5_digest": "efbe166c866307704fd76458f378050a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 18479, "upload_time": "2018-04-28T18:41:24", "url": 
"https://files.pythonhosted.org/packages/67/d7/a6e737220abf78878ad313fcb95c60107fb964fdc67cd88b89bd1e29499c/trane-0.0.tar.gz" } ], "0.1.0": [ { "comment_text": "", "digests": { "md5": "ec50cefc64ec3d242deb86db2bb371c5", "sha256": "b2b13baa61441bc1f010e38c9932db5b0872ae911387ed90d6b1fb232cb02977" }, "downloads": -1, "filename": "trane-0.1.0-py3.4.egg", "has_sig": false, "md5_digest": "ec50cefc64ec3d242deb86db2bb371c5", "packagetype": "bdist_egg", "python_version": "3.4", "requires_python": null, "size": 72367, "upload_time": "2018-04-28T18:41:25", "url": "https://files.pythonhosted.org/packages/e2/f4/75a1ff906ef995c3c8d1d64a83b0acb3576225aae4693f2a024324d90b52/trane-0.1.0-py3.4.egg" }, { "comment_text": "", "digests": { "md5": "89290cb39e3aacd6a3b0a6d0121a1855", "sha256": "2b8b818615c1455f31d84093c1039d5760bdecde0ac30591688d297d51db3cd9" }, "downloads": -1, "filename": "trane-0.1.0-py3.5.egg", "has_sig": false, "md5_digest": "89290cb39e3aacd6a3b0a6d0121a1855", "packagetype": "bdist_egg", "python_version": "3.5", "requires_python": null, "size": 72234, "upload_time": "2018-04-28T18:41:27", "url": "https://files.pythonhosted.org/packages/85/f0/47d2620a7f1048f7cbe68c762935e88916b12f01fdd8284e6af3c849cff9/trane-0.1.0-py3.5.egg" }, { "comment_text": "", "digests": { "md5": "19c89d418f4c6e2f79af15fefb5cd920", "sha256": "01482f56cd489b8ba4dada339d73b37fccea4e92df1747dbbeaa7f666b33cac1" }, "downloads": -1, "filename": "trane-0.1.0-py3.6.egg", "has_sig": false, "md5_digest": "19c89d418f4c6e2f79af15fefb5cd920", "packagetype": "bdist_egg", "python_version": "3.6", "requires_python": null, "size": 70938, "upload_time": "2018-04-28T18:41:28", "url": "https://files.pythonhosted.org/packages/6f/73/821c69f0c8c1ffe8e09ac461241a36638034eff1c638b6b091a568663a41/trane-0.1.0-py3.6.egg" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "ec50cefc64ec3d242deb86db2bb371c5", "sha256": "b2b13baa61441bc1f010e38c9932db5b0872ae911387ed90d6b1fb232cb02977" }, "downloads": -1, "filename": 
"trane-0.1.0-py3.4.egg", "has_sig": false, "md5_digest": "ec50cefc64ec3d242deb86db2bb371c5", "packagetype": "bdist_egg", "python_version": "3.4", "requires_python": null, "size": 72367, "upload_time": "2018-04-28T18:41:25", "url": "https://files.pythonhosted.org/packages/e2/f4/75a1ff906ef995c3c8d1d64a83b0acb3576225aae4693f2a024324d90b52/trane-0.1.0-py3.4.egg" }, { "comment_text": "", "digests": { "md5": "89290cb39e3aacd6a3b0a6d0121a1855", "sha256": "2b8b818615c1455f31d84093c1039d5760bdecde0ac30591688d297d51db3cd9" }, "downloads": -1, "filename": "trane-0.1.0-py3.5.egg", "has_sig": false, "md5_digest": "89290cb39e3aacd6a3b0a6d0121a1855", "packagetype": "bdist_egg", "python_version": "3.5", "requires_python": null, "size": 72234, "upload_time": "2018-04-28T18:41:27", "url": "https://files.pythonhosted.org/packages/85/f0/47d2620a7f1048f7cbe68c762935e88916b12f01fdd8284e6af3c849cff9/trane-0.1.0-py3.5.egg" }, { "comment_text": "", "digests": { "md5": "19c89d418f4c6e2f79af15fefb5cd920", "sha256": "01482f56cd489b8ba4dada339d73b37fccea4e92df1747dbbeaa7f666b33cac1" }, "downloads": -1, "filename": "trane-0.1.0-py3.6.egg", "has_sig": false, "md5_digest": "19c89d418f4c6e2f79af15fefb5cd920", "packagetype": "bdist_egg", "python_version": "3.6", "requires_python": null, "size": 70938, "upload_time": "2018-04-28T18:41:28", "url": "https://files.pythonhosted.org/packages/6f/73/821c69f0c8c1ffe8e09ac461241a36638034eff1c638b6b091a568663a41/trane-0.1.0-py3.6.egg" } ] }