{ "info": { "author": "Tang Han", "author_email": "aloofness54@gmail.com", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python :: 3" ], "description": "# auto-modelling\n\nAuto-modelling is a convenient library to train and tune machine learning models automatically.\n\nIts main features include the following:\n\n1. Preprocess columns of all data types (numeric, categorical, text).\n2. Train machine learning models and tune parameters automatically.\n3. Return the top n best models with optimized parameters.\n4. Apply the **stacking** technique to combine the n best models returned by the package, or your own fitted models, to get an even better result.\n\nThe machine learning models include the following:\n- Classification:\n - ExtraTreesClassifier\n - RandomForestClassifier\n - KNeighborsClassifier\n - LogisticRegression\n - XGBClassifier\n- Regression:\n - ExtraTreesRegressor\n - GradientBoostingRegressor\n - AdaBoostRegressor\n - DecisionTreeRegressor\n - RandomForestRegressor\n - XGBRegressor\n- Stack:\n - for classification: LogisticRegression\n - for regression: LinearRegression\n\nReference: https://github.com/EpistasisLab/tpot/blob\n\n# Installation\n\n`pip install auto-modelling`\n\n# Usage Example\n```\nfrom auto_modelling.classification import GoClassify\nfrom auto_modelling.regression import GoRegress\nfrom auto_modelling.preprocess import DataManager\nfrom auto_modelling.stack import Stack\n\n# preprocessing data\ndm = DataManager(directory = 'preprocess_tools')\ntrain, test = dm.drop_sparse_columns(x_train, x_test)\ntrain, test = dm.process_data(x_train, x_test)\n# the encoders are stored in the given directory (here, preprocess_tools).\n\n# use the same processing tools to process new data\npredict_data = dm.process_predict_data(predict_x)\n# predict_x should have the same format as x_train/x_test\n\n# classification\nclf = GoClassify(n_best=1)\nbest = clf.train(x_train, 
y_train)\ny_pred = best.predict(x_test)\n\n# regression\nreg = GoRegress(n_best=1)\nbest = reg.train(x_train, y_train)\ny_pred = best.predict(x_test)\n\n# get top 3 best models\nclf = GoClassify(n_best=3)\nbests = clf.train(x_train, y_train)\ny_preds = [m.predict(x_test) for m in bests]\n\n# Stack top 3 best models\nstack = Stack(n_models = 3)\nlevel_0_models, level_1_model = stack.train(x_train, y_train, x_test, y_test)\n```\n\nThe examples `test.py` and `sample.py` are in the root directory of this package. Run\n`python test.py` or `python sample.py`.\n\n# Development Guide\n\n- Clone the repo\n\n- Create the virtual environment\n```\nmkvirtualenv auto\nworkon auto\npip install -r requirements.txt\n```\nIf you have issues installing `xgboost`, see these references:\nhttps://xgboost.readthedocs.io/en/latest/build.html#\nhttps://www.ibm.com/developerworks/community/blogs/jfp/entry/Installing_XGBoost_on_Mac_OSX?lang=en\n\n# Note\n\n- TODO: feature selection, evaluation metrics\n\n# Thoughts\n\n- Ideally, any dataframe thrown into this repo should be processed.\n\n1. pre-processing \n\n - drop columns that have too many nulls (Done)\n - fill NA for both numeric and non-numeric values (Done)\n - encode non-numeric values (Done)\n - scale values if needed\n - balance the dataset if needed\n\n2. model-training\n\n - mode = `classification`, `regression`, `auto` (Done)\n - split the dataset\n - tuning parameters and model selection (Done)\n - feature selection\n - return a model with parameters, columns and a script to process x_test (Done)\n - stacking with customized fitted models (Done)\n\n3. 
model-evaluation\n\n# Other references\n\n[Packaging your project](https://packaging.python.org/tutorials/packaging-projects/)\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/TangHan54/auto-modelling.git", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "auto-modelling", "package_url": "https://pypi.org/project/auto-modelling/", "platform": "", "project_url": "https://pypi.org/project/auto-modelling/", "project_urls": { "Homepage": "https://github.com/TangHan54/auto-modelling.git" }, "release_url": "https://pypi.org/project/auto-modelling/1.2.5/", "requires_dist": [ "sklearn", "ipython", "numpy", "pandas", "setuptools", "xgboost", "joblib" ], "requires_python": "", "summary": "A light package for automatic model tuning and stacking", "version": "1.2.5" }, "last_serial": 5713626, "releases": { "1.2.5": [ { "comment_text": "", "digests": { "md5": "627b3f96650826304729de44585ea865", "sha256": "16b9644ca661101ff41dd94417ec57c6d623d7945c6ca2827d06ea938f1e3e3a" }, "downloads": -1, "filename": "auto_modelling-1.2.5-py3-none-any.whl", "has_sig": false, "md5_digest": "627b3f96650826304729de44585ea865", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 12297, "upload_time": "2019-08-22T07:06:39", "url": "https://files.pythonhosted.org/packages/db/b2/3369d9649e7f38f67069067d5139c5e15c787217355bffa08ebe6a619c24/auto_modelling-1.2.5-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "2ed2a9f07d8a3ba02f42481dc4c164a3", "sha256": "bb228f0548eea4254f08ada4458765dcea2b92c630796f51ce4324fe089bada3" }, "downloads": -1, "filename": "auto_modelling-1.2.5.tar.gz", "has_sig": false, "md5_digest": "2ed2a9f07d8a3ba02f42481dc4c164a3", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 6151, "upload_time": "2019-08-22T07:06:41", "url": 
"https://files.pythonhosted.org/packages/62/ab/fde8ec954cc70791744e29c310ba2a914c9c173b7b2bd904cbcbe963ec39/auto_modelling-1.2.5.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "627b3f96650826304729de44585ea865", "sha256": "16b9644ca661101ff41dd94417ec57c6d623d7945c6ca2827d06ea938f1e3e3a" }, "downloads": -1, "filename": "auto_modelling-1.2.5-py3-none-any.whl", "has_sig": false, "md5_digest": "627b3f96650826304729de44585ea865", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 12297, "upload_time": "2019-08-22T07:06:39", "url": "https://files.pythonhosted.org/packages/db/b2/3369d9649e7f38f67069067d5139c5e15c787217355bffa08ebe6a619c24/auto_modelling-1.2.5-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "2ed2a9f07d8a3ba02f42481dc4c164a3", "sha256": "bb228f0548eea4254f08ada4458765dcea2b92c630796f51ce4324fe089bada3" }, "downloads": -1, "filename": "auto_modelling-1.2.5.tar.gz", "has_sig": false, "md5_digest": "2ed2a9f07d8a3ba02f42481dc4c164a3", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 6151, "upload_time": "2019-08-22T07:06:41", "url": "https://files.pythonhosted.org/packages/62/ab/fde8ec954cc70791744e29c310ba2a914c9c173b7b2bd904cbcbe963ec39/auto_modelling-1.2.5.tar.gz" } ] }