{ "info": { "author": "Kevin Vecmanis", "author_email": "kevinvecmanis@gmail.com", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7" ], "description": "# ML Automator\n### Author: Kevin Vecmanis\n\n[![image](https://img.shields.io/pypi/v/mlautomator.svg)](https://pypi.org/project/mlautomator/)\n[![image](https://img.shields.io/pypi/l/mlautomator.svg)](https://pypi.org/project/mlautomator/)\n[![image](https://img.shields.io/pypi/pyversions/mlautomator.svg)](https://pypi.org/project/mlautomator/)\n[![image](https://img.shields.io/codecov/c/github/vanaurum/ML-automator/master.svg)](https://img.shields.io/codecov/c/github/vanaurum/ML-automator)\n\nMachine Learning Automator (__ML Automator__) is an automation project that integrates __Sequential Model Based Optimization__ (SMBO) with the main learning algorithms from Python's Sci-kit Learn library to generate a really fast, automated tool for tuning machine learning algorithms. __MLAutomator__ leverages a library called Hyperopt to accomplish this. [Read more about Hyperopt here](http://hyperopt.github.io/hyperopt/)\n\n## What is SMBO? \n\nSMBO is a form of hyperparameter tuning, like grid search and randomized search. In contrast to grid and randomized search, however, SMBO used __Bayesian Optimization__ to build a probability model, through trial and error, that is able to better predict what hyperparameters might produce a better model. The \"sequential\" just means that multiple trials are run, one after another, each time testing better hyper parameters by applying bayesion reasoning and updating the existing probability model.\n\nThe trade-off here is that SMBO models spend more time between each iteration \"selecting\" the next choice of hyperparameters - but this is accepted because the extra time taken to choose the next hyperparameters is typically __signigicantly__ less than each training iteration. In other words, SMBO results in:\n\n* Reduced time tuning hyperparameters compared to grid and random search methods.\n* Better scores on the testing set.\n\n## Installation:\n\nInstallation is easy - of course `pip` can be swapped for `pip3` or `pipenv` (in a virtual environment)\n\n`pip install mlautomator`\n\n\n## Key features:\n\n* Optimizes across data pre-processing and feature selection __in addition__ to hyperparameters.\n* Fast, intelligent scan of parameter space using __Hyperopt__.\n* Optimized parameter search permits scanning a larger cross section of algorithms in the same period of time.\n* An exceptional spot-checking algorithm.\n\n\n## Usage \n\n__MLAutomator__ accepts a training dataset X, and a target Y. The user can define their own functions for how these datasets are produced. Note that MLAutomator is designed to be a highly optimized spot-checking algorithm - you should take care to make sure your data is free from errors and any missing values have been dealt with. \n\n__MLAutomator__ will find ways of transforming and pre-processing your data to produce a superior model. Feel free to make your own transformations before passing the data to __MLAutomator__. \n\n\n### Optional data utilities\n\nI'm building a suite of data utility functions which can prepare most classification and regression datasets. These, however, are optional - __MLAutomator__ only requires __X and Y__ inputs in the form of a numpy ndarray.\n\n```Python\nfrom data.utilities import clf_prep\n\nx, y = clf_prep('pima-indians-diabetes.csv')\n```\n\nOnce you have training and target data, this is the main call to use MLAutomator...\n\n\n### Classification Example: 2-class\n\n```Python\nfrom mlautomator.mlautomator import MLAutomator\n\nautomator = MLAutomator(x, y, iterations = 25)\nautomator.find_best_algorithm()\nautomator.print_best_space()\n```\n\n__MLAutomator__ can typically find a ~ 98th percentile solution in a fraction of the time of __Gridsearch or Randomized search__. Here it did a comprehensive scan across all hyperparameters for 6 common machine learning algorithms and produced exceptional model performance for the classic Pima Indians Diabetes dataset.\n\n```\nBest Algorithm Configuration:\n Best algorithm: Logistic Regression\n Best accuracy : 77.73239917976761%\n C : 0.02341\n k_best : 6\n penalty : l2\n scaler : RobustScaler(\n copy=True, \n quantile_range=(25.0, 75.0), \n with_centering=True,\n with_scaling=True)\n solver : lbfgs\n Found best solution on iteration 132 of 150\n Validation used: 10-fold cross-validation\n```\n\n\n### Classification Example: Multi-class\n\nHere are the results from the classic iris dataset, a multi-class classification problem with three classes\n\n```Python\nfrom data.utilities import from_sklearn\nfrom mlautomator.mlautomator import MLAutomator\n\nx, y = from_sklearn('iris')\nautomator = MLAutomator(x, y, iterations = 30, algo_type = 'classifier', score_metric = 'accuracy')\nautomator.find_best_algorithm()\nautomator.print_best_space()\n```\n\n```\nBest Algorithm Configuration:\n Best algorithm: Bag of Support Vector Machine Classifiers\n Best accuracy : 96.67%\n C : 0.7064\n degree : 2\n gamma : auto\n k_best : 2\n kernel : rbf\n n_estimators : 9\n probability : True\n scaler : None\n Found best solution on iteration 3 of 30\n Validation used: 10-fold cross-validation\n```\n\n\n\n### Regression Example\n\nML Automator supports regression problems as well. In this example we call the Boston Housing dataset from __sklearn.datasets__ using one of our utility functions.\n\n```Python\nfrom data.utilities import from_sklearn\n\nx, y = from_sklearn('boston')\n```\n\n```Python\nfrom mlautomator.mlautomator import MLAutomator\n\nautomator = MLAutomator(x, y, iterations = 30, algo_type = 'regressor', score_metric = 'neg_mean_squared_error')\nautomator.find_best_algorithm()\nautomator.print_best_space()\n```\n\n```\nBest Algorithm Configuration:\n Best algorithm: K-Neighbor Regressor\n Best neg_mean_squared_error : 10.41395782834094\n algorithm : kd_tree\n k_best : 11\n n_neighbors : 2\n scaler : StandardScaler(copy=True, with_mean=True, with_std=True)\n weights : distance\n Found best solution on iteration 24 of 30\n Validation used: 10-fold cross-validation\n```\n\n\n## Model Persistence \n\nML Automator allows you save fit, save, and load the optimal pipeline discovered by the `find_best_algorithm()` method. A complete workflow would look something like this: \n\n\n```Python\nfrom data.utilities import clf_prep\nfrom mlautomator.mlautomator import MLAutomator\n\nx, y = clf_prep('pima-indians-diabetes.csv')\nautomator = MLAutomator(x, y, iterations = 30, algo_type = 'classifier', score_metric = 'accuracy')\nautomator.find_best_algorithm()\nautomator.fit_best_pipeline()\nautomator.save_best_pipeline('Path/to/your/directory')\n\n# some time later....\n\nautomator.load_best_pipeline('Path/to/your/directory')\n\n```\n\nNote that MLAutomator is storing the entire __transform/feature selection/model__ pipeline for you so that none of the prerequisite processing needs to be done when you need to make predictions on out-of-sample data. \n\n\n\n## Existing Algorithm Support\n\n__MLAutomator__ currently supports the following algorithms:\n\n### Classification:\n* XGBoost Classifier\n* Random Forest Classifier\n* Support Vector Machines\n* Naive Bayes Classifier\n* Stochastic Gradient Descent Classification (SGD)\n* K-Nearest Neighbors Classification\n* Logistic Regression \n\n### Regression: \n* XGBoost Regressor\n* Random Forest Regressor\n* Support Vector Machine Regression\n* SGD Regression\n* K-Nearest Neighbors Regression\n\nUnless otherwise declared using the __specific_algos__ argument, MLAutomator will scan all algorithms to find the best performer.", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/VanAurum/ML-automator", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "mlautomator", "package_url": "https://pypi.org/project/mlautomator/", "platform": "", "project_url": "https://pypi.org/project/mlautomator/", "project_urls": { "Homepage": "https://github.com/VanAurum/ML-automator" }, "release_url": "https://pypi.org/project/mlautomator/1.1.3/", "requires_dist": null, "requires_python": "", "summary": "A fast, simple way to train machine learning algorithms", "version": "1.1.3" }, "last_serial": 5636776, "releases": { "1.0.3": [ { "comment_text": "", "digests": { "md5": "f4de83a298bce002a0dfe28e5d5f0b48", "sha256": "5955cb9bf88e4a6f3ab1b560a84f50c35ea2b8569f05f671a880d77af054c9f1" }, "downloads": -1, "filename": "mlautomator-1.0.3-py3-none-any.whl", "has_sig": false, "md5_digest": "f4de83a298bce002a0dfe28e5d5f0b48", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 19564, "upload_time": "2019-05-27T19:44:25", "url": "https://files.pythonhosted.org/packages/dd/9b/d7b46f4ecb78c3dc67b794487a97e2ad467d1982284ced7527df10f6c5f3/mlautomator-1.0.3-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "b095d1591471c3b3ab55ae552b8b3ee3", "sha256": "e8046198b89086b184ad5066e888fc62764826e1c347595dc39224fc525f2029" }, "downloads": -1, "filename": "mlautomator-1.0.3.tar.gz", "has_sig": false, "md5_digest": "b095d1591471c3b3ab55ae552b8b3ee3", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 13235, "upload_time": "2019-05-27T19:44:27", "url": "https://files.pythonhosted.org/packages/86/3c/4d9b4366eff843b58de7c15f36deff02c9d79fd44bf6f05f8e3711a728de/mlautomator-1.0.3.tar.gz" } ], "1.0.4": [ { "comment_text": "", "digests": { "md5": "a6202793e4ded2ed04b502cb59874838", "sha256": "0a596fd4cfcde56bb2d2afe72c7640298d443a50f0e65242258a8074ae5f7fd0" }, "downloads": -1, "filename": "mlautomator-1.0.4.tar.gz", "has_sig": false, "md5_digest": "a6202793e4ded2ed04b502cb59874838", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 13306, "upload_time": "2019-05-27T19:54:05", "url": "https://files.pythonhosted.org/packages/86/8c/e90387c8e26735e02ff2d6a2f00b505240c8f664250d9d2e638d4172d47e/mlautomator-1.0.4.tar.gz" } ], "1.0.5": [ { "comment_text": "", "digests": { "md5": "e0d7cfd274f6986f854b35e9d182f675", "sha256": "6b551cab09ba0270fdd55f5f01275b42576e42a56797128877bd915a6e4cc21d" }, "downloads": -1, "filename": "mlautomator-1.0.5.tar.gz", "has_sig": false, "md5_digest": "e0d7cfd274f6986f854b35e9d182f675", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 13405, "upload_time": "2019-05-27T21:21:35", "url": "https://files.pythonhosted.org/packages/a4/05/4b688bbd26a533c2bd88c3c2e0df42d9bc36373495aeb3e35cce95d9a0a9/mlautomator-1.0.5.tar.gz" } ], "1.0.6": [ { "comment_text": "", "digests": { "md5": "2a5680d05863c518e83ade8fc0981dec", "sha256": "b8a60cae88f87aa8929d6890325039f1369b6467bc487654690505e44260358d" }, "downloads": -1, "filename": "mlautomator-1.0.6.tar.gz", "has_sig": false, "md5_digest": "2a5680d05863c518e83ade8fc0981dec", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 13532, "upload_time": "2019-05-28T13:28:59", "url": "https://files.pythonhosted.org/packages/0c/3a/718c56ca2a28672d2cbade963e8e50e7931e84e826e5f520476f2a00075a/mlautomator-1.0.6.tar.gz" } ], "1.1.1": [ { "comment_text": "", "digests": { "md5": "b30651922426ecebd51696b4e60fa164", "sha256": "96ad2b14855088eb10e3f55a872e4a37318f63792bd559fa82b6ac427c1cbd18" }, "downloads": -1, "filename": "mlautomator-1.1.1.tar.gz", "has_sig": false, "md5_digest": "b30651922426ecebd51696b4e60fa164", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 13943, "upload_time": "2019-05-28T16:36:14", "url": "https://files.pythonhosted.org/packages/3e/ec/831a34b2559c5beced4b4e010a07b81153df7288dc38d2551f954f08b838/mlautomator-1.1.1.tar.gz" } ], "1.1.2": [ { "comment_text": "", "digests": { "md5": "409934df5d0c95f425addc0753ceecec", "sha256": "41d52790c4848196076a612140ca63a6464d0602006ebead170a40271710a4a3" }, "downloads": -1, "filename": "mlautomator-1.1.2.tar.gz", "has_sig": false, "md5_digest": "409934df5d0c95f425addc0753ceecec", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 16894, "upload_time": "2019-08-05T19:01:25", "url": "https://files.pythonhosted.org/packages/e3/49/b052a7f60510d2eb5299b777111c3b7a96b16b1a6b459f77a52328358137/mlautomator-1.1.2.tar.gz" } ], "1.1.3": [ { "comment_text": "", "digests": { "md5": "f4a1838165019e90282861823d9b81c3", "sha256": "a96adb9848c2562ad0ccf2f844620fe8cd53811332384277ec0caf5abbd8b030" }, "downloads": -1, "filename": "mlautomator-1.1.3.tar.gz", "has_sig": false, "md5_digest": "f4a1838165019e90282861823d9b81c3", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 16914, "upload_time": "2019-08-05T22:47:24", "url": "https://files.pythonhosted.org/packages/d4/a5/7e827e5695717abc4743697c2fa76e1d660420a561466f124cf589cf5f17/mlautomator-1.1.3.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "f4a1838165019e90282861823d9b81c3", "sha256": "a96adb9848c2562ad0ccf2f844620fe8cd53811332384277ec0caf5abbd8b030" }, "downloads": -1, "filename": "mlautomator-1.1.3.tar.gz", "has_sig": false, "md5_digest": "f4a1838165019e90282861823d9b81c3", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 16914, "upload_time": "2019-08-05T22:47:24", "url": "https://files.pythonhosted.org/packages/d4/a5/7e827e5695717abc4743697c2fa76e1d660420a561466f124cf589cf5f17/mlautomator-1.1.3.tar.gz" } ] }