{ "info": { "author": "Christopher Shymansky", "author_email": "CMShymansky@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "License :: OSI Approved :: Apache Software License", "Topic :: Utilities" ], "description": "# What\nPyplearnr is a tool designed to perform model selection, hyperparameter tuning, and model validation via nested k-fold cross-validation in a reproducible way.\n\n# Why\nI found GridSearchCV to be lacking. I wanted a tool that used a similar procedure to perform simultaneous hyperparameter tuning AND model selection with a clear input that summarizes exactly what scikit-learn pipeline steps and parameter combinations will be used and whose results allow perfect reproducibility. So, I made my own.\n\n# How\n### Use\nSee the [demo](https://nbviewer.jupyter.org/github/JaggedParadigm/pyplearnr/blob/master/pyplearnr_demo.ipynb) for more detailed use of pyplearnr with actual data.\n\nHere are the basic steps:\n#### 1) Place feature data into a non-null feature matrix and target vector\n#### 2) Initialize the nested k-fold cross-validation object\n```python\nkfcv = ppl.NestedKFoldCrossValidation(outer_loop_fold_count=5, \n inner_loop_fold_count=5)\n```\n#### 3) Specify the combinatorial pipeline schematic detailing all possible model/parameter combinations \n\nEx: Here's an example of model/parameter combinations of optional scaling of two types, a principal component analysis directly using scikit-learn's sklearn.decomposition.PCA transformer, selection of data transformed by k principal components (between 1 and 30), and the use of either a k-nearest neighbors classifier (k between 1 and 30) or random forest classifier with a maximum depth between 2 and 5 (and a specified random state for reproducibility).\n\n```python\npipeline_schematic = [\n {'scaler': {\n 'none': {},\n 'min_max': {},\n 'standard': {}\n }\n },\n {'transform': {\n 'pca': {\n 'sklo': sklearn.decomposition.PCA,\n 'n_components': [feature_count]\n }\n 
} \n },\n {'feature_selection': {\n 'select_k_best': {\n 'k': range(1, feature_count+1)\n }\n }\n },\n {'estimator': {\n 'knn': {\n 'n_neighbors': range(1,31)\n },\n 'random_forest': {\n 'sklo': RandomForestClassifier,\n 'max_depth': range(2,6),\n 'random_state': [57]\n\t\t\t}\n }\n }\n]\n```\n\n#### 4) Run pyplearnr\n```python\n# Perform nested k-fold cross-validation\nkfcv.fit(X, y, pipeline_schematic=pipeline_schematic, \n scoring_metric='auc', score_type='median')\n```\n### Methodology\nThe core model selection and validation method is nested k-fold cross-validation (stratified for classification). Inner-fold contests are used for model selection and outer-folds are used to cross-validate the final winning model. \n\nHere's the basic algorithm used by pyplearnr:\n\n- 1) Pyplearnr shuffles and divides the data into k validation outer-folds. \n- 2) For each outer-fold:\n\t- a) The remaining folds are combined to form the corresponding training set\n\t- b) This training set is divided into k (or possibly a different number of) inner-test-folds.\n\t- c) For each inner-test-fold:\n\t - i) The remaining inner-test-folds are combined and used to train all pipelines/models, which are scored on the corresponding inner-test-fold\n\t- d) The winning model/pipeline of each inner-test-fold contest is chosen as that with the best median score over all inner-test-folds\n\t - i) The user is alerted if there is a tie and is expected to decide the winning pipeline (usually the simplest for better generalizability)\n- 3) The final winning model/pipeline is chosen as that with the most wins from all inner-test-fold contests corresponding to each outer-fold \n\t- a) Again, the user is expected to decide the winner if there is a tie\n- 4) This final winning model/pipeline is trained on all of the training data for each outer-fold, tested on the corresponding validation set, and summary statistics are presented to the user representing expected out-of-sample 
performance.\n\n\n### Installation\n##### Dependencies\n\npyplearnr requires:\n\n- Python (>= 2.7 or >= 3.3)\n- scikit-learn (>= 0.18.2)\n- numpy (>= 1.13.0)\n- scipy (>= 0.19.1)\n- pandas (>= 0.20.2)\n- matplotlib (>= 2.0.2)\n\nFor use in Jupyter notebooks and the conda installation, I recommend having nb_conda (>= 2.2.0).\n\n### User installation\nInstall by using pip:\n\n```\npip install pyplearnr\n```\n\nFor conda, you can issue the same command above within a conda environment or you can include this in your environment.yml file:\n\n```\n- pip:\n - pyplearnr\n```\n\nand then either generate a new environment from the terminal using:\n\n```\nconda env create\n```\n\nor update an existing one (environment_name) using:\n\n```\nconda env update -n=environment_name -f=./environment.yml\n```\n\nAnother option is to simply clone the repository, link to the location in your code, and import it. \n\n\n\n\n", "description_content_type": null, "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "http://packages.python.org/pyplearnr", "keywords": "scikit-learn pipeline k-fold cross-validation model selection", "license": "OSI Approved :: Apache Software License", "maintainer": "", "maintainer_email": "", "name": "pyplearnr", "package_url": "https://pypi.org/project/pyplearnr/", "platform": "", "project_url": "https://pypi.org/project/pyplearnr/", "project_urls": { "Homepage": "http://packages.python.org/pyplearnr" }, "release_url": "https://pypi.org/project/pyplearnr/1.0.11.1/", "requires_dist": [ "matplotlib", "numpy", "pandas", "sklearn" ], "requires_python": "", "summary": "Pyplearnr is a tool designed to easily and more elegantly build, validate (nested k-fold cross-validation), and test scikit-learn pipelines.", "version": "1.0.11.1" }, "last_serial": 3073867, "releases": { "1.0.10": [ { "comment_text": "", "digests": { "md5": "124107649a8703b0ae1ea5a9067fed8e", "sha256": 
"e3bf711db1875e196b0164703257f84eb4032203a96dab25f4808b562ef48993" }, "downloads": -1, "filename": "pyplearnr-1.0.10-py2-none-any.whl", "has_sig": false, "md5_digest": "124107649a8703b0ae1ea5a9067fed8e", "packagetype": "bdist_wheel", "python_version": "py2", "requires_python": null, "size": 33157, "upload_time": "2017-07-01T07:27:07", "url": "https://files.pythonhosted.org/packages/0e/bb/3b08a00e10869a429cc0724acc0a68dac2353d0edff2b3671a8afbf3ccdc/pyplearnr-1.0.10-py2-none-any.whl" }, { "comment_text": "", "digests": { "md5": "8046ca3bd5bce437db4fd79d2e99f89b", "sha256": "aa9c9746140f296e38b55040fc297d0d3efd6928de094ef3a820cf4986f6ef01" }, "downloads": -1, "filename": "pyplearnr-1.0.10.tar.gz", "has_sig": false, "md5_digest": "8046ca3bd5bce437db4fd79d2e99f89b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 27663, "upload_time": "2017-07-01T07:36:49", "url": "https://files.pythonhosted.org/packages/a3/51/ca7fa7b0ed8b0f6d4e69106d22a2b5f1b0c02239f7b1236d4a55b3760230/pyplearnr-1.0.10.tar.gz" } ], "1.0.10.1": [ { "comment_text": "", "digests": { "md5": "ea982d753375be98140b53f0b9697bf1", "sha256": "0b7182751bd7a1e42bdbb16e47c67abe65e3c954d50cb5496ff5aa9859ec3054" }, "downloads": -1, "filename": "pyplearnr-1.0.10.1-py2-none-any.whl", "has_sig": false, "md5_digest": "ea982d753375be98140b53f0b9697bf1", "packagetype": "bdist_wheel", "python_version": "py2", "requires_python": null, "size": 33048, "upload_time": "2017-07-18T05:23:54", "url": "https://files.pythonhosted.org/packages/53/0e/5706bd19a33aa56c4c6fd891350ecbda615382bc410130fa9cb70c301068/pyplearnr-1.0.10.1-py2-none-any.whl" }, { "comment_text": "", "digests": { "md5": "cd7de9a1cf66174bb527669976e264e6", "sha256": "711904f823f747bbcbb80950538708551ab7793388dd2ba25c368d1665f23260" }, "downloads": -1, "filename": "pyplearnr-1.0.10.1.tar.gz", "has_sig": false, "md5_digest": "cd7de9a1cf66174bb527669976e264e6", "packagetype": "sdist", "python_version": "source", "requires_python": 
null, "size": 27523, "upload_time": "2017-07-18T05:23:58", "url": "https://files.pythonhosted.org/packages/40/b3/0e60412ed0d17dd2ec809d066653561c89e4fac0c5f7ed6add67548c1204/pyplearnr-1.0.10.1.tar.gz" } ], "1.0.11": [ { "comment_text": "", "digests": { "md5": "f3043873bc82a55fffbef9603a93cfc6", "sha256": "17a7794a4199a1cb83f5b636e0d4626e04f9d7ef7705e13242059e5be7bf9fe6" }, "downloads": -1, "filename": "pyplearnr-1.0.11-py2-none-any.whl", "has_sig": false, "md5_digest": "f3043873bc82a55fffbef9603a93cfc6", "packagetype": "bdist_wheel", "python_version": "py2", "requires_python": null, "size": 33022, "upload_time": "2017-07-18T05:23:56", "url": "https://files.pythonhosted.org/packages/c3/98/4fbf58a8c57dd149187b0c52f14103f9b7cc6e687ec0568eaaad617a95fb/pyplearnr-1.0.11-py2-none-any.whl" }, { "comment_text": "", "digests": { "md5": "12b482c426da1972b5c9f925c956b16b", "sha256": "b3c75809542206e4bda4f04b5d31a7ed13d996202879d0f3565f2822393cbc60" }, "downloads": -1, "filename": "pyplearnr-1.0.11.tar.gz", "has_sig": false, "md5_digest": "12b482c426da1972b5c9f925c956b16b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 27528, "upload_time": "2017-07-18T05:23:59", "url": "https://files.pythonhosted.org/packages/1c/12/a2d8a44249d8c72f404b1b8308ddbd8347fed2f621690dc3046cc60c5070/pyplearnr-1.0.11.tar.gz" } ], "1.0.11.1": [ { "comment_text": "", "digests": { "md5": "6411fe90c591cbf3b2d67b4e082b64a8", "sha256": "b9e4ff9e79c1cea7c0ffe35e1964e4d233ffaafe0c5cb6cdcca0ee568b64cfc6" }, "downloads": -1, "filename": "pyplearnr-1.0.11.1-py2-none-any.whl", "has_sig": false, "md5_digest": "6411fe90c591cbf3b2d67b4e082b64a8", "packagetype": "bdist_wheel", "python_version": "py2", "requires_python": null, "size": 34265, "upload_time": "2017-08-04T22:52:09", "url": "https://files.pythonhosted.org/packages/26/10/edc08c7939b9d7f9db09d4a1c39e017d8615bbd5add20c9ec9da0a498d35/pyplearnr-1.0.11.1-py2-none-any.whl" }, { "comment_text": "", "digests": { "md5": 
"7d4ac3e16caefddcc655f27f5e56db19", "sha256": "92594cc4d70f314eb6d5c5889a321acfca803a220fdbbb7269d1be965e137ab7" }, "downloads": -1, "filename": "pyplearnr-1.0.11.1.tar.gz", "has_sig": false, "md5_digest": "7d4ac3e16caefddcc655f27f5e56db19", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 28568, "upload_time": "2017-08-04T22:52:11", "url": "https://files.pythonhosted.org/packages/12/76/c73a277b8271545d884bfe0b3481d19cbc5c1a118430858b5027194204a8/pyplearnr-1.0.11.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "6411fe90c591cbf3b2d67b4e082b64a8", "sha256": "b9e4ff9e79c1cea7c0ffe35e1964e4d233ffaafe0c5cb6cdcca0ee568b64cfc6" }, "downloads": -1, "filename": "pyplearnr-1.0.11.1-py2-none-any.whl", "has_sig": false, "md5_digest": "6411fe90c591cbf3b2d67b4e082b64a8", "packagetype": "bdist_wheel", "python_version": "py2", "requires_python": null, "size": 34265, "upload_time": "2017-08-04T22:52:09", "url": "https://files.pythonhosted.org/packages/26/10/edc08c7939b9d7f9db09d4a1c39e017d8615bbd5add20c9ec9da0a498d35/pyplearnr-1.0.11.1-py2-none-any.whl" }, { "comment_text": "", "digests": { "md5": "7d4ac3e16caefddcc655f27f5e56db19", "sha256": "92594cc4d70f314eb6d5c5889a321acfca803a220fdbbb7269d1be965e137ab7" }, "downloads": -1, "filename": "pyplearnr-1.0.11.1.tar.gz", "has_sig": false, "md5_digest": "7d4ac3e16caefddcc655f27f5e56db19", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 28568, "upload_time": "2017-08-04T22:52:11", "url": "https://files.pythonhosted.org/packages/12/76/c73a277b8271545d884bfe0b3481d19cbc5c1a118430858b5027194204a8/pyplearnr-1.0.11.1.tar.gz" } ] }