{ "info": { "author": "", "author_email": "ahmedmagdi@outlook.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7", "Topic :: Software Development :: Libraries" ], "description": "# Nested-Cross-Validation\nThis repository implements a general nested cross-validation function, ready to use with ANY estimator that implements the Scikit-Learn estimator interface.\n## Installing the package:\nYou can find the package on [PyPI](https://pypi.org/project/nested-cv/)* and install it via pip by using the following command:\n```bash\npip install nested-cv\n```\nYou can also install it from the wheel file on the [Releases](https://github.com/casperbh96/Nested-Cross-Validation/releases) page.\n\n\\* We push updates to PyPI gradually; pull the master branch from GitHub if you want the absolute latest changes.\n\n## Usage\nBe mindful of the options that are available for NestedCV. Some cross-validation options are defined in a dictionary `cv_options`.\nThis package is optimized for any estimator that implements a scikit-learn wrapper, e.g. 
XGBoost, LightGBM, KerasRegressor, KerasClassifier etc.\n\n### Single algorithm\nHere is an example using a single Random Forest model.\n```python\n# Import the package and the estimator\nfrom nested_cv import NestedCV\nfrom sklearn.ensemble import RandomForestRegressor\n\n# Define a parameter grid\nparam_grid = {\n 'max_depth': [3, None],\n 'n_estimators': [100,200,300,400,500,600,700,800,900,1000],\n 'max_features' : [50,100,150,200] # Note: You might not have that many features\n}\n\n# Define parameters for NestedCV (X and y are your features and target)\n# Scoring: RMSE, i.e. the default mean_squared_error with sqrt_of_score=True\nnested_CV_search = NestedCV(model=RandomForestRegressor(), params_grid=param_grid, outer_kfolds=5, inner_kfolds=5, \n \t cv_options={'sqrt_of_score':True, 'randomized_search_iter':30})\nnested_CV_search.fit(X=X,y=y)\nnested_CV_search.score_vs_variance_plot()\nprint('\\nCumulated best parameter grid was:\\n{0}'.format(nested_CV_search.best_params))\n```\n\n### Multiple algorithms\nHere is an example using Random Forest, XGBoost and LightGBM.\n```python\nimport numpy as np\nimport xgboost as xgb\nimport lightgbm as lgb\nfrom sklearn.ensemble import RandomForestRegressor\nfrom nested_cv import NestedCV\n\nmodels_to_run = [RandomForestRegressor(), xgb.XGBRegressor(), lgb.LGBMRegressor()]\nmodels_param_grid = [ \n { # 1st param grid, corresponding to RandomForestRegressor\n 'max_depth': [3, None],\n 'n_estimators': [100,200,300,400,500,600,700,800,900,1000],\n 'max_features' : [50,100,150,200]\n }, \n { # 2nd param grid, corresponding to XGBRegressor\n 'learning_rate': [0.05],\n 'colsample_bytree': np.linspace(0.3, 0.5),\n 'n_estimators': [100,200,300,400,500,600,700,800,900,1000],\n 'reg_alpha' : (1,1.2),\n 'reg_lambda' : (1,1.2,1.4)\n },\n { # 3rd param grid, corresponding to LGBMRegressor\n 'learning_rate': [0.05],\n 'n_estimators': [100,200,300,400,500,600,700,800,900,1000],\n 'reg_alpha' : (1,1.2),\n 'reg_lambda' : (1,1.2,1.4)\n }\n ]\n\nfor i,model in enumerate(models_to_run):\n nested_CV_search = NestedCV(model=model, params_grid=models_param_grid[i], outer_kfolds=5, inner_kfolds=5, \n cv_options={'sqrt_of_score':True, 'randomized_search_iter':30})\n nested_CV_search.fit(X=X,y=y)\n model_param_grid = nested_CV_search.best_params\n\n print('\\nCumulated 
best parameter grid was:\\n{0}'.format(model_param_grid))\n```\n### NestedCV Parameters \n| Name | type | description |\n| :------------- |:-------------| :-----|\n| model | estimator | The estimator implements scikit-learn estimator interface. |\n| params_grid | dictionary \"dict\" | The dict contains hyperparameters for model. |\n| outer_kfolds | int | Number of outer K-partitions in KFold |\n| inner_kfolds | int | Number of inner K-partitions in KFold | \n| cv_options | dictionary \"dict\" | [Next section](#cv_options-value-options) |\n| n_jobs | int | Number of jobs for joblib to run (multiprocessing) | \n\n### `cv_options` value options\n**`metric` :** Callable from sklearn.metrics, default = mean_squared_error\n\n      A scoring metric used to score each model\n\n**`metric_score_indicator_lower` :** boolean, default = True\n\n      Choose whether lower score is better for the metric calculation or higher score is better.\n\n**`sqrt_of_score` :** boolean, default = False\n\n      Whether or not if the square root should be taken of score\n\n**`randomized_search` :** boolean, default = True\n\n      Whether to use gridsearch or randomizedsearch from sklearn\n\n**`randomized_search_iter` :** int, default = 10\n\n      Number of iterations for randomized search\n\n**`recursive_feature_elimination` :** boolean, default = False\n\n      Whether to do feature elimination\n\n**`predict_proba` :** boolean, default = False\n\n      If true, predict probabilities instead for a class, instead of predicting a class\n\n**`multiclass_average` :** string, default = 'binary'\n\n      For some classification metrics with a multiclass prediction, you need to specify an\n average other than 'binary'\n\n### Returns\n**`variance` :** Model variance by numpy.var()\n\n**`outer_scores` :** A list of the outer scores, from the outer cross-validation\n\n**`best_inner_score_list` :** A list of best inner scores for each outer loop\n\n**`best_params` :** All best params from each inner 
loop cumulated in a dict\n\n**`best_inner_params_list` :** Best inner params for each outer loop as an array of dictionaries\n\n## How to use the output?\nWe suggest looking at the best hyperparameters together with the score for each outer loop. Look at how stable the model appears to be in a nested cross-validation setting. If the outer scores vary a lot between folds, this might indicate instability in your model. In that case, start over and build a new model.\n\n### After Nested Cross-Validation?\nIf the results from nested cross-validation are stable, run a normal cross-validation with the same procedure as in nested cross-validation, i.e. if you used feature selection in nested cross-validation, you should also do that in normal cross-validation.\n\n## Limitations\n- [XGBoost](https://xgboost.readthedocs.io/en/latest/) implements an `early_stopping_rounds` parameter, which cannot be used in this implementation. Other similar parameters might not work in combination with this implementation. The function will have to be adapted to use special parameters like that.\n\n## What did we learn?\n- Using [Scikit-Learn](https://github.com/scikit-learn/scikit-learn) will lead to a faster implementation, since the Scikit-Learn community has implemented many functions that do much of the work.\n- We have learned and applied this package in our main project about [House Price Prediction](https://github.com/casperbh96/house-price-prediction).\n\n## Why use Nested Cross-Validation?\nControlling the bias-variance tradeoff is an essential task in machine learning, as indicated by [[Cawley and Talbot, 2010]](http://jmlr.csail.mit.edu/papers/volume11/cawley10a/cawley10a.pdf). Many articles indicate that this is possible by the use of nested cross-validation, one of them by [[Varma and Simon, 2006]](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1397873/pdf/1471-2105-7-91.pdf). 
Other interesting literature on nested cross-validation includes [[Varoquaux et al., 2017]](https://arxiv.org/pdf/1606.05201.pdf) and [[Krstajic et al., 2014]](https://jcheminf.biomedcentral.com/track/pdf/10.1186/1758-2946-6-10).\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "https://github.com/user/reponame/archive/v_01.tar.gz", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/casperbh96/Nested-Cross-Validation", "keywords": "ml,xgboost,numpy,scikit-learn,pandas", "license": "MIT", "maintainer": "", "maintainer_email": "casperbh.96@gmail.com", "name": "nested-cv", "package_url": "https://pypi.org/project/nested-cv/", "platform": "", "project_url": "https://pypi.org/project/nested-cv/", "project_urls": { "Download": "https://github.com/user/reponame/archive/v_01.tar.gz", "Homepage": "https://github.com/casperbh96/Nested-Cross-Validation" }, "release_url": "https://pypi.org/project/nested-cv/0.916/", "requires_dist": [ "pandas", "matplotlib", "scikit-learn", "numpy" ], "requires_python": "", "summary": "A general package to handle nested cross-validation for any estimator that implements the scikit-learn estimator interface.", "version": "0.916" }, "last_serial": 5739522, "releases": { "0.9": [ { "comment_text": "", "digests": { "md5": "588e626202a37d488349bfdb306a3b27", "sha256": "2e49c5bc876bf8f495aee6dabcc7989c8dc4705e9700ec8031e4ccf7b11319dc" }, "downloads": -1, "filename": "nested_cv-0.9-py2-none-any.whl", "has_sig": false, "md5_digest": "588e626202a37d488349bfdb306a3b27", "packagetype": "bdist_wheel", "python_version": "py2", "requires_python": null, "size": 5349, "upload_time": "2019-05-17T17:18:57", "url": "https://files.pythonhosted.org/packages/dc/f4/c3f84f36a645e77770609b46a462aa73ee110c54becd1f4b6517b8fba5d2/nested_cv-0.9-py2-none-any.whl" }, { "comment_text": "", "digests": { "md5": "f837413815f7ca77030d5dba98db929b", "sha256": 
"42012664fa11aa602258dd0bab62da327a3c8071494f248308c493dc9b3df2a0" }, "downloads": -1, "filename": "nested_cv-0.9.tar.gz", "has_sig": false, "md5_digest": "f837413815f7ca77030d5dba98db929b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5683, "upload_time": "2019-05-17T17:18:59", "url": "https://files.pythonhosted.org/packages/e8/e1/216e0c921592ecbf074479c3063a452eb69862aec1a234b546d43eccae47/nested_cv-0.9.tar.gz" } ], "0.91": [ { "comment_text": "", "digests": { "md5": "db21f58bb1e42bd6d40ff755de14d5a4", "sha256": "8c82ffef1a8a722a6959e4df2f279ea9cd25e59849a8dadf24bd8ce0f816b37b" }, "downloads": -1, "filename": "nested_cv-0.91-py3-none-any.whl", "has_sig": false, "md5_digest": "db21f58bb1e42bd6d40ff755de14d5a4", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 5359, "upload_time": "2019-05-17T17:25:04", "url": "https://files.pythonhosted.org/packages/28/79/e174c83c23643e081bf048a1d9709eeff78bcec89d2caf67e43ec6107aef/nested_cv-0.91-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "3683e665f101bf03ab027b943753612c", "sha256": "0e87ef14263baed03d22c8ce0ef31762bee227997e1f8b0ce8a1461c3a780fef" }, "downloads": -1, "filename": "nested_cv-0.91.tar.gz", "has_sig": false, "md5_digest": "3683e665f101bf03ab027b943753612c", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4061, "upload_time": "2019-05-17T17:25:05", "url": "https://files.pythonhosted.org/packages/30/d4/dac2f3c2184259047f4da1f5add6ef3655bef9fc3fe4539a399b9d27be2b/nested_cv-0.91.tar.gz" } ], "0.911": [ { "comment_text": "", "digests": { "md5": "a224d94e977729ba474341a945880e6e", "sha256": "5153c8294b3e4418e6a1a4dfda45c6c8f0cfc4d9bfd2e29647dbe6bdb494922b" }, "downloads": -1, "filename": "nested_cv-0.911-py3-none-any.whl", "has_sig": false, "md5_digest": "a224d94e977729ba474341a945880e6e", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 7312, 
"upload_time": "2019-05-17T17:31:37", "url": "https://files.pythonhosted.org/packages/3d/7b/fc185b3bea0619b660b91d5893bb9548cba8c134b73394615893797ed52f/nested_cv-0.911-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "10a4ec4604b71193c1b659925a4267a8", "sha256": "af0b2504fe151f6cab8fec176ac4198cccf9c604722f210c3ad2c05a4c682149" }, "downloads": -1, "filename": "nested_cv-0.911.tar.gz", "has_sig": false, "md5_digest": "10a4ec4604b71193c1b659925a4267a8", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 6250, "upload_time": "2019-05-17T17:31:38", "url": "https://files.pythonhosted.org/packages/eb/cf/af209f57e7f0e18fb55f5fde13e01f6356e0879f76a7ffaa504b66238cc9/nested_cv-0.911.tar.gz" } ], "0.912": [ { "comment_text": "", "digests": { "md5": "faacb49ea95266c8680f9263d24b802e", "sha256": "a2b6e86a1ae098d1b6265d9f08d683712de9cbe48820bb8c331257f698ecadfe" }, "downloads": -1, "filename": "nested_cv-0.912-py3-none-any.whl", "has_sig": false, "md5_digest": "faacb49ea95266c8680f9263d24b802e", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 4135, "upload_time": "2019-05-17T17:50:46", "url": "https://files.pythonhosted.org/packages/49/73/58ac94eaf9286c2dd236e02fc0be19fb3ff4e41c31c5024571a6858ae0fc/nested_cv-0.912-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "d2f00cb52968e1cf32eeae23450ca83d", "sha256": "dd33819e0ff2881be375275216852882b2b3f19f6f3367d0a2c1b8421ded0ae1" }, "downloads": -1, "filename": "nested_cv-0.912.tar.gz", "has_sig": false, "md5_digest": "d2f00cb52968e1cf32eeae23450ca83d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3638, "upload_time": "2019-05-17T17:50:47", "url": "https://files.pythonhosted.org/packages/11/4c/f9bc092755ab7133cfc2e5584b1b41fc2f0c678650d096e881476067a463/nested_cv-0.912.tar.gz" } ], "0.913": [ { "comment_text": "", "digests": { "md5": "b6cd5bd9d38be0942bf3f7ddba7a807d", "sha256": 
"20a555055835704dc0627a8c49804fa8f55f6f5024beb08aef6cf4fa771fb8f0" }, "downloads": -1, "filename": "nested_cv-0.913-py3-none-any.whl", "has_sig": false, "md5_digest": "b6cd5bd9d38be0942bf3f7ddba7a807d", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 4136, "upload_time": "2019-05-17T18:34:35", "url": "https://files.pythonhosted.org/packages/92/8b/161246ce4323e00f799cf9f863895ea43d78d12e236d6e922be6231bbfcb/nested_cv-0.913-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "ac6e5b5b9fe5dbd4cb43074f85408baf", "sha256": "01b2f50370d01cc855b0eb4cc5d90e29a8bc1fcf471600600cf414a22daeb9bb" }, "downloads": -1, "filename": "nested_cv-0.913.tar.gz", "has_sig": false, "md5_digest": "ac6e5b5b9fe5dbd4cb43074f85408baf", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3637, "upload_time": "2019-05-17T18:34:37", "url": "https://files.pythonhosted.org/packages/40/c8/018a5b2ba0c1179efd05195af6266d372f87b7066bbe153dab142197b471/nested_cv-0.913.tar.gz" } ], "0.914": [ { "comment_text": "", "digests": { "md5": "5434f180d2e2b1e9d2055a19e11c333f", "sha256": "0c748584e0db43056044f0cfd64e5020849f8385095ff3811fa1910a49cb6bf3" }, "downloads": -1, "filename": "nested_cv-0.914-py3-none-any.whl", "has_sig": false, "md5_digest": "5434f180d2e2b1e9d2055a19e11c333f", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 7340, "upload_time": "2019-05-17T18:55:45", "url": "https://files.pythonhosted.org/packages/b7/24/0633ac9f137f4b8aa1d3f3916b4c281b42276e6b3a5ab4338b217f6e7bc1/nested_cv-0.914-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "93178b5a3faa66f481087ad48d95ddc3", "sha256": "82aa991891b3522f3d1e328e1132c1afe4506360bb35e0d8306b4fe87554a273" }, "downloads": -1, "filename": "nested_cv-0.914.tar.gz", "has_sig": false, "md5_digest": "93178b5a3faa66f481087ad48d95ddc3", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 6275, 
"upload_time": "2019-05-17T18:55:46", "url": "https://files.pythonhosted.org/packages/85/b9/c1d2886314a7555c5c469e5ab2439b70c4397220ccc5259d7a6ee245f765/nested_cv-0.914.tar.gz" } ], "0.915": [ { "comment_text": "", "digests": { "md5": "79a1b9d2f944850eeca02ff46e4854d6", "sha256": "bcb5762271e1d7321682a0232e14f0b71dc9c73bc5d4f171ac11d5fef65ae4ed" }, "downloads": -1, "filename": "nested_cv-0.915-py3-none-any.whl", "has_sig": false, "md5_digest": "79a1b9d2f944850eeca02ff46e4854d6", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 7345, "upload_time": "2019-05-17T19:05:35", "url": "https://files.pythonhosted.org/packages/16/d1/ea0c4a64ae06be1df078e1ec74de25b83a4cbcc5aab547ccaabf64f79788/nested_cv-0.915-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "6eb3cc5b35cffd20990f5399707837d7", "sha256": "d313e381ce828397598120491edee70de16869f14110a1e55317006ef2f406bb" }, "downloads": -1, "filename": "nested_cv-0.915.tar.gz", "has_sig": false, "md5_digest": "6eb3cc5b35cffd20990f5399707837d7", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 6277, "upload_time": "2019-05-17T19:05:37", "url": "https://files.pythonhosted.org/packages/fb/d0/a147b0688cc811ea3f315163f39978c558ff554d92f0e30fbc48c83819f3/nested_cv-0.915.tar.gz" } ], "0.916": [ { "comment_text": "", "digests": { "md5": "3fe290c8e648ba0abb36d8775e03f60b", "sha256": "270fb8d22413b7ff3cf3f96455c6b9f13a1d1f739401eedb213dd8b372602604" }, "downloads": -1, "filename": "nested_cv-0.916-py3-none-any.whl", "has_sig": false, "md5_digest": "3fe290c8e648ba0abb36d8775e03f60b", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 8681, "upload_time": "2019-08-27T21:10:47", "url": "https://files.pythonhosted.org/packages/fc/92/46185d13c88fb01e0b64674f80f70d41fe5d85610d7cc0638f6e4de9a074/nested_cv-0.916-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "83394ba6c719d171849ebb5a84142c17", "sha256": 
"becc7a0a663e9c9e9301fa97e2ebba19ab913bd5487cc162cdeb562298b18e40" }, "downloads": -1, "filename": "nested_cv-0.916.tar.gz", "has_sig": false, "md5_digest": "83394ba6c719d171849ebb5a84142c17", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8389, "upload_time": "2019-08-27T21:10:49", "url": "https://files.pythonhosted.org/packages/68/55/839859d1a78093bd35d8ec149ca5b7820bfdb9451549d4e8a3c30f3434c1/nested_cv-0.916.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "3fe290c8e648ba0abb36d8775e03f60b", "sha256": "270fb8d22413b7ff3cf3f96455c6b9f13a1d1f739401eedb213dd8b372602604" }, "downloads": -1, "filename": "nested_cv-0.916-py3-none-any.whl", "has_sig": false, "md5_digest": "3fe290c8e648ba0abb36d8775e03f60b", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 8681, "upload_time": "2019-08-27T21:10:47", "url": "https://files.pythonhosted.org/packages/fc/92/46185d13c88fb01e0b64674f80f70d41fe5d85610d7cc0638f6e4de9a074/nested_cv-0.916-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "83394ba6c719d171849ebb5a84142c17", "sha256": "becc7a0a663e9c9e9301fa97e2ebba19ab913bd5487cc162cdeb562298b18e40" }, "downloads": -1, "filename": "nested_cv-0.916.tar.gz", "has_sig": false, "md5_digest": "83394ba6c719d171849ebb5a84142c17", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8389, "upload_time": "2019-08-27T21:10:49", "url": "https://files.pythonhosted.org/packages/68/55/839859d1a78093bd35d8ec149ca5b7820bfdb9451549d4e8a3c30f3434c1/nested_cv-0.916.tar.gz" } ] }