{ "info": { "author": "Aleksandra Gacek, Piotr Lubo\u0144", "author_email": "lubonp@student.mini.pw.edu.pl, gaceka@student.mini.pw.edu.pl", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python :: 3" ], "description": "[![Downloads](https://pepy.tech/badge/safe-transformer)](https://pepy.tech/project/safe-transformer)\n[![PyPI version](https://badge.fury.io/py/safe-transformer.svg)](https://badge.fury.io/py/safe-transformer)\n\n\n# SAFE - Surrogate Assisted Feature Extraction\n\nSAFE is a Python library for building better explainable ML models by leveraging the capabilities of more powerful, black-box models. \nThe idea is to use a more complicated model, called the surrogate model, to extract more information from the features, which can then be used to fit a simpler but explainable model.\nInput data is divided into intervals or a new set of categories, determined by the surrogate model, and then transformed based on the interval or category each point belongs to.\nThe library provides the SafeTransformer class, which implements the TransformerMixin interface, so it can be used as part of a scikit-learn pipeline.\nWith this library you can boost simple ML models by transferring information from more complicated models.\nAn article about SAFE can be found [here](https://arxiv.org/abs/1902.11035).\n\n## Requirements\n\nTo install this library run:\n\n```\npip install safe-transformer\n```\n\nThe only requirement is to have Python 3 installed on your machine.\n\n## Usage with example\n\nSample code using the SAFE transformer as part of a scikit-learn pipeline:\n\n```python\nfrom SafeTransformer import SafeTransformer\nfrom sklearn.datasets import load_boston\nfrom sklearn.ensemble import GradientBoostingRegressor\nfrom sklearn.model_selection import train_test_split\nimport pandas as pd\nfrom sklearn.linear_model import LinearRegression\nfrom sklearn.metrics import 
mean_squared_error\nfrom sklearn.pipeline import Pipeline\n\ndata = load_boston()\nX = pd.DataFrame(data.data, columns=data.feature_names)\ny = data['target']\n\nX_train, X_test, y_train, y_test = train_test_split(X, y)\n\nsurrogate_model = GradientBoostingRegressor(n_estimators=100,\n max_depth=4,\n learning_rate=0.1,\n loss='huber')\nsurrogate_model = surrogate_model.fit(X_train, y_train)\n\nlinear_model = LinearRegression()\nsafe_transformer = SafeTransformer(surrogate_model, penalty=0.84)\npipe = Pipeline(steps=[('safe', safe_transformer), ('linear', linear_model)])\npipe = pipe.fit(X_train, y_train)\npredictions = pipe.predict(X_test)\nmean_squared_error(y_test, predictions)\n\n```\n\n```bash\n13.617733207161479\n```\n\n```python\nlinear_model_standard = LinearRegression()\nlinear_model_standard = linear_model_standard.fit(X_train, y_train)\nstandard_predictions = linear_model_standard.predict(X_test)\nmean_squared_error(y_test, standard_predictions)\n```\n\n```bash\n29.27790566931337\n```\n\nAs you can see, the more powerful black-box model helps improve the performance of the simple model (a lower mean squared error) while keeping the interpretability of the simple model.\n\nYou can use any model you like, as long as it has fit and predict methods in the case of regression, or fit and predict_proba in the case of classification. The data used to fit the SAFE transformer needs to be a pandas data frame. \n\nYou can also specify the penalty and pelt model arguments.\n\nIn the [examples folder](https://github.com/olagacek/SAFE/tree/master/examples) you can find Jupyter notebooks with complete classification and regression examples.\n\nAPI reference documentation can be found [here](https://plubon.github.io/safe-docs/)\n\n## Algorithm\n\nOur goal is to divide each feature into intervals or new categories and then transform feature values based on the subset they belong to. \nThe division is based on the response of the surrogate model. 
\nFor each continuous variable we find changepoints: points that indicate the values of the variable at which the response of the surrogate model changes quickly. The intervals between changepoints are the basis of the transformation, e.g. the feature is transformed into a categorical variable, where feature values in the same interval form the same category. To find changepoints we need partial dependence plots. \nThese plots describe the marginal effect of a given variable (or multiple variables) on the outcome of the model.\nFor each categorical variable we perform hierarchical clustering based on the surrogate model responses. Then the categories whose responses are most similar are merged together, forming new categories.\n\n\nThe algorithm for the fit method is illustrated below:\n\n  \n\n![*Fit method algorithm*](images/fl.svg)\n\n  \n\nOur algorithm works for both regression and classification problems. In the case of regression we simply use the model response to create the partial dependence plot and perform hierarchical clustering. For classification we use the predicted probabilities of each class.\n\n### Continuous variable transformation\n\nHere is an example of a partial dependence plot. It was created for the Boston housing data frame; the variable in the example is LSTAT. To get changepoints from partial dependence plots we use the [ruptures](http://ctruong.perso.math.cnrs.fr/ruptures-docs/build/html/index.html) library and its model [Pelt](http://ctruong.perso.math.cnrs.fr/ruptures-docs/build/html/detection/pelt.html).\n\n \n\n### Categorical variable transformation\n\nThe plot below illustrates the categorical variable transformation. 
To create new categories based on the average model responses, we use the scikit-learn [Ward algorithm](https://scikit-learn.org/0.15/modules/generated/sklearn.cluster.Ward.html), and to find the number of clusters at which to cut the tree we use the KneeLocator class from the [kneed library](https://github.com/arvkevi/kneed).\n\n \n\n## Model optimization\n\nOne of the parameters you can specify is penalty, which affects the number of changepoints that will be created. Here you can see how the quality of the model changes with the penalty. For reference, the results of the surrogate and basic models are also shown in the plot.\n\n  \n\nWith a correctly chosen penalty your simple model can achieve much better accuracy, close to the accuracy of the surrogate model.\n\n## Variables transformation\n\nIf you are interested in how your dataset was changed, you can call the summary method. \n\n```python\nsafe_transformer.summary(variable_name='CRIM')\n```\n\n```\nNumerical Variable CRIM\nSelected intervals:\n\t[-Inf, 4.90)\n\t[4.90, 11.14)\n\t[11.14, 15.59)\n\t[15.59, 24.50)\n\t[24.50, 33.40)\n\t[33.40, 48.54)\n\t[48.54, Inf)\n```\n\nTo see the transformations of all the variables, do not specify the variable_name argument.\n\n```python\nsafe_transformer.summary()\n```\n\n```\nNumerical Variable CRIM\nSelected intervals:\n\t[-Inf, 4.90)\n\t[4.90, 11.14)\n\t[11.14, 15.59)\n\t[15.59, 24.50)\n\t[24.50, 33.40)\n\t[33.40, 48.54)\n\t[48.54, Inf)\nNumerical Variable ZN\nSelected intervals:\n\t[-Inf, 33.53)\n\t[33.53, Inf)\nNumerical Variable INDUS\nSelected intervals:\n\t[-Inf, 2.78)\n\t[2.78, 3.19)\n\t[3.19, 4.28)\n\t[4.28, 10.29)\n\t[10.29, 26.68)\n\t[26.68, Inf)\n\n.\n.\n.\n\nNumerical Variable LSTAT\nSelected intervals:\n\t[-Inf, 4.55)\n\t[4.55, 4.73)\n\t[4.73, 5.43)\n\t[5.43, 5.96)\n\t[5.96, 7.55)\n\t[7.55, 8.08)\n\t[8.08, 9.67)\n\t[9.67, 9.85)\n\t[9.85, 10.02)\n\t[10.02, 14.43)\n\t[14.43, 14.96)\n\t[14.96, 16.02)\n\t[16.02, 18.14)\n\t[18.14, 19.37)\n\t[19.37, 23.96)\n\t[23.96, 26.78)\n\t[26.78, 29.61)\n\t[29.61, Inf)\n```\n## 
References\n\n* [Original Safe algorithm](https://mi2datalab.github.io/SAFE/index.html), implemented in R \n* [ruptures library](https://github.com/deepcharles/ruptures), used for finding changepoints\n* [kneed library](https://github.com/arvkevi/kneed), used for cutting hierarchical tree \n* [SAFE article](https://arxiv.org/abs/1902.11035) - article about SAFE algorithm, including benchmark results using SAFE library\n\nThe project was made on [research workshops classes](https://github.com/pbiecek/CaseStudies2019W) at the Warsaw University of Technology at the Faculty of Mathematics and Information Science by Aleksandra Gacek and Piotr Lubo\u0144.\n\n\n", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/olagacek/SAFE", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "safe-transformer", "package_url": "https://pypi.org/project/safe-transformer/", "platform": "", "project_url": "https://pypi.org/project/safe-transformer/", "project_urls": { "Homepage": "https://github.com/olagacek/SAFE" }, "release_url": "https://pypi.org/project/safe-transformer/0.0.5/", "requires_dist": [ "numpy", "ruptures", "sklearn", "pandas", "scipy", "kneed" ], "requires_python": "", "summary": "Build explainable ML models using surrogate models.", "version": "0.0.5" }, "last_serial": 5244375, "releases": { "0.0.1": [ { "comment_text": "", "digests": { "md5": "5f830741f0fb237840a427157797cf80", "sha256": "4dd1bb3781564adf395724192719c1d13eaaa93a0d385b33ff7180e93bbd5846" }, "downloads": -1, "filename": "safe_transformer-0.0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "5f830741f0fb237840a427157797cf80", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 7596, "upload_time": "2019-01-09T21:16:49", "url": 
"https://files.pythonhosted.org/packages/1f/1d/f0ac676787363cf36f977ac8b284b4dac562738f82f08461198c1e02c021/safe_transformer-0.0.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "4796d2ad1a48937351a126840d4cac14", "sha256": "a40e9166314172a6182c9ee21efeb19a51605aff0c996eda1f7c2394e173e832" }, "downloads": -1, "filename": "safe-transformer-0.0.1.tar.gz", "has_sig": false, "md5_digest": "4796d2ad1a48937351a126840d4cac14", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 37105, "upload_time": "2019-01-09T21:16:52", "url": "https://files.pythonhosted.org/packages/05/a1/73520215df36fa5e6fc406b86cd886e139bacbdf0e48df0606cd7bd0eeaf/safe-transformer-0.0.1.tar.gz" } ], "0.0.2": [ { "comment_text": "", "digests": { "md5": "e4f052aca8cf60c8de0989add5bc4a06", "sha256": "d2c8a737e61d799124bb4f7a16b42f9fbfa75dd538afd4cad169f44edb3ac28e" }, "downloads": -1, "filename": "safe_transformer-0.0.2-py3-none-any.whl", "has_sig": false, "md5_digest": "e4f052aca8cf60c8de0989add5bc4a06", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 9824, "upload_time": "2019-01-10T18:10:35", "url": "https://files.pythonhosted.org/packages/41/6a/d988c9a84ba8f882806c941bb687c3bdaad3f7859ab611a98af2c7c0fb1b/safe_transformer-0.0.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "2f6199431d690e141edcd7a349067d13", "sha256": "cd79acd8513599e051c69c1839ee95439c6d92902383349d9bfcbd000e9957e3" }, "downloads": -1, "filename": "safe-transformer-0.0.2.tar.gz", "has_sig": false, "md5_digest": "2f6199431d690e141edcd7a349067d13", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 36883, "upload_time": "2019-01-10T18:10:37", "url": "https://files.pythonhosted.org/packages/01/09/dfc2226315872cb6e77127bc0fbf08b314cf9039e01ec8cf9c8a9915adaf/safe-transformer-0.0.2.tar.gz" } ], "0.0.3": [ { "comment_text": "", "digests": { "md5": "877b1f0e726333ee3cd8cace1fa92679", "sha256": 
"544d489b81eff74f2cebe34a2017c025c0a9273a7b51bb3e184a3ea769096366" }, "downloads": -1, "filename": "safe_transformer-0.0.3-py3-none-any.whl", "has_sig": false, "md5_digest": "877b1f0e726333ee3cd8cace1fa92679", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 13649, "upload_time": "2019-01-17T20:04:26", "url": "https://files.pythonhosted.org/packages/56/a0/04c62c190950911d10b6e7e4d20fa03f0dcc17845f8d3ee0301bea6428ed/safe_transformer-0.0.3-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "ec4744d294d2f4e0e19b80a92e99d17e", "sha256": "5683cb6b3bac7c139d0378415a57558ff3e6b6d5bc10af9352ddafdb86d1f361" }, "downloads": -1, "filename": "safe-transformer-0.0.3.tar.gz", "has_sig": false, "md5_digest": "ec4744d294d2f4e0e19b80a92e99d17e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 149098, "upload_time": "2019-01-17T20:04:29", "url": "https://files.pythonhosted.org/packages/a1/7c/862c4f07cadc3abfd7fcecd256482f238fe5294ae5d07fa08acf4b77d2f7/safe-transformer-0.0.3.tar.gz" } ], "0.0.4": [ { "comment_text": "", "digests": { "md5": "91fbb34e09556237373c04e212b110d9", "sha256": "a70add7239254fdb1bce35f7885949f4280d8fce200751823227c21ebdf3a458" }, "downloads": -1, "filename": "safe_transformer-0.0.4-py3-none-any.whl", "has_sig": false, "md5_digest": "91fbb34e09556237373c04e212b110d9", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 12939, "upload_time": "2019-01-20T20:11:27", "url": "https://files.pythonhosted.org/packages/fa/d9/284a94c65b3131f9de8a6664b466a9717552491ff7712e49733a1f08dedc/safe_transformer-0.0.4-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "6e8f129e6ed4be973b9c0c3a25ed7414", "sha256": "9441de7700341c850948e9b79ba92daecf530c8405e5152449d620cab9db7f88" }, "downloads": -1, "filename": "safe-transformer-0.0.4.tar.gz", "has_sig": false, "md5_digest": "6e8f129e6ed4be973b9c0c3a25ed7414", "packagetype": "sdist", 
"python_version": "source", "requires_python": null, "size": 148619, "upload_time": "2019-01-20T20:11:33", "url": "https://files.pythonhosted.org/packages/d8/b8/78de7f2517c8691af1d23420d5b268802bb5ac6d867714254d9d2dce2f01/safe-transformer-0.0.4.tar.gz" } ], "0.0.5": [ { "comment_text": "", "digests": { "md5": "2bb186e1eafe451427d1746e12eb64a4", "sha256": "1c075cd4abb829f9e8baeaf4d7ce5e8f349a99a5a57a2603042263ce8f836697" }, "downloads": -1, "filename": "safe_transformer-0.0.5-py3-none-any.whl", "has_sig": false, "md5_digest": "2bb186e1eafe451427d1746e12eb64a4", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 14602, "upload_time": "2019-05-08T19:40:40", "url": "https://files.pythonhosted.org/packages/74/a6/1903361772adb10876d83ddbdaf671b6a4c5f22cf30526b288ccaf4072aa/safe_transformer-0.0.5-py3-none-any.whl" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "2bb186e1eafe451427d1746e12eb64a4", "sha256": "1c075cd4abb829f9e8baeaf4d7ce5e8f349a99a5a57a2603042263ce8f836697" }, "downloads": -1, "filename": "safe_transformer-0.0.5-py3-none-any.whl", "has_sig": false, "md5_digest": "2bb186e1eafe451427d1746e12eb64a4", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 14602, "upload_time": "2019-05-08T19:40:40", "url": "https://files.pythonhosted.org/packages/74/a6/1903361772adb10876d83ddbdaf671b6a4c5f22cf30526b288ccaf4072aa/safe_transformer-0.0.5-py3-none-any.whl" } ] }