{ "info": { "author": "Micha\u0142 Ku\u017aba", "author_email": "michal.kuzba@students.mimuw.edu.pl", "bugtrack_url": null, "classifiers": [ "Intended Audience :: Developers", "Intended Audience :: Science/Research", "License :: OSI Approved :: Apache Software License", "Operating System :: OS Independent", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7", "Topic :: Scientific/Engineering :: Artificial Intelligence", "Topic :: Scientific/Engineering :: Visualization" ], "description": "\n[![travis](https://travis-ci.org/ModelOriented/pyCeterisParibus.svg?branch=master)](https://travis-ci.org/ModelOriented/pyCeterisParibus)\n[![codecov](https://codecov.io/gh/ModelOriented/pyCeterisParibus/branch/master/graph/badge.svg)](https://codecov.io/gh/ModelOriented/pyCeterisParibus)\n[![Documentation Status](https://readthedocs.org/projects/pyceterisparibus/badge/?version=latest)](https://pyceterisparibus.readthedocs.io/en/latest/?badge=latest)\n[![Downloads](https://pepy.tech/badge/pyceterisparibus)](https://pepy.tech/project/pyceterisparibus)\n[![PyPI version](https://badge.fury.io/py/pyCeterisParibus.svg)](https://badge.fury.io/py/pyCeterisParibus)\n\n# pyCeterisParibus\npyCeterisParibus is a Python library based on an *R* package [CeterisParibus](https://github.com/pbiecek/ceterisParibus).\nIt implements Ceteris Paribus Plots.\nThey allow understanding how the model response would change if a selected variable is changed. \nIt\u2019s a perfect tool for What-If scenarios. Ceteris Paribus is a Latin phrase meaning all else unchanged. \nThese plots present the change in model response as the values of one feature change with all others being fixed. 
\nThe Ceteris Paribus method is model-agnostic - it works for any Machine Learning model.\nThe idea is an extension of PDP (Partial Dependence Plots) and ICE (Individual Conditional Expectations) plots.\nIt allows explaining single observations for multiple variables at the same time.\nThe plot engine is developed [here](https://github.com/ModelOriented/ceterisParibusD3).\n\n## Why is it so useful?\nThere are several motivations for using this idea. \nImagine a person gets a low credit score. \nThe client wants to understand how to increase the score and the scoring institution (e.g. a bank) should be able to answer such questions. \nMoreover, this method is useful for researchers and developers to analyze, debug, explain and improve Machine Learning models, assisting the entire process of the model design.\n\n## Setup\nTested on Python 3.5+\n\nPyCeterisParibus is on [PyPI](https://pypi.org/project/pyCeterisParibus/). Simply run:\n\n```bash\npip install pyCeterisParibus\n```\nor install the newest version from GitHub by executing:\n```bash\npip install git+https://github.com/ModelOriented/pyCeterisParibus\n```\nor download the sources, enter the main directory and run the installer:\n```bash\ngit clone https://github.com/ModelOriented/pyCeterisParibus.git\ncd pyCeterisParibus\npython setup.py install # (alternatively use pip install .)\n```\n\n## Docs\nA detailed description of all methods and their parameters can be found in the [documentation](https://pyceterisparibus.readthedocs.io/en/latest/ceteris_paribus.html).\n\nTo build the documentation locally:\n```bash\npip install -r requirements-dev.txt\ncd docs\nmake html\n```\nand open `_build/html/index.html`\n\n## Examples\nBelow we present use cases on two well-known datasets - Titanic and Iris. More examples e.g. 
for regression problems can be found [here](examples) and in jupyter notebooks [here](jupyter-notebooks).\n\nNote that in order to run the examples you need to install extra requirements from `requirements-dev.txt`.\n\n## Use case - Titanic survival\nWe demonstrate Ceteris Paribus Plots using the well-known Titanic dataset. In this problem, we examine the chance of survival for Titanic passengers.\nWe start with preprocessing the data and creating an XGBoost model.\n```python\nimport pandas as pd\ndf = pd.read_csv('titanic_train.csv')\n\ny = df['Survived']\nx = df.drop(['Survived', 'PassengerId', 'Name', 'Cabin', 'Ticket'],\n inplace=False, axis=1)\n\n# Drop rows with a missing Age or Embarked value.\nmissing = x['Age'].isnull() | x['Embarked'].isnull()\nx = x[~missing]\ny = y[~missing]\n\nfrom sklearn.model_selection import train_test_split\nX_train, X_test, y_train, y_test = train_test_split(x, y,\n test_size=0.2, random_state=42)\n```\n```python\nfrom sklearn.pipeline import Pipeline\nfrom sklearn.preprocessing import StandardScaler, OneHotEncoder\nfrom sklearn.compose import ColumnTransformer\n\n# We create the preprocessing pipelines for both numeric and categorical data.\nnumeric_features = ['Pclass', 'Age', 'SibSp', 'Parch', 'Fare']\nnumeric_transformer = Pipeline(steps=[\n ('scaler', StandardScaler())])\n\ncategorical_features = ['Embarked', 'Sex']\ncategorical_transformer = Pipeline(steps=[\n ('onehot', OneHotEncoder(handle_unknown='ignore'))])\n\npreprocessor = ColumnTransformer(\n transformers=[\n ('num', numeric_transformer, numeric_features),\n ('cat', categorical_transformer, categorical_features)])\n```\n\n```python\nfrom xgboost import XGBClassifier\nxgb_clf = Pipeline(steps=[('preprocessor', preprocessor),\n('classifier', XGBClassifier())])\nxgb_clf.fit(X_train, y_train)\n```\n\nThis is where pyCeterisParibus comes in. 
Since this library works in a model-agnostic fashion, we first need to create a wrapper around the model with a uniform predict interface.\n```python\nfrom ceteris_paribus.explainer import explain\nexplainer_xgb = explain(xgb_clf, data=x, y=y, label='XGBoost',\n predict_function=lambda X: xgb_clf.predict_proba(X)[::, 1])\n```\n\n\n### Single variable profile\nLet's look at Mr Ernest James Crease, a 19-year-old man travelling in 3rd class from Southampton with an 8-pound ticket in his pocket. He died on the Titanic. Most likely, this would not have been the case had Ernest been a few years younger.\nFigure 1 presents the chance of survival for a person like Ernest at different ages. We can see things were tough for people like him unless they were children.\n\n```python\nernest = X_test.iloc[10]\nlabel_ernest = y_test.iloc[10]\nfrom ceteris_paribus.profiles import individual_variable_profile\ncp_xgb = individual_variable_profile(explainer_xgb, ernest, label_ernest)\n```\n\nHaving calculated the profile, we can plot it. Note that `plot_notebook` might be used instead of `plot` in Jupyter notebooks.\n\n```python\nfrom ceteris_paribus.plots.plots import plot\nplot(cp_xgb, selected_variables=[\"Age\"])\n```\n\n![Chance of survival depending on age](misc/titanic_single_response.png)\n\n### Many models\nThe above picture explains the prediction of the XGBoost model. 
What if we compare various models?\n\n```python\nfrom sklearn.ensemble import RandomForestClassifier\nfrom sklearn.linear_model import LogisticRegression\nrf_clf = Pipeline(steps=[('preprocessor', preprocessor),\n ('classifier', RandomForestClassifier())])\nlinear_clf = Pipeline(steps=[('preprocessor', preprocessor),\n ('classifier', LogisticRegression())])\n\nrf_clf.fit(X_train, y_train)\nlinear_clf.fit(X_train, y_train)\n\nexplainer_rf = explain(rf_clf, data=x, y=y, label='RandomForest',\n predict_function=lambda X: rf_clf.predict_proba(X)[::, 1])\nexplainer_linear = explain(linear_clf, data=x, y=y, label='LogisticRegression', \n predict_function=lambda X: linear_clf.predict_proba(X)[::, 1])\n\n# Compute profiles for the same passenger with the two additional models.\ncp_rf = individual_variable_profile(explainer_rf, ernest, label_ernest)\ncp_linear = individual_variable_profile(explainer_linear, ernest, label_ernest)\n\nplot(cp_xgb, cp_rf, cp_linear, selected_variables=[\"Age\"])\n```\n\n![The probability of survival estimated with various models.](misc/titanic_many_models.png)\n\nClearly, XGBoost offers a better fit than Logistic Regression. \nAlso, it predicts a higher chance of survival for children than the Random Forest model does.\n\n### Profiles for many variables\nThis time we have a look at Miss Elizabeth Mussey Eustis. She is 54 years old and travels in 1st class with her sister Marta, as they return to the US from their tour of southern Europe. They both survived the disaster.\n\n```python\nelizabeth = X_test.iloc[1]\nlabel_elizabeth = y_test.iloc[1]\ncp_xgb_2 = individual_variable_profile(explainer_xgb, elizabeth, label_elizabeth)\n```\n\n```python\nplot(cp_xgb_2, selected_variables=[\"Pclass\", \"Sex\", \"Age\", \"Embarked\"])\n```\n\n![Profiles for many variables.](misc/titanic_many_variables.png)\n\nWould she have returned home if she had travelled in 3rd class or if she had been a man? As we can observe, this is less likely. On the other hand, for a first-class female passenger, chances of survival were high regardless of age. Note that this was different in the case of Ernest. 
The place of embarkation (Cherbourg) has no influence, which is the expected behaviour.\n\n### Feature interactions and average response\nNow, what if we look at passengers most similar to Miss Eustis (middle-aged, upper class)?\n\n```python\nfrom ceteris_paribus.select_data import select_neighbours\nneighbours = select_neighbours(X_train, elizabeth, \n selected_variables=['Pclass', 'Age', 'SibSp', 'Parch', 'Fare', 'Embarked'], \n n=15)\ncp_xgb_ns = individual_variable_profile(explainer_xgb, neighbours)\n```\n\n```python\nplot(cp_xgb_ns, color=\"Sex\", selected_variables=[\"Pclass\", \"Age\"], \n aggregate_profiles='mean', size_pdps=6, alpha_pdps=1, size=2)\n```\n\n![Interaction with gender. Apart from charts with Ceteris Paribus Profiles (top of the visualisation), we can plot a table with observations used to calculate these profiles (bottom of the visualisation).](misc/titanic_interactions_average.png)\n\nThere are two distinct clusters of passengers determined by their gender, so the *PDP* average plot (in grey) does not show the whole picture. Children of both genders were likely to survive, but then we see a large gap. Also, being female increased the chance of survival mostly for second and first class passengers.\n\nThe plot function comes with extensive customization options. A list of all parameters can be found in the documentation. Additionally, one can interact with the plot by hovering over a point of interest to see more details. 
Similarly, there is an interactive table with options for highlighting relevant elements as well as filtering and sorting rows.\n\n\n\n### Multiclass models - Iris dataset\nPrepare dataset and model\n```python\nfrom sklearn.datasets import load_iris\nfrom sklearn.ensemble import RandomForestClassifier\n\niris = load_iris()\n\ndef random_forest_classifier():\n rf_model = RandomForestClassifier(n_estimators=100, random_state=42)\n rf_model.fit(iris['data'], iris['target'])\n return rf_model, iris['data'], iris['target'], iris['feature_names']\n```\n\nWrap model into explainers\n```python\nrf_model, iris_x, iris_y, iris_var_names = random_forest_classifier()\n\nexplainer_rf1 = explain(rf_model, iris_var_names, iris_x, iris_y,\n predict_function= lambda X: rf_model.predict_proba(X)[::, 0], label=iris.target_names[0])\nexplainer_rf2 = explain(rf_model, iris_var_names, iris_x, iris_y,\n predict_function= lambda X: rf_model.predict_proba(X)[::, 1], label=iris.target_names[1])\nexplainer_rf3 = explain(rf_model, iris_var_names, iris_x, iris_y,\n predict_function= lambda X: rf_model.predict_proba(X)[::, 2], label=iris.target_names[2])\n```\n\nCalculate profiles and plot\n```python\ncp_rf1 = individual_variable_profile(explainer_rf1, iris_x[0], iris_y[0])\ncp_rf2 = individual_variable_profile(explainer_rf2, iris_x[0], iris_y[0])\ncp_rf3 = individual_variable_profile(explainer_rf3, iris_x[0], iris_y[0])\n\nplot(cp_rf1, cp_rf2, cp_rf3, selected_variables=['petal length (cm)', 'petal width (cm)', 'sepal length (cm)'])\n```\n![Multiclass models](misc/multiclass_models.png)\n\n## Contributing\nYou're more than welcome to contribute to this package. 
See the [guideline](CONTRIBUTING.md).\n\n## Acknowledgments\nWork on this package was financially supported by the \u2018NCN Opus grant 2016/21/B/ST6/0217\u2019.\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/ModelOriented/pyCeterisParibus", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "pyCeterisParibus", "package_url": "https://pypi.org/project/pyCeterisParibus/", "platform": "", "project_url": "https://pypi.org/project/pyCeterisParibus/", "project_urls": { "Homepage": "https://github.com/ModelOriented/pyCeterisParibus" }, "release_url": "https://pypi.org/project/pyCeterisParibus/0.5.1/", "requires_dist": [ "Flask (>=1.0.2)", "numpy (>=1.15.4)", "pandas (>=0.23.4)" ], "requires_python": "", "summary": "Ceteris Paribus python package", "version": "0.5.1" }, "last_serial": 5225237, "releases": { "0.2": [ { "comment_text": "", "digests": { "md5": "6cecec8e04292334db02a0a6e357d970", "sha256": "09d2e7632226c6ebf815ba2b38e45dd4c9e2592f37c63466330249a8e84b7cb9" }, "downloads": -1, "filename": "pyCeterisParibus-0.2-py3-none-any.whl", "has_sig": false, "md5_digest": "6cecec8e04292334db02a0a6e357d970", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 50213, "upload_time": "2019-03-14T20:40:59", "url": "https://files.pythonhosted.org/packages/0e/8d/e352df6aa695bcc53e0c536beddf798cd277b9cbb3df15a57bef30be4691/pyCeterisParibus-0.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "147a9ce7a89d727ca0f8db392ba1dc76", "sha256": "9b0490bf843f02f1625d457955dac49508560671a7ec21235669b8a208ff1442" }, "downloads": -1, "filename": "pyCeterisParibus-0.2.tar.gz", "has_sig": false, "md5_digest": "147a9ce7a89d727ca0f8db392ba1dc76", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 44032, "upload_time": "2019-03-14T20:41:01", 
"url": "https://files.pythonhosted.org/packages/09/89/27cdf245ab43efc68827603b60b157cee9c7104c2bd631985996cedcdd2c/pyCeterisParibus-0.2.tar.gz" } ], "0.3": [ { "comment_text": "", "digests": { "md5": "e81416740809ae7cc611cd905174d87f", "sha256": "abf9c6fbd7c29b3eb9f6c1eddcda5a497c5c0b6b238f9298884f7fb8c8fa9a64" }, "downloads": -1, "filename": "pyCeterisParibus-0.3-py3-none-any.whl", "has_sig": false, "md5_digest": "e81416740809ae7cc611cd905174d87f", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 50289, "upload_time": "2019-03-14T22:53:49", "url": "https://files.pythonhosted.org/packages/3b/9d/344fe521fde621356a0997eb0ef1c15fb59e54aac4bfd0b928d01e6b8be1/pyCeterisParibus-0.3-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "f98173e51388bbabb7684f3122318e3e", "sha256": "5856b86d01aad0e9d2d88058d5d620ee4aa3650e13b5abeffb5dfe005e25d7f6" }, "downloads": -1, "filename": "pyCeterisParibus-0.3.tar.gz", "has_sig": false, "md5_digest": "f98173e51388bbabb7684f3122318e3e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 44146, "upload_time": "2019-03-14T22:53:51", "url": "https://files.pythonhosted.org/packages/33/3f/a4f32579de2aeb46c1b6395ab959ff161ecc82c28ab8f0f4aa052decb78d/pyCeterisParibus-0.3.tar.gz" } ], "0.4": [ { "comment_text": "", "digests": { "md5": "02d04384d5281ce4590f4fea962ba3cb", "sha256": "ed8f5f33738570794371dcd719d89c195366ef015406206de641bfbb26bb0ad7" }, "downloads": -1, "filename": "pyCeterisParibus-0.4-py3-none-any.whl", "has_sig": false, "md5_digest": "02d04384d5281ce4590f4fea962ba3cb", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 52396, "upload_time": "2019-03-19T23:54:31", "url": "https://files.pythonhosted.org/packages/d8/ad/03b5e26332ea807849c58b06f4debefb13a14a757829b018a621cf445462/pyCeterisParibus-0.4-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "63d14107fccf6ce5ee8544d748c4d392", "sha256": 
"20449717305f704a3bd5b77c696a504327963c3d4dfc5d5ab09982895fd14b71" }, "downloads": -1, "filename": "pyCeterisParibus-0.4.tar.gz", "has_sig": false, "md5_digest": "63d14107fccf6ce5ee8544d748c4d392", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 46242, "upload_time": "2019-03-19T23:54:33", "url": "https://files.pythonhosted.org/packages/fe/4b/03265ff3490b7dd273196905d845e1689b020737506f018c619a961bf5ce/pyCeterisParibus-0.4.tar.gz" } ], "0.5": [ { "comment_text": "", "digests": { "md5": "c295fb7ac29dfdc79dec78e1ab9c9dc3", "sha256": "941ef3ed76aa1d7bb96580549062fcc8a7a1530f11b6471b721d3b426954fbbf" }, "downloads": -1, "filename": "pyCeterisParibus-0.5-py3-none-any.whl", "has_sig": false, "md5_digest": "c295fb7ac29dfdc79dec78e1ab9c9dc3", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 76932, "upload_time": "2019-03-24T13:11:53", "url": "https://files.pythonhosted.org/packages/c0/b2/da08b99cc6944ca69ad3a2c588b7cde628ccae1887145f7a8ffe784cfe80/pyCeterisParibus-0.5-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "a59167a6ebaefc7e1130af572512f643", "sha256": "aa56f6ee422fd806f500713265a2717c3f9680b31a0abce9a61938375016ba25" }, "downloads": -1, "filename": "pyCeterisParibus-0.5.tar.gz", "has_sig": false, "md5_digest": "a59167a6ebaefc7e1130af572512f643", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 70889, "upload_time": "2019-03-24T13:11:55", "url": "https://files.pythonhosted.org/packages/73/57/50228b039bd502e2593e88e84be684ee3daa855f92087b6eb8584863431f/pyCeterisParibus-0.5.tar.gz" } ], "0.5.1": [ { "comment_text": "", "digests": { "md5": "0887a4e383ea18b2928c94f6a9f25cbb", "sha256": "0ed952ce8e0e6597b3d91ed1adbfe8e5b06017354257edaf2125d34f5d61e8d0" }, "downloads": -1, "filename": "pyCeterisParibus-0.5.1-py3-none-any.whl", "has_sig": false, "md5_digest": "0887a4e383ea18b2928c94f6a9f25cbb", "packagetype": "bdist_wheel", "python_version": 
"py3", "requires_python": null, "size": 76973, "upload_time": "2019-05-04T10:14:09", "url": "https://files.pythonhosted.org/packages/52/76/e0a107242695c42d5d99cc84ad0b6846244140c53d3569223e240b08d5d4/pyCeterisParibus-0.5.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "441ff19fe023814c44f5edbd9be1058d", "sha256": "077cd3bdb2489e190d900db8410a1d27f296d3fc8557b6951c9319c6a4a1beb0" }, "downloads": -1, "filename": "pyCeterisParibus-0.5.1.tar.gz", "has_sig": false, "md5_digest": "441ff19fe023814c44f5edbd9be1058d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 70934, "upload_time": "2019-05-04T10:14:12", "url": "https://files.pythonhosted.org/packages/af/02/7aadb02192a648763fb3e4ccd526b6f757ab2e3af0b42bdc1e545e579739/pyCeterisParibus-0.5.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "0887a4e383ea18b2928c94f6a9f25cbb", "sha256": "0ed952ce8e0e6597b3d91ed1adbfe8e5b06017354257edaf2125d34f5d61e8d0" }, "downloads": -1, "filename": "pyCeterisParibus-0.5.1-py3-none-any.whl", "has_sig": false, "md5_digest": "0887a4e383ea18b2928c94f6a9f25cbb", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 76973, "upload_time": "2019-05-04T10:14:09", "url": "https://files.pythonhosted.org/packages/52/76/e0a107242695c42d5d99cc84ad0b6846244140c53d3569223e240b08d5d4/pyCeterisParibus-0.5.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "441ff19fe023814c44f5edbd9be1058d", "sha256": "077cd3bdb2489e190d900db8410a1d27f296d3fc8557b6951c9319c6a4a1beb0" }, "downloads": -1, "filename": "pyCeterisParibus-0.5.1.tar.gz", "has_sig": false, "md5_digest": "441ff19fe023814c44f5edbd9be1058d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 70934, "upload_time": "2019-05-04T10:14:12", "url": "https://files.pythonhosted.org/packages/af/02/7aadb02192a648763fb3e4ccd526b6f757ab2e3af0b42bdc1e545e579739/pyCeterisParibus-0.5.1.tar.gz" } ] }