{ "info": { "author": "Kevin Arvai", "author_email": "arvkevi@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Science/Research", "Programming Language :: Python :: 3", "Topic :: Scientific/Engineering :: Information Analysis" ], "description": "# disarray\n[![Build Status](https://travis-ci.com/arvkevi/disarray.svg?branch=master)](https://travis-ci.com/arvkevi/disarray)\n[![codecov](https://codecov.io/gh/arvkevi/disarray/branch/master/graph/badge.svg)](https://codecov.io/gh/arvkevi/disarray)\n\nThis package calculates metrics derived from a confusion matrix and makes them directly accessible from a pandas \nDataFrame. Simply install and import `disarray`. \n\n**Why disarray?** \nWorking with a [confusion matrix](https://en.wikipedia.org/wiki/Confusion_matrix) is an everyday occurrence for most \ndata science projects. Sometimes, a data scientist is responsible for generating a confusion matrix using machine \nlearning libraries like [scikit-learn](https://scikit-learn.org/stable/). But it's not uncommon to work with confusion \nmatrices directly as [pandas](https://pandas.pydata.org/) DataFrames. \n\nSince `pandas` version `0.23.0`, users can easily\n[register custom accessors](https://pandas.pydata.org/pandas-docs/stable/development/extending.html#extending-pandas),\n which is how `disarray` is implemented. 
This makes accessing confusion matrix metrics as easy as:\n```python\n>>> import pandas as pd\n>>> df = pd.DataFrame([[18, 1], [0, 1]])\n>>> import disarray\n>>> df.da.sensitivity\n0 0.947368\n1 1.000000\ndtype: float64\n```\n\n## Table of contents\n- [Installation](#installation)\n- [Usage](#usage)\n * [sample counts](#sample-counts)\n * [export metrics](#export-metrics)\n * [multi-class classification](#multi-class-classification)\n * [supported metrics](#supported-metrics)\n- [Contributing](#contributing)\n\n## Installation\n**Install using pip**\n```bash\n$ pip install disarray\n```\n\n**Clone from GitHub**\n```bash\n$ git clone https://github.com/arvkevi/disarray.git\n$ python setup.py install\n```\n\n## Usage\nThe `disarray` package is intended to be used like a `pandas` attribute or method. `disarray` is registered as \na `pandas` extension under `da`. For a DataFrame named `df`, access the library using `df.da.`.\n\nTo understand the input and usage for `disarray`, build an example confusion matrix for a **binary classification**\n problem from scratch with `scikit-learn`. 
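Every metric `disarray` exposes reduces to arithmetic over four per-class counts: TP, FP, FN, and TN. As a plain-Python sanity check of the quick-start example above (no `disarray` required; the variable names are illustrative):

```python
# Per-class counts from the 2x2 confusion matrix [[18, 1], [0, 1]],
# using sklearn's convention: rows = true labels, columns = predicted labels.
cm = [[18, 1], [0, 1]]
total = sum(sum(row) for row in cm)

sensitivity, specificity = {}, {}
for k in range(len(cm)):
    tp = cm[k][k]                        # diagonal entry
    fn = sum(cm[k]) - tp                 # rest of the true-class row
    fp = sum(row[k] for row in cm) - tp  # rest of the predicted-class column
    tn = total - tp - fn - fp            # everything else
    sensitivity[k] = tp / (tp + fn)
    specificity[k] = tn / (tn + fp)

print(sensitivity)  # class 0: 18/19 ~ 0.947368, class 1: 1.0
```

The class-0 sensitivity of 18/19 matches the `df.da.sensitivity` output above.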
\n(You can install the packages you need to run the demo with: `pip install -r requirements.demo.txt`)\n\n```python\nfrom sklearn import svm, datasets\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.metrics import confusion_matrix\n# Generate a random binary classification dataset\nX, y = datasets.make_classification(n_classes=2, random_state=42)\nX_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)\n# fit and predict an SVM\nclassifier = svm.SVC(kernel='linear', C=0.01)\ny_pred = classifier.fit(X_train, y_train).predict(X_test)\n\ncm = confusion_matrix(y_test, y_pred)\nprint(cm)\n[[13 2]\n [ 0 10]]\n```\n\nUsing `disarray` is as easy as importing it and instantiating a DataFrame object from a **square** array of **non-negative** \nintegers.\n\n```python\nimport disarray\nimport pandas as pd\n\ndf = pd.DataFrame(cm)\nprint(df.da.sensitivity)\n0 0.866667\n1 1.000000\n```\n\n### Sample Counts\n`disarray` stores per-class sample counts of true positives, false positives, false negatives, and true negatives. \nEach of these is stored as a capitalized abbreviation: `TP`, `FP`, `FN`, and `TN`.\n\n```python\ndf.da.TP\n```\n```python\n0 13\n1 10\ndtype: int64\n```\n\n### Export Metrics\nUse `df.da.export_metrics()` to store and/or visualize many common performance metrics in a new `pandas` DataFrame \nobject. Use the `metrics_to_include=` argument to pass a list of metrics defined in `disarray/metrics.py` (the default is \n`__all_metrics__`).\n\n```python\ndf.da.export_metrics(metrics_to_include=['precision', 'recall', 'f1'])\n```\n| | 0 | 1 | micro-average |\n|-----------|----------|----------|-----------------|\n| precision | 1 | 0.833333 | 0.92 |\n| recall | 0.866667 | 1 | 0.92 |\n| f1 | 0.928571 | 0.909091 | 0.92 |\n\n### Multi-Class Classification\n`disarray` also works with multi-class confusion matrices. Try it out on the iris dataset. 
Notice that the\n DataFrame is instantiated with an `index` and `columns` here, though neither is required.\n\n```python\n# load the iris dataset\niris = datasets.load_iris()\nX = iris.data\ny = iris.target\nclass_names = iris.target_names\n# split the training and testing data\nX_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)\n# train and fit an SVM\nclassifier = svm.SVC(kernel='linear', C=0.01)\ny_pred = classifier.fit(X_train, y_train).predict(X_test)\ncm = confusion_matrix(y_test, y_pred)\n\n# Instantiate the confusion matrix DataFrame with index and columns\ndf = pd.DataFrame(cm, index=class_names, columns=class_names)\nprint(df)\n```\n| | setosa | versicolor | virginica |\n|------------|----------|--------------|-------------|\n| setosa | 13 | 0 | 0 |\n| versicolor | 0 | 10 | 6 |\n| virginica | 0 | 0 | 9 |\n\n`disarray` can provide per-class metrics:\n\n```python\ndf.da.sensitivity\n```\n```python\nsetosa 1.000\nversicolor 0.625\nvirginica 1.000\ndtype: float64\n```\nA single class can be accessed with familiar bracket indexing.\n\n```python\ndf.da.sensitivity['setosa']\n```\n```python\n1.0\n```\nCurrently, a [micro-average](https://datascience.stackexchange.com/a/24051/16855) is supported for both binary and\n multi-class classification confusion matrices. 
(It is most meaningful in the multi-class case.)\n```python\ndf.da.micro_sensitivity\n```\n```python\n0.8421052631578947\n```\nA micro-average pools the per-class counts before dividing: micro-sensitivity is sum(TP) / (sum(TP) + sum(FN)), here 32 / 38 &#8776; 0.8421.\n\nFinally, a DataFrame can be exported with selected metrics.\n```python\ndf.da.export_metrics(metrics_to_include=['sensitivity', 'specificity', 'f1'])\n```\n\n| | setosa | versicolor | virginica | micro-average |\n|-------------|----------|--------------|-------------|-----------------|\n| sensitivity | 1 | 0.625 | 1 | 0.842105 |\n| specificity | 1 | 1 | 0.793103 | 0.921053 |\n| f1 | 1 | 0.769231 | 0.75 | 0.842105 |\n\n### Supported Metrics\n```python\n'accuracy',\n'f1',\n'false_discovery_rate',\n'false_negative_rate',\n'false_positive_rate',\n'negative_predictive_value',\n'positive_predictive_value',\n'precision',\n'recall',\n'sensitivity',\n'specificity',\n'true_negative_rate',\n'true_positive_rate',\n```\nMicro-averaged versions of each metric are also available, e.g. `df.da.micro_recall`.\n\n## Contributing\n\nContributions are welcome; please refer to [CONTRIBUTING](https://github.com/arvkevi/disarray/blob/master/CONTRIBUTING.md) \nto learn more about how to contribute.\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "https://github.com/arvkevi/disarray/tarball/0.1.0", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/arvkevi/disarray", "keywords": "machine learning-supervised learning", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "disarray", "package_url": "https://pypi.org/project/disarray/", "platform": "", "project_url": "https://pypi.org/project/disarray/", "project_urls": { "Download": "https://github.com/arvkevi/disarray/tarball/0.1.0", "Homepage": "https://github.com/arvkevi/disarray" }, "release_url": "https://pypi.org/project/disarray/0.1.0/", "requires_dist": [ "pandas (>=0.23.0)", "numpy (>=0.14.2)" ], "requires_python": "", "summary": "Calculate confusion matrix metrics from your pandas 
DataFrame", "version": "0.1.0" }, "last_serial": 5936592, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "md5": "9aa7f07d3fce28b82cda8405a66d2905", "sha256": "806f4826bf33a2e3d115737c08355ee6fc34041f438ebf41ec34e1113b9f46d3" }, "downloads": -1, "filename": "disarray-0.1.0-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "9aa7f07d3fce28b82cda8405a66d2905", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 6674, "upload_time": "2019-10-07T01:50:29", "url": "https://files.pythonhosted.org/packages/0c/c1/9468fddcc25f930c7300dc45bd2f2bead83504ff62719ad7c5f41898aa0d/disarray-0.1.0-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "377df996739c5aa0d8bfaa0ee98d2034", "sha256": "397588be2d493b68b0c438dd0bd85a0fe7207870877281e2905963123a2c7817" }, "downloads": -1, "filename": "disarray-0.1.0.tar.gz", "has_sig": false, "md5_digest": "377df996739c5aa0d8bfaa0ee98d2034", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 6117, "upload_time": "2019-10-07T01:50:32", "url": "https://files.pythonhosted.org/packages/8a/fb/6fec6dfaed0fa749e2c46467abf46604c2ee4e1ed4787f9c26d98cc2a1e6/disarray-0.1.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "9aa7f07d3fce28b82cda8405a66d2905", "sha256": "806f4826bf33a2e3d115737c08355ee6fc34041f438ebf41ec34e1113b9f46d3" }, "downloads": -1, "filename": "disarray-0.1.0-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "9aa7f07d3fce28b82cda8405a66d2905", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 6674, "upload_time": "2019-10-07T01:50:29", "url": "https://files.pythonhosted.org/packages/0c/c1/9468fddcc25f930c7300dc45bd2f2bead83504ff62719ad7c5f41898aa0d/disarray-0.1.0-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "377df996739c5aa0d8bfaa0ee98d2034", "sha256": "397588be2d493b68b0c438dd0bd85a0fe7207870877281e2905963123a2c7817" }, "downloads": -1, 
"filename": "disarray-0.1.0.tar.gz", "has_sig": false, "md5_digest": "377df996739c5aa0d8bfaa0ee98d2034", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 6117, "upload_time": "2019-10-07T01:50:32", "url": "https://files.pythonhosted.org/packages/8a/fb/6fec6dfaed0fa749e2c46467abf46604c2ee4e1ed4787f9c26d98cc2a1e6/disarray-0.1.0.tar.gz" } ] }