{ "info": { "author": "Kevin Arvai", "author_email": "arvkevi@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Science/Research", "Programming Language :: Python :: 3", "Topic :: Scientific/Engineering :: Information Analysis" ], "description": "# disarray\n[![Build Status](https://travis-ci.com/arvkevi/disarray.svg?branch=master)](https://travis-ci.com/arvkevi/disarray)\n[![codecov](https://codecov.io/gh/arvkevi/disarray/branch/master/graph/badge.svg)](https://codecov.io/gh/arvkevi/disarray)\n\nThis package calculates metrics derived from a confusion matrix and makes them directly accessible from a pandas \nDataFrame. Simply install and import `disarray`. \n\n**Why disarray?** \nWorking with a [confusion matrix](https://en.wikipedia.org/wiki/Confusion_matrix) is an everyday occurrence for most \ndata science projects. Sometimes, a data scientist is responsible for generating a confusion matrix using machine \nlearning libraries like [scikit-learn](https://scikit-learn.org/stable/). But it's not uncommon to work with confusion \nmatrices directly as [pandas](https://pandas.pydata.org/) DataFrames. \n\nSince `pandas` version `0.23.0`, users can easily\n[register custom accessors](https://pandas.pydata.org/pandas-docs/stable/development/extending.html#extending-pandas),\n which is how `disarray` is implemented. 
This makes accessing confusion matrix metrics as easy as:\n```python\n>>> import pandas as pd\n>>> df = pd.DataFrame([[18, 1], [0, 1]])\n>>> import disarray\n>>> df.da.sensitivity\n0 0.947368\n1 1.000000\ndtype: float64\n```\n\n## Table of contents\n- [Installation](#installation)\n- [Usage](#usage)\n * [sample counts](#sample-counts)\n * [export metrics](#export-metrics)\n * [multi-class classification](#multi-class-classification)\n * [supported metrics](#supported-metrics)\n- [Contributing](#contributing)\n\n## Installation\n**Install using pip**\n```bash\n$ pip install disarray\n```\n\n**Clone from GitHub**\n```bash\n$ git clone https://github.com/arvkevi/disarray.git\n$ python setup.py install\n```\n\n## Usage\nThe `disarray` package is intended to be used like a `pandas` attribute or method. `disarray` is registered as \na `pandas` extension under `da`. For a DataFrame named `df`, access the library using `df.da.`.\n\nTo understand the input and usage for `disarray`, build an example confusion matrix for a **binary classification**\n problem from scratch with `scikit-learn`. 
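Every metric `disarray` exposes reduces to arithmetic over four per-class counts: TP, FP, FN, and TN. As a plain-Python sanity check of the quick-start example above (no `disarray` required; the variable names are illustrative):

```python
# Per-class counts from the 2x2 confusion matrix [[18, 1], [0, 1]],
# using sklearn's convention: rows = true labels, columns = predicted labels.
cm = [[18, 1], [0, 1]]
total = sum(sum(row) for row in cm)

sensitivity, specificity = {}, {}
for k in range(len(cm)):
    tp = cm[k][k]                        # diagonal entry
    fn = sum(cm[k]) - tp                 # rest of the true-class row
    fp = sum(row[k] for row in cm) - tp  # rest of the predicted-class column
    tn = total - tp - fn - fp            # everything else
    sensitivity[k] = tp / (tp + fn)
    specificity[k] = tn / (tn + fp)

print(sensitivity)  # class 0: 18/19 ~ 0.947368, class 1: 1.0
```

The class-0 sensitivity of 18/19 matches the `df.da.sensitivity` output above.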
\n(You can install the packages you need to run the demo with: `pip install -r requirements.demo.txt`)\n\n```python\nfrom sklearn import svm, datasets\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.metrics import confusion_matrix\n# Generate a random binary classification dataset\nX, y = datasets.make_classification(n_classes=2, random_state=42)\nX_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)\n# fit and predict an SVM\nclassifier = svm.SVC(kernel='linear', C=0.01)\ny_pred = classifier.fit(X_train, y_train).predict(X_test)\n\ncm = confusion_matrix(y_test, y_pred)\nprint(cm)\n[[13 2]\n [ 0 10]]\n```\n\nUsing `disarray` is as easy as importing it and instantiating a DataFrame object from a **square** array of **non-negative** \nintegers.\n\n```python\nimport disarray\nimport pandas as pd\n\ndf = pd.DataFrame(cm)\nprint(df.da.sensitivity)\n0 0.866667\n1 1.000000\n```\n\n### Sample Counts\n`disarray` stores per-class sample counts of true positives, false positives, false negatives, and true negatives. \nEach of these is stored as a capitalized abbreviation: `TP`, `FP`, `FN`, and `TN`.\n\n```python\ndf.da.TP\n```\n```python\n0 13\n1 10\ndtype: int64\n```\n\n### Export Metrics\nUse `df.da.export_metrics()` to store and/or visualize many common performance metrics in a new `pandas` DataFrame \nobject. Use the `metrics_to_include=` argument to pass a list of metrics defined in `disarray/metrics.py` (the default is \n`__all_metrics__`).\n\n```python\ndf.da.export_metrics(metrics_to_include=['precision', 'recall', 'f1'])\n```\n| | 0 | 1 | micro-average |\n|-----------|----------|----------|-----------------|\n| precision | 1 | 0.833333 | 0.92 |\n| recall | 0.866667 | 1 | 0.92 |\n| f1 | 0.928571 | 0.909091 | 0.92 |\n\n### Multi-Class Classification\n`disarray` also works with multi-class confusion matrices. Try it out on the iris dataset. 
Notice that the\n DataFrame is instantiated with an `index` and `columns` here, though neither is required.\n\n```python\n# load the iris dataset\niris = datasets.load_iris()\nX = iris.data\ny = iris.target\nclass_names = iris.target_names\n# split the training and testing data\nX_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)\n# train and fit an SVM\nclassifier = svm.SVC(kernel='linear', C=0.01)\ny_pred = classifier.fit(X_train, y_train).predict(X_test)\ncm = confusion_matrix(y_test, y_pred)\n\n# Instantiate the confusion matrix DataFrame with index and columns\ndf = pd.DataFrame(cm, index=class_names, columns=class_names)\nprint(df)\n```\n| | setosa | versicolor | virginica |\n|------------|----------|--------------|-------------|\n| setosa | 13 | 0 | 0 |\n| versicolor | 0 | 10 | 6 |\n| virginica | 0 | 0 | 9 |\n\n`disarray` can provide per-class metrics:\n\n```python\ndf.da.sensitivity\n```\n```python\nsetosa 1.000\nversicolor 0.625\nvirginica 1.000\ndtype: float64\n```\nA single class can be accessed with familiar bracket indexing.\n\n```python\ndf.da.sensitivity['setosa']\n```\n```python\n1.0\n```\nCurrently, a [micro-average](https://datascience.stackexchange.com/a/24051/16855) is supported for both binary and\n multi-class classification confusion matrices. 
(It is most meaningful in the multi-class case.)\n```python\ndf.da.micro_sensitivity\n```\n```python\n0.8421052631578947\n```\nA micro-average pools the per-class counts before dividing: micro-sensitivity is sum(TP) / (sum(TP) + sum(FN)), here 32 / 38 &#8776; 0.8421.\n\nFinally, a DataFrame can be exported with selected metrics.\n```python\ndf.da.export_metrics(metrics_to_include=['sensitivity', 'specificity', 'f1'])\n```\n\n| | setosa | versicolor | virginica | micro-average |\n|-------------|----------|--------------|-------------|-----------------|\n| sensitivity | 1 | 0.625 | 1 | 0.842105 |\n| specificity | 1 | 1 | 0.793103 | 0.921053 |\n| f1 | 1 | 0.769231 | 0.75 | 0.842105 |\n\n### Supported Metrics\n```python\n'accuracy',\n'f1',\n'false_discovery_rate',\n'false_negative_rate',\n'false_positive_rate',\n'negative_predictive_value',\n'positive_predictive_value',\n'precision',\n'recall',\n'sensitivity',\n'specificity',\n'true_negative_rate',\n'true_positive_rate',\n```\nMicro-averaged versions of each metric are also available, e.g. `df.da.micro_recall`.\n\n## Contributing\n\nContributions are welcome; please refer to [CONTRIBUTING](https://github.com/arvkevi/disarray/blob/master/CONTRIBUTING.md) \nto learn more about how to contribute.\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "https://github.com/arvkevi/disarray/tarball/0.1.0", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/arvkevi/disarray", "keywords": "machine learning-supervised learning", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "disarray", "package_url": "https://pypi.org/project/disarray/", "platform": "", "project_url": "https://pypi.org/project/disarray/", "project_urls": { "Download": "https://github.com/arvkevi/disarray/tarball/0.1.0", "Homepage": "https://github.com/arvkevi/disarray" }, "release_url": "https://pypi.org/project/disarray/0.1.0/", "requires_dist": [ "pandas (>=0.23.0)", "numpy (>=0.14.2)" ], "requires_python": "", "summary": "Calculate confusion matrix metrics from your pandas 
DataFrame", "version": "0.1.0" }, "last_serial": 5936592, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "md5": "9aa7f07d3fce28b82cda8405a66d2905", "sha256": "806f4826bf33a2e3d115737c08355ee6fc34041f438ebf41ec34e1113b9f46d3" }, "downloads": -1, "filename": "disarray-0.1.0-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "9aa7f07d3fce28b82cda8405a66d2905", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 6674, "upload_time": "2019-10-07T01:50:29", "url": "https://files.pythonhosted.org/packages/0c/c1/9468fddcc25f930c7300dc45bd2f2bead83504ff62719ad7c5f41898aa0d/disarray-0.1.0-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "377df996739c5aa0d8bfaa0ee98d2034", "sha256": "397588be2d493b68b0c438dd0bd85a0fe7207870877281e2905963123a2c7817" }, "downloads": -1, "filename": "disarray-0.1.0.tar.gz", "has_sig": false, "md5_digest": "377df996739c5aa0d8bfaa0ee98d2034", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 6117, "upload_time": "2019-10-07T01:50:32", "url": "https://files.pythonhosted.org/packages/8a/fb/6fec6dfaed0fa749e2c46467abf46604c2ee4e1ed4787f9c26d98cc2a1e6/disarray-0.1.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "9aa7f07d3fce28b82cda8405a66d2905", "sha256": "806f4826bf33a2e3d115737c08355ee6fc34041f438ebf41ec34e1113b9f46d3" }, "downloads": -1, "filename": "disarray-0.1.0-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "9aa7f07d3fce28b82cda8405a66d2905", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 6674, "upload_time": "2019-10-07T01:50:29", "url": "https://files.pythonhosted.org/packages/0c/c1/9468fddcc25f930c7300dc45bd2f2bead83504ff62719ad7c5f41898aa0d/disarray-0.1.0-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "377df996739c5aa0d8bfaa0ee98d2034", "sha256": "397588be2d493b68b0c438dd0bd85a0fe7207870877281e2905963123a2c7817" }, "downloads": -1, 
"filename": "disarray-0.1.0.tar.gz", "has_sig": false, "md5_digest": "377df996739c5aa0d8bfaa0ee98d2034", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 6117, "upload_time": "2019-10-07T01:50:32", "url": "https://files.pythonhosted.org/packages/8a/fb/6fec6dfaed0fa749e2c46467abf46604c2ee4e1ed4787f9c26d98cc2a1e6/disarray-0.1.0.tar.gz" } ] }