{
    "info": {
        "author": "ayush1997",
        "author_email": "ayushkumarsingh97@gmail.com",
        "bugtrack_url": null,
        "classifiers": [
            "Development Status :: 3 - Alpha",
            "Intended Audience :: Developers",
            "Intended Audience :: Science/Research",
            "License :: OSI Approved :: MIT License",
            "Programming Language :: Python :: 2",
            "Programming Language :: Python :: 2.6",
            "Programming Language :: Python :: 2.7",
            "Programming Language :: Python :: 3",
            "Programming Language :: Python :: 3.3",
            "Programming Language :: Python :: 3.4",
            "Programming Language :: Python :: 3.5",
            "Topic :: Software Development :: Build Tools"
        ],
        "description": "visualize\\_ML\n=============\n\nvisualize\\_ML is a python package made to visualize some of the steps involved while dealing with a Machine Learning problem. It is build on libraries like matplotlib for visualization and sklearn,scipy for statistical computations.\n\nTable of content:\n~~~~~~~~~~~~~~~~~\n\n-  Requirements\n-  Install\n-  Let\u2019s code\n\n   -  explore module\n   -  relation module\n\n-  contribute\n-  Licence\n-  Copyright\n\nLet\u2019s Code\n----------\n\nWhen we start dealing with a Machine Learning problem some of the\ninitial steps involved are data exploration,analysis followed by feature\nselection.Below are the modules for these tasks.\n\n1) Data Exploration\n~~~~~~~~~~~~~~~~~~~\n\nAt this stage, we explore variables one by one using **Uni-variate\nAnalysis** which depends on whether the variable type is categorical or\ncontinuous .To deal with this we have the **explore** module.\n\n>>>explore module\n~~~~~~~~~~~~~~~~~~\n\n::\n\n    visualize_ML.explore.plot(data_input,categorical_name=[],drop=[],PLOT_COLUMNS_SIZE=4,bin_size=20,\n    bar_width=0.2,wspace=0.5,hspace=0.8)\n\n**Continuous Variables** : In case of continous variables it plots the\n*Histogram* for every variable and gives descriptive statistics for\nthem.\n\n**Categorical Variables** : In case on categorical variables with 2 or\nmore classes it plots the *Bar chart* for every variable and gives\ndescriptive statistics for them.\n\n+---------------------+-----------------+---------------------------------------+\n| Parameters          | Type            | Description                           |\n+=====================+=================+=======================================+\n| data\\_input         | Dataframe       | This is the input Dataframe with all  |\n|                     |                 | data.(Right now the input can be only |\n|                     |                 | be a dataframe input.)                
|\n+---------------------+-----------------+---------------------------------------+\n| categorical\\_name   | list (default=[ | Names of all categorical variable     |\n|                     | ])              | columns with more than 2 classes, to  |\n|                     |                 | distinguish them from the continuous  |\n|                     |                 | variables. An empty list implies that |\n|                     |                 | there are no categorical features     |\n|                     |                 | with more than 2 classes.             |\n+---------------------+-----------------+---------------------------------------+\n| drop                | list (default=[ | Names of columns to be dropped.       |\n|                     | ])              |                                       |\n+---------------------+-----------------+---------------------------------------+\n| PLOT\\_COLUMNS\\_SIZE | int (default=4) | Number of plots to display vertically |\n|                     |                 | in the display window. The row size   |\n|                     |                 | is adjusted accordingly.              |\n+---------------------+-----------------+---------------------------------------+\n| bin\\_size           | int             | Number of bins for the histogram      |\n|                     | (default=\u201cauto\u201d | displayed in the categorical vs       |\n|                     | )               | categorical category.                 |\n+---------------------+-----------------+---------------------------------------+\n| wspace              | float32         | Horizontal padding between subplots   |\n|                     | (default = 0.5) | on the display window.                |\n+---------------------+-----------------+---------------------------------------+\n| hspace              | float32         | Vertical padding between subplots on  |\n|                     | (default = 0.8) | the display window.                   |\n+---------------------+-----------------+---------------------------------------+\n\n**Code Snippet**\n\n.. code:: python\n\n    # The data set is taken from the famous Titanic data (Kaggle)\n\n    import pandas as pd\n    from visualize_ML import explore\n    df = pd.read_csv(\"dataset/train.csv\")\n\n    explore.plot(df,[\"Survived\",\"Pclass\",\"Sex\",\"SibSp\",\"Ticket\",\"Embarked\"],drop=[\"PassengerId\",\"Name\"])\n\n.. figure:: /images/explore1.png?raw=true\n   :alt: Optional Title\n\n   Graph made using explore module using matplotlib.\n\nSee the `dataset <https://www.kaggle.com/c/titanic/data>`__\n\n**Note:** While plotting, all rows with **NaN** values and columns\nwith **Character** values are removed (except if the values are True and False); only numeric data is plotted.\n\n2) Feature Selection\n~~~~~~~~~~~~~~~~~~~~\n\nThis is one of the challenging tasks to deal with in an ML problem. Here we\nhave to do **Bi-variate Analysis** to find out the relationship between\ntwo variables. Here, we look for association and disassociation between\nvariables at a pre-defined\n\n\nThe **relation** module helps in visualizing the analysis done on various\ncombinations of variables and seeing the relation between them.\n\n>>>relation module\n~~~~~~~~~~~~~~~~~~~\n\n::\n\n    visualize_ML.relation.plot(df,\"Sex\",[\"Survived\",\"Pclass\",\"Sex\",\"SibSp\",\"Ticket\",\"Embarked\"],drop=[\"PassengerId\",\"Name\"],bin_size=10)\n\n**Continuous vs Continuous variables:** For the Bi-variate analysis,\n*scatter plots* are made, as their pattern indicates the relationship\nbetween variables. 
To indicate the strength of the relationship between\nthem we use the correlation between them.\n\nThe graph displays the correlation coefficient along with other\ninformation.\n\n::\n\n    Correlation = Covariance(X,Y) / SQRT( Var(X)*Var(Y))\n\n-  -1: perfect negative linear correlation\n-  +1: perfect positive linear correlation\n-  0: no correlation\n\n**Categorical vs Categorical variables**: *Stacked Column Charts* are\nmade to visualize the relation. The **Chi-square test** is used to derive\nthe statistical significance of the relationship between the variables. It\nreturns the *probability* for the computed chi-square distribution with the\ndegrees of freedom. For more information on the Chi-square test see `this`_\n\nProbability of 0: It indicates that both categorical variables are\ndependent.\n\nProbability of 1: It shows that both variables are independent.\n\nThe graph displays the *p\\_value* along with other information. If it is\nless than **0.05** it states that the variables are dependent.\n\n**Categorical vs Continuous variables:** To explore the relation between\ncategorical and continuous variables, box plots are drawn at each level of\nthe categorical variables. If levels are small in number, it will not show\nthe statistical significance. The **ANOVA test** is used to derive the\nstatistical significance of the relationship between the variables.\n\nThe graph displays the *p\\_value* along with other information. 
If it is\nless than **0.05** it states that the variables are dependent.\n\nFor more information on the ANOVA test see\n`this <https://onlinecourses.science.psu.edu/stat200/book/export/html/66>`__\n\n+----------------+-----------+-------------------------------------------------+\n| Parameters     | Type      | Description                                     |\n+================+===========+=================================================+\n| data\\_input    | Dataframe | This is the input Dataframe with all            |\n|                |           | data. (Right now the input can only be a        |\n|                |           | dataframe input.)                               |\n+----------------+-----------+-------------------------------------------------+\n| target\\_name   | String    | The name of the target column.                  |\n+----------------+-----------+-------------------------------------------------+\n| categorical\\_n | list      | Names of all categorical variable columns with  |\n| ame            | (default= | more than 2 classes, to distinguish them from   |\n|                | [         | the continuous variables. An empty list implies |\n|                | ])        | that there are no categorical features with     |\n|                |           | more than 2 classes.                            |\n+----------------+-----------+-------------------------------------------------+\n| drop           | list      | Names of columns to be dropped.                 |\n|                | (default= |                                                 |\n|                | [ ])      |                                                 |\n+----------------+-----------+-------------------------------------------------+\n| PLOT\\_COLUMNS\\ | int       | Number of plots to display vertically in the    |\n| _SIZE          | (default= | display window. The row size is adjusted        |\n|                | 4)        | accordingly.                                    |\n+----------------+-----------+-------------------------------------------------+\n| bin\\_size      | int       | Number of bins for the histogram displayed in   |\n|                | (default= | the categorical vs categorical category.        |\n|                | \u201cauto\u201d)   |                                                 |\n+----------------+-----------+-------------------------------------------------+\n| wspace         | float32   | Horizontal padding between subplots on the      |\n|                | (default  | display window.                                 |\n|                | = 0.5)    |                                                 |\n+----------------+-----------+-------------------------------------------------+\n| hspace         | float32   | Vertical padding between subplots on the        |\n|                | (default  | display window.                                 |\n|                | = 0.8)    |                                                 |\n+----------------+-----------+-------------------------------------------------+\n\n**Code Snippet**\n\n.. code:: python\n\n    # The data set is taken from the famous Titanic data (Kaggle)\n    import pandas as pd\n    from visualize_ML import relation\n    df = pd.read_csv(\"dataset/train.csv\")\n\n    relation.plot(df,\"Survived\",[\"Survived\",\"Pclass\",\"Sex\",\"SibSp\",\"Ticket\",\"Embarked\"],drop=[\"PassengerId\",\"Name\"],bin_size=10)\n\n.. figure:: /images/relation1.png?raw=true\n   :alt: Optional Title\n\n   Graph made using relation module using matplotlib.\n\nSee the `dataset <https://www.kaggle.com/c/titanic/data>`__\n\n**Note:** While plotting, all rows with **NaN** values and columns\nwith **Non numeric** values are removed; only numeric data is\nplotted. Only a categorical target variable with string values is allowed.\n\nContribute\n----------\n\nIf you want to contribute and add a new feature, feel free to send a pull\nrequest `here`_\n\nThis project is still under development, so to report any bugs or request new features, head over to the Issues page.\n\nLicence\n-------\nLicensed under `The MIT License (MIT)`_.\n\nCopyright\n---------\nayush1997(c) 2016\n\n.. _here: https://github.com/ayush1997/visualize_ML\n.. _The MIT License (MIT): https://github.com/ayush1997/visualize_ML/blob/master/LICENSE.txt",
        "description_content_type": null,
        "docs_url": null,
        "download_url": "UNKNOWN",
        "downloads": {
            "last_day": -1,
            "last_month": -1,
            "last_week": -1
        },
        "home_page": "https://github.com/ayush1997/visualize_ML",
        "keywords": "visualization MachineLearning DataScience",
        "license": "MIT",
        "maintainer": null,
        "maintainer_email": null,
        "name": "visualize_ML",
        "package_url": "https://pypi.org/project/visualize_ML/",
        "platform": "UNKNOWN",
        "project_url": "https://pypi.org/project/visualize_ML/",
        "project_urls": {
            "Download": "UNKNOWN",
            "Homepage": "https://github.com/ayush1997/visualize_ML"
        },
        "release_url": "https://pypi.org/project/visualize_ML/0.2.2/",
        "requires_dist": null,
        "requires_python": null,
        "summary": "To visualize various processes involved in dealing with a Machine Learning problem.",
        "version": "0.2.2"
    },
    "last_serial": 2261969,
    "releases": {
        "0.1.1": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "56902a1b41990581c6967de1298ced19",
                    "sha256": "897b50e8920975a576a6b7e65e715328115ef3633639c209ce6e2d84e0cce7a5"
                },
                "downloads": -1,
                "filename": "visualize_ML-0.1.1.tar.gz",
                "has_sig": false,
                "md5_digest": "56902a1b41990581c6967de1298ced19",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 8226,
                "upload_time": "2016-07-30T19:50:40",
                "url": "https://files.pythonhosted.org/packages/24/80/81911be6625f71d2d8a5ac36ed0bd697c3c627d496fff68e68a7086aa26d/visualize_ML-0.1.1.tar.gz"
            }
        ],
        "0.1.2": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "539763179501649f5bed6a28696ff8d0",
                    "sha256": "0a7db70519cd51b6b39a129566ce2d05a24b6cab854bbeb60ff8773febf700ff"
                },
                "downloads": -1,
                "filename": "visualize_ML-0.1.2.tar.gz",
                "has_sig": false,
                "md5_digest": "539763179501649f5bed6a28696ff8d0",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 8219,
                "upload_time": "2016-07-30T20:23:11",
                "url": "https://files.pythonhosted.org/packages/13/cd/274ff23eedc3b28d5924845c89bd784b06ba418a27e34403a8840c35249a/visualize_ML-0.1.2.tar.gz"
            }
        ],
        "0.2.2": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "18e75709e20fd80dad2465b32c716641",
                    "sha256": "1ce5cdfa2b5e4dd87c052923b90e0878d49204733d97cb2a2f987b3e71b47fff"
                },
                "downloads": -1,
                "filename": "visualize_ML-0.2.2.tar.gz",
                "has_sig": false,
                "md5_digest": "18e75709e20fd80dad2465b32c716641",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 10936,
                "upload_time": "2016-08-04T13:37:09",
                "url": "https://files.pythonhosted.org/packages/b7/28/392ffde9e70595589d0e1bb4733317104275b56c7e3414eb8ae4631e1f39/visualize_ML-0.2.2.tar.gz"
            }
        ]
    },
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "18e75709e20fd80dad2465b32c716641",
                "sha256": "1ce5cdfa2b5e4dd87c052923b90e0878d49204733d97cb2a2f987b3e71b47fff"
            },
            "downloads": -1,
            "filename": "visualize_ML-0.2.2.tar.gz",
            "has_sig": false,
            "md5_digest": "18e75709e20fd80dad2465b32c716641",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 10936,
            "upload_time": "2016-08-04T13:37:09",
            "url": "https://files.pythonhosted.org/packages/b7/28/392ffde9e70595589d0e1bb4733317104275b56c7e3414eb8ae4631e1f39/visualize_ML-0.2.2.tar.gz"
        }
    ]
}