{ "info": { "author": "ayush1997", "author_email": "ayushkumarsingh97@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Developers", "Intended Audience :: Science/Research", "License :: OSI Approved :: MIT License", "Programming Language :: Python :: 2", "Programming Language :: Python :: 2.6", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.3", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5", "Topic :: Software Development :: Build Tools" ], "description": "visualize\\_ML\n=============\n\nvisualize\\_ML is a Python package made to visualize some of the steps involved in dealing with a Machine Learning problem. It is built on libraries like matplotlib for visualization and sklearn and scipy for statistical computations.\n\nTable of contents:\n~~~~~~~~~~~~~~~~~~\n\n- Requirements\n- Install\n- Let\u2019s code\n\n - explore module\n - relation module\n\n- Contribute\n- Licence\n- Copyright\n\nLet\u2019s Code\n----------\n\nWhen we start dealing with a Machine Learning problem, some of the\ninitial steps involved are data exploration and analysis, followed by feature\nselection. Below are the modules for these tasks.\n\n1) Data Exploration\n~~~~~~~~~~~~~~~~~~~\n\nAt this stage, we explore variables one by one using **Uni-variate\nAnalysis**, which depends on whether the variable type is categorical or\ncontinuous. To deal with this, we have the **explore** module.\n\n>>>explore module\n~~~~~~~~~~~~~~~~~~\n\n::\n\n visualize_ML.explore.plot(data_input,categorical_name=[],drop=[],PLOT_COLUMNS_SIZE=4,bin_size=20,\n bar_width=0.2,wspace=0.5,hspace=0.8)\n\n**Continuous Variables**: For continuous variables, it plots a\n*Histogram* for every variable and gives descriptive statistics for\nthem.\n\n**Categorical Variables**: For categorical variables with 2 or\nmore classes, it plots a *Bar chart* for every 
variable and gives\ndescriptive statistics for them.\n\n+---------------------+-------------------+---------------------------------------+\n| Parameters          | Type              | Description                           |\n+=====================+===================+=======================================+\n| data\\_input         | DataFrame         | The input DataFrame containing all    |\n|                     |                   | the data. Currently, the input can    |\n|                     |                   | only be a DataFrame.                  |\n+---------------------+-------------------+---------------------------------------+\n| categorical\\_name   | list (default=[]) | Names of all categorical variable     |\n|                     |                   | columns with more than 2 classes, to  |\n|                     |                   | distinguish them from the continuous  |\n|                     |                   | variables. An empty list implies that |\n|                     |                   | there are no categorical features     |\n|                     |                   | with more than 2 classes.             |\n+---------------------+-------------------+---------------------------------------+\n| drop                | list (default=[]) | Names of columns to be dropped.       |\n+---------------------+-------------------+---------------------------------------+\n| PLOT\\_COLUMNS\\_SIZE | int (default=4)   | Number of plots to display vertically |\n|                     |                   | in the display window. The row size   |\n|                     |                   | is adjusted accordingly.              |\n+---------------------+-------------------+---------------------------------------+\n| bin\\_size           | int               | Number of bins for the histogram      |\n|                     | (default=\u201cauto\u201d)  | displayed in the categorical vs       |\n|                     |                   | categorical category.                 |\n+---------------------+-------------------+---------------------------------------+\n| wspace              | float32           | Horizontal padding between subplots   |\n|                     | (default=0.5)     | in the display window.                |\n+---------------------+-------------------+---------------------------------------+\n| hspace              | float32           | Vertical padding between subplots     |\n|                     | (default=0.8)     | in the display window.                |\n+---------------------+-------------------+---------------------------------------+\n\n**Code Snippet**\n\n.. 
code :: python\n\n # The data set is taken from the famous Titanic data (Kaggle)\n\n import pandas as pd\n from visualize_ML import explore\n df = pd.read_csv(\"dataset/train.csv\")\n\n explore.plot(df,[\"Survived\",\"Pclass\",\"Sex\",\"SibSp\",\"Ticket\",\"Embarked\"],drop=[\"PassengerId\",\"Name\"])\n\n.. figure:: /images/explore1.png?raw=true\n :alt: Plots generated by the explore module\n\n Graph made with the explore module using matplotlib.\n\nSee the `dataset <https://www.kaggle.com/c/titanic/data>`__.\n\n**Note:** While plotting, all rows with **NaN** values and all columns\nwith **character** (non-numeric) values are removed, except columns whose values are True and False; only numeric data is plotted.\n\n2) Feature Selection\n~~~~~~~~~~~~~~~~~~~~\n\nThis is one of the most challenging parts of an ML task. Here we\nhave to do **Bi-variate Analysis** to find out the relationship between\ntwo variables. Here, we look for association and disassociation between\nvariables at a pre-defined significance level.\n\nThe **relation** module helps in visualizing the analysis done on various\ncombinations of variables and in seeing the relation between them.\n\n>>>relation module\n~~~~~~~~~~~~~~~~~~~\n\n::\n\n visualize_ML.relation.plot(df,\"Sex\",[\"Survived\",\"Pclass\",\"Sex\",\"SibSp\",\"Ticket\",\"Embarked\"],drop=[\"PassengerId\",\"Name\"],bin_size=10)\n\n**Continuous vs Continuous variables:** For the Bi-variate analysis,\n*scatter plots* are made, as their pattern indicates the relationship\nbetween variables. 
To indicate the strength of the relationship between\nthem, we use the correlation coefficient.\n\nThe graph displays the correlation coefficient along with other\ninformation.\n\n::\n\n Correlation = Covariance(X, Y) / SQRT(Var(X) * Var(Y))\n\n- -1: perfect negative linear correlation\n- +1: perfect positive linear correlation\n- 0: no linear correlation\n\n**Categorical vs Categorical variables**: *Stacked Column Charts* are\nmade to visualize the relation. The **Chi-square test** is used to derive\nthe statistical significance of the relationship between the variables. It\nreturns the *probability* (p-value) of the computed chi-square statistic under the\nchi-square distribution with the corresponding degrees of freedom. For more information on the Chi-square test, see `this`_\n\nProbability close to 0: strong evidence that the two categorical variables are\ndependent.\n\nProbability close to 1: no evidence of dependence; the data are consistent with the two variables being independent.\n\nThe graph displays the *p\\_value* along with other information. If it is\nless than **0.05**, the variables are considered dependent.\n\n**Categorical vs Continuous variables:** To explore the relation between\ncategorical and continuous variables, box plots are drawn at each level of the\ncategorical variable. If the levels are few in number, the box plots alone will not show\nthe statistical significance, so the **ANOVA test** is used to derive the\nstatistical significance of the relationship between the variables.\n\nThe graph displays the *p\\_value* along with other information. If it is\nless than **0.05**, the variables are considered dependent.\n\nFor more information on the ANOVA test, see\n`this `__\n\n+---------------------+-------------------+---------------------------------------+\n| Parameters          | Type              | Description                           |\n+=====================+===================+=======================================+\n| data\\_input         | DataFrame         | The input DataFrame containing all    |\n|                     |                   | the data. Currently, the input can    |\n|                     |                   | only be a DataFrame.                  |\n+---------------------+-------------------+---------------------------------------+\n| target\\_name        | String            | The name of the target column.        |\n+---------------------+-------------------+---------------------------------------+\n| categorical\\_name   | list (default=[]) | Names of all categorical variable     |\n|                     |                   | columns with more than 2 classes, to  |\n|                     |                   | distinguish them from the continuous  |\n|                     |                   | variables. An empty list implies that |\n|                     |                   | there are no categorical features     |\n|                     |                   | with more than 2 classes.             |\n+---------------------+-------------------+---------------------------------------+\n| drop                | list (default=[]) | Names of columns to be dropped.       |\n+---------------------+-------------------+---------------------------------------+\n| PLOT\\_COLUMNS\\_SIZE | int (default=4)   | Number of plots to display vertically |\n|                     |                   | in the display window. The row size   |\n|                     |                   | is adjusted accordingly.              |\n+---------------------+-------------------+---------------------------------------+\n| bin\\_size           | int               | Number of bins for the histogram      |\n|                     | (default=\u201cauto\u201d)  | displayed in the categorical vs       |\n|                     |                   | categorical category.                 |\n+---------------------+-------------------+---------------------------------------+\n| wspace              | float32           | Horizontal padding between subplots   |\n|                     | (default=0.5)     | in the display window.                |\n+---------------------+-------------------+---------------------------------------+\n| hspace              | float32           | Vertical padding between subplots     |\n|                     | (default=0.8)     | in the display window.                |\n+---------------------+-------------------+---------------------------------------+\n\n**Code Snippet**\n\n.. code :: python\n\n # The data set is taken from the famous Titanic data (Kaggle)\n import pandas as pd\n from visualize_ML import relation\n df = pd.read_csv(\"dataset/train.csv\")\n\n relation.plot(df,\"Survived\",[\"Survived\",\"Pclass\",\"Sex\",\"SibSp\",\"Ticket\",\"Embarked\"],drop=[\"PassengerId\",\"Name\"],bin_size=10)\n\n.. 
figure:: /images/relation1.png?raw=true\n :alt: Plots generated by the relation module\n\n Graph made with the relation module using matplotlib.\n\nSee the `dataset <https://www.kaggle.com/c/titanic/data>`__.\n\n**Note:** While plotting, all rows with **NaN** values and all columns\nwith **non-numeric** values are removed; only numeric data is\nplotted. Only categorical target variables with string values are allowed.\n\nContribute\n----------\n\nIf you want to contribute or add a new feature, feel free to send a pull\nrequest `here`_.\n\nThis project is still under development, so to report bugs or request new features, head over to the Issues page.\n\nLicence\n-------\nLicensed under `The MIT License (MIT)`_.\n\nCopyright\n---------\nayush1997(c) 2016\n\n.. _here: https://github.com/ayush1997/visualize_ML\n.. _The MIT License (MIT): https://github.com/ayush1997/visualize_ML/blob/master/LICENSE.txt", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/ayush1997/visualize_ML", "keywords": "visualization MachineLearning DataScience", "license": "MIT", "maintainer": null, "maintainer_email": null, "name": "visualize_ML", "package_url": "https://pypi.org/project/visualize_ML/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/visualize_ML/", "project_urls": { "Download": "UNKNOWN", "Homepage": "https://github.com/ayush1997/visualize_ML" }, "release_url": "https://pypi.org/project/visualize_ML/0.2.2/", "requires_dist": null, "requires_python": null, "summary": "To visualize various processes involved in dealing with a Machine Learning problem.", "version": "0.2.2" }, "last_serial": 2261969, "releases": { "0.1.1": [ { "comment_text": "", "digests": { "md5": "56902a1b41990581c6967de1298ced19", "sha256": "897b50e8920975a576a6b7e65e715328115ef3633639c209ce6e2d84e0cce7a5" }, "downloads": -1, "filename": "visualize_ML-0.1.1.tar.gz", "has_sig": false, "md5_digest": 
"56902a1b41990581c6967de1298ced19", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8226, "upload_time": "2016-07-30T19:50:40", "url": "https://files.pythonhosted.org/packages/24/80/81911be6625f71d2d8a5ac36ed0bd697c3c627d496fff68e68a7086aa26d/visualize_ML-0.1.1.tar.gz" } ], "0.1.2": [ { "comment_text": "", "digests": { "md5": "539763179501649f5bed6a28696ff8d0", "sha256": "0a7db70519cd51b6b39a129566ce2d05a24b6cab854bbeb60ff8773febf700ff" }, "downloads": -1, "filename": "visualize_ML-0.1.2.tar.gz", "has_sig": false, "md5_digest": "539763179501649f5bed6a28696ff8d0", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8219, "upload_time": "2016-07-30T20:23:11", "url": "https://files.pythonhosted.org/packages/13/cd/274ff23eedc3b28d5924845c89bd784b06ba418a27e34403a8840c35249a/visualize_ML-0.1.2.tar.gz" } ], "0.2.2": [ { "comment_text": "", "digests": { "md5": "18e75709e20fd80dad2465b32c716641", "sha256": "1ce5cdfa2b5e4dd87c052923b90e0878d49204733d97cb2a2f987b3e71b47fff" }, "downloads": -1, "filename": "visualize_ML-0.2.2.tar.gz", "has_sig": false, "md5_digest": "18e75709e20fd80dad2465b32c716641", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 10936, "upload_time": "2016-08-04T13:37:09", "url": "https://files.pythonhosted.org/packages/b7/28/392ffde9e70595589d0e1bb4733317104275b56c7e3414eb8ae4631e1f39/visualize_ML-0.2.2.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "18e75709e20fd80dad2465b32c716641", "sha256": "1ce5cdfa2b5e4dd87c052923b90e0878d49204733d97cb2a2f987b3e71b47fff" }, "downloads": -1, "filename": "visualize_ML-0.2.2.tar.gz", "has_sig": false, "md5_digest": "18e75709e20fd80dad2465b32c716641", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 10936, "upload_time": "2016-08-04T13:37:09", "url": 
"https://files.pythonhosted.org/packages/b7/28/392ffde9e70595589d0e1bb4733317104275b56c7e3414eb8ae4631e1f39/visualize_ML-0.2.2.tar.gz" } ] }