{ "info": { "author": "Xianshun Chen", "author_email": "xs0040@gmail.com", "bugtrack_url": null, "classifiers": [ "Intended Audience :: Developers", "Intended Audience :: Education", "Intended Audience :: Science/Research", "License :: OSI Approved :: MIT License", "Natural Language :: English", "Operating System :: OS Independent", "Programming Language :: Python", "Programming Language :: Python :: 2", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.3", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5", "Topic :: Software Development :: Libraries :: Python Modules", "Topic :: Text Processing :: General", "Topic :: Utilities" ], "description": "pysie\r\n=====\r\n\r\nPackage pysie implements a statistical inference engine in Python\r\n\r\n.. image:: https://travis-ci.org/chen0040/pysie.svg?branch=master\r\n :target: https://travis-ci.org/chen0040/pysie\r\n\r\n.. image:: https://coveralls.io/repos/github/chen0040/pysie/badge.svg?branch=master\r\n :target: https://coveralls.io/github/chen0040/pysie?branch=master\r\n\r\n.. image:: https://scrutinizer-ci.com/g/chen0040/pysie/badges/quality-score.png?b=master\r\n :target: https://scrutinizer-ci.com/g/chen0040/pysie/?branch=master\r\n\r\n\r\nInstall\r\n=======\r\n\r\nRun the following command to install pysie using pip\r\n\r\n.. 
code-block:: bash\r\n\r\n $ pip install pysie\r\n\r\n\r\nFeatures\r\n========\r\n\r\n* Automatically switch between Student's T, binomial simulation bootstrapping, or normal sampling distribution based on the sample size\r\n* Compute the confidence interval for the sampling distribution given a confidence level\r\n* Carry out hypothesis testing for both mean (for numerical sample data) and proportion (for categorical sample data)\r\n* Carry out hypothesis testing between two different experiment setups (or two distinct groups or populations)\r\n* Anova: Carry out hypothesis testing on whether a numerical variable is independent of a categorical variable given a sample data table containing the two variables as columns\r\n* Chi-Square Testing: Carry out hypothesis testing on whether two categorical variables are independent of each other given a sample data table containing the two variables as columns\r\n* Anova for regression: Carry out hypothesis testing on whether two numerical variables are independent of each other given a sample data table containing the two variables as columns\r\n\r\nUsage\r\n=====\r\n\r\nNumerical sample\r\n----------------\r\n\r\nThe sample code below shows how to create a numerical sample:\r\n\r\n.. code-block:: python\r\n\r\n sample = Sample()\r\n sample.add_numeric(x=0.001)\r\n sample.add_numeric(x=0.02)\r\n ...\r\n\r\n print(sample.size()) # return the number of rows in the sample data table\r\n print(sample.is_numerical()) # return True\r\n print(sample.is_categorical()) # return False\r\n print(sample.get(0).x) # return 0.001\r\n print(sample.get(1).x) # return 0.02\r\n\r\n\r\nIn the above code, the numerical variable is 'x'\r\n\r\nCategorical sample\r\n------------------\r\n\r\nThe sample code below shows how to create a categorical sample:\r\n\r\n.. 
code-block:: python\r\n\r\n sample = Sample()\r\n sample.add_category(label=\"OK\")\r\n sample.add_category(label=\"CANCEL\")\r\n sample.add_category(label=\"OK\")\r\n ...\r\n\r\n print(sample.size()) # return the number of rows in the sample data table\r\n print(sample.is_categorical()) # return True\r\n print(sample.is_numerical()) # return False\r\n print(sample.get(0).label) # return \"OK\"\r\n print(sample.get(1).label) # return \"CANCEL\"\r\n\r\n\r\nIn the above code, the categorical variable is 'label'\r\n\r\nSample containing one numerical variable and one categorical variable\r\n-----------------------------------------------------------------------\r\n\r\nThe sample code below shows how to create a sample containing two columns (one numerical and the other categorical):\r\n\r\n.. code-block:: python\r\n\r\n sample = Sample()\r\n sample.add_numeric(x=0.001, group_id='grp1')\r\n sample.add_numeric(x=0.02, group_id='grp1')\r\n sample.add_numeric(x=0.003, group_id='grp1')\r\n ...\r\n\r\n print(sample.size()) # return the number of rows in the sample data table\r\n print(sample.is_numerical()) # return True\r\n print(sample.is_categorical()) # return False\r\n print(sample.get(0).x) # return 0.001\r\n print(sample.get(0).group_id) # return 'grp1'\r\n print(sample.get(1).x) # return 0.02\r\n print(sample.get(1).group_id) # return 'grp1'\r\n\r\n\r\nIn the above code, the numerical variable is 'x' and the categorical variable is 'group_id'\r\n\r\nSample containing two categorical variables as its data columns\r\n---------------------------------------------------------------\r\n\r\nThe sample code below shows how to create a sample containing two categorical columns\r\n\r\n.. 
code-block:: python\r\n\r\n sample = Sample()\r\n sample.add_category(label='OK', group_id='grp1')\r\n sample.add_category(label='CANCEL', group_id='grp1')\r\n sample.add_category(label='OK', group_id='grp1')\r\n ...\r\n\r\n print(sample.size()) # return the number of rows in the sample data table\r\n print(sample.is_categorical()) # return True\r\n print(sample.is_numerical()) # return False\r\n print(sample.get(0).label) # return \"OK\"\r\n print(sample.get(0).group_id) # return 'grp1'\r\n print(sample.get(1).label) # return \"CANCEL\"\r\n print(sample.get(1).group_id) # return 'grp1'\r\n\r\n\r\nIn the above code, the first categorical variable is 'label', and the second categorical variable is 'group_id'\r\n\r\nSample containing two numerical variables as its data columns\r\n-------------------------------------------------------------\r\n\r\nThe sample code below shows how to create a sample containing two numerical columns\r\n\r\n.. code-block:: python\r\n\r\n sample = Sample()\r\n sample.add_xy(x=0.001, y=0.01)\r\n sample.add_xy(x=0.02, y=0.2)\r\n ...\r\n\r\n print(sample.size()) # return the number of rows in the sample data table\r\n print(sample.is_numerical()) # return True\r\n print(sample.is_categorical()) # return False\r\n print(sample.get(0).x) # return 0.001\r\n print(sample.get(0).y) # return 0.01\r\n print(sample.get(1).x) # return 0.02\r\n print(sample.get(1).y) # return 0.2\r\n\r\n\r\nSampling distribution for Sample Means\r\n--------------------------------------\r\n\r\nThe sample code below shows how to derive the sampling distribution for the sample means of a population given a numerical\r\nsample from that population:\r\n\r\n.. 
code-block:: python\r\n\r\n sample = Sample()\r\n sample.add_numeric(x=0.001)\r\n sample.add_numeric(x=0.02)\r\n ...\r\n\r\n sampling_distribution = MeanSamplingDistribution(sample_distribution=SampleDistribution(sample))\r\n print('sampling distribution: (mu = ' + str(sampling_distribution.point_estimate)\r\n + ', SE = ' + str(sampling_distribution.standard_error) + ')')\r\n print('The sampling distribution belong to family: ' + sampling_distribution.distribution_family)\r\n print('We are 95% confident that the true mean for the underlying population is between : '\r\n + str(sampling_distribution.confidence_interval(0.95)))\r\n\r\n\r\nSampling distribution for Sample Proportions\r\n--------------------------------------------\r\n\r\nThe sample code below shows how to derive the sampling distribution for the proportion of class 'A' of a population\r\ngiven a categorical sample from that population:\r\n\r\n.. code-block:: python\r\n\r\n sample = Sample()\r\n sample.add_category(label='A')\r\n sample.add_category(label='C')\r\n sample.add_category(label='A')\r\n sample.add_category(label='B')\r\n ...\r\n\r\n sampling_distribution = ProportionSamplingDistribution(sample_distribution=SampleDistribution(sample,\r\n categorical_value=\"A\"))\r\n print('sampling distribution: (p = ' + str(sampling_distribution.point_estimate)\r\n + ', SE = ' + str(sampling_distribution.standard_error) + ')')\r\n print('The sampling distribution belong to family: ' + sampling_distribution.distribution_family)\r\n print('We are 95% confident that the true proportion of \"A\" in the underlying population is between : '\r\n + str(sampling_distribution.confidence_interval(0.95)))\r\n\r\n\r\nCompare Sample Means between Two Different Groups\r\n-------------------------------------------------\r\n\r\nThe sample code below shows how to derive the sampling distribution for the difference between sample means of two\r\ndifferent groups (e.g., two different experiment setups or two different 
populations):\r\n\r\n.. code-block:: python\r\n\r\n grp1_sample = Sample()\r\n grp1_sample.add_numeric(x=0.001)\r\n grp1_sample.add_numeric(x=0.02)\r\n ...\r\n grp2_sample = Sample()\r\n grp2_sample.add_numeric(x=0.02)\r\n grp2_sample.add_numeric(x=0.03)\r\n ...\r\n sampling_distribution = MeanDiffSamplingDistribution(grp1_sample_distribution=SampleDistribution(grp1_sample),\r\n grp2_sample_distribution=SampleDistribution(grp2_sample))\r\n self.assertEqual(sampling_distribution.distribution_family, DistributionFamily.normal)\r\n print('sampling distribution: (mean_diff = ' + str(sampling_distribution.point_estimate)\r\n + ', SE = ' + str(sampling_distribution.standard_error) + ')')\r\n print('We are 95% confident that the difference between them is : '\r\n + str(sampling_distribution.confidence_interval(0.95)))\r\n\r\n\r\nCompare Sample Proportions between Two Different Groups\r\n-------------------------------------------------------\r\n\r\nThe sample code below shows how to derive the sampling distribution for the difference between sample proportions of two\r\ndifferent groups (e.g., two different experiment setups or two different populations):\r\n\r\n.. 
code-block:: python\r\n\r\n grp1_sample = Sample()\r\n grp1_sample.add_category(label='A')\r\n grp1_sample.add_category(label='C')\r\n ...\r\n grp2_sample = Sample()\r\n grp2_sample.add_category(label='A')\r\n grp2_sample.add_category(label='B')\r\n ...\r\n sampling_distribution = ProportionDiffSamplingDistribution(\r\n grp1_sample_distribution=SampleDistribution(grp1_sample, categorical_value=\"A\"),\r\n grp2_sample_distribution=SampleDistribution(grp2_sample, categorical_value=\"A\"))\r\n self.assertEqual(sampling_distribution.distribution_family, DistributionFamily.normal)\r\n print('sampling distribution: (proportion_diff = ' + str(sampling_distribution.point_estimate)\r\n + ', SE = ' + str(sampling_distribution.standard_error) + ')')\r\n print('We are 95% confident that the difference in proportion of \"A\" between them is : '\r\n + str(sampling_distribution.confidence_interval(0.95)))\r\n\r\n\r\nHypothesis Testing on Mean\r\n--------------------------\r\n\r\nThe sample code below shows how to test whether the true mean of a population (from which the numerical sample is taken)\r\nis equal to a particular value 0.99:\r\n\r\n.. code-block:: python\r\n\r\n sample = Sample()\r\n sample.add_numeric(0.01)\r\n sample.add_numeric(0.02)\r\n ...\r\n\r\n sampling_distribution = MeanSamplingDistribution(sample_distribution=SampleDistribution(sample))\r\n testing = MeanTesting(sampling_distribution=sampling_distribution, mean_null=0.99)\r\n\r\n print('one tail p-value: ' + str(testing.p_value_one_tail))\r\n print('two tail p-value: ' + str(testing.p_value_two_tail))\r\n reject_one_tail, reject_two_tail = testing.will_reject(0.01) # 0.01 is the significance level\r\n print('will reject mean = 0.99 (one-tail) ? ' + str(reject_one_tail))\r\n print('will reject mean = 0.99 (two-tail) ? 
' + str(reject_two_tail))\r\n\r\n\r\nHypothesis Testing on Proportion\r\n--------------------------------\r\n\r\nThe sample code below shows how to test whether the true proportion of class \"A\" in a population (from which the\r\ncategorical sample is taken) is equal to a particular value 0.51:\r\n\r\n.. code-block:: python\r\n\r\n sample = Sample()\r\n sample.add_category(\"A\")\r\n sample.add_category(\"B\")\r\n sample.add_category(\"A\")\r\n ...\r\n\r\n sampling_distribution = ProportionSamplingDistribution(\r\n sample_distribution=SampleDistribution(sample, categorical_value=\"A\"))\r\n\r\n testing = ProportionTesting(sampling_distribution=sampling_distribution, p_null=0.51)\r\n\r\n print('one tail p-value: ' + str(testing.p_value_one_tail))\r\n print('two tail p-value: ' + str(testing.p_value_two_tail))\r\n reject_one_tail, reject_two_tail = testing.will_reject(0.01) # 0.01 is the significance level\r\n print('will reject proportion(A) = 0.51 (one-tail) ? ' + str(reject_one_tail))\r\n print('will reject proportion(A) = 0.51 (two-tail) ? ' + str(reject_two_tail))\r\n\r\n\r\nHypothesis Testing on Mean Comparison (Two Groups)\r\n--------------------------------------------------\r\n\r\nThe sample code below shows how to test whether to reject the hypothesis that the means of two different groups (e.g.\r\ntwo different experiments or populations from which the numerical samples are taken) are the same:\r\n\r\n.. 
code-block:: python\r\n\r\n grp1_sample = Sample()\r\n grp1_sample.add_numeric(0.01)\r\n grp1_sample.add_numeric(0.02)\r\n ...\r\n grp2_sample = Sample()\r\n grp2_sample.add_numeric(0.03)\r\n grp2_sample.add_numeric(0.02)\r\n ...\r\n\r\n sampling_distribution = MeanDiffSamplingDistribution(grp1_sample_distribution=SampleDistribution(grp1_sample),\r\n grp2_sample_distribution=SampleDistribution(grp2_sample))\r\n\r\n testing = MeanDiffTesting(sampling_distribution=sampling_distribution)\r\n\r\n print('one tail p-value: ' + str(testing.p_value_one_tail))\r\n print('two tail p-value: ' + str(testing.p_value_two_tail))\r\n reject_one_tail, reject_two_tail = testing.will_reject(0.01) # 0.01 is the significance level\r\n print('will reject hypothesis that two groups have same means (one-tail) ? ' + str(reject_one_tail))\r\n print('will reject hypothesis that two groups have same means (two-tail) ? ' + str(reject_two_tail))\r\n\r\n\r\nHypothesis Testing on Proportion Comparison (Two Groups)\r\n--------------------------------------------------------\r\n\r\nThe sample code below shows how to test whether to reject the hypothesis that the true proportions of class \"A\" in two\r\ngroups (from which the categorical samples are taken) are equal to each other:\r\n\r\n.. 
code-block:: python\r\n\r\n grp1_sample = Sample()\r\n grp1_sample.add_category(\"A\")\r\n grp1_sample.add_category(\"B\")\r\n grp1_sample.add_category(\"A\")\r\n ...\r\n grp2_sample = Sample()\r\n grp2_sample.add_category(\"A\")\r\n grp2_sample.add_category(\"B\")\r\n grp2_sample.add_category(\"C\")\r\n ...\r\n\r\n sampling_distribution = ProportionDiffSamplingDistribution(\r\n grp1_sample_distribution=SampleDistribution(grp1_sample, categorical_value=\"A\"),\r\n grp2_sample_distribution=SampleDistribution(grp2_sample, categorical_value=\"A\"))\r\n self.assertEqual(sampling_distribution.distribution_family, DistributionFamily.normal)\r\n\r\n testing = ProportionDiffTesting(sampling_distribution=sampling_distribution)\r\n\r\n print('one tail p-value: ' + str(testing.p_value_one_tail))\r\n print('two tail p-value: ' + str(testing.p_value_two_tail))\r\n reject_one_tail, reject_two_tail = testing.will_reject(0.01) # 0.01 is the significance level\r\n print('will reject proportion(A, grp1) = proportion(A, grp2) (one-tail) ? ' + str(reject_one_tail))\r\n print('will reject proportion(A, grp1) = proportion(A, grp2) (two-tail) ? ' + str(reject_two_tail))\r\n\r\n\r\nIndependence Testing between One Numerical and One Categorical Variable (ANOVA)\r\n-------------------------------------------------------------------------------\r\n\r\nThe sample code below shows how to test whether to reject the hypothesis that a numerical variable and a categorical variable are\r\nindependent of each other for a population (from which the numerical sample is taken):\r\n\r\n.. code-block:: python\r\n\r\n sample = Sample()\r\n sample.add_numeric(x=0.001, group_id='grp1')\r\n sample.add_numeric(x=0.02, group_id='grp1')\r\n sample.add_numeric(x=0.003, group_id='grp1')\r\n ...\r\n\r\n testing = Anova(sample=sample)\r\n\r\n print('p-value: ' + str(testing.p_value))\r\n reject = testing.will_reject(0.01)\r\n print('will reject [same mean for all groups] ? 
' + str(reject))\r\n\r\n\r\nIndependence Testing between Two Categorical Variables (Chi-Square Testing)\r\n----------------------------------------------------------------------------\r\n\r\nThe sample code below shows how to test whether to reject the hypothesis that two categorical variables are independent\r\nof each other for a population (from which the categorical sample is taken):\r\n\r\n\r\n.. code-block:: python\r\n\r\n import numpy\r\n\r\n sample = Sample()\r\n\r\n for i in range(1000):\r\n sample.add_category('itemA' if numpy.random.randn() > 0 else 'itemB', 'group1')\r\n sample.add_category('itemA' if numpy.random.randn() > 0 else 'itemB', 'group2')\r\n sample.add_category('itemA' if numpy.random.randn() > 0 else 'itemB', 'group3')\r\n\r\n testing = ChiSquare(sample=sample)\r\n\r\n print('p-value: ' + str(testing.p_value))\r\n reject = testing.will_reject(0.01)\r\n print('will reject [two categorical variables are independent of each other] ? ' + str(reject))", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/chen0040/pysie", "keywords": "", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "pysie", "package_url": "https://pypi.org/project/pysie/", "platform": "any", "project_url": "https://pypi.org/project/pysie/", "project_urls": { "Download": "UNKNOWN", "Homepage": "https://github.com/chen0040/pysie" }, "release_url": "https://pypi.org/project/pysie/0.0.2/", "requires_dist": null, "requires_python": null, "summary": "Python implementation of a statistical inference engine", "version": "0.0.2" }, "last_serial": 2952561, "releases": { "0.0.1": [ { "comment_text": "", "digests": { "md5": "af4ef13ace4cee5bd32c3ded457ccbc9", "sha256": "c8239116b35be927ebc953f9fcd8a8de1bce86179cc0d56dfe05dc4e82a1d773" }, "downloads": -1, "filename": "pysie-0.0.1.zip", "has_sig": false, "md5_digest": "af4ef13ace4cee5bd32c3ded457ccbc9", 
"packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 10446, "upload_time": "2017-06-09T12:00:35", "url": "https://files.pythonhosted.org/packages/4e/1d/650096e93100a969eefba0a6cbb15864dd4a753180a02e1d6b2a4f826556/pysie-0.0.1.zip" } ], "0.0.2": [ { "comment_text": "", "digests": { "md5": "2be7f5eff45b40269a22a04f3580bf14", "sha256": "e59390138a9c22153e4c51352719c513a497d0085a8b8e75137108a1130760b0" }, "downloads": -1, "filename": "pysie-0.0.2.zip", "has_sig": false, "md5_digest": "2be7f5eff45b40269a22a04f3580bf14", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 10715, "upload_time": "2017-06-15T16:16:14", "url": "https://files.pythonhosted.org/packages/14/df/a5b05e190b3a9f9c2b4f5ce5e0ab2d1e280e46959ada1b34c290d1c2fbe0/pysie-0.0.2.zip" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "2be7f5eff45b40269a22a04f3580bf14", "sha256": "e59390138a9c22153e4c51352719c513a497d0085a8b8e75137108a1130760b0" }, "downloads": -1, "filename": "pysie-0.0.2.zip", "has_sig": false, "md5_digest": "2be7f5eff45b40269a22a04f3580bf14", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 10715, "upload_time": "2017-06-15T16:16:14", "url": "https://files.pythonhosted.org/packages/14/df/a5b05e190b3a9f9c2b4f5ce5e0ab2d1e280e46959ada1b34c290d1c2fbe0/pysie-0.0.2.zip" } ] }