{ "info": { "author": "Peter Foster", "author_email": "pyitlib@gmx.us", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Intended Audience :: Science/Research", "License :: OSI Approved :: MIT License", "Natural Language :: English", "Programming Language :: Python :: 2", "Programming Language :: Python :: 3", "Topic :: Scientific/Engineering :: Artificial Intelligence", "Topic :: Scientific/Engineering :: Information Analysis" ], "description": "pyitlib is an MIT-licensed library of information-theoretic methods for data analysis and machine learning, implemented in Python and NumPy.\n\nAPI documentation is available online at https://pafoster.github.io/pyitlib/.\n\npyitlib implements the following 19 measures on discrete random variables:\n\n* Entropy\n* Joint entropy\n* Conditional entropy\n* Cross entropy\n* Kullback-Leibler divergence\n* Symmetrised Kullback-Leibler divergence\n* Jensen-Shannon divergence\n* Mutual information\n* Normalised mutual information (7 variants)\n* Variation of information\n* Lautum information\n* Conditional mutual information\n* Co-information\n* Interaction information\n* Multi-information\n* Binding information\n* Residual entropy\n* Exogenous local information\n* Enigmatic information\n\nThe following estimators are available for each of the measures:\n\n* Maximum likelihood\n* Maximum a posteriori\n* James-Stein\n* Good-Turing\n\nMissing data are supported, either using placeholder values or NumPy masked arrays.\n\nInstallation and codebase\n-------------------------\npyitlib is listed on the Python Package Index at https://pypi.python.org/pypi/pyitlib/ and may be installed using ``pip`` as follows:\n\n.. code:: python\n\n pip install pyitlib\n\nThe codebase for pyitlib is available at https://github.com/pafoster/pyitlib.\n\n\nNotes for getting started\n-------------------------\n\nImport the module ``discrete_random_variable``, as well as NumPy:\n\n.. code:: python\n\n import numpy as np\n from pyitlib import discrete_random_variable as drv\n\nThe respective methods implemented in ``discrete_random_variable`` accept NumPy arrays as input. Let's compute the entropy for an array containing discrete random variable realisations, based on maximum likelihood estimation and quantifying entropy in bits:\n\n.. code:: python\n\n >>> X = np.array((1,2,1,2))\n >>> drv.entropy(X)\n array(1.0)\n\nNumPy arrays are created automatically for any input which isn't of the required type, by passing the input to np.array(). Let's compute entropy, again based on maximum likelihood estimation, but this time using list input and quantifying entropy in nats:\n\n.. code:: python\n\n >>> drv.entropy(['a', 'b', 'a', 'b'], base=np.exp(1))\n array(0.6931471805599453)\n\nThose methods with the suffix ``_pmf`` operate on arrays specifying probability mass assignments. For example, the analogous method call for computing the entropy of the preceding random variable realisations (with estimated equi-probable outcomes) is:\n\n.. code:: python\n\n >>> drv.entropy_pmf([0.5, 0.5], base=np.exp(1))\n 0.69314718055994529\n\nIt's possible to specify missing data using placeholder values (the default placeholder value is ``-1``). Elements equal to the placeholder value are subsequently ignored:\n\n.. code:: python\n\n >>> drv.entropy([1, 2, 1, 2, -1])\n array(1.0)\n\nIn measures expressible in terms of joint entropy (such as conditional entropy, mutual information etc.), equally many realisations of respective random variables are required (with realisations coupled using a common index). Any missing data for random variable ``X`` results in the corresponding realisations for random variable ``Y`` being ignored, and vice versa. Thus, the following method calls yield equivalent results (note use of alternative placeholder value ``None``):\n\n.. code:: python\n\n >>> drv.entropy_conditional([1,2,2,2], [1,1,2,2])\n array(0.5)\n >>> drv.entropy_conditional([1,2,2,2,1], [1,1,2,2,None], fill_value=None)\n array(0.5)\n\nIt's alternatively possible to specify missing data using NumPy masked arrays:\n\n.. code:: python\n\n >>> Z = np.ma.array((1,2,1), mask=(0,0,1))\n >>> drv.entropy(Z)\n array(1.0)\n\nIn combination with any estimator other than maximum likelihood, it may be useful to specify alphabets containing unobserved outcomes. For example, we might seek to estimate the entropy in bits for the sequence of realisations ``[1,1,1,1]``. Using maximum a posteriori estimation combined with the Perks prior (i.e. pseudo-counts of 1/L for each of L possible outcomes) and based on an alphabet specifying L=100 possible outcomes, we may use:\n\n.. code:: python\n\n >>> drv.entropy([1,1,1,1], estimator='PERKS', Alphabet_X = np.arange(100))\n array(2.030522626645241)\n\nMulti-dimensional array input is supported based on the convention that *leading dimensions index random variables, with the trailing dimension indexing random variable realisations*. Thus, the following array specifies realisations for 3 random variables:\n\n.. code:: python\n\n >>> X = np.array(((1,1,1,1), (1,1,2,2), (1,1,2,2)))\n >>> X.shape\n (3, 4)\n\nWhen using multi-dimensional arrays, any alphabets must be specified separately for each random variable represented in the multi-dimensional array, using placeholder values (or NumPy masked arrays) to pad out any unequally sized alphabets:\n\n.. code:: python\n\n >>> drv.entropy(X, estimator='PERKS', Alphabet_X = np.tile(np.arange(100),(3,1))) # 3 alphabets required\n array([ 2.03052263, 2.81433872, 2.81433872])\n\n >>> A = np.array(((1,2,-1), (1,2,-1), (1,2,3))) # padding required\n >>> drv.entropy(X, estimator='PERKS', Alphabet_X = A)\n array([ 0.46899559, 1. , 1.28669267])\n\nFor ease of use, those methods operating on two random variable array arguments (such as ``entropy_conditional``, ``information_mutual`` etc.) may be invoked with a single multi-dimensional array. In this way, we may compute mutual information for all pairs of random variables represented in the array as follows:\n\n.. code:: python\n\n >>> drv.information_mutual(X)\n array([[ 0., 0., 0.],\n [ 0., 1., 1.],\n [ 0., 1., 1.]])\n\nThe above is equivalent to setting the ``cartesian_product`` parameter to ``True`` and specifying two random variable array arguments explicitly:\n\n.. code:: python\n\n >>> drv.information_mutual(X, X, cartesian_product=True)\n array([[ 0., 0., 0.],\n [ 0., 1., 1.],\n [ 0., 1., 1.]])\n\nBy default, those methods operating on several random variable array arguments don't determine all combinations of random variables exhaustively. Instead a one-to-one mapping is performed:\n\n.. code:: python\n\n >>> drv.information_mutual(X, X) # Mutual information between 3 pairs of random variables\n array([ 0., 1., 1.])\n\n >>> drv.entropy(X) # Mutual information equivalent to entropy in above case\n array([ 0., 1., 1.])\n\npyitlib provides basic support for pandas DataFrames/Series. Both these types are converted to NumPy masked arrays internally, while masking those data recorded as missing (based on .isnull()). Note that due to indexing random variable realisations using the trailing dimension of multi-dimensional arrays, we typically need to transpose DataFrames when estimating information-theoretic quantities:\n\n.. code:: python\n\n >>> import pandas\n >>> df = pandas.read_csv('https://raw.githubusercontent.com/veekun/pokedex/master/pokedex/data/csv/pokemon.csv')\n >>> df = df[['height', 'weight', 'base_experience']].apply(lambda s: pandas.qcut(s, 10, labels=False)) # Bin the data\n >>> drv.information_mutual_normalised(df.T) # Transposition required for comparing columns\n array([[ 1. , 0.32472696, 0.17745753],\n [ 0.32729034, 1. , 0.13343504],\n [ 0.17848175, 0.13315407, 1. ]])", "description_content_type": "", "docs_url": null, "download_url": "https://github.com/pafoster/pyitlib/archive/0.2.2.tar.gz", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/pafoster/pyitlib", "keywords": "entropy,information theory,Shannon information,uncertainty,correlation,statistics,machine learning,data analysis,data science", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "pyitlib", "package_url": "https://pypi.org/project/pyitlib/", "platform": "", "project_url": "https://pypi.org/project/pyitlib/", "project_urls": { "Download": "https://github.com/pafoster/pyitlib/archive/0.2.2.tar.gz", "Homepage": "https://github.com/pafoster/pyitlib" }, "release_url": "https://pypi.org/project/pyitlib/0.2.2/", "requires_dist": null, "requires_python": "", "summary": "A library of information-theoretic methods", "version": "0.2.2" }, "last_serial": 5739753, "releases": { "0.1.10": [ { "comment_text": "", "digests": { "md5": "dd5bafe28651e4437b6be54bb508a3bb", "sha256": "832d7e6b3b66901634105b5d1d0ac721228c98e2f387f74f3211e9eeb550e8d9" }, "downloads": -1, "filename": "pyitlib-0.1.10.tar.gz", "has_sig": false, "md5_digest": "dd5bafe28651e4437b6be54bb508a3bb", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 26912, "upload_time": "2017-10-29T20:41:55", "url": "https://files.pythonhosted.org/packages/72/15/11f78ef5bac577555fc73d21456b8e4a34a7a10d1981d4df0228664c96b2/pyitlib-0.1.10.tar.gz" } ], "0.1.11": [ { "comment_text": "", "digests": { "md5": "6c1c3d9276395ed0c5b66d16ff0e73cb", "sha256": "3f19fe1d541e0cfdfeead35b5a8f7f0f17009a94171511d80ef14cae75371a9d" }, "downloads": -1, "filename": "pyitlib-0.1.11.tar.gz", "has_sig": false, "md5_digest": "6c1c3d9276395ed0c5b66d16ff0e73cb", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 27318, "upload_time": "2017-10-29T21:13:29", "url": "https://files.pythonhosted.org/packages/35/00/30d3da8042704e581595daeb3913a5b0a44f03e6b88cb2db6b6836f57764/pyitlib-0.1.11.tar.gz" } ], "0.1.12": [ { "comment_text": "", "digests": { "md5": "bb887422b71e8dce55ac1b9f71cc4fb3", "sha256": "754ae5a3f49a600c9b5b0e6c153e1d9fc4d2c5fd7b22c4a591a3d8c3346028eb" }, "downloads": -1, "filename": "pyitlib-0.1.12.tar.gz", "has_sig": false, "md5_digest": "bb887422b71e8dce55ac1b9f71cc4fb3", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 27366, "upload_time": "2018-03-10T23:46:10", "url": "https://files.pythonhosted.org/packages/fd/ee/a3302f7e8443590840b25ec46bc0a23b930cad1c473921ed23fbf0cefe7f/pyitlib-0.1.12.tar.gz" } ], "0.1.13": [ { "comment_text": "", "digests": { "md5": "d5e576d392acec033846b92496210e75", "sha256": "e8730d0d37b9caad069e175f29280cb695b3ce11e8c48573de7c5358b08dde42" }, "downloads": -1, "filename": "pyitlib-0.1.13.tar.gz", "has_sig": false, "md5_digest": "d5e576d392acec033846b92496210e75", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 27402, "upload_time": "2018-04-22T14:15:28", "url": "https://files.pythonhosted.org/packages/85/b3/00804ea0d776891115f3b3bf5c083b39c321586f5f38844e0df7a91dd641/pyitlib-0.1.13.tar.gz" } ], "0.2.0": [ { "comment_text": "", "digests": { "md5": "61c383116d0e3fd313247f2c70418d8f", "sha256": "fb14c83390837ca8fa7a4c41f4044719d662f6accc6be8f613cbb4d643e5f1e1" }, "downloads": -1, "filename": "pyitlib-0.2.0.tar.gz", "has_sig": false, "md5_digest": "61c383116d0e3fd313247f2c70418d8f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 27603, "upload_time": "2018-12-02T15:08:48", "url": "https://files.pythonhosted.org/packages/34/58/f55476cc2c74e461b43c35aa656473eb0820405ae0d37d7f982508db9fd2/pyitlib-0.2.0.tar.gz" } ], "0.2.1": [ { "comment_text": "", "digests": { "md5": "db7ccc312819e900a20ed8c72b775266", "sha256": "ac1e12783ba9cfede893ea34e088fc108a4a915f7ae52f1d80b85900b8f0d70e" }, "downloads": -1, "filename": "pyitlib-0.2.1.tar.gz", "has_sig": false, "md5_digest": "db7ccc312819e900a20ed8c72b775266", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 27600, "upload_time": "2019-03-29T21:58:16", "url": "https://files.pythonhosted.org/packages/de/31/140c5e99194f226247aff7a21d987a22a4452ac1cf7a808d69695221d30f/pyitlib-0.2.1.tar.gz" } ], "0.2.2": [ { "comment_text": "", "digests": { "md5": "26afd5e12f8d4ab1c291743a38d3d54e", "sha256": "746ee65704da29eb27623755bd330e796164509462e5357b4241f5c6965ea6eb" }, "downloads": -1, "filename": "pyitlib-0.2.2.tar.gz", "has_sig": false, "md5_digest": "26afd5e12f8d4ab1c291743a38d3d54e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 27610, "upload_time": "2019-08-27T22:37:57", "url": "https://files.pythonhosted.org/packages/22/52/f79a6fc5c7d22cb67bdedfe3c791015021aea2d1d0307bf1ac81f3a10188/pyitlib-0.2.2.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "26afd5e12f8d4ab1c291743a38d3d54e", "sha256": "746ee65704da29eb27623755bd330e796164509462e5357b4241f5c6965ea6eb" }, "downloads": -1, "filename": "pyitlib-0.2.2.tar.gz", "has_sig": false, "md5_digest": "26afd5e12f8d4ab1c291743a38d3d54e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 27610, "upload_time": "2019-08-27T22:37:57", "url": "https://files.pythonhosted.org/packages/22/52/f79a6fc5c7d22cb67bdedfe3c791015021aea2d1d0307bf1ac81f3a10188/pyitlib-0.2.2.tar.gz" } ] }