{ "info": { "author": "Georg Heimel", "author_email": "georg@muckisnspirit.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Intended Audience :: Science/Research", "License :: OSI Approved :: GNU General Public License v3 (GPLv3)", "Natural Language :: English", "Operating System :: POSIX :: Linux", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7", "Topic :: Scientific/Engineering :: Information Analysis", "Topic :: Text Processing" ], "description": "[![Documentation Status](https://readthedocs.org/projects/probabilistic-latent-semantic-analysis/badge/?version=latest)](https://probabilistic-latent-semantic-analysis.readthedocs.io/en/latest/?badge=latest)\n# PLSA\n_A `python` implementation of Probabilistic Latent Semantic Analysis_\n\n## What PLSA can do for you\nBroadly speaking,\n[PLSA](https://en.wikipedia.org/wiki/Probabilistic_latent_semantic_analysis)\nis a tool of\n[Natural Language Processing](https://en.wikipedia.org/wiki/Natural_language_processing)\n(NLP). It analyses a collection of text documents (a _corpus_) under the assumption\nthat there are (by far) fewer _topics_ to write about than there are documents\nin the corpus. It then tries to identify these topics (in terms of words and\ntheir relative importance to each topic) and to give you the relative\nimportance of a pre-specified number of topics in each document.\n\nIn doing so, it does not actually try to \"make sense\" of each document (or\n\"understand\" it) by contextually analysing it. Rather, it simply counts how\noften which word occurs in each document, regardless of the context in which\nthey occur. As such, it belongs to the family of so-called\n[bag-of-words](https://en.wikipedia.org/wiki/Bag-of-words_model) models. \n\n\nIn reducing a large number of documents to a much smaller number of topics,\nPSLA can be seen as an example of unsupervised\n[dimensionality reduction](https://en.wikipedia.org/wiki/Dimensionality_reduction),\nmost related to\n[non-negative matrix factorization](https://en.wikipedia.org/wiki/Dimensionality_reduction).\n\nTo give an example, a bunch of documents might frequently contain words like\n\"eating\", \"nutrition\", \"health\", _etc._ Others might contain words like \"state\",\n\"party\", \"ministry\", _etc_. Yet others might contain words like \"tournament\",\n\"ranking\", \"win\", _etc._ It is easy to imagine there being documents that\ncontain a mixture of these words. Not knowing in advance how many topics there\nare, one would have to run PLSA with several different numbers of topics and\nsee the results to judge how many is a good choice. Picking three in our example\nwould yield topics that could be described as \"food\", \"politics\", and \"sports\"\nand, while a number of documents will emerge as being purely about one of these\ntopics, it is easy to imagine that there are others that have contributions\nfrom more than one topic (_e.g._, about a new initiative from the ministry\nof health, combining \"food\" and \"politics\"). PLSA will give you that\nmixture.\n\n## Installation\nThis code is available on the python package index [PyPi](https://pypi.org/).\nTo install, I _strongly_ recommend setting up a new virtual python environment,\nand then type\n```bash\npip install plsa\n```\non the console.\n\n__WARNING__: _On first use, some components of `nltk` that don't come with it out-of-the-box\nwil be downloaded. Should you install (against my express recommendation) install\nthe `plsa` package system-wide (with `sudo`), then you lack the access rights\nto write the required `nltk` data to where it is supposed to go (into a subfolder of\nthe `plsa` package directory)._\n\n#### Dependencies\nThis package depends on the following python packages:\n- `numpy`\n- `matplotlib`\n- `wordcould`\n- `nltk`\n\nIf you want to run the\n[example notebook](https://github.com/yedivanseven/PLSA/blob/master/notebooks/Examples.ipynb),\nyou will also need to install the `jupyter` package.\n\n## Getting Started\nClone the\n[GitHub repository](https://github.com/yedivanseven/PLSA)\nand run the `jupyter` notebook\n[Examples.ipynb](https://github.com/yedivanseven/PLSA/blob/master/notebooks/Examples.ipynb)\nin the\n[notebooks](https://github.com/yedivanseven/PLSA/tree/master/notebooks)\nfolder.\n\n## Documentation\nRead the [API documentation](https://probabilistic-latent-semantic-analysis.readthedocs.io/en/latest/index.html) on [Read the Docs](https://readthedocs.org/)\n\n## Technical considerations\nThe matrices to store and manipulate data can easily get quite large. That\nmeans you will soon run out of memory when toying with a large corpus. This\ncould be mitigated to some extent by using sparse matrices. But since there\nis no built-in support for sparse matrices of more than 2 dimensions\n(we need 3) in `scipy`, this is not implemented.\n\n\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "https://pypi.python.org/pypi/plsa", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/yedivanseven/PLSA", "keywords": "nlp bag-of-words", "license": "", "maintainer": "", "maintainer_email": "", "name": "plsa", "package_url": "https://pypi.org/project/plsa/", "platform": "", "project_url": "https://pypi.org/project/plsa/", "project_urls": { "Documentation": "https://probabilistic-latent-semantic-analysis.readthedocs.io/en/latest/index.html", "Download": "https://pypi.python.org/pypi/plsa", "Homepage": "https://github.com/yedivanseven/PLSA" }, "release_url": "https://pypi.org/project/plsa/0.6.0/", "requires_dist": [ "matplotlib (>=3.0)", "nltk (>=3.4.5)", "numpy (>=1.16)", "wordcloud (>=1.5)" ], "requires_python": "", "summary": "Probabilistic Latent Semantic Analysis", "version": "0.6.0" }, "last_serial": 5850355, "releases": { "0.1.1": [ { "comment_text": "", "digests": { "md5": "df9bac1df27cb8298c17d6159ca03199", "sha256": "2e5712a1fe7fb11edaf952f14c575b4d3165a04d7acb29433d6593755b21d8e4" }, "downloads": -1, "filename": "plsa-0.1.1-py3-none-any.whl", "has_sig": false, "md5_digest": "df9bac1df27cb8298c17d6159ca03199", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 30070, "upload_time": "2019-09-05T14:41:35", "url": "https://files.pythonhosted.org/packages/47/87/b6b547b9eadc0c37617078bf968756e111a7c5ffe14e11a74096503aca25/plsa-0.1.1-py3-none-any.whl" } ], "0.2.0": [ { "comment_text": "", "digests": { "md5": "b62beee3c240f0432b11472510ef12ab", "sha256": "145eae780f4b4c6100f52b2d140e97ecbe796290472feadb1662eddcc769f372" }, "downloads": -1, "filename": "plsa-0.2.0-py3-none-any.whl", "has_sig": false, "md5_digest": "b62beee3c240f0432b11472510ef12ab", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 31595, "upload_time": "2019-09-06T11:31:11", "url": "https://files.pythonhosted.org/packages/86/16/ae6dea98c3ed376c0a4183c3ddb85942b58587f132410ff4a42c84c53471/plsa-0.2.0-py3-none-any.whl" } ], "0.3.0": [ { "comment_text": "", "digests": { "md5": "dbe2f4b277972b726822e7a3b6e1fb7d", "sha256": "f2c7b74d19394de4c65c9fa85b2f9a1a6dc8bc56af43570259cd8c4ad0d27243" }, "downloads": -1, "filename": "plsa-0.3.0-py3-none-any.whl", "has_sig": false, "md5_digest": "dbe2f4b277972b726822e7a3b6e1fb7d", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 31606, "upload_time": "2019-09-10T13:04:28", "url": "https://files.pythonhosted.org/packages/e9/8c/22dc42841e569c5d57a6306f45d4a60830f9305a2055188177e9dcecfec1/plsa-0.3.0-py3-none-any.whl" } ], "0.4.0": [ { "comment_text": "", "digests": { "md5": "955960c45ca1203043da4a1fa270363b", "sha256": "a56f1573f29e077f29c02fcd333c47798acc27e1ecfffb4660175c28c7e65bf1" }, "downloads": -1, "filename": "plsa-0.4.0-py3-none-any.whl", "has_sig": false, "md5_digest": "955960c45ca1203043da4a1fa270363b", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 31829, "upload_time": "2019-09-11T09:33:02", "url": "https://files.pythonhosted.org/packages/a0/0f/63d2d4629a3dca12ed288a35542ae273ad706dc004135edff826c9653e96/plsa-0.4.0-py3-none-any.whl" } ], "0.5.0": [ { "comment_text": "", "digests": { "md5": "1ce4b15c8a827932510f73f97be0cd83", "sha256": "9afa276e374794a746698667819b0db18c3180620ae2302b3645cc9209d79f58" }, "downloads": -1, "filename": "plsa-0.5.0-py3-none-any.whl", "has_sig": false, "md5_digest": "1ce4b15c8a827932510f73f97be0cd83", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 32339, "upload_time": "2019-09-17T12:57:52", "url": "https://files.pythonhosted.org/packages/d4/ef/562597011f1285798d9ba23eef658acfae8e313be7c26c05182c9de0f321/plsa-0.5.0-py3-none-any.whl" } ], "0.6.0": [ { "comment_text": "", "digests": { "md5": "15a88eb9bb8ff0e90ae0dfa5354347b6", "sha256": "d056fee79bc86f4c07e57877dff0506160ad2f9abb5f11cefcd247332a95af11" }, "downloads": -1, "filename": "plsa-0.6.0-py3-none-any.whl", "has_sig": false, "md5_digest": "15a88eb9bb8ff0e90ae0dfa5354347b6", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 33088, "upload_time": "2019-09-18T13:45:33", "url": "https://files.pythonhosted.org/packages/de/45/9e87603bab31ca53fc7f95d6821f5a195f628225c6eb39972352ab65fb6a/plsa-0.6.0-py3-none-any.whl" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "15a88eb9bb8ff0e90ae0dfa5354347b6", "sha256": "d056fee79bc86f4c07e57877dff0506160ad2f9abb5f11cefcd247332a95af11" }, "downloads": -1, "filename": "plsa-0.6.0-py3-none-any.whl", "has_sig": false, "md5_digest": "15a88eb9bb8ff0e90ae0dfa5354347b6", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 33088, "upload_time": "2019-09-18T13:45:33", "url": "https://files.pythonhosted.org/packages/de/45/9e87603bab31ca53fc7f95d6821f5a195f628225c6eb39972352ab65fb6a/plsa-0.6.0-py3-none-any.whl" } ] }