{ "info": { "author": "Leland McInnes", "author_email": "leland.mcinnes@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Developers", "Intended Audience :: Science/Research", "License :: OSI Approved", "Operating System :: MacOS", "Operating System :: Microsoft :: Windows", "Operating System :: POSIX", "Operating System :: Unix", "Programming Language :: C", "Programming Language :: Python", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7", "Programming Language :: Python :: 3.8", "Topic :: Scientific/Engineering", "Topic :: Software Development" ], "description": "======\nEnsTop\n======\n\nEnsTop provides an ensemble based approach to topic modelling using pLSA. It makes\nuse of a high performance numba based pLSA implementation to run multiple\nbootstrapped topic models in parallel, and then clusters the resulting outputs to\ndetermine a set of stable topics. It can then refit the document vectors against\nthese topics embed documents into the stable topic space.\n\n---------------\nWhy use EnsTop?\n---------------\n\nThere are a number of advantages to using an ensemble approach to topic modelling.\nThe most obvious is that it produces better more stable topics. A close second,\nhowever, is that, by making use of HDBSCAN for clustering topics, it can learn a\n\"natural\" number of topics. That is, while the user needs to specify an estimated\nnumber of topics, the *actual* number of topics produced will be determined by how\nmany stable topics are produced over many bootstrapped runs. In practice this can\neither be more, or less, than the estimated number of topics.\n\nDespite all of these extra features the ensemble topic approach is still very\nefficient, especially in multi-core environments (due the the embarrassingly parallel\nnature of the ensemble). A run with a reasonable size ensemble can be completed in\naround the same time it might take to fit an LDA model, and usually produces superior\nquality results.\n\nIn addition to this EnsTop comes with a pLSA implementation that can be used\nstandalone (and not as part of an ensemble). So if all you are loosing for is a good\nfast pLSA implementation (that can run considerably faster than many LDA\nimplementations) then EnsTop is the library for you.\n\n-----------------\nHow to use EnsTop\n-----------------\n\nEnsTop follows the sklearn API (and inherits from sklearn base classes), so if you\nuse sklearn for LDA or NMF then you already know how to use Enstop. General usage is\nvery straightforward. The following example uses EnsTop to model topics from the\nclassic 20-Newsgroups dataset, using sklearn's CountVectorizer to generate the\nrequired count matrix.\n\n.. code:: python\n\n from sklearn.datasets import fetch_20newsgroups\n from sklearn.feature_extraction.text import CountVectorizer\n from enstop import EnsembleTopics\n\n news = fetch_20newsgroups(subset='all')\n data = CountVectorizer().fit_transform(news.data)\n\n model = EnsembleTopics(n_components=20).fit(data)\n topics = model.components_\n doc_vectors = model.embedding_\n\n\n---------------\nHow to use pLSA\n---------------\n\nEnsTop also provides a simple to use but fast and effective pLSA implementation out\nof the box. As with the ensemble topic modeller it follows the sklearn API, and usage\nis very similar.\n\n.. code:: python\n\n from sklearn.datasets import fetch_20newsgroups\n from sklearn.feature_extraction.text import CountVectorizer\n from enstop import PLSA\n\n news = fetch_20newsgroups(subset='all')\n data = CountVectorizer().fit_transform(news.data)\n\n model = PLSA(n_components=20).fit(data)\n topics = model.components_\n doc_vectors = model.embedding_\n\n\n------------\nInstallation\n------------\n\nThe easiest way to install EnsTop is via pip\n\n.. code:: bash\n\n pip install enstop\n\nTo manually install this package:\n\n.. code:: bash\n\n wget https://github.com/lmcinnes/enstop/archive/master.zip\n unzip master.zip\n rm master.zip\n cd enstop-master\n python setup.py install\n\n----------------\nHelp and Support\n----------------\n\nSome basic example notebooks are available `here <./notebooks>`_.\n\nDocumentation is coming. This project is still very young. If you need help, or have\nproblems please `open an issue `_\nand I will try to provide any help and guidance that I can. Please also check\nthe docstrings on the code, which provide some descriptions of the parameters.\n\n\n-------\nLicense\n-------\n\nThe EnsTop package is 2-clause BSD licensed.\n\n------------\nContributing\n------------\n\nContributions are more than welcome! There are lots of opportunities\nfor potential projects, so please get in touch if you would like to\nhelp out. Everything from code to notebooks to\nexamples and documentation are all *equally valuable* so please don't feel\nyou can't contribute. To contribute please `fork the project `_ make your changes and\nsubmit a pull request. We will do our best to work through any issues with\nyou and get your code merged into the main branch.\n", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "http://github.com/lmcinnes/enstop", "keywords": "topic model", "license": "BSD", "maintainer": "", "maintainer_email": "", "name": "enstop", "package_url": "https://pypi.org/project/enstop/", "platform": "", "project_url": "https://pypi.org/project/enstop/", "project_urls": { "Homepage": "http://github.com/lmcinnes/enstop" }, "release_url": "https://pypi.org/project/enstop/0.2.6/", "requires_dist": null, "requires_python": "", "summary": "Ensemble topic modelling with pLSA", "version": "0.2.6", "yanked": false, "yanked_reason": null }, "last_serial": 11596408, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "md5": "64ec9e82623024358ebbe94bfe309f81", "sha256": "f8750cc4943c3f90e7d44e8b01f083c49f4a1c8c7459463d6d718310722554a9" }, "downloads": -1, "filename": "enstop-0.1.0.tar.gz", "has_sig": false, "md5_digest": "64ec9e82623024358ebbe94bfe309f81", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 17284, "upload_time": "2019-06-20T15:30:38", "upload_time_iso_8601": "2019-06-20T15:30:38.992481Z", "url": "https://files.pythonhosted.org/packages/b8/35/415876a5eb17591d91dac92cf522f29f01364e00d565e00e84d52ca04fb4/enstop-0.1.0.tar.gz", "yanked": false, "yanked_reason": null } ], "0.1.1": [ { "comment_text": "", "digests": { "md5": "00b6668f8ff97262a8c165673b59f362", "sha256": "29b8f74da329dda26412a7f38aeb20117e0304a87683bf1b9d0985489354bc1c" }, "downloads": -1, "filename": "enstop-0.1.1.tar.gz", "has_sig": false, "md5_digest": "00b6668f8ff97262a8c165673b59f362", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 18066, "upload_time": "2019-06-21T17:52:03", "upload_time_iso_8601": "2019-06-21T17:52:03.755574Z", "url": "https://files.pythonhosted.org/packages/43/04/658ea811e6db226f123c54b03caa20bfbd44c37dd0e81d16d1270e7e03ce/enstop-0.1.1.tar.gz", "yanked": false, "yanked_reason": null } ], "0.1.2": [ { "comment_text": "", "digests": { "md5": "3c19a65bb0d0b7fd820d23bb7cffe2ea", "sha256": "b617910e807b709231d9053c75eae3ec8bfda234dc4057393250f81c902ff00e" }, "downloads": -1, "filename": "enstop-0.1.2.tar.gz", "has_sig": false, "md5_digest": "3c19a65bb0d0b7fd820d23bb7cffe2ea", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 15980, "upload_time": "2019-10-29T13:50:42", "upload_time_iso_8601": "2019-10-29T13:50:42.628608Z", "url": "https://files.pythonhosted.org/packages/fa/b8/17562fd895211705d0f6b0113f126205a0e4f2641e70af21ce0e3b7f7a38/enstop-0.1.2.tar.gz", "yanked": false, "yanked_reason": null } ], "0.1.3": [ { "comment_text": "", "digests": { "md5": "453b4d292a8088b1ca8c25c373585453", "sha256": "439c11a61350ce0bde4c373f09a7b097e0ffb577c0390b90c1e7d17e0a947d58" }, "downloads": -1, "filename": "enstop-0.1.3.tar.gz", "has_sig": false, "md5_digest": "453b4d292a8088b1ca8c25c373585453", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 18617, "upload_time": "2019-12-02T01:54:14", "upload_time_iso_8601": "2019-12-02T01:54:14.690782Z", "url": "https://files.pythonhosted.org/packages/c4/70/170c18d6df51b5eaf459c9236cb196f66b2eef7cddb84a534a23e693c65a/enstop-0.1.3.tar.gz", "yanked": false, "yanked_reason": null } ], "0.1.4": [ { "comment_text": "", "digests": { "md5": "579e96840457b127956baf37f2764b19", "sha256": "c42d5f3927f59d7e3b24556f9b76ab1cba5898f223b5741b37d89ca72b1807c0" }, "downloads": -1, "filename": "enstop-0.1.4.tar.gz", "has_sig": false, "md5_digest": "579e96840457b127956baf37f2764b19", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 19504, "upload_time": "2020-03-18T19:55:03", "upload_time_iso_8601": "2020-03-18T19:55:03.434782Z", "url": "https://files.pythonhosted.org/packages/c5/51/007860a2fbdd28dcccad03bbd9efb5c5f680b100e08e41e7549b9fd11eef/enstop-0.1.4.tar.gz", "yanked": false, "yanked_reason": null } ], "0.1.5": [ { "comment_text": "", "digests": { "md5": "8a10573d16c75db74c5554486aaebad6", "sha256": "4965a72dd536079fc7bc87ed4012d5cf5895959e414812ccd984bfa9fae96421" }, "downloads": -1, "filename": "enstop-0.1.5.tar.gz", "has_sig": false, "md5_digest": "8a10573d16c75db74c5554486aaebad6", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 19570, "upload_time": "2020-04-30T18:38:04", "upload_time_iso_8601": "2020-04-30T18:38:04.024775Z", "url": "https://files.pythonhosted.org/packages/a4/c1/a6230d0b0e7ad4cebc3d3902f4bd82afd6985613349a95f305f934b37e9a/enstop-0.1.5.tar.gz", "yanked": false, "yanked_reason": null } ], "0.1.6": [ { "comment_text": "", "digests": { "md5": "94e54963156bc0ad922873b2e73a08e4", "sha256": "f72b3d5e4c713828b1c178c347df6f9b23d8ab1ceb74ec8855913a5027aba552" }, "downloads": -1, "filename": "enstop-0.1.6.tar.gz", "has_sig": false, "md5_digest": "94e54963156bc0ad922873b2e73a08e4", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 19587, "upload_time": "2020-04-30T21:11:03", "upload_time_iso_8601": "2020-04-30T21:11:03.569101Z", "url": "https://files.pythonhosted.org/packages/81/42/24c476be67c8425835f65ae975f3cb9e4fa2a8fd9cdf4805d3216bc79b6e/enstop-0.1.6.tar.gz", "yanked": false, "yanked_reason": null } ], "0.1.7": [ { "comment_text": "", "digests": { "md5": "19892096348a947eabeacfee7a3abe3a", "sha256": "12a101c73135e9271e747610d30e2b6dbbf03eae235e36d827c7bd97e9dd5ed9" }, "downloads": -1, "filename": "enstop-0.1.7.tar.gz", "has_sig": false, "md5_digest": "19892096348a947eabeacfee7a3abe3a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 19161, "upload_time": "2020-05-14T15:51:22", "upload_time_iso_8601": "2020-05-14T15:51:22.863250Z", "url": "https://files.pythonhosted.org/packages/9a/fd/99a6c8cb918ba77ffe4aec3d7815500646585a057ebbe34b840c3c510814/enstop-0.1.7.tar.gz", "yanked": false, "yanked_reason": null } ], "0.2.0": [ { "comment_text": "", "digests": { "md5": "998e6ffbc4744dc1bad31e2555ecb65b", "sha256": "106a8d69988e25f468adc91957fe036d7e436ef1847913d4a59c70a4fb48ef25" }, "downloads": -1, "filename": "enstop-0.2.0.tar.gz", "has_sig": false, "md5_digest": "998e6ffbc4744dc1bad31e2555ecb65b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 24279, "upload_time": "2020-06-13T01:44:17", "upload_time_iso_8601": "2020-06-13T01:44:17.087951Z", "url": "https://files.pythonhosted.org/packages/fb/a0/9cb016225b491ab641905d75d82d375bd99d9af74757881d6a9e8e97ffde/enstop-0.2.0.tar.gz", "yanked": false, "yanked_reason": null } ], "0.2.2": [ { "comment_text": "", "digests": { "md5": "7ce63827ce87d69cf4790b4cbe3ebf90", "sha256": "8dd579de94462872088a3f01f92b8cdc7ef4340aeb9afcc9aafee0fcf28c0a17" }, "downloads": -1, "filename": "enstop-0.2.2.tar.gz", "has_sig": false, "md5_digest": "7ce63827ce87d69cf4790b4cbe3ebf90", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 28297, "upload_time": "2020-06-29T18:14:42", "upload_time_iso_8601": "2020-06-29T18:14:42.899905Z", "url": "https://files.pythonhosted.org/packages/f8/0b/ac5838e914f88e889865ea9a1f038586b642db6704644c8a2216d26b5e86/enstop-0.2.2.tar.gz", "yanked": false, "yanked_reason": null } ], "0.2.3": [ { "comment_text": "", "digests": { "md5": "f603fea05f87cb386b02f42e6bfe9cdb", "sha256": "3ca900a55f8c0d51cc25277903960fcf12132e9732461082427b1fb8ca0c1d4f" }, "downloads": -1, "filename": "enstop-0.2.3.tar.gz", "has_sig": false, "md5_digest": "f603fea05f87cb386b02f42e6bfe9cdb", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 28302, "upload_time": "2020-06-29T18:36:49", "upload_time_iso_8601": "2020-06-29T18:36:49.026207Z", "url": "https://files.pythonhosted.org/packages/05/0c/f624d3c670a8276cdf92b36fb9bc9e7798eefb442b66d853963980c5c868/enstop-0.2.3.tar.gz", "yanked": false, "yanked_reason": null } ], "0.2.4": [ { "comment_text": "", "digests": { "md5": "3930698e6f319dd04ea730f1969c5c06", "sha256": "119a408eb6fd7f3c897868586c16c38a770c1e9aae5f46a739da4b386abe2689" }, "downloads": -1, "filename": "enstop-0.2.4.tar.gz", "has_sig": false, "md5_digest": "3930698e6f319dd04ea730f1969c5c06", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 28298, "upload_time": "2020-06-29T23:04:24", "upload_time_iso_8601": "2020-06-29T23:04:24.127101Z", "url": "https://files.pythonhosted.org/packages/68/7a/58e0ea6c612d72986301e64abfcf2bad616f56e50318bd60963473c4cd1e/enstop-0.2.4.tar.gz", "yanked": false, "yanked_reason": null } ], "0.2.5": [ { "comment_text": "", "digests": { "md5": "7031efc2e447ab486947c6059c039400", "sha256": "d54efbf20bfbd4dc65bedc4f5ca7a1fc00fd05dffc84e6b65f22afdce7146aa7" }, "downloads": -1, "filename": "enstop-0.2.5.tar.gz", "has_sig": false, "md5_digest": "7031efc2e447ab486947c6059c039400", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 29942, "upload_time": "2020-07-09T02:02:50", "upload_time_iso_8601": "2020-07-09T02:02:50.401928Z", "url": "https://files.pythonhosted.org/packages/c7/50/26607e751167a6ca5ce1cca6870f57ba1a75dc4dff4ba96f9842fbc8cd0c/enstop-0.2.5.tar.gz", "yanked": false, "yanked_reason": null } ], "0.2.6": [ { "comment_text": "", "digests": { "md5": "70ace7d0af1dec06fab3bac0a88303b2", "sha256": "c4fa88b0eb0c1ca8e694c07cd820acff773bcd62e43280d489f06a25b9856e4c" }, "downloads": -1, "filename": "enstop-0.2.6.tar.gz", "has_sig": false, "md5_digest": "70ace7d0af1dec06fab3bac0a88303b2", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 31551, "upload_time": "2021-09-30T19:11:17", "upload_time_iso_8601": "2021-09-30T19:11:17.667028Z", "url": "https://files.pythonhosted.org/packages/a4/9e/17d6ee6c9f2a4693b6572b08872b32d4014dbfb95a77e25402691ea41057/enstop-0.2.6.tar.gz", "yanked": false, "yanked_reason": null } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "70ace7d0af1dec06fab3bac0a88303b2", "sha256": "c4fa88b0eb0c1ca8e694c07cd820acff773bcd62e43280d489f06a25b9856e4c" }, "downloads": -1, "filename": "enstop-0.2.6.tar.gz", "has_sig": false, "md5_digest": "70ace7d0af1dec06fab3bac0a88303b2", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 31551, "upload_time": "2021-09-30T19:11:17", "upload_time_iso_8601": "2021-09-30T19:11:17.667028Z", "url": "https://files.pythonhosted.org/packages/a4/9e/17d6ee6c9f2a4693b6572b08872b32d4014dbfb95a77e25402691ea41057/enstop-0.2.6.tar.gz", "yanked": false, "yanked_reason": null } ], "vulnerabilities": [] }