{ "info": { "author": "markoarnauto", "author_email": "markus.tretzmueller@cortecs.at", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python :: 3" ], "description": "# Biterm Topic Model\n\nThis is a simple Python implementation of the awesome\n[Biterm Topic Model](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.402.4032&rep=rep1&type=pdf).\nThe model is accurate at classifying short texts.\nIt explicitly models the word co-occurrence patterns in the whole corpus to solve the problem of sparse word co-occurrence at the document level.\n\nInstall it with:\n```\npip install biterm\n```\n\nLoad some short texts and vectorize them via sklearn.\n\n```python\nfrom sklearn.feature_extraction.text import CountVectorizer\n\ntexts = open('./data/reuters.titles').read().splitlines()[:50]\nvec = CountVectorizer(stop_words='english')\nX = vec.fit_transform(texts).toarray()\n```\nGet the vocabulary and the biterms from the texts.\n```python\nimport numpy as np\nfrom biterm.utility import vec_to_biterms\n\n# on scikit-learn >= 1.0, use vec.get_feature_names_out() instead\nvocab = np.array(vec.get_feature_names())\nbiterms = vec_to_biterms(X)\n```\nCreate a BTM and pass the biterms to train it.\n```python\nfrom biterm.cbtm import oBTM\n\nbtm = oBTM(num_topics=20, V=vocab)\ntopics = btm.fit_transform(biterms, iterations=100)\n```\nSave a topic plot using pyLDAvis and explore the results! (also see *simple_btm.py*)\n```python\nimport pyLDAvis\n\nvis = pyLDAvis.prepare(btm.phi_wz.T, topics, np.count_nonzero(X, axis=1), vocab, np.sum(X, axis=0))\npyLDAvis.save_html(vis, './vis/simple_btm.html')\n```\n![pyLDAvis Visualization](https://github.com/markoarnauto/biterm/blob/master/vis/simple_btm.png)\n\nInference uses Gibbs sampling, which is not really fast, and the implementation is not meant for production.\nIf you have to classify a lot of texts, you can try online learning. 
Use the Cython version (`biterm.cbtm`) to speed things up a bit.\n```python\nimport numpy as np\nimport pyLDAvis\nfrom biterm.cbtm import oBTM\nfrom sklearn.feature_extraction.text import CountVectorizer\nfrom biterm.utility import vec_to_biterms, topic_summuary # helper functions\n\nif __name__ == \"__main__\":\n\n    texts = open('./data/reuters.titles').read().splitlines()\n\n    # vectorize texts\n    vec = CountVectorizer(stop_words='english')\n    X = vec.fit_transform(texts).toarray()\n\n    # get vocabulary (on scikit-learn >= 1.0, use vec.get_feature_names_out())\n    vocab = np.array(vec.get_feature_names())\n\n    # get biterms\n    biterms = vec_to_biterms(X)\n\n    # create btm\n    btm = oBTM(num_topics=20, V=vocab)\n\n    print(\"\\n\\n Train Online BTM ..\")\n    for i in range(0, len(biterms), 100): # process chunks of 100 texts\n        biterms_chunk = biterms[i:i + 100]\n        btm.fit(biterms_chunk, iterations=50)\n    topics = btm.transform(biterms)\n\n    print(\"\\n\\n Visualize Topics ..\")\n    vis = pyLDAvis.prepare(btm.phi_wz.T, topics, np.count_nonzero(X, axis=1), vocab, np.sum(X, axis=0))\n    pyLDAvis.save_html(vis, './vis/online_btm.html')\n\n    print(\"\\n\\n Topic coherence ..\")\n    topic_summuary(btm.phi_wz.T, X, vocab, 10)\n\n    print(\"\\n\\n Texts & Topics ..\")\n    for i in range(len(texts)):\n        print(\"{} (topic: {})\".format(texts[i], topics[i].argmax()))\n```\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/markoarnauto/biterm", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "biterm", "package_url": "https://pypi.org/project/biterm/", "platform": "", "project_url": "https://pypi.org/project/biterm/", "project_urls": { "Homepage": "https://github.com/markoarnauto/biterm" }, "release_url": "https://pypi.org/project/biterm/0.1.5/", "requires_dist": [ "numpy", "tqdm", "cython", "nltk" ], "requires_python": "", "summary": "Biterm Topic Model", "version": "0.1.5" }, "last_serial": 4956022, 
"releases": { "0.0.1": [ { "comment_text": "", "digests": { "md5": "8cc10377aa0d1f42bac060ee900cbb56", "sha256": "0aea5118ff559fdecc655a63e143743c6b9c7f49bc621fec2f8f5e72ddc03309" }, "downloads": -1, "filename": "biterm-0.0.1-cp36-cp36m-macosx_10_7_x86_64.whl", "has_sig": false, "md5_digest": "8cc10377aa0d1f42bac060ee900cbb56", "packagetype": "bdist_wheel", "python_version": "cp36", "requires_python": null, "size": 62239, "upload_time": "2019-02-16T18:59:30", "url": "https://files.pythonhosted.org/packages/2b/5a/94f7fda1e1427ba2224402c5d2a5633c4c17fa6f41fb9114555c958f2f9a/biterm-0.0.1-cp36-cp36m-macosx_10_7_x86_64.whl" }, { "comment_text": "", "digests": { "md5": "24e3d22e2746d8856b5a8fd1ee0d0a28", "sha256": "f9af3e6d9c80b13c9d7a30e7f55318d49b23bac0ed4da577e0c3a21d8fb75711" }, "downloads": -1, "filename": "biterm-0.0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "24e3d22e2746d8856b5a8fd1ee0d0a28", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 4384, "upload_time": "2018-11-30T16:33:41", "url": "https://files.pythonhosted.org/packages/32/60/b3d8f054887273f1c243f12ff4bd1be89ded58710a399f46333ad3d63bdd/biterm-0.0.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "d6dd4a3d902f840f86d7721e22f52dde", "sha256": "45ac8a32bc9966dfb43fbb48f4be8c6249115a547c97ef8cfdeebdc96dc1d474" }, "downloads": -1, "filename": "biterm-0.0.1.tar.gz", "has_sig": false, "md5_digest": "d6dd4a3d902f840f86d7721e22f52dde", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 2630, "upload_time": "2018-11-30T16:33:43", "url": "https://files.pythonhosted.org/packages/a0/7b/6a618e907f252bc04b5ccca35220cbe1ab317e403a935f7490302c6ddbf2/biterm-0.0.1.tar.gz" } ], "0.1.0": [ { "comment_text": "", "digests": { "md5": "6c198452f15c936ce066a2553788a6af", "sha256": "acf8538f41ad12c9274278688711acc1b13a7531c08424223ccaabb04340a38c" }, "downloads": -1, "filename": "biterm-0.1.0-cp36-cp36m-macosx_10_7_x86_64.whl", 
"has_sig": false, "md5_digest": "6c198452f15c936ce066a2553788a6af", "packagetype": "bdist_wheel", "python_version": "cp36", "requires_python": null, "size": 62240, "upload_time": "2019-02-16T19:03:59", "url": "https://files.pythonhosted.org/packages/fe/f2/2aaefe4c77e11f2efc4664eacb66394a69bb7bda1ee30e6db451e0997f90/biterm-0.1.0-cp36-cp36m-macosx_10_7_x86_64.whl" }, { "comment_text": "", "digests": { "md5": "0ec921f243cdeee6dd1a0798fd18fa6c", "sha256": "c61bbd5690d37c1719d5e364edaf852a733071caa41c69ef836b83bfb1d21e71" }, "downloads": -1, "filename": "biterm-0.1.0.tar.gz", "has_sig": false, "md5_digest": "0ec921f243cdeee6dd1a0798fd18fa6c", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 79394, "upload_time": "2019-02-16T19:05:54", "url": "https://files.pythonhosted.org/packages/f4/0b/d2b740b85fafb80b769699b10d2e806141eef894ed7ac12f048611e996ee/biterm-0.1.0.tar.gz" } ], "0.1.1": [ { "comment_text": "", "digests": { "md5": "5ee7d5e94828ee65c8c1924ff92f1a08", "sha256": "8ad62c810d0e240691faf2cb6b1f720d40a7436241e4f6b9bc8b05b04d72223b" }, "downloads": -1, "filename": "biterm-0.1.1-cp36-cp36m-macosx_10_7_x86_64.whl", "has_sig": false, "md5_digest": "5ee7d5e94828ee65c8c1924ff92f1a08", "packagetype": "bdist_wheel", "python_version": "cp36", "requires_python": null, "size": 62258, "upload_time": "2019-02-16T19:29:14", "url": "https://files.pythonhosted.org/packages/ce/d6/383ac811cd3ed4e1faf331cfc6896bc153a36b84632ec794a54c633809b0/biterm-0.1.1-cp36-cp36m-macosx_10_7_x86_64.whl" }, { "comment_text": "", "digests": { "md5": "e95a7dfbe1ea8a78451b434bb9f31d36", "sha256": "ab29b60d32f3dfca30d62d8af216977dd38c8a0f9f9de5debda621aa9ed3a0b7" }, "downloads": -1, "filename": "biterm-0.1.1.tar.gz", "has_sig": false, "md5_digest": "e95a7dfbe1ea8a78451b434bb9f31d36", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 79502, "upload_time": "2019-02-16T19:29:16", "url": 
"https://files.pythonhosted.org/packages/51/12/95824269da03c965c2857cb0fc77b01ecb57fa39ec55b6593f941d7bff33/biterm-0.1.1.tar.gz" } ], "0.1.2": [ { "comment_text": "", "digests": { "md5": "7346e73905daaadda072f0f89650282a", "sha256": "2bee6ab071da553b35d8a448beb839ada47a816a08f31d619838b17056742bf3" }, "downloads": -1, "filename": "biterm-0.1.2-cp36-cp36m-macosx_10_7_x86_64.whl", "has_sig": false, "md5_digest": "7346e73905daaadda072f0f89650282a", "packagetype": "bdist_wheel", "python_version": "cp36", "requires_python": null, "size": 62262, "upload_time": "2019-02-16T19:42:43", "url": "https://files.pythonhosted.org/packages/ab/b2/3caa7610bd8227ef2d6367dbe283fe34872c4903b867ae85b03bb84285d9/biterm-0.1.2-cp36-cp36m-macosx_10_7_x86_64.whl" }, { "comment_text": "", "digests": { "md5": "610777029717e1168dccf9a170c5ebdd", "sha256": "576b7b6086dcd0394537d9a6686f34f8f78c298c9f690a3d77149ae7dd4f2ca1" }, "downloads": -1, "filename": "biterm-0.1.2.tar.gz", "has_sig": false, "md5_digest": "610777029717e1168dccf9a170c5ebdd", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 79505, "upload_time": "2019-02-16T19:42:45", "url": "https://files.pythonhosted.org/packages/33/7e/53f846546b33171bedc2967ead97aad69d8c167b8dc4b1f6a978501e1c79/biterm-0.1.2.tar.gz" } ], "0.1.3": [ { "comment_text": "", "digests": { "md5": "06aa14885b619f6197ce8191f1cd392f", "sha256": "d3c4486beae7526c4ce572b2a32eefc48c5ba149ab9b14ca401989039eaacaba" }, "downloads": -1, "filename": "biterm-0.1.3-cp36-cp36m-macosx_10_7_x86_64.whl", "has_sig": false, "md5_digest": "06aa14885b619f6197ce8191f1cd392f", "packagetype": "bdist_wheel", "python_version": "cp36", "requires_python": null, "size": 62262, "upload_time": "2019-02-16T20:40:47", "url": "https://files.pythonhosted.org/packages/3b/83/5cdf69007ee3739751eeaf87543242015c0d7eb8a8be35e1d21d243d6b85/biterm-0.1.3-cp36-cp36m-macosx_10_7_x86_64.whl" }, { "comment_text": "", "digests": { "md5": "4e3e817ad50c1c8e6085788b30764118", 
"sha256": "5e18f9c98143f395db0a459ebf53cb2b1f41fb07e0771a6f1bc307d8a4efb1ff" }, "downloads": -1, "filename": "biterm-0.1.3.tar.gz", "has_sig": false, "md5_digest": "4e3e817ad50c1c8e6085788b30764118", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 79515, "upload_time": "2019-02-16T20:40:49", "url": "https://files.pythonhosted.org/packages/cd/a6/bba942e9a923dcac536b9e49af5d36104dee6523f25d8bbafaae18e2c6d9/biterm-0.1.3.tar.gz" } ], "0.1.5": [ { "comment_text": "", "digests": { "md5": "ab819c4d671c710e6d900ae3d0195b33", "sha256": "bb3b28fdbb31365ee27209c4bc132d6ae9802c2f44648a977f101014e709fa66" }, "downloads": -1, "filename": "biterm-0.1.5-cp36-cp36m-macosx_10_7_x86_64.whl", "has_sig": false, "md5_digest": "ab819c4d671c710e6d900ae3d0195b33", "packagetype": "bdist_wheel", "python_version": "cp36", "requires_python": null, "size": 62303, "upload_time": "2019-03-18T21:45:07", "url": "https://files.pythonhosted.org/packages/19/4b/8267896db7dc084d2f077253f67aab1e96bafbf3768017f183ce2d216cc1/biterm-0.1.5-cp36-cp36m-macosx_10_7_x86_64.whl" }, { "comment_text": "", "digests": { "md5": "f9763474ec9de44636d4c1cc87daf172", "sha256": "e46ed58b95e39247d0c56f3339e15a4a14545f2c1957d95c565f0ed3e3d20384" }, "downloads": -1, "filename": "biterm-0.1.5.tar.gz", "has_sig": false, "md5_digest": "f9763474ec9de44636d4c1cc87daf172", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 79672, "upload_time": "2019-03-18T21:45:09", "url": "https://files.pythonhosted.org/packages/36/ca/5a43511e6ea8ca02cc9e8be1b8898ad79b140c055d4400342dc210ba23bb/biterm-0.1.5.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "ab819c4d671c710e6d900ae3d0195b33", "sha256": "bb3b28fdbb31365ee27209c4bc132d6ae9802c2f44648a977f101014e709fa66" }, "downloads": -1, "filename": "biterm-0.1.5-cp36-cp36m-macosx_10_7_x86_64.whl", "has_sig": false, "md5_digest": "ab819c4d671c710e6d900ae3d0195b33", "packagetype": "bdist_wheel", 
"python_version": "cp36", "requires_python": null, "size": 62303, "upload_time": "2019-03-18T21:45:07", "url": "https://files.pythonhosted.org/packages/19/4b/8267896db7dc084d2f077253f67aab1e96bafbf3768017f183ce2d216cc1/biterm-0.1.5-cp36-cp36m-macosx_10_7_x86_64.whl" }, { "comment_text": "", "digests": { "md5": "f9763474ec9de44636d4c1cc87daf172", "sha256": "e46ed58b95e39247d0c56f3339e15a4a14545f2c1957d95c565f0ed3e3d20384" }, "downloads": -1, "filename": "biterm-0.1.5.tar.gz", "has_sig": false, "md5_digest": "f9763474ec9de44636d4c1cc87daf172", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 79672, "upload_time": "2019-03-18T21:45:09", "url": "https://files.pythonhosted.org/packages/36/ca/5a43511e6ea8ca02cc9e8be1b8898ad79b140c055d4400342dc210ba23bb/biterm-0.1.5.tar.gz" } ] }