{
    "info": {
        "author": "Adrien Guille, Pavel Soriano",
        "author_email": "adrien.guille@univ-lyon2.fr",
        "bugtrack_url": null,
        "classifiers": [
            "Development Status :: 4 - Beta",
            "Intended Audience :: Science/Research",
            "Operating System :: OS Independent",
            "Programming Language :: Python",
            "Programming Language :: Python :: 2.7",
            "Topic :: Scientific/Engineering",
            "Topic :: Text Processing"
        ],
        "description": "TOM\n===\n\nTOM (TOpic Modeling) is a Python 3 library for topic modeling and\nbrowsing, licensed under the MIT license. Its objective is to allow for\nan efficient analysis of a text corpus from start to finish, via the\ndiscovery of latent topics. To this end, TOM features functions for\npreparing and vectorizing a text corpus. It also offers a common\ninterface for two topic models (namely LDA using either variational\ninference or Gibbs sampling, and NMF using alternating least-square with\na projected gradient method), and implements three state-of-the-art\nmethods for estimating the optimal number of topics to model a corpus.\nWhat is more, TOM constructs an interactive Web-based browser that makes\nit easy to explore a topic model and the related corpus.\n\nFeatures\n--------\n\nVector space modeling\n~~~~~~~~~~~~~~~~~~~~~\n\n-  Feature selection based on word frequency\n-  Weighting\n\n   -  tf\n   -  tf-idf\n\nTopic modeling\n~~~~~~~~~~~~~~\n\n-  Latent Dirichlet Allocation\n\n   -  Standard variational Bayesian inference (Latent Dirichlet\n      Allocation. Blei et al, 2003)\n   -  Online variational Bayesian inference (Online learning for Latent\n      Dirichlet Allocation. Hoffman et al, 2010)\n   -  Collapsed Gibbs sampling (Finding scientific topics. Griffiths &\n      Steyvers, 2004)\n\n-  Non-negative Matrix Factorization (NMF)\n\n   -  Alternating least-square with a projected gradient method\n      (Projected gradient methods for non-negative matrix factorization.\n      Lin, 2007)\n\nEstimating the optimal number of topics\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n-  Stability analysis (How many topics? Stability analysis for topic\n   models. Greene et al, 2014)\n-  Spectral analysis (On finding the natural number of topics with\n   latent dirichlet allocation: Some observations. Arun et al, 2010)\n-  Consensus-based analysis (Metagenes and molecular pattern discovery\n   using matrix factorization. 
Brunet et al, 2004)\n\nInstallation\n------------\n\nWe recommend installing Anaconda (https://www.continuum.io), which\nwill automatically install most of the required dependencies (i.e.\npandas, numpy, scipy, scikit-learn, matplotlib, flask). You should then\ninstall the lda module (pip install lda). Finally, clone or download\nthis repo and run the following command:\n\n::\n\n    python setup.py install\n\nOr, install it directly from PyPI:\n\n::\n\n    pip install tom_lib\n\nUsage\n-----\n\nWe provide two sample programs, topic\\_model.py (which shows you how to\nload and prepare a corpus, estimate the optimal number of topics, infer\nthe topic model and then manipulate it) and topic\\_model\\_browser.py\n(which shows you how to generate a topic model browser to explore a\ncorpus), to help you get started using TOM.\n\nLoad and prepare a textual corpus\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThe following code snippet shows how to load a corpus of French\ndocuments and vectorize them using tf-idf with unigrams.\n\n::\n\n    corpus = Corpus(source_file_path='input/raw_corpus.csv',\n                    language='french',\n                    vectorization='tfidf',\n                    n_gram=1,\n                    max_relative_frequency=0.8,\n                    min_absolute_frequency=4)\n    print('corpus size:', corpus.size)\n    print('vocabulary size:', len(corpus.vocabulary))\n    print('Vector representation of document 0:\\n', corpus.vector_for_document(0))\n\nInstantiate a topic model and infer topics\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nInstantiate an NMF or LDA object, then infer topics.\n\nNMF:\n\n::\n\n    topic_model = NonNegativeMatrixFactorization(corpus)\n    topic_model.infer_topics(num_topics=15)\n\nLDA (using either the standard variational Bayesian inference or Gibbs\nsampling):\n\n::\n\n    topic_model = LatentDirichletAllocation(corpus)\n    topic_model.infer_topics(num_topics=15, 
algorithm='variational')\n\n::\n\n    topic_model = LatentDirichletAllocation(corpus)\n    topic_model.infer_topics(num_topics=15, algorithm='gibbs')\n\nInstantiate a topic model and estimate the optimal number of topics\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nHere we instantiate an NMF object, then generate plots with the three\nmetrics for estimating the optimal number of topics.\n\n::\n\n    topic_model = NonNegativeMatrixFactorization(corpus)\n    viz = Visualization(topic_model)\n    viz.plot_greene_metric(min_num_topics=5,\n                           max_num_topics=50,\n                           tao=10, step=1,\n                           top_n_words=10)\n    viz.plot_arun_metric(min_num_topics=5,\n                         max_num_topics=50,\n                         iterations=10)\n    viz.plot_brunet_metric(min_num_topics=5,\n                           max_num_topics=50,\n                           iterations=10)\n\nSave/load a topic model\n~~~~~~~~~~~~~~~~~~~~~~~\n\nTo allow reusing previously learned topic models, TOM can save them to\ndisk, as shown below.\n\n::\n\n    utils.save_topic_model(topic_model, 'output/NMF_15topics.tom')\n    topic_model = utils.load_topic_model('output/NMF_15topics.tom')\n\nPrint information about a topic model\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThis code excerpt illustrates how to manipulate a topic model, e.g.\nget the topic distribution for a document or the word distribution for a\ntopic.\n\n::\n\n    print('\\nTopics:')\n    topic_model.print_topics(num_words=10)\n    print('\\nTopic distribution for document 0:',\n          topic_model.topic_distribution_for_document(0))\n    print('\\nMost likely topic for document 0:',\n          topic_model.most_likely_topic_for_document(0))\n    print('\\nFrequency of topics:',\n          topic_model.topics_frequency())\n    print('\\nTop 10 most relevant words for topic 2:',\n          topic_model.top_words(2, 10))\n\nTopic model 
browser: screenshots\n--------------------------------\n\nTopic cloud\n~~~~~~~~~~~\n\n|image0|\n\nTopic details\n~~~~~~~~~~~~~\n\n|image1|\n\nDocument details\n~~~~~~~~~~~~~~~~\n\n|image2|\n\n.. |image0| image:: http://mediamining.univ-lyon2.fr/people/guille/tom_resources/topic_cloud.jpg\n.. |image1| image:: http://mediamining.univ-lyon2.fr/people/guille/tom_resources/topic_details.jpg\n.. |image2| image:: http://mediamining.univ-lyon2.fr/people/guille/tom_resources/document_details.jpg",
        "description_content_type": null,
        "docs_url": null,
        "download_url": "http://pypi.python.org/packages/source/t/tom_lib/tom_lib-0.2.2.tar.gz",
        "downloads": {
            "last_day": -1,
            "last_month": -1,
            "last_week": -1
        },
        "home_page": "http://mediamining.univ-lyon2.fr/people/guille/tom.php",
        "keywords": null,
        "license": "MIT",
        "maintainer": null,
        "maintainer_email": null,
        "name": "tom_lib",
        "package_url": "https://pypi.org/project/tom_lib/",
        "platform": "UNKNOWN",
        "project_url": "https://pypi.org/project/tom_lib/",
        "project_urls": {
            "Download": "http://pypi.python.org/packages/source/t/tom_lib/tom_lib-0.2.2.tar.gz",
            "Homepage": "http://mediamining.univ-lyon2.fr/people/guille/tom.php"
        },
        "release_url": "https://pypi.org/project/tom_lib/0.2.2/",
        "requires_dist": null,
        "requires_python": null,
        "summary": "A library for topic modeling and browsing",
        "version": "0.2.2"
    },
    "last_serial": 2185219,
    "releases": {
        "0.1.2": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "30df7e7b5911835089d665d003b0b435",
                    "sha256": "c8bb67a437b7a18740b4b647d3da7a2062cb53cb69751d59aaa6479019cdf86c"
                },
                "downloads": -1,
                "filename": "tom_lib-0.1.2.tar.gz",
                "has_sig": false,
                "md5_digest": "30df7e7b5911835089d665d003b0b435",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 6510757,
                "upload_time": "2016-04-15T23:28:17",
                "url": "https://files.pythonhosted.org/packages/d6/4b/d3040e1ee423ffc04b0f58bdbe335f9a44a8095b11b5a9e2c7c837327d27/tom_lib-0.1.2.tar.gz"
            }
        ],
        "0.2.0": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "b0a9e473d35b6af559705c17d5f9069e",
                    "sha256": "db0b7c2b48ed5b9b146c81e208e222706d06b74a836c411a87063f32a4b9cb0e"
                },
                "downloads": -1,
                "filename": "tom_lib-0.2.0.tar.gz",
                "has_sig": false,
                "md5_digest": "b0a9e473d35b6af559705c17d5f9069e",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 821153,
                "upload_time": "2016-06-23T22:35:52",
                "url": "https://files.pythonhosted.org/packages/ef/ee/14cec017ac1a6a3b6dcebf98196dd6a0bc07c92e7a356e74583cd629d7c4/tom_lib-0.2.0.tar.gz"
            }
        ],
        "0.2.1": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "1a11c87bd3f939f5c85524d975febe80",
                    "sha256": "2d075520d19e87fcaab6a8889b88f291d6f84715ea049638962b1ddafff314e2"
                },
                "downloads": -1,
                "filename": "tom_lib-0.2.1.tar.gz",
                "has_sig": false,
                "md5_digest": "1a11c87bd3f939f5c85524d975febe80",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 819806,
                "upload_time": "2016-06-24T12:55:37",
                "url": "https://files.pythonhosted.org/packages/53/dd/2d45db2bba7460cf76da261fd493cb4b3b6d31be3ae7c0424eba19e15c34/tom_lib-0.2.1.tar.gz"
            }
        ],
        "0.2.2": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "b1e189a856f9298a0efde649df2d4a87",
                    "sha256": "46b0b542f0b241e8aead1470fd38965449784268f4a71663c7b01eff0094e41b"
                },
                "downloads": -1,
                "filename": "tom_lib-0.2.2.tar.gz",
                "has_sig": false,
                "md5_digest": "b1e189a856f9298a0efde649df2d4a87",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 819546,
                "upload_time": "2016-06-24T13:42:41",
                "url": "https://files.pythonhosted.org/packages/e4/50/045dce3e4f2b77611bd07d9635626ea7acfdafd81d81c64c271b84456ee9/tom_lib-0.2.2.tar.gz"
            }
        ]
    },
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "b1e189a856f9298a0efde649df2d4a87",
                "sha256": "46b0b542f0b241e8aead1470fd38965449784268f4a71663c7b01eff0094e41b"
            },
            "downloads": -1,
            "filename": "tom_lib-0.2.2.tar.gz",
            "has_sig": false,
            "md5_digest": "b1e189a856f9298a0efde649df2d4a87",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 819546,
            "upload_time": "2016-06-24T13:42:41",
            "url": "https://files.pythonhosted.org/packages/e4/50/045dce3e4f2b77611bd07d9635626ea7acfdafd81d81c64c271b84456ee9/tom_lib-0.2.2.tar.gz"
        }
    ]
}