{ "info": { "author": "Jai Juneja", "author_email": "jai.juneja@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Developers", "License :: OSI Approved :: BSD License", "Operating System :: OS Independent", "Programming Language :: Python", "Topic :: Scientific/Engineering :: Artificial Intelligence", "Topic :: Scientific/Engineering :: Information Analysis", "Topic :: Text Processing :: Filters", "Topic :: Text Processing :: Linguistic" ], "description": "PyTLDR: Automatic Text Summarization in Python\n==============================================\n\n|Build Status| |PyPI version|\n\nA Python module to perform automatic summarization of articles, text\nfiles and web pages.\n\nLicense\n-------\n\nCopyright 2014 Jai Juneja.\n\nThis program is free software: you can redistribute it and/or modify it\nunder the terms of the GNU General Public License as published by the\nFree Software Foundation, either version 3 of the License, or (at your\noption) any later version.\n\nThis program is distributed in the hope that it will be useful, but\nWITHOUT ANY WARRANTY; without even the implied warranty of\nMERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General\nPublic License for more details.\n\nYou should have received a copy of the GNU General Public License along\nwith this program. 
If not, see http://www.gnu.org/licenses/.\n\nInstallation\n------------\n\nUsing pip or easy\\_install\n~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nYou can install the latest release version using ``pip`` or\n``easy_install``:\n\n::\n\n pip install pytldr\n\nLatest development version\n~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nYou can alternatively download the latest development version directly\nfrom GitHub:\n\n::\n\n git clone https://github.com/jaijuneja/PyTLDR.git\n\nChange into the root directory:\n\n::\n\n cd PyTLDR\n\nThen install the package:\n\n::\n\n python setup.py install\n\nUsage\n-----\n\nA simple sample program using the PyTLDR module can be found at\n``https://github.com/jaijuneja/PyTLDR/blob/master/example.py``\n\nIn its current form, this module contains three distinct implementations\nof automatic text summarization:\n\n- Using the TextRank algorithm (based on PageRank)\n- Using Latent Semantic Analysis\n- Using a sentence relevance score\n\nNote that all three of the above implementations are extractive - that\nis, they simply extract and display the most relevant sentences from the\ninput text. They do not formulate their own sentences (such algorithms\nare known as \"abstractive\", and are still at a primitive stage).\n\nSentence tokenization\n~~~~~~~~~~~~~~~~~~~~~\n\nPyTLDR comes with a built-in sentence tokenizer that is used for\nsummarization. The tokenizer performs stemming in several languages as\nwell as stop-word removal. You may also specify your own list of\nstop-words.\n\n.. 
code:: python\n\n from pytldr.nlp.tokenizer import Tokenizer\n\n tokenizer = Tokenizer(language='english', stopwords=None, stemming=True)\n # Note that if stopwords=None then the tokenizer loads stopwords from a bundled data-set\n # You can alternatively specify a text file or provide a list of words\n\nNote that the tokenizer is the only input required to initialize a\nsummarizer object, as shown below.\n\nTextRank Summarization\n~~~~~~~~~~~~~~~~~~~~~~\n\nRanks sentences using the PageRank algorithm, where \"votes\" or\n\"in-links\" are represented by words shared between sentences.\n\n.. code:: python\n\n from pytldr.summarize.textrank import TextRankSummarizer\n from pytldr.nlp.tokenizer import Tokenizer\n\n tokenizer = Tokenizer('english')\n summarizer = TextRankSummarizer(tokenizer)\n\n # If you don't specify a tokenizer when initializing a summarizer then the\n # English tokenizer will be used by default\n summarizer = TextRankSummarizer()  # English tokenizer used\n\n # This object creates a summary using the summarize method:\n # e.g. summarizer.summarize(text, length=5, weighting='frequency', norm=None)\n\n # The length parameter specifies the length of the summary, either as a\n # number of sentences, or as a fraction of the original text (e.g. 0.25)\n\n # The summarizer can take as input...\n # 1. A string:\n summary = summarizer.summarize(\"Some long article bla bla...\", length=4)\n\n # 2. A text file:\n summary = summarizer.summarize(\"/path/to/file.txt\", length=0.25)\n # Above summary is a quarter of the length of the original text\n\n # 3. A URL (must start with http://):\n summary = summarizer.summarize(\"http://newsite.com/some_article\")\n\nLatent Semantic Analysis (LSA) Summarization\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nReduces the dimensionality of the article into several \"topic\" clusters\nusing singular value decomposition, and selects the sentences that are\nmost relevant to these topics. 
This is a rather more abstract\nsummarization algorithm.\n\nThis module comes packaged with two distinct implementations of the LSA\nalgorithm, as described in two academic papers:\n\n- J. Steinberger and K. Jezek (2004). Using latent semantic analysis in\n text summarization and summary evaluation.\n- Ozsoy, M., Alpaslan, F., and Cicekli, I. (2011). Text summarization\n using latent semantic analysis.\n\nThe more recent Ozsoy et al. implementation is called by default, but both\nclasses have the same interface.\n\n.. code:: python\n\n from pytldr.summarize.lsa import LsaSummarizer, LsaOzsoy, LsaSteinberger\n\n summarizer = LsaOzsoy()\n summarizer = LsaSteinberger()\n summarizer = LsaSummarizer() # This is identical to the LsaOzsoy object\n\n summary = summarizer.summarize(\n text, topics=4, length=5, binary_matrix=True, topic_sigma_threshold=0.5\n )\n\n # topics specifies the number of topics to cluster the article into.\n # topic_sigma_threshold removes all topics with a singular value less than a given\n # fraction of the largest singular value.\n\nRelevance Score Summarization\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThis method computes and ranks the cosine similarity between each\nsentence vector and the overall document, removing the most relevant\nsentence at each iteration. It closely follows the approach described in\nthe paper:\n\n- Y. Gong and X. Liu (2001). Generic text summarization using relevance\n measure and latent semantic analysis.\n\n.. code:: python\n\n from pytldr.summarize.relevance import RelevanceSummarizer\n\n summarizer = RelevanceSummarizer()\n summary = summarizer.summarize(text, length=5, binary_matrix=True)\n\nMore help\n~~~~~~~~~\n\nYou can read the documentation for each of the above implementations by\ntyping the following into your Python console:\n\n.. 
code:: python\n\n help(TextRankSummarizer)\n help(LsaSummarizer)\n help(RelevanceSummarizer)\n\nContact\n-------\n\nIf you have any questions or have encountered an error, feel free to\ncontact me at ``jai -dot- juneja -at- gmail -dot- com``.\n\n.. |Build Status| image:: https://travis-ci.org/jaijuneja/PyTLDR.svg?branch=master\n :target: https://travis-ci.org/jaijuneja/PyTLDR\n.. |PyPI version| image:: https://badge.fury.io/py/pytldr.svg\n :target: https://pypi.python.org/pypi/pytldr", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/jaijuneja/PyTLDR", "keywords": "summarizer,summarization,natural language processing,nlp,machine learning,data mining,latent semantic analysis,lsa", "license": "BSD", "maintainer": null, "maintainer_email": null, "name": "PyTLDR", "package_url": "https://pypi.org/project/PyTLDR/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/PyTLDR/", "project_urls": { "Download": "UNKNOWN", "Homepage": "https://github.com/jaijuneja/PyTLDR" }, "release_url": "https://pypi.org/project/PyTLDR/0.1.4/", "requires_dist": null, "requires_python": null, "summary": "A module to perform automatic article summarization.", "version": "0.1.4" }, "last_serial": 1454862, "releases": { "0.1": [ { "comment_text": "", "digests": { "md5": "54b2fdddd79f8e6d93c5f48b1df491b1", "sha256": "17be7770e573cd0b4a28abfae7fab30f715819829a5e685d3041ef5b33a617af" }, "downloads": -1, "filename": "PyTLDR-0.1.tar.gz", "has_sig": false, "md5_digest": "54b2fdddd79f8e6d93c5f48b1df491b1", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 17207, "upload_time": "2015-03-06T04:08:12", "url": "https://files.pythonhosted.org/packages/46/0d/54e378ab8fc6771bf43095fe9d4571a63fd9eaecdb0e2a6b044227084e6d/PyTLDR-0.1.tar.gz" } ], "0.1.1": [ { "comment_text": "", "digests": { "md5": "bd5905bba9dcdc69b8120ad7a2f81f51", 
"sha256": "bd23b191842f44c00a653453d853197ce068a1a9056b42d28dd422c1838f61b6" }, "downloads": -1, "filename": "PyTLDR-0.1.1.tar.gz", "has_sig": false, "md5_digest": "bd5905bba9dcdc69b8120ad7a2f81f51", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 17800, "upload_time": "2015-03-06T05:49:59", "url": "https://files.pythonhosted.org/packages/e4/81/ee0c52f40c867fa34d695929393931e4c1e19642b17efc52d4e64697a345/PyTLDR-0.1.1.tar.gz" } ], "0.1.2": [ { "comment_text": "", "digests": { "md5": "3461a0e43a96c80c44d93370f41c85f8", "sha256": "823aaced6b46e5053887c0e6aca5133a3a6f8f32c49e6768a084ffdb47140e71" }, "downloads": -1, "filename": "PyTLDR-0.1.2.tar.gz", "has_sig": false, "md5_digest": "3461a0e43a96c80c44d93370f41c85f8", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 17969, "upload_time": "2015-03-07T03:50:08", "url": "https://files.pythonhosted.org/packages/16/25/a09a39be1d053c504077c00173c07fc4985ab3084a78e98f2607d9670b8e/PyTLDR-0.1.2.tar.gz" } ], "0.1.3": [ { "comment_text": "", "digests": { "md5": "66a2d3f5870db02fb532ef285dd04e08", "sha256": "b3a8cdca419f041944db0689356797675bfc42f3b69f586745292702542a60aa" }, "downloads": -1, "filename": "PyTLDR-0.1.3.tar.gz", "has_sig": false, "md5_digest": "66a2d3f5870db02fb532ef285dd04e08", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 18109, "upload_time": "2015-03-09T15:13:16", "url": "https://files.pythonhosted.org/packages/49/7a/c606250e94a6ce679a1ab9cbbe7fe5ed0e058bc5849766247394f96e8609/PyTLDR-0.1.3.tar.gz" } ], "0.1.4": [ { "comment_text": "", "digests": { "md5": "649ca803854bd2a182b08d981552204e", "sha256": "d00624749438c5e5f6fbf4a39ca97797ff80407a86c1fadd2bf4663c18cd5ffa" }, "downloads": -1, "filename": "PyTLDR-0.1.4.tar.gz", "has_sig": false, "md5_digest": "649ca803854bd2a182b08d981552204e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 18134, "upload_time": 
"2015-03-09T22:32:23", "url": "https://files.pythonhosted.org/packages/70/09/02ed27061159e5f6d35abad4ec9ef3cac8e220093d61a2f7a42f53c9cb22/PyTLDR-0.1.4.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "649ca803854bd2a182b08d981552204e", "sha256": "d00624749438c5e5f6fbf4a39ca97797ff80407a86c1fadd2bf4663c18cd5ffa" }, "downloads": -1, "filename": "PyTLDR-0.1.4.tar.gz", "has_sig": false, "md5_digest": "649ca803854bd2a182b08d981552204e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 18134, "upload_time": "2015-03-09T22:32:23", "url": "https://files.pythonhosted.org/packages/70/09/02ed27061159e5f6d35abad4ec9ef3cac8e220093d61a2f7a42f53c9cb22/PyTLDR-0.1.4.tar.gz" } ] }