{ "info": { "author": "Arturo Curiel", "author_email": "me@arturocuriel.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Intended Audience :: Developers", "License :: OSI Approved :: GNU General Public License v3 (GPLv3)", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7", "Topic :: Software Development :: Build Tools" ], "description": "# Multilang Summarizer\n\nThis package implements an online multi-document summarization algorithm, intended to improve text readability. It supports the following languages:\n\n* 'de': 'German'\n* 'en': 'English'\n* 'es': 'Spanish'\n* 'fr': 'French'\n* 'hu': 'Hungarian'\n* 'it': 'Italian'\n* 'pt': 'Portuguese'\n* 'ro': 'Romanian'\n* 'sv': 'Swedish'\n\nThis work was partially supported by the National Council of Science and Technology (CONACYT) of Mexico, as part of the C\u00e1tedras CONACYT project _Infraestructura para agilizar el desarrollo de sistemas centrados en el usuario_, Ref. 3053.\n\n### Prerequisites\n\nThis projects has the following dependencies:\n\n* Pyphen\n* TextStat\n* sentence-splitter\n* numpy\n* NLTK (needs to download tokenization corpora)\n\n### Installing\n\nThe package is distributed via pip:\n\n```\npip install multilang-summarizer\n```\n\n### Use\n\nThe summarizer function directly implements the algorithm.\n\n```\nfrom multilang_summarizer.summarizer import summarizer\n\n# summarizer(D_path, f_method, seq_method, lemmatizer, session_id=1)\n```\n\nIt receives the path to a single document, a choice for three different sentence relevance functions (f\\_method), a relevant term selection method (seq\\_method), a lemmatizer and a session number (for memory purposes).\n\nThe choice of f\\_method can be one of three:\n\n* 'f1' : uses mean term likelihood as an indicator of relevance.\n* 'f2' : uses past term use and syllabic entropy to measure relevance and sentence complexity, respectively.\n* 'f3' : uses a weighted tfidf-based approach to measure relevance.\n\nThe choice of seq\\_method can be one of three:\n\n* 'partial' : uses simple matching between the last generated summary and the new input to identify relevant terms.\n* 'probability' : uses past term likelihoods to identify relevant terms.\n* 'lcs' : uses the Longest Common Subsequence algorithm to identify relevant terms between the last generated summary and the new input.\n\nThe _lemmatizer_ object contains the lemmatization rules for the selected language. For English, it can be instanced as follows:\n\n```\nfrom multilang_summarizer.lemmatizer import Lemmatizer\n\n\nlemmatizer = Lemmatizer.for_language(\"en\")\n```\n\nFinally, session\\_id tells the algorithm to which running summary input D will be adding to. Different sessions can be opened at once. To clean the cache __for all sessions__ use the following method:\n\n```\nfrom multilang_summarizer.summarizer import clean_working_memory\n\nclean_working_memory()\n```\n\nIn the end, summarizer returns a Document object containing all the sentences selected from all previous documents in the named session, and the _f_ score with which each sentence was selected.\n\n## Running the tests\n\nTwo example scripts are provided in the repo:\n\n* tests/test\\_english.py\n* tests/test\\_spanish.py\n\nTo run them, the documents in the test\\_documents folder are required. Simply, execute\n\n```\npython tests/test_english.py\n``` \n\nfrom the root folder after setup.\n\n### Example results\n\nThe following summary was obtained using f\\_method = 'f3' and seq\\_method = 'lcs' over the 10 news items in the test\\_documents folder.\n\n```\nFor the second day in a row, astronauts boarded space shuttle Endeavour \non Friday for liftoff on NASA's first space station construction flight.\nThe decision, which followed ``frank and \ncandid'' discussions between the two partners, was not imposed by \nthe United States, he said.\nThe main cargo Thursday was the Unity module, the first U.S.-built \nstation part.\nThe shuttle contains \nthe second station component.\nThe mechanical arm has never before moved anything so big.\nThe bigger worry, by far, was over Endeavour's pursuit and capture \nof Zarya, and its coupling with Unity.\n```\n\nIn Spanish, the following summary was obtained using f\\_method = 'f1' and seq\\_method = 'lcs' over the 12 Spanish-language news items in the test\\_documents folder.\n\n```\nTras una intensa b\u00fasqueda llevada a cabo por rescatistas, los 12 ni\u00f1os y su profesor fueron encontrados con vida y en buen estado de salud.\nEl rescate de los 12 ni\u00f1os y su entrenador que quedaron atrapados en una cueva inundada, en el norte de Tailandia, podr\u00eda tomar semanas o incluso meses.\nPero aunque los 13 pudieran bucear, algunas partes de la cueva son demasiado estrechas,lo que exige mucho entrenamiento para poder pasar con tanques de buceo.\nLos ni\u00f1os fueron encontrados, 200 metros m\u00e1s adelante.\nEst\u00e1n cansados y necesitan un tiempo para reponerse.\nLa primera etapa del rescate es hacerles recuperar fuerzas.\nLos 13 miembros est\u00e1n bien.\n```\n## Authors\n\n* **Arturo Curiel** - [arturocuriel.com](https://www.arturocuriel.com/)\n\n## License\n\nThis project is licensed under the GPLv3 License - see the [LICENSE](LICENSE) file for details\n\n## Acknowledgments\n\n* Thanks to [Claudio Gutierrez Soto](http://www.face.ubiobio.cl/~cgutierr/) and [Rafael Rojano](https://scholar.google.com/citations?user=tJO7AnxQtZUC&hl=en) for their input in the development on the algorithm.\n\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "https://github.com/elmugrearturo/multilang_summarizer/archive/1.7.tar.gz", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "http://www.arturocuriel.com", "keywords": "SUMMARIZATION,MULTILANGUAGE,RULE-BASED", "license": "GPLv3", "maintainer": "", "maintainer_email": "", "name": "multilang-summarizer", "package_url": "https://pypi.org/project/multilang-summarizer/", "platform": "", "project_url": "https://pypi.org/project/multilang-summarizer/", "project_urls": { "Download": "https://github.com/elmugrearturo/multilang_summarizer/archive/1.7.tar.gz", "Homepage": "http://www.arturocuriel.com" }, "release_url": "https://pypi.org/project/multilang-summarizer/1.7/", "requires_dist": [ "nltk", "pyphen", "textstat", "sentence-splitter", "numpy" ], "requires_python": "", "summary": "Multilanguage summarizer, intended to improve text readability", "version": "1.7" }, "last_serial": 5474681, "releases": { "1.7": [ { "comment_text": "", "digests": { "md5": "0d19ead3108bbd4b3445407c72abc95b", "sha256": "3142cf181f6cb26e19bdb298c0fdee8d4e8a850285c80a543bb408043bc12bb0" }, "downloads": -1, "filename": "multilang_summarizer-1.7-py3-none-any.whl", "has_sig": false, "md5_digest": "0d19ead3108bbd4b3445407c72abc95b", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 20867749, "upload_time": "2019-07-02T02:51:22", "url": "https://files.pythonhosted.org/packages/36/9c/dc47d63c609d290733a29f21893a0c2af5cfe5dc33a4df976c83cc7f67bf/multilang_summarizer-1.7-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "84b9eff73c6ed34bdac64eb400f9e4d7", "sha256": "0c436c76ad87733ba4c9308f4b178759723ada8a1fc91697a38bf31a417f010f" }, "downloads": -1, "filename": "multilang-summarizer-1.7.tar.gz", "has_sig": false, "md5_digest": "84b9eff73c6ed34bdac64eb400f9e4d7", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 15517, "upload_time": "2019-07-02T02:51:25", "url": "https://files.pythonhosted.org/packages/83/1c/d10c577becd3bf190398946aab8540f8987805d7f0c0e4a705f56951f8e2/multilang-summarizer-1.7.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "0d19ead3108bbd4b3445407c72abc95b", "sha256": "3142cf181f6cb26e19bdb298c0fdee8d4e8a850285c80a543bb408043bc12bb0" }, "downloads": -1, "filename": "multilang_summarizer-1.7-py3-none-any.whl", "has_sig": false, "md5_digest": "0d19ead3108bbd4b3445407c72abc95b", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 20867749, "upload_time": "2019-07-02T02:51:22", "url": "https://files.pythonhosted.org/packages/36/9c/dc47d63c609d290733a29f21893a0c2af5cfe5dc33a4df976c83cc7f67bf/multilang_summarizer-1.7-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "84b9eff73c6ed34bdac64eb400f9e4d7", "sha256": "0c436c76ad87733ba4c9308f4b178759723ada8a1fc91697a38bf31a417f010f" }, "downloads": -1, "filename": "multilang-summarizer-1.7.tar.gz", "has_sig": false, "md5_digest": "84b9eff73c6ed34bdac64eb400f9e4d7", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 15517, "upload_time": "2019-07-02T02:51:25", "url": "https://files.pythonhosted.org/packages/83/1c/d10c577becd3bf190398946aab8540f8987805d7f0c0e4a705f56951f8e2/multilang-summarizer-1.7.tar.gz" } ] }