{ "info": { "author": "Jie Gao", "author_email": "j.gao@sheffield.ac.uk", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Intended Audience :: Developers", "Intended Audience :: Education", "Intended Audience :: Information Technology", "Intended Audience :: Science/Research", "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python :: 3 :: Only", "Topic :: Scientific/Engineering :: Artificial Intelligence", "Topic :: Scientific/Engineering :: Human Machine Interfaces", "Topic :: Scientific/Engineering :: Information Analysis", "Topic :: Text Processing :: Filters", "Topic :: Text Processing :: General", "Topic :: Text Processing :: Indexing", "Topic :: Text Processing :: Linguistic" ], "description": "jgTextRank : Yet another Python implementation of TextRank\n\n\nThis is a parallelisable and highly customisable implementation of the TextRank algorithm [Mihalcea et al., 2004].\nYou can define your own co-occurrence context, syntactic categories(choose either \"closed\" filters or \"open\" filters),\nstop words, feed your own pre-segmented/pre-tagged data, and many more. You can also\nload co-occurrence graph directly from your text for visual analytics, debug and fine-tuning your custom settings.\nThis implementation can also be applied to large corpus for terminology extraction.\nIt can be applied to short text for supervised learning in order to provide more interesting features than conventional TF-IDF Vectorizer.\n\nTextRank algorithm look into the structure of word co-occurrence networks,\nwhere nodes are word types and edges are word cooccurrence.\n\nImportant words can be thought of as being endorsed by other words,\nand this leads to an interesting phenomenon. Words that are most\nimportant, viz. keywords, emerge as the most central words in the\nresulting network, with high degree and PageRank. The final important\nstep is post-filtering. Extracted phrases are disambiguated and\nnormalized for morpho-syntactic variations and lexical synonymy\n(Csomai and Mihalcea 2007). Adjacent words are also sometimes\ncollapsed into phrases, for a more readable output.\n\n\nMihalcea, R., & Tarau, P. (2004, July). TextRank: Bringing order into texts. Association for Computational Linguistics.\n\n### Usage ###\n\n#### Simple examples\n\nExtract weighted keywords with an undirected graph:\n\n >>> from jgtextrank import keywords_extraction\n >>> example_abstract = \"Compatibility of systems of linear constraints over the set of natural numbers. \" \\\n \"Criteria of compatibility of a system of linear Diophantine equations, strict inequations, \" \\\n \"and nonstrict inequations are considered. Upper bounds for components of a minimal set of \" \\\n \"solutions and algorithms of construction of minimal generating sets of solutions for all \" \\\n \"types of systems are given. These criteria and the corresponding algorithms for \" \\\n \"constructing a minimal supporting set of solutions can be used in solving all the \" \\\n \"considered types systems and systems of mixed types.\"\n >>> keywords_extraction(example_abstract, top_p = 1, directed=False)[0][:15]\n [('linear diophantine equations', 0.18059), ('minimal supporting set', 0.16649), ('minimal set', 0.13201), ('types systems', 0.1194), ('linear constraints', 0.10997), ('strict inequations', 0.08832), ('systems', 0.08351), ('corresponding algorithms', 0.0767), ('nonstrict inequations', 0.07276), ('mixed types', 0.07178), ('set', 0.06674), ('minimal', 0.06527), ('natural numbers', 0.06466), ('algorithms', 0.05479), ('solutions', 0.05085)]\n\nChange syntactic filters to restrict vertices to only noun phrases for addition to the graph:\n\n >>> custom_categories = {'NNS', 'NNP', 'NN'}\n >>> keywords_extraction(example_abstract, top_p = 1, top_t=None,\n directed=False, syntactic_categories=custom_categories)[0][:15]\n [('types systems', 0.17147), ('diophantine equations', 0.15503), ('supporting set', 0.14256), ('solutions', 0.13119), ('systems', 0.12452), ('algorithms', 0.09188), ('set', 0.09188), ('compatibility', 0.0892), ('construction', 0.05068), ('criteria', 0.04939), ('sets', 0.04878), ('types', 0.04696), ('system', 0.01163), ('constraints', 0.01163), ('components', 0.01163)]\n\nYou can provide an additional stop word list to filter unwanted candidate terms:\n\n >>> stop_list={'set', 'mixed', 'corresponding', 'supporting'}\n >>> keywords_extraction(example_abstract, top_p = 1, top_t=None,\n directed=False,\n syntactic_categories=custom_categories, stop_words=stop_list)[0][:15]\n [('types systems', 0.20312), ('diophantine equations', 0.18348), ('systems', 0.1476), ('algorithms', 0.11909), ('solutions', 0.11909), ('compatibility', 0.10522), ('sets', 0.06439), ('construction', 0.06439), ('criteria', 0.05863), ('types', 0.05552), ('system', 0.01377), ('constraints', 0.01377), ('components', 0.01377), ('numbers', 0.01377), ('upper', 0.01377)]\n\nYou can also use lemmatization (disabled by default) to increase the weight for terms appearing with various inflectional variations:\n\n >>> keywords_extraction(example_abstract, top_p = 1, top_t=None,\n directed=False,\n syntactic_categories=custom_categories,\n stop_words=stop_list, lemma=True)[0][:15]\n [('type system', 0.2271), ('diophantine equation', 0.20513), ('system', 0.16497), ('algorithm', 0.14999), ('compatibility', 0.11774), ('construction', 0.07885), ('solution', 0.07885), ('criterion', 0.06542),('type', 0.06213), ('component', 0.01538), ('constraint', 0.01538), ('upper', 0.01538), ('inequations', 0.01538), ('number', 0.01538)]\n\nThe co-occurrence window size is 2 by default. You can try with a different number for your data:\n\n >>> keywords_extraction(example_abstract, window=5,\n top_p = 1, top_t=None, directed=False,\n stop_words=stop_list, lemma=True)[0][:15]\n [('linear diophantine equation', 0.19172), ('linear constraint', 0.13484), ('type system', 0.1347), ('strict inequations', 0.12532), ('system', 0.10514), ('nonstrict inequations', 0.09483), ('solution', 0.06903), ('natural number', 0.06711), ('minimal', 0.06346), ('algorithm', 0.05762), ('compatibility', 0.05089), ('construction', 0.04541), ('component', 0.04418), ('criterion', 0.04086), ('type', 0.02956)]\n\nTry with a centrality measures:\n\n >>> keywords_extraction(example_abstract, solver=\"current_flow_betweenness\",\n window=5, top_p = 1, top_t=None,\n directed=False, stop_words=stop_list,\n lemma=True)[0][:15]\n [('type system', 0.77869), ('system', 0.77869), ('solution', 0.32797), ('linear diophantine equation', 0.30657), ('linear constraint', 0.30657), ('minimal', 0.26052), ('algorithm', 0.21463), ('criterion', 0.19821), ('strict inequations', 0.19651), ('nonstrict inequations', 0.19651), ('compatibility', 0.1927), ('natural number', 0.11111), ('component', 0.11111), ('type', 0.10718), ('construction', 0.10039)]\n\nTuning your graph model as a black box can be problematic.\nYou can try to visualize your co-occurrence network with your sample dataset in order to manually validate your custom parameters:\n\n >>> from jgtextrank import preprocessing, build_cooccurrence_graph\n >>> import networkx as nx\n >>> import matplotlib.pyplot as plt\n >>> preprocessed_context = preprocessing(example_abstract, stop_words=stop_list, lemma=True)\n >>> cooccurrence_graph, context_tokens = build_cooccurrence_graph(preprocessed_context, window=2)\n >>> pos = nx.spring_layout(cooccurrence_graph,k=0.20,iterations=20)\n >>> nx.draw_networkx(cooccurrence_graph, pos=pos, arrows=True, with_requets labels=True)\n >>> plt.savefig(\"my_sample_cooccurrence_graph.png\")\n >>> plt.show()\n\n\nMore examples (e.g., with custom co-occurrence context, how to extract from a corpus of text files,\nfeed your own pre-segmented/pre-tagged data), please see [jgTextRank wiki](https://github.com/jerrygaoLondon/jgtextrank/wiki)\n\n### Documentation\n\nFor `jgtextrank` documentation, see:\n\n* [textrank](http://htmlpreview.github.io/?https://github.com/jerrygaoLondon/jgtextrank/blob/master/docs/jgtextrank.html)\n\n### Installation ###\n\nTo install from [PyPi](https://pypi.python.org/pypi/jgtextrank):\n\n pip install jgtextrank\n\nTo install from github\n\n pip install git+git://github.com/jerrygaoLondon/jgtextrank.git\n\nor\n\n pip install git+https://github.com/jerrygaoLondon/jgtextrank.git\n\nTo install from source\n\n python setup.py install\n\n### Dependencies\n\n* [nltk](http://www.nltk.org/)\n\n* [networkx](https://networkx.github.io/)\n\n### Status\n\n* Beta release (update)\n\n * Python implementation of TextRank algorithm for keywords extraction\n\n * Support directed/undirected and unweighted graph\n\n * >12 MWTs weighting methods\n\n * 3 pagerank implementations and >15 additional graph ranking algorithms\n\n * Parallelisation of vertices co-occurrence computation (allow to set number of available worker instances)\n\n * Support various custom settings and parameters (e.g., use of lemmatization,\n co-occurrence window size, options for two co-occurrence context strategies,\n use of custom syntactic filters, use of custom stop words)\n\n * Keywords extraction from pre-processed (pre-segmented or pre PoS tagged) corpus/context\n\n * Keywords extraction from a given corpus directory of raw text files\n\n * Export ranked result into 'csv' or 'json' file\n\n * Support visual analytics of vertices network\n\n\n### history ###\n* 0.1.2 Beta version - Aug 2018\n * bug fixes\n * 15 additional graph ranking algorithms\n* 0.1.1 Alpha version - 1st Jan 2018\n* 0.1.3 Beta version - March, 2019\n * minor fixes and documentation improvement", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/jerrygaoLondon/jgtextrank", "keywords": "textrank,parsing,natural language processing,nlp,keywords extraction,term extraction,text summarisation,text analytics,text mining,feature extraction,machine learning,graph algorithm,computational linguistics", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "jgtextrank", "package_url": "https://pypi.org/project/jgtextrank/", "platform": "", "project_url": "https://pypi.org/project/jgtextrank/", "project_urls": { "Homepage": "https://github.com/jerrygaoLondon/jgtextrank" }, "release_url": "https://pypi.org/project/jgtextrank/0.1.3/", "requires_dist": null, "requires_python": "", "summary": "Yet another Python implementation of TextRank: package for the creation, manipulation, and study of TextRank algorithm based keywords extraction and summarisation", "version": "0.1.3" }, "last_serial": 4906361, "releases": { "0.1.1": [ { "comment_text": "", "digests": { "md5": "9c0e1becd72a4c03b00355f9c2c09196", "sha256": "a76c166215bfb78fb5b46771f573c20aa742c5595c0b2afb116cd0a425b59da6" }, "downloads": -1, "filename": "jgtextrank-0.1.1-py3.6.egg", "has_sig": false, "md5_digest": "9c0e1becd72a4c03b00355f9c2c09196", "packagetype": "bdist_egg", "python_version": "3.6", "requires_python": null, "size": 31739, "upload_time": "2018-01-01T00:02:14", "url": "https://files.pythonhosted.org/packages/2d/43/19205b6a31dcc922144d8df0164dd96b0edce45baf075c7558123a80d301/jgtextrank-0.1.1-py3.6.egg" }, { "comment_text": "", "digests": { "md5": "6382f857c903fffc8eae0cca6f4abd4c", "sha256": "51fee402850eda62fc5eabae4477b1abb10a6d1da50528a909f598117b95307d" }, "downloads": -1, "filename": "jgtextrank-0.1.1.tar.gz", "has_sig": false, "md5_digest": "6382f857c903fffc8eae0cca6f4abd4c", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 26593, "upload_time": "2018-01-01T00:02:16", "url": "https://files.pythonhosted.org/packages/5b/d7/bb5b6031e1b3a4212ecea7f959de9376f66942c789b46764d8795f44a3d4/jgtextrank-0.1.1.tar.gz" } ], "0.1.2": [ { "comment_text": "", "digests": { "md5": "1c806bbf44a58d2f7db640a3cfeff8c1", "sha256": "b49b886d8c7f8e17ca262782c1f72a21b0a120edd7e534699e1fb19844e164fc" }, "downloads": -1, "filename": "jgtextrank-0.1.2-py3.6.egg", "has_sig": false, "md5_digest": "1c806bbf44a58d2f7db640a3cfeff8c1", "packagetype": "bdist_egg", "python_version": "3.6", "requires_python": null, "size": 70253, "upload_time": "2018-08-30T11:50:10", "url": "https://files.pythonhosted.org/packages/87/7c/1e33221e983ddb051ee275b8393ba6f47102eedf3e138e146e15d46e9db2/jgtextrank-0.1.2-py3.6.egg" }, { "comment_text": "", "digests": { "md5": "9a4d7564e65b61e69541a925780860f9", "sha256": "361dc45c7085ddb86aa6c19dd09964f7dadd0a10a2e63a4a78ea56978644b1d3" }, "downloads": -1, "filename": "jgtextrank-0.1.2.tar.gz", "has_sig": false, "md5_digest": "9a4d7564e65b61e69541a925780860f9", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 32842, "upload_time": "2018-08-30T11:50:11", "url": "https://files.pythonhosted.org/packages/63/f8/f3e0c591b01e0cd3f3d11fb52e09ba19e921727a22f4c849b10f6c438c7c/jgtextrank-0.1.2.tar.gz" } ], "0.1.3": [ { "comment_text": "", "digests": { "md5": "b30dd6bef2b3ce92de76bcaf5f4e0fc2", "sha256": "c11a5349f670c81aff7a8f023b1eda5ab95b37555d094bbb6fb80178da041a52" }, "downloads": -1, "filename": "jgtextrank-0.1.3-py3.6.egg", "has_sig": false, "md5_digest": "b30dd6bef2b3ce92de76bcaf5f4e0fc2", "packagetype": "bdist_egg", "python_version": "3.6", "requires_python": null, "size": 39634, "upload_time": "2019-03-06T17:04:37", "url": "https://files.pythonhosted.org/packages/ee/72/a5eecc55f26f928f68e43730aa67934c0f2eaeb275ce2484ed2eea368f47/jgtextrank-0.1.3-py3.6.egg" }, { "comment_text": "", "digests": { "md5": "8cb7c7663ca2410c88f3e0fd2e8ceb04", "sha256": "b7c59e63504f351247b121ebb8804b411ea1cdbf963445b75587e59e22dd5cde" }, "downloads": -1, "filename": "jgtextrank-0.1.3.tar.gz", "has_sig": false, "md5_digest": "8cb7c7663ca2410c88f3e0fd2e8ceb04", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 35301, "upload_time": "2019-03-06T17:00:27", "url": "https://files.pythonhosted.org/packages/b3/bd/91d6b9590e8e471138d40b23c60da7db8acf576422977d43646566c0b4a9/jgtextrank-0.1.3.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "b30dd6bef2b3ce92de76bcaf5f4e0fc2", "sha256": "c11a5349f670c81aff7a8f023b1eda5ab95b37555d094bbb6fb80178da041a52" }, "downloads": -1, "filename": "jgtextrank-0.1.3-py3.6.egg", "has_sig": false, "md5_digest": "b30dd6bef2b3ce92de76bcaf5f4e0fc2", "packagetype": "bdist_egg", "python_version": "3.6", "requires_python": null, "size": 39634, "upload_time": "2019-03-06T17:04:37", "url": "https://files.pythonhosted.org/packages/ee/72/a5eecc55f26f928f68e43730aa67934c0f2eaeb275ce2484ed2eea368f47/jgtextrank-0.1.3-py3.6.egg" }, { "comment_text": "", "digests": { "md5": "8cb7c7663ca2410c88f3e0fd2e8ceb04", "sha256": "b7c59e63504f351247b121ebb8804b411ea1cdbf963445b75587e59e22dd5cde" }, "downloads": -1, "filename": "jgtextrank-0.1.3.tar.gz", "has_sig": false, "md5_digest": "8cb7c7663ca2410c88f3e0fd2e8ceb04", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 35301, "upload_time": "2019-03-06T17:00:27", "url": "https://files.pythonhosted.org/packages/b3/bd/91d6b9590e8e471138d40b23c60da7db8acf576422977d43646566c0b4a9/jgtextrank-0.1.3.tar.gz" } ] }