{ "info": { "author": "Andreas van Cranenburgh", "author_email": "A.W.vanCranenburgh@uva.nl", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Environment :: Console", "Environment :: Web Environment", "Intended Audience :: Science/Research", "License :: OSI Approved :: Apache Software License", "Operating System :: POSIX", "Programming Language :: Cython", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3.3", "Topic :: Text Processing :: Linguistic" ], "description": "Readability\n===========\n\nAn implementation of traditional readability measures based on simple surface\ncharacteristics. These measures are basically linear regressions based on the\nnumber of words, syllables, and sentences.\n\nThe functionality is modeled after the UNIX ``style(1)`` command. Compared to the\nimplementation that is part of GNU diction,\nthis version supports UTF-8 encoded text, but expects sentence-segmented and\ntokenized text. The syllabification and word type recognition are based on\nsimple heuristics and only provide a rough measure. The supported languages\nare English, German, and Dutch. Adding support for a new language involves the\naddition of heuristics for the aforementioned syllabification and word type\nrecognition; see ``langdata.py``.\n\nNB: all readability formulas were developed for English, so the scales of the\noutcomes are only meaningful for English texts. 
The Dale-Chall measure uses the\noriginal word list for English, but for Dutch and German lists of frequent\nwords are used that were not specifically selected for recognizability by\nschool children.\n\nInstallation\n------------\n::\n\n $ pip install https://github.com/andreasvc/readability/tarball/master\n\nUsage\n-----\nFrom Python::\n\n >>> import readability\n >>> text = ('This is an example sentence .\\n'\n 'Note that tokens are separated by spaces and sentences by newlines .\\n')\n >>> results = readability.getmeasures(text, lang='en')\n >>> print(results['readability grades']['FleschReadingEase'])\n 55.95250000000002\n\nCommand line usage::\n\n $ readability --help\n Simple readability measures.\n\n Usage: readability [--lang=] [FILE]\n or: readability [--lang=] --csv FILES...\n\n By default, input is read from standard input.\n Text should be encoded with UTF-8,\n one sentence per line, tokens space-separated.\n\n Options:\n -L, --lang= Set language (available: de, nl, en).\n --csv Produce a table in comma separated value format on\n standard output given one or more filenames.\n --tokenizer= Specify a tokenizer including options that will be given\n each text on stdin and should return tokenized output on\n stdout. Not applicable when reading from stdin.\n\nFor proper results, the text should be tokenized.\n\n- For English, I recommend \"tokenizer\",\n cf. 
http://moin.delph-in.net/WeSearch/DocumentParsing\n- For Dutch, I recommend the tokenizer that is part of the Alpino parser:\n http://www.let.rug.nl/vannoord/alp/Alpino/.\n- ``ucto`` is a general multilingual tokenizer: http://ilk.uvt.nl/ucto\n\nExample using ``ucto``::\n\n $ ucto -L en -n -s '' \"CONRAD, Joseph - Lord Jim.txt\" | readability\n [...]\n readability grades:\n Kincaid: 5.44\n ARI: 6.39\n Coleman-Liau: 6.91\n FleschReadingEase: 85.17\n GunningFogIndex: 9.86\n LIX: 31.98\n SMOGIndex: 9.39\n RIX: 2.56\n DaleChallIndex: 8.02\n sentence info:\n characters_per_word: 4.17\n syll_per_word: 1.24\n words_per_sentence: 16.35\n sentences_per_paragraph: 11.5\n type_token_ratio: 0.09\n characters: 551335\n syllables: 164205\n words: 132211\n wordtypes: 12071\n sentences: 8087\n paragraphs: 703\n long_words: 20670\n complex_words: 10990\n complex_words_dc: 29908\n word usage:\n tobeverb: 3907\n auxverb: 1630\n conjunction: 4398\n pronoun: 18092\n preposition: 19290\n nominalization: 1167\n sentence beginnings:\n pronoun: 2578\n interrogative: 217\n article: 629\n subordination: 120\n conjunction: 236\n preposition: 397\n\nThe option ``--csv`` collects readability measures for a number of texts in\na table. To tokenize documents on the fly when using this option, use\nthe ``--tokenizer`` option. Example with the \"tokenizer\" tool::\n\n $ readability --csv --tokenizer='tokenizer -L en-u8 -P -S -E \"\" -N' */*.txt >readabilitymeasures.csv\n\nReferences\n----------\nThe following readability metrics are included:\n\n1. http://en.wikipedia.org/wiki/Automated_Readability_Index\n2. http://en.wikipedia.org/wiki/SMOG\n3. http://en.wikipedia.org/wiki/Flesch%E2%80%93Kincaid_Grade_Level#Flesch.E2.80.93Kincaid_Grade_Level\n4. http://en.wikipedia.org/wiki/Flesch%E2%80%93Kincaid_readability_test#Flesch_Reading_Ease\n5. http://en.wikipedia.org/wiki/Coleman-Liau_Index\n6. http://en.wikipedia.org/wiki/Gunning-Fog_Index\n7. 
https://en.wikipedia.org/wiki/Dale%E2%80%93Chall_readability_formula\n\nFor better readability measures, consider the following:\n\n- Collins-Thompson & Callan (2004). A language modeling approach to predicting reading difficulty.\n In Proc. of HLT/NAACL, pp. 193-200. http://aclweb.org/anthology/N04-1025.pdf\n- Schwarm & Ostendorf (2005). Reading level assessment using SVM and statistical language models.\n Proc. of ACL, pp. 523-530. http://www.aclweb.org/anthology/P05-1065.pdf\n- The Lexile framework for reading. http://www.lexile.com\n- Coh-Metrix. http://cohmetrix.memphis.edu/\n- Stylene: http://www.clips.ua.ac.be/category/projects/stylene\n- T-Scan: http://languagelink.let.uu.nl/tscan\n\nAcknowledgments\n---------------\nThe code is based on: https://github.com/mmautner/readability\n\nWhich in turn was based on: https://github.com/nltk/nltk_contrib/tree/master/nltk_contrib/readability\n", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/andreasvc/readability/", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "readability", "package_url": "https://pypi.org/project/readability/", "platform": "", "project_url": "https://pypi.org/project/readability/", "project_urls": { "Homepage": "https://github.com/andreasvc/readability/" }, "release_url": "https://pypi.org/project/readability/0.3.1/", "requires_dist": null, "requires_python": "", "summary": "Measure the readability of a given text using surface characteristics", "version": "0.3.1" }, "last_serial": 4689580, "releases": { "0.1": [ { "comment_text": "", "digests": { "md5": "ce09d671d2e29fb82e64c64bd743e0db", "sha256": "b6d03b185448a0f97dfc48f0f3e1590b4b48f86f2b1ec0188408098fe49b14b6" }, "downloads": -1, "filename": "readability-0.1.tar.gz", "has_sig": false, "md5_digest": "ce09d671d2e29fb82e64c64bd743e0db", "packagetype": "sdist", "python_version": "source", 
"requires_python": null, "size": 9449, "upload_time": "2014-04-13T13:18:49", "url": "https://files.pythonhosted.org/packages/ee/7c/61d1f8592f261beba7f4f79104e6b467cff86ccb0a15f3008fefa46ddef7/readability-0.1.tar.gz" } ], "0.2": [ { "comment_text": "", "digests": { "md5": "cf87a608385ab8dd6e5b7d65119be128", "sha256": "2246350df2b095c3b5859785ffd6140409a51f7f02b659fb1a1be60dcc88b8e7" }, "downloads": -1, "filename": "readability-0.2.tar.gz", "has_sig": false, "md5_digest": "cf87a608385ab8dd6e5b7d65119be128", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 10768, "upload_time": "2015-08-11T12:31:37", "url": "https://files.pythonhosted.org/packages/10/8c/869de6d3e4b7afdaa1d21ec83c10cf859c71267f3596a4c6f3c192f3c34e/readability-0.2.tar.gz" } ], "0.3": [ { "comment_text": "", "digests": { "md5": "32d9fd95b527f6160c7b5841bd7ea146", "sha256": "24fda3d0cb6dfe3a4efe15458fe8c2821e3d112cd41b449575b0eeef99119b6d" }, "downloads": -1, "filename": "readability-0.3.tar.gz", "has_sig": false, "md5_digest": "32d9fd95b527f6160c7b5841bd7ea146", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 36419, "upload_time": "2018-07-21T11:19:36", "url": "https://files.pythonhosted.org/packages/4c/cc/de564319ff609a445056c13409b726256e75fd2f32b07c03c39fc59ca07c/readability-0.3.tar.gz" } ], "0.3.1": [ { "comment_text": "", "digests": { "md5": "9a4ffde00314a2130ef941d363dd68dd", "sha256": "f9030df8bc31aad45baffa9a2d9ce1fdd8051833e5b5bda3027df32fdec00fad" }, "downloads": -1, "filename": "readability-0.3.1.tar.gz", "has_sig": false, "md5_digest": "9a4ffde00314a2130ef941d363dd68dd", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 34777, "upload_time": "2019-01-13T00:04:35", "url": "https://files.pythonhosted.org/packages/26/70/6f8750066255d4d2b82b813dd2550e0bd2bee99d026d14088a7b977cd0fc/readability-0.3.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": 
"9a4ffde00314a2130ef941d363dd68dd", "sha256": "f9030df8bc31aad45baffa9a2d9ce1fdd8051833e5b5bda3027df32fdec00fad" }, "downloads": -1, "filename": "readability-0.3.1.tar.gz", "has_sig": false, "md5_digest": "9a4ffde00314a2130ef941d363dd68dd", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 34777, "upload_time": "2019-01-13T00:04:35", "url": "https://files.pythonhosted.org/packages/26/70/6f8750066255d4d2b82b813dd2550e0bd2bee99d026d14088a7b977cd0fc/readability-0.3.1.tar.gz" } ] }