{ "info": { "author": "Yuri Baburov", "author_email": "burchik@gmail.com", "bugtrack_url": null, "classifiers": [ "Environment :: Web Environment", "Intended Audience :: Developers", "Operating System :: OS Independent", "Programming Language :: Python" ], "description": "This code is under the Apache License 2.0. http://www.apache.org/licenses/LICENSE-2.0\n\nThis is a Python port of a Ruby port of arc90's Readability project.\n\nhttp://lab.arc90.com/experiments/readability/\n\nIn a few words:\ngiven an HTML document, it pulls out the main body text and cleans it up.\nIt can also clean up the title, based on the latest readability.js code.\n\nBased on:\n - Latest readability.js ( https://github.com/MHordecki/readability-redux/blob/master/readability/readability.js )\n - Ruby port by starrhorne and iterationlabs\n - Python port by gfxmonk ( https://github.com/gfxmonk/python-readability , based on BeautifulSoup )\n - Decruft effort to move to lxml ( http://www.minvolai.com/blog/decruft-arc90s-readability-in-python/ )\n - \"BR to P\" fix from readability.js, which improves quality for smaller texts.\n - Contributions from GitHub users.\n\nInstallation::\n\n easy_install readability-lxml\n or\n pip install readability-lxml\n\nUsage::\n\n from readability.readability import Document\n import urllib\n html = urllib.urlopen(url).read()\n readable_article = Document(html).summary()\n readable_title = Document(html).short_title()\n\nCommand-line usage::\n\n python -m readability.readability -u http://pypi.python.org/pypi/readability-lxml\n\n\nExample using positive/negative keywords::\n\n python -m readability.readability -p intro -n newsindex,homepage-box,news-section -u http://python.org\n\n\nDocument() kwarg options:\n\n - attributes:\n - debug: output debug messages\n - min_text_length:\n - retry_length:\n - url: allows relative links to be made absolute\n - positive_keywords: the list of positive search patterns in classes and ids, for example: [\"news-item\", \"block\"]\n - 
negative_keywords: the list of negative search patterns in classes and ids, for example: [\"mysidebar\", \"related\", \"ads\"]\n\n\nUpdates\n\n - 0.2.5 Update setup.py for uploading .tar.gz to pypi\n - 0.2.6 Don't crash on documents with no title\n - 0.2.6.1 Document.short_title() properly works\n - 0.3 Added Document.encoding, positive_keywords and negative_keywords", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "http://github.com/hyperlinkapp/python-readability", "keywords": null, "license": "Apache License 2.0", "maintainer": null, "maintainer_email": null, "name": "PyReadability", "package_url": "https://pypi.org/project/PyReadability/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/PyReadability/", "project_urls": { "Download": "UNKNOWN", "Homepage": "http://github.com/hyperlinkapp/python-readability" }, "release_url": "https://pypi.org/project/PyReadability/0.4.0/", "requires_dist": null, "requires_python": null, "summary": "fast python port of arc90's readability tool", "version": "0.4.0" }, "last_serial": 1347880, "releases": { "0.4.0": [ { "comment_text": "", "digests": { "md5": "114c008b693d2ed3fd1a1e1b05d54768", "sha256": "f67e715150506fddc929e5f27fe90f72a2093ca0bd3d037163b2b66b02c1cecc" }, "downloads": -1, "filename": "PyReadability-0.4.0.tar.gz", "has_sig": false, "md5_digest": "114c008b693d2ed3fd1a1e1b05d54768", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 11887, "upload_time": "2014-12-17T15:26:44", "url": "https://files.pythonhosted.org/packages/82/19/a4d981fe223f76376423abc379f6619adfab74d9950356aee7eb6f35c0eb/PyReadability-0.4.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "114c008b693d2ed3fd1a1e1b05d54768", "sha256": "f67e715150506fddc929e5f27fe90f72a2093ca0bd3d037163b2b66b02c1cecc" }, "downloads": -1, "filename": "PyReadability-0.4.0.tar.gz", "has_sig": 
false, "md5_digest": "114c008b693d2ed3fd1a1e1b05d54768", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 11887, "upload_time": "2014-12-17T15:26:44", "url": "https://files.pythonhosted.org/packages/82/19/a4d981fe223f76376423abc379f6619adfab74d9950356aee7eb6f35c0eb/PyReadability-0.4.0.tar.gz" } ] }