{ "info": { "author": "Alexander Afanasyev", "author_email": "afanasieffav@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Framework :: Scrapy", "Intended Audience :: Developers", "License :: OSI Approved :: BSD License", "Programming Language :: Python", "Topic :: Internet :: WWW/HTTP" ], "description": ".. image:: https://badge.fury.io/py/scrapy-beautifulsoup.svg\n :target: http://badge.fury.io/py/scrapy-beautifulsoup\n :alt: PyPI version\n\n.. image:: https://requires.io/github/alecxe/scrapy-beautifulsoup/requirements.svg?branch=master\n :target: https://requires.io/github/alecxe/scrapy-beautifulsoup/requirements/?branch=master\n :alt: Requirements Status\n\nscrapy-beautifulsoup\n====================\n\nSimple Scrapy middleware to process non-well-formed HTML with BeautifulSoup\n\nInstallation\n============\n\nThe package is on PyPI and can be installed with ``pip``:\n\n::\n\n pip install scrapy-beautifulsoup\n\nConfiguration\n-------------\n\nAdd the middleware to the ``DOWNLOADER_MIDDLEWARES`` dictionary setting:\n\n::\n\n DOWNLOADER_MIDDLEWARES = {\n 'scrapy_beautifulsoup.middleware.BeautifulSoupMiddleware': 400\n }\n\n\nBy default, ``BeautifulSoup`` uses the built-in ``html.parser`` parser. To change it, set the ``BEAUTIFULSOUP_PARSER`` setting:\n\n::\n \n BEAUTIFULSOUP_PARSER = \"html5lib\" # or BEAUTIFULSOUP_PARSER = \"lxml\"\n\n``html5lib`` is an *extremely lenient* parser and, if the target HTML is seriously broken, you might consider making it your first choice. \nNote: `html5lib `_ has to be installed in this case:\n\n::\n \n pip install html5lib\n\nMotivation\n==========\n\n`BeautifulSoup `_ itself with the help of an `underlying parser of choice `_ does a pretty good job of handling non-well-formed or broken HTML.\nIn some cases, it makes sense to pipe the HTML through ``BeautifulSoup`` to \"fix\" it.\n\n.. 
|GitHub version| image:: https://badge.fury.io/gh/alecxe%2Fscrapy-beautifulsoup.svg\n :target: http://badge.fury.io/gh/alecxe%2Fscrapy-beautifulsoup\n.. |Requirements Status| image:: https://requires.io/github/alecxe/scrapy-beautifulsoup/requirements.svg?branch=master\n :target: https://requires.io/github/alecxe/scrapy-beautifulsoup/requirements/?branch=master", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/alecxe/scrapy-beautifulsoup", "keywords": "scrapy beautifulsoup html html-parsing web-scraping", "license": "New BSD License", "maintainer": null, "maintainer_email": null, "name": "scrapy-beautifulsoup", "package_url": "https://pypi.org/project/scrapy-beautifulsoup/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/scrapy-beautifulsoup/", "project_urls": { "Download": "UNKNOWN", "Homepage": "https://github.com/alecxe/scrapy-beautifulsoup" }, "release_url": "https://pypi.org/project/scrapy-beautifulsoup/0.0.2/", "requires_dist": null, "requires_python": null, "summary": "Simple Scrapy middleware to process non-well-formed HTML with BeautifulSoup", "version": "0.0.2" }, "last_serial": 2364660, "releases": { "0.0.1": [ { "comment_text": "", "digests": { "md5": "0445640d4cbcc9454aa559fd96cf88d5", "sha256": "5c03e3d3c216a13d2222ff9eb31aa8436c076a5d23ccaf76ca6991b6c74d6d32" }, "downloads": -1, "filename": "scrapy_beautifulsoup-0.0.1-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "0445640d4cbcc9454aa559fd96cf88d5", "packagetype": "bdist_wheel", "python_version": "2.7", "requires_python": null, "size": 4196, "upload_time": "2016-09-26T18:00:05", "url": "https://files.pythonhosted.org/packages/1a/3e/091dffa3e05197b8b61b0ade86949bee8eccc53399ac4a79bf8542c9c163/scrapy_beautifulsoup-0.0.1-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "0fd4e6331e706d07c088971f502e33bc", "sha256": 
"05d5b1ca40bf84f3de72001d1f5e20a6f6ca618695c4e518eb53bc9618e7a42c" }, "downloads": -1, "filename": "scrapy-beautifulsoup-0.0.1.tar.gz", "has_sig": false, "md5_digest": "0fd4e6331e706d07c088971f502e33bc", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 2092, "upload_time": "2016-09-26T18:00:14", "url": "https://files.pythonhosted.org/packages/4a/d9/a803ed0e57d589ecaa6fdcca592f335f095b5f34cfce45ea6543d6e1f00d/scrapy-beautifulsoup-0.0.1.tar.gz" } ], "0.0.2": [ { "comment_text": "", "digests": { "md5": "3f522f73be574c5d2088fb9335dc7660", "sha256": "354fb34f6d302768cb2e6380464c3310934af1de673714f3d6c46b8d0f88c3a1" }, "downloads": -1, "filename": "scrapy_beautifulsoup-0.0.2-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "3f522f73be574c5d2088fb9335dc7660", "packagetype": "bdist_wheel", "python_version": "2.7", "requires_python": null, "size": 4548, "upload_time": "2016-09-26T18:09:59", "url": "https://files.pythonhosted.org/packages/70/06/7c0f6a2f0a595cfa767fb123635f1b347fc23fe60d5f3b94eabc19582520/scrapy_beautifulsoup-0.0.2-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "fcf611c65047d783ebbadf80d0718b9f", "sha256": "6cf3158d257bb3d95dc45b8892d35dbf1d356afa4c33d4b1829fb34cdfbbd3be" }, "downloads": -1, "filename": "scrapy-beautifulsoup-0.0.2.tar.gz", "has_sig": false, "md5_digest": "fcf611c65047d783ebbadf80d0718b9f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 2304, "upload_time": "2016-09-26T18:09:49", "url": "https://files.pythonhosted.org/packages/83/53/3b51bc3dc26e4007241f9bcdb9693501e026192898be39cfe33791db0fff/scrapy-beautifulsoup-0.0.2.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "3f522f73be574c5d2088fb9335dc7660", "sha256": "354fb34f6d302768cb2e6380464c3310934af1de673714f3d6c46b8d0f88c3a1" }, "downloads": -1, "filename": "scrapy_beautifulsoup-0.0.2-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "3f522f73be574c5d2088fb9335dc7660", 
"packagetype": "bdist_wheel", "python_version": "2.7", "requires_python": null, "size": 4548, "upload_time": "2016-09-26T18:09:59", "url": "https://files.pythonhosted.org/packages/70/06/7c0f6a2f0a595cfa767fb123635f1b347fc23fe60d5f3b94eabc19582520/scrapy_beautifulsoup-0.0.2-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "fcf611c65047d783ebbadf80d0718b9f", "sha256": "6cf3158d257bb3d95dc45b8892d35dbf1d356afa4c33d4b1829fb34cdfbbd3be" }, "downloads": -1, "filename": "scrapy-beautifulsoup-0.0.2.tar.gz", "has_sig": false, "md5_digest": "fcf611c65047d783ebbadf80d0718b9f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 2304, "upload_time": "2016-09-26T18:09:49", "url": "https://files.pythonhosted.org/packages/83/53/3b51bc3dc26e4007241f9bcdb9693501e026192898be39cfe33791db0fff/scrapy-beautifulsoup-0.0.2.tar.gz" } ] }