{ "info": { "author": "Grammy Jiang", "author_email": "grammy.jiang@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 2 - Pre-Alpha", "Environment :: Plugins", "Framework :: Scrapy", "Intended Audience :: Developers", "License :: OSI Approved :: BSD License", "Operating System :: OS Independent", "Programming Language :: Python", "Programming Language :: Python :: 2", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.3", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Topic :: Internet :: WWW/HTTP", "Topic :: Software Development :: Libraries :: Python Modules" ], "description": "======================\nScrapy-Block-Inspector\n======================\n\n.. image:: https://img.shields.io/pypi/v/scrapy-block-inspector.svg\n :target: https://pypi.python.org/pypi/scrapy-block-inspector\n :alt: PyPI Version\n\n.. image:: https://img.shields.io/travis/grammy-jiang/scrapy-block-inspector/master.svg\n :target: http://travis-ci.org/grammy-jiang/scrapy-block-inspector\n :alt: Build Status\n\n.. image:: https://img.shields.io/badge/wheel-yes-brightgreen.svg\n :target: https://pypi.python.org/pypi/scrapy-block-inspector\n :alt: Wheel Status\n\n.. image:: https://img.shields.io/codecov/c/github/grammy-jiang/scrapy-block-inspector/master.svg\n :target: http://codecov.io/github/grammy-jiang/scrapy-block-inpector?branch=master\n :alt: Coverage report\n\nOverview\n========\n\nScrapy is a great framework for web crawling. This package provides a spider\nmiddleware that inspects whether the spider has been blocked, in a highly\ncustomizable way.\n\nRequirements\n============\n\n* Tested on Python 2.7 and Python 3.5, but it should work on other versions,\n Python 3.3 or higher\n\n* Tested on Linux, but since it is a pure Python module, it should work on any\n other platform that Python officially supports, e.g. 
Windows, Mac OS X, BSD\n\nInstallation\n============\n\nThe quick way::\n\n pip install scrapy-block-inspector\n\nOr place this middleware alongside your Scrapy project.\n\nDocumentation\n=============\n\nEnable the Block Inspector spider middleware in ``settings.py``, for example::\n\n # -----------------------------------------------------------------------------\n # BLOCK INSPECTOR\n # -----------------------------------------------------------------------------\n\n SPIDER_MIDDLEWARES.update({\n 'scrapy_block_inspector.spidermiddlewares.block_inspector.BlockInspector': 500,\n })\n BLOCK_INSPECTOR = 'scrapy_project.spiders.spider.inspect_block'\n BLOCK_SIGNALS = ['scrapy_rotated_proxy.signals.proxy_block']\n BLOCK_SIGNALS_DEFERRED = ['scrapy_httpcache.signals.response_block']\n RECYCLE_BLOCK_REQUEST = 'scrapy_project.utils.recycle_block_request'\n\nThis middleware adds a new stat to the stats collector, named\n'block_inspector/block'.\n\nSettings Reference\n==================\n\nBLOCK_INSPECTOR\n---------------\n\nA function used by the spider middleware to detect a block. It returns\n``True`` if the spider is blocked, and ``False`` otherwise.\n\nThe input of this function is the ``response``.\n\nBLOCK_SIGNALS\n-------------\n\nWhen a block is detected, this spider middleware can send a signal through the\n``signal manager`` of the crawler so that other components (middlewares,\nextensions, stats, etc.) can react accordingly.\n\nThis should be a list.\n\nBLOCK_SIGNALS_DEFERRED\n----------------------\n\nIf a signal is connected to a function or method that returns a deferred\nobject, that signal should be listed in this setting.\n\nThis should be a list.\n\nRECYCLE_BLOCK_REQUEST\n---------------------\n\nA function to recycle the blocked request. 
Sometimes a blocked request needs to\nbe recycled after some further treatment, such as removing proxy-related keys\nfrom ``request.meta``.\n\n**Note: this middleware adds 'dont_filter=True' automatically.**\n\nThe input of this function is the request.\n\nBuilt-in Functions To Inspect Block\n===================================\n\ninspect_block_google_recaptcha\n------------------------------\n\nThis function checks for a Google reCAPTCHA block.\n\nTo use this inspector, in settings::\n\n BLOCK_INSPECTOR = 'scrapy_block_inspector.utils.inspect_block_google_recaptcha.inspect_block'\n\nNOTE\n====\n\nPlease note: in Scrapy, an exception raised by the method\n``process_spider_input`` is sent to ``request.errback`` first, if an errback\nis defined. So please make sure the exception ``BlockException`` defined by\nthis middleware can be raised in the errback function, so that the method\n``process_spider_exception`` is triggered correctly.\n\nTODO\n====\n\n\n", "description_content_type": null, "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/grammy-jiang/scrapy-block-inspector", "keywords": "", "license": "BSD", "maintainer": "", "maintainer_email": "", "name": "scrapy-block-inspector", "package_url": "https://pypi.org/project/scrapy-block-inspector/", "platform": "", "project_url": "https://pypi.org/project/scrapy-block-inspector/", "project_urls": { "Homepage": "https://github.com/grammy-jiang/scrapy-block-inspector" }, "release_url": "https://pypi.org/project/scrapy-block-inspector/0.0.2/", "requires_dist": [ "scrapy (>=1.4.0)", "PyPyDispatcher (>=2.1.0); platform_python_implementation == \"PyPy\"" ], "requires_python": "", "summary": "", "version": "0.0.2" }, "last_serial": 3259083, "releases": { "0.0.2": [ { "comment_text": "", "digests": { "md5": "9577f94a207a9b39c53a9f78a6f5bc95", "sha256": "3d25e4b3adbe0160898036011ab86c052f4f4f8a1480cedb25bf2f1fae7a10ad" }, "downloads": 
-1, "filename": "scrapy_block_inspector-0.0.2-py2-none-any.whl", "has_sig": false, "md5_digest": "9577f94a207a9b39c53a9f78a6f5bc95", "packagetype": "bdist_wheel", "python_version": "py2", "requires_python": null, "size": 8333, "upload_time": "2017-10-18T08:48:54", "url": "https://files.pythonhosted.org/packages/27/20/51d5730140b88445244128e543aefeae4344daab27d30bbf96a0a593424f/scrapy_block_inspector-0.0.2-py2-none-any.whl" }, { "comment_text": "", "digests": { "md5": "3830ed55e38293fbd43e1c30c52ef865", "sha256": "d3df02536e846faf7d1f03f54179e1f8e3de557699bffcd90162d302eba66fde" }, "downloads": -1, "filename": "scrapy_block_inspector-0.0.2-py3-none-any.whl", "has_sig": false, "md5_digest": "3830ed55e38293fbd43e1c30c52ef865", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 8335, "upload_time": "2017-10-18T08:49:00", "url": "https://files.pythonhosted.org/packages/13/7e/1d51ca163c04eda0f40f5817c1bb010244dec11510289f514a3285baa152/scrapy_block_inspector-0.0.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "e9515d8d047f37c1fe4bba6f9423373d", "sha256": "5d83ad1ec75aca5ea31b475b4c6025cff05595acafe64abd30b12bc8ae56c901" }, "downloads": -1, "filename": "scrapy-block-inspector-0.0.2.tar.gz", "has_sig": false, "md5_digest": "e9515d8d047f37c1fe4bba6f9423373d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 22075, "upload_time": "2017-10-18T08:48:56", "url": "https://files.pythonhosted.org/packages/6b/66/c4e64355541587ebfc38215b23fc395560f75b9ddb8067dffe2b81c8b610/scrapy-block-inspector-0.0.2.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "9577f94a207a9b39c53a9f78a6f5bc95", "sha256": "3d25e4b3adbe0160898036011ab86c052f4f4f8a1480cedb25bf2f1fae7a10ad" }, "downloads": -1, "filename": "scrapy_block_inspector-0.0.2-py2-none-any.whl", "has_sig": false, "md5_digest": "9577f94a207a9b39c53a9f78a6f5bc95", "packagetype": "bdist_wheel", "python_version": "py2", 
"requires_python": null, "size": 8333, "upload_time": "2017-10-18T08:48:54", "url": "https://files.pythonhosted.org/packages/27/20/51d5730140b88445244128e543aefeae4344daab27d30bbf96a0a593424f/scrapy_block_inspector-0.0.2-py2-none-any.whl" }, { "comment_text": "", "digests": { "md5": "3830ed55e38293fbd43e1c30c52ef865", "sha256": "d3df02536e846faf7d1f03f54179e1f8e3de557699bffcd90162d302eba66fde" }, "downloads": -1, "filename": "scrapy_block_inspector-0.0.2-py3-none-any.whl", "has_sig": false, "md5_digest": "3830ed55e38293fbd43e1c30c52ef865", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 8335, "upload_time": "2017-10-18T08:49:00", "url": "https://files.pythonhosted.org/packages/13/7e/1d51ca163c04eda0f40f5817c1bb010244dec11510289f514a3285baa152/scrapy_block_inspector-0.0.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "e9515d8d047f37c1fe4bba6f9423373d", "sha256": "5d83ad1ec75aca5ea31b475b4c6025cff05595acafe64abd30b12bc8ae56c901" }, "downloads": -1, "filename": "scrapy-block-inspector-0.0.2.tar.gz", "has_sig": false, "md5_digest": "e9515d8d047f37c1fe4bba6f9423373d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 22075, "upload_time": "2017-10-18T08:48:56", "url": "https://files.pythonhosted.org/packages/6b/66/c4e64355541587ebfc38215b23fc395560f75b9ddb8067dffe2b81c8b610/scrapy-block-inspector-0.0.2.tar.gz" } ] }