{ "info": { "author": "Grammy Jiang", "author_email": "grammy.jiang@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 2 - Pre-Alpha", "Environment :: Plugins", "Framework :: Scrapy", "Intended Audience :: Developers", "License :: OSI Approved :: BSD License", "Operating System :: OS Independent", "Programming Language :: Python", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Topic :: Internet :: WWW/HTTP", "Topic :: Software Development :: Libraries :: Python Modules" ], "description": "=======================\nScrapy-Proxy-Validation\n=======================\n\n.. image:: https://img.shields.io/pypi/v/scrapy-proxy-validation.svg\n :target: https://pypi.python.org/pypi/scrapy-proxy-validation\n :alt: PyPI Version\n\n.. image:: https://img.shields.io/travis/grammy-jiang/scrapy-proxy-validation/master.svg\n :target: http://travis-ci.org/grammy-jiang/scrapy-proxy-validation\n :alt: Build Status\n\n.. image:: https://img.shields.io/badge/wheel-yes-brightgreen.svg\n :target: https://pypi.python.org/pypi/scrapy-proxy-validation\n :alt: Wheel Status\n\n.. image:: https://img.shields.io/codecov/c/github/grammy-jiang/scrapy-proxy-validation/master.svg\n :target: http://codecov.io/github/grammy-jiang/scrapy-proxy-validation?branch=master\n :alt: Coverage report\n\nOverview\n========\n\nScrapy is a great framework for web crawling. This package provides a highly\ncustomized way to deal with the exceptions happening in the downloader\nmiddleware because of the proxy, and uses a signal to note relatives to treat\nthe invalidated proxies (e.g. moving to blacklist, renew the proxy pool).\n\nThere are two types of signals this package support:\n\n* traditional signal, sync\n\n* deferred signal, async\n\nPlease refer to the scrapy and twisted documents:\n\n* `Core API \u2014 Scrapy 1.4.0 documentation`_\n\n.. _`Core API \u2014 Scrapy 1.4.0 documentation`: https://doc.scrapy.org/en/latest/topics/api.html#topics-api-signals\n\n* `Signals \u2014 Scrapy 1.4.0 documentation`_\n\n.. _`Signals \u2014 Scrapy 1.4.0 documentation`: https://doc.scrapy.org/en/latest/topics/signals.html\n\n* `Deferred Reference \u2014 Twisted 17.9.0 documentation`_\n\n.. _`Deferred Reference \u2014 Twisted 17.9.0 documentation`: https://twistedmatrix.com/documents/current/core/howto/defer.html\n\nRequirements\n============\n\n* Scrapy\n\n* Tests on Python 3.5\n\n* Tests on Linux, but it is a pure python module, should work on any other\n platforms with official python and twisted support\n\nInstallation\n============\n\nThe quick way::\n\n pip install -U scrapy-proxy-validation\n\nOr put this middleware just beside the scrapy project.\n\nDocumentation\n=============\n\nSet this middleware in ``ITEMPIPELINES`` in ``settings.py``, for example::\n\n from scrapy_proxy_validation.downloadermiddlewares.proxy_validation import Validation\n\n DOWNLOADER_MIDDLEWARES.update({\n 'scrapy_proxy_validation.downloadmiddlewares.proxy_validation.ProxyValidation': 751\n })\n\n SIGNALS = [Validation(exception='twisted.internet.error.ConnectionRefusedError',\n signal='scrapy.signals.spider_closed'),\n Validation(exception='twisted.internet.error.ConnectionLost',\n signal='scrapy.signals.spider_closed',\n signal_deferred='scrapy.signals.spider_closed',\n limit=5)]\n\n RECYCLE_REQUEST = 'scrapy_proxy_validation.utils.recycle_request.recycle_request'\n\n\nSettings Reference\n==================\n\nSIGNALS\n-------\n\nA list of the class Validation with the exception it wants to deal with, the\nsync signal it sends, the async signal it sends and the limit it touches.\n\nRECYCLE_REQUEST\n---------------\n\nA function to recycle the request which have trouble with the proxy, the input\nargument is ``request``, and the output is ``request`` too.\n\n**Note: remember to set ``dont_filter`` to be ``True``, or the middleware\n``duplicate_fitler`` will remove this request.**\n\nBuilt-in Functions\n==================\n\nscrapy_proxy_validation.utils.recycle_request.recycle_request\n-------------------------------------------------------------\n\nThis is a built-in function to recycle the request which has a problem with\nthe proxy.\n\nThis function will remove the proxy keyword in meta and set ``dont_filter`` to\nbe ``True``.\n\nTo use this function, in ``settings.py``::\n\n RECYCLE_REQUEST = 'scrapy_proxy_validation.utils.recycle_request.recycle_request'\n\nNote\n====\n\nThere could be many different problems about the proxy, thus it will take some\nto collect them all and add to ``SIGNALS``. Please be patient, this is not a\nonce-time solution middleware for this case.\n\nTODO\n====\n\nNo idea, please let me know if you have!\n\n\n", "description_content_type": null, "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/grammy-jiang/scrapy-proxy-validation", "keywords": "", "license": "BSD", "maintainer": "", "maintainer_email": "", "name": "scrapy-proxy-validation", "package_url": "https://pypi.org/project/scrapy-proxy-validation/", "platform": "", "project_url": "https://pypi.org/project/scrapy-proxy-validation/", "project_urls": { "Homepage": "https://github.com/grammy-jiang/scrapy-proxy-validation" }, "release_url": "https://pypi.org/project/scrapy-proxy-validation/0.0.4/", "requires_dist": [ "txmongo", "scrapy (>=1.4.0)" ], "requires_python": "", "summary": "", "version": "0.0.4" }, "last_serial": 3299174, "releases": { "0.0.3": [ { "comment_text": "", "digests": { "md5": "d90b92aca73b5838515db7c3f1449a2d", "sha256": "a79f273da847f8f2f974e4c14d9e47242dd532a6625086403bba490ea4e85d5a" }, "downloads": -1, "filename": "scrapy_proxy_validation-0.0.3-py3-none-any.whl", "has_sig": false, "md5_digest": "d90b92aca73b5838515db7c3f1449a2d", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 8508, "upload_time": "2017-11-02T03:25:38", "url": "https://files.pythonhosted.org/packages/de/4b/f757846d840092592f31d4db8e03dc1a641b474168cf49eb20909f46fa94/scrapy_proxy_validation-0.0.3-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "43094d2b17648308e81f7595f27286fe", "sha256": "50f6f9e02861c6310b5961203fd2c2a5371cfc44db72adb7f96f85621d8adaef" }, "downloads": -1, "filename": "scrapy-proxy-validation-0.0.3.tar.gz", "has_sig": false, "md5_digest": "43094d2b17648308e81f7595f27286fe", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 21948, "upload_time": "2017-11-02T03:25:40", "url": "https://files.pythonhosted.org/packages/b9/2c/e939f6a0200f3dae4115a13ae945b657f96b35de818f722b0626d2be48c6/scrapy-proxy-validation-0.0.3.tar.gz" } ], "0.0.4": [ { "comment_text": "", "digests": { "md5": "e6e8eada3ede813ddd969688e7cf4e8e", "sha256": "7ebc6c9882d5e3ca47cab4d78a504aed8ae80915af0715310c82baf0e426a589" }, "downloads": -1, "filename": "scrapy_proxy_validation-0.0.4-py3-none-any.whl", "has_sig": false, "md5_digest": "e6e8eada3ede813ddd969688e7cf4e8e", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 8515, "upload_time": "2017-11-02T03:43:04", "url": "https://files.pythonhosted.org/packages/13/0b/a436a8ea64215bf1b14ffc8c1b3097b01301e96011b78a9c33f136f2de72/scrapy_proxy_validation-0.0.4-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "51ec3a913dfe488420607213fa3749b7", "sha256": "3f6e3d72bf3458fd712b4693073ad4f98c4e4db567c705a9106f3c1053c4369f" }, "downloads": -1, "filename": "scrapy-proxy-validation-0.0.4.tar.gz", "has_sig": false, "md5_digest": "51ec3a913dfe488420607213fa3749b7", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 21947, "upload_time": "2017-11-02T03:43:04", "url": "https://files.pythonhosted.org/packages/a3/9c/df6b9e97074b47405427ffc93e48746ec44e2004810a976fc695b60bcc07/scrapy-proxy-validation-0.0.4.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "e6e8eada3ede813ddd969688e7cf4e8e", "sha256": "7ebc6c9882d5e3ca47cab4d78a504aed8ae80915af0715310c82baf0e426a589" }, "downloads": -1, "filename": "scrapy_proxy_validation-0.0.4-py3-none-any.whl", "has_sig": false, "md5_digest": "e6e8eada3ede813ddd969688e7cf4e8e", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 8515, "upload_time": "2017-11-02T03:43:04", "url": "https://files.pythonhosted.org/packages/13/0b/a436a8ea64215bf1b14ffc8c1b3097b01301e96011b78a9c33f136f2de72/scrapy_proxy_validation-0.0.4-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "51ec3a913dfe488420607213fa3749b7", "sha256": "3f6e3d72bf3458fd712b4693073ad4f98c4e4db567c705a9106f3c1053c4369f" }, "downloads": -1, "filename": "scrapy-proxy-validation-0.0.4.tar.gz", "has_sig": false, "md5_digest": "51ec3a913dfe488420607213fa3749b7", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 21947, "upload_time": "2017-11-02T03:43:04", "url": "https://files.pythonhosted.org/packages/a3/9c/df6b9e97074b47405427ffc93e48746ec44e2004810a976fc695b60bcc07/scrapy-proxy-validation-0.0.4.tar.gz" } ] }