{ "info": { "author": "AnJia", "author_email": "anjia0532@gmail.com", "bugtrack_url": null, "classifiers": [], "description": "Random proxy middleware for Scrapy (http://scrapy.org/)\n=======================================================\n\n**Based on https://github.com/aivarsk/scrapy-proxies , with support for loading proxies from https://github.com/qiyeboy/IPProxyPool**\n\nProcesses Scrapy requests using a random proxy from a list to avoid IP bans and\nimprove crawling speed.\n\nGet your proxy list from sites like http://www.hidemyass.com/ (copy-paste it into a text file\nand reformat each entry as http://host:port).\n\nInstall\n--------\n\nThe quick way:\n\n```bash\npip install scrapy-proxies-tool\n```\n\nOr check out the source and run:\n\n```bash\npython setup.py install\n```\n\nsettings.py\n-----------\n\n```python\n\n# Retry many times since proxies often fail\nRETRY_TIMES = 10\n# Retry on most error codes since proxies fail for different reasons\nRETRY_HTTP_CODES = [500, 503, 504, 400, 403, 404, 408]\n\nDOWNLOADER_MIDDLEWARES = {\n 'scrapy.downloadermiddlewares.retry.RetryMiddleware': 90,\n 'scrapy_proxies.RandomProxy': 100,\n 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,\n}\n\nPROXY_SETTINGS = {\n # Proxy list containing entries like\n # http://host1:port\n # http://username:password@host2:port\n # http://host3:port\n # ...\n # If PROXY_SETTINGS['from_proxies_server'] is True, the 'list' entries are proxy server addresses (ref https://github.com/qiyeboy/IPProxyPool and https://github.com/awolfly9/IPProxyTool )\n # Only HTTP is supported (ref https://github.com/qiyeboy/IPProxyPool#%E5%8F%82%E6%95%B0)\n # 'list': ['http://localhost:8000?protocol=0'],\n 'list':['/path/to/proxy/list.txt'],\n\n # Disable proxy settings and use the real IP when all proxies are unusable\n 'use_real_when_empty':False,\n 'from_proxies_server':False,\n\n # If proxy mode is 2, uncomment this line:\n # 'custom_proxy': \"http://host1:port\",\n\n # Proxy mode\n # 0 = Every request uses a different proxy\n # 1 = Take 
only one proxy from the list and assign it to every request\n # 2 = Use the custom proxy given in the settings\n 'mode':0\n}\n```\n\nFor older versions of Scrapy (before 1.0.0) you have to use the\n`scrapy.contrib.downloadermiddleware.retry.RetryMiddleware` and\n`scrapy.contrib.downloadermiddleware.httpproxy.HttpProxyMiddleware`\nmiddlewares instead.\n\n\nYour spider\n-----------\n\nIn each callback, make sure the proxy /really/ returned your target page by\nchecking for the site logo or some other significant element.\nIf it did not, retry the request with dont_filter=True:\n\n```python\n    if not response.xpath('//get/site/logo'):\n        yield Request(url=response.url, dont_filter=True)\n```", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/anjia0532/scrapy-proxies", "keywords": "Scrapy,scrapy-proxies,proxies,IPProxyTool", "license": "MIT Licence", "maintainer": "", "maintainer_email": "", "name": "scrapy-proxies-tool", "package_url": "https://pypi.org/project/scrapy-proxies-tool/", "platform": "", "project_url": "https://pypi.org/project/scrapy-proxies-tool/", "project_urls": { "Homepage": "https://github.com/anjia0532/scrapy-proxies" }, "release_url": "https://pypi.org/project/scrapy-proxies-tool/0.4.0/", "requires_dist": null, "requires_python": "", "summary": "Scrapy Proxies: random proxy middleware for Scrapy(support load proxies from IPProxyTool)", "version": "0.4.0" }, "last_serial": 4275306, "releases": { "0.3.0": [ { "comment_text": "", "digests": { "md5": "5c009894d847dcdc0baa5f1f9d8800af", "sha256": "2292cf063a1f00db93bf0733c06d632e60de88c4933b68207b62f81581da2539" }, "downloads": -1, "filename": "scrapy-proxies-tool-0.3.0.tar.gz", "has_sig": false, "md5_digest": "5c009894d847dcdc0baa5f1f9d8800af", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4291, "upload_time": "2018-09-15T17:06:08", "url": 
"https://files.pythonhosted.org/packages/06/f8/98a376ef4efb28401dd06ad3b561122a73e31fd0e3e3d0d7e82facf6dfb0/scrapy-proxies-tool-0.3.0.tar.gz" } ], "0.4.0": [ { "comment_text": "", "digests": { "md5": "42faf711cde6fd74f28394bdca1b93a8", "sha256": "1b5e2614b91625f52906b6f990b86efcafc3141eb3886f0de5978bbe63523915" }, "downloads": -1, "filename": "scrapy-proxies-tool-0.4.0.tar.gz", "has_sig": false, "md5_digest": "42faf711cde6fd74f28394bdca1b93a8", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4332, "upload_time": "2018-09-15T18:05:42", "url": "https://files.pythonhosted.org/packages/d6/9a/529796b44339608ff26026831bf8ebf123a804e001535ef8239f75329416/scrapy-proxies-tool-0.4.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "42faf711cde6fd74f28394bdca1b93a8", "sha256": "1b5e2614b91625f52906b6f990b86efcafc3141eb3886f0de5978bbe63523915" }, "downloads": -1, "filename": "scrapy-proxies-tool-0.4.0.tar.gz", "has_sig": false, "md5_digest": "42faf711cde6fd74f28394bdca1b93a8", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4332, "upload_time": "2018-09-15T18:05:42", "url": "https://files.pythonhosted.org/packages/d6/9a/529796b44339608ff26026831bf8ebf123a804e001535ef8239f75329416/scrapy-proxies-tool-0.4.0.tar.gz" } ] }