{ "info": { "author": "debugtalk", "author_email": "mail@debugtalk.com", "bugtrack_url": null, "classifiers": [ "Programming Language :: Python :: 3.6" ], "description": "# requests-crawler\n\nA web crawler based on [requests-html][requests-html], mainly targets for url validation test.\n\n## Features\n\n- based on [requests-html][requests-html], **full JavaScript support!**\n- support requests frequency limitation, e.g. rps/rpm\n- support crawl with headers and cookies\n- include & exclude mechanism\n- group visited urls by HTTP status code\n- display url's referers and hyper links\n\n## Installation/Upgrade\n\n```bash\n$ pip install requests-crawler\n```\n\nOnly **Python 3.6** is supported.\n\nTo ensure the installation or upgrade is successful, you can execute command `requests_crawler -V` to see if you can get the correct version number.\n\n```bash\n$ requests_crawler -V\n0.5.3\n```\n\n## Usage\n\n```text\n$ requests_crawler -h\nusage: requests_crawler [-h] [-V] [--log-level LOG_LEVEL]\n [--seed SEED]\n [--headers [HEADERS [HEADERS ...]]]\n [--cookies [COOKIES [COOKIES ...]]]\n [--requests-limit REQUESTS_LIMIT]\n [--interval-limit INTERVAL_LIMIT]\n [--include [INCLUDE [INCLUDE ...]]]\n [--exclude [EXCLUDE [EXCLUDE ...]]]\n [--workers WORKERS]\n\nA web crawler based on requests-html, mainly targets for url validation test.\n\noptional arguments:\n -h, --help show this help message and exit\n -V, --version show version\n --log-level LOG_LEVEL\n Specify logging level, default is INFO.\n --seed SEED Specify crawl seed url\n --headers [HEADERS [HEADERS ...]]\n Specify headers, e.g. 'User-Agent:iOS/10.3'\n --cookies [COOKIES [COOKIES ...]]\n Specify cookies, e.g. 'lang=en country:us'\n --requests-limit REQUESTS_LIMIT\n Specify requests limit for crawler, default rps.\n --interval-limit INTERVAL_LIMIT\n Specify limit interval, default 1 second.\n --include [INCLUDE [INCLUDE ...]]\n Urls include the snippets will be crawled recursively.\n --exclude [EXCLUDE [EXCLUDE ...]]\n Urls include the snippets will be skipped.\n --workers WORKERS Specify concurrent workers number.\n```\n\n## Examples\n\nBasic usage.\n\n```bash\n$ requests_crawler --seed http://debugtalk.com\n```\n\nCrawl with headers and cookies.\n\n```text\n$ requests_crawler --seed http://debugtalk.com --headers User-Agent:iOS/10.3 --cookies lang:en country:us\n```\n\nCrawl with 30 rps limitation.\n\n```text\n$ requests_crawler --seed http://debugtalk.com --requests-limit 30\n```\n\nCrawl with 500 rpm limitation.\n\n```text\n$ requests_crawler --seed http://debugtalk.com --requests-limit 500 --interval-limit 60\n```\n\nCrawl with extra hosts, e.g. `httprunner.org` will also be crawled recursively.\n\n```text\n$ requests_crawler --seed http://debugtalk.com --include httprunner.org\n```\n\nSkip excluded url snippets, e.g. urls include `httprunner` will be skipped.\n\n```text\n$ requests_crawler --seed http://debugtalk.com --exclude httprunner\n```\n\n\n\n\n[requests-html]: https://github.com/kennethreitz/requests-html\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/debugtalk/webcrawler", "keywords": "web crawler validation", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "requests-crawler", "package_url": "https://pypi.org/project/requests-crawler/", "platform": "", "project_url": "https://pypi.org/project/requests-crawler/", "project_urls": { "Homepage": "https://github.com/debugtalk/webcrawler" }, "release_url": "https://pypi.org/project/requests-crawler/0.5.4/", "requires_dist": [ "termcolor", "requests-html" ], "requires_python": "", "summary": "A web crawler based on requests-html, mainly targets for url validation test.", "version": "0.5.4" }, "last_serial": 3908742, "releases": { "0.5.2": [ { "comment_text": "", "digests": { "md5": "bf4b7dbf150455ac3f4c001749e00250", "sha256": "16366d80e70e69ae8b90270e4db6855cbd56c479eb04c7797bca5c0cdf3566a1" }, "downloads": -1, "filename": "requests_crawler-0.5.2-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "bf4b7dbf150455ac3f4c001749e00250", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 20613, "upload_time": "2018-05-26T16:05:07", "url": "https://files.pythonhosted.org/packages/1c/30/c34ddc0638ef6e6f98a3aec6ded65748fe0a3414e79758bff2f32265b1a6/requests_crawler-0.5.2-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "676263e06bf426abb3defe6e9d438c37", "sha256": "ce82979b83a7da48f5045b85298f3cc0c85a7f830cb269bf5ac820e6270559f9" }, "downloads": -1, "filename": "requests-crawler-0.5.2.tar.gz", "has_sig": false, "md5_digest": "676263e06bf426abb3defe6e9d438c37", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 11355, "upload_time": "2018-05-26T16:05:08", "url": "https://files.pythonhosted.org/packages/3f/93/7c494291ec676b74b6341e7c466eafa394efdf689797a5d4c9378f71cb0a/requests-crawler-0.5.2.tar.gz" } ], "0.5.4": [ { "comment_text": "", "digests": { "md5": "e9fad5898a1c350dd7730443f3feeb1e", "sha256": "e2b557e8fdfa5e66ccbea8ce7baf7424cc6d4b3582375963b1fee1b23e47a297" }, "downloads": -1, "filename": "requests_crawler-0.5.4-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "e9fad5898a1c350dd7730443f3feeb1e", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 20605, "upload_time": "2018-05-29T11:27:40", "url": "https://files.pythonhosted.org/packages/7d/b7/c73ab226be33a1e1788f8b5165b700f7557266354f6db7619494c9267e6a/requests_crawler-0.5.4-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "f88b3224bc2d36c9d782c5cbe0896e23", "sha256": "fa4e14bc0c2d203765747266b8836e487aa777d86dc0bd34972c7f8cdae098e3" }, "downloads": -1, "filename": "requests-crawler-0.5.4.tar.gz", "has_sig": false, "md5_digest": "f88b3224bc2d36c9d782c5cbe0896e23", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 11330, "upload_time": "2018-05-29T11:27:42", "url": "https://files.pythonhosted.org/packages/61/2a/ce5aa0db4a6d81e27d8676be7734399cf986fda0401f12b220a9fba63785/requests-crawler-0.5.4.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "e9fad5898a1c350dd7730443f3feeb1e", "sha256": "e2b557e8fdfa5e66ccbea8ce7baf7424cc6d4b3582375963b1fee1b23e47a297" }, "downloads": -1, "filename": "requests_crawler-0.5.4-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "e9fad5898a1c350dd7730443f3feeb1e", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 20605, "upload_time": "2018-05-29T11:27:40", "url": "https://files.pythonhosted.org/packages/7d/b7/c73ab226be33a1e1788f8b5165b700f7557266354f6db7619494c9267e6a/requests_crawler-0.5.4-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "f88b3224bc2d36c9d782c5cbe0896e23", "sha256": "fa4e14bc0c2d203765747266b8836e487aa777d86dc0bd34972c7f8cdae098e3" }, "downloads": -1, "filename": "requests-crawler-0.5.4.tar.gz", "has_sig": false, "md5_digest": "f88b3224bc2d36c9d782c5cbe0896e23", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 11330, "upload_time": "2018-05-29T11:27:42", "url": "https://files.pythonhosted.org/packages/61/2a/ce5aa0db4a6d81e27d8676be7734399cf986fda0401f12b220a9fba63785/requests-crawler-0.5.4.tar.gz" } ] }