{ "info": { "author": "Tapan Pandita", "author_email": "tapan.pandita@gmail.com", "bugtrack_url": null, "classifiers": [ "Framework :: AsyncIO", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.7" ], "description": "AIOCrawler\n==========\n[![Build Status](https://travis-ci.org/tapanpandita/aiocrawler.svg?branch=master)](https://travis-ci.org/tapanpandita/aiocrawler)\n[![Codacy Badge](https://api.codacy.com/project/badge/Grade/eab04685503c490082f1c6a545c4016e)](https://www.codacy.com/app/tapanpandita/aiocrawler?utm_source=github.com&utm_medium=referral&utm_content=tapanpandita/aiocrawler&utm_campaign=Badge_Grade)\n[![PyPI version](https://badge.fury.io/py/pyaiocrawler.svg)](https://badge.fury.io/py/pyaiocrawler)\n\nAsynchronous web crawler built on [asyncio](https://docs.python.org/3/library/asyncio.html)\n\nInstallation\n------------\n```shell\npip install pyaiocrawler\n```\nUsage\n-----\n### Generating sitemap\n```python\nimport asyncio\nfrom aiocrawler import SitemapCrawler\n\ncrawler = SitemapCrawler('https://www.google.com', depth=3)\nsitemap = asyncio.run(crawler.get_results())\n```\n### Configuring the crawler\n```python\nfrom aiocrawler import SitemapCrawler\n\ncrawler = SitemapCrawler(\n init_url='https://www.google.com', # The base URL to start crawling from\n depth=3, # The maximum depth to crawl till\n concurrency=300, # Maximum concurrent requests to make\n max_retries=3, # Maximum times the crawler will retry to get a response from a URL\n user_agent='My Crawler', # Use a custom user agent for requests\n)\n```\n### Extending the crawler\nTo create your own crawler, simply subclass `AIOCrawler` and implement the `parse` method. For every page crawled, the `parse` method is executed with the url of the page, the links in that page and the html of the page. The return of the `parse` method is appended to an array which is then available when the `get_results` method is called. We have implemented an example crawler here that extracts the title from the page.\n```python\nimport asyncio\nfrom aiocrawler import AIOCrawler\nfrom bs4 import BeautifulSoup # We will use beautifulsoup to extract the title from the html\nfrom typing import Set, Tuple\n\n\nclass TitleScraper(AIOCrawler):\n '''\n Subclasses AIOCrawler to extract titles for the pages on the given domain\n '''\n timeout = 10\n max_redirects = 2\n\n def parse(self, url: str, links: Set[str], html: bytes) -> Tuple[str, str]:\n '''\n Returns the url and the title of the url\n '''\n soup = BeautifulSoup(html, 'html.parser')\n title = soup.find('title').string\n return url, title\n\n\ncrawler = TitleScraper('https://www.google.com', 3)\ntitles = asyncio.run(crawler.get_results())\n```\nContributing\n------------\n### Installing dependencies\n```shell\npipenv install --dev\n```\n### Running tests\n```shell\npytest --cov=aiocrawler\n```\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/tapanpandita/aiocrawler", "keywords": "", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "pyaiocrawler", "package_url": "https://pypi.org/project/pyaiocrawler/", "platform": "", "project_url": "https://pypi.org/project/pyaiocrawler/", "project_urls": { "Homepage": "https://github.com/tapanpandita/aiocrawler" }, "release_url": "https://pypi.org/project/pyaiocrawler/0.5.0/", "requires_dist": [ "aiohttp", "beautifulsoup4", "cchardet", "aiodns" ], "requires_python": "", "summary": "Asynchronous web crawler built on asyncio", "version": "0.5.0" }, "last_serial": 5717515, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "md5": "4794ad5dc7e8d37edd4b853036fa5fcf", "sha256": "e127070c59faa1ef056f7089c9147e0921e237bba2b6e8cc4b1b3d3368a850b5" }, "downloads": -1, "filename": "pyaiocrawler-0.1.0-py3-none-any.whl", "has_sig": false, "md5_digest": "4794ad5dc7e8d37edd4b853036fa5fcf", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 3744, "upload_time": "2019-08-07T17:24:42", "url": "https://files.pythonhosted.org/packages/1b/76/09c0ff49732baa96c66646d28e4ee1a9f12e1f30de193435f7e9fd4f7214/pyaiocrawler-0.1.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "7f01c437cdd796f13459ff2e09adb835", "sha256": "a41779df263c1c907faa52ca1145079f42d7a2a3122326523f5d501e3da0e80b" }, "downloads": -1, "filename": "pyaiocrawler-0.1.0.tar.gz", "has_sig": false, "md5_digest": "7f01c437cdd796f13459ff2e09adb835", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 2620, "upload_time": "2019-08-07T17:24:44", "url": "https://files.pythonhosted.org/packages/34/c1/114f8987f4e72cf73dd047887a1c78eac1efc588bc323ae710ab53309c6c/pyaiocrawler-0.1.0.tar.gz" } ], "0.1.1": [ { "comment_text": "", "digests": { "md5": "8ae12f9d8326d18e1d4e685b775ed924", "sha256": "aa8c555cd308ff5abe5febc76782258d9c5e69c3f17b3651f1f8e86666849d6f" }, "downloads": -1, "filename": "pyaiocrawler-0.1.1-py3-none-any.whl", "has_sig": false, "md5_digest": "8ae12f9d8326d18e1d4e685b775ed924", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 4175, "upload_time": "2019-08-07T19:04:34", "url": "https://files.pythonhosted.org/packages/ad/04/b182f0f8b250f38a650f12b939d57a1f2a5d0d84c8847eb0fb7352d41ca3/pyaiocrawler-0.1.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "51f2c78a0f81b648ba9545c4b61f4430", "sha256": "49c23687a0ea066c0ef4aab3c6a8b2cf6fd64faf4b8eab115a3053c047c9ea4a" }, "downloads": -1, "filename": "pyaiocrawler-0.1.1.tar.gz", "has_sig": false, "md5_digest": "51f2c78a0f81b648ba9545c4b61f4430", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3164, "upload_time": "2019-08-07T19:04:37", "url": "https://files.pythonhosted.org/packages/e2/de/c4a5add11a6d496c79b28d044b1ab53f245242c7f39fe4055430c1b73036/pyaiocrawler-0.1.1.tar.gz" } ], "0.1.2": [ { "comment_text": "", "digests": { "md5": "11ac0f15dd16ec6124eae2b1d118004a", "sha256": "4f69c992226aaef0b19e6c2432d7e9a5d0a452cf050a8d8afb78371b3e47436f" }, "downloads": -1, "filename": "pyaiocrawler-0.1.2-py3-none-any.whl", "has_sig": false, "md5_digest": "11ac0f15dd16ec6124eae2b1d118004a", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 4221, "upload_time": "2019-08-07T19:03:36", "url": "https://files.pythonhosted.org/packages/73/bc/9d688727de942cf7201a7c561cd5c08fca56866ab714b43d26bd740d4e1a/pyaiocrawler-0.1.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "5cb7129eb2c3da7916365bd5b155f8ba", "sha256": "4b0fd1a8b57645a8808341179067471478f148f0d8da73861933ba7640289c27" }, "downloads": -1, "filename": "pyaiocrawler-0.1.2.tar.gz", "has_sig": false, "md5_digest": "5cb7129eb2c3da7916365bd5b155f8ba", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3208, "upload_time": "2019-08-07T19:03:38", "url": "https://files.pythonhosted.org/packages/f3/9c/7c8552a53e466f422919068e3715ff9382cb61adc9d14c443e90093a033c/pyaiocrawler-0.1.2.tar.gz" } ], "0.2.0": [ { "comment_text": "", "digests": { "md5": "3c8918949d6104be776a004788348bff", "sha256": "bc742de513e3638be5dfaef2a96bfef759e2e62bca95996445b78600523c14ab" }, "downloads": -1, "filename": "pyaiocrawler-0.2.0-py3-none-any.whl", "has_sig": false, "md5_digest": "3c8918949d6104be776a004788348bff", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 4984, "upload_time": "2019-08-08T16:24:16", "url": "https://files.pythonhosted.org/packages/98/15/5713e718cfb3271fe0d50b6cf586d5ea5ac55c5484a5da3b8312993964e3/pyaiocrawler-0.2.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "83ecc0cf9b5ee9ae0d89831f2102f886", "sha256": "9acf43bc9d9fcf6b68e6b950e9e50d448d8a842a3ded5bd9ba2b9a53f464cebc" }, "downloads": -1, "filename": "pyaiocrawler-0.2.0.tar.gz", "has_sig": false, "md5_digest": "83ecc0cf9b5ee9ae0d89831f2102f886", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3907, "upload_time": "2019-08-08T16:24:18", "url": "https://files.pythonhosted.org/packages/d4/46/127217d16df7cc204572cf4944defa3fd9d1d2ac91f2396c99b8d3e58e9c/pyaiocrawler-0.2.0.tar.gz" } ], "0.2.1": [ { "comment_text": "", "digests": { "md5": "5e15d9fe4838bd84b276d21d6e430ae4", "sha256": "d5f084905f3eb8608606c48c853cb53542411496a237a014446e011f39c3cfec" }, "downloads": -1, "filename": "pyaiocrawler-0.2.1-py3-none-any.whl", "has_sig": false, "md5_digest": "5e15d9fe4838bd84b276d21d6e430ae4", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 4984, "upload_time": "2019-08-08T16:28:40", "url": "https://files.pythonhosted.org/packages/ec/d0/68a81c59ce335d92ccdbcc535502fb54f1cd69d332f506d795be4d9400e4/pyaiocrawler-0.2.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "dc9249eb581f277421d5c6b02e9086f3", "sha256": "0834cb9277896c89608f9814ed9d1a34ae0152a58156a4974122168a41839f9d" }, "downloads": -1, "filename": "pyaiocrawler-0.2.1.tar.gz", "has_sig": false, "md5_digest": "dc9249eb581f277421d5c6b02e9086f3", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3914, "upload_time": "2019-08-08T16:28:42", "url": "https://files.pythonhosted.org/packages/88/44/47611c1663bd07d001f3b6c25e7c485a0694571b0364c02e1786ec80a398/pyaiocrawler-0.2.1.tar.gz" } ], "0.2.2": [ { "comment_text": "", "digests": { "md5": "016ddc6e8b89df542c91ab13c358dbb1", "sha256": "903054cb3b473dfc01f431529e2ba6eb37ad403d44fec27cc1fd26a1466f3ade" }, "downloads": -1, "filename": "pyaiocrawler-0.2.2-py3-none-any.whl", "has_sig": false, "md5_digest": "016ddc6e8b89df542c91ab13c358dbb1", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 4984, "upload_time": "2019-08-10T08:16:12", "url": "https://files.pythonhosted.org/packages/aa/88/3c2d4f4759d7143f2759f0c3b05c309eecc5e0985a4fde2c2a39709bf00d/pyaiocrawler-0.2.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "9dbb3e07415b93230c538c4f38edaaa2", "sha256": "e867b6f92e32771507d172ef8780c89d75e32bd7ffbe926b9773384edc8d1e91" }, "downloads": -1, "filename": "pyaiocrawler-0.2.2.tar.gz", "has_sig": false, "md5_digest": "9dbb3e07415b93230c538c4f38edaaa2", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3889, "upload_time": "2019-08-10T08:16:13", "url": "https://files.pythonhosted.org/packages/78/74/754ceea971523181208c922d83f635997719c407090200775fefbe09c2c2/pyaiocrawler-0.2.2.tar.gz" } ], "0.3.0": [ { "comment_text": "", "digests": { "md5": "bd69236198870fc2621bcd2e81e8dfeb", "sha256": "27b891b85b0db3239a1340d84503a897f3a326416fbc98a9385b70f8a06014d2" }, "downloads": -1, "filename": "pyaiocrawler-0.3.0-py3-none-any.whl", "has_sig": false, "md5_digest": "bd69236198870fc2621bcd2e81e8dfeb", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 5566, "upload_time": "2019-08-11T13:58:28", "url": "https://files.pythonhosted.org/packages/e2/f2/a4f0b90bbaaf36ead894828a3006002b098c6355151bc866593ad095ab37/pyaiocrawler-0.3.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "29868be078057a3e2f3a6ff4c34849b5", "sha256": "90c077d7e5013d7da82b59f006ef3d8592a618ed2aab112c28bb58aadf8e66d1" }, "downloads": -1, "filename": "pyaiocrawler-0.3.0.tar.gz", "has_sig": false, "md5_digest": "29868be078057a3e2f3a6ff4c34849b5", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4491, "upload_time": "2019-08-11T13:58:29", "url": "https://files.pythonhosted.org/packages/09/9d/b4bcc47e77795741b63157f78d4642b5663b3e195b23a2d0c026b4b3f17d/pyaiocrawler-0.3.0.tar.gz" } ], "0.3.2": [ { "comment_text": "", "digests": { "md5": "cf12ac2db1bdaa011a44d66b55fa155d", "sha256": "28098ee86067b7f6273513874c39d0b641aa8000be630fe2ef6f123df257e4c1" }, "downloads": -1, "filename": "pyaiocrawler-0.3.2-py3-none-any.whl", "has_sig": false, "md5_digest": "cf12ac2db1bdaa011a44d66b55fa155d", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 5684, "upload_time": "2019-08-11T15:42:16", "url": "https://files.pythonhosted.org/packages/00/5b/b3ea95180871397ad9bcc7767eb422fb1b25e1896ee3c6ce1caf62254ec2/pyaiocrawler-0.3.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "8c35cbb290beeffe3488824a7f53e552", "sha256": "3200445d9fa6137a2855b553a69c1ce1dce6fd70ad9b77cc49c89495470742f0" }, "downloads": -1, "filename": "pyaiocrawler-0.3.2.tar.gz", "has_sig": false, "md5_digest": "8c35cbb290beeffe3488824a7f53e552", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4664, "upload_time": "2019-08-11T15:42:18", "url": "https://files.pythonhosted.org/packages/c2/70/1985a4f19339530794c08f05beb37c6e5dbd51f7c9ae021b1494e144654c/pyaiocrawler-0.3.2.tar.gz" } ], "0.3.3": [ { "comment_text": "", "digests": { "md5": "0d5750ea17d4f7d257071631aa0ee2c9", "sha256": "a9c396e923e9d0c78b3bb1709a88948c88c94295b8df1a166e65c064ab460c20" }, "downloads": -1, "filename": "pyaiocrawler-0.3.3-py3-none-any.whl", "has_sig": false, "md5_digest": "0d5750ea17d4f7d257071631aa0ee2c9", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 5686, "upload_time": "2019-08-12T13:25:23", "url": "https://files.pythonhosted.org/packages/01/9f/dfd5ecad08830fc724e54b632bc6c84484ae854454bee637c1f1db57fdd6/pyaiocrawler-0.3.3-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "c4700cb8ce5a5c45b307d2effdbe3b54", "sha256": "734a9f4ff1a4738d6e7866ffba507feb87dd062e693f8cefd1bab295efd13b41" }, "downloads": -1, "filename": "pyaiocrawler-0.3.3.tar.gz", "has_sig": false, "md5_digest": "c4700cb8ce5a5c45b307d2effdbe3b54", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4666, "upload_time": "2019-08-12T13:25:25", "url": "https://files.pythonhosted.org/packages/30/77/189ebd25df63e19f2b0fe57e5f410f761267b3dff6c168497f7bb2d08d9f/pyaiocrawler-0.3.3.tar.gz" } ], "0.4.3": [ { "comment_text": "", "digests": { "md5": "81c229a7466e5792e77d916824d15ae5", "sha256": "075cbf003d872918db75c976974dd6688956e799fdac1c91c4792ac2784ae8dc" }, "downloads": -1, "filename": "pyaiocrawler-0.4.3-py3-none-any.whl", "has_sig": false, "md5_digest": "81c229a7466e5792e77d916824d15ae5", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 5973, "upload_time": "2019-08-22T21:30:19", "url": "https://files.pythonhosted.org/packages/42/d7/0360b293335eff4ab305be5378b8f0738395a21010a0fc60850aaeef88ba/pyaiocrawler-0.4.3-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "15ee4c41e8a68a32539253033a452ee7", "sha256": "6bb8f770f7e4a63673a4290d007c47f9e8abb35adf8d10131eefb9dad1ba7553" }, "downloads": -1, "filename": "pyaiocrawler-0.4.3.tar.gz", "has_sig": false, "md5_digest": "15ee4c41e8a68a32539253033a452ee7", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4748, "upload_time": "2019-08-22T21:30:21", "url": "https://files.pythonhosted.org/packages/f9/79/bb24585524d594d53dc6340d973c7db023b51432584bc39e4973bd8bd525/pyaiocrawler-0.4.3.tar.gz" } ], "0.5.0": [ { "comment_text": "", "digests": { "md5": "9109abd3523f1c4f79129032777616aa", "sha256": "c2dde87c72ee1bf6ab807905cf1a38fe9d9c1483f09a24893975e250469a044d" }, "downloads": -1, "filename": "pyaiocrawler-0.5.0-py3-none-any.whl", "has_sig": false, "md5_digest": "9109abd3523f1c4f79129032777616aa", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 5979, "upload_time": "2019-08-22T21:38:09", "url": "https://files.pythonhosted.org/packages/2c/6d/6a0bc696a2b16a2a5dfe17a80875a33c58f8b2d2ba91a7fc93c3eee71a76/pyaiocrawler-0.5.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "e39d8b7a589329b4a85f43703e136d61", "sha256": "a07db1e07f8987c2bac5abfac36babfe3104e39d1b4481cea5bff9ccb0733678" }, "downloads": -1, "filename": "pyaiocrawler-0.5.0.tar.gz", "has_sig": false, "md5_digest": "e39d8b7a589329b4a85f43703e136d61", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4752, "upload_time": "2019-08-22T21:38:10", "url": "https://files.pythonhosted.org/packages/6d/c2/a448e9b227bc4388431aa300f262657dd8d0b399bc2e0540c7df07544fdd/pyaiocrawler-0.5.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "9109abd3523f1c4f79129032777616aa", "sha256": "c2dde87c72ee1bf6ab807905cf1a38fe9d9c1483f09a24893975e250469a044d" }, "downloads": -1, "filename": "pyaiocrawler-0.5.0-py3-none-any.whl", "has_sig": false, "md5_digest": "9109abd3523f1c4f79129032777616aa", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 5979, "upload_time": "2019-08-22T21:38:09", "url": "https://files.pythonhosted.org/packages/2c/6d/6a0bc696a2b16a2a5dfe17a80875a33c58f8b2d2ba91a7fc93c3eee71a76/pyaiocrawler-0.5.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "e39d8b7a589329b4a85f43703e136d61", "sha256": "a07db1e07f8987c2bac5abfac36babfe3104e39d1b4481cea5bff9ccb0733678" }, "downloads": -1, "filename": "pyaiocrawler-0.5.0.tar.gz", "has_sig": false, "md5_digest": "e39d8b7a589329b4a85f43703e136d61", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4752, "upload_time": "2019-08-22T21:38:10", "url": "https://files.pythonhosted.org/packages/6d/c2/a448e9b227bc4388431aa300f262657dd8d0b399bc2e0540c7df07544fdd/pyaiocrawler-0.5.0.tar.gz" } ] }