{ "info": { "author": "wooddance", "author_email": "zireael.me@gmail.com", "bugtrack_url": null, "classifiers": [], "description": "aCrawler\n========\n\n\n.. image:: https://img.shields.io/pypi/v/acrawler.svg\n :target: https://pypi.org/project/acrawler/\n :alt: PyPI\n.. image:: https://readthedocs.org/projects/acrawler/badge/?version=latest\n :target: https://acrawler.readthedocs.io/en/latest/?badge=latest\n :alt: Documentation Status\n\n\ud83d\udd0d A powerful web-crawling framework, based on aiohttp.\n\n\n\nFeature\n-------\n\n\n* Write your crawler in one Python script with asyncio\n* Schedule task with priority, fingerprint, exetime, recrawl...\n* Middleware: add handlers before or after task's execution\n* Simple shortcuts to speed up scripting\n* Parse html conveniently with `Parsel `_\n* Parse with rules and chained processors\n* Support JavaScript/browser-automation with `pyppeteer `_\n* Stop and Resume: crawl periodically and persistently\n* Distributed work support with Redis\n\nInstallation\n------------\n\nTo install, simply use `pipenv `_ (or pip):\n\n.. code-block:: bash\n\n $ pipenv install acrawler\n\n (Optional)\n $ pipenv install uvloop #(only Linux/macOS, for faster asyncio event loop)\n $ pipenv install aioredis #(if you need Redis support)\n $ pipenv install motor #(if you need MongoDB support)\n $ pipenv install aiofiles #(if you need FileRequest)\n\nDocumentation\n-------------\nDocumentation and tutorial are available online at https://acrawler.readthedocs.io/ and in the ``docs``\ndirectory.\n\nSample Code\n-----------\n\n\n\nScrape imdb.com\n^^^^^^^^^^^^^^^\n\n.. 
code-block:: python\n\n from acrawler import Crawler, Request, ParselItem, Handler, register, get_logger\n\n\n class MovieItem(ParselItem):\n log = True\n css = {\n # just some normal css rules\n # see Parsel for detailed information\n \"date\": \".subtext a[href*=releaseinfo]::text\",\n \"time\": \".subtext time::text\",\n \"rating\": \"span[itemprop=ratingValue]::text\",\n \"rating_count\": \"span[itemprop=ratingCount]::text\",\n \"metascore\": \".metacriticScore span::text\",\n\n # if you provide a list with additional functions,\n # they are treated as field processor functions\n \"title\": [\"h1::text\", str.strip],\n\n # the following four rules are for getting all matching values\n # unlike a normal rule, such a rule starts with [ and ends with ]\n \"genres\": \"[.subtext a[href*=genres]::text]\",\n \"director\": \"[h4:contains(Director) ~ a[href*=name]::text]\",\n \"writers\": \"[h4:contains(Writer) ~ a[href*=name]::text]\",\n \"stars\": \"[h4:contains(Star) ~ a[href*=name]::text]\",\n }\n\n\n class IMDBCrawler(Crawler):\n config = {\"MAX_REQUESTS\": 4, \"DOWNLOAD_DELAY\": 1}\n\n async def start_requests(self):\n yield Request(\"https://www.imdb.com/chart/moviemeter\", callback=self.parse)\n\n def parse(self, response):\n yield from response.follow(\n \".lister-list tr .titleColumn a::attr(href)\", callback=self.parse_movie\n )\n\n def parse_movie(self, response):\n url = response.url_str\n yield MovieItem(response.sel, extra={\"url\": url.split(\"?\")[0]})\n\n\n @register()\n class HorrorHandler(Handler):\n family = \"MovieItem\"\n logger = get_logger(\"horrorlog\")\n\n async def handle_after(self, item):\n if item[\"genres\"] and \"Horror\" in item[\"genres\"]:\n self.logger.warning(f\"({item['title']}) is a horror movie!!!!\")\n\n\n @MovieItem.bind()\n def process_time(value):\n # a self-defined field processing function\n # process time to minutes\n # '3h 1min' -> 181\n if value:\n res = 0\n segs = value.split(\" \")\n for seg in segs:\n if 
seg.endswith(\"min\"):\n res += int(seg.replace(\"min\", \"\"))\n elif seg.endswith(\"h\"):\n res += 60 * int(seg.replace(\"h\", \"\"))\n return res\n return value\n\n\n if __name__ == \"__main__\":\n IMDBCrawler().run()\n\n\n\nScrape quotes.toscrape.com\n^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: python\n\n # Scrape quotes from http://quotes.toscrape.com/\n from acrawler import Parser, Crawler, ParselItem, Request, get_logger\n\n\n logger = get_logger(\"quotes\")\n\n\n class QuoteItem(ParselItem):\n log = True\n default = {\"type\": \"quote\"}\n css = {\"author\": \"small.author::text\"}\n xpath = {\"text\": ['.//span[@class=\"text\"]/text()', lambda s: s.strip(\"\u201c\")[:20]]}\n\n\n class AuthorItem(ParselItem):\n log = True\n default = {\"type\": \"author\"}\n css = {\"name\": \"h3.author-title::text\", \"born\": \"span.author-born-date::text\"}\n\n class QuoteCrawler(Crawler):\n\n main_page = r\"quotes.toscrape.com/page/\\d+\"\n author_page = r\"quotes.toscrape.com/author/.*\"\n parsers = [\n Parser(\n in_pattern=main_page,\n follow_patterns=[main_page, author_page],\n item_type=QuoteItem,\n css_divider=\".quote\",\n ),\n Parser(in_pattern=author_page, item_type=AuthorItem),\n ]\n\n async def start_requests(self):\n yield Request(url=\"http://quotes.toscrape.com/page/1/\")\n\n\n if __name__ == \"__main__\":\n QuoteCrawler().run()\n\n\nSee `examples `_.\n\n\nTodo\n----\n\n* Add delta_key support for request\n* Crawler's name for distinguishing\n* Command Line config support\n* Monitor all crawlers in web\n* Write detailed Documentation\n* Write testing code", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/wooddance/aCrawler", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "acrawler", "package_url": "https://pypi.org/project/acrawler/", "platform": "", "project_url": "https://pypi.org/project/acrawler/", 
"project_urls": { "Homepage": "https://github.com/wooddance/aCrawler" }, "release_url": "https://pypi.org/project/acrawler/0.1.4/", "requires_dist": null, "requires_python": ">=3.6.0", "summary": "A simple web-crawling framework, based on aiohttp.", "version": "0.1.4" }, "last_serial": 5840290, "releases": { "0.0.1": [ { "comment_text": "", "digests": { "md5": "e1427f8880abe4c8b51659329917bb53", "sha256": "57633919a4c4b9727961a812110f54036e1f48acae102047a1ff90636ce36766" }, "downloads": -1, "filename": "acrawler-0.0.1-py3.6.egg", "has_sig": false, "md5_digest": "e1427f8880abe4c8b51659329917bb53", "packagetype": "bdist_egg", "python_version": "3.6", "requires_python": ">=3.6.0", "size": 42445, "upload_time": "2019-05-09T11:33:19", "url": "https://files.pythonhosted.org/packages/18/39/c3aa20065749e98f0f902bc3e4ff570e7373140e10d5c37376ee5452a984/acrawler-0.0.1-py3.6.egg" }, { "comment_text": "", "digests": { "md5": "763476ffc7a2c7441137916ea7269576", "sha256": "0a3f2023408049c6e5dfd7986e45996b19626fa60c792eddc341a8b7df167305" }, "downloads": -1, "filename": "acrawler-0.0.1.tar.gz", "has_sig": false, "md5_digest": "763476ffc7a2c7441137916ea7269576", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6.0", "size": 12525, "upload_time": "2019-05-09T11:33:21", "url": "https://files.pythonhosted.org/packages/18/5a/66b9561bc1586df10c844f3736e5a93ac4cc846c936c98ba0c64a3261eaa/acrawler-0.0.1.tar.gz" } ], "0.0.2": [ { "comment_text": "", "digests": { "md5": "73ed3f9c2fef122c5fe9fba84aba3fb7", "sha256": "0b6b4cae062046144a84c616bc47fbd0434826a0c3ed5192131c8a0c6e0c7cf5" }, "downloads": -1, "filename": "acrawler-0.0.2.tar.gz", "has_sig": false, "md5_digest": "73ed3f9c2fef122c5fe9fba84aba3fb7", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6.0", "size": 13108, "upload_time": "2019-05-15T05:09:28", "url": 
"https://files.pythonhosted.org/packages/3c/2a/e5671215ca46a6c52dcee3b8df5fbb8ae1bfc0e6ceef3ff1ac489ce8b6b1/acrawler-0.0.2.tar.gz" } ], "0.0.3": [ { "comment_text": "", "digests": { "md5": "271e4b445564312f253b7afdc031a5c7", "sha256": "1668a40dfa627cb8637aa734ca6bce7a5b8e90cbcbd4807279a33333181a12a6" }, "downloads": -1, "filename": "acrawler-0.0.3.tar.gz", "has_sig": false, "md5_digest": "271e4b445564312f253b7afdc031a5c7", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6.0", "size": 14726, "upload_time": "2019-05-18T13:38:11", "url": "https://files.pythonhosted.org/packages/ba/04/3756feab81c212e07ca0187b22d4256fc70bcb108fe565dba7f9e8984d60/acrawler-0.0.3.tar.gz" } ], "0.0.4": [ { "comment_text": "", "digests": { "md5": "de50c642c28b5a205f1cb7e2bb26fe74", "sha256": "be14004d7af54d8d4005810d2b5a61f89a6c16c4670c5530f022ccbffd419757" }, "downloads": -1, "filename": "acrawler-0.0.4.tar.gz", "has_sig": false, "md5_digest": "de50c642c28b5a205f1cb7e2bb26fe74", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6.0", "size": 17082, "upload_time": "2019-05-20T05:04:03", "url": "https://files.pythonhosted.org/packages/b9/c7/f67ce556913396f972f1b64e59aa221f6e7030c5291b29bc5ed34f12e4c8/acrawler-0.0.4.tar.gz" } ], "0.0.5": [ { "comment_text": "", "digests": { "md5": "db875db2a96aee5208abbca8789ff493", "sha256": "b2c17426209f6edc93d8c6240d1b3d1fd6570c25e7ec3ef8ada98ef7cd89684b" }, "downloads": -1, "filename": "acrawler-0.0.5.tar.gz", "has_sig": false, "md5_digest": "db875db2a96aee5208abbca8789ff493", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6.0", "size": 17940, "upload_time": "2019-05-21T01:53:46", "url": "https://files.pythonhosted.org/packages/78/07/031e7c257e34bbf208d12e29cb9bd5f0e393a67bd629acaaa84d0104556f/acrawler-0.0.5.tar.gz" } ], "0.0.6": [ { "comment_text": "", "digests": { "md5": "36784ebbc8e005dd5b3411d2107cc966", "sha256": 
"18c425d7ba3f40908e87d33b0a3c9b7ab6f773c74249cb1cda0a279bbae0c09d" }, "downloads": -1, "filename": "acrawler-0.0.6.tar.gz", "has_sig": false, "md5_digest": "36784ebbc8e005dd5b3411d2107cc966", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6.0", "size": 20616, "upload_time": "2019-05-27T13:10:07", "url": "https://files.pythonhosted.org/packages/32/82/2355db01960e5cf0d32d54d569a3719faca8d2198718e2358f4c0c1f98f1/acrawler-0.0.6.tar.gz" } ], "0.0.7": [ { "comment_text": "", "digests": { "md5": "897ec46e4f6671594e59256dd9ee2d7d", "sha256": "b4344187ca2eb5aa0bca5fc87430b5157ebea2d73ec19ce6745fa7cd7c96d15a" }, "downloads": -1, "filename": "acrawler-0.0.7.tar.gz", "has_sig": false, "md5_digest": "897ec46e4f6671594e59256dd9ee2d7d", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6.0", "size": 23916, "upload_time": "2019-06-04T16:45:25", "url": "https://files.pythonhosted.org/packages/40/08/6319c2bd806a283808fc5264080cf367861d2297999817e7f1fd7bccd7b5/acrawler-0.0.7.tar.gz" } ], "0.0.8": [ { "comment_text": "", "digests": { "md5": "c42f843bdc4e4d5c1411e5fab358700f", "sha256": "7bde5175a8076007f09ed6c0cad0fbb9eaba768746884203b87edc67bde2b325" }, "downloads": -1, "filename": "acrawler-0.0.8.tar.gz", "has_sig": false, "md5_digest": "c42f843bdc4e4d5c1411e5fab358700f", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6.0", "size": 26262, "upload_time": "2019-07-02T05:00:03", "url": "https://files.pythonhosted.org/packages/8f/ab/882ac59b458a9c5c48d907a94291e6af28a55a1111037cb27e64076747a2/acrawler-0.0.8.tar.gz" } ], "0.0.9": [ { "comment_text": "", "digests": { "md5": "b74a5f932b8cf79026bdf3d03996d491", "sha256": "1205be9685f5cf06de5220ef24e1a981e09ffd2f652311e838a6df3105324ee1" }, "downloads": -1, "filename": "acrawler-0.0.9.tar.gz", "has_sig": false, "md5_digest": "b74a5f932b8cf79026bdf3d03996d491", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6.0", "size": 31055, 
"upload_time": "2019-07-28T01:59:00", "url": "https://files.pythonhosted.org/packages/e5/55/d8464e912b74c0908c30b48d713fdd4dc811d05ee59a846ce5e1258d6c7d/acrawler-0.0.9.tar.gz" } ], "0.1.0": [ { "comment_text": "", "digests": { "md5": "f8051d89314a48346a0a875ba683fd01", "sha256": "b5849835f8ee22daece371ae500b465ae5878529c01fc8598be38d82ea7661bb" }, "downloads": -1, "filename": "acrawler-0.1.0-py3.6.egg", "has_sig": false, "md5_digest": "f8051d89314a48346a0a875ba683fd01", "packagetype": "bdist_egg", "python_version": "3.6", "requires_python": ">=3.6.0", "size": 88396, "upload_time": "2019-08-23T12:53:34", "url": "https://files.pythonhosted.org/packages/6e/82/f55ccb5c4c9d707dc9bf5d5764e3a4e507bad3618695782889039326021a/acrawler-0.1.0-py3.6.egg" } ], "0.1.1": [ { "comment_text": "", "digests": { "md5": "4b23a3071002704a70decb847bdcb08f", "sha256": "02ee6f49406f34bb573b5e06b92c64272f395ee242e692dc356513116643ad55" }, "downloads": -1, "filename": "acrawler-0.1.1.tar.gz", "has_sig": false, "md5_digest": "4b23a3071002704a70decb847bdcb08f", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6.0", "size": 32376, "upload_time": "2019-08-23T23:44:36", "url": "https://files.pythonhosted.org/packages/5b/3d/ef3a07b081c511c7161ca73e28239748c0b30520d0c6e4308d7b6077a685/acrawler-0.1.1.tar.gz" } ], "0.1.2a0": [ { "comment_text": "", "digests": { "md5": "18ac0a28f04911dc112cfee8254ea69f", "sha256": "a748dd582cf0ac6436beadf8fe2123229fe3ae47a24ae7b4491486868ec4ca5d" }, "downloads": -1, "filename": "acrawler-0.1.2a0.tar.gz", "has_sig": false, "md5_digest": "18ac0a28f04911dc112cfee8254ea69f", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6.0", "size": 32364, "upload_time": "2019-08-24T00:34:54", "url": "https://files.pythonhosted.org/packages/b2/dd/bd4f4ddb5d3375e1f8d194689dd0eff463a759479304ab94ddb3eb5a5c73/acrawler-0.1.2a0.tar.gz" } ], "0.1.3": [ { "comment_text": "", "digests": { "md5": "b9f6b2a7a7423f179e57f91bf38e0547", "sha256": 
"d5a5b0244ad567336cea665275db464ad2a22c2e843e772b0956d18b20270dc7" }, "downloads": -1, "filename": "acrawler-0.1.3.tar.gz", "has_sig": false, "md5_digest": "b9f6b2a7a7423f179e57f91bf38e0547", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6.0", "size": 32372, "upload_time": "2019-08-24T00:39:03", "url": "https://files.pythonhosted.org/packages/e0/63/15471255b49d989a23b3fa7c9ffd42772d4cb52de2e286135ec04ac7b7e6/acrawler-0.1.3.tar.gz" } ], "0.1.4": [ { "comment_text": "", "digests": { "md5": "fcd5b2d2cb8ea02f688a02b6f3cafd71", "sha256": "acceeea2e613bdc5798b995af1b791e4ca51389f575546e694ebcae7bbee858b" }, "downloads": -1, "filename": "acrawler-0.1.4.tar.gz", "has_sig": false, "md5_digest": "fcd5b2d2cb8ea02f688a02b6f3cafd71", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6.0", "size": 32249, "upload_time": "2019-09-17T08:41:00", "url": "https://files.pythonhosted.org/packages/64/60/5a35bf86254b76ba516d864d9aef9cbacab9fe589cf499f6aa1b39b12694/acrawler-0.1.4.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "fcd5b2d2cb8ea02f688a02b6f3cafd71", "sha256": "acceeea2e613bdc5798b995af1b791e4ca51389f575546e694ebcae7bbee858b" }, "downloads": -1, "filename": "acrawler-0.1.4.tar.gz", "has_sig": false, "md5_digest": "fcd5b2d2cb8ea02f688a02b6f3cafd71", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6.0", "size": 32249, "upload_time": "2019-09-17T08:41:00", "url": "https://files.pythonhosted.org/packages/64/60/5a35bf86254b76ba516d864d9aef9cbacab9fe589cf499f6aa1b39b12694/acrawler-0.1.4.tar.gz" } ] }