{
"info": {
"author": "wooddance",
"author_email": "zireael.me@gmail.com",
"bugtrack_url": null,
"classifiers": [],
"description": "aCrawler\n========\n\n\n.. image:: https://img.shields.io/pypi/v/acrawler.svg\n :target: https://pypi.org/project/acrawler/\n :alt: PyPI\n.. image:: https://readthedocs.org/projects/acrawler/badge/?version=latest\n :target: https://acrawler.readthedocs.io/en/latest/?badge=latest\n :alt: Documentation Status\n\n\ud83d\udd0d A powerful web-crawling framework, based on aiohttp.\n\n\n\nFeature\n-------\n\n\n* Write your crawler in one Python script with asyncio\n* Schedule task with priority, fingerprint, exetime, recrawl...\n* Middleware: add handlers before or after task's execution\n* Simple shortcuts to speed up scripting\n* Parse html conveniently with `Parsel `_\n* Parse with rules and chained processors\n* Support JavaScript/browser-automation with `pyppeteer `_\n* Stop and Resume: crawl periodically and persistently\n* Distributed work support with Redis\n\nInstallation\n------------\n\nTo install, simply use `pipenv `_ (or pip):\n\n.. code-block:: bash\n\n $ pipenv install acrawler\n\n (Optional)\n $ pipenv install uvloop #(only Linux/macOS, for faster asyncio event loop)\n $ pipenv install aioredis #(if you need Redis support)\n $ pipenv install motor #(if you need MongoDB support)\n $ pipenv install aiofiles #(if you need FileRequest)\n\nDocumentation\n-------------\nDocumentation and tutorial are available online at https://acrawler.readthedocs.io/ and in the ``docs``\ndirectory.\n\nSample Code\n-----------\n\n\n\nScrape imdb.com\n^^^^^^^^^^^^^^^\n\n.. code-block:: python\n\n from acrawler import Crawler, Request, ParselItem, Handler, register, get_logger\n\n\n class MovieItem(ParselItem):\n log = True\n css = {\n # just some normal css rules\n # see Parsel for detailed information\n \"date\": \".subtext a[href*=releaseinfo]::text\",\n \"time\": \".subtext time::text\",\n \"rating\": \"span[itemprop=ratingValue]::text\",\n \"rating_count\": \"span[itemprop=ratingCount]::text\",\n \"metascore\": \".metacriticScore span::text\",\n\n # if you provide a list with additional functions,\n # they are considered as field processor function\n \"title\": [\"h1::text\", str.strip],\n\n # the following four fules is for getting all matching values\n # the rule starts with [ and ends with ] comparing to normal rules\n \"genres\": \"[.subtext a[href*=genres]::text]\",\n \"director\": \"[h4:contains(Director) ~ a[href*=name]::text]\",\n \"writers\": \"[h4:contains(Writer) ~ a[href*=name]::text]\",\n \"stars\": \"[h4:contains(Star) ~ a[href*=name]::text]\",\n }\n\n\n class IMDBCrawler(Crawler):\n config = {\"MAX_REQUESTS\": 4, \"DOWNLOAD_DELAY\": 1}\n\n async def start_requests(self):\n yield Request(\"https://www.imdb.com/chart/moviemeter\", callback=self.parse)\n\n def parse(self, response):\n yield from response.follow(\n \".lister-list tr .titleColumn a::attr(href)\", callback=self.parse_movie\n )\n\n def parse_movie(self, response):\n url = response.url_str\n yield MovieItem(response.sel, extra={\"url\": url.split(\"?\")[0]})\n\n\n @register()\n class HorrorHandler(Handler):\n family = \"MovieItem\"\n logger = get_logger(\"horrorlog\")\n\n async def handle_after(self, item):\n if item[\"genres\"] and \"Horror\" in item[\"genres\"]:\n self.logger.warning(f\"({item['title']}) is a horror movie!!!!\")\n\n\n @MovieItem.bind()\n def process_time(value):\n # a self-defined field processing function\n # process time to minutes\n # '3h 1min' -> 181\n if value:\n res = 0\n segs = value.split(\" \")\n for seg in segs:\n if seg.endswith(\"min\"):\n res += int(seg.replace(\"min\", \"\"))\n elif seg.endswith(\"h\"):\n res += 60 * int(seg.replace(\"h\", \"\"))\n return res\n return value\n\n\n if __name__ == \"__main__\":\n IMDBCrawler().run()\n\n\n\nScrape quotes.toscrape.com\n^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: python\n\n # Scrape quotes from http://quotes.toscrape.com/\n from acrawler import Parser, Crawler, ParselItem, Request\n\n\n logger = get_logger(\"quotes\")\n\n\n class QuoteItem(ParselItem):\n log = True\n default = {\"type\": \"quote\"}\n css = {\"author\": \"small.author::text\"}\n xpath = {\"text\": ['.//span[@class=\"text\"]/text()', lambda s: s.strip(\"\u201c\")[:20]]}\n\n\n class AuthorItem(ParselItem):\n log = True\n default = {\"type\": \"author\"}\n css = {\"name\": \"h3.author-title::text\", \"born\": \"span.author-born-date::text\"}\n\n class QuoteCrawler(Crawler):\n\n main_page = r\"quotes.toscrape.com/page/\\d+\"\n author_page = r\"quotes.toscrape.com/author/.*\"\n parsers = [\n Parser(\n in_pattern=main_page,\n follow_patterns=[main_page, author_page],\n item_type=QuoteItem,\n css_divider=\".quote\",\n ),\n Parser(in_pattern=author_page, item_type=AuthorItem),\n ]\n\n async def start_requests(self):\n yield Request(url=\"http://quotes.toscrape.com/page/1/\")\n\n\n if __name__ == \"__main__\":\n QuoteCrawler().run()\n\n\nSee `examples `_.\n\n\nTodo\n----\n\n* Add delta_key support for request\n* Cralwer's name for distinguishing\n* Command Line config support\n* Monitor all crawlers in web\n* Write detailed Documentation\n* Write testing code",
"description_content_type": "",
"docs_url": null,
"download_url": "",
"downloads": {
"last_day": -1,
"last_month": -1,
"last_week": -1
},
"home_page": "https://github.com/wooddance/aCrawler",
"keywords": "",
"license": "",
"maintainer": "",
"maintainer_email": "",
"name": "acrawler",
"package_url": "https://pypi.org/project/acrawler/",
"platform": "",
"project_url": "https://pypi.org/project/acrawler/",
"project_urls": {
"Homepage": "https://github.com/wooddance/aCrawler"
},
"release_url": "https://pypi.org/project/acrawler/0.1.4/",
"requires_dist": null,
"requires_python": ">=3.6.0",
"summary": "A simple web-crawling framework, based on aiohttp.",
"version": "0.1.4"
},
"last_serial": 5840290,
"releases": {
"0.0.1": [
{
"comment_text": "",
"digests": {
"md5": "e1427f8880abe4c8b51659329917bb53",
"sha256": "57633919a4c4b9727961a812110f54036e1f48acae102047a1ff90636ce36766"
},
"downloads": -1,
"filename": "acrawler-0.0.1-py3.6.egg",
"has_sig": false,
"md5_digest": "e1427f8880abe4c8b51659329917bb53",
"packagetype": "bdist_egg",
"python_version": "3.6",
"requires_python": ">=3.6.0",
"size": 42445,
"upload_time": "2019-05-09T11:33:19",
"url": "https://files.pythonhosted.org/packages/18/39/c3aa20065749e98f0f902bc3e4ff570e7373140e10d5c37376ee5452a984/acrawler-0.0.1-py3.6.egg"
},
{
"comment_text": "",
"digests": {
"md5": "763476ffc7a2c7441137916ea7269576",
"sha256": "0a3f2023408049c6e5dfd7986e45996b19626fa60c792eddc341a8b7df167305"
},
"downloads": -1,
"filename": "acrawler-0.0.1.tar.gz",
"has_sig": false,
"md5_digest": "763476ffc7a2c7441137916ea7269576",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6.0",
"size": 12525,
"upload_time": "2019-05-09T11:33:21",
"url": "https://files.pythonhosted.org/packages/18/5a/66b9561bc1586df10c844f3736e5a93ac4cc846c936c98ba0c64a3261eaa/acrawler-0.0.1.tar.gz"
}
],
"0.0.2": [
{
"comment_text": "",
"digests": {
"md5": "73ed3f9c2fef122c5fe9fba84aba3fb7",
"sha256": "0b6b4cae062046144a84c616bc47fbd0434826a0c3ed5192131c8a0c6e0c7cf5"
},
"downloads": -1,
"filename": "acrawler-0.0.2.tar.gz",
"has_sig": false,
"md5_digest": "73ed3f9c2fef122c5fe9fba84aba3fb7",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6.0",
"size": 13108,
"upload_time": "2019-05-15T05:09:28",
"url": "https://files.pythonhosted.org/packages/3c/2a/e5671215ca46a6c52dcee3b8df5fbb8ae1bfc0e6ceef3ff1ac489ce8b6b1/acrawler-0.0.2.tar.gz"
}
],
"0.0.3": [
{
"comment_text": "",
"digests": {
"md5": "271e4b445564312f253b7afdc031a5c7",
"sha256": "1668a40dfa627cb8637aa734ca6bce7a5b8e90cbcbd4807279a33333181a12a6"
},
"downloads": -1,
"filename": "acrawler-0.0.3.tar.gz",
"has_sig": false,
"md5_digest": "271e4b445564312f253b7afdc031a5c7",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6.0",
"size": 14726,
"upload_time": "2019-05-18T13:38:11",
"url": "https://files.pythonhosted.org/packages/ba/04/3756feab81c212e07ca0187b22d4256fc70bcb108fe565dba7f9e8984d60/acrawler-0.0.3.tar.gz"
}
],
"0.0.4": [
{
"comment_text": "",
"digests": {
"md5": "de50c642c28b5a205f1cb7e2bb26fe74",
"sha256": "be14004d7af54d8d4005810d2b5a61f89a6c16c4670c5530f022ccbffd419757"
},
"downloads": -1,
"filename": "acrawler-0.0.4.tar.gz",
"has_sig": false,
"md5_digest": "de50c642c28b5a205f1cb7e2bb26fe74",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6.0",
"size": 17082,
"upload_time": "2019-05-20T05:04:03",
"url": "https://files.pythonhosted.org/packages/b9/c7/f67ce556913396f972f1b64e59aa221f6e7030c5291b29bc5ed34f12e4c8/acrawler-0.0.4.tar.gz"
}
],
"0.0.5": [
{
"comment_text": "",
"digests": {
"md5": "db875db2a96aee5208abbca8789ff493",
"sha256": "b2c17426209f6edc93d8c6240d1b3d1fd6570c25e7ec3ef8ada98ef7cd89684b"
},
"downloads": -1,
"filename": "acrawler-0.0.5.tar.gz",
"has_sig": false,
"md5_digest": "db875db2a96aee5208abbca8789ff493",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6.0",
"size": 17940,
"upload_time": "2019-05-21T01:53:46",
"url": "https://files.pythonhosted.org/packages/78/07/031e7c257e34bbf208d12e29cb9bd5f0e393a67bd629acaaa84d0104556f/acrawler-0.0.5.tar.gz"
}
],
"0.0.6": [
{
"comment_text": "",
"digests": {
"md5": "36784ebbc8e005dd5b3411d2107cc966",
"sha256": "18c425d7ba3f40908e87d33b0a3c9b7ab6f773c74249cb1cda0a279bbae0c09d"
},
"downloads": -1,
"filename": "acrawler-0.0.6.tar.gz",
"has_sig": false,
"md5_digest": "36784ebbc8e005dd5b3411d2107cc966",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6.0",
"size": 20616,
"upload_time": "2019-05-27T13:10:07",
"url": "https://files.pythonhosted.org/packages/32/82/2355db01960e5cf0d32d54d569a3719faca8d2198718e2358f4c0c1f98f1/acrawler-0.0.6.tar.gz"
}
],
"0.0.7": [
{
"comment_text": "",
"digests": {
"md5": "897ec46e4f6671594e59256dd9ee2d7d",
"sha256": "b4344187ca2eb5aa0bca5fc87430b5157ebea2d73ec19ce6745fa7cd7c96d15a"
},
"downloads": -1,
"filename": "acrawler-0.0.7.tar.gz",
"has_sig": false,
"md5_digest": "897ec46e4f6671594e59256dd9ee2d7d",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6.0",
"size": 23916,
"upload_time": "2019-06-04T16:45:25",
"url": "https://files.pythonhosted.org/packages/40/08/6319c2bd806a283808fc5264080cf367861d2297999817e7f1fd7bccd7b5/acrawler-0.0.7.tar.gz"
}
],
"0.0.8": [
{
"comment_text": "",
"digests": {
"md5": "c42f843bdc4e4d5c1411e5fab358700f",
"sha256": "7bde5175a8076007f09ed6c0cad0fbb9eaba768746884203b87edc67bde2b325"
},
"downloads": -1,
"filename": "acrawler-0.0.8.tar.gz",
"has_sig": false,
"md5_digest": "c42f843bdc4e4d5c1411e5fab358700f",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6.0",
"size": 26262,
"upload_time": "2019-07-02T05:00:03",
"url": "https://files.pythonhosted.org/packages/8f/ab/882ac59b458a9c5c48d907a94291e6af28a55a1111037cb27e64076747a2/acrawler-0.0.8.tar.gz"
}
],
"0.0.9": [
{
"comment_text": "",
"digests": {
"md5": "b74a5f932b8cf79026bdf3d03996d491",
"sha256": "1205be9685f5cf06de5220ef24e1a981e09ffd2f652311e838a6df3105324ee1"
},
"downloads": -1,
"filename": "acrawler-0.0.9.tar.gz",
"has_sig": false,
"md5_digest": "b74a5f932b8cf79026bdf3d03996d491",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6.0",
"size": 31055,
"upload_time": "2019-07-28T01:59:00",
"url": "https://files.pythonhosted.org/packages/e5/55/d8464e912b74c0908c30b48d713fdd4dc811d05ee59a846ce5e1258d6c7d/acrawler-0.0.9.tar.gz"
}
],
"0.1.0": [
{
"comment_text": "",
"digests": {
"md5": "f8051d89314a48346a0a875ba683fd01",
"sha256": "b5849835f8ee22daece371ae500b465ae5878529c01fc8598be38d82ea7661bb"
},
"downloads": -1,
"filename": "acrawler-0.1.0-py3.6.egg",
"has_sig": false,
"md5_digest": "f8051d89314a48346a0a875ba683fd01",
"packagetype": "bdist_egg",
"python_version": "3.6",
"requires_python": ">=3.6.0",
"size": 88396,
"upload_time": "2019-08-23T12:53:34",
"url": "https://files.pythonhosted.org/packages/6e/82/f55ccb5c4c9d707dc9bf5d5764e3a4e507bad3618695782889039326021a/acrawler-0.1.0-py3.6.egg"
}
],
"0.1.1": [
{
"comment_text": "",
"digests": {
"md5": "4b23a3071002704a70decb847bdcb08f",
"sha256": "02ee6f49406f34bb573b5e06b92c64272f395ee242e692dc356513116643ad55"
},
"downloads": -1,
"filename": "acrawler-0.1.1.tar.gz",
"has_sig": false,
"md5_digest": "4b23a3071002704a70decb847bdcb08f",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6.0",
"size": 32376,
"upload_time": "2019-08-23T23:44:36",
"url": "https://files.pythonhosted.org/packages/5b/3d/ef3a07b081c511c7161ca73e28239748c0b30520d0c6e4308d7b6077a685/acrawler-0.1.1.tar.gz"
}
],
"0.1.2a0": [
{
"comment_text": "",
"digests": {
"md5": "18ac0a28f04911dc112cfee8254ea69f",
"sha256": "a748dd582cf0ac6436beadf8fe2123229fe3ae47a24ae7b4491486868ec4ca5d"
},
"downloads": -1,
"filename": "acrawler-0.1.2a0.tar.gz",
"has_sig": false,
"md5_digest": "18ac0a28f04911dc112cfee8254ea69f",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6.0",
"size": 32364,
"upload_time": "2019-08-24T00:34:54",
"url": "https://files.pythonhosted.org/packages/b2/dd/bd4f4ddb5d3375e1f8d194689dd0eff463a759479304ab94ddb3eb5a5c73/acrawler-0.1.2a0.tar.gz"
}
],
"0.1.3": [
{
"comment_text": "",
"digests": {
"md5": "b9f6b2a7a7423f179e57f91bf38e0547",
"sha256": "d5a5b0244ad567336cea665275db464ad2a22c2e843e772b0956d18b20270dc7"
},
"downloads": -1,
"filename": "acrawler-0.1.3.tar.gz",
"has_sig": false,
"md5_digest": "b9f6b2a7a7423f179e57f91bf38e0547",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6.0",
"size": 32372,
"upload_time": "2019-08-24T00:39:03",
"url": "https://files.pythonhosted.org/packages/e0/63/15471255b49d989a23b3fa7c9ffd42772d4cb52de2e286135ec04ac7b7e6/acrawler-0.1.3.tar.gz"
}
],
"0.1.4": [
{
"comment_text": "",
"digests": {
"md5": "fcd5b2d2cb8ea02f688a02b6f3cafd71",
"sha256": "acceeea2e613bdc5798b995af1b791e4ca51389f575546e694ebcae7bbee858b"
},
"downloads": -1,
"filename": "acrawler-0.1.4.tar.gz",
"has_sig": false,
"md5_digest": "fcd5b2d2cb8ea02f688a02b6f3cafd71",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6.0",
"size": 32249,
"upload_time": "2019-09-17T08:41:00",
"url": "https://files.pythonhosted.org/packages/64/60/5a35bf86254b76ba516d864d9aef9cbacab9fe589cf499f6aa1b39b12694/acrawler-0.1.4.tar.gz"
}
]
},
"urls": [
{
"comment_text": "",
"digests": {
"md5": "fcd5b2d2cb8ea02f688a02b6f3cafd71",
"sha256": "acceeea2e613bdc5798b995af1b791e4ca51389f575546e694ebcae7bbee858b"
},
"downloads": -1,
"filename": "acrawler-0.1.4.tar.gz",
"has_sig": false,
"md5_digest": "fcd5b2d2cb8ea02f688a02b6f3cafd71",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6.0",
"size": 32249,
"upload_time": "2019-09-17T08:41:00",
"url": "https://files.pythonhosted.org/packages/64/60/5a35bf86254b76ba516d864d9aef9cbacab9fe589cf499f6aa1b39b12694/acrawler-0.1.4.tar.gz"
}
]
}