{ "info": { "author": "\u7518\u4eae", "author_email": "lovercws@gmail.com", "bugtrack_url": null, "classifiers": [], "description": "# pcrawler web crawler\n[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://github.com/mumupy/pcrawler/blob/master/LICENSE)\n[![Build Status](https://travis-ci.org/mumupy/pcrawler.svg?branch=master)](https://travis-ci.org/mumupy/pcrawler)\n[![codecov](https://codecov.io/gh/mumupy/pcrawler/branch/master/graph/badge.svg)](https://codecov.io/gh/mumupy/pcrawler)\n[![pypi](https://img.shields.io/pypi/v/pcrawler.svg)](https://pypi.python.org/pypi/pcrawler)\n[![Documentation Status](https://readthedocs.org/projects/pcrawler/badge/?version=latest)](https://pcrawler.readthedocs.io/en/latest/?badge=latest)\n\n***pcrawler is a web crawler written in Python that makes it quick and easy to write a crawler of your own. It is built from four main components (downloader, schedular, processor, and storage), each of which is easy to extend.***\n\n## Features:\n- Simple API that is quick to learn\n- Modular structure that is easy to extend\n- Multithreading and distributed support\n\n## Architecture\npcrawler is built from four main components: downloader, schedular, processor, and storage.\n- processor Page processor that analyzes crawled pages. It currently bundles an image download processor, a multimedia video download processor, and a Sina News processor.\n- schedular 
URL management component that manages the queue of URLs waiting to be crawled and deduplicates URLs that have already been crawled. The URL queue currently supports file-cache and set-based management. URL deduplication supports a file cache, a set, a Bloom filter (bloomFilter), and others.\n- downloader Download component that uses urllib2 by default.\n- storage Storage component that supports several file formats (csv, json, avro, video).\n\n## Related reading \n[webmagic crawler](http://webmagic.io/) \n[Bloom Filter](http://blog.csdn.net/jiaomeng/article/details/1495500)\n\n## Contact\n**The views above are purely my own; corrections are welcome if you disagree. \nemail: \ngithub:[https://github.com/babymm](https://github.com/babymm)**\n", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/mumupy/pcrawler.git", "keywords": "Python web crawler", "license": "Apache License", "maintainer": "", "maintainer_email": "", "name": "pcrawler", "package_url": "https://pypi.org/project/pcrawler/", "platform": "any", "project_url": "https://pypi.org/project/pcrawler/", "project_urls": { "Homepage": "https://github.com/mumupy/pcrawler.git" }, "release_url": "https://pypi.org/project/pcrawler/0.0.3/", "requires_dist": null, "requires_python": "", "summary": 
"A web crawler written in Python, adapted from the Java project webmagic. It has four main functional modules: downloader, storage, processor, and schedular. It lets you quickly write a custom crawler.", "version": "0.0.3" }, "last_serial": 5434529, "releases": { "0.0.2": [ { "comment_text": "", "digests": { "md5": "d61b36ecb2685afb556af55471aff9ba", "sha256": "6146ea8aea620a41ba862481b599f012afeee668569de12494562c98f14fbfba" }, "downloads": -1, "filename": "pcrawler-0.0.2.tar.gz", "has_sig": false, "md5_digest": "d61b36ecb2685afb556af55471aff9ba", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 17114, "upload_time": "2018-09-26T14:25:52", "url": "https://files.pythonhosted.org/packages/5a/7e/7ae104bfded9342245282077222491da2c6dcbf389351a00313aabe4f525/pcrawler-0.0.2.tar.gz" } ], "0.0.3": [ { "comment_text": "", "digests": { "md5": "713958cdb159b17d6b401ae89c7ee072", "sha256": "3dcfb39d59bdfd7eefdde00cca803c47b0d870e5b995206b3c547bbd892f2531" }, "downloads": -1, "filename": "pcrawler-0.0.3.tar.gz", "has_sig": false, "md5_digest": "713958cdb159b17d6b401ae89c7ee072", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 17576, "upload_time": "2019-06-22T12:09:40", "url": "https://files.pythonhosted.org/packages/36/f2/6e60188b051b35bf590fb51aae5cb9ffcda9197c6cb081db976ab93f4759/pcrawler-0.0.3.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "713958cdb159b17d6b401ae89c7ee072", "sha256": "3dcfb39d59bdfd7eefdde00cca803c47b0d870e5b995206b3c547bbd892f2531" }, "downloads": -1, "filename": "pcrawler-0.0.3.tar.gz", "has_sig": false, "md5_digest": "713958cdb159b17d6b401ae89c7ee072", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 
17576, "upload_time": "2019-06-22T12:09:40", "url": "https://files.pythonhosted.org/packages/36/f2/6e60188b051b35bf590fb51aae5cb9ffcda9197c6cb081db976ab93f4759/pcrawler-0.0.3.tar.gz" } ] }