{ "info": { "author": "haokuan", "author_email": "jingdaohao@gmail.com", "bugtrack_url": null, "classifiers": [], "description": "# webcrawl\n[![Build Status](https://api.travis-ci.org/listen-lavender/webcrawl.svg?branch=master)](https://api.travis-ci.org/listen-lavender/webcrawl)\n\nwebcrawl wraps tools commonly used for crawling, including requests, lxml, and phantomjs, and implements a workflow so that coders who follow its conventions can focus on the crawling logic and quickly build a stable project. It also wraps a few other useful tools: for example, rsa.py is a Python port of http://www.ohdave.com/rsa (used by many websites), and atlas.py handles some map-coordinate processing.\n\n## Enhanced HTTP requests\nhandleRequest.py wraps the HTTP methods of the requests module that are commonly used for crawling together with lxml parsing, adds support for a phantomjs proxy, and handles some common content types:\n> - html \n> - xml \n> - json \n> - text \n> - response object \n\n## Simple task control\ntask.py (work.py) implements the task workflow. It is data-driven and executes asynchronously, similar to celery's composite types such as chain, group, and chord, but more powerful and easier to use than celery in this respect. It also enforces the coding conventions for crawling code and depends on the pjq queue.\n> - workflow \n> - priority \n> - selfloop \n> - subtask timeout \n> - task timeout \n\n## Queue support\npjq.py is a priority join 
queue that exists to support the workflow implementation. Among the queues, the mongo queue is the more powerful one: it supports adding, querying, and updating tasks, so subtasks stay controllable during execution.\n> - workflow \n> - priority \n> - selfloop \n> - subtask timeout \n> - task timeout \n\n## mongo queue\n```\n |-------put ---------- get insert insert\n | / \\ | |\n | WAIT---[ready]--- RUNNING --------COMPLETED |\n | | |\n | | |\nRETRY----------------------|----------------------ERROR\n | |\n | |\n |__________________________________________________|\n\n WAIT : 2\n RUNNING : 3\n RETRY : 4\n ABANDONED: 5\n COMPLETED: 1\n ERROR : 0\n ready - 10\n```\n\n# Getting started\n\nNo examples yet.\n\n## Installation\n\nTo install webcrawl, simply:\n\n````bash\n\n $ pip install webcrawl\n \u2728\ud83c\udf70\u2728\n````\n\n## Discussion and support\n\nReport bugs on the *GitHub issue tracker ", "license": "UNKNOWN", "maintainer": null, "maintainer_email": null, "name": "webcrawl", "package_url": "https://pypi.org/project/webcrawl/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/webcrawl/", "project_urls": { "Download": "UNKNOWN", "Homepage": "https://github.com/listen-lavender/webcrawl" }, "release_url": "https://pypi.org/project/webcrawl/1.1.2/", "requires_dist": null, "requires_python": null, "summary": "wecatch webcrawl", "version": "1.1.2" }, "last_serial": 2267844, "releases": { "1.1.0": [ { "comment_text": "", "digests": { "md5": "f8fa10974d89a0e8f23054d38a07574e", "sha256": "67014c32bc12134a2bbfaf7233a2777b615ab0ad05b202c275106aac84cf2883" }, "downloads": -1, "filename": "webcrawl-1.1.0.tar.gz", "has_sig": false, "md5_digest": "f8fa10974d89a0e8f23054d38a07574e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 52484, "upload_time": "2016-06-24T10:35:33", "url": 
"https://files.pythonhosted.org/packages/ba/51/995316dc13f786458d0fc493ca4da4323f989a161396e0175ad558c07475/webcrawl-1.1.0.tar.gz" } ], "1.1.1": [ { "comment_text": "", "digests": { "md5": "4cd0c9789cbb67825825acc981e6b1a3", "sha256": "68c3559d9f81bbcc4bcf443294bf6bf83b370b639cd4cc868eccb220fea86021" }, "downloads": -1, "filename": "webcrawl-1.1.1.tar.gz", "has_sig": false, "md5_digest": "4cd0c9789cbb67825825acc981e6b1a3", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 43844, "upload_time": "2016-07-26T00:30:25", "url": "https://files.pythonhosted.org/packages/93/e7/7b40a18bcd1df09e4438415789c9a8647746a8d5cd518dbfc83447f7c448/webcrawl-1.1.1.tar.gz" } ], "1.1.2": [ { "comment_text": "", "digests": { "md5": "7b024da809e99026526f50951b46ef4e", "sha256": "b5157ef669f5676b3fe2dd1d4a307e93233a64f6e516304313261716022786b0" }, "downloads": -1, "filename": "webcrawl-1.1.2.tar.gz", "has_sig": false, "md5_digest": "7b024da809e99026526f50951b46ef4e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 43898, "upload_time": "2016-08-08T03:29:15", "url": "https://files.pythonhosted.org/packages/8f/7b/2f67a0a588ee2f91c30107aafc8225eed4e2eeb7852dda9489b5f40b6e05/webcrawl-1.1.2.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "7b024da809e99026526f50951b46ef4e", "sha256": "b5157ef669f5676b3fe2dd1d4a307e93233a64f6e516304313261716022786b0" }, "downloads": -1, "filename": "webcrawl-1.1.2.tar.gz", "has_sig": false, "md5_digest": "7b024da809e99026526f50951b46ef4e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 43898, "upload_time": "2016-08-08T03:29:15", "url": "https://files.pythonhosted.org/packages/8f/7b/2f67a0a588ee2f91c30107aafc8225eed4e2eeb7852dda9489b5f40b6e05/webcrawl-1.1.2.tar.gz" } ] }