{ "info": { "author": "Marvin Zhang", "author_email": "tikazyq@163.com", "bugtrack_url": null, "classifiers": [], "description": "# Crawlab\nCelery-based web crawler admin platform for managing distributed web spiders regardless of languages and frameworks.\n\n## Prerequisites\n- Python3\n- MongoDB\n- Redis\n\n## Installation\n\n```bash\npip install -r requirements.txt\n```\n\n## Configure\n\nPlease edit the configuration file `config.py` to configure the API and database connections.\n\n## Quick Start\n```bash\n# run web app\npython app.py\n\n# run flower app\npython ./bin/run_flower.py\n\n# run worker\npython ./bin/run_worker.py\n```\n\n```bash\n# TODO: frontend\n```\n\n## Nodes\n\nNodes are the workers defined in Celery. A running node is connected to a task queue, Redis for example, from which it receives and runs tasks. Because spiders need to be deployed to nodes, users should specify the nodes' IP addresses and ports before deployment.\n\n## Spiders\n\n#### Auto Discovery\nIn `config.py`, set `PROJECT_SOURCE_FILE_FOLDER` to the directory where the spider projects are located. The web app will discover spider projects automatically.\n\n#### Deploy Spiders\n\nAll spiders need to be deployed to a specific node before crawling. Simply click the \"Deploy\" button on the spider detail page and select the target node for the deployment.\n\n#### Run Spiders\n\nAfter deploying a spider, you can click the \"Run\" button on its detail page and select a specific node to start crawling. This triggers a crawling task, which you can inspect in detail on the tasks page.\n\n## Tasks\n\nTasks are triggered and run by the workers. 
Users can check task status and logs on the task detail page.", "description_content_type": "", "docs_url": null, "download_url": "https://github.com/tikazyq/crawlab/archive/master.zip", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/tikazyq/crawlab", "keywords": "celery,python,webcrawler,crawl,scrapy,admin", "license": "BSD", "maintainer": "", "maintainer_email": "", "name": "crawlab-server", "package_url": "https://pypi.org/project/crawlab-server/", "platform": "", "project_url": "https://pypi.org/project/crawlab-server/", "project_urls": { "Download": "https://github.com/tikazyq/crawlab/archive/master.zip", "Homepage": "https://github.com/tikazyq/crawlab" }, "release_url": "https://pypi.org/project/crawlab-server/0.0.1/", "requires_dist": null, "requires_python": "", "summary": "Celery-based web crawler admin platform for managing distributed web spiders regardless of languages and frameworks.", "version": "0.0.1" }, "last_serial": 4889801, "releases": { "0.0.1": [ { "comment_text": "", "digests": { "md5": "2c6e68592a32d03e111925f8551dd21d", "sha256": "cc9e97da42700cb51ed80c60495077a305b60c9b4a3600b5fe05df7e0cd1774b" }, "downloads": -1, "filename": "crawlab-server-0.0.1.tar.gz", "has_sig": false, "md5_digest": "2c6e68592a32d03e111925f8551dd21d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 11084, "upload_time": "2019-03-03T03:22:19", "url": "https://files.pythonhosted.org/packages/49/4c/e5e8bfd5d68986c75d6d5d9ce069b3eac2d5d8e8af59791947d12d8b3b2b/crawlab-server-0.0.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "2c6e68592a32d03e111925f8551dd21d", "sha256": "cc9e97da42700cb51ed80c60495077a305b60c9b4a3600b5fe05df7e0cd1774b" }, "downloads": -1, "filename": "crawlab-server-0.0.1.tar.gz", "has_sig": false, "md5_digest": "2c6e68592a32d03e111925f8551dd21d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 
11084, "upload_time": "2019-03-03T03:22:19", "url": "https://files.pythonhosted.org/packages/49/4c/e5e8bfd5d68986c75d6d5d9ce069b3eac2d5d8e8af59791947d12d8b3b2b/crawlab-server-0.0.1.tar.gz" } ] }