{ "info": { "author": "Royce Haynes", "author_email": "royce.haynes@gmail.com", "bugtrack_url": null, "classifiers": [], "description": "## A RabbitMQ Scheduler for Scrapy Framework.\n\nScrapy-rabbitmq is a tool that lets you feed and queue URLs from RabbitMQ via Scrapy spiders, using the [Scrapy framework](http://doc.scrapy.org/en/latest/index.html).\n\nInpsired by and modled after [scrapy-redis](https://github.com/darkrho/scrapy-redis).\n\n## Installation\n\nUsing pip, type in your command-line prompt\n\n```\npip install scrapy-rabbitmq\n```\n \nOr clone the repo and inside the scrapy-rabbitmq directory, type\n\n```\npython setup.py install\n```\n\n## Usage\n\n### Step 1: In your scrapy settings, add the following config values:\n\n```\n# Enables scheduling storing requests queue in rabbitmq.\n\nSCHEDULER = \"scrapy_rabbitmq.scheduler.Scheduler\"\n\n# Don't cleanup rabbitmq queues, allows to pause/resume crawls.\nSCHEDULER_PERSIST = True\n\n# Schedule requests using a priority queue. (default)\nSCHEDULER_QUEUE_CLASS = 'scrapy_rabbitmq.queue.SpiderQueue'\n\n# RabbitMQ Queue to use to store requests\nRABBITMQ_QUEUE_NAME = 'scrapy_queue'\n\n# Provide host and port to RabbitMQ daemon\nRABBITMQ_CONNECTION_PARAMETERS = {'host': 'localhost', 'port': 6666}\n\n# Store scraped item in rabbitmq for post-processing.\nITEM_PIPELINES = {\n 'scrapy_rabbitmq.pipelines.RabbitMQPipeline': 1\n}\n\n```\n\n### Step 2: Add RabbitMQMixin to Spider.\n\n#### Example: multidomain_spider.py\n\n```\nfrom scrapy.contrib.spiders import CrawlSpider\nfrom scrapy_rabbitmq.spiders import RabbitMQMixin\n\nclass MultiDomainSpider(RabbitMQMixin, CrawlSpider):\n name = 'multidomain'\n\n def parse(self, response):\n # parse all the things\n pass\n\n```\n\n### Step 3: Run spider using [scrapy client](http://doc.scrapy.org/en/1.0/topics/shell.html)\n\n```\nscrapy runspider multidomain_spider.py\n```\n\n### Step 4: Push URLs to RabbitMQ\n\n#### Example: push_web_page_to_queue.py\n\n```\n#!/usr/bin/env python\nimport pika\nimport settings\n\nconnection = pika.BlockingConnection(pika.ConnectionParameters(\n 'localhost'))\nchannel = connection.channel()\n\nchannel.basic_publish(exchange='',\n routing_key=settings.RABBITMQ_QUEUE_NAME,\n body='raw html contentsextract url')\n\nconnection.close()\n\n```\n\n## Contributing and Forking\n\nSee [Contributing Guidlines](CONTRIBUTING.MD)\n\n## Releases\n\nSee the [changelog](CHANGELOG.md) for release details.\n\n| Version | Release Date |\n| :-----: | :----------: |\n| 0.1.0 | 2014-11-14 |\n| 0.1.1 | 2015-07-02 |\n\n\n\n## Copyright & License\n\nCopyright (c) 2015 Royce Haynes - Released under The MIT License.\n", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/roycehaynes/scrapy-rabbitmq", "keywords": null, "license": "MIT", "maintainer": null, "maintainer_email": null, "name": "scrapy-rabbitmq", "package_url": "https://pypi.org/project/scrapy-rabbitmq/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/scrapy-rabbitmq/", "project_urls": { "Download": "UNKNOWN", "Homepage": "https://github.com/roycehaynes/scrapy-rabbitmq" }, "release_url": "https://pypi.org/project/scrapy-rabbitmq/0.1.2/", "requires_dist": null, "requires_python": null, "summary": "RabbitMQ Plug-in for Scrapy", "version": "0.1.2" }, "last_serial": 1616916, "releases": { "0.1.1": [ { "comment_text": "", "digests": { "md5": "65e364403f9c2e9c3cb5845ec6538044", "sha256": "494773ae0a390bfa79f5cc7f731c5c5056acf16268cea817bd60217664246311" }, "downloads": -1, "filename": "scrapy-rabbitmq-0.1.1.tar.gz", "has_sig": false, "md5_digest": "65e364403f9c2e9c3cb5845ec6538044", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5600, "upload_time": "2015-07-02T17:28:01", "url": "https://files.pythonhosted.org/packages/3b/10/08bc4366b69d0d4f3d89d50c84e0af97829561a3f78efd3543c583c3c027/scrapy-rabbitmq-0.1.1.tar.gz" }, { "comment_text": "", "digests": { "md5": "d8acd59fb1eb235e9d369242612aafad", "sha256": "d0b1e9f2f098082d8b4996bdc27654c338b9e80763822012912ef586714b5507" }, "downloads": -1, "filename": "scrapy-rabbitmq-0.1.1.zip", "has_sig": false, "md5_digest": "d8acd59fb1eb235e9d369242612aafad", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 9468, "upload_time": "2015-07-02T17:27:57", "url": "https://files.pythonhosted.org/packages/1e/ae/a16a077878672b57b0f0b5b1d38416e2baad3d6990d5f105ebd0d2b13630/scrapy-rabbitmq-0.1.1.zip" } ], "0.1.2": [ { "comment_text": "", "digests": { "md5": "d1ab4cd43c481c7dd5cac32af2583581", "sha256": "769374ce03b7f9261b30f4df4162096202c3f69e87d5953a7be6dbee1cd88246" }, "downloads": -1, "filename": "scrapy-rabbitmq-0.1.2.tar.gz", "has_sig": false, "md5_digest": "d1ab4cd43c481c7dd5cac32af2583581", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5618, "upload_time": "2015-07-02T17:30:00", "url": "https://files.pythonhosted.org/packages/82/d5/79e39481a45ff04e9beb2128f92ce9a12e1945254ef378abfb1e402c2637/scrapy-rabbitmq-0.1.2.tar.gz" }, { "comment_text": "", "digests": { "md5": "f1b14a0ede47aac81bec8e2b37e8355a", "sha256": "3c938edd770061b0b3257f1c009c210d604fa44afe1af1a1753016b77ff83ef2" }, "downloads": -1, "filename": "scrapy-rabbitmq-0.1.2.zip", "has_sig": false, "md5_digest": "f1b14a0ede47aac81bec8e2b37e8355a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 9519, "upload_time": "2015-07-02T17:29:56", "url": "https://files.pythonhosted.org/packages/a9/d9/8e228008fe7d560b089ca8ac2cf293ac7b805c6f0cca92de8208836a229e/scrapy-rabbitmq-0.1.2.zip" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "d1ab4cd43c481c7dd5cac32af2583581", "sha256": "769374ce03b7f9261b30f4df4162096202c3f69e87d5953a7be6dbee1cd88246" }, "downloads": -1, "filename": "scrapy-rabbitmq-0.1.2.tar.gz", "has_sig": false, "md5_digest": "d1ab4cd43c481c7dd5cac32af2583581", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5618, "upload_time": "2015-07-02T17:30:00", "url": "https://files.pythonhosted.org/packages/82/d5/79e39481a45ff04e9beb2128f92ce9a12e1945254ef378abfb1e402c2637/scrapy-rabbitmq-0.1.2.tar.gz" }, { "comment_text": "", "digests": { "md5": "f1b14a0ede47aac81bec8e2b37e8355a", "sha256": "3c938edd770061b0b3257f1c009c210d604fa44afe1af1a1753016b77ff83ef2" }, "downloads": -1, "filename": "scrapy-rabbitmq-0.1.2.zip", "has_sig": false, "md5_digest": "f1b14a0ede47aac81bec8e2b37e8355a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 9519, "upload_time": "2015-07-02T17:29:56", "url": "https://files.pythonhosted.org/packages/a9/d9/8e228008fe7d560b089ca8ac2cf293ac7b805c6f0cca92de8208836a229e/scrapy-rabbitmq-0.1.2.zip" } ] }