{ "info": { "author": "Rolando Espinoza La fuente", "author_email": "darkrho@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Intended Audience :: Developers", "Programming Language :: Python" ], "description": "==================\nscrapy-boilerplate\n==================\n\n`scrapy-boilerplate` is a small set of utilities for `Scrapy`_ to simplify\nwriting low-complexity spiders that are very common in small and one-off projects.\n\nIt requires `Scrapy`_ `(>= 0.16)` and has been tested using `python 2.7`.\nAdditionally, `PyQuery`_ is required to run the scripts in the `examples`_\ndirectory.\n\n.. note::\n\n The code is experimental, includes some magic under the hood and might be\n hard to debug. If you are new to `Scrapy`_, don't use this code unless\n you are ready to debug errors that nobody has seen before.\n\n\n-----------\nUsage Guide\n-----------\n\nItems\n=====\n\nStandard item definition:\n\n.. code:: python\n\n from scrapy.item import Item, Field\n\n class BaseItem(Item):\n url = Field()\n crawled = Field()\n\n class UserItem(BaseItem):\n name = Field()\n about = Field()\n location = Field()\n\n class StoryItem(BaseItem):\n title = Field()\n body = Field()\n user = Field()\n\nBecomes:\n\n.. code:: python\n\n from scrapy_boilerplate import NewItem\n\n BaseItem = NewItem('url crawled')\n\n UserItem = NewItem('name about location', base_cls=BaseItem)\n\n StoryItem = NewItem('title body user', base_cls=BaseItem)\n\n\nBaseSpider\n==========\n\nStandard spider definition:\n\n.. code:: python\n\n from scrapy.spider import BaseSpider\n\n class MySpider(BaseSpider):\n name = 'my_spider'\n start_urls = ['http://example.com/latest']\n\n def parse(self, response):\n # do stuff\n\n\nBecomes:\n\n.. 
code:: python\n\n from scrapy_boilerplate import NewSpider\n\n MySpider = NewSpider('my_spider')\n\n @MySpider.scrape('http://example.com/latest')\n def parse(spider, response):\n # do stuff\n\n\nCrawlSpider\n===========\n\nStandard crawl-spider definition:\n\n.. code:: python\n\n from scrapy.contrib.spiders import CrawlSpider, Rule\n from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor\n\n\n class MySpider(CrawlSpider):\n name = 'my_spider'\n start_urls = ['http://example.com']\n\n rules = (\n Rule(SgmlLinkExtractor('category\\.php'), follow=True),\n Rule(SgmlLinkExtractor('item\\.php'), callback='parse_item'),\n )\n\n def parse_item(self, response):\n # do stuff\n\n\nBecomes:\n\n.. code:: python\n\n from scrapy_boilerplate import NewCrawlSpider\n\n MySpider = NewCrawlSpider('my_spider')\n MySpider.follow('category\\.php')\n\n @MySpider.rule('item\\.php')\n def parse_item(spider, response):\n # do stuff\n\n\nRunning Helpers\n===============\n\nSingle-spider running script:\n\n.. code:: python\n\n # file: my-spider.py\n # imports omitted ...\n\n class MySpider(BaseSpider):\n # spider code ...\n\n if __name__ == '__main__':\n from scrapy_boilerplate import run_spider\n custom_settings = {\n # ...\n }\n spider = MySpider()\n\n run_spider(spider, custom_settings)\n\n\nMulti-spider script with standard crawl command line options:\n\n.. code:: python\n\n # file: my-crawler.py\n # imports omitted ...\n\n\n class MySpider(BaseSpider):\n name = 'my_spider'\n # spider code ...\n\n\n class OtherSpider(CrawlSpider):\n name = 'other_spider'\n # spider code ...\n\n\n if __name__ == '__main__':\n from scrapy_boilerplate import run_crawler, SpiderManager\n custom_settings = {\n # ...\n }\n\n SpiderManager.register(MySpider)\n SpiderManager.register(OtherSpider)\n\n run_crawler(custom_settings)\n\n\n.. note:: See the `examples`_ directory for working code examples.\n\n\n.. _`Scrapy`: http://www.scrapy.org\n.. _`PyQuery`: http://pypi.python.org/pypi/pyquery\n.. 
_`examples`: https://github.com/darkrho/scrapy-boilerplate/tree/master/examples", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/darkrho/scrapy-boilerplate", "keywords": null, "license": "BSD", "maintainer": null, "maintainer_email": null, "name": "scrapy-boilerplate", "package_url": "https://pypi.org/project/scrapy-boilerplate/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/scrapy-boilerplate/", "project_urls": { "Download": "UNKNOWN", "Homepage": "https://github.com/darkrho/scrapy-boilerplate" }, "release_url": "https://pypi.org/project/scrapy-boilerplate/0.2.1/", "requires_dist": null, "requires_python": null, "summary": "Small set of utilities to simplify writing Scrapy spiders.", "version": "0.2.1" }, "last_serial": 799334, "releases": { "0.1": [ { "comment_text": "", "digests": { "md5": "54919d0fb266a77ae3a8c937b8d56680", "sha256": "3395311d878896461fff35213557021d006b9735cd33005b0c7f52a818b62588" }, "downloads": -1, "filename": "scrapy-boilerplate-0.1.tar.gz", "has_sig": false, "md5_digest": "54919d0fb266a77ae3a8c937b8d56680", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 2955, "upload_time": "2013-02-02T05:57:07", "url": "https://files.pythonhosted.org/packages/be/fa/e46ff635fb15d20c8d47844fed2430064aea9ce7f73c627d1419f24343b9/scrapy-boilerplate-0.1.tar.gz" } ], "0.2": [ { "comment_text": "", "digests": { "md5": "c3609d87735c4b74494e44bad3ffd589", "sha256": "92561779cfe6600c5cafabe340aa7d65a99788ca25bbcb7f6819b55aab10e76c" }, "downloads": -1, "filename": "scrapy-boilerplate-0.2.tar.gz", "has_sig": false, "md5_digest": "c3609d87735c4b74494e44bad3ffd589", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4991, "upload_time": "2013-02-04T00:34:16", "url": 
"https://files.pythonhosted.org/packages/17/48/2ad787186949f8f51ed032593a64cb32cab68b7b743c4da4160ca04e8eb9/scrapy-boilerplate-0.2.tar.gz" } ], "0.2.1": [ { "comment_text": "", "digests": { "md5": "9c484b430d39ae2298acfb275f9ebcf7", "sha256": "49860bcb31ad8d7516e9f322b89b34ae708b4f373b588477cbb65eee3a273078" }, "downloads": -1, "filename": "scrapy-boilerplate-0.2.1.tar.gz", "has_sig": false, "md5_digest": "9c484b430d39ae2298acfb275f9ebcf7", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4993, "upload_time": "2013-02-04T00:35:17", "url": "https://files.pythonhosted.org/packages/65/4f/8901d1fc946dc6bc27e8abe472d298db36da18db3e63793938383499dd6a/scrapy-boilerplate-0.2.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "9c484b430d39ae2298acfb275f9ebcf7", "sha256": "49860bcb31ad8d7516e9f322b89b34ae708b4f373b588477cbb65eee3a273078" }, "downloads": -1, "filename": "scrapy-boilerplate-0.2.1.tar.gz", "has_sig": false, "md5_digest": "9c484b430d39ae2298acfb275f9ebcf7", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4993, "upload_time": "2013-02-04T00:35:17", "url": "https://files.pythonhosted.org/packages/65/4f/8901d1fc946dc6bc27e8abe472d298db36da18db3e63793938383499dd6a/scrapy-boilerplate-0.2.1.tar.gz" } ] }