{ "info": { "author": "Bruno Rocha", "author_email": "rochacbruno@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 2 - Pre-Alpha", "Intended Audience :: Developers", "License :: OSI Approved :: BSD License", "Natural Language :: English", "Programming Language :: Python :: 2", "Programming Language :: Python :: 2.6", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.3" ], "description": "Create scrapers using Scrapy Selectors\n============================================\n\n[](https://travis-ci.org/rochacbruno/splinter_model)\n\n[](https://pypi.python.org/pypi/splinter_model/)\n[](https://pypi.python.org/pypi/splinter_model/)\n\n## What is Splinter?\n\nSplinter is an open source tool for testing web applications using Python. It lets you automate browser actions, such as visiting URLs and interacting with their items.\n\nhttp://splinter.cobrateam.info/\n\n\n## What is splinter_model?\n\nIt is a clone of [scrapy_model](http://github.com/rochacbruno/scrapy_model), but instead of Scrapy it uses Splinter as the engine, so it allows scraping JavaScript websites.\n\n\n## TODO:\n\nEverything is still to-do, so just don't use it until it is released to PyPI.\n\n### Requirements\n\nThis module should implement the same API as scrapy_model, at least support for CSSField, XPathField, processors, validators, multiple queries and parse_methods.\n\nIt should also implement a layer to interact with JavaScript.\n\n### Current status: pre-alpha-dev\n\nIt is just a helper to create scrapers using Scrapy Selectors, allowing you to select elements by CSS or by XPath, structure your scraper via Models (just like an ORM model), and plug into an ORM model via the ``populate`` method.\n\nImport the BaseFetcherModel, CSSField or XPathField (you can use both):\n\n```python\nfrom splinter_model import BaseFetcherModel, CSSField\n```\n\nGo to a webpage you want to scrape and use Chrome dev tools 
or Firebug to figure out the CSS paths. Then, considering you want to get the following fragment from some page:\n\n```html\n<span id=\"person\">Bruno Rocha <a href=\"http://brunorocha.org\">website</a></span>\n```\n\n```python\nclass MyFetcher(BaseFetcherModel):\n    name = CSSField('span#person')\n    website = CSSField('span#person a')\n    # XPathField('//xpath_selector_here')\n```\n\nFields can receive an ``auto_extract=True`` parameter, which auto-extracts values from the selector before calling the parse methods or processors. You can also pass ``takes_first=True``, which forces auto_extract and also tries to get the first element of the result, because Scrapy selectors return a list of matched elements.\n\n\n### Multiple queries in a single field\n\nYou can use multiple queries for a single field:\n\n```python\nname = XPathField(\n    ['//*[@id=\"8\"]/div[2]/div/div[2]/div[2]/ul',\n     '//*[@id=\"8\"]/div[2]/div/div[3]/div[2]/ul']\n)\n```\n\nIn that case, parsing will try the first query and return if it finds a match; otherwise it will try the subsequent queries until it finds something, or it will return an empty selector.\n\n#### Finding the best match by a query validator\n\nIf you want to run multiple queries and also validate the best match, you can pass a validator function, which takes the Scrapy selector and should return a boolean.\n\nFor example, imagine you have the \"name\" field defined above and you want to validate each query to ensure it has an 'li' with the text \"Schblaums\" in it.\n\n```python\n\ndef has_schblaums(selector):\n    for li in selector.css('li'): # takes each