{ "info": { "author": "", "author_email": "", "bugtrack_url": null, "classifiers": [], "description": "# Scrapy with Puppeteer\n[![PyPI](https://img.shields.io/pypi/v/scrapy-puppeteer.svg)](https://pypi.python.org/pypi/scrapy-puppeteer) [![Build Status](https://travis-ci.org/clemfromspace/scrapy-puppeteer.svg?branch=master)](https://travis-ci.org/clemfromspace/scrapy-puppeteer) [![Test Coverage](https://api.codeclimate.com/v1/badges/86603b736e684dd4f8c9/test_coverage)](https://codeclimate.com/github/clemfromspace/scrapy-puppeteer/test_coverage) [![Maintainability](https://api.codeclimate.com/v1/badges/86603b736e684dd4f8c9/maintainability)](https://codeclimate.com/github/clemfromspace/scrapy-puppeteer/maintainability)\n\nScrapy middleware to handle JavaScript pages using [puppeteer](https://github.com/GoogleChrome/puppeteer).\n\n## \u26a0 IN ACTIVE DEVELOPMENT - READ BEFORE USING \u26a0\n\nThis is an attempt to make Scrapy and Puppeteer work together to handle JavaScript-rendered pages.\nThe design is strongly inspired by the Scrapy [Splash plugin](https://github.com/scrapy-plugins/scrapy-splash).\n\n**Scrapy and Puppeteer**\n\nThe main issue when running Scrapy and Puppeteer together is that Scrapy uses [Twisted](https://twistedmatrix.com/trac/), while [Pyppeteer](https://miyakogi.github.io/pyppeteer/) (the Python port of puppeteer we are using) uses [asyncio](https://docs.python.org/3/library/asyncio.html) for its asynchronous operations. 
\n\nLuckily, we can use Twisted's [asyncio reactor](https://twistedmatrix.com/documents/18.4.0/api/twisted.internet.asyncioreactor.html) to make the two talk to each other.\n\nThat's why you **cannot** use the built-in `scrapy` command line (it installs the default reactor); you will have to use the `scrapyp` one provided by this module.\n\nIf you are running your spiders from a script, you will have to make sure you install the asyncio reactor before importing scrapy or doing anything else:\n\n```python\nimport asyncio\nfrom twisted.internet import asyncioreactor\n\nasyncioreactor.install(asyncio.get_event_loop())\n```\n\n\n## Installation\n```\n$ pip install scrapy-puppeteer\n```\n\n## Configuration\nAdd the `PuppeteerMiddleware` to the downloader middlewares:\n```python\nDOWNLOADER_MIDDLEWARES = {\n    'scrapy_puppeteer.PuppeteerMiddleware': 800\n}\n```\n\n\n## Usage\nUse the `scrapy_puppeteer.PuppeteerRequest` instead of the Scrapy built-in `Request`, as shown below:\n```python\nfrom scrapy_puppeteer import PuppeteerRequest\n\ndef your_parse_method(self, response):\n    # Your code...\n    yield PuppeteerRequest('http://httpbin.org', self.parse_result)\n```\nThe request will then be handled by puppeteer.\n\nThe `selector` response attribute works as usual (but contains the HTML processed by puppeteer).\n\n```python\ndef parse_result(self, response):\n    print(response.selector.xpath('//title/text()'))\n``` \n\n### Additional arguments\nThe `scrapy_puppeteer.PuppeteerRequest` accepts 3 additional arguments:\n\n#### `wait_until`\n\nWill be passed to the [`waitUntil`](https://miyakogi.github.io/pyppeteer/_modules/pyppeteer/page.html#Page.goto) parameter of puppeteer.\nDefaults to `domcontentloaded`.\n\n#### `wait_for`\nWill be passed to puppeteer's [`waitFor`](https://miyakogi.github.io/pyppeteer/reference.html?highlight=image#pyppeteer.page.Page.waitFor).\n\n#### `screenshot`\nWhen used, puppeteer will take a 
[screenshot](https://miyakogi.github.io/pyppeteer/reference.html?highlight=headers#pyppeteer.page.Page.screenshot) of the page and the binary data of the captured .png will be added to the response `meta`:\n```python\nyield PuppeteerRequest(\n    url,\n    self.parse_result,\n    screenshot=True\n)\n\ndef parse_result(self, response):\n    with open('image.png', 'wb') as image_file:\n        image_file.write(response.meta['screenshot'])\n```\n\n\n\n", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/clemfromspace/scrapy-puppeteer", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "scrapy-puppeteer", "package_url": "https://pypi.org/project/scrapy-puppeteer/", "platform": "", "project_url": "https://pypi.org/project/scrapy-puppeteer/", "project_urls": { "Homepage": "https://github.com/clemfromspace/scrapy-puppeteer" }, "release_url": "https://pypi.org/project/scrapy-puppeteer/0.0.1b0/", "requires_dist": [ "scrapy (>=1.0.0)", "pyppeteer" ], "requires_python": "", "summary": "Scrapy with puppeteer", "version": "0.0.1b0" }, "last_serial": 4548113, "releases": { "0.0.1b0": [ { "comment_text": "", "digests": { "md5": "1c73d3eafefc7827d68c51ee19e5175f", "sha256": "fb633b248444817f1f9c7f57f78bfb2c74752f9322986c82ebf45a36cc7d666a" }, "downloads": -1, "filename": "scrapy_puppeteer-0.0.1b0-py3-none-any.whl", "has_sig": false, "md5_digest": "1c73d3eafefc7827d68c51ee19e5175f", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 6479, "upload_time": "2018-11-30T18:07:10", "url": "https://files.pythonhosted.org/packages/a3/8e/d8aefc1d78710a56ddd9a10124dd81bee896b330be0a083f6fc26251c485/scrapy_puppeteer-0.0.1b0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "b5919eb16db8a2ec40c953753bde7da5", "sha256": "5f6fd2b0868217506805cf9d4433d49732d721d8488b10293a88f4d7b07adfc8" }, "downloads": -1, 
"filename": "scrapy-puppeteer-0.0.1b0.tar.gz", "has_sig": false, "md5_digest": "b5919eb16db8a2ec40c953753bde7da5", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5105, "upload_time": "2018-11-30T18:07:13", "url": "https://files.pythonhosted.org/packages/de/de/735b5bb8e9884590a979b7f5c0f69fb034e80cc9c88713006a9b85615a5f/scrapy-puppeteer-0.0.1b0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "1c73d3eafefc7827d68c51ee19e5175f", "sha256": "fb633b248444817f1f9c7f57f78bfb2c74752f9322986c82ebf45a36cc7d666a" }, "downloads": -1, "filename": "scrapy_puppeteer-0.0.1b0-py3-none-any.whl", "has_sig": false, "md5_digest": "1c73d3eafefc7827d68c51ee19e5175f", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 6479, "upload_time": "2018-11-30T18:07:10", "url": "https://files.pythonhosted.org/packages/a3/8e/d8aefc1d78710a56ddd9a10124dd81bee896b330be0a083f6fc26251c485/scrapy_puppeteer-0.0.1b0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "b5919eb16db8a2ec40c953753bde7da5", "sha256": "5f6fd2b0868217506805cf9d4433d49732d721d8488b10293a88f4d7b07adfc8" }, "downloads": -1, "filename": "scrapy-puppeteer-0.0.1b0.tar.gz", "has_sig": false, "md5_digest": "b5919eb16db8a2ec40c953753bde7da5", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5105, "upload_time": "2018-11-30T18:07:13", "url": "https://files.pythonhosted.org/packages/de/de/735b5bb8e9884590a979b7f5c0f69fb034e80cc9c88713006a9b85615a5f/scrapy-puppeteer-0.0.1b0.tar.gz" } ] }