{ "info": { "author": "", "author_email": "", "bugtrack_url": null, "classifiers": [], "description": "# Scrapy with selenium\n[![PyPI](https://img.shields.io/pypi/v/scrapy-selenium.svg)](https://pypi.python.org/pypi/scrapy-selenium) [![Build Status](https://travis-ci.org/clemfromspace/scrapy-selenium.svg?branch=master)](https://travis-ci.org/clemfromspace/scrapy-selenium) [![Test Coverage](https://api.codeclimate.com/v1/badges/5c737098dc38a835ff96/test_coverage)](https://codeclimate.com/github/clemfromspace/scrapy-selenium/test_coverage) [![Maintainability](https://api.codeclimate.com/v1/badges/5c737098dc38a835ff96/maintainability)](https://codeclimate.com/github/clemfromspace/scrapy-selenium/maintainability)\n\nScrapy middleware to handle javascript pages using selenium.\n\n## Installation\n```\n$ pip install scrapy-selenium\n```\nYou should use **python>=3.6**. \nYou will also need one of the Selenium [compatible browsers](http://www.seleniumhq.org/about/platforms.jsp).\n\n## Configuration\n1. Add the browser to use, the path to the driver executable, and the arguments to pass to the executable to the scrapy settings:\n ```python\n from shutil import which\n\n SELENIUM_DRIVER_NAME = 'firefox'\n SELENIUM_DRIVER_EXECUTABLE_PATH = which('geckodriver')\n SELENIUM_DRIVER_ARGUMENTS=['-headless'] # '--headless' if using chrome instead of firefox\n ```\n\nOptionally, set the path to the browser executable:\n ```python\n SELENIUM_BROWSER_EXECUTABLE_PATH = which('firefox')\n ```\n\n2. 
Add the `SeleniumMiddleware` to the downloader middlewares:\n ```python\n DOWNLOADER_MIDDLEWARES = {\n 'scrapy_selenium.SeleniumMiddleware': 800\n }\n ```\n## Usage\nUse the `scrapy_selenium.SeleniumRequest` instead of Scrapy's built-in `Request`, as shown below:\n```python\nfrom scrapy_selenium import SeleniumRequest\n\nyield SeleniumRequest(url, self.parse_result)\n```\nThe request will be handled by selenium, and the request will have an additional `meta` key named `driver`, containing the selenium driver that processed the request.\n```python\ndef parse_result(self, response):\n print(response.request.meta['driver'].title)\n```\nFor more information about the available driver methods and attributes, refer to the [selenium python documentation](http://selenium-python.readthedocs.io/api.html#module-selenium.webdriver.remote.webdriver).\n\nThe `selector` response attribute works as usual (but contains the HTML processed by the selenium driver).\n```python\ndef parse_result(self, response):\n print(response.selector.xpath('//title/text()'))\n```\n\n### Additional arguments\nThe `scrapy_selenium.SeleniumRequest` accepts 4 additional arguments:\n\n#### `wait_time` / `wait_until`\n\nWhen used, selenium will perform an [Explicit wait](http://selenium-python.readthedocs.io/waits.html#explicit-waits) before returning the response to the spider.\n```python\nfrom selenium.webdriver.common.by import By\nfrom selenium.webdriver.support import expected_conditions as EC\n\nyield SeleniumRequest(\n url=url,\n callback=self.parse_result,\n wait_time=10,\n wait_until=EC.element_to_be_clickable((By.ID, 'someid'))\n)\n```\n\n#### `screenshot`\nWhen used, selenium will take a screenshot of the page and add the binary data of the captured .png to the response `meta`:\n```python\nyield SeleniumRequest(\n url=url,\n callback=self.parse_result,\n screenshot=True\n)\n\ndef parse_result(self, response):\n with open('image.png', 'wb') as image_file:\n 
image_file.write(response.meta['screenshot'])\n```\n\n#### `script`\nWhen used, selenium will execute custom JavaScript code.\n```python\nyield SeleniumRequest(\n url,\n self.parse_result,\n script='window.scrollTo(0, document.body.scrollHeight);',\n)\n```", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/markronquillo/scrapy-selenium", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "scrapy-selenium-mark", "package_url": "https://pypi.org/project/scrapy-selenium-mark/", "platform": "", "project_url": "https://pypi.org/project/scrapy-selenium-mark/", "project_urls": { "Homepage": "https://github.com/markronquillo/scrapy-selenium" }, "release_url": "https://pypi.org/project/scrapy-selenium-mark/0.0.10/", "requires_dist": null, "requires_python": "", "summary": "Scrapy with selenium", "version": "0.0.10" }, "last_serial": 5198654, "releases": { "0.0.10": [ { "comment_text": "", "digests": { "md5": "6589ea38e863fc139b139e89868a8973", "sha256": "69d560214b2f9b5c4d73fe673ee8fe2f8e4167f0b0b5fd777a2a99e3b4c13afe" }, "downloads": -1, "filename": "scrapy-selenium-mark-0.0.10.tar.gz", "has_sig": false, "md5_digest": "6589ea38e863fc139b139e89868a8973", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5548, "upload_time": "2019-04-28T05:49:55", "url": "https://files.pythonhosted.org/packages/74/c3/a6f72289ffc1a198404699ceeb9963f83033f88653ecf01c02f7c0404a95/scrapy-selenium-mark-0.0.10.tar.gz" } ], "0.0.8": [ { "comment_text": "", "digests": { "md5": "b394e0ff15359d8cb69fbeecdad051af", "sha256": "9b276ec91322f99dee0c1264b8874903cfa3dd997ec4a7e9b33c010816719aa3" }, "downloads": -1, "filename": "scrapy-selenium-mark-0.0.8.tar.gz", "has_sig": false, "md5_digest": "b394e0ff15359d8cb69fbeecdad051af", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5487, 
"upload_time": "2019-04-26T10:25:52", "url": "https://files.pythonhosted.org/packages/b7/14/5d135f6119a7d34bad24a00c47995ac306a9905aeedf6178523d1fa31059/scrapy-selenium-mark-0.0.8.tar.gz" } ], "0.0.9": [ { "comment_text": "", "digests": { "md5": "58d9d7cd41c28c1688b9cf694072dde9", "sha256": "f9f6e98d79c779a91b3fe8649a65bdb99d54642aa4e297bb9bb05a11095f04ab" }, "downloads": -1, "filename": "scrapy-selenium-mark-0.0.9.tar.gz", "has_sig": false, "md5_digest": "58d9d7cd41c28c1688b9cf694072dde9", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5545, "upload_time": "2019-04-26T15:23:58", "url": "https://files.pythonhosted.org/packages/1b/3e/df3e710ad8d95295a92cc327e74630175ee3c0a355f9d0649f15a620e7d9/scrapy-selenium-mark-0.0.9.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "6589ea38e863fc139b139e89868a8973", "sha256": "69d560214b2f9b5c4d73fe673ee8fe2f8e4167f0b0b5fd777a2a99e3b4c13afe" }, "downloads": -1, "filename": "scrapy-selenium-mark-0.0.10.tar.gz", "has_sig": false, "md5_digest": "6589ea38e863fc139b139e89868a8973", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5548, "upload_time": "2019-04-28T05:49:55", "url": "https://files.pythonhosted.org/packages/74/c3/a6f72289ffc1a198404699ceeb9963f83033f88653ecf01c02f7c0404a95/scrapy-selenium-mark-0.0.10.tar.gz" } ] }