{ "info": { "author": "", "author_email": "", "bugtrack_url": null, "classifiers": [], "description": "# Scrapy with selenium\nScrapy middleware to handle javascript pages using selenium.\n\n## Installation\n```\n$ pip install scrapy_selenium_python_pi\n```\nYou should use **python>=3.5**. \nYou will also need one of the Selenium [compatible browsers](http://www.seleniumhq.org/about/platforms.jsp).\n\n## Configuration\n1. Add the browser to use, the path to the executable, and the arguments to pass to the executable to the scrapy settings:\n ```python\n from shutil import which\n\n SELENIUM_DRIVER_NAME='firefox'\n SELENIUM_DRIVER_EXECUTABLE_PATH=which('geckodriver')\n SELENIUM_DRIVER_ARGUMENTS=['-headless'] # '--headless' if using chrome instead of firefox\n ```\n\n2. Add the `SeleniumMiddleware` to the downloader middlewares:\n ```python\n DOWNLOADER_MIDDLEWARES = {\n 'scrapy_selenium_python_pi.SeleniumMiddleware': 800\n }\n ```\n## Usage\nUse the `scrapy_selenium_python_pi.SeleniumRequest` instead of the scrapy built-in `Request` like below:\n```python\nfrom scrapy_selenium_python_pi import SeleniumRequest\n\nyield SeleniumRequest(url, self.parse_result)\n```\nThe request will be handled by selenium, and the response will have an additional `meta` key, named `driver` containing the selenium driver with the request processed.\n```python\ndef parse_result(self, response):\n print(response.meta['driver'].title)\n```\nFor more information about the available driver methods and attributes, refer to the [selenium python documentation](http://selenium-python.readthedocs.io/api.html#module-selenium.webdriver.remote.webdriver)\n\nThe `selector` response attribute work as usual (but contains the html processed by the selenium driver).\n```python\ndef parse_result(self, response):\n print(response.selector.xpath('//title/@text'))\n```\n\n### Additional arguments\nThe `scrapy_selenium_python_pi.SeleniumRequest` accept 3 additional arguments:\n\n#### `wait_time` / `wait_until`\n\nWhen used, selenium will perform an [Explicit wait](http://selenium-python.readthedocs.io/waits.html#explicit-waits) before returning the response to the spider.\n```python\nfrom selenium.webdriver.common.by import By\nfrom selenium.webdriver.support import expected_conditions as EC\n\nyield SeleniumRequest(\n url,\n self.parse_result,\n wait_time=10,\n wait_until=EC.element_to_be_clickable((By.ID, 'someid'))\n)\n```\n\n#### `screenshot`\nWhen used, selenium will take a screenshot of the page and the binary data of the .png captured will be added to the response `meta`:\n```python\nyield SeleniumRequest(\n url,\n self.parse_result,\n screenshot=True\n)\n\ndef parse_result(self, response):\n with open('image.png', 'wb') as image_file:\n image_file.write(response.meta['screenshot])\n```\n\n\n\n", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/calincc/scrapy_selenium_python_pi", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "scrapy-selenium-python-pi", "package_url": "https://pypi.org/project/scrapy-selenium-python-pi/", "platform": "", "project_url": "https://pypi.org/project/scrapy-selenium-python-pi/", "project_urls": { "Homepage": "https://github.com/calincc/scrapy_selenium_python_pi" }, "release_url": "https://pypi.org/project/scrapy-selenium-python-pi/0.2.0/", "requires_dist": [ "scrapy (>=1.0.0)", "selenium (>=3.9.0)" ], "requires_python": "", "summary": "Scrapy with selenium", "version": "0.2.0" }, "last_serial": 4579828, "releases": { "0.2.0": [ { "comment_text": "", "digests": { "md5": "3e118354e38fd8be92e70a4103d3fc50", "sha256": "3302e38c6e171fcc2d7f7f15a022b6a7c6b968dc12b5880a3de929c9be03f8d8" }, "downloads": -1, "filename": "scrapy_selenium_python_pi-0.2.0-py3-none-any.whl", "has_sig": false, "md5_digest": "3e118354e38fd8be92e70a4103d3fc50", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 8331, "upload_time": "2018-12-10T08:45:00", "url": "https://files.pythonhosted.org/packages/24/71/00c7610c73772243092ad2ab327bed6f7ff604044a7345ffe001f1a51873/scrapy_selenium_python_pi-0.2.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "e851e30109ee9621a369810fa164df15", "sha256": "f7709b32fcac3c5ea4c1a867f61e0fc3252015294bd9c6ea100cdb94b6268559" }, "downloads": -1, "filename": "scrapy_selenium_python_pi-0.2.0.tar.gz", "has_sig": false, "md5_digest": "e851e30109ee9621a369810fa164df15", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5010, "upload_time": "2018-12-10T08:45:02", "url": "https://files.pythonhosted.org/packages/89/05/672128e1802f7297d157bfdd4ffcf3578b6760d2a5f90c93eba8faafe60f/scrapy_selenium_python_pi-0.2.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "3e118354e38fd8be92e70a4103d3fc50", "sha256": "3302e38c6e171fcc2d7f7f15a022b6a7c6b968dc12b5880a3de929c9be03f8d8" }, "downloads": -1, "filename": "scrapy_selenium_python_pi-0.2.0-py3-none-any.whl", "has_sig": false, "md5_digest": "3e118354e38fd8be92e70a4103d3fc50", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 8331, "upload_time": "2018-12-10T08:45:00", "url": "https://files.pythonhosted.org/packages/24/71/00c7610c73772243092ad2ab327bed6f7ff604044a7345ffe001f1a51873/scrapy_selenium_python_pi-0.2.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "e851e30109ee9621a369810fa164df15", "sha256": "f7709b32fcac3c5ea4c1a867f61e0fc3252015294bd9c6ea100cdb94b6268559" }, "downloads": -1, "filename": "scrapy_selenium_python_pi-0.2.0.tar.gz", "has_sig": false, "md5_digest": "e851e30109ee9621a369810fa164df15", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5010, "upload_time": "2018-12-10T08:45:02", "url": "https://files.pythonhosted.org/packages/89/05/672128e1802f7297d157bfdd4ffcf3578b6760d2a5f90c93eba8faafe60f/scrapy_selenium_python_pi-0.2.0.tar.gz" } ] }