{ "info": { "author": "Mikhail Korobov", "author_email": "kmike84@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Framework :: Scrapy", "Intended Audience :: Developers", "License :: OSI Approved :: BSD License", "Operating System :: OS Independent", "Programming Language :: Python", "Programming Language :: Python :: 2", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Topic :: Internet :: WWW/HTTP", "Topic :: Software Development :: Libraries :: Application Frameworks", "Topic :: Software Development :: Libraries :: Python Modules" ], "description": "==============================================\nScrapy & JavaScript integration through Splash\n==============================================\n\n.. image:: https://img.shields.io/pypi/v/scrapy-splash.svg\n :target: https://pypi.python.org/pypi/scrapy-splash\n :alt: PyPI Version\n\n.. image:: https://travis-ci.org/scrapy-plugins/scrapy-splash.svg?branch=master\n :target: http://travis-ci.org/scrapy-plugins/scrapy-splash\n :alt: Build Status\n\n.. image:: http://codecov.io/github/scrapy-plugins/scrapy-splash/coverage.svg?branch=master\n :target: http://codecov.io/github/scrapy-plugins/scrapy-splash?branch=master\n :alt: Code Coverage\n\nThis library provides Scrapy_ and JavaScript integration using Splash_.\nThe license is BSD 3-clause.\n\n.. _Scrapy: https://github.com/scrapy/scrapy\n.. _Splash: https://github.com/scrapinghub/splash\n\nInstallation\n============\n\nInstall scrapy-splash using pip::\n\n $ pip install scrapy-splash\n\nScrapy-Splash uses Splash_ HTTP API, so you also need a Splash instance.\nUsually to install & run Splash, something like this is enough::\n\n $ docker run -p 8050:8050 scrapinghub/splash\n\nCheck Splash `install docs`_ for more info.\n\n.. 
_install docs: http://splash.readthedocs.org/en/latest/install.html\n\n\nConfiguration\n=============\n\n1. Add the Splash server address to ``settings.py`` of your Scrapy project\n like this::\n\n SPLASH_URL = 'http://192.168.59.103:8050'\n\n2. Enable the Splash middleware by adding it to ``DOWNLOADER_MIDDLEWARES``\n in your ``settings.py`` file and changing HttpCompressionMiddleware\n priority::\n\n DOWNLOADER_MIDDLEWARES = {\n 'scrapy_splash.SplashCookiesMiddleware': 723,\n 'scrapy_splash.SplashMiddleware': 725,\n 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,\n }\n\n Order `723` is just before `HttpProxyMiddleware` (750) in default\n scrapy settings.\n\n HttpCompressionMiddleware priority should be changed in order to allow\n advanced response processing; see https://github.com/scrapy/scrapy/issues/1895\n for details.\n\n3. Enable ``SplashDeduplicateArgsMiddleware`` by adding it to\n ``SPIDER_MIDDLEWARES`` in your ``settings.py``::\n\n SPIDER_MIDDLEWARES = {\n 'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,\n }\n\n This middleware is needed to support the ``cache_args`` feature; it saves\n disk space by not storing duplicate Splash arguments multiple\n times in a disk request queue. If Splash 2.1+ is used, the middleware\n also saves network traffic by not sending these duplicate\n arguments to the Splash server multiple times.\n\n4. Set a custom ``DUPEFILTER_CLASS``::\n\n DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'\n\n5. If you use Scrapy HTTP cache then a custom cache storage backend\n is required. scrapy-splash provides a subclass of\n ``scrapy.contrib.httpcache.FilesystemCacheStorage``::\n\n HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'\n\n If you use another cache storage then it is necessary to subclass it and\n replace all ``scrapy.utils.request.request_fingerprint`` calls with\n ``scrapy_splash.splash_request_fingerprint``.\n\n.. 
note::\n\n Steps (4) and (5) are necessary because Scrapy doesn't provide a way\n to override the request fingerprint calculation algorithm globally; this\n could change in the future.\n\n\nThere are also some additional options available.\nPut them into your ``settings.py`` if you want to change the defaults:\n\n* ``SPLASH_COOKIES_DEBUG`` is ``False`` by default.\n Set to ``True`` to enable debugging cookies in the ``SplashCookiesMiddleware``.\n This option is similar to ``COOKIES_DEBUG``\n for the built-in Scrapy cookies middleware: it logs sent and received cookies\n for all requests.\n* ``SPLASH_LOG_400`` is ``True`` by default - it enables logging of all 400 errors\n from Splash. They are important because they show errors that occurred\n when executing the Splash script. Set it to ``False`` to disable this logging.\n* ``SPLASH_SLOT_POLICY`` is ``scrapy_splash.SlotPolicy.PER_DOMAIN`` by default.\n It specifies how concurrency & politeness are maintained for Splash requests,\n and specifies the default value for the ``slot_policy`` argument of\n ``SplashRequest``, which is described below.\n\n\nUsage\n=====\n\nRequests\n--------\n\nThe easiest way to render requests with Splash is to\nuse ``scrapy_splash.SplashRequest``::\n\n yield SplashRequest(url, self.parse_result,\n args={\n # optional; parameters passed to Splash HTTP API\n 'wait': 0.5,\n\n # 'url' is prefilled from request url\n # 'http_method' is set to 'POST' for POST requests\n # 'body' is set to request body for POST requests\n },\n endpoint='render.json', # optional; default is render.html\n splash_url='', # optional; overrides SPLASH_URL\n slot_policy=scrapy_splash.SlotPolicy.PER_DOMAIN, # optional\n )\n\nAlternatively, you can use regular scrapy.Request and\n``'splash'`` Request `meta` key::\n\n yield scrapy.Request(url, self.parse_result, meta={\n 'splash': {\n 'args': {\n # set rendering arguments here\n 'html': 1,\n 'png': 1,\n\n # 'url' is prefilled from request url\n # 'http_method' is set to 'POST' for POST 
requests\n # 'body' is set to request body for POST requests\n },\n\n # optional parameters\n 'endpoint': 'render.json', # optional; default is render.json\n 'splash_url': '', # optional; overrides SPLASH_URL\n 'slot_policy': scrapy_splash.SlotPolicy.PER_DOMAIN,\n 'splash_headers': {}, # optional; a dict with headers sent to Splash\n 'dont_process_response': True, # optional, default is False\n 'dont_send_headers': True, # optional, default is False\n 'magic_response': False, # optional, default is True\n }\n })\n\nUse ``request.meta['splash']`` API in middlewares or when scrapy.Request\nsubclasses are used (there is also ``SplashFormRequest`` described below).\nFor example, ``meta['splash']`` lets you create a middleware which enables\nSplash for all outgoing requests by default.\n\n``SplashRequest`` is a convenient utility to fill ``request.meta['splash']``;\nit should be easier to use in most cases. For each ``request.meta['splash']``\nkey there is a corresponding ``SplashRequest`` keyword argument: for example,\nto set ``meta['splash']['args']`` use ``SplashRequest(..., args=myargs)``.\n\n* ``meta['splash']['args']`` contains arguments sent to Splash.\n scrapy-splash adds some default keys/values to ``args``:\n\n * 'url' is set to request.url;\n * 'http_method' is set to 'POST' for POST requests;\n * 'body' is set to request.body for POST requests.\n\n You can override default values by setting them explicitly.\n\n Note that by default Scrapy escapes URL fragments using the AJAX escaping scheme.\n If you want to pass a URL with a fragment to Splash then set ``url``\n in the ``args`` dict manually. This is handled automatically if you use\n ``SplashRequest``, but you need to keep that in mind if you use the raw\n ``meta['splash']`` API.\n\n Splash 1.8+ is required to handle POST requests; in earlier Splash versions\n 'http_method' and 'body' arguments are ignored. 
If you work with the ``/execute``\n endpoint and want to support POST requests you have to handle\n ``http_method`` and ``body`` arguments in your Lua script manually.\n\n* ``meta['splash']['cache_args']`` is a list of argument names to cache\n on the Splash side. These arguments are sent to Splash only once, then cached\n values are used; this saves network traffic and decreases request\n queue disk memory usage. Use ``cache_args`` only for large arguments\n which don't change with each request; ``lua_source`` is a good candidate\n (if you don't use string formatting to build it). Splash 2.1+ is required\n for this feature to work.\n\n* ``meta['splash']['endpoint']`` is the Splash endpoint to use.\n In case of SplashRequest\n `render.html `_\n is used by default. If you're using raw scrapy.Request then\n `render.json `_\n is a default (for historical reasons). It is better to always pass endpoint\n explicitly.\n\n See Splash `HTTP API docs`_ for a full list of available endpoints\n and parameters.\n\n.. _HTTP API docs: http://splash.readthedocs.org/en/latest/api.html\n\n* ``meta['splash']['splash_url']`` overrides the Splash URL set\n in ``settings.py``.\n\n* ``meta['splash']['splash_headers']`` lets you add or change headers\n which are sent to the Splash server. Note that this option **is not** for\n setting headers which are sent to the remote website.\n\n* ``meta['splash']['slot_policy']`` customizes how\n concurrency & politeness are maintained for Splash requests.\n\n Currently there are 3 policies available:\n\n 1. ``scrapy_splash.SlotPolicy.PER_DOMAIN`` (default) - send Splash requests to\n downloader slots based on URL being rendered. It is useful if you want\n to maintain per-domain politeness & concurrency settings.\n\n 2. ``scrapy_splash.SlotPolicy.SINGLE_SLOT`` - send all Splash requests to\n a single downloader slot. It is useful if you want to throttle requests\n to Splash.\n\n 3. 
``scrapy_splash.SlotPolicy.SCRAPY_DEFAULT`` - don't do anything with slots.\n It is similar to ``SINGLE_SLOT`` policy, but can be different if you access\n other services on the same address as Splash.\n\n* ``meta['splash']['dont_process_response']`` - when set to True,\n SplashMiddleware won't change the response to a custom scrapy.Response\n subclass. By default for Splash requests one of SplashResponse,\n SplashTextResponse or SplashJsonResponse is passed to the callback.\n\n* ``meta['splash']['dont_send_headers']``: by default scrapy-splash passes\n request headers to Splash in 'headers' JSON POST field. For all render.xxx\n endpoints it means Scrapy header options are respected by default\n (http://splash.readthedocs.org/en/stable/api.html#arg-headers). In Lua\n scripts you can use ``headers`` argument of ``splash:go`` to apply the\n passed headers: ``splash:go{url, headers=splash.args.headers}``.\n\n Set 'dont_send_headers' to True if you don't want to pass ``headers``\n to Splash.\n\n* ``meta['splash']['http_status_from_error_code']`` - set response.status\n to HTTP error code when ``assert(splash:go(..))`` fails; it requires\n ``meta['splash']['magic_response']=True``. 
``http_status_from_error_code``\n option is False by default if you use raw meta API;\n SplashRequest sets it to True by default.\n\n* ``meta['splash']['magic_response']`` - when set to True and a JSON\n response is received from Splash, several attributes of the response\n (headers, body, url, status code) are filled using data returned in JSON:\n\n * response.headers are filled from 'headers' keys;\n * response.url is set to the value of 'url' key;\n * response.body is set to the value of 'html' key,\n or to base64-decoded value of 'body' key;\n * response.status is set to the value of 'http_status' key.\n When ``meta['splash']['http_status_from_error_code']`` is True\n and ``assert(splash:go(..))`` fails with an HTTP error\n response.status is also set to HTTP error code.\n\n This option is set to True by default if you use SplashRequest.\n ``render.json`` and ``execute`` endpoints may not have all the necessary\n keys/values in the response.\n For non-JSON endpoints, only url is filled, regardless of the\n ``magic_response`` setting.\n\n\nUse ``scrapy_splash.SplashFormRequest`` if you want to make a ``FormRequest``\nvia splash. It accepts the same arguments as ``SplashRequest``,\nand also ``formdata``, like ``FormRequest`` from scrapy::\n\n >>> SplashFormRequest('http://example.com', formdata={'foo': 'bar'})\n \n\n``SplashFormRequest.from_response`` is also supported, and works as described\nin `scrapy documentation `_.\n\nResponses\n---------\n\nscrapy-splash returns Response subclasses for Splash requests:\n\n* SplashResponse is returned for binary Splash responses - e.g. for\n /render.png responses;\n* SplashTextResponse is returned when the result is text - e.g. 
for\n /render.html responses;\n* SplashJsonResponse is returned when the result is a JSON object - e.g.\n for /render.json responses or /execute responses when script returns\n a Lua table.\n\nTo use standard Response classes set ``meta['splash']['dont_process_response']=True``\nor pass ``dont_process_response=True`` argument to SplashRequest.\n\nAll these responses set ``response.url`` to the URL of the original request\n(i.e. to the URL of a website you want to render), not to the URL of the\nrequested Splash endpoint. \"True\" URL is still available as\n``response.real_url``.\n\nSplashJsonResponse provides extra features:\n\n* ``response.data`` attribute contains response data decoded from JSON;\n you can access it like ``response.data['html']``.\n\n* If Splash session handling is configured, you can access current cookies\n as ``response.cookiejar``; it is a CookieJar instance.\n\n* If Scrapy-Splash response magic is enabled in request (default),\n several response attributes (headers, body, url, status code)\n are set automatically from original response body:\n\n * response.headers are filled from 'headers' keys;\n * response.url is set to the value of 'url' key;\n * response.body is set to the value of 'html' key,\n or to base64-decoded value of 'body' key;\n * response.status is set from the value of 'http_status' key.\n\nWhen ``response.body`` is updated in SplashJsonResponse\n(either from 'html' or from 'body' keys) familiar ``response.css``\nand ``response.xpath`` methods are available.\n\nTo turn off special handling of JSON result keys either set\n``meta['splash']['magic_response']=False`` or pass ``magic_response=False``\nargument to SplashRequest.\n\nSession Handling\n================\n\nSplash itself is stateless - each request starts from a clean state.\nIn order to support sessions the following is required:\n\n1. client (Scrapy) must send current cookies to Splash;\n2. 
Splash script should make requests using these cookies and update\n them from HTTP response headers or JavaScript code;\n3. updated cookies should be sent back to the client;\n4. client should merge current cookies with the updated cookies.\n\nFor (2) and (3) Splash provides ``splash:get_cookies()`` and\n``splash:init_cookies()`` methods which can be used in Splash Lua scripts.\n\nscrapy-splash provides helpers for (1) and (4): to send current cookies\nin 'cookies' field and merge cookies back from 'cookies' response field\nset ``request.meta['splash']['session_id']`` to the session\nidentifier. If you only want a single session use the same ``session_id`` for\nall requests; any value like '1' or 'foo' is fine.\n\nFor scrapy-splash session handling to work you must use ``/execute`` endpoint\nand a Lua script which accepts 'cookies' argument and returns 'cookies'\nfield in the result::\n\n function main(splash)\n splash:init_cookies(splash.args.cookies)\n\n -- ... your script\n\n return {\n cookies = splash:get_cookies(),\n -- ... other results, e.g. html\n }\n end\n\nSplashRequest sets ``session_id`` automatically for ``/execute`` endpoint,\ni.e. cookie handling is enabled by default if you use SplashRequest,\n``/execute`` endpoint and a compatible Lua rendering script.\n\nIf you want to start from the same set of cookies, but then 'fork' sessions\nset ``request.meta['splash']['new_session_id']`` in addition to\n``session_id``. 
Request cookies will be fetched from cookiejar ``session_id``,\nbut response cookies will be merged back to the ``new_session_id`` cookiejar.\n\nStandard Scrapy ``cookies`` argument can be used with ``SplashRequest``\nto add cookies to the current Splash cookiejar.\n\nExamples\n========\n\nGet HTML contents::\n\n import scrapy\n from scrapy_splash import SplashRequest\n\n class MySpider(scrapy.Spider):\n start_urls = [\"http://example.com\", \"http://example.com/foo\"]\n\n def start_requests(self):\n for url in self.start_urls:\n yield SplashRequest(url, self.parse, args={'wait': 0.5})\n\n def parse(self, response):\n # response.body is a result of render.html call; it\n # contains HTML processed by a browser.\n # ...\n\nGet HTML contents and a screenshot::\n\n import json\n import base64\n import scrapy\n from scrapy_splash import SplashRequest\n\n class MySpider(scrapy.Spider):\n\n # ...\n splash_args = {\n 'html': 1,\n 'png': 1,\n 'width': 600,\n 'render_all': 1,\n }\n yield SplashRequest(url, self.parse_result, endpoint='render.json',\n args=splash_args)\n\n # ...\n def parse_result(self, response):\n # magic responses are turned ON by default,\n # so the result under 'html' key is available as response.body\n html = response.body\n\n # you can also query the html result as usual\n title = response.css('title').extract_first()\n\n # full decoded JSON data is available as response.data:\n png_bytes = base64.b64decode(response.data['png'])\n\n # ...\n\nRun a simple `Splash Lua Script`_::\n\n import json\n import base64\n from scrapy_splash import SplashRequest\n\n\n class MySpider(scrapy.Spider):\n\n # ...\n script = \"\"\"\n function main(splash)\n assert(splash:go(splash.args.url))\n return splash:evaljs(\"document.title\")\n end\n \"\"\"\n yield SplashRequest(url, self.parse_result, endpoint='execute',\n args={'lua_source': script})\n\n # ...\n def parse_result(self, response):\n doc_title = response.body_as_unicode()\n # ...\n\n\nMore complex `Splash Lua 
Script`_ example - get a screenshot of an HTML\nelement by its CSS selector (it requires Splash 2.1+).\nNote how arguments are passed to the script::\n\n import json\n import base64\n from scrapy_splash import SplashRequest\n\n script = \"\"\"\n -- Arguments:\n -- * url - URL to render;\n -- * css - CSS selector to render;\n -- * pad - screenshot padding size.\n\n -- this function adds padding around region\n function pad(r, pad)\n return {r[1]-pad, r[2]-pad, r[3]+pad, r[4]+pad}\n end\n\n -- main script\n function main(splash)\n\n -- this function returns element bounding box\n local get_bbox = splash:jsfunc([[\n function(css) {\n var el = document.querySelector(css);\n var r = el.getBoundingClientRect();\n return [r.left, r.top, r.right, r.bottom];\n }\n ]])\n\n assert(splash:go(splash.args.url))\n assert(splash:wait(0.5))\n\n -- don't crop image by a viewport\n splash:set_viewport_full()\n\n local region = pad(get_bbox(splash.args.css), splash.args.pad)\n return splash:png{region=region}\n end\n \"\"\"\n\n class MySpider(scrapy.Spider):\n\n\n # ...\n yield SplashRequest(url, self.parse_element_screenshot,\n endpoint='execute',\n args={\n 'lua_source': script,\n 'pad': 32,\n 'css': 'a.title'\n }\n )\n\n # ...\n def parse_element_screenshot(self, response):\n image_data = response.body # binary image data in PNG format\n # ...\n\n\nUse a Lua script to get an HTML response with cookies, headers, body\nand method set to correct values; ``lua_source`` argument value is cached\non Splash server and is not sent with each request (it requires Splash 2.1+)::\n\n import scrapy\n from scrapy_splash import SplashRequest\n\n script = \"\"\"\n function main(splash)\n splash:init_cookies(splash.args.cookies)\n assert(splash:go{\n splash.args.url,\n headers=splash.args.headers,\n http_method=splash.args.http_method,\n body=splash.args.body,\n })\n assert(splash:wait(0.5))\n\n local entries = splash:history()\n local last_response = entries[#entries].response\n return {\n url = 
splash:url(),\n headers = last_response.headers,\n http_status = last_response.status,\n cookies = splash:get_cookies(),\n html = splash:html(),\n }\n end\n \"\"\"\n\n class MySpider(scrapy.Spider):\n\n\n # ...\n yield SplashRequest(url, self.parse_result,\n endpoint='execute',\n cache_args=['lua_source'],\n args={'lua_source': script},\n headers={'X-My-Header': 'value'},\n )\n\n def parse_result(self, response):\n # here response.body contains result HTML;\n # response.headers are filled with headers from last\n # web page loaded to Splash;\n # cookies from all responses and from JavaScript are collected\n # and put into Set-Cookie response header, so that Scrapy\n # can remember them.\n\n\n\n.. _Splash Lua Script: http://splash.readthedocs.org/en/latest/scripting-tutorial.html\n\n\nHTTP Basic Auth\n===============\n\nIf you need HTTP Basic Authentication to access Splash, use\nScrapy's HttpAuthMiddleware_.\n\nAnother option is ``meta['splash']['splash_headers']``: it allows to set\ncustom headers which are sent to Splash server; add Authorization header\nto ``splash_headers`` if HttpAuthMiddleware doesn't fit for some reason.\n\n.. _HttpAuthMiddleware: http://doc.scrapy.org/en/latest/topics/downloader-middleware.html#module-scrapy.downloadermiddlewares.httpauth\n\nWhy not use the Splash HTTP API directly?\n=========================================\n\nThe obvious alternative to scrapy-splash would be to send requests directly\nto the Splash `HTTP API`_. 
Take a look at the example below and make\nsure to read the observations after it::\n\n import json\n\n import scrapy\n from scrapy.http.headers import Headers\n\n RENDER_HTML_URL = \"http://127.0.0.1:8050/render.html\"\n\n class MySpider(scrapy.Spider):\n start_urls = [\"http://example.com\", \"http://example.com/foo\"]\n\n def start_requests(self):\n for url in self.start_urls:\n body = json.dumps({\"url\": url, \"wait\": 0.5}, sort_keys=True)\n headers = Headers({'Content-Type': 'application/json'})\n yield scrapy.Request(RENDER_HTML_URL, self.parse, method=\"POST\",\n body=body, headers=headers)\n\n def parse(self, response):\n # response.body is a result of render.html call; it\n # contains HTML processed by a browser.\n # ...\n\n\nIt works and is easy enough, but there are some issues that you should be\naware of:\n\n1. There is a bit of boilerplate.\n\n2. As seen by Scrapy, we're sending requests to ``RENDER_HTML_URL`` instead\n of the target URLs. It affects concurrency and politeness settings:\n ``CONCURRENT_REQUESTS_PER_DOMAIN``, ``DOWNLOAD_DELAY``, etc. could behave\n in unexpected ways since delays and concurrency settings are no longer\n per-domain.\n\n3. As seen by Scrapy, response.url is a URL of the Splash server.\n scrapy-splash fixes it to be a URL of a requested page.\n \"Real\" URL is still available as ``response.real_url``.\n\n4. Some options depend on each other - for example, if you use timeout_\n Splash option then you may want to set ``download_timeout``\n scrapy.Request meta key as well.\n\n5. It is easy to get it subtly wrong - e.g. if you don't use the\n ``sort_keys=True`` argument when preparing JSON body then binary POST body\n content could vary even if all keys and values are the same, and it means\n dupefilter and cache will work incorrectly.\n\n6. Default Scrapy duplication filter doesn't take Splash specifics into\n account. 
For example, if a URL is sent in a JSON POST request body\n Scrapy will compute request fingerprint without canonicalizing this URL.\n\n7. Splash Bad Request (HTTP 400) errors are hard to debug because by default\n response content is not displayed by Scrapy. SplashMiddleware logs content\n of HTTP 400 Splash responses by default (it can be turned off by setting\n ``SPLASH_LOG_400 = False`` option).\n\n8. Cookie handling is tedious to implement, and you can't use Scrapy\n built-in Cookie middleware to handle cookies when working with Splash.\n\n9. Large Splash arguments which don't change with every request\n (e.g. ``lua_source``) may take a lot of space when saved to Scrapy disk\n request queues. ``scrapy-splash`` provides a way to store such static\n parameters only once.\n\n10. Splash 2.1+ provides a way to save network traffic by caching large\n static arguments on server, but it requires client support: client should\n send proper ``save_args`` and ``load_args`` values and handle HTTP 498\n responses.\n\nscrapy-splash utilities handle such edge cases and reduce\nthe boilerplate.\n\n.. _HTTP API: http://splash.readthedocs.org/en/latest/api.html\n.. 
_timeout: http://splash.readthedocs.org/en/latest/api.html#arg-timeout\n\n\nContributing\n============\n\nSource code and bug tracker are on github:\nhttps://github.com/scrapy-plugins/scrapy-splash\n\nTo run tests, install \"tox\" Python package and then run ``tox`` command\nfrom the source checkout.\n\n\nChanges\n=======\n\n0.7.2 (2017-03-30)\n------------------\n\n* fixed issue with response type detection.\n\n0.7.1 (2016-12-20)\n------------------\n\n* Scrapy 1.0.x support is back;\n* README updates.\n\n0.7 (2016-05-16)\n----------------\n\n* ``SPLASH_COOKIES_DEBUG`` setting enables logging of cookies\n sent and received to/from Splash in ``cookies`` request/response fields.\n It is similar to Scrapy's builtin ``COOKIES_DEBUG``, but works for\n Splash requests;\n* README cleanup.\n\n0.6.1 (2016-04-29)\n------------------\n\n* Warning about HTTP methods is no longer logged for non-Splash requests.\n\n0.6 (2016-04-20)\n----------------\n\n* ``SplashAwareDupeFilter`` and ``splash_request_fingerprint`` are improved:\n they now canonicalize URLs and take URL fragments into account;\n* ``cache_args`` value fingerprints are now calculated faster.\n\n0.5 (2016-04-18)\n----------------\n\n* ``cache_args`` SplashRequest argument and\n ``request.meta['splash']['cache_args']`` key save network traffic\n and disk storage by not storing duplicate Splash arguments in disk request\n queues and not sending them to Splash multiple times. 
This feature requires\n Splash 2.1+.\n\nTo upgrade from v0.4 enable ``SplashDeduplicateArgsMiddleware`` in settings.py::\n\n SPIDER_MIDDLEWARES = {\n 'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,\n }\n\n0.4 (2016-04-14)\n----------------\n\n* SplashFormRequest class is added; it is a variant of FormRequest which uses\n Splash;\n* Splash parameters are no longer stored in request.meta twice; this change\n should decrease disk queues data size;\n* SplashMiddleware now increases request priority when rescheduling the request;\n this should decrease disk queue data size and help with stale cookie\n problems.\n\n0.3 (2016-04-11)\n----------------\n\nPackage is renamed from ``scrapyjs`` to ``scrapy-splash``.\n\nThe easiest way to upgrade is to replace ``scrapyjs`` imports with\n``scrapy_splash`` and update ``settings.py`` with new defaults\n(check the README).\n\nThere are many new helpers to handle JavaScript rendering transparently;\nthe recommended way is now to use ``scrapy_splash.SplashRequest`` instead\nof ``request.meta['splash']``. Please make sure to read the README if\nyou're upgrading from scrapyjs - you may be able to drop some code from your\nproject, especially if you want to access response html, handle cookies\nand headers.\n\n* new SplashRequest class; it can be used as a replacement for scrapy.Request\n to provide a better integration with Splash;\n* added support for POST requests;\n* SplashResponse, SplashTextResponse and SplashJsonResponse\n handle Splash responses transparently, taking care of response.url,\n response.body, response.headers and response.status. 
SplashJsonResponse\n allows to access decoded response JSON data as ``response.data``.\n* cookie handling improvements: it is possible to handle Scrapy and Splash\n cookies transparently; current cookiejar is exposed as response.cookiejar;\n* headers are passed to Splash by default;\n* URLs with fragments are handled automatically when using SplashRequest;\n* logging is improved: ``SplashRequest.__repr__`` shows both requested URL\n and Splash URL;\n* in case of Splash HTTP 400 errors the response is logged by default;\n* an issue with dupefilters is fixed: previously the order of keys in\n JSON request body could vary, making requests appear as non-duplicates;\n* it is now possible to pass custom headers to Splash server itself;\n* test coverage reports are enabled.\n\n0.2 (2016-03-26)\n----------------\n\n* Scrapy 1.0 and 1.1 support;\n* Python 3 support;\n* documentation improvements;\n* project is moved to https://github.com/scrapy-plugins/scrapy-splash.\n\n0.1.1 (2015-03-16)\n------------------\n\nFixed fingerprint calculation for non-string meta values.\n\n0.1 (2015-02-28)\n----------------\n\nInitial release\n", "description_content_type": null, "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/scrapy-plugins/scrapy-splash", "keywords": "", "license": "BSD", "maintainer": "", "maintainer_email": "", "name": "scrapy-splash", "package_url": "https://pypi.org/project/scrapy-splash/", "platform": "", "project_url": "https://pypi.org/project/scrapy-splash/", "project_urls": { "Homepage": "https://github.com/scrapy-plugins/scrapy-splash" }, "release_url": "https://pypi.org/project/scrapy-splash/0.7.2/", "requires_dist": null, "requires_python": "", "summary": "JavaScript support for Scrapy using Splash", "version": "0.7.2" }, "last_serial": 4308913, "releases": { "0.2": [], "0.3": [ { "comment_text": "", "digests": { "md5": "48b18504fce89687cda012ca4d8b049f", "sha256": 
"628a77562c5254bac5c01f3656ab8e52dd325a13246bdac776fbf53212ea3d1d" }, "downloads": -1, "filename": "scrapy_splash-0.3-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "48b18504fce89687cda012ca4d8b049f", "packagetype": "bdist_wheel", "python_version": "3.5", "requires_python": null, "size": 39099, "upload_time": "2016-04-11T16:45:03", "url": "https://files.pythonhosted.org/packages/9b/0c/2a0d39260dbd870c4461d424624714b35eec3a2d177df2c8fc6c4e7e6958/scrapy_splash-0.3-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "e28368f31eee26fba87ba71b0a9bdd5f", "sha256": "309059c194b6d37ac9e62ff53d3c3f4fa97559c213d661d781b95e7a14e9fe40" }, "downloads": -1, "filename": "scrapy-splash-0.3.tar.gz", "has_sig": false, "md5_digest": "e28368f31eee26fba87ba71b0a9bdd5f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 33538, "upload_time": "2016-04-11T16:44:45", "url": "https://files.pythonhosted.org/packages/73/40/77f3ac9d32d3133d0d9354bd99291f5819af9c8bdf47b31b93e90d2f6a70/scrapy-splash-0.3.tar.gz" } ], "0.4": [ { "comment_text": "", "digests": { "md5": "f31dab288d9d56dae2e87b8ec580de2b", "sha256": "8e7ef09c8d9de5426063d07be934b080869264bf561d088b844b3d2b9b584491" }, "downloads": -1, "filename": "scrapy_splash-0.4-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "f31dab288d9d56dae2e87b8ec580de2b", "packagetype": "bdist_wheel", "python_version": "3.5", "requires_python": null, "size": 40307, "upload_time": "2016-04-14T18:02:45", "url": "https://files.pythonhosted.org/packages/52/ed/3a32fea797aab0b79652f8de239af7a28feffe9605747d78bbf643a28b2c/scrapy_splash-0.4-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "ffba28e725162c9c7871b4d8c0fa1b98", "sha256": "f8d52cb8c19e98327834327be92161a5463a9b2b7b9bb5a82b78c9000c91462a" }, "downloads": -1, "filename": "scrapy-splash-0.4.tar.gz", "has_sig": false, "md5_digest": "ffba28e725162c9c7871b4d8c0fa1b98", "packagetype": "sdist", "python_version": "source", 
"requires_python": null, "size": 34988, "upload_time": "2016-04-14T18:01:38", "url": "https://files.pythonhosted.org/packages/e4/22/92203a0f6d860186c943e58411ef239f3de3cf7878b4fe8902ddadf7986a/scrapy-splash-0.4.tar.gz" } ], "0.5": [ { "comment_text": "", "digests": { "md5": "c9da5381e1064861933c0aae1041adf3", "sha256": "2defee778571592d6a7cfdd8e95fd75f5a9c17a42db2a8af08fdbbb23ba155ac" }, "downloads": -1, "filename": "scrapy_splash-0.5-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "c9da5381e1064861933c0aae1041adf3", "packagetype": "bdist_wheel", "python_version": "3.5", "requires_python": null, "size": 43098, "upload_time": "2016-04-18T18:32:51", "url": "https://files.pythonhosted.org/packages/80/46/fd6c527abef5d5968524dbe29d62ae580c5b9c3892af0c8ed708d4dc9e08/scrapy_splash-0.5-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "6d74a386e6c82d40e8bf5363cddfd345", "sha256": "fdf1560bb409bc706e7d39e21691974a0506d6e7b0d2c4dd6882889ac2ef4c78" }, "downloads": -1, "filename": "scrapy-splash-0.5.tar.gz", "has_sig": false, "md5_digest": "6d74a386e6c82d40e8bf5363cddfd345", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 40274, "upload_time": "2016-04-18T18:32:28", "url": "https://files.pythonhosted.org/packages/74/22/567d37efa79ac7603ab5f507f3b31ae099dbb20d2cf5437eaf0ad279d626/scrapy-splash-0.5.tar.gz" } ], "0.6": [ { "comment_text": "", "digests": { "md5": "125c51630df3542f909cefde349a4a23", "sha256": "f0ba729d946bdcbfb1f15c370b121db9781436af64962691ef935b257fb32c13" }, "downloads": -1, "filename": "scrapy_splash-0.6-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "125c51630df3542f909cefde349a4a23", "packagetype": "bdist_wheel", "python_version": "3.5", "requires_python": null, "size": 44273, "upload_time": "2016-04-20T17:30:43", "url": "https://files.pythonhosted.org/packages/6f/79/f083be077be6796acbd0268030e5ec37987df69322928c4c561c1cf30902/scrapy_splash-0.6-py2.py3-none-any.whl" }, { "comment_text": "", 
"digests": { "md5": "2e1807995d66c1678e833a51e0fc1054", "sha256": "df9ba5b357fe17bbf5aec758211327a23397db5e383103b4b043127db837619c" }, "downloads": -1, "filename": "scrapy-splash-0.6.tar.gz", "has_sig": false, "md5_digest": "2e1807995d66c1678e833a51e0fc1054", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 44196, "upload_time": "2016-04-20T17:29:52", "url": "https://files.pythonhosted.org/packages/f8/e5/804061d5a7a1a334b65a86882bd8910d6d7d4aa8d2c7f4a5256fcf598d3c/scrapy-splash-0.6.tar.gz" } ], "0.6.1": [ { "comment_text": "", "digests": { "md5": "6f82818735abb90e29302d2d0ecfeac5", "sha256": "139b901f47163e532f67c079e3e46b46443a0ddddcb686de809f2017f09f768d" }, "downloads": -1, "filename": "scrapy_splash-0.6.1-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "6f82818735abb90e29302d2d0ecfeac5", "packagetype": "bdist_wheel", "python_version": "3.5", "requires_python": null, "size": 44381, "upload_time": "2016-04-29T17:03:11", "url": "https://files.pythonhosted.org/packages/76/d3/a50a7a79a1189c9cdbf017ed737b2b3e2f011bf0f42ee981e5a0e8513d86/scrapy_splash-0.6.1-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "6ee9f01762b75f728f110c787af178c2", "sha256": "2ae17191e91a7f031f5639569794f9a4f8340fc334c95b5a941f324fd1c302c5" }, "downloads": -1, "filename": "scrapy-splash-0.6.1.tar.gz", "has_sig": false, "md5_digest": "6ee9f01762b75f728f110c787af178c2", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 44354, "upload_time": "2016-04-29T17:02:49", "url": "https://files.pythonhosted.org/packages/03/7a/024d1355899337af9bcd9533942405d450b5f1f7761c6e837b7b3d46889a/scrapy-splash-0.6.1.tar.gz" } ], "0.7": [ { "comment_text": "", "digests": { "md5": "784c891ac605df47e301d32340603f71", "sha256": "eae572529f216566bfbb6d4912f5818b3f8f8e4c30508d9ce1e07ecd31e7d79f" }, "downloads": -1, "filename": "scrapy_splash-0.7-py2.py3-none-any.whl", "has_sig": false, "md5_digest": 
"784c891ac605df47e301d32340603f71", "packagetype": "bdist_wheel", "python_version": "3.5", "requires_python": null, "size": 45345, "upload_time": "2016-05-16T10:38:58", "url": "https://files.pythonhosted.org/packages/c9/b0/d8f93666c73a29cb6a52e4a56e2307af2f5b9a478348dbcd5a0bb51ea62f/scrapy_splash-0.7-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "4e27654c602d90ed08d99fef04e68803", "sha256": "7640d762a4eed1bef3efa7079f81eb7c6fc00a15323f85386e47ff86ff197418" }, "downloads": -1, "filename": "scrapy-splash-0.7.tar.gz", "has_sig": false, "md5_digest": "4e27654c602d90ed08d99fef04e68803", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 46464, "upload_time": "2016-05-16T10:38:45", "url": "https://files.pythonhosted.org/packages/aa/fd/6e4d5764fac0acde3b13bd4b11567c23a60005164746a9d06797a719126e/scrapy-splash-0.7.tar.gz" } ], "0.7.1": [ { "comment_text": "", "digests": { "md5": "bb25527bd188a901c681b7ba587c2ab9", "sha256": "98156df086c76322a69ae513e18e221c823db217e412d34b36b17e5394b96ed8" }, "downloads": -1, "filename": "scrapy_splash-0.7.1-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "bb25527bd188a901c681b7ba587c2ab9", "packagetype": "bdist_wheel", "python_version": "3.5", "requires_python": null, "size": 45405, "upload_time": "2016-12-20T16:14:17", "url": "https://files.pythonhosted.org/packages/90/ff/237050b2e2220309323da7e284d426199723ee990ae2221ec6e70fca7b3a/scrapy_splash-0.7.1-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "c3c2cced54ab312fd0d8ee6c80691228", "sha256": "2d781d175c1e9a099ee5351a3d5b33c029b42192db4c4cbcb4d8122f8135173f" }, "downloads": -1, "filename": "scrapy-splash-0.7.1.tar.gz", "has_sig": false, "md5_digest": "c3c2cced54ab312fd0d8ee6c80691228", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 46564, "upload_time": "2016-12-20T16:14:07", "url": 
"https://files.pythonhosted.org/packages/44/7c/7276ea84c748a81437f5385c570ce3ddf92c025c5ceedc13bebd55c0c642/scrapy-splash-0.7.1.tar.gz" } ], "0.7.2": [ { "comment_text": "", "digests": { "md5": "2d78d9de5a6774feda32a4c5850f2610", "sha256": "71ac958370f8732fec746a25a8235b03a4d3c4c93a59be51aa8e910a08cfe511" }, "downloads": -1, "filename": "scrapy_splash-0.7.2-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "2d78d9de5a6774feda32a4c5850f2610", "packagetype": "bdist_wheel", "python_version": "3.5", "requires_python": null, "size": 34554, "upload_time": "2017-03-29T23:13:59", "url": "https://files.pythonhosted.org/packages/64/19/aa6e9559ca16a4daec98f6451748dd1cae9a91e7f43069cc1d294f7576bc/scrapy_splash-0.7.2-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "a745b340006d674f9f6126b3f75853fa", "sha256": "089188b11202813b14b88d3be008641a2b3d36e82fadd4a6c61f0af59b66e7b5" }, "downloads": -1, "filename": "scrapy-splash-0.7.2.tar.gz", "has_sig": false, "md5_digest": "a745b340006d674f9f6126b3f75853fa", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 47156, "upload_time": "2017-03-29T23:13:43", "url": "https://files.pythonhosted.org/packages/cd/cb/9e8ba2530b1c267f3ff7b2f1e32688ec99a08e7989154218b41fa43f7a87/scrapy-splash-0.7.2.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "2d78d9de5a6774feda32a4c5850f2610", "sha256": "71ac958370f8732fec746a25a8235b03a4d3c4c93a59be51aa8e910a08cfe511" }, "downloads": -1, "filename": "scrapy_splash-0.7.2-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "2d78d9de5a6774feda32a4c5850f2610", "packagetype": "bdist_wheel", "python_version": "3.5", "requires_python": null, "size": 34554, "upload_time": "2017-03-29T23:13:59", "url": "https://files.pythonhosted.org/packages/64/19/aa6e9559ca16a4daec98f6451748dd1cae9a91e7f43069cc1d294f7576bc/scrapy_splash-0.7.2-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "a745b340006d674f9f6126b3f75853fa", "sha256": 
"089188b11202813b14b88d3be008641a2b3d36e82fadd4a6c61f0af59b66e7b5" }, "downloads": -1, "filename": "scrapy-splash-0.7.2.tar.gz", "has_sig": false, "md5_digest": "a745b340006d674f9f6126b3f75853fa", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 47156, "upload_time": "2017-03-29T23:13:43", "url": "https://files.pythonhosted.org/packages/cd/cb/9e8ba2530b1c267f3ff7b2f1e32688ec99a08e7989154218b41fa43f7a87/scrapy-splash-0.7.2.tar.gz" } ] }