{
    "info": {
        "author": "Mikhail Korobov",
        "author_email": "kmike84@gmail.com",
        "bugtrack_url": null,
        "classifiers": [
            "Development Status :: 3 - Alpha",
            "Framework :: Scrapy",
            "Intended Audience :: Developers",
            "License :: OSI Approved :: BSD License",
            "Operating System :: OS Independent",
            "Programming Language :: Python",
            "Programming Language :: Python :: 2",
            "Programming Language :: Python :: 2.7",
            "Programming Language :: Python :: 3",
            "Programming Language :: Python :: 3.3",
            "Programming Language :: Python :: 3.4",
            "Programming Language :: Python :: 3.5",
            "Topic :: Internet :: WWW/HTTP",
            "Topic :: Software Development :: Libraries :: Application Frameworks",
            "Topic :: Software Development :: Libraries :: Python Modules"
        ],
        "description": "=========================================================\nScrapyJS - Scrapy & JavaScript integration through Splash\n=========================================================\n\n.. image:: https://travis-ci.org/scrapy-plugins/scrapy-splash.svg?branch=master\n   :target: http://travis-ci.org/scrapy-plugins/scrapy-splash\n\nThis library provides Scrapy_ and JavaScript integration using Splash_.\nThe license is BSD 3-clause.\n\n.. _Scrapy: https://github.com/scrapy/scrapy\n.. _Splash: https://github.com/scrapinghub/splash\n\nInstallation\n============\n\nInstall ScrapyJS using pip::\n\n    $ pip install scrapyjs\n\nScrapyJS uses Splash_ HTTP API, so you also need a Splash instance.\nUsually to install & run Splash, something like this is enough::\n\n    $ docker run -p 8050:8050 scrapinghub/splash\n\nCheck Splash `install docs`_ for more info.\n\n.. _install docs: http://splash.readthedocs.org/en/latest/install.html\n\n\nConfiguration\n=============\n\n1. Add the Splash server address to ``settings.py`` of your Scrapy project like this::\n\n      SPLASH_URL = 'http://192.168.59.103:8050'\n\n2. Enable the Splash middleware by adding it to ``DOWNLOADER_MIDDLEWARES`` in your ``settings.py`` file::\n\n      DOWNLOADER_MIDDLEWARES = {\n          'scrapyjs.SplashMiddleware': 725,\n      }\n\n.. note::\n\n   Order `725` is just before `HttpProxyMiddleware` (750) in default\n   scrapy settings.\n\n3. Set a custom ``DUPEFILTER_CLASS``::\n\n      DUPEFILTER_CLASS = 'scrapyjs.SplashAwareDupeFilter'\n\n4. If you use Scrapy HTTP cache then a custom cache storage backend is required.\n   ScrapyJS provides a subclass of ``scrapy.contrib.httpcache.FilesystemCacheStorage``::\n\n      HTTPCACHE_STORAGE = 'scrapyjs.SplashAwareFSCacheStorage'\n\n   If you use other cache storage then it is necesary to subclass it and\n   replace all ``scrapy.util.request.request_fingerprint`` calls with\n   ``scrapyjs.splash_request_fingerprint``.\n\n.. note::\n\n    Steps (3) and (4) are necessary because Scrapy doesn't provide a way\n    to override request fingerprints calculation algorithm globally; this\n    could change in future.\n\n\nUsage\n=====\n\nTo render the requests with Splash, use the ``'splash'`` Request `meta` key::\n\n    yield Request(url, self.parse_result, meta={\n        'splash': {\n            'args': {\n                # set rendering arguments here\n                'html': 1,\n                'png': 1,\n\n                # 'url' is prefilled from request url\n            },\n\n            # optional parameters\n            'endpoint': 'render.json',  # optional; default is render.json\n            'splash_url': '<url>',      # overrides SPLASH_URL\n            'slot_policy': scrapyjs.SlotPolicy.PER_DOMAIN,\n        }\n    })\n\n* ``meta['splash']['args']`` contains arguments sent to Splash.\n  ScrapyJS adds request.url to these arguments automatically.\n\n  Note that by default Scrapy escapes URL fragments using AJAX escaping scheme.\n  If you want to pass a URL with a fragment to Splash then set ``url``\n  in ``args`` dict manually.\n\n* ``meta['splash']['endpoint']`` is the Splash endpoint to use. By default\n  `render.json <http://splash.readthedocs.org/en/latest/api.html#render-json>`_\n  is used.\n\n  See Splash `HTTP API docs`_ for a full list of available endpoints\n  and parameters.\n\n.. _HTTP API docs: http://splash.readthedocs.org/en/latest/api.html\n\n* ``meta['splash']['splash_url']`` overrides the Splash URL set\n  in ``settings.py``.\n\n* ``meta['splash']['slot_policy']`` customize how\n  concurrency & politeness are maintained for Splash requests.\n\n  Currently there are 3 policies available:\n\n  1. ``scrapyjs.SlotPolicy.PER_DOMAIN`` (default) - send Splash requests to\n     downloader slots based on URL being rendered. It is useful if you want\n     to maintain per-domain politeness & concurrency settings.\n\n  2. ``scrapyjs.SlotPolicy.SINGLE_SLOT`` - send all Splash requests to\n     a single downloader slot. It is useful if you want to throttle requests\n     to Splash.\n\n  3. ``scrapyjs.SlotPolicy.SCRAPY_DEFAULT`` - don't do anything with slots.\n     It is similar to ``SINGLE_SLOT`` policy, but can be different if you access\n     other services on the same address as Splash.\n\n\nExamples\n========\n\nGet HTML contents::\n\n    import scrapy\n\n    class MySpider(scrapy.Spider):\n        start_urls = [\"http://example.com\", \"http://example.com/foo\"]\n\n        def start_requests(self):\n            for url in self.start_urls:\n                yield scrapy.Request(url, self.parse, meta={\n                    'splash': {\n                        'endpoint': 'render.html',\n                        'args': {'wait': 0.5}\n                    }\n                })\n\n        def parse(self, response):\n            # response.body is a result of render.html call; it\n            # contains HTML processed by a browser.\n            # ...\n\nGet HTML contents and a screenshot::\n\n    import json\n    import base64\n    import scrapy\n\n    class MySpider(scrapy.Spider):\n\n        # ...\n            yield scrapy.Request(url, self.parse_result, meta={\n                'splash': {\n                    'args': {\n                        'html': 1,\n                        'png': 1,\n                        'width': 600,\n                        'render_all': 1,\n                    }\n                }\n            })\n\n        # ...\n        def parse_result(self, response):\n            data = json.loads(response.body_as_unicode())\n            body = data['html']\n            png_bytes = base64.b64decode(data['png'])\n            # ...\n\nRun a simple `Splash Lua Script`_::\n\n    import json\n    import base64\n\n    class MySpider(scrapy.Spider):\n\n        # ...\n            script = \"\"\"\n            function main(splash)\n                assert(splash:go(splash.args.url))\n                return splash:evaljs(\"document.title\")\n            end\n            \"\"\"\n            yield scrapy.Request(url, self.parse_result, meta={\n                'splash': {\n                    'args': {'lua_source': script},\n                    'endpoint': 'execute',\n                }\n            })\n\n        # ...\n        def parse_response(self, response):\n            doc_title = response.body_as_unicode()\n            # ...\n\n\n.. _Splash Lua Script: http://splash.readthedocs.org/en/latest/scripting-tutorial.html\n\n\nHTTP Basic Auth\n===============\n\nIf you need HTTP Basic Authentication to access Splash, use\nScrapy's HttpAuthMiddleware_.\n\n.. _HttpAuthMiddleware: http://doc.scrapy.org/en/latest/topics/downloader-middleware.html#module-scrapy.downloadermiddlewares.httpauth\n\nWhy not use the Splash HTTP API directly?\n=========================================\n\nThe obvious alternative to ScrapyJS would be to send requests directly\nto the Splash `HTTP API`_. Take a look at the example below and make\nsure to read the observations after it::\n\n    import json\n\n    import scrapy\n    from scrapy.http.headers import Headers\n\n    RENDER_HTML_URL = \"http://127.0.0.1:8050/render.html\"\n\n    class MySpider(scrapy.Spider):\n        start_urls = [\"http://example.com\", \"http://example.com/foo\"]\n\n        def start_requests(self):\n            for url in self.start_urls:\n                body = json.dumps({\"url\": url, \"wait\": 0.5})\n                headers = Headers({'Content-Type': 'application/json'})\n                yield scrapy.Request(RENDER_HTML_URL, self.parse, method=\"POST\",\n                                     body=body, headers=headers)\n\n        def parse(self, response):\n            # response.body is a result of render.html call; it\n            # contains HTML processed by a browser.\n            # ...\n\n\nIt works and is easy enough, but there are some issues that you should be\naware of:\n\n1. There is a bit of boilerplate.\n\n2. As seen by Scrapy, we're sending requests to ``RENDER_HTML_URL`` instead\n   of the target URLs. It affects concurrency and politeness settings:\n   ``CONCURRENT_REQUESTS_PER_DOMAIN``, ``DOWNLOAD_DELAY``, etc could behave\n   in unexpected ways since delays and concurrency settings are no longer\n   per-domain.\n\n3. Some options depend on each other - for example, if you use timeout_\n   Splash option then you may want to set ``download_timeout``\n   scrapy.Request meta key as well.\n\nScrapyJS utlities allow to handle such edge cases and reduce the boilerplate.\n\n.. _HTTP API: http://splash.readthedocs.org/en/latest/api.html\n.. _timeout: http://splash.readthedocs.org/en/latest/api.html#arg-timeout\n\n\n\nContributing\n============\n\nSource code and bug tracker are on github:\nhttps://github.com/scrapy-plugins/scrapy-splash\n\nTo run tests, install \"tox\" Python package and then run ``tox`` command\nfrom the source checkout.\n\n\nChanges\n=======\n\n0.2 (2016-03-26)\n----------------\n\n* Scrapy 1.0 and 1.1 support;\n* Python 3 support;\n* documentation improvements;\n* project is moved to https://github.com/scrapy-plugins/scrapy-splash.\n\n0.1.1 (2015-03-16)\n------------------\n\nFixed fingerprint calculation for non-string meta values.\n\n0.1 (2015-02-28)\n----------------\n\nInitial release",
        "description_content_type": null,
        "docs_url": null,
        "download_url": "UNKNOWN",
        "downloads": {
            "last_day": -1,
            "last_month": -1,
            "last_week": -1
        },
        "home_page": "https://github.com/scrapy-plugins/scrapy-splash",
        "keywords": null,
        "license": "BSD",
        "maintainer": null,
        "maintainer_email": null,
        "name": "scrapyjs",
        "package_url": "https://pypi.org/project/scrapyjs/",
        "platform": "UNKNOWN",
        "project_url": "https://pypi.org/project/scrapyjs/",
        "project_urls": {
            "Download": "UNKNOWN",
            "Homepage": "https://github.com/scrapy-plugins/scrapy-splash"
        },
        "release_url": "https://pypi.org/project/scrapyjs/0.2/",
        "requires_dist": null,
        "requires_python": null,
        "summary": "JavaScript support for Scrapy using Splash",
        "version": "0.2"
    },
    "last_serial": 2027669,
    "releases": {
        "0.1": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "b5d4023cba74d6f8c6e53aa5f6ac04f0",
                    "sha256": "f741d93b4a3c0e935ec7a620634b0edd012e70ba06e38f32a3da41c359ed0d0e"
                },
                "downloads": -1,
                "filename": "scrapyjs-0.1-py2.py3-none-any.whl",
                "has_sig": false,
                "md5_digest": "b5d4023cba74d6f8c6e53aa5f6ac04f0",
                "packagetype": "bdist_wheel",
                "python_version": "2.7",
                "requires_python": null,
                "size": 11904,
                "upload_time": "2015-02-27T21:34:54",
                "url": "https://files.pythonhosted.org/packages/ee/29/98ec11ecf609bb36466eda328683b4571abd58b74b63620aa9864e244d53/scrapyjs-0.1-py2.py3-none-any.whl"
            },
            {
                "comment_text": "",
                "digests": {
                    "md5": "be69eb7c5a00b55e9bdc58061ade4435",
                    "sha256": "ed3d672a48e9cd2ebabafc5790006e3f03cab4535e4ea5f5327672ccd7769595"
                },
                "downloads": -1,
                "filename": "scrapyjs-0.1.tar.gz",
                "has_sig": false,
                "md5_digest": "be69eb7c5a00b55e9bdc58061ade4435",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 11020,
                "upload_time": "2015-02-27T21:34:41",
                "url": "https://files.pythonhosted.org/packages/3b/f8/9211bc99be8d5ebe2dddb517e44b7f13693ceb6b681d20cd725d4d8a4b66/scrapyjs-0.1.tar.gz"
            }
        ],
        "0.1.1": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "53c7ed66a97f2dcddc034ed52122b324",
                    "sha256": "040f7c0116ae40c98156e49c18113b54ab7f8272ffbb8488f03d12cac32e7219"
                },
                "downloads": -1,
                "filename": "scrapyjs-0.1.1-py2.py3-none-any.whl",
                "has_sig": false,
                "md5_digest": "53c7ed66a97f2dcddc034ed52122b324",
                "packagetype": "bdist_wheel",
                "python_version": "2.7",
                "requires_python": null,
                "size": 12031,
                "upload_time": "2015-03-16T00:45:36",
                "url": "https://files.pythonhosted.org/packages/62/cb/9543087e4aa37276fd2e48d804771274ab185059a62b009f733251fb4157/scrapyjs-0.1.1-py2.py3-none-any.whl"
            },
            {
                "comment_text": "",
                "digests": {
                    "md5": "3823769c6a6d6f8f9ee6209a7b585853",
                    "sha256": "83bff510da818e4101fbf10f44ddb029971c482d28c22c78d67d493038688546"
                },
                "downloads": -1,
                "filename": "scrapyjs-0.1.1.tar.gz",
                "has_sig": false,
                "md5_digest": "3823769c6a6d6f8f9ee6209a7b585853",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 11291,
                "upload_time": "2015-03-16T00:45:27",
                "url": "https://files.pythonhosted.org/packages/c9/9d/79a97adeaf5a919384f1586d5a610307a874dd2e009193b33fd3e8be8c39/scrapyjs-0.1.1.tar.gz"
            }
        ],
        "0.2": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "59912f0495e92811212db93ac66a70d1",
                    "sha256": "2c0000d40d6c081360757c578fd70fa997f8b548b8083d06f6b2eca94935d150"
                },
                "downloads": -1,
                "filename": "scrapyjs-0.2-py2.py3-none-any.whl",
                "has_sig": false,
                "md5_digest": "59912f0495e92811212db93ac66a70d1",
                "packagetype": "bdist_wheel",
                "python_version": "3.5",
                "requires_python": null,
                "size": 13321,
                "upload_time": "2016-03-25T23:08:00",
                "url": "https://files.pythonhosted.org/packages/1a/5f/dcfb268903063cc8265e86f211e3e2d03cb05fe37bf8e6531fa6f4a8921f/scrapyjs-0.2-py2.py3-none-any.whl"
            },
            {
                "comment_text": "",
                "digests": {
                    "md5": "b6d9c3d1f20a461543e4d0d871d82572",
                    "sha256": "1833ea816daaa91c362c7d1c86d8f97269f70cbfdf783b2b946378936e944057"
                },
                "downloads": -1,
                "filename": "scrapyjs-0.2.tar.gz",
                "has_sig": false,
                "md5_digest": "b6d9c3d1f20a461543e4d0d871d82572",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 12710,
                "upload_time": "2016-03-25T23:08:33",
                "url": "https://files.pythonhosted.org/packages/89/73/15a3cb81e96438a2e7ced2a1938a9dd9b798d633e9c7c84f1d31dd6e8812/scrapyjs-0.2.tar.gz"
            }
        ]
    },
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "59912f0495e92811212db93ac66a70d1",
                "sha256": "2c0000d40d6c081360757c578fd70fa997f8b548b8083d06f6b2eca94935d150"
            },
            "downloads": -1,
            "filename": "scrapyjs-0.2-py2.py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "59912f0495e92811212db93ac66a70d1",
            "packagetype": "bdist_wheel",
            "python_version": "3.5",
            "requires_python": null,
            "size": 13321,
            "upload_time": "2016-03-25T23:08:00",
            "url": "https://files.pythonhosted.org/packages/1a/5f/dcfb268903063cc8265e86f211e3e2d03cb05fe37bf8e6531fa6f4a8921f/scrapyjs-0.2-py2.py3-none-any.whl"
        },
        {
            "comment_text": "",
            "digests": {
                "md5": "b6d9c3d1f20a461543e4d0d871d82572",
                "sha256": "1833ea816daaa91c362c7d1c86d8f97269f70cbfdf783b2b946378936e944057"
            },
            "downloads": -1,
            "filename": "scrapyjs-0.2.tar.gz",
            "has_sig": false,
            "md5_digest": "b6d9c3d1f20a461543e4d0d871d82572",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 12710,
            "upload_time": "2016-03-25T23:08:33",
            "url": "https://files.pythonhosted.org/packages/89/73/15a3cb81e96438a2e7ced2a1938a9dd9b798d633e9c7c84f1d31dd6e8812/scrapyjs-0.2.tar.gz"
        }
    ]
}