{ "info": { "author": "Pat Sier", "author_email": "pat@citybureau.org", "bugtrack_url": null, "classifiers": [ "Framework :: Scrapy", "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python :: 3" ], "description": "# Scrapy Wayback Middleware\n\n[![Build Status](https://travis-ci.org/City-Bureau/scrapy-wayback-middleware.svg?branch=master)](https://travis-ci.org/City-Bureau/scrapy-wayback-middleware)\n\nMiddleware for submitting all scraped response URLs to the [Internet Archive Wayback Machine](https://archive.org/web/) for archival.\n\n## Installation\n\n```bash\npip install scrapy-wayback-middleware\n```\n\n## Setup\n\nAdd `scrapy_wayback_middleware.WaybackMiddleware` to your project's `SPIDER_MIDDLEWARES` setting. By default, the middleware will make `GET` requests to `web.archive.org/save/{URL}`, but if the `WAYBACK_MIDDLEWARE_POST` setting is `True` then it will make `POST` requests to [`pragma.archivelab.org`](https://archive.readme.io/docs/creating-a-snapshot) instead.\n\n## Configuration\n\nTo customize behavior, subclass `WaybackMiddleware` and override the `get_item_urls` method to pull additional links to archive from individual items, or the `handle_wayback` method to change how responses from the Wayback Machine are handled. The `WAYBACK_MIDDLEWARE_POST` setting can be set to `True` to send `POST` requests instead of `GET` requests.\n\n### Duplicate Filtering\n\nTo avoid sending duplicate requests with `WAYBACK_MIDDLEWARE_POST` set to `False`, you'll need to either include `web.archive.org` in your spider's `allowed_domains` property (if specified) or disable `scrapy.spidermiddlewares.offsite.OffsiteMiddleware` in your settings.\n\n### Rate Limits\n\nWhile neither endpoint returns headers indicating specific rate limits, the `GET` endpoint at `web.archive.org/save` has a rate limit of 25 requests/minute, resetting each minute. 
The middleware is configured to wait for 60 seconds whenever it sees a 429 error code to handle this.\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/City-Bureau/scrapy-wayback-middleware", "keywords": "", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "scrapy-wayback-middleware", "package_url": "https://pypi.org/project/scrapy-wayback-middleware/", "platform": "", "project_url": "https://pypi.org/project/scrapy-wayback-middleware/", "project_urls": { "Homepage": "https://github.com/City-Bureau/scrapy-wayback-middleware" }, "release_url": "https://pypi.org/project/scrapy-wayback-middleware/0.2.0/", "requires_dist": [ "scrapy" ], "requires_python": ">=3.5,<4.0", "summary": "Scrapy middleware for submitting URLs to the Internet Archive Wayback Machine", "version": "0.2.0" }, "last_serial": 5962080, "releases": { "0.0.1": [ { "comment_text": "", "digests": { "md5": "f13ba7e4c7a25b2e994a98ee6c431ecb", "sha256": "e0722278dad1cdae3e3164d9ef24aa45eaacf4bed581b39709663a744a766dc5" }, "downloads": -1, "filename": "scrapy_wayback_middleware-0.0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "f13ba7e4c7a25b2e994a98ee6c431ecb", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.5,<4.0", "size": 4176, "upload_time": "2019-02-25T16:03:34", "url": "https://files.pythonhosted.org/packages/1d/35/e4d78b1c23579e382897a6ff53fd2e7665f4c3aa411bd78f1eaf115155f7/scrapy_wayback_middleware-0.0.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "1bb3120d5dbb0358ebe143bc604dc6a3", "sha256": "52cfd62305eac1eede931dc922ecd89dc1f3dfd3ab5e40607cc0708c6ef9a4ce" }, "downloads": -1, "filename": "scrapy-wayback-middleware-0.0.1.tar.gz", "has_sig": false, "md5_digest": "1bb3120d5dbb0358ebe143bc604dc6a3", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.5,<4.0", 
"size": 2562, "upload_time": "2019-02-25T16:03:36", "url": "https://files.pythonhosted.org/packages/87/e8/7275f89556ff3510e153776174d8ab9bcdc6ebf6ab2f0323bb208131eca9/scrapy-wayback-middleware-0.0.1.tar.gz" } ], "0.0.2": [ { "comment_text": "", "digests": { "md5": "ea8625287c9a53901144dbcd84934a34", "sha256": "d65c9f9fa15864ed51ab3a4c7b56f7b678a7cfce08f0657003e350a9ac596ef3" }, "downloads": -1, "filename": "scrapy_wayback_middleware-0.0.2-py3-none-any.whl", "has_sig": false, "md5_digest": "ea8625287c9a53901144dbcd84934a34", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.5,<4.0", "size": 4233, "upload_time": "2019-02-25T16:44:38", "url": "https://files.pythonhosted.org/packages/21/86/35ee870fc2581c1daf63e029c4a4b751710024ed3572b0124bf9aa3cf72c/scrapy_wayback_middleware-0.0.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "b570737d538b64bc724a73588e83874c", "sha256": "85c7499bd04bcce90ff3294d3d7f2b8c007c937dfdc3079c41b879cc8a82adaf" }, "downloads": -1, "filename": "scrapy-wayback-middleware-0.0.2.tar.gz", "has_sig": false, "md5_digest": "b570737d538b64bc724a73588e83874c", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.5,<4.0", "size": 2629, "upload_time": "2019-02-25T16:44:39", "url": "https://files.pythonhosted.org/packages/27/66/0c9ffa403b825acc8fd91feba032062ae9f9dddcf3b0cc946b8c4137a3bc/scrapy-wayback-middleware-0.0.2.tar.gz" } ], "0.1.0": [ { "comment_text": "", "digests": { "md5": "ace89a631b07cbbd4de0b4e9ac565cf9", "sha256": "b3db78db804f1c26729ad96c368c3c87535b1aa1f3019012b23c0d5e0d83cc61" }, "downloads": -1, "filename": "scrapy_wayback_middleware-0.1.0-py3-none-any.whl", "has_sig": false, "md5_digest": "ace89a631b07cbbd4de0b4e9ac565cf9", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.5,<4.0", "size": 4757, "upload_time": "2019-09-08T23:37:53", "url": 
"https://files.pythonhosted.org/packages/59/47/821243abf87f5a77eadaf5a8379a6992e75a1362768e8b007e583edbb9ca/scrapy_wayback_middleware-0.1.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "b6e8341a0d6c9ce812b22b2eb2e503f3", "sha256": "24c57b8b80559b13057bfa18f90752184903b02b960eb6010ff89cbc92657743" }, "downloads": -1, "filename": "scrapy-wayback-middleware-0.1.0.tar.gz", "has_sig": false, "md5_digest": "b6e8341a0d6c9ce812b22b2eb2e503f3", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.5,<4.0", "size": 3083, "upload_time": "2019-09-08T23:37:55", "url": "https://files.pythonhosted.org/packages/ea/a3/92beea58d36d4ea65c389ab894c4652d13e8783f5a3d22b211554615239c/scrapy-wayback-middleware-0.1.0.tar.gz" } ], "0.1.1": [ { "comment_text": "", "digests": { "md5": "67bf25b8fb3db26aa8312f9f7a22cbeb", "sha256": "d34abd590796ca1f93b2ca9e4f0864260fab74f3fcbb4189ffafd0ac4fa94e81" }, "downloads": -1, "filename": "scrapy_wayback_middleware-0.1.1-py3-none-any.whl", "has_sig": false, "md5_digest": "67bf25b8fb3db26aa8312f9f7a22cbeb", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.5,<4.0", "size": 4985, "upload_time": "2019-10-11T19:58:42", "url": "https://files.pythonhosted.org/packages/fd/ff/691491042e74f2577d77386c279586bf3357bc02f032436bfd59962f1f21/scrapy_wayback_middleware-0.1.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "d5cc03950be9502ed34ba854278bfba0", "sha256": "2cae763f759de31ace20a64c5509c969f3b029d272cdde1fcec187791bf0dd99" }, "downloads": -1, "filename": "scrapy-wayback-middleware-0.1.1.tar.gz", "has_sig": false, "md5_digest": "d5cc03950be9502ed34ba854278bfba0", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.5,<4.0", "size": 3315, "upload_time": "2019-10-11T19:58:43", "url": "https://files.pythonhosted.org/packages/f6/02/744a1a0712ede9f5d5388f8a0d5a26157010819866251eaccf5f894c2863/scrapy-wayback-middleware-0.1.1.tar.gz" } ], "0.2.0": [ { 
"comment_text": "", "digests": { "md5": "d9327b23105fa182e1a1f50f38a6fed0", "sha256": "f91477f420bc7feeaa0197de74d199626c6418f10032b3b777495894dcbc0136" }, "downloads": -1, "filename": "scrapy_wayback_middleware-0.2.0-py3-none-any.whl", "has_sig": false, "md5_digest": "d9327b23105fa182e1a1f50f38a6fed0", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.5,<4.0", "size": 5210, "upload_time": "2019-10-11T22:00:25", "url": "https://files.pythonhosted.org/packages/c1/ac/28297101106423a78497fdc8c4d442b6e30709d1a87eca32a4b1caa1d7f9/scrapy_wayback_middleware-0.2.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "57c3b6dc61dc4b1858d79a94e031b3c9", "sha256": "9d402437092ad2e2d36ea2d41e6f88cc8f9813e80d4ada0cfe0dcd061af0e3f5" }, "downloads": -1, "filename": "scrapy-wayback-middleware-0.2.0.tar.gz", "has_sig": false, "md5_digest": "57c3b6dc61dc4b1858d79a94e031b3c9", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.5,<4.0", "size": 3529, "upload_time": "2019-10-11T22:00:26", "url": "https://files.pythonhosted.org/packages/fa/96/ded7a1e4361dee57d8c187324fbff9b9ac31266412c374b3bd349e3dedba/scrapy-wayback-middleware-0.2.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "d9327b23105fa182e1a1f50f38a6fed0", "sha256": "f91477f420bc7feeaa0197de74d199626c6418f10032b3b777495894dcbc0136" }, "downloads": -1, "filename": "scrapy_wayback_middleware-0.2.0-py3-none-any.whl", "has_sig": false, "md5_digest": "d9327b23105fa182e1a1f50f38a6fed0", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.5,<4.0", "size": 5210, "upload_time": "2019-10-11T22:00:25", "url": "https://files.pythonhosted.org/packages/c1/ac/28297101106423a78497fdc8c4d442b6e30709d1a87eca32a4b1caa1d7f9/scrapy_wayback_middleware-0.2.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "57c3b6dc61dc4b1858d79a94e031b3c9", "sha256": "9d402437092ad2e2d36ea2d41e6f88cc8f9813e80d4ada0cfe0dcd061af0e3f5" }, 
"downloads": -1, "filename": "scrapy-wayback-middleware-0.2.0.tar.gz", "has_sig": false, "md5_digest": "57c3b6dc61dc4b1858d79a94e031b3c9", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.5,<4.0", "size": 3529, "upload_time": "2019-10-11T22:00:26", "url": "https://files.pythonhosted.org/packages/fa/96/ded7a1e4361dee57d8c187324fbff9b9ac31266412c374b3bd349e3dedba/scrapy-wayback-middleware-0.2.0.tar.gz" } ] }