{ "info": { "author": "Ed Summers", "author_email": "ehs@pobox.com", "bugtrack_url": null, "classifiers": [], "description": "Give *waybackprov* a URL and it will summarize which Internet Archive\ncollections have archived the URL. This kind of information can sometimes\nprovide insight about why a particular web resource or set of web resources were\narchived from the web.\n\n## Install \n\n pip install waybackprov\n\n## Basic Usage\n\nTo check a particular URL here's how it works:\n\n % waybackprov https://twitter.com/EPAScottPruitt\n 364 https://archive.org/details/focused_crawls\n 306 https://archive.org/details/edgi_monitor\n 151 https://archive.org/details/www3.epa.gov\n 60 https://archive.org/details/epa.gov4\n 47 https://archive.org/details/epa.gov5\n ...\n\nThe first column contains the number of crawls for a particular URL, and the\nsecond column contains the URL for the Internet Archive collection that added\nit.\n\n## Time\n\nBy default waybackprov will only look at the current year. If you would like it\nto examine a range of years use the `--start` and `--end` options:\n\n % waybackprov --start 2016 --end 2018 https://twitter.com/EPAScottPruitt\n\n## Multiple Pages\n\nIf you would like to look at all URLs at a particular URL prefix you can use the\n`--prefix` option:\n\n % waybackprov --prefix https://twitter.com/EPAScottPruitt\n\nThis will use the Internet Archive's [CDX API](https://github.com/webrecorder/pywb/wiki/CDX-Server-API) to also include URLs that are extensions of the URL you supply, so it would include for example:\n\n https://twitter.com/EPAScottPruitt/status/1309839080398339\n\nBut it can also include things you may not want, such as:\n\n https://twitter.com/EPAScottPruitt/status/1309839080398339/media/1\n\nTo further limit the URLs use the `--match` parameter to specify a regular\nexpression only check particular URLs. Further specifying the URLs you are\ninterested in is highly recommended since it prevents lots of lookups for CSS,\nJavaScript and image files that are components of the resource that was\ninitially crawled.\n\n % waybackprov --prefix --match 'status/\\d+$' https://twitter.com/EPAScottPruitt\n\n## Collections\n\nOne thing to remember when interpreting this data is that collections can\ncontain other collections. For example the *edgi_monitor* collection is a\nsub-collection of *focused_crawls*.\n\nIf you use the `--collapse` option only the most specific collection will be\nreported for a given crawl. So if *coll1* is part of *coll2* which is part of\n*coll3*, only *coll1* will be reported instead of *coll1*, *coll2* and *coll3*.\nThis does involve collection metadata lookups at the Internet Archive API, so it\ndoes slow performance significantly.\n\n## JSON and CSV\n\nIf you would rather see the raw data as JSON or CSV use the `--format` option.\nWhen you use either of these formats you will see the metadata for each crawl,\nrather than a summary.\n\n## Log\n\nIf you would like to see detailed information about what *waybackprov* is doing\nuse the `--log` option to supply the a file path to log to:\n\n % waybackprov --log waybackprov.log https://example.com/\n\n## Test\n\nIf you would like to test it first install [pytest] and then:\n\n pytest test.py\n\n[pytest]: https://docs.pytest.org/en/latest/", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/edsu/waybackprov", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "waybackprov", "package_url": "https://pypi.org/project/waybackprov/", "platform": "", "project_url": "https://pypi.org/project/waybackprov/", "project_urls": { "Homepage": "https://github.com/edsu/waybackprov" }, "release_url": "https://pypi.org/project/waybackprov/0.0.7/", "requires_dist": null, "requires_python": ">=3.0", "summary": "Checks the provenance of a URL in the Wayback machine", "version": "0.0.7" }, "last_serial": 4117033, "releases": { "0.0.1": [ { "comment_text": "", "digests": { "md5": "23a012865a4abc53b1c88117cd7ccfb5", "sha256": "0d458d2f6d25631973197e8f0d758bb662f2dd399700d565f84c225e1fa1cf51" }, "downloads": -1, "filename": "waybackprov-0.0.1.tar.gz", "has_sig": false, "md5_digest": "23a012865a4abc53b1c88117cd7ccfb5", "packagetype": "sdist", "python_version": "source", "requires_python": ">=2.7", "size": 3568, "upload_time": "2018-07-12T16:19:53", "url": "https://files.pythonhosted.org/packages/c0/8b/fe0afdef204a6c0203eeb3adea537b8d7ae218603237bab4c70cfe76359a/waybackprov-0.0.1.tar.gz" } ], "0.0.2": [ { "comment_text": "", "digests": { "md5": "f7f25d4743151275a352ff1c3e084014", "sha256": "55b171a534478227050c5b27dca783bc5161401cb7581e33f7cdc2fed7d83f58" }, "downloads": -1, "filename": "waybackprov-0.0.2.tar.gz", "has_sig": false, "md5_digest": "f7f25d4743151275a352ff1c3e084014", "packagetype": "sdist", "python_version": "source", "requires_python": ">=2.7", "size": 3726, "upload_time": "2018-07-12T20:19:25", "url": "https://files.pythonhosted.org/packages/47/75/a6f1e9b22d82304d264b0a0a760fb57225396465d5edd08f0bab3cc6cf49/waybackprov-0.0.2.tar.gz" } ], "0.0.3": [ { "comment_text": "", "digests": { "md5": "328f74a318502043aa54eceefb58093d", "sha256": "47423fd340de1f17417ccffadb1ddc8f56657aefaff1b57a5f8d09b063268790" }, "downloads": -1, "filename": "waybackprov-0.0.3.tar.gz", "has_sig": false, "md5_digest": "328f74a318502043aa54eceefb58093d", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.0", "size": 4375, "upload_time": "2018-07-21T20:36:54", "url": "https://files.pythonhosted.org/packages/51/4a/1d8f92afde6e8382866c9bc1e73d35ad7be6f41e5ede28ed47f300dd7ca8/waybackprov-0.0.3.tar.gz" } ], "0.0.4": [ { "comment_text": "", "digests": { "md5": "f49426d6050e312b7d1358a654982c3e", "sha256": "56f6b8b5fca1bd994b592244cd53ed58600de4c9accab390c666e51af93ee1ac" }, "downloads": -1, "filename": "waybackprov-0.0.4.tar.gz", "has_sig": false, "md5_digest": "f49426d6050e312b7d1358a654982c3e", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.0", "size": 5023, "upload_time": "2018-07-23T17:18:01", "url": "https://files.pythonhosted.org/packages/be/2f/9f8946bf1e067821c6d8b4460efe3c6cdc8d16cf39767c4c0aa802eb8b65/waybackprov-0.0.4.tar.gz" } ], "0.0.5": [ { "comment_text": "", "digests": { "md5": "2be80b0f7e92f21faf46d2834c19393c", "sha256": "b794f99f54f87b64e82b5c16515ad78921489895b6b7d436dbc60418cf6b67a9" }, "downloads": -1, "filename": "waybackprov-0.0.5.tar.gz", "has_sig": false, "md5_digest": "2be80b0f7e92f21faf46d2834c19393c", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5156, "upload_time": "2018-07-24T16:36:26", "url": "https://files.pythonhosted.org/packages/9a/e9/434d8ab160251a5bed526bdd4b396c0652022434fc6fc0e0fd4bc6d16c5a/waybackprov-0.0.5.tar.gz" } ], "0.0.6": [ { "comment_text": "", "digests": { "md5": "f4cc6f9a82165bb12ff65a064b9c549a", "sha256": "bc505488f75ca6b332d528a575d8a35ce640ad4591d1c374f1de25774e1e0286" }, "downloads": -1, "filename": "waybackprov-0.0.6.tar.gz", "has_sig": false, "md5_digest": "f4cc6f9a82165bb12ff65a064b9c549a", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.0", "size": 5156, "upload_time": "2018-07-24T16:38:23", "url": "https://files.pythonhosted.org/packages/c1/9d/6df984065694f18931984a8bc298def22894a56af67855ed18aa5f52531e/waybackprov-0.0.6.tar.gz" } ], "0.0.7": [ { "comment_text": "", "digests": { "md5": "604a25a29eeaa202469c7dc1081c6ca8", "sha256": "4f75d678b08c7895449a04aafdb46a5de645903b412b51a2b27f16b3a1dafd6d" }, "downloads": -1, "filename": "waybackprov-0.0.7.tar.gz", "has_sig": false, "md5_digest": "604a25a29eeaa202469c7dc1081c6ca8", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.0", "size": 5219, "upload_time": "2018-07-30T16:10:29", "url": "https://files.pythonhosted.org/packages/fc/20/1ce73f77a22183b7e0ac89d8ad42e5a477742d8ecb5f14a83bd8518b0b48/waybackprov-0.0.7.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "604a25a29eeaa202469c7dc1081c6ca8", "sha256": "4f75d678b08c7895449a04aafdb46a5de645903b412b51a2b27f16b3a1dafd6d" }, "downloads": -1, "filename": "waybackprov-0.0.7.tar.gz", "has_sig": false, "md5_digest": "604a25a29eeaa202469c7dc1081c6ca8", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.0", "size": 5219, "upload_time": "2018-07-30T16:10:29", "url": "https://files.pythonhosted.org/packages/fc/20/1ce73f77a22183b7e0ac89d8ad42e5a477742d8ecb5f14a83bd8518b0b48/waybackprov-0.0.7.tar.gz" } ] }