{
"info": {
"author": "David Foster",
"author_email": "david@dafoster.net",
"bugtrack_url": null,
"classifiers": [
"Development Status :: 5 - Production/Stable",
"Environment :: Web Environment",
"Intended Audience :: Developers",
"License :: OSI Approved :: MIT License",
"Operating System :: OS Independent",
"Programming Language :: Python",
"Programming Language :: Python :: 3.4",
"Topic :: Database",
"Topic :: Internet :: Proxy Servers",
"Topic :: Internet :: WWW/HTTP",
"Topic :: Internet :: WWW/HTTP :: HTTP Servers",
"Topic :: System :: Archiving",
"Topic :: System :: Archiving :: Backup",
"Topic :: System :: Archiving :: Mirroring"
],
"description": "webcrystal\n==========\n\nwebcrystal is:\n\n1. An HTTP proxy and web service that saves every web page accessed\n through it to disk.\n2. An on-disk archival format for storing websites.\n\nwebcrystal is intended as a tool for archiving websites. It is also\nintended to be convenient to write HTTP-based and browser-based web\nscrapers on top of.\n\nFeatures\n--------\n\n- Compact package: One .py file. Only one dependency (``urllib3``).\n- A simple documented archival format.\n- >95% code coverage, enforced by the test suite.\n- Friendly MIT license.\n- Excellent documentation.\n\nInstallation\n------------\n\n- Install Python 3.\n- From a command-line terminal (Terminal on OS X, Command Prompt on\n Windows), run the command:\n\n::\n\n pip3 install webcrystal\n\nQuickstart\n----------\n\nTo start the proxy, run a command like:\n\n::\n\n webcrystal.py 9227 xkcd.wbcr http://xkcd.com/\n\nThen you can visit http://localhost:9227/ to have the same effect as\nvisiting http://xkcd.com/ directly, except that all requests are\narchived in ``xkcd.wbcr/``.\n\nWhen you access an HTTP resource through the webcrystal proxy for the\nfirst time, it will be fetched from the origin HTTP server and archived\nlocally. All subsequent requests for the same resource will be returned\nfrom the archive.\n\nCLI\n---\n\nTo start the webcrystal proxy:\n\n::\n\n webcrystal.py [--help] [--quiet] <port> <archive_dirpath> [<default_origin_domain>]\n\nTo stop the proxy, press ^C or send a SIGINT signal to it.\n\nFull Syntax\n~~~~~~~~~~~\n\n::\n\n webcrystal.py --help\n\nThis outputs:\n\n::\n\n usage: webcrystal.py [-h] [-q] port archive_dirpath [default_origin_domain]\n\n An archiving HTTP proxy and web service.\n\n positional arguments:\n port Port on which to run the HTTP proxy. Suggest 9227\n (WBCR).\n archive_dirpath Path to the archive directory. 
Usually has .wbcr\n extension.\n default_origin_domain\n Default HTTP domain which the HTTP proxy will redirect\n to if no URL is specified.\n\n optional arguments:\n -h, --help Show this help message and exit.\n -q, --quiet Suppresses all output.\n\nHTTP API\n--------\n\nThe HTTP API is the primary API for interacting with the webcrystal\nproxy.\n\nWhile the proxy is running, it responds to the following HTTP endpoints.\n\nNotice that GET is an accepted method for all endpoints, so that they\ncan be easily requested using a regular web browser. Browser\naccessibility is convenient for manual inspection and browser-based\nwebsite scrapers.\n\n``GET,HEAD /``\n~~~~~~~~~~~~~~\n\nRedirects to the home page of the default origin domain if it was\nspecified at the CLI. Returns:\n\n- HTTP 404 (Not Found) if no default origin domain is specified (the\n default) or\n- HTTP 307 (Temporary Redirect) to the default origin domain if it is\n specified.\n\n``GET,HEAD /_/http[s]/__PATH__``\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nIf in online mode (the default):\n\n- The requested resource will be fetched from the origin server and\n added to the archive if:\n\n - (1) it is not already archived,\n - (2) a ``Cache-Control=no-cache`` header is specified, or\n - (3) a ``Pragma=no-cache`` header is specified.\n\n- The newly archived resource will be returned to the client, with all\n URLs in HTTP headers and content rewritten to point to the proxy.\n\nIf in offline mode:\n\n- If the resource is in the archive, it will be returned to the client,\n with all URLs in HTTP headers and content rewritten to point to the\n proxy.\n- If the resource is not in the archive, an HTTP 503 (Service\n Unavailable) response will be returned, with an HTML page that\n provides a link to the online version of the content.\n\n``POST,GET /_online``\n~~~~~~~~~~~~~~~~~~~~~\n\nSwitches the proxy to online mode.\n\n``POST,GET /_offline``\n~~~~~~~~~~~~~~~~~~~~~~\n\nSwitches the proxy to offline 
mode.\n\n``POST,GET /_refresh/http[s]/__PATH__``\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nRefetches the specified URL from the origin server using the same\nrequest headers as the last time it was fetched. Returns:\n\n- HTTP 200 (OK) if successful or\n- HTTP 404 (Not Found) if the specified URL was not in the archive.\n\n``POST,GET /_delete/http[s]/__PATH__``\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nDeletes the specified URL in the archive. Returns:\n\n- HTTP 200 (OK) if successful or\n- HTTP 404 (Not Found) if the specified URL was not in the archive.\n\nArchival Format\n---------------\n\nWhen the proxy is started with a command like:\n\n::\n\n webcrystal.py 9227 website.wbcr\n\nIt creates an archive in the directory ``website.wbcr/`` in the\nfollowing format:\n\n``website.wbcr/index.txt``\n~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n- Lists the URL of each archived HTTP resource, one per line.\n- UTF-8 encoded text file with Unix line endings (``\\n``).\n\nExample:\n\n::\n\n http://xkcd.com/\n http://xkcd.com/s/b0dcca.css\n http://xkcd.com/1645/\n\nThe preceding example archive contains 3 HTTP resources, numbered #1,\n#2, and #3.\n\n``website.wbcr/1.request_headers.json``\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n- Contains the HTTP request headers sent to the origin HTTP server to\n obtain HTTP resource #1.\n- UTF-8 encoded JSON file.\n\nExample:\n\n::\n\n {\"Accept-Language\": \"en-us\", \"Accept\": \"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\", \"Host\": \"xkcd.com\", \"Accept-Encoding\": \"gzip, deflate\", \"User-Agent\": \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_3) AppleWebKit/601.4.4 (KHTML, like Gecko) Version/9.0.3 Safari/601.4.4\"}\n\n``website.wbcr/1.response_headers.json``\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n- Contains the HTTP response headers received from the origin HTTP\n server when obtaining HTTP resource #1.\n- UTF-8 encoded JSON file.\n- Contains an internal \"X-Status-Code\" header that indicates the HTTP\n status 
code received from the origin HTTP server.\n\nExample:\n\n::\n\n {\"Cache-Control\": \"public\", \"Connection\": \"keep-alive\", \"Accept-Ranges\": \"bytes\", \"X-Cache-Hits\": \"0\", \"Date\": \"Tue, 15 Mar 2016 04:37:05 GMT\", \"Age\": \"0\", \"X-Served-By\": \"cache-sjc3628-SJC\", \"Content-Type\": \"text/html\", \"Server\": \"lighttpd/1.4.28\", \"X-Status-Code\": \"404\", \"X-Cache\": \"MISS\", \"Content-Length\": \"345\", \"X-Timer\": \"S1458016625.375814,VS0,VE148\", \"Via\": \"1.1 varnish\"}\n\n``website.wbcr/1.response_body.dat``\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n- Contains the contents of the HTTP response body received from the\n origin HTTP server when obtaining HTTP resource #1.\n- Binary file.\n\nContributing\n------------\n\nInstall Dev Requirements\n~~~~~~~~~~~~~~~~~~~~~~~~\n\n::\n\n pip3 install -r dev-requirements.txt\n\nRun the Tests\n~~~~~~~~~~~~~\n\n::\n\n make test\n\nGather Code Coverage Metrics\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n::\n\n make coverage\n open htmlcov/index.html\n\nUpload a New Version to PyPI\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n- Ensure the changelog is updated.\n- Bump the version number in ``setup.py``.\n- ``python3 setup.py sdist bdist_wheel upload``\n\n - There are more advanced upload techniques that might be used\n later.\n\n- Tag the release in Git.\n\nKnown Limitations\n-----------------\n\n- Sites that vary the content served at a particular URL depending on\n whether you are logged in can have only one version of the URL\n archived.\n\nLicense\n-------\n\nThis code is provided under the MIT License. See LICENSE file for\ndetails.\n\nChangelog\n---------\n\n- v1.0.1\n\n - More robust support for HTTPS URLs on OS X 10.11.\n - Validate HTTPS certificates.\n\n- v1.0 - Initial release",
"description_content_type": null,
"docs_url": null,
"download_url": "UNKNOWN",
"downloads": {
"last_day": -1,
"last_month": -1,
"last_week": -1
},
"home_page": "http://dafoster.net/projects/webcrystal/",
"keywords": null,
"license": "MIT",
"maintainer": null,
"maintainer_email": null,
"name": "webcrystal",
"package_url": "https://pypi.org/project/webcrystal/",
"platform": "UNKNOWN",
"project_url": "https://pypi.org/project/webcrystal/",
"project_urls": {
"Download": "UNKNOWN",
"Homepage": "http://dafoster.net/projects/webcrystal/"
},
"release_url": "https://pypi.org/project/webcrystal/1.0.1/",
"requires_dist": null,
"requires_python": null,
"summary": "A website archival tool and format.",
"version": "1.0.1"
},
"last_serial": 2069008,
"releases": {
"1.0": [
{
"comment_text": "",
"digests": {
"md5": "650ba26558e52b553979467a0aada597",
"sha256": "d6c3dcbbdd2798e3c8bbd72561ac9f9f11299ab830a8a692e961719662e4c1fe"
},
"downloads": -1,
"filename": "webcrystal-1.0.tar.gz",
"has_sig": false,
"md5_digest": "650ba26558e52b553979467a0aada597",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 12256,
"upload_time": "2016-03-30T05:26:38",
"url": "https://files.pythonhosted.org/packages/c4/4f/48eb2d8b2fcba5eab0fb02cf49c5f94138802f55ec7ae230aa5a4426af2c/webcrystal-1.0.tar.gz"
}
],
"1.0.1": [
{
"comment_text": "",
"digests": {
"md5": "5df3c286516d1d6a876990d7d264cfa6",
"sha256": "d78410114ffdc6df6a46e42f5d819ff88c5e8e57acc39d65a0de752fcb5e0d1f"
},
"downloads": -1,
"filename": "webcrystal-1.0.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "5df3c286516d1d6a876990d7d264cfa6",
"packagetype": "bdist_wheel",
"python_version": "3.4",
"requires_python": null,
"size": 23593,
"upload_time": "2016-04-18T06:04:26",
"url": "https://files.pythonhosted.org/packages/f4/50/f2e4af3be96d9baf5882e70d4a1d971c71e21613e4642db6f8a677e2c34a/webcrystal-1.0.1-py3-none-any.whl"
},
{
"comment_text": "",
"digests": {
"md5": "cf9a4d05a384005a689662b757ca705b",
"sha256": "124b1ad9a7ab4c9bbada3ef9bb1478a161c4ddcd776d29d1665de0e8f4a5fc8c"
},
"downloads": -1,
"filename": "webcrystal-1.0.1.tar.gz",
"has_sig": false,
"md5_digest": "cf9a4d05a384005a689662b757ca705b",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 13165,
"upload_time": "2016-04-18T06:04:18",
"url": "https://files.pythonhosted.org/packages/20/13/39c518bb305b2e01e26442c1b6aa899e8cc44ade0fc950d3266605e7a3de/webcrystal-1.0.1.tar.gz"
}
]
},
"urls": [
{
"comment_text": "",
"digests": {
"md5": "5df3c286516d1d6a876990d7d264cfa6",
"sha256": "d78410114ffdc6df6a46e42f5d819ff88c5e8e57acc39d65a0de752fcb5e0d1f"
},
"downloads": -1,
"filename": "webcrystal-1.0.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "5df3c286516d1d6a876990d7d264cfa6",
"packagetype": "bdist_wheel",
"python_version": "3.4",
"requires_python": null,
"size": 23593,
"upload_time": "2016-04-18T06:04:26",
"url": "https://files.pythonhosted.org/packages/f4/50/f2e4af3be96d9baf5882e70d4a1d971c71e21613e4642db6f8a677e2c34a/webcrystal-1.0.1-py3-none-any.whl"
},
{
"comment_text": "",
"digests": {
"md5": "cf9a4d05a384005a689662b757ca705b",
"sha256": "124b1ad9a7ab4c9bbada3ef9bb1478a161c4ddcd776d29d1665de0e8f4a5fc8c"
},
"downloads": -1,
"filename": "webcrystal-1.0.1.tar.gz",
"has_sig": false,
"md5_digest": "cf9a4d05a384005a689662b757ca705b",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 13165,
"upload_time": "2016-04-18T06:04:18",
"url": "https://files.pythonhosted.org/packages/20/13/39c518bb305b2e01e26442c1b6aa899e8cc44ade0fc950d3266605e7a3de/webcrystal-1.0.1.tar.gz"
}
]
}