{ "info": { "author": "Christopher Foo", "author_email": "chris.foo@gmail.com", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)", "Programming Language :: Python :: 3", "Topic :: System :: Archiving" ], "description": "WARCAT: Web ARChive (WARC) Archiving Tool\n=========================================\n\nTool and library for handling Web ARChive (WARC) files.\n\n\nQuick Start\n===========\n\nRequirements:\n\n* Python 3\n\nInstall stable version::\n\n pip-3 install warcat\n\nOr install latest version::\n\n git clone git://github.com/chfoo/warcat.git\n pip-3 install -r requirements.txt\n python3 setup.py install\n\n\nExample Run::\n\n python3 -m warcat --help\n python3 -m warcat list example/at.warc.gz\n python3 -m warcat verify megawarc.warc.gz --progress\n python3 -m warcat extract megawarc.warc.gz --output-dir /tmp/megawarc/ --progress\n\n\nSupported commands\n++++++++++++++++++\n\nconcat\n Naively join archives into one\nextract\n Extract files from archive\nhelp\n List commands available\nlist\n List contents of archive\npass\n Load archive and write it back out\nsplit\n Split archives into individual records\nverify\n Verify digest and validate conformance\n\n\nLibrary\n+++++++\n\nExample:\n\n.. 
code-block:: python\n\n >>> import warcat.model\n >>> warc = warcat.model.WARC()\n >>> warc.load('example/at.warc.gz')\n >>> len(warc.records)\n 8\n >>> record = warc.records[0]\n >>> record.warc_type\n 'warcinfo'\n >>> record.content_length\n 233\n >>> record.header.version\n '1.0'\n >>> record.header.fields.list()\n [('WARC-Type', 'warcinfo'), ('Content-Type', 'application/warc-fields'), ('WARC-Date', '2013-04-09T00:11:14Z'), ('WARC-Record-ID', ''), ('WARC-Filename', 'at.warc.gz'), ('WARC-Block-Digest', 'sha1:3C6SPSGP5QN2HNHKPTLYDHDPFYKYAOIX'), ('Content-Length', '233')]\n >>> record.header.fields['content-type']\n 'application/warc-fields'\n >>> record.content_block.fields.list()\n [('software', 'Wget/1.13.4-2608 (linux-gnu)'), ('format', 'WARC File Format 1.0'), ('conformsTo', 'http://bibnum.bnf.fr/WARC/WARC_ISO_28500_version1_latestdraft.pdf'), ('robots', 'classic'), ('wget-arguments', '\"http://www.archiveteam.org/\" \"--warc-file=at\" ')]\n >>> record.content_block.fields['software']\n 'Wget/1.13.4-2608 (linux-gnu)'\n >>> record.content_block.payload.length\n 0\n >>> bytes(warc)[:60]\n b'WARC/1.0\\r\\nWARC-Type: warcinfo\\r\\nContent-Type: application/war'\n >>> bytes(record.content_block.fields)[:60]\n b'software: Wget/1.13.4-2608 (linux-gnu)\\r\\nformat: WARC File Fo'\n\n\n.. note::\n\n The library may not be entirely thread-safe yet.\n\n\nAbout\n=====\n\nThe goal of the Warcat project is to create a tool and library that make manipulating WARC files as easy and fast as manipulating other archives such as tar and zip.\n\nWarcat is designed to handle large, gzipped files by partially extracting them as needed.\n\nWarcat is provided without warranty and cannot guarantee the safety of your files. 
Remember to make backups and test them!\n\n\n* Homepage: https://github.com/chfoo/warcat\n* Documentation: http://warcat.readthedocs.org/\n* Questions?: https://answers.launchpad.net/warcat\n* Bugs?: https://github.com/chfoo/warcat/issues\n* PyPI: https://pypi.python.org/pypi/Warcat/\n* Chat: irc://irc.efnet.org/archiveteam-bs (I'll be on #archiveteam-bs on EFnet) \n\n\nSpecification\n+++++++++++++\n\nThis implementation is based loosely on the draft ISO 28500 papers ``WARC_ISO_28500_version1_latestdraft.pdf`` and ``warc_ISO_DIS_28500.pdf``, which can be found at http://bibnum.bnf.fr/WARC/ .\n\n\nFile format\n+++++++++++\n\nHere's a quick description:\n\nA WARC file contains one or more Records concatenated together. Each Record contains Named Fields, newline, a Content Block, newline, and newline. A Content Block may be one of two types: {binary data} or {Named Fields, newline, and binary data}. Each Named Field consists of a string, a colon, a string, and a newline.\n\nA Record may be compressed with gzip. Filenames ending with ``.warc.gz`` indicate one or more gzip-compressed files concatenated together.\n\n\nAlternatives\n++++++++++++\n\nWarcat is inspired by\n\n* https://github.com/internetarchive/warc\n* http://code.hanzoarchives.com/warc-tools\n\n\nDevelopment\n===========\n\n.. image:: https://travis-ci.org/chfoo/warcat.png\n :target: https://travis-ci.org/chfoo/warcat\n :alt: Travis build status\n\n\nTesting\n+++++++\n\nAlways remember to test. 
Continue testing::\n\n python3 -m unittest discover -p '*_test.py'\n nosetests3\n\n\nTo-do\n+++++\n\n* Smart archive join\n* Regex filtering of records\n* Generate index to disk (eg, for fast resume)\n* Grab files like wget and archive them\n* See TODO and FIXME markers in code\n* etc.\n\n", "description_content_type": null, "docs_url": "https://pythonhosted.org/Warcat/", "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/chfoo/warcat", "keywords": null, "license": "UNKNOWN", "maintainer": null, "maintainer_email": null, "name": "Warcat", "package_url": "https://pypi.org/project/Warcat/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/Warcat/", "project_urls": { "Download": "UNKNOWN", "Homepage": "https://github.com/chfoo/warcat" }, "release_url": "https://pypi.org/project/Warcat/2.2.5/", "requires_dist": null, "requires_python": null, "summary": "Tool and library for handling Web ARChive (WARC) files.", "version": "2.2.5" }, "last_serial": 2805019, "releases": { "1.6": [ { "comment_text": "", "digests": { "md5": "4a8938ea50f47284ac61862b8a39a273", "sha256": "15241f918c15271d6582e1199ea36c8b1955015dc06a76b4223cc9e0bddecea2" }, "downloads": -1, "filename": "Warcat-1.6.tar.gz", "has_sig": false, "md5_digest": "4a8938ea50f47284ac61862b8a39a273", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 52983, "upload_time": "2013-04-10T23:48:56", "url": "https://files.pythonhosted.org/packages/28/6b/affdbc9b5edf51c7ec856d2e06d8ec162607863b303fe09e531ec734fa23/Warcat-1.6.tar.gz" } ], "1.8": [ { "comment_text": "", "digests": { "md5": "9f6a169ecdebb04e3112f4495a5b86ec", "sha256": "fd1e0fb7e5bf0c4bc3c2c6c13d3e90e40e13ed8d04be381e612bee8ddccca8c9" }, "downloads": -1, "filename": "Warcat-1.8.tar.gz", "has_sig": false, "md5_digest": "9f6a169ecdebb04e3112f4495a5b86ec", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 
53501, "upload_time": "2013-04-11T02:16:18", "url": "https://files.pythonhosted.org/packages/40/8a/fc27caecd8e162af19e8c92bdab48032b5e0c99533d69a2f9ea9abff07ab/Warcat-1.8.tar.gz" } ], "2.0.1": [ { "comment_text": "", "digests": { "md5": "198f178b7bed5e0b4678e93c9bac74ad", "sha256": "f8a86dc4be1fab6c42f075a6843a87e6745268ec67a10cee7d469507e523bb9e" }, "downloads": -1, "filename": "Warcat-2.0.1.tar.gz", "has_sig": false, "md5_digest": "198f178b7bed5e0b4678e93c9bac74ad", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 55237, "upload_time": "2013-04-11T17:34:43", "url": "https://files.pythonhosted.org/packages/70/5d/d704fb229e1c27755c8de2d3522f8e492ba938af25eee30673e7bc88760a/Warcat-2.0.1.tar.gz" } ], "2.0.2": [ { "comment_text": "", "digests": { "md5": "fcae613d29898bf7e7d762ed925f427f", "sha256": "2fdb8ec843338e1ac2d4493c630ad9aa8fd373623b43d8b0ef84f82f519336f7" }, "downloads": -1, "filename": "Warcat-2.0.2.tar.gz", "has_sig": false, "md5_digest": "fcae613d29898bf7e7d762ed925f427f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 55432, "upload_time": "2013-04-11T18:00:37", "url": "https://files.pythonhosted.org/packages/7a/f3/89bae4cb2b41cc772949dc7232de61f5c5fd47c7d4fa733cef390076e69c/Warcat-2.0.2.tar.gz" } ], "2.1": [ { "comment_text": "", "digests": { "md5": "d7a8c5327ddc2e726d7d9dc78dc8a759", "sha256": "72c9b79fe4fdf15e08c653f3d354a75017387f697fc6139b307c6271cf550f51" }, "downloads": -1, "filename": "Warcat-2.1.tar.gz", "has_sig": false, "md5_digest": "d7a8c5327ddc2e726d7d9dc78dc8a759", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 55729, "upload_time": "2013-04-12T00:03:52", "url": "https://files.pythonhosted.org/packages/fb/c0/ab7e47e37a6177177f16310ee3fbf19d9119f9b645445722f34e9bce7a38/Warcat-2.1.tar.gz" } ], "2.2.1": [ { "comment_text": "", "digests": { "md5": "8569751330f16a2b34d6e01fab2416f6", "sha256": 
"0e1b3de886325f12790f7bb01e11dee012fd149265ccacc18b275f1734fffdc8" }, "downloads": -1, "filename": "Warcat-2.2.1.tar.gz", "has_sig": false, "md5_digest": "8569751330f16a2b34d6e01fab2416f6", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 57012, "upload_time": "2013-04-26T15:09:06", "url": "https://files.pythonhosted.org/packages/e4/4b/7172916649518d18e1ace9627a1f3bc9b853921977aca7798b3d14001f2a/Warcat-2.2.1.tar.gz" } ], "2.2.2": [ { "comment_text": "", "digests": { "md5": "54b2cae2402971861225004839fa8a02", "sha256": "6bfd18e43a7ac21d481becf37b9e54a5a59a95c8b400925a26af1fde1c95358c" }, "downloads": -1, "filename": "Warcat-2.2.2.tar.gz", "has_sig": false, "md5_digest": "54b2cae2402971861225004839fa8a02", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 56733, "upload_time": "2014-01-17T21:27:06", "url": "https://files.pythonhosted.org/packages/f7/4a/eda83e528eeeba11dca2ad72242acd098b6d7b52122fb0bd7cd0af9e7c4c/Warcat-2.2.2.tar.gz" } ], "2.2.3": [ { "comment_text": "", "digests": { "md5": "039b75c2dd3e09590baef85a12271356", "sha256": "13114dd9241c429845cb985df133c50f51423afa7d8365c4f488908d0211909d" }, "downloads": -1, "filename": "Warcat-2.2.3.tar.gz", "has_sig": true, "md5_digest": "039b75c2dd3e09590baef85a12271356", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 57454, "upload_time": "2014-06-05T20:50:18", "url": "https://files.pythonhosted.org/packages/49/f5/797bda8143045627c7c179bf77c1a5618032f390bfcaf9d54a1b3275969e/Warcat-2.2.3.tar.gz" } ], "2.2.4": [ { "comment_text": "", "digests": { "md5": "c94328c5f190b13f77adad3959f8710c", "sha256": "d0a3e5f7555d165ffe21c2153aee4988888194188604f63ff2b621bef7229bc3" }, "downloads": -1, "filename": "Warcat-2.2.4.tar.gz", "has_sig": true, "md5_digest": "c94328c5f190b13f77adad3959f8710c", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 57585, "upload_time": "2016-03-18T03:04:23", 
"url": "https://files.pythonhosted.org/packages/81/9e/598ff25921f1c4018cf7b7ba6871fa03131c0e7c63950cc19422e1542607/Warcat-2.2.4.tar.gz" } ], "2.2.5": [ { "comment_text": "", "digests": { "md5": "30393815f749e7ae79b2cf14639c8cc7", "sha256": "a3d7af6b4f1cbc6244833e19904932647c8f57d46a2b770fc499a1ec5ca8a8bd" }, "downloads": -1, "filename": "Warcat-2.2.5.tar.gz", "has_sig": true, "md5_digest": "30393815f749e7ae79b2cf14639c8cc7", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 57866, "upload_time": "2017-04-15T00:58:48", "url": "https://files.pythonhosted.org/packages/51/78/3abb1702eae1ac1dec44a0d1d366ff10394679894b7a2acc6b6efd0db898/Warcat-2.2.5.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "30393815f749e7ae79b2cf14639c8cc7", "sha256": "a3d7af6b4f1cbc6244833e19904932647c8f57d46a2b770fc499a1ec5ca8a8bd" }, "downloads": -1, "filename": "Warcat-2.2.5.tar.gz", "has_sig": true, "md5_digest": "30393815f749e7ae79b2cf14639c8cc7", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 57866, "upload_time": "2017-04-15T00:58:48", "url": "https://files.pythonhosted.org/packages/51/78/3abb1702eae1ac1dec44a0d1d366ff10394679894b7a2acc6b6efd0db898/Warcat-2.2.5.tar.gz" } ] }