{ "info": { "author": "Tom Ritchford", "author_email": "tom@swirly.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 5 - Production/Stable", "License :: OSI Approved :: MIT License", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7" ], "description": "hardback: hard-copy backup of digital data\n------------------------------------------------\n\nNewest updates are\n`here `_.\n\nIn one sentence\n==================\n\nArchive a digital document as a hardcopy book that can then be turned back\ninto the original document.\n\n\nHigh level picture\n======================\n\nThere are only two parts to this project:\n\n* Writing the book from the original digital document\n* Reading the book back in\n\nWriting is non-trivial, but there is a clear path to a good solution.\n\nBut I don't yet have any solid solution for reading, short of someone\nscanning each QR code individually.\n\nOf course, that is a reasonable solution if you care about the data and don't\nmind paying someone to take the time.\n\n\nThe book format is EPUB\n============================================\n\nThe output format will be EPUB, https://en.wikipedia.org/wiki/EPUB -\nthe only choice for an open book format, full-featured and universally\naccepted.\n\nI'm using a Python library called EBookLib for this - I haven't looked\ninto it thoroughly yet, but it seems well-received and there is no other\ncandidate in Python.\n\nUpdate: EBookLib is fairly gnarly, but the underlying format is just XHTML,\nso I'm having reasonable success getting output.\n\nThe data format within the book is QR code\n=============================================\n\nQR codes will be used to store the data in 1K blocks - again, QR is the only\nreasonable choice for solving the problem of printable data.\n\nA Python library called segno can write each one as a tiny PNG file about 2K in\nsize. 
This is quite reasonable - it means that we can aim to create a book\ndocument that's less than three times the size of the original digital\ndocument. (Interestingly enough, SVG files were an order of magnitude larger -\nin some cases over one hundred times larger!)\n\nWe'll be using QR code version 36, which holds up to 1,051 bytes at the highest\nerror correction level, 'H'.\n\nThe official list of all the QR code versions,\nhttps://www.qrcode.com/en/about/version.html, is poorly organized - click on\n31-40 and then scroll down.\n\nI'm going to use that to hold 1024 bytes of target data with an 8-byte index\nand a 16-byte hash of the original document, totalling 1,048 bytes. (The extra 3 bytes\naren't entirely wasted - we get a tiny bit better error correction.)\n\n\nData layout\n=============================\n\nThe binary data is divided into 1K *chunks*. A chunk is written to a QR code\nas part of a *block*, which also contains an index and a hash of the\noriginal document.\n\nThe layout in bytes within the block is by default like this:\n\n.. code-block:: text\n\n | index [8] | hash [16] | chunk [up to 1024] |\n\n\nbut you can customize all these sizes.\n\nThere's no checksum or error correction for this block itself, as the QR code is\nalready taking care of that for us.\n\n``hash`` is the first 16 bytes of the 32-byte SHA256 hash of the entire\ndocument. ``chunk`` is up to one kilobyte of data from your target file.\n\n``index`` is an 8-byte signed integer - a number that can be positive,\nnegative or zero, and that fits into 8 bytes (or equivalently 16 hex digits).\n\nIf the index is zero or negative, then it is a metadata block.\n\nThe block with index zero always contains a JSON description of the\noriginal file with the fields ``filename``, ``timestamp``, ``size`` and\n``sha256``. 
If the original filename is too long (which would be about 900\ncharacters or so!), it is truncated from the left.\n\nBlocks with negative indices are currently unspecified and reserved\nfor future expansion or for individual use. The first version of the software\nwill only produce output with non-negative indices.\n\nIf ``index`` is positive, it's the index of a data block. This\nmeans that the first data block has ``index`` 1.\n\nEight bytes allow us to address 2 to the power of 63 blocks of 1K each, or\nabout 9 zettabytes (which is 9,000,000,000,000 gigabytes) - on the order of the entire\nsize of all the world's data in 2019.\n\nWithin a block, ``index`` is represented in `big-endian\n`_ order (the intuitive order, also called network order) -\nwhich means the *most* significant bytes come first.\n\nIntel processors are little-endian, where the *least* significant bytes come\nfirst, so we use the `struct library\n`_\nto make sure that the output is the same on every system.\n\nRemembering that one byte is equal to two hex digits, if the hash of a\nfull document is\n``56484fd9aad8e87540609ca6c938f98fab60296b3bec808ea8b3e24da2035ce9``\nthen, since only its first 16 bytes (32 hex digits) are stored, the resulting\nsequence of QR codes would look like:\n\n.. code-block:: text\n\n 0000000000000000 56484fd9aad8e87540609ca6c938f98f {\"filename\": \"me.jpg\", ...\n 0000000000000001 56484fd9aad8e87540609ca6c938f98f ... 1024 bytes ...\n 0000000000000002 56484fd9aad8e87540609ca6c938f98f ... more data ...\n ... etc\n\nThis means that each QR code identifies which document it belongs to and which\npart of that document it holds.\n\nIt also means that the metadata block is key to understanding how the whole\nsystem works! If you have a metadata block, then you can reconstruct at least\npart of the data even if a lot of it is lost. 
Otherwise, you really have to\nguess.\n\nSo we're going to have to intersperse the metadata block within all the other\nblocks periodically if we really want something that can be partially\nreconstructed!\n\nUpdate - this is done: the metadata blocks appear in varying locations on each\npage, so even if a hole were punched through the book, some copy of the metadata\nwould probably survive.\n\nAlso, \"raw\" formats like RAW and AIFF are much preferable for this sort of\narchival activity because compressed formats dramatically magnify the effect of\nany errors or gaps. If you had a book containing the digital data for an AIFF\nor RAW file, you could still reconstruct pieces of it even if you only had a\nlimited number of pages, whereas you might get nothing at all if you were using\nMP3 or JPEG files.\n", "description_content_type": "", "docs_url": null, "download_url": "http://github.com/timedata-org/hardback/archive/0.9.2.tar.gz", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "http://github.com/timedata-org/hardback", "keywords": "", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "hardback", "package_url": "https://pypi.org/project/hardback/", "platform": "", "project_url": "https://pypi.org/project/hardback/", "project_urls": { "Download": "http://github.com/timedata-org/hardback/archive/0.9.2.tar.gz", "Homepage": "http://github.com/timedata-org/hardback" }, "release_url": "https://pypi.org/project/hardback/0.9.2/", "requires_dist": null, "requires_python": "", "summary": "Hardcopy backups of digital data", "version": "0.9.2" }, "last_serial": 5375498, "releases": { "0.9.2": [ { "comment_text": "", "digests": { "md5": "884212f4076a22f158bb7a11e1186da4", "sha256": "9da3a12e6f257c8d0295020a08fb79a89e43966a3f21831446eab902da666290" }, "downloads": -1, "filename": "hardback-0.9.2.tar.gz", "has_sig": false, "md5_digest": "884212f4076a22f158bb7a11e1186da4", "packagetype": "sdist", "python_version": "source", 
"requires_python": null, "size": 14952, "upload_time": "2019-06-08T14:22:57", "url": "https://files.pythonhosted.org/packages/34/38/0c4ceb395b472a96e970d33652ba1d5b8f23993efe59fa82888392945549/hardback-0.9.2.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "884212f4076a22f158bb7a11e1186da4", "sha256": "9da3a12e6f257c8d0295020a08fb79a89e43966a3f21831446eab902da666290" }, "downloads": -1, "filename": "hardback-0.9.2.tar.gz", "has_sig": false, "md5_digest": "884212f4076a22f158bb7a11e1186da4", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 14952, "upload_time": "2019-06-08T14:22:57", "url": "https://files.pythonhosted.org/packages/34/38/0c4ceb395b472a96e970d33652ba1d5b8f23993efe59fa82888392945549/hardback-0.9.2.tar.gz" } ] }