{ "info": { "author": "Luis Pedro Coelho", "author_email": "luis@luispedro.org", "bugtrack_url": null, "classifiers": [ "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python", "Programming Language :: Python :: 2", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.3", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Topic :: Software Development :: Libraries" ], "description": "# Disk-based hashtable\n\n[![Travis](https://api.travis-ci.org/luispedro/diskhash.png)](https://travis-ci.org/luispedro/diskhash)\n[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)\n\n\nA simple disk-based hash table (i.e., persistent hash table).\n\nIt is a hashtable implemented on memory-mapped disk, so that it can be loaded\nwith a single `mmap()` system call and used in memory directly (being as fast\nas an in-memory hashtable once it is loaded from disk).\n\nThe code is in C, wrappers are provided for Python, Haskell, and C++. The\nwrappers follow similar APIs with variations to accommodate the language\nspecificity. They all use the same underlying code, so you can open a hashtable\ncreated in C from Haskell, modify it within your Haskell code, and later open\nthe result in Python (although Python's version currently only deals with\nintegers, stored as longs).\n\nCross-language functionality will work best for very simple types so that you\ncan control their binary representation (64-bit integers, for example).\n\nReading does not touch the disk representation at all and, thus, can be done on\ntop of read-only files or using multiple threads (and different processes will\nshare the memory: the operating system does that for you). 
Writing or modifying\nvalues is, however, not thread-safe.\n\n## Examples\n\nThe following examples all create a hashtable to store longs (`int64_t`), then\nset the value associated with the key `\"key\"` to 9. In the current API, the\nmaximum size of the keys needs to be pre-specified, which is the value `15`\nbelow.\n\n### Raw C\n\n```c\n#include <stdio.h>\n#include <inttypes.h>\n#include <fcntl.h>  /* O_RDWR, O_CREAT */\n#include \"diskhash.h\"\n\nint main(void) {\n    HashTableOpts opts;\n    opts.key_maxlen = 15;\n    opts.object_datalen = sizeof(int64_t);\n    char* err = NULL;\n    HashTable* ht = dht_open(\"testing.dht\", opts, O_RDWR|O_CREAT, &err);\n    if (!ht) {\n        if (!err) err = \"Unknown error\";\n        fprintf(stderr, \"Failed opening hash table: %s.\\n\", err);\n        return 1;\n    }\n    /* Stored objects are object_datalen bytes, so use int64_t rather than long */\n    int64_t i = 9;\n    dht_insert(ht, \"key\", &i);\n\n    int64_t* val = (int64_t*) dht_lookup(ht, \"key\");\n    printf(\"Looked up value: %\" PRId64 \"\\n\", *val);\n\n    dht_free(ht);\n    return 0;\n}\n```\n\n### Haskell\n\nIn Haskell, you have different types/functions for read-write and read-only\nhashtables.\n\nRead-write example:\n\n```haskell\nimport Data.DiskHash\nimport Data.Int\nmain = do\n    ht <- htOpenRW \"testing.dht\" 15\n    htInsertRW ht \"key\" (9 :: Int64)\n    val <- htLookupRW \"key\" ht\n    print val\n```\n\nRead-only example (`htLookupRO` is pure in this case):\n\n```haskell\nimport Data.DiskHash\nimport Data.Int\nmain = do\n    ht <- htOpenRO \"testing.dht\" 15\n    let val :: Int64\n        val = htLookupRO \"key\" ht\n    print val\n```\n\n\n### Python\n\nPython's interface is based on the [struct\nmodule](https://docs.python.org/3/library/struct.html). For example, `'ll'`\nrefers to a pair of 64-bit ints (_longs_):\n\n```python\nimport diskhash\ntb = diskhash.StructHash(\"testing.dht\", 15, 'll', 'rw')\ntb.insert(\"key\", 1, 2)\nprint(tb.lookup(\"key\"))\n```\n\nThe Python interface is currently Python 3 only. 
Patches to extend it to 2.7\nare welcome, but it's not a priority.\n\n\n### C++\n\nIn C++, a simple wrapper is defined, which provides a modicum of type-safety.\nYou use the `DiskHash<T>` template. Additionally, errors are reported through\nexceptions (both `std::bad_alloc` and `std::runtime_error` can be thrown) and\nnot return codes.\n\n```c++\n#include <cstdint>\n#include <iostream>\n#include <string>\n\n#include <diskhash.hpp>\n\nint main() {\n    const int key_maxlen = 15;\n    dht::DiskHash<uint64_t> ht(\"testing.dht\", key_maxlen, dht::DHOpenRW);\n    std::string line;\n    uint64_t ix = 0;\n    // Read one key per line from standard input and store its line index\n    while (std::getline(std::cin, line)) {\n        if (line.length() > key_maxlen) {\n            std::cerr << \"Key too long: '\" << line << \"'. Aborting.\\n\";\n            return 2;\n        }\n        const bool inserted = ht.insert(line.c_str(), ix);\n        if (!inserted) {\n            std::cerr << \"Found repeated key '\" << line << \"' (ignored).\\n\";\n        }\n        ++ix;\n    }\n    return 0;\n}\n```\n\n## Stability\n\nThis is _beta_ software. It is good enough that I am using it, but the API can\nchange in the future with little warning. The binary format is versioned (the\nmagic string encodes its version), so changes can be detected and you will get\nan error message in the future rather than some silent misbehaviour.\n\n[Automated unit testing](https://travis-ci.org/luispedro/diskhash) ensures that\nbasic mistakes will not go uncaught.\n\n## Limitations\n\n- You must specify the maximum key size. This can be worked around either by\n  pre-hashing the keys (with a strong hash) or using multiple hash tables for\n  different key sizes. Neither is currently implemented in diskhash.\n\n- You cannot delete objects. This was not a necessity for my uses, so it was\n  not implemented. A simple implementation could be done by marking objects as\n  \"deleted\" in place and recompacting when the hash table size changes or with\n  an explicit `dht_gc()` call. 
It may also be important to add functionality to\n  shrink hashtables so as to not waste disk space.\n\n- The algorithm is a rather na\u00efve implementation of linear addressing. It would\n  not be hard to switch to [robin hood\n  hashing](https://www.sebastiansylvan.com/post/robin-hood-hashing-should-be-your-default-hash-table-implementation/)\n  and this may indeed happen in the near future.\n\nLicense: MIT\n\n", "description_content_type": null, "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "http://github.com/luispedro/diskhash", "keywords": "", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "diskhash", "package_url": "https://pypi.org/project/diskhash/", "platform": "Any", "project_url": "https://pypi.org/project/diskhash/", "project_urls": { "Homepage": "http://github.com/luispedro/diskhash" }, "release_url": "https://pypi.org/project/diskhash/0.4.0/", "requires_dist": null, "requires_python": "", "summary": "Disk-based hashtable", "version": "0.4.0" }, "last_serial": 3368724, "releases": { "0.0": [ { "comment_text": "", "digests": { "md5": "181ebb87395ff5a6afec489edc8b6b7e", "sha256": "ea0dab0c1105e319bed0f9efc8a091d716bc9ae65e33463c14eaecc30f5a680d" }, "downloads": -1, "filename": "diskhash-0.0.tar.gz", "has_sig": false, "md5_digest": "181ebb87395ff5a6afec489edc8b6b7e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 12366, "upload_time": "2017-06-26T11:52:14", "url": "https://files.pythonhosted.org/packages/7a/c4/1886751c936bfb5a829d2b4496276f1790996994a37ac36959b015b893c3/diskhash-0.0.tar.gz" } ], "0.1": [ { "comment_text": "", "digests": { "md5": "0e85ceada1877ec709c1f804dba13e72", "sha256": "28f02eba67b7a3a035bee8e1e5d7d6cbaefd13b5d7105fec2437922810760524" }, "downloads": -1, "filename": "diskhash-0.1.tar.gz", "has_sig": false, "md5_digest": "0e85ceada1877ec709c1f804dba13e72", "packagetype": "sdist", "python_version": "source", 
"requires_python": null, "size": 13673, "upload_time": "2017-09-07T17:39:05", "url": "https://files.pythonhosted.org/packages/e0/1f/719e9cb17800d581758e2d8a449ccd525ef59483edf0f4a415c2bc3f18f4/diskhash-0.1.tar.gz" } ], "0.2": [ { "comment_text": "", "digests": { "md5": "29c4fd393e45ffc41f03fed7a988c395", "sha256": "928765ac86f6e2ba25ab98ca176a7d9ab7db32bd15f4d7ad6c043f0939cc5e35" }, "downloads": -1, "filename": "diskhash-0.2.tar.gz", "has_sig": false, "md5_digest": "29c4fd393e45ffc41f03fed7a988c395", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 14037, "upload_time": "2017-10-05T19:00:33", "url": "https://files.pythonhosted.org/packages/b2/5f/a11fbf960a20be7d49ae2bc2915b4bf0618cd0bb477f9afb8461fd005a9c/diskhash-0.2.tar.gz" } ], "0.2.1": [ { "comment_text": "", "digests": { "md5": "6733a0a7075b3dfc9863e1f08f1f6ad8", "sha256": "002e5f1c0656770e7dd2b55b38b07bc27a3b5daae172f206baea1f1c7cd5206e" }, "downloads": -1, "filename": "diskhash-0.2.1.tar.gz", "has_sig": false, "md5_digest": "6733a0a7075b3dfc9863e1f08f1f6ad8", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 14248, "upload_time": "2017-10-13T09:33:19", "url": "https://files.pythonhosted.org/packages/77/7e/4ab51713941a6d9a20150d4affe3588364374f929c71fe9da1c4f45dbd6c/diskhash-0.2.1.tar.gz" } ], "0.2.2": [ { "comment_text": "", "digests": { "md5": "d02d53227c04fc1910cb1dd59ed260a8", "sha256": "488100384f046bdc5c791626437ddc511b65866f5819c14cd227f257bdb95c9c" }, "downloads": -1, "filename": "diskhash-0.2.2.tar.gz", "has_sig": false, "md5_digest": "d02d53227c04fc1910cb1dd59ed260a8", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 14447, "upload_time": "2017-10-23T07:37:05", "url": "https://files.pythonhosted.org/packages/76/f2/6473ca2e99a604d49de621e07ac343a675484cbc06d28a26f56853d57864/diskhash-0.2.2.tar.gz" } ], "0.2.3": [ { "comment_text": "", "digests": { "md5": "7c0dec078783643d46b5c86c55834e4b", 
"sha256": "3dc6e4fb29eaf11f17dc2c2996628d50a17964c64a9efa9762b9bb456e2d3d1a" }, "downloads": -1, "filename": "diskhash-0.2.3.tar.gz", "has_sig": false, "md5_digest": "7c0dec078783643d46b5c86c55834e4b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 14357, "upload_time": "2017-10-24T17:07:35", "url": "https://files.pythonhosted.org/packages/eb/9a/210309c826967cb75a7412f35d8e58e4cf2b502af0efff60d5e4d22654e8/diskhash-0.2.3.tar.gz" } ], "0.3.0": [ { "comment_text": "", "digests": { "md5": "e34e1721fee06400432275d48d07f9f0", "sha256": "d1d02f8fbdeaef5e58f4f6370c5d127d58c00fc6fab7c4bbb1efed104a8aff11" }, "downloads": -1, "filename": "diskhash-0.3.0.tar.gz", "has_sig": false, "md5_digest": "e34e1721fee06400432275d48d07f9f0", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 14616, "upload_time": "2017-11-09T14:00:05", "url": "https://files.pythonhosted.org/packages/ef/38/cd0ddba69f8b81e259914d37a5108292cc4851b51fff0f55314bc8f86d08/diskhash-0.3.0.tar.gz" } ], "0.3.1": [ { "comment_text": "", "digests": { "md5": "c4a97dfd09a8ebe9b0287582f63db35d", "sha256": "a07792fa283822588b6053586a755ac2c94bbd00a0dcfdc70abd2d8046916c4e" }, "downloads": -1, "filename": "diskhash-0.3.1.tar.gz", "has_sig": false, "md5_digest": "c4a97dfd09a8ebe9b0287582f63db35d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 18072, "upload_time": "2017-11-09T14:14:22", "url": "https://files.pythonhosted.org/packages/55/ee/4de0075b466255a1d7965f96fdbff662a34d1f83bc1cf691b8639b47964e/diskhash-0.3.1.tar.gz" } ], "0.4.0": [ { "comment_text": "", "digests": { "md5": "f6a2b4832d5dae4f05c0d0b1a6fe6e7e", "sha256": "f59622ec007b9c91f94c7257b9ae365dc0ad9f01a134c6d3af19b0a56e67da6f" }, "downloads": -1, "filename": "diskhash-0.4.0.tar.gz", "has_sig": false, "md5_digest": "f6a2b4832d5dae4f05c0d0b1a6fe6e7e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 18572, 
"upload_time": "2017-11-27T18:12:30", "url": "https://files.pythonhosted.org/packages/1e/a4/672ea698c480c44025dd6f82ea2fe1482814d18cc33f0ddac97f8972b860/diskhash-0.4.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "f6a2b4832d5dae4f05c0d0b1a6fe6e7e", "sha256": "f59622ec007b9c91f94c7257b9ae365dc0ad9f01a134c6d3af19b0a56e67da6f" }, "downloads": -1, "filename": "diskhash-0.4.0.tar.gz", "has_sig": false, "md5_digest": "f6a2b4832d5dae4f05c0d0b1a6fe6e7e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 18572, "upload_time": "2017-11-27T18:12:30", "url": "https://files.pythonhosted.org/packages/1e/a4/672ea698c480c44025dd6f82ea2fe1482814d18cc33f0ddac97f8972b860/diskhash-0.4.0.tar.gz" } ] }