{ "info": { "author": "", "author_email": "", "bugtrack_url": null, "classifiers": [], "description": "## Less Hash Bloom Filter\n[![Build Status](https://travis-ci.com/garawalid/LH-BloomFilter.svg?branch=master)](https://travis-ci.com/garawalid/LH-BloomFilter)\n\nLess Hash Bloom Filter is fast bloom filter suitable for Big Data. \n\nThe computation of hash functions and checking the existence of an element is a major computation overhead. \nAlso, bloom filter requires multiple independent hash functions, and well-designed hash functions are computation-intensive like MD5, SHA-1 [1]. \n\nIn this implementation, we use a different technique to generate the k hash functions from only two. Therefore, the bloom filter is fast.\n\n### Install\n\nInstall Less Hash Bloom Filter with pip as follows:\n```\n$ pip install LessHash-BloomFilter\n```\n\n### Usage\n\nLHBF needs to know the size of bloom filter `m` and number of hash functions `k`.\n\n**Note:** You should use high `m` to avoid the collision of hash functions. The probability of two random strings colliding is ~ 1/m\n\n\n\n```python\nfrom lhbf import BloomFilter\n\n# Create a bloom filter \nbf = BloomFilter(m=200, k=2)\n\n# Add an element\nbf.add(\"a\")\n\n# Check if element exists\nbf.might_contain(\"a\")\n\n# Estimate flase positive probability \nbf.estimate_fpp()\n\n# Combine two bloom filters\nbf2 = BloomFilter(m=200, k=2)\nbf.combine(bf2)\n\n```\n\n### Details\n\n+ Hash functions used: \n + For integer, we use **Knuth multiplicative hash** [2]\n + For string, we use **polynomial rolling hash function** [3]\n\n+ k hash functions: \n\n Using two hash functions, we calculate the k hash functions as follows: \n \n *gi(x) = h1(x) + i x h2(x) mod m, where 0 ≤ i ≤ k-1* \n \n It has been proved that using this method does not increase the asymptotic false positive probability [4].\n\n\n### Contributing\nYou're welcome to submit pull requests with any changes for this repository at any time. I'll be very glad to see any contributions.\n\n\n### References\n\n+ [1] Luo, Lailong, et al. Optimizing bloom filter: challenges, solutions, and comparisons. IEEE Communications Surveys & Tutorials (2018). \n+ [2] Knuth, Donald Ervin. The art of computer programming: sorting and searching. Vol. 3. Pearson Education, 1997.\n+ [3] Karp, Richard M., and Michael O. Rabin. Efficient randomized pattern-matching algorithms. IBM journal of research and development 31.2 (1987): 249-260. \n+ [4] Kirsch, Adam, and Michael Mitzenmacher. Less hashing, same performance: building a better bloom filter. European Symposium on Algorithms. Springer, Berlin, Heidelberg, 2006.", "description_content_type": "text/markdown", "docs_url": null, "download_url": "https://github.com/garawalid/LH-BloomFilter/archive/v0.0.4.tar.gz", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/garawalid/LH-BloomFilter", "keywords": "", "license": "Apache License v2", "maintainer": "Walid Gara", "maintainer_email": "", "name": "LessHash-BloomFilter", "package_url": "https://pypi.org/project/LessHash-BloomFilter/", "platform": "", "project_url": "https://pypi.org/project/LessHash-BloomFilter/", "project_urls": { "Download": "https://github.com/garawalid/LH-BloomFilter/archive/v0.0.4.tar.gz", "Homepage": "https://github.com/garawalid/LH-BloomFilter" }, "release_url": "https://pypi.org/project/LessHash-BloomFilter/0.0.5/", "requires_dist": null, "requires_python": "", "summary": "Fast Bloom Filter", "version": "0.0.5" }, "last_serial": 5437315, "releases": { "0.0.2": [ { "comment_text": "", "digests": { "md5": "352569f2bfa7d6ecceef99cd0a995a95", "sha256": "c765ffc67104ecec4fcceebaeb82c72baae0e17c3263d5a281c6a4e2136f5d9a" }, "downloads": -1, "filename": "LessHash-BloomFilter-0.0.2.tar.gz", "has_sig": false, "md5_digest": "352569f2bfa7d6ecceef99cd0a995a95", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3712, "upload_time": "2019-06-07T19:27:54", "url": "https://files.pythonhosted.org/packages/27/a4/707de323d0b5ccadaa5ae7e68f4e67763f3dc161b684477a9f96d25f3dc5/LessHash-BloomFilter-0.0.2.tar.gz" } ], "0.0.3": [ { "comment_text": "", "digests": { "md5": "07d251c783bbe36a4a7991ee1d6ff142", "sha256": "d032e8f198a3598843ec80d0f1940c213fdcefaaca2160e4ec7235e8c5c06858" }, "downloads": -1, "filename": "LessHash-BloomFilter-0.0.3.tar.gz", "has_sig": false, "md5_digest": "07d251c783bbe36a4a7991ee1d6ff142", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3885, "upload_time": "2019-06-07T19:36:35", "url": "https://files.pythonhosted.org/packages/92/03/32e6762159c5b71ff39e082e9555c337fb2fbf62dad4d593b113ab46b3ff/LessHash-BloomFilter-0.0.3.tar.gz" } ], "0.0.4": [ { "comment_text": "", "digests": { "md5": "c13e15dc3cb068ec2bead18d4b225f2b", "sha256": "8f9199cd6e360bf2bd300d32b2c08267ef4b347e1f820b09fb491413f1c1a150" }, "downloads": -1, "filename": "LessHash-BloomFilter-0.0.4.tar.gz", "has_sig": false, "md5_digest": "c13e15dc3cb068ec2bead18d4b225f2b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3908, "upload_time": "2019-06-07T19:49:05", "url": "https://files.pythonhosted.org/packages/c9/4f/6af543aca46e838c2dd5a775025aaf2cc4ca319fd1969148744782c1ca0f/LessHash-BloomFilter-0.0.4.tar.gz" } ], "0.0.5": [ { "comment_text": "", "digests": { "md5": "0416ad2e322e36317a6c800967ed6813", "sha256": "a2ff6906ec78dbe3106e803ba2f2a724cb80e578cf3aaaf94ee8362ab59cefeb" }, "downloads": -1, "filename": "LessHash-BloomFilter-0.0.5.tar.gz", "has_sig": false, "md5_digest": "0416ad2e322e36317a6c800967ed6813", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4218, "upload_time": "2019-06-23T13:14:53", "url": "https://files.pythonhosted.org/packages/07/8c/c47a53b548b4ea3099ccc951faead48f00f54224c4350630ae208f80afea/LessHash-BloomFilter-0.0.5.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "0416ad2e322e36317a6c800967ed6813", "sha256": "a2ff6906ec78dbe3106e803ba2f2a724cb80e578cf3aaaf94ee8362ab59cefeb" }, "downloads": -1, "filename": "LessHash-BloomFilter-0.0.5.tar.gz", "has_sig": false, "md5_digest": "0416ad2e322e36317a6c800967ed6813", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4218, "upload_time": "2019-06-23T13:14:53", "url": "https://files.pythonhosted.org/packages/07/8c/c47a53b548b4ea3099ccc951faead48f00f54224c4350630ae208f80afea/LessHash-BloomFilter-0.0.5.tar.gz" } ] }