{ "info": { "author": "Huy Do", "author_email": "huydhn@gmail.com", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python :: 3" ], "description": "Cuckoo filters\n--------------\n\nA Cuckoo filter is a data structure for probabilistic set-membership\nqueries with a low false positive probability (FPP). As an improvement\nover the classic Bloom filter, items can be added or removed into\nCuckoo filters at will. A Cuckoo filter also utilizes space more\nefficiently.\n\nCuckoo filters were originally described in:\n Fan, B., Andersen, D. G., Kaminsky, M., & Mitzenmacher, M. D. (2014, December).\n Cuckoo filter: Practically better than bloom.\n In Proceedings of the 10th ACM International on Conference on emerging Networking\n Experiments and Technologies (pp. 75-88). ACM.\n\nThis package implements the basic Cuckoo filter as well as several\nderivations built on top of it.\n\nInstallation\n------------\n\nThe package can be installed from\n[PyPI](https://pypi.org/project/scalable-cuckoo-filter)\n\n```\npip install scalable-cuckoo-filter\n```\n\nClassic Cuckoo filter\n---------------------\n\n```python\nimport math\n\nfrom random import randrange\nfrom cuckoo.filter import CuckooFilter\n\ncapacity = 1000000\nerror_rate = 0.000001\n# Create a classic Cuckoo filter with a fixed capacity of 1000000\n# buckets\ncuckoo = CuckooFilter(capacity=capacity, error_rate=error_rate)\n\nbucket_size = 6\n# Setting the bucket size is optional, the bigger the bucket,\n# the more number of items a filter can hold, and the longer\n# the fingerprint needs to be to stay at the same error rate\ncuckoo = CuckooFilter(capacity=capacity, error_rate=error_rate, bucket_size=bucket_size)\n\n# The fingerprint length is computed using the following formula:\nfingerprint_size = int(math.ceil(math.log(1.0 / error_rate, 2) + math.log(2 * bucket_size, 2)))\n\nfor _ in range(1, 100000):\n item = str(randrange(1, 1000000000))\n cuckoo.insert(item)\n\n if cuckoo.contains(item):\n print '{} has been added'.format(item)\n\n cuckoo.delete(item)\n\n if not cuckoo.contains(item):\n print '{} has been removed'.format(item)\n```\n\nBitarray Cuckoo filter\n----------------------\n\nA classic Cuckoo filter is implemented using a Python list of Bucket\nobjects. Each Bucket, again, stores a fixed list of fingerprints. The\nimplementation is easy and straightforward but unnecessary wasteful\nfor applications that needs to keep hundreds of millions of items.\n\nThe bitarray Cuckoo filter is built to deal with such situation. The\nwhole filter is compressed into a bitarray to minimize memory usage.\nFor example, a bitarray Cuckoo filter with capacity of 100.000.000,\nbucket size of 4, and error rate of 0.000001 will requires:\n\n- 23-bit fingerprint, computed using the above formula.\n- 9.200.000.000 bits = 1.08 GB = capacity * bucket size * fingerprint.\n\nAnd it can theoretically store up to 400.000.000 items at full capacity.\n\n```python\nimport math\n\nfrom random import randrange\nfrom cuckoo.filter import BCuckooFilter\n\ncapacity = 1000000\nerror_rate = 0.000001\n# Create a bit array Cuckoo filter with a fixed capacity of 1000000\n# buckets\ncuckoo = BCuckooFilter(capacity=capacity, error_rate=error_rate)\n\nbucket_size = 6\n# Setting the bucket size is optional, the bigger the bucket,\n# the more number of items a filter can hold, and the longer\n# the fingerprint needs to be to stay at the same error rate\ncuckoo = BCuckooFilter(capacity=capacity, error_rate=error_rate, bucket_size=bucket_size)\n\n# The fingerprint length is computed using the following formula:\nfingerprint_size = int(math.ceil(math.log(1.0 / error_rate, 2) + math.log(2 * bucket_size, 2)))\n\nfor _ in range(1, 100000):\n item = str(randrange(1, 1000000000))\n cuckoo.insert(item)\n\n if cuckoo.contains(item):\n print '{} has been added'.format(item)\n\n cuckoo.delete(item)\n\n if not cuckoo.contains(item):\n print '{} has been removed'.format(item)\n```\n\nScalable Cuckoo filter\n----------------------\n\nHaving a fix capacity is a problem when using Cuckoo filter in practice.\nAllocating too much capacity and it goes to waste while allocating too\nlittle and it degrades the filter performance and causes FP. Therefore,\nthe scalable Cuckoo filter is introduced as an attempt to solve the fix\ncapacity issue.\n\nInspired by Scalable Bloom filter, Scalable Cuckoo filter utilizes\nmultiple filters to scale the capacity dynamically. When an existing\nfilter approaches its capacity, a new one, double in size, will be\ncreated. A scalable Cuckoo filter will handle all usual operations\nseamlessly and transparently.\n\nInternally, scalable Cuckoo filter uses bitarray Cuckoo filter for\nefficiency although it can be changed easily.\n\n```python\nimport math\n\nfrom random import randrange\nfrom cuckoo.filter import ScalableCuckooFilter\n\ninitial_capacity = 1000000\nerror_rate = 0.000001\n# Create a scalable Cuckoo filter with an initial capacity of 1000000\n# buckets\ncuckoo = ScalableCuckooFilter(initial_capacity=initial_capacity, error_rate=error_rate)\n\nbucket_size = 6\n# Setting the bucket size is optional, the bigger the bucket,\n# the more number of items a filter can hold, and the longer\n# the fingerprint needs to be to stay at the same error rate\ncuckoo = ScalableCuckooFilter(initial_capacity=initial_capacity, error_rate=error_rate, bucket_size=bucket_size)\n\n# The fingerprint length is computed using the following formula:\nfingerprint_size = int(math.ceil(math.log(1.0 / error_rate, 2) + math.log(2 * bucket_size, 2)))\n\nfor _ in range(1, 100000):\n item = str(randrange(1, 1000000000))\n cuckoo.insert(item)\n\n if cuckoo.contains(item):\n print '{} has been added'.format(item)\n\n cuckoo.delete(item)\n\n if not cuckoo.contains(item):\n print '{} has been removed'.format(item)\n```\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/huydhn/cuckoo-filter", "keywords": "", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "scalable-cuckoo-filter", "package_url": "https://pypi.org/project/scalable-cuckoo-filter/", "platform": "", "project_url": "https://pypi.org/project/scalable-cuckoo-filter/", "project_urls": { "Homepage": "https://github.com/huydhn/cuckoo-filter" }, "release_url": "https://pypi.org/project/scalable-cuckoo-filter/1.1/", "requires_dist": [ "mmh3", "bitarray" ], "requires_python": "", "summary": "Scalable Cuckoo filter", "version": "1.1" }, "last_serial": 4246996, "releases": { "1.0": [ { "comment_text": "", "digests": { "md5": "d9e8de60c16565963eb38cbe7d8e0d98", "sha256": "4d3c79f3c18cf5ad4ead66b6a3c6a08605cb90f88b8640173c36808348700dc5" }, "downloads": -1, "filename": "scalable_cuckoo_filter-1.0-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "d9e8de60c16565963eb38cbe7d8e0d98", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 10278, "upload_time": "2018-09-06T07:29:24", "url": "https://files.pythonhosted.org/packages/a3/59/5c546eb524573b245bfa1e314b00fe7f16aa9729cfadb53eb84fb5f84c8b/scalable_cuckoo_filter-1.0-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "5f82a9ffe3bece448280807546971794", "sha256": "ceefc52ea718679143071ca7ac1d8b501b7a42bad78fddb088493078d56ee8a7" }, "downloads": -1, "filename": "scalable-cuckoo-filter-1.0.tar.gz", "has_sig": false, "md5_digest": "5f82a9ffe3bece448280807546971794", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 9076, "upload_time": "2018-09-06T07:29:25", "url": "https://files.pythonhosted.org/packages/61/04/d0672dbc4e69075543adf9d2173a8fd5397fb73d4cd4464569ac7ff9cebb/scalable-cuckoo-filter-1.0.tar.gz" } ], "1.1": [ { "comment_text": "", "digests": { "md5": "a5168fb13c5a27b38638d324ce18060f", "sha256": "c05e6e2eb1c74736e8fa7cf5c8049df67bd9b1f81d961a14fb231d8553636dd5" }, "downloads": -1, "filename": "scalable_cuckoo_filter-1.1-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "a5168fb13c5a27b38638d324ce18060f", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 10484, "upload_time": "2018-09-07T03:46:09", "url": "https://files.pythonhosted.org/packages/e4/0b/a65d3820bae71fe3724272e30d0ca0f25853ab8458aefaf9a867267d6f33/scalable_cuckoo_filter-1.1-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "282ea13c99b0ef525caa9ba4fecbea8a", "sha256": "73053812b16a41c7c3c4312b3dc1fb9eb8ae71905493decb062ec538e2a13054" }, "downloads": -1, "filename": "scalable-cuckoo-filter-1.1.tar.gz", "has_sig": false, "md5_digest": "282ea13c99b0ef525caa9ba4fecbea8a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 9003, "upload_time": "2018-09-07T03:46:11", "url": "https://files.pythonhosted.org/packages/08/f7/bcbcf03f662d37746b48ea15c2bcd0cf8c99207a1bb16f3c67d0d25b256e/scalable-cuckoo-filter-1.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "a5168fb13c5a27b38638d324ce18060f", "sha256": "c05e6e2eb1c74736e8fa7cf5c8049df67bd9b1f81d961a14fb231d8553636dd5" }, "downloads": -1, "filename": "scalable_cuckoo_filter-1.1-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "a5168fb13c5a27b38638d324ce18060f", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 10484, "upload_time": "2018-09-07T03:46:09", "url": "https://files.pythonhosted.org/packages/e4/0b/a65d3820bae71fe3724272e30d0ca0f25853ab8458aefaf9a867267d6f33/scalable_cuckoo_filter-1.1-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "282ea13c99b0ef525caa9ba4fecbea8a", "sha256": "73053812b16a41c7c3c4312b3dc1fb9eb8ae71905493decb062ec538e2a13054" }, "downloads": -1, "filename": "scalable-cuckoo-filter-1.1.tar.gz", "has_sig": false, "md5_digest": "282ea13c99b0ef525caa9ba4fecbea8a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 9003, "upload_time": "2018-09-07T03:46:11", "url": "https://files.pythonhosted.org/packages/08/f7/bcbcf03f662d37746b48ea15c2bcd0cf8c99207a1bb16f3c67d0d25b256e/scalable-cuckoo-filter-1.1.tar.gz" } ] }