{ "info": { "author": "nitxiodev", "author_email": "smnitxio@gmail.com", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: MIT License", "Operating System :: POSIX :: Linux", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6" ], "description": "# python-bloomfilter\nScalable bloom filter using different backends written in Python. Current version **only** works with Python 3.\n\n# Installation\n```bash\npip install BloomFilterPy\n```\n# Backends\n\nCurrently, BloomFilterPy has the following backends available: `numpy`, `bitarray` and `redis`. The first two are recommended when the expected number of elements in the filter fit in memory. Redis backend is the preferred when:\n\n- Expect huge amount of data in the filter that it doesn't fit in memory.\n- You want a distributed filter available (i.e. more than one machine). Thanks to lua scripts, now is possible to take advantage of redis atomic operations in the server side and share the same filter across multiple machines. \n\n# Usage & API\n\nBloomFilterPy implements a common API regardless of the backend used. Every backend extends `BaseBackend` class that implements the common API. In turn, this base class extends the default `set` class of Python, but just `add` and `len` operations are properly handled.\n\n## `BloomFilter` class\n\n- `max_number_of_element_expected`: Size of filter. Number of elements it will contain.\n- `error_rate`: rate of error you're willing to assume. Default is **0.0005**.\n- `backend`: `numpy`, `bitarray` or `redis`. Default is **numpy**.\n- Only applies with `redis` backend:\n - `redis_connection`: url for redis connection as accepted by redis-py.\n - `connection_retries`: max number of connection retries in case of losing the connection with redis. Default is **3**.\n - `wait`: max waiting time before trying to make a new request against redis. \n - `prefix_key`: key used in redis to store bloom filter data. Default is **bloom_filter**.\n\n## API\n\n- `add(element)`: add a new element in the filter.\n- `full`: property that indicates if the filter is full.\n- `false_positive_probability`: property that indicates current and updated error rate of the filter. This value should match with choosed error_rate when BloomFilterPy was instanciated, but as new items are added, this value will change.\n- `reset()`: purge every element from the filter. In the case of bitarray or numpy, after calling `reset()` it is possible to keep using the filter. However, with redis backend, once `reset()` is called, you **must** reinstantiate the filter.\n- `len`: get the length of the filter (i.e. number of elements).\n\n## Local Example\n\n```python\nfrom pybloom import BloomFilter\n\nif __name__ == '__main__':\n f = BloomFilter(10, error_rate=0.0000003, backend='bitarray') # or backend='numpy'\n\n for i in range(10):\n f.add(i) # or f += i\n assert i in f\n\n print(f.false_positive_probability, 11 in f) # 6.431432780588261e-07 False\n```\n\nIn the example above, we have created a bloom filter using `bitarray` backend, with `10` expected elements and max false probability assumed of `0.0000003`.\n\n## Redis example\nIn order to build a bloom filter in redis, `BloomFilterPy` with `RedisBackend` will do all the work for you. The first process that wins the distributed lock, will be the responsible to initialize the filter. \n```python\nfrom pybloom import BloomFilter\n\nif __name__ == '__main__':\n f = BloomFilter(10, error_rate=0.0000003, backend='redis', redis_connection='redis://localhost:6379/0')\n\n for i in range(10):\n f.add(i) # or f += i\n assert i in f\n\n print(f.false_positive_probability, 11 in f) # 6.431432780588261e-07 False\n```\nOnce the filter is initiallized, if you **don't** change the `prefix_key` in `BloomFilter` object and current `prefix_key` already exists, `BloomFilterPy` will reuse it in a distributed fashion. In this case, `max_number_of_element_expected` and `error_rate` are ignored, but for compatibility with the rest of the backends, it is mandatory to set them up.\n\n# How can I extend it?\n\nIf you install this library from sources and are interested in build a new backend, like MongoBackend or FileSystemBackend for example, is very simple. You just need extend your new backend from:\n\n- `ThreadBackend`: if you want to develop a **local** thread-safe backend, like FileSystemBackend. This backend exposes a `lock` property to use it when a new item is added or the filter is reset.\n- `SharedBackend`: if you want to develop a **shared** backend across several machines, like DatabaseBackend.\n\nand implement the following methods:\n\n- `_add(*args, **kwargs)`: this method specify the way of adding new elements in the filter using the backend.\n- `reset()`: this method is used to delete or purge **every** element from the filter.\n- `__contains__`: this method returns the length of the filter using `_capacity` private variable (i.e. number of elements).\n\nBesides, `__init__` method **must** have three parameters to define the array size, optimal hash and filter_size. For convention, `array_size: int`, `optimal_hash: int` and `filter_size: int` are used.\n\nFor example, in a hypothetical MongoBackend, the skeleton would be something similar to:\n\n```python\nclass MongoBackend(SharedBackend):\n def __init__(self, array_size: int, optimal_hash: int, **kwargs):\n # In kwargs you can put mongodb connection details, like host, port and so on.\n\n self._mongo_connection = MongoClient(**kwargs)\n super(MongoBackend, self).__init__(array_size, hash_size)\n\n def _add(self, other):\n # perform hashing functions of other and save it in mongo using mongo_connection\n\n def reset(self):\n # purge bloom filter using mongo_connection\n\n def __contains__(self, item):\n # check if item is present in the filter\n```\n\nOnce you have a new backend ready, you **must** add it into BloomFilter factory class:\n\n```python\nclass BloomFilter(object):\n def __new__(cls, max_number_of_element_expected: int, error_rate=.0005, backend='numpy', **kwargs):\n ...\n\n elif backend == 'MongoBackend':\n return MongoBackend(filter_metadata.optimal_size, filter_metadata.optimal_hash, max_number_of_element_expected, **kwargs)\n\n ...\n```\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/nitxiodev/python-bloomfilter", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "BloomFilterPy", "package_url": "https://pypi.org/project/BloomFilterPy/", "platform": "", "project_url": "https://pypi.org/project/BloomFilterPy/", "project_urls": { "Homepage": "https://github.com/nitxiodev/python-bloomfilter" }, "release_url": "https://pypi.org/project/BloomFilterPy/1.1/", "requires_dist": [ "redis (==2.10.6)", "psutil (==5.4.8)", "mmh3 (==2.5.1)", "bitarray (==0.8.3)", "numpy (==1.15.4)" ], "requires_python": "", "summary": "Scalable bloom filter using different backends written in Python", "version": "1.1" }, "last_serial": 4766056, "releases": { "1.0.0": [ { "comment_text": "", "digests": { "md5": "b530f13fa4ef523b2e80d593ffce3096", "sha256": "2f6f879bcbcff2cff0a42ca890aafc4a3884aff192a4116d0eeefef5fcdb457c" }, "downloads": -1, "filename": "BloomFilterPy-1.0.0-py3-none-any.whl", "has_sig": false, "md5_digest": "b530f13fa4ef523b2e80d593ffce3096", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 9652, "upload_time": "2019-01-02T00:20:49", "url": "https://files.pythonhosted.org/packages/51/13/41f5852a06f5f6956b9bb045b77eb14f90739bea78ce7beee6a242147ab7/BloomFilterPy-1.0.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "52afcd6b670dc35f8f0f5d4578ddffbd", "sha256": "0133fdd715e663216f9062df2eb1e0868de8803851ba9579b283585946811785" }, "downloads": -1, "filename": "BloomFilterPy-1.0.0.tar.gz", "has_sig": false, "md5_digest": "52afcd6b670dc35f8f0f5d4578ddffbd", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 6027, "upload_time": "2019-01-02T00:20:50", "url": "https://files.pythonhosted.org/packages/39/1c/39c143a6a2db82546bcbea1a07d7e9baabb5b2b6ea955f40f083936c06a8/BloomFilterPy-1.0.0.tar.gz" } ], "1.0.1": [ { "comment_text": "", "digests": { "md5": "5cd908d3d0e9d5a95342d6bafc5a0e31", "sha256": "5980682502182bed6a71cd6f1cbeb1f2eec081bb6ff9114c234ccdf1059abb5e" }, "downloads": -1, "filename": "BloomFilterPy-1.0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "5cd908d3d0e9d5a95342d6bafc5a0e31", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 9804, "upload_time": "2019-01-02T00:41:26", "url": "https://files.pythonhosted.org/packages/41/33/9c049abf135929d2550cafdd24cc2595e150bc97b4e61ca197635a3493ea/BloomFilterPy-1.0.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "5539b482e3e0131e999309b69f6a5304", "sha256": "b5dc96fd107835dbb41fca411df51f6befb4755f399a8b4058f6dc45e73e36fc" }, "downloads": -1, "filename": "BloomFilterPy-1.0.1.tar.gz", "has_sig": false, "md5_digest": "5539b482e3e0131e999309b69f6a5304", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 6127, "upload_time": "2019-01-02T00:41:28", "url": "https://files.pythonhosted.org/packages/36/17/824712e9be0f086ce3107fa18db4843145b09f61778cccd1ec2234d9d150/BloomFilterPy-1.0.1.tar.gz" } ], "1.0.2": [ { "comment_text": "", "digests": { "md5": "7e4618ac1da279ce77572346a09108e5", "sha256": "a76cb7b187fdaf897ffbca5042b58870ebb1cc87524b007a93e04341f94d082d" }, "downloads": -1, "filename": "BloomFilterPy-1.0.2-py3-none-any.whl", "has_sig": false, "md5_digest": "7e4618ac1da279ce77572346a09108e5", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 11393, "upload_time": "2019-01-20T00:22:12", "url": "https://files.pythonhosted.org/packages/d8/32/4688211c48e92147ee6e0abb31d0594305270213104a6839a5fa9791eea6/BloomFilterPy-1.0.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "7bac7a5e8f8edc3395b50074fe56715d", "sha256": "ac03e255376268189776877bf003e1d181a7ea02eec2dd8760e069e962738a08" }, "downloads": -1, "filename": "BloomFilterPy-1.0.2.tar.gz", "has_sig": false, "md5_digest": "7bac7a5e8f8edc3395b50074fe56715d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 7955, "upload_time": "2019-01-20T00:22:13", "url": "https://files.pythonhosted.org/packages/23/24/cc53acfbd2d51530b1cd3b5b325acbc6b73f5b1958b39e89e7050869e3fe/BloomFilterPy-1.0.2.tar.gz" } ], "1.1": [ { "comment_text": "", "digests": { "md5": "f676d1cfbc19ef2777dd00e0b7e9efa6", "sha256": "f396622fb48f6770af0b7a65f0c05eebdac690d5fa445b4baa2b6e08d24e5509" }, "downloads": -1, "filename": "BloomFilterPy-1.1-py3-none-any.whl", "has_sig": false, "md5_digest": "f676d1cfbc19ef2777dd00e0b7e9efa6", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 15360, "upload_time": "2019-02-01T00:09:59", "url": "https://files.pythonhosted.org/packages/0c/80/a0e895aba61c817a97a94c002a124c047643c2b7d4caf206ea39855b1605/BloomFilterPy-1.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "c94efde99645f6da37b741d5cbba02de", "sha256": "b9611d9ef08d8bd5c0878233f0e25d667df0e639f7efa5f48a5b478474bd0659" }, "downloads": -1, "filename": "BloomFilterPy-1.1.tar.gz", "has_sig": false, "md5_digest": "c94efde99645f6da37b741d5cbba02de", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 11585, "upload_time": "2019-02-01T00:10:01", "url": "https://files.pythonhosted.org/packages/3c/9c/0208d2ed0879af43e582580da7154f30999265ad732f2c00bf0e44d74829/BloomFilterPy-1.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "f676d1cfbc19ef2777dd00e0b7e9efa6", "sha256": "f396622fb48f6770af0b7a65f0c05eebdac690d5fa445b4baa2b6e08d24e5509" }, "downloads": -1, "filename": "BloomFilterPy-1.1-py3-none-any.whl", "has_sig": false, "md5_digest": "f676d1cfbc19ef2777dd00e0b7e9efa6", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 15360, "upload_time": "2019-02-01T00:09:59", "url": "https://files.pythonhosted.org/packages/0c/80/a0e895aba61c817a97a94c002a124c047643c2b7d4caf206ea39855b1605/BloomFilterPy-1.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "c94efde99645f6da37b741d5cbba02de", "sha256": "b9611d9ef08d8bd5c0878233f0e25d667df0e639f7efa5f48a5b478474bd0659" }, "downloads": -1, "filename": "BloomFilterPy-1.1.tar.gz", "has_sig": false, "md5_digest": "c94efde99645f6da37b741d5cbba02de", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 11585, "upload_time": "2019-02-01T00:10:01", "url": "https://files.pythonhosted.org/packages/3c/9c/0208d2ed0879af43e582580da7154f30999265ad732f2c00bf0e44d74829/BloomFilterPy-1.1.tar.gz" } ] }