{ "info": { "author": "Michael Axiak", "author_email": "mike@axiak.net", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "License :: OSI Approved :: BSD License", "Operating System :: OS Independent", "Programming Language :: Python", "Topic :: Software Development :: Libraries" ], "description": "===========================================\nfuzzyset - A fuzzy string set for python.\n===========================================\n\nfuzzyset is a data structure that performs something akin to fulltext search\nagainst data to find likely misspellings and approximate string matches.\n\nUsage\n-----\n\nThe usage is simple. Just add a string to the set, and ask for it later\nby using either ``.get`` or ``[]``::\n\n >>> a = fuzzyset.FuzzySet()\n >>> a.add(\"michael axiak\")\n >>> a.get(\"micael asiak\")\n [(0.8461538461538461, u'michael axiak')]\n\nThe result will be a list of ``(score, matched_value)`` tuples.\nThe score is between 0 and 1, with 1 being a perfect match.\n\nFor a roughly 15% performance increase, there is also a Cython-implemented\nversion called ``cfuzzyset``. So you can write the following, akin to\n``cStringIO`` and ``cPickle``::\n\n try:\n     from cfuzzyset import cFuzzySet as FuzzySet\n except ImportError:\n     from fuzzyset import FuzzySet\n\nConstruction Arguments\n----------------------\n\n - iterable: An iterable that yields strings to initialize the data structure with\n - gram_size_lower: The lower bound of gram sizes to use, inclusive (see Theory of operation). Default: 2\n - gram_size_upper: The upper bound of gram sizes to use, inclusive (see Theory of operation). Default: 3\n - use_levenshtein: Whether or not to use the levenshtein distance to determine the match scoring. Default: True\n\nTheory of operation\n-------------------\n\nAdding to the data structure\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nFirst, let's look at adding a string, 'michaelich', to an empty set. 
We first break apart the string into n-grams (strings of length\nn). So trigrams of 'michaelich' would look like::\n\n '-mi'\n 'mic'\n 'ich'\n 'cha'\n 'hae'\n 'ael'\n 'eli'\n 'lic'\n 'ich'\n 'ch-'\n\nNote that fuzzyset will first normalize the string by removing non-word characters except for spaces and commas and forcing\neverything to lowercase.\n\nNext, fuzzyset essentially creates a reverse index on those grams, maintaining a dictionary that says::\n\n 'mic' -> (1, 0)\n 'ich' -> (2, 0)\n ...\n\nAnd there's a list that looks like::\n\n [(3.31, 'michaelich')]\n\nNote that we maintain this reverse index for *all* grams from ``gram_size_lower`` to ``gram_size_upper`` in the constructor.\nThis becomes important in a second.\n\nRetrieving\n~~~~~~~~~~\n\nTo search the data structure, we take the n-grams of the query string and perform a reverse index lookup. To illustrate,\nlet's consider looking up ``'michael'`` in our fictitious set containing ``'michaelich'`` where the ``gram_size_upper``\nand ``gram_size_lower`` parameters are default (3 and 2 respectively).\n\nWe begin by considering all trigrams (the value of ``gram_size_upper``). Those grams are::\n\n '-mi'\n 'mic'\n 'ich'\n 'cha'\n 'hae'\n 'ael'\n 'el-'\n\nThen we create a list of any element in the set that has *at least one* occurrence of a trigram listed above. Note that\nthis is just a dictionary lookup 7 times. For each of these matched elements, we compute the `cosine similarity`_ between\neach element and the query string. 
We then sort to get the most similar matched elements.\n\nIf ``use_levenshtein`` is false, then we return all top matched elements with the same cosine similarity.\n\nIf ``use_levenshtein`` is true, then we truncate the possible search space to 50, compute a score based on the levenshtein\ndistance (so that we handle transpositions), and return based on that.\n\nIn the event that none of the trigrams matched, we try the whole thing again with bigrams (note though that if there are no matches,\nthe failure to match will be quick). Bigram searching will always be slower because there will be a much larger set to order.\n\n.. _cosine similarity: http://en.wikipedia.org/wiki/Cosine_similarity\n\n\nInstall\n--------\n\n ``pip install fuzzyset``\n\n\nLicense\n-------\n\nBSD\n\nAuthor\n--------\n\nMike Axiak \n", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/axiak/fuzzyset/", "keywords": "fuzzyset fuzzy data structure", "license": "BSD", "maintainer": "", "maintainer_email": "", "name": "fuzzyset", "package_url": "https://pypi.org/project/fuzzyset/", "platform": "", "project_url": "https://pypi.org/project/fuzzyset/", "project_urls": { "Homepage": "https://github.com/axiak/fuzzyset/" }, "release_url": "https://pypi.org/project/fuzzyset/0.0.19/", "requires_dist": null, "requires_python": "", "summary": "A simple python fuzzyset implementation.", "version": "0.0.19" }, "last_serial": 4901798, "releases": { "0.0.1": [ { "comment_text": "", "digests": { "md5": "9c64f1bf48320608a492d2ca5d4f4b51", "sha256": "447fd45333530a1eca713713b4ee493ccd77a452934ddfde769d071a47c1472a" }, "downloads": -1, "filename": "fuzzyset-0.0.1.tar.gz", "has_sig": false, "md5_digest": "9c64f1bf48320608a492d2ca5d4f4b51", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 2021, "upload_time": "2012-02-10T23:04:01", "url": 
"https://files.pythonhosted.org/packages/da/57/585d0e226fbc47b313a078ae85c75a264bec171b0fec68a1433e1a718434/fuzzyset-0.0.1.tar.gz" } ], "0.0.11": [ { "comment_text": "", "digests": { "md5": "082c4bb15a822d23ade77ac9e084ee5d", "sha256": "0123cc7bccff19a193b5cf474f3498fc998eec60ddc4d01bc24e5e5f84433832" }, "downloads": -1, "filename": "fuzzyset-0.0.11.tar.gz", "has_sig": false, "md5_digest": "082c4bb15a822d23ade77ac9e084ee5d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 318811, "upload_time": "2017-07-13T13:28:46", "url": "https://files.pythonhosted.org/packages/51/13/dfe550a9503a934c9eed3683c97b06c80abb931394e1a0db7519fd2a24be/fuzzyset-0.0.11.tar.gz" } ], "0.0.12": [ { "comment_text": "", "digests": { "md5": "eb0950f55191b88b9c723597d52a66e7", "sha256": "059080462f99c89add2591f043bfc35f2452c1a3b8ce30256032c3f45bf8ec7e" }, "downloads": -1, "filename": "fuzzyset-0.0.12.tar.gz", "has_sig": false, "md5_digest": "eb0950f55191b88b9c723597d52a66e7", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 318767, "upload_time": "2018-07-10T14:48:53", "url": "https://files.pythonhosted.org/packages/a4/9e/ab7f03b75b5beb2272ebc3293abff7cdb6425e1176de0ea43035d9de40b4/fuzzyset-0.0.12.tar.gz" } ], "0.0.13": [ { "comment_text": "", "digests": { "md5": "ffad4755121ed6defda4baf85a0d8f93", "sha256": "18d229979bac2423ea66927fb6bbfe8308f385d7dc69925a56606c4a0c5cfc15" }, "downloads": -1, "filename": "fuzzyset-0.0.13.tar.gz", "has_sig": false, "md5_digest": "ffad4755121ed6defda4baf85a0d8f93", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 363550, "upload_time": "2018-07-12T18:14:11", "url": "https://files.pythonhosted.org/packages/d3/0e/2d010e6860ea357bb6c28d4270d1d6c76cc21ca3e5019ee26fa793e3e207/fuzzyset-0.0.13.tar.gz" } ], "0.0.14": [ { "comment_text": "", "digests": { "md5": "7c9181e496e3d2736108ea387050f7dd", "sha256": 
"ae34949183fcb5b97448097da140f2a5a0c5e020abbeaf075d0405ef0839077c" }, "downloads": -1, "filename": "fuzzyset-0.0.14.tar.gz", "has_sig": false, "md5_digest": "7c9181e496e3d2736108ea387050f7dd", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 318816, "upload_time": "2018-08-15T13:42:55", "url": "https://files.pythonhosted.org/packages/7a/86/39695dfd12d8eb371e5e4e8d2a059a777c581693f1d826cdd7f5cd574183/fuzzyset-0.0.14.tar.gz" } ], "0.0.15": [ { "comment_text": "", "digests": { "md5": "d6e759189c8833f02cc6d88c1f594592", "sha256": "861f156d69bfe22096047b8c07c8cbc78291022f4dabe55b81d2bd8235eb4402" }, "downloads": -1, "filename": "fuzzyset-0.0.15.tar.gz", "has_sig": false, "md5_digest": "d6e759189c8833f02cc6d88c1f594592", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 363418, "upload_time": "2018-08-27T21:42:32", "url": "https://files.pythonhosted.org/packages/92/21/4939957d219ff9f88b3b120061282199eda0ddd7393732a8c15cfcf51253/fuzzyset-0.0.15.tar.gz" } ], "0.0.16": [ { "comment_text": "", "digests": { "md5": "2d338e270fd06493fec4b6bd235d50fa", "sha256": "dce505e52dfd6a791214728b9b6b43f8f58e8f88fcf783c80d759d1c4349a160" }, "downloads": -1, "filename": "fuzzyset-0.0.16.tar.gz", "has_sig": false, "md5_digest": "2d338e270fd06493fec4b6bd235d50fa", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 374036, "upload_time": "2019-01-10T22:49:21", "url": "https://files.pythonhosted.org/packages/80/b4/c202f15029f497539bfb57f641acba5ca63a81d77e69afa767a0111483e5/fuzzyset-0.0.16.tar.gz" } ], "0.0.17": [ { "comment_text": "", "digests": { "md5": "351e25f8ccec987394f1d6553dd27047", "sha256": "6cbfcd93458fe54a773e23634106720c4efb316241887c4402d1d5a4cb1c2f94" }, "downloads": -1, "filename": "fuzzyset-0.0.17.tar.gz", "has_sig": false, "md5_digest": "351e25f8ccec987394f1d6553dd27047", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 374961, 
"upload_time": "2019-01-24T16:18:24", "url": "https://files.pythonhosted.org/packages/38/4a/122c9ba542aad7570a9faeaf857a1030386cec854993e2fb9ae554742f2e/fuzzyset-0.0.17.tar.gz" } ], "0.0.18": [ { "comment_text": "", "digests": { "md5": "bb2e1fd3cd86eca9c184a0aa52d4c25d", "sha256": "09a7488aef3ebeb845702d8652bba64996a3bc0e436163955bf67b0740c817f7" }, "downloads": -1, "filename": "fuzzyset-0.0.18.tar.gz", "has_sig": false, "md5_digest": "bb2e1fd3cd86eca9c184a0aa52d4c25d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 377456, "upload_time": "2019-03-05T20:07:15", "url": "https://files.pythonhosted.org/packages/62/99/771a8508f399bea182c22a83935d884ff59783fb19152cfd90f66688f2d8/fuzzyset-0.0.18.tar.gz" } ], "0.0.19": [ { "comment_text": "", "digests": { "md5": "92a1db7d81a897980c1ec88a91fc8a4f", "sha256": "2bf5a3de20f107124a4842d875e5005ee523719f97ab731caf4121e86ec8ccbc" }, "downloads": -1, "filename": "fuzzyset-0.0.19.tar.gz", "has_sig": false, "md5_digest": "92a1db7d81a897980c1ec88a91fc8a4f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 377735, "upload_time": "2019-03-05T20:24:52", "url": "https://files.pythonhosted.org/packages/2e/78/7509f3efbb6acbcf842d7bdbd9a919ca8c0ed248123bdd8c57f08497e0dd/fuzzyset-0.0.19.tar.gz" } ], "0.0.2": [ { "comment_text": "", "digests": { "md5": "215783fb0e866da902e27c2166744c4b", "sha256": "62dd2d4fa31d02f8dfcd30c29c0d498724f67191fd8542704d69e283af93f089" }, "downloads": -1, "filename": "fuzzyset-0.0.2.tar.gz", "has_sig": false, "md5_digest": "215783fb0e866da902e27c2166744c4b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5220, "upload_time": "2012-02-14T05:14:22", "url": "https://files.pythonhosted.org/packages/b2/15/1dc52d45e992ec80b1ab9429c3605fd8643a75dfb61605a8b4fc3f6c3bef/fuzzyset-0.0.2.tar.gz" } ], "0.0.3": [ { "comment_text": "", "digests": { "md5": "33fc27ccea6249d41de8813743848f9d", "sha256": 
"9852970437ec71285830812ea6dc5ab4a008ad29f27183d7a13e38e3688e08fc" }, "downloads": -1, "filename": "fuzzyset-0.0.3.tar.gz", "has_sig": false, "md5_digest": "33fc27ccea6249d41de8813743848f9d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 319648, "upload_time": "2012-02-14T16:00:52", "url": "https://files.pythonhosted.org/packages/6a/29/6bbc6480627b29f93b87a1cc0f47eee455a368da19902c6f58834b9d1ef0/fuzzyset-0.0.3.tar.gz" } ], "0.0.4": [ { "comment_text": "", "digests": { "md5": "88a528773b2e4254ce308b769b01504d", "sha256": "fc2716bde05e3d1fea1bf385fd4c9b2fcac97c264e8561a68566f19975259fc1" }, "downloads": -1, "filename": "fuzzyset-0.0.4.tar.gz", "has_sig": false, "md5_digest": "88a528773b2e4254ce308b769b01504d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 351992, "upload_time": "2012-02-14T16:07:18", "url": "https://files.pythonhosted.org/packages/3d/c1/834e3092b883bd2103fe517c2c53f317890f492ac06c4654f18c01114b21/fuzzyset-0.0.4.tar.gz" } ], "0.0.5": [ { "comment_text": "", "digests": { "md5": "81bad3b005f3099795edcf73c550fd04", "sha256": "8015eca8697fc06c7cc1eda95eb4e8184c78f51fa848a27c5c5efc4f9c307373" }, "downloads": -1, "filename": "fuzzyset-0.0.5.tar.gz", "has_sig": false, "md5_digest": "81bad3b005f3099795edcf73c550fd04", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 377811, "upload_time": "2012-06-07T17:58:42", "url": "https://files.pythonhosted.org/packages/90/61/a82c51669b26967a6ce8399d9b833be9b51343544790e763f35ccd4fb94a/fuzzyset-0.0.5.tar.gz" } ], "0.0.6": [ { "comment_text": "", "digests": { "md5": "68a9ccaf4d7734da59df4fbddb2bf920", "sha256": "487cae5e6ecf2ac0bb59bbaa6dcf0fda4e897a9f36315f1c4845d3a5bb60ba65" }, "downloads": -1, "filename": "fuzzyset-0.0.6.tar.gz", "has_sig": false, "md5_digest": "68a9ccaf4d7734da59df4fbddb2bf920", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 378646, "upload_time": 
"2012-06-07T18:15:54", "url": "https://files.pythonhosted.org/packages/08/fe/c640e83e47ae084f421f12dfb015338223ff25238c87d8bac3d52dc0ea5c/fuzzyset-0.0.6.tar.gz" } ], "0.0.7": [ { "comment_text": "", "digests": { "md5": "40d539000c1bc55c9d65fb3aeddc03a2", "sha256": "bc0489f24b1cf0d1d69cf6a57ce028e9e20ee1c9a6d85980b3f175f68891ab78" }, "downloads": -1, "filename": "fuzzyset-0.0.7.tar.gz", "has_sig": false, "md5_digest": "40d539000c1bc55c9d65fb3aeddc03a2", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 379435, "upload_time": "2012-06-07T18:44:10", "url": "https://files.pythonhosted.org/packages/73/43/b2e28076bfffac30734c7e1593a399d14641884a98c8b3653f277ab75357/fuzzyset-0.0.7.tar.gz" } ], "0.0.8": [ { "comment_text": "", "digests": { "md5": "dcd69c41e069c0b7ec522d42208bd2da", "sha256": "4bef2eeff6a1b3d3e43b6c0937c17ce5c78dcd9d47a69316eece3477f54ceac0" }, "downloads": -1, "filename": "fuzzyset-0.0.8.tar.gz", "has_sig": false, "md5_digest": "dcd69c41e069c0b7ec522d42208bd2da", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 347055, "upload_time": "2012-06-08T17:06:10", "url": "https://files.pythonhosted.org/packages/11/4e/c59adfd0cf6a50295416148cfa790e4e684b0adbfb6dec4f56523c5547ee/fuzzyset-0.0.8.tar.gz" } ], "0.0.9": [ { "comment_text": "", "digests": { "md5": "22486108b767d5072ab006c2374fbad3", "sha256": "483dfe97886cb8ff404391748f1a5211d5a7620dc605be6aacc433c72dd7ffa7" }, "downloads": -1, "filename": "fuzzyset-0.0.9.tar.gz", "has_sig": false, "md5_digest": "22486108b767d5072ab006c2374fbad3", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 375912, "upload_time": "2012-06-09T00:13:57", "url": "https://files.pythonhosted.org/packages/ee/03/8df984c6861e035dbc69585c5e58101fb0d2836254890005a30fa736a875/fuzzyset-0.0.9.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "92a1db7d81a897980c1ec88a91fc8a4f", "sha256": 
"2bf5a3de20f107124a4842d875e5005ee523719f97ab731caf4121e86ec8ccbc" }, "downloads": -1, "filename": "fuzzyset-0.0.19.tar.gz", "has_sig": false, "md5_digest": "92a1db7d81a897980c1ec88a91fc8a4f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 377735, "upload_time": "2019-03-05T20:24:52", "url": "https://files.pythonhosted.org/packages/2e/78/7509f3efbb6acbcf842d7bdbd9a919ca8c0ed248123bdd8c57f08497e0dd/fuzzyset-0.0.19.tar.gz" } ] }