{ "info": { "author": "Micha\u00ebl Meyer", "author_email": "michaelnm.meyer@gmail.com", "bugtrack_url": null, "classifiers": [ "Intended Audience :: Developers", "License :: OSI Approved :: GNU General Public License (GPL)", "Natural Language :: English", "Operating System :: OS Independent", "Programming Language :: C", "Programming Language :: Python", "Programming Language :: Python :: 3.3", "Topic :: Software Development :: Libraries :: Python Modules" ], "description": "distance - Utilities for comparing sequences\n============================================\n\nThis package provides helpers for computing similarities between arbitrary sequences. Included metrics are Levenshtein, Hamming, Jaccard, and Sorensen distance, plus some bonuses. All distance computations are implemented in pure Python, and most of them are also implemented in C.\n\n\nInstallation\n------------\n\nIf you don't want or need to use the C extension, just unpack the archive and run, as root:\n\n\t# python setup.py install\n\nFor the C extension to work, you need the Python source files, and a C compiler (typically Microsoft Visual C++ 2010 on Windows, and GCC on Mac and Linux). 
On a Debian-like system, you can get all of these with:\n\n\t# apt-get install gcc pythonX.X-dev\n\nwhere X.X is your Python version number.\n\nThen you should type:\n\n\t# python setup.py install --with-c\n\nNote the use of the `--with-c` switch.\n\n\nUsage\n-----\n\nA common use case for this module is to compare single words for similarity:\n\n\t>>> distance.levenshtein(\"lenvestein\", \"levenshtein\")\n\t3\n\t>>> distance.hamming(\"hamming\", \"hamning\")\n\t1\n\nIf there is not a one-to-one mapping between sounds and glyphs in your language, or if you want to compare syllables or phonemes rather than glyphs, you can pass in tuples of strings:\n\n\t>>> t1 = (\"de\", \"ci\", \"si\", \"ve\")\n\t>>> t2 = (\"de\", \"ri\", \"si\", \"ve\")\n\t>>> distance.levenshtein(t1, t2)\n\t1\n\nComparing lists of strings can also be useful for computing similarities between sentences, paragraphs, etc.:\n\n\t>>> sent1 = ['the', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']\n\t>>> sent2 = ['the', 'lazy', 'fox', 'jumps', 'over', 'the', 'crazy', 'dog']\n\t>>> distance.levenshtein(sent1, sent2)\n\t3\n\nHamming and Levenshtein distance can be normalized, so that the results of several distance measures can be meaningfully compared. Two strategies are available for Levenshtein: the normalization factor is either the length of the shortest alignment between the sequences, or the length of the longest one. 
Example uses:\n\n\t>>> distance.hamming(\"fat\", \"cat\", normalized=True)\n\t0.3333333333333333\n\t>>> distance.nlevenshtein(\"abc\", \"acd\", method=1) # shortest alignment\n\t0.6666666666666666\n\t>>> distance.nlevenshtein(\"abc\", \"acd\", method=2) # longest alignment\n\t0.5\n\n`jaccard` and `sorensen` return a normalized value by default:\n\n\t>>> distance.sorensen(\"decide\", \"resize\")\n\t0.5555555555555556\n\t>>> distance.jaccard(\"decide\", \"resize\")\n\t0.7142857142857143\n\nAs for the bonuses, there is a `fast_comp` function, which computes the distance between two strings up to and including a value of 2. If the distance between the strings is higher than that, -1 is returned. This function is of limited use, but on the other hand it is much faster than `levenshtein`. There is also a `lcsubstrings` function which can be used to find the longest common substrings in two sequences.\n\nFinally, two convenience iterators, `ilevenshtein` and `ifast_comp`, are provided. They are intended for filtering, from a long list of sequences, those that are close to a reference sequence. Both yield tuples of the form (distance, sequence). Example:\n\n\t>>> tokens = [\"fo\", \"bar\", \"foob\", \"foo\", \"fooba\", \"foobar\"]\n\t>>> sorted(distance.ifast_comp(\"foo\", tokens))\n\t[(0, 'foo'), (1, 'fo'), (1, 'foob'), (2, 'fooba')]\n\t>>> sorted(distance.ilevenshtein(\"foo\", tokens, max_dist=1))\n\t[(0, 'foo'), (1, 'fo'), (1, 'foob')]\n\n`ifast_comp` is particularly efficient, and can handle 1 million tokens without a problem.\n\nFor more information, see each function's documentation (`help(funcname)`).\n\nHave fun!\n\n\nChangelog\n---------\n\n20/11/13:\n* Switched back to using the to-be-deprecated Python unicode api. 
The good news is that this makes the\nC extension compatible with Python 2.7+, and that distance computations on unicode strings are now\nmuch faster.\n* Added a C version of `lcsubstrings`.\n* Added a new method for computing normalized Levenshtein distance.\n* Added some tests.\n\n12/11/13:\nExpanded `fast_comp` (formerly `quick_levenshtein`) so that it can handle transpositions.\nFixed swapped variables in (C) `levenshtein`, which sometimes produced strange results.\n\n10/11/13:\nAdded `quick_levenshtein` and `iquick_levenshtein`.\n\n05/11/13:\nAdded Sorensen and Jaccard metrics, fixed memory issue in Levenshtein.", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/doukremt/distance", "keywords": null, "license": "UNKNOWN", "maintainer": null, "maintainer_email": null, "name": "Distance", "package_url": "https://pypi.org/project/Distance/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/Distance/", "project_urls": { "Download": "UNKNOWN", "Homepage": "https://github.com/doukremt/distance" }, "release_url": "https://pypi.org/project/Distance/0.1.3/", "requires_dist": null, "requires_python": null, "summary": "Utilities for comparing sequences", "version": "0.1.3" }, "last_serial": 925135, "releases": { "0.1": [ { "comment_text": "", "digests": { "md5": "9d101eca8ec50f7d456ad4124baa981e", "sha256": "5b26973dc040064f8b48ff29a4d82035076bc91056ebb1b0f446872952c42e9d" }, "downloads": -1, "filename": "distance.tar.gz", "has_sig": false, "md5_digest": "9d101eca8ec50f7d456ad4124baa981e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 34078, "upload_time": "2013-11-03T14:47:55", "url": "https://files.pythonhosted.org/packages/05/2e/5dd635d1ba751fa46e10e57c9fe767cae134ff85c17e9dfbfd91bdf9ee65/distance.tar.gz" } ], "0.1.1": [ { "comment_text": "", "digests": { "md5": 
"18bad064df049dc372dd57921ba16261", "sha256": "f56101ea0c7662426bbce235cb459c78b32c8941fbc63274c9bb31644b337d49" }, "downloads": -1, "filename": "Distance-0.1.1.tar.gz", "has_sig": false, "md5_digest": "18bad064df049dc372dd57921ba16261", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 89835, "upload_time": "2013-11-06T06:27:39", "url": "https://files.pythonhosted.org/packages/75/50/2359de7b6e4751e683b7d73de445c52fb350f9418f371c0bbc495b69889c/Distance-0.1.1.tar.gz" } ], "0.1.2": [ { "comment_text": "", "digests": { "md5": "8c0088ff1929b683ccacd2e18b670656", "sha256": "ac57287b3390749bcbfb8ab671facc0e82c0e337d58a21dc7fc1ea375ffba95a" }, "downloads": -1, "filename": "Distance-0.1.2.tar.gz", "has_sig": false, "md5_digest": "8c0088ff1929b683ccacd2e18b670656", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 90820, "upload_time": "2013-11-10T18:30:33", "url": "https://files.pythonhosted.org/packages/ae/38/b7817d94da0bdd4076b89f3c3fd4ddb95fd598444573963759c96f1fec27/Distance-0.1.2.tar.gz" } ], "0.1.2.5": [ { "comment_text": "", "digests": { "md5": "6fdc71405ceb1685ac9ea5e376b09314", "sha256": "487b7fddca09093d9a5a3c3bb13eda279d5fbc4c4b2a3d0e084a30192d30fc14" }, "downloads": -1, "filename": "Distance-0.1.2.5.tar.gz", "has_sig": false, "md5_digest": "6fdc71405ceb1685ac9ea5e376b09314", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 148337, "upload_time": "2013-11-12T17:29:00", "url": "https://files.pythonhosted.org/packages/7a/07/a6964df6af0f03e779720334a4c4b8212b82f18aa76ad4e73abc12828454/Distance-0.1.2.5.tar.gz" } ], "0.1.3": [ { "comment_text": "", "digests": { "md5": "23d82d30517c22f0992e62099e4d3f00", "sha256": "60807584f5b6003f5c521aa73f39f51f631de3be5cccc5a1d67166fcbf0d4551" }, "downloads": -1, "filename": "Distance-0.1.3.tar.gz", "has_sig": false, "md5_digest": "23d82d30517c22f0992e62099e4d3f00", "packagetype": "sdist", "python_version": "source", 
"requires_python": null, "size": 180271, "upload_time": "2013-11-21T00:14:34", "url": "https://files.pythonhosted.org/packages/5c/1a/883e47df323437aefa0d0a92ccfb38895d9416bd0b56262c2e46a47767b8/Distance-0.1.3.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "23d82d30517c22f0992e62099e4d3f00", "sha256": "60807584f5b6003f5c521aa73f39f51f631de3be5cccc5a1d67166fcbf0d4551" }, "downloads": -1, "filename": "Distance-0.1.3.tar.gz", "has_sig": false, "md5_digest": "23d82d30517c22f0992e62099e4d3f00", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 180271, "upload_time": "2013-11-21T00:14:34", "url": "https://files.pythonhosted.org/packages/5c/1a/883e47df323437aefa0d0a92ccfb38895d9416bd0b56262c2e46a47767b8/Distance-0.1.3.tar.gz" } ] }