{ "info": { "author": "Marcin Pertek", "author_email": "kat.zygfryd@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python", "Programming Language :: Python :: 2.6", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3", "Topic :: Database :: Database Engines/Servers", "Topic :: Scientific/Engineering :: Information Analysis", "Topic :: Software Development :: Libraries :: Python Modules", "Topic :: Text Processing :: Indexing" ], "description": "=====\nsetix\n=====\n\nAt its core setix provides a \"set intersection index\", an inverted index data structure designed for storing sets\nof symbols and fast querying of sets intersecting the given set, with sorting based on the number of intersections\nor a similarity measure.\n\nAdditionally, a wrapper for indexing strings is provided in setix.trgm, which implements a trigram index compatible\nwith the PostgreSQL extension pg_trgm.\n\nExamples\n========\n\nUsing a set index:\n\n.. code-block:: python\n \n import setix\n \n ix = setix.SetIntersectionIndex ()\n ix.add ((1, 2, 3))\n ix.add ((1, 2, 4))\n ix.add ((2, 3, 4))\n \n ix.find ((1, 2), 1).get_list()\n # returns [(2, [(1, 2, 3)]),\n # (2, [(1, 2, 4)]),\n # (1, [(2, 3, 4)])]\n # (the order of the first two results can change as they have equal scores)\n\nUsing a trigram index:\n\n.. 
code-block:: python\n\n import setix.trgm\n \n ix = setix.trgm.TrigramIndex ()\n ix.add (\"strength\")\n ix.add (\"strenght\")\n ix.add (\"strength and honor\")\n \n ix.find (\"stremgth\", threshold=1).get_list()\n # returns [(6, [\"strength and honor\"]),\n # (6, [\"strength\"]),\n # (4, [\"strenght\"])]\n \n ix.find_similar (\"stremgth\", threshold=0.1).get_list()\n # returns [(0.5, [\"strength\"]), # 6 intersections / (9 total + 9 total - 6)\n # (0.29, [\"strenght\"]), # 4 intersections / (9 total + 9 total - 4)\n # (0.27, [\"strength and honor\"])] # 6 intersections / (9 total + 19 total - 6)\n\nIn general, to search for phrases containing a misspelt word, a threshold of -3*N can be given where N is the number\nof misspellings.\n\n.. code-block:: python\n\n ix.find (\"stremgth\", threshold=-3).get_list()\n # returns [(6, [\"strength and honor\"]),\n # (6, [\"strength\"])]\n\nBenchmarks\n==========\n\nA benchmark is included in tests/dvd_db_test.py\n\nResults from an Athlon II running at 2.6GHz:\n\nPython 2.7\n----------------------\n\n.. code-block:: none\n\n In [1]: import tests.dvd_db_test\n Loading database...\n Extracted 240577 titles\n Memory used by data: 107.8MB\n Building index...\n CPU time used: 43.1s\n Unique trigrams indexed: 11352\n Unique phrases indexed: 228620\n Memory used by index: 80.9MB\n \n In [2]: %timeit list (tests.dvd_db_test.titles.find(\"daft punk\", 8))\n 10 loops, best of 3: 27.8 ms per loop\n \n In [3]: %timeit list (tests.dvd_db_test.titles.find(\"daft punk\", 1))\n 10 loops, best of 3: 86.4 ms per loop\n\nPython 3.2\n----------------------\n\n.. 
code-block:: none\n\n In [1]: import tests.dvd_db_test\n Loading database...\n Extracted 240577 titles\n Memory used by data: 108.8MB\n Building index...\n CPU time used: 45.8s\n Unique trigrams indexed: 11352\n Unique phrases indexed: 228620\n Memory used by index: 86.2MB\n \n In [2]: %timeit list (tests.dvd_db_test.titles.find(\"daft punk\", 8))\n 10 loops, best of 3: 27.9 ms per loop\n \n In [3]: %timeit list (tests.dvd_db_test.titles.find(\"daft punk\", 1))\n 10 loops, best of 3: 86.3 ms per loop\n\nDVD title list used in the benchmark was obtained from http://www.hometheaterinfo.com/dvdlist.htm\nThanks for making it available.", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "http://github.com/zygfryd/python-setix", "keywords": "set,intersection,index,trgm,fuzzy", "license": "MIT", "maintainer": null, "maintainer_email": null, "name": "setix", "package_url": "https://pypi.org/project/setix/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/setix/", "project_urls": { "Download": "UNKNOWN", "Homepage": "http://github.com/zygfryd/python-setix" }, "release_url": "https://pypi.org/project/setix/0.8.3/", "requires_dist": null, "requires_python": null, "summary": "Fast data structures for finding intersecting sets and similar strings", "version": "0.8.3" }, "last_serial": 989580, "releases": { "0.8": [ { "comment_text": "", "digests": { "md5": "487b894890b55b06dc2eaa47de1f5ce0", "sha256": "0555eb11443ac0f2a61296573e3c9a089e01487dca89529900b8e97013b4bed1" }, "downloads": -1, "filename": "setix-0.8.tar.gz", "has_sig": false, "md5_digest": "487b894890b55b06dc2eaa47de1f5ce0", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 7141, "upload_time": "2013-03-09T17:04:32", "url": "https://files.pythonhosted.org/packages/2d/b0/130ec06e2ff9a84c4b3e855e6284df5df3a5b2391bf007d829ee6466aab4/setix-0.8.tar.gz" } ], 
"0.8.1": [ { "comment_text": "", "digests": { "md5": "feca7a1abe7e12568d23b71bbb38bcbb", "sha256": "c6265c91943cba52ce9f9e9af9ea1e8be81f58cb54a55bedf675168e1775bc1c" }, "downloads": -1, "filename": "setix-0.8.1.tar.gz", "has_sig": false, "md5_digest": "feca7a1abe7e12568d23b71bbb38bcbb", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8102, "upload_time": "2013-03-09T17:12:32", "url": "https://files.pythonhosted.org/packages/91/e9/0251bc0030667abe35817d9cac6468d1688040ddfb61f689d0f55fae261e/setix-0.8.1.tar.gz" } ], "0.8.2": [ { "comment_text": "", "digests": { "md5": "3c846cd40ba96007b0c6b067b06227b0", "sha256": "d4720508920b2e2d1b3ba378dad2b82a3ca268187a71605e80ec3a80973673dd" }, "downloads": -1, "filename": "setix-0.8.2.tar.gz", "has_sig": false, "md5_digest": "3c846cd40ba96007b0c6b067b06227b0", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8188, "upload_time": "2013-03-10T07:59:09", "url": "https://files.pythonhosted.org/packages/77/7f/dd75810bdf85de1deaa323d242ca5b9da523f249f3777fcff674fdef9871/setix-0.8.2.tar.gz" } ], "0.8.3": [ { "comment_text": "", "digests": { "md5": "c2ed11b63ae1a6b38e12ed5702471c84", "sha256": "022d2f82002c4cf84c31131c6f597c46bb52e845ca73667e0681a193df7f796c" }, "downloads": -1, "filename": "setix-0.8.3.tar.gz", "has_sig": false, "md5_digest": "c2ed11b63ae1a6b38e12ed5702471c84", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8336, "upload_time": "2014-02-03T17:16:53", "url": "https://files.pythonhosted.org/packages/81/21/f0628d54be13cba7353c5452b9434fca501d61f3d5ac9bc40e2f671b5ad1/setix-0.8.3.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "c2ed11b63ae1a6b38e12ed5702471c84", "sha256": "022d2f82002c4cf84c31131c6f597c46bb52e845ca73667e0681a193df7f796c" }, "downloads": -1, "filename": "setix-0.8.3.tar.gz", "has_sig": false, "md5_digest": "c2ed11b63ae1a6b38e12ed5702471c84", "packagetype": "sdist", 
"python_version": "source", "requires_python": null, "size": 8336, "upload_time": "2014-02-03T17:16:53", "url": "https://files.pythonhosted.org/packages/81/21/f0628d54be13cba7353c5452b9434fca501d61f3d5ac9bc40e2f671b5ad1/setix-0.8.3.tar.gz" } ] }