{ "info": { "author": "Mikhail Korobov", "author_email": "kmike84@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Intended Audience :: Developers", "Intended Audience :: Science/Research", "License :: OSI Approved :: GNU Lesser General Public License v2 or later (LGPLv2+)", "Programming Language :: Cython", "Programming Language :: Python", "Programming Language :: Python :: 2", "Programming Language :: Python :: 2.6", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.3", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: Implementation :: CPython", "Topic :: Scientific/Engineering :: Information Analysis", "Topic :: Software Development :: Libraries :: Python Modules", "Topic :: Text Processing :: Linguistic" ], "description": "datrie |travis| |appveyor|\n======\n\n.. |travis| image:: https://travis-ci.org/pytries/datrie.svg\n :target: https://travis-ci.org/pytries/datrie\n\n.. |appveyor| image:: https://ci.appveyor.com/api/projects/status/6bpvhllpjhlau7x0?svg=true\n :target: https://ci.appveyor.com/project/superbobry/datrie\n\nSuper-fast, efficiently stored Trie for Python (2.x and 3.x).\nUses `libdatrie`_.\n\n.. _libdatrie: http://linux.thai.net/~thep/datrie/datrie.html\n\nInstallation\n============\n\n::\n\n pip install datrie\n\nUsage\n=====\n\nCreate a new trie capable of storing items with lower-case ascii keys::\n\n >>> import string\n >>> import datrie\n >>> trie = datrie.Trie(string.ascii_lowercase)\n\n``trie`` variable is a dict-like object that can have unicode keys of\ncertain ranges and Python objects as values.\n\nIn addition to implementing the mapping interface, tries facilitate\nfinding the items for a given prefix, and vice versa, finding the\nitems whose keys are prefixes of a given string. As a common special\ncase, finding the longest-prefix item is also supported.\n\n.. warning::\n\n For efficiency you must define allowed character range(s) while\n creating trie. ``datrie`` doesn't check if keys are in allowed\n ranges at runtime, so be careful! Invalid keys are OK at lookup time\n but values won't be stored correctly for such keys.\n\nAdd some values to it (datrie keys must be unicode; the examples\nare for Python 2.x)::\n\n >>> trie[u'foo'] = 5\n >>> trie[u'foobar'] = 10\n >>> trie[u'bar'] = 'bar value'\n >>> trie.setdefault(u'foobar', 15)\n 10\n\nCheck if u'foo' is in trie::\n\n >>> u'foo' in trie\n True\n\nGet a value::\n\n >>> trie[u'foo']\n 5\n\nFind all prefixes of a word::\n\n >>> trie.prefixes(u'foobarbaz')\n [u'foo', u'foobar']\n\n >>> trie.prefix_items(u'foobarbaz')\n [(u'foo', 5), (u'foobar', 10)]\n\n >>> trie.iter_prefixes(u'foobarbaz')\n \n\n >>> trie.iter_prefix_items(u'foobarbaz')\n \n\nFind the longest prefix of a word::\n\n >>> trie.longest_prefix(u'foo')\n u'foo'\n\n >>> trie.longest_prefix(u'foobarbaz')\n u'foobar'\n\n >>> trie.longest_prefix(u'gaz')\n KeyError: u'gaz'\n\n >>> trie.longest_prefix(u'gaz', default=u'vasia')\n u'vasia'\n\n >>> trie.longest_prefix_item(u'foobarbaz')\n (u'foobar', 10)\n\nCheck if the trie has keys with a given prefix::\n\n >>> trie.has_keys_with_prefix(u'fo')\n True\n\n >>> trie.has_keys_with_prefix(u'FO')\n False\n\nGet all items with a given prefix from a trie::\n\n >>> trie.keys(u'fo')\n [u'foo', u'foobar']\n\n >>> trie.items(u'ba')\n [(u'bar', 'bar value')]\n\n >>> trie.values(u'foob')\n [10]\n\nGet all suffixes of certain word starting with a given prefix from a trie::\n\n >>> trie.suffixes()\n [u'pro', u'producer', u'producers', u'product', u'production', u'productivity', u'prof']\n >>> trie.suffixes(u'prod')\n [u'ucer', u'ucers', u'uct', u'uction', u'uctivity']\n\n\nSave & load a trie (values must be picklable)::\n\n >>> trie.save('my.trie')\n >>> trie2 = datrie.Trie.load('my.trie')\n\n\n\nTrie and BaseTrie\n=================\n\nThere are two Trie classes in datrie package: ``datrie.Trie`` and\n``datrie.BaseTrie``. ``datrie.BaseTrie`` is slightly faster and uses less\nmemory but it can store only integer numbers -2147483648 <= x <= 2147483647.\n``datrie.Trie`` is a bit slower but can store any Python object as a value.\n\nIf you don't need values or integer values are OK then use ``datrie.BaseTrie``::\n\n import datrie\n import string\n trie = datrie.BaseTrie(string.ascii_lowercase)\n\nCustom iteration\n================\n\nIf the built-in trie methods don't fit you can use ``datrie.State`` and\n``datrie.Iterator`` to implement custom traversal.\n\n.. note::\n\n If you use ``datrie.BaseTrie`` you need ``datrie.BaseState`` and\n ``datrie.BaseIterator`` for custom traversal.\n\n\nFor example, let's find all suffixes of ``'fo'`` for our trie and get\nthe values::\n\n >>> state = datrie.State(trie)\n >>> state.walk(u'foo')\n >>> it = datrie.Iterator(state)\n >>> while it.next():\n ... print(it.key())\n ... print(it.data))\n o\n 5\n obar\n 10\n\nPerformance\n===========\n\nPerformance is measured for ``datrie.Trie`` against Python's dict with\n100k unique unicode words (English and Russian) as keys and '1' numbers\nas values.\n\n``datrie.Trie`` uses about 5M memory for 100k words; Python's dict\nuses about 22M for this according to my unscientific tests.\n\nThis trie implementation is 2-6 times slower than python's dict\non __getitem__. Benchmark results (macbook air i5 1.8GHz,\n\"1.000M ops/sec\" == \"1 000 000 operations per second\")::\n\n Python 2.6:\n dict __getitem__: 7.107M ops/sec\n trie __getitem__: 2.478M ops/sec\n\n Python 2.7:\n dict __getitem__: 6.550M ops/sec\n trie __getitem__: 2.474M ops/sec\n\n Python 3.2:\n dict __getitem__: 8.185M ops/sec\n trie __getitem__: 2.684M ops/sec\n\n Python 3.3:\n dict __getitem__: 7.050M ops/sec\n trie __getitem__: 2.755M ops/sec\n\nLooking for prefixes of a given word is almost as fast as\n``__getitem__`` (results are for Python 3.3)::\n\n trie.iter_prefix_items (hits): 0.461M ops/sec\n trie.prefix_items (hits): 0.743M ops/sec\n trie.prefix_items loop (hits): 0.629M ops/sec\n trie.iter_prefixes (hits): 0.759M ops/sec\n trie.iter_prefixes (misses): 1.538M ops/sec\n trie.iter_prefixes (mixed): 1.359M ops/sec\n trie.has_keys_with_prefix (hits): 1.896M ops/sec\n trie.has_keys_with_prefix (misses): 2.590M ops/sec\n trie.longest_prefix (hits): 1.710M ops/sec\n trie.longest_prefix (misses): 1.506M ops/sec\n trie.longest_prefix (mixed): 1.520M ops/sec\n trie.longest_prefix_item (hits): 1.276M ops/sec\n trie.longest_prefix_item (misses): 1.292M ops/sec\n trie.longest_prefix_item (mixed): 1.379M ops/sec\n\nLooking for all words starting with a given prefix is mostly limited\nby overall result count (this can be improved in future because a\nlot of time is spent decoding strings from utf_32_le to Python's\nunicode)::\n\n trie.items(prefix=\"xxx\"), avg_len(res)==415: 0.609K ops/sec\n trie.keys(prefix=\"xxx\"), avg_len(res)==415: 0.642K ops/sec\n trie.values(prefix=\"xxx\"), avg_len(res)==415: 4.974K ops/sec\n trie.items(prefix=\"xxxxx\"), avg_len(res)==17: 14.781K ops/sec\n trie.keys(prefix=\"xxxxx\"), avg_len(res)==17: 15.766K ops/sec\n trie.values(prefix=\"xxxxx\"), avg_len(res)==17: 96.456K ops/sec\n trie.items(prefix=\"xxxxxxxx\"), avg_len(res)==3: 75.165K ops/sec\n trie.keys(prefix=\"xxxxxxxx\"), avg_len(res)==3: 77.225K ops/sec\n trie.values(prefix=\"xxxxxxxx\"), avg_len(res)==3: 320.755K ops/sec\n trie.items(prefix=\"xxxxx..xx\"), avg_len(res)==1.4: 173.591K ops/sec\n trie.keys(prefix=\"xxxxx..xx\"), avg_len(res)==1.4: 180.678K ops/sec\n trie.values(prefix=\"xxxxx..xx\"), avg_len(res)==1.4: 503.392K ops/sec\n trie.items(prefix=\"xxx\"), NON_EXISTING: 2023.647K ops/sec\n trie.keys(prefix=\"xxx\"), NON_EXISTING: 1976.928K ops/sec\n trie.values(prefix=\"xxx\"), NON_EXISTING: 2060.372K ops/sec\n\nRandom insert time is very slow compared to dict, this is the limitation\nof double-array tries; updates are quite fast. If you want to build a trie,\nconsider sorting keys before the insertion::\n\n dict __setitem__ (updates): 6.497M ops/sec\n trie __setitem__ (updates): 2.633M ops/sec\n dict __setitem__ (inserts, random): 5.808M ops/sec\n trie __setitem__ (inserts, random): 0.053M ops/sec\n dict __setitem__ (inserts, sorted): 5.749M ops/sec\n trie __setitem__ (inserts, sorted): 0.624M ops/sec\n dict setdefault (updates): 3.455M ops/sec\n trie setdefault (updates): 1.910M ops/sec\n dict setdefault (inserts): 3.466M ops/sec\n trie setdefault (inserts): 0.053M ops/sec\n\nOther results (note that ``len(trie)`` is currently implemented\nusing trie traversal)::\n\n dict __contains__ (hits): 6.801M ops/sec\n trie __contains__ (hits): 2.816M ops/sec\n dict __contains__ (misses): 5.470M ops/sec\n trie __contains__ (misses): 4.224M ops/sec\n dict __len__: 334336.269 ops/sec\n trie __len__: 22.900 ops/sec\n dict values(): 406.507 ops/sec\n trie values(): 20.864 ops/sec\n dict keys(): 189.298 ops/sec\n trie keys(): 2.773 ops/sec\n dict items(): 48.734 ops/sec\n trie items(): 2.611 ops/sec\n\nPlease take this benchmark results with a grain of salt; this\nis a very simple benchmark and may not cover your use case.\n\nCurrent Limitations\n===================\n\n* keys must be unicode (no implicit conversion for byte strings\n under Python 2.x, sorry);\n* there are no iterator versions of keys/values/items (this is not\n implemented yet);\n* it is painfully slow and maybe buggy under pypy;\n* library is not tested with narrow Python builds.\n\nContributing\n============\n\nDevelopment happens at github and bitbucket:\n\n* https://github.com/kmike/datrie\n* https://bitbucket.org/kmike/datrie\n\nThe main issue tracker is at github.\n\nFeel free to submit ideas, bugs, pull requests (git or hg) or\nregular patches.\n\nRunning tests and benchmarks\n----------------------------\n\nMake sure `tox`_ is installed and run\n\n::\n\n $ tox\n\nfrom the source checkout. Tests should pass under python 2.6, 2.7 and 3.2.\n\n::\n\n $ tox -c tox-bench.ini\n\nruns benchmarks.\n\nIf you've changed anything in the source code then\nmake sure `cython`_ is installed and run\n\n::\n\n $ update_c.sh\n\nbefore each ``tox`` command.\n\nPlease note that benchmarks are not included in the release\ntar.gz's because benchmark data is large and this\nsaves a lot of bandwidth; use source checkouts from\ngithub or bitbucket for the benchmarks.\n\n.. _cython: http://cython.org\n.. _tox: http://tox.testrun.org\n\nAuthors & Contributors\n----------------------\n\n* Mikhail Korobov \n* Jared Suttles\n* Gabi Davar\n* Ahmed T. Youssef\n\nThis module is based on `libdatrie`_ C library by Theppitak Karoonboonyanan\nand is inspired by `fast_trie`_ Ruby bindings, `PyTrie`_ pure\nPython implementation and `Tree::Trie`_ Perl implementation;\nsome docs and API ideas are borrowed from these projects.\n\n.. _fast_trie: https://github.com/tyler/trie\n.. _PyTrie: https://bitbucket.org/gsakkis/pytrie\n.. _Tree::Trie: http://search.cpan.org/~avif/Tree-Trie-1.9/Trie.pm\n\nLicense\n=======\n\nLicensed under LGPL v2.1.\n\nCHANGES\n=======\n\n0.7.1 (2016-03-12)\n------------------\n\n* updated the bundled C library to version 0.2.9;\n* implemented ``Trie.__len__`` in terms of ``trie_enumerate``;\n* rebuilt Cython wrapper with Cython 0.23.4;\n* changed ``Trie`` to implement ``collections.abc.MutableMapping``;\n* fixed ``Trie`` pickling, which segfaulted on Python2.X.\n\n0.7 (2014-02-18)\n----------------\n\n* bundled libdatrie C library is updated to version 0.2.8;\n* new `.suffixes()` method (thanks Ahmed T. Youssef);\n* wrapper is rebuilt with Cython 0.20.1.\n\n0.6.1 (2013-09-21)\n------------------\n\n* fixed build for Visual Studio (thanks Gabi Davar).\n\n0.6 (2013-07-09)\n----------------\n\n* datrie is rebuilt with Cython 0.19.1;\n* ``iter_prefix_values``, ``prefix_values`` and ``longest_prefix_value``\n methods for ``datrie.BaseTrie`` and ``datrie.Trie`` (thanks Jared Suttles).\n\n0.5.1 (2013-01-30)\n------------------\n\n* Recently introduced memory leak in ``longest_prefix``\n and ``longest_prefix_item`` is fixed.\n\n0.5 (2013-01-29)\n----------------\n\n* ``longest_prefix`` and ``longest_prefix_item`` methods are fixed;\n* datrie is rebuilt with Cython 0.18;\n* misleading benchmark results in README are fixed;\n* State._walk is renamed to State.walk_char.\n\n0.4.2 (2012-09-02)\n------------------\n\n* Update to latest libdatrie; this makes ``.keys()`` method a bit slower but\n removes a keys length limitation.\n\n0.4.1 (2012-07-29)\n------------------\n\n* cPickle is used for saving/loading ``datrie.Trie`` if it is available.\n\n0.4 (2012-07-27)\n----------------\n\n* ``libdatrie`` improvements and bugfixes, including C iterator API support;\n* custom iteration support using ``datrie.State`` and ``datrie.Iterator``.\n* speed improvements: ``__length__``, ``keys``, ``values`` and\n ``items`` methods should be up to 2x faster.\n* keys longer than 32768 are not supported in this release.\n\n\n0.3 (2012-07-21)\n----------------\n\nThere are no new features or speed improvements in this release.\n\n* ``datrie.new`` is deprecated; use ``datrie.Trie`` with the same arguments;\n* small test & benchmark improvements.\n\n0.2 (2012-07-16)\n----------------\n\n* ``datrie.Trie`` items can have any Python object as a value\n (``Trie`` from 0.1.x becomes ``datrie.BaseTrie``);\n* ``longest_prefix`` and ``longest_prefix_items`` are fixed;\n* ``save`` & ``load`` are rewritten;\n* ``setdefault`` method.\n\n\n0.1.1 (2012-07-13)\n------------------\n\n* Windows support (upstream libdatrie changes are merged);\n* license is changed from LGPL v3 to LGPL v2.1 to match the libdatrie license.\n\n0.1 (2012-07-12)\n----------------\n\nInitial release.", "description_content_type": null, "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/brunoalano/datrie", "keywords": "", "license": "LGPLv2+", "maintainer": "", "maintainer_email": "", "name": "datrie-extended", "package_url": "https://pypi.org/project/datrie-extended/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/datrie-extended/", "project_urls": { "Homepage": "https://github.com/brunoalano/datrie" }, "release_url": "https://pypi.org/project/datrie-extended/0.7.3/", "requires_dist": null, "requires_python": "", "summary": "Super-fast, efficiently stored Trie for Python.", "version": "0.7.3" }, "last_serial": 2644878, "releases": { "0.7.2": [], "0.7.3": [ { "comment_text": "", "digests": { "md5": "33ba6c848ca8967e5bad4a9807061537", "sha256": "243fbe7c8b30517843aa9021bb87f3ca3ef17efd1ad1329ddf02d5cec698f4bf" }, "downloads": -1, "filename": "datrie_extended-0.7.3.tar.gz", "has_sig": false, "md5_digest": "33ba6c848ca8967e5bad4a9807061537", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 204345, "upload_time": "2017-02-15T19:15:42", "url": "https://files.pythonhosted.org/packages/72/44/dbdf75fba79e90f070e62d2c95cffc7cc8bd648a7a0f355544e27dd8643e/datrie_extended-0.7.3.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "33ba6c848ca8967e5bad4a9807061537", "sha256": "243fbe7c8b30517843aa9021bb87f3ca3ef17efd1ad1329ddf02d5cec698f4bf" }, "downloads": -1, "filename": "datrie_extended-0.7.3.tar.gz", "has_sig": false, "md5_digest": "33ba6c848ca8967e5bad4a9807061537", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 204345, "upload_time": "2017-02-15T19:15:42", "url": "https://files.pythonhosted.org/packages/72/44/dbdf75fba79e90f070e62d2c95cffc7cc8bd648a7a0f355544e27dd8643e/datrie_extended-0.7.3.tar.gz" } ] }