{ "info": { "author": "Jeff Donner", "author_email": "jeffrey.donner@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Environment :: Console", "Intended Audience :: Developers", "Intended Audience :: Information Technology", "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: C++", "Programming Language :: Cython", "Programming Language :: Python :: 2", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.0", "Programming Language :: Python :: 3.1", "Programming Language :: Python :: 3.2", "Programming Language :: Python :: 3.3", "Programming Language :: Python :: 3.4", "Topic :: Text Processing" ], "description": "Non-Overlapping Aho-Corasick Trie\n\nFeatures:\n- 'short' and 'long' (longest matching key) searches, both one-off and\n iteration over all non-overlapping keyword matches in some text.\n- Works with both unicode and str in Python 2, and unicode in Python 3. NOTE:\n As everything is simply single UCS4 / UTF-32 codepoints under the hood, all\n substrings and input unicode must be normalized, ie any separate modifying\n marks must be folded into each codepoint. 
See:\n http://stackoverflow.com/questions/16467479/normalizing-unicode\n Or, theoretically, you could put into the tree all forms of the\n keywords you expect to see in your text.\n- Allows you to associate an arbitrary Python object payload with each\n keyword, and supports dict operations len(), [], and 'in' for the\n keywords (though no del or traversal).\n- Does the 'compilation' (generation of Aho-Corasick failure links) of\n the trie on-demand, ie you could mix adding keywords and searching\n text, freely, but mostly it just relieves you of worrying about\n compiling.\n- Can be used commercially, it's under the minimal, MIT license (if you\n somehow need a different license, ask me, I mean for it to be used).\n\nAnti-Features:\n- Will not find overlapped keywords (eg given keywords 'abc' and 'cdef', it will\n not find 'cdef' in 'abcdef'). Any full Aho-Corasick implementation would give\n you both. The package 'Acora' is an alternative package for this use. (noaho\n can be relatively easily modified to be a normal Aho-Corasick, but it wasn't\n what I personally needed.)\n- Lacking overlap, find[all]_short is kind of useless.\n- Lacks key iteration and deletion from the mapping (dict) protocol.\n- Memory leaking untested (one run under valgrind turned up nothing, but it\n wasn't extensive).\n- No /testcase/ for unicode in Python 2 (did manual test however).\n- Unicode chars are represented as UCS4, and each character has its own hashtable,\n so it's relatively memory-heavy (see 'Ways to Reduce Memory Use' below).\n- Requires a C++ compiler (C++98 support is enough).\n\nBug reports and patches welcome of course!\n\n\nTo build and install, use either\n pip install noaho\nor\n # Python 2\n python2 setup.py install # (or ... build, and copy the .so to where you want it)\nor\n # Python 3\n python3 setup.py install # (or ... 
build, and copy the .so to where you want it)\n\n\nAPI:\n from noaho import NoAho\n trie = NoAho()\n'text' below applies to str and unicode in Python 2, or unicode in Python 3 (all there is)\n trie.add(key_text, optional payload)\n (key_start, key_end, key_value) = trie.find_short(text_to_search)\n (key_start, key_end, key_value) = trie.find_long(text_to_search)\n for (key_start, key_end, key_value) in trie.findall_short(text_to_search): ...\n for (key_start, key_end, key_value) in trie.findall_long(text_to_search): ...\n # keyword = text_to_search[key_start:key_end]\n trie['keyword'] = key_value\n key_value = trie['keyword']\n assert len(trie)\n assert keyword in trie\n\nExamples:\n >>> a = NoAho()\n >>> a.add('ms windows')\n >>> a.add('ms windows 2000', \"this is canonical\")\n >>> a.add('windows', None)\n >>> a.add('windows 2000')\n >>> a['apple'] = None\n >>> text = 'windows 2000 ms windows 2000 windows'\n >>> for k in a.findall_short(text):\n ... print(text[k[0]:k[1]])\n ...\n windows\n ms windows\n windows\n >>> for k in a.findall_long(text):\n ... print(text[k[0]:k[1]])\n ...\n windows 2000\n ms windows 2000\n windows\n\nMapping (dictionary) methods:\n trie = NoAho()\n trie['apple'] = apple_handling_function\n trie['orange'] = Orange()\n trie.add('banana') # payload will be None\n trie['pear'] # will give key error\n assert isinstance(trie['orange'], Orange)\n assert 'banana' in trie\n len(trie)\n # No del;\n # no iteration over keys\n\nThe 'find[all]_short' forms are named as long and awkwardly as they are,\nto leave plain 'find[all]' free if overlapping matches are ever implemented.\n\n\nFor the fullest spec of what the code will and will not do, check out\ntest-noaho.py (run it with: python[3] test-noaho.py)\n\nUntested: whether the payload handling is complete, ie that there are no\nmemory leaks. 
It should be correct though.\n\n\nRegenerating the Python Wrapper:\n- Needs a C++ compiler (C++98 is fine) and Cython.\n\nYou do not need to rebuild the Cython wrapper (the generated noaho.cpp), but if\nyou want to make changes to the module itself, there is a script:\n\n test-all-configurations.sh\n\nwhich will, with minor configuration tweaking, rebuild and test against both\npython 2 and 3. It requires you to have a Cython tarball in the top directory.\nNote that the python you used to install Cython should be the same as the one\nyou use to do the regeneration, because the regeneration setup includes a module\nCython.Distutils, from the installation.\n\nCython generates python-wrapper noaho.cpp from noaho.pyx (be careful\nto distinguish it from the misnamed array-aho.* (it uses hash tables),\nwhich is the original C++ code).\n\nWays to Reduce Memory Use:\nOne of its aims is to handle Unicode, which means you have to accommodate a huge\nbranching factor, thus the hashtable (a full array would be out of the\nquestion). Ways to attack memory size might be, to either force very\nconservative hashtable growth, or, once the trie is complete (in 'compile', say)\ngo through the tree and replace the hashtables with just-the-right-size arrays -\nlinear scan / binary search should be fast enough if small enough, and take less\nmemory. If you're willing to do a linear scan at that point, you could switch to\nUTF-8, too, saving quite a bit of memory. Danny Yoo's original code I think just\nstarted out as arrays and would grow to hashtables when needed.\n\nAlso, if all you need is ASCII, you could re-define AC_CHAR_TYPE to be 'char'.\nI've tried to be careful to use AC_CHAR_TYPE consistently, but you'd probably\nwant to go through the code to make sure if you're going to rely on this. 
Python\n3 uses Unicode internally though and would do a lot of conversions anyway.\nOtherwise, I don't trust my knowledge of Unicode enough to try to play games\nwith storing fewer bits.\n\nIn the Hopper:\nI have a case-insensitive version (the easiest thing is just to downcase\neverything you add or search for in noaho.pyx), and, one that will only yield\nkeywords at word boundaries, thanks to Python's unicode character classes.\n(However, this second is a bit raw, and you can do it manually anyway.)", "description_content_type": null, "docs_url": null, "download_url": "http://pypi.python.org/packages/source/N/NoAho/NoAho-0.9.6.1.tar.gz", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/JDonner/NoAho", "keywords": null, "license": "UNKNOWN", "maintainer": null, "maintainer_email": null, "name": "NoAho", "package_url": "https://pypi.org/project/NoAho/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/NoAho/", "project_urls": { "Download": "http://pypi.python.org/packages/source/N/NoAho/NoAho-0.9.6.1.tar.gz", "Homepage": "https://github.com/JDonner/NoAho" }, "release_url": "https://pypi.org/project/NoAho/0.9.6.1/", "requires_dist": null, "requires_python": null, "summary": "Fast, non-overlapping simultaneous multiple keyword search", "version": "0.9.6.1" }, "last_serial": 1619803, "releases": { "0.9": [ { "comment_text": "", "digests": { "md5": "7ee512e6cc89ad078e9b39429706936c", "sha256": "bca4310dc9e81db7ce7998bfbd54ea65b5503780acec07213996db459e268fd7" }, "downloads": -1, "filename": "NoAho-0.9.tar.gz", "has_sig": false, "md5_digest": "7ee512e6cc89ad078e9b39429706936c", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 39029, "upload_time": "2012-03-20T01:47:33", "url": "https://files.pythonhosted.org/packages/0a/80/6847f99f6660b68b247ea84bf4b5a7edd416059b307f3db8ab97865b2888/NoAho-0.9.tar.gz" } ], "0.9.01": [ { "comment_text": "", "digests": { "md5": 
"36de452454ac1b16066a7fe58845cd16", "sha256": "9cd589c28891dfa5c8fd4f2c908835fbaaa9830611c07d367f1177652a19090c" }, "downloads": -1, "filename": "NoAho-0.9.01.tar.gz", "has_sig": false, "md5_digest": "36de452454ac1b16066a7fe58845cd16", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 39276, "upload_time": "2012-03-20T07:17:16", "url": "https://files.pythonhosted.org/packages/b8/9c/b134fd837407145f6e8801bc3b9126cd0a50f8892cd03208837a2bbb1a98/NoAho-0.9.01.tar.gz" } ], "0.9.02": [ { "comment_text": "", "digests": { "md5": "c1007d1a7f999bc7c9910ecd30b9320f", "sha256": "bcfabfa19b00bc6cf6d7c94db01c5a0b0f65cc0e036dafbf1c253ed18f1d7d40" }, "downloads": -1, "filename": "NoAho-0.9.02.tar.gz", "has_sig": false, "md5_digest": "c1007d1a7f999bc7c9910ecd30b9320f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 39290, "upload_time": "2012-03-20T08:06:58", "url": "https://files.pythonhosted.org/packages/aa/d7/5f489e17d831d260eb4167df8d0310a0f46b50420f01bb1b060636921ac8/NoAho-0.9.02.tar.gz" } ], "0.9.4.2": [ { "comment_text": "", "digests": { "md5": "c387625bf2864a1fe9d7757f417fcc58", "sha256": "1bb55858ba4602de7500efc24711a8bd20bc11990aaff0be6110b017891de512" }, "downloads": -1, "filename": "NoAho-0.9.4.2.tar.bz2", "has_sig": false, "md5_digest": "c387625bf2864a1fe9d7757f417fcc58", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 39747, "upload_time": "2014-05-06T09:15:22", "url": "https://files.pythonhosted.org/packages/62/d0/febd74b4a2495f679f994a4494baf0765d659139ffa50b3c185b4b7b2b0f/NoAho-0.9.4.2.tar.bz2" }, { "comment_text": "", "digests": { "md5": "549905957b7969e41682b127d9072336", "sha256": "45d1043fc3ae41906a4360204dcd4d735532570add545457182fb4933f02599f" }, "downloads": -1, "filename": "NoAho-0.9.4.2.tar.gz", "has_sig": false, "md5_digest": "549905957b7969e41682b127d9072336", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 
45647, "upload_time": "2014-05-06T09:15:25", "url": "https://files.pythonhosted.org/packages/77/5d/b18f70ccd6d08b3fac79a21a468c2c5ff572770ffe5fd56c91e56bff41a1/NoAho-0.9.4.2.tar.gz" }, { "comment_text": "", "digests": { "md5": "ee208e29a9aa4c33ad46ad381ae9fd5c", "sha256": "85b51680040b03ee28608351cc853d2a4b6c99d427594ccc1c12d8b7607f4085" }, "downloads": -1, "filename": "NoAho-0.9.4.2.zip", "has_sig": false, "md5_digest": "ee208e29a9aa4c33ad46ad381ae9fd5c", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 52762, "upload_time": "2014-05-06T09:15:28", "url": "https://files.pythonhosted.org/packages/9e/cc/66237cceb7b590882348d653e98af93ff1b8c5d5ae8b3305037358526b09/NoAho-0.9.4.2.zip" } ], "0.9.5": [ { "comment_text": "", "digests": { "md5": "4912666810bb95c11a8a90e12897b5d8", "sha256": "6e4a89dd8e907d86d55217e454a927c60cd08006573557b28802fc995bd3a526" }, "downloads": -1, "filename": "NoAho-0.9.5.tar.bz2", "has_sig": false, "md5_digest": "4912666810bb95c11a8a90e12897b5d8", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 39595, "upload_time": "2015-02-17T08:15:24", "url": "https://files.pythonhosted.org/packages/85/6e/db8b2b186b722aaebfdf0420c65fccc5b97dc235ea024152929fa4def503/NoAho-0.9.5.tar.bz2" }, { "comment_text": "", "digests": { "md5": "a3345decd788f347aeb47c29226b86d5", "sha256": "30c9e5bef7bc18d27ee9040bea8a161f841812be637e66be5a49575fdf6d302f" }, "downloads": -1, "filename": "NoAho-0.9.5.tar.gz", "has_sig": false, "md5_digest": "a3345decd788f347aeb47c29226b86d5", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 46396, "upload_time": "2015-02-17T08:15:26", "url": "https://files.pythonhosted.org/packages/c6/ac/9f262bc5a5204d666c93bb55d5c841235fc1ec4f3c705ef3d20fc38485aa/NoAho-0.9.5.tar.gz" }, { "comment_text": "", "digests": { "md5": "5c9f38c7c7365a686c093b49e0cdbe00", "sha256": "87077f61e7d9c5af89a586cc4aafc58959245576fcff72fb4a8fd76429554285" }, 
"downloads": -1, "filename": "NoAho-0.9.5.zip", "has_sig": false, "md5_digest": "5c9f38c7c7365a686c093b49e0cdbe00", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 52563, "upload_time": "2015-02-17T08:15:29", "url": "https://files.pythonhosted.org/packages/4b/49/120572ca8be7e67366c6e7fa7b57982417991069e8e34327063f1799d67c/NoAho-0.9.5.zip" } ], "0.9.6": [ { "comment_text": "", "digests": { "md5": "960f3dd57ed0d91726a9f3f7f3378759", "sha256": "f5927804843e8d3dcd6b55b6abf8af14c7fbd9eaca59fcb67223c186bac47e2f" }, "downloads": -1, "filename": "NoAho-0.9.6.tar.bz2", "has_sig": false, "md5_digest": "960f3dd57ed0d91726a9f3f7f3378759", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 39659, "upload_time": "2015-02-21T05:31:44", "url": "https://files.pythonhosted.org/packages/4a/ba/cc87b6e991fef0d1b383f6633a3b7bb896c90b0f87039cdf9d50fc4e5562/NoAho-0.9.6.tar.bz2" }, { "comment_text": "", "digests": { "md5": "c3025829d775d5c020aad329606dd372", "sha256": "b5ba76cd0899c3bf43999c0b46fd190b4c6c052964d48a023428fbc62ddc76d4" }, "downloads": -1, "filename": "NoAho-0.9.6.tar.gz", "has_sig": false, "md5_digest": "c3025829d775d5c020aad329606dd372", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 46428, "upload_time": "2015-02-21T05:31:47", "url": "https://files.pythonhosted.org/packages/97/35/16fe06d59898e2b50fbc2f45523cc0d2bbaf12567a5df007fb2ba98e4822/NoAho-0.9.6.tar.gz" }, { "comment_text": "", "digests": { "md5": "548b705801ec3c1ccb12162be1054a34", "sha256": "3d8cefe84b9eb1dbf6c529c4a20695f5d31ba90282f095e4aeabb9ff402f9d9d" }, "downloads": -1, "filename": "NoAho-0.9.6.zip", "has_sig": false, "md5_digest": "548b705801ec3c1ccb12162be1054a34", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 52590, "upload_time": "2015-02-21T05:31:51", "url": 
"https://files.pythonhosted.org/packages/8d/df/83b6b2a54d2ace100d0ac50005e35cf7b384b8c51b6338f51571cb9523d7/NoAho-0.9.6.zip" } ], "0.9.6.1": [ { "comment_text": "", "digests": { "md5": "1418a6b84ba5454629b5861f14c0f81f", "sha256": "f07b3b313cf063395dac0f2030a7e5c281f039c630ed565d125f441e11c975a5" }, "downloads": -1, "filename": "NoAho-0.9.6.1.tar.bz2", "has_sig": false, "md5_digest": "1418a6b84ba5454629b5861f14c0f81f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 40504, "upload_time": "2015-07-05T04:46:24", "url": "https://files.pythonhosted.org/packages/33/4a/3ea8d85ce45ed7fcb3785c80e8047bb1ad4407608381bb1fc516ea6a4780/NoAho-0.9.6.1.tar.bz2" }, { "comment_text": "", "digests": { "md5": "85816f8c2b099014c52f3f73256a0c8a", "sha256": "d41961d8b46ca260c102be9b9ebde4a4d3e0d5babc3a1e63bb6478a2f2029feb" }, "downloads": -1, "filename": "NoAho-0.9.6.1.tar.gz", "has_sig": false, "md5_digest": "85816f8c2b099014c52f3f73256a0c8a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 47229, "upload_time": "2015-07-05T04:46:27", "url": "https://files.pythonhosted.org/packages/2d/ef/1462ac8dc402b0ad622a64f10b394cf5136b47636e6e20ef39f3357406a6/NoAho-0.9.6.1.tar.gz" }, { "comment_text": "", "digests": { "md5": "0304c2d48e5db251debf74f044ae8ce1", "sha256": "4ffcf29e7002203f39a8f254ad7f792351dcb601aea84988f7e5cd4c4a027888" }, "downloads": -1, "filename": "NoAho-0.9.6.1.zip", "has_sig": false, "md5_digest": "0304c2d48e5db251debf74f044ae8ce1", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 54028, "upload_time": "2015-07-05T04:46:30", "url": "https://files.pythonhosted.org/packages/f1/ea/4c6f909222da563daf274ddb4a9be70807c91d2a2bafa79d2c1f4372b183/NoAho-0.9.6.1.zip" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "1418a6b84ba5454629b5861f14c0f81f", "sha256": "f07b3b313cf063395dac0f2030a7e5c281f039c630ed565d125f441e11c975a5" }, "downloads": -1, "filename": 
"NoAho-0.9.6.1.tar.bz2", "has_sig": false, "md5_digest": "1418a6b84ba5454629b5861f14c0f81f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 40504, "upload_time": "2015-07-05T04:46:24", "url": "https://files.pythonhosted.org/packages/33/4a/3ea8d85ce45ed7fcb3785c80e8047bb1ad4407608381bb1fc516ea6a4780/NoAho-0.9.6.1.tar.bz2" }, { "comment_text": "", "digests": { "md5": "85816f8c2b099014c52f3f73256a0c8a", "sha256": "d41961d8b46ca260c102be9b9ebde4a4d3e0d5babc3a1e63bb6478a2f2029feb" }, "downloads": -1, "filename": "NoAho-0.9.6.1.tar.gz", "has_sig": false, "md5_digest": "85816f8c2b099014c52f3f73256a0c8a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 47229, "upload_time": "2015-07-05T04:46:27", "url": "https://files.pythonhosted.org/packages/2d/ef/1462ac8dc402b0ad622a64f10b394cf5136b47636e6e20ef39f3357406a6/NoAho-0.9.6.1.tar.gz" }, { "comment_text": "", "digests": { "md5": "0304c2d48e5db251debf74f044ae8ce1", "sha256": "4ffcf29e7002203f39a8f254ad7f792351dcb601aea84988f7e5cd4c4a027888" }, "downloads": -1, "filename": "NoAho-0.9.6.1.zip", "has_sig": false, "md5_digest": "0304c2d48e5db251debf74f044ae8ce1", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 54028, "upload_time": "2015-07-05T04:46:30", "url": "https://files.pythonhosted.org/packages/f1/ea/4c6f909222da563daf274ddb4a9be70807c91d2a2bafa79d2c1f4372b183/NoAho-0.9.6.1.zip" } ] }