{ "info": { "author": "GE Ning", "author_email": "benjaminzge@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Developers", "License :: OSI Approved :: GNU Lesser General Public License v3 (LGPLv3)", "Operating System :: OS Independent", "Programming Language :: Python :: 2.6", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3", "Topic :: Text Processing :: General", "Topic :: Utilities" ], "description": "2-dimensional Sequence Regular Expression (SEQ RE)\n==================================================\n\nThis module provides regular expression matching operations over a sequence of tuples\n(or a sequence of sequence) data structure. It looks like the following::\n\n seq_m_n = [[str_11, str_12, ... str_1n],\n [str_21, str_22, ... str_2n],\n ...,\n [str_m1, str_m2, ... str_mn]]\n\nThe sequence is a homogeneous 2D array, that is a matrix with m rows and n columns.\nIn practice, m maybe vary from sequence to sequence, while n is usually a fixed-size.\n\nA element in the tuple of the sequence can be considered as either a string, a word, a phrase,\na char, a flag, a token or a tag, and maybe a set of tags or values (multi-values) in the future.\n\nTo match a pattern over a sequence of tuples,\nthe SEQ RE patterns is written like one of the examples::\n\n ([;;PERSON]+) [was|has been] [an]? .{0,3} ([^painter|drawing artist|\u753b\u5bb6])\n\n (?P[;;PERSON]) [;VERB be;] [born] [on] (?P([;;NUMBER|MONTH]|[-]){2,3})\n\n\n1. The syntax of SEQ RE pattern\n-------------------------------\n\nA SEQ RE pattern is very similar to the ordinary regular express (RE) used in Python,\nin which the delimiters ``[...]`` is to indicate a tuple -- the second dimension of the sequence.\n\n1.1 Inside ``[...]``\n++++++++++++++++++++\n\n- ``[`` and ``]``\n\n is the beginning and end delimiter of the tuple, e.g. ``[...]``.\n\n- ``;``\n\n separates each element which the tuple contains,\n and the continuous ``;`` at the tail can be omitted,\n e.g. ``[A|B;X;;]``, ``[A|B;X]``.\n\n- ``|``\n\n indicates the different values of one element, e.g. ``A|B``.\n These values form a set, and any string in the set will be matched,\n e.g. ``A|B`` will match ``A`` or ``B``.\n\n- ``^``\n\n be the first character of an element,\n all the string that are not in the value set of this element will be matched.\n And ``^`` has no special meaning if it\u2019s not the first character of the element.\n If ``^`` comes the first character of an element but it is a part of a literal string,\n ``\\^`` should be used to escape it.\n\n- The priority of above-mentioned operations:\n\n ``[`` ``]`` < ``;`` < ``^`` (not literal) < ``|`` < ``^`` (literal) .\n\n- ``\\``\n\n is an escaping symbol before aforementioned special characters.\n Characters other than ``]``, ``:`` or ``\\`` lose their special meaning inside ``[...]``.\n To express ``]``, ``:`` or ``|`` in literal, ``\\`` should be added before ``]``, ``:`` or ``|``.\n Meanwhile, to represent a literal backslash ``\\`` before ``]``, ``;`` or ``|``,\n ``\\\\`` should be used in the plain text\n that is to say ``'\\\\\\\\'`` must be used in the Python code.\n\n1.2 Outside ``[...]``\n+++++++++++++++++++++\n\n- The special meanings of special characters in the ordinary RE are available here,\n but with the limitations discussed below.\n\n 1. **Not** support ``[`` and ``]`` as special characters to indicate a set of characters.\n\n 2. **Not** support the following escaped special characters:\n ``\\number``, ``\\A``, ``\\b``, ``\\B``, ``\\d``, ``\\D``, ``\\s``, ``\\S``,\n ``\\w``, ``\\W``, ``\\Z``, ``\\a``, ``\\b``, ``\\f``, ``\\n``, ``\\r``, ``\\t``, ``\\v``,\n ``\\x``.\n\n 3. **Not** support ranges of characters,\n such as ``[0-9A-Za-z]``, ``[\\u4E00-\\u9FBB\\u3007]`` (Unihan and Chinese character ``\u3007``)\n used in ordinary RE.\n\n 4. The whitespace and non-special characters are ignored.\n\n- ``.`` is an abbreviation of an arbitrary tuple ``[]`` or ``[;]``.\n\n- The named groups in the pattern are very useful.\n As an extension, a format string starting with ``@`` can be followed after the group name,\n to describe which element of the tuples belonging this group will be output as the result.\n For example: ``(?P...)``,\n in which ``d1``, ``d2`` and ``d3`` are all 0-based position index number of elements in the tuple.\n\n 1. ``@0,2:4`` means in the matched result only the 0th\n and from 2nd to 3rd elements of tuples will be output.\n\n 2. ``@@`` means the pattern of the group itself will be output other than the matched result.\n one can choose whether to include the group name and parentheses or not.\n\n 3. ``@`` means all elements of tuples in the matched result will be output.\n\n1.3 Boolean logic in the ``[...]``\n++++++++++++++++++++++++++++++++++\n\nGiven a sequence of 3-tuple ``[[s1, s2, s3], ... ]``,\n\n- AND\n\n ``[X;;Y]`` will match ``s1`` == ``X`` && ``s3`` == ``Y``.\n Its behavior looks like the ordinary RE pattern ``(?:X.Y)``.\n\n- OR\n\n ``[X;;]|[;;Y]`` will match ``s1`` == ``X`` || ``s3`` == ``Y``.\n Its behavior looks like the ordinary RE pattern ``(?:X..)|(?:..Y)``\n\n- NOT\n\n If ``[;^P;]`` will match ``s2`` != ``P``.\n Its behavior looks like the ordinary RE pattern ``(?:.[^P].)``.\n\n We can also use a negative lookahead assertion of the ordinary RE,\n to give a negative covering its following.\n e.g. ``(?![;P;][Q])[;;][;;]`` <==> ``[;^P;][^Q;;]``,\n which behavior looks like the ordinary RE pattern ``(?!(?:.P.)(?:Q..))...``.\n\n2. Notes\n--------\n\n**Not** support comparing the number of figures.\n\nMulti-values of one element is not supported now, but this feature may be improved in the future.\n\nAlthough SEQ RE has sufficient ability to express a pattern over sequences of tuples,\nit is still not a cascaded regular expressions (see also: `Stanford TokensRegex\n`_).\n\n\n3. Examples\n-----------\n\nThe usage of seq_re module::\n\n from __future__ import print_function\n import seq_re\n\n n = 3\n pattern = ('(?P[;;PERSON]+) [is|was|has been] [a|an]? '\n '(?P.{0,3}) ([artist])')\n seq = [['Vincent van Gogh', 'NNP', 'PERSON'],\n ['was', 'VBD', 'O'],\n ['a', 'DT', 'O'],\n ['Dutch', 'JJ', 'O'],\n ['Post-Impressionist', 'NN', 'O'],\n ['painter', 'NN', 'OCCUPATION'],\n ['who', 'WP', 'O'],\n ['is', 'VBZ', 'O'],\n ['among', 'IN', 'O'],\n ['the', 'DT', 'O'],\n ['most', 'RBS', 'O'],\n ['famous', 'JJ', 'O'],\n ['and', 'CC', 'O'],\n ['influential', 'JJ', 'O'],\n ['figures', 'NNS', 'O'],\n ['in', 'IN', 'O'],\n ['the', 'DT', 'O'],\n ['history', 'NN', 'O'],\n ['of', 'IN', 'O'],\n ['Western art', 'NNP', 'DOMAIN'],\n ['.', '.', 'O']]\n placeholder_dict = {'artist': ['painter', 'drawing artist']}\n\n sr = seq_re.SeqRegex(n).compile(pattern, **placeholder_dict)\n match = sr.search(seq)\n if match:\n for g in match.group_list:\n print(' '.join(['`'.join(tup) for tup in g[1]]))\n for name in sorted(match.named_group_dict,\n key=lambda gn: match.named_group_dict[gn][0]):\n print(name, match.format_group_to_str(name, True))", "description_content_type": null, "docs_url": "https://pythonhosted.org/seq-re/", "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/gening/seq_re", "keywords": "n-tuple n-vector token sequence seq 2D-array matrix regular-expressions regex", "license": "LGPL-3.0", "maintainer": "", "maintainer_email": "", "name": "seq-re", "package_url": "https://pypi.org/project/seq-re/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/seq-re/", "project_urls": { "Homepage": "https://github.com/gening/seq_re" }, "release_url": "https://pypi.org/project/seq-re/0.2.1/", "requires_dist": null, "requires_python": "", "summary": "2-dimensional Sequence Regular Expression (SEQ RE)", "version": "0.2.1" }, "last_serial": 2828603, "releases": { "0.2": [ { "comment_text": "", "digests": { "md5": "4ab481cde404bf86a602a68f91d752e8", "sha256": "b83767e5e52f5efd37bfd8120a3d97408cf08be6cbc0e698d58cd14f483c4fc6" }, "downloads": -1, "filename": "seq_re-0.2.tar.gz", "has_sig": false, "md5_digest": "4ab481cde404bf86a602a68f91d752e8", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 21478, "upload_time": "2017-04-19T10:23:46", "url": "https://files.pythonhosted.org/packages/a3/f9/61335fec062990c7de2ccf77513e5995e0f7c5a35a6439741cf4c6d7d8a1/seq_re-0.2.tar.gz" } ], "0.2.1": [ { "comment_text": "", "digests": { "md5": "941f32ef91938bdf8905d51cfd6e7dcb", "sha256": "f26aa53ed1c49bff18bf3aadb98f0b58944b406e4b503b024c129b5060bb97f1" }, "downloads": -1, "filename": "seq_re-0.2.1.tar.gz", "has_sig": false, "md5_digest": "941f32ef91938bdf8905d51cfd6e7dcb", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 22031, "upload_time": "2017-04-25T11:53:28", "url": "https://files.pythonhosted.org/packages/01/fc/3b3ab30b98bb7e8b9ae525615251686f35580c54ada0a246eebe374f6516/seq_re-0.2.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "941f32ef91938bdf8905d51cfd6e7dcb", "sha256": "f26aa53ed1c49bff18bf3aadb98f0b58944b406e4b503b024c129b5060bb97f1" }, "downloads": -1, "filename": "seq_re-0.2.1.tar.gz", "has_sig": false, "md5_digest": "941f32ef91938bdf8905d51cfd6e7dcb", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 22031, "upload_time": "2017-04-25T11:53:28", "url": "https://files.pythonhosted.org/packages/01/fc/3b3ab30b98bb7e8b9ae525615251686f35580c54ada0a246eebe374f6516/seq_re-0.2.1.tar.gz" } ] }