{ "info": { "author": "David Hagen", "author_email": "david@drhagen.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 5 - Production/Stable", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7", "Programming Language :: Python :: 3 :: Only", "Topic :: Software Development :: Libraries" ], "description": "Parsita\n=======\n\n.. image:: https://travis-ci.org/drhagen/parsita.svg\n :target: https://travis-ci.org/drhagen/parsita\n.. image:: https://codecov.io/github/drhagen/parsita/coverage.svg\n :target: https://codecov.io/github/drhagen/parsita\n.. image:: https://img.shields.io/pypi/v/parsita.svg\n :target: https://pypi.python.org/pypi/parsita\n.. image:: https://img.shields.io/pypi/pyversions/parsita.svg\n :target: https://pypi.python.org/pypi/parsita\n\nThe executable grammar of parser combinators made available in the executable pseudocode of Python.\n\nMotivation\n----------\n\nParsita is a parser combinator library written in Python. Parser combinators provide an easy way to define a grammar using code so that the grammar itself effectively parses the source. They are not the fastest to parse, but they are the easiest to write. The science of parser combinators is best left to `others `__, so I will demonstrate only the syntax of Parsita.\n\nLike all good parser combinator libraries, this one abuses operators to provide a clean grammar-like syntax. The ``__or__`` method is defined so that ``|`` tests between two alternatives. The ``__and__`` method is defined so that ``&`` tests two parsers in sequence. 
Other operators are used as well.\n\nIn a technique that I think is new to Python, Parsita uses metaclass magic to allow for forward declarations of values. This is important for parser combinators because grammars are often recursive or mutually recursive, meaning that some components must be used in the definition of others before they themselves are defined.\n\nMotivating example\n^^^^^^^^^^^^^^^^^^\n\nBelow is a complete parser of `JSON `__. It could have been shorter if I had chosen to cheat with Python's ``eval``, but I wanted to show the full power of Parsita:\n\n.. code:: python\n\n from parsita import *\n\n json_whitespace = r'[ \\t\\n\\r]*'\n\n class JsonStringParsers(TextParsers, whitespace=None):\n quote = lit(r'\\\"') > constant('\"')\n reverse_solidus = lit(r'\\\\') > constant('\\\\')\n solidus = lit(r'\\/') > constant('/')\n backspace = lit(r'\\b') > constant('\\b')\n form_feed = lit(r'\\f') > constant('\\f')\n line_feed = lit(r'\\n') > constant('\\n')\n carriage_return = lit(r'\\r') > constant('\\r')\n tab = lit(r'\\t') > constant('\\t')\n uni = reg(r'\\\\u([0-9a-fA-F]{4})') > (lambda x: chr(int(x.group(1), 16)))\n\n escaped = (quote | reverse_solidus | solidus | backspace | form_feed |\n line_feed | carriage_return | tab | uni)\n unescaped = reg(r'[\\u0020-\\u0021\\u0023-\\u005B\\u005D-\\U0010FFFF]+')\n\n string = '\"' >> rep(escaped | unescaped) << '\"' > ''.join\n\n\n class JsonParsers(TextParsers, whitespace=json_whitespace):\n number = reg(r'-?(0|[1-9][0-9]*)(\\.[0-9]+)?([eE][-+]?[0-9]+)?') > float\n\n false = lit('false') > constant(False)\n true = lit('true') > constant(True)\n null = lit('null') > constant(None)\n\n string = JsonStringParsers.string\n\n array = '[' >> repsep(value, ',') << ']'\n\n entry = string << ':' & value\n obj = '{' >> repsep(entry, ',') << '}' > dict\n\n value = number | false | true | null | string | array | obj\n\n if __name__ == '__main__':\n strings = [\n '\"name\"',\n '-12.40e2',\n '[false, true, null]',\n '{\"__class__\" 
: \"Point\", \"x\" : 2.3, \"y\" : -1.6}',\n '{\"__class__\" : \"Rectangle\", \"location\" : {\"x\":-1.3,\"y\":-4.5}, \"height\" : 2.0, \"width\" : 4.0}',\n ]\n\n for string in strings:\n print('source: {}\\nvalue: {}'.format(string, JsonParsers.value.parse(string)))\n\nTutorial\n--------\n\nThe recommended means of installation is with ``pip`` from PyPI.\n\n.. code:: bash\n\n pip3 install parsita\n\nThere is a lot of generic parsing machinery under the hood. Parser combinators have a rich science behind them. If you know all about that and want to do advanced parsing, by all means pop open the source hood and install some nitro. However, most users will want the basic interface, which is described below.\n\n.. code:: python\n\n from parsita import *\n\nMetaclass magic\n^^^^^^^^^^^^^^^\n\n``GeneralParsers`` and ``TextParsers`` are two imported classes that are just wrappers around a couple of metaclasses. They are not meant to be instantiated. They are meant to be inherited from and their class bodies used to define a grammar. I am going to call these classes \"contexts\" to reflect their intended usage.\n\n.. code:: python\n\n class MyParsers(TextParsers):\n ...\n\nIf you are parsing strings (and you almost certainly are), use ``TextParsers``, not ``GeneralParsers``. If you know what it means to parse things other than strings, you probably don't need this tutorial anyway. The ``TextParsers`` context ignores whitespace. By default it considers ``r\"\\s*\"`` to be whitespace, but this can be configured using the ``whitespace`` keyword. Use ``None`` to disable whitespace skipping.\n\n.. code:: python\n\n class MyParsers(TextParsers, whitespace=r'[ \\t]*'):\n # In here, only space and tab are considered whitespace.\n # This can be useful for grammars sensitive to newlines.\n ...\n\n``lit(*literals)``: literal parser\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThis is the simplest parser. It matches the exact string provided and returns the string as its value. 
If multiple arguments are provided, it tries each one in succession, returning the first one it finds.\n\n.. code:: python\n\n class HelloParsers(TextParsers):\n hello = lit('Hello World!')\n assert HelloParsers.hello.parse('Hello World!') == Success('Hello World!')\n assert HelloParsers.hello.parse('Goodbye') == Failure(\"Hello World! expected but Goodbye found\")\n\nIn most cases, the call to ``lit`` is handled automatically. If a bare string is provided to the functions and operators below, it will be promoted to a literal parser whenever possible. Only when an operator is between two Python types, like a string and a string ``'a' | 'b'`` or a string and a function ``'100' > int``, will this \"implicit conversion\" not take place, and you have to use ``lit`` (e.g. ``lit('a', 'b')`` and ``lit('100') > int``).\n\n``reg(pattern)``: regular expression parser\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nLike ``lit``, this matches a string and returns it, but the matching is done with a `regular expression `__.\n\n.. code:: python\n\n class IntegerParsers(TextParsers):\n integer = reg(r'[-+]?[0-9]+')\n assert IntegerParsers.integer.parse('-128') == Success('-128')\n\n``parser > function``: conversion parser\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nConversion parsers don't change how the text is parsed; they change the value returned. Every parser returns a value when it succeeds. The function supplied must take a single argument (that value) and return a new value. This is how text is converted to other objects and simpler objects built into larger ones. In accordance with Python's operator precedence, ``>`` is the operator in Parsita with the loosest binding.\n\n.. code:: python\n\n class IntegerParsers(TextParsers):\n integer = reg(r'[-+]?[0-9]+') > int\n assert IntegerParsers.integer.parse('-128') == Success(-128)\n\n``parser1 | parser2``: alternative parser\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThis tries to match ``parser1``. 
If it fails, it then tries to match ``parser2``. If both fail, it returns the failure message from whichever one got farther. Either side can be a bare string, but not both, because ``'a' | 'b'`` tries to call ``__or__`` on ``str``, which fails. To try alternative literals, use ``lit`` with multiple arguments.\n\n.. code:: python\n\n class NumberParsers(TextParsers):\n integer = reg(r'[-+]?[0-9]+') > int\n real = reg(r'[+-]?\\d+\\.\\d+(e[+-]?\\d+)?') | 'nan' | 'inf' > float\n number = integer | real\n assert NumberParsers.number.parse('4.0000') == Success(4.0)\n\n``parser1 & parser2``: sequential parser\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nAll the parsers above will match at most one thing. This is the syntax for matching one parser and then another after it. If working in the ``TextParsers`` context, the two may be separated by whitespace. The value returned is a list of all the values returned by each parser. If there are multiple parsers separated by ``&``, a list of the same length as the number of parsers is returned. Like ``|``, either side may be a bare string, but not both. In accordance with Python's operator precedence, ``&`` binds more tightly than ``|``.\n\n.. code:: python\n\n class UrlParsers(TextParsers, whitespace=None):\n url = lit('http', 'ftp') & '://' & reg(r'[^/]+') & reg(r'.*')\n assert UrlParsers.url.parse('http://drhagen.com/blog/sane-equality/') == \\\n Success(['http', '://', 'drhagen.com', '/blog/sane-equality/'])\n\n``parser1 >> parser2`` and ``parser1 << parser2``: discard left and right parsers\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe discard left and discard right parsers match the exact same text as ``parser1 & parser2``, but rather than return a list of values from both, the left value in ``>>`` and the right value in ``<<`` are discarded so that only the remaining value is returned. 
A mnemonic to help remember which is which is to imagine the symbols as open mouths eating the parser to be discarded.\n\n.. code:: python\n\n class PointParsers(TextParsers):\n integer = reg(r'[-+]?[0-9]+') > int\n point = '(' >> integer << ',' & integer << ')'\n assert PointParsers.point.parse('(4, 3)') == Success([4, 3])\n\nIn accordance with Python's operator precedence, these bind more tightly than any other operators including ``&`` or ``|``, meaning that ``<<`` and ``>>`` discard only the immediate parser.\n\n- Incorrect: ``entry = key << ':' >> value``\n- Correct: ``entry = key << ':' & value``\n- Also correct: ``entry = key & ':' >> value``\n- Incorrect: ``hostname = lit('http', 'ftp') & '://' >> reg(r'[^/]+') << reg(r'.*')``\n- Correct: ``hostname = lit('http', 'ftp') >> '://' >> reg(r'[^/]+') << reg(r'.*')``\n- Also correct: ``hostname = (lit('http', 'ftp') & '://') >> reg(r'[^/]+') << reg(r'.*')``\n\n``opt(parser)``: optional parser\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nAn optional parser tries to match its argument. If the argument succeeds, it returns a list of length one with the successful value as its only element. If the argument fails, then ``opt`` succeeds anyway, but returns an empty list and consumes no input.\n\n.. code:: python\n\n class DeclarationParsers(TextParsers):\n id = reg(r'[A-Za-z_][A-Za-z0-9_]*')\n declaration = id & opt(':' >> id)\n assert DeclarationParsers.declaration.parse('x: int') == Success(['x', ['int']])\n\n``rep(parser)`` and ``rep1(parser)``: repeated parsers\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nA repeated parser matches repeated instances of its parser argument. It returns a list with each element being the value of one match. ``rep1`` only succeeds if at least one match is found. ``rep`` always succeeds, returning an empty list if no matches are found.\n\n.. 
code:: python\n\n class SummationParsers(TextParsers):\n integer = reg(r'[-+]?[0-9]+') > int\n summation = integer & rep('+' >> integer) > (lambda x: sum([x[0]] + x[1]))\n assert SummationParsers.summation.parse('1 + 1 + 2 + 3 + 5') == Success(12)\n\n``repsep(parser, separator)`` and ``rep1sep(parser, separator)``: repeated separated parsers\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nA repeated separated parser matches ``parser`` separated by ``separator``, returning a list of the values returned by ``parser`` and discarding the value of ``separator``. ``rep1sep`` only succeeds if at least one match is found. ``repsep`` always succeeds, returning an empty list if no matches are found.\n\n.. code:: python\n\n class ListParsers(TextParsers):\n integer = reg(r'[-+]?[0-9]+') > int\n my_list = '[' >> repsep(integer, ',') << ']'\n assert ListParsers.my_list.parse('[1,2,3]') == Success([1, 2, 3])\n\n``eof``: end of file\n^^^^^^^^^^^^^^^^^^^^\n\nA parser that matches the end of the input stream. It is not necessary to include this on every parser. The ``parse`` method on every parser is successful if it matches the entire input. The ``eof`` parser is only needed to indicate that the preceding parser is only valid at the end of the input. Most commonly, it is used as an alternative to an end token when the end token may be omitted at the end of the input. Note that ``eof`` is not a function; it is a complete parser itself.\n\n.. 
code:: python\n\n class OptionsParsers(TextParsers):\n option = reg(r'[A-Za-z]+') << '=' & reg(r'[A-Za-z]+') << (';' | eof)\n options = rep(option)\n assert OptionsParsers.options.parse('log=warn;detail=minimal;') == \\\n Success([['log', 'warn'], ['detail', 'minimal']])\n assert OptionsParsers.options.parse('log=warn;detail=minimal') == \\\n Success([['log', 'warn'], ['detail', 'minimal']])\n\n``fwd()``: forward declaration\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThis creates a forward declaration for a parser to be defined later. This function is not typically needed because forward declarations are created automatically within the class bodies of subclasses of ``TextParsers`` and ``GeneralParsers``, which is the recommended way to use Parsita. This function exists so you can create a forward declaration manually because you are either working outside of the magic classes or wish to define them manually to make your IDE happier.\n\nTo use ``fwd``, first assign ``fwd()`` to a variable, then use that variable in other combinators like any other parser, then call the ``define(parser: Parser)`` method on the object to provide the forward declaration with its definition. The forward declaration will now look and act like the definition provided.\n\n.. code:: python\n\n class ArithmeticParsers(TextParsers):\n number = reg(r'[+-]?\\d+(\\.\\d+)?(e[+-]?\\d+)?') > float\n expr = fwd()\n base = '(' >> expr << ')' | number\n add = base & '+' >> expr > (lambda x: x[0] + x[1])\n subtract = base & '-' >> expr > (lambda x: x[0] - x[1])\n expr.define(add | subtract | base)\n assert ArithmeticParsers.expr.parse('2-(1+2)') == Success(-1.0)\n\n``success(value)``: always succeed with value\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThis parser always succeeds with the given ``value`` of an arbitrary type while consuming no input. Its utility is limited to inserting arbitrary values into complex parsers, often as a placeholder for unimplemented code. 
Usually, these kinds of values are better inserted as a post processing step or with a conversion parser ``>``, but for prototyping, this parser can be convenient.\n\n.. code:: python\n\n class HostnameParsers(TextParsers, whitespace=None):\n port = success(80) # TODO: do not just ignore other ports\n host = rep1sep(reg('[A-Za-z0-9]+([-]+[A-Za-z0-9]+)*'), '.')\n server = host & port\n assert HostnameParsers.server.parse('drhagen.com') == Success([['drhagen', 'com'], 80])\n\n``failure(expected)``: always fail with message\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThis parser always fails with a message that it is expecting the given string ``expected``. Its utility is limited to marking sections of code as either not yet implemented or providing a better error message for common bad input. Usually, these kinds of messages are better crafted as a processing step following parsing, but for prototyping, they can be inserted with this parser.\n\n.. code:: python\n\n class HostnameParsers(TextParsers, whitespace=None):\n # TODO: implement allowing different port\n port = lit('80') | reg('[0-9]+') & failure('no other port than 80')\n host = rep1sep(reg('[A-Za-z0-9]+([-]+[A-Za-z0-9]+)*'), '.')\n server = host << ':' & port\n assert HostnameParsers.server.parse('drhagen.com:443') == \\\n Failure('Expected no other port than 80 but found end of source')\n", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/drhagen/parsita", "keywords": "parser combinator", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "parsita", "package_url": "https://pypi.org/project/parsita/", "platform": "", "project_url": "https://pypi.org/project/parsita/", "project_urls": { "Homepage": "https://github.com/drhagen/parsita" }, "release_url": "https://pypi.org/project/parsita/1.3.2/", "requires_dist": null, "requires_python": "", "summary": "Parser 
combinator library for Python.", "version": "1.3.2" }, "last_serial": 4101110, "releases": { "1.0.0": [ { "comment_text": "", "digests": { "md5": "4985de483ec4c96514d6361e1fbd6bd9", "sha256": "8ae26767555ca54e254b0cbb2a66ff3dd0cffdde5172e590f6b70c5952937b12" }, "downloads": -1, "filename": "parsita-1.0.0.tar.gz", "has_sig": false, "md5_digest": "4985de483ec4c96514d6361e1fbd6bd9", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 13214, "upload_time": "2016-10-02T21:56:50", "url": "https://files.pythonhosted.org/packages/ff/d0/35c46eaa4c10de297ce26c61748283e9946e67c7082743301b90102fdcb6/parsita-1.0.0.tar.gz" } ], "1.1.0": [ { "comment_text": "", "digests": { "md5": "d807af75dd3bec1e8240ff0c662deb17", "sha256": "02aebe9c7e92e54692fd3b16309d247a6e90b26e916cf6cbcdff4b8e8ff48c6a" }, "downloads": -1, "filename": "parsita-1.1.0.tar.gz", "has_sig": false, "md5_digest": "d807af75dd3bec1e8240ff0c662deb17", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 13630, "upload_time": "2016-11-16T11:47:40", "url": "https://files.pythonhosted.org/packages/60/71/5b03a55c9899f2100592967da146ffb35b8043e3c51f9dc70415515c800d/parsita-1.1.0.tar.gz" } ], "1.1.1": [ { "comment_text": "", "digests": { "md5": "bde32bc55719d2935def133c9ff19b0b", "sha256": "dbe859430f3bd4b03998e782b1363a2ae8ae09cc34cfe9479f8c8327739c4a2e" }, "downloads": -1, "filename": "parsita-1.1.1.tar.gz", "has_sig": false, "md5_digest": "bde32bc55719d2935def133c9ff19b0b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 13632, "upload_time": "2017-02-18T14:07:08", "url": "https://files.pythonhosted.org/packages/ee/7e/30bd339ca51947f649868615879b193e04cf0935df68c84d8794769144d7/parsita-1.1.1.tar.gz" } ], "1.2.0": [ { "comment_text": "", "digests": { "md5": "d58d079431d4d109471fd3b95a5b3312", "sha256": "f1ba917e714df20c78f033d081d05cf90a47481a2a879561afa340c3f95f7848" }, "downloads": -1, "filename": 
"parsita-1.2.0.tar.gz", "has_sig": false, "md5_digest": "d58d079431d4d109471fd3b95a5b3312", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 14698, "upload_time": "2017-10-06T00:46:14", "url": "https://files.pythonhosted.org/packages/f7/da/9b5e7a944cd52fbba67fe1bb7d018eeeabb539a4948d05ff3db8cfe83559/parsita-1.2.0.tar.gz" } ], "1.2.1": [ { "comment_text": "", "digests": { "md5": "4649974e716b3a0a36dc028268164133", "sha256": "bf3a35287599bf65f23e7095784131fdda60500b9f760480c332a9b20903c008" }, "downloads": -1, "filename": "parsita-1.2.1.tar.gz", "has_sig": false, "md5_digest": "4649974e716b3a0a36dc028268164133", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 15719, "upload_time": "2018-04-02T00:57:45", "url": "https://files.pythonhosted.org/packages/86/f6/bb50d2e8d15034607f0b5abea0d464ebdda5bfba6536d35954bdbebd0271/parsita-1.2.1.tar.gz" } ], "1.3.0": [ { "comment_text": "", "digests": { "md5": "33d3dbca0ec0112363a0c305e8bd55ad", "sha256": "96a4fb94a6ee705a256aa5251791bb04be0186af2eaea9acb5d09ed1f3a33748" }, "downloads": -1, "filename": "parsita-1.3.0.tar.gz", "has_sig": false, "md5_digest": "33d3dbca0ec0112363a0c305e8bd55ad", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 17138, "upload_time": "2018-04-28T17:26:45", "url": "https://files.pythonhosted.org/packages/27/95/31f369c4cbf5df3b5e78d1bfb71d4f9921734f03d7fb653f0a62356a5392/parsita-1.3.0.tar.gz" } ], "1.3.1": [ { "comment_text": "", "digests": { "md5": "169497e4548f0a825b35abb3bb42c5fc", "sha256": "6096ddc5ff9d2def24d195a39275d9fa7045b0ff65d8c993f15491890a895546" }, "downloads": -1, "filename": "parsita-1.3.1.tar.gz", "has_sig": false, "md5_digest": "169497e4548f0a825b35abb3bb42c5fc", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 17143, "upload_time": "2018-04-28T17:49:21", "url": 
"https://files.pythonhosted.org/packages/c9/2e/793eef161fd99511962d696df9c28b4db51de9e9201f8233fb48010b5dd9/parsita-1.3.1.tar.gz" } ], "1.3.2": [ { "comment_text": "", "digests": { "md5": "4b00e08bc4a80df304c82fe280e5c58e", "sha256": "9a5a132f1e40339c204406b86e024ddc8819764a1b776fbe297391c559156e87" }, "downloads": -1, "filename": "parsita-1.3.2.tar.gz", "has_sig": false, "md5_digest": "4b00e08bc4a80df304c82fe280e5c58e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 17166, "upload_time": "2018-07-25T16:08:26", "url": "https://files.pythonhosted.org/packages/b8/8c/596a5b4c2e6a6d6402c2a02c9c7526b0f45e471dd6a663417ae5dcff5537/parsita-1.3.2.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "4b00e08bc4a80df304c82fe280e5c58e", "sha256": "9a5a132f1e40339c204406b86e024ddc8819764a1b776fbe297391c559156e87" }, "downloads": -1, "filename": "parsita-1.3.2.tar.gz", "has_sig": false, "md5_digest": "4b00e08bc4a80df304c82fe280e5c58e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 17166, "upload_time": "2018-07-25T16:08:26", "url": "https://files.pythonhosted.org/packages/b8/8c/596a5b4c2e6a6d6402c2a02c9c7526b0f45e471dd6a663417ae5dcff5537/parsita-1.3.2.tar.gz" } ] }