{ "info": { "author": "lisael", "author_email": "lisael@lisael.org", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Developers", "License :: OSI Approved :: GNU General Public License v3 (GPLv3)", "Programming Language :: Python :: 2", "Programming Language :: Python :: 2.7", "Topic :: Software Development :: Code Generators", "Topic :: Software Development :: Compilers" ], "description": "==========\nFastidious\n==========\n\nA python `parsing expression grammar\n(PEG) `_ based parser\ngenerator. It is loosely based on https://github.com/PuerkitoBio/pigeon\n\nUsage\n=====\n\nFrom `the calculator example\n`_.\nTo read the example, think regex, except that the OR spells ``/``, that\nliteral chars are in quotes and that we can reference other rules.\n\n.. code-block:: python\n\n #! /usr/bin/python\n from fastidious import Parser\n\n\n class Calculator(Parser):\n\n # __grammar__ is the PEG definition. Each `rulename <- a rule`\n # adds a method `Calculator.rulename()`. This methods tries to\n # match the input at current position\n __grammar__ = r\"\"\"\n # the label is ommited. `:expr` is equivalent to `expr:expr`\n eval <- :expr EOF {@expr}\n\n # action {on_expr} calls `Calculator.on_expr(self, value, first, rest)`\n # on match. `first` and `rest` args are the labeled parts of the rule\n term <- first:factor rest:( _ mult_op _ factor )* {on_expr}\n\n # Because the Parser has a method named `on_expr` (\"on_\" + rulename)\n # this method is the implicit action of this rule. We omitted {on_expr}\n expr <- _ first:term rest:( _ add_op _ term )* _\n\n # there's no explicit or implicit action. These rules return their exact\n # matches\n add_op <- '+' / '-'\n mult_op <- '*' / '/'\n\n # action {@fact} means : return only the match of part labeled `fact`.\n factor <- ( '(' fact:expr ')' ) / fact:integer {@fact}\n\n integer <- '-'? [0-9]+\n _ <- [ \\n\\t\\r]*\n\n # this one is tricky. `.` means \"any char\". At EOF there's no char,\n # thus Not any char, thus `!.`\n EOF <- !.\n \"\"\"\n\n def on_expr(self, value, first, rest):\n result = first\n for r in rest:\n op = r[1]\n if op == '+':\n result += r[3]\n elif op == '-':\n result -= r[3]\n elif op == '*':\n result *= r[3]\n else:\n result /= r[3]\n return result\n\n def on_integer(self, value):\n return int(self.p_flatten(value))\n\n if __name__ == \"__main__\":\n import sys\n c = Calculator(\"\".join(sys.argv[1:]))\n result = c.eval()\n # because eval is the first rule defined in the grammar, it's the default rule.\n # We could call the classmethod `p_parse`:\n # result = Calculator.p_parse(\"\".join(sys.argv[1:]))\n # The default entry point can be overriden setting the class attribute\n # `__default__`\n print(result)\n\nThen you have a full-fledge state-of-the-art integer only calculator \\o/\n\n.. code-block:: sh\n\n examples/calculator.py \"-21 * ( 3 + 1 ) / -2\"\n 42\n\nInheritance\n+++++++++++\n\nA parser can inherit rules. Here's an example from fastidious tests:\n\n.. code-block:: python\n\n class Parent(Parser):\n __grammar__ = r\"\"\"\n some_as <- 'a'+\n \"\"\"\n\n\n class Child(Parent):\n __grammar__ = r\"\"\"\n letters <- some_as some_bs EOF {p_flatten}\n some_bs <- 'b'+\n EOF <- !.\n \"\"\"\n\n assert(Child.p_parse(\"aabb\") == \"aabb\")\n\nHere, ``Child`` has inherited the method the rule ``some_as``.\n\nRules can also be overridden in child parsers.\n\nNote that there's no overhead in inheritance at parsing as the rules from the parent\nare copied into the child.\n\nContrib\n-------\n\nI plan to add a set of reusable rules in ``fastidious.contrib`` to compose your\nparsers.\n\nAt the moment, there's only URLParser, that provides a rule that match URLs and\noutputs an ``urlparse.ParseResult`` on match.\n\nPlease send a pull request if you made an interesting piece of code :)\n\nPEG Syntax\n==========\n\nThe whole syntax is formally defined in `fastidious parser class\n`_, using\nthe PEG syntax (which is actually used to generate the fastidious parser itself,\nso it's THE TRUTH. Yes, I like meta-stuff). What follows is an informal and\nrather incomplete description of this syntax.\n\nIdentifiers, whitespace, comments and literals follow a subset of python\nnotation:\n\n.. code-block::\n\n # a comment\n 'a string literal'\n \"a more \\\"complex\\\" one with a litteral '\\\\' \\nand a second line\"\n _aN_iden7ifi3r\n\nIdentifiers MUST be valid python identifiers as they are added as methods on the\nparser objects. Parsers have utility methods that are prefixed by `p_` and\n`_p_`. Please avoid these names.\n\nRules\n+++++\n\nA PEG grammar consists of a set of rules. A rule is an identifier followed by a\nrule definition operator ``<-`` and an expression. An optional display name - a\nstring literal used in error messages instead of the rule identifier - can be\nspecified after the rule identifier. An action can also be specified enclosed in\n``{}`` after the rule, more on this later.\n\n.. code-block::\n\n rule_a \"friendly name\" <- 'a'+ {an_action} # one or more lowercase 'a's\n\nActions\n+++++++\n\nActions are a way to alter the output of a rule. Without actions the rules emit\nstrings, lists of strings, or a list of lists and strings.\n\nAction are useful to control the output. One could for example instantiate AST\nnodes, or, as we do in the JSON example, our result string, lists and dicts.\n\nActions can also be used to reduce the result as the input is parsed, that's\nexactly what we do in the calculator example in the method ``on_expr``.\n\nThere are two kind of actions: labels and methods\n\nLabel action\n------------\n\nIf an expression has a label, you can use it as the return value. In the calculator,\nwe use::\n\n factor <- ( '(' fact:expr ')' ) / fact:integer {@fact}\n\nHere, ``@fact`` means 'return the part labeled ``fact``' which is an integer literal\nor the result of an ``expr`` enclosed in parentheses, depending on the branch that\nmatches.\n\nAll the rest (e.g the parentheses) of the match is never output and is lost.\n\nMethod action\n-------------\n\nMethod actions are methods on the parser. In the calculator, there's::\n\n term <- first:factor rest:( _ mult_op _ factor )* {on_expr}\n\nThis means that on match, the method of the parser named ``on_expr`` is called\nwith one positional argument: ``value`` and two keyword arguments: ``first`` and\n``rest`` named after the labels in the expression.\n\n``value`` is the full value of the match, something like::\n\n [ 2 [ \" \", \"*\", \"\", 3]]\n\n``first`` would be ``2`` and ``rest`` would be ``[ \" \", \"*\", \"\", 3]``. \n\nI hope the indices of ``r`` in this method make sense, now:\n\n.. code-block:: python\n\n def on_expr(self, value, first, rest):\n result = first\n for r in rest:\n op = r[1]\n if op == '+':\n result += r[3]\n elif op == '-':\n result -= r[3]\n elif op == '*':\n result *= r[3]\n else:\n result /= r[3]\n return result\n\nNote that even though the rule ``_`` has the Kleen star ``*`` it will at least\nreturn an empty string, so ``rest`` is guaranteed to be a 4 elements list.\n\nBecause of its name, ``on_expr`` is also the implicit action of the rule ``expr``.\nThis can of course be overridden by adding an explicit action on the rule\n\nBuiltin method actions\n......................\n\nAt the moment, there's one builtin action ``{{p_flatten}}`` that recursively\nconcatenates a list of lists and strings::\n\n [\"a\", [\"b\", [\"c\", \"d\"], \"e\"], \"fg\"] => \"abcdefg\"\n\nExpressions\n+++++++++++\n\nA rule is defined by an expression. The following sections describe the various\nexpression types. Expressions can be grouped by using parentheses, and a rule\ncan be referenced by its identifier in place of an expression.\n\nChoice expression\n-----------------\n\nThe choice expression is a list of expressions that will be tested in the order\nthey are defined. The first one that matches will be used. Expressions are\nseparated by the forward slash character \"/\". E.g.:\n\n.. code-block::\n\n choice_expr <- A / B / C # A, B and C should be rules declared in the grammar\n\nBecause the first match is used, it is important to think about the order of\nexpressions. For example, in this rule, \"<=\" would never be used because the \"<\"\nexpression comes first:\n\n.. code-block::\n\n bad_choice_expr <- \"<\" / \"<=\"\n\nSequence expression\n-------------------\n\nThe sequence expression is a list of expressions that must all match in that\nsame order for the sequence expression to be considered a match. Expressions are\nseparated by whitespace. E.g.:\n\n.. code-block::\n\n seq_expr <- \"A\" \"b\" \"c\" # matches \"Abc\", but not \"Acb\"\n\nLabeled expression\n------------------\n\nA labeled expression consists of an identifier followed by a colon \":\" and an\nexpression. A labeled expression introduces a variable named with the label that\ncan be referenced in the action of the rule. The variable will have the value of\nthe expression that follows the colon. E.g.:\n\n.. code-block::\n\n labeled_expr <- value:[a-z]+ \"a suffix\" {@value}\n\nIf this sequence matches, the rule returns only the ``[a-z]+`` part instead of\n``[\"thevalue\", \"a suffix\"]``\n\nAnd and not expressions\n-----------------------\n\nAn expression prefixed with the exclamation point ``!`` is the \"not\" predicate\nexpression: it is considered a match if the following expression is not a\nmatch, but it does not consume any input.\n\nAn expression prefixed with the ampersand ``&`` is the \"and\" predicate\nexpression: it is considered a match if the following expression is a match,\nbut it does not consume any input.\n\n``&`` doesn't exist in pure PEG grammar theory, and is sugar for ``!!``\n\n.. code-block::\n\n\tnot_expr <- \"A\" !\"B\" # matches \"A\" if not followed by a \"B\" (does not consume \"B\")\n\tand_expr <- \"A\" &\"B\" # matches \"A\" if followed by a \"B\" (does not consume \"B\")\n\n\nRepeating expressions\n---------------------\n\nAn expression followed by \"*\", \"?\" or \"+\" is a match if the expression occurs\nzero or more times (\"*\"), zero or one time \"?\" or one or more times (\"+\")\nrespectively. The match is greedy, it will match as many times as possible.\nE.g:: \n\n zero_or_more_as <- \"A\"*\n\nLiteral matcher\n---------------\n\nA literal matcher tries to match the input against a single character or a\nstring literal. The literal may be a single-quoted or double-quoted string. \nThe same rules as Python apply regarding allowed characters and escaping.\n\nThe literal may be followed by a lowercase ``i`` (outside the ending quote)\nto indicate that the match is case-insensitive. E.g.::\n\n literal_match <- \"Awesome\\n\"i # matches \"awesome\" followed by a newline\n\nCharacter class matcher\n-----------------------\n\nA character class matcher tries to match the input against a class of\ncharacters inside square brackets ``[...]``. Inside the brackets, characters\nrepresent themselves and the same escapes as in string literals are available,\nexcept that the single- and double-quote escape is not valid, instead the\nclosing square bracket ``]`` must be escaped to be used.\n\nCharacter ranges can be specified using the ``[a-z]`` notation. Unicode chars are\nnot supported yet.\n\nAs for string literals, a lowercase ``i`` may follow the matcher (outside the\nending square bracket) to indicate that the match is case-insensitive. A ``^`` as\nfirst character inside the square brackets indicates that the match is inverted\n(it is a match if the input does not match the character class matcher). E.g.::\n\n not_az <- [^a-z]i\n\nAny matcher\n-----------\n\nThe any matcher is represented by the dot ``.``. It matches any character except\nthe end of file, thus the ``!.`` expression is used to indicate \"match the end of\nfile\". E.g.::\n\n any_char <- . # match a single character\n EOF <- !.\n\nRegex matcher\n-------------\n\nAlthough not in the formal definition of PEG parsers, regex may be handy (OR NOT!)\nand may provide substantial performance improvements. A regex expression is\ndefined in a single- or double-quoted string prefixed by a ``~``.\n\nFlags \"iLmsux\" as described in python ``re`` module can follow the pattern. E.g.::\n\n re_match <- ~\"https?://[\\\\S:@/]*\"i # DON'T TRY THIS ONE, it's just a silly example\n\nError reporting\n===============\n\nPEG parsers design makes automatic syntax error reporting hard. The parser has\nto follow every possible path from the root and fail to parse the document before\nit can tell there's a syntax error. It's even harder to tell where is the error,\nbecause at this point, we only know that every path has fail.\n\nHowever this paper http://arxiv.org/pdf/1405.6646v1.pdf suggest a bunch of\ntechniques to improve syntax error detection, we implemented some of them and, by\nexperience, it's satisfying (i.e: I can debug my errors using fastidious messages).\n\nTODO\n====\n\n- make a tool to generate standalone modules\n- python3\n- more tests\n- tox\n- travis", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/lisael/fastidious", "keywords": "PEG parser generator lexer", "license": "GPLv3", "maintainer": null, "maintainer_email": null, "name": "fastidious", "package_url": "https://pypi.org/project/fastidious/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/fastidious/", "project_urls": { "Download": "UNKNOWN", "Homepage": "https://github.com/lisael/fastidious" }, "release_url": "https://pypi.org/project/fastidious/0.1.dev0/", "requires_dist": null, "requires_python": null, "summary": "Yet another python parser generator", "version": "0.1.dev0" }, "last_serial": 2140062, "releases": { "0.1.dev0": [ { "comment_text": "", "digests": { "md5": "09ab0b0052675bba7f98d12e37e58c39", "sha256": "96b6ab6258df8150ebf19c84f084bf3d2f80c9512aacb9ccb8001527fd1e6827" }, "downloads": -1, "filename": "fastidious-0.1.dev0.tar.gz", "has_sig": false, "md5_digest": "09ab0b0052675bba7f98d12e37e58c39", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 23605, "upload_time": "2016-05-30T02:43:27", "url": "https://files.pythonhosted.org/packages/a2/bd/5a750375f3d5b9ded4f5ab7b07b57a6a54dcae65523b18367bd268f2beb8/fastidious-0.1.dev0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "09ab0b0052675bba7f98d12e37e58c39", "sha256": "96b6ab6258df8150ebf19c84f084bf3d2f80c9512aacb9ccb8001527fd1e6827" }, "downloads": -1, "filename": "fastidious-0.1.dev0.tar.gz", "has_sig": false, "md5_digest": "09ab0b0052675bba7f98d12e37e58c39", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 23605, "upload_time": "2016-05-30T02:43:27", "url": "https://files.pythonhosted.org/packages/a2/bd/5a750375f3d5b9ded4f5ab7b07b57a6a54dcae65523b18367bd268f2beb8/fastidious-0.1.dev0.tar.gz" } ] }