{ "info": { "author": "John K. Von Seggern", "author_email": "vonseg@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 2 - Pre-Alpha", "License :: OSI Approved :: MIT License", "Programming Language :: Python", "Topic :: Software Development :: Compilers", "Topic :: Software Development :: Interpreters" ], "description": "sourcer\n=======\n\nSimple parsing library for Python.\n\nThere's not much documentation yet, and the performance is probably pretty\nbad, but if you want to give it a try, go for it!\n\nFeel free to send me your feedback at vonseg@gmail.com. Or use the github\n`issue tracker `_.\n\n.. contents::\n\n\nInstallation\n------------\n\nTo install sourcer::\n\n pip install sourcer\n\nIf pip is not installed, use easy_install::\n\n easy_install sourcer\n\nOr download the source from `github `_\nand install with::\n\n python setup.py install\n\n\nExamples\n--------\n\n\nExample 1: Hello, World!\n~~~~~~~~~~~~~~~~~~~~~~~~\n\nLet's parse the string \"Hello, World!\" (just to make sure the basics work):\n\n.. code:: python\n\n from sourcer import *\n\n # Let's parse strings like \"Hello, foo!\", and just keep the \"foo\" part.\n greeting = 'Hello' >> Opt(',') >> ' ' >> Pattern(r'\\w+') << '!'\n\n # Let's try it on the string \"Hello, World!\"\n person1 = parse(greeting, 'Hello, World!')\n assert person1 == 'World'\n\n # Now let's try omitting the comma, since we made it optional (with \"Opt\").\n person2 = parse(greeting, 'Hello Chief!')\n assert person2 == 'Chief'\n\nSome notes about this example:\n\n* The ``>>`` operator means \"Discard the result from the left operand. Just\n return the result from the right operand.\"\n* The ``<<`` operator similarly means \"Just return the result from the result\n from the left operand and discard the result from the right operand.\"\n* ``Opt`` means \"This term is optional. Parse it if it's there, otherwise just\n keep going.\"\n* ``Pattern`` means \"Parse strings that match this regular expression.\"\n\n\nExample 2: Parsing Arithmetic Expressions\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nHere's a quick example showing how to use operator precedence parsing:\n\n.. code:: python\n\n from sourcer import *\n\n Int = Pattern(r'\\d+') * int\n Parens = '(' >> ForwardRef(lambda: Expr) << ')'\n Expr = OperatorPrecedence(\n Int | Parens,\n InfixRight('^'),\n Prefix('+', '-'),\n Postfix('%'),\n InfixLeft('*', '/'),\n InfixLeft('+', '-'),\n )\n\n # Now let's try parsing an expression.\n t1 = parse(Expr, '1+2^3/4')\n assert t1 == Operation(1, '+', Operation(Operation(2, '^', 3), '/', 4))\n\n # Let's try putting some parentheses in the next one.\n t2 = parse(Expr, '1*(2+3)')\n assert t2 == Operation(1, '*', Operation(2, '+', 3))\n\n # Finally, let's try using a unary operator in our expression.\n t3 = parse(Expr, '-1*2')\n assert t3 == Operation(Operation(None, '-', 1), '*', 2)\n\nSome notes about this example:\n\n* The ``*`` operator means \"Take the result from the left operand and then\n apply the function on the right.\"\n* In this case, the function is simply ``int``.\n* So in our example, the ``Int`` rule matches any string of digit characters\n and produces the corresponding ``int`` value.\n* So the ``Parens`` rule in our example parses an expression in parentheses,\n discarding the parentheses.\n* The ``ForwardRef`` term is necessary because the ``Parens`` rule wants to\n refer to the ``Expr`` rule, but ``Expr`` hasn't been defined by that point.\n* The ``OperatorPrecedence`` rule constructs the operator precedence table.\n It parses operations and returns ``Operation`` objects.\n\n\nExample 3: Building an Abstract Syntax Tree\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nLet's try building a simple AST for the\n`lambda calculus `_. We can use\n``Struct`` classes to define the AST and the parser at the same time:\n\n.. code:: python\n\n from sourcer import *\n\n class Identifier(Struct):\n def parse(self):\n self.name = Word\n\n class Abstraction(Struct):\n def parse(self):\n self.parameter = '\\\\' >> Word\n self.body = '. ' >> Expr\n\n class Application(LeftAssoc):\n def parse(self):\n self.left = Operand\n self.operator = ' '\n self.right = Operand\n\n Word = Pattern(r'\\w+')\n Parens = '(' >> ForwardRef(lambda: Expr) << ')'\n Operand = Parens | Abstraction | Identifier\n Expr = Application | Operand\n\n t1 = parse(Expr, r'(\\x. x) y')\n assert isinstance(t1, Application)\n assert isinstance(t1.left, Abstraction)\n assert isinstance(t1.right, Identifier)\n assert t1.left.parameter == 'x'\n assert t1.left.body.name == 'x'\n assert t1.right.name == 'y'\n\n t2 = parse(Expr, 'x y z')\n assert isinstance(t2, Application)\n assert isinstance(t2.left, Application)\n assert isinstance(t2.right, Identifier)\n assert t2.left.left.name == 'x'\n assert t2.left.right.name == 'y'\n assert t2.right.name == 'z'\n\n\nExample 4: Tokenizing\n~~~~~~~~~~~~~~~~~~~~~\n\nIt's often useful to tokenize your input before parsing it. Let's create a\ntokenizer for the lambda calculus.\n\n.. code:: python\n\n from sourcer import *\n\n class LambdaTokens(TokenSyntax):\n def __init__(self):\n self.Word = r'\\w+'\n self.Symbol = AnyChar(r'(\\.)')\n self.Space = Skip(r'\\s+')\n\n # Run the tokenizer on a lambda term with a bunch of random whitespace.\n Tokens = LambdaTokens()\n ans1 = tokenize(Tokens, '\\n ( x y\\n\\t) ')\n\n # Assert that we didn't get any space tokens.\n assert len(ans1) == 4\n (t1, t2, t3, t4) = ans1\n assert isinstance(t1, Tokens.Symbol) and t1.content == '('\n assert isinstance(t2, Tokens.Word) and t2.content == 'x'\n assert isinstance(t3, Tokens.Word) and t3.content == 'y'\n assert isinstance(t4, Tokens.Symbol) and t4.content == ')'\n\n # Let's use the tokenizer with a simple grammar, just to show how that\n # works.\n Sentence = Some(Tokens.Word) << '.'\n ans2 = tokenize_and_parse(Tokens, Sentence, 'This is a test.')\n\n # Assert that we got a list of Word tokens.\n assert all(isinstance(i, Tokens.Word) for i in ans2)\n\n # Assert that the tokens have the expected content.\n contents = [i.content for i in ans2]\n assert contents == ['This', 'is', 'a', 'test']\n\n\nIn this example, the ``Skip`` term tells the tokenizer that we want to ignore\nwhitespace. The ``AnyChar`` term tell the tokenizer that a symbol can be any\none of the characters ``(``, ``\\``, ``.``, ``)``. Alternatively, we could have\nused:\n\n.. code:: python\n\n Symbol = r'[(\\\\.)]'\n\n\nExample 5: Parsing Significant Indentation\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nWe can use sourcer to parse languages with significant indentation. Here's a\nbare-bones example to demonstrate one possible approach.\n\n.. code:: python\n\n from sourcer import *\n\n class TestTokens(TokenSyntax):\n def __init__(self):\n # Let's just use words, newlines, and spaces in this example.\n self.Word = r'\\w+'\n self.Newline = r'\\n'\n # In this case, we'll say that an indent is a newline followed by\n # some spaces, followed by a word.\n self.Indent = r'(?<=\\n) +(?=\\w)'\n # And let's just throw out all other space characters.\n self.Space = Skip(' +')\n\n # All our token classes are attributes of this ``Tokens`` object. It's\n # essentially a namespace for our token classes.\n Tokens = TestTokens()\n\n class InlineStatement(Struct):\n def parse(self):\n # Let's say an inline-statement is just some word tokens. We'll use\n # ``Content`` to get the string content of each token (since in this\n # case, we don't care about the tokens themselves).\n self.words = Some(Content(Tokens.Word))\n\n def __repr__(self):\n # We'll define a ``repr`` method so that we can easily check the\n # parse results. We'll just put a semicolon after each statement.\n return '%s;' % ' '.join(self.words)\n\n class Block(Struct):\n def parse(self, indent=''):\n # A block is a bunch of statements at the same indentation,\n # all separated by some newline tokens.\n self.statements = Statement(indent) // Some(Tokens.Newline)\n\n def __repr__(self):\n # In this case, we'll put a space between each statement and enclose\n # the whole block in curly braces. This will make it easy for us to\n # tell if our parse results look right.\n return '{%s}' % ' '.join(repr(i) for i in self.statements)\n\n def Statement(indent):\n # Let's say there are two ways to get a statement:\n # - Get an inline-statement with the current indentation.\n # - Get a block that is indented farther than the current indentation.\n return (CurrentIndent(indent) >> InlineStatement\n | IncreaseIndent(indent) ** Block)\n\n def CurrentIndent(indent):\n # The point of this function is to return a parsing expression that\n # matches the current indent (which is provided as an argument).\n return Return('') if indent == '' else indent\n\n def IncreaseIndent(current):\n # To see if the next indentation is more than the current indentation,\n # we peek at the next token, using ``Expect``, and we get its string\n # content using ``Content``. The ``^`` operator means \"require\". In this\n # case, we require that the next indentation is longer than the current\n # indentation.\n token = Expect(Content(Tokens.Indent))\n return token ^ (lambda token: len(current) < len(token))\n\n # Let's say that a program is a block, optionally surrounded by newlines.\n # (The ``>>`` and ``<<`` operators discard the newlines in this case.)\n OptNewlines = List(Tokens.Newline)\n Program = OptNewlines >> Block << OptNewlines\n\n test = '''\n print foo\n while true\n print bar\n if baz\n then break\n exit\n '''\n\n # Let's parse the test case and then use ``repr`` to make sure that we get\n # back what we expect.\n ans = tokenize_and_parse(Tokens, Program, test)\n expect = '{print foo; while true; {print bar; if baz; {then break;}} exit;}'\n assert repr(ans) == expect\n\n\nMore Examples\n-------------\nParsing `Excel formula `_\nand some corresponding\n`test cases `_.", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/jvs/sourcer", "keywords": "packrat,parser,peg", "license": "MIT License", "maintainer": null, "maintainer_email": null, "name": "sourcer", "package_url": "https://pypi.org/project/sourcer/", "platform": "any", "project_url": "https://pypi.org/project/sourcer/", "project_urls": { "Download": "UNKNOWN", "Homepage": "https://github.com/jvs/sourcer" }, "release_url": "https://pypi.org/project/sourcer/0.1.3/", "requires_dist": null, "requires_python": null, "summary": "simple parsing library", "version": "0.1.3" }, "last_serial": 1442541, "releases": { "0.1.1": [ { "comment_text": "", "digests": { "md5": "dd3c8d605c374a38e2cd2a70190ab86b", "sha256": "42d0ce9a77bd2aa7b5e2d9d428348e4a9344e1b4342059e3e2ac20803f2f9374" }, "downloads": -1, "filename": "sourcer-0.1.1.tar.gz", "has_sig": false, "md5_digest": "dd3c8d605c374a38e2cd2a70190ab86b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4877, "upload_time": "2015-02-19T07:39:06", "url": "https://files.pythonhosted.org/packages/77/66/48b0a515537d885950d74b51a266b70143b37ae1db4fd781cd0ecc0ee9cb/sourcer-0.1.1.tar.gz" } ], "0.1.2": [ { "comment_text": "", "digests": { "md5": "5b99715347ac429d5f52ce9a0c2568c6", "sha256": "61b78a8a4412fff8aef109cc143c08953bb85154bb865e61cb54682f6375fd31" }, "downloads": -1, "filename": "sourcer-0.1.2.tar.gz", "has_sig": false, "md5_digest": "5b99715347ac429d5f52ce9a0c2568c6", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 7420, "upload_time": "2015-02-21T23:58:24", "url": "https://files.pythonhosted.org/packages/24/cf/f82de97ea7522527177451c7f20f00c3b7dc0065f1f960de031f270c8162/sourcer-0.1.2.tar.gz" } ], "0.1.3": [ { "comment_text": "", "digests": { "md5": "1efb5675cb7338b8e8493b0a81e0d71b", "sha256": "bdae3b196960acd6171009b9e865420f9d664aaa8f6b7fb1e07ca8495bdb5e0d" }, "downloads": -1, "filename": "sourcer-0.1.3.tar.gz", "has_sig": false, "md5_digest": "1efb5675cb7338b8e8493b0a81e0d71b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 10952, "upload_time": "2015-02-28T19:21:54", "url": "https://files.pythonhosted.org/packages/83/0e/b8ccbb6eeebe3ac0db24b089d5bbf4dbdd12470eef6269916f32641e05e6/sourcer-0.1.3.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "1efb5675cb7338b8e8493b0a81e0d71b", "sha256": "bdae3b196960acd6171009b9e865420f9d664aaa8f6b7fb1e07ca8495bdb5e0d" }, "downloads": -1, "filename": "sourcer-0.1.3.tar.gz", "has_sig": false, "md5_digest": "1efb5675cb7338b8e8493b0a81e0d71b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 10952, "upload_time": "2015-02-28T19:21:54", "url": "https://files.pythonhosted.org/packages/83/0e/b8ccbb6eeebe3ac0db24b089d5bbf4dbdd12470eef6269916f32641e05e6/sourcer-0.1.3.tar.gz" } ] }