{ "info": { "author": "Scott Rogowski", "author_email": "scottmrogowski@gmail.com", "bugtrack_url": null, "classifiers": [ "Programming Language :: Python :: 2", "Programming Language :: Python :: 3" ], "description": "Sparser is string parsing and regular expressions for humans\n====================================\nParsing strings can be a surprisingly difficult task. This difficulty\nmultiplies when dealing with multiline strings. Traditionally, regular\nexpressions were used for this. As anyone who has worked with regular\nexpressions knows, however, they are difficult to read, difficult to\nmaintain, full of gotchas, and don't scale well over multiple lines.\nSparser was developed to handle this problem.\n\n >>> import sparser as sp\n >>> pattern = \"Hello {{str}}!\"\n >>> string = \"Hello World!\"\n >>> r = sp.match(pattern, string)\n >>> print r\n True\n\n >>> pattern = \"Hello {{str planet}}!\"\n >>> string = \"Hello World!\"\n >>> r = sp.match(pattern, string)\n >>> print r\n True\n\n >>> pattern = \"Hello {{str planet}}!\"\n >>> string = \"Hello World!\"\n >>> r = sp.parse(pattern, string)\n >>> print r\n {'planet': 'World'}\n\nSyntax-wise, Sparser is a mashup of [regular expressions](https://docs.python.org/2/library/re.html)\nand [Handlebars](https://github.com/wycats/handlebars.js/)-style templating. A more precise tag-line\nmight be, *Sparser is a reverse-templating language for matching and parsing strings*\n\n >>> pattern = \"The {{spstr rocket}} {{str}} off at {{int hour}}:{{int minute}}\" \\\n \"and costs {{currency price}}.\"\n >>> compiled = sp.compile(pattern)\n >>> print compiled.parse(\"The Falcon 9 blasts off at 5:30 and costs $62,000,000.\")\n {'rocket': 'Falcon 9',\n 'hour': 5,\n 'minute': 30,\n 'price': 62000000\n }\n\n\nTable of Contents\n=================\n1. [Examples](#examples)\n2. [Installation](#installation)\n3. [Documentation](#documentation)\n 1. [Method reference](#method-reference)\n 2. [Pattern behaviour](#pattern-behavior)\n 3. [Tags](#tags)\n 4. [Built-in types](#built-in-types)\n 5. [Custom types](#custom-types)\n4. [TODO](#todo)\n5. [Similar projects](#similar-projects)\n6. [Bugs](#bugs)\n\n\nExamples\n========\n\nSparser has a basic set of built-in types like `str`, `float`, and `currency`.\nIf this isn't enough, you can also pass `custom types` into the compile method\n\n >>> patt = \"The {{str}} {{animal who}} was {{str action}} {{spstr food}}\" \\\n \"on the {{int date}}rd of {{str month}}\"\n >>> custom_types = {\"animal\": (\"cat|dog|pig\", None)}\n >>> compiled = sp.compile(patt, custom_types)\n >>> print compiled.parse(\"The handsome cat was slurping pho on the 23rd of July\")\n {'who': 'cat',\n 'action': 'slurping',\n 'food': 'pho',\n 'date': 23,\n 'month': 'July'}\n\nThe first argument in the `custom types` tuple is the regex to match. The second\nis a callback method. Use it if you want to clean the output.\n\n >>> custom_types = {\"animal\": (\"cat|dog|pig\", str.upper)}\n >>> compiled = sp.compile(patt, custom_types)\n >>> print compiled.parse(\"The handsome cat was slurping pho on the 23rd of July\")\n {'who': 'CAT',\n 'action': 'slurping',\n 'food': 'pho',\n 'date': 23,\n 'month': 'July'}\n\nFor simple one-offs, use inline `lambda types`\n\n >>> patt = \"The {{str}} {{\"cat|dog|pig\" who}} was {{str action}} {{spstr food}}\" \\\n \"on the {{int date}}{{\"st|nd|rd|th\"}} of {{str month}}\"\n >>> compiled = sp.compile(patt)\n >>> print compiled.parse(\"The handsome cat was slurping pho on the 23rd of July\")\n {'who': 'cat,\n 'action': 'slurping',\n 'food': 'pho',\n 'date': 23,\n 'month': 'July'}\n\nLoops are one of the most power features of Sparser. They are great\nfor lighly formatted tables\n\n >>> patt = \"\"\"\\\n {*loop imdb_list*}\n {*case*}{{int rank}}. {{spstr title}} ({{int year}}) +{{float rating}}{*endcase*}\n {*endloop*}\"\"\"\n >>> compiled = sp.compile(patt)\n >>> imdb_top_250 = \"\"\"\n 1. The Shawshank Redemption (1994) 9.2\n 2. The Godfather (1972) 9.2\n 3. The Godfather: Part II (1974) 9.0\n 4. The Dark Knight (2008) 8.9\"\"\"\n >>> print compiled.parse(imbd_top_250)\n {\"imdb_list\": [\n {\"rating\": 9.2, \"year\": 1994, \"rank\": 1, \"title\": \"The Shawshank Redemption\"},\n {\"rating\": 9.2, \"year\": 1972, \"rank\": 2, \"title\": \"The Godfather\"},\n {\"rating\": 9.0, \"year\": 1974, \"rank\": 3, \"title\": \"The Godfather: Part II\"},\n {\"rating\": 8.9, \"year\": 2008, \"rank\": 4, \"title\": \"The Dark Knight\"}\n ]}\n\nLoops can have multiple cases which can either be named or unnamed\n\n >>> patt = \"\"\"\\\n {*loop line_items*}\n {*case header*}{{spalpha name}}{*endcase*}\n {*case line_item*} {{spalpha name}} {{currency price}}{*endcase*}\n {*case total*} Total {{spalpha name}}: {{currency price}}{*endcase*}\n {*case break*}{*endcase*}\n {*endloop*}\n Net income {{currency net_income}}\"\"\"\n >>> compiled = sp.compile(patt)\n >>> income_statement = \"\"\"\\\n Revenues\n Merchandise sales $30\n Other revenues $8\n Total Revenues: $38\n\n Expenses\n Cost of goods sold $15\n Depreciation $3\n Wages $30\n Total Expenses: $28\n\n Net income - $10\"\"\"\n >>> print compiled.parse(income_statement)\n {\"line_items\": [\n {\"case\": \"header\", \"name\": \"Revenues\"},\n {\"case\": \"line_item\", \"price\": 30.0, \"name\": \"Merchandise sales\"},\n {\"case\": \"line_item\", \"price\": 8.0, \"name\": \"Other revenues\"},\n {\"case\": \"total\", \"price\": 38.0, \"name\": \"Revenues\"},\n {\"case\": \"break\"},\n {\"case\": \"header\", \"name\": \"Expenses\"},\n {\"case\": \"line_item\", \"price\": 15.0, \"name\": \"Cost of goods sold\"},\n {\"case\": \"line_item\", \"price\": 3.0, \"name\": \"Depreciation\"},\n {\"case\": \"line_item\", \"price\": 30.0, \"name\": \"Wages\"},\n {\"case\": \"total\", \"price\": 28.0, \"name\": \"Expenses\"},\n {\"case\": \"break\"}],\n \"net_income\": -10.0}\n\n\nSimilar to loops are switch statements. These can be used for simple\nmulti-option matches\n\n >>> patt = \"\"\"\\\n The Patriots are {*switch statement*}\n {*case fact*}an NFL team{*endcase*}\n {*case opinion*}overrated{*endcase*}\n {*case telling_it_like_it_is*}ball deflaters and serial cheaters{*endcase*}\n {*endswitch*}.\"\"\"\n >>> compiled = sp.compile(patt)\n >>> string = \"The Patriots are ball deflaters and serial cheaters.\"\n >>> print compiled.parse(string)\n {\"statement\": {\"case\": \"telling_it_like_it_is\"}}\n\nYou can use the {\\*include \\*} statement to embed patterns in patterns.\nThis works on a preprocessor level (like #define from C) so it is\nequivalent to copying and pasting. This is useful for reusing common\npatterns or just breaking up and organizing longer ones.\n\n >>> patt = \"\"\"\n {*loop logs*}\n {*case*}{*include iso8601*}: {{spstr error}}{*endcase*}\n {*case*}{{spstr error}}: {*include iso8601*}{*endcase*}\n {*endloop*}\n \"\"\"\n >>> iso8601 = \"\"\"{{int year}}-{{int month}}-{{int day}}T{{int hour}}:{{int minute}}:{{float second}}\"\"\"\n >>> logs = \"\"\"\n AssertionError: 2017-03-04T21:40:43.408923\n ZeroDivisionError: 2017-03-04T21:49:20.932833\n 2017-03-04T21:52:03.987341: TypeError\n \"\"\"\n >>> compiled = sp.compile(patt, includes={\"iso8601\": iso8601})\n >>> print compiled.parse(logs)\n {'logs': [\n {'error': 'AssertionError',\n 'day': 4,\n 'hour': 21,\n 'minute': 40,\n 'month': 3,\n 'second': 43.408923,\n 'year': 2017},\n {'error': 'ZeroDivisionError',\n 'day': 4,\n 'hour': 21,\n 'minute': 49,\n 'month': 3,\n 'second': 20.932833,\n 'year': 2017},\n {'error': 'TypeError',\n 'day': 4,\n 'hour': 21,\n 'minute': 52,\n 'month': 3,\n 'second': 3.987341,\n 'year': 2017}\n ]}\n\nUnfortunately, Sparser does not support nesting switches and loops in v0.1. This might\nbe updated in future versions\n\n\nInstallation\n============\n\n $ pip install sparser\n\nSparser has no dependencies and is supported in both Python 2 & 3. I have no\nidea how Windows users install Python packages but Google can help you.\n\n\nDocumentation\n=============\n\nMethod reference\n----------------\n**sparser.parse**(pattern, string[, custom_types[, includes]])\n\n

Given a pattern and a string, parse the string and return a dictionary.\nIf the string does not match the pattern, a SparserValueError exception\nis raised. Optionally, use custom_types ({type_name: (type_pattern, callback)} format)\nand/or includes ({include_name: pattern})

\n\n**sparser.match**(pattern, string[, custom_types[, includes]])\n\n

The same as parse except instead of returning a dictionary, return True if the\npattern successfully matched the string. Internally, this works the same as\nsparser.parse but is useful when you just need to know whether something matched\nand don't want to deal with error handling or falsy, empty dictionaries.

\n\n**sparser.compile**((pattern[, custom_types[, includes]])\n\n

Pre-compile a pattern and return a SparserObject which you can later call parse/match\non. This is useful if speed is essential or simply as a way to keep your code clean.

\n\n**SparserObject.parse**(string[, custom_types[, includes]])\n\n

Same as sparser.parse but pre-compiled using the sparser.compile method

\n\n**SparserObject.match**(string[, custom_types[, includes]])\n\n

Same as sparser.match but pre-compiled using the sparser.compile method

\n\n\nPattern behavior\n----------------\n#### Matching to the end of input\nSparser expects a perfect match and doesn't do partial matching. What this\nmeans is that this works\n\n >>> print sp.parse(\"Hello {{alpha planet}}, nice to meet you\", \"Hello world, nice to meet you\")\n {'planet': 'world'}\n\nBut these will raise SparserValueErrors\n\n >>> print sp.parse(\"Hello {{alpha planet}}\", \"Hello world, nice to meet you\")\n SparserValueError\n >>> print sp.parse(\"Hello {{alpha planet}}, nice to meet you\", \"Hello world\")\n SparserValueError\n >>> print sp.parse(\"{{alpha planet}}\", \"Hello world\")\n SparserValueError\n\nTo get around this, you can use a `{{spstr}}` tag before or after your pattern\nlike this\n\n >>> print sp.parse(\"Hello {{alpha planet}}{{spstr}}\", \"Hello world, nice to meet you\")\n {'planet': 'world'}\n >>> print sp.parse(\"Hello {{alpha planet}}{{spstr}}\", \"Hello world\")\n {'planet': 'world'}\n >>> print sp.parse(\"{{spstr}} Hello {{alpha planet}}\", \"blah blah Hello world\")\n {'planet': 'world'}\n\n`spstr` is a built-in type that stands for \"spaced string\". This is equivalent\nto the regex `.+` meaning \"any character 1+ times\".\n\n\n#### Un-matched cases\nWhen Sparser is in a switch/loop, and there are multiple cases,\nit will return the first case to match the input. If no cases match within the\nswitch/loop, it will raise a SparserValueError\n\n\n#### Corraling loops and switches\nLoops and switches greedily match everything until the pattern immediately after\nthe block or the end of the string. So, if you have a table like this\n\n Winners\n 1. Peach\n 2. Yoshi\n 3. Luigi\n Play again?\n\nThe pattern below will raise a SparserValueError because it is going to try to\nmatch `Play again?` in the loop.\n\n Winners\n {*loop ranked_players*}\n {*case*}{{int rank}}. {{str name}}{*endcase*}\n {*endloop*}\n\nInstead, be sure to include the last line after the endloop\n\n Winners\n {*loop ranked_players*}\n {*case*}{{int rank}}. {{str name}}{*endcase*}\n {*endloop*}\n Play again?\n\nOr, a `{{spstr}}` works in a pinch\n\n Winners\n {*loop ranked_players*}\n {*case*}{{int rank}}\\. {{str name}}{*endcase*}\n {*endloop*}\n {{spstr}}\n\n\n#### Loops and newlines\nLoops are designed to handle table-like strings so newlines are implied in\nloop-matching. Multiline-loops are supported but inline loops are not.\nSparser should support inline loops in version 0.2. In the\nmeantime, you can use the regex bar (`|`) operator in custom or lambda\ntypes.\n\n\n#### The number of spaces and newlines doesn't matter\nSparser is designed to make no distinction between single spaces and\nmultiple spaces (just like HTML). This means\n\n >>> sp.match(\"The moon\", \"The moon\")\n True\n\nAnd\n\n >>> sp.match(\"The sun\", \"The sun\")\n True\n\nBut\n\n >>> sp.match(\"The sun\", \"Thesun\")\n False\n\nThe same is true for `\\n` newlines. This is not without precedent. HTML\nrendering works in the same way. This was done because of the unique\nchallenges in parsing multi-line regular expressions.\n\n\nTags\n----\nFor the moment, there are only eight tags in Sparser\n\n| Tag | Notes\n| ------------------------------ | -----------------------------------------------------------------------\n| {{ }} | var_name is optional\n| {\\*switch \\*} | can only contain {\\*case\\*} tags as direct decendents\n| {\\*endswitch\\*} |\n| {\\*loop \\*} | can only contain {\\*case\\*} tags as direct decendents\n| {\\*endloop\\*} |\n| {\\*case \\*} | case_name is optional\n| {\\*endcase\\*} |\n| {\\*include \\*} | this just inserts one pattern into another. Think of it like C's `#define` macro\n\n\nBuilt-in types\n---------------\n\nThese types can be used in any variable tag\n\n| Type | Description | Python Regex Pattern\n| -------------- | ------------------------------------------------- | --------\n| str | a string with no spaces | \"\\S+\"\n| spstr | a string with spaces allowed | \".+\"\n| int | an integer. Won't accept decimals | \"-? ?[0-9,]+\"\n| float | a float | \"-? ?[0-9,.]+\"\n| alpha | a string without digits, special chars, or spaces | \"[a-zA-Z]+\"\n| spalpha | a string without digits or special chars | \"[a-zA-Z ]+\"\n| alphanumeric | a string without special chars or spaces | \"[a-zA-Z0-9_]+\"\n| spalphanumeric | a string without special chars | \"[a-zA-Z0-9_ ]+\"\n| currency | a float which might be prefixed with '$' and/or - | \"-? ?[$-]*[0-9,.]+\"\n\n\nCustom types\n------------\nCustom types are optional parameters. They take the form of\n\n {type_name: (regex_pattern, callback), ...}\n\nCallback is a function to modify the extracted string before inclusion in the\ndictionary. For example, to uppercase the match, do\n\n lambda float_str: float_str.upper()\n # str.upper also works\n\nIf you wanted to add a custom date type, you could do something like\n\n date_cb = lambda dt_str: date.strptime(st_str, \"%m/%d/%Y\")\n custom_types = {\"date\": (\"\\d{2}/\\d{2}/\\d{4}\", date_cb)}\n\nIf you just want to return the un-modified string, pass in `None`\n\n custom_types = {\"berry\": (\"(?:blue|black|rasp)berry\" , None)}\n\n\nTODO\n======\n- Nested loops (will need to rewrite major parts as a finite state machine)\n- Inline loops (If a loop is not adjacent to \\n on both sides, we should not automatically newline it)\n- Cleanup the building of the AST\n- Add unicode-compatible currency (euros, yen, etc.)\n- Tests should use assertRaisesRegexp and match to exactly which exception was raised\n- built-in date types\n- .search for un-strict matching\n\n\nSimilar projects\n================\n- [parse](https://pypi.python.org/pypi/parse)\n- [pyParsing](http://pyparsing.wikispaces.com/)\n\n\nBugs\n===========\nThis is version 0.1 and there is liable to be a bug or two that we missed. Please\nlet us know or submit a patch.", "description_content_type": null, "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/sparser/sparser", "keywords": "string parsing,string parserregular expressions,regex,reverse templating", "license": "MIT/X", "maintainer": "", "maintainer_email": "", "name": "sparser", "package_url": "https://pypi.org/project/sparser/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/sparser/", "project_urls": { "Homepage": "https://github.com/sparser/sparser" }, "release_url": "https://pypi.org/project/sparser/0.101/", "requires_dist": null, "requires_python": "", "summary": "String parsing and regular expressions for humans", "version": "0.101" }, "last_serial": 2958193, "releases": { "0.0.1": [ { "comment_text": "", "digests": { "md5": "121071ecc903b7dbc454448e46f721d7", "sha256": "d76a99d2f5bc3d151505dbff9420525c86ab8da0361a4cbe7598aaa0b7af399a" }, "downloads": -1, "filename": "sparser-0.0.1.tar.gz", "has_sig": false, "md5_digest": "121071ecc903b7dbc454448e46f721d7", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 19330, "upload_time": "2017-06-18T22:55:36", "url": "https://files.pythonhosted.org/packages/3e/3e/9eff40ab602080f8d737ff7a6993f76593f6ac3e5d080fc503c53fb4cb85/sparser-0.0.1.tar.gz" } ], "0.1": [], "0.101": [ { "comment_text": "", "digests": { "md5": "34811b8af091a0fcb544c44a3d378811", "sha256": "eea0b5a7168f52a4ad5758c69ca58c9a75f249f99e28528abbb464aeae2ecc05" }, "downloads": -1, "filename": "sparser-0.101.tar.gz", "has_sig": false, "md5_digest": "34811b8af091a0fcb544c44a3d378811", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 19331, "upload_time": "2017-06-18T22:55:40", "url": "https://files.pythonhosted.org/packages/b9/85/9f2c0db33568fecfad1a38dfba3169429a86de11b3ee55a8bdc0f09c3028/sparser-0.101.tar.gz" } ], "0.2": [ { "comment_text": "", "digests": { "md5": "034d0e4eb44e8ca0ace9f98325cc17e3", "sha256": "44d404e1e676e8aca26e75f6e207cf77fad24fdf2b65c3eadd305d700cb5151a" }, "downloads": -1, "filename": "sparser-0.2.tar.gz", "has_sig": false, "md5_digest": "034d0e4eb44e8ca0ace9f98325cc17e3", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 19321, "upload_time": "2017-06-18T22:58:45", "url": "https://files.pythonhosted.org/packages/dd/03/8535dec4d1887d2e566fa2701189b4f63388b3d07650c02b279596ec0a66/sparser-0.2.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "34811b8af091a0fcb544c44a3d378811", "sha256": "eea0b5a7168f52a4ad5758c69ca58c9a75f249f99e28528abbb464aeae2ecc05" }, "downloads": -1, "filename": "sparser-0.101.tar.gz", "has_sig": false, "md5_digest": "34811b8af091a0fcb544c44a3d378811", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 19331, "upload_time": "2017-06-18T22:55:40", "url": "https://files.pythonhosted.org/packages/b9/85/9f2c0db33568fecfad1a38dfba3169429a86de11b3ee55a8bdc0f09c3028/sparser-0.101.tar.gz" } ] }