{ "info": { "author": "Jeroen van Straten", "author_email": "j.vanstraten-1@tudelft.nl", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Programming Language :: Python :: 3", "Topic :: Software Development :: Code Generators" ], "description": "vhdre: a VHDL regex matcher generator\n================================================================================\n\nThis module allows you to generate a hardware implementation of a regular\nexpression matcher operating on UTF-8-encoded strings. The regular expression\nfeatures are limited, regex compilation is done prior to synthesis, and the unit\nwill only tell you *if* a string matched, but it can do so extremely fast. It's\nintended to be used for needle-in-a-haystack searching, where the initial\nreduction is so large that you can just let a CPU figure out where and how the\nstrings matched after letting vhdre do the initial filtering.\n\n### Highlights ###\n\n - UTF-8 bytestream decoding with error detection (for most errors).\n\n - Stream interfaces compatible with AXI4.\n\n - Can match multiple regular expressions simultaneously using shared decoding\n logic.\n\n - Can matches multiple bytes (or even strings) per clock cycle depending on\n configuration.\n\n - Optimized for FPGAs with 6-input LUTs, with a critical path of only 2 LUTs\n per cycle for most regular expressions when configured for one byte per\n cycle matching.\n\n - Different interface styles varying in complexity.\n\n### Dependencies ###\n\n - Python script: only Python 3.x and funcparserlib.\n\n - Generated VHDL: no dependencies, should synthesize/simulate using any\n VHDL-93 tool.\n\n\nSupported regex features\n-------------------------------------------------------------------------------\n\n### Supported syntax ###\n\n - `x` -> match `x` exactly.\n - `x*` -> match `x` zero or more times.\n - `x+` -> match `x` one or more times.\n - `x?` -> match `x` zero or one times.\n - `xy` -> match `x` followed by `y`.\n - `x|y` -> match `x` or `y`.\n - `(x)` -> parentheses are for disambiguation only (there are no capture\n groups).\n - `[x]` -> character class (matches `x`).\n - `[^x]` -> inverted character class (matches everything except `x`).\n - `[ab]` -> match either `a` or `b`.\n - `[a-z]` -> match anything in the `a-z` code point range (inclusive).\n\n### Escape sequences ###\n\nNote that the unicode escape sequences deviate from the norm.\n\n - `\\xHH` -> `HH` = 2-digit hexadecimal code point\n - `\\uHHHHHH` -> `HHHHHH` = 6-digit hexadecimal code point\n - `\\d` -> `[0-9]`\n - `\\w` -> `[0-9a-zA-Z_]`\n - `\\s` -> `[ \\a\\b\\t\\n\\v\\f\\r]`\n - `\\a\\b\\t\\n\\v\\f\\r` -> the usual control character escape sequences\n - anything else matches the character behind the \\ literally\n\n### Notes ###\n\n - The matcher does regex *matching*. That is, the given string must match the\n regular expression exactly, as if it is surrounded by a `^` and a `$`. To do\n searching instead, prefix and/or append `.*`.\n\n - The following characters must be escaped outside character classes:\n `[\\^$.|?*+()`\n\n - The following characters must be escaped inside character classes:\n `]-`\n\n - There is no difference between greedy and lazy matching in this engine. Any\n string that conforms to the regex in any way is considered to be a match.\n\n - A lot of the more advanced regex features are not supported because they\n require backtracking. This engine specifically can never do that, because\n it is based on a NFAE.\n\n\nUsage\n-------------------------------------------------------------------------------\n\n`vhdre` is an executable Python 3 module. Running it without arguments will\ngive you some basic usage information:\n\n```\n$ python3 -m vhdre\nUsage: python -m vhdre ... [-- ...]\n\nGenerates a file by the name .vhd in the working directory\nwhich matches against the given regular expressions. If one or more test\nstrings are provided, a testbench by the name _tb.vhd is\nalso generated. To insert a unicode code point, use {0xHHHHHH:u}. To\ninsert a raw byte (for instance to check error handling) use {0xHH:b}.\n{{ and }} can be used for matching { and } literally.\n```\n\nThe script will compile the regular expression(s) into a nondeterministic\nfinite automaton, which it then builds a VHDL unit for. Just to avoid confusion\nhere: yes, the regular expressions are fixed after generation. They are NOT\nruntime-configurable. You need to resynthesize your design if you want to\nchange the regex.\n\nThe generated VHDL unit can be instantiated in many different ways, varying\nin complexity and speed. This is the simplest form:\n\n```vhdl\nlbl: entity work.\n port map (\n clk => -- clock signal.\n in_valid => -- signal indicating that there is an incoming byte.\n in_data => -- the incoming byte.\n in_last => -- signal indicating whether this byte is the last in\n -- the string.\n out_valid => -- signal indicating that a string has been received.\n out_match => -- indicates which regular expressions were matched if\n -- `out_valid` is high.\n );\n```\n\nThe above signalling doesn't support empty strings. If you need that, we need\nto add a \"valid\" flag that is independent of `in_last`: `in_mask`. Whenever\nan empty string is received, simply set `in_mask` low and `in_last` high.\n\n```vhdl\nlbl: entity work.\n port map (\n clk => -- clock signal.\n in_valid => -- signal indicating that there is an incoming byte\n -- or empty string.\n in_mask(0) => -- signal indicating that there is an incoming byte.\n in_data => -- the incoming byte.\n in_last => -- signal indicating that this byte is the last in the\n -- string, or if there is no byte, that the previous\n -- byte was the last one, if any.\n out_valid => -- signal indicating that a string has been received.\n out_match => -- indicates which regular expressions were matched if\n -- `out_valid` is high.\n );\n```\n\nBefore we make things difficult, you can extend any of the port maps listed\nhere with one or more of the following signals depending on your needs:\n - `reset`: an active-high synchronous reset.\n - `aresetn`: an active-low asynchronous reset.\n - `clken`: active-high global clock enable signal.\n - `in_ready` & `out_ready`: AXI4-compatible backpressure signals, in case\n your output can stall. NOTE: backpressure is implemented trivially. If\n it gives you timing problems, read the \"notes on backpressure and\n timing closure\" section below.\n - `out_error` or `out_xerror`: output signal indicating whether an UTF-8\n decoding error occurred. `out_xerror` should be used instead of\n `out_error` ONLY by systems which handle multiple strings per cycle.\n\nTo get more speed, vhdre supports handling multiple bytes per cycle. Two\nflavors are supported: one where the amount of *strings* per cycle is still\nlimited to one, and one where there is no such limitation. The latter\nrequires you to use a more complex output stream.\n\nHere's the first flavor (at most one string per cycle):\n\n```vhdl\nlbl: entity work.\n generic map (\n BPC => -- number of bytes per cycle.\n )\n port map (\n clk => -- clock signal.\n in_valid => -- signal indicating that there are incoming control\n -- or data signals.\n in_mask => -- signal indicating which of the incoming bytes are\n -- valid. Its width is equal to BPC.\n in_data => -- the incoming bytes. Its width is equal to BPC*8.\n in_last => -- signal indicating that the last byte received so far\n -- is the last byte of a string, if any.\n out_valid => -- signal indicating that a string has been received.\n out_match => -- indicates which regular expressions were matched if\n -- `out_valid` is high.\n );\n```\n\nAnd here's the second flavor (no limitations). We use the word \"slots\" to\nrefer to the parallel matching engines since they no longer necessarily\ncorrespond to valid bytes at this point.\n\n```vhdl\nlbl: entity work.\n generic map (\n BPC => -- number of bytes per cycle.\n )\n port map (\n clk => -- clock signal.\n in_valid => -- signal indicating that there are incoming control\n -- or data signals.\n in_mask => -- signal indicating which of the incoming bytes are\n -- valid. Its width is equal to BPC.\n in_data => -- the incoming bytes for each slot. Its width is equal\n -- to BPC*8.\n in_xlast => -- signal indicating which of the byte slots are to be\n -- interpreted as string boundaries. Its width is\n -- equal to BPC.\n out_valid => -- signal indicating that one or more strings have been\n -- received.\n out_xmask => -- indicates which of the parallel slots received a\n -- string terminator. Its width is BPC.\n out_xmatch => -- indicates which regular expressions were matched by\n -- each slot. Its width is BPC.\n );\n```\n\nIt is guaranteed that matches are reported by the same slot for which\n`in_xlast` was set. Let's say that you have a system that can handle 8 bytes\nper cycle, but strings always start at 4-byte boundaries, and you only want\nto worry about handling at most two strings per cycle. In this case, you can\nuse only `in_xlast(3)` and `in_xlast(7)` (appropriately setting\n`in_mask(1..3)` and `in_mask(5..7)` low for strings that are not multiples\nof four in length), which means that you only need to look at `out_xmask(3)`\nand `out_xmask(7)` because of this guarantee.\n\nBy default, vhdre is little endian; i.e., lower indexed bytes/slots are\nmatched first. As a little convenience thing for big endian systems, you can\nswap the order by setting the `BIG_ENDIAN` generic to true. Of course this\nmeans that you need to use reversed indices in the above example.\n\n### Notes on backpressure and timing closure ###\n\nThe `in_ready` and `out_ready` signals in this unit are implemented in a very\ntrivial way: they are practically just passed through the entire unit\ncombinatorially. This can lead to timing problems in high-throughput systems.\n\nThere are a couple things you can do to work around this, ordered by effort\nvs. improvement in timing:\n\n - Insert an 2-stage AXI4 stream slice directly after the output of this unit\n to break the critical path from your stream sink's ready signal to the\n matcher.\n\n - Insert an 2-stage AXI4 stream slice directly before the input of this unit\n to break the critical path from the regex matcher's ready signal to your\n data source.\n\n - Bypass vhdre's backpressure logic entirely by placing a FIFO after this\n unit's output and tying its almost-empty signal to the source stream's\n ready signal. This unit's `in_valid` signal must be tied to\n `valid and ready`, so it is strobed only when a transfer is acknowledged.\n The FIFO must be configured such that it releases its almost-empty signal\n when there are less than 6 free entries in the FIFO. This ensures that\n the FIFO can never become full, because vhdre's pipeline isn't deep\n enough to generate more data than that when it doesn't receive any input.\n This allows this unit to ignore the `ready` signal from the FIFO. NOTE:\n do NOT connect the FIFO `ready` signal to `out_ready`! Doing that\n would essentially make a false path through vhdre's silly backpressure\n system.\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/abs-tudelft/vhdre", "keywords": "vhdl regular expression generator", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "vhdre", "package_url": "https://pypi.org/project/vhdre/", "platform": "", "project_url": "https://pypi.org/project/vhdre/", "project_urls": { "Homepage": "https://github.com/abs-tudelft/vhdre", "Source": "https://github.com/abs-tudelft/vhdre" }, "release_url": "https://pypi.org/project/vhdre/0.2/", "requires_dist": [ "funcparserlib" ], "requires_python": ">=3", "summary": "VHDL code generator for matching regular expressions.", "version": "0.2" }, "last_serial": 5276724, "releases": { "0.2": [ { "comment_text": "", "digests": { "md5": "ed4b077b4792331ba2cf69aca2772977", "sha256": "8d1e0aace7a9656a76d2b8bd5107dae0e749f0a0621a79129fdf1233d76e7de4" }, "downloads": -1, "filename": "vhdre-0.2-py3-none-any.whl", "has_sig": false, "md5_digest": "ed4b077b4792331ba2cf69aca2772977", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3", "size": 25360, "upload_time": "2019-05-16T10:05:09", "url": "https://files.pythonhosted.org/packages/df/f7/63e26cd812a08ce7e3e55de60b377ac2b46b5665ea0dcb2d6a8b44ce757e/vhdre-0.2-py3-none-any.whl" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "ed4b077b4792331ba2cf69aca2772977", "sha256": "8d1e0aace7a9656a76d2b8bd5107dae0e749f0a0621a79129fdf1233d76e7de4" }, "downloads": -1, "filename": "vhdre-0.2-py3-none-any.whl", "has_sig": false, "md5_digest": "ed4b077b4792331ba2cf69aca2772977", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3", "size": 25360, "upload_time": "2019-05-16T10:05:09", "url": "https://files.pythonhosted.org/packages/df/f7/63e26cd812a08ce7e3e55de60b377ac2b46b5665ea0dcb2d6a8b44ce757e/vhdre-0.2-py3-none-any.whl" } ] }