{ "info": { "author": "zhangjinjie", "author_email": "zhangjinjie@yimian.com.cn", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "License :: OSI Approved :: GNU Lesser General Public License v3 or later (LGPLv3+)", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7", "Topic :: Text Processing :: Linguistic" ], "description": "### pyrefo: a fast regex for object\n\nThis project is based on [refo](https://github.com/machinalis/refo) and the paper [Regular Expression Matching: the Virtual Machine Approach](https://swtch.com/~rsc/regexp/regexp2.html). It uses cffi to extend Python with C, which speeds up matching.\n\nThis project has done the following work:\n\n1. fully compatible with the refo API: supports all patterns and the match, search, and finditer methods;\n2. fix a bug in the C source code included in the paper;\n3. use cffi to extend Python with C;\n4. add a new feature that supports partial matching;\n5. 
add a new `Phrase` pattern, which lets `'ab'` match the list `['a', 'b', 'c']`;\n\n### performance test\n\n#### prerequisites\n\n```python\nimport jieba\ntext = '\u4e3a\u4ec0\u4e48\u5728\u672c\u5e97\u4e70\u4e1c\u897f\uff1f\u56e0\u4e3a\u7269\u6d41\u8fc5\u901f\uff0b\u54c1\u8d28\u4fdd\u8bc1\u3002\u4e3a\u4ec0\u4e48\u6211\u8d2d\u4e70\u7684\u6bcf\u4ef6\u5546\u54c1\u8bc4\u4ef7\u90fd\u4e00\u6837\u5462\uff1f\u56e0\u4e3a\u6211\u4e70\u7684\u4e1c\u897f\u592a\u591a\u4e86\uff0c\u79ef\u7d2f\u4e86\u5f88\u591a\u672a\u8bc4\u4ef7\u7684\u8ba2\u5355\uff0c\u6240\u4ee5\u6211\u7edf\u4e00\u7528\u8fd9\u6bb5\u8bdd\u4f5c\u4e3a\u8bc4\u4ef7\u5185\u5bb9\u3002\u5982\u679c\u6211\u7528\u4e86\u8fd9\u6bb5\u8bdd\u4f5c\u4e3a\u8bc4\u4ef7\uff0c\u90a3\u5c31\u8bf4\u660e\u8fd9\u6b3e\u4ea7\u54c1\u975e\u5e38\u8d5e\uff0c\u975e\u5e38\u597d\uff01'\ntokens = list(jieba.cut(text))\n```\n\n#### CPython\n\n- pyrefo\n\n```python\nfrom pyrefo import search, Group, Star, Any, Literal\n%timeit search(Group(Literal('\u7269\u6d41') + Star(Any()) + Literal('\u8fc5\u901f'), 'a'), tokens)\n```\n\n```shell\n95.9 \u00b5s \u00b1 472 ns per loop (mean \u00b1 std. dev. of 7 runs, 10000 loops each)\n```\n\n- refo\n\n```python\nimport refo\n%timeit refo.search(refo.Group(refo.Literal('\u7269\u6d41') + refo.Star(refo.Any()) + refo.Literal('\u8fc5\u901f'), 'a'), tokens)\n```\n\n```shell\n1.03 ms \u00b1 7.27 \u00b5s per loop (mean \u00b1 std. dev. of 7 runs, 1000 loops each)\n```\n\n- re\n\n```python\nimport re\n%timeit re.search('(\u7269\u6d41.*\u901f\u5ea6)', text)\n```\n\n```shell\n989 ns \u00b1 4.69 ns per loop (mean \u00b1 std. dev. of 7 runs, 1000000 loops each)\n```\n\n#### PyPy\n\n- pyrefo\n\n```python\nfrom pyrefo import search, Group, Star, Any, Literal\n%timeit search(Group(Literal('\u7269\u6d41') + Star(Any()) + Literal('\u8fc5\u901f'), 'a'), tokens)\n```\n\n```shell\n53.4 \u00b5s \u00b1 28 \u00b5s per loop (mean \u00b1 std. dev. 
of 7 runs, 1000 loops each)\n```\n\n- refo\n\n```python\nimport refo\n%timeit refo.search(refo.Group(refo.Literal('\u7269\u6d41') + refo.Star(refo.Any()) + refo.Literal('\u8fc5\u901f'), 'a'), tokens)\n```\n\n```shell\n78 \u00b5s \u00b1 35.8 \u00b5s per loop (mean \u00b1 std. dev. of 7 runs, 1000 loops each)\n```\n\n- re\n\n```python\nimport re\n%timeit re.search('(\u7269\u6d41.*\u901f\u5ea6)', text)\n```\n\n```shell\n347 ns \u00b1 3.26 ns per loop (mean \u00b1 std. dev. of 7 runs, 1000000 loops each)\n```", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "http://github.com/yimian/pyrefo", "keywords": "regex", "license": "GPLv3+", "maintainer": "", "maintainer_email": "", "name": "pyrefo", "package_url": "https://pypi.org/project/pyrefo/", "platform": "", "project_url": "https://pypi.org/project/pyrefo/", "project_urls": { "Homepage": "http://github.com/yimian/pyrefo" }, "release_url": "https://pypi.org/project/pyrefo/0.1/", "requires_dist": null, "requires_python": "", "summary": "a fast regex for object", "version": "0.1" }, "last_serial": 4348578, "releases": { "0.1": [ { "comment_text": "", "digests": { "md5": "c77a6fc461b05ff70d32b1d87de92fde", "sha256": "4efa6db958694d6b012183dad7e1e069252bf41c3450c07364e127f8abd621ee" }, "downloads": -1, "filename": "pyrefo-0.1.tar.gz", "has_sig": false, "md5_digest": "c77a6fc461b05ff70d32b1d87de92fde", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8143, "upload_time": "2018-10-07T04:38:10", "url": "https://files.pythonhosted.org/packages/a7/7c/9ec7ba3851b2918bf0e493a0fb3dd35c606e1afce00851f903e85509fef1/pyrefo-0.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "c77a6fc461b05ff70d32b1d87de92fde", "sha256": "4efa6db958694d6b012183dad7e1e069252bf41c3450c07364e127f8abd621ee" }, "downloads": -1, "filename": "pyrefo-0.1.tar.gz", "has_sig": false, "md5_digest": 
"c77a6fc461b05ff70d32b1d87de92fde", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8143, "upload_time": "2018-10-07T04:38:10", "url": "https://files.pythonhosted.org/packages/a7/7c/9ec7ba3851b2918bf0e493a0fb3dd35c606e1afce00851f903e85509fef1/pyrefo-0.1.tar.gz" } ] }