{ "info": { "author": "Brian Skinn", "author_email": "bskinn@alum.mit.edu", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Intended Audience :: Science/Research", "License :: OSI Approved :: MIT License", "Natural Language :: English", "Operating System :: OS Independent", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3 :: Only", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7", "Programming Language :: Python :: 3.8", "Topic :: Scientific/Engineering", "Topic :: Scientific/Engineering :: Mathematics", "Topic :: Utilities" ], "description": "pent Extracts Numerical Text\n============================\n\n*Mini-language driven parser for structured numerical (or other) data\nin free text*\n\n**Current Development Version:**\n\n.. image:: https://img.shields.io/travis/bskinn/pent?label=travis-ci&logo=travis\n :target: https://travis-ci.org/bskinn/pent\n\n.. image:: https://codecov.io/gh/bskinn/pent/branch/master/graph/badge.svg\n :target: https://codecov.io/gh/bskinn/pent\n\n**Most Recent Stable Release:**\n\n.. image:: https://img.shields.io/pypi/v/pent.svg?logo=pypi\n :target: https://pypi.org/project/pent\n\n.. image:: https://img.shields.io/pypi/pyversions/pent.svg?logo=python\n\n**Info:**\n\n.. image:: https://img.shields.io/readthedocs/pent/latest\n :target: http://pent.readthedocs.io/en/v0.2/\n\n.. image:: https://img.shields.io/github/license/mashape/apistatus.svg\n :target: https://github.com/bskinn/pent/blob/stable/LICENSE.txt\n\n.. image:: https://img.shields.io/badge/code%20style-black-000000.svg\n :target: https://github.com/psf/black\n\n----\n\n**Do you have structured numerical data stored as text?**\n\n**Does the idea of writing regex to parse it fill you with loathing?**\n\n``pent`` *can help!*\n\nSay you have data in a text file that looks like this:\n\n.. code::\n\n $vibrational_frequencies\n 18\n 0 0.000000\n 1 0.000000\n 2 0.000000\n 3 0.000000\n 4 0.000000\n 5 0.000000\n 6 194.490162\n 7 198.587114\n 8 389.931897\n 9 402.713910\n 10 538.244274\n 11 542.017838\n 12 548.246738\n 13 800.613516\n 14 1203.096114\n 15 1342.200360\n 16 1349.543713\n 17 1885.157022\n\nWhat's the most efficient way to get that list of floats\nextracted into a ``numpy`` array?\nThere's clearly structure here, but how to exploit it?\n\nIt would work to import the text into a spreadsheet, split columns appropriately,\n`re-export just the one column to CSV `__,\nand import to Python from there,\nbut that's just exhausting drudgery if there are dozens of files involved.\n\nAutomating the parsing via a line-by-line string search would work fine\n(this is how |cclib|_ implements its data imports), but a new line-by-line\nmethod is needed for every new kind of dataset,\nand any time the formatting of a given dataset changes.\n\nIt's not *too* hard to\n`write regex `__\nthat will parse it, but because of the mechanics of regex group captures\nyou have to write two patterns: one to capture the entire block, including the header\n(to ensure other, similarly-formatted data isn't also captured); and then one to\n`iterate line-by-line `__\nover just the data block to extract the individual values. And, of course, one has to actually *write*\n(and proofread, and maintain) the regex.\n\n``pent`` **provides a better way.**\n\nThe data above comes from `this file `__,\n``C2F4_01.hess``. With ``pent``, the data can be pulled into ``numpy`` in just a couple\nof lines, without writing **any** regex at all:\n\n.. code:: python\n\n >>> data = pathlib.Path(\"pent\", \"test\", \"C2F4_01.hess\").read_text()\n >>> prs = pent.Parser(\n ... head=(\"@.$vibrational_frequencies\", \"#.+i\"),\n ... body=(\"#.+i #!..f\")\n ... )\n >>> arr = np.array(prs.capture_body(data), dtype=float)\n >>> print(arr)\n [[[ 0. ]\n [ 0. ]\n [ 0. ]\n [ 0. ]\n [ 0. ]\n [ 0. ]\n [ 194.490162]\n [ 198.587114]\n [ 389.931897]\n [ 402.71391 ]\n [ 538.244274]\n [ 542.017838]\n [ 548.246738]\n [ 800.613516]\n [1203.096114]\n [1342.20036 ]\n [1349.543713]\n [1885.157022]]]\n\nThe result comes out as a length-one list of 2-D matrices, since the search pattern\noccurs only once in the data file. The single 2-D matrix is laid out as a\ncolumn vector, because the data runs down the column in the file.\n\n``pent`` can handle larger, more deeply nested data as well.\nTake `this 18x18 matrix `__\nwithin ``C2F4_01.hess``, for example.\nHere, it's necessary to pass a ``Parser`` as the *body* of another ``Parser``:\n\n.. code:: python\n\n >>> prs_hess = pent.Parser(\n ... head=(\"@.$hessian\", \"#.+i\"),\n ... body=pent.Parser(\n ... head=\"#++i\",\n ... body=\"#.+i #!+.f\"\n ... )\n ... )\n >>> result = prs_hess.capture_body(data)\n >>> arr = np.column_stack([np.array(_, dtype=float) for _ in result[0]])\n >>> print(arr[:3, :7])\n [[ 0.468819 -0.006771 0.020586 -0.38269 0.017874 -0.05449 -0.044552]\n [-0.006719 0.022602 -0.016183 0.010997 -0.033397 0.014422 -0.01501 ]\n [ 0.020559 -0.016184 0.066859 -0.033601 0.014417 -0.072836 0.045825]]\n\nThe need for the generator expression, the ``[0]`` index into ``result``,\nand the composition via ``np.column_stack`` arises\ndue to the manner in which ``pent`` returns data from a nested match like this.\nSee the `documentation `__,\nin particular `this example `__,\nfor more information.\n\nThe grammar of the ``pent`` mini-language is designed to be flexible enough that\nit should handle essentially all well-formed structured data, and even some data\nthat's not especially well formed. Some datasets will require post-processing of the\ndata structures generated by ``pent`` before they can be pulled into\n``numpy`` (see, e.g., `this test `__,\nparsing `this data block `__).\n\n-----\n\nBeta releases available on `PyPI `__: ``pip install pent``\n\nFull documentation is hosted at\n`Read The Docs `__.\n\nSource on `GitHub `__. Bug reports,\nfeature requests, and ``Parser`` construction help requests\nare welcomed at the\n`Issues `__ page there.\n\nCopyright (c) Brian Skinn 2018-2019\n\nLicense: The MIT License. See `LICENSE.txt `__\nfor full license terms.\n\n\n.. |cclib| replace:: ``cclib``\n\n.. _cclib: https://github.com/cclib/cclib\n\n\n", "description_content_type": "text/x-rst", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://www.github.com/bskinn/pent", "keywords": "", "license": "MIT License", "maintainer": "", "maintainer_email": "", "name": "pent", "package_url": "https://pypi.org/project/pent/", "platform": "", "project_url": "https://pypi.org/project/pent/", "project_urls": { "Homepage": "https://www.github.com/bskinn/pent" }, "release_url": "https://pypi.org/project/pent/0.2/", "requires_dist": [ "attrs (>=17.1)", "pyparsing (>=1.5.5)" ], "requires_python": ">=3.4", "summary": "pent Extracts Numerical Text", "version": "0.2", "yanked": false, "yanked_reason": null }, "last_serial": 6036581, "releases": { "0.0": [ { "comment_text": "", "digests": { "md5": "5bac67a84f960aee9ad80161c654236c", "sha256": "9559b791b81ec8dc98ee3f48754891e710efa06bbb3ace9151b04c16576efcf6" }, "downloads": -1, "filename": "pent-0.0.tar.gz", "has_sig": false, "md5_digest": "5bac67a84f960aee9ad80161c654236c", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 1751, "upload_time": "2017-11-15T01:08:42", "upload_time_iso_8601": "2017-11-15T01:08:42.740039Z", "url": "https://files.pythonhosted.org/packages/11/80/7413c058596fa88e770303791f5c93554e9e2e62f5115733326bfb7ad31f/pent-0.0.tar.gz", "yanked": false, "yanked_reason": null } ], "0.1": [ { "comment_text": "", "digests": { "md5": "e9d35dd2c11ec6e100fa4250f0618f97", "sha256": "d38d8a85043466a2841d80df13fd5052b1353fe7006ad3c94e2b78c357f4bdb1" }, "downloads": -1, "filename": "pent-0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "e9d35dd2c11ec6e100fa4250f0618f97", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.4", "size": 10484, "upload_time": "2018-09-24T01:48:11", "upload_time_iso_8601": "2018-09-24T01:48:11.844323Z", "url": "https://files.pythonhosted.org/packages/79/06/ba86a0070c930fae34423c485a7a1fd40458558b6f4352bebc4c0669a58c/pent-0.1-py3-none-any.whl", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "f3d7c65b8c1125ebfb7530bcd93ba5a5", "sha256": "850b9e435a961edccf0d63b5fbb719b32473ede0bb0262ad239cb145b152992f" }, "downloads": -1, "filename": "pent-0.1.tar.gz", "has_sig": false, "md5_digest": "f3d7c65b8c1125ebfb7530bcd93ba5a5", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.4", "size": 9237, "upload_time": "2018-09-24T01:48:24", "upload_time_iso_8601": "2018-09-24T01:48:24.844494Z", "url": "https://files.pythonhosted.org/packages/54/45/c5af267cfb1820f9ff0f2b11bae75b6da2d33573b10d2eb5123b0c23e856/pent-0.1.tar.gz", "yanked": false, "yanked_reason": null } ], "0.2": [ { "comment_text": "", "digests": { "md5": "5fbdaad3ac982bd64f00fb789354340d", "sha256": "225e56ac9855ed088ec92d6744937c9ddbdd9da3388310d44d883891583850ac" }, "downloads": -1, "filename": "pent-0.2-py3-none-any.whl", "has_sig": false, "md5_digest": "5fbdaad3ac982bd64f00fb789354340d", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.4", "size": 16800, "upload_time": "2019-10-27T10:57:03", "upload_time_iso_8601": "2019-10-27T10:57:03.703727Z", "url": "https://files.pythonhosted.org/packages/f2/77/7b14bdd71a54c773a8a3410141b3ea6fc704d3fad4a179a7fb3ba4b7b025/pent-0.2-py3-none-any.whl", "yanked": false, "yanked_reason": null } ], "0.2rc1": [ { "comment_text": "", "digests": { "md5": "d9a896a3e6e1e18e8217c944b478356b", "sha256": "d299024e6acd721c9a129c2cb2e4eeeebd656c626eabfcce9a08117f50b0c204" }, "downloads": -1, "filename": "pent-0.2rc1.tar.gz", "has_sig": false, "md5_digest": "d9a896a3e6e1e18e8217c944b478356b", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.4", "size": 14484, "upload_time": "2018-10-28T14:41:06", "upload_time_iso_8601": "2018-10-28T14:41:06.759851Z", "url": "https://files.pythonhosted.org/packages/1a/67/aade2c475ee315b14eec719c6d69dece4ad7edcadc3195d9fc703dee2fa8/pent-0.2rc1.tar.gz", "yanked": false, "yanked_reason": null } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "5fbdaad3ac982bd64f00fb789354340d", "sha256": "225e56ac9855ed088ec92d6744937c9ddbdd9da3388310d44d883891583850ac" }, "downloads": -1, "filename": "pent-0.2-py3-none-any.whl", "has_sig": false, "md5_digest": "5fbdaad3ac982bd64f00fb789354340d", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.4", "size": 16800, "upload_time": "2019-10-27T10:57:03", "upload_time_iso_8601": "2019-10-27T10:57:03.703727Z", "url": "https://files.pythonhosted.org/packages/f2/77/7b14bdd71a54c773a8a3410141b3ea6fc704d3fad4a179a7fb3ba4b7b025/pent-0.2-py3-none-any.whl", "yanked": false, "yanked_reason": null } ], "vulnerabilities": [] }