{ "info": { "author": "Dmitry Marakasov", "author_email": "amdmi3@amdmi3.ru", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Programming Language :: C++", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7", "Programming Language :: Python :: 3.8", "Programming Language :: Python :: 3 :: Only" ], "description": "# jsonslicer - stream JSON parser\n\n[![Build Status](https://travis-ci.org/AMDmi3/jsonslicer.svg?branch=master)](https://travis-ci.org/AMDmi3/jsonslicer)\n[![Coverage Status](https://coveralls.io/repos/github/AMDmi3/jsonslicer/badge.svg?branch=master)](https://coveralls.io/github/AMDmi3/jsonslicer?branch=master)\n[![PyPI downloads](https://img.shields.io/pypi/dm/jsonslicer.svg)](https://pypi.org/project/jsonslicer/)\n[![PyPI version](https://img.shields.io/pypi/v/jsonslicer.svg)](https://pypi.org/project/jsonslicer/)\n[![PyPI pythons](https://img.shields.io/pypi/pyversions/jsonslicer.svg)](https://pypi.org/project/jsonslicer/)\n[![Github commits (since latest release)](https://img.shields.io/github/commits-since/AMDmi3/jsonslicer/latest.svg)](https://github.com/AMDmi3/jsonslicer)\n\n## Overview\n\nJsonSlicer performs **stream** (also known as **iterative** or **pull**) JSON\nparsing, which means it **does not load** the whole JSON document into memory\nand is able to parse **very large** JSON files or streams. 
The\nmodule is written in C and uses the [YAJL](https://lloyd.github.io/yajl/)\nJSON parsing library, so it's also quite **fast**.\n\nJsonSlicer takes a **path** of JSON map keys or array indexes, and\nprovides an **iterator interface** which yields JSON data matching\nthe given path as complete Python objects.\n\n## Example\n\n```json\n{\n    \"friends\": [\n        {\"name\": \"John\", \"age\": 31},\n        {\"name\": \"Ivan\", \"age\": 26}\n    ],\n    \"colleagues\": {\n        \"manager\": {\"name\": \"Jack\", \"age\": 33},\n        \"subordinate\": {\"name\": \"Lucy\", \"age\": 21}\n    }\n}\n```\n\n```python\nfrom jsonslicer import JsonSlicer\n\n# Extract specific elements:\nwith open('people.json') as data:\n    ivans_age = next(JsonSlicer(data, ('friends', 1, 'age')))\n    # 26\n\nwith open('people.json') as data:\n    managers_name = next(JsonSlicer(data, ('colleagues', 'manager', 'name')))\n    # 'Jack'\n\n# Iterate over collection(s) by using wildcards in the path:\nwith open('people.json') as data:\n    for person in JsonSlicer(data, ('friends', None)):\n        print(person)\n        # {'name': 'John', 'age': 31}\n        # {'name': 'Ivan', 'age': 26}\n\n# Iteration over both arrays and dicts is possible, even at the same time\nwith open('people.json') as data:\n    for person in JsonSlicer(data, (None, None)):\n        print(person)\n        # {'name': 'John', 'age': 31}\n        # {'name': 'Ivan', 'age': 26}\n        # {'name': 'Jack', 'age': 33}\n        # {'name': 'Lucy', 'age': 21}\n\n# Map key of returned objects is available on demand...\nwith open('people.json') as data:\n    for position, person in JsonSlicer(data, ('colleagues', None), path_mode='map_keys'):\n        print(position, person)\n        # manager {'name': 'Jack', 'age': 33}\n        # subordinate {'name': 'Lucy', 'age': 21}\n\n# ...as well as complete path information\nwith open('people.json') as data:\n    for *path, person in JsonSlicer(data, (None, None), path_mode='full'):\n        print(path, person)\n        # ['friends', 0] {'name': 'John', 'age': 31}\n        # ['friends', 1] {'name': 'Ivan', 'age': 26}\n        # ['colleagues', 'manager'] 
{'name': 'Jack', 'age': 33}\n        # ['colleagues', 'subordinate'] {'name': 'Lucy', 'age': 21}\n\n# Extract all instances of a deeply nested field\nwith open('people.json') as data:\n    age_sum = sum(JsonSlicer(data, (None, None, 'age')))\n    # 111\n```\n\n## API\n\n```\njsonslicer.JsonSlicer(\n    file,\n    path_prefix,\n    read_size=1024,\n    path_mode=None,\n    yajl_allow_comments=False,\n    yajl_dont_validate_strings=False,\n    yajl_allow_trailing_garbage=False,\n    yajl_allow_multiple_values=False,\n    yajl_allow_partial_values=False,\n    encoding=None,\n    errors=None,\n    binary=False,\n)\n```\n\nConstructs an iterative JSON parser which reads JSON data from _file_.\n\n_file_ is a `.read()`-supporting [file-like\nobject](https://docs.python.org/3/glossary.html#term-file-like-object)\ncontaining a JSON document. Both binary and text files are supported,\nbut binary ones are preferred, because the parser has to operate on\nbinary data internally anyway, and using text input would require an\nunnecessary encoding/decoding step which adds ~3% performance overhead.\nNote that JsonSlicer supports both unicode and binary output regardless\nof input format.\n\n_path_prefix_ is an iterable (usually a list or a tuple) specifying\na path or a path pattern of objects which the parser should extract\nfrom JSON.\n\nFor instance, in the example above a path `('friends', 0, 'name')`\nwill yield the string `'John'`, by descending from the root element\ninto the dictionary element by key `'friends'`, then into the array\nelement by index `0`, then into the dictionary element by key\n`'name'`. Note that integers only match array indexes and strings\nonly match dictionary keys.\n\nThe path can be turned into a pattern by specifying `None` as a\nplaceholder in some path positions. 
For instance, `(None, None,\n'name')` will yield all four names from the example above, because\nit matches an item under the `'name'` key on the second nesting level of\nany array or map structure.\n\nBoth strings and bytes objects are allowed in the path, regardless of\ninput and output encodings; they are automatically converted\nto the format used internally.\n\n_read_size_ is the size of the block read by the parser at a time.\n\n_path_mode_ is a string which specifies how the parser should\nreturn path information along with objects. The following modes are\nsupported:\n\n* _'ignore'_ (the default) - do not output any path information, just\nobjects as is.\n\n ```python\n {'name': 'John', 'age': 31}\n {'name': 'Ivan', 'age': 26}\n {'name': 'Jack', 'age': 33}\n {'name': 'Lucy', 'age': 21}\n ```\n\n Common usage pattern for this mode is\n\n ```python\n for object in JsonSlicer(...)\n ```\n\n* _'map_keys'_ - output objects as is when traversing arrays, and tuples\nconsisting of map key and object when traversing maps.\n\n ```python\n {'name': 'John', 'age': 31}\n {'name': 'Ivan', 'age': 26}\n ('manager', {'name': 'Jack', 'age': 33})\n ('subordinate', {'name': 'Lucy', 'age': 21})\n ```\n\n This format may seem inconsistent (and therefore it's not the default),\n however in practice only a collection of a single type is iterated at\n a time and this type is known, so this format is likely the most useful,\n as in most cases you do need dictionary keys.\n\n Common usage pattern for this mode is\n\n ```python\n for object in JsonSlicer(...) # when iterating arrays\n for key, object in JsonSlicer(...) 
# when iterating maps\n ```\n\n* _'full'_ - output tuples consisting of all path components\n(both map keys and array indexes) and an object as the last element.\n\n ```python\n ('friends', 0, {'name': 'John', 'age': 31})\n ('friends', 1, {'name': 'Ivan', 'age': 26})\n ('colleagues', 'manager', {'name': 'Jack', 'age': 33})\n ('colleagues', 'subordinate', {'name': 'Lucy', 'age': 21})\n ```\n\n Common usage pattern for this mode is\n\n ```python\n for *path, object in JsonSlicer(...)\n ```\n\n_yajl_allow_comments_ enables the corresponding YAJL flag, which is\ndocumented as follows:\n\n> Ignore javascript style comments present in JSON input. Non-standard,\n> but rather fun\n\n_yajl_dont_validate_strings_ enables the corresponding YAJL flag, which\nis documented as follows:\n\n> When set the parser will verify that all strings in JSON input\n> are valid UTF8 and will emit a parse error if this is not so. When\n> set, this option makes parsing slightly more expensive (~7% depending\n> on processor and compiler in use)\n\n_yajl_allow_trailing_garbage_ enables the corresponding YAJL flag, which\nis documented as follows:\n\n> By default, yajl will ensure the entire input text was consumed\n> and will raise an error otherwise. Enabling this flag will cause\n> yajl to disable this check. This can be useful when parsing json\n> out of a that contains more than a single JSON document.\n\n_yajl_allow_multiple_values_ enables the corresponding YAJL flag, which\nis documented as follows:\n\n> Allow multiple values to be parsed by a single handle. The entire\n> text must be valid JSON, and values can be seperated by any kind\n> of whitespace. This flag will change the behavior of the parser,\n> and cause it continue parsing after a value is parsed, rather than\n> transitioning into a complete state. 
This option can be useful\n> when parsing multiple values from an input stream.\n\n_yajl_allow_partial_values_ enables the corresponding YAJL flag, which\nis documented as follows:\n\n> When yajl_complete_parse() is called the parser will check that the\n> top level value was completely consumed. I.E., if called whilst\n> in the middle of parsing a value yajl will enter an error state\n> (premature EOF). Setting this flag suppresses that check and the\n> corresponding error.\n\n_encoding_ may be used to override the output encoding, which is derived\nfrom the input file handle if possible, or otherwise set to the\ndefault that Python's builtin `open()` would use (usually `'UTF-8'`).\n\n_errors_ is an optional string that specifies how encoding and\ndecoding errors are to be handled. Defaults to `'strict'`.\n\n_binary_ forces the output to be in the form of `bytes` objects instead\nof `str` unicode strings.\n\nThe constructed object is an iterator. You may call `next()` to extract\na single element from it, iterate over it with a `for` loop, or use it in generator\ncomprehensions or in any place where an iterator is accepted.\n\n## Performance/competitors\n\nThe closest competitor is [ijson](https://github.com/isagalaev/ijson),\nand JsonSlicer was written to be better. 
Namely,\n\n* It's about 15x faster, similar in performance to Python's native `json` module\n* It allows iterating over dictionaries and allows more flexibility when\n specifying paths/patterns of objects to iterate over\n\nThe table below shows the results of the bundled benchmark on Python 3.7.2 / clang 6.0.1 / `-O2 -DNDEBUG` / FreeBSD 12.0 amd64 / Core i7-6600U CPU @ 2.60GHz.\n\n| Facility | Type | Objects/sec |\n|:---------------------------------------------------------|:------:|--------------:|\n| json.loads() | str | 1155.9K |\n| json.load(StringIO()) | str | 1104.1K |\n| **JsonSlicer (no paths, binary input, binary output)** | bytes | 1149.5K |\n| **JsonSlicer (no paths, unicode input, binary output)** | bytes | 1121.3K |\n| **JsonSlicer (no paths, binary input, unicode output)** | str | 1033.3K |\n| **JsonSlicer (no paths, unicode input, unicode output)** | str | 1006.2K |\n| **JsonSlicer (full paths, binary output)** | bytes | 787.6K |\n| **JsonSlicer (full paths, unicode output)** | str | 586.5K |\n| ijson.yajl2_cffi | bytes | 75.7K |\n| ijson.yajl2 | bytes | 52.0K |\n| ijson.python | str | 32.2K |\n\n## Status/TODO\n\nJsonSlicer is currently in beta stage, used in production in the\n[Repology](https://repology.org) project. 
Testing foci are:\n\n- Edge cases with uncommon encoding (input/output) configurations\n- Absence of memory leaks\n\n## Requirements\n\n- Python 3.4+ (Python 2 not supported)\n- pkg-config\n- [yajl](https://lloyd.github.io/yajl/) 2.0.3+ (older versions lack pkgconfig file)\n\n## License\n\nMIT license, copyright (c) 2019 Dmitry Marakasov amdmi3@amdmi3.ru.", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/AMDmi3/jsonslicer", "keywords": "json,parser,pull,stream", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "jsonslicer", "package_url": "https://pypi.org/project/jsonslicer/", "platform": "", "project_url": "https://pypi.org/project/jsonslicer/", "project_urls": { "Homepage": "https://github.com/AMDmi3/jsonslicer" }, "release_url": "https://pypi.org/project/jsonslicer/0.1.5/", "requires_dist": null, "requires_python": "", "summary": "Stream JSON parser with iterator interface", "version": "0.1.5" }, "last_serial": 5951564, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "md5": "d75e053a631feb9529227ae43373972d", "sha256": "9e2e4b9e0cb03b7b5519c5289f5b1b9b43ec69f0d0c617a264f650fc3d9c99bd" }, "downloads": -1, "filename": "jsonslicer-0.1.0.tar.gz", "has_sig": false, "md5_digest": "d75e053a631feb9529227ae43373972d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 21458, "upload_time": "2019-01-22T13:47:40", "url": "https://files.pythonhosted.org/packages/0a/6f/803ddf86b706db159957e4c67dc1731c59e53c6ba6c5a1fe8853995bcf2b/jsonslicer-0.1.0.tar.gz" } ], "0.1.1": [ { "comment_text": "", "digests": { "md5": "a497cb9f5a4778ab0e6d4985e3171758", "sha256": "82bcb3d3f995b8248f1713cef978ca3244509c70a2f6ddc9c3cb15b141f5cfff" }, "downloads": -1, "filename": "jsonslicer-0.1.1.tar.gz", "has_sig": false, "md5_digest": "a497cb9f5a4778ab0e6d4985e3171758", "packagetype": "sdist", 
"python_version": "source", "requires_python": null, "size": 21782, "upload_time": "2019-01-22T16:06:05", "url": "https://files.pythonhosted.org/packages/13/87/df2b130e0c532e26af5c8e05fea6a405892f5759f9bf2520cc005e75f16f/jsonslicer-0.1.1.tar.gz" } ], "0.1.2": [ { "comment_text": "", "digests": { "md5": "f6176437352ba43e56725dbe33a87059", "sha256": "cacca46c021bf3a2783968fd6c435d6d5baf60769dd18fe5157aec7324931287" }, "downloads": -1, "filename": "jsonslicer-0.1.2.tar.gz", "has_sig": false, "md5_digest": "f6176437352ba43e56725dbe33a87059", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 22873, "upload_time": "2019-03-04T18:50:37", "url": "https://files.pythonhosted.org/packages/6d/3c/4c2c5eb3fc85de32fd769686903e12d0197d54abe2618b0b825382392843/jsonslicer-0.1.2.tar.gz" } ], "0.1.3": [ { "comment_text": "", "digests": { "md5": "56bd1da70e244ff39e93c08f8f904314", "sha256": "1cc9a731bdce316afcf04e938e959b7d0b7620bf580dcb1d98872603bbb365b8" }, "downloads": -1, "filename": "jsonslicer-0.1.3.tar.gz", "has_sig": false, "md5_digest": "56bd1da70e244ff39e93c08f8f904314", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 29100, "upload_time": "2019-03-07T17:39:20", "url": "https://files.pythonhosted.org/packages/0a/35/34db36b02375baaeb6809027173ca15b461bbc61f0b25cac336d72c7bec6/jsonslicer-0.1.3.tar.gz" } ], "0.1.4": [ { "comment_text": "", "digests": { "md5": "89641b363828a1f51a08d0af5f16d461", "sha256": "b2db201723f954887ae45c87ec27a97f219c6dd2db1796543e27d44044f8a240" }, "downloads": -1, "filename": "jsonslicer-0.1.4.tar.gz", "has_sig": false, "md5_digest": "89641b363828a1f51a08d0af5f16d461", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 22911, "upload_time": "2019-03-12T13:30:45", "url": "https://files.pythonhosted.org/packages/bf/a0/54fe9b1c190ef9c2b0e6013b4c88f759a873b89144681ce52f87fb8a2dc1/jsonslicer-0.1.4.tar.gz" } ], "0.1.5": [ { "comment_text": "", "digests": 
{ "md5": "a39696ccd983106e56975604434d7b97", "sha256": "78aff4171369faafdeee0dd0363457b81104ca1768a4b82144073e748de6fd66" }, "downloads": -1, "filename": "jsonslicer-0.1.5.tar.gz", "has_sig": false, "md5_digest": "a39696ccd983106e56975604434d7b97", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 23022, "upload_time": "2019-10-09T19:21:14", "url": "https://files.pythonhosted.org/packages/61/ec/75c764f881625c97d2d9e9e06fe1b9aa6d145dee8b327d73cd76b36aeec2/jsonslicer-0.1.5.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "a39696ccd983106e56975604434d7b97", "sha256": "78aff4171369faafdeee0dd0363457b81104ca1768a4b82144073e748de6fd66" }, "downloads": -1, "filename": "jsonslicer-0.1.5.tar.gz", "has_sig": false, "md5_digest": "a39696ccd983106e56975604434d7b97", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 23022, "upload_time": "2019-10-09T19:21:14", "url": "https://files.pythonhosted.org/packages/61/ec/75c764f881625c97d2d9e9e06fe1b9aa6d145dee8b327d73cd76b36aeec2/jsonslicer-0.1.5.tar.gz" } ] }