{ "info": { "author": "Alex Kovner", "author_email": "dev@alexkovner.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Natural Language :: English", "Operating System :: OS Independent", "Programming Language :: Python", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7", "Topic :: Text Processing" ], "description": "JSV - JSON separated values\n===========================\n\n.. image:: https://travis-ci.org/akovner/jsv.svg?branch=master\n :target: https://travis-ci.org/akovner/jsv\n.. image:: https://coveralls.io/repos/github/akovner/jsv/badge.svg?branch=master\n :target: https://coveralls.io/github/akovner/jsv?branch=master\n.. image:: https://readthedocs.org/projects/jsv/badge/?version=latest\n :target: https://jsv.readthedocs.io/en/latest/?badge=latest\n :alt: Documentation Status\n\nA compact way to represent a stream of similar json objects\n\ndocumentation\n-------------\n\nSee ``_\n\ninstallation\n------------\n\n``pip install jsv``\n\nmotivation\n----------\n\nJSON is an excellent universal represention for dictionaries, arrays and basic primitives. When representing records that are identical or nearly identical in schema, however, it is extremely verbose, as the same dictionary keys must be repeated for every record. The json lines format (which is just a sequence of json objects, each on a separate line of a file) therefore acheives great generality, but sacrifices compactness.\n\nOther formats, notably csv, are not standardized, and can become leaky abstractions. In addition, they are usually confined to representing flat data, whereas json objects are rich with nested arrays and dictionaries.\n\nThis project replaces the json lines and csv formats with a rich, flexible, and compact representation of similar json objects. It aims to stay true to the simplicity and generality of json, and can represent any json object regardless of nesting. In addition, it provides for multiple record types in a single file or stream.\n\nexamples\n--------\n\nsimple objects\n++++++++++++++\n\nLet's start with a simple example. Suppose we have a list of three json objects with identical keys: ::\n\n {\"key_1\":1,\"key_2\":2,\"key_3\":\"three\"}\n {\"key_1\":\"four\",\"key_2\":true,\"key_3\":6}\n {\"key_1\":7,\"key_2\":\"eight\",\"key_3\":null}\n\nWe transform this into: ::\n\n #_ {\"key_1\",\"key_2\",\"key_3\"}\n {1,2,\"three\"}\n {\"four\",true,6}\n {7,\"eight\",null}\n\nThe first line is simply a list of keys, embedded into a simple dictionary. The ``#`` indicates that a template is being defined, and ``_`` is the reserved id for the default template. The next three lines are the values, but devoid of keys. This is where the jsv format gets its compactness, and the resemblence to both csv and json is clear. Nevertheless it is neither: all four lines are unparsable either as json or csv.\n\nnested objects\n++++++++++++++\n\nLet's consider some basic nested objects: ::\n\n {\"key_1\":{\"subkey_1\":1,\"subkey_2\":2},\"key_2\":[\"a\",\"b\",\"c\"]}\n {\"key_1\":{\"subkey_1\":3,\"subkey_2\":4},\"key_2\":[\"d\",\"e\",\"f\"]}\n {\"key_1\":{\"subkey_1\":5,\"subkey_2\":{\"subsubkey_1\":\"vvv\"}},\"key_2\":[\"g\",\"h\",[\"i\",\"j\",\"k\"]]}\n\nThis becomes: ::\n\n #_ {\"key_1\":{\"subkey_1\",\"subkey\"2},\"key_2\"}\n {{1,2},[\"a\",\"b\",\"c\"]}\n {{3,4},[\"d\",\"e\",\"f\"]}\n {{5,{\"subsubkey_1\":\"vvv\"}},[\"g\",\"h\",[\"i\",\"j\",\"k\"]]}\n\nThe template, again, is a representation of the key structure, this time with nesting. The records (the last three lines) also have the nesting structure, but *without* the keys that are represented in the template. Notice also that there are some non-primitive values in the data. This is fine, as long as the key structure is honored. Also, arrays are left as-is, since they are already compact.\n\narrays\n++++++\n\nHere are some objects with arrays that contain dictionaries: ::\n\n {\"arrival_time\":\"8:00\",\"guests\":[{\"name\":\"Alice\",\"age\":37},{\"name\":\"Bob\",\"age\":73}]}\n {\"arrival_time\":\"8:30\",\"guests\":[{\"name\":\"Cookie Monster\",\"age\":49}]}]\n\nHere is the jsv file: ::\n\n #_ [{\"arrival_time\",\"guests\":[{\"name\",\"age\"}]\n {\"8:00\",[{\"Alice\",37},{\"Bob\",73}]}\n {\"8:30\",[{\"Cookie Monster\",49}]}\n\nFor arrays, a key structure can be given for multiple array entries. They are applied in order, and the last one is applied to all subsequent values found in the record. There is no requirement that the array contain any entries, however if there is an entry, it must conform to the template.\n\nmultiple record types\n+++++++++++++++++++++\n\nWhen there is a need to represent multiple record types in the same file or stream, we must include more metadata in the object that defines the template. For the following example, we consider a stream with two record types:\n\n#. A transaction on an account, such as a purchase.\n#. A change of address on an account.\n\nHere is the initial json lines file: ::\n\n {\"account_number\":111111111,\"transaction_type\":\"sale\",\"merchant_id\":987654321,\"amount\":123.45}\n {\"account_number\":111111111,\"new_address\":{\"street\":\"123 main st.\",\"city\":\"San Francisco\",\"state\":\"CA\",\"zip\":\"94103\"}\n {\"account_number\":222222222,\"transaction_type\":\"sale\",\"merchant_id\":848757678,\"amount\":5974.29}\n\nHere is the ``.jsv`` file: ::\n\n #trns {\"account_number\",\"transaction_type\",\"merchant_id\",\"amount\"}\n @trns {111111111,\"sale\",987654321,123.45}\n #addr {\"id\":\"A\",\"name\":\"address change\",\"template\":{\"account_number\",\"new_address\":{\"street\",\"city\",\"state\",\"zip\"}}}\n @addr {111111111,{\"123 main st.\",\"San Francisco\",\"CA\",\"94103\"}}\n @trns {222222222,:\"sale\",848757678,5974.29}\n\nThe ``@`` followed by the template name indicates a record. New record types can be defined (and redefined) at any point in the stream, provided they appear before any records of that type appear. We can also mix this with using a default template. For example, if we make ``trns`` the default, we end up with the following: ::\n\n #_ {\"account_number\",\"transaction_type\",\"merchant_id\",\"amount\"}\n {111111111,\"sale\",987654321,123.45}\n #addr {\"id\":\"A\",\"name\":\"address change\",\"template\":{\"account_number\",\"new_address\":{\"street\",\"city\",\"state\",\"zip\"}}}\n @addr {111111111,{\"123 main st.\",\"San Francisco\",\"CA\",\"94103\"}}\n {222222222,:\"sale\",848757678,5974.29}\n\nsplit template and record files\n+++++++++++++++++++++++++++++++\n\nWe can also store the templates in a separate file. By convention, we use the ``.jsvt`` extension for the template file, and the ``.jsvr`` extension for the record file. Using the example from the previous section, here is the template file: ::\n\n #trns {\"account_number\",\"transaction_type\",\"merchant_id\",\"amount\"}\n #addr {\"id\":\"A\",\"name\":\"address change\",\"template\":{\"account_number\",\"new_address\":{\"street\",\"city\",\"state\",\"zip\"}}}\n\nand here is the record file: ::\n\n @trns {111111111,\"sale\",987654321,123.45}\n @addr {111111111,{\"123 main st.\",\"San Francisco\",\"CA\",\"94103\"}}\n @trns {222222222,:\"sale\",848757678,5974.29}\n\nThis feature is intended to facilitate analysis on a cluster device, where the record file can be split among nodes, and the template file can be put in the global cache.\n\ndefinitions\n-----------\n\nHere are some terms specific to this project:\n\ntemplate\n A data structure which contains only they keys for a json-like object, along with the nesting structure of the dictionaries of that object.\n\nrecord\n A data structure which contains only the values for a json-like object, fully nested in both dictionaries and arrays.\n\nobject\n An ordinary json object, or its equivalent representation in a given language.\n\nfuture features\n---------------\n\nabbreviations\n+++++++++++++\n\nSpecify that certain repeated values be replaced with a token in the file or stream.\n\nnested templates\n++++++++++++++++\n\nAllow templates to be specified within a record.\n\nintegration with JSON schema\n++++++++++++++++++++++++++++\n\nThe ability to define a template from a `JSON Schema `_ definition.\n\n\n", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/akovner/jsv", "keywords": "", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "jsv", "package_url": "https://pypi.org/project/jsv/", "platform": "", "project_url": "https://pypi.org/project/jsv/", "project_urls": { "Homepage": "https://github.com/akovner/jsv" }, "release_url": "https://pypi.org/project/jsv/0.1.1/", "requires_dist": null, "requires_python": ">=3.4", "summary": "A compact representation of bulk JSON objects.", "version": "0.1.1" }, "last_serial": 4648515, "releases": { "0.1.1": [ { "comment_text": "", "digests": { "md5": "42fe6bdbe38255641b5572e83148e3ff", "sha256": "b2bf628afffde551b955aec14e07fd162814ca518149a2eb8bda3e68534a972d" }, "downloads": -1, "filename": "jsv-0.1.1-py3-none-any.whl", "has_sig": false, "md5_digest": "42fe6bdbe38255641b5572e83148e3ff", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.4", "size": 14798, "upload_time": "2018-12-31T17:51:23", "url": "https://files.pythonhosted.org/packages/2d/75/3d3940612e303602d698497d1a348d9562c16553010ea6919ee665d58e4d/jsv-0.1.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "c4fca6a966b6608b549c8c91717ec094", "sha256": "21858cfae574e7f49f1160aa5bf012b419541000304d44a4af714fee7fffa6dd" }, "downloads": -1, "filename": "jsv-0.1.1.tar.gz", "has_sig": false, "md5_digest": "c4fca6a966b6608b549c8c91717ec094", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.4", "size": 22735, "upload_time": "2018-12-31T17:51:26", "url": "https://files.pythonhosted.org/packages/65/42/6b52eca3f61082f9965ec1e419faf43c5d5b631889f286b3b8e47759a5a9/jsv-0.1.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "42fe6bdbe38255641b5572e83148e3ff", "sha256": "b2bf628afffde551b955aec14e07fd162814ca518149a2eb8bda3e68534a972d" }, "downloads": -1, "filename": "jsv-0.1.1-py3-none-any.whl", "has_sig": false, "md5_digest": "42fe6bdbe38255641b5572e83148e3ff", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.4", "size": 14798, "upload_time": "2018-12-31T17:51:23", "url": "https://files.pythonhosted.org/packages/2d/75/3d3940612e303602d698497d1a348d9562c16553010ea6919ee665d58e4d/jsv-0.1.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "c4fca6a966b6608b549c8c91717ec094", "sha256": "21858cfae574e7f49f1160aa5bf012b419541000304d44a4af714fee7fffa6dd" }, "downloads": -1, "filename": "jsv-0.1.1.tar.gz", "has_sig": false, "md5_digest": "c4fca6a966b6608b549c8c91717ec094", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.4", "size": 22735, "upload_time": "2018-12-31T17:51:26", "url": "https://files.pythonhosted.org/packages/65/42/6b52eca3f61082f9965ec1e419faf43c5d5b631889f286b3b8e47759a5a9/jsv-0.1.1.tar.gz" } ] }