{ "info": { "author": "Sean Hammond", "author_email": "losser@seanh.cc", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "License :: OSI Approved :: GNU Affero General Public License v3 or later (AGPLv3+)", "Programming Language :: Python :: 2.7" ], "description": "[![Build Status](https://travis-ci.org/ckan/losser.svg)](https://travis-ci.org/ckan/losser)\n[![Coverage Status](https://img.shields.io/coveralls/ckan/losser.svg)](https://coveralls.io/r/ckan/losser)\n[![Latest Version](https://pypip.in/version/losser/badge.svg)](https://pypi.python.org/pypi/losser/)\n[![Downloads](https://pypip.in/download/losser/badge.svg)](https://pypi.python.org/pypi/losser/)\n[![Supported Python versions](https://pypip.in/py_versions/losser/badge.svg)](https://pypi.python.org/pypi/losser/)\n[![Development Status](https://pypip.in/status/losser/badge.svg)](https://pypi.python.org/pypi/losser/)\n[![License](https://pypip.in/license/losser/badge.svg)](https://pypi.python.org/pypi/losser/)\n\n\nLosser\n======\n\nA little UNIX command and Python library for lossy filter, transform, and\nexport of JSON to Excel-compatible CSV.\nCreated for [ckanapi-exporter](https://github.com/ckan/ckanapi-exporter).\n\nLosser can either be run as a UNIX command or used as a Python library\n(see [Usage](#usage) below). It takes a JSON-formatted list of objects\n(or a list of Python dicts) as input and produces a \"table\" as output.\n\nThe input objects don't all have to have the same fields or structure as each\nother, and may contain sub-lists and sub-objects arbitrarily nested.\n\nThe output \"table\" is a list of objects that all have the same keys in the same\norder, and with sub-objects and sub-lists nested no more than one level deep.\nIt can be output as:\n\n* A list of Python OrderedDicts each having the same keys in the same order\n* A string of JSON-formatted text representing a list of objects each having\n the same keys in the same order\n ([TODO](https://github.com/ckan/losser/issues/3))\n* A string of CSV-formatted text, one object per CSV row. The rows of the CSV\n correspond to the objects in the list of output objects if they had been\n returned as Python or JSON Data, and the columns correspond to the objects'\n keys.\n\nThe input objects can be filtered and transformed before producing the output\ntable. You provide a list of \"column query\" objects in a `columns.json` file\nthat specifies what columns the output table should have, and how the values\nfor those columns should be retrieved from the input objects.\n\nFor example, if you had some input objects that looked like this:\n\n [\n {\n \"author\": \"Sean Hammond\",\n \"title\": \"An Example Input Object\",\n \"extras\":\n {\n \"Delivery Unit\": \"Commissioning\"\n {\n },\n ...\n ]\n\nYou might transform them using a `columns.json` file like this:\n\n {\n \"Data Owner\": {\n \"pattern_path\": \"^author$\"\n },\n \"Title\": {\n \"pattern_path\": \"^title$\"\n },\n \"Delivery Unit\": {\n \"pattern_path\": [\"^extras$\", \"^Delivery Unit$\"]\n }\n }\n\nThis would output a CSV file like this:\n\n Data Owner,Title,Delivery Unit\n Sean Hammond,An Example Input Object,Commissioning\n Frank Black,Another Example Object,Some Other Unit\n ...\n\nThe `columns.json` file above specifies three column headings for the output\ntable:\n\n1. Data Owner\n2. Title\n3. Delivery Unit\n\nThe values for each column are retrieved from the input objects by following a\n\"pattern path\": a list of regular expressions that are matched against the keys\nof the input object and its sub-objects in turn to find a value.\n\nFor example the \"Data Owner\" field above has the pattern path `\"^author$\"` which\nmatches the string \"author\". This will find top-level keys named \"author\" in\nthe input objects and output their values in the \"Data Owner\" column of the\noutput table.\n\nThe \"Delivery Unit\" column above has a more complex pattern path:\n`[\"^extras$\", \"^Delivery Unit$\"]`. This will find the top-level key \"extras\" in\nan input object and, assuming the value for the \"extras\" key is a sub-object,\nwill find and return the value for the \"Delivery Unit\" key in the sub-object.\n\nPattern paths can be arbitrarily long, recursing into arbitrarily deeply nested\nsub-objects.\n\nOne of the patterns in a pattern path may match multiple keys in an object or\nsub-object. In that case losser recurses on each of the matched keys and ends\nup returning a list of values instead of a single value.\n\nFor example given this input object:\n\n {\n \"update\": \"yearly\",\n \"update frequency\": \"monthly\",\n ...\n }\n\nThe pattern path `\"^update.*\"` (which matches both \"update\" and \"update\nfrequency\") would output `\"yearly, monthly\"` (a quoted, comma-separated list)\nin the corresponding cell in the CSV output.\n\nIf a pattern path goes through a sub-list in the input dict losser will iterate\nover the list and recurse on each of its items. Again it will end up returning\na list of values instead of a single value.\n\nFor example, given a list of input objects like this:\n\n [\n {\n \"resources\": [\n {\n \"format\": \"CSV\",\n ...\n },\n {\n \"format\": \"JSON\",\n ...\n },\n ...\n ],\n ...\n },\n ...\n ]\n\nThe pattern path `[\"^resources$\", \"^format$\"]` will find each object's\n\"resources\" sub-list and then find the \"format\" field in each object in the\nsub-list. The values in the CSV column will be lists like `\"CSV, JSON\"`.\nList can optionally be deduplicated.\n\nNested lists can occur (when the input object contains a list of lists, for\nexample). These are flattened in the output cells.\n\nSome of the filtering and transformations you can do with losser include:\n\n* Extract some fields from the objects and filter out others.\n\n Any fields in an input object that do not match any of the pattern paths in\n the `columns.json` file are filtered out.\n\n ([TODO](https://github.com/ckan/losser/issues/2): Support appending unmatched\n fields to the end of the ouput table as additional columns).\n\n* Specify the order of the columns in the output table.\n\n Columns are output in the same order that they appear in the `columns.json`\n file, which does not have to be the same order as the corresponding fields in\n the input objects.\n\n* Rename fields, using a different name for the column in the output table than\n for the field in the input objects.\n\n For example to get the \"notes\" field from each input object and place them\n all in a \"description\" column in the output table, put this object in your\n `columns.json`:\n\n \"Description\": {\n \"pattern_path\": \"^notes$\",\n }\n\n* Match patterns case-sensitively.\n\n By default patterns are matched case-insensitively. To do case-sensitive\n matching put `\"case_sensitive\": true` in a column query in your\n `columns.json` file:\n\n \"Title\": {\n \"pattern_path\": \"^title$\",\n \"case_sensitive\": true\n },\n\n This will match \"title\" in the input object, but not \"Title\" or \"TITLE\".\n\n* Transform the matched values, for example truncating or stripping whitespace\n from strings.\n\n* Provide arbitrary pre-processor and post-processor functions to do custom\n transformations on the input and output objects\n ([TODO](https://github.com/ckan/losser/issues/1)).\n\n* Find inconsistently-named fields using a pattern that matches any of the\n names and combine them into a single column in the output table.\n\n For example you can provide a pattern like `\"^update.*\"` that will find keys\n named \"update\", \"Update\", \"Update Frequency\" etc. in different input objects\n and collect their values in a single \"Update Frequency\" column.\n\n* Collect multiple fields together in a single column.\n\n If a pattern matches multiple fields they'll be output as a quoted\n comma-separated list in a single cell in the CSV.\n\n For example with an input object like this:\n\n {\n \"Contributor 1\": \"Thom Yorke\",\n \"Contributor 2\": \"Nigel Godrich\",\n \"Contributor 3\": \"Jonny Greenwood\",\n ...\n }\n\n The pattern `\"^Contributor.*\"` will match all three fields and the value in\n the CSV cell will be `\"Thom Yorke,Nigel Godrich,Jonny Greenwood\"`.\n\n* You can specify that a pattern path should find a unique value in the object,\n and if more than one value in the object matches the pattern (and a list\n would be returned) an exception will be raised.\n\n Use `\"unique\": true` in a column query in your `columns.json` file:\n\n \"Title\": {\n \"pattern_path\": \"^title$\",\n \"unique\": true\n },\n\n This is useful for debugging pattern paths that you expect to be unique.\n\n* You can specify that a pattern path *must* match a value in the object, and\n an exception will be raised if there's no matching path through the object\n ([TODO](https://github.com/ckan/losser/issues/4)).\n\n* When a pattern matches multiple paths through the input object, or matches a\n path going through a sub-list, the resulting list of values in the output\n table cell can be deduplicated. Put `\"deduplicate\": true` in a column query\n in your `columns.json` file:\n\n \"Format\": {\n \"pattern_path\": [\"^resources$\", \"^format$\"],\n \"deduplicate\": true\n },\n\n\nWhat it can't do (yet):\n\n* Pattern match against the values of items (as opposed to their keys).\n\n When following a pattern path through an object, when losser hits an\n object/dictionary in the input, either one of the top-level objects in the\n list of input objects or a sub-object, losser matches the relevant regex\n against the object's keys and then recurses on the values of each of the\n matched keys.\n\n If the key matches the pattern it recurses, you can't also specify a pattern\n to match the value against.\n\n When it hits a string, number, boolean or ``None``/``null`` losser returns\n it. You can't give it a pattern to match the value against to decide whether\n to return it or not.\n\n When it hits a list losser iterates over the items in the list and for each\n item either returns it or, if it's a sub-list or sub-object, recurses.\n (When sub-lists or sub-objects would cause a nested list to be returned it's\n flattened into a single list and optionally deduplicated.) Again, you can't\n provide a pattern to be matched against each item to decide whether to\n return/recurse or not.\n\n Adding pattern matching against values as well as keys would add a lot of\n power.\n\n\nRequirements\n------------\n\nPython 2.7.\n\n\nInstallation\n------------\n\nTo install run:\n\n pip install losser\n\nTo install for development, create and activate a Python virtual environment\nthen do:\n\n git clone https://github.com/ckan/losser.git\n cd losser\n python setup.py develop\n pip install -r dev-requirements.txt\n\n\nUsage\n-----\n\nOn the command-line losser reads input objects from stdin and writes the output\ntable to stdout, making it composable with other UNIX commands. For example:\n\n losser --columns columns.json < input.json > output.csv\n\nYou can also specify columns on the command line instead of using a\ncolumns.json file. For example:\n\n losser --column \"Data Owner\" --pattern '^author$' --unique --case-sensitive --column \"Description\" --pattern '^notes$' --unique --case-sensitive --max-length 255 --column Formats --pattern '^resources$' '^format$' --case-sensitive --deduplicate\n\nYou specify one or more `--column` options with the column title as argument\nand followed by a `--pattern` option and any other column options (`--unique`,\n`--case-sensitive`, etc).\n\nThe `--pattern` option can take more than one space-separated argument, in the\ncase where the column's pattern path contains more than one pattern, for\nexample: `--pattern \"^resources$\" \"^format$\"`.\n\nColumn options like `--pattern`, `--unique`, `--max-length` etc apply to the\npreceding `--column` on the command-line.\n\nSee `losser -h` for help.\n\n\nThis will read input objects from `input.json`, read column queries from\n`columns.json`, and write output objects to `output.csv`.\n\nTo use losser as a Python library:\n\n import losser.losser as losser\n table = losser.table(input_objects, columns)\n\n`input_objects` should be a list of dicts. `columns` can be either a list of\ndicts or the path to a `columns.json` file (string). The returned `table` will\nbe a list of dicts. If you pass `csv=True` to `table()` it'll return a\nCSV-formatted string instead. See `table()`'s docstring for more arguments.\n\n\nInheriting Losser's Command Line Interface\n------------------------------------------\n\nLosser's command line interface with `--column` and related arguments is fairly\ncomplicated to implement. You may want to offer the same command line features\nin your own losser-based command without having to reimplement them.\n\nFor example [ckanapi-exporter](https://github.com/ckan/ckanapi-exporter) offers\nthe losser command line interface but adds its own `--url` and `--apikey`\narguments to pull the input data directly from a CKAN site instead of reading\nit from stdin.\n\n`losser.cli` provides `make_parser()` and `parse()` functions to enable\ninheriting its command-line interface. Here's how you use them:\n\n parent_parser = losser.cli.make_parser(add_help=False, exclude_args=[\"-i\"])\n parser = argparse.ArgumentParser(\n description=\"Export datasets from a CKAN site to JSON or CSV.\",\n parents=[parent_parser],\n )\n parser.add_argument(\"--url\", required=True)\n parser.add_argument(\"--apikey\")\n try:\n parsed_args = losser.cli.parse(parser=parser)\n except losser.cli.CommandLineExit as err:\n sys.exit(err.code)\n except losser.cli.CommandLineError as err:\n if err.message:\n parser.error(err.message)\n url = parsed_args.url\n columns = parsed_args.columns\n apikey = parsed_args.apikey\n datasets = get_datasets_from_ckan(url, apikey)\n csv_string = losser.losser.table(datasets, columns, csv=True)\n\nSee [ckanapi-exporter](https://github.com/ckan/ckanapi-exporter) for a\nworking example.\n\n\nRunning the Tests\n-----------------\n\nFirst activate your virtualenv then install the dev requirements:\n\n pip install -r dev-requirements.txt\n\nThen to run the tests do:\n\n nosetests\n\nTo run the tests and produce a test coverage report do:\n\n nosetests --with-coverage --cover-inclusive --cover-erase --cover-tests", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/ckan/losser", "keywords": "json csv", "license": "AGPL", "maintainer": null, "maintainer_email": null, "name": "losser", "package_url": "https://pypi.org/project/losser/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/losser/", "project_urls": { "Download": "UNKNOWN", "Homepage": "https://github.com/ckan/losser" }, "release_url": "https://pypi.org/project/losser/0.0.3/", "requires_dist": null, "requires_python": null, "summary": "Filter, transform and export a list of JSON objects to JSON or CSV", "version": "0.0.3" }, "last_serial": 1381099, "releases": { "0.0.1": [ { "comment_text": "", "digests": { "md5": "2b80502da0f1d7a72f3d7b384ede548e", "sha256": "c70c6e21e127f27163d2aa53f471af4c8bc56cf7901e267225e39b1279131f8a" }, "downloads": -1, "filename": "losser-0.0.1.tar.gz", "has_sig": false, "md5_digest": "2b80502da0f1d7a72f3d7b384ede548e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 7229, "upload_time": "2014-10-22T19:50:21", "url": "https://files.pythonhosted.org/packages/d4/41/0485f3857c01eb787f170e813645999202d2ebf5095eb67a62a89a537ada/losser-0.0.1.tar.gz" } ], "0.0.2": [ { "comment_text": "", "digests": { "md5": "71bf5f0d80d33b93d13dd370ac65ceda", "sha256": "060f1ef5bd528343e94d8f82574ff493b973ab78533f974a0da7439951b0f461" }, "downloads": -1, "filename": "losser-0.0.2.tar.gz", "has_sig": false, "md5_digest": "71bf5f0d80d33b93d13dd370ac65ceda", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 20850, "upload_time": "2014-11-06T19:32:32", "url": "https://files.pythonhosted.org/packages/01/28/4031e65fc921403fbf75a647cd5d50176a56c201f3f3fe328c3387cf5e61/losser-0.0.2.tar.gz" } ], "0.0.3": [ { "comment_text": "", "digests": { "md5": "f451e660b2d6fb2659ba223a99d95fd0", "sha256": "4fefb83b992b819040f3ed4c6b801e596bcfcda0bfb7036fad5bc74beec0816b" }, "downloads": -1, "filename": "losser-0.0.3.tar.gz", "has_sig": false, "md5_digest": "f451e660b2d6fb2659ba223a99d95fd0", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 21006, "upload_time": "2014-11-09T14:51:32", "url": "https://files.pythonhosted.org/packages/b2/01/0994050c2dc6e641d9a832a1bd6af46277cbedb55d1a5e0e237a0e55318c/losser-0.0.3.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "f451e660b2d6fb2659ba223a99d95fd0", "sha256": "4fefb83b992b819040f3ed4c6b801e596bcfcda0bfb7036fad5bc74beec0816b" }, "downloads": -1, "filename": "losser-0.0.3.tar.gz", "has_sig": false, "md5_digest": "f451e660b2d6fb2659ba223a99d95fd0", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 21006, "upload_time": "2014-11-09T14:51:32", "url": "https://files.pythonhosted.org/packages/b2/01/0994050c2dc6e641d9a832a1bd6af46277cbedb55d1a5e0e237a0e55318c/losser-0.0.3.tar.gz" } ] }