{ "info": { "author": "Tony Narlock", "author_email": "cihai@git-pull.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Environment :: Web Environment", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Programming Language :: Python", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: Implementation :: PyPy", "Topic :: Software Development :: Internationalization", "Topic :: Utilities" ], "description": "*unihan-etl* - `ETL`_ tool for Unicode's Han Unification (`UNIHAN`_) database\nreleases. unihan-etl retrieves (downloads), extracts (unzips), and transforms the\ndatabase from Unicode's website to a flat, tabular or structured, tree-like\nformat.\n\nunihan-etl can be used as a python library through its `API`_, to retrieve data\nas a python object, or through the `CLI`_ to retrieve a CSV, JSON, or YAML file.\n\nPart of the `cihai`_ project. Similar project: `libUnihan `_.\n\nUNIHAN Version compatibility (as of unihan-etl v0.10.0):\n`11.0.0 `__\n(released 2018-05-08, revision 25).\n\n|pypi| |docs| |build-status| |coverage| |license|\n\n`UNIHAN`_'s data is dispersed across multiple files in the format of::\n\n U+3400\tkCantonese\tjau1\n U+3400\tkDefinition\t(same as U+4E18 \u4e18) hillock or mound\n U+3400\tkMandarin\tqi\u016b\n U+3401\tkCantonese\ttim2\n U+3401\tkDefinition\tto lick; to taste, a mat, bamboo bark\n U+3401\tkHanyuPinyin\t10019.020:ti\u00e0n\n U+3401\tkMandarin\tti\u00e0n\n\nValues vary in shape and structure depending on their field type.\n`kHanyuPinyin `_\nmaps Unicode codepoints to `H\u00e0ny\u01d4 D\u00e0 Z\u00ecdi\u01cen `_,\nwhere ``10019.020:ti\u00e0n`` represents an entry. Complicating it further,\nmore variations::\n\n U+5EFE\tkHanyuPinyin\t10513.110,10514.010,10514.020:g\u01d2ng\n U+5364\tkHanyuPinyin\t10093.130:x\u012b,l\u01d4 74609.020:l\u01d4,x\u012b\n\n*kHanyuPinyin* supports multiple entries delimited by spaces. \":\"\n(colon) separate locations in the work from pinyin readings. \",\"\n(comma) separate multiple entries/readings. This is just one of 90 \nfields contained in the database.\n\n.. _API: https://unihan-etl.git-pull.com/en/latest/api.html\n.. _CLI: https://unihan-etl.git-pull.com/en/latest/cli.html\n\nTabular, \"Flat\" output\n----------------------\n\nCSV (default), ``$ unihan-etl``::\n\n char,ucn,kCantonese,kDefinition,kHanyuPinyin,kMandarin\n \u3400,U+3400,jau1,(same as U+4E18 \u4e18) hillock or mound,,qi\u016b\n \u3401,U+3401,tim2,\"to lick; to taste, a mat, bamboo bark\",10019.020:ti\u00e0n,ti\u00e0n\n\nWith ``$ unihan-etl -F yaml --no-expand``:\n\n.. code-block:: yaml\n\n - char: \u3400\n kCantonese: jau1\n kDefinition: (same as U+4E18 \u4e18) hillock or mound\n kHanyuPinyin: null\n kMandarin: qi\u016b\n ucn: U+3400\n - char: \u3401\n kCantonese: tim2\n kDefinition: to lick; to taste, a mat, bamboo bark\n kHanyuPinyin: 10019.020:ti\u00e0n\n kMandarin: ti\u00e0n\n ucn: U+3401\n\nWith ``$ unihan-etl -F json --no-expand``:\n\n.. code-block:: json\n\n [\n {\n \"char\": \"\u3400\",\n \"ucn\": \"U+3400\",\n \"kDefinition\": \"(same as U+4E18 \u4e18) hillock or mound\",\n \"kCantonese\": \"jau1\",\n \"kHanyuPinyin\": null,\n \"kMandarin\": \"qi\u016b\"\n },\n {\n \"char\": \"\u3401\",\n \"ucn\": \"U+3401\",\n \"kDefinition\": \"to lick; to taste, a mat, bamboo bark\",\n \"kCantonese\": \"tim2\",\n \"kHanyuPinyin\": \"10019.020:ti\u00e0n\",\n \"kMandarin\": \"ti\u00e0n\"\n }\n ]\n\n\"Structured\" output\n-------------------\n\nCodepoints can pack a lot more detail, unihan-etl carefully extracts these values\nin a uniform manner. Empty values are pruned.\n\nTo make this possible, unihan-etl exports to JSON, YAML, and python\nlist/dicts.\n\n.. admonition:: Why not CSV?\n \n Unfortunately, CSV is only suitable for storing table-like \n information. File formats such as JSON and YAML accept key-values and\n hierarchical entries.\n\nJSON, ``$ unihan-etl -F json``:\n\n.. code-block:: json\n\n [\n {\n \"char\": \"\u3400\",\n \"ucn\": \"U+3400\",\n \"kDefinition\": [\n \"(same as U+4E18 \u4e18) hillock or mound\"\n ],\n \"kCantonese\": [\n \"jau1\"\n ],\n \"kMandarin\": {\n \"zh-Hans\": \"qi\u016b\",\n \"zh-Hant\": \"qi\u016b\"\n }\n },\n {\n \"char\": \"\u3401\",\n \"ucn\": \"U+3401\",\n \"kDefinition\": [\n \"to lick\",\n \"to taste, a mat, bamboo bark\"\n ],\n \"kCantonese\": [\n \"tim2\"\n ],\n \"kHanyuPinyin\": [\n {\n \"locations\": [\n {\n \"volume\": 1,\n \"page\": 19,\n \"character\": 2,\n \"virtual\": 0\n }\n ],\n \"readings\": [\n \"ti\u00e0n\"\n ]\n }\n ],\n \"kMandarin\": {\n \"zh-Hans\": \"ti\u00e0n\",\n \"zh-Hant\": \"ti\u00e0n\"\n }\n }\n ]\n\nYAML ``$ unihan-etl -F yaml``:\n\n.. code-block:: yaml\n\n - char: \u3400\n kCantonese:\n - jau1\n kDefinition:\n - (same as U+4E18 \u4e18) hillock or mound\n kMandarin:\n zh-Hans: qi\u016b\n zh-Hant: qi\u016b\n ucn: U+3400\n - char: \u3401\n kCantonese:\n - tim2\n kDefinition:\n - to lick\n - to taste, a mat, bamboo bark\n kHanyuPinyin:\n - locations:\n - character: 2\n page: 19\n virtual: 0\n volume: 1\n readings:\n - ti\u00e0n\n kMandarin:\n zh-Hans: ti\u00e0n\n zh-Hant: ti\u00e0n\n ucn: U+3401\n\n\nFeatures\n--------\n\n* automatically downloads UNIHAN from the internet\n* strives for accuracy with the specifications described in `UNIHAN's database\n design `_\n* export to JSON, CSV and YAML (requires `pyyaml`_) via ``-F``\n* configurable to export specific fields via ``-f``\n* accounts for encoding conflicts due to the Unicode-heavy content\n* designed as a technical proof for future CJK (Chinese, Japanese,\n Korean) datasets\n* core component and dependency of `cihai`_, a CJK library\n* `data package`_ support\n* expansion of multi-value delimited fields in YAML, JSON and python\n dictionaries \n* supports python 2.7, >= 3.5 and pypy\n\nIf you encounter a problem or have a question, please `create an\nissue`_.\n\n.. _cihai: https://cihai.git-pull.com\n.. _cihai-handbook: https://github.com/cihai/cihai-handbook\n.. _cihai team: https://github.com/cihai?tab=members\n.. _cihai-python: https://github.com/cihai/cihai-python\n\nUsage\n-----\n\n``unihan-etl`` offers customizable builds via its command line arguments.\n\nSee `unihan-etl CLI arguments`_ for information on how you can specify \ncolumns, files, download URL's, and output destination.\n\nTo download and build your own UNIHAN export:\n\n.. code-block:: bash\n\n $ pip install --user unihan-etl\n\nTo output CSV, the default format:\n\n.. code-block:: bash\n\n $ unihan-etl\n\nTo output JSON::\n\n $ unihan-etl -F json\n\nTo output YAML::\n\n $ pip install --user pyyaml\n $ unihan-etl -F yaml\n\nTo only output the kDefinition field in a csv::\n\n $ unihan-etl -f kDefinition\n\nTo output multiple fields, separate with spaces::\n\n $ unihan-etl -f kCantonese kDefinition\n\nTo output to a custom file::\n\n $ unihan-etl --destination ./exported.csv\n\nTo output to a custom file (templated file extension)::\n\n $ unihan-etl --destination ./exported.{ext}\n\nSee `unihan-etl CLI arguments`_ for advanced usage examples.\n\n.. _unihan-etl CLI arguments: https://unihan-etl.git-pull.com/en/latest/cli.html\n\nCode layout\n-----------\n\n.. code-block:: bash\n\n # cache dir (Unihan.zip is downloaded, contents extracted)\n {XDG cache dir}/unihan_etl/\n\n # output dir\n {XDG data dir}/unihan_etl/\n unihan.json\n unihan.csv\n unihan.yaml # (requires pyyaml)\n\n # package dir\n unihan_etl/\n process.py # argparse, download, extract, transform UNIHAN's data\n constants.py # immutable data vars (field to filename mappings, etc)\n expansion.py # extracting details baked inside of fields\n _compat.py # python 2/3 compatibility module\n util.py # utility / helper functions\n\n # test suite\n tests/*\n\n.. _UNIHAN: http://www.unicode.org/charts/unihan.html\n.. _ETL: https://en.wikipedia.org/wiki/Extract,_transform,_load\n.. _create an issue: https://github.com/cihai/unihan-etl/issues/new\n.. _Data Package: http://frictionlessdata.io/data-packages/\n.. _pyyaml: http://pyyaml.org/\n\n.. |pypi| image:: https://img.shields.io/pypi/v/unihan-etl.svg\n :alt: Python Package\n :target: http://badge.fury.io/py/unihan-etl\n\n.. |build-status| image:: https://img.shields.io/travis/cihai/unihan-etl.svg\n :alt: Build Status\n :target: https://travis-ci.org/cihai/unihan-etl\n\n.. |coverage| image:: https://codecov.io/gh/cihai/unihan-etl/branch/master/graph/badge.svg\n :alt: Code Coverage\n :target: https://codecov.io/gh/cihai/unihan-etl\n\n.. |license| image:: https://img.shields.io/github/license/cihai/unihan-etl.svg\n :alt: License \n\n.. |docs| image:: https://readthedocs.org/projects/unihan-etl/badge/?version=latest\n :alt: Documentation Status\n :scale: 100%\n :target: https://readthedocs.org/projects/unihan-etl/", "description_content_type": "", "docs_url": null, "download_url": "https://pypi.python.org/pypi/unihan-etl", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/cihai/unihan-etl", "keywords": "unihan-etl", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "unihan-etl", "package_url": "https://pypi.org/project/unihan-etl/", "platform": "", "project_url": "https://pypi.org/project/unihan-etl/", "project_urls": { "Code": "https://github.com/cihai/unihan-etl", "Documentation": "https://unihan-etl.git-pull.com", "Download": "https://pypi.python.org/pypi/unihan-etl", "Homepage": "https://github.com/cihai/unihan-etl", "Issue tracker": "https://github.com/cihai/unihan-etl/issues" }, "release_url": "https://pypi.org/project/unihan-etl/0.10.3/", "requires_dist": null, "requires_python": "", "summary": "Export UNIHAN to Python, Data Package, CSV, JSON and YAML", "version": "0.10.3" }, "last_serial": 5695173, "releases": { "0.10.0": [ { "comment_text": "", "digests": { "md5": "fe3c5bda195b0e14fccab012891ab580", "sha256": "7cd433c5eea6c0791c7b2a388137d8c17cd946a708597868eda3dd262b184a30" }, "downloads": -1, "filename": "unihan-etl-0.10.0.tar.gz", "has_sig": false, "md5_digest": "fe3c5bda195b0e14fccab012891ab580", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 23393, "upload_time": "2018-07-29T12:02:35", "url": "https://files.pythonhosted.org/packages/3d/27/8ce6e4a0439d542475676d1be5843408a4de7e4d9296d2e07946cf703adb/unihan-etl-0.10.0.tar.gz" } ], "0.10.1": [ { "comment_text": "", "digests": { "md5": "2e732ba1b969c9d8cac83ee77d66f107", "sha256": "ed8278379756c1f3b5e8d902a4c2b43788853a0e0467b8668aee8628ae593b3e" }, "downloads": -1, "filename": "unihan-etl-0.10.1.tar.gz", "has_sig": false, "md5_digest": "2e732ba1b969c9d8cac83ee77d66f107", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 23694, "upload_time": "2018-09-08T17:54:37", "url": "https://files.pythonhosted.org/packages/38/2b/0f90895a3b1be22eb9496592efbb3ec1b0fd56aa55e86b3dc41bd488b40c/unihan-etl-0.10.1.tar.gz" } ], "0.10.2": [ { "comment_text": "", "digests": { "md5": "47a61c1df1e362b02634bcd139dff596", "sha256": "e693ffb5130654e945088713638c25acf04e339e81745b3f9b030c990dbb8530" }, "downloads": -1, "filename": "unihan-etl-0.10.2.tar.gz", "has_sig": false, "md5_digest": "47a61c1df1e362b02634bcd139dff596", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 27065, "upload_time": "2019-08-17T15:41:42", "url": "https://files.pythonhosted.org/packages/e4/ea/4632971b9de318da5e74bdee35526cd791e624b3f807796ec7c6c34c7979/unihan-etl-0.10.2.tar.gz" } ], "0.10.3": [ { "comment_text": "", "digests": { "md5": "2120762799d3bdd610aa66f28d2fe1d1", "sha256": "1f2451616002659141c10e62beedc6ad6d1afb5224a380d52ac09214b56b19ad" }, "downloads": -1, "filename": "unihan-etl-0.10.3.tar.gz", "has_sig": false, "md5_digest": "2120762799d3bdd610aa66f28d2fe1d1", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 27136, "upload_time": "2019-08-18T16:36:14", "url": "https://files.pythonhosted.org/packages/b8/88/951644cfd1509cc7773a1f46404b23ddbf80c51549cd7cb8696629351972/unihan-etl-0.10.3.tar.gz" } ], "0.9.0": [ { "comment_text": "", "digests": { "md5": "49ee124ce8b6170b9674165d38724ada", "sha256": "555f085985c603e48209b72a3c6774d480f4541bb25ca89f00555549d7195f2b" }, "downloads": -1, "filename": "unihan-etl-0.9.0.tar.gz", "has_sig": false, "md5_digest": "49ee124ce8b6170b9674165d38724ada", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 24934, "upload_time": "2017-05-26T19:24:30", "url": "https://files.pythonhosted.org/packages/02/dd/40e8e2b8dca9a198bb32094c7e4237eb8e6509f718c0e36862c2134bf922/unihan-etl-0.9.0.tar.gz" } ], "0.9.1": [ { "comment_text": "", "digests": { "md5": "0d664b16e5d34fdd6019a81185472bdd", "sha256": "a8e0e3074f8e70fe52ec2aaa03945382c46c93115ab68b26e5247014ffc88d41" }, "downloads": -1, "filename": "unihan-etl-0.9.1.tar.gz", "has_sig": false, "md5_digest": "0d664b16e5d34fdd6019a81185472bdd", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 25662, "upload_time": "2017-05-28T02:41:40", "url": "https://files.pythonhosted.org/packages/08/5b/3c876ce0648cf66fca27b72e1570dae3324e50c48cbf64467dd0abed5cb6/unihan-etl-0.9.1.tar.gz" } ], "0.9.2": [ { "comment_text": "", "digests": { "md5": "742ad9882b829ea24332e9f8d788f8d5", "sha256": "1460233ecea51eac85e3b81095faa5d6716e19bdaeeb0bc87636650b2ad095b0" }, "downloads": -1, "filename": "unihan-etl-0.9.2.tar.gz", "has_sig": false, "md5_digest": "742ad9882b829ea24332e9f8d788f8d5", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 25141, "upload_time": "2017-05-31T12:55:42", "url": "https://files.pythonhosted.org/packages/92/69/33daf987b1b5d64e6a0ed1e4bfd643be4a6a91281266f0f26d61c42afab6/unihan-etl-0.9.2.tar.gz" } ], "0.9.3": [ { "comment_text": "", "digests": { "md5": "0d91e1ed9346fe6351bfe240b539111b", "sha256": "d16ef1e160fcc0f1e06743ea894f3104539f292d2fadcc5bc27f002e5395cb17" }, "downloads": -1, "filename": "unihan-etl-0.9.3.tar.gz", "has_sig": false, "md5_digest": "0d91e1ed9346fe6351bfe240b539111b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 25167, "upload_time": "2017-05-31T16:14:17", "url": "https://files.pythonhosted.org/packages/86/9f/a483395926406a055aa1089ad79720fc61ae47cded6fd66c7722bd8f796d/unihan-etl-0.9.3.tar.gz" } ], "0.9.4": [ { "comment_text": "", "digests": { "md5": "749abf5aa9e4b9d65a7eb83c396e943d", "sha256": "e109d0707e85310be30b65d4210be0326c9aa1011589de3673470a66065ce4b2" }, "downloads": -1, "filename": "unihan-etl-0.9.4.tar.gz", "has_sig": false, "md5_digest": "749abf5aa9e4b9d65a7eb83c396e943d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 25275, "upload_time": "2017-06-05T05:04:02", "url": "https://files.pythonhosted.org/packages/e6/f2/ad475612c1951852fbf186e9c2ffac59296b278605657be60b58b4699358/unihan-etl-0.9.4.tar.gz" } ], "0.9.5": [ { "comment_text": "", "digests": { "md5": "a8a37c20bc9d45d81899c097f28f3640", "sha256": "253453fbd2c439c75739a8026a56ee333f77f23370a3cab791bd71e70c905707" }, "downloads": -1, "filename": "unihan-etl-0.9.5.tar.gz", "has_sig": false, "md5_digest": "a8a37c20bc9d45d81899c097f28f3640", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 25345, "upload_time": "2017-06-26T17:39:53", "url": "https://files.pythonhosted.org/packages/d8/de/3d9897770b4f0eafb171ba960ce769fa5034edcb5ffc2b3c0538635c5b63/unihan-etl-0.9.5.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "2120762799d3bdd610aa66f28d2fe1d1", "sha256": "1f2451616002659141c10e62beedc6ad6d1afb5224a380d52ac09214b56b19ad" }, "downloads": -1, "filename": "unihan-etl-0.10.3.tar.gz", "has_sig": false, "md5_digest": "2120762799d3bdd610aa66f28d2fe1d1", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 27136, "upload_time": "2019-08-18T16:36:14", "url": "https://files.pythonhosted.org/packages/b8/88/951644cfd1509cc7773a1f46404b23ddbf80c51549cd7cb8696629351972/unihan-etl-0.10.3.tar.gz" } ] }