{
"info": {
"author": "Tony Narlock",
"author_email": "cihai@git-pull.com",
"bugtrack_url": null,
"classifiers": [
"Development Status :: 4 - Beta",
"Environment :: Web Environment",
"Intended Audience :: Developers",
"License :: OSI Approved :: MIT License",
"Programming Language :: Python",
"Programming Language :: Python :: 2.7",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.5",
"Programming Language :: Python :: 3.6",
"Programming Language :: Python :: Implementation :: PyPy",
"Topic :: Software Development :: Internationalization",
"Topic :: Utilities"
],
"description": "*unihan-etl* - `ETL`_ tool for Unicode's Han Unification (`UNIHAN`_) database\nreleases. unihan-etl retrieves (downloads), extracts (unzips), and transforms the\ndatabase from Unicode's website to a flat, tabular or structured, tree-like\nformat.\n\nunihan-etl can be used as a python library through its `API`_, to retrieve data\nas a python object, or through the `CLI`_ to retrieve a CSV, JSON, or YAML file.\n\nPart of the `cihai`_ project. Similar project: `libUnihan `_.\n\nUNIHAN Version compatibility (as of unihan-etl v0.10.0):\n`11.0.0 `__\n(released 2018-05-08, revision 25).\n\n|pypi| |docs| |build-status| |coverage| |license|\n\n`UNIHAN`_'s data is dispersed across multiple files in the format of::\n\n U+3400\tkCantonese\tjau1\n U+3400\tkDefinition\t(same as U+4E18 \u4e18) hillock or mound\n U+3400\tkMandarin\tqi\u016b\n U+3401\tkCantonese\ttim2\n U+3401\tkDefinition\tto lick; to taste, a mat, bamboo bark\n U+3401\tkHanyuPinyin\t10019.020:ti\u00e0n\n U+3401\tkMandarin\tti\u00e0n\n\nValues vary in shape and structure depending on their field type.\n`kHanyuPinyin `_\nmaps Unicode codepoints to `H\u00e0ny\u01d4 D\u00e0 Z\u00ecdi\u01cen `_,\nwhere ``10019.020:ti\u00e0n`` represents an entry. Complicating it further,\nmore variations::\n\n U+5EFE\tkHanyuPinyin\t10513.110,10514.010,10514.020:g\u01d2ng\n U+5364\tkHanyuPinyin\t10093.130:x\u012b,l\u01d4 74609.020:l\u01d4,x\u012b\n\n*kHanyuPinyin* supports multiple entries delimited by spaces. \":\"\n(colon) separate locations in the work from pinyin readings. \",\"\n(comma) separate multiple entries/readings. This is just one of 90 \nfields contained in the database.\n\n.. _API: https://unihan-etl.git-pull.com/en/latest/api.html\n.. _CLI: https://unihan-etl.git-pull.com/en/latest/cli.html\n\nTabular, \"Flat\" output\n----------------------\n\nCSV (default), ``$ unihan-etl``::\n\n char,ucn,kCantonese,kDefinition,kHanyuPinyin,kMandarin\n \u3400,U+3400,jau1,(same as U+4E18 \u4e18) hillock or mound,,qi\u016b\n \u3401,U+3401,tim2,\"to lick; to taste, a mat, bamboo bark\",10019.020:ti\u00e0n,ti\u00e0n\n\nWith ``$ unihan-etl -F yaml --no-expand``:\n\n.. code-block:: yaml\n\n - char: \u3400\n kCantonese: jau1\n kDefinition: (same as U+4E18 \u4e18) hillock or mound\n kHanyuPinyin: null\n kMandarin: qi\u016b\n ucn: U+3400\n - char: \u3401\n kCantonese: tim2\n kDefinition: to lick; to taste, a mat, bamboo bark\n kHanyuPinyin: 10019.020:ti\u00e0n\n kMandarin: ti\u00e0n\n ucn: U+3401\n\nWith ``$ unihan-etl -F json --no-expand``:\n\n.. code-block:: json\n\n [\n {\n \"char\": \"\u3400\",\n \"ucn\": \"U+3400\",\n \"kDefinition\": \"(same as U+4E18 \u4e18) hillock or mound\",\n \"kCantonese\": \"jau1\",\n \"kHanyuPinyin\": null,\n \"kMandarin\": \"qi\u016b\"\n },\n {\n \"char\": \"\u3401\",\n \"ucn\": \"U+3401\",\n \"kDefinition\": \"to lick; to taste, a mat, bamboo bark\",\n \"kCantonese\": \"tim2\",\n \"kHanyuPinyin\": \"10019.020:ti\u00e0n\",\n \"kMandarin\": \"ti\u00e0n\"\n }\n ]\n\n\"Structured\" output\n-------------------\n\nCodepoints can pack a lot more detail, unihan-etl carefully extracts these values\nin a uniform manner. Empty values are pruned.\n\nTo make this possible, unihan-etl exports to JSON, YAML, and python\nlist/dicts.\n\n.. admonition:: Why not CSV?\n \n Unfortunately, CSV is only suitable for storing table-like \n information. File formats such as JSON and YAML accept key-values and\n hierarchical entries.\n\nJSON, ``$ unihan-etl -F json``:\n\n.. code-block:: json\n\n [\n {\n \"char\": \"\u3400\",\n \"ucn\": \"U+3400\",\n \"kDefinition\": [\n \"(same as U+4E18 \u4e18) hillock or mound\"\n ],\n \"kCantonese\": [\n \"jau1\"\n ],\n \"kMandarin\": {\n \"zh-Hans\": \"qi\u016b\",\n \"zh-Hant\": \"qi\u016b\"\n }\n },\n {\n \"char\": \"\u3401\",\n \"ucn\": \"U+3401\",\n \"kDefinition\": [\n \"to lick\",\n \"to taste, a mat, bamboo bark\"\n ],\n \"kCantonese\": [\n \"tim2\"\n ],\n \"kHanyuPinyin\": [\n {\n \"locations\": [\n {\n \"volume\": 1,\n \"page\": 19,\n \"character\": 2,\n \"virtual\": 0\n }\n ],\n \"readings\": [\n \"ti\u00e0n\"\n ]\n }\n ],\n \"kMandarin\": {\n \"zh-Hans\": \"ti\u00e0n\",\n \"zh-Hant\": \"ti\u00e0n\"\n }\n }\n ]\n\nYAML ``$ unihan-etl -F yaml``:\n\n.. code-block:: yaml\n\n - char: \u3400\n kCantonese:\n - jau1\n kDefinition:\n - (same as U+4E18 \u4e18) hillock or mound\n kMandarin:\n zh-Hans: qi\u016b\n zh-Hant: qi\u016b\n ucn: U+3400\n - char: \u3401\n kCantonese:\n - tim2\n kDefinition:\n - to lick\n - to taste, a mat, bamboo bark\n kHanyuPinyin:\n - locations:\n - character: 2\n page: 19\n virtual: 0\n volume: 1\n readings:\n - ti\u00e0n\n kMandarin:\n zh-Hans: ti\u00e0n\n zh-Hant: ti\u00e0n\n ucn: U+3401\n\n\nFeatures\n--------\n\n* automatically downloads UNIHAN from the internet\n* strives for accuracy with the specifications described in `UNIHAN's database\n design `_\n* export to JSON, CSV and YAML (requires `pyyaml`_) via ``-F``\n* configurable to export specific fields via ``-f``\n* accounts for encoding conflicts due to the Unicode-heavy content\n* designed as a technical proof for future CJK (Chinese, Japanese,\n Korean) datasets\n* core component and dependency of `cihai`_, a CJK library\n* `data package`_ support\n* expansion of multi-value delimited fields in YAML, JSON and python\n dictionaries \n* supports python 2.7, >= 3.5 and pypy\n\nIf you encounter a problem or have a question, please `create an\nissue`_.\n\n.. _cihai: https://cihai.git-pull.com\n.. _cihai-handbook: https://github.com/cihai/cihai-handbook\n.. _cihai team: https://github.com/cihai?tab=members\n.. _cihai-python: https://github.com/cihai/cihai-python\n\nUsage\n-----\n\n``unihan-etl`` offers customizable builds via its command line arguments.\n\nSee `unihan-etl CLI arguments`_ for information on how you can specify \ncolumns, files, download URL's, and output destination.\n\nTo download and build your own UNIHAN export:\n\n.. code-block:: bash\n\n $ pip install --user unihan-etl\n\nTo output CSV, the default format:\n\n.. code-block:: bash\n\n $ unihan-etl\n\nTo output JSON::\n\n $ unihan-etl -F json\n\nTo output YAML::\n\n $ pip install --user pyyaml\n $ unihan-etl -F yaml\n\nTo only output the kDefinition field in a csv::\n\n $ unihan-etl -f kDefinition\n\nTo output multiple fields, separate with spaces::\n\n $ unihan-etl -f kCantonese kDefinition\n\nTo output to a custom file::\n\n $ unihan-etl --destination ./exported.csv\n\nTo output to a custom file (templated file extension)::\n\n $ unihan-etl --destination ./exported.{ext}\n\nSee `unihan-etl CLI arguments`_ for advanced usage examples.\n\n.. _unihan-etl CLI arguments: https://unihan-etl.git-pull.com/en/latest/cli.html\n\nCode layout\n-----------\n\n.. code-block:: bash\n\n # cache dir (Unihan.zip is downloaded, contents extracted)\n {XDG cache dir}/unihan_etl/\n\n # output dir\n {XDG data dir}/unihan_etl/\n unihan.json\n unihan.csv\n unihan.yaml # (requires pyyaml)\n\n # package dir\n unihan_etl/\n process.py # argparse, download, extract, transform UNIHAN's data\n constants.py # immutable data vars (field to filename mappings, etc)\n expansion.py # extracting details baked inside of fields\n _compat.py # python 2/3 compatibility module\n util.py # utility / helper functions\n\n # test suite\n tests/*\n\n.. _UNIHAN: http://www.unicode.org/charts/unihan.html\n.. _ETL: https://en.wikipedia.org/wiki/Extract,_transform,_load\n.. _create an issue: https://github.com/cihai/unihan-etl/issues/new\n.. _Data Package: http://frictionlessdata.io/data-packages/\n.. _pyyaml: http://pyyaml.org/\n\n.. |pypi| image:: https://img.shields.io/pypi/v/unihan-etl.svg\n :alt: Python Package\n :target: http://badge.fury.io/py/unihan-etl\n\n.. |build-status| image:: https://img.shields.io/travis/cihai/unihan-etl.svg\n :alt: Build Status\n :target: https://travis-ci.org/cihai/unihan-etl\n\n.. |coverage| image:: https://codecov.io/gh/cihai/unihan-etl/branch/master/graph/badge.svg\n :alt: Code Coverage\n :target: https://codecov.io/gh/cihai/unihan-etl\n\n.. |license| image:: https://img.shields.io/github/license/cihai/unihan-etl.svg\n :alt: License \n\n.. |docs| image:: https://readthedocs.org/projects/unihan-etl/badge/?version=latest\n :alt: Documentation Status\n :scale: 100%\n :target: https://readthedocs.org/projects/unihan-etl/",
"description_content_type": "",
"docs_url": null,
"download_url": "https://pypi.python.org/pypi/unihan-etl",
"downloads": {
"last_day": -1,
"last_month": -1,
"last_week": -1
},
"home_page": "https://github.com/cihai/unihan-etl",
"keywords": "unihan-etl",
"license": "MIT",
"maintainer": "",
"maintainer_email": "",
"name": "unihan-etl",
"package_url": "https://pypi.org/project/unihan-etl/",
"platform": "",
"project_url": "https://pypi.org/project/unihan-etl/",
"project_urls": {
"Code": "https://github.com/cihai/unihan-etl",
"Documentation": "https://unihan-etl.git-pull.com",
"Download": "https://pypi.python.org/pypi/unihan-etl",
"Homepage": "https://github.com/cihai/unihan-etl",
"Issue tracker": "https://github.com/cihai/unihan-etl/issues"
},
"release_url": "https://pypi.org/project/unihan-etl/0.10.3/",
"requires_dist": null,
"requires_python": "",
"summary": "Export UNIHAN to Python, Data Package, CSV, JSON and YAML",
"version": "0.10.3"
},
"last_serial": 5695173,
"releases": {
"0.10.0": [
{
"comment_text": "",
"digests": {
"md5": "fe3c5bda195b0e14fccab012891ab580",
"sha256": "7cd433c5eea6c0791c7b2a388137d8c17cd946a708597868eda3dd262b184a30"
},
"downloads": -1,
"filename": "unihan-etl-0.10.0.tar.gz",
"has_sig": false,
"md5_digest": "fe3c5bda195b0e14fccab012891ab580",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 23393,
"upload_time": "2018-07-29T12:02:35",
"url": "https://files.pythonhosted.org/packages/3d/27/8ce6e4a0439d542475676d1be5843408a4de7e4d9296d2e07946cf703adb/unihan-etl-0.10.0.tar.gz"
}
],
"0.10.1": [
{
"comment_text": "",
"digests": {
"md5": "2e732ba1b969c9d8cac83ee77d66f107",
"sha256": "ed8278379756c1f3b5e8d902a4c2b43788853a0e0467b8668aee8628ae593b3e"
},
"downloads": -1,
"filename": "unihan-etl-0.10.1.tar.gz",
"has_sig": false,
"md5_digest": "2e732ba1b969c9d8cac83ee77d66f107",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 23694,
"upload_time": "2018-09-08T17:54:37",
"url": "https://files.pythonhosted.org/packages/38/2b/0f90895a3b1be22eb9496592efbb3ec1b0fd56aa55e86b3dc41bd488b40c/unihan-etl-0.10.1.tar.gz"
}
],
"0.10.2": [
{
"comment_text": "",
"digests": {
"md5": "47a61c1df1e362b02634bcd139dff596",
"sha256": "e693ffb5130654e945088713638c25acf04e339e81745b3f9b030c990dbb8530"
},
"downloads": -1,
"filename": "unihan-etl-0.10.2.tar.gz",
"has_sig": false,
"md5_digest": "47a61c1df1e362b02634bcd139dff596",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 27065,
"upload_time": "2019-08-17T15:41:42",
"url": "https://files.pythonhosted.org/packages/e4/ea/4632971b9de318da5e74bdee35526cd791e624b3f807796ec7c6c34c7979/unihan-etl-0.10.2.tar.gz"
}
],
"0.10.3": [
{
"comment_text": "",
"digests": {
"md5": "2120762799d3bdd610aa66f28d2fe1d1",
"sha256": "1f2451616002659141c10e62beedc6ad6d1afb5224a380d52ac09214b56b19ad"
},
"downloads": -1,
"filename": "unihan-etl-0.10.3.tar.gz",
"has_sig": false,
"md5_digest": "2120762799d3bdd610aa66f28d2fe1d1",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 27136,
"upload_time": "2019-08-18T16:36:14",
"url": "https://files.pythonhosted.org/packages/b8/88/951644cfd1509cc7773a1f46404b23ddbf80c51549cd7cb8696629351972/unihan-etl-0.10.3.tar.gz"
}
],
"0.9.0": [
{
"comment_text": "",
"digests": {
"md5": "49ee124ce8b6170b9674165d38724ada",
"sha256": "555f085985c603e48209b72a3c6774d480f4541bb25ca89f00555549d7195f2b"
},
"downloads": -1,
"filename": "unihan-etl-0.9.0.tar.gz",
"has_sig": false,
"md5_digest": "49ee124ce8b6170b9674165d38724ada",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 24934,
"upload_time": "2017-05-26T19:24:30",
"url": "https://files.pythonhosted.org/packages/02/dd/40e8e2b8dca9a198bb32094c7e4237eb8e6509f718c0e36862c2134bf922/unihan-etl-0.9.0.tar.gz"
}
],
"0.9.1": [
{
"comment_text": "",
"digests": {
"md5": "0d664b16e5d34fdd6019a81185472bdd",
"sha256": "a8e0e3074f8e70fe52ec2aaa03945382c46c93115ab68b26e5247014ffc88d41"
},
"downloads": -1,
"filename": "unihan-etl-0.9.1.tar.gz",
"has_sig": false,
"md5_digest": "0d664b16e5d34fdd6019a81185472bdd",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 25662,
"upload_time": "2017-05-28T02:41:40",
"url": "https://files.pythonhosted.org/packages/08/5b/3c876ce0648cf66fca27b72e1570dae3324e50c48cbf64467dd0abed5cb6/unihan-etl-0.9.1.tar.gz"
}
],
"0.9.2": [
{
"comment_text": "",
"digests": {
"md5": "742ad9882b829ea24332e9f8d788f8d5",
"sha256": "1460233ecea51eac85e3b81095faa5d6716e19bdaeeb0bc87636650b2ad095b0"
},
"downloads": -1,
"filename": "unihan-etl-0.9.2.tar.gz",
"has_sig": false,
"md5_digest": "742ad9882b829ea24332e9f8d788f8d5",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 25141,
"upload_time": "2017-05-31T12:55:42",
"url": "https://files.pythonhosted.org/packages/92/69/33daf987b1b5d64e6a0ed1e4bfd643be4a6a91281266f0f26d61c42afab6/unihan-etl-0.9.2.tar.gz"
}
],
"0.9.3": [
{
"comment_text": "",
"digests": {
"md5": "0d91e1ed9346fe6351bfe240b539111b",
"sha256": "d16ef1e160fcc0f1e06743ea894f3104539f292d2fadcc5bc27f002e5395cb17"
},
"downloads": -1,
"filename": "unihan-etl-0.9.3.tar.gz",
"has_sig": false,
"md5_digest": "0d91e1ed9346fe6351bfe240b539111b",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 25167,
"upload_time": "2017-05-31T16:14:17",
"url": "https://files.pythonhosted.org/packages/86/9f/a483395926406a055aa1089ad79720fc61ae47cded6fd66c7722bd8f796d/unihan-etl-0.9.3.tar.gz"
}
],
"0.9.4": [
{
"comment_text": "",
"digests": {
"md5": "749abf5aa9e4b9d65a7eb83c396e943d",
"sha256": "e109d0707e85310be30b65d4210be0326c9aa1011589de3673470a66065ce4b2"
},
"downloads": -1,
"filename": "unihan-etl-0.9.4.tar.gz",
"has_sig": false,
"md5_digest": "749abf5aa9e4b9d65a7eb83c396e943d",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 25275,
"upload_time": "2017-06-05T05:04:02",
"url": "https://files.pythonhosted.org/packages/e6/f2/ad475612c1951852fbf186e9c2ffac59296b278605657be60b58b4699358/unihan-etl-0.9.4.tar.gz"
}
],
"0.9.5": [
{
"comment_text": "",
"digests": {
"md5": "a8a37c20bc9d45d81899c097f28f3640",
"sha256": "253453fbd2c439c75739a8026a56ee333f77f23370a3cab791bd71e70c905707"
},
"downloads": -1,
"filename": "unihan-etl-0.9.5.tar.gz",
"has_sig": false,
"md5_digest": "a8a37c20bc9d45d81899c097f28f3640",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 25345,
"upload_time": "2017-06-26T17:39:53",
"url": "https://files.pythonhosted.org/packages/d8/de/3d9897770b4f0eafb171ba960ce769fa5034edcb5ffc2b3c0538635c5b63/unihan-etl-0.9.5.tar.gz"
}
]
},
"urls": [
{
"comment_text": "",
"digests": {
"md5": "2120762799d3bdd610aa66f28d2fe1d1",
"sha256": "1f2451616002659141c10e62beedc6ad6d1afb5224a380d52ac09214b56b19ad"
},
"downloads": -1,
"filename": "unihan-etl-0.10.3.tar.gz",
"has_sig": false,
"md5_digest": "2120762799d3bdd610aa66f28d2fe1d1",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 27136,
"upload_time": "2019-08-18T16:36:14",
"url": "https://files.pythonhosted.org/packages/b8/88/951644cfd1509cc7773a1f46404b23ddbf80c51549cd7cb8696629351972/unihan-etl-0.10.3.tar.gz"
}
]
}