{ "info": { "author": "Thomas Lee", "author_email": "thomaslee@yam.ai", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "License :: OSI Approved :: Apache Software License", "Programming Language :: Python :: 3.6", "Topic :: Scientific/Engineering :: Artificial Intelligence" ], "description": "# yamconv\n\n`yamconv` converts a machine learning dataset from one format to another format.\n\n## Installation\n\n`yamconv` is published on [PyPI](https://pypi.org/project/yamconv/). You can install `yamconv` using pip as follows:\n\n```sh\npip install yamconv\n```\n\nAlternatively, you can install it from the source code by running `pip` in the project directory where [`setup.py`](https://github.com/yam-ai/yamconv/blob/master/setup.py) is located:\n\n```sh\npip install .\n```\n\n## Usage\n\n```sh\nyamconv.py -c converter -i input_file -o output_file -s settings -v\n```\n\n* `-c`: converter name\n* `-i`: input file path\n* `-o`: output file path\n* `-s`: converter settings in JSON\n* `-v`: verbose, to display the processing progress and information\n\n## Supported converters\n\nThe following are the supported converters:\n\n* `mlt.sqlite2fasttext`: SQLite database file to fastText text file\n* `mlt.sqlite2csv`: SQLite database file to CSV text file\n* `mlt.fasttext2sqlite`: fastText text file to SQLite database file\n* `mlt.csv2sqlite`: CSV text file to SQLite database file\n* `mlt.csv2fasttext`: CSV text file to fastText text file\n* `mlt.sqlite2sqlite`: SQLite database file to SQLite database file (with normalization)\n* `mlt.fasttext2fasttext`: fastText text file to fastText text file (with normalization)\n* `mlt.csv2csv`: CSV text file to CSV text file (with normalization)\n\n### Settings\n\nSettings for converters are given in the `-s` option as a JSON string, e.g., `'{\"cache_labels\": true}'`.\n\n| Setting | Values | Description | Applicable converters |\n|---------|--------|-------------|-----------------------|\n| `normalize_labels` | 
`true` (default), `false` | When `normalize_labels` is `true`, all labels are normalized. That is, all symbols are removed and all letters are converted to lower case. | Any |\n| `word_seq` | `true`, `false` (default) | When `word_seq` is `true`, each text is normalized into a sequence of lower-case words. That is, all symbols are removed, all letters are converted to lower case, and all Unicode word characters (e.g., Chinese characters) are delimited by a space. | Any |\n| `cache_labels` | `true`, `false` (default) | When `cache_labels` is `true`, the normalized labels are cached in memory. It can be set to `false` if there is insufficient memory to cache a huge number of different labels in the dataset. | Any |\n\n## Supported dataset formats\n\n### Multi-label text classification\n\n#### SQLite database\n\nA [SQLite](https://www.sqlite.org) database is used to store the classifications of texts.\nThe database schema is as follows:\n\n```SQL\nCREATE TABLE IF NOT EXISTS texts (\n id TEXT NOT NULL PRIMARY KEY,\n text TEXT NOT NULL\n);\nCREATE TABLE IF NOT EXISTS labels (\n label TEXT NOT NULL,\n text_id TEXT NOT NULL,\n FOREIGN KEY (text_id) REFERENCES texts(id)\n);\nCREATE INDEX IF NOT EXISTS label_index ON labels (label);\nCREATE INDEX IF NOT EXISTS text_id_index ON labels (text_id);\n```\n\nThe `texts` table contains the text contents in the `text` field,\nand each row is uniquely identified by the `id` field.\nThe `labels` table contains the labels in the `label` field.\nEach row has a `text_id` foreign key that links the label to the text in the `texts` table,\nwhere the text is classified with the label.\nIn other words, each row in `texts` is associated with zero or more rows in `labels`.\n\n#### fastText text file\n\nThe [fastText](https://fasttext.cc) format is a text file that contains a series of lines.\nEach line represents a text classified by one or more labels.\nA line starts with the labels, followed by the text content.\nEach label is marked 
with the `__label__` prefix and the labels are separated by a space.\nThe following is a fragment of an example fastText dataset file:\n\n```text\n__label__food __label__region Many people love having dim sum in Hong Kong restaurants.\n__label__region __label__plant __label__business The Netherlands is the major supplier to the European floral market.\n```\n\n#### CSV text file\n\nThe dataset is in the form of a CSV (Comma-Separated Values) file. The first row is the header. Each subsequent row stores a single record. The CSV file can be in one of the following two formats.\n\n##### Format 1\n\nSuppose the format of the header row is like the following:\n\n```csv\n\"id\", \"text\", \"region\", \"business\", \"food\", \"plant\"\n```\n\nThat is:\n\n* Cell `1`: `id`\n* Cell `2`: any arbitrary value\n* Cell `n` where `n >= 3`: the name of label `n`, e.g., `region`, `business`, `food`, `plant`.\n\nEach record row looks like:\n\n```csv\n\"10\", \"Many people love having dim sum in Hong Kong restaurants.\", 1, 0, 1, 0\n```\n\nThat is:\n\n* Cell `1`: the `id` string\n* Cell `2`: the text content\n* Cell `n` where `n >= 3`: `1` or `0` representing whether the text is classified with label `n` or not respectively.\n\n##### Format 2\n\nSuppose the format of the header row is like the following:\n\n```csv\n\"text\", \"region\", \"business\", \"food\", \"plant\"\n```\n\nThat is:\n\n* Cell `1`: any arbitrary value\n* Cell `n` where `n >= 2`: the name of label `n`, e.g., `region`, `business`, `food`, `plant`.\n\nEach record row looks like:\n\n```csv\n\"Many people love having dim sum in Hong Kong restaurants.\", 1, 0, 1, 0\n```\n\nThat is:\n\n* Cell `1`: the text content\n* Cell `n` where `n >= 2`: `1` or `0` representing whether the text is classified with label `n` or not respectively.\n\n## Professional services\n\nIf you need any supporting resources or consultancy services from YAM AI Machinery, please find us at:\n\n* https://www.yam.ai\n* 
https://twitter.com/theYAMai\n* https://www.linkedin.com/company/yamai\n* https://www.facebook.com/theYAMai\n* https://github.com/yam-ai\n* https://hub.docker.com/u/yamai", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/yam-ai/yamconv", "keywords": "machine learning", "license": "", "maintainer": "", "maintainer_email": "", "name": "yamconv", "package_url": "https://pypi.org/project/yamconv/", "platform": "", "project_url": "https://pypi.org/project/yamconv/", "project_urls": { "Homepage": "https://github.com/yam-ai/yamconv" }, "release_url": "https://pypi.org/project/yamconv/0.1.7/", "requires_dist": null, "requires_python": "", "summary": "yamconv converts the file formats of machine learning datasets", "version": "0.1.7" }, "last_serial": 5809274, "releases": { "0.1": [ { "comment_text": "", "digests": { "md5": "a9688e11cdb81a324cf4ee0b4dea54a7", "sha256": "65bebe5c4d21602106472c070dbdabb2b72b65815fa389d933246f6a277d6551" }, "downloads": -1, "filename": "yamconv-0.1.tar.gz", "has_sig": false, "md5_digest": "a9688e11cdb81a324cf4ee0b4dea54a7", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 1506, "upload_time": "2019-08-27T09:45:09", "url": "https://files.pythonhosted.org/packages/c0/74/247574c218c382f7bb9e0813d543b0b9a159bb3136f51c65cc90f26fe9c9/yamconv-0.1.tar.gz" } ], "0.1.1": [ { "comment_text": "", "digests": { "md5": "d658c9b065b970ae5d739e44a1461517", "sha256": "1192a3731f122b0152be547609df6279d4d1d9cd66c28e95fd4ee024fa10956b" }, "downloads": -1, "filename": "yamconv-0.1.1.tar.gz", "has_sig": false, "md5_digest": "d658c9b065b970ae5d739e44a1461517", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 1547, "upload_time": "2019-08-27T09:49:53", "url": 
"https://files.pythonhosted.org/packages/71/96/9282c6e8398ad1fce0c1f7a2c0c6b5d980fbac0833f53c80368f1c2767d5/yamconv-0.1.1.tar.gz" } ], "0.1.2": [ { "comment_text": "", "digests": { "md5": "ad3d988cab660b354cc0f5922cd9849a", "sha256": "ff3dcd8fd0bc7e21a7c76f0696badc1e85d1e749bf1809d69991402d609344ae" }, "downloads": -1, "filename": "yamconv-0.1.2.tar.gz", "has_sig": false, "md5_digest": "ad3d988cab660b354cc0f5922cd9849a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 2637, "upload_time": "2019-08-27T14:41:11", "url": "https://files.pythonhosted.org/packages/13/f0/752eaf802ccc315ef44104ab6b36bde31338fd4bee65748275c22fc6c8d4/yamconv-0.1.2.tar.gz" } ], "0.1.3": [ { "comment_text": "", "digests": { "md5": "f745a1f94ed3b2b360319e1417d14ae8", "sha256": "9f654e27a6945621d61d251069b7dd8459a151681a90bc2bba465ba2a37dced6" }, "downloads": -1, "filename": "yamconv-0.1.3.tar.gz", "has_sig": false, "md5_digest": "f745a1f94ed3b2b360319e1417d14ae8", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 6825, "upload_time": "2019-08-28T06:46:38", "url": "https://files.pythonhosted.org/packages/76/9a/0c22fcc1de06b0b3361ec9c5047a3b02ef708dee10ab621a580b7c29f0c9/yamconv-0.1.3.tar.gz" } ], "0.1.4": [ { "comment_text": "", "digests": { "md5": "6552a08c14f3851ffbc59447e7205f7e", "sha256": "e7c49b28ab0ee0f70209dba213e9e33356805c85da52eb994d6c50f2971ad0fd" }, "downloads": -1, "filename": "yamconv-0.1.4.tar.gz", "has_sig": false, "md5_digest": "6552a08c14f3851ffbc59447e7205f7e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 7452, "upload_time": "2019-08-29T07:38:06", "url": "https://files.pythonhosted.org/packages/fb/a7/6f13046d3cb3da137d206cf1a2e33d0007bca38705752f1ac4de490ca4af/yamconv-0.1.4.tar.gz" } ], "0.1.5": [ { "comment_text": "", "digests": { "md5": "c73721a04a7e2a2eeb5ef73eb7785cb1", "sha256": "64add75902c84229b6fb6e4cc37e018eaa563231caeefe8a4b1aacb4797b7f99" }, "downloads": 
-1, "filename": "yamconv-0.1.5.tar.gz", "has_sig": false, "md5_digest": "c73721a04a7e2a2eeb5ef73eb7785cb1", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 9014, "upload_time": "2019-09-06T16:46:28", "url": "https://files.pythonhosted.org/packages/85/35/3e794c85af1d54e4c7d19a7419cf04a4ebd7a10bcfdc1c22498af66dbb2c/yamconv-0.1.5.tar.gz" } ], "0.1.7": [ { "comment_text": "", "digests": { "md5": "76c7214eaab11de113457609d3137492", "sha256": "185395dd8c3f6ef787e7ce8d728604690e002e828a3deb3d8ad6ac1a9cf0fb41" }, "downloads": -1, "filename": "yamconv-0.1.7.tar.gz", "has_sig": false, "md5_digest": "76c7214eaab11de113457609d3137492", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 9414, "upload_time": "2019-09-10T14:21:42", "url": "https://files.pythonhosted.org/packages/30/0c/fc176c6661b1e533c031491021fdac8f3dcab93a2d6dda7b67ca5698e3c0/yamconv-0.1.7.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "76c7214eaab11de113457609d3137492", "sha256": "185395dd8c3f6ef787e7ce8d728604690e002e828a3deb3d8ad6ac1a9cf0fb41" }, "downloads": -1, "filename": "yamconv-0.1.7.tar.gz", "has_sig": false, "md5_digest": "76c7214eaab11de113457609d3137492", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 9414, "upload_time": "2019-09-10T14:21:42", "url": "https://files.pythonhosted.org/packages/30/0c/fc176c6661b1e533c031491021fdac8f3dcab93a2d6dda7b67ca5698e3c0/yamconv-0.1.7.tar.gz" } ] }