{ "info": { "author": "IBM SystemT, IBM CODAIT", "author_email": "qian.kun@ibm.com, karthik.muthuraman@ibm.com, ihjhuo@ibm.com, frreiss@us.ibm.com", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: Apache Software License" ], "description": "# DimSim - A Chinese Soundex Library (Python version) \n\nDimSim is a library developed by the Scalable Knowledge Intelligence team at IBM Almaden Research Center as part of the [SystemT](https://researcher.watson.ibm.com/researcher/view_group.php?id=1264) project. \n\nThe PyPi project page can be found [here](https://pypi.org/project/dimsim/). It was created in collaboration with IBM Center for Open-Source Data and AI Technologies ([CODAIT](codait.org)).\n\n## Overview\nWe provide a phonetic algorithm for indexing Chinese characters by sound. The technical details can be found in the following [paper](http://aclweb.org/anthology/K18-1043):\n\nMin Li, Marina Danilevsky, Sara Noeman and Yunyao Li. *DIMSIM: An Accurate Chinese Phonetic Similarity Algorithm based on Learned High Dimensional Encoding*. CoNLL 2018.\n\nIn this library, we provide a pre-trained model that can perform the following functions:\n- Given two Chinese phrases (of the same length), return the phonetic distance of the input phrases. Optionally you can feed in pinyin strings of Chinese phrases too.\n- Given a Chinese phrase, return its top-k similar (phoentically) Chinese phrases.\n\n\n\n## How to install\n\n**Dependencies**:\n- [pypinyin](https://github.com/mozillazg/python-pinyin): used for translating Chinese characters into their correponding pinyins. \n\nThere are two ways to install this library:\n- Install from PyPi\n\n```shell\npip install dimsim\n```\n- Download the source code by cloning this repo and compile it yourself.\n\n```shell\ngit clone git@github.com:System-T/DimSim.git\n\ncd DimSim/\n\npip install -e .\n```\n\n## How to use\nOnce you have the package installed you can use it for the two functions as shown below.\n\n- Computing phonetic distance of two Chinese phrases. The optional argument `pinyin` (False by default) can be used to provide a pinyin string list directly. See example usage below.\n\n```python\nimport dimsim\n\ndist = dimsim.get_distance(\"\u5927\u4fa0\",\"\u5927\u867e\")\n0.0002380952380952381\n\ndist = dimsim.get_distance(\"\u5927\u4fa0\",\"\u5927\u4eba\")\n25.001417183349876\n\ndist = dimsim.get_distance(['da4','xia2'],['da4','xia1']], pinyin=True)\n0.0002380952380952381\n\ndist = dimsim.get_distance(['da4','xia2'],['da4','ren2']], pinyin=True)\n25.001417183349876\n\n```\n***\n- Return top-k phonetically similar phrases of a given Chinese phrase. Two parameters:\n- **mode** controls the character type of the returned Chinese phrases, where 'simplified' represents simplified Chinese and 'traditional' represents traditional Chinese.\n- **theta** controls the size of search space for the candidate phrases.\n```python\nimport dimsim\n\ncandidates = dimsim.get_candidates(\"\u5927\u4fa0\", mode=\"simplified\", theta=1)\n['\u6253\u4e0b', '\u5927\u867e', '\u5927\u4fa0']\n\ncandidates = dimsim.get_candidates(\"\u7c89\u4e1d\", mode=\"traditinoal\", theta=1)\n['\u9580\u5e02', '\u5206\u6642', '\u711a\u5c4d', '\u7c89\u98fe', '\u7c89\u7d72']\n```\n\n## Citation\n\nPlease cite the library by referencing the published paper:\n```\n@InProceedings{K18-1043,\n author = \t{Li, Min and Danilevsky, Marina and Noeman, Sara and Li, Yunyao},\n title = \t{{DIMSIM:} An Accurate Chinese Phonetic Similarity Algorithm Based on Learned High Dimensional Encoding},\n booktitle = \t{Proceedings of the 22nd Conference on Computational Natural Language Learning},\n year = \t{2018},\n publisher = \t{Association for Computational Linguistics},\n pages = \t{444-453},\n location = \t{Brussels, Belgium},\n url = \t{http://aclweb.org/anthology/K18-1043}\n}\n```\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/System-T/DimSim", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "dimsim", "package_url": "https://pypi.org/project/dimsim/", "platform": "", "project_url": "https://pypi.org/project/dimsim/", "project_urls": { "Homepage": "https://github.com/System-T/DimSim" }, "release_url": "https://pypi.org/project/dimsim/0.2.2/", "requires_dist": [ "pypinyin" ], "requires_python": "", "summary": "Python implementation of the Chinese soundex project DimSim", "version": "0.2.2" }, "last_serial": 5185348, "releases": { "0.2": [ { "comment_text": "", "digests": { "md5": "b68e18f6d72e72f407e038cc679845b6", "sha256": "19c23704515e645cb597db55b26f40bcbcb0497d36c08ebc1d971a434490a798" }, "downloads": -1, "filename": "dimsim-0.2-py3-none-any.whl", "has_sig": false, "md5_digest": "b68e18f6d72e72f407e038cc679845b6", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 2382673, "upload_time": "2019-04-24T23:44:39", "url": "https://files.pythonhosted.org/packages/13/ff/bcac60f377e359ce57f91a62c40325526fab49dbb9f7590f27caec524ad2/dimsim-0.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "b60b283b13c97c99fec972c413545c12", "sha256": "23d6008902f67cc3f8623a3040319e8b9cff86cc144d8cf8804a1f3d6f2624ed" }, "downloads": -1, "filename": "dimsim-0.2.tar.gz", "has_sig": false, "md5_digest": "b60b283b13c97c99fec972c413545c12", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 2379666, "upload_time": "2019-04-24T23:44:46", "url": "https://files.pythonhosted.org/packages/78/a8/c488ae4e63bfd613ecc10228832027c45d1120e80d29622137e6471858da/dimsim-0.2.tar.gz" } ], "0.2.1": [ { "comment_text": "", "digests": { "md5": "223607fc515341aa52ec95e77bc4b31e", "sha256": "7ac286b587d17f0420fdbc360d578dd224e4a897d1d37719bd48699fb2632a59" }, "downloads": -1, "filename": "dimsim-0.2.1-py3-none-any.whl", "has_sig": false, "md5_digest": "223607fc515341aa52ec95e77bc4b31e", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 2382694, "upload_time": "2019-04-24T23:57:26", "url": "https://files.pythonhosted.org/packages/40/40/aa4d28daee41eb6dd8a7e45bd9492b53be436c295b8159c280bae9497b71/dimsim-0.2.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "5c68c44c9cf08c749c5e6be45f8a9e32", "sha256": "484f94e25c41f118e7b490e76c52a4b2ea6cf7503b74c43a21af958919a79033" }, "downloads": -1, "filename": "dimsim-0.2.1.tar.gz", "has_sig": false, "md5_digest": "5c68c44c9cf08c749c5e6be45f8a9e32", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 2379701, "upload_time": "2019-04-24T23:57:29", "url": "https://files.pythonhosted.org/packages/aa/d6/62dc0f626a3fe88811c700a1bfddf5506a78f36299d196a1d17eb9ea276a/dimsim-0.2.1.tar.gz" } ], "0.2.2": [ { "comment_text": "", "digests": { "md5": "9ca40666ae5b8619bbe965b77a40cf18", "sha256": "f2babc83c26bc2650207716cc98f06e8d2642c3c592fd320c7061ea5a7b9b54c" }, "downloads": -1, "filename": "dimsim-0.2.2-py3-none-any.whl", "has_sig": false, "md5_digest": "9ca40666ae5b8619bbe965b77a40cf18", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 2384287, "upload_time": "2019-04-25T00:00:46", "url": "https://files.pythonhosted.org/packages/b7/04/f2245b7260ea8fa61400cd25149a90c3a55d691301b0826d408222b5bf3a/dimsim-0.2.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "a40d82191a0f87979b64759b54bba857", "sha256": "1b1b1df6efb989ccbb23f3d0a3788a0cd59de563c511e58fe057ddd9fc49d3be" }, "downloads": -1, "filename": "dimsim-0.2.2.tar.gz", "has_sig": false, "md5_digest": "a40d82191a0f87979b64759b54bba857", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 2381705, "upload_time": "2019-04-25T00:00:49", "url": "https://files.pythonhosted.org/packages/25/ca/9f0c5fa7eea40ef1f509d872a8e8a7c6b0144d3593913abad834db2415a6/dimsim-0.2.2.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "9ca40666ae5b8619bbe965b77a40cf18", "sha256": "f2babc83c26bc2650207716cc98f06e8d2642c3c592fd320c7061ea5a7b9b54c" }, "downloads": -1, "filename": "dimsim-0.2.2-py3-none-any.whl", "has_sig": false, "md5_digest": "9ca40666ae5b8619bbe965b77a40cf18", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 2384287, "upload_time": "2019-04-25T00:00:46", "url": "https://files.pythonhosted.org/packages/b7/04/f2245b7260ea8fa61400cd25149a90c3a55d691301b0826d408222b5bf3a/dimsim-0.2.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "a40d82191a0f87979b64759b54bba857", "sha256": "1b1b1df6efb989ccbb23f3d0a3788a0cd59de563c511e58fe057ddd9fc49d3be" }, "downloads": -1, "filename": "dimsim-0.2.2.tar.gz", "has_sig": false, "md5_digest": "a40d82191a0f87979b64759b54bba857", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 2381705, "upload_time": "2019-04-25T00:00:49", "url": "https://files.pythonhosted.org/packages/25/ca/9f0c5fa7eea40ef1f509d872a8e8a7c6b0144d3593913abad834db2415a6/dimsim-0.2.2.tar.gz" } ] }