{ "info": { "author": "znwang25", "author_email": "znwang25@gmail.com", "bugtrack_url": null, "classifiers": [ "Intended Audience :: Developers", "License :: OSI Approved :: BSD License", "Natural Language :: Chinese (Simplified)", "Natural Language :: Chinese (Traditional)", "Operating System :: OS Independent", "Programming Language :: Python :: 3", "Topic :: Text Processing", "Topic :: Text Processing :: Linguistic" ], "description": "fuzzychinese\n=====\n\u5f62\u8fd1\u8bcd\u4e2d\u6587\u6a21\u7cca\u5339\u914d\n\nA simple tool to fuzzy match Chinese words, particularly useful for matching proper names and addresses.\n\n\u4e00\u4e2a\u53ef\u4ee5\u6a21\u7cca\u5339\u914d\u5f62\u8fd1\u5b57\u8bcd\u7684\u5c0f\u5de5\u5177\u3002\u5bf9\u4e8e\u4e13\u6709\u540d\u8bcd\uff0c\u5730\u5740\u7684\u5339\u914d\u5c24\u5176\u6709\u7528\u3002\n\n## \u5b89\u88c5\u8bf4\u660e\n```\npip install fuzzychinese\n```\n\n## \u4f7f\u7528\u8bf4\u660e\n\u9996\u5148\u4f7f\u7528\u60f3\u8981\u5339\u914d\u7684\u5b57\u5178\u5bf9\u6a21\u578b\u8fdb\u884c\u8bad\u7ec3\u3002\n\n\u7136\u540e\u7528`FuzzyChineseMatch.transform(raw_words, n)` \u6765\u5feb\u901f\u67e5\u627e\u4e0e`raw_words`\u7684\u8bcd\u6700\u76f8\u8fd1\u7684\u524dn\u4e2a\u8bcd\u3002\n\n\u8bad\u7ec3\u6a21\u578b\u65f6\u6709\u4e09\u79cd\u5206\u6790\u65b9\u5f0f\u53ef\u4ee5\u9009\u62e9\uff0c\u7b14\u5212\u5206\u6790(`stroke`)\uff0c\u90e8\u9996\u5206\u6790(`radical`)\uff0c\u548c\u5355\u5b57\u5206\u6790(`char`)\u3002\u4e5f\u53ef\u4ee5\u901a\u8fc7\u8c03\u6574`ngram_range`\u7684\u503c\u6765\u63d0\u9ad8\u6a21\u578b\u6027\u80fd\u3002\n\n\u5339\u914d\u5b8c\u6210\u540e\u8fd4\u56de\u76f8\u4f3c\u5ea6\u5206\u6570\uff0c\u5339\u914d\u7684\u76f8\u8fd1\u8bcd\u8bed\u53ca\u5176\u539f\u6709\u7d22\u5f15\u53f7\u3002\n\n```python\nimport pandas as pd\nfrom fuzzychinese import FuzzyChineseMatch\n\n
test_dict = pd.Series(['\u957f\u767d\u671d\u9c9c\u65cf\u81ea\u6cbb\u53bf','\u957f\u9633\u571f\u5bb6\u65cf\u81ea\u6cbb\u53bf','\u57ce\u6b65\u82d7\u65cf\u81ea\u6cbb\u53bf','\u8fbe\u5c14\u7f55\u8302\u660e\u5b89\u8054\u5408\u65d7','\u6c68\u7f57\u5e02'])\nraw_word = pd.Series(['\u8fbe\u8302\u8054\u5408\u65d7','\u957f\u9633\u53bf','\u6c69\u7f57\u5e02'])\nassert '\u6c69\u7f57\u5e02' != '\u6c68\u7f57\u5e02'  # They are not the same!\n\nfcm = FuzzyChineseMatch(ngram_range=(3, 3), analyzer='stroke')\nfcm.fit(test_dict)\ntop2_similar = fcm.transform(raw_word, n=2)\nres = pd.concat([\n    raw_word,\n    pd.DataFrame(top2_similar, columns=['top1', 'top2']),\n    pd.DataFrame(\n        fcm.get_similarity_score(),\n        columns=['top1_score', 'top2_score']),\n    pd.DataFrame(\n        fcm.get_index(),\n        columns=['top1_index', 'top2_index'])],\n    axis=1)\nprint(res)\n```\n\n| | top1 | top2 | top1_score | top2_score | top1_index | top2_index |\n| ---------- | ------------------ | ---------------- | ---------- | ---------- | ---------- | ---------- |\n| \u8fbe\u8302\u8054\u5408\u65d7 | \u8fbe\u5c14\u7f55\u8302\u660e\u5b89\u8054\u5408\u65d7 | \u957f\u767d\u671d\u9c9c\u65cf\u81ea\u6cbb\u53bf | 0.824751 | 0.287237 | 3 | 0 |\n| \u957f\u9633\u53bf | \u957f\u9633\u571f\u5bb6\u65cf\u81ea\u6cbb\u53bf | \u957f\u767d\u671d\u9c9c\u65cf\u81ea\u6cbb\u53bf | 0.610285 | 0.475000 | 1 | 0 |\n| \u6c69\u7f57\u5e02 | \u6c68\u7f57\u5e02 | \u957f\u767d\u671d\u9c9c\u65cf\u81ea\u6cbb\u53bf | 1.000000 | 0.152093 | 4 | 0 |\n\n## \u5176\u4ed6\u529f\u80fd\n+ \u76f4\u63a5\u4f7f\u7528`Stroke`, `Radical`\u8fdb\u884c\u6c49\u5b57\u5206\u89e3\u3002\n  ```python\n  from fuzzychinese import Stroke, Radical\n\n  stroke = Stroke()\n  radical = Radical()\n  print(\"\u50cf\", stroke.get_stroke(\"\u50cf\"))\n  print(\"\u50cf\", radical.get_radical(\"\u50cf\"))\n  ```\n  ```\n  \u50cf \u31d2\u3021\u31d2\u31c7\u3021\u31d5\u4e00\u31d2\u31c1\u31d2\u31d2\u31d2\u31cf\n  \u50cf \u4eba\u8c61\n  ```\n+ \u4f7f\u7528`FuzzyChineseMatch.compare_two_columns(X, 
Y)`\u5bf9\u6bcf\u4e00\u884c\u7684\u4e24\u4e2a\u8bcd\u8fdb\u884c\u6bd4\u8f83\uff0c\u83b7\u5f97\u76f8\u4f3c\u5ea6\u5206\u6570\u3002\n\n+ \u8be6\u60c5\u8bf7\u53c2\u89c1[\u8bf4\u660e\u6587\u6863](http://znwang.me/fuzzychinese/).\n\n## \u81f4\u8c22\n\n\u62c6\u5b57\u6570\u636e\u6765\u81ea\u4e8e [\u6f22\u8a9e\u62c6\u5b57\u5b57\u5178](https://github.com/kfcd/chaizi) by [\u958b\u653e\u8a5e\u5178\u7db2](http://kaifangcidian.com/)\u3002\n\n## Installation\n```\npip install fuzzychinese\n```\n\n## Quickstart\n\nFirst, train a model on the target list of words you want to match against.\n\nThen use `FuzzyChineseMatch.transform(raw_words, n)` to find the top `n` most similar target words for each word in `raw_words`.\n\nThere are three analyzers to choose from when training a model: `stroke`, `radical`, and `char`. You can also adjust `ngram_range` to fine-tune the model.\n\nAfter matching, the similarity scores, the matched words, and their original indices are returned.\n\n```python\nimport pandas as pd\nfrom fuzzychinese import FuzzyChineseMatch\n\ntest_dict = pd.Series(['\u957f\u767d\u671d\u9c9c\u65cf\u81ea\u6cbb\u53bf','\u957f\u9633\u571f\u5bb6\u65cf\u81ea\u6cbb\u53bf','\u57ce\u6b65\u82d7\u65cf\u81ea\u6cbb\u53bf','\u8fbe\u5c14\u7f55\u8302\u660e\u5b89\u8054\u5408\u65d7','\u6c68\u7f57\u5e02'])\nraw_word = pd.Series(['\u8fbe\u8302\u8054\u5408\u65d7','\u957f\u9633\u53bf','\u6c69\u7f57\u5e02'])\nassert '\u6c69\u7f57\u5e02' != '\u6c68\u7f57\u5e02'  # They are not the same!\n\nfcm = FuzzyChineseMatch(ngram_range=(3, 3), analyzer='stroke')\nfcm.fit(test_dict)\ntop2_similar = fcm.transform(raw_word, n=2)\nres = pd.concat([\n    raw_word,\n    pd.DataFrame(top2_similar, columns=['top1', 'top2']),\n    pd.DataFrame(\n        fcm.get_similarity_score(),\n        columns=['top1_score', 'top2_score']),\n    pd.DataFrame(\n        fcm.get_index(),\n        columns=['top1_index', 'top2_index'])],\n    axis=1)\nprint(res)\n```\n\n| | top1 | top2 | top1_score | top2_score | top1_index | top2_index |\n| ---------- | ------------------ | ---------------- | ---------- | 
---------- | ---------- | ---------- |\n| \u8fbe\u8302\u8054\u5408\u65d7 | \u8fbe\u5c14\u7f55\u8302\u660e\u5b89\u8054\u5408\u65d7 | \u957f\u767d\u671d\u9c9c\u65cf\u81ea\u6cbb\u53bf | 0.824751 | 0.287237 | 3 | 0 |\n| \u957f\u9633\u53bf | \u957f\u9633\u571f\u5bb6\u65cf\u81ea\u6cbb\u53bf | \u957f\u767d\u671d\u9c9c\u65cf\u81ea\u6cbb\u53bf | 0.610285 | 0.475000 | 1 | 0 |\n| \u6c69\u7f57\u5e02 | \u6c68\u7f57\u5e02 | \u957f\u767d\u671d\u9c9c\u65cf\u81ea\u6cbb\u53bf | 1.000000 | 0.152093 | 4 | 0 |\n\n## Other uses\n+ Use `Stroke` and `Radical` directly to decompose Chinese characters into strokes or radicals.\n  ```python\n  from fuzzychinese import Stroke, Radical\n\n  stroke = Stroke()\n  radical = Radical()\n  print(\"\u50cf\", stroke.get_stroke(\"\u50cf\"))\n  print(\"\u50cf\", radical.get_radical(\"\u50cf\"))\n  ```\n  ```\n  \u50cf \u31d2\u3021\u31d2\u31c7\u3021\u31d5\u4e00\u31d2\u31c1\u31d2\u31d2\u31d2\u31cf\n  \u50cf \u4eba\u8c61\n  ```\n+ Use `FuzzyChineseMatch.compare_two_columns(X, Y)` to compare the pair of words in each row and get their similarity score.\n\n+ See the [documentation](http://znwang.me/fuzzychinese/) for details.\n\n## Credits\n\nData for Chinese radicals come from [\u6f22\u8a9e\u62c6\u5b57\u5b57\u5178](https://github.com/kfcd/chaizi) by [\u958b\u653e\u8a5e\u5178\u7db2](http://kaifangcidian.com/).\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/znwang25/fuzzychinese", "keywords": "NLP,fuzzy matching,Chinese word", "license": "", "maintainer": "", "maintainer_email": "", "name": "fuzzychinese", "package_url": "https://pypi.org/project/fuzzychinese/", "platform": "", "project_url": "https://pypi.org/project/fuzzychinese/", "project_urls": { "Homepage": "https://github.com/znwang25/fuzzychinese" }, "release_url": "https://pypi.org/project/fuzzychinese/0.1.5/", "requires_dist": null, "requires_python": "", "summary": "A small package to fuzzy match Chinese words 
\u4e2d\u6587\u6a21\u7cca\u5339\u914d", "version": "0.1.5" }, "last_serial": 5204753, "releases": { "0.1.3": [ { "comment_text": "", "digests": { "md5": "4af2cee28d2df492fc69352d153f961e", "sha256": "df68816811b1b455b4597964e3857d12dce8e91dd8a010cee0fe6573fb9e2204" }, "downloads": -1, "filename": "fuzzychinese-0.1.3-py3-none-any.whl", "has_sig": false, "md5_digest": "4af2cee28d2df492fc69352d153f961e", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 169716, "upload_time": "2019-04-16T04:13:10", "url": "https://files.pythonhosted.org/packages/98/68/88e0e8a0e28291a4cc5a2c7e1f28154a726c6b69bb8d8638df9bd9a780e8/fuzzychinese-0.1.3-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "977867146d7e602a3e8a25c18704fddb", "sha256": "395e011e7f4ac48e362eb74a77656b4a33c1393915dab542b167d60a4296e41e" }, "downloads": -1, "filename": "fuzzychinese-0.1.3.tar.gz", "has_sig": false, "md5_digest": "977867146d7e602a3e8a25c18704fddb", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 155010, "upload_time": "2019-04-16T04:13:12", "url": "https://files.pythonhosted.org/packages/d8/c0/85630c8fe3776334c9cd59cda3839f4001e71b12f974ab2f83bc5cce8e14/fuzzychinese-0.1.3.tar.gz" } ], "0.1.4": [ { "comment_text": "", "digests": { "md5": "8f3842cda13b3500949a96970e52f6db", "sha256": "74c320f91c6cfe62f07a766fdec9526b49ece5afa2f0e38c83743e6d6e33387c" }, "downloads": -1, "filename": "fuzzychinese-0.1.4-py3-none-any.whl", "has_sig": false, "md5_digest": "8f3842cda13b3500949a96970e52f6db", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 170481, "upload_time": "2019-04-18T00:39:35", "url": "https://files.pythonhosted.org/packages/c4/da/200767c139c205e2fd691e242e2a201ea6b54fbbf25c1adecab9fa6e540f/fuzzychinese-0.1.4-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "cb484ceb7809ef47082c700c452e2a19", "sha256": 
"d7fb4a12dacfc280b3d0df3472fb89e13ff3a98e0feff28456a22d5014744039" }, "downloads": -1, "filename": "fuzzychinese-0.1.4.tar.gz", "has_sig": false, "md5_digest": "cb484ceb7809ef47082c700c452e2a19", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 155866, "upload_time": "2019-04-18T00:39:37", "url": "https://files.pythonhosted.org/packages/43/e2/281dda9d5f791eee44fe42eceec8fc27fd76929f9eaa41d9f3e7cc763832/fuzzychinese-0.1.4.tar.gz" } ], "0.1.5": [ { "comment_text": "", "digests": { "md5": "153c83f22e6a615c9fbbf528f7dde97e", "sha256": "59e62dab7eb3585e8d334fe867d97cbde6a7ecae5750c9e4e4041a42db2ff593" }, "downloads": -1, "filename": "fuzzychinese-0.1.5-py3-none-any.whl", "has_sig": false, "md5_digest": "153c83f22e6a615c9fbbf528f7dde97e", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 302458, "upload_time": "2019-04-29T20:08:51", "url": "https://files.pythonhosted.org/packages/48/e7/d5186b34c7919c31f5dd7e7b6a437ac97d0149d882c611009efad270aadf/fuzzychinese-0.1.5-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "dbb684871d940e84326d22653587fc64", "sha256": "a8640118865bda3b0317a3c04342e336a8cd92b085760388e1d9bb1c644cfac7" }, "downloads": -1, "filename": "fuzzychinese-0.1.5.tar.gz", "has_sig": false, "md5_digest": "dbb684871d940e84326d22653587fc64", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 287485, "upload_time": "2019-04-29T20:08:56", "url": "https://files.pythonhosted.org/packages/f7/8c/54db3f0384ce9050adbb320ccd6cc137b34d4940453c99dda629b9816a01/fuzzychinese-0.1.5.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "153c83f22e6a615c9fbbf528f7dde97e", "sha256": "59e62dab7eb3585e8d334fe867d97cbde6a7ecae5750c9e4e4041a42db2ff593" }, "downloads": -1, "filename": "fuzzychinese-0.1.5-py3-none-any.whl", "has_sig": false, "md5_digest": "153c83f22e6a615c9fbbf528f7dde97e", "packagetype": "bdist_wheel", "python_version": "py3", 
"requires_python": null, "size": 302458, "upload_time": "2019-04-29T20:08:51", "url": "https://files.pythonhosted.org/packages/48/e7/d5186b34c7919c31f5dd7e7b6a437ac97d0149d882c611009efad270aadf/fuzzychinese-0.1.5-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "dbb684871d940e84326d22653587fc64", "sha256": "a8640118865bda3b0317a3c04342e336a8cd92b085760388e1d9bb1c644cfac7" }, "downloads": -1, "filename": "fuzzychinese-0.1.5.tar.gz", "has_sig": false, "md5_digest": "dbb684871d940e84326d22653587fc64", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 287485, "upload_time": "2019-04-29T20:08:56", "url": "https://files.pythonhosted.org/packages/f7/8c/54db3f0384ce9050adbb320ccd6cc137b34d4940453c99dda629b9816a01/fuzzychinese-0.1.5.tar.gz" } ] }