{ "info": { "author": "Janson, Yuzheng", "author_email": "gandancing@gmail.com", "bugtrack_url": null, "classifiers": [], "description": "Yaha Word Segmentation\n======================\n\"Yaha\" (\u54d1\u54c8) Chinese word segmentation: faster or more accurate, you decide. Simple customization lets you fit the segmentation module to your needs.\nWith Yaha you can customize your Chinese word segmentation efficiently.\n\n\nInstallation\n============\npip install yaha\n\n\nOnline Demo\n===========\nDeployed on GAE: http://yahademo.appspot.com\n\nDeployed on SAE: http://yaha.sinaapp.com\n\nThe original address is no longer in use: http://yaha.v-find.com/\n\n\nFeatures\n========\n* Basic features:\n * Precise mode: cuts a sentence into the most reasonable words.\n * Full mode: emits every possible word in the sentence, without resolving ambiguity.\n * Search-engine mode: starting from the precise result, splits long words again to improve recall; suited to building search-engine indexes.\n * Alternative paths: generates the several best segmentation paths, from which a more accurate segmentation can be chosen using other information.\n\n* Available plugins:\n * Regular-expression plugin\n * Person-name prefix plugin\n * Place-name suffix plugin\n * Customization: segmentation runs in 4 stages, and you can add your own customization at each stage.\n\n* 
Additional features:\n * New-word learning: given a large block of text, learns the new and existing words it contains. (A C++ implementation of maximum-entropy new-word discovery, written by a friend of mine, has been added; it is roughly 10 times faster than the Python version.)\n * Keyword extraction from large texts.\n * Summary extraction from large texts.\n * User-defined dictionaries (TODO: not yet well implemented).\n\n\n\nAlgorithm\n=========\n* The core segments a sentence by finding its maximum-probability path.\n* While keeping it efficient, the segmentation process is divided into well-defined stages so that users can add their own segmentation methods (regular expressions, name prefixes, and place-name suffixes are provided by default).\n* Users can choose dynamic programming or Dijkstra's algorithm to obtain the best path or the several best paths, and can then use part of speech (as ICTCLAS does) or other information to select the final path.\n* A \"maximum entropy\" algorithm provides new-word discovery on large texts; it is well suited to building custom dictionaries or to data mining in settings such as SNS.\n* 
Compared with the existing jieba segmenter, Yaha drops the memory-hungry Trie structure and the HMM model, whose new-word discovery is not strong (the HMM may be added to this module later as an optional plugin).\n\n\nStages Explained\n================\n* stage 1 runs during sentence splitting: regular expressions split numbers and English words directly into standalone words, which take no part in the later segmentation steps.\n* stage 2 runs before the directed acyclic graph (DAG) is built: it pre-scans the sentence fragment, adds words that might form, and assigns them a probability.\n* stage 3 runs while the DAG is being built: it takes word probabilities from the dictionary, or finds candidate words through matching patterns and assigns them a probability.\n* stage 
4 runs after the maximum-probability path through the directed acyclic graph has been found (implemented as a shortest path in the program): it further processes single characters that cannot form words on their own; or, once the several shortest paths have been found, it selects the final path according to the user's preferences. Users who are interested can implement part-of-speech analysis at this stage.\n\n\nCurrent Status\n==============\nThe first release is being prepared; help with testing would be greatly appreciated. Finally, thanks to the author of jieba: the current dictionary was copied directly from the jieba project.\n", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "http://github.com/jannson/yaha", "keywords": "word,segmenetation,keyword,summerize", "license": "MIT License", "maintainer": null, "maintainer_email": null, "name": "yaha", "package_url": "https://pypi.org/project/yaha/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/yaha/", "project_urls": { "Download": "UNKNOWN", "Homepage": "http://github.com/jannson/yaha" }, "release_url": "https://pypi.org/project/yaha/0.02/", "requires_dist": null, "requires_python": null, "summary": "Chinese Words Segementation Utilities", "version": "0.02" }, "last_serial": 865405, "releases": { "0.01.alpha": [ { "comment_text": "", "digests": { "md5": "63a378e8eaf518baf83c8fc446880400", "sha256": "a6630a1a0fb197c460104f60fb3285fe12885f06314dad93ccb573d850a8a908" }, "downloads": -1, "filename": "yaha-0.01.alpha.tar.gz", "has_sig": false, "md5_digest": 
"63a378e8eaf518baf83c8fc446880400", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 1085348, "upload_time": "2013-08-21T08:33:21", "url": "https://files.pythonhosted.org/packages/a6/60/4553eaca4f59dad98d4063a864b9902f8e1e68698cdd6da4ba69a00d157a/yaha-0.01.alpha.tar.gz" } ], "0.02": [ { "comment_text": "", "digests": { "md5": "741399f93185f8270e0e7b85298bc79b", "sha256": "80285ce760d10cc641b326a19f2a1c66a85dd998e82a8c971b7a81d593333fc4" }, "downloads": -1, "filename": "yaha-0.02.tar.gz", "has_sig": false, "md5_digest": "741399f93185f8270e0e7b85298bc79b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 1105114, "upload_time": "2013-09-09T08:27:23", "url": "https://files.pythonhosted.org/packages/b3/24/a1d629de9090a1b4b460a19fe194cdad893bcf98bb3d78a99b52cf56ca1c/yaha-0.02.tar.gz" } ], "0.03.alpha": [ { "comment_text": "", "digests": { "md5": "68ea469adde765bf06ec4f66c9105f8d", "sha256": "cb1cd4ee4408886847fcea32dcc40ef885147047c36334ce48ac61cb65f30ba3" }, "downloads": -1, "filename": "yaha-0.03.alpha.tar.gz", "has_sig": false, "md5_digest": "68ea469adde765bf06ec4f66c9105f8d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 1107087, "upload_time": "2013-09-14T07:50:31", "url": "https://files.pythonhosted.org/packages/89/2f/1d125f4477ddd52bc05dd27d636649c48a5ebed9e215a846d3f1a6e050c3/yaha-0.03.alpha.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "741399f93185f8270e0e7b85298bc79b", "sha256": "80285ce760d10cc641b326a19f2a1c66a85dd998e82a8c971b7a81d593333fc4" }, "downloads": -1, "filename": "yaha-0.02.tar.gz", "has_sig": false, "md5_digest": "741399f93185f8270e0e7b85298bc79b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 1105114, "upload_time": "2013-09-09T08:27:23", "url": "https://files.pythonhosted.org/packages/b3/24/a1d629de9090a1b4b460a19fe194cdad893bcf98bb3d78a99b52cf56ca1c/yaha-0.02.tar.gz" } ] }