{ "info": { "author": "henryyang42", "author_email": "henryyang42@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Natural Language :: Chinese (Traditional)", "Operating System :: OS Independent", "Programming Language :: Python", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.3", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Topic :: Text Processing", "Topic :: Text Processing :: Indexing", "Topic :: Text Processing :: Linguistic" ], "description": "Ckip Segmenter\n==============\nA Python client for the Chinese Word Segmentation System (see `ckipsvr.iis.sinica.edu.tw `_) provided by Academia Sinica Chinese Knowledge and Information Processing (CKIP) Group. The core was adapted from `amigcamel/PyCCS `_.\n\nInstallation\n============\n\nSimply run the following command:\n\n.. code-block:: sh\n\n pip install ckip-segmenter\n\nIf ``pip`` is not available, you can also `download it manually from PyPI `_.\n\n*Note: Currently only Python 3+ is supported.*\n\nUsage\n=====\n\nSummon a ``CkipSegmenter``\n-----------------------------------------------------------\nIn[1]:\n\n.. 
code-block:: python\n\n from ckip import CkipSegmenter\n segmenter = CkipSegmenter()\n\n text = '\u8a5e\u662f\u6700\u5c0f\u6709\u610f\u7fa9\u4e14\u53ef\u4ee5\u81ea\u7531\u4f7f\u7528\u7684\u8a9e\u8a00\u55ae\u4f4d\u3002\u4efb\u4f55\u8a9e\u8a00\u8655\u7406\u7684\u7cfb\u7d71\u90fd\u5fc5\u9808\u5148\u80fd\u5206\u8fa8\u6587\u672c\u4e2d\u7684\u8a5e\u624d\u80fd\u9032\u884c\u9032\u4e00\u6b65\u7684\u8655\u7406'\n corpus = [\n '\u8a5e\u662f\u6700\u5c0f\u6709\u610f\u7fa9\u4e14\u53ef\u4ee5\u81ea\u7531\u4f7f\u7528\u7684\u8a9e\u8a00\u55ae\u4f4d',\n '\u4efb\u4f55\u8a9e\u8a00\u8655\u7406\u7684\u7cfb\u7d71\u90fd\u5fc5\u9808\u5148\u80fd\u5206\u8fa8\u6587\u672c\u4e2d\u7684\u8a5e\u624d\u80fd\u9032\u884c\u9032\u4e00\u6b65\u7684\u8655\u7406',\n '\u4f8b\u5982\u6a5f\u5668\u7ffb\u8b6f\u3001\u8a9e\u8a00\u5206\u6790\u3001\u8a9e\u8a00\u4e86\u89e3\u3001\u8cc7\u8a0a\u62bd\u53d6',\n '\u56e0\u6b64\u4e2d\u6587\u81ea\u52d5\u5206\u8a5e\u7684\u5de5\u4f5c\u6210\u4e86\u8a9e\u8a00\u8655\u7406\u4e0d\u53ef\u6216\u7f3a\u7684\u6280\u8853',\n '\u57fa\u672c\u4e0a\u81ea\u52d5\u5206\u8a5e\u591a\u5229\u7528\u8a5e\u5178\u4e2d\u6536\u9304\u7684\u8a5e\u548c\u6587\u672c\u505a\u6bd4\u5c0d',\n '\u627e\u51fa\u53ef\u80fd\u5305\u542b\u7684\u8a5e\uff0c\u7531\u65bc\u5b58\u5728\u6b67\u7fa9\u7684\u5207\u5206\u7d50\u679c',\n '\u56e0\u6b64\u591a\u6578\u7684\u4e2d\u6587\u5206\u8a5e\u7a0b\u5f0f\u591a\u8a0e\u8ad6\u5982\u4f55\u89e3\u6c7a\u5206\u8a5e\u6b67\u7fa9\u7684\u554f\u984c',\n '\u800c\u8f03\u5c11\u8a0e\u8ad6\u5982\u4f55\u8655\u7406\u8a5e\u5178\u4e2d\u672a\u6536\u9304\u7684\u8a5e\u51fa\u73fe\u7684\u554f\u984c\uff08\u65b0\u8a5e\u5982\u4f55\u8fa8\u8a8d\uff09',\n ]\n\nThe result object contains ``res``, ``tok`` and ``pos``\n-------------------------------------------------------\nIn[2]:\n\n.. 
code-block:: python\n\n result = segmenter.seg(text)\n # result.res is a list of (token, POS tag) tuples.\n print('result.res: {}\\n'.format(result.res))\n\n # result.tok and result.pos contain only the tokens and only the POS tags, respectively.\n print('result.tok: {}\\n'.format(result.tok))\n print('result.pos: {}\\n'.format(result.pos))\n\n\nOut[2]:\n\n.. parsed-literal::\n\n result.res: [('\u8a5e', 'Na'), ('\u662f', 'SHI'), ('\u6700', 'Dfa'), ('\u5c0f', 'VH'), ('\u6709', 'V_2'), ('\u610f\u7fa9', 'Na'), ('\u4e14', 'Cbb'), ('\u53ef\u4ee5', 'D'), ('\u81ea\u7531', 'VH'), ('\u4f7f\u7528', 'VC'), ('\u7684', 'DE'), ('\u8a9e\u8a00', 'Na'), ('\u55ae\u4f4d', 'Na'), ('\u3002', 'PERIODCATEGORY'), ('\u4efb\u4f55', 'Neqa'), ('\u8a9e\u8a00', 'Na'), ('\u8655\u7406', 'VC'), ('\u7684', 'DE'), ('\u7cfb\u7d71', 'Na'), ('\u90fd', 'D'), ('\u5fc5\u9808', 'D'), ('\u5148\u80fd', 'Nb'), ('\u5206\u8fa8', 'VE'), ('\u6587\u672c', 'Nb'), ('\u4e2d', 'Ng'), ('\u7684', 'DE'), ('\u8a5e', 'Na'), ('\u624d\u80fd', 'Na'), ('\u9032\u884c', 'VC'), ('\u9032\u4e00\u6b65', 'D'), ('\u7684', 'DE'), ('\u8655\u7406', 'VC')]\n\n result.tok: ['\u8a5e', '\u662f', '\u6700', '\u5c0f', '\u6709', '\u610f\u7fa9', '\u4e14', '\u53ef\u4ee5', '\u81ea\u7531', '\u4f7f\u7528', '\u7684', '\u8a9e\u8a00', '\u55ae\u4f4d', '\u3002', '\u4efb\u4f55', '\u8a9e\u8a00', '\u8655\u7406', '\u7684', '\u7cfb\u7d71', '\u90fd', '\u5fc5\u9808', '\u5148\u80fd', '\u5206\u8fa8', '\u6587\u672c', '\u4e2d', '\u7684', '\u8a5e', '\u624d\u80fd', '\u9032\u884c', '\u9032\u4e00\u6b65', '\u7684', '\u8655\u7406']\n\n result.pos: ['Na', 'SHI', 'Dfa', 'VH', 'V_2', 'Na', 'Cbb', 'D', 'VH', 'VC', 'DE', 'Na', 'Na', 'PERIODCATEGORY', 'Neqa', 'Na', 'VC', 'DE', 'Na', 'D', 'D', 'Nb', 'VE', 'Nb', 'Ng', 'DE', 'Na', 'Na', 'VC', 'D', 'DE', 'VC']\n\n\n\nUsing ``batch_seg`` on a list of texts is slightly faster\n---------------------------------------------------------------\nIn[3]:\n\n.. code-block:: python\n\n segmenter.batch_seg(corpus)\n\nOut[3]:\n\n.. 
parsed-literal::\n\n 8/8 collected.\n\n [,\n ,\n ,\n ,\n ,\n ,\n ,\n ]\n\n\n\n\n", "description_content_type": null, "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/henryyang42/ckip-segmenter", "keywords": "NLP,tokenizing,Chinese word segmentation,part-of-speech tagging", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "ckip-segmenter", "package_url": "https://pypi.org/project/ckip-segmenter/", "platform": "", "project_url": "https://pypi.org/project/ckip-segmenter/", "project_urls": { "Homepage": "https://github.com/henryyang42/ckip-segmenter" }, "release_url": "https://pypi.org/project/ckip-segmenter/1.0.2/", "requires_dist": null, "requires_python": ">=3", "summary": "Ckip Segmenter", "version": "1.0.2" }, "last_serial": 3024719, "releases": { "1.0": [ { "comment_text": "", "digests": { "md5": "c590dbe74f29e6ecfa9ab16324dc3232", "sha256": "2c7a74ceb17c2082d61fa48ad23f6d9e7ef58a65fb9a45ae69389240a8b50845" }, "downloads": -1, "filename": "ckip_segmenter-1.0-py3-none-any.whl", "has_sig": false, "md5_digest": "c590dbe74f29e6ecfa9ab16324dc3232", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3", "size": 4431, "upload_time": "2017-07-15T07:31:05", "url": "https://files.pythonhosted.org/packages/84/e0/e43a294f742000024955d9f68ac72543f03b688fee225a6a0214aa18e51e/ckip_segmenter-1.0-py3-none-any.whl" } ], "1.0.1": [ { "comment_text": "", "digests": { "md5": "f691157b9d10dab2307581fe8d10a6d3", "sha256": "ec1ce1888af2dde9cfb2487c85031a850d1ea8d8030c13c36ec4f26b54e2cf71" }, "downloads": -1, "filename": "ckip_segmenter-1.0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "f691157b9d10dab2307581fe8d10a6d3", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3", "size": 7830, "upload_time": "2017-07-15T08:18:21", "url": 
"https://files.pythonhosted.org/packages/c8/1a/81697645bc0c72c63a4795aa21b3d7be172e0cbb3e2937500670b04c8b20/ckip_segmenter-1.0.1-py3-none-any.whl" } ], "1.0.2": [ { "comment_text": "", "digests": { "md5": "2e8ac3903fc7ed6bacf46fbbd77124ca", "sha256": "932f2e297e85e9ddaa5a4510ed574a2da5db20faf58bae2f652169171c13f468" }, "downloads": -1, "filename": "ckip_segmenter-1.0.2-py3-none-any.whl", "has_sig": false, "md5_digest": "2e8ac3903fc7ed6bacf46fbbd77124ca", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3", "size": 7848, "upload_time": "2017-07-15T08:32:50", "url": "https://files.pythonhosted.org/packages/f6/b6/f32a83f415581ae6b901bdcef0d9560e6b20f8f4fceb64bec7e1700f370c/ckip_segmenter-1.0.2-py3-none-any.whl" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "2e8ac3903fc7ed6bacf46fbbd77124ca", "sha256": "932f2e297e85e9ddaa5a4510ed574a2da5db20faf58bae2f652169171c13f468" }, "downloads": -1, "filename": "ckip_segmenter-1.0.2-py3-none-any.whl", "has_sig": false, "md5_digest": "2e8ac3903fc7ed6bacf46fbbd77124ca", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3", "size": 7848, "upload_time": "2017-07-15T08:32:50", "url": "https://files.pythonhosted.org/packages/f6/b6/f32a83f415581ae6b901bdcef0d9560e6b20f8f4fceb64bec7e1700f370c/ckip_segmenter-1.0.2-py3-none-any.whl" } ] }