{ "info": { "author": "yilei.wang", "author_email": "stevewyl@163.com", "bugtrack_url": null, "classifiers": [ "Programming Language :: Java", "Programming Language :: Python", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: Implementation :: CPython", "Programming Language :: Python :: Implementation :: PyPy" ], "description": "# Chunk Segmentor User Guide\n\nRequirements: Python 3.6 (only Python 3 is supported for now)\n\n## Main Features\n\n1. Outputs noun phrases\n2. Supports part-of-speech output; noun phrases are tagged np\n3. Supports outputting noun phrases in qualifier + head word form\n\n>An indivisible noun phrase such as \u201c\u673a\u5668\u5b66\u4e60\u201d (machine learning) has no qualifier + head word form, whereas \u201c\u7ecf\u5178\u673a\u5668\u5b66\u4e60\u7b97\u6cd5\u201d (classic machine learning algorithm) can be split into \u201c\u7ecf\u5178_\u673a\u5668\u5b66\u4e60_\u7b97\u6cd5\u201d\n\n## Step 1 Install the Package\n\nCreating a new Python virtual environment is recommended (optional)\n\n```bash\nconda create --name chunk_seg python=3.6.5\n```\n\n### Install with pip\n\n```bash\npip install git+https://www.github.com/keras-team/keras-contrib.git\npip install chunk-segmentor\n```\n\n### Manual Installation\n\n```bash\ngit clone https://github.com/stevewyl/chunk_segmentor\ncd chunk_segmentor\npip install -r requirements.txt\npython setup.py install\n```\n\n### Optional Installation\n```bash\n# If your machine has a GPU, use it to speed up prediction\npip install tensorflow-gpu==1.9.0\n```\n### Installation Errors\n1. 
ImportError: cannot import name 'normalize_data_format'\n```bash\npip install -U keras\n```\n\n## Step 2 How to Use\n\n* On first import, the model and dictionary data are downloaded automatically\n* Both single-sentence and multi-sentence input are supported; passing the text to the segmentor as a list is recommended\n\n```python\nfrom chunk_segmentor import Chunk_Segmentor\ncutter = Chunk_Segmentor()\ns = '\u8fd9\u662f\u4e00\u4e2a\u80fd\u591f\u8f93\u51fa\u540d\u8bcd\u77ed\u8bed\u7684\u5206\u8bcd\u5668\uff0c\u6b22\u8fce\u8bd5\u7528\uff01'\nres = [item for item in cutter.cut([s] * 10000)] # takes 12s on a 1080 Ti\n\n# Two versions are provided: accurate is the precise version, fast is quicker but with somewhat lower recall; accurate is the default\ncutter = Chunk_Segmentor(mode='accurate')\ncutter = Chunk_Segmentor(mode='fast')\n# Qualifier + head word form, enabled by default\ncutter.cut(s, qualifier=False)\n# Whether to output part-of-speech tags, enabled by default\ncutter.cut(s, pos=False)\n\n# Output format (word list, part-of-speech list, chunk set)\n[\n (\n ['\u8fd9', '\u662f', '\u4e00\u4e2a', '\u80fd\u591f', '\u8f93\u51fa', '\u540d\u8bcd_\u77ed\u8bed', '\u7684', '\u5206\u8bcd\u5668', ',', '\u6b22\u8fce', '\u8bd5\u7528', '!'],\n ['rzv', 'vshi', 'mq', 'v', 'vn', 'np', 'ude1', 'np', 'w', 'v', 'v', 'w'],\n ['\u5206\u8bcd\u5668', '\u540d\u8bcd_\u77ed\u8bed']\n )\n ...\n]\n```\n\n## Step 3 Updates\n\nIf new model or dictionary data is available, you will be asked whether to update\n\n## To-Do Lists\n\n1. Improve the accuracy of qualifiers and noun phrases ---> new model\n2. 
The char model has a GPU out-of-memory problem ---> use a CNN to extract Nchar information instead of embeddings, to reduce the model size\n3. Custom dictionaries, supporting segmentation at different granularities\n4. Multi-process model loading and prediction", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/stevewyl/chunk_segmentor", "keywords": "", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "chunk-segmentor", "package_url": "https://pypi.org/project/chunk-segmentor/", "platform": "", "project_url": "https://pypi.org/project/chunk-segmentor/", "project_urls": { "Homepage": "https://github.com/stevewyl/chunk_segmentor" }, "release_url": "https://pypi.org/project/chunk-segmentor/1.1.0/", "requires_dist": null, "requires_python": ">=3.6", "summary": "Segmentor with Noun Phrases", "version": "1.1.0" }, "last_serial": 4492131, "releases": { "1.0.0": [ { "comment_text": "", "digests": { "md5": "57163758b305b01a3c7e7cbe737338d3", "sha256": "3464ce9116879ae1e78fc85aba64fe06be2575dbeb05b5183a11e14c93c5a7b0" }, "downloads": -1, "filename": "chunk_segmentor-1.0.0.tar.gz", "has_sig": false, "md5_digest": "57163758b305b01a3c7e7cbe737338d3", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 1712897, "upload_time": "2018-08-29T12:24:04", "url": "https://files.pythonhosted.org/packages/64/b6/45e90ba875fffe3b5b4a7456526e220abe5ce0ca4566f7ed7ba1f6176f61/chunk_segmentor-1.0.0.tar.gz" } ], "1.0.1": [ { "comment_text": "", "digests": { "md5": "1b1d1ec061381a8a8aece8867a498e3a", "sha256": "3cfe0cecd19b7c41e884de04a8460592594595d64099987b06b51eb4ce30c549" }, "downloads": -1, "filename": "chunk_segmentor-1.0.1.tar.gz", "has_sig": false, "md5_digest": 
"1b1d1ec061381a8a8aece8867a498e3a", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 1713039, "upload_time": "2018-11-13T09:26:56", "url": "https://files.pythonhosted.org/packages/e4/11/de57602ebec0efa529c82ca7ec3ef1276a15da795fa5348df490178eb579/chunk_segmentor-1.0.1.tar.gz" } ], "1.0.2": [ { "comment_text": "", "digests": { "md5": "de84849e87ef4d4706fdb23c6fc32f32", "sha256": "1d9b12c2b8bdd097430895ca8bd56f2e2ffb4daa085079cd9f9011c8d29e2ad0" }, "downloads": -1, "filename": "chunk_segmentor-1.0.2.tar.gz", "has_sig": false, "md5_digest": "de84849e87ef4d4706fdb23c6fc32f32", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 1713027, "upload_time": "2018-11-13T11:29:09", "url": "https://files.pythonhosted.org/packages/3d/5f/2975e7760dcb83bf800e449171bbf6d78badb4d5f87da732b112b179fbcc/chunk_segmentor-1.0.2.tar.gz" } ], "1.1.0": [ { "comment_text": "", "digests": { "md5": "785e88b88169438c9326493855e297ee", "sha256": "0afe6acd331e441648375d65c83bf517ea01fc938e1a618789a0f61cf3539871" }, "downloads": -1, "filename": "chunk_segmentor-1.1.0.tar.gz", "has_sig": false, "md5_digest": "785e88b88169438c9326493855e297ee", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 1713521, "upload_time": "2018-11-16T02:36:07", "url": "https://files.pythonhosted.org/packages/12/fc/89a4a78500ccffc03871819492b0eaf1f112f0fdd460f3dfc11f04d56f6d/chunk_segmentor-1.1.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "785e88b88169438c9326493855e297ee", "sha256": "0afe6acd331e441648375d65c83bf517ea01fc938e1a618789a0f61cf3539871" }, "downloads": -1, "filename": "chunk_segmentor-1.1.0.tar.gz", "has_sig": false, "md5_digest": "785e88b88169438c9326493855e297ee", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 1713521, "upload_time": "2018-11-16T02:36:07", "url": 
"https://files.pythonhosted.org/packages/12/fc/89a4a78500ccffc03871819492b0eaf1f112f0fdd460f3dfc11f04d56f6d/chunk_segmentor-1.1.0.tar.gz" } ] }