{ "info": { "author": "Sang-Kil Park", "author_email": "skpark1224@hyundai.com", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: BSD License", "Operating System :: OS Independent", "Programming Language :: Python :: 3" ], "description": "# Korean Sentence Splitter\n\n\n\n- [Korean Sentence Splitter](#korean-sentence-splitter)\n- [Installation](#installation)\n  - [Usage](#usage)\n  - [Demo](#demo)\n  - [Requirements](#requirements)\n- [Build from scratch](#build-from-scratch)\n  - [C++](#c)\n  - [Python](#python)\n  - [Uninstall](#uninstall)\n  - [PyPI](#pypi)\n\n\n\nSplit Korean text into sentences using a heuristic algorithm. This algorithm was greatly inspired by EungGyun Kim, who is the Kakao NLP Leader and one of the most brilliant NLP engineers in Korea.\n\nI started this project after reading [this article](http://semantics.kr/%ED%95%9C%EA%B5%AD%EC%96%B4-%ED%98%95%ED%83%9C%EC%86%8C-%EB%B6%84%EC%84%9D%EA%B8%B0-%EB%B3%84-%EB%AC%B8%EC%9E%A5-%EB%B6%84%EB%A6%AC-%EC%84%B1%EB%8A%A5%EB%B9%84%EA%B5%90/), and we achieved the best result on the test set. 
It is also robust to both spoken and written expressions.\n\n# Installation\nThe package is listed in the Python Package Index (PyPI), so you can install it with pip:\n\n```bash\n$ pip install kss\n```\n\n## Usage\n```python\nimport kss\n\n# Sample input: several Korean sentences with almost no sentence-final punctuation.\ns = \"\ud68c\uc0ac \ub3d9\ub8cc \ubd84\ub4e4\uacfc \ub2e4\ub140\uc654\ub294\ub370 \ubd84\uc704\uae30\ub3c4 \uc88b\uace0 \uc74c\uc2dd\ub3c4 \ub9db\uc788\uc5c8\uc5b4\uc694 \ub2e4\ub9cc, \uac15\ub0a8 \ud1a0\ub07c\uc815\uc774 \uac15\ub0a8 \uc251\uc251\ubc84\uac70 \uace8\ubaa9\uae38\ub85c \ucb49 \uc62c\ub77c\uac00\uc57c \ud558\ub294\ub370 \ub2e4\ub4e4 \uc251\uc251\ubc84\uac70\uc758 \uc720\ud639\uc5d0 \ub118\uc5b4\uac08 \ubed4 \ud588\ub2f5\ub2c8\ub2e4 \uac15\ub0a8\uc5ed \ub9db\uc9d1 \ud1a0\ub07c\uc815\uc758 \uc678\ubd80 \ubaa8\uc2b5.\"\n# Print each detected sentence on its own line.\nfor sent in kss.split_sentences(s):\n    print(sent)\n```\n\nThe result is shown below:\n```\n\ud68c\uc0ac \ub3d9\ub8cc \ubd84\ub4e4\uacfc \ub2e4\ub140\uc654\ub294\ub370 \ubd84\uc704\uae30\ub3c4 \uc88b\uace0 \uc74c\uc2dd\ub3c4 \ub9db\uc788\uc5c8\uc5b4\uc694\n\ub2e4\ub9cc, \uac15\ub0a8 \ud1a0\ub07c\uc815\uc774 \uac15\ub0a8 \uc251\uc251\ubc84\uac70 \uace8\ubaa9\uae38\ub85c \ucb49 \uc62c\ub77c\uac00\uc57c \ud558\ub294\ub370 \ub2e4\ub4e4 \uc251\uc251\ubc84\uac70\uc758 \uc720\ud639\uc5d0 \ub118\uc5b4\uac08 \ubed4 \ud588\ub2f5\ub2c8\ub2e4\n\uac15\ub0a8\uc5ed \ub9db\uc9d1 \ud1a0\ub07c\uc815\uc758 \uc678\ubd80 \ubaa8\uc2b5.\n```\n\n## Demo\n\n\n## Requirements\n- C++11\n  - GCC or Clang with C++11 support.\n- Python 3\n\nThe provided Google Test binary was built on macOS.\n\n# Build from scratch\n## C++\n```\n$ mkdir bld\n$ cd bld\n$ cmake ..\n$ make\n$ ./sentsplit\n```\n\nNOTICE: The provided Google Test binary was built on macOS only. 
So you cannot build the test binary on Linux.\n\n```cpp\n#include <iostream>\n#include <string>\n#include \"sentence_splitter.h\"\n\nint main() {\n    // Sample input: several Korean sentences with almost no sentence-final punctuation.\n    std::string s = \"\ud68c\uc0ac \ub3d9\ub8cc \ubd84\ub4e4\uacfc \ub2e4\ub140\uc654\ub294\ub370 \ubd84\uc704\uae30\ub3c4 \uc88b\uace0 \uc74c\uc2dd\ub3c4 \ub9db\uc788\uc5c8\uc5b4\uc694 \ub2e4\ub9cc, \uac15\ub0a8 \ud1a0\ub07c\uc815\uc774 \uac15\ub0a8 \uc251\uc251\ubc84\uac70 \uace8\ubaa9\uae38\ub85c \ucb49 \uc62c\ub77c\uac00\uc57c \ud558\ub294\ub370 \ub2e4\ub4e4 \uc251\uc251\ubc84\uac70\uc758 \uc720\ud639\uc5d0 \ub118\uc5b4\uac08 \ubed4 \ud588\ub2f5\ub2c8\ub2e4 \uac15\ub0a8\uc5ed \ub9db\uc9d1 \ud1a0\ub07c\uc815\uc758 \uc678\ubd80 \ubaa8\uc2b5.\";\n    // Print each detected sentence on its own line.\n    for (auto sent : splitSentences(s)) {\n        std::cout << sent << std::endl;\n    }\n\n    return 0;\n}\n```\n\nThe result is shown below:\n```\n\ud68c\uc0ac \ub3d9\ub8cc \ubd84\ub4e4\uacfc \ub2e4\ub140\uc654\ub294\ub370 \ubd84\uc704\uae30\ub3c4 \uc88b\uace0 \uc74c\uc2dd\ub3c4 \ub9db\uc788\uc5c8\uc5b4\uc694\n\ub2e4\ub9cc, \uac15\ub0a8 \ud1a0\ub07c\uc815\uc774 \uac15\ub0a8 \uc251\uc251\ubc84\uac70 \uace8\ubaa9\uae38\ub85c \ucb49 \uc62c\ub77c\uac00\uc57c \ud558\ub294\ub370 \ub2e4\ub4e4 \uc251\uc251\ubc84\uac70\uc758 \uc720\ud639\uc5d0 \ub118\uc5b4\uac08 \ubed4 \ud588\ub2f5\ub2c8\ub2e4\n\uac15\ub0a8\uc5ed \ub9db\uc9d1 \ud1a0\ub07c\uc815\uc758 \uc678\ubd80 \ubaa8\uc2b5.\n```\n\n## Python\nThe Python wrapper is implemented using Cython. 
You can build and install it with either command below.\n```bash\n$ python setup.py install --record files.txt\n# or\n$ pip install .\n```\n\n### Uninstall\n```bash\n$ xargs rm -rf < files.txt\n# or\n$ pip uninstall kss\n```\n\n## PyPI\n```bash\n$ python setup.py sdist\n$ twine upload --repository-url https://test.pypi.org/legacy/ dist/*\n```", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/likejazz/korean-sentence-splitter", "keywords": "", "license": "BSD 3-Clause \"New\" or \"Revised\" License", "maintainer": "", "maintainer_email": "", "name": "kss", "package_url": "https://pypi.org/project/kss/", "platform": "any", "project_url": "https://pypi.org/project/kss/", "project_urls": { "Homepage": "https://github.com/likejazz/korean-sentence-splitter" }, "release_url": "https://pypi.org/project/kss/1.2.5/", "requires_dist": null, "requires_python": "", "summary": "Split Korean text into sentences using heuristic algorithm.", "version": "1.2.5" }, "last_serial": 5978737, "releases": { "1.2.2": [ { "comment_text": "", "digests": { "md5": "2528c5a819601e8d6b23ba3e13f2d08a", "sha256": "a3663833538b67672ddeada8dbbaadc1bd2599422467b1534cb140dfa188f8c7" }, "downloads": -1, "filename": "kss-1.2.2.tar.gz", "has_sig": false, "md5_digest": "2528c5a819601e8d6b23ba3e13f2d08a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5805, "upload_time": "2019-08-15T15:41:05", "url": "https://files.pythonhosted.org/packages/04/a4/8a4fb0af0dfb32d478c42bf7ace474db434d98f5d4769dcbc5f9ddac9ad5/kss-1.2.2.tar.gz" } ], "1.2.3": [ { "comment_text": "", "digests": { "md5": "6589e2ec1da7c9b745f1ec3c901a65d3", "sha256": "ad5df05197d51b65374d893ec31ace2a6551fefb9c8e0100ef9327d9b3c44832" }, "downloads": -1, "filename": "kss-1.2.3.tar.gz", "has_sig": false, "md5_digest": "6589e2ec1da7c9b745f1ec3c901a65d3", "packagetype": "sdist", 
"python_version": "source", "requires_python": null, "size": 5794, "upload_time": "2019-08-15T17:25:52", "url": "https://files.pythonhosted.org/packages/35/f6/3dc932c8e4b0bd8f821e3f9027218809d1a5b1b31c826f2b0b68ca0a1d1f/kss-1.2.3.tar.gz" } ], "1.2.4": [ { "comment_text": "", "digests": { "md5": "d6211c89934940a8658c228af76c570d", "sha256": "e592f9d4b1aaa2ef6636f21270f11d43f79d0ea83df342d3b5dd8cd96084940e" }, "downloads": -1, "filename": "kss-1.2.4.tar.gz", "has_sig": false, "md5_digest": "d6211c89934940a8658c228af76c570d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5894, "upload_time": "2019-08-16T16:25:17", "url": "https://files.pythonhosted.org/packages/bd/91/7e7a7896eb67d9aa2ddfd7d58386df8bdd88c87580000f63c6bdabf17df4/kss-1.2.4.tar.gz" } ], "1.2.5": [ { "comment_text": "", "digests": { "md5": "63e281096d373913c2fc3691081ec659", "sha256": "dced4f1832727cc315c6d7ff37006367017a2648bb72ea7b93aed2a913ac81e0" }, "downloads": -1, "filename": "kss-1.2.5.tar.gz", "has_sig": false, "md5_digest": "63e281096d373913c2fc3691081ec659", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5899, "upload_time": "2019-10-15T18:21:05", "url": "https://files.pythonhosted.org/packages/e3/e1/ff733dfcdf26212b4a56fd144a407ee939cbb2f24e71c0bc1abaf808264a/kss-1.2.5.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "63e281096d373913c2fc3691081ec659", "sha256": "dced4f1832727cc315c6d7ff37006367017a2648bb72ea7b93aed2a913ac81e0" }, "downloads": -1, "filename": "kss-1.2.5.tar.gz", "has_sig": false, "md5_digest": "63e281096d373913c2fc3691081ec659", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5899, "upload_time": "2019-10-15T18:21:05", "url": "https://files.pythonhosted.org/packages/e3/e1/ff733dfcdf26212b4a56fd144a407ee939cbb2f24e71c0bc1abaf808264a/kss-1.2.5.tar.gz" } ] }