{ "info": { "author": "Yukino Ikegami", "author_email": "yknikgm@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Developers", "Intended Audience :: Information Technology", "License :: OSI Approved :: MIT License", "Natural Language :: Japanese", "Operating System :: MacOS", "Operating System :: Microsoft", "Operating System :: POSIX", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7", "Topic :: Text Processing" ], "description": "sengiri\n==========\n|travis| |coveralls| |pyversion| |version| |license|\n\nYet another sentence-level tokenizer for the Japanese text\n\nDEPENDENCY\n==============\n\nMeCab\n\nINSTALLATION\n==============\n\n::\n\n $ pip install sengiri\n\n\nUSAGE\n============\n\n.. code:: python\n\n import sengiri\n\n print(sengiri.tokenize('\u3046\u30fc\u3093\ud83e\udd14\ud83e\udd14\ud83e\udd14\u3069\u3046\u3057\u3088\u3046'))\n #=>['\u3046\u30fc\u3093\ud83e\udd14\ud83e\udd14\ud83e\udd14', '\u3069\u3046\u3057\u3088\u3046']\n print(sengiri.tokenize('\u30e2\u30fc\u5a18\u3002\u306e\u30b3\u30f3\u30b5\u30fc\u30c8\u306b\u884c\u3063\u305f\u3002'))\n #=>['\u30e2\u30fc\u5a18\u3002\u306e\u30b3\u30f3\u30b5\u30fc\u30c8\u306b\u884c\u3063\u305f\u3002']\n print(sengiri.tokenize('\u3042\u308a\u304c\u3068\u3046\uff3e\uff3e \u52a9\u304b\u308a\u307e\u3059\u3002'))\n #=>['\u3042\u308a\u304c\u3068\u3046\uff3e\uff3e', '\u52a9\u304b\u308a\u307e\u3059\u3002']\n print(sengiri.tokenize('\u9854\u6587\u5b57\u30c6\u30b9\u30c8(*\u00b4\u03c9\uff40*)\u3046\u307e\u304f\u3044\u304f\u304b\u306a\uff1f'))\n #=>['\u9854\u6587\u5b57\u30c6\u30b9\u30c8(*\u00b4\u03c9\uff40*)\u3046\u307e\u304f\u3044\u304f\u304b\u306a\uff1f']\n # I recommend using the NEologd dictionary.\n print(sengiri.tokenize('\u9854\u6587\u5b57\u30c6\u30b9\u30c8(*\u00b4\u03c9\uff40*)\u3046\u307e\u304f\u3044\u304f\u304b\u306a\uff1f', mecab_args='-d 
/usr/local/lib/mecab/dic/mecab-ipadic-neologd'))\n #=>['\u9854\u6587\u5b57\u30c6\u30b9\u30c8(*\u00b4\u03c9\uff40*)', '\u3046\u307e\u304f\u3044\u304f\u304b\u306a\uff1f']\n print(sengiri.tokenize('\u5b50\u4f9b\u304c\u5927\u5909\u306a\u3053\u3068\u306b\u306a\u3063\u305f\u3002'\n '\uff08\u5f8c\u3067\u805e\u3044\u305f\u306e\u3060\u304c\u3001\u8105\u3055\u308c\u305f\u3089\u3057\u3044\uff09'\n '\uff08\u8105\u8feb\u306f\u3084\u3081\u3066\u307b\u3057\u3044\u3068\u8a00\u3063\u3066\u3044\u308b\u306e\u306b\uff09'))\n #=>['\u5b50\u4f9b\u304c\u5927\u5909\u306a\u3053\u3068\u306b\u306a\u3063\u305f\u3002', '\uff08\u5f8c\u3067\u805e\u3044\u305f\u306e\u3060\u304c\u3001\u8105\u3055\u308c\u305f\u3089\u3057\u3044\uff09', '\uff08\u8105\u8feb\u306f\u3084\u3081\u3066\u307b\u3057\u3044\u3068\u8a00\u3063\u3066\u3044\u308b\u306e\u306b\uff09']\n print(sengiri.tokenize('\u697d\u3057\u304b\u3063\u305fw \u307e\u305f\u904a\u307cwww'))\n #=>['\u697d\u3057\u304b\u3063\u305fw', '\u307e\u305f\u904a\u307cwww']\n print(sengiri.tokenize('http://www.inpaku.go.jp/'))\n #=>['http://www.inpaku.go.jp/']\n\n.. |travis| image:: https://travis-ci.org/ikegami-yukino/sengiri.svg?branch=master\n :target: https://travis-ci.org/ikegami-yukino/sengiri\n :alt: travis-ci.org\n\n.. |coveralls| image:: https://coveralls.io/repos/ikegami-yukino/sengiri/badge.svg?branch=master&service=github\n :target: https://coveralls.io/github/ikegami-yukino/sengiri?branch=master\n :alt: coveralls.io\n\n.. |pyversion| image:: https://img.shields.io/pypi/pyversions/sengiri.svg\n\n.. |version| image:: https://img.shields.io/pypi/v/sengiri.svg\n :target: http://pypi.python.org/pypi/sengiri/\n :alt: latest version\n\n.. 
|license| image:: https://img.shields.io/pypi/l/sengiri.svg\n :target: http://pypi.python.org/pypi/sengiri/\n :alt: license\n\n\nCHANGES\n=======\n\n0.2.1 (2019-10-12)\n------------------\n\n- Also works well with text that includes emoticons and www (a laughing expression)\n- Always treat emoji as delimiters regardless of MeCab's POS\n\n0.1.1 (2019-10-05)\n------------------\n\n- First release", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/ikegami-yukino/sengiri", "keywords": "japanese,tokenizer,sentence,sentence-tokenizer", "license": "MIT License", "maintainer": "", "maintainer_email": "", "name": "sengiri", "package_url": "https://pypi.org/project/sengiri/", "platform": "POSIX", "project_url": "https://pypi.org/project/sengiri/", "project_urls": { "Homepage": "https://github.com/ikegami-yukino/sengiri" }, "release_url": "https://pypi.org/project/sengiri/0.2.1/", "requires_dist": null, "requires_python": "", "summary": "Yet another sentence-level tokenizer for the Japanese text", "version": "0.2.1" }, "last_serial": 5961408, "releases": { "0.1.1": [ { "comment_text": "", "digests": { "md5": "d1a1ffa5fe5699ad13a04ba123b837ba", "sha256": "c5c1557021b6a05d52c5495538187c1f8a2912a02d013f4f8c8432d983a15d96" }, "downloads": -1, "filename": "sengiri-0.1.1.tar.gz", "has_sig": false, "md5_digest": "d1a1ffa5fe5699ad13a04ba123b837ba", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3633, "upload_time": "2019-10-04T20:41:36", "url": "https://files.pythonhosted.org/packages/c5/24/86a7164f5e891886e191019f32ed948956342bc28abf7c0e4393e78a18e9/sengiri-0.1.1.tar.gz" } ], "0.2": [ { "comment_text": "", "digests": { "md5": "4e8723dcb884f2a6d1eb4f2d734059b9", "sha256": "b0e8b286d97d7acb0b98136b806bb46804439a06e7864f2403c0708fdc1045d3" }, "downloads": -1, "filename": "sengiri-0.2.tar.gz", "has_sig": false, "md5_digest": 
"4e8723dcb884f2a6d1eb4f2d734059b9", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4197, "upload_time": "2019-10-10T18:08:11", "url": "https://files.pythonhosted.org/packages/be/be/aba6739c9c2e3a339095e94b793ebaa4e6d4481e2be208d46c59db4e7497/sengiri-0.2.tar.gz" } ], "0.2.1": [ { "comment_text": "", "digests": { "md5": "9b51862a856394c52996c23d854e8655", "sha256": "7d0ae55513dc7b7ed7a1cf42dbd7983c5bc900f0d9c9e39b3a8dfd65d178eeaa" }, "downloads": -1, "filename": "sengiri-0.2.1.tar.gz", "has_sig": false, "md5_digest": "9b51862a856394c52996c23d854e8655", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4322, "upload_time": "2019-10-11T17:44:11", "url": "https://files.pythonhosted.org/packages/66/51/584352e8c50c2d73696c40b3be8a78b6eab7ee3db7b021c1c7e247ceb399/sengiri-0.2.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "9b51862a856394c52996c23d854e8655", "sha256": "7d0ae55513dc7b7ed7a1cf42dbd7983c5bc900f0d9c9e39b3a8dfd65d178eeaa" }, "downloads": -1, "filename": "sengiri-0.2.1.tar.gz", "has_sig": false, "md5_digest": "9b51862a856394c52996c23d854e8655", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4322, "upload_time": "2019-10-11T17:44:11", "url": "https://files.pythonhosted.org/packages/66/51/584352e8c50c2d73696c40b3be8a78b6eab7ee3db7b021c1c7e247ceb399/sengiri-0.2.1.tar.gz" } ] }