{ "info": { "author": "Yukino Ikegami", "author_email": "yknikgm@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Developers", "Intended Audience :: Information Technology", "License :: OSI Approved :: Apache Software License", "Natural Language :: Chinese (Simplified)", "Natural Language :: Chinese (Traditional)", "Natural Language :: Japanese", "Programming Language :: Python :: 2.6", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3.3", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Topic :: Text Processing :: Linguistic" ], "description": "Rakuten MA Python\n===================\n\n|travis| |coveralls| |pyversion| |version| |landscape| |license|\n\nRakuten MA Python (morphological analyzer) is a Python version of Rakuten MA (word segmentor + PoS Tagger) for Chinese and Japanese.\n\nFor details about Rakuten MA, See https://github.com/rakuten-nlp/rakutenma\n\nSee also http://qiita.com/yukinoi/items/925bc238185aa2fad8a7 (In Japanese)\n\nContributions are welcome!\n\n\nInstallation\n==============\n\n::\n\n pip install rakutenma\n\nExample\n===========\n\n.. code:: python\n\n from rakutenma import RakutenMA\n\n # Initialize a RakutenMA instance with an empty model\n # the default ja feature set is set already\n rma = RakutenMA()\n\n # Let's analyze a sample sentence (from http://tatoeba.org/jpn/sentences/show/103809)\n # With a disastrous result, since the model is empty!\n print(rma.tokenize(\"\u5f7c\u306f\u65b0\u3057\u3044\u4ed5\u4e8b\u3067\u304d\u3063\u3068\u6210\u529f\u3059\u308b\u3060\u308d\u3046\u3002\"))\n\n # Feed the model with ten sample sentences from tatoeba.com\n # \"tatoeba.json\" is available at https://github.com/rakuten-nlp/rakutenma\n import json\n tatoeba = json.load(open(\"tatoeba.json\"))\n for i in tatoeba:\n rma.train_one(i)\n\n # Now what does the result look like?\n print(rma.tokenize(\"\u5f7c\u306f\u65b0\u3057\u3044\u4ed5\u4e8b\u3067\u304d\u3063\u3068\u6210\u529f\u3059\u308b\u3060\u308d\u3046\u3002\"))\n\n # Initialize a RakutenMA instance with a pre-trained model\n rma = RakutenMA(phi=1024, c=0.007812) # Specify hyperparameter for SCW (for demonstration purpose)\n rma.load(\"model_ja.json\")\n\n # Set the feature hash function (15bit)\n rma.hash_func = rma.create_hash_func(15)\n\n # Tokenize one sample sentence\n print(rma.tokenize(\"\u3046\u3089\u306b\u308f\u306b\u306f\u306b\u308f\u306b\u308f\u3068\u308a\u304c\u3044\u308b\"));\n\n # Re-train the model feeding the right answer (pairs of [token, PoS tag])\n res = rma.train_one(\n [[\"\u3046\u3089\u306b\u308f\",\"N-nc\"],\n [\"\u306b\",\"P-k\"],\n [\"\u306f\",\"P-rj\"],\n [\"\u306b\u308f\",\"N-n\"],\n [\"\u306b\u308f\u3068\u308a\",\"N-nc\"],\n [\"\u304c\",\"P-k\"],\n [\"\u3044\u308b\",\"V-c\"]])\n # The result of train_one contains:\n # sys: the system output (using the current model)\n # ans: answer fed by the user\n # update: whether the model was updated\n print(res)\n\n # Now what does the result look like?\n print(rma.tokenize(\"\u3046\u3089\u306b\u308f\u306b\u306f\u306b\u308f\u306b\u308f\u3068\u308a\u304c\u3044\u308b\"))\n\n\nNOTE\n===========\n\nAdded API\n--------------\nAs compared to original RakutenMA, following methods are added:\n\n- RakutenMA::load(model_path)\n - Load model from JSON file\n\n- RakutenMA::save(model_path)\n - Save model to path\n\nmisc\n--------------\nAs initial setting, following values are set:\n\n- rma.featset = CTYPE_JA_PATTERNS # RakutenMA.default_featset_ja\n- rma.hash_func = rma.create_hash_func(15)\n- rma.tag_scheme = \"SBIEO\" # if using Chinese, set \"IOB2\"\n\nLICENSE\n=========\n\nApache License version 2.0\n\n\nCopyright\n=============\n\nRakuten MA Python\n(c) 2015- Yukino Ikegami. All Rights Reserved.\n\nRakuten MA (original)\n(c) 2014 Rakuten NLP Project. All Rights Reserved.\n\n.. |travis| image:: https://travis-ci.org/ikegami-yukino/rakutenma-python.svg?branch=master\n :target: https://travis-ci.org/ikegami-yukino/rakutenma-python\n :alt: travis-ci.org\n.. |coveralls| image:: https://coveralls.io/repos/ikegami-yukino/rakutenma-python/badge.png\n :target: https://coveralls.io/r/ikegami-yukino/rakutenma-python\n :alt: coveralls.io\n\n.. |pyversion| image:: https://img.shields.io/pypi/pyversions/rakutenma.svg\n\n.. |version| image:: https://img.shields.io/pypi/v/rakutenma.svg\n :target: http://pypi.python.org/pypi/rakutenma/\n :alt: latest version\n\n.. |landscape| image:: https://landscape.io/github/ikegami-yukino/rakutenma-python/master/landscape.svg?style=flat\n :target: https://landscape.io/github/ikegami-yukino/rakutenma-python/master\n :alt: Code Health\n\n.. |license| image:: https://img.shields.io/pypi/l/rakutenma.svg\n :target: http://pypi.python.org/pypi/rakutenma/\n :alt: license\n\n\nCHANGES\n=======\n\n0.3.3 (2017-05-22)\n-------------------------\n\n- Bug fix about training\n\n0.3.2 (2017-02-01)\n-------------------------\n\n- Use ujson when possible\n- Enable POS to MeCab style\n- Support Python 3.5 and 3.6\n\n0.3 (2016-04-10)\n-------------------------\n\n- Add CUI ($ rakutenma)\n\n0.2.2 (2016-04-09)\n-------------------------\n\n- Bundle model files (model_ja.json, model_ja_min.json)\n- Support Windows\n\n0.2 (2015-01-10)\n-------------------------\n\n- Support Python 2.6 and 2.7\n\n0.1.1 (2015-01-08)\n-------------------------\n\n- Slightly improve performance\n\n0.1 (2015-01-01)\n-------------------------\n\n- First release.\n", "description_content_type": null, "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/ikegami-yukino/rakutenma-python", "keywords": "morphological analyzer", "license": "Apache Software License", "maintainer": "", "maintainer_email": "", "name": "rakutenma", "package_url": "https://pypi.org/project/rakutenma/", "platform": "POSIX", "project_url": "https://pypi.org/project/rakutenma/", "project_urls": { "Homepage": "https://github.com/ikegami-yukino/rakutenma-python" }, "release_url": "https://pypi.org/project/rakutenma/0.3.3/", "requires_dist": null, "requires_python": "", "summary": "morphological analyzer (word segmentor + PoS Tagger) for Chinese and Japanese", "version": "0.3.3" }, "last_serial": 2890084, "releases": { "0.1.1": [ { "comment_text": "", "digests": { "md5": "f35859d6d56472c8584421feb1360284", "sha256": "5b3da01e65424caa84911fc4a5e828826bb80458c4031cd119329021c452962e" }, "downloads": -1, "filename": "rakutenma-0.1.1.tar.gz", "has_sig": false, "md5_digest": "f35859d6d56472c8584421feb1360284", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 11453, "upload_time": "2015-01-08T12:51:58", "url": "https://files.pythonhosted.org/packages/3d/9f/45c959cc5db7565e5411b8b0c032f87a37f2ba225e30b97d9a3af172243b/rakutenma-0.1.1.tar.gz" } ], "0.2.3": [ { "comment_text": "", "digests": { "md5": "8049372bf4a32b7af034162980008260", "sha256": "4075fe015c7be92a9e2edeaf55f2232f767bcf69c9b7f57ab2f626af99eb201c" }, "downloads": -1, "filename": "rakutenma-0.2.3.tar.gz", "has_sig": false, "md5_digest": "8049372bf4a32b7af034162980008260", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 25023034, "upload_time": "2016-04-09T10:50:29", "url": "https://files.pythonhosted.org/packages/0f/99/2cf8ac06f0f903735aca3229799d18c4fb082d434bf194635d4ce3c663f0/rakutenma-0.2.3.tar.gz" } ], "0.3.1": [ { "comment_text": "", "digests": { "md5": "b65d5c1a4ac385c5b661b5b7db84188d", "sha256": "dda1f79a8a26a7916f977927f2fc77cb67459257d3fb8eb38a6f87d4f35c4504" }, "downloads": -1, "filename": "rakutenma-0.3.1.tar.gz", "has_sig": false, "md5_digest": "b65d5c1a4ac385c5b661b5b7db84188d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 25023806, "upload_time": "2016-04-09T17:14:07", "url": "https://files.pythonhosted.org/packages/e2/72/a37d536bccff10cb72f9e6567031617cbc496b1e01a509efdd4507748bac/rakutenma-0.3.1.tar.gz" } ], "0.3.2": [ { "comment_text": "", "digests": { "md5": "00b3d2c5c0ef6d32b59063be5364d2ab", "sha256": "6c82bd5a90f6131777d36f6890a2656ae4e07709217d9ea9b12aea2454531eda" }, "downloads": -1, "filename": "rakutenma-0.3.2.tar.gz", "has_sig": false, "md5_digest": "00b3d2c5c0ef6d32b59063be5364d2ab", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 25025196, "upload_time": "2017-01-31T20:44:44", "url": "https://files.pythonhosted.org/packages/df/1d/a3a1116894e7d340050f5131ccf782ef20004f5c535ac1b0a5e541ca889c/rakutenma-0.3.2.tar.gz" } ], "0.3.3": [ { "comment_text": "", "digests": { "md5": "1cc18f28ae60e334211a2bc707129b20", "sha256": "2ee2ffe6cf93d7dae805f7e0e1971ef6a40f724a72e92d4de4f498097f776773" }, "downloads": -1, "filename": "rakutenma-0.3.3.tar.gz", "has_sig": false, "md5_digest": "1cc18f28ae60e334211a2bc707129b20", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 25025324, "upload_time": "2017-05-22T07:46:59", "url": "https://files.pythonhosted.org/packages/b7/53/e2c622a87a70a8f271b8bf8d62147366f619cad41e3ecbb21e4c6acdc540/rakutenma-0.3.3.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "1cc18f28ae60e334211a2bc707129b20", "sha256": "2ee2ffe6cf93d7dae805f7e0e1971ef6a40f724a72e92d4de4f498097f776773" }, "downloads": -1, "filename": "rakutenma-0.3.3.tar.gz", "has_sig": false, "md5_digest": "1cc18f28ae60e334211a2bc707129b20", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 25025324, "upload_time": "2017-05-22T07:46:59", "url": "https://files.pythonhosted.org/packages/b7/53/e2c622a87a70a8f271b8bf8d62147366f619cad41e3ecbb21e4c6acdc540/rakutenma-0.3.3.tar.gz" } ] }