{ "info": { "author": "lovit", "author_email": "soy.lovit@gmail.com", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: GNU General Public License v3 (GPLv3)", "Operating System :: OS Independent", "Programming Language :: Python :: 3.6" ], "description": "# Korean Predicate Analyzer (Korean Lemmatizer)\n\nAnalyzes the conjugated forms (surface forms) of Korean verbs and adjectives. The Korean predicate analyzer provides the following features.\n\n1. Splits an input word into its stem and ending (eomi)\n1. Restores an input word to its base form (lemma)\n\nThe implementation principles of this package are described in a [github.io blog post][io].\n\n[io]: https://lovit.github.io/nlp/2019/01/22/trained_kor_lemmatizer/\n\n## Usage\n\n### analyze, lemmatize, conjugate\n\nThe `analyze` function returns the morphemes of the given predicate word.\n\n```python\nfrom soylemma import Lemmatizer\n\nlemmatizer = Lemmatizer()\nlemmatizer.analyze('\ucc28\uac00\uc6b0\ub2c8\uae4c')\n```\n\nThe return value is a list of tuples because a word can have more than one morpheme combination.\n\n```\n[(('\ucc28\uac11', 'Adjective'), ('\uc6b0\ub2c8\uae4c', 'Eomi'))]\n```\n\nThe `lemmatize` function returns the lemma of the given predicate word.\n\n```python\nlemmatizer.lemmatize('\ucc28\uac00\uc6b0\ub2c8\uae4c')\n```\n\n```\n[('\ucc28\uac11\ub2e4', 'Adjective')]\n```\n\nIf the input word is not a predicate (for example, a noun), it returns an empty list.\n\n```python\nlemmatizer.lemmatize('\ud55c\uad6d\uc5b4') # []\n```\n\nThe `conjugate` function returns surface forms. Pass the stem and eomi as arguments. 
It returns all possible surface forms for the given stem and eomi.\n\n```python\nlemmatizer.conjugate(stem='\ucc28\uac11', eomi='\uc6b0\ub2c8\uae4c')\nlemmatizer.conjugate('\uc608\uc058', '\uc5c8\ub358')\n```\n\n```\n['\ucc28\uac00\uc6b0\ub2c8\uae4c', '\ucc28\uac11\uc6b0\ub2c8\uae4c']\n['\uc608\ubee4\ub358', '\uc608\uc058\uc5c8\ub358']\n```\n\n### update dictionaries and rules\n\nFor demonstration, we use the dictionary `demo`.\n\n`\uc5b4\uc5ec\ubee4\uc5b4` cannot be analyzed because the adjective `\uc5b4\uc5ec\uc058` is not registered in the dictionary.\n\n```python\nfrom soylemma import Lemmatizer\n\nlemmatizer = Lemmatizer(dictionary_name='demo')\nprint(lemmatizer.analyze('\uc5b4\uc5ec\ubee4\uc5b4')) # []\n```\n\nSo we add the word with its tag using the `add_words` function and analyze it again. Now the word `\uc5b4\uc5ec\ubee4\uc5b4` is analyzed.\n\n```python\nlemmatizer.add_words('\uc5b4\uc5ec\uc058', 'Adjective')\nlemmatizer.analyze('\uc5b4\uc5ec\ubee4\uc5b4')\n```\n\n```\n[(('\uc5b4\uc5ec\uc058', 'Adjective'), ('\uc5c8\uc5b4', 'Eomi'))]\n```\n\nHowever, the word `\ud30c\ub7ac\ub2e4` still cannot be analyzed because no lemmatization rule exists for the surface form `\ub7ac`.\n\n```python\nlemmatizer.analyze('\ud30c\ub7ac\ub2e4') # []\n```\n\nThis time, we add lemmatization rules using the `add_lemma_rules` function.\n\n```python\nsupplements = {\n '\ub7ac': {('\ub797', '\uc558')}\n}\n\nlemmatizer.add_lemma_rules(supplements)\n```\n\nAfter that, the word `\ud30c\ub7ac\ub2e4` is analyzed, and the conjugation of `\ud30c\ub797 + \uc558\ub2e4` is also available.\n\n```python\nlemmatizer.analyze('\ud30c\ub7ac\ub2e4')\nlemmatizer.conjugate('\ud30c\ub797', '\uc558\ub2e4')\n```\n\n```\n[(('\ud30c\ub797', 'Adjective'), ('\uc558\ub2e4', 'Eomi'))]\n['\ud30c\ub7ac\ub2e4', '\ud30c\ub797\uc558\ub2e4']\n```\n\n### debug on\n\nTo see which subwords were considered as (stem, eomi) candidates, use 
`debug`.\n\n```python\nlemmatizer.analyze('\ud30c\ub7ac\ub2e4', debug=True)\n```\n\n```\n[DEBUG] word: \ud30c\ub7ac\ub2e4 = \ud30c\ub797 + \uc558\ub2e4, conjugation: \ub7ac = \ub797 + \uc558\n[(('\ud30c\ub797', 'Adjective'), ('\uc558\ub2e4', 'Eomi'))]\n```\n\n### lemmatization rule extractor\n\nYou can extract a lemmatization rule using the `extract_rule` function.\n\n```python\nfrom soylemma import extract_rule\n\neojeol = '\ub85c\ub4dc\ubb34\ube44\uc600\ub2e4'\nlw = '\ub85c\ub4dc\ubb34\ube44\uc774'\nlt = 'Adjective'\nrw = '\uc5c8\ub2e4'\nrt = 'Eomi'\n\nextract_rule(eojeol, lw, lt, rw, rt)\n```\n\n```\n('\uc600\ub2e4', ('\uc774', '\uc5c8\ub2e4'))\n```\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/lovit/korean_lemmatizer", "keywords": "korean-nlp,nlp,lemmatizer", "license": "", "maintainer": "", "maintainer_email": "", "name": "soylemma", "package_url": "https://pypi.org/project/soylemma/", "platform": "", "project_url": "https://pypi.org/project/soylemma/", "project_urls": { "Homepage": "https://github.com/lovit/korean_lemmatizer" }, "release_url": "https://pypi.org/project/soylemma/0.2.0/", "requires_dist": null, "requires_python": "", "summary": "Trained Korean Lemmatizer", "version": "0.2.0" }, "last_serial": 5908497, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "md5": "6a6578f5a4bc4f1fe0cf1dc159eb939e", "sha256": "1a97aab959f624ed30307d9e14ea5a287de455a7ba33bfcfabbe68fbf763c35e" }, "downloads": -1, "filename": "soylemma-0.1.0-py3.7.egg", "has_sig": false, "md5_digest": "6a6578f5a4bc4f1fe0cf1dc159eb939e", "packagetype": "bdist_egg", "python_version": "3.7", "requires_python": null, "size": 101860, "upload_time": "2019-05-26T18:06:58", "url": "https://files.pythonhosted.org/packages/2f/7a/9907596e6965ecf37148f0d5aadaa72a5cc5259534af52c7b846db0ea1df/soylemma-0.1.0-py3.7.egg" }, { "comment_text": "", "digests": 
{ "md5": "a915c818fcbf4080cfbfaa0ce838446c", "sha256": "25b4ca2b5f1cbb75e0642a7a5e813045cbd2c01a03d4da17850921386ebf90af" }, "downloads": -1, "filename": "soylemma-0.1.0-py3-none-any.whl", "has_sig": false, "md5_digest": "a915c818fcbf4080cfbfaa0ce838446c", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 91812, "upload_time": "2019-05-26T18:06:54", "url": "https://files.pythonhosted.org/packages/28/df/8cd8f8896012cc150f9d1ce103d8e79005b99de4d51eb97c26e4b79eee3c/soylemma-0.1.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "0df7e3976a871016710db644fff76006", "sha256": "dea434f26ac1f3c9bd960e3b4fa145d50ed10706a424c31638cc79425f49846c" }, "downloads": -1, "filename": "soylemma-0.1.0.tar.gz", "has_sig": false, "md5_digest": "0df7e3976a871016710db644fff76006", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 93771, "upload_time": "2019-05-26T18:07:00", "url": "https://files.pythonhosted.org/packages/4c/01/07da5b88fcc7217fa8dcae840c276e93b22504d8b4bd4ec7791ebc3b6fa2/soylemma-0.1.0.tar.gz" } ], "0.1.1": [ { "comment_text": "", "digests": { "md5": "c8191c4e33cd38f449f2e71f83ab75a5", "sha256": "286272d2c2c0893a3a8c97ee36d1024586cced61539d6d11cddca8d95a41cf95" }, "downloads": -1, "filename": "soylemma-0.1.1-py3.7.egg", "has_sig": false, "md5_digest": "c8191c4e33cd38f449f2e71f83ab75a5", "packagetype": "bdist_egg", "python_version": "3.7", "requires_python": null, "size": 102290, "upload_time": "2019-09-27T18:51:26", "url": "https://files.pythonhosted.org/packages/c7/8f/a5bf85531bff288e52821b86fd6a6861f84cf06d29847cebb46fda040253/soylemma-0.1.1-py3.7.egg" }, { "comment_text": "", "digests": { "md5": "b59d768f9893477e79d57b027e3e513b", "sha256": "b8aca0de6b1f3ac89aa1a89267d766acfdd3db8a8259195611906be6b4114abe" }, "downloads": -1, "filename": "soylemma-0.1.1-py3-none-any.whl", "has_sig": false, "md5_digest": "b59d768f9893477e79d57b027e3e513b", "packagetype": "bdist_wheel", 
"python_version": "py3", "requires_python": null, "size": 92038, "upload_time": "2019-09-27T18:51:24", "url": "https://files.pythonhosted.org/packages/7e/ac/bbdcfce243d291d1eb574fa7fa0b36ecee8eca259f97d5f6e2dad9e857f7/soylemma-0.1.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "ba9c1f83f221c45b2a3b7790f60a25e1", "sha256": "1c104534e3095e5e41cdf2dc58c047eda210b6274f5363547e314341bd491519" }, "downloads": -1, "filename": "soylemma-0.1.1.tar.gz", "has_sig": false, "md5_digest": "ba9c1f83f221c45b2a3b7790f60a25e1", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 93949, "upload_time": "2019-09-27T18:51:28", "url": "https://files.pythonhosted.org/packages/e4/22/de9e4961b4b36c84761038f7f0b210e7f6ccd7bd14d997a99347704f6e65/soylemma-0.1.1.tar.gz" } ], "0.2.0": [ { "comment_text": "", "digests": { "md5": "ed009d4e518b9a9283f54c9cfabea4d5", "sha256": "8b2d4734c9b7669ae95810ff561acfd1ed0a86b26f1307cb1e777412196ded6c" }, "downloads": -1, "filename": "soylemma-0.2.0-py3.7.egg", "has_sig": false, "md5_digest": "ed009d4e518b9a9283f54c9cfabea4d5", "packagetype": "bdist_egg", "python_version": "3.7", "requires_python": null, "size": 134951, "upload_time": "2019-09-30T19:06:25", "url": "https://files.pythonhosted.org/packages/12/4f/c2b97e5407d843f1aac8f7250fb5253ae788889972ff647089015d429b55/soylemma-0.2.0-py3.7.egg" }, { "comment_text": "", "digests": { "md5": "540d1ebce5f64e47791dae7fa6446ea9", "sha256": "6767c3e35a45a475bf7635029eaa20ac5f492574b73952d1b6bfb78ed23a3a69" }, "downloads": -1, "filename": "soylemma-0.2.0-py3-none-any.whl", "has_sig": false, "md5_digest": "540d1ebce5f64e47791dae7fa6446ea9", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 124113, "upload_time": "2019-09-30T19:06:23", "url": "https://files.pythonhosted.org/packages/03/f5/d612c52b363a1e04b2df6dbde86bedd84035ba3d07651c586dcfdac50b53/soylemma-0.2.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": 
"08dd559b9b04ceb9755fa55e140924e8", "sha256": "fa43afd27be7059fe4716774c928454f091e3a3b686a1f97c12e3d825960b6ec" }, "downloads": -1, "filename": "soylemma-0.2.0.tar.gz", "has_sig": false, "md5_digest": "08dd559b9b04ceb9755fa55e140924e8", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 125174, "upload_time": "2019-09-30T19:06:28", "url": "https://files.pythonhosted.org/packages/bf/9a/6fd9d5f475191aa834f8c75c5a94759fab5ebfc240e22262da43f006cbff/soylemma-0.2.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "ed009d4e518b9a9283f54c9cfabea4d5", "sha256": "8b2d4734c9b7669ae95810ff561acfd1ed0a86b26f1307cb1e777412196ded6c" }, "downloads": -1, "filename": "soylemma-0.2.0-py3.7.egg", "has_sig": false, "md5_digest": "ed009d4e518b9a9283f54c9cfabea4d5", "packagetype": "bdist_egg", "python_version": "3.7", "requires_python": null, "size": 134951, "upload_time": "2019-09-30T19:06:25", "url": "https://files.pythonhosted.org/packages/12/4f/c2b97e5407d843f1aac8f7250fb5253ae788889972ff647089015d429b55/soylemma-0.2.0-py3.7.egg" }, { "comment_text": "", "digests": { "md5": "540d1ebce5f64e47791dae7fa6446ea9", "sha256": "6767c3e35a45a475bf7635029eaa20ac5f492574b73952d1b6bfb78ed23a3a69" }, "downloads": -1, "filename": "soylemma-0.2.0-py3-none-any.whl", "has_sig": false, "md5_digest": "540d1ebce5f64e47791dae7fa6446ea9", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 124113, "upload_time": "2019-09-30T19:06:23", "url": "https://files.pythonhosted.org/packages/03/f5/d612c52b363a1e04b2df6dbde86bedd84035ba3d07651c586dcfdac50b53/soylemma-0.2.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "08dd559b9b04ceb9755fa55e140924e8", "sha256": "fa43afd27be7059fe4716774c928454f091e3a3b686a1f97c12e3d825960b6ec" }, "downloads": -1, "filename": "soylemma-0.2.0.tar.gz", "has_sig": false, "md5_digest": "08dd559b9b04ceb9755fa55e140924e8", "packagetype": "sdist", "python_version": "source", 
"requires_python": null, "size": 125174, "upload_time": "2019-09-30T19:06:28", "url": "https://files.pythonhosted.org/packages/bf/9a/6fd9d5f475191aa834f8c75c5a94759fab5ebfc240e22262da43f006cbff/soylemma-0.2.0.tar.gz" } ] }