{ "info": { "author": "Kyubyong Park", "author_email": "kbpark.linguist@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 5 - Production/Stable", "Intended Audience :: Developers", "Intended Audience :: Science/Research", "License :: OSI Approved :: Apache Software License", "Operating System :: OS Independent", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7" ], "description": "# g2pK: g2p module for Korean\n\ng2p (grapheme-to-phoneme) is the task of converting graphemes to phonemes. Hangul, the main script for Korean, is phonetic, but the pronunciation rules are notoriously complicated.\nSo it is never easy to learn how to read a text in Korean. That's why g2p is necessary in various NLP tasks like TTS.\nThere's an open source g2p library for Korean, [KoG2P](https://github.com/scarletcho/KoG2P). It is \nsimple and works well, but I think we need a better one. Please read through the following section (main features and usage)\nto understand the philosophy of g2pK and how to use g2pK. We know it is not perfect at present. 
\nThat's one of the reasons your contributions are more than welcome.\n\n## Requirements\n* Python >= 3.6\n* jamo\n* Mecab (Consult http://konlpy.org/en/latest/install/ for installation)\n* konlpy\n* nltk\n\n## Installation\n```\npip install g2pk\n```\n\n## Main features & Usage\n* Returns text as it is pronounced, keeping punctuation.\n```\n>>> from g2pk import G2p\n>>> g2p = G2p()\n>>> g2p(\"\uc5b4\uc81c\ub294 \ub0a0\uc528\uac00 \ub9d1\uc558\ub294\ub370, \uc624\ub298\uc740 \ud750\ub9ac\ub2e4.\")\n\uc5b4\uc81c\ub294 \ub0a0\uc528\uac00 \ub9d0\uac04\ub294\ub370, \uc624\ub290\ub978 \ud750\ub9ac\ub2e4.\n```\n* Determines pronunciation from context, thanks to Mecab, a morphological analyzer.\nIn the following example, note that the first and second \uc2e0\uace0 are pronounced differently.\n```\n>>> g2p(\"\uc2e0\uc744 \uc2e0\uace0 \uc5bc\ub978 \ub3d9\uc0ac\ubb34\uc18c\uc5d0 \uac00\uc11c \ud63c\uc778 \uc2e0\uace0 \ud574\ub77c\")\n\uc2dc\ub298 \uc2e0\uaf2c \uc5bc\ub978 \ub3d9\uc0ac\ubb34\uc18c\uc5d0 \uac00\uc11c \ud638\ub2cc \uc2e0\uace0 \ud574\ub77c\n```\n* Returns two types of results, that is, prescriptive (default) and descriptive (with the option `descriptive=True`) pronunciation.\nFor example, the josa (particle) \uc758 is pronounced \uc758 in principle, but in real life it is often pronounced \uc5d0.\nAlso, \uacc4 is much more often pronounced \uac8c. 
\n```\n>>> sent = \"\ub098\uc758 \uce5c\uad6c\ub294 \uacc4\uc0b0\uc774 \uc544\uc8fc \ube60\ub974\ub2e4\"\n>>> g2p(sent)\n\ub098\uc758 \uce5c\uad6c\ub294 \uacc4\uc0ac\ub2c8 \uc544\uc8fc \ube60\ub974\ub2e4\n>>> g2p(sent, descriptive=True)\n\ub098\uc5d0 \uce5c\uad6c\ub294 \uac8c\uc0ac\ub2c8 \uc544\uc8fc \ube60\ub974\ub2e4\n```\n* This distinction becomes more obvious if you set `group_vowels=True`.\nIn contemporary colloquial speech, some vowels are hard to distinguish from each other.\nIn the example below, the vowel \u3152 is normalized to \u3156.\n```\n>>> sent = \"\uc800\ub294 \uc608\uc804\uc5d0 \uadf8 \uc598\uae30\ub97c \ub4e4\uc740 \uc801\uc774 \uc788\uc2b5\ub2c8\ub2e4\"\n>>> g2p(sent)\n\uc800\ub290 \ub15c\uc800\ub124 \uadf8 \uc598\uae30\ub97c \ub4dc\ub978 \uc800\uae30 \uc77b\uc500\ub2c8\ub2e4\n>>> g2p(sent, group_vowels=True)\n\uc800\ub290 \ub15c\uc800\ub124 \uadf8 \uc608\uae30\ub97c \ub4dc\ub978 \uc800\uae30 \uc77b\uc500\ub2c8\ub2e4\n```\n* By default, it returns the standard Korean script, where letters are assembled to form a syllable.\n If you set `to_syl=False`, however, it returns individual Hangul letters, or jamo. This can be useful for many applications like speech synthesis.\n\\*Depending on the font you are using, the two results below may look the same, but they are actually different.\n```\n>>> sent = \"\uc5b4\uc81c\ub294 \ub0a0\uc528\uac00 \ub9d1\uc558\ub294\ub370, \uc624\ub298\uc740 \ud750\ub9ac\ub2e4.\"\n>>> g2p(sent)\n\uc5b4\uc81c\ub294 \ub0a0\uc528\uac00 \ub9d0\uac04\ub294\ub370, \uc624\ub290\ub978 \ud750\ub9ac\ub2e4.\n>>> g2p(sent, to_syl=False)\n\u110b\u1165\u110c\u1166\u1102\u1173\u11ab \u1102\u1161\u11af\u110a\u1175\u1100\u1161 \u1106\u1161\u11af\u1100\u1161\u11ab\u1102\u1173\u11ab\u1103\u1166, \u110b\u1169\u1102\u1173\u1105\u1173\u11ab \u1112\u1173\u1105\u1175\u1103\u1161.\n```\n* English words written in the Latin alphabet are converted into Hangul. 
\nThis is possible thanks to the [CMU Pronouncing Dictionary](http://www.speech.cs.cmu.edu/cgi-bin/cmudict).\n```\n>>> sent = \"\uadf8 \uc0ac\ub78c\uc740 \uc880, old school \uac19\uc544\"\n>>> g2p(sent)\n\uadf8 \uc0ac\ub77c\ubbc4 \uc880, \uc62c\ub4dc \uc2a4\ucfe8 \uac00\ud0c0\n```\n* Arabic numerals are spelled out according to their context.\n Note that the first 12 is pronounced \uc5f4\ub450, whereas the second 12 is pronounced \uc2ed\uc774.\n```\n>>> sent = \"\uc9c0\uae08 \uc2dc\uac01\uc740 12\uc2dc 12\ubd84\uc785\ub2c8\ub2e4\"\n>>> g2p(sent)\n\uc9c0\uae08 \uc2dc\uac00\uadf8 \ub148\ub450\uc2dc \uc2dc\ube44\ubd80\ub2d8\ub2c8\ub2e4\n```\n* Naturally, rules cannot cover every single case. Add special idioms to `idioms.txt`.\n* If you set `verbose=True`, you will see each conversion step along with the rule that was applied.\n```\n>>> sent = \"\ud559\uad50\uc5d0 \uac14\ub2e4 \uc640\uc11c, \uc5c4\ub9c8\uac00 \ud574 \uc8fc\uc2e0 \ubc25\uc744 \uba39\uc5c8\ub2e4.\"\n>>> g2p(sent, verbose=True)\n\ud559\uad50\uc5d0 \uac14\ub2e4 \uc640\uc11c, \uc5c4\ub9c8\uac00 \ud574 \uc8fc\uc2e0 \ubc25\uc744 \uba39\uc5c8\ub2e4. 
-> \ud559\uaf9c\uc5d0 \uac14\ub2e4 \uc640\uc11c, \uc5c4\ub9c8\uac00 \ud574 \uc8fc\uc2e0 \ubc25\uc744 \uba39\uc5c8\ub2e4.\n \uc81c23\ud56d\u3000\ubc1b\uce68 '\u3131(\u3132, \u314b, \u3133, \u313a), \u3137(\u3145, \u3146, \u3148, \u314a, \u314c), \u3142(\u314d, \u313c, \u313f, \u3144)' \ub4a4\uc5d0 \uc5f0\uacb0\ub418\ub294 '\u3131, \u3137, \u3142, \u3145, \u3148'\uc740 \ub41c\uc18c\ub9ac\ub85c \ubc1c\uc74c\ud55c\ub2e4.\n-> \uad6d\ubc25[\uad6d\ube71], \uae4e\ub2e4[\uae4d\ub530], \ub111\ubc1b\uc774[\ub109\ube60\uc9c0], \uc0af\ub3c8[\uc0ad\ub614]\n-> \ub2ed\uc7a5[\ub2e5\uc9f1], \uce61\ubc94[\uce59\ubee0], \ubed7\ub300\ub2e4[\ubed7\ub54c\ub2e4], \uc637\uace0\ub984[\uc62b\uaf2c\ub984]\n-> \uc788\ub358[\uc77b\ub5a4], \uaf42\uace0[\uaf33\uaf2c], \uaf43\ub2e4\ubc1c[\uaf33\ub530\ubc1c], \ub0af\uc124\ub2e4[\ub09f\uc370\ub2e4]\n-> \ubc2d\uac08\uc774[\ubc1b\uae4c\ub9ac], \uc1a5\uc804[\uc193\uca50], \uacf1\ub3cc[\uacf1\ub618], \ub36e\uac1c[\ub365\uae68]\n-> \uc606\uc9d1[\uc5fd\ucc1d], \ub113\uc8fd\ud558\ub2e4[\ub119\ucb48\uce74\ub2e4], \uc74a\uc870\ub9ac\ub2e4[\uc74d\ucabc\ub9ac\ub2e4], \uac12\uc9c0\ub2e4[\uac11\ucc0c\ub2e4] \n\ud559\uaf9c\uc5d0 \uac14\ub2e4 \uc640\uc11c, \uc5c4\ub9c8\uac00 \ud574 \uc8fc\uc2e0 \ubc25\uc744 \uba39\uc5c8\ub2e4. 
-> \ud559\uaf9c\uc5d0 \uac07\ub530 \uc640\uc11c, \uc5c4\ub9c8\uac00 \ud574 \uc8fc\uc2e0 \ubc25\uc744 \uba39\uc5bb\ub530.\n \uc81c9\ud56d\u3000\ubc1b\uce68 '\u3132, \u314b', '\u3145, \u3146, \u3148, \u314a, \u314c', '\u314d'\uc740 \uc5b4\ub9d0 \ub610\ub294 \uc790\uc74c \uc55e\uc5d0\uc11c \uac01\uac01 \ub300\ud45c\uc74c [\u3131, \u3137, \u3142]\uc73c\ub85c \ubc1c\uc74c\ud55c\ub2e4.\n-> \ub2e6\ub2e4[\ub2e5\ub530], \ud0a4\uc754[\ud0a4\uc73d], \ud0a4\uc754\uacfc[\ud0a4\uc73d\uaf48], \uc637[\uc62b]\n-> \uc6c3\ub2e4[\uc6b7\ub530], \uc788\ub2e4[\uc77b\ub530], \uc816[\uc807], \ube5a\ub2e4[\ube4b\ub530]\n-> \uaf43[\uaf33], \ucad3\ub2e4[\ucac3\ub530], \uc1a5[\uc193], \ubc49\ub2e4[\ubc37\ub530]\n-> \uc55e[\uc555], \ub36e\ub2e4[\ub365\ub530]\n\uc81c23\ud56d\u3000\ubc1b\uce68 '\u3131(\u3132, \u314b, \u3133, \u313a), \u3137(\u3145, \u3146, \u3148, \u314a, \u314c), \u3142(\u314d, \u313c, \u313f, \u3144)' \ub4a4\uc5d0 \uc5f0\uacb0\ub418\ub294 '\u3131, \u3137, \u3142, \u3145, \u3148'\uc740 \ub41c\uc18c\ub9ac\ub85c \ubc1c\uc74c\ud55c\ub2e4.\n-> \uad6d\ubc25[\uad6d\ube71], \uae4e\ub2e4[\uae4d\ub530], \ub111\ubc1b\uc774[\ub109\ube60\uc9c0], \uc0af\ub3c8[\uc0ad\ub614]\n-> \ub2ed\uc7a5[\ub2e5\uc9f1], \uce61\ubc94[\uce59\ubee0], \ubed7\ub300\ub2e4[\ubed7\ub54c\ub2e4], \uc637\uace0\ub984[\uc62b\uaf2c\ub984]\n-> \uc788\ub358[\uc77b\ub5a4], \uaf42\uace0[\uaf33\uaf2c], \uaf43\ub2e4\ubc1c[\uaf33\ub530\ubc1c], \ub0af\uc124\ub2e4[\ub09f\uc370\ub2e4]\n-> \ubc2d\uac08\uc774[\ubc1b\uae4c\ub9ac], \uc1a5\uc804[\uc193\uca50], \uacf1\ub3cc[\uacf1\ub618], \ub36e\uac1c[\ub365\uae68]\n-> \uc606\uc9d1[\uc5fd\ucc1d], \ub113\uc8fd\ud558\ub2e4[\ub119\ucb48\uce74\ub2e4], \uc74a\uc870\ub9ac\ub2e4[\uc74d\ucabc\ub9ac\ub2e4], \uac12\uc9c0\ub2e4[\uac11\ucc0c\ub2e4] \n\ud559\uaf9c\uc5d0 \uac07\ub530 \uc640\uc11c, \uc5c4\ub9c8\uac00 \ud574 \uc8fc\uc2e0 \ubc25\uc744 \uba39\uc5bb\ub530. 
-> \ud559\uaf9c\uc5d0 \uac07\ub530 \uc640\uc11c, \uc5c4\ub9c8\uac00 \ud574 \uc8fc\uc2e0 \ubc14\ube14 \uba38\uac77\ub530.\n \uc81c13\ud56d\u3000\ud651\ubc1b\uce68\uc774\ub098 \uc30d\ubc1b\uce68\uc774 \ubaa8\uc74c\uc73c\ub85c \uc2dc\uc791\ub41c \uc870\uc0ac\ub098 \uc5b4\ubbf8, \uc811\ubbf8\uc0ac\uc640 \uacb0\ud569\ub418\ub294 \uacbd\uc6b0\uc5d0\ub294, \uc81c \uc74c\uac00\ub300\ub85c \ub4a4 \uc74c\uc808 \uccab\uc18c\ub9ac\ub85c \uc62e\uaca8 \ubc1c\uc74c\ud55c\ub2e4.\n-> \uae4e\uc544[\uae4c\uae4c], \uc637\uc774[\uc624\uc2dc], \uc788\uc5b4[\uc774\uc368], \ub0ae\uc774[\ub098\uc9c0]\n-> \uaf42\uc544[\uaf2c\uc790], \uaf43\uc744[\uaf2c\uce28], \ucad3\uc544[\ucabc\ucc28], \ubc2d\uc5d0[\ubc14\ud14c]\n-> \uc55e\uc73c\ub85c[\uc544\ud504\ub85c], \ub36e\uc774\ub2e4[\ub354\ud53c\ub2e4] \n```\n\n\n## References\n\nIf you use our software for research, please cite:\n\n```\n@misc{gp2K2019,\n author = {Park, Kyubyong},\n title = {g2pK},\n year = {2019},\n publisher = {GitHub},\n journal = {GitHub repository},\n howpublished = {\\url{https://github.com/Kyubyong/g2pk}}\n}\n```\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/Kyubyong/g2pK", "keywords": "", "license": "Apache License 2.0", "maintainer": "", "maintainer_email": "", "name": "g2pK", "package_url": "https://pypi.org/project/g2pK/", "platform": "", "project_url": "https://pypi.org/project/g2pK/", "project_urls": { "Homepage": "https://github.com/Kyubyong/g2pK" }, "release_url": "https://pypi.org/project/g2pK/0.9.3/", "requires_dist": [ "jamo", "nltk", "konlpy", "mecab-python" ], "requires_python": ">=3.6", "summary": "g2pK: g2p module for Korean", "version": "0.9.3" }, "last_serial": 5460286, "releases": { "0.9.3": [ { "comment_text": "", "digests": { "md5": "c317b1de7b4f0ba64cf75633e665bcc8", "sha256": "36ca614efd73438afc561ae2b07f027d3d278067967b7dc297b8783a271bf910" }, "downloads": 
-1, "filename": "g2pK-0.9.3-py3-none-any.whl", "has_sig": false, "md5_digest": "c317b1de7b4f0ba64cf75633e665bcc8", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 27399, "upload_time": "2019-06-28T06:31:39", "url": "https://files.pythonhosted.org/packages/d7/ee/b08ea71746b6ed9db5c53a56c2e05f6ac7544b44528e37d402f2336d1147/g2pK-0.9.3-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "023a5a94a4d71b87094e72f27a86a47e", "sha256": "2bbe2656e19e1e5a7f4918aba46658cc8966776fcd8cf7bd5887f657c4fbfa4a" }, "downloads": -1, "filename": "g2pK-0.9.3.tar.gz", "has_sig": false, "md5_digest": "023a5a94a4d71b87094e72f27a86a47e", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 21511, "upload_time": "2019-06-28T06:31:41", "url": "https://files.pythonhosted.org/packages/61/1b/85df6c102bbfb2f838c8b8ec9fe9c810f12184e9eba52b26f8d4b998a66b/g2pK-0.9.3.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "c317b1de7b4f0ba64cf75633e665bcc8", "sha256": "36ca614efd73438afc561ae2b07f027d3d278067967b7dc297b8783a271bf910" }, "downloads": -1, "filename": "g2pK-0.9.3-py3-none-any.whl", "has_sig": false, "md5_digest": "c317b1de7b4f0ba64cf75633e665bcc8", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 27399, "upload_time": "2019-06-28T06:31:39", "url": "https://files.pythonhosted.org/packages/d7/ee/b08ea71746b6ed9db5c53a56c2e05f6ac7544b44528e37d402f2336d1147/g2pK-0.9.3-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "023a5a94a4d71b87094e72f27a86a47e", "sha256": "2bbe2656e19e1e5a7f4918aba46658cc8966776fcd8cf7bd5887f657c4fbfa4a" }, "downloads": -1, "filename": "g2pK-0.9.3.tar.gz", "has_sig": false, "md5_digest": "023a5a94a4d71b87094e72f27a86a47e", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 21511, "upload_time": "2019-06-28T06:31:41", "url": 
"https://files.pythonhosted.org/packages/61/1b/85df6c102bbfb2f838c8b8ec9fe9c810f12184e9eba52b26f8d4b998a66b/g2pK-0.9.3.tar.gz" } ] }