{ "info": { "author": "Pavel Sofroniev", "author_email": "pavelsof@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Intended Audience :: Science/Research", "License :: OSI Approved :: MIT License", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3 :: Only", "Topic :: Text Processing :: Linguistic" ], "description": "======\nipatok\n======\n\nA simple IPA tokeniser, as simple as in:\n\n>>> from ipatok import tokenise\n>>> tokenise('\u02c8ti\u02d0t\u0361\u0283\u0259')\n['t', 'i\u02d0', 't\u0361\u0283', '\u0259']\n>>> tokenise('\u0283\u02d0jeq\u0361\u03c7\u02d0\u02bcjer')\n['\u0283\u02d0', 'j', 'e', 'q\u0361\u03c7\u02d0\u02bc', 'j', 'e', 'r']\n\n\napi\n===\n\n``tokenise(string, strict=False, replace=False, diphtongs=False, tones=False,\nunknown=False, merge=None)`` takes an IPA string and returns a list of tokens.\nA token usually consists of a single letter together with its accompanying\ndiacritics. If two letters are connected by a tie bar, they are also considered\na single token. Except for length markers, suprasegmentals are excluded from\nthe output. Whitespace is also ignored. The function accepts the following\nkeyword arguments:\n\n- ``strict``: if set to ``True``, the function ensures that ``string`` complies\n to the IPA spec (`the 2015 revision`_); a ``ValueError`` is raised if it does\n not. If set to ``False`` (the default), the role of non-IPA characters is\n guessed based on their Unicode category (cf. the pitfalls section below).\n- ``replace``: if set to ``True``, the function replaces some common\n substitutes with their IPA-compliant counterparts, e.g. ``g \u2192 \u0261``, ``\u026b \u2192 l\u0334``,\n ``\u02a6 \u2192 t\u0361s``. Refer to ``ipatok/data/replacements.tsv`` for a full list. If\n both ``strict`` and ``replace`` are set to ``True``, replacing is done before\n checking for spec compliance.\n- ``diphtongs``: if set to ``True``, the function groups together non-syllabic\n vowels with their syllabic neighbours (e.g. ``a\u026a\u032f`` would form a single\n token). If set to ``False`` (the default), vowels are not tokenised together\n unless there is a connecting tie bar (e.g. ``a\u0361\u026a``).\n- ``tones``: if set to ``True``, tone and word accents are included in the\n output (accent markers as diacritics and Chao letters as separate tokens). If\n set to ``False`` (the default), these are ignored.\n- ``unknown``: if set to ``True``, the output includes (as separate tokens)\n symbols that cannot be classified as letters, diacritics or suprasegmentals\n (e.g. ``-``, ``/``, ``$``). If set to ``False`` (the default), such symbols\n are ignored. It does not have effect if ``strict`` is set to ``True``.\n- ``merge``: expects a ``str, str \u2192 bool`` function to be applied onto each\n pair of consecutive tokens; those for which the output is ``True`` are merged\n together. You can use this to, e.g., plug in your own diphtong detection\n algorithm:\n\n >>> tokenise(string, diphtongs=False, merge=custom_func)\n\n``tokenize`` is an alias for ``tokenise``.\n\n\nother functions\n---------------\n\n``replace_digits_with_chao(string, inverse=False)`` takes an IPA string and\nreplaces the digits 1-5 (also in superscript) with Chao tone letters. If\n``inverse=True``, smaller digits are converted into higher tones; otherwise,\nthey are converted into lower tones (the default). Equal consecutive digits\nare collapsed into a single Chao letter (e.g. ``55 \u2192 \u02e5``).\n\n>>> tokenise(replace_numbers_with_chao('\u0255ia\u2075\u00b9\u0255y\u025b\u00b2\u00b9\u2074'), tones=True)\n['\u0255', 'i', 'a', '\u02e5\u02e9', '\u0255', 'y', '\u025b', '\u02e8\u02e9\u02e6']\n\n\npitfalls\n========\n\nWhen ``strict=True`` each symbol is looked up in the spec and there is no\nambiguity as to how the input should be tokenised.\n\nWith ``strict=False`` IPA symbols are still handled correctly. A non-IPA symbol\nwould be treated as follows:\n\n- if it is a non-modifier letter (e.g. ``\u0393``), it is considered a consonant;\n- if it is a modifier (e.g. ``\u02c0``) or a combining mark (e.g. ``\u0259\u0307``), it is\n considered a diacritic;\n- if it is a `modifier tone letter`_ (e.g. ``\ua70d``), it is considered a tone\n symbol;\n- if it is neither of those, it is considered an unknown symbol.\n\nRegardless of the value of ``strict``, whitespace characters and underscores\nare considered to be word boundaries, i.e. there would not be tokens grouping\ntogether symbols separated by these characters, even though the latter are not\nincluded in the output.\n\n\ninstallation\n============\n\nThis is a standard Python 3 package without dependencies. It is offered at the\n`Cheese Shop`_, so you can install it with pip::\n\n pip install ipatok\n\nor, alternatively, you can clone this repo (safe to delete afterwards) and do::\n\n python setup.py test\n python setup.py install\n\nOf course, this could be happening within a virtualenv/venv as well.\n\n\nother IPA packages\n==================\n\n- lingpy_ is a historical linguistics suite that includes an ipa2tokens_\n function.\n- ipapy_ is a package for working with IPA strings.\n- ipalint_ provides a command-line tool for checking IPA datasets for errors\n and inconsistencies.\n\n\nlicence\n=======\n\nMIT. Do as you please and praise the snake gods.\n\n\n.. _`the 2015 revision`: https://www.internationalphoneticassociation.org/sites/default/files/phonsymbol.pdf\n.. _`modifier tone letter`: http://www.unicode.org/charts/PDF/UA700.pdf\n.. _`Cheese Shop`: https://pypi.python.org/pypi/ipatok\n.. _`lingpy`: https://pypi.python.org/pypi/lingpy\n.. _`ipa2tokens`: http://lingpy.org/reference/lingpy.sequence.html#lingpy.sequence.sound_classes.ipa2tokens\n.. _`ipapy`: https://pypi.python.org/pypi/ipapy\n.. _`ipalint`: https://pypi.python.org/pypi/ipalint\n\n\n", "description_content_type": null, "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/pavelsof/ipatok", "keywords": "IPA tokeniser tokenizer", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "ipatok", "package_url": "https://pypi.org/project/ipatok/", "platform": "", "project_url": "https://pypi.org/project/ipatok/", "project_urls": { "Homepage": "https://github.com/pavelsof/ipatok" }, "release_url": "https://pypi.org/project/ipatok/0.2.0/", "requires_dist": null, "requires_python": "", "summary": "IPA tokeniser", "version": "0.2.0" }, "last_serial": 3553225, "releases": { "0.0.0": [ { "comment_text": "", "digests": { "md5": "81d4c574af140dac47c46ad2832a06af", "sha256": "27bf75f4cf6b7c4c40f5f9732e1382638c7ea8174dc56ae9fdea1d163e5a9d4d" }, "downloads": -1, "filename": "ipatok-0.0.0-py3-none-any.whl", "has_sig": false, "md5_digest": "81d4c574af140dac47c46ad2832a06af", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 8531, "upload_time": "2017-11-19T14:35:56", "url": "https://files.pythonhosted.org/packages/69/59/5f5126fb268985fa65ad92f594e1bd1547afe4767a78d4e5d7920eb5ce4f/ipatok-0.0.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "afda61c3bd6fd0c9acb514444fb0c2ee", "sha256": "c74496735b8dbf9c96e27e9a4d4ec57cfed1fadf623731c3e788b4bfec39417c" }, "downloads": -1, "filename": "ipatok-0.0.0.tar.gz", "has_sig": false, "md5_digest": "afda61c3bd6fd0c9acb514444fb0c2ee", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5592, "upload_time": "2017-11-19T14:35:57", "url": "https://files.pythonhosted.org/packages/db/45/fcc4abd8dda907a0ca13a9ca8eec9e76327e2d29bb68cf5fdb16fc53f90d/ipatok-0.0.0.tar.gz" } ], "0.0.1": [ { "comment_text": "", "digests": { "md5": "1da170a1e580eeb9a2f0b8e11a45cb1f", "sha256": "c10554effccaa53a57001a634bad8fad327d78aae5b3805924f8f16d2e778a37" }, "downloads": -1, "filename": "ipatok-0.0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "1da170a1e580eeb9a2f0b8e11a45cb1f", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 8672, "upload_time": "2017-11-19T15:16:55", "url": "https://files.pythonhosted.org/packages/28/1d/e5694bec613ae98d75495bff63a2e1ee93eca2b9d67b1b4f1f89482069e1/ipatok-0.0.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "1f3577b987973f4d9a4357fd826e0dc6", "sha256": "3cb743027327c0e2f096a5607e573eb948c129053d82396a88eab3f7ab01042f" }, "downloads": -1, "filename": "ipatok-0.0.1.tar.gz", "has_sig": false, "md5_digest": "1f3577b987973f4d9a4357fd826e0dc6", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5657, "upload_time": "2017-11-19T15:16:57", "url": "https://files.pythonhosted.org/packages/79/1c/e15964957d015f2405e8e9b73a2717bd16d0aaa7c5ebda7bcdc0a8b6758e/ipatok-0.0.1.tar.gz" } ], "0.0.2": [ { "comment_text": "", "digests": { "md5": "d08ef53777fd9c213041942b5280b855", "sha256": "2f70c196dd858a4dfc144fd710156cd229d0ffcb1c95bae6000e8c2f3fe78a80" }, "downloads": -1, "filename": "ipatok-0.0.2-py3-none-any.whl", "has_sig": false, "md5_digest": "d08ef53777fd9c213041942b5280b855", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 9413, "upload_time": "2017-11-20T14:35:27", "url": "https://files.pythonhosted.org/packages/3d/73/c581f1c4f38825df3ef70352bfe13ae652267bacf4280dde788e66516c21/ipatok-0.0.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "6d41c99d45d955e69a7835f0dc0abb69", "sha256": "6a327660eb576decb34be09ffe3aef37f890da21175d492b7fb12488e473f112" }, "downloads": -1, "filename": "ipatok-0.0.2.tar.gz", "has_sig": false, "md5_digest": "6d41c99d45d955e69a7835f0dc0abb69", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 6314, "upload_time": "2017-11-20T14:35:30", "url": "https://files.pythonhosted.org/packages/9c/7c/475b82fe3a875a8cc0e585b991a5f8943f2787f2c2495670b67cc70a0c57/ipatok-0.0.2.tar.gz" } ], "0.1.0": [ { "comment_text": "", "digests": { "md5": "4c7edd96a6404ff7173c7de7fc0fa987", "sha256": "b799d4e8e0da6d17d19aed90970de38f076c4751a125651125890ff5900136c9" }, "downloads": -1, "filename": "ipatok-0.1.0-py3-none-any.whl", "has_sig": false, "md5_digest": "4c7edd96a6404ff7173c7de7fc0fa987", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 14784, "upload_time": "2017-11-21T15:38:58", "url": "https://files.pythonhosted.org/packages/1e/b8/40a027e9ff74d8e3ddde66c61fa14eb54ae0d276915f729107afa4cd7b6d/ipatok-0.1.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "939d963561a625871a064115041e0ad4", "sha256": "418295b77f9f678c65c6e594f4fd008148ebf001d7276abc9b49132fd1d8eec6" }, "downloads": -1, "filename": "ipatok-0.1.0.tar.gz", "has_sig": false, "md5_digest": "939d963561a625871a064115041e0ad4", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 9066, "upload_time": "2017-11-21T15:38:59", "url": "https://files.pythonhosted.org/packages/be/89/6a50ab294b81c0b11187ed005ac571dca2326e0908d6c2d2c246e8af155b/ipatok-0.1.0.tar.gz" } ], "0.1.1": [ { "comment_text": "", "digests": { "md5": "9e62fe8bcb2cb33125841b0e618912ab", "sha256": "87a769e97bf47544ada17da79d1afb481ff6168be69d82bdee533adaef563d9c" }, "downloads": -1, "filename": "ipatok-0.1.1-py3-none-any.whl", "has_sig": false, "md5_digest": "9e62fe8bcb2cb33125841b0e618912ab", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 14979, "upload_time": "2017-11-22T14:22:35", "url": "https://files.pythonhosted.org/packages/82/f6/f63b7f6faddeb0bdf3614552bd129157f62b16b0391663456f2e524fe60d/ipatok-0.1.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "e78b52c13c3e08a8edb1ab0456299d95", "sha256": "31bea2f74028179149e3a004cbb9cefc9824e3b781638dbf4a255d52d8e884b5" }, "downloads": -1, "filename": "ipatok-0.1.1.tar.gz", "has_sig": false, "md5_digest": "e78b52c13c3e08a8edb1ab0456299d95", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 9316, "upload_time": "2017-11-22T14:22:37", "url": "https://files.pythonhosted.org/packages/48/74/d7e6b95286f4fae709d72149d70cedad4d327dd7b29fa8d26e0027dacb54/ipatok-0.1.1.tar.gz" } ], "0.1.2": [ { "comment_text": "", "digests": { "md5": "8fb59823e77f742b74f590ba40976950", "sha256": "ca15303a2cc0c79fe1fc678d71f79b143669654a0af1cdb8ae3dcf2e12b34d92" }, "downloads": -1, "filename": "ipatok-0.1.2-py3-none-any.whl", "has_sig": false, "md5_digest": "8fb59823e77f742b74f590ba40976950", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 16182, "upload_time": "2017-12-23T21:02:00", "url": "https://files.pythonhosted.org/packages/45/f0/5ebaa32b65ac84fe98c66e7f7aa172ca01a10cd11489982b6cceb6126004/ipatok-0.1.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "4aabdc91cae274e71d50b102dfbe5c29", "sha256": "53e24a2a8d70ababfe864863a4c0e1e56ac44ea3da7b3a08489b07945c8608ad" }, "downloads": -1, "filename": "ipatok-0.1.2.tar.gz", "has_sig": false, "md5_digest": "4aabdc91cae274e71d50b102dfbe5c29", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 10174, "upload_time": "2017-12-23T21:02:02", "url": "https://files.pythonhosted.org/packages/b6/fc/e99ab8c2d0ea155cbc23a671bb002bceec0c2dae8cb92736fd608caa5a3d/ipatok-0.1.2.tar.gz" } ], "0.2.0": [ { "comment_text": "", "digests": { "md5": "e9b1b6c79d18e59e7fd5164cba7d5278", "sha256": "6649115b950b272746d6cb656331a7b1c5f14da9e9e4229532a4c812309848b4" }, "downloads": -1, "filename": "ipatok-0.2.0-py3-none-any.whl", "has_sig": false, "md5_digest": "e9b1b6c79d18e59e7fd5164cba7d5278", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 18450, "upload_time": "2018-02-05T12:52:18", "url": "https://files.pythonhosted.org/packages/6b/fe/06adfb0d2110295db67bbb9a393bb68a0fcaeeae3c455d926bee62f9abbf/ipatok-0.2.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "d7aba6ff0f80d1906438f22c39e6ac1a", "sha256": "5c1c7557708e3914889fd75d2ee3cb8eeb851f5b1d1a74c1e16f681725c1c3bb" }, "downloads": -1, "filename": "ipatok-0.2.0.tar.gz", "has_sig": false, "md5_digest": "d7aba6ff0f80d1906438f22c39e6ac1a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 11896, "upload_time": "2018-02-05T12:52:20", "url": "https://files.pythonhosted.org/packages/e4/19/ed19313265d845b04edde5ac2e9ca1892f80988ebb51382a04885b699e83/ipatok-0.2.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "e9b1b6c79d18e59e7fd5164cba7d5278", "sha256": "6649115b950b272746d6cb656331a7b1c5f14da9e9e4229532a4c812309848b4" }, "downloads": -1, "filename": "ipatok-0.2.0-py3-none-any.whl", "has_sig": false, "md5_digest": "e9b1b6c79d18e59e7fd5164cba7d5278", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 18450, "upload_time": "2018-02-05T12:52:18", "url": "https://files.pythonhosted.org/packages/6b/fe/06adfb0d2110295db67bbb9a393bb68a0fcaeeae3c455d926bee62f9abbf/ipatok-0.2.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "d7aba6ff0f80d1906438f22c39e6ac1a", "sha256": "5c1c7557708e3914889fd75d2ee3cb8eeb851f5b1d1a74c1e16f681725c1c3bb" }, "downloads": -1, "filename": "ipatok-0.2.0.tar.gz", "has_sig": false, "md5_digest": "d7aba6ff0f80d1906438f22c39e6ac1a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 11896, "upload_time": "2018-02-05T12:52:20", "url": "https://files.pythonhosted.org/packages/e4/19/ed19313265d845b04edde5ac2e9ca1892f80988ebb51382a04885b699e83/ipatok-0.2.0.tar.gz" } ] }