{ "info": { "author": "Steven Moran and Robert Forkel", "author_email": "steven.moran@uzh.ch", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Intended Audience :: Developers", "License :: OSI Approved :: Apache Software License", "Natural Language :: English", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7", "Programming Language :: Python :: Implementation :: CPython", "Programming Language :: Python :: Implementation :: PyPy" ], "description": "segments\n========\n\n[![Build Status](https://travis-ci.org/cldf/segments.svg?branch=master)](https://travis-ci.org/cldf/segments)\n[![codecov](https://codecov.io/gh/cldf/segments/branch/master/graph/badge.svg)](https://codecov.io/gh/cldf/segments)\n[![PyPI](https://img.shields.io/pypi/v/segments.svg)](https://pypi.org/project/segments)\n\n\n[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.1051158.svg)](https://doi.org/10.5281/zenodo.1051158)\n\nThe segments package provides Unicode Standard tokenization routines and orthography segmentation,\nimplementing the linear algorithm described in the orthography profile specification from \n*The Unicode Cookbook* (Moran and Cysouw 2018 [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.1296780.svg)](https://doi.org/10.5281/zenodo.1296780)).\n\n\nCommand line usage\n------------------\n\nCreate a text file:\n```\n$ echo \"a\u00e4aa\u00f6aa\u00fcaa\" > text.txt\n```\n\nNow look at the profile:\n```\n$ cat text.txt | segments profile\nGrapheme frequency mapping\na 7 a\na\u0308 1 a\u0308\nu\u0308 1 u\u0308\no\u0308 1 o\u0308\n```\n\nWrite the profile to a file:\n```\n$ cat text.txt | segments profile > profile.prf\n```\n\nEdit the profile:\n\n```\n$ more profile.prf\nGrapheme frequency mapping\naa 0 x\na 7 a\na\u0308 1 a\u0308\nu\u0308 1 u\u0308\no\u0308 1 o\u0308\n```\n\nNow tokenize the text without profile:\n```\n$ cat text.txt | segments tokenize\na a\u0308 a a o\u0308 a a u\u0308 a a\n```\n\nAnd with profile:\n```\n$ cat text.txt | segments --profile=profile.prf tokenize\na a\u0308 aa o\u0308 aa u\u0308 aa\n\n$ cat text.txt | segments --mapping=mapping --profile=profile.prf tokenize\na a\u0308 x o\u0308 x u\u0308 x\n```\n\n\nAPI\n---\n\n```python\n>>> from __future__ import unicode_literals, print_function\n>>> from segments import Profile, Tokenizer\n>>> t = Tokenizer()\n>>> t('abcd')\n'a b c d'\n>>> prf = Profile({'Grapheme': 'ab', 'mapping': 'x'}, {'Grapheme': 'cd', 'mapping': 'y'})\n>>> print(prf)\nGrapheme\tmapping\nab\tx\ncd\ty\n>>> t = Tokenizer(profile=prf)\n>>> t('abcd')\n'ab cd'\n>>> t('abcd', column='mapping')\n'x y'\n```", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/cldf/segments", "keywords": "tokenizer", "license": "Apache 2.0", "maintainer": "", "maintainer_email": "", "name": "segments", "package_url": "https://pypi.org/project/segments/", "platform": "", "project_url": "https://pypi.org/project/segments/", "project_urls": { "Homepage": "https://github.com/cldf/segments" }, "release_url": "https://pypi.org/project/segments/2.1.0/", "requires_dist": null, "requires_python": "", "summary": "", "version": "2.1.0" }, "last_serial": 5858039, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "md5": "013768b54f420db41dfd64daa7e247e9", "sha256": "0526894124da2c2c4ed5641597091d6eb9fa36c9887e74801f48c61b887a1976" }, "downloads": -1, "filename": "segments-0.1.0.tar.gz", "has_sig": false, "md5_digest": "013768b54f420db41dfd64daa7e247e9", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 19001, "upload_time": "2017-01-09T13:56:29", "url": "https://files.pythonhosted.org/packages/cc/e7/7c4974e777877fdfb2bc0de90cd8cb04a8ca72a565a4f22a76e2771ce19f/segments-0.1.0.tar.gz" } ], "0.2.0": [ { "comment_text": "", "digests": { "md5": "b6cb20312d7046897b82f113902480e1", "sha256": "308fb202d766eccb8e270528bd290db743a6327f48e69fdd66d0cd14bf274c4e" }, "downloads": -1, "filename": "segments-0.2.0.tar.gz", "has_sig": false, "md5_digest": "b6cb20312d7046897b82f113902480e1", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 19499, "upload_time": "2017-01-12T15:34:02", "url": "https://files.pythonhosted.org/packages/2c/93/7aeedf37d8eda360e687e48c6016010837a7275aa99877bce4a157e5a81c/segments-0.2.0.tar.gz" } ], "0.3.0": [ { "comment_text": "", "digests": { "md5": "7621ddec25ba8de2a23ef16f9157a579", "sha256": "40a65062b44c83533bf9c7019c8c0b220335b55b7435d130074088bf8d0c00a1" }, "downloads": -1, "filename": "segments-0.3.0.tar.gz", "has_sig": false, "md5_digest": "7621ddec25ba8de2a23ef16f9157a579", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 20119, "upload_time": "2017-01-13T11:12:39", "url": "https://files.pythonhosted.org/packages/5c/b6/65be6aa581a78cd644936c8ef941edd23c7438c00c2911aaf88c9cc2f140/segments-0.3.0.tar.gz" } ], "1.0.0": [ { "comment_text": "", "digests": { "md5": "2b3c70849fdc76a6a90a0b9ec82b0a5c", "sha256": "dd835065df78c2720ec19175ba73c1fd54299fe43ecb8f5dc50caa4eb7b19c25" }, "downloads": -1, "filename": "segments-1.0.0.tar.gz", "has_sig": false, "md5_digest": "2b3c70849fdc76a6a90a0b9ec82b0a5c", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 14790, "upload_time": "2017-01-25T12:46:04", "url": "https://files.pythonhosted.org/packages/81/d1/c0304d5d831f75f6090969cd6699aeaa74a053d498e220d7e01adf317bfa/segments-1.0.0.tar.gz" } ], "1.1.0": [ { "comment_text": "", "digests": { "md5": "c7537731d9be873408f654d427df3317", "sha256": "6beb813acd9193439e39a0d956e927a49206f511e454f53cd1d3053b027105ff" }, "downloads": -1, "filename": "segments-1.1.0-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "c7537731d9be873408f654d427df3317", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 15627, "upload_time": "2017-11-16T14:28:51", "url": "https://files.pythonhosted.org/packages/47/b6/65964fea6d6d82fb5dbcaac0bd227e59968f81d90da43f965abca2629453/segments-1.1.0-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "179bbde5f4ba3b8ea1dee22a3890f1fc", "sha256": "5b58b08b7f09160065040a2a981e22358fea5d6b3da11ddfdf9fc779d40a57d2" }, "downloads": -1, "filename": "segments-1.1.0.tar.gz", "has_sig": false, "md5_digest": "179bbde5f4ba3b8ea1dee22a3890f1fc", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 14681, "upload_time": "2017-02-02T10:09:41", "url": "https://files.pythonhosted.org/packages/5d/7a/85b5410c6a2ff607b5b70da714136af162d3b69784084fe310518e1b6b39/segments-1.1.0.tar.gz" } ], "1.1.1": [ { "comment_text": "", "digests": { "md5": "e70bc8c371561eb7bf00c403e567fb4f", "sha256": "e633722a818162d49dbd42961274039631ecb2307f2a0c7f370439bbc41a46db" }, "downloads": -1, "filename": "segments-1.1.1-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "e70bc8c371561eb7bf00c403e567fb4f", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 16262, "upload_time": "2017-11-16T14:28:53", "url": "https://files.pythonhosted.org/packages/e6/f5/f79731628e8c91094f40eef6ac32935c4702e32a6f666117765d7aa9d18c/segments-1.1.1-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "97342763f2452e39fa7dbe543a4ca4d4", "sha256": "d6e1c66c60bc175eaac3fd804e1c32b85d894928bea783dc77351b1d9982b742" }, "downloads": -1, "filename": "segments-1.1.1.tar.gz", "has_sig": false, "md5_digest": "97342763f2452e39fa7dbe543a4ca4d4", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 12555, "upload_time": "2017-11-16T14:29:41", "url": "https://files.pythonhosted.org/packages/51/05/aebb92025c53a3ee068102f30d0e890a132a3c580370ac430d7dbd332666/segments-1.1.1.tar.gz" } ], "1.2.1": [ { "comment_text": "", "digests": { "md5": "d9ba147cea38de0351867f5941d310d7", "sha256": "2b3a4486239318e3b5b2b63159db33b0813fd63febc13a17c5748bd3437b5dd9" }, "downloads": -1, "filename": "segments-1.2.1-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "d9ba147cea38de0351867f5941d310d7", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 16748, "upload_time": "2018-04-30T10:50:19", "url": "https://files.pythonhosted.org/packages/29/b2/3de9c177de6bb515975885a4620565da51a5d1093648ef6201593b1cd7ed/segments-1.2.1-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "82344bada0c48629c69980a466f00cd0", "sha256": "61ab9a610664322e5a0baf24da25af5255507771969ce28c133be90535ca71d5" }, "downloads": -1, "filename": "segments-1.2.1.tar.gz", "has_sig": false, "md5_digest": "82344bada0c48629c69980a466f00cd0", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 12916, "upload_time": "2018-04-30T10:50:06", "url": "https://files.pythonhosted.org/packages/79/4c/960cc195283f62332d855fbf584037ca569ea31cf43b0bde9ef649b09068/segments-1.2.1.tar.gz" } ], "1.2.2": [ { "comment_text": "", "digests": { "md5": "a896e2c4b5f92571329aba5d4a400f5c", "sha256": "eedcf972941fd6740031dfa0d93c4b32f00c60db9a3758a4c1b5161e2c552165" }, "downloads": -1, "filename": "segments-1.2.2-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "a896e2c4b5f92571329aba5d4a400f5c", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 14330, "upload_time": "2018-06-22T10:35:28", "url": "https://files.pythonhosted.org/packages/64/f6/5a23502d06b0fadbb92b3da8fd06ffee18a387d0ec989b8c18c928593d6f/segments-1.2.2-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "46fe54999514c9eb46fa11170f7e7739", "sha256": "b6b31227198787c87612afe935156c215830d3ceafed4f93c0c816f8b2c3af15" }, "downloads": -1, "filename": "segments-1.2.2.tar.gz", "has_sig": false, "md5_digest": "46fe54999514c9eb46fa11170f7e7739", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 13683, "upload_time": "2018-06-22T10:35:22", "url": "https://files.pythonhosted.org/packages/65/1d/4b9db86db5f30bde9becc46f45230b05480435da010af97ec6bded779e73/segments-1.2.2.tar.gz" } ], "2.0.0": [ { "comment_text": "", "digests": { "md5": "89eee849e6865fe6bccf198bbdf80720", "sha256": "ac333edc41731bbd869fce95f5e39d27fe809ff7b4debb5b24426413fde85e8f" }, "downloads": -1, "filename": "segments-2.0.0-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "89eee849e6865fe6bccf198bbdf80720", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 15706, "upload_time": "2018-08-01T20:46:54", "url": "https://files.pythonhosted.org/packages/35/d5/af715bc0c0da03f7ca469b9041ad6bbf840c2ccab7ecb5c5bac5f9b9da4e/segments-2.0.0-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "e27ab582794976337d0c1aa9318e3ecb", "sha256": "9d3143cab08891d314845434a49cc064a8b871ede8735d72b0f44f1cd0cdedb9" }, "downloads": -1, "filename": "segments-2.0.0.tar.gz", "has_sig": false, "md5_digest": "e27ab582794976337d0c1aa9318e3ecb", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 14290, "upload_time": "2018-08-01T20:46:48", "url": "https://files.pythonhosted.org/packages/4a/09/360e02feea7430652d848464980339b4416805e491aefbce6ad2f365f8c5/segments-2.0.0.tar.gz" } ], "2.0.1": [ { "comment_text": "", "digests": { "md5": "be24b20b9e7e0c2c19f4099f4c3d69aa", "sha256": "4a6ac81af292c754f79c2e067d3af7096d702825dc74f358bdc5607a08dbe63d" }, "downloads": -1, "filename": "segments-2.0.1-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "be24b20b9e7e0c2c19f4099f4c3d69aa", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 15720, "upload_time": "2018-08-02T08:54:36", "url": "https://files.pythonhosted.org/packages/10/d3/dcbdf340604e74effc502aefc5929d7e555490e19d7a8a1b71613e2e2446/segments-2.0.1-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "4b434a60a1de58faf6f30789181c5923", "sha256": "e9f11006f9be9b9de6e9c5bd472cac435b490969270ff06bce3cb71cf8d6fb2a" }, "downloads": -1, "filename": "segments-2.0.1.tar.gz", "has_sig": false, "md5_digest": "4b434a60a1de58faf6f30789181c5923", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 14312, "upload_time": "2018-08-02T08:54:32", "url": "https://files.pythonhosted.org/packages/96/3c/29a47f43ad1b8f244a946482c42dbf884766f48c4a344435383aa034302b/segments-2.0.1.tar.gz" } ], "2.0.2": [ { "comment_text": "", "digests": { "md5": "1be8414f6b55bb2e4d91c256b3a3ac19", "sha256": "257127c119f6bcffe03ed6716fec2538add2fac86ca9ecbb3eb11773b95f0761" }, "downloads": -1, "filename": "segments-2.0.2-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "1be8414f6b55bb2e4d91c256b3a3ac19", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 15730, "upload_time": "2019-07-01T09:04:27", "url": "https://files.pythonhosted.org/packages/e1/2f/b162cb98c1c241cd5ac25a1752a81ab82ff12611ffb5d7a5d9cf516b308b/segments-2.0.2-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "22946fe9758f27f6cccf49989c1d84f2", "sha256": "6e29ec932f6f069aa24a4d2befdca348a2d46d485cdae07607d7495d8379d51d" }, "downloads": -1, "filename": "segments-2.0.2.tar.gz", "has_sig": false, "md5_digest": "22946fe9758f27f6cccf49989c1d84f2", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 14356, "upload_time": "2019-07-01T09:04:23", "url": "https://files.pythonhosted.org/packages/be/05/f7a295140b640c7fcddf76201688b9bc5323a6e5cb79eb7d63cdc216ee1e/segments-2.0.2.tar.gz" } ], "2.1.0": [ { "comment_text": "", "digests": { "md5": "5872b54d5c01494b77816c60444f3b30", "sha256": "0451e57c3517c642d7a1d6c576e8abc47dbe1a971200d95a10ed3265647d32e4" }, "downloads": -1, "filename": "segments-2.1.0-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "5872b54d5c01494b77816c60444f3b30", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 14929, "upload_time": "2019-09-19T18:21:10", "url": "https://files.pythonhosted.org/packages/10/49/efcbde07638a64500c4b0094dca6b613891ac4fe4af823224f77e14e4483/segments-2.1.0-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "88e1be638d00bd4f8c631b40953c1af5", "sha256": "7ce25c441dc762ee2551a566f025831d25860a1ed77cfb0cff58c18357531afb" }, "downloads": -1, "filename": "segments-2.1.0.tar.gz", "has_sig": false, "md5_digest": "88e1be638d00bd4f8c631b40953c1af5", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 14223, "upload_time": "2019-09-19T18:21:05", "url": "https://files.pythonhosted.org/packages/77/ea/b4dc8f150684d48e59a4a122e12302fabf3e2bbee763459bdb9b381ce942/segments-2.1.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "5872b54d5c01494b77816c60444f3b30", "sha256": "0451e57c3517c642d7a1d6c576e8abc47dbe1a971200d95a10ed3265647d32e4" }, "downloads": -1, "filename": "segments-2.1.0-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "5872b54d5c01494b77816c60444f3b30", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 14929, "upload_time": "2019-09-19T18:21:10", "url": "https://files.pythonhosted.org/packages/10/49/efcbde07638a64500c4b0094dca6b613891ac4fe4af823224f77e14e4483/segments-2.1.0-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "88e1be638d00bd4f8c631b40953c1af5", "sha256": "7ce25c441dc762ee2551a566f025831d25860a1ed77cfb0cff58c18357531afb" }, "downloads": -1, "filename": "segments-2.1.0.tar.gz", "has_sig": false, "md5_digest": "88e1be638d00bd4f8c631b40953c1af5", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 14223, "upload_time": "2019-09-19T18:21:05", "url": "https://files.pythonhosted.org/packages/77/ea/b4dc8f150684d48e59a4a122e12302fabf3e2bbee763459bdb9b381ce942/segments-2.1.0.tar.gz" } ] }