{ "info": { "author": "Ruoho Ruotsi", "author_email": "ruoho.ruotsi@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Environment :: Console", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Operating System :: MacOS :: MacOS X", "Operating System :: POSIX", "Operating System :: Unix", "Programming Language :: Python", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7", "Programming Language :: Python :: Implementation :: CPython", "Topic :: Software Development", "Topic :: Utilities" ], "description": "# \u00ccr\u00e0nl\u1ecd\u0301w\u1ecd\u0301\n[![Build Status](https://travis-ci.org/Niger-Volta-LTI/iranlowo.svg?branch=master)](https://travis-ci.org/Niger-Volta-LTI/iranlowo)\n[![PyPI](https://img.shields.io/pypi/v/iranlowo.svg)](https://pypi.org/project/iranlowo)\n![PyPI - Python Version](https://img.shields.io/pypi/pyversions/iranlowo.svg)\n[![License](https://black.readthedocs.io/en/stable/_static/license.svg)](https://github.com/ruohoruotsi/iranlowo/blob/master/LICENSE)\n[![Style](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/ambv/black)\n\n\u00ccr\u00e0nl\u1ecd\u0301w\u1ecd\u0301 is a set of utilities to analyze & process Yor\u00f9b\u00e1 text for NLP tasks. 
The focus is on *helping software developers* build large, clean text datasets for (further) diacritic restoration and machine translation tasks.\n\n## Features\n\n### ADR tools\n* [X] Strip all diacritics from word-types\n* [X] Verify that text is NFC or NFD\n* [X] Normalize a corpus (from MS Word or elsewhere) → NFC\n* [X] Split long sentences on certain characters like `;`,`:`, etc\n* [X] Automatically restore correct diacritics using a pre-trained model\n* [X] Find all variants of each word-type in a given corpus\n* [ ] Partially strip diacritics from word-types\n\n### Ready to use webpage scrapers\n* [X] B\u00edb\u00e9l\u00ec M\u00edm\u1ecd\u0301 (Biblica, Bible Society of Nigeria)\n* [ ] Yor\u00f9b\u00e1 Blog\n* [ ] BBC Yor\u00f9b\u00e1\n\n### Corpus analysis tools\n* [X] Dataset character distribution\n* [X] Dataset ambiguity statistics → Lexdif, etc for a given corpus\n* [ ] Dataset scoring (proximity to correctly diacritized text, LM perplexity, KL divergence)\n\n## Installation\nObtainable from the [Python Package Index (PyPI)](https://pypi.org/project/iranlowo/) → `pip install iranlowo`\n\n## Example\n\n* Show computing environment and installation process\n\n\n\n* Diacritize a phrase\n```\n$ python\nPython 3.7.3 (default, Mar 27 2019, 16:54:48)\n[Clang 4.0.1 (tags/RELEASE_401/final)] :: Anaconda, Inc. 
on darwin\nType \"help\", \"copyright\", \"credits\" or \"license\" for more information.\n>>> import iranlowo.adr as ra\u0301nl\u1ecd\n>>> ra\u0301nl\u1ecd.diacritize_text(\"lootoo ni pe ojo gbogbo ni ti ole\")\nPRED AVG SCORE: -0.0037, PRED PPL: 1.0037\n'l\u00f3\u00f2t\u00f3\u1ecd\u0301 ni p\u00e9 \u1ecdj\u1ecd\u0301 gbogbo ni ti ol\u00e8' \n```\n\n* Diacritize phrases. Note that we use `ipython` only because it renders nicer, easy-to-read text-colours in the terminal!\n\n\n\n## Disclaimer\n\nThis is beta software. If you pass the diacritizer [out-of-domain text](https://www.quora.com/What-is-in-domain-out-domain-and-open-domain-data), English, pidgin or any other non-Yor\u00f9b\u00e1 text, you will experience very marvelous, black-box results. \n\nSince this is a work-in-progress and we are steadily improving, if you encounter any problems with correctness or performance, please submit [pull-requests](https://github.com/ruohoruotsi/iranlowo/pulls) with corrections or file an [issue](https://github.com/ruohoruotsi/iranlowo/issues).\n\n## License\n\nThis project is licensed under the [MIT License](https://opensource.org/licenses/MIT).\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "https://github.com/Niger-Volta-LTI/iranlowo/archive/v0.0.8.3.tar.gz", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/Niger-Volta-LTI/iranlowo", "keywords": "", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "iranlowo", "package_url": "https://pypi.org/project/iranlowo/", "platform": "", "project_url": "https://pypi.org/project/iranlowo/", "project_urls": { "Download": "https://github.com/Niger-Volta-LTI/iranlowo/archive/v0.0.8.3.tar.gz", "Homepage": "https://github.com/Niger-Volta-LTI/iranlowo" }, "release_url": "https://pypi.org/project/iranlowo/0.0.8.3/", "requires_dist": [ "bs4", "configargparse", "torch", "numpy", "requests", "tqdm" ], "requires_python": "", "summary": "Utility 
library for analysis & (pre)processing of Yor\u00f9b\u00e1 text", "version": "0.0.8.3" }, "last_serial": 5496179, "releases": { "0.0.0": [ { "comment_text": "", "digests": { "md5": "09234a4823837c1cb5e3e921997b11ae", "sha256": "feda71093297b6e33d1b52a98019b27a844439e252c2cc442961b409240ccb13" }, "downloads": -1, "filename": "iranlowo-0.0.0.zip", "has_sig": false, "md5_digest": "09234a4823837c1cb5e3e921997b11ae", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8729, "upload_time": "2019-04-06T00:17:58", "url": "https://files.pythonhosted.org/packages/4b/ee/ca3bba2373c8b3ef8d4442564f0d386185e10d6b0316efe2eedfd6d81259/iranlowo-0.0.0.zip" } ], "0.0.4": [ { "comment_text": "", "digests": { "md5": "19d5e6a4ce4c84366e4b4eb4a7a55faf", "sha256": "36b752ebb285c4c1b53accb5358e778a8266c6922a9a6c481f461b700bde9b43" }, "downloads": -1, "filename": "iranlowo-0.0.4.zip", "has_sig": false, "md5_digest": "19d5e6a4ce4c84366e4b4eb4a7a55faf", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 65220255, "upload_time": "2019-05-22T16:20:09", "url": "https://files.pythonhosted.org/packages/5d/c8/7c7cec138fe80b5ceba12dd7fe6dbbf0e3be6dceaa39de2f69dc5247654c/iranlowo-0.0.4.zip" } ], "0.0.5.4": [ { "comment_text": "", "digests": { "md5": "20b55e1b43245e8e81bd2a89bb9268a9", "sha256": "f5c884f9341ecf1a335fbfaf0f6adfab36e4aa42e868a7e98b3416a85c94fba5" }, "downloads": -1, "filename": "iranlowo-0.0.5.4-py3-none-any.whl", "has_sig": false, "md5_digest": "20b55e1b43245e8e81bd2a89bb9268a9", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 65216378, "upload_time": "2019-05-27T04:04:08", "url": "https://files.pythonhosted.org/packages/80/44/c8d1f98dab639f6dc6f622d6551395c4ac2c2265d2e1ab9e4c7834bd3fd3/iranlowo-0.0.5.4-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "3b151642ff43dd518ec47292dae4bfcf", "sha256": "454f0a719ac07530b7313919e5348141ab0c29dcca6abce50d3dcbf3353dd3ff" }, 
"downloads": -1, "filename": "iranlowo-0.0.5.4.tar.gz", "has_sig": false, "md5_digest": "3b151642ff43dd518ec47292dae4bfcf", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 65189173, "upload_time": "2019-05-27T04:04:16", "url": "https://files.pythonhosted.org/packages/f8/48/9d3a42eea760285af290f460b0172fcb5a94742bf93ec746b8eab078cc43/iranlowo-0.0.5.4.tar.gz" } ], "0.0.6": [ { "comment_text": "", "digests": { "md5": "608aea4c8f22d0faba70e6d62e6ab1b9", "sha256": "4d476e6d3b894c3fe818739d7770d236520e6c62f8995cc02d0acee6f12f3425" }, "downloads": -1, "filename": "iranlowo-0.0.6-py3-none-any.whl", "has_sig": false, "md5_digest": "608aea4c8f22d0faba70e6d62e6ab1b9", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 65258866, "upload_time": "2019-05-28T04:53:15", "url": "https://files.pythonhosted.org/packages/63/b9/8f8a052a7dd7369c9ee048c59b3568dd9977933b6f0d52bb12899a1103bb/iranlowo-0.0.6-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "f6e89f00d7188f672a64a0fd1a98e86c", "sha256": "bc885e1a5dfa2438bf29dddfac7fab8406a077878bf7117de13d9e688b6d75f8" }, "downloads": -1, "filename": "iranlowo-0.0.6.tar.gz", "has_sig": false, "md5_digest": "f6e89f00d7188f672a64a0fd1a98e86c", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 65222539, "upload_time": "2019-05-28T04:53:24", "url": "https://files.pythonhosted.org/packages/8e/7a/a7dbb26f6c2c332da1b435be2f0af5c8f6ddb1fc6a93374bcc223650b482/iranlowo-0.0.6.tar.gz" } ], "0.0.7": [ { "comment_text": "", "digests": { "md5": "1218610b1e2ffcaf0ccf059e17e3ff86", "sha256": "714dd76a84858533f46d0a4d267b884c312d4d616c61161c82ffb751e8b992d4" }, "downloads": -1, "filename": "iranlowo-0.0.7-py3-none-any.whl", "has_sig": false, "md5_digest": "1218610b1e2ffcaf0ccf059e17e3ff86", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 87947120, "upload_time": "2019-06-19T07:49:36", "url": 
"https://files.pythonhosted.org/packages/6f/4e/b5bc328318d94d94fea7cb6be23fc2d42850293c2c1c5e2ece33a0b81376/iranlowo-0.0.7-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "f614d5e79bf1299062f85352e9066d16", "sha256": "90193d4f9afd769668dd193bb97259d1622b9a02aa55e9cd976f0f53c527ce3d" }, "downloads": -1, "filename": "iranlowo-0.0.7.tar.gz", "has_sig": false, "md5_digest": "f614d5e79bf1299062f85352e9066d16", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 87909074, "upload_time": "2019-06-19T07:49:46", "url": "https://files.pythonhosted.org/packages/55/77/deaa7ee80a419b60c65f3ff80c6694f1bdced572316af8c3004a3ada701b/iranlowo-0.0.7.tar.gz" } ], "0.0.8": [ { "comment_text": "", "digests": { "md5": "8303778ae8ef4b663f0cd8d8a29ecf5e", "sha256": "824d35c307ceec9563137f78e45bccae65a52b6b6b6014c49ffd4bb5a42afd26" }, "downloads": -1, "filename": "iranlowo-0.0.8-py3-none-any.whl", "has_sig": false, "md5_digest": "8303778ae8ef4b663f0cd8d8a29ecf5e", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 87947888, "upload_time": "2019-06-28T20:17:13", "url": "https://files.pythonhosted.org/packages/c8/8f/b6c1fb296ea2270000fada5cbc273ce5903db24fb0a4eaae83cc5131643b/iranlowo-0.0.8-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "0ea209b10575ff060439e59b881cae08", "sha256": "515070a81d89a457c29838f0b37441e863dd6c51873b1fcd314c2c622365eff5" }, "downloads": -1, "filename": "iranlowo-0.0.8.tar.gz", "has_sig": false, "md5_digest": "0ea209b10575ff060439e59b881cae08", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 87910728, "upload_time": "2019-06-28T20:17:25", "url": "https://files.pythonhosted.org/packages/7d/43/f6cea7541d0ce2668bc329d1d8ae5fd5ee0ae3276143ef46dbda29349aa7/iranlowo-0.0.8.tar.gz" } ], "0.0.8.3": [ { "comment_text": "", "digests": { "md5": "e19836c57f28ca0a929c9fd9641bd1a1", "sha256": 
"5679c3421f4092033bd86c60efeebf0273910c2b2a8c5fb3358518efb2ba72df" }, "downloads": -1, "filename": "iranlowo-0.0.8.3-py3-none-any.whl", "has_sig": false, "md5_digest": "e19836c57f28ca0a929c9fd9641bd1a1", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 87948582, "upload_time": "2019-07-07T04:18:11", "url": "https://files.pythonhosted.org/packages/39/84/fb9e39f146f3128c4976b851b92d230ef0de47fab051c92f56f5e69e762a/iranlowo-0.0.8.3-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "22e2aa01ff4918ff850ada8fa482c76d", "sha256": "ae62ea57b96b9d27bcd3e768655f7faffb3df7a1fd4f78f49db1ac9402dca619" }, "downloads": -1, "filename": "iranlowo-0.0.8.3.tar.gz", "has_sig": false, "md5_digest": "22e2aa01ff4918ff850ada8fa482c76d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 87911194, "upload_time": "2019-07-07T04:18:21", "url": "https://files.pythonhosted.org/packages/b0/e3/7516f763688cc1bae9e71db3b33c53d5313e16a52caeb2a89a2774e203a1/iranlowo-0.0.8.3.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "e19836c57f28ca0a929c9fd9641bd1a1", "sha256": "5679c3421f4092033bd86c60efeebf0273910c2b2a8c5fb3358518efb2ba72df" }, "downloads": -1, "filename": "iranlowo-0.0.8.3-py3-none-any.whl", "has_sig": false, "md5_digest": "e19836c57f28ca0a929c9fd9641bd1a1", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 87948582, "upload_time": "2019-07-07T04:18:11", "url": "https://files.pythonhosted.org/packages/39/84/fb9e39f146f3128c4976b851b92d230ef0de47fab051c92f56f5e69e762a/iranlowo-0.0.8.3-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "22e2aa01ff4918ff850ada8fa482c76d", "sha256": "ae62ea57b96b9d27bcd3e768655f7faffb3df7a1fd4f78f49db1ac9402dca619" }, "downloads": -1, "filename": "iranlowo-0.0.8.3.tar.gz", "has_sig": false, "md5_digest": "22e2aa01ff4918ff850ada8fa482c76d", "packagetype": "sdist", "python_version": "source", "requires_python": 
null, "size": 87911194, "upload_time": "2019-07-07T04:18:21", "url": "https://files.pythonhosted.org/packages/b0/e3/7516f763688cc1bae9e71db3b33c53d5313e16a52caeb2a89a2774e203a1/iranlowo-0.0.8.3.tar.gz" } ] }