{ "info": { "author": "Lu\u00eds Gomes", "author_email": "luismsgomes@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 5 - Production/Stable", "Intended Audience :: Developers", "License :: OSI Approved :: GNU Lesser General Public License v2 or later (LGPLv2+)", "Programming Language :: Python :: 3.5", "Topic :: Text Processing :: Linguistic" ], "description": "mosestokenizer\n==============\n\nThis package provides wrappers for some pre-processing Perl scripts from the\nMoses toolkit, namely, ``normalize-punctuation.perl``, ``tokenizer.perl``,\n``detokenizer.perl`` and ``split-sentences.perl``.\n\nSample Usage\n------------\n\nAll provided classes are importable from the package ``mosestokenizer``.\n\n >>> from mosestokenizer import *\n\nAll classes have a constructor that takes a two-letter language code as\nargument (``'en'``, ``'fr'``, ``'de'``, etc) and the resulting objects\nare callable.\n\nWhen created, these wrapper objects launch the corresponding Perl script as a\nbackground process. 
When the objects are no longer needed, you should call the\n``.close()`` method to close the background process and free system resources.\n\nThe objects also support the context manager interface.\nThus, if used within a ``with`` block, the ``.close()`` method is invoked\nautomatically when the block exits.\n\nThe following two usages of ``MosesTokenizer`` are equivalent:\n\n >>> # here we will call .close() explicitly at the end:\n >>> tokenize = MosesTokenizer('en')\n >>> tokenize('Hello World!')\n ['Hello', 'World', '!']\n >>> tokenize.close()\n\n >>> # here we take advantage of the context manager interface:\n >>> with MosesTokenizer('en') as tokenize:\n ...     tokenize('Hello World!')\n ...\n ['Hello', 'World', '!']\n\nAs shown above, ``MosesTokenizer`` callable objects take a string and return a\nlist of tokens (strings).\n\nBy contrast, ``MosesDetokenizer`` takes a list of tokens and returns a string:\n\n >>> with MosesDetokenizer('en') as detokenize:\n ...     detokenize(['Hello', 'World', '!'])\n ...\n 'Hello World!'\n\n``MosesSentenceSplitter`` does more than the name says. Besides splitting\nsentences, it will also unwrap text, i.e. it will try to guess whether a sentence\ncontinues on the next line or not. It takes a list of lines (strings) and\nreturns a list of sentences (strings):\n\n >>> with MosesSentenceSplitter('en') as splitsents:\n ...     splitsents([\n ...         'Mr. Smith is away. Do you want to',\n ...         'leave a message?'\n ...     ])\n ...\n ['Mr. 
Smith is away.', 'Do you want to leave a message?']\n\n\n``MosesPunctuationNormalizer`` objects take a string as argument and return a\nstring:\n\n >>> with MosesPunctuationNormalizer('en') as normalize:\n ...     normalize('\u00abHello World\u00bb \u2014 she said\u2026')\n ...\n '\"Hello World\" - she said...'\n\n\nLicense\n-------\n\nCopyright \u00ae 2016-2021, Lu\u00eds Gomes.\n\nThis library is free software; you can redistribute it and/or\nmodify it under the terms of the GNU Lesser General Public\nLicense as published by the Free Software Foundation; either\nversion 2.1 of the License, or (at your option) any later version.\n\nThis library is distributed in the hope that it will be useful,\nbut WITHOUT ANY WARRANTY; without even the implied warranty of\nMERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU\nLesser General Public License for more details.\n\nYou should have received a copy of the GNU Lesser General Public\nLicense along with this library; if not, write to the Free Software\nFoundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA\n02110-1301 USA", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/luismsgomes/mosestokenizer", "keywords": "text tokenization pre-processing", "license": "LGPLv2", "maintainer": "", "maintainer_email": "", "name": "mosestokenizer", "package_url": "https://pypi.org/project/mosestokenizer/", "platform": "", "project_url": "https://pypi.org/project/mosestokenizer/", "project_urls": { "Homepage": "https://github.com/luismsgomes/mosestokenizer" }, "release_url": "https://pypi.org/project/mosestokenizer/1.2.1/", "requires_dist": null, "requires_python": "", "summary": "Wrappers for several pre-processing scripts from the Moses toolkit.", "version": "1.2.1", "yanked": false, "yanked_reason": null }, "last_serial": 11803625, "releases": { "0.3.0": [ { "comment_text": "", "digests": { "md5": 
"449bb9b59f795fa5422fade7b60cc044", "sha256": "923adc38d209dd5b482dd3e2eb3baccfa9988c4d018b7ae9d3357a11de8655dc" }, "downloads": -1, "filename": "mosestokenizer-0.3.0-py3-none-any.whl", "has_sig": false, "md5_digest": "449bb9b59f795fa5422fade7b60cc044", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 43996, "upload_time": "2016-08-17T20:25:27", "upload_time_iso_8601": "2016-08-17T20:25:27.657741Z", "url": "https://files.pythonhosted.org/packages/e6/ba/5665197497573bc4d91b2dd2049f52423b86bc1e62fe78df677fd0828b33/mosestokenizer-0.3.0-py3-none-any.whl", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "6bb80ceb25946503884f0ebeceb5c519", "sha256": "c8d9566f3646a938e5c1321b6aba87b02613003891f55e0db08cd10db16ea6ae" }, "downloads": -1, "filename": "mosestokenizer-0.3.0.tar.gz", "has_sig": false, "md5_digest": "6bb80ceb25946503884f0ebeceb5c519", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 31365, "upload_time": "2016-08-17T20:25:17", "upload_time_iso_8601": "2016-08-17T20:25:17.960661Z", "url": "https://files.pythonhosted.org/packages/d7/9c/ff162aa1452ea6789d04d5bcd5d74ac777735f96d030ae0d93b8b394aedf/mosestokenizer-0.3.0.tar.gz", "yanked": false, "yanked_reason": null } ], "0.5.0": [ { "comment_text": "", "digests": { "md5": "f82674f9d9ab736a79f10a6a88a5fc29", "sha256": "e050a2c4a9114f1f11043532aeb2ce091f2b8da613a3007be117d1122a2c92db" }, "downloads": -1, "filename": "mosestokenizer-0.5.0-py3-none-any.whl", "has_sig": false, "md5_digest": "f82674f9d9ab736a79f10a6a88a5fc29", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 49442, "upload_time": "2017-05-24T09:15:36", "upload_time_iso_8601": "2017-05-24T09:15:36.042042Z", "url": "https://files.pythonhosted.org/packages/bf/6a/f483e313a4a75b81edce19c2be85af7f729fece96988b188d0ff7982e343/mosestokenizer-0.5.0-py3-none-any.whl", "yanked": false, "yanked_reason": null }, { 
"comment_text": "", "digests": { "md5": "bea93689fcb2b735d588edbb3101a00c", "sha256": "d0f541fdae06a1257d7c708a9222121c951f78991a621a6a6809a4e69fdaf32c" }, "downloads": -1, "filename": "mosestokenizer-0.5.0.tar.gz", "has_sig": false, "md5_digest": "bea93689fcb2b735d588edbb3101a00c", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 32645, "upload_time": "2017-05-24T09:15:29", "upload_time_iso_8601": "2017-05-24T09:15:29.874735Z", "url": "https://files.pythonhosted.org/packages/0d/b6/99817d3f595b4c37253df7c0998aaf59df3afe15298ec73aed69f65d4049/mosestokenizer-0.5.0.tar.gz", "yanked": false, "yanked_reason": null } ], "1.0.0": [ { "comment_text": "", "digests": { "md5": "dd56b4ad98df0fef082caceb0f3b2a9d", "sha256": "4de94102c00ad21ea26c1d8327bf72d38288c6c83c9b2920fb9a86c66eddf8b7" }, "downloads": -1, "filename": "mosestokenizer-1.0.0-py3-none-any.whl", "has_sig": false, "md5_digest": "dd56b4ad98df0fef082caceb0f3b2a9d", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 51182, "upload_time": "2017-05-24T10:29:35", "upload_time_iso_8601": "2017-05-24T10:29:35.078066Z", "url": "https://files.pythonhosted.org/packages/45/c6/913c968e5cbcaff6cdd2a54a1008330c01a573ecadcdf9f526058e3d33a0/mosestokenizer-1.0.0-py3-none-any.whl", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "ab8f1fea7c23bfbc132f36aee4eb1808", "sha256": "2d65a781add83e93612a5e491a2cfc9c3740048b8a028556a4e23fceb1a7d48a" }, "downloads": -1, "filename": "mosestokenizer-1.0.0.tar.gz", "has_sig": false, "md5_digest": "ab8f1fea7c23bfbc132f36aee4eb1808", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 33749, "upload_time": "2017-05-24T10:29:28", "upload_time_iso_8601": "2017-05-24T10:29:28.681515Z", "url": "https://files.pythonhosted.org/packages/dc/12/cdc143b9e13c3f235ff10de86a16c8074982be6b5b22be9724603bb4872a/mosestokenizer-1.0.0.tar.gz", "yanked": false, "yanked_reason": 
null } ], "1.1.0": [ { "comment_text": "", "digests": { "md5": "5cfc8b92b8f12649f18148064124a4a7", "sha256": "27520b3156bc43457ef4272eb8dc40d865e9d0160422e32256ed436df0f00261" }, "downloads": -1, "filename": "mosestokenizer-1.1.0.tar.gz", "has_sig": false, "md5_digest": "5cfc8b92b8f12649f18148064124a4a7", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 37054, "upload_time": "2019-10-24T13:54:39", "upload_time_iso_8601": "2019-10-24T13:54:39.339102Z", "url": "https://files.pythonhosted.org/packages/4b/b3/c0af235b16c4f44a2828ef017f7947d1262b2646e440f85c6a2ff26a8c6f/mosestokenizer-1.1.0.tar.gz", "yanked": false, "yanked_reason": null } ], "1.2.1": [ { "comment_text": "", "digests": { "md5": "0004d7cb0200633ac0ce49d25683007a", "sha256": "438b3e35a221f7930c408e97e3f38af6d0cec74b991eb9edb00a44e3510e836d" }, "downloads": -1, "filename": "mosestokenizer-1.2.1.tar.gz", "has_sig": false, "md5_digest": "0004d7cb0200633ac0ce49d25683007a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 37120, "upload_time": "2021-10-22T14:15:07", "upload_time_iso_8601": "2021-10-22T14:15:07.205726Z", "url": "https://files.pythonhosted.org/packages/8d/84/4f3c1b5b8d796a07e3816cd41f7b1491e2291db4ade5f17b850116fd80e5/mosestokenizer-1.2.1.tar.gz", "yanked": false, "yanked_reason": null } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "0004d7cb0200633ac0ce49d25683007a", "sha256": "438b3e35a221f7930c408e97e3f38af6d0cec74b991eb9edb00a44e3510e836d" }, "downloads": -1, "filename": "mosestokenizer-1.2.1.tar.gz", "has_sig": false, "md5_digest": "0004d7cb0200633ac0ce49d25683007a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 37120, "upload_time": "2021-10-22T14:15:07", "upload_time_iso_8601": "2021-10-22T14:15:07.205726Z", "url": "https://files.pythonhosted.org/packages/8d/84/4f3c1b5b8d796a07e3816cd41f7b1491e2291db4ade5f17b850116fd80e5/mosestokenizer-1.2.1.tar.gz", "yanked": false, 
"yanked_reason": null } ], "vulnerabilities": [] }