{ "info": { "author": "David Kuryakin", "author_email": "dkuryakin@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Environment :: Web Environment", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python", "Topic :: Internet :: WWW/HTTP", "Topic :: Software Development :: Libraries :: Python Modules", "Topic :: Text Processing :: Linguistic" ], "description": "recoder\n=======\n\n\nPurpose\n-------\n\nThis package repairs mojibake (\"\u043a\u0440\u0430\u043a\u043e\u0437\u044f\u0431\u0440\u044b\" or \"\u043a\u0440\u0430\u043a\u0430\u0437\u044f\u0431\u0440\u044b\" in Russian) by converting it back into readable text. For example: \"\u00f5\u00ee\u011f\u00ee\u00f8\u00e8\u00e9 \u00f2\u00e5\u00ea\u00f1\u00f2\" => \"\u0445\u043e\u0440\u043e\u0448\u0438\u0439 \u0442\u0435\u043a\u0441\u0442\".\n\n\nInstallation\n------------\n::\n\n $ git clone https://bitbucket.org/dkuryakin/recoder.git\n $ cd recoder && python setup.py install\n\nor\n::\n\n $ pip install recoder\n\nUseful commands\n---------------\n\nUsage as a command-line tool.\n::\n\n $ echo \"\u00ce\u00f1\u00ed\u00ee\u00e2\u00ed\u00e0\u00ff \u00ce\u00eb\u00e8\u00ec\u00ef\u00e8\u00e9\u00f1\u00ea\u00e0\u00ff \u00e4\u00e5\u00f0\u00e5\u00e2\u00ed\u00ff \u00e2\" | python -mrecoder [coding]\n\nBy default, 
coding=utf-8.\n\nUsage in code\n-------------\n\nIn most cases this basic example is enough to fix mojibake:\n\n.. code-block:: python\n\n from recoder.cyrillic import Recoder\n rec = Recoder()\n broken_text = u'\u00ce\u00f1\u00ed\u00ee\u00e2\u00ed\u00e0\u00ff \u00ce\u00eb\u00e8\u00ec\u00ef\u00e8\u00e9\u00f1\u00ea\u00e0\u00ff \u00e4\u00e5\u00f0\u00e5\u00e2\u00ed\u00ff \u00e2'\n fixed_text = rec.fix_common(broken_text)\n print fixed_text.encode('utf-8')\n\n\nIf the basic example does not help, try adjusting the settings:\n\n.. code-block:: python\n\n from recoder.cyrillic import Recoder\n rec = Recoder(depth=4)\n broken_text = u'...'\n fixed_text = rec.fix(broken_text) # fix is slower and more complex than fix_common\n ...\n\n\nFrequently used words (\u0438, \u043d\u0430, \u043a, \u0432, ...) can serve as an indicator that recoding succeeded. 
In that case, however, the text is fixed only if it contains these words:\n\n.. code-block:: python\n\n from recoder.cyrillic import Recoder\n rec = Recoder(use_plus_words=True)\n ...\n\n\nNotes\n-----\n\nOnly Cyrillic is supported at the moment.\n\nExtending\n---------\n\nTo extend the library beyond Cyrillic, a convenient tool is provided:\n::\n\n $ cat some_learning_text.txt | python -mrecoder.builder [coding]\n\nBy default, coding=utf-8. Feed the training text to stdin. The output is two files: 3grams.json and plus_words.json. 
The rest is done by analogy with recoder.cyrillic.\n\nTests\n-----\n\nThis part is simple:\n::\n\n $ git clone https://bitbucket.org/dkuryakin/recoder.git\n $ cd recoder && python setup.py test\n\nChangelog\n---------\n\nv0.1.0\n - Implementation of the basic functionality.\n\nv0.2.0\n - Added decoders. It can now decode mojibake like the following (examples taken from 2cyr.com):\n - - èðèëèöà\n - - %D0%A2%D0%BE%D0%B2%D0%B0+%D0%B5+%D0%BA\n - - åäíà ãîäè\n - - ирилица\n\nv0.3.0\n - Added python3 support.\n\nv0.3.1\n - Removed the regex package from the dependencies.\n - Minor fixes.", "description_content_type": null, "docs_url": null, "download_url": "https://bitbucket.org/dkuryakin/recoder/get/master.tar.gz", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://bitbucket.org/dkuryakin/recoder", "keywords": "cyrillic,encoding,coding,fix,decoder,recoder,i18n", "license": "mit", "maintainer": null, "maintainer_email": null, "name": "recoder", "package_url": 
"https://pypi.org/project/recoder/", "platform": "any", "project_url": "https://pypi.org/project/recoder/", "project_urls": { "Download": "https://bitbucket.org/dkuryakin/recoder/get/master.tar.gz", "Homepage": "https://bitbucket.org/dkuryakin/recoder" }, "release_url": "https://pypi.org/project/recoder/0.3.1/", "requires_dist": null, "requires_python": null, "summary": "Tool (and lib) for coding fix.", "version": "0.3.1" }, "last_serial": 1042123, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "md5": "b34ef7cdb8bf536521221664b9be50ec", "sha256": "2ddcff3f415b7dd2bd5dd96eccb12b5e13405788692e5bbd621a5dffb2cb6576" }, "downloads": -1, "filename": "recoder-0.1.0.tar.gz", "has_sig": false, "md5_digest": "b34ef7cdb8bf536521221664b9be50ec", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 95896, "upload_time": "2014-03-25T02:01:04", "url": "https://files.pythonhosted.org/packages/84/72/ec43863086a9d6eb78a9ba27c5dddd24b1fe196b5256320fdb727f3b7897/recoder-0.1.0.tar.gz" } ], "0.2.0": [ { "comment_text": "", "digests": { "md5": "8bb2020878e8cbeefcaa596ec23f29f8", "sha256": "bbde4be2f0d7b22c44e404fd8ba894c104182baaf5eb70d2dd0cd96cb3b4f154" }, "downloads": -1, "filename": "recoder-0.2.0.tar.gz", "has_sig": false, "md5_digest": "8bb2020878e8cbeefcaa596ec23f29f8", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 96372, "upload_time": "2014-03-25T15:13:45", "url": "https://files.pythonhosted.org/packages/30/70/70327006cce0556b43e2c1bacd959faef86778d355e91e849329d6fe930d/recoder-0.2.0.tar.gz" } ], "0.3.0": [ { "comment_text": "", "digests": { "md5": "7116a58444dd2f98bdec81dee4371e0e", "sha256": "95f41629015a5cf35ec9b670d3aaa86c67dfdc6b7ab1856af0646213be837bae" }, "downloads": -1, "filename": "recoder-0.3.0.tar.gz", "has_sig": false, "md5_digest": "7116a58444dd2f98bdec81dee4371e0e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 96801, "upload_time": 
"2014-03-26T07:56:36", "url": "https://files.pythonhosted.org/packages/31/33/968185686c87862f281ba3e52dce166c90a57b6e831e93ddb749ca57589e/recoder-0.3.0.tar.gz" } ], "0.3.1": [ { "comment_text": "", "digests": { "md5": "dc856f52760583e28a5d6a7bf30ece62", "sha256": "b56f0f5d86997288b2da11b44c6e855ea4977a848990252528f63fa03dd2cdb4" }, "downloads": -1, "filename": "recoder-0.3.1.tar.gz", "has_sig": false, "md5_digest": "dc856f52760583e28a5d6a7bf30ece62", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 96815, "upload_time": "2014-03-26T14:57:17", "url": "https://files.pythonhosted.org/packages/3a/a7/d759e65694e9206a843c437d0d8c9fae4d9c7d613fd91f79e09c68e3d357/recoder-0.3.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "dc856f52760583e28a5d6a7bf30ece62", "sha256": "b56f0f5d86997288b2da11b44c6e855ea4977a848990252528f63fa03dd2cdb4" }, "downloads": -1, "filename": "recoder-0.3.1.tar.gz", "has_sig": false, "md5_digest": "dc856f52760583e28a5d6a7bf30ece62", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 96815, "upload_time": "2014-03-26T14:57:17", "url": "https://files.pythonhosted.org/packages/3a/a7/d759e65694e9206a843c437d0d8c9fae4d9c7d613fd91f79e09c68e3d357/recoder-0.3.1.tar.gz" } ] }