{ "info": { "author": "Bystroushaak", "author_email": "bystrousak@kitakitsune.org", "bugtrack_url": null, "classifiers": [ "Development Status :: 5 - Production/Stable", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Natural Language :: Czech", "Programming Language :: Python :: 2", "Programming Language :: Python :: 2.7", "Topic :: Text Processing", "Topic :: Text Processing :: General" ], "description": "normalize_cz_unicode\n====================\n\n.. image:: https://badge.fury.io/py/normalize_cz_unicode.png\n :target: http://badge.fury.io/py/normalize_cz_unicode\n\n.. image:: https://pypip.in/d/normalize_cz_unicode/badge.png\n :target: https://pypi.python.org/pypi/normalize_cz_unicode\n\n\nSanitize Unicode inputs by replacing unwanted characters.\n\nThe principle of the module is simple: use a translation table. If a character is\nnot in the translation table, convert it to ``latin2``. If it can't be converted,\ntry to normalize it using Unicode `NFKD` normalization. If it can't be\nnormalized, replace it with ``?``.\n\nUsage\n-----\n\n.. code-block:: python\n\n >>> from normalize_cz_unicode import normalize\n\n.. code-block:: python\n\n >>> print normalize(\"Tohle je smajl\u00edk: \ud83d\ude2d , kter\u00fd tu ale nechci.\")\n Tohle je smajl\u00edk: ? , kter\u00fd tu ale nechci.\n\nVarious whitespace and special dash characters are normalized to basic ASCII:\n\n.. code-block:: python\n\n >>> a = u\"Spojovn\u00edky \u2015 a dal\u0161\u00ed hav\u011b\u0165 jako ned\u011bliteln\u00e9\u202fmezery\u2007taky nechci.\"\n u'Spojovn\\xedky \\u2015 a dal\\u0161\\xed hav\\u011b\\u0165 jako ned\\u011bliteln\\xe9\\u202fmezery\\u2007taky nechci.'\n >>> normalize(a)\n u'Spojovn\\xedky - a dal\\u0161\\xed hav\\u011b\\u0165 jako ned\\u011bliteln\\xe9 mezery taky nechci.'\n\n\nInstallation\n------------\n\nThe module is hosted at `PYPI `_, and\ncan be installed using `PIP`_::\n\n sudo pip install normalize_cz_unicode\n\n.. 
_PIP: http://en.wikipedia.org/wiki/Pip_%28package_manager%29\n\n\nChangelog\n=========\n\n1.0.1\n-----\n - Added caching.\n\n1.0.0\n-----\n - First working version.\n - Added tests.\n - Added documentation to README.rst.\n - Uploaded to `PYPI `_.\n\n0.1.0\n-----\n - Project created.", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/Bystroushaak/normalize_cz_unicode", "keywords": null, "license": "MIT", "maintainer": null, "maintainer_email": null, "name": "normalize_cz_unicode", "package_url": "https://pypi.org/project/normalize_cz_unicode/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/normalize_cz_unicode/", "project_urls": { "Download": "UNKNOWN", "Homepage": "https://github.com/Bystroushaak/normalize_cz_unicode" }, "release_url": "https://pypi.org/project/normalize_cz_unicode/1.0.1/", "requires_dist": null, "requires_python": null, "summary": "Take unicode string, leave czech characters, normalize rest.", "version": "1.0.1" }, "last_serial": 1556451, "releases": { "0.1.0": [], "1.0.0": [ { "comment_text": "", "digests": { "md5": "2f38306762b5210284bc1c96eac943ae", "sha256": "b0e0328671ce30361e2c9d2c025e5c26376f2f312df57ddf360b0daad34aceee" }, "downloads": -1, "filename": "normalize_cz_unicode-1.0.0.tar.gz", "has_sig": false, "md5_digest": "2f38306762b5210284bc1c96eac943ae", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4376, "upload_time": "2015-05-19T11:54:26", "url": "https://files.pythonhosted.org/packages/d3/b0/7336b703286b54b4d4ff21e2bfac22f55f97b116c63e4db439138bb0fd21/normalize_cz_unicode-1.0.0.tar.gz" } ], "1.0.1": [ { "comment_text": "", "digests": { "md5": "9ca1a0733212dafe91cd8903304920d7", "sha256": "5a552a043105fe8079ad6e0a1e5f7494124bf233b5988caea013402f47cbd5c8" }, "downloads": -1, "filename": "normalize_cz_unicode-1.0.1.tar.gz", "has_sig": false, 
"md5_digest": "9ca1a0733212dafe91cd8903304920d7", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4619, "upload_time": "2015-05-21T13:53:34", "url": "https://files.pythonhosted.org/packages/37/90/6815f724f70ade092855d53f88f7a32c482f1f01e958c8a9502b8926c118/normalize_cz_unicode-1.0.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "9ca1a0733212dafe91cd8903304920d7", "sha256": "5a552a043105fe8079ad6e0a1e5f7494124bf233b5988caea013402f47cbd5c8" }, "downloads": -1, "filename": "normalize_cz_unicode-1.0.1.tar.gz", "has_sig": false, "md5_digest": "9ca1a0733212dafe91cd8903304920d7", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4619, "upload_time": "2015-05-21T13:53:34", "url": "https://files.pythonhosted.org/packages/37/90/6815f724f70ade092855d53f88f7a32c482f1f01e958c8a9502b8926c118/normalize_cz_unicode-1.0.1.tar.gz" } ] }