{ "info": { "author": "Matthew Darling", "author_email": "matthewjdarling@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 5 - Production/Stable", "Intended Audience :: Science/Research", "License :: OSI Approved :: MIT License", "Natural Language :: English", "Operating System :: OS Independent", "Programming Language :: Python", "Programming Language :: Python :: 2", "Programming Language :: Python :: 3", "Topic :: Scientific/Engineering" ], "description": "\ufeffTransliteration is the conversion of text from one orthography to\r\nanother, or if you're a linguist, from a phonetic orthography to \r\ntranscription. I wrote this module to convert the orthography of an\r\nindigenuous Mexican language into IPA: their orthography was based\r\non Spanish, so there weren't any confusing irregularities like in\r\nEnglish.\r\n\r\nre_transliterate supports Unicode, though there might be corner cases\r\nfor particular languages. If that's the case, let me know, and I'll \r\nlook into using the regex module from PyPI rather than the built-in re\r\nmodule.\r\n\r\nUnlike the transliterate package on PyPI, re_transliterate allows for \r\nmapping between arbitrary strings: one (or more) letters can correspond\r\nto one (or more) letters. Since it's based on regular expressions, you\r\ncan also define language-specific character classes, like a set of\r\nvowels. This is extremely useful for handling diacritics and other\r\nsuprasegmental features.\r\n\r\nHow does it work?\r\n-----------------\r\n\r\nre_transliterate relies on Python dictionaries - mappings of x:y. These\r\nmappings should be written as regular expressions, which you may \r\nalready be familiar with. If not, find a computer science undergrad and\r\nmake them write your mappings :)\r\n\r\nIn simple cases, you can simply use normal strings (prefaced by 'u' for\r\nUnicode in Python 2):\r\n\r\n >>> consonant_map = {u\"ch\":u\"\u010d\", u\"x\u2019\":u\"\u0161\u2019\"}\r\n >>> word_replace(consonant_map, u\"ch\")\r\n \u010d\r\n\r\nWith that mapping, occurrences of \"ch\" will be turned into '\u010d' and \r\nejectives written with 'x' will instead use '\u0161'.\r\n\r\nThe power of regular expressions\r\n--------------------------------\r\n\r\nIn more complicated cases, allowing regular expressions makes life\r\nmuch simpler. We can use character classes to reduce repetition, and \r\nuse backslashes to do fancy things. If your string contains a \r\nbackslash, preface it with 'r' as well as 'u'. An example of both:\r\n\r\n >>> VOWELS = u\"([aeiou\u00e1\u00e9\u00ed\u00f3\u00fa])\"\r\n >>> long_vowels = {VOWELS + u\":\":ur\"\\g<1>\u02d0\"}\r\n >>> word_replace(long_vowels, u\"a:'\")\r\n a\u02d0\r\n\r\nWith this mapping, vowels followed by a colon character (':') will be\r\nchanged to use the IPA symbol for length ('\u02d0'). Here, square brackets []\r\nindicate a \"character class\": these characters will be treated in the \r\nsame way, such that the string \"a:\" will be matched just like \"i:\" or \r\n\"\u00fa:\".\r\n\r\nThen, the parentheses () indicate a \"group\" in the regular expression:\r\nthis allows us to reference what it is that the pattern matched - \r\nthat's what the \\g<1> does, referring to \"group 1\". If the pattern were\r\nto match \"a:\", it would insert \"a\u02d0\". This is the primary benefit of\r\nusing regular expressions: we don't have to specify that \"a:\" turns\r\ninto \"a\u02d0\", and that \"e:\" turns into \"e\u02d0\", etc.\r\n\r\nComplicated mappings\r\n--------------------\r\n\r\nIn the previous example, did you see that I used addition inside the\r\nmapping? This allows useful patterns and special Unicode sequences to\r\nbe saved globally and reused elsewhere. In my own work, I defined the \r\nabove character class of vowels, and the Unicode sequence for the \"tilde\r\nbelow letter\" diacritic:\r\n\r\n >>> LARYNGEAL = ur\"\\u0330\"\r\n\r\nThis allowed me to write mappings like this, where the ' character\r\nrepresents laryngealization and may occur before/after a marker for\r\nlength:\r\n\r\n >>> long_laryngealized = {VOWELS + u\"(:'|':)\":\r\n ur\"\\g<1>\" + LARYNGEAL + u\"\u02d0\"}\r\n\r\nThis converts any vowel, followed by either (marked by '|') \":'\" or\r\n\"':\", into the vowel + laryngealization diacritic + length marker.\r\n\r\nApplying your mappings\r\n----------------------\r\n\r\nOnce you've created all your mappings, you can then either apply them\r\none at a time to a word (word_replace) or list of words\r\n(word_list_replace). As you develop your conversion code, you should\r\nuse these functions first, to make sure everything is working correctly.\r\n\r\nGenerally, you'll want to have multiple mappings, because some\r\nconversions will be sensitive to ordering. For my work, I created three\r\nmappings: trigraphs (sequences of three characters), bigraphs, and \r\nmonographs. I then applied each of those in order, using the two\r\ntransliterate functions. Example:\r\n\r\n >>> re_trigraphs = {u\"lh\u2019\":u\"\u026c\u2019\",\r\n VOWELS + u\":'|':\":ur\"\\g<1>\" + LARYNGEAL + u\"\u02d0\"}\r\n >>> re_bigraphs = {u\"ch\":u\"\u010d\", u\"lh\":u\"\u026c\", u\"nh\":u\"\u014b\u0294\",\r\n u\"tz\":u\"c\", u\"uj\":u\"\u028d\",\r\n u\"s\u2019\":u\"s\u2019\", u\"x\u2019\":u\"\u0161\u2019\",\r\n VOWELS + u\":\":ur\"\\g<1>\u02d0\",\r\n VOWELS + u\"'\":ur\"\\g<1>\" + LARYNGEAL}\r\n >>> re_monographs = {u\"h\":u\"\u0294\", u\"r\":u\"\u027e\", u\"x\":u\"\u0161\"}\r\n >>> order = [re_trigraphs, re_bigraphs, re_monographs]\r\n >>> transliterate(order, u\"a:ma'ha:'pi'tz\u00edn\")\r\n a\u02d0ma\u0330\u0294a\u0330\u02d0pi\u0330c\u00edn\r\n\r\nThis way, the conversion for \"lh\":\"\u026c\" won't interrupt the conversion of\r\n\"lh\u2019\":u\"\u026c\u2019\". Alternatively, I suppose you could use an OrderedDict with\r\nthe word_replace/word_list_replace functions, but I personally like it\r\nthis way.\r\n\r\nConclusion\r\n----------\r\n\r\nIf you're writing your code in an interactive environment such as IPython,\r\nnote that if you change the contents of a dictionary, it will not be\r\nupdated inside any lists you may have created. I learned this the hard way,\r\nwhen my code didn't work until I redefined the order list used above. If you\r\nsee any weird bugs, that might be the cause.\r\n\r\nOther than that, if you have any other questions/issues with the module,\r\nshoot me an e-mail; especially if you run into any issues with Unicode\r\nsupport or bugs specific to Python 3/PyPy/another interpreter.\r\n\r\nFor an example transliteration script, see the script I wrote for\r\n`Upper Necaxa Totonac `_.\r\nThe example usage above is all drawn from that code - but it might help to see\r\na working example.\r\n\r\n\ufeff========\r\nVersions\r\n========\r\n\r\n1.3\r\n----------\r\n\r\nDebugged readme examples (Unicode escape was wrong, and | was used incorrectly)\r\n\r\n1.2\r\n----------\r\n\r\nReadme should be perfect\r\n\r\n1.1\r\n----------\r\n\r\nFixing readme for PyPI\r\n\r\n1.0\r\n----------\r\n\r\nFactored out of the UNT_to_IPA module", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/MatthewDarling/re_transliterate/", "keywords": "", "license": "http://opensource.org/licenses/MIT", "maintainer": "", "maintainer_email": "", "name": "re_transliterate", "package_url": "https://pypi.org/project/re_transliterate/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/re_transliterate/", "project_urls": { "Download": "UNKNOWN", "Homepage": "https://github.com/MatthewDarling/re_transliterate/" }, "release_url": "https://pypi.org/project/re_transliterate/1.3/", "requires_dist": null, "requires_python": null, "summary": "Functions for transliteration using regular expressions", "version": "1.3" }, "last_serial": 925787, "releases": { "1.0": [ { "comment_text": "", "digests": { "md5": "f9940211bf789bb40453f0e0392b0036", "sha256": "eb7e1b0eb63bc045f5d3a25544fac9582bdc73176fe5980babc3699c3e1b5089" }, "downloads": -1, "filename": "re_transliterate-1.0-py27-none-any.whl", "has_sig": false, "md5_digest": "f9940211bf789bb40453f0e0392b0036", "packagetype": "bdist_wheel", "python_version": "2.7", "requires_python": null, "size": 7665, "upload_time": "2013-11-20T22:43:34", "url": "https://files.pythonhosted.org/packages/f4/db/4c7bb2a073323d25b56b102be5e20daa6c97c165f09890a844327b6091ba/re_transliterate-1.0-py27-none-any.whl" }, { "comment_text": "", "digests": { "md5": "6ecf6feb4028d0220fc5e4b24ef12c01", "sha256": "cc0fc46cae7d6b81ea0f60659c311bd0af0a48154f4812bf89f6e37ddbc11f1a" }, "downloads": -1, "filename": "re_transliterate-1.0.zip", "has_sig": false, "md5_digest": "6ecf6feb4028d0220fc5e4b24ef12c01", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 12006, "upload_time": "2013-11-20T22:43:21", "url": "https://files.pythonhosted.org/packages/12/ed/5c808ed1c40d37f31597239772a8641c3f459a9be418e61bfe7e2f2d22d8/re_transliterate-1.0.zip" } ], "1.1": [ { "comment_text": "", "digests": { "md5": "22ef1254d934076374dd95b7a5d2f75e", "sha256": "5871010b82d8cb284f97d71364688614b929b9fc8f92667056e62ba5d09678f8" }, "downloads": -1, "filename": "re_transliterate-1.1-py27-none-any.whl", "has_sig": false, "md5_digest": "22ef1254d934076374dd95b7a5d2f75e", "packagetype": "bdist_wheel", "python_version": "2.7", "requires_python": null, "size": 7692, "upload_time": "2013-11-20T22:48:18", "url": "https://files.pythonhosted.org/packages/1a/51/753b1aa5bb387c432b4abeebab5aa0bccbdfcedb09ac58923b324e70f013/re_transliterate-1.1-py27-none-any.whl" }, { "comment_text": "", "digests": { "md5": "43da66313f25ac20aa3ca37c81c62caf", "sha256": "d433f6f5cf8b017c51148b7d92fb3be533d0f53415ffac38de49a3abc451c3ec" }, "downloads": -1, "filename": "re_transliterate-1.1.zip", "has_sig": false, "md5_digest": "43da66313f25ac20aa3ca37c81c62caf", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 12048, "upload_time": "2013-11-20T22:48:29", "url": "https://files.pythonhosted.org/packages/67/4a/8c61df330d1a1d43859fe070cd96f00c5ec265add997cd06e9aa8f6f8652/re_transliterate-1.1.zip" } ], "1.2": [ { "comment_text": "", "digests": { "md5": "bf598f64387c46968319c546bff71c80", "sha256": "3f122064f7b85a86d3b3524f20f36e13003f5cea02868bcbb9602a80999d6612" }, "downloads": -1, "filename": "re_transliterate-1.2-py27-none-any.whl", "has_sig": false, "md5_digest": "bf598f64387c46968319c546bff71c80", "packagetype": "bdist_wheel", "python_version": "2.7", "requires_python": null, "size": 7713, "upload_time": "2013-11-20T22:50:14", "url": "https://files.pythonhosted.org/packages/fa/d6/62edf897118957e99cd34573b5a4264bfe9ed9f7f7b253eb9344169e341a/re_transliterate-1.2-py27-none-any.whl" }, { "comment_text": "", "digests": { "md5": "b43a02f3f455f40a5a5ee43165d0103d", "sha256": "414869ee869b728bc484445184fdb6027d7b2868a1762a3bb7fb241b0a2061b4" }, "downloads": -1, "filename": "re_transliterate-1.2.zip", "has_sig": false, "md5_digest": "b43a02f3f455f40a5a5ee43165d0103d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 12092, "upload_time": "2013-11-20T22:50:01", "url": "https://files.pythonhosted.org/packages/6d/72/7a5f80ae21fdc640e08ccd73624ed9ba7f12dd129006863ae44fb20ff560/re_transliterate-1.2.zip" } ], "1.3": [ { "comment_text": "", "digests": { "md5": "4b713b8e40083dd6a1730144d1aaf8cf", "sha256": "bb546dcad9686230ab37da99e83b02a0d74717d77005e2f0cddfaeb1c59ee856" }, "downloads": -1, "filename": "re_transliterate-1.3-py27-none-any.whl", "has_sig": false, "md5_digest": "4b713b8e40083dd6a1730144d1aaf8cf", "packagetype": "bdist_wheel", "python_version": "2.7", "requires_python": null, "size": 8081, "upload_time": "2013-11-21T17:01:52", "url": "https://files.pythonhosted.org/packages/b2/64/b8da2008704628babd2fb95a0f34b7b044e5060534244f1544183784723b/re_transliterate-1.3-py27-none-any.whl" }, { "comment_text": "", "digests": { "md5": "d255ecf726b1084dba060b71f9fe67a6", "sha256": "746f4485266bd3d11979c1d41eee5759af1a990f083527e056ab18764a9628ff" }, "downloads": -1, "filename": "re_transliterate-1.3.zip", "has_sig": false, "md5_digest": "d255ecf726b1084dba060b71f9fe67a6", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 12634, "upload_time": "2013-11-21T17:02:11", "url": "https://files.pythonhosted.org/packages/b4/67/5497989e51341c478a1672bd761c83c0c8b5675c0e6824ff97c0821e5a56/re_transliterate-1.3.zip" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "4b713b8e40083dd6a1730144d1aaf8cf", "sha256": "bb546dcad9686230ab37da99e83b02a0d74717d77005e2f0cddfaeb1c59ee856" }, "downloads": -1, "filename": "re_transliterate-1.3-py27-none-any.whl", "has_sig": false, "md5_digest": "4b713b8e40083dd6a1730144d1aaf8cf", "packagetype": "bdist_wheel", "python_version": "2.7", "requires_python": null, "size": 8081, "upload_time": "2013-11-21T17:01:52", "url": "https://files.pythonhosted.org/packages/b2/64/b8da2008704628babd2fb95a0f34b7b044e5060534244f1544183784723b/re_transliterate-1.3-py27-none-any.whl" }, { "comment_text": "", "digests": { "md5": "d255ecf726b1084dba060b71f9fe67a6", "sha256": "746f4485266bd3d11979c1d41eee5759af1a990f083527e056ab18764a9628ff" }, "downloads": -1, "filename": "re_transliterate-1.3.zip", "has_sig": false, "md5_digest": "d255ecf726b1084dba060b71f9fe67a6", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 12634, "upload_time": "2013-11-21T17:02:11", "url": "https://files.pythonhosted.org/packages/b4/67/5497989e51341c478a1672bd761c83c0c8b5675c0e6824ff97c0821e5a56/re_transliterate-1.3.zip" } ] }