{ "info": { "author": "Hanefi Onaldi", "author_email": "abdullahanefi16@gmail.com", "bugtrack_url": null, "classifiers": [], "description": "\ufeff# Turkish Stemmer for Python\n\nNote : Most of the documentation taken from elasticsearch-analysis-turkishstemmer project. \n\nStemmer algorithm for Turkish language.\n\n## Introduction to Turkish language morphology\n\n> Turkish is an agglutinative language and has a very rich morphological\n stucture. In Turkish, you can form many different words from a single stem by\n appending a sequence of suffixes. For example The word \"doktoruymu\u015fsunuz\"\n means \"You had been the doctor of him\". The stem of the word is \"doktor\" and\n it takes three different suffixes -sU, -ymU\u015f, and -sUnUz.\n\nFrom \"Snowball Description\":\n\n> Words are usually composed of a stem and of at least two or three affixes\n appended to it.\n\n> We can analyze noun suffixes in Turkish in two groups. Noun suffixes (eg.\n \"doktor-um\" meaning \"my doctor\") and nominal verb suffixes (eg. \"doktor-dur\"\n meaning \u2018is a doctor\u2019). The words ending with nominal verb suffixes can be\n used as verbs in sentences. There are over thirty different suffixes\n classified in these two general groups of suffixes.\n\n> In Turkish, the suffixes are affixed to the stem according to definite\n ordering rules.\n\nFrom \"An affix stripping morphological analyzer for Turkish\" paper:\n\n> Turkish has a special place within the natural languages not only being a\n fully concatenative language but also having the suffixes as the only affix\n type. Another feature of the language is that, someone who knows Turkish can\n easily analyze a word even if he/she does not know its stem.\n\n> The phonological rules of Turkish are significant factors that influence\n this feature.\n Ex: (any word)lerim => (any word)-ler-im\n \"ler\" plural suffix, \"im\" 1st singular person possessive.\n\n### Rules\n\n1. The only affix type in Turkish is the suffix.\n\n2. A plural suffix cannot follow a possesive suffix.\n\n3. A suffix in Turkish can have multiple allomorphs in order to provide sound\n harmony in the word to which it is affixed.\n\n4. In Turkish each vowel indicates a distinct syllable.\n\n5. In Turkish, single syllable words are mostly the stem itself\n\n6. If a word has nominal __verb__ suffixes, they always appear at the end of\n the word. They follow __noun__ suffixes or the stem itself at the absence\n of noun suffixes\n\n7. In Turkish, \u201c-lAr\u201d suffix can be used both as a nominal verb suffix (third\n person plural present tense) and as a noun suffix (plural inflection).\n\n8. In Turkish, words do not end with consonants 'b', 'c', 'd', and '\u011f'.\n However, when a suffix starting with a vowel is affixed to a word ending\n with 'p', '\u00e7', 't' or 'k', the last consonant is transformed into 'b', 'c',\n 'd', or '\u011f' respectively. The postlude routine transforms last consonants\n 'b', 'c','d', or '\u011f'' back to 'p', '\u00e7', 't' or 'k', respectively, after\n stemming is complete.\n\n### Suffix Classes\n\nClass | Type\n-----------------------------|----------------\nNominal verb suffixes | Inflectional\nDerivational suffixes | Derivational\nNoun suffixes | Inflectional\nTense & person verb suffixes | Inflectional\nVerb suffixes | Inflectional\n\n### Suffix allomorphs\n\nSuffix allomorphs are used to create a good sound harmony. They do not change\nthe meaning of the word. If a suffix has a capital letter then it has an\nallomorh. If a suffix has a letter in parentheses then it can be omitted.\nPossible allomorphs are given below:\n\nLetter | Allomorph\n-------|------------\nU | \u0131,i,u,\u00fc\nC | c,\u00e7\nA | a,e\nD | d,t\nI | \u0131,I\n\n### Nominal Verb Suffixes\n\na/a | Suffix\n----|------------------\n1 | \u2013(y)Um\n2 | \u2013sUn\n3 | \u2013(y)Uz\n4 | \u2013sUnUz\n5 | \u2013lAr\n6 | \u2013md\n7 | \u2013n\n8 | \u2013k\n9 | \u2013nUz\n10 | \u2013DUr\n11 | \u2013cAsInA\n12 | \u2013(y)DU\n13 | \u2013(y)sA\n14 | \u2013(y)mU\u015f\n15 | \u2013(y)ken\n\nSuffix transition ordering for nominal verbs can be seen in References[5]\n\n### Noun Suffixes\n\na/a | Suffixes\n----|-------------\n1 | \u2013lAr\n2 | \u2013(U)m\n3 | \u2013(U)mUz\n4 | \u2013(U)n\n5 | \u2013(U)nUz\n6 | \u2013(s)U\n7 | \u2013lArI\n8 | \u2013(y)U\n9 | \u2013nU\n10 | \u2013(n)Un\n11 | \u2013(y)A\n12 | \u2013nA\n13 | \u2013DA\n14 | \u2013nDA\n15 | \u2013DAn\n16 | \u2013nDAn\n17 | \u2013(y)lA\n18 | \u2013ki\n19 | \u2013(n)cA\n\nSuffix transition ordering for nouns can be seen in References[5]\n\n### Derivational Suffixes\n\na/a | Suffixes\n----|----------\n1 | \u2013lUk\n2 | \u2013CU\n3 | \u2013CUk\n4 | \u2013lA\u015f\n5 | \u2013lA\n6 | \u2013lAn\n7 | \u2013CA\n8 | \u2013lU\n9 | \u2013sUz\n\nInitially, we will handle only a small subset of the above suffixes which are\nmore common in our domain.\n\n### Vowel Harmony\n\nThis routine checks whether __the last two__ vowels of the word obey vowel\nharmony rules. A brief description of Turkish vowel harmony follows.\n\nTurkish vowel harmony is a two dimensional vowel harmony system, where vowels\nare characterised by two features named frontness and roundness. There are\nvowel harmony rules for each feature.\n\n1. Vowel harmony rule for frontness: Vowels in Turkish are grouped into two\n according to where they are produced. Front produced vowels are formed at\n the front of the mouth ('e', 'i', '\u00f6', '\u00fc') and back produced vowels are\n produced nearer to throat ('a', '\u0131', 'o', 'u'). According to the vowel\n harmony rule, words cannot contain both front and back vowels. This is one\n of the reasons why suffixes containing vowels can take different forms to\n obey vowel harmony.\n\n2. Vowel harmony rule for roundness: Vowels in Turkish are grouped into two\n according to whether lips are rounded while producing it. 'o', '\u00f6', 'u' and\n '\u00fc' are rounded vowels whereas 'a', 'e', '\u0131' and 'i' are unrounded.\n According to the vowel harmony rules, if the vowel of a syllable is\n unrounded, the following vowel is unrounded as well. If the vowel of a\n syllable is rounded, the following vowels are 'a', 'e', 'u' or '\u00fc'.\n\n### Last consonant\n\nAnother interesting case in detecting suffixes in Turkish is that, for some\nsuffixes, if the word ends with a vowel, a consonant is inserted between the\nrest of the word and the suffix. These merging consonants can be 'y', 'n' or\n's'. When a merging consonant can be inserted before the suffix, the\nrepresentation of the suffix starts with the optional consonant surrounded by\nparanthesis (eg. \u2013(y)Um, -(n)cA). For these kinds of suffixes, if existence of\na merging consonant is considered, the candidate stem is checked whether it\nends with a vowel.\n\nIf there is no 'y' consonant before the suffix, only the real part of the\nsuffix (eg. -Um) is marked for stemming. If there is a 'y' consonant and it is\npreceded by a vowel, 'y' is treated as a merging consonant and both 'y' and\nthe candidate suffix (eg. -Um) is marked for stemming. If there is a consonant\njust before 'y', the decision is that the consonant 'y' and the candidate\nsuffix are really a part of the stem. In such a case, cursor is not advanced\nto prevent over-stemming. The last case can occur especially when the stem\noriginates from another language like in 'lityum' (meaning the element\nLithium). If the check for vowel harmony was not made, the word would be\nstemmed to 'lit', for '\u2013(y)Um' would be treated as a suffix affixed to it. But\naccording to morphological rules of Turkish, the final word would be 'litim',\nnot 'lityum' if 'lit' were really the stem of the word and the suffix '\u2013(y)Um'\nwere affixed to it. So detecting 'lit' as the stem of the word would be an over\n-stemming.\n\n### Merging Vowel\n\nSimilar to merging consonants, there are merging vowels for some suffixes\nstarting with consonants. They can be preceded by merging vowels like in '-(U)\nmUz' suffix when they are affixed to a stem ending with a consonant. In such a\ncase, a U vowel ('\u0131', 'i', 'u' or '\u00fc' depending on vowel harmony) is inserted\nbetween the stem and real suffix (e.g. '-mUz') for ease of pronunciation.\n\n### Some examples\n\nWord / Analysis | Meaning / Stem\n------------------------------ |--------------------------------\nKalelerimizdekilerden | From the ones at one of our castles\nKale-lAr-UmUz-DA-ki-lAr-DAn | Kale\n\u00c7ocu\u011fuymu\u015fumcas\u0131na | As if I were her child\n\u00c7ocuk-(s)U-(y)mU\u015f-(y)Um-cAsInA | \u00c7ocuk\nKedileriyle | With their cats\nKedi-lAr-(s)U-(y)lA | Kedi\n\u00c7ocuklar\u0131mm\u0131\u015f | Someone told me that they were my children\n\u00e7ocuk-lAr-(U)m-(y)mU\u015f | \u00c7ocuk\nKitab\u0131m\u0131zd\u0131 | It was our book\nkitap-UmUz-(y)DU | Kitap\n\n## Future Work\n\n* Add more verbs suffixes.\n* Add more derivational suffixes.\n\n## References\n\n1. [Turkish Stemmer used in Lucene](http://snowball.tartarus.org/algorithms/turkish/stemmer.html)\n2. [Java Implementation](http://snowball.tartarus.org/archives/snowball-discuss/att-0875/02-TurkishStemmer.java)\n3. [Snowball Implementation](http://snowball.tartarus.org/algorithms/turkish/stem_Unicode.sbl)\n4. [Snowball Description](http://snowball.tartarus.org/algorithms/turkish/accompanying_paper.doc)\n5. [An affix stripping morphological analyzer for Turkish](http://web.itu.edu.tr/~gulsenc/papers/iasted.pdf)\n6. [Lead Generation](https://en.wikipedia.org/wiki/Lead_generation)\n7. [Vowel Harmony](https://en.wikipedia.org/wiki/Vowel_harmony#Turkish)\n8. [Turkish Suffixes](https://en.wiktionary.org/wiki/Appendix:Turkish_suffixes)\n9. [Turkish Grammar](https://en.wikipedia.org/wiki/Turkish_grammar)\n10. [Turkish Language](https://en.wikipedia.org/wiki/Turkish_language)\n11. [Tartarus](http://tartarus.org/)\n12. [Information Retrieval on Turkish Texts](http://www.users.muohio.edu/canf/papers/JASIST2008offPrint.pdf)\n\n## Installation\n\nCopy the module folder inside PythonXX/Lib/site-packages or inside your application directory.\n\n## Usage\n```python\n>>> from TurkishStemmer import TurkishStemmer\n>>> stemmer = TurkishStemmer()\n>>> stemmer.stem(\"okuldakilerden\")\nokul\n```\n\n## Contributing\n\n1. Fork it ( `http://github.com//turkishstemmer-python/fork` )\n2. Create your feature branch (`git checkout -b my-new-feature`)\n3. Commit your changes (`git commit -am 'Add some feature'`)\n4. Push to the branch (`git push origin my-new-feature`)\n5. Create new Pull Request\n\n## License\n\nturkishstemmer-python is licensed under the Apache Software License, Version 2.0.\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/hanefi/turkish-stemmer-python", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "TurkishStemmer", "package_url": "https://pypi.org/project/TurkishStemmer/", "platform": "", "project_url": "https://pypi.org/project/TurkishStemmer/", "project_urls": { "Homepage": "https://github.com/hanefi/turkish-stemmer-python" }, "release_url": "https://pypi.org/project/TurkishStemmer/1.3/", "requires_dist": null, "requires_python": "", "summary": "Turkish Stemmer", "version": "1.3" }, "last_serial": 4355561, "releases": { "1.0": [ { "comment_text": "", "digests": { "md5": "3a11ac24966c185ac447baca5c115a09", "sha256": "4412b0a68ae7455cd85e2ad11260e6089cd7b14db7d8f8f895fafde2aa6baef0" }, "downloads": -1, "filename": "TurkishStemmer-1.0-py3-none-any.whl", "has_sig": false, "md5_digest": "3a11ac24966c185ac447baca5c115a09", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 18643, "upload_time": "2018-10-06T10:25:43", "url": "https://files.pythonhosted.org/packages/98/45/183eb31355c0d523ec7c2f68badb3eb658c3eb5d28317ffff3d9e9575d32/TurkishStemmer-1.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "2cd9873040eb5c56550c81654ace019f", "sha256": "6c52ea44c7e4c5aeede2c71b5d178ee803c32460251d15d53441fcadf6799c3a" }, "downloads": -1, "filename": "TurkishStemmer-1.0.tar.gz", "has_sig": false, "md5_digest": "2cd9873040eb5c56550c81654ace019f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3971, "upload_time": "2018-01-02T13:18:44", "url": "https://files.pythonhosted.org/packages/31/54/1fd6abd2cca522fc1b0189509d463c123d26b9e1a0967d5579a675a66565/TurkishStemmer-1.0.tar.gz" } ], "1.1": [ { "comment_text": "", "digests": { "md5": "b48400a2ec10f72cf70ce7a08d138bb4", "sha256": "83d7a2d3d89fdd7de19a676c7850e4c2deb54050b7785fba3aa493f4550a12bf" }, "downloads": -1, "filename": "TurkishStemmer-1.1-py3-none-any.whl", "has_sig": false, "md5_digest": "b48400a2ec10f72cf70ce7a08d138bb4", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 18671, "upload_time": "2018-10-06T10:25:45", "url": "https://files.pythonhosted.org/packages/74/4d/ed834c94f99ecae034dc21ee11c9536ea70e5cc19307e12a063838e22804/TurkishStemmer-1.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "3e1edcfd73cd006252cb9313a1aec326", "sha256": "c13a82080c58450509974034b3ba8419f4303c11aebfd9acaa0d93eed81c6399" }, "downloads": -1, "filename": "TurkishStemmer-1.1.tar.gz", "has_sig": false, "md5_digest": "3e1edcfd73cd006252cb9313a1aec326", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 15631, "upload_time": "2018-10-06T10:26:09", "url": "https://files.pythonhosted.org/packages/ac/f6/9e54205fb5ec371a27a8fc98b323d6cd9fff284de8eb1b7c8998a6bc119f/TurkishStemmer-1.1.tar.gz" } ], "1.3": [ { "comment_text": "", "digests": { "md5": "58158613c337a1c6d6249fcc82150171", "sha256": "c5aa75bc8b0025179cddb16401581e5dff0d925e5446bec07b1ade0a9bc7575e" }, "downloads": -1, "filename": "TurkishStemmer-1.3-py3-none-any.whl", "has_sig": false, "md5_digest": "58158613c337a1c6d6249fcc82150171", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 20585, "upload_time": "2018-10-09T12:09:49", "url": "https://files.pythonhosted.org/packages/fd/bf/3e56dd4ce442f9237e1c202ce736ae5e5818d74f81604f1665e67736cfc0/TurkishStemmer-1.3-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "ac57e7309b48caa15ff5de701ab832fe", "sha256": "1f218da478e0d33ac49335d0e38c8bceff6bbfb0348b3c7b019b4287e12b9dac" }, "downloads": -1, "filename": "TurkishStemmer-1.3.tar.gz", "has_sig": false, "md5_digest": "ac57e7309b48caa15ff5de701ab832fe", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 16703, "upload_time": "2018-10-09T12:09:51", "url": "https://files.pythonhosted.org/packages/45/4d/decfd7bc07d2a19c646e314cd5f29d329a9fea9a92fcf5366675b911611a/TurkishStemmer-1.3.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "58158613c337a1c6d6249fcc82150171", "sha256": "c5aa75bc8b0025179cddb16401581e5dff0d925e5446bec07b1ade0a9bc7575e" }, "downloads": -1, "filename": "TurkishStemmer-1.3-py3-none-any.whl", "has_sig": false, "md5_digest": "58158613c337a1c6d6249fcc82150171", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 20585, "upload_time": "2018-10-09T12:09:49", "url": "https://files.pythonhosted.org/packages/fd/bf/3e56dd4ce442f9237e1c202ce736ae5e5818d74f81604f1665e67736cfc0/TurkishStemmer-1.3-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "ac57e7309b48caa15ff5de701ab832fe", "sha256": "1f218da478e0d33ac49335d0e38c8bceff6bbfb0348b3c7b019b4287e12b9dac" }, "downloads": -1, "filename": "TurkishStemmer-1.3.tar.gz", "has_sig": false, "md5_digest": "ac57e7309b48caa15ff5de701ab832fe", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 16703, "upload_time": "2018-10-09T12:09:51", "url": "https://files.pythonhosted.org/packages/45/4d/decfd7bc07d2a19c646e314cd5f29d329a9fea9a92fcf5366675b911611a/TurkishStemmer-1.3.tar.gz" } ] }