{ "info": { "author": "Metehan Cetinkaya", "author_email": "metehancet@gmail.com", "bugtrack_url": null, "classifiers": [ "Environment :: MacOS X", "Programming Language :: Python", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.6" ], "description": "Not : T\u00fcrk\u00e7e d\u00f6k\u00fcmantasyon a\u015fa\u011f\u0131dad\u0131r.\n\nSee [Change Log](Changelog.md)\n\n\n# Turkish NLP with Python\n\n## Performance \n\n**System:** Processor: 1.6 GHz Intel Core i5, RAM: 8 GB 1600 MHz DDR3, MacBook Air\n\n| Method | Execution Time (ms) | Word Count |\n|------------|----------|----------|\n| auto_correct | 135 ms | 1000 words |\n| is_turkish | 1 ms | 1000 words |\n| syllabicate_sentence | 94 ms | 1000 words |\n\n\nThis is a very early version of TurkishNLP. For now its main functions are: detecting the Turkish language, correcting text written without whitespace, correcting typos, vowel harmony detection, Turkish origin detection, and syllabication of Turkish words.\n\n## Dataset\nThe dataset was created by parsing and filtering a Turkish Wikipedia dump. \n\n## Getting Started\nTo get started, first install the package using pip:\n```\npip install turkishnlp\n```\nYou can upgrade to the most recent version with:\n```\npip install --upgrade turkishnlp\n```\nAfter installing the package successfully, try importing it:\n\n```python\nimport turkishnlp\n```\n### Downloading the data\nTo download the data, first create an instance of the TurkishNLP class:\n```python\nfrom turkishnlp import detector\nobj = detector.TurkishNLP()\n```\nAfter creating the instance, call the download function:\n\n```python\nobj.download()\n```\nThe download takes a short time, and when it finishes it prints \"Download succesful\". 
You won't have to download the data again.\n\n### Creating the wordset\nTo create the wordset from the data, call:\n```python\nobj.create_word_set()\n```\nThis creates the wordset and the necessary dictionaries.\n\n### Example Usage\nThe main functions are detecting whether the language is Turkish, Turkish typo correction, vowel harmony detection, Turkish origin detection, and syllabication.\n\n### Language Detection\n```python\nprint(obj.is_turkish(\"Ben bug\u00fcn ankaraya gidece\u011fim belki bir\u015feyler al\u0131r\u0131m\"))\n```\nThis returns \"True\" along with an accuracy score of 0.85.\n\n### Typo Correction\n\n```python\nlwords = obj.list_words(\"vri k\u00fcmsi idrae edre ancaka daha g\u00fcezl oalbilir\")\nprint(obj.auto_correct(lwords))\n```\nThis prints ['veri', 'k\u00fcmesi', 'idare', 'eder', 'ancak', 'daha', 'g\u00fczel', 'olabilir']. The \"list_words\" method simply splits the text into words with the help of a regex. You can use \"join\" to turn the result back into a sentence, like this:\n\n```python\nlwords = obj.list_words(\"vri k\u00fcmsi idrae edre ancaka daha g\u00fcezl oalbilir\")\ncorrected_words = obj.auto_correct(lwords)\ncorrected_string = \" \".join(corrected_words)\n```\nHere corrected_string is 'veri k\u00fcmesi idare eder ancak daha g\u00fczel olabilir'. \n\n### Syllabication\n\n```python\nobj.syllabicate_sentence(\"Hi\u00e7 unutmad\u0131m, do\u011fudan esen hafif bir yel sa\u00e7lar\u0131n\u0131 dalgaland\u0131r\u0131yordu\")\n```\nThis returns:\n\n\"[['hi\u00e7'], ['u', 'nut', 'ma', 'd\u0131m,'], ['do', '\u011fu', 'dan'], ['e', 'sen'], ['ha', 'fif'], ['bir'], ['yel'], ['sa\u00e7', 'la', 'r\u0131', 'n\u0131'], ['dal', 'ga', 'lan', 'd\u0131', 'r\u0131', 'yor', 'du']]\"\n\n### Vowel Harmony\n\nThis is a Turkish language rule. 
You can check whether a word obeys vowel harmony like this:\n\n\n```python\nobj.is_vowel_harmonic(\"Belki\")\n```\nThis returns True, since the word is vowel harmonic.\n\n### Is Turkish Origin\n\nTurkish also has rules that indicate whether a word is of Turkish origin. For example, the word 'program' is not of Turkish origin. Let's check:\n\n```python\nobj.is_turkish_origin(\"program\")\n```\nThis returns False. On the other hand, for the word 'yaz\u0131l\u0131m':\n\n```python\nobj.is_turkish_origin(\"yaz\u0131l\u0131m\")\n```\nThis returns True.\n\n### Correct Text Without WhiteSpace\nImportant Note: Since this function is based on a different dataset, you need to call the download function again.\n\nAs the title says, this function restores the whitespace in text written without it. For example, suppose you have the word 't\u00fcrk\u00e7edo\u011faldili\u015fleme'. Call the function and pass the word as the parameter:\n```python\nobj.correct_text_without_space('t\u00fcrk\u00e7edo\u011faldili\u015fleme')\n```\nThis returns 't\u00fcrk\u00e7e do\u011fal dil i\u015fleme', as expected. Let's try something longer, a random text I found: 'hidroelektriksantralbarajlardasuyunenerjisikullan\u0131l\u0131rakelektrikenerjisi\u00fcretilensantralehidroelektriksantralad\u0131verilir'\n\n```python\nobj.correct_text_without_space('hidroelektriksantralbarajlardasuyunenerjisikullan\u0131l\u0131rakelektrikenerjisi\u00fcretilensantralehidroelektriksantralad\u0131verilir')\n```\nThis returns 'hidroelektrik santral baraj l ar\u0131n da suyun enerjisi kullan \u0131 l \u0131rak elektrik enerjisi \u00fcretilen santral e hidroelektrik santral ad\u0131 veri lir'. As you can see, this function is not 100% accurate, since it depends heavily on the dataset. If someone were to create a clean dataset for this function, I think it would work very well with the current approach. 
Note : This function does not exist in the current Pypi release\n\n\n\n# Python ile T\u00fcrk\u00e7e Dil \u0130\u015fleme\n\nTurkishNLP k\u00fct\u00fcphanesinin alfa versiyonu. \u015eimdilik T\u00fcrk\u00e7e dilini tespit etme, Bo\u015fluksuz yaz\u0131lan yaz\u0131y\u0131 bo\u015fluklar\u0131na ay\u0131rma, T\u00fcrk\u00e7e yaz\u0131m hatalar\u0131n\u0131 d\u00fczeltme, b\u00fcy\u00fck \u00fcnl\u00fc uyumu kontrol\u00fc, T\u00fcrk\u00e7e k\u00f6ken kontrol\u00fc ve kelimeleri hecelere ayr\u0131ma olmak \u00fczere 5 ana fonksiyonu var\n\n## Veri\nVeri k\u00fcmesi wikipedia'n\u0131n T\u00fcrk\u00e7e dump'\u0131 parselan\u0131p temizlenerek olu\u015fturuldu.\n\n## Ba\u015flarken\n\u00d6ncelikle ba\u015flamadan, pip ile k\u00fct\u00fcphaneyi y\u00fcklemeniz gerekiyor. \u015eu \u015fekilde;\n```\npip install turkishnlp\n```\nAyr\u0131ca \u015fu \u015fekilde yay\u0131nlanan son versiyonu indirebilirsiniz;\n```\npip install --upgrade turkishnlp\n```\nY\u00fckledikten sonra k\u00fct\u00fcphaneyi \u015fu \u015fekilde import etmeyi deneyin;\n\n```python\nimport turkishnlp\n```\n### Veriyi indirmek\nVeriyi indirmek i\u00e7in \u00f6nce TurkishNLP s\u0131n\u0131f\u0131ndan t\u00fcretilmi\u015f bir obje olu\u015fturmam\u0131z laz\u0131m;\n```python\nfrom turkishnlp import detector\nobj = detector.TurkishNLP()\n```\nObjeyi olu\u015fturduktan sonra indirme metodunu \u015fu \u015fekilde \u00e7a\u011f\u0131rarak indirme i\u015flemini ba\u015flatabiliriz ;\n\n```python\nobj.download()\n```\n\u0130ndirme i\u015flemi \u00e7ok uzun s\u00fcrmeden bitecek ve ard\u0131ndan \"Download Succesful\" yani indirme ba\u015far\u0131l\u0131 manas\u0131na gelen bir yaz\u0131 ekrana bas\u0131lacak\n\n### Verisetini olu\u015fturmak\n\u0130ndirdi\u011fimiz veriden kodun i\u00e7inde kullanaca\u011f\u0131m\u0131z verisetlerini olu\u015fturmak i\u00e7in basitce;\n```python\nobj.create_word_set()\n```\nYap\u0131yoruz ve i\u015flem tamamlanm\u0131\u015f oluyor\n\n### \u00d6rnek 
Kullan\u0131m\nBa\u015fl\u0131kta da belirtti\u011fim gibi temel olarak 5 metod var.\n\n### T\u00fcrk\u00e7e Dil Tespiti\n\n```python\nprint(obj.is_turkish(\"Ben bug\u00fcn ankaraya gidece\u011fim belki bir\u015feyler al\u0131r\u0131m\"))\n```\nYapt\u0131\u011f\u0131nda g\u00f6rece\u011fiz ki, ekrana \"True\" bast\u0131r\u0131yor ve do\u011fruluk oran\u0131 olarak 0.85 d\u00f6nd\u00fcr\u00fcyor.\n\n### Yaz\u0131m Hatas\u0131 D\u00fczeltme\n\n```python\nlwords = obj.list_words(\"vri k\u00fcmsi idrae edre ancaka daha g\u00fcezl oalbilir\")\nprint(obj.auto_correct(lwords))\n```\nYap\u0131yoruz ve sonu\u00e7 olarak bize ['veri', 'k\u00fcmesi', 'idare', 'eder', 'ancak', 'daha', 'g\u00fczel', 'olabilir'] listesi veriliyor. Burada \"list_words\" metodunun yapt\u0131\u011f\u0131 string olarak gelen texti regex yard\u0131m\u0131yla kelimelerine ay\u0131rmakt\u0131r Kelimeleri birle\u015ftirmek i\u00e7in Python'\u0131n \"join\" metodu kullan\u0131labilir. \u00d6rne\u011fin;\n\n```python\nlwords = obj.list_words(\"vri k\u00fcmsi idrae edre ancaka daha g\u00fcezl oalbilir\")\ncorrected_words = obj.auto_correct(lwords)\ncorrected_string = \" \".join(corrected_words)\n```\nYazd\u0131raca\u011f\u0131 sonu\u00e7 : 'veri k\u00fcmesi idare eder ancak daha g\u00fczel olabilir'. \n\n### Hecelere Ay\u0131rmak \n```python\nobj.syllabicate_sentence(\"Hi\u00e7 unutmad\u0131m, do\u011fudan esen hafif bir yel sa\u00e7lar\u0131n\u0131 dalgaland\u0131r\u0131yordu\")\n```\nYap\u0131yoruz. 
Ve d\u00f6nen sonu\u00e7;\n\n\"[['hi\u00e7'], ['u', 'nut', 'ma', 'd\u0131m,'], ['do', '\u011fu', 'dan'], ['e', 'sen'], ['ha', 'fif'], ['bir'], ['yel'], ['sa\u00e7', 'la', 'r\u0131', 'n\u0131'], ['dal', 'ga', 'lan', 'd\u0131', 'r\u0131', 'yor', 'du']]\"\n\n### B\u00fcy\u00fck \u00dcnl\u00fc Uyumu\n\nHerhangi bir kelimenin b\u00fcy\u00fck \u00fcnl\u00fc uyumuna uyup uymad\u0131\u011f\u0131n\u0131 \u015fu \u015fekilde kontrol edebiliriz;\n\n```python\nobj.is_vowel_harmonic(\"Belki\")\n```\n'belki' kelimesi b\u00fcy\u00fck \u00fcnl\u00fc uyumuna uydu\u011fundan bu i\u015flem bize True d\u00f6nd\u00fcrecektir\n\n### T\u00fcrk\u00e7e K\u00f6ken Kontrol\u00fc\n\nBir kelimenin T\u00fcrk\u00e7e k\u00f6kenli olup olmad\u0131\u011f\u0131n\u0131 \u00f6\u011frenmek i\u00e7in \u00e7e\u015fitli kurallar var. turkishnlp k\u00fct\u00fcphanesiyle 'program' kelimesinin t\u00fcrk\u00e7e k\u00f6kenli olup olmad\u0131\u011f\u0131n\u0131 \u00f6\u011frenmek i\u00e7in;\n\n```python\nobj.is_turkish_origin(\"program\")\n```\nYap\u0131yoruz ve bize False de\u011feri d\u00f6nd\u00fcr\u00fcyor. \u00d6te yandan 'yaz\u0131l\u0131m' kelimesi i\u00e7in\n\n```python\nobj.is_turkish_origin(\"yaz\u0131l\u0131m\")\n```\nYap\u0131yoruz ve bize True de\u011ferini d\u00f6nd\u00fcr\u00fcyor\n\n### Bo\u015fluksuz Yaz\u0131lan Yaz\u0131y\u0131 D\u00fczeltme\n\u00d6nemli Not : Bu fonksiyon farkl\u0131 bir verik\u00fcmesine ba\u011fl\u0131 oldu\u011fundan, download fonksiyonunu tekrar \u00e7al\u0131\u015ft\u0131rman\u0131z gerekecektir.\n\nBu fonksiyon ba\u015fl\u0131kta da belirtildi\u011fi gibi bo\u015fluksuz olarak yaz\u0131lan bir yaz\u0131y\u0131, bo\u015fluklar\u0131na ay\u0131r\u0131yor. \u00d6rne\u011fin, 't\u00fcrk\u00e7edo\u011faldili\u015fleme' kelimesine sahip oldu\u011fumuzu d\u00fc\u015f\u00fcnelim. 
Fonksiyonu \u00e7a\u011f\u0131r\u0131p kelimeyi parametre olarak ge\u00e7ti\u011fimizde;\n```python\nobj.correct_text_without_space('t\u00fcrk\u00e7edo\u011faldili\u015fleme')\n```\nBize beklendi\u011fi gibi ; 't\u00fcrk\u00e7e do\u011fal dil i\u015fleme' d\u00f6necek. \u015eimdi internetten rastgele bulup bo\u015fluklar\u0131n\u0131 sildi\u011fim bir yaz\u0131y\u0131 deneyelim; 'hidroelektriksantralbarajlardasuyunenerjisikullan\u0131l\u0131rakelektrikenerjisi\u00fcretilensantralehidroelektriksantralad\u0131verilir'\n\n```python\nobj.correct_text_without_space('hidroelektriksantralbarajlardasuyunenerjisikullan\u0131l\u0131rakelektrikenerjisi\u00fcretilensantralehidroelektriksantralad\u0131verilir')\n```\nBize ; 'hidroelektrik santral baraj l ar\u0131n da suyun enerjisi kullan \u0131 l \u0131rak elektrik enerjisi \u00fcretilen santral e hidroelektrik santral ad\u0131 veri lir'. G\u00f6r\u00fcld\u00fc\u011f\u00fc \u00fczere bu fonksiyon kelime k\u00fcmesine de fazla ba\u011fl\u0131 oldu\u011fu i\u00e7in %100 do\u011fruluk oran\u0131yla \u00e7al\u0131\u015fmad\u0131. Ancak temiz bir veri k\u00fcmesi olu\u015fturuldu\u011fu takdirde bu yakla\u015f\u0131mla \u00e7ok daha y\u00fcksek bir do\u011fruluk oran\u0131 yakalanaca\u011f\u0131n\u0131 d\u00fc\u015f\u00fcn\u00fcyorum. 
Not : Bu fonksiyon Pypi release'inde mevcut de\u011fil", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/MeteHanC/turkishnlp", "keywords": "turkishnlp,python,nlp,language processing", "license": "", "maintainer": "Metehan Cetinkaya", "maintainer_email": "metehancet@gmail.com", "name": "turkishnlp", "package_url": "https://pypi.org/project/turkishnlp/", "platform": "", "project_url": "https://pypi.org/project/turkishnlp/", "project_urls": { "Homepage": "https://github.com/MeteHanC/turkishnlp" }, "release_url": "https://pypi.org/project/turkishnlp/0.0.61/", "requires_dist": null, "requires_python": "", "summary": "A python script that processes Turkish language", "version": "0.0.61" }, "last_serial": 5705670, "releases": { "0.0.3": [ { "comment_text": "", "digests": { "md5": "d4c3b2fd711c8bf27fde550798964a74", "sha256": "900c2b5bb6fc40dce597a954a459e1ff645e43efbf2b0c7e478a765028fb7bdb" }, "downloads": -1, "filename": "turkishnlp-0.0.3-py3-none-any.whl", "has_sig": false, "md5_digest": "d4c3b2fd711c8bf27fde550798964a74", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 3698, "upload_time": "2018-08-28T20:19:36", "url": "https://files.pythonhosted.org/packages/95/33/497ed40fc0b0de17fdd8c8587ab25a8de2ab396fe9193015ca3da2fe5974/turkishnlp-0.0.3-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "bf4978b8e416a085337c4a0d7732020c", "sha256": "b1cc13b69c38a0cb3e6dc70e31cc8fac3cebaf461a1844e4834956b598bb4a79" }, "downloads": -1, "filename": "turkishnlp-0.0.3.tar.gz", "has_sig": false, "md5_digest": "bf4978b8e416a085337c4a0d7732020c", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 2675, "upload_time": "2018-08-28T20:19:37", "url": 
"https://files.pythonhosted.org/packages/ad/92/30c823ee9011d24a25f4b5933c897d1da6ebf35800143ce4d5db74656d16/turkishnlp-0.0.3.tar.gz" } ], "0.0.5": [ { "comment_text": "", "digests": { "md5": "7f18c41a66784e71eeb57a12fc1105f4", "sha256": "75fcbaffb206d7ad03deadddf30a8b626a3257e9cd7584b5082fe77078941f99" }, "downloads": -1, "filename": "turkishnlp-0.0.5-py3-none-any.whl", "has_sig": false, "md5_digest": "7f18c41a66784e71eeb57a12fc1105f4", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 6067, "upload_time": "2018-09-16T18:51:26", "url": "https://files.pythonhosted.org/packages/d4/14/9e404acc8f433e5c252804611794725983296682b037cf5d0f6ac344eb11/turkishnlp-0.0.5-py3-none-any.whl" } ], "0.0.6": [ { "comment_text": "", "digests": { "md5": "8ee6310b142e1ff939e71b7a6c29bd3e", "sha256": "1e35d5724ec5cb8eb34b0e0f5018c717e7f3ed79a62c09f5bd61b963b2b554ed" }, "downloads": -1, "filename": "turkishnlp-0.0.6.tar.gz", "has_sig": false, "md5_digest": "8ee6310b142e1ff939e71b7a6c29bd3e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 6474, "upload_time": "2019-04-29T17:47:02", "url": "https://files.pythonhosted.org/packages/a5/ea/d3a5b072847e05d123eed0e49dd420b8bd5415b499439499196ab2b1486f/turkishnlp-0.0.6.tar.gz" } ], "0.0.61": [ { "comment_text": "", "digests": { "md5": "03896de30d884622f5016f3abb8aff18", "sha256": "bf406519e0da42a3fb4d7e66d513cf0d48cd9483e16105a161fc2cd9390fbe7b" }, "downloads": -1, "filename": "turkishnlp-0.0.61.tar.gz", "has_sig": false, "md5_digest": "03896de30d884622f5016f3abb8aff18", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8385, "upload_time": "2019-08-20T21:09:44", "url": "https://files.pythonhosted.org/packages/9d/05/84e26ea98e5818ba430d6905c4640a526f7a3e59a64d010c42e80a889890/turkishnlp-0.0.61.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "03896de30d884622f5016f3abb8aff18", "sha256": 
"bf406519e0da42a3fb4d7e66d513cf0d48cd9483e16105a161fc2cd9390fbe7b" }, "downloads": -1, "filename": "turkishnlp-0.0.61.tar.gz", "has_sig": false, "md5_digest": "03896de30d884622f5016f3abb8aff18", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8385, "upload_time": "2019-08-20T21:09:44", "url": "https://files.pythonhosted.org/packages/9d/05/84e26ea98e5818ba430d6905c4640a526f7a3e59a64d010c42e80a889890/turkishnlp-0.0.61.tar.gz" } ] }