{ "info": { "author": "Rami Al-Rfou", "author_email": "rmyeid@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Environment :: Console", "Intended Audience :: Education", "Intended Audience :: Science/Research", "License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)", "Natural Language :: Afrikaans", "Natural Language :: Arabic", "Natural Language :: Bengali", "Natural Language :: Bosnian", "Natural Language :: Bulgarian", "Natural Language :: Catalan", "Natural Language :: Chinese (Simplified)", "Natural Language :: Chinese (Traditional)", "Natural Language :: Croatian", "Natural Language :: Czech", "Natural Language :: Danish", "Natural Language :: Dutch", "Natural Language :: English", "Natural Language :: Esperanto", "Natural Language :: Finnish", "Natural Language :: French", "Natural Language :: Galician", "Natural Language :: German", "Natural Language :: Greek", "Natural Language :: Hebrew", "Natural Language :: Hindi", "Natural Language :: Hungarian", "Natural Language :: Icelandic", "Natural Language :: Indonesian", "Natural Language :: Italian", "Natural Language :: Japanese", "Natural Language :: Javanese", "Natural Language :: Korean", "Natural Language :: Latin", "Natural Language :: Latvian", "Natural Language :: Macedonian", "Natural Language :: Malay", "Natural Language :: Marathi", "Natural Language :: Norwegian", "Natural Language :: Panjabi", "Natural Language :: Persian", "Natural Language :: Polish", "Natural Language :: Portuguese", "Natural Language :: Portuguese (Brazilian)", "Natural Language :: Romanian", "Natural Language :: Russian", "Natural Language :: Serbian", "Natural Language :: Slovak", "Natural Language :: Slovenian", "Natural Language :: Spanish", "Natural Language :: Swedish", "Natural Language :: Tamil", "Natural Language :: Telugu", "Natural Language :: Thai", "Natural Language :: Turkish", "Natural Language :: Ukranian", "Natural Language :: Urdu", "Natural Language 
:: Vietnamese", "Programming Language :: Python :: 2", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.4", "Topic :: Scientific/Engineering :: Artificial Intelligence", "Topic :: Text Processing :: Linguistic" ], "description": "polyglot\n========\n\n|Downloads| |Latest Version| |Build Status| |Documentation Status|\n\n.. |Downloads| image:: https://img.shields.io/pypi/dm/polyglot.svg\n :target: https://pypi.python.org/pypi/polyglot\n.. |Latest Version| image:: https://badge.fury.io/py/polyglot.svg\n :target: https://pypi.python.org/pypi/polyglot\n.. |Build Status| image:: https://travis-ci.org/aboSamoor/polyglot.png?branch=master\n :target: https://travis-ci.org/aboSamoor/polyglot\n.. |Documentation Status| image:: https://readthedocs.org/projects/polyglot/badge/?version=latest\n :target: https://readthedocs.org/builds/polyglot/\n\nPolyglot is a natural language pipeline that supports massive\nmultilingual applications.\n\n- Free software: GPLv3 license\n- Documentation: http://polyglot.readthedocs.org.\n- GitHub: https://github.com/aboSamoor/polyglot\n\nFeatures\n~~~~~~~~\n\n- Tokenization (165 Languages)\n- Language detection (196 Languages)\n- Named Entity Recognition (40 Languages)\n- Part of Speech Tagging (16 Languages)\n- Sentiment Analysis (136 Languages)\n- Word Embeddings (137 Languages)\n- Morphological analysis (135 Languages)\n- Transliteration (69 Languages)\n\nDeveloper\n~~~~~~~~~\n\n- Rami Al-Rfou @ ``rmyeid gmail com``\n\nQuick Tutorial\n--------------\n\n.. code:: python\n\n import polyglot\n from polyglot.text import Text, Word\n\nLanguage Detection\n~~~~~~~~~~~~~~~~~~\n\n.. code:: python\n\n text = Text(\"Bonjour, Mesdames.\")\n print(\"Language Detected: Code={}, Name={}\\n\".format(text.language.code, text.language.name))\n\n\n.. parsed-literal::\n\n Language Detected: Code=fr, Name=French\n \n\n\nTokenization\n~~~~~~~~~~~~\n\n.. 
code:: python\n\n zen = Text(\"Beautiful is better than ugly. \"\n \"Explicit is better than implicit. \"\n \"Simple is better than complex.\")\n print(zen.words)\n\n\n.. parsed-literal::\n\n [u'Beautiful', u'is', u'better', u'than', u'ugly', u'.', u'Explicit', u'is', u'better', u'than', u'implicit', u'.', u'Simple', u'is', u'better', u'than', u'complex', u'.']\n\n\n.. code:: python\n\n print(zen.sentences)\n\n\n.. parsed-literal::\n\n [Sentence(\"Beautiful is better than ugly.\"), Sentence(\"Explicit is better than implicit.\"), Sentence(\"Simple is better than complex.\")]\n\n\nPart of Speech Tagging\n~~~~~~~~~~~~~~~~~~~~~~\n\n.. code:: python\n\n text = Text(u\"O primeiro uso de desobedi\u00eancia civil em massa ocorreu em setembro de 1906.\")\n \n print(\"{:<16}{}\".format(\"Word\", \"POS Tag\")+\"\\n\"+\"-\"*30)\n for word, tag in text.pos_tags:\n print(u\"{:<16}{:>2}\".format(word, tag))\n\n\n.. parsed-literal::\n\n Word POS Tag\n ------------------------------\n O DET\n primeiro ADJ\n uso NOUN\n de ADP\n desobedi\u00eancia NOUN\n civil ADJ\n em ADP\n massa NOUN\n ocorreu ADJ\n em ADP\n setembro NOUN\n de ADP\n 1906 NUM\n . PUNCT\n\n\nNamed Entity Recognition\n~~~~~~~~~~~~~~~~~~~~~~~~\n\n.. code:: python\n\n text = Text(u\"In Gro\u00dfbritannien war Gandhi mit dem westlichen Lebensstil vertraut geworden\")\n print(text.entities)\n\n\n.. parsed-literal::\n\n [I-LOC([u'Gro\\\\xdfbritannien']), I-PER([u'Gandhi'])]\n\n\nPolarity\n~~~~~~~~\n\n.. code:: python\n\n print(\"{:<16}{}\".format(\"Word\", \"Polarity\")+\"\\n\"+\"-\"*30)\n for w in zen.words[:6]:\n print(\"{:<16}{:>2}\".format(w, w.polarity))\n\n\n.. parsed-literal::\n\n Word Polarity\n ------------------------------\n Beautiful 0\n is 0\n better 1\n than 0\n ugly -1\n . 0\n\n\nEmbeddings\n~~~~~~~~~~\n\n.. 
code:: python\n\n word = Word(\"Obama\", language=\"en\")\n print(\"Neighbors (Synonms) of {}\".format(word)+\"\\n\"+\"-\"*30)\n for w in word.neighbors:\n print(\"{:<16}\".format(w))\n print(\"\\n\\nThe first 10 dimensions out the {} dimensions\\n\".format(word.vector.shape[0]))\n print(word.vector[:10])\n\n\n.. parsed-literal::\n\n Neighbors (Synonms) of Obama\n ------------------------------\n Bush \n Reagan \n Clinton \n Ahmadinejad \n Nixon \n Karzai \n McCain \n Biden \n Huckabee \n Lula \n \n \n The first 10 dimensions out the 256 dimensions\n \n [-2.57382345 1.52175975 0.51070285 1.08678675 -0.74386948 -1.18616164\n 2.92784619 -0.25694436 -1.40958667 -2.39675403]\n\n\nMorphology\n~~~~~~~~~~\n\n.. code:: python\n\n word = Text(\"Preprocessing is an essential step.\").words[0]\n print(word.morphemes)\n\n\n.. parsed-literal::\n\n [u'Pre', u'process', u'ing']\n\n\nTransliteration\n~~~~~~~~~~~~~~~\n\n.. code:: python\n\n from polyglot.transliteration import Transliterator\n transliterator = Transliterator(source_lang=\"en\", target_lang=\"ru\")\n print(transliterator.transliterate(u\"preprocessing\"))\n\n\n.. 
parsed-literal::\n\n \u043f\u0440\u0435\u043f\u0440\u043e\u043a\u0435\u0441\u0441\u0438\u043d\u0433\n\n\n\n\n\nHistory\n-------\n\n\"14.11\" (2014-01-11)\n---------------------\n\n* First release on PyPI.\n\n\n\"15.5.2\" (2015-05-02)\n---------------------\n\n* Polyglot is feature complete.\n\n\n\"15.10.03\" (2015-10-03)\n---------------------------\n\n* Change the polyglot models mirror to Stony Brook University DSL lab instead\n of Google cloud storage.\n\n\n\"16.07.04\" (2016-07-03)\n---------------------------\n\n* New Features:\n - Support Transfer POS Tagging.\n - Support supplying `hint_language_code` for `Text`.\n\n* Bug Fix: \n - Improve sentence serialization (PR #34)\n - Fix rare unicode encode error (PR #35)\n - Fix transliteration from languages other than English (PR #46)\n - Add link to GitHub in README (PR #49)\n - Make handling of paths more coherent (PR #55)\n - Fix in-place embedding normalization for NER corrupting the features of POS (issue #60, PR #62)", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/aboSamoor/polyglot", "keywords": "polyglot", "license": "GPLv3", "maintainer": null, "maintainer_email": null, "name": "polyglot", "package_url": "https://pypi.org/project/polyglot/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/polyglot/", "project_urls": { "Download": "UNKNOWN", "Homepage": "https://github.com/aboSamoor/polyglot" }, "release_url": "https://pypi.org/project/polyglot/16.7.4/", "requires_dist": null, "requires_python": null, "summary": "Polyglot is a natural language pipeline that supports massive multilingual applications.", "version": "16.7.4" }, "last_serial": 2200800, "releases": { "14.11": [ { "comment_text": "", "digests": { "md5": "f3438f1dfa48e1b136ead6f9a80d8584", "sha256": "312f07f365744272d0df5dc83267e76766a2f0525aa531ec2236d1f79185133d" }, "downloads": -1, "filename": 
"polyglot-14.11.tar.gz", "has_sig": false, "md5_digest": "f3438f1dfa48e1b136ead6f9a80d8584", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 64610, "upload_time": "2014-09-08T21:00:03", "url": "https://files.pythonhosted.org/packages/6f/00/bd18604d6c320c219be344c039282d3602a36c71b63d0f69246206b3e5fc/polyglot-14.11.tar.gz" } ], "15.03": [ { "comment_text": "", "digests": { "md5": "3ca6c1ca36768b2f856997c0e9c467c9", "sha256": "369c8ef564fa644b1404e7f7461894705bc9543ae2dbf87ab8432272bca8a777" }, "downloads": -1, "filename": "polyglot-15.03.tar.gz", "has_sig": false, "md5_digest": "3ca6c1ca36768b2f856997c0e9c467c9", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 69405, "upload_time": "2015-03-03T03:18:05", "url": "https://files.pythonhosted.org/packages/f1/6a/55e7f320d059df9194b7fa1b911c192b96c36c0c4feae0b9fd39cbcd5216/polyglot-15.03.tar.gz" } ], "15.03.05": [ { "comment_text": "", "digests": { "md5": "189904eb084769b20acdf3f28bb4f64d", "sha256": "8d0b03d5b19f3892170a9654db91333cf0a85cf3f621615b3b276ec7a6f22121" }, "downloads": -1, "filename": "polyglot-15.03.05.tar.gz", "has_sig": false, "md5_digest": "189904eb084769b20acdf3f28bb4f64d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 88332, "upload_time": "2015-03-05T23:58:17", "url": "https://files.pythonhosted.org/packages/95/85/0d926175a874dd6f764fc8bb53500241394846967a32ccd36bbb5f3d47cf/polyglot-15.03.05.tar.gz" } ], "15.03.17": [ { "comment_text": "", "digests": { "md5": "c32ad952e621b0f8e785c8ef175cc767", "sha256": "66ac1ab35e2ba666510f2d87746930b0e18a75dce1b16d0692be019d09a86b95" }, "downloads": -1, "filename": "polyglot-15.03.17.tar.gz", "has_sig": false, "md5_digest": "c32ad952e621b0f8e785c8ef175cc767", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 130040, "upload_time": "2015-03-17T04:22:24", "url": 
"https://files.pythonhosted.org/packages/a6/bf/99f529778822764c0f3c394b05494089530190ac31c698124a6ddc0a83a0/polyglot-15.03.17.tar.gz" } ], "15.04.19": [ { "comment_text": "", "digests": { "md5": "f922f1fe4704581cd0e8075d3f2f8dec", "sha256": "f2706f2311d5afa2d05f325ea6dd7c2dcd1b387db79d311fc365bb210e0a0da9" }, "downloads": -1, "filename": "polyglot-15.04.19.tar.gz", "has_sig": false, "md5_digest": "f922f1fe4704581cd0e8075d3f2f8dec", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 141999, "upload_time": "2015-04-20T04:16:32", "url": "https://files.pythonhosted.org/packages/3a/07/a576344b548b4f8ff48db4c4b0cc4212df5e126ffdd9cc1d17090ddb8efe/polyglot-15.04.19.tar.gz" } ], "15.10.03": [ { "comment_text": "", "digests": { "md5": "9aaeac6ece72e4ace1703f793fdd9dd5", "sha256": "f763f8ffb85f9b24ca1dde3e4fd41d1f5db8cbec32c1703b3fe37e8c5f1c64d9" }, "downloads": -1, "filename": "polyglot-15.10.03-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "9aaeac6ece72e4ace1703f793fdd9dd5", "packagetype": "bdist_wheel", "python_version": "2.7", "requires_python": null, "size": 54930, "upload_time": "2015-10-03T22:18:48", "url": "https://files.pythonhosted.org/packages/c3/e9/fe3669dbc44b4c4d1e4dc01e62dd89f5513d5ad537516284136f55002627/polyglot-15.10.03-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "ca114b46b4f6150c6a8388c3eb4631da", "sha256": "c63566ea655e8790f1fb8a3f5f60626418e0de085dd5447c4e43297a58e49f20" }, "downloads": -1, "filename": "polyglot-15.10.03.tar.gz", "has_sig": false, "md5_digest": "ca114b46b4f6150c6a8388c3eb4631da", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 126386, "upload_time": "2015-10-03T22:18:32", "url": "https://files.pythonhosted.org/packages/d3/c3/fee35a094d07a3f19142ba64fa32446af2ee23584b5ee3c1f60519dc3b72/polyglot-15.10.03.tar.gz" } ], "15.5.1": [], "15.5.2": [ { "comment_text": "", "digests": { "md5": "fef3153342e28745b706d5686f4f608a", "sha256": 
"84777fee5951537aeb24e5619d7587c095e9c266714ee874f858b673e3a5e918" }, "downloads": -1, "filename": "polyglot-15.5.2.tar.gz", "has_sig": false, "md5_digest": "fef3153342e28745b706d5686f4f608a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 126980, "upload_time": "2015-05-01T12:14:38", "url": "https://files.pythonhosted.org/packages/95/44/23a984c476d735c8f706a5b62b8ab50975e97bbd3bd204cb8bf07d928158/polyglot-15.5.2.tar.gz" } ], "16.7.4": [ { "comment_text": "", "digests": { "md5": "645969b6b1eaf78d8893ed70756ea577", "sha256": "f7d9cca9a212622548e9416fb89f1238b994b8860ef49e03b7c82c67f9b6269b" }, "downloads": -1, "filename": "polyglot-16.7.4.tar.gz", "has_sig": false, "md5_digest": "645969b6b1eaf78d8893ed70756ea577", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 126296, "upload_time": "2016-07-03T20:05:42", "url": "https://files.pythonhosted.org/packages/e7/98/e24e2489114c5112b083714277204d92d372f5bbe00d5507acf40370edb9/polyglot-16.7.4.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "645969b6b1eaf78d8893ed70756ea577", "sha256": "f7d9cca9a212622548e9416fb89f1238b994b8860ef49e03b7c82c67f9b6269b" }, "downloads": -1, "filename": "polyglot-16.7.4.tar.gz", "has_sig": false, "md5_digest": "645969b6b1eaf78d8893ed70756ea577", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 126296, "upload_time": "2016-07-03T20:05:42", "url": "https://files.pythonhosted.org/packages/e7/98/e24e2489114c5112b083714277204d92d372f5bbe00d5507acf40370edb9/polyglot-16.7.4.tar.gz" } ] }