{ "info": { "author": "Abhijit Balaji", "author_email": "balaabhijit5@gmail.com", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python :: 3" ], "description": "# spacy-langdetect\nFully customizable language detection pipeline for [spaCy](https://github.com/explosion/spaCy)\n\n## Installation\n`pip install spacy-langdetect`\n\n## NOTE:\nRequires spaCy >= 2.0. spaCy is not listed as a dependency of `pip install spacy-langdetect`, so that the package can also be used with `nightly` versions.\n\n## Basic usage\nOut of the box, it uses [langdetect](https://github.com/Mimino666/langdetect) under the hood to detect the language of spaCy's Doc and Span objects.\n\n```python\nimport spacy\nfrom spacy_langdetect import LanguageDetector\nnlp = spacy.load(\"en\")\nnlp.add_pipe(LanguageDetector(), name=\"language_detector\", last=True)\ntext = \"This is English text. Er lebt mit seinen Eltern und seiner Schwester in Berlin. Yo me divierto todos los d\u00edas en el parque. Je m'appelle Ang\u00e9lica Summer, j'ai 12 ans et je suis canadienne.\"\ndoc = nlp(text)\n# document-level language detection. Think of it as the average language of the document!\nprint(doc._.language)\n# sentence-level language detection\nfor i, sent in enumerate(doc.sents):\n    print(sent, sent._.language)\n```\n\n## Using your own language detector\nSuppose you are not happy with the accuracy of the out-of-the-box language detector, or you have your own language detector that you want to use with a spaCy pipeline. How do you do it? That's where the `language_detection_function` argument comes in. The function takes a spaCy Doc or Span object and can return any Python object, which is stored in `doc._.language` and `span._.language`. 
For example, let's say you want to use [googletrans](https://pypi.org/project/googletrans/) as your language detection module:\n\n```python\nimport spacy\nfrom spacy.tokens import Doc, Span\nfrom spacy_langdetect import LanguageDetector\n# install with: pip install googletrans\nfrom googletrans import Translator\nnlp = spacy.load(\"en\")\n\ndef custom_detection_function(spacy_object):\n    # custom detection function should take a spaCy Doc or a Span\n    assert isinstance(spacy_object, (Doc, Span)), \\\n        \"spacy_object must be a spacy Doc or Span object but it is a {}\".format(type(spacy_object))\n    detection = Translator().detect(spacy_object.text)\n    return {'language': detection.lang, 'score': detection.confidence}\n\nnlp.add_pipe(LanguageDetector(language_detection_function=custom_detection_function), name=\"language_detector\", last=True)\ntext = \"This is English text. Er lebt mit seinen Eltern und seiner Schwester in Berlin. Yo me divierto todos los d\u00edas en el parque. Je m'appelle Ang\u00e9lica Summer, j'ai 12 ans et je suis canadienne.\"\ndoc = nlp(text)\n# document-level language detection. 
Think of it like average language of document!\nprint(doc._.language)\n# sentence level language detection\nfor i, sent in enumerate(doc.sents):\n print(sent, sent._.language)\n```\nSimilarly you can also use [pycld2](https://pypi.org/project/pycld2/) and other language detectors with spaCy\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/Abhijit-2592/spacy-langdetect", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "spacy-langdetect", "package_url": "https://pypi.org/project/spacy-langdetect/", "platform": "", "project_url": "https://pypi.org/project/spacy-langdetect/", "project_urls": { "Homepage": "https://github.com/Abhijit-2592/spacy-langdetect" }, "release_url": "https://pypi.org/project/spacy-langdetect/0.1.2/", "requires_dist": [ "pytest", "langdetect (==1.0.7)" ], "requires_python": "", "summary": "Fully customizable language detection pipeline for spaCy", "version": "0.1.2" }, "last_serial": 5215583, "releases": { "0.1.1": [ { "comment_text": "", "digests": { "md5": "cbb950e2a0a66573b06e635bc7712699", "sha256": "731d90e422b2237efedbec5a9f1abab18bc185b51588fff07f61a48238281a95" }, "downloads": -1, "filename": "spacy_langdetect-0.1.1-py3-none-any.whl", "has_sig": false, "md5_digest": "cbb950e2a0a66573b06e635bc7712699", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 6059, "upload_time": "2019-02-12T14:34:27", "url": "https://files.pythonhosted.org/packages/46/4c/3fdf892237cbcbd6b9ff30aedc4fb5f8a804a94ec7933b16126306fff5eb/spacy_langdetect-0.1.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "e5ad4d684d601b13a70961512fe2dc67", "sha256": "1a2a5d4f3d2861f5cd0e3fe4eb041f26d4579416de00a28e5f76392544dcb0d9" }, "downloads": -1, "filename": "spacy-langdetect-0.1.1.tar.gz", "has_sig": false, "md5_digest": 
"e5ad4d684d601b13a70961512fe2dc67", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3038, "upload_time": "2019-02-12T14:34:29", "url": "https://files.pythonhosted.org/packages/b9/d5/146d41f4008b6d1d6abe0479630bb74bf5c04d73fe501138c0cc8ea4eddf/spacy-langdetect-0.1.1.tar.gz" } ], "0.1.2": [ { "comment_text": "", "digests": { "md5": "42ec70afe4e200cf0110b341c117b803", "sha256": "fb77878fb2445933cb6db836fc20a0d712fa685e1bd7e3b6a447d0556e54ebde" }, "downloads": -1, "filename": "spacy_langdetect-0.1.2-py3-none-any.whl", "has_sig": false, "md5_digest": "42ec70afe4e200cf0110b341c117b803", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 5012, "upload_time": "2019-05-02T06:47:23", "url": "https://files.pythonhosted.org/packages/29/70/72dad19abe81ca8e85ff951da170915211d42d705a001d7e353af349a704/spacy_langdetect-0.1.2-py3-none-any.whl" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "42ec70afe4e200cf0110b341c117b803", "sha256": "fb77878fb2445933cb6db836fc20a0d712fa685e1bd7e3b6a447d0556e54ebde" }, "downloads": -1, "filename": "spacy_langdetect-0.1.2-py3-none-any.whl", "has_sig": false, "md5_digest": "42ec70afe4e200cf0110b341c117b803", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 5012, "upload_time": "2019-05-02T06:47:23", "url": "https://files.pythonhosted.org/packages/29/70/72dad19abe81ca8e85ff951da170915211d42d705a001d7e353af349a704/spacy_langdetect-0.1.2-py3-none-any.whl" } ] }