{ "info": { "author": "Nick Haynes", "author_email": "nickdavidhaynes@gmail.com", "bugtrack_url": null, "classifiers": [], "description": "# spaCy-CLD: Bringing simple language detection to spaCy\n\n## Installation\n\n`pip install spacy_cld`\n\n## Usage\n\nAdding the spaCy-CLD component to the processing pipeline is relatively simple:\n\n```\nimport spacy\nfrom spacy_cld import LanguageDetector\n\nnlp = spacy.load('en')\nlanguage_detector = LanguageDetector()\nnlp.add_pipe(language_detector)\ndoc = nlp('This is some English text.')\n\ndoc._.languages # ['en']\ndoc._.language_scores['en'] # 0.96\n```\n\nspaCy-CLD operates on `Doc` and `Span` spaCy objects. When called on a `Doc` or `Span`, the object is given two attributes: `languages` (a list of up to 3 language codes) and `language_scores` (a dictionary mapping language codes to confidence scores between 0 and 1).\n\n## Under the hood\n\nspacy-cld is a little extension that wraps the [PYCLD2](https://github.com/aboSamoor/pycld2) Python library, which in turn wraps the [Compact Language Detector 2](https://github.com/CLD2Owners/cld2) C library originally built at Google for the Chromium project. CLD2 uses character n-grams as features and a Naive Bayes classifier to identify 80+ languages from Unicode text strings (or XML/HTML). It can detect up to 3 different languages in a given document, and reports a confidence score (reported in with each language.\n\nFor additional details, see the linked project pages for PYCLD2 and CLD2.", "description_content_type": null, "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/nickdavidhaynes/spacy-cld", "keywords": "", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "spacy-cld", "package_url": "https://pypi.org/project/spacy-cld/", "platform": "", "project_url": "https://pypi.org/project/spacy-cld/", "project_urls": { "Homepage": "https://github.com/nickdavidhaynes/spacy-cld" }, "release_url": "https://pypi.org/project/spacy-cld/0.1.0/", "requires_dist": null, "requires_python": "", "summary": "spaCy pipeline component for guessing the language of Doc and Span objects.", "version": "0.1.0" }, "last_serial": 3377871, "releases": { "0.0.1": [ { "comment_text": "", "digests": { "md5": "280f1cf40c70e5a77af3a092deb7cfd9", "sha256": "69d4fa172389964eea2e237149392c81eb1215e484242dba9d3d093572cc6487" }, "downloads": -1, "filename": "spacy_cld-0.0.1-py3.6.egg", "has_sig": false, "md5_digest": "280f1cf40c70e5a77af3a092deb7cfd9", "packagetype": "bdist_egg", "python_version": "3.6", "requires_python": null, "size": 5022, "upload_time": "2017-11-30T03:46:27", "url": "https://files.pythonhosted.org/packages/e2/55/fc99326f95b0e57857fc9d1f6ec86bc55c7765ed50fabe694165aabdd91e/spacy_cld-0.0.1-py3.6.egg" }, { "comment_text": "", "digests": { "md5": "0caa9f666bf9cf8ad5c30d599657480c", "sha256": "ec5332330ecf921c1382114a9f7208bc0d0cbcd3d9519b5d70d5560fc72a8fa0" }, "downloads": -1, "filename": "spacy_cld-0.0.1.tar.gz", "has_sig": false, "md5_digest": "0caa9f666bf9cf8ad5c30d599657480c", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3070, "upload_time": "2017-11-30T03:46:28", "url": "https://files.pythonhosted.org/packages/8b/bd/c0f4a8a4bd0c0e6b3126aa61f7aed6fa9a350ee9dbfa7b7acdfb42f9cd92/spacy_cld-0.0.1.tar.gz" } ], "0.0.2": [ { "comment_text": "", "digests": { "md5": "682385828e7f8145678b78fc87d514a6", "sha256": "5178dcd389ce3cd9fc08d37e052705dfcdc2a44ad294a2adf2e278302e1546bc" }, "downloads": -1, "filename": "spacy_cld-0.0.2.tar.gz", "has_sig": false, "md5_digest": "682385828e7f8145678b78fc87d514a6", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3069, "upload_time": "2017-11-30T03:52:08", "url": "https://files.pythonhosted.org/packages/41/73/22857f2d8003290f78c930146fc6e5d00cdefb68884661c2180bcbcb8920/spacy_cld-0.0.2.tar.gz" } ], "0.0.3": [ { "comment_text": "", "digests": { "md5": "7f73bc2aa015cb8b2d249b5932818f77", "sha256": "0f12d97d943d92095b64835ef51a0f9dd97dd018887b2ebd53330c0febaca7bf" }, "downloads": -1, "filename": "spacy_cld-0.0.3.tar.gz", "has_sig": false, "md5_digest": "7f73bc2aa015cb8b2d249b5932818f77", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3197, "upload_time": "2017-11-30T15:24:21", "url": "https://files.pythonhosted.org/packages/af/7b/6304085333b4c9a15ee9c0a6c8ad58e184feefb9cb5fa8dda4e3c48b1cde/spacy_cld-0.0.3.tar.gz" } ], "0.1.0": [ { "comment_text": "", "digests": { "md5": "0572f0ff474332ec85c0b348ad248619", "sha256": "f40178116c90bcb77d343976f9502c865b56643cb9dd8c7e3ca93c91303872cd" }, "downloads": -1, "filename": "spacy_cld-0.1.0.tar.gz", "has_sig": false, "md5_digest": "0572f0ff474332ec85c0b348ad248619", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3263, "upload_time": "2017-11-30T16:40:27", "url": "https://files.pythonhosted.org/packages/e3/3b/f5344007259b5beb0a8e0d7b9e6b0d2c5c4dcfe674bc94b7497bcc201ee0/spacy_cld-0.1.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "0572f0ff474332ec85c0b348ad248619", "sha256": "f40178116c90bcb77d343976f9502c865b56643cb9dd8c7e3ca93c91303872cd" }, "downloads": -1, "filename": "spacy_cld-0.1.0.tar.gz", "has_sig": false, "md5_digest": "0572f0ff474332ec85c0b348ad248619", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3263, "upload_time": "2017-11-30T16:40:27", "url": "https://files.pythonhosted.org/packages/e3/3b/f5344007259b5beb0a8e0d7b9e6b0d2c5c4dcfe674bc94b7497bcc201ee0/spacy_cld-0.1.0.tar.gz" } ] }