{ "info": { "author": "Ines Montani", "author_email": "ines@explosion.ai", "bugtrack_url": null, "classifiers": [], "description": "spacymoji: emoji for spaCy\n**************************\n\n`spaCy v2.0 `_ extension and pipeline component\nfor adding emoji meta data to ``Doc`` objects. Detects emoji consisting of one\nor more unicode characters, and can optionally merge multi-char emoji (combined\npictures, emoji with skin tone modifiers) into one token. Human-readable emoji\ndescriptions are added as a custom attribute, and an optional lookup table can\nbe provided for your own descriptions. The extension sets the custom ``Doc``,\n``Token`` and ``Span`` attributes ``._.is_emoji``, ``._.emoji_desc``,\n``._.has_emoji`` and ``._.emoji``. You can read more about custom pipeline\ncomponents and extension attributes\n`here `_.\n\nEmoji are matched using spaCy's ``PhraseMatcher``, and looked up in the data\ntable provided by the `\"emoji\" package `_.\n\n.. image:: https://img.shields.io/github/release/ines/spacymoji.svg?style=flat-square\n :target: https://github.com/ines/spacymoji/releases\n :alt: Current Release Version\n\n.. image:: https://img.shields.io/pypi/v/spacymoji.svg?style=flat-square\n :target: https://pypi.python.org/pypi/spacymoji\n :alt: pypi Version\n\n\u23f3 Installation\n===============\n\n``spacymoji`` requires ``spacy`` v2.0.0 or higher.\n\n.. code:: bash\n\n pip install spacymoji\n\n\u261d\ufe0f Usage\n========\n\nImport the component and initialise it with the shared ``nlp`` object (i.e. an\ninstance of ``Language``), which is used to initialise the ``PhraseMatcher``\nwith the shared vocab, and create the match patterns. Then add the component\nanywhere in your pipeline.\n\n.. code:: python\n\n import spacy\n from spacymoji import Emoji\n\n nlp = spacy.load('en')\n emoji = Emoji(nlp)\n nlp.add_pipe(emoji, first=True)\n\n doc = nlp(u\"This is a test \ud83d\ude3b \ud83d\udc4d\ud83c\udfff\")\n assert doc._.has_emoji == True\n assert doc[2:5]._.has_emoji == True\n assert doc[0]._.is_emoji == False\n assert doc[4]._.is_emoji == True\n assert doc[5]._.emoji_desc == u'thumbs up dark skin tone'\n assert len(doc._.emoji) == 2\n assert doc._.emoji[1] == (u'\ud83d\udc4d\ud83c\udfff', 5, u'thumbs up dark skin tone')\n\n``spacymoji`` only cares about the token text, so you can use it on a blank\n``Language`` instance (it should work for all\n`available languages `_!), or in\na pipeline with a loaded model. If you're loading a model and your pipeline\nincludes a tagger, parser and entity recognizer, make sure to add the emoji\ncomponent as ``first=True``, so the spans are merged right after tokenization,\nand *before* the document is parsed. If your text contains a lot of emoji, this\nmight even give you a nice boost in parser accuracy.\n\nAvailable attributes\n--------------------\n\nThe extension sets attributes on the ``Doc``, ``Span`` and ``Token``. You can\nchange the attribute names on initialisation of the extension. For more details\non custom components and attributes, see the\n`processing pipelines documentation `_.\n\n====================== ======= ===\n``Token._.is_emoji`` bool Whether the token is an emoji.\n``Token._.emoji_desc`` unicode A human-readable description of the emoji.\n``Doc._.has_emoji`` bool Whether the document contains emoji.\n``Doc._.emoji`` list ``(emoji, index, description)`` tuples of the document's emoji.\n``Span._.has_emoji`` bool Whether the span contains emoji.\n``Span._.emoji`` list ``(emoji, index, description)`` tuples of the span's emoji.\n====================== ======= ===\n\nSettings\n--------\n\nOn initialisation of ``Emoji``, you can define the following settings:\n\n=============== ============ ===\n``nlp`` ``Language`` The shared ``nlp`` object. Used to initialise the matcher with the shared ``Vocab``, and create ``Doc`` match patterns.\n``attrs`` tuple Attributes to set on the ._ property. Defaults to ``('has_emoji', 'is_emoji', 'emoji_desc', 'emoji')``.\n``pattern_id`` unicode ID of match pattern, defaults to ``'EMOJI'``. Can be changed to avoid ID conflicts.\n``merge_spans`` bool Merge spans containing multi-character emoji, defaults to ``True``. Will only merge combined emoji resulting in one icon, not sequences.\n``lookup`` dict Optional lookup table that maps emoji unicode strings to custom descriptions, e.g. translations or other annotations.\n=============== ============ ===\n\n.. code:: python\n\n emoji = Emoji(nlp, attrs=('has_e', 'is_e', 'e_desc', 'e'), lookup={u'\ud83d\udc68\u200d\ud83c\udfa4': u'David Bowie'})\n nlp.add_pipe(emoji)\n doc = nlp(u\"We can be \ud83d\udc68\u200d\ud83c\udfa4 heroes\")\n assert doc[3]._.is_e\n assert doc[3]._.e_desc == u'David Bowie'\n\n\ud83d\udee3 Roadmap\n==========\n\nThis extension is still experimental, but here are some features that might\nbe cool to add in the future:\n\n* **Add match patterns and attributes for emoji shortcodes**, e.g. ``:+1:``. The shortcodes could optionally be merged into one token, and receive a ``NORM`` attribute with the unicode emoji. The ``NORM`` is used as a feature for training, so ``:+1:`` and \ud83d\udc4d would automatically receive similar representations.\n\n* **Add support for the Unicode Emoji Annotations project**. The JavaScript `package `_ also comes with `pre-compiled JSON data `_, including both standardised and community-contributed annotations in English and German.", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://spacy.io", "keywords": "", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "spacymoji", "package_url": "https://pypi.org/project/spacymoji/", "platform": "", "project_url": "https://pypi.org/project/spacymoji/", "project_urls": { "Homepage": "https://spacy.io" }, "release_url": "https://pypi.org/project/spacymoji/2.0.0/", "requires_dist": null, "requires_python": "", "summary": "spaCy pipeline component for adding emoji meta data to Doc, Token and Span objects.", "version": "2.0.0" }, "last_serial": 5117667, "releases": { "0.0.1": [ { "comment_text": "", "digests": { "md5": "9fa7809d2526299ada812c5913b010a3", "sha256": "11b706cf079141351169b057ed7f2409d9ff353e9780a9698b5dfc07cf3c5f46" }, "downloads": -1, "filename": "spacymoji-0.0.1.tar.gz", "has_sig": false, "md5_digest": "9fa7809d2526299ada812c5913b010a3", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5726, "upload_time": "2017-10-12T21:47:56", "url": "https://files.pythonhosted.org/packages/1c/18/f289088d3e63b17f7a9ada2a55ee753952d6579d526e9f8d5bc27af1fd28/spacymoji-0.0.1.tar.gz" } ], "1.0.0": [ { "comment_text": "", "digests": { "md5": "7fa29334ca44237f89ef7704a71dd1c0", "sha256": "e45b0abf93da973656f9f8e778c67259c2d34631dab22af910dec6df4ed2face" }, "downloads": -1, "filename": "spacymoji-1.0.0.tar.gz", "has_sig": false, "md5_digest": "7fa29334ca44237f89ef7704a71dd1c0", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5788, "upload_time": "2017-12-09T09:16:58", "url": "https://files.pythonhosted.org/packages/3d/be/0074c90f82a38dbb3b4238f7b08de59a5b782ea1b5d8e15b2ce61db8b5a5/spacymoji-1.0.0.tar.gz" } ], "2.0.0": [ { "comment_text": "", "digests": { "md5": "8aaa47048ac334b64c7c6b6ad122c25e", "sha256": "3af5c43da897d0d9a84de3eada89cb9d406ba388c827e9ef9ca4fed8920f5b88" }, "downloads": -1, "filename": "spacymoji-2.0.0-py3-none-any.whl", "has_sig": false, "md5_digest": "8aaa47048ac334b64c7c6b6ad122c25e", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 7528, "upload_time": "2019-04-09T08:56:20", "url": "https://files.pythonhosted.org/packages/e6/42/b4460030eb06504451973609ab8a95b8c0106090aca7a4d657baf9b2611d/spacymoji-2.0.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "5d9e6d272f41f099973aecf826e357e2", "sha256": "4df573480523a79ccd41fc3b6f23c0baedd009012a7772abb0adf627e3da4c8a" }, "downloads": -1, "filename": "spacymoji-2.0.0.tar.gz", "has_sig": false, "md5_digest": "5d9e6d272f41f099973aecf826e357e2", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5845, "upload_time": "2019-04-09T08:56:03", "url": "https://files.pythonhosted.org/packages/cf/76/e6b9ec7a442ad440c7b931d19cc018c1aca8ebed06fca04c69c2f7279e7e/spacymoji-2.0.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "8aaa47048ac334b64c7c6b6ad122c25e", "sha256": "3af5c43da897d0d9a84de3eada89cb9d406ba388c827e9ef9ca4fed8920f5b88" }, "downloads": -1, "filename": "spacymoji-2.0.0-py3-none-any.whl", "has_sig": false, "md5_digest": "8aaa47048ac334b64c7c6b6ad122c25e", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 7528, "upload_time": "2019-04-09T08:56:20", "url": "https://files.pythonhosted.org/packages/e6/42/b4460030eb06504451973609ab8a95b8c0106090aca7a4d657baf9b2611d/spacymoji-2.0.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "5d9e6d272f41f099973aecf826e357e2", "sha256": "4df573480523a79ccd41fc3b6f23c0baedd009012a7772abb0adf627e3da4c8a" }, "downloads": -1, "filename": "spacymoji-2.0.0.tar.gz", "has_sig": false, "md5_digest": "5d9e6d272f41f099973aecf826e357e2", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5845, "upload_time": "2019-04-09T08:56:03", "url": "https://files.pythonhosted.org/packages/cf/76/e6b9ec7a442ad440c7b931d19cc018c1aca8ebed06fca04c69c2f7279e7e/spacymoji-2.0.0.tar.gz" } ] }