{ "info": { "author": "sami moustachir", "author_email": "moustachir.sami@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Environment :: Console", "Intended Audience :: Developers", "Intended Audience :: Science/Research", "License :: OSI Approved :: MIT License", "Natural Language :: English", "Operating System :: OS Independent", "Programming Language :: Python", "Programming Language :: Python :: 2", "Programming Language :: Python :: 3", "Topic :: Scientific/Engineering", "Topic :: Software Development :: Libraries :: Python Modules" ], "description": "[![Build Status](https://travis-ci.org/sammous/spacy-lefff.svg?branch=master)](https://travis-ci.org/sammous/spacy-lefff)[![Coverage Status](https://codecov.io/gh/sammous/spacy-lefff/badge.svg?branch=master)](https://codecov.io/gh/sammous/spacy-lefff?branch=master)[![PyPI version](https://badge.fury.io/py/spacy-lefff.svg)](https://badge.fury.io/py/spacy-lefff)\n# spacy-lefff : Custom French POS and lemmatizer based on Lefff for spacy \n\n[spacy v2.0](https://spacy.io/usage/v2) extension and pipeline component for adding a French POS and lemmatizer based on [Lefff](https://hal.inria.fr/inria-00521242/).\n\n*On version [v2.0.17](https://github.com/explosion/spaCy/releases/tag/v2.0.17), spaCy updated French lemmatization*\n\n## Description\n\nThis package brings Lefff lemmatization and part-of-speech tagging to a spaCy custom pipeline.\nCombining POS tagging and lemmatization inside a pipeline improves French text preprocessing compared to the built-in spaCy French processing.\nIt is still a work in progress, so the matching might not be perfect; if the package finds nothing, you can still fall back on the default `spaCy` results.\n\n## Installation\n\n`spacy-lefff` requires `spacy` <= v2.0.12.\n\n```\npip install spacy-lefff\n```\n\n## Usage\n\nImport and initialize your `nlp` spacy object and add the custom component after it 
parsed the document so you can benefit from the POS tags.\nMake sure to work with `UTF-8` text.\n\nIf both POS and lemmatizer are bundled, you need to tell the lemmatizer to use the MElt mapping by setting `after_melt`; otherwise it will use the spaCy part-of-speech mapping.\n\nThe `default` option returns the word itself when no lemma is found.\n\nThe current mapping from spaCy to Lefff is:\n\n```json\n{\n \"ADJ\": \"adj\",\n \"ADP\": \"det\",\n \"ADV\": \"adv\",\n \"DET\": \"det\",\n \"PRON\": \"cln\",\n \"PROPN\": \"np\",\n \"NOUN\": \"nc\",\n \"VERB\": \"v\",\n \"PUNCT\": \"poncts\"\n}\n```\n\n## MElt Tagset\n\nMElt Tag table:\n\n```\nADJ \t adjective\nADJWH\t interrogative adjective\nADV\t adverb\nADVWH\t interrogative adverb\nCC\t coordination conjunction\nCLO\t object clitic pronoun\nCLR\t reflexive clitic pronoun\nCLS\t subject clitic pronoun\nCS\t subordination conjunction\nDET\t determiner\nDETWH\t interrogative determiner\nET\t foreign word\nI\t interjection\nNC\t common noun\nNPP\t proper noun\nP\t preposition\nP+D\t preposition+determiner amalgam\nP+PRO\t preposition+pronoun amalgam\nPONCT\t punctuation mark\nPREF\t prefix\nPRO\t full pronoun\nPROREL\t relative pronoun\nPROWH\t interrogative pronoun\nV\t indicative or conditional verb form\nVIMP\t imperative verb form\nVINF\t infinitive verb form\nVPP\t past participle\nVPR\t present participle\nVS\t subjunctive verb form\n```\n\n### Code snippet\n\nYou need to install the French spaCy model first: `python -m spacy download fr`.\n\n- An example using the `LefffLemmatizer` without the `POSTagger`:\n\n```python\nimport spacy\nfrom spacy_lefff import LefffLemmatizer\n\nnlp = spacy.load('fr')\nfrench_lemmatizer = LefffLemmatizer()\nnlp.add_pipe(french_lemmatizer, name='lefff')\ndoc = nlp(u\"Apple cherche a acheter une startup anglaise pour 1 milliard de dollard\")\nfor d in doc:\n print(d.text, d.pos_, d._.lefff_lemma, d.tag_, d.lemma_)\n```\n\n| Text | spaCy POS | Lefff Lemma | spaCy tag | spaCy Lemma 
|\n|-----|-----|-----|------|------|\n|Apple | ADJ | None | ADJ__Number=Sing | Apple |\n|cherche |NOUN |cherche |NOUN__Number=Sing| chercher|\n|a |AUX |None |AUX__Mood=Ind Number=Sing Person=3 Tense=Pres VerbForm=Fin |avoir|\n|acheter| VERB| acheter| VERB__VerbForm=Inf| acheter|\n|une |DET |un |DET__Definite=Ind Gender=Fem Number=Sing PronType=Art |un|\n|startup |ADJ |None| ADJ__Number=Sing| startup|\n|anglaise |NOUN |anglaise |NOUN__Gender=Fem Number=Sing |anglais|\n|pour |ADP |None| ADP___| pour|\n|1 |NUM |None |NUM__NumType=Card |1|\n|milliard| NOUN |milliard |NOUN__Gender=Masc Number=Sing NumType=Card| milliard|\n|de | ADP |un |ADP___ |de|\n|dollard | NOUN | None | NOUN__Gender=Masc Number=Sing |dollard|\n\n- An example using the `POSTagger` :\n\n```python\nimport spacy\nfrom spacy_lefff import LefffLemmatizer, POSTagger\n\nnlp = spacy.load('fr')\npos = POSTagger()\nfrench_lemmatizer = LefffLemmatizer(after_melt=True, default=True)\nnlp.add_pipe(pos, name='pos', after='parser')\nnlp.add_pipe(french_lemmatizer, name='lefff', after='pos')\ndoc = nlp(u\"Apple cherche a acheter une startup anglaise pour 1 milliard de dollard\")\nfor d in doc:\n print(d.text, d.pos_, d._.melt_tagger, d._.lefff_lemma, d.tag_, d.lemma_)\n```\n|Text|spaCy POS|MElt Tag| Lefff Lemma| spaCy tag| spaCy Lemma|\n|-----|-----|-----|-----|-----|-----|\n|Apple| ADJ| NPP| apple |ADJ__Number=Sing| Apple|\n|cherche |NOUN |V |chercher |NOUN__Number=Sing |chercher|\n|a |AUX |V| avoir| AUX__Mood=Ind Number=Sing Person=3 Tense=Pres VerbForm=Fin |avoir|\n|acheter |VERB |VINF| acheter | VERB__VerbForm=Inf| acheter|\n|une |DET |DET |un| DET__Definite=Ind Gender=Fem Number=Sing PronType=Art |un|\n|startup |ADJ| NC | startup |ADJ__Number=Sing|startup|\n|anglaise |NOUN |ADJ| anglais| NOUN__Gender=Fem Number=Sing| anglais|\n|pour |ADP |P| pour | ADP___ |pour|\n|1 |NUM| DET | 1 |NUM__NumType=Card |1|\n|milliard |NOUN |NC |milliard| NOUN__Gender=Masc Number=Sing NumType=Card| milliard|\n|de |ADP |P| de | 
ADP___ |de|\n|dollard| NOUN |NC | dollard |NOUN__Gender=Masc Number=Sing |dollard|\n\n\nWe can see that both `cherche` and `startup` were not tagged correctly by the default POS tagger.\n`spaCy` classified them as a `NOUN` and an `ADJ`, while `MElt` classified them as a `V` and an `NC`.\n\n## Credits\n\nSagot, B. (2010). [The Lefff, a freely available and large-coverage morphological and syntactic lexicon for French](https://hal.inria.fr/inria-00521242/). In 7th international conference on Language Resources and Evaluation (LREC 2010).\n\nBeno\u00eet Sagot's webpage about Lefff
\nhttp://alpage.inria.fr/~sagot/lefff-en.html
\n\nFirst work of [Claude Coulombe](https://github.com/ClaudeCoulombe) to support Lefff with Python : https://github.com/ClaudeCoulombe", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/sammous/spacy-lefff", "keywords": "", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "spacy-lefff", "package_url": "https://pypi.org/project/spacy-lefff/", "platform": "", "project_url": "https://pypi.org/project/spacy-lefff/", "project_urls": { "Homepage": "https://github.com/sammous/spacy-lefff" }, "release_url": "https://pypi.org/project/spacy-lefff/0.3.5/", "requires_dist": null, "requires_python": "", "summary": "Custom French POS and lemmatizer based on Leff for spacy", "version": "0.3.5" }, "last_serial": 5681066, "releases": { "0.1": [ { "comment_text": "", "digests": { "md5": "2bf9c94bc012b7b2b5630cb4b17c1bc1", "sha256": "da723e751aa589459259027d4797cdc8cf10b47e733aedc4c3044af42630533e" }, "downloads": -1, "filename": "spacy_lefff-0.1-py2-none-any.whl", "has_sig": false, "md5_digest": "2bf9c94bc012b7b2b5630cb4b17c1bc1", "packagetype": "bdist_wheel", "python_version": "py2", "requires_python": null, "size": 5367, "upload_time": "2018-03-19T14:15:45", "url": "https://files.pythonhosted.org/packages/6e/86/07f27ca5b69e6e8d0cf97d461c3c94086152adc91eabe78bfe7f158099c7/spacy_lefff-0.1-py2-none-any.whl" }, { "comment_text": "", "digests": { "md5": "640468d210a8df59f4877a474c73d74e", "sha256": "d5fea9daf400eff8d76a7197d33e35a23969aed94942cd03011cf2a7ff20d2bd" }, "downloads": -1, "filename": "spacy-lefff-0.1.tar.gz", "has_sig": false, "md5_digest": "640468d210a8df59f4877a474c73d74e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3189, "upload_time": "2018-03-19T14:15:47", "url": 
"https://files.pythonhosted.org/packages/24/f9/f77f6bf94536a8e31611842e76e47e75f3174e670965679b450b9fc907c8/spacy-lefff-0.1.tar.gz" } ], "0.1.1": [ { "comment_text": "", "digests": { "md5": "ef1ce4af7473cfc41a9fc964bcaaf9f2", "sha256": "f17505ce782a14b7fc88270b98a5fa240ec073c3dd6b68a405407446523bfeb6" }, "downloads": -1, "filename": "spacy-lefff-0.1.1.tar.gz", "has_sig": false, "md5_digest": "ef1ce4af7473cfc41a9fc964bcaaf9f2", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 2944191, "upload_time": "2018-03-20T15:13:55", "url": "https://files.pythonhosted.org/packages/d8/52/623ab57555dd22ed41763d6ea48bd97da349d832ea9ca4c2dac63facb0e8/spacy-lefff-0.1.1.tar.gz" } ], "0.1.2": [ { "comment_text": "", "digests": { "md5": "10d68fcf06dd4c56b85c678eb69977ab", "sha256": "b2510fdbff0479941de317ff04484e9ac78a2af8f907e0b080d82cf550957cc5" }, "downloads": -1, "filename": "spacy-lefff-0.1.2.tar.gz", "has_sig": false, "md5_digest": "10d68fcf06dd4c56b85c678eb69977ab", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 2945420, "upload_time": "2018-03-20T15:39:27", "url": "https://files.pythonhosted.org/packages/ea/69/e20985cdf254c1f93a81097563fd057e6fadcee84aa43d3f163716ece3ff/spacy-lefff-0.1.2.tar.gz" } ], "0.1.3": [ { "comment_text": "", "digests": { "md5": "387450487e9a9cd8a5eb557fe34f6acb", "sha256": "a849a99556a9b7dc767101f3c8ed5cf569de812088ca123e8997e7ed52939d59" }, "downloads": -1, "filename": "spacy_lefff-0.1.3-py2.7.egg", "has_sig": false, "md5_digest": "387450487e9a9cd8a5eb557fe34f6acb", "packagetype": "bdist_egg", "python_version": "2.7", "requires_python": null, "size": 2946571, "upload_time": "2018-05-24T08:19:52", "url": "https://files.pythonhosted.org/packages/7e/be/50b3d375a2e3e05ead2533704a130169616caa1394765bfd1011885efae4/spacy_lefff-0.1.3-py2.7.egg" }, { "comment_text": "", "digests": { "md5": "5108b9719e5659ce46f850c366c0b877", "sha256": 
"6d31cad106dd34fbb8cf4322ecad3af6b358c77fa60476affc7cdd161054eeff" }, "downloads": -1, "filename": "spacy-lefff-0.1.3.tar.gz", "has_sig": false, "md5_digest": "5108b9719e5659ce46f850c366c0b877", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 2945554, "upload_time": "2018-05-07T12:37:59", "url": "https://files.pythonhosted.org/packages/b1/8f/b3158c50ef72063039c2d49c8a973984a835744b370f7073a4ffd3d0b000/spacy-lefff-0.1.3.tar.gz" } ], "0.2.1": [ { "comment_text": "", "digests": { "md5": "1422bee5582d6f992c69b854b0d38cba", "sha256": "9d2e5a5d7d14300af785b42ef7025980ec764ee6eb262e2933ce4d4ef5bad134" }, "downloads": -1, "filename": "spacy-lefff-0.2.1.tar.gz", "has_sig": false, "md5_digest": "1422bee5582d6f992c69b854b0d38cba", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 2956791, "upload_time": "2018-05-24T09:20:06", "url": "https://files.pythonhosted.org/packages/61/24/15888a5046334d9f62069d8539de53fc9109b34ff314ba2ebb77580c719a/spacy-lefff-0.2.1.tar.gz" } ], "0.3.1": [ { "comment_text": "", "digests": { "md5": "f442ecfa212197753ebb17202c962611", "sha256": "e0c6c118fcb77dd7caf9fabbce7f26d6c0fc947c271c66ee4c5ca72f8f99f3e7" }, "downloads": -1, "filename": "spacy-lefff-0.3.1.tar.gz", "has_sig": false, "md5_digest": "f442ecfa212197753ebb17202c962611", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 2957038, "upload_time": "2018-06-25T12:24:09", "url": "https://files.pythonhosted.org/packages/7a/8b/743b4929d81197b2026e8e5bf7931dd2e83e7e8bbcdf03ab0dc4b0963dac/spacy-lefff-0.3.1.tar.gz" } ], "0.3.2": [ { "comment_text": "", "digests": { "md5": "9f807ce0fba7e3d4ab51663562bd1f33", "sha256": "bfd894375dde40bef2322b4cc30e796eaaa4914b00d18a012be71005f4a2aa60" }, "downloads": -1, "filename": "spacy-lefff-0.3.2.tar.gz", "has_sig": false, "md5_digest": "9f807ce0fba7e3d4ab51663562bd1f33", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 
2957895, "upload_time": "2018-07-24T15:11:39", "url": "https://files.pythonhosted.org/packages/cf/f0/af44cc838f508007b1382620e75bb1fc3c78c0e93c655a8c16f6890da818/spacy-lefff-0.3.2.tar.gz" } ], "0.3.3": [ { "comment_text": "", "digests": { "md5": "2a27f9adf2cade63a36c838a38194137", "sha256": "ba980d47b57106f5d647088d1d666d5995e1e8697f152fc64fa3d48d6ba55aac" }, "downloads": -1, "filename": "spacy-lefff-0.3.3.tar.gz", "has_sig": false, "md5_digest": "2a27f9adf2cade63a36c838a38194137", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 2957884, "upload_time": "2018-08-23T14:19:17", "url": "https://files.pythonhosted.org/packages/9f/e3/dec6fa3f84bb3d875817cf46358736874ec55a131c52d6fd86e47eb5a9e3/spacy-lefff-0.3.3.tar.gz" } ], "0.3.4": [ { "comment_text": "", "digests": { "md5": "ea9688f33593f2237bf870109bf92ddc", "sha256": "154aebb4042309e3f92013c6efc641fcc582f0cb6b6cc1ebd9cd80c6d98ba87d" }, "downloads": -1, "filename": "spacy-lefff-0.3.4.tar.gz", "has_sig": false, "md5_digest": "ea9688f33593f2237bf870109bf92ddc", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 2955749, "upload_time": "2019-08-07T08:00:47", "url": "https://files.pythonhosted.org/packages/cd/57/b807a2331e4b09457b7f054ec69ef6f22fdd668739c959bd3c64e4a1b401/spacy-lefff-0.3.4.tar.gz" } ], "0.3.5": [ { "comment_text": "", "digests": { "md5": "1209d0608508505b52b594ea7dc1dfeb", "sha256": "8e1682f963468c3bfd67e84d01fe6aae640fb506ff4af5e8d571bc8212165478" }, "downloads": -1, "filename": "spacy-lefff-0.3.5.tar.gz", "has_sig": false, "md5_digest": "1209d0608508505b52b594ea7dc1dfeb", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 2955795, "upload_time": "2019-08-15T08:12:07", "url": "https://files.pythonhosted.org/packages/82/41/3ebb19ae52087c221b0d3e562dfaaad140d6eb41be43aae65fffcd7eef42/spacy-lefff-0.3.5.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "1209d0608508505b52b594ea7dc1dfeb", 
"sha256": "8e1682f963468c3bfd67e84d01fe6aae640fb506ff4af5e8d571bc8212165478" }, "downloads": -1, "filename": "spacy-lefff-0.3.5.tar.gz", "has_sig": false, "md5_digest": "1209d0608508505b52b594ea7dc1dfeb", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 2955795, "upload_time": "2019-08-15T08:12:07", "url": "https://files.pythonhosted.org/packages/82/41/3ebb19ae52087c221b0d3e562dfaaad140d6eb41be43aae65fffcd7eef42/spacy-lefff-0.3.5.tar.gz" } ] }