{ "info": { "author": "Yash Patadia", "author_email": "yash@patadia.org", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Programming Language :: Python :: 3.6", "Topic :: Software Development :: Build Tools" ], "description": "# DframCy\n[![Package Version](https://img.shields.io/pypi/v/dframcy.svg)](https://pypi.python.org/pypi/dframcy/)\n[![Python 3.6](https://img.shields.io/badge/python-3.6-blue.svg)](https://www.python.org/downloads/release/python-360/) \n[![Build Status](https://travis-ci.org/yash1994/dframcy.svg?branch=master)](https://travis-ci.org/yash1994/dframcy) \n[![Coverage Status](https://coveralls.io/repos/github/yash1994/dframcy/badge.svg?&service=github)](https://coveralls.io/github/yash1994/dframcy)\n\nDframCy is a lightweight utility module that integrates Pandas DataFrames with spaCy's linguistic annotation and training tasks. DframCy provides clean APIs to convert spaCy's linguistic annotations and Matcher and PhraseMatcher results to Pandas DataFrames. It also supports training and evaluating NLP pipelines from CSV/XLSX/XLS files without any changes to spaCy's underlying APIs.\n\n## Getting Started\nDframCy is easy to install. You need the following:\n### Requirements\n* Python 3.6\n* Pandas\n* spaCy 2.2.0\n\nYou also need to download a spaCy language model:\n```bash\npython -m spacy download en_core_web_sm\n```\nFor more information, refer to [Models & Languages](https://spacy.io/usage/models).\n\n### Installation\nThis package can be installed from [PyPI](https://pypi.org/project/dframcy/) by running:\n```bash\npip install dframcy\n```\nTo build from source:\n```bash\ngit clone https://github.com/yash1994/dframcy.git\ncd dframcy\npython setup.py install\n```\n\n## Usage\n\n#### Linguistic Annotations\nGet the linguistic annotations of a document as a dataframe. 
For the available linguistic annotations (dataframe column names), refer to [spaCy's Token API](https://spacy.io/api/token) documentation.\n```python\nfrom dframcy import DframCy\ndframcy = DframCy(\"en_core_web_sm\")\ndoc = dframcy.nlp(u\"Apple is looking at buying U.K. startup for $1 billion\")\n\n# default columns: ['id', 'text', 'start', 'end', 'pos', 'tag', 'dep', 'head', 'label'] \nannotation_dataframe = dframcy.to_dataframe(doc)\n\n# can also pass column names (spaCy's linguistic annotation attributes)\nannotation_dataframe = dframcy.to_dataframe(doc, columns=[\"text\", \"lemma\", \"lower\", \"is_punct\"])\n\n# for a separate entity dataframe\ntoken_annotation_dataframe, entity_dataframe = dframcy.to_dataframe(doc, separate_entity_dframe=True) \n```\n#### Rule-Based Matching\n```python\n# Token-based Matching\nfrom dframcy.matcher import DframCyMatcher, DframCyPhraseMatcher\ndframcy_matcher = DframCyMatcher(\"en_core_web_sm\")\npattern = [{\"LOWER\": \"hello\"}, {\"IS_PUNCT\": True}, {\"LOWER\": \"world\"}]\ndframcy_matcher.add(\"HelloWorld\", None, pattern)\ndoc = dframcy_matcher.nlp(\"Hello, world! 
Hello world!\")\nmatches_dataframe = dframcy_matcher(doc)\n\n# Phrase Matching\ndframcy_phrase_matcher = DframCyPhraseMatcher(\"en_core_web_sm\")\nterms = [u\"Barack Obama\", u\"Angela Merkel\",u\"Washington, D.C.\"]\npatterns = [dframcy_phrase_matcher.get_nlp().make_doc(text) for text in terms]\ndframcy_phrase_matcher.add(\"TerminologyList\", None, *patterns)\ndoc = dframcy_phrase_matcher.nlp(u\"German Chancellor Angela Merkel and US President Barack Obama \"\n u\"converse in the Oval Office inside the White House in Washington, D.C.\")\nphrase_matches_dataframe = dframcy_phrase_matcher(doc)\n```\n#### Command Line Interface\nDframCy supports command line arguments for converting a plain text file to linguistically annotated text in CSV/JSON format, and for training and evaluating language models from CSV/XLS formatted training data.\n[Training data example](https://github.com/yash1994/dframcy/blob/master/data/training_data_format.csv). CLI arguments for training and evaluation are exactly the same as in [spaCy's CLI](https://spacy.io/api/cli); the only difference is the format of the training data.\n```bash\n# convert\ndframcy convert -i plain_text.txt -o annotations.csv -t CSV\n\n# train\ndframcy train -l en -o spacy_models -t train.csv -d test.csv\n\n# evaluate\ndframcy evaluate -m spacy_model/ -d test.csv\n```", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/yash1994/dframcy", "keywords": "spacy,dataframe,pandas", "license": "", "maintainer": "", "maintainer_email": "", "name": "dframcy", "package_url": "https://pypi.org/project/dframcy/", "platform": "", "project_url": "https://pypi.org/project/dframcy/", "project_urls": { "Homepage": "https://github.com/yash1994/dframcy" }, "release_url": "https://pypi.org/project/dframcy/0.1.2/", "requires_dist": null, "requires_python": "", "summary": "Pandas Dataframe integration for spaCy", "version": 
"0.1.2" }, "last_serial": 5970840, "releases": { "0.0.1": [ { "comment_text": "", "digests": { "md5": "09509afa22262be1e1c3fc3e6fa55584", "sha256": "8c27c8c011ce01ae0d1e8cd708df33376ffe16bfc23d43facb5b6973193478c0" }, "downloads": -1, "filename": "dframcy-0.0.1-py3.6.egg", "has_sig": false, "md5_digest": "09509afa22262be1e1c3fc3e6fa55584", "packagetype": "bdist_egg", "python_version": "3.6", "requires_python": null, "size": 35740, "upload_time": "2019-10-14T09:31:41", "url": "https://files.pythonhosted.org/packages/c2/e7/9b5f8dc2917cb657274ada375181914f30dedf9ff8170794382036f968d6/dframcy-0.0.1-py3.6.egg" } ], "0.1.0": [ { "comment_text": "", "digests": { "md5": "86571819ff1968a21cd02c513d954ffb", "sha256": "0336527ddb376b428cae35eba512907b9c383cf0a763515ea661aece05fa1e77" }, "downloads": -1, "filename": "dframcy-0.1.0-py3-none-any.whl", "has_sig": false, "md5_digest": "86571819ff1968a21cd02c513d954ffb", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 20710, "upload_time": "2019-10-14T09:47:39", "url": "https://files.pythonhosted.org/packages/87/6b/48578cea6395dd6c0273653897dc6da5ac853c7aa8d007a93d6481cda5f5/dframcy-0.1.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "9a53c0172c9071a376d4d8ad177b97a4", "sha256": "476c7851f8c4281e73867b725a575cd290c1bf46290f1ee025f28472d1eebebf" }, "downloads": -1, "filename": "dframcy-0.1.0.tar.gz", "has_sig": false, "md5_digest": "9a53c0172c9071a376d4d8ad177b97a4", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 17953, "upload_time": "2019-10-14T09:31:45", "url": "https://files.pythonhosted.org/packages/30/54/9c78ed3fe505a3143fd2f72fed5fe634434ceebaf98c58c5fd40e0721e82/dframcy-0.1.0.tar.gz" }, { "comment_text": "", "digests": { "md5": "0e11ffe20fd57cedfdc9ec0804662d55", "sha256": "863269aa31673797b52bb83a90e45411bba3e1800d379fb198a96359dbd7edda" }, "downloads": -1, "filename": "dframcy-0.1-py3.6.egg", "has_sig": false, "md5_digest": 
"0e11ffe20fd57cedfdc9ec0804662d55", "packagetype": "bdist_egg", "python_version": "3.6", "requires_python": null, "size": 43249, "upload_time": "2019-10-14T09:31:47", "url": "https://files.pythonhosted.org/packages/4e/b0/090eb9f96bdd0d8dd2f5161c209a69d084c0741d6553adb3905f3b9883fe/dframcy-0.1-py3.6.egg" } ], "0.1.1": [ { "comment_text": "", "digests": { "md5": "8665067d94d7a745a3159c459b9bdfb9", "sha256": "5cd15836698676a8799518d65ef2126f6c3ccb852e1ca907c1ee1bf557496eea" }, "downloads": -1, "filename": "dframcy-0.1.1.tar.gz", "has_sig": false, "md5_digest": "8665067d94d7a745a3159c459b9bdfb9", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 18179, "upload_time": "2019-10-14T10:05:20", "url": "https://files.pythonhosted.org/packages/f9/da/7b4b03026baf38ad92c71db305dea24283b0c8a94c891010e72923cd8771/dframcy-0.1.1.tar.gz" } ], "0.1.2": [ { "comment_text": "", "digests": { "md5": "672349d6f7a506a9a33154a46a66eb46", "sha256": "01fc1ca5896c6f213b552f5f8f2d31558f2d782043280ea374c0b72cc2619293" }, "downloads": -1, "filename": "dframcy-0.1.2.tar.gz", "has_sig": false, "md5_digest": "672349d6f7a506a9a33154a46a66eb46", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 24536, "upload_time": "2019-10-14T10:41:19", "url": "https://files.pythonhosted.org/packages/61/b1/b93f0360aa88815e3ec79ca7c928d620963686c6b2b8b790394e1fecdde9/dframcy-0.1.2.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "672349d6f7a506a9a33154a46a66eb46", "sha256": "01fc1ca5896c6f213b552f5f8f2d31558f2d782043280ea374c0b72cc2619293" }, "downloads": -1, "filename": "dframcy-0.1.2.tar.gz", "has_sig": false, "md5_digest": "672349d6f7a506a9a33154a46a66eb46", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 24536, "upload_time": "2019-10-14T10:41:19", "url": "https://files.pythonhosted.org/packages/61/b1/b93f0360aa88815e3ec79ca7c928d620963686c6b2b8b790394e1fecdde9/dframcy-0.1.2.tar.gz" } ] }