{ "info": { "author": "The DKPro cassis team", "author_email": "dkpro-core-user@googlegroups.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Intended Audience :: Developers", "Intended Audience :: Science/Research", "License :: OSI Approved :: Apache Software License", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7", "Programming Language :: Python :: 3 :: Only", "Topic :: Scientific/Engineering :: Human Machine Interfaces", "Topic :: Software Development :: Libraries", "Topic :: Text Processing :: Linguistic" ], "description": "dkpro-cassis\n============\n\n.. image:: https://travis-ci.org/dkpro/dkpro-cassis.svg?branch=master\n :target: https://travis-ci.org/dkpro/dkpro-cassis\n\n.. image:: https://readthedocs.org/projects/cassis/badge/?version=latest\n :target: https://cassis.readthedocs.io/en/latest/?badge=latest\n :alt: Documentation Status\n\n.. image:: https://codecov.io/gh/dkpro/dkpro-cassis/branch/master/graph/badge.svg\n :target: https://codecov.io/gh/dkpro/dkpro-cassis\n\n.. image:: https://img.shields.io/pypi/l/dkpro-cassis.svg\n :alt: PyPI - License\n :target: https://pypi.org/project/dkpro-cassis/\n\n.. image:: https://img.shields.io/pypi/pyversions/dkpro-cassis.svg\n :alt: PyPI - Python Version\n :target: https://pypi.org/project/dkpro-cassis/\n\n.. image:: https://img.shields.io/pypi/v/dkpro-cassis.svg\n :alt: PyPI\n :target: https://pypi.org/project/dkpro-cassis/\n\n.. image:: https://img.shields.io/badge/code%20style-black-000000.svg\n :target: https://github.com/ambv/black\n\nDKPro **cassis** (pronunciation: [ka.sis]) provides a pure-Python implementation of the *Common Analysis System* (CAS)\nas defined by the `UIMA `_ framework. 
The CAS is a data structure representing an object to\nbe enriched with annotations (the so-called *Subject of Analysis*, or *SofA* for short).\n\nThis library enables the creation and manipulation of CAS objects and their associated type systems, as well as loading\nand saving CAS objects in the `CAS XMI XML representation `_\nin Python programs. In particular, this can ease the integration of Python-based Natural Language Processing (e.g.\n`spacy `_ or `NLTK `_) and Machine Learning libraries (e.g.\n`scikit-learn `_ or `Keras `_) into UIMA-based text analysis workflows.\n\nAn example of cassis in action is the `spacy recommender for INCEpTION `_,\nwhich wraps the spaCy NLP library as a web service that can be used in conjunction with the `INCEpTION `_\ntext annotation platform to automatically generate annotation suggestions.\n\nFeatures\n------------\n\nCurrently supported features are:\n\n- Text SofAs\n- Deserializing/serializing UIMA CAS from/to XMI\n- Deserializing/serializing type systems from/to XML\n- Selecting annotations, selecting covered annotations, adding annotations\n- Type inheritance\n- Multiple SofA support\n\nSome features are still under development, e.g.\n\n- feature encoding as XML elements (right now only XML attributes work)\n- proper type checking\n- XML/XMI schema validation\n- type unmarshalling from string to the actual type specified in the type system\n- reference, array and list features\n\nInstallation\n------------\n\nTo install the package with :code:`pip`, run::\n\n    pip install dkpro-cassis\n\nUsage\n-----\n\nExample CAS XMI and type system files can be found under :code:`tests/test_files`.\n\nLoading a CAS\n~~~~~~~~~~~~~\n\nA CAS can be deserialized from XMI, either by reading from a file or\nfrom a string, using :code:`load_cas_from_xmi`.\n\n.. 
code:: python\n\n    from cassis import *\n\n    with open('typesystem.xml', 'rb') as f:\n        typesystem = load_typesystem(f)\n\n    with open('cas.xml', 'rb') as f:\n        cas = load_cas_from_xmi(f, typesystem=typesystem)\n\nAdding annotations\n~~~~~~~~~~~~~~~~~~\n\nGiven a type system with a type :code:`cassis.Token` that has an :code:`id` and\na :code:`pos` feature, annotations can be added as follows:\n\n.. code:: python\n\n    from cassis import *\n\n    with open('typesystem.xml', 'rb') as f:\n        typesystem = load_typesystem(f)\n\n    with open('cas.xml', 'rb') as f:\n        cas = load_cas_from_xmi(f, typesystem=typesystem)\n\n    Token = typesystem.get_type('cassis.Token')\n\n    tokens = [\n        Token(begin=0, end=3, id='0', pos='NNP'),\n        Token(begin=4, end=10, id='1', pos='VBD'),\n        Token(begin=11, end=14, id='2', pos='IN'),\n        Token(begin=15, end=18, id='3', pos='DT'),\n        Token(begin=19, end=24, id='4', pos='NN'),\n        Token(begin=25, end=26, id='5', pos='.'),\n    ]\n\n    for token in tokens:\n        cas.add_annotation(token)\n\nSelecting annotations\n~~~~~~~~~~~~~~~~~~~~~\n\n.. code:: python\n\n    from cassis import *\n\n    with open('typesystem.xml', 'rb') as f:\n        typesystem = load_typesystem(f)\n\n    with open('cas.xml', 'rb') as f:\n        cas = load_cas_from_xmi(f, typesystem=typesystem)\n\n    for sentence in cas.select('cassis.Sentence'):\n        for token in cas.select_covered('cassis.Token', sentence):\n            print(cas.get_covered_text(token))\n\n            # Annotation values can be accessed as properties\n            print('Token: begin={0}, end={1}, id={2}, pos={3}'.format(token.begin, token.end, token.id, token.pos))\n\nCreating types and adding features\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n.. 
code:: python\n\n    from cassis import *\n\n    typesystem = TypeSystem()\n\n    parent_type = typesystem.create_type(name='example.ParentType')\n    typesystem.add_feature(type_=parent_type, name='parentFeature', rangeTypeName='String')\n\n    child_type = typesystem.create_type(name='example.ChildType', supertypeName=parent_type.name)\n    typesystem.add_feature(type_=child_type, name='childFeature', rangeTypeName='Integer')\n\n    # childFeature has the range type Integer, so it gets an integer value\n    annotation = child_type(parentFeature='parent', childFeature=42)\n\nWhen new features are added, the changes are propagated to subtypes. For example,\nadding a feature to a parent type makes it available to its child types.\nTherefore, the type system does not need to be frozen for consistency.\n\nSofa support\n~~~~~~~~~~~~\n\nA Sofa represents an unstructured artifact that is processed in a UIMA pipeline, for instance\nthe document text. New Sofas are created automatically when a new view is created. Basic\nproperties of the Sofa can be read and written:\n\n.. code:: python\n\n    from cassis import *\n\n    cas = Cas()\n    cas.sofa_string = \"Joe waited for the train . The train was late .\"\n    cas.sofa_mime = \"text/plain\"\n\n    print(cas.sofa_string)\n    print(cas.sofa_mime)\n\nManaging views\n~~~~~~~~~~~~~~\n\nA view into a CAS contains a subset of its feature structures and annotations. Each view corresponds to exactly one Sofa and\ncan also be used to query and alter information about that Sofa, e.g. the document text. Annotations added to one view\nare not visible in other views. Views can be created and changed, and a view has the same methods and attributes\nas a :code:`Cas`.\n\n.. 
code:: python\n\n    from cassis import *\n\n    with open('typesystem.xml', 'rb') as f:\n        typesystem = load_typesystem(f)\n    Token = typesystem.get_type('cassis.Token')\n\n    # This automatically creates the view `_InitialView`\n    cas = Cas()\n    cas.sofa_string = \"I like cheese .\"\n\n    cas.add_annotations([\n        Token(begin=0, end=1),\n        Token(begin=2, end=6),\n        Token(begin=7, end=13),\n        Token(begin=14, end=15)\n    ])\n\n    print([cas.get_covered_text(x) for x in cas.select_all()])\n\n    # Create a new view and work on it.\n    view = cas.create_view('testView')\n    view.sofa_string = \"I like blackcurrant .\"\n\n    view.add_annotations([\n        Token(begin=0, end=1),\n        Token(begin=2, end=6),\n        Token(begin=7, end=19),\n        Token(begin=20, end=21)\n    ])\n\n    print([view.get_covered_text(x) for x in view.select_all()])\n\nDevelopment\n-----------\n\nThe required dependencies are managed by **pip**. A virtual environment\ncontaining all packages needed for development and production can be\ncreated and activated with\n\n::\n\n    virtualenv venv --python=python3 --no-site-packages\n    source venv/bin/activate\n    pip install -e \".[test, dev, doc]\"\n\nThe tests can be run in the current environment by invoking\n\n::\n\n    make test\n\nor in a clean environment via\n\n::\n\n    tox", "description_content_type": "text/x-rst", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://dkpro.github.io", "keywords": "uima dkpro cas xmi", "license": "Apache License 2.0", "maintainer": "", "maintainer_email": "", "name": "dkpro-cassis", "package_url": "https://pypi.org/project/dkpro-cassis/", "platform": "", "project_url": "https://pypi.org/project/dkpro-cassis/", "project_urls": { "Bug Tracker": "https://github.com/dkpro/dkpro-cassis/issues", "Documentation": "https://cassis.readthedocs.org/", "Homepage": "https://dkpro.github.io", "Source Code": "https://github.com/dkpro/dkpro-cassis" }, "release_url": "https://pypi.org/project/dkpro-cassis/0.2.1/", 
"requires_dist": null, "requires_python": ">=3.5.0", "summary": "UIMA CAS processing library in Python", "version": "0.2.1" }, "last_serial": 5804081, "releases": { "0.1.1": [ { "comment_text": "", "digests": { "md5": "f13f82ad0aca59846ce9cfdefa543b6f", "sha256": "ba6ebfb46101fa417ac77701567f9e294ee5b7a24a2b1b1024371364040a1259" }, "downloads": -1, "filename": "dkpro_cassis-0.1.1-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "f13f82ad0aca59846ce9cfdefa543b6f", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": ">=3.5.0", "size": 19352, "upload_time": "2018-12-06T13:32:36", "url": "https://files.pythonhosted.org/packages/38/6a/39ae6a669d1dfa54b2017f9d065cca23fdbe653940bcf496f388a47aca22/dkpro_cassis-0.1.1-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "ba4dbe0f699e17b15282b1c91f29a079", "sha256": "a2925c5fe96ba9166e6782cdb8226e67e901a6d4a6127c805c6bc6a33e8c5d91" }, "downloads": -1, "filename": "dkpro-cassis-0.1.1.tar.gz", "has_sig": false, "md5_digest": "ba4dbe0f699e17b15282b1c91f29a079", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.5.0", "size": 18850, "upload_time": "2018-12-06T13:32:38", "url": "https://files.pythonhosted.org/packages/d2/d6/aa9913f4933096b6b148efcd18365ce938939d0a636db056117fa80bb673/dkpro-cassis-0.1.1.tar.gz" } ], "0.2.0rc1": [ { "comment_text": "", "digests": { "md5": "2f14686cba3bcf4ec286ccc2e71bbe21", "sha256": "bf17d5f68f0e5dba9b36c9cbb29d4d8b6b578e42b8176c6d8a5a2a92d34dbf22" }, "downloads": -1, "filename": "dkpro_cassis-0.2.0rc1-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "2f14686cba3bcf4ec286ccc2e71bbe21", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": ">=3.5.0", "size": 29704, "upload_time": "2019-07-17T13:55:26", "url": "https://files.pythonhosted.org/packages/64/cf/d9b002e62cc186b1c326d0b099311b9ec530a487390607d1c86e0ef88a36/dkpro_cassis-0.2.0rc1-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { 
"md5": "77f66efa0ced330aad0c652c0c82fca3", "sha256": "4d501941a7c49795bf426a608ec5c8b2183724f26f8216b5c3b0ab86253acbd7" }, "downloads": -1, "filename": "dkpro-cassis-0.2.0rc1.tar.gz", "has_sig": false, "md5_digest": "77f66efa0ced330aad0c652c0c82fca3", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.5.0", "size": 30183, "upload_time": "2019-07-17T13:55:28", "url": "https://files.pythonhosted.org/packages/4c/56/77156a5bb47a01093228118055c0b6b3277f61b1ccecda77b4b6cb33d658/dkpro-cassis-0.2.0rc1.tar.gz" } ], "0.2.0rc2": [ { "comment_text": "", "digests": { "md5": "9f686cfd6554d5b3b4020b0863cd8596", "sha256": "40cf6fb13c30995f8a7fdeeae89d686cf68c9c23f46e5ea56b42b4b994f11fee" }, "downloads": -1, "filename": "dkpro-cassis-0.2.0rc2.tar.gz", "has_sig": false, "md5_digest": "9f686cfd6554d5b3b4020b0863cd8596", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.5.0", "size": 27098, "upload_time": "2019-07-25T13:07:21", "url": "https://files.pythonhosted.org/packages/85/f3/34e8bad13dae112e6ef5a45711ca9713c352973b4dd823065eb2729b6714/dkpro-cassis-0.2.0rc2.tar.gz" } ], "0.2.1": [ { "comment_text": "", "digests": { "md5": "8bc32be6303baf4c0ad6c901f946511b", "sha256": "a93db5472ea897877b8d051ffab390761e35bc54b47b3e4ff36e087e4c4ec386" }, "downloads": -1, "filename": "dkpro-cassis-0.2.1.tar.gz", "has_sig": false, "md5_digest": "8bc32be6303baf4c0ad6c901f946511b", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.5.0", "size": 27275, "upload_time": "2019-09-09T15:09:51", "url": "https://files.pythonhosted.org/packages/f0/e3/89d34466bff5779171913ac2dd804b6a99afab8b89d16f4c116e9dc6d1da/dkpro-cassis-0.2.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "8bc32be6303baf4c0ad6c901f946511b", "sha256": "a93db5472ea897877b8d051ffab390761e35bc54b47b3e4ff36e087e4c4ec386" }, "downloads": -1, "filename": "dkpro-cassis-0.2.1.tar.gz", "has_sig": false, "md5_digest": "8bc32be6303baf4c0ad6c901f946511b", 
"packagetype": "sdist", "python_version": "source", "requires_python": ">=3.5.0", "size": 27275, "upload_time": "2019-09-09T15:09:51", "url": "https://files.pythonhosted.org/packages/f0/e3/89d34466bff5779171913ac2dd804b6a99afab8b89d16f4c116e9dc6d1da/dkpro-cassis-0.2.1.tar.gz" } ] }