{ "info": { "author": "Tobias Kolditz", "author_email": "tbs.kldtz@gmail.com", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python :: 3" ], "description": "# bratiaa\n\nInter-annotator agreement for [Brat](https://brat.nlplab.org/) annotation projects. For a quick overview of the output generated by `bratiaa`, have a look at the [example files](https://github.com/kldtz/bratiaa/tree/master/example-files). So far, only text-bound annotations are supported; all other annotation types are ignored.\n\n## Installation\n\nInstall the package via pip.\n\n```shell\npip install bratiaa\n```\n\n## Project Structure\n\nBy default, `bratiaa` expects each first-level subdirectory of the annotation project to contain the files of one annotator. It automatically matches files across annotators: files with the same relative path under different annotators' directories are treated as the same document. Here is a simple example:\n\n```shell\nexample-project/\n\u251c\u2500\u2500 annotation.conf\n\u251c\u2500\u2500 annotator-1\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 doc-1.ann\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 doc-1.txt\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 doc-3.ann\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 doc-3.txt\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 second\n\u2502\u00a0\u00a0     \u251c\u2500\u2500 doc-2.ann\n\u2502\u00a0\u00a0     \u2514\u2500\u2500 doc-2.txt\n\u2514\u2500\u2500 annotator-2\n    \u251c\u2500\u2500 doc-3.ann\n    \u251c\u2500\u2500 doc-3.txt\n    \u251c\u2500\u2500 doc-4.ann\n    \u251c\u2500\u2500 doc-4.txt\n    \u2514\u2500\u2500 second\n        \u251c\u2500\u2500 doc-2.ann\n        \u2514\u2500\u2500 doc-2.txt\n```\nIn this example, we have two agreement documents: 'second/doc-2.txt' and 'doc-3.txt'. 
The other two documents are each annotated by only one annotator.\n\nIf your project is organized differently, you need to provide your own `input_generator` function that yields document objects holding the path to the plain-text file and the paths to all corresponding ANN files (see `bratiaa/agree.py`).\n\n## Usage\n\nYou can use `bratiaa` either as a Python library or as a command-line tool.\n\n\n### Python Interface\n```python\nimport bratiaa as biaa\n\nproject = '/path/to/brat/project'\n\n# instance-level agreement\nf1_agreement = biaa.compute_f1_agreement(project)\n\n# print agreement report to stdout\nbiaa.iaa_report(f1_agreement)\n\n# agreement per label\nlabel_mean, label_sd = f1_agreement.mean_sd_per_label()\n\n# agreement per document\ndoc_mean, doc_sd = f1_agreement.mean_sd_per_document()\n\n# total agreement\ntotal_mean, total_sd = f1_agreement.mean_sd_total()\n```\n\nFor token-level evaluation, supply your own tokenization function. It should yield (start, end) offset tuples for a given string, like the example function below.\n\n```python\nimport re\nimport bratiaa as biaa\n\ndef token_func(text):\n    # a token is a run of word characters or a run of other non-space characters\n    token = re.compile(r'\\w+|[^\\w\\s]+')\n    for match in token.finditer(text):\n        yield match.start(), match.end()\n\n# token-level agreement\nf1_agreement = biaa.compute_f1_agreement('/path/to/brat/project', token_func=token_func)\n```\n\n### CLI\nHelp message: `brat-iaa -h`\n\n```shell\n# instance-level agreement and heatmap\nbrat-iaa /path/to/brat/project --heatmap instance-heatmap.png > instance-agreement.md\n\n# token-level agreement (not recommended)\nbrat-iaa /path/to/brat/project -t --heatmap token-heatmap.png > token-agreement.md\n```\n\nThe token-based evaluation of the command-line interface uses the generic pattern `'\\S+'` to identify tokens (splitting on whitespace) and is therefore not recommended. 
Please use the Python interface with a language- and task-specific tokenizer instead.\n\nFor the output formats generated by the above commands, have a look at the [example files](https://github.com/kldtz/bratiaa/tree/master/example-files).\n\n\n## Agreement Measures\n\nFor each document annotated by more than one annotator, we compute the numbers of true positives (*TP*), false positives (*FP*) and false negatives (*FN*) for each pair of annotators via basic (multi)set operations, where each annotator contributes one set of annotations. These numbers can then be aggregated along two dimensions: *documents* and/or *labels*. Based on the aggregated numbers, we compute `F1 = (2*TP) / (2*TP + FP + FN)` for each annotator pair. From these pairwise F-scores, the mean and standard deviation are reported (see Hripcsak & Rothschild, 2005).\n\n\n### Instance-Based Agreement\n\nAn annotation instance in a given document consists of a label and one or more start-end offset tuples (several tuples in the case of discontinuous annotations). Two instances are considered identical if their labels and offset tuples match. Duplicate instances from a single annotator (in the same document) are considered accidental: only unique annotation instances are used for calculating agreement.\n\n### Token-Based Agreement\n\nEach annotation instance is split into the tokens it overlaps. For example, if our tokenizer splits on whitespace, \"\\[ORG Human Rights Watch\\]\" and \"\\[ORG Human Rights Wat\\]ch\" both become \"\\[ORG Human\\] \\[ORG Rights\\] \\[ORG Watch\\]\". These split annotations are then treated as instances as described above, with one exception: we are dealing with **multisets** instead of sets, which allows multiple token-based annotations with the same label and offsets when annotations of the same type overlap. 
For example, in \"\\[LOC University of \\[LOC Jena\\]\\]\" we have two overlapping location annotations resulting in four token-based annotations of which two are identical (\"\\[LOC Jena\\]\").\n\nBe aware that \"\\[ORG Human\\] \\[ORG Rights Watch\\]\" and \"\\[ORG Human Rights\\] \\[ORG Watch\\]\" both become \"\\[ORG Human\\] \\[ORG Rights\\] \\[ORG Watch\\]\", that is, boundary errors between adjacent annotations of the same type are ignored!\n\n\n## References\n\nHripcsak, G., & Rothschild, A. S. (2005). Agreement, the f-measure, and reliability in information retrieval. Journal of the American Medical Informatics Association, 12(3), 296-298.\n\n\n## License\n\nThis software is provided under the [MIT-License](https://github.com/kldtz/bratiaa/blob/master/LICENSE). The code contains a modified subset of brat, which is available under the same permissive [license](https://github.com/kldtz/bratiaa/blob/master/bratsubset/BRAT_LICENSE.md).\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/kldtz/bratiaa", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "bratiaa", "package_url": "https://pypi.org/project/bratiaa/", "platform": "", "project_url": "https://pypi.org/project/bratiaa/", "project_urls": { "Homepage": "https://github.com/kldtz/bratiaa" }, "release_url": "https://pypi.org/project/bratiaa/0.1.3/", "requires_dist": [ "numpy", "matplotlib", "pytest", "scipy", "tabulate" ], "requires_python": ">=3.5", "summary": "Inter-annotator agreement for Brat annotation projects", "version": "0.1.3" }, "last_serial": 5527294, "releases": { "0.1.1": [ { "comment_text": "", "digests": { "md5": "c4cfe9e8d9ac3e86bcb403d672aaa6c9", "sha256": "dd681dcf04a3b48a530f1b5b7ec788177af18a6d86a55472da93de92d47410a7" }, "downloads": -1, "filename": "bratiaa-0.1.1-py3-none-any.whl", "has_sig": false, "md5_digest": 
"c4cfe9e8d9ac3e86bcb403d672aaa6c9", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.5", "size": 47576, "upload_time": "2019-07-10T19:05:04", "url": "https://files.pythonhosted.org/packages/9b/c9/800cf7b224036d99dd633872bb61b6ef398115b34c5b1f06877983af326b/bratiaa-0.1.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "7013f238d3c63de7018206c2f0ace260", "sha256": "576b719b7e899049213047c6cea1a1e5bde8fb0095a770de5f60f92b6740b696" }, "downloads": -1, "filename": "bratiaa-0.1.1.tar.gz", "has_sig": false, "md5_digest": "7013f238d3c63de7018206c2f0ace260", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.5", "size": 45245, "upload_time": "2019-07-10T19:05:07", "url": "https://files.pythonhosted.org/packages/b2/3a/eac48a95eacd6dd4721e2b9c34363fdd9195a11420f73613c9c3ae29759a/bratiaa-0.1.1.tar.gz" } ], "0.1.2": [ { "comment_text": "", "digests": { "md5": "cac0c43120b4332f35756100c583da81", "sha256": "dd9970d8f18e27f5277c756b95760b8231520db17a41a9744bc60e8f428cc5b6" }, "downloads": -1, "filename": "bratiaa-0.1.2-py3-none-any.whl", "has_sig": false, "md5_digest": "cac0c43120b4332f35756100c583da81", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.5", "size": 47747, "upload_time": "2019-07-11T06:48:15", "url": "https://files.pythonhosted.org/packages/5e/b2/62472ed54685d06952dda0f170c0c9be6e715d34f3ba2c6bb188494a178c/bratiaa-0.1.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "776c713257989d9968250aa8a9937223", "sha256": "1ec03ea5c5712ec305c231867e95779c0d874db18f38d51687b96bf6bae128fb" }, "downloads": -1, "filename": "bratiaa-0.1.2.tar.gz", "has_sig": false, "md5_digest": "776c713257989d9968250aa8a9937223", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.5", "size": 45386, "upload_time": "2019-07-11T06:48:17", "url": 
"https://files.pythonhosted.org/packages/15/ec/a9e1daa6c1389a65be92f13e15e9cd0c265b0935e2fac05d4f6ca6d9d52b/bratiaa-0.1.2.tar.gz" } ], "0.1.3": [ { "comment_text": "", "digests": { "md5": "fc02ee58c635a9d3c560272a2739def5", "sha256": "36dca3fe291503ac64d3458cb9d567bb72aafb863299032f9597833756331bd9" }, "downloads": -1, "filename": "bratiaa-0.1.3-py3-none-any.whl", "has_sig": false, "md5_digest": "fc02ee58c635a9d3c560272a2739def5", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.5", "size": 46817, "upload_time": "2019-07-13T16:18:46", "url": "https://files.pythonhosted.org/packages/6c/b0/24b2fca0caba896afac1851888abecd40b186622cf4a3eb487ea4b18e793/bratiaa-0.1.3-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "9b94c5fc419c67b131ebd871e2b4fdf7", "sha256": "fb75564eabb8ab6f7efc0341b9ef409755cae9996258b71c0ca558677ce0f963" }, "downloads": -1, "filename": "bratiaa-0.1.3.tar.gz", "has_sig": false, "md5_digest": "9b94c5fc419c67b131ebd871e2b4fdf7", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.5", "size": 44415, "upload_time": "2019-07-13T16:18:48", "url": "https://files.pythonhosted.org/packages/a0/68/d0b0e12d40946cfa45a0d7c40c70b613a32dfc7194db0c336ad25bcf455b/bratiaa-0.1.3.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "fc02ee58c635a9d3c560272a2739def5", "sha256": "36dca3fe291503ac64d3458cb9d567bb72aafb863299032f9597833756331bd9" }, "downloads": -1, "filename": "bratiaa-0.1.3-py3-none-any.whl", "has_sig": false, "md5_digest": "fc02ee58c635a9d3c560272a2739def5", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.5", "size": 46817, "upload_time": "2019-07-13T16:18:46", "url": "https://files.pythonhosted.org/packages/6c/b0/24b2fca0caba896afac1851888abecd40b186622cf4a3eb487ea4b18e793/bratiaa-0.1.3-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "9b94c5fc419c67b131ebd871e2b4fdf7", "sha256": 
"fb75564eabb8ab6f7efc0341b9ef409755cae9996258b71c0ca558677ce0f963" }, "downloads": -1, "filename": "bratiaa-0.1.3.tar.gz", "has_sig": false, "md5_digest": "9b94c5fc419c67b131ebd871e2b4fdf7", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.5", "size": 44415, "upload_time": "2019-07-13T16:18:48", "url": "https://files.pythonhosted.org/packages/a0/68/d0b0e12d40946cfa45a0d7c40c70b613a32dfc7194db0c336ad25bcf455b/bratiaa-0.1.3.tar.gz" } ] }