{ "info": { "author": "L.C. Vriend, D.E. Kim", "author_email": "vanboefer@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Science/Research", "License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)", "Programming Language :: Python :: 3.7", "Topic :: Scientific/Engineering :: Information Analysis", "Topic :: Text Processing :: Linguistic", "Topic :: Utilities" ], "description": "# Humannotator\n\n**Library for conveniently creating simple customizable annotators \nfor manual annotation of your data** \n*Jenia Kim, Lawrence Vriend*\n\nWorks well with Jupyter notebooks:\n\n[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/lcvriend/humannotator/master?filepath=examples%2Fexamples.ipynb)\n\n## Use case\n\nThe humannotator provides an easy way to set up custom annotators.\nThis tool is for you if manual annotation is part of your workflow \nand you are looking for a solution that is:\n\n- Lightweight\n- Customizable\n- Easy to set up\n- Integrates with Jupyter/pandas/Python\n\n## Quick start\n\n### Create a simple annotator\n\n1. [Load the data](#load-data)\n2. [Define the tasks](#define-tasks)\n3. [Instantiate the annotator](#annotator)\n\n```Python\n import pandas as pd\n from humannotator import Annotator, task_factory\n\n df = pd.read_csv('news.csv', index_col=0)\n cols = ['title', 'date', 'news_id']\n\n choices={\n '0': 'not adverse media',\n '1': 'adverse media',\n '3': 'exclude from dataset',\n }\n instruct = \"What is the topic in the title?\"\n task1 = task_factory(choices, 'Adverse media')\n task2 = task_factory(\n 'str',\n 'Topic',\n instruction=instruct,\n nullable=True\n )\n\n annotator = Annotator([task1, task2], df[cols])\n```\n\n### Annotate your data\n\n- Use the annotator by calling it: `annotator()`.\n- The annotator keeps track of where you were.\n- Highlight phrases with the 'phrases' argument.\n- The annotator stores user (if provided) and timestamp with the annotation.\n\n### Access your annotations\n\n- Access the annotations with the `annotated` attribute.\n- Return merged data and annotations with the `merged` method.\n- The annotations are conveniently stored in a pandas `DataFrame`.\n\n### Store your annotations\n\n- Store the annotator with the `save` method.\n- Load the annotator with the `load` method.\n\n## Load data\n\nThe annotator accepts `list`, `dict`, `Series` and `DataFrame` objects as data. \nThe data will be converted to a dataframe internally.\n\n### dataframes\n\n- By default, the annotator will use its `index` and all `columns`. \n- Use `load_data` to create a `data` object if you need more control:\n 1. `id_col` sets the column to be used as index.\n 2. `item_cols` set the column or columns to be displayed.\n\n## Define tasks\n\nTasks are set up, using the `task_factory`.\nCreate a task by passing it:\n\n- the `kind` of task\n- the `name` of the task\n- (optionally) an `instruction`\n- (optionally) a list of `dependencies`\n- whether it is `nullable` (default is False)\n- any kwargs necessary\n\nTypically: \n```Python\n task_factory(\n 'kind',\n 'name',\n instruction='instruction',\n dependencies=dependencies,\n nullable=True/False,\n **kwargs,\n )\n```\n\nPassing a dict or list to `kind` will create a categorical task. \nIn this case the `categories` kwarg is ignored.\n\n### Setting up tasks through subscription\n\nIt is also possible to instantiate an annotator and add tasks through subscription: \n\n```Python\n a = Annotator()\n a.tasks['topic'] = ['economy', 'politics', 'media', 'other']\n a.tasks['factual'] = bool, \"Is the article factual?\", False\n```\n\nTo add a task like this, you minimally need to provide the `kind` of task you are trying to create.\nOptionally, you can add `instruction`, `nullability`, `dependencies` and any other kwargs (as dictionary).\nChange the order in which tasks are prompted to the user with the `order` attribute on `tasks`.\n\n### Available tasks\n\nkind | kwargs | dtype | description\n--------- | -----------| ---------------- | ----------------\nstr | | object | String\nregex | regex | object | String validated by regex\nint | | Int64 | Nullable integer\nfloat | | float64 | Float\nbool | | bool | Boolean\ncategory | categories | CategoricalDtype | Categorical variable\ndate | | datetime64[ns] | Date\n\n### Dependencies\n\nDependencies consist of a *condition* and a *value*, that can be passed as tuple:\n\n```Python\n (\"col1 == 'x'\", False)\n```\n\nThe condition is a [pandas query statement](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.query.html#pandas.DataFrame.query).\nBefore prompting the user for input, the condition is evaluated on the current annotation.\nIf the query evaluates to True then the value will be assigned automatically.\n\n## Annotator\n\n### Calling the annotator\n\nThe annotator detects if it is run from Jupyter.\nIf so, the annotator will render itself in html and css.\nIf not, the annotator will render itself as text.\nYou can annotate a selection of records by passing a list of ids to the annotator call. If you want to reannotate ids that have already been annotated, then set `redo` to True when calling the annotator.\n\n### Instantiating the annotator\n\n> arguments\n> ---------\n> tasks : *task, list of task or DataFrame, default None* \n>\n> Annotation task(s).\n> If passed a DataFrame, then the tasks will be inferred from it.\n> Annotation data in the dataframe will also be initialized.\n>\n> data : *data, list-/dict-like, Series or DataFrame, default None* \n>\n> Data to be annotated.\n> If `data` is not already a data object,\n> then it will be passed through `load_data`.\n> The annotator can be instantiated without data,\n> but will only work after data is loaded.\n>\n> user : *str, default None* \n>\n> Name of the user.\n>\n> name : *str, default 'HUMANNOTATOR'* \n>\n> Name of the annotator.\n> save_data : *boolean, default False* \n>\n> Set flag to True if you want to store the data with the annotator.\n> This will ensure that the pickled object, will contain the data.\n> \n> other parameters\n> ----------------\n> **DISPLAY** \n> text_display : *boolean, default None* \n>\n> If True will display the annotator in plain text instead of html.\n> \n> **DATA** \n> item_cols : *str or list of str, default None* \n>\n> Name(s) of dataframe column(s) to display when annotating.\n> By default: display all columns.\n>\n> id_col : *str, default None* \n>\n> Name of dataframe column to use as index.\n> By default: use the dataframe's index.\n> \n> **HIGHLIGHTER** \n> phrases : *str, list of str, default None* \n>\n> Phrases to highlight in the display.\n> The phrases can be regexes.\n> It also to pass in a dict where:\n> - the keys are the phrases\n> - the values are the css styling\n>\n> escape : *boolean, default False* \n>\n> Set escape to True in order to escape the phrases.\n>\n> flags : *int, default 0 (no flags)* \n>\n> Flags to pass through to the re module, e.g. re.IGNORECASE.\n> \n> **TRUNCATER** \n> truncate : *boolean, default True* \n>\n> Set to False to not truncate items.\n>\n> trunc_limit : *int, default 32* \n>\n> The number of words beyond which an item will be truncated.\n>\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "http://github.com/lcvriend/humannotator", "keywords": "annotation annotator text data pandas", "license": "GPLv3+", "maintainer": "", "maintainer_email": "", "name": "humannotator", "package_url": "https://pypi.org/project/humannotator/", "platform": "", "project_url": "https://pypi.org/project/humannotator/", "project_urls": { "Homepage": "http://github.com/lcvriend/humannotator" }, "release_url": "https://pypi.org/project/humannotator/0.0.1/", "requires_dist": [ "pandas (>=0.24.0)", "markdown" ], "requires_python": ">=3.6", "summary": "Customizable tool for easy manual annotation", "version": "0.0.1" }, "last_serial": 5932618, "releases": { "0.0.1": [ { "comment_text": "", "digests": { "md5": "72e6c7187e97ecaa8fcbdbfc2fd5997c", "sha256": "42e770b231b3579115ad2ba7307cd34638d3299c275c4a65b37bf94874c26963" }, "downloads": -1, "filename": "humannotator-0.0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "72e6c7187e97ecaa8fcbdbfc2fd5997c", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 47947, "upload_time": "2019-10-05T17:18:30", "url": "https://files.pythonhosted.org/packages/c3/d5/4a904086009bd4786654750e8e049af65d9b4e61fd6d10763670c153d01a/humannotator-0.0.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "89c99dfc6a02f5aa832b248af248cf08", "sha256": "2c1b26208157ebaf1bc16bcecbb7d035c495a8208b0ebcfb98404283c3add3d1" }, "downloads": -1, "filename": "humannotator-0.0.1.tar.gz", "has_sig": false, "md5_digest": "89c99dfc6a02f5aa832b248af248cf08", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 27451, "upload_time": "2019-10-05T17:18:33", "url": "https://files.pythonhosted.org/packages/45/9f/2ec9c92b6ac78e2c82aa079f56bee42b108ba18bc685673720c5653aebe9/humannotator-0.0.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "72e6c7187e97ecaa8fcbdbfc2fd5997c", "sha256": "42e770b231b3579115ad2ba7307cd34638d3299c275c4a65b37bf94874c26963" }, "downloads": -1, "filename": "humannotator-0.0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "72e6c7187e97ecaa8fcbdbfc2fd5997c", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 47947, "upload_time": "2019-10-05T17:18:30", "url": "https://files.pythonhosted.org/packages/c3/d5/4a904086009bd4786654750e8e049af65d9b4e61fd6d10763670c153d01a/humannotator-0.0.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "89c99dfc6a02f5aa832b248af248cf08", "sha256": "2c1b26208157ebaf1bc16bcecbb7d035c495a8208b0ebcfb98404283c3add3d1" }, "downloads": -1, "filename": "humannotator-0.0.1.tar.gz", "has_sig": false, "md5_digest": "89c99dfc6a02f5aa832b248af248cf08", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 27451, "upload_time": "2019-10-05T17:18:33", "url": "https://files.pythonhosted.org/packages/45/9f/2ec9c92b6ac78e2c82aa079f56bee42b108ba18bc685673720c5653aebe9/humannotator-0.0.1.tar.gz" } ] }