{ "info": { "author": "Jun Harashima", "author_email": "j.harashima@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 2 - Pre-Alpha", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Natural Language :: English", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7" ], "description": "========\npttagger\n========\n\n.. image:: https://img.shields.io/pypi/v/pttagger.svg\n :target: https://pypi.python.org/pypi/pttagger\n\n.. image:: https://img.shields.io/travis/jun-harashima/pttagger.svg\n :target: https://travis-ci.org/jun-harashima/pttagger\n\npttagger is a simple PyTorch-based tagger with the following features:\n\n- stacked bi-directional RNN (GRU or LSTM)\n- variable-sized mini-batches\n- multiple inputs\n\nQuick Start\n===========\n\nInstallation\n------------\n\nRun this command in your terminal:\n\n.. code-block:: bash\n\n $ pip install pttagger\n\nPre-processing\n--------------\n\nSuppose that you have the following examples of named entity recognition:\n\n- Joe/B-PER Doe/I-PER goes/O to/O Japan/B-LOC ./O\n- Jane/B-PER Doe/I-PER belongs/O to/O Kyoto/B-ORG University/I-ORG ./O\n- ...\n\nFirst, pass the examples to the ``Dataset`` constructor like this:\n\n.. code-block:: python\n\n from pttagger.dataset import Dataset\n\n examples = [\n {'Xs': [['Joe', 'Doe', 'goes', 'to', 'Japan', '.']],\n 'Y': ['B-PER', 'I-PER', 'O', 'O', 'B-LOC', 'O']},\n {'Xs': [['Jane', 'Doe', 'belongs', 'to', 'Kyoto', 'University', '.']],\n 'Y': ['B-PER', 'I-PER', 'O', 'O', 'B-ORG', 'I-ORG', 'O']},\n ...\n ]\n dataset = Dataset(examples)\n\n``Xs`` can also hold multiple inputs. In the following case, ``Xs`` contains part-of-speech (POS) tags in addition to the words:\n\n.. 
code-block:: python\n\n examples = [\n {'Xs': [['Joe', 'Doe', 'goes', 'to', 'Japan', '.'], ['NNP', 'NNP', 'VBZ', 'TO', 'NNP', '.']],\n 'Y': ['B-PER', 'I-PER', 'O', 'O', 'B-LOC', 'O']},\n {'Xs': [['Jane', 'Doe', 'belongs', 'to', 'Kyoto', 'University', '.'], ['NNP', 'NNP', 'VBZ', 'TO', 'NNP', 'NNP', '.']],\n 'Y': ['B-PER', 'I-PER', 'O', 'O', 'B-ORG', 'I-ORG', 'O']},\n ...\n ]\n dataset = Dataset(examples)\n\nNow, ``dataset`` has the following two indices:\n\n- ``x_to_index``: e.g., ``[{'': 0, '': 1, 'Joe': 2, 'Doe': 3, ...}]``\n- ``y_to_index``: e.g., ``{'': 0, '': 1, 'B-PER': 2, 'I-PER': 3, ...}``\n\nIf you use multiple inputs, ``x_to_index`` has indices for each input.\n\nTraining\n--------\n\nConstruct a ``Model`` object and train it as follows:\n\n.. code-block:: python\n\n from pttagger.model import Model\n\n EMBEDDING_DIMS = [100] # if you use multiple inputs, set a dimension for each input\n HIDDEN_DIMS = [10] # the same as above\n x_set_sizes = [len(x_to_index) for x_to_index in dataset.x_to_index]\n y_set_size = len(dataset.y_to_index)\n model = Model(EMBEDDING_DIMS, HIDDEN_DIMS, x_set_sizes, y_set_size)\n model.train(dataset)\n\nYou can also use the following parameters:\n\n- **use_lstm** - If ``True``, uses LSTM. Default: ``False`` (uses GRU)\n- **num_layers** - Number of recurrent layers. Default: ``1``\n\nTest\n----\n\nPredict tags for test examples like this:\n\n.. 
code-block:: python\n\n test_examples = [\n {'Xs': [['Richard', 'Roe', 'comes', 'to', 'America', '.']],\n 'Y': ['B-PER', 'I-PER', 'O', 'O', 'B-LOC', 'O']}\n ]\n test_dataset = Dataset(test_examples)\n results = model.test(test_dataset)\n", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/jun-harashima/pttagger", "keywords": "pttagger", "license": "MIT license", "maintainer": "", "maintainer_email": "", "name": "pttagger", "package_url": "https://pypi.org/project/pttagger/", "platform": "", "project_url": "https://pypi.org/project/pttagger/", "project_urls": { "Homepage": "https://github.com/jun-harashima/pttagger" }, "release_url": "https://pypi.org/project/pttagger/0.1.1/", "requires_dist": null, "requires_python": "", "summary": "Simple PyTorch-based tagger", "version": "0.1.1" }, "last_serial": 4691981, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "md5": "148821d46f0ab577af2dfd67b3cc49ec", "sha256": "e926041c5d9b26a673b4160efd55ae1f786fa1caae6c53d496fe272aa541e39c" }, "downloads": -1, "filename": "pttagger-0.1.0.tar.gz", "has_sig": false, "md5_digest": "148821d46f0ab577af2dfd67b3cc49ec", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 12877, "upload_time": "2019-01-03T15:26:41", "url": "https://files.pythonhosted.org/packages/a8/fa/1cedb79be3ef58eaef35c839db6a0f8a20f361bc8c5663872098a4981855/pttagger-0.1.0.tar.gz" } ], "0.1.1": [ { "comment_text": "", "digests": { "md5": "463b04ad655047ba31121de9da194e34", "sha256": "3004508163422ca35093effde4fd6d7997bb5bc9bfaf4971a958dfeed1635a66" }, "downloads": -1, "filename": "pttagger-0.1.1.tar.gz", "has_sig": false, "md5_digest": "463b04ad655047ba31121de9da194e34", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 13062, "upload_time": "2019-01-13T22:46:20", "url": 
"https://files.pythonhosted.org/packages/7d/c3/9962a15d5717c9844dbb8ac2f6c5c16f5cd36e031286f169df8baa38f6f7/pttagger-0.1.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "463b04ad655047ba31121de9da194e34", "sha256": "3004508163422ca35093effde4fd6d7997bb5bc9bfaf4971a958dfeed1635a66" }, "downloads": -1, "filename": "pttagger-0.1.1.tar.gz", "has_sig": false, "md5_digest": "463b04ad655047ba31121de9da194e34", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 13062, "upload_time": "2019-01-13T22:46:20", "url": "https://files.pythonhosted.org/packages/7d/c3/9962a15d5717c9844dbb8ac2f6c5c16f5cd36e031286f169df8baa38f6f7/pttagger-0.1.1.tar.gz" } ] }