{ "info": { "author": "hankcs", "author_email": "hankcshe@gmail.com", "bugtrack_url": null, "classifiers": [ "Intended Audience :: Developers", "Natural Language :: English", "Operating System :: OS Independent", "Programming Language :: Python :: 3", "Topic :: Utilities" ], "description": "# IParser: Industrial Strength Dependency Parser\n\nYet another multilingual dependency parser, integrated with tokenizer, part-of-speech tagger and visualization tool. IParser can parse raw sentence to dependency tree in CoNLL format, and is able to visualize trees in your browser. [See live demo!](http://iparser.hankcs.com/)\n\nCurrently, iparser is in a prototype state. It makes no warranty and may not be ready for practical usage.\n\n## Install\n\n```\npip3 install iparser --process-dependency-links\n```\n\n## Quick Start\n\n### CLI\n\n#### Interactive Shell\n\nYou can play with IParser in an interactive mode:\n\n```\n$ iparser parse\nI looove iparser!\n1\tI _\t_\tPRP\t_\t2\tnsubj\t_\t_\n2\tlooove _\t_\tVBP\t_\t0\troot\t_\t_\n3\tiparser _\t_\tNN\t_\t2\tdobj\t_\t_\n4\t! _\t_\t. _\t2\tpunct\n```\nYou type a sentence, hit enter, IParser will output its dependency tree.\n\n- Use `iparser segment` or `iparser tag` for word segmentation or part-of-speech tagging\n- Some models may take a while to load\n- IParser is language-agnostic, we provide pre-trained models for both English and Chinese, shipped in the installation package. The default model is PTB (English), you can switch to CTB (Chinese) via appending `--language cn`\n- Append `--help` to see the detail manual\n\n#### Pipeline\n\n```\n$ iparser segment <<< '\u5546\u54c1\u548c\u670d\u52a1' \n\u5546\u54c1 \u548c \u670d\u52a1\n\n$ iparser tag <<< 'I looove iparser!' \nI/PRP looove/VBP iparser/NN !/.\n\n$ iparser parse <<< 'I looove iparser!' \n1\tI _\t_\tPRP\t_\t2\tnsubj\t_\t_\n2\tlooove _\t_\tVBP\t_\t0\troot\t_\t_\n3\tiparser _\t_\tNN\t_\t2\tdobj\t_\t_\n4\t! _\t_\t. _\t2\tpunct\t_\t_\n```\n\n- `iparser` is a compatible pipeline for standard I/O redirection. You can use `iparser` directly in terminal without writing codes.\n\n### API\n\n#### IParser\n\nThe all-in-one interface is provided by class `IParser`:\n\n```\n$ python3\n>>> from iparser import *\n>>> iparser = IParser(pos_config_file=PTB_POS, dep_config_file=PTB_DEP)\n>>> print(iparser.tag('I looove iparser!'))\n[('I', 'PRP'), ('looove', 'VBP'), ('iparser', 'NN'), ('!', '.')]\n>>> print(iparser.parse('I looove iparser!'))\n1\tI\t_\t_\tPRP\t_\t2\tnsubj\t_\t_\n2\tlooove\t_\t_\tVBP\t_\t0\troot\t_\t_\n3\tiparser\t_\t_\tNN\t_\t2\tdobj\t_\t_\n4\t!\t_\t_\t.\t_\t2\tpunct\t_\t_\n```\n\nYou can load models trained on different corpora to support multilingual:\n\n```\n>>> iparser = IParser(seg_config_file=CTB_SEG, pos_config_file=CTB_POS, dep_config_file=CTB_DEP)\n>>> print(iparser.parse('\u6211\u7231\u4f9d\u5b58\u5206\u6790\uff01'))\n1\t\u6211\t_\t_\tPN\t_\t2\tnsubj\t_\t_\n2\t\u7231\t_\t_\tVV\t_\t0\troot\t_\t_\n3\t\u4f9d\u5b58\t_\t_\tVV\t_\t2\tccomp\t_\t_\n4\t\u5206\u6790\t_\t_\tVV\t_\t3\tcomod\t_\t_\n5\t\uff01\t_\t_\tPU\t_\t2\tpunct\t_\t_\n```\n\nIf you only want to perform an intermediate step, you can checkout the following APIs.\n\n#### Word Segmentation\n\n```\n>>> segmenter = Segmenter(CTB_SEG).load()\n>>> segmenter.segment('\u4e0b\u96e8\u5929\u5730\u9762\u79ef\u6c34')\n['\u4e0b\u96e8\u5929', '\u5730\u9762', '\u79ef\u6c34']\n```\n\n- Notice that you need to call `load` to indicate that you want to load a pre-trained model, not to prepare an empty model for training.\n\n#### Part-of-Speech Tagging\n\n```\n>>> tagger = POSTagger(PTB_POS).load()\n>>> tagger.tag('I looove languages'.split())\n[('I', 'PRP'), ('looove', 'VBP'), ('languages', 'NNS')]\n```\n\n- `POSTagger` is not responsible for word segmentation. Do segmentation in advance or use `IParser` for convenience.\n\n#### Dependency Parsing\n\n```\n>>> parser = DepParser(PTB_DEP).load()\n>>> sentence = [('Is', 'VBZ'), ('this', 'DT'), ('the', 'DT'), ('future', 'NN'), ('of', 'IN'), ('chamber', 'NN'), ('music', 'NN'), ('?', '.')]\n>>> print(parser.parse(sentence))\n1\tIs\t_\t_\tVBZ\t_\t4\tcop\t_\t_\n2\tthis\t_\t_\tDT\t_\t4\tnsubj\t_\t_\n3\tthe\t_\t_\tDT\t_\t4\tdet\t_\t_\n4\tfuture\t_\t_\tNN\t_\t0\troot\t_\t_\n5\tof\t_\t_\tIN\t_\t4\tprep\t_\t_\n6\tchamber\t_\t_\tNN\t_\t7\tnn\t_\t_\n7\tmusic\t_\t_\tNN\t_\t5\tpobj\t_\t_\n8\t?\t_\t_\t.\t_\t4\tpunct\t_\t_\n```\n\n- `DepParser` is neither responsible for segmentation nor tagging. \n- The input must be a list of tuples of word and tag.\n\n### Server\n\n```\n$ iparser.server\nusage: iparser.server [-h] [--port PORT]\n\nA http server for IParser\n\noptional arguments:\n -h, --help show this help message and exit\n --port PORT\n```\n\n- The default URL is http://localhost:8666/\n- You can check our live demo at http://iparser.hankcs.com/\n\n\n## Train Models\n\nIParser is designed to be language-agnostic, which means it has universal language support, only need to give some corpora of a desired language. \n\n### Corpus Format\n\nThe format is described here: https://github.com/hankcs/TreebankPreprocessing\n\n### Configuration File\n\nIParser employs configuration files to ensure the same network is created before and after serialization, in training phase and testing phase accordingly. This is important for research engineers who want to fine-tune those hyper parameters, or train new models on third language corpora. We provide well documented configuration template files, containing all configurable parameters for users to adjust. \n\nYou can check out templates shipped with the `iparsermodels`, e.g.\n\n```\npython3\n>>> from iparser import *\n>>> PTB_DEP\n'/usr/local/python3/lib/python3.6/site-packages/iparsermodels/ptb/dep/config.ini'\n```\n\n### CLI\n\nThe CLI is not only capable for prediction, but also can perform training. Only requires a configuration file.\n\n```\n$ iparser segment --help\nusage: iparser segment [-h] [--config CONFIG] [--action ACTION]\n\noptional arguments:\n -h, --help show this help message and exit\n --config CONFIG path to config file\n --action ACTION Which action (train, test, predict)?\n```\n\n- `--action train` is what you are looking for.\n\n### API\n\nThe training APIs can be found in `tests/train_parser.py` etc.\n\n## Performance\n\n![tag](http://wx1.sinaimg.cn/large/006Fmjmcly1fpgvl4ijsoj31kw07zgn9.jpg)\n\n![dep](http://wx3.sinaimg.cn/large/006Fmjmcly1fpgvlqpigpj31kw0bmjtt.jpg)\n\nThe character model seems to be useless for English and Chinese, so it is disabled by default.\n\n## Acknowledgments\n\n- Bi-LSTM-CRF implementation modified from a Dynet-1.x version by [rguthrie3](https://github.com/rguthrie3/BiLSTM-CRF).\n- BiAffine implementation extended from [jcyk](https://github.com/jcyk/Dynet-Biaffine-dependency-parser), added a subword LSTM layer with attention, as introduced in [Stanford's Graph-based Neural Dependency Parser at the CoNLL 2017 Shared Task](https://web.stanford.edu/~tdozat/files/TDozat-CoNLL2017-Paper.pdf).\n- Visualization part is adopted from [Annodoc](https://github.com/spyysalo/annodoc), an annotation documentation support system which is used as a wrapper of BRAT.", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/hankcs/iparser", "keywords": "dependency parser", "license": "", "maintainer": "", "maintainer_email": "", "name": "iparser", "package_url": "https://pypi.org/project/iparser/", "platform": "", "project_url": "https://pypi.org/project/iparser/", "project_urls": { "Homepage": "https://github.com/hankcs/iparser" }, "release_url": "https://pypi.org/project/iparser/0.1.8/", "requires_dist": null, "requires_python": "", "summary": "Integrated and Industrial Strength Dependency Parser", "version": "0.1.8" }, "last_serial": 3680217, "releases": { "0.1.8": [ { "comment_text": "", "digests": { "md5": "04b9be784c44e289ee069acc3a2674b4", "sha256": "473c1b376fe8c14dda5f5bdc0e8db49ffa174043fbd1850220f818ef6f9b8703" }, "downloads": -1, "filename": "iparser-0.1.8.tar.gz", "has_sig": false, "md5_digest": "04b9be784c44e289ee069acc3a2674b4", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 38855, "upload_time": "2018-03-18T05:10:14", "url": "https://files.pythonhosted.org/packages/ef/70/b13464ba7a4434705a0e06d7dc48d239d2288545ac7f9788b3f2291ba250/iparser-0.1.8.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "04b9be784c44e289ee069acc3a2674b4", "sha256": "473c1b376fe8c14dda5f5bdc0e8db49ffa174043fbd1850220f818ef6f9b8703" }, "downloads": -1, "filename": "iparser-0.1.8.tar.gz", "has_sig": false, "md5_digest": "04b9be784c44e289ee069acc3a2674b4", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 38855, "upload_time": "2018-03-18T05:10:14", "url": "https://files.pythonhosted.org/packages/ef/70/b13464ba7a4434705a0e06d7dc48d239d2288545ac7f9788b3f2291ba250/iparser-0.1.8.tar.gz" } ] }