{ "info": { "author": "Stanford NLP Group", "author_email": "chaganty@cs.stanford.edu", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Programming Language :: Python :: 2", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.3", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5", "Topic :: Software Development :: Object Brokering" ], "description": "Stanford CoreNLP Python Interface\n=================================\n\n.. image:: https://travis-ci.org/stanfordnlp/python-stanford-corenlp.svg?branch=master\n :target: https://travis-ci.org/stanfordnlp/python-stanford-corenlp\n\nThis package contains a python interface for `Stanford CoreNLP\n`_ that contains a reference\nimplementation to interface with the `Stanford CoreNLP server\n`_.\nThe package also contains a base class to expose a python-based annotation\nprovider (e.g. 
your favorite neural NER system) to the CoreNLP\npipeline via a lightweight service.\n\nTo use the package, first download the `official java CoreNLP release \n`_, unzip it, and define an environment\nvariable :code:`$CORENLP_HOME` that points to the unzipped directory.\n\nYou can also install this package from `PyPI `_ using :code:`pip install stanford-corenlp` \n\n----\n\nCommand Line Usage\n------------------\nProbably the easiest way to use this package is through the `annotate` command-line utility::\n\n usage: annotate [-h] [-i INPUT] [-o OUTPUT] [-f {json}]\n [-a ANNOTATORS [ANNOTATORS ...]] [-s] [-v] [-m MEMORY]\n [-p PROPS [PROPS ...]]\n\n Annotate data\n\n optional arguments:\n -h, --help show this help message and exit\n -i INPUT, --input INPUT\n Input file to process; each line contains one document\n (default: stdin)\n -o OUTPUT, --output OUTPUT\n File to write annotations to (default: stdout)\n -f {json}, --format {json}\n Output format\n -a ANNOTATORS [ANNOTATORS ...], --annotators ANNOTATORS [ANNOTATORS ...]\n A list of annotators\n -s, --sentence-mode Assume each line of input is a sentence.\n -v, --verbose-server Server is made verbose\n -m MEMORY, --memory MEMORY\n Memory to use for the server\n -p PROPS [PROPS ...], --props PROPS [PROPS ...]\n Properties as a list of key=value pairs\n\n\nWe recommend using `annotate` in conjunction with the wonderful `jq`\ncommand to process the output. As an example, given a file with a\nsentence on each line, the following command produces the equivalent\nspace-separated tokens::\n\n cat file.txt | annotate -s -a tokenize | jq '[.tokens[].originalText]' > tokenized.txt\n\n\nAnnotation Server Usage\n-----------------------\n\n.. 
code-block:: python\n\n import corenlp\n\n text = \"Chris wrote a simple sentence that he parsed with Stanford CoreNLP.\"\n\n # We assume that you've downloaded Stanford CoreNLP and defined an environment\n # variable $CORENLP_HOME that points to the unzipped directory.\n # The code below will launch StanfordCoreNLPServer in the background\n # and communicate with the server to annotate the sentence.\n with corenlp.CoreNLPClient(annotators=\"tokenize ssplit pos lemma ner depparse\".split()) as client:\n ann = client.annotate(text)\n\n # You can access annotations using ann.\n sentence = ann.sentence[0]\n\n # The corenlp.to_text function is a helper function that\n # reconstructs a sentence from tokens.\n assert corenlp.to_text(sentence) == text\n\n # You can access any property within a sentence.\n print(sentence.text)\n\n # Likewise for tokens\n token = sentence.token[0]\n print(token.lemma)\n\n # Use tokensregex patterns to find who wrote a sentence.\n pattern = '([ner: PERSON]+) /wrote/ /an?/ []{0,3} /sentence|article/'\n matches = client.tokensregex(text, pattern)\n # sentences contains a list with matches for each sentence.\n assert len(matches[\"sentences\"]) == 1\n # length tells you whether or not there are any matches in this sentence.\n assert matches[\"sentences\"][0][\"length\"] == 1\n # You can access matches like most regex groups.\n matches[\"sentences\"][0][\"0\"][\"text\"] == \"Chris wrote a simple sentence\"\n matches[\"sentences\"][0][\"0\"][\"1\"][\"text\"] == \"Chris\"\n\n # Use semgrex patterns to directly find who wrote what.\n pattern = '{word:wrote} >nsubj {}=subject >dobj {}=object'\n matches = client.semgrex(text, pattern)\n # sentences contains a list with matches for each sentence.\n assert len(matches[\"sentences\"]) == 1\n # length tells you whether or not there are any matches in this sentence.\n assert matches[\"sentences\"][0][\"length\"] == 1\n # You can access matches like most regex groups.\n matches[\"sentences\"][0][\"0\"][\"text\"] == 
\"wrote\"\n matches[\"sentences\"][1][\"0\"][\"$subject\"][\"text\"] == \"Chris\"\n matches[\"sentences\"][1][\"0\"][\"$object\"][\"text\"] == \"sentence\"\n\nSee `test_client.py` and `test_protobuf.py` for more examples. Props to\n@dan-zheng for tokensregex/semgrex support.\n\n\nAnnotation Service Usage\n------------------------\n\n*NOTE*: The annotation service allows users to provide a custom\nannotator to be used by the CoreNLP pipeline. Unfortunately, it relies\non experimental code internal to the Stanford CoreNLP project is not yet\navailable for public use.\n\n.. code-block:: python\n\n import corenlp\n from .happyfuntokenizer import Tokenizer\n\n class HappyFunTokenizer(Tokenizer, corenlp.Annotator):\n def __init__(self, preserve_case=False):\n Tokenizer.__init__(self, preserve_case)\n corenlp.Annotator.__init__(self)\n\n @property\n def name(self):\n \"\"\"\n Name of the annotator (used by CoreNLP)\n \"\"\"\n return \"happyfun\"\n\n @property\n def requires(self):\n \"\"\"\n Requires has to specify all the annotations required before we\n are called.\n \"\"\"\n return []\n\n @property\n def provides(self):\n \"\"\"\n The set of annotations guaranteed to be provided when we are done.\n NOTE: that these annotations are either fully qualified Java\n class names or refer to nested classes of\n edu.stanford.nlp.ling.CoreAnnotations (as is the case below).\n \"\"\"\n return [\"TextAnnotation\",\n \"TokensAnnotation\",\n \"TokenBeginAnnotation\",\n \"TokenEndAnnotation\",\n \"CharacterOffsetBeginAnnotation\",\n \"CharacterOffsetEndAnnotation\",\n ]\n\n def annotate(self, ann):\n \"\"\"\n @ann: is a protobuf annotation object.\n Actually populate @ann with tokens.\n \"\"\"\n buf, beg_idx, end_idx = ann.text.lower(), 0, 0\n for i, word in enumerate(self.tokenize(ann.text)):\n token = ann.sentencelessToken.add()\n # These are the bare minimum required for the TokenAnnotation\n token.word = word\n token.tokenBeginIndex = i\n token.tokenEndIndex = i+1\n\n # Seek into 
the text until you can find this word.\n try:\n # Try to update beginning index\n beg_idx = buf.index(word, beg_idx)\n except ValueError:\n # Give up -- this will be something random\n pass\n end_idx = beg_idx + len(word)\n\n token.beginChar = beg_idx\n token.endChar = end_idx\n\n beg_idx = end_idx\n\n annotator = HappyFunTokenizer()\n # Calling .start() will launch the annotator as a service running on\n # port 8432 by default.\n annotator.start()\n\n # annotator.properties contains all the right properties for\n # Stanford CoreNLP to use this annotator.\n with corenlp.CoreNLPClient(properties=annotator.properties, annotators=\"happyfun ssplit pos\".split()) as client:\n ann = client.annotate(\"RT @ #happyfuncoding: this is a typical Twitter tweet :-)\")\n\n tokens = [t.word for t in ann.sentence[0].token]\n print(tokens)\n\n\nSee `test_annotator.py` for more examples.\n\n\n", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/stanfordnlp/python-stanford-corenlp", "keywords": "corenlp natural-language-processing nlp", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "stanford-corenlp", "package_url": "https://pypi.org/project/stanford-corenlp/", "platform": "", "project_url": "https://pypi.org/project/stanford-corenlp/", "project_urls": { "Homepage": "https://github.com/stanfordnlp/python-stanford-corenlp" }, "release_url": "https://pypi.org/project/stanford-corenlp/3.9.2/", "requires_dist": [ "corenlp-protobuf (>=3.8.0)", "requests (>=2.10.0)", "six (>=1.9)", "check-manifest; extra == 'dev'", "coverage; extra == 'test'" ], "requires_python": "", "summary": "Official python interface for Stanford CoreNLP", "version": "3.9.2" }, "last_serial": 4692189, "releases": { "3.7.0": [ { "comment_text": "", "digests": { "md5": "d0b4cbaad54ff343584924829a7406b2", "sha256": 
"ba633e186427fe725e40207c810eea2b40a36ba3329ecdaaebb4dc68045ba865" }, "downloads": -1, "filename": "stanford-corenlp-3.7.0.tar.gz", "has_sig": false, "md5_digest": "d0b4cbaad54ff343584924829a7406b2", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 12766, "upload_time": "2017-06-10T03:33:24", "url": "https://files.pythonhosted.org/packages/eb/c2/9dae16864dc322e130ce5133d47d49744dec03b538ae64ba6c583bcfb376/stanford-corenlp-3.7.0.tar.gz" } ], "3.7.1": [ { "comment_text": "", "digests": { "md5": "4dc9706f1851430686e65a3d25f0e17a", "sha256": "d49b11f394070e9353eaa283188b2c4a531ca4337c6025c2a01266edeb848197" }, "downloads": -1, "filename": "stanford-corenlp-3.7.1.tar.gz", "has_sig": false, "md5_digest": "4dc9706f1851430686e65a3d25f0e17a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 12838, "upload_time": "2017-06-10T19:28:51", "url": "https://files.pythonhosted.org/packages/5a/94/3aafea9951b440bec011023a1154d620036d225ce2264bbfb42f0eb3d1f9/stanford-corenlp-3.7.1.tar.gz" } ], "3.8.0": [ { "comment_text": "", "digests": { "md5": "712e29e37dfc69b1630379cf6dc08446", "sha256": "baee23f6deff0bd91a43a1ab035a0dec3d5bc302d448075daddeffdadc5f176d" }, "downloads": -1, "filename": "stanford_corenlp-3.8.0-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "712e29e37dfc69b1630379cf6dc08446", "packagetype": "bdist_wheel", "python_version": "2.7", "requires_python": null, "size": 11509, "upload_time": "2017-11-18T23:15:16", "url": "https://files.pythonhosted.org/packages/5d/64/440e3daa30bb48fe6c08df40c8251326268c34d9402fb64b85c0a69ac09d/stanford_corenlp-3.8.0-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "89c43a89820ea46ba47c5581ca5a3f21", "sha256": "bd0530f1828b70e4e785f1265fd065b25501826c000ef1820886e919fc71a02c" }, "downloads": -1, "filename": "stanford-corenlp-3.8.0.tar.gz", "has_sig": false, "md5_digest": "89c43a89820ea46ba47c5581ca5a3f21", "packagetype": "sdist", "python_version": 
"source", "requires_python": null, "size": 14196, "upload_time": "2017-11-18T23:15:15", "url": "https://files.pythonhosted.org/packages/19/12/c98ba238da577f5ae2ab35ebe4f877de8a1e2dede32d054c93b181632014/stanford-corenlp-3.8.0.tar.gz" } ], "3.9.2": [ { "comment_text": "", "digests": { "md5": "2769e25407f7d0cde21b77074337c128", "sha256": "43e94ab1d78aac8ccfc6d02290620ec856b5a6fed2a5564938292966db806baa" }, "downloads": -1, "filename": "stanford_corenlp-3.9.2-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "2769e25407f7d0cde21b77074337c128", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 11160, "upload_time": "2019-01-14T00:26:10", "url": "https://files.pythonhosted.org/packages/31/a9/695357743b55c08e74e46fe72579ca2a2559fa9b196d9f2035339af89b94/stanford_corenlp-3.9.2-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "e30b1a2618717406712cd91b417c6533", "sha256": "598a18653779cbd5d5fcbf7d2b6ccf3f7372de9a5f7ecd8c9e4cbbb65ba46d64" }, "downloads": -1, "filename": "stanford-corenlp-3.9.2.tar.gz", "has_sig": false, "md5_digest": "e30b1a2618717406712cd91b417c6533", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 18317, "upload_time": "2019-01-14T00:26:11", "url": "https://files.pythonhosted.org/packages/20/0f/7e5675963ceec1fec931b3c0ccde88bca94698f29dae431dc9e57794e67b/stanford-corenlp-3.9.2.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "2769e25407f7d0cde21b77074337c128", "sha256": "43e94ab1d78aac8ccfc6d02290620ec856b5a6fed2a5564938292966db806baa" }, "downloads": -1, "filename": "stanford_corenlp-3.9.2-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "2769e25407f7d0cde21b77074337c128", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 11160, "upload_time": "2019-01-14T00:26:10", "url": 
"https://files.pythonhosted.org/packages/31/a9/695357743b55c08e74e46fe72579ca2a2559fa9b196d9f2035339af89b94/stanford_corenlp-3.9.2-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "e30b1a2618717406712cd91b417c6533", "sha256": "598a18653779cbd5d5fcbf7d2b6ccf3f7372de9a5f7ecd8c9e4cbbb65ba46d64" }, "downloads": -1, "filename": "stanford-corenlp-3.9.2.tar.gz", "has_sig": false, "md5_digest": "e30b1a2618717406712cd91b417c6533", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 18317, "upload_time": "2019-01-14T00:26:11", "url": "https://files.pythonhosted.org/packages/20/0f/7e5675963ceec1fec931b3c0ccde88bca94698f29dae431dc9e57794e67b/stanford-corenlp-3.9.2.tar.gz" } ] }