{ "info": { "author": "Andreas van Cranenburgh", "author_email": "A.W.vanCranenburgh@uva.nl", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Environment :: Console", "Environment :: Web Environment", "Intended Audience :: Science/Research", "License :: OSI Approved :: GNU General Public License (GPL)", "Operating System :: POSIX", "Programming Language :: Cython", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3.3", "Topic :: Text Processing :: Linguistic" ], "description": "=================\nDiscontinuous DOP\n=================\n\n.. image:: docs/images/disco-dop.png\n :align: right\n :alt: contrived discontinuous constituent for expository purposes.\n\nThe aim of this project is to parse discontinuous constituents in natural\nlanguage using Data-Oriented Parsing (DOP), with a focus on global world\ndomination. The grammar is extracted from a treebank of sentences annotated\nwith (discontinuous) phrase-structure trees. Concretely, this project provides\na statistical constituency parser with support for discontinuous constituents\nand Data-Oriented Parsing. Discontinuous constituents are supported through the\ngrammar formalism Linear Context-Free Rewriting System (LCFRS), which is a\ngeneralization of Probabilistic Context-Free Grammar (PCFG). Data-Oriented\nParsing allows re-use of arbitrary-sized fragments from previously seen\nsentences using Tree-Substitution Grammar (TSG).\n\n.. 
contents:: Contents of this README:\n :local:\n\nFeatures\n========\nGeneral statistical parsing:\n\n- grammar formalisms: PCFG, PLCFRS\n- extract treebank grammar: trees decomposed into productions, relative\n frequencies as probabilities\n- exact *k*-best list of derivations\n- coarse-to-fine pruning: posterior threshold,\n *k*-best coarse-to-fine\n\nDOP specific (parsing with tree fragments):\n\n- implementations: Goodman's DOP reduction, Double-DOP, DOP1.\n- estimators: relative frequency estimate (RFE), equal weights estimate (EWE).\n- objective functions: most probable parse (MPP),\n most probable derivation (MPD), most probable shortest derivation (MPSD),\n most likely tree with shortest derivation (SL-DOP).\n- marginalization: n-best derivations, sampled derivations.\n\n.. image:: docs/images/runexp.png\n :alt: screenshot of parse tree produced by parser\n\nInstallation\n============\n\nRequirements:\n\n- Python 3.3+ http://www.python.org (headers required, e.g. python3-dev package)\n- Cython 0.21+ http://www.cython.org\n- Numpy 1.5+ http://numpy.org/\n\nPython 2.7 is supported, but Python 3 is recommended.\nInstall the ``futures`` package when running Python 2.7.\n\nDebian, Ubuntu based systems\n----------------------------\nTo compile the latest development version on an `Ubuntu `_ system,\nrun the following sequence of commands::\n\n sudo apt-get install build-essential python3-dev python3-numpy python3-pip git\n git clone --depth 1 git://github.com/andreasvc/disco-dop.git\n cd disco-dop\n pip3 install --user -r requirements.txt\n make install\n\nThe ``--user`` option means the packages will be installed to your home\ndirectory which does not require root privileges. 
Make sure that the\n``~/.local/bin`` directory is in your PATH, or add it as follows::\n\n echo export PATH=$HOME/.local/bin:$PATH >> ~/.bashrc\n\nOther Linux systems\n-------------------\nThis assumes no root access, but assumes that ``gcc`` is installed.\n\nSet environment variables so that software can be installed to the home directory\n(replace with equivalent for your shell if you do not use bash)::\n\n mkdir -p ~/.local\n echo export PATH=$HOME/.local/bin:$PATH >> ~/.bashrc\n echo export LD_LIBRARY_PATH=$HOME/.local/lib:/usr/lib64:/usr/lib >> ~/.bashrc\n echo export PYTHONIOENCODING=\"utf-8\" >> ~/.bashrc\n\nAfter this, re-login or restart the shell to activate these settings.\nInstall Python 3 from source, if not installed already.\nPython may require some libraries such as ``zlib`` and ``readline``;\ninstallation steps are similar to the ones below::\n\n wget http://www.python.org/ftp/python/3.5.1/Python-3.5.1.tgz\n tar -xzf Python-*.tgz\n cd Python-*\n ./configure --prefix=$HOME/.local --enable-shared\n make install && cd ..\n\nCheck by running ``python3`` that version 3.5.1 was installed successfully and\nis the default.\n\nInstall the latest development version of discodop::\n\n wget https://github.com/andreasvc/disco-dop/archive/master.zip\n unzip disco-dop-master.zip\n cd disco-dop-master\n pip install --user -r requirements.txt\n make install\n\nMac OS X\n--------\n- Install `Xcode `_ and `Homebrew `_\n- Install dependencies using Homebrew::\n\n brew install gcc python3 git\n git clone --depth 1 git://github.com/andreasvc/disco-dop.git\n cd disco-dop\n sudo pip3 install -r requirements.txt\n env CC=gcc sudo python setup.py install\n sudo make\n\nOther systems\n-------------\nIf you do not run Linux, it is possible to run the code inside a virtual machine.\nTo do that, install `Virtualbox `_\nand download the virtual machine image with disco-dop pre-installed:\nhttp://lang.science.uva.nl/VMs/discodop-vboximage.zip\n\n\nUsage, 
documentation\n====================\ndiscodop can be used in three ways:\n\n1. through the command line; cf. the manual pages for the ``discodop`` command\n installed as part of the installation: ``man discodop``.\n2. as a library, cf. the `API reference `_\n and `example notebooks `_\n3. `Web interfaces `_\n\nNB: avoid running discodop from within the source tree, to ensure that the\ninstalled versions of modules are imported.\n\nThe documentation can be found at http://discodop.readthedocs.io\n\nGrammars\n========\nCf. https://lang.science.uva.nl/grammars/\n\nThe English, German, and Dutch grammars are described in\n`van Cranenburgh et al. (2016) `_;\nthe French grammar appears in `Sangati & van Cranenburgh (2015)\n`_.\nFor comparison, there is also an English grammar without discontinuous\nconstituents (``ptb-nodisc``).\n\nAcknowledgments\n===============\n\nThe Tree data structures in ``tree.py`` and the simple binarization algorithm\nin ``treetransforms.py`` were taken from `NLTK `_.\nThe Zhang-Shasha tree-edit distance algorithm in ``treedist.py`` was taken from\nhttps://github.com/timtadh/zhang-shasha\nElements of the PLCFRS parser and punctuation re-attachment are based on code\nfrom `rparse `_. 
Various other bits inspired\nby the Stanford parser, Berkeley parser, Bubs parser, &c.\n\nReferences\n==========\nPlease cite the following paper if you use this code in the context of a\npublication::\n\n @article{vancranenburgh2016disc,\n title={Data-Oriented Parsing with discontinuous constituents and function tags},\n author={van Cranenburgh, Andreas and Remko Scha and Rens Bod},\n journal={Journal of Language Modelling},\n year={2016},\n volume={4},\n number={1},\n pages={57--111},\n url={http://dx.doi.org/10.15398/jlm.v4i1.100}\n }", "description_content_type": null, "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "http://discodop.readthedocs.io", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "disco-dop", "package_url": "https://pypi.org/project/disco-dop/", "platform": "", "project_url": "https://pypi.org/project/disco-dop/", "project_urls": { "Homepage": "http://discodop.readthedocs.io" }, "release_url": "https://pypi.org/project/disco-dop/0.5.2/", "requires_dist": null, "requires_python": "", "summary": "Discontinuous Data-Oriented Parsing", "version": "0.5.2" }, "last_serial": 2688359, "releases": { "0.1": [ { "comment_text": "", "digests": { "md5": "6a23bd4f49890b5b5ffd5521ee6e9d6f", "sha256": "aaaeb9959c6da58b7152314fa5b0a7232e9b0169b912f1bb21592df8ae98279e" }, "downloads": -1, "filename": "disco-dop-0.1.tar.gz", "has_sig": false, "md5_digest": "6a23bd4f49890b5b5ffd5521ee6e9d6f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 190334, "upload_time": "2013-05-08T13:12:01", "url": "https://files.pythonhosted.org/packages/5a/34/d66391a4fa9a0ab6753224148af76ab2718df57f062e481d68d4a5ec57c0/disco-dop-0.1.tar.gz" } ], "0.2": [ { "comment_text": "", "digests": { "md5": "c8aaeb63b655e59c3d2bd8921b2a34b7", "sha256": "91f6ecc2e695a07731b261a34f047252c1a862f363f7947639bad015cafad3c5" }, "downloads": -1, "filename": 
"disco-dop-0.2.tar.gz", "has_sig": false, "md5_digest": "c8aaeb63b655e59c3d2bd8921b2a34b7", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 253069, "upload_time": "2013-05-15T13:00:38", "url": "https://files.pythonhosted.org/packages/31/aa/f79ea3aef699b6b0e58a245c2ed807e55ebeb0d523a96b8fce88fb0f9e16/disco-dop-0.2.tar.gz" } ], "0.3pre1": [ { "comment_text": "", "digests": { "md5": "c0329a1280951b6a6ce65ad802bb3ac4", "sha256": "335ac628222732f003db85968c1c4a39dbb8b6efdd5d608ea6a28f9605032b79" }, "downloads": -1, "filename": "disco-dop-0.3pre1.tar.gz", "has_sig": false, "md5_digest": "c0329a1280951b6a6ce65ad802bb3ac4", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 227497, "upload_time": "2013-08-24T20:06:49", "url": "https://files.pythonhosted.org/packages/bf/4d/33fc9f8973151b4c0c3ca40555ecf360bfbb64b1e985b5c7179c8c7babc0/disco-dop-0.3pre1.tar.gz" } ], "0.4": [ { "comment_text": "", "digests": { "md5": "3f5739ff19ba206d2f8f397f62b29f00", "sha256": "4b944a4f4e06553373021b6c8f1ff878ed421d45c8652b74fa2f472073ae8ac5" }, "downloads": -1, "filename": "disco-dop-0.4.tar.gz", "has_sig": false, "md5_digest": "3f5739ff19ba206d2f8f397f62b29f00", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 232963, "upload_time": "2013-10-29T12:31:02", "url": "https://files.pythonhosted.org/packages/2b/4d/0d77439adca0c67bb8cf8ecf21036a2ddecaa4eed15657044a17b57a308a/disco-dop-0.4.tar.gz" } ], "0.5": [ { "comment_text": "", "digests": { "md5": "e173d3b6f87c8ccd6fe70ea675a749d6", "sha256": "a44ae189a2ecce3010258e50228f2f8bbd85d8ce95b184839bd96a21b94344ea" }, "downloads": -1, "filename": "disco-dop-0.5.tar.gz", "has_sig": false, "md5_digest": "e173d3b6f87c8ccd6fe70ea675a749d6", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 1891124, "upload_time": "2016-12-16T13:31:23", "url": 
"https://files.pythonhosted.org/packages/19/86/870787deb7e3113e8728c6f6d6e301408c0d297a58f0a00761cb18b5bfa6/disco-dop-0.5.tar.gz" } ], "0.5.1": [ { "comment_text": "", "digests": { "md5": "32a6b943c10076c5662bac19f41a09d9", "sha256": "8b750dc65e96e03bea2d329d16c67720b62ec339bbde8488fef3611d53ea36d0" }, "downloads": -1, "filename": "disco-dop-0.5.1.tar.gz", "has_sig": false, "md5_digest": "32a6b943c10076c5662bac19f41a09d9", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 1804762, "upload_time": "2016-12-23T15:30:54", "url": "https://files.pythonhosted.org/packages/71/0e/efc5cdc7d7bd0315eaa086d1733b590c36e2a2cadf014863426fd260ef7b/disco-dop-0.5.1.tar.gz" } ], "0.5.2": [ { "comment_text": "", "digests": { "md5": "84959d9965fad78f0442f813a318dc32", "sha256": "8c44d9014b536e6f51416ba5d93540d1acd570b024e606d8d78e4f272ad96043" }, "downloads": -1, "filename": "disco-dop-0.5.2.tar.gz", "has_sig": false, "md5_digest": "84959d9965fad78f0442f813a318dc32", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 1672042, "upload_time": "2017-03-07T09:48:47", "url": "https://files.pythonhosted.org/packages/d5/0f/c7e6849af5f1619e563f0bfd735310bb3b1f07e853774382f34af5cb50bb/disco-dop-0.5.2.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "84959d9965fad78f0442f813a318dc32", "sha256": "8c44d9014b536e6f51416ba5d93540d1acd570b024e606d8d78e4f272ad96043" }, "downloads": -1, "filename": "disco-dop-0.5.2.tar.gz", "has_sig": false, "md5_digest": "84959d9965fad78f0442f813a318dc32", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 1672042, "upload_time": "2017-03-07T09:48:47", "url": "https://files.pythonhosted.org/packages/d5/0f/c7e6849af5f1619e563f0bfd735310bb3b1f07e853774382f34af5cb50bb/disco-dop-0.5.2.tar.gz" } ] }