{ "info": { "author": "Wladimir Sidorenko (Uladzimir Sidarenka)", "author_email": "sidarenk@uni-potsdam.de", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Environment :: Console", "Intended Audience :: Science/Research", "License :: OSI Approved :: MIT License", "Natural Language :: German", "Operating System :: MacOS", "Operating System :: Unix", "Programming Language :: Python :: 2", "Programming Language :: Python :: 2.6", "Programming Language :: Python :: 2.7", "Topic :: Text Processing :: Linguistic" ], "description": "===================\nDiscourse Segmenter\n===================\n\n.. image:: https://travis-ci.org/WladimirSidorenko/DiscourseSegmenter.svg?branch=master\n :alt: Build Status\n :align: right\n :target: https://travis-ci.org/WladimirSidorenko/DiscourseSegmenter\n\n.. image:: https://img.shields.io/badge/license-MIT-blue.svg\n :alt: MIT License\n :align: right\n :target: http://opensource.org/licenses/MIT\n\nA collection of various discourse segmenters (with pre-trained models for German texts).\n\n\nDescription\n===========\n\nThis Python module currently comprises three discourse segmenters:\n**edseg**, **bparseg**, and **mateseg**.\n\n**edseg**\n is a rule-based system that uses shallow discourse-oriented\n parsing to determine the boundaries of elementary discourse units.\n The rules are hard-coded in the `submodule's file`_ and are\n only applicable to German input.\n\n**bparseg**\n is an ML-based segmentation module that operates on\n syntactic constituency trees (output from BitPar_) and decides\n whether a syntactic constituent initiates a discourse segment or not\n using a pre-trained linear SVM model. This model was trained on the\n German PCC_ corpus, but you can also train your own classifier for any\n language using your own training data (cf. 
``discourse_segmenter\n --help`` for further instructions on how to do that).\n\n**mateseg**\n is another ML-based segmentation module that operates on dependency\n trees (output from MateParser_) and decides whether a sub-structure\n of the dependency graph initiates a discourse segment or not using\n a pre-trained linear SVM model. Again, this model was trained on\n the German PCC_ corpus.\n\n\nInstallation\n============\n\nTo install this package from the PyPI index, run\n\n.. code-block:: shell\n\n pip install dsegmenter\n\nAlternatively, you can also install it directly from the source\nrepository by executing:\n\n.. code-block:: shell\n\n git clone git@github.com:discourse-lab/DiscourseSegmenter.git\n pip install -r DiscourseSegmenter/requirements.txt DiscourseSegmenter/ --user\n\n\nUsage\n=====\n\nAfter installation, you can import the module in your Python scripts\n(see an example here_), e.g.:\n\n.. code-block:: python\n\n from dsegmenter.bparseg import BparSegmenter\n\n segmenter = BparSegmenter()\n\nor, alternatively, also use the delivered front-end script\n`discourse_segmenter` to process your parsed input data, e.g.:\n\n.. code-block:: shell\n\n discourse_segmenter bparseg segment DiscourseSegmenter/examples/bpar/maz-8727.exb.bpar\n\nor\n\n.. code-block:: shell\n\n discourse_segmenter mateseg segment DiscourseSegmenter/examples/conll/maz-8727.parsed.conll\n\nNote that this script requires two mandatory arguments: the type of\nthe segmenter to use (`bparseg` or `mateseg` in the above cases) and the\noperation to perform (which might be specific to each segmenter).\n\n\nEvaluation\n==========\n\nIntrinsic evaluation scores of the machine learning models on the\npredicted vectors will be printed when training and evaluating a\nsegmentation model.\n\nExtrinsic evaluation scores on the predicted segmentation trees can be\ncalculated with the evaluation script.\n\n.. 
code-block:: shell\n\n evaluation {FOLDER:TRUE} {FOLDER:PRED}\n\nNote that the script internally calls the `DKpro agreement library`_,\nwhich requires Java 8.\n\n\n\n.. _`Bitpar`: http://www.cis.uni-muenchen.de/~schmid/tools/BitPar/\n.. _`MateParser`: http://code.google.com/p/mate-tools/\n.. _`PCC`: http://www.lrec-conf.org/proceedings/lrec2014/pdf/579_Paper.pdf\n.. _`here`: https://github.com/discourse-lab/DiscourseSegmenter/blob/master/scripts/discourse_segmenter\n.. _`submodule's file`: https://github.com/discourse-lab/DiscourseSegmenter/blob/master/dsegmenter/edseg/clause_segmentation.py\n.. _`DKpro agreement library`: https://dkpro.github.io/dkpro-statistics/", "description_content_type": null, "docs_url": "https://pythonhosted.org/dsegmenter/", "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/discourse-lab/DiscourseSegmenter", "keywords": "discourse segmentation NLP linguistics", "license": "MIT", "maintainer": null, "maintainer_email": null, "name": "dsegmenter", "package_url": "https://pypi.org/project/dsegmenter/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/dsegmenter/", "project_urls": { "Download": "UNKNOWN", "Homepage": "https://github.com/discourse-lab/DiscourseSegmenter" }, "release_url": "https://pypi.org/project/dsegmenter/0.2.12/", "requires_dist": null, "requires_python": null, "summary": "Collection of discourse segmenters (with pre-trained models for German)", "version": "0.2.12" }, "last_serial": 2597925, "releases": { "0.0.1.dev1": [ { "comment_text": "built for Linux-3.19.0-42-generic-x86_64-with-glibc2.4", "digests": { "md5": "d633d5ecc7fc4a5d89b841cba12f70dc", "sha256": "f7ae03bb8bdde990f227343bbba54f56c2784deb89f25ee8a3a8ede43098bafd" }, "downloads": -1, "filename": "dsegmenter-0.0.1.dev1.linux-x86_64.tar.gz", "has_sig": false, "md5_digest": "d633d5ecc7fc4a5d89b841cba12f70dc", "packagetype": "bdist_dumb", "python_version": "any", 
"requires_python": null, "size": 2241249, "upload_time": "2015-12-28T12:16:48", "url": "https://files.pythonhosted.org/packages/81/d8/dda4c97034fce5863fd5e63cd2427276a6eaa5816aedfde47e19e2dd93e3/dsegmenter-0.0.1.dev1.linux-x86_64.tar.gz" }, { "comment_text": "", "digests": { "md5": "6e5ef5b5c4420d38926d1eda41bbb2fa", "sha256": "b72e7219bc5e9734b3e33d6911d79d69c12c970d134a3e76cd08573476d07e3c" }, "downloads": -1, "filename": "dsegmenter-0.0.1.dev1.tar.gz", "has_sig": false, "md5_digest": "6e5ef5b5c4420d38926d1eda41bbb2fa", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 2198607, "upload_time": "2015-12-28T12:16:28", "url": "https://files.pythonhosted.org/packages/53/9a/025740fe2dce80a9a85e13f62d36f19e17af01eddb21dd10f2b9bf693c07/dsegmenter-0.0.1.dev1.tar.gz" } ], "0.2.0": [ { "comment_text": "", "digests": { "md5": "dfb1496b8f6828697dbb80833d2c18a8", "sha256": "4b6b3c25a6cbf7ac7441cb97c1fbc57b40b3c70740e2cd6190ebe2b42947c8ae" }, "downloads": -1, "filename": "dsegmenter-0.2.0-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "dfb1496b8f6828697dbb80833d2c18a8", "packagetype": "bdist_wheel", "python_version": "2.7", "requires_python": null, "size": 3852204, "upload_time": "2017-01-06T17:05:53", "url": "https://files.pythonhosted.org/packages/0c/69/d3d9b6016022490c3c08b88cf33c7fe2f6366e3fb7f1b0819324310bf978/dsegmenter-0.2.0-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "5f2bd4f329630b931d9bb0707a6bc7f4", "sha256": "9717a1bb3001b2c7d0f72a193528a75e7d4af22aa9d7995606f66c93b57cd8dd" }, "downloads": -1, "filename": "dsegmenter-0.2.0.tar.gz", "has_sig": false, "md5_digest": "5f2bd4f329630b931d9bb0707a6bc7f4", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3809478, "upload_time": "2017-01-06T17:09:11", "url": "https://files.pythonhosted.org/packages/29/7a/a697ec034bc94c242d31999984fccebef8ff3eb9ed35b816ff02b0c5bcfa/dsegmenter-0.2.0.tar.gz" } ], "0.2.1": [ { "comment_text": 
"", "digests": { "md5": "e52a3ab1309a24093ccee07e7b5b361d", "sha256": "eedcbdab3c8222211f98b5c8e265eeecba259abe327116d00e5c103669e3475a" }, "downloads": -1, "filename": "dsegmenter-0.2.1-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "e52a3ab1309a24093ccee07e7b5b361d", "packagetype": "bdist_wheel", "python_version": "2.7", "requires_python": null, "size": 3851427, "upload_time": "2017-01-06T18:08:37", "url": "https://files.pythonhosted.org/packages/21/a2/b3b04e9ecdb3859fe222a0368871d9f3329cd4f356a8a7cbf6e14d63fcae/dsegmenter-0.2.1-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "2baedb47fd1b9460d718517a3b4f51fa", "sha256": "849becebe05dac06c4601b527ffd1591ab048c8f533183d405f65c85a92d5462" }, "downloads": -1, "filename": "dsegmenter-0.2.1.tar.gz", "has_sig": false, "md5_digest": "2baedb47fd1b9460d718517a3b4f51fa", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3808913, "upload_time": "2017-01-06T18:10:19", "url": "https://files.pythonhosted.org/packages/c2/7a/fd2270c95200d290537a2f9ec4b7975714f89f8c1258045d3935a3a489ad/dsegmenter-0.2.1.tar.gz" } ], "0.2.12": [ { "comment_text": "", "digests": { "md5": "a48541e81b5e091c93a498acca27e3dd", "sha256": "da26188cf1e11af9127bb73fd2138d4002d41c588d2ee62e94734d8ba5fad200" }, "downloads": -1, "filename": "dsegmenter-0.2.12-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "a48541e81b5e091c93a498acca27e3dd", "packagetype": "bdist_wheel", "python_version": "2.7", "requires_python": null, "size": 3851458, "upload_time": "2017-01-25T17:14:53", "url": "https://files.pythonhosted.org/packages/f9/f8/8870a3621b9c4720e6c7feeec400ce62a69a2c0fc9e88a40af3a41ecd23a/dsegmenter-0.2.12-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "1e9967b4c45b038b6f57bf800694c901", "sha256": "29dacc4b78fa26e44be69322f3645517a3bc545dc63193d4e165d8eaa2bc009e" }, "downloads": -1, "filename": "dsegmenter-0.2.12.tar.gz", "has_sig": false, "md5_digest": 
"1e9967b4c45b038b6f57bf800694c901", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3808941, "upload_time": "2017-01-25T17:13:51", "url": "https://files.pythonhosted.org/packages/23/df/70ff5b0cfb00e1e161fbd6a1c277005ec052f748ef148751b22dfe1958f6/dsegmenter-0.2.12.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "a48541e81b5e091c93a498acca27e3dd", "sha256": "da26188cf1e11af9127bb73fd2138d4002d41c588d2ee62e94734d8ba5fad200" }, "downloads": -1, "filename": "dsegmenter-0.2.12-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "a48541e81b5e091c93a498acca27e3dd", "packagetype": "bdist_wheel", "python_version": "2.7", "requires_python": null, "size": 3851458, "upload_time": "2017-01-25T17:14:53", "url": "https://files.pythonhosted.org/packages/f9/f8/8870a3621b9c4720e6c7feeec400ce62a69a2c0fc9e88a40af3a41ecd23a/dsegmenter-0.2.12-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "1e9967b4c45b038b6f57bf800694c901", "sha256": "29dacc4b78fa26e44be69322f3645517a3bc545dc63193d4e165d8eaa2bc009e" }, "downloads": -1, "filename": "dsegmenter-0.2.12.tar.gz", "has_sig": false, "md5_digest": "1e9967b4c45b038b6f57bf800694c901", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3808941, "upload_time": "2017-01-25T17:13:51", "url": "https://files.pythonhosted.org/packages/23/df/70ff5b0cfb00e1e161fbd6a1c277005ec052f748ef148751b22dfe1958f6/dsegmenter-0.2.12.tar.gz" } ] }