{ "info": { "author": "Thomas Proisl", "author_email": "thopro@posteo.de", "bugtrack_url": null, "classifiers": [ "Development Status :: 5 - Production/Stable", "Environment :: Console", "License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)", "Operating System :: OS Independent", "Programming Language :: Python :: 2", "Programming Language :: Python :: 3", "Topic :: Scientific/Engineering :: Artificial Intelligence", "Topic :: Text Processing :: Linguistic" ], "description": "=======\nUsurper\n=======\n\nIntroduction\n============\n\nThis is an implementation of the unsupervised dependency parser\ndescribed by S\u00f8gaard (2012). The parser is language independent and\ndoes not need any training data.\n\nThe parser operates in two stages. First, it constructs a directed\ngraph from the words in a sentence using\n\n- information on word adjacency,\n\n- an (automatically created) list of function words [1]_,\n\n- morphological cues\n\n- and information from part-of-speech tags, if available [2]_.\n\nThe resulting graph structure is used to rank the words using the\nPageRank algorithm (Brin and Page, 1998). In the second stage, the\nparser constructs a dependency tree from that ranked list of words. If\npart-of-speech information is available, the parser can make use of\nuniversal dependency rules (Naseem et al., 2010).\n\n.. [1] The list of function words is extracted from the whole input\n text by applying a variant of Mihalcea and Tarau's (2004)\n TextRank algorithm.\n.. [2] The parser relies on a universal part-of-speech tagset (Petrov\n et al., 2012). The language-dependent input tags are mapped to\n that universal tagset using the mappings provided `here\n `_.\n\nInstallation\n============\n\nUsurper can be easily installed using pip::\n\n pip install Usurper\n\nUsage\n=====\n\nUsing the usrpr executable\n--------------------------\n\nYou can use the parser as a standalone program from the command\nline. Your input text has to be either in `CoNLL-X format\n`_ or in a simple format with one token per\nline and an empty line between sentences. If your data is\npart-of-speech tagged, the tags should be separated from the tokens by\na tab::\n\n Many\tJJ\n people\tNNS\n need\tVBP\n our\tPRP$\n help\tNN\n .\t.\n \n Please\tUH\n continue\tVB\n our\tPRP$\n important\tJJ\n partnership\tNN\n .\t.\n\nGeneral usage information, including a list of supported\npart-of-speech tagsets, is available via the ``-h`` option::\n\n usrpr -h\n\nIf you want to use the full parser, i.e. you have part-of-speech\ntagged input data and you want to use the universal dependency rules,\nyou can invoke the parser like this::\n\n usrpr -t [--conll] \n\nIf you do not want to use the universal dependency rules, you can use\nthe ``--no-rules`` option::\n\n usrpr --no-rules -t [--conll] \n\nIf your data is untagged or you want to ignore the tags, simply omit\nthe ``-t`` option (in that case it is not possible to make use of the\nuniversal dependency rules)::\n\n usrpr [--conll] \n\nNote that the parser tries to automatically identify function\nwords. If your input file is too small, that cannot be done reliably\nand might have an impact on parser performance.\n\nUsing the module\n----------------\n\nYou can easily incorporate the parser into your own Python\nprojects. All you have to do is import ``usurper.soegaard``::\n\n from usurper import soegaard\n \n parse = soegaard.parse_sentence(tokens, function_words, no_rules, tags, tagset)\n\nThe ``parse_sentence`` function returns a `networkx\n`_ ``DiGraph`` object. You can convert it\ninto a nested list representation using the ``export_to_conll_format``\nfunction in ``usurper.utils.conll``.\n\nThe function's docstring gives more detailed information about the\narguments it takes::\n\n parse_sentence(tokens, function_words, no_rules, tags=[], tagset=None)\n Parse sentence using the algorithm by S\u00f8gaard (2012).\n \n Args:\n tokens: list of tokens\n function_words: set of function words\n no_rules: boolean; true if universal dependency rules should\n not be used\n tags: list of tags, if available; the nth element of tags\n should be the part-of-speech tag associated with the nth\n element of tokens\n tagset: string identifying one of the supported tagsets\n \n Returns:\n A networkx DiGraph representing the dependency structure.\n\nEvaluation\n==========\n\nHere is a table giving unlabeled attachment scores (ignoring\npunctuation) for a couple of languages. Test data for most of the\nlanguages is available from the `CoNLL-X Shared Task website\n`_. Performance for\nEnglish was evaluated on section 23 of the Penn Treebank.\n\n========== ======= ======== ===========\nLanguage no tags no rules full parser\n========== ======= ======== ===========\nDanish 30.04 37.66 38.20\nEnglish\t 20.41 40.74 40.94\nGerman\t 18.59 33.93 39.24\nPortuguese 19.86 44.86 44.50\nSlovene 19.70 31.41 31.39\nSwedish 20.75 44.69 49.21\n========== ======= ======== ===========\n\nReferences\n==========\n\n- Brin, Sergey, Lawrence Page (1998): \u201cThe anatomy of a large-scale\n hypertextual web search engine.\u201d In: Computer Networks and ISDN\n Systems 30/1\u20137, 107\u2013117. `PDF\n `__.\n- Mihalcea, Rada, Paul Tarau (2004): \u201cTextRank: Bringing order into\n text.\u201d In: Proceedings of the 2004 Conference on Empirical Methods\n in Natural Language Processing (EMNLP'04). ACL, 404\u2013411. `PDF\n `__.\n- Naseem, Tahira, Harr Chen, Regina Barzilay, Mark Johnson (2010):\n \u201cUsing universal linguistic knowledge to guide grammar induction.\u201d\n In: Proceedings of the 2010 Conference on Empirical Methods in\n Natural Language Processing (EMNLP'10). ACL, 1234\u20131244. `PDF\n `__.\n- Petrov, Slav, Dipanjan Das, Ryan McDonald (2012): \u201cA universal\n part-of-speech tagset.\u201d In: Proceedings of the Eighth International\n Conference on Language Resources and Evaluation (LREC'12),\n 2089\u20132096. `PDF\n `__.\n- S\u00f8gaard, Anders (2012): \u201cUnsupervised dependency parsing without\n training.\u201d In: Natural Language Engineering 18/2, 187\u2013203. `Link\n `_.", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "http://pypi.python.org/pypi/Usurper/", "keywords": null, "license": "GNU General Public License v3 or later (GPLv3+)", "maintainer": null, "maintainer_email": null, "name": "Usurper", "package_url": "https://pypi.org/project/Usurper/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/Usurper/", "project_urls": { "Download": "UNKNOWN", "Homepage": "http://pypi.python.org/pypi/Usurper/" }, "release_url": "https://pypi.org/project/Usurper/0.9.1/", "requires_dist": null, "requires_python": null, "summary": "An unsupervised dependency parser.", "version": "0.9.1" }, "last_serial": 1221850, "releases": { "0.9.0": [ { "comment_text": "", "digests": { "md5": "b3ba018909a2fe15ab572b26b08cbf93", "sha256": "10b87889baf263b8fe8220b34e3cae7783b17431fff5162291c33baa07bff0ca" }, "downloads": -1, "filename": "Usurper-0.9.0.tar.gz", "has_sig": false, "md5_digest": "b3ba018909a2fe15ab572b26b08cbf93", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 19256, "upload_time": "2014-09-12T10:25:06", "url": "https://files.pythonhosted.org/packages/6b/13/bde8a1b8c0613f2156652797b23f64ba4707f77b66130d534bd8494c343d/Usurper-0.9.0.tar.gz" } ], "0.9.0a9": [ { "comment_text": "", "digests": { "md5": "ec69c37c8e0abe8b7c901e201882432f", "sha256": "07242c88437adcb736df2c455bbbe4d55c20551f89aad0eb40ac95cfa0c607e2" }, "downloads": -1, "filename": "Usurper-0.9.0a9.tar.gz", "has_sig": false, "md5_digest": "ec69c37c8e0abe8b7c901e201882432f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 19290, "upload_time": "2014-09-12T10:04:41", "url": "https://files.pythonhosted.org/packages/96/1c/059bcd1b9516b3d6c5e0fe544f9bdc9a657c05db6a3c9896d440f7da5f70/Usurper-0.9.0a9.tar.gz" } ], "0.9.1": [ { "comment_text": "", "digests": { "md5": "e471b584e9b9dfa106789991bea40fa0", "sha256": "76a3298a35641f8ff184e2d3fd54f4a0fd6f90bac1d227c0db0b84f0e6c3457d" }, "downloads": -1, "filename": "Usurper-0.9.1.tar.gz", "has_sig": false, "md5_digest": "e471b584e9b9dfa106789991bea40fa0", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 19377, "upload_time": "2014-09-12T10:59:13", "url": "https://files.pythonhosted.org/packages/33/c5/692c781457eb6a97b22f79f39dee2069e20fa8404f6d051c394650fdbb50/Usurper-0.9.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "e471b584e9b9dfa106789991bea40fa0", "sha256": "76a3298a35641f8ff184e2d3fd54f4a0fd6f90bac1d227c0db0b84f0e6c3457d" }, "downloads": -1, "filename": "Usurper-0.9.1.tar.gz", "has_sig": false, "md5_digest": "e471b584e9b9dfa106789991bea40fa0", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 19377, "upload_time": "2014-09-12T10:59:13", "url": "https://files.pythonhosted.org/packages/33/c5/692c781457eb6a97b22f79f39dee2069e20fa8404f6d051c394650fdbb50/Usurper-0.9.1.tar.gz" } ] }