{ "info": { "author": "Mikhail Korobov", "author_email": "kmike84@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Developers", "Intended Audience :: Science/Research", "License :: OSI Approved :: MIT License", "Natural Language :: Russian", "Programming Language :: Python", "Programming Language :: Python :: 2", "Programming Language :: Python :: 2.6", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.2", "Programming Language :: Python :: 3.3", "Programming Language :: Python :: Implementation :: CPython", "Programming Language :: Python :: Implementation :: PyPy", "Topic :: Scientific/Engineering :: Information Analysis", "Topic :: Software Development :: Libraries :: Python Modules", "Topic :: Text Processing :: Linguistic" ], "description": "================\nruscorpora-tools\n================\n\nThis package provides a Python interface to a free corpus subset\navailable at http://ruscorpora.ru.\n\nInstallation\n============\n\n::\n\n pip install ruscorpora-tools\n\nUsage\n=====\n\nCorpus downloading\n------------------\n\nDownload and unpack the archive with XML files from\nhttp://www.ruscorpora.ru/corpora-usage.html\n\nCorpus reading\n--------------\n\nThe ``ruscorpora.parse_xml`` function parses a single XML file and returns\nan iterator over sentences; each sentence is a list of ``ruscorpora.Token``\ninstances, annotated with a list of ``ruscorpora.Annotation`` instances.\n\n``ruscorpora.simplify`` simplifies the result of ``ruscorpora.parse_xml`` by\nremoving ambiguous annotations, joining split tokens (along with their\nannotations), and removing accent information.\n\n::\n\n >>> import ruscorpora as rnc\n >>> for sent in rnc.simplify(rnc.parse_xml('fiction.xml')):\n ... 
print(sent)\n\nWorking with tags\n-----------------\n\nThe ``ruscorpora.Tag`` class is a convenient wrapper for tags used in\nruscorpora::\n\n >>> tag = rnc.Tag('S,f,inan=sg,nom')\n >>> tag.POS\n 'S'\n >>> tag.gender\n 'f'\n >>> tag.animacy\n 'inan'\n >>> tag.number\n 'sg'\n >>> tag.case\n 'nom'\n >>> tag.tense is None\n True\n\n(Other attributes are also available.)\n\nCheck whether a grammeme is in a tag::\n\n >>> 'S' in tag\n True\n >>> 'V' in tag\n False\n >>> 'Foo' in tag\n Traceback (most recent call last):\n ...\n ValueError: Grammeme is unknown: Foo\n\nTest tag equality::\n\n >>> tag == rnc.Tag('S,f,inan=sg,nom')\n True\n >>> tag == 'S,f,inan=sg,nom'\n True\n >>> tag == rnc.Tag('S,f,inan=sg,acc')\n False\n >>> tag == 'S,f,inan=sg,acc'\n False\n >>> tag == 'Foo,inan'\n Traceback (most recent call last):\n ...\n ValueError: Unknown grammemes: frozenset({Foo})\n\nTags returned by ``rnc.simplify`` are wrapped with this class by default.\n\nDevelopment\n===========\n\nDevelopment happens on GitHub and Bitbucket:\n\n* https://github.com/kmike/ruscorpora-tools\n* https://bitbucket.org/kmike/ruscorpora-tools\n\nThe issue tracker is on GitHub: https://github.com/kmike/ruscorpora-tools/issues\n\nFeel free to submit ideas, bugs, pull requests (git or hg) or regular patches.\n\nRunning tests\n-------------\n\nMake sure tox is installed and run\n\n::\n\n $ tox\n\nfrom the source checkout. 
Tests should pass under python 2.6..3.3\nand pypy > 1.8.", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/kmike/ruscorpora-tools/", "keywords": null, "license": "MIT license", "maintainer": null, "maintainer_email": null, "name": "ruscorpora-tools", "package_url": "https://pypi.org/project/ruscorpora-tools/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/ruscorpora-tools/", "project_urls": { "Download": "UNKNOWN", "Homepage": "https://github.com/kmike/ruscorpora-tools/" }, "release_url": "https://pypi.org/project/ruscorpora-tools/0.3/", "requires_dist": null, "requires_python": null, "summary": "Python interface to a free corpus subset from ruscorpora.ru", "version": "0.3" }, "last_serial": 655877, "releases": { "0.1": [ { "comment_text": "", "digests": { "md5": "bfb99ec8cdb366ca1a48cf1e6b6099d7", "sha256": "8a43df580ba55d3cc048f2dd1460903c285a1c392acb03fde86cf435cb7a832a" }, "downloads": -1, "filename": "ruscorpora-tools-0.1.tar.gz", "has_sig": false, "md5_digest": "bfb99ec8cdb366ca1a48cf1e6b6099d7", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3141, "upload_time": "2013-01-31T21:28:59", "url": "https://files.pythonhosted.org/packages/92/19/c5b34fa9e659a3c45684e90452359a4287dda92c940496d8a696125cc795/ruscorpora-tools-0.1.tar.gz" } ], "0.2": [ { "comment_text": "", "digests": { "md5": "f604e1f2ad9c825b1464422be289f035", "sha256": "afd4fad984e1b6f6e556a44d54fa9d9f3c77a19a0ee1a497f15eba1cbca2b7e1" }, "downloads": -1, "filename": "ruscorpora-tools-0.2.tar.gz", "has_sig": false, "md5_digest": "f604e1f2ad9c825b1464422be289f035", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 7998, "upload_time": "2013-02-04T20:31:17", "url": 
"https://files.pythonhosted.org/packages/f3/6a/ff647ed1e0a1b3903d7e3982b0721f8d7cfbb9331727e70d025f64b20665/ruscorpora-tools-0.2.tar.gz" } ], "0.3": [ { "comment_text": "", "digests": { "md5": "8480be6c21f3b25d594fec11ea31f58f", "sha256": "50b6c5845e1b7fba7ca71a7ba85376a5b63b5c413ae231434ae2fb5ec8d11936" }, "downloads": -1, "filename": "ruscorpora-tools-0.3.tar.gz", "has_sig": false, "md5_digest": "8480be6c21f3b25d594fec11ea31f58f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8281, "upload_time": "2013-02-14T12:11:55", "url": "https://files.pythonhosted.org/packages/34/22/560bfdc4947453075b72b6deeb81357ad6812f5c0e5818c2c10e928fd6f8/ruscorpora-tools-0.3.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "8480be6c21f3b25d594fec11ea31f58f", "sha256": "50b6c5845e1b7fba7ca71a7ba85376a5b63b5c413ae231434ae2fb5ec8d11936" }, "downloads": -1, "filename": "ruscorpora-tools-0.3.tar.gz", "has_sig": false, "md5_digest": "8480be6c21f3b25d594fec11ea31f58f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8281, "upload_time": "2013-02-14T12:11:55", "url": "https://files.pythonhosted.org/packages/34/22/560bfdc4947453075b72b6deeb81357ad6812f5c0e5818c2c10e928fd6f8/ruscorpora-tools-0.3.tar.gz" } ] }