{
    "info": {
        "author": "Markus Konrad",
        "author_email": "markus.konrad@wzb.eu",
        "bugtrack_url": null,
        "classifiers": [
            "Development Status :: 4 - Beta",
            "Intended Audience :: Developers",
            "Intended Audience :: Science/Research",
            "License :: OSI Approved :: Apache Software License",
            "Operating System :: OS Independent",
            "Programming Language :: Python",
            "Programming Language :: Python :: 3",
            "Programming Language :: Python :: 3.4",
            "Programming Language :: Python :: 3.5",
            "Programming Language :: Python :: 3.6",
            "Programming Language :: Python :: 3.7",
            "Topic :: Scientific/Engineering :: Information Analysis",
            "Topic :: Software Development :: Libraries :: Python Modules",
            "Topic :: Utilities"
        ],
        "description": "# GermaLemma\n\nJanuary 2019, Markus Konrad <markus.konrad@wzb.eu> / [Berlin Social Science Center](https://www.wzb.eu/en)\n\n## A lemmatizer for German language text\n\nGermalemma lemmatizes Part-of-Speech-tagged German language words. To do so, it combines a large lemma dictionary (an excerpt of the [TIGER corpus from the University of Stuttgart](http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/tiger.html)), functions from the CLiPS \"Pattern\" package, and an algorithm to split composita.\n\n## Installation\n\n### Easy option: Installing from PyPI via `pip`\n\nYou can install the package from [PyPI](https://pypi.org/project/germalemma/) via `pip`:\n\n```\npip install -U germalemma\n```\n\n### Downloading and installing from source\n\nIn order to use GermaLemma, you will need to install some additional packages (see *Requirements* section below) and then download the [TIGER corpus from the University of Stuttgart](http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/tiger.html). You will need to use the CONLL09 format, *not* the XML format.\nThe corpus is free to use for non-commercial purposes (see [License Agreement](http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/TIGERCorpus/license/htmlicense.html)).\n\nThen, you should convert the corpus into pickle format for faster loading by executing *germalemma.py* and passing the path to the corpus file in CONLL09 format:\n\n```\npython germalemma.py tiger_release_[...].conll09\n```\n\nThis will place a `lemmata.pickle` file in the `data` directory which is then automatically loaded.\n\n## Part-of-Speech (POS) Tagging\n\nYou will need to apply [Part-of-Speech (POS) tagging](https://en.wikipedia.org/wiki/Part-of-speech_tagging) to your text before you can lemmatize its words. See [this blog post](https://datascience.blog.wzb.eu/2016/07/13/accurate-part-of-speech-tagging-of-german-texts-with-nltk/) on how to do that.\n\n## Usage\n\nYou have set up GermaLemma to use the TIGER corpus (as explained above). You have tokenized your text (e.g. with NLTK). You have POS-tagged your tokens. Now you can use GermaLemma:\n\n```python\nfrom germalemma import GermaLemma\n\nlemmatizer = GermaLemma()\n\n# passing the word and the POS tag (\"N\" for noun)\nlemma = lemmatizer.find_lemma('Feinstaubbelastungen', 'N')\nprint(lemma)\n# -> lemma is \"Feinstaubbelastung\"\n```\n\n## Valid POS tags\n\nYou can pass POS tags from the [STTS tagset](http://www.ims.uni-stuttgart.de/forschung/ressourcen/lexika/TagSets/stts-table.html), however, only four POS tags can be processed:\n\n* 'N...' (nouns)\n* 'V...' (verbs)\n* 'ADJ...' (adjectives)\n* 'ADV...' (adverbs)\n\n**All other POS tags will result in a `ValueError` so you should wrap the call to `find_lemma` in a *try-except block*.**\n\n## Accuracy\n\nGermaLemma's accuracy was evaluated using a sample of 696 POS tagged and manually lemmatized words from a sample of paragraphs from proceedings of the European Parliament, Goethe's \"Werther\", Kafka's \"Verwandlung\" and a news article from the website of the WZB (see samples in folder \"eval_texts\").\n\n**Under the assumption that the POS tag is correct** (only those words were selected), GermaLemma finds the correct lemma in 99.43% of the cases. For comparison, *Pattern* achieved 95.11% for the same sample.\n\n## Requirements\n\n* Python 3.x (Python 2 is not supported any more!)\n* required package [*Pyphen*](http://pyphen.org/)\n* optional package [*Pattern*](http://www.clips.ua.ac.be/pattern) (This package is optional but highly recommended as it boosts the lemmatizer's accuracy.)\n\n## License\n\nApache License 2.0. See *LICENSE* file.\n\nThe TIGER corpus is **not** part of this repository and has to be downloaded separately under separate license conditions.\n\n\n",
        "description_content_type": "text/markdown",
        "docs_url": null,
        "download_url": "",
        "downloads": {
            "last_day": -1,
            "last_month": -1,
            "last_week": -1
        },
        "home_page": "https://github.com/WZBSocialScienceCenter/germalemma",
        "keywords": "text lemmatization normalization textmining textanalysis mining preprocessing",
        "license": "Apache 2.0",
        "maintainer": "",
        "maintainer_email": "",
        "name": "germalemma",
        "package_url": "https://pypi.org/project/germalemma/",
        "platform": "",
        "project_url": "https://pypi.org/project/germalemma/",
        "project_urls": {
            "Homepage": "https://github.com/WZBSocialScienceCenter/germalemma",
            "Source": "https://github.com/WZBSocialScienceCenter/germalemma",
            "Tracker": "https://github.com/WZBSocialScienceCenter/germalemma/issues"
        },
        "release_url": "https://pypi.org/project/germalemma/0.1.1/",
        "requires_dist": [
            "Pattern (>=3.6)",
            "Pyphen (>=0.9.5)"
        ],
        "requires_python": ">=3.4",
        "summary": "A lemmatizer for German language text.",
        "version": "0.1.1"
    },
    "last_serial": 4759459,
    "releases": {
        "0.1.0": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "65aa473976fefa70d9af8377b4fb89ae",
                    "sha256": "46d5dd5ad9c82d8790c94e166be7e5ba49a70835072458f4bd96991774981a37"
                },
                "downloads": -1,
                "filename": "germalemma-0.1.0-py3-none-any.whl",
                "has_sig": false,
                "md5_digest": "65aa473976fefa70d9af8377b4fb89ae",
                "packagetype": "bdist_wheel",
                "python_version": "py3",
                "requires_python": ">=3.4",
                "size": 2249676,
                "upload_time": "2019-01-30T11:47:51",
                "url": "https://files.pythonhosted.org/packages/72/8d/c25a41774eb0102d6913f4e268fe75d1797097326435807ab730134d2c91/germalemma-0.1.0-py3-none-any.whl"
            },
            {
                "comment_text": "",
                "digests": {
                    "md5": "4d64b41b26472f5023c1571fe5389388",
                    "sha256": "4ead9b29c31fc46f304479396854cf348caf5898da68c06366c9d8c8fb25dd51"
                },
                "downloads": -1,
                "filename": "germalemma-0.1.0.tar.gz",
                "has_sig": false,
                "md5_digest": "4d64b41b26472f5023c1571fe5389388",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": ">=3.4",
                "size": 2249744,
                "upload_time": "2019-01-30T11:47:55",
                "url": "https://files.pythonhosted.org/packages/b4/eb/1f96b4f785497b55e44adf3e0b9de42952185652212566d6b451c1f0edd2/germalemma-0.1.0.tar.gz"
            }
        ],
        "0.1.1": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "11a3b3c9018504ef57c884c8a1e8aac3",
                    "sha256": "b376209b09eba5657feb47a8d4e90bdff5bec21dda96c2634f46b612dc5a5424"
                },
                "downloads": -1,
                "filename": "germalemma-0.1.1-py3-none-any.whl",
                "has_sig": false,
                "md5_digest": "11a3b3c9018504ef57c884c8a1e8aac3",
                "packagetype": "bdist_wheel",
                "python_version": "py3",
                "requires_python": ">=3.4",
                "size": 2249721,
                "upload_time": "2019-01-30T11:59:47",
                "url": "https://files.pythonhosted.org/packages/ff/f9/9fb28336e480b0e3744a8633813f9e1bc3f49a4eb3d7f6ad23e923a5a5b4/germalemma-0.1.1-py3-none-any.whl"
            },
            {
                "comment_text": "",
                "digests": {
                    "md5": "51ab3910c534a6b00d9ee74b9301bf22",
                    "sha256": "e3a745ec169004a3d8b537b7db2391887fa98e6ecf014536d8b96fe12b08e206"
                },
                "downloads": -1,
                "filename": "germalemma-0.1.1.tar.gz",
                "has_sig": false,
                "md5_digest": "51ab3910c534a6b00d9ee74b9301bf22",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": ">=3.4",
                "size": 2249870,
                "upload_time": "2019-01-30T11:59:50",
                "url": "https://files.pythonhosted.org/packages/93/07/f8329dbd98a961b10d22f479be81a46774e12d0eff859a33f98f65100bdc/germalemma-0.1.1.tar.gz"
            }
        ]
    },
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "11a3b3c9018504ef57c884c8a1e8aac3",
                "sha256": "b376209b09eba5657feb47a8d4e90bdff5bec21dda96c2634f46b612dc5a5424"
            },
            "downloads": -1,
            "filename": "germalemma-0.1.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "11a3b3c9018504ef57c884c8a1e8aac3",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.4",
            "size": 2249721,
            "upload_time": "2019-01-30T11:59:47",
            "url": "https://files.pythonhosted.org/packages/ff/f9/9fb28336e480b0e3744a8633813f9e1bc3f49a4eb3d7f6ad23e923a5a5b4/germalemma-0.1.1-py3-none-any.whl"
        },
        {
            "comment_text": "",
            "digests": {
                "md5": "51ab3910c534a6b00d9ee74b9301bf22",
                "sha256": "e3a745ec169004a3d8b537b7db2391887fa98e6ecf014536d8b96fe12b08e206"
            },
            "downloads": -1,
            "filename": "germalemma-0.1.1.tar.gz",
            "has_sig": false,
            "md5_digest": "51ab3910c534a6b00d9ee74b9301bf22",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.4",
            "size": 2249870,
            "upload_time": "2019-01-30T11:59:50",
            "url": "https://files.pythonhosted.org/packages/93/07/f8329dbd98a961b10d22f479be81a46774e12d0eff859a33f98f65100bdc/germalemma-0.1.1.tar.gz"
        }
    ]
}