{ "info": { "author": "Joseph Sefara", "author_email": "sefaratj@gmail.com", "bugtrack_url": null, "classifiers": [ "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Natural Language :: English", "Operating System :: OS Independent", "Programming Language :: Python :: 3", "Programming Language :: Python :: Implementation :: PyPy", "Topic :: Text Processing :: Linguistic" ], "description": "# [TextAugment: Improving short text classification through global augmentation methods](https://arxiv.org/abs/1907.03752)\n\nTextAugment is a Python 3 library for augmenting text in natural language processing applications. TextAugment stands on the giant shoulders of [NLTK](https://www.nltk.org/), [Gensim](https://radimrehurek.com/gensim/), and [TextBlob](https://textblob.readthedocs.io/) and plays nicely with them.\n\n## Citation Paper\n\n**[Improving short text classification through global augmentation methods](https://arxiv.org/abs/1907.03752)**, published at [MLDM 2019](http://mldm.de)\n\n### Requirements\n\n* Python 3\n\nThe following software packages are dependencies and will be installed automatically.\n\n```shell\n$ pip install numpy nltk gensim textblob googletrans\n```\nThe following code downloads the NLTK corpus for [wordnet](http://www.nltk.org/howto/wordnet.html).\n```python\nimport nltk\nnltk.download('wordnet')\n```\nThe following code downloads the [NLTK tokenizer](https://www.nltk.org/_modules/nltk/tokenize/punkt.html). This tokenizer divides a text into a list of sentences using an unsupervised algorithm that builds a model for abbreviations, collocations, and words that start sentences.\n```python\nnltk.download('punkt')\n```\nThe following code downloads the default [NLTK part-of-speech tagger](https://www.nltk.org/_modules/nltk/tag.html) model. 
A part-of-speech tagger processes a sequence of words and attaches a part-of-speech tag to each word.\n```python\nnltk.download('averaged_perceptron_tagger')\n```\nUse Gensim to load a pre-trained word2vec model, such as [Google News from Google Drive](https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit).\n```python\nimport gensim\nmodel = gensim.models.KeyedVectors.load_word2vec_format('./GoogleNews-vectors-negative300.bin', binary=True)\n```\nOr train one from scratch using your own data or one of the following public datasets:\n\n- [Text8 Wiki](http://mattmahoney.net/dc/enwik9.zip)\n\n- [Dataset from \"One Billion Word Language Modeling Benchmark\"](http://www.statmt.org/lm-benchmark/1-billion-word-language-modeling-benchmark-r13output.tar.gz)\n\n### Installation\n\nInstall from pip [Recommended]\n```sh\n$ pip install textaugment\n```\nor install the latest release from GitHub\n```sh\n$ pip install git+git@github.com:dsfsi/textaugment.git\n```\n\nInstall from source\n```sh\n$ git clone git@github.com:dsfsi/textaugment.git\n$ cd textaugment\n$ python setup.py install\n```\n\n### How to use\n\nThree types of augmentation are available:\n\n- word2vec\n```python\nfrom textaugment import Word2vec\n```\n- wordnet\n```python\nfrom textaugment import Wordnet\n```\n- translate (this requires internet access)\n```python\nfrom textaugment import Translate\n```\n#### Word2vec-based augmentation\n**Basic example**\n```python\n>>> from textaugment import Word2vec\n>>> t = Word2vec(model='path/to/gensim/model')  # a path to a saved model, or a loaded gensim model itself\n>>> t.augment('The stories are good')\nThe films are good\n```\n**Advanced example**\n\n```python\n>>> runs = 1 # number of augmentation runs. 1 by default.\n>>> v = False # verbose mode replaces all the words; when enabled, runs has no effect. Used in this paper (https://www.cs.cmu.edu/~diyiy/docs/emnlp_wang_2015.pdf)\n>>> p = 0.5 # the probability of success of an individual trial (0.1<p<1)
>>> t = Word2vec(model='path/to/gensim/model', runs=5, v=False, p=0.5)  # a path to a saved model, or a loaded gensim model itself\n>>> t.augment('The stories are good')\nThe movies are excellent\n```\n#### WordNet-based augmentation\n**Basic example**\n```python\n>>> import nltk\n>>> nltk.download('punkt')\n>>> nltk.download('wordnet')\n>>> from textaugment import Wordnet\n>>> t = Wordnet()\n>>> t.augment('In the afternoon, John is going to town')\nIn the afternoon, John is walking to town\n```\n**Advanced example**\n\n```python\n>>> v = True # enable verb augmentation. True by default.\n>>> n = False # enable noun augmentation. False by default.\n>>> runs = 1 # number of times to augment a sentence. 1 by default.\n>>> p = 0.5 # the probability of success of an individual trial (0.1<p<1)
>>> t = Wordnet(v=False, n=True, p=0.5)\n>>> t.augment('In the afternoon, John is going to town')\nIn the afternoon, Joseph is going to town.\n```\n#### RTT-based augmentation\nRTT (round-trip translation) translates a sentence into a target language and back into the source language.\n**Example**\n```python\n>>> src = \"en\" # source language of the sentence\n>>> to = \"fr\" # target language\n>>> from textaugment import Translate\n>>> t = Translate(src=\"en\", to=\"fr\")\n>>> t.augment('In the afternoon, John is going to town')\nIn the afternoon John goes to town\n```\n## Built with \u2764 on\n* [Python](http://python.org/)\n\n## Authors\n* [Joseph Sefara](https://za.linkedin.com/in/josephsefara) (http://www.speechtech.co.za)\n* [Vukosi Marivate](http://www.vima.co.za)\n\n## Acknowledgements\nCite this [paper](https://arxiv.org/abs/1907.03752) when using this library.\n\n## Licence\nMIT licensed. See the bundled [LICENCE](LICENCE) file for more details.\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/dsfsi/textaugment", "keywords": "text augmentation,python,natural language processing,nlp", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "textaugment", "package_url": "https://pypi.org/project/textaugment/", "platform": "", "project_url": "https://pypi.org/project/textaugment/", "project_urls": { "Homepage": "https://github.com/dsfsi/textaugment" }, "release_url": "https://pypi.org/project/textaugment/1.1/", "requires_dist": [ "nltk", "gensim", "textblob", "numpy", "googletrans" ], "requires_python": "", "summary": "A library for augmenting text for natural language processing applications.", "version": "1.1" }, "last_serial": 5535452, "releases": { "1.0": [ { "comment_text": "", "digests": { "md5": "9b1372cb4cafdf67948b7d0f2330b7c1", "sha256": "c210ae4b50764cb17ddf91fedb197fba988793731c9f4445448ef3a77dd70957" }, "downloads": -1, "filename": "textaugment-1.0-py3-none-any.whl", "has_sig": 
false, "md5_digest": "9b1372cb4cafdf67948b7d0f2330b7c1", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 10258, "upload_time": "2019-07-15T12:38:26", "url": "https://files.pythonhosted.org/packages/ef/88/372549739a6dfa4fe9f0b0e6247c7e3b861a6c3c00504f6e96ddec9e25b5/textaugment-1.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "b6aed57fe8bad23d008587110db93074", "sha256": "44316631883effe6c3a76d19b5e908c1afaf637ed8696d8cbc1a0d2037c72e78" }, "downloads": -1, "filename": "textaugment-1.0.tar.gz", "has_sig": false, "md5_digest": "b6aed57fe8bad23d008587110db93074", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 10287, "upload_time": "2019-07-15T12:38:28", "url": "https://files.pythonhosted.org/packages/9e/fc/557b6a1ec8fb5095b1d9f1f3c3fffeb59404b84ef66ea6d5398b79bba242/textaugment-1.0.tar.gz" } ], "1.1": [ { "comment_text": "", "digests": { "md5": "f6d4ce092f907799dcfe6419a264e6cb", "sha256": "65c2d014dab8f4457f5998c0b2f3e04a7ae0e717cb81d0ab5f0f59b760133e1b" }, "downloads": -1, "filename": "textaugment-1.1-py3-none-any.whl", "has_sig": false, "md5_digest": "f6d4ce092f907799dcfe6419a264e6cb", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 11116, "upload_time": "2019-07-15T15:02:25", "url": "https://files.pythonhosted.org/packages/8d/a0/c48647d04668f3b7cec8e9504058a959709251f2cc5dd4a8df4d62b2a638/textaugment-1.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "7b5ef3c9efd1a78259788015ffb455c7", "sha256": "6d0ecca10cafc6e73d3f0b3b78beeb62b4db8f1527f026feeaa8e19ca986f7c6" }, "downloads": -1, "filename": "textaugment-1.1.tar.gz", "has_sig": false, "md5_digest": "7b5ef3c9efd1a78259788015ffb455c7", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 10368, "upload_time": "2019-07-15T15:02:26", "url": 
"https://files.pythonhosted.org/packages/7e/42/1f7b29274fed9242080fcb31dc52d5b67cf9578370fd8d783959f7cfbd4e/textaugment-1.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "f6d4ce092f907799dcfe6419a264e6cb", "sha256": "65c2d014dab8f4457f5998c0b2f3e04a7ae0e717cb81d0ab5f0f59b760133e1b" }, "downloads": -1, "filename": "textaugment-1.1-py3-none-any.whl", "has_sig": false, "md5_digest": "f6d4ce092f907799dcfe6419a264e6cb", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 11116, "upload_time": "2019-07-15T15:02:25", "url": "https://files.pythonhosted.org/packages/8d/a0/c48647d04668f3b7cec8e9504058a959709251f2cc5dd4a8df4d62b2a638/textaugment-1.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "7b5ef3c9efd1a78259788015ffb455c7", "sha256": "6d0ecca10cafc6e73d3f0b3b78beeb62b4db8f1527f026feeaa8e19ca986f7c6" }, "downloads": -1, "filename": "textaugment-1.1.tar.gz", "has_sig": false, "md5_digest": "7b5ef3c9efd1a78259788015ffb455c7", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 10368, "upload_time": "2019-07-15T15:02:26", "url": "https://files.pythonhosted.org/packages/7e/42/1f7b29274fed9242080fcb31dc52d5b67cf9578370fd8d783959f7cfbd4e/textaugment-1.1.tar.gz" } ] }