{
    "info": {
        "author": "Raghavendra Kotikalapudi, Johannes Filter",
        "author_email": "ragha@outlook.com, hi@jfilter.de",
        "bugtrack_url": null,
        "classifiers": [
            "Development Status :: 3 - Alpha",
            "License :: OSI Approved :: MIT License",
            "Programming Language :: Python :: 2.7",
            "Programming Language :: Python :: 3.5",
            "Programming Language :: Python :: 3.6",
            "Topic :: Scientific/Engineering :: Artificial Intelligence"
        ],
        "description": "# Text Classification Keras [![Build Status](https://travis-ci.com/jfilter/text-classification-keras.svg?branch=master)](https://travis-ci.com/jfilter/text-classification-keras) [![PyPI](https://img.shields.io/pypi/v/text-classification-keras.svg)](https://pypi.org/project/text-classification-keras/) [![PyPI - Python Version](https://img.shields.io/pypi/pyversions/text-classification-keras.svg)](https://pypi.org/project/text-classification-keras/) [![Gitter](https://img.shields.io/gitter/room/text-classification-keras/Lobby.svg)](https://gitter.im/text-classification-keras/Lobby)\n\nA high-level text classification library implementing various well-established models. With a clean and extendable interface to implement custom architectures.\n\n## Quick start\n\n### Install\n\n```bash\npip install text-classification-keras[full]\n```\n\nThe `[full]` will additionally install [TensorFlow](https://github.com/tensorflow/tensorflow), [Spacy](https://github.com/explosion/spaCy), and [Deep Plots](https://github.com/jfilter/deep-plots). Choose this if you want to get started right away.\n\n### Usage\n\n```python\nfrom texcla import experiment, data\nfrom texcla.models import TokenModelFactory, YoonKimCNN\nfrom texcla.preprocessing import FastTextWikiTokenizer\n\n# input text\nX = ['some random text', 'another random text lala', 'peter', ...]\n\n# input labels\ny = ['a', 'b', 'a', ...]\n\n# use the special tokenizer used for constructing the embeddings\ntokenizer = FastTextWikiTokenizer()\n\n# preprocess data (once)\nexperiment.setup_data(X, y, tokenizer, 'data.bin', max_len=100)\n\n# load data\nds = data.Dataset.load('data.bin')\n\n# construct base\nfactory = TokenModelFactory(\n    ds.num_classes, ds.tokenizer.token_index, max_tokens=100,\n    embedding_type='fasttext.wiki.simple', embedding_dims=300)\n\n# choose a model\nword_encoder_model = YoonKimCNN()\n\n# build a model\nmodel = factory.build_model(\n    token_encoder_model=word_encoder_model, trainable_embeddings=False)\n\n# use experiment.train as wrapper for Keras.fit()\nexperiment.train(x=ds.X, y=ds.y, validation_split=0.1, model=model,\n    word_encoder_model=word_encoder_model)\n```\n\nCheck out more [examples](./examples).\n\n## API Documenation\n\n<https://github.io/jfilter/text-classification-keras/>\n\n## Advanced\n\n### Embeddings\n\nChoose a pre-trained word embedding by setting the embedding_type and the corresponding embedding dimensions. Set `embedding_type=None` to initialize the word embeddings randomly (but make sure to set `trainable_embeddings=True` so you actually train the embeddings).\n\n```python\nfactory = TokenModelFactory(embedding_type='fasttext.wiki.simple', embedding_dims=300)\n```\n\n#### FastText\n\nSeveral pre-trained [FastText](https://fasttext.cc/) embeddings are included. For now, we only have the word embeddings and not the n-gram features. All embedding have 300 dimensions.\n\n-   [English Vectors](https://fasttext.cc/docs/en/english-vectors.html): e.g. `fasttext.wn.1M.300d`, [check out all avaiable embeddings](https://github.com/jfilter/text-classification-keras/blob/master/texcla/embeddings.py#L19)\n-   [Multilang Vectors](https://fasttext.cc/docs/en/crawl-vectors.html): in the format `fasttext.cc.LANG_CODE` e.g. `fasttext.cc.en`\n-   [Wikipedia Vectors](https://fasttext.cc/docs/en/pretrained-vectors.html): in the format `fasttext.wiki.LANG_CODE` e.g. `fasttext.wiki.en`\n\n#### GloVe\n\nThe [GloVe](https://nlp.stanford.edu/projects/glove/) embeddings are some kind of predecessor to FastText. In general choose FastText embeddings over GloVe. The dimension for the pre-trained embeddings varies.\n\n-   : e.g. `glove.6B.50d`, [check out all avaiable embeddings](https://github.com/jfilter/text-classification-keras/blob/master/texcla/embeddings.py#L19)\n\n### Tokenzation\n\n-   To work on token (or word) level, use a TokenTokenizer such e.g. `TwokenizeTokenizer` or `SpacyTokenizer`.\n-   To work on token and sentence level, use `SpacySentenceTokenizer`.\n-   To create an custom Tokenizer, extend `Tokenizer` and implement the `token_generator` method.\n\n#### Spacy\n\nYou may use [spaCy](https://spacy.io/) for the tokenization. See instructions on how to\n[download model](https://spacy.io/docs/usage/models#download) for your target language. E.g. for English:\n\n```bash\npython -m spacy download en\n```\n\n### Models\n\n#### Token-based Models\n\nWhen working on token level, use `TokenModelFactory`.\n\n```python\nfrom texcla.models import TokenModelFactory, YoonKimCNN\n\nfactory = TokenModelFactory(tokenizer.num_classes, tokenizer.token_index,\n    max_tokens=100, embedding_type='glove.6B.100d')\nword_encoder_model = YoonKimCNN()\nmodel = factory.build_model(token_encoder_model=word_encoder_model)\n```\n\nCurrently supported models include:\n\n-   [Yoon Kim CNN](https://arxiv.org/abs/1408.5882)\n-   [Stacked RNNs](https://arxiv.org/abs/1312.6026)\n-   [Attention (with/without context) based RNN encoders](https://www.cs.cmu.edu/~hovy/papers/16HLT-hierarchical-attention-networks.pdf)\n\n`TokenModelFactory.build_model` uses the provided word encoder which is then classified via a [Dense](https://keras.io/layers/core/#dense) layer.\n\n#### Sentence-based Models\n\nWhen working on sentence level, use `SentenceModelFactory`.\n\n```python\n# Pad max sentences per doc to 500 and max words per sentence to 200.\n# Can also use `max_sents=None` to allow variable sized max_sents per mini-batch.\n\nfactory = SentenceModelFactory(10, tokenizer.token_index, max_sents=500,\n    max_tokens=200, embedding_type='glove.6B.100d')\nword_encoder_model = AttentionRNN()\nsentence_encoder_model = AttentionRNN()\n\n# Allows you to compose arbitrary word encoders followed by sentence encoder.\nmodel = factory.build_model(word_encoder_model, sentence_encoder_model)\n```\n\n-   [Hierarchical attention networks](http://www.cs.cmu.edu/~./hovy/papers/16HLT-hierarchical-attention-networks.pdf)\n    (HANs) can be build by composing two attention based RNN models. This is useful when a document is very large.\n-   For smaller document a reasonable way to encode sentences is to average words within it. This can be done by using\n    `token_encoder_model=AveragingEncoder()`\n-   Mix and match encoders as you see fit for your problem.\n\n`SentenceModelFactory.build_model` created a tiered model where words within a sentence is first encoded using\n`word_encoder_model`. All such encodings per sentence is then encoded using `sentence_encoder_model`.\n\n## Related\n\n-   https://github.com/brightmart/text_classification\n-   https://github.com/allenai/allennlp\n-   https://github.com/facebookresearch/pytext\n-   https://docs.fast.ai/text.html\n-   https://github.com/dkpro/dkpro-tc\n\n## Contributing\n\nIf you have a **question**, found a **bug** or want to propose a new **feature**, have a look at the [issues page](https://github.com/jfilter/text-classification-keras/issues).\n\n**Pull requests** are especially welcomed when they fix bugs or improve the code quality.\n\n## Acknowledgements\n\nBuilt upon the work by Raghavendra Kotikalapudi: [keras-text](https://github.com/raghakot/keras-text).\n\n## Citation\n\nIf you find Text Classification Keras useful for an academic publication, then please use the following BibTeX to cite it:\n\n```tex\n@misc{raghakotfiltertexclakeras\n    title={Text Classification Keras},\n    author={Raghavendra Kotikalapudi, and Johannes Filter, and contributors},\n    year={2018},\n    publisher={GitHub},\n    howpublished={\\url{https://github.com/jfilter/text-classification-keras}},\n}\n```\n\n## License\n\nMIT.\n\n\n",
        "description_content_type": "text/markdown",
        "docs_url": null,
        "download_url": "",
        "downloads": {
            "last_day": -1,
            "last_month": -1,
            "last_week": -1
        },
        "home_page": "https://github.com/jfilter/text-classification-keras",
        "keywords": "",
        "license": "MIT",
        "maintainer": "",
        "maintainer_email": "",
        "name": "text-classification-keras",
        "package_url": "https://pypi.org/project/text-classification-keras/",
        "platform": "",
        "project_url": "https://pypi.org/project/text-classification-keras/",
        "project_urls": {
            "Homepage": "https://github.com/jfilter/text-classification-keras"
        },
        "release_url": "https://pypi.org/project/text-classification-keras/0.1.4/",
        "requires_dist": [
            "keras (==2.*)",
            "six (==1.*)",
            "scikit-learn (==0.*)",
            "joblib (==0.*)",
            "jsonpickle (==0.*)",
            "numpy (==1.*)",
            "spacy (==2.*) ; extra == 'full'",
            "deep-plots (==0.*) ; extra == 'full'",
            "tensorflow (==1.*) ; extra == 'full'"
        ],
        "requires_python": "",
        "summary": "Text Classification Library for Keras",
        "version": "0.1.4"
    },
    "last_serial": 5200359,
    "releases": {
        "0.1.0": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "eefd96f6fdd562d590a7aeefc184b30c",
                    "sha256": "abccb93005f7580003c163800bed464cd7225a5b8894dd1c37a7f8cbe887aaab"
                },
                "downloads": -1,
                "filename": "text_classification_keras-0.1.0-py2.py3-none-any.whl",
                "has_sig": false,
                "md5_digest": "eefd96f6fdd562d590a7aeefc184b30c",
                "packagetype": "bdist_wheel",
                "python_version": "py2.py3",
                "requires_python": null,
                "size": 45978,
                "upload_time": "2018-08-01T13:52:34",
                "url": "https://files.pythonhosted.org/packages/e7/83/0cb191006c208664d3cf3273acfdaf54047d21e15927cf26be323b3863d5/text_classification_keras-0.1.0-py2.py3-none-any.whl"
            },
            {
                "comment_text": "",
                "digests": {
                    "md5": "6ec155c06e8f308890edaad31e93d6fa",
                    "sha256": "8515277cd2d1209663005b97fe0abab6f36ca1e501954bbbfbd745f94523dee7"
                },
                "downloads": -1,
                "filename": "text-classification-keras-0.1.0.tar.gz",
                "has_sig": false,
                "md5_digest": "6ec155c06e8f308890edaad31e93d6fa",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 38088,
                "upload_time": "2018-08-01T13:52:37",
                "url": "https://files.pythonhosted.org/packages/87/7b/6c3543529c6f5621499ef01b45de871f2f932ba86c73c676ceafe73de0e5/text-classification-keras-0.1.0.tar.gz"
            }
        ],
        "0.1.1": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "1e9d369f772bb6d78a29c945d40ec493",
                    "sha256": "8d4cec60614c65408969755b0caa1e152a8dfbe6fd0b2394c5f5bd8709eafb6b"
                },
                "downloads": -1,
                "filename": "text_classification_keras-0.1.1-py2.py3-none-any.whl",
                "has_sig": false,
                "md5_digest": "1e9d369f772bb6d78a29c945d40ec493",
                "packagetype": "bdist_wheel",
                "python_version": "py2.py3",
                "requires_python": null,
                "size": 46178,
                "upload_time": "2018-09-12T22:44:38",
                "url": "https://files.pythonhosted.org/packages/c2/1b/778dcab0ad3a0dac7d4116deb40e5582d9e196b825c228f97a1cb2180dd7/text_classification_keras-0.1.1-py2.py3-none-any.whl"
            },
            {
                "comment_text": "",
                "digests": {
                    "md5": "1cb05b2219e0a718eea1f20c300fae32",
                    "sha256": "5a62febf19996d194d0edc60d73a0dcd32569337524245e78d3f86c7d8a407f3"
                },
                "downloads": -1,
                "filename": "text_classification_keras-0.1.1-py3.7.egg",
                "has_sig": false,
                "md5_digest": "1cb05b2219e0a718eea1f20c300fae32",
                "packagetype": "bdist_egg",
                "python_version": "3.7",
                "requires_python": null,
                "size": 46661,
                "upload_time": "2018-11-01T13:05:44",
                "url": "https://files.pythonhosted.org/packages/62/32/083d31cb458c34405942547c567d369bca843af9383098afc617e011fb9d/text_classification_keras-0.1.1-py3.7.egg"
            },
            {
                "comment_text": "",
                "digests": {
                    "md5": "d7cc412f2d0d9495c0d0f117be4761fc",
                    "sha256": "8366aa50613842d9b656da38f979ffdcb19ab136b22d9f5758296b2451bd2585"
                },
                "downloads": -1,
                "filename": "text-classification-keras-0.1.1.tar.gz",
                "has_sig": false,
                "md5_digest": "d7cc412f2d0d9495c0d0f117be4761fc",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 38294,
                "upload_time": "2018-09-12T22:44:40",
                "url": "https://files.pythonhosted.org/packages/2f/a5/5ad27bdb9d8b8c1244dcaaf40775280943dc1acceca9268cfc6ef1f7caf1/text-classification-keras-0.1.1.tar.gz"
            }
        ],
        "0.1.2": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "89d5ce9c0bcae55511bd7c51be0d5d17",
                    "sha256": "206b75e233ff43cb06fb3610e0a22538ff398fa30ddfd40fb39dd1d68c744ea9"
                },
                "downloads": -1,
                "filename": "text_classification_keras-0.1.2-py2.py3-none-any.whl",
                "has_sig": false,
                "md5_digest": "89d5ce9c0bcae55511bd7c51be0d5d17",
                "packagetype": "bdist_wheel",
                "python_version": "py2.py3",
                "requires_python": null,
                "size": 48747,
                "upload_time": "2018-11-01T13:05:40",
                "url": "https://files.pythonhosted.org/packages/ae/ae/3e5704da0fbd90b2b3bc1553aa88811f90a5c27bb6f07bc6f5fc89ffda9d/text_classification_keras-0.1.2-py2.py3-none-any.whl"
            },
            {
                "comment_text": "",
                "digests": {
                    "md5": "91203d5e1095dee92a4055465f152f1c",
                    "sha256": "492a50ff03617a04d9af65b95f5e6e15ec193ee2df029c6733146b4173a9b5d5"
                },
                "downloads": -1,
                "filename": "text_classification_keras-0.1.2-py3.7.egg",
                "has_sig": false,
                "md5_digest": "91203d5e1095dee92a4055465f152f1c",
                "packagetype": "bdist_egg",
                "python_version": "3.7",
                "requires_python": null,
                "size": 46682,
                "upload_time": "2018-11-01T13:05:45",
                "url": "https://files.pythonhosted.org/packages/46/da/e192d8f29159daee8f97e4ffd13d57be937ca303e39adcf30a85ca59edcb/text_classification_keras-0.1.2-py3.7.egg"
            },
            {
                "comment_text": "",
                "digests": {
                    "md5": "56c93ed1131045dd2cd1904f5b23035e",
                    "sha256": "957ed5ae338f4f78f5ecabf7f133888897a57bec99402b9d1027ad23e185ec29"
                },
                "downloads": -1,
                "filename": "text-classification-keras-0.1.2.tar.gz",
                "has_sig": false,
                "md5_digest": "56c93ed1131045dd2cd1904f5b23035e",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 39701,
                "upload_time": "2018-11-01T13:05:42",
                "url": "https://files.pythonhosted.org/packages/bd/11/d71cd71029cfa4885c7272c6c1456325f38cfa8f7e67b752bae537dba0c7/text-classification-keras-0.1.2.tar.gz"
            }
        ],
        "0.1.3": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "e0dc029a3fbb32b4eb5bf4e38ab7f892",
                    "sha256": "d82eb06231c849831127522bea3cc1e65a37b446a5ed7fc1cf28cfd5c479264a"
                },
                "downloads": -1,
                "filename": "text_classification_keras-0.1.3-py2.py3-none-any.whl",
                "has_sig": false,
                "md5_digest": "e0dc029a3fbb32b4eb5bf4e38ab7f892",
                "packagetype": "bdist_wheel",
                "python_version": "py2.py3",
                "requires_python": null,
                "size": 48742,
                "upload_time": "2018-12-14T15:50:56",
                "url": "https://files.pythonhosted.org/packages/e1/bb/ae52983bc88af860e946446fc730c0a9b0488501b1a1a1787fa15b63ca9f/text_classification_keras-0.1.3-py2.py3-none-any.whl"
            },
            {
                "comment_text": "",
                "digests": {
                    "md5": "fd413abc0cdbf3657cd19f206ba1b9fc",
                    "sha256": "56f12ca4cd3d99f46a410742e2aefaa72eae8bf5a393562e54794aeb413291f8"
                },
                "downloads": -1,
                "filename": "text-classification-keras-0.1.3.tar.gz",
                "has_sig": false,
                "md5_digest": "fd413abc0cdbf3657cd19f206ba1b9fc",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 40208,
                "upload_time": "2018-12-14T15:51:00",
                "url": "https://files.pythonhosted.org/packages/ee/19/96d3a26818e89f3cf76978f23ae6df63c2d3cf6f4c7d77a43c471bd4227f/text-classification-keras-0.1.3.tar.gz"
            }
        ],
        "0.1.4": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "2fcae53bc0200aa3c20e0a80eab34d3b",
                    "sha256": "8219e16304c4335ebcca0c1e6f7b121be0c2acb29f0aa25af4126feec1c89e51"
                },
                "downloads": -1,
                "filename": "text_classification_keras-0.1.4-py2.py3-none-any.whl",
                "has_sig": false,
                "md5_digest": "2fcae53bc0200aa3c20e0a80eab34d3b",
                "packagetype": "bdist_wheel",
                "python_version": "py2.py3",
                "requires_python": null,
                "size": 50952,
                "upload_time": "2019-04-28T18:42:44",
                "url": "https://files.pythonhosted.org/packages/10/4e/e37a46a62416881cb18cc164ec66c87ba288e7483df45302187e99c7762f/text_classification_keras-0.1.4-py2.py3-none-any.whl"
            },
            {
                "comment_text": "",
                "digests": {
                    "md5": "945859bdecfd224b95e37bef83b8dcac",
                    "sha256": "10d4805b7b9d451dc9dbf3511fddd3cb8d4e14f5e2132a805d0636af1ddfbc26"
                },
                "downloads": -1,
                "filename": "text-classification-keras-0.1.4.tar.gz",
                "has_sig": false,
                "md5_digest": "945859bdecfd224b95e37bef83b8dcac",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 42035,
                "upload_time": "2019-04-28T18:42:53",
                "url": "https://files.pythonhosted.org/packages/af/c9/740d1446c409891996341feb894738d8212ffdf80df0cc18dbf5b95e22aa/text-classification-keras-0.1.4.tar.gz"
            }
        ]
    },
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "2fcae53bc0200aa3c20e0a80eab34d3b",
                "sha256": "8219e16304c4335ebcca0c1e6f7b121be0c2acb29f0aa25af4126feec1c89e51"
            },
            "downloads": -1,
            "filename": "text_classification_keras-0.1.4-py2.py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "2fcae53bc0200aa3c20e0a80eab34d3b",
            "packagetype": "bdist_wheel",
            "python_version": "py2.py3",
            "requires_python": null,
            "size": 50952,
            "upload_time": "2019-04-28T18:42:44",
            "url": "https://files.pythonhosted.org/packages/10/4e/e37a46a62416881cb18cc164ec66c87ba288e7483df45302187e99c7762f/text_classification_keras-0.1.4-py2.py3-none-any.whl"
        },
        {
            "comment_text": "",
            "digests": {
                "md5": "945859bdecfd224b95e37bef83b8dcac",
                "sha256": "10d4805b7b9d451dc9dbf3511fddd3cb8d4e14f5e2132a805d0636af1ddfbc26"
            },
            "downloads": -1,
            "filename": "text-classification-keras-0.1.4.tar.gz",
            "has_sig": false,
            "md5_digest": "945859bdecfd224b95e37bef83b8dcac",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 42035,
            "upload_time": "2019-04-28T18:42:53",
            "url": "https://files.pythonhosted.org/packages/af/c9/740d1446c409891996341feb894738d8212ffdf80df0cc18dbf5b95e22aa/text-classification-keras-0.1.4.tar.gz"
        }
    ]
}