{ "info": { "author": "Raghavendra Kotikalapudi, Johannes Filter", "author_email": "ragha@outlook.com, hi@jfilter.de", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "License :: OSI Approved :: MIT License", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Topic :: Scientific/Engineering :: Artificial Intelligence" ], "description": "# Text Classification Keras [![Build Status](https://travis-ci.com/jfilter/text-classification-keras.svg?branch=master)](https://travis-ci.com/jfilter/text-classification-keras) [![PyPI](https://img.shields.io/pypi/v/text-classification-keras.svg)](https://pypi.org/project/text-classification-keras/) [![PyPI - Python Version](https://img.shields.io/pypi/pyversions/text-classification-keras.svg)](https://pypi.org/project/text-classification-keras/) [![Gitter](https://img.shields.io/gitter/room/text-classification-keras/Lobby.svg)](https://gitter.im/text-classification-keras/Lobby)\n\nA high-level text classification library implementing various well-established models, with a clean and extensible interface for implementing custom architectures.\n\n## Quick start\n\n### Install\n\n```bash\npip install text-classification-keras[full]\n```\n\nThe `[full]` extra additionally installs [TensorFlow](https://github.com/tensorflow/tensorflow), [spaCy](https://github.com/explosion/spaCy), and [Deep Plots](https://github.com/jfilter/deep-plots). 
Choose this if you want to get started right away.\n\n### Usage\n\n```python\nfrom texcla import experiment, data\nfrom texcla.models import TokenModelFactory, YoonKimCNN\nfrom texcla.preprocessing import FastTextWikiTokenizer\n\n# input text\nX = ['some random text', 'another random text lala', 'peter', ...]\n\n# input labels\ny = ['a', 'b', 'a', ...]\n\n# use the same tokenizer that was used to construct the embeddings\ntokenizer = FastTextWikiTokenizer()\n\n# preprocess data (once)\nexperiment.setup_data(X, y, tokenizer, 'data.bin', max_len=100)\n\n# load data\nds = data.Dataset.load('data.bin')\n\n# construct the model factory\nfactory = TokenModelFactory(\n ds.num_classes, ds.tokenizer.token_index, max_tokens=100,\n embedding_type='fasttext.wiki.simple', embedding_dims=300)\n\n# choose a model\nword_encoder_model = YoonKimCNN()\n\n# build a model\nmodel = factory.build_model(\n token_encoder_model=word_encoder_model, trainable_embeddings=False)\n\n# use experiment.train as a wrapper around Keras' fit()\nexperiment.train(x=ds.X, y=ds.y, validation_split=0.1, model=model,\n word_encoder_model=word_encoder_model)\n```\n\nCheck out more [examples](./examples).\n\n## API Documentation\n\n\n\n## Advanced\n\n### Embeddings\n\nChoose a pre-trained word embedding by setting `embedding_type` and the matching `embedding_dims`. Set `embedding_type=None` to initialize the word embeddings randomly (in that case, set `trainable_embeddings=True` so that the embeddings are actually trained).\n\n```python\nfactory = TokenModelFactory(ds.num_classes, ds.tokenizer.token_index, max_tokens=100,\n embedding_type='fasttext.wiki.simple', embedding_dims=300)\n```\n\n#### FastText\n\nSeveral pre-trained [FastText](https://fasttext.cc/) embeddings are included. For now, only the word embeddings are supported, not the n-gram features. All FastText embeddings have 300 dimensions.\n\n- [English Vectors](https://fasttext.cc/docs/en/english-vectors.html): e.g. 
`fasttext.wn.1M.300d`, [check out all available embeddings](https://github.com/jfilter/text-classification-keras/blob/master/texcla/embeddings.py#L19)\n- [Multilang Vectors](https://fasttext.cc/docs/en/crawl-vectors.html): in the format `fasttext.cc.LANG_CODE`, e.g. `fasttext.cc.en`\n- [Wikipedia Vectors](https://fasttext.cc/docs/en/pretrained-vectors.html): in the format `fasttext.wiki.LANG_CODE`, e.g. `fasttext.wiki.en`\n\n#### GloVe\n\nThe [GloVe](https://nlp.stanford.edu/projects/glove/) embeddings predate FastText. In general, prefer FastText embeddings over GloVe. The number of dimensions varies between the pre-trained GloVe embeddings.\n\n- e.g. `glove.6B.50d`, [check out all available embeddings](https://github.com/jfilter/text-classification-keras/blob/master/texcla/embeddings.py#L19)\n\n### Tokenization\n\n- To work at the token (or word) level, use a token-level tokenizer such as `TwokenizeTokenizer` or `SpacyTokenizer`.\n- To work at both the token and the sentence level, use `SpacySentenceTokenizer`.\n- To create a custom tokenizer, extend `Tokenizer` and implement the `token_generator` method.\n\n#### Spacy\n\nYou may use [spaCy](https://spacy.io/) for the tokenization. See the instructions on how to\n[download a model](https://spacy.io/docs/usage/models#download) for your target language. E.g. 
for English:\n\n```bash\npython -m spacy download en\n```\n\n### Models\n\n#### Token-based Models\n\nWhen working at the token level, use `TokenModelFactory`.\n\n```python\nfrom texcla.models import TokenModelFactory, YoonKimCNN\n\nfactory = TokenModelFactory(ds.num_classes, ds.tokenizer.token_index,\n max_tokens=100, embedding_type='glove.6B.100d')\nword_encoder_model = YoonKimCNN()\nmodel = factory.build_model(token_encoder_model=word_encoder_model)\n```\n\nCurrently supported models include:\n\n- [Yoon Kim CNN](https://arxiv.org/abs/1408.5882)\n- [Stacked RNNs](https://arxiv.org/abs/1312.6026)\n- [Attention (with/without context) based RNN encoders](https://www.cs.cmu.edu/~hovy/papers/16HLT-hierarchical-attention-networks.pdf)\n\n`TokenModelFactory.build_model` encodes the tokens with the provided word encoder and classifies the resulting encoding with a [Dense](https://keras.io/layers/core/#dense) layer.\n\n#### Sentence-based Models\n\nWhen working at the sentence level, use `SentenceModelFactory`.\n\n```python\nfrom texcla.models import SentenceModelFactory, AttentionRNN\n\n# Pad to at most 500 sentences per document and 200 words per sentence.\n# Use `max_sents=None` to allow a variable number of sentences per mini-batch.\n\nfactory = SentenceModelFactory(ds.num_classes, ds.tokenizer.token_index, max_sents=500,\n max_tokens=200, embedding_type='glove.6B.100d')\nword_encoder_model = AttentionRNN()\nsentence_encoder_model = AttentionRNN()\n\n# Compose an arbitrary word encoder followed by a sentence encoder.\nmodel = factory.build_model(word_encoder_model, sentence_encoder_model)\n```\n\n- [Hierarchical attention networks](http://www.cs.cmu.edu/~./hovy/papers/16HLT-hierarchical-attention-networks.pdf)\n (HANs) can be built by composing two attention-based RNN models. This is useful when documents are very long.\n- For smaller documents, a reasonable way to encode a sentence is to average the words within it. 
This can be done by using\n `token_encoder_model=AveragingEncoder()`.\n- Mix and match encoders as you see fit for your problem.\n\n`SentenceModelFactory.build_model` creates a tiered model: the words within each sentence are first encoded using\n`word_encoder_model`, and the resulting sentence encodings are then encoded using `sentence_encoder_model`.\n\n## Related\n\n- https://github.com/brightmart/text_classification\n- https://github.com/allenai/allennlp\n- https://github.com/facebookresearch/pytext\n- https://docs.fast.ai/text.html\n- https://github.com/dkpro/dkpro-tc\n\n## Contributing\n\nIf you have a **question**, have found a **bug**, or want to propose a new **feature**, have a look at the [issues page](https://github.com/jfilter/text-classification-keras/issues).\n\n**Pull requests** are especially welcome when they fix bugs or improve code quality.\n\n## Acknowledgements\n\nBuilt upon the work of Raghavendra Kotikalapudi: [keras-text](https://github.com/raghakot/keras-text).\n\n## Citation\n\nIf you use Text Classification Keras in an academic publication, please cite it with the following BibTeX:\n\n```tex\n@misc{raghakotfiltertexclakeras,\n title={Text Classification Keras},\n author={Raghavendra Kotikalapudi and Johannes Filter and contributors},\n year={2018},\n publisher={GitHub},\n howpublished={\\url{https://github.com/jfilter/text-classification-keras}},\n}\n```\n\n## License\n\nMIT.\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/jfilter/text-classification-keras", "keywords": "", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "text-classification-keras", "package_url": "https://pypi.org/project/text-classification-keras/", "platform": "", "project_url": "https://pypi.org/project/text-classification-keras/", "project_urls": { "Homepage": 
"https://github.com/jfilter/text-classification-keras" }, "release_url": "https://pypi.org/project/text-classification-keras/0.1.4/", "requires_dist": [ "keras (==2.*)", "six (==1.*)", "scikit-learn (==0.*)", "joblib (==0.*)", "jsonpickle (==0.*)", "numpy (==1.*)", "spacy (==2.*) ; extra == 'full'", "deep-plots (==0.*) ; extra == 'full'", "tensorflow (==1.*) ; extra == 'full'" ], "requires_python": "", "summary": "Text Classification Library for Keras", "version": "0.1.4" }, "last_serial": 5200359, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "md5": "eefd96f6fdd562d590a7aeefc184b30c", "sha256": "abccb93005f7580003c163800bed464cd7225a5b8894dd1c37a7f8cbe887aaab" }, "downloads": -1, "filename": "text_classification_keras-0.1.0-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "eefd96f6fdd562d590a7aeefc184b30c", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 45978, "upload_time": "2018-08-01T13:52:34", "url": "https://files.pythonhosted.org/packages/e7/83/0cb191006c208664d3cf3273acfdaf54047d21e15927cf26be323b3863d5/text_classification_keras-0.1.0-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "6ec155c06e8f308890edaad31e93d6fa", "sha256": "8515277cd2d1209663005b97fe0abab6f36ca1e501954bbbfbd745f94523dee7" }, "downloads": -1, "filename": "text-classification-keras-0.1.0.tar.gz", "has_sig": false, "md5_digest": "6ec155c06e8f308890edaad31e93d6fa", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 38088, "upload_time": "2018-08-01T13:52:37", "url": "https://files.pythonhosted.org/packages/87/7b/6c3543529c6f5621499ef01b45de871f2f932ba86c73c676ceafe73de0e5/text-classification-keras-0.1.0.tar.gz" } ], "0.1.1": [ { "comment_text": "", "digests": { "md5": "1e9d369f772bb6d78a29c945d40ec493", "sha256": "8d4cec60614c65408969755b0caa1e152a8dfbe6fd0b2394c5f5bd8709eafb6b" }, "downloads": -1, "filename": "text_classification_keras-0.1.1-py2.py3-none-any.whl", 
"has_sig": false, "md5_digest": "1e9d369f772bb6d78a29c945d40ec493", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 46178, "upload_time": "2018-09-12T22:44:38", "url": "https://files.pythonhosted.org/packages/c2/1b/778dcab0ad3a0dac7d4116deb40e5582d9e196b825c228f97a1cb2180dd7/text_classification_keras-0.1.1-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "1cb05b2219e0a718eea1f20c300fae32", "sha256": "5a62febf19996d194d0edc60d73a0dcd32569337524245e78d3f86c7d8a407f3" }, "downloads": -1, "filename": "text_classification_keras-0.1.1-py3.7.egg", "has_sig": false, "md5_digest": "1cb05b2219e0a718eea1f20c300fae32", "packagetype": "bdist_egg", "python_version": "3.7", "requires_python": null, "size": 46661, "upload_time": "2018-11-01T13:05:44", "url": "https://files.pythonhosted.org/packages/62/32/083d31cb458c34405942547c567d369bca843af9383098afc617e011fb9d/text_classification_keras-0.1.1-py3.7.egg" }, { "comment_text": "", "digests": { "md5": "d7cc412f2d0d9495c0d0f117be4761fc", "sha256": "8366aa50613842d9b656da38f979ffdcb19ab136b22d9f5758296b2451bd2585" }, "downloads": -1, "filename": "text-classification-keras-0.1.1.tar.gz", "has_sig": false, "md5_digest": "d7cc412f2d0d9495c0d0f117be4761fc", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 38294, "upload_time": "2018-09-12T22:44:40", "url": "https://files.pythonhosted.org/packages/2f/a5/5ad27bdb9d8b8c1244dcaaf40775280943dc1acceca9268cfc6ef1f7caf1/text-classification-keras-0.1.1.tar.gz" } ], "0.1.2": [ { "comment_text": "", "digests": { "md5": "89d5ce9c0bcae55511bd7c51be0d5d17", "sha256": "206b75e233ff43cb06fb3610e0a22538ff398fa30ddfd40fb39dd1d68c744ea9" }, "downloads": -1, "filename": "text_classification_keras-0.1.2-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "89d5ce9c0bcae55511bd7c51be0d5d17", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 48747, "upload_time": 
"2018-11-01T13:05:40", "url": "https://files.pythonhosted.org/packages/ae/ae/3e5704da0fbd90b2b3bc1553aa88811f90a5c27bb6f07bc6f5fc89ffda9d/text_classification_keras-0.1.2-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "91203d5e1095dee92a4055465f152f1c", "sha256": "492a50ff03617a04d9af65b95f5e6e15ec193ee2df029c6733146b4173a9b5d5" }, "downloads": -1, "filename": "text_classification_keras-0.1.2-py3.7.egg", "has_sig": false, "md5_digest": "91203d5e1095dee92a4055465f152f1c", "packagetype": "bdist_egg", "python_version": "3.7", "requires_python": null, "size": 46682, "upload_time": "2018-11-01T13:05:45", "url": "https://files.pythonhosted.org/packages/46/da/e192d8f29159daee8f97e4ffd13d57be937ca303e39adcf30a85ca59edcb/text_classification_keras-0.1.2-py3.7.egg" }, { "comment_text": "", "digests": { "md5": "56c93ed1131045dd2cd1904f5b23035e", "sha256": "957ed5ae338f4f78f5ecabf7f133888897a57bec99402b9d1027ad23e185ec29" }, "downloads": -1, "filename": "text-classification-keras-0.1.2.tar.gz", "has_sig": false, "md5_digest": "56c93ed1131045dd2cd1904f5b23035e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 39701, "upload_time": "2018-11-01T13:05:42", "url": "https://files.pythonhosted.org/packages/bd/11/d71cd71029cfa4885c7272c6c1456325f38cfa8f7e67b752bae537dba0c7/text-classification-keras-0.1.2.tar.gz" } ], "0.1.3": [ { "comment_text": "", "digests": { "md5": "e0dc029a3fbb32b4eb5bf4e38ab7f892", "sha256": "d82eb06231c849831127522bea3cc1e65a37b446a5ed7fc1cf28cfd5c479264a" }, "downloads": -1, "filename": "text_classification_keras-0.1.3-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "e0dc029a3fbb32b4eb5bf4e38ab7f892", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 48742, "upload_time": "2018-12-14T15:50:56", "url": 
"https://files.pythonhosted.org/packages/e1/bb/ae52983bc88af860e946446fc730c0a9b0488501b1a1a1787fa15b63ca9f/text_classification_keras-0.1.3-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "fd413abc0cdbf3657cd19f206ba1b9fc", "sha256": "56f12ca4cd3d99f46a410742e2aefaa72eae8bf5a393562e54794aeb413291f8" }, "downloads": -1, "filename": "text-classification-keras-0.1.3.tar.gz", "has_sig": false, "md5_digest": "fd413abc0cdbf3657cd19f206ba1b9fc", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 40208, "upload_time": "2018-12-14T15:51:00", "url": "https://files.pythonhosted.org/packages/ee/19/96d3a26818e89f3cf76978f23ae6df63c2d3cf6f4c7d77a43c471bd4227f/text-classification-keras-0.1.3.tar.gz" } ], "0.1.4": [ { "comment_text": "", "digests": { "md5": "2fcae53bc0200aa3c20e0a80eab34d3b", "sha256": "8219e16304c4335ebcca0c1e6f7b121be0c2acb29f0aa25af4126feec1c89e51" }, "downloads": -1, "filename": "text_classification_keras-0.1.4-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "2fcae53bc0200aa3c20e0a80eab34d3b", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 50952, "upload_time": "2019-04-28T18:42:44", "url": "https://files.pythonhosted.org/packages/10/4e/e37a46a62416881cb18cc164ec66c87ba288e7483df45302187e99c7762f/text_classification_keras-0.1.4-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "945859bdecfd224b95e37bef83b8dcac", "sha256": "10d4805b7b9d451dc9dbf3511fddd3cb8d4e14f5e2132a805d0636af1ddfbc26" }, "downloads": -1, "filename": "text-classification-keras-0.1.4.tar.gz", "has_sig": false, "md5_digest": "945859bdecfd224b95e37bef83b8dcac", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 42035, "upload_time": "2019-04-28T18:42:53", "url": "https://files.pythonhosted.org/packages/af/c9/740d1446c409891996341feb894738d8212ffdf80df0cc18dbf5b95e22aa/text-classification-keras-0.1.4.tar.gz" } ] }, "urls": [ { "comment_text": 
"", "digests": { "md5": "2fcae53bc0200aa3c20e0a80eab34d3b", "sha256": "8219e16304c4335ebcca0c1e6f7b121be0c2acb29f0aa25af4126feec1c89e51" }, "downloads": -1, "filename": "text_classification_keras-0.1.4-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "2fcae53bc0200aa3c20e0a80eab34d3b", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 50952, "upload_time": "2019-04-28T18:42:44", "url": "https://files.pythonhosted.org/packages/10/4e/e37a46a62416881cb18cc164ec66c87ba288e7483df45302187e99c7762f/text_classification_keras-0.1.4-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "945859bdecfd224b95e37bef83b8dcac", "sha256": "10d4805b7b9d451dc9dbf3511fddd3cb8d4e14f5e2132a805d0636af1ddfbc26" }, "downloads": -1, "filename": "text-classification-keras-0.1.4.tar.gz", "has_sig": false, "md5_digest": "945859bdecfd224b95e37bef83b8dcac", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 42035, "upload_time": "2019-04-28T18:42:53", "url": "https://files.pythonhosted.org/packages/af/c9/740d1446c409891996341feb894738d8212ffdf80df0cc18dbf5b95e22aa/text-classification-keras-0.1.4.tar.gz" } ] }