{ "info": { "author": "Artit Wangperawong", "author_email": "artitw@gmail.com", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: Apache Software License", "Operating System :: OS Independent", "Programming Language :: Python :: 3" ], "description": "# Text2Class\nBuild multi-class text classifiers using state-of-the-art pre-trained contextualized language models, e.g. BERT. Only a few hundred samples per class are necessary to get started.\n\n## Background\n\nThis project is based on our study: [Transfer Learning Robustness in Multi-Class Categorization by Fine-Tuning Pre-Trained Contextualized Language Models](https://arxiv.org/abs/1909.03564).\n\n### Citation\n\nTo cite this work, use the following BibTeX citation.\n\n```\n@article{transfer2019multiclass,\n title={Transfer Learning Robustness in Multi-Class Categorization by Fine-Tuning Pre-Trained Contextualized Language Models},\n author={Liu, Xinyi and Wangperawong, Artit},\n journal={arXiv preprint arXiv:1909.03564},\n year={2019}\n}\n```\n\n## Installation\n```\npip install text2class\n```\n\n## Example usage\n\n### Create a dataframe with two columns, such as 'text' and 'label'. No text pre-processing is necessary.\n```\nimport pandas as pd\nfrom text2class.text_classifier import TextClassifier\n\ndf = pd.read_csv(\"data.csv\")\n\ntrain = df.sample(frac=0.9,random_state=200)\ntest = df.drop(train.index)\n\ncls = TextClassifier(\n\tnum_labels=3,\n\tdata_column=\"text\",\n\tlabel_column=\"label\",\n\tmax_seq_length=128\n)\n\ncls.fit(train)\n\npredictions = cls.predict(test[\"text\"])\n```\n\n## Advanced usage\n\n### Model type\nThe default model is an uncased Bidirectional Encoder Representations from Transformers (BERT) consisting of 12 transformer layers, 12 self-attention heads per layer, and a hidden size of 768. Below are all models currently supported that you can specify with `hub_module_handle`. We expect that more will be added in the future. 
For more information, see [BERT's GitHub](https://github.com/google-research/bert).\n```\n# https://tfhub.dev/google/bert_uncased_L-12_H-768_A-12/1\n# https://tfhub.dev/google/bert_uncased_L-24_H-1024_A-16/1\n# https://tfhub.dev/google/bert_cased_L-12_H-768_A-12/1\n# https://tfhub.dev/google/bert_cased_L-24_H-1024_A-16/1\n# https://tfhub.dev/google/bert_chinese_L-12_H-768_A-12/1\n# https://tfhub.dev/google/bert_multi_cased_L-12_H-768_A-12/1\n\ncls = TextClassifier(\n\tnum_labels=3,\n\tdata_column=\"text\",\n\tlabel_column=\"label\",\n\tmax_seq_length=128,\n\thub_module_handle=\"https://tfhub.dev/google/bert_uncased_L-12_H-768_A-12/1\"\n)\n```\n\n## Contributing\nText2Class is an open-source project founded and maintained to better serve the machine learning and data science community. Please feel free to submit pull requests to contribute to the project. By participating, you are expected to adhere to Text2Class's [code of conduct](CODE_OF_CONDUCT.md).\n\n## Questions?\nFor questions or help using Text2Class, please submit a GitHub issue.\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/artitw/text2class", "keywords": "bert nlp text classification data science machine learning", "license": "", "maintainer": "", "maintainer_email": "", "name": "text2class", "package_url": "https://pypi.org/project/text2class/", "platform": "", "project_url": "https://pypi.org/project/text2class/", "project_urls": { "Homepage": "https://github.com/artitw/text2class" }, "release_url": "https://pypi.org/project/text2class/0.0.4/", "requires_dist": [ "tensorflow (==1.15.2)", "tensorflow-hub", "pandas" ], "requires_python": "", "summary": "Multi-class text categorization using state-of-the-art pre-trained contextualized language models, e.g. 
BERT.", "version": "0.0.4", "yanked": false, "yanked_reason": null }, "last_serial": 6535287, "releases": { "0.0.1": [ { "comment_text": "", "digests": { "md5": "355291687c00d3d2ff76bc490a38ac4d", "sha256": "da9db43c2be7093d2fef20bdc97167b62b2c5777d5a83f1b8027ec23d7ffd27e" }, "downloads": -1, "filename": "text2class-0.0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "355291687c00d3d2ff76bc490a38ac4d", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 6275, "upload_time": "2019-09-20T22:42:44", "upload_time_iso_8601": "2019-09-20T22:42:44.751064Z", "url": "https://files.pythonhosted.org/packages/4f/41/e62f18c4be0eec9f3e4b7333530adf044deae6fbc7ab9930769ff4297eb2/text2class-0.0.1-py3-none-any.whl", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "658494f4c56fe4a2bb43eb2ab50a63df", "sha256": "e9c40f01bbb7002092550eebb74db12a010e9d9060125a4de88083e2a6831a56" }, "downloads": -1, "filename": "text2class-0.0.1.tar.gz", "has_sig": false, "md5_digest": "658494f4c56fe4a2bb43eb2ab50a63df", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 34567, "upload_time": "2019-09-20T22:42:47", "upload_time_iso_8601": "2019-09-20T22:42:47.600354Z", "url": "https://files.pythonhosted.org/packages/56/be/5ad59bdbb5737e8022eda8a34c8ab2022a2c4af690e13aedb539b6f745e6/text2class-0.0.1.tar.gz", "yanked": false, "yanked_reason": null } ], "0.0.2": [ { "comment_text": "", "digests": { "md5": "756117ab24200f0fa716b805f857a9d5", "sha256": "3e9b0d4987a3685d1091149e97506ee6250bc94a650c52605cf126b5d110d216" }, "downloads": -1, "filename": "text2class-0.0.2-py3-none-any.whl", "has_sig": false, "md5_digest": "756117ab24200f0fa716b805f857a9d5", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 37463, "upload_time": "2019-10-31T00:51:02", "upload_time_iso_8601": "2019-10-31T00:51:02.396444Z", "url": 
"https://files.pythonhosted.org/packages/80/a0/370f21a696d33562ea6320eb0e56808e1f33514af7cfb7d3cdae6fcadfdc/text2class-0.0.2-py3-none-any.whl", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "9e386db0060815662d940be5d83a21f8", "sha256": "835e536fdcb133a371b216acc6c76c75c21b0659e965183118ace56db599eeaa" }, "downloads": -1, "filename": "text2class-0.0.2.tar.gz", "has_sig": false, "md5_digest": "9e386db0060815662d940be5d83a21f8", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 34559, "upload_time": "2019-10-31T00:51:03", "upload_time_iso_8601": "2019-10-31T00:51:03.947881Z", "url": "https://files.pythonhosted.org/packages/2f/69/011956f11979ac39523c80fccfef055c7d0e2f3ee6c0fde065f7223f8e5c/text2class-0.0.2.tar.gz", "yanked": false, "yanked_reason": null } ], "0.0.3": [ { "comment_text": "", "digests": { "md5": "5c3f2e51d6a7bbac34965960f26dd557", "sha256": "b1af14635455e4d654b88e45cecd0dbf7b188a7c825b7b0be0f970244fe510e0" }, "downloads": -1, "filename": "text2class-0.0.3-py3-none-any.whl", "has_sig": false, "md5_digest": "5c3f2e51d6a7bbac34965960f26dd557", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 37493, "upload_time": "2019-12-25T23:18:55", "upload_time_iso_8601": "2019-12-25T23:18:55.367723Z", "url": "https://files.pythonhosted.org/packages/54/e9/0a51ee0774ed2600251ca0d6277343094297681db058407918a438da8a24/text2class-0.0.3-py3-none-any.whl", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "118711fdd9989e2f80409a8612ee1915", "sha256": "571ba9e4e8aa81b7abecf9ef2e6c5d86c6cc8367dd168aa8491b6e575b67f050" }, "downloads": -1, "filename": "text2class-0.0.3.tar.gz", "has_sig": false, "md5_digest": "118711fdd9989e2f80409a8612ee1915", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 33241, "upload_time": "2019-12-25T23:18:56", "upload_time_iso_8601": "2019-12-25T23:18:56.810924Z", "url": 
"https://files.pythonhosted.org/packages/00/c6/830514af7dc1fe37f9b15c1fafa764f80fd5c9138ae116fe28fb15dcb0ba/text2class-0.0.3.tar.gz", "yanked": false, "yanked_reason": null } ], "0.0.4": [ { "comment_text": "", "digests": { "md5": "67e87774b0f57e85fc7c5bab0458fafc", "sha256": "fd13871023c9847ec6b9394eac20acbeed8ad73829a2e2bde6f22149fe4647f3" }, "downloads": -1, "filename": "text2class-0.0.4-py3-none-any.whl", "has_sig": false, "md5_digest": "67e87774b0f57e85fc7c5bab0458fafc", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 37493, "upload_time": "2020-01-29T00:38:47", "upload_time_iso_8601": "2020-01-29T00:38:47.537610Z", "url": "https://files.pythonhosted.org/packages/cf/53/c898361833afc53aaad398b8559475b40113fae4166e3604434a6df3ad0f/text2class-0.0.4-py3-none-any.whl", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "d3154c9ed140b62e020755f1845e0eee", "sha256": "76bafc8a7305ecbeffae76ae487c266efa5a2a7472e3653e933500f73b0872dd" }, "downloads": -1, "filename": "text2class-0.0.4.tar.gz", "has_sig": false, "md5_digest": "d3154c9ed140b62e020755f1845e0eee", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 32744, "upload_time": "2020-01-29T00:38:49", "upload_time_iso_8601": "2020-01-29T00:38:49.139857Z", "url": "https://files.pythonhosted.org/packages/cd/15/ba732ea88f5a0e493e7b8c8dda3bbee68923ac87ffbf2b4f57e62e257a4b/text2class-0.0.4.tar.gz", "yanked": false, "yanked_reason": null } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "67e87774b0f57e85fc7c5bab0458fafc", "sha256": "fd13871023c9847ec6b9394eac20acbeed8ad73829a2e2bde6f22149fe4647f3" }, "downloads": -1, "filename": "text2class-0.0.4-py3-none-any.whl", "has_sig": false, "md5_digest": "67e87774b0f57e85fc7c5bab0458fafc", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 37493, "upload_time": "2020-01-29T00:38:47", "upload_time_iso_8601": 
"2020-01-29T00:38:47.537610Z", "url": "https://files.pythonhosted.org/packages/cf/53/c898361833afc53aaad398b8559475b40113fae4166e3604434a6df3ad0f/text2class-0.0.4-py3-none-any.whl", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "d3154c9ed140b62e020755f1845e0eee", "sha256": "76bafc8a7305ecbeffae76ae487c266efa5a2a7472e3653e933500f73b0872dd" }, "downloads": -1, "filename": "text2class-0.0.4.tar.gz", "has_sig": false, "md5_digest": "d3154c9ed140b62e020755f1845e0eee", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 32744, "upload_time": "2020-01-29T00:38:49", "upload_time_iso_8601": "2020-01-29T00:38:49.139857Z", "url": "https://files.pythonhosted.org/packages/cd/15/ba732ea88f5a0e493e7b8c8dda3bbee68923ac87ffbf2b4f57e62e257a4b/text2class-0.0.4.tar.gz", "yanked": false, "yanked_reason": null } ], "vulnerabilities": [] }