{ "info": { "author": "Dean Knudson", "author_email": "knuddj1@student.op.ac.nz", "bugtrack_url": null, "classifiers": [ "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Programming Language :: Python :: 3", "Topic :: Scientific/Engineering :: Artificial Intelligence" ], "description": "# OP Text\nA wrapper around the popular [transformers](https://github.com/huggingface/transformers) machine learning library, by the HuggingFace team. OP Text provides a simplified, [Keras](https://keras.io/) like, interface for fine-tuning, evaluating and inference of popular pretrained BERT models.\n\n## Installation\nPyTorch is required as a prerequisite before installing OP Text. Head on over to the [getting started page](https://pytorch.org/get-started/locally/) of their website and follow the installation instructions for your version of Python. \n\n>!Currently only Python versions 3.6 and above are supported\n\nUse one of the following commands to install OP Text:\n\n### with pip\n\n pip install op_text\n\n### with anaconda\n\n conda install op_text\n\n## Usage \nThe entire purpose of this package is to allow users to leverage the power of the transformer models available in HuggingFace's library without needing to understand how to use PyTorch.\n\n### Model Loading\n\nCurrently the available models are:\n\n 1. **BERT** from the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding(https://arxiv.org/abs/1810.04805)](https://arxiv.org/abs/1810.04805) released by Google.\n\n2. **RoBERTa** from the paper [Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) released by Facebook.\n\n3. **DistilBERT** from the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108) released by HuggingFace.\n\nEach model has contains a list of the available pretrained models.\nUse the **DOWNLOADABLES** property to access them.\n\n Bert.DOWNLOADBLES\n >> ['bert-base-uncased','bert-large-uncased']\n\n\tRoberta.DOWNLOADBLES\n\t>> ['roberta-base','roberta-large']\n\n\tDistilBert.DOWNLOADABLES\n\t>> ['distilbert-base-uncased']\n\nLoading a model is achieved in one line of code. The string can either be the name of a pretrained model or a path to a local fine-tuned model on disk. \n\n from op_text.models import Bert\n\n\t# Loading a pretrained model \n\tmodel = Bert(\"bert-base-uncased\", num_labels=2)\n\n\t# Loading a fine-tuned model\n\tmodel = Bert(\"path/to/local/model/\")\n\n> \tSupply *num_labels* when using a pretrained model, as an untrained classification head is added when using this one of the DOWNLOADABLE strings.\n\n### Fine-tuning\nFinetuning a model is as simple as loading a dataset, instantiating a model and then passing it to the models *fit* function. \n\n from models.import Bert\n\n\tX_train = [\n\t\t\"Example sentence 1\"\n\t\t\"Example sentence 2\"\n\t\t\"Today was a horrible day\"\n\t]\n\ty_train = [1,1,0]\n\n\tmodel = Bert('bert-base-uncased', num_labels=2)\n\tmodel.fit(X_train, y_train)\n\n### Saving\nAt the conclusion of training you will most likely want to save your model to disk. Simply call the the models *save* function and supply an output directory and name for the model.\n\n from models.import Bert\n\tmodel = Bert('bert-base-uncased', num_labels=2)\n\tmodel.save(\"path/to/output/dir/\", \"example_save_name\")\n\n### Evaluation \nModel evaluation is basically the same as model training. 
## Citation\n\nYou can cite the following paper for the Transformers library:\n```\n@article{Wolf2019HuggingFacesTS,\n title={HuggingFace's Transformers: State-of-the-art Natural Language Processing},\n author={Thomas Wolf and Lysandre Debut and Victor Sanh and Julien Chaumond and Clement Delangue and Anthony Moi and Pierric Cistac and Tim Rault and Rémi Louf and Morgan Funtowicz and Jamie Brew},\n journal={ArXiv},\n year={2019},\n volume={abs/1910.03771}\n}\n```\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/knuddj1/op_text", "keywords": "NLP deep learning transformer pytorch BERT GPT GPT-2 google openai CMU sentiment text", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "op-text", "package_url": "https://pypi.org/project/op-text/", "platform": "", "project_url": "https://pypi.org/project/op-text/", "project_urls": { "Homepage": "https://github.com/knuddj1/op_text" }, "release_url": "https://pypi.org/project/op-text/0.2.0/", "requires_dist": [ "numpy", "torch (>=1.0.0)", "boto3", "requests", "tqdm", "regex", "sentencepiece", "pytorch-transformers" ], "requires_python": ">=3.6", "summary": "Thin wrapper around HuggingFace Transformers sequence classification models for ease of use", "version": "0.2.0", "yanked": false, "yanked_reason": null }, "last_serial": 6115016, "releases": { "0.2.0": [ { "comment_text": "", "digests": { "md5": "bc787066140a202af3b017aaf9e1664f", "sha256": "b8e0ae445201ed2ff6f6aee5e093bd5f6668ef4f23e6e11033fa35dd1e26f04b" }, "downloads": -1, "filename": "op_text-0.2.0-py3-none-any.whl", "has_sig": false, "md5_digest": "bc787066140a202af3b017aaf9e1664f", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 11355, "upload_time": "2019-11-11T08:44:04", "upload_time_iso_8601": "2019-11-11T08:44:04.546438Z", "url": "https://files.pythonhosted.org/packages/b0/f8/e27daa6a5a245424e078bd351d7e6bd158d90e8139a706bdf067026308f2/op_text-0.2.0-py3-none-any.whl", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "e4cdf237b3e0b1641d0c9c9233d30d19", "sha256": "17dc44493933570dda22c08f895431a818df0d1d5f55073bd7df7ab913db0e8e" }, "downloads": -1, "filename": "op_text-0.2.0.tar.gz", "has_sig": false, "md5_digest": "e4cdf237b3e0b1641d0c9c9233d30d19", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 10502, "upload_time": "2019-11-11T08:44:06", "upload_time_iso_8601": "2019-11-11T08:44:06.734087Z", "url": 
"https://files.pythonhosted.org/packages/75/55/5941a124af4aa54b4e50a60a8192cf470e328075106543765a0dcca51f42/op_text-0.2.0.tar.gz", "yanked": false, "yanked_reason": null } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "bc787066140a202af3b017aaf9e1664f", "sha256": "b8e0ae445201ed2ff6f6aee5e093bd5f6668ef4f23e6e11033fa35dd1e26f04b" }, "downloads": -1, "filename": "op_text-0.2.0-py3-none-any.whl", "has_sig": false, "md5_digest": "bc787066140a202af3b017aaf9e1664f", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 11355, "upload_time": "2019-11-11T08:44:04", "upload_time_iso_8601": "2019-11-11T08:44:04.546438Z", "url": "https://files.pythonhosted.org/packages/b0/f8/e27daa6a5a245424e078bd351d7e6bd158d90e8139a706bdf067026308f2/op_text-0.2.0-py3-none-any.whl", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "e4cdf237b3e0b1641d0c9c9233d30d19", "sha256": "17dc44493933570dda22c08f895431a818df0d1d5f55073bd7df7ab913db0e8e" }, "downloads": -1, "filename": "op_text-0.2.0.tar.gz", "has_sig": false, "md5_digest": "e4cdf237b3e0b1641d0c9c9233d30d19", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 10502, "upload_time": "2019-11-11T08:44:06", "upload_time_iso_8601": "2019-11-11T08:44:06.734087Z", "url": "https://files.pythonhosted.org/packages/75/55/5941a124af4aa54b4e50a60a8192cf470e328075106543765a0dcca51f42/op_text-0.2.0.tar.gz", "yanked": false, "yanked_reason": null } ], "vulnerabilities": [] }