{ "info": { "author": "Andriy Mulyar, Elliot Schumacher, Masoud Rouhizadeh and Mark Dredze", "author_email": "contact@andriymulyar.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Intended Audience :: Science/Research", "License :: OSI Approved :: MIT License", "Natural Language :: English", "Programming Language :: Python :: 3.5", "Topic :: Text Processing :: Linguistic" ], "description": "# :book: BERT Long Document Classification :book:\nAn easy-to-use interface to fully trained BERT-based models for multi-class and multi-label long document classification.\n\nPre-trained models are currently available for two clinical note (EHR) phenotyping tasks: smoker identification and obesity detection.\n\nTo sustain future development and improvements, we interface with [pytorch-transformers](https://github.com/huggingface/pytorch-transformers)\nfor all language model components of our architectures. Additionally, there is a [blog post](http://andriymulyar.com/blog/bert-document-classification) describing the architecture.\n\n| Model | Dataset | # Labels | Evaluation F1 |\n|-------------------|------------------|--------|----------|\n| n2c2_2006_smoker_lstm | I2B2 2006: Smoker Identification | 4 | 0.981 |\n| n2c2_2008_obesity_lstm | I2B2 2008: Obesity and Co-morbidities Identification | 15 | 0.997 |\n\n# Installation\n\nInstall with pip:\n\n```\npip install bert_document_classification\n```\n\nor directly:\n\n```\npip install git+https://github.com/AndriyMulyar/bert_document_classification\n```\n\n# Use\nMaps text documents of arbitrary length to binary vectors indicating labels.\n```python\nfrom bert_document_classification.models import SmokerPhenotypingBert\nfrom bert_document_classification.models import ObesityPhenotypingBert\n\nsmoking_classifier = SmokerPhenotypingBert(device='cuda', batch_size=10) #defaults to GPU prediction\n\nobesity_classifier = ObesityPhenotypingBert(device='cpu', batch_size=10) #or CPU if you would 
like.\n\nsmoking_classifier.predict([\"I'm a document! Make me long and the model can still perform well!\"])\n```\nMore [examples](/examples).\n\n\n\n# Notes\n- For training you will need a GPU.\n- For bulk inference where speed is not a concern, a machine with plenty of available memory and CPU cores will likely work.\n- Model downloads are cached in `~/.cache/torch/bert_document_classification/`. Try clearing this folder if you have issues.\n\n\n\n# Acknowledgement\nIf you found this project useful, consider citing our extended abstract accepted at NeurIPS 2019 ML4Health.\n\n```\nFormat bibtex citation\n```\n\nImplementation, development and training in this project were supported by funding from the Mark Dredze Lab at Johns Hopkins University.\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/AndriyMulyar/bert_document_classification", "keywords": "BERT,document classification", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "bert-document-classification", "package_url": "https://pypi.org/project/bert-document-classification/", "platform": "", "project_url": "https://pypi.org/project/bert-document-classification/", "project_urls": { "Homepage": "https://github.com/AndriyMulyar/bert_document_classification" }, "release_url": "https://pypi.org/project/bert-document-classification/1.0.0/", "requires_dist": [ "pytorch-transformers", "torch", "configargparse", "scikit-learn" ], "requires_python": "", "summary": "long document classification with language models", "version": "1.0.0" }, "last_serial": 5933482, "releases": { "1.0.0": [ { "comment_text": "", "digests": { "md5": "ceebce09c73cabbd6a834976d6fbffc0", "sha256": "4d4559fa8e15d2fb800cedfdc79c14266d7b325c31ed084564ddec3707217480" }, "downloads": -1, "filename": "bert_document_classification-1.0.0-py3-none-any.whl", "has_sig": false, "md5_digest": 
"ceebce09c73cabbd6a834976d6fbffc0", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 18666, "upload_time": "2019-10-06T01:40:42", "url": "https://files.pythonhosted.org/packages/f9/e0/bfce41dcb17179d538c46093e04a8925b63c913dae9a269aca51b0e2d701/bert_document_classification-1.0.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "3d1a7e85dd8fb3e5709e3a34f6e2317b", "sha256": "74e91b3932fa34cb9008170d57c219e65a0178b800ea6928f601c6153f193450" }, "downloads": -1, "filename": "bert_document_classification-1.0.0.tar.gz", "has_sig": false, "md5_digest": "3d1a7e85dd8fb3e5709e3a34f6e2317b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 16294, "upload_time": "2019-10-06T01:40:44", "url": "https://files.pythonhosted.org/packages/04/cf/7d774c7b9eef0f0f8299ca0a3942133c1460d9a6262e6eb0ccb07f90419d/bert_document_classification-1.0.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "ceebce09c73cabbd6a834976d6fbffc0", "sha256": "4d4559fa8e15d2fb800cedfdc79c14266d7b325c31ed084564ddec3707217480" }, "downloads": -1, "filename": "bert_document_classification-1.0.0-py3-none-any.whl", "has_sig": false, "md5_digest": "ceebce09c73cabbd6a834976d6fbffc0", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 18666, "upload_time": "2019-10-06T01:40:42", "url": "https://files.pythonhosted.org/packages/f9/e0/bfce41dcb17179d538c46093e04a8925b63c913dae9a269aca51b0e2d701/bert_document_classification-1.0.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "3d1a7e85dd8fb3e5709e3a34f6e2317b", "sha256": "74e91b3932fa34cb9008170d57c219e65a0178b800ea6928f601c6153f193450" }, "downloads": -1, "filename": "bert_document_classification-1.0.0.tar.gz", "has_sig": false, "md5_digest": "3d1a7e85dd8fb3e5709e3a34f6e2317b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 16294, "upload_time": "2019-10-06T01:40:44", "url": 
"https://files.pythonhosted.org/packages/04/cf/7d774c7b9eef0f0f8299ca0a3942133c1460d9a6262e6eb0ccb07f90419d/bert_document_classification-1.0.0.tar.gz" } ] }