{
    "info": {
        "author": "Andriy Mulyar, Elliot Schumacher, Masoud Rouhizadeh and Mark Dredze",
        "author_email": "contact@andriymulyar.com",
        "bugtrack_url": null,
        "classifiers": [
            "Development Status :: 4 - Beta",
            "Intended Audience :: Science/Research",
            "License :: OSI Approved :: MIT License",
            "Natural Language :: English",
            "Programming Language :: Python :: 3.5",
            "Topic :: Text Processing :: Linguistic"
        ],
        "description": "# :book: BERT Long Document Classification :book:\nAn easy-to-use interface to fully trained BERT-based models for multi-class and multi-label long document classification.\n\nPre-trained models are currently available for two clinical note (EHR) phenotyping tasks: smoker identification and obesity detection.\n\nTo sustain future development and improvements, we interface with [pytorch-transformers](https://github.com/huggingface/pytorch-transformers)\nfor all language model components of our architectures. Additionally, there is a [blog post](http://andriymulyar.com/blog/bert-document-classification) describing the architecture.\n\n| Model                  | Dataset                                              | # Labels | Evaluation F1 |\n|------------------------|------------------------------------------------------|----------|---------------|\n| n2c2_2006_smoker_lstm  | I2B2 2006: Smoker Identification                     | 4        | 0.981         |\n| n2c2_2008_obesity_lstm | I2B2 2008: Obesity and Co-morbidities Identification | 15       | 0.997         |\n\n# Installation\n\nInstall with pip:\n\n```\npip install bert_document_classification\n```\n\nor directly from GitHub:\n\n```\npip install git+https://github.com/AndriyMulyar/bert_document_classification\n```\n\n# Use\nMaps text documents of arbitrary length to binary vectors indicating labels.\n```python\nfrom bert_document_classification.models import SmokerPhenotypingBert\nfrom bert_document_classification.models import ObesityPhenotypingBert\n\nsmoking_classifier = SmokerPhenotypingBert(device='cuda', batch_size=10)  # defaults to GPU prediction\n\nobesity_classifier = ObesityPhenotypingBert(device='cpu', batch_size=10)  # or CPU if you would like\n\nsmoking_classifier.predict([\"I'm a document! Make me long and the model can still perform well!\"])\n```\nMore [examples](/examples).\n\n# Notes\n- Training requires a GPU.\n- For bulk inference where speed is not a concern, a machine with plenty of available memory and CPU cores will likely work.\n- Model downloads are cached in `~/.cache/torch/bert_document_classification/`. Try clearing this folder if you have issues.\n\n# Acknowledgement\nIf you find this project useful, consider citing our extended abstract accepted at NeurIPS 2019 ML4Health.\n\n```\nFormat bibtex citation\n```\n\nImplementation, development and training in this project were supported by funding from the Mark Dredze Lab at Johns Hopkins University.\n",
        "description_content_type": "text/markdown",
        "docs_url": null,
        "download_url": "",
        "downloads": {
            "last_day": -1,
            "last_month": -1,
            "last_week": -1
        },
        "home_page": "https://github.com/AndriyMulyar/bert_document_classification",
        "keywords": "BERT,document classification",
        "license": "MIT",
        "maintainer": "",
        "maintainer_email": "",
        "name": "bert-document-classification",
        "package_url": "https://pypi.org/project/bert-document-classification/",
        "platform": "",
        "project_url": "https://pypi.org/project/bert-document-classification/",
        "project_urls": {
            "Homepage": "https://github.com/AndriyMulyar/bert_document_classification"
        },
        "release_url": "https://pypi.org/project/bert-document-classification/1.0.0/",
        "requires_dist": [
            "pytorch-transformers",
            "torch",
            "configargparse",
            "scikit-learn"
        ],
        "requires_python": "",
        "summary": "long document classification with language models",
        "version": "1.0.0"
    },
    "last_serial": 5933482,
    "releases": {
        "1.0.0": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "ceebce09c73cabbd6a834976d6fbffc0",
                    "sha256": "4d4559fa8e15d2fb800cedfdc79c14266d7b325c31ed084564ddec3707217480"
                },
                "downloads": -1,
                "filename": "bert_document_classification-1.0.0-py3-none-any.whl",
                "has_sig": false,
                "md5_digest": "ceebce09c73cabbd6a834976d6fbffc0",
                "packagetype": "bdist_wheel",
                "python_version": "py3",
                "requires_python": null,
                "size": 18666,
                "upload_time": "2019-10-06T01:40:42",
                "url": "https://files.pythonhosted.org/packages/f9/e0/bfce41dcb17179d538c46093e04a8925b63c913dae9a269aca51b0e2d701/bert_document_classification-1.0.0-py3-none-any.whl"
            },
            {
                "comment_text": "",
                "digests": {
                    "md5": "3d1a7e85dd8fb3e5709e3a34f6e2317b",
                    "sha256": "74e91b3932fa34cb9008170d57c219e65a0178b800ea6928f601c6153f193450"
                },
                "downloads": -1,
                "filename": "bert_document_classification-1.0.0.tar.gz",
                "has_sig": false,
                "md5_digest": "3d1a7e85dd8fb3e5709e3a34f6e2317b",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 16294,
                "upload_time": "2019-10-06T01:40:44",
                "url": "https://files.pythonhosted.org/packages/04/cf/7d774c7b9eef0f0f8299ca0a3942133c1460d9a6262e6eb0ccb07f90419d/bert_document_classification-1.0.0.tar.gz"
            }
        ]
    },
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "ceebce09c73cabbd6a834976d6fbffc0",
                "sha256": "4d4559fa8e15d2fb800cedfdc79c14266d7b325c31ed084564ddec3707217480"
            },
            "downloads": -1,
            "filename": "bert_document_classification-1.0.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "ceebce09c73cabbd6a834976d6fbffc0",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 18666,
            "upload_time": "2019-10-06T01:40:42",
            "url": "https://files.pythonhosted.org/packages/f9/e0/bfce41dcb17179d538c46093e04a8925b63c913dae9a269aca51b0e2d701/bert_document_classification-1.0.0-py3-none-any.whl"
        },
        {
            "comment_text": "",
            "digests": {
                "md5": "3d1a7e85dd8fb3e5709e3a34f6e2317b",
                "sha256": "74e91b3932fa34cb9008170d57c219e65a0178b800ea6928f601c6153f193450"
            },
            "downloads": -1,
            "filename": "bert_document_classification-1.0.0.tar.gz",
            "has_sig": false,
            "md5_digest": "3d1a7e85dd8fb3e5709e3a34f6e2317b",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 16294,
            "upload_time": "2019-10-06T01:40:44",
            "url": "https://files.pythonhosted.org/packages/04/cf/7d774c7b9eef0f0f8299ca0a3942133c1460d9a6262e6eb0ccb07f90419d/bert_document_classification-1.0.0.tar.gz"
        }
    ]
}