{
    "info": {
        "author": "Tae Hwan Jung(@graykode)",
        "author_email": "nlkey2022@gmail.com",
        "bugtrack_url": null,
        "classifiers": [],
        "description": "## TOEIC-BERT\n\n### 76% Correct rate with ONLY Pre-Trained BERT model in TOEIC!!\n\n\n\nThis is project as topic: `TOEIC(Test of English for International Communication) problem solving using pytorch-pretrained-BERT model.` The reason why I used huggingface's [pytorch-pretrained-BERT model](<https://github.com/huggingface/pytorch-pretrained-BERT>) is for pre-training or to do fine-tune more easily.  **I've solved the only blank problem, not the whole problem.** There are two types of blank issues:\n\n1. Selecting Correct Grammar Type.\n\n```\nQ) The teacher had me _________ scales several times a day.\n  1. play (Answer)\n  2. to play\n  3. played\n  4. playing\n```\n\n2. Selecting Correct Vocabulary Type.\n\n```\nQ) The wet weather _________ her from going shopping.\n  1. interrupted\n  2. obstructed\n  3. impeded\n  4. discouraged (Answer)\n```\n\n\n\n#### Why BERT?\n\nIn pretrained BERT, It contains contextual information. So It can find more contextual or grammatical sentences, not clear, a little bit. I was inspired by grammar checker from [blog post](<https://www.scribendi.ai/can-we-use-bert-as-a-language-model-to-assign-score-of-a-sentence/>).\n\n> [Can We Use BERT as a Language Model to Assign a Score to a Sentence?](<https://www.scribendi.ai/can-we-use-bert-as-a-language-model-to-assign-score-of-a-sentence/>)\n>\n> BERT uses a bidirectional encoder to encapsulate a sentence from left to right and from right to left. Thus, it learns two representations of each word-one from left to right and one from right to left-and then concatenates them for many downstream tasks.\n\n\n\n## Evaluation\n\n<p align=\"center\"><img width=\"500\" src=\"images/baseline.gif\" /></p>\n\nI had evaluated with only **pretrained BERT model(not fine-tuning)** to check grammatical or lexical error. Above mathematical expression, `X` is a question sentence. and `n` is number of questions : `{a, b, c, d}`. 
`C` is the subset of answer candidate tokens: for `warranty`, `C` is `['warrant', '##y']`. `V` is the whole vocabulary.\n\nCandidates made of more than one token are a problem. I solved this by averaging the values over the candidate's tokens, e.g. `is being formed` as `['is', 'being', 'formed']`.\n\nThen, we take the argmax over `L_n(T_n)`.\n\n![](images/prediction.gif)\n\n```python\npredictions = model(question_tensors, segment_tensors)\n\n# predictions : [batch_size, sequence_length, vocab_size]\npredictions_candidates = predictions[0, masked_index, candidate_ids].mean()\n```\n\n\n\n#### Result of Evaluation.\n\nFantastic result with **only the pretrained BERT model**\n\n- `bert-base-uncased`: 12-layer, 768-hidden, 12-heads, 110M parameters\n- `bert-large-uncased`: 24-layer, 1024-hidden, 16-heads, 340M parameters\n- `bert-base-cased`: 12-layer, 768-hidden, 12-heads, 110M parameters\n- `bert-large-cased`: 24-layer, 1024-hidden, 16-heads, 340M parameters\n\nTotal 7067 datasets: made deterministic with `model.eval()`\n\n|             | bert-base-uncased | bert-base-cased | bert-large-uncased | bert-large-cased |\n| :---------: | :---------------: | :-------------: | :----------------: | :--------------: |\n| Correct Num |       5192        |      5398       |        5321        |       5148       |\n|   Percent   |      73.46%       |     76.38%      |       75.29%       |      72.84%      |\n\n\n\n## Quick Start with Python pip Package.\n\n**Start with pip**\n\n```shell\n$ pip install toeicbert\n```\n\n\n\n**Run & Option**\n\n```shell\n$ python toeicbert -m bert-base-uncased -f test.json\n```\n\n- `-m, --model` : BERT model name in huggingface's pytorch-pretrained-BERT : `bert-base-uncased`, `bert-large-uncased`, `bert-base-cased`, `bert-large-cased`.\n\n- `-f, --file` : JSON file to evaluate; for the JSON format, see [test.json](test.json). 
\n\n  **The keys `question`, `1`, `2`, `3`, `4` are required; `answer` is optional.**\n\n  `_` in the question will be replaced with `[MASK]`.\n\n```json\n{\n    \"1\" : {\n        \"question\" : \"The teacher had me _ scales several times a day.\",\n        \"answer\" : \"play\",\n        \"1\" : \"play\",\n        \"2\" : \"to play\",\n        \"3\" : \"played\",\n        \"4\" : \"playing\"\n    },\n    \"2\" : {\n        \"question\" : \"The teacher had me _ scales several times a day.\",\n        \"1\" : \"play\",\n        \"2\" : \"to play\",\n        \"3\" : \"played\",\n        \"4\" : \"playing\"\n    }\n}\n```\n\n\n\n## Author\n\n- Tae Hwan Jung (Jeff Jung) @graykode, Kyung Hee Univ CE (Undergraduate).\n- Author Email : [nlkey2022@gmail.com](mailto:nlkey2022@gmail.com)\n\nThanks to Hwan Suk Gang (Kyung Hee Univ.) for collecting the dataset (`7114` datasets)",
        "description_content_type": "text/markdown",
        "docs_url": null,
        "download_url": "",
        "downloads": {
            "last_day": -1,
            "last_month": -1,
            "last_week": -1
        },
        "home_page": "https://github.com/graykode/toeicbert",
        "keywords": "BERT TOEIC pytorch-pretrained-BERT bert nlp NLP",
        "license": "MIT",
        "maintainer": "",
        "maintainer_email": "",
        "name": "toeicbert",
        "package_url": "https://pypi.org/project/toeicbert/",
        "platform": "",
        "project_url": "https://pypi.org/project/toeicbert/",
        "project_urls": {
            "Homepage": "https://github.com/graykode/toeicbert"
        },
        "release_url": "https://pypi.org/project/toeicbert/0.0.2/",
        "requires_dist": null,
        "requires_python": "",
        "summary": "TOEIC blank problem solving using pytorch-pretrained-BERT model.",
        "version": "0.0.2"
    },
    "last_serial": 5204358,
    "releases": {
        "0.0.1": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "6f6e6a7ee0067f35f242bc45eac1e09a",
                    "sha256": "05e68d9d0cd60c73f60c6759d493c65a82fbe49e75600e9f1c4e5ca4b80c95e4"
                },
                "downloads": -1,
                "filename": "toeicbert-0.0.1.tar.gz",
                "has_sig": false,
                "md5_digest": "6f6e6a7ee0067f35f242bc45eac1e09a",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 4447,
                "upload_time": "2019-04-29T18:00:59",
                "url": "https://files.pythonhosted.org/packages/30/3c/5e46abf8cd3df21776889e25b254cd2083ee28b57172eb51d5cc7cffad1d/toeicbert-0.0.1.tar.gz"
            }
        ],
        "0.0.2": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "1943c684df30dde25cbc3ef28212b779",
                    "sha256": "547824420c9ecf7a55de546d79dcd5eeb6a3faf62f6ba61d8f327c6b0dde4b2c"
                },
                "downloads": -1,
                "filename": "toeicbert-0.0.2.tar.gz",
                "has_sig": false,
                "md5_digest": "1943c684df30dde25cbc3ef28212b779",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 4443,
                "upload_time": "2019-04-29T18:06:03",
                "url": "https://files.pythonhosted.org/packages/9e/04/58275d8074cf90f66cf8dfe867768970b71c9740bf7522058f3724e82b2a/toeicbert-0.0.2.tar.gz"
            }
        ]
    },
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "1943c684df30dde25cbc3ef28212b779",
                "sha256": "547824420c9ecf7a55de546d79dcd5eeb6a3faf62f6ba61d8f327c6b0dde4b2c"
            },
            "downloads": -1,
            "filename": "toeicbert-0.0.2.tar.gz",
            "has_sig": false,
            "md5_digest": "1943c684df30dde25cbc3ef28212b779",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 4443,
            "upload_time": "2019-04-29T18:06:03",
            "url": "https://files.pythonhosted.org/packages/9e/04/58275d8074cf90f66cf8dfe867768970b71c9740bf7522058f3724e82b2a/toeicbert-0.0.2.tar.gz"
        }
    ]
}