{ "info": { "author": "ltskinner", "author_email": "", "bugtrack_url": null, "classifiers": [], "description": "# critical-path-nlp\nTools for adapting universal language models to specifc tasks\n\nPlease note these are tools for rapid prototyping - not brute force hyperparameter tuning.\n\nAdapted from: [Google's BERT](https://github.com/google-research/bert)\n\n# Installation\n1. Clone repo (just for now - will be on pip soon)\n2. Download a pretrained [BERT model](https://github.com/google-research/bert#pre-trained-models) - start with **BERT-Base Uncased** if you're not sure where to begin\n3. Unzip the model and make note of the path\n\n# Examples\n* **Full implementation examples can be found here:**\n + [**SQuAD example**](../master/bert_squad_example.py)\n + [**Multi-Label Classification example**](../master/bert_multilabel_example.py)\n + [**Single-Label Classification example**](../master/bert_classifier_example.py)\n\n## Current Capabilities\n\n### BERT for Question Answering\n\n* Train and evaluate the SQuAD dataset\n + [SQuAD 2.0 - Stanford Question Answering Dataset](https://rajpurkar.github.io/SQuAD-explorer/)\n\n### BERT for Multi-Label Classification\n\n* Train and evaluate custom datasets for multi-label classification tasks (multiple labels possible)\n + [Kaggle - Google Toxic Comment Classification Challange](https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge)\n\n### BERT for Single-Label Classification\n\n* Train and evaluate custom datasets for single-label classification tasks (one label possible)\n + [CoLA - Corpus of Linguistic Acceptability](https://nyu-mll.github.io/CoLA/)\n + [MPRC - Microsoft Research Paraphrase Corpus](http://nlpprogress.com/english/semantic_textual_similarity.html)\n + [MNLI - Multi-Genre Natural Language Inference](https://www.nyu.edu/projects/bowman/multinli/)\n + etc.\n\n## Future Capabilities\n\n### GPT-2 Training and Generation\n\n# Usage + Core Components\n### Configuring BERT\n#### First, define the model paths\n\n\n```python \n\nbase_model_folder_path = \"../models/uncased_L-12_H-768_A-12/\" # Folder containing downloaded Base Model\nname_of_config_json_file = \"bert_config.json\" # Located inside the Base Model folder\nname_of_vocab_file = \"vocab.txt\" # Located inside the Base Model folder\n\noutput_directory = \"../models/trained_BERT/\" # Trained model and results landing folder\n\n# Multi-Label and Single-Label Specific\ndata_dir = None # Directory .tsv data is stored in - typically for CoLA/MPRC or other datasets with known structure\n\n```\n\n#### Second, define the model run parameters\n\n```python\n\n\"\"\"Settable parameters and their default values\n\nNote: Most default values are perfectly fine\n\"\"\"\n\n# Administrative\ninit_checkpoint = None\nsave_checkpoints_steps = 1000\niterations_per_loop = 1000\ndo_lower_case = True \n\n# Technical\nbatch_size_train = 32\nbatch_size_eval = 8\nbatch_size_predict = 8\nnum_train_epochs = 3.0\nmax_seq_length = 128\nwarmup_proportion = 0.1\nlearning_rate = 3e-5\n\n# SQuAD Specific\ndoc_stride = 128\nmax_query_length = 64\nn_best_size = 20\nmax_answer_length = 30\nis_squad_v2 = False # SQuAD 2.0 has examples with no answer, aka \"impossible\", SQuAD 1.0 does not\nverbose_logging = False\nnull_score_diff_threshold = 0.0\n\n```\n#### Initialize the configuration handler\n```python\n\nfrom critical_path.BERT.configs import ConfigClassifier\n\nFlags = ConfigClassifier()\nFlags.set_model_paths(\n bert_config_file=base_model_folder_path + name_of_config_json_file,\n bert_vocab_file=base_model_folder_path + name_of_vocab_file,\n bert_output_dir=output_folder_path,\n data_dir=data_dir)\n\nFlags.set_model_params(\n batch_size_train=8, \n max_seq_length=256,\n num_train_epochs=3)\n\n# Retrieve a handle for the configs\nFLAGS = Flags.get_handle()\n```\n\n**A single 1070GTX using BERT-Base Uncased can handle**\n\n| Model | max_seq_len | batch_size |\n| ----------------- |:-----------:| ----------:|\n| BERT-Base Uncased | 256 | 6 |\n| ... | 384 | 4 |\n\nFor full batch size and sequence length guidelines see Google's [recommendations](https://github.com/google-research/bert#out-of-memory-issues)\n\n### Using Configured Model\n#### First, create a new model with the configured parameters\n```python\n\n\"\"\"For Multi-Label Classification\"\"\"\nfrom critical_path.BERT.model_multilabel_class import MultiLabelClassifier\n\nmodel = MultiLabelClassifier(FLAGS)\n\n```\n\n#### Second, load your data source\n* SQuAD has dedicated dataloaders\n + **read_squad_examples(), write_squad_predictions()** in [/BERT/model_squad](../master/critical_path/BERT/model_squad.py)\n* Multi-Label Classification has a generic dataloader\n + **DataProcessor** in [/BERT/model_multilabel_class](../master/critical_path/BERT/model_multilabel_class.py)\n + **Note:** This requires data labels to be in string format\n + ```python\n labels = [\n [\"label_1\", \"label_2\", \"label_3\"],\n [\"label_2\"]\n ]\n ```\n* Single-Label Classification dataloaders\n + **ColaProcessor** is implemented in [/BERT/model_classifier](../master/critical_path/BERT/model_classifier.py)\n + More dataloader formats have been done by [pytorch-pretrained-BERT](https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/examples/run_classifier.py)\n\n```python\n\n\"\"\"For Multi-Label Classification with a custom .csv reading function\"\"\"\nfrom critical_path.BERT.model_multilabel_class import DataProcessor\n\n# read_data is dataset specifc - see /bert_multilabel_example.py\ninput_ids, input_text, input_labels, label_list = read_toxic_data(randomize=True)\n\nprocessor = DataProcessor(label_list=label_list)\ntrain_examples = processor.get_samples(\n input_ids=input_ids,\n input_text=input_text,\n input_labels=input_labels,\n set_type='train')\n\n```\n\n#### Third, run your task\n```python\n\n\"\"\"Train and predict a Multi-Label Classifier\"\"\"\n\nif do_train:\n model.train(train_examples, label_list)\n\nif do_predict:\n model.predict(predict_examples, label_list)\n\n```\n\n# For full examples please see:\n* **Full implementations:**\n + [**SQuAD example**](../master/bert_squad_example.py)\n + [**Multi-Label Classification example**](../master/bert_multilabel_example.py)\n + [**Single-Label Classification example**](../master/bert_classifier_example.py)\n\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/ltskinner/critical-path-nlp", "keywords": "BERT google transformer squad SQuAD nlp", "license": "Apache", "maintainer": "", "maintainer_email": "", "name": "critical-path", "package_url": "https://pypi.org/project/critical-path/", "platform": "", "project_url": "https://pypi.org/project/critical-path/", "project_urls": { "Homepage": "https://github.com/ltskinner/critical-path-nlp" }, "release_url": "https://pypi.org/project/critical-path/0.0.9/", "requires_dist": [ "tensorflow" ], "requires_python": "", "summary": "Tools for adapting universal language models to specifc tasks", "version": "0.0.9" }, "last_serial": 5389255, "releases": { "0.0.1": [ { "comment_text": "", "digests": { "md5": "f2991d3ac334956538624696b574ba8f", "sha256": "3cbcd8d36416259ba4eda70d1e4d4534f15c5c77518645c47e66ff0f4c509831" }, "downloads": -1, "filename": "critical_path-0.0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "f2991d3ac334956538624696b574ba8f", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 7661, "upload_time": "2019-06-12T01:40:52", "url": "https://files.pythonhosted.org/packages/4e/78/bf3d22bf5d59086cc225bdea2c5bcdab00d3f73c28257f4e54d8fa55ebb9/critical_path-0.0.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "782b2dc01db8d89e5ebcbff1264fc104", "sha256": "760c255d619642289d318fb4f7fa8ef4d4f06d6e51ced0012eee6f281dda3ca7" }, "downloads": -1, "filename": "critical_path-0.0.1.tar.gz", "has_sig": false, "md5_digest": "782b2dc01db8d89e5ebcbff1264fc104", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4032, "upload_time": "2019-06-12T01:40:54", "url": "https://files.pythonhosted.org/packages/6f/f3/414cf4f936f7b6f7209fe6526da0283aee658e3fad6e537c52390a3a14ff/critical_path-0.0.1.tar.gz" } ], "0.0.2": [ { "comment_text": "", "digests": { "md5": "d736a9e478854f7ff3dc22e54957e4fc", "sha256": "e1177a46627faa96e91cf745f69bd5f547685309b6979e4a235a5ecde1227b26" }, "downloads": -1, "filename": "critical_path-0.0.2-py3-none-any.whl", "has_sig": false, "md5_digest": "d736a9e478854f7ff3dc22e54957e4fc", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 7855, "upload_time": "2019-06-12T01:58:51", "url": "https://files.pythonhosted.org/packages/52/a1/7e708f9ffc1df746dc9319394137a5ba7c3ec0fe9a1081752fdc941c33f4/critical_path-0.0.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "8d1f1682fc2dbc5fb5679756b4a16b30", "sha256": "51821a3aa111ed10f848d92834506439847b09cc9d3633d66f48c24025d0595a" }, "downloads": -1, "filename": "critical_path-0.0.2.tar.gz", "has_sig": false, "md5_digest": "8d1f1682fc2dbc5fb5679756b4a16b30", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4085, "upload_time": "2019-06-12T01:58:52", "url": "https://files.pythonhosted.org/packages/12/99/d29d962dfd9d1a0fac4c7011c25866675664ab5e410fc98e02aebe038d27/critical_path-0.0.2.tar.gz" } ], "0.0.3": [ { "comment_text": "", "digests": { "md5": "e6f94cdd4dd920f08b6b1435a1d57c79", "sha256": "14b98d14ff585e7493cf4c6156bebc008218e27ee5d7c33ecc94d6a3c3a36c41" }, "downloads": -1, "filename": "critical_path-0.0.3-py3-none-any.whl", "has_sig": false, "md5_digest": "e6f94cdd4dd920f08b6b1435a1d57c79", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 7859, "upload_time": "2019-06-12T02:01:15", "url": "https://files.pythonhosted.org/packages/96/e0/004d2ba365e60dd1f94928ae45c5fb06911646ae207e97a72e037943f1d4/critical_path-0.0.3-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "e5f8f0966d9a412ddbb518dcae1e1763", "sha256": "10ca09188031760f145b7bc1adcb0243a3987a4c46bee547bef277f8944376a2" }, "downloads": -1, "filename": "critical_path-0.0.3.tar.gz", "has_sig": false, "md5_digest": "e5f8f0966d9a412ddbb518dcae1e1763", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4081, "upload_time": "2019-06-12T02:01:16", "url": "https://files.pythonhosted.org/packages/94/cf/b26ccf6842f36049cfcf87d40d5b18e599ef7df59c78c18ba8ce0ffd5f27/critical_path-0.0.3.tar.gz" } ], "0.0.4": [ { "comment_text": "", "digests": { "md5": "6f7f27301fdbb240171c19647d08b854", "sha256": "35944e33221d7ebb9e6a65d4c17550bd9e78984ad9e5adfc47c696ce026c6b7e" }, "downloads": -1, "filename": "critical_path-0.0.4-py3-none-any.whl", "has_sig": false, "md5_digest": "6f7f27301fdbb240171c19647d08b854", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 7934, "upload_time": "2019-06-12T02:16:04", "url": "https://files.pythonhosted.org/packages/2e/86/5e8a47a33d991100c63f70f0b32d024ac619846dcbfc321f96a1bf7aab31/critical_path-0.0.4-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "40ea3a58039a846a4a3fe515836d80f4", "sha256": "dfcc23d6ba1f328ad5035e47ff1b0f43e492ab639ae1b09f006e4641a384e91a" }, "downloads": -1, "filename": "critical_path-0.0.4.tar.gz", "has_sig": false, "md5_digest": "40ea3a58039a846a4a3fe515836d80f4", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4124, "upload_time": "2019-06-12T02:16:05", "url": "https://files.pythonhosted.org/packages/73/a0/ebc11ae6308f1c5ccbe85a04450d06964147cae946197e2af14f09f11e4d/critical_path-0.0.4.tar.gz" } ], "0.0.5": [ { "comment_text": "", "digests": { "md5": "f92d51b689032a6269755de2c5911600", "sha256": "aa15e7e0d995c0859465b361bd1eeef387d84a7b308aa3cd7d2d03f683bff4f9" }, "downloads": -1, "filename": "critical_path-0.0.5-py3-none-any.whl", "has_sig": false, "md5_digest": "f92d51b689032a6269755de2c5911600", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 7941, "upload_time": "2019-06-12T02:55:09", "url": "https://files.pythonhosted.org/packages/f8/d7/4e3092e57178c75e1065a618d614ed7d6a570abb35911fa0af7651768fdd/critical_path-0.0.5-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "07ccb21f8fa5b4dcccffdcbd873efcd4", "sha256": "26a4f1907ef2b02b66b7d6daa38b2d026f43cc0671f105b08e772cb9f9b084f3" }, "downloads": -1, "filename": "critical_path-0.0.5.tar.gz", "has_sig": false, "md5_digest": "07ccb21f8fa5b4dcccffdcbd873efcd4", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4123, "upload_time": "2019-06-12T02:55:10", "url": "https://files.pythonhosted.org/packages/d0/03/5393bd2df3ecf4d1ec41ef9c8023418078ae74427ed4d1654d254bee0b41/critical_path-0.0.5.tar.gz" } ], "0.0.6": [ { "comment_text": "", "digests": { "md5": "5b77c5667393bb042ff48d4745617046", "sha256": "4ed06b27e2edcf19944ebf0365a187d5470f5efed1899f8692a2cdf187fb54d7" }, "downloads": -1, "filename": "critical_path-0.0.6-py3-none-any.whl", "has_sig": false, "md5_digest": "5b77c5667393bb042ff48d4745617046", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 7962, "upload_time": "2019-06-12T03:09:02", "url": "https://files.pythonhosted.org/packages/8f/38/4a3a8da3b91bfae485c1b9133031233f8ce24f4f5efe94c8dcc488098769/critical_path-0.0.6-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "83bce9cf708665f9c2eb1440dd5713bc", "sha256": "288ae51db6a362c3da2275a6818ae7af63d6970e4091ccd11112e2fa18f4cac4" }, "downloads": -1, "filename": "critical_path-0.0.6.tar.gz", "has_sig": false, "md5_digest": "83bce9cf708665f9c2eb1440dd5713bc", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4193, "upload_time": "2019-06-12T03:09:04", "url": "https://files.pythonhosted.org/packages/a0/cb/c8b62740374efff5cd9411c27e5dca124968eb30afbba46de0fbcead2e6b/critical_path-0.0.6.tar.gz" } ], "0.0.7": [ { "comment_text": "", "digests": { "md5": "d773c0ed9b82add0b08d0a06d3844021", "sha256": "304c32ee06fa6d3b6878be94e8ddaea6148da7ebc1eb7255a4b5432953ec5bc3" }, "downloads": -1, "filename": "critical_path-0.0.7-py3-none-any.whl", "has_sig": false, "md5_digest": "d773c0ed9b82add0b08d0a06d3844021", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 53511, "upload_time": "2019-06-12T03:10:45", "url": "https://files.pythonhosted.org/packages/2b/8d/be8b42671ea15cc914e511ae436dd9f64b7cc8c407d1a16fc04e91439729/critical_path-0.0.7-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "7a9fe24e6da3c2481acba03e348bac0a", "sha256": "f470c45ea1b7fa7235faa481765bd917da0e63e0c768e60bcd78e5ca05a95bca" }, "downloads": -1, "filename": "critical_path-0.0.7.tar.gz", "has_sig": false, "md5_digest": "7a9fe24e6da3c2481acba03e348bac0a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 40174, "upload_time": "2019-06-12T03:10:47", "url": "https://files.pythonhosted.org/packages/4f/40/a1c97ab0842ff337a7d8b53c49d4788733047712d1863502cdf7e4ee0676/critical_path-0.0.7.tar.gz" } ], "0.0.8": [ { "comment_text": "", "digests": { "md5": "434c3fe23094ec8ab2fcb24b322f9de6", "sha256": "d5256bf2ae049b0d03709c9e1795472adfc25746ebf3a538e4baea1a6a75d483" }, "downloads": -1, "filename": "critical_path-0.0.8-py3-none-any.whl", "has_sig": false, "md5_digest": "434c3fe23094ec8ab2fcb24b322f9de6", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 53450, "upload_time": "2019-06-12T03:18:41", "url": "https://files.pythonhosted.org/packages/3b/3c/40b9437619a8729b9f1b292c8fd67e4911f0d240addb748110f9b9ce6d11/critical_path-0.0.8-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "ac74876af5d95310fd0367bad3bfab20", "sha256": "437a5bff8e2df7f221092b572cedf355c4de733c6eb58b01cd5b7d7d44970a91" }, "downloads": -1, "filename": "critical_path-0.0.8.tar.gz", "has_sig": false, "md5_digest": "ac74876af5d95310fd0367bad3bfab20", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 40102, "upload_time": "2019-06-12T03:18:43", "url": "https://files.pythonhosted.org/packages/bb/41/3605b7c00a8b8bd3c0dd06b69befda96ac8107f28562e5e47cb168eb3745/critical_path-0.0.8.tar.gz" } ], "0.0.9": [ { "comment_text": "", "digests": { "md5": "6bb14c462e014044d7deabf80204433b", "sha256": "9632300318eaa1482e696ad59275f65162a66586d2b12d60e1626169bcae95c9" }, "downloads": -1, "filename": "critical_path-0.0.9-py3-none-any.whl", "has_sig": false, "md5_digest": "6bb14c462e014044d7deabf80204433b", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 53428, "upload_time": "2019-06-12T03:20:35", "url": "https://files.pythonhosted.org/packages/66/18/5b839386fb54ceab87f3d768a8af565b5658c4efc6ef244d3254593d97dd/critical_path-0.0.9-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "3235b59eaeebb800def89894c2c55d7e", "sha256": "a3efe908e1005292ae2b2a047120cfc6d7d501244d68ab601cd89ef4127f5cd1" }, "downloads": -1, "filename": "critical_path-0.0.9.tar.gz", "has_sig": false, "md5_digest": "3235b59eaeebb800def89894c2c55d7e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 40071, "upload_time": "2019-06-12T03:20:37", "url": "https://files.pythonhosted.org/packages/db/2f/074d61f7de774c4074edfe9eb863799fe534f8788adb12822c63b65aa26c/critical_path-0.0.9.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "6bb14c462e014044d7deabf80204433b", "sha256": "9632300318eaa1482e696ad59275f65162a66586d2b12d60e1626169bcae95c9" }, "downloads": -1, "filename": "critical_path-0.0.9-py3-none-any.whl", "has_sig": false, "md5_digest": "6bb14c462e014044d7deabf80204433b", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 53428, "upload_time": "2019-06-12T03:20:35", "url": "https://files.pythonhosted.org/packages/66/18/5b839386fb54ceab87f3d768a8af565b5658c4efc6ef244d3254593d97dd/critical_path-0.0.9-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "3235b59eaeebb800def89894c2c55d7e", "sha256": "a3efe908e1005292ae2b2a047120cfc6d7d501244d68ab601cd89ef4127f5cd1" }, "downloads": -1, "filename": "critical_path-0.0.9.tar.gz", "has_sig": false, "md5_digest": "3235b59eaeebb800def89894c2c55d7e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 40071, "upload_time": "2019-06-12T03:20:37", "url": "https://files.pythonhosted.org/packages/db/2f/074d61f7de774c4074edfe9eb863799fe534f8788adb12822c63b65aa26c/critical_path-0.0.9.tar.gz" } ] }