{
    "info": {
        "author": "Patrice Lopez",
        "author_email": "patrice.lopez@science-miner.com",
        "bugtrack_url": null,
        "classifiers": [
            "License :: OSI Approved :: Apache Software License",
            "Operating System :: OS Independent",
            "Programming Language :: Python :: 3.5"
        ],
        "description": "<img align=\"right\" width=\"150\" height=\"150\" src=\"doc/cat-delft-small.jpg\">\n\n[![Build Status](https://travis-ci.org/kermitt2/delft.svg?branch=master)](https://travis-ci.org/kermitt2/delft)\n[![PyPI version](https://badge.fury.io/py/delft.svg)](https://badge.fury.io/py/delft)\n[![License](http://img.shields.io/:license-apache-blue.svg)](http://www.apache.org/licenses/LICENSE-2.0.html)\n\n\n\n# DeLFT\n\n__Work in progress !__\n\n__DeLFT__ (**De**ep **L**earning **F**ramework for **T**ext) is a Keras framework for text processing, covering sequence labelling (e.g. named entity tagging) and text classification (e.g. comment classification). This library re-implements standard state-of-the-art Deep Learning architectures.\n\nFrom the observation that most of the open source implementations using Keras are toy examples, our motivation is to develop a framework that can be efficient, scalable and more usable in a production environment (with all the known limitations of Python of course for this purpose). The benefits of DeLFT are:\n\n* Re-implement a variety of state-of-the-art deep learning architectures for both sequence labelling and text classification problems, including the usage of the recent [ELMo](https://allennlp.org/elmo) and [BERT](https://github.com/google-research/bert) contextualised embeddings, which can all be used within the same environment. For instance, this allows to reproduce under similar conditions the performance of all recent NER systems, and even improve most of them.\n\n* Reduce model size, in particular by removing word embeddings from them. For instance, the model for the toxic comment classifier went down from a size of 230 MB with embeddings to 1.8 MB. 
In practice the size of all the models of DeLFT is less than 2 MB, except for the Ontonotes 5.0 NER model which is 4.7 MB.\n\n* Use a dynamic data generator so that the training data do not need to fit completely in memory.\n\n* Load and manage efficiently an unlimited volume of pre-trained embeddings: instead of loading pre-trained embeddings in memory - which is horribly slow in Python and limits the number of embeddings to be used simultaneously - the pre-trained embeddings are compiled the first time they are accessed and stored efficiently in a LMDB database. This makes the pre-trained embeddings immediately \"warm\" (no load time), frees memory and allows any number of embeddings to be used with a negligible impact on runtime when using SSD.\n\nThe medium term goal is then to provide models with good performance (accuracy, runtime, compactness) also to production stacks such as Java/Scala and C++. A native Java integration of these deep learning models has been realized in [GROBID](https://github.com/kermitt2/grobid) via [JEP](https://github.com/ninia/jep).\n\nDeLFT has been tested with python 3.5, Keras 2.1 and Tensorflow 1.7+ as backend. At this stage, we do not guarantee that DeLFT will run with other versions of these libraries or with other Keras backends. As always, GPU(s) are required for decent training time: a GeForce GTX 1050 Ti for instance is absolutely OK without ELMo contextual embeddings. 
Using the ELMo or BERT Base model is fine with a GeForce GTX 1080 Ti.\n\n## Install\n\nGet the github repo:\n\n```sh\ngit clone https://github.com/kermitt2/delft\ncd delft\n```\n\nIt is advised to first set up a virtual environment to avoid falling into one of these gloomy python dependency marshlands:\n\n```sh\nvirtualenv --system-site-packages -p python3 env\nsource env/bin/activate\n```\n\nInstall the dependencies:\n\n```sh\npip3 install -r requirements.txt\n```\n\nDeLFT uses tensorflow 1.7 as backend, and will exploit your available GPU provided that CUDA (>=8.0) is properly installed. \n\nYou then need to download some pre-trained word embeddings and register their paths in the embedding registry. To exploit the provided models, we suggest:\n\n* _glove Common Crawl_ (2.2M vocab., cased, 300 dim. vectors): [glove-840B](http://nlp.stanford.edu/data/glove.840B.300d.zip)\n\n* _fasttext Common Crawl_ (2M vocab., cased, 300 dim. vectors): [fasttext-crawl](https://s3-us-west-1.amazonaws.com/fasttext-vectors/crawl-300d-2M.vec.zip)\n\n* _word2vec GoogleNews_ (3M vocab., cased, 300 dim. vectors): [word2vec](https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit?usp=sharing)\n\n* _fasttext_wiki_fr_ (1.1M, NOT cased, 300 dim. vectors) for French: [wiki.fr](https://s3-us-west-1.amazonaws.com/fasttext-vectors/wiki.fr.vec)\n\n* _ELMo_ trained on 5.5B word corpus (will produce 1024 dim. 
vectors) for English: [options](https://s3-us-west-2.amazonaws.com/allennlp/models/elmo/2x4096_512_2048cnn_2xhighway_5.5B/elmo_2x4096_512_2048cnn_2xhighway_5.5B_options.json) and [weights](https://s3-us-west-2.amazonaws.com/allennlp/models/elmo/2x4096_512_2048cnn_2xhighway_5.5B/elmo_2x4096_512_2048cnn_2xhighway_5.5B_weights.hdf5)\n\n* _BERT_ for English, we are using BERT-Base, Cased, 12-layer, 768-hidden, 12-heads, 110M parameters: available [here](https://storage.googleapis.com/bert_models/2018_10_18/cased_L-12_H-768_A-12.zip)\n\n\nThen edit the file `embedding-registry.json` and modify the value for `path` according to the path where you have saved the corresponding embeddings. The embedding files must be unzipped.\n\n```json\n{\n    \"embeddings\": [\n        {\n            \"name\": \"glove-840B\",\n            \"path\": \"/PATH/TO/THE/UNZIPPED/EMBEDDINGS/FILE/glove.840B.300d.txt\",\n            \"type\": \"glove\",\n            \"format\": \"vec\",\n            \"lang\": \"en\",\n            \"item\": \"word\"\n        },\n        ...\n    ]\n}\n\n```\n\nYou're ready to use DeLFT.\n\n## Management of embeddings\n\nThe first time DeLFT starts and accesses pre-trained embeddings, these embeddings are serialised and stored in a LMDB database, a very efficient embedded database using memory-mapped pages (already used in the Machine Learning world by Caffe and Torch for managing large training data). The next time these embeddings are accessed, they are immediately available.\n\nOur approach solves the bottleneck problem pointed out for instance [here](https://spenai.org/bravepineapple/faster_em/) in a much better way than quantising+compression or pruning. After being compiled and stored at the first access, any volume of embedding vectors can be used immediately without any loading, with a negligible usage of memory, without any accuracy loss and with a negligible impact on runtime when using SSD. 
In practice, we can exploit for instance embeddings for a dozen languages simultaneously, without any memory or runtime issues - a requirement for any ambitious industrial deployment of a neural NLP system. \n\nFor instance, in a traditional approach `glove-840B` takes around 2 minutes to load and 4GB in memory. Managed with LMDB, after a first load time of around 4 minutes, `glove-840B` can be accessed immediately and takes only a couple of MB in memory, with a negligible impact on runtime (around 1% slower) for any further command line calls.\n\nBy default, the LMDB databases are stored under the subdirectory `data/db`. The size of a database is roughly equivalent to the size of the original uncompressed embeddings file. To modify this path, edit the file `embedding-registry.json` and change the value of the attribute `embedding-lmdb-path`.\n\nTo get FastText .bin format support, please uncomment the package `fasttextmirror==0.8.22` in `requirements.txt` or `requirements-gpu.txt` according to your system's configuration. Please note that the **.bin format is not supported on Windows platforms**. 
Installing the FastText .bin format support introduces the following additional dependencies:\n\n* (gcc-4.8 or newer) or (clang-3.3 or newer)\n* [Python](https://www.python.org/) version 2.7 or >=3.4\n* [pybind11](https://github.com/pybind/pybind11)\n\nWhile the FastText .bin format is supported by DeLFT (including using ngrams for OOV words), this format will be loaded entirely in memory and does not take advantage of our memory-efficient management of embeddings.\n\n> I have plenty of memory on my machine, I don't care about load time because I need to grab a coffee, I only process one language at a time, so I am not interested in taking advantage of the LMDB embedding management !\n\nOk, ok, then set the `embedding-lmdb-path` value to `\"None\"` in the file `embedding-registry.json`; the embeddings will then be loaded in memory as immutable data, like in the usual Keras scripts.\n\n## Sequence Labelling\n\n### Available models\n\n* _BidLSTM-CRF_ with words and characters input following:\n\n&nbsp;&nbsp;&nbsp;&nbsp; [1] Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, Chris Dyer. \"Neural Architectures for Named Entity Recognition\". Proceedings of NAACL 2016. https://arxiv.org/abs/1603.01360\n\n* _BidLSTM-CNN_ with words, characters and custom casing features input, see:\n\n&nbsp;&nbsp;&nbsp;&nbsp; [2] Jason P. C. Chiu, Eric Nichols. \"Named Entity Recognition with Bidirectional LSTM-CNNs\". 2016. https://arxiv.org/abs/1511.08308\n\n* _BidLSTM-CNN-CRF_ with words, characters and custom casing features input following:\n\n&nbsp;&nbsp;&nbsp;&nbsp; [3] Xuezhe Ma and Eduard Hovy. \"End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF\". 2016. https://arxiv.org/abs/1603.01354\n\n* _BidGRU-CRF_, similar to: \n\n&nbsp;&nbsp;&nbsp;&nbsp; [4] Matthew E. Peters, Waleed Ammar, Chandra Bhagavatula, Russell Power. \"Semi-supervised sequence tagging with bidirectional language models\". 2017. 
https://arxiv.org/pdf/1705.00108  \n\n* the current state of the art (92.22% F1 on CoNLL2003 NER dataset, averaged over five runs), _BidLSTM-CRF_ with [ELMo](https://allennlp.org/elmo) contextualised embeddings, see:\n\n&nbsp;&nbsp;&nbsp;&nbsp; [5] Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer. \"Deep contextualized word representations\". 2018. https://arxiv.org/abs/1802.05365\n\n* Feature extraction to be used as contextual embeddings can also be obtained from BERT, as an alternative to ELMo, as explained in section 5.4 of: \n\n&nbsp;&nbsp;&nbsp;&nbsp; [6] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. \"BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding\". 2018. https://arxiv.org/abs/1810.04805\n\nThe addition of the BERT transformer architecture, as an alternative to the above RNN architectures, is currently work in progress. \n\nNote that all our annotation data for sequence labelling follows the [IOB2](https://en.wikipedia.org/wiki/Inside%E2%80%93outside%E2%80%93beginning_(tagging)) scheme.\n\n### Examples\n\n#### NER\n\n##### Overview\n\nWe have reimplemented in DeLFT the main neural architectures for NER of the last two years and performed a reproducibility analysis of these systems with comparable evaluation criteria. Unfortunately, in publications, systems are usually compared directly with reported results obtained in different settings, which can bias scores by more than 1.0 points and completely invalidate both comparison and interpretation of results.  \n\nYou can read more about our reproducibility study of neural NER in this [blog article](http://science-miner.com/a-reproducibility-study-on-neural-ner/).\n\nAll reported scores below are __f-score__ for the CoNLL-2003 NER dataset. We report first the f-score averaged over 10 training runs, and second the best f-score over these 10 training runs. 
All the DeLFT trained models are included in this repository. \n\n| Architecture  | Implementation | Glove only (avg / best)| Glove + valid. set (avg / best)| ELMo + Glove (avg / best)| ELMo + Glove + valid. set (avg / best)|\n| --- | --- | --- | --- | --- | --- |\n| BidLSTM-CRF   | DeLFT | __90.75__ / __91.35__  | 91.13 / 91.60 | __92.47__ / __92.71__ | __92.69__ / __93.09__ | \n|               | [(Lample and al., 2016)](https://arxiv.org/abs/1603.01360) | - / 90.94 |      |              |               | \n| BidLSTM-CNN-CRF | DeLFT | 90.73 / 91.07| 91.01 / 91.26 | 92.30 / 92.57| 92.67 / 93.04 |\n|               | [(Ma & Hovy, 2016)](https://arxiv.org/abs/1603.01354) |  - / 91.21  | | | |\n|               | [(Peters & al. 2018)](https://arxiv.org/abs/1802.05365) |  | | 92.22** / - | |\n| BidLSTM-CNN   | DeLFT | 89.23 / 89.47  | 89.35 / 89.87 | 91.66 / 92.00 | 92.01 / 92.16 |\n|               | [(Chiu & Nichols, 2016)](https://arxiv.org/abs/1511.08308) || __90.88***__ / - | | |\n| BidGRU-CRF    | DeLFT | 90.38 / 90.72  | 90.28 / 90.69 | 92.03 / 92.44 | 92.43 / 92.71 |\n|               | [(Peters & al. 2017)](https://arxiv.org/abs/1705.00108) |  | |  | 91.93* / - |\n\n\n\n_*_ reported f-score using Senna word embeddings and not Glove.\n\n** f-score is averaged over 5 training runs. \n\n*** reported f-score with Senna word embeddings (Collobert 50d) averaged over 10 runs, including case features and not including lexical features. DeLFT implementation of the same architecture includes the capitalization features too, but uses the more efficient GloVe 300d embeddings.\n\n\n##### Command Line Interface\n\nDifferent datasets and languages are supported. They can be specified by the command line parameters. 
The general usage of the CLI is as follows: \n\n```\nusage: nerTagger.py [-h] [--fold-count FOLD_COUNT] [--lang LANG]\n                    [--dataset-type DATASET_TYPE]\n                    [--train-with-validation-set]\n                    [--architecture ARCHITECTURE] [--use-ELMo] [--use-BERT]\n                    [--data-path DATA_PATH] [--file-in FILE_IN]\n                    [--file-out FILE_OUT]\n                    action\n\nNeural Named Entity Recognizers\n\npositional arguments:\n  action                one of [train, train_eval, eval, tag]\n\noptional arguments:\n  -h, --help            show this help message and exit\n  --fold-count FOLD_COUNT\n                        number of folds or re-runs to be used when training\n  --lang LANG           language of the model as ISO 639-1 code\n  --dataset-type DATASET_TYPE\n                        dataset to be used for training the model\n  --train-with-validation-set\n                        Use the validation set for training together with the\n                        training set\n  --architecture ARCHITECTURE\n                        type of model architecture to be used, one of\n                        [BidLSTM_CRF, BidLSTM_CNN, BidLSTM_CNN_CRF, BidGRU-\n                        CRF]\n  --use-ELMo            Use ELMo contextual embeddings\n  --use-BERT            Use BERT extracted features (embeddings)\n  --data-path DATA_PATH\n                        path to the corpus of documents for training (only use\n                        currently with Ontonotes corpus in orginal XML format)\n  --file-in FILE_IN     path to a text file to annotate\n  --file-out FILE_OUT   path for outputting the resulting JSON NER anotations\n```\n\nMore explanations and examples are presented in the following sections. \n\n##### CONLL 2003\n\nDeLFT comes with various models pre-trained on the CoNLL-2003 NER dataset.\n\nBy default, the BidLSTM-CRF architecture is used. 
With this available model, glove-840B word embeddings, and optimisation of hyperparameters, the current f1 score on CoNLL 2003 _testb_ set is __91.35__ (best run over 10 training runs, using _train_ set for training and _testa_ for validation), as compared to the 90.94 reported in [1], or __90.75__ when averaged over 10 training runs. Best model f1 score becomes __91.60__ when using both _train_ and _testa_ (validation set) for training (best run over 10 training runs), as it is done by (Chiu & Nichols, 2016) or some recent works like (Peters and al., 2017).  \n\nUsing the BidLSTM-CRF model with ELMo embeddings, following [5] with some parameter optimisations and [warm-up](https://github.com/allenai/allennlp/blob/master/tutorials/how_to/elmo.md#notes-on-statefulness-and-non-determinism), makes the predictions around 30 times slower but improves the f1 score on CoNLL 2003 currently to __92.47__ (averaged over 10 training runs, __92.71__ for best model, using _train_ set for training and _testa_ for validation), or __92.69__ (averaged over 10 training runs, __93.09__ best model) when training with the validation set (as in the paper Peters and al., 2017).\n\nFor re-training a model, the CoNLL-2003 NER dataset (`eng.train`, `eng.testa`, `eng.testb`) must be present under `data/sequenceLabelling/CoNLL-2003/` in the IOB2 tagging scheme (look [here](https://github.com/Franck-Dernoncourt/NeuroNER/tree/master/data/conll2003/en) for instance, and [here](https://github.com/kermitt2/delft/tree/master/utilities)). 
The CONLL 2003 dataset (English) is the default dataset and English is the default language, but you can also indicate them explicitly with the parameters `--dataset-type conll2003` and `--lang en`.\n\nFor training and evaluating following the traditional approach (training with the train set without validation set, and evaluating on test set), use:\n\n> python3 nerTagger.py --dataset-type conll2003 train_eval\n\nTo use ELMo contextual embeddings, add the parameter `--use-ELMo`. This will considerably slow down (30 times) the first epoch of the training; the contextual embeddings are then cached and the rest of the training will be similar to usual embeddings in terms of training time. Alternatively add `--use-BERT` to use BERT extracted features as contextual embeddings for the RNN architecture. \n\n> python3 nerTagger.py --dataset-type conll2003 --use-ELMo train_eval\n\nSome recent works like (Chiu & Nichols, 2016) and (Peters and al., 2017) also train with the validation set, which obviously leads to a better accuracy (still, they compare their scores with previously reported scores obtained under different training conditions, which is arguably a bit unfair - this aspect is mentioned in (Ma & Hovy, 2016)). To train with both train and validation sets, use the parameter `--train-with-validation-set`:\n\n> python3 nerTagger.py --dataset-type conll2003 --train-with-validation-set train_eval\n\nNote that, by default, the BidLSTM-CRF model is used. 
(Documentation on selecting other models and setting hyperparameters to be included here !)\n\nFor evaluating against CoNLL 2003 testb set with the existing model:\n\n> python3 nerTagger.py --dataset-type conll2003 eval\n\n```text\n    Evaluation on test set:\n        f1 (micro): 91.35\n                 precision    recall  f1-score   support\n\n            ORG     0.8795    0.9007    0.8899      1661\n            PER     0.9647    0.9623    0.9635      1617\n           MISC     0.8261    0.8120    0.8190       702\n            LOC     0.9260    0.9305    0.9282      1668\n\n    avg / total     0.9109    0.9161    0.9135      5648\n\n```\n\nIf the model has been trained also with the validation set (`--train-with-validation-set`), similarly to (Chiu & Nichols, 2016) or (Peters and al., 2017), results are significantly better:\n\n```text\n    Evaluation on test set:\n        f1 (micro): 91.60\n                 precision    recall    f1-score    support\n\n            LOC     0.9219    0.9418    0.9318      1668\n           MISC     0.8277    0.8077    0.8176       702\n            PER     0.9594    0.9635    0.9614      1617\n            ORG     0.9029    0.8904    0.8966      1661\n\n    avg / total     0.9158    0.9163    0.9160      5648\n```\n\nUsing ELMo with the best model obtained over 10 training (not using the validation set for training, only for early stop):\n\n```text\n    Evaluation on test set:\n        f1 (micro): 92.71\n                      precision    recall  f1-score   support\n\n                 PER     0.9787    0.9672    0.9729      1617\n                 LOC     0.9368    0.9418    0.9393      1668\n                MISC     0.8237    0.8319    0.8278       702\n                 ORG     0.9072    0.9181    0.9126      1661\n\n    all (micro avg.)     
0.9257    0.9285    0.9271      5648\n\n```\n\nUsing ELMo and training with the validation set gives an f-score of 93.09 (best model), 92.69 averaged over 10 runs (the best model is provided under `data/models/sequenceLabelling/ner-en-conll2003-BidLSTM_CRF/with_validation_set/`).\n\nFor training with all the available data:\n\n> python3 nerTagger.py --dataset-type conll2003 train\n\nTo take into account the strong impact of the random seed, you need to train multiple times with the `--fold-count` option. The model will be trained n times with different seed values but with the same sets if the evaluation set is provided. The evaluation will then give the average scores over these n models (against the test set) and the scores of the best model, which will be saved. To train 10 times for instance, use:\n\n> python3 nerTagger.py --dataset-type conll2003 --fold-count 10 train_eval\n\nAfter training a model, for tagging some text, for instance in a file `data/test/test.ner.en.txt`, use the command:\n\n> python3 nerTagger.py --dataset-type conll2003 --file-in data/test/test.ner.en.txt tag\n\nNote that, currently, the input text file must contain one sentence per line, so the text must be presegmented into sentences. To obtain the JSON annotations in a text file rather than on the standard output, use the parameter `--file-out`. Predictions work at around 7400 tokens per second for the BidLSTM_CRF architecture with a GeForce GTX 1080 Ti. \n\nThis produces a JSON output with entities, scores and character offsets like this:\n\n```json\n{\n    \"runtime\": 0.34,\n    \"texts\": [\n        {\n            \"text\": \"The University of California has found that 40 percent of its students suffer food insecurity. 
At four state universities in Illinois, that number is 35 percent.\",\n            \"entities\": [\n                {\n                    \"text\": \"University of California\",\n                    \"endOffset\": 32,\n                    \"score\": 1.0,\n                    \"class\": \"ORG\",\n                    \"beginOffset\": 4\n                },\n                {\n                    \"text\": \"Illinois\",\n                    \"endOffset\": 134,\n                    \"score\": 1.0,\n                    \"class\": \"LOC\",\n                    \"beginOffset\": 125\n                }\n            ]\n        },\n        {\n            \"text\": \"President Obama is not speaking anymore from the White House.\",\n            \"entities\": [\n                {\n                    \"text\": \"Obama\",\n                    \"endOffset\": 18,\n                    \"score\": 1.0,\n                    \"class\": \"PER\",\n                    \"beginOffset\": 10\n                },\n                {\n                    \"text\": \"White House\",\n                    \"endOffset\": 61,\n                    \"score\": 1.0,\n                    \"class\": \"LOC\",\n                    \"beginOffset\": 49\n                }\n            ]\n        }\n    ],\n    \"software\": \"DeLFT\",\n    \"date\": \"2018-05-02T12:24:55.529301\",\n    \"model\": \"ner\"\n}\n\n```\n\nIf you have trained the model with ELMo, you need to indicate to use ELMo-based model when annotating with the parameter `--use-ELMo` (note that the runtime impact is important as compared to traditional embeddings): \n\n> python3 nerTagger.py --dataset-type conll2003 --use-ELMo --file-in data/test/test.ner.en.txt tag\n\n##### Ontonotes 5.0 CONLL 2012\n\nDeLFT comes with pre-trained models with the [Ontonotes 5.0 CoNLL-2012 NER dataset](http://cemantix.org/data/ontonotes.html). As dataset-type identifier, use `conll2012`. 
All the options valid for the CoNLL-2003 NER dataset are usable for this dataset.\n\nWith the default BidLSTM-CRF architecture, FastText embeddings and without any parameter tuning, f1 score is __86.65__ averaged over 10 training runs, with best run at __87.01__ (provided model) when trained strictly with the train set. \n\nWith ELMo, f-score is __88.66__ averaged over 10 training runs, with best run at __89.01__.\n\nFor re-training, the assembled Ontonotes datasets following CoNLL-2012 must be available and converted into the IOB2 tagging scheme, see [here](https://github.com/kermitt2/delft/tree/master/utilities) for more details. To train and evaluate following the traditional approach (training with the train set without validation set, and evaluating on test set), use:\n\n> python3 nerTagger.py --dataset-type conll2012 train_eval\n\n```text\nEvaluation on test set:\n\tf1 (micro): 87.01\n                  precision    recall  f1-score   support\n\n            DATE     0.8029    0.8695    0.8349      1602\n        CARDINAL     0.8130    0.8139    0.8135       935\n          PERSON     0.9061    0.9371    0.9214      1988\n             GPE     0.9617    0.9411    0.9513      2240\n             ORG     0.8799    0.8568    0.8682      1795\n           MONEY     0.8903    0.8790    0.8846       314\n            NORP     0.9226    0.9501    0.9361       841\n         ORDINAL     0.7873    0.8923    0.8365       195\n            TIME     0.5772    0.6698    0.6201       212\n     WORK_OF_ART     0.6000    0.5060    0.5490       166\n             LOC     0.7340    0.7709    0.7520       179\n           EVENT     0.5000    0.5556    0.5263        63\n         PRODUCT     0.6528    0.6184    0.6351        76\n         PERCENT     0.8717    0.8567    0.8642       349\n        QUANTITY     0.7155    0.7905    0.7511       105\n             FAC     0.7167    0.6370    0.6745       135\n        LANGUAGE     0.8462    0.5000    0.6286        22\n             LAW     
0.7308    0.4750    0.5758        40\n\nall (micro avg.)     0.8647    0.8755    0.8701     11257\n```\n\nWith ELMo embeddings (using the default hyper-parameters, except the batch size which is increased to better learn the less frequent classes):\n\n```text\nEvaluation on test set:\n  f1 (micro): 89.01\n                  precision    recall  f1-score   support\n\n             LAW     0.7188    0.5750    0.6389        40\n         PERCENT     0.8946    0.8997    0.8971       349\n           EVENT     0.6212    0.6508    0.6357        63\n        CARDINAL     0.8616    0.7722    0.8144       935\n        QUANTITY     0.7838    0.8286    0.8056       105\n            NORP     0.9232    0.9572    0.9399       841\n             LOC     0.7459    0.7709    0.7582       179\n            DATE     0.8629    0.8252    0.8437      1602\n        LANGUAGE     0.8750    0.6364    0.7368        22\n             GPE     0.9637    0.9607    0.9622      2240\n         ORDINAL     0.8145    0.9231    0.8654       195\n             ORG     0.9033    0.8903    0.8967      1795\n           MONEY     0.8851    0.9076    0.8962       314\n             FAC     0.8257    0.6667    0.7377       135\n            TIME     0.6592    0.6934    0.6759       212\n          PERSON     0.9350    0.9477    0.9413      1988\n     WORK_OF_ART     0.6467    0.7169    0.6800       166\n         PRODUCT     0.6867    0.7500    0.7170        76\n\nall (micro avg.)     0.8939    0.8864    0.8901     11257\n```\n\nFor ten model training with average, worst and best model with ELMo embeddings, use:\n\n> python3 nerTagger.py --dataset-type conll2012 --use-ELMo --fold-count 10 train_eval\n\n##### French model (based on Le Monde corpus)\n\nNote that Le Monde corpus is subject to copyrights and is limited to research usage only. 
This is the default French model, so it will be used by simply indicating the language as parameter: `--lang fr`, but you can also indicate explicitly the dataset with `--dataset-type lemonde`.\n\nSimilarly as before, for training and evaluating use:\n\n> python3 nerTagger.py --lang fr train_eval\n\nIn practice, we need to repeat training and evaluation several times to neutralise random seed effects and to average scores, here ten times:\n\n> python3 nerTagger.py --lang fr --fold-count 10 train_eval\n\nThe performance is as follow, with a f-score of __91.01__ averaged over 10 training:\n\n```text\naverage over 10 folds\n  macro f1 = 0.9100881012386587\n  macro precision = 0.9048633201198737\n  macro recall = 0.9153907496012759 \n\n** Worst ** model scores - \n\n                  precision    recall  f1-score   support\n\n      <location>     0.9467    0.9647    0.9556       368\n   <institution>     0.8621    0.8333    0.8475        30\n      <artifact>     1.0000    0.5000    0.6667         4\n  <organisation>     0.9146    0.8089    0.8585       225\n        <person>     0.9264    0.9522    0.9391       251\n      <business>     0.8463    0.8936    0.8693       376\n\nall (micro avg.)     0.9040    0.9083    0.9061      1254\n\n** Best ** model scores - \n\n                  precision    recall  f1-score   support\n\n      <location>     0.9439    0.9592    0.9515       368\n   <institution>     0.8667    0.8667    0.8667        30\n      <artifact>     1.0000    0.5000    0.6667         4\n  <organisation>     0.8813    0.8578    0.8694       225\n        <person>     0.9453    0.9641    0.9546       251\n      <business>     0.8706    0.9122    0.8909       376\n\nall (micro avg.)     
0.9090    0.9242    0.9166      1254\n```\n\nWith frELMo:\n\n> python3 nerTagger.py --lang fr --fold-count 10 --use-ELMo train_eval\n\n```text\naverage over 10 folds\n    macro f1 = 0.9209397554337976\n    macro precision = 0.91949107960079\n    macro recall = 0.9224082934609251 \n\n** Worst ** model scores - \n\n                  precision    recall  f1-score   support\n\n  <organisation>     0.8704    0.8356    0.8526       225\n        <person>     0.9344    0.9641    0.9490       251\n      <artifact>     1.0000    0.5000    0.6667         4\n      <location>     0.9173    0.9647    0.9404       368\n   <institution>     0.8889    0.8000    0.8421        30\n      <business>     0.9130    0.8936    0.9032       376\n\nall (micro avg.)     0.9110    0.9147    0.9129      1254\n\n** Best ** model scores - \n\n                  precision    recall  f1-score   support\n\n  <organisation>     0.9061    0.8578    0.8813       225\n        <person>     0.9416    0.9641    0.9528       251\n      <artifact>     1.0000    0.5000    0.6667         4\n      <location>     0.9570    0.9674    0.9622       368\n   <institution>     0.8889    0.8000    0.8421        30\n      <business>     0.9016    0.9255    0.9134       376\n\nall (micro avg.)     
0.9268    0.9290    0.9279      1254\n```\n\nFor training with all the dataset without evaluation:\n\n> python3 nerTagger.py --lang fr train\n\nand for annotating some examples:\n\n> python3 nerTagger.py --lang fr --file-in data/test/test.ner.fr.txt tag\n\n```json\n{\n    \"date\": \"2018-06-11T21:25:03.321818\",\n    \"runtime\": 0.511,\n    \"software\": \"DeLFT\",\n    \"model\": \"ner-fr-lemonde\",\n    \"texts\": [\n        {\n            \"entities\": [\n                {\n                    \"beginOffset\": 5,\n                    \"endOffset\": 13,\n                    \"score\": 1.0,\n                    \"text\": \"Allemagne\",\n                    \"class\": \"<location>\"\n                },\n                {\n                    \"beginOffset\": 57,\n                    \"endOffset\": 68,\n                    \"score\": 1.0,\n                    \"text\": \"Donald Trump\",\n                    \"class\": \"<person>\"\n                }\n            ],\n            \"text\": \"Or l\u2019Allemagne pourrait pr\u00e9f\u00e9rer la retenue, de peur que Donald Trump ne surtaxe prochainement les automobiles \u00e9trang\u00e8res.\"\n        }\n    ]\n}\n\n```\n\n<p align=\"center\">\n    <img src=\"https://abstrusegoose.com/strips/muggle_problems.png\">\n</p>\n\nThis above work is licensed under a [Creative Commons Attribution-Noncommercial 3.0 United States License](http://creativecommons.org/licenses/by-nc/3.0/us/). 
\n\n#### GROBID models\n\nDeLFT supports [GROBID](https://github.com/kermitt2/grobid) training data (originally for CRF) and GROBID feature matrices to be labelled.\n\nTrain a model:\n\n> python3 grobidTagger.py *name-of-model* train\n\nwhere *name-of-model* is one of the GROBID models (_date_, _affiliation-address_, _citation_, _header_, _name-citation_, _name-header_, ...), for instance:\n\n> python3 grobidTagger.py date train\n\nTo split the training data and evaluate on 10%:\n\n> python3 grobidTagger.py *name-of-model* train_eval\n\nFor instance, for the _date_ model:\n\n> python3 grobidTagger.py date train_eval\n\n```text\n        Evaluation:\n        f1 (micro): 96.41\n                 precision    recall  f1-score   support\n\n        <month>     0.9667    0.9831    0.9748        59\n         <year>     1.0000    0.9844    0.9921        64\n          <day>     0.9091    0.9524    0.9302        42\n\n    avg / total     0.9641    0.9758    0.9699       165\n```\n\nTo apply a model to some examples:\n\n> python3 grobidTagger.py date tag\n\n```json\n{\n    \"runtime\": 0.509,\n    \"software\": \"DeLFT\",\n    \"model\": \"grobid-date\",\n    \"date\": \"2018-05-23T14:18:15.833959\",\n    \"texts\": [\n        {\n            \"entities\": [\n                {\n                    \"score\": 1.0,\n                    \"endOffset\": 6,\n                    \"class\": \"<month>\",\n                    \"beginOffset\": 0,\n                    \"text\": \"January\"\n                },\n                {\n                    \"score\": 1.0,\n                    \"endOffset\": 11,\n                    \"class\": \"<year>\",\n                    \"beginOffset\": 8,\n                    \"text\": \"2006\"\n                }\n            ],\n            \"text\": \"January 2006\"\n        },\n        {\n            \"entities\": [\n                {\n                    \"score\": 1.0,\n                    \"endOffset\": 4,\n                    \"class\": \"<month>\",\n        
            \"beginOffset\": 0,\n                    \"text\": \"March\"\n                },\n                {\n                    \"score\": 1.0,\n                    \"endOffset\": 13,\n                    \"class\": \"<day>\",\n                    \"beginOffset\": 10,\n                    \"text\": \"27th\"\n                },\n                {\n                    \"score\": 1.0,\n                    \"endOffset\": 19,\n                    \"class\": \"<year>\",\n                    \"beginOffset\": 16,\n                    \"text\": \"2001\"\n                }\n            ],\n            \"text\": \"March the 27th, 2001\"\n        }\n    ]\n}\n```\n\nSimilarly to the NER models, to use ELMo contextual embeddings, add the parameter `--use-ELMo`, e.g.:\n\n> python3 grobidTagger.py citation --use-ELMo train_eval\n\nAdd the parameter `--use-BERT` to use BERT extracted features as contextual embeddings for the RNN architecture. \n\n(To be completed)\n\n#### Insult recognition\n\nA small experimental model for recognising insults and threats in texts, based on the Wikipedia comment from the Kaggle _Wikipedia Toxic Comments_ dataset, English only. 
This uses a small dataset labelled manually.\n\nFor training:\n\n> python3 insultTagger.py train\n\nBy default training uses the whole train set.\n\nExample of a small tagging test:\n\n> python3 insultTagger.py tag\n\nwill produce (__socially offensive language warning!__) a result like this:\n\n```json\n{\n    \"runtime\": 0.969,\n    \"texts\": [\n        {\n            \"entities\": [],\n            \"text\": \"This is a gentle test.\"\n        },\n        {\n            \"entities\": [\n                {\n                    \"score\": 1.0,\n                    \"endOffset\": 20,\n                    \"class\": \"<insult>\",\n                    \"beginOffset\": 9,\n                    \"text\": \"moronic wimp\"\n                },\n                {\n                    \"score\": 1.0,\n                    \"endOffset\": 56,\n                    \"class\": \"<threat>\",\n                    \"beginOffset\": 54,\n                    \"text\": \"die\"\n                }\n            ],\n            \"text\": \"you're a moronic wimp who is too lazy to do research! die in hell !!\"\n        }\n    ],\n    \"software\": \"DeLFT\",\n    \"date\": \"2018-05-14T17:22:01.804050\",\n    \"model\": \"insult\"\n}\n```\n\n#### Creating your own model\n\nAs long as your task is sequence labelling of text, adding a new corpus and creating an additional model should be straightforward. If you want to build a model named `toto` based on labelled data in one of the supported formats (CoNLL, TEI or GROBID CRF), create the subdirectory `data/sequenceLabelling/toto` and copy your training data under it.  \n\n(To be completed)\n\n## Text classification\n\n### Available models\n\nAll the following models include Dropout, Pooling and Dense layers with hyperparameters tuned for reasonable performance across standard text classification tasks. 
If necessary, they are a good basis for further performance tuning.\n\n* `gru`: two-layer Bidirectional GRU\n* `gru_simple`: one-layer Bidirectional GRU\n* `bidLstm`: a Bidirectional LSTM layer followed by an Attention layer\n* `cnn`: convolutional layers followed by a GRU\n* `lstm_cnn`: LSTM followed by convolutional layers\n* `mix1`: one-layer Bidirectional GRU followed by a Bidirectional LSTM\n* `dpcnn`: Deep Pyramid Convolutional Neural Networks (but not working as expected - to be reviewed)\n\nNote: by default the first 300 tokens of the text to be classified are used, which is largely enough for any _short text_ classification task and works fine with a low-profile GPU (for instance GeForce GTX 1050 Ti with 4 GB memory). To take into account a larger portion of the text, modify the config model parameter `maxlen`. However, using more than 1000 tokens, for instance, requires a modern GPU with enough memory (e.g. 10 GB).\n\nFor all these RNN architectures, it is possible to use ELMo contextual embeddings (`--use-ELMo`) or BERT extracted features as embeddings (`--use-BERT`). Integration of BERT as an additional non-RNN architecture is a work in progress. 
\n\n### Examples\n\n#### Toxic comment classification\n\nThe dataset of the [Kaggle Toxic Comment Classification challenge](https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge) can be found here: https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge/data\n\nThis is a multi-label regression problem, where a Wikipedia comment (or any similar short text) should be associated with 6 possible types of toxicity (`toxic`, `severe_toxic`, `obscene`, `threat`, `insult`, `identity_hate`).\n\nTo launch the training:\n\n> python3 toxicCommentClassifier.py train\n\nFor training with n-folds, use the parameter `--fold-count`:\n\n> python3 toxicCommentClassifier.py train --fold-count 10\n\nAfter training (1 or n-folds), to process the Kaggle test set, use:\n\n> python3 toxicCommentClassifier.py test\n\nTo classify a set of comments:\n\n> python3 toxicCommentClassifier.py classify\n\n#### Citation classification\n\nWe use the dataset developed and presented by A. Athar in the following article:\n\n[7] Awais Athar. \"Sentiment Analysis of Citations using Sentence Structure-Based Features\". Proceedings of the ACL 2011 Student Session, 81-87, 2011. http://www.aclweb.org/anthology/P11-3015\n\nFor a given scientific article, the task is to estimate whether the occurrence of a bibliographical citation is positive, neutral or negative given its citation context. Note that the dataset, similarly to the Toxic Comment classification, is highly unbalanced (86% of the citations are neutral).\n\nIn this example, we formulate the problem as a 3-class regression (`negative`, `neutral`, `positive`). 
To train the model:\n\n> python3 citationClassifier.py train\n\nwith n-folds:\n\n> python3 citationClassifier.py train --fold-count 10\n\nTraining and evaluation (ratio):\n\n> python3 citationClassifier.py train_eval\n\nwhich should produce the following evaluation (using the 2-layer Bidirectional GRU model `gru`):\n\n<!-- eval before data generator\n```\nEvaluation on 896 instances:\n\nClass: negative\n    accuracy at 0.5 = 0.9665178571428571\n    f-1 at 0.5 = 0.9665178571428571\n    log-loss = 0.10193770380479757\n    roc auc = 0.9085232470270055\n\nClass: neutral\n    accuracy at 0.5 = 0.8995535714285714\n    f-1 at 0.5 = 0.8995535714285714\n    log-loss = 0.2584601024897698\n    roc auc = 0.8914776135848872\n\nClass: positive\n    accuracy at 0.5 = 0.9252232142857143\n    f-1 at 0.5 = 0.9252232142857143\n    log-loss = 0.20726886795593405\n    roc auc = 0.8892779640954823\n\nMacro-average:\n    average accuracy at 0.5 = 0.9304315476190476\n    average f-1 at 0.5 = 0.9304315476190476\n    average log-loss = 0.18922222475016715\n    average roc auc = 0.8964262749024584\n\nMicro-average:\n    average accuracy at 0.5 = 0.9304315476190482\n    average f-1 at 0.5 = 0.9304315476190482\n    average log-loss = 0.18922222475016712\n    average roc auc = 0.9319196428571429\n```    \n-->\n\n```text\nEvaluation on 896 instances:\n\nClass: negative\n    accuracy at 0.5 = 0.9654017857142857\n    f-1 at 0.5 = 0.9654017857142857\n    log-loss = 0.1056664130630102\n    roc auc = 0.898580121703854\n\nClass: neutral\n    accuracy at 0.5 = 0.8939732142857143\n    f-1 at 0.5 = 0.8939732142857143\n    log-loss = 0.25354114470640177\n    roc auc = 0.88643347739321\n\nClass: positive\n    accuracy at 0.5 = 0.9185267857142857\n    f-1 at 0.5 = 0.9185267857142856\n    log-loss = 0.1980544119553914\n    roc auc = 0.8930591175116723\n\nMacro-average:\n    average accuracy at 0.5 = 0.9259672619047619\n    average f-1 at 0.5 = 0.9259672619047619\n    average log-loss = 0.18575398990826777\n  
  average roc auc = 0.8926909055362455\n\nMicro-average:\n    average accuracy at 0.5 = 0.9259672619047624\n    average f-1 at 0.5 = 0.9259672619047624\n    average log-loss = 0.18575398990826741\n    average roc auc = 0.9296875\n\n\n```\n\nIn [7], based on an SVM (linear kernel) and custom features, the author reports an F-score of 0.898 for micro-average and 0.764 for macro-average. As we can observe, a non-linear deep learning approach, even without any feature engineering or tuning, is very robust for an unbalanced dataset and provides higher accuracy.\n\nTo classify a set of citation contexts:\n\n> python3 citationClassifier.py classify\n\nwhich will produce some JSON output like this:\n\n```json\n{\n    \"model\": \"citations\",\n    \"date\": \"2018-05-13T16:06:12.995944\",\n    \"software\": \"DeLFT\",\n    \"classifications\": [\n        {\n            \"negative\": 0.001178970211185515,\n            \"text\": \"One successful strategy [15] computes the set-similarity involving (multi-word) keyphrases about the mentions and the entities, collected from the KG.\",\n            \"neutral\": 0.187219500541687,\n            \"positive\": 0.8640883564949036\n        },\n        {\n            \"negative\": 0.4590276777744293,\n            \"text\": \"Unfortunately, fewer than half of the OCs in the DAML02 OC catalog (Dias et al. 
2002) are suitable for use with the isochrone-fitting method because of the lack of a prominent main sequence, in addition to an absence of radial velocity and proper-motion data.\",\n            \"neutral\": 0.3570767939090729,\n            \"positive\": 0.18021513521671295\n        },\n        {\n            \"negative\": 0.0726129561662674,\n            \"text\": \"However, we found that the pairwise approach LambdaMART [41] achieved the best performance on our datasets among most learning to rank algorithms.\",\n            \"neutral\": 0.12469841539859772,\n            \"positive\": 0.8224021196365356\n        }\n    ],\n    \"runtime\": 1.202\n}\n\n\n```\n\n## TODO\n\n__Embeddings__:\n\n* use/experiment more with OOV mechanisms\n\n* train decent French embeddings (ELMo)\n\n__Models__:\n\n* add BERT transformer architecture (not just the extracted features as embeddings, as now)\n\n* test Theano as alternative backend (waiting for Apache MXNet...)\n\n* augment word vectors with features, in particular layout features generated by GROBID\n\n* review/rewrite the current Linear Chain CRF layer that we are using: this Keras CRF implementation is (i) a runtime bottleneck (we could try to use Cython to improve runtime) and (ii) incomplete in its Viterbi decoding, which does not output final decoded label scores and can't output n-best. 
\n\n__NER__:\n\n* complete the benchmark with OntoNotes 5 - other languages\n\n* align the CoNLL corpus tokenisation (CoNLL corpus is \"pre-tokenised\", but we might not want to follow this particular tokenisation)\n\n__Production stack__:\n\n* improve runtime\n\n__Build more models and examples__...\n\n* model for entity disambiguation\n\n* dependency parser\n\n## Acknowledgments\n\n* Keras CRF implementation by Philipp Gross\n\n* The evaluations for sequence labelling are based on a modified version of https://github.com/chakki-works/seqeval\n\n* The preprocessor of the sequence labelling part is derived from https://github.com/Hironsan/anago/\n\n* [ELMo](https://allennlp.org/elmo) contextual embeddings are developed by the [AllenNLP](https://allennlp.org) team and we use the TensorFlow library [bilm-tf](https://github.com/allenai/bilm-tf) for integrating them into DeLFT.\n\n## License and contact\n\nDistributed under [Apache 2.0 license](http://www.apache.org/licenses/LICENSE-2.0). The dependencies used in the project are either themselves also distributed under Apache 2.0 license or distributed under a compatible license.\n\nContact: Patrice Lopez (<patrice.lopez@science-miner.com>)\n\n\n",
        "description_content_type": "text/markdown",
        "docs_url": null,
        "download_url": "",
        "downloads": {
            "last_day": -1,
            "last_month": -1,
            "last_week": -1
        },
        "home_page": "https://github.com/kermitt2/delft",
        "keywords": "",
        "license": "",
        "maintainer": "",
        "maintainer_email": "",
        "name": "delft",
        "package_url": "https://pypi.org/project/delft/",
        "platform": "",
        "project_url": "https://pypi.org/project/delft/",
        "project_urls": {
            "Homepage": "https://github.com/kermitt2/delft"
        },
        "release_url": "https://pypi.org/project/delft/0.2.3/",
        "requires_dist": [
            "keras (==2.2.4)",
            "numpy (>=1.16.1)",
            "pandas (>=0.22.0)",
            "bleach (>=2.1.0)",
            "regex (>=2018.2.21)",
            "scikit-learn (>=0.19.1)",
            "tqdm (>=4.21)",
            "tensorflow-gpu (==1.12.0)",
            "gensim (>=3.4.0)",
            "langdetect (>=1.0.7)",
            "textblob (>=0.15.1)",
            "h5py (>=2.7.1)",
            "unidecode (>=1.0.22)",
            "pydot (>=1.2.4)",
            "lmdb (>=0.94)",
            "keras-bert (==0.39.0)"
        ],
        "requires_python": ">=3.5",
        "summary": "a Deep Learning Framework for Text",
        "version": "0.2.3"
    },
    "last_serial": 5381968,
    "releases": {
        "0.1.0": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "6e920f49e28c3853d0db5e1b2514a24e",
                    "sha256": "9620da88f4745ff0186bbd8c616b6d3e2966762bf8b17b6ce59c90155cfee23a"
                },
                "downloads": -1,
                "filename": "delft-0.1.0-py3-none-any.whl",
                "has_sig": false,
                "md5_digest": "6e920f49e28c3853d0db5e1b2514a24e",
                "packagetype": "bdist_wheel",
                "python_version": "py3",
                "requires_python": ">=3.5",
                "size": 93881,
                "upload_time": "2019-02-07T00:22:37",
                "url": "https://files.pythonhosted.org/packages/a8/41/2182d581e53860b8277fac42c0f80f83930bc52de0447c6dc74d94100171/delft-0.1.0-py3-none-any.whl"
            },
            {
                "comment_text": "",
                "digests": {
                    "md5": "b23db8aaffb24b4309ed7b78d99d4726",
                    "sha256": "acb3ee57282a6057580b0050378255985fc89716e5547a8aa365ab0b1f531192"
                },
                "downloads": -1,
                "filename": "delft-0.1.0.tar.gz",
                "has_sig": false,
                "md5_digest": "b23db8aaffb24b4309ed7b78d99d4726",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": ">=3.5",
                "size": 107483,
                "upload_time": "2019-02-07T00:22:40",
                "url": "https://files.pythonhosted.org/packages/38/4e/abe601f61cb0e4ab82954a877d66e121d3326c2a4472f975c5412f87d06d/delft-0.1.0.tar.gz"
            }
        ],
        "0.1.1": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "0451634aae7ef47a9a64d6deedc9a2e1",
                    "sha256": "05764ee0fb5daff4d0d4cdaf54c8b0e27aa0c650993effc4ef0b6ab3a9c1b2de"
                },
                "downloads": -1,
                "filename": "delft-0.1.1-py3-none-any.whl",
                "has_sig": false,
                "md5_digest": "0451634aae7ef47a9a64d6deedc9a2e1",
                "packagetype": "bdist_wheel",
                "python_version": "py3",
                "requires_python": ">=3.5",
                "size": 93760,
                "upload_time": "2019-02-25T14:45:12",
                "url": "https://files.pythonhosted.org/packages/3b/1b/f0e649585e555f9b9f5c47d8735a6daf8a2405b49509b6636e0ad4b6a498/delft-0.1.1-py3-none-any.whl"
            },
            {
                "comment_text": "",
                "digests": {
                    "md5": "c6eb45d3064c39d136892583a4918266",
                    "sha256": "f1b9851bd87380e9fa50a53eefbf903545f738da8f56fc5764e2cc96c72f8408"
                },
                "downloads": -1,
                "filename": "delft-0.1.1.tar.gz",
                "has_sig": false,
                "md5_digest": "c6eb45d3064c39d136892583a4918266",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": ">=3.5",
                "size": 107347,
                "upload_time": "2019-02-25T14:45:16",
                "url": "https://files.pythonhosted.org/packages/91/9a/284337b2a7b9bd7bb21ade43c1c8fcdfc1351d540dc8e817343319309ed2/delft-0.1.1.tar.gz"
            }
        ],
        "0.1.2": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "af923d980ab788935b05dec7f14172d9",
                    "sha256": "6b0e7d6b0141306cb6ceb109bf93b966f604dddc70f3b7e4aa7d2016a0b1e2cc"
                },
                "downloads": -1,
                "filename": "delft-0.1.2-py3-none-any.whl",
                "has_sig": false,
                "md5_digest": "af923d980ab788935b05dec7f14172d9",
                "packagetype": "bdist_wheel",
                "python_version": "py3",
                "requires_python": ">=3.5",
                "size": 94595,
                "upload_time": "2019-03-16T00:30:55",
                "url": "https://files.pythonhosted.org/packages/dd/e0/863ecde9885779a7cabcf82716f18800116d202b79ac0cf457f7b5935884/delft-0.1.2-py3-none-any.whl"
            },
            {
                "comment_text": "",
                "digests": {
                    "md5": "0e3068a3b1e9f7ea6725fe037a6fb75d",
                    "sha256": "cb42248a8a89822542b53147076da0e12bd6167d40af12edf8503239a0b0d487"
                },
                "downloads": -1,
                "filename": "delft-0.1.2.tar.gz",
                "has_sig": false,
                "md5_digest": "0e3068a3b1e9f7ea6725fe037a6fb75d",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": ">=3.5",
                "size": 108262,
                "upload_time": "2019-03-16T00:31:00",
                "url": "https://files.pythonhosted.org/packages/4e/28/e16ed6b66ba29b03539f4a8ad865017cb43fa0e68a51b1469affbcf577af/delft-0.1.2.tar.gz"
            }
        ],
        "0.1.3": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "15ba63907d88408cd40a31497d6ceda2",
                    "sha256": "983db496c911820d9b3694c8cd68871161696dfcaec31f170d51cb4815bb2e1e"
                },
                "downloads": -1,
                "filename": "delft-0.1.3-py3-none-any.whl",
                "has_sig": false,
                "md5_digest": "15ba63907d88408cd40a31497d6ceda2",
                "packagetype": "bdist_wheel",
                "python_version": "py3",
                "requires_python": ">=3.5",
                "size": 94599,
                "upload_time": "2019-03-16T07:42:31",
                "url": "https://files.pythonhosted.org/packages/b3/21/94b7082d66daeca75688935329a0919acc54827ea568ab8b78ceac1c74ad/delft-0.1.3-py3-none-any.whl"
            },
            {
                "comment_text": "",
                "digests": {
                    "md5": "2b4332a99f668903bdb611bdaa90cb44",
                    "sha256": "0e22d8ecf9b0498ec3c002ba4e72928af473bda70c2ab8e15bd8ce4a4d57f0ce"
                },
                "downloads": -1,
                "filename": "delft-0.1.3.tar.gz",
                "has_sig": false,
                "md5_digest": "2b4332a99f668903bdb611bdaa90cb44",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": ">=3.5",
                "size": 108264,
                "upload_time": "2019-03-16T07:42:38",
                "url": "https://files.pythonhosted.org/packages/2e/e3/26193dc23f29c8e7577729dc3c1b9de0dbfd01c237de18a4eac7b680f611/delft-0.1.3.tar.gz"
            }
        ],
        "0.1.4": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "b1fc7c709902b4c333f79e7a8f4c2f86",
                    "sha256": "662d74a370503501b0c645976e7fb5ebb42495641b5ebc9e3a91be1ba9efc0c7"
                },
                "downloads": -1,
                "filename": "delft-0.1.4-py3-none-any.whl",
                "has_sig": false,
                "md5_digest": "b1fc7c709902b4c333f79e7a8f4c2f86",
                "packagetype": "bdist_wheel",
                "python_version": "py3",
                "requires_python": ">=3.5",
                "size": 95182,
                "upload_time": "2019-04-02T14:41:23",
                "url": "https://files.pythonhosted.org/packages/37/93/d5f0f033e422530aad85037e70a8040387517c2b6803f4771d455b7539cf/delft-0.1.4-py3-none-any.whl"
            },
            {
                "comment_text": "",
                "digests": {
                    "md5": "34f60b482fb82cc77e93b0bef7a50e04",
                    "sha256": "d27c3d85b4c06bd3ba735f7eeb410ccb99ddade386995178625baaed85412336"
                },
                "downloads": -1,
                "filename": "delft-0.1.4.tar.gz",
                "has_sig": false,
                "md5_digest": "34f60b482fb82cc77e93b0bef7a50e04",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": ">=3.5",
                "size": 109558,
                "upload_time": "2019-04-02T14:41:33",
                "url": "https://files.pythonhosted.org/packages/c1/e3/68d0ca3e316629c6418260c2eff30a6e33418c355e63641ed1b33dd8007f/delft-0.1.4.tar.gz"
            }
        ],
        "0.1.5": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "d520ca61ee29d38425f1c2cf74937dec",
                    "sha256": "6c834204b023699082d7a177fd27c1d157418408dd82b149628095c5b0728690"
                },
                "downloads": -1,
                "filename": "delft-0.1.5-py3-none-any.whl",
                "has_sig": false,
                "md5_digest": "d520ca61ee29d38425f1c2cf74937dec",
                "packagetype": "bdist_wheel",
                "python_version": "py3",
                "requires_python": ">=3.5",
                "size": 95182,
                "upload_time": "2019-04-02T15:02:50",
                "url": "https://files.pythonhosted.org/packages/89/85/d9cb68af772b402330bf95536dc94dd38bc722ae828c4e8af7231a1b117b/delft-0.1.5-py3-none-any.whl"
            },
            {
                "comment_text": "",
                "digests": {
                    "md5": "b1ce92b14cd5897d0e2eb0391fab399a",
                    "sha256": "083bf3374a57d9667ad1b734aa0dba9df67b4cdc1d53f9eaa312c4c299dc0e08"
                },
                "downloads": -1,
                "filename": "delft-0.1.5.tar.gz",
                "has_sig": false,
                "md5_digest": "b1ce92b14cd5897d0e2eb0391fab399a",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": ">=3.5",
                "size": 109555,
                "upload_time": "2019-04-02T15:03:02",
                "url": "https://files.pythonhosted.org/packages/01/7c/c502d09dabfbd57f544f2bf4d9f47a182d8ab0400f559c6680eb88de303f/delft-0.1.5.tar.gz"
            }
        ],
        "0.1.6": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "1d91fb26fe41aa9ca0ef7c48b9b2fdbd",
                    "sha256": "2d5c2f53bc9ccb1f96943c3f879dd11a592cb636de4f55ff5eaf524628e7cfef"
                },
                "downloads": -1,
                "filename": "delft-0.1.6-py3-none-any.whl",
                "has_sig": false,
                "md5_digest": "1d91fb26fe41aa9ca0ef7c48b9b2fdbd",
                "packagetype": "bdist_wheel",
                "python_version": "py3",
                "requires_python": ">=3.5",
                "size": 95201,
                "upload_time": "2019-04-02T16:14:30",
                "url": "https://files.pythonhosted.org/packages/92/30/234820802b4dfa9601d64e3b28bca007d35e74e7726c7099f06572d829d4/delft-0.1.6-py3-none-any.whl"
            },
            {
                "comment_text": "",
                "digests": {
                    "md5": "e911045461a0c84b2841d431f38bc595",
                    "sha256": "40bbaed7679b740231c168fbe8a3b4fbbb3bdd67c7a42689cb35343560a214ae"
                },
                "downloads": -1,
                "filename": "delft-0.1.6.tar.gz",
                "has_sig": false,
                "md5_digest": "e911045461a0c84b2841d431f38bc595",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": ">=3.5",
                "size": 109571,
                "upload_time": "2019-04-02T16:14:48",
                "url": "https://files.pythonhosted.org/packages/00/1a/8163ba7f5f9c46b0fbbe5b94a35354cf2beb4f343176691d2823a6621e14/delft-0.1.6.tar.gz"
            }
        ],
        "0.2.0": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "331b9ab2efbfcdd4fc2bbad376c5e4e4",
                    "sha256": "cfc7c34b8462fb83e15d17e80bb0964bab1c7f2f123be128958d48bf6de69581"
                },
                "downloads": -1,
                "filename": "delft-0.2.0-py3-none-any.whl",
                "has_sig": false,
                "md5_digest": "331b9ab2efbfcdd4fc2bbad376c5e4e4",
                "packagetype": "bdist_wheel",
                "python_version": "py3",
                "requires_python": ">=3.6",
                "size": 98037,
                "upload_time": "2019-04-04T03:16:30",
                "url": "https://files.pythonhosted.org/packages/6a/f6/3556311fd80fc4fd2704bc898e829597c734efba97b73c0eb8fc0801bbbd/delft-0.2.0-py3-none-any.whl"
            },
            {
                "comment_text": "",
                "digests": {
                    "md5": "563c5411eaad35d5e36e56e1186751ad",
                    "sha256": "f0f1e6fde94160cc39300e0983622464a143a91da8e94344cfc021de794fa560"
                },
                "downloads": -1,
                "filename": "delft-0.2.0.tar.gz",
                "has_sig": false,
                "md5_digest": "563c5411eaad35d5e36e56e1186751ad",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": ">=3.6",
                "size": 112921,
                "upload_time": "2019-04-04T03:16:46",
                "url": "https://files.pythonhosted.org/packages/d0/62/85c7be332a6a598f354691bde5b2477c4c501daeccad773cd7624b2fd4d2/delft-0.2.0.tar.gz"
            }
        ],
        "0.2.1": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "5d7359476870b262933c3af62f532273",
                    "sha256": "63ce587e54195b587aeac3070df2907755b03d75a10210e2e04701a23de1b02d"
                },
                "downloads": -1,
                "filename": "delft-0.2.1-py3-none-any.whl",
                "has_sig": false,
                "md5_digest": "5d7359476870b262933c3af62f532273",
                "packagetype": "bdist_wheel",
                "python_version": "py3",
                "requires_python": ">=3.5",
                "size": 98632,
                "upload_time": "2019-05-08T16:56:31",
                "url": "https://files.pythonhosted.org/packages/a4/37/0ba604b4f6edea16be4050bf9430171e2807724ed47964f965e3bafa1cca/delft-0.2.1-py3-none-any.whl"
            },
            {
                "comment_text": "",
                "digests": {
                    "md5": "3c3131b8e4c55fd8ed623eb0012be9f5",
                    "sha256": "84783daa8ba6eeb8cfdc051c8fbb183e56689d27a79012f4419a16828a1be7ff"
                },
                "downloads": -1,
                "filename": "delft-0.2.1.tar.gz",
                "has_sig": false,
                "md5_digest": "3c3131b8e4c55fd8ed623eb0012be9f5",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": ">=3.5",
                "size": 112316,
                "upload_time": "2019-05-08T16:56:51",
                "url": "https://files.pythonhosted.org/packages/fd/57/727d47cf3cda020d87d069e3e2dd4c3fdd5f98dd2ae306ee473247c984e3/delft-0.2.1.tar.gz"
            }
        ],
        "0.2.2": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "a3fe32f73c17dc6ac7d090d4fb31bb07",
                    "sha256": "417f5f35f7bad470724f2a7a5bfd69d9f70ea2d87b06697b1ec60905759e2e78"
                },
                "downloads": -1,
                "filename": "delft-0.2.2-py3-none-any.whl",
                "has_sig": false,
                "md5_digest": "a3fe32f73c17dc6ac7d090d4fb31bb07",
                "packagetype": "bdist_wheel",
                "python_version": "py3",
                "requires_python": ">=3.5",
                "size": 98632,
                "upload_time": "2019-05-08T17:11:59",
                "url": "https://files.pythonhosted.org/packages/34/71/07ef90dfaed379b31851b995664c850eec9bfde8b19c095b8889f5a63e17/delft-0.2.2-py3-none-any.whl"
            },
            {
                "comment_text": "",
                "digests": {
                    "md5": "b170afe7131704a78534a3dd5dece7de",
                    "sha256": "227323a0bcdf469c85009740cfaefeb9e65ee54fb3173ea70ca28b93d2bd6000"
                },
                "downloads": -1,
                "filename": "delft-0.2.2.tar.gz",
                "has_sig": false,
                "md5_digest": "b170afe7131704a78534a3dd5dece7de",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": ">=3.5",
                "size": 112319,
                "upload_time": "2019-05-08T17:12:21",
                "url": "https://files.pythonhosted.org/packages/a3/a8/89e34a645d19ff63514dc7c691c2638228cda8c7e4da15d46cabcf1fa982/delft-0.2.2.tar.gz"
            }
        ],
        "0.2.3": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "b667e3603d68abc6fd52d856be73376a",
                    "sha256": "11046577bd2537268fada93062014cc423467f716ab9035b633af48d6cee7756"
                },
                "downloads": -1,
                "filename": "delft-0.2.3-py3-none-any.whl",
                "has_sig": false,
                "md5_digest": "b667e3603d68abc6fd52d856be73376a",
                "packagetype": "bdist_wheel",
                "python_version": "py3",
                "requires_python": ">=3.5",
                "size": 98676,
                "upload_time": "2019-06-10T15:47:57",
                "url": "https://files.pythonhosted.org/packages/0b/50/d3d800080b327ec54e1be7c0e8912b5bd69a3b95f0cfe4c36ba5b1340f67/delft-0.2.3-py3-none-any.whl"
            },
            {
                "comment_text": "",
                "digests": {
                    "md5": "387bccbb3a7a24b8c9bcbd8006dff6f5",
                    "sha256": "32f5748dfb0458976384d100ce7847a989ebff60f41fbd5b10342feb0ff20bcc"
                },
                "downloads": -1,
                "filename": "delft-0.2.3.tar.gz",
                "has_sig": false,
                "md5_digest": "387bccbb3a7a24b8c9bcbd8006dff6f5",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": ">=3.5",
                "size": 112414,
                "upload_time": "2019-06-10T15:48:36",
                "url": "https://files.pythonhosted.org/packages/ed/e6/64dcd02b0c664e72e5b1f3143b173097f4334e610abeb8223036834b47bb/delft-0.2.3.tar.gz"
            }
        ]
    },
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "b667e3603d68abc6fd52d856be73376a",
                "sha256": "11046577bd2537268fada93062014cc423467f716ab9035b633af48d6cee7756"
            },
            "downloads": -1,
            "filename": "delft-0.2.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "b667e3603d68abc6fd52d856be73376a",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.5",
            "size": 98676,
            "upload_time": "2019-06-10T15:47:57",
            "url": "https://files.pythonhosted.org/packages/0b/50/d3d800080b327ec54e1be7c0e8912b5bd69a3b95f0cfe4c36ba5b1340f67/delft-0.2.3-py3-none-any.whl"
        },
        {
            "comment_text": "",
            "digests": {
                "md5": "387bccbb3a7a24b8c9bcbd8006dff6f5",
                "sha256": "32f5748dfb0458976384d100ce7847a989ebff60f41fbd5b10342feb0ff20bcc"
            },
            "downloads": -1,
            "filename": "delft-0.2.3.tar.gz",
            "has_sig": false,
            "md5_digest": "387bccbb3a7a24b8c9bcbd8006dff6f5",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.5",
            "size": 112414,
            "upload_time": "2019-06-10T15:48:36",
            "url": "https://files.pythonhosted.org/packages/ed/e6/64dcd02b0c664e72e5b1f3143b173097f4334e610abeb8223036834b47bb/delft-0.2.3.tar.gz"
        }
    ]
}