{ "info": { "author": "Ma Can", "author_email": "ma_cancan@163.com", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python :: 3.6" ], "description": "# BERT-BiLSMT-CRF-NER\nTensorflow solution of NER task Using BiLSTM-CRF model with Google BERT Fine-tuning\n\n\u4f7f\u7528\u8c37\u6b4c\u7684BERT\u6a21\u578b\u5728BLSTM-CRF\u6a21\u578b\u4e0a\u8fdb\u884c\u9884\u8bad\u7ec3\u7528\u4e8e\u4e2d\u6587\u547d\u540d\u5b9e\u4f53\u8bc6\u522b\u7684Tensorflow\u4ee3\u7801'\n\n\u4e2d\u6587\u6587\u6863\u8bf7\u67e5\u770bhttps://blog.csdn.net/macanv/article/details/85684284 \u5982\u679c\u5bf9\u60a8\u6709\u5e2e\u52a9\uff0c\u9ebb\u70e6\u70b9\u4e2astar,\u8c22\u8c22~~ \n\nWelcome to star this repository!\n\nThe Chinese training data($PATH/NERdata/) come from:https://github.com/zjy-ucas/ChineseNER \n\nThe CoNLL-2003 data($PATH/NERdata/ori/) come from:https://github.com/kyzhouhzau/BERT-NER \n\nThe evaluation codes come from:https://github.com/guillaumegenthial/tf_metrics/blob/master/tf_metrics/__init__.py \n\n\nTry to implement NER work based on google's BERT code and BiLSTM-CRF network!\nThis project may be more close to process Chinese data. but other language only need Modify a small amount of code.\n\nTHIS PROJECT ONLY SUPPORT Python3. \n###################################################################\n## Download project and install \nYou can install this project by: \n```\npip install bert-base==0.0.8 -i https://pypi.python.org/simple\n```\n\nOR\n```angular2html\ngit clone https://github.com/macanv/BERT-BiLSTM-CRF-NER\ncd BERT-BiLSTM-CRF-NER/\npython3 setup.py install\n```\n\n## UPDATE:\n- 2019.2.25 Fix some bug for ner service\n- 2019.2.19: add text classification service\n- fix Missing loss error\n- add label_list params in train process, so you can using -label_list xxx to special labels in training process. 
\n\n\n## Train model:\nYou can use -help to view the relevant parameters of the training named entity recognition model, where data_dir, bert_config_file, output_dir, init_checkpoint, vocab_file must be specified.\n```angular2html\nbert-base-ner-train -help\n```\n![](./pictures/ner_help.png) \n\n\ntrain/dev/test dataset is like this:\n```\n\u6d77 O\n\u9493 O\n\u6bd4 O\n\u8d5b O\n\u5730 O\n\u70b9 O\n\u5728 O\n\u53a6 B-LOC\n\u95e8 I-LOC\n\u4e0e O\n\u91d1 B-LOC\n\u95e8 I-LOC\n\u4e4b O\n\u95f4 O\n\u7684 O\n\u6d77 O\n\u57df O\n\u3002 O\n```\nThe first one of each line is a token, the second is token's label, and the line is divided by a blank line. The maximum length of each sentence is [max_seq_length] params. \nYou can get training data from above two git repos \nYou can training ner model by running below command: \n```angular2html\nbert-base-ner-train \\\n -data_dir {your dataset dir}\\\n -output_dir {training output dir}\\\n -init_checkpoint {Google BERT model dir}\\\n -bert_config_file {bert_config.json under the Google BERT model dir} \\\n -vocab_file {vocab.txt under the Google BERT model dir}\n```\nyou can special labels using -label_list params, the project get labels from training data. \n```angular2html\n# using , split\n-labels 'B-LOC, I-LOC ...'\nOR save label in a file like labels.txt, one line one label\n-labels labels.txt\n``` \n\nAfter training model, the NER model will be saved in {output_dir} which you special above cmd line. 
\n\n## As Service\nMany server and client code comes from excellent open source projects: [bert as service of hanxiao](https://github.com/hanxiao/bert-as-service) If my code violates any license agreement, please let me know and I will correct it the first time.\n~~and NER server/client service code can be applied to other tasks with simple modifications, such as text categorization, which I will provide later.~~\nthis project private Named Entity Recognition and Text Classification server service.\nWelcome to submit your request or share your model, if you want to share it on Github or my work. \n\nYou can use -help to view the relevant parameters of the NER as Service:\nwhich model_dir, bert_model_dir is need\n```\nbert-base-serving-start -help\n```\n![](./pictures/server_help.png)\n\nand than you can using below cmd start ner service:\n```angular2html\nbert-base-serving-start \\\n -model_dir C:\\workspace\\python\\BERT_Base\\output\\ner2 \\\n -bert_model_dir F:\\chinese_L-12_H-768_A-12\n -model_pb_dir C:\\workspace\\python\\BERT_Base\\model_pb_dir\n -mode NER\n```\nor text classification service:\n```angular2html\nbert-base-serving-start \\\n -model_dir C:\\workspace\\python\\BERT_Base\\output\\ner2 \\\n -bert_model_dir F:\\chinese_L-12_H-768_A-12\n -model_pb_dir C:\\workspace\\python\\BERT_Base\\model_pb_dir\n -mode CLASS\n -max_seq_len 202\n```\n\nas you see: \nmode: If mode is NER/CLASS, then the service identified by the Named Entity Recognition/Text Classification will be started. If it is BERT, it will be the same as the [bert as service] project. 
\nbert_model_dir: bert_model_dir is a BERT model, you can download from https://github.com/google-research/bert\nner_model_dir: your ner model checkpoint dir\nmodel_pb_dir: model freeze save dir, after run optimize func, there will contains like ner_model.pb binary file \n>You can download my ner model from\uff1ahttps://pan.baidu.com/s/1m9VcueQ5gF-TJc00sFD88w, ex_code: guqq\n> Or text classification model from: https://pan.baidu.com/s/1oFPsOUh1n5AM2HjDIo2XCw, ex_code: bbu8 \nSet ner_mode.pb/classification_model.pb to model_pb_dir, and set other file to model_dir(Different models need to be stored separately, you can set ner models label_list.pkl and label2id.pkl to model_dir/ner/ and set text classification file to model_dir/text_classification) , Text classification model can classify 12 categories of Chinese data\uff1a '\u6e38\u620f', '\u5a31\u4e50', '\u8d22\u7ecf', '\u65f6\u653f', '\u80a1\u7968', '\u6559\u80b2', '\u793e\u4f1a', '\u4f53\u80b2', '\u5bb6\u5c45', '\u65f6\u5c1a', '\u623f\u4ea7', '\u5f69\u7968' \n\nYou can see below service starting info:\n![](./pictures/service_1.png)\n![](./pictures/service_2.png)\n\n\nyou can using below code test client: \n#### 1. 
NER Client\n```angular2html\nimport time\nfrom bert_base.client import BertClient\n\nwith BertClient(show_server_config=False, check_version=False, check_length=False, mode='NER') as bc:\n start_t = time.perf_counter()\n str = '1\u670824\u65e5\uff0c\u65b0\u534e\u793e\u5bf9\u5916\u53d1\u5e03\u4e86\u4e2d\u592e\u5bf9\u96c4\u5b89\u65b0\u533a\u7684\u6307\u5bfc\u610f\u89c1\uff0c\u6d0b\u6d0b\u6d12\u6d121.2\u4e07\u591a\u5b57\uff0c17\u6b21\u63d0\u5230\u5317\u4eac\uff0c4\u6b21\u63d0\u5230\u5929\u6d25\uff0c\u4fe1\u606f\u91cf\u5f88\u5927\uff0c\u5176\u5b9e\u4e5f\u56de\u7b54\u4e86\u4eba\u4eec\u5173\u5fc3\u7684\u5f88\u591a\u95ee\u9898\u3002'\n rst = bc.encode([str, str])\n print('rst:', rst)\n print(time.perf_counter() - start_t)\n```\nyou can see this after run the above code:\n![](./pictures/server_ner_rst.png)\nIf you want to customize the word segmentation method, you only need to make the following simple changes on the client side code.\n\n```angular2html\nrst = bc.encode([list(str), list(str)], is_tokenized=True)\n``` \n\n#### 2. 
Text Classification Client\n```angular2html\nwith BertClient(show_server_config=False, check_version=False, check_length=False, mode='CLASS') as bc:\n start_t = time.perf_counter()\n str1 = '\u5317\u4eac\u65f6\u95f42\u670817\u65e5\u51cc\u6668\uff0c\u7b2c69\u5c4a\u67cf\u6797\u56fd\u9645\u7535\u5f71\u8282\u516c\u5e03\u4e3b\u7ade\u8d5b\u5355\u5143\u83b7\u5956\u540d\u5355\uff0c\u738b\u666f\u6625\u3001\u548f\u6885\u51ed\u501f\u738b\u5c0f\u5e05\u6267\u5bfc\u7684\u4e2d\u56fd\u5f71\u7247\u300a\u5730\u4e45\u5929\u957f\u300b\u8fde\u593a\u6700\u4f73\u7537\u5973\u6f14\u5458\u53cc\u94f6\u718a\u5927\u5956\uff0c\u8fd9\u662f\u4e2d\u56fd\u6f14\u5458\u9996\u6b21\u5305\u63fd\u67cf\u6797\u7535\u5f71\u8282\u6700\u4f73\u7537\u5973\u6f14\u5458\u5956\uff0c\u4e3a\u534e\u8bed\u5f71\u7247\u5237\u65b0\u7eaa\u5f55\u3002\u4e0e\u6b64\u540c\u65f6\uff0c\u7531\u9752\u5e74\u5bfc\u6f14\u738b\u4e3d\u5a1c\u6267\u5bfc\u7684\u5f71\u7247\u300a\u7b2c\u4e00\u6b21\u7684\u522b\u79bb\u300b\u4e5f\u8363\u83b7\u4e86\u672c\u5c4a\u67cf\u6797\u7535\u5f71\u8282\u65b0\u751f\u4ee3\u5355\u5143\u56fd\u9645\u8bc4\u5ba1\u56e2\u6700\u4f73\u5f71\u7247\uff0c\u53ef\u4ee5\u8bf4\uff0c\u5728\u7ecf\u5386\u6570\u4e2a\u83b7\u5956\u5c0f\u5e74\u4e4b\u540e\uff0c\u4e2d\u56fd\u7535\u5f71\u5728\u67cf\u6797\u5f71\u5c55\u518d\u6b21\u8fce\u6765\u4e86\u9ad8\u5149\u65f6\u523b\u3002'\n str2 = 
'\u53d7\u7ca4\u6e2f\u6fb3\u5927\u6e7e\u533a\u89c4\u5212\u7eb2\u8981\u63d0\u632f\uff0c\u6e2f\u80a1\u5468\u4e8c\u9ad8\u5f00\uff0c\u6052\u6307\u5f00\u76d8\u4e0a\u6da8\u8fd1\u767e\u70b9\uff0c\u6da8\u5e450.33%\uff0c\u62a528440.49\u70b9\uff0c\u76f8\u5173\u6982\u5ff5\u80a1\u4ea6\u96c6\u4f53\u4e0a\u6da8\uff0c\u7535\u5b50\u5143\u4ef6\u3001\u65b0\u80fd\u6e90\u8f66\u3001\u4fdd\u9669\u3001\u57fa\u5efa\u6982\u5ff5\u591a\u6570\u4e0a\u6da8\u3002\u7ca4\u6cf0\u80a1\u4efd\u3001\u73e0\u6c5f\u5b9e\u4e1a\u3001\u6df1\u5929\u5730A\u7b4910\u4f59\u80a1\u6da8\u505c\uff1b\u4e2d\u5174\u901a\u8baf\u3001\u4e18\u949b\u79d1\u6280\u3001\u821c\u5b87\u5149\u5b66\u5206\u522b\u9ad8\u5f001.4%\u30014.3%\u30011.6%\u3002\u6bd4\u4e9a\u8fea\u7535\u5b50\u3001\u6bd4\u4e9a\u8fea\u80a1\u4efd\u3001\u5149\u5b87\u56fd\u9645\u5206\u522b\u9ad8\u5f001.7%\u30011.2%\u30011%\u3002\u8d8a\u79c0\u4ea4\u901a\u57fa\u5efa\u6da8\u8fd12%\uff0c\u7ca4\u6d77\u6295\u8d44\u3001\u78a7\u6842\u56ed\u7b49\u591a\u80a1\u6da8\u8d851%\u3002\u5176\u4ed6\u65b9\u9762\uff0c\u65e5\u672c\u8f6f\u94f6\u96c6\u56e2\u80a1\u4ef7\u4e0a\u6da8\u8d850.4%\uff0c\u63a8\u52a8\u65e5\u7ecf225\u548c\u4e1c\u8bc1\u6307\u6570\u9f50\u9f50\u9ad8\u5f00\uff0c\u4f46\u968f\u540e\u5747\u56de\u5410\u6da8\u5e45\u8f6c\u8dcc\u4e1c\u8bc1\u6307\u6570\u8dcc0.2%\uff0c\u65e5\u7ecf225\u6307\u6570\u8dcc0.11%\uff0c\u62a521258.4\u70b9\u3002\u53d7\u82af\u7247\u5236\u9020\u5546SK\u6d77\u529b\u58eb\u80a1\u4ef7\u4e0b\u8dcc1.34\uff05\u62d6\u7d2f\uff0c\u97e9\u56fd\u7efc\u6307\u4e0b\u8dcc0.34\uff05\u81f32203.9\u70b9\u3002\u6fb3\u5927\u5229\u4e9aASX 200\u6307\u6570\u65e9\u76d8\u4e0a\u6da80.39\uff05\u81f36089.8\u70b9\uff0c\u5927\u591a\u6570\u884c\u4e1a\u677f\u5757\u5747\u73b0\u6da8\u52bf\u3002\u5728\u4fdd\u5065\u54c1\u54c1\u724c\u6fb3\u4f73\u5b9d\u4e0b\u8c03\u4e0b\u534a\u8d22\u5e74\u7684\u9500\u552e\u9884\u671f\u540e\uff0c\u5176\u80a1\u4ef7\u66b4\u8dcc\u8d85\u8fc723\uff05\u3002\u6fb3\u4f73\u5b9dCEO\u4ea8\u5f17\u91cc\uff08Richard 
Henfrey\uff09\u8ba4\u4e3a\uff0c\u516c\u53f8\u4e0b\u534a\u5e74\u7684\u5229\u6da6\u53ef\u80fd\u4f1a\u4f4e\u4e8e\u4e0a\u534a\u5e74\uff0c\u4e3b\u8981\u662f\u53d7\u5230\u9500\u552e\u989d\u75b2\u5f31\u7684\u5f71\u54cd\u3002\u540c\u65f6\uff0c\u4e9a\u5e02\u65e9\u76d8\u6fb3\u6d32\u8054\u50a8\u516c\u5e03\u4e862\u6708\u4f1a\u8bae\u7eaa\u8981\uff0c\u653f\u7b56\u59d4\u5458\u5c06\u7ee7\u7eed\u8c28\u614e\u8bc4\u4f30\u7ecf\u6d4e\u589e\u957f\u524d\u666f\uff0c\u56e0\u524d\u666f\u5145\u6ee1\u4e0d\u786e\u5b9a\u6027\u7684\u5f71\u54cd\uff0c\u7a33\u5b9a\u5f53\u524d\u7684\u5229\u7387\u6c34\u5e73\u6bd4\u8d38\u7136\u8c03\u6574\u5229\u7387\u66f4\u4e3a\u5408\u9002\uff0c\u800c\u4e14\u5f53\u524d\u5229\u7387\u6c34\u5e73\u5c06\u6709\u5229\u4e8e\u8d8b\u5411\u901a\u80c0\u76ee\u6807\u53ca\u6539\u5584\u5c31\u4e1a\uff0c\u5f53\u524d\u52b3\u52a8\u529b\u5e02\u573a\u6570\u636e\u8868\u73b0\u5f3a\u52bf\u4e8e\u5176\u4ed6\u7ecf\u6d4e\u6570\u636e\u3002\u53e6\u4e00\u65b9\u9762\uff0c\u7ecf\u6d4e\u589e\u957f\u524d\u666f\u4ea6\u4ee4\u6d88\u8d39\u8005\u6d88\u8d39\u610f\u613f\u4e0b\u6ed1\uff0c\u5982\u679c\u623f\u4ef7\u51fa\u73b0\u4e0b\u6ed1\uff0c\u6d88\u8d39\u53ef\u80fd\u4f1a\u8fdb\u4e00\u6b65\u75b2\u5f31\u3002\u5728\u6fb3\u6d32\u8054\u50a8\u516c\u5e03\u4f1a\u8bae\u7eaa\u8981\u540e\uff0c\u6fb3\u5143\u5151\u7f8e\u5143\u4e0b\u8dcc\u8fd130\u70b9\uff0c\u62a50.7120 \u3002\u7f8e\u5143\u6307\u6570\u5728\u6628\u65e5\u89e6\u53ca96.65\u9644\u8fd1\u7684\u4f4e\u70b9\u4e4b\u540e\u53cd\u5f39\u81f396.904\u3002\u65e5\u5143\u5151\u7f8e\u5143\u62a5110.56\uff0c\u63a5\u8fd1\u4e0a\u4e00\u4ea4\u6613\u65e5\u7684\u4f4e\u70b9\u3002'\n str3 = '\u65b0\u4eac\u62a5\u5feb\u8baf 
\u636e\u56fd\u5bb6\u5e02\u573a\u76d1\u7ba1\u603b\u5c40\u6d88\u606f\uff0c\u9488\u5bf9\u5a92\u4f53\u62a5\u9053\u6c34\u997a\u7b49\u732a\u8089\u5236\u54c1\u68c0\u51fa\u975e\u6d32\u732a\u761f\u75c5\u6bd2\u6838\u9178\u9633\u6027\u95ee\u9898\uff0c\u5e02\u573a\u76d1\u7ba1\u603b\u5c40\u3001\u519c\u4e1a\u519c\u6751\u90e8\u5df2\u8981\u6c42\u4f01\u4e1a\u7acb\u5373\u8ffd\u6eaf\u732a\u8089\u539f\u6599\u6765\u6e90\u5e76\u5bf9\u732a\u8089\u5236\u54c1\u8fdb\u884c\u4e86\u5904\u7f6e\u3002\u4e24\u90e8\u95e8\u5df2\u6d3e\u51fa\u8054\u5408\u7763\u67e5\u7ec4\u8c03\u67e5\u6838\u5b9e\u76f8\u5173\u60c5\u51b5\uff0c\u8981\u6c42\u732a\u8089\u5236\u54c1\u751f\u4ea7\u4f01\u4e1a\u8fdb\u4e00\u6b65\u52a0\u5f3a\u5bf9\u732a\u8089\u539f\u6599\u7684\u7ba1\u63a7\uff0c\u843d\u5b9e\u68c0\u9a8c\u68c0\u75ab\u7968\u8bc1\u67e5\u9a8c\u89c4\u5b9a\uff0c\u5b8c\u5584\u975e\u6d32\u732a\u761f\u68c0\u6d4b\u548c\u590d\u6838\u5236\u5ea6\uff0c\u9632\u6b62\u67d3\u75ab\u732a\u8089\u539f\u6599\u8fdb\u5165\u98df\u54c1\u52a0\u5de5\u73af\u8282\u3002\u5e02\u573a\u76d1\u7ba1\u603b\u5c40\u3001\u519c\u4e1a\u519c\u6751\u90e8\u7b49\u90e8\u95e8\u8981\u6c42\u5404\u5730\u5168\u9762\u843d\u5b9e\u9632\u63a7\u8d23\u4efb\uff0c\u5f3a\u5316\u9632\u63a7\u63aa\u65bd\uff0c\u89c4\u8303\u4fe1\u606f\u62a5\u544a\u548c\u53d1\u5e03\uff0c\u5bf9\u4e0d\u6309\u8981\u6c42\u5c65\u884c\u9632\u63a7\u8d23\u4efb\u7684\u4f01\u4e1a\uff0c\u4e00\u65e6\u53d1\u73b0\u5c06\u4e25\u5389\u67e5\u5904\u3002\u4e13\u5bb6\u8ba4\u4e3a\uff0c\u975e\u6d32\u732a\u761f\u4e0d\u662f\u4eba\u755c\u5171\u60a3\u75c5\uff0c\u867d\u7136\u5bf9\u732a\u6709\u81f4\u547d\u5371\u9669\uff0c\u4f46\u5bf9\u4eba\u6ca1\u6709\u4efb\u4f55\u5371\u5bb3\uff0c\u5c5e\u4e8e\u53ea\u4f20\u732a\u4e0d\u4f20\u4eba\u578b\u75c5\u6bd2\uff0c\u4e0d\u4f1a\u5f71\u54cd\u98df\u54c1\u5b89\u5168\u3002\u5f00\u5c55\u732a\u8089\u5236\u54c1\u75c5\u6bd2\u6838\u9178\u68c0\u6d4b\uff0c\u53ef\u4e3a\u9632\u63a7\u6eaf\u6e90\u5de5\u4f5c\u63d0\u4f9b\u7ebf\u7d22\u3002'\n rst = bc.encode([str1, str2, str3])\n print('rst:', rst)\n 
print('time used:{}'.format(time.perf_counter() - start_t))\n```\nYou should see the following after running the code above:\n![](./pictures/text_class_rst.png)\n\nNote that the NER service and the Text Classification service cannot be started together, but you can run the command twice with different ports to start both. \n\n\n# The following tutorial is for an old version and will be removed in the future.\n\n## How to train\n#### 1. Download the BERT Chinese model: \n ```\n wget https://storage.googleapis.com/bert_models/2018_11_03/chinese_L-12_H-768_A-12.zip \n ```\n#### 2. Create the output dir\nCreate the output path in the project path:\n```\nmkdir output\n```\n#### 3. Train model\n\n##### First method \n```\n python3 bert_lstm_ner.py \\\n --task_name=\"NER\" \\\n --do_train=True \\\n --do_eval=True \\\n --do_predict=True \\\n --data_dir=NERdata \\\n --vocab_file=checkpoint/vocab.txt \\\n --bert_config_file=checkpoint/bert_config.json \\\n --init_checkpoint=checkpoint/bert_model.ckpt \\\n --max_seq_length=128 \\\n --train_batch_size=32 \\\n --learning_rate=2e-5 \\\n --num_train_epochs=3.0 \\\n --output_dir=./output/result_dir/ \n ``` \n##### OR replace the BERT path and project path in bert_lstm_ner.py\n```\nif os.name == 'nt': # windows path config\n    bert_path = '{your BERT model path}'\n    root_path = '{project path}'\nelse: # linux path config\n    bert_path = '{your BERT model path}'\n    root_path = '{project path}'\n```\nThen run:\n```\npython3 bert_lstm_ner.py\n```\n\n### USING BiLSTM-CRF OR ONLY CRF FOR DECODING\nJust edit line 450 of bert_lstm_ner.py: set the crf_only parameter of the add_blstm_crf_layer function to True or False. \n\nCRF-only output layer:\n```\n blstm_crf = BLSTM_CRF(embedded_chars=embedding, hidden_unit=FLAGS.lstm_size, cell_type=FLAGS.cell, num_layers=FLAGS.num_layers,\n dropout_rate=FLAGS.droupout_rate, initializers=initializers, num_labels=num_labels,\n seq_length=max_seq_length, labels=labels, lengths=lengths, 
is_training=is_training)\n rst = blstm_crf.add_blstm_crf_layer(crf_only=True)\n```\n\n\nBiLSTM with CRF output layer:\n```\n blstm_crf = BLSTM_CRF(embedded_chars=embedding, hidden_unit=FLAGS.lstm_size, cell_type=FLAGS.cell, num_layers=FLAGS.num_layers,\n dropout_rate=FLAGS.droupout_rate, initializers=initializers, num_labels=num_labels,\n seq_length=max_seq_length, labels=labels, lengths=lengths, is_training=is_training)\n rst = blstm_crf.add_blstm_crf_layer(crf_only=False)\n```\n\n## Result:\nAll parameters use their defaults.\n#### On the dev data set:\n![](./pictures/picture1.png)\n\n#### On the test data set\n![](./pictures/picture2.png)\n\n#### Entity-level result:\nThe two results above are label-level results; the entity-level result is computed in lines 796-798 of the code and is output during prediction.\nMy entity-level result:\n![](./pictures/03E18A6A9C16082CF22A9E8837F7E35F.png)\n> My model can be downloaded from Baidu cloud: \n>Link: https://pan.baidu.com/s/1GfDFleCcTv5393ufBYdgqQ extraction code: 4cus \nNOTE: My model was trained with the crf_only parameter.\n\n## ONLINE PREDICT\nOnce the model is trained, just run\n```\npython3 terminal_predict.py\n```\n![](./pictures/predict.png)\n\n## Using NER as Service\n\n#### Service \nUsing NER as a service is simple; just run the Python script below in the project root path:\n```\npython3 runs.py \\\n -mode NER \\\n -bert_model_dir /home/macan/ml/data/chinese_L-12_H-768_A-12 \\\n -ner_model_dir /home/macan/ml/data/bert_ner \\\n -model_pd_dir /home/macan/ml/workspace/BERT_Base/output/predict_optimizer \\\n -num_worker 8\n```\n\n\nYou can download my NER model from: https://pan.baidu.com/s/1m9VcueQ5gF-TJc00sFD88w, ex_code: guqq \nPlace ner_model.pb in model_pd_dir, put the other files in ner_model_dir, and then run the command above. \n![](./pictures/service_1.png)\n![](./pictures/service_2.png)\n\n\n#### Client\nFor client usage, see the client_test.py script:\n```python\nimport 
time\nfrom client.client import BertClient\n\n# use a raw string for the Windows path\nner_model_dir = r'C:\\workspace\\python\\BERT_Base\\output\\predict_ner'\nwith BertClient(ner_model_dir=ner_model_dir, show_server_config=False, check_version=False, check_length=False, mode='NER') as bc:\n    start_t = time.perf_counter()\n    text = '1\u670824\u65e5\uff0c\u65b0\u534e\u793e\u5bf9\u5916\u53d1\u5e03\u4e86\u4e2d\u592e\u5bf9\u96c4\u5b89\u65b0\u533a\u7684\u6307\u5bfc\u610f\u89c1\uff0c\u6d0b\u6d0b\u6d12\u6d121.2\u4e07\u591a\u5b57\uff0c17\u6b21\u63d0\u5230\u5317\u4eac\uff0c4\u6b21\u63d0\u5230\u5929\u6d25\uff0c\u4fe1\u606f\u91cf\u5f88\u5927\uff0c\u5176\u5b9e\u4e5f\u56de\u7b54\u4e86\u4eba\u4eec\u5173\u5fc3\u7684\u5f88\u591a\u95ee\u9898\u3002'\n    rst = bc.encode([text])\n    print('rst:', rst)\n    print(time.perf_counter() - start_t)\n```\nNOTE: for the input format you can refer to the bert-as-service project. \nContributions of client code in other languages such as Java are welcome. \n\n## Using your own data for training\nIf you want to train the NER model on your own data, just modify the get_labels function.\n```python\ndef get_labels(self):\n    return [\"O\", \"B-PER\", \"I-PER\", \"B-ORG\", \"I-ORG\", \"B-LOC\", \"I-LOC\", \"X\", \"[CLS]\", \"[SEP]\"]\n```\nNOTE: \"X\", \"[CLS]\", and \"[SEP]\" are required; just replace the other labels in this returned list with the labels from your data. 
\nOr you can use last code lets the program automatically get the label from training data\n```angular2html\ndef get_labels(self):\n # \u901a\u8fc7\u8bfb\u53d6train\u6587\u4ef6\u83b7\u53d6\u6807\u7b7e\u7684\u65b9\u6cd5\u4f1a\u51fa\u73b0\u4e00\u5b9a\u7684\u98ce\u9669\u3002\n if os.path.exists(os.path.join(FLAGS.output_dir, 'label_list.pkl')):\n with codecs.open(os.path.join(FLAGS.output_dir, 'label_list.pkl'), 'rb') as rf:\n self.labels = pickle.load(rf)\n else:\n if len(self.labels) > 0:\n self.labels = self.labels.union(set([\"X\", \"[CLS]\", \"[SEP]\"]))\n with codecs.open(os.path.join(FLAGS.output_dir, 'label_list.pkl'), 'wb') as rf:\n pickle.dump(self.labels, rf)\n else:\n self.labels = [\"O\", 'B-TIM', 'I-TIM', \"B-PER\", \"I-PER\", \"B-ORG\", \"I-ORG\", \"B-LOC\", \"I-LOC\", \"X\", \"[CLS]\", \"[SEP]\"]\n return self.labels\n\n```\n\n\n## NEW UPDATE\n2019.1.30 Support pip install and command line control \n\n2019.1.30 Add Service/Client for NER process \n\n2019.1.9: Add code to remove the adam related parameters in the model, and reduce the size of the model file from 1.3GB to 400MB. 
\n\n2019.1.3: Add online predict code \n\n\n\n## reference: \n+ The evaluation codes come from:https://github.com/guillaumegenthial/tf_metrics/blob/master/tf_metrics/__init__.py\n\n+ [https://github.com/google-research/bert](https://github.com/google-research/bert)\n\n+ [https://github.com/kyzhouhzau/BERT-NER](https://github.com/kyzhouhzau/BERT-NER)\n\n+ [https://github.com/zjy-ucas/ChineseNER](https://github.com/zjy-ucas/ChineseNER)\n\n+ [https://github.com/hanxiao/bert-as-service](https://github.com/hanxiao/bert-as-service)\n> Any problem please open issue OR email me(ma_cancan@163.com)\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/macanv/BERT-BiLSTM-CRF-NER", "keywords": "bert nlp ner NER named entity recognition bilstm crf tensorflow machine learning sentence encoding embedding serving", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "bert-base", "package_url": "https://pypi.org/project/bert-base/", "platform": "", "project_url": "https://pypi.org/project/bert-base/", "project_urls": { "Homepage": "https://github.com/macanv/BERT-BiLSTM-CRF-NER" }, "release_url": "https://pypi.org/project/bert-base/0.0.9/", "requires_dist": [ "numpy", "six", "pyzmq (>=16.0.0)", "GPUtil (>=1.3.0)", "termcolor (>=1.1)", "tensorflow (>=1.10.0) ; extra == 'cpu'", "tensorflow-gpu (>=1.10.0) ; extra == 'gpu'", "flask ; extra == 'http'", "flask-compress ; extra == 'http'", "flask-cors ; extra == 'http'", "flask-json ; extra == 'http'" ], "requires_python": "", "summary": "Use Google's BERT for Chinese natural language processing tasks such as named entity recognition and provide server services", "version": "0.0.9" }, "last_serial": 4893370, "releases": { "0.0.5": [ { "comment_text": "", "digests": { "md5": "a24254ee9d6ab8538b649d3e8fe177fc", "sha256": "a1e69c3c846bc0ed46d65bd3e0c4b3ed515889d96e2f448cdec4df38bbf0f652" }, 
"downloads": -1, "filename": "bert_base-0.0.5-py3-none-any.whl", "has_sig": false, "md5_digest": "a24254ee9d6ab8538b649d3e8fe177fc", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 106611, "upload_time": "2019-02-11T11:32:59", "url": "https://files.pythonhosted.org/packages/2b/82/c48a566090a6bbd71e26d63885405dff6c513235dfd87de64d001c463fb7/bert_base-0.0.5-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "bb38dcc8b83fece92c1b9e68d3a01da7", "sha256": "0a1ef9e4f12f1155fc0f3ef979e7accde123c8ee325e244a6ab1a4348e6a79e5" }, "downloads": -1, "filename": "bert_base-0.0.5.tar.gz", "has_sig": false, "md5_digest": "bb38dcc8b83fece92c1b9e68d3a01da7", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 89383, "upload_time": "2019-02-11T11:33:01", "url": "https://files.pythonhosted.org/packages/6e/fb/f9977c17e7e4b4740003bd86eb9d95d10ae8988453370e3f6c8d22b8260c/bert_base-0.0.5.tar.gz" } ], "0.0.6": [ { "comment_text": "", "digests": { "md5": "d8171611d6bf3c5cfb9194ef325a1e2f", "sha256": "47a3a0589957d7688aa30ee6b47edd66534a04cb008b8b57bcce5758eb040b7c" }, "downloads": -1, "filename": "bert_base-0.0.6-py3.6.egg", "has_sig": false, "md5_digest": "d8171611d6bf3c5cfb9194ef325a1e2f", "packagetype": "bdist_egg", "python_version": "3.6", "requires_python": null, "size": 232681, "upload_time": "2019-02-25T02:21:11", "url": "https://files.pythonhosted.org/packages/b1/b6/bcf20f5134ea5a64912921e92bcf6617a1042ed0a76022fa1f3caccb090b/bert_base-0.0.6-py3.6.egg" }, { "comment_text": "", "digests": { "md5": "27766157046d99bda33eef5e54503063", "sha256": "827fb8e76a43bbb0702d4b4d2644d5c8597d3eaea20c6921330341b71c083394" }, "downloads": -1, "filename": "bert_base-0.0.6-py3-none-any.whl", "has_sig": false, "md5_digest": "27766157046d99bda33eef5e54503063", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 110329, "upload_time": "2019-02-19T06:21:46", "url": 
"https://files.pythonhosted.org/packages/0d/f1/df8b864c6da6ca30cf1cf4281fbc4621ab21cd267c5cf05631e5da47151a/bert_base-0.0.6-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "78a89d277214f6b3f3d114339c9fe5ab", "sha256": "aa05db76d0c12daf625949abd54c8a460966b8bdafdc2a17c05aa3db54a0bc7d" }, "downloads": -1, "filename": "bert_base-0.0.6.tar.gz", "has_sig": false, "md5_digest": "78a89d277214f6b3f3d114339c9fe5ab", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 93615, "upload_time": "2019-02-19T06:21:48", "url": "https://files.pythonhosted.org/packages/cf/18/8fb477058377482394a9149fe21b130a4637292a26695f72cdcdca29d6cd/bert_base-0.0.6.tar.gz" } ], "0.0.7": [ { "comment_text": "", "digests": { "md5": "bd423454f9ef279d87e22e9b2bf58ea0", "sha256": "9e3da073eb6319a03a1aa0bc67483557e186c819a2708f8dbaf791bc779b2adb" }, "downloads": -1, "filename": "bert_base-0.0.7-py3-none-any.whl", "has_sig": false, "md5_digest": "bd423454f9ef279d87e22e9b2bf58ea0", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 110497, "upload_time": "2019-02-25T02:21:09", "url": "https://files.pythonhosted.org/packages/46/44/782ef0a1d033cdfc8783172c3e8f5290038f5f54e4fa7b68d36a61e9c044/bert_base-0.0.7-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "22d15801fd4684ea48f7dfcd3be4f4c1", "sha256": "37c4b25d77c434820ae67ef0a696b97d65b64a4c02544d4ccf2ca6683aee35f0" }, "downloads": -1, "filename": "bert_base-0.0.7.tar.gz", "has_sig": false, "md5_digest": "22d15801fd4684ea48f7dfcd3be4f4c1", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 93800, "upload_time": "2019-02-25T02:21:13", "url": "https://files.pythonhosted.org/packages/af/4b/f578b9739b27fc0eddc21cbc801522399e458ed804f15f70c3361780f057/bert_base-0.0.7.tar.gz" } ], "0.0.8": [ { "comment_text": "", "digests": { "md5": "58d04ef486f55d52ba780275eb979473", "sha256": 
"4a04a273ed60d2bb5f33bd4c1e3b534b7234873e59d7e136ed112d62402148fe" }, "downloads": -1, "filename": "bert_base-0.0.8-py3-none-any.whl", "has_sig": false, "md5_digest": "58d04ef486f55d52ba780275eb979473", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 8739, "upload_time": "2019-03-04T02:08:34", "url": "https://files.pythonhosted.org/packages/eb/8d/915852d919c6fc519e8fca96feed62a715c90d0fe81ec13caa7f7834a2e0/bert_base-0.0.8-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "36b1d5fd26f6c4ba3bc775fb0d7bcb5d", "sha256": "5c50d92faedc2e07e0e897ce5702fd8aa7d26700c1e8ba0ffc43e4c6038722a8" }, "downloads": -1, "filename": "bert_base-0.0.8.tar.gz", "has_sig": false, "md5_digest": "36b1d5fd26f6c4ba3bc775fb0d7bcb5d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 10258, "upload_time": "2019-03-04T02:08:36", "url": "https://files.pythonhosted.org/packages/3f/38/50c6e9828e2ce1f6363c6f3b45930745c933eff0b99030b55d6290501a80/bert_base-0.0.8.tar.gz" } ], "0.0.9": [ { "comment_text": "", "digests": { "md5": "b403bbd720b92f68e9a34cafc7572501", "sha256": "2a806924203ac4de32cb2201c00ed15b6ae31b3ea541400a86ada4a15acc681a" }, "downloads": -1, "filename": "bert_base-0.0.9-py3-none-any.whl", "has_sig": false, "md5_digest": "b403bbd720b92f68e9a34cafc7572501", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 110592, "upload_time": "2019-03-04T08:45:34", "url": "https://files.pythonhosted.org/packages/1c/a0/3df3f40301e6506cd2f496fc0a68aeb6a2c2a697672a524fd3f8ebb7998d/bert_base-0.0.9-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "138cf2f0839d149297a8007af3f0636d", "sha256": "3e869ad320dcd5435def63fec5ba8c27d80d47148306cc47fe93f48b5fda3198" }, "downloads": -1, "filename": "bert_base-0.0.9.tar.gz", "has_sig": false, "md5_digest": "138cf2f0839d149297a8007af3f0636d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 
93902, "upload_time": "2019-03-04T08:45:36", "url": "https://files.pythonhosted.org/packages/b1/f3/d7a866780ad694350d3764b00afff0e3eac552aa96c0a2d6729460e2d2f0/bert_base-0.0.9.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "b403bbd720b92f68e9a34cafc7572501", "sha256": "2a806924203ac4de32cb2201c00ed15b6ae31b3ea541400a86ada4a15acc681a" }, "downloads": -1, "filename": "bert_base-0.0.9-py3-none-any.whl", "has_sig": false, "md5_digest": "b403bbd720b92f68e9a34cafc7572501", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 110592, "upload_time": "2019-03-04T08:45:34", "url": "https://files.pythonhosted.org/packages/1c/a0/3df3f40301e6506cd2f496fc0a68aeb6a2c2a697672a524fd3f8ebb7998d/bert_base-0.0.9-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "138cf2f0839d149297a8007af3f0636d", "sha256": "3e869ad320dcd5435def63fec5ba8c27d80d47148306cc47fe93f48b5fda3198" }, "downloads": -1, "filename": "bert_base-0.0.9.tar.gz", "has_sig": false, "md5_digest": "138cf2f0839d149297a8007af3f0636d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 93902, "upload_time": "2019-03-04T08:45:36", "url": "https://files.pythonhosted.org/packages/b1/f3/d7a866780ad694350d3764b00afff0e3eac552aa96c0a2d6729460e2d2f0/bert_base-0.0.9.tar.gz" } ] }