{ "info": { "author": "blmoistawinde", "author_email": "1840962220@qq.com", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: MIT License", "Natural Language :: Chinese (Simplified)", "Natural Language :: Chinese (Traditional)", "Operating System :: OS Independent", "Programming Language :: Python :: 3", "Topic :: Text Processing" ], "description": "# HarvestText\n\nSow with little data seed, harvest much from a text field.\n\n\u64ad\u6492\u51e0\u591a\u79cd\u5b50\u8bcd\uff0c\u6536\u83b7\u4e07\u5343\u9886\u57df\u5b9e\n\n![PyPI - Python Version](https://img.shields.io/badge/python-3.6-blue.svg) ![GitHub](https://img.shields.io/github/license/mashape/apistatus.svg) ![Version](https://img.shields.io/badge/version-V0.5-red.svg)\n\n## \u7528\u9014\nHarvestText\u662f\u4e00\u4e2a\u4e13\u6ce8\u65e0\uff08\u5f31\uff09\u76d1\u7763\u65b9\u6cd5\uff0c\u80fd\u591f\u6574\u5408\u9886\u57df\u77e5\u8bc6\uff08\u5982\u7c7b\u578b\uff0c\u522b\u540d\uff09\u5bf9\u7279\u5b9a\u9886\u57df\u6587\u672c\u8fdb\u884c\u7b80\u5355\u9ad8\u6548\u7684\u5904\u7406\u548c\u5206\u6790\u7684\u5e93\u3002\u9002\u7528\u4e8e\u8bb8\u591a\u6587\u672c\u9884\u5904\u7406\u548c\u521d\u6b65\u63a2\u7d22\u6027\u5206\u6790\u4efb\u52a1\uff0c\u5728\u5c0f\u8bf4\u5206\u6790\uff0c\u7f51\u7edc\u6587\u672c\uff0c\u4e13\u4e1a\u6587\u732e\u7b49\u9886\u57df\u90fd\u6709\u6f5c\u5728\u5e94\u7528\u4ef7\u503c\u3002\n\n\u4f7f\u7528\u6848\u4f8b:\n- [\u5206\u6790\u300a\u4e09\u56fd\u6f14\u4e49\u300b\u4e2d\u7684\u793e\u4ea4\u7f51\u7edc](https://blog.csdn.net/blmoistawinde/article/details/85344906)\uff08\u5b9e\u4f53\u5206\u8bcd\uff0c\u6587\u672c\u6458\u8981\uff0c\u5173\u7cfb\u7f51\u7edc\u7b49\uff09\n![\u7f51\u7edc\u5efa\u6a21\u8fc7\u7a0b\u793a\u610f.png](https://img-blog.csdnimg.cn/20181229200533159.png?x-oss-)\n- 
[2018\u4e2d\u8d85\u8206\u60c5\u5c55\u793a\u7cfb\u7edf](https://blmoistawinde.github.io/SuperLegal2018Display/index.html)\uff08\u5b9e\u4f53\u5206\u8bcd\uff0c\u60c5\u611f\u5206\u6790\uff0c\u65b0\u8bcd\u53d1\u73b0\\[\u8f85\u52a9\u7ef0\u53f7\u8bc6\u522b\\]\u7b49\uff09\n\u76f8\u5173\u6587\u7ae0\uff1a[\u4e00\u6587\u770b\u8bc4\u8bba\u91cc\u7684\u4e2d\u8d85\u98ce\u4e91](https://blog.csdn.net/blmoistawinde/article/details/83443196)\n![2018\u4e2d\u8d85\u8206\u60c5\u5c55\u793a\u7cfb\u7edf](https://img-blog.csdnimg.cn/20181027084021173.png)\n\n\u3010\u6ce8\uff1a\u672c\u5e93\u4ec5\u5b8c\u6210\u5b9e\u4f53\u5206\u8bcd\u548c\u60c5\u611f\u5206\u6790\uff0c\u53ef\u89c6\u5316\u4f7f\u7528matplotlib\u3011\n- [\u8fd1\u4ee3\u53f2\u7eb2\u8981\u4fe1\u606f\u62bd\u53d6\u53ca\u95ee\u7b54\u7cfb\u7edf](https://blog.csdn.net/blmoistawinde/article/details/86557070)(\u547d\u540d\u5b9e\u4f53\u8bc6\u522b\uff0c\u4f9d\u5b58\u53e5\u6cd5\u5206\u6790\uff0c\u7b80\u6613\u95ee\u7b54\u7cfb\u7edf)\n\n\u5177\u4f53\u529f\u80fd\u5982\u4e0b\uff1a\n\n\u76ee\u5f55:\n- \u57fa\u672c\u5904\u7406\n\t- [\u7cbe\u7ec6\u5206\u8bcd\u5206\u53e5](#\u5b9e\u4f53\u94fe\u63a5)\n\t\t- \u53ef\u5305\u542b\u6307\u5b9a\u8bcd\u548c\u7c7b\u522b\u7684\u5206\u8bcd\u3002\u5145\u5206\u8003\u8651\u7701\u7565\u53f7\uff0c\u53cc\u5f15\u53f7\u7b49\u7279\u6b8a\u6807\u70b9\u7684\u5206\u53e5\u3002\n\t- [\u5b9e\u4f53\u94fe\u63a5](#\u5b9e\u4f53\u94fe\u63a5)\n\t\t- \u628a\u522b\u540d\uff0c\u7f29\u5199\u4e0e\u4ed6\u4eec\u7684\u6807\u51c6\u540d\u8054\u7cfb\u8d77\u6765\u3002 \n\t- [\u547d\u540d\u5b9e\u4f53\u8bc6\u522b](#\u547d\u540d\u5b9e\u4f53\u8bc6\u522b)\n\t\t- \u627e\u5230\u4e00\u53e5\u53e5\u5b50\u4e2d\u7684\u4eba\u540d\uff0c\u5730\u540d\uff0c\u673a\u6784\u540d\u7b49\u547d\u540d\u5b9e\u4f53\u3002\n\t- [\u4f9d\u5b58\u53e5\u6cd5\u5206\u6790](#\u4f9d\u5b58\u53e5\u6cd5\u5206\u6790)\n\t\t- 
\u5206\u6790\u8bed\u53e5\u4e2d\u5404\u4e2a\u8bcd\u8bed\uff08\u5305\u62ec\u94fe\u63a5\u5230\u7684\u5b9e\u4f53\uff09\u7684\u4e3b\u8c13\u5bbe\u8bed\u4fee\u9970\u7b49\u8bed\u6cd5\u5173\u7cfb\u3002\n\t- [\u5185\u7f6e\u8d44\u6e90](#\u5185\u7f6e\u8d44\u6e90)\n\t\t- \u901a\u7528\u505c\u7528\u8bcd\uff0c\u901a\u7528\u60c5\u611f\u8bcd\uff0cIT\u3001\u8d22\u7ecf\u3001\u996e\u98df\u3001\u6cd5\u5f8b\u7b49\u9886\u57df\u8bcd\u5178\u3002\u53ef\u76f4\u63a5\u7528\u4e8e\u4ee5\u4e0a\u4efb\u52a1\u3002\n\t- [\u4fe1\u606f\u68c0\u7d22](#\u4fe1\u606f\u68c0\u7d22)\n\t\t- \u7edf\u8ba1\u7279\u5b9a\u5b9e\u4f53\u51fa\u73b0\u7684\u4f4d\u7f6e\uff0c\u6b21\u6570\u7b49\u3002\n\t- [\u65b0\u8bcd\u53d1\u73b0](#\u65b0\u8bcd\u53d1\u73b0)\n\t\t- \u5229\u7528\u7edf\u8ba1\u89c4\u5f8b\uff08\u6216\u89c4\u5219\uff09\u53d1\u73b0\u8bed\u6599\u4e2d\u53ef\u80fd\u4f1a\u88ab\u4f20\u7edf\u5206\u8bcd\u9057\u6f0f\u7684\u7279\u6b8a\u8bcd\u6c47\u3002\u4e5f\u4fbf\u4e8e\u4ece\u6587\u672c\u4e2d\u5feb\u901f\u7b5b\u9009\u51fa\u5173\u952e\u8bcd\u3002\n\t- [\u5b57\u7b26\u62fc\u97f3\u7ea0\u9519](#\u5b57\u7b26\u62fc\u97f3\u7ea0\u9519)\n\t\t- \u628a\u8bed\u53e5\u4e2d\u6709\u53ef\u80fd\u662f\u5df2\u77e5\u5b9e\u4f53\u7684\u9519\u8bef\u62fc\u5199\uff08\u8bef\u5dee\u4e00\u4e2a\u5b57\u7b26\u6216\u62fc\u97f3\uff09\u7684\u8bcd\u8bed\u94fe\u63a5\u5230\u5bf9\u5e94\u5b9e\u4f53\u3002\n\t- [\u5b58\u53d6\u6d88\u9664](#\u5b58\u53d6\u6d88\u9664)\n\t\t- \u53ef\u4ee5\u672c\u5730\u4fdd\u5b58\u6a21\u578b\u518d\u8bfb\u53d6\u590d\u7528\uff0c\u4e5f\u53ef\u4ee5\u6d88\u9664\u5f53\u524d\u6a21\u578b\u7684\u8bb0\u5f55\u3002\n- \u9ad8\u5c42\u5e94\u7528\n\t- [\u60c5\u611f\u5206\u6790](#\u60c5\u611f\u5206\u6790)\n\t\t- \u7ed9\u51fa\u5c11\u91cf\u79cd\u5b50\u8bcd\uff08\u901a\u7528\u7684\u8912\u8d2c\u4e49\u8bcd\u8bed\uff09\uff0c\u5f97\u5230\u8bed\u6599\u4e2d\u5404\u4e2a\u8bcd\u8bed\u548c\u8bed\u6bb5\u7684\u8912\u8d2c\u5ea6\u3002\n\t- [\u5173\u7cfb\u7f51\u7edc](#\u5173\u7cfb\u7f51\u7edc)\n\t\t- 
\u5229\u7528\u5171\u73b0\u5173\u7cfb\uff0c\u83b7\u5f97\u5173\u952e\u8bcd\u4e4b\u95f4\u7684\u7f51\u7edc\u3002\u6216\u8005\u4ee5\u4e00\u4e2a\u7ed9\u5b9a\u8bcd\u8bed\u4e3a\u4e2d\u5fc3\uff0c\u63a2\u7d22\u4e0e\u5176\u76f8\u5173\u7684\u8bcd\u8bed\u7f51\u7edc\u3002\n\t- [\u6587\u672c\u6458\u8981](#\u6587\u672c\u6458\u8981)\n\t\t- \u57fa\u4e8eTextrank\u7b97\u6cd5\uff0c\u5f97\u5230\u4e00\u7cfb\u5217\u53e5\u5b50\u4e2d\u7684\u4ee3\u8868\u6027\u53e5\u5b50\u3002\n\t- [\u4e8b\u5b9e\u62bd\u53d6](#\u4f9d\u5b58\u53e5\u6cd5\u5206\u6790)\n\t\t- \u5229\u7528\u53e5\u6cd5\u5206\u6790\uff0c\u63d0\u53d6\u53ef\u80fd\u8868\u793a\u4e8b\u4ef6\u7684\u4e09\u5143\u7ec4\u3002\n\t- [\u7b80\u6613\u95ee\u7b54\u7cfb\u7edf](#\u7b80\u6613\u95ee\u7b54\u7cfb\u7edf)\n\t\t- \u4ece\u4e09\u5143\u7ec4\u4e2d\u5efa\u7acb\u77e5\u8bc6\u56fe\u8c31\u5e76\u5e94\u7528\u4e8e\u95ee\u7b54\uff0c\u53ef\u4ee5\u5b9a\u5236\u4e00\u4e9b\u95ee\u9898\u6a21\u677f\u3002\u6548\u679c\u6709\u5f85\u63d0\u5347\uff0c\u4ec5\u4f5c\u4e3a\u793a\u4f8b\u3002\n\n\n## \u7528\u6cd5\n\n\n\u9996\u5148\u5b89\u88c5\uff0c\n\u4f7f\u7528pip\n```\npip install harvesttext\n```\n\n\u6216\u8fdb\u5165setup.py\u6240\u5728\u76ee\u5f55\uff0c\u7136\u540e\u547d\u4ee4\u884c:\n```\npython setup.py install\n```\n\n\u968f\u540e\u5728\u4ee3\u7801\u4e2d\uff1a\n\n```python3\nfrom harvesttext import HarvestText\nht = HarvestText()\n```\n\n\u5373\u53ef\u8c03\u7528\u672c\u5e93\u7684\u529f\u80fd\u63a5\u53e3\u3002\n\n \n### \u5b9e\u4f53\u94fe\u63a5\n\u7ed9\u5b9a\u67d0\u4e9b\u5b9e\u4f53\u53ca\u5176\u53ef\u80fd\u7684\u4ee3\u79f0\uff0c\u4ee5\u53ca\u5b9e\u4f53\u5bf9\u5e94\u7c7b\u578b\u3002\u5c06\u5176\u767b\u5f55\u5230\u8bcd\u5178\u4e2d\uff0c\u5728\u5206\u8bcd\u65f6\u4f18\u5148\u5207\u5206\u51fa\u6765\uff0c\u5e76\u4e14\u4ee5\u5bf9\u5e94\u7c7b\u578b\u4f5c\u4e3a\u8bcd\u6027\u3002\u4e5f\u53ef\u4ee5\u5355\u72ec\u83b7\u5f97\u8bed\u6599\u4e2d\u7684\u6240\u6709\u5b9e\u4f53\u53ca\u5176\u4f4d\u7f6e\uff1a\n\n```python3\npara = 
\"\u4e0a\u6e2f\u7684\u6b66\u78ca\u548c\u6052\u5927\u7684\u90dc\u6797\uff0c\u8c01\u662f\u4e2d\u56fd\u6700\u597d\u7684\u524d\u950b\uff1f\u90a3\u5f53\u7136\u662f\u6b66\u78ca\u6b66\u7403\u738b\u4e86\uff0c\u4ed6\u662f\u5c04\u624b\u699c\u7b2c\u4e00\uff0c\u539f\u6765\u662f\u5f31\u70b9\u7684\u5355\u5200\u4e5f\u6709\u4e86\u8fdb\u6b65\"\nentity_mention_dict = {'\u6b66\u78ca':['\u6b66\u78ca','\u6b66\u7403\u738b'],'\u90dc\u6797':['\u90dc\u6797','\u90dc\u98de\u673a'],'\u524d\u950b':['\u524d\u950b'],'\u4e0a\u6d77\u4e0a\u6e2f':['\u4e0a\u6e2f'],'\u5e7f\u5dde\u6052\u5927':['\u6052\u5927'],'\u5355\u5200\u7403':['\u5355\u5200']}\nentity_type_dict = {'\u6b66\u78ca':'\u7403\u5458','\u90dc\u6797':'\u7403\u5458','\u524d\u950b':'\u4f4d\u7f6e','\u4e0a\u6d77\u4e0a\u6e2f':'\u7403\u961f','\u5e7f\u5dde\u6052\u5927':'\u7403\u961f','\u5355\u5200\u7403':'\u672f\u8bed'}\nht.add_entities(entity_mention_dict,entity_type_dict)\nprint(\"\\nSentence segmentation\")\nprint(ht.seg(para,return_sent=True)) # return_sent=False\u65f6\uff0c\u5219\u8fd4\u56de\u8bcd\u8bed\u5217\u8868\n```\n\n> \u4e0a\u6e2f \u7684 \u6b66\u78ca \u548c \u6052\u5927 \u7684 \u90dc\u6797 \uff0c \u8c01 \u662f \u4e2d\u56fd \u6700\u597d \u7684 \u524d\u950b \uff1f \u90a3 \u5f53\u7136 \u662f \u6b66\u78ca \u6b66\u7403\u738b \u4e86\uff0c \u4ed6 \u662f \u5c04\u624b\u699c \u7b2c\u4e00 \uff0c \u539f\u6765 \u662f \u5f31\u70b9 \u7684 \u5355\u5200 \u4e5f \u6709 \u4e86 \u8fdb\u6b65\n\n\u91c7\u7528\u4f20\u7edf\u7684\u5206\u8bcd\u5de5\u5177\u5f88\u5bb9\u6613\u628a\u201c\u6b66\u7403\u738b\u201d\u62c6\u5206\u4e3a\u201c\u6b66 \u7403\u738b\u201d\n\n\u8bcd\u6027\u6807\u6ce8\uff0c\u5305\u62ec\u6307\u5b9a\u7684\u7279\u6b8a\u7c7b\u578b\u3002\n```python3\nprint(\"\\nPOS tagging with entity types\")\nfor word, flag in ht.posseg(para):\n\tprint(\"%s:%s\" % (word, flag),end = \" \")\n```\n\n> \u4e0a\u6e2f:\u7403\u961f \u7684:uj \u6b66\u78ca:\u7403\u5458 \u548c:c \u6052\u5927:\u7403\u961f \u7684:uj \u90dc\u6797:\u7403\u5458 \uff0c:x \u8c01:r \u662f:v 
\u4e2d\u56fd:ns \u6700\u597d:a \u7684:uj \u524d\u950b:\u4f4d\u7f6e \uff1f:x \u90a3:r \u5f53\u7136:d \u662f:v \u6b66\u78ca:\u7403\u5458 \u6b66\u7403\u738b:\u7403\u5458 \u4e86:ul \uff0c:x \u4ed6:r \u662f:v \u5c04\u624b\u699c:n \u7b2c\u4e00:m \uff0c:x \u539f\u6765:d \u662f:v \u5f31\u70b9:n \u7684:uj \u5355\u5200:\u672f\u8bed \u4e5f:d \u6709:v \u4e86:ul \u8fdb\u6b65:d \n\n```python3\nfor span, entity in ht.entity_linking(para):\n\tprint(span, entity)\n```\n\n> [0, 2] ('\u4e0a\u6d77\u4e0a\u6e2f', '#\u7403\u961f#')\n[3, 5] ('\u6b66\u78ca', '#\u7403\u5458#')\n[6, 8] ('\u5e7f\u5dde\u6052\u5927', '#\u7403\u961f#')\n[9, 11] ('\u90dc\u6797', '#\u7403\u5458#')\n[19, 21] ('\u524d\u950b', '#\u4f4d\u7f6e#')\n[26, 28] ('\u6b66\u78ca', '#\u7403\u5458#')\n[28, 31] ('\u6b66\u78ca', '#\u7403\u5458#')\n[47, 49] ('\u5355\u5200\u7403', '#\u672f\u8bed#')\n\n\u8fd9\u91cc\u628a\u201c\u6b66\u7403\u738b\u201d\u8f6c\u5316\u4e3a\u4e86\u6807\u51c6\u6307\u79f0\u201c\u6b66\u78ca\u201d\uff0c\u53ef\u4ee5\u4fbf\u4e8e\u6807\u51c6\u7edf\u4e00\u7684\u7edf\u8ba1\u5de5\u4f5c\u3002\n\n\u5206\u53e5\uff1a\n```python3\nprint(ht.cut_sentences(para))\n```\n\n> ['\u4e0a\u6e2f\u7684\u6b66\u78ca\u548c\u6052\u5927\u7684\u90dc\u6797\uff0c\u8c01\u662f\u4e2d\u56fd\u6700\u597d\u7684\u524d\u950b\uff1f', 
'\u90a3\u5f53\u7136\u662f\u6b66\u78ca\u6b66\u7403\u738b\u4e86\uff0c\u4ed6\u662f\u5c04\u624b\u699c\u7b2c\u4e00\uff0c\u539f\u6765\u662f\u5f31\u70b9\u7684\u5355\u5200\u4e5f\u6709\u4e86\u8fdb\u6b65']\n\n\u5982\u679c\u624b\u5934\u6682\u65f6\u6ca1\u6709\u53ef\u7528\u7684\u8bcd\u5178\uff0c\u4e0d\u59a8\u770b\u770b\u672c\u5e93[\u5185\u7f6e\u8d44\u6e90](#\u5185\u7f6e\u8d44\u6e90)\u4e2d\u7684\u9886\u57df\u8bcd\u5178\u662f\u5426\u9002\u5408\u4f60\u7684\u9700\u8981\u3002\n\n\\*\u73b0\u5728\u672c\u5e93\u4e5f\u80fd\u591f\u7528\u4e00\u4e9b\u57fa\u672c\u7b56\u7565\u6765\u5904\u7406\u590d\u6742\u7684\u5b9e\u4f53\u6d88\u6b67\u4efb\u52a1\uff08\u6bd4\u5982\u4e00\u8bcd\u591a\u4e49\u3010\"\u8001\u5e08\"\u662f\u6307\"A\u8001\u5e08\"\u8fd8\u662f\"B\u8001\u5e08\"\uff1f\u3011\u3001\u5019\u9009\u8bcd\u91cd\u53e0\u3010xx\u5e02\u957f/\u6c5fyy\uff1f\u3001xx\u5e02/\u957f\u6c5fyy\uff1f\u3011\uff09\u3002\n\u5177\u4f53\u53ef\u89c1[linking_strategy()](./examples/basics.py#linking_strategy)\n\n \n### \u547d\u540d\u5b9e\u4f53\u8bc6\u522b\n\u627e\u5230\u4e00\u53e5\u53e5\u5b50\u4e2d\u7684\u4eba\u540d\uff0c\u5730\u540d\uff0c\u673a\u6784\u540d\u7b49\u547d\u540d\u5b9e\u4f53\u3002\u4f7f\u7528\u4e86 [pyhanLP](https://github.com/hankcs/pyhanlp) \u7684\u63a5\u53e3\u5b9e\u73b0\u3002\n\n```python\nht0 = HarvestText()\nsent = \"\u4e0a\u6d77\u4e0a\u6e2f\u8db3\u7403\u961f\u7684\u6b66\u78ca\u662f\u4e2d\u56fd\u6700\u597d\u7684\u524d\u950b\u3002\"\nprint(ht0.named_entity_recognition(sent))\n```\n\n```\n{'\u4e0a\u6d77\u4e0a\u6e2f\u8db3\u7403\u961f': '\u673a\u6784\u540d', '\u6b66\u78ca': '\u4eba\u540d', '\u4e2d\u56fd': '\u5730\u540d'}\n```\n\n \n### \u4f9d\u5b58\u53e5\u6cd5\u5206\u6790\n\u5206\u6790\u8bed\u53e5\u4e2d\u5404\u4e2a\u8bcd\u8bed\uff08\u5305\u62ec\u94fe\u63a5\u5230\u7684\u5b9e\u4f53\uff09\u7684\u4e3b\u8c13\u5bbe\u8bed\u4fee\u9970\u7b49\u8bed\u6cd5\u5173\u7cfb\uff0c\u5e76\u4ee5\u6b64\u63d0\u53d6\u53ef\u80fd\u7684\u4e8b\u4ef6\u4e09\u5143\u7ec4\u3002\u4f7f\u7528\u4e86 [pyhanLP](https://github.com/hankcs/pyhanlp) 
\u7684\u63a5\u53e3\u5b9e\u73b0\u3002\n\n```python\nht0 = HarvestText()\npara = \"\u4e0a\u6e2f\u7684\u6b66\u78ca\u6b66\u7403\u738b\u662f\u4e2d\u56fd\u6700\u597d\u7684\u524d\u950b\u3002\"\nentity_mention_dict = {'\u6b66\u78ca': ['\u6b66\u78ca', '\u6b66\u7403\u738b'], \"\u4e0a\u6d77\u4e0a\u6e2f\":[\"\u4e0a\u6e2f\"]}\nentity_type_dict = {'\u6b66\u78ca': '\u7403\u5458', \"\u4e0a\u6d77\u4e0a\u6e2f\":\"\u7403\u961f\"}\nht0.add_entities(entity_mention_dict, entity_type_dict)\nfor arc in ht0.dependency_parse(para):\n    print(arc)\n```\n\n```\n[0, '\u4e0a\u6e2f', '\u7403\u961f', '\u5b9a\u4e2d\u5173\u7cfb', 3]\n[1, '\u7684', 'u', '\u53f3\u9644\u52a0\u5173\u7cfb', 0]\n[2, '\u6b66\u78ca', '\u7403\u5458', '\u5b9a\u4e2d\u5173\u7cfb', 3]\n[3, '\u6b66\u7403\u738b', '\u7403\u5458', '\u4e3b\u8c13\u5173\u7cfb', 4]\n[4, '\u662f', 'v', '\u6838\u5fc3\u5173\u7cfb', -1]\n[5, '\u4e2d\u56fd', 'ns', '\u5b9a\u4e2d\u5173\u7cfb', 8]\n[6, '\u6700\u597d', 'd', '\u5b9a\u4e2d\u5173\u7cfb', 8]\n[7, '\u7684', 'u', '\u53f3\u9644\u52a0\u5173\u7cfb', 6]\n[8, '\u524d\u950b', 'n', '\u52a8\u5bbe\u5173\u7cfb', 4]\n[9, '\u3002', 'w', '\u6807\u70b9\u7b26\u53f7', 4]\n```\n```python\nprint(ht0.triple_extraction(para))\n```\n```\n[['\u4e0a\u6e2f\u6b66\u78ca\u6b66\u7403\u738b', '\u662f', '\u4e2d\u56fd\u6700\u597d\u524d\u950b']]\n```\n\n \n\n### \u5b57\u7b26\u62fc\u97f3\u7ea0\u9519\n\u628a\u8bed\u53e5\u4e2d\u6709\u53ef\u80fd\u662f\u5df2\u77e5\u5b9e\u4f53\u7684\u9519\u8bef\u62fc\u5199\uff08\u8bef\u5dee\u4e00\u4e2a\u5b57\u7b26\u6216\u62fc\u97f3\uff09\u7684\u8bcd\u8bed\u94fe\u63a5\u5230\u5bf9\u5e94\u5b9e\u4f53\u3002\n```python\ndef entity_error_check():\n    ht0 = HarvestText()\n    typed_words = {\"\u4eba\u540d\":[\"\u6b66\u78ca\"]}\n    ht0.add_typed_words(typed_words)\n    sent1 = \"\u6b66\u78ca\u548c\u5434\u529b\u53ea\u5dee\u4e00\u4e2a\u62fc\u97f3\"\n    print(sent1)\n    print(ht0.entity_linking(sent1, pinyin_recheck=True))\n    sent2 = 
\"\u6b66\u78ca\u548c\u5434\u78ca\u53ea\u5dee\u4e00\u4e2a\u5b57\"\n    print(sent2)\n    print(ht0.entity_linking(sent2, char_recheck=True))\n    sent3 = \"\u5434\u78ca\u548c\u5434\u529b\u90fd\u53ef\u80fd\u662f\u6b66\u78ca\u7684\u4ee3\u79f0\"\n    print(sent3)\n    print(ht0.get_linking_mention_candidates(sent3, pinyin_recheck=True, char_recheck=True))\nentity_error_check()\n```\n\n```\n\u6b66\u78ca\u548c\u5434\u529b\u53ea\u5dee\u4e00\u4e2a\u62fc\u97f3\n[([0, 2], ('\u6b66\u78ca', '#\u4eba\u540d#')), [(3, 5), ('\u6b66\u78ca', '#\u4eba\u540d#')]]\n\u6b66\u78ca\u548c\u5434\u78ca\u53ea\u5dee\u4e00\u4e2a\u5b57\n[([0, 2], ('\u6b66\u78ca', '#\u4eba\u540d#')), [(3, 5), ('\u6b66\u78ca', '#\u4eba\u540d#')]]\n\u5434\u78ca\u548c\u5434\u529b\u90fd\u53ef\u80fd\u662f\u6b66\u78ca\u7684\u4ee3\u79f0\n('\u5434\u78ca\u548c\u5434\u529b\u90fd\u53ef\u80fd\u662f\u6b66\u78ca\u7684\u4ee3\u79f0', defaultdict(<class 'set'>, {(0, 2): {'\u6b66\u78ca'}, (3, 5): {'\u6b66\u78ca'}}))\n```\n \n\n### \u60c5\u611f\u5206\u6790\n\u672c\u5e93\u91c7\u7528\u60c5\u611f\u8bcd\u5178\u65b9\u6cd5\u8fdb\u884c\u60c5\u611f\u5206\u6790\uff0c\u901a\u8fc7\u63d0\u4f9b\u5c11\u91cf\u6807\u51c6\u7684\u8912\u8d2c\u4e49\u8bcd\u8bed\uff08\u201c\u79cd\u5b50\u8bcd\u201d\uff09\uff0c\u4ece\u8bed\u6599\u4e2d\u81ea\u52a8\u5b66\u4e60\u5176\u4ed6\u8bcd\u8bed\u7684\u60c5\u611f\u503e\u5411\uff0c\u5f62\u6210\u60c5\u611f\u8bcd\u5178\u3002\u5bf9\u53e5\u4e2d\u60c5\u611f\u8bcd\u7684\u52a0\u603b\u5e73\u5747\u5219\u7528\u4e8e\u5224\u65ad\u53e5\u5b50\u7684\u60c5\u611f\u503e\u5411\uff1a\n\n```python3\nprint(\"\\nsentiment dictionary\")\nsents = [\"\u6b66\u78ca\u5a01\u6b66\uff0c\u4e2d\u8d85\u7b2c\u4e00\u5c04\u624b\uff01\",\n         \"\u6b66\u78ca\u5f3a\uff0c\u4e2d\u8d85\u6700\u7b2c\u4e00\u672c\u571f\u7403\u5458\uff01\",\n         \"\u90dc\u6797\u4e0d\u884c\uff0c\u53ea\u4f1a\u62b1\u6028\u7684\u7403\u5458\u6ce8\u5b9a\u4e0a\u9650\u4e86\",\n         \"\u90dc\u6797\u770b\u6765\u4e0d\u884c\uff0c\u5df2\u7ecf\u5230\u4e0a\u9650\u4e86\"]\nsent_dict = 
ht.build_sent_dict(sents,min_times=1,pos_seeds=[\"\u7b2c\u4e00\"],neg_seeds=[\"\u4e0d\u884c\"])\nprint(\"%s:%f\" % (\"\u5a01\u6b66\",sent_dict[\"\u5a01\u6b66\"]))\nprint(\"%s:%f\" % (\"\u7403\u5458\",sent_dict[\"\u7403\u5458\"]))\nprint(\"%s:%f\" % (\"\u4e0a\u9650\",sent_dict[\"\u4e0a\u9650\"]))\n```\n\n> sentiment dictionary \n> \u5a01\u6b66:1.000000 \n> \u7403\u5458:0.000000 \n> \u4e0a\u9650:-1.000000\n\n```python3\nprint(\"\\nsentence sentiment\")\nsent = \"\u6b66\u7403\u738b\u5a01\u6b66\uff0c\u4e2d\u8d85\u6700\u5f3a\u7403\u5458\uff01\"\nprint(\"%f:%s\" % (ht.analyse_sent(sent),sent))\n```\n> 0.600000:\u6b66\u7403\u738b\u5a01\u6b66\uff0c\u4e2d\u8d85\u6700\u5f3a\u7403\u5458\uff01\n\n\u5982\u679c\u6ca1\u60f3\u597d\u9009\u62e9\u54ea\u4e9b\u8bcd\u8bed\u4f5c\u4e3a\u201c\u79cd\u5b50\u8bcd\u201d\uff0c\u672c\u5e93\u4e2d\u4e5f\u5185\u7f6e\u4e86\u4e00\u4e2a\u901a\u7528\u60c5\u611f\u8bcd\u5178[\u5185\u7f6e\u8d44\u6e90](#\u5185\u7f6e\u8d44\u6e90)\uff0c\u5728\u4e0d\u6307\u5b9a\u60c5\u611f\u8bcd\u65f6\u4f5c\u4e3a\u9ed8\u8ba4\u7684\u9009\u62e9\uff0c\u4e5f\u53ef\u4ee5\u6839\u636e\u9700\u8981\u4ece\u4e2d\u6311\u9009\u3002\n\n\u9ed8\u8ba4\u4f7f\u7528\u7684SO-PMI\u7b97\u6cd5\u5bf9\u4e8e\u60c5\u611f\u503c\u6ca1\u6709\u4e0a\u4e0b\u754c\u7ea6\u675f\uff0c\u5982\u679c\u9700\u8981\u9650\u5236\u5728[0,1]\u6216\u8005[-1,1]\u8fd9\u6837\u7684\u533a\u95f4\u7684\u8bdd\uff0c\u53ef\u4ee5\u8c03\u6574scale\u53c2\u6570\uff0c\u4f8b\u5b50\u5982\u4e0b\uff1a\n\n```python3\nprint(\"\\nsentiment dictionary using default seed words\")\ndocs = [\"\u5f20\u5e02\u7b79\u8bbe\u5174\u534e\u5b9e\u4e1a\u516c\u53f8\u5916\u533a\u8d44\u672c\u5bb6\u8e0a\u8dc3\u6295\u8d44\u664b\u5bdf\u5180\u8fb9\u533a\u5174\u534e\u5b9e\u4e1a\u516c\u53f8\uff0c\u81ea\u7b79\u5907\u6210\u7acb\u4ee5\u6765\uff0c\u89e3\u653e\u533a\u5185\u5916\u4f01\u4e1a\u754c\u4eba\u58eb\u53ca\u4e00\u822c\u5546\u6c11\uff0c\u5747\u8e0a\u8dc3\u8ba4\u80a1\u6295\u8d44\",\n \"\u6253\u5012\u4e07\u6076\u7684\u8d44\u672c\u5bb6\",\n 
\"\u8be5\u516c\u53f8\u539f\u5b9a\u8d44\u672c\u603b\u989d\u4e3a\u4e8c\u5341\u4e94\u4e07\u4e07\u5143\uff0c\u73b0\u5df2\u7531\u5404\u754c\u5206\u8ba4\u8fbe\u4e8c\u5341\u4e07\u4e07\u5143\uff0c\u6240\u5c5e\u5404\u5382\u3001\u5404\u516c\u53f8\u4ea6\u52df\u5f97\u80a1\u91d1\u4e00\u4e07\u4e07\u4f59\u5143\",\n \"\u8fde\u65e5\u6765\u89e3\u653e\u533a\u4ee5\u5916\u5404\u5de5\u5546\u4eba\u58eb\uff0c\u6295\u51fd\u5411\u8be5\u516c\u53f8\u8be2\u95ee\u7ecf\u8425\u6027\u8d28\u4e0e\u8303\u56f4\u4ee5\u53ca\u80a1\u4e1c\u6743\u9650\u7b49\u95ee\u9898\u8005\u751a\u591a\uff0c\u7edc\u7ece\u62b5\u6b64\u7684\u8bb8\u591a\u8d44\u672c\u5bb6\uff0c\u4e8e\u53c2\u89c2\u8be5\u516c\u53f8\u6240\u5c5e\u5404\u5382\u7ecf\u8425\u72b6\u51b5\u540e\uff0c\u5bf9\u6c11\u4e3b\u653f\u5e9c\u6276\u52a9\u4e0e\u5956\u52b1\u79c1\u8425\u4f01\u4e1a\u53d1\u5c55\u7684\u653f\u7b56\uff0c\u5747\u6781\u8868\u8d5e\u540c\uff0c\u6709\u4e9b\u8d44\u672c\u5bb6\u56e0\u6b3e\u9879\u672a\u80fd\u5373\u523b\u6c47\u6765\uff0c\u591a\u5411\u7b79\u5907\u5904\u9884\u8ba4\u6295\u8d44\u7684\u989d\u6570\u3002\u7531\u5e73\u6d25\u6765\u5f20\u7684\u6797\u660e\u68cb\u5148\u751f\uff0c\u4e00\u6b21\u5373\u4ee5\u73b0\u6b3e\u5165\u80a1\u516d\u5341\u4f59\u4e07\u5143\"\n ]\n# scale: \u5c06\u6240\u6709\u8bcd\u8bed\u7684\u60c5\u611f\u503c\u8303\u56f4\u8c03\u6574\u5230[-1,1]\n# \u7701\u7565pos_seeds, neg_seeds,\u5c06\u91c7\u7528\u9ed8\u8ba4\u7684\u60c5\u611f\u8bcd\u5178 get_qh_sent_dict()\nprint(\"scale=\\\"0-1\\\", \u6309\u7167\u6700\u5927\u4e3a1\uff0c\u6700\u5c0f\u4e3a0\u8fdb\u884c\u7ebf\u6027\u4f38\u7f29\uff0c0.5\u672a\u5fc5\u662f\u4e2d\u6027\")\nsent_dict = ht.build_sent_dict(docs,min_times=1,scale=\"0-1\")\nprint(\"%s:%f\" % (\"\u8d5e\u540c\",sent_dict[\"\u8d5e\u540c\"]))\nprint(\"%s:%f\" % (\"\u4e8c\u5341\u4e07\",sent_dict[\"\u4e8c\u5341\u4e07\"]))\nprint(\"%s:%f\" % (\"\u4e07\u6076\",sent_dict[\"\u4e07\u6076\"]))\nprint(\"%f:%s\" % (ht.analyse_sent(docs[0]), docs[0]))\nprint(\"%f:%s\" % (ht.analyse_sent(docs[1]), docs[1]))\n\nprint(\"scale=\\\"+-1\\\", 
\u5728\u6b63\u8d1f\u533a\u95f4\u5185\u5206\u522b\u4f38\u7f29\uff0c\u4fdd\u75590\u4f5c\u4e3a\u4e2d\u6027\u7684\u8bed\u4e49\")\nsent_dict = ht.build_sent_dict(docs,min_times=1,scale=\"+-1\")\nprint(\"%s:%f\" % (\"\u8d5e\u540c\",sent_dict[\"\u8d5e\u540c\"]))\nprint(\"%s:%f\" % (\"\u4e8c\u5341\u4e07\",sent_dict[\"\u4e8c\u5341\u4e07\"]))\nprint(\"%s:%f\" % (\"\u4e07\u6076\",sent_dict[\"\u4e07\u6076\"]))\nprint(\"%f:%s\" % (ht.analyse_sent(docs[0]), docs[0]))\nprint(\"%f:%s\" % (ht.analyse_sent(docs[1]), docs[1]))\n```\n\n```\nsentiment dictionary using default seed words\nscale=\"0-1\", \u6309\u7167\u6700\u5927\u4e3a1\uff0c\u6700\u5c0f\u4e3a0\u8fdb\u884c\u7ebf\u6027\u4f38\u7f29\uff0c0.5\u672a\u5fc5\u662f\u4e2d\u6027\n\u8d5e\u540c:1.000000\n\u4e8c\u5341\u4e07:0.153846\n\u4e07\u6076:0.000000\n0.449412:\u5f20\u5e02\u7b79\u8bbe\u5174\u534e\u5b9e\u4e1a\u516c\u53f8\u5916\u533a\u8d44\u672c\u5bb6\u8e0a\u8dc3\u6295\u8d44\u664b\u5bdf\u5180\u8fb9\u533a\u5174\u534e\u5b9e\u4e1a\u516c\u53f8\uff0c\u81ea\u7b79\u5907\u6210\u7acb\u4ee5\u6765\uff0c\u89e3\u653e\u533a\u5185\u5916\u4f01\u4e1a\u754c\u4eba\u58eb\u53ca\u4e00\u822c\u5546\u6c11\uff0c\u5747\u8e0a\u8dc3\u8ba4\u80a1\u6295\u8d44\n0.364910:\u6253\u5012\u4e07\u6076\u7684\u8d44\u672c\u5bb6\nscale=\"+-1\", \u5728\u6b63\u8d1f\u533a\u95f4\u5185\u5206\u522b\u4f38\u7f29\uff0c\u4fdd\u75590\u4f5c\u4e3a\u4e2d\u6027\u7684\u8bed\u4e49\n\u8d5e\u540c:1.000000\n\u4e8c\u5341\u4e07:0.000000\n\u4e07\u6076:-1.000000\n0.349305:\u5f20\u5e02\u7b79\u8bbe\u5174\u534e\u5b9e\u4e1a\u516c\u53f8\u5916\u533a\u8d44\u672c\u5bb6\u8e0a\u8dc3\u6295\u8d44\u664b\u5bdf\u5180\u8fb9\u533a\u5174\u534e\u5b9e\u4e1a\u516c\u53f8\uff0c\u81ea\u7b79\u5907\u6210\u7acb\u4ee5\u6765\uff0c\u89e3\u653e\u533a\u5185\u5916\u4f01\u4e1a\u754c\u4eba\u58eb\u53ca\u4e00\u822c\u5546\u6c11\uff0c\u5747\u8e0a\u8dc3\u8ba4\u80a1\u6295\u8d44\n-0.159652:\u6253\u5012\u4e07\u6076\u7684\u8d44\u672c\u5bb6\n```\n\n\n \n\n### 
\u4fe1\u606f\u68c0\u7d22\n\u53ef\u4ee5\u4ece\u6587\u6863\u5217\u8868\u4e2d\u67e5\u627e\u51fa\u5305\u542b\u5bf9\u5e94\u5b9e\u4f53\uff08\u53ca\u5176\u522b\u79f0\uff09\u7684\u6587\u6863\uff0c\u4ee5\u53ca\u7edf\u8ba1\u5305\u542b\u67d0\u5b9e\u4f53\u7684\u6587\u6863\u6570\u3002\u4f7f\u7528\u5012\u6392\u7d22\u5f15\u7684\u6570\u636e\u7ed3\u6784\u5b8c\u6210\u5feb\u901f\u68c0\u7d22\u3002\n```python3\ndocs = [\"\u6b66\u78ca\u5a01\u6b66\uff0c\u4e2d\u8d85\u7b2c\u4e00\u5c04\u624b\uff01\",\n\t\t\"\u90dc\u6797\u770b\u6765\u4e0d\u884c\uff0c\u5df2\u7ecf\u5230\u4e0a\u9650\u4e86\u3002\",\n\t\t\"\u6b66\u7403\u738b\u5a01\u6b66\uff0c\u4e2d\u8d85\u6700\u5f3a\u524d\u950b\uff01\",\n\t\t\"\u6b66\u78ca\u548c\u90dc\u6797\uff0c\u8c01\u662f\u4e2d\u56fd\u6700\u597d\u7684\u524d\u950b\uff1f\"]\ninv_index = ht.build_index(docs)\nprint(ht.get_entity_counts(docs, inv_index)) # \u83b7\u5f97\u6587\u6863\u4e2d\u6240\u6709\u5b9e\u4f53\u7684\u51fa\u73b0\u6b21\u6570\n# {'\u6b66\u78ca': 3, '\u90dc\u6797': 2, '\u524d\u950b': 2}\n\nprint(ht.search_entity(\"\u6b66\u78ca\", docs, inv_index)) # \u5355\u5b9e\u4f53\u67e5\u627e\n# ['\u6b66\u78ca\u5a01\u6b66\uff0c\u4e2d\u8d85\u7b2c\u4e00\u5c04\u624b\uff01', '\u6b66\u7403\u738b\u5a01\u6b66\uff0c\u4e2d\u8d85\u6700\u5f3a\u524d\u950b\uff01', '\u6b66\u78ca\u548c\u90dc\u6797\uff0c\u8c01\u662f\u4e2d\u56fd\u6700\u597d\u7684\u524d\u950b\uff1f']\n\nprint(ht.search_entity(\"\u6b66\u78ca \u90dc\u6797\", docs, inv_index)) # \u591a\u5b9e\u4f53\u5171\u73b0\n# ['\u6b66\u78ca\u548c\u90dc\u6797\uff0c\u8c01\u662f\u4e2d\u56fd\u6700\u597d\u7684\u524d\u950b\uff1f']\n\n# \u8c01\u662f\u6700\u88ab\u4eba\u4eec\u70ed\u8bae\u7684\u524d\u950b\uff1f\u7528\u8fd9\u91cc\u7684\u63a5\u53e3\u53ef\u4ee5\u5f88\u7b80\u4fbf\u5730\u56de\u7b54\u8fd9\u4e2a\u95ee\u9898\nsubdocs = ht.search_entity(\"#\u7403\u5458# \u524d\u950b\", docs, inv_index)\nprint(subdocs) # \u5b9e\u4f53\u3001\u5b9e\u4f53\u7c7b\u578b\u6df7\u5408\u67e5\u627e\n# 
['\u6b66\u7403\u738b\u5a01\u6b66\uff0c\u4e2d\u8d85\u6700\u5f3a\u524d\u950b\uff01', '\u6b66\u78ca\u548c\u90dc\u6797\uff0c\u8c01\u662f\u4e2d\u56fd\u6700\u597d\u7684\u524d\u950b\uff1f']\ninv_index2 = ht.build_index(subdocs)\nprint(ht.get_entity_counts(subdocs, inv_index2, used_type=[\"\u7403\u5458\"])) # \u53ef\u4ee5\u9650\u5b9a\u7c7b\u578b\n# {'\u6b66\u78ca': 2, '\u90dc\u6797': 1}\n```\n\n \n### \u5173\u7cfb\u7f51\u7edc\n(\u4f7f\u7528networkx\u5b9e\u73b0)\n\u5229\u7528\u8bcd\u5171\u73b0\u5173\u7cfb\uff0c\u5efa\u7acb\u5b9e\u4f53\u95f4\u56fe\u7ed3\u6784\u7684\u7f51\u7edc\u5173\u7cfb(\u8fd4\u56denetworkx.Graph\u7c7b\u578b)\u3002\u53ef\u4ee5\u7528\u6765\u5efa\u7acb\u4eba\u7269\u4e4b\u95f4\u7684\u793e\u4ea4\u7f51\u7edc\u7b49\u3002\n```python3\n# \u5728\u73b0\u6709\u5b9e\u4f53\u5e93\u7684\u57fa\u7840\u4e0a\u968f\u65f6\u65b0\u589e\uff0c\u6bd4\u5982\u4ece\u65b0\u8bcd\u53d1\u73b0\u4e2d\u5f97\u5230\u7684\u6f0f\u7f51\u4e4b\u9c7c\nht.add_new_entity(\"\u989c\u9a8f\u51cc\", \"\u989c\u9a8f\u51cc\", \"\u7403\u5458\")\ndocs = [\"\u6b66\u78ca\u548c\u989c\u9a8f\u51cc\u662f\u961f\u53cb\",\n\t\t\"\u6b66\u78ca\u548c\u90dc\u6797\u90fd\u662f\u56fd\u5185\u9876\u5c16\u524d\u950b\"]\nG = ht.build_entity_graph(docs)\nprint(dict(G.edges.items()))\nG = ht.build_entity_graph(docs, used_types=[\"\u7403\u5458\"])\nprint(dict(G.edges.items()))\n```\n\n\u83b7\u5f97\u4ee5\u4e00\u4e2a\u8bcd\u8bed\u4e3a\u4e2d\u5fc3\u7684\u8bcd\u8bed\u7f51\u7edc\uff0c\u4e0b\u9762\u4ee5\u4e09\u56fd\u7b2c\u4e00\u7ae0\u4e3a\u4f8b\uff0c\u63a2\u7d22\u4e3b\u4eba\u516c\u5218\u5907\u7684\u906d\u9047\uff08\u4e0b\u4e3a\u4e3b\u8981\u4ee3\u7801\uff0c\u4f8b\u5b50\u89c1[build_word_ego_graph()](./examples/basics.py#)\uff09\u3002\n```python3\nfrom harvesttext.resources import get_sanguo, get_sanguo_entity_dict, get_baidu_stopwords\nht0 = HarvestText()\nentity_mention_dict, entity_type_dict = get_sanguo_entity_dict()\nht0.add_entities(entity_mention_dict, entity_type_dict)\nsanguo1 = get_sanguo()[0]\nstopwords = get_baidu_stopwords()\ndocs = ht0.cut_sentences(sanguo1)\nG = 
ht0.build_word_ego_graph(docs,\"\u5218\u5907\",min_freq=3,other_min_freq=2,stopwords=stopwords)\n```\n![word_ego_net](/images/word_ego_net.jpg)\n\n\u5218\u5173\u5f20\u4e4b\u60c5\u8c0a\uff0c\u5218\u5907\u6295\u5954\u7684\u9760\u5c71\uff0c\u4ee5\u53ca\u5218\u5907\u8ba8\u8d3c\u4e4b\u7ecf\u5386\u5c3d\u5728\u4e8e\u6b64\u3002\n\n \n### \u6587\u672c\u6458\u8981\n(\u4f7f\u7528networkx\u5b9e\u73b0)\n\u4f7f\u7528Textrank\u7b97\u6cd5\uff0c\u4ece\u6587\u6863\u96c6\u5408\u4e2d\u62bd\u53d6\u4ee3\u8868\u53e5\u4f5c\u4e3a\u6458\u8981\u4fe1\u606f\uff1a\n```python3\nprint(\"\\nText summarization\")\ndocs = [\"\u6b66\u78ca\u5a01\u6b66\uff0c\u4e2d\u8d85\u7b2c\u4e00\u5c04\u624b\uff01\",\n\t\t\"\u90dc\u6797\u770b\u6765\u4e0d\u884c\uff0c\u5df2\u7ecf\u5230\u4e0a\u9650\u4e86\u3002\",\n\t\t\"\u6b66\u7403\u738b\u5a01\u6b66\uff0c\u4e2d\u8d85\u6700\u5f3a\u524d\u950b\uff01\",\n\t\t\"\u6b66\u78ca\u548c\u90dc\u6797\uff0c\u8c01\u662f\u4e2d\u56fd\u6700\u597d\u7684\u524d\u950b\uff1f\"]\nfor doc in ht.get_summary(docs, topK=2):\n\tprint(doc)\n# \u6b66\u7403\u738b\u5a01\u6b66\uff0c\u4e2d\u8d85\u6700\u5f3a\u524d\u950b\uff01\n# \u6b66\u78ca\u5a01\u6b66\uff0c\u4e2d\u8d85\u7b2c\u4e00\u5c04\u624b\uff01\n```\n\n\n \n### \u5185\u7f6e\u8d44\u6e90\n\u73b0\u5728\u672c\u5e93\u5185\u96c6\u6210\u4e86\u4e00\u4e9b\u8d44\u6e90\uff0c\u65b9\u4fbf\u4f7f\u7528\u548c\u5efa\u7acbdemo\u3002\n\n\u8d44\u6e90\u5305\u62ec\uff1a\n- \u8912\u8d2c\u4e49\u8bcd\u5178 \u6e05\u534e\u5927\u5b66 \u674e\u519b \u6574\u7406\u81eahttp://nlp.csai.tsinghua.edu.cn/site2/index.php/13-sms\n- \u767e\u5ea6\u505c\u7528\u8bcd\u8bcd\u5178 \u6765\u81ea\u7f51\u7edc\uff1ahttps://wenku.baidu.com/view/98c46383e53a580216fcfed9.html\n- \u9886\u57df\u8bcd\u5178 \u6765\u81ea\u6e05\u534eTHUNLP\uff1a http://thuocl.thunlp.org/ \u5168\u90e8\u7c7b\u578b`['IT', '\u52a8\u7269', '\u533b\u836f', '\u5386\u53f2\u4eba\u540d', '\u5730\u540d', '\u6210\u8bed', '\u6cd5\u5f8b', '\u8d22\u7ecf', 
'\u98df\u7269']`\n\n\n\u6b64\u5916\uff0c\u8fd8\u63d0\u4f9b\u4e86\u4e00\u4e2a\u7279\u6b8a\u8d44\u6e90\u2014\u2014\u300a\u4e09\u56fd\u6f14\u4e49\u300b\uff0c\u5305\u62ec\uff1a\n- \u4e09\u56fd\u6f14\u4e49\u6587\u8a00\u6587\u6587\u672c\n- \u4e09\u56fd\u6f14\u4e49\u4eba\u540d\u3001\u5dde\u540d\u3001\u52bf\u529b\u77e5\u8bc6\u5e93\n\n\u5927\u5bb6\u53ef\u4ee5\u63a2\u7d22\u4ece\u5176\u4e2d\u80fd\u591f\u5f97\u5230\u4ec0\u4e48\u6709\u8da3\u53d1\u73b0\ud83d\ude01\u3002\n\n```python3\ndef load_resources():\n    from harvesttext.resources import get_qh_sent_dict,get_baidu_stopwords,get_sanguo,get_sanguo_entity_dict\n    sdict = get_qh_sent_dict() # {\"pos\":[\u79ef\u6781\u8bcd...],\"neg\":[\u6d88\u6781\u8bcd...]}\n    print(\"pos_words:\",list(sdict[\"pos\"])[10:15])\n    print(\"neg_words:\",list(sdict[\"neg\"])[5:10])\n\n    stopwords = get_baidu_stopwords()\n    print(\"stopwords:\", list(stopwords)[5:10])\n\n    docs = get_sanguo() # \u6587\u672c\u5217\u8868\uff0c\u6bcf\u4e2a\u5143\u7d20\u4e3a\u4e00\u7ae0\u7684\u6587\u672c\n    print(\"\u4e09\u56fd\u6f14\u4e49\u6700\u540e\u4e00\u7ae0\u672b16\u5b57:\\n\",docs[-1][-16:])\n    entity_mention_dict, entity_type_dict = get_sanguo_entity_dict()\n    print(\"\u5218\u5907 \u6307\u79f0\uff1a\",entity_mention_dict[\"\u5218\u5907\"])\n    print(\"\u5218\u5907 \u7c7b\u522b\uff1a\",entity_type_dict[\"\u5218\u5907\"])\n    print(\"\u8700 \u7c7b\u522b\uff1a\", entity_type_dict[\"\u8700\"])\n    print(\"\u76ca\u5dde \u7c7b\u522b\uff1a\", entity_type_dict[\"\u76ca\u5dde\"])\nload_resources()\n```\n\n```\npos_words: ['\u5bb0\u76f8\u809a\u91cc\u597d\u6491\u8239', '\u67e5\u5b9e', '\u5fe0\u5b9e', '\u540d\u624b', '\u806a\u660e']\nneg_words: ['\u6563\u6f2b', '\u8c17\u8a00', '\u8fc2\u6267', '\u80a0\u80a5\u8111\u6ee1', '\u51fa\u5356']\nstopwords: ['apart', '\u5de6\u53f3', '\u7ed3\u679c', 'probably', 'think']\n\u4e09\u56fd\u6f14\u4e49\u6700\u540e\u4e00\u7ae0\u672b16\u5b57:\n \u9f0e\u8db3\u4e09\u5206\u5df2\u6210\u68a6\uff0c\u540e\u4eba\u51ed\u540a\u7a7a\u7262\u9a9a\u3002\n\u5218\u5907 
\u6307\u79f0\uff1a ['\u5218\u5907', '\u5218\u7384\u5fb7', '\u7384\u5fb7']\n\u5218\u5907 \u7c7b\u522b\uff1a \u4eba\u540d\n\u8700 \u7c7b\u522b\uff1a \u52bf\u529b\n\u76ca\u5dde \u7c7b\u522b\uff1a \u5dde\u540d\n```\n\n\u52a0\u8f7d\u6e05\u534e\u9886\u57df\u8bcd\u5178\uff0c\u5e76\u4f7f\u7528\u505c\u7528\u8bcd\u3002\n```python3\ndef using_typed_words():\n from harvesttext.resources import get_qh_typed_words,get_baidu_stopwords\n ht0 = HarvestText()\n typed_words, stopwords = get_qh_typed_words(), get_baidu_stopwords()\n ht0.add_typed_words(typed_words)\n sentence = \"THUOCL\u662f\u81ea\u7136\u8bed\u8a00\u5904\u7406\u7684\u4e00\u5957\u4e2d\u6587\u8bcd\u5e93\uff0c\u8bcd\u8868\u6765\u81ea\u4e3b\u6d41\u7f51\u7ad9\u7684\u793e\u4f1a\u6807\u7b7e\u3001\u641c\u7d22\u70ed\u8bcd\u3001\u8f93\u5165\u6cd5\u8bcd\u5e93\u7b49\u3002\"\n print(sentence)\n print(ht0.posseg(sentence,stopwords=stopwords))\nusing_typed_words()\n```\n\n```\nTHUOCL\u662f\u81ea\u7136\u8bed\u8a00\u5904\u7406\u7684\u4e00\u5957\u4e2d\u6587\u8bcd\u5e93\uff0c\u8bcd\u8868\u6765\u81ea\u4e3b\u6d41\u7f51\u7ad9\u7684\u793e\u4f1a\u6807\u7b7e\u3001\u641c\u7d22\u70ed\u8bcd\u3001\u8f93\u5165\u6cd5\u8bcd\u5e93\u7b49\u3002\n[('THUOCL', 'eng'), ('\u81ea\u7136\u8bed\u8a00\u5904\u7406', 'IT'), ('\u4e00\u5957', 'm'), ('\u4e2d\u6587', 'nz'), ('\u8bcd\u5e93', 'n'), ('\u8bcd\u8868', 'n'), ('\u6765\u81ea', 'v'), ('\u4e3b\u6d41', 'b'), ('\u7f51\u7ad9', 'n'), ('\u793e\u4f1a', 'n'), ('\u6807\u7b7e', '\u8d22\u7ecf'), ('\u641c\u7d22', 'v'), ('\u70ed\u8bcd', 'n'), ('\u8f93\u5165\u6cd5', 'IT'), ('\u8bcd\u5e93', 'n')]\n```\n\n\u4e00\u4e9b\u8bcd\u8bed\u88ab\u8d4b\u4e88\u7279\u6b8a\u7c7b\u578bIT,\u800c\u201c\u662f\u201d\u7b49\u8bcd\u8bed\u88ab\u7b5b\u51fa\u3002\n\n\n \n### 
\u65b0\u8bcd\u53d1\u73b0\nDiscover new words in a fairly large body of text using statistical indicators. Optionally, supply some seed words to set the quality threshold for the words that are discovered (that is, at least all seed words will be discovered, provided they meet certain basic requirements).\n```python3\npara = \"\u4e0a\u6e2f\u7684\u6b66\u78ca\u548c\u6052\u5927\u7684\u90dc\u6797\uff0c\u8c01\u662f\u4e2d\u56fd\u6700\u597d\u7684\u524d\u950b\uff1f\u90a3\u5f53\u7136\u662f\u6b66\u78ca\u6b66\u7403\u738b\u4e86\uff0c\u4ed6\u662f\u5c04\u624b\u699c\u7b2c\u4e00\uff0c\u539f\u6765\u662f\u5f31\u70b9\u7684\u5355\u5200\u4e5f\u6709\u4e86\u8fdb\u6b65\"\n# returns quality statistics for the new words as a pd.DataFrame, so the filtering can be refined by hand\nnew_words_info = ht.word_discover(para)\n#new_words_info = ht.word_discover(para, threshold_seeds=[\"\u6b66\u78ca\"]) \nnew_words = new_words_info.index.tolist()\nprint(new_words)\n```\n\n> [\"\u6b66\u78ca\"]\n\nFor the method and the meaning of the indicators, see: http://www.matrix67.com/blog/archives/5044\n\nMany of the discovered new words are likely to be special keywords of the text, so they can be registered to make later segmentation split them out preferentially.\n```python3\ndef new_word_register():\n new_words = [\"\u843d\u53f6\u7403\",\"666\"]\n ht.add_new_words(new_words) # register as \"new words\" in the broad sense\n ht.add_new_entity(\"\u843d\u53f6\u7403\", mention0=\"\u843d\u53f6\u7403\", type0=\"\u672f\u8bed\") # 
register with a specific type\n print(ht.seg(\"\u8fd9\u4e2a\u843d\u53f6\u7403\u8e22\u5f97\u771f\u662f666\", return_sent=True))\n for word, flag in ht.posseg(\"\u8fd9\u4e2a\u843d\u53f6\u7403\u8e22\u5f97\u771f\u662f666\"):\n     print(\"%s:%s\" % (word, flag), end=\" \")\nnew_word_register()\n```\n> \u8fd9\u4e2a \u843d\u53f6\u7403 \u8e22 \u5f97 \u771f\u662f 666\n\n> \u8fd9\u4e2a:r \u843d\u53f6\u7403:\u672f\u8bed \u8e22:v \u5f97:ud \u771f\u662f:d 666:\u65b0\u8bcd \n\nYou can also use special *rules* to find the keywords you need and assign them a type directly, for example words that are entirely English, or words with a particular prefix or suffix.\n```python3\n# find_with_rules()\nfrom harvesttext import HarvestText\nfrom harvesttext.match_patterns import UpperFirst, AllEnglish, Contains, StartsWith, EndsWith\ntext0 = \"\u6211\u559c\u6b22Python\uff0c\u56e0\u4e3arequests\u5e93\u5f88\u9002\u5408\u722c\u866b\"\nht0 = HarvestText()\n\nfound_entities = ht0.find_entity_with_rule(text0, rulesets=[AllEnglish()], type0=\"\u82f1\u6587\u540d\")\nprint(found_entities)\nprint(ht0.posseg(text0))\n```\n\n```\n{'Python', 'requests'}\n[('\u6211', 'r'), ('\u559c\u6b22', 'v'), ('Python', '\u82f1\u6587\u540d'), ('\uff0c', 'x'), ('\u56e0\u4e3a', 'c'), ('requests', '\u82f1\u6587\u540d'), ('\u5e93', 'n'), ('\u5f88', 'd'), ('\u9002\u5408', 'v'), ('\u722c\u866b', 'n')]\n```\n\n\n \n### \u5b58\u53d6\u6d88\u9664\nA model can be saved locally and loaded again for reuse, and the records of the current model can also be cleared.\n\n```python3\nfrom harvesttext import loadHT,saveHT\npara = 
\"\u4e0a\u6e2f\u7684\u6b66\u78ca\u548c\u6052\u5927\u7684\u90dc\u6797\uff0c\u8c01\u662f\u4e2d\u56fd\u6700\u597d\u7684\u524d\u950b\uff1f\u90a3\u5f53\u7136\u662f\u6b66\u78ca\u6b66\u7403\u738b\u4e86\uff0c\u4ed6\u662f\u5c04\u624b\u699c\u7b2c\u4e00\uff0c\u539f\u6765\u662f\u5f31\u70b9\u7684\u5355\u5200\u4e5f\u6709\u4e86\u8fdb\u6b65\"\nsaveHT(ht,\"ht_model1\")\nht2 = loadHT(\"ht_model1\")\n\n# \u6d88\u9664\u8bb0\u5f55\nht2.clear()\nprint(\"cut with cleared model\")\nprint(ht2.seg(para))\n```\n\n \n### \u7b80\u6613\u95ee\u7b54\u7cfb\u7edf\n\u5177\u4f53\u5b9e\u73b0\u53ca\u4f8b\u5b50\u5728[naiveKGQA.py](./examples/naiveKGQA.py)\u4e2d\uff0c\u4e0b\u9762\u7ed9\u51fa\u90e8\u5206\u793a\u610f\uff1a\n\n```python\nQA = NaiveKGQA(SVOs, entity_type_dict=entity_type_dict)\nquestions = [\"\u4f60\u597d\",\"\u5b59\u4e2d\u5c71\u5e72\u4e86\u4ec0\u4e48\u4e8b\uff1f\",\"\u8c01\u53d1\u52a8\u4e86\u4ec0\u4e48\uff1f\",\"\u6e05\u653f\u5e9c\u7b7e\u8ba2\u4e86\u54ea\u4e9b\u6761\u7ea6\uff1f\",\n\t\t\t \"\u82f1\u56fd\u4e0e\u9e26\u7247\u6218\u4e89\u7684\u5173\u7cfb\u662f\u4ec0\u4e48\uff1f\",\"\u8c01\u590d\u8f9f\u4e86\u5e1d\u5236\uff1f\"]\nfor question0 in 
questions:\n\tprint(\"\u95ee\uff1a\"+question0)\n\tprint(\"\u7b54\uff1a\"+QA.answer(question0))\n```\n\n```\n\u95ee\uff1a\u5b59\u4e2d\u5c71\u5e72\u4e86\u4ec0\u4e48\u4e8b\uff1f\n\u7b54\uff1a\u5c31\u4efb\u4e34\u65f6\u5927\u603b\u7edf\u3001\u53d1\u52a8\u62a4\u6cd5\u8fd0\u52a8\u3001\u8ba9\u4f4d\u4e8e\u8881\u4e16\u51ef\n\u95ee\uff1a\u8c01\u53d1\u52a8\u4e86\u4ec0\u4e48\uff1f\n\u7b54\uff1a\u82f1\u6cd5\u8054\u519b\u4fb5\u7565\u4e2d\u56fd\u3001\u56fd\u6c11\u515a\u4eba\u4e8c\u6b21\u9769\u547d\u3001\u82f1\u56fd\u9e26\u7247\u6218\u4e89\u3001\u65e5\u672c\u4fb5\u7565\u671d\u9c9c\u3001\u5b59\u4e2d\u5c71\u62a4\u6cd5\u8fd0\u52a8\u3001\u6cd5\u56fd\u4fb5\u7565\u8d8a\u5357\u3001\u82f1\u56fd\u4fb5\u7565\u4e2d\u56fd\u897f\u85cf\u6218\u4e89\u3001\u6148\u79a7\u592a\u540e\u620a\u620c\u653f\u53d8\n\u95ee\uff1a\u6e05\u653f\u5e9c\u7b7e\u8ba2\u4e86\u54ea\u4e9b\u6761\u7ea6\uff1f\n\u7b54\uff1a\u5317\u4eac\u6761\u7ea6\u3001\u5929\u6d25\u6761\u7ea6\n\u95ee\uff1a\u82f1\u56fd\u4e0e\u9e26\u7247\u6218\u4e89\u7684\u5173\u7cfb\u662f\u4ec0\u4e48\uff1f\n\u7b54\uff1a\u53d1\u52a8\n\u95ee\uff1a\u8c01\u590d\u8f9f\u4e86\u5e1d\u5236\uff1f\n\u7b54\uff1a\u8881\u4e16\u51ef\n```\n\n## More\nThis library is under development; improvements to existing features and additional features may arrive over time. Suggestions are welcome in the issues. If you find it useful, feel free to give it a Star~\n\nThanks to the following repos for their inspiration:\n\n[snownlp](https://github.com/isnowfy/snownlp)\n\n[pyhanLP](https://github.com/hankcs/pyhanlp)\n\n[funNLP](https://github.com/fighting41love/funNLP)\n\n[ChineseWordSegmentation](https://github.com/Moonshile/ChineseWordSegmentation)\n\n[EventTriplesExtraction](https://github.com/liuhuanyong/EventTriplesExtraction)\n\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { 
"last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/blmoistawinde/HarvestText", "keywords": "NLP,tokenizing,entity linking,sentiment analysis", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "harvesttext", "package_url": "https://pypi.org/project/harvesttext/", "platform": "all", "project_url": "https://pypi.org/project/harvesttext/", "project_urls": { "Homepage": "https://github.com/blmoistawinde/HarvestText" }, "release_url": "https://pypi.org/project/harvesttext/0.5.4.2/", "requires_dist": [ "jieba", "numpy", "pandas", "networkx", "pypinyin", "pyhanlp", "rdflib", "pyxDamerauLevenshtein (==1.5)" ], "requires_python": "", "summary": "", "version": "0.5.4.2" }, "last_serial": 5444511, "releases": { "0.3.1": [ { "comment_text": "", "digests": { "md5": "cb8266c92410de644b747a6d90c30ed4", "sha256": "5d26e5f53fa21bc685a6fb0c9c0826ae326bad893754fa6cb3196d90a14e3cc9" }, "downloads": -1, "filename": "harvesttext-0.3.1-py3.6.egg", "has_sig": false, "md5_digest": "cb8266c92410de644b747a6d90c30ed4", "packagetype": "bdist_egg", "python_version": "3.6", "requires_python": null, "size": 1087736, "upload_time": "2018-12-21T12:30:22", "url": "https://files.pythonhosted.org/packages/b9/2b/b3abf25f3fefb109db3bfce369394e83890dfc4fea96b3971de0585d4aca/harvesttext-0.3.1-py3.6.egg" }, { "comment_text": "", "digests": { "md5": "de95372fa7808e2bd6c8a29d0f201aa3", "sha256": "aab5cd7b130483cacf49e21c5f09655a57154ceba434517b34392004c05aa092" }, "downloads": -1, "filename": "harvesttext-0.3.1-py3-none-any.whl", "has_sig": false, "md5_digest": "de95372fa7808e2bd6c8a29d0f201aa3", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 1077107, "upload_time": "2018-12-21T12:30:19", "url": "https://files.pythonhosted.org/packages/cc/40/c4530cc233448b9acb14219af3914da12940ed39a56db03f950364849d3d/harvesttext-0.3.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "13a044470b584530a630a0ef750208d7", 
"sha256": "ab799ceafeee5ee2e7a289aa58efd7c67cc5a7335a331983135c3434708c99ac" }, "downloads": -1, "filename": "harvesttext-0.3.1.tar.gz", "has_sig": false, "md5_digest": "13a044470b584530a630a0ef750208d7", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 17235, "upload_time": "2018-12-21T12:30:24", "url": "https://files.pythonhosted.org/packages/56/ee/4ae1eccbdb654522c9c208efd5d1a9ee48cec72bc7852a92eca7758f4cfa/harvesttext-0.3.1.tar.gz" } ], "0.4.0": [ { "comment_text": "", "digests": { "md5": "c931cdec09ea17f663aa5b851e01043f", "sha256": "90d11a160a494e97a5f1442af74f2e24861a4c45411cbb1894be8405bfc18227" }, "downloads": -1, "filename": "harvesttext-0.4.0-py3-none-any.whl", "has_sig": false, "md5_digest": "c931cdec09ea17f663aa5b851e01043f", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 1797920, "upload_time": "2018-12-28T13:55:19", "url": "https://files.pythonhosted.org/packages/35/72/6e7bbddccc8be200c8b50e9a3c2f1b0f78f163dc57d660a6f22d49ffdccf/harvesttext-0.4.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "8d58f0fe9b6b875e7ef06f6825c3a2a1", "sha256": "079f9701cacae127efc0d24f0c1dc33faf1af84d060b41000b85455f221e9509" }, "downloads": -1, "filename": "harvesttext-0.4.0.tar.gz", "has_sig": false, "md5_digest": "8d58f0fe9b6b875e7ef06f6825c3a2a1", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 21887, "upload_time": "2018-12-28T13:55:21", "url": "https://files.pythonhosted.org/packages/af/ec/1145043e6390ff8742aeb75b2860cf6ad5fd36629a8a4c224818df3eb961/harvesttext-0.4.0.tar.gz" } ], "0.4.1": [ { "comment_text": "", "digests": { "md5": "f51b7e79787a8b76878effa18e9aec75", "sha256": "528a2e4fcb180f7ee8793db24254180b730e479532595963307917e93358bc47" }, "downloads": -1, "filename": "harvesttext-0.4.1-py3.6.egg", "has_sig": false, "md5_digest": "f51b7e79787a8b76878effa18e9aec75", "packagetype": "bdist_egg", "python_version": "3.6", 
"requires_python": null, "size": 1810051, "upload_time": "2018-12-29T11:36:58", "url": "https://files.pythonhosted.org/packages/b8/eb/521f18c255a0dbb49f828d55cd00f67922918413f044d420cafda6e2b0a1/harvesttext-0.4.1-py3.6.egg" }, { "comment_text": "", "digests": { "md5": "5c9b960bc31d61663c9cae43085f8a67", "sha256": "f000d49dff303bea1748269f4328f5023f129441fc2292c5430ed13bc51afcf1" }, "downloads": -1, "filename": "harvesttext-0.4.1-py3-none-any.whl", "has_sig": false, "md5_digest": "5c9b960bc31d61663c9cae43085f8a67", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 1797218, "upload_time": "2018-12-29T11:36:33", "url": "https://files.pythonhosted.org/packages/17/97/daffde38831b17700d716af2fe0e3b6cfa79a09dea8ad40f05de229be9ea/harvesttext-0.4.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "bf5a2f69bf14a2b695ca1d24ad99c3af", "sha256": "24d5c7fddbc980048ce2e917287d7121cb421a64f4825b15965a18009f1a2667" }, "downloads": -1, "filename": "harvesttext-0.4.1.tar.gz", "has_sig": false, "md5_digest": "bf5a2f69bf14a2b695ca1d24ad99c3af", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 21992, "upload_time": "2018-12-29T11:37:00", "url": "https://files.pythonhosted.org/packages/25/dc/dfd58bc5809528638986db56e0b19d5254f0d45e800ea4cfbc8f9694e597/harvesttext-0.4.1.tar.gz" } ], "0.5": [ { "comment_text": "", "digests": { "md5": "0107c070a0d27601eb33429d71dca8f8", "sha256": "16bb4f72f4a7b8230346a3357cb66ea8f9de3541df084a6e7a8c2e491091a6ed" }, "downloads": -1, "filename": "harvesttext-0.5-py3.6.egg", "has_sig": false, "md5_digest": "0107c070a0d27601eb33429d71dca8f8", "packagetype": "bdist_egg", "python_version": "3.6", "requires_python": null, "size": 1830313, "upload_time": "2019-01-17T13:34:22", "url": "https://files.pythonhosted.org/packages/1c/3f/794ab6be01f887c56588b38f7c47868702bcbaf4b0ae60c4630b71c30398/harvesttext-0.5-py3.6.egg" }, { "comment_text": "", "digests": { "md5": 
"14bd98a481c2ad71171aaec128968be5", "sha256": "1c95127929f784f89f004a7312923ad16cccc7fc55705a32a24aa32154c05afa" }, "downloads": -1, "filename": "harvesttext-0.5-py3-none-any.whl", "has_sig": false, "md5_digest": "14bd98a481c2ad71171aaec128968be5", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 1814826, "upload_time": "2019-01-17T13:34:15", "url": "https://files.pythonhosted.org/packages/e8/56/48f53b9633ba18702d0d2d3419e4d422f9129979fbedf999fb4ce21e7e83/harvesttext-0.5-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "8b37b49f334e7e4746564e3981fcab8d", "sha256": "bc3f1239e20c4344cd64f5d976779cd8a15415e6e2cbfd6e0d5349b62d5fb5a5" }, "downloads": -1, "filename": "harvesttext-0.5.tar.gz", "has_sig": false, "md5_digest": "8b37b49f334e7e4746564e3981fcab8d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 25951, "upload_time": "2019-01-17T13:34:24", "url": "https://files.pythonhosted.org/packages/fd/57/80dac9d183cb0c84a07b807631383912687854ce60eaf9b0a55558be9cd4/harvesttext-0.5.tar.gz" } ], "0.5.1": [ { "comment_text": "", "digests": { "md5": "c5ca8371ee9f1a0a7cc4a3f8e8c2444f", "sha256": "dffc871c527838a61df4d3e1a8c877e327ddbdc1d1625b245484976c5c07ba23" }, "downloads": -1, "filename": "harvesttext-0.5.1-py3.6.egg", "has_sig": false, "md5_digest": "c5ca8371ee9f1a0a7cc4a3f8e8c2444f", "packagetype": "bdist_egg", "python_version": "3.6", "requires_python": null, "size": 1838794, "upload_time": "2019-01-20T03:43:34", "url": "https://files.pythonhosted.org/packages/25/8d/30703952468dc028523a0ae7a829620df66682866febd9894d51905c3c01/harvesttext-0.5.1-py3.6.egg" }, { "comment_text": "", "digests": { "md5": "91e2a4b461c1e8066e23ec367c5174f0", "sha256": "f989f7ba638de260b61129f4854d65653c49ac968fb878231eddbef79bf67bb5" }, "downloads": -1, "filename": "harvesttext-0.5.1-py3-none-any.whl", "has_sig": false, "md5_digest": "91e2a4b461c1e8066e23ec367c5174f0", "packagetype": "bdist_wheel", 
"python_version": "py3", "requires_python": null, "size": 1818635, "upload_time": "2019-01-20T03:43:31", "url": "https://files.pythonhosted.org/packages/6c/82/147d13bcacc62664184e9f5eec35f3d2925653af989973223d96a6b7666e/harvesttext-0.5.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "b669e95cddb306e1d75864dbf8ae59b2", "sha256": "6e887ba2a3afe88cb156b73a3e5c8555d0a15baa553269833780731dff08a126" }, "downloads": -1, "filename": "harvesttext-0.5.1.tar.gz", "has_sig": false, "md5_digest": "b669e95cddb306e1d75864dbf8ae59b2", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 26464, "upload_time": "2019-01-20T03:43:36", "url": "https://files.pythonhosted.org/packages/6c/fb/c43eaac30f53e4df81c6d80a9cef15947e6dfeef6645e895e213697e1a13/harvesttext-0.5.1.tar.gz" } ], "0.5.2": [ { "comment_text": "", "digests": { "md5": "9397613991c3d162de3c47acbf7bc7a8", "sha256": "f70061c6eeb2f92f6094e9132945e635908880e57dd080650bcc54d6323deb53" }, "downloads": -1, "filename": "harvesttext-0.5.2-py3-none-any.whl", "has_sig": false, "md5_digest": "9397613991c3d162de3c47acbf7bc7a8", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 1819294, "upload_time": "2019-02-15T06:54:15", "url": "https://files.pythonhosted.org/packages/0a/37/bb70c047c66e16136e93e9d97a40cf1ea67b36b5ce876d4be12058daf3c5/harvesttext-0.5.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "923dc50e99adb6fe1cdc74358871bb2b", "sha256": "04071bd23118b585587569c829f7fbceb9cc3347d60afef761a9f206da8f7659" }, "downloads": -1, "filename": "harvesttext-0.5.2.tar.gz", "has_sig": false, "md5_digest": "923dc50e99adb6fe1cdc74358871bb2b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 26862, "upload_time": "2019-02-15T06:54:17", "url": "https://files.pythonhosted.org/packages/54/37/e0dd173f3e69eb32317bd68db07740182ced96b57b2dc11a4ac04fd4984b/harvesttext-0.5.2.tar.gz" } ], "0.5.3.1": [ { 
"comment_text": "", "digests": { "md5": "255947e68219e0622d40c10238305f8a", "sha256": "49a6fb0708e21948bc3e6474d0a742e0afd95f82b63f5a1b3140145c8b31c16d" }, "downloads": -1, "filename": "harvesttext-0.5.3.1-py3-none-any.whl", "has_sig": false, "md5_digest": "255947e68219e0622d40c10238305f8a", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 1819696, "upload_time": "2019-02-21T01:57:50", "url": "https://files.pythonhosted.org/packages/63/0c/268e33ec23b2b7af5f060dec048d68cbf66763ae65dbab09c42ec5df36eb/harvesttext-0.5.3.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "82831e99d5e821e2e2261ee2e2abeb7b", "sha256": "518fb507d87b626549a916bb768ec0d035a1f8e69f48979092c79f4551fc2ae9" }, "downloads": -1, "filename": "harvesttext-0.5.3.1.tar.gz", "has_sig": false, "md5_digest": "82831e99d5e821e2e2261ee2e2abeb7b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 27248, "upload_time": "2019-02-21T01:57:52", "url": "https://files.pythonhosted.org/packages/ce/0e/40edf27fa6baa79e4b7fb3de77575e8fdbb68535f62921d93587ecdb174a/harvesttext-0.5.3.1.tar.gz" } ], "0.5.4": [ { "comment_text": "", "digests": { "md5": "b0df88d5e383f1fc5a6e24c303b604b1", "sha256": "1c365cca0bee0b1996ced4792b5a6c40e54fa552a94c3b23178ddb9ce48971cf" }, "downloads": -1, "filename": "harvesttext-0.5.4-py3.6.egg", "has_sig": false, "md5_digest": "b0df88d5e383f1fc5a6e24c303b604b1", "packagetype": "bdist_egg", "python_version": "3.6", "requires_python": null, "size": 1840862, "upload_time": "2019-02-21T09:42:38", "url": "https://files.pythonhosted.org/packages/d1/91/8b315768bf4adb769cc365e76acf4b4c2e15625c35989887c8fe594a6258/harvesttext-0.5.4-py3.6.egg" }, { "comment_text": "", "digests": { "md5": "6035335afd2c1903db1a3c4c8f2f92fc", "sha256": "c12f748d285bd768878ccb2bcf5c1891a093b527774e62faf8b801c0e323c901" }, "downloads": -1, "filename": "harvesttext-0.5.4-py3-none-any.whl", "has_sig": false, "md5_digest": 
"6035335afd2c1903db1a3c4c8f2f92fc", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 1820070, "upload_time": "2019-02-21T09:42:34", "url": "https://files.pythonhosted.org/packages/d6/f2/c96227fba1013cb03a8e6888e2870cb1edfd4812fef086b8e7bfbf186560/harvesttext-0.5.4-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "b003c1732d5b734069dee46e8194ac66", "sha256": "bb433683e4e49e0b79d885262ca54b67a07a575999ad1ccaa4a0a1cfb26bfc29" }, "downloads": -1, "filename": "harvesttext-0.5.4.tar.gz", "has_sig": false, "md5_digest": "b003c1732d5b734069dee46e8194ac66", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 27663, "upload_time": "2019-02-21T09:42:40", "url": "https://files.pythonhosted.org/packages/71/1e/724e58ec8aca091bf641719ae2ee0e359aa33f5f6ca377b3d5ec40c0abbe/harvesttext-0.5.4.tar.gz" } ], "0.5.4.2": [ { "comment_text": "", "digests": { "md5": "114088765bde1666ebfd14d27b92782c", "sha256": "8465975b59c4ea3e2074529db612649df47b8518206ceacdf1980303aad0d488" }, "downloads": -1, "filename": "harvesttext-0.5.4.2-py3-none-any.whl", "has_sig": false, "md5_digest": "114088765bde1666ebfd14d27b92782c", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 1810622, "upload_time": "2019-06-25T08:48:15", "url": "https://files.pythonhosted.org/packages/05/6a/baedc46a7820353b651e5693d9118111d260f106847b0ffbc90f7a13d3ce/harvesttext-0.5.4.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "65b54a8851e44da1f8ec39dd879217c6", "sha256": "48a109c26b80435be3366df9db9b2ae172b375b0c1b500f9da71ddb073304aa0" }, "downloads": -1, "filename": "harvesttext-0.5.4.2.tar.gz", "has_sig": false, "md5_digest": "65b54a8851e44da1f8ec39dd879217c6", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 2657524, "upload_time": "2019-06-25T08:48:20", "url": 
"https://files.pythonhosted.org/packages/6c/a8/cf7a5968dc1b6dbb2aa35d74572401d25e8b5e5cf0221d917e5e2eb00e4a/harvesttext-0.5.4.2.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "114088765bde1666ebfd14d27b92782c", "sha256": "8465975b59c4ea3e2074529db612649df47b8518206ceacdf1980303aad0d488" }, "downloads": -1, "filename": "harvesttext-0.5.4.2-py3-none-any.whl", "has_sig": false, "md5_digest": "114088765bde1666ebfd14d27b92782c", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 1810622, "upload_time": "2019-06-25T08:48:15", "url": "https://files.pythonhosted.org/packages/05/6a/baedc46a7820353b651e5693d9118111d260f106847b0ffbc90f7a13d3ce/harvesttext-0.5.4.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "65b54a8851e44da1f8ec39dd879217c6", "sha256": "48a109c26b80435be3366df9db9b2ae172b375b0c1b500f9da71ddb073304aa0" }, "downloads": -1, "filename": "harvesttext-0.5.4.2.tar.gz", "has_sig": false, "md5_digest": "65b54a8851e44da1f8ec39dd879217c6", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 2657524, "upload_time": "2019-06-25T08:48:20", "url": "https://files.pythonhosted.org/packages/6c/a8/cf7a5968dc1b6dbb2aa35d74572401d25e8b5e5cf0221d917e5e2eb00e4a/harvesttext-0.5.4.2.tar.gz" } ] }