{ "info": { "author": "ZhenSheng Peng", "author_email": "pzsyjsgldd@163.com", "bugtrack_url": null, "classifiers": [ "Intended Audience :: Developers", "Natural Language :: English", "Programming Language :: Python :: 3" ], "description": "TEDT\n========\nTEDT: a news title extraction algorithm based on density and text features\n\n\nFeatures\n========\n* Adaptive mode: automatically adapts to loosely or densely structured web pages;\n* Logging: the log level is configurable, so you can monitor the algorithm's internal details and computation flow;\n* Configuration: the range of body text lines and the range of title lengths are configurable;\n* MIT license.\n\n\nInstallation\n=======\n\nThe code is compatible with Python 3 only.\n\n* Fully automatic installation: `easy_install TEDT`, or `pip install TEDT` / `pip3 install TEDT`\n* Semi-automatic installation: download the package from http://pypi.python.org/pypi/TEDT/ , unpack it, then run `python setup.py install`\n* Manual installation: place the TEDT directory in the current directory or in the site-packages directory\n* Use it via `import TEDT`\n\nAlgorithm\n========\n* To automatically extract news titles from large numbers of complex, non-standard web page structures, this work proposes a news title extraction algorithm based on density and text features.\n* 
It relies on a corpus decision model that fuses the page's text density distribution with linguistic features to divide the web page into a corpus region and a title candidate region. Once the corpus is selected, the TextRank algorithm computes the corresponding key-value weight set, and finally an improved similarity measure extracts the news title from the title candidate region.\n* The algorithm effectively separates the corpus and title regions, reduces interference from web page noise, and extracts news titles accurately.\n\nMain functionality\n=======\n1. Extracting the title, body text, and publication time\n--------\n* Instantiating `TEDT` requires at least one argument: url\n* When `TEDT` is instantiated with only the url argument, the default configuration is:\n\n- CENTER_DISTANCE_MIN = 0 # minimum text line spacing\n- CENTER_DISTANCE_MAX = 10 # maximum text line spacing\n- TITLE_MIN_LENGTH = 5 # minimum title length\n- TITLE_MAX_LENGTH = 50 # maximum title length\n- LOG_ENABLE = True # whether logging is enabled\n- LOG_LEVEL = 'WARNING' # default log level\n- ADAPTIVE = True # whether to adapt to the page's density structure\n\n2. 
Code example\n--------\n\n >>> from TEDT import TEDT\n >>> url = 'http://www16.zzu.edu.cn/msgs/vmsgisapi.dll/onemsg?msgid=1712291126498126051'\n >>> t = TEDT(url, LOG_LEVEL='INFO',)\n >>> t.ie()\n >>> print(t.corpus)\n \u65e5\u524d\uff0c\u65e5\u672c\u9a7b\u534e\u5927\u4f7f\u9986\u7ecf\u6d4e\u90e8\u4e00\u7b49\u79d8\u4e66\u4e0a\u7530\u667a\u4e00\u3001\u65e5\u672c\u79d1\u5b66\u6280\u672f\u632f\u5174\u673a\u6784\uff08jst\uff09\u5317\u4eac\u4e8b\u52a1\u6240\u6240\u957f\u8336\u5c71\u79c0\u4e00\u3001\u65e5\u672c\u7406\u5316\u5b66\u7814\u7a76\u6240\uff08riken\uff09\u3002\u3002\u3002\n >>> print(t.title)\n \u65e5\u672c\u79d1\u6280\u4ee3\u8868\u56e2\u6765\u6821\u8bbf\u95ee\u4ea4\u6d41\uff08\u56fe\uff09\n >>> print(t.time)\n 2017-12-29\n\n3. Example test run\n--------\n- from TEDT import TEDT\n\n- urls = [\n 'http://www.cankaoxiaoxi.com/china/20170630/2158196.shtml', # Cankao Xiaoxi (Reference News)\n 'http://news.ifeng.com/a/20180121/55332303_0.shtml', # ifeng News (Phoenix)\n 'http://china.huanqiu.com/article/2018-01/11541273.html', # Huanqiu (Global Times online)\n 'http://news.china.com/socialgd/10000169/20180122/31990621.html', # China.com\n 'http://www.thepaper.cn/newsDetail_forward_1962275', # The Paper\n # 'http://news.szu.edu.cn/info/1003/4989.htm', # Shenzhen University News\n 'http://www16.zzu.edu.cn/msgs/vmsgisapi.dll/onemsg?msgid=1712291126498126051', # Zhengzhou University News\n 'http://news.ruc.edu.cn/archives/194824', # Renmin University of China News\n 'http://xinwen.ouc.edu.cn/Article/Class3/xwlb/2018/01/22/82384.html', # Ocean University of China News\n 'http://news.sjtu.edu.cn/info/1002/1645201.htm', # Shanghai Jiao Tong University News\n]\n- for url in urls:\n- t = TEDT(url, LOG_LEVEL='INFO',)\n- t.ie()\n\n- INFO:------------------------------TEDT------------------------------\n- 
INFO:\u6807\u9898\uff1a\u3010\u6e2f\u5a92\u79f0\u4eba\u5de5\u667a\u80fd\u6539\u53d8\u5185\u5730\u4eba\u751f\u6d3b\uff1a\u795e\u5947\u8001\u5e08\u6df1\u53d7\u5c0f\u5b66\u751f\u559c\u7231\u3011\n- INFO:\u65f6\u95f4\uff1a\u30102017-06-30\u3011\n- INFO:\u6b63\u6587\uff1a\u3010\u6838\u5fc3\u63d0\u793a\uff1a\u5bb6\u8c6a\u7684\u6545\u4e8b\u8868\u660e\uff0cai\u6b63\u5728\u6539\u53d8\u73b0\u4ee3\u793e\u4f1a\uff0c\u8fd9\u9879\u6280\u672f\u6b63\u5728\u6162\u6162\u4ece\u53d1\u660e\u65b0\u5947\u7684\u4ea7\u54c1\uff0c\u5411\u53d1\u660e\u6539\u53d8\u65e5\u5e38\u751f\u6d3b\u7684\u5e94\u7528\u7a0b\u5e8f\u8f6c\u53d8.\u3002\u3002\u3002\n- INFO:*****************************************************************\n- INFO:------------------------------TEDT------------------------------\n- INFO:\u6807\u9898\uff1a\u3010\u5404\u5730\u5e72\u90e8\u7fa4\u4f17\u70ed\u8bae\u5341\u4e5d\u5c4a\u4e8c\u4e2d\u5168\u4f1a\u516c\u62a5\u3011\n- INFO:\u65f6\u95f4\uff1a\u30102018-01-21\u3011\n- INFO:\u6b63\u6587\uff1a\u3010\u539f\u6807\u9898\uff1a\u4e3a\u65b0\u65f6\u4ee3\u4e2d\u56fd\u7279\u8272\u793e\u4f1a\u4e3b\u4e49\u63d0\u4f9b\u6709\u529b\u5baa\u6cd5\u4fdd\u969c\u2014\u2014\u5404\u5730\u5e72\u90e8\u7fa4\u4f17\u70ed\u8bae\u515a\u7684\u5341\u4e5d\u5c4a\u4e8c\u4e2d\u5168\u4f1a\u516c\u62a5\u65b0\u534e\u793e\u5317\u4eac1\u670821\u65e5\u7535(\u65b0\u534e\u793e\u8bb0\u8005)\u201c\u3002\u3002\u3002\n- INFO:*****************************************************************\n- INFO:------------------------------TEDT------------------------------\n- INFO:\u6807\u9898\uff1a\u3010\u5404\u5730\u5e72\u90e8\u7fa4\u4f17\u70ed\u8bae\u515a\u7684\u5341\u4e5d\u5c4a\u4e8c\u4e2d\u5168\u4f1a\u516c\u62a5\u3011\n- INFO:\u65f6\u95f4\uff1a\u30102018-01-21\u3011\n- 
INFO:\u6b63\u6587\uff1a\u3010\u65b0\u534e\u793e\u5317\u4eac1\u670821\u65e5\u7535\u9898\uff1a\u4e3a\u65b0\u65f6\u4ee3\u4e2d\u56fd\u7279\u8272\u793e\u4f1a\u4e3b\u4e49\u63d0\u4f9b\u6709\u529b\u5baa\u6cd5\u4fdd\u969c\u2014\u2014\u5404\u5730\u5e72\u90e8\u7fa4\u4f17\u70ed\u8bae\u515a\u7684\u5341\u4e5d\u5c4a\u4e8c\u4e2d\u5168\u4f1a\u516c\u62a5\u65b0\u534e\u793e\u8bb0\u8005\u201c\u3002\u3002\u3002\n- INFO:*****************************************************************\n- INFO:------------------------------TEDT------------------------------\n- INFO:\u6807\u9898\uff1a\u3010\u5317\u4eac\u5e72\u6e3490\u5929\u7ec8\u8fce\u521d\u96ea\u96ea\u540e\u6c14\u6e29\u9aa4\u8dcc\u5c06\u9047\u51b0\u51bb\u5468\u3011\n- INFO:\u65f6\u95f4\uff1a\u30102018-01-22\u3011\n- INFO:\u6b63\u6587\uff1a\u3010\u4e2d\u56fd\u5929\u6c14\u7f51\u8baf\u671f\u76fc\u5df2\u4e45\u7684\u5317\u4eac\u521d\u96ea\u7ec8\u4e8e\u6765\u4e86\uff01\u6628\u665a\uff0821\u65e5\uff09\u968f\u7740\u964d\u96ea\u8303\u56f4\u9010\u6e10\u6269\u5927\uff0c\u5317\u4eac\u8fce\u6765\u4e86\u4eca\u51ac\u521d\u96ea\u3002\u53d7\u964d\u96ea\u5f71\u54cd\uff0c\u3002\u3002\u3002\n- INFO:*****************************************************************\n- INFO:------------------------------TEDT------------------------------\n- INFO:\u6807\u9898\uff1a\u3010\u53c8\u670945\u6240\u9ad8\u6821\u8981\u6539\u540d\uff0c\u4f60\u7684\u6bcd\u6821\u8fd8\u662f\u4f60\u7684\u6bcd\u6821\u5417\u3011\n- INFO:\u65f6\u95f4\uff1a\u30102018-01-22\u3011\n- INFO:\u6b63\u6587\uff1a\u3010\u8d85\u5927\u5927\u6807\u51c6\u5c0f\u9ad8\u6821\u6539\u540d\u8fd1\u4e9b\u5e74\u6765\u65b9\u5174\u672a\u827e\uff0c2018\u5e74\u53c8\u670945\u6240\u9ad8\u6821\u53ef\u80fd\u53d8\u66f4\u6821\u540d\u30021\u670820\u65e5\uff0c\u6559\u80b2\u90e8\u53d1\u5c55\u89c4\u5212\u53f8\u6b63\u5f0f\u516c\u5e03\u4e862017\u5e74\u3002\u3002\u3002\n- INFO:*****************************************************************\n- INFO:------------------------------TEDT------------------------------\n- 
INFO:\u6807\u9898\uff1a\u3010\u65e5\u672c\u79d1\u6280\u4ee3\u8868\u56e2\u6765\u6821\u8bbf\u95ee\u4ea4\u6d41\uff08\u56fe\uff09\u3011\n- INFO:\u65f6\u95f4\uff1a\u30102017-12-29\u3011\n- INFO:\u6b63\u6587\uff1a\u3010\u65e5\u524d\uff0c\u65e5\u672c\u9a7b\u534e\u5927\u4f7f\u9986\u7ecf\u6d4e\u90e8\u4e00\u7b49\u79d8\u4e66\u4e0a\u7530\u667a\u4e00\u3001\u65e5\u672c\u79d1\u5b66\u6280\u672f\u632f\u5174\u673a\u6784\uff08jst\uff09\u5317\u4eac\u4e8b\u52a1\u6240\u6240\u957f\u8336\u5c71\u79c0\u4e00\u3001\u65e5\u672c\u7406\u5316\u5b66\u7814\u7a76\u6240\uff08riken\uff09\u3002\u3002\u3002\n- INFO:*****************************************************************\n- INFO:------------------------------TEDT------------------------------\n- INFO:\u6807\u9898\uff1a\u3010\u4e2d\u56fd\u4eba\u6c11\u5927\u5b66\u53ec\u5f00\u5e74\u5ea6\u6821\u7ea7\u9886\u5bfc\u73ed\u5b50\u6c11\u4e3b\u751f\u6d3b\u4f1a\u3011\n- INFO:\u65f6\u95f4\uff1a\u30102018-01-22\u3011\n- INFO:\u6b63\u6587\uff1a\u3010\u6309\u7167\u4e2d\u592e\u7edf\u4e00\u90e8\u7f72\u548c\u8981\u6c42\uff0c1\u670817\u65e5\uff0c\u4e2d\u56fd\u4eba\u6c11\u5927\u5b66\u53ec\u5f002017\u5e74\u5ea6\u6821\u7ea7\u9886\u5bfc\u73ed\u5b50\u6c11\u4e3b\u751f\u6d3b\u4f1a\u3002\u4e2d\u592e\u7ec4\u7ec7\u90e8\u526f\u90e8\u957f\u5468\u7956\u7ffc\u5168\u7a0b\u53c2\u52a0\u5e76\u6307\u5bfc\u6c11\u4e3b\u751f\u6d3b\u4f1a\uff0c\u3002\u3002\u3002\n- INFO:*****************************************************************\n- INFO:------------------------------TEDT------------------------------\n- INFO:\u6807\u9898\uff1a\u3010\u4e2d\u56fd\u6d77\u6d0b\u5927\u5b66\u7b2c\u5341\u4e5d\u5c4a\u201c\u5929\u6cf0\u4f18\u79c0\u4eba\u624d\u5956\u201d\u3001\u201c\u5929\u6cf0\u5956\u5b66\u91d1\u201d\u9881\u5956\u4eea\u5f0f\u4e3e\u884c\u3011\n- INFO:\u65f6\u95f4\uff1a\u30102018-01-22\u3011\n- 
INFO:\u6b63\u6587\uff1a\u3010\u672c\u7ad9\u8baf1\u670819\u65e5\u4e0b\u5348\uff0c\u4e2d\u56fd\u6d77\u6d0b\u5927\u5b66\u7b2c\u5341\u4e5d\u5c4a\u201c\u5929\u6cf0\u4f18\u79c0\u4eba\u624d\u5956\u201d\u3001\u201c\u5929\u6cf0\u5956\u5b66\u91d1\u201d\u9881\u5956\u4eea\u5f0f\u5728\u5d02\u5c71\u6821\u533a\u4e3e\u884c\u3002\u5929\u6cf0\u516c\u76ca\u57fa\u91d1\u4f1a\u79d8\u4e66\u957f\u5f20\u7ec7\u4e91\u3002\u3002\u3002\n- INFO:*****************************************************************\n- INFO:------------------------------TEDT------------------------------\n- INFO:\u6807\u9898\uff1a\u3010\u9ad8\u6821\u601d\u60f3\u653f\u6cbb\u7406\u8bba\u8bfe\u5b9e\u5730\u6559\u5b66\u89c2\u6469\u5728\u4e0a\u6d77\u4ea4\u5927\u4e3e\u884c[\u56fe]\u3011\n- INFO:\u65f6\u95f4\uff1a\u30102018-01-18\u3011\n- INFO:\u6b63\u6587\uff1a\u3010\u4e3a\u6df1\u5165\u5b66\u4e60\u8d2f\u5f7b\u843d\u5b9e\u515a\u7684\u5341\u4e5d\u5927\u7cbe\u795e\uff0c\u6df1\u5165\u63a8\u52a8\u4e60\u8fd1\u5e73\u65b0\u65f6\u4ee3\u4e2d\u56fd\u7279\u8272\u793e\u4f1a\u4e3b\u4e49\u601d\u60f3\u8fdb\u6559\u6750\u8fdb\u8bfe\u5802\u8fdb\u5934\u8111\uff0c\u4e0d\u65ad\u63d0\u9ad8\u601d\u653f\u8bfe\u5efa\u8bbe\u7684\u8d28\u91cf\u548c\u6c34\u5e73\u3002\u3002\u3002\n- INFO:*****************************************************************", "description_content_type": null, "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/pzs741/TEDT", "keywords": "", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "TEDT", "package_url": "https://pypi.org/project/TEDT/", "platform": "", "project_url": "https://pypi.org/project/TEDT/", "project_urls": { "Homepage": "https://github.com/pzs741/TEDT" }, "release_url": "https://pypi.org/project/TEDT/0.5/", "requires_dist": null, "requires_python": "", "summary": "News Title Extraction Algorithm Based on Density and Text Features", "version": "0.5" }, "last_serial": 3956818, "releases": { "0.5": [ { 
"comment_text": "", "digests": { "md5": "1d5c3d9f1da9d15fdb4280333cfbb739", "sha256": "0893b0e743a133a7901ffa7252590610dc1114d82db68f73a86200d91b4cdc6e" }, "downloads": -1, "filename": "TEDT-0.5.tar.gz", "has_sig": false, "md5_digest": "1d5c3d9f1da9d15fdb4280333cfbb739", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 75519, "upload_time": "2018-02-07T20:18:23", "url": "https://files.pythonhosted.org/packages/8c/14/a8992a3a811b3d9f47f01a951d94e54fd809403fdfc4a85aa748a0e9dd4e/TEDT-0.5.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "1d5c3d9f1da9d15fdb4280333cfbb739", "sha256": "0893b0e743a133a7901ffa7252590610dc1114d82db68f73a86200d91b4cdc6e" }, "downloads": -1, "filename": "TEDT-0.5.tar.gz", "has_sig": false, "md5_digest": "1d5c3d9f1da9d15fdb4280333cfbb739", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 75519, "upload_time": "2018-02-07T20:18:23", "url": "https://files.pythonhosted.org/packages/8c/14/a8992a3a811b3d9f47f01a951d94e54fd809403fdfc4a85aa748a0e9dd4e/TEDT-0.5.tar.gz" } ] }