{ "info": { "author": "Lapis-Hong", "author_email": "dinghongquan@sjtu.edu.cn", "bugtrack_url": null, "classifiers": [ "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3.2", "Programming Language :: Python :: 3.3", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Topic :: Scientific/Engineering :: Artificial Intelligence" ], "description": "# xinci \u65b0\u8bcd & \u62bd\u8bcd\nxinci is a Python interface for chinese words extraction & new words extraction.\n[https://pypi.org/project/xinci/]\n\n## Requirements\nPython >= 2.7\n\n## Installation\n### 1. using pip\n```shell\npip install xinci\n```\n### 2. using setup.py\n``` shell\ngit clone git@github.com:Lapis-Hong/xinci.git \ncd xinci \npip setup.py install\n```\n\n## Usage\nThis package has two main use cases: words extraction and\nfind new words. \n\n### 1. command line\n```shell\ncd xinci\npython word_extraction.py \n```\nor \n```\n./run.sh\n```\n\n### 2. python package\n```python \nimport xinci\n\n# if you want to see logging events.\nimport logging\nlogging.basicConfig(level=logging.INFO, format='%(asctime)s : %(levelname)s : %(message)s')\n\n# init default dictionary or user dic,\ndic = xinci.Dictionary()\n# load vocab, vocab is a python set.\nvocab = dic.load() # or dic.dictionary\nprint(vocab)\n\n# add words to dic\ndic.add(['\u795e\u9a6c']) # or dic.add_from_file('user.dic')\n# remove words from dic\ndic.remove(['\u795e\u9a6c']) # or dic.remove_from_file('user.dic')\n\n# extract new words, xc is a set\nxc = xinci.extract('corpus.txt')\nfor w in xc:\n print(w)\n# extract all words, c is a set\nc = xinci.extract('corpus.txt', all_words=True)\nfor w in xc:\n print(w)\n```\nresult\n```angular2html\n\u53d1\u73b05\u4e2a\u65b0\u8bcd\u5982\u4e0b:\n@\u65b0\u8bcd\t@\u8bcd\u9891\n\u795b\u6591\t13\n\u540e\u518d\t7\n\u4eca\u65e5\u5934\u6761\t9\n\u6d17\u51c0\u5207\t7\n\u86cb\u6db2\t9\n```\n### Notes: Iteratively add \"not seems to new words\" in result to common dic will improve a lot. \n\n\n## API documentation\n```python\nxc = xinci.extract(params)\n\n```\nList of available `params` and their default value:\n```angular2html\ncorpus_file: string, input corpus file (required)\ncommon_words_file: string, common words dic file [common.dic]\nmin_candidate_len: int, min candidate word length [2]\nmax_candidate_len: int, max candidate word length [5]\nleast_cnt_threshold: int, least word count to extract [5]\nsolid_rate_threshold: float, solid rate threshold [0.018]\nentropy_threshold: float, entropy threshold [1.92]\nall_words: bool, set True to extract all words mode [False]\nsave_file: string, output file [None]\n```\n\n## References\nThe code is based on this java version\n[https://github.com/GeorgeBourne/grid]\n\n\n\n", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/lapis-hong/xinci", "keywords": "NLP,Chinese words extraction,New words discovery", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "xinci", "package_url": "https://pypi.org/project/xinci/", "platform": "", "project_url": "https://pypi.org/project/xinci/", "project_urls": { "Homepage": "https://github.com/lapis-hong/xinci" }, "release_url": "https://pypi.org/project/xinci/1.2.0/", "requires_dist": null, "requires_python": "", "summary": "Chinese words extraction and new words discovery", "version": "1.2.0" }, "last_serial": 3980222, "releases": { "1.2.0": [ { "comment_text": "", "digests": { "md5": "f7ce84bc128009502d8a1a38e7488f45", "sha256": "7540bad5163572058a64b857449c829f7849a02c144e834cb29992336cb734e5" }, "downloads": -1, "filename": "xinci-1.2.0-py2-none-any.whl", "has_sig": false, "md5_digest": "f7ce84bc128009502d8a1a38e7488f45", "packagetype": "bdist_wheel", "python_version": "py2", "requires_python": null, "size": 1021488, "upload_time": "2018-06-20T06:06:05", "url": "https://files.pythonhosted.org/packages/b8/d9/c88909d9f3d891f2ec368f6abc789ac6e10015b3bd5273ae481fb72e5b6f/xinci-1.2.0-py2-none-any.whl" }, { "comment_text": "", "digests": { "md5": "7e198e411ae5993bafd3e8fc312f7dff", "sha256": "05076fa33ef32dcd9dd8bc502603df07fddef1ec42c45fdb17e964c5a64b14de" }, "downloads": -1, "filename": "xinci-1.2.0.tar.gz", "has_sig": false, "md5_digest": "7e198e411ae5993bafd3e8fc312f7dff", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 1015686, "upload_time": "2018-06-20T06:06:10", "url": "https://files.pythonhosted.org/packages/08/91/80ecc85e199caaeeb90e13c1d5cd79c36cbcb822c40f6d8c9be48e3deee1/xinci-1.2.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "f7ce84bc128009502d8a1a38e7488f45", "sha256": "7540bad5163572058a64b857449c829f7849a02c144e834cb29992336cb734e5" }, "downloads": -1, "filename": "xinci-1.2.0-py2-none-any.whl", "has_sig": false, "md5_digest": "f7ce84bc128009502d8a1a38e7488f45", "packagetype": "bdist_wheel", "python_version": "py2", "requires_python": null, "size": 1021488, "upload_time": "2018-06-20T06:06:05", "url": "https://files.pythonhosted.org/packages/b8/d9/c88909d9f3d891f2ec368f6abc789ac6e10015b3bd5273ae481fb72e5b6f/xinci-1.2.0-py2-none-any.whl" }, { "comment_text": "", "digests": { "md5": "7e198e411ae5993bafd3e8fc312f7dff", "sha256": "05076fa33ef32dcd9dd8bc502603df07fddef1ec42c45fdb17e964c5a64b14de" }, "downloads": -1, "filename": "xinci-1.2.0.tar.gz", "has_sig": false, "md5_digest": "7e198e411ae5993bafd3e8fc312f7dff", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 1015686, "upload_time": "2018-06-20T06:06:10", "url": "https://files.pythonhosted.org/packages/08/91/80ecc85e199caaeeb90e13c1d5cd79c36cbcb822c40f6d8c9be48e3deee1/xinci-1.2.0.tar.gz" } ] }