{ "info": { "author": "Chao Li", "author_email": "chaoli.job@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 2 - Pre-Alpha", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Natural Language :: English", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7" ], "description": "taxonomy-matcher\n=================\n\nDescription\n-----------\n\nGiven a gazetteer/taxonomy and input text, ``taxonomy-matcher`` can\nbe used to find all phrases which matches the codes/instances/keywords in the\ngazetteer or taxonomy.\n\nFor each match, it will return the information of,\n\n - surface_form\n\n - matched position\n\n - Code ID and Code Description\n\n - and other code related information\n\n\nCI Status\n-----------\n\n.. image:: https://travis-ci.org/tilaboy/taxonomy-matcher.svg?branch=master\n :target: https://travis-ci.org/tilaboy/taxonomy-matcher\n\n.. image:: https://pyup.io/repos/github/tilaboy/taxonomy-matcher/shield.svg\n :target: https://pyup.io/repos/github/tilaboy/taxonomy-matcher/\n :alt: Updates\n\n.. image:: https://readthedocs.org/projects/gazetteer-matcher/badge/?version=latest\n :target: https://gazetteer-matcher.readthedocs.io/en/latest/?badge=latest\n :alt: Documentation Status\n\n\nRequirements\n------------\n\nPython 3.6+\n\nUsage\n-----\n\nUse taxonomy-match script:\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n::\n\n usage: taxonomy-match input_file taxonomy_file [--output_file OUTPUT_FILE]\n\n\n load taxonomy phrases from the taxonomy file, and find all matched phrases\n from the input text. The result will eithor write to an output file or print\n to the screen.\n\n positional arguments:\n input_file input text file, text to mine phrases\n taxonomy_file taxonomy file, support json/xml/txt, see documentation\n for more details\n\n optional arguments:\n --output_file output file of matched phrases, supports\n jsonl/csv/tsv/txt format, print matched phrases to\n the screen if not defined\n\n\nUse taxonomy-matcher module\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n- From normalization table in JSOM format:\n\n::\n\n from taxonomy_matcher.matcher import Matcher\n taxonomy_matcher = Matcher(normtable=json_file)\n for matched in taxonomy_matcher.matching(text):\n print(matched)\n\nAnd an example of the normalization table in JSON:\n\n::\n\n {\n \"meta\": {\n \"concept_type\": \"skills\",\n \"release_datetime\": \"2019-xx-xx\"\n },\n \"concepts\": [\n {\n \"display_name\": \"Risk Analysis\",\n \"category\": \"Financial Skill\",\n \"id\": \"ABCDEFG001\",\n \"surface_forms\": [\n {\n \"surface_form\": \"risk analysis\",\n \"skill_likelihood\": 0.9\n },\n {\n \"surface_form\": \"quantitative risk assessment\",\n \"skill_likelihood\": 1.0\n },\n {\n \"surface_form\": \"risk assessment\",\n \"skill_likelihood\": 0.7\n }\n ]\n },\n .......\n {\n \"display_name\": \"Mobile Data\",\n \"category\": \"Computer Skill\",\n \"id\": \"ABCDEFG002\",\n \"surface_forms\": [\n {\n \"surface_form\": \"mobile data\"\n }\n ]\n }\n ]\n }\n\n- From gazetteer:\n\n::\n\n from taxonomy_matcher.matcher import Matcher\n taxonomy_matcher = Matcher(gazetteer=gz_file)\n for matched in taxonomy_matcher.matching(text):\n print(matched)\n\nand an example of the gazetteer\n\n::\n\n # gazetteer\n mobile data\n risk analysis\n quantitative risk assessment\n risk assessment\n .....\n\n- From Taxonomy Codetable:\n\n::\n\n from taxonomy_matcher.matcher import Matcher\n ct_matcher = Matcher(codetable=ct_file)\n for matched in ct_matcher.matching(text):\n print(matched)\n\nCodeTable is a XML version of the JSON example given above.\n\nother functions\n~~~~~~~~~~~~~~~\n\n- Context words:\n\nWhen context are needed for matched phrases, e.g. for the following up\nvalidation functions, enable the ``with\\_context`` option:\n\n::\n\n from taxonomy_matcher.matcher import Matcher\n taxonomy_matcher = Matcher(normtable=json_file,with_context=True)\n for matched in taxonomy_matcher.matching(text):\n print(matched.left_context, matched.right_context)\n\n- Code Property lookup\n\nIf need to lookup the property of an Code in the taxonomy,\ncheck the matcher Class property 'code\\_property\\_mapping',\nit is a dictionary mapping id to description and category, it is in\nthe form of:\n\n::\n\n dict[code_id] = {\n 'desc':code_description,\n 'type':code_category\n }\n\nE.g. to get the description of the codeid:\n\n::\n\n codeid = 12345\n from taxonomy_matcher.matcher import Matcher\n taxonomy_matcher = Matcher(normtable=json_file)\n if codeid in taxonomy_matcher.code_property_mapping:\n print(taxonomy_matcher.code_property_mapping[codeid]['desc'])\n\n\ncheck the Metainfo of the Taxonomy or Gazetteer:\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nNote: currently only available for the Normalized code JSOM.\n\nThe metainfo can be stored in meta part of the JSON document, e.g. if\nthe following information is listed in the JSOM meta section:\n\n::\n\n \"meta\": {\n \"language\": \"EN\",\n \"release_datetime\": \"2019-04-17T12:22:10.729673\",\n \"concept_type\": \"skills\",\n \"purpose\": \"normalization\"\n },\n\nWe can fetch it via the matcher object\n\n::\n\n from taxonomy_matcher.matcher import Matcher\n taxonomy_matcher = Matcher(normtable=json_file)\n print(taxonomy_matcher['meta_info'])\n\noutput will be:\n\n::\n\n {\n 'language': 'EN',\n 'release_datetime': '2019-04-17T12:22:10.729673',\n 'concept_type': 'skills',\n 'purpose': 'normalization'\n }\n\nmatched phrase object: MatchedPhrase\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nmatcher.matching is an iterable which return a MatchedPhrase instance,\nthe instance has the following attributes:\n\n- normalize pattern form: matched\\_pattern\n\n- surface form: surface\\_form\n\n- start position and end position: start\\_pos, end\\_pos\n\n- code\\_id and code\\_description (None if not set in the pattern file)\n\n- left context and right context of the matched skills (only availabe if with\\_context=True )\n\n\n::\n\n for match in matcher.matching(text):\n print(\"found pattern [{}] in the form of [{}] at position ({}:{}), code:{} {} {}\".format(\n matched.matched_pattern\n matched.surface_form\n matched.start_pos\n matched.end_pos\n matched.code_id\n matched.code_description\n matched.category\n matched.left_context\n matched.right_context\n )\n\nDevelopment\n-----------\n\nTo install package and its dependencies, run the following from project\nroot directory:\n\n::\n\n python setup.py install\n\nTesting\n~~~~~~~\n\nTo run unit tests, execute the following from the project root\ndirectory:\n\n::\n\n python setup.py test\n\n\n0.0.8 (2019-08-05)\n==================\n\nAdd option \"output to file\" to ``taxonomy-match`` script,\nsupport jsonl/csv/tsv/txt, and STDOUT\n\n\n0.0.7 (2019-07-30)\n==================\n\nAdd a script ``taxonomy-match`` to find all matches from input_text, with a\ngiven taxonomy\n\n0.0.6 (2019-07-30)\n==================\n\ntest the travis-ci\n\n\n0.0.5 (2019-07-30)\n==================\n\ntest the travis-ci\n\n\n0.0.4 (2019-07-30)\n==================\n\nrename the package name to taxonomy-matcher. Reorder the structure of the\npackage.\n\n\n0.0.3 (2019-07-28)\n==================\n\ntest the CI frame, added the travis support, added automatic document\ngeneration.\n\n\n0.0.2 (2019-07-27)\n==================\n\nAdded a working version Matcher, which can create a matcher with\nthe gazetteer from eighor a txt, json, or xml format, and found matched phrases\nfrom input text.\n\n\n0.0.1 (2019-07-27)\n==================\nInitiate the package.\n\n\n", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/tilaboy/taxonomy-matcher", "keywords": "taxonomy-matcher", "license": "MIT license", "maintainer": "", "maintainer_email": "", "name": "taxonomy-matcher", "package_url": "https://pypi.org/project/taxonomy-matcher/", "platform": "", "project_url": "https://pypi.org/project/taxonomy-matcher/", "project_urls": { "Homepage": "https://github.com/tilaboy/taxonomy-matcher" }, "release_url": "https://pypi.org/project/taxonomy-matcher/0.0.8/", "requires_dist": null, "requires_python": "", "summary": "Python tool to match phrases listed in the taxonomy", "version": "0.0.8" }, "last_serial": 5633187, "releases": { "0.0.7": [ { "comment_text": "", "digests": { "md5": "aef27c1488e37b9afc493e1030455a41", "sha256": "95d2be8cacde3b5b17b07a2dfbd43c5dca1f6e580ed0eb65699bbe0f7d682516" }, "downloads": -1, "filename": "taxonomy_matcher-0.0.7-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "aef27c1488e37b9afc493e1030455a41", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 15969, "upload_time": "2019-07-31T07:26:20", "url": "https://files.pythonhosted.org/packages/ec/2b/4b13b5d43a36c38a3c7839cbe569a5982b6179248ab03f17f59c1e90779d/taxonomy_matcher-0.0.7-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "7f979967b82b80da3666878e2d529de5", "sha256": "bebbbaaccbb617a26c6b68f53a64efb59c38eb5d729f6ba6fdf4515f5dc5e847" }, "downloads": -1, "filename": "taxonomy_matcher-0.0.7.tar.gz", "has_sig": false, "md5_digest": "7f979967b82b80da3666878e2d529de5", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 19237, "upload_time": "2019-07-31T07:26:23", "url": "https://files.pythonhosted.org/packages/10/fd/3471a7b3a6d1a8fade68882c2ff7c5758d0591215eb0ff0f416935f84eb4/taxonomy_matcher-0.0.7.tar.gz" } ], "0.0.8": [ { "comment_text": "", "digests": { "md5": "01447aa0ba7e197bd9f65d75e004661a", "sha256": "95485aae3625b69bb5ada59e50329a495aaac3af26c35547ca8606806b542050" }, "downloads": -1, "filename": "taxonomy_matcher-0.0.8-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "01447aa0ba7e197bd9f65d75e004661a", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 17440, "upload_time": "2019-08-05T08:46:34", "url": "https://files.pythonhosted.org/packages/61/4a/8ccbf16b27c37057fc54b564a5ab3aff0a5ab09f4250fd6df390c2405c69/taxonomy_matcher-0.0.8-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "734352a3089119cefaa34a8a7afa3c12", "sha256": "5c49f5bd8c9fd2fe1e05213fae2e7b36fa778243d73e5e1653f5e694c8ecfcb5" }, "downloads": -1, "filename": "taxonomy_matcher-0.0.8.tar.gz", "has_sig": false, "md5_digest": "734352a3089119cefaa34a8a7afa3c12", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 20497, "upload_time": "2019-08-05T08:46:36", "url": "https://files.pythonhosted.org/packages/de/78/82bfa6f9312eceab284ce75eb84ef7ed255a9f00d02d39be13d967931a6e/taxonomy_matcher-0.0.8.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "01447aa0ba7e197bd9f65d75e004661a", "sha256": "95485aae3625b69bb5ada59e50329a495aaac3af26c35547ca8606806b542050" }, "downloads": -1, "filename": "taxonomy_matcher-0.0.8-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "01447aa0ba7e197bd9f65d75e004661a", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 17440, "upload_time": "2019-08-05T08:46:34", "url": "https://files.pythonhosted.org/packages/61/4a/8ccbf16b27c37057fc54b564a5ab3aff0a5ab09f4250fd6df390c2405c69/taxonomy_matcher-0.0.8-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "734352a3089119cefaa34a8a7afa3c12", "sha256": "5c49f5bd8c9fd2fe1e05213fae2e7b36fa778243d73e5e1653f5e694c8ecfcb5" }, "downloads": -1, "filename": "taxonomy_matcher-0.0.8.tar.gz", "has_sig": false, "md5_digest": "734352a3089119cefaa34a8a7afa3c12", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 20497, "upload_time": "2019-08-05T08:46:36", "url": "https://files.pythonhosted.org/packages/de/78/82bfa6f9312eceab284ce75eb84ef7ed255a9f00d02d39be13d967931a6e/taxonomy_matcher-0.0.8.tar.gz" } ] }