{ "info": { "author": "Luca Soldaini", "author_email": "luca@ir.cs.georgetown.edu", "bugtrack_url": null, "classifiers": [ "Development Status :: 5 - Production/Stable", "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python :: 2", "Programming Language :: Python :: 3", "Topic :: Scientific/Engineering :: Artificial Intelligence", "Topic :: Scientific/Engineering :: Bio-Informatics" ], "description": "[**NEW: v.1.3 is pip-ready!**](https://giphy.com/embed/BlVnrxJgTGsUw) You can now install QuickUMLS through a simple `pip install quickumls`.\n\n# QuickUMLS\n\nQuickUMLS (Soldaini and Goharian, 2016) is a tool for fast, unsupervised biomedical concept extraction from medical text.\nIt takes advantage of [Simstring](http://www.chokkan.org/software/simstring/) (Okazaki and Tsujii, 2010) for approximate string matching.\nFor more details on how QuickUMLS works, we refer the reader to our paper.\n\nThis project should be compatible with Python 3 (Python 2 is [no longer supported](https://pythonclock.org/)) and should run on any UNIX system (support for Windows is experimental; please report bugs!). **If you find any bugs, please file an issue on GitHub or email the author at luca@ir.cs.georgetown.edu**.\n\n## Installation\n\n1. **Obtain a UMLS installation**: This tool requires you to have a valid UMLS installation on disk. To install UMLS, you must first obtain a [license](https://uts.nlm.nih.gov/license.html) from the National Library of Medicine; then you should download all UMLS files from [this page](https://www.nlm.nih.gov/research/umls/licensedcontent/umlsknowledgesources.html); finally, you can install UMLS using the [MetamorphoSys](https://www.nlm.nih.gov/pubs/factsheets/umlsmetamorph.html) tool as [explained in this guide](https://www.nlm.nih.gov/research/umls/implementation_resources/metamorphosys/help.html). The installation can be removed once the system has been initialized.\n2. 
**Install QuickUMLS**: You can do so by running either `pip install quickumls` or `python setup.py install`. On macOS, using Anaconda is **strongly recommended**\u2020.\n3. **Obtain a spaCy corpus**: After you install QuickUMLS and its dependencies, you should be able to do so by running `python -m spacy download en`.\n4. **Create a QuickUMLS installation**: Initialize the system by running `python -m quickumls.install `, where `` is where the UMLS installation files are (in particular, we need `MRCONSO.RRF` and `MRSTY.RRF`) and `` is the directory where the QuickUMLS data files should be installed. This process will take between 5 and 30 minutes, depending on the speed of the CPU and of the drive where the UMLS and QuickUMLS files are stored (on a system with an Intel i7 6700K CPU and a 7200 RPM hard drive, initialization takes 8.5 minutes). `python -m quickumls.install` supports the following optional arguments:\n - `-L` / `--lowercase`: if used, all concept terms are folded to lowercase before being processed. This option typically increases recall, but it might reduce precision.\n - `-U` / `--normalize-unicode`: if used, expressions with non-ASCII characters are converted to the closest combination of ASCII characters.\n - `-E` / `--language`: specifies the language to consider for UMLS concepts; by default, English is used. 
For a complete list of languages, please see [this table provided by NLM](https://www.nlm.nih.gov/research/umls/knowledge_sources/metathesaurus/release/abbreviations.html#LAT).\n\n**\u2020**: If the installation fails on macOS when using Anaconda, install `leveldb` first by running `conda install -c conda-forge python-leveldb`.\n\n## APIs\n\nA QuickUMLS object can be instantiated as follows:\n\n```python\nfrom quickumls import QuickUMLS\n\nmatcher = QuickUMLS(quickumls_fp, overlapping_criteria, threshold,\n similarity_name, window, accepted_semtypes)\n```\n\nWhere:\n\n- `quickumls_fp` is the directory where the QuickUMLS data files are installed.\n- `overlapping_criteria` (optional, default: \"score\") is the criterion used to resolve overlapping concepts; choose \"score\" if the concepts with the highest matching score should be considered first, or \"length\" if the longest should be considered first instead.\n- `threshold` (optional, default: 0.7) is the minimum similarity value between strings.\n- `similarity_name` (optional, default: \"jaccard\") is the name of the similarity measure to use. Choose between \"dice\", \"jaccard\", \"cosine\", or \"overlap\".\n- `window` (optional, default: 5) is the maximum number of tokens to consider for matching.\n- `accepted_semtypes` (optional, default: see `constants.py`) is the set of UMLS semantic types concepts should belong to. Semantic types are identified by the letter \"T\" followed by three digits (e.g., \"T131\", which identifies the type *\"Hazardous or Poisonous Substance\"*). 
See [here](https://metamap.nlm.nih.gov/Docs/SemanticTypes_2013AA.txt) for the full list.\n\nTo use the matcher, simply call:\n\n```python\ntext = \"The ulna has dislocated posteriorly from the trochlea of the humerus.\"\nmatcher.match(text, best_match=True, ignore_syntax=False)\n```\n\nSet `best_match` to `False` if you want to return overlapping candidates, and set `ignore_syntax` to `True` to disable all heuristics introduced in Soldaini and Goharian (2016).\n\n## Server / Client Support\n\nStarting with v.1.2, QuickUMLS includes support for running in a client-server configuration. That is, you can start one QuickUMLS server and query it from multiple scripts using a client.\n\nTo start the server, run `python -m quickumls.server`:\n\n```bash\npython -m quickumls.server /path/to/quickumls/files {-P QuickUMLS port} {-H QuickUMLS host} {QuickUMLS options}\n```\n\nHost and port are optional; by default, QuickUMLS runs on `localhost:4645`. You can also pass any QuickUMLS option mentioned above to the server. To obtain a list of options for the server, run `python -m quickumls.server -h`.\n\nTo load the client, import `get_quickumls_client` from `quickumls`:\n\n```python\nfrom quickumls import get_quickumls_client\nmatcher = get_quickumls_client()\ntext = \"The ulna has dislocated posteriorly from the trochlea of the humerus.\"\nmatcher.match(text, best_match=True, ignore_syntax=False)\n```\n\nThe API of the client is the same as that of a QuickUMLS object.\n\nIf you wish to run the server in the background, you can do so as follows:\n\n```bash\nnohup python -m quickumls.server /path/to/QuickUMLS {server options} > /dev/null 2>&1 & echo $! > nohup.pid\n```\n\nWhen you are done, don't forget to stop the server by running:\n\n```bash\nkill -9 `cat nohup.pid`\nrm nohup.pid\n```\n\n## References\n\n- Naoaki Okazaki and Jun'ichi Tsujii. \"*Simple and efficient algorithm for approximate dictionary matching.*\" COLING 2010.\n- Luca Soldaini and Nazli Goharian. 
\"*QuickUMLS: a fast, unsupervised approach for medical concept extraction.*\" MedIR Workshop, SIGIR 2016.", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/Georgetown-IR-Lab/QuickUMLS", "keywords": "", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "quickumls", "package_url": "https://pypi.org/project/quickumls/", "platform": "", "project_url": "https://pypi.org/project/quickumls/", "project_urls": { "Homepage": "https://github.com/Georgetown-IR-Lab/QuickUMLS" }, "release_url": "https://pypi.org/project/quickumls/1.3.0.post4/", "requires_dist": null, "requires_python": "", "summary": "QuickUMLS is a tool for fast, unsupervised biomedical concept extraction from medical text", "version": "1.3.0.post4" }, "last_serial": 5418163, "releases": { "1.3.0.post4": [ { "comment_text": "", "digests": { "md5": "10e86e72a32d9f62230f2fa162cccc6e", "sha256": "c5c6547e382e015921c44f31e2cf66703c1a972587f49d826415f719da5f3e91" }, "downloads": -1, "filename": "quickumls-1.3.0.post4.tar.gz", "has_sig": false, "md5_digest": "10e86e72a32d9f62230f2fa162cccc6e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 20993, "upload_time": "2019-06-19T02:31:57", "url": "https://files.pythonhosted.org/packages/ee/bf/ff99e77ee8f0ae878cefdc68e77228124362e73675c0b4c8f082f61f396d/quickumls-1.3.0.post4.tar.gz" } ], "1.3.post1": [ { "comment_text": "", "digests": { "md5": "83509ac04f6e13b6d4e8f96da34f40c9", "sha256": "92b350d6a4c525be72db4cc49d7966c04d860b55147f5d77cb35c7195a500e86" }, "downloads": -1, "filename": "quickumls-1.3.post1.tar.gz", "has_sig": false, "md5_digest": "83509ac04f6e13b6d4e8f96da34f40c9", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 20879, "upload_time": "2019-06-18T04:07:32", "url": 
"https://files.pythonhosted.org/packages/aa/b9/864222750de6287f9c12985f6db11b7908005c74472b4012cc99f36b48b9/quickumls-1.3.post1.tar.gz" } ], "1.3.post2": [ { "comment_text": "", "digests": { "md5": "4d835352bcc38688729b2a399c9b9f54", "sha256": "7dfed353f7eeca8695943374d3787233d969f695905964db5d3fc65d901a9aa9" }, "downloads": -1, "filename": "quickumls-1.3.post2.tar.gz", "has_sig": false, "md5_digest": "4d835352bcc38688729b2a399c9b9f54", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 20879, "upload_time": "2019-06-18T04:24:37", "url": "https://files.pythonhosted.org/packages/a1/5c/75dc51f405b0329428147330cdafa0b779e71748055a8724868471dbf134/quickumls-1.3.post2.tar.gz" } ], "1.3.post3": [ { "comment_text": "", "digests": { "md5": "8fb581d33733f1737caa1a19ac409cbe", "sha256": "94a95ed2aa41f0ad915ea4770477337c1939f52b4b842cf844696b05863a1912" }, "downloads": -1, "filename": "quickumls-1.3.post3.tar.gz", "has_sig": false, "md5_digest": "8fb581d33733f1737caa1a19ac409cbe", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 20902, "upload_time": "2019-06-18T04:30:22", "url": "https://files.pythonhosted.org/packages/17/cf/fa547fd92dbb99d2577b360898e68ff095091e88aac2a9662e436f855f08/quickumls-1.3.post3.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "10e86e72a32d9f62230f2fa162cccc6e", "sha256": "c5c6547e382e015921c44f31e2cf66703c1a972587f49d826415f719da5f3e91" }, "downloads": -1, "filename": "quickumls-1.3.0.post4.tar.gz", "has_sig": false, "md5_digest": "10e86e72a32d9f62230f2fa162cccc6e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 20993, "upload_time": "2019-06-19T02:31:57", "url": "https://files.pythonhosted.org/packages/ee/bf/ff99e77ee8f0ae878cefdc68e77228124362e73675c0b4c8f082f61f396d/quickumls-1.3.0.post4.tar.gz" } ] }