{ "info": { "author": "Yuda Munarko", "author_email": "yuda.munarko@gmail.com", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: GNU General Public License (GPL)", "Operating System :: OS Independent", "Programming Language :: Python :: 3" ], "description": "# NLIMED\n\n![Latest Version](https://img.shields.io/pypi/v/nlimed.svg?colorB=bc4545)\n![Python Versions](https://img.shields.io/pypi/pyversions/nlimed.svg?colorB=bc4545)\n\nNatural Language Interface for Model Entity Discovery (NLIMED) is an interface to search model entities (i.e. flux of sodium across the basolateral plasma membrane, the concentration of potassium in the portion of tissue fluid) in the biosimulation models in repositories. The interface utilises the RDF inside biosimulation models and metadata from BioPortal. Currently, the interface can retrieve model entities from the Physiome Model Repository (PMR, https://models.physiomeproject.org) and the BioModels (https://www.ebi.ac.uk/biomodels/).\n\nIn general, NLIMED works by converting natural language query into SPARQL, so it may help users by avoiding the rigid syntax of SPARQL, query path consisting multiple predicates, and detail knowledge about ontologies.\n\nNote: model entities extracted from BioModels are those having ontology classes indexed by BioPortal.\n\nLicense :: OSI Approved :: GNU General Public License (GPL)\n\n## Demo\n1. [Online web](http://130.216.217.102/nlimed)\n2. [Colab](https://colab.research.google.com/drive/1xq3ewKIT9pHD0AveWuYy2cpJvG4oLjDR#scrollTo=VYv3uMcMt6HJ) - online tutorial\n\n## References\nThe main reference of this work is: https://doi.org/10.1101/756304\n\nCite the following works when you implement NLIMED with parser or nlp of:\n1. CoreNLP: https://stanfordnlp.github.io/CoreNLP/citing.html\n2. Benepar: https://arxiv.org/abs/1805.01052\n3. NCBO: https://www.ncbi.nlm.nih.gov/pubmed/21347171\n4. Stanza and xStanza: http://arxiv.org/abs/2007.14640\n\n## Installation\nWe sugest you to install NLIMED from PyPI. If you already installed [pip](https://pip.pypa.io/en/stable/installing/), run the following command:\n ```\n pip install NLIMED\n ```\nAs an alternative, you can clone and download this github repository and use the following command:\n ```\n git clone https://github.com/napakalas/NLIMED.git\n cd NLIMED\n pip install -e .\n ```\n\nFollow this [Colab](https://colab.research.google.com/drive/1xq3ewKIT9pHD0AveWuYy2cpJvG4oLjDR#scrollTo=VYv3uMcMt6HJ) tutorial for easy step by step installation and utilisation.\n\n## Configuration\n\nNLIMED implements CoreNLP Parser, Benepar Parser, and NCBO parser. You may select one of them for your system.\n * Benepar, Stanza, xStanza:\n - automatically deploy as a dependency.\n * To run CoreNLP and NCBO, you need to run configuration. It can be using default value\n ```python\n from NLIMED import config\n config()\n ```\n or using your own [NCBO api key](https://bioportal.bioontology.org/help#Getting_an_API_key) and self defined CoreNLP installation location\n ```python\n from NLIMED import config\n config(apikey='your-api-key', corenlp_home='installation-location')\n ```\n\n * It is also possible to run configuration from command prompt or terminal:\n\n ```\n NLIMED --config --apikey {your-ncbo-api-key} --corenlp-home {installation-location}\n ```\n\n## Issues\n\nFor any issues please follows and reports [issues](https://github.com/napakalas/NLIMED/issues).\n\n## Experiment\nWe conducted an experiment to measure NLIMED performance in term of:\n1. 
Annotating natural language queries to ontology classes\n2. NLIMED behaviour on native queries in the PMR\n\nThe experiment is available at:\n1. [Jupyter](https://github.com/napakalas/NLIMED/tree/experiment)\n2. Colab:\n * [NLQ Annotator Performance](https://colab.research.google.com/drive/1xq3ewKIT9pHD0AveWuYy2cpJvG4oLjDR#scrollTo=KwQAqORxsnk_)\n * [NLIMED Behaviour on Native Query in PMR (Historical Data)](https://colab.research.google.com/drive/1xq3ewKIT9pHD0AveWuYy2cpJvG4oLjDR#scrollTo=qsEFA2pGtVZH)\n\n## How NLIMED works\nHere is the process inside NLIMED for converting a natural language query (NLQ) into SPARQL and then retrieving model entities from biosimulation model repositories:\n1. NLQ annotation -- annotating the NLQ with ontology classes\n\n - The NLQ is parsed using the selected parser (CoreNLP, Benepar, NCBO, Stanza, or xStanza), producing candidate noun phrases (CNPs).\n - The association level of each CNP to ontology classes is measured. The measurement utilises five textual features, i.e. preferred label, synonym, definition, and parent label (from the ontology class), and local definition (from the biosimulation model).\n - CNPs with the highest association and the longest terms, not overlapping with other CNPs, are selected. The selected CNPs should cover all terms in the NLQ.\n - CNPs whose association to ontology classes is below the cutoff are filtered out.\n2. SPARQL generation -- composing SPARQL from the annotated ontology classes using the RDF Graph Index and running it to retrieve model entities from the repository.\n\n## Running NLIMED from Python\n* Get SPARQL code for the query \"concentration of potassium in the portion of tissue fluid\" in the PMR using the CoreNLP parser with precision level 2, alpha=4, beta=0.7, gamma=0.5, delta=0.8, and theta=0.01\n ```python\n from NLIMED import NLIMED\n nlimed = NLIMED(repo='pmr', parser='CoreNLP', pl=2, alpha=4, beta=0.7, gamma=0.5, delta=0.8, theta=0.01)\n query = 'concentration of potassium in the portion of tissue fluid'\n result = nlimed.getSparql(query=query,format='json')\n ```\n The result can be zero or more SPARQL queries, based on the RDF Graph Index.\n ```\n [\n 'SELECT ?graph ?Model_entity ?desc\n WHERE { GRAPH ?graph { ?c .\n ?e .\n ?a .\n ?Model_entity ?a .\n ?a ?c .\n ?c ?e .\n OPTIONAL{?Model_entity ?desc .} }}'\n ]\n ```\n
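* Retrieve the matching model entities directly (a sketch only: this assumes a `getModels` method taking the same `query` and `format` arguments as `getSparql`, which is not shown in the examples above, so verify it against the installed package before relying on it)\n ```python\n from NLIMED import NLIMED\n\n # minimal sketch, assuming getModels(query=..., format=...) mirrors getSparql\n # and that the ranking parameters (alpha, beta, gamma, delta, theta) fall back to their defaults\n nlimed = NLIMED(repo='pmr', parser='Benepar')\n query = 'flux of sodium'\n result = nlimed.getModels(query=query, format='json')\n print(result)\n ```\n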
\n ## Running NLIMED from console (query to get model entities)\n NLIMED can be run directly from the command prompt or terminal. There are three mandatory arguments and nine optional arguments. To get help about the arguments, run:\n ```terminal\n NLIMED -h\n ```\n then you will get:\n ```terminal\n usage: NLIMED [-h] -r {pmr,bm,all} -p {corenlp,benepar,stanza,xstanza,ncbo} -q QUERY\n [-pl PL] [-s {models,sparql,annotation,verbose}] [-a ALPHA]\n [-b BETA] [-g GAMMA] [-d DELTA] [-t THETA] [-c CUTOFF] [-tf TFMODE]\n\n optional arguments:\n -h, --help show this help message and exit\n -r {pmr,bm,all}, --repo {pmr,bm,all}\n repository name\n -p {corenlp,benepar,stanza,xstanza,ncbo}, --parser {corenlp,benepar,stanza,xstanza,ncbo}\n parser tool\n -q QUERY, --query QUERY\n query -- any text containing words\n -pl PL precision level, >=1\n -s {models,sparql,annotation,verbose}, --show {models,sparql,annotation,verbose}\n results presentation type\n -a ALPHA, --alpha ALPHA\n Minimum alpha is 0\n -b BETA, --beta BETA Minimum beta is 0\n -g GAMMA, --gamma GAMMA\n Minimum gamma is 0\n -d DELTA, --delta DELTA\n Minimum delta is 0\n -t THETA, --theta THETA\n Minimum theta is 0\n -c CUTOFF, --cutoff CUTOFF\n Minimum cutoff is 0\n -tf TFMODE, --tfMode TFMODE\n tf mode calculation, [1,2,3]\n ```\n Here is the description of those arguments:\n * -r {pmr,bm,all} or --repo {pmr,bm,all} (mandatory) is the name of the repository: pmr is the Physiome Model Repository, bm is BioModels, and all is for both repositories.\n * -p {corenlp,benepar,stanza,xstanza,ncbo} or --parser {corenlp,benepar,stanza,xstanza,ncbo} (mandatory) is the type of parser used for query annotation.\n * -q QUERY or --query QUERY (mandatory) is the query text. A multi-word query should be double quoted.\n * -pl PL (optional) is the precision level, indicating the number of ontology classes used to construct SPARQL. A larger number utilises more ontology classes, which may generate more SPARQL queries and produce more results. Minimum value is 1. Default value is 1.\n * -s {models,sparql,annotation,verbose} or --show {models,sparql,annotation,verbose} (optional) selects how results are presented: models shows models, sparql shows all possible SPARQL queries, annotation shows annotation results, and verbose shows annotation results, SPARQL queries, and models.\n * -a ALPHA or --alpha ALPHA (optional) sets the weight of the preferred label feature. Minimum alpha is 0. Default value is 3.0.\n * -b BETA or --beta BETA (optional) sets the weight of the synonym feature. Minimum beta is 0. Default value is 3.0.\n * -g GAMMA or --gamma GAMMA (optional) sets the weight of the definition feature. Minimum gamma is 0. Default value is 0.1.\n * -d DELTA or --delta DELTA (optional) sets the weight of the parent label feature. Minimum delta is 0. Default value is 0.1.\n * -t THETA or --theta THETA (optional) sets the weight of the description feature. Minimum theta is 0. Default value is 0.38.\n * -c CUTOFF or --cutoff CUTOFF (optional) is the minimum degree of association between a phrase and an ontology class for the class to be used. Minimum cutoff is 0. Default value is 1.\n * -tf TFMODE or --tfMode TFMODE (optional) is the term-frequency calculation mode: 1 = all features with dependency terms, 2 = all features without dependency terms, 3 = highest feature with dependency terms. Default value is 3.\n\n ### Running example\n * running with a minimum setup for repository = Physiome Model Repository, parser = Benepar, query = \"flux of sodium\", and other arguments at their default values:\n ```\n NLIMED -r pmr -p Benepar -q \"flux of sodium\"\n ```\n * running with a full setup for repository = BioModels, parser = CoreNLP, query = \"flux of sodium\", precision level = 2, alpha = 2, beta = 1, gamma = 1, delta = 1, theta = 0.4, cutoff = 1.0, and tfMode = 3 (a rough Python equivalent is sketched after these examples):\n ```\n NLIMED -r bm -p CoreNLP -q \"flux of sodium\" -pl 2 -a 2 -b 1 -g 1 -d 1 -t 0.4 -c 1 -tf 3\n ```\n Note: the first run with the CoreNLP parser may be delayed by local server startup. The delay disappears on subsequent runs.\n\n * running for repository = Physiome Model Repository, parser = NCBO, query = \"flux of sodium\", precision level = 3, and other default arguments:\n\n ```\n NLIMED -r pmr -p ncbo -q \"flux of sodium\" -pl 3\n ```\n Note: the NCBO parser is slower than the other parsers because it uses a web service that depends on the Internet connection.\n
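\n * The console examples above map directly onto the Python interface. As a rough sketch of the full-setup example (returning the generated SPARQL here; cutoff and tfMode are left at their defaults because their Python keyword names are not documented above):\n ```python\n from NLIMED import NLIMED\n\n # sketch only: a rough Python counterpart of\n # NLIMED -r bm -p CoreNLP -q \"flux of sodium\" -pl 2 -a 2 -b 1 -g 1 -d 1 -t 0.4\n nlimed = NLIMED(repo='bm', parser='CoreNLP', pl=2, alpha=2, beta=1, gamma=1, delta=1, theta=0.4)\n result = nlimed.getSparql(query='flux of sodium', format='json')\n print(result)\n ```\n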
\n## Recreate Indexes (RDF Graph Index and Text Feature Index)\n\nAll indexes are already provided in this project. However, if you want to recreate all indexes, you can use the following scripts on the command prompt or terminal. Please be patient; it may take some time to finish.\n\n### Indexing the pmr\nDownload all required ontology dictionaries for indexing from https://bioportal.bioontology.org/.\nCSV files are preferred, but OBO files also work.\nList of ontology dictionaries:\n'SO','PW','PSIMOD','PR','PATO','OPB','NCBITAXON','MAMO',\n'FMA','EFO','EDAM','ECO','CL','CHEBI','BTO','SBO',\n'UNIPROT','KEGG','EC-CODE','ENSEMBL','GO'\n\n```\nNLIMED --build-index pmr \"{location-of-ontology-files}\"\n```\n\n### Indexing biomodels\n1. Download all RDF files (ftp://ftp.ebi.ac.uk/pub/databases/RDF/biomodels/r31/biomodels-rdf.tar.bz2) as a compressed file from BioModels. Extract the compressed file to a directory of your choice.\n\n2. Run the following code:\n ```\n NLIMED --build-index bm \"{location-of-ontology-files}\" \"{location-of-RDF-files}\"\n ```\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/napakalas/NLIMED", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "NLIMED", "package_url": "https://pypi.org/project/NLIMED/", "platform": "", "project_url": "https://pypi.org/project/NLIMED/", "project_urls": { "Homepage": "https://github.com/napakalas/NLIMED" }, "release_url": "https://pypi.org/project/NLIMED/0.0.2/", "requires_dist": [ "nltk", "SPARQLWrapper", "pandas", "stanza", "benepar" ], "requires_python": "", "summary": "Natural Language Interface for Model Entity Discovery (NLIMED) is an interface to search model entities in biosimulation models stored in repositories.", "version": "0.0.2", "yanked": false, "yanked_reason": null }, "last_serial": 10523103, "releases": { "0.0.1": [ { "comment_text": "", "digests": { "md5": "e1660285688dd22686a00053e9fa52fc", "sha256": "caa80eeb8b1682559935354f168047eb78aff8169b2d2eb87aae704ae6f99311" }, "downloads": -1, "filename": "NLIMED-0.0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "e1660285688dd22686a00053e9fa52fc", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 2189275, "upload_time": "2019-10-23T22:28:35", "upload_time_iso_8601": "2019-10-23T22:28:35.009796Z", "url": "https://files.pythonhosted.org/packages/4f/61/c7b00b78f53d0eace50cf0aa8aef291d409acfeacf279ecbfebb957a4eab/NLIMED-0.0.1-py3-none-any.whl", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "0d8fe1077ce82ff4009c5ad23fff9e4a", "sha256": "cf21ec6d25072c1293646b0e036c0d586fa6817967570a6a68241d4c4e677b57" }, "downloads": -1, "filename": "NLIMED-0.0.1.tar.gz", "has_sig": false, "md5_digest": "0d8fe1077ce82ff4009c5ad23fff9e4a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 2036004, "upload_time": "2019-10-23T22:28:38", "upload_time_iso_8601": "2019-10-23T22:28:38.451194Z", "url": "https://files.pythonhosted.org/packages/3f/4f/f8d4cd75ffebbfebb5633e66c3200503a6c7b50a0d29575877689eeca97b/NLIMED-0.0.1.tar.gz", "yanked": false, "yanked_reason": null } ], "0.0.2": [ { "comment_text": "", "digests": { "md5": "1c9b531c08eaa9c72e67b1ce5cc7b2a0", "sha256": "3e80fc62b3a507f9c011e1a6cef26cb5e0114d6abdd95350e0f888c14bcce5ce" }, "downloads": -1, "filename": "NLIMED-0.0.2-py3-none-any.whl", "has_sig": false, "md5_digest": "1c9b531c08eaa9c72e67b1ce5cc7b2a0", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 12732536, "upload_time": "2021-06-01T00:46:00", "upload_time_iso_8601":
"2021-06-01T00:46:00.930022Z", "url": "https://files.pythonhosted.org/packages/8b/fb/4b5905dd1adbcc14340a14dc4950547b6473370e8b21d2d4f10f67106b0c/NLIMED-0.0.2-py3-none-any.whl", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "a8341ba5b11caefade701d55953001b3", "sha256": "d0eabe2031a74a9ffc7d2c6a6c4d8fa3130197eafbba1aded8da05910fa04062" }, "downloads": -1, "filename": "NLIMED-0.0.2.tar.gz", "has_sig": false, "md5_digest": "a8341ba5b11caefade701d55953001b3", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 12228404, "upload_time": "2021-06-01T00:46:04", "upload_time_iso_8601": "2021-06-01T00:46:04.246783Z", "url": "https://files.pythonhosted.org/packages/af/74/54e43543c3667849b763867a376a5c8fcf36cef0c7da77460fe2f0f2e553/NLIMED-0.0.2.tar.gz", "yanked": false, "yanked_reason": null } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "1c9b531c08eaa9c72e67b1ce5cc7b2a0", "sha256": "3e80fc62b3a507f9c011e1a6cef26cb5e0114d6abdd95350e0f888c14bcce5ce" }, "downloads": -1, "filename": "NLIMED-0.0.2-py3-none-any.whl", "has_sig": false, "md5_digest": "1c9b531c08eaa9c72e67b1ce5cc7b2a0", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 12732536, "upload_time": "2021-06-01T00:46:00", "upload_time_iso_8601": "2021-06-01T00:46:00.930022Z", "url": "https://files.pythonhosted.org/packages/8b/fb/4b5905dd1adbcc14340a14dc4950547b6473370e8b21d2d4f10f67106b0c/NLIMED-0.0.2-py3-none-any.whl", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "a8341ba5b11caefade701d55953001b3", "sha256": "d0eabe2031a74a9ffc7d2c6a6c4d8fa3130197eafbba1aded8da05910fa04062" }, "downloads": -1, "filename": "NLIMED-0.0.2.tar.gz", "has_sig": false, "md5_digest": "a8341ba5b11caefade701d55953001b3", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 12228404, "upload_time": "2021-06-01T00:46:04", "upload_time_iso_8601": "2021-06-01T00:46:04.246783Z", "url": "https://files.pythonhosted.org/packages/af/74/54e43543c3667849b763867a376a5c8fcf36cef0c7da77460fe2f0f2e553/NLIMED-0.0.2.tar.gz", "yanked": false, "yanked_reason": null } ], "vulnerabilities": [] }