{ "info": { "author": "Jim Arnold", "author_email": "jimmyjamesarnold@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Science/Research", "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python :: 3", "Topic :: Scientific/Engineering :: Information Analysis", "Topic :: Scientific/Engineering :: Interface Engine/Protocol Translator" ], "description": "# annoPipeline - an API-enabled gene annotation pipeline\n\n***annoPipeline*** uses APIs from [mygene.info](http://mygene.info/) and [Entrez esummary](https://dataguide.nlm.nih.gov/eutilities/utilities.html#esummary) to annotate a user-provided list of gene symbols.\n\nGenerates a pandas DataFrame with gene symbol, gene name, EntrezID, and bibliographic info for up to 5 pubmed publications where a functional reference was provided (more about functional references at [GeneRIF](https://www.ncbi.nlm.nih.gov/gene/about-generif)).\n\nDesigned to be useful for tasks such as:\n* identifying relevant publications for a given function\n* analyzing publications trends for genes belonging to a common pathway\n* identifying influential PIs for a given gene network. \n\n## Reqirements:\n\n* Written for use with Python 3.7, not tested on other versions.\n\n* *annoPipeline* requires:\n - numpy >= 1.16.2\n - pandas >= 0.24.2\n - Biopython >= 1.73\n - openpyxl >= 2.6.1\n - requests >= 2.21.0\n\n## To Install:\n```\npip install annoPipeline\n```\n\nOr clone the repo from github.\nThen, in the annoPipeline directory, run:\n```\npython setup.py install\n```\nRequired dependencies will be installed if missing, may take a few seconds.\n\n## Example usage:\n\nExecute the full annotation pipeline on a list of gene symbols like this:\n```python\nimport annoPipeline as ap\n\n# define a list of genes you would like annotated\ngeneList = ['CDK2', 'FGFR1', 'SLC6A4']\n\n# annoPipeline will execute full annotation pipeline (see individual functions below). \ndf = ap.annoPipeline(geneList) # returns pandas df with annotations for gene and bibliographic info.\n```\n- ***ap.annoPipeline*** will default save annotation output to Excel file named by geneList symbols separated by '_'.\n\n### Warning! \nIf querying a **single gene**, still pass as a list. For example:\n```python\nimport annoPipeline as ap\n\ndf = ap.annoPipeline(['CDK2']) # for single gene queries still include [] - will be fixed in later version\n```\n\n\n## v0.0.1 Functionality\n\n### Task 1:\n1. From the MyGeneInfo API, use the \u201cGene query service\" GET method to return details on a given list of human gene symbols.\n2. From the returned json, parse out the \u201cname\", \u201csymbol\" and \u201centrezgene\" values and print to screen\n\nUse *queryGenes()*:\n```python\nimport annoPipeline as ap\n\ngeneList = ['CDK2', 'FGFR1', 'SLC6A4']\n\nl1 = ap.queryGenes(geneList) # returns list of dicts where keys are default mygene fields (symbol,name,taxid,entrezgene,ensemblgene)\n```\n\n### Task 2: \n1. \tUsing the appropriate identifier from the above result, send a query to the MyGeneInfo \u201cGene annotation services\" method for each gene\n2.\tFrom the resulting json, collate up to 5 generif descriptions per gene\n3.\tWrite the results to an Excel spreadsheet with columns: gene_symbol, gene_name, entrez_id, generifs\n\nUse *getAnno()*:\n```python\nimport annoPipeline as ap\n\ngeneList = ['CDK2', 'FGFR1', 'SLC6A4']\nl1 = ap.queryGenes(geneList)\nl2 = ap.getAnno(l1, saveExcel=True) # saveExcel defaults False\n```\n- returns pandas df with genes and up to 5 generifs from mygene.info. \n- default **saveExcel**=*False*, to save output to Excel must state *True*\n- if *True*, Excel file will be named by geneList symbols separated by '_'. \n\n### Task 3:\n1. Use the Pubmed IDs associated with the above generif content to extract additional bibliographic information.\n\nUse *addBibs()*:\n```python\nimport annoPipeline as ap\n\ngeneList = ['CDK2', 'FGFR1', 'SLC6A4']\nl1 = ap.queryGenes(geneList)\nl2 = ap.getAnno(l1)\nl3 = ap.addBibs(l2) # will return df with genes and up to 5 generifs from mygene.info\n``` \n* Currently returns the following bibliographic information when available:\n * PubDate\n * Source\n * Title\n * LastAuthor\n * DOI\n * PmcRefCount\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/jimmyjamesarnold/annoPipeline", "keywords": "", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "annoPipeline", "package_url": "https://pypi.org/project/annoPipeline/", "platform": "", "project_url": "https://pypi.org/project/annoPipeline/", "project_urls": { "Homepage": "https://github.com/jimmyjamesarnold/annoPipeline" }, "release_url": "https://pypi.org/project/annoPipeline/0.0.1/", "requires_dist": [ "numpy (>=1.16.2)", "pandas (>=0.24.2)", "Biopython (>=1.73)", "openpyxl (>=2.6.1)", "requests (>=2.21.0)" ], "requires_python": "", "summary": "API-enabled Gene Annotation", "version": "0.0.1" }, "last_serial": 5151411, "releases": { "0.0.1": [ { "comment_text": "", "digests": { "md5": "636d79ba9b92281e959916825e7d4f4f", "sha256": "a60b89c80a1c1144c0587e1f3f1766ce8afaf432229f2fe30a276610c18516b1" }, "downloads": -1, "filename": "annoPipeline-0.0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "636d79ba9b92281e959916825e7d4f4f", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 6901, "upload_time": "2019-04-16T18:25:19", "url": "https://files.pythonhosted.org/packages/13/5a/6045cf8ff00ba2299465ce3f171cddf70552025a2afef020c768b8ac6784/annoPipeline-0.0.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "93c9e719d5691b3bfa5c47dfe023f2c2", "sha256": "ed4d17e48eadb908f76c06839ab27670250e11686d1602b4820882cacc58ea00" }, "downloads": -1, "filename": "annoPipeline-0.0.1.tar.gz", "has_sig": false, "md5_digest": "93c9e719d5691b3bfa5c47dfe023f2c2", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4965, "upload_time": "2019-04-16T18:25:23", "url": "https://files.pythonhosted.org/packages/fd/ab/578fb6f6863f06729d45184ea985778d9848a503518f9c2356773aa91002/annoPipeline-0.0.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "636d79ba9b92281e959916825e7d4f4f", "sha256": "a60b89c80a1c1144c0587e1f3f1766ce8afaf432229f2fe30a276610c18516b1" }, "downloads": -1, "filename": "annoPipeline-0.0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "636d79ba9b92281e959916825e7d4f4f", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 6901, "upload_time": "2019-04-16T18:25:19", "url": "https://files.pythonhosted.org/packages/13/5a/6045cf8ff00ba2299465ce3f171cddf70552025a2afef020c768b8ac6784/annoPipeline-0.0.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "93c9e719d5691b3bfa5c47dfe023f2c2", "sha256": "ed4d17e48eadb908f76c06839ab27670250e11686d1602b4820882cacc58ea00" }, "downloads": -1, "filename": "annoPipeline-0.0.1.tar.gz", "has_sig": false, "md5_digest": "93c9e719d5691b3bfa5c47dfe023f2c2", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4965, "upload_time": "2019-04-16T18:25:23", "url": "https://files.pythonhosted.org/packages/fd/ab/578fb6f6863f06729d45184ea985778d9848a503518f9c2356773aa91002/annoPipeline-0.0.1.tar.gz" } ] }