{ "info": { "author": "Hannes Svardal", "author_email": "hannes.svardal@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 5 - Production/Stable", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Programming Language :: Python :: 2", "Programming Language :: Python :: 2.6", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.2", "Programming Language :: Python :: 3.3", "Programming Language :: Python :: 3.4", "Topic :: Scientific/Engineering :: Bio-Informatics" ], "description": "Enrichme\n======================================================================\n\nTesting for enrichment of genome-wide scores or p-values\nin gene-categories.\nenrichme.py implements a method that naturally corrects for\ngene length, linkage disequilibrium and gene clustering by comaring the data\nto a null-distribution obtained by randomly rotating the scores\nagainst their genomic positions.\n\n\nGiven you have results from a genome wide association study (GWAS),\na genome wide seletion scan or any kind of analysis that attributes\nscores, likelihoods or p-values to positions across the genome,\nenrichme.py test whether high scores are enriched in specific gene categories.\n\n\nInstallation\n======================================================================\n\nUse `pip `_ or easy_install::\n\n pip install enrichme==0.1.1\n\n\nTesting\n======================================================================\n\nYou can run unit tests using the command::\n\n python setup.py test\n\nEnrichment test implementations\n======================================================================\nThe program currently implements three methods:\n\n1. Candidate (enrichme.py -M Candidate --help)\\\n Comparing a candidate gene list to a background gene list.\n This is a standard function that is done by many enrichment\n analysis tools. No correction for gene length or LD.\n\n2. TopScores (enrichme.py -M TopScores --help)\\\n Check whether top ranking scores are within or close to genes\n enriched in specific gene-categories.\n This method naturally corrects for gene-length and LD.\n This method is useful if one expects that only scores above\n a threshold contain biologically relevant information.\n No information is used from the ranking of scores that pass\n the threshold.\n\n3. Summary (enrichme.py -M Summary --help)\\\n Similar to TopScores but instead of defining a threshold\n on the scores, a summary of scores is calculated for each gene\n category. As an example, for each gene category, the program could\n calculate the mean across the category of the maximum score across\n each gene in the category.\n This method is useful if one thinks that scores contain information\n down to low value or if the relative value of scores is important\n beyond defining a simple threshold.\n\nHow to use\n======================================================================\n\n::\n\n ./enrichme.py --help\n\nExample scripts with further explanations and example data can be found in ./examples/\n\nTest for enrichment of GWAS scores above 3 (p-value<10^-3) in GO-categories::\n\n cd ./examples/\n ../enrichme.py -M TopScores \\\n --feature_to_category example_gene_to_category.csv \\\n --feature_to_category_cols gene_id go_identifier \\\n --rod example_GWAS_result.csv \\\n --rod_cols chrom pos score \\\n --features example_gene_annotation.csv \\\n --feature_cols chrom start end gene_id \\\n --name minimal_test_TopScores \\\n --n_permut 10 \\\n --top_type threshold \\\n --top 3 \\\n --descending \\\n --max_dist 5000 \\\n\n\nFor each gene, calculate the max GWAS score within the gene. Then, test for enrichment of the mean of these gene-scores in GO-categories::\n\n cd ./examples/ \n ../enrichme.py -M Summary \\\n --feature_to_category example_gene_to_category_40_cats.csv \\\n --feature_to_category_cols gene_id go_identifier \\\n --category_to_description example_go_to_name.csv \\\n --category_to_description_cols go_identifier go_name \\\n --rod example_GWAS_result.csv \\\n --rod_cols chrom pos score \\\n --features example_gene_annotation.csv \\\n --feature_cols chrom start end gene_id \\\n --name minimal_test_Summary \\\n --feature_summary max \\\n --category_summary mean \\\n\n\n\nInput files\n======================================================================\n\nA. FEATURE to CATEGORY mapping (input argument --feature_to_category)\\\n This file maps genetic features (usually genes) to feature categories\n such as gene-lists. This could be GO-terms or custom defined gene-lists.\n File can be tab-separated (.tsv) or comma-separates (.csv)\n\n .. code::\n \n $head examples/example_gene_annotation.csv\n gene_id,go_identifier\n AT1G01010,GO:0005634\n AT1G01010,GO:0006355\n AT1G01010,GO:0003677\n AT1G01010,GO:0007275\n AT1G01010,GO:0003700\n AT1G01010,GO:0043090\n AT1G01010,GO:0006888\n AT1G01020,GO:0016125\n AT1G01020,GO:0016020\n\n\nB. FEATURES (input argument --features)\\\n This file gives the position of features (e.g. genes)\n across the genome. Often this will be the gene\n annotation. The column pos gives the start of the feature.\n\n .. code::\n \n $head examples/example_gene_annotation.csv\n chrom,start,end,gene_id\n 1,3631,5899,AT1G01010\n 1,5928,8737,AT1G01020\n 1,11649,13714,AT1G01030\n 1,23146,31227,AT1G01040\n 1,28500,28706,AT1G01046\n 1,31170,33153,AT1G01050\n 1,33379,37840,AT1G01060\n 1,38752,40944,AT1G01070\n 1,44677,44787,AT1G01073\n\nC. Scores across the genome (input argument --rod)\\\n This could be position of SNPs and a\n score or p-value associated with them.\n ROD stands for Reference Ordered Data.\n\n .. code::\n\n $head examples/example_GWAS_result.csv\n chrom,pos,score\n 1,3102,0.09305379\n 1,4648,0.30615359999999997\n 1,4880,0.35306350000000003\n 1,5975,0.9596856\n 1,6063,0.23715001\n 1,6449,0.019213928\n 1,6514,0.43630862\n 1,6603,0.23235813\n 1,6768,0.58977395\n\nD. [Optional] Mapping of categories to category descriptions (input argument --category_to_description)\\\n This could be a csv with GO-category ids and descriptions.\n\n .. code::\n\n $head examples/example_go_to_name.csv \n go_identifier,go_name\n GO:0000001,mitochondrion inheritance\n GO:0000002,mitochondrial genome maintenance\n GO:0000003,reproduction\n GO:0042254,ribosome biogenesis\n GO:0044183,protein binding involved in protein folding\n GO:0051082,unfolded protein binding\n GO:0000006,high-affinity zinc uptake transmembrane transporter activity\n GO:0000007,low-affinity zinc ion transmembrane transporter activity\n GO:0003756,protein disulfide isomerase activity\n\nOutput\n======================================================================\n\nThe different modes provide different output files. The main output file is common for all modes, called .pvals.tsv. It is a ranked table with most significantly enriched categories on top::\n\n go_identifier out_of rank score_summary p_value benjamini_hochberg go_name\n GO:0000165 2000 1943 0.8731354255802085 0.02898550724637683 27.014492753623205 MAPK cascade\n GO:0000041 2000 1825 0.8348620634942308 0.08795602198900554 27.32500416458439 transition metal ion transport\n GO:0000160 2000 1800 0.9736749697560976 0.1004497751124438 23.404797601199405 phosphorelay signal transduction system\n GO:0000164 2000 1698 1.0469719100000001 0.15142428785607198 28.225487256371814 protein phosphatase type 1 complex\n GO:0000096 2000 1692 0.8680123230000001 0.15442278860569714 23.987006496751622 sulfur amino acid metabolic process\n GO:0000145 2000 1685 0.9976431777777778 0.15792103948025982 21.02605839937174 exocyst\n GO:0000159 2000 1562 0.9504652303750003 0.21939030484757627 25.558970514742636 protein phosphatase type 2A complex\n GO:0000156 2000 1558 0.9427544812820514 0.22138930534732637 22.92609250930091 phosphorelay response regulator activity\n\nParallel support\n======================================================================\n\nThere are two ways to run this program in parallel. Per default, the program uses as many cores as available on the host machine. This can be controlled with the --ncpus option. Advanced users, who want to parallelise across multiple nodes of a compute cluster, can use the built in map/reduce framework to automatically combine results from multiple independent runs. See\n\n::\n\n examples/run_permute_reduce_examples.sh\n \nfor an example.\n\nChangelog\n======================================================================\n\n**enrichme** follows `semantic versioning `_. The\nfirst release with stable API will be 1.0.0 (soon). Until then, you\nare encouraged to specify explicitly the version in your dependency\ntools, e.g.::\n\n pip install enrichme==0.1.1\n\n- 0.1.1 Initial release. \n\n\nHistory\n-------\n\n0.0.1\n+++++\n* Initial checkin", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/feilchenfeldt/enrichme", "keywords": "Enrichment", "license": "MIT", "maintainer": null, "maintainer_email": null, "name": "enrichme", "package_url": "https://pypi.org/project/enrichme/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/enrichme/", "project_urls": { "Download": "UNKNOWN", "Homepage": "https://github.com/feilchenfeldt/enrichme" }, "release_url": "https://pypi.org/project/enrichme/0.1.1/", "requires_dist": null, "requires_python": null, "summary": "Test enrichment of genome-wide statistics in gene (or other) categories while correcting for gene-length and clustering.", "version": "0.1.1" }, "last_serial": 2044529, "releases": { "0.1.0": [], "0.1.1": [ { "comment_text": "", "digests": { "md5": "618c765baecec9df1fa95406c44378db", "sha256": "50839b7d71bc0a89d1416af51e4987462c669fe3475c3a8570433b81fb288509" }, "downloads": -1, "filename": "enrichme-0.1.1-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "618c765baecec9df1fa95406c44378db", "packagetype": "bdist_wheel", "python_version": "2.7", "requires_python": null, "size": 25240, "upload_time": "2016-04-04T11:41:38", "url": "https://files.pythonhosted.org/packages/d2/80/d39fb091dfb5f7dfa1ab211f2cb8d7452cbd938657f80fc902ab856d5230/enrichme-0.1.1-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "e4a58d6c4ad78b92760b46e1a65aea3f", "sha256": "5bd0a110d68cce41cd9e3fac6ed2270ff25a3e1f2e42455ea67c4f2ba1e7f465" }, "downloads": -1, "filename": "enrichme-0.1.1.tar.gz", "has_sig": false, "md5_digest": "e4a58d6c4ad78b92760b46e1a65aea3f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3090039, "upload_time": "2016-04-04T11:41:27", "url": "https://files.pythonhosted.org/packages/ec/b0/155d97045e0788323fc8d0137fe4213f042cc1543cff2f598a8c5dffd688/enrichme-0.1.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "618c765baecec9df1fa95406c44378db", "sha256": "50839b7d71bc0a89d1416af51e4987462c669fe3475c3a8570433b81fb288509" }, "downloads": -1, "filename": "enrichme-0.1.1-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "618c765baecec9df1fa95406c44378db", "packagetype": "bdist_wheel", "python_version": "2.7", "requires_python": null, "size": 25240, "upload_time": "2016-04-04T11:41:38", "url": "https://files.pythonhosted.org/packages/d2/80/d39fb091dfb5f7dfa1ab211f2cb8d7452cbd938657f80fc902ab856d5230/enrichme-0.1.1-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "e4a58d6c4ad78b92760b46e1a65aea3f", "sha256": "5bd0a110d68cce41cd9e3fac6ed2270ff25a3e1f2e42455ea67c4f2ba1e7f465" }, "downloads": -1, "filename": "enrichme-0.1.1.tar.gz", "has_sig": false, "md5_digest": "e4a58d6c4ad78b92760b46e1a65aea3f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3090039, "upload_time": "2016-04-04T11:41:27", "url": "https://files.pythonhosted.org/packages/ec/b0/155d97045e0788323fc8d0137fe4213f042cc1543cff2f598a8c5dffd688/enrichme-0.1.1.tar.gz" } ] }