{ "info": { "author": "Brent Pedersen", "author_email": "bpederse@gmail.com", "bugtrack_url": null, "classifiers": [ "Topic :: Scientific/Engineering :: Bio-Informatics" ], "description": "A library to combine, analyze, group and correct p-values in BED files.\nUnique tools involve correction for spatial autocorrelation.\nThis is useful for ChIP-Seq probes and Tiling arrays, or any data with spatial\ncorrelation.\n\nAbout\n=====\n\nThe Bioinformatics Applications Note manuscript is available here:\n http://bioinformatics.oxfordjournals.org/content/28/22/2986.full\n\nIt includes an explanation of 3 examples in the examples directory\nof this repository.\n\nThe software is distributed under the MIT license.\n\nQuickStart\n==========\n\n.. image:: https://anaconda.org/bioconda/combined-pvalues/badges/installer/conda.svg\n :target: https://conda.anaconda.org/bioconda\n\nIf your data is a sorted BED (first columns are chrom, start, stop) with a column for\np-value in the 4th column from single-probe tests--e.g. from limma::topTable(..., n=Inf),\nyou can find DMRs as::\n\n comb-p pipeline \\\n -c 4 \\ # p-values in 4th column\n --seed 1e-3 \\ # require a p-value of 1e-3 to start a region \n --dist 200 # extend region if find another p-value within this dist\n -p $OUT_PREFIX \\\n --region-filter-p 0.1 \\ # post-filter reported regions\n --anno mm9 \\ # annotate with genome mm9 from UCSC\n $PVALS # sorted BED file with pvals in 4th column\n\nThe output will look like:\n\n https://github.com/brentp/combined-pvalues/blob/master/manuscript/anno.tsv\n\nWith DMRs annotated to the nearest gene and CpG island. Negative distances indicate\nthat the DMR is upstream of the gene. DMRs inside of genes have `exon` / `UTR` or the\nappropriate feature to indicate their location within the gene.\nIf `matplotlib` is installed, then you will get a figure like this:\n\n.. figure:: https://gist.githubusercontent.com/brentp/bf7d3c3d3f23cc319ed8/raw/b547a7458b1cf91f2e19baf1c96893272e06c1e1/mslk.png\n\n Manhattan plot of p-values with DMRs highlighted\n\n Regions passing the `--region-filter-p` are highlighted in a red color.\n\nCommands below give finer control over each step.\n\nInstallation\n============\n\n`comb-p` requires python2.7.\n\nIf you do not have `numpy` and `scipy` installed. Please use anaconda\nfrom: http://continuum.io/downloads\nwhich is a complete python distribution with those modules included.\n\nrun::\n\n sudo python setup.py install\n\nto have `comb-p` installed on your path.\nOtherwise, you can use the python scripts in the cpv subdirectory.\nE.g.\n\n::\n\n python cpv/peaks.py\n\ncorresponds to the command::\n\n comb-p peaks\n\n\nInvocation\n==========\nThe program is run with::\n\n $ comb-p\n\nThis message is displayed::\n\n To run, indicate one of:\n\n acf - calculate autocorrelation within BED file\n slk - Stouffer-Liptak-Kechris correction of spatially correlated p-values\n fdr - Benjamini-Hochberg correction of p-values\n peaks - find peaks in a BED file.\n region_p - generate SLK p-values for a region (of p-values)\n manhattan - a manhattan plot of values in a BED file.\n pipeline - run the series of commands to find DMRs.\n\n NOTE: most of these assume *sorted* BED files.\n\n\nWhere each of the listed modules indicates an available program.\nRunning any of the above will result in a more detailed help message. e.g.::\n\n $ comb-p acf -h\n\nGives::\n\n usage: comb-p [-h] [-d D] [-c C] files [files ...]\n\n calculate the autocorrelation of a *sorted* bed file with a set\n of *distance* lags.\n\n positional arguments:\n files files to process\n\n optional arguments:\n -h, --help show this help message and exit\n -d D start:stop:stepsize of distance. e.g. 15:500:50 means check acf\n at distances of:[15, 65, 115, 165, 215, 265, 315, 365, 415, 465]\n -c C column number with p-values for acf calculations\n\n\nIndicating that it can be run as::\n\n $ .comb-p acf -d 1:500:50 -c 5 data/pvals.bed > data/acf.txt\n\nEach module is described in detail below.\n\nExample\n=======\n\nFind and merge peaks/troughs within a bed file\n----------------------------------------------\n::\n\n python cpv/peaks.py --seed 0.05 --dist 1000 data/pvals.bed > data/pvals.peaks.bed\n\nThis will seed peaks with values < 0.05 and merge any adjacent values\nwithin 1KB. The output is a BED file containing the extent of the troughs.\nIf the argument `--invert` is specified, the program will find look for\nvalues larger than the seed.\n\nPipeline\n========\n\nThe default steps are to:\n\n 1) calculate the ACF\n 2) use the ACF to do the Stouffer-Liptak correction\n 3) do the Benjamini-Hochberg FDR correction\n 4) find regions from the adjusted p-values.\n\nInputs and outputs to each step are BED files.\n\nNote that any of these steps can be run independently, e.g. to do multiple\ntesting correction on a BED file with p-values, just call the fdr.py script.\n\nACF\n---\nTo calclulate autocorrelation from 1 to 500 bases with a stepsize of 50\non the p-values in column 5, the command would look something like:\n\n $ python cpv/acf.py -d 1:500:50 -c 5 data/pvals.bed > data/acf.txt\n\nThe ACF will look something like::\n\n # {link}\n lag_min lag_max correlation N\n 1 51 0.06853 2982\n 51 101 0.04583 4182\n 101 151 0.02719 2623\n 151 201 0.0365 3564\n 201 251 0.0005302 2676\n 251 301 0.02595 3066\n 301 351 0.04935 2773\n 351 401 0.04592 2505\n 401 451 0.03923 2972\n\nWhere the first and second columns indicate the lag-bin, the third is the\nautocorrelation at that lag, and the last is the number of pairs used in\ncalculating the autocorrelation.\nIf that number is too small, the correlation values may be unreliable.\nWe expect the correlation to decrease with increase lag (unless there is some\nperiodicity).\n\nThat output should be directed to a file for use in later steps.\n\nCombine P-values with Stouffer-Liptak-Kechris correction\n--------------------------------------------------------\n\nSee\n+++\n\nThe ACF output is then used to do the Stouffer-Liptak-Kechris correction.\nA call like::\n\n $ python cpv/slk.py --acf data/acf.txt -c 5 data/pvals.bed > data/pvals.acf.bed\n\n + adjusts the p-values by stouffer-liptak with values from the autocorrelation\n in the step above.\n + outputs a new BED file with columns:\n\n*chr*, *start*, *end*, *pval*, *combined-pval*\n\nRegions\n-------\nWe are often interested in entire regions. After running the above example, we\ncan find the extent of any regions using::\n\n $ python cpv/peaks.py --dist 500 --seed 0.1 \\\n data/pvals.adjusted.bed > data/pvals.regions.bed\n\nwhere the seed inidicates a minimum p-value to start a region.\nAgain, *-c* can be used to indicate the column containing the p-values\n(defaults to last column)`--dist` tells the program to merge peaks (in this case\ntroughs) within 500 bases of the other.\nThe output file is a BED file with each region and the lowest (currently)\np-value in the region.\n\nThe cpv/peaks.py script is quite flexible. Run it without arguments for\nfurther usage.\n\nRegion P-values (region_p)\n--------------------------\n\nThe reported p-value is a Stouffer-Liptak *p-value* for the entire\nregion. This is done by taking a file of regions, and the original,\nuncorrected p-values, calculating the ACF out to the length of the longest\nregion, and then using that ACF to perform the Stouffer-Liptak correction on\neach region based on the original p-values.\nThe 1-step Sidak correction for multiple testing is performed on the p-value\nfor the region. Because the original p-values are sent in, we know the\ncoverage of the input. The Sidak correction is then based on the number of\npossible regions of the current size that could be created from the total\ncoverage. The extra columns added to the output file are the Stouffer-Liptak\np-value of the region and the Sidak correction of that p-value.\n\n\nAn invocation::\n\n $ comb-p region_p -p data/pvals.bed \\\n -r data/regions.bed \\\n -s 50 \\\n -c 5 > data/regions.sig.bed\n\nWill extract p-values from column 5 of pvals.bed for lines within regions in\nregions.bed. \n\nFrequently Asked Questions\n==========================\n\nSee the Wiki `F.A.Q.`_\n\n.. _`F.A.Q.`: https://github.com/brentp/combined-pvalues/wiki/F.A.Q.", "description_content_type": null, "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/brentp/combined-pvalues", "keywords": "", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "cpv", "package_url": "https://pypi.org/project/cpv/", "platform": "", "project_url": "https://pypi.org/project/cpv/", "project_urls": { "Homepage": "https://github.com/brentp/combined-pvalues" }, "release_url": "https://pypi.org/project/cpv/0.48/", "requires_dist": null, "requires_python": "", "summary": "combine p-values", "version": "0.48" }, "last_serial": 3602389, "releases": { "0.48": [ { "comment_text": "", "digests": { "md5": "e312a550bc46a82fb63c6eb5a261203e", "sha256": "118da7b82e914c0c91d9a19d101068afed0ae83756756d8abd8ca2870107a17e" }, "downloads": -1, "filename": "cpv-0.48.tar.gz", "has_sig": false, "md5_digest": "e312a550bc46a82fb63c6eb5a261203e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 65410, "upload_time": "2018-02-21T14:56:47", "url": "https://files.pythonhosted.org/packages/dc/90/9730bd4a1d13a5ec823c5f4663128347a9feb9c5757d0fd52ebf9bbaf27b/cpv-0.48.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "e312a550bc46a82fb63c6eb5a261203e", "sha256": "118da7b82e914c0c91d9a19d101068afed0ae83756756d8abd8ca2870107a17e" }, "downloads": -1, "filename": "cpv-0.48.tar.gz", "has_sig": false, "md5_digest": "e312a550bc46a82fb63c6eb5a261203e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 65410, "upload_time": "2018-02-21T14:56:47", "url": "https://files.pythonhosted.org/packages/dc/90/9730bd4a1d13a5ec823c5f4663128347a9feb9c5757d0fd52ebf9bbaf27b/cpv-0.48.tar.gz" } ] }