{ "info": { "author": "Tim O'Donnell", "author_email": "timodonnell@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Environment :: Console", "Intended Audience :: Science/Research", "License :: OSI Approved :: Apache Software License", "Operating System :: OS Independent", "Programming Language :: Python", "Topic :: Scientific/Engineering :: Bio-Informatics" ], "description": ".. image:: https://travis-ci.org/hammerlab/varlens.svg?branch=master\n :target: https://travis-ci.org/hammerlab/varlens\n\nvarlens\n======================\n\nA collection of Python tools for working with genomic variants and\nnext-generation sequencing reads. Not particularly fast for large datasets. The\nemphasis is on extracting what you need from BAMs and VCFs into a CSV file for\nfurther analysis.\n\nBuilt on `varcode `_ and `pysam `_.\n\nvarlens-variants\n Combine, annotate, and filter variants from VCF or CSV files. Available\n annotations include genes, variant effects, surrounding sequence context,\n counts of supporting reads from specified BAM files, and MHC I binding\n affinity prediction of mutant peptides.\n\nvarlens-reads\n Display, filter, and copy reads from a SAM/BAM file. 
Partial replacement for ``samtools view``.\n\nvarlens-allele-support\n Count reads supporting each allele at specified sites in BAM files.\n\n\nInstallation\n-------------\n\nTo install from `PyPI `_:\n\n::\n \n pip install varlens\n\nOr from a git checkout:\n\n::\n\n pip install .\n\nTo run the tests:\n\n::\n\n nosetests .\n\nTo build the documentation (just this README plus the commandline tool help):\n\n::\n\n pip install -e .\n pip install Sphinx\n cd docs\n make clean setup rst html\n\nThe docs will be written to the ``_build/html`` directory.\n\n\nvarlens-variants\n----------------------\n\nGiven variants from one or more VCF or CSV files, apply filters, add additional\ncolumns, and output to CSV.\n\nCurrently we can only output to CSV, not VCF.\n\nA number of useful annotations can be added for each variant by specifying\noptions of the form '--include-XXX', e.g. '--include-gene'. See detailed help\n(run with -h).\n\nExamples\n`````````````\n\nPrint basic info for the variants found in two VCF files. 
Note that variants\nfound in both files are listed in one row, and the 'sources' column lists\nthe files each variant was found in:\n\n::\n\n $ varlens-variants test/data/CELSR1/vcfs/vcf_1.vcf test/data/CELSR1/vcfs/vcf_2.vcf\n\n genome,contig,interbase_start,interbase_end,ref,alt,sources\n GRCh37,22,21829554,21829555,T,G,1.vcf\n GRCh37,22,46931059,46931060,A,C,1.vcf\n GRCh37,22,46931061,46931062,G,A,1.vcf 2.vcf\n GRCh37,22,50636217,50636218,A,C,1.vcf\n GRCh37,22,50875932,50875933,A,C,1.vcf\n GRCh37,22,45309892,45309893,T,G,2.vcf\n\nSame as the above but include additional columns giving varcode variant effect\nannotations and the genes the variants overlap, and write to a file:\n\n::\n\n $ varlens-variants test/data/CELSR1/vcfs/vcf_1.vcf test/data/CELSR1/vcfs/vcf_2.vcf \\\n --include-effect \\\n --include-gene \\\n --out /tmp/result.csv\n\n Wrote: /tmp/result.csv\n\n $ cat /tmp/result.csv\n\n genome,contig,interbase_start,interbase_end,ref,alt,sources,effect,gene\n GRCh37,22,21829554,21829555,T,G,1.vcf,non-coding-transcript,PI4KAP2\n GRCh37,22,46931059,46931060,A,C,1.vcf,p.S670A,CELSR1\n GRCh37,22,46931061,46931062,G,A,1.vcf 2.vcf,p.S669F,CELSR1\n GRCh37,22,50636217,50636218,A,C,1.vcf,intronic,TRABD\n GRCh37,22,50875932,50875933,A,C,1.vcf,splice-acceptor,PPP6R2\n GRCh37,22,45309892,45309893,T,G,2.vcf,p.T214P,PHF21B\n\nPrint counts for number of reads supporting reference/variant/other alleles\nfrom the specified BAM, counting only reads with mapping quality >= 10:\n\n::\n\n $ varlens-variants test/data/CELSR1/vcfs/vcf_1.vcf \\\n --include-read-evidence \\\n --reads test/data/CELSR1/bams/bam_1.bam \\\n --min-mapping-quality 10\n\n genome,contig,interbase_start,interbase_end,ref,alt,sources,num_alt,num_ref,total_depth\n GRCh37,22,21829554,21829555,T,G,vcf_1.vcf,0,0,0\n GRCh37,22,46931059,46931060,A,C,vcf_1.vcf,0,216,320\n GRCh37,22,46931061,46931062,G,A,vcf_1.vcf,0,321,321\n GRCh37,22,50636217,50636218,A,C,vcf_1.vcf,0,0,0\n 
GRCh37,22,50875932,50875933,A,C,vcf_1.vcf,0,0,0\n\n\nvarlens-reads\n----------------------\n\nFilter reads from one or more BAMs and output a CSV or a new BAM.\n\nLoci and VCF files may be specified, in which case reads are filtered to\noverlap the specified loci or variants.\n\nExamples\n`````````````\n\nPrint basic fields for the reads in a BAM:\n\n::\n\n $ varlens-reads test/data/CELSR1/bams/bam_0.bam\n\n query_name,reference_start,reference_end,cigarstring\n HISEQ:142:C5822ANXX:3:2116:16538:101199,46929962,46930062,100M\n HISEQ:142:C5822ANXX:3:1106:18985:32932,46929964,46930064,100M\n HISEQ:142:C5822ANXX:3:2201:21091:67220,46929966,46930066,100M\n HISEQ:142:C5822ANXX:4:1304:5363:12786,46929966,46930066,100M\n HISEQ:142:C5822ANXX:4:1104:9008:85114,46929969,46930069,100M\n HISEQ:142:C5822ANXX:3:2304:9921:94828,46929970,46930070,100M\n HISEQ:142:C5822ANXX:3:2211:6266:74633,46929973,46930073,100M\n HISEQ:142:C5822ANXX:3:1305:8982:42729,46929974,46930074,100M\n HISEQ:142:C5822ANXX:4:2316:5630:7371,46929978,46930078,100M\n ...\n\nSame as above but filter only to reads aligned on the (-) strand, write to a\nfile instead of stdout, and also include the mapping quality and sequenced\nbases in the output:\n\n::\n\n $ varlens-reads test/data/CELSR1/bams/bam_0.bam \\\n --is-reverse \\\n --field mapping_quality query_alignment_sequence \\\n --out /tmp/result.csv\n\n Wrote: /tmp/result.csv\n\n $ head /tmp/result.csv\n\n query_name,reference_start,reference_end,cigarstring,mapping_quality,query_alignment_sequence\n HISEQ:142:C5822ANXX:3:2116:16538:101199,46929962,46930062,100M,60,CATGATCTGGGCATTAGGGCCTTCATCAGGGTCGTTAGCACGAATCTTTGCCACCACCGACCCCACTGGGTTGTTCTCCTCAACAAACAGCTCCAGTTCG\n HISEQ:142:C5822ANXX:3:1106:18985:32932,46929964,46930064,100M,60,TGATCTGGGCATTAGGGCCTTCATCAGGGTCGTTAGCACGAATCTTTGCCACCACCGACCCCACTGGGTTGTTCTCCTCAACAAACAGCTCCAGTTCGTC\n 
HISEQ:142:C5822ANXX:4:1104:9008:85114,46929969,46930069,100M,60,TGGGCATTAGGGCCTTCATCAGGGTCGTTAGCACGAATCTTTGCCACCACCGACCCCACTGGGTTGTTCTCCTCAACAAACAGCTCCAGTTCGTCCTTCT\n HISEQ:142:C5822ANXX:4:1202:18451:91174,46929979,46930079,100M,60,GGCCTTCATCAGGGTCGTTAGCACGAATCTTTGCCACCACCGACCCCACTGGGTTGTTCTCCTCAACAAACAGCTCCAGTTCGTCCTTCTCAAACATGGG\n HISEQ:142:C5822ANXX:3:1211:18522:54773,46929987,46930087,100M,60,TCAGGGTCGTTAGCACGAATCTTTGCCACCACCGACCCCACTGGGTTGTTCTCCTCAACAAACAGCTCCAGTTCGTCCTTCTCAAACATGGGGGCATTGT\n HISEQ:142:C5822ANXX:3:2114:19455:45093,46929987,46930087,100M,60,TCAGGGTCGTTAGCACGAATCTTTGCCACCGCCGACCCCACTGGGTTGTTCTCCTCAACAAACAGCTCCAGTTCGTCCTTCTCAAACATGGGGGCATTGT\n HISEQ:142:C5822ANXX:4:2115:9153:21593,46929994,46930094,100M,60,CGTTAGCACGAATCTTTGCCACCACCGACCCCACTGGGTTGTTCTCCTCAACAAACAGCTCCAGTTCGTCCTTCTCAAACATGGGGGCATTGTCATTAAT\n HISEQ:142:C5822ANXX:4:1212:15644:87227,46929995,46930095,100M,60,GTTAGCACGTATGTTTGCCACCACCGACCCCACTGAGTTGTTCTCCTCAACAAACAGCTCCAGTTCGTGCTTCTCAAACATGGGGGCAGTGTCATTAATG\n HISEQ:142:C5822ANXX:3:1103:4717:26369,46929997,46930097,100M,60,TAGCACGAATCTTTGCCACCACCGACCCCACTGGGTTGTTCTCCTCAACAAACAGCTCCAGTTCGTCCTTCTCAAACATGGGGGCATTGTCATTAATGTC\n\n\nWrite a bam file consisting of reads with mapping quality >=30 and\noverlapping a certain locus:\n\n::\n\n $ varlens-reads test/data/CELSR1/bams/bam_0.bam \\\n --min-mapping-quality 30 \\\n --locus 22:46932040-46932050 \\\n --out /tmp/result.bam\n\nWrite a bam file consisting of reads overlapping variants from a VCF:\n\n::\n\n $ varlens-reads test/data/CELSR1/bams/bam_0.bam \\\n --variants test/data/CELSR1/vcfs/vcf_1.vcf \\\n --out /tmp/result.bam\n\nPrint just the header for a BAM in csv format:\n\n::\n\n $ varlens-reads test/data/CELSR1/bams/bam_0.bam --header\n\nvarlens-allele-support\n----------------------\n\nGiven one or more BAMs and some genomic sites to consider, write a csv file\ngiving counts of reads supporting each allele at each site for each BAM.\n\nThe genomic sites to consider may be specified 
by locus (--locus option), or via\none or more VCF files.\n\nThe positions outputted by this command are in *interbase coordinates*, i.e.\nstarting at 0, inclusive on first index, exclusive on second (as opposed to\nthe one-based inclusive coordinates used in VCF files).\n\nExamples\n`````````````\n\n:: \n\n varlens-allele-support \\\n --reads test/data/CELSR1/bams/bam_1.bam \\\n --locus 22:46931061 22:46931063\n\n source,contig,interbase_start,interbase_end,allele,count\n bam_1.bam,22,46931060,46931061,,1\n bam_1.bam,22,46931060,46931061,G,329\n bam_1.bam,22,46931062,46931063,A,327\n bam_1.bam,22,46931062,46931063,AC,1\n bam_1.bam,22,46931062,46931063,AG,2\n\nNote on coordinate systems\n-----------------------------------\n\n``varlens`` uses 0-based half-open coordinates internally. Many tools\n(including samtools and VCF files) use inclusive 1-based coordinates. We try to\nkeep the confusion to a minimum by using the term \"interbase\" whenever we're\nusing 0-based half open coordinates and \"inclusive\" when we're using 1-based\ninclusive coordinates.\n\nOne particularly sticky place this comes up is when specifying loci on the\ncommandline using e.g. ``--locus chr22:43243-43244``. To maintain consistency\nwith the most common other tools, when you specify a locus like\n``chr22:10-20``, we interpret that as a 1-based inclusive coordinate. To\nspecify 0-based half-open coordinates, use this syntax: ``chr22/11-20`` (i.e. a\nslash instead of a colon).\n\nSee this `blog post `_\nfor more details on coordinate systems.\n\n.. 
Documentation\n -------------\n The docs are just this readme and the commandline tool help.\n They are available here: http://hammerlab.github.io/varlens/docs/html", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/hammerlab/varlens", "keywords": null, "license": "http://www.apache.org/licenses/LICENSE-2.0.html", "maintainer": null, "maintainer_email": null, "name": "varlens", "package_url": "https://pypi.org/project/varlens/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/varlens/", "project_urls": { "Download": "UNKNOWN", "Homepage": "https://github.com/hammerlab/varlens" }, "release_url": "https://pypi.org/project/varlens/0.0.2/", "requires_dist": null, "requires_python": null, "summary": "commandline manipulation of genomic variants and NGS reads", "version": "0.0.2" }, "last_serial": 2064582, "releases": { "0.0.1": [ { "comment_text": "", "digests": { "md5": "826c8a9868ea1716dc0e313071f7044d", "sha256": "2134c873a25711b57c14ca27276f8cf1bcb11d599e58335e614e7925c578d8a3" }, "downloads": -1, "filename": "varlens-0.0.1.tar.gz", "has_sig": false, "md5_digest": "826c8a9868ea1716dc0e313071f7044d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 38655, "upload_time": "2016-03-17T18:41:10", "url": "https://files.pythonhosted.org/packages/ac/49/9773cb96b66bc8a85c49a23650abaf45a2d1c59d8a21faa8b50e7a51627f/varlens-0.0.1.tar.gz" } ], "0.0.2": [ { "comment_text": "", "digests": { "md5": "c47c0fd62195cd601ca73ee0021754c3", "sha256": "cbdc8786304817302358338829c05a852121d508d0ee9878a43ce59e5eaf60d2" }, "downloads": -1, "filename": "varlens-0.0.2.tar.gz", "has_sig": false, "md5_digest": "c47c0fd62195cd601ca73ee0021754c3", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 38923, "upload_time": "2016-04-14T21:39:17", "url": 
"https://files.pythonhosted.org/packages/3f/94/7df142fa46404c0365ebbcce78ba9cfb78d757b6acecf78c14b7b7293609/varlens-0.0.2.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "c47c0fd62195cd601ca73ee0021754c3", "sha256": "cbdc8786304817302358338829c05a852121d508d0ee9878a43ce59e5eaf60d2" }, "downloads": -1, "filename": "varlens-0.0.2.tar.gz", "has_sig": false, "md5_digest": "c47c0fd62195cd601ca73ee0021754c3", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 38923, "upload_time": "2016-04-14T21:39:17", "url": "https://files.pythonhosted.org/packages/3f/94/7df142fa46404c0365ebbcce78ba9cfb78d757b6acecf78c14b7b7293609/varlens-0.0.2.tar.gz" } ] }