{
    "info": {
        "author": "Peter Hickey",
        "author_email": "peter.hickey@gmail.com",
        "bugtrack_url": null,
        "classifiers": [
            "License :: OSI Approved :: MIT License",
            "Programming Language :: Python :: 2.7",
            "Programming Language :: Python :: 3.2",
            "Programming Language :: Python :: 3.4",
            "Programming Language :: Python :: 3.5",
            "Topic :: Scientific/Engineering :: Bio-Informatics"
        ],
        "description": "|Build Status| |Coverage Status|\n\nmethtuple\n=========\n\nOverview\n--------\n\nWhat does it do?\n~~~~~~~~~~~~~~~~\n\n``methtuple`` allows the user to investigate the co-occurrence of\nmethylation marks at the level of individual DNA fragments. It does this\nby performing methylation calling at *m-tuples* of methylation loci from\nhigh-throughput bisulfite sequencing data, such as *methylC-seq*. In\nshort, ``methtuple`` extracts and tabulates the methylation states of\nall m-tuples from a ``BAM`` file (for a user-defined value of *m*).\n\nWhy would I want to do that?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nA typical read from a bisulfite-sequencing experiment reports a binary\nmethylated or unmethylated measurement at multiple loci. Each read\noriginates from a single cell. Because methylation calls are made from\nindividual reads/read-pairs, we can investigate the co-occurrence of\nmethylation events at the level of individual DNA fragments.\n\nI have been using ``methtuple`` to investigate the spatial dependence of\nDNA methylation at the level of individual DNA fragments by studying\nmethylation patterns of CpG 2-tuples. ``methtuple`` can also be used as\na drop-in replacement for ``bismark_methylation_extractor``, while\nproviding enhanced filtering options and a slightly faster runtime\n(10-20% faster, albeit with increased memory usage).\n\nWhat is an m-tuple?\n~~~~~~~~~~~~~~~~~~~\n\nThe simplest *m-tuple* is the 1-tuple (*m* = 1). ``methtuple`` tabulates\nthe number of reads that are methylated (*M*) and the number that are\nunmethylated (*U*) at each methylation 1-tuple in the genome. 1-tuples\nare the type of methylation calling performed by most methylation calling\nsoftware, such as Bismark's ``bismark_methylation_extractor``.\n\nA 2-tuple (*m* = 2) is a pair of methylation loci. 
``methtuple``\ntabulates the number of reads that are methylated at both loci in the\npair (*MM*), unmethylated at both loci (*UU*), or methylated at one locus\nbut not the other (*MU* or *UM*). This idea readily extends to 3-tuples,\n4-tuples, etc.\n\nIn its default settings, and with *m* > 1, ``methtuple`` tries to create\nonly m-tuples made of \"neighbouring\" loci. However, please see the\nexample below for why I say this only \"tries\" to create m-tuples of\nneighbouring loci. For a DNA fragment containing *k* methylation loci\nthere are *k - m + 1* m-tuples made of neighbouring loci.\n\nAlternatively, we can create all combinations of m-tuples by using the\n``--all-combinations`` flag. For a DNA fragment containing *k*\nmethylation loci there are \"*k* choose *m*\" m-tuples when using\n``--all-combinations``, a number that grows rapidly in *k*, particularly\nwhen *m* is close to *k/2*.\n\nRegardless of how m-tuples are constructed, ``methtuple`` always takes\ncare to only count each methylation locus once when it has been\ntwice-sequenced by overlapping paired-end reads.\n\nDraw me a picture\n~~~~~~~~~~~~~~~~~\n\nWell, I hope ASCII art will do.\n\nSuppose we sequence a region of the genome containing five methylation\nloci with three paired-end reads (``A``, ``B`` and ``C``):\n\n::\n\n    ref: 1    2   3 4 5\n    A_1: |----->\n    A_2:         <------|\n    B_1: |----->\n    B_2:           <----|\n    C_1:    |----->\n    C_2:      <------|\n\nIf we are interested in 1-tuples, then we would obtain the following\nfrom each read-pair by running ``methtuple``:\n\n::\n\n    A: {1}, {2}, {3}, {4}, {5}\n    B: {1}, {2}, {4}, {5}\n    C: {2}, {3}, {4}\n\nThis result holds regardless of whether the ``--all-combinations``\nflag is set.\n\nIf we are interested in 3-tuples, then we would obtain the following\nfrom each read-pair by running ``methtuple`` in its default mode:\n\n::\n\n    A: {1, 2, 3}, {2, 3, 4}, {3, 4, 5}\n    B: {1, 2, 4}, {2, 4, 5}\n    C: {2, 3, 4}\n\nThings to 
note:\n\n-  Read-pair ``A`` sequences all three (= 5 - 3 + 1) \"neighbouring\"\n   3-tuples\n-  Read-pair ``B`` sequences none of the \"neighbouring\" 3-tuples but\n   does \"erroneously\" construct two non-neighbouring 3-tuples. This\n   happens because m-tuples are created independently from each\n   read-pair; effectively, read-pair ``B`` is \"unaware\" of methylation\n   locus ``3``. Depending on the downstream analysis, you may want to\n   *post-hoc* filter out these \"non-neighbouring\" m-tuples.\n-  The twice-sequenced methylation loci, ``2`` and ``3``, in read-pair\n   ``C`` are not double counted.\n\nHowever, if we were to run ``methtuple`` with ``--all-combinations``\nthen we would obtain:\n\n::\n\n    A: {1, 2, 3}, {2, 3, 4}, {3, 4, 5}, {1, 2, 4}, {1, 2, 5}, {1, 3, 4}, {1, 3, 5}, {1, 4, 5}, {2, 3, 5}, {2, 4, 5}\n    B: {1, 2, 4}, {2, 4, 5}, {1, 2, 5}, {1, 4, 5}\n    C: {2, 3, 4}\n\nInstallation and dependencies\n-----------------------------\n\n``methtuple`` is written in Python and relies upon the ``pysam`` module.\n**NOTE: ``methtuple`` now requires ``pysam v0.8.4`` or greater.**\n\nRunning ``python setup.py install`` will attempt to install ``pysam`` if\nit isn't found on your system. Alternatively, instructions for\ninstalling ``pysam`` are available from\nhttps://github.com/pysam-developers/pysam.\n\nI have extensively used and tested ``methtuple`` with Python 2.7. 
It\nshould also work on Python 3.2, 3.3, 3.4, and 3.5 with the current\nversion of ``pysam`` (``v0.8.4``), as indicated by the `Travis-CI\nbuilds <https://travis-ci.org/PeteHaitch/methtuple>`__.\n\nUsing ``pip``\n~~~~~~~~~~~~~\n\nThe simplest way:\n\n::\n\n    pip install methtuple\n\nAlternatively, after cloning or downloading the ``methtuple`` git\nrepository, running:\n\n::\n\n    python setup.py install\n\nin the root ``methtuple`` directory should work on most systems.\n\nUsage\n-----\n\nBasic usage\n~~~~~~~~~~~\n\n``methtuple`` processes a single ``BAM`` file and works for both\nsingle-end and paired-end sequencing data. Example ``BAM`` files from\nsingle-end directional and paired-end directional bisulfite-sequencing\nexperiments are available in the ``data/`` directory.\n\nMethylation measurements may be filtered by base quality or by other\ncriteria, such as the mapping quality of the read or whether the read is\nmarked as a PCR duplicate. For a full list of filtering options, please\nrun ``methtuple --help`` or see the **Advanced usage** section below.\n\nCurrently, the BAM file must have been created with\n`Bismark <http://www.bioinformatics.bbsrc.ac.uk/projects/download.html#bismark>`__.\nIf the data were aligned with Bismark version < 0.8.3, please use the\n``--aligner Bismark_old`` flag. Please file an issue if you would like\nto use a ``BAM`` file created with another aligner and I will do my best\nto support it.\n\nThe main options to pass to ``methtuple`` are the size of the m-tuple\n(``-m``); the type of methylation, which is some combination of *CG*,\n*CHG*, *CHH* and *CNN* (``--methylation-type``); any filters to be\napplied to reads or positions within reads (see below); the BAM file;\nand the sample name, which will be used as a prefix for all output\nfiles. 
Multiple methylation types may be specified jointly, e.g.,\n``--methylation-type CG --methylation-type CHG``.\n\nOutput\n~~~~~~\n\nThree output files are created and summary information is written to\n``STDOUT``. The main output file is a tab-delimited file of all\nm-tuples, ``<in>.<--methylation-type>.<-m>[ac].tsv``, where ``<in>`` is\nthe prefix of the ``<in.bam>`` BAM file and ``ac`` is added if the\n``--all-combinations`` flag was used, e.g., ``SRR949207.CG.2ac.tsv``.\nOutput files may be gzipped (``--gzip``) or bzipped (``--bzip2``).\n\nHere are the first 5 rows (including the header row) from\n``data/se_directional.fq.gz_bismark_bt2.CG.2.tsv``, which is created by\nrunning the single-end directional example shown below:\n\n::\n\n    chr     strand  pos1    pos2    MM      MU      UM      UU\n    chr1    +       6387768 6387783 1       0       0       0\n    chr1    +       7104116 7104139 1       0       0       0\n    chr1    +       7104139 7104152 1       0       0       0\n    chr1    +       9256170 9256179 0       0       0       1\n\nSo, for example, at the CpG 2-tuple chr1:+:(6,387,768, 6,387,783) we\nobserved 1 read that was methylated at both chr1:+:6,387,768 and\nchr1:+:6,387,783.\n\nThe ``strand`` is recorded as ``+`` (forward strand, \"OT\" in Bismark),\n``-`` (reverse strand, \"OB\" in Bismark) or ``*``, meaning not applicable\n(if the ``--strand-collapse`` option is set). 
The position of all\nmethylation loci is always with respect to the forward strand.\n\nThe second file (``<in>.<--methylation-type>_per_read.hist``) is a text\nhistogram of the number of methylation loci per read/read-pair (of the\ntype specified by ``--methylation-type``) that passed the filters\nspecified when running ``methtuple``.\n\nHere is the file\n``data/se_directional.fq.gz_bismark_bt2.CG_per_read.hist``, which is\ncreated by running the single-end directional example shown below:\n\n::\n\n    n       count\n    0       4561\n    1       2347\n    2       789\n    3       296\n    4       144\n    5       61\n    6       29\n    7       19\n    8       3\n    9       4\n    10      2\n    11      1\n    12      3\n    13      4\n    14      1\n    18      2\n\nSo, 4,561 reads aligned to a position containing no CpGs while 2 reads\naligned to a position containing 18 CpGs.\n\nAn optional third and final file (``<in>.reads_that_failed_QC.txt``)\nrecords the querynames (``QNAME``) of all reads that failed to pass\nquality control filters and which filter each read failed. This file may\nbe omitted by using the ``--no-failed-filter-file`` flag.\n\nIn this example no quality control filters were set, so this file is\nempty.\n\nExamples\n~~~~~~~~\n\nTwo small example datasets are included in the ``data/`` directory.\nIncluded are the ``FASTQ`` files and the ``BAM`` files generated with\n**Bismark** in **Bowtie2** mode. 
More details of the example datasets\ncan be found in ``data/README.md``.\n\nAlthough the example datasets are both from directional\nbisulfite-sequencing protocols, ``methtuple`` also works with data from\nnon-directional bisulfite-sequencing protocols.\n\nSingle-end reads\n^^^^^^^^^^^^^^^^\n\nThe following command will extract all CpG 2-tuples from the file\n``data/se_directional.fq.gz_bismark_bt2.bam``:\n\n::\n\n    methtuple -m 2 --methylation-type CG data/se_directional.fq.gz_bismark_bt2.bam\n\nThis results in 3 files:\n\n-  ``data/se_directional.fq.gz_bismark_bt2.CG.2.tsv``\n-  ``data/se_directional.fq.gz_bismark_bt2.CG_per_read.hist``\n-  ``data/se_directional.fq.gz_bismark_bt2.reads_that_failed_QC.txt``\n\nPaired-end reads\n^^^^^^^^^^^^^^^^\n\nPaired-end data must first be sorted by queryname before running\n``methtuple``. ``BAM`` files created by Bismark, such as\n``data/pe_directional_1.fq.gz_bismark_bt2_pe.bam``, are already sorted by\nqueryname. So, to extract all CG/CHH 3-tuples we would simply run:\n\n::\n\n    methtuple -m 3 --methylation-type CG --methylation-type CHH data/pe_directional_1.fq.gz_bismark_bt2_pe.bam\n\nThis results in 3 files:\n\n-  ``data/pe_directional_1.fq.gz_bismark_bt2_pe.CG_CHH.3.tsv``\n-  ``data/pe_directional_1.fq.gz_bismark_bt2_pe.CG_CHH_per_read.hist``\n-  ``data/pe_directional_1.fq.gz_bismark_bt2_pe.reads_that_failed_QC.txt``\n\nNote on sort-order of paired-end BAM files\n''''''''''''''''''''''''''''''''''''''''''\n\nIf your paired-end BAM file is sorted by genomic coordinates, then you\nmust first sort the ``BAM`` by queryname and then run ``methtuple`` on\nthe queryname-sorted ``BAM``. 
This can be done by using\n``samtools sort`` with the ``-n`` option or Picard's ``SortSam``\nfunction with the ``SO=queryname`` option:\n\n::\n\n    # Create a coordinate-sorted BAM for the sake of argument\n    samtools sort data/pe_directional_1.fq.gz_bismark_bt2_pe.bam data/cs_pe_directional_1.fq.gz_bismark_bt2_pe\n    # Re-sort the coordinate-sorted BAM by queryname\n    samtools sort -n data/cs_pe_directional_1.fq.gz_bismark_bt2_pe.bam data/qs_pe_directional_1.fq.gz_bismark_bt2_pe\n    # Run methtuple on the queryname sorted BAM\n    methtuple -m 3 --methylation-type CG --methylation-type CHG data/qs_pe_directional_1.fq.gz_bismark_bt2_pe.bam\n\nMemory usage and running time\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nFor a rough indication of performance, here are the results for\nprocessing approximately 40,000,000 100bp paired-end reads from chr1 of\na 20-30x coverage whole-genome methylC-seq experiment of human data.\nThis analysis used a single AMD Opteron 6276 CPU (2.3GHz) on a shared\nmemory system.\n\n``-m 2``\n^^^^^^^^\n\nMemory usage peaked at 1.9GB and the running time was approximately 5\nhours.\n\n``-m 2 --all-combinations``\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nMemory usage peaked at 7GB and the running time was approximately 5.5\nhours.\n\nUse of the ``--all-combinations`` flag creates all possible m-tuples,\nincluding non-neighbouring ones. This produces many more m-tuples and so\nincreases the memory usage.\n\n``-m 5``\n^^^^^^^^\n\nMemory usage peaked at 1.5GB and the running time was approximately 4.3\nhours.\n\nHelper script\n~~~~~~~~~~~~~\n\nI frequently work with large, coordinate-sorted ``BAM`` files. To speed\nup the extraction of m-tuples, I use a simple parallelisation strategy\nwith `GNU parallel <http://www.gnu.org/software/parallel/>`__. The idea\nis to split the ``BAM`` file into chromosome-level ``BAM`` files,\nprocess each chromosome-level ``BAM`` separately and then recombine\nthese chromosome-level files into a genome-level file. 
The script\n``helper_scripts/run_methtuple.sh`` implements this strategy; simply\nedit the key variables in this script or adapt it to your own needs.\nPlease check the requirements listed in\n``helper_scripts/run_methtuple.sh``.\n\nWarnings\n^^^^^^^^\n\n-  **WARNING**: This simple strategy uses as many cores as there are\n   chromosomes. This can result in **very** large memory usage,\n   depending on the value of ``-m``, and may cause problems if you have\n   more chromosomes than available cores.\n-  **WARNING**: The script ``tabulate_hist.R`` must be in the same\n   directory as ``run_methtuple.sh``.\n\nAdvanced usage\n~~~~~~~~~~~~~~\n\nA full list of options is available by running ``methtuple --help``:\n\n::\n\n    usage: methtuple [options] <in.bam>\n    Please run 'methtuple -h' for a full list of options.\n\n    Extract methylation patterns at m-tuples of methylation loci from the aligned\n    reads of a bisulfite-sequencing experiment. Currently only supports BAM files\n    created with Bismark.\n\n    Input options:\n      --aligner {Bismark,Bismark_old}\n                            The aligner used to generate the BAM file. Bismark_old\n                            refers to Bismark version < 0.8.3 (default: Bismark)\n      --Phred64             Quality scores are encoded as Phred64 rather than\n                            Phred33 (default: False)\n\n    Output options:\n      -o <text>, --output-prefix <text>\n                            By default, all output files have the same prefix as\n                            that of the input file. This will override the prefix\n                            of output file names\n      --sc, --strand-collapse\n                            Collapse counts across across Watson and Crick\n                            strands. Only possible for CG methylation type. 
The\n                            strand is recorded as '*' if this option is selected.\n                            (default: False)\n      --nfff, --no-failed-filter-file\n                            Do not create the file listing the reads that failed\n                            to pass to pass the filters and which filter it failed\n                            (default: False)\n      --gzip                gzip all output files. --gzip and --bzip2 are mutually\n                            exclusive (default: False)\n      --bzip2               bzip2 all output files. --gzip and --bzip2 are\n                            mutually exclusive (default: False)\n\n    Construction of methylation loci m-tuples:\n      --mt {CG,CHG,CHH,CNN}, --methylation-type {CG,CHG,CHH,CNN}\n                            The methylation type. Multiple methylation types may\n                            be analysed jointly by repeated use of this argument,\n                            e.g., --methylation-type CG --methylation-type CHG\n                            (default: ['CG'])\n      -m <int>              The size of the m-tuples, i.e., the 'm' in m-tuples\n                            (default: 1)\n      --ac, --all-combinations\n                            Create all combinations of m-tuples, including non-\n                            neighbouring m-tuples. 
WARNING: This will greatly\n                            increase the runtime and memory usage, particularly\n                            for larger values of -m and when analysing non-CG\n                            methylation (default: False)\n\n    Filtering of reads:\n      Applied before filtering of bases\n\n      --id, --ignore-duplicates\n                            Ignore reads that have been flagged as PCR duplicates\n                            by, for example, Picard's MarkDuplicates function.\n                            More specifically, ignore reads with the 0x400 bit in\n                            the FLAG (default: False)\n      --mmq <int>, --min-mapq <int>\n                            Ignore reads with a mapping quality score (mapQ) less\n                            than <int> (default: 0)\n      --of {sequence_strict,sequence,XM_strict,XM,XM_ol,quality,Bismark}, --overlap-filter {sequence_strict,sequence,XM_strict,XM,XM_ol,quality,Bismark}\n                            The type of check to be performed (listed roughly from\n                            most-to-least stringent): Ignore the read-pair if the\n                            sequence in the overlap differs between mates\n                            (sequence_strict); Ignore the overlapping region if the\n                            sequence in the overlap differs between mates\n                            (sequence); Ignore the read-pair if the XM-tag in the\n                            overlap differs (XM_strict); Ignore the overlapping\n                            region if the XM-tag in the overlap differs between\n                            mates (XM); Ignore any positions in the overlapping\n                            region where the XM-tags differ between the mates\n                            (XM_ol); Use the mate with the higher average quality\n                            basecalls in the overlapping region (quality); Use the\n                            first mate of each 
read-pair, i.e., the method used by\n                            bismark_methylation_extractor with the --no_overlap\n                            flag (Bismark) (default: XM_ol)\n      --uip, --use-improper-pairs\n                            Use the improper read-pairs, i.e. don't filter them.\n                            More specifically, check the 0x2 FLAG bit of each\n                            read; the exact definition of an improper read-pair\n                            depends on the aligner and alignment parameters\n                            (default: False)\n\n    Filtering of bases:\n      Applied after filtering of reads\n\n      --ir1p VALUES, --ignore-read1-positions VALUES\n                            If single-end data, ignore these read positions from\n                            all reads. If paired-end data, ignore these read\n                            positions from just read_1 of each pair. Multiple\n                            values should be comma-delimited, ranges can be\n                            specified by use of the hyphen and all positions\n                            should use 1-based co-ordinates. For example,\n                            1-5,80,95-100 corresponds to ignoring read-positions\n                            1, 2, 3, 4, 5, 80, 98, 99, 100. (default: None)\n      --ir2p VALUES, --ignore-read2-positions VALUES\n                            Ignore these read positions from just read_2 of each\n                            pair if paired-end sequencing. Multiple values should\n                            be comma-delimited, ranges can be specified by use of\n                            the hyphen and all positions should use 1-based co-\n                            ordinates. For example, 1-5,80,95-100 corresponds to\n                            ignoring read-positions 1, 2, 3, 4, 5, 80, 98, 99,\n                            100. 
(default: None)\n      --mbq <int>, --min-base-qual <int>\n                            Ignore read positions with a base quality score less\n                            than <int> (default: 0)\n\n    Other:\n      -v, --version         show program's version number and exit\n      -h, --help            show this help message and exit\n\n    methtuple (v1.4.0) by Peter Hickey (peter.hickey@gmail.com,\n    https://github.com/PeteHaitch/methtuple/)\n\nLimitations and notes\n---------------------\n\nThese are current limitations and their statuses:\n\nOnly works with data aligned with the **Bismark** mapping software\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n``methtuple`` makes use of Bismark's custom SAM tags ``XM``, ``XR`` and\n``XG``. The ``XM`` tag is used to infer the methylation state of each\nsequenced cytosine, while the ``XR`` and ``XG`` tags are used to infer\nthe orientation and strand of the alignment. If the data were aligned\nwith Bismark version < 0.8.3, please use the ``--aligner Bismark_old`` flag.\n\nPlease file an issue if you would like to use a ``BAM`` file created\nwith another aligner and I will do my best to support it; also, see\n`Issue #30 <https://github.com/PeteHaitch/methtuple/issues/30>`__.\n\nPaired-end data must be sorted by queryname\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThis is required in order to avoid lookups when finding the mate of a\npaired-end read.\n\nThe ``BAM`` file created by Bismark is natively in queryname order and\nso this is not a problem. If the file is not in queryname order, then use\n``samtools sort`` with the ``-n`` option or Picard's ``SortSam``\nfunction with ``SO=queryname`` to sort your ``BAM`` by queryname. 
The\nhelper script ``helper_scripts/run_methtuple.sh`` works with a\ncoordinate-sorted ``BAM`` file and does so by including a step to sort\nthe chromosome-level ``BAM`` files by queryname using Picard's\n``SortSam``.\n\nThe ``--aligner Bismark_old`` option is a bit crude\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nSpecifically, it assumes that there are no '/' characters in the read\nnames (``QNAME``) and that the BAM has not been processed with any other\nprograms, e.g., Picard's MarkDuplicates, that might change the ``FLAG``\nfield. Please file an issue or submit a pull request if you would like\nthis improved.\n\nConstruction of \"non-neighbouring\" m-tuples\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nAs discussed in the above example, ``methtuple`` tries not to create\n\"non-neighbouring\" m-tuples; however, these do occur because m-tuples\nare created independently from each read/read-pair. I do not make use\nof non-neighbouring m-tuples in my downstream analyses and so I\n*post-hoc* filter these out.\n\nIf you would like the option to create all possible m-tuples, both\n\"neighbouring\" and \"non-neighbouring\", please let me know at\nhttps://github.com/PeteHaitch/methtuple/issues/85 as there is a simple\nsolution that just awaits motivation for me to implement it.\n\nChoice of ``--overlap-filter``\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThe two mates of a paired-end read, ``read_1`` and ``read_2``, often\noverlap in bisulfite-sequencing data. ``methtuple`` ensures that the\noverlapping sequence isn't double-counted and offers several different\nchoices for how overlapping paired-end reads are processed via the\n``--overlap-filter`` flag. Listed roughly from most-to-least stringent,\nthese are:\n\n1. ``sequence_strict``: Check that the entire overlapping sequence is\n   identical; if not identical then do not use any methylation calls\n   from the entire read-pair.\n2. 
``sequence``: Check that the entire overlapping sequence is\n   identical; if not identical then do not use any methylation calls\n   from the overlap.\n3. ``XM_strict``: Check that the XM-tag is identical for the overlapping\n   region; if not identical then do not use any methylation calls from\n   the entire read-pair.\n4. ``XM``: Check that the XM-tag is identical for the overlapping\n   region; if not identical then do not use any methylation calls from\n   the overlap.\n5. ``XM_ol``: Check that the XM-tag is identical for the overlapping\n   region; if not identical then exclude those positions of disagreement\n   and count the remaining positions in the overlap only once.\n6. ``quality``: No check of the overlapping bases; simply use the read\n   with the higher average quality basecalls in the overlapping region.\n7. ``Bismark``: No check of the overlapping bases; simply use the\n   overlapping bases from read\\_1, i.e., the method used by\n   ``bismark_methylation_extractor`` with the ``--no_overlap`` flag.\n\nOther notes\n~~~~~~~~~~~\n\n-  Bismark-Bowtie1 always sets the mapping quality (``mapQ``) to the\n   value 255, which means \"unavailable\" in the SAM format specification.\n   Thus, the ``--min-mapq`` option has no effect on\n   Bismark-Bowtie1 data.\n-  ``methtuple`` skips paired-end reads where either mate is unmapped.\n\nAcknowledgements\n----------------\n\nA big thank you to `Felix\nKrueger <http://www.bioinformatics.babraham.ac.uk/people.html>`__ (the\nauthor of Bismark) for his help in understanding mapping of\nbisulfite-sequencing data and for answering my many questions along the\nway.\n\nThanks also to Tobias Sargeant (`@folded <https://github.com/folded>`__)\nfor his help in turning the original ``methtuple.py`` script into the\ncurrent Python module ``methtuple`` and for help in setting up a testing\nframework.\n\nQuestions and comments\n----------------------\n\nPlease use the `GitHub Issue\nTracker 
<https://github.com/PeteHaitch/methtuple/issues>`__ to file bug reports or\nrequest new functionality. I welcome questions and comments; you can\nemail me at peter.hickey@gmail.com.\n\n.. |Build Status| image:: https://travis-ci.org/PeteHaitch/methtuple.png?branch=master\n   :target: https://travis-ci.org/PeteHaitch/methtuple\n.. |Coverage Status| image:: https://coveralls.io/repos/PeteHaitch/methtuple/badge.svg?branch=master\n   :target: https://coveralls.io/r/PeteHaitch/methtuple?branch=master",
        "description_content_type": null,
        "docs_url": null,
        "download_url": "UNKNOWN",
        "downloads": {
            "last_day": -1,
            "last_month": -1,
            "last_week": -1
        },
        "home_page": "https://github.com/PeteHaitch/methtuple",
        "keywords": "bisulfite sequencing methylation bismark bioinformatics",
        "license": "MIT",
        "maintainer": null,
        "maintainer_email": null,
        "name": "methtuple",
        "package_url": "https://pypi.org/project/methtuple/",
        "platform": "UNKNOWN",
        "project_url": "https://pypi.org/project/methtuple/",
        "project_urls": {
            "Download": "UNKNOWN",
            "Homepage": "https://github.com/PeteHaitch/methtuple"
        },
        "release_url": "https://pypi.org/project/methtuple/1.5.4/",
        "requires_dist": null,
        "requires_python": null,
        "summary": "methtuple",
        "version": "1.5.4"
    },
    "last_serial": 1818524,
    "releases": {
        "1.5.3": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "1725e3040f174f7cb83bdacabe9fd9af",
                    "sha256": "93e9438ebe9a39c781c2e59c84cd84bb327848654b97d689e4aa2d2e1bc768c5"
                },
                "downloads": -1,
                "filename": "methtuple-1.5.3-py2.py3-none-any.whl",
                "has_sig": false,
                "md5_digest": "1725e3040f174f7cb83bdacabe9fd9af",
                "packagetype": "bdist_wheel",
                "python_version": "py2.py3",
                "requires_python": null,
                "size": 62619,
                "upload_time": "2015-08-18T02:47:47",
                "url": "https://files.pythonhosted.org/packages/ff/3e/3e8185568af54258eeb5c6622c1dac11a2367e41564938ab89c9fc09deeb/methtuple-1.5.3-py2.py3-none-any.whl"
            },
            {
                "comment_text": "",
                "digests": {
                    "md5": "271b799d76502da36478168b31ca7931",
                    "sha256": "fa7470aeca6eaf273160eba21c21fd4e96d58c57088b8b67ccce9638dfaa824e"
                },
                "downloads": -1,
                "filename": "methtuple-1.5.3.tar.gz",
                "has_sig": false,
                "md5_digest": "271b799d76502da36478168b31ca7931",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 52043,
                "upload_time": "2015-08-18T02:47:52",
                "url": "https://files.pythonhosted.org/packages/1b/f7/2f2b5dc49805304ad5c64057fb3bc4aa37bf009067ea3b486ad339551ef1/methtuple-1.5.3.tar.gz"
            }
        ],
        "1.5.4": []
    },
    "urls": []
}