{ "info": { "author": "Mirny Lab", "author_email": "espresso@mit.edu", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Operating System :: OS Independent", "Programming Language :: Python", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6" ], "description": "# pairtools\n\n[![Documentation Status](https://readthedocs.org/projects/pairtools/badge/?version=latest)](http://pairtools.readthedocs.org/en/latest/)\n[![Build Status](https://travis-ci.org/mirnylab/pairtools.svg?branch=master)](https://travis-ci.org/mirnylab/pairtools)\n[![Join the chat at https://gitter.im/mirnylab/distiller](https://badges.gitter.im/mirnylab/distiller.svg)](https://gitter.im/mirnylab/distiller?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)\n[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.1490831.svg)](https://doi.org/10.5281/zenodo.1490831)\n\n## Process Hi-C pairs with pairtools\n\n`pairtools` is a simple and fast command-line framework to process sequencing\ndata from a Hi-C experiment.\n\n`pairtools` process pair-end sequence alignments and perform the following\noperations:\n\n- detect ligation junctions (a.k.a. 
Hi-C pairs) in aligned paired-end sequences of Hi-C DNA molecules\n- sort .pairs files for downstream analyses\n- detect, tag and remove PCR/optical duplicates \n- generate extensive statistics of Hi-C datasets\n- select Hi-C pairs given flexibly defined criteria\n- restore .sam alignments from Hi-C pairs\n\nTo get started:\n- Take a look at a [quick example](https://github.com/mirnylab/pairtools#quick-example)\n- Check out the detailed [documentation](http://pairtools.readthedocs.io).\n\n## Data formats\n\n`pairtools` produce and operate on tab-separated files compliant with the\n[.pairs](https://github.com/4dn-dcic/pairix/blob/master/pairs_format_specification.md) \nformat defined by the [4D Nucleome Consortium](https://www.4dnucleome.org/). All\npairtools properly manage file headers and keep track of the data\nprocessing history.\n\nAdditionally, `pairtools` define the .pairsam format, an extension of .pairs that includes the SAM alignments \nof a sequenced Hi-C molecule. .pairsam complies with the .pairs format, and can be processed by any tool that\noperates on .pairs files.\n\n## Installation\n\nRequirements:\n\n- Python 3.x\n- Python packages `cython`, `numpy` and `click`.\n- Command-line utilities `sort` (the Unix version), `bgzip` (shipped with `tabix`) and `samtools`. If available, `pairtools` can compress outputs with `pbgzip` and `lz4`.\n\nWe highly recommend using the `conda` package manager to install `pairtools` together with all its dependencies. 
To get it, you can either install the full [Anaconda](https://www.continuum.io/downloads) Python distribution or just the standalone [conda](http://conda.pydata.org/miniconda.html) package manager.\n\nWith `conda`, you can install `pairtools` and all of its dependencies from the [bioconda](https://bioconda.github.io/index.html) channel.\n```sh\n$ conda install -c conda-forge -c bioconda pairtools\n```\n\nAlternatively, install `pairtools` and only its Python dependencies from PyPI using pip:\n```sh\n$ pip install pairtools\n```\n\n## Quick example\n\nSet up a new test folder and download a small Hi-C dataset mapped to the sacCer3 genome:\n```bash\n$ mkdir /tmp/test-pairtools\n$ cd /tmp/test-pairtools\n$ wget https://github.com/mirnylab/distiller-test-data/raw/master/bam/MATalpha_R1.bam\n```\n\nAdditionally, we will need a .chromsizes file, a TAB-separated plain text table describing the names, sizes and the order of chromosomes in the genome assembly used during mapping:\n```bash\n$ wget https://raw.githubusercontent.com/mirnylab/distiller-test-data/master/genome/sacCer3.reduced.chrom.sizes\n```\n\nWith `pairtools parse`, we can convert paired-end sequence alignments stored in .sam/.bam format into .pairs, a TAB-separated table of Hi-C ligation junctions:\n\n```bash\n$ pairtools parse -c sacCer3.reduced.chrom.sizes -o MATalpha_R1.pairs.gz --drop-sam MATalpha_R1.bam \n```\n\nInspect the resulting table:\n\n```bash\n$ less MATalpha_R1.pairs.gz\n```\n\n## Pipelines\n\n- We provide a simple working example of a mapping bash pipeline in /examples/.\n- [distiller](https://github.com/mirnylab/distiller-nf) is a powerful\nHi-C data analysis workflow, based on `pairtools` and [nextflow](https://www.nextflow.io/).\n\n\n## Tools\n\n- `parse`: read .sam files produced by bwa and form Hi-C pairs\n - form Hi-C pairs by reporting the outer-most mapped positions and the strand\n on either side of each molecule;\n - report unmapped/multimapped (ambiguous alignments)/chimeric alignments 
as\n chromosome \"!\", position 0, strand \"-\";\n - identify and rescue chimeric alignments produced by singly-ligated Hi-C \n molecules with a sequenced ligation junction on one of the sides;\n - perform upper-triangular flipping of the sides of Hi-C molecules \n such that the first side has a lower sorting index than the second side;\n - form hybrid pairsam output, where each line contains all available data \n for one Hi-C molecule (outer-most mapped positions on either side, \n read ID, pair type, and .sam entries for each alignment);\n - print the .sam header as #-comment lines at the start of the file.\n\n- `sort`: sort pairs files (the lexicographic order for chromosomes, \n the numeric order for the positions, the lexicographic order for pair types).\n\n- `merge`: merge sorted .pairs files\n - merge sort .pairs;\n - combine the .pairs headers from all input files;\n - check that each .pairs file was mapped to the same reference genome index \n (by checking the identity of the @SQ sam header lines).\n\n- `select`: select pairs according to specified criteria\n - select pairs entries according to the provided condition. 
A programmable\n interface allows for arbitrarily complex queries on specific pair types, \n chromosomes, positions, strands, read IDs (including matches to a\n wildcard/regexp/list).\n - optionally print the non-matching entries into a separate file.\n\n- `dedup`: remove PCR duplicates from a sorted triu-flipped .pairs file\n - remove PCR duplicates by finding pairs of entries with both sides mapped\n to similar genomic locations (+/- N bp);\n - optionally output the PCR duplicate entries into a separate file.\n - NOTE: in order to remove all PCR duplicates, the input must contain \\*all\\* \n mapped read pairs from a single experimental replicate;\n\n- `maskasdup`: mark all pairs in a pairsam as Hi-C duplicates\n - change the field pair_type to DD;\n - change the pair_type tag (Yt:Z:) for all sam alignments;\n - set the PCR duplicate binary flag for all sam alignments (0x400).\n\n- `split`: split a .pairsam file into .pairs and .sam.\n\n- `stats`: calculate various statistics of .pairs files\n\n- `restrict`: identify the span of the restriction fragment forming a Hi-C junction\n\n## Contributing\n\n[Pull requests](https://akrabat.com/the-beginners-guide-to-contributing-to-a-github-project/) are welcome.\n\nFor development, clone and install in \"editable\" (i.e. development) mode with the `-e` option. 
This way you can also pull changes on the fly.\n```sh\n$ git clone https://github.com/mirnylab/pairtools.git\n$ cd pairtools\n$ pip install -e .\n```\n\n## License\n\nMIT", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/mirnylab/pairtools", "keywords": "genomics,bioinformatics,Hi-C,contact", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "pairtools", "package_url": "https://pypi.org/project/pairtools/", "platform": "", "project_url": "https://pypi.org/project/pairtools/", "project_urls": { "Homepage": "https://github.com/mirnylab/pairtools" }, "release_url": "https://pypi.org/project/pairtools/0.3.0/", "requires_dist": null, "requires_python": "", "summary": "CLI tools to process mapped Hi-C data", "version": "0.3.0" }, "last_serial": 5525935, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "md5": "9288c827e1ceb1370a3dd729fde95fda", "sha256": "cded88f07dbc7b829bfb2e1979755f81b3777b85371363d70ba185df7c609f6b" }, "downloads": -1, "filename": "pairtools-0.1.0.tar.gz", "has_sig": false, "md5_digest": "9288c827e1ceb1370a3dd729fde95fda", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 492930, "upload_time": "2018-07-19T20:23:46", "url": "https://files.pythonhosted.org/packages/c2/96/6dad0eef3ca5bd6398d4c6721f71ef49411d88eabd738e65c4951d6bf6c8/pairtools-0.1.0.tar.gz" } ], "0.1.1": [ { "comment_text": "", "digests": { "md5": "77beb989bfcd10edb6df3fb5a9152093", "sha256": "172c1bc9a043f4ecaf2b464bddd7f17dcc3e0fe417936909062485ab107c23a2" }, "downloads": -1, "filename": "pairtools-0.1.1.tar.gz", "has_sig": false, "md5_digest": "77beb989bfcd10edb6df3fb5a9152093", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 495374, "upload_time": "2018-07-19T21:24:06", "url": 
"https://files.pythonhosted.org/packages/ea/ab/7319c089cd8c4099fd1a25b18059bfc220db6a58d60ce428d2ccc6243e24/pairtools-0.1.1.tar.gz" } ], "0.2.0": [ { "comment_text": "", "digests": { "md5": "d7de0669e35b7de802dfaa99512a55ff", "sha256": "23bc37ed886ff30b0c573de702914bbb8f8c657b65c5d93841a01dd572b7f0f8" }, "downloads": -1, "filename": "pairtools-0.2.0.tar.gz", "has_sig": false, "md5_digest": "d7de0669e35b7de802dfaa99512a55ff", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 578545, "upload_time": "2018-08-03T21:53:44", "url": "https://files.pythonhosted.org/packages/3e/00/9bba3ae1dcc2b8b3d7b3187444f036b708bc76c748c715ec57d594531fc2/pairtools-0.2.0.tar.gz" } ], "0.2.1": [ { "comment_text": "", "digests": { "md5": "937cc065f45489ae976be8ace618ecb0", "sha256": "43a1685776d7f6d2ca38950bbab33c081cb0ca3e48f3e46d030b8a1de906ecf7" }, "downloads": -1, "filename": "pairtools-0.2.1.tar.gz", "has_sig": false, "md5_digest": "937cc065f45489ae976be8ace618ecb0", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 578578, "upload_time": "2018-12-21T06:43:30", "url": "https://files.pythonhosted.org/packages/b5/6c/e1c3e03702bf369b2eab6d57ed0bba66fe668c0fbf2c2e3e5b86cc6e407e/pairtools-0.2.1.tar.gz" } ], "0.2.2": [ { "comment_text": "", "digests": { "md5": "30afd2189c5b066864cfe021d15cf2ba", "sha256": "2acf68a980856fe7deb7d0973aebe5c8204e6b6f120ddb05e353866402f98fb1" }, "downloads": -1, "filename": "pairtools-0.2.2.tar.gz", "has_sig": false, "md5_digest": "30afd2189c5b066864cfe021d15cf2ba", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 574683, "upload_time": "2019-01-07T18:42:21", "url": "https://files.pythonhosted.org/packages/98/78/e12042ec11598cf7aca8d5b93df5c5366c96c3ab4a6fe0729d201de7b0b8/pairtools-0.2.2.tar.gz" } ], "0.3.0": [ { "comment_text": "", "digests": { "md5": "acff3d5d2302bfd0d616867e3bee71d1", "sha256": 
"4931da4b5fcb327d13c9564e6fdda1acf2335e16213a55d2620fe340ae918b53" }, "downloads": -1, "filename": "pairtools-0.3.0.tar.gz", "has_sig": false, "md5_digest": "acff3d5d2302bfd0d616867e3bee71d1", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 572045, "upload_time": "2019-07-13T03:33:01", "url": "https://files.pythonhosted.org/packages/35/e5/8efba77e40d770c936884bc5def89c8d211c27932246de5bb7ea005ac33f/pairtools-0.3.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "acff3d5d2302bfd0d616867e3bee71d1", "sha256": "4931da4b5fcb327d13c9564e6fdda1acf2335e16213a55d2620fe340ae918b53" }, "downloads": -1, "filename": "pairtools-0.3.0.tar.gz", "has_sig": false, "md5_digest": "acff3d5d2302bfd0d616867e3bee71d1", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 572045, "upload_time": "2019-07-13T03:33:01", "url": "https://files.pythonhosted.org/packages/35/e5/8efba77e40d770c936884bc5def89c8d211c27932246de5bb7ea005ac33f/pairtools-0.3.0.tar.gz" } ] }