{ "info": { "author": "University of Michigan Bioinformatics Core", "author_email": "bfx-connor@umich.edu", "bugtrack_url": null, "classifiers": [ "Development Status :: 5 - Production/Stable", "Environment :: Console", "Intended Audience :: Science/Research", "License :: OSI Approved :: Apache Software License", "Operating System :: MacOS", "Operating System :: Unix", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3", "Topic :: Scientific/Engineering :: Bio-Informatics" ], "description": "======\nConnor\n======\n\nA command-line tool to deduplicate bam files based on custom, inline barcoding.\n\n.. image:: https://travis-ci.org/umich-brcf-bioinf/Connor.svg?branch=develop\n :target: https://travis-ci.org/umich-brcf-bioinf/Connor\n :alt: Build Status\n\n.. image:: https://codeclimate.com/repos/5793a84516ba097bda009574/badges/28ae96f1f3179a08413e/coverage.svg\n :target: https://codeclimate.com/repos/5793a84516ba097bda009574/coverage\n :alt: Test Coverage\n\n.. image:: https://codeclimate.com/repos/5793a84516ba097bda009574/badges/28ae96f1f3179a08413e/gpa.svg\n :target: https://codeclimate.com/repos/5793a84516ba097bda009574/feed\n :alt: Code Climate\n\n.. image:: https://img.shields.io/badge/license-Apache-blue.svg\n :target: https://pypi.python.org/pypi/connor/\n :alt: License\n\n.. image:: http://img.shields.io/pypi/v/connor.svg\n :target: https://pypi.python.org/pypi/connor/\n :alt: Latest PyPI version\n\nThe official repository is at:\nhttps://github.com/umich-brcf-bioinf/Connor\n\n--------\nOverview\n--------\n\nWhen analyzing deep-sequence NGS data it is sometimes difficult to distinguish\nsequencing and PCR errors from rare variants; as a result some variants may\nbe missed and some will be identified with an inaccurate variant frequency. To\naddress this, researchers can attach random barcode sequences during sample\npreparation. Upon sequencing, the barcodes act as a signature to trace the set\nof PCR amplified molecules back to the original biological molecules of\ninterest thereby differentiating rare variants in the original molecule from\nerrors introduced downstream.\n\nConnor accepts a barcoded, paired alignment file (BAM), groups those input\nalignments into families, combines each family into a consensus alignment, and\nemits the set of deduplicated, consensus alignments (BAM).\n\n *Connor workflow:*\n\n *Sequencing [FASTQ 1/2] -> Aligner [BAM] -> Connor [BAM] -> Variant Detection [VCF]*\n\nConnor first groups original alignments into **alignment families** based on their\nalignment position and **Universal Molecular Tag (UMT)** barcode. (Connor assumes\nthe incoming aligned sequences begin with the UMT barcode.) Each family of\nalignments is then combined into a single **consensus alignment**; discrepancies\nin base-calls and qualities are resolved by majority vote across family members.\nBy default, smaller families (<3 align pairs) are excluded.\n\nFor more information see:\n\n* `QUICKSTART`_ : get started deduplicating barcoded BAMs.\n\n* `INSTALL`_ : alternative ways to install.\n\n* `METHODS`_ : details on UMT barcode structure, suggestions on\n alignment parameters, details on family grouping, and examples.\n\n\n-----------\nConnor help\n-----------\n\n::\n\n $ connor --help\n usage: connor input_bam output_bam\n\n positional arguments:\n input_bam path to input BAM\n output_bam path to deduplicated output BAM\n\n optional arguments:\n -h, --help show this help message and exit\n -V, --version show program's version number and exit\n -v, --verbose print all log messages to console\n --log_file LOG_FILE ={output_filename}.log. Path to verbose log file\n --annotated_output_bam ANNOTATED_OUTPUT_BAM\n path to output BAM containing all original aligns annotated with BAM tags\n -f CONSENSUS_FREQ_THRESHOLD, --consensus_freq_threshold CONSENSUS_FREQ_THRESHOLD\n =0.6 (0..1.0): Ambiguous base calls at a specific position in a family are\n transformed to either majority base call, or N if the majority percentage\n is below this threshold. (Higher threshold results in more Ns in\n consensus.)\n -s MIN_FAMILY_SIZE_THRESHOLD, --min_family_size_threshold MIN_FAMILY_SIZE_THRESHOLD\n =3 (>=0): families with count of original reads < threshold are excluded\n from the deduplicated output. (Higher threshold is more\n stringent.)\n -d UMT_DISTANCE_THRESHOLD, --umt_distance_threshold UMT_DISTANCE_THRESHOLD\n =1 (>=0); UMTs equal to or closer than this Hamming distance will be\n combined into a single family. Lower threshold make more families with more\n consistent UMTs; 0 implies UMT must match\n exactly.\n --filter_order {count,name}\n =count; determines how filters will be ordered in the log\n results\n --umt_length UMT_LENGTH\n =6 (>=1); length of UMT\n\n====\n\nEmail bfx-connor@umich.edu for support and questions.\n\nUM BRCF Bioinformatics Core\n\n.. _INSTALL: doc/INSTALL.rst\n.. _METHODS: doc/METHODS.rst\n.. _QUICKSTART : doc/QUICKSTART.rst\n\n\nChangelog\n=========\n\n0.6.1 (8/16/2018)\n-------------------\n- Adjust logging to not crash if username/hostname not available\n\n0.6 (4/11/2018)\n-------------------\n- Extended to support pysam v0.13, v0.14\n- Added optional command line arg to specify length of unique molecular tag (UMT)\n- Added optional command line arg to sort filters results by name instead of count\n- Added validation to check for properly paired alignments\n- Added validation to check for presence of secondary alignments\n- Adjusted so warning instead of error when no families found\n- Substantial refactors to clarify implementation\n\n0.5.1 (9/8/2017)\n----------------\n- Extended supported python and pysam versions\n- Adjusted to avoid performance problem when processing extremely deep pileups\n- Adjusted so that when no families pass filters show warning instead of\n error message (thanks to ccario83 for upvoting this fix)\n\n0.5 (9/13/2016)\n---------------\n- Filters now exclude supplemental alignments\n- Added BAM tags to show pair positions and CIGAR values\n- Reduced required memory and improved performance\n\n0.4 (8/26/2016)\n---------------\n- Added input/command validations\n- Added annotated bam option\n- Revised QUICKSTART, METHODS\n- Added PG line in BAM header\n- Improved logging of filtered aligns and progress\n- Removed some logged stats to focus logging results\n- Removed dependency on pandas/numpy\n- Moderate performance (speed) improvements in calculating consensus sequence\n- Switched consensus quality to be the max mapping quality\n\n0.3 (8/8/2016)\n---------------\n- Added filters to exclude low quality, unmapped, or unpaired alignments\n- Revised BAM tags; documented BAM tags in BAM header\n- Extended logging to write to file and console\n- Adjusted to make deterministic in Py3/Py2\n\n0.2 (7/15/2016)\n---------------\n- Bugfix: connor was mangling left hand side of right hand consensus reads\n- Fuzzy grouping of pairs into families based on left or right UMI match\n- Fuzzy grouping of pairs into families based on UMI within Hamming distance\n- Command line args for hamming distince, consensus threshold, min orig reads\n- Extended logging to assist in overall diagnostics\n- Generate additional file of alignments excluded from consensus (diagnostic)\n- Added UMI sequence tag (X0)\n\n0.1 (6/17/2016)\n---------------\n- Initial development release\n- Partitions raw reads into consensus families\n\n\nConnor is written and maintained by the University of Michigan \nBRCF Bioinformatic Core; individual contributors include:\n\n- Chris Gates\n- Peter Ulintz\n\n\n", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/umich-brcf-bioinf/Connor", "keywords": "bioinformatic exome-seq DNA-seq BAM", "license": "Apache", "maintainer": "", "maintainer_email": "", "name": "Connor", "package_url": "https://pypi.org/project/Connor/", "platform": "", "project_url": "https://pypi.org/project/Connor/", "project_urls": { "Homepage": "https://github.com/umich-brcf-bioinf/Connor" }, "release_url": "https://pypi.org/project/Connor/0.6.1/", "requires_dist": [ "pysam (>=0.8.4)", "sortedcontainers (>=1.5.3)" ], "requires_python": "", "summary": "Command-line tool to deduplicate reads in bam files based on custom inline barcoding.", "version": "0.6.1" }, "last_serial": 4178684, "releases": { "0.4": [ { "comment_text": "", "digests": { "md5": "ccb211c66d6d810be0d3dbaf8c368a31", "sha256": "1b3fb6fc6a28c9c05d1d6a8010f8d9859c54b8db9ebb74bad9ddf7ac47592576" }, "downloads": -1, "filename": "Connor-0.4-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "ccb211c66d6d810be0d3dbaf8c368a31", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 23612, "upload_time": "2016-08-27T11:08:01", "url": "https://files.pythonhosted.org/packages/08/31/4b6bb12d4910fc5a40dfa779bcfce09e9431c608beb2a8e4155af158c32c/Connor-0.4-py2.py3-none-any.whl" } ], "0.5": [ { "comment_text": "", "digests": { "md5": "6efd42fe4aa2ff49c49d6875c4503621", "sha256": "dd8ed04593ece465b62365aa853533f2cca68823cb77b24207c64fdaebfce601" }, "downloads": -1, "filename": "Connor-0.5-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "6efd42fe4aa2ff49c49d6875c4503621", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 24741, "upload_time": "2016-09-14T04:34:08", "url": "https://files.pythonhosted.org/packages/99/a7/b708b9fd22801c68a42e8cb7b0954537f4f7599fbdf8d0f6900840b2d1c8/Connor-0.5-py2.py3-none-any.whl" } ], "0.5.1": [ { "comment_text": "", "digests": { "md5": "42076533bbbbf3c63da975db70c9a296", "sha256": "bb422a62873ef7e04f4ae2684197d986c797d4bd191b96bb0ac88a0da3efe864" }, "downloads": -1, "filename": "Connor-0.5.1-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "42076533bbbbf3c63da975db70c9a296", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 25232, "upload_time": "2017-09-08T20:01:37", "url": "https://files.pythonhosted.org/packages/82/4f/0013ad40504a317ba13f67eeb4d4ba6a1100c50aa7bd3c25e02db52c1007/Connor-0.5.1-py2.py3-none-any.whl" } ], "0.6": [ { "comment_text": "", "digests": { "md5": "910af43a1f07130e3830497ed87133b3", "sha256": "8e2b1814eef255212ab1c9dcbeec20c7616135b2822f926ed458925545957672" }, "downloads": -1, "filename": "Connor-0.6-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "910af43a1f07130e3830497ed87133b3", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 26123, "upload_time": "2018-04-11T18:20:52", "url": "https://files.pythonhosted.org/packages/a8/4a/f9e85b72ad090ba480e55512f2d8abd1b742371ecaa87b0fc7b4e14951c0/Connor-0.6-py2.py3-none-any.whl" } ], "0.6.1": [ { "comment_text": "", "digests": { "md5": "d55cfe57b2c064c40009abb70c659ec2", "sha256": "5adc8d7b5d466bea4c0ba0db23fcea44787a6320ccbc9e885509b2828bb4339f" }, "downloads": -1, "filename": "Connor-0.6.1-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "d55cfe57b2c064c40009abb70c659ec2", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 26276, "upload_time": "2018-08-17T02:13:35", "url": "https://files.pythonhosted.org/packages/28/56/a1f3455a421d0b6e14b21a3f8ddf483dfdd5264f2e176d669a5ea646c3c1/Connor-0.6.1-py2.py3-none-any.whl" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "d55cfe57b2c064c40009abb70c659ec2", "sha256": "5adc8d7b5d466bea4c0ba0db23fcea44787a6320ccbc9e885509b2828bb4339f" }, "downloads": -1, "filename": "Connor-0.6.1-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "d55cfe57b2c064c40009abb70c659ec2", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 26276, "upload_time": "2018-08-17T02:13:35", "url": "https://files.pythonhosted.org/packages/28/56/a1f3455a421d0b6e14b21a3f8ddf483dfdd5264f2e176d669a5ea646c3c1/Connor-0.6.1-py2.py3-none-any.whl" } ] }