{ "info": { "author": "Kristoffer Sahlin", "author_email": "kxs624@psu.edu", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6" ], "description": "IsoCon\n========\n\nIsoCon is distributed as a python package supported on Linux / OSX with python v>=2.7, and 3.4-3.6, 3.5-dev and 3.6-dev [![Build Status](https://travis-ci.org/ksahlin/IsoCon.svg?branch=master)](https://travis-ci.org/ksahlin/IsoCon)\n\n\nIsoCon is a tool for deriving *finished transcripts* from *Iso-Seq* reads from *targeted* sequencing. Input is either a set of full-length-non-chimeric reads in fasta format and the CCS base call values as a bam file, or a fastq file of CCS reads and their quality values provided by the ccs caller. IsoCon can only run on targeted Iso-Seq datasets due to underlying alignment approach --- runtime does not scale for nontargeted sequencing. The output is a set of predicted transcripts. IsoCon can be run as follows\n\n```\nIsoCon pipeline -fl_reads -outfolder --ccs \n```\nor\n\n```\nIsoCon pipeline -fl_reads -outfolder \n```\n\npredicted transcripts are found in file **/path/to/output/final_candidates.fa**. Reads that could not be corrected or clustered are found in /path/to/output/not_converged.fa. \n\n* Can IsoCon be run on nontargeted Iso-Seq datasets? [see here](https://github.com/ksahlin/IsoCon/issues/2). \n* How does my data set affect the runtime? [see here](https://github.com/ksahlin/IsoCon/issues/3) \n\nFor more instructions see below.\n\n\nTable of Contents\n=================\n\n * [Table of Contents](#table-of-contents)\n * [INSTALLATION](#installation)\n * [Using pip](#using-pip)\n * [Downloading source from GitHub](#downloading-source-from-github)\n * [Dependencies](#dependencies)\n * [USAGE](#usage)\n * [Pipline](#pipline)\n * [Output](#output)\n * [get_candidates](#get_candidates)\n * [stat_filter](#stat_filter)\n * [Parameters](#parameters)\n * [CREDITS](#credits)\n * [LICENCE](#licence)\n\n\nINSTALLATION\n----------------\n\nThe preferred way to install IsoCon is with pythons package installer pip.\n\n### Using pip \n\n`pip` is pythons official package installer. This section assumes you have `python` (v2.7 or >=3.4) and a recent version of `pip` installed which should be included in most python versions. If you do not have `pip`, it can be easily installed [from here](https://pip.pypa.io/en/stable/installing/) and upgraded with `pip install --upgrade pip`. \n\nWith `python` and `pip` available, create a file `requirements.txt` with contents copied from [this file](https://github.com/ksahlin/IsoCon/blob/master/requirements.txt). Then, type in terminal \n\n```\npip install --requirement requirements.txt IsoCon\n```\n\nThis should install IsoCon. With proper installation of **IsoCon**, you should be able to issue the command `IsoCon pipeline` to view user instructions. You should also be able to run IsoCon on this [small dataset](https://github.com/ksahlin/IsoCon/tree/master/test/data). Simply download the test dataset and run:\n\n```\nIsoCon pipeline -fl_reads [path/simulated_pacbio_reads.fa] -outfolder [output path]\n```\n\n`pip` will install the dependencies automatically for you. IsoCon has been built with python 2.7, 3.4-3.6 on Linux systems using [Travis](https://travis-ci.org/). For customized installation of latest master branch, see below.\n\n### Downloading source from GitHub\n\n#### Dependencies\n\nMake sure the below listed dependencies are installed (installation links below). Versions in parenthesis are suggested as IsoCon has not been tested with earlier versions of these libraries. However, IsoCon may also work with earliear versions of these libaries.\n* [edlib](https://github.com/Martinsos/edlib \"edlib's Homepage\"), for installation see [link](https://github.com/Martinsos/edlib/tree/master/bindings/python#installation) (>= v1.1.2)\n* [networkx](https://networkx.github.io/) (>= v1.10)\n* [parasail](https://github.com/jeffdaily/parasail-python)\n* [pysam](http://pysam.readthedocs.io/en/latest/installation.html) (>= v0.11)\n\n\nWith these dependencies installed. Run\n\n```sh\ngit clone https://github.com/ksahlin/IsoCon.git\ncd IsoCon\n./IsoCon\n```\n\n\nUSAGE\n-------\n\nIsoCon's algorithm consists of two main phases; the error correction step and the statistical testing step. IsoCon can run these two steps in one go using `IsoCon pipeline`, or it can run only the correction or statistical test steps using `IsoCon get_candidates` and `IsoCon stat_filter` respectively. The preffered and most tested way is to use the entire pipeline `IsoCon pipeline`, but the other two settings can come in handy for specific cases. For example, running only `IsoCon get_candidates` will give more sequences if one is not concerned about precision and will also be faster, while one might use only `IsoCon stat_filter` using different parameters for a set of already constructed candidates in order to prevent rerunning the error correction step.\n\n\n### Pipeline\n\nIsoCon takes two input files: (1) a fasta file of full length non-chimeric (flnc) CCS reads and (2) the bam file of CCS reads containing predicted base call quality values. The fasta file containing flnc can be obtained from PacBios Iso-Seq pipeline [ToFU](https://github.com/PacificBiosciences/IsoSeq_SA3nUP/wiki) and the bam file is the output of running the consensus caller algorthm [ccs](https://github.com/PacificBiosciences/unanimity/blob/master/doc/PBCCS.md) on the Iso-Seq reads (ccs takes bam files so if you have bax files, convert them using [bax2bam](https://github.com/PacificBiosciences/unanimity/blob/master/doc/PBCCS.md#input) ). IsoCon can then be run as\n\n```\nIsoCon pipeline -fl_reads -outfolder [--ccs ]\n```\n\nIsoCon also supports taking only the fasta read file as input. Without the base call quality values in `--ccs`, IsoCon will use an empirically estimated error model. The ability to take only the flnc fasta file as input is useful when the reads have been altered after the CCS base calling algorithm, \\emph{e.g.}, from error correction using Illumina reads. **However, we highly recommend supplying the CCS quality values to IsoCon if CCS reads has not gone through any additional correction.** \n\nSimply omit the `--ccs` parameter if running IsoCon without base call quality values as\n```\nIsoCon pipeline -fl_reads -outfolder \n```\n\n#### Output\n\nThe final high quality transcripts are written to the file `final_candidates.fa` in the output folder. If there was only one or two reads coming from a transcript, which is sufficiently different from other reads (exon difference), it will be output in the file `not_converged.fa`. This file may contain other erroneous CCS reads such as chimeras. The output also contains a file `cluster_info.tsv` that shows for each read which candidate it was assigned to in `final_candidates.fa`.\n\n### get_candidates\n\nRuns only the error correction step. The output is the converged candidates in a fasta file.\n\n```\nIsoCon get_candidates -fl_reads -outfolder \n```\n\n### stat_filter\n\nRuns only the statistical filtering of candidates.\n\n```\nIsoCon pipeline -fl_reads -outfolder -candidates [--ccs ]\n```\nObserve that `candidate_transcripts.fa` does not have to come from IsoCon's error correction algorithm. For example, this could either be a set of already validated transcripts to which one would like to see if they occur in the CCS reads, or they could be Illumina (or in other ways) corrected CCS reads.\n\n\n### Parameters\n\n```\n $ IsoCon pipeline --help\nusage: Pipeline for obtaining non-redundant haplotype specific transcript isoforms using PacBio IsoSeq reads. pipeline\n [-h] -fl_reads FL_READS -outfolder OUTFOLDER [--ccs CCS]\n [--nr_cores NR_CORES] [--verbose]\n [--neighbor_search_depth NEIGHBOR_SEARCH_DEPTH]\n [--min_exon_diff MIN_EXON_DIFF]\n [--min_candidate_support MIN_CANDIDATE_SUPPORT]\n [--p_value_threshold P_VALUE_THRESHOLD]\n [--min_test_ratio MIN_TEST_RATIO]\n [--max_phred_q_trusted MAX_PHRED_Q_TRUSTED]\n [--ignore_ends_len IGNORE_ENDS_LEN] [--cleanup]\n [--prefilter_candidates]\n\noptional arguments:\n -h, --help show this help message and exit\n --ccs CCS BAM/SAM file with CCS sequence predictions.\n --nr_cores NR_CORES Number of cores to use. [default = 16]\n --verbose This will print more information abount workflow and\n provide plots of similarity network etc.\n --neighbor_search_depth NEIGHBOR_SEARCH_DEPTH\n Maximum number of pairwise alignments in search matrix\n to find nearest_neighbor. [default =2**32]\n --min_exon_diff MIN_EXON_DIFF\n Minimum consequtive base pair difference between two\n neigborss in order to remove edge. If more than this\n nr of consequtive base pair difference, its likely an\n exon difference. [default =20]\n --min_candidate_support MIN_CANDIDATE_SUPPORT\n Required minimum number of reads converged to the same\n sequence to be included in statistical test. [default\n 2]\n --p_value_threshold P_VALUE_THRESHOLD\n Threshold for statistical test, filter everythin below\n this threshold . [default = 0.01]\n --min_test_ratio MIN_TEST_RATIO\n Don't do tests where candidate c has more than\n reads assigned to itself compared to\n the reference t, calculated as test_ratio = c/t,\n because c will likely be highly significant [default =\n 5]\n --max_phred_q_trusted MAX_PHRED_Q_TRUSTED\n Maximum PHRED quality score trusted (T), linerarly\n remaps quality score interval [0,93] --> [0, T].\n Quality scores may have some uncertainty since T is\n estimated from a consensus caller algorithm.\n --ignore_ends_len IGNORE_ENDS_LEN\n Number of bp to ignore in ends. If two candidates are\n identical except in ends of this size, they are\n collapsed and the longest common substing is chosen to\n represent them. In statistical test step, the nearest\n neighbors are found based on ignoring the ends of this\n size. Also indels \"hanging off\" ends of this size will\n not be tested. [default 15].\n --cleanup Remove everything except logfile.txt,\n candidates_converged.fa and final_candidates.fa in\n output folder. [default = False]\n --prefilter_candidates\n Filter candidates if they are not consensus over any\n base pair in the candidate transcript formed from\n them, this can reduce runtime without significant loss\n in true candidates. [default = False]\n\nrequired arguments:\n -fl_reads FL_READS Fast file pacbio Reads of Insert.\n -outfolder OUTFOLDER Outfolder.\n```\n\nCREDITS\n----------------\n\nPlease cite [1] when using IsoCon.\n\n1. Kristoffer Sahlin*, Marta Tomaszkiewicz*, Kateryna D. Makova\u2020, Paul Medvedev\u2020 (2018) \"IsoCon: Deciphering highly similar multigene family transcripts from Iso-Seq data\", bioRxiv [Link](https://www.biorxiv.org/content/early/2018/01/10/246066).\n\nLICENCE\n----------------\n\nGPL v3.0, see [LICENSE.txt](https://github.com/ksahlin/IsoCon/blob/master/LICENCE.txt).\n\n\n\n", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/ksahlin/IsoCon", "keywords": "Iso-Seq CCS PacBio transcript long-read", "license": "", "maintainer": "", "maintainer_email": "", "name": "IsoCon", "package_url": "https://pypi.org/project/IsoCon/", "platform": "", "project_url": "https://pypi.org/project/IsoCon/", "project_urls": { "Homepage": "https://github.com/ksahlin/IsoCon" }, "release_url": "https://pypi.org/project/IsoCon/0.3.2/", "requires_dist": [ "edlib (>=1.1.2)", "pysam (>=0.11)", "networkx (>=1.10)", "parasail (>=1.1.11)" ], "requires_python": ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, <4", "summary": "Pipeline for obtaining non-redundant haplotype specific transcript isoforms using PacBio IsoSeq reads.", "version": "0.3.2" }, "last_serial": 4112858, "releases": { "0.2.5": [ { "comment_text": "", "digests": { "md5": "1cde5b5e4b0186af8cfec23d1e3d0171", "sha256": "4b9c47944a63442e614d19493b8c3ad124023fa69ae724ddb56413353acdfa4d" }, "downloads": -1, "filename": "IsoCon-0.2.5.tar.gz", "has_sig": false, "md5_digest": "1cde5b5e4b0186af8cfec23d1e3d0171", "packagetype": "sdist", "python_version": "source", "requires_python": ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, <4", "size": 66897, "upload_time": "2017-12-28T06:07:16", "url": "https://files.pythonhosted.org/packages/fb/0d/16337232c28df5e0026a7fadbdb42ebe3382b5171de0c068c00ad7cb95ea/IsoCon-0.2.5.tar.gz" } ], "0.2.5.1": [ { "comment_text": "", "digests": { "md5": "180ab13c595068ebd652ae73cd4b2e64", "sha256": "249cba90c74b1e6b1619c401cf9a7d1693fb9aeb7e63563d3088a937260f77cf" }, "downloads": -1, "filename": "IsoCon-0.2.5.1.tar.gz", "has_sig": false, "md5_digest": "180ab13c595068ebd652ae73cd4b2e64", "packagetype": "sdist", "python_version": "source", "requires_python": ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, <4", "size": 76930, "upload_time": "2017-12-28T07:22:37", "url": "https://files.pythonhosted.org/packages/71/dd/4a7aa46032738138138af3db3e4739ad36d56aae8bb2758d4e26e05070ac/IsoCon-0.2.5.1.tar.gz" } ], "0.3.0": [ { "comment_text": "", "digests": { "md5": "e7d52f0ba055b0425bbcd7397a741a3f", "sha256": "ba750f72194b4f8650401c0b817b804be0498dae8d9567d71e1fa2d521587626" }, "downloads": -1, "filename": "IsoCon-0.3.0-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "e7d52f0ba055b0425bbcd7397a741a3f", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, <4", "size": 73467, "upload_time": "2018-02-03T02:43:56", "url": "https://files.pythonhosted.org/packages/01/02/ea28396a0ca3ee5d36a7e68a611744ba54c0a5798ae890f9947be7304729/IsoCon-0.3.0-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "6a2fe32e6b493d5485bcec90b19d76a6", "sha256": "2c5e5b03138d8221cdfeaa1dd726d2000a3dbe65fbbb2ae4cbfa0bc3d833ab1f" }, "downloads": -1, "filename": "IsoCon-0.3.0.tar.gz", "has_sig": false, "md5_digest": "6a2fe32e6b493d5485bcec90b19d76a6", "packagetype": "sdist", "python_version": "source", "requires_python": ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, <4", "size": 63998, "upload_time": "2018-02-03T02:43:58", "url": "https://files.pythonhosted.org/packages/c7/51/6a43ac7daeeec9c737e30f2bc3ea1fd45977c8c3ac622f1c36facff94fbd/IsoCon-0.3.0.tar.gz" } ], "0.3.1": [ { "comment_text": "", "digests": { "md5": "16a056cb41420983e8134ae9e573a437", "sha256": "d5a9bd33f1c28eaa4ad3daaafa4b2589d4b46fb0c510ff150ea2aa0fed8a8d27" }, "downloads": -1, "filename": "IsoCon-0.3.1.tar.gz", "has_sig": false, "md5_digest": "16a056cb41420983e8134ae9e573a437", "packagetype": "sdist", "python_version": "source", "requires_python": ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, <4", "size": 70983, "upload_time": "2018-03-25T13:59:42", "url": "https://files.pythonhosted.org/packages/5c/e9/426388b5f5fd5a3366624cb0629ef13e9ea5171003272d7aae23acf1b23d/IsoCon-0.3.1.tar.gz" } ], "0.3.2": [ { "comment_text": "", "digests": { "md5": "fb4740f064a6768408f6d77183c5e761", "sha256": "a8872c860a27785fa5d37c43d27177a98341e602504156e173b2e6baa80c7c2c" }, "downloads": -1, "filename": "IsoCon-0.3.2-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "fb4740f064a6768408f6d77183c5e761", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, <4", "size": 83988, "upload_time": "2018-07-29T02:38:30", "url": "https://files.pythonhosted.org/packages/c4/9d/8068d6c56d58f1bfdcb496fb74b7c3eae1c20e40c0ba643e5005321614d3/IsoCon-0.3.2-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "a32af624da42d37242d152045ae55d28", "sha256": "62bb117689e20dbfb436a906332aa353c77fd4ad1b16673b6310e81d824416e0" }, "downloads": -1, "filename": "IsoCon-0.3.2.tar.gz", "has_sig": false, "md5_digest": "a32af624da42d37242d152045ae55d28", "packagetype": "sdist", "python_version": "source", "requires_python": ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, <4", "size": 74577, "upload_time": "2018-07-29T02:38:36", "url": "https://files.pythonhosted.org/packages/fe/93/df5f166072f8e1fa3189da4e2f6c0005b55aa0ba7117bb921a2fb31e748b/IsoCon-0.3.2.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "fb4740f064a6768408f6d77183c5e761", "sha256": "a8872c860a27785fa5d37c43d27177a98341e602504156e173b2e6baa80c7c2c" }, "downloads": -1, "filename": "IsoCon-0.3.2-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "fb4740f064a6768408f6d77183c5e761", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, <4", "size": 83988, "upload_time": "2018-07-29T02:38:30", "url": "https://files.pythonhosted.org/packages/c4/9d/8068d6c56d58f1bfdcb496fb74b7c3eae1c20e40c0ba643e5005321614d3/IsoCon-0.3.2-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "a32af624da42d37242d152045ae55d28", "sha256": "62bb117689e20dbfb436a906332aa353c77fd4ad1b16673b6310e81d824416e0" }, "downloads": -1, "filename": "IsoCon-0.3.2.tar.gz", "has_sig": false, "md5_digest": "a32af624da42d37242d152045ae55d28", "packagetype": "sdist", "python_version": "source", "requires_python": ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, <4", "size": 74577, "upload_time": "2018-07-29T02:38:36", "url": "https://files.pythonhosted.org/packages/fe/93/df5f166072f8e1fa3189da4e2f6c0005b55aa0ba7117bb921a2fb31e748b/IsoCon-0.3.2.tar.gz" } ] }