{ "info": { "author": "Adam Taranto", "author_email": "adam.taranto@anu.edu.au", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Environment :: Console", "Intended Audience :: Science/Research", "License :: OSI Approved :: MIT License", "Natural Language :: English", "Operating System :: OS Independent", "Programming Language :: Python :: 3", "Topic :: Scientific/Engineering :: Bio-Informatics", "Topic :: Software Development :: Libraries :: Python Modules" ], "description": "[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n\n# deRIP2\n\nPredict the progenitor sequence of fungal repeat families by correcting for RIP-like mutations \n(CpA --> TpA) and cytosine deamination (C --> T) events.\n\nOptionally, mask RIP or deamination events in the input alignment as ambiguous bases.\n\n# Table of contents\n* [Algorithm overview](#algorithm-overview)\n* [Options and usage](#options-and-usage)\n * [Installation](#installation)\n * [Example usage](#example-usage)\n * [Standard options](#standard-options)\n* [Issues](#issues)\n* [License](#license)\n\n## Algorithm overview\n\nFor each column in the input alignment:\n - Check whether the proportion of gapped rows is greater than the maximum gap proportion (`--maxGaps`). If so, a gap is added to the output sequence.\n - Set invariant column values in the output sequence.\n - Check that at least (1 - maxSNPnoise) of the bases in the column are C/T or G/A (e.g. if maxSNPnoise = 0.4, then at least 0.6 of positions in the column must be C/T or G/A).\n - If the reaminate option is set, revert T-->C or A-->G.\n - If reaminate is not set, count the positions in RIP dinucleotide context (C/TpA or TpG/A).\n - If the proportion of positions in the column in RIP-like context is >= the minRIPlike threshold, AND at least one substrate and one product motif (i.e. 
CpA and TpA) is present, perform RIP correction in the output sequence.\n - For all remaining positions in the output sequence (not filled by gap, reamination, or RIP correction), inherit the base from the input sequence with the fewest observed RIP events (or the greatest GC content if no RIP is detected or multiple sequences share the minimum RIP count).\n\nOutputs:\n - Corrected sequence as FASTA.\n - Optionally, an alignment with:\n   - The corrected sequence appended.\n   - Corrected positions masked as ambiguous bases (if `--mask` is set).\n\n## Options and Usage\n\n### Installation\n\nRequires Python >= 3.6\n\nClone from this repository:\n\n```bash\n% git clone https://github.com/Adamtaranto/deRIP2.git && cd deRIP2 && pip install -e .\n```\n\nInstall from PyPI:\n\n```bash\n% pip install derip2\n```\n\nTest the installation:\n\n```bash\n# Print version number and exit.\n% derip2 --version\nderip2 0.0.2\n\n# Get usage information\n% derip2 --help\n```\n\n### Example usage\n\nFor aligned sequences in 'myalignment.fa':\n - Any column with >= 70% gapped positions is not corrected.\n - Bases in a column must be >= 80% C/T or G/A.\n - At least 50% of bases must be in RIP dinucleotide context (C/T as CpA / TpA).\n - Inherit all remaining uncorrected positions from the least RIP'd sequence.\n - Mask all substrate and product motifs in corrected columns as ambiguous bases (e.g. 
CpA to TpA --> YpA)\n\n```bash\nderip2 --inAln myalignment.fa --format fasta \\\n--maxGaps 0.7 \\\n--maxSNPnoise 0.2 \\\n--minRIPlike 0.5 \\\n--outDir results \\\n--outName deRIPed_sequence.fa \\\n--outAlnName alignment_with_deRIP.fa \\\n--label deRIPseqName \\\n--mask \n```\n\n**Output:** \n - results/deRIPed_sequence.fa\n - results/masked_alignment_with_deRIP.fa\n\n### Standard options\n\n```\nUsage: derip2 [-h] [--version] -i INALN\n [--format {clustal,emboss,fasta,fasta-m10,ig,nexus,phylip,phylip-sequential,phylip-relaxed,stockholm}]\n [-g MAXGAPS] [-a] [--maxSNPnoise MAXSNPNOISE]\n [--minRIPlike MINRIPLIKE] [--fillmaxgc] [--fillindex FILLINDEX]\n [--mask] [--noappend] [-d OUTDIR] [-o OUTNAME]\n [--outAlnName OUTALNNAME]\n [--outAlnFormat {clustal,emboss,fasta,fasta-m10,ig,nexus,phylip,phylip-sequential,phylip-relaxed,stockholm}]\n [--label LABEL]\n\nPredict ancestral sequence of fungal repeat elements by correcting for RIP-\nlike mutations or cytosine deamination in multi-sequence DNA alignments. \nOptionally, mask corrected positions in alignment.\n\noptional arguments:\n -h, --help show this help message and exit\n --version show program's version number and exit\n -i INALN, --inAln INALN\n Multiple sequence alignment.\n --format {clustal,emboss,fasta,fasta-m10,ig,nexus,phylip,phylip-sequential,phylip-relaxed,stockholm}\n Format of input alignment.\n -g MAXGAPS, --maxGaps MAXGAPS\n Maximum proportion of gapped positions in column to be\n tolerated before forcing a gap in final deRIP\n sequence.\n -a, --reaminate Correct deamination events in non-RIP contexts.\n --maxSNPnoise MAXSNPNOISE\n Maximum proportion of conflicting SNPs permitted\n before excluding column from RIP/deamination\n assessment. i.e. By default a column with >= 0.5 'C/T'\n bases will have 'TpA' positions logged as RIP events.\n --minRIPlike MINRIPLIKE\n Minimum proportion of deamination events in RIP\n context (5' CpA 3' --> 5' TpA 3') required for column\n to deRIP'd in final sequence. 
Note: If 'reaminate'\n option is set all deamination events will be corrected\n --fillmaxgc By default uncorrected positions in the output\n sequence are filled from the sequence with the lowest\n RIP count. If this option is set remaining positions\n are filled from the sequence with the highest G/C\n content.\n --fillindex FILLINDEX\n Force selection of alignment row to fill uncorrected\n positions from by row index number (indexed from 0).\n Note: Will override '--fillmaxgc' option.\n --mask Mask corrected positions in alignment with degenerate\n IUPAC codes.\n --noappend If set, do not append deRIP'd sequence to output\n alignment.\n -d OUTDIR, --outDir OUTDIR\n Directory for deRIP'd sequence files to be written to.\n -o OUTNAME, --outName OUTNAME\n Write deRIP sequence to this file.\n --outAlnName OUTALNNAME\n Optional: If set write alignment including deRIP\n sequence to this file.\n --outAlnFormat {clustal,emboss,fasta,fasta-m10,ig,nexus,phylip,phylip-sequential,phylip-relaxed,stockholm}\n Optional: Write alignment including deRIP sequence to\n file of format X.\n --label LABEL Use label as name for deRIP'd sequence in output\n files.\n```\n\n## Issues\nSubmit feedback to the [Issue Tracker](https://github.com/Adamtaranto/deRIP2/issues)\n\n## License\nSoftware provided under MIT license.", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/Adamtaranto/deRIP2", "keywords": "Transposon,RIP,TE", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "derip2", "package_url": "https://pypi.org/project/derip2/", "platform": "", "project_url": "https://pypi.org/project/derip2/", "project_urls": { "Homepage": "https://github.com/Adamtaranto/deRIP2" }, "release_url": "https://pypi.org/project/derip2/0.0.2/", "requires_dist": null, "requires_python": ">= 3.6", "summary": "Predict ancestral sequence of fungal repeat elements 
by correcting for RIP-like mutations in multi-sequence DNA alignments.", "version": "0.0.2" }, "last_serial": 5824237, "releases": { "0.0.1": [ { "comment_text": "", "digests": { "md5": "0643608cb6c8d9ca7b0d43e972550a0f", "sha256": "874bff1aa950d57a3a8bbe7f3cff46de999ce53a89a7d079b1038a896ffc376f" }, "downloads": -1, "filename": "derip2-0.0.1.tar.gz", "has_sig": true, "md5_digest": "0643608cb6c8d9ca7b0d43e972550a0f", "packagetype": "sdist", "python_version": "source", "requires_python": ">= 3.6", "size": 8776, "upload_time": "2019-02-21T15:19:05", "url": "https://files.pythonhosted.org/packages/a8/c6/5cfac339b83579ae36b9bdf8698ff73ff70c75568ec7e72c34ee3d09ee15/derip2-0.0.1.tar.gz" } ], "0.0.2": [ { "comment_text": "", "digests": { "md5": "6cdaf78e3219c493fafb16947d344dbf", "sha256": "3e2edfe693fba60cc02ba6aa5d99b64c76f2c823c75ee6b0855259364558b428" }, "downloads": -1, "filename": "derip2-0.0.2.tar.gz", "has_sig": true, "md5_digest": "6cdaf78e3219c493fafb16947d344dbf", "packagetype": "sdist", "python_version": "source", "requires_python": ">= 3.6", "size": 13021, "upload_time": "2019-09-13T07:23:07", "url": "https://files.pythonhosted.org/packages/78/03/8b4800ee8718001f057829f8a6757f3cd57f3f880d6448e981c2a83fe8a0/derip2-0.0.2.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "6cdaf78e3219c493fafb16947d344dbf", "sha256": "3e2edfe693fba60cc02ba6aa5d99b64c76f2c823c75ee6b0855259364558b428" }, "downloads": -1, "filename": "derip2-0.0.2.tar.gz", "has_sig": true, "md5_digest": "6cdaf78e3219c493fafb16947d344dbf", "packagetype": "sdist", "python_version": "source", "requires_python": ">= 3.6", "size": 13021, "upload_time": "2019-09-13T07:23:07", "url": "https://files.pythonhosted.org/packages/78/03/8b4800ee8718001f057829f8a6757f3cd57f3f880d6448e981c2a83fe8a0/derip2-0.0.2.tar.gz" } ] }