{ "info": { "author": "Rob Williams", "author_email": "robccwilliams@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Environment :: MacOS X", "Intended Audience :: Science/Research", "License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)", "Operating System :: MacOS :: MacOS X", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7", "Topic :: Scientific/Engineering :: Bio-Informatics" ], "description": "# ERVin\n\nERVin is a tool for detecting endogenous retroviruses (ERVs) in genome segments.\n\nIt has been designed primarily for use on OSX. It may work on other UNIX-based systems, but it almost certainly will not run on Microsoft Windows.\n\n### Installation\n\n`pip install ervin`\n\n### Requirements\n- Python 3.6+ ([Download](https://www.python.org/downloads/))\n- NCBI BLAST suite, installed locally ([Download](ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/))\n- A local genome db to be queried\n - This can be located in a directory of your choosing, but its location must be specified in a `config.json` file\n - If you do not provide your own `config.json`, one is created at first run from the defaults in the `config.json.templ` file\n\n### Current functionality\nERVin currently:\n- When provided with a `.fasta` file of probe sequences\n - Runs local `tblastn` against the specified genome database, filtering the results on alignment length and e-value (both thresholds are optional arguments and default to >400 and <0.009 respectively when omitted)\n - Parses and merges filtered results where appropriate\n - Runs the resulting fasta records against a local Viruses refseq database using `tblastn` (a copy of the database is downloaded if the user does not provide one, and is kept up to date), grouping the records into a final set of output files based on their top hit\n\n### Usage\n\n#### Arguments\n\n
| Argument | Verbose | Description | Type | Required | Default |\n
|---|---|---|---|---|---|\n
| -f | --file | Source fasta file containing the sample probe records to run through tblastn | Filepath | True | |\n
| -gdb | --genome_database | Name of the genome database against which the probe records are to be BLASTed (located in the genome db store specified in the config file) | str | True | |\n
| -o | --output_dir | Location to which to write the result files | str | False | <current_working_directory>/OUTPUT |\n
| -a | --alignment_len_threshold | Minimum length that BLAST result alignments must exceed | int | False | 400 |\n
| -e | --e_value | Maximum e-value; BLAST results must have an e-value below this threshold | float | False | 0.009 |\n\n
`ervin -f data/fasta_file.fasta -gdb genome_db`\n\n
`ervin -f data/fasta_file.fasta -gdb genome_db -o results/probe_blaster_output`\n\n
`ervin -f data/fasta_file.fasta -gdb genome_db -o results/probe_blaster_output -a 500`\n\n
`ervin -f data/fasta_file.fasta -gdb genome_db -o results/probe_blaster_output -e 0.0008`\n\n
`ervin -f data/fasta_file.fasta -gdb genome_db -o results/probe_blaster_output -a 800 -e 0.01`\n
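The effect of the `-a` and `-e` thresholds can be illustrated with a short Python sketch. This is not ERVin's actual implementation: the `Hit` record and the `filter_hits` function are hypothetical names introduced here, and the strict comparisons are an assumption based on the documented defaults of >400 and <0.009.

```python
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    # Hypothetical record for one BLAST result; not ERVin's internal type.
    subject_id: str
    alignment_len: int
    e_value: float


def filter_hits(hits: Iterable[Hit],
                min_alignment_len: int = 400,
                max_e_value: float = 0.009) -> List[Hit]:
    # Keep hits whose alignment is longer than the -a threshold and whose
    # e-value is below the -e threshold (strict comparisons are assumed).
    return [
        h for h in hits
        if h.alignment_len > min_alignment_len and h.e_value < max_e_value
    ]


# Made-up example hits, for illustration only.
hits = [
    Hit('chr1', 512, 1e-30),  # long enough, significant: kept
    Hit('chr2', 380, 1e-40),  # alignment too short: dropped
    Hit('chr3', 650, 0.05),   # e-value too high: dropped
]
kept = filter_hits(hits)
print([h.subject_id for h in kept])
```

Passing `min_alignment_len=800` and `max_e_value=0.01` would correspond to the last command-line example, `-a 800 -e 0.01`.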