{ "info": { "author": "Frank Aylward", "author_email": "faylward@vt.edu", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: GNU Lesser General Public License v3 or later (LGPLv3+)", "Operating System :: Unix", "Programming Language :: Python :: 3" ], "description": "# ViralRecall\nViralRecall is a flexible command-line tool for predicting prophage and other virus-like regions in genomic data.\nThis code is under development and likely to change as bugs are identified.\n\n### Dependencies\nViralRecall is written in Python 3.5.6 and requires biopython, matplotlib, numpy, and pandas. \nViralRecall uses Prodigal and HMMER3 for protein prediction and HMM searches, respectively. Please ensure these tools are installed in your PATH before using. \nOne a Unix system you should be able to install these tools with: \n\n> sudo apt install prodigal\n\nand\n\n> sudo apt install hmmer\n\nor if you don't have sudo priveleges, you can try with conda:\n\n>conda install prodigal -c bioconda\n\nand\n\n>conda install hmmer3 -c bioconda\n\n### Installation\nPlease ensure you are using > Python 3.5.2 and have the appropriate python modules installed. If this is an issue please create a Python environment using conda (see here https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html) \nViralRecall was tested on Ubuntu 16.04 and should work on most Unix-based systems. To see the help menu use:\n> python viralrecall.py -h\n\n### Databases\nViralrecall uses two main HMM databases to analyze viral signatures in genomic data. The first is Pfam, which is a broad-specificity database that detects many protein families that are common in the genomes of cellular organisms. The Pfam database here has been modified to remove common viral protein families. The second HMM database is VOGDB, which is a comprehensive database of known viral protein families. The full VOG HMM database is quite large, and in some situations users may wish to use smaller sets to speed up runtime. We have created three VOG database that vary in size (all, large, and small). Users wishing to perform an exhaustive search should use the \"all\" database, while those preferring a faster search may wish to use the \"small\" database (the \"large\" database is an intermediate size option). \n\nThe database files are available for download from the Virginia Tech library system. To download and unpack, navigate to the folder that contains the viralrecall.py script and type:\n> wget -O hmm.tar.gz https://data.lib.vt.edu/downloads/8k71nh28c\n\nand then\n\n> tar -xvzf hmm.tar.gz\n\n### Basic Usage\nTo test if ViralRecall will run properly type:\n> python viralrecall.py -i examples/test_seq.fna -p test_outdir -t 2 -db small -f\n\nResults should be located in the test_outdir folder. \nThe output folder will contain:\n\n*.faa: The proteins predicted from the input file using Prodigal\n\n*.full_annot.tsv: A full annotation table of the predicted ORFs. \n\n*.prophage.annot.tsv: An annotation of only the viral regions (only present if some viral regions found)\n\n*summary.tsv Summary statistics for the predicted viral regions. \n\n*.pfamout Raw output of the Pfam HMMER3 search\n\n*.vogout Raw output of the VOGDB HMMER3 search\n\n\nAdditionally, for each viral region viralrecall will print out .faa and .fna files for the proteins and nucleotide sequences for the regions found. \nPlease be sure to use only .fna files as input. \n\n\n### Options\n\nThere are several parameters you can change in viralrecall depending on your preferences and the data you're analyzing. The important parameters that will influence the results are:\n\n**-s, --minscore**\nThis is the mean score that a genomic regions needs to have in order to pass the filter and get reported as a viral region. The score is calculated from the HMMER3 scores, with higher scores indicating more and better matches to the VOG database, and lower scores indicating more and higher matches to the Pfam database. The default is 10. \n\n**-w, --window**\nSize of the sliding window to use for calculating moving averages. A smaller window may help predict short viral regions, but may split large viral regions into several pieces. \n\n**-m, --minsize**\nMinimum size, in kilobases, of the viral regions to report. \n\n**-v, --minvog**\nMinimum number of hits against the VOG database that must be recorded in a region in order for it to be reported (larger values == higher confidence). \n\nFor example, if we wanted to recover only long, well-defined viral regions we could use the following command:\n> python viralrecall.py -i examples/test_seq.fna -p testout -s 15 -m 30 -v 10\n\nHere we are asking for only regions that have a mean score >= 15, are at least 30 kilobases long, and have at least 10 VOG hits.\n\nIf we want to quickly re-do the above analysis with different parameters, but without re-doing gene predictions and HMMER3 searches, we can use the -r flag:\n\n> python viralrecall.py -i examples/test_seq.fna -p testout -s 15 -m 30 -v 10 -r\n\nThis should re-calculate the results quickly and allow you to identify the most appropriate ones for your analysis. \n\n\n### Batch mode\nIf you have many sequences you wish to test you can put them all in a folder and use batch mode. Here the input (-i) should point to a folder with .fna files in it. \nBasic usage is:\n\n> python viralrecall.py -i examples/testfolder -p folderout -b\n\nAll of the output files should have their own folder in the folderout directory. You can also use the -b flag with the -r flag for quick re-calculations. \n\n\n\n## Citation\n\nCitation for this tool is pending. If you plan on using this tool in an article please email Frank Aylward to ask the best way to cite this work at faylward _at_ vt dot edu\n\nAlso, since this tool requires Prodigal and HMMER3, please cite thest tools as well. Their citations are:\n\nHyatt et al. \u201cProdigal: prokaryotic gene recognition and translation initiation site identification\u201d. BMC bioinformatics, 2010. \nEddy, \"A new generation of homology search tools based on probabilistic inference\". Genome Informatics, 2009. \n\n\n\n\n\n\n\n\n\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/faylward/viralrecall", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "viralrecall", "package_url": "https://pypi.org/project/viralrecall/", "platform": "", "project_url": "https://pypi.org/project/viralrecall/", "project_urls": { "Homepage": "https://github.com/faylward/viralrecall" }, "release_url": "https://pypi.org/project/viralrecall/0.0.1/", "requires_dist": null, "requires_python": "", "summary": "Tool for predicting prophage and other virus-like regions in genomic data", "version": "0.0.1" }, "last_serial": 4983644, "releases": { "0.0.1": [ { "comment_text": "", "digests": { "md5": "cd14f8fcba94106d924c2023f1bfe7c3", "sha256": "353c670e70ea57ff9be0fb51645cf2985b716c6de393948d89ec293e205d344d" }, "downloads": -1, "filename": "viralrecall-0.0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "cd14f8fcba94106d924c2023f1bfe7c3", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 15973, "upload_time": "2019-03-25T17:09:12", "url": "https://files.pythonhosted.org/packages/10/8e/242ba9fccd0048e988bb8fbcbe95d3fc9e9821a3ff6a1650ad86c86d2bd1/viralrecall-0.0.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "4a7f6855ac68dd9d7fbee331a4d948a8", "sha256": "f860b98438a70b78922c7b4439b4d1cfb20e17725aae4a337331da4cb266ca37" }, "downloads": -1, "filename": "viralrecall-0.0.1.tar.gz", "has_sig": false, "md5_digest": "4a7f6855ac68dd9d7fbee331a4d948a8", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3832, "upload_time": "2019-03-25T17:09:14", "url": "https://files.pythonhosted.org/packages/5f/24/cf165e1390b68531a6bb71efb235da9a546d070b9d9ca287084bd1cfe4d4/viralrecall-0.0.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "cd14f8fcba94106d924c2023f1bfe7c3", "sha256": "353c670e70ea57ff9be0fb51645cf2985b716c6de393948d89ec293e205d344d" }, "downloads": -1, "filename": "viralrecall-0.0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "cd14f8fcba94106d924c2023f1bfe7c3", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 15973, "upload_time": "2019-03-25T17:09:12", "url": "https://files.pythonhosted.org/packages/10/8e/242ba9fccd0048e988bb8fbcbe95d3fc9e9821a3ff6a1650ad86c86d2bd1/viralrecall-0.0.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "4a7f6855ac68dd9d7fbee331a4d948a8", "sha256": "f860b98438a70b78922c7b4439b4d1cfb20e17725aae4a337331da4cb266ca37" }, "downloads": -1, "filename": "viralrecall-0.0.1.tar.gz", "has_sig": false, "md5_digest": "4a7f6855ac68dd9d7fbee331a4d948a8", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3832, "upload_time": "2019-03-25T17:09:14", "url": "https://files.pythonhosted.org/packages/5f/24/cf165e1390b68531a6bb71efb235da9a546d070b9d9ca287084bd1cfe4d4/viralrecall-0.0.1.tar.gz" } ] }