{ "info": { "author": "Andrew J. Page", "author_email": "path-help@sanger.ac.uk", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Intended Audience :: Science/Research", "License :: OSI Approved :: GNU General Public License v3 (GPLv3)", "Programming Language :: Python :: 3 :: Only", "Topic :: Scientific/Engineering :: Bio-Informatics" ], "description": "# Krocus\nGenome sequencing is rapidly being adopted in reference labs and hospitals for bacterial outbreak investigation and diagnostics where time is critical. Seven gene multi-locus sequence typing is a standard tool for broadly classifying samples into sequence types, allowing a sample, in many cases, to be ruled in or out of an outbreak, or allowing general characteristics of a bacterial strain to be inferred. Long read sequencing technologies, such as those from PacBio or Oxford Nanopore, can produce read data within minutes of an experiment starting, unlike short read sequencing technologies, which require many hours or days. However, the error rates of raw uncorrected long read data are very high. We present Krocus, which can predict a sequence type directly from uncorrected long reads, and which was designed to consume read data as it is produced, providing results in minutes. It is the only tool which can do this from uncorrected long reads. We tested Krocus on over 600 samples sequenced using long read sequencing technologies from PacBio and Oxford Nanopore. It provides sequence types on average within 90 seconds, with a sensitivity of 94% and specificity of 97%, directly from uncorrected raw sequence reads. The software is written in Python and is available under the open source license GNU GPL version 3.\n\n[![Build Status](https://travis-ci.org/andrewjpage/krocus.svg?branch=master)](https://travis-ci.org/andrewjpage/krocus)\n\n# Paper\nAndrew J. Page, Jacqueline A. Keane. (2018) Rapid multi-locus sequence typing direct from uncorrected long reads using Krocus. 
PeerJ 6:e5233 https://doi.org/10.7717/peerj.5233\n\n# Installation\nThe only dependency is Python 3. Assuming you have Python 3.3+ and pip installed, just run:\n```\npip3 install krocus\n```\n\nor if you wish to install the latest development version:\n```\npip3 install git+https://github.com/andrewjpage/krocus.git\n```\n\nOn some systems pip3 may be just called pip.\n\n## Debian/Ubuntu (Trusty/Xenial)\nTo install Python 3 on Ubuntu, as root run:\n```\napt-get update -qq\napt-get install -y git python3 python3-setuptools python3-biopython python3-pip\npip3 install krocus\n```\n\n## Conda\nFirst install Conda (Python 3), then run:\n```\nconda install krocus\n```\n\n## Windows\nLike virtually all bioinformatics software, this software is unlikely to work on Windows. Try using a Linux virtual machine.\n\n# Usage\n## krocus_database_downloader script\nFirst of all you need MLST databases. There is a snapshot bundled with this repository for your convenience, or alternatively you can use the downloader script to get the latest data. 
You will need internet access for this step.\n\n```\nusage: krocus_database_downloader [options]\n\nDownload\n\noptional arguments:\n -h, --help show this help message and exit\n --list_species, -l List all available species (default: False)\n --species SPECIES, -s SPECIES\n Species to download (default: None)\n --output_directory OUTPUT_DIRECTORY, -o OUTPUT_DIRECTORY\n Output directory (default: mlst_files)\n --verbose, -v Turn on debugging (default: False)\n --version show program's version number and exit\n\n```\nFirst of all you can get a list of available databases by running:\n```\nkrocus_database_downloader -l\n```\n\nFrom this list choose one of the species and use it for the next step:\n```\nkrocus_database_downloader --species \"Salmonella enterica\" --output_directory Salmonella_enterica\n```\nYou will now have a directory called __Salmonella_enterica__ which can be provided to the main script.\n\n## krocus script\nThis is the main script of the application. The mandatory inputs are a directory containing an MLST database (from the previous step) and a FASTQ file, which can optionally be gzipped.\n```\nusage: krocus [options] allele_directory input.fastq\n\nmulti-locus sequence typing (MLST) from uncorrected long reads\n\npositional arguments:\n allele_directory Allele directory\n input_fastq Input FASTQ file (optionally gzipped)\n\noptional arguments:\n -h, --help show this help message and exit\n --filtered_reads_file FILTERED_READS_FILE, -f FILTERED_READS_FILE\n Filename to save matching reads to (default: None)\n --output_file OUTPUT_FILE, -o OUTPUT_FILE\n Output file [STDOUT] (default: None)\n --max_gap MAX_GAP Maximum gap for blocks to be contigous, measured in\n multiples of the k-mer size (default: 4)\n --margin MARGIN Flanking region around a block to use for mapping\n (default: 100)\n --min_block_size MIN_BLOCK_SIZE\n Minimum block size in bases (default: 150)\n --min_fasta_hits MIN_FASTA_HITS, -m MIN_FASTA_HITS\n Minimum No. 
of kmers matching a read (default: 10)\n --print_interval PRINT_INTERVAL, -p PRINT_INTERVAL\n Print ST every this number of reads (default: 200)\n --kmer KMER, -k KMER k-mer size (default: 11)\n --target_st TARGET_ST\n For performance testing print time to find given ST\n (default: None)\n --verbose, -v Turn on debugging [0]\n --version show program's version number and exit\n```\n\n### Required\n__allele_directory__: The directory containing the MLST database you wish to query against. This is generated by the krocus_database_downloader script and just contains copies of the allele sequences in FASTA format and the profile.txt file linking allele numbers to STs.\n\n__input_fastq__: This is a single FASTQ file. It can be optionally gzipped. Alternatively input can be read from stdin by using the dash character (-) as the input file name.\n\n### Options\n__kmer__: The most important parameter. Long reads have a high error rate, so if you set this too high, nothing will match (because the k-mers will contain errors). If you set it too low, everything will match, which isn't much use to you. Thinking about your data, on average how long a stretch of bases can you get in your read without errors? This is what you should set your kmer to. For example, if you have an average of 1 error every 10 bases, then the ideal kmer would be 9.\n\n__min_fasta_hits__: This is the minimum number of matching kmers in a read for the read to be considered for analysis. It is a hard minimum threshold which is really there to speed things up. If you set this too high, then nothing will be returned.\n\n__filtered_reads_file__: If you provide a filename for this option, all of the reads which are estimated to match one of the MLST genes are saved to a file. Only the region predicted to contain the MLST gene is saved. This can be used for downstream analysis, such as de novo assembly. This file should not already exist. 
\n\n__output_file__: By default the predicted sequence types are printed to screen (STDOUT). If a filename is provided, the predicted sequence types are instead printed to this file. This file should not already exist. \n\n__print_interval__: Print out the predicted sequence type every X reads. This is useful when you are performing analysis in real time and want a quick result.\n\n# Output\nThe output format is: the predicted sequence type (ST) number (column 1), the percentage k-mer coverage of the alleles (0-100) (column 2), and the specific alleles identified (remaining columns). For each allele the name of the gene is noted and the allele number, which corresponds to a unique sequence, is given in brackets. If there is only a partial match a star (*) is appended.\n\n```\n323\t97.23\tinfB(1)\tpgi(1)\tphoE(9)*\ttonB(93)\trpoB(1)*\tgapA(2)\tmdh(1)\n```\nIn the above example the sequence type is 323, and 97.23% of the k-mers making up ST 323 are covered by reads. Two of the alleles (phoE and rpoB) have k-mers which were not identified in the reads, possibly because of errors in the reads or because no reads covering that region were found. 
\n\n\n\n# Resource usage\nA 550 MB FASTQ file (unzipped) of long reads from a PacBio RSII, containing Salmonella, required 550 MB of RAM.\n\n\n\n\n", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/sanger-pathogens/krocus", "keywords": "", "license": "GPLv3", "maintainer": "", "maintainer_email": "", "name": "krocus", "package_url": "https://pypi.org/project/krocus/", "platform": "", "project_url": "https://pypi.org/project/krocus/", "project_urls": { "Homepage": "https://github.com/sanger-pathogens/krocus" }, "release_url": "https://pypi.org/project/krocus/1.0.1/", "requires_dist": [ "biopython (>=1.68)", "pyfastaq (>=3.12.0)" ], "requires_python": "", "summary": "krocus: multi-locus sequence typing from uncorrected long reads", "version": "1.0.1" }, "last_serial": 4347603, "releases": { "0.2.2": [ { "comment_text": "", "digests": { "md5": "1f4a4886eb21ce40b7757484adfbd530", "sha256": "0a364129493712cd6a774c02c571986cd4d3386a21075680ebcabcc7e9c3ab1e" }, "downloads": -1, "filename": "krocus-0.2.2.tar.gz", "has_sig": false, "md5_digest": "1f4a4886eb21ce40b7757484adfbd530", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 11822, "upload_time": "2018-05-01T13:25:39", "url": "https://files.pythonhosted.org/packages/be/14/002c8c14745e3b97ae56d33881eab196a23788956b0f1d7c2b2c8c4a30fc/krocus-0.2.2.tar.gz" } ], "1.0.0": [ { "comment_text": "", "digests": { "md5": "28ccf7e505369258d667a2b8e18df53e", "sha256": "203c0d78e8a83b20b077b394ca98a75cd4f001402134bdebfc471c3e6e0ed139" }, "downloads": -1, "filename": "krocus-1.0.0.tar.gz", "has_sig": false, "md5_digest": "28ccf7e505369258d667a2b8e18df53e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 13154, "upload_time": "2018-08-13T20:10:02", "url": 
"https://files.pythonhosted.org/packages/17/4e/6ac93907bc49a69a826df28f2125dcec751a24bf611dd145d52189769402/krocus-1.0.0.tar.gz" } ], "1.0.1": [ { "comment_text": "", "digests": { "md5": "4327799e08bda79e968edc5073a3b19f", "sha256": "9d57e152141cc383ef7a1859294e4b82491eff75935b5262cfc5bf308dde3a1a" }, "downloads": -1, "filename": "krocus-1.0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "4327799e08bda79e968edc5073a3b19f", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 21587, "upload_time": "2018-10-06T17:51:26", "url": "https://files.pythonhosted.org/packages/c1/b5/10b9b42f9e012e4c658939144c6e8fae4d3187af40a60730d04d08deecf9/krocus-1.0.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "a1b99b756111df444f61f7b824f45a9c", "sha256": "ad3c54c35c88a06c1f9908b4f3511e6497a42ce67d49cdb78b645382429cd8d6" }, "downloads": -1, "filename": "krocus-1.0.1.tar.gz", "has_sig": false, "md5_digest": "a1b99b756111df444f61f7b824f45a9c", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 28480, "upload_time": "2018-10-06T17:51:29", "url": "https://files.pythonhosted.org/packages/46/0d/53440ecd90421019a92c4ef1aace0cfd069cf5ab5259530e5f1e1fc9be18/krocus-1.0.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "4327799e08bda79e968edc5073a3b19f", "sha256": "9d57e152141cc383ef7a1859294e4b82491eff75935b5262cfc5bf308dde3a1a" }, "downloads": -1, "filename": "krocus-1.0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "4327799e08bda79e968edc5073a3b19f", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 21587, "upload_time": "2018-10-06T17:51:26", "url": "https://files.pythonhosted.org/packages/c1/b5/10b9b42f9e012e4c658939144c6e8fae4d3187af40a60730d04d08deecf9/krocus-1.0.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "a1b99b756111df444f61f7b824f45a9c", "sha256": 
"ad3c54c35c88a06c1f9908b4f3511e6497a42ce67d49cdb78b645382429cd8d6" }, "downloads": -1, "filename": "krocus-1.0.1.tar.gz", "has_sig": false, "md5_digest": "a1b99b756111df444f61f7b824f45a9c", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 28480, "upload_time": "2018-10-06T17:51:29", "url": "https://files.pythonhosted.org/packages/46/0d/53440ecd90421019a92c4ef1aace0cfd069cf5ab5259530e5f1e1fc9be18/krocus-1.0.1.tar.gz" } ] }