{ "info": { "author": "Andrew J. Page", "author_email": "andrew.page@quadram.ac.uk", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Intended Audience :: Science/Research", "License :: OSI Approved :: GNU General Public License v3 (GPLv3)", "Programming Language :: Python :: 3 :: Only", "Topic :: Scientific/Engineering :: Bio-Informatics" ], "description": "# TipToft\nGiven some raw uncorrected long reads, such as those from PacBio or Oxford Nanopore, predict which plasmids should be present. Assemblies of long read data can often miss plasmids, particularly if they are very small or have a copy number which is too high/low when compared to the chromosome. This software gives you an indication of which plasmids to expect, flagging potential issues with an assembly.\n\n[![Build Status](https://travis-ci.org/andrewjpage/tiptoft.svg?branch=master)](https://travis-ci.org/andrewjpage/tiptoft)\n[![License: GPL v3](https://img.shields.io/badge/License-GPL%20v3-brightgreen.svg)](https://github.com/andrewjpage/tiptoft/blob/master/LICENSE)\n[![codecov](https://codecov.io/gh/andrewjpage/tiptoft/branch/master/graph/badge.svg)](https://codecov.io/gh/andrewjpage/tiptoft)\n[![Docker Build Status](https://img.shields.io/docker/build/andrewjpage/tiptoft.svg)](https://hub.docker.com/r/andrewjpage/tiptoft)\n[![Docker Pulls](https://img.shields.io/docker/pulls/andrewjpage/tiptoft.svg)](https://hub.docker.com/r/andrewjpage/tiptoft) \n\n# Paper\nComing soon.\n\nPlease remember to cite the PlasmidFinder paper as their database makes this software work:\n\nCarattoli *et al*, *In Silico Detection and Typing of Plasmids using PlasmidFinder and Plasmid Multilocus Sequence Typing*, **Antimicrob Agents Chemother.** 2014;58(7):3895\u20133903. [view](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4068535/)\n\n\n# Installation\nThe only dependencies are Python3 and a compiler (gcc, clang, ...), and this should work on Linux or OSX. 
Cython needs to be installed in advance. Assuming you have Python 3.4+ and pip installed, just run:\n```\npip3 install cython\npip3 install tiptoft\n```\n\nor if you wish to install the latest development version:\n```\npip3 install git+https://github.com/andrewjpage/tiptoft.git\n```\n\n## Debian/Ubuntu (Trusty/Xenial)\nTo install Python3 on Ubuntu run:\n```\nsudo apt-get update -qq\nsudo apt-get install -y git python3 python3-setuptools python3-biopython python3-pip\npip3 install cython\npip3 install tiptoft\n```\n\n## Docker\nInstall [Docker](https://www.docker.com/). There is a docker container which gets automatically built from the latest version of TipToft. To install it:\n\n```\ndocker pull andrewjpage/tiptoft\n```\n\nTo use it, run a command such as this (substituting in your filename/directories), using the example file in this repository:\n```\ndocker run --rm -it -v /path/to/example_data:/example_data andrewjpage/tiptoft tiptoft /example_data/ERS654932_plasmids.fastq.gz\n```\n\n## Homebrew\nInstall [Brew](https://brew.sh/) for OSX or [LinuxBrew](http://linuxbrew.sh/) for Linux, then run:\n\n```\nbrew install python # this is python v3\npip3 install cython\npip3 install tiptoft\n```\n## Bioconda\nInstall [Bioconda](http://bioconda.github.io/), then run:\n\n```\nconda install tiptoft\n```\n\n## Windows\nIt has been reported that the software works when using Ubuntu on Windows 10. This is not a supported platform as the authors don't use Windows, so use at your own risk.\n\n# Usage\n## tiptoft_database_downloader script\nFirst of all you need a plasmid database from PlasmidFinder. There is a snapshot bundled with this repository for your convenience, or alternatively you can use the downloader script to get the latest data. You will need internet access for this step. 
Please remember to cite the PlasmidFinder paper.\n\n```\nusage: tiptoft_database_downloader [options] output_prefix\n\nDownload PlasmidFinder database\n\npositional arguments:\n output_prefix Output prefix\n\noptional arguments:\n -h, --help show this help message and exit\n --verbose, -v Turn on debugging (default: False)\n --version show program's version number and exit\n```\n\nJust run:\n```\ntiptoft_database_downloader \n```\nYou will now have a file called 'plasmid_files.fa' which can be used with the main script.\n\n## tiptoft script\nThis is the main script of the application. The only mandatory input is a FASTQ file of long reads, which can optionally be gzipped.\n```\nusage: tiptoft [options] input.fastq\n\nplasmid incompatibility group prediction from uncorrected long reads\n\npositional arguments:\n input_fastq Input FASTQ file (optionally gzipped)\n\noptional arguments:\n -h, --help show this help message and exit\n\nOptional input arguments:\n --plasmid_data PLASMID_DATA, -d PLASMID_DATA\n FASTA file containing plasmid data from downloader\n script, defaults to bundled database (default: None)\n --kmer KMER, -k KMER k-mer size (default: 13)\n\nOptional output arguments:\n --filtered_reads_file FILTERED_READS_FILE, -f FILTERED_READS_FILE\n Filename to save matching reads to (default: None)\n --output_file OUTPUT_FILE, -o OUTPUT_FILE\n Output file [STDOUT] (default: None)\n --print_interval PRINT_INTERVAL, -p PRINT_INTERVAL\n Print results every this number of reads (default:\n None)\n --verbose, -v Turn on debugging [False]\n --version show program's version number and exit\n\nOptional advanced input arguments:\n --max_gap MAX_GAP Maximum gap for blocks to be contigous, measured in\n multiples of the k-mer size (default: 3)\n --margin MARGIN Flanking region around a block to use for mapping\n (default: 10)\n --min_block_size MIN_BLOCK_SIZE\n Minimum block size in bases (default: 130)\n --min_fasta_hits MIN_FASTA_HITS, -m MIN_FASTA_HITS\n Minimum No. 
of kmers matching a read (default: 10)\n --min_perc_coverage MIN_PERC_COVERAGE, -c MIN_PERC_COVERAGE\n Minimum percentage coverage of typing sequence to\n report (default: 85)\n --min_kmers_for_onex_pass MIN_KMERS_FOR_ONEX_PASS\n Minimum No. of kmers matching a read in 1st pass\n (default: 10)\n```\n\n### Required argument\n\n__input_fastq__: This is a single FASTQ file. It can be optionally gzipped. Alternatively input can be read from stdin by using the dash character (-) as the input file name. The file must contain long reads, such as those from PacBio or Oxford Nanopore. The quality scores are ignored.\n\n### Optional input arguments\n\n__plasmid_data__: This is a FASTA file containing all of the plasmid typing sequences. This is generated by the tiptoft_database_downloader script. It comes from the PlasmidFinder website, so please be sure to cite their paper (citation gets printed every time you run the script).\n\n__kmer__: The most important parameter. 13 works well for Nanopore, 15 works well for PacBio, but you may need to play around with it for your data. Long reads have a high error rate, so if you set this too high, nothing will match (because every k-mer will contain errors). If you set it too low, everything will match, which isn't much use to you. Thinking about your data, on average how long a stretch of bases can you get in your read without errors? This is what you should set your kmer to. For example, if you have an average of 1 error every 10 bases, then the ideal kmer would be 9.\n\n### Optional output arguments\n\n__filtered_reads_file__: Save the reads which contain the rep/inc sequences to a new FASTQ file. This is useful if you want to undertake a further assembly just on the plasmids. This file should not already exist. \n\n__output_file__: By default the results are printed to STDOUT. 
If you provide an output filename (which must not exist already), it will print the results to the file.\n\n__print_interval__: By default the whole file is processed and the final results are printed out. However, you can get intermediate results printed after every X number of reads, which is useful if you are doing real time streaming of data into the application and can halt when you have enough information. They are separated by \"****\". \n\n__verbose__: Enable debugging mode where lots of extra output is printed to STDOUT.\n\n__version__: Print the version number and exit.\n\n\n### Optional advanced input arguments\n\n__max_gap__: Maximum gap for blocks to be contiguous, measured in multiples of the k-mer size. This allows for short regions of elevated errors in the reads to be spanned.\n\n__margin__: Expand the analysis to look at a few bases on either side of where the sequence is predicted to be on the read. This allows for k-mers to overlap the ends.\n\n__min_block_size__: This is the minimum size of a sub-read to consider for in-depth analysis after matching k-mers have been identified in the read. This speeds up the analysis quite a bit, but there is the risk that some reads may be missed, particularly if they have partial rep/inc sequences.\n\n__min_fasta_hits__: This is the minimum number of matching kmers in a read, for the read to be considered for analysis. It is a hard minimum threshold to speed up analysis.\n\n__min_perc_coverage__: Only report rep/inc sequences above this percentage coverage. Coverage in this instance is kmer coverage of the underlying sequence (rather than depth of coverage).\n\n__min_kmers_for_onex_pass__: The number of k-mers that must be present in the read for the initial onex pass of the database to be considered for further analysis. 
This speeds up the analysis quite a bit, but there is the risk that some reads may be missed, particularly if they have partial rep/inc sequences.\n\n# Output\nThe output is tab-delimited and printed to STDOUT by default. You can optionally print it to a file using the '-o' parameter. If you would like to see intermediate results, you can tell it to print every X reads with the '-p' parameter, separated by '****'. An example of the output is:\n\n```\nGENE\tCOMPLETENESS\t%COVERAGE\tACCESSION\tDATABASE\tPRODUCT\nrep7.1\tFull\t100\tAB037671\tplasmidfinder\trep7.1_repC(Cassette)_AB037671\nrep7.5\tPartial\t99\tAF378372\tplasmidfinder\trep7.5_CDS1(pKC5b)_AF378372\nrep7.6\tPartial\t94\tSAU38656\tplasmidfinder\trep7.6_ORF(pKH1)_SAU38656\nrep7.9\tFull\t100\tNC007791\tplasmidfinder\trep7.9_CDS3(pUSA02)_NC007791\nrep7.10\tPartial\t91\tNC_010284.1\tplasmidfinder\trep7.10_repC(pKH17)_NC_010284.1\nrep7.12\tPartial\t93\tGQ900417.1\tplasmidfinder\trep7.12_rep(SAP060B)_GQ900417.1\nrep7.17\tFull\t100\tAM990993.1\tplasmidfinder\trep7.17_repC(pS0385-1)_AM990993.1\nrep20.11\tFull\t100\tAP003367\tplasmidfinder\trep20.11_repA(VRSAp)_AP003367\nrepUS14.\tFull\t100\tAP003367\tplasmidfinder\trepUS14._repA(VRSAp)_AP003367\n```\n\n__GENE__: The first column is the first part of the product name. \n\n__COMPLETENESS__: If all of the k-mers in the gene are found in the reads, the completeness is noted as 'Full', otherwise if there are some k-mers missing, it is noted as 'Partial'. \n\n__%COVERAGE__: The percentage coverage is the percentage of underlying k-mers in the gene where at least 1 matching k-mer has been found in the reads. 100 indicates that every k-mer in the gene is covered. Low coverage results are not shown (controlled by the --min_perc_coverage parameter).\n\n__ACCESSION__: This is the accession number from where the typing sequence originates. 
You can look this up at NCBI or EBI.\n\n__DATABASE__: This is where the data has come from, which is currently always plasmidfinder.\n\n__PRODUCT__: This is the full product of the gene as found in the database.\n\n# Example usage\nA real [test file](https://github.com/andrewjpage/tiptoft/raw/master/example_data/ERS654932_plasmids.fastq.gz) is bundled in the repository. Download it, then run:\n\n```\ntiptoft ERS654932_plasmids.fastq.gz\n```\n\nThe [expected output](https://raw.githubusercontent.com/andrewjpage/tiptoft/master/example_data/expected_output) is in the repository. This uses the bundled database; if you wish to use the latest database, you should run the tiptoft_database_downloader script.\n\n# Resource usage\nAn 800 MB FASTQ file (unzipped) of long reads from an Oxford Nanopore MinION, containing Salmonella, required 80 MB of RAM and took under 1 minute to process.\n\n## License\nTipToft is free software, licensed under [GPLv3](https://github.com/andrewjpage/tiptoft/blob/master/GPL-LICENSE).\n\n## Feedback/Issues\nPlease report any issues to the [issues page](https://github.com/andrewjpage/tiptoft/issues).\n\n## Contribute to the software\nIf you wish to fix a bug or add new features to the software we welcome Pull Requests. 
Please fork the repo, make the change, then submit a Pull Request with details about what the change is and what it fixes/adds.\n\n\n\n", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/andrewjpage/tiptoft", "keywords": "", "license": "GPLv3", "maintainer": "", "maintainer_email": "", "name": "tiptoft", "package_url": "https://pypi.org/project/tiptoft/", "platform": "", "project_url": "https://pypi.org/project/tiptoft/", "project_urls": { "Homepage": "https://github.com/andrewjpage/tiptoft" }, "release_url": "https://pypi.org/project/tiptoft/1.0.0/", "requires_dist": [ "biopython (>=1.68)", "pyfastaq (>=3.12.0)", "cython" ], "requires_python": "", "summary": "tiptoft: predict which plasmid should be present from uncorrected long read data", "version": "1.0.0" }, "last_serial": 4485594, "releases": { "0.1.1": [ { "comment_text": "", "digests": { "md5": "462ebe46485a6cf3f4c7b66f8f337481", "sha256": "73b3589d37fb6711f4e83207d8bf1b384e30c648f38587f5312cfba9b7c0ee46" }, "downloads": -1, "filename": "tiptoft-0.1.1-cp36-cp36m-macosx_10_9_x86_64.whl", "has_sig": false, "md5_digest": "462ebe46485a6cf3f4c7b66f8f337481", "packagetype": "bdist_wheel", "python_version": "cp36", "requires_python": null, "size": 84732, "upload_time": "2018-09-27T18:00:07", "url": "https://files.pythonhosted.org/packages/8c/68/fa930a16789bb02d764821f07aa550827f999381029d0532632e50acefb5/tiptoft-0.1.1-cp36-cp36m-macosx_10_9_x86_64.whl" }, { "comment_text": "", "digests": { "md5": "67199c13151290495050d4499c21384b", "sha256": "aa26f2df73b78ba5e6c873ac373f0e98d14abab6908cdda62f0f9427f0109f87" }, "downloads": -1, "filename": "tiptoft-0.1.1.tar.gz", "has_sig": false, "md5_digest": "67199c13151290495050d4499c21384b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 86987, "upload_time": "2018-09-27T18:00:10", "url": 
"https://files.pythonhosted.org/packages/8d/1e/0df98a3565f3f656ca625bc427804de61282593c0fed17fcb14da7605891/tiptoft-0.1.1.tar.gz" } ], "0.1.2": [ { "comment_text": "", "digests": { "md5": "2b85dcf9eb7cc487278cd3f21830af21", "sha256": "f464fbf9372ae9c65af8d71d9ed150a31a97d500a139382fdd3bd79edf471e00" }, "downloads": -1, "filename": "tiptoft-0.1.2-cp36-cp36m-macosx_10_9_x86_64.whl", "has_sig": false, "md5_digest": "2b85dcf9eb7cc487278cd3f21830af21", "packagetype": "bdist_wheel", "python_version": "cp36", "requires_python": null, "size": 86749, "upload_time": "2018-10-02T08:25:15", "url": "https://files.pythonhosted.org/packages/05/56/349ad4261a760440f4dcfe1668e6cdb28ac7b0b5db1a1041dcd8a9cc5f45/tiptoft-0.1.2-cp36-cp36m-macosx_10_9_x86_64.whl" }, { "comment_text": "", "digests": { "md5": "e46b602371ceba5346d2204053991888", "sha256": "2b7d19291ce3443a4e54cbb59e5ac2711cb4337d69c7860dca8f113509e5cceb" }, "downloads": -1, "filename": "tiptoft-0.1.2.tar.gz", "has_sig": false, "md5_digest": "e46b602371ceba5346d2204053991888", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 88386, "upload_time": "2018-10-02T08:25:16", "url": "https://files.pythonhosted.org/packages/b7/81/e4f1fff36e4c12b3d25c92b0948bce5e454fd7132b87ede88a1d420874ee/tiptoft-0.1.2.tar.gz" } ], "0.1.3": [ { "comment_text": "", "digests": { "md5": "85b343afed16efcf9c7e2b0e7074fb4d", "sha256": "fa53da8ce29b508d6316643533f220c88efc88ddd951fe0b5fe24315c0fa1d65" }, "downloads": -1, "filename": "tiptoft-0.1.3-cp36-cp36m-macosx_10_9_x86_64.whl", "has_sig": false, "md5_digest": "85b343afed16efcf9c7e2b0e7074fb4d", "packagetype": "bdist_wheel", "python_version": "cp36", "requires_python": null, "size": 86617, "upload_time": "2018-10-02T08:58:28", "url": "https://files.pythonhosted.org/packages/ac/b3/76ab9ea2348e09bdce60182d9748de2573582fdae12479792fe859fc8c34/tiptoft-0.1.3-cp36-cp36m-macosx_10_9_x86_64.whl" }, { "comment_text": "", "digests": { "md5": "64a128f0ec9b436072092c231c13461d", 
"sha256": "4e0f994867f146361eb28f2040bab599412f4e238b3e7541e533664e856ed8e3" }, "downloads": -1, "filename": "tiptoft-0.1.3.tar.gz", "has_sig": false, "md5_digest": "64a128f0ec9b436072092c231c13461d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 88479, "upload_time": "2018-10-02T08:58:30", "url": "https://files.pythonhosted.org/packages/c0/9a/3b39936b78e2d0c1aeeb61ec6a6b03737e4bae7e5874ea121dbb6b3d0588/tiptoft-0.1.3.tar.gz" } ], "0.1.4": [ { "comment_text": "", "digests": { "md5": "8b1c537e31130d68aa7a06184d091b46", "sha256": "4c59a738a2eca5a4ba83828ea9d3a1d79b364dd12f4a99f12195f99627d84f96" }, "downloads": -1, "filename": "tiptoft-0.1.4-cp36-cp36m-macosx_10_9_x86_64.whl", "has_sig": false, "md5_digest": "8b1c537e31130d68aa7a06184d091b46", "packagetype": "bdist_wheel", "python_version": "cp36", "requires_python": null, "size": 86617, "upload_time": "2018-10-06T17:29:38", "url": "https://files.pythonhosted.org/packages/97/58/b5c33542598aca44c74ce70615b482878845105e42085ffb02d995823185/tiptoft-0.1.4-cp36-cp36m-macosx_10_9_x86_64.whl" }, { "comment_text": "", "digests": { "md5": "cb9a45ae0756c274f2dbe080bcf659eb", "sha256": "167b7194599d6eea7b2c0fa596d0a224991cba867aa4fd067e5677b0e6bfbabf" }, "downloads": -1, "filename": "tiptoft-0.1.4.tar.gz", "has_sig": false, "md5_digest": "cb9a45ae0756c274f2dbe080bcf659eb", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 105155, "upload_time": "2018-10-06T17:29:39", "url": "https://files.pythonhosted.org/packages/e3/78/f8771c111c68508507fe83f15b4d7c1811ae31262918b0860a9526f8c7ab/tiptoft-0.1.4.tar.gz" } ], "1.0.0": [ { "comment_text": "", "digests": { "md5": "bc69ba9256bfaee834d9d127d30d8bc5", "sha256": "4b3f779c4be4c52f401f44e7ea3f4305adad6c684621a8758715bfccd550011b" }, "downloads": -1, "filename": "tiptoft-1.0.0-cp36-cp36m-macosx_10_9_x86_64.whl", "has_sig": false, "md5_digest": "bc69ba9256bfaee834d9d127d30d8bc5", "packagetype": "bdist_wheel", 
"python_version": "cp36", "requires_python": null, "size": 87023, "upload_time": "2018-11-14T13:17:50", "url": "https://files.pythonhosted.org/packages/2f/5a/083796eb276c1a578d721cc2f6db0f3227ea1719c90ae8961669377250ad/tiptoft-1.0.0-cp36-cp36m-macosx_10_9_x86_64.whl" }, { "comment_text": "", "digests": { "md5": "3b20f0ebd041f5070ed5ab866d4a9d6c", "sha256": "271badd22f2a8e239f0c1a89b06380176535b22f48470ae87aa716c177cddd7b" }, "downloads": -1, "filename": "tiptoft-1.0.0.tar.gz", "has_sig": false, "md5_digest": "3b20f0ebd041f5070ed5ab866d4a9d6c", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 105586, "upload_time": "2018-11-14T13:17:53", "url": "https://files.pythonhosted.org/packages/56/fd/07cee0bceaead935df9a7a53e9ab568dad323052bc742c15c51d53cfb1ee/tiptoft-1.0.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "bc69ba9256bfaee834d9d127d30d8bc5", "sha256": "4b3f779c4be4c52f401f44e7ea3f4305adad6c684621a8758715bfccd550011b" }, "downloads": -1, "filename": "tiptoft-1.0.0-cp36-cp36m-macosx_10_9_x86_64.whl", "has_sig": false, "md5_digest": "bc69ba9256bfaee834d9d127d30d8bc5", "packagetype": "bdist_wheel", "python_version": "cp36", "requires_python": null, "size": 87023, "upload_time": "2018-11-14T13:17:50", "url": "https://files.pythonhosted.org/packages/2f/5a/083796eb276c1a578d721cc2f6db0f3227ea1719c90ae8961669377250ad/tiptoft-1.0.0-cp36-cp36m-macosx_10_9_x86_64.whl" }, { "comment_text": "", "digests": { "md5": "3b20f0ebd041f5070ed5ab866d4a9d6c", "sha256": "271badd22f2a8e239f0c1a89b06380176535b22f48470ae87aa716c177cddd7b" }, "downloads": -1, "filename": "tiptoft-1.0.0.tar.gz", "has_sig": false, "md5_digest": "3b20f0ebd041f5070ed5ab866d4a9d6c", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 105586, "upload_time": "2018-11-14T13:17:53", "url": "https://files.pythonhosted.org/packages/56/fd/07cee0bceaead935df9a7a53e9ab568dad323052bc742c15c51d53cfb1ee/tiptoft-1.0.0.tar.gz" } ] }