{ "info": { "author": "Ben Woodcroft", "author_email": "", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)", "Programming Language :: Python :: 2", "Programming Language :: Python :: 2.7", "Topic :: Scientific/Engineering", "Topic :: Scientific/Engineering :: Bio-Informatics" ], "description": "\n**Table of Contents**\n\n- [SingleM](#singlem)\n - [Generating an OTU table](#generating-an-otu-table)\n - [Further processing of OTU tables](#further-processing-of-otu-tables)\n - [Summarising OTU tables](#summarising-otu-tables)\n - [Calculating beta diversity between samples](#calculating-beta-diversity-between-samples)\n - [Creating and querying SingleM databases](#creating-and-querying-singlem-databases)\n - [Appraising assembly and genome recovery efforts](#appraising-assembly-and-genome-recovery-efforts)\n - [Installation](#installation)\n - [Installation via GNU Guix](#installation-via-gnu-guix)\n - [Installation via DockerHub](#installation-via-dockerhub)\n - [Installation via PyPI](#installation-via-pypi)\n - [Help](#help)\n - [FAQ](#faq)\n - [Can you target the 16S rRNA gene instead of the default set of ribosomal proteins with SingleM?](#can-you-target-the-16s-rrna-gene-instead-of-the-default-set-of-ribosomal-proteins-with-singlem)\n - [How should SingleM be run on multiple samples?](#how-should-singlem-be-run-on-multiple-samples)\n - [License](#license)\n\n\n# SingleM\nWelcome.\n\nSingleM is a tool to find the abundances of discrete operational taxonomic units (OTUs) directly from shotgun metagenome data, without heavy reliance on reference sequence databases. It is able to differentiate closely related species even if those species are from lineages new to science.\n\nWhere [GraftM](https://github.com/geronimp/graftM) can give a taxonomic overview of your community e.g. proportion of a community from a particular taxonomic family, SingleM finds sequence-based OTUs from raw, untrimmed metagenomic reads.\n\nThis gives you the ability to answer questions such as:\n\n* How many different kinds of TM6 do I have?\n* What is the Chao 1 diversity of my sample?\n* Are the Acidobacteria in sample 1 very closely related to the Acidobacteria in sample 2?\n* Do I have population genomes for the main community members?\n* How diverse are the Pelagibacteria relative to the Flavobacteria?\n* Has my genome been observed in any samples submitted to the [SRA](http://www.ncbi.nlm.nih.gov/sra)?\n* Which community members are more likely to assemble?\n\n## Generating an OTU table\n\nAn overview of your community can be obtained like so:\n```\nsinglem pipe --sequences my_sequences.fastq.gz --otu_table otu_table.csv --threads 24\n```\nPlease use **raw** metagenome reads, not quality trimmed reads. Quality trimming with e.g. [Trimmomatic](https://doi.org/10.1093/bioinformatics/btu170) reads often makes them too short for SingleM to use. On the other hand, adapter trimming is unlikely to be detrimental.\n\nThe output table consists of columns:\n```\ngene sample sequence num_hits coverage taxonomy\n4.21.ribosomal_protein_S19_rpsS my_sequences TGGTCGCGCCGTTCGACGGTCACTCCGGACTTCATCGGCCTACAGTTCGCCGTGCACATC 1 1.64 Root; d__Bacteria; p__Proteobacteria; c__Deltaproteobacteria; o__Desulfuromonadales\n4.21.ribosomal_protein_S19_rpsS my_sequences TGGTCGCGGCGCTCAACCATTCTGCCCGAGTTCGTCGGCCACACCGTGGCCGTTCACAAC 1 1.64 Root; d__Bacteria; p__Acidobacteria; c__Solibacteres; o__Solibacterales; f__Solibacteraceae; g__Candidatus_Solibacter; s__Candidatus_Solibacter_usitatus\n```\n1. marker name\n2. sample name\n3. sequence of the OTU\n4. number of reads detected from that OTU\n5. estimated coverage of a genome from this OTU\n6. \"median\" taxonomic classification of each of the reads in the OTU according to [pplacer](http://matsen.fhcrc.org/pplacer/)\n\nCurrently SingleM concentrates on 14 single copy marker genes to provide fine-grained differentiation of species that is independent of the copy-number variation issues that hamper 16S analyses. SingleM is reasonably fast and is quite scalable, although there is much room for improvement. On average, each of the 14 genes better differentiates closely related lineages than a typical 16S amplicon-based study.\n\n## Further processing of OTU tables\n### Summarising OTU tables by rarefying, clustering, etc.\nOnce an OTU table has been generated with the `pipe` command, it can be further processed in various ways using `summarise`:\n\nCreate a [Krona](https://sourceforge.net/p/krona/) plot of the community. The following command generates a Krona file `my_krona.html` which can be viewed in a web browser:\n```\nsinglem summarise --input_otu_tables otu_table.csv --krona my_krona.html\n```\n\nSeveral OTU tables can be combined into one. Note that this is not necessary if the combined output is to be input again into summarise (or many other commands) - it is possible to just specify multiple input tables with `--input_otu_tables`. To combine:\n```\nsinglem summarise --input_otu_tables otu_table1.csv otu_table2.csv --output_otu_table combined.otu_table.csv\n```\n\nCluster sequences, collapsing them into OTUs with less resolution, but with more robustness against sequencing error:\n```\nsinglem summarise --input_otu_tables otu_table.csv --cluster --clustered_output_otu_table clustered.otu_table.csv\n```\n\nRarefy a set of OTU tables so that each sample contains the same number of OTU sequences:\n```\nsinglem summarise --input_otu_tables otu_table.csv other_samples.otu_table.csv --rarefied_output_otu_table rarefied.otu_table.csv --number_to_choose 100\n```\n\nConversion to [BIOM format](http://biom-format.org) for use with QIIME:\n```\nsinglem summarise --input_otu_tables otu_table.csv other_samples.otu_table.csv --biom_prefix myprefix\n```\nThis generates a BIOM table for each marker gene e.g. `myprefix.4.12.ribosomal_protein_L11_rplK.biom`.\n\n### Calculating beta diversity between samples\nAs SingleM generates OTUs that are independent of taxonomy, they can be used as input to beta diversity methods known to be appropriate for the analysis of 16S amplicon studies, of which there are many. We recommend [express beta diversity](https://github.com/dparks1134/ExpressBetaDiversity) (EBD) as it implements many different metrics with a unified interface. For instance to calculate Bray-Curtis beta diversity, first convert your OTU table to unifrac file format using `singlem summarise`. Note that this file format does not contain any phylogenetic information, even if the format is called 'unifrac'.\n```\nsinglem summarise --input_otu_table otu_table.csv --unifrac_by_otu otu_table.unifrac\n```\nThe above commands generates 14 different unifrac format files, one for each marker gene used in SingleM. At this point, you need to choose one table to proceed with. Hopefully, the choice matters little, but it might pay to use multiple tables and ensure that the results are consistent.\n\nTo calculate beta diversity that does not account for the phylogenetic relationships between the OTU sequences, use the EBD script `convertToEBD.py` to convert the unifrac format into ebd format, and calculate the diversity metric:\n```\nconvertToEBD.py otu_table.unifrac.4.12.ribosomal_protein_L11_rplK.unifrac otu_table.ebd\nExpressBetaDiversity -s otu_table.ebd -c Bray-Curtis\n```\nPhylogenetic tree-based methods of calculating beta diversity can also be calculated, but `pipe` must be used to generate a new OTU table using the `diamond_example` taxonomy assignment method so that each OTU is assigned to a single leaf in the tree:\n```\nsinglem pipe --sequences my_sequences.fastq.gz --otu_table otu_table.diamond_example.csv --threads 24 --assignment_method diamond_example\n```\nThen, use the `--unifrac_by_taxonomy` flag to create a unifrac format file indexed by taxonomy identifier:\n```\nsinglem summarise --otu_tables otu_table.diamond_example.csv --unifrac_by_taxonomy otu_table.diamond_example.csv\nconvertToEBD.py otu_table.diamond_example.unifrac otu_table.diamond_example.ebd\n```\nThen, finally run `ExpressBetaDiversity` using the `-t` flag.\n```\nExpressBetaDiversity -s otu_table.diamond_example.ebd -c Bray-Curtis -t \n```\nwhere `` is the newick format file in the SingleM package used to find the OTU sequences. This path can be found using `singlem get_tree`.\n\n\n### Creating and querying SingleM databases\nIt can be useful in some situations to search for sequences in OTU tables. For instance, you may ask \"is the most abundant OTU or anything similar in samples B, C or D?\" To answer this question make a SingleM database from sample B, C & D's OTU tables:\n```\nsinglem makedb --otu_tables sample_B.csv sample_C.csv sample_D.csv --db_path sample_BCD.sdb\n```\n`.sdb` is the conventional file extension for SingleM databases. Then to query this database\n```\nsinglem query --query_sequence TGGTCGCGGCGCTCAACCATTCTGCCCGAGTTCGTCGGCCACACCGTGGCCGTTCACAAC --db sample_BCD.sdb\n```\n\n\n### Appraising assembly and genome recovery efforts \nSingleM can be used to determine how much of a community is represented in an assembly or represented\nby a set of genomes.\n\nThe assessment is carried out by comparing the set of OTU sequences in the\nassembly/genomes to those found from the raw metagenome reads. A similar metric\ncan be estimated by the fraction of reads mapping to either the assembly or the\ngenome, but the approach here is more flexible and has several advantages:\n\n1. We can determine which specific lineages are missing\n2. We can match OTU sequences imperfectly, so we can e.g. make statements about whether a genus level representative genome has been recovered\n3. Since the metric assesses only single copy marker genes, the appraisal is on a per-cell basis, not per-read\n4. Some care must be taken, but we can prevent Eukaryotic DNA from skewing the estimate\n\nTo assess how well a set of sequences represent a metagenome, first run `pipe`\non both the genomes and the raw reads, and then use `appraise`:\n```\nsinglem pipe --sequences raw.fq.gz --otu_table metagenome.otu_table.csv\nsinglem pipe --sequences my_genomes/*.fasta --otu_table genomes.otu_table.csv\nsinglem appraise --metagenome_otu_tables metagenome.otu_table.csv --genome_otu_tables genomes.otu_table.csv\n```\nOne may also accommodate some sequence differences, with `--imperfect`, or\noutput OTU tables of those sequences that match and those that do not (see\n`singlem appraise -h`). Assessing assemblies is similar to assessing genomes,\nexcept that when assessing bins duplicate markers from the same genome are\nexcluded as likely contamination.\n\nAn appraisal can also be represented visually, using `appraise --plot`:\n\n![Image of appraise](https://raw.githubusercontent.com/wwood/singlem/master/appraise_plot.png)\n\nEach rectangle represents a single OTU sequence where its size represents its\nabundance (the number of reads that OTU represents in the metagenome). Colours\nrepresent 89% OTU clustering of these sequences (~genus level), with the\ntaxonomy of the most common sequence written out. Here we see that highly\nabundant OTUs in SRR5040536 were assembled, but only 1 of the 3 abundant\nGallionellales OTUs has an associated bin. As is common, the highest abundance\nlineages did not necessarily assemble and bin successfully. The marker\n`4.20.ribosomal_protein_S15P_S13e` was chosen as the representative marker\nbecause it has a representative fraction of OTUs binned, assembled and\nunassembled.\n\n\n### Installation\n\n#### Installation via GNU Guix\nThe most straightforward way of installing SingleM is to use the GNU Guix package which is part of the ACE Guix package collection. This method installs not just the Python libraries required but the compiled bioinformatics tools needed as well. Once you have installed Guix, clone the ACE collection and install:\n```\ngit clone https://github.com/Ecogenomics/ace-guix\nGUIX_PACKAGE_PATH=ace-guix guix package --install singlem\n```\nBeyond installing GNU Guix, super-user privileges are not required for installation.\n\n#### Installation via DockerHub\nA docker image generated from the Guix package is available on DockerHub. After installing Docker, run the following, replacing `[RELEASE_TAG]` with a tag from https://hub.docker.com/r/wwood/singlem/tags:\n```\ndocker pull wwood/singlem:[RELEASE_TAG]\n```\nIf the sequence data to be analyzed is in the current working directory, SingleM can be used like so:\n```\ndocker run -v `pwd`:`pwd` wwood/singlem:[RELEASE_TAG] pipe --sequences `pwd`/my.fastq.gz --otu_table `pwd`/my.otu_table.csv --threads 14\n```\n\n#### Installation via PyPI\nTo install the Python libraries required:\n```\npip install graftm\npip install singlem\n```\nYou may need super-user privileges.\n\nSingleM also has the following non-Python dependencies:\n* [smafa](https://github.com/wwood/smafa)\n* [VSEARCH](https://github.com/torognes/vsearch)\n\nSome dependencies of [GraftM](https://github.com/geronimp/graftM):\n* [OrfM](https://github.com/wwood/OrfM) >= 0.2.0 \n* [HMMER](http://hmmer.janelia.org/) >= 3.1b1 \n* [fxtract](https://github.com/ctSkennerton/fxtract)\n* [pplacer](http://matsen.fhcrc.org/pplacer/) >= 1.1.alpha17\n* [KronaTools](http://sourceforge.net/p/krona/home/krona/) >= 2.4\n* [diamond](https://github.com/bbuchfink/diamond) >= 0.9\n\n## Help\nIf you have any questions or comments, send a message to the [SupportM mailing list](https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!forum/supportm) or raise a [GitHib issue](https://github.com/wwood/singlem/issues).\n\n### FAQ\n#### Can you target the 16S rRNA gene instead of the default set of ribosomal proteins with SingleM?\nYes. By default, SingleM builds OTU tables from ribosomal protein genes rather than 16S because this in general gives more strain-level resolution due to redundancy in the genetic code. If you are really keen on using 16S, then you can use SingleM with a 16S SingleM package (spkg). There is a repository of auxiliary packages at https://github.com/wwood/singlem_extra_packages including a 16S package that is suitable for this purpose. The resolution won't be as high taxonomically, and there are issues around copy number variation, but it could be useful to use 16S for various reasons e.g. linking it to an amplicon study or using the GreenGenes taxonomy. For now there's no 16S spkg that gets installed by default, you have to use the `--singlem_packages` flag in `pipe` mode pointing to a separately downloaded package - see https://github.com/wwood/singlem_extra_packages/blob/master/README.md.\n\n#### How should SingleM be run on multiple samples?\nThere are two ways. It is possible to specify multiple input files to the `singlem pipe` subcommand directly by space separating them. Alternatively `singlem pipe` can be run on each sample and OTU tables combined using `singlem summarise`. The results should be identical, though there are some performance trade-offs. For large numbers of samples (>100) it is probably preferable to run each sample individually or in smaller groups.\n\n## License\nSingleM is written by [Ben Woodcroft](http://ecogenomic.org/personnel/dr-ben-woodcroft) (@wwood) at the [Australian Centre for Ecogenomics (UQ)](http://ecogenomic.org/) and is licensed under [GPL3 or later](https://gnu.org/licenses/gpl.html).\n", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/wwood/SingleM", "keywords": "metagenomics bioinformatics", "license": "GPL3+", "maintainer": "", "maintainer_email": "", "name": "singlem", "package_url": "https://pypi.org/project/singlem/", "platform": "", "project_url": "https://pypi.org/project/singlem/", "project_urls": { "Homepage": "https://github.com/wwood/SingleM" }, "release_url": "https://pypi.org/project/singlem/0.12.1/", "requires_dist": null, "requires_python": "", "summary": "Find de-novo operational taxonomic units (OTUs) from metagenome data", "version": "0.12.1" }, "last_serial": 5001420, "releases": { "0.11.0": [ { "comment_text": "", "digests": { "md5": "0f51c1fc71b2b0b1f086274e22bdaee0", "sha256": "fcb8f799b426c425c471ac8a7bbf5d7e8b16b614e174f6cf05bac9f262adad4e" }, "downloads": -1, "filename": "singlem-0.11.0.tar.gz", "has_sig": false, "md5_digest": "0f51c1fc71b2b0b1f086274e22bdaee0", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 79862779, "upload_time": "2018-07-31T06:56:58", "url": "https://files.pythonhosted.org/packages/9c/7c/bdc416d19613e03f631006b9e63355480dfd5438007172002652014dccd5/singlem-0.11.0.tar.gz" } ], "0.12.0": [ { "comment_text": "", "digests": { "md5": "027d1564a832f4ab238b0acb74bea91b", "sha256": "11192bc050d28eb8b9c0e84aab64b1cfe59dcd7cb4fc0b0a86718d1fbaaebd98" }, "downloads": -1, "filename": "singlem-0.12.0.tar.gz", "has_sig": false, "md5_digest": "027d1564a832f4ab238b0acb74bea91b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 79860731, "upload_time": "2019-02-24T23:45:37", "url": "https://files.pythonhosted.org/packages/a3/67/58f5f073fa50c02cf0ccbade3bc0823976f29d12545e48a0b2f2c45288f9/singlem-0.12.0.tar.gz" } ], "0.12.1": [ { "comment_text": "", "digests": { "md5": "c4f97bad6515e344b26331583736b0c4", "sha256": "598278f553d5bcd775531e295d6272c42e19b7ca4c72d673967f958b364774a8" }, "downloads": -1, "filename": "singlem-0.12.1.tar.gz", "has_sig": false, "md5_digest": "c4f97bad6515e344b26331583736b0c4", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 78453897, "upload_time": "2019-03-29T02:48:59", "url": "https://files.pythonhosted.org/packages/68/8c/8bd9277ff17836d14658186469cb05376569f052f90cb3a82a449ac7eea4/singlem-0.12.1.tar.gz" } ], "0.7.1": [ { "comment_text": "", "digests": { "md5": "8806f295350304210d7300807d42d16d", "sha256": "6e129ad769d0ab411221ecd7be4ab7cb11b15c999d8ab11a59f4f93d0a9efb32" }, "downloads": -1, "filename": "singlem-0.7.1.tar.gz", "has_sig": false, "md5_digest": "8806f295350304210d7300807d42d16d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 39415229, "upload_time": "2017-02-11T06:32:05", "url": "https://files.pythonhosted.org/packages/35/55/72780c79146b2ccd465016b28eccbe6c326493f5a8db3c489eda3382fbb1/singlem-0.7.1.tar.gz" } ], "0.8.0": [ { "comment_text": "", "digests": { "md5": "58033fd17729ecfd6b28e309236f6894", "sha256": "6c0213990fd1ebb2a3c18b5bfa919143c0bdb53bbe9d53d4740e3206b9a151d2" }, "downloads": -1, "filename": "singlem-0.8.0.tar.gz", "has_sig": false, "md5_digest": "58033fd17729ecfd6b28e309236f6894", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 38893725, "upload_time": "2017-07-14T04:17:01", "url": "https://files.pythonhosted.org/packages/17/8b/f553c11bbb3e950516589aa26299b85098d0ac78405a4ebacefa023de306/singlem-0.8.0.tar.gz" } ], "0.8.1": [ { "comment_text": "", "digests": { "md5": "f14a8e93398ba7cf0abb1c93981db00c", "sha256": "d8a7025f7526cd3a589369085a1cc39227aaca13c9d9ef222c1d9d972c3a6c9e" }, "downloads": -1, "filename": "singlem-0.8.1.tar.gz", "has_sig": false, "md5_digest": "f14a8e93398ba7cf0abb1c93981db00c", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 38892788, "upload_time": "2017-08-14T00:10:49", "url": "https://files.pythonhosted.org/packages/b6/f9/9538a096be3c2cc820642d87a626e45ce03f55716175025605b2d33f5e39/singlem-0.8.1.tar.gz" } ], "0.8.2": [ { "comment_text": "", "digests": { "md5": "b4f46be616d05530237eb208d819f151", "sha256": "2b5704510674e44c81d29b73f937853b84715884f802d737ce4a6e68379665a5" }, "downloads": -1, "filename": "singlem-0.8.2.tar.gz", "has_sig": false, "md5_digest": "b4f46be616d05530237eb208d819f151", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 44891666, "upload_time": "2018-01-14T05:41:49", "url": "https://files.pythonhosted.org/packages/ca/c5/95b117cf1ebc381cf922280dd0aaaf9967777c903c8a079330728a278709/singlem-0.8.2.tar.gz" } ], "0.9.0": [ { "comment_text": "", "digests": { "md5": "7294294a5ee88906df1e355e96a2a1ad", "sha256": "9dc7786c23545f27edea2d9c34f11cbbc30a24825ad34636161cb996262dd7c6" }, "downloads": -1, "filename": "singlem-0.9.0.tar.gz", "has_sig": false, "md5_digest": "7294294a5ee88906df1e355e96a2a1ad", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 45061209, "upload_time": "2018-02-21T05:24:21", "url": "https://files.pythonhosted.org/packages/9d/8d/156892825f51ea215336c9171ba98ddd3003bc72538232c3e5cc20840b9c/singlem-0.9.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "c4f97bad6515e344b26331583736b0c4", "sha256": "598278f553d5bcd775531e295d6272c42e19b7ca4c72d673967f958b364774a8" }, "downloads": -1, "filename": "singlem-0.12.1.tar.gz", "has_sig": false, "md5_digest": "c4f97bad6515e344b26331583736b0c4", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 78453897, "upload_time": "2019-03-29T02:48:59", "url": "https://files.pythonhosted.org/packages/68/8c/8bd9277ff17836d14658186469cb05376569f052f90cb3a82a449ac7eea4/singlem-0.12.1.tar.gz" } ] }