{ "info": { "author": "Daniel McDonald", "author_email": "wasade@gmail.com", "bugtrack_url": null, "classifiers": [], "description": "# UniFrac\n##### Canonically pronounced *yew-nih-frak*\n\n[![Build Status](https://travis-ci.org/biocore/unifrac.svg?branch=master)](https://travis-ci.org/biocore/unifrac)\n\nThe *de facto* repository for high-performance phylogenetic diversity calculations. The methods in this repository are based on an implementation of the [Strided State UniFrac](https://www.nature.com/articles/s41592-018-0187-8) algorithm, which is faster and uses less memory than [Fast UniFrac](http://www.nature.com/ismej/journal/v4/n1/full/ismej200997a.html). Strided State UniFrac supports [Unweighted UniFrac](http://aem.asm.org/content/71/12/8228.abstract), [Weighted UniFrac](http://aem.asm.org/content/73/5/1576), [Generalized UniFrac](https://academic.oup.com/bioinformatics/article/28/16/2106/324465/Associating-microbiome-composition-with), [Variance Adjusted UniFrac](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-12-118) and [meta UniFrac](http://www.pnas.org/content/105/39/15076.short).\nThis repository also includes Stacked Faith (manuscript in preparation), a method for calculating Faith's PD that is faster and uses less memory than the Fast UniFrac-based [reference implementation](http://scikit-bio.org/).\n\nThis repository produces a C API, exposed via a shared library, that can be linked against from any programming language. \n\n# Citation\n\nA detailed description of the Strided State UniFrac algorithm can be found in [McDonald et al. 2018 Nature Methods](https://www.nature.com/articles/s41592-018-0187-8). Please note that this package implements multiple UniFrac variants, each of which may have its own citation. Details can be found in the citations section of the command line interface's help output, and are reproduced immediately below:\n\n ssu\n For UniFrac, please see:\n McDonald et al. 
Nature Methods 2018; DOI: 10.1038/s41592-018-0187-8\n Lozupone and Knight Appl Environ Microbiol 2005; DOI: 10.1128/AEM.71.12.8228-8235.2005\n Lozupone et al. Appl Environ Microbiol 2007; DOI: 10.1128/AEM.01996-06\n Hamady et al. ISME 2010; DOI: 10.1038/ismej.2009.97\n Lozupone et al. ISME 2011; DOI: 10.1038/ismej.2010.133\n For Generalized UniFrac, please see: \n Chen et al. Bioinformatics 2012; DOI: 10.1093/bioinformatics/bts342\n For Variance Adjusted UniFrac, please see: \n Chang et al. BMC Bioinformatics 2011; DOI: 10.1186/1471-2105-12-118\n\n faithpd\n For Faith's PD, please see:\n Faith Biological Conservation 1992; DOI: 10.1016/0006-3207(92)91201-3\n\n# Install\n\nAt this time, there are two primary ways to install the library. The first is through QIIME 2, and the second is via `pip`. It is also possible to clone the repository and install using either the `sucpp/Makefile` or `setup.py`. \n\nCompilation has been performed with both LLVM 9.0.0 (OS X >= 10.12) and GCC 4.9.2 (CentOS >= 6), using HDF5 >= 1.8.17. Python installation requires Python >= 3.5, NumPy >= 1.12.1, scikit-bio >= 0.5.1, and Cython >= 0.28.3. \n\nInstallation time should be a few minutes at most.\n\n## Install (QIIME2)\n\nThe easiest way to use this library is through [QIIME2](https://docs.qiime2.org/2019.4/install/). The implementation of this algorithm is installed by default and is available under `qiime diversity beta-phylogenetic-alt`.\n\n## Install (native)\n\nTo install, the binary first needs to be compiled. This assumes that the HDF5 \ntoolchain and libraries are available. More information about how to set up the\nstack can be found [here](https://support.hdfgroup.org/HDF5/Tutor/compile.html). \n\nAssuming `h5c++` is in your path, the following should work:\n\n pip install -e . 
\n\n**Note**: if you are using `conda`, we recommend installing HDF5 from the\n`conda-forge` channel, for example:\n\n conda install -c conda-forge hdf5\n \n# Examples of use\n\nBelow are a few light examples of different ways to use this library.\n\n## QIIME2 \n\nTo use Strided State UniFrac through QIIME2, you need to provide a `FeatureTable[Frequency]` artifact and a `Phylogeny[Rooted]` artifact. An example of use is:\n\n qiime diversity beta-phylogenetic --i-table table-evenly-samples.qza \\\n --i-phylogeny a-tree.qza \\\n --o-distance-matrix resulting-distance-matrix.qza \\\n --p-metric unweighted_unifrac\n \n## Python\n\nThe library can be accessed directly from within Python. In this mode, the API methods expect a filepath to a BIOM-Format V2.1.0 table and a filepath to a Newick-formatted phylogeny.\n\n $ python\n Python 3.5.4 | packaged by conda-forge | (default, Aug 10 2017, 01:41:15)\n [GCC 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.53)] on darwin\n Type \"help\", \"copyright\", \"credits\" or \"license\" for more information.\n >>> import unifrac\n >>> dir(unifrac)\n ['__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '__version__', '_api', '_meta', '_methods', 'generalized', 'meta', 'pkg_resources', 'ssu', 'stacked_faith', 'unweighted', 'weighted_normalized', 'weighted_unnormalized']\n >>> print(unifrac.unweighted.__doc__)\n Compute Unweighted UniFrac\n\n Parameters\n ----------\n table : str\n A filepath to a BIOM-Format 2.1 file.\n phylogeny : str\n A filepath to a Newick formatted tree.\n threads : int, optional\n The number of threads to use. Default of 1.\n variance_adjusted : bool, optional\n Adjust for varianace or not. Default is False.\n bypass_tips : bool\n Bypass the tips of the tree in the computation. 
This reduces compute\n by about 50%, but is an approximation.\n\n Returns\n -------\n skbio.DistanceMatrix\n The resulting distance matrix.\n\n Raises\n ------\n IOError\n If the tree file is not found\n If the table is not found\n ValueError\n If the table does not appear to be BIOM-Format v2.1.\n If the phylogeny does not appear to be in Newick format.\n\n Notes\n -----\n Unweighted UniFrac was originally described in [1]_. Variance Adjusted\n UniFrac was originally described in [2]_, and while its application to\n Unweighted UniFrac was not described, factoring in the variance adjustment\n is still feasible and so it is exposed.\n\n References\n ----------\n .. [1] Lozupone, C. & Knight, R. UniFrac: a new phylogenetic method for\n comparing microbial communities. Appl. Environ. Microbiol. 71, 8228-8235\n (2005).\n .. [2] Chang, Q., Luan, Y. & Sun, F. Variance adjusted weighted UniFrac: a\n powerful beta diversity measure for comparing communities based on\n phylogeny. BMC Bioinformatics 12:118 (2011).\n\n\t>>> print(unifrac.faith_pd.__doc__)\n\tExecute a call to the Stacked Faith API in the UniFrac package\n\n\t\tParameters\n\t\t----------\n\t\tbiom_filename : str\n\t\t\tA filepath to a BIOM 2.1 formatted table (HDF5)\n\t\ttree_filename : str\n\t\t\tA filepath to a Newick formatted tree\n\n\t\tReturns\n\t\t-------\n\t\tpd.Series\n\t\t\tSeries of Faith's PD for each sample in `biom_filename`\n\n\t\tRaises\n\t\t------\n\t\tIOError\n\t\t\tIf the tree file is not found\n\t\t\tIf the table is not found\n\t\t\tIf the table is empty\n\t\n\n## Command line\n\nThe methods can also be used directly through the command line after install:\n\n $ which ssu\n /Users//miniconda3/envs/qiime2-20xx.x/bin/ssu\n $ ssu --help\n usage: ssu -i -o -m [METHOD] -t [-n threads] [-a alpha] [--vaw]\n\n -i\t\tThe input BIOM table.\n -t\t\tThe input phylogeny in newick.\n -m\t\tThe method, [unweighted | weighted_normalized | weighted_unnormalized | generalized].\n -o\t\tThe output distance 
matrix.\n -n\t\t[OPTIONAL] The number of threads, default is 1.\n -a\t\t[OPTIONAL] Generalized UniFrac alpha, default is 1.\n -f\t\t[OPTIONAL] Bypass tips, reduces compute by about 50%.\n --vaw\t[OPTIONAL] Variance adjusted, default is to not adjust for variance.\n\n Citations:\n For UniFrac, please see:\n Lozupone and Knight Appl Environ Microbiol 2005; DOI: 10.1128/AEM.71.12.8228-8235.2005\n Lozupone et al. Appl Environ Microbiol 2007; DOI: 10.1128/AEM.01996-06\n Hamady et al. ISME 2010; DOI: 10.1038/ismej.2009.97\n Lozupone et al. ISME 2011; DOI: 10.1038/ismej.2010.133\n For Generalized UniFrac, please see:\n Chen et al. Bioinformatics 2012; DOI: 10.1093/bioinformatics/bts342\n For Variance Adjusted UniFrac, please see:\n Chang et al. BMC Bioinformatics 2011; DOI: 10.1186/1471-2105-12-118\n\n $ which faithpd\n /Users//miniconda3/envs/qiime2-20xx.x/bin/faithpd\n $ faithpd --help\n\tusage: faithpd -i -t -o \n\n\t\t-i The input BIOM table.\n\t\t-t The input phylogeny in newick.\n\t\t-o The output series.\n\n\tCitations: \n\t\tFor Faith's PD, please see:\n\t\t\tFaith Biological Conservation 1992; DOI: 10.1016/0006-3207(92)91201-3\n\n \n## Shared library access\n\nIn addition to the above methods to access UniFrac, it is also possible to link against the shared library. The C API is described in `sucpp/api.hpp`, and examples of linking against this API can be found in `examples/`. \n\n## Minor test dataset\n\nA small test `.biom` and `.tre` can be found in `sucpp/`. 
An example with expected output is below, and should execute in 10s of milliseconds:\n\n $ ssu -i sucpp/test.biom -t sucpp/test.tre -m unweighted -o test.out\n $ cat test.out\n \tSample1\tSample2\tSample3\tSample4\tSample5\tSample6\n Sample1\t0\t0.2\t0.5714285714285714\t0.6\t0.5\t0.2\n Sample2\t0.2\t0\t0.4285714285714285\t0.6666666666666666\t0.6\t0.3333333333333333\n Sample3\t0.5714285714285714\t0.4285714285714285\t0\t0.7142857142857143\t0.8571428571428571\t0.4285714285714285\n Sample4\t0.6\t0.6666666666666666\t0.7142857142857143\t0\t0.3333333333333333\t0.4\n Sample5\t0.5\t0.6\t0.8571428571428571\t0.3333333333333333\t0\t0.6\n Sample6\t0.2\t0.3333333333333333\t0.4285714285714285\t0.4\t0.6\t0", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/biocore/unifrac", "keywords": "", "license": "BSD-3-Clause", "maintainer": "", "maintainer_email": "", "name": "unifrac", "package_url": "https://pypi.org/project/unifrac/", "platform": "", "project_url": "https://pypi.org/project/unifrac/", "project_urls": { "Homepage": "https://github.com/biocore/unifrac" }, "release_url": "https://pypi.org/project/unifrac/0.10.0/", "requires_dist": null, "requires_python": "", "summary": "High performance phylogenetic diversity calculations", "version": "0.10.0" }, "last_serial": 5503014, "releases": { "0.10.0": [ { "comment_text": "", "digests": { "md5": "b6b6fbf58e56ba35b5586083c511825e", "sha256": "94b4a1b146b35ee7f855ae602cedfab2ca179cc87e397d535fd3e7f64d4d7349" }, "downloads": -1, "filename": "unifrac-0.10.0.tar.gz", "has_sig": false, "md5_digest": "b6b6fbf58e56ba35b5586083c511825e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 219679, "upload_time": "2019-07-08T20:25:37", "url": "https://files.pythonhosted.org/packages/be/89/683b0925dadf0c63f015aba45e56d943a738caa70f736233e8654dc4a713/unifrac-0.10.0.tar.gz" } ] }, 
"urls": [ { "comment_text": "", "digests": { "md5": "b6b6fbf58e56ba35b5586083c511825e", "sha256": "94b4a1b146b35ee7f855ae602cedfab2ca179cc87e397d535fd3e7f64d4d7349" }, "downloads": -1, "filename": "unifrac-0.10.0.tar.gz", "has_sig": false, "md5_digest": "b6b6fbf58e56ba35b5586083c511825e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 219679, "upload_time": "2019-07-08T20:25:37", "url": "https://files.pythonhosted.org/packages/be/89/683b0925dadf0c63f015aba45e56d943a738caa70f736233e8654dc4a713/unifrac-0.10.0.tar.gz" } ] }