{ "info": { "author": "Haibao Tang", "author_email": "tanghaibao@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Intended Audience :: Science/Research", "License :: OSI Approved :: BSD License", "Programming Language :: Python", "Programming Language :: Python :: 2", "Topic :: Scientific/Engineering :: Bio-Informatics" ], "description": "TREECUT: Dynamic tree cut algorithm\n===================================\n\n|Latest PyPI version| |Travis-CI|\n\n+-----------+---------------------------------------------------------------+\n| Author | Haibao Tang (`tanghaibao `__) |\n+-----------+---------------------------------------------------------------+\n| | Jingping Li (`Jingping `__) |\n+-----------+---------------------------------------------------------------+\n| Email | tanghaibao@gmail.com |\n+-----------+---------------------------------------------------------------+\n| License | `BSD `__ |\n+-----------+---------------------------------------------------------------+\n\nDescription\n-----------\n\nHierarchical clustering is an important tool in mining useful\nrelationships among multivariate biological data. However, there is no\nobvious way to define a set of useful, non-overlapping groups from the\nidentified hierarchy. Most efforts have focused on different cut-off\nvalues, evaluate the relative strengths of intra- versus inter- group\nvariances and then heuristically determine a \"good\" cutoff. This study\nintroduces a more dynamic approach that extracts clades that are\nsignificantly enriched or different from other clades. Incorporating\nphylogenetic information removes the false positives observed in a\nconventional analysis thus improves the prediction of trait association.\n\nThe algorithm takes two inputs, a tree model and some mapping of values\nfor all the terminal branches. Briefly, the algorithm performs\nindependent statistical tests on all the internal branches, and\ncalculates the P-values for each node. At exploratory stage, the\nstatistical tests are: 1) for quantitative values, test the difference\nof two groups separated by each node (student's t-test); 2) for\ncategorical values, test the association of a particular category for\nthe descendants of each internal node (Fisher's exact test).\n\nThe candidate nodes are determined using the following rule: the P-value\nfor the candidate node v has to be the smallest among all root-to-leaf\npaths that pass v. In other words, the group rooted at node v should\ncontain the largest level of association, thus avoiding redundant\nclades.\n\nServer version\n--------------\n\nA server version of TREECUT software can be found here:\nhttp://chibba.agtec.uga.edu/duplication/cut/\n\nInstallation\n------------\n\n- Python version >= 2.6\n- `scipy `__ for t-test and Fisher's Exact Test\n- `ete2 `__ for parsing the tree structure\n\n.. code:: bash\n\n pip install scipy ete2\n\nUsage\n-----\n\nTake a look at examples in the ``data/`` folder: ``treefile`` and\n``listfile``.\n\nThe ``treefile`` should be a\n`Newick-formatted `__ file\n(typically from the output of a phylogenetic reconstruction software,\ne.g. `phylip `__\nor `MEGA `__).\n\nThe ``listfile`` should contain the quantitative value for each taxon\n(separated by comma). Make sure that the taxon names match between\n``treefile`` and ``listfile``:\n\n::\n\n # continuous example\n IS13,57.2\n IS35,66.13\n\nIf the data type is discrete, separate the classes by semicolon. For\nexample:\n\n::\n\n # discrete example\n AT1G02150,GO:0009507;GO:0005488\n AT1G02160,GO:0005575;GO:0003674;GO:0008150\n\nNote that ``#`` represents a comment line and will be ignored.\n\nTo run the software:\n\n.. code:: bash\n\n python treecut.py data/tree.nwk data/continuous.csv tree.pdf\n\nA summary of extracted modules will be written to ``stdout``. Each row\nwill contain a subclade that show either significantly high phenotypic\nvalue or low phenotypic value. Further a visualization is available as\n``tree.pdf`` (supported image formats include ``svg``, ``png``, ``pdf``,\n``jpg``, etc.). The modules are highlighted in green (low-value modules)\nand red (high-value modules) colors.\n\n.. figure:: http://lh4.ggpht.com/_srvRoIok9Xs/TAdZnqQGvQI/AAAAAAAAA8I/gQvkBVpm8Rw/s800/tree.png\n :alt: tree-value mapping\n\n tree-value mapping\n\nCookbook\n--------\n\nThere are several immediate applications of TREECUT. Below just show\ncase two examples, but there are more.\n\nExtract taxonomic groups with high/low phenotype values\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nSee an example in the ``data/`` folder. This is the flowering time data\nfor sorghum diversity panel. ``flowering.nwk`` is a phylogenetic tree\nfor the sorghum accessions used in the study. ``flowering.assoc`` has\nthe mapping to the accession to the trait values (in this case the\nnumber of days until flowering). To run:\n\n.. code:: bash\n\n python treecut.py data/flowering.nwk data/flowering.assoc\n\nIf you stead want to treat the flowering data as discrete values, say\n\"high\" versus low. You can add a ``--discrete`` option:\n\n.. code:: bash\n\n python treecut.py data/flowering.nwk data/flowering_discrete.assoc --discrete flowering_discrete.png\n\nThe significant different clades (like extreme trait values) will be\nwritten to the screen.\n\nExtract co-expressed genes with functional enrichment\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nIn this example, I used Eisen's ``CLUSTER`` software\n(`here `__)\nto process a series of arabidopsis microarray series\n`AtGenExpress `__.\nAfter the ``CLUSTER`` is run. I found two files - ``microarray.cdt`` and\n``microarray.gtr``. The ``.gtr`` file contains a hierarchical tree\nstructure, but I need to convert it to ``.nwk`` format in order for\n``treecut.py`` to process.\n\nTake a look at ``microarray.assoc``, this contains the mapping from\narabidopsis genes to the GO terms, which are based on the information\ndownloaded at `Gene Ontology\nwebsite `__.\nNote that a gene can have multiple GO terms associated with it. Here is\nthe script that I used to create the ``microarray.assoc``:\n\n.. code:: bash\n\n python scripts/parse_tair_go.py\n\nOnce everything is set, just run ``treecut.py`` as usual (make sure to\nturn on the ``--discrete`` option):\n\n.. code:: bash\n\n python scripts/eisen_to_newick.py data/microarray.gtr data/microarray.cdt data/microarray.nwk\n python treecut.py data/microarray.nwk data/microarray.assoc --discrete\n\nThe clades that are significantly enriched in certain GO terms will be\nwritten to the screen.\n\nReference\n---------\n\nTang et al. TREECUT: algorithm for extracting significant modules from\nhierarchical clustering\n\n.. |Latest PyPI version| image:: https://img.shields.io/pypi/v/treecut.svg\n :target: https://pypi.python.org/pypi/treecut\n.. |Travis-CI| image:: https://travis-ci.org/tanghaibao/treecut.svg?branch=master\n :target: https://travis-ci.org/tanghaibao/treecut\n", "description_content_type": null, "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "http://github.com/tanghaibao/treecut", "keywords": "", "license": "BSD", "maintainer": "", "maintainer_email": "", "name": "treecut", "package_url": "https://pypi.org/project/treecut/", "platform": "", "project_url": "https://pypi.org/project/treecut/", "project_urls": { "Homepage": "http://github.com/tanghaibao/treecut" }, "release_url": "https://pypi.org/project/treecut/0.7.8/", "requires_dist": null, "requires_python": "", "summary": "Find nodes in hierarchical clustering that are statistically significant", "version": "0.7.8" }, "last_serial": 3250863, "releases": { "0.7.6": [ { "comment_text": "", "digests": { "md5": "6948f64ab07741266eeea8a2eb7a1e89", "sha256": "af9fc894a943527426e9bab02b31a7763f05d8ba79cd9a2362b718aa9ab57e8e" }, "downloads": -1, "filename": "treecut-0.7.6.tar.gz", "has_sig": false, "md5_digest": "6948f64ab07741266eeea8a2eb7a1e89", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 11567, "upload_time": "2017-10-07T15:42:45", "url": "https://files.pythonhosted.org/packages/72/c5/fda03b219cb595344ddf7a82ab49aabadd90cc453c9dac26689663d8f263/treecut-0.7.6.tar.gz" } ], "0.7.7": [ { "comment_text": "", "digests": { "md5": "684b856563aab6fe03ab6a455be2b933", "sha256": "4093aadbb922fc184f27e358ce526fc605530d4aa3f74450ad85aea87a804e5b" }, "downloads": -1, "filename": "treecut-0.7.7.tar.gz", "has_sig": false, "md5_digest": "684b856563aab6fe03ab6a455be2b933", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 12576, "upload_time": "2017-10-15T01:30:37", "url": "https://files.pythonhosted.org/packages/fa/03/4599d8155ad507cfd63674e98e06cda80c0fbe979b8892a3467401e558cf/treecut-0.7.7.tar.gz" } ], "0.7.8": [ { "comment_text": "", "digests": { "md5": "1660f901acb8776cc022307aab3a5b60", "sha256": "f0f6ca6d3f13066ddca19911261b931e7adef93070dae34d886d1d752250d3fc" }, "downloads": -1, "filename": "treecut-0.7.8.tar.gz", "has_sig": false, "md5_digest": "1660f901acb8776cc022307aab3a5b60", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 12600, "upload_time": "2017-10-15T01:33:48", "url": "https://files.pythonhosted.org/packages/7a/63/bb3003b501cb7bd57a9bbafaa7c53715cf68538c195f6098e53a08797811/treecut-0.7.8.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "1660f901acb8776cc022307aab3a5b60", "sha256": "f0f6ca6d3f13066ddca19911261b931e7adef93070dae34d886d1d752250d3fc" }, "downloads": -1, "filename": "treecut-0.7.8.tar.gz", "has_sig": false, "md5_digest": "1660f901acb8776cc022307aab3a5b60", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 12600, "upload_time": "2017-10-15T01:33:48", "url": "https://files.pythonhosted.org/packages/7a/63/bb3003b501cb7bd57a9bbafaa7c53715cf68538c195f6098e53a08797811/treecut-0.7.8.tar.gz" } ] }