{ "info": { "author": "Eliot Bush", "author_email": "bush@hmc.edu", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: GNU General Public License v3 (GPLv3)", "Operating System :: OS Independent", "Programming Language :: Python :: 3" ], "description": "======\nxenoGI\n======\n\nCode for detecting genomic island insertions in clades of microbes.\n\nRequirements\n------------\n\n* NCBI blast+\n\n We need blastp and makeblastdb executables (ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/).\n\n* Python 3\n\n* Package dependencies\n\n - Biopython (http://biopython.org/). This is for parsing genbank files and can be installed using pip:\n ``pip3 install biopython``\n\n - Parasail (https://github.com/jeffdaily/parasail). This is an optimized alignment library, used in calculating scores between proteins. It can also be installed using pip:\n ``pip3 install parasail``\n\n - Numpy (http://www.numpy.org/).\n ``pip3 install numpy``\n \n - Scipy (https://www.scipy.org/).\n ``pip3 install scipy``\n\n(The pip you use needs to correspond to a version of Python 3. In some cases it may just be called pip instead of pip3).\n\n* Additional dependencies\n\n If you make use of the ``makeSpeciesTree`` flag, you will also need the following\n\n - MUSCLE (https://www.drive5.com/muscle/).\n\n - FastTree (http://www.microbesonline.org/fasttree/).\n\n - ASTRAL (https://github.com/smirarab/ASTRAL/).\n\nInstallation\n------------\n\nThe easiest way to install is using pip::\n\n pip3 install xenoGI\n\nCitation\n--------\n\nIf you use xenoGI in a publication, please cite the following:\n\nBush EC, Clark AE, DeRanek CA, Eng A, Forman J, Heath K, Lee AB, Stoebel DM, Wang Z, Wilber M, Wu H. xenoGI: reconstructing the history of genomic island insertions in clades of closely related bacteria. BMC Bioinformatics. 19(32). 
2018.\n\nHow to use\n----------\n\nAn ``example/`` directory is included in this repository.\n\nThe basic method works on a set of species with known phylogenetic relationships. In the example, these species are: E. coli K12, E. coli ATCC 11775, E. fergusonii and S. bongori.\n\nRequired files\n~~~~~~~~~~~~~~\n\nThe working directory must contain:\n\n* A parameter file. In the provided ``example/`` directory this is called ``params.py``. The ``blastExecutDirPath`` parameter in this file should be edited to point to the directory where the blastp and makeblastdb executables are.\n\n* A newick format tree representing the relationships of the strains. In the example this is called ``example.tre``. Note that branch lengths are not used in xenoGI, and ``example.tre`` does not contain branch lengths. Also note that internal nodes should be given names in this tree. In the example.tre we label them i0, i1 etc. The parameter ``treeFN`` in ``params.py`` has the path to this tree file. If a strain tree is not available, xenoGI has some accessory methods, described below, to help obtain one.\n\n* A subdirectory of sequence files. In the example, this is called ``ncbi/``. Contained in this subdirectory will be genbank (gbff) files for the species. The parameter ``genbankFilePath`` in ``params.py`` has the path to these files.\n\nNaming of genbank files\n~~~~~~~~~~~~~~~~~~~~~~~\n\nThe system needs a way to connect the sequence files to the names used in the tree.\n\nIn the example, the sequence files have names corresponding to their assembly accession number from ncbi. We connect these to the human readable names in example.tre using a mapping given in the file ``ncbiHumanMap.txt``. This file has two columns, the first giving the name of the genbank file, and the second giving the name for the species used in the tree file. Note that the species name should not contain any dashes (\"-\"). 
In ``params.py`` the parameter ``fileNameMapFN`` is set to point to this file.\n\nAnother approach is to change the names of the sequence files to match what's in the tree. If you do this, then you should set ``fileNameMapFN = None`` in ``params.py``. (This is not necessary in the example, which is already set to run the other way).\n\nRunning the code\n~~~~~~~~~~~~~~~~\n\nIf you install via pip, then you should have an executable script in your path called xenoGI.\n\nYou run the code from within the working directory. To run the example, you would cd into the ``example/`` directory. You will need to ensure that the ``params.py`` parameters file contains the correct path to the directory with the blastp and makeblastdb executables in it. Then, the various steps of xenoGI can be run all at once like this::\n\n xenoGI params.py runAll\n\nThey can also be run individually::\n\n xenoGI params.py parseGenbank\n xenoGI params.py runBlast\n xenoGI params.py calcScores\n xenoGI params.py makeFamilies\n xenoGI params.py makeIslands\n xenoGI params.py printAnalysis\n xenoGI params.py createIslandBed\n\nIf for some reason you don't want to install via pip, then you can download the repository and run the code like this::\n\n python3 path-to-xenoGI-github-repository/xenoGI-runner.py params.py runAll\n\n(In this case you will have to make sure all the package dependencies are satisfied.)\n\nWhat the steps do\n~~~~~~~~~~~~~~~~~\n\n* ``parseGenbank`` runs through the genbank files and produces input files that are used by subsequent code.\n \n* ``runBlast`` does an all vs. all protein blast of the genes in these strains. The number of processes it will run in parallel is specified by the numThreads parameter in the parameter file. Before running a particular comparison, runBlast checks to see if the output file for that comparison already exists (e.g. from a previous run). 
If so it skips the comparison.\n \n* ``calcScores`` calculates similarity and synteny scores between genes in the strains. It is also (mostly) parallelized.\n \n* ``makeFamilies`` calculates gene families in a tree aware way, also taking account of synteny.\n\n* ``makeIslands`` groups families according to their origin, putting families with a common origin together as islands. It is partly parallelized.\n\n* ``printAnalysis`` produces a number of analysis files.\n\n* ``createIslandBed`` produces bed files for each genome.\n \nNotes on several parameters\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n* ``rootFocalClade`` defines the focal clade where we will do the reconstruction. It is specified by giving the name of an internal node. It should be chosen such that there are one or more outgroups outside the focal clade. These outgroups help us to better recognize core genes given the possibility of deletion in some lineages. \n\n* ``numProcesses`` determines how many separate processes to run in parts of the code that are parallel. If you have a machine with 32 processors, you would typically set this to 32 or less.\n\n\nA note on the output\n~~~~~~~~~~~~~~~~~~~~\n\nA brief illustration will allow us to define some terminology used in xenoGI's output. The basic goal of xenoGI is to group genes with a common origin and map them onto a phylogenetic tree.\n\nConsider a clade of three species: (A,B),C. In this group, A and B are most closely related, and C is the outgroup. Gene a in species A has an ortholog b in species B. These two genes have high synteny, but have no ortholog in C. We call a and b a *locus family* because they are descended from a common ancestor, and occur in the same syntenic location.\n\nWhen a genomic island inserts as a part of a horizontal transfer event, it typically brings in multiple locus families at the same time. xenoGI will attempt to group these into a *locus island*. 
In the a/b case, if there were several other locus families nearby that also inserted on the branch leading to the A,B clade, we would group them together into a single locus island.\n\nAt present, locus islands and locus families are the basic units of output. If you are interested in finding genomic islands that inserted on a particular branch in your tree, you would be looking for locus islands identified on that branch.\n\nLet us define one last bit of terminology. Consider another clade of three species: (X,Y),Z. Genes x1 and y1 represent a locus family in the X,Y clade. They are orthologs sharing high synteny. (And they have no ortholog in species Z.) Imagine that there is also a set of paralogs x2 and y2 which resulted from a gene duplication in the lineage leading to the X,Y clade. These occur in a different syntenic location. In this case, x2 and y2 constitute another locus family. Because these two locus families descended from a common ancestor gene within the species tree, we place them in the same *family*. In general, a family consists of one or more locus families.\n\nOutput files\n~~~~~~~~~~~~\n\nThe last two steps, printAnalysis and createIslandBed, make the output files relevant to the user.\n\n* ``printAnalysis``\n\n ``islandsSummary.out`` contains a summary of islands, organized by node.\n\n This script also produces a set of species-specific genome files. These contain all the genes in a strain laid out in the order they occur on the contigs. Each gene entry includes locus island and family information, as well as a brief description of the gene's function. These files all have the name genes in their stem, followed by the strain name, and the extension .out.\n\n* ``createIslandBed`` creates a subdirectory called bed/ containing bed files for each genome showing the locus islands in different colors. 
(Color is specified in the RGB field of the bed).\n\nInteractive analysis\n~~~~~~~~~~~~~~~~~~~~\n\nAfter you have done runAll, it is possible to bring up the interpreter for interactive analysis::\n\n xenoGI params.py interactiveAnalysis\n \nFrom within python, you can then run functions such as\n\n* printLocusIslandsAtNode\n\n Usage::\n\n printLocusIslandsAtNode('i2') # All locus islands at node i2\n printLocusIslandsAtNode('E_coli_K12') # All locus islands on the E. coli K12 branch\n\n* findLocusIsland\n\n Usage::\n \n findLocusIsland('gadA') # Find a locus island associated with a gene name or description\n \n* printLocusIsland\n\n Say we've identified locus island 1550 as being of interest. We can print it like this::\n\n printLocusIsland(1550,10) # First argument is locus island id, second is the number of genes to print to each side\n \n printLocusIsland prints the locus island in each strain where it's present. Its output includes the locus island and family numbers for each gene, the most recent common ancestor (mrca) of the family, and a description of the gene.\n\n* printFam\n\n Print scores within a particular gene family, and also with similar genes not in the family::\n \n printFam(3279)\n\n Note that this function takes a family number, not a locus family number.\n\nObtaining a species tree if you don't already have one\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nHaving an accurate species tree is a key to the xenoGI method.\n\nThe package does include some functions that may be helpful if you don't have a species tree. 
These use MUSCLE to create either protein or DNA alignments, FastTree to make gene trees, and ASTRAL to consolidate the gene trees into a species tree.\n\nYou begin by running the first three steps of xenoGI::\n\n xenoGI params.py parseGenbank\n xenoGI params.py runBlast\n xenoGI params.py calcScores\n\nYou can then run ``makeSpeciesTree``::\n\n xenoGI params.py makeSpeciesTree\n\nThe ``params.py`` file found in the example directory contains a number of parameters related to ``makeSpeciesTree``. Among these is ``dnaBasedSpeciesTree``. If this is True, the method will use DNA based alignments, otherwise it will use protein alignments. Once ``makeSpeciesTree`` has completed, you can proceed with the rest of xenoGI::\n\n xenoGI params.py makeFamilies\n xenoGI params.py makeIslands\n xenoGI params.py printAnalysis\n xenoGI params.py createIslandBed\n \nAdditional flags\n~~~~~~~~~~~~~~~~\n\nPrint the version number::\n \n xenoGI params.py version\n\nProduce a directory containing a gene tree for every family::\n\n xenoGI params.py makeGeneFamilyTrees\n\nThis uses the same methods as the makeSpeciesTree flag (but doesn't call ASTRAL).\n \nProduce a set of pdf files showing histograms of scores between all possible strains::\n\n xenoGI params.py plotScoreHists\n \n \nAdditional files\n----------------\n\nThe github repository also contains an additional directory called misc/. This contains various python scripts that may be of use in conjunction with xenoGI. Installation via pip does not include this, so to use these you need to clone the github repository. 
There is some brief documentation included in the directory.", "description_content_type": "text/x-rst", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/ecbush/xenoGI", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "xenoGI", "package_url": "https://pypi.org/project/xenoGI/", "platform": "", "project_url": "https://pypi.org/project/xenoGI/", "project_urls": { "Homepage": "https://github.com/ecbush/xenoGI" }, "release_url": "https://pypi.org/project/xenoGI/2.2.0/", "requires_dist": null, "requires_python": "", "summary": "Python command line application for detecting genomic island insertions in clades of microbes.", "version": "2.2.0" }, "last_serial": 5602450, "releases": { "1.1.0": [ { "comment_text": "", "digests": { "md5": "bd6fd97bfd9d1f3cfbb673148ad2aec4", "sha256": "f96a9b19d01bea5a99b1237a1487a2a57003a3ac7bc14d430b1376884720769d" }, "downloads": -1, "filename": "xenoGI-1.1.0.tar.gz", "has_sig": false, "md5_digest": "bd6fd97bfd9d1f3cfbb673148ad2aec4", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 51590, "upload_time": "2018-06-02T19:07:36", "url": "https://files.pythonhosted.org/packages/95/e2/cd87ba303e02d33e4f5e9006609da565358462286fe802b4980fd3bc05a6/xenoGI-1.1.0.tar.gz" } ], "1.1.1": [ { "comment_text": "", "digests": { "md5": "b45863b6cf77b40dc7dd557ef6899f94", "sha256": "428765758238e97edcdf777a147f6d42ad08b0d3727973389ab0103e85ff8ce0" }, "downloads": -1, "filename": "xenoGI-1.1.1.tar.gz", "has_sig": false, "md5_digest": "b45863b6cf77b40dc7dd557ef6899f94", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 53817, "upload_time": "2018-06-11T22:47:38", "url": "https://files.pythonhosted.org/packages/87/bf/843085ce397e1162816caa94652d22908496f052bc609e902f873e02c6e2/xenoGI-1.1.1.tar.gz" } ], "1.1.2": [ { "comment_text": "", "digests": { "md5": 
"e58bd2f066d9473759c2f69b96d58bb3", "sha256": "96a0d0a23bc8adc83b8bb94eafce05ce7650db6f35547583cffe25ec8bcdf078" }, "downloads": -1, "filename": "xenoGI-1.1.2.tar.gz", "has_sig": false, "md5_digest": "e58bd2f066d9473759c2f69b96d58bb3", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 54443, "upload_time": "2018-10-06T18:43:17", "url": "https://files.pythonhosted.org/packages/73/80/45fd2fbab804ac2e9c5143cb71f4380adbd41d3eee51dd1b6f17ba0c2189/xenoGI-1.1.2.tar.gz" } ], "2.0.0": [ { "comment_text": "", "digests": { "md5": "e1fb3337929b6293a08455739993d8f4", "sha256": "760bf4cfd24df6c4fb8d3cb53f142cffc37d456f7fbe3d4ad29f70c6f6d8c8cd" }, "downloads": -1, "filename": "xenoGI-2.0.0.tar.gz", "has_sig": false, "md5_digest": "e1fb3337929b6293a08455739993d8f4", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 65232, "upload_time": "2019-03-20T19:56:01", "url": "https://files.pythonhosted.org/packages/42/55/00ed84e6a21e9daab118969c1cba593dacd9b85a2e05024c2360ba242119/xenoGI-2.0.0.tar.gz" } ], "2.1.0": [ { "comment_text": "", "digests": { "md5": "0d7ac6f356868550bff55a8c85803747", "sha256": "636b23009a29fa5d3eaf126a879ea6c8f8b82ed06f2aa73662c66b6bee331d2e" }, "downloads": -1, "filename": "xenoGI-2.1.0.tar.gz", "has_sig": false, "md5_digest": "0d7ac6f356868550bff55a8c85803747", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 67886, "upload_time": "2019-05-29T14:41:37", "url": "https://files.pythonhosted.org/packages/cb/e6/e305c1843a868d99b159c3624c8848bd10db634d2ae35b7b6449494bf27f/xenoGI-2.1.0.tar.gz" } ], "2.2.0": [ { "comment_text": "", "digests": { "md5": "50c2fa130520d9c8718f4cbb6ec0543b", "sha256": "b18a33fa40fa28cd14208c63912b5d177601ecc8e0ab92998fc8154d1514f801" }, "downloads": -1, "filename": "xenoGI-2.2.0.tar.gz", "has_sig": false, "md5_digest": "50c2fa130520d9c8718f4cbb6ec0543b", "packagetype": "sdist", "python_version": "source", "requires_python": null, 
"size": 77297, "upload_time": "2019-07-29T23:18:29", "url": "https://files.pythonhosted.org/packages/7f/27/5aade3ece47e77d824f7ee6262d489867d4ed28173ad1a5d238a917929d5/xenoGI-2.2.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "50c2fa130520d9c8718f4cbb6ec0543b", "sha256": "b18a33fa40fa28cd14208c63912b5d177601ecc8e0ab92998fc8154d1514f801" }, "downloads": -1, "filename": "xenoGI-2.2.0.tar.gz", "has_sig": false, "md5_digest": "50c2fa130520d9c8718f4cbb6ec0543b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 77297, "upload_time": "2019-07-29T23:18:29", "url": "https://files.pythonhosted.org/packages/7f/27/5aade3ece47e77d824f7ee6262d489867d4ed28173ad1a5d238a917929d5/xenoGI-2.2.0.tar.gz" } ] }