{ "info": { "author": "David Sundell", "author_email": "david.sundell@foi.se", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: GNU General Public License v3 (GPLv3)", "Operating System :: OS Independent", "Programming Language :: Python :: 3" ], "description": "# Flexible Taxonomy Databases (FlexTaxD)\n\nFlexible Taxonomy Databases - A cross platform tool for customization and merging of various taxonomic classification sources.\n\nSupported sources in version later than v0.1.1:\n* QIIME\n* NCBI\n* CanSNPer\n* TSV\n\nThe flextaxd (flextaxd) script allows customization of databases from NCBI, QIIME or CanSNPer sources and supports export functions into NCBI formatted names and nodes.dmp files as well as a standard tab separated file (or a selected separation). The script was initially written to allow the use of GTDB with some custom modifications to allow increased resolution of selected subgroups. GTDB was created by an Australian group aimed to restructure the taxonomy relation from the NCBI taxonomy tree to strictly follow a phylogenetic structure (http://gtdb.ecogenomic.org/) this script can use the bac120_taxonomy_r89.tsv files from the GTDB downloads page as input (with the --taxonomy_type selected as QIIME). By default the script will read a Tab separated file containing parent and child (defined by column headers). The script also allows customization of the database using multiple sources and databases can be merged at a selected node(s) there is also an option to add resolution to certain subgroups (ie combine the different database types) using a tab separated file (format described below).\n\nAll data is kept in a sqlite3 database (.ftd by default) and can be dumped to NCBI formatted names and nodes.dmp files. Supported export formats are NCBI and TSV). The TSV dump format is similar to the NCBI dump except that it contains a header (parentchild), has parent on the left and only uses tab to separate each column (not \\|\\).\n\n# Installation\n```\n## Using pip or conda\npip3 install flextaxd\nconda install flextaxd\n\n## Using python\npython setup.py install\n```\n# Usage\nA new database(sqlite3) file will be created automatically when a taxonomy file is supplied default full path (.ftd)\n\nDownload the latest GTDB files (latest at the time of this update is \"bac120_taxonomy_r89.tsv\"\ncheck https://data.ace.uq.edu.au/public/gtdb/data/releases/latest/ for latest version)\n```wget https://data.ace.uq.edu.au/public/gtdb/data/releases/latest/bac[120]_taxonomy.tsv```\n#### Save the default FlexTaxD into a custom taxonomy database\n```\nflextaxd --taxonomy_file bac_taxonomy_r86.tsv --taxonomy_type QIIME --database .ftd\n```\n\n### Write the database into NCBI formatted nodes and names.dmp\n```\nflextaxd --dump\n```\n\n### Optional parameters\nUse the --help option for a complete list of parameters\n```\nflextaxd --help\n```\n\n## Modify your database\nThe database update function can use either a previously built flextaxd database or directly through a TAB separated text file with headers (parent, child, (level))). Using the --parent parameter, all nodes/edges subsequent to that parent will be added (or can replace an existing node see options) with the links supplied. The parent node must exist in the database/tables and must have the same name (ex \"Francisella tularensis\"). Using the (--replace) parameter all children in the old database under the given parent will be removed, if you only want to replace for example Francisella tularensis be sure not to choose Francisella as parent.\n\n\n#### Modify the database and add sub-species specifications (for Francisella tularensis)\n```\nflextaxd --mod_file custom_modification.txt --parent \"Francisella tularensis\" --genomeid2taxid custom_genome_annotations.txt\n```\n#### Example of custom_modifications.txt and custom_genome_annotations.txt\nThe modification file must contain three columns \\ and \\ and \\) -> (Note that these tags in the file below are only there to show what is what in the file, also any number of extra tabs/spaces are there only to visualize columns! Only headers, node names and \\\\t \\\\n chars should be in the file).\n```\n
parent \\tchild \\tlevel\\n\nFrancisella tularensis \\tFrancisella tularensis tularensis \\tsubspecies\\n\nFrancisella tularensis tularensis \\tB.6 \\tsubsubspecies\\n\nFrancisella tularensis \\tFrancisella tularensis holarctica \\tsubspecies\\n\n```\nThe genome annotation file will then contain the genome_id to taxonomy(node) name as annotation. The genome id has to match the names of the genomes in the genomes_path. In particular if with use of the create_kraken_database subscript. If the genome is already annotated, the annotation will be updated.\n```\nGCF_00005111.1\\tFrancisella tularensis tularensis\nGCF_00005211.1\\tFrancisella tularensis tularensis\n```\n\n## One liner version\nCreate, modify and dump your database\n```\nflextaxd --taxonomy_file bac_taxonomy_r86.tsv --taxonomy_type QIIME --mod_file custom_modification.txt --genomeid2taxid custom_genome_annotations.txt --parent \"Francisella tularensis\" --dump\n```\n\n#####\n# Customize the NCBI taxonomy tree\n#####\n\nThe most common database to start with is the NCBI taxonomy tree, however there are many known caveats to the NCBI tree,\nFlexTaxD allows modifications of the NCBI taxonomy by replacing nodes with correct structures.\n\nCreating a custom taxonomy database using the NCBI taxonomy tree instead of FlexTaxD as base\n```\nRequired files from NCBI (ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy):\nfrom taxdmp.zip\n names.dmp\n nodes.dmp\nnucl_gb.accession2taxid.gz (ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/nucl_gb.accession2taxid.gz)\n```\n### Create a custom taxonomy database from the NCBI taxonomy\n```\nflextaxd\n --taxonomy_file taxonomy/nodes.dmp ## Path to taxonomy nodes.dmp\n --taxonomy_type NCBI ## NCBI formatted input\n --genomes_path refseq/bacteria/ ## path to ncbi-genome-download folder with bacteria (or other folder structures)\n --genomeid2taxid taxonomy/nucl_gb.accession2taxid.gz ## accession numbers to taxid annotation\n --database NCBI_taxonomy.db ## Name of the database where all information will be kept (for future use or reuse of flextaxd)\n -o NCBI_database ## Output folder\n --dump ## Print out names.dmp and nodes.dmp from flextaxd database\n```\n### Modify the NCBI database using a previously created flextaxd (example database from another source (CanSNPer.db))\nReplace the Francisella node in the NCBI database with the node structure from a CanSNP database containing the Francisella CanSNPer tree.\n```\n## Step one: create your database containing required modifications\nflextaxd --taxonomy_file cansnper_tree.txt --taxonomy_type CanSNPer --genomeid2taxid cansnper_genometotaxid_annotation.txt --database canSNPer_database/CanSNPer.db\n\n## Step two: create your custom taxonomy database from NCBI\nflextaxd --database NCBI_taxonomy.db --mod_database canSNPer_database/CanSNPer.db --parent \"Francisella\"\n\n## Step three: dump your database\nflextaxd --dump\n```\n\n#####\n# Create a kraken database\n#####\n\nFinally there is a quick option to create a kraken2 or a krakenuniq database using your custom taxonomy database by only supplying genome names matching your annotation table given as --genomeid2taxid\nRequirements: kraken2 or krakenuniq needs to be installed, For the FlexTaxD standard database, source data from genbank or refseq is required (for genomeid2taxid match)\n\nNote: If your genome names are different, you can create a custom genome2taxid file and import into your database to match the names of your genome fasta/fna/fa files. Note that the genome files needs to be gzipped (and end with .gz) in their stored location.\n\n### Create Kraken DB\nFirst dump your CTDB into names.dmp and nodes.dmp (default) if that was not already done.\n```\nflextaxd --dump -o NCBI_database\n```\n\nAdd genomes to a kraken database (and create the kraken database using --create_db)\n```\nflextaxd-create --kraken_db my_custom_krakendb --genomes_path /path/to/genomes/folder --create_db --krakenversion kraken2\n```\n\nThe script will find all fasta, fna, fa files in the given path and then add them to the krakendb, if --create_db parameter is given\nthe script will execute the kraken-build --build command.\n\n### Create kraken2 NCBI database (approx 60min with 40 cores with complete bacterial genomes from genbank may 2019 (~9000)) \n```\nconda activate kraken2\nflextaxd-create\n --database NCBI_taxonomy.db ## flextaxd database file\n -o NCBI_database ## output dir (must be the same as where names and nodes.dmp were exported (--dump))\n --kraken_db NCBI_krakendb ## Name of kraken database, absolute path not realtive to --outdir (-o)\n --genomes_path refseq/bacteria/ ## Path to ncbi-genome-download folder with bacteria (or other folder structures) using the same as when\n ## the database was created is recommended otw genomes may be missed\n --create_db ## Request script to attempt to create a kraken database (kraken2 or krakenuniq must be installed)\n --krakenversion kraken2 ## Select version of kraken which is installed\n -p 40 ## Number of cores to use for adding genomes to the kraken database as well as the number of cores\n used to run kraken*-build (\\*2,uniq)\n --skip \"taxid\" ## exclude genomes in taxid see details below\n```\n\n### Exclude genomes in taxid on database creation\nRemove branches, the --skip parameter was implemented for benchmarking purposes as an option to remove branches by taxid, all children of the given taxid will be excluded.\n\nflextaxd-create --skip \"taxid,taxid2\"\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/FOI-Bioinformatics/flextaxd", "keywords": "taxonomy NCBI CanSNPer customization GTDB", "license": "GNU GENERAL PUBLIC LICENSE version 3", "maintainer": "", "maintainer_email": "", "name": "flextaxd", "package_url": "https://pypi.org/project/flextaxd/", "platform": "", "project_url": "https://pypi.org/project/flextaxd/", "project_urls": { "Homepage": "https://github.com/FOI-Bioinformatics/flextaxd" }, "release_url": "https://pypi.org/project/flextaxd/0.1.5/", "requires_dist": null, "requires_python": ">=3.6", "summary": "Script that allows the creation of custom kraken databases from various sources (NCBI, QIIME, CanSNPer)", "version": "0.1.5" }, "last_serial": 5906833, "releases": { "0.1.5": [ { "comment_text": "", "digests": { "md5": "66f45eab5d96cb547d714f4552b7dd1e", "sha256": "dcbb3b1e6a239b79f6147b47613d8c8ce13f4dce9ec5bfc28d171e33fea554f5" }, "downloads": -1, "filename": "flextaxd-0.1.5-py3-none-any.whl", "has_sig": false, "md5_digest": "66f45eab5d96cb547d714f4552b7dd1e", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 48480, "upload_time": "2019-09-30T10:21:18", "url": "https://files.pythonhosted.org/packages/f3/d8/c7f0dd8ba11d5022712f7ab4c6843ab6ece49c55e63891b4358a065d49e8/flextaxd-0.1.5-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "498aeed7a2fbfda4497f3d89749138c4", "sha256": "fb1949b4259bc8a62d5f234eb5691c6f772b642d7540643a83ca1669f0df478b" }, "downloads": -1, "filename": "flextaxd-0.1.5.tar.gz", "has_sig": false, "md5_digest": "498aeed7a2fbfda4497f3d89749138c4", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 41287, "upload_time": "2019-09-30T10:21:19", "url": "https://files.pythonhosted.org/packages/33/51/80648eb6de43a000e71c1c8228983634595d0170204553fcf13c5b7ee290/flextaxd-0.1.5.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "66f45eab5d96cb547d714f4552b7dd1e", "sha256": "dcbb3b1e6a239b79f6147b47613d8c8ce13f4dce9ec5bfc28d171e33fea554f5" }, "downloads": -1, "filename": "flextaxd-0.1.5-py3-none-any.whl", "has_sig": false, "md5_digest": "66f45eab5d96cb547d714f4552b7dd1e", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 48480, "upload_time": "2019-09-30T10:21:18", "url": "https://files.pythonhosted.org/packages/f3/d8/c7f0dd8ba11d5022712f7ab4c6843ab6ece49c55e63891b4358a065d49e8/flextaxd-0.1.5-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "498aeed7a2fbfda4497f3d89749138c4", "sha256": "fb1949b4259bc8a62d5f234eb5691c6f772b642d7540643a83ca1669f0df478b" }, "downloads": -1, "filename": "flextaxd-0.1.5.tar.gz", "has_sig": false, "md5_digest": "498aeed7a2fbfda4497f3d89749138c4", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 41287, "upload_time": "2019-09-30T10:21:19", "url": "https://files.pythonhosted.org/packages/33/51/80648eb6de43a000e71c1c8228983634595d0170204553fcf13c5b7ee290/flextaxd-0.1.5.tar.gz" } ] }