{ "info": { "author": "Anders Gon\u00e7alves da Silva", "author_email": "andersgs@gmail.com", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: GNU Lesser General Public License v3 (LGPLv3)", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7" ], "description": "# beastify: Generate input file for BEAST from whole-genome alignmennt\n\n## Background\n\nSometimes you want to partiion your alignment in to distinct codon positions\n(i.e., 1, 2, and 3), and you want to also model non-coding positions in your\nBEAST analysis.\n\n`beastify` does that for you. It will:\n\n1. Figure out all the codon positions in your reference (including overlapping positions)\n2. Optionally, label your sequences with any metadata (e.g., dates)\n3. Optionally, allows you to remove one or more positions from the alignment\n4. Optionally, allows you to mask positions form the alignment\n5. Optionally, allows you to sub-sample the alignment (if you want to work on a smaller dataset to test your models before throwing the whole kitchen at BEAST).\n6. Output a NEXUS files with the partitions ready for running BEAUTi.\n\nPartitions are labelled:\n\n1. For the first codon position\n2. For the second codon position\n3. For the third codon position\n4. For any overlapping codons (sometimes CDS annotations overlap because sometimes bacterial genes will share codons)\n5. If position is not found in a CDS.\n\n## Installation\n\n### Dependencies\n\n- Python >=3.6\n- Click\n- Pandas\n- Numpy\n- BioPython\n\n### Using pip\n\n```\npip3 install beastify\n```\n\n### Testing your installation\n\n```\nbeastify --test\n```\n\n## Input\n\n1. Genbank reference\n2. `snippy` \\*.consensus.subs.fa files\n3. List of genes to include in the final alignment\n4. N (optional) --- random number of genes to select and include\n\n### Command list\n\n```\n --out TEXT Outfile name (default: out.nexus)\n --info TEXT Path to a tab-delimited file with two or more\n columns. The first column has the isolate ID, and\n other columns have dates, location, etc. The\n information will be added to the isolate ID in the\n same order as the columns\n --inc_ref Whether to include the reference in the final out\n file (default: False)\n --aln_file TEXT A sequence alignment file to give in lieu of\n folder with snippy output.\n --aln_file_format TEXT If providing an alignment file with --aln_file,\n set the format of the alignment. Any format\n supported by BioPython:AlignIO could be valid.\n Default: fasta. Tested: fasta.\n --subsample INTEGER Subsample X number of bases at random from each\n partition. default: all bases\n --subsample_seed INTEGER Set the seed when subsampling sites. Default:42\n --parts TEXT Comma-separated list of partitions to include.\n default:1,2,3,4,5\n --test Run beastify tests and exit\n --mask TEXT A BED file indicating regions to mask from the\n genome\n --version Show the version and exit.\n --help Show this message and exit.\n```\n\n## Output\n\nA `nexus` formatted file ready for `beast`.\n\n## Script outline\n\n1. Parse coordinates of genes from Genbank into a `Genes` Class\n - Methods:\n - load_features: a method to load the Genbank features into a\n dictionary. **Method should check that there\n the length is a multiple of 3**, and that the\n **start** and **end** codons are in place. **stop**\n codon should be stripped.\n - parse_snippycore: a method to load the snippy core.tab data\n and identify all variable SNPs among the data that are in\n coding regions --- has options to return a 'random' sample\n of size N genes, 'top' genes with the most SNPs, with the\n N top genes with most SNPs.\n - Data:\n - features: a dictionary with key = genename and value\n set by seqFeature object --- **IF** N is provided, only\n keep a random set of gene*coords of size \\_N*\n2. Load `snippy` alignment into an `Isolate` Class\n - Methods:\n - load_fasta: will load the sequence into the object\n - cat_genes: given an isolate id, and a genes object,\n return a concatenated sequence (NOT IMPLEMENTED YET)\n - get_gene: return a string with the sequence for the gene specified\n by gene_id using a `Genes` object\n - **str**: print the sequence ID and length, if there is one\n - **getitem**: return the sequence string associated with the key\n - add_dates: the user supplies a table of isolate IDs, and dates in\n a format suitable for BEAST, and the script adds them to the\n identifier\n - Data:\n - seq: A SeqRecord\n - id: The isolate id\n - genes: a dictionary with 'gene_name' as keys and sequence string as\n value\n3. `Collection` class to store all the `Isolate` objects\n - Methods:\n - load_isolates: given a list of isolate files, creates\n and stores individual `Isolate` objects for each.\n - gen_align: given a `Genes` object, generate the\n alignment --- uses `cat_genes`\n - **getitem**: given an isolate ID as a key, return the `Isolate`\n object\n - Data:\n - isolates: a dictionary with isolate id as keys and\n `Isolate` objects as values\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/andersgs/beastify", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "beastify", "package_url": "https://pypi.org/project/beastify/", "platform": "", "project_url": "https://pypi.org/project/beastify/", "project_urls": { "Homepage": "https://github.com/andersgs/beastify" }, "release_url": "https://pypi.org/project/beastify/0.2.0a0/", "requires_dist": [ "click", "pandas", "numpy", "biopython" ], "requires_python": "", "summary": "Partition your alignment into distinct codon positions and non-coding positions", "version": "0.2.0a0" }, "last_serial": 4505389, "releases": { "0.2.0a0": [ { "comment_text": "", "digests": { "md5": "559acd83311fe7924a515e97efcdf346", "sha256": "3d7c4ac923c45fa412916e6e1e99591882e1dd22526bcf027f3f55b41ba9ba2c" }, "downloads": -1, "filename": "beastify-0.2.0a0-py3-none-any.whl", "has_sig": false, "md5_digest": "559acd83311fe7924a515e97efcdf346", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 614731, "upload_time": "2018-11-20T01:08:25", "url": "https://files.pythonhosted.org/packages/c6/fc/ab2fe4c20707d20947909938aaaf2144645a7c430bd27160e98041efc923/beastify-0.2.0a0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "578bed2d67cd2f13d6b773788a298f68", "sha256": "f39e6fe7155a4811b9f68f96a3da168aecd3936c80e5417b6e68f9f928dd701d" }, "downloads": -1, "filename": "beastify-0.2.0a0.tar.gz", "has_sig": false, "md5_digest": "578bed2d67cd2f13d6b773788a298f68", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 601021, "upload_time": "2018-11-20T01:08:28", "url": "https://files.pythonhosted.org/packages/40/e0/9758280d8eca480a1fae4df6aba419d2d102d3879d464086faab4e721065/beastify-0.2.0a0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "559acd83311fe7924a515e97efcdf346", "sha256": "3d7c4ac923c45fa412916e6e1e99591882e1dd22526bcf027f3f55b41ba9ba2c" }, "downloads": -1, "filename": "beastify-0.2.0a0-py3-none-any.whl", "has_sig": false, "md5_digest": "559acd83311fe7924a515e97efcdf346", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 614731, "upload_time": "2018-11-20T01:08:25", "url": "https://files.pythonhosted.org/packages/c6/fc/ab2fe4c20707d20947909938aaaf2144645a7c430bd27160e98041efc923/beastify-0.2.0a0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "578bed2d67cd2f13d6b773788a298f68", "sha256": "f39e6fe7155a4811b9f68f96a3da168aecd3936c80e5417b6e68f9f928dd701d" }, "downloads": -1, "filename": "beastify-0.2.0a0.tar.gz", "has_sig": false, "md5_digest": "578bed2d67cd2f13d6b773788a298f68", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 601021, "upload_time": "2018-11-20T01:08:28", "url": "https://files.pythonhosted.org/packages/40/e0/9758280d8eca480a1fae4df6aba419d2d102d3879d464086faab4e721065/beastify-0.2.0a0.tar.gz" } ] }