{ "info": { "author": "Marek Borowiec", "author_email": "petiolus@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 2 - Pre-Alpha", "Intended Audience :: Developers", "License :: OSI Approved :: GNU General Public License v3 (GPLv3)", "Natural Language :: English", "Programming Language :: Python :: 3.3" ], "description": "# AMAS\nAlignment manipulation and summary statistics\n\n## Installation\n\nYou can download this repository zipped (button on the right-hand side of the screen) and use `AMAS.py` in the `amas` directory as a stand-alone program or clone it if you have [git installed](http://git-scm.com/book/en/v2/Getting-Started-Installing-Git) on your system.\n\nIf your system doesn't have a Python version 3.4 or newer (`AMAS` will work under Python 3.0 but you may not be able to use it with multiple cores), you will need to [download and install it](http://www.python.org/downloads/). On Linux-like systems (including Ubuntu) you can install it from the command line using\n\n```\nsudo apt-get install python3\n```\nTo use `AMAS` as a Python module you can get it with [pip](https://pip.pypa.io/en/latest/installing.html) from the [Python Package Index](https://pypi.python.org/pypi/amas/):\n```\npip install amas\n```\nSee below for the instructions on how to use this program as a Python module.\n\n# Command line interface\n`AMAS` can be run from the command line. 
Here is the general usage (you can view this in your command line with `python3 AMAS.py -h`):\n\n```\nusage: AMAS <command> [<args>]\n\nThe AMAS commands are:\n concat Concatenate input alignments\n convert Convert to other file format\n replicate Create replicate data sets for phylogenetic jackknife\n split Split alignment according to a partitions file\n summary Write alignment summary\n remove Remove taxa from alignment\n\nUse AMAS <command> -h for help with arguments of the command of interest\n\npositional arguments:\n command Subcommand to run\n\noptional arguments:\n -h, --help show this help message and exit\n```\n\nTo show help for individual commands, use `AMAS.py <command> -h` or `AMAS.py <command> --help`.\n\n## Examples\nFor every `AMAS.py` run on the command line you need to specify an action with `concat`, `convert`, `replicate`, `split`, `summary`, or `remove` for the input to be processed.\nAdditionally, you need to provide three arguments required for all commands. The order in which the arguments are given does not matter:\n\n1) input file name(s) with `-i` (or in long version: `--in-files`),\n\n2) format with `-f` (`--in-format`),\n\n3) and data type with `-d` (`--data-type`). \n\nThe options available for the format are `fasta`, `phylip`, `nexus` (sequential), `phylip-int`, and `nexus-int` (interleaved). Data types are `aa` for protein alignments and `dna` for nucleotide alignments.\n\nFor example:\n```\npython3 AMAS.py concat -i gene1.nex gene2.nex -f nexus -d dna\n```\n\nIf you have many files that you want to input in one run, you can use multiple cores of your computer to process them in parallel. The `summary` command supports `-c` or `--cores` with which you can specify the number of cores to be used:\n\n```\npython3 AMAS.py summary -f phylip -d dna -i *phy -c 12\n```\nIn the above, we specified 12 cores. Note that this won't improve computing time if you're working with only one or very few files. 
The parallel processing is only used for the file parsing step and calculating alignment summaries.\n\nIn addition to overall alignment summaries, you can also print statistics calculated on a sequence (taxon) by sequence basis. Use `-s` or `--by-taxon` flag to turn it on. `AMAS` in this mode will print out one file with overall alignment summaries and a file with taxon summaries for each input alignment.\n\nIMPORTANT! `AMAS` is fast and powerful, but be careful: it assumes you know what you are doing and will not prevent you overwriting a file. It will, however, print out a warning if this has happened. `AMAS` was also written to work with aligned data and some of the output generated from unaligned sequences won't make sense. Because of computing efficiency `AMAS` by default does not check if input sequences are aligned. You can turn this option on with `-e` or `--check-align`.\n\n### Concatenating alignments\nFor example, if you want to concatenate all DNA phylip files in a directory and all of them have the `.phy` extension, you can run:\n```\npython3 AMAS.py concat -f phylip -d dna -i *phy\n```\nBy default the output will be written to two files: `partitions.txt`, containing partitions from which your new alignment was constructed, and `concatenated.out` with the alignment itself in the fasta format. You can change the default names for these files with `-p` (`--concat-part`) and `-t` (`--concat-out`), respectively, followed by the desired name. 
The output format is specified by `-u` (`--out-format`) and can also be any of the following: `fasta`, `phylip`, `nexus` (sequential), `phylip-int`, or `nexus-int` (interleaved).\n\nBelow is a command specifying the concatenated file output format as nexus with `-u nexus`:\n```\npython3 AMAS.py concat -f fasta -d aa -i *phy -u nexus\n```\nAlignments to be concatenated need not have identical sets of taxa before processing: the concatenated alignment will be populated with missing data where a given locus is missing a taxon. However, if every file to be concatenated includes only unique names (for example species name plus sequence name: `D_melanogaster_NW_001845408.1` in one alignment, `D_melanogaster_NW_001848855.1` in other alignment etc.), you will first need to trim those names so that sequences from one taxon have equivalents in all files. \n\n### Getting alignment statistics\nThis is an example of how you can summarize two protein fasta alignments by running:\n```\npython3 AMAS.py summary -f fasta -d aa -i my_aln.fasta my_aln2.fasta\n```\nBy default `AMAS` will write a file with the summary of the alignment in `summary.txt`. You can change the name of this file with `-o` or `--summary-out`. You can also summarize a single or multiple sequence alignments at once. 
\n\nThe statistics calculated include the number of taxa, alignment length, total number of matrix cells, overall number of undetermined characters, percent of missing data, AT and GC contents (for DNA alignments), number and proportion of variable sites, number and proportion of parsimony informative sites, and counts of all characters present in the relevant alphabet.\n\n### Converting among formats\nTo convert all nucleotide fasta files with a `.fas` extension in a directory to nexus alignments, you could use:\n```\npython3 AMAS.py convert -d dna -f fasta -i *fas -u nexus\n```\nIn the above, the required options are combined with the `convert` command to convert the input files and with `-u nexus`, which indicates the output format.\n\n`AMAS` will not overwrite the input here but will create new files instead, automatically appending appropriate extensions to the input file's name: `-out.fas`, `-out.phy`, `-out.int-phy`, `-out.nex`, or `-out.int-nex`.\n\n### Splitting alignment by partitions\nIf you have a partition file, you can split a concatenated alignment and write a file for each partition:\n```\npython3 AMAS.py split -f nexus -d dna -i concat.nex -l partitions.txt -u nexus\n```\nIn the above, one input file `concat.nex` was provided for splitting with `split` and the partitions file `partitions.txt` was given with `-l` (same as `--split-by`). For splitting you should only use one input and one partition file at a time. This is an example partition file:\n```\n AApos1&2 = 1-604\\3, 2-605\\3\n AApos3 = 3-606\\3\n 28SAutapoInDels=7583, 7584, 7587, 7593\n```\nIf this was the `partitions.txt` file from the example command above, `AMAS` would write three output files called `concat_AApos1&2.nex`, `concat_AApos3.nex`, and `concat_28SAutapoInDels.nex`. 
The partitions file will be parsed correctly as long as there is no text prior to the partition name (`CHARSET AApos1&2` or `DNA, AApos1&2` will not work) and commas separate ranges or individual sites in each partition.\n\nSometimes after splitting you will have alignments with taxa that have only gaps `-` or missing data `?`. If you do not want these to be included in the output, add `-j` or `--remove-empty` to the command line. \n\n### Creating replicate data sets\nWith `AMAS` you can create concatenated alignments from a proportion of randomly chosen alignments that can be used, for example, in a phylogenetic jackknife analysis. Say you have 1000 phylip files, each containing a single aligned locus, and you want to create 200 replicate phylip alignments, each built from 100 loci randomly chosen from all the input files. You can do this by specifying the `replicate` command and following it with `-r` or `--rep-aln` followed by the number of replicates (in this case `200`) and number of alignments (`100`). Remember to supply the output format with `-u` if you want it to be other than `fasta`:\n```\npython3 AMAS.py replicate -r 200 100 -d dna -f phylip -i *phy -u phylip\n```\n### Removing taxa/sequences from alignment\nIt is possible to remove taxa from alignments:\n```\npython3 AMAS.py remove -x species1 species2 -d dna -f nexus -i *nex -u nexus-int -g no_species12_ \n```\nThe above will process all `nexus` files in the directory and remove taxa called `species1` and `species2`. The argument `-x` (the same as `--taxa-to-remove`) is followed by the names of sequences to be removed. Note that `AMAS` converts spaces into underscores and strips any quotes present in input sequence names before processing, so you may need to modify the names you pass for removal accordingly. The argument `-g` (the same as `--out-prefix`) specifies a prefix to be added to output file names. The default prefix is 'reduced_'. 
You may want to realign your files after taxon removal.\n \n### Checking if input is aligned\nBy specifying optional argument `-e` (`--check-align`), you can make `AMAS` check if your input files contain only aligned sequences. This option is disabled by default because it can substantially increase computation times in files with many taxa. Enabling this option also provides an additional check against misspecified input file format.\n\n\n# AMAS as a Python module\nUsing `AMAS` inside your Python pipeline gives you much more flexibility in how the input and output are being processed. All the major functions of the command line interface can be recreated using `AMAS` as a module. Following installation from [pip](https://pip.pypa.io/en/latest/installing.html) use:\n```\npydoc amas.AMAS\n```\nto access detailed documentation for the classes and functions available.\n\nYou can import `AMAS` to your script with:\n```python\nfrom amas import AMAS\n```\nThe class used to manipulate alignments in `AMAS` is `MetaAlignment`. This class has to be instantiated with the same named arguments as on the command line: `in_files`, `data_type`, `in_format`. You also need to supply the number of cores to be used with `cores`. `MetaAlignment` holds one or multiple alignments and its `in_files` option must be a list, even if only one file is being read.\n```python\n\nmeta_aln = AMAS.MetaAlignment(in_files=[\"gene1.phy\"], data_type=\"dna\", in_format=\"phylip\", cores=1)\n```\nCreating `MetaAlignment` with multiple files is easy:\n```python\nmulti_meta_aln = AMAS.MetaAlignment(in_files=[\"gene1.phy\", \"gene2.phy\"], data_type=\"dna\", in_format=\"phylip\", cores=2)\n```\nNow you can call the various methods on your alignments. 
The `.get_summaries()` method computes summaries for your alignments and returns them as a tuple, with the first element being the header and the second element a list of lists with the statistics:\n```python\nsummaries = meta_aln.get_summaries()\n```\nThe header is different for nucleotide and amino acid data. You may choose to skip it and use only the second element of the tuple, that is, the list of summary statistics:\n```python\nstatistics = summaries[1]\n```\n`.get_parsed_alignments()` returns a list of dictionaries where each dictionary is an alignment and where taxa are the keys and sequences are the values. This allows you to, for example, print only taxa names in each alignment:\n```python\n# get parsed dictionaries\naln_dicts = multi_meta_aln.get_parsed_alignments()\n\n# print only taxa names in the alignments:\nfor alignment in aln_dicts:\n    for taxon_name in alignment.keys():\n        print(taxon_name)\n```\nTo split an alignment use `.get_partitioned(\"your_partitions_file\")` on a `MetaAlignment` with a single input file. `.get_partitioned()` returns a list of dictionaries of dictionaries, with `{ partition_name : { taxon : sequence } }` structure for each partition:\n```python\npartitions = meta_aln.get_partitioned(\"partitions.txt\")\n```\n`AMAS` uses `.get_partitions(\"your_partitions_file\")` to parse the partition file:\n```python\nparsed_parts = meta_aln.get_partitions(\"partitions.txt\")\nprint(parsed_parts)\n```\n\n`.get_replicate(no_replicates, no_loci)` gives a list of parsed alignments (dictionaries), each a replicate constructed from the specified number of loci:\n```python\nreplicate_sets = multi_meta_aln.get_replicate(2, 2)\n```\nTo concatenate multiple alignments first parse them with `.get_parsed_alignments()`, then pass to `.get_concatenated(your_parsed_alignments)`. 
This will return a tuple where the first element is the `{ taxon : sequence }` dictionary of the concatenated alignment and the second element is the partitions dict with `{ name : range }`.\n```python\nparsed_alns = multi_meta_aln.get_parsed_alignments()\nconcat_tuple = multi_meta_aln.get_concatenated(parsed_alns)\nconcatenated_alignments = concat_tuple[0]\nconcatenated_partitions = concat_tuple[1]\n```\nRemoving taxa from alignments is very easy:\n```python\nspp_to_remove = [\"taxon1\", \"taxon2\", \"taxon3\"]\nreduced_alns = multi_meta_aln.remove_taxa(spp_to_remove)\n```\nTo print to file or convert among file formats use one of the `.print_format(parsed_alignment)` methods called with a parsed dictionary as an argument. These methods include `.print_fasta()`, `.print_nexus()`, `.print_nexus_int()`, `.print_phylip()`, and `.print_phylip_int()`. They return an appropriately formatted string.\n```python\n# concatenated_alignments is a single { taxon : sequence } dictionary\nnex_int_string = meta_aln.print_nexus_int(concatenated_alignments)\nprint(nex_int_string)\n```", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/marekborowiec/AMAS", "keywords": "amas", "license": "GNU PL v3", "maintainer": null, "maintainer_email": null, "name": "amas", "package_url": "https://pypi.org/project/amas/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/amas/", "project_urls": { "Download": "UNKNOWN", "Homepage": "https://github.com/marekborowiec/AMAS" }, "release_url": "https://pypi.org/project/amas/0.98/", "requires_dist": null, "requires_python": null, "summary": "Calculate various summary statistics on a multiple sequence alignment", "version": "0.98" }, "last_serial": 1928503, "releases": { "0.2": [ { "comment_text": "", "digests": { "md5": "52573be2e2d706aea55a65e1fb875a16", "sha256": "55a1e8b82c9c5d6fc519f7b3b067f85afe97e202f39589c5c3aeba2e0d5eabf3" }, "downloads": -1, 
"filename": "amas-0.2.tar.gz", "has_sig": false, "md5_digest": "52573be2e2d706aea55a65e1fb875a16", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 6100, "upload_time": "2015-05-03T20:32:59", "url": "https://files.pythonhosted.org/packages/fe/ab/6f1ebb9f04c9fa2d96fe135accd33dd8faaf92476a9e291979e09f221186/amas-0.2.tar.gz" } ], "0.9": [ { "comment_text": "", "digests": { "md5": "b6a3a3a57fa8a354130bc86116d66b12", "sha256": "7d2e3de303815309516203b80e197c21fcd8c9fa24ef2eb332d96dbf4b14503e" }, "downloads": -1, "filename": "amas-0.9.tar.gz", "has_sig": false, "md5_digest": "b6a3a3a57fa8a354130bc86116d66b12", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 15146, "upload_time": "2015-09-03T01:58:16", "url": "https://files.pythonhosted.org/packages/66/99/8d72d4d0b3edef686d1130867bdc92a7a76c13f2032c3135ab748ff78426/amas-0.9.tar.gz" } ], "0.91": [ { "comment_text": "", "digests": { "md5": "e61d97906b716fe764b263a66131b63d", "sha256": "83774f45a4c1592af655593f41f7d498c341c4188b9a611b5aaab4bbd2cb3044" }, "downloads": -1, "filename": "amas-0.91.tar.gz", "has_sig": false, "md5_digest": "e61d97906b716fe764b263a66131b63d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 14679, "upload_time": "2015-09-03T16:46:04", "url": "https://files.pythonhosted.org/packages/0d/c1/18886d54dfdc807e3c2f415a9c1a055e6a94a44e1d5293725ce132a8ecaf/amas-0.91.tar.gz" } ], "0.92": [ { "comment_text": "", "digests": { "md5": "e6d52b77f9a4a999009711a825f66ea7", "sha256": "cec2078de14636b30937e0bd57ea4c5865a16ba1357c466edb035aabd16337d2" }, "downloads": -1, "filename": "amas-0.92.tar.gz", "has_sig": false, "md5_digest": "e6d52b77f9a4a999009711a825f66ea7", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 14684, "upload_time": "2015-09-03T17:05:21", "url": 
"https://files.pythonhosted.org/packages/31/7d/8fbb79222ff1bd6312b4e4beab3671300c7b35d8f41476b0b3f253d704b1/amas-0.92.tar.gz" } ], "0.93": [ { "comment_text": "", "digests": { "md5": "fd74c2b02d2d05f0d626fb04d65e8bc3", "sha256": "877072570e7e2fa06e29adc11b0ceece7c4d9724cf2d8572a44df378cdb411f8" }, "downloads": -1, "filename": "amas-0.93.tar.gz", "has_sig": false, "md5_digest": "fd74c2b02d2d05f0d626fb04d65e8bc3", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 265476, "upload_time": "2015-09-03T23:56:32", "url": "https://files.pythonhosted.org/packages/28/21/397eaeed6643687961dd9e1b21ca84a4c0e6954958745e77966a5255ddf4/amas-0.93.tar.gz" } ], "0.94": [ { "comment_text": "", "digests": { "md5": "0d301f4517b7dfd83096b19f360623bb", "sha256": "4399790f0ac6e42e378cce29dcba12bd24a15033a4406728a69c561de0e20834" }, "downloads": -1, "filename": "amas-0.94.tar.gz", "has_sig": false, "md5_digest": "0d301f4517b7dfd83096b19f360623bb", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 264947, "upload_time": "2015-11-21T22:53:34", "url": "https://files.pythonhosted.org/packages/78/d1/b32b4030ca5c456305bfb66d36e73b8c862158513d422ca140348abbc7ed/amas-0.94.tar.gz" } ], "0.95": [ { "comment_text": "", "digests": { "md5": "e49e837d64a75955fda5cb9d5dc3cbd5", "sha256": "4f177f19d442c338a8d93363f2c945d109e07e36a16e84398b46a8fbdce0a204" }, "downloads": -1, "filename": "amas-0.95.tar.gz", "has_sig": false, "md5_digest": "e49e837d64a75955fda5cb9d5dc3cbd5", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 265998, "upload_time": "2015-11-25T00:58:13", "url": "https://files.pythonhosted.org/packages/f0/ae/6c75ed58cdc0829d8dd1c6a29baef248373b7cb8b0904fb56c4596f1655f/amas-0.95.tar.gz" } ], "0.96": [ { "comment_text": "", "digests": { "md5": "aef6a49960882582e3098f691d3349f8", "sha256": "7cafa9d6ecb2653490bd878bc55f2ba33ee11fb536fff762ad4250c2d57072db" }, "downloads": -1, "filename": 
"amas-0.96.tar.gz", "has_sig": false, "md5_digest": "aef6a49960882582e3098f691d3349f8", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 266837, "upload_time": "2015-12-07T17:47:34", "url": "https://files.pythonhosted.org/packages/a3/82/f2af0f5c68c31e27aedef68cc33a6eececba5a330a88ad2570aabd2a6aef/amas-0.96.tar.gz" } ], "0.97": [ { "comment_text": "", "digests": { "md5": "ca55dd6c53f7bc709209aad6bf17b413", "sha256": "1cf56738dd3d3c68441bdbc01283e29dd07225566339dd50bdef5f7e8548bfa4" }, "downloads": -1, "filename": "amas-0.97.tar.gz", "has_sig": false, "md5_digest": "ca55dd6c53f7bc709209aad6bf17b413", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 266948, "upload_time": "2015-12-07T18:48:41", "url": "https://files.pythonhosted.org/packages/98/11/f9775f6894c3d9e2f043b548351c327b2cf2423952a399379c0ddff07d1a/amas-0.97.tar.gz" } ], "0.98": [ { "comment_text": "", "digests": { "md5": "c87a7c621057c53c46e8b099f8c0a228", "sha256": "aa5dab87acb8ee001ba5ca7a2ef1bf2a39498f8b4dd6bc290ddc4d47e778c17e" }, "downloads": -1, "filename": "amas-0.98.tar.gz", "has_sig": false, "md5_digest": "c87a7c621057c53c46e8b099f8c0a228", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 267990, "upload_time": "2016-01-29T03:19:08", "url": "https://files.pythonhosted.org/packages/f0/04/356c63c97a277e679349292236348e1f6731601482133a6ae2c4f8418730/amas-0.98.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "c87a7c621057c53c46e8b099f8c0a228", "sha256": "aa5dab87acb8ee001ba5ca7a2ef1bf2a39498f8b4dd6bc290ddc4d47e778c17e" }, "downloads": -1, "filename": "amas-0.98.tar.gz", "has_sig": false, "md5_digest": "c87a7c621057c53c46e8b099f8c0a228", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 267990, "upload_time": "2016-01-29T03:19:08", "url": 
"https://files.pythonhosted.org/packages/f0/04/356c63c97a277e679349292236348e1f6731601482133a6ae2c4f8418730/amas-0.98.tar.gz" } ] }