{ "info": { "author": "Eric Hugoson", "author_email": "eric@hugoson.org", "bugtrack_url": null, "classifiers": [ "Development Status :: 5 - Production/Stable", "Environment :: Console", "Intended Audience :: Science/Research", "License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7", "Topic :: Scientific/Engineering :: Bio-Informatics" ], "description": "==============\n**randMAG**\n==============\n\n- Eric Hugoson (eric@hugoson.org / hugoson@evolbio.mpg.de / @EricHugo)\n\n\nIntroduction\n--------------\nWith an ever increasing rate of sequencing comes challenges in analysing the data.\nIn particular, single-amplified genomes and metagenome assembled genomes (SAGs & MAGs) may\nprove challenging due to their propensity to be incomplete or contaminated\nwith contigs from other genomes. Many tools exist to tackle the challenges at\nvarious stages of the analysis, but none are perfect. In order to devise new\ntools methods for dealing with SAGs and MAGs the ability to simulate such data\nsets is paramount.\n\nrandMAG is a slim software aimed at producing artificial MAGs or SAGs from existent\ncomplete genomes. randMAG produces randomised MAGs based on whole genomes provided, a\ndistribution of contig lengths given, as well as desired completeness- and\ncontamination levels.\n\nPackaged with randMAG is a file containing the lengths of 192243 contigs of 2284\nbacterial MAGs [1] from the Tara Oceans metagenomic survey.\n\n#. [TullyEtAl_2018]_\n\nDependencies\n--------------\nPython (>=3.6)\n\nPython libraries\n^^^^^^^^^^^^^^^^^^^\nIf built from the PyPI package these will be installed automatically, otherwise can\neasily be installed using ``pip`` (`Install pip `_).\n\nRequired\n\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\n- Biopython (>= 1.70)\n- Numpy (>= 1.13.1)\n- Matplotlib (>= 2.0.2)\n- SciPy (>= 1.1.0)\n- seaborn (>= 0.9.0)\n- pandas (>= 0.23.4)\n\nPositional arguments\n^^^^^^^^^^^^^^^^^^^^^^^\n::\n\n genome_tab Genomes in .fna in list format\n distribution Set of lengths to base the distribution on\n\n``genome_tab`` should be a simple file with a list of genomes (nucleotide fasta files)\nthat will be used to built the artificial MAGs, e.g.: ::\n\n GCF_000007265.1_ASM726v1_genomic.fna\n GCF_000007325.1_ASM732v1_genomic.fna\n ...\n\n``distribution`` should also be a simple file containing a list of integers representing\na distribution of contig sizes by which the artificial contigs will be split, e.g.::\n\n 23967\n 15981\n 149609\n ...\n\nOptionally, ``distribution`` can be filled with ``Tara_bact`` to use the set of contig\nlengths from Tara Oceans described in the introduction.\n\n\nOptional arguments\n^^^^^^^^^^^^^^^^^^^^^^^^\n -h, --help show this help message and exit\n -c COMPLETENESS, --completeness COMPLETENESS\n Range of completeness levels to be produced.\n Default=1.0\n -r CONTAMINATION, --contamination CONTAMINATION\n Range of contamination to be included in produced\n MAGs. Default=1.0\n -n NUM, --num NUM Number of randomised MAGs to produce. Produces one\n randomised per provided genome by default.\n\n\nExamples\n^^^^^^^^^^^^^^^^^^^^^^^^\nInitially, create a file with a list of the genomic fasta files which are to be the basis for the artificial MAGs::\n\n $ head -n5 reference_fna.list \n /nobackup/genomic_sources/genbank_reference_bacteria_fna_20181128/ncbi-genomes-2018-11-28/GCF_000005845.2_ASM584v2_genomic.fna\n /nobackup/genomic_sources/genbank_reference_bacteria_fna_20181128/ncbi-genomes-2018-11-28/GCF_000006745.1_ASM674v1_genomic.fna\n /nobackup/genomic_sources/genbank_reference_bacteria_fna_20181128/ncbi-genomes-2018-11-28/GCF_000006765.1_ASM676v1_genomic.fna\n /nobackup/genomic_sources/genbank_reference_bacteria_fna_20181128/ncbi-genomes-2018-11-28/GCF_000006785.2_ASM678v2_genomic.fna\n /nobackup/genomic_sources/genbank_reference_bacteria_fna_20181128/ncbi-genomes-2018-11-28/GCF_000006845.1_ASM684v1_genomic.fna\n\n\nThen a second file which is a list of lengths representing the distribution of contig sizes::\n\n $ head -n5 Tara_bact\n 23967\n 15981\n 149609\n 37636\n 121320\n\nExample 1 - Split into contigs\n\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\n\nThis example splits up all .fna files in ``reference_fna.list`` into contigs according\nto the distribution of contig lengths in ``Tara_bact``::\n\n $ randMAG reference_fna.list Tara_bact\n\nThe result is all a single .fna file created in the current working directory for\neach input .fna. Also produced is ``simulated_MAGs.tab``, a tabular file containing input\nfile, the unique 8-char suffix assigned, completeness-, and redundancy fractions::\n\n $ head -n5 simulated_MAGs.tab\n GCF_000005845.2_ASM584v2_genomic\tmpmwoutk\t1.0\t1.0\n GCF_000006745.1_ASM674v1_genomic\tmmqehxfx\t1.0\t1.0\n GCF_000006765.1_ASM676v1_genomic\tengiyxoz\t1.0\t1.0\n GCF_000006785.2_ASM678v2_genomic\trtvayhqu\t1.0\t1.0\n GCF_000006845.1_ASM684v1_genomic\tbzpsqnfq\t1.0\t1.0\n\nThe content of the fasta files in this example remained unchanged apart from\nbeing split into contigs.\n\nExample 2 - Alter completeness/contamination\n\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\nTo change the completeness and contamination of the fasta files the ``-c`` and ``-r``\narguments need to be used. The ``-n`` argument can be used to get precisely the\ndesired number of unqiue MAGs::\n\n $ randMAG reference_fna.list Tara_bact -c 0.7 -r 1.2 -n 10000\n\nThis will produce files that are at most 70% complete and at least 20% contaminated::\n\n $ head -n5 simulated_MAGs.tab \n GCF_000005845.2_ASM584v2_genomic\tkfcckaxy\t0.6956260400391929\t1.2092498368299183\n GCF_000006745.1_ASM674v1_genomic\txqnerzfy\t0.6927500102156292\t1.202911467089386\n GCF_000006765.1_ASM676v1_genomic\tkiubfyau\t0.6988059837775469\t1.200510423558475\n GCF_000006785.2_ASM678v2_genomic\txxltcsmv\t0.6849106013550827\t1.2144702270571384\n GCF_000006845.1_ASM684v1_genomic\trcxezdoq\t0.6927952822804169\t1.2028640216997193\n\nAs well as 10 000 unique MAGs as requested with ``-n``::\n\n $ wc -l simulated_MAGs.tab\n 10000 simulated_MAGs.tab\n\nReferences\n----------------\n\n.. [TullyEtAl_2018] Tully,B.J. et al. (2018) The reconstruction of 2,631 draft metagenome-assembled genomes from the global oceans. Sci. Data, 5, 170203", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://bitbucket.org/EricHugo/randmag", "keywords": "bioinformatics simulation metagenomics", "license": "GNU General Public License v3 or later (G PLv3+)", "maintainer": "", "maintainer_email": "", "name": "randmag", "package_url": "https://pypi.org/project/randmag/", "platform": "", "project_url": "https://pypi.org/project/randmag/", "project_urls": { "Homepage": "https://bitbucket.org/EricHugo/randmag" }, "release_url": "https://pypi.org/project/randmag/1.2.0/", "requires_dist": null, "requires_python": "", "summary": "Construct artifical MAGs from complete genomes according to distribution", "version": "1.2.0" }, "last_serial": 5589753, "releases": { "1.0.0": [ { "comment_text": "", "digests": { "md5": "657c85a4908b0cf0d24e6eeb4677148e", "sha256": "7588cdaa730f63f392910f06390c42526556058d55e011c7681e28bca66030a2" }, "downloads": -1, "filename": "randmag-1.0.0.tar.gz", "has_sig": false, "md5_digest": "657c85a4908b0cf0d24e6eeb4677148e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 16893, "upload_time": "2019-02-05T18:42:23", "url": "https://files.pythonhosted.org/packages/2e/9c/136e617d7b4fa5ef2960fcd3e875e283e305ef1245331a0421e59fbf9974/randmag-1.0.0.tar.gz" } ], "1.1.0": [ { "comment_text": "", "digests": { "md5": "3d729ab0e58c7a97f73c1a69296a5e5d", "sha256": "f8d7a43aa30da62a2fc20b46a8d8fd557e6cdd3cecd2927007e0b2df376a2be8" }, "downloads": -1, "filename": "randmag-1.1.0.tar.gz", "has_sig": false, "md5_digest": "3d729ab0e58c7a97f73c1a69296a5e5d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 17089, "upload_time": "2019-02-21T21:52:01", "url": "https://files.pythonhosted.org/packages/c8/9e/b8bfc5c8f7a641c39e2a01d3b7e3946ad3cdb2f5b43eb9503b32552cd1ed/randmag-1.1.0.tar.gz" } ], "1.2.0": [ { "comment_text": "", "digests": { "md5": "aeb0cb226ae070e392c45ebd1d90185e", "sha256": "0bde02a1113e4c95894ee9b3c133ca3f891e26c85c5e05590f25858c4892d9c2" }, "downloads": -1, "filename": "randmag-1.2.0.tar.gz", "has_sig": false, "md5_digest": "aeb0cb226ae070e392c45ebd1d90185e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 526437, "upload_time": "2019-07-26T16:28:56", "url": "https://files.pythonhosted.org/packages/32/7a/157b5f44a48542076dbef835d4d639a1e9c20f07a8876e9cb96e89771c03/randmag-1.2.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "aeb0cb226ae070e392c45ebd1d90185e", "sha256": "0bde02a1113e4c95894ee9b3c133ca3f891e26c85c5e05590f25858c4892d9c2" }, "downloads": -1, "filename": "randmag-1.2.0.tar.gz", "has_sig": false, "md5_digest": "aeb0cb226ae070e392c45ebd1d90185e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 526437, "upload_time": "2019-07-26T16:28:56", "url": "https://files.pythonhosted.org/packages/32/7a/157b5f44a48542076dbef835d4d639a1e9c20f07a8876e9cb96e89771c03/randmag-1.2.0.tar.gz" } ] }