{ "info": { "author": "", "author_email": "", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Intended Audience :: Developers", "License :: OSI Approved :: GNU General Public License v3 (GPLv3)", "Programming Language :: Python :: 2", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.3", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Topic :: Software Development :: Build Tools" ], "description": "NanoSim-H\n=========\n\n.. image:: https://travis-ci.org/karel-brinda/NanoSim-H.svg?branch=master\n\t:target: https://travis-ci.org/karel-brinda/NanoSim-H\n\n.. image:: https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat-square\n\t:target: https://anaconda.org/bioconda/nanosim-h\n\n.. image:: https://badge.fury.io/py/NanoSim-H.svg\n\t:target: https://badge.fury.io/py/NanoSim-H\n\n\nAbout\n-----\n\nNanoSim-H is a simulator of Oxford Nanopore reads that captures the technology-specific features of ONT data,\nand allows for adjustments upon improvement of Nanopore sequencing technology.\nNanoSim-H has been derived from `NanoSim `_,\na software package developed by Chen Yang at `Canada's Michael Smith Genome Sciences Centre `_.\nThe fork was created from version 1.0.1 and the versions of NanoSim-H and NanoSim are kept synchronized.\n\nNanoSim-H is implemented in Python and uses R for model fitting.\nIn silico reads can be simulated from a given reference genome using ``nanosim-h``.\nThe NanoSim-H package is distributed with several precomputed error profiles, but\nadditional profiles can be computed using the ``nanosim-h-train`` command.\n\nThe main improvements compared to NanoSim are:\n\n* Support for Python 3\n* Support for `RNF `_ read names\n* Installation from `PyPI `_\n* Error profiles distributed with the main package\n* Automatic testing using `Travis `_\n* Reproducible simulations 
(setting a seed for the PRNG)\n* Improved interface with new parameters (e.g., for merging all contigs) and a progress bar\n* Several minor bugs fixed\n\n\n\nQuick example\n-------------\n\nSimulation of 100 reads from an *E. coli* genome.\n\n.. code-block:: bash\n\n\tpip install --upgrade nanosim-h\n\tcurl \"https://www.ncbi.nlm.nih.gov/sviewer/viewer.fcgi?db=nuccore&dopt=fasta&val=545778205&sendto=on\" | \\\n\t\tnanosim-h -n 100 -\n\n\n\nInstallation\n------------\n\n**From** `Bioconda `_ **(recommended):**\n\n\n.. code-block:: bash\n\n\tconda config --add channels defaults\n\tconda config --add channels conda-forge\n\tconda config --add channels bioconda\n\tconda install -y nanosim-h\n\n**From** `PyPI `_ **:**\n\n.. code-block:: bash\n\n\tpip install --upgrade nanosim-h\n\n**From GitHub:**\n\n.. code-block:: bash\n\n\tgit clone https://github.com/karel-brinda/nanosim-h\n\tcd nanosim-h\n\tpip install --upgrade .\n\nor\n\n.. code-block:: bash\n\n\tgit clone https://github.com/karel-brinda/nanosim-h\n\tcd nanosim-h\n\tpython setup.py install\n\n\n**Dependencies:**\n\nFor read simulation:\n\n* `Python `_ (2.7, 3.2 - 3.6)\n* `NumPy `_\n\nFor computing new error profiles:\n\n* `LAST `_ (tested with version 847)\n* `R `_\n\nWhen installed using Bioconda, all NanoSim-H dependencies get installed automatically.\nWhen installed using pip, all dependencies for read simulation are installed automatically.\n\n\nRead simulation\n---------------\n\nThe simulation stage takes a reference genome and, optionally, an error profile as input, and outputs simulated reads in FASTA format.\n\n\n.. command: nanosim-h --help\n\n.. 
code-block::\n\n\t$ nanosim-h --help\n\tusage: nanosim-h [-h] [-v] [-p str] [-o str] [-n int] [-u float] [-m float]\n\t [-i float] [-d float] [-s int] [--circular] [--perfect]\n\t [--merge-contigs] [--rnf] [--rnf-add-cigar] [--max-len int]\n\t [--min-len int] [--kmer-bias int]\n\t \n\t\n\tProgram: NanoSim-H - a simulator of Oxford Nanopore reads.\n\tVersion: 1.1.0.4\n\tAuthors: Chen Yang - author of the original software package (NanoSim)\n\t Karel Brinda - author of the NanoSim-H fork\n\t\n\tpositional arguments:\n\t reference genome (- for standard input)\n\t\n\toptional arguments:\n\t -h, --help show this help message and exit\n\t -v, --version show program's version number and exit\n\t -p str, --profile str\n\t error profile - one of precomputed profiles\n\t ('ecoli_R7.3', 'ecoli_R7', 'ecoli_R9_1D',\n\t 'ecoli_R9_2D', 'yeast', 'ecoli_UCSC1b') or own\n\t directory with an error profile [ecoli_R9_2D]\n\t -o str, --out-pref str\n\t prefix of output file [simulated]\n\t -n int, --number int number of generated reads [10000]\n\t -u float, --unalign-rate float\n\t rate of unaligned reads [detect from the error\n\t profile]\n\t -m float, --mis-rate float\n\t mismatch rate (weight tuning) [1.0]\n\t -i float, --ins-rate float\n\t insertion rate (weight tuning) [1.0]\n\t -d float, --del-rate float\n\t deletion rate (weight tuning) [1.0]\n\t -s int, --seed int initial seed for the pseudorandom number generator (0\n\t for random) [42]\n\t --circular circular simulation (linear otherwise)\n\t --perfect output perfect reads, no mutations\n\t --merge-contigs merge contigs from the reference\n\t --rnf use RNF format for read names\n\t --rnf-add-cigar add cigar to RNF names (not fully debugged, yet)\n\t --max-len int maximum read length [inf]\n\t --min-len int minimum read length [50]\n\t --kmer-bias int prohibits homopolymers with length >= n bases in\n\t output reads [6]\n\t\n\tExamples: nanosim-h --circular ecoli_ref.fasta\n\t nanosim-h --circular --perfect 
ecoli_ref.fasta\n\t nanosim-h -p yeast --kmer-bias 0 yeast_ref.fasta\n\t\n\tNotice: the use of `max-len` and `min-len` will affect the read length distributions. If\n\tthe range between `max-len` and `min-len` is too small, the program will run slowlier accordingly.\n\t\n\n.. end\n\n\n**Examples:**\n\n1. If you want to simulate reads from an *E. coli* genome, then circular mode should be used because it is a circular genome.\n\n\t``nanosim-h --circular Ecoli_ref.fasta``\n\n2. If you want to simulate only perfect reads, i.e., reads with no SNPs or indels, so that only the read length distribution is simulated, add the ``--perfect`` option.\n\n\t``nanosim-h --circular --perfect Ecoli_ref.fasta``\n\n3. If you want to simulate reads from a *S. cerevisiae* genome with no *k*-mer bias, then linear mode should be chosen because it is a linear genome.\n\n\t``nanosim-h -p yeast --kmer-bias 0 yeast_ref.fasta``\n\n\n**Output files:**\n\n1. ``simulated.log`` \u2013 Log file for simulation process.\n\n2. ``simulated.fa`` \u2013 FASTA file of simulated reads. Read names can record how each read was created, either in the RNF or in the original NanoSim naming convention.\n\n **RNF naming convention**\n\n See the associated `RNF paper `_ and `RNF specification `_.\n\n **NanoSim naming convention**\n\n\tEach read has \"unaligned\", \"aligned\", or \"perfect\" in its header, which determines its error rate. \"unaligned\" means that the read has an error rate over 90% and cannot be aligned. \"aligned\" reads have the same error rate as the training reads. \"perfect\" reads have no errors.\n\n\tTo explain the information in the header, we have two examples:\n\n\t* ``>ref|NC-001137|-[chromosome=V]_468529_unaligned_0_F_0_3236_0``\n\t\tEverything before the first ``_`` is chromosome information. ``468529`` is the start position, and ``unaligned`` indicates that the read should not align to the reference. The first ``0`` is the sequence index. ``F`` represents the forward strand. 
``0_3236_0`` means that the sequence extracted from the reference is 3236 bases long.\n\t* ``>ref|NC-001143|-[chromosome=XI]_115406_aligned_16565_R_92_12710_2``\n\t\tThis is an aligned read coming from chromosome XI at position 115406. ``16565`` is the sequence index. ``R`` represents the reverse-complement strand. ``92_12710_2`` means that this read has a 92-base head region (which cannot be aligned), followed by a 12710-base middle region, and then a 2-base tail region.\n\n\tThe information in the header can help users to locate the read easily.\n\n3. ``simulated.errors.txt`` \u2013 List of introduced errors.\n\n\tThe output contains the error type, position, original bases, and current bases.\n\n\nError profiles\n--------------\n\nThe characterization stage takes a reference and a training read set in FASTA format as input. Users can also provide their own alignment file in MAF format.\n\n\n**Profiles distributed with NanoSim-H:**\n\n* ``ecoli_R7``\n* ``ecoli_R7.3``\n* ``ecoli_R9_1D``\n* ``ecoli_R9_2D`` (default error profile for read simulation)\n* ``ecoli_UCSC1b``\n* ``yeast``\n\n**New error profiles:**\n\nA new error profile can be obtained using the ``nanosim-h-train`` command.\n\n.. command: nanosim-h-train --help\n\n.. 
code-block::\n\n\t$ nanosim-h-train --help\n\tusage: nanosim-h-train [-h] [-v] [-i str] [-m str] [-b int] [--no-model-fit]\n\t \n\t\n\tProgram: NanoSim-H-Train - compute an error profile for NanoSim-H.\n\tVersion: 1.1.0.4\n\tAuthors: Chen Yang - author of the original software package (NanoSim)\n\t Karel Brinda - author of the NanoSim-H fork\n\t\n\tpositional arguments:\n\t reference genome of the training reads\n\t error profile dir\n\t\n\toptional arguments:\n\t -h, --help show this help message and exit\n\t -v, --version show program's version number and exit\n\t -i str, --infile str training ONT real reads, must be fasta files\n\t -m str, --maf str user can provide their own alignment file, with maf\n\t extension\n\t -b int, --num-bins int\n\t number of bins (for development) [20]\n\t --no-model-fit no model fitting\n\t\n\n.. end\n\n**Files associated with an error profile:**\n\n1. ``aligned_length_ecdf`` \u2013 Length distribution of aligned regions on aligned reads.\n2. ``aligned_reads_ecdf`` \u2013 Length distribution of aligned reads.\n3. ``align_ratio`` \u2013 Empirical distribution of align ratio of each read.\n4. ``besthit.maf`` \u2013 The best alignment of each read based on length.\n5. ``match.hist``, ``mis.hist``, ``ins.hist``, ``del.hist`` \u2013 Histograms of matches, mismatches, insertions, and deletions.\n6. ``first_match.hist`` \u2013 Histogram of the first match length of each alignment.\n7. ``error_markov_model`` \u2013 Markov model of error types.\n8. ``ht_ratio`` \u2013 Empirical distribution of the head region vs total unaligned region.\n9. ``training.maf`` \u2013 The output of LAST, alignment file in MAF format.\n10. ``match_markov_model`` \u2013 Markov model of the length of matches (stretches of correct base calls).\n11. ``model_profile`` \u2013 Fitted model for errors.\n12. ``processed.maf`` \u2013 A re-formatted MAF file for user-provided alignment file.\n13. 
``unaligned_length_ecdf`` \u2013 Length distribution of unaligned reads", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/karel-brinda/nanosim-h", "keywords": "Nanopore simulation", "license": "GPLv3", "maintainer": "", "maintainer_email": "", "name": "NanoSim-H", "package_url": "https://pypi.org/project/NanoSim-H/", "platform": "", "project_url": "https://pypi.org/project/NanoSim-H/", "project_urls": { "Homepage": "https://github.com/karel-brinda/nanosim-h" }, "release_url": "https://pypi.org/project/NanoSim-H/1.1.0.4/", "requires_dist": null, "requires_python": "", "summary": "", "version": "1.1.0.4" }, "last_serial": 4145302, "releases": { "1.1.0.0": [ { "comment_text": "", "digests": { "md5": "9f74b3518acf8679eed8e4fb96e76553", "sha256": "8565bf378d0fe6e38399d8baa624ccd8f25a6454aa57aee78fbac4ae932254b9" }, "downloads": -1, "filename": "NanoSim-H-1.1.0.0.tar.gz", "has_sig": false, "md5_digest": "9f74b3518acf8679eed8e4fb96e76553", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 688305, "upload_time": "2017-05-10T18:53:54", "url": "https://files.pythonhosted.org/packages/e0/af/3afd825eedf07b4370ece52a0ba81f55c8cecdf70ba4951c486c2427bc51/NanoSim-H-1.1.0.0.tar.gz" } ], "1.1.0.1": [ { "comment_text": "", "digests": { "md5": "2f4f94dd5e1458c825442c971cc14df6", "sha256": "91bf1b5203225af1ae83d07a91353a748948e6c6caf99fdb001e58c08f6c7986" }, "downloads": -1, "filename": "NanoSim-H-1.1.0.1.tar.gz", "has_sig": false, "md5_digest": "2f4f94dd5e1458c825442c971cc14df6", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 688324, "upload_time": "2017-05-10T19:00:24", "url": "https://files.pythonhosted.org/packages/2c/13/a33d0e4c29b70ff1189bc6a56418c0f3cd151c49d6aaf551f7ca1ae1995c/NanoSim-H-1.1.0.1.tar.gz" } ], "1.1.0.2": [ { "comment_text": "", "digests": { "md5": 
"6c4da7a548a047b9c7b409ed60afd6b2", "sha256": "c714fb90f0c8e2672540e789ab0a9e797e2385372f45ef85cd8b82aee5fc1a29" }, "downloads": -1, "filename": "NanoSim-H-1.1.0.2.tar.gz", "has_sig": false, "md5_digest": "6c4da7a548a047b9c7b409ed60afd6b2", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 688326, "upload_time": "2017-05-10T19:02:29", "url": "https://files.pythonhosted.org/packages/60/ed/b0e9384cc59ba52e22b62b7517b356bafc18fafa64ae76ca5194745e92b2/NanoSim-H-1.1.0.2.tar.gz" } ], "1.1.0.3": [ { "comment_text": "", "digests": { "md5": "f1ddee7cbe95bf1efedc437d5c54ad80", "sha256": "de82ad6ee2b2fabd2cb513bfb353be6ec4e7d9d81681d78cca6ec1af4812a27a" }, "downloads": -1, "filename": "NanoSim-H-1.1.0.3.tar.gz", "has_sig": false, "md5_digest": "f1ddee7cbe95bf1efedc437d5c54ad80", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 688265, "upload_time": "2017-05-16T19:09:42", "url": "https://files.pythonhosted.org/packages/20/32/4e5abbfb64835c62be3f4d3a5178e33a2f6aabea9cceb3e15927c90b0f39/NanoSim-H-1.1.0.3.tar.gz" } ], "1.1.0.4": [ { "comment_text": "", "digests": { "md5": "68d2c6724dd0f170964383684c5e8b8c", "sha256": "cd911f9b05419e164a92e5d36d8ae1e6d60641b54a8ca4bf5a179be4e89a52c6" }, "downloads": -1, "filename": "NanoSim_H-1.1.0.4-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "68d2c6724dd0f170964383684c5e8b8c", "packagetype": "bdist_wheel", "python_version": "3.6", "requires_python": null, "size": 695168, "upload_time": "2018-08-07T17:55:30", "url": "https://files.pythonhosted.org/packages/e2/f3/a339c42515d5beec10c22e8a6c821668776714b4cdda17f0bc065c538409/NanoSim_H-1.1.0.4-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "e6bd54bd59ac87d8d9416bfe6edec8a7", "sha256": "0fc5f31f3569a04d77bcf8322390301f01d4c26da67d05cf79a57b44d00e341c" }, "downloads": -1, "filename": "NanoSim-H-1.1.0.4.tar.gz", "has_sig": false, "md5_digest": "e6bd54bd59ac87d8d9416bfe6edec8a7", "packagetype": "sdist", 
"python_version": "source", "requires_python": null, "size": 697244, "upload_time": "2018-08-07T17:55:28", "url": "https://files.pythonhosted.org/packages/f5/44/8815be8aec318b1d77b4ce10d523081cb6975cc3c76382c8ab971d0a96eb/NanoSim-H-1.1.0.4.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "68d2c6724dd0f170964383684c5e8b8c", "sha256": "cd911f9b05419e164a92e5d36d8ae1e6d60641b54a8ca4bf5a179be4e89a52c6" }, "downloads": -1, "filename": "NanoSim_H-1.1.0.4-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "68d2c6724dd0f170964383684c5e8b8c", "packagetype": "bdist_wheel", "python_version": "3.6", "requires_python": null, "size": 695168, "upload_time": "2018-08-07T17:55:30", "url": "https://files.pythonhosted.org/packages/e2/f3/a339c42515d5beec10c22e8a6c821668776714b4cdda17f0bc065c538409/NanoSim_H-1.1.0.4-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "e6bd54bd59ac87d8d9416bfe6edec8a7", "sha256": "0fc5f31f3569a04d77bcf8322390301f01d4c26da67d05cf79a57b44d00e341c" }, "downloads": -1, "filename": "NanoSim-H-1.1.0.4.tar.gz", "has_sig": false, "md5_digest": "e6bd54bd59ac87d8d9416bfe6edec8a7", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 697244, "upload_time": "2018-08-07T17:55:28", "url": "https://files.pythonhosted.org/packages/f5/44/8815be8aec318b1d77b4ce10d523081cb6975cc3c76382c8ab971d0a96eb/NanoSim-H-1.1.0.4.tar.gz" } ] }