{ "info": { "author": "Leszek Pryszcz", "author_email": "l.p.pryszcz+distutils@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Programming Language :: Python :: 2", "Programming Language :: Python :: 2.7" ], "description": ".. contents:: Table of Contents\n\npyScaf\n======\n\npyScaf orders contigs from genome assemblies utilising several types of information:\n\n- paired-end (PE) and/or mate-pair libraries ([NGS-based mode](#ngs-based-scaffolding))\n- long reads ([long read-based mode](#scaffolding-based-on-long-reads))\n- synteny to the genome of some related species ([reference-based mode](#reference-based-scaffolding))\n\n=================\nScaffolding modes\n=================\n\nNGS-based scaffolding\n~~~~~~~~~~~~~~~~~~~~~\nThis is under development... Stay tuned. \n\nScaffolding based on long reads\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\nExperimental version available.\n\nReference-based scaffolding\n~~~~~~~~~~~~~~~~~~~~~~~~~~~\nIn reference-based mode, pyScaf uses synteny to the genome of a closely related species to order contigs and estimate distances between adjacent contigs.\n\nContigs are aligned globally (end-to-end) onto reference chromosomes, ignoring:\n\n- matches not satisfying cut-offs (`--identity` and `--overlap`)\n- suboptimal matches (only the best match of each query to the reference is kept) \n- matches that overlap on the reference. \n\nIn preliminary tests, pyScaf performed superbly on simulated heterozygous genomes based on *C. parapsilosis* (13 Mb; CANPA) and *A. thaliana* (119 Mb; ARATH) chromosomes, correctly reconstructing all chromosomes in all tests for CANPA and in nearly all tests for ARATH (`Figures in dropbox `_, `CANPA table `_, `ARATH table `_). \nRuns took ~0.5 min for CANPA on `4 CPUs` and ~2 min for ARATH on `16 CPUs`. 
\n\n**Important remarks:**\n\n- Reduce your assembly beforehand (fasta2homozygous.py), as any redundancy will likely break synteny.\n- pyScaf works better with contigs than scaffolds, as scaffolds are often affected by mis-assemblies (no *de novo assembler* / scaffolder is perfect...), which break synteny. \n- pyScaf works very well if the divergence between the reference genome and the assembled contigs is below 20% at the nucleotide level. \n- pyScaf deals with large rearrangements, i.e. deletions, insertions, inversions and translocations. **Note, however, that this is an experimental implementation!**\n- Consider closing gaps after scaffolding. \n\n=====\nUsage\n=====\nDependencies\n~~~~~~~~~~~~\n- `LAST v700+ `_\n- `FastaIndex `_\n\nParameters\n~~~~~~~~~~\nGiven a reference genome, the program generates a pairwise genome alignment plot (dotplot) by default. \n\n- General options:\n\n -h, --help show this help message and exit\n -f FASTA, --fasta FASTA\n assembly FASTA file\n -o OUTPUT, --output OUTPUT\n output stream [scaffolds.fa]\n -t THREADS, --threads THREADS\n max no. of threads to run [4]\n --log LOG output log to [stderr]\n --dotplot\n generate dotplot as [png]\n --version show program's version number and exit\n\n- Reference-based scaffolding options:\n\n -r REF, --ref REF, --reference REF\n reference FastA file\n --identity IDENTITY min. identity [0.33]\n --overlap OVERLAP min. overlap [0.66]\n -g MAXGAP, --maxgap MAXGAP\n max. 
distance between adjacent contigs [0.01 * assembly_size]\n --norearrangements high identity mode (rearrangements not allowed)\n\n- Long read-based scaffolding options (EXPERIMENTAL!): \n\n -n LONGREADS, --longreads LONGREADS\n FastQ/FastA file(s) with PacBio/ONT reads\n\n- NGS-based scaffolding options (!NOT IMPLEMENTED!):\n\n -i FASTQ, --fastq FASTQ\n FASTQ PE/MP files\n -j JOINS, --joins JOINS\n min pairs to join contigs [5]\n -a LINKRATIO, --linkratio LINKRATIO\n max link ratio between two best contig pairs [0.7]\n -l LOAD, --load LOAD align subset of reads [0.2]\n -q MAPQ, --mapq MAPQ min mapping quality [10]\n\n\nTest run\n~~~~~~~~\nTo perform reference-based scaffolding, provide the assembled contigs and the reference genome in FastA format.\nDotplots of the runs below can be found in [docs](/docs).\nIf you wish to skip dotplot generation (e.g. no X11 on your system), provide the `--dotplot ''` parameter. \n\n.. code-block:: bash\n\n # scaffold homogenised assembly (reduced contigs)\n ./pyScaf.py -f test/contigs.reduced.fa -r test/ref.fa -o test/contigs.reduced.ref.fa\n\n # scaffold reduced contigs using global mode (no rearrangements allowed)\n ./pyScaf.py -f test/contigs.reduced.fa -r test/ref.fa -o test/contigs.reduced.ref.global.fa --norearrangements\n\n # scaffold heterozygous assembly (de novo assembled contigs)\n ./pyScaf.py -f test/contigs.fa -r test/ref.fa -o test/contigs.ref.fa\n\n # scaffold reduced contigs using long reads\n ## pacbio\n ./pyScaf.py -f test/contigs.reduced.fa -n test/pacbio.fq.gz -o test/contigs.reduced.pacbio.fa\n ## nanopore\n ./pyScaf.py -f test/contigs.reduced.fa -n test/nanopore.fa.gz -o test/contigs.reduced.nanopore.fa\n\n # generate dotplot\n lastdb test/ref.fa\n lastal -f TAB test/ref.fa test/contigs.reduced.pacbio.fa | last-dotplot - test/contigs.reduced.pacbio.fa.ref.png\n lastal -f TAB test/ref.fa test/contigs.reduced.nanopore.fa | last-dotplot - test/contigs.reduced.nanopore.fa.ref.png\n\n # clean-up\n #rm test/contigs.{,reduced.}fa.* 
test/ref.fa.* test/*.{nanopore,pacbio,ref}* test/*.log\n\n\n================\nProof of concept\n================\npyScaf is under heavy development right now.\nNevertheless, the reference-based mode is functional and produces meaningful assemblies. Moreover, it has been implemented in `Redundans `_.\n\nFor more info, have a look in `workbook `_. \n\n\n\n", "description_content_type": null, "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/lpryszcz/pyScaf", "keywords": "assembly scaffolding paired-end long-reads synteny", "license": "GPLv3", "maintainer": "", "maintainer_email": "", "name": "pyScaf", "package_url": "https://pypi.org/project/pyScaf/", "platform": "", "project_url": "https://pypi.org/project/pyScaf/", "project_urls": { "Homepage": "https://github.com/lpryszcz/pyScaf" }, "release_url": "https://pypi.org/project/pyScaf/0.12a4/", "requires_dist": [ "FastaIndex" ], "requires_python": "", "summary": "", "version": "0.12a4" }, "last_serial": 2584919, "releases": { "0.12a0": [ { "comment_text": "", "digests": { "md5": "c9ed6ac8b37b92c9f67ee2b3158bd783", "sha256": "0c3bec17c4c1047fee4dde2982a993945533a8f69393872753e69675b863f1bf" }, "downloads": -1, "filename": "pyScaf-0.12a0-py2-none-any.whl", "has_sig": false, "md5_digest": "c9ed6ac8b37b92c9f67ee2b3158bd783", "packagetype": "bdist_wheel", "python_version": "py2", "requires_python": null, "size": 19935, "upload_time": "2017-01-19T13:06:05", "url": "https://files.pythonhosted.org/packages/1c/b1/27e36324e913a034d0183bca499aef64a20301cad26fdfb6246933153650/pyScaf-0.12a0-py2-none-any.whl" } ], "0.12a1": [ { "comment_text": "", "digests": { "md5": "7a8ae693655bd1cfd7643faebf851367", "sha256": "242768239549d82cdfff5f9500c25e3d4b01439e61e097eccf69eeb8ee80fb4b" }, "downloads": -1, "filename": "pyScaf-0.12a1-py2-none-any.whl", "has_sig": false, "md5_digest": "7a8ae693655bd1cfd7643faebf851367", "packagetype": "bdist_wheel", 
"python_version": "py2", "requires_python": null, "size": 20249, "upload_time": "2017-01-19T13:17:30", "url": "https://files.pythonhosted.org/packages/5e/a1/e2be4b333940264bddd40694a09e33450433f8ddf5dafe155dc72b0be7eb/pyScaf-0.12a1-py2-none-any.whl" } ], "0.12a2": [ { "comment_text": "", "digests": { "md5": "b04bb35319f8a1ea0095769809f79cd8", "sha256": "3cff06a97cbab925c47f90ad03943edd9f92142f446434c94e2d5dc75ba8d645" }, "downloads": -1, "filename": "pyScaf-0.12a2-py2-none-any.whl", "has_sig": false, "md5_digest": "b04bb35319f8a1ea0095769809f79cd8", "packagetype": "bdist_wheel", "python_version": "py2", "requires_python": null, "size": 20317, "upload_time": "2017-01-19T13:19:16", "url": "https://files.pythonhosted.org/packages/14/96/3b9ca835dd2ea6fc95636576c4d22644782710bbd4b8a4e16b24da88dee5/pyScaf-0.12a2-py2-none-any.whl" } ], "0.12a3": [ { "comment_text": "", "digests": { "md5": "7442783b02081c10774e143ea4419ef1", "sha256": "f70296ec03b6893f9dd539658ee34d7ff10d2b69d4d50b6fe501375f24825b93" }, "downloads": -1, "filename": "pyScaf-0.12a3-py2-none-any.whl", "has_sig": false, "md5_digest": "7442783b02081c10774e143ea4419ef1", "packagetype": "bdist_wheel", "python_version": "py2", "requires_python": null, "size": 20319, "upload_time": "2017-01-19T13:22:24", "url": "https://files.pythonhosted.org/packages/8f/3c/647b0606007f450a843e504ee645d7aba0d04420a2a3216282ed1d6ebec1/pyScaf-0.12a3-py2-none-any.whl" } ], "0.12a4": [ { "comment_text": "", "digests": { "md5": "5837572dce0c88f79b8240a2f894bf64", "sha256": "8df880c5c0560fa1d2f76b509f964ed14baa0ed884b46616f28be5da4d538dac" }, "downloads": -1, "filename": "pyScaf-0.12a4-py2-none-any.whl", "has_sig": false, "md5_digest": "5837572dce0c88f79b8240a2f894bf64", "packagetype": "bdist_wheel", "python_version": "py2", "requires_python": null, "size": 20366, "upload_time": "2017-01-19T13:52:56", "url": 
"https://files.pythonhosted.org/packages/29/e4/fdc8ffca0a993076d240bc95afcc26c73feaec6128dd3073d07aad3cbed9/pyScaf-0.12a4-py2-none-any.whl" }, { "comment_text": "", "digests": { "md5": "c67526747eb04d1e28279ac310916d40", "sha256": "3ce3f6fe80bd058831b6a38a56d464ef10f3ebbdd6bc3dcb0d7f127c0b2c1b36" }, "downloads": -1, "filename": "pyScaf-0.12a4.tar.gz", "has_sig": false, "md5_digest": "c67526747eb04d1e28279ac310916d40", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 34912, "upload_time": "2017-01-19T13:52:59", "url": "https://files.pythonhosted.org/packages/ee/52/a947347d00c323a87588d6b6d5ad54b3656a5df2f3bcaad477833a43d1f6/pyScaf-0.12a4.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "5837572dce0c88f79b8240a2f894bf64", "sha256": "8df880c5c0560fa1d2f76b509f964ed14baa0ed884b46616f28be5da4d538dac" }, "downloads": -1, "filename": "pyScaf-0.12a4-py2-none-any.whl", "has_sig": false, "md5_digest": "5837572dce0c88f79b8240a2f894bf64", "packagetype": "bdist_wheel", "python_version": "py2", "requires_python": null, "size": 20366, "upload_time": "2017-01-19T13:52:56", "url": "https://files.pythonhosted.org/packages/29/e4/fdc8ffca0a993076d240bc95afcc26c73feaec6128dd3073d07aad3cbed9/pyScaf-0.12a4-py2-none-any.whl" }, { "comment_text": "", "digests": { "md5": "c67526747eb04d1e28279ac310916d40", "sha256": "3ce3f6fe80bd058831b6a38a56d464ef10f3ebbdd6bc3dcb0d7f127c0b2c1b36" }, "downloads": -1, "filename": "pyScaf-0.12a4.tar.gz", "has_sig": false, "md5_digest": "c67526747eb04d1e28279ac310916d40", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 34912, "upload_time": "2017-01-19T13:52:59", "url": "https://files.pythonhosted.org/packages/ee/52/a947347d00c323a87588d6b6d5ad54b3656a5df2f3bcaad477833a43d1f6/pyScaf-0.12a4.tar.gz" } ] }