{ "info": { "author": "Fidel Ram\u00edrez, Gina Renschler, Gautier Richard", "author_email": "fidel.ramirez@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 5 - Production/Stable", "Environment :: Console", "Framework :: Jupyter", "Intended Audience :: Developers", "Intended Audience :: Science/Research", "Natural Language :: English", "Operating System :: MacOS :: MacOS X", "Operating System :: Microsoft :: Windows", "Operating System :: POSIX :: Linux", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Topic :: Scientific/Engineering :: Bio-Informatics" ], "description": "HiCAssembler\n============\n\nHi-C scaffolding tool to assemble contigs/scaffolds into complete chromosomes\n-----------------------------------------------------------------------------\n\nThis software uses Hi-C sequencing data to assemble contigs/scaffolds into\ncomplete chromosomes. The assembly process consists of the following steps:\n\n * creation of corrected Hi-C contact matrix\n * detection of mis-assemblies (automatic and/or manual)\n * creation of initial path graph\n * iterative joining of high-confidence scaffold paths\n * addition of scaffolds that were not used yet\n * saving of scaffolds fasta file and liftover chain file\n\nHiCAssembler automatically visualizes the assembly process to inform\nthe user on the assembly status\n\n![HiCAssembler assembly](./docs/content/images/HiCAssembler_assembly.png)\n\n\nInstallation\n------------\nHiCAssembler works with python 2.7 and requires that [HiCExplorer](https://hicexplorer.readthedocs.io/) is installed\n\nTo install HiCAssembler use pip.\n\n```bash\n$ pip install HiCAssembler\n```\n\nIf you want to install the latest version use:\n\n```bash\n$ pip install git+https://github.com/maxplanck-ie/HiCAssembler.git\n```\n\n\nUsage\n-----\nBefore running HiCAssembler, creation of a corrected Hi-C matrix in h5 format\nis required. This file format is the output created by HiCExporer (http://hicexplorer.readthedocs.io/).\nHi-C reads need to be mapped to your pre-assembled contigs/scaffolds and then the\nHi-C matrix needs to be created and corrected. An example usage of HiCExporer for\nthese steps can be found at http://hicexplorer.readthedocs.io/en/latest/content/example_usage.html.\n\nAfterwards, you can start to assemble your pre-assembled contigs/scaffolds\ninto chromosomes using HiCAssembler.\n\n```bash\n$ assemble -m Hi_C_matrix_corrected.h5 -o ./assembly_output \\\n--min_scaffold_length 100000 --bin_size 5000 --misassembly_zscore_threshold -1.0 \\\n--num_iterations 3 --num_processors 16\n```\n\n`--min_scaffold_length 100000` sets the minimal length of pre-assembled scaffolds\nto 100 kb. Scaffolds smaller than 100 kb are added after the iterative correction.\n\n`--bin_size 5000` sets the Hi-C bin size to 5 kb. This would be the size of\nhigh-resolution bins referred to in the algorithm description.\n\n`--misassembly_zscore_threshold -1.0` sets the threshold deciding if a\nTAD-separation score is strong enough to be considered a mis-assembly.\n\n`--num_iterations 3` sets the number of assembly iterations to 3.\n\n\nIn case your final result contains assembly errors, you can manually correct them.\nThe position of assembly errors can be specified and added as a position to\ncut your pre-assembled contigs/scaffolds before the assembly using the\n`--split_positions_file split.bed` parameter. The exact position of the error\nin the pre-assembled contigs/scaffolds can be identified by using the tool\n`plotScaffoldInteractive`.\n\n```bash\n$ plotScaffoldInteractive scaffold_123\n```\n\nThe position of the assembly error is displayed by moving your cursor over it.\n\nCitation\n---------\nPrepring in preparation\n\nExamples\n--------\n\n(A small corrected Hi-C matrix can be found in the `data/` folder)\n\nA minimal example of the assembly of several scaffolds:\n\n\n```bash\n$ assemble -m /data/hic_small.h5 -o ./assembly_output \\\n--min_scaffold_length 100000 --bin_size 5000 --misassembly_zscore_threshold -1.0 \\\n--num_iterations 3 --num_processors 16\n```\n\nEach step of the assembly is automatically visualized. \n\n![HiCAssembler visualization](./docs/content/images/HiCAssembler_visualization.png)\n\n\nNow, let's see how scaffolds with and without mis-assemblies look like:\n\n![HiCAssembler assembly errors](./docs/content/images/Assembly_errors.png)\n\nAssembly errors can easily be detected as an abrupt change in the Hi-C signal.\nHiCAssembler automatically splits scaffolds at minima of the Hi-C score as shown\nhere:\n\n![HiCAssembler score](./docs/content/images/Assembly_errors_score.png)\n\nStrong TAD boundaries with a low TAD-separation score can wrongly be considered as\nmis-assemblies. The threshold of the score where scaffolds get split needs to\nbe chosen using the `--misassembly_zscore_threshold` parameter.\nA balance between too many splits and thus very small scaffolds after splitting\nand too many left-over mis-assemblies needs to be found. We identified a score\nof -1 as a good starting point for assemblies but we recommend to test several\nsettings. Some scaffolds will be split at TAD boundaries. This is no problem\nbecause they will be connected again by the assembly process afterwards.\n\nIf not all mis-assemblies are split automatically, splits can be added manually\nusing the `--split_positions_file split.bed` parameter. The exact position of\nthe error in the pre-assembled contigs/scaffolds can be identified by using the tool\n`plotScaffoldInteractive`.\n\n```bash\n$ plotScaffoldInteractive scaffold_123\n```\n\nThe position of the assembly error is displayed by moving your cursor over it.", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/maxplanck-ie/HiCAssembler", "keywords": "", "license": "BSD", "maintainer": "", "maintainer_email": "", "name": "HiCAssembler", "package_url": "https://pypi.org/project/HiCAssembler/", "platform": "", "project_url": "https://pypi.org/project/HiCAssembler/", "project_urls": { "Homepage": "https://github.com/maxplanck-ie/HiCAssembler" }, "release_url": "https://pypi.org/project/HiCAssembler/1.1.1/", "requires_dist": null, "requires_python": ">=2.7", "summary": "Hi-C guided genome assembly", "version": "1.1.1" }, "last_serial": 4490965, "releases": { "1.1": [ { "comment_text": "", "digests": { "md5": "887924ceef79a180d04e0a1fb0b5ebb6", "sha256": "c7618beaba71c9b3d0c4401ea242fe7962a7d845f05d919892a16dda4f423cc0" }, "downloads": -1, "filename": "HiCAssembler-1.1.tar.gz", "has_sig": false, "md5_digest": "887924ceef79a180d04e0a1fb0b5ebb6", "packagetype": "sdist", "python_version": "source", "requires_python": ">=2.7", "size": 93625, "upload_time": "2018-11-12T19:53:38", "url": "https://files.pythonhosted.org/packages/6a/67/6e325c72775647693490aa4507cfc594b2202afe1109cc261eeadad375f5/HiCAssembler-1.1.tar.gz" } ], "1.1.1": [ { "comment_text": "", "digests": { "md5": "086d39f2f7631ba0a9f96fbc24220322", "sha256": "8fc4e49c11454d17eb7e6e8a835515077afd4abceb7b6c1b5080780b428f46e2" }, "downloads": -1, "filename": "HiCAssembler-1.1.1.tar.gz", "has_sig": false, "md5_digest": "086d39f2f7631ba0a9f96fbc24220322", "packagetype": "sdist", "python_version": "source", "requires_python": ">=2.7", "size": 93716, "upload_time": "2018-11-15T19:23:23", "url": "https://files.pythonhosted.org/packages/ec/1e/83f428ebc7b53abab7098c42c107536a088c2d8707baa3df26c229d77a37/HiCAssembler-1.1.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "086d39f2f7631ba0a9f96fbc24220322", "sha256": "8fc4e49c11454d17eb7e6e8a835515077afd4abceb7b6c1b5080780b428f46e2" }, "downloads": -1, "filename": "HiCAssembler-1.1.1.tar.gz", "has_sig": false, "md5_digest": "086d39f2f7631ba0a9f96fbc24220322", "packagetype": "sdist", "python_version": "source", "requires_python": ">=2.7", "size": 93716, "upload_time": "2018-11-15T19:23:23", "url": "https://files.pythonhosted.org/packages/ec/1e/83f428ebc7b53abab7098c42c107536a088c2d8707baa3df26c229d77a37/HiCAssembler-1.1.1.tar.gz" } ] }