{ "info": { "author": "Adam Rosenbaum", "author_email": "adam.rosenbaum@scilifelab.se", "bugtrack_url": null, "classifiers": [ "Environment :: Console", "License :: OSI Approved :: MIT License", "Programming Language :: Python :: 3.6" ], "description": "\n# mutacc\n[![Build Status](https://travis-ci.org/Clinical-Genomics/mutacc.png)](https://travis-ci.org/Clinical-Genomics/mutacc)\n[![Coverage Status](https://coveralls.io/repos/github/Clinical-Genomics/mutacc/badge.svg?branch=master)](https://coveralls.io/github/Clinical-Genomics/mutacc?branch=master)\n\n## The mutation accumulation database\n\nmutacc is a tool that makes it possible to create synthetic datasets to be used\nfor quality control or benchmarking of bioinformatic tools and pipelines intended\nfor variant calling of clinical variants. Using raw reads that support a known\nvariant from real NGS data, mutacc stores the relevant reads from each case in\na database. This database can then be queried to create validation sets of true\npositives with the same properties as real NGS data.\n\n## Installation\n### Conda\nInstalling mutacc and its external prerequisites is easiest inside a conda\nenvironment. Create one with\n\n```console\nconda create -n python=3.6 pip numpy cython\n```\n\nActivate the environment\n\n```console\nsource activate \n```\n### External Prerequisites\nmutacc makes use of two external packages, [seqkit](https://github.com/shenwei356/seqkit)>=v0.9,\nand [picard](https://github.com/broadinstitute/picard)>=v2.18. These can be\ninstalled within a conda environment by\n\n```console\nconda install -c bioconda picard\nconda install -c bioconda seqkit\n```\n\n### Install mutacc\nWithin the conda environment, run\n```console\ngit clone https://github.com/adrosenbaum/mutacc\npip install -e mutacc\n```\n## Usage\n\n### Configuration File\n\nSome options are best passed to mutacc through a configuration file. 
Below is an\nexample of a config file, using the YAML format.\n\n```yaml\n#EXAMPLE OF A CONFIGURATION FILE\nhost: #Defaults to 'localhost'\nport: #Defaults to 27017\ndatabase: #Defaults to 'mutacc'\nusername: \npassword: \nroot_dir: \n```\n\nThe 'root_dir' entry specifies an existing directory in the file system, where\nall files generated by mutacc will be stored in corresponding subdirectories.\nFor example, all generated fastq files will be stored in /.../root_dir/reads/\n\n\n### Populate the mutacc database\n\nTo export data sets from the mutacc DB, the database must first be populated. To\nextract the raw reads supporting a known variant, mutacc makes use of the\nrelevant files generated from an NGS experiment up to the variant calling itself,\nthat is, the bam file and a vcf file containing only the variants of interest.\n\nThis information is specified as a 'case', represented in yaml format\n\n```yaml\n#EXAMPLE OF A CASE\n\n#THE CASE FIELD CONTAINS METADATA OF THE CASE ITSELF\ncase:\n  case_id: 'case123'  #REQUIRED CASE_ID\n\n#LIST OF THE SAMPLES INVOLVED IN THE EXPERIMENT (MAY BE ONE, OR SEVERAL, E.G.\n#A TRIO)\nsamples:\n  - sample_id: 'sample1'  #REQUIRED\n    analysis_type: 'wgs'  #REQUIRED\n    sex: 'male'  #REQUIRED\n    mother: 'sample2'  #REQUIRED (CAN BE 0 if no mother)\n    father: 'sample3'  #REQUIRED (CAN BE 0 if no father)\n    bam_file: /path/to/sorted_bam  #REQUIRED\n    phenotype: 'affected'\n\n  - sample_id: 'sample2'\n    analysis_type: 'wgs'\n    sex: 'female'\n    mother: '0'  #0 if no parent\n    father: '0'\n    bam_file: /path/to/sorted_bam\n    phenotype: 'unaffected'\n\n  - sample_id: 'sample3'\n    analysis_type: 'wgs'\n    sex: 'male'\n    mother: '0'\n    father: '0'\n    bam_file: /path/to/sorted_bam\n    phenotype: 'affected'\n\n#PATH TO VCF FILE CONTAINING VARIANTS OF INTEREST FROM CASE\nvariants: /path/to/vcf\n```\n\nThis will find the reads from the bam files specified for each sample. 
If the\nreads should be taken from the fastq files instead, specify the fastq files\nas follows\n\n```yaml\n  - sample_id: 'sample1'\n    analysis_type: 'wgs'\n    sex: 'male'\n    mother: 'sample2'\n    father: 'sample3'\n    bam_file: /path/to/sorted_bam\n    fastq_files:\n      - /path/to/fastq1\n      - /path/to/fastq2\n    phenotype: 'affected'\n```\nTo extract the reads from the case\n\n```console\nmutacc --config-file extract --padding 600 --case \n```\nThe --padding option sets the number of base pairs by which the region of\ninterest is padded.\n\nThis will create a file 'case_id'.mutacc in the /.../root_dir/imports\ndirectory.\n\nTo import the case into the database\n\n```console\nmutacc db import /.../root_dir/imports/.mutacc\n```\n\nThe db command is called each time mutacc needs to do any operation against the\ndatabase.\n\nThis will try to establish a connection to an instance of mongodb, by default\nrunning on 'localhost' on port 27017. A different host and port can be specified\nwith the --host and --port options.\n\n\n\n```console\nmutacc db -h -p import case_id.mutacc\n```\n\nIf authentication is required, credentials can be passed with the --username and\n--password options,\n\nor in a configuration file, e.g.\n```yaml\nhost: \nport: \nusername: \npassword: \n```\n\n```console\nmutacc --config-file db import case_id.mutacc\n```\n\n\n### Export datasets from the database\nThe datasets are exported one sample at a time. At the moment, mutacc only\nsupports father/mother/child-trios and single samples. To export a synthetic\ndataset, the export command is used together with the following options.\n\nexport:\n\n -m/--member [child|father|mother|affected]\n Specifies what family member to create a dataset for. Finds the correct\n member in each case (if trio) in the database, and uses the reads from this\n sample only to enrich the background samples. 
If a single sample dataset is\n required, the option can be passed with the 'affected' argument, which uses the\n reads from only one of the affected samples in each case.\n\n -c/--case-query \\\n Query to search the case collection in mongodb: a json string\n written in valid mongodb query language.\n\n -v/--variant-query \\\n Query to search among the variants collection.\n\n\n\n -s/--sex [male|female] \\\n Specify the sex of the sample\n\n -n/--sample-name \\\n Name of the sample\n\n -p/--proband \\\n This flag marks the sample as 'proband', which forces all variants from\n single cases to be included in this sample\n\n --vcf-dir \\\n Specify the directory where the vcf file (truth set) is stored. Defaults\n to /.../root_dir/variants/\n\n\nExample:\n\n```console\nmutacc --config-file db export -m affected -c '{}'\n```\nwill find all the cases from the mutacc DB, and store this\ninformation in a file /.../root_dir/queries/sample_name_query.mutacc.\n\nAn entire trio can be exported by\n\n```console\nmutacc --config-file db export -m child -c '{}' -p -n child\nmutacc --config-file db export -m father -c '{}' -n father\nmutacc --config-file db export -m mother -c '{}' -n mother\n```\nThis will create three files child_query.mutacc, father_query.mutacc, and\nmother_query.mutacc.\n\nThe export subcommand will also generate a truth set vcf-file for each exported\nsample, containing all queried variants.\n\nTo make a dataset (fastq-files) from a query file, the synthesize command is used\nwith the following options\n\n -b/--background-bam \\\n Path to the bam file for the sample to be used as background\n\n -f/--background-fastq \\\n Path to the fastq file for the sample to be used as background\n\n -f2/--background-fastq2 \\\n Path to the second fastq file (if paired-end experiment)\n\n -q/--query \\\n Path to the query file created with the export command\n\n --dataset-dir \\\n Directory where fastq files will be stored. 
Defaults to\n /.../root_dir/datasets\n\n\nExample, using the query files created above\n\n```console\nmutacc --config-file synthesize -b -f -f2 -q child_query.mutacc\nmutacc --config-file synthesize -b -f -f2 -q father_query.mutacc\nmutacc --config-file synthesize -b -f -f2 -q mother_query.mutacc\n```\n\nThe created fastq-files will be stored in the directory /.../root_dir/datasets/\nor in the directory specified by --dataset-dir\n\n### Remove case from database\n\nTo remove a case from the mutacc DB, together with all the bam and fastq files\ngenerated from that case on disk, the remove command is used\n\n```console\nmutacc --config-file db remove \n```\n\n## Limitations\n\nmutacc is currently under development and only supports either single cases\n(cases with one sample) or mother/father/child trios. Furthermore, all cases\nuploaded to and exported from the mutacc DB are assumed to be paired-end read\nexperiments.\n\n\n", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/Clinical-Genomics/mutacc", "keywords": "", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "mutacc", "package_url": "https://pypi.org/project/mutacc/", "platform": "", "project_url": "https://pypi.org/project/mutacc/", "project_urls": { "Homepage": "https://github.com/Clinical-Genomics/mutacc" }, "release_url": "https://pypi.org/project/mutacc/1.1.0/", "requires_dist": [ "Click", "pysam", "coloredlogs", "biopython", "cyvcf2 (<0.10.0)", "mongo-adapter", "ped-parser", "PyYaml", "pymongo" ], "requires_python": ">=3.6.0", "summary": "", "version": "1.1.0" }, "last_serial": 5090939, "releases": { "1.1.0": [ { "comment_text": "", "digests": { "md5": "cee8e8b5c5e91d390ae157ec72b416ea", "sha256": "356dc48dd69aeb99d467efc0b5fd73c0fc9c589bea5ad59c39f1d3cbde983963" }, "downloads": -1, "filename": "mutacc-1.1.0-py2.py3-none-any.whl", "has_sig": false, "md5_digest": 
"cee8e8b5c5e91d390ae157ec72b416ea", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": ">=3.6.0", "size": 40824, "upload_time": "2019-04-03T13:40:18", "url": "https://files.pythonhosted.org/packages/4c/a0/e38dfac13625070b3adddc0022a4f61e41376eac5affdf396d027a46e60a/mutacc-1.1.0-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "061895cca0f91578168a63b953fcf66c", "sha256": "63c2092b196bd17bd4ef624a971df82c3fc1c84626f0d413c687fedee38dd271" }, "downloads": -1, "filename": "mutacc-1.1.0.tar.gz", "has_sig": false, "md5_digest": "061895cca0f91578168a63b953fcf66c", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6.0", "size": 31917, "upload_time": "2019-04-03T13:40:20", "url": "https://files.pythonhosted.org/packages/eb/f0/b57c0ec775bdd2f4bf27eca7522d6bb4cbff3e76baf7ab665408f81fab5f/mutacc-1.1.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "cee8e8b5c5e91d390ae157ec72b416ea", "sha256": "356dc48dd69aeb99d467efc0b5fd73c0fc9c589bea5ad59c39f1d3cbde983963" }, "downloads": -1, "filename": "mutacc-1.1.0-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "cee8e8b5c5e91d390ae157ec72b416ea", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": ">=3.6.0", "size": 40824, "upload_time": "2019-04-03T13:40:18", "url": "https://files.pythonhosted.org/packages/4c/a0/e38dfac13625070b3adddc0022a4f61e41376eac5affdf396d027a46e60a/mutacc-1.1.0-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "061895cca0f91578168a63b953fcf66c", "sha256": "63c2092b196bd17bd4ef624a971df82c3fc1c84626f0d413c687fedee38dd271" }, "downloads": -1, "filename": "mutacc-1.1.0.tar.gz", "has_sig": false, "md5_digest": "061895cca0f91578168a63b953fcf66c", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6.0", "size": 31917, "upload_time": "2019-04-03T13:40:20", "url": 
"https://files.pythonhosted.org/packages/eb/f0/b57c0ec775bdd2f4bf27eca7522d6bb4cbff3e76baf7ab665408f81fab5f/mutacc-1.1.0.tar.gz" } ] }