{ "info": { "author": "Ian Maurer", "author_email": "ian@genomoncology.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Intended Audience :: Developers", "Intended Audience :: Science/Research", "License :: Other/Proprietary License", "Natural Language :: English", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.6", "Topic :: Scientific/Engineering :: Bio-Informatics" ], "description": "govcf - Variant Call File \"call\" generator\n==========================================\n\nThis is a proprietary package that is available from [GenomOncology][GenomOncology] and works\nwith our [Knowledge Management System][Knowledge Management System].\n\nFor more information about licensing please contact us at:\n\ninfo@genomoncology.com\n\n\nAdditional proprietary projects available for download via pypi include:\n\n* [GO SDK][GO SDK] - GenomOncology Software Development Kit\n* [GO CLI][GO CLI] - GenomOncology Command Line Interface\n\nOur open source projects include:\n\n* [Related][Related] - Nested Object Models in Python with dictionary, YAML, and JSON transformation support\n* [Specd][Specd] - Swagger v2 Specification Directories\n* [Rigor][Rigor] - HTTP-based DSL for for validating RESTful APIs\n\n\nOverview\n--------\n\nGenomOncology Variant Call File (VCF) generator built on top of the VCF parser\nwithin the [pysam] project. The generator yields two record types as indicated by\nthe `__type__` dictionary attribute:\n\n* **Header** (1 per VCF file)\n* **Call** (1 per unique sample alt)\n\nThe header includes the following information:\n\n* `__child__`: the type of the records that will follow the header.\n* `config`: any configuration fields provided to the generator.\n* `file_path`: the file location of the VCF.\n* `formats`: the meta data of the FORMAT fields in the header.\n* `info`: the meta data of the INFO fields in the header.\n* `types`: the field type of all of the fields found in the INFO or FORMAT.\n\nA call is the representation of a single ALT allele for a given sample. The\ncalls are generated for each VCF record by iterating each of the samples and\nyielding a call for each unique ALT index specified by the GT (genotype) field.\n\nA call includes the following fields:\n\n* `alt`: alternate allele\n* `chr`: chromosome\n* `filters`: filters provided, including None for '.'\n* `info`: info value fields\n* `is_het`: boolean that is true when allele is heterozygous (e.g. 0/1)\n* `is_phased`: boolean that indicates whether phased (|) or unphased (/)\n* `quality`: quality value\n* `ref`: reference allele\n* `rs_id`: ID field\n* `sample_name`: name of the sample column\n* `start`: start position\n\nThis package also has a class called `BedFilter` which can be passed into\nthe iterator functions that filters records by chromosome and start position\nand only yields calls that fall within the range specified by the BED file.\n\n\nQuick Example\n-------------\n\nThe following example is what the parsing of the example provided at the top\nof the VCF Specification document here:\n\nhttps://samtools.github.io/hts-specs/VCFv4.2.pdf\n\nHere is the VCF:\n\n```text\n##fileformat=VCFv4.2\n##fileDate=20090805\n##source=myImputationProgramV3.1\n##reference=file:///seq/references/1000GenomesPilot-NCBI36.fasta\n##contig=\n##phasing=partial\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##INFO=\n##FILTER=\n##FILTER=\n##FORMAT=\n##FORMAT=\n##FORMAT=\n##FORMAT=\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNA00001\tNA00002\tNA00003\n20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB;H2\tGT:GQ:DP:HQ\t0|0:48:1:51,51\t1|0:48:8:51,51\t1/1:43:5:.,.\n20\t17330\t.\tT\tA\t3\tq10\tNS=3;DP=11;AF=0.017\tGT:GQ:DP:HQ\t0|0:49:3:58,50\t0|1:3:5:65,3\t0/0:41:3\n20\t1110696\trs6040355\tA\tG,T\t67\tPASS\tNS=2;DP=10;AF=0.333,0.667;AA=T;DB\tGT:GQ:DP:HQ\t1|2:21:6:23,27\t2|1:2:0:18,2\t2/2:35:4\n20\t1230237\t.\tT\t.\t47\tPASS\tNS=3;DP=13;AA=T\tGT:GQ:DP:HQ\t0|0:54:7:56,60\t0|0:48:4:51,51\t0/0:61:2\n20\t1234567\tmicrosat1\tGTC\tG,GTCT\t50\tPASS\tNS=3;DP=9;AA=G;H2\tGT:GQ:DP\t0/1:35:4\t0/2:17:2\t1/1:40:3\n```\n\n\nHere is some example python code:\n\n```python\n\nfrom govcf import iterate_vcf_calls, BEDFilter\nfrom pprint import pprint\n\nbed_filter = BEDFilter(\"panel.bed\")\n\nfor record in iterate_vcf_calls(\"tests/vcfs/spec.vcf\", bed_filter=bed_filter):\n pprint(record)\n```\n\nYields the following results:\n\n```\n{'__child__': 'CALL',\n '__type__': 'HEADER',\n 'config': {'include_vaf': True},\n 'file_path': '/Users/ian/code/govcf/tests/vcfs/spec.vcf',\n 'formats': {'DP': {'description': 'Read Depth',\n 'id': 2,\n 'name': 'DP',\n 'number': 1,\n 'type': 'Integer'},\n 'GQ': {'description': 'Genotype Quality',\n 'id': 10,\n 'name': 'GQ',\n 'number': 1,\n 'type': 'Integer'},\n 'GT': {'description': 'Genotype',\n 'id': 9,\n 'name': 'GT',\n 'number': 1,\n 'type': 'String'},\n 'HQ': {'description': 'Haplotype Quality',\n 'id': 11,\n 'name': 'HQ',\n 'number': 2,\n 'type': 'Integer'}},\n 'info': {'AA': {'description': 'Ancestral Allele',\n 'id': 4,\n 'name': 'AA',\n 'number': 1,\n 'type': 'String'},\n 'AF': {'description': 'Allele Frequency',\n 'id': 3,\n 'name': 'AF',\n 'number': 'A',\n 'type': 'Float'},\n 'DB': {'description': 'dbSNP membership, build 129',\n 'id': 5,\n 'name': 'DB',\n 'number': 0,\n 'type': 'Flag'},\n 'DP': {'description': 'Total Depth',\n 'id': 2,\n 'name': 'DP',\n 'number': 1,\n 'type': 'Integer'},\n 'H2': {'description': 'HapMap2 membership',\n 'id': 6,\n 'name': 'H2',\n 'number': 0,\n 'type': 'Flag'},\n 'NS': {'description': 'Number of Samples With Data',\n 'id': 1,\n 'name': 'NS',\n 'number': 1,\n 'type': 'Integer'}},\n 'types': {'AA': 'string',\n 'AF': 'float',\n 'DB': 'boolean',\n 'DP': 'int',\n 'GQ': 'int',\n 'H2': 'boolean',\n 'HQ': 'mint',\n 'NS': 'int'}}\n{'__type__': 'CALL',\n 'alt': 'A',\n 'chr': '20',\n 'filters': ['PASS'],\n 'info': {'AF': 0.5,\n 'DB': True,\n 'DP': 8,\n 'GQ': 48,\n 'H2': True,\n 'HQ': (51, 51),\n 'NS': 3},\n 'is_het': True,\n 'is_phased': True,\n 'quality': 29.0,\n 'ref': 'G',\n 'rs_id': 'rs6054257',\n 'sample_name': 'NA00002',\n 'start': 14370}\n{'__type__': 'CALL',\n 'alt': 'A',\n 'chr': '20',\n 'filters': ['PASS'],\n 'info': {'AF': 0.5,\n 'DB': True,\n 'DP': 5,\n 'GQ': 43,\n 'H2': True,\n 'HQ': (None, None),\n 'NS': 3},\n 'is_het': False,\n 'is_phased': False,\n 'quality': 29.0,\n 'ref': 'G',\n 'rs_id': 'rs6054257',\n 'sample_name': 'NA00003',\n 'start': 14370}\n{'__type__': 'CALL',\n 'alt': 'A',\n 'chr': '20',\n 'filters': ['q10'],\n 'info': {'AF': 0.017000000923871994,\n 'DP': 5,\n 'GQ': 3,\n 'HQ': (65, 3),\n 'NS': 3},\n 'is_het': True,\n 'is_phased': True,\n 'quality': 3.0,\n 'ref': 'T',\n 'rs_id': None,\n 'sample_name': 'NA00002',\n 'start': 17330}\n{'__type__': 'CALL',\n 'alt': 'G',\n 'chr': '20',\n 'filters': ['PASS'],\n 'info': {'AA': 'T',\n 'AF': 0.3330000042915344,\n 'DB': True,\n 'DP': 6,\n 'GQ': 21,\n 'HQ': (23, 27),\n 'NS': 2},\n 'is_het': True,\n 'is_phased': True,\n 'quality': 67.0,\n 'ref': 'A',\n 'rs_id': 'rs6040355',\n 'sample_name': 'NA00001',\n 'start': 1110696}\n{'__type__': 'CALL',\n 'alt': 'T',\n 'chr': '20',\n 'filters': ['PASS'],\n 'info': {'AA': 'T',\n 'AF': 0.6669999957084656,\n 'DB': True,\n 'DP': 6,\n 'GQ': 21,\n 'HQ': (23, 27),\n 'NS': 2},\n 'is_het': True,\n 'is_phased': True,\n 'quality': 67.0,\n 'ref': 'A',\n 'rs_id': 'rs6040355',\n 'sample_name': 'NA00001',\n 'start': 1110696}\n{'__type__': 'CALL',\n 'alt': 'G',\n 'chr': '20',\n 'filters': ['PASS'],\n 'info': {'AA': 'T',\n 'AF': 0.3330000042915344,\n 'DB': True,\n 'DP': 0,\n 'GQ': 2,\n 'HQ': (18, 2),\n 'NS': 2},\n 'is_het': True,\n 'is_phased': True,\n 'quality': 67.0,\n 'ref': 'A',\n 'rs_id': 'rs6040355',\n 'sample_name': 'NA00002',\n 'start': 1110696}\n{'__type__': 'CALL',\n 'alt': 'T',\n 'chr': '20',\n 'filters': ['PASS'],\n 'info': {'AA': 'T',\n 'AF': 0.6669999957084656,\n 'DB': True,\n 'DP': 0,\n 'GQ': 2,\n 'HQ': (18, 2),\n 'NS': 2},\n 'is_het': True,\n 'is_phased': True,\n 'quality': 67.0,\n 'ref': 'A',\n 'rs_id': 'rs6040355',\n 'sample_name': 'NA00002',\n 'start': 1110696}\n{'__type__': 'CALL',\n 'alt': 'T',\n 'chr': '20',\n 'filters': ['PASS'],\n 'info': {'AA': 'T',\n 'AF': 0.6669999957084656,\n 'DB': True,\n 'DP': 4,\n 'GQ': 35,\n 'HQ': (None,),\n 'NS': 2},\n 'is_het': False,\n 'is_phased': False,\n 'quality': 67.0,\n 'ref': 'A',\n 'rs_id': 'rs6040355',\n 'sample_name': 'NA00003',\n 'start': 1110696}\n{'__type__': 'CALL',\n 'alt': 'G',\n 'chr': '20',\n 'filters': ['PASS'],\n 'info': {'AA': 'G', 'DP': 4, 'GQ': 35, 'H2': True, 'NS': 3},\n 'is_het': True,\n 'is_phased': False,\n 'quality': 50.0,\n 'ref': 'GTC',\n 'rs_id': 'microsat1',\n 'sample_name': 'NA00001',\n 'start': 1234567}\n{'__type__': 'CALL',\n 'alt': 'GTCT',\n 'chr': '20',\n 'filters': ['PASS'],\n 'info': {'AA': 'G', 'DP': 2, 'GQ': 17, 'H2': True, 'NS': 3},\n 'is_het': True,\n 'is_phased': False,\n 'quality': 50.0,\n 'ref': 'GTC',\n 'rs_id': 'microsat1',\n 'sample_name': 'NA00002',\n 'start': 1234567}\n{'__type__': 'CALL',\n 'alt': 'G',\n 'chr': '20',\n 'filters': ['PASS'],\n 'info': {'AA': 'G', 'DP': 3, 'GQ': 40, 'H2': True, 'NS': 3},\n 'is_het': False,\n 'is_phased': False,\n 'quality': 50.0,\n 'ref': 'GTC',\n 'rs_id': 'microsat1',\n 'sample_name': 'NA00003',\n 'start': 1234567}\n```\n\n\n\n[GenomOncology]: https://genomoncology.com/\n[Knowledge Management System]: https://genomoncology.com/solutions/clinical-oncology/\n[pysam]: https://pysam.readthedocs.io/\n[Related]: https://github.com/genomoncology/related\n[Specd]: https://github.com/genomoncology/specd \n[Rigor]: https://github.com/genomoncology/rigor \n[GO SDK]: https://pypi.org/project/gosdk/\n[GO CLI]: https://pypi.org/project/gocli/\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "", "keywords": "Bioinformatics HGVS VCF Clinical Trials Genomics", "license": "Proprietary", "maintainer": "", "maintainer_email": "", "name": "govcf", "package_url": "https://pypi.org/project/govcf/", "platform": "", "project_url": "https://pypi.org/project/govcf/", "project_urls": null, "release_url": "https://pypi.org/project/govcf/0.8.0/", "requires_dist": [ "pysam (>=0.14.1)", "intervaltree", "related" ], "requires_python": "", "summary": "govcf", "version": "0.8.0" }, "last_serial": 4761243, "releases": { "0.8.0": [ { "comment_text": "", "digests": { "md5": "aa1b044879556417207a90d4d2ce0529", "sha256": "2b0542d1f45e09a0b1d2abad52afb29ed403b06c5495d06ab3055c182a571f9d" }, "downloads": -1, "filename": "govcf-0.8.0-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "aa1b044879556417207a90d4d2ce0529", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 12542, "upload_time": "2018-11-14T21:23:57", "url": "https://files.pythonhosted.org/packages/e0/5f/291bf6f08488607c0dafdb827d5b20382363c24619bc19d3cb4a959df3d4/govcf-0.8.0-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "fc6dd090e4303c0bfbd7b939260d5c50", "sha256": "92789787d4da2e498da174467461575dd9d3705bb200f7030d29f5e50566a752" }, "downloads": -1, "filename": "govcf-0.8.0.tar.gz", "has_sig": false, "md5_digest": "fc6dd090e4303c0bfbd7b939260d5c50", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 36287, "upload_time": "2018-11-14T21:23:59", "url": "https://files.pythonhosted.org/packages/35/73/5cf4a984ee891add184cf645fdd82cdb4be8f896734124ef321fc9fd8d57/govcf-0.8.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "aa1b044879556417207a90d4d2ce0529", "sha256": "2b0542d1f45e09a0b1d2abad52afb29ed403b06c5495d06ab3055c182a571f9d" }, "downloads": -1, "filename": "govcf-0.8.0-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "aa1b044879556417207a90d4d2ce0529", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 12542, "upload_time": "2018-11-14T21:23:57", "url": "https://files.pythonhosted.org/packages/e0/5f/291bf6f08488607c0dafdb827d5b20382363c24619bc19d3cb4a959df3d4/govcf-0.8.0-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "fc6dd090e4303c0bfbd7b939260d5c50", "sha256": "92789787d4da2e498da174467461575dd9d3705bb200f7030d29f5e50566a752" }, "downloads": -1, "filename": "govcf-0.8.0.tar.gz", "has_sig": false, "md5_digest": "fc6dd090e4303c0bfbd7b939260d5c50", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 36287, "upload_time": "2018-11-14T21:23:59", "url": "https://files.pythonhosted.org/packages/35/73/5cf4a984ee891add184cf645fdd82cdb4be8f896734124ef321fc9fd8d57/govcf-0.8.0.tar.gz" } ] }