{ "info": { "author": "Joseph Halstead", "author_email": "josephhalstead89@gmail.com", "bugtrack_url": null, "classifiers": [ "Operating System :: OS Independent", "Programming Language :: Python :: 3" ], "description": "# PyVariantFilter\n\nVersatile Python package for filtering germline genetic variants based on inheritance pattern.\n\n* Find Autosomal Dominant, Autosomal Recessive, X-Linked Recessive, X-Linked Dominant, De Novo and Compound Heterozygous variants in your dataset.\n\n* Filter variants by their annotations by applying custom Python filtering functions.\n\n* Convert VEP-annotated VCFs to Pandas DataFrames.\n\n## Quick Start\n\n```python\nfrom pyvariantfilter.family import Family\nfrom pyvariantfilter.variant_set import VariantSet\n\n# Create a Family object describing the relationships between samples as well as their sex and affected status\nmy_family = Family('FAM001')\nmy_family.read_from_ped_file(ped_file_path='test_data/NA12878.ped', family_id='FAM001', proband_id='NA12878i')\n\n# Associate a VariantSet object with the Family object we just created\nmy_variant_set = VariantSet()\nmy_variant_set.add_family(my_family)\n\n# Read variants from a standard VCF and apply the initial filtering function (import_filter must already be defined)\nmy_variant_set.read_variants_from_vcf('test_data/NA12878.trio.vep.vcf', proband_variants_only=True, filter_func=import_filter, args=(my_family.get_proband_id(),))\n\n# Get candidate compound hets\nmy_variant_set.get_candidate_compound_hets()\n\n# Filter the compound hets by phasing them using parental information\nmy_variant_set.filter_compound_hets()\n\n# Flatten the compound hets to a dictionary with each compound het as a key\nmy_variant_set.get_filtered_compound_hets_as_dict()\n\n# Create a Pandas DataFrame\ndf = my_variant_set.to_df()\n\n# Filter to view variants matching any inheritance model.\ndf.loc[df['inheritance_models'] != '', ['variant_id', 'worst_consequence', 'inheritance_models', 'csq_SYMBOL']].head()\n```\n\nWhere the filter\\_func 
argument to read\\_variants\\_from\\_vcf() is something like the function below.\n\n```python\ndef import_filter(variant, proband_id):\n    # Consequences treated as high impact.\n    high_impact = {\n        'transcript_ablation',\n        'splice_acceptor_variant',\n        'splice_donor_variant',\n        'stop_gained',\n        'frameshift_variant',\n        'stop_lost',\n        'start_lost',\n    }\n\n    # The proband must carry the variant and pass the genotype and variant filters.\n    if variant.has_alt(proband_id) and variant.passes_gt_filter(proband_id) and variant.passes_filter():\n\n        # Apply a 0.01 gnomAD_AF threshold for every inheritance context.\n        freq_filter = variant.filter_on_numerical_transcript_annotation_lte(\n            annotation_key='gnomAD_AF',\n            ad_het=0.01,\n            ad_hom_alt=0.01,\n            x_male=0.01,\n            x_female_het=0.01,\n            x_female_hom=0.01,\n            compound_het=0.01,\n            y=0.01,\n            mt=0.01,\n        )\n\n        csq_filter = variant.get_worst_consequence() in high_impact\n\n        if csq_filter and freq_filter:\n            return True\n\n    return False\n```\n\n## Input Requirements\n\nThe VariantSet methods that read from a VCF require a decomposed (multiallelic variants split) and VEP-annotated VCF. 
\n\nBoth GATK and Platypus VCFs are supported.\n\nUse VT and VEP with the following commands to preprocess your VCF before analysing.\n\n* https://github.com/atks/vt\n* https://github.com/Ensembl/ensembl-vep\n\nAnnotation with VEP is only necessary if you want to find compound hets.\n\n```\n# Split multiallelics and normalise\ncat input.vcf | vt decompose -s - | vt normalize -r reference.fasta - > input.norm.vcf\n\n# Annotate with VEP\nvep --verbose --format vcf --everything --fork 1 --species homo_sapiens --assembly GRCh37 --input_file input.norm.vcf \\\n--output_file input.norm.vep.vcf --force_overwrite --cache --dir vep_cache_location \\\n--fasta reference.fasta --offline --cache_version 94 --no_escape --shift_hgvs 1 --exclude_predicted --vcf --refseq --flag_pick \\\n--custom gnomad.genomes.vcf.gz,gnomADg,vcf,exact,0,AF_POPMAX \\\n--custom gnomad.exomes.vcf.gz,gnomADe,vcf,exact,0,AF_POPMAX\n```\n\n## Inheritance Models\n\nMany of the rules here have been adapted from the GEMINI software, which is worth checking out: https://gemini.readthedocs.io/en/latest/\n\n### Autosomal Dominant\n\n1) Variant must be on an autosome.\n\n2) All affected samples must be heterozygous or missing (e.g. ./.). See the lenient option to allow homozygous alternate genotypes in affected samples other than the proband.\n\n3) If the variant is not in a low penetrance gene then all unaffected samples must be homozygous reference or have a missing genotype.\n\n### Autosomal Recessive\n\n1) Variant must be on an autosome.\n\n2) All affected samples must be homozygous for the alternate allele. Can be missing.\n\n3) No unaffected samples can be homozygous for the alternate allele. 
Can be missing.\n\n### X-Linked Recessive\n\n1) Variant must be on the X chromosome.\n\n2) All affected female samples must be homozygous for the alternate allele or missing.\n\n3) No unaffected female samples can be homozygous for the alternate allele.\n\n4) All affected male samples must have the variant or be missing.\n\n5) No unaffected male samples can have the variant.\n\n### X-Linked Dominant\n\n1) Variant must be on the X chromosome.\n\n2) The daughters of affected male samples must be affected.\n\n3) The sons of affected males must not be affected.\n\n4) Affected male samples must have the variant or be missing.\n\n5) Affected female samples must be heterozygous or be missing.\n\n6) Unaffected samples must not have the variant.\n\n### De Novo\n\n1) Variant must be in the proband and not in either parent.\n\n2) Parents must have a GQ value above min\\_parental\\_gq.\n\n3) Parents must have a DP value above min\\_parental\\_depth.\n\n4) Parents must have an alt/ref ratio below max\\_parental\\_alt\\_ref\\_ratio.\n\n### Compound Heterozygous\n\n1) No unaffected samples can have the pair of variants. Can be adjusted using the allow\\_hets\\_in\\_unaffected argument.\n\n2) All affected samples must have the pair of variants or be missing the genotype data. Can be adjusted using the check\\_affected argument.\n\n3) \n\na) One of the pair must be inherited from mum and the other from dad.\n\nb) If include\\_denovo is True then one of the pair can be de\\_novo and the other inherited from either parent, or both can be de\\_novo. There are no minimum requirements e.g. 
depth on the de_novo calls.\n\n## Install\n\n### Requirements\n\n* Python 3.6 or greater\n* Pysam 0.15.2 or greater\n* Pandas 0.23.4 or greater\n\n### Install the Package\n\n`pip install pyvariantfilter`\n\nor\n\n`git clone https://github.com/josephhalstead/pyvariantfilter.git`\n\n## Examples\n\nSee the notebooks folder for some examples of using the library for WES and WGS analyses.\n\n## Test\n\n`python tests.py`\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/josephhalstead/pyvariantfilter", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "pyvariantfilter", "package_url": "https://pypi.org/project/pyvariantfilter/", "platform": "", "project_url": "https://pypi.org/project/pyvariantfilter/", "project_urls": { "Homepage": "https://github.com/josephhalstead/pyvariantfilter" }, "release_url": "https://pypi.org/project/pyvariantfilter/0.0.1/", "requires_dist": [ "pysam (>=0.15.2)", "pandas (>=0.23.4)" ], "requires_python": "", "summary": "Python package for filtering variants.", "version": "0.0.1" }, "last_serial": 5281875, "releases": { "0.0.1": [ { "comment_text": "", "digests": { "md5": "e92624e1d627e098b347d62d8aea7715", "sha256": "8284aa45be96c9dce54f54eb38eea4ff96f3762aa2226b9ea703bc67005cd50f" }, "downloads": -1, "filename": "pyvariantfilter-0.0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "e92624e1d627e098b347d62d8aea7715", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 21038, "upload_time": "2019-05-17T11:43:00", "url": "https://files.pythonhosted.org/packages/b4/0a/a169343a47d4026b8844c706d739fa9ad808f5916717f9e6acdbe261f129/pyvariantfilter-0.0.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "818c01fbec313e2f422e17aa94159217", "sha256": 
"fae11b5cb7463932410faa70da25598fa2844ff5e06fd1c1e9e5dcef636ab8a6" }, "downloads": -1, "filename": "pyvariantfilter-0.0.1.tar.gz", "has_sig": false, "md5_digest": "818c01fbec313e2f422e17aa94159217", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 19873, "upload_time": "2019-05-17T11:43:03", "url": "https://files.pythonhosted.org/packages/d5/94/fcf9ee2961bd3fd5f4f653443655e6ddffa4e61d5f0cc3ac808072c90db1/pyvariantfilter-0.0.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "e92624e1d627e098b347d62d8aea7715", "sha256": "8284aa45be96c9dce54f54eb38eea4ff96f3762aa2226b9ea703bc67005cd50f" }, "downloads": -1, "filename": "pyvariantfilter-0.0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "e92624e1d627e098b347d62d8aea7715", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 21038, "upload_time": "2019-05-17T11:43:00", "url": "https://files.pythonhosted.org/packages/b4/0a/a169343a47d4026b8844c706d739fa9ad808f5916717f9e6acdbe261f129/pyvariantfilter-0.0.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "818c01fbec313e2f422e17aa94159217", "sha256": "fae11b5cb7463932410faa70da25598fa2844ff5e06fd1c1e9e5dcef636ab8a6" }, "downloads": -1, "filename": "pyvariantfilter-0.0.1.tar.gz", "has_sig": false, "md5_digest": "818c01fbec313e2f422e17aa94159217", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 19873, "upload_time": "2019-05-17T11:43:03", "url": "https://files.pythonhosted.org/packages/d5/94/fcf9ee2961bd3fd5f4f653443655e6ddffa4e61d5f0cc3ac808072c90db1/pyvariantfilter-0.0.1.tar.gz" } ] }