{ "info": { "author": "Yoann Anselmetti", "author_email": "yoann.anselmetti@umontpellier.fr", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Environment :: Console", "Intended Audience :: Science/Research", "License :: OSI Approved :: CEA CNRS Inria Logiciel Libre License, version 2.1 (CeCILL-2.1)", "Natural Language :: English", "Operating System :: POSIX :: Linux", "Programming Language :: Python", "Topic :: Scientific/Engineering :: Bio-Informatics", "Topic :: Scientific/Engineering :: Visualization" ], "description": "Pop-Con: a tool to visualize genotype profiles on Site Frequency Spectrum\n=========================================================================\n\nINTRODUCTION\n------------\nPop-Con is a tool designed to visualize genotype profiles of a Site Frequency Spectrum (SFS) from a Variant Calling Format (VCF) file [1] for SNP and indel variant.\nWe define a *genotype profile* as the list of genotypes for all individuals at a given site.\nFor example, **\"0/1,1/1,0/1,0/1\"** is the genotype profile corresponding to the VCF line below:\n```\nDst_b1v03_scaf00008 54329 . G A 1970.68 PASS AC=6;AF=0.500;AN=12;BaseQRankSum=-6.140e-01;ClippingRankSum=0.00;DP=144;ExcessHet=4.8280;FS=6.723;MLEAC=5;MLEAF=0.417;MQ=60.00;MQRankSum=0.00;QD=15.28;ReadPosRankSum=-8.460e-01;SOR=0.440 GT:AD:DP:GQ:PL 0/1:16,22:38:99:558,0,409 1/1:1,11:12:3:309,3,0 0/1:26,28:54:99:750,0,617 0/1:8,11:19:99:286,0,203\n```\nWhere individuals 1, 3 and 4 are heterozygote (\"0/1\") and individual 3 is homozygote for the alternative allele (\"1/1\") for this site.\n\nPop-Con takes as input a VCF file and uses the **cyvcf2** python package [2] to read and parse the VCF file\n To be readable and parsable by **cyvcf2** the VCf file has to be compressed and indexed with one of the 2 following command lines:\n```bash\nbgzip input_vcf_file.vcf\ntabix -p input_vcf_file.vcf.gz\n```\nOR\n```bash\nbcftools view input_vcf_file.vcf -Oz -o input_vcf_file.vcf.gz\nbcftools index input_vcf_file.vcf.gz\n```\n\nPop-Con has been developped on VCF files produced with **GATK** [3] and **read2snp** [4] variant callers. Variant caller used to produced VCF file has to be given with option `--tool` (**GATK** is the default variant caller).\n\n`--read` and `--marft` filtering options can be used to test different set of genotype filtering:\n* `--read` option take as input a list of read coverage threshold. Genotype with read coverage lower than threshold will be tagged *lowCov*\n* `--marft` option take as input a list of minor allele read frequency threshold. Genotype with minor allele read frequency lower than threshold will be tagged *lowMARF*\n\nFor each filtering combination (\"read coverage\" + \"minor allele read frequency\"), Pop-Con plots 2 figures (for **SNP** and **indel**):\n1. with sites where all individuals are genotyped and passed read coverage and minor allele read frequency filtering.\n2. with all sites where at least 1 individual is genotyped and passed read coverage and minor allele read frequency filtering.\n\nFor figure 1, two plots are produced:\n* an upper plot representing the observed proportion of the **X** most represented genotype profiles for each SFS pic (**X** is set with the `-max/--max_profiles` option)\n* a lower plot representing for each pic the expected proportion of genotype profile under the Hardy-Weinberg equilibrium. Genotype profiles are colored in red if observed genotype profile is **Y** times upper Hardy-Weinberg expectation, blue if **Y** times lower and white if at equilibrum (between **Y** times upper and **Y** times lower). Where **Y** is set with the option `-fold/--hwe_fold_change`\n\nFor figure 2, only SFS plot with observed genotype profiles proportion is produced. This plot can be used to detect abnormal porpotion of low read coverage, low minor allele read frequency or ungenotyped individuals but display is not satisfying for now...\n\nPop-Con is supported on Linux with python2 (version==2.7) and python3 (versions>=3.4) and surely supported on Macintosh and Windows with the pypi python packages manager but not tested. \n\n\nINSTALLATION\n------------\n## Source from GitHub\nThis assumes that you have the python modules cyvcf2, numpy and matplotlib installed (cf *requirements.txt*).\n```bash\ngit clone https://github.com/YoannAnselmetti/Pop-Con.git\ncd Pop-Con\npython ./Pop-Con\n```\n\n## Using pip (recommended)\n```bash\nsudo pip install Pop-Con\n```\n\n## Manual installation\nIn the **Pop-Con** folder where \"setup.py\" is located, run:\n```bash\nsudo python setup.py install\n```\n\nIf you encountered problem during the installation of cyvcf2 or get this following error message when executing Pop-Con:\n```\nTraceback (most recent call last):\n File \"\", line 1, in \n File \"/usr/local/lib/python3.6/site-packages/cyvcf2/__init__.py\", line 1, in \n from .cyvcf2 import (VCF, Variant, Writer, r_ as r_unphased, par_relatedness,\nImportError: /usr/local/lib/python3.6/site-packages/cyvcf2/cyvcf2.cpython-36m-x86_64-linux-gnu.so: undefined symbol: EVP_sha1\n```\nYou have to install it manually from the source:\nhttps://github.com/brentp/cyvcf2\n\n\nUSAGE\n-----\n```\nusage: Pop-Con [-h] -i VCF_FILE [-t VARIANT_CALLER]\n [-r READ_COVERAGE_THRESHOLD [READ_COVERAGE_THRESHOLD ...]]\n [-m MARFT [MARFT ...]] [-f VARIANT_CALLER_FILTERING]\n [-fmono MONOMORPH_FILTERING] [-p PREFIX] [-v VERBOSE]\n [-o OUTPUT_DIR] [-hf WRITE_HETEROZYGOSITY_FILE] [-sep SEP]\n [-max MAX_PROFILES] [-fold HWE_FOLD_CHANGE]\n\nPop-Con - A tool for Population genomic Conflicts detection.\n\noptional arguments:\n -h, --help show this help message and exit\n -i VCF_FILE, --input_file VCF_FILE\n Variant Calling Format file containing variant calling\n data.\n -t VARIANT_CALLER, --tool VARIANT_CALLER\n Variant calling tool used to call variant. Values:\n \"read2snp\" or \"GATK\". (Default: \"GATK\")\n -r READ_COVERAGE_THRESHOLD [READ_COVERAGE_THRESHOLD ...], --read READ_COVERAGE_THRESHOLD [READ_COVERAGE_THRESHOLD ...]\n List of read coverage threshold filtering for each\n genotype. Values range: [0,infinity[. (Default: 0)\n -m MARFT [MARFT ...], --marft MARFT [MARFT ...]\n List of Minor Allele Read Frequency (MARF) threshold\n for heterozygous genotypes filtering. Values range:\n [0.0,0.5]. (Default: 0.0)\n -f VARIANT_CALLER_FILTERING, --variant_caller_filtering VARIANT_CALLER_FILTERING\n Boolean to set if variant caller filtering is consider\n or not. (if \"True\", sites with TAG column are not\n empty are filtered out from analysis). (Default: True)\n -fmono MONOMORPH_FILTERING, --monomorph_filtering MONOMORPH_FILTERING\n Boolean to filter out monomorph sites. (\"True\":\n monomorph sites are filtered out from analysis. Reduce\n drastically the execution time!!!). (Default: True)\n -p PREFIX, --prefix PREFIX\n Experiment name (used as prefix for output files).\n (Default: \"exp1\")\n -v VERBOSE, --verbose VERBOSE\n Verbose intensity. (Default: 1)\n -o OUTPUT_DIR, --output OUTPUT_DIR\n Output directory path. (Default: ./)\n -hf WRITE_HETEROZYGOSITY_FILE, --heterozygosity_file WRITE_HETEROZYGOSITY_FILE\n Boolean to set if the heterozygosity files\n (summarizing VCF file for each combination of read\n coverage and MARF filtering) have to be written or\n not. File used by script\n \"scripts/plot_SFS_geneotype_profiles.py\" to plot SFS\n without re-parsing VCF file. (Default: True)\n -sep SEP, --separator SEP\n Separator used in genotype profiles. (Default: \",\")\n -max MAX_PROFILES, --max_profiles MAX_PROFILES\n Maximum number of genotype profiles displayed in SFS\n plot. (Default: 10)\n -fold HWE_FOLD_CHANGE, --hwe_fold_change HWE_FOLD_CHANGE\n Fold change value to define when an observed genotype\n profile proportion is in excess/deficit compare to the\n expected value under Hardy-Weinberg Equilibrium.\n (Default: 2.0)\n\nSource code and manual: http://github.com/YoannAnselmetti/Pop-Con\n```\n\nEXAMPLE\n-------\nBelow, the minimal command line to run Pop-Con:\n```bash\npython Pop-Con -i input_vcf_file.vcf.gz\n```\nIf the VCF file contains SNP and indel variants, Pop-Con will create two directories *\"SNP/\"* and *\"indel/\"* in current directory.\n\nBelow, an example on VCF file present in directory *\"example/data/Lineus_longissimus/\"*:\n```bash\npython Pop-Con -i example/data/Lineus_longissimus/Lineus_longissimus_read10_par0.vcf.gz -p Lineus_longissimus -o example/results/Lineus_longissimus/ -t read2snp -fmono True -max 20\n```\n\nBelow the architecture of the output directory *\"example/results/Lineus_longissimus/\"* produced by Pop-Con with command line above:\n```\nexample/results/Lineus_longissimus/\n\u2514\u2500\u2500 SNP\n \u251c\u2500\u2500 heterozygosity\n \u2502 \u2514\u2500\u2500 heterozygosity_allFilter_Lineus_longissimus.tab\n \u2514\u2500\u2500 MARFt0.0\n \u251c\u2500\u2500 heterozygosity\n \u2502\t \u2514\u2500\u2500 read0\n \u2502\t\t \u251c\u2500\u2500 genotype_profiles_distrib_read0_Lineus_longissimus_SNP_MARFt0.0.tab\n \u2502\t \u251c\u2500\u2500 genotype_profiles_per_altNb_read0_Lineus_longissimus_SNP_MARFt0.0_max20_with_all_positions.tab\n \u2502\t\t \u251c\u2500\u2500 genotype_profiles_per_altNb_read0_Lineus_longissimus_SNP_MARFt0.0_max20_with_positions_with_all_individuals_genotyped.tab\n \u2502\t\t \u2514\u2500\u2500 heterozygosity_read0_Lineus_longissimus_SNP_MARFt0.0.tab\n \u2514\u2500\u2500 SFS_profiles\n \u2514\u2500\u2500 read0\n \u251c\u2500\u2500 SFSplot_genotypes_profiles_read0_Lineus_longissimus_SNP_MARFt0.0_max20_all_positions.pdf\n \u2514\u2500\u2500 SFSplot_genotypes_profiles_read0_Lineus_longissimus_SNP_MARFt0.0_max20.pdf\n```\n\nThe 2 SFS plot with genotype profiles produced:\n### For sites where all individuals are genotyped and passed read coverage + minor allele read frequency filtering\nFile: \"SFSplot_genotypes_profiles_read0_Lineus_longissimus_SNP_MARFt0.0_max20.pdf\"\n![Lineus longissimus SFS plot with genotype profiles (only individuals genotyped)](docs/img/SFSplot_genotypes_profiles_read0_Lineus_longissimus_SNP_MARFt0.0_max20.pdf)\n\n### For sites where at least 1 individual is genotyped and passed read coverage + minor allele read frequency filtering\nFile: \"SFSplot_genotypes_profiles_read0_Lineus_longissimus_SNP_MARFt0.0_max20_all_positions.pdf\"\n![Lineus longissimus SFS plot with genotype profiles (all positions)](docs/img/SFSplot_genotypes_profiles_read0_Lineus_longissimus_SNP_MARFt0.0_max20_all_positions.pdf)\n\n\nBIBLIOGRAPHY\n-------------\n[1] Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156\u20132158 (2011).\n[2] Pedersen, B. S. & Quinlan, A. R. cyvcf2: fast, flexible variant analysis with Python. Bioinformatics 33, 1867\u20131869 (2017).\n[3] McKenna, A. et al. The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297\u20131303 (2010).\n[4] Gayral, P. et al. Reference-Free Population Genomics from Next-Generation Transcriptome Data and the Vertebrate\u2013Invertebrate Gap. PLOS Genetics 9, e1003457 (2013).\n\n\nTROUBLESHOOTING\n---------------\nYou get the following error message, when you forgot to set the option `--tool read2snp`. \n```bash\nTraceback (most recent call last):\n File \"./Pop-Con\", line 404, in \n individuals_number,dict_SNP_genotypes,dict_indel_genotypes=read_VCF_file(vcf_file,prefix,OUTPUT,list_read_coverage_threshold,list_MARFt,monomorph_filtering,sep,verbose)\n File \"./Pop-Con\", line 127, in read_VCF_file\n dict_SNP_genotypes,dict_SNP_SFS_allPos,dict_SNP_SFS_allIndGT,dict_SNP_hetero,dict_FIS=heterozygosity.store_polymorphism(variant,list_Individuals,dict_SNP_genotypes,dict_SNP_SFS_allPos,dict_SNP_SFS_allIndGT,dict_SNP_hetero,dict_FIS,OUTPUT,'SNP',prefix,list_read_coverage_threshold,list_MARFt,False,variant_caller,write_heterozygosity_file,sep,verbose)\n File \"/home/yanselmetti/Bureau/Pop-Con/popcon/heterozygosity.py\", line 365, in store_polymorphism\n variant_size,x,y,z,listInd,dict_SFS_allPos,dict_SFS_allIndGT=store_SFS(variant,dict_SFS_allPos,dict_SFS_allIndGT,read_coverage_threshold,MARFt,bool_INDEL,variant_caller,verbose)\n File \"/home/yanselmetti/Bureau/Pop-Con/popcon/heterozygosity.py\", line 212, in store_SFS\n alt,x,y,z,listInd=get_polymorphism_GATK(variant,read_coverage_threshold,MARFt)\n File \"/home/yanselmetti/Bureau/Pop-Con/popcon/heterozygosity.py\", line 54, in get_polymorphism_GATK\n str_coverage=str(variant).split()[x].split(\":\")[1]\nIndexError: list index out of range\n```\nThis error can also arise if you used an other variant caller tool than *GATK* or *read2snp* as Pop-Con was only tested on VCF files produced by those. If you encountered such problem can you please write an issue, I'll try to shortly produce a new module to parse the VCF produced by the variant caller you used.\n\n\nMISCELLANEOUS\n-------------\nFor R lovers, there is an extra script (*\"scripts/SFS_genotypes_profiles_plot.R\"*) to plot SFS with genotype profiles from file.\nExample:\n```bash\nRscript ./scripts/SFS_genotypes_profiles_plot.R example/results/Lineus_longissimus/SNP/MARFt0.0/heterozygosity/read0/genotype_profiles_per_altNb_read0_Lineus_longissimus_SNP_MARFt0.0_max20_with_positions_with_all_individuals_genotyped.tab 20 6 Lineus_longissimus example/results/Lineus_longissimus/SNP/MARFt0.0/SFS_profiles/read0/SFSplot_genotypes_profiles_read0_Lineus_longissimus_SNP_MARFt0.0_max20_with_Rscript.pdf 0\n```\nThis script only plot SFS with observed porportion of genotype profiles and the one with comparison to the expected values under the Hardy-Weinberg equilibrium:\n![Lineus longissimus SFS plot with genotype profiles (only individuals genotyped) with Rscript](docs/img/SFSplot_genotypes_profiles_read0_Lineus_longissimus_SNP_MARFt0.0_max20_with_Rscript.pdf)\n\n\n#### **Contact**\nYoann Anselmetti : *yoann.anselmetti@umontpellier.fr*\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/YoannAnselmetti/Pop-Con", "keywords": "", "license": "CeCILL-2.1", "maintainer": "", "maintainer_email": "", "name": "Pop-Con", "package_url": "https://pypi.org/project/Pop-Con/", "platform": "", "project_url": "https://pypi.org/project/Pop-Con/", "project_urls": { "Homepage": "https://github.com/YoannAnselmetti/Pop-Con" }, "release_url": "https://pypi.org/project/Pop-Con/1.0.5/", "requires_dist": [ "matplotlib (==2.2.4)", "numpy (==1.16.4)", "cyvcf2 (==0.11.4)" ], "requires_python": "", "summary": "Visualization of genotype profiles on population genomics data for detection of abnormal genotypes pattern.", "version": "1.0.5" }, "last_serial": 5643823, "releases": { "1.0.0": [ { "comment_text": "", "digests": { "md5": "7119601f341fd334d05b4fb8f40ff990", "sha256": "ea59290510ebabbcf5c539ab9e92cda0984ded612b2924fa8c03e6defafad2b5" }, "downloads": -1, "filename": "Pop_Con-1.0.0-py3-none-any.whl", "has_sig": false, "md5_digest": "7119601f341fd334d05b4fb8f40ff990", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 22520, "upload_time": "2019-07-04T13:23:13", "url": "https://files.pythonhosted.org/packages/2d/d3/5a6576f243dd9cbb1c7dca5f9bc937f65f74304f088c6f5fb3fb78cee0a7/Pop_Con-1.0.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "dd5f3f1093a1f56a332a962f34eff30c", "sha256": "61cbe938fe9806f2d2edbc567fb1c2b5c4ba29675cf02844ab92f63f18c5349c" }, "downloads": -1, "filename": "Pop-Con-1.0.0.tar.gz", "has_sig": false, "md5_digest": "dd5f3f1093a1f56a332a962f34eff30c", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 23516, "upload_time": "2019-07-04T13:23:15", "url": "https://files.pythonhosted.org/packages/7e/30/32ac488e6359bb92a99b46b8e8a1b514ebce6548b89a40b475d89b327f25/Pop-Con-1.0.0.tar.gz" } ], "1.0.1": [ { "comment_text": "", "digests": { "md5": "a967cce6ea3ec06ff5a7f3c71f36804d", "sha256": "4868ebeb46714a6e199f7e4a69f868e0bf217c6e2566579571d97880059fec31" }, "downloads": -1, "filename": "Pop_Con-1.0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "a967cce6ea3ec06ff5a7f3c71f36804d", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 25122, "upload_time": "2019-07-10T15:21:56", "url": "https://files.pythonhosted.org/packages/88/2b/aad4cd1ec06a20ad106d140f6e43288e80d3d1cc98b66de454aa0703e2e7/Pop_Con-1.0.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "cd8fcdf89551222d02eecd5dbfc17dc3", "sha256": "8226f75688031f1c9a50524697ce11e6c4dd179c8dc912c15d5a4c949aa725dd" }, "downloads": -1, "filename": "Pop-Con-1.0.1.tar.gz", "has_sig": false, "md5_digest": "cd8fcdf89551222d02eecd5dbfc17dc3", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 25935, "upload_time": "2019-07-10T15:21:58", "url": "https://files.pythonhosted.org/packages/ca/03/e8f5f50178cc152373f88849341815554af92bc4b7f6099b9b33cbd394ae/Pop-Con-1.0.1.tar.gz" } ], "1.0.5": [ { "comment_text": "", "digests": { "md5": "0ddddf50802e48b70185e7b02b5337df", "sha256": "8218e82557bf1bb5702135382486c137dc63e265a2b3cd0da2a6b03ad950c582" }, "downloads": -1, "filename": "Pop_Con-1.0.5-py3-none-any.whl", "has_sig": false, "md5_digest": "0ddddf50802e48b70185e7b02b5337df", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 26483, "upload_time": "2019-08-07T08:38:33", "url": "https://files.pythonhosted.org/packages/8e/4f/e14c3558e316ab86a2200f360bcfbb104ce280ac383a7ee13d89731c3088/Pop_Con-1.0.5-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "49d77c4e0ae0e043a8e8f4b92f22df79", "sha256": "409962a710aab5cfa30f5e053aa39549a5b9f62def4f3feca2e8c3eb074df654" }, "downloads": -1, "filename": "Pop-Con-1.0.5.tar.gz", "has_sig": false, "md5_digest": "49d77c4e0ae0e043a8e8f4b92f22df79", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 26938, "upload_time": "2019-08-07T08:38:35", "url": "https://files.pythonhosted.org/packages/3d/ef/bbecc24d6784d4d2264c6a7313912eac35fa8e52496ee8b16037e217ebb6/Pop-Con-1.0.5.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "0ddddf50802e48b70185e7b02b5337df", "sha256": "8218e82557bf1bb5702135382486c137dc63e265a2b3cd0da2a6b03ad950c582" }, "downloads": -1, "filename": "Pop_Con-1.0.5-py3-none-any.whl", "has_sig": false, "md5_digest": "0ddddf50802e48b70185e7b02b5337df", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 26483, "upload_time": "2019-08-07T08:38:33", "url": "https://files.pythonhosted.org/packages/8e/4f/e14c3558e316ab86a2200f360bcfbb104ce280ac383a7ee13d89731c3088/Pop_Con-1.0.5-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "49d77c4e0ae0e043a8e8f4b92f22df79", "sha256": "409962a710aab5cfa30f5e053aa39549a5b9f62def4f3feca2e8c3eb074df654" }, "downloads": -1, "filename": "Pop-Con-1.0.5.tar.gz", "has_sig": false, "md5_digest": "49d77c4e0ae0e043a8e8f4b92f22df79", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 26938, "upload_time": "2019-08-07T08:38:35", "url": "https://files.pythonhosted.org/packages/3d/ef/bbecc24d6784d4d2264c6a7313912eac35fa8e52496ee8b16037e217ebb6/Pop-Con-1.0.5.tar.gz" } ] }