{ "info": { "author": "Chester (Yu-Chuan Chang)", "author_email": "chester75321@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Science/Research", "License :: OSI Approved :: MIT License", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.7", "Topic :: Scientific/Engineering :: Artificial Intelligence", "Topic :: Scientific/Engineering :: Bio-Informatics" ], "description": "# GenEpi\nGenEpi is a package to uncover epistasis associated with phenotypes by a machine learning approach, developed by Yu-Chuan Chang at [c4Lab](http://bioinfo.bime.ntu.edu.tw/c4lab/) of National Taiwan University and Taiwan AI Labs.\n\n\n\nThe architecture and modules of GenEpi.\n\n## Getting Started\n### Installation\n```\n$ pip install GenEpi\n```\n>**NOTE:** GenEpi is a memory-consuming package, which might cause memory errors when calculating the epistasis of a gene containing a large number of SNPs. We recommend running GenEpi with more than 256 GB of memory.\n\n### Inputs\nWe provide test data [sample.gen](https://github.com/Chester75321/GenEpi/raw/master/genepi/example/sample.gen) and [sample.csv](https://github.com/Chester75321/GenEpi/raw/master/genepi/example/sample.csv) in the [example folder](https://github.com/Chester75321/GenEpi/raw/master/genepi/example). Please see the following details about the input data.\n\n**1\\. Genotype Data:**\n\nGenEpi takes the [Genotype File Format](http://www.cog-genomics.org/plink/1.9/formats#gen) (.GEN) used by Oxford statistical genetics tools, such as IMPUTE2 and SNPTEST, as the input format for genotype data. 
If your files are in [PLINK format](http://www.cog-genomics.org/plink/1.9/formats) (.BED/.BIM/.FAM) or [1000 Genomes Project text Variant Call Format](http://www.cog-genomics.org/plink/1.9/formats#vcf) (.VCF), you can use [PLINK](http://www.cog-genomics.org/plink/1.9/) with the following commands to convert them to a .GEN file.\n\nIf your files are in the **.BED/.BIM/.FAM** format:\n```\n$ plink --bfile prefixOfTheFilename --recode oxford --out prefixOfTheFilename\n```\nIf your file is in the **.VCF** format:\n```\n$ plink --vcf filename.vcf --recode oxford --out prefixOfTheFilename\n```\n\n**2\\. Phenotype & Environmental Factor Data**\n\nGenEpi takes a .CSV file without a header line as the input format for phenotype and environmental factor data. The last column of the file is treated as the phenotype data and the other columns are treated as the environmental factor (covariate) data.\n>**NOTE:** The order of the samples in the phenotype data must be the same as in the .GEN file.\n\n## Usage Example\n### Running a Quick Test\nYou will obtain all the outputs of GenEpi in the current folder.\n```\n$ GenEpi -g example -p example -o ./\n```\n\n### Applying on Your Data\n```\n$ GenEpi -g full_path_of_your_.GEN_file -p full_path_of_your_.CSV_file -o ./\n```\n\n### Applying Self-defined Genome Regions on Your Data\nPrepare your genome regions in a .TXT file with the columns [chromosome, start, end, strand, geneSymbol], for example:\n```\n1,10873,14409,+,DDX11L1\n1,14361,30370,-,WASH7P\n1,34610,37081,-,FAM138F\n1,68090,70008,+,OR4F5\n...\n```\n\nThen use the parameter -s to apply it to your data:\n```\n$ GenEpi -s full_path_of_your_genome_region_file -g full_path_of_your_.GEN_file -p full_path_of_your_.CSV_file -o ./\n```\n\n### Options\nTo list all the optional arguments, use --help:\n```\n$ GenEpi --help\n```\n\nYou will obtain the following argument list:\n```\nusage: GenEpi [-h] -g G -p P [-s S] [-o O] [-m {c,r}] [-k K] [-t T]\n 
[--updatedb] [-b {hg19,hg38}] [--compressld] [-d D] [-r R]\n\noptional arguments:\n -h, --help show this help message and exit\n -g G filename of the input .gen file\n -p P filename of the input phenotype\n -s S self-defined genome regions\n -o O output file path\n -m {c,r} choose model type: c for classification; r for regression\n -k K k of k-fold cross validation\n -t T number of threads\n\nupdate UCSC database:\n --updatedb enable this function\n -b {hg19,hg38} human genome build\n\ncompress data by LD block:\n --compressld enable this function\n -d D threshold for compression: D prime\n -r R threshold for compression: R square\n```\n\nTo change the build of the UCSC genome browser, modify the parameter -b:\n```\n$ GenEpi -g example -p example -o ./ --updatedb -b hg38\n```\n\nYou can modify the thresholds for linkage disequilibrium dimension reduction with the following command:\n```\n$ GenEpi -g example -p example -o ./ --compressld -d 0.9 -r 0.9\n```\n\n## Interpreting the Results\n### The Main Table\nGenEpi will automatically generate three folders (snpSubsets, singleGeneResult, crossGeneResult) beside your .GEN file. 
You can go directly to the folder **crossGeneResult** to obtain the main table for epistasis in **Result.csv**.\n\n| RSID | -Log10(χ2 p-value) | Odds Ratio | Genotype Frequency | Gene Symbol |\n|-----------------------------|---------------------------------------------:|-----------:|-------------------:|-------------|\n| rs157580_BB rs2238681_AA | 8.4002 | 9.3952 | 0.1044 | TOMM40 |\n| rs449647_AA rs769449_AB | 8.0278 | 5.0877 | 0.2692 | APOE |\n| rs59007384_BB rs11668327_AA | 8.0158 | 12.0408 | 0.0824 | TOMM40 |\n| rs283811_BB rs7254892_AA | 8.0158 | 12.0408 | 0.0824 | PVRL2 |\n| rs429358_AA | 5.7628 | 0.1743 | 0.5962 | APOE |\n| rs73052335_AA rs429358_AA | 5.6548 | 0.1867 | 0.5714 | APOC1\\*APOE |\n\n>The first column lists each feature by its RSID and genotype (denoted as RSID_genotype); pairwise epistasis features are represented using two SNPs. The last column gives the genes in which the SNPs are located according to their genomic coordinates. A star sign denotes epistasis between genes. The p-values of the χ2 test (quantitative tasks use Student's t-test) are also included. An odds ratio significantly different from 1 indicates whether a feature is a potentially causal or protective genotype. Because a low genotype frequency may make the odds ratio unreliable, the genotype frequency is also listed in the table.\n\n### Other Details\n**1\\. Linkage Disequilibrium**\n\nAfter performing linkage disequilibrium (LD) dimension reduction, GenEpi will generate two files: a dimension-reduced .GEN file and a file containing LD blocks (.LDBlock file). Each row in the .LDBlock file represents an LD block (see below for examples). The SNP before each colon is the representative SNP of that LD block, and only these SNPs are retained in the dimension-reduced .GEN file.\n```\nrs429358:rs429358\nrs7412:rs7412\nrs117656888:rs117656888\nrs1081105:rs1081105\nrs1081106:rs1081106,rs191315680\n```\n\n**2\\. 
Single-gene .GEN Files**\n\nThe subsets of the .GEN file for each gene are stored in the folder **snpSubsets**.\n\n**3\\. Single-gene Results**\n\nAll within-gene epistasis features selected by the single-gene models are stored in the folder **singleGeneResult**, in the same format as the cross-gene **Result.csv**. The performance of each single-gene model is shown in **All_Logistic/Lasso_k-Fold.csv** in the same folder (see below for examples).\n\n| Gene Symbol | F1 Score |\n|-------------|---------:|\n| APOE | 0.6109 |\n| TOMM40 | 0.5900 |\n| PVRL2 | 0.5745 |\n| APOC1 | 0.5736 |\n\n**4\\. Model Persistence**\n\nThe final models of step five and step six are persisted in the folder **crossGeneResult** as **RFClassifier/Regressor.pkl** and **RFClassifier/Regressor_Covariates.pkl**, respectively. You can keep these models for future use without reconstructing them.\n\n## Meta\nChester (Yu-Chuan Chang) - chester75321@gmail.com \nDistributed under the MIT license. See ``LICENSE`` for more information. 
\n[https://github.com/Chester75321/GenEpi/](https://github.com/Chester75321/GenEpi/)\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/Chester75321/GenEpi", "keywords": "epistasis,SNP-SNP interactions,GWAS", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "genepi", "package_url": "https://pypi.org/project/genepi/", "platform": "", "project_url": "https://pypi.org/project/genepi/", "project_urls": { "Homepage": "https://github.com/Chester75321/GenEpi" }, "release_url": "https://pypi.org/project/genepi/2.0.2/", "requires_dist": [ "pymysql (>=0.8.0)", "numpy (>=1.13.0)", "scipy (>=0.19.0)", "psutil (>=4.3.0)", "scikit-learn (>=0.21.2)" ], "requires_python": ">=3", "summary": "A package for detecting epsitasis by machine learning", "version": "2.0.2" }, "last_serial": 5847911, "releases": { "1.0.3": [ { "comment_text": "", "digests": { "md5": "b95b19260015b144cfa29d675a9b05e7", "sha256": "695eb64fda48372348c58aa1132e342748d8059950512a4b83ac6f14630a582f" }, "downloads": -1, "filename": "genepi-1.0.3.tar.gz", "has_sig": false, "md5_digest": "b95b19260015b144cfa29d675a9b05e7", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3", "size": 418345, "upload_time": "2018-04-22T11:24:20", "url": "https://files.pythonhosted.org/packages/06/00/11f9ec9ad5fdac925f2a0fd0beb5b23abf44461a354b17b7dfce5aaf3d10/genepi-1.0.3.tar.gz" } ], "1.0.4": [ { "comment_text": "", "digests": { "md5": "8827c89f505769c65bec8088e9b4e838", "sha256": "da46e642d7eab4905a401f1213d524556846e4bee36d9aab63a28e5cff6c9518" }, "downloads": -1, "filename": "genepi-1.0.4.tar.gz", "has_sig": false, "md5_digest": "8827c89f505769c65bec8088e9b4e838", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3", "size": 418254, "upload_time": "2018-04-23T12:17:29", "url": 
"https://files.pythonhosted.org/packages/ab/92/9f840ec8ec1443bf793a6b97402170e55f60072f0904ebf2c09abda6decb/genepi-1.0.4.tar.gz" } ], "1.0.5": [ { "comment_text": "", "digests": { "md5": "163e3f7184608a61a1cafb193a500c4e", "sha256": "cb285ae718e2b370d72da895b94f9a8070b8eb24047d055314ee6951eb4f0cc6" }, "downloads": -1, "filename": "genepi-1.0.5.tar.gz", "has_sig": false, "md5_digest": "163e3f7184608a61a1cafb193a500c4e", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3", "size": 419877, "upload_time": "2018-05-01T03:44:44", "url": "https://files.pythonhosted.org/packages/59/8d/1ad3daea5edcb0638a417436a21ca5d469cf107746f55bc7b0899d40ed1e/genepi-1.0.5.tar.gz" } ], "1.0.6": [ { "comment_text": "", "digests": { "md5": "a5a2eaafa28c12c02f73c3f0c6d6c18b", "sha256": "d56e153eccb4df66dff2834542042ae390386d8b53c601c7181c5506163cc6f7" }, "downloads": -1, "filename": "genepi-1.0.6.tar.gz", "has_sig": false, "md5_digest": "a5a2eaafa28c12c02f73c3f0c6d6c18b", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3", "size": 419964, "upload_time": "2018-05-30T14:25:59", "url": "https://files.pythonhosted.org/packages/ba/9c/6c9ae9a51a75b8b2e69b7264f2837346a01d878b45ff96ea138fcb881413/genepi-1.0.6.tar.gz" } ], "2.0.1": [ { "comment_text": "", "digests": { "md5": "935517be82a9b558450faee65e8a74aa", "sha256": "0fbbbf3c8cb7824101ac9d134bd482c4205dc769ddba5ce85fab4771a015a845" }, "downloads": -1, "filename": "genepi-2.0.1.tar.gz", "has_sig": false, "md5_digest": "935517be82a9b558450faee65e8a74aa", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3", "size": 443094, "upload_time": "2019-07-15T02:58:15", "url": "https://files.pythonhosted.org/packages/16/9c/58c9713b45a58d0a2c6189e83dd11e18ed23dceb9a4bd5e222f90114d707/genepi-2.0.1.tar.gz" } ], "2.0.2": [ { "comment_text": "", "digests": { "md5": "26558a993e83781773d88198750d4d43", "sha256": "8e607227a3c3c977efddbd24cb897a4790fb624b7a1e84dd9a4a5d2344df8849" }, 
"downloads": -1, "filename": "genepi-2.0.2-py3-none-any.whl", "has_sig": false, "md5_digest": "26558a993e83781773d88198750d4d43", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3", "size": 457613, "upload_time": "2019-09-18T05:44:05", "url": "https://files.pythonhosted.org/packages/0d/57/6aa5a84f67aa9f689c5c4b4cc1eb8a5844142223d5b51027377a5ca18b2e/genepi-2.0.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "26d87e7eac136bfd213b26992a47a2b6", "sha256": "a990d538978cc7ddbe6c0ece10f2836a741567e0eca16b8021e7e05ed7bfad19" }, "downloads": -1, "filename": "genepi-2.0.2.tar.gz", "has_sig": false, "md5_digest": "26d87e7eac136bfd213b26992a47a2b6", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3", "size": 438689, "upload_time": "2019-09-18T05:44:07", "url": "https://files.pythonhosted.org/packages/5a/05/ea3f519e229d80e0ba0ef73ae1f2d5c42c3c821c62bd997e5e794da213fe/genepi-2.0.2.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "26558a993e83781773d88198750d4d43", "sha256": "8e607227a3c3c977efddbd24cb897a4790fb624b7a1e84dd9a4a5d2344df8849" }, "downloads": -1, "filename": "genepi-2.0.2-py3-none-any.whl", "has_sig": false, "md5_digest": "26558a993e83781773d88198750d4d43", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3", "size": 457613, "upload_time": "2019-09-18T05:44:05", "url": "https://files.pythonhosted.org/packages/0d/57/6aa5a84f67aa9f689c5c4b4cc1eb8a5844142223d5b51027377a5ca18b2e/genepi-2.0.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "26d87e7eac136bfd213b26992a47a2b6", "sha256": "a990d538978cc7ddbe6c0ece10f2836a741567e0eca16b8021e7e05ed7bfad19" }, "downloads": -1, "filename": "genepi-2.0.2.tar.gz", "has_sig": false, "md5_digest": "26d87e7eac136bfd213b26992a47a2b6", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3", "size": 438689, "upload_time": "2019-09-18T05:44:07", "url": 
"https://files.pythonhosted.org/packages/5a/05/ea3f519e229d80e0ba0ef73ae1f2d5c42c3c821c62bd997e5e794da213fe/genepi-2.0.2.tar.gz" } ] }