{ "info": { "author": "Joon-Hyeong Park", "author_email": "clearclouds@snu.ac.kr", "bugtrack_url": null, "classifiers": [], "description": "GeneMethyl 1.0.0\nWritten by Joon-Hyeong Park, Seoul National University, Applied Biology and Chemistry, clearclouds@snu.ac.kr.\n\n\n\nA. Goal\n\n\tDNA methylation, the addition of a methyl group to the 5th carbon (C5) of the cytosine pyrimidine ring, is considered an important biomarker for some diseases.\n\tCpG sites, regions of a single DNA strand where a cytosine nucleotide is followed by a guanine nucleotide, are known to be easily methylated in the human body.\n\tMethylation of CpG sites can affect the expression levels of specific genes. In some diseases like cancer, hypermethylation of CpG sites silences tumor suppressor genes, and hypomethylation promotes the activity of retrotransposons such as LINE-1, causing chromosomal instability.\n\tNumerous clinical studies of CpG site methylation have therefore been completed, and more are underway.\n\tTo make it easy to analyze the relationship between DNA methylation status and target genes' activities, I made the package named GeneMethyl.\n\tI hope this package will contribute to much research, and I plan to add more functions to solve more complicated problems.\n\n\n\n\nB. Input data format\n\n\tRequired Files : DNA methylation data and RNAseq data (raw counts or normalized by RSEM) of samples for a target disease.\n\n\t1. File name : [TargetDiseaseName].DNA_methylation_450K.tsv (Separated by tab)\n\n\t\tColumn : Each sample\n\t\tRow : Each CpG site\n\n\t\tMissing values are left empty. (Just separated by tab)\n\t\tWe recognize sample names up to a length of 12.
(excluding '-' or '_')\n\n\tIllumina's Infinium HumanMethylation450K BeadChip makes it easy to obtain DNA methylation data for about 480,000 selected CpG sites.\n\tThis package therefore uses the beta-value data table generated from HumanMethylation450K.\n\tThe format of the DNA methylation dataset is the same as that of TCGA's pan-cancer atlas methylation dataset.\n\n\n\n\t2. File name : [TargetDiseaseName].PANCANCER.RNAseq.tsv (Separated by tab)\n\n\t\tColumn : Each sample\n\t\tRow : Each gene\n\n\t\tMissing values are left empty. (Just separated by tab)\n\t\tWe recognize sample names up to a length of 12 (excluding '-' or '_').\n\n\tYour target gene expression levels are automatically calculated from RNAseq raw count data or from data normalized by the RSEM package.\n\tThe format of the RNAseq dataset is the same as that of TCGA's pan-cancer atlas RNAseq dataset.\n\n\n\n\n\nC. Package functions\n\n\t1. BetavalueDistribution\n\n\t\t(1) BetavalueDistribution.Draw(\"TargetDiseaseName\", Cutoff, WhetherHistogram) : Drawing the beta-value distribution as a density plot based on a histogram.\n\n\t\t\tInput parameters : \n\n\t\t\t\t\"TargetDiseaseName\" : Target disease name\n\t\t\t\tCutoff : Cutoff must be in [0, 1]; the range [0, 1] is divided into sections of this width.\n\t\t\t\t\tex) Cutoff : 0.1 \t-> \tYour Sections : 0~0.1, 0.1~0.2, 0.2~0.3, 0.3~0.4, 0.4~0.5, 0.5~0.6, 0.6~0.7, 0.7~0.8, 0.8~0.9, 0.9~1.0\n\t\t\t\tWhetherHistogram : True or False. If you choose True, the histogram of beta-values divided by the cutoff is also shown.\n\n\n\t\t\tDescription : \n\n\t\t\t\tDNA methylation data is too large to handle easily, so loading the whole dataset and drawing the density plot at once is not memory-efficient.\n\t\t\t\tSo, I read the DNA methylation data line by line and binned the beta-values into several sections divided by a specific cutoff.
(You can choose the cutoff.)\n\t\t\t\tThen, using the histogram, I made a density plot of the beta-value distribution.\n\n\n\t\t\tOutput file :\n\n\t\t\t\t/Result/DistributionPlot/[TargetDiseaseName].Betavalue.Distribution.Plot.pdf\n\n\n\n\n\n\t2. SimpleCutoff\n\n\t\t(1) SimpleCutoff.TargetGeneActivity(\"TargetDiseaseName\", [TargetGenesList]) : Calculating target genes' activities.\n\n\t\t\tInput parameters : \n\n\t\t\t\t\"TargetDiseaseName\" : Target disease name\n\t\t\t\t[TargetGenesList] : Target genes' list. You can choose multiple genes.\n\n\n\t\t\tDescription : \n\n\t\t\t\tYou can simply calculate target genes' activities.\n\t\t\t\tI defined the representative target genes' activity using a logarithm.\n\t\t\t\t\tbase : the number of genes\n\t\t\t\t\tanti-logarithm : geometric mean of the target genes' RNAseq values, each with a pseudocount (1) added.\n\t\t\t\t\tcf) I added a pseudocount (1) to all RNAseq data to prevent a negative value of the representative target genes' activity.\n\n\n\t\t\tOutput file :\n\n\t\t\t\t/Result/SimpleCutoff/[TargetDiseaseName].TargetGeneActivity.txt\n\n\n\n\n\t\t(2) SimpleCutoff.View_Correlation_AND_ScatterPlot(\"TargetDiseaseName\", [TargetGenesList], [Cutoff], Type, WhetherFoldChange) : Calculating Spearman's correlation between the representative target genes' activities and the summations over all samples. Drawing scatter plots of this correlation.\n\n\t\t\tInput parameters : \n\n\t\t\t\t\"TargetDiseaseName\" : Target disease name\n\t\t\t\t[TargetGenesList] : Target genes' list. You can choose multiple genes.\n\t\t\t\tCutoff, Type : Summations are calculated using Cutoff, depending on Type.\n\t\t\t\t\tType : \"Lower\", \"Higher\", \"Both\", \"All\"\n\t\t\t\t\t\t\"Lower\" : If a beta-value is lower than the cutoff, it is converted into 1; otherwise, into 0. These values are then summed for each sample. -> You get the number of beta-values lower than the cutoff for each sample.
(in [0, cutoff])\n\t\t\t\t\t\t\"Higher\" : If a beta-value is higher than the cutoff, it is converted into 1; otherwise, into 0. These values are then summed for each sample. -> You get the number of beta-values higher than the cutoff for each sample. (in [cutoff, 1])\n\t\t\t\t\t\t\"Both\" : If a beta-value is lower than cutoff/2 or higher than 1 - cutoff/2, it is converted into 1; otherwise, into 0. These values are then summed for each sample. -> You get the number of beta-values in [0, cutoff/2] or [1 - cutoff/2, 1] for each sample.\n\t\t\t\t\t\t\"All\" : Runs each of these types separately.\n\t\t\t\t\tCutoff : Cutoff must be in [0, 1]; it determines the section. You can choose multiple cutoffs.\n\t\t\t\tWhetherFoldChange : Whether to calculate the fold change\n\t\t\t\t\tFold change = Mean or Median of the representative target genes' activity of samples included in the section defined by the cutoff / Mean or Median of the representative target genes' activity of samples NOT included in the section defined by the cutoff\n\n\t\t\tOutput file :\n\n\t\t\t\t/Result/SimpleCutoff/FC_CpGsites/WholeSites.Cutoff.[Cutoff].[TargetDiseaseName].[Type].FC.CpGsites.txt\n\t\t\t\t/Result/SimpleCutoff/Summation/WholeSites.Cutoff.[Cutoff].[TargetDiseaseName].[Type].Binarization.Summation.txt\n\t\t\t\t/Result/SimpleCutoff/Correlation/WholeSites.[TargetDiseaseName].[Type].Correlation.Summation.And.TargetGeneActivity.txt -> includes all cutoffs for easy comparison\n\t\t\t\t/Result/SimpleCutoff/Correlation/WholeSites.[TargetDiseaseName].CompareAll.Correlation.Summation.And.TargetGeneActivity.txt -> created only if Type is \"All\", for easy comparison\n\t\t\t\t/Result/SimpleCutoff/ScatterPlot/WholeSites.Cutoff.[Cutoff].[TargetDiseaseName].[Type].ScatterPlot.pdf\n\n\n\n\n\n\t3.
TopPercentageCutoff\n\n\t\t(1) TopPercentageCutoff.TargetGeneActivity(\"TargetDiseaseName\", [TargetGenesList]) : Calculating target genes' activities.\n\n\t\t\tInput parameters : \n\n\t\t\t\t\"TargetDiseaseName\" : Target disease name\n\t\t\t\t[TargetGenesList] : Target genes' list. You can choose multiple genes.\n\n\n\t\t\tDescription : \n\n\t\t\t\tYou can simply calculate target genes' activities.\n\t\t\t\tI defined the representative target genes' activity using a logarithm.\n\t\t\t\t\tbase : the number of genes\n\t\t\t\t\tanti-logarithm : geometric mean of the target genes' RNAseq values, each with a pseudocount (1) added.\n\t\t\t\t\tcf) I added a pseudocount (1) to all RNAseq data to prevent a negative value of the representative target genes' activity.\n\n\n\t\t\tOutput file :\n\n\t\t\t\t/Result/TopNpercentageCutoff/[TargetDiseaseName].TargetGeneActivity.txt\n\n\n\n\n\t\t(2) TopPercentageCutoff.View_Correlation_AND_ScatterPlot(\"TargetDiseaseName\", [TargetGenesList], [Percentage], Type, WhetherMeanMethod, WhetherFoldChange) : Calculating Spearman's correlation between the representative target genes' activities and the summations over all samples.
Drawing scatter plots of this correlation.\n\n\t\t\tBackground : \n\n\t\t\t\tBefore explaining TopNpercentageCutoff, we have to define what percentage means in this method.\n\n\t\t\t\tIn some diseases like cancer, the beta-value distribution of each CpG site looks roughly like a 2-peaked graph.\n\t\t\t\tSo, I roughly classified the beta-value distribution of each CpG site into 2 categories, left-skewed and right-skewed.\n\t\t\t\tTo determine the skewness of a CpG site, we compare its median with its mean or with 0.5 ( = the exact midpoint of [0, 1]).\n\t\t\t\t\tIf you want to use the 'mean method', set WhetherMeanMethod to True.\n\t\t\t\t\tIf you want to use the '0.5 method', set WhetherMeanMethod to False.\n\n\t\t\t\tAfter determining skewness, we can classify types as \"Positive\", \"Negative\", \"Both\".\n\t\t\t\t\tIf the type is \"Positive\" and the CpG site's beta-value distribution is right-skewed, the top N% of sample beta-values are converted into 1; otherwise, into 0.\n\t\t\t\t\tIf the type is \"Positive\" and the CpG site's beta-value distribution is left-skewed, the bottom N% of sample beta-values are converted into 1; otherwise, into 0.\n\t\t\t\t\tThus, the \"Positive\" type means we count, for each sample, the number of beta-values that follow the tendency of each CpG site's distribution.\n\n\t\t\t\t\tIf the type is \"Negative\" and the CpG site's beta-value distribution is right-skewed, the bottom N% of sample beta-values are converted into 1; otherwise, into 0.\n\t\t\t\t\tIf the type is \"Negative\" and the CpG site's beta-value distribution is left-skewed, the top N% of sample beta-values are converted into 1; otherwise, into 0.\n\t\t\t\t\tThus, the \"Negative\" type means we count, for each sample, the number of beta-values that do not follow the tendency of each CpG site's distribution.\n\n\t\t\t\t\tIf the type is \"Both\", regardless of the CpG site's skewness, the top N/2% and bottom N/2% of sample beta-values are converted into 1.
Otherwise, they are converted into 0.\n\t\t\t\t\tThus, the \"Both\" type means we count the number of strongly methylated or demethylated CpG sites for each sample.\n\n\t\t\t\tBy using this method, we can simply count the number of hypermethylated or hypomethylated CpG sites for a specific situation.\n\t\t\t\tTo explain this, let's take cancer as an example.\n\t\t\t\t\tGlobally, CpG sites are hypomethylated, which increases chromosomal instability by expressing retrotransposons (just one example).\n\t\t\t\t\tBut CpG sites near promoter regions are hypermethylated to silence life-critical genes.\n\t\t\t\t\tSo, if we use the TopNpercentageCutoff \"Positive\" type, we can count the number of these hypermethylated or hypomethylated CpG sites.\n\t\t\t\t\tThen, we can correlate these values with the representative target genes' activity.\n\n\t\t\t\tIn short, we can analyze the epigenetic impact of diseases.\n\n\n\t\t\tInput parameters :\n\n\t\t\t\t\"TargetDiseaseName\" : Target disease name\n\t\t\t\t[TargetGenesList] : Target genes' list.
You can choose multiple genes.\n\t\t\t\tPercentage, Type : Summations are calculated using Percentage, depending on Type.\n\t\t\t\t\tType : \"Positive\", \"Negative\", \"Both\", \"All\"\n\t\t\t\t\tPercentage : The percentage N to use. You can choose multiple percentages.\n\t\t\t\tWhetherFoldChange : Whether to calculate the fold change\n\t\t\t\t\tFold change = Mean or Median of the representative target genes' activity of samples included in the section defined by the percentage / Mean or Median of the representative target genes' activity of samples NOT included in the section defined by the percentage.\n\t\t\t\tWhetherMeanMethod : Chooses the method of determining skewness\n\n\n\t\t\tOutput file :\n\n\t\t\t\t/Result/TopPercentageCutoff/FC_CpGsites/WholeSites.Percentage.[Percentage].[TargetDiseaseName].[Type].FC.CpGsites.txt\n\t\t\t\t/Result/TopPercentageCutoff/Summation/WholeSites.Percentage.[Percentage].[TargetDiseaseName].[Type].Binarization.Summation.txt\n\t\t\t\t/Result/TopPercentageCutoff/Skewed/WholeSites.[TargetDiseaseName].Left.Skewed.CpGsites.txt\n\t\t\t\t/Result/TopPercentageCutoff/Skewed/WholeSites.[TargetDiseaseName].Right.Skewed.CpGsites.txt\n\t\t\t\t/Result/TopPercentageCutoff/Correlation/WholeSites.[TargetDiseaseName].[Type].Correlation.Summation.And.TargetGeneActivity.txt -> includes all percentages for easy comparison\n\t\t\t\t/Result/TopPercentageCutoff/Correlation/WholeSites.[TargetDiseaseName].CompareAll.Correlation.Summation.And.TargetGeneActivity.txt -> created only if Type is \"All\", for easy comparison\n\t\t\t\t/Result/TopPercentageCutoff/ScatterPlot/WholeSites.Percentage.[Percentage].[TargetDiseaseName].[Type].ScatterPlot.pdf\n\n\n\n\n\nD.
Usage Example\n\n\tfrom GeneMethyl import *\n\n\tBetavalueDistribution.DrawDensityPlot(\"PANCANCER\", 0.001, True)\n\n\tSimpleCutoff.TargetGeneActivity(\"PANCANCER\", [\"GZMA\", \"PRF1\"])\n\tSimpleCutoff.View_Correlation_AND_ScatterPlot(\"PANCANCER\", [\"GZMA\", \"PRF1\"], [0.1, 0.2], \"All\", True)\n\t\tcf) Even if you have not called TargetGeneActivity beforehand, it is executed automatically inside this function.\n\n\tTopPercentageCutoff.TargetGeneActivity(\"PANCANCER\", [\"GZMA\", \"PRF1\"])\n\tTopPercentageCutoff.View_Correlation_AND_ScatterPlot(\"PANCANCER\", [\"GZMA\", \"PRF1\"], [0.05, 0.1, 0.15, 0.2], \"All\", False, True)\n\t\tcf) Even if you have not called TargetGeneActivity beforehand, it is executed automatically inside this function.\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/JoonHyeongPark/GeneMethyl", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "GeneMethyl", "package_url": "https://pypi.org/project/GeneMethyl/", "platform": "", "project_url": "https://pypi.org/project/GeneMethyl/", "project_urls": { "Homepage": "https://github.com/JoonHyeongPark/GeneMethyl" }, "release_url": "https://pypi.org/project/GeneMethyl/1.0.0/", "requires_dist": null, "requires_python": "", "summary": "By this package, you can simply calculate the epigenetic result caused by a specific disease and reveal the relationship between the epigenetic effect and specific genes' expressions. For numerous samples of a specific disease, you can get a correlation value between gene expression levels and summations calculated from DNA methylation beta-values.
Additionally you can simply know beta-value distribution by density plot.", "version": "1.0.0" }, "last_serial": 4367941, "releases": { "1.0.0": [ { "comment_text": "", "digests": { "md5": "b78569d3a61d2d480c94d009e93c614d", "sha256": "953c198827065b022659c7c0547daee35ca8d374e8e4b3653c2d6bb503740cc1" }, "downloads": -1, "filename": "GeneMethyl-1.0.0-py3-none-any.whl", "has_sig": false, "md5_digest": "b78569d3a61d2d480c94d009e93c614d", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 17418, "upload_time": "2018-10-12T09:39:36", "url": "https://files.pythonhosted.org/packages/23/eb/235f9bcce2a6877381cafcb746712b4cf14f3bdf272528c776268d198ab7/GeneMethyl-1.0.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "5b1e9b074d35770161e4daf78ba43ee2", "sha256": "7ae4042d2f7a7a2531bd587d6640a7fe5f99a3fb1f3abd531294568bb3954c76" }, "downloads": -1, "filename": "GeneMethyl-1.0.0.tar.gz", "has_sig": false, "md5_digest": "5b1e9b074d35770161e4daf78ba43ee2", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 10210, "upload_time": "2018-10-12T09:39:38", "url": "https://files.pythonhosted.org/packages/4d/75/5b48c6690faecb451dd5f0547da4dfe39f887742fb85f7db8df4ac92524d/GeneMethyl-1.0.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "b78569d3a61d2d480c94d009e93c614d", "sha256": "953c198827065b022659c7c0547daee35ca8d374e8e4b3653c2d6bb503740cc1" }, "downloads": -1, "filename": "GeneMethyl-1.0.0-py3-none-any.whl", "has_sig": false, "md5_digest": "b78569d3a61d2d480c94d009e93c614d", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 17418, "upload_time": "2018-10-12T09:39:36", "url": "https://files.pythonhosted.org/packages/23/eb/235f9bcce2a6877381cafcb746712b4cf14f3bdf272528c776268d198ab7/GeneMethyl-1.0.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "5b1e9b074d35770161e4daf78ba43ee2", "sha256": 
"7ae4042d2f7a7a2531bd587d6640a7fe5f99a3fb1f3abd531294568bb3954c76" }, "downloads": -1, "filename": "GeneMethyl-1.0.0.tar.gz", "has_sig": false, "md5_digest": "5b1e9b074d35770161e4daf78ba43ee2", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 10210, "upload_time": "2018-10-12T09:39:38", "url": "https://files.pythonhosted.org/packages/4d/75/5b48c6690faecb451dd5f0547da4dfe39f887742fb85f7db8df4ac92524d/GeneMethyl-1.0.0.tar.gz" } ] }