{ "info": { "author": "brentp", "author_email": "bpederse@gmail.com", "bugtrack_url": null, "classifiers": [ "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.2", "Programming Language :: Python :: 3.3", "Topic :: Scientific/Engineering :: Bio-Informatics" ], "description": "Aclust\n======\nStreaming agglomerative clustering with custom distance and correlation\n\n\n*Agglomerative clustering* is a very simple algorithm.\nThe function `aclust` provided here is an attempt at a simple implementation\nof a modified version that allows a stream of input so that data is not\nrequired to be read into memory all at once. Most clustering algorithms operate\non a matrix of correlations which may not be feasible with high-dimensional\ndata.\n\n`aclust` **defers** some complexity to the caller by relying on a stream of\nobjects that support an interface (I know, I know) of:\n\n obj.distance(other) -> numeric\n obj.is_correlated(other) -> bool\n\nWhile this does add some infrastructure, we can imagine a class with\nposition and values attributes, where the former is an integer and the\nlatter is a list of numeric values. Then, those methods would be implemented\nas:\n\n def distance(self, other):\n return self.position - other.position\n\n def is_correlated(self, other):\n return np.corrcoef(self.values, other.values)[0, 1] > 0.5\n\nThis allows the `aclust` function to be used on **any** kind of data. We can\nimagine that distance might return the Levenshtein distance between 2 strings\nwhile is\\_correlated might indicate their presence in the same sentence or in\nsentences with the same sentiment.\n\nSince the input can be- and the output is- streamed, it is assumed the the objs\nare in sorted order. This is important for things like genomic data, but may be\nless so in text, where the max\\_skip parameter can be set to a large value to\ndetermine how much data is kept in memory.\n\nSee the function docstring for examples and options. The function signature is:\n\n aclust(object\\_stream, max\\_dist,\n max\\_skip=1, linkage='single', multi\\_member=False)\n\nIt yields clusters (lists) of objects from the input object stream.\n\n`multi\\_member` allows a feature to be a member of multiple clusters as long as\nit meets the distance and correlation constraints. The default is to only\nallow a feature to be added to the *nearest* cluster with which it is\ncorrelated.\n\nUses\n====\n\n+ Clustering methylation data which we know to be locally correlated. We can\n use this to reduce the number of tests (of association) from 1 test per CpG,\n to 1 test per correlated unit.\n See: https://github.com/brentp/aclust/blob/master/examples/methylation-clustering-asthma.py for a full example.\n\n```\n chrom start end n_probes probes asthma.pvalue asthma.tstat asthma.coef\n chr1 566570 567501 8 chr1:566570,chr1:566731,chr1:567113,chr1:567206,chr1:567312,chr1:567348,chr1:567358,chr1:567501 0.4566 -0.74 -0.06\n chr1 713985 714021 3 chr1:713985,chr1:714012,chr1:714021 0.1185 -1.56 -0.13\n chr1 845810 846195 3 chr1:845810,chr1:846155,chr1:846195 0.5913 0.54 0.04\n chr1 848379 848440 3 chr1:848379,chr1:848409,chr1:848440 0.3399 -0.95 -0.06\n chr1 854766 855046 7 chr1:854766,chr1:854824,chr1:854838,chr1:854918,chr1:854951,chr1:854966,chr1:855046 0.7482 -0.32 -0.02\n chr1 870791 871546 8 chr1:870791,chr1:870810,chr1:870958,chr1:871033,chr1:871057,chr1:871308,chr1:871441,chr1:871546 0.2198 -1.23 -0.11\n chr1 892857 892948 3 chr1:892857,chr1:892914,chr1:892948 0.2502 -1.15 -0.05\n chr1 901062 901799 5 chr1:901062,chr1:901449,chr1:901685,chr1:901725,chr1:901799 0.6004 0.52 0.04\n chr1 946875 947091 4 chr1:946875,chr1:947003,chr1:947018,chr1:947091 0.9949 0.01 0.00\n```\n So we can filter on the asthma.pvalue to find regions associated with asthma.\n \n\nINSTALL\n=======\n\n`aclust` is available on pypi, as such it can be installed with:\n\n pip install aclust\n\n\nAcknowledgments\n===============\n\nThe idea of this is taken from this paper:\n\n Sofer, T., Schifano, E. D., Hoppin, J. A., Hou, L., & Baccarelli, A. A. (2013). A-clustering: A Novel Method for the Detection of Co-regulated Methylation Regions, and Regions Associated with Exposure. Bioinformatics, btt498.\n\nThe example uses a pull-request implementing GEE for python's statsmodels:\n https://github.com/statsmodels/statsmodels/pull/928", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/brentp/aclust/", "keywords": "bioinformatics cluster", "license": "MIT", "maintainer": null, "maintainer_email": null, "name": "aclust", "package_url": "https://pypi.org/project/aclust/", "platform": "any", "project_url": "https://pypi.org/project/aclust/", "project_urls": { "Download": "UNKNOWN", "Homepage": "https://github.com/brentp/aclust/" }, "release_url": "https://pypi.org/project/aclust/0.1.3/", "requires_dist": null, "requires_python": null, "summary": "streaming agglomerative clustering", "version": "0.1.3" }, "last_serial": 965605, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "md5": "4a48af6e0225b556c497d29905e13bea", "sha256": "68167d9947f82b0fa9b1167271341cb05072599783091dabda450e0fa04d2ec9" }, "downloads": -1, "filename": "aclust-0.1.0.tar.gz", "has_sig": false, "md5_digest": "4a48af6e0225b556c497d29905e13bea", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3620, "upload_time": "2013-09-11T15:24:33", "url": "https://files.pythonhosted.org/packages/ba/f3/87e3f33a3ea3e3585025db64955a0cecb561ff2b43ce84eb4af6e2a2e4d2/aclust-0.1.0.tar.gz" } ], "0.1.2": [ { "comment_text": "", "digests": { "md5": "d05e16de332ce8562442aa6f7f250025", "sha256": "b54b3c906a461b6159481c827a6859adfb15e00ccd12f8bc7b52f57cb12bced6" }, "downloads": -1, "filename": "aclust-0.1.2.tar.gz", "has_sig": false, "md5_digest": "d05e16de332ce8562442aa6f7f250025", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4924, "upload_time": "2013-11-19T16:09:37", "url": "https://files.pythonhosted.org/packages/f1/25/fc2f0d64ce112753b03111948d69b4541679089a02ce784b4bea0bc82563/aclust-0.1.2.tar.gz" } ], "0.1.3": [ { "comment_text": "", "digests": { "md5": "6d0041dbdc768840dbcd8e705cd85e76", "sha256": "27c0d0372310ba796407cf7919001d80cb45831f31f5144d0d1f216badbc84f3" }, "downloads": -1, "filename": "aclust-0.1.3.tar.gz", "has_sig": false, "md5_digest": "6d0041dbdc768840dbcd8e705cd85e76", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5298, "upload_time": "2014-01-10T17:48:32", "url": "https://files.pythonhosted.org/packages/02/28/5aed4aff9e1894c940348a7a1564a37cb516692d01fcbc6feb4534c0a408/aclust-0.1.3.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "6d0041dbdc768840dbcd8e705cd85e76", "sha256": "27c0d0372310ba796407cf7919001d80cb45831f31f5144d0d1f216badbc84f3" }, "downloads": -1, "filename": "aclust-0.1.3.tar.gz", "has_sig": false, "md5_digest": "6d0041dbdc768840dbcd8e705cd85e76", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5298, "upload_time": "2014-01-10T17:48:32", "url": "https://files.pythonhosted.org/packages/02/28/5aed4aff9e1894c940348a7a1564a37cb516692d01fcbc6feb4534c0a408/aclust-0.1.3.tar.gz" } ] }