{ "info": { "author": "Gregory Giecold", "author_email": "ggiecold@jimmy.harvard.edu", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Environment :: Console", "Intended Audience :: Developers", "Intended Audience :: End Users/Desktop", "Intended Audience :: Science/Research", "License :: OSI Approved :: MIT License", "Natural Language :: English", "Operating System :: MacOS :: MacOS X", "Operating System :: POSIX", "Programming Language :: Python :: 2.7", "Topic :: Scientific/Engineering", "Topic :: Scientific/Engineering :: Mathematics", "Topic :: Scientific/Engineering :: Visualization" ], "description": "DBSCAN\\_multiplex\r\n=================\r\n\r\nOverview\r\n--------\r\n\r\nA fast and memory-efficient implementation of DBSCAN (Density-Based\r\nSpatial Clustering of Applications with Noise).\r\n\r\nIt is especially suited for multiple rounds of down-sampling and\r\nclustering from a joint dataset: after an initial overhead O(N log(N)),\r\neach subsequent run of clustering will have O(N) time complexity.\r\n\r\nAs illustrated by a doctest embedded in the present module's docstring,\r\non a dataset of 15,000 samples and 47 features, on a Asus Zenbook laptop\r\nwith 8 GiB of RAM and an Intel Core M processor,\r\nDBSCAN\\_multiplex processes 50 rounds of sub-sampling and clustering \r\nin about 4 minutes, whereas Scikit-learn's implementation of DBSCAN \r\nperforms the same task in more than 28 minutes.\r\n\r\nSuch a test can be performed quite conveniently on your machine: simply\r\nentering ``python DBSCAN_multiplex``\\ in your terminal will prompt a\r\ndoctest to start comparing the performance of the two afore-mentioned\r\nimplementations of DBSCAN.\r\n\r\nThis 7-fold gain in performance proved critical to the statistical\r\nlearning application that prompted the design of this algorithm.\r\n\r\nInstallation and Requirements\r\n-----------------------------\r\n\r\nDBSCAN\\_multiplex requires a machine running any member of the Unix-like \r\nfamily of operaating systems, Python 2.7 along with the following packages\r\nand a few modules from the Standard Python Library:\r\n\r\n- NumPy >= 1.9\r\n\r\n- PyTables\r\n\r\n- scikit-learn\r\n\r\nYou can install DBSCAN_multiplex from the official Python Package Index (PyPI) as follows:\r\n\r\n- open a terminal window;\r\n- type in ``pip install DBSCAN_multiplex``.\r\n\r\nThe command listed above should automatically install or upgrade any missing or outdated dependency among those listed at the beginning of this section.\r\n\r\nUsage and Example\r\n-----------------\r\n\r\nSee the docstrings associated to each function of the DBSCAN\\_multiplex\r\nmodule for more information; in a Python interpreter console, they can\r\nbe viewed by calling the built-in help system, e.g.,\r\n``help(DBSCAN_multiplex.load)``.\r\n\r\nThe following few lines show how DBSCAN\\_multiplex can be used for\r\nclustering 50 randomly selected subsamples out of a common Gaussian\r\ndistributed dataset. This situation arises in consensus clustering where\r\none might want to obtain and then combine multiple vectors of cluster\r\nlabels.\r\n\r\n::\r\n\r\n >>> import numpy as np\r\n >>> import DBSCAN_multiplex as DB\r\n\r\n >>> data = np.random.randn(15000, 7)\r\n >>> N_iterations = 50\r\n >>> N_sub = 9 * data.shape[0] / 10\r\n >>> subsamples_matrix = np.zeros((N_iterations, N_sub), dtype = int)\r\n >>> for i in xrange(N_iterations): \r\n subsamples_matrix[i] = np.random.choice(data.shape[0], N_sub, replace = False)\r\n >>> eps, labels_matrix = DB.DBSCAN(data, minPts = 3, subsamples_matrix = subsamples_matrix, verbose = True)\r\n \r\nReferences\r\n----------\r\n\r\n- Ester, M., Kriegel, H.-P., Sander, J. and Xu, X. (1996) \"A density-based\r\n algorithm for discovering clusters in large spatial databases with noise\".\r\n Proceedings of the Second International Conference on Knowledge Discovery\r\n and Data Mining (KDD-96). AAAI Press. pp. 226-231.\r\n\r\n- Kriegel, H.-P., Kroeger, P., Sander, J. and Zimek, A. (2011) \"Density-based\r\n Clustering\". WIREs Data Mining and Knowledge Discovery 1 (3): 231-240.", "description_content_type": null, "docs_url": null, "download_url": "https://github.com/GGiecold/DBSCAN_multiplex", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/GGiecold/DBSCAN_multiplex", "keywords": "machine-learning clustering", "license": "MIT License", "maintainer": "", "maintainer_email": "", "name": "DBSCAN_multiplex", "package_url": "https://pypi.org/project/DBSCAN_multiplex/", "platform": "Any", "project_url": "https://pypi.org/project/DBSCAN_multiplex/", "project_urls": { "Download": "https://github.com/GGiecold/DBSCAN_multiplex", "Homepage": "https://github.com/GGiecold/DBSCAN_multiplex" }, "release_url": "https://pypi.org/project/DBSCAN_multiplex/1.5/", "requires_dist": null, "requires_python": null, "summary": "Fast and memory-efficient DBSCAN clustering,possibly on various subsamples out of a common dataset", "version": "1.5" }, "last_serial": 1906525, "releases": { "1.3": [ { "comment_text": "", "digests": { "md5": "9ad633f00eb9e450e53e524c73fd7196", "sha256": "8dc5ab82963592448e16d04a32d5b279122d464f8ab77a3790f044bfae6364d7" }, "downloads": -1, "filename": "DBSCAN_multiplex-1.3.tar.gz", "has_sig": false, "md5_digest": "9ad633f00eb9e450e53e524c73fd7196", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 9847, "upload_time": "2015-11-17T18:23:12", "url": "https://files.pythonhosted.org/packages/2f/9f/1b7f1b03ba6efb2d0969503e41d6952eaec24699943636506901eb441d49/DBSCAN_multiplex-1.3.tar.gz" } ], "1.4": [ { "comment_text": "", "digests": { "md5": "8242cc10df56cbf43506b6c715473c73", "sha256": "90b79aa61f111d4a721571c1e96b106ff3f4f35cbf7e9acbf9a09ae756461292" }, "downloads": -1, "filename": "DBSCAN_multiplex-1.4.tar.gz", "has_sig": false, "md5_digest": "8242cc10df56cbf43506b6c715473c73", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 10015, "upload_time": "2015-11-18T18:26:06", "url": "https://files.pythonhosted.org/packages/c5/a6/1f53d611a65a27b043e1f2c17eb9cb62ab2489e8f5057c68f2224f6ba09d/DBSCAN_multiplex-1.4.tar.gz" } ], "1.5": [ { "comment_text": "", "digests": { "md5": "767e25b9d76719c0bfc785eea016e0bd", "sha256": "3836d25b453af5bc4fcc6680aad27868081d8bca30a5200aa18a2991b0388346" }, "downloads": -1, "filename": "DBSCAN_multiplex-1.5.tar.gz", "has_sig": false, "md5_digest": "767e25b9d76719c0bfc785eea016e0bd", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 9980, "upload_time": "2015-12-07T16:16:20", "url": "https://files.pythonhosted.org/packages/65/63/8b64de0b81326ad6bed0fce0df79016d8141961d44b3447c6d96cf275645/DBSCAN_multiplex-1.5.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "767e25b9d76719c0bfc785eea016e0bd", "sha256": "3836d25b453af5bc4fcc6680aad27868081d8bca30a5200aa18a2991b0388346" }, "downloads": -1, "filename": "DBSCAN_multiplex-1.5.tar.gz", "has_sig": false, "md5_digest": "767e25b9d76719c0bfc785eea016e0bd", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 9980, "upload_time": "2015-12-07T16:16:20", "url": "https://files.pythonhosted.org/packages/65/63/8b64de0b81326ad6bed0fce0df79016d8141961d44b3447c6d96cf275645/DBSCAN_multiplex-1.5.tar.gz" } ] }