{ "info": { "author": "Gregory Giecold", "author_email": "ggiecold@jimmy.harvard.edu", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Environment :: Console", "Intended Audience :: Developers", "Intended Audience :: End Users/Desktop", "Intended Audience :: Science/Research", "License :: OSI Approved :: MIT License", "Natural Language :: English", "Operating System :: MacOS :: MacOS X", "Operating System :: POSIX", "Programming Language :: Python :: 2.7", "Topic :: Scientific/Engineering", "Topic :: Scientific/Engineering :: Mathematics", "Topic :: Scientific/Engineering :: Visualization" ], "description": "Density\\_Sampling\n=================\n\nOverview\n--------\n\nFor a data-set comprising a mixture of rare and common populations,\ndensity sampling gives equal weights to selected representatives of\nthose distinct populations.\n\nDensity sampling is a balancing act between signal and noise. Indeed,\nwhile it increases the prevalence of rare populations, it also increases\nthe prevalence of noisy sample points that would happen to have their\nlocal densities larger than an outlier density computed by density\nsampling.\n\nMore specifically, density sampling proceeds as follows: \\* For each\nsample point of the data-set 'data', estimate a local density in feature\nspace by counting the number of neighboring data-points within a\nparticular region centered around that sample point. \\* The ``i``-th\nsample point of the data-set is selected by density sampling with a\nprobability given by:\n\n::\n\n | 0 if outlier_density > LD[i];\n P(keep the i-th data-point) = | 1 if outlier_density <= LD[i] <= target_density;\n | target_density / LD[i] if LD[i] > target_density.\n\nHere ``LD[i]`` denotes the local density of the ``i``-th sample point of\nthe data-set, whereas ``outlier_density`` and ``target_density`` are\ncomputed as particular percentiles of the distribution of local\ndensities.\n\nInstallation and Requirements\n-----------------------------\n\nDensity\\_Sampling is written in Python 2.7 and requires the following\npackages, along with a few modules from the Python Standard Library: \\*\nNumPy >= 1.9.0 \\* scikit-learn \\* setuptools\n\nIt is recommended that you ascertain that those requirements are met and\ncheck that the afore-mentioned libraries are up and running, even though\nthe ``pip``\\ command below should automatically handle the installation\nof those dependencies. You can install Density\\_Sampling from the Python\nPackage Index (PyPI) in two simple steps: \\* open a terminal emulator\nwindow to interact with the Shell (KDE's ``konsole`` or GNOME's\n``terminal``); \\* enter the command: ``pip install Density_Sampling``\n\nThis module has been tested on Fedora, OS X and Ubuntu and should work\nfine on any other member of the Unix-like family of operating systems.\n\nUsage\n-----\n\nMore information on the inner workings of density sampling can be gained\nfrom the docstrings associated to the functions making up this module.\n\nThe following few lines illustrate density sampling on the Iris data-set\nfrom the UCI Machine Learning repository. Instead of copying those lines\ninto a Python interpreter console, a similar exemple can be run\nautomatically via Python's doctest's functionality. Simply locate the\ndirectory holding the file ``Density_Sampling.py``, change it to your\ncurrent working directory and then type ``python Density_Sampling.py``\nat the command line.\n\n::\n\n >>> from sklearn import datasets\n\n >>> iris = datasets.load_iris()\n >>> Y = iris.target\n\n >>> from sklearn.decomposition import PCA\n\n >>> X_reduced = PCA(n_components = 3).fit_transform(iris.data)\n\n >>> import matplotlib.pyplot as plt\n >>> from mpl_toolkits.mplot3d import Axes3D\n >>> from time import sleep\n\n >>> def plot_PCA(X_reduced, Y, title):\n fig = plt.figure(1, figsize = (10, 8))\n ax = Axes3D(fig, elev = -150, azim = 110)\n \n ax.scatter(X_reduced[:, 0], X_reduced[:, 1], X_reduced[:, 2], \n c = Y, cmap = plt.cm.Paired)\n \n ax.set_title('First three PCA direction for {title}'.format(**locals()))\n ax.set_xlabel('1st eigenvector')\n ax.w_xaxis.set_ticklabels([])\n ax.set_ylabel('2nd eigenvector')\n ax.w_yaxis.set_ticklabels([])\n ax.set_zlabel('3rd eigenvector')\n ax.w_zaxis.set_ticklabels([])\n \n plt.show(block = False)\n sleep(3)\n plt.close()\n \n >>> plot_PCA(X_reduced, Y, 'the whole Iris data-set')\n\n >>> import Density_Sampling\n >>> sampled_indices = Density_Sampling.density_sampling(X_reduced, \n metric = 'euclidean', desired_samples = 50)\n \n >>> downsampled_X_reduced = X_reduced[sampled_indices, :]\n >>> downsampled_Y = Y[sampled_indices]\n\n >>> plot_PCA(downsampled_X_reduced, downsampled_Y, \n 'the Iris data-set\\ndown-sampled to about 50 samples')\n\nReference\n---------\n\nGiecold, G., Marco, E., Trippa, L. and Yuan, G.-C., \"Robust Lineage\nReconstruction from High-Dimensional Single-Cell Data\". ArXiv preprint\n[q-bio.QM, stat.AP, stat.CO, stat.ML]: http://arxiv.org/abs/1601.02748", "description_content_type": null, "docs_url": null, "download_url": "https://github.com/GGiecold/Density_Sampling", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/GGiecold/Density_Sampling", "keywords": "data-analysis data-mining downsampling sampling statistics subsampling", "license": "MIT License", "maintainer": null, "maintainer_email": null, "name": "Density_Sampling", "package_url": "https://pypi.org/project/Density_Sampling/", "platform": "Any", "project_url": "https://pypi.org/project/Density_Sampling/", "project_urls": { "Download": "https://github.com/GGiecold/Density_Sampling", "Homepage": "https://github.com/GGiecold/Density_Sampling" }, "release_url": "https://pypi.org/project/Density_Sampling/1.3/", "requires_dist": null, "requires_python": null, "summary": "For a dataset comprising a mixture of rare and common populations, density sampling gives equal weights to the representatives of those distinct populations.", "version": "1.3" }, "last_serial": 2328625, "releases": { "1.0": [ { "comment_text": "", "digests": { "md5": "5e913ef65db4eb520195237a2c1ed544", "sha256": "f2663618cc0f6d0c8923759917b2282829a51b06623b019222eaa3e69b3f2613" }, "downloads": -1, "filename": "Density_Sampling-1.0.tar.gz", "has_sig": false, "md5_digest": "5e913ef65db4eb520195237a2c1ed544", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 6741, "upload_time": "2015-11-23T19:18:02", "url": "https://files.pythonhosted.org/packages/7a/74/201a2c68e06b4529846beaede1a46916f9a6d976c7b8f213e9debbcb4073/Density_Sampling-1.0.tar.gz" } ], "1.1": [ { "comment_text": "", "digests": { "md5": "8de1018f9ae7db2861e55aaad5483b76", "sha256": "faf920b16427890cb5a2a0e405c7c74423239da1def2d58c8d6bb71a92b14407" }, "downloads": -1, "filename": "Density_Sampling-1.1.tar.gz", "has_sig": false, "md5_digest": "8de1018f9ae7db2861e55aaad5483b76", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 6801, "upload_time": "2015-12-07T16:27:58", "url": "https://files.pythonhosted.org/packages/4f/26/b48154cb8c9e916e4d5a331c30f14c094e41cbe7c77e80fbc78b6465f372/Density_Sampling-1.1.tar.gz" } ], "1.2": [], "1.3": [ { "comment_text": "", "digests": { "md5": "0e94ab975f4f1ad9f6436281d1cc2c51", "sha256": "fe06d40cc5ad333648194d16ef6ed44f32dc44c4d3dc5a93a9bda458b7e0a5e8" }, "downloads": -1, "filename": "Density_Sampling-1.3.tar.gz", "has_sig": false, "md5_digest": "0e94ab975f4f1ad9f6436281d1cc2c51", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 496888, "upload_time": "2016-09-07T03:59:20", "url": "https://files.pythonhosted.org/packages/c3/2a/3f9ff16a7081212b4844c62e24d24a0c6f28d14c3dbc76957994693da778/Density_Sampling-1.3.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "0e94ab975f4f1ad9f6436281d1cc2c51", "sha256": "fe06d40cc5ad333648194d16ef6ed44f32dc44c4d3dc5a93a9bda458b7e0a5e8" }, "downloads": -1, "filename": "Density_Sampling-1.3.tar.gz", "has_sig": false, "md5_digest": "0e94ab975f4f1ad9f6436281d1cc2c51", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 496888, "upload_time": "2016-09-07T03:59:20", "url": "https://files.pythonhosted.org/packages/c3/2a/3f9ff16a7081212b4844c62e24d24a0c6f28d14c3dbc76957994693da778/Density_Sampling-1.3.tar.gz" } ] }