{ "info": { "author": "", "author_email": "", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Developers", "Intended Audience :: Science/Research", "License :: OSI Approved", "Operating System :: MacOS", "Operating System :: Microsoft :: Windows", "Operating System :: POSIX", "Operating System :: Unix", "Programming Language :: C", "Programming Language :: Python", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3.6", "Topic :: Scientific/Engineering", "Topic :: Software Development" ], "description": ".. image:: https://img.shields.io/pypi/v/umap-learn.svg\n :target: https://pypi.python.org/pypi/umap-learn/\n :alt: PyPI Version\n.. image:: https://anaconda.org/conda-forge/umap-learn/badges/version.svg\n :target: https://anaconda.org/conda-forge/umap-learn\n :alt: Conda-forge Version\n.. image:: https://anaconda.org/conda-forge/umap-learn/badges/downloads.svg\n :target: https://anaconda.org/conda-forge/umap-learn\n :alt: Downloads from conda-forge\n.. image:: https://img.shields.io/pypi/l/umap-learn.svg\n :target: https://github.com/lmcinnes/umap/blob/master/LICENSE.txt\n :alt: License\n.. image:: https://travis-ci.org/lmcinnes/umap.svg\n :target: https://travis-ci.org/lmcinnes/umap\n :alt: Travis Build Status\n.. image:: https://coveralls.io/repos/github/lmcinnes/umap/badge.svg\n :target: https://coveralls.io/github/lmcinnes/umap\n :alt: Test Coverage Status\n.. image:: https://readthedocs.org/projects/umap-learn/badge/?version=latest\n :target: https://umap-learn.readthedocs.io/en/latest/?badge=latest\n :alt: Documentation Status\n.. image:: http://joss.theoj.org/papers/10.21105/joss.00861/status.svg\n :target: https://doi.org/10.21105/joss.00861\n :alt: JOSS article for this repository\n\n====\nUMAP\n====\n\nUniform Manifold Approximation and Projection (UMAP) is a dimension reduction\ntechnique that can be used for visualisation similarly to t-SNE, but also for\ngeneral non-linear dimension reduction. The algorithm is founded on three\nassumptions about the data\n\n1. The data is uniformly distributed on a Riemannian manifold;\n2. The Riemannian metric is locally constant (or can be approximated as such);\n3. The manifold is locally connected.\n\nFrom these assumptions it is possible to model the manifold with a fuzzy\ntopological structure. The embedding is found by searching for a low dimensional\nprojection of the data that has the closest possible equivalent fuzzy\ntopological structure.\n\nThe details for the underlying mathematics can be found in\n`our paper on ArXiv `_:\n\nMcInnes, L, Healy, J, *UMAP: Uniform Manifold Approximation and Projection\nfor Dimension Reduction*, ArXiv e-prints 1802.03426, 2018\n\nThe important thing is that you don't need to worry about that -- you can use\nUMAP right now for dimension reduction and visualisation as easily as a drop\nin replacement for scikit-learn's t-SNE.\n\nDocumentation is `available via ReadTheDocs `_.\n\n---------------\nHow to use UMAP\n---------------\n\nThe umap package inherits from sklearn classes, and thus drops in neatly\nnext to other sklearn transformers with an identical calling API.\n\n.. code:: python\n\n import umap\n from sklearn.datasets import load_digits\n\n digits = load_digits()\n\n embedding = umap.UMAP().fit_transform(digits.data)\n\nThere are a number of parameters that can be set for the UMAP class; the\nmajor ones are as follows:\n\n - ``n_neighbors``: This determines the number of neighboring points used in\n local approximations of manifold structure. Larger values will result in\n more global structure being preserved at the loss of detailed local\n structure. In general this parameter should often be in the range 5 to\n 50, with a choice of 10 to 15 being a sensible default.\n\n - ``min_dist``: This controls how tightly the embedding is allowed compress\n points together. Larger values ensure embedded points are more evenly\n distributed, while smaller values allow the algorithm to optimise more\n accurately with regard to local structure. Sensible values are in the\n range 0.001 to 0.5, with 0.1 being a reasonable default.\n\n - ``metric``: This determines the choice of metric used to measure distance\n in the input space. A wide variety of metrics are already coded, and a user\n defined function can be passed as long as it has been JITd by numba.\n\nAn example of making use of these options:\n\n.. code:: python\n\n import umap\n from sklearn.datasets import load_digits\n\n digits = load_digits()\n\n embedding = umap.UMAP(n_neighbors=5,\n min_dist=0.3,\n metric='correlation').fit_transform(digits.data)\n\nUMAP also supports fitting to sparse matrix data. For more details\nplease see `the UMAP documentation `_\n\n----------------\nBenefits of UMAP\n----------------\n\nUMAP has a few signficant wins in its current incarnation.\n\nFirst of all UMAP is *fast*. It can handle large datasets and high\ndimensional data without too much difficulty, scaling beyond what most t-SNE\npackages can manage.\n\nSecond, UMAP scales well in embedding dimension -- it isn't just for\nvisualisation! You can use UMAP as a general purpose dimension reduction\ntechnique as a preliminary step to other machine learning tasks. With a\nlittle care (documentation on how to be careful is coming) it partners well\nwith the `hdbscan `_\nclustering library.\n\nThird, UMAP often performs better at preserving aspects of global structure of\nthe data than t-SNE. This means that it can often provide a better \"big\npicture\" view of your data as well as preserving local neighbor relations.\n\nFourth, UMAP supports a wide variety of distance functions, including\nnon-metric distance functions such as *cosine distance* and *correlation\ndistance*. You can finally embed word vectors properly using cosine distance!\n\nFifth, UMAP supports adding new points to an existing embedding via\nthe standard sklearn ``transform`` method. This means that UMAP can be\nused as a preprocessing transformer in sklearn pipelines.\n\nSixth, UMAP supports supervised and semi-supervised dimension reduction.\nThis means that if you have label information that you wish to use as\nextra information for dimension reduction (even if it is just partial\nlabelling) you can do that -- as simply as providing it as the ``y``\nparameter in the fit method.\n\nFinally UMAP has solid theoretical foundations in manifold learning\n(see `our paper on ArXiv `_).\nThis both justifies the approach and allows for further\nextensions that will soon be added to the library\n(embedding dataframes etc.).\n\n------------------------\nPerformance and Examples\n------------------------\n\nUMAP is very efficient at embedding large high dimensional datasets. In\nparticular it scales well with both input dimension and embedding dimension.\nThus, for a problem such as the 784-dimensional MNIST digits dataset with\n70000 data samples, UMAP can complete the embedding in around 2.5 minutes (as\ncompared with around 45 minutes for most t-SNE implementations). Despite this\nruntime efficiency UMAP still produces high quality embeddings.\n\nThe obligatory MNIST digits dataset, embedded in 2 minutes and 22\nseconds using a 3.1 GHz Intel Core i7 processor (n_neighbors=10, min_dist=0\n.001):\n\n.. image:: images/umap_example_mnist1.png\n :alt: UMAP embedding of MNIST digits\n\nThe MNIST digits dataset is fairly straightforward however. A better test is\nthe more recent \"Fashion MNIST\" dataset of images of fashion items (again\n70000 data sample in 784 dimensions). UMAP\nproduced this embedding in 2 minutes exactly (n_neighbors=5, min_dist=0.1):\n\n.. image:: images/umap_example_fashion_mnist1.png\n :alt: UMAP embedding of \"Fashion MNIST\"\n\nThe UCI shuttle dataset (43500 sample in 8 dimensions) embeds well under\n*correlation* distance in 2 minutes and 39 seconds (note the longer time\nrequired for correlation distance computations):\n\n.. image:: images/umap_example_shuttle.png\n :alt: UMAP embedding the UCI Shuttle dataset\n\n----------\nInstalling\n----------\n\nUMAP depends upon ``scikit-learn``, and thus ``scikit-learn``'s dependencies\nsuch as ``numpy`` and ``scipy``. UMAP adds a requirement for ``numba`` for\nperformance reasons. The original version used Cython, but the improved code\nclarity, simplicity and performance of Numba made the transition necessary.\n\nRequirements:\n\n* numpy\n* scipy\n* scikit-learn\n* numba\n\n**Install Options**\n\nConda install, via the excellent work of the conda-forge team:\n\n.. code:: bash\n\n conda install -c conda-forge umap-learn\n\nThe conda-forge packages are available for linux, OS X, and Windows 64 bit.\n\nPyPI install, presuming you have numba and sklearn and all its requirements\n(numpy and scipy) installed:\n\n.. code:: bash\n\n pip install umap-learn\n\nIf pip is having difficulties pulling the dependencies then we'd suggest installing\nthe dependencies manually using anaconda followed by pulling umap from pip:\n\n.. code:: bash\n\n conda install numpy scipy\n conda install scikit-learn\n conda install numba\n pip install umap-learn\n\nFor a manual install get this package:\n\n.. code:: bash\n\n wget https://github.com/lmcinnes/umap/archive/master.zip\n unzip master.zip\n rm master.zip\n cd umap-master\n\nInstall the requirements\n\n.. code:: bash\n\n sudo pip install -r requirements.txt\n\nor\n\n.. code:: bash\n\n conda install scikit-learn numba\n\nInstall the package\n\n.. code:: bash\n\n python setup.py install\n\n----------------\nHelp and Support\n----------------\n\nDocumentation is at `ReadTheDocs `_.\nThe documentation `includes a FAQ `_ that\nmay answer your questions. If you still have questions then please\n`open an issue `_\nand I will try to provide any help and guidance that I can.\n\n--------\nCitation\n--------\n\nIf you make use of this software for your work we would appreciate it if you\nwould cite the paper from the Journal of Open Source Software:\n\n.. code:: bibtex\n\n @article{mcinnes2018umap-software,\n title={UMAP: Uniform Manifold Approximation and Projection},\n author={McInnes, Leland and Healy, John and Saul, Nathaniel and Grossberger, Lukas},\n journal={The Journal of Open Source Software},\n volume={3},\n number={29},\n pages={861},\n year={2018}\n }\n\nIf you would like to cite this algorithm in your work the ArXiv paper is the\ncurrent reference:\n\n.. code:: bibtex\n\n @article{2018arXivUMAP,\n author = {{McInnes}, L. and {Healy}, J. and {Melville}, J.},\n title = \"{UMAP: Uniform Manifold Approximation\n and Projection for Dimension Reduction}\",\n journal = {ArXiv e-prints},\n archivePrefix = \"arXiv\",\n eprint = {1802.03426},\n primaryClass = \"stat.ML\",\n keywords = {Statistics - Machine Learning,\n Computer Science - Computational Geometry,\n Computer Science - Learning},\n year = 2018,\n month = feb,\n }\n\n-------\nLicense\n-------\n\nThe umap package is 3-clause BSD licensed.\n\nWe would like to note that the umap package makes heavy use of\nNumFOCUS sponsored projects, and would not be possible without\ntheir support of those projects, so please `consider contributing to NumFOCUS `_.\n\n------------\nContributing\n------------\n\nContributions are more than welcome! There are lots of opportunities\nfor potential projects, so please get in touch if you would like to\nhelp out. Everything from code to notebooks to\nexamples and documentation are all *equally valuable* so please don't feel\nyou can't contribute. To contribute please\n`fork the project `_\nmake your changes and\nsubmit a pull request. We will do our best to work through any issues with\nyou and get your code merged into the main branch.\n\n\n\n\n", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/lilab-bcb/umap", "keywords": "dimension reduction t-sne manifold", "license": "BSD", "maintainer": "Bo Li", "maintainer_email": "bli28@mgh.harvard.edu", "name": "umap-learn-modified", "package_url": "https://pypi.org/project/umap-learn-modified/", "platform": "", "project_url": "https://pypi.org/project/umap-learn-modified/", "project_urls": { "Homepage": "https://github.com/lilab-bcb/umap" }, "release_url": "https://pypi.org/project/umap-learn-modified/0.3.8/", "requires_dist": [ "numpy (>=1.13)", "scikit-learn (>=0.16)", "scipy (>=0.19)", "numba (>=0.37)" ], "requires_python": "", "summary": "Forked from umap-learn (https://github.com/lmcinnes/umap). Change API so that UMAP accepts precomputed KNN input, rather than always calculating by itself. Remove numba.njit parallel option to avoid delay in the cloud.", "version": "0.3.8" }, "last_serial": 5601484, "releases": { "0.3.8": [ { "comment_text": "", "digests": { "md5": "33184263b7c04f6849a6f017a09b06c0", "sha256": "9e6434525eeaaf5cf322081be665a816a507cbc12695472f55083ad17de2eb8c" }, "downloads": -1, "filename": "umap_learn_modified-0.3.8-py3-none-any.whl", "has_sig": false, "md5_digest": "33184263b7c04f6849a6f017a09b06c0", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 39777, "upload_time": "2019-07-29T19:36:22", "url": "https://files.pythonhosted.org/packages/e3/17/36f2e1a9306a301f740d1a579df91b9af04a0d25b822be7146540f0a8e84/umap_learn_modified-0.3.8-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "c2a0582a0253fe79028d3d9fc71553c8", "sha256": "27b7a08b07a7f639ea37109cd52034d002b7875109420482b9ec8769bb7e5886" }, "downloads": -1, "filename": "umap-learn-modified-0.3.8.tar.gz", "has_sig": false, "md5_digest": "c2a0582a0253fe79028d3d9fc71553c8", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 40636, "upload_time": "2019-07-29T19:36:24", "url": "https://files.pythonhosted.org/packages/53/fe/4b852e59303526c82602433426cd7a197ab595f100a1f29217b4432cafa3/umap-learn-modified-0.3.8.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "33184263b7c04f6849a6f017a09b06c0", "sha256": "9e6434525eeaaf5cf322081be665a816a507cbc12695472f55083ad17de2eb8c" }, "downloads": -1, "filename": "umap_learn_modified-0.3.8-py3-none-any.whl", "has_sig": false, "md5_digest": "33184263b7c04f6849a6f017a09b06c0", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 39777, "upload_time": "2019-07-29T19:36:22", "url": "https://files.pythonhosted.org/packages/e3/17/36f2e1a9306a301f740d1a579df91b9af04a0d25b822be7146540f0a8e84/umap_learn_modified-0.3.8-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "c2a0582a0253fe79028d3d9fc71553c8", "sha256": "27b7a08b07a7f639ea37109cd52034d002b7875109420482b9ec8769bb7e5886" }, "downloads": -1, "filename": "umap-learn-modified-0.3.8.tar.gz", "has_sig": false, "md5_digest": "c2a0582a0253fe79028d3d9fc71553c8", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 40636, "upload_time": "2019-07-29T19:36:24", "url": "https://files.pythonhosted.org/packages/53/fe/4b852e59303526c82602433426cd7a197ab595f100a1f29217b4432cafa3/umap-learn-modified-0.3.8.tar.gz" } ] }