{ "info": { "author": "Yue Zhang", "author_email": "yjzhang@cs.washington.edu", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "License :: OSI Approved :: MIT License", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3.5", "Topic :: Scientific/Engineering :: Bio-Informatics" ], "description": "UNCURL\n======\n\n.. image:: https://travis-ci.org/yjzhang/uncurl_python.svg\n :target: https://travis-ci.org/yjzhang/uncurl_python\n.. image:: https://img.shields.io/pypi/v/uncurl-seq.svg\n :target: https://pypi.python.org/pypi/uncurl-seq\n.. image:: https://pepy.tech/badge/uncurl-seq\n :target: https://pepy.tech/project/uncurl-seq\n\n.. contents::\n\nOverview\n--------\n\nUncurl is a python package for analyzing single-cell RNA-seq data.\n\n\nInstallation\n------------\n\nUncurl can be installed from PyPI: ``pip install uncurl-seq``.\n\nAlternatively, uncurl can be installed from source: After cloning the repository, first run ``pip install -r requirements.txt`` to install the required libraries. Then, run ``pip install .``\n\nRequirements: numpy, scipy, cython, scikit-learn\n\nTested on python 2.7, 3.5 on Linux.\n\nFor parallel state estimation, OpenMP is required.\n\nTo run tests: ``python setup.py test``\n\nAfter the python package is installed, uncurl can be used from R using ``reticulate``. See `Using UNCURL in R <#using-uncurl-in-r>`_\n\n\nExamples\n--------\n\nSee the ``examples`` folder for example scripts, and the ``notebooks`` folder for Jupyter notebooks.\n\nFor a detailed tutorial, see ``Tutorial.ipynb`` in the ``notebooks`` folder.\n\n`Full documentation `_\n\n\nPublications\n------------\n\nPresented at ISMB 2018.\n\nMukherjee, S., Zhang, Y., Fan, J., Seelig, G. & Kannan, S. Scalable preprocessing for sparse scRNA-seq data exploiting prior knowledge. 
Bioinformatics 34, i124\u2013i132 (2018).\n\n`https://academic.oup.com/bioinformatics/article/34/13/i124/5045758 `_\n\n\nFeatures\n========\n\nState Estimation\n----------------\n\nThe simplest way to use state estimation is to use the ``run_state_estimation`` function, which can be used to call any of the state estimation functions for different distributions. The possible distributions are 'Poiss', 'LogNorm', 'Gaussian', 'NB' (negative binomial), or 'ZIP' (zero-inflated Poisson). Generally, 'Poiss' is recommended for sparse or count-valued datasets. Currently the NB and ZIP options are unsupported.\n\nBefore running state estimation, it is often a good idea to subset the number of genes. This can be done using the function ``max_variance_genes``, which bins the genes by mean expression, and selects a top fraction of genes by variance from each bin. It also removes genes that have all zero expression counts.\n\nExample:\n\n.. code-block:: python\n\n import numpy as np\n import scipy.io\n from uncurl import max_variance_genes, run_state_estimation\n\n data = np.loadtxt('counts.txt')\n\n # sparse data (matrix market format)\n data_sparse = scipy.io.mmread('matrix.mtx')\n\n # max variance genes, default parameters \n genes = max_variance_genes(data_sparse, nbins=5, frac=0.2)\n data_subset = data_sparse[genes,:]\n\n M, W, ll = run_state_estimation(data_subset, clusters=4, dist='Poiss', disp=False, max_iters=30, inner_max_iters=100, initialization='tsvd', threads=8)\n\n M2, W2, cost = run_state_estimation(data_subset, clusters=4, dist='LogNorm')\n\nDetails\n^^^^^^^\n\n``run_state_estimation`` is actually a wrapper around several other functions for state estimation.\n\nThe ``poisson_estimate_state`` function is used to estimate cell types using the Poisson Convex Mixture Model. It can take in dense or sparse matrices of reals or integers as input, and can be accelerated by parallelization. The input is of shape (genes, cells). 
It has three outputs: two matrices ``M`` and ``W``, and ``ll``, the negative log-likelihood. M is a (genes, clusters) matrix, and W is a (clusters, cells) matrix where each column sums to 1. The outputs ``W`` and ``M*W`` can be used for further visualization or dimensionality reduction, as described later.\n\nThere are a number of different initialization methods and options for ``poisson_estimate_state``. By default, it is initialized using truncated SVD + K-means, but it can also be initialized using ``poisson_cluster`` or just K-means.\n\nExample:\n\n.. code-block:: python\n\n from uncurl import max_variance_genes, poisson_cluster, poisson_estimate_state\n\n # poisson state estimation\n M, W, ll = poisson_estimate_state(data_subset, 2)\n\n # labels in 0...k-1\n labels = W.argmax(0)\n\n # optional arguments\n M, W, ll = poisson_estimate_state(data_subset, clusters=2, disp=False, max_iters=30, inner_max_iters=150, initialization='tsvd', threads=8)\n\n # initialization by providing means and weights\n assignments_p, centers = poisson_cluster(data_subset, 2)\n M, W, ll = poisson_estimate_state(data_subset, 2, init_means=centers, init_weights=assignments_p)\n\nThe ``log_norm_nmf`` function is a wrapper around scikit-learn's NMF class that performs a log-transform and per-cell count normalization before running NMF. It returns two matrices, W and H, which correspond to the M and W returned by ``poisson_estimate_state``. It can also take sparse matrix inputs.\n\nExample:\n\n.. code-block:: python\n\n from uncurl import log_norm_nmf\n\n W, H = log_norm_nmf(data_subset, k=2)\n\n\nDistribution Selection\n----------------------\n\nThe ``DistFitDataset`` function is used to determine the distribution of each gene in a dataset by calculating the fit error for the Poisson, Normal, and Log-Normal distributions. It currently only works for dense matrices. For large datasets, we recommend taking a small random subset of fewer than 1000 cells.\n\nExample:\n\n.. 
code-block:: python\n\n import numpy as np\n from uncurl import DistFitDataset\n\n data = np.loadtxt('counts.txt')\n\n fit_errors = DistFitDataset(data)\n\n poiss_fit_errors = fit_errors['poiss']\n norm_fit_errors = fit_errors['norm']\n lognorm_fit_errors = fit_errors['lognorm']\n\n\nThe output, ``fit_errors``, contains the fit error for each gene, for each of the three distributions when fitted to the data using maximum likelihood.\n\n\nQualitative to Quantitative Framework\n-------------------------------------\n\nThe ``qualNorm`` function is used to convert binary (or otherwise) data with shape (genes, types) into starting points for clustering and state estimation.\n\nExample:\n\n.. code-block:: python\n\n from uncurl import poisson_cluster, qualNorm\n import numpy as np\n\n data = np.loadtxt('counts.txt')\n bin_data = np.loadtxt('binary.txt')\n starting_centers = qualNorm(data, bin_data)\n assignments, centers = poisson_cluster(data, 2, init=starting_centers)\n\n\nClustering\n----------\n\nThe ``poisson_cluster`` function does Poisson clustering with hard assignments. It takes an array of features by examples and the number of clusters, and returns two arrays: an array of cluster assignments and an array of cluster centers.\n\n\nExample:\n\n.. code-block:: python\n\n from uncurl import poisson_cluster\n import numpy as np\n\n # data is a 2d array of floats, with dimensions genes x cells\n data = np.loadtxt('counts.txt')\n assignments_p, centers = poisson_cluster(data, 2)\n\nImputation\n----------\n\nImputation is done by multiplying the estimated matrices M and W, which yields a new matrix with the same dimensions as the original.\n\nFor an example using UNCURL for imputation, see `this notebook `_.\n\n\nDimensionality Reduction\n------------------------\n\nWe recommend using standard dimensionality reduction techniques such as t-SNE and PCA. They can be run on either W or ``MW = M.dot(W)``. 
When running t-SNE on MW, we suggest taking the log and then doing a PCA or truncated SVD, as you would do for the original input data. This is the basis for the UNCURL + tSNE results in our paper. When using t-SNE on W, we suggest using a symmetric relative entropy metric, which is available as ``uncurl.sparse_utils.symmetric_kld`` (this can be passed in to scikit-learn's t-SNE implementation). Cosine distance has also worked better than Euclidean distance on W.\n\nAlternatively, we provide an MDS-based dimensionality reduction method that takes advantage of the convex mixture model. It is generally less accurate than t-SNE, but much faster. See `docs for unsupported methods `_.\n\n\nLineage Estimation & Pseudotime\n-------------------------------\n\nThe output MW of UNCURL can be used as input for other lineage estimation tools.\n\nWe also have implemented our own lineage estimation tools but have not thoroughly validated them. See `docs for unsupported methods `_.\n\nUsing UNCURL in R\n-----------------\n\nUNCURL has been tested in R using the ``reticulate`` library. There first has to be a python installation that has uncurl installed. Example:\n\n.. 
code-block:: R\n\n # https://bioconductor.org/packages/release/bioc/html/SingleCellExperiment.html\n library(SingleCellExperiment)\n\n # https://rstudio.github.io/reticulate/\n library(reticulate)\n\n # The 'import' function is provided by reticulate, and allows python libraries to be imported in R.\n uncurl <- import(\"uncurl\")\n\n # Say that 'sce' is a SingleCellExperiment object.\n # See https://bioconductor.org/packages/release/bioc/vignettes/SingleCellExperiment/inst/doc/intro.html\n # for an example.\n\n data <- counts(sce)\n k = 10\n results <- uncurl$run_state_estimation(data, k)\n\n # m and w are matrices of numeric values.\n # m is of shape (genes, k), and w is of shape (k, cells).\n m <- results[[1]]\n w <- results[[2]]\n\n # This gets the cluster labels using argmax.\n cluster_labels <- apply(w, 2, which.max)\n\n\nMiscellaneous\n-------------\n\nUnsupported methods included in the package: https://yjzhang.github.io/uncurl_python/unsupported_methods.html\n\nMiscellaneous uncurl parameters (non-default parameters and things we tried): https://yjzhang.github.io/uncurl_python/things_we_tried.html\n\n\nIncluded datasets\n-----------------\n\nReal datasets:\n\n10x_pooled_400.mat: 50 cells each from 8 cell types: CD19+ B cells, CD14+ monocytes, CD34+, CD56+ NK, CD4+/CD45RO+ memory T, CD8+/CD45RA+ naive cytotoxic, CD4+/CD45RA+/CD25- naive T, and CD4+/CD25 regulatory T. Source: `10x Genomics `_.\n\nGSE60361_dat.mat: subset of data from `Zeisel et al. 2015 `_.\n\nSCDE_test.mat: data from `Islam et al. 
2011 `_.\n\nSynthetic datasets:\n\nBranchedSynDat.mat: simulated lineage dataset with 3 branches\n\nSynMouseESprog_1000.mat: simulated lineage dataset showing linear differentiation", "description_content_type": "text/x-rst", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/yjzhang/uncurl_python", "keywords": "", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "uncurl-seq", "package_url": "https://pypi.org/project/uncurl-seq/", "platform": "", "project_url": "https://pypi.org/project/uncurl-seq/", "project_urls": { "Homepage": "https://github.com/yjzhang/uncurl_python" }, "release_url": "https://pypi.org/project/uncurl-seq/0.2.14/", "requires_dist": null, "requires_python": "", "summary": "Tool for pre-processing single-cell RNASeq data", "version": "0.2.14" }, "last_serial": 5041097, "releases": { "0.2.10": [ { "comment_text": "", "digests": { "md5": "b0a0be8156866a1d351c38fc5837181d", "sha256": "b50e79e269522d5e577d2fda8244cd0375d3c862b9a226392c2b636b869e3259" }, "downloads": -1, "filename": "uncurl_seq-0.2.10.tar.gz", "has_sig": false, "md5_digest": "b0a0be8156866a1d351c38fc5837181d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8631069, "upload_time": "2018-05-04T04:46:50", "url": "https://files.pythonhosted.org/packages/77/e8/76e436e19e5e5d33d74eae36c0b92d72a4e8003b24b8db144ceacc1ef47a/uncurl_seq-0.2.10.tar.gz" } ], "0.2.11": [ { "comment_text": "", "digests": { "md5": "f1eff4c9f7bc4272ca29e07df7f044fb", "sha256": "200145e6fa38ec18d7c1e934bc0ce2a6dd2295240142c5bb441d036f9176272f" }, "downloads": -1, "filename": "uncurl_seq-0.2.11.tar.gz", "has_sig": false, "md5_digest": "f1eff4c9f7bc4272ca29e07df7f044fb", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8509487, "upload_time": "2018-10-11T01:12:31", "url": 
"https://files.pythonhosted.org/packages/21/f6/5873f9225665883919b7c2d73ddde8f275cbf8f12256ddaae431d71aaf65/uncurl_seq-0.2.11.tar.gz" } ], "0.2.12": [ { "comment_text": "", "digests": { "md5": "5e281cc9af9721bf2881dd22784eaaf6", "sha256": "0684a686e5219254a4feaad0faf5ac5944cf9ecea258bc78ca1abdca6c28fa65" }, "downloads": -1, "filename": "uncurl_seq-0.2.12.tar.gz", "has_sig": false, "md5_digest": "5e281cc9af9721bf2881dd22784eaaf6", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8510417, "upload_time": "2019-02-15T00:11:39", "url": "https://files.pythonhosted.org/packages/76/d7/054955738a25ac9951e0ebe40ba13941770223f1ec82cb8c14ae1672aebc/uncurl_seq-0.2.12.tar.gz" } ], "0.2.13": [ { "comment_text": "", "digests": { "md5": "099e113882646cf7857fd4dbee2d0837", "sha256": "caae217facecb211ac095bd341538e3024e3ed2e87776bfc9e0665521e60b60b" }, "downloads": -1, "filename": "uncurl_seq-0.2.13.tar.gz", "has_sig": false, "md5_digest": "099e113882646cf7857fd4dbee2d0837", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8513178, "upload_time": "2019-03-16T02:43:28", "url": "https://files.pythonhosted.org/packages/91/73/4c0ba0f2f0670faa7fce9fade4bd96f7351653cd39fa3cf554ecfdc6aa7c/uncurl_seq-0.2.13.tar.gz" } ], "0.2.14": [ { "comment_text": "", "digests": { "md5": "9e2bee30c12cdc3f43bec36177bb15a8", "sha256": "a2f5e38cba3988d1fdd6a86cb50c7ff5830fbc65860a938d7b74240a2f22ff24" }, "downloads": -1, "filename": "uncurl_seq-0.2.14.tar.gz", "has_sig": false, "md5_digest": "9e2bee30c12cdc3f43bec36177bb15a8", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8499537, "upload_time": "2019-04-01T21:30:08", "url": "https://files.pythonhosted.org/packages/a7/90/28be074dc0dcce46484d18bdece31cde09d587b625541e352b0f018318f5/uncurl_seq-0.2.14.tar.gz" } ], "0.2.4": [ { "comment_text": "", "digests": { "md5": "79f3ef73c74966709e4a1f83e55e4443", "sha256": 
"c25a0c3cd22e8039155bfe0ead38981143ce46615301e2ee83f5f07a1c093c5e" }, "downloads": -1, "filename": "uncurl_seq-0.2.5.tar.gz", "has_sig": false, "md5_digest": "79f3ef73c74966709e4a1f83e55e4443", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8365293, "upload_time": "2018-02-06T04:00:29", "url": "https://files.pythonhosted.org/packages/92/42/f7f315865275eb67866ae2699c45b5c54d648727539449cb977628a21768/uncurl_seq-0.2.5.tar.gz" } ], "0.2.6": [ { "comment_text": "", "digests": { "md5": "71f6401d15549400dec486d5fab2430f", "sha256": "2386c8c2cd29ac2366e1fc24379680905689bc19e48923f69b65bd02f4314045" }, "downloads": -1, "filename": "uncurl_seq-0.2.6.tar.gz", "has_sig": false, "md5_digest": "71f6401d15549400dec486d5fab2430f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8604800, "upload_time": "2018-02-08T00:23:48", "url": "https://files.pythonhosted.org/packages/4f/fd/deb50d3118ceeaffd562709e6ac15a6cdad9f7a59c26ca53065aa8e73f4e/uncurl_seq-0.2.6.tar.gz" } ], "0.2.7": [ { "comment_text": "", "digests": { "md5": "8e61467e804613d14c7fc602c8098d7b", "sha256": "64773cb2f2ac84721c469173542940758e9cfe0dc7f161f8ac76449596d89686" }, "downloads": -1, "filename": "uncurl_seq-0.2.7.tar.gz", "has_sig": false, "md5_digest": "8e61467e804613d14c7fc602c8098d7b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8623920, "upload_time": "2018-03-24T00:07:16", "url": "https://files.pythonhosted.org/packages/81/d4/734f958e834b3f7ea0b8e8fdfa430261313e08b500e7eae8313f7cd460b5/uncurl_seq-0.2.7.tar.gz" } ], "0.2.8": [ { "comment_text": "", "digests": { "md5": "2708b4ee30ed8e5fee66f875e6607339", "sha256": "4fb43931cfe59eae73dc44922489d557949a81a152f112ffd116d1894c0ae44d" }, "downloads": -1, "filename": "uncurl_seq-0.2.8.tar.gz", "has_sig": false, "md5_digest": "2708b4ee30ed8e5fee66f875e6607339", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8624442, 
"upload_time": "2018-05-03T03:51:02", "url": "https://files.pythonhosted.org/packages/54/e5/9826701aacd32c6e9d6d86f9b9162694790ce71fe34bb602f304b5d839fd/uncurl_seq-0.2.8.tar.gz" } ], "0.2.9": [ { "comment_text": "", "digests": { "md5": "13deab22e4c40b5d8fdecdafd986ec70", "sha256": "f0e9a984385e09946ddf59b350bc22ff338a98191baf8cbf29d29b8ab5e72ee9" }, "downloads": -1, "filename": "uncurl_seq-0.2.9.tar.gz", "has_sig": false, "md5_digest": "13deab22e4c40b5d8fdecdafd986ec70", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8630848, "upload_time": "2018-05-03T04:25:38", "url": "https://files.pythonhosted.org/packages/a3/5a/d57ae6cd86ab037632cd97c4299ac0165bd8f84af25130d3461a229ebf3a/uncurl_seq-0.2.9.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "9e2bee30c12cdc3f43bec36177bb15a8", "sha256": "a2f5e38cba3988d1fdd6a86cb50c7ff5830fbc65860a938d7b74240a2f22ff24" }, "downloads": -1, "filename": "uncurl_seq-0.2.14.tar.gz", "has_sig": false, "md5_digest": "9e2bee30c12cdc3f43bec36177bb15a8", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8499537, "upload_time": "2019-04-01T21:30:08", "url": "https://files.pythonhosted.org/packages/a7/90/28be074dc0dcce46484d18bdece31cde09d587b625541e352b0f018318f5/uncurl_seq-0.2.14.tar.gz" } ] }