{ "info": { "author": "Narges Razavian", "author_email": "nsr3@nyu.edu", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python :: 3" ], "description": "# csr_utils\n\nScalable Operations for CSR matrices.\n\n[![Build Status](https://travis-ci.org/narges-rzv/csr_utils.svg?branch=master)](https://travis-ci.org/narges-rzv/csr_utils)\n\nInstallation \n------------\nFor general users, including those using conda:\n\n`pip install csr_utils` \n\n\nWithout root access:\n\n`pip install --user csr_utils`\n\nUsage\n-----\n\n```\n>>> import numpy as np\n>>> from scipy.sparse import csr_matrix\n>>> xcsr = csr_matrix(np.array([[1, 0], [3, 4], [2, 2]], dtype=float))\n\n>>> import csr_utils\n>>> xnorm, xmean, xstd, xixnormed = csr_utils.normalize_csr_matrix(xcsr)\n\n```\n\n\nOverview\n--------\nThis package currently provides a single function: a fast, memory-efficient implementation for normalizing the nonzero values of a CSR matrix without converting it to a dense array. This is a useful step for machine learning on large matrices. Most algorithms work better with normalized input, in particular the commonly used [linear classification models in sklearn](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html). \n\nMore functions will be added as the need arises, including converting a CSR matrix directly into a CUDA sparse array. Stay tuned. \n\nnormalize_csr_matrix:\n---------------------\n- Normalizes a CSR matrix based only on its non-zero values, without converting it to a dense array. 
\n - In the CSR matrix, rows correspond to samples, and columns correspond to features.\n - After normalization, the non-zero values of each column (feature) have a mean of 0.0 and a standard deviation of 1.0.\n- For the non-zero entries, returns the scalable equivalent of `x = x.toarray(); x[(x==0)] = np.nan; (x - np.nanmean(x, axis=0)) / np.nanstd(x, axis=0)`\n- The standard deviation is computed with a faster, mathematically equivalent formula:\n - ```sigma = SquareRoot(ExpectedValue(|X - mean|^2)) # slow```\n - ```sigma = SquareRoot(ExpectedValue(X^2) - ExpectedValue(X)^2) # fast```\n - [For more info see the math](https://en.wikipedia.org/wiki/Standard_deviation#Definition_of_population_values)\n- This function makes the following assumptions:\n - If a column i has no observations, mean_array[i] will be set to 0.0, and std_array[i] will be set to 1.0.\n - If a column has a single observation, or if its standard deviation is 0.0, only the mean is subtracted for that column.\n\n- The function can normalize using pre-specified mean and standard deviation arrays, which is useful for normalizing test sets. \n\n- The function can also normalize only a given subset of features.\n\nExample\n-------\n```\n>>> import numpy as np\n>>> from scipy.sparse import csr_matrix\n>>> import csr_utils\n\n>>> x = csr_matrix(np.array([[1, 0, 0], [3, 0, 4], [2, 5, 2]], dtype=float))\n\n>>> print(x.toarray())\n[[ 1. 0. 0.]\n[ 3. 0. 4.]\n[ 2. 5. 2.]]\n\n>>> xnorm, xmean, xstd, xixnormed = csr_utils.normalize_csr_matrix(x)\n\n>>> print(xnorm.todense())\n[[-1.22474487 0. 0. ]\n[ 1.22474487 0. 1. ]\n[ 0. 0. -1. ]]\n\n>>> xmean\narray([2., 5., 3.])\n\n>>> xstd\narray([0.81649658, 1. , 1. 
]) \n\n>>> xixnormed\narray([0, 2])\n\n```\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/narges-rzv/csr_utils", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "csr-utils", "package_url": "https://pypi.org/project/csr-utils/", "platform": "", "project_url": "https://pypi.org/project/csr-utils/", "project_urls": { "Homepage": "https://github.com/narges-rzv/csr_utils" }, "release_url": "https://pypi.org/project/csr-utils/0.1.3/", "requires_dist": [ "scipy", "numpy" ], "requires_python": "", "summary": "Utility functions for scalable handling of CSR matrices", "version": "0.1.3" }, "last_serial": 4066897, "releases": { "0.1": [ { "comment_text": "", "digests": { "md5": "239558686be2ecf5934c704c5583b745", "sha256": "376cd89832b9315815804605a65a90452fc5183aa6d01e3c66dd0e80d1c4fb79" }, "downloads": -1, "filename": "csr_utils-0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "239558686be2ecf5934c704c5583b745", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 4575, "upload_time": "2018-07-09T03:45:44", "url": "https://files.pythonhosted.org/packages/bb/32/c74ab9267b3d0daf28303a300517c8ec4046b3f0b81c139fc8ce79a45c6c/csr_utils-0.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "5f7ff8595d92648b1ef9bbf99df9d4ff", "sha256": "77f960cf798029c2ff84dfb7bf014600c98a6db46f05ce5f1956e0dd2b813e81" }, "downloads": -1, "filename": "csr_utils-0.1.tar.gz", "has_sig": false, "md5_digest": "5f7ff8595d92648b1ef9bbf99df9d4ff", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3742, "upload_time": "2018-07-09T03:45:45", "url": "https://files.pythonhosted.org/packages/0b/1a/058ffb51c1040b97a8833d13b5ff75eabb49fc2fd1e1cea6b8dbd5ffeae6/csr_utils-0.1.tar.gz" } ], "0.1.1": [ { "comment_text": "", "digests": { "md5": 
"fac63af3fcc0e14ec54e6207b9346ff5", "sha256": "2a0c738498d7185bf1d3ba72893cb202b0081216e1c91647c6e5902c02e528ff" }, "downloads": -1, "filename": "csr_utils-0.1.1-py3-none-any.whl", "has_sig": false, "md5_digest": "fac63af3fcc0e14ec54e6207b9346ff5", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 5956, "upload_time": "2018-07-15T04:54:53", "url": "https://files.pythonhosted.org/packages/4d/f1/51a09455cd3faffdea26e3d2d80c7f634396f54885d1ed6d95158d5701b4/csr_utils-0.1.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "dd603111b41eb5ec9d550582ec0a7e79", "sha256": "a3e3a1a5eadae4e47236a9bd377c219ad8f578d4a95bab86206f05c72123c6c7" }, "downloads": -1, "filename": "csr_utils-0.1.1.tar.gz", "has_sig": false, "md5_digest": "dd603111b41eb5ec9d550582ec0a7e79", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4623, "upload_time": "2018-07-15T04:54:54", "url": "https://files.pythonhosted.org/packages/d9/0e/065ee770524bdb78e9e1824361fed7a5f3e1a1644a55e7c855c7d65d6c26/csr_utils-0.1.1.tar.gz" } ], "0.1.2": [ { "comment_text": "", "digests": { "md5": "3ca12fe3479bd9fabc6fe4149987668a", "sha256": "7307c43a2febffa93d9767b57de5aef5b44df42d665bfb004b35a35060536904" }, "downloads": -1, "filename": "csr_utils-0.1.2-py3-none-any.whl", "has_sig": false, "md5_digest": "3ca12fe3479bd9fabc6fe4149987668a", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 6026, "upload_time": "2018-07-16T19:24:20", "url": "https://files.pythonhosted.org/packages/b0/bb/3155aa5a78bcac61ad611e77086d45b7461896be901b71737cdc391ee6eb/csr_utils-0.1.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "e708701f5cf988a865f43495fe9c65d8", "sha256": "e005b4ef250a69c6fc57c7b7334ab7ccaff765601da09bac07d18207bb8df6ec" }, "downloads": -1, "filename": "csr_utils-0.1.2.tar.gz", "has_sig": false, "md5_digest": "e708701f5cf988a865f43495fe9c65d8", "packagetype": "sdist", "python_version": 
"source", "requires_python": null, "size": 4743, "upload_time": "2018-07-16T19:24:22", "url": "https://files.pythonhosted.org/packages/0e/a8/de000834048d2be1da9ce2fd9573e50bd0781687914374f454be941af7ed/csr_utils-0.1.2.tar.gz" } ], "0.1.3": [ { "comment_text": "", "digests": { "md5": "616840ad5a96d9098dc838fd2ebfad1d", "sha256": "fb592b12ae19d9c827fb949abe7fe31cfc53a2299231947c943662a41c0c708b" }, "downloads": -1, "filename": "csr_utils-0.1.3-py3-none-any.whl", "has_sig": false, "md5_digest": "616840ad5a96d9098dc838fd2ebfad1d", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 5901, "upload_time": "2018-07-16T19:33:06", "url": "https://files.pythonhosted.org/packages/b1/f4/60975f0eb14292f357ffb37d0c8fb49a8e85e7c22864aadc5b7c9c3ffdf4/csr_utils-0.1.3-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "673af52fd210b8fd7ce05e1d73cb04a1", "sha256": "8794e3ced6f94affdfd4a1e6dee3bb68802289ff2cbe7cd1ae0de621ef3b220b" }, "downloads": -1, "filename": "csr_utils-0.1.3.tar.gz", "has_sig": false, "md5_digest": "673af52fd210b8fd7ce05e1d73cb04a1", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4626, "upload_time": "2018-07-16T19:33:07", "url": "https://files.pythonhosted.org/packages/26/6d/8d5175def81f197c7efe5392bdabfa34a81403057a21f585abcede118fa2/csr_utils-0.1.3.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "616840ad5a96d9098dc838fd2ebfad1d", "sha256": "fb592b12ae19d9c827fb949abe7fe31cfc53a2299231947c943662a41c0c708b" }, "downloads": -1, "filename": "csr_utils-0.1.3-py3-none-any.whl", "has_sig": false, "md5_digest": "616840ad5a96d9098dc838fd2ebfad1d", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 5901, "upload_time": "2018-07-16T19:33:06", "url": "https://files.pythonhosted.org/packages/b1/f4/60975f0eb14292f357ffb37d0c8fb49a8e85e7c22864aadc5b7c9c3ffdf4/csr_utils-0.1.3-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": 
"673af52fd210b8fd7ce05e1d73cb04a1", "sha256": "8794e3ced6f94affdfd4a1e6dee3bb68802289ff2cbe7cd1ae0de621ef3b220b" }, "downloads": -1, "filename": "csr_utils-0.1.3.tar.gz", "has_sig": false, "md5_digest": "673af52fd210b8fd7ce05e1d73cb04a1", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4626, "upload_time": "2018-07-16T19:33:07", "url": "https://files.pythonhosted.org/packages/26/6d/8d5175def81f197c7efe5392bdabfa34a81403057a21f585abcede118fa2/csr_utils-0.1.3.tar.gz" } ] }