{ "info": { "author": "telescopes", "author_email": "luyaoli88@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Science/Research", "License :: OSI Approved :: BSD License", "Operating System :: OS Independent", "Programming Language :: Python :: 3", "Topic :: Scientific/Engineering :: Mathematics", "Topic :: Software Development :: Libraries :: Python Modules" ], "description": "\n# Light_FAMD\n\n`Light_FAMD` is a library for prcessing [factor analysis of mixed data](https://www.wikiwand.com/en/Factor_analysis). This includes a variety of methods including [principal component analysis (PCA)](https://www.wikiwand.com/en/Principal_component_analysis) and [multiply correspondence analysis (MCA)](https://www.researchgate.net/publication/239542271_Multiple_Correspondence_Analysis). The goal is to provide an efficient and light implementation for each algorithm along with a scikit-learn API.\n\n## Table of contents\n\n- [Usage](##Usage)\n - [Guidelines](###Guidelines)\n - [Principal component analysis (PCA)](#principal-component-analysis-pca)\n - [Correspondence analysis (CA)](#correspondence-analysis-ca)\n - [Multiple correspondence analysis (MCA)](#multiple-correspondence-analysis-mca)\n - [Multiple factor analysis (MFA)](#multiple-factor-analysis-mfa)\n - [Factor analysis of mixed data (FAMD)](#factor-analysis-of-mixed-data-famd)\n- [Going faster](#going-faster)\n\n\n\n\n`Light_FAMD` doesn't have any extra dependencies apart from the usual suspects (`sklearn`, `pandas`, `numpy`) which are included with Anaconda.\n\n\n\n### Guidelines\n\nEach base estimator(CA,PCA) provided by `Light_FAMD` extends scikit-learn's `(TransformerMixin,BaseEstimator)`.which means we could use directly `fit_transform`,and `(set_params,get_params)` methods.\n\nUnder the hood `Light_FAMD` uses a [randomised version of SVD](https://scikit-learn.org/dev/modules/generated/sklearn.utils.extmath.randomized_svd.html). This algorithm finds a (usually very good) approximate truncated singular value decomposition using randomization to speed up the computations. It is particularly fast on large matrices on which you wish to extract only a small number of components. In order to obtain further speed up, n_iter can be set <=2 (at the cost of loss of precision). However if you want reproducible results then you should set the `random_state` parameter.\n\nThe randomised version of SVD is an iterative method. Because each of light_famd's algorithms use SVD, they all possess a `n_iter` parameter which controls the number of iterations used for computing the SVD. On the one hand the higher `n_iter` is the more precise the results will be. On the other hand increasing `n_iter` increases the computation time. In general the algorithm converges very quickly so using a low `n_iter` (which is the default behaviour) is recommended.\n\nIn this package,inheritance relationship as shown below(A->B:A is superclass of B):\n\n- PCA -> MFA -> FAMD\n- CA ->MCA\n\nYou are supposed to use each method depending on your situation:\n\n- All your variables are numeric: use principal component analysis (`PCA`)\n- You have a contingency table: use correspondence analysis (`CA`)\n- You have more than 2 variables and they are all categorical: use multiple correspondence analysis (`MCA`)\n- You have groups of categorical **or** numerical variables: use multiple factor analysis (`MFA`)\n- You have both categorical and numerical variables: use factor analysis of mixed data (`FAMD`)\n\nThe next subsections give an overview of each method along with usage information. The following papers give a good overview of the field of factor analysis if you want to go deeper:\n\n- [A Tutorial on Principal Component Analysis](https://arxiv.org/pdf/1404.1100.pdf)\n- [Theory of Correspondence Analysis](http://statmath.wu.ac.at/courses/CAandRelMeth/caipA.pdf)\n- [Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions](https://arxiv.org/pdf/0909.4061.pdf)\n- [Computation of Multiple Correspondence Analysis, with code in R](https://core.ac.uk/download/pdf/6591520.pdf)\n- [Singular Value Decomposition Tutorial](https://davetang.org/file/Singular_Value_Decomposition_Tutorial.pdf)\n- [Multiple Factor Analysis](https://www.utdallas.edu/~herve/Abdi-MFA2007-pretty.pdf)\n\nNotice that `Light_FAMD` does't support the sparse input,see [Truncated_FAMD](https://github.com/Cauchemare/Truncated_FAMD) for an alternative of sparse and big data.\n\n\n###\tPrincipal-Component-Analysis: PCA\n\n**PCA**(rescale_with_mean=True, rescale_with_std=True, n_components=2, n_iter=3,\n copy=True, check_input=True, random_state=None, engine='auto'):\n\n**Args:**\n- `rescale_with_mean` (bool): Whether to substract each column's mean or not.\n- `rescale_with_std` (bool): Whether to divide each column by it's standard deviation or not.\n- `n_components` (int): The number of principal components to compute.\n- `n_iter` (int): The number of iterations used for computing the SVD.\n- `copy` (bool): Whether to perform the computations inplace or not.\n- `check_input` (bool): Whether to check the consistency of the inputs or not.\n- `engine`(string):\"auto\":randomized_svd,\"fbpca\":Facebook's randomized SVD implementation\n- `random_state`(int, RandomState instance or None, optional (default=None):The seed of the -pseudo random number generator to use when shuffling the data. If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.\nReturn ndarray (M,k),M:Number of samples,K:Number of components.\n\n**Examples:**\n```\n>>>import numpy as np\n>>> np.random.seed(42) # This is for doctests reproducibility\n\n>>>from light_famd import PCA\n>>>X = pd.DataFrame(np.random.randint(0,10,size=(10,3)),columns=list('ABC'))\n>>>pca = PCA(n_components=2)\n>>>pca.fit(X)\nPCA(check_input=True, copy=True, engine='auto', n_components=2, n_iter=3,\n random_state=None, rescale_with_mean=True, rescale_with_std=True)\n\n>>>print(pca.explained_variance_)\n[20.20385109 8.48246239]\n\n>>>print(pca.explained_variance_ratio_)\n[0.6734617029875277, 0.28274874633810754]\n>>>print(pca.column_correlation(X)) # pearson correlation between component and original column,while p-value >=0.05 this similarity is `Nan`.\n 0 1\nA -0.953482 NaN\nB 0.907314 NaN\nC NaN 0.84211\n\n>>>print(pca.transform(X))\n[[-0.82262005 0.11730656]\n [ 0.05359079 1.62298683]\n [ 1.03052849 0.79973099]\n [-0.24313366 0.25651395]\n [-0.94630387 -1.04943025]\n [-0.70591749 -0.01282583]\n [-0.39948373 -1.52612436]\n [ 2.70164194 0.38048482]\n [-2.49373351 0.53655273]\n [ 1.8254311 -1.12519545]]\n>>>print(pca.fit_transform(X))\n[[-0.82262005 0.11730656]\n [ 0.05359079 1.62298683]\n [ 1.03052849 0.79973099]\n [-0.24313366 0.25651395]\n [-0.94630387 -1.04943025]\n [-0.70591749 -0.01282583]\n [-0.39948373 -1.52612436]\n [ 2.70164194 0.38048482]\n [-2.49373351 0.53655273]\n [ 1.8254311 -1.12519545]]\n\n```\n###\tCorrespondence-Analysis: CA\n\n**CA**(n_components=2, n_iter=10, copy=True, check_input=True, random_state=None,\n engine='auto'):\n\n**Args:**\n- `n_components` (int): The number of principal components to compute.\n- `copy` (bool): Whether to perform the computations inplace or not.\n- `check_input` (bool): Whether to check the consistency of the inputs or not.\n- `engine`(string):\"auto\":randomized_svd,\"fbpca\":Facebook's randomized SVD implementation\n- `random_state`(int, RandomState instance or None, optional (default=None):The seed of the -pseudo random number generator to use when shuffling the data. If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.\n\nReturn ndarray (M,k),M:Number of samples,K:Number of components.\n\n**Examples:**\n```\n>>>import numpy as np\n>>>from light_famd import CA\n>>>X = pd.DataFrame(data=np.random.randint(0,100,size=(10,4)),columns=list('ABCD'))\n>>>ca=CA(n_components=2,n_iter=2)\n>>>ca.fit(X)\nCA(check_input=True, copy=True, engine='auto', n_components=2, n_iter=2,\n random_state=None)\n\n>>> print(ca.explained_variance_)\n[0.16892141 0.0746376 ]\n\n>>>print(ca.explained_variance_ratio_)\n[0.5650580210934917, 0.2496697790527281]\n\n>>>print(ca.transform(X))\n[[ 0.23150854 -0.39167802]\n [ 0.36006095 0.00301414]\n [-0.48192602 -0.13002647]\n [-0.06333533 -0.21475652]\n [-0.16438708 -0.10418312]\n [-0.38129126 -0.16515196]\n [ 0.2721296 0.46923757]\n [ 0.82953753 0.20638333]\n [-0.500007 0.36897935]\n [ 0.57932474 -0.1023383 ]]\n\n>>>print(ca.fit_transform(X))\n[[ 0.23150854 -0.39167802]\n [ 0.36006095 0.00301414]\n [-0.48192602 -0.13002647]\n [-0.06333533 -0.21475652]\n [-0.16438708 -0.10418312]\n [-0.38129126 -0.16515196]\n [ 0.2721296 0.46923757]\n [ 0.82953753 0.20638333]\n [-0.500007 0.36897935]\n [ 0.57932474 -0.1023383 ]]\n```\n\n###\tMultiple-Correspondence-Analysis: MCA\nMCA class inherits from CA class.\n\n```\n>>>import pandas as pd\n>>>from light_famd import MCA\n>>>X=pd.DataFrame(np.random.choice(list('abcde'),size=(10,4),replace=True),columns =list('ABCD'))\n>>>print(X)\n A B C D\n0 d e a d\n1 e d b b\n2 e d a e\n3 b b e d\n4 b d b b\n5 c b a e\n6 e d b a\n7 d c d d\n8 b c d a\n9 a e c c\n>>>mca=MCA(n_components=2)\n>>>mca.fit(X)\nMCA(check_input=True, copy=True, engine='auto', n_components=2, n_iter=10,\n random_state=None)\n\n>>>print(mca.explained_variance_)\n[0.90150495 0.76979456]\n\n>>>print(mca.explained_variance_ratio_)\n[0.24040131974598467, 0.20527854948955893]\n\n>>>print(mca.transform(X)) \n[[ 0.55603013 0.7016272 ]\n [-0.73558629 -1.17559462]\n [-0.44972794 -0.4973024 ]\n [-0.16248444 0.95706908]\n [-0.66969377 -0.79951057]\n [-0.21267777 0.39953562]\n [-0.67921667 -0.8707747 ]\n [ 0.05058625 1.34573057]\n [-0.31952341 0.77285922]\n [ 2.62229391 -0.83363941]]\n\n>>>print(mca.fit_transform(X)) \n[[ 0.55603013 0.7016272 ]\n [-0.73558629 -1.17559462]\n [-0.44972794 -0.4973024 ]\n [-0.16248444 0.95706908]\n [-0.66969377 -0.79951057]\n [-0.21267777 0.39953562]\n [-0.67921667 -0.8707747 ]\n [ 0.05058625 1.34573057]\n [-0.31952341 0.77285922]\n [ 2.62229391 -0.83363941]]\n\n```\n###\tMultiple-Factor-Analysis: MFA\nMFA class inherits from PCA class.\nSince FAMD class inherits from MFA and the only thing to do for FAMD is to determine `groups` parameter compare to its superclass `MFA`.therefore we skip this chapiter and go directly to `FAMD`.\n\n\n###\tFactor-Analysis-of-Mixed-Data: FAMD\nThe `FAMD` inherits from the `MFA` class, which entails that you have access to all it's methods and properties of `MFA` class.\n```\n>>>import pandas as pd\n>>>from light_famd import FAMD\n>>>X_n = pd.DataFrame(data=np.random.randint(0,100,size=(10,2)),columns=list('AB'))\n>>>X_c =pd.DataFrame(np.random.choice(list('abcde'),size=(10,4),replace=True),columns =list('CDEF'))\n>>>X=pd.concat([X_n,X_c],axis=1)\n>>>print(X)\n A B C D E F\n0 96 19 b d b e\n1 11 46 b d a e\n2 0 89 a a a c\n3 13 63 c a e d\n4 37 36 d b e c\n5 10 99 a b d c\n6 76 2 c a d e\n7 32 5 c a e d\n8 49 9 c e e e\n9 4 22 c c b d\n\n>>>famd = FAMD(n_components=2)\n>>>famd.fit(X)\nMCA PROCESS MCA PROCESS ELIMINATED 0 COLUMNS SINCE THEIR MISS_RATES >= 99%\nOut:\nFAMD(check_input=True, copy=False, engine='auto', n_components=2, n_iter=2,\n random_state=None)\n\n>>>print(famd.explained_variance_)\n[17.40871219 9.73440949]\n\n>>>print(famd.explained_variance_ratio_)\n[0.32596621039327284, 0.1822701494502082]\n\n>>> print(famd.column_correlation(X))\n 0 1\nA NaN NaN\nB NaN NaN\nC_a NaN NaN\nC_b NaN 0.824458\nC_c 0.922220 NaN\nC_d NaN NaN\nD_a NaN NaN\nD_b NaN NaN\nD_c NaN NaN\nD_d NaN 0.824458\nD_e NaN NaN\nE_a NaN NaN\nE_b NaN NaN\nE_d NaN NaN\nE_e NaN NaN\nF_c NaN -0.714447\nF_d 0.673375 NaN\nF_e NaN 0.839324\n\n\n\n>>>print(famd.transform(X)) \n[[ 2.23848136 5.75809647]\n [ 2.0845175 4.78930072]\n [ 2.6682068 -2.78991262]\n [ 6.2962962 -1.57451325]\n [ 2.52140085 -3.28279729]\n [ 1.58256681 -3.73135011]\n [ 5.19476759 1.18333717]\n [ 6.35288446 -1.33186723]\n [ 5.02971134 1.6216402 ]\n [ 4.05754963 0.69620997]]\n\n>>>print(famd.fit_transform(X))\nMCA PROCESS HAVE ELIMINATE 0 COLUMNS SINCE ITS MISSING RATE >= 99%\n[[ 2.23848136 5.75809647]\n [ 2.0845175 4.78930072]\n [ 2.6682068 -2.78991262]\n [ 6.2962962 -1.57451325]\n [ 2.52140085 -3.28279729]\n [ 1.58256681 -3.73135011]\n [ 5.19476759 1.18333717]\n [ 6.35288446 -1.33186723]\n [ 5.02971134 1.6216402 ]\n [ 4.05754963 0.69620997]]\n\n```\n\n\n\n\n## Going faster\n\nBy default `light_famd` uses `sklearn`'s randomized SVD implementation. One of the goals of `Light_FAMD` is to make it possible to use a different SVD backend. For the while the only other supported backend is [Facebook's randomized SVD implementation](https://research.facebook.com/blog/fast-randomized-svd/) called [fbpca](http://fbpca.readthedocs.org/en/latest/). You can use it by setting the `engine` parameter to `'fbpca'` or see [Truncated_FAMD](https://github.com/Cauchemare/Truncated_FAMD) for an alternative of automatic selection of svd_solver depends on the structure of input:\n\n```python\n>>> import Light_FAMD\n>>> pca = Light_FAMD.PCA(engine='fbpca')\n\n```\n\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/Cauchemare/Light_FAMD", "keywords": "famd,factor analysis", "license": "", "maintainer": "", "maintainer_email": "", "name": "light-famd", "package_url": "https://pypi.org/project/light-famd/", "platform": "", "project_url": "https://pypi.org/project/light-famd/", "project_urls": { "Homepage": "https://github.com/Cauchemare/Light_FAMD" }, "release_url": "https://pypi.org/project/light-famd/0.0.3/", "requires_dist": [ "scikit-learn", "scipy", "pandas", "numpy" ], "requires_python": "", "summary": "Light Factor Analysis of Mixed Data", "version": "0.0.3" }, "last_serial": 5902109, "releases": { "0.0.3": [ { "comment_text": "", "digests": { "md5": "976d1d7a40d36335123229be8919fbce", "sha256": "63abe8762ca98f32736b239a94a4b849dd082ad40de5a36be84118136470686a" }, "downloads": -1, "filename": "light_famd-0.0.3-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "976d1d7a40d36335123229be8919fbce", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 14124, "upload_time": "2019-09-29T08:29:04", "url": "https://files.pythonhosted.org/packages/6c/40/6678217385426fe2d7791df6a56c866e8676f9d75afa26e188d9a2a291a3/light_famd-0.0.3-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "bc6e6e28443acc65cd49a6f6778d1018", "sha256": "0d640659de578a572ec513f3741e6c2f5eeaf841884579d15e8c3eb834853b81" }, "downloads": -1, "filename": "light_famd-0.0.3.tar.gz", "has_sig": false, "md5_digest": "bc6e6e28443acc65cd49a6f6778d1018", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 15869, "upload_time": "2019-09-29T08:29:06", "url": "https://files.pythonhosted.org/packages/9e/f0/60e56c2e3c00e33cfeab5d54dfdb917fa960fd8d178fb57be1320af7010b/light_famd-0.0.3.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "976d1d7a40d36335123229be8919fbce", "sha256": "63abe8762ca98f32736b239a94a4b849dd082ad40de5a36be84118136470686a" }, "downloads": -1, "filename": "light_famd-0.0.3-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "976d1d7a40d36335123229be8919fbce", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 14124, "upload_time": "2019-09-29T08:29:04", "url": "https://files.pythonhosted.org/packages/6c/40/6678217385426fe2d7791df6a56c866e8676f9d75afa26e188d9a2a291a3/light_famd-0.0.3-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "bc6e6e28443acc65cd49a6f6778d1018", "sha256": "0d640659de578a572ec513f3741e6c2f5eeaf841884579d15e8c3eb834853b81" }, "downloads": -1, "filename": "light_famd-0.0.3.tar.gz", "has_sig": false, "md5_digest": "bc6e6e28443acc65cd49a6f6778d1018", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 15869, "upload_time": "2019-09-29T08:29:06", "url": "https://files.pythonhosted.org/packages/9e/f0/60e56c2e3c00e33cfeab5d54dfdb917fa960fd8d178fb57be1320af7010b/light_famd-0.0.3.tar.gz" } ] }