{
    "info": {
        "author": "telescopes",
        "author_email": "luyaoli88@gmail.com",
        "bugtrack_url": null,
        "classifiers": [
            "Development Status :: 3 - Alpha",
            "Intended Audience :: Science/Research",
            "License :: OSI Approved :: BSD License",
            "Operating System :: OS Independent",
            "Programming Language :: Python :: 3",
            "Topic :: Scientific/Engineering :: Mathematics",
            "Topic :: Software Development :: Libraries :: Python Modules"
        ],
        "description": "\n# Light_FAMD\n\n`Light_FAMD` is a library for prcessing [factor analysis of mixed data](https://www.wikiwand.com/en/Factor_analysis). This includes a variety of methods including [principal component analysis (PCA)](https://www.wikiwand.com/en/Principal_component_analysis) and [multiply correspondence analysis (MCA)](https://www.researchgate.net/publication/239542271_Multiple_Correspondence_Analysis). The goal is to provide an efficient and light implementation for each algorithm along with a scikit-learn API.\n\n## Table of contents\n\n- [Usage](##Usage)\n  - [Guidelines](###Guidelines)\n  - [Principal component analysis (PCA)](#principal-component-analysis-pca)\n  - [Correspondence analysis (CA)](#correspondence-analysis-ca)\n  - [Multiple correspondence analysis (MCA)](#multiple-correspondence-analysis-mca)\n  - [Multiple factor analysis (MFA)](#multiple-factor-analysis-mfa)\n  - [Factor analysis of mixed data (FAMD)](#factor-analysis-of-mixed-data-famd)\n- [Going faster](#going-faster)\n\n\n\n\n`Light_FAMD` doesn't have any extra dependencies apart from the usual suspects (`sklearn`, `pandas`, `numpy`) which are included with Anaconda.\n\n\n\n### Guidelines\n\nEach base estimator(CA,PCA) provided by `Light_FAMD` extends scikit-learn's `(TransformerMixin,BaseEstimator)`.which means we could use directly `fit_transform`,and `(set_params,get_params)` methods.\n\nUnder the hood `Light_FAMD` uses a [randomised version of SVD](https://scikit-learn.org/dev/modules/generated/sklearn.utils.extmath.randomized_svd.html). This algorithm finds a (usually very good) approximate truncated singular value decomposition using randomization to speed up the computations. It is particularly fast on large matrices on which you wish to extract only a small number of components. In order to obtain further speed up, n_iter can be set <=2 (at the cost of loss of precision). 
Because the algorithm is randomized, you should set the `random_state` parameter if you want reproducible results.\n\nThe randomised version of SVD is an iterative method. Because each of `light_famd`'s algorithms uses SVD, they all have an `n_iter` parameter which controls the number of iterations used for computing the SVD. On the one hand, the higher `n_iter` is, the more precise the results will be. On the other hand, increasing `n_iter` increases the computation time. In general the algorithm converges very quickly, so using a low `n_iter` (which is the default behaviour) is recommended.\n\nThe classes in this package are related by inheritance as shown below (`A -> B` means `A` is a superclass of `B`):\n\n- PCA -> MFA -> FAMD\n- CA -> MCA\n\nChoose a method depending on your situation:\n\n- All your variables are numeric: use principal component analysis (`PCA`)\n- You have a contingency table: use correspondence analysis (`CA`)\n- You have more than 2 variables and they are all categorical: use multiple correspondence analysis (`MCA`)\n- You have groups of categorical **or** numerical variables: use multiple factor analysis (`MFA`)\n- You have both categorical and numerical variables: use factor analysis of mixed data (`FAMD`)\n\nThe next subsections give an overview of each method along with usage information. 
The following papers give a good overview of the field of factor analysis if you want to go deeper:\n\n- [A Tutorial on Principal Component Analysis](https://arxiv.org/pdf/1404.1100.pdf)\n- [Theory of Correspondence Analysis](http://statmath.wu.ac.at/courses/CAandRelMeth/caipA.pdf)\n- [Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions](https://arxiv.org/pdf/0909.4061.pdf)\n- [Computation of Multiple Correspondence Analysis, with code in R](https://core.ac.uk/download/pdf/6591520.pdf)\n- [Singular Value Decomposition Tutorial](https://davetang.org/file/Singular_Value_Decomposition_Tutorial.pdf)\n- [Multiple Factor Analysis](https://www.utdallas.edu/~herve/Abdi-MFA2007-pretty.pdf)\n\nNote that `Light_FAMD` does not support sparse input; see [Truncated_FAMD](https://github.com/Cauchemare/Truncated_FAMD) for an alternative suited to sparse and large data.\n\n\n###\tPrincipal-Component-Analysis: PCA\n\n**PCA**(rescale_with_mean=True, rescale_with_std=True, n_components=2, n_iter=3,\n                 copy=True, check_input=True, random_state=None, engine='auto'):\n\n**Args:**\n- `rescale_with_mean` (bool): Whether to subtract each column's mean or not.\n- `rescale_with_std` (bool): Whether to divide each column by its standard deviation or not.\n- `n_components` (int): The number of principal components to compute.\n- `n_iter` (int): The number of iterations used for computing the SVD.\n- `copy` (bool): Whether to perform the computations inplace or not.\n- `check_input` (bool): Whether to check the consistency of the inputs or not.\n- `engine` (string): `\"auto\"` uses scikit-learn's `randomized_svd`; `\"fbpca\"` uses Facebook's randomized SVD implementation.\n- `random_state` (int, RandomState instance or None, optional, default=None): The seed of the pseudo-random number generator to use when shuffling the data. 
If int, `random_state` is the seed used by the random number generator; if RandomState instance, `random_state` is the random number generator; if None, the random number generator is the RandomState instance used by `np.random`.\n\nReturns an ndarray of shape (M, K), where M is the number of samples and K is the number of components.\n\n**Examples:**\n```\n>>>import numpy as np\n>>>import pandas as pd\n>>> np.random.seed(42)  # This is for doctests reproducibility\n\n>>>from light_famd  import PCA\n>>>X = pd.DataFrame(np.random.randint(0,10,size=(10,3)),columns=list('ABC'))\n>>>pca = PCA(n_components=2)\n>>>pca.fit(X)\nPCA(check_input=True, copy=True, engine='auto', n_components=2, n_iter=3,\n  random_state=None, rescale_with_mean=True, rescale_with_std=True)\n\n>>>print(pca.explained_variance_)\n[20.20385109  8.48246239]\n\n>>>print(pca.explained_variance_ratio_)\n[0.6734617029875277, 0.28274874633810754]\n>>>print(pca.column_correlation(X))  # Pearson correlation between each component and each original column; correlations with p-value >= 0.05 are shown as NaN.\n          0        1\nA -0.953482      NaN\nB  0.907314      NaN\nC       NaN  0.84211\n\n>>>print(pca.transform(X))\n[[-0.82262005  0.11730656]\n [ 0.05359079  1.62298683]\n [ 1.03052849  0.79973099]\n [-0.24313366  0.25651395]\n [-0.94630387 -1.04943025]\n [-0.70591749 -0.01282583]\n [-0.39948373 -1.52612436]\n [ 2.70164194  0.38048482]\n [-2.49373351  0.53655273]\n [ 1.8254311  -1.12519545]]\n>>>print(pca.fit_transform(X))\n[[-0.82262005  0.11730656]\n [ 0.05359079  1.62298683]\n [ 1.03052849  0.79973099]\n [-0.24313366  0.25651395]\n [-0.94630387 -1.04943025]\n [-0.70591749 -0.01282583]\n [-0.39948373 -1.52612436]\n [ 2.70164194  0.38048482]\n [-2.49373351  0.53655273]\n [ 1.8254311  -1.12519545]]\n\n```\n###\tCorrespondence-Analysis: CA\n\n**CA**(n_components=2, n_iter=10, copy=True, check_input=True, random_state=None,\n                 engine='auto'):\n\n**Args:**\n- `n_components` (int): The number of principal components to compute.\n- `n_iter` (int): The number of iterations used for computing the SVD.\n- `copy` (bool): Whether to perform the 
computations inplace or not.\n- `check_input` (bool): Whether to check the consistency of the inputs or not.\n- `engine` (string): `\"auto\"` uses scikit-learn's `randomized_svd`; `\"fbpca\"` uses Facebook's randomized SVD implementation.\n- `random_state` (int, RandomState instance or None, optional, default=None): The seed of the pseudo-random number generator to use when shuffling the data. If int, `random_state` is the seed used by the random number generator; if RandomState instance, `random_state` is the random number generator; if None, the random number generator is the RandomState instance used by `np.random`.\n\nReturns an ndarray of shape (M, K), where M is the number of samples and K is the number of components.\n\n**Examples:**\n```\n>>>import numpy as np\n>>>import pandas as pd\n>>>from light_famd import CA\n>>>X  = pd.DataFrame(data=np.random.randint(0,100,size=(10,4)),columns=list('ABCD'))\n>>>ca=CA(n_components=2,n_iter=2)\n>>>ca.fit(X)\nCA(check_input=True, copy=True, engine='auto', n_components=2, n_iter=2,\n  random_state=None)\n\n>>> print(ca.explained_variance_)\n[0.16892141 0.0746376 ]\n\n>>>print(ca.explained_variance_ratio_)\n[0.5650580210934917, 0.2496697790527281]\n\n>>>print(ca.transform(X))\n[[ 0.23150854 -0.39167802]\n [ 0.36006095  0.00301414]\n [-0.48192602 -0.13002647]\n [-0.06333533 -0.21475652]\n [-0.16438708 -0.10418312]\n [-0.38129126 -0.16515196]\n [ 0.2721296   0.46923757]\n [ 0.82953753  0.20638333]\n [-0.500007    0.36897935]\n [ 0.57932474 -0.1023383 ]]\n\n>>>print(ca.fit_transform(X))\n[[ 0.23150854 -0.39167802]\n [ 0.36006095  0.00301414]\n [-0.48192602 -0.13002647]\n [-0.06333533 -0.21475652]\n [-0.16438708 -0.10418312]\n [-0.38129126 -0.16515196]\n [ 0.2721296   0.46923757]\n [ 0.82953753  0.20638333]\n [-0.500007    0.36897935]\n [ 0.57932474 -0.1023383 ]]\n```\n\n###\tMultiple-Correspondence-Analysis: MCA\nThe `MCA` class inherits from the `CA` class.\n\n```\n>>>import numpy as np\n>>>import pandas as pd\n>>>from light_famd import MCA\n>>>X=pd.DataFrame(np.random.choice(list('abcde'),size=(10,4),replace=True),columns =list('ABCD'))\n>>>print(X)\n      A 
 B  C  D\n0  d  e  a  d\n1  e  d  b  b\n2  e  d  a  e\n3  b  b  e  d\n4  b  d  b  b\n5  c  b  a  e\n6  e  d  b  a\n7  d  c  d  d\n8  b  c  d  a\n9  a  e  c  c\n>>>mca=MCA(n_components=2)\n>>>mca.fit(X)\nMCA(check_input=True, copy=True, engine='auto', n_components=2, n_iter=10,\n  random_state=None)\n\n>>>print(mca.explained_variance_)\n[0.90150495 0.76979456]\n\n>>>print(mca.explained_variance_ratio_)\n[0.24040131974598467, 0.20527854948955893]\n\n>>>print(mca.transform(X)) \n[[ 0.55603013  0.7016272 ]\n [-0.73558629 -1.17559462]\n [-0.44972794 -0.4973024 ]\n [-0.16248444  0.95706908]\n [-0.66969377 -0.79951057]\n [-0.21267777  0.39953562]\n [-0.67921667 -0.8707747 ]\n [ 0.05058625  1.34573057]\n [-0.31952341  0.77285922]\n [ 2.62229391 -0.83363941]]\n\n>>>print(mca.fit_transform(X)) \n[[ 0.55603013  0.7016272 ]\n [-0.73558629 -1.17559462]\n [-0.44972794 -0.4973024 ]\n [-0.16248444  0.95706908]\n [-0.66969377 -0.79951057]\n [-0.21267777  0.39953562]\n [-0.67921667 -0.8707747 ]\n [ 0.05058625  1.34573057]\n [-0.31952341  0.77285922]\n [ 2.62229391 -0.83363941]]\n\n```\n###\tMultiple-Factor-Analysis: MFA\nThe `MFA` class inherits from the `PCA` class.\nSince the `FAMD` class inherits from `MFA`, and the only thing `FAMD` does beyond its superclass is determine the `groups` parameter, we skip this section and go directly to `FAMD`.\n\n\n###\tFactor-Analysis-of-Mixed-Data: FAMD\nThe `FAMD` class inherits from the `MFA` class, which means you have access to all the methods and properties of `MFA`.\n```\n>>>import numpy as np\n>>>import pandas as pd\n>>>from light_famd import FAMD\n>>>X_n = pd.DataFrame(data=np.random.randint(0,100,size=(10,2)),columns=list('AB'))\n>>>X_c =pd.DataFrame(np.random.choice(list('abcde'),size=(10,4),replace=True),columns =list('CDEF'))\n>>>X=pd.concat([X_n,X_c],axis=1)\n>>>print(X)\n        A   B  C  D  E  F\n0  96  19  b  d  b  e\n1  11  46  b  d  a  e\n2   0  89  a  a  a  c\n3  13  63  c  a  e  d\n4  37  36  d  b  e  c\n5  10  99  a  b  d  c\n6  76   
2  c  a  d  e\n7  32   5  c  a  e  d\n8  49   9  c  e  e  e\n9   4  22  c  c  b  d\n\n>>>famd = FAMD(n_components=2)\n>>>famd.fit(X)\nMCA PROCESS MCA PROCESS ELIMINATED 0  COLUMNS SINCE THEIR MISS_RATES >= 99%\nOut:\nFAMD(check_input=True, copy=False, engine='auto', n_components=2, n_iter=2,\n     random_state=None)\n\n>>>print(famd.explained_variance_)\n[17.40871219  9.73440949]\n\n>>>print(famd.explained_variance_ratio_)\n[0.32596621039327284, 0.1822701494502082]\n\n>>> print(famd.column_correlation(X))\n             0         1\nA         NaN       NaN\nB         NaN       NaN\nC_a       NaN       NaN\nC_b       NaN  0.824458\nC_c  0.922220       NaN\nC_d       NaN       NaN\nD_a       NaN       NaN\nD_b       NaN       NaN\nD_c       NaN       NaN\nD_d       NaN  0.824458\nD_e       NaN       NaN\nE_a       NaN       NaN\nE_b       NaN       NaN\nE_d       NaN       NaN\nE_e       NaN       NaN\nF_c       NaN -0.714447\nF_d  0.673375       NaN\nF_e       NaN  0.839324\n\n\n\n>>>print(famd.transform(X)) \n[[ 2.23848136  5.75809647]\n [ 2.0845175   4.78930072]\n [ 2.6682068  -2.78991262]\n [ 6.2962962  -1.57451325]\n [ 2.52140085 -3.28279729]\n [ 1.58256681 -3.73135011]\n [ 5.19476759  1.18333717]\n [ 6.35288446 -1.33186723]\n [ 5.02971134  1.6216402 ]\n [ 4.05754963  0.69620997]]\n\n>>>print(famd.fit_transform(X))\nMCA PROCESS HAVE ELIMINATE 0  COLUMNS SINCE ITS MISSING RATE >= 99%\n[[ 2.23848136  5.75809647]\n [ 2.0845175   4.78930072]\n [ 2.6682068  -2.78991262]\n [ 6.2962962  -1.57451325]\n [ 2.52140085 -3.28279729]\n [ 1.58256681 -3.73135011]\n [ 5.19476759  1.18333717]\n [ 6.35288446 -1.33186723]\n [ 5.02971134  1.6216402 ]\n [ 4.05754963  0.69620997]]\n\n```\n\n\n\n\n## Going faster\n\nBy default `light_famd` uses `sklearn`'s randomized SVD implementation. One of the goals of `Light_FAMD` is to make it possible to use a different SVD backend. 
For the time being, the only other supported backend is [Facebook's randomized SVD implementation](https://research.facebook.com/blog/fast-randomized-svd/) called [fbpca](http://fbpca.readthedocs.org/en/latest/). See [Truncated_FAMD](https://github.com/Cauchemare/Truncated_FAMD) for an alternative that automatically selects the `svd_solver` depending on the structure of the input. To use fbpca, set the `engine` parameter to `'fbpca'`:\n\n```python\n>>> import light_famd\n>>> pca = light_famd.PCA(engine='fbpca')\n\n```\n\n\n\n",
        "description_content_type": "text/markdown",
        "docs_url": null,
        "download_url": "",
        "downloads": {
            "last_day": -1,
            "last_month": -1,
            "last_week": -1
        },
        "home_page": "https://github.com/Cauchemare/Light_FAMD",
        "keywords": "famd,factor analysis",
        "license": "",
        "maintainer": "",
        "maintainer_email": "",
        "name": "light-famd",
        "package_url": "https://pypi.org/project/light-famd/",
        "platform": "",
        "project_url": "https://pypi.org/project/light-famd/",
        "project_urls": {
            "Homepage": "https://github.com/Cauchemare/Light_FAMD"
        },
        "release_url": "https://pypi.org/project/light-famd/0.0.3/",
        "requires_dist": [
            "scikit-learn",
            "scipy",
            "pandas",
            "numpy"
        ],
        "requires_python": "",
        "summary": "Light Factor Analysis of Mixed Data",
        "version": "0.0.3"
    },
    "last_serial": 5902109,
    "releases": {
        "0.0.3": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "976d1d7a40d36335123229be8919fbce",
                    "sha256": "63abe8762ca98f32736b239a94a4b849dd082ad40de5a36be84118136470686a"
                },
                "downloads": -1,
                "filename": "light_famd-0.0.3-py2.py3-none-any.whl",
                "has_sig": false,
                "md5_digest": "976d1d7a40d36335123229be8919fbce",
                "packagetype": "bdist_wheel",
                "python_version": "py2.py3",
                "requires_python": null,
                "size": 14124,
                "upload_time": "2019-09-29T08:29:04",
                "url": "https://files.pythonhosted.org/packages/6c/40/6678217385426fe2d7791df6a56c866e8676f9d75afa26e188d9a2a291a3/light_famd-0.0.3-py2.py3-none-any.whl"
            },
            {
                "comment_text": "",
                "digests": {
                    "md5": "bc6e6e28443acc65cd49a6f6778d1018",
                    "sha256": "0d640659de578a572ec513f3741e6c2f5eeaf841884579d15e8c3eb834853b81"
                },
                "downloads": -1,
                "filename": "light_famd-0.0.3.tar.gz",
                "has_sig": false,
                "md5_digest": "bc6e6e28443acc65cd49a6f6778d1018",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 15869,
                "upload_time": "2019-09-29T08:29:06",
                "url": "https://files.pythonhosted.org/packages/9e/f0/60e56c2e3c00e33cfeab5d54dfdb917fa960fd8d178fb57be1320af7010b/light_famd-0.0.3.tar.gz"
            }
        ]
    },
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "976d1d7a40d36335123229be8919fbce",
                "sha256": "63abe8762ca98f32736b239a94a4b849dd082ad40de5a36be84118136470686a"
            },
            "downloads": -1,
            "filename": "light_famd-0.0.3-py2.py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "976d1d7a40d36335123229be8919fbce",
            "packagetype": "bdist_wheel",
            "python_version": "py2.py3",
            "requires_python": null,
            "size": 14124,
            "upload_time": "2019-09-29T08:29:04",
            "url": "https://files.pythonhosted.org/packages/6c/40/6678217385426fe2d7791df6a56c866e8676f9d75afa26e188d9a2a291a3/light_famd-0.0.3-py2.py3-none-any.whl"
        },
        {
            "comment_text": "",
            "digests": {
                "md5": "bc6e6e28443acc65cd49a6f6778d1018",
                "sha256": "0d640659de578a572ec513f3741e6c2f5eeaf841884579d15e8c3eb834853b81"
            },
            "downloads": -1,
            "filename": "light_famd-0.0.3.tar.gz",
            "has_sig": false,
            "md5_digest": "bc6e6e28443acc65cd49a6f6778d1018",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 15869,
            "upload_time": "2019-09-29T08:29:06",
            "url": "https://files.pythonhosted.org/packages/9e/f0/60e56c2e3c00e33cfeab5d54dfdb917fa960fd8d178fb57be1320af7010b/light_famd-0.0.3.tar.gz"
        }
    ]
}