{ "info": { "author": "Damien Forthommme", "author_email": "damien2227@hotmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Intended Audience :: Science/Research", "License :: OSI Approved :: MIT License", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3.5", "Topic :: Software Development :: Build Tools" ], "description": "ipfn\n=======================\n\nIterative proportional fitting is an algorithm used in many different fields, such as economics or social sciences, to alter results in such a way that aggregates along one or several dimensions match known marginals (or aggregates along these same dimensions).\n\nThe algorithm exists in 2 versions:\n\n* numpy version, which is the fastest by far\n* pandas version, which is much slower but easier to use than the numpy version\n\n\nThe algorithm recognizes the input variable type and uses the appropriate version to solve the problem. To install the package:\n\n* pip install ipfn\n* pip install git+http://github.com/dirguis/ipfn@master\n\nFor more information and examples, please visit:\n\n* `wikipedia page on ipf `_\n* `slides explaining the methodology and links to specific examples `_\n* https://github.com/Dirguis/ipfn\n\n----\n\nIf you want to test the package, clone the repo and from the main folder, run:\n\n* py.test --verbose --color=yes tests/tests.py\n\n----\n\nThe project is similar to the ipfp package available for R, and tests have been run to ensure the same results.\n\n----\n\nInput Variables:\n * original: numpy ndarray matrix or dataframe to perform the ipfn on.\n * aggregates: list of numpy array or ndarray or pandas dataframe/series. 
The aggregates are the same as the marginals.\nThey are the target values that we want along one or several axes when aggregating along one or several axes.\n * dimensions: list of lists with integers if working with numpy objects, or column names if working with pandas objects.\nPreserved dimensions along which we sum to get the corresponding aggregates.\n * convergence_rate: if there are many aggregates/marginals, it could be useful to loosen the convergence criterion.\n * max_iteration: Integer. Maximum number of iterations allowed.\n * verbose: integer 0, 1 or 2. Each case number includes the outputs of the previous case numbers.\n\n * 0: Updated matrix returned.\n\n * 1: Flag with the output status (0 for failure and 1 for success).\n\n * 2: dataframe with iteration numbers and convergence rate information at all steps.\n\n * rate_tolerance: float value. If above 0.0, like 0.001, the algorithm will stop once the difference between the conv_rate variable of 2 consecutive iterations is below that specified value.\n\nExample with the numpy version of the algorithm:\n------------------------------------------------\nPlease follow the example below to run the package. Several additional examples are listed in the ipfn.py script. 
This example is taken from ``_\n\nFirst, let us define a matrix of N=3 dimensions, the matrix being of specific size 2*4*3 and populate that matrix with some values ::\n\n from ipfn import ipfn\n import numpy as np\n import pandas as pd\n\n m = np.zeros((2,4,3))\n m[0,0,0] = 1\n m[0,0,1] = 2\n m[0,0,2] = 1\n m[0,1,0] = 3\n m[0,1,1] = 5\n m[0,1,2] = 5\n m[0,2,0] = 6\n m[0,2,1] = 2\n m[0,2,2] = 2\n m[0,3,0] = 1\n m[0,3,1] = 7\n m[0,3,2] = 2\n m[1,0,0] = 5\n m[1,0,1] = 4\n m[1,0,2] = 2\n m[1,1,0] = 5\n m[1,1,1] = 5\n m[1,1,2] = 5\n m[1,2,0] = 3\n m[1,2,1] = 8\n m[1,2,2] = 7\n m[1,3,0] = 2\n m[1,3,1] = 7\n m[1,3,2] = 6\n\nNow, let us define some marginals::\n\n xipp = np.array([52, 48])\n xpjp = np.array([20, 30, 35, 15])\n xppk = np.array([35, 40, 25])\n xijp = np.array([[9, 17, 19, 7], [11, 13, 16, 8]])\n xpjk = np.array([[7, 9, 4], [8, 12, 10], [15, 12, 8], [5, 7, 3]])\n\nI used the letter p to denote the dimension(s) being summed over\n\nFor this specific example, they all have to be less than N=3 dimensions and be consistent with the dimensions of contingency table m. For example, the marginal along the first dimension will be made of 2 elements. 
We want the sum of elements in m for dimensions 2 and 3 to equal the marginal::\n\n m[0,:,:].sum() == xipp[0]\n m[1,:,:].sum() == xipp[1]\n\nDefine the aggregates list and the corresponding list of dimensions to tell the algorithm which dimension(s) to sum over for each aggregate::\n\n aggregates = [xipp, xpjp, xppk, xijp, xpjk]\n dimensions = [[0], [1], [2], [0, 1], [1, 2]]\n\nFinally, run the algorithm::\n\n IPF = ipfn.ipfn(m, aggregates, dimensions)\n m = IPF.iteration()\n print(xijp[0,0])\n print(m[0, 0, :].sum())\n\n\nExample with the pandas version of the algorithm:\n-------------------------------------------------\nIn the same fashion, we can run a similar example, but using a dataframe::\n\n from ipfn import ipfn\n import numpy as np\n import pandas as pd\n\n m = np.array([1., 2., 1., 3., 5., 5., 6., 2., 2., 1., 7., 2.,\n 5., 4., 2., 5., 5., 5., 3., 8., 7., 2., 7., 6.], )\n dma_l = [501, 501, 501, 501, 501, 501, 501, 501, 501, 501, 501, 501,\n 502, 502, 502, 502, 502, 502, 502, 502, 502, 502, 502, 502]\n size_l = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4,\n 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]\n\n age_l = ['20-25','30-35','40-45',\n '20-25','30-35','40-45',\n '20-25','30-35','40-45',\n '20-25','30-35','40-45',\n '20-25','30-35','40-45',\n '20-25','30-35','40-45',\n '20-25','30-35','40-45',\n '20-25','30-35','40-45']\n\n df = pd.DataFrame()\n df['dma'] = dma_l\n df['size'] = size_l\n df['age'] = age_l\n df['total'] = m\n\n xipp = df.groupby('dma')['total'].sum()\n xpjp = df.groupby('size')['total'].sum()\n xppk = df.groupby('age')['total'].sum()\n xijp = df.groupby(['dma', 'size'])['total'].sum()\n xpjk = df.groupby(['size', 'age'])['total'].sum()\n # xppk = df.groupby('age')['total'].sum()\n\n xipp.loc[501] = 52\n xipp.loc[502] = 48\n\n xpjp.loc[1] = 20\n xpjp.loc[2] = 30\n xpjp.loc[3] = 35\n xpjp.loc[4] = 15\n\n xppk.loc['20-25'] = 35\n xppk.loc['30-35'] = 40\n xppk.loc['40-45'] = 25\n\n xijp.loc[501] = [9, 17, 19, 7]\n xijp.loc[502] = [11, 13, 16, 8]\n\n 
xpjk.loc[1] = [7, 9, 4]\n xpjk.loc[2] = [8, 12, 10]\n xpjk.loc[3] = [15, 12, 8]\n xpjk.loc[4] = [5, 7, 3]\n\n aggregates = [xipp, xpjp, xppk, xijp, xpjk]\n dimensions = [['dma'], ['size'], ['age'], ['dma', 'size'], ['size', 'age']]\n\n IPF = ipfn.ipfn(df, aggregates, dimensions)\n df = IPF.iteration()\n\n print(df)\n print(df.groupby('size')['total'].sum(), xpjp)\n\nAdded notes:\n------------\n\nTo call the algorithm in a program, execute::\n\n from ipfn import ipfn\n\n\n", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/Dirguis/ipfn.git", "keywords": "iterative proportional fitting ipfp biproportional ras raking scaling", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "ipfn", "package_url": "https://pypi.org/project/ipfn/", "platform": "Any", "project_url": "https://pypi.org/project/ipfn/", "project_urls": { "Homepage": "https://github.com/Dirguis/ipfn.git" }, "release_url": "https://pypi.org/project/ipfn/1.3.0/", "requires_dist": [ "pandas", "numpy" ], "requires_python": "", "summary": "Iterative Proportional Fitting with N dimensions, for python", "version": "1.3.0" }, "last_serial": 5249678, "releases": { "1.1.6": [ { "comment_text": "", "digests": { "md5": "137b15e382991d3de5ec6eb0985f2e6b", "sha256": "302a2d6a7a1845c8e874914dbe948d42bd88d373f881234983190ac72ddec3cb" }, "downloads": -1, "filename": "ipfn-1.1.6-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "137b15e382991d3de5ec6eb0985f2e6b", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 10257, "upload_time": "2017-01-15T19:42:45", "url": "https://files.pythonhosted.org/packages/99/17/89bd84bdc468f16ccded499fcb44cfbe8cb01276a11ef936ee02dd94f63d/ipfn-1.1.6-py2.py3-none-any.whl" } ], "1.1.7": [ { "comment_text": "", "digests": { "md5": "5c2c1ae6e010a1cadfb2403c11f1ce71", "sha256": 
"0d9826d756b532710905a562f798ce50f0849a3e6c2b6b2da3f91a07a4105fff" }, "downloads": -1, "filename": "ipfn-1.1.7-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "5c2c1ae6e010a1cadfb2403c11f1ce71", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 10285, "upload_time": "2017-02-11T03:12:05", "url": "https://files.pythonhosted.org/packages/7c/7e/5679bc5f276db1ea78d62eb48fb8f817e0789cfff82717c3a7ac2e7044bd/ipfn-1.1.7-py2.py3-none-any.whl" } ], "1.2.0": [ { "comment_text": "", "digests": { "md5": "466990ea467d599eef453d515b73f12a", "sha256": "45d38d1fda84e861925c6b754623af3b4ad623bf1e5010cff91fc5669255df44" }, "downloads": -1, "filename": "ipfn-1.2.0-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "466990ea467d599eef453d515b73f12a", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 11640, "upload_time": "2018-03-26T17:31:41", "url": "https://files.pythonhosted.org/packages/4e/7a/2ac76c22c357d83cdc29cfb1bae01411f687941c5b8ca91b386705ec200c/ipfn-1.2.0-py2.py3-none-any.whl" } ], "1.3.0": [ { "comment_text": "", "digests": { "md5": "dfb956c39d70ef07defa503845796dfb", "sha256": "4ca31fc6f188ad55dff30840740f65f6dff3c4ca58a0394219a4fae768943e2c" }, "downloads": -1, "filename": "ipfn-1.3.0-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "dfb956c39d70ef07defa503845796dfb", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 7217, "upload_time": "2019-05-09T22:13:07", "url": "https://files.pythonhosted.org/packages/03/cc/cc081ea80cabb764783a781d28f8b736b58c3edf2551ef39c59f0b249911/ipfn-1.3.0-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "245b483cda170013385ec89cdbe2914c", "sha256": "e15d68f06dd90a2b5ff03894c6b5d1079906e8c87f1aae052fdbc8925ba816bc" }, "downloads": -1, "filename": "ipfn-1.3.0.tar.gz", "has_sig": false, "md5_digest": "245b483cda170013385ec89cdbe2914c", "packagetype": "sdist", "python_version": "source", 
"requires_python": null, "size": 7907, "upload_time": "2019-05-09T22:13:09", "url": "https://files.pythonhosted.org/packages/41/38/bfebbc5b0776651fd55e2a4c69c12bfd6b6af1ed4e31c1d5151788a47ade/ipfn-1.3.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "dfb956c39d70ef07defa503845796dfb", "sha256": "4ca31fc6f188ad55dff30840740f65f6dff3c4ca58a0394219a4fae768943e2c" }, "downloads": -1, "filename": "ipfn-1.3.0-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "dfb956c39d70ef07defa503845796dfb", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 7217, "upload_time": "2019-05-09T22:13:07", "url": "https://files.pythonhosted.org/packages/03/cc/cc081ea80cabb764783a781d28f8b736b58c3edf2551ef39c59f0b249911/ipfn-1.3.0-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "245b483cda170013385ec89cdbe2914c", "sha256": "e15d68f06dd90a2b5ff03894c6b5d1079906e8c87f1aae052fdbc8925ba816bc" }, "downloads": -1, "filename": "ipfn-1.3.0.tar.gz", "has_sig": false, "md5_digest": "245b483cda170013385ec89cdbe2914c", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 7907, "upload_time": "2019-05-09T22:13:09", "url": "https://files.pythonhosted.org/packages/41/38/bfebbc5b0776651fd55e2a4c69c12bfd6b6af1ed4e31c1d5151788a47ade/ipfn-1.3.0.tar.gz" } ] }