{ "info": { "author": "Michael Christoph Nowotny", "author_email": "nowotnym@gmail.com", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: MIT License", "Programming Language :: Python", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7", "Programming Language :: Python :: Implementation :: CPython" ], "description": "\n# Recombinator - Statistical Resampling in Python\n\n\n## Overview\n\nRecombinator is a Python package for statistical resampling in Python. It provides various algorithms for the iid bootstrap, the block bootstrap, as well as optimal block-length selection. \n\n## Algorithms\n\n* I.I.D. bootstrap: Standard i.i.d. bootstrap for one-dimensional and multi-dimensional data, balanced bootstrap, anthithetic bootstrap \n* Block based bootstrap: Moving Block Bootstrap, Circular Block Bootstrap, Stationary Bootstrap, Tapered Block-Bootstrap \n* Optimal block-length selection algorithm for Circular Block Bootstrap and Stationary Bootstrap\n\n## Table of Contents\n\n1. [Installation](#installation)\n2. [Getting Started](#getting-started)\n\n\n## Installation\n### Latest Release\n
\n    pip install recombinator\n
\nor \n
\n    pip3 install recombinator\n
\nif not using Anaconda.\n\nTo get the latest version, clone the repository from github, \nopen a terminal/command prompt, navigate to the root folder and install via\n
\n    pip install .\n
\nor \n
\n    pip3 install . \n
\nif not using Anaconda.\n\n### Most Recent Version on GitHub\n1. Clone the github repository via\n\n
\n    git clone https://github.com/InvestmentSystems/recombinator.git\n
\n\n2. Navigate to the Recombinator base directory and run\n
\n    pip install .\n
\n\n## Getting Started\nPlease see the Jupyter notebooks 'notebooks/Block Bootstrap.ipynb' and 'notebooks/IID Bootstrap.ipynb' for more examples.\n\n### Basic I.I.D. Bootstrap\nImport modules\n
\nimport matplotlib\nimport matplotlib.pyplot as plt\n%matplotlib inline\n\nimport numpy as np\n\nfrom recombinator.iid_bootstrap import \\\n    iid_balanced_bootstrap, \\\n    iid_bootstrap, \\\n    iid_bootstrap_vectorized, \\\n    iid_bootstrap_via_choice, \\\n    iid_bootstrap_via_loop, \\\n    iid_bootstrap_with_antithetic_resampling\n
\n\nFor illustrative purposes, generate an original dataset of sample size n=100 to resample from.\n
\nn=100\nnp.random.seed(1)\nx = np.abs(np.random.randn(n))\n
\n\nEstimate the 75th percentile from the original sample\n
\npercentile = 75\noriginal_statistic = np.percentile(x, percentile)\nprint(original_statistic)\n
\n\nNow generate 100000 new bootstrap samples by drawing from the original sample with replacement.\n
\nR = 100000\nx_resampled = iid_bootstrap(x, replications=R)\n
\nThis produces a 100000 x 100 dimensional NumPy array. Each row is a new sample.\n\nIf we instead wanted to resample without replacement, we could draw shorter samples. \nThis is known as as subsampling. See these lecture notes by Charles Geyer for statistical background: \nhttp://www.stat.umn.edu/geyer/5601/notes/sub.pdf. \n\nIn recombinator, subsampling is achieved by using the keyword arguments \"sub_sample_length\" and \"replace\". \nTo draw samples of size 50 without replacement we would write\n
\nx_resampled = iid_bootstrap(x, replications=R, sub_sample_length=50, replace=False)\n
\n\nLet us return to the original example of resampling at the full sample size with replacement.\nWe are interested in the sampling distribution of a statistic (in this example the 75th percentile of the distribution).\nHence, we calculate the statistic on each of the new bootstrap samples:\n
\nresampled_statistic = np.percentile(x_resampled, percentile, axis=1)\n
\n\nWe can use Recombinator to estimate the bootstrap standard error and the 95% \nconfidence interval of the statistic as follows.\n
\nfrom recombinator.statistics import \\\n    estimate_confidence_interval_from_bootstrap, \\\n    estimate_standard_error_from_bootstrap\n
\n\nEstimate the standard error of the estimate of the 75th percentile via bootstrap:\n
\nestimate_standard_error_from_bootstrap(bootstrap_estimates=resampled_statistic,\n                                       original_estimate=original_statistic)\n
\n\nEstimate the 95% confidence interval of the 75th percentile via bootstrap:\n
\nestimate_confidence_interval_from_bootstrap(bootstrap_estimates=resampled_statistic, \n                                            confidence_level=95)\n
\n\nRecombinator supports other variations of the standard i.i.d. bootstrap \n(the balanced and the antithetic bootstrap). Please see the Jupyter notebook on the I.I.D. boostrap for examples.\n\n### Block-Based Bootstrap for Time-Series\nRecombinator offers the following block-based approaches to resample temporally dependent data:\n* Moving Block Bootstrap - Kuensch (1989)\n* Circular Block Bootstrap - Politis and Romano (1992)\n* Stationary Bootstrap - Politis and Romano (1994)\n* Tapered Block Bootstrap - Paparoditis and Politis (2001) \n\nImport statsmodels for the estimation of time-series models.\n
\nfrom statsmodels.tsa.arima_process import ArmaProcess\nfrom statsmodels.tsa.ar_model import AR\nimport statsmodels.api as sm\nfrom statsmodels.graphics.tsaplots import plot_acf\n
\n\n\nGenerate a sample of length n=1000 from an AR(1) process with autoregressive coefficient 0.5:\n
\nnp.random.seed(1)\n\n# number of time periods\nT = 1000\n\n# draw random errors\ne = np.random.randn(T)\ny = np.zeros((T,))\n\n# y is an AR(1) with phi_1 = 0.5\nphi_1 = 0.5\ny[0] = e[0]*np.sqrt(1.0/(1.0-phi_1**2))\nfor t in range(1, T):\n    y[t]=phi_1*y[t-1] + e[t]\n
\n\n\n#### Optimal Block-Length\nIn order to preserve temporal dependence in time-series data, \nbootstrap algorithms sample from the original data in blocks rather than \nsampling single observations. \nA key question is what block length to use. \nWe are using a data based block length selection algorithm due to Politis and White (2004) \nwith corrections by Patton, Politis, and White (2007). \nThis algorithm produces optimal block-lengths for the circular block bootstrap and the stationary bootstrap for the estimation of the variance of the mean.\n\nImport the optimal block-length selection functionality:\n
\nfrom recombinator.optimal_block_length import optimal_block_length\n
\n\nCompute the optimal block length of the sample from the AR(1) process created above:\n
\nb_star = optimal_block_length(y)\nb_star_sb = b_star[0].b_star_sb\nb_star_cb = math.ceil(b_star[0].b_star_cb)\nprint(f'optimal block length for stationary bootstrap = {b_star_sb}')\nprint(f'optimal block length for circular bootstrap = {b_star_cb}')\n
\n\n#### Resampling using a Block-Based Bootstrap\nWe illustrate the procedure using the circular-block bootstrap.\n
\nfrom recombinator.block_bootstrap import circular_block_bootstrap\n
\n\nThe true autoregressive coefficient of the process we simulated is 0.5. \nThe estimated coefficient from the simulated time-series is obtained as follows\n
\nar = AR(y)\nestimate_from_original_data = ar.fit(maxlag=1)\nprint(estimate_from_original_data.params[1])\n
\n\nStatsmodels can produce an estimate of the standard error of the estimate of the autoregressive coefficient resorting to asymptotic theory:\n
\nprint(estimate_from_original_data.bse[1])\n
\n\nGenerate 10000 new time-series by resampling using a circular-block bootstrap\n
\n# number of replications for bootstraps (number of resampled time-series to generate)\nB = 10000\n\ny_star_cb \\\n    = circular_block_bootstrap(y, \n                               block_length=b_star_cb, \n                               replications=B, \n                               replace=True)\n
\n\nNow estimate the AR coefficient on each of the bootstrap samples\n
\nestimates_from_bootstrap = []\nar_estimates_from_bootstrap = np.zeros((B, ))\n\nfor b in range(B):\n    y_bootstrap = y_star_cb[b, :]\n    ar_bootstrap = AR(y_bootstrap)\n    estimate_from_bootstrap = ar_bootstrap.fit(maxlag=1)\n    estimates_from_bootstrap.append(estimate_from_bootstrap)\n    ar_estimates_from_bootstrap[b] = estimate_from_bootstrap.params[1]\n
\n\nPlot the sampling distribution and compute its mean and median\n
\nplt.hist(ar_estimates_from_bootstrap, bins=20)\nprint(f'mean={np.mean(ar_estimates_from_bootstrap)}')\nprint(f'median={np.median(ar_estimates_from_bootstrap)}')\n
\n\nIt turns out that the mean and median are below the AR coefficient estimated on the original sample. \nThis is due to the fact that the block bootstrap approach breaks the temporal \ndependence whenever a new block starts. \nOne can reduce this effect by choosing a higher block-length at the expense of \nreducing the number of possible permutations of the data.\n\nAnother way to reduce this issue is to use a tapered-block bootstrap which is \ndesigned to mitigate artificial structural breaks introduced by the resampling \nprocedure at the block-transition points.\n\nImport the function\n
\nfrom recombinator.tapered_block_bootstrap import tapered_block_bootstrap\n
\n\nRun the tapered-block bootstrap\n
\ny_star_tbb \\\n    = tapered_block_bootstrap(y, \n                              block_length=b_star_cb, \n                              replications=B)\n
\n\nAgain estimate the AR coefficient on each of the new bootstrap samples\n
\nestimates_from_bootstrap = []\nar_estimates_from_bootstrap = np.zeros((B, ))\n\nfor b in range(B):\n    y_bootstrap = y_star_tbb[b, :]\n    ar_bootstrap = AR(y_bootstrap)\n    estimate_from_bootstrap = ar_bootstrap.fit(maxlag=1)\n    estimates_from_bootstrap.append(estimate_from_bootstrap)\n    ar_estimates_from_bootstrap[b] = estimate_from_bootstrap.params[1]\n
\n\n... and plot the sampling distribution\n
\nplt.hist(ar_estimates_from_bootstrap, bins=20)\nprint(f'mean={np.mean(ar_estimates_from_bootstrap)}')\nprint(f'median={np.median(ar_estimates_from_bootstrap)}')\n
\n\nThe mean and median of the distribution are now much closer to the estimate from the original time series.\n\n### Running Recombinator on a GPU\nResampling can be a computationally intensive task, which is highly parallelizable. \nThe implementations of various algorithms fall into two broad categories:\n* Loop-based (via Numba)\n* Vectorized (via NumPy)\n\nVectorized implementations can be run on the GPU depending on the availability a \nGPU package with a NumPy compatible interface. \nTo this end, vectorized implementations in Recombinator lets the user specify \nalternative modules and functions that are to be used internally.\n\nA NumPy compatible package that supports both CUDA and OpenCL is Cocos \navailable at https://github.com/michaelnowotny/cocos.\n\nUsing Cocos, resampling on the GPU using Recombinator can be performed as follows.\n\nImport the NumPy-like Cocos package. This requires an ArrayFire installation, a compatible GPU, and the installation of Cocos.\n
\nimport cocos.numerics as cn\nfrom cocos.device import gpu_sync_wrapper, sync\n
\n\n#### I.I.D. Boostrap\nGenerate an original sample to work with\n
\nn = 100\ncn.random.seed(1)\nx_gpu = cn.absolute(cn.random.randn(n))\nplt.hist(x_gpu);\n
\n\nResample using an i.i.d. boostrap\n
\nx_resampled_vectorized_gpu \\\n    = iid_bootstrap_vectorized(x_gpu, \n                               replications=R, \n                               randint=cn.random.randint)\nsync()\n
\n\nIn this case, GPU support is achieved by simply specifying an alternative \nimplementation of NumPy's randint function.\n\n#### Block-Based Bootstrap\nTransfer the array created in the time-series example above to the GPU\n
\ny_gpu = cn.array(y)\n
\n\nRun a circular-block boostrap on the GPU\n
\ny_star_cb \\\n    = circular_block_bootstrap_vectorized(y_gpu, \n                                          block_length=b_star_cb, \n                                          replications=B, \n                                          replace=True, \n                                          num_pack=cn, \n                                          choice=cn.random.choice)\n
\n\nEstimate the AR coefficient on each of the bootstrap samples\n
\nestimates_from_bootstrap = []\nar_estimates_from_bootstrap = np.zeros((len(y_star_cb), ))\n\nfor b in range(len(y_star_cb)):\n    y_bootstrap = np.array(y_star_mb[b, :].squeeze())\n    ar_bootstrap = AR(y_bootstrap)\n    estimate_from_bootstrap = ar_bootstrap.fit(maxlag=1)\n    estimates_from_bootstrap.append(estimate_from_bootstrap)\n    ar_estimates_from_bootstrap[b] = estimate_from_bootstrap.params[1]\n
\n\nPlot the empirical sampling distribution and the compute its mean and median\n
\nplt.hist(ar_estimates_from_bootstrap, bins=20)\nprint(f'mean={np.mean(ar_estimates_from_bootstrap)}')\nprint(f'median={np.median(ar_estimates_from_bootstrap)}')\n
\n\nFor the block-based bootstrap, GPU support requires the specification of a \nfunction compatible with NumPy's random.choice as well as a package \n(num_pack) that is compatible with NumPy.\n\n#### Cocos-Specific Convenience Functions\nThe module recombinator.bootstrap_cocos provides Cocos specific wrapper-functions \nfor GPU enabled vectorized bootstrap procedures.\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/michaelnowotny/recombinator", "keywords": "", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "recombinator", "package_url": "https://pypi.org/project/recombinator/", "platform": "", "project_url": "https://pypi.org/project/recombinator/", "project_urls": { "Homepage": "https://github.com/michaelnowotny/recombinator" }, "release_url": "https://pypi.org/project/recombinator/0.0.3/", "requires_dist": [ "cocos", "numba", "numpy", "pandas", "scipy", "pytest" ], "requires_python": ">=3.6.0", "summary": "Recombinator - Statistical Resampling in Python", "version": "0.0.3" }, "last_serial": 5543963, "releases": { "0.0.1": [ { "comment_text": "", "digests": { "md5": "9605e3ff60b9ea7f956fb9dca6de6181", "sha256": "8b5b976156a4bc4527af38922a927e85a264a4c8e8183afe0d8a118147acc0a5" }, "downloads": -1, "filename": "recombinator-0.0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "9605e3ff60b9ea7f956fb9dca6de6181", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6.0", "size": 20466, "upload_time": "2019-07-03T01:29:16", "url": "https://files.pythonhosted.org/packages/57/4f/6db305df2347f466f0b17f7579644c1c846ef76a2ef944a80dcf64bcdc25/recombinator-0.0.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "ea66dcfb552a7686166664464e5c6c70", "sha256": "df6f7c1d75151ec16a127a00ab385fe1fbf2045cc931875a93e997e178443399" }, "downloads": -1, "filename": "recombinator-0.0.1.tar.gz", "has_sig": false, "md5_digest": "ea66dcfb552a7686166664464e5c6c70", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6.0", "size": 16471, "upload_time": "2019-07-03T01:29:18", "url": "https://files.pythonhosted.org/packages/9a/a3/37e4a06dbafa97e64a5231e24244bba927a9f2c6cef6a84ffbd5180531e1/recombinator-0.0.1.tar.gz" } ], "0.0.2": [ { "comment_text": "", "digests": { "md5": "b841cf4602e9fa9ee10b3acad8c9b770", "sha256": "b9b177b04d46104c5918f4611fb958b5a97482692758dbf4239a6c9d89b120f2" }, "downloads": -1, "filename": "recombinator-0.0.2-py3-none-any.whl", "has_sig": false, "md5_digest": "b841cf4602e9fa9ee10b3acad8c9b770", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6.0", "size": 22094, "upload_time": "2019-07-09T09:34:38", "url": "https://files.pythonhosted.org/packages/3d/83/7a7a224b3e5a9b4270873462104444ad339be47ac4c56225238780bf7da5/recombinator-0.0.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "212a41a24d496603db0e01e0bdbff9f6", "sha256": "65e7ceaacee0ea43db168b7e2368f52dd80c64f69ee5d0b10e65da063f847669" }, "downloads": -1, "filename": "recombinator-0.0.2.tar.gz", "has_sig": false, "md5_digest": "212a41a24d496603db0e01e0bdbff9f6", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6.0", "size": 17128, "upload_time": "2019-07-09T09:34:40", "url": "https://files.pythonhosted.org/packages/71/89/ac7fab6f6269848e70305115584258e0fee0425cc770f1d913f3efc32e33/recombinator-0.0.2.tar.gz" } ], "0.0.3": [ { "comment_text": "", "digests": { "md5": "d013881e9c9083746f6584c5aef72292", "sha256": "8ce343f4fba3bb6e66424e5cbca1c4b2f18d06a87275eae92567ea925b3a873e" }, "downloads": -1, "filename": "recombinator-0.0.3-py3-none-any.whl", "has_sig": false, "md5_digest": "d013881e9c9083746f6584c5aef72292", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6.0", "size": 28415, "upload_time": "2019-07-17T04:23:52", "url": "https://files.pythonhosted.org/packages/e3/79/4da0c103153555dd7fca4cfd9173c2a9eddc118208b14370e177ce8737eb/recombinator-0.0.3-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "15e58b08d76bc496d1d8e5a09c3d3430", "sha256": "3483eb2b0831ccb8f21afdbece6e23dbd44529178e0c3bb96cf3e0f5412fb685" }, "downloads": -1, "filename": "recombinator-0.0.3.tar.gz", "has_sig": false, "md5_digest": "15e58b08d76bc496d1d8e5a09c3d3430", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6.0", "size": 23688, "upload_time": "2019-07-17T04:23:54", "url": "https://files.pythonhosted.org/packages/a2/77/78bc818f5405db0b80d3cb01b28453e1711543aa36ff32fec616ad053a57/recombinator-0.0.3.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "d013881e9c9083746f6584c5aef72292", "sha256": "8ce343f4fba3bb6e66424e5cbca1c4b2f18d06a87275eae92567ea925b3a873e" }, "downloads": -1, "filename": "recombinator-0.0.3-py3-none-any.whl", "has_sig": false, "md5_digest": "d013881e9c9083746f6584c5aef72292", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6.0", "size": 28415, "upload_time": "2019-07-17T04:23:52", "url": "https://files.pythonhosted.org/packages/e3/79/4da0c103153555dd7fca4cfd9173c2a9eddc118208b14370e177ce8737eb/recombinator-0.0.3-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "15e58b08d76bc496d1d8e5a09c3d3430", "sha256": "3483eb2b0831ccb8f21afdbece6e23dbd44529178e0c3bb96cf3e0f5412fb685" }, "downloads": -1, "filename": "recombinator-0.0.3.tar.gz", "has_sig": false, "md5_digest": "15e58b08d76bc496d1d8e5a09c3d3430", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6.0", "size": 23688, "upload_time": "2019-07-17T04:23:54", "url": "https://files.pythonhosted.org/packages/a2/77/78bc818f5405db0b80d3cb01b28453e1711543aa36ff32fec616ad053a57/recombinator-0.0.3.tar.gz" } ] }