{ "info": { "author": "David Nielsen", "author_email": "davidnielsen@id.uff.br", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python :: 3" ], "description": "# pystuff\n\n[![GitHub version](https://badge.fury.io/gh/davidmnielsen%2Fpystuff.svg)](https://badge.fury.io/gh/davidmnielsen%2Fpystuff) [![PyPI version](https://badge.fury.io/py/pystuff.svg)](https://badge.fury.io/py/pystuff)\n\nSome useful functions for data analysis in python.\n\n- [Introduction](#introduction)\n- [Handling time series](#handling-time-series) \n- [Periodogram](#periodogram)\n- [Principal Component Analysis (PCA)](#principal-component-analysis)\n- [Ensemble Subsampling](#ensemble-subsampling)\n- [Statistics](#statistics)\n\n## Introduction\n\n#### Installation\n```\n$ pip install pystuff\n```\n\n#### Import pystuff\n```python\n>>> import pystuff as ps\n```\n\n#### Import auxiliary packages\n```python\nimport numpy as np\nimport xarray as xr\nimport pandas as pd\nimport matplotlib.pyplot as plt\n```\n\n#### Read example data\n```python\n# ERA-Interim SST anomalies, weighted by sqrt(cos(lat))\npathfile='/work/uo1075/u241292/data/ERAI/atmo/monavg_sst_w2_anom.nc'\nds=xr.open_dataset(pathfile)\nlat=ds['lat'].values\nlon=ds['lon'].values\nmyvar=ds['sst'].values\ntime=pd.to_datetime(ds['time'].values)\n```\n\n## Handling time series\n\n```python\n# Area Average\nnasst, sasst_map = ps.getbox([0,60,280,360],lat,lon,myvar,returnmap=True)\n\n# Standardize (center=True is default)\nnasstn = ps.standardize(nasst)\n\n# Running Mean\nnasstn_rm = ps.runmean(nasstn,window=5*12)\nnasstn_rm_fill = ps.runmean(nasstn,window=5*12,fillaround=True)\n\n# Detrend (redundant with ddreg, to some extent)\n# If returnTrend=False, only detended series is returned\nnasstndt, slope, trend = ps.ddetrend(nasstn, returnTrend=True)\n\n# The trend line can also be taken from:\ntrend2 = ps.ddreg(range(len(nasst)),nasst)\n\n# Annual Mean\nannual=ps.annualmean(nasstndt)\n\n# Low-pass Lanczos Filter\ndt=12 # month\ncutoff=5 # years\nlow, low_nonan = ps.lanczos(nasstndt,dt*cutoff,returnNonan=True)\n\n# Example Plot\nfig = plt.figure(figsize=(9,4.5))\n\nax=fig.add_subplot(1,2,1)\nps.nospines(ax)\nplt.axhline(0,color='grey',ls='--')\nplt.plot(time,nasstn,'b',lw=0.2, label='Monthly NASST')\nplt.plot(time,nasstn_rm,'b',lw=2, label='5-year running mean')\nplt.plot(time,low+trend,'g',lw=2, label='5-year low-pass Lanczos filter')\nplt.plot(time,trend,'-r', lw=2, label='Trend line')\nplt.text(0.1,0.8,'Trend=%.2f std year$^{-1}$' %(float(slope)*12), color='r',\n transform=plt.gcf().transFigure)\nplt.ylim((-2.5,2.5))\nplt.ylabel('std. units')\nplt.title('NASST')\nplt.title('a', loc='left',fontweight='bold', fontsize=14)\nps.leg(loc='lower right', fontsize=9)\n\nps.usetex(False)\nplt.tight_layout()\nplt.show()\n\nfig.savefig('/home/zmaw/u241292/scripts/python/pystuff/figs/timeseries.png')\t\t\n```\n\n![alt text](https://raw.githubusercontent.com/davidmnielsen/pystuff/master/figs/timeseries.png \"timeseries.png\")\n\n## Periodogram\n\n```python\nfig = plt.figure(figsize=(5,4.5))\n\nax=fig.add_subplot(111)\nps.nospines(ax)\n\n# Calculate yearly periodogram on monthly data (Monte Carlo with 10k iterations)\nf, psd, pctl, max5, meanRed = ps.periods(nasstndt, dt=12, nsim=10000)\n\nplt.plot(f, psd,'b', lw=2, label='NASST Power Spectrum')\nplt.plot(f,meanRed, 'r', lw=2, label='Mean of 10k Red-Noise Spectra')\nplt.plot(f,pctl[:,0], '--k', label='80% Percentile')\nplt.plot(f,pctl[:,1], '-k', label='90% Percentile')\nplt.plot(f,pctl[:,2], '--g', label='95% Percentile')\nplt.plot(f,pctl[:,3], '-g', label='99% Percentile')\nplt.xlabel('Frequency [year$^{-1}$]')\nplt.ylabel('PSD [Units$^{2}$ year]')\nplt.xlim((0,2.5))\nplt.title('Periodogram of NASST')\nplt.title('b', loc='left',fontweight='bold', fontsize=14)\nps.leg(fontsize=10, frameon=True)\n\n# Print maximum periods on graph\nfor i in range(5):\n t=plt.text(max5[i,0],max5[i,1],'%.0f yr' %round(max5[i,2]), color='blue')\n t.set_bbox(dict(facecolor='white', alpha=0.5, edgecolor='w'))\n\n\nplt.tight_layout()\nplt.show()\n\nfig.savefig('/home/zmaw/u241292/scripts/python/pystuff/figs/periodogram.png')\n```\n\n![alt text](https://raw.githubusercontent.com/davidmnielsen/pystuff/master/figs/periodogram.png \"periodogram.png\")\n\n## Principal Component Analysis (PCA)\n\n```python\n# Get data from any 3 grid cells\nx1=myvar[:,100,100]\nx2=myvar[:,100,200]\nx3=myvar[:,100,300]\nX=np.transpose([x1,x3,x2]) # np.shape(X) = (468, 3)\n\n# Calculate PCA\nscores, eigenvals, eigenvecs, expl, expl_acc, means, stds, north, loadings = ps.ddpca(X)\n\n# Combo-plot\nfrom matplotlib.gridspec import GridSpec\ngs = GridSpec(nrows=2, ncols=4)\nf=plt.figure(figsize=(6,6))\n\nax=f.add_subplot(gs[0, 0:2])\nps.usetex()\nps.nospines(ax)\nplt.errorbar(np.arange(1,len(expl)+1),expl,yerr=[north,north], fmt='o',color='b',markeredgecolor='b')\nfor i in range(3):\n plt.text(i+1.2,expl[i],'$%.1f \\pm %.1f$' %(expl[i],north[i]))\nplt.xticks(np.arange(1,len(expl)+1,1))\nplt.xlim((0.5,4))\nplt.xlabel('PC')\nplt.ylabel('Explained Variance [\\%]')\nplt.text(0.6,46.5,'a',fontsize=14,fontweight='heavy')\n\nax=f.add_subplot(gs[0, 2:])\nps.nospines(ax)\nwdt=0.15\nplt.axhline(0,color='k')\nplt.bar(np.arange(3)-wdt,eigenvecs[0,:],facecolor='r',edgecolor='r',width=wdt)\nplt.bar(np.arange(3) ,eigenvecs[1,:],facecolor='g',edgecolor='g',width=wdt)\nplt.bar(np.arange(3)+wdt,eigenvecs[2,:],facecolor='b',edgecolor='b',width=wdt)\nplt.ylabel('Eigenvector values')\nplt.xticks(np.arange(3)+0.125,['$1$','$2$','$3$'],usetex=True)\nplt.xlabel('PC')\nplt.axvline(0.625,color='lightgrey',ls='--')\nplt.axvline(1.625,color='lightgrey',ls='--')\nplt.text(0.8,0.5,'Series 1',color='r')\nplt.text(0.8,0.4,'Series 2',color='g')\nplt.text(0.8,0.3,'Series 3',color='b')\nplt.text(-0.4,0.9,'b',fontsize=14,fontweight='heavy')\n\nax1=f.add_subplot(gs[1, 1:-1])\nax1.set_xlim((-3,3)); plt.ylim((-3,3))\nax1.set(xlabel='PC1', ylabel='PC2')\nax1.axvline(0,color='k')\nax1.axhline(0,color='k')\nax1.plot(scores[:,0],scores[:,1],'o',markersize=3,\n markeredgecolor='lightgrey',markerfacecolor='lightgrey')\nax1.text(-2.8,2.5,'c',fontsize=15,fontweight='heavy')\nax2 = ps.twinboth(ax1)\nax2.set_xlim((-1.5,1.5)); plt.ylim((-1.5,1.5))\nax2.set_xlabel('PC1 Loadings', labelpad=3)\nax2.set_ylabel('PC2 Loadings', labelpad=14)\nax2.arrow(0,0,loadings[0,0],loadings[0,1],width=0.005,color='r',lw=2)\nax2.arrow(0,0,loadings[1,0],loadings[1,1],width=0.005,color='g',lw=2)\nax2.arrow(0,0,loadings[2,0],loadings[2,1],width=0.005,color='b',lw=2)\n\nplt.tight_layout()\nplt.show()\n\nf.savefig('figs/pca.png')\n```\n![alt text](https://raw.githubusercontent.com/davidmnielsen/pystuff/master/figs/pca.png \"pca.png\")\n\n## Ensemble Subsampling\n\n```python\n# Create a fake ensemble of nsim members of red noise (preserving the lag-dt) correlation\nnsim=1000\ndt=1\nens=ps.rednoise(len(nasstndt),ps.rhoAlt(nasstndt, dt=dt), nsim=nsim)\n\n# Get Percentiles of Ensemble Spread\nspread=ensPctl(ens,pctl=[0.025,0.975])\n\n# Get the best 5% ensemble members, based on correlation\nbestens, ensmean, nmembers = bestEns(ens,nasstndt,pctl=0.95)\n\n\n# Plot\nf=plt.figure(figsize=(8,6))\n\nax=f.add_subplot(2,1,1)\nps.nospines(ax)\nplt.plot(time, ens,'lightgrey', lw=0.5)\n# plt.fill_between(time, spread[:,0],spread[:,1],facecolor='k',alpha=0.1,label='2 std')\nplt.plot(time, ens[:,1],'lightgrey', lw=1, label='All 10k Ensemble Members')\nplt.plot(time, ens[:,1],'r', lw=0.5, label='Member 1')\nplt.plot(time, np.nanmean(ens,axis=1),'k', lw=2, label='Full Ensemble Mean')\nplt.plot(time, nasstndt,'b',lw=1,label='NASST')\nplt.ylim((-6,6))\nplt.ylabel('s.d.')\nplt.title('Full Ensemble (10k Members)')\nps.leg(loc='upper right', frameon=True)\n\nax=f.add_subplot(2,1,2)\nps.nospines(ax)\nplt.plot(time,bestens[:,0],'lightgrey', label='Best 5\\% of Ensemble (50 members)')\nplt.plot(time,bestens,'lightgrey')\nplt.plot(time,ensmean,'k', lw=2, label='Best Ensemble Mean')\nplt.plot(time,nasstndt,'b',lw=1,label='NASST')\nplt.plot(time,low,'g',lw=2,label='NASST low-pass filt.')\nplt.ylim((-6,6))\nplt.ylabel('s.d.')\nplt.title('Best Ensemble (50 Members)')\nps.leg(loc='upper right', frameon=True)\n\nplt.tight_layout()\nplt.show()\nf.savefig('figs/ensemeble.png')\n```\n![alt text](https://raw.githubusercontent.com/davidmnielsen/pystuff/master/figs/ensemeble.png \"ensemeble.png\")\n\n## Statistics\n\n```python\n# Annual means of monthly values\ny=ps.annualmean(x)\n\n# Autocorrelation of lag 1\nr=ps.rho(x)\n\n# Autocorrelation of lag n\nr=ps.rhoAlt(x,dt=n)\n\n# Rednoise time series\nred=ps.rednoise(length, lag-1 correlation, nsim)\n\n# Value of percentile\nx95=getPercentile(x,pctl=0.95)\n\n# Multiple Linear Regression (Uncertainties based on t-test)\nX = [x1, x2, ... xn]\nfitted, lo, up, r2, r2adj, [coefs] = ps.mlr(X, y, stds=2, returnCoefs=False, printSummary=False)\n\n# Multiple Linear Regression (Uncertainties baed on Bootstrap)\nfitted_0, up, lo, r2adj_0, r2adj_un, [r2_0, r2_unc, coefs_0, coefs] = ps.bootsmlr(X, y, n=1000, conflev=0.95, positions='new', uncertain='Betas', details=False, printSummary=False)\n\n# Predict MLR, using coefs from the two above\npred, lo, up = ps.mlr_predict(X,coefs,stds=2)\n\n# Residuals frmo MLR\nres = ps.mlr_res(X, y)\n\n# Histogram\nbinlims, x_count = ps.ddhist(x,vmin,vmax,binsize, zeronan=True, density=False)\n\n# Order (sort ascending) x and y based on values of x\nxo, yo = ps.order(x,y)\n\n# Horizontal divergence\ndiv = ps.hdivg(u,v,lat,lon,regulargrid=True)\n\n# Vorticity (vertical component of relative vorticity)\nvor = ps.hcurl(u,v,lat,lon,regulargrid=True)\n```\n## Non-pystuff, though nice to copy-and-paste too\n\n```python\n# Save regular grid NetCDF (with Xarray)\nnewds = xr.Dataset(data_vars={'lat' : (\"lat\", lat),\n 'lon' : (\"lon\", lon,\n 'time' : (\"time\", time),\n 'slp' : ([\"time\",\"lat\",\"lon\"], slp)})\nnewds['lon'].attrs = {'standard_name' :'longitude','long_name':'longitude','units': 'degrees_east' ,'axis': 'X'}\nnewds['lat'].attrs = {'standard_name' :'latitude' ,'long_name':'latitude' ,'units': 'degrees_north' ,'axis': 'Y'}\nnewds['time'].attrs = {'standard_name' :'time' ,'long_name':'time' ,'units': 'year of January','axis': 'T'}\nnewds['slp'].attrs = {'standard_name' : 'slp', 'shortname': 'slp', 'units': 'hPa'}\nnewds.to_netcdf('slp.nc') \n\n# Save curvilinear grid (e.g. MPIOM) NetCDF\nnewds = xr.Dataset(data_vars={'lat' : ([\"y\",\"x\"], lat),\n 'lon' : ([\"y\",\"x\"], lon),\n 'time' : (\"time\", time,\n 'dist': ([\"time\",\"y\",\"x\"], dist)})\nnewds['lon'].attrs = {'standard_name':'lon', 'long_name':'longitude','units':'degrees_east', '_CoordinateAxisType':'Lon'}\nnewds['lat'].attrs = {'standard_name':'lat', 'long_name':'latitude' ,'units':'degrees_north','_CoordinateAxisType':'Lat'}\nnewds['time'].attrs = {'standard_name':'time','long_name':'time','units':'hours since 0-01-01 22:40:00','calendar':'365_day' ,'axis':'T'}\nnewds['dist'].attrs = {'standard_name':'dist','long_name':'distances from segment', 'units':'degrees','coordinates':'lon lat'}\nnewds.to_netcdf('distances_MPIOM_LR.nc') \n```\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/davidmnielsen/pystuff", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "pystuff", "package_url": "https://pypi.org/project/pystuff/", "platform": "", "project_url": "https://pypi.org/project/pystuff/", "project_urls": { "Homepage": "https://github.com/davidmnielsen/pystuff" }, "release_url": "https://pypi.org/project/pystuff/0.0.6/", "requires_dist": [ "numpy", "matplotlib", "xarray", "pandas", "scipy" ], "requires_python": "", "summary": "Some useful functions for data analysis in python.", "version": "0.0.6" }, "last_serial": 5588220, "releases": { "0.0.2": [ { "comment_text": "", "digests": { "md5": "0ccfd561c815cc1cabd50950952787f9", "sha256": "67e16522957115e708acf8e9397b34e8a4eff76e187e2cad8ed41cba18ddf090" }, "downloads": -1, "filename": "pystuff-0.0.2-py3-none-any.whl", "has_sig": false, "md5_digest": "0ccfd561c815cc1cabd50950952787f9", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 20160, "upload_time": "2019-07-26T08:34:51", "url": "https://files.pythonhosted.org/packages/3a/92/32e0a8f765c6e54fe3c5aa8e33b759a69b80ad400bfcaddc078efb340257/pystuff-0.0.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "95898fba5344afe95cc25d62460ebf3c", "sha256": "f28264af3d6fd47a86a3e1f5e6f0369088e3e32b742a262102dba0c5e0f166be" }, "downloads": -1, "filename": "pystuff-0.0.2.tar.gz", "has_sig": false, "md5_digest": "95898fba5344afe95cc25d62460ebf3c", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 359924, "upload_time": "2019-07-26T08:40:39", "url": "https://files.pythonhosted.org/packages/4a/51/01e51f42025bb55bda18e9432b0f43bb7269448b08c3e27b9e9360b2013a/pystuff-0.0.2.tar.gz" } ], "0.0.3": [ { "comment_text": "", "digests": { "md5": "f975595a2f22223668fd755866a48ab5", "sha256": "e4f7ca6724fa9729d92c557032040b8a67a93dba2ba623e55e8c27134b1f4b08" }, "downloads": -1, "filename": "pystuff-0.0.3-py3-none-any.whl", "has_sig": false, "md5_digest": "f975595a2f22223668fd755866a48ab5", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 20152, "upload_time": "2019-07-26T08:44:33", "url": "https://files.pythonhosted.org/packages/26/e4/2de4a5ab006b9dcb517576aaf47c9440d49fc810acfe7564a8edc41fc876/pystuff-0.0.3-py3-none-any.whl" } ], "0.0.4": [ { "comment_text": "", "digests": { "md5": "0bdbb8bc463f524e7a69fb9b6c7ddf45", "sha256": "aceb254633774455aedf34ad7cc6d7b6f828aea5738f518cf4e0981b5edb32c3" }, "downloads": -1, "filename": "pystuff-0.0.4-py3-none-any.whl", "has_sig": false, "md5_digest": "0bdbb8bc463f524e7a69fb9b6c7ddf45", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 20191, "upload_time": "2019-07-26T09:08:34", "url": "https://files.pythonhosted.org/packages/15/02/4e9ac2601dbf3930c9fb97687c72da4052e14abce40fa68bb767f4212bfc/pystuff-0.0.4-py3-none-any.whl" } ], "0.0.5": [ { "comment_text": "", "digests": { "md5": "47cf996867def3bd3eb0865f3d4c92e7", "sha256": "60ded8f540c586c066b060d74154a1efbcced79f3517ecffa97c7657443bfe39" }, "downloads": -1, "filename": "pystuff-0.0.5-py3-none-any.whl", "has_sig": false, "md5_digest": "47cf996867def3bd3eb0865f3d4c92e7", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 20172, "upload_time": "2019-07-26T09:29:36", "url": "https://files.pythonhosted.org/packages/6a/37/564f267b21d47226dea5842ffecde5a18c37dad9248aa8a3819dfb381627/pystuff-0.0.5-py3-none-any.whl" } ], "0.0.6": [ { "comment_text": "", "digests": { "md5": "1c70b5b4f9696e3962a824d756573ddb", "sha256": "989d97ebe300f458449ce9049c74f85ebe9fe0605cbfb454a3f807c396142edb" }, "downloads": -1, "filename": "pystuff-0.0.6-py3-none-any.whl", "has_sig": false, "md5_digest": "1c70b5b4f9696e3962a824d756573ddb", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 20319, "upload_time": "2019-07-26T10:00:26", "url": "https://files.pythonhosted.org/packages/f4/3e/d2b604def429a840dae676249288fad32a951d050e7a84e8cc03a768f46f/pystuff-0.0.6-py3-none-any.whl" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "1c70b5b4f9696e3962a824d756573ddb", "sha256": "989d97ebe300f458449ce9049c74f85ebe9fe0605cbfb454a3f807c396142edb" }, "downloads": -1, "filename": "pystuff-0.0.6-py3-none-any.whl", "has_sig": false, "md5_digest": "1c70b5b4f9696e3962a824d756573ddb", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 20319, "upload_time": "2019-07-26T10:00:26", "url": "https://files.pythonhosted.org/packages/f4/3e/d2b604def429a840dae676249288fad32a951d050e7a84e8cc03a768f46f/pystuff-0.0.6-py3-none-any.whl" } ] }