{ "info": { "author": "James Ross", "author_email": "jamespatross@gmail.com", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python :: 3" ], "description": "# Bayesian Hidden Markov Models\n\n[![Build Status](https://img.shields.io/travis/jamesross2/Bayesian-HMM/master?logo=travis&style=flat-square)](https://travis-ci.org/jamesross2/Bayesian-HMM?style=flat-square)\n[![PyPI Version](https://img.shields.io/pypi/v/bayesian-hmm?label=PyPI&logo=pypi&style=flat-square)](https://pypi.org/project/bayesian-hmm/)\n\nThis code implements a non-parametric Bayesian Hidden Markov model,\nsometimes referred to as a Hierarchical Dirichlet Process Hidden Markov\nModel (HDP-HMM), or an Infinite Hidden Markov Model (iHMM). This package has capability\nfor a standard non-parametric Bayesian HMM, as well as a sticky HDPHMM \n(see references). Inference is performed via Markov chain Monte Carlo estimation,\nincluding efficient beam sampling for the latent sequence resampling steps,\nand multithreading when possible for parameter resampling.\n\n\n## Installation\n\nThe current version is development only, and installation is only recommended for\npeople who are aware of the risks. It can be installed through PyPI:\n\n```sh\npip install bayesian-hmm\n```\n\n\n## Hidden Markov Models\n\n[Hidden Markov Models](https://en.wikipedia.org/wiki/Hidden_Markov_model) \nare powerful time series models, which use latent variables to explain observed emission sequences.\nThe result is a generative model for time series data, which is often tractable and can be easily understood.\nThe latent series is assumed to be a Markov chain, which requires a starting distribution and transition distribution, \nas well as an emission distribution to tie emissions to latent states.\nTraditional parametric Hidden Markov Models use a fixed number of states for the latent series Markov chain.\nHierarchical Dirichlet Process Hidden Markov Models (including the one implemented by `bayesian_hmm` package) allow\nfor the number of latent states to vary as part of the fitting process. \nThis is done by using a hierarchical Dirichlet prior on the latent state starting and transition distributions, \nand performing MCMC sampling on the latent states to estimate the model parameters.\n\n\n## Usage\n\nBasic usage allows us to supply a list of emission sequences, initialise the HDPHMM, and perform MCMC estimation.\nThe below example constructs some artificial observation series, and uses a brief MCMC estimation step to estimate the \nmodel parameters.\nWe use a moderately sized data to showcase the speed of the package: 50 sequences of length 200, with 500 MCMC steps. \n\n```python\nimport bayesian_hmm\n\n# create emission sequences\nbase_sequence = list(range(5)) + list(range(5, 0, -1))\nsequences = [base_sequence * 20 for _ in range(50)]\n\n# initialise object with overestimate of true number of latent states\nhmm = bayesian_hmm.HDPHMM(sequences, sticky=False)\nhmm.initialise(k=20)\n\nresults = hmm.mcmc(n=500, burn_in=100, ncores=3, save_every=10, verbose=True)\n```\n\nThis model typically converges to 10 latent states, a sensible posterior. In some cases,\nit converges to 11 latent states, in which a starting state which outputs '0' with high\nconfidence is separate to another latent start which outputs '0' with high confidence.\nWe can inspect this using the printed output, or with probability matrices printed \ndirectly.\n\n```python\n# print final probability estimates (expect 10 latent states)\nhmm.print_probabilities()\n```\n\nThis final command prints the transition and emission probabiltiies of the model after\nMCMC using the [`terminaltables`](https://pypi.org/project/terminaltables/) package. The \ncode below visualises the results using [`pandas`](https://pypi.org/project/pandas/)\nand [`seaborn`](https://pypi.org/project/seaborn/). For simplicity, we will stick with\nthe returned MAP estimate, but a more complete analysis might use a more sophisticated\napproach.\n\n```python\nimport pandas as pd\nimport seaborn as sns\nimport matplotlib.pyplot as plt\nplt.style.use('ggplot')\n\n# plot the number of states as a histogram\nsns.countplot(results['state_count'])\nplt.title('Number of latent states')\nplt.xlabel('Number of latent states')\nplt.ylabel('Number of iterations')\nplt.show()\n```\n\n\n![State counts](https://raw.githubusercontent.com/jamesross2/Bayesian-HMM/master/outputs/plot_state_count.png)\n\n```python\n# plot the starting probabilities of the sampled MAP estimate\nmap_index = results['chain_loglikelihood'].index(min(results['chain_loglikelihood']))\nparameters_map = results['parameters'][map_index]\nsns.barplot(\n x=list(parameters_map['p_initial'].keys()), \n y=list(parameters_map['p_initial'].values())\n)\nplt.title('Starting state probabilities')\nplt.xlabel('Latent state')\nplt.ylabel('MAP estimate for initial probability')\nplt.show()\n```\n\n![MAP initial probabilities](https://raw.githubusercontent.com/jamesross2/Bayesian-HMM/master/outputs/plot_p_initial.png)\n\n```python\n# convert list of hyperparameters into a DataFrame\nhyperparam_posterior_df = (\n pd.DataFrame(results['hyperparameters'])\n .reset_index()\n .melt(id_vars=['index'])\n .rename(columns={'index': 'iteration'})\n)\nhyperparam_prior_df = pd.concat(\n pd.DataFrame(\n {'iteration': range(500), 'variable': k, 'value': [v() for _ in range(500)]}\n )\n for k, v in hmm.priors.items()\n)\nhyperparam_df = pd.concat(\n (hyperparam_prior_df, hyperparam_posterior_df), \n keys=['prior', 'posterior'], \n names=('type','index')\n)\nhyperparam_df.reset_index(inplace=True)\n\n# advanced: plot sampled prior & sampled posterior together\ng = sns.FacetGrid(\n hyperparam_df[hyperparam_df['variable'] != 'kappa'],\n col='variable', \n col_wrap=3, \n sharex=False,\n hue='type'\n)\ng.map(sns.distplot, 'value')\ng.add_legend()\ng.fig.suptitle('Hyperparameter prior & posterior estimates')\nplt.subplots_adjust(top=0.9)\nplt.xlabel('Value')\nplt.ylabel('Density')\nplt.show()\n```\n\n\n![Hyperparameter posteriors](https://raw.githubusercontent.com/jamesross2/Bayesian-HMM/master/outputs/plot_hyperparameters.png)\n\nThe `bayesian_hmm` package can handle more advanced usage, including:\n * Multiple emission sequences,\n * Emission series of varying length,\n * Any categorical emission distribution,\n * Multithreaded MCMC estimation, and\n * Starting probability estimation, which share a dirichlet prior with the transition probabilities.\n\n\n\n## Inference\n\nThis code uses an MCMC approach to parameter estimation. \nWe use efficient Beam sampling on the latent sequences, as well as \nMetropolis Hastings sampling on each of the hyperparameters.\nWe approximate true resampling steps by using probability estimates\ncalculated on all states of interest, rather than the \nleaving probabilities unadjusted\nfor current variable resampling steps (rather than removing the current)\nvariable for the sampled estimate. \n\n\n## Outstanding issues and future work\n\nWe have the following set as a priority to improve in the future:\n\n* Expand package to include standard non-Bayesian HMM functions, such as Baum Welch and Viterbi algorithm\n* Allow for missing or `NULL` emissions which do not impact the model probability\n* Include functionality to use maximum likelihood estimates for the hyperparameters \n(currently only Metropolis Hastings resampling is possible for hyperparameters)\n\n\n## References\n\nVan Gael, J., Saatci, Y., Teh, Y. W., & Ghahramani, Z. (2008, July). Beam sampling for the infinite hidden Markov model. In Proceedings of the 25th international conference on Machine learning (pp. 1088-1095). ACM.\n\nBeal, M. J., Ghahramani, Z., & Rasmussen, C. E. (2002). The infinite hidden Markov model. In Advances in neural information processing systems (pp. 577-584).\n\nFox, E. B., Sudderth, E. B., Jordan, M. I., & Willsky, A. S. (2007). The sticky HDP-HMM: Bayesian nonparametric hidden Markov models with persistent states. Arxiv preprint.\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/jamesross2/Bayesian-HMM", "keywords": "", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "bayesian-hmm", "package_url": "https://pypi.org/project/bayesian-hmm/", "platform": "", "project_url": "https://pypi.org/project/bayesian-hmm/", "project_urls": { "Homepage": "https://github.com/jamesross2/Bayesian-HMM" }, "release_url": "https://pypi.org/project/bayesian-hmm/0.0.4/", "requires_dist": [ "numpy", "scipy", "sympy", "terminaltables", "tqdm" ], "requires_python": "", "summary": "A non-parametric Bayesian approach to Hidden Markov Models", "version": "0.0.4" }, "last_serial": 5828300, "releases": { "0.0.0a0": [ { "comment_text": "", "digests": { "md5": "d106a1844537fa09737a905dd1d73e91", "sha256": "5b90618d4a52c5c50b772ae88c2d8993bae2cb80feca6e4dafa72ee7fb8b9ad6" }, "downloads": -1, "filename": "bayesian_hmm-0.0.0a0-py2.7.egg", "has_sig": false, "md5_digest": "d106a1844537fa09737a905dd1d73e91", "packagetype": "bdist_egg", "python_version": "2.7", "requires_python": null, "size": 22473, "upload_time": "2019-03-31T05:29:24", "url": "https://files.pythonhosted.org/packages/72/76/c4808b3aa667007171a5dc983041e260e0ca741bb4de1bb71c91b526490b/bayesian_hmm-0.0.0a0-py2.7.egg" }, { "comment_text": "", "digests": { "md5": "178779d24da94862df9fab27993fd768", "sha256": "9522ca5fa99f2cac0c49df100e0010b2c222a3861f9d7e7c1fdeea24ef5df6a5" }, "downloads": -1, "filename": "bayesian_hmm-0.0.0a0-py3.6.egg", "has_sig": false, "md5_digest": "178779d24da94862df9fab27993fd768", "packagetype": "bdist_egg", "python_version": "3.6", "requires_python": null, "size": 23353, "upload_time": "2019-03-31T05:29:26", "url": "https://files.pythonhosted.org/packages/91/ad/c54bbae2a4de1a85f81d96aa84aaccf0b49230cc53509f6830d51437ead7/bayesian_hmm-0.0.0a0-py3.6.egg" }, { "comment_text": "", "digests": { "md5": "6ec02d78d39dd3d3eae43345db6dfb1f", "sha256": "ed6ff2ea91a5be6bb56ea07a480ead8827ece81782bb8542463dc9cc0bdb435b" }, "downloads": -1, "filename": "bayesian_hmm-0.0.0a0-py3-none-any.whl", "has_sig": false, "md5_digest": "6ec02d78d39dd3d3eae43345db6dfb1f", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 21724, "upload_time": "2019-03-31T05:29:21", "url": "https://files.pythonhosted.org/packages/a6/28/70b45c58f0b9d4170c056bfa527f6f2ed62cdaf31695e2465e876c1fada1/bayesian_hmm-0.0.0a0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "30db05b6e4dac9ce1fda1a528a456b4a", "sha256": "0594dc5e00c9e5dd14017e19f75b255f530b1ed23b5492af4bb7a50338cb0b0a" }, "downloads": -1, "filename": "bayesian_hmm-0.0.0a0.tar.gz", "has_sig": false, "md5_digest": "30db05b6e4dac9ce1fda1a528a456b4a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 12103, "upload_time": "2019-03-31T05:29:28", "url": "https://files.pythonhosted.org/packages/28/22/3751d32e1dddc4c21fdb8f7481012d94a0c6a880e054ebfc8825b6a9585e/bayesian_hmm-0.0.0a0.tar.gz" } ], "0.0.1": [ { "comment_text": "", "digests": { "md5": "464ca53c979d73471386c4f5a96c11d1", "sha256": "e0ea550652aab227a71b18e65e4fb4d7d669423400759bcc381b674839597525" }, "downloads": -1, "filename": "bayesian_hmm-0.0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "464ca53c979d73471386c4f5a96c11d1", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 16470, "upload_time": "2019-04-09T13:06:30", "url": "https://files.pythonhosted.org/packages/93/48/6ccb440486ee82275955b609f8f868cff61b45455821bfa86b76ae9ed535/bayesian_hmm-0.0.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "b427688129077fc336bc474372ceaab1", "sha256": "2046ce5da61c1af3a2466b8b9af6df3bcc65db12872ca7a4b8c62ce0793569d7" }, "downloads": -1, "filename": "bayesian_hmm-0.0.1.tar.gz", "has_sig": false, "md5_digest": "b427688129077fc336bc474372ceaab1", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 14579, "upload_time": "2019-04-09T13:06:32", "url": "https://files.pythonhosted.org/packages/88/f5/c5e6e47c2123fe5e302c4c0316c1124b68ab4ed0b2c36082fb9f7c2b000f/bayesian_hmm-0.0.1.tar.gz" } ], "0.0.2": [ { "comment_text": "", "digests": { "md5": "f69ab14ee0dff53428bdf3e87bec207b", "sha256": "abb481e3863ae8565e7b3b967b33b397de2b7987ba55c80b2a552d68ca624766" }, "downloads": -1, "filename": "bayesian_hmm-0.0.2-py3-none-any.whl", "has_sig": false, "md5_digest": "f69ab14ee0dff53428bdf3e87bec207b", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 16650, "upload_time": "2019-09-14T00:25:32", "url": "https://files.pythonhosted.org/packages/ea/95/0794063f8c03510908a2e1cd3c1b145d43ad2888b3649d5709614ac9e4ce/bayesian_hmm-0.0.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "ab1e295fcb5b829d9592399c210c4570", "sha256": "092fb6934011c78620d3a94b12c50ea25c7548f92653fbbae208684bef444d25" }, "downloads": -1, "filename": "bayesian_hmm-0.0.2.tar.gz", "has_sig": false, "md5_digest": "ab1e295fcb5b829d9592399c210c4570", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 16466, "upload_time": "2019-09-14T00:25:34", "url": "https://files.pythonhosted.org/packages/bf/f8/396c315be3b965febda16e168b69391c73cdc50148269db6b88c72c2f96f/bayesian_hmm-0.0.2.tar.gz" } ], "0.0.3": [ { "comment_text": "", "digests": { "md5": "f4d60a9501d8f0dcae461f7894caec3a", "sha256": "5ac06a96875c4d77b49d7880be6ae2e767d01184a411bb1e00fd3b711a402b77" }, "downloads": -1, "filename": "bayesian_hmm-0.0.3-py3-none-any.whl", "has_sig": false, "md5_digest": "f4d60a9501d8f0dcae461f7894caec3a", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 19055, "upload_time": "2019-09-14T00:44:47", "url": "https://files.pythonhosted.org/packages/72/69/f9d949d0c4f1d1ff50142dcc7c5bd004ad322063a55150818826c2cdbb0b/bayesian_hmm-0.0.3-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "93bf48dc6f104b13f4516b7aaf132601", "sha256": "040baeb30f7db6f7a01d430ca0217fa3d50a2a0605af5f04dc1bd6fd0c9f67a4" }, "downloads": -1, "filename": "bayesian_hmm-0.0.3.tar.gz", "has_sig": false, "md5_digest": "93bf48dc6f104b13f4516b7aaf132601", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 19591, "upload_time": "2019-09-14T00:44:51", "url": "https://files.pythonhosted.org/packages/f5/99/ba622478d8e09086124891a2b4886c2c4e6c1129c64bbf2c8628f15e75b0/bayesian_hmm-0.0.3.tar.gz" } ], "0.0.4": [ { "comment_text": "", "digests": { "md5": "14e9d741ce4d14cd9295d9404a6ba6e5", "sha256": "778bece06a944938b42a735b61b01954b4719ee760a4f2235850a3ed2754f28f" }, "downloads": -1, "filename": "bayesian_hmm-0.0.4-py3-none-any.whl", "has_sig": false, "md5_digest": "14e9d741ce4d14cd9295d9404a6ba6e5", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 20107, "upload_time": "2019-09-14T01:06:26", "url": "https://files.pythonhosted.org/packages/41/e7/92a35e9f9cf2f79e0692e96ffeaf28c2ed5e444e9daa810c5c672ead6973/bayesian_hmm-0.0.4-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "fafe5bcf387068e64574a98cea7d0794", "sha256": "fc96df5cbfde748a874c6ce9d3d51678b46bc95dd762be9a7f77e4c48d1d484b" }, "downloads": -1, "filename": "bayesian_hmm-0.0.4.tar.gz", "has_sig": false, "md5_digest": "fafe5bcf387068e64574a98cea7d0794", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 20746, "upload_time": "2019-09-14T01:06:30", "url": "https://files.pythonhosted.org/packages/8b/f7/ff60fd32c66524ff1a49d03c1a848c7add42b0a595e95abe4188e7035461/bayesian_hmm-0.0.4.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "14e9d741ce4d14cd9295d9404a6ba6e5", "sha256": "778bece06a944938b42a735b61b01954b4719ee760a4f2235850a3ed2754f28f" }, "downloads": -1, "filename": "bayesian_hmm-0.0.4-py3-none-any.whl", "has_sig": false, "md5_digest": "14e9d741ce4d14cd9295d9404a6ba6e5", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 20107, "upload_time": "2019-09-14T01:06:26", "url": "https://files.pythonhosted.org/packages/41/e7/92a35e9f9cf2f79e0692e96ffeaf28c2ed5e444e9daa810c5c672ead6973/bayesian_hmm-0.0.4-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "fafe5bcf387068e64574a98cea7d0794", "sha256": "fc96df5cbfde748a874c6ce9d3d51678b46bc95dd762be9a7f77e4c48d1d484b" }, "downloads": -1, "filename": "bayesian_hmm-0.0.4.tar.gz", "has_sig": false, "md5_digest": "fafe5bcf387068e64574a98cea7d0794", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 20746, "upload_time": "2019-09-14T01:06:30", "url": "https://files.pythonhosted.org/packages/8b/f7/ff60fd32c66524ff1a49d03c1a848c7add42b0a595e95abe4188e7035461/bayesian_hmm-0.0.4.tar.gz" } ] }