{ "info": { "author": "Raimi bin Karim", "author_email": "raimi.bkarim@gmail.com", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python :: 3" ], "description": "# Mixed Naive Bayes\n\nNaive Bayes classifiers are a set of supervised learning algorithms based on applying Bayes' theorem, but with strong independence assumptions between the features given the value of the class variable (hence naive).\n\nThis module implements **Categorical** (Multinoulli) and **Gaussian** naive Bayes algorithms (hence *mixed naive Bayes*). This means that features (given their respective *y*'s) are not assumed to follow only the Gaussian distribution; they may also follow the categorical distribution. Hence it is natural to model the continuous data with the Gaussian distribution and the categorical data (nominal or ordinal) with the categorical distribution.\n\nThe motivation for writing this library is that [scikit-learn](https://scikit-learn.org/) does not have an implementation for a mixed type of naive Bayes. 
They have one for `CategoricalNB` [here](https://github.com/scikit-learn/scikit-learn/blob/86aea9915/sklearn/naive_bayes.py#L1021) but it's still pending.\n\nI like `scikit-learn`'s APIs \ud83d\ude0d so if you use it a lot, you'll find that it's easy to get started with this library (there's `.fit()`, `.predict()`, `.predict_proba()` and `.score()`).\n\nI've written a tutorial [here](https://remykarem.github.io/blog/naive-bayes) for naive Bayes if you need to understand a bit more of the math.\n\n## Contents\n\n- [Installation](#installation)\n- [Quick start](#quick-start)\n- [Requirements](#requirements)\n- [Performance (Accuracy)](#performance-accuracy)\n- [Performance (Speed)](#performance-speed)\n- [Tests](#tests)\n- [API Documentation](#api-documentation)\n- [To-Dos](#to-dos)\n- [References](#references)\n- [Related work](#related-work)\n- [Contributing \ufe0f\u2764\ufe0f](#contributing)\n\n## Installation\n\n### via pip\n\n```bash\npip install git+https://github.com/remykarem/mixed-naive-bayes#egg=mixed_naive_bayes\n```\n\n## Quick start\n\n### Example 1: Discrete and continuous data\n\nBelow is an example of a dataset with discrete (first 2 columns) and continuous data (last 2). Specify the indices of the features which are to follow the categorical distribution (columns `0` and `1`). Then fit and\npredict as per usual.\n\n```python\nfrom mixed_naive_bayes import MixedNB\nX = [[0, 0, 180, 75],\n [1, 1, 165, 61],\n [2, 1, 166, 60],\n [1, 1, 173, 68],\n [0, 2, 178, 71]]\ny = [0, 0, 1, 1, 0]\nclf = MixedNB(categorical_features=[0,1])\nclf.fit(X,y)\nclf.predict(X)\n```\n\n**NOTE: The module expects the categorical data to be label-encoded. See the following example to see how.**\n\n### Example 2: Discrete and continuous data\n\nBelow is an example of a dataset with discrete (first 2 columns) and continuous data (last 2). Specify the indices of the features which are to follow the categorical distribution (columns `0` and `1`). 
Then fit and\npredict as per usual.\n\nIf we decide to treat the 3rd column as a discrete feature as well, we can use sklearn's [`LabelEncoder()`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html) preprocessing module.\n\n```python\nimport numpy as np\nfrom sklearn.preprocessing import LabelEncoder\nX = [[0, 0, 180, 75],\n [1, 1, 165, 61],\n [2, 1, 166, 60],\n [1, 1, 173, 68],\n [0, 2, 178, 71]]\ny = [0, 0, 1, 1, 0]\nX = np.array(X)\ny = np.array(y)\n# Replace the values in the 3rd column (index 2) with integer labels\nlabel_encoder = LabelEncoder()\nX[:,2] = label_encoder.fit_transform(X[:,2])\n# array([[ 0, 0, 4, 75],\n# [ 1, 1, 0, 61],\n# [ 2, 1, 1, 60],\n# [ 1, 1, 2, 68],\n# [ 0, 2, 3, 71]])\n```\n\n```python\nfrom mixed_naive_bayes import MixedNB\n# Column 2 is now label-encoded, so list it as categorical too\nclf = MixedNB(categorical_features=[0,1,2])\nclf.fit(X,y)\nclf.predict(X)\n```\n\n### Example 3: Discrete data only\n\nIf all columns are to be treated as discrete, specify `categorical_features='all'`.\n\n```python\nfrom mixed_naive_bayes import MixedNB\nX = [[0, 0],\n [1, 1],\n [1, 0],\n [0, 1],\n [1, 1]]\ny = [0, 0, 1, 0, 1]\nclf = MixedNB(categorical_features='all')\nclf.fit(X,y)\nclf.predict(X)\n```\n\n**NOTE: The module expects the categorical data to be label-encoded. See the previous example to see how.**\n\n### Example 4: Continuous data only\n\nIf all columns are to be treated as continuous, then leave the constructor blank.\n\n```python\nfrom mixed_naive_bayes import MixedNB\nX = [[0, 0],\n [1, 1],\n [1, 0],\n [0, 1],\n [1, 1]]\ny = [0, 0, 1, 0, 1]\nclf = MixedNB()\nclf.fit(X,y)\nclf.predict(X)\n```\n\n### More examples\n\nSee the `examples/` folder for more example notebooks or jump into a notebook hosted at MyBinder [here](https://mybinder.org/v2/gh/remykarem/mixed-naive-bayes/master?filepath=%2Fexamples%2Fdataset_digits.ipynb).\n\n## Requirements\n\n- `Python>=3.6`\n- `numpy>=1.16.1`\n\nThe `scikit-learn` library is used to import data as seen in the examples. 
Otherwise, the module itself does not require it.\n\nThe `pytest` library is not needed unless you want to perform testing.\n\n## Performance (Accuracy)\n\nMeasures the accuracy of (1) using categorical data and (2) my Gaussian implementation.\n\nDataset | GaussianNB | MixedNB (G) | MixedNB (C) | MixedNB (G+C) |\n------- | ---------- | ----------- | ----------- | ------------- |\nIris | 0.960 | 0.960 | - | - |\nDigits | 0.858 | 0.858 | **0.961** | - |\nWine | 0.989 | 0.989 | - | - |\nCancer | 0.942 | 0.942 | - | - |\ncovtype | 0.616 | 0.616 | | **0.657** |\n\n- G: Gaussian only\n- C: categorical only\n- G+C: Gaussian and categorical\n\n## Performance (Speed)\n\nThe library is written in [NumPy](https://numpy.org/), so many operations are vectorised and faster than their for-loop counterparts. Fun fact: my first prototype (with many for-loops) was 8 times slower than sklearn's \ud83d\ude31.\n\n(Still measuring)\n\n## Tests\n\nI'm still writing more test cases, but in the meantime, you can run the following:\n\n```bash\npytest tests.py\n```\n\n## API Documentation\n\nFor more information on usage of the API, visit [here](https://remykarem.github.io/docs/mixed_naive_bayes.html). 
This was generated using pdoc3.\n\n## To-Dos\n\n- [ ] Implement `predict_log_proba()`\n- [ ] Write more test cases\n- [ ] Performance (Speed)\n- [X] Support refitting\n- [X] Regulariser for categorical distribution\n- [X] Variance smoothing for Gaussian distribution\n- [X] Vectorised main operations using NumPy\n\nPossible features:\n\n- [ ] Masking in NumPy\n- [ ] Support label encoding\n- [ ] Support missing data\n\n## References\n\n- [scikit-learn's naive bayes](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.naive_bayes)\n\n## Related Work\n\n- [Categorical naive Bayes by scikit-learn](https://scikit-learn.org/dev/modules/generated/sklearn.naive_bayes.CategoricalNB.html)\n- [Naive Bayes classifier for categorical and numerical data](https://github.com/wookieJ/naive-bayes)\n- [Generalised naive Bayes classifier](https://github.com/ashkonf/HybridNaiveBayes)\n\n## Contributing \ufe0f\u2764\ufe0f\n\nPlease submit your pull requests. I'd appreciate it a lot \u2764\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/remykarem/mixed-naive-bayes", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "mixed-naive-bayes", "package_url": "https://pypi.org/project/mixed-naive-bayes/", "platform": "", "project_url": "https://pypi.org/project/mixed-naive-bayes/", "project_urls": { "Homepage": "https://github.com/remykarem/mixed-naive-bayes" }, "release_url": "https://pypi.org/project/mixed-naive-bayes/0.0.1/", "requires_dist": [ "numpy (>=1.16.1)", "scikit-learn (>=0.20.2)" ], "requires_python": ">=3.6", "summary": "Categorical and Gaussian Naive Bayes", "version": "0.0.1" }, "last_serial": 5966132, "releases": { "0.0.1": [ { "comment_text": "", "digests": { "md5": "07e70e3a2dca5000d59da635e7f1245a", "sha256": "b570c85d2bfd4f615db86b72d8814732a9e5bc53ccbd8e3ff10b48108df14441" }, 
"downloads": -1, "filename": "mixed_naive_bayes-0.0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "07e70e3a2dca5000d59da635e7f1245a", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 8872, "upload_time": "2019-10-13T02:56:37", "url": "https://files.pythonhosted.org/packages/7a/1f/4af788fa4df56a0aa38cbe949f3c3021ece5200a2d777adb4eddf662468d/mixed_naive_bayes-0.0.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "0fdc9478b069da205465873137363ac7", "sha256": "cd012ab7dc8df91c1a419d3aabc9b78c80cb21140edb81e12317c2ce28e6e901" }, "downloads": -1, "filename": "mixed-naive-bayes-0.0.1.tar.gz", "has_sig": false, "md5_digest": "0fdc9478b069da205465873137363ac7", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 27903, "upload_time": "2019-10-13T02:56:41", "url": "https://files.pythonhosted.org/packages/5a/1a/8003c2bc899499799b97da059b70bc74a311dc5bed6fe97258c5aac29532/mixed-naive-bayes-0.0.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "07e70e3a2dca5000d59da635e7f1245a", "sha256": "b570c85d2bfd4f615db86b72d8814732a9e5bc53ccbd8e3ff10b48108df14441" }, "downloads": -1, "filename": "mixed_naive_bayes-0.0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "07e70e3a2dca5000d59da635e7f1245a", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 8872, "upload_time": "2019-10-13T02:56:37", "url": "https://files.pythonhosted.org/packages/7a/1f/4af788fa4df56a0aa38cbe949f3c3021ece5200a2d777adb4eddf662468d/mixed_naive_bayes-0.0.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "0fdc9478b069da205465873137363ac7", "sha256": "cd012ab7dc8df91c1a419d3aabc9b78c80cb21140edb81e12317c2ce28e6e901" }, "downloads": -1, "filename": "mixed-naive-bayes-0.0.1.tar.gz", "has_sig": false, "md5_digest": "0fdc9478b069da205465873137363ac7", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 27903, 
"upload_time": "2019-10-13T02:56:41", "url": "https://files.pythonhosted.org/packages/5a/1a/8003c2bc899499799b97da059b70bc74a311dc5bed6fe97258c5aac29532/mixed-naive-bayes-0.0.1.tar.gz" } ] }