{ "info": { "author": "Yuhao Su, Jie Ding", "author_email": "su000088@umn.edu, dingj@umn.edu", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python :: 3" ], "description": "# GBART\n\n\n## Introduction\n\nGBART is a pure Python package that implements the GBART algorithm proposed in our paper submitted to ICASSP 2020: \n\n*Variable Grouping based Bayesian Additive Regression Tree*\n\nGBART seeks a grouping of variables such that there is no nonlinear interaction term between variables of different groups, and then builds a final model that takes advantage of this concealed information.\n\nGBART consists of two stages. The first stage searches for potential interactions and an appropriate grouping of variables. The second stage builds a final model based on the discovered groups.\n\n\nThis Python package is built on the BartPy package, a pure Python implementation of BART, which was proposed by Chipman et al. [1]. For details, please refer to the official repository [BartPy](https://github.com/JakeColtman/bartpy).\n\n\n\n## Installation\n### Using pip\n```\npip install gbart\nNote: installing with pip may currently miss a subfolder (modified_bart); a fix is in progress. We recommend installing manually.\n```\n### Manual installation\n```\n1. Visit our website https://github.com/augusHsu/gbart\n\n2. Download the folder \"gbart\"\n\n3. 
Put the folder where you usually import packages from; generally, this is \n ~/bin/python3.7/site-packages/ \n Alternatively, put it in the same folder as your script.\n```\n\n\n## Usage and examples\n### The easiest way to run and obtain results (generated data)\n\n* Preparation\n\n```\nimport copy\nimport numpy as np\nimport gbart.utilities as ut # helper functions\nimport gbart.create_dataset as cd # data generator\nfrom gbart.groupbart import * \n```\n* Generate dataset\n\n```\ndataset = cd.create_friedman()\n# The last column of dataset is the dependent variable, the output Y.\n```\n* Build the model and get accuracy using BART.\n\n```\nacc_o = build_original_model(dataset)\n# This function splits the whole dataset into training and testing sets (80% for training)\n# and returns the accuracy on the testing data.\n```\n\n* Get the grouping information\n\n```\noutput_pair = get_pair(dataset, theta=1)\n# Returns the grouping information. This is the first stage of GBART.\n# theta is a parameter that controls variable selection. The suggested range is [0,1].\n# To turn off the variable selection, set theta=0.\n# More variables are likely to be removed when theta is higher.\n\n```\n\n* Build the GBART model \n\n```\nacc_g = build_group_wise_model(dataset, output_pair)\n# Takes \"output_pair\" as pair_list \n# This function returns the accuracy on the testing data.\n\n```\n **To design a customized version and/or tune model parameters, please consider the following**\n\n* Write your own helper functions instead of calling the functions in **groupbart.py**. The only thing you may need from **groupbart.py** is **get_pair(dataset)**, which helps you find the proper grouping information.\n\n\n* Tune the parameters of both the BART and GBART models. 
Here is a simple example.\n\n```\nimport numpy as np\nimport gbart.utilities as ut\nfrom gbart.modified_bartpy.sklearnmodel import SklearnModel\n\n# Data preparation \nb = int(0.8 * np.shape(dataset)[0]) \nData_train = dataset[:b,:]\nData_predict = dataset[b:,:]\nx_data = Data_train[:,:-1]\ny_data = Data_train[:,-1]\n\n\n# Building the model\nmodel = SklearnModel(sublist=None,\n n_trees=50,\n n_chains=4,\n n_samples=50,\n n_burn=200,\n thin=0.1,\n n_jobs=1)\n# This GBART model is built on the BART model, with a new parameter named sublist. \n# sublist takes either None or a list of groups of variables.\n# When sublist is None, the model is exactly BART.\n# When sublist is a list, a GBART model is built.\n# To tune other parameters, please refer to the BartPy usage guide.\n\n# Fit and predict \nmodel.fit(x_data, y_data)\ny_pred = model.predict(Data_predict[:,:-1])\ny_true = Data_predict[:,-1]\nacc = ut.get_error_reg(y_pred, y_true)\n\n```\n### Real data example\n\n\n* Preparation\n\n```\nimport numpy as np\nimport pandas as pd\nfrom gbart.groupbart import * \n```\n* Load and pre-process data\n\n```\nurl = \"~/Admission_Predict_Ver1.1.csv\"\ndata = pd.read_csv(url)\ndata = data.drop(columns=\"Serial No.\")\ndataset = data.iloc[:, :].values\n# Note: our package is compatible with scikit-learn; we use the same dataset type. \n```\n* Build the model and get accuracy using BART.\n\n```\nacc_o = build_original_model(dataset)\n# This function splits the whole dataset into training and testing sets (80% for training)\n# and returns the accuracy on the testing data.\n```\n\n* Get the grouping information\n\n```\noutput_pair = get_pair(dataset)\n# Returns the grouping information. This is the first stage of the GBART algorithm.\n\n```\n\n* Build the GBART model \n\n```\nacc_g = build_group_wise_model(dataset, output_pair)\n# Takes \"output_pair\" as pair_list \n# This function returns the accuracy on the testing data.\n\n```\n\n* Tune the parameters of both the BART and GBART models. 
Please refer to the generated data example above.\n\n* One more advanced option worth exploring: during the group search, a threshold controls the variable selection. It is fixed at 85 percent of the best; you can tune it by changing that constant into a parameter of **get_pair(dataset)**\n\n\n\n\n## Acknowledgements \n\nWe sincerely thank Mr. Jake Coltman for his contribution to the BartPy package.\n\n\n## Reference\n\n[1] Chipman, Hugh A., Edward I. George, and Robert E. McCulloch. \u201cBART: Bayesian Additive Regression Trees.\u201d The Annals of Applied Statistics 4.1 (2010): 266\u2013298. \n\n\n\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/AugusHsu/gbart", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "gbart", "package_url": "https://pypi.org/project/gbart/", "platform": "", "project_url": "https://pypi.org/project/gbart/", "project_urls": { "Homepage": "https://github.com/AugusHsu/gbart" }, "release_url": "https://pypi.org/project/gbart/0.2.1/", "requires_dist": [ "joblib", "matplotlib", "numpy", "pandas", "scipy", "seaborn", "sklearn", "statsmodels" ], "requires_python": ">=3.6", "summary": "A python package to implement Variable Grouping Based Bayesian Additive Regression Tree, associated with ICASSP2020 submitted paper", "version": "0.2.1", "yanked": false, "yanked_reason": null }, "last_serial": 6032426, "releases": { "0.0.1": [ { "comment_text": "", "digests": { "md5": "43b3c72354da5b3077e39f5e00ebae00", "sha256": "4736c2394d99131828209aa38c8e0df6b39ddbec3627222f0cede09500580bd7" }, "downloads": -1, "filename": "gbart-0.0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "43b3c72354da5b3077e39f5e00ebae00", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 7207, "upload_time": "2019-09-23T20:58:32", 
"upload_time_iso_8601": "2019-09-23T20:58:32.086987Z", "url": "https://files.pythonhosted.org/packages/3c/c0/a72af563ffcc2b73cb1d6b4fb1190ff34bf0157bcbed2805e5de31b2459a/gbart-0.0.1-py3-none-any.whl", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "d811e66cd72a518be358c8af2894a9e2", "sha256": "b09bd4031dbd73a28000ee748374686dd0064860f50fc91684459e56b254b372" }, "downloads": -1, "filename": "gbart-0.0.1.tar.gz", "has_sig": false, "md5_digest": "d811e66cd72a518be358c8af2894a9e2", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 6447, "upload_time": "2019-09-23T20:58:36", "upload_time_iso_8601": "2019-09-23T20:58:36.107040Z", "url": "https://files.pythonhosted.org/packages/50/4e/19f4e3b38020251b44a11154199ee989051f53eeb301579a827535af32fa/gbart-0.0.1.tar.gz", "yanked": false, "yanked_reason": null } ], "0.0.2": [ { "comment_text": "", "digests": { "md5": "b19fe89194c2652930207f0ccc3ee0a8", "sha256": "c7e6146d440b68e131d45edc4a719ed4ecee7496bb43df91ba13604e277ef630" }, "downloads": -1, "filename": "gbart-0.0.2-py3-none-any.whl", "has_sig": false, "md5_digest": "b19fe89194c2652930207f0ccc3ee0a8", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 38942, "upload_time": "2019-09-23T22:09:20", "upload_time_iso_8601": "2019-09-23T22:09:20.194783Z", "url": "https://files.pythonhosted.org/packages/29/9b/514ed55b0cf93b27298ae5950c8fd9a9ca4a244f254f8a78a56c372a975e/gbart-0.0.2-py3-none-any.whl", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "66990b0a7b619533b56d4c070fffb103", "sha256": "25eda99c968a79ec6d2dc3d8cc473765b7679fc1f1d7e186378664d6f91dfe6a" }, "downloads": -1, "filename": "gbart-0.0.2.tar.gz", "has_sig": false, "md5_digest": "66990b0a7b619533b56d4c070fffb103", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 28306, "upload_time": "2019-09-23T22:09:22", "upload_time_iso_8601": 
"2019-09-23T22:09:22.662002Z", "url": "https://files.pythonhosted.org/packages/56/02/bc88e1d5b41156d51e077ac69df2ca9c997da017bb7d80b6e8614eee0080/gbart-0.0.2.tar.gz", "yanked": false, "yanked_reason": null } ], "0.0.3": [ { "comment_text": "", "digests": { "md5": "73506b09870043fb0dccfc0d9843d346", "sha256": "397c3d16186c6e6097aa3b09e99cd0b21ccb5f1c8c9a8d18d0de4de7dbde56ec" }, "downloads": -1, "filename": "gbart-0.0.3-py3-none-any.whl", "has_sig": false, "md5_digest": "73506b09870043fb0dccfc0d9843d346", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 38926, "upload_time": "2019-10-14T00:07:12", "upload_time_iso_8601": "2019-10-14T00:07:12.348005Z", "url": "https://files.pythonhosted.org/packages/01/3f/b155af5eaddbb2e344bef803db9c46003e5f562233bb53137ae645fb25c7/gbart-0.0.3-py3-none-any.whl", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "3e98f25c3c4da5a786db4f896ca7c708", "sha256": "15b58f8773adf131a742858f3937e2c2bcf36bedad78880f72ab1372ab797065" }, "downloads": -1, "filename": "gbart-0.0.3.tar.gz", "has_sig": false, "md5_digest": "3e98f25c3c4da5a786db4f896ca7c708", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 29240, "upload_time": "2019-10-14T00:07:14", "upload_time_iso_8601": "2019-10-14T00:07:14.057756Z", "url": "https://files.pythonhosted.org/packages/88/79/f703a06cdf1bf214da2e3d6654ed1d88be1ec1d62add0cb18045ec2652b4/gbart-0.0.3.tar.gz", "yanked": false, "yanked_reason": null } ], "0.1.0": [ { "comment_text": "", "digests": { "md5": "ee73aba63547667d6e8f763477000f2d", "sha256": "67eebb79df8ce6492d5291645f7e11e6e1254e512e0a2a557668ba2e30b406f7" }, "downloads": -1, "filename": "gbart-0.1.0-py3-none-any.whl", "has_sig": false, "md5_digest": "ee73aba63547667d6e8f763477000f2d", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 38921, "upload_time": "2019-10-14T00:12:31", "upload_time_iso_8601": 
"2019-10-14T00:12:31.175929Z", "url": "https://files.pythonhosted.org/packages/3b/02/9c250b92ad63c5fd6901eea4467e6f2de7133352dcae7896fb2e1d349cb5/gbart-0.1.0-py3-none-any.whl", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "427089b448867bb0645dd67179a25191", "sha256": "a83f10d3a14bb3c5c380c2e0887815eca4397b6c4e525a4eccb49fc1c8d9ba5d" }, "downloads": -1, "filename": "gbart-0.1.0.tar.gz", "has_sig": false, "md5_digest": "427089b448867bb0645dd67179a25191", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 29232, "upload_time": "2019-10-14T00:12:32", "upload_time_iso_8601": "2019-10-14T00:12:32.815652Z", "url": "https://files.pythonhosted.org/packages/33/3f/cd1097195a08fa42c2b3ad51e6e8338d0a401554287041090c822e3739f4/gbart-0.1.0.tar.gz", "yanked": false, "yanked_reason": null } ], "0.2.0": [ { "comment_text": "", "digests": { "md5": "8a69b27501a66d597e88238bcc2e48b6", "sha256": "5ab8613fb928cc07a9ebae895698910bb9daa60d71a6ee2639c7492854009b1e" }, "downloads": -1, "filename": "gbart-0.2.0-py3-none-any.whl", "has_sig": false, "md5_digest": "8a69b27501a66d597e88238bcc2e48b6", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 39376, "upload_time": "2019-10-23T05:28:24", "upload_time_iso_8601": "2019-10-23T05:28:24.127316Z", "url": "https://files.pythonhosted.org/packages/33/f1/b76c25803ab7f3b59f05424fa1dcbeaa0931c5d54a40059ce3ea340ab325/gbart-0.2.0-py3-none-any.whl", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "7d67688b0925fc9d34307999ed839ceb", "sha256": "2e78409e2aa7311b52f48b13e5df0b8a9fe2b8a63ecb052d07a16e2305b8b9c1" }, "downloads": -1, "filename": "gbart-0.2.0.tar.gz", "has_sig": false, "md5_digest": "7d67688b0925fc9d34307999ed839ceb", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 29751, "upload_time": "2019-10-23T05:28:25", "upload_time_iso_8601": 
"2019-10-23T05:28:25.924858Z", "url": "https://files.pythonhosted.org/packages/da/6b/e0773bc64443b0d157c49cec7bcecd4f4aac246ab6d2cb8501fd618a962b/gbart-0.2.0.tar.gz", "yanked": false, "yanked_reason": null } ], "0.2.1": [ { "comment_text": "", "digests": { "md5": "5b6a11fda7c03c5d989320c31d449b3d", "sha256": "8a3cda10438675cb9f2c9ae2ccaffb66a6b60c86cdc8df9da651e7fab6fc0750" }, "downloads": -1, "filename": "gbart-0.2.1-py3-none-any.whl", "has_sig": false, "md5_digest": "5b6a11fda7c03c5d989320c31d449b3d", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 40172, "upload_time": "2019-10-26T00:42:01", "upload_time_iso_8601": "2019-10-26T00:42:01.041001Z", "url": "https://files.pythonhosted.org/packages/0e/cd/ee9d87fac2a2ecadbb37be57c7d028b56ff0bcffdbddb0ad6d92bd975e83/gbart-0.2.1-py3-none-any.whl", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "42fd797c87b0a2a80b6bdde11f4b72ac", "sha256": "3f45f93602fd103ebd601fcafd95a99b8c2746eda8ecc1141946e41db3af460e" }, "downloads": -1, "filename": "gbart-0.2.1.tar.gz", "has_sig": false, "md5_digest": "42fd797c87b0a2a80b6bdde11f4b72ac", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 29752, "upload_time": "2019-10-26T00:42:02", "upload_time_iso_8601": "2019-10-26T00:42:02.351206Z", "url": "https://files.pythonhosted.org/packages/97/c8/80983a899056df3bc3b29f5f43f63b9e1f195f8ec5c1e5336e535e0519f9/gbart-0.2.1.tar.gz", "yanked": false, "yanked_reason": null } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "5b6a11fda7c03c5d989320c31d449b3d", "sha256": "8a3cda10438675cb9f2c9ae2ccaffb66a6b60c86cdc8df9da651e7fab6fc0750" }, "downloads": -1, "filename": "gbart-0.2.1-py3-none-any.whl", "has_sig": false, "md5_digest": "5b6a11fda7c03c5d989320c31d449b3d", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 40172, "upload_time": "2019-10-26T00:42:01", "upload_time_iso_8601": 
"2019-10-26T00:42:01.041001Z", "url": "https://files.pythonhosted.org/packages/0e/cd/ee9d87fac2a2ecadbb37be57c7d028b56ff0bcffdbddb0ad6d92bd975e83/gbart-0.2.1-py3-none-any.whl", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "42fd797c87b0a2a80b6bdde11f4b72ac", "sha256": "3f45f93602fd103ebd601fcafd95a99b8c2746eda8ecc1141946e41db3af460e" }, "downloads": -1, "filename": "gbart-0.2.1.tar.gz", "has_sig": false, "md5_digest": "42fd797c87b0a2a80b6bdde11f4b72ac", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 29752, "upload_time": "2019-10-26T00:42:02", "upload_time_iso_8601": "2019-10-26T00:42:02.351206Z", "url": "https://files.pythonhosted.org/packages/97/c8/80983a899056df3bc3b29f5f43f63b9e1f195f8ec5c1e5336e535e0519f9/gbart-0.2.1.tar.gz", "yanked": false, "yanked_reason": null } ], "vulnerabilities": [] }