{ "info": { "author": "Zhen Li, Tong Wang", "author_email": "tong-wang@uiowa.edu", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Programming Language :: Python :: 3.5", "Topic :: Software Development :: Build Tools" ], "description": "Bayesian Rule Set Mining\n\nFind the rule set from the data\nThe input data should follow the following format:\nX has to be a pandas DataFrame\nall the column names can not contain '_' or '<'\nand the column names can not be pure numbers\nThe categorical data should be represented in string\n(For example, gender needs to be 'male'/'female',\n or '0'/'1' to represent male and female respectively.)\nThe parser will only recognize this format of data.\nSo transform the data set first before using the\nfunctions.\ny hass to be a numpy.ndarray\n\nreference:\nWang, Tong, et al. \"Bayesian rule sets for interpretable classification.\" \nData Mining (ICDM), 2016 IEEE 16th International Conference on. IEEE, 2016.\n\nThe program is very picky on the input data format\nX needs to be a pandas DataFrame,\ny needs to be a nd.array\n\n Parameters\n ----------\n max_rules : int, default 5000\n Maximum number of rules when generating rules\n\n max_iter : int, default 200\n Maximun number of iteratations to find the rule set\n\n chians : int, default 1\n Number of chains that run in parallel\n\n support : int, default 5\n The support is the percentile threshold for the itemset\n to be selected.\n\n maxlen : int, default 3\n The maximum number of items in a rule\n\n #note need to replace all the alpha_1 to alpha_+\n alpha_1 : float, default 100\n alpha_+\n\n beta_1 : float, default 1\n beta_+\n\n alpha_2 : float, default 100\n alpha_-\n\n beta_2 : float, default 1\n beta_-\n\n alpha_l : float array, shape (maxlen+1,)\n default all elements to be 1\n\n beta_l : float array, shape (maxlen+1,)\n default corresponding patternSpace\n\n level : int, default 4\n Number of intervals to deal with numerical continous features\n\n neg : boolean, default True\n Negate the features\n\n add_rules : list, default empty\n User defined rules to add\n it needs user to add numerical version of the rules\n\n criteria : str, default 'precision'\n When there are rules more than max_rules,\n the criteria used to filter rules\n\n greedy_initilization : boolean, default False\n Wether start the rule set using a greedy\n initilization (according to accuracy)\n\n greedy_threshold : float, default 0.05\n Threshold for the greedy algorithm\n to find the starting rule set\n\n propose_threshold : float, default 0.1\n Threshold for a proposal to be accepted\n\n method : str, default 'fpgrowth'\n The method used to generate rules.\n Can be 'fpgrowth' or 'forest'\n Notice that if there are potentially many rules\n then fpgrowth is not a good method as it will\n have memory issue (because the rule screening is\n after rule generations).\n\n\nSample usage\n\nfrom ruleset import *\n\ndf = pd.read_csv('data/adult.dat', header=None, sep=',', names=['age', 'workclass', 'fnlwgt', 'education', 'educationnum', 'matritalstatus', 'occupation', 'relationship', 'race', 'sex', 'capitalgain', 'capitalloss', 'hoursperweek', 'nativecountary', 'income'])\ny = (df['income'] == '>50K').as_matrix()\ndf.drop('income', axis=1, inplace=True)\nmodel = BayesianRuleSet(method='forest')\nmodel.fit(df, y)\n\n\n", "description_content_type": null, "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/zli37/bayesianRuleSet", "keywords": "data mining analysis", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "ruleset", "package_url": "https://pypi.org/project/ruleset/", "platform": "", "project_url": "https://pypi.org/project/ruleset/", "project_urls": { "Homepage": "https://github.com/zli37/bayesianRuleSet" }, "release_url": "https://pypi.org/project/ruleset/1.0.1/", "requires_dist": [ "numpy", "pandas", "scipy", "sklearn" ], "requires_python": ">=3", "summary": "Bayesian Rule Set Mining", "version": "1.0.1" }, "last_serial": 3269188, "releases": { "1.0.0": [ { "comment_text": "", "digests": { "md5": "7a2bb64836dd3949194c58649dffb00b", "sha256": "1a68f757229a12fd62f277cb2860f0eba30332cd8c1af763cbc38c6187d82fba" }, "downloads": -1, "filename": "ruleset-1.0.0-py3-none-any.whl", "has_sig": false, "md5_digest": "7a2bb64836dd3949194c58649dffb00b", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3", "size": 15367, "upload_time": "2017-08-15T20:49:23", "url": "https://files.pythonhosted.org/packages/bd/6e/26642d8ef83dfccb0f6308dd928c9d4cdeaae895a67ef2dac079f736f510/ruleset-1.0.0-py3-none-any.whl" } ], "1.0.1": [ { "comment_text": "", "digests": { "md5": "9c694701f513143797631723a823f119", "sha256": "09ae4f86912094d7a652ff077ffd4dbfede7058154fb3b4d0d23a3735c661fed" }, "downloads": -1, "filename": "ruleset-1.0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "9c694701f513143797631723a823f119", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3", "size": 15367, "upload_time": "2017-10-22T03:32:52", "url": "https://files.pythonhosted.org/packages/9f/3e/f1f88d114bd67c609756acb7ac6f4140c47581fdb30907b1ecf0a95ff32e/ruleset-1.0.1-py3-none-any.whl" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "9c694701f513143797631723a823f119", "sha256": "09ae4f86912094d7a652ff077ffd4dbfede7058154fb3b4d0d23a3735c661fed" }, "downloads": -1, "filename": "ruleset-1.0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "9c694701f513143797631723a823f119", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3", "size": 15367, "upload_time": "2017-10-22T03:32:52", "url": "https://files.pythonhosted.org/packages/9f/3e/f1f88d114bd67c609756acb7ac6f4140c47581fdb30907b1ecf0a95ff32e/ruleset-1.0.1-py3-none-any.whl" } ] }