{ "info": { "author": "US ", "author_email": "who@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Environment :: Console", "Intended Audience :: Developers", "Intended Audience :: End Users/Desktop", "License :: OSI Approved :: Apache Software License", "Operating System :: OS Independent", "Programming Language :: Python", "Topic :: Scientific/Engineering :: Information Analysis" ], "description": "# Cape\nAs an initial contribution we have developed a novel system called Cape (Counterbalancing with Aggregation Patterns for Explanations) for explaining outliers in aggregation queries through counterbalancing. That is, explanations are outliers in the opposite direction of the outlier of interest.\n**Cape** (**C**ounterbalancing with **A**ggregation **P**atterns for **E**xplanations) is a system that explains outlying (surprisingly low or high) aggregation function results for group-by queries in SQL. The user provides the system with a query and a surprising outcome for this query. Cape then uses patterns discovered over the input data of the query in an offline pattern mining step to explain the outlier by finding a *related* outlier in the opposite direction that **counterbalances** the outlier of interest.\n\n# Installation\n\n## Prerequisites\n\nCape requires Python 3 and uses Python's `tkinter` for its graphical UI. For example, on Ubuntu you can install the prerequisites with:\n\n~~~shell\nsudo apt-get install python3 python3-pip python3-tk\n~~~\n\n## Install with pip\n\nMake sure you have pip installed (see previous step).\n\n~~~shell\npip3 install capexplain\n~~~\n\n## Install from GitHub\n\nAlternatively, you can install directly from source. For that you need python3 and the setuptools package. You will probably want to use a [virtual environment](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/) for that. 
If you have `python3` and `pip` installed, you can install/update the `setuptools` package by running:\n\n~~~shell\n~~~\n\nTo install Cape, clone the GitHub repository and use `setup.py` to install.\n\n~~~shell\ngit clone git@github.com:IITDBGroup/cape.git capexplain\ncd capexplain\npython3 setup.py install\n~~~\n\n# Usage\n\nCape provides a single binary `capexplain` that supports multiple subcommands. The general form is:\n\n~~~shell\ncapexplain COMMAND [OPTIONS]\n~~~\n\nOptions are specific to each subcommand. Use `capexplain help` to see a list of supported commands and `capexplain help COMMAND` to get more detailed help for a subcommand.\n\n## Overview\n\nCape currently only supports PostgreSQL as a backend database (version 9 or higher). To use Cape to explain an aggregation outlier, you first have to let Cape find patterns for the table over which you are aggregating. This is an offline step that has to be executed only once for each table (unless you want to re-run pattern mining with different parameter settings). Afterwards, you can use either the command line or Cape's UI to request explanations for an outlier in an aggregation query result.\n\n## Mining Patterns\n\nUse `capexplain mine [OPTIONS]` to mine patterns. Cape stores the discovered patterns in the database, in a schema it creates called `pattern`. The pattern tables generated by running the `mine` command are `pattern.{target_table}_global` and `pattern.{target_table}_local`. At a minimum, you have to tell Cape how to connect to the database you want to use and which table it should generate patterns for. Run `capexplain help mine` to get a list of all supported options for the mine command. 
The options needed to specify the target table and database connection are:\n\n~~~shell\n-h ,--host - database connection host IP address (DEFAULT: 127.0.0.1)\n-u ,--user - database connection user (DEFAULT: postgres)\n-p ,--password - database connection password\n-d ,--db - database name (DEFAULT: postgres)\n-P ,--port - database connection port (DEFAULT: 5432)\n-t ,--target-table - mine patterns for this table\n~~~\n\nFor instance, if you run a postgres server locally (default) with user `postgres` (default), password `test`, and want to mine patterns for a table `employees` in database `mydb`, then run:\n\n~~~shell\ncapexplain mine -p test -d mydb -t employees\n~~~\n\n### Mining algorithm parameters\n\nCape's mining algorithm takes the following arguments:\n\n~~~shell\n--gof-const - goodness-of-fit threshold for constant regression (DEFAULT: 0.1)\n--gof-linear - goodness-of-fit threshold for linear regression (DEFAULT: 0.1)\n--confidence - global confidence threshold\n-r ,--regpackage - regression analysis package to use {'statsmodels', 'sklearn'} (DEFAULT: statsmodels)\n--local-support - local support threshold (DEFAULT: 10)\n--global-support - global support threshold (DEFAULT: 100)\n-f ,--fd-optimizations - activate functional dependency detection and optimizations (DEFAULT: False)\n-a ,--algorithm - algorithm to use for pattern mining {'naive', 'optimized', 'naive_alternative'} (DEFAULT: optimized)\n--show-progress - show progress meters (DEFAULT: True)\n--manual-config - manually configure numeric-like string fields (treat fields as string or numeric?) (DEFAULT: False)\n\n~~~\n\n### Running our \"crime\" data example\n\nWe included a subset of the \"Chicago Crime\" dataset (https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/)\nin our repository for users to experiment with. 
To import this dataset into your Postgres database, run the following command template, supplying your host, user, and database name after `-h`, `-U`, and `-d` (the SQL file is located in the repository's `testdb` directory):\n\n~~~shell\npsql -h -U -d < ~/cape/testdb/crime_demonstration.sql\n~~~\nThen run the `capexplain` commands to explore this example.\n\n## Explaining Outliers\n\nTo explain an aggregation outlier use `capexplain explain [OPTIONS]`.\n\n~~~shell\n-l ,--log - select log level {DEBUG,INFO,WARNING,ERROR} (DEFAULT: ERROR)\n--help - show this help message\n-h ,--host - database connection host IP address (DEFAULT: 127.0.0.1)\n-u ,--user - database connection user (DEFAULT: postgres)\n-p ,--password - database connection password\n-d ,--db - database name (DEFAULT: postgres)\n-P ,--port - database connection port (DEFAULT: 5432)\n--ptable - table storing aggregate regression patterns\n--qtable - table storing aggregation query result\n--ufile - file storing user question\n-o ,--ofile - file to write output to\n-a ,--aggcolumn - column that was input to the aggregation function\n~~~\nFor the `explain` command, besides the common connection options, you should provide `--ptable` (the pattern table, `pattern.{target_table}`) and `--qtable` (the `target_table`). Currently, the user question can only be passed through a `.txt` file (`--ufile`), which must use the following format:\n\n~~~shell\nattribute1, attribute2, attribute3...., direction\nvalue1,value2,value3...., high/low\n~~~\nPlease refer to `input.txt` for an example.\n\n\n## Starting the Explanation Explorer GUI\n\nCape comes with a graphical UI for running queries, selecting outliers of interest, exploring patterns that are relevant for an outlier, and browsing explanations generated by the system. You need to specify the Postgres server to connect to. 
The explorer can only generate explanations for queries over tables for which patterns have been mined beforehand using `capexplain mine`.\nHere is our demo video: https://www.youtube.com/watch?v=gWqhIUrcwz8\n\n~~~shell\n$ capexplain help gui\ncapexplain gui [OPTIONS]:\n\tOpen the Cape graphical explanation explorer.\n\nSUPPORTED OPTIONS:\n-l ,--log - select log level {DEBUG,INFO,WARNING,ERROR} (DEFAULT: ERROR)\n--help - show this help message\n-h ,--host - database connection host IP address (DEFAULT: 127.0.0.1)\n-u ,--user - database connection user (DEFAULT: postgres)\n-p ,--password - database connection password\n-d ,--db - database name (DEFAULT: postgres)\n-P ,--port - database connection port (DEFAULT: 5432)\n~~~\n\nFor instance, if you run a postgres server locally (default) with user `postgres` (default), password `test`, and database `mydb`, then run:\n\n~~~shell\ncapexplain gui -p test -d mydb\n~~~\n\n# Links\n\nCape is developed by researchers at Illinois Institute of Technology and Duke University. 
For more information and publications see the Cape project page [http://www.cs.iit.edu/~dbgroup/projects/cape.html](http://www.cs.iit.edu/~dbgroup/projects/cape.html).\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/iitdbgroup/cape", "keywords": "db", "license": "Apache2", "maintainer": "", "maintainer_email": "", "name": "capexplain", "package_url": "https://pypi.org/project/capexplain/", "platform": "any", "project_url": "https://pypi.org/project/capexplain/", "project_urls": { "Homepage": "https://github.com/iitdbgroup/cape" }, "release_url": "https://pypi.org/project/capexplain/0.4/", "requires_dist": [ "certifi (>=2018.4.16)", "chardet (>=3.0.4)", "colorful (>=0.4.1)", "geopy", "idna (>=2.7)", "matplotlib (==3.0.2)", "numpy (>=1.14.5)", "pandas (==0.23.4)", "patsy (>=0.5.0)", "pkginfo (>=1.4.2)", "psycopg2-binary (>=2.7.6)", "python-dateutil (>=2.7.3)", "pytz (>=2018.5)", "requests (>=2.19.1)", "requests-toolbelt (>=0.8.0)", "scikit-learn (>=0.19.2)", "scipy (==1.2.1)", "six (>=1.11.0)", "sklearn (>=0.0)", "SQLAlchemy (==1.2.10)", "statsmodels (>=0.9.0)", "tqdm (>=4.23.4)", "urllib3 (<1.24,>=1.23)", "pandastable (>=0.12.0)", "matplotlib (>=3.0.2)" ], "requires_python": "", "summary": "Cape - a system for explaining outliers in aggregation results through counterbalancing.", "version": "0.4" }, "last_serial": 5723584, "releases": { "0.2": [ { "comment_text": "", "digests": { "md5": "49e40aff083e7b81b257a9c3c619248a", "sha256": "9d628bddff71b896408ffddd9772edf9033b8fdd481d5702e8d0ab6a3e795ab2" }, "downloads": -1, "filename": "capexplain-0.2-py3-none-any.whl", "has_sig": false, "md5_digest": "49e40aff083e7b81b257a9c3c619248a", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 94201, "upload_time": "2019-06-13T05:06:18", "url": 
"https://files.pythonhosted.org/packages/d6/c1/27c423a073bc7c0605ca0431d8954c669abf1229fb0abf893165037e5896/capexplain-0.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "60473a4f5d11da40061a13bd72142264", "sha256": "dc0273912b444631c5e5419c69889efdb06dab2167507a7db83dabe2bad87c84" }, "downloads": -1, "filename": "capexplain-0.2.tar.gz", "has_sig": false, "md5_digest": "60473a4f5d11da40061a13bd72142264", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 69089, "upload_time": "2019-06-13T05:06:20", "url": "https://files.pythonhosted.org/packages/d8/56/392696ac55517bb0ab170592b7bb84986b365035e174b03aad17b5fbf0b2/capexplain-0.2.tar.gz" } ], "0.3": [ { "comment_text": "", "digests": { "md5": "eaff7a06d43695c0f5a5450cf1a88d74", "sha256": "fc36e7888fb28dbd434cce070f6890c0aefee71889981e4131920031175197f3" }, "downloads": -1, "filename": "capexplain-0.3-py3-none-any.whl", "has_sig": false, "md5_digest": "eaff7a06d43695c0f5a5450cf1a88d74", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 94211, "upload_time": "2019-06-13T05:16:23", "url": "https://files.pythonhosted.org/packages/9e/e4/476c01af552096fdb2675c18ad39a55f71e349a7edc508831a39795493c1/capexplain-0.3-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "58cc3ff18bc34bd5ca570fd7f8cfc4d1", "sha256": "64aca37b65a7f977b94012a42d99cf0ed7f8ef67d46d87ead57a5d84d847adf1" }, "downloads": -1, "filename": "capexplain-0.3.tar.gz", "has_sig": false, "md5_digest": "58cc3ff18bc34bd5ca570fd7f8cfc4d1", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 69098, "upload_time": "2019-06-13T05:16:26", "url": "https://files.pythonhosted.org/packages/47/0f/d158ab40403c24562dc52ae0ca68f1a1ff126164530377adf1fe5483bf53/capexplain-0.3.tar.gz" } ], "0.4": [ { "comment_text": "", "digests": { "md5": "400604749bb8f396cd0ea3f2e84f3882", "sha256": "0efe895e57e3fe20434f1a5f710eb70c340bf40dc6b72ccf6eb24ea085da67f8" }, 
"downloads": -1, "filename": "capexplain-0.4-py3-none-any.whl", "has_sig": false, "md5_digest": "400604749bb8f396cd0ea3f2e84f3882", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 99607, "upload_time": "2019-08-24T03:42:55", "url": "https://files.pythonhosted.org/packages/62/e9/3840a1ec6029031d9388fa79737506aa0b5aa2fd6997c3d892d9e12b395d/capexplain-0.4-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "3d8538b313c71302831e7c3a6ab84891", "sha256": "075b8e5e94fdc655a43feecc6a92da546042ce0ef0642e8e7264d00cdec9585a" }, "downloads": -1, "filename": "capexplain-0.4.tar.gz", "has_sig": false, "md5_digest": "3d8538b313c71302831e7c3a6ab84891", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 77501, "upload_time": "2019-08-24T03:42:57", "url": "https://files.pythonhosted.org/packages/c6/61/4bcbe934d46fac7e5555dd2255634ab35c3e33c6ecbf3638fe5ae2c02378/capexplain-0.4.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "400604749bb8f396cd0ea3f2e84f3882", "sha256": "0efe895e57e3fe20434f1a5f710eb70c340bf40dc6b72ccf6eb24ea085da67f8" }, "downloads": -1, "filename": "capexplain-0.4-py3-none-any.whl", "has_sig": false, "md5_digest": "400604749bb8f396cd0ea3f2e84f3882", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 99607, "upload_time": "2019-08-24T03:42:55", "url": "https://files.pythonhosted.org/packages/62/e9/3840a1ec6029031d9388fa79737506aa0b5aa2fd6997c3d892d9e12b395d/capexplain-0.4-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "3d8538b313c71302831e7c3a6ab84891", "sha256": "075b8e5e94fdc655a43feecc6a92da546042ce0ef0642e8e7264d00cdec9585a" }, "downloads": -1, "filename": "capexplain-0.4.tar.gz", "has_sig": false, "md5_digest": "3d8538b313c71302831e7c3a6ab84891", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 77501, "upload_time": "2019-08-24T03:42:57", "url": 
"https://files.pythonhosted.org/packages/c6/61/4bcbe934d46fac7e5555dd2255634ab35c3e33c6ecbf3638fe5ae2c02378/capexplain-0.4.tar.gz" } ] }