{ "info": { "author": "Amit Sharma, Emre Kiciman", "author_email": "", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "License :: OSI Approved :: MIT License", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7" ], "description": "DoWhy | Making causal inference easy\n====================================\n\n`Amit Sharma `_,\n`Emre Kiciman `_\n\n`Blog Post `_ | `Docs `_ | Try it in a web browser! |Binder|_\n\n.. |Binder| image:: https://mybinder.org/badge_logo.svg\n.. _Binder: https://mybinder.org/v2/gh/microsoft/dowhy/master?filepath=docs%2Fsource%2F\n\n\nAs computing systems are more frequently and more actively intervening in societally critical domains such as healthcare, education, and governance, it is critical to correctly predict and understand the causal effects of these interventions. Without an A/B test, conventional machine learning methods, built on pattern recognition and correlational analyses, are insufficient for causal reasoning. \n\nMuch like machine learning libraries have done for prediction, **\"DoWhy\" is a Python library that aims to spark causal thinking and analysis**. DoWhy provides a unified interface for causal inference methods and automatically tests many assumptions, thus making inference accessible to non-experts.\n\nFor a quick introduction to causal inference, check out `amit-sharma/causal-inference-tutorial `_. We also gave a more comprehensive tutorial at the ACM Knowledge Discovery and Data Mining (`KDD 2018 `_) conference: `causalinference.gitlab.io/kdd-tutorial `_.\n\nDocumentation for DoWhy is available at `microsoft.github.io/dowhy `_.\n\n.. i here comment toctree::\n.. i here comment :maxdepth: 4\n.. i here comment :caption: Contents:\n.. 
contents:: Contents\n\nThe need for causal inference\n----------------------------------\n\nPredictive models uncover patterns that connect the inputs and outcome in observed data. To intervene, however, we need to estimate the effect of changing an input from its current value, for which no data exists. Such questions, involving estimating a *counterfactual*, are common in decision-making scenarios.\n\n* Will it work?\n * Does a proposed change to a system improve people's outcomes?\n* Why did it work?\n * What led to a change in a system's outcome?\n* What should we do?\n * What changes to a system are likely to improve outcomes for people?\n* What are the overall effects?\n * How does the system interact with human behavior?\n * What is the effect of a system's recommendations on people's activity?\n\nAnswering these questions requires causal reasoning. While many methods exist\nfor causal inference, it is hard to compare their assumptions and the robustness of their results. DoWhy makes three contributions:\n\n1. Provides a principled way of modeling a given problem as a causal graph so\n that all assumptions are explicit.\n2. Provides a unified interface for many popular causal inference methods, combining the two major frameworks of graphical models and potential outcomes.\n3. Automatically tests for the validity of assumptions if possible and assesses\n the robustness of the estimate to violations.\n\n\n\nSample causal inference analysis in DoWhy\n-------------------------------------------\nMost DoWhy\nanalyses for causal inference take four lines to write, assuming a\npandas dataframe df that contains the data:\n\n.. code:: python\n\n import dowhy\n from dowhy.do_why import CausalModel\n import dowhy.datasets\n\n # Load some sample data\n data = dowhy.datasets.linear_dataset(\n beta=10,\n num_common_causes=5,\n num_instruments=2,\n num_samples=10000,\n treatment_is_binary=True)\n\nDoWhy supports two formats for providing the causal graph: `gml `_ (preferred) and `dot `_. 
After loading in the data, we use the four main operations in DoWhy: *model*,\n*identify*, *estimate* and *refute*:\n\n.. code:: python\n\n # Create a causal model from the data and given graph.\n model = CausalModel(\n data=data[\"df\"],\n treatment=data[\"treatment_name\"],\n outcome=data[\"outcome_name\"],\n graph=data[\"gml_graph\"])\n\n # Identify causal effect and return target estimands\n identified_estimand = model.identify_effect()\n\n # Estimate the target estimand using a statistical method.\n estimate = model.estimate_effect(identified_estimand,\n method_name=\"backdoor.propensity_score_matching\")\n\n # Refute the obtained estimate using multiple robustness checks.\n refute_results = model.refute_estimate(identified_estimand, estimate,\n method_name=\"random_common_cause\")\n\nDoWhy stresses the interpretability of its output. At any point in the analysis,\nyou can inspect the untested assumptions, identified estimands (if any) and the\nestimate (if any). Here's a sample output of the linear regression estimator.\n\n.. image:: https://raw.githubusercontent.com/microsoft/dowhy/master/docs/images/regression_output.png\n\nFor detailed code examples, check out the Jupyter notebooks in `docs/source/ `_, or try them online at `Binder `_.\n\n\nA High-level Pandas API\n-----------------------\n\nWe've made an even simpler API for DoWhy, which is a light layer on top of the standard one. The goal\nwas to make causal analysis much more like regular exploratory analysis. To use this API, simply\nimport :code:`dowhy.api`. This will magically add the :code:`causal` namespace to your\n:code:`pandas.DataFrame` s. Then,\nyou can use the namespace as follows.\n\n.. 
code:: python\n\n import dowhy.api\n import dowhy.datasets\n\n data = dowhy.datasets.linear_dataset(beta=5,\n num_common_causes=1,\n num_instruments = 0,\n num_samples=1000,\n treatment_is_binary=True)\n\n # data['df'] is just a regular pandas.DataFrame\n data['df'].causal.do(x='v',\n variable_types={'v': 'b', 'y': 'c', 'X0': 'c'},\n outcome='y',\n common_causes=['X0']).groupby('v').mean().plot(y='y', kind='bar')\n\n.. image:: https://raw.githubusercontent.com/microsoft/dowhy/master/docs/images/do_barplot.png\n\nThe :code:`do` method in the causal namespace generates a random sample from $P(outcome|do(X=x))$ of the\nsame length as your data set, and returns this outcome as a new :code:`DataFrame`. You can continue to perform\nthe usual :code:`DataFrame` operations with this sample, and so you can compute statistics and create plots\nfor causal outcomes!\n\nThe :code:`do` method is built on top of the lower-level :code:`dowhy` objects, so it can still take a graph and perform\nidentification automatically when you provide a graph instead of :code:`common_causes`.\n\n\nInstallation\n-------------\n\n**Requirements**\n\nDoWhy supports Python 3+. It requires the following packages:\n\n* numpy\n* scipy\n* scikit-learn\n* pandas\n* networkx (for analyzing causal graphs)\n* matplotlib (for general plotting)\n* sympy (for rendering symbolic expressions)\n\nInstall DoWhy and its dependencies by running this from the top-most folder of\nthe repo.\n\n.. code:: shell\n \n python setup.py install\n\nIf you face any problems, try installing dependencies manually.\n\n.. code:: shell\n \n pip install -r requirements.txt\n\nOptionally, if you wish to input graphs in the dot format, then install pydot (or pygraphviz).\n\n\nFor better-looking graphs, you can optionally install pygraphviz. To proceed,\nfirst install graphviz and then pygraphviz (on Ubuntu and Ubuntu WSL).\n\n.. 
code:: shell\n\n sudo apt install graphviz libgraphviz-dev graphviz-dev pkg-config\n ## from https://github.com/pygraphviz/pygraphviz/issues/71\n pip install pygraphviz --install-option=\"--include-path=/usr/include/graphviz\" \\\n --install-option=\"--library-path=/usr/lib/graphviz/\"\n\nKeep in mind that pygraphviz installation can be problematic on the latest versions of Python3. Tested to work with Python 3.5.\n\nGraphical Models and Potential Outcomes: Best of both worlds\n------------------------------------------------------------\nDoWhy builds on two of the most powerful frameworks for causal inference:\ngraphical models and potential outcomes. It uses graph-based criteria and\ndo-calculus for modeling assumptions and identifying a non-parametric causal effect.\nFor estimation, it switches to methods based primarily on potential outcomes.\n\nA unifying language for causal inference\n----------------------------------------\n\nDoWhy is based on a simple unifying language for causal inference. Causal\ninference may seem tricky, but almost all methods follow four key steps:\n\n1. Model a causal inference problem using assumptions.\n2. Identify an expression for the causal effect under these assumptions (\"causal estimand\").\n3. Estimate the expression using statistical methods such as matching or instrumental variables.\n4. Finally, verify the validity of the estimate using a variety of robustness checks.\n\nThis workflow can be captured by four key verbs in DoWhy:\n\n- model\n- identify\n- estimate\n- refute\n\nUsing these verbs, DoWhy implements a causal inference engine that can support \na variety of methods. 
*model* encodes prior knowledge as a formal causal graph, *identify* uses \ngraph-based methods to identify the causal effect, *estimate* uses \nstatistical methods for estimating the identified estimand, and finally *refute* \ntries to refute the obtained estimate by testing robustness to assumptions.\n\nDoWhy brings three key differences compared to available software for causal inference:\n\n**Explicit identifying assumptions**\n Assumptions are first-class citizens in DoWhy.\n\n Each analysis starts with\n building a causal model. The assumptions can be viewed graphically or in terms\n of conditional independence statements. Wherever possible, DoWhy can also\n automatically test for stated assumptions using observed data.\n\n**Separation between identification and estimation**\n Identification is the causal problem. Estimation is simply a statistical problem.\n\n DoWhy\n respects this boundary and treats them separately. This focuses the causal\n inference effort on identification, and frees up estimation to use any\n available statistical estimator for a target estimand. In addition, multiple\n estimation methods can be used for a single identified_estimand and\n vice-versa.\n\n**Automated robustness checks**\n What happens when key identifying assumptions may not be satisfied?\n\n The most critical, and often skipped, part of causal analysis is checking the\n robustness of an estimate to unverified assumptions. DoWhy makes it easy to\n automatically run sensitivity and robustness checks on the obtained estimate.\n\nFinally, DoWhy is easily extensible, allowing other implementations of the\nfour verbs to co-exist (we hope to integrate with external\nimplementations in the future). 
The four verbs are mutually independent, so their\nimplementations can be combined in any way.\n\n\n\nBelow are more details about the current implementation of each of these verbs.\n\nModel a causal problem\n-----------------------\nDoWhy creates an underlying causal graphical model for each problem. This\nserves to make each causal assumption explicit. This graph need not be\ncomplete---you can provide a partial graph, representing prior\nknowledge about some of the variables. DoWhy automatically considers the rest\nof the variables as potential confounders.\n\nCurrently, DoWhy supports two formats for graph input: `gml `_ (preferred) and\n`dot `_. We strongly suggest using gml as the input format, as it works well with networkx. You can provide the graph either as a .gml file or as a string. If you prefer to use dot format, you will need to install additional packages (pydot or pygraphviz, see the installation section above). Both .dot files and string format are supported. \n\nWhile not recommended, you can also specify common causes and/or instruments directly\ninstead of providing a graph.\n\n\n.. i comment image:: causal_model.png\n\nIdentify a target estimand under the model\n------------------------------------------\nBased on the causal graph, DoWhy finds all possible ways of identifying a desired causal effect.\nIt uses graph-based criteria and do-calculus to find\nexpressions that can identify the causal effect.\n\nEstimate causal effect based on the identified estimand\n-------------------------------------------------------\nDoWhy supports methods based on both the back-door criterion and instrumental\nvariables. It also provides a non-parametric permutation test for testing\nthe statistical significance of the obtained estimate. 
\n\nCurrently supported back-door criterion methods.\n\n* Methods based on estimating the treatment assignment\n * Propensity-based Stratification\n * Propensity Score Matching\n * Inverse Propensity Weighting\n\n* Methods based on estimating the response surface\n * Regression\n\nCurrently supported methods based on instrumental variables.\n\n* Binary Instrument/Wald Estimator\n* Regression discontinuity\n\n\nRefute the obtained estimate\n----------------------------\nHaving access to multiple refutation methods to verify a causal inference is\na key benefit of using DoWhy.\n\nDoWhy supports the following refutation methods.\n\n* Placebo Treatment\n* Irrelevant Additional Confounder\n* Subset validation\n\n\nRoadmap \n-----------\nThe `projects `_ page lists the next steps for DoWhy. If you would like to contribute, have a look at the current projects. If you have a specific request for DoWhy, please raise an issue `here `_.\n\nContributing\n-------------\n\nThis project welcomes contributions and suggestions. Most contributions require you to agree to a\nContributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us\nthe rights to use your contribution. For details, visit https://cla.microsoft.com.\n\nWhen you submit a pull request, a CLA-bot will automatically determine whether you need to provide\na CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions\nprovided by the bot. 
You will only need to do this once across all repos using our CLA.\n\nThis project has adopted the `Microsoft Open Source Code of Conduct `_.\nFor more information see the `Code of Conduct FAQ `_ or\ncontact `opencode@microsoft.com `_ with any additional questions or comments.", "description_content_type": "", "docs_url": null, "download_url": "https://github.com/microsoft/dowhy/archive/v0.1.1-alpha.tar.gz", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/microsoft/dowhy", "keywords": "causality machine-learning causal-inference statistics graphical-model", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "dowhy", "package_url": "https://pypi.org/project/dowhy/", "platform": "", "project_url": "https://pypi.org/project/dowhy/", "project_urls": { "Download": "https://github.com/microsoft/dowhy/archive/v0.1.1-alpha.tar.gz", "Homepage": "https://github.com/microsoft/dowhy" }, "release_url": "https://pypi.org/project/dowhy/0.1.1/", "requires_dist": null, "requires_python": ">=3.0", "summary": "DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions.", "version": "0.1.1" }, "last_serial": 5534873, "releases": { "0.1.1": [ { "comment_text": "", "digests": { "md5": "0d530080cf915fc49474976abab38d5a", "sha256": "b322fdda2f43677e37ce44bcb1e413c5e54239e718a5b8616e98fb7b87138d5b" }, "downloads": -1, "filename": "dowhy-0.1.1.tar.gz", "has_sig": false, "md5_digest": "0d530080cf915fc49474976abab38d5a", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.0", "size": 459272, "upload_time": "2019-07-15T13:12:59", "url": "https://files.pythonhosted.org/packages/ec/24/1909fc4098920682e91d10a317186d11190c87be67319328fee0953d480d/dowhy-0.1.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "0d530080cf915fc49474976abab38d5a", "sha256": "b322fdda2f43677e37ce44bcb1e413c5e54239e718a5b8616e98fb7b87138d5b" }, "downloads": 
-1, "filename": "dowhy-0.1.1.tar.gz", "has_sig": false, "md5_digest": "0d530080cf915fc49474976abab38d5a", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.0", "size": 459272, "upload_time": "2019-07-15T13:12:59", "url": "https://files.pythonhosted.org/packages/ec/24/1909fc4098920682e91d10a317186d11190c87be67319328fee0953d480d/dowhy-0.1.1.tar.gz" } ] }