# Eric.ed.gov – Bayesian Unimodal Density Regression for Causal Inference

eric.ed.gov has published:

Karabatsos and Walker (2011) introduced a new Bayesian nonparametric (BNP) regression model. Through analyses of real and simulated data, they showed that the BNP regression model outperforms other widely used parametric and nonparametric regression models in terms of predictive accuracy for the outcome (dependent) variable. The outperformed models include random-effects/hierarchical linear and generalized linear models, both when the random effects are assumed to be normally distributed (Laird & Ware, 1982; Breslow & Clayton, 1993) and when the random effects are modeled more flexibly by a nonparametric Dirichlet process (DP) mixture prior (Kleinman & Ibrahim, 1998a, 1998b).

The authors argue that the new BNP regression model provides a novel, richer, and more valid approach to causal inference: it allows the researcher to investigate how treatments causally change the entire distribution (density) of (potential) outcomes, including not only the mean but also other features of the outcome variable, such as quantiles (e.g., the median or the 10th percentile) and the variance. They illustrate the BNP model through the analysis of observational data, estimating the causal effect of exposure to excellent high school math education (versus non-exposure, the control) on ACT math achievement. In the data analysis, they also compare the predictive accuracy of the new BNP model against other regression models that have been recommended for causal inference from observational data; these models assume symmetric distributions for the outcomes and for the inverse-link function of the propensity score model (when one is specified).
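The distributional focus described above can be made concrete with a small, hypothetical sketch: a "quantile treatment effect" compares the treated and control outcome distributions quantile by quantile, so a treatment can help at the top of the distribution while doing little (or harm) at the bottom. The sketch below uses simulated, naively unadjusted data in pure Python; the function name and the simulated ACT-math-like scores are illustrative and are not the authors' BNP model or data.

```python
# Illustrative sketch (NOT the authors' BNP model): compare empirical
# quantiles of treated vs. control outcomes on simulated data.
import random

def empirical_quantile(values, q):
    """Nearest-rank empirical quantile of a list of numbers."""
    ordered = sorted(values)
    idx = min(int(q * len(ordered)), len(ordered) - 1)
    return ordered[idx]

def quantile_treatment_effects(y_treated, y_control, quantiles=(0.10, 0.50, 0.90)):
    """Treated-minus-control difference at each requested quantile."""
    return {q: empirical_quantile(y_treated, q) - empirical_quantile(y_control, q)
            for q in quantiles}

rng = random.Random(0)
# Simulated scores: the treatment shifts the mean AND widens the spread,
# so the effect differs across quantiles (large at the 90th, small or
# negative at the 10th) even though the mean effect is a single number.
y_control = [rng.gauss(20, 4) for _ in range(5000)]
y_treated = [rng.gauss(22, 6) for _ in range(5000)]

qte = quantile_treatment_effects(y_treated, y_control)
```

A mean-only causal model would summarize this treatment by one number, while the quantile view reveals that the 10th-percentile and 90th-percentile effects differ sharply.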
The comparison models include the normal linear regression model with interactions among (1) subject (pre-treatment) covariates, (2) treatment indicators, and (3) indicators of five or more matched groups of subjects, formed either by subclassification (Rosenbaum & Rubin, 1984) or by optimal full matching on the estimated propensity score. They also compare against Bayesian additive regression trees (BART), which provide a very flexible regression of observed outcomes on the treatment variable and the covariates. Extensive data-based simulation studies have shown that, in terms of bias and mean squared error of causal effect estimates, these linear regression models and BART outperform normal linear regression of outcomes that relies on (1) propensity-score-based pair matching or subclassification alone, (2) treatment indicators and estimated propensity scores as covariates, or (3) observation weights defined by the inverse of the estimated propensity score, both when the only covariate is a treatment indicator (Robins et al., 2000) and when the linear model also includes subject covariates (Kang & Schafer, 2007; Schafer & Kang, 2008; Hill, 2011). These results held especially when both the outcome and propensity score models were misspecified for the data, which arguably almost always occurs in practice.

Through the analysis of an observational data set on math achievement, the authors showed that the new BNP regression model can provide richer causal inferences with higher predictive accuracy than typical causal models, which focus inference on the mean outcome and make restrictive parametric assumptions about the outcome variable and the propensity score model. The new BNP model allows one to investigate, in a flexible manner, how treatments causally change any aspect of interest of the distribution (density) of (potential) outcomes. (Contains 3 tables and 1 figure.) [This research is supported by the Chicago Teacher Partnership Project.]
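As a minimal sketch of one of the baseline techniques mentioned above, subclassification on the propensity score (Rosenbaum & Rubin, 1984) sorts units by propensity score, splits them into five strata, takes the treated-minus-control mean difference within each stratum, and averages across strata. The simulation below is purely illustrative: the propensity scores are taken as known rather than estimated, and the data and function names are invented, not the paper's analysis.

```python
# Hypothetical sketch of propensity-score subclassification into 5 strata.
# Assumption: propensity scores are known; in practice they are estimated.
import random

def subclassification_estimate(records, n_strata=5):
    """records: list of (propensity_score, treated_flag, outcome) tuples."""
    ordered = sorted(records, key=lambda r: r[0])
    size = len(ordered) // n_strata
    effects, weights = [], []
    for k in range(n_strata):
        hi = (k + 1) * size if k < n_strata - 1 else len(ordered)
        stratum = ordered[k * size:hi]
        treated = [y for _, d, y in stratum if d == 1]
        control = [y for _, d, y in stratum if d == 0]
        if treated and control:  # skip strata lacking overlap
            effects.append(sum(treated) / len(treated) - sum(control) / len(control))
            weights.append(len(stratum))
    # Stratum-size-weighted average of within-stratum mean differences.
    return sum(e * w for e, w in zip(effects, weights)) / sum(weights)

rng = random.Random(1)
records = []
for _ in range(20000):
    x = rng.random()                           # confounder
    p = 0.2 + 0.6 * x                          # true propensity score
    d = 1 if rng.random() < p else 0           # treatment assignment
    y = 10 + 5 * x + 2 * d + rng.gauss(0, 3)   # true causal effect = 2
    records.append((p, d, y))

# Naive treated-minus-control mean difference is biased by the confounder x.
naive = (sum(y for _, d, y in records if d == 1) / sum(d for _, d, _ in records)
         - sum(y for _, d, y in records if d == 0) / sum(1 - d for _, d, _ in records))
adjusted = subclassification_estimate(records)
```

Because treatment uptake rises with the confounder, the naive difference overstates the true effect of 2, while stratifying on the propensity score largely removes that bias.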