Doubly Robust / AIPW Estimation
Combines outcome modeling and propensity score weighting — consistent if either model is correctly specified.
Quick Reference
- When to Use
- When you rely on selection-on-observables and want protection against model misspecification — consistent if either the outcome model or the propensity score model is correctly specified.
- Key Assumption
- Conditional independence (selection on observables) and positivity/overlap (propensity scores bounded away from 0 and 1). At least one of the two models (outcome or propensity) must be correctly specified for consistency.
- Common Mistake
- Assuming doubly robust means no assumptions — you still need conditional independence and overlap. Also, not checking for positivity violations (propensity scores near 0 or 1) which cause extreme inverse probability weights.
- Estimated Time
- 2.5 hours
One-Line Implementation
Stata: `teffects aipw (outcome x1 x2 x3) (treatment x1 x2 x3)`
R: `AIPW$new(Y = df$outcome, A = df$treatment, W = df[, c('x1', 'x2', 'x3')])$fit()$summary()`
Python: `from econml.dr import DRLearner; DRLearner().fit(Y, T, X=X, W=W)`
Motivating Example
A government wants to evaluate its job training program. Randomization was not possible — the program was offered to anyone who signed up, and the people who signed up are different from those who did not. You observe a rich set of covariates (education, age, prior earnings, employment history, neighborhood characteristics) and are willing to assume that, conditional on these covariates, participation is as good as random.
You have two obvious strategies:
Strategy 1: Outcome regression. Model the outcome (post-program earnings) as a function of treatment and covariates using, say, OLS. If the model is correctly specified, this regression gives you the treatment effect. But if you get the functional form wrong — maybe the true relationship is nonlinear, or there are interactions you missed — your estimate is biased.
Strategy 2: Propensity score weighting. Model the treatment assignment (who signs up) as a function of covariates, as in matching methods. Re-weight observations so that the treated and control groups are balanced on observables. If the propensity score model is correctly specified, this reweighting gives you the treatment effect. But if you misspecify the selection model, the weights are wrong and your estimate is biased.
Each strategy relies on getting one model right. What if you are not sure which one you got right?
Doubly robust estimation combines both strategies into a single estimator that is consistent if either the outcome model or the propensity score model is correctly specified. You only need one of the two to work. This property — known as double robustness — is a key theoretical advantage of the estimator.
A. Overview
The doubly robust estimator, also called the augmented inverse probability weighted (AIPW) estimator, was developed by Robins and colleagues in the biostatistics literature (Robins et al., 1994; Bang & Robins, 2005).
The intuition is instructive. The AIPW estimator starts with the outcome regression and then corrects it using the propensity score weights. If the outcome model is right, the correction term has expected value zero and does not hurt. If the outcome model is wrong but the propensity score model is right, the correction term exactly removes the bias. You get two chances to be right.
How It Works (Intuitively)
Think of it in two steps:
1. Predict outcomes for everyone under both treatment and control using your outcome model. Compute the predicted treatment effect.
2. Correct the prediction errors using propensity score weights. For each observation, look at the residual (actual outcome minus predicted outcome). Weight these residuals by the inverse of the propensity score to correct for any remaining imbalance.
If your outcome model is perfect, the residuals are pure noise, and the correction adds nothing. If your outcome model is imperfect but your propensity scores are right, the weighted residuals exactly correct the bias.
Common Confusions
"Does doubly robust mean I do not need the selection-on-observables assumption?" Absolutely not. Doubly robust estimation still requires conditional independence (also called unconfoundedness or ignorability). It protects against model misspecification, not against omitted variables. If there are unobserved confounders, no amount of modeling sophistication will save you.
"What if both models are wrong?" Then the doubly robust estimator is also wrong. "Doubly robust" does not mean "right in all cases." It means "right if either component is right." If both are wrong, the bias could actually be worse than either individual estimator.
"Is doubly robust the same as double machine learning (DML)?" They are related but distinct. DML uses ML methods for both nuisance functions (outcome model and propensity score) and adds cross-fitting to prevent overfitting bias. The doubly robust property is a building block of DML, but DML adds important refinements. See the DML page for details.
"Should I use the same covariates in both models?" You can, but you do not have to. The outcome model and the propensity score model can include different covariates. In practice, including the same core covariates in both is a reasonable default, but you might want additional variables in one model based on domain knowledge.
B. Identification
The Target Estimand
We want the Average Treatment Effect on the Treated (ATT) or the Average Treatment Effect (ATE):

$$\tau_{ATE} = E[Y(1) - Y(0)], \qquad \tau_{ATT} = E[Y(1) - Y(0) \mid D = 1]$$

Under the conditional independence assumption (CIA):

$$(Y(1), Y(0)) \perp D \mid X$$

where $D$ is the treatment indicator and $X$ are observed covariates.
The AIPW Estimator
The AIPW estimator for the ATE is:

$$\hat{\tau}_{AIPW} = \frac{1}{n}\sum_{i=1}^{n}\left[\hat{\mu}_1(X_i) - \hat{\mu}_0(X_i) + \frac{D_i\,(Y_i - \hat{\mu}_1(X_i))}{\hat{e}(X_i)} - \frac{(1 - D_i)\,(Y_i - \hat{\mu}_0(X_i))}{1 - \hat{e}(X_i)}\right]$$

where:
- $\hat{\mu}_1(X)$ is the estimated outcome under treatment
- $\hat{\mu}_0(X)$ is the estimated outcome under control
- $\hat{e}(X) = \hat{P}(D = 1 \mid X)$ is the estimated propensity score
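The estimator above maps directly to code. A minimal sketch in Python with numpy, assuming the nuisance predictions (`mu1_hat`, `mu0_hat`, `e_hat`; names are illustrative) have already been produced by whatever models you fit:

```python
import numpy as np

def aipw_ate(Y, D, mu1_hat, mu0_hat, e_hat):
    """AIPW estimate of the ATE with a plug-in standard error.

    Y: outcomes; D: 0/1 treatment indicator;
    mu1_hat, mu0_hat: predicted outcomes under treatment and control;
    e_hat: estimated propensity scores, strictly inside (0, 1).
    """
    # Outcome-regression piece plus the IPW correction of its residuals
    psi = (mu1_hat - mu0_hat
           + D * (Y - mu1_hat) / e_hat
           - (1 - D) * (Y - mu0_hat) / (1 - e_hat))
    return psi.mean(), psi.std(ddof=1) / np.sqrt(len(psi))
```

Because `psi` is an average of per-observation terms, its sample standard deviation gives the usual influence-function-style standard error.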
The Doubly Robust Property
This estimator is consistent if:
- The outcome models are correctly specified, OR
- The propensity score model is correctly specified.
You need at least one to be right, but you do not need both.
Positivity Assumption
In addition to CIA, you need the positivity (or overlap) assumption:

$$0 < P(D = 1 \mid X = x) < 1 \quad \text{for all } x \text{ in the support of } X$$
In words: for every combination of covariate values, there must be a positive probability of being either treated or untreated. Without this condition, the inverse probability weights become unbounded.
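To see why unboundedness matters in practice, note that a treated unit receives IPW weight $1/\hat{e}(x)$. A toy calculation:

```python
# Illustrative only: the IPW weight on a treated unit is 1 / e(x),
# so it explodes as the estimated propensity score approaches zero.
for e in [0.5, 0.1, 0.01, 0.001]:
    print(f"e(x) = {e:<5}  ->  treated weight = {1 / e:,.0f}")
# A single treated unit with e(x) = 0.001 carries as much weight
# as 1,000 units with e(x) = 1.
```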
C. Visual Intuition
Picture the distributions of propensity scores for treated and control units (e.g., overlaid histograms). Good overlap means the two distributions overlap substantially. Poor overlap means there are regions where only treated (or only control) units exist. In those regions, the propensity score is near 0 or 1, and the IPW weights become extreme.
The doubly robust estimator is most valuable when the propensity score distributions overlap reasonably well, but you are uncertain about whether your outcome model captures the true functional form.
Doubly Robust in Action
Compare three estimators: outcome regression, IPW, and AIPW. Introduce misspecification in the outcome model (wrong functional form) or the propensity score model, and see which estimators break.
Why Doubly Robust Estimation?
DGP with nonlinear confounding. PS model: Correct. Outcome model: Wrong. DR works if at least one model is correct—an "insurance" property. N = 300.
Estimation Results
| Estimator | β̂ | SE | 95% CI | Bias |
|---|---|---|---|---|
| IPW | 2.281 | 0.569 | [1.17, 3.40] | +0.281 |
| Outcome Regression | 3.651 | 0.189 | [3.28, 4.02] | +1.651 |
| Doubly Robust | 2.297 | 0.417 | [1.48, 3.11] | +0.297 |
| Both Models Wrong | 3.651 | 0.361 | [2.94, 4.36] | +1.651 |
| True β | 2.000 | — | — | — |
Why the difference?
In this scenario the propensity score model is correctly specified and the outcome model is severely misspecified. IPW remains consistent (bias = 0.281), while outcome regression is biased (bias = 1.651). The DR estimator inherits consistency from the correct PS model (bias = 0.297). The "Both Models Wrong" row confirms that double robustness is an insurance property, not magic: when both models fail, the DR estimator is biased too (bias = 1.651).
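The insurance property is easy to reproduce outside the interactive table. A sketch in Python (numpy only; this DGP is my own illustration, not the one used above): confounding runs through $X^2$, the outcome model is fit linear in $X$ (wrong), and for brevity the true propensity score is plugged in, standing in for a correctly specified PS model.

```python
import numpy as np

rng = np.random.default_rng(42)
n, tau = 100_000, 2.0
X = rng.normal(size=n)
e = 1 / (1 + np.exp(-0.4 * (X**2 - 1)))    # true propensity: depends on X^2
D = rng.binomial(1, e)
Y = tau * D + X**2 + rng.normal(size=n)    # nonlinear confounding via X^2

# Misspecified outcome model: linear in X within each treatment arm
def linear_fit(x, y):
    Z = np.column_stack([np.ones(len(x)), x])
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return b

b1 = linear_fit(X[D == 1], Y[D == 1])
b0 = linear_fit(X[D == 0], Y[D == 0])
mu1 = b1[0] + b1[1] * X
mu0 = b0[0] + b0[1] * X

reg  = (mu1 - mu0).mean()                                   # outcome regression only
ipw  = (D * Y / e).mean() - ((1 - D) * Y / (1 - e)).mean()  # IPW only
aipw = (mu1 - mu0
        + D * (Y - mu1) / e
        - (1 - D) * (Y - mu0) / (1 - e)).mean()             # doubly robust
print(f"regression: {reg:.2f}   IPW: {ipw:.2f}   AIPW: {aipw:.2f}   truth: {tau}")
```

The linear outcome model misses the $X^2$ confounder entirely (here $\mathrm{Cov}(X, X^2) = 0$), so the regression estimate is biased upward, while IPW and AIPW stay close to the truth.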
D. Mathematical Derivation
Don't worry about the notation yet — here's what this means in words: The AIPW estimator combines outcome regression and IPW in a way that cancels out the bias from misspecification of either component. The key is that the correction term has expected value zero when either model is correct.
The AIPW estimating equation for $E[Y(1)]$ is:

$$\hat{E}[Y(1)] = \frac{1}{n}\sum_{i=1}^{n}\left[\hat{\mu}_1(X_i) + \frac{D_i\,(Y_i - \hat{\mu}_1(X_i))}{\hat{e}(X_i)}\right]$$

Case 1: Outcome model is correct. If $\hat{\mu}_1(X) = E[Y \mid D = 1, X]$, then $E[Y - \hat{\mu}_1(X) \mid D = 1, X] = 0$. The correction term has conditional mean zero and adds only noise. The estimator converges to the true $E[Y(1)]$ regardless of $\hat{e}(X)$.

Case 2: Propensity score is correct. If $\hat{e}(X) = P(D = 1 \mid X)$, then:

$$E\left[\frac{D\,(Y - \hat{\mu}_1(X))}{e(X)} \,\Big|\, X\right] = E[Y(1) \mid X] - \hat{\mu}_1(X)$$

By iterated expectations:

$$E\left[\hat{\mu}_1(X) + \frac{D\,(Y - \hat{\mu}_1(X))}{e(X)}\right] = E\left[\hat{\mu}_1(X)\right] + E[Y(1)] - E\left[\hat{\mu}_1(X)\right] = E[Y(1)]$$

Similarly, the control-side term identifies $E[Y(0)]$.

So the full AIPW estimator recovers $\tau = E[Y(1)] - E[Y(0)]$, regardless of whether $\hat{\mu}_1$ and $\hat{\mu}_0$ are correct.
This result is the doubly robust property: the estimator is consistent under either condition.
Efficiency note: When both models are correctly specified, AIPW achieves the semiparametric efficiency bound (Hahn, 1998) for regular estimators of the ATE.
E. Implementation
# Using the AIPW package
library(AIPW)
# Fit AIPW with SuperLearner for both models
aipw_obj <- AIPW$new(
Y = df$outcome,
A = df$treatment,
W = df[, c("x1", "x2", "x3")],
Q.SL.library = c("SL.glm", "SL.ranger"), # outcome model
g.SL.library = c("SL.glm", "SL.ranger") # propensity score
)
aipw_obj$fit()
aipw_obj$summary()
# Simpler: using the WeightIt + cobalt packages
library(WeightIt)
library(cobalt)
# Estimate propensity scores and weights
w <- weightit(treatment ~ x1 + x2 + x3, data = df, method = "ps",
estimand = "ATE")
bal.tab(w) # Check balance
# Outcome regression with weights
library(survey)
d <- svydesign(ids = ~1, weights = ~w$weights, data = df)
fit <- svyglm(outcome ~ treatment + x1 + x2 + x3, design = d)
summary(fit)
F. Diagnostics
1. Propensity score overlap. Plot the propensity score distributions for treated and control groups. If there are regions with no overlap, consider trimming or truncating extreme scores.
2. Covariate balance after weighting. Use standardized mean differences to check whether the covariates are balanced after applying the IPW weights. Absolute SMDs below 0.1 are generally good.
3. Sensitivity analysis. Use the Oster (2019) coefficient stability approach or the Cinelli and Hazlett (2020) sensitivity analysis to assess how robust your results are to omitted variable bias.
4. Extreme weights. Check for observations with very large weights (propensity scores near 0 or 1). These extreme weights can dominate the estimator and inflate variance. Report the distribution of weights and consider trimming at the 1st and 99th percentiles.
5. Model specification tests. Run the outcome model and propensity score model separately and check their fit. For the propensity score, check the c-statistic (AUC). For the outcome model, check residual plots.
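Several of these checks are mechanical. A sketch of the weight and balance diagnostics in Python (numpy only; the function name and the 0.05/0.95 and 0.1 thresholds are illustrative conventions, not package API):

```python
import numpy as np

def aipw_diagnostics(X, D, e_hat):
    """Post-weighting balance and weight diagnostics for ATE-style IPW weights.

    X: (n, p) covariate matrix; D: 0/1 treatment; e_hat: propensity scores.
    Returns the max weighted standardized mean difference, the 99th
    percentile of the weights, and a count of near-positivity violations.
    """
    w = np.where(D == 1, 1 / e_hat, 1 / (1 - e_hat))       # ATE weights
    smds = []
    for j in range(X.shape[1]):
        x1, x0 = X[D == 1, j], X[D == 0, j]
        m1 = np.average(x1, weights=w[D == 1])             # weighted group means
        m0 = np.average(x0, weights=w[D == 0])
        pooled_sd = np.sqrt((x1.var() + x0.var()) / 2)     # unweighted pooled SD
        smds.append(abs(m1 - m0) / pooled_sd)
    return {
        "max_abs_smd": max(smds),                          # want below ~0.1
        "weight_99th_pct": float(np.percentile(w, 99)),
        "n_extreme_ps": int(((e_hat < 0.05) | (e_hat > 0.95)).sum()),
    }
```

With correctly specified weights, the weighted SMDs should fall well below the 0.1 rule of thumb; a large 99th-percentile weight or many extreme propensity scores signals a positivity problem.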
Interpreting Your Results
AIPW, IPW, and regression adjustment agree: All three approaches give similar estimates. This agreement is reassuring and suggests your results are not sensitive to the specific modeling choices.
AIPW and regression disagree, but AIPW and IPW agree: The outcome model may be misspecified. The propensity score model is doing the heavy lifting. Report AIPW as your main result but discuss the sensitivity.
AIPW and IPW disagree, but AIPW and regression agree: The propensity score model may be misspecified. Regression is doing the heavy lifting.
All three disagree: Something fundamental is wrong. Check for positivity violations, influential observations, or misspecification of both models.
G. What Can Go Wrong
Positivity Violation: Propensity Scores Near 0 or 1
Trim observations with propensity scores below 0.05 or above 0.95 before applying AIPW. Report the number of trimmed observations and how the estimate changes.
AIPW estimate: $1,450 (SE = 320), using 4,850 of 5,000 observations after trimming 150 extreme-propensity units.
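The trimming step can be sketched as follows (Python/numpy; the 0.05/0.95 cutoffs follow the rule of thumb above and are a judgment call, not a universal constant):

```python
import numpy as np

def trim_extreme_ps(Y, D, X, e_hat, lo=0.05, hi=0.95):
    """Drop units whose estimated propensity score falls outside [lo, hi].

    Note: trimming changes the estimand to the effect within the retained
    (overlap) subpopulation, so report both the number dropped and how
    the estimate moves.
    """
    keep = (e_hat >= lo) & (e_hat <= hi)
    n_dropped = int((~keep).sum())
    return Y[keep], D[keep], X[keep], e_hat[keep], n_dropped
```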
Both Models Misspecified: The Double Robustness Illusion
Specify the outcome model with appropriate nonlinear terms (quadratic age, log earnings) and the propensity score model with relevant interactions, so that at least one model is approximately correct.
AIPW estimate: $1,500 (SE = 350). Outcome regression gives $1,480, IPW gives $1,550 — all three agree, suggesting at least one model is well-specified.
Overfitting Without Cross-Fitting
Use 5-fold cross-fitting: train the outcome model and propensity score model on 4 folds, predict on the held-out fold. Repeat for all folds and combine.
AIPW with cross-fitting: $1,520 (SE = 340). Valid confidence intervals with 94.8% coverage in simulations.
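The cross-fitting loop can be sketched as follows (Python/numpy; `fit_outcome` and `fit_ps` are placeholders for whatever learners you use, and the clipping bound is an illustrative guard against extreme scores):

```python
import numpy as np

def crossfit_aipw(Y, D, X, fit_outcome, fit_ps, n_folds=5, seed=0):
    """K-fold cross-fitted AIPW for the ATE.

    Nuisance models are trained on K-1 folds and predicted on the held-out
    fold, so no observation's own outcome enters its nuisance predictions.
    fit_outcome(X, y) and fit_ps(X, d) must return objects with a
    .predict(X) method (a placeholder interface, not a specific library).
    """
    n = len(Y)
    fold = np.random.default_rng(seed).permutation(n) % n_folds
    mu1, mu0, e = np.empty(n), np.empty(n), np.empty(n)
    for k in range(n_folds):
        tr, te = fold != k, fold == k
        m1 = fit_outcome(X[tr & (D == 1)], Y[tr & (D == 1)])
        m0 = fit_outcome(X[tr & (D == 0)], Y[tr & (D == 0)])
        ps = fit_ps(X[tr], D[tr])
        mu1[te], mu0[te] = m1.predict(X[te]), m0.predict(X[te])
        e[te] = np.clip(ps.predict(X[te]), 0.01, 0.99)   # guard extreme scores
    psi = (mu1 - mu0
           + D * (Y - mu1) / e
           - (1 - D) * (Y - mu0) / (1 - e))
    return psi.mean(), psi.std(ddof=1) / np.sqrt(n)
```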
H. Practice
You estimate the effect of a job training program using three methods: (1) OLS gives an ATE of 1,500 (SE = 300), (2) IPW gives 2,200 (SE = 800), (3) AIPW gives 1,600 (SE = 350). The propensity score distribution shows that 5% of control units have propensity scores above 0.95. What is the most likely explanation for the pattern?
Double Robustness: Estimating the Effect of Insurance Coverage on Preventive Care
A health economist uses an augmented inverse probability weighted (AIPW) estimator to study whether gaining health insurance increases use of preventive care visits. She estimates both a propensity score model (logistic regression predicting insurance coverage) and an outcome model (predicting preventive care visits given insurance status and controls like age, income, and chronic conditions).
Read the analysis below carefully and identify the errors.
Select all errors you can find:
Read the paper summary below and write a brief referee critique (2-3 sentences) of the identification strategy.
Paper Summary
The authors estimate the effect of a state-level Medicaid expansion on emergency department (ED) visits using AIPW. They use a cross-sectional sample of 50,000 adults from 30 states, 15 of which expanded Medicaid. The outcome is number of ED visits in the past year. They estimate propensity scores using logistic regression with state-level covariates and individual demographics. The outcome model is OLS with the same covariates. They report an AIPW estimate of a 15% reduction in ED visits (95% CI: [-22%, -8%]).
Key Table
| Estimator | Estimate | SE | 95% CI |
|---|---|---|---|
| OLS | -12% | 3.1% | [-18%, -6%] |
| IPW | -23% | 8.5% | [-40%, -6%] |
| AIPW | -15% | 3.6% | [-22%, -8%] |
- Propensity score range: [0.01, 0.99]
- Max IPW weight: 142
- Covariate balance (max SMD after weighting): 0.18
Authors' Identification Claim
We use AIPW to achieve doubly robust estimation. Conditional on our rich set of state and individual covariates, Medicaid expansion is as good as randomly assigned.
I. Swap-In: When to Use Something Else
- Matching: When a transparent matched-pair design is preferred and the propensity score model is well-specified — matching discards unmatched units rather than reweighting them.
- DML (Double/Debiased Machine Learning): When nuisance functions are high-dimensional and cross-fitting is needed for valid inference — DML extends the doubly robust logic with machine-learning estimators.
- Inverse probability weighting (IPW): When the propensity score model is well-specified and a simpler reweighting estimator without an outcome model suffices.
- OLS with controls: When the outcome model is correctly specified and selection bias is modest — regression adjustment alone may be sufficient.
J. Reviewer Checklist
Critical Reading Checklist
Paper Library
Foundational (5)
Robins, J. M., Rotnitzky, A., & Zhao, L. P. (1994). Estimation of Regression Coefficients When Some Regressors Are Not Always Observed.
This paper introduced the augmented inverse probability weighting (AIPW) estimator, which combines outcome modeling and propensity score weighting. The key insight is that the estimator is consistent if either the outcome model or the propensity score model is correctly specified, providing a double layer of protection against misspecification.
Scharfstein, D. O., Rotnitzky, A., & Robins, J. M. (1999). Adjusting for Nonignorable Drop-Out Using Semiparametric Nonresponse Models.
Scharfstein, Rotnitzky, and Robins extended the doubly robust framework to handle nonignorable missing data and dropout in longitudinal studies. This paper further developed the semiparametric efficiency theory underlying doubly robust estimation.
Bang, H., & Robins, J. M. (2005). Doubly Robust Estimation in Missing Data and Causal Inference Models.
Bang and Robins provided an accessible exposition of doubly robust estimators, demonstrating their properties through simulations and clarifying when the double robustness property provides meaningful protection. This paper helped make the method more accessible to applied researchers.
Sant'Anna, P. H. C., & Zhao, J. (2020). Doubly Robust Difference-in-Differences Estimators.
Sant'Anna and Zhao developed doubly robust DID estimators that combine outcome regression and inverse probability weighting. The estimator is consistent for the ATT if either the outcome evolution model or the propensity score model for treatment group membership is correctly specified.
Hahn, J. (1998). On the Role of the Propensity Score in Efficient Semiparametric Estimation of Average Treatment Effects.
Hahn derived the semiparametric efficiency bound for estimating average treatment effects and showed that knowledge of the propensity score does not improve the bound, but using estimated propensity scores can achieve efficiency. This paper provided the theoretical foundation for why doubly robust estimators can attain semiparametric efficiency.
Application (4)
Funk, M. J., Westreich, D., Wiesen, C., Sturmer, T., Brookhart, M. A., & Davidian, M. (2011). Doubly Robust Estimation of Causal Effects.
Funk and colleagues provided a practical tutorial on doubly robust estimation for epidemiologists, demonstrating through a worked example how the AIPW estimator protects against misspecification of either the outcome model or the propensity score model. This paper helped spread the method in health sciences.
Glynn, A. N., & Quinn, K. M. (2010). An Introduction to the Augmented Inverse Propensity Weighted Estimator.
Glynn and Quinn introduced the AIPW estimator to political scientists, providing intuition, simulation evidence, and practical guidance. This tutorial demonstrated the advantages of doubly robust methods over propensity score weighting or outcome regression alone in social science applications.
Lunceford, J. K., & Davidian, M. (2004). Stratification and Weighting via the Propensity Score in Estimation of Causal Treatment Effects: A Comparative Study.
Lunceford and Davidian compared propensity score methods including doubly robust estimators in a systematic simulation study. They showed that doubly robust estimators generally perform well and recommended them as a default approach for causal inference from observational data.
Zhao, Q., Small, D. S., & Bhatt, D. L. (2019). Sensitivity Analysis for Inverse Probability Weighting Estimators via the Percentile Bootstrap.
Zhao, Small, and Bhatt developed sensitivity analysis tools for inverse probability weighted and doubly robust estimators, applying them to evaluate the causal effect of bariatric surgery on mortality using health-care claims data. The paper demonstrates practical use of AIPW in a medical decision-making context while addressing concerns about unobserved confounding.
Survey (1)
Kang, J. D. Y., & Schafer, J. L. (2007). Demystifying Double Robustness: A Comparison of Alternative Strategies for Estimating a Population Mean from Incomplete Data.
Kang and Schafer showed through simulations that doubly robust estimators can perform poorly when both models are moderately misspecified, even though they remain consistent when one model is correct. This influential paper tempered enthusiasm and motivated further methodological work on practical performance.