
Doubly Robust / AIPW Estimation

Combines outcome modeling and propensity score weighting — consistent if either model is correctly specified.

When to Use: When you rely on selection-on-observables and want protection against model misspecification — consistent if either the outcome model or the propensity score model is correctly specified.

Assumption: Conditional independence (selection on observables) and positivity/overlap (propensity scores bounded away from 0 and 1). At least one of the two models (outcome or propensity) must be correctly specified for consistency.

Mistake: Assuming doubly robust means no assumptions — you still need conditional independence and overlap. Also, not checking for positivity violations (propensity scores near 0 or 1), which cause extreme inverse probability weights.

One-Line Implementation

R: AIPW$new(Y=df$outcome, A=df$treatment, W=df[,c('x1','x2','x3')])$fit()$summary()
Stata: teffects aipw (outcome x1 x2 x3) (treatment x1 x2 x3)
Python: from econml.dr import DRLearner; DRLearner().fit(Y, T, X=X, W=W)


Motivating Example: Evaluating a Job Training Program

A government wants to evaluate its job training program. Randomization was not possible — the program was offered to anyone who signed up, and the people who signed up are different from those who did not. You observe a rich set of covariates (education, age, prior earnings, employment history, neighborhood characteristics) and are willing to assume that, conditional on these covariates, participation is as good as random.

You have two obvious strategies:

Strategy 1: Outcome regression. Model the outcome (post-program earnings) as a function of treatment and covariates using, say, OLS. If the model is correctly specified, this regression gives you the treatment effect. But if you get the functional form wrong — maybe the true relationship is nonlinear, or there are interactions you missed — your estimate is biased.

Strategy 2: Propensity score weighting. Model the treatment assignment (who signs up) as a function of covariates, as in matching methods. Re-weight observations so that the treated and control groups are balanced on observables. If the propensity score model is correctly specified, this reweighting gives you the treatment effect. But if you misspecify the selection model, the weights are wrong and your estimate is biased.

Each strategy relies on getting one model right. What if you are not sure which one you got right?

Doubly robust estimation combines both strategies into a single estimator that is consistent if either the outcome model or the propensity score model is correctly specified. You only need one of the two to work. This property — known as double robustness — is a key theoretical advantage of the estimator.


A. Overview

The doubly robust estimator, also called the augmented inverse probability weighted (AIPW) estimator, was developed by Robins and colleagues in the biostatistics literature (Robins et al., 1994; Bang & Robins, 2005).

The intuition is instructive. The AIPW estimator starts with the outcome regression and then corrects it using the propensity score weights. If the outcome model is right, the correction term has expected value zero and does not hurt. If the outcome model is wrong but the propensity score model is right, the correction term exactly removes the bias. You get two chances to be right.

How It Works (Intuitively)

Think of it in two steps:

  1. Predict outcomes for everyone under both treatment and control using your outcome model. Compute the predicted treatment effect.

  2. Correct the prediction errors using propensity score weights. For each observation, look at the residual (actual outcome minus predicted outcome). Weight these residuals by the inverse of the propensity score to correct for any remaining imbalance.

If your outcome model is perfect, the residuals are pure noise, and the correction adds nothing. If your outcome model is imperfect but your propensity scores are right, the weighted residuals exactly correct the bias.
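The two steps above can be sketched numerically. Here is a minimal simulation in Python (numpy only; the data-generating process, coefficients, and seed are all made up for illustration). The outcome model deliberately omits the quadratic term, yet because the propensity scores are right, the weighted residual correction recovers the true effect:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Hypothetical data: one confounder x, true treatment effect = 2
x = rng.normal(size=n)
e = 1 / (1 + np.exp(-x))                    # true propensity P(D=1|x)
d = rng.binomial(1, e)
y = 2 * d + x + x**2 + rng.normal(size=n)   # outcome is nonlinear in x

def fit_linear(xs, ys):
    # OLS of y on (1, x); deliberately omits the x^2 term (misspecified)
    X = np.column_stack([np.ones_like(xs), xs])
    b, *_ = np.linalg.lstsq(X, ys, rcond=None)
    return lambda z: b[0] + b[1] * z

# Step 1: predict outcomes under treatment and control
mu1 = fit_linear(x[d == 1], y[d == 1])
mu0 = fit_linear(x[d == 0], y[d == 0])

# Step 2: correct the prediction errors with inverse-propensity weights
aipw = np.mean(
    mu1(x) - mu0(x)
    + d * (y - mu1(x)) / e
    - (1 - d) * (y - mu0(x)) / (1 - e)
)
print(aipw)   # lands near the true effect of 2 despite the wrong outcome model
```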

Common Confusions

"Does doubly robust mean I do not need the selection-on-observables assumption?" Absolutely not. Doubly robust estimation still requires conditional independence (also called unconfoundedness or ignorability). It protects against model misspecification, not against omitted variables. If there are unobserved confounders, no amount of modeling sophistication will save you.

"What if both models are wrong?" Then the doubly robust estimator is also wrong. "Doubly robust" does not mean "right in all cases." It means "right if either component is right." If both are wrong, the bias is generally of order equal to the product of the two individual estimation errors — typically smaller than either individual bias, but not zero.

"Is doubly robust the same as double machine learning (DML)?" They are related but distinct. DML uses ML methods for both nuisance functions (outcome model and propensity score) and adds cross-fitting to prevent overfitting bias. The doubly robust property is a building block of DML, but DML adds important refinements. See the DML page for details.

"Should I use the same covariates in both models?" You can, but you do not have to. The outcome model and the propensity score model can include different covariates. In practice, including the same core covariates in both is a reasonable default, but you might want additional variables in one model based on domain knowledge.


B. Identification

The Target Estimand

We want the Average Treatment Effect (ATE) or, in some applications, the Average Treatment Effect on the Treated (ATT). We focus here on the ATE:

\text{ATE} = E[Y(1) - Y(0)]

Under the conditional independence assumption (CIA):

\{Y(0), Y(1)\} \perp\!\!\!\perp D \mid X

where D is the treatment indicator and X are observed covariates.

The AIPW Estimator

The AIPW estimator for the ATE is:

\hat{\tau}^{AIPW} = \frac{1}{n} \sum_{i=1}^{n} \left[ \hat{\mu}_1(X_i) - \hat{\mu}_0(X_i) + \frac{D_i (Y_i - \hat{\mu}_1(X_i))}{\hat{e}(X_i)} - \frac{(1 - D_i)(Y_i - \hat{\mu}_0(X_i))}{1 - \hat{e}(X_i)} \right]

where:

  • \hat{\mu}_1(X_i) = \hat{E}[Y \mid X = X_i, D = 1] is the estimated outcome under treatment
  • \hat{\mu}_0(X_i) = \hat{E}[Y \mid X = X_i, D = 0] is the estimated outcome under control
  • \hat{e}(X_i) = \hat{P}(D = 1 \mid X = X_i) is the estimated propensity score
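The estimator is a direct transcription of this formula. A sketch in Python with numpy (`aipw_ate` and the simulated data are illustrative, not a package API); the standard error uses the sample standard deviation of the per-observation terms, the usual influence-function plug-in:

```python
import numpy as np

def aipw_ate(y, d, mu1_hat, mu0_hat, e_hat):
    """AIPW point estimate of the ATE with a plug-in standard error."""
    psi = (mu1_hat - mu0_hat
           + d * (y - mu1_hat) / e_hat
           - (1 - d) * (y - mu0_hat) / (1 - e_hat))
    return psi.mean(), psi.std(ddof=1) / np.sqrt(len(psi))

# Toy check with the true nuisance functions plugged in
rng = np.random.default_rng(1)
n = 20_000
x = rng.normal(size=n)
e = 1 / (1 + np.exp(-x))
d = rng.binomial(1, e)
y = 1.5 * d + x + rng.normal(size=n)

ate, se = aipw_ate(y, d, mu1_hat=1.5 + x, mu0_hat=x, e_hat=e)
# ate is close to the true ATE of 1.5
```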

The Doubly Robust Property

This estimator is consistent if:

  1. The outcome models \hat{\mu}_1, \hat{\mu}_0 are correctly specified, OR
  2. The propensity score model \hat{e} is correctly specified.

You need at least one to be right, but you do not need both.

Positivity Assumption

In addition to CIA, you need the positivity (or overlap) assumption:

0 < P(D = 1 \mid X = x) < 1 \quad \text{for all } x \text{ in the support of } X

In words: for every combination of covariate values, there must be a positive probability of being either treated or untreated. Without this condition, the inverse probability weights become unbounded.
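A quick positivity check can be coded directly. This is an illustrative sketch (the 0.05/0.95 trimming bounds are a common convention, not a universal rule, and `check_overlap` is a hypothetical helper):

```python
import numpy as np

def check_overlap(e_hat, lo=0.05, hi=0.95):
    """Return a keep-mask, the number trimmed, and the largest implied weight."""
    keep = (e_hat >= lo) & (e_hat <= hi)
    max_weight = (1 / np.minimum(e_hat, 1 - e_hat)).max()
    return keep, int((~keep).sum()), max_weight

e_hat = np.array([0.01, 0.20, 0.50, 0.80, 0.99])
keep, n_trim, max_w = check_overlap(e_hat)
print(n_trim, max_w)   # 2 units flagged for trimming; largest weight is 100
```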


C. Visual Intuition

Imagine a scatterplot of propensity scores for treated and control units. Good overlap means the two distributions overlap substantially. Poor overlap means there are regions where only treated (or only control) units exist. In those regions, the propensity score is near 0 or 1, and the inverse probability weights become extreme.

The doubly robust estimator is most valuable when the propensity score distributions overlap reasonably well, but you are uncertain about whether your outcome model captures the true functional form.


D. Mathematical Derivation

Don't worry about the notation yet — here's what this means in words: The AIPW estimator combines outcome regression and IPW in a way that cancels out the bias from misspecification of either component. The key is that the correction term has expected value zero when either model is correct.

The AIPW estimating equation for E[Y(1)]E[Y(1)] is:

\hat{\mu}^{AIPW}_1 = \frac{1}{n} \sum_{i=1}^n \left[ \hat{\mu}_1(X_i) + \frac{D_i(Y_i - \hat{\mu}_1(X_i))}{\hat{e}(X_i)} \right]

Case 1: Outcome model is correct. If \hat{\mu}_1(X_i) = E[Y_i \mid X_i, D_i = 1], then E[Y_i - \hat{\mu}_1(X_i) \mid X_i, D_i = 1] = 0. The correction term has conditional mean zero and adds only noise. The estimator converges to the true E[Y(1)] regardless of \hat{e}.

Case 2: Propensity score is correct. If \hat{e}(X_i) = P(D_i = 1 \mid X_i), then:

E\left[\frac{D_i(Y_i - \hat{\mu}_1(X_i))}{\hat{e}(X_i)}\right] = E\left[\frac{D_i Y_i}{\hat{e}(X_i)}\right] - E\left[\frac{D_i \hat{\mu}_1(X_i)}{\hat{e}(X_i)}\right]

By iterated expectations:

E\left[\frac{D_i Y_i}{e(X_i)}\right] = E\left[\frac{E[D_i Y_i \mid X_i]}{e(X_i)}\right] = E\left[\frac{e(X_i) E[Y_i \mid X_i, D_i = 1]}{e(X_i)}\right] = E[Y(1)]

Similarly:

E\left[\frac{D_i \hat{\mu}_1(X_i)}{e(X_i)}\right] = E[\hat{\mu}_1(X_i)]

So the full AIPW estimator equals E[\hat{\mu}_1(X_i)] + E[Y(1)] - E[\hat{\mu}_1(X_i)] = E[Y(1)], regardless of whether \hat{\mu}_1 is correct.

This result is the doubly robust property: the estimator is consistent under either condition.
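Case 1 can be verified numerically as well. In the sketch below (Python/numpy, made-up data-generating process), the propensity model is deliberately wrong (a constant that ignores x) while the outcome model is exactly right, so plain IPW is biased but AIPW still recovers the effect:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000
x = rng.normal(size=n)
e_true = 1 / (1 + np.exp(-1.5 * x))
d = rng.binomial(1, e_true)
y = 2 * d + x + rng.normal(size=n)          # true ATE = 2

# Deliberately wrong propensity model: a constant, ignoring x ...
e_wrong = np.full(n, d.mean())
# ... but the correct outcome models
mu1, mu0 = 2 + x, x

ipw = np.mean(d * y / e_wrong - (1 - d) * y / (1 - e_wrong))
aipw = np.mean(mu1 - mu0
               + d * (y - mu1) / e_wrong
               - (1 - d) * (y - mu0) / (1 - e_wrong))
print(ipw, aipw)   # ipw is biased upward; aipw stays near 2
```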

Efficiency note: When both models are correctly specified, AIPW achieves the semiparametric efficiency bound (Hahn, 1998) for regular estimators of the ATE.


E. Implementation

# Requires: AIPW
# AIPW: R package for augmented inverse probability weighting
library(AIPW)

# --- Step 1: Fit AIPW with flexible ML for both nuisance models ---
# AIPW is consistent if EITHER the outcome or propensity model is correct
# Q.SL.library: ensemble ML for outcome model E[Y|X,A]
# g.SL.library: ensemble ML for propensity score P(A=1|X)
aipw_obj <- AIPW$new(
Y = df$outcome,
A = df$treatment,
W = df[, c("x1", "x2", "x3")],
Q.SL.library = c("SL.glm", "SL.ranger"),   # outcome: GLM + random forest
g.SL.library = c("SL.glm", "SL.ranger")    # propensity: GLM + random forest
)
aipw_obj$fit()
aipw_obj$summary()
# Reports ATE and ATT with confidence intervals

# --- Step 2: Alternative approach using WeightIt + cobalt ---
library(WeightIt)   # WeightIt: propensity score weighting
library(cobalt)     # cobalt: covariate balance assessment

# Estimate propensity scores and compute IPW weights for ATE
w <- weightit(treatment ~ x1 + x2 + x3, data = df, method = "glm",
              estimand = "ATE")   # logistic-regression propensity scores
# Check covariate balance after weighting (target: SMD < 0.1)
bal.tab(w)

# --- Step 3: Doubly robust estimation via weighted regression ---
# Combining IPW weights with outcome regression gives double robustness
library(survey)
df$ipw <- w$weights   # attach the estimated weights to the data
d <- svydesign(ids = ~1, weights = ~ipw, data = df)
fit <- svyglm(outcome ~ treatment + x1 + x2 + x3, design = d)
summary(fit)
# Coefficient on treatment: doubly robust estimate of the ATE

F. Diagnostics

  1. Propensity score overlap. Plot the propensity score distributions for treated and control groups. If there are regions with no overlap, consider trimming or truncating extreme scores.

  2. Covariate balance after weighting. Use standardized mean differences to check whether the covariates are balanced after applying the IPW weights. Absolute SMDs below 0.1 are generally good.

  3. Sensitivity analysis. Use the Oster (2019) coefficient stability approach or the Cinelli and Hazlett (2020) sensitivity analysis to assess how robust your results are to omitted variable bias.

  4. Extreme weights. Check for observations with very large weights (propensity scores near 0 or 1). These extreme weights can dominate the estimator and inflate variance. Report the distribution of weights and consider trimming at the 1st and 99th percentiles.

  5. Model specification tests. Run the outcome model and propensity score model separately and check their fit. For the propensity score, check the c-statistic (AUC). For the outcome model, check residual plots.
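Diagnostics 2 and 4 are easy to compute by hand. A sketch in Python/numpy (simulated data; `weighted_smd` is an illustrative helper, not a library function) of the weighted standardized mean difference and the weight distribution:

```python
import numpy as np

def weighted_smd(x, d, w):
    """Standardized mean difference of covariate x after weighting by w."""
    m1 = np.average(x[d == 1], weights=w[d == 1])
    m0 = np.average(x[d == 0], weights=w[d == 0])
    pooled_sd = np.sqrt((x[d == 1].var(ddof=1) + x[d == 0].var(ddof=1)) / 2)
    return (m1 - m0) / pooled_sd

rng = np.random.default_rng(4)
n = 50_000
x = rng.normal(size=n)
e = 1 / (1 + np.exp(-x))
d = rng.binomial(1, e)

w_ipw = np.where(d == 1, 1 / e, 1 / (1 - e))     # ATE weights
smd_raw = weighted_smd(x, d, np.ones(n))         # before weighting: imbalanced
smd_ipw = weighted_smd(x, d, w_ipw)              # after weighting: near zero

# Diagnostic 4: report the weight distribution, not just the estimate
print(smd_raw, smd_ipw, np.percentile(w_ipw, [50, 99]), w_ipw.max())
```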

Interpreting Your Results

AIPW, IPW, and regression adjustment agree: All three approaches give similar estimates. This agreement is reassuring and suggests your results are not sensitive to the specific modeling choices.

AIPW and regression disagree, but AIPW and IPW agree: The outcome model may be misspecified. The propensity score model is doing the heavy lifting. Report AIPW as your main result but discuss the sensitivity.

AIPW and IPW disagree, but AIPW and regression agree: The propensity score model may be misspecified. Regression is doing the heavy lifting.

All three disagree: Something fundamental is wrong. Check for positivity violations, influential observations, or misspecification of both models.


G. What Can Go Wrong

Positivity Violation: Propensity Scores Near 0 or 1

Trim observations with propensity scores below 0.05 or above 0.95 before applying AIPW. Report the number of trimmed observations and how the estimate changes.

AIPW estimate: $1,450 (SE = 320), using 4,850 of 5,000 observations after trimming 150 extreme-propensity units.


Both Models Misspecified: The Double Robustness Illusion

Specify the outcome model with appropriate nonlinear terms (quadratic age, log-earnings) and the propensity score model with relevant interactions, so that at least one model is approximately correct.

AIPW estimate: $1,500 (SE = 350). Outcome regression gives $1,480, IPW gives $1,550 — all three agree, suggesting at least one model is well-specified.


Overfitting Without Cross-Fitting

Use 5-fold cross-fitting: train the outcome model and propensity score model on 4 folds, predict on the held-out fold. Repeat for all folds and combine.

AIPW with cross-fitting: $1,520 (SE = 340). Valid confidence intervals with 94.8% coverage in simulations.
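The cross-fitting recipe can be sketched generically. In the Python/numpy sketch below, `crossfit_aipw`, `fit_mu`, and `fit_e` are hypothetical names, and the simple least-squares learner stands in for whatever ML method you would actually use; the key point is that nuisance models are always evaluated on data they were not trained on:

```python
import numpy as np

def crossfit_aipw(y, d, x, fit_mu, fit_e, k=5, seed=0):
    """K-fold cross-fitting: nuisance models are evaluated out-of-fold."""
    folds = np.random.default_rng(seed).integers(0, k, size=len(y))
    psi = np.empty(len(y))
    for f in range(k):
        tr, te = folds != f, folds == f
        mu1 = fit_mu(x[tr & (d == 1)], y[tr & (d == 1)])
        mu0 = fit_mu(x[tr & (d == 0)], y[tr & (d == 0)])
        eh = np.clip(fit_e(x[tr], d[tr])(x[te]), 0.01, 0.99)
        m1, m0 = mu1(x[te]), mu0(x[te])
        psi[te] = (m1 - m0 + d[te] * (y[te] - m1) / eh
                   - (1 - d[te]) * (y[te] - m0) / (1 - eh))
    return psi.mean(), psi.std(ddof=1) / np.sqrt(len(psi))

# Demo: linear outcome learner, crude constant propensity model.
# The outcome model is correct here, so AIPW remains consistent.
def fit_mu(xs, ys):
    X = np.column_stack([np.ones_like(xs), xs])
    b, *_ = np.linalg.lstsq(X, ys, rcond=None)
    return lambda z: b[0] + b[1] * z

def fit_e(xs, ds):
    p = ds.mean()
    return lambda z: np.full(len(z), p)

rng = np.random.default_rng(5)
n = 20_000
x = rng.normal(size=n)
d = rng.binomial(1, 1 / (1 + np.exp(-x)))
y = 2 * d + x + rng.normal(size=n)

ate, se = crossfit_aipw(y, d, x, fit_mu, fit_e)
# ate stays near the true effect of 2
```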


H. Practice

Concept Check

You estimate the effect of a job training program using three methods: (1) OLS gives an ATE of 1,500 (SE = 300), (2) IPW gives 2,200 (SE = 800), (3) AIPW gives 1,600 (SE = 350). The propensity score distribution shows that 5% of control units have propensity scores above 0.95. What is the most likely explanation for the pattern?

Concept Check

What does 'doubly robust' mean in the context of AIPW estimation?

Guided Exercise

Double Robustness: Estimating the Effect of Insurance Coverage on Preventive Care

A health economist uses an augmented inverse probability weighted (AIPW) estimator to study whether gaining health insurance increases use of preventive care visits. She estimates both a propensity score model (logistic regression predicting insurance coverage) and an outcome model (predicting preventive care visits given insurance status and controls like age, income, and chronic conditions).

What does 'doubly robust' mean in this context?

If extreme propensity scores (near 0 or 1) are common in the data, what problem arises in the IPW component?

How does the AIPW estimator use the outcome model to 'augment' the IPW estimator?

A colleague says 'our AIPW estimate is valid even if uninsured patients differ in unobservable health motivation.' Are they correct?

Error Detective

Read the analysis below carefully and identify the errors.

A health economist estimates the effect of a hospital quality improvement program on patient mortality using AIPW. The sample includes 12,000 patients across 45 hospitals. She reports: "The AIPW estimate shows the program reduces 30-day mortality by 4.2 percentage points (SE = 0.8, p < 0.001). We use logistic regression for the propensity score and OLS for the outcome model, both with hospital fixed effects and patient demographics." She writes: "Because our estimator is doubly robust, our results are valid even if unobserved patient severity differs across program and non-program hospitals."

Select all errors you can find:

Error Detective

Read the analysis below carefully and identify the errors.

A labor economist studies the effect of union membership on wages using AIPW with gradient boosting for both nuisance models. He fits both models on the full sample (no cross-fitting) and reports: "AIPW estimate: $2.30/hour wage premium (SE = $0.15). The propensity score model achieves an in-sample AUC of 0.97, and the outcome model achieves R-squared of 0.91." He notes: "The high predictive accuracy of both models ensures our doubly robust estimate is reliable."

Select all errors you can find:

Referee Exercise

Read the paper summary below and write a brief referee critique (2-3 sentences) of the identification strategy.

Paper Summary

The authors estimate the effect of a state-level Medicaid expansion on emergency department (ED) visits using AIPW. They use a cross-sectional sample of 50,000 adults from 30 states, 15 of which expanded Medicaid. The outcome is number of ED visits in the past year. They estimate propensity scores using logistic regression with state-level covariates and individual demographics. The outcome model is OLS with the same covariates. They report an AIPW estimate of a 15% reduction in ED visits (95% CI: [-22%, -8%]).

Key Table

EstimatorEstimateSE95% CI
OLS-12%3.1%[-18%, -6%]
IPW-23%8.5%[-40%, -6%]
AIPW-15%3.6%[-22%, -8%]
Propensity score range: [0.01, 0.99]
Max IPW weight: 142
Covariate balance (max SMD after weighting): 0.18

Authors' Identification Claim

We use AIPW to achieve doubly robust estimation. Conditional on our rich set of state and individual covariates, Medicaid expansion is as good as randomly assigned.


I. Swap-In: When to Use Something Else

  • Matching: When a transparent matched-pair design is preferred and the propensity score model is well-specified — matching discards unmatched units rather than reweighting them.
  • DML (Double/Debiased Machine Learning): When nuisance functions are high-dimensional and cross-fitting is needed for valid inference — DML extends the doubly robust logic with machine-learning estimators.
  • Inverse probability weighting (IPW): When the propensity score model is well-specified and a simpler reweighting estimator without an outcome model suffices.
  • OLS with controls: When the outcome model is correctly specified and selection bias is modest — regression adjustment alone may be sufficient.

J. Reviewer Checklist

Critical Reading Checklist


Paper Library

Foundational (7)

Bang, H., & Robins, J. M. (2005). Doubly Robust Estimation in Missing Data and Causal Inference Models.

Biometrics

Bang and Robins provide an accessible exposition of doubly robust estimators, demonstrating their properties through simulations and clarifying when the double robustness property provides meaningful protection. This paper helps make the method more accessible to applied researchers.

Glynn, A. N., & Quinn, K. M. (2010). An Introduction to the Augmented Inverse Propensity Weighted Estimator.

Political Analysis · DOI: 10.1093/pan/mpp036

Glynn and Quinn introduce the AIPW estimator to political scientists, providing intuition, simulation evidence, and practical guidance. This tutorial demonstrates the advantages of doubly robust methods over propensity score weighting or outcome regression alone in social science applications.

Hahn, J. (1998). On the Role of the Propensity Score in Efficient Semiparametric Estimation of Average Treatment Effects.

Econometrica · DOI: 10.2307/2998560

Hahn derives the semiparametric efficiency bound for estimating average treatment effects and shows that knowledge of the propensity score does not improve the bound—it is ancillary for ATE. The efficient estimators take the form of sample averages completed by nonparametric imputation. This paper is foundational for understanding efficient semiparametric estimation of treatment effects.

Robins, J. M., Rotnitzky, A., & Zhao, L. P. (1994). Estimation of Regression Coefficients When Some Regressors Are Not Always Observed.

Journal of the American Statistical Association · DOI: 10.1080/01621459.1994.10476818

Robins, Rotnitzky, and Zhao introduce the augmented inverse probability weighting (AIPW) estimator, which combines outcome modeling and propensity score weighting. The key insight is that the estimator is consistent if either the outcome model or the propensity score model is correctly specified, providing a double layer of protection against misspecification.

Sant'Anna, P. H. C., & Zhao, J. (2020). Doubly Robust Difference-in-Differences Estimators.

Journal of Econometrics · DOI: 10.1016/j.jeconom.2020.06.003

Sant'Anna and Zhao develop doubly robust DID estimators that combine outcome regression and inverse probability weighting. The estimator is consistent for the ATT if either the outcome evolution model or the propensity score model for treatment group membership is correctly specified.

Scharfstein, D. O., Rotnitzky, A., & Robins, J. M. (1999). Adjusting for Nonignorable Drop-Out Using Semiparametric Nonresponse Models.

Journal of the American Statistical Association · DOI: 10.1080/01621459.1999.10473862

Scharfstein, Rotnitzky, and Robins develop a semiparametric sensitivity analysis framework for nonignorable dropout in longitudinal studies. They propose treating the selection bias parameter as known, then varying it over a plausible range to assess how inferences change. This paper provides foundational methods for sensitivity analysis under nonignorable missing data.

Zhao, Q., Small, D. S., & Bhattacharya, B. B. (2019). Sensitivity Analysis for Inverse Probability Weighting Estimators via the Percentile Bootstrap.

Journal of the Royal Statistical Society: Series B · DOI: 10.1111/rssb.12327

Zhao, Small, and Bhattacharya develop sensitivity analysis tools for inverse probability weighted and augmented IPW estimators via the percentile bootstrap. They apply the methods to evaluate the causal effect of fish consumption on blood mercury levels, demonstrating practical use of AIPW sensitivity analysis in an observational study context. The paper provides a computationally convenient approach for assessing how sensitive doubly robust estimates are to violations of the unconfoundedness assumption.

Application (2)

Funk, M. J., Westreich, D., Wiesen, C., Sturmer, T., Brookhart, M. A., & Davidian, M. (2011). Doubly Robust Estimation of Causal Effects.

American Journal of Epidemiology · DOI: 10.1093/aje/kwq439

Funk and colleagues provide a practical tutorial on doubly robust estimation for epidemiologists, demonstrating through a worked example how the AIPW estimator protects against misspecification of either the outcome model or the propensity score model. This paper helps spread the method in health sciences.

Lunceford, J. K., & Davidian, M. (2004). Stratification and Weighting via the Propensity Score in Estimation of Causal Treatment Effects: A Comparative Study.

Statistics in Medicine · DOI: 10.1002/sim.1903

Lunceford and Davidian compare propensity-score stratification, inverse probability weighting, and doubly robust estimators in a systematic simulation study. The paper provides a side-by-side assessment of these approaches for estimating causal treatment effects from observational data.

Survey (2)

Angrist, J. D., & Pischke, J.-S. (2009). Mostly Harmless Econometrics: An Empiricist's Companion.

Princeton University Press · DOI: 10.1515/9781400829828

Angrist and Pischke write one of the most influential modern textbooks on applied econometrics, organizing the field around a design-based approach to causal inference. The book provides essential treatments of instrumental variables, difference-in-differences, and regression discontinuity, each grounded in the potential outcomes framework. It remains the standard reference for graduate students learning to evaluate and implement identification strategies.

Kang, J. D. Y., & Schafer, J. L. (2007). Demystifying Double Robustness: A Comparison of Alternative Strategies for Estimating a Population Mean from Incomplete Data.

Statistical Science · DOI: 10.1214/07-STS227

Kang and Schafer show through simulations that doubly robust estimators can perform poorly when both models are moderately misspecified, even though they remain consistent when one model is correct. This influential paper tempers enthusiasm and motivates further methodological work on practical performance.

Tags

model-based · selection-on-observables · semiparametric