Lab: Doubly Robust Estimation (AIPW)
Implement the augmented inverse probability weighting (AIPW) estimator step by step. Learn why double robustness provides insurance against model misspecification and how to compare AIPW with IPW and outcome regression.
Overview
In this lab you will estimate the average treatment effect of a job training program on earnings using three approaches: outcome regression (OLS), inverse probability weighting (IPW), and augmented inverse probability weighting (AIPW). You will see that the AIPW estimator is consistent if either the propensity score model or the outcome model is correctly specified — the "doubly robust" property.
What you will learn:
- How to estimate propensity scores and check overlap
- How to construct the IPW estimator from first principles
- How to build the AIPW (doubly robust) estimator
- Why double robustness matters when one model is misspecified
- How to compute standard errors via the influence function
Prerequisites: Familiarity with OLS regression and logistic regression. Understanding of potential outcomes notation is helpful.
Step 1: Simulate Observational Data
We create a dataset where treatment assignment depends on covariates (selection on observables).
set.seed(42)
n <- 2000
age <- pmin(pmax(rnorm(n, 35, 10), 18), 65)
educ <- pmin(pmax(rnorm(n, 12, 3), 6), 20)
married <- rbinom(n, 1, 0.4)
prev_earn <- rlnorm(n, 9.5, 0.8)
logit_ps <- -2 + 0.03 * age - 0.1 * educ + 0.5 * married + 0.00005 * prev_earn
true_ps <- plogis(logit_ps)
treat <- rbinom(n, 1, true_ps)
y0 <- 20000 + 500 * educ + 100 * age + 3000 * married + 0.3 * prev_earn + rnorm(n, 0, 5000)
y1 <- y0 + 2500 + 200 * educ - 50 * age
earnings <- treat * y1 + (1 - treat) * y0
true_ate <- mean(y1 - y0)
df <- data.frame(earnings, treat, age, educ, married, prev_earn)
cat("True ATE:", round(true_ate), "\n")
cat("Naive diff:", round(mean(earnings[treat == 1]) - mean(earnings[treat == 0])), "\n")Expected output:
| Covariate | Mean (Treated) | Mean (Control) | Difference |
|---|---|---|---|
| earnings | ~38,000 | ~34,000 | ~4,000 (biased) |
| age | ~36.0 | ~34.5 | +1.5 |
| educ | ~11.5 | ~12.3 | -0.8 |
| married | ~0.48 | ~0.35 | +0.13 |
| prev_earn | ~16,000 | ~12,000 | +4,000 |
Sample data preview (first 5 rows):
| earnings | treat | age | educ | married | prev_earn |
|---|---|---|---|---|---|
| 35,214 | 1 | 42.3 | 10.5 | 1 | 18,500 |
| 28,901 | 0 | 28.7 | 14.2 | 0 | 9,200 |
| 41,378 | 1 | 38.1 | 12.0 | 1 | 22,100 |
| 32,156 | 0 | 33.5 | 13.8 | 0 | 11,800 |
| 36,892 | 0 | 45.2 | 11.1 | 1 | 19,400 |
| Key Statistics | Value |
|---|---|
| Sample size | 2,000 |
| Treatment rate | ~35–45% |
| True ATE | ~3,100–3,300 (DGP: 2500 + 200 × mean(educ) − 50 × mean(age) ≈ 3,150) |
| Naive diff in means | Biased due to selection on age, educ, married, prev_earn |
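The covariate balance table above can be reproduced with a short helper (a sketch; `balance_table` is my own name, not part of the lab):

```r
# Group means by treatment status for a data frame with a 0/1 treat column;
# reproduces the covariate balance table from Step 1
balance_table <- function(data, vars, treat_col = "treat") {
  t(sapply(vars, function(v) {
    c(mean_treated = mean(data[data[[treat_col]] == 1, v]),
      mean_control = mean(data[data[[treat_col]] == 0, v]))
  }))
}
# With the simulated data:
# balance_table(df, c("earnings", "age", "educ", "married", "prev_earn"))
```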
Step 2: Estimate the Propensity Score
# Estimate propensity score
ps_model <- glm(treat ~ age + educ + married + prev_earn,
data = df, family = binomial)
df$ps_hat <- predict(ps_model, type = "response")
# Check overlap
library(ggplot2)
ggplot(df, aes(x = ps_hat, fill = factor(treat))) +
geom_histogram(alpha = 0.5, position = "identity", bins = 40) +
labs(x = "Propensity Score", fill = "Treated") +
theme_minimal()
cat("PS range (treated):", range(df$ps_hat[df$treat == 1]), "\n")
cat("PS range (control):", range(df$ps_hat[df$treat == 0]), "\n")Expected output:
| Propensity Score | Treated | Control |
|---|---|---|
| Range | [~0.10, ~0.80] | [~0.05, ~0.70] |
| Mean | ~0.42 | ~0.33 |
| Median | ~0.40 | ~0.32 |
You may notice that some units have propensity scores very close to zero (e.g., 0.01). Why is this a problem for IPW estimation? (Hint: consider the weight 1/ê(x) that a treated unit in such a region would receive.)
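To see the mechanics, here is a small self-contained illustration (not part of the lab pipeline) of how the treated-unit weight explodes as the propensity score approaches zero:

```r
# The inverse-probability weight for a treated unit is 1 / e(x); near-zero
# propensity scores translate into extreme weights, so a single such
# observation can dominate the IPW estimate
ps_examples <- c(0.5, 0.1, 0.01, 0.001)
data.frame(propensity = ps_examples, treated_weight = 1 / ps_examples)
# treated_weight: 2, 10, 100, 1000
```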
Step 3: Outcome Regression (OLS)
# Outcome regression
ols <- lm(earnings ~ treat + age + educ + married + prev_earn, data = df)
cat("OLS ATE:", coef(ols)["treat"], "\n")
# Separate models for each treatment arm
mu1_model <- lm(earnings ~ age + educ + married + prev_earn, data = df[df$treat == 1, ])
mu0_model <- lm(earnings ~ age + educ + married + prev_earn, data = df[df$treat == 0, ])
mu1_hat <- predict(mu1_model, newdata = df)
mu0_hat <- predict(mu0_model, newdata = df)
cat("Outcome regression (separate models):", mean(mu1_hat - mu0_hat), "\n")Expected output:
| Estimator | ATE Estimate | SE | True ATE |
|---|---|---|---|
| OLS (treat coefficient) | ~3,000–3,300 | ~300 | ~3,150 |
| Outcome regression (separate models) | ~3,000–3,300 | — | ~3,150 |
| OLS Regression Output | Coefficient | SE | t | p-value |
|---|---|---|---|---|
| treat | ~3,100 | ~300 | ~10 | <0.001 |
| age | ~100 | ~15 | ~6.7 | <0.001 |
| educ | ~700 | ~50 | ~14.0 | <0.001 |
| married | ~3,000 | ~300 | ~10.0 | <0.001 |
| prev_earn | ~0.30 | ~0.02 | ~15.0 | <0.001 |
Note: The OLS coefficient on treat recovers the ATE only if the outcome model is correctly specified. Here the outcome equation is linear in the covariates, but the treatment effect itself varies with educ and age, so the pooled model with a single treat dummy is mildly misspecified; the separate-models estimator fits each arm on its own, accommodates this heterogeneity, and targets the ATE directly.
Step 4: Inverse Probability Weighting (IPW)
# IPW estimator
ps <- pmin(pmax(df$ps_hat, 0.05), 0.95)
w1 <- df$treat / ps
w0 <- (1 - df$treat) / (1 - ps)
# Horvitz-Thompson
ate_ipw <- mean(w1 * df$earnings) - mean(w0 * df$earnings)
# Hajek (normalized)
ate_hajek <- sum(w1 * df$earnings) / sum(w1) - sum(w0 * df$earnings) / sum(w0)
cat("Horvitz-Thompson IPW:", round(ate_ipw), "\n")
cat("Hajek IPW:", round(ate_hajek), "\n")Expected output:
| IPW Estimator | ATE Estimate |
|---|---|
| Horvitz-Thompson IPW | ~2,500–4,000 (can be volatile) |
| Hajek (normalized) IPW | ~2,900–3,400 (more stable) |
| True ATE | ~3,150 |
The Horvitz-Thompson IPW can be unstable due to extreme weights (observations with very low propensity scores receive very large weights). The Hajek (normalized) estimator divides by the sum of weights, which stabilizes the estimate. Trimming propensity scores at [0.05, 0.95] further reduces variance.
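One way to probe the role of trimming is to wrap the Hajek estimator in a function of the trimming threshold and scan it (a sketch; `hajek_at` is my own helper name):

```r
# Hajek IPW at a given symmetric trimming threshold, for a data frame with
# columns earnings, treat, and ps_hat (as constructed in Steps 1-2)
hajek_at <- function(data, trim) {
  ps_t <- pmin(pmax(data$ps_hat, trim), 1 - trim)
  w1 <- data$treat / ps_t
  w0 <- (1 - data$treat) / (1 - ps_t)
  sum(w1 * data$earnings) / sum(w1) - sum(w0 * data$earnings) / sum(w0)
}
# With the lab data: sapply(c(0.01, 0.05, 0.10), function(t) hajek_at(df, t))
```

If the estimate moves a lot as the threshold changes, overlap is poor and the weights are doing heavy lifting.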
Step 5: The AIPW (Doubly Robust) Estimator
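Before the code, it helps to write the estimator out. With ê the estimated propensity score and μ̂₀, μ̂₁ the fitted outcome models, the AIPW point estimate averages one score per observation:

```latex
\hat{\tau}_{\mathrm{AIPW}} = \frac{1}{n} \sum_{i=1}^{n} \left[ \hat{\mu}_1(X_i) - \hat{\mu}_0(X_i)
  + \frac{T_i \,\bigl(Y_i - \hat{\mu}_1(X_i)\bigr)}{\hat{e}(X_i)}
  - \frac{(1 - T_i)\,\bigl(Y_i - \hat{\mu}_0(X_i)\bigr)}{1 - \hat{e}(X_i)} \right]
```

The same scores also deliver the standard error: the sample average of the scores is asymptotically linear, so sd(scores)/√n is a valid SE, which is exactly what the code below computes.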
# AIPW by hand
aipw_scores <- (mu1_hat - mu0_hat
+ df$treat * (df$earnings - mu1_hat) / ps
- (1 - df$treat) * (df$earnings - mu0_hat) / (1 - ps))
ate_aipw <- mean(aipw_scores)
se_aipw <- sd(aipw_scores) / sqrt(n)
cat("AIPW ATE:", round(ate_aipw), " (SE:", round(se_aipw), ")\n")
cat("95% CI: [", round(ate_aipw - 1.96 * se_aipw), ",",
round(ate_aipw + 1.96 * se_aipw), "]\n")
# Or use the AIPW package
# library(AIPW)
# aipw_obj <- AIPW$new(Y = df$earnings, A = df$treat,
# W = df[, c("age","educ","married","prev_earn")])
# aipw_obj$fit()$summary()

Expected output:
| AIPW (Doubly Robust) | Value |
|---|---|
| ATE estimate | ~3,000–3,300 |
| Standard error | ~250–400 |
| 95% CI lower | ~2,500 |
| 95% CI upper | ~3,800 |
| True ATE | ~3,150 |
| Comparison of All Estimators | ATE Estimate |
|---|---|
| OLS (outcome regression) | ~3,150 |
| IPW (Hajek) | ~3,150 |
| AIPW (doubly robust) | ~3,150 |
| True ATE | ~3,150 |
All three estimators should be close to the true ATE in this case because both the propensity score model and the outcome model are correctly specified. The advantage of AIPW becomes clear when one model is deliberately misspecified (Step 6).
Step 6: Double Robustness in Action
Now deliberately misspecify one of the two models to see the doubly robust property.
# Misspecify propensity score (only age)
ps_wrong <- pmin(pmax(predict(glm(treat ~ age, data = df, family = binomial), type = "response"), 0.05), 0.95)
# Misspecify outcome model (only age)
mu1_wrong <- predict(lm(earnings ~ age, data = df[df$treat == 1, ]), newdata = df)
mu0_wrong <- predict(lm(earnings ~ age, data = df[df$treat == 0, ]), newdata = df)
# AIPW: correct outcome + wrong PS
aipw_good_out <- mean(mu1_hat - mu0_hat + df$treat * (df$earnings - mu1_hat) / ps_wrong
- (1 - df$treat) * (df$earnings - mu0_hat) / (1 - ps_wrong))
# AIPW: wrong outcome + correct PS
aipw_good_ps <- mean(mu1_wrong - mu0_wrong + df$treat * (df$earnings - mu1_wrong) / ps
- (1 - df$treat) * (df$earnings - mu0_wrong) / (1 - ps))
cat("Correct outcome + wrong PS:", round(aipw_good_out), "\n")
cat("Wrong outcome + correct PS:", round(aipw_good_ps), "\n")
cat("True ATE:", round(true_ate), "\n")Expected output:
| Scenario | AIPW Estimate | IPW Alone | OLS Alone | True ATE |
|---|---|---|---|---|
| Both models correct | ~3,150 | ~3,150 | ~3,150 | ~3,150 |
| Wrong PS, correct outcome | ~3,150 | biased | ~3,150 | ~3,150 |
| Correct PS, wrong outcome | ~3,150 | ~3,150 | biased | ~3,150 |
| Both models wrong | biased | biased | biased | ~3,150 |
This table demonstrates the doubly robust property in action:
- Row 2: Even with a badly misspecified propensity score (using only age), AIPW recovers the correct ATE because the outcome model is correct. The IPW-only estimator, which relies solely on the propensity score, is biased.
- Row 3: Even with a badly misspecified outcome model (using only age), AIPW recovers the correct ATE because the propensity score is correct. The OLS-only estimator, which relies solely on the outcome model, is biased.
- Row 4: When both models are wrong, AIPW offers no guarantee and produces a biased estimate.
When BOTH the propensity score and outcome models are misspecified, the AIPW estimator in this example gives a biased estimate. What does this tell us about double robustness?
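To fill in the last row of the table yourself, it is convenient to package the AIPW formula as a reusable function (a sketch; `aipw_ate` is my own name) and feed it the two misspecified nuisance fits:

```r
# AIPW point estimate from plain vectors: outcome y, 0/1 treatment tr,
# propensity scores ps, and fitted outcome regressions mu1, mu0
aipw_ate <- function(y, tr, ps, mu1, mu0) {
  mean(mu1 - mu0
       + tr * (y - mu1) / ps
       - (1 - tr) * (y - mu0) / (1 - ps))
}
# With the Step 6 objects:
# aipw_ate(df$earnings, df$treat, ps_wrong, mu1_wrong, mu0_wrong)
```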
Exercises
- Use machine learning for nuisance models. Replace logistic regression and OLS with random forests for the propensity score and outcome models. Does the AIPW estimate improve?
- Estimate the ATT. Modify the AIPW formula to target the average treatment effect on the treated. How does it differ from the ATE?
- Bootstrap confidence intervals. Implement a nonparametric bootstrap for the AIPW estimator and compare the bootstrap SE with the influence-function SE.
- Vary the degree of overlap. Increase the coefficients in the propensity score model so that overlap deteriorates. At what point does AIPW break down?
Summary
In this lab you learned:
- Outcome regression relies on a correctly specified outcome model; IPW relies on a correctly specified propensity score model
- The AIPW estimator combines both approaches, providing consistency if either model is correct (the doubly robust property)
- Propensity score trimming is essential for practical stability of IPW and AIPW
- The influence function provides analytic standard errors without bootstrapping
- When both models are wrong, no estimator can save you — invest effort in specifying both models well
- AIPW is the foundation for many modern causal inference methods, including targeted learning (TMLE) and double machine learning