MethodAtlas
Tutorial · 90 minutes

Lab: Doubly Robust Estimation (AIPW)

Implement the augmented inverse probability weighting (AIPW) estimator step by step. Learn why double robustness provides insurance against model misspecification and how to compare AIPW with IPW and outcome regression.

Overview

In this lab you will estimate the average treatment effect of a job training program on earnings using three approaches: outcome regression (OLS), inverse probability weighting (IPW), and augmented inverse probability weighting (AIPW). You will see that the AIPW estimator is consistent if either the propensity score model or the outcome model is correctly specified — the "doubly robust" property.

What you will learn:

  • How to estimate propensity scores and check overlap
  • How to construct the IPW estimator from first principles
  • How to build the AIPW (doubly robust) estimator
  • Why double robustness matters when one model is misspecified
  • How to compute standard errors via the influence function

Prerequisites: Familiarity with OLS regression and logistic regression. Understanding of potential outcomes notation is helpful.
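
For reference, the estimand and the two identifying assumptions this lab relies on, in standard potential-outcomes notation (Y(1), Y(0) are potential earnings with and without training, T is treatment, X the covariates):

```latex
% Estimand: the average treatment effect (ATE)
\tau = \mathbb{E}[Y(1) - Y(0)]

% Identification: unconfoundedness (selection on observables) and overlap
(Y(1), Y(0)) \perp T \mid X,
\qquad 0 < e(X) = \Pr(T = 1 \mid X) < 1
```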


Step 1: Simulate Observational Data

We create a dataset where treatment assignment depends on covariates (selection on observables).

set.seed(42)
n <- 2000

age <- pmin(pmax(rnorm(n, 35, 10), 18), 65)
educ <- pmin(pmax(rnorm(n, 12, 3), 6), 20)
married <- rbinom(n, 1, 0.4)
prev_earn <- rlnorm(n, 9.5, 0.8)

logit_ps <- -2 + 0.03 * age - 0.1 * educ + 0.5 * married + 0.00005 * prev_earn
true_ps <- plogis(logit_ps)
treat <- rbinom(n, 1, true_ps)

y0 <- 20000 + 500 * educ + 100 * age + 3000 * married + 0.3 * prev_earn + rnorm(n, 0, 5000)
y1 <- y0 + 2500 + 200 * educ - 50 * age

earnings <- treat * y1 + (1 - treat) * y0
true_ate <- mean(y1 - y0)

df <- data.frame(earnings, treat, age, educ, married, prev_earn)

cat("True ATE:", round(true_ate), "\n")
cat("Naive diff:", round(mean(earnings[treat == 1]) - mean(earnings[treat == 0])), "\n")

Expected output (values are approximate; the DGP implies a true ATE of about 3,150):

  True ATE: ~3150
  Naive diff: ~4000 (biased upward by selection)

Covariate balance by treatment group:

  Covariate    Mean (Treated)   Mean (Control)   Difference
  earnings     ~38,000          ~34,000          ~4,000 (biased)
  age          ~36.0            ~34.5            +1.5
  educ         ~11.5            ~12.3            -0.8
  married      ~0.48            ~0.35            +0.13
  prev_earn    ~16,000          ~12,000          +4,000

Sample data preview (first 5 rows):

  earnings   treat   age    educ   married   prev_earn
  35,214     1       42.3   10.5   1         18,500
  28,901     0       28.7   14.2   0         9,200
  41,378     1       38.1   12.0   1         22,100
  32,156     0       33.5   13.8   0         11,800
  36,892     0       45.2   11.1   1         19,400
  Key Statistics        Value
  Sample size           2,000
  Treatment rate        ~27–30%
  True ATE              ~3,150 (DGP: 2500 + 200 · mean(educ) − 50 · mean(age))
  Naive diff in means   Biased upward by selection on age, educ, married, prev_earn

Step 2: Estimate the Propensity Score

# Estimate propensity score
ps_model <- glm(treat ~ age + educ + married + prev_earn,
                data = df, family = binomial)
df$ps_hat <- predict(ps_model, type = "response")

# Check overlap
library(ggplot2)
ggplot(df, aes(x = ps_hat, fill = factor(treat))) +
  geom_histogram(alpha = 0.5, position = "identity", bins = 40) +
  labs(x = "Propensity Score", fill = "Treated") +
  theme_minimal()

cat("PS range (treated):", range(df$ps_hat[df$treat == 1]), "\n")
cat("PS range (control):", range(df$ps_hat[df$treat == 0]), "\n")
Requires: ggplot2

Expected output:

  Propensity Score   Treated          Control
  Range              [~0.10, ~0.80]   [~0.05, ~0.70]
  Mean               ~0.33            ~0.25
  Median             ~0.30            ~0.22
Concept Check

You notice that some control units have propensity scores very close to zero (e.g., 0.01). Why is this a problem for IPW estimation?
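
A quick numeric sketch of the answer (hypothetical propensity values, not drawn from the lab data): a region where the propensity score is near zero means any treated unit that does appear there receives weight 1/ps in the treated-arm term, so a single observation can dominate the estimate.

```r
# Hypothetical propensity values: the IPW weight on a treated unit is
# 1 / ps, so near-zero propensity scores produce enormous weights.
ps_examples <- c(0.50, 0.10, 0.01)
data.frame(ps = ps_examples, treated_weight = 1 / ps_examples)
# ps = 0.50 -> weight 2; ps = 0.10 -> weight 10; ps = 0.01 -> weight 100
```

This weight explosion is why Step 4 trims the estimated scores to [0.05, 0.95].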


Step 3: Outcome Regression (OLS)

# Outcome regression
ols <- lm(earnings ~ treat + age + educ + married + prev_earn, data = df)
cat("OLS ATE:", coef(ols)["treat"], "\n")

# Separate models for each treatment arm
mu1_model <- lm(earnings ~ age + educ + married + prev_earn, data = df[df$treat == 1, ])
mu0_model <- lm(earnings ~ age + educ + married + prev_earn, data = df[df$treat == 0, ])
mu1_hat <- predict(mu1_model, newdata = df)
mu0_hat <- predict(mu0_model, newdata = df)
cat("Outcome regression (separate models):", mean(mu1_hat - mu0_hat), "\n")

Expected output:

  Estimator                              ATE Estimate   SE     True ATE
  OLS (treat coefficient)                ~3,000–3,300   ~300   ~3,150
  Outcome regression (separate models)   ~3,000–3,300   —      ~3,150

  OLS Regression Output   Coefficient   SE      t       p-value
  treat                   ~3,150        ~300    ~10.5   <0.001
  age                     ~85           ~15     ~5.7    <0.001
  educ                    ~560          ~50     ~11.2   <0.001
  married                 ~3,000        ~300    ~10.0   <0.001
  prev_earn               ~0.30         ~0.02   ~15.0   <0.001

Note: The OLS treat coefficient equals the ATE only if the constant-effect linear model is correct. Here the true effect varies with educ and age, so the single coefficient estimates a variance-weighted average of individual effects, which with mild heterogeneity lands close to the ATE. The separate-models approach (g-computation) accommodates heterogeneity directly, and should perform well here because the outcome equation is linear within each arm.


Step 4: Inverse Probability Weighting (IPW)

# IPW estimator
ps <- pmin(pmax(df$ps_hat, 0.05), 0.95)

w1 <- df$treat / ps
w0 <- (1 - df$treat) / (1 - ps)

# Horvitz-Thompson
ate_ipw <- mean(w1 * df$earnings) - mean(w0 * df$earnings)

# Hajek (normalized)
ate_hajek <- sum(w1 * df$earnings) / sum(w1) - sum(w0 * df$earnings) / sum(w0)

cat("Horvitz-Thompson IPW:", round(ate_ipw), "\n")
cat("Hajek IPW:", round(ate_hajek), "\n")

Expected output:

  IPW Estimator            ATE Estimate
  Horvitz-Thompson IPW     ~2,500–3,800 (can be volatile)
  Hajek (normalized) IPW   ~3,000–3,300 (more stable)
  True ATE                 ~3,150

The Horvitz-Thompson estimator can be unstable because observations with extreme propensity scores receive very large weights. The Hajek (normalized) estimator divides by the sum of the weights rather than by n, which stabilizes the estimate. Trimming propensity scores to [0.05, 0.95] further reduces variance, at the cost of a small bias when the trimmed units carry real information.
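
Equivalently, the Hajek estimator is a weighted mean within each arm; base R's weighted.mean, which divides by the sum of the weights, makes the normalization explicit. A toy illustration with hypothetical numbers:

```r
# weighted.mean(y, w) divides by sum(w), which is exactly the
# Hajek normalization sum(w * y) / sum(w).
y <- c(10, 20, 30)
w <- c(4, 1, 0.5)
c(weighted.mean(y, w), sum(w * y) / sum(w))  # the two values are identical

# Applied to the lab data (with df and the trimmed ps in memory):
# ate_hajek <- weighted.mean(df$earnings[df$treat == 1], (1 / ps)[df$treat == 1]) -
#   weighted.mean(df$earnings[df$treat == 0], (1 / (1 - ps))[df$treat == 0])
```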


Step 5: The AIPW (Doubly Robust) Estimator
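
Before coding it up: the AIPW estimator averages one score per observation, combining the outcome-model contrast with propensity-weighted corrections for the residuals in each arm. In the notation of the steps above:

```latex
\hat{\tau}_{\mathrm{AIPW}}
  = \frac{1}{n} \sum_{i=1}^{n} \left[
      \hat{\mu}_1(X_i) - \hat{\mu}_0(X_i)
      + \frac{T_i \,\bigl(Y_i - \hat{\mu}_1(X_i)\bigr)}{\hat{e}(X_i)}
      - \frac{(1 - T_i)\,\bigl(Y_i - \hat{\mu}_0(X_i)\bigr)}{1 - \hat{e}(X_i)}
    \right]
```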

# AIPW by hand
aipw_scores <- (mu1_hat - mu0_hat
              + df$treat * (df$earnings - mu1_hat) / ps
              - (1 - df$treat) * (df$earnings - mu0_hat) / (1 - ps))

ate_aipw <- mean(aipw_scores)
se_aipw <- sd(aipw_scores) / sqrt(n)

cat("AIPW ATE:", round(ate_aipw), " (SE:", round(se_aipw), ")\n")
cat("95% CI: [", round(ate_aipw - 1.96 * se_aipw), ",",
  round(ate_aipw + 1.96 * se_aipw), "]\n")

# Or use the AIPW package
# library(AIPW)
# aipw_obj <- AIPW$new(Y = df$earnings, A = df$treat,
#                       W = df[, c("age","educ","married","prev_earn")])
# aipw_obj$fit()$summary()
Requires: AIPW (optional)

Expected output:

  AIPW (Doubly Robust)   Value
  ATE estimate           ~3,000–3,300
  Standard error         ~250–400
  95% CI lower           ~2,500
  95% CI upper           ~3,800
  True ATE               ~3,150

  Comparison of All Estimators   ATE Estimate
  OLS (outcome regression)       ~3,150
  IPW (Hajek)                    ~3,150
  AIPW (doubly robust)           ~3,150
  True ATE                       ~3,150

All three estimators should be close to the true ATE in this case because both the propensity score model and the outcome model are correctly specified. The advantage of AIPW becomes clear when one model is deliberately misspecified (Step 6).


Step 6: Double Robustness in Action

Now deliberately misspecify one of the two models to see the doubly robust property.

# Misspecified propensity score: uses only age
ps_model_wrong <- glm(treat ~ age, data = df, family = binomial)
ps_wrong <- pmin(pmax(predict(ps_model_wrong, type = "response"), 0.05), 0.95)

# Misspecified outcome models: use only age
mu1_wrong <- predict(lm(earnings ~ age, data = df[df$treat == 1, ]), newdata = df)
mu0_wrong <- predict(lm(earnings ~ age, data = df[df$treat == 0, ]), newdata = df)

# AIPW: correct outcome + wrong PS
aipw_good_out <- mean(mu1_hat - mu0_hat + df$treat * (df$earnings - mu1_hat) / ps_wrong
                   - (1 - df$treat) * (df$earnings - mu0_hat) / (1 - ps_wrong))

# AIPW: wrong outcome + correct PS
aipw_good_ps <- mean(mu1_wrong - mu0_wrong + df$treat * (df$earnings - mu1_wrong) / ps
                  - (1 - df$treat) * (df$earnings - mu0_wrong) / (1 - ps))

cat("Correct outcome + wrong PS:", round(aipw_good_out), "\n")
cat("Wrong outcome + correct PS:", round(aipw_good_ps), "\n")
cat("True ATE:", round(true_ate), "\n")

Expected output:

  Scenario                    AIPW Estimate   IPW Alone   OLS Alone   True ATE
  Both models correct         ~3,150          ~3,150      ~3,150      ~3,150
  Wrong PS, correct outcome   ~3,150          biased      ~3,150      ~3,150
  Correct PS, wrong outcome   ~3,150          ~3,150      biased      ~3,150
  Both models wrong           biased          biased      biased      ~3,150

This table demonstrates the doubly robust property in action:

  • Row 2: Even with a badly misspecified propensity score (using only age), AIPW recovers the correct ATE because the outcome model is correct. The IPW-only estimator, which relies solely on the propensity score, is biased.
  • Row 3: Even with a badly misspecified outcome model (using only age), AIPW recovers the correct ATE because the propensity score is correct. The OLS-only estimator, which relies solely on the outcome model, is biased.
  • Row 4: When both models are wrong, AIPW offers no guarantee and produces a biased estimate.
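
The algebra behind this: the bias of the AIPW estimate of E[Y(1)] is driven by the product of the two nuisance errors, so it vanishes whenever either factor is zero (an analogous term appears for E[Y(0)]):

```latex
\mathrm{bias} \;=\; \mathbb{E}\!\left[
  \frac{\bigl(\hat{e}(X) - e(X)\bigr)\,\bigl(\hat{\mu}_1(X) - \mu_1(X)\bigr)}{\hat{e}(X)}
\right]
```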
Concept Check

When BOTH the propensity score and outcome models are misspecified, the AIPW estimator in this example gives a biased estimate. What does this tell us about double robustness?


Exercises

  1. Use machine learning for nuisance models. Replace logistic regression and OLS with random forests for the propensity score and outcome models. Does the AIPW estimate improve?

  2. Estimate the ATT. Modify the AIPW formula to target the average treatment effect on the treated. How does it differ from the ATE?

  3. Bootstrap confidence intervals. Implement a nonparametric bootstrap for the AIPW estimator and compare the bootstrap SE with the influence-function SE.

  4. Vary the degree of overlap. Increase the coefficients in the propensity score model so that overlap deteriorates. At what point does AIPW break down?
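
A starter sketch for Exercise 3 (one possible design, not the only valid one). Both nuisance models are refit inside each resample so that their estimation error is reflected in the bootstrap SE:

```r
# Refit propensity and outcome models on each bootstrap resample,
# then recompute the AIPW estimate, mirroring Steps 2, 3, and 5.
aipw_estimate <- function(d) {
  e <- pmin(pmax(predict(glm(treat ~ age + educ + married + prev_earn,
                             data = d, family = binomial),
                         type = "response"), 0.05), 0.95)
  m1 <- predict(lm(earnings ~ age + educ + married + prev_earn,
                   data = d[d$treat == 1, ]), newdata = d)
  m0 <- predict(lm(earnings ~ age + educ + married + prev_earn,
                   data = d[d$treat == 0, ]), newdata = d)
  mean(m1 - m0
       + d$treat * (d$earnings - m1) / e
       - (1 - d$treat) * (d$earnings - m0) / (1 - e))
}

# With df from Step 1 in memory:
# boot_ates <- replicate(500, aipw_estimate(df[sample(nrow(df), replace = TRUE), ]))
# cat("Bootstrap SE:", round(sd(boot_ates)), "\n")
```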


Summary

In this lab you learned:

  • Outcome regression relies on a correctly specified outcome model; IPW relies on a correctly specified propensity score model
  • The AIPW estimator combines both approaches, providing consistency if either model is correct (the doubly robust property)
  • Propensity score trimming is essential for practical stability of IPW and AIPW
  • The influence function provides analytic standard errors without bootstrapping
  • When both models are wrong, no estimator can save you — invest effort in specifying both models well
  • AIPW is the foundation for many modern causal inference methods, including targeted learning (TMLE) and double machine learning