Lab: Doubly Robust Estimation (AIPW)
Implement the augmented inverse probability weighting (AIPW) estimator step by step. Learn why double robustness provides insurance against model misspecification and how to compare AIPW with IPW and outcome regression.
Overview
In this lab you will estimate the average treatment effect of a job training program on earnings using three approaches: outcome regression (OLS), inverse probability weighting (IPW), and augmented inverse probability weighting (AIPW). You will see that the AIPW estimator is consistent if either the propensity score model or the outcome model is correctly specified — the "doubly robust" property.
What you will learn:
- How to estimate propensity scores and check overlap
- How to construct the IPW estimator from first principles
- How to build the AIPW (doubly robust) estimator
- Why double robustness matters when one model is misspecified
- How to compute standard errors via the influence function
Prerequisites: Familiarity with OLS regression and logistic regression. Understanding of potential outcomes notation is helpful.
Step 1: Simulate Observational Data
We create a dataset where treatment assignment depends on covariates (selection on observables).
set.seed(42)
n <- 2000
age <- pmin(pmax(rnorm(n, 35, 10), 18), 65)
educ <- pmin(pmax(rnorm(n, 12, 3), 6), 20)
married <- rbinom(n, 1, 0.4)
prev_earn <- rlnorm(n, 9.5, 0.8)
logit_ps <- -2 + 0.03 * age - 0.1 * educ + 0.5 * married + 0.00005 * prev_earn
true_ps <- plogis(logit_ps)
treat <- rbinom(n, 1, true_ps)
y0 <- 20000 + 500 * educ + 100 * age + 3000 * married + 0.3 * prev_earn + rnorm(n, 0, 5000)
y1 <- y0 + 2500 + 200 * educ - 50 * age
earnings <- treat * y1 + (1 - treat) * y0
true_ate <- mean(y1 - y0)
df <- data.frame(earnings, treat, age, educ, married, prev_earn)
cat("True ATE:", round(true_ate), "\n")
cat("Naive diff:", round(mean(earnings[treat == 1]) - mean(earnings[treat == 0])), "\n")Expected output:
| Covariate | Mean (Treated) | Mean (Control) | Difference |
|---|---|---|---|
| earnings | ~38,000 | ~34,000 | ~4,000 (biased) |
| age | ~36.0 | ~34.5 | +1.5 |
| educ | ~11.5 | ~12.3 | -0.8 |
| married | ~0.48 | ~0.35 | +0.13 |
| prev_earn | ~16,000 | ~12,000 | +4,000 |
Sample data preview (first 5 rows):
| earnings | treat | age | educ | married | prev_earn |
|---|---|---|---|---|---|
| 35,214 | 1 | 42.3 | 10.5 | 1 | 18,500 |
| 28,901 | 0 | 28.7 | 14.2 | 0 | 9,200 |
| 41,378 | 1 | 38.1 | 12.0 | 1 | 22,100 |
| 32,156 | 0 | 33.5 | 13.8 | 0 | 11,800 |
| 36,892 | 0 | 45.2 | 11.1 | 1 | 19,400 |
| Key Statistics | Value |
|---|---|
| Sample size | 2,000 |
| Treatment rate | ~35–45% |
| True ATE | ~3,100–3,300 (DGP: 2500 + 200 × mean(educ) − 50 × mean(age) ≈ 3,150) |
| Naive diff in means | Biased due to selection on age, educ, married, prev_earn |
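The covariate balance table above can be reproduced with a short helper (a sketch; `balance_table` is my own name, not part of the lab):

```r
# Group means by treatment status for a data frame with a 0/1 treat column;
# reproduces the covariate balance table from Step 1
balance_table <- function(data, vars, treat_col = "treat") {
  t(sapply(vars, function(v) {
    c(mean_treated = mean(data[data[[treat_col]] == 1, v]),
      mean_control = mean(data[data[[treat_col]] == 0, v]))
  }))
}
# With the simulated data:
# balance_table(df, c("earnings", "age", "educ", "married", "prev_earn"))
```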
Step 2: Estimate the Propensity Score
# Estimate propensity score
ps_model <- glm(treat ~ age + educ + married + prev_earn,
data = df, family = binomial)
df$ps_hat <- predict(ps_model, type = "response")
# Check overlap
library(ggplot2)
ggplot(df, aes(x = ps_hat, fill = factor(treat))) +
geom_histogram(alpha = 0.5, position = "identity", bins = 40) +
labs(x = "Propensity Score", fill = "Treated") +
theme_minimal()
cat("PS range (treated):", range(df$ps_hat[df$treat == 1]), "\n")
cat("PS range (control):", range(df$ps_hat[df$treat == 0]), "\n")Expected output:
| Propensity Score | Treated | Control |
|---|---|---|
| Range | [~0.10, ~0.80] | [~0.05, ~0.70] |
| Mean | ~0.42 | ~0.33 |
| Median | ~0.40 | ~0.32 |
You may notice that some units have propensity scores very close to zero (e.g., 0.01). Why is this a problem for IPW estimation? (Hint: consider the weight 1/ê(x) that a treated unit in such a region would receive.)
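To see the mechanics, here is a small self-contained illustration (not part of the lab pipeline) of how the treated-unit weight explodes as the propensity score approaches zero:

```r
# The inverse-probability weight for a treated unit is 1 / e(x); near-zero
# propensity scores translate into extreme weights, so a single such
# observation can dominate the IPW estimate
ps_examples <- c(0.5, 0.1, 0.01, 0.001)
data.frame(propensity = ps_examples, treated_weight = 1 / ps_examples)
# treated_weight: 2, 10, 100, 1000
```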
Step 3: Outcome Regression (OLS)
# Outcome regression
ols <- lm(earnings ~ treat + age + educ + married + prev_earn, data = df)
cat("OLS ATE:", coef(ols)["treat"], "\n")
# Separate models for each treatment arm
mu1_model <- lm(earnings ~ age + educ + married + prev_earn, data = df[df$treat == 1, ])
mu0_model <- lm(earnings ~ age + educ + married + prev_earn, data = df[df$treat == 0, ])
mu1_hat <- predict(mu1_model, newdata = df)
mu0_hat <- predict(mu0_model, newdata = df)
cat("Outcome regression (separate models):", mean(mu1_hat - mu0_hat), "\n")Expected output:
| Estimator | ATE Estimate | SE | True ATE |
|---|---|---|---|
| OLS (treat coefficient) | ~3,000–3,300 | ~300 | ~3,150 |
| Outcome regression (separate models) | ~3,000–3,300 | — | ~3,150 |
| OLS Regression Output | Coefficient | SE | t | p-value |
|---|---|---|---|---|
| treat | ~3,100 | ~300 | ~10 | <0.001 |
| age | ~100 | ~15 | ~6.7 | <0.001 |
| educ | ~700 | ~50 | ~14.0 | <0.001 |
| married | ~3,000 | ~300 | ~10.0 | <0.001 |
| prev_earn | ~0.30 | ~0.02 | ~15.0 | <0.001 |
Note: The OLS coefficient on treat recovers the ATE only if the outcome model is correctly specified. Here the outcome equation is linear in the covariates, but the treatment effect itself varies with educ and age, so the pooled model with a single treat dummy is mildly misspecified; the separate-models estimator fits each arm on its own, accommodates this heterogeneity, and targets the ATE directly.
Step 4: Inverse Probability Weighting (IPW)
# IPW estimator
ps <- pmin(pmax(df$ps_hat, 0.05), 0.95)
w1 <- df$treat / ps
w0 <- (1 - df$treat) / (1 - ps)
# Horvitz-Thompson
ate_ipw <- mean(w1 * df$earnings) - mean(w0 * df$earnings)
# Hajek (normalized)
ate_hajek <- sum(w1 * df$earnings) / sum(w1) - sum(w0 * df$earnings) / sum(w0)
cat("Horvitz-Thompson IPW:", round(ate_ipw), "\n")
cat("Hajek IPW:", round(ate_hajek), "\n")Expected output:
| IPW Estimator | ATE Estimate |
|---|---|
| Horvitz-Thompson IPW | ~2,500–4,000 (can be volatile) |
| Hajek (normalized) IPW | ~2,900–3,400 (more stable) |
| True ATE | ~3,150 |
The Horvitz-Thompson IPW can be unstable due to extreme weights (observations with very low propensity scores receive very large weights). The Hajek (normalized) estimator divides by the sum of weights, which stabilizes the estimate. Trimming propensity scores at [0.05, 0.95] further reduces variance.
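One way to probe the role of trimming is to wrap the Hajek estimator in a function of the trimming threshold and scan it (a sketch; `hajek_at` is my own helper name):

```r
# Hajek IPW at a given symmetric trimming threshold, for a data frame with
# columns earnings, treat, and ps_hat (as constructed in Steps 1-2)
hajek_at <- function(data, trim) {
  ps_t <- pmin(pmax(data$ps_hat, trim), 1 - trim)
  w1 <- data$treat / ps_t
  w0 <- (1 - data$treat) / (1 - ps_t)
  sum(w1 * data$earnings) / sum(w1) - sum(w0 * data$earnings) / sum(w0)
}
# With the lab data: sapply(c(0.01, 0.05, 0.10), function(t) hajek_at(df, t))
```

If the estimate moves a lot as the threshold changes, overlap is poor and the weights are doing heavy lifting.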
Step 5: The AIPW (Doubly Robust) Estimator
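Before the code, it helps to write the estimator out. With ê the estimated propensity score and μ̂₀, μ̂₁ the fitted outcome models, the AIPW point estimate averages one score per observation:

```latex
\hat{\tau}_{\mathrm{AIPW}} = \frac{1}{n} \sum_{i=1}^{n} \left[ \hat{\mu}_1(X_i) - \hat{\mu}_0(X_i)
  + \frac{T_i \,\bigl(Y_i - \hat{\mu}_1(X_i)\bigr)}{\hat{e}(X_i)}
  - \frac{(1 - T_i)\,\bigl(Y_i - \hat{\mu}_0(X_i)\bigr)}{1 - \hat{e}(X_i)} \right]
```

The same scores also deliver the standard error: the sample average of the scores is asymptotically linear, so sd(scores)/√n is a valid SE, which is exactly what the code below computes.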
# AIPW by hand
aipw_scores <- (mu1_hat - mu0_hat
+ df$treat * (df$earnings - mu1_hat) / ps
- (1 - df$treat) * (df$earnings - mu0_hat) / (1 - ps))
ate_aipw <- mean(aipw_scores)
se_aipw <- sd(aipw_scores) / sqrt(n)
cat("AIPW ATE:", round(ate_aipw), " (SE:", round(se_aipw), ")\n")
cat("95% CI: [", round(ate_aipw - 1.96 * se_aipw), ",",
round(ate_aipw + 1.96 * se_aipw), "]\n")
# Or use the AIPW package
# library(AIPW)
# aipw_obj <- AIPW$new(Y = df$earnings, A = df$treat,
# W = df[, c("age","educ","married","prev_earn")])
# aipw_obj$fit()$summary()

Expected output:
| AIPW (Doubly Robust) | Value |
|---|---|
| ATE estimate | ~3,000–3,300 |
| Standard error | ~250–400 |
| 95% CI lower | ~2,500 |
| 95% CI upper | ~3,800 |
| True ATE | ~3,150 |
| Comparison of All Estimators | ATE Estimate |
|---|---|
| OLS (outcome regression) | ~3,150 |
| IPW (Hajek) | ~3,150 |
| AIPW (doubly robust) | ~3,150 |
| True ATE | ~3,150 |
All three estimators should be close to the true ATE in this case because both the propensity score model and the outcome model are correctly specified. The advantage of AIPW becomes clear when one model is deliberately misspecified (Step 6).
Step 6: Double Robustness in Action
Now deliberately misspecify one of the two models to see the doubly robust property.
# Misspecify propensity score (only age)
ps_wrong <- pmin(pmax(predict(glm(treat ~ age, data = df, family = binomial), type = "response"), 0.05), 0.95)
# Misspecify outcome model (only age)
mu1_wrong <- predict(lm(earnings ~ age, data = df[df$treat == 1, ]), newdata = df)
mu0_wrong <- predict(lm(earnings ~ age, data = df[df$treat == 0, ]), newdata = df)
# AIPW: correct outcome + wrong PS
aipw_good_out <- mean(mu1_hat - mu0_hat + df$treat * (df$earnings - mu1_hat) / ps_wrong
- (1 - df$treat) * (df$earnings - mu0_hat) / (1 - ps_wrong))
# AIPW: wrong outcome + correct PS
aipw_good_ps <- mean(mu1_wrong - mu0_wrong + df$treat * (df$earnings - mu1_wrong) / ps
- (1 - df$treat) * (df$earnings - mu0_wrong) / (1 - ps))
cat("Correct outcome + wrong PS:", round(aipw_good_out), "\n")
cat("Wrong outcome + correct PS:", round(aipw_good_ps), "\n")
cat("True ATE:", round(true_ate), "\n")Expected output:
| Scenario | AIPW Estimate | IPW Alone | OLS Alone | True ATE |
|---|---|---|---|---|
| Both models correct | ~3,150 | ~3,150 | ~3,150 | ~3,150 |
| Wrong PS, correct outcome | ~3,150 | biased | ~3,150 | ~3,150 |
| Correct PS, wrong outcome | ~3,150 | ~3,150 | biased | ~3,150 |
| Both models wrong | biased | biased | biased | ~3,150 |
This table demonstrates the doubly robust property in action:
- Row 2: Even with a badly misspecified propensity score (using only age), AIPW recovers the correct ATE because the outcome model is correct. The IPW-only estimator, which relies solely on the propensity score, is biased.
- Row 3: Even with a badly misspecified outcome model (using only age), AIPW recovers the correct ATE because the propensity score is correct. The OLS-only estimator, which relies solely on the outcome model, is biased.
- Row 4: When both models are wrong, AIPW offers no guarantee and produces a biased estimate.
When BOTH the propensity score and outcome models are misspecified, the AIPW estimator in this example gives a biased estimate. What does this tell us about double robustness?
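To fill in the last row of the table yourself, it is convenient to package the AIPW formula as a reusable function (a sketch; `aipw_ate` is my own name) and feed it the two misspecified nuisance fits:

```r
# AIPW point estimate from plain vectors: outcome y, 0/1 treatment tr,
# propensity scores ps, and fitted outcome regressions mu1, mu0
aipw_ate <- function(y, tr, ps, mu1, mu0) {
  mean(mu1 - mu0
       + tr * (y - mu1) / ps
       - (1 - tr) * (y - mu0) / (1 - ps))
}
# With the Step 6 objects:
# aipw_ate(df$earnings, df$treat, ps_wrong, mu1_wrong, mu0_wrong)
```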
Exercises
- Use machine learning for nuisance models. Replace logistic regression and OLS with random forests for the propensity score and outcome models. Does the AIPW estimate improve?
- Estimate the ATT. Modify the AIPW formula to target the average treatment effect on the treated. How does it differ from the ATE?
- Bootstrap confidence intervals. Implement a nonparametric bootstrap for the AIPW estimator and compare the bootstrap SE with the influence-function SE.
- Vary the degree of overlap. Increase the coefficients in the propensity score model so that overlap deteriorates. At what point does AIPW break down?
Summary
In this lab you learned:
- Outcome regression relies on a correctly specified outcome model; IPW relies on a correctly specified propensity score model
- The AIPW estimator combines both approaches, providing consistency if either model is correct (the doubly robust property)
- Propensity score trimming is essential for practical stability of IPW and AIPW
- The influence function provides analytic standard errors without bootstrapping
- When both models are wrong, no estimator can save you — invest effort in specifying both models well
- AIPW is the foundation for many modern causal inference methods, including targeted learning (TMLE) and double machine learning