Capstone Lab: Same Data, Five Methods
Apply OLS, matching, IV, DiD, and RDD to the same dataset. Discover why different methods give different estimates.
Overview
This capstone lab applies five estimation methods to the same dataset. Because the data is simulated, we know the true causal effect and can compare each method's estimate to the truth.
What you will learn:
- How to apply OLS, matching, IV, DiD, and RDD to the same research question
- Why different methods estimate different parameters (ATE, ATT, LATE)
- How violations of identifying assumptions bias each method differently
- How to present multi-method evidence in a paper
Prerequisites: We recommend completing the tutorial labs for OLS, matching, IV, DiD, and RDD first.
The Setting
A government agency runs a job training program:
- Eligibility: Workers with a pre-program test score below 70 are eligible.
- Selection: Among eligible workers, those with higher motivation (unobserved) are more likely to enroll.
- Lottery: Some eligible workers received a random encouragement letter, which increased enrollment.
- Timing: The program launched in 2020. We have earnings for 2019 (pre) and 2021 (post).
- Heterogeneity: The program helps low-skill workers more than high-skill workers.
The true parameters: ATE = $2,000 (eligible population), ATT = $2,400 (enrolled workers), LATE = $1,800 (lottery compliers), effect at cutoff = $1,500 (workers near score 70).
Step 1: Generate the Data
set.seed(2024)
n <- 5000
age <- round(runif(n, 22, 60))
education <- pmin(pmax(round(rnorm(n, 12, 2.5)), 8), 20)
female <- rbinom(n, 1, 0.5)
test_score <- 40 + 0.8 * education + rnorm(n, sd = 10)
motivation <- rnorm(n) # Unobserved confounder
eligible <- as.integer(test_score < 70)
lottery <- rep(0L, n)
lottery[eligible == 1] <- rbinom(sum(eligible), 1, 0.5)
# Treatment depends on eligibility, motivation, and lottery
latent <- -1.5 + 0.8*eligible + 0.6*motivation + 0.9*lottery +
0.01*(70 - test_score)*eligible
training <- rbinom(n, 1, plogis(latent))
training[eligible == 0] <- 0L
earnings_pre <- 20000 + 500*education + 200*age - 2000*female +
1500*motivation + rnorm(n, sd = 3000)
# Heterogeneous treatment effect
te_i <- pmax(2000 + 8*(60 - test_score) + 400*motivation, 0)
time_trend <- 1000
earnings_post <- earnings_pre + time_trend + te_i*training + rnorm(n, sd = 2000)
df <- data.frame(id = 1:n, age, education, female,
test_score = round(test_score, 1), motivation, eligible,
lottery, training, earnings_pre = round(earnings_pre),
earnings_post = round(earnings_post),
earnings_change = round(earnings_post - earnings_pre))
cat("True ATE (eligible):", round(mean(te_i[eligible == 1])), "\n")
cat("True ATT (treated):", round(mean(te_i[training == 1])), "\n")Step 2: OLS Regression
Regress post-period earnings on a training indicator with observable controls.
library(estimatr)
ols_naive <- lm_robust(earnings_post ~ training,
data = df, se_type = "HC2")
ols_controls <- lm_robust(
earnings_post ~ training + education + age + female + earnings_pre,
data = df, se_type = "HC2")
cat("OLS naive:", round(coef(ols_naive)["training"]), "\n")
cat("OLS + controls:", round(coef(ols_controls)["training"]), "\n")
cat("True ATE (eligible):", round(mean(te_i[eligible == 1])), "\n")
cat("OLS is biased upward by selection on motivation.\n")What to notice: The OLS coefficient is substantially larger than the true ATE or ATT. Motivated workers both enroll in training and earn more regardless, so OLS conflates causation with selection.
Step 3: Propensity Score Matching
Match treated workers to similar untreated workers based on observable characteristics.
library(MatchIt)
df_elig <- df[df$eligible == 1, ]
m_out <- matchit(training ~ education + age + female + earnings_pre,
data = df_elig, method = "nearest",
distance = "glm", ratio = 1, replace = FALSE)
m_data <- match.data(m_out)
att_match <- lm_robust(earnings_post ~ training, data = m_data,
weights = weights, se_type = "HC2")
cat("ATT (matching):", round(coef(att_match)["training"]), "\n")
cat("True ATT:", round(mean(te_i[training == 1 & eligible == 1])), "\n")
cat("Matching reduces bias but cannot eliminate unobserved confounding.\n")What to notice: Matching improves on naive OLS by creating a more comparable control group, but the estimate remains biased because unobserved motivation is not balanced.
Step 4: Instrumental Variables
Use the lottery encouragement letter as an instrument for training enrollment.
library(ivreg)
library(lmtest)    # coeftest()
library(sandwich)  # vcovHC()
df_elig <- df[df$eligible == 1, ]
# First stage
first_stage <- lm_robust(training ~ lottery + education + age + female,
data = df_elig, se_type = "HC2")
cat("First-stage F:", round(summary(first_stage)$fstatistic[1], 1), "\n")
# 2SLS
iv_est <- ivreg(earnings_post ~ training + education + age + female |
lottery + education + age + female, data = df_elig)
iv_robust <- coeftest(iv_est, vcov = vcovHC(iv_est, type = "HC2"))
cat("LATE (IV):", round(iv_robust["training", 1]), "\n")
cat("SE:", round(iv_robust["training", 2]), "\n")
cat("IV estimates the LATE for compliers.\n")What to notice: The IV estimate is close to the true LATE (~$1,800), which is lower than the ATT because compliers include less-motivated workers who benefit less. The standard error is larger than OLS -- the classic bias-variance tradeoff of IV.
Step 5: Difference-in-Differences
Compare the change in earnings for treated vs. untreated workers before and after the program.
# First-difference approach (equivalent to fixed effects with T = 2);
# uses lm_robust from estimatr, loaded above
df_elig <- df[df$eligible == 1, ]
did_est <- lm_robust(earnings_change ~ training, data = df_elig,
se_type = "HC2")
cat("ATT (DiD):", round(coef(did_est)["training"]), "\n")
cat("True ATT:", round(mean(te_i[training == 1 & eligible == 1])), "\n")
cat("DiD removes time-invariant confounders like motivation.\n")What to notice: The DiD estimate should be close to the true ATT (~$2,400) because the common time trend is identical for treated and control workers, so parallel trends holds. DiD succeeds by differencing out the time-invariant component of motivation.
Step 6: Regression Discontinuity Design
Exploit the eligibility cutoff at test score = 70 to estimate the local treatment effect at the threshold.
library(rdrobust)
df$running <- df$test_score - 70 # Negative = eligible
# Fuzzy RD: eligibility is sharp but treatment is not
rd_first <- rdrobust(df$training, df$running, c = 0)
summary(rd_first)
rd_fuzzy <- rdrobust(df$earnings_post, df$running, c = 0,
fuzzy = df$training)
cat("\nRD estimate:", round(rd_fuzzy$coef[1]), "\n")
cat("RD estimates the local effect at the cutoff (~$1,500).\n")What to notice: The RD estimate targets the effect for workers right at the eligibility cutoff (test score = 70). These higher-skill workers benefit less from training, so the RD estimate (~$1,500) is lower than both the ATT and the LATE. This is not a flaw -- it reflects the local nature of RDD.
Step 7: Comparison and Interpretation
results <- data.frame(
Method = c("OLS (naive)", "OLS + controls", "Matching",
"IV (2SLS)", "DiD", "RD (fuzzy)"),
Target = c("ATE*", "ATE*", "ATT", "LATE", "ATT", "LATE at cutoff"),
Note = c("Biased upward by selection",
"Less biased, motivation still omitted",
"Observables only; motivation unbalanced",
"Consistent if lottery is valid IV",
"Consistent if parallel trends holds",
"Local effect at eligibility threshold"))
print(results, row.names = FALSE)
cat("\n--- True Parameters ---\n")
cat("ATE:", round(mean(te_i[eligible == 1])), "\n")
cat("ATT:", round(mean(te_i[training == 1])), "\n")
cat("LATE (compliers): ~1,800\nEffect at cutoff: ~1,500\n")Summary Comparison Table
| Method | Target Estimand | Expected Estimate | Bias Source |
|---|---|---|---|
| OLS (naive) | ATE (biased) | ~$3,500+ | Selection on motivation |
| OLS + controls | ATE (biased) | ~$3,000+ | Residual selection on motivation |
| Matching (PSM) | ATT (biased) | ~$2,800+ | Cannot match on unobserved motivation |
| IV (2SLS) | LATE (compliers) | ~$1,800 | Unbiased if instrument valid |
| DiD | ATT | ~$2,400 | Unbiased if parallel trends holds |
| RD (fuzzy) | LATE at cutoff | ~$1,500 | Unbiased locally; not generalizable |
In this capstone lab, the IV estimate (~\$1,800) is lower than the DiD estimate (~\$2,400). Both methods are consistent under their respective assumptions. Why are their point estimates different?
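The gap is an estimand difference, not a bias: the two methods average the heterogeneous effect over different groups. A self-contained toy (hypothetical enrollment thresholds, not the lab's DGP) computes the true ATT and the true complier LATE directly from potential enrollment, showing the two parameters differ even with zero estimation error:

```r
# Toy: ATT and LATE are different averages of the same heterogeneous effect
set.seed(6)
n <- 100000
u  <- rnorm(n)                      # motivation
te <- 2000 + 400*u                  # effect rises with motivation
z  <- rbinom(n, 1, 0.5)             # encouragement lottery
d0 <- as.integer(u > 0.8)           # enrolls without encouragement (always-takers)
d1 <- as.integer(u > -0.2)          # encouragement lowers the enrollment bar
d  <- ifelse(z == 1, d1, d0)        # observed enrollment
complier <- d1 == 1 & d0 == 0       # enrolls only if encouraged

cat("true ATT: ", round(mean(te[d == 1])), "\n")
cat("true LATE:", round(mean(te[complier])), "\n")
```

Always-takers are the most motivated and benefit most, so the ATT (which includes them) exceeds the complier LATE (which excludes them) -- the same ordering you see between the DiD and IV estimates in the lab.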
Lessons for Applied Research
1. Define your estimand first. Before choosing a method, ask: what causal parameter do I want? ATE? ATT? LATE?
2. No method is assumption-free. OLS requires no omitted variables. Matching requires selection on observables. IV requires instrument validity. DiD requires parallel trends. RDD requires continuity at the cutoff.
3. Present multiple methods when possible. Agreement across methods strengthens your argument. Disagreement forces you to think about why.
4. Distinguish bias from estimand differences. OLS is biased (violated assumptions). IV and DiD differ because they target different parameters (legitimate). The first is a problem; the second is informative.
Extensions (Optional)
- Add motivation as a control to OLS and matching. How much does the bias shrink?
- Violate parallel trends by adding a differential time trend for treated workers. How does DiD change?
- Weaken the instrument by reducing the lottery effect on enrollment. What happens to IV precision?
- Estimate CATE using causal forests and compare to the true heterogeneous effects.
- Bootstrap the comparison. Repeat 500 times and plot each estimator's distribution around the truth.
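The last extension can be sketched compactly. A self-contained Monte Carlo (a simplified two-estimator DGP with 200 replications rather than 500) repeats a toy simulation and compares each estimator's sampling distribution to the truth:

```r
# Toy Monte Carlo: OLS vs. DiD over repeated draws (fresh, simplified DGP)
set.seed(7)
one_rep <- function(n = 2000) {
  u <- rnorm(n)                            # time-invariant confounder
  d <- rbinom(n, 1, plogis(u))             # selection on u
  y_pre  <- 20000 + 1500*u + rnorm(n, sd = 3000)
  y_post <- y_pre + 1000 + 2000*d + rnorm(n, sd = 2000)  # true effect: 2000
  c(ols = unname(coef(lm(y_post ~ d))["d"]),
    did = unname(coef(lm(I(y_post - y_pre) ~ d))["d"]))
}
sims <- t(replicate(200, one_rep()))
print(round(colMeans(sims)))   # OLS centered well above 2000; DiD near 2000
```

Replacing `one_rep` with the full Step 1 data-generating code (and adding the other estimators) turns this skeleton into the full 500-draw comparison the extension describes.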
Next Step: Return to the Labs index to explore method-specific labs, or revisit the Foundations to review the conceptual framework behind these methods.