MethodAtlas
Replication · 3 hours

Capstone Lab: Same Data, Five Methods

Apply OLS, matching, IV, DiD, and RDD to the same dataset. Discover why different methods give different estimates.

Overview

This capstone lab applies five estimation methods to the same dataset. Because the data is simulated, we know the true causal effect and can compare each method's estimate to the truth.

What you will learn:

  • How to apply OLS, matching, IV, DiD, and RDD to the same research question
  • Why different methods estimate different parameters (ATE, ATT, LATE)
  • How violations of identifying assumptions bias each method differently
  • How to present multi-method evidence in a paper

Prerequisites: We recommend completing the tutorial labs for OLS, matching, IV, DiD, and RDD first.


The Setting

A government agency runs a job training program:

  • Eligibility: Workers with a pre-program test score below 70 are eligible.
  • Selection: Among eligible workers, those with higher motivation (unobserved) are more likely to enroll.
  • Lottery: Some eligible workers received a random encouragement letter, which increased enrollment.
  • Timing: The program launched in 2020. We have earnings for 2019 (pre) and 2021 (post).
  • Heterogeneity: The program helps low-skill workers more than high-skill workers.

The true parameters: ATE = $2,000 (eligible population), ATT = $2,400 (enrolled workers), LATE = $1,800 (lottery compliers), effect at cutoff = $1,500 (workers near score 70).


Step 1: Generate the Data

set.seed(2024)
n <- 5000
age <- round(runif(n, 22, 60))
education <- pmin(pmax(round(rnorm(n, 12, 2.5)), 8), 20)
female <- rbinom(n, 1, 0.5)
test_score <- 40 + 0.8 * education + rnorm(n, sd = 10)
motivation <- rnorm(n)  # Unobserved confounder

eligible <- as.integer(test_score < 70)
lottery <- rep(0L, n)
lottery[eligible == 1] <- rbinom(sum(eligible), 1, 0.5)

# Treatment depends on eligibility, motivation, and lottery
latent <- -1.5 + 0.8*eligible + 0.6*motivation + 0.9*lottery +
        0.01*(70 - test_score)*eligible
training <- rbinom(n, 1, plogis(latent))
training[eligible == 0] <- 0L

earnings_pre <- 20000 + 500*education + 200*age - 2000*female +
              1500*motivation + rnorm(n, sd = 3000)

# Heterogeneous treatment effect
te_i <- pmax(2000 + 8*(60 - test_score) + 400*motivation, 0)

time_trend <- 1000
earnings_post <- earnings_pre + time_trend + te_i*training + rnorm(n, sd = 2000)

df <- data.frame(
  id = 1:n, age, education, female,
  test_score = round(test_score, 1), motivation, eligible,
  lottery, training, earnings_pre = round(earnings_pre),
  earnings_post = round(earnings_post),
  earnings_change = round(earnings_post - earnings_pre))

cat("True ATE (eligible):", round(mean(te_i[eligible == 1])), "\n")
cat("True ATT (treated):", round(mean(te_i[training == 1])), "\n")

Step 2: OLS Regression

Regress post-period earnings on a training indicator with observable controls.

library(estimatr)

ols_naive <- lm_robust(earnings_post ~ training,
                     data = df, se_type = "HC2")
ols_controls <- lm_robust(
  earnings_post ~ training + education + age + female + earnings_pre,
  data = df, se_type = "HC2")

cat("OLS naive:", round(coef(ols_naive)["training"]), "\n")
cat("OLS + controls:", round(coef(ols_controls)["training"]), "\n")
cat("True ATE (eligible):", round(mean(te_i[eligible == 1])), "\n")
cat("OLS is biased upward by selection on motivation.\n")
Requires: estimatr

What to notice: The OLS coefficient is substantially larger than the true ATE or ATT. Motivated workers both enroll in training and earn more regardless, so OLS conflates causation with selection.
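Because the data is simulated, the size of that selection bias can be read off directly. As a minimal sketch (assuming `df` and `te_i` from Step 1 are still in memory), the naive difference in means decomposes into the true ATT plus the baseline earnings gap between enrollees and non-enrollees:

```r
# Decompose the naive difference in means (assumes df and te_i from Step 1).
# The common time trend hits both groups equally and cancels, leaving:
#   naive diff = true ATT + baseline earnings gap (selection) + noise
treated      <- df$training == 1
naive_diff   <- mean(df$earnings_post[treated]) - mean(df$earnings_post[!treated])
att_true     <- mean(te_i[treated])
baseline_gap <- mean(df$earnings_pre[treated]) - mean(df$earnings_pre[!treated])

cat("Naive diff:", round(naive_diff),
    "= true ATT", round(att_true),
    "+ selection", round(baseline_gap), "(plus noise)\n")
```

The baseline gap is exactly the part of the naive estimate that no post-period regression can remove without observing motivation.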


Step 3: Propensity Score Matching

Match treated workers to similar untreated workers based on observable characteristics.

library(MatchIt)
df_elig <- df[df$eligible == 1, ]

m_out <- matchit(training ~ education + age + female + earnings_pre,
               data = df_elig, method = "nearest",
               distance = "glm", ratio = 1, replace = FALSE)
m_data <- match.data(m_out)

att_match <- lm_robust(earnings_post ~ training, data = m_data,
                     weights = weights, se_type = "HC2")

cat("ATT (matching):", round(coef(att_match)["training"]), "\n")
cat("True ATT:", round(mean(te_i[training == 1 & eligible == 1])), "\n")
cat("Matching reduces bias but cannot eliminate unobserved confounding.\n")
Requires: MatchIt

What to notice: Matching improves on naive OLS by creating a more comparable control group, but the estimate remains biased because unobserved motivation is not balanced.
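A balance check makes this concrete. The sketch below (assuming `m_out` and `df_elig` from this step) inspects covariate balance; because the data is simulated, we can also compute the standardized mean difference on the normally unobserved motivation variable:

```r
# Covariate balance after matching (assumes m_out and df_elig from this step)
summary(m_out)  # balance statistics for the matched covariates

# In real data motivation is unobserved; in the simulation we can check it
m_data <- match.data(m_out)
smd_motivation <- (mean(m_data$motivation[m_data$training == 1]) -
                   mean(m_data$motivation[m_data$training == 0])) /
                  sd(df_elig$motivation)
cat("Std. mean diff in motivation after matching:", round(smd_motivation, 2), "\n")
```

The matched covariates should look well balanced while motivation remains imbalanced, which is precisely why the matching estimate stays above the true ATT.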


Step 4: Instrumental Variables

Use the lottery encouragement letter as an instrument for training enrollment.

library(ivreg)
library(lmtest)     # coeftest()
library(sandwich)   # vcovHC()
df_elig <- df[df$eligible == 1, ]

# First stage: what matters is the partial F on the excluded instrument;
# with a single instrument it equals the squared t-statistic on lottery
first_stage <- lm_robust(training ~ lottery + education + age + female,
                         data = df_elig, se_type = "HC2")
t_lottery <- first_stage$coefficients["lottery"] / first_stage$std.error["lottery"]
cat("First-stage F (lottery):", round(t_lottery^2, 1), "\n")

# 2SLS
iv_est <- ivreg(earnings_post ~ training + education + age + female |
              lottery + education + age + female, data = df_elig)
iv_robust <- coeftest(iv_est, vcov = vcovHC(iv_est, type = "HC2"))

cat("LATE (IV):", round(iv_robust["training", 1]), "\n")
cat("SE:", round(iv_robust["training", 2]), "\n")
cat("IV estimates the LATE for compliers.\n")
Requires: ivreg, lmtest, sandwich

What to notice: The IV estimate is close to the true LATE (~$1,800), which is lower than the ATT because compliers include less-motivated workers who benefit less. The standard error is larger than OLS -- the classic bias-variance tradeoff of IV.
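With a single binary instrument and no covariates, 2SLS collapses to the Wald ratio, which you can compute by hand (assuming `df_elig` from this step); it should land close to the covariate-adjusted estimate above:

```r
# Wald estimator: ITT effect on earnings divided by ITT effect on enrollment
itt_y <- mean(df_elig$earnings_post[df_elig$lottery == 1]) -
         mean(df_elig$earnings_post[df_elig$lottery == 0])
itt_d <- mean(df_elig$training[df_elig$lottery == 1]) -
         mean(df_elig$training[df_elig$lottery == 0])
cat("Wald estimate of LATE:", round(itt_y / itt_d), "\n")
```

Seeing the estimator as a ratio of two intent-to-treat effects also explains why a weak first stage (small `itt_d`) blows up the variance.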


Step 5: Difference-in-Differences

Compare the change in earnings for treated vs. untreated workers before and after the program.

library(fixest)

# First-difference approach (equivalent to FE with T=2)
df_elig <- df[df$eligible == 1, ]
did_est <- lm_robust(earnings_change ~ training, data = df_elig,
                   se_type = "HC2")

cat("ATT (DiD):", round(coef(did_est)["training"]), "\n")
cat("True ATT:", round(mean(te_i[training == 1 & eligible == 1])), "\n")
cat("DiD removes time-invariant confounders like motivation.\n")
Requires: fixest, did

What to notice: The DiD estimate should be close to the true ATT (~$2,400) because the common time trend is identical for treated and control workers, so parallel trends holds. DiD succeeds by differencing out the time-invariant component of motivation.
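With only two periods, the regression coefficient above is exactly the classic 2x2 difference of group means. A quick by-hand check (assuming `df_elig` from this step) should reproduce it:

```r
# 2x2 DiD: (pre-to-post change for treated) - (pre-to-post change for control)
grp_mean <- function(v, d) mean(v[df_elig$training == d])
did_by_hand <- (grp_mean(df_elig$earnings_post, 1) - grp_mean(df_elig$earnings_pre, 1)) -
               (grp_mean(df_elig$earnings_post, 0) - grp_mean(df_elig$earnings_pre, 0))
cat("2x2 DiD:", round(did_by_hand), "\n")
```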


Step 6: Regression Discontinuity Design

Exploit the eligibility cutoff at test score = 70 to estimate the local treatment effect at the threshold.

library(rdrobust)
df$running <- df$test_score - 70  # Negative = eligible

# Fuzzy RD: eligibility is sharp but treatment is not
rd_first <- rdrobust(df$training, df$running, c = 0)
summary(rd_first)

rd_fuzzy <- rdrobust(df$earnings_post, df$running, c = 0,
                   fuzzy = df$training)
cat("\nRD estimate:", round(rd_fuzzy$coef[1]), "\n")
cat("RD estimates the local effect at the cutoff (~$1,500).\n")
Requires: rdrobust

What to notice: The RD estimate targets the effect for workers right at the eligibility cutoff (test score = 70). These higher-skill workers benefit less from training, so the RD estimate (~$1,500) is lower than both the ATT and the LATE. This is not a flaw -- it reflects the local nature of RDD.
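Standard RD practice is to plot the discontinuity before trusting any estimate. A minimal sketch using rdrobust's companion `rdplot()` (assuming `df` with the `running` variable from this step):

```r
library(rdrobust)

# First stage of the fuzzy design: enrollment should jump at the cutoff
rdplot(df$training, df$running, c = 0,
       x.label = "Test score - 70", y.label = "Enrollment rate")

# Reduced form: post-period earnings should also jump at the cutoff
rdplot(df$earnings_post, df$running, c = 0,
       x.label = "Test score - 70", y.label = "Post-period earnings")
```

A visible jump in enrollment but a smooth running-variable density is what licenses the fuzzy RD interpretation.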


Step 7: Comparison and Interpretation

results <- data.frame(
Method = c("OLS (naive)", "OLS + controls", "Matching",
           "IV (2SLS)", "DiD", "RD (fuzzy)"),
Target = c("ATE*", "ATE*", "ATT", "LATE", "ATT", "LATE at cutoff"),
Note = c("Biased upward by selection",
         "Less biased, motivation still omitted",
         "Observables only; motivation unbalanced",
         "Consistent if lottery is valid IV",
         "Consistent if parallel trends holds",
         "Local effect at eligibility threshold"))
print(results, row.names = FALSE)
cat("\n--- True Parameters ---\n")
cat("ATE:", round(mean(te_i[eligible == 1])), "\n")
cat("ATT:", round(mean(te_i[training == 1])), "\n")
cat("LATE (compliers): ~1,800\nEffect at cutoff: ~1,500\n")

Summary Comparison Table

Method           Target Estimand     Expected Estimate   Bias Source
OLS (naive)      ATE (biased)        ~$3,500+            Selection on motivation
OLS + controls   ATE (biased)        ~$3,000+            Residual selection on motivation
Matching (PSM)   ATT (biased)        ~$2,800+            Cannot match on unobserved motivation
IV (2SLS)        LATE (compliers)    ~$1,800             Unbiased if instrument valid
DiD              ATT                 ~$2,400             Unbiased if parallel trends holds
RD (fuzzy)       LATE at cutoff      ~$1,500             Unbiased locally; not generalizable

Concept Check

In this capstone lab, the IV estimate (~$1,800) is lower than the DiD estimate (~$2,400). Both methods are consistent under their respective assumptions. Why are their point estimates different?

Lessons for Applied Research

1. Define your estimand first. Before choosing a method, ask: what causal parameter do I want? ATE? ATT? LATE?

2. No method is assumption-free. OLS requires no omitted variables. Matching requires selection on observables. IV requires instrument validity. DiD requires parallel trends. RDD requires continuity at the cutoff.

3. Present multiple methods when possible. Agreement across methods strengthens your argument. Disagreement forces you to think about why.

4. Distinguish bias from estimand differences. OLS is biased (violated assumptions). IV and DiD differ because they target different parameters (legitimate). The first is a problem; the second is informative.

Extensions (Optional)

  1. Add motivation as a control to OLS and matching. How much does the bias shrink?
  2. Violate parallel trends by adding a differential time trend for treated workers. How does DiD change?
  3. Weaken the instrument by reducing the lottery effect on enrollment. What happens to IV precision?
  4. Estimate CATE using causal forests and compare to the true heterogeneous effects.
  5. Bootstrap the comparison. Repeat 500 times and plot each estimator's distribution around the truth.

Next Step: Return to the Labs index to explore method-specific labs, or revisit the Foundations to review the conceptual framework behind these methods.