Capstone Lab: Same Data, Five Methods
Apply OLS, matching, IV, DiD, and RDD to the same dataset. Discover why different methods give different estimates.
Overview
This capstone lab applies five estimation methods to the same dataset. Because the data is simulated, we know the true causal effect and can compare each method's estimate to the truth.
What you will learn:
- How to apply OLS, matching, IV, DiD, and RDD to the same research question
- Why different methods estimate different parameters (ATE, ATT, LATE)
- How violations of identifying assumptions bias each method differently
- How to present multi-method evidence in a paper
Prerequisites: We recommend completing the tutorial labs for OLS, matching, IV, DiD, and RDD first.
The Setting
A government agency runs a job training program:
- Eligibility: Workers with a pre-program test score below 70 are eligible.
- Selection: Among eligible workers, those with higher motivation (unobserved) are more likely to enroll.
- Lottery: Some eligible workers received a random encouragement letter, which increased enrollment.
- Timing: The program launched in 2020. We have earnings for 2019 (pre) and 2021 (post).
- Heterogeneity: The program helps low-skill workers more than high-skill workers.
The true parameters: ATE = $2,000 (eligible population), ATT = $2,400 (enrolled workers), LATE = $1,800 (lottery compliers), effect at cutoff = $1,500 (workers near score 70).
Step 1: Generate the Data
set.seed(2024)
n <- 5000
age <- round(runif(n, 22, 60))
education <- pmin(pmax(round(rnorm(n, 12, 2.5)), 8), 20)
female <- rbinom(n, 1, 0.5)
test_score <- 40 + 0.8 * education + rnorm(n, sd = 10)
motivation <- rnorm(n) # Unobserved confounder
eligible <- as.integer(test_score < 70)
lottery <- rep(0L, n)
lottery[eligible == 1] <- rbinom(sum(eligible), 1, 0.5)
# Treatment depends on eligibility, motivation, and lottery
latent <- -1.5 + 0.8*eligible + 0.6*motivation + 0.9*lottery +
0.01*(70 - test_score)*eligible
training <- rbinom(n, 1, plogis(latent))
training[eligible == 0] <- 0L
earnings_pre <- 20000 + 500*education + 200*age - 2000*female +
1500*motivation + rnorm(n, sd = 3000)
# Heterogeneous treatment effect
te_i <- pmax(2000 + 8*(60 - test_score) + 400*motivation, 0)
time_trend <- 1000
earnings_post <- earnings_pre + time_trend + te_i*training + rnorm(n, sd = 2000)
df <- data.frame(id = 1:n, age, education, female,
test_score = round(test_score, 1), motivation, eligible,
lottery, training, earnings_pre = round(earnings_pre),
earnings_post = round(earnings_post),
earnings_change = round(earnings_post - earnings_pre))
cat("True ATE (eligible):", round(mean(te_i[eligible == 1])), "\n")
cat("True ATT (treated):", round(mean(te_i[training == 1])), "\n")Step 2: OLS Regression
Regress post-period earnings on a training indicator with observable controls.
library(estimatr)
ols_naive <- lm_robust(earnings_post ~ training,
data = df, se_type = "HC2")
ols_controls <- lm_robust(
earnings_post ~ training + education + age + female + earnings_pre,
data = df, se_type = "HC2")
cat("OLS naive:", round(coef(ols_naive)["training"]), "\n")
cat("OLS + controls:", round(coef(ols_controls)["training"]), "\n")
cat("True ATE (eligible):", round(mean(te_i[eligible == 1])), "\n")
cat("OLS is biased upward by selection on motivation.\n")What to notice: The OLS coefficient is substantially larger than the true ATE or ATT. Motivated workers both enroll in training and earn more regardless, so OLS conflates causation with selection.
Step 3: Propensity Score Matching
Match treated workers to similar untreated workers based on observable characteristics.
library(MatchIt)
df_elig <- df[df$eligible == 1, ]
m_out <- matchit(training ~ education + age + female + earnings_pre,
data = df_elig, method = "nearest",
distance = "glm", ratio = 1, replace = FALSE)
m_data <- match.data(m_out)
att_match <- lm_robust(earnings_post ~ training, data = m_data,
weights = weights, se_type = "HC2")
cat("ATT (matching):", round(coef(att_match)["training"]), "\n")
cat("True ATT:", round(mean(te_i[training == 1 & eligible == 1])), "\n")
cat("Matching reduces bias but cannot eliminate unobserved confounding.\n")What to notice: Matching improves on naive OLS by creating a more comparable control group, but the estimate remains biased because unobserved motivation is not balanced.
Step 4: Instrumental Variables
Use the lottery encouragement letter as an instrument for training enrollment.
library(ivreg)
library(lmtest)    # coeftest()
library(sandwich)  # vcovHC()
df_elig <- df[df$eligible == 1, ]
# First stage
first_stage <- lm_robust(training ~ lottery + education + age + female,
data = df_elig, se_type = "HC2")
cat("First-stage F:", round(summary(first_stage)$fstatistic[1], 1), "\n")
# 2SLS
iv_est <- ivreg(earnings_post ~ training + education + age + female |
lottery + education + age + female, data = df_elig)
iv_robust <- coeftest(iv_est, vcov = vcovHC(iv_est, type = "HC2"))
cat("LATE (IV):", round(iv_robust["training", 1]), "\n")
cat("SE:", round(iv_robust["training", 2]), "\n")
cat("IV estimates the LATE for compliers.\n")What to notice: The IV estimate is close to the true LATE (~$1,800), which is lower than the ATT because compliers include less-motivated workers who benefit less. The standard error is larger than OLS -- the classic bias-variance tradeoff of IV.
Step 5: Difference-in-Differences
Compare the change in earnings for treated vs. untreated workers before and after the program.
# First-difference approach (equivalent to fixed effects with T = 2);
# uses lm_robust from estimatr, loaded above
df_elig <- df[df$eligible == 1, ]
did_est <- lm_robust(earnings_change ~ training, data = df_elig,
se_type = "HC2")
cat("ATT (DiD):", round(coef(did_est)["training"]), "\n")
cat("True ATT:", round(mean(te_i[training == 1 & eligible == 1])), "\n")
cat("DiD removes time-invariant confounders like motivation.\n")What to notice: The DiD estimate should be close to the true ATT (~$2,400) because the common time trend is identical for treated and control workers, so parallel trends holds. DiD succeeds by differencing out the time-invariant component of motivation.
Step 6: Regression Discontinuity Design
Exploit the eligibility cutoff at test score = 70 to estimate the local treatment effect at the threshold.
library(rdrobust)
df$running <- df$test_score - 70 # Negative = eligible
# Fuzzy RD: eligibility is sharp but treatment is not
rd_first <- rdrobust(df$training, df$running, c = 0)
summary(rd_first)
rd_fuzzy <- rdrobust(df$earnings_post, df$running, c = 0,
fuzzy = df$training)
cat("\nRD estimate:", round(rd_fuzzy$coef[1]), "\n")
cat("RD estimates the local effect at the cutoff (~$1,500).\n")What to notice: The RD estimate targets the effect for workers right at the eligibility cutoff (test score = 70). These higher-skill workers benefit less from training, so the RD estimate (~$1,500) is lower than both the ATT and the LATE. This is not a flaw -- it reflects the local nature of RDD.
Step 7: Comparison and Interpretation
results <- data.frame(
Method = c("OLS (naive)", "OLS + controls", "Matching",
"IV (2SLS)", "DiD", "RD (fuzzy)"),
Target = c("ATE*", "ATE*", "ATT", "LATE", "ATT", "LATE at cutoff"),
Note = c("Biased upward by selection",
"Less biased, motivation still omitted",
"Observables only; motivation unbalanced",
"Consistent if lottery is valid IV",
"Consistent if parallel trends holds",
"Local effect at eligibility threshold"))
print(results, row.names = FALSE)
cat("\n--- True Parameters ---\n")
cat("ATE:", round(mean(te_i[eligible == 1])), "\n")
cat("ATT:", round(mean(te_i[training == 1])), "\n")
cat("LATE (compliers): ~1,800\nEffect at cutoff: ~1,500\n")Summary Comparison Table
| Method | Target Estimand | Expected Estimate | Bias Source |
|---|---|---|---|
| OLS (naive) | ATE (biased) | ~$3,500+ | Selection on motivation |
| OLS + controls | ATE (biased) | ~$3,000+ | Residual selection on motivation |
| Matching (PSM) | ATT (biased) | ~$2,800+ | Cannot match on unobserved motivation |
| IV (2SLS) | LATE (compliers) | ~$1,800 | Unbiased if instrument valid |
| DiD | ATT | ~$2,400 | Unbiased if parallel trends holds |
| RD (fuzzy) | LATE at cutoff | ~$1,500 | Unbiased locally; not generalizable |
In this capstone lab, the IV estimate (~\$1,800) is lower than the DiD estimate (~\$2,400). Both methods are consistent under their respective assumptions. Why are their point estimates different?
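The gap is an estimand difference, not a bias: the two methods average the heterogeneous effect over different groups. A self-contained toy (hypothetical enrollment thresholds, not the lab's DGP) computes the true ATT and the true complier LATE directly from potential enrollment, showing the two parameters differ even with zero estimation error:

```r
# Toy: ATT and LATE are different averages of the same heterogeneous effect
set.seed(6)
n <- 100000
u  <- rnorm(n)                      # motivation
te <- 2000 + 400*u                  # effect rises with motivation
z  <- rbinom(n, 1, 0.5)             # encouragement lottery
d0 <- as.integer(u > 0.8)           # enrolls without encouragement (always-takers)
d1 <- as.integer(u > -0.2)          # encouragement lowers the enrollment bar
d  <- ifelse(z == 1, d1, d0)        # observed enrollment
complier <- d1 == 1 & d0 == 0       # enrolls only if encouraged

cat("true ATT: ", round(mean(te[d == 1])), "\n")
cat("true LATE:", round(mean(te[complier])), "\n")
```

Always-takers are the most motivated and benefit most, so the ATT (which includes them) exceeds the complier LATE (which excludes them) -- the same ordering you see between the DiD and IV estimates in the lab.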
Lessons for Applied Research
1. Define your estimand first. Before choosing a method, ask: what causal parameter do I want? ATE? ATT? LATE?
2. No method is assumption-free. OLS requires no omitted variables. Matching requires selection on observables. IV requires instrument validity. DiD requires parallel trends. RDD requires continuity at the cutoff.
3. Present multiple methods when possible. Agreement across methods strengthens your argument. Disagreement forces you to think about why.
4. Distinguish bias from estimand differences. OLS is biased (violated assumptions). IV and DiD differ because they target different parameters (legitimate). The first is a problem; the second is informative.
Extensions (Optional)
- Add motivation as a control to OLS and matching. How much does the bias shrink?
- Violate parallel trends by adding a differential time trend for treated workers. How does DiD change?
- Weaken the instrument by reducing the lottery effect on enrollment. What happens to IV precision?
- Estimate CATE using causal forests and compare to the true heterogeneous effects.
- Bootstrap the comparison. Repeat 500 times and plot each estimator's distribution around the truth.
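The last extension can be sketched compactly. A self-contained Monte Carlo (a simplified two-estimator DGP with 200 replications rather than 500) repeats a toy simulation and compares each estimator's sampling distribution to the truth:

```r
# Toy Monte Carlo: OLS vs. DiD over repeated draws (fresh, simplified DGP)
set.seed(7)
one_rep <- function(n = 2000) {
  u <- rnorm(n)                            # time-invariant confounder
  d <- rbinom(n, 1, plogis(u))             # selection on u
  y_pre  <- 20000 + 1500*u + rnorm(n, sd = 3000)
  y_post <- y_pre + 1000 + 2000*d + rnorm(n, sd = 2000)  # true effect: 2000
  c(ols = unname(coef(lm(y_post ~ d))["d"]),
    did = unname(coef(lm(I(y_post - y_pre) ~ d))["d"]))
}
sims <- t(replicate(200, one_rep()))
print(round(colMeans(sims)))   # OLS centered well above 2000; DiD near 2000
```

Replacing `one_rep` with the full Step 1 data-generating code (and adding the other estimators) turns this skeleton into the full 500-draw comparison the extension describes.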
Next Step: Return to the Labs index to explore method-specific labs, or revisit the Foundations to review the conceptual framework behind these methods.