MethodAtlas
Lab · Replication · 120 minutes

Replication Lab: Returns to College with Marginal Treatment Effects

Replicate the MTE analysis from Carneiro, Heckman, and Vytlacil (2011). Estimate a probit first stage, trace the parametric MTE curve, test for essential heterogeneity, compare ATE/ATT/LATE, and compute the policy-relevant treatment effect (PRTE) for a college expansion.

Languages: Python, R, Stata
Dataset: Simulated NLSY-style data with college proximity instrument

Overview

In this replication lab, you will reproduce the key results from:

Carneiro, Pedro, James J. Heckman, and Edward J. Vytlacil. 2011. "Estimating Marginal Returns to Education." American Economic Review 101(6): 2754–2781. DOI: 10.1257/aer.101.6.2754

Carneiro et al. (2011) use the MTE framework to estimate how returns to college vary across individuals. Using college proximity as an instrument, they find a declining MTE curve — individuals most likely to attend college benefit the most — and demonstrate that conventional treatment effect parameters (ATE, ATT, LATE) differ substantially due to essential heterogeneity.

Why this paper matters: It provided the first empirical implementation of the MTE framework for estimating returns to education, showing that standard IV estimates (LATE) do not generalize to the population. The paper demonstrated that policy-relevant treatment effects (PRTE) for college expansions are lower than LATE, because the marginal students induced to attend by a new policy have lower returns than current compliers.

What you will do:

  • Simulate NLSY-style data with college proximity as an instrument
  • Estimate a probit first stage for college attendance
  • Compute the parametric MTE curve
  • Test for essential heterogeneity
  • Compare ATE, ATT, and LATE
  • Compute the PRTE for a hypothetical college expansion
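Before simulating data, it helps to fix the parametric logic the lab relies on. With a quadratic control function K(P) = b1·P + b2·P², the MTE is its derivative, MTE(u) = b1 + 2·b2·u, and the ATE integrates the MTE over u in [0, 1], giving b1 + b2 in closed form. A minimal Python sketch (the coefficients b1 and b2 are illustrative, not estimates from any data):

```python
import numpy as np

# Illustrative coefficients for K(P) = b1*P + b2*P^2 (assumed, not estimated)
b1, b2 = 0.60, -0.20

def mte(u):
    # MTE(u) = K'(u) = b1 + 2*b2*u
    return b1 + 2 * b2 * u

# ATE = integral of MTE(u) over [0, 1]; closed form is b1 + b2
u = np.linspace(0.0, 1.0, 101)
ate_grid = mte(u).mean()  # exact here because MTE is linear in u
print(round(ate_grid, 4), round(b1 + b2, 4))
```

The same derivative trick is what Step 3 implements with lm(): estimate K(P), then differentiate.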

Step 1: Simulate NLSY-Style Data

library(MASS)

set.seed(2011)
n <- 12000

# Background covariates (matching NLSY structure)
# AFQT score (cognitive ability)
afqt <- rnorm(n)
# Mother's education (years)
mom_educ <- pmin(pmax(round(12 + 2 * rnorm(n)), 6), 20)
# Family income (log)
log_fam_inc <- 10 + 0.3 * afqt + 0.1 * mom_educ + rnorm(n, 0, 0.5)
# Number of siblings
n_siblings <- rpois(n, 2)
# Urban residence
urban <- rbinom(n, 1, 0.7)

# Instrument: indicator for a four-year college nearby
# (binary: 1 = college in county)
# Exogenous after conditioning on covariates
college_nearby <- rbinom(n, 1, 0.6 + 0.1 * urban - 0.05 * n_siblings)

# Unobserved heterogeneity
# (U_S, U_1, U_0): U_S is resistance to college attendance.
# Cov(U_S, U_1 - U_0) < 0, so low-resistance individuals have the
# largest gains (selection on gains), generating a declining MTE
Sigma <- matrix(c( 1.0, -0.5, -0.2,
                  -0.5,  1.0,  0.3,
                  -0.2,  0.3,  1.0), 3, 3)
U <- mvrnorm(n, mu = c(0, 0, 0), Sigma = Sigma)
U_S <- U[, 1]  # Unobserved in selection equation
U_1 <- U[, 2]  # Unobserved in treated outcome
U_0 <- U[, 3]  # Unobserved in untreated outcome

# Selection equation (probit model)
# College attendance decision
gamma_S <- c(-1.9, 0.40, 0.05, 0.10, -0.05, 0.15, 0.50)
# Intercept, AFQT, mom_educ, log_fam_inc, n_siblings, urban, college_nearby
# (the intercept centers the index so the attendance rate is ~0.55)
latent_S <- gamma_S[1] + gamma_S[2]*afqt + gamma_S[3]*mom_educ +
  gamma_S[4]*log_fam_inc + gamma_S[5]*n_siblings +
  gamma_S[6]*urban + gamma_S[7]*college_nearby - U_S

D <- as.integer(latent_S > 0)

# Outcome equations (log hourly wages at age 30)
# Y(0): earnings without college
# Y(1): earnings with college
alpha_0 <- c(2.0, 0.15, 0.02, 0.10, -0.02, 0.05)
alpha_1 <- c(2.3, 0.20, 0.03, 0.08, -0.01, 0.08)

Y0 <- alpha_0[1] + alpha_0[2]*afqt + alpha_0[3]*mom_educ +
  alpha_0[4]*log_fam_inc + alpha_0[5]*n_siblings +
  alpha_0[6]*urban + 0.3*U_0

Y1 <- alpha_1[1] + alpha_1[2]*afqt + alpha_1[3]*mom_educ +
  alpha_1[4]*log_fam_inc + alpha_1[5]*n_siblings +
  alpha_1[6]*urban + 0.3*U_1

# Individual treatment effect
beta_i <- Y1 - Y0

# Observed outcome
Y <- D * Y1 + (1 - D) * Y0

df <- data.frame(Y, D, afqt, mom_educ, log_fam_inc, n_siblings,
               urban, college_nearby, beta_i)

cat("=== NLSY-Style Data Summary ===\n")
cat("N:", n, "\n")
cat("College attendance rate:", round(mean(D), 3), "\n")
cat("Mean log wage:", round(mean(Y), 3), "\n\n")

cat("True ATE:", round(mean(beta_i), 3), "\n")
cat("True ATT:", round(mean(beta_i[D == 1]), 3), "\n")
cat("True ATU:", round(mean(beta_i[D == 0]), 3), "\n\n")

cat("Published estimates (CHV 2011):\n")
cat("  ATE: ~0.04-0.07\n")
cat("  ATT: ~0.13-0.15\n")
cat("  LATE: ~0.09-0.12\n")
Requires: MASS

Expected output:

Statistic                  Value
N                          12,000
College attendance rate    ~0.55
Mean log wage              ~3.1
True ATE                   ~0.35–0.45
True ATT                   ~0.45–0.55
True ATU                   ~0.25–0.35
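If you prefer to sanity-check the selection-on-gains mechanism in Python before running the full R simulation, a stripped-down Roy model is enough. All parameters below are assumed for illustration (this is not the lab's DGP): resistance U_S is negatively correlated with the gain U_1 − U_0, so people who attend (low U_S) have above-average returns.

```python
import numpy as np

rng = np.random.default_rng(2011)
n = 200_000

# Assumed covariance: Cov(U_S, U_1 - U_0) = -0.5 - (-0.2) = -0.3 < 0
cov = np.array([[ 1.0, -0.5, -0.2],
                [-0.5,  1.0,  0.3],
                [-0.2,  0.3,  1.0]])
U_S, U_1, U_0 = rng.multivariate_normal(np.zeros(3), cov, size=n).T

Z = rng.binomial(1, 0.5, n)                # binary instrument
D = (0.1 + 0.5 * Z - U_S > 0).astype(int)  # attend iff resistance is low
beta = 0.30 + 0.3 * (U_1 - U_0)            # individual return

att = beta[D == 1].mean()  # pulled above 0.30 by selection on gains
atu = beta[D == 0].mean()  # pulled below 0.30
print(round(att, 2), round(atu, 2))
```

The ATT > ATU gap here comes entirely from unobservables; in the full simulation the covariates add to it.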

Step 2: Probit First Stage

# Probit model: college attendance on covariates + instrument
probit <- glm(D ~ afqt + mom_educ + log_fam_inc + n_siblings +
              urban + college_nearby,
            data = df, family = binomial(link = "probit"))

cat("=== Probit First Stage ===\n")
print(summary(probit)$coefficients)

df$phat <- predict(probit, type = "response")

cat("\nPropensity score summary:\n")
cat("  Min:", round(min(df$phat), 3), "\n")
cat("  Max:", round(max(df$phat), 3), "\n")
cat("  Mean:", round(mean(df$phat), 3), "\n")
cat("  Median:", round(median(df$phat), 3), "\n\n")

# Instrument relevance
cat("Instrument (college_nearby):\n")
cat("  Coefficient:", round(coef(probit)["college_nearby"], 4), "\n")
cat("  z-statistic:", round(summary(probit)$coefficients["college_nearby", "z value"], 2), "\n")
cat("  (Strong instrument: |z| >> 2)\n\n")

# Published first stage
cat("Published: College proximity increases attendance by ~12-15 pp\n")
# Average marginal effect (mean of the per-observation marginal effects)
marginal_eff <- dnorm(predict(probit, type = "link")) * coef(probit)["college_nearby"]
cat("Our average marginal effect:", round(mean(marginal_eff), 3), "\n")

Expected output:

Variable         Coefficient   SE      z-stat
afqt             ~0.35         ~0.02   ~17
mom_educ         ~0.04         ~0.01   ~4
log_fam_inc      ~0.08         ~0.03   ~3
n_siblings       ~-0.04        ~0.01   ~-3
urban            ~0.12         ~0.04   ~3
college_nearby   ~0.42         ~0.03   ~14

Statistic                      Value
Propensity score range         [~0.05, ~0.95]
Marginal effect of proximity   ~0.12–0.16
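The probit marginal-effect arithmetic is easy to verify by hand: the effect of the instrument on attendance is φ(x'γ)·γ_z per observation, averaged over the sample. A numpy sketch with assumed values (the index distribution and the 0.42 coefficient are stand-ins, not fitted quantities):

```python
import numpy as np

rng = np.random.default_rng(0)

gamma_z = 0.42                          # assumed instrument coefficient
xb = rng.normal(0.1, 0.8, size=10_000)  # stand-in for the fitted index x'gamma

phi = np.exp(-xb**2 / 2) / np.sqrt(2 * np.pi)  # standard normal pdf
ame = (phi * gamma_z).mean()                   # average marginal effect
print(round(ame, 3))
```

Because φ(·) peaks at zero, the marginal effect is largest for individuals on the margin of attending, exactly the group MTE analysis cares about.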

Step 3: Parametric MTE Estimation

# Parametric MTE: polynomial in P(Z)
# E[Y | X, P] = X'alpha + K(P)
# MTE(u | X) = K'(u)

df$phat2 <- df$phat^2
df$phat3 <- df$phat^3

# Quadratic specification (main)
mte_quad <- lm(Y ~ afqt + mom_educ + log_fam_inc + n_siblings +
               urban + phat + phat2, data = df)

cat("=== MTE Regression (Quadratic) ===\n")
cat("K(P) = ", round(coef(mte_quad)["phat"], 4), "* P + ",
  round(coef(mte_quad)["phat2"], 4), "* P^2\n\n")

# MTE(u) = dK/dp = beta1 + 2*beta2*u
b1 <- coef(mte_quad)["phat"]
b2 <- coef(mte_quad)["phat2"]

u_grid <- seq(0.05, 0.95, by = 0.05)
mte_vals <- b1 + 2 * b2 * u_grid

cat("=== Estimated MTE Curve ===\n")
cat(sprintf("%-8s %-12s\n", "u_D", "MTE(u)"))
for (i in seq_along(u_grid)) {
  cat(sprintf("%-8.2f %-12.4f\n", u_grid[i], mte_vals[i]))
}

# Compare polynomial orders
cat("\n=== Sensitivity to Polynomial Order ===\n")
for (order in 1:3) {
  formula <- "Y ~ afqt + mom_educ + log_fam_inc + n_siblings + urban"
  for (j in 1:order) {
    df[[paste0("p", j)]] <- df$phat^j
    formula <- paste0(formula, " + p", j)
  }
  m <- lm(as.formula(formula), data = df)

  # MTE at u = 0.5
  mte_mid <- 0
  for (j in 1:order) {
    mte_mid <- mte_mid + j * coef(m)[paste0("p", j)] * 0.5^(j - 1)
  }
  cat(sprintf("Order %d: MTE(0.5) = %.4f\n", order, mte_mid))
}

Expected output:

u_D    MTE (estimated)
0.10   ~0.55
0.30   ~0.45
0.50   ~0.38
0.70   ~0.30
0.90   ~0.22

Polynomial order   MTE(0.5)
Linear             ~0.38
Quadratic          ~0.38
Cubic              ~0.38

The MTE curve declines from approximately 0.55 at u = 0.10 to approximately 0.22 at u = 0.90, matching the qualitative pattern in the published paper. The MTE at u = 0.50 is stable across polynomial orders.
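The local-IV idea behind Step 3 (the MTE at u is the slope of E[Y | P = p] at p = u) can be reproduced in a few lines of Python. The "true" K(p) below is assumed for illustration; np.polyfit plays the role of the lm() fit and np.polyder does the differentiation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed truth: K(p) = 0.6*p - 0.2*p^2, so MTE(u) = 0.6 - 0.4*u
p = rng.uniform(0.05, 0.95, 20_000)
y = 0.6 * p - 0.2 * p**2 + rng.normal(0, 0.02, p.size)

k_hat = np.polyfit(p, y, deg=2)  # fit K(p) as a quadratic
mte_hat = np.polyder(k_hat)      # differentiate: MTE(u) = K'(u)

mte_low = np.polyval(mte_hat, 0.1)   # true value is 0.56
mte_high = np.polyval(mte_hat, 0.9)  # true value is 0.24
print(round(mte_low, 2), round(mte_high, 2))
```

The recovered curve declines in u, mirroring the pattern the lab's quadratic specification produces.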


Step 4: Test for Essential Heterogeneity

# F-test: is the coefficient on P^2 significant?
cat("=== Essential Heterogeneity Test ===\n")
cat("H0: MTE is constant (no essential heterogeneity)\n\n")

# Restricted model (linear in P)
restricted <- lm(Y ~ afqt + mom_educ + log_fam_inc + n_siblings +
                 urban + phat, data = df)
# Unrestricted model (quadratic in P)
unrestricted <- lm(Y ~ afqt + mom_educ + log_fam_inc + n_siblings +
                   urban + phat + phat2, data = df)

f_test <- anova(restricted, unrestricted)
cat("F-statistic:", round(f_test$F[2], 2), "\n")
cat("p-value:", round(f_test$"Pr(>F)"[2], 6), "\n")
cat("Decision:", ifelse(f_test$"Pr(>F)"[2] < 0.05,
  "REJECT H0: Essential heterogeneity is present",
  "Fail to reject H0"), "\n\n")

# Published: "We reject the hypothesis of no essential heterogeneity"
cat("Published result: Reject at 5% level\n")
cat("Implication: LATE != ATE, different IVs give different LATEs\n")

Expected output:

Test           F-statistic   p-value   Published
H0: flat MTE   ~10–40        < 0.001   Reject at 5%

The essential heterogeneity test strongly rejects the null, confirming that the MTE is non-flat and that conventional treatment effect parameters differ.
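The anova() call is simply the standard F-test for nested OLS models. To see the arithmetic, the statistic is F = [(RSS_r − RSS_u)/q] / [RSS_u/(n − k)], with q restrictions and k parameters in the unrestricted model. The RSS values below are made up for illustration:

```python
# Assumed residual sums of squares for the restricted and unrestricted models
rss_r, rss_u = 1104.0, 1100.0
n, k, q = 12_000, 8, 1  # observations, unrestricted params, restrictions

f_stat = ((rss_r - rss_u) / q) / (rss_u / (n - k))
print(round(f_stat, 1))  # -> 43.6
```

With one restriction (the coefficient on P²), F is the square of the t-statistic on phat2 in the unrestricted regression.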

Concept Check

The essential heterogeneity test rejects (p < 0.001). What are the practical implications for a policy evaluation of college subsidies?


Step 5: ATE, ATT, LATE Comparison

# ATE: uniform weights over [0, 1]
# For quadratic MTE: ATE = beta1 + beta2
ate <- b1 + b2

# ATT: overweights low u (eager participants)
u_fine <- seq(0.001, 0.999, length.out = 1000)
mte_fine <- b1 + 2 * b2 * u_fine
p_vals <- df$phat

att_w <- sapply(u_fine, function(u) mean(p_vals > u)) / mean(p_vals)
att <- sum(mte_fine * att_w) / sum(att_w)

# ATU: overweights high u (reluctant non-participants)
atu_w <- sapply(u_fine, function(u) mean(p_vals <= u)) / mean(1 - p_vals)
atu <- sum(mte_fine * atu_w) / sum(atu_w)

# LATE: average MTE over complier region for proximity IV
p_no_prox <- mean(df$phat[df$college_nearby == 0])
p_prox <- mean(df$phat[df$college_nearby == 1])
late_u <- seq(p_no_prox, p_prox, length.out = 200)
late_mte <- b1 + 2 * b2 * late_u
late <- mean(late_mte)

cat("=== Treatment Effect Comparison ===\n")
cat(sprintf("%-25s %-12s %-12s\n", "Parameter", "Estimated", "Published"))
cat(sprintf("%-25s %-12.3f %-12s\n", "ATE", ate, "~0.04-0.07"))
cat(sprintf("%-25s %-12.3f %-12s\n", "ATT", att, "~0.13-0.15"))
cat(sprintf("%-25s %-12.3f %-12s\n", "ATU", atu, "~0.01-0.03"))
cat(sprintf("%-25s %-12.3f %-12s\n", "LATE (proximity)", late, "~0.09-0.12"))
cat("\nNote: Our simulation has larger effect sizes than the published\n")
cat("paper. The qualitative ordering ATT > LATE > ATE > ATU is the\n")
cat("key finding to replicate.\n\n")

# True values from the DGP
cat("=== True Values (known from DGP) ===\n")
cat("True ATE:", round(mean(df$beta_i), 3), "\n")
cat("True ATT:", round(mean(df$beta_i[df$D == 1]), 3), "\n")
cat("True ATU:", round(mean(df$beta_i[df$D == 0]), 3), "\n")

Expected output:

Parameter          Estimated   Published (CHV 2011)
ATE                ~0.40       ~0.04–0.07
ATT                ~0.50       ~0.13–0.15
ATU                ~0.30       ~0.01–0.03
LATE (proximity)   ~0.43       ~0.09–0.12

The qualitative ordering ATT > LATE > ATE > ATU is the key finding to replicate. The absolute magnitudes differ because our simulation uses a stylized DGP with larger treatment effects; the published results use real NLSY data with more covariates and a noisier outcome.
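The weighting logic behind this ordering can be checked in miniature. For any declining MTE, ATT weights (proportional to P(P > u)) pile mass on low u while ATU weights pile mass on high u; the MTE curve and propensity-score distribution below are assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

b1, b2 = 0.60, -0.10                # assumed declining MTE(u) = b1 + 2*b2*u
u = np.linspace(0.001, 0.999, 1000)
mte = b1 + 2 * b2 * u
phat = rng.beta(2, 2, 50_000)       # stand-in propensity score distribution

w_att = np.array([(phat > ui).mean() for ui in u])   # overweights low u
w_atu = np.array([(phat <= ui).mean() for ui in u])  # overweights high u

ate = mte.mean()                             # uniform weights over [0, 1]
att = (mte * w_att).sum() / w_att.sum()
atu = (mte * w_atu).sum() / w_atu.sum()
print(att > ate > atu)  # -> True
```

Swapping in an increasing MTE flips the ordering, which is why the sign of the P² coefficient matters so much in Step 3.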


Step 6: Policy-Relevant Treatment Effect (PRTE)

The PRTE answers: "What would be the average return to college for students induced to attend by a specific policy change?"

# PRTE: Effect of a hypothetical college expansion
# Suppose a new tuition subsidy increases the propensity score by 0.10
# for everyone (capped at 1)

df$phat_new <- pmin(df$phat + 0.10, 1)

# Change in treatment probability under the policy
# (use the propensity scores directly rather than a noisy simulated draw)
delta_D <- mean(df$phat_new) - mean(df$phat)

# PRTE weights: for MTE at u, weight = [P(phat_new > u) - P(phat > u)] / delta_D
prte_weights <- sapply(u_fine, function(u) {
  (mean(df$phat_new > u) - mean(df$phat > u)) / delta_D
})

prte <- sum(mte_fine * prte_weights) / sum(prte_weights)

# Alternatively, PRTE = avg MTE over new compliers
# These are individuals with phat < u < phat_new
cat("=== Policy-Relevant Treatment Effect ===\n")
cat("Policy: Tuition subsidy increasing P(Z) by 0.10 for all\n\n")
cat("PRTE:", round(prte, 3), "\n")
cat("ATT:", round(att, 3), "\n")
cat("LATE:", round(late, 3), "\n")
cat("ATE:", round(ate, 3), "\n\n")

cat("The PRTE (", round(prte, 3), ") is LOWER than LATE (",
  round(late, 3), ")\n")
cat("because the subsidy targets the margin of students who are\n")
cat("not currently attending — these marginal students have lower\n")
cat("returns than current compliers.\n\n")

cat("Published finding: PRTE for tuition subsidies is lower than LATE,\n")
cat("implying diminishing returns to expanding college access.\n")

# Compare PRTE for different policy sizes
cat("\n=== PRTE by Policy Size ===\n")
for (shift in c(0.05, 0.10, 0.15, 0.20)) {
  p_new <- pmin(df$phat + shift, 1)
  delta <- mean(p_new) - mean(df$phat)
  # Average MTE over the newly treated margin
  prte_w <- sapply(u_fine, function(u) {
    (mean(p_new > u) - mean(df$phat > u)) / delta
  })
  prte_s <- sum(mte_fine * prte_w) / sum(prte_w)
  cat(sprintf("Shift = +%.2f: PRTE = %.3f\n", shift, prte_s))
}
cat("(Larger expansions have lower PRTE: diminishing returns)\n")

Expected output:

Parameter              Estimate
PRTE (subsidy +0.10)   ~0.35
LATE (proximity IV)    ~0.43
ATT                    ~0.50
ATE                    ~0.40

Policy size (P shift)   PRTE
+0.05                   ~0.38
+0.10                   ~0.35
+0.15                   ~0.33
+0.20                   ~0.31

The PRTE is lower than LATE because the subsidy targets students on the margin of college attendance — those with higher unobserved resistance and lower returns. Larger expansions target increasingly reluctant students with even lower returns, producing diminishing marginal policy returns.
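The diminishing-returns pattern follows mechanically from the PRTE weights, which put mass exactly on the band of u each individual crosses under the policy. A Python sketch with an assumed MTE curve and propensity-score distribution (illustrative values, not the lab's estimates):

```python
import numpy as np

rng = np.random.default_rng(4)

b1, b2 = 0.60, -0.10                # assumed declining MTE(u) = b1 + 2*b2*u
u = np.linspace(0.001, 0.999, 1000)
mte = b1 + 2 * b2 * u
phat = rng.beta(2, 2, 50_000)       # stand-in propensity scores

def prte(shift):
    # Weight at u: change in P(propensity > u) induced by the policy
    p_new = np.minimum(phat + shift, 1.0)
    w = np.array([(p_new > ui).mean() - (phat > ui).mean() for ui in u])
    return (mte * w).sum() / w.sum()

print(round(prte(0.05), 3), round(prte(0.20), 3))
```

A larger shift widens each person's complier band toward higher u, so a declining MTE guarantees the bigger expansion has the lower PRTE.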

Concept Check

The PRTE declines from 0.38 to 0.31 as the subsidy becomes more generous. What does this imply about the marginal value of expanding college access?


Step 7: Error Detective


Read the analysis below carefully and identify the errors.

An education economist uses MTE to evaluate a proposed expansion of community college access. She estimates the propensity score using a probit with 15 covariates and a "presence of community college in county" instrument. She reports:

"The propensity score ranges from 0.35 to 0.65. We estimate a quadratic MTE curve. The MTE at u_D = 0.35 is 0.28, and at u_D = 0.65 it is 0.22. The essential heterogeneity test is insignificant (F = 1.8, p = 0.18). We compute ATE = 0.25, ATT = 0.27, LATE = 0.26, and PRTE = 0.19 for a nationwide community college expansion that would shift propensity scores by 0.20.

We conclude that the expansion would generate substantial returns (PRTE = 0.19) and recommend proceeding."

She does not discuss the limited propensity score support or the implications of the PRTE relying on extrapolation.

Select all errors you can find:


Summary

Our replication confirms the central findings of Carneiro et al. (2011):

  1. The MTE curve declines with unobserved resistance. Individuals who are most likely to attend college (low u_D) have the highest returns. This pattern is consistent with the Roy model of comparative advantage.

  2. Essential heterogeneity is present. The test strongly rejects a flat MTE, confirming that ATE, ATT, and LATE differ and that standard IV should not be naively extrapolated.

  3. ATT exceeds LATE, which exceeds ATE. This ordering follows from the declining MTE and the different weight functions. ATT overweights eager participants (high MTE), while ATE weights uniformly.

  4. The PRTE for college expansions is below LATE. Marginal students induced to attend by subsidies have lower returns than current compliers. Larger expansions produce diminishing marginal returns.

  5. The MTE framework provides a principled basis for policy extrapolation — something standard IV cannot do when treatment effects are heterogeneous.


Extension Exercises

  1. Semiparametric MTE. Instead of a polynomial in P, estimate the MTE using local polynomial smoothing of the relationship between Y and P(Z). Compare the semiparametric MTE curve to the parametric one.

  2. Bounds approach. Use the Mogstad et al. (2018) bounds approach to compute partial identification bounds on ATE when the propensity score support is limited.

  3. Multiple instruments. Add a second instrument (tuition level) to the simulation. Show that different instruments produce different LATE estimates when MTE is non-flat.

  4. Continuous instrument. Replace the binary proximity indicator with a continuous distance-to-college measure. How does a continuous instrument affect the propensity score range and MTE identification?

  5. Normal selection model. Estimate the MTE using the Heckman and Vytlacil (2005) normal selection model instead of the polynomial approach. Compare the two MTE curves.