Lab: Marginal Treatment Effects from Scratch
Implement marginal treatment effects step by step. Simulate a Roy model with selection on gains, estimate the propensity score, compute local IV estimates, trace the MTE curve, and derive ATE, ATT, and LATE from MTE weights.
Overview
Marginal Treatment Effects (MTE) provide a unified framework for understanding how treatment effects vary with individuals' propensity to be treated. The MTE function traces how the return to treatment changes as we move from eager participants (low unobserved resistance) to reluctant participants (high unobserved resistance). All standard treatment effect parameters — ATE, ATT, LATE — are weighted averages of the MTE.
What you will learn:
- How to simulate a Roy model with selection on gains
- How to estimate the propensity score for treatment participation
- How to compute local IV estimates at different propensity score values
- How to trace the MTE curve and interpret its shape
- How to recover ATE, ATT, and LATE as weighted averages of the MTE
Prerequisites: Instrumental variables (see the IV tutorial lab), propensity score estimation.
Step 1: Simulate a Roy Model with Selection on Gains
In the Roy model, individuals choose treatment (college) based partly on their comparative advantage. Those who benefit most are most likely to attend.
set.seed(2011)
n <- 10000
# Observed covariates
ability <- rnorm(n)
family_inc <- rnorm(n)
# Instrument: proximity to a four-year college
# Affects the cost of attending college but not earnings directly
proximity <- rnorm(n)
# Unobserved resistance to treatment (U_D)
# U_D is standard normal here; pnorm(U_D) below maps it to Uniform(0,1)
# Low U_D = eager to attend college (low unobserved resistance)
# High U_D = reluctant (high unobserved resistance)
U_D <- rnorm(n)
# Propensity score: probability of attending college
# P(Z) = Phi(gamma0 + gamma1*ability + gamma2*family_inc + gamma3*proximity)
gamma <- c(0.3, 0.5, 0.3, 0.6) # proximity is a strong instrument
latent <- gamma[1] + gamma[2] * ability + gamma[3] * family_inc +
  gamma[4] * proximity - U_D
# Treatment decision: attend college if net benefit > 0
D <- as.integer(latent > 0)
# Potential outcomes with heterogeneous treatment effects
# Y(0) = alpha0 + alpha1 * ability + epsilon0
# Y(1) = Y(0) + beta(u) + 0.5 * epsilon1, where u = pnorm(U_D)
# MTE(u) = E[Y(1) - Y(0) | pnorm(U_D) = u] declines in u (positive selection on gains)
alpha0 <- 10
alpha1 <- 0.8
# Treatment effect varies with unobserved resistance
# People with low resistance (who select in) have higher returns
# This creates essential heterogeneity
U_D_quantile <- pnorm(U_D) # Transform to uniform [0,1]
mte_true <- function(u) 0.60 - 0.40 * u # Declining MTE
# Individual treatment effects
beta_i <- mte_true(U_D_quantile)
epsilon0 <- rnorm(n, 0, 1)
epsilon1 <- rnorm(n, 0, 1)
Y0 <- alpha0 + alpha1 * ability + epsilon0
Y1 <- Y0 + beta_i + epsilon1 * 0.5
# Observed outcome
Y <- D * Y1 + (1 - D) * Y0
df <- data.frame(Y, D, ability, family_inc, proximity,
                 U_D_quantile, beta_i)
cat("=== Data Summary ===\n")
cat("N:", n, "\n")
cat("Pr(College):", round(mean(D), 3), "\n")
cat("Mean Y:", round(mean(Y), 2), "\n")
cat("True ATE:", round(mean(beta_i), 3), "\n")
cat("True ATT:", round(mean(beta_i[D == 1]), 3), "\n")
cat("True ATU:", round(mean(beta_i[D == 0]), 3), "\n")
cat("\nNote: ATT > ATE > ATU because of positive selection on gains\n")
Expected output:
| Statistic | Value |
|---|---|
| N | 10,000 |
| Pr(College) | ~0.59 |
| True ATE | ~0.40 |
| True ATT | ~0.45–0.50 |
| True ATU | ~0.30–0.35 |
ATT > ATE > ATU because individuals who are most likely to attend college (low unobserved resistance) also have the highest returns. This pattern — selection on gains — is the defining feature of essential heterogeneity.
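As a sanity check on the simulated treated share: the latent index in Step 1 is a sum of independent normals, so Pr(D = 1) has a closed form, namely Phi(gamma0 / sqrt(gamma1^2 + gamma2^2 + gamma3^2 + 1)). The sketch below re-simulates only the selection equation with the Step 1 coefficients and compares the two. The names `index_sd` and `share_closed_form` are introduced here for illustration.

```r
# Sanity check: closed-form vs. simulated treated share.
set.seed(2011)
n <- 10000
gamma <- c(0.3, 0.5, 0.3, 0.6)

ability    <- rnorm(n)
family_inc <- rnorm(n)
proximity  <- rnorm(n)
U_D        <- rnorm(n)

latent <- gamma[1] + gamma[2] * ability + gamma[3] * family_inc +
  gamma[4] * proximity - U_D
D <- as.integer(latent > 0)

# latent is normal with mean gamma0 and variance
# gamma1^2 + gamma2^2 + gamma3^2 + Var(U_D), where Var(U_D) = 1
index_sd <- sqrt(sum(gamma[-1]^2) + 1)
share_closed_form <- pnorm(gamma[1] / index_sd)

cat("Closed-form Pr(D = 1):", round(share_closed_form, 3), "\n")
cat("Simulated   Pr(D = 1):", round(mean(D), 3), "\n")
```

A large gap between the two numbers would point to a coding error in the selection equation rather than to sampling noise.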
Step 2: Estimate the Propensity Score
The propensity score P(Z) is the probability of treatment given the instrument and covariates. In the MTE framework, the propensity score determines the margin of treatment.
# Probit model for propensity score
probit <- glm(D ~ ability + family_inc + proximity,
              data = df, family = binomial(link = "probit"))
df$phat <- predict(probit, type = "response")
cat("=== Propensity Score (Probit First Stage) ===\n")
summary(probit)
cat("\nPropensity score range: [", round(min(df$phat), 3),
    ",", round(max(df$phat), 3), "]\n")
cat("Mean propensity score:", round(mean(df$phat), 3), "\n")
# Check that the instrument is significant
cat("\nProximity coefficient:", round(coef(probit)["proximity"], 4), "\n")
cat("z-statistic:", round(summary(probit)$coefficients["proximity", "z value"], 2), "\n")
cat("(Strong instrument: large z-statistic)\n")
Expected output:
| Variable | Coefficient | SE | z-statistic |
|---|---|---|---|
| ability | ~0.50 | ~0.02 | ~25–35 |
| family_inc | ~0.30 | ~0.02 | ~15–20 |
| proximity | ~0.60 | ~0.02 | ~30–40 |
Because U_D is standard normal, the probit recovers the selection coefficients from Step 1 (0.5, 0.3, 0.6) up to sampling error.
| Statistic | Value |
|---|---|
| Propensity score range | [~0.01, ~0.99] |
| Mean propensity score | ~0.59 |
The propensity score has good support — it spans nearly the full [0, 1] interval. This coverage is important for MTE estimation because the MTE can only be identified over the range of propensity scores observed in the data.
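Since identification is limited to the observed range of P(Z), it can help to inspect the overlap between treated and untreated propensity scores before estimating the MTE. The following is a minimal sketch on freshly simulated data using the Step 1 selection coefficients. `common_support` is a hypothetical helper written for this illustration, not a function from any package.

```r
# Overlap check for the estimated propensity score (illustrative sketch).
set.seed(1)
n <- 5000
ability    <- rnorm(n)
family_inc <- rnorm(n)
proximity  <- rnorm(n)
D <- as.integer(0.3 + 0.5 * ability + 0.3 * family_inc +
                  0.6 * proximity - rnorm(n) > 0)

probit <- glm(D ~ ability + family_inc + proximity,
              family = binomial(link = "probit"))
phat <- fitted(probit)

# Hypothetical helper: the interval where both groups have observations
common_support <- function(p, d) {
  c(lower = max(min(p[d == 1]), min(p[d == 0])),
    upper = min(max(p[d == 1]), max(p[d == 0])))
}

cs <- common_support(phat, D)
share_inside <- mean(phat >= cs["lower"] & phat <= cs["upper"])

cat("Common support: [", round(cs["lower"], 3), ",",
    round(cs["upper"], 3), "]\n")
cat("Share of observations inside:", round(share_inside, 3), "\n")
```

MTE values reported outside this interval rest on the functional form of K(P) rather than on data.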
Step 3: Compute Local IV Estimates
The key insight of the MTE framework is that the local IV (LIV) estimator traces out the MTE curve. The LIV at propensity score p estimates the MTE at unobserved resistance u = p.
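The reasoning can be sketched as follows, assuming (as in the simulated design) that the unobserved resistance $U_D$ is normalized to be uniform on $[0, 1]$ and is independent of the instrument given $X$, so that $D = 1\{U_D \le P(Z)\}$:

```latex
\begin{aligned}
E[Y \mid X, P(Z) = p]
  &= E[Y_0 \mid X] + E[D\,(Y_1 - Y_0) \mid X, P(Z) = p] \\
  &= E[Y_0 \mid X] + \int_0^p \mathrm{MTE}(X, u)\, du, \\[4pt]
\frac{\partial}{\partial p}\, E[Y \mid X, P(Z) = p]
  &= \mathrm{MTE}(X, p).
\end{aligned}
```

Only individuals with $U_D \le p$ are treated when the propensity score equals $p$, so raising $p$ slightly adds the gains of those exactly at the margin $U_D = p$.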
# The MTE is the derivative of E[Y | P(Z) = p] with respect to p
# MTE(p) = d E[Y | X, P = p] / dp
# Regress Y on X and a polynomial in P(Z)
# E[Y | X, P] = X'alpha + K(P) where K(P) is a polynomial
# MTE(p) = K'(p) = derivative of K with respect to p
# Quadratic specification
df$phat2 <- df$phat^2
mte_reg <- lm(Y ~ ability + family_inc + phat + phat2, data = df)
cat("=== MTE Regression ===\n")
summary(mte_reg)
# MTE(u) = beta1 + 2*beta2*u (derivative of K(p) = beta1*p + beta2*p^2)
beta1 <- coef(mte_reg)["phat"]
beta2 <- coef(mte_reg)["phat2"]
# Evaluate MTE at several points
u_grid <- seq(0.05, 0.95, by = 0.05)
mte_estimated <- beta1 + 2 * beta2 * u_grid
mte_truth <- 0.60 - 0.40 * u_grid # True MTE from DGP
cat("\n=== MTE Curve ===\n")
cat(sprintf("%-8s %-12s %-12s\n", "u_D", "MTE (est)", "MTE (true)"))
for (i in seq_along(u_grid)) {
  cat(sprintf("%-8.2f %-12.3f %-12.3f\n",
              u_grid[i], mte_estimated[i], mte_truth[i]))
}
Expected output:
| u_D | MTE (estimated) | MTE (true) |
|---|---|---|
| 0.10 | ~0.55 | 0.56 |
| 0.30 | ~0.47 | 0.48 |
| 0.50 | ~0.40 | 0.40 |
| 0.70 | ~0.32 | 0.32 |
| 0.90 | ~0.24 | 0.24 |
The estimated MTE curve declines from approximately 0.55 at u = 0.10 to approximately 0.24 at u = 0.90, closely tracking the true MTE. This declining pattern confirms positive selection on gains: individuals who are most eager to attend college (low u) benefit the most.
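Point estimates alone do not show how precisely the curve is pinned down. Because MTE(u) = beta1 + 2*beta2*u is linear in the two regression coefficients, a pointwise standard error follows from their covariance matrix. The sketch below re-simulates the design and applies this idea. It treats the estimated propensity score as if it were known, so it ignores first-stage sampling error; that is a simplifying assumption of this illustration, and a bootstrap over both stages would be one way to relax it.

```r
# Pointwise standard errors for the quadratic-in-P MTE curve (sketch).
set.seed(2011)
n <- 10000
ability    <- rnorm(n)
family_inc <- rnorm(n)
proximity  <- rnorm(n)
U_D        <- rnorm(n)
D <- as.integer(0.3 + 0.5 * ability + 0.3 * family_inc +
                  0.6 * proximity - U_D > 0)
beta_i <- 0.60 - 0.40 * pnorm(U_D)
Y0 <- 10 + 0.8 * ability + rnorm(n)
Y  <- Y0 + D * (beta_i + 0.5 * rnorm(n))

phat  <- fitted(glm(D ~ ability + family_inc + proximity,
                    family = binomial(link = "probit")))
phat2 <- phat^2
fit   <- lm(Y ~ ability + family_inc + phat + phat2)

u_grid <- c(0.1, 0.3, 0.5, 0.7, 0.9)
# Gradient of MTE(u) with respect to (beta1, beta2) is (1, 2u)
G <- cbind(1, 2 * u_grid)
V <- vcov(fit)[c("phat", "phat2"), c("phat", "phat2")]

mte_hat <- as.numeric(G %*% coef(fit)[c("phat", "phat2")])
mte_se  <- sqrt(rowSums((G %*% V) * G))  # diagonal of G V G'

print(data.frame(u = u_grid,
                 mte = round(mte_hat, 3),
                 se = round(mte_se, 3),
                 lower = round(mte_hat - 1.96 * mte_se, 3),
                 upper = round(mte_hat + 1.96 * mte_se, 3)))
```

Wide intervals near the ends of the grid are to be expected, since few observations have propensity scores close to 0 or 1.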
In the MTE framework, what does u_D represent, and why does a declining MTE curve indicate positive selection on gains?
Step 4: Compute ATE, ATT, and LATE from MTE Weights
Every standard treatment effect parameter is a weighted average of the MTE. The weights differ across parameters, which is why they differ when MTE is non-constant.
# ATE: uniform weights over [0, 1]
# ATE = integral of MTE(u) du from 0 to 1
# For MTE(u) = beta1 + 2*beta2*u:
# ATE = beta1 + beta2 (integral of 2u from 0 to 1 = 1)
ate_est <- beta1 + beta2
cat("=== Treatment Effect Parameters ===\n")
cat("ATE (estimated):", round(ate_est, 3), "\n")
cat("ATE (true):", round(mean(df$beta_i), 3), "\n\n")
# ATT: weights concentrated on low u (eager participants)
# ATT weight: w_ATT(u) = (1 - F_P(u)) / E[P]
# where F_P is the CDF of P(Z)
# Numerical integration
u_fine <- seq(0.001, 0.999, length.out = 500)
mte_fine <- beta1 + 2 * beta2 * u_fine
p_vals <- df$phat
# ATT weights: Pr(P > u) / E[P]
att_weights <- sapply(u_fine, function(u) mean(p_vals > u)) / mean(p_vals)
att_est <- sum(mte_fine * att_weights) / sum(att_weights)
cat("ATT (estimated):", round(att_est, 3), "\n")
cat("ATT (true):", round(mean(df$beta_i[df$D == 1]), 3), "\n\n")
# LATE: weights from specific instrument shift
# For a binary instrument shift from P(z0) to P(z1):
# LATE weights are uniform on [P(z0), P(z1)]
# Using the proximity instrument, approximate LATE
# as the average MTE over the complier region
p_low <- mean(df$phat[df$proximity < median(df$proximity)])
p_high <- mean(df$phat[df$proximity >= median(df$proximity)])
late_u <- seq(p_low, p_high, length.out = 100)
late_mte <- beta1 + 2 * beta2 * late_u
late_est <- mean(late_mte)
cat("LATE (estimated, proximity IV):", round(late_est, 3), "\n")
cat("LATE complier range: [", round(p_low, 3), ",", round(p_high, 3), "]\n\n")
cat("=== Summary ===\n")
cat(sprintf("%-20s %-12s\n", "Parameter", "Estimate"))
cat(sprintf("%-20s %-12.3f\n", "ATE", ate_est))
cat(sprintf("%-20s %-12.3f\n", "ATT", att_est))
cat(sprintf("%-20s %-12.3f\n", "LATE (proximity)", late_est))
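cat("\nATT > ATE because the MTE declines and ATT overweights low u (high MTE).\n")
cat("LATE < ATE here because the complier range is centered above u = 0.5.\n")
Expected output:
| Parameter | Estimated | True |
|---|---|---|
| ATE | ~0.40 | ~0.40 |
| ATT | ~0.47 | ~0.47 |
| LATE (proximity IV) | ~0.36 | n/a |
The ordering ATT > ATE > LATE follows from the declining MTE curve combined with where each parameter places its weight:
- ATT overweights low-u individuals (eager participants with high returns)
- ATE weights uniformly across the entire [0, 1] range
- LATE weights only the complier region, roughly [0.45, 0.74] in this design. Because about 59% of the sample is treated, this interval is centered above u = 0.5, so LATE falls below ATE. A complier region centered below 0.5 would instead push LATE above ATE.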
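To see how the weight functions alone drive these differences, the sketch below applies each set of weights to the true MTE from the DGP, so no estimation noise is involved. The distribution of P(Z) is simulated from the Step 1 selection coefficients. The two complier intervals are hypothetical and chosen only to show that LATE depends on where the interval sits.

```r
# Treatment parameters as weighted averages of the true MTE (sketch).
mte_true <- function(u) 0.60 - 0.40 * u
u <- (seq_len(1000) - 0.5) / 1000        # midpoints of 1000 cells on [0, 1]

set.seed(1)
m <- 1e5
P <- pnorm(0.3 + 0.5 * rnorm(m) + 0.3 * rnorm(m) + 0.6 * rnorm(m))

w_ate  <- rep(1, length(u))              # uniform weights
w_att  <- (1 - ecdf(P)(u)) / mean(P)     # Pr(P > u) / E[P]
w_late <- function(lo, hi) as.numeric(u >= lo & u <= hi) / (hi - lo)

weighted_avg <- function(w) sum(mte_true(u) * w) / sum(w)

ate       <- weighted_avg(w_ate)              # 0.40
att       <- weighted_avg(w_att)
late_low  <- weighted_avg(w_late(0.2, 0.4))   # 0.48
late_high <- weighted_avg(w_late(0.6, 0.8))   # 0.32

cat("ATE:", round(ate, 3), " ATT:", round(att, 3), "\n")
cat("LATE on [0.2, 0.4]:", round(late_low, 3),
    " LATE on [0.6, 0.8]:", round(late_high, 3), "\n")
cat("ATT weights average to:", round(mean(w_att), 3), "\n")
```

The ATT weights average to about one because the integral of Pr(P > u) over [0, 1] equals E[P].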
If the MTE curve were flat (constant), what would happen to the relationship between ATE, ATT, and LATE?
Step 5: Test for Essential Heterogeneity
If the MTE is flat, LATE = ATE = ATT and the elaborate MTE machinery is unnecessary. We test whether the MTE is significantly non-flat.
# Test: is the coefficient on P^2 significant?
# If beta2 = 0, the MTE is flat (no essential heterogeneity)
cat("=== Test for Essential Heterogeneity ===\n")
cat("H0: MTE is constant (beta2 = 0 in quadratic specification)\n\n")
# F-test on the P^2 term
anova_test <- anova(
  lm(Y ~ ability + family_inc + phat, data = df),         # restricted
  lm(Y ~ ability + family_inc + phat + phat2, data = df)  # unrestricted
)
cat("F-statistic:", round(anova_test$F[2], 2), "\n")
cat("p-value:", round(anova_test$"Pr(>F)"[2], 4), "\n")
cat("Conclusion:", ifelse(anova_test$"Pr(>F)"[2] < 0.05,
                          "Reject H0 — essential heterogeneity is present",
                          "Fail to reject H0 — MTE may be flat"), "\n\n")
# Alternative: test with cubic term
df$phat3 <- df$phat^3
mte_cubic <- lm(Y ~ ability + family_inc + phat + phat2 + phat3, data = df)
f_cubic <- anova(
  lm(Y ~ ability + family_inc + phat, data = df),
  mte_cubic
)
cat("Joint test (quadratic + cubic):\n")
cat("F-statistic:", round(f_cubic$F[2], 2), "\n")
cat("p-value:", round(f_cubic$"Pr(>F)"[2], 4), "\n")
Expected output:
| Test | F-statistic | p-value | Conclusion |
|---|---|---|---|
| H0: flat MTE (beta2 = 0) | ~15–40 | < 0.001 | Reject: essential heterogeneity present |
| Joint test (quadratic + cubic) | ~8–20 | < 0.001 | Reject |
ATE estimate by polynomial order of K(P):
| Polynomial Order | ATE Estimate |
|---|---|
| Linear | ~0.40 |
| Quadratic | ~0.40 |
| Cubic | ~0.40 |
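One way a comparison across polynomial orders could be produced is sketched below, on a re-simulated copy of the Step 1 design. For K(p) = b1*p + ... + bk*p^k, the ATE is the integral of K'(u) over [0, 1], which equals K(1) - K(0), the sum of the polynomial coefficients. The name `ate_by_order` is introduced here for illustration, and individual runs can be noisier than the rounded values in the table suggest.

```r
# ATE implied by linear, quadratic, and cubic specifications of K(P) (sketch).
set.seed(2011)
n <- 10000
ability    <- rnorm(n)
family_inc <- rnorm(n)
proximity  <- rnorm(n)
U_D        <- rnorm(n)
D <- as.integer(0.3 + 0.5 * ability + 0.3 * family_inc +
                  0.6 * proximity - U_D > 0)
beta_i <- 0.60 - 0.40 * pnorm(U_D)
Y0 <- 10 + 0.8 * ability + rnorm(n)
Y  <- Y0 + D * (beta_i + 0.5 * rnorm(n))

phat <- fitted(glm(D ~ ability + family_inc + proximity,
                   family = binomial(link = "probit")))

ate_by_order <- sapply(1:3, function(k) {
  fit <- lm(Y ~ ability + family_inc + poly(phat, k, raw = TRUE))
  # Drop intercept, ability, family_inc; the rest are b1, ..., bk
  sum(coef(fit)[-(1:3)])
})
names(ate_by_order) <- c("Linear", "Quadratic", "Cubic")

print(round(ate_by_order, 3))
```

The ATE integrates the MTE over all of [0, 1], so higher-order fits lean more heavily on extrapolation toward the ends of the propensity score support.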
The essential heterogeneity test strongly rejects the null of a flat MTE. This result confirms that treatment effects vary systematically with unobserved resistance, and standard IV (LATE) should not be interpreted as the population-average treatment effect.
Step 6: Guided Exercise
Interpreting a Declining MTE Curve
You estimate MTE for a college attendance decision using proximity to college as an instrument. The propensity score ranges from 0.08 to 0.82. Your estimated MTE curve (quadratic in P) yields:
| u | MTE(u) |
|---|---|
| 0.10 | 0.55 |
| 0.30 | 0.45 |
| 0.50 | 0.38 |
| 0.70 | 0.28 |
| 0.80 | 0.22 |
Derived parameters: ATE = 0.38, ATT = 0.47, LATE (proximity IV) = 0.42
Essential heterogeneity F-test: F = 12.4, p = 0.0004
Step 7: Exercises
- Flat MTE. Modify the DGP so that treatment effects are homogeneous (beta_i = 0.40 for everyone). Verify that the essential heterogeneity test fails to reject and that ATE = ATT = LATE.
- U-shaped MTE. Set MTE(u) = 0.50 - 0.80u + 0.80u^2 (high at both extremes, low in the middle). How does this affect the ordering of ATE, ATT, and LATE?
- Weak instrument. Reduce the coefficient on proximity from 0.6 to 0.1. How does this affect the propensity score range and the precision of the MTE curve?
- Semiparametric MTE. Instead of a polynomial in P, estimate the MTE using a series of local Wald estimates at different propensity score bins. Compare to the parametric approach.
Summary
In this lab you learned:
- The MTE framework provides a unified view of treatment effect heterogeneity along the margin of unobserved resistance to treatment
- In a Roy model with selection on gains, the MTE curve declines: eager participants benefit most
- ATE, ATT, and LATE are all weighted averages of the same MTE function — they differ because the weight functions differ
- When MTE is non-flat (essential heterogeneity), LATE from any particular instrument is a poor guide to ATE or ATT
- The propensity score support limits where the MTE can be identified; extrapolation beyond the support is assumption-driven
- The essential heterogeneity test determines whether the elaborate MTE framework is needed or whether standard IV suffices
- The MTE is estimated as the derivative of the conditional expectation of the outcome with respect to the propensity score