Lab·tutorial·9 min read

Tutorial · 90 minutes

Lab: Marginal Treatment Effects from Scratch

Implement marginal treatment effects step by step. Simulate a Roy model with selection on gains, estimate the propensity score, compute local IV estimates, trace the MTE curve, and derive ATE, ATT, and LATE from MTE weights.

Method: Marginal Treatment Effects (MTE)

Languages: Python, R, Stata

Dataset: College attendance and earnings (simulated)

Overview

Marginal Treatment Effects (MTE) provide a unified framework for understanding how treatment effects vary with individuals' unobserved resistance to treatment. Formally, MTE(u) = E[Y(1) - Y(0) | X, u_D = u], where u_D is an individual's quantile in the distribution of unobserved resistance. The MTE function traces how the return to treatment changes as we move from eager participants (low unobserved resistance) to reluctant participants (high unobserved resistance). All standard treatment effect parameters (ATE, ATT, LATE) are weighted averages of the MTE.

What you will learn:

How to simulate a Roy model with selection on gains
How to estimate the propensity score for treatment participation
How to compute local IV estimates at different propensity score values
How to trace the MTE curve and interpret its shape
How to recover ATE, ATT, and LATE as weighted averages of the MTE

Prerequisites: Instrumental variables (see the IV tutorial lab), propensity score estimation.

Step 1: Simulate a Roy Model with Selection on Gains

In the Roy model, individuals choose treatment (college) based partly on their comparative advantage. Those who benefit most are most likely to attend.

```r
set.seed(2011)
n <- 10000

# Observed covariates
ability <- rnorm(n)
family_inc <- rnorm(n)

# Instrument: proximity to a four-year college
# Shifts the cost of attending college but does not affect earnings directly
proximity <- rnorm(n)

# Unobserved resistance to treatment: U_D ~ N(0, 1), and
# u_D = pnorm(U_D) ~ Uniform(0, 1) is the resistance quantile
# Low u_D = eager to attend college (low unobserved resistance)
# High u_D = reluctant (high unobserved resistance)
U_D <- rnorm(n)

# Propensity score: probability of attending college
# P(Z) = Phi(gamma0 + gamma1*ability + gamma2*family_inc + gamma3*proximity)
gamma <- c(0.3, 0.5, 0.3, 0.6)  # proximity is a strong instrument
latent <- gamma[1] + gamma[2] * ability + gamma[3] * family_inc +
  gamma[4] * proximity - U_D

# Treatment decision: attend college if the latent index is positive,
# equivalently if pnorm(U_D) < P(Z)
D <- as.integer(latent > 0)

# Potential outcomes with heterogeneous treatment effects
# Y(0) = alpha0 + alpha1 * ability + epsilon0
# Y(1) = Y(0) + beta(u_D) + 0.5 * epsilon1
# MTE(u) = E[Y(1) - Y(0) | u_D = u] = beta(u) declines in u (positive selection on gains)
alpha0 <- 10
alpha1 <- 0.8

# The treatment effect varies with unobserved resistance:
# people with low resistance (who select in) have higher returns,
# which creates essential heterogeneity
U_D_quantile <- pnorm(U_D)  # transform to Uniform(0, 1)
mte_true <- function(u) 0.60 - 0.40 * u  # declining MTE

# Individual treatment effects
beta_i <- mte_true(U_D_quantile)

epsilon0 <- rnorm(n, 0, 1)
epsilon1 <- rnorm(n, 0, 1)

Y0 <- alpha0 + alpha1 * ability + epsilon0
Y1 <- Y0 + beta_i + epsilon1 * 0.5

# Observed outcome
Y <- D * Y1 + (1 - D) * Y0

df <- data.frame(Y, D, ability, family_inc, proximity,
                 U_D_quantile, beta_i)

cat("=== Data Summary ===\n")
cat("N:", n, "\n")
cat("Pr(College):", round(mean(D), 3), "\n")
cat("Mean Y:", round(mean(Y), 2), "\n")
cat("True ATE:", round(mean(beta_i), 3), "\n")
cat("True ATT:", round(mean(beta_i[D == 1]), 3), "\n")
cat("True ATU:", round(mean(beta_i[D == 0]), 3), "\n")
cat("\nNote: ATT > ATE > ATU because of positive selection on gains\n")
```

Expected output:

Statistic	Value
N	10,000
Pr(College)	~0.59
True ATE	~0.40
True ATT	~0.45–0.50
True ATU	~0.30–0.35

ATT > ATE > ATU because individuals who are most likely to attend college (low unobserved resistance) also have the highest returns. This pattern — selection on gains — is the defining feature of essential heterogeneity.
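The ATT > ATE > ATU ordering can be checked against closed forms. Treated individuals are exactly those with u_D < P(Z), so E[u_D | D = 1] = E[P^2] / (2 E[P]) and E[u_D | D = 0] = (1 - E[P^2]) / (2 (1 - E[P])). Plugging these into MTE(u) = 0.60 - 0.40u gives ATT and ATU directly from the propensity-score distribution. The sketch below reuses Step 1's seed and draw order. The names P_true, att_formula, and atu_formula are illustrative.

```r
# Sketch: compare the sample ATT/ATU with closed forms implied by the Step 1 DGP.
set.seed(2011)
n <- 10000
ability <- rnorm(n)
family_inc <- rnorm(n)
proximity <- rnorm(n)
U_D <- rnorm(n)

index <- 0.3 + 0.5 * ability + 0.3 * family_inc + 0.6 * proximity
P_true <- pnorm(index)          # true propensity score P(Z)
u_D <- pnorm(U_D)               # uniform unobserved resistance
D <- as.integer(u_D < P_true)   # equivalent to latent > 0 in Step 1
beta_i <- 0.60 - 0.40 * u_D     # individual effects from the true MTE

EP <- mean(P_true)
EP2 <- mean(P_true^2)
# E[u_D | D = 1] = E[P^2] / (2 E[P]);  E[u_D | D = 0] = (1 - E[P^2]) / (2 (1 - E[P]))
att_formula <- 0.60 - 0.40 * EP2 / (2 * EP)
atu_formula <- 0.60 - 0.40 * (1 - EP2) / (2 * (1 - EP))

cat("ATT: sample", round(mean(beta_i[D == 1]), 3), " formula", round(att_formula, 3), "\n")
cat("ATU: sample", round(mean(beta_i[D == 0]), 3), " formula", round(atu_formula, 3), "\n")
```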

Step 2: Estimate the Propensity Score

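The propensity score P(Z) is the probability of treatment given the instrument and covariates. In the MTE framework, it determines the margin of treatment. An individual attends college exactly when u_D < P(Z), so individuals with u_D close to P(Z) are the ones at the margin of attending.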
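The sketch below regenerates the Step 1 inputs (same seed and draw order) and checks two things. First, the latent-index rule from Step 1 coincides with the rule pnorm(U_D) < P(Z). Second, it shows how closely the probit recovers the true P(Z). The names D_latent, D_margin, and P_true are illustrative.

```r
set.seed(2011)
n <- 10000
ability <- rnorm(n)
family_inc <- rnorm(n)
proximity <- rnorm(n)
U_D <- rnorm(n)

index <- 0.3 + 0.5 * ability + 0.3 * family_inc + 0.6 * proximity
D_latent <- as.integer(index - U_D > 0)        # Step 1 decision rule
P_true <- pnorm(index)                         # true propensity score
D_margin <- as.integer(pnorm(U_D) < P_true)    # resistance below propensity

cat("Share of observations where the rules agree:", mean(D_latent == D_margin), "\n")

probit <- glm(D_latent ~ ability + family_inc + proximity,
              family = binomial(link = "probit"))
phat <- predict(probit, type = "response")
cat("Correlation of estimated and true P(Z):", round(cor(phat, P_true), 4), "\n")
```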

```r
# Probit model for the propensity score
probit <- glm(D ~ ability + family_inc + proximity,
              data = df, family = binomial(link = "probit"))
df$phat <- predict(probit, type = "response")

cat("=== Propensity Score (Probit First Stage) ===\n")
print(summary(probit))

cat("\nPropensity score range: [", round(min(df$phat), 3),
    ",", round(max(df$phat), 3), "]\n")
cat("Mean propensity score:", round(mean(df$phat), 3), "\n")

# Check that the instrument is strong
cat("\nProximity coefficient:", round(coef(probit)["proximity"], 4), "\n")
cat("z-statistic:", round(summary(probit)$coefficients["proximity", "z value"], 2), "\n")
cat("(Strong instrument: large z-statistic)\n")
```

Expected output:

Variable	Coefficient	SE	z-statistic
ability	~0.50	~0.015	~33
family_inc	~0.30	~0.015	~20
proximity	~0.60	~0.015	~40

Statistic	Value
Propensity score range	[~0.01, ~0.99]
Mean propensity score	~0.59

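Because U_D is standard normal and independent of the covariates, the probit is correctly specified in this simulation. Its slope estimates should therefore be close to the true values 0.5, 0.3, and 0.6. The propensity score has good support: it spans nearly the full [0, 1] interval. This coverage is important for MTE estimation because the MTE can only be identified over the range of propensity scores observed in the data.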
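One common check is to compare the propensity-score support among the treated and the untreated, then restrict MTE estimation to their overlap. The sketch below assumes the Step 1 DGP. The names common_support and bins are illustrative, and the 0.1-wide bins are an arbitrary choice.

```r
set.seed(2011)
n <- 10000
ability <- rnorm(n); family_inc <- rnorm(n); proximity <- rnorm(n)
U_D <- rnorm(n)
D <- as.integer(0.3 + 0.5 * ability + 0.3 * family_inc + 0.6 * proximity - U_D > 0)

phat <- predict(glm(D ~ ability + family_inc + proximity,
                    family = binomial(link = "probit")), type = "response")

# Overlap of the propensity-score ranges in the two treatment arms
rng_treated <- range(phat[D == 1])
rng_untreated <- range(phat[D == 0])
common_support <- c(max(rng_treated[1], rng_untreated[1]),
                    min(rng_treated[2], rng_untreated[2]))
cat("Common support: [", round(common_support, 3), "]\n")

# Counts by arm in 0.1-wide bins: sparse cells flag weakly identified regions
bins <- cut(phat, breaks = seq(0, 1, by = 0.1), include.lowest = TRUE)
print(table(bins, D))
```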

Step 3: Compute Local IV Estimates

The key insight of the MTE framework is that the local IV (LIV) estimator traces out the MTE curve. The LIV at propensity score p estimates the MTE at unobserved resistance u = p.
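For this DGP, the LIV logic can be written out in a few lines. The derivation uses the lab's setup:

- u_D = pnorm(U_D) is Uniform(0, 1) and independent of (X, Z).
- D = 1 exactly when u_D < P(Z).
- epsilon0 and epsilon1 have mean zero and are independent of everything else.

```latex
\begin{aligned}
E[Y \mid X, P(Z) = p]
  &= E[Y_0 \mid X] + E\big[\mathbf{1}\{u_D < p\}\,(Y_1 - Y_0) \mid X\big] \\
  &= \alpha_0 + \alpha_1\,\mathrm{ability} + \int_0^{p} \mathrm{MTE}(u)\,du \\
  &= \alpha_0 + \alpha_1\,\mathrm{ability} + 0.60\,p - 0.20\,p^2, \\[4pt]
\frac{\partial}{\partial p}\, E[Y \mid X, P(Z) = p]
  &= 0.60 - 0.40\,p = \mathrm{MTE}(p).
\end{aligned}
```

Matching terms with this step's quadratic regression, the targets are beta1 = 0.60 and beta2 = -0.20. MTE(u) = beta1 + 2 * beta2 * u then reproduces 0.60 - 0.40u.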

```r
# The MTE is the derivative of E[Y | X, P(Z) = p] with respect to p:
# MTE(p) = d E[Y | X, P = p] / dp
#
# Procedure: regress Y on X and a polynomial in P(Z),
# E[Y | X, P] = X'alpha + K(P), where K(P) is a polynomial;
# then MTE(p) = K'(p)

# Quadratic specification
df$phat2 <- df$phat^2
mte_reg <- lm(Y ~ ability + family_inc + phat + phat2, data = df)

cat("=== MTE Regression ===\n")
print(summary(mte_reg))

# K(p) = beta1*p + beta2*p^2, so MTE(u) = K'(u) = beta1 + 2*beta2*u
beta1 <- coef(mte_reg)["phat"]
beta2 <- coef(mte_reg)["phat2"]

# Evaluate the MTE on a grid of u values
u_grid <- seq(0.05, 0.95, by = 0.05)
mte_estimated <- beta1 + 2 * beta2 * u_grid
mte_truth <- 0.60 - 0.40 * u_grid  # true MTE from the DGP

cat("\n=== MTE Curve ===\n")
cat(sprintf("%-8s %-12s %-12s\n", "u_D", "MTE (est)", "MTE (true)"))
for (i in seq_along(u_grid)) {
  cat(sprintf("%-8.2f %-12.3f %-12.3f\n",
              u_grid[i], mte_estimated[i], mte_truth[i]))
}
```

Expected output (selected rows of the printed grid):

u_D	MTE (estimated)	MTE (true)
0.10	~0.55	0.56
0.30	~0.47	0.48
0.50	~0.40	0.40
0.70	~0.32	0.32
0.90	~0.24	0.24

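The estimated MTE curve should decline from roughly 0.55 at u = 0.10 to roughly 0.24 at u = 0.90, tracking the true MTE. This declining pattern is the signature of positive selection on gains: individuals who are most eager to attend college (low u) benefit the most.

Precision is not uniform along the curve, however. The slope of the MTE is identified only from curvature in E[Y | X, P]. As a result, estimates near the mean propensity score are much tighter than those near u = 0.10 or u = 0.90, and the endpoints can miss the truth by a noticeable margin in a given draw.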
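Delta-method standard errors make the precision pattern along the curve visible. The sketch below regenerates the Step 1 data with the same seed and draw order, so its point estimates should match Step 3's. It then attaches a pointwise standard error to MTE(u) = b1 + 2 b2 u. These lm() standard errors treat phat as known and ignore first-stage estimation error, so treat them as a rough guide. The names terms_k and mte_se are illustrative.

```r
set.seed(2011)
n <- 10000
ability <- rnorm(n)
family_inc <- rnorm(n)
proximity <- rnorm(n)
U_D <- rnorm(n)
D <- as.integer(0.3 + 0.5 * ability + 0.3 * family_inc + 0.6 * proximity - U_D > 0)
epsilon0 <- rnorm(n)
epsilon1 <- rnorm(n)
Y0 <- 10 + 0.8 * ability + epsilon0
Y <- Y0 + D * (0.60 - 0.40 * pnorm(U_D) + 0.5 * epsilon1)  # = D * Y1 + (1 - D) * Y0

phat <- predict(glm(D ~ ability + family_inc + proximity,
                    family = binomial(link = "probit")), type = "response")
fit <- lm(Y ~ ability + family_inc + phat + I(phat^2))

terms_k <- c("phat", "I(phat^2)")
b <- unname(coef(fit)[terms_k])
V <- unname(vcov(fit)[terms_k, terms_k])

# MTE(u) = b1 + 2 b2 u has gradient (1, 2u) with respect to (b1, b2)
u_grid <- c(0.1, 0.3, 0.5, 0.7, 0.9)
mte_hat <- b[1] + 2 * b[2] * u_grid
mte_se <- sqrt(V[1, 1] + 4 * u_grid^2 * V[2, 2] + 4 * u_grid * V[1, 2])

print(round(data.frame(u = u_grid, mte_hat = mte_hat, se = mte_se,
                       mte_true = 0.60 - 0.40 * u_grid), 3))
```

The standard errors should be smallest near the middle of the propensity-score distribution and largest at the ends of the grid.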

Concept Check

In the MTE framework, what does u_D represent, and why does a declining MTE curve indicate positive selection on gains?

u_D is the individual's observed propensity score. A declining MTE means observed characteristics predict lower returns for high-propensity individuals.
u_D represents the unobserved resistance to treatment (position in the distribution of unobserved costs/reluctance). A declining MTE means individuals with lower resistance (who self-select into treatment) have higher returns, indicating they sort based on their gains.
u_D is the quantile of the treatment effect distribution. A declining MTE simply means there is heterogeneity in treatment effects.
u_D is the error term in the outcome equation. A declining MTE means the outcome model is misspecified.

Step 4: Compute ATE, ATT, and LATE from MTE Weights

Every standard treatment effect parameter is a weighted average of the MTE. The weights differ across parameters, which is why the parameters themselves differ when the MTE is non-constant.
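The weight functions can be computed directly from the distribution of P(Z). This minimal sketch assumes the Step 1 propensity-score index and the true MTE(u) = 0.60 - 0.40u. It uses h_ATT(u) = Pr(P > u) / E[P] and h_ATU(u) = Pr(P <= u) / (1 - E[P]). Each weight integrates to one, and E[P] h_ATT + (1 - E[P]) h_ATU equals the uniform ATE weight, so ATE is a mix of ATT and ATU. The midpoint grid and the names h_att, h_atu, and integrate_grid are illustrative choices.

```r
set.seed(2011)
n <- 10000
index <- 0.3 + 0.5 * rnorm(n) + 0.3 * rnorm(n) + 0.6 * rnorm(n)
P <- pnorm(index)                      # propensity scores
EP <- mean(P)
F_P <- ecdf(P)                         # empirical CDF of P(Z)

du <- 0.001
u <- seq(du / 2, 1 - du / 2, by = du)  # midpoint grid on [0, 1]
mte <- 0.60 - 0.40 * u                 # true MTE from the DGP

h_ate <- rep(1, length(u))
h_att <- (1 - F_P(u)) / EP             # Pr(P > u) / E[P]
h_atu <- F_P(u) / (1 - EP)             # Pr(P <= u) / (1 - E[P])

integrate_grid <- function(f) sum(f) * du   # midpoint-rule integral
cat("Weights integrate to:",
    round(sapply(list(h_ate, h_att, h_atu), integrate_grid), 3), "\n")

ate <- integrate_grid(mte * h_ate)
att <- integrate_grid(mte * h_att)
atu <- integrate_grid(mte * h_atu)
cat("ATE", round(ate, 3), " ATT", round(att, 3), " ATU", round(atu, 3), "\n")
```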

```r
# ATE: uniform weights over [0, 1]
# ATE = integral of MTE(u) du from 0 to 1
# For MTE(u) = beta1 + 2*beta2*u:
# ATE = beta1 + beta2 (since the integral of 2u from 0 to 1 is 1)
ate_est <- beta1 + beta2
cat("=== Treatment Effect Parameters ===\n")
cat("ATE (estimated):", round(ate_est, 3), "\n")
cat("ATE (true):", round(mean(df$beta_i), 3), "\n\n")

# ATT: weights concentrated on low u (eager participants)
# ATT weight: w_ATT(u) = (1 - F_P(u)) / E[P] = Pr(P > u) / E[P],
# where F_P is the CDF of P(Z)
# Numerical integration on a fine grid
u_fine <- seq(0.001, 0.999, length.out = 500)
mte_fine <- beta1 + 2 * beta2 * u_fine
p_vals <- df$phat

att_weights <- sapply(u_fine, function(u) mean(p_vals > u)) / mean(p_vals)
att_est <- sum(mte_fine * att_weights) / sum(att_weights)

cat("ATT (estimated):", round(att_est, 3), "\n")
cat("ATT (true):", round(mean(df$beta_i[df$D == 1]), 3), "\n\n")

# LATE: weights implied by a specific instrument shift
# For a binary instrument that moves P from P(z0) to P(z1),
# the LATE weights are uniform on [P(z0), P(z1)]
# Approximation for proximity: replace P(z0) and P(z1) by the mean
# propensity score among below- and above-median proximity
p_low <- mean(df$phat[df$proximity < median(df$proximity)])
p_high <- mean(df$phat[df$proximity >= median(df$proximity)])

late_u <- seq(p_low, p_high, length.out = 100)
late_mte <- beta1 + 2 * beta2 * late_u
late_est <- mean(late_mte)

cat("LATE (estimated, proximity IV):", round(late_est, 3), "\n")
cat("LATE complier range: [", round(p_low, 3), ",", round(p_high, 3), "]\n\n")

cat("=== Summary ===\n")
cat(sprintf("%-20s %-12s\n", "Parameter", "Estimate"))
cat(sprintf("%-20s %-12.3f\n", "ATE", ate_est))
cat(sprintf("%-20s %-12.3f\n", "ATT", att_est))
cat(sprintf("%-20s %-12.3f\n", "LATE (proximity)", late_est))
cat("\nWith a declining MTE, ATT > ATE: ATT weights low u (high MTE), ATE weights uniformly.\n")
cat("LATE falls below ATE when the complier range is centered above 0.5; center here:",
    round((p_low + p_high) / 2, 3), "\n")
```

Expected output:

Parameter	Estimated	True
ATE	~0.40	~0.40
ATT	~0.46	~0.46
LATE (proximity IV)	~0.36	(not computed)

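The ordering ATT > ATE > LATE follows from the declining MTE curve and from where each weight function puts its mass:

ATT overweights low-u individuals (eager participants with high returns), so ATT exceeds ATE
ATE weights uniformly across the entire [0, 1] range
LATE for the proximity split averages the MTE over the complier region, which is centered near the mean propensity score (about 0.59 in this DGP); because the declining MTE is below its [0, 1] average there, LATE falls slightly below ATE

If the instrument instead shifted people whose propensity scores were centered below 0.5, the same declining MTE would put LATE above ATE.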
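Step 4's LATE is an approximation. It averages the fitted MTE uniformly over [p_low, p_high], whereas the Wald estimator for the near/far split weights individuals by their actual propensity scores. The sketch below regenerates the Step 1 data with the same seed and draw order, so late_step4 should reproduce Step 4's LATE. It compares that value with two alternatives:

- the LATE implied by the fitted K(p) under the actual split;
- the raw Wald ratio.

The function K and the other names are illustrative. Expect the first two to be close. The raw Wald ratio is noisier because it uses only the binary split and does not adjust for ability.

```r
set.seed(2011)
n <- 10000
ability <- rnorm(n)
family_inc <- rnorm(n)
proximity <- rnorm(n)
U_D <- rnorm(n)
D <- as.integer(0.3 + 0.5 * ability + 0.3 * family_inc + 0.6 * proximity - U_D > 0)
epsilon0 <- rnorm(n)
epsilon1 <- rnorm(n)
Y <- 10 + 0.8 * ability + epsilon0 + D * (0.60 - 0.40 * pnorm(U_D) + 0.5 * epsilon1)

phat <- predict(glm(D ~ ability + family_inc + proximity,
                    family = binomial(link = "probit")), type = "response")
fit <- lm(Y ~ ability + family_inc + phat + I(phat^2))
b <- unname(coef(fit)[c("phat", "I(phat^2)")])
K <- function(p) b[1] * p + b[2] * p^2          # fitted K(p)

near <- proximity >= median(proximity)          # binary "near college" instrument
p_low <- mean(phat[!near])
p_high <- mean(phat[near])

# Step 4 approximation: mean of b1 + 2 b2 u over [p_low, p_high]
late_step4 <- b[1] + b[2] * (p_low + p_high)
# LATE implied by the fitted model for the actual split
late_model <- (mean(K(phat[near])) - mean(K(phat[!near]))) / (p_high - p_low)
# Raw Wald ratio: reduced form divided by first stage
first_stage <- mean(D[near]) - mean(D[!near])
wald <- (mean(Y[near]) - mean(Y[!near])) / first_stage

cat("Step 4 approximation: ", round(late_step4, 3), "\n")
cat("Model-implied LATE:   ", round(late_model, 3), "\n")
cat("Raw Wald estimate:    ", round(wald, 3), "\n")
cat("Mean propensity score:", round(mean(phat), 3), "\n")
```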

Concept Check

If the MTE curve were flat (constant), what would happen to the relationship between ATE, ATT, and LATE?

ATT would still exceed ATE because treated individuals are a selected sample.
ATE, ATT, and LATE would all be equal, because all weight functions integrate the same constant MTE.
LATE would be undefined because there would be no compliers.
ATE would exceed ATT because the untreated would now benefit more.

Step 5: Test for Essential Heterogeneity

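If the MTE is flat, LATE = ATE = ATT and the elaborate MTE machinery is unnecessary. In the quadratic specification a flat MTE means beta2 = 0, so we test that restriction. The lm() standard errors behind this test treat phat as known. A bootstrap that re-estimates the probit in each replication also accounts for first-stage estimation error.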

```r
# Test: is the coefficient on P^2 significant?
# If beta2 = 0, the MTE is flat (no essential heterogeneity)
cat("=== Test for Essential Heterogeneity ===\n")
cat("H0: MTE is constant (beta2 = 0 in quadratic specification)\n\n")

# F-test on the P^2 term: restricted vs. unrestricted model
anova_test <- anova(
  lm(Y ~ ability + family_inc + phat, data = df),        # restricted
  lm(Y ~ ability + family_inc + phat + phat2, data = df) # unrestricted
)

cat("F-statistic:", round(anova_test$F[2], 2), "\n")
cat("p-value:", round(anova_test$"Pr(>F)"[2], 4), "\n")
cat("Conclusion:", ifelse(anova_test$"Pr(>F)"[2] < 0.05,
                          "Reject H0: essential heterogeneity is present",
                          "Fail to reject H0: MTE may be flat"), "\n\n")

# Alternative: joint test of the quadratic and cubic terms
df$phat3 <- df$phat^3
mte_cubic <- lm(Y ~ ability + family_inc + phat + phat2 + phat3, data = df)
f_cubic <- anova(
  lm(Y ~ ability + family_inc + phat, data = df),
  mte_cubic
)
cat("Joint test (quadratic + cubic):\n")
cat("F-statistic:", round(f_cubic$F[2], 2), "\n")
cat("p-value:", round(f_cubic$"Pr(>F)"[2], 4), "\n")
```

Expected output:

Test	F-statistic	p-value	Conclusion
H0: flat MTE (beta2 = 0)	varies by draw; often small	varies by draw	Reject H0 if p < 0.05
Joint test (quadratic + cubic)	varies by draw; often small	varies by draw	Reject H0 if p < 0.05

Polynomial Order	ATE Estimate
Linear	~0.40
Quadratic	~0.40
Cubic	~0.40

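In this DGP the MTE is not flat by construction, but whether the test detects that in a given draw depends on its power. The outcome noise (standard deviation 1) is large relative to the total variation of the true MTE (0.40 across [0, 1]). At n = 10,000 the P² coefficient (true value -0.20) is therefore estimated imprecisely, and the test can fail to reject. A rejection indicates that treatment effects vary systematically with unobserved resistance; in that case, standard IV (LATE) should not be interpreted as the population-average treatment effect. A failure to reject is not evidence that the MTE is flat.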
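A quick Monte Carlo shows how often the curvature test rejects in this design. The sketch below makes these assumptions:

- It uses the Step 1 DGP.
- It uses an arbitrary seed.
- It runs only 40 replications, to keep it fast.

Each replication records the estimated P² coefficient and its p-value. The single-restriction t-test used here is equivalent to the F-test in Step 5. simulate_once is a hypothetical helper, not part of the lab's code.

```r
# Sketch: Monte Carlo check of the curvature test's power in the Step 1 design.
simulate_once <- function(n = 10000) {
  ability <- rnorm(n)
  family_inc <- rnorm(n)
  proximity <- rnorm(n)
  U_D <- rnorm(n)
  D <- as.integer(0.3 + 0.5 * ability + 0.3 * family_inc + 0.6 * proximity - U_D > 0)
  Y <- 10 + 0.8 * ability + rnorm(n) + D * (0.60 - 0.40 * pnorm(U_D) + 0.5 * rnorm(n))
  phat <- predict(glm(D ~ ability + family_inc + proximity,
                      family = binomial(link = "probit")), type = "response")
  est <- summary(lm(Y ~ ability + family_inc + phat + I(phat^2)))$coefficients
  # t-test on the P^2 term (equivalent to the one-restriction F-test)
  c(beta2 = est["I(phat^2)", "Estimate"], p_value = est["I(phat^2)", "Pr(>|t|)"])
}

set.seed(123)                                  # arbitrary seed for the replications
draws <- t(replicate(40, simulate_once()))     # 40 x 2 matrix

cat("Mean of beta2 estimates:", round(mean(draws[, "beta2"]), 3), "(true value -0.20)\n")
cat("SD of beta2 estimates:  ", round(sd(draws[, "beta2"]), 3), "\n")
cat("Share rejecting at 5%:  ", mean(draws[, "p_value"] < 0.05), "\n")
```

Suppose the spread of the estimates is comparable in size to the true coefficient, or the rejection share is well below one. Then a single draw can fail to reject even though the MTE is not flat. Increasing n or reducing the outcome noise in the DGP raises the power.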

Step 6: Guided Exercise

Guided Exercise

Interpreting a Declining MTE Curve

You estimate MTE for a college attendance decision using proximity to college as an instrument. The propensity score ranges from 0.08 to 0.82. Your estimated MTE curve (quadratic in P) yields:

MTE(0.10) = 0.55, MTE(0.30) = 0.45, MTE(0.50) = 0.38, MTE(0.70) = 0.28, MTE(0.80) = 0.22

Derived parameters: ATE = 0.38, ATT = 0.47, LATE (proximity IV) = 0.42

Essential heterogeneity F-test: F = 12.4, p = 0.0004

Step 7: Exercises

Flat MTE. Modify the DGP so that treatment effects are homogeneous (beta_i = 0.40 for everyone). Verify that the essential heterogeneity test fails to reject and that ATE = ATT = LATE.
U-shaped MTE. Set MTE(u) = 0.50 - 0.80u + 0.80u^2 (high at both extremes, low in the middle). How does this affect the ordering of ATE, ATT, and LATE?
Weak instrument. Reduce the coefficient on proximity from 0.6 to 0.1. How does this affect the propensity score range and the precision of the MTE curve?
Semiparametric MTE. Instead of a polynomial in P, estimate the MTE using a series of local Wald estimates at different propensity score bins. Compare to the parametric approach.

Summary

In this lab you learned:

The MTE framework provides a unified view of treatment effect heterogeneity along the margin of unobserved resistance to treatment
In a Roy model with selection on gains, the MTE curve declines: eager participants benefit most
ATE, ATT, and LATE are all weighted averages of the same MTE function — they differ because the weight functions differ
When the MTE is non-flat (essential heterogeneity), the LATE from a particular instrument need not equal the ATE or ATT and can be a poor guide to either
The propensity score support limits where the MTE can be identified; extrapolation beyond the support is assumption-driven
The essential heterogeneity test helps assess whether the MTE framework is needed or whether standard IV suffices; a failure to reject may reflect low power rather than a flat MTE
The MTE is estimated as the derivative of the conditional expectation of the outcome with respect to the propensity score
