Marginal Treatment Effects (MTE)
Unifies IV/LATE, ATE, and ATT as weighted averages of the MTE curve -- the treatment effect as a function of unobserved resistance to treatment.
Quick Reference
- When to Use: When you have IV and want to understand how the treatment effect varies with the unobserved propensity to select into treatment, or when you need to extrapolate from LATE to policy-relevant treatment effects.
- Key Assumption: A valid instrument plus a threshold-crossing selection model: D = 1[P(Z) >= U_D] where U_D is the unobserved resistance. Monotonicity of P(Z) in Z.
- Common Mistake: Assuming LATE equals ATE. LATE captures the effect for compliers only. MTE reveals that treatment effects can vary systematically with selection propensity.
- Estimated Time: 3.5 hours
One-Line Implementation
Stata: mtefe y (treatment = instrument), polynomial(2) mte(u_D)
R: ivmte(y ~ treatment | x1 + x2, instrument = ~ instrument, data = df, target = 'ate')
# Manual: estimate E[Y|P(Z)=p] nonparametrically, then MTE(u) = d/dp E[Y|P=p] at p=u
Motivating Example
A government is considering expanding a subsidized job training program. An earlier randomized encouragement design estimated a local average treatment effect (LATE) of $3,200 in annual earnings gains for compliers -- those induced to participate by the randomized encouragement letter.
The policy question is: if the program is expanded by relaxing eligibility rules, will the next group of participants benefit as much as the original compliers? The program director assumes yes and budgets accordingly.
But the original compliers were people at the margin of participation -- those who needed only a gentle push (the encouragement letter) to enroll. The expansion targets people who did not enroll even with encouragement. These individuals have higher unobserved resistance to treatment -- perhaps they face greater barriers, have lower expected returns, or are less motivated.
The marginal treatment effect (MTE) framework, developed by Heckman and Vytlacil (2005), reveals that the treatment effect varies systematically with the propensity to participate. When MTE declines with unobserved resistance -- as it often does when individuals self-select based on expected gains -- the LATE of $3,200 overstates the benefit for the next marginal participant. The policy-relevant treatment effect (PRTE) for the expansion might be only $1,800.
Without MTE, the program director would have over-predicted benefits by 78%. With MTE, she can compute the correct marginal return and design the expansion accordingly.
A. Overview
What Marginal Treatment Effects Does
The MTE framework starts with a fundamental insight: in a world with treatment effect heterogeneity, different causal estimands -- ATE, ATT, LATE -- are all weighted averages of the same underlying object: the MTE curve.
The MTE is defined as:
$$\text{MTE}(x, u_D) = E[Y_1 - Y_0 \mid X = x,\ U_D = u_D]$$
where $U_D$ is the unobserved component of the selection decision, normalized to be uniform on $[0, 1]$. An individual selects into treatment when the propensity score exceeds their unobserved resistance:
$$D = 1[P(Z) \geq U_D]$$
Individuals with low $U_D$ (low resistance) are eager participants -- they select into treatment even when $P(Z)$ is low. Individuals with high $U_D$ (high resistance) are reluctant -- they participate only when $P(Z)$ is very high.
The MTE curve traces how the treatment effect varies across this spectrum of unobserved resistance.
The Unifying Framework
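Every conventional treatment effect parameter is a weighted average of MTE:
$$\theta_j = \int_0^1 \text{MTE}(u)\,\omega_j(u)\,du$$
where $\omega_j(u)$ is the weight function specific to estimand $j$ (covariates $X$ are suppressed for brevity):
| Estimand | Weight function $\omega_j(u)$ | Interpretation |
|---|---|---|
| ATE | $1$ (uniform) | Averages over all resistance levels equally |
| ATT | $\Pr(P(Z) \geq u) / E[P(Z)]$ | Overweights eager participants (low $u_D$) |
| ATU | $\Pr(P(Z) < u) / (1 - E[P(Z)])$ | Overweights reluctant non-participants (high $u_D$) |
| LATE | $1 / (p_1 - p_0)$ for $u \in [p_0, p_1]$, zero otherwise | Uniform over the complier margin |
| PRTE | depends on the policy | Weights determined by who the policy moves |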
When MTE is flat (constant in $u_D$), all estimands are equal: LATE = ATE = ATT. This equality is the case of no essential heterogeneity. When MTE slopes downward (positive selection on gains), eager participants benefit more and ATT > ATE > ATU; LATE captures the effect somewhere in between, depending on the complier margin.
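The weighted-average representation can be checked numerically. The following is a minimal sketch, not a reference implementation: it assumes (beyond what the text states) that $P(Z)$ is uniform on $[0, 1]$, so that the ATT weight is $2(1 - u)$ and the ATU weight is $2u$, and it uses two hypothetical MTE curves, one flat at 0.35 and one declining as $0.6 - 0.4u$. The function names are illustrative.

```python
def integrate(f, lo=0.0, hi=1.0, n=10_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * h) for i in range(n)) * h


def target_parameters(mte, p0, p1):
    """Return ATE, ATT, ATU and LATE implied by an MTE curve.

    Assumption (not from the text): P(Z) ~ Uniform[0, 1], so
    Pr(P(Z) >= u) = 1 - u and E[P(Z)] = 0.5.
    """
    ate = integrate(lambda u: mte(u) * 1.0)              # uniform weight
    att = integrate(lambda u: mte(u) * (1.0 - u) / 0.5)  # tilts toward low u
    atu = integrate(lambda u: mte(u) * u / 0.5)          # tilts toward high u
    late = integrate(mte, p0, p1) / (p1 - p0)            # uniform on [p0, p1]
    return {"ATE": ate, "ATT": att, "ATU": atu, "LATE": late}


# Hypothetical curves: a flat MTE and a downward-sloping MTE.
flat = target_parameters(lambda u: 0.35, p0=0.30, p1=0.60)
declining = target_parameters(lambda u: 0.6 - 0.4 * u, p0=0.30, p1=0.60)

for name, result in (("flat", flat), ("declining", declining)):
    print(name, {k: round(v, 3) for k, v in result.items()})
```

With the flat curve all four parameters coincide; with the declining curve the ordering ATT > ATE > ATU appears, and LATE falls between them for this particular complier interval.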
How It Differs from Standard IV
Standard IV with a single instrument identifies a single number: the LATE for the complier subpopulation defined by that instrument. Different instruments identify different LATEs for different complier groups. MTE goes further by recovering the entire curve of treatment effects as a function of $u_D$, from which any target parameter can be computed as a weighted average.
B. Identification
For MTE to be identified, three conditions must hold (Heckman & Vytlacil, 2005):
Assumption 1: Valid Instrument
Plain language: The instrument $Z$ affects the outcome $Y$ only through its effect on treatment $D$. The instrument is relevant (it shifts $P(Z)$), exogenous (independent of potential outcomes and unobserved resistance), and satisfies the exclusion restriction.
Formally: $(Y_0, Y_1, U_D) \perp Z \mid X$ and $P(Z)$ is a nontrivial function of $Z$.
This assumption is the same requirement as for standard IV/LATE identification. The MTE framework does not weaken the instrument validity requirements.
Assumption 2: Threshold-Crossing Selection Model
Plain language: Treatment take-up is determined by a threshold-crossing rule: an individual participates when the propensity score exceeds their unobserved resistance. This rule means there exists a single latent index that governs selection, and the instrument operates through this index.
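Formally: $D = 1[P(Z) \geq U_D]$ where $U_D \sim \text{Uniform}[0, 1]$ after normalization. The selection equation can be derived from a latent utility model: $D = 1[\mu_D(Z) - V \geq 0]$ where $V$ represents unobserved costs/resistance, and $U_D = F_V(V)$ is the CDF transformation, so that $P(Z) = F_V(\mu_D(Z))$.
Assumption 3: Monotonicity of $P(Z)$ in $Z$
Plain language: Increasing the instrument value (weakly) increases the probability of treatment for all individuals. There are no "defiers" -- individuals for whom a higher $Z$ reduces participation.
Formally: for any two instrument values $z$ and $z'$ with $P(z') \geq P(z)$, $D_i(z') \geq D_i(z)$ for every individual $i$. This condition is the standard IV monotonicity assumption, embedded in the threshold-crossing model. Because $D = 1[P(Z) \geq U_D]$ and $P(\cdot)$ is the same function for everyone, monotonicity is automatically satisfied.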
The Local IV Identification Strategy
The key identification result is:
$$\text{MTE}(x, p) = \frac{\partial E[Y \mid X = x, P(Z) = p]}{\partial p}$$
The derivative of the conditional expectation of $Y$ with respect to the propensity score, evaluated at $p = u_D$, gives the MTE at $U_D = u_D$. Intuitively: a small increase in $P(Z)$ from $p$ to $p + dp$ induces participation by the marginal group with $U_D \in (p, p + dp]$. The corresponding change in the average outcome reveals the treatment effect for this marginal group.
This result means that variation in $P(Z)$ across different values of $Z$ traces out the MTE curve. Richer variation in the instrument (more values of $Z$, wider support of $P(Z)$) identifies the MTE over a larger portion of $[0, 1]$.
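The local IV logic can be illustrated with a small simulation. This is a sketch under stated assumptions rather than an estimator: the data-generating process is hypothetical (a linear MTE of $1.5 - u$, untreated outcomes that are pure noise, no covariates), and the propensity score takes only two known values, so the derivative is approximated by a finite difference between them.

```python
import random


def simulate_mean_outcome(p, n, rng):
    """Average observed outcome among n simulated units with propensity score p."""
    total = 0.0
    for _ in range(n):
        u = rng.random()           # unobserved resistance U_D ~ Uniform[0, 1]
        d = 1 if p >= u else 0     # threshold-crossing rule D = 1[P(Z) >= U_D]
        y0 = rng.gauss(0.0, 0.5)   # untreated outcome (hypothetical: pure noise)
        gain = 1.5 - u             # hypothetical individual gain, MTE(u) = 1.5 - u
        total += y0 + d * gain     # observed outcome Y = Y0 + D * (Y1 - Y0)
    return total / n


rng = random.Random(12345)
p_low, p_high = 0.40, 0.60

mean_low = simulate_mean_outcome(p_low, 200_000, rng)
mean_high = simulate_mean_outcome(p_high, 200_000, rng)

# Finite-difference slope of E[Y | P(Z) = p] around p = 0.5.
local_iv_slope = (mean_high - mean_low) / (p_high - p_low)
true_mte_at_half = 1.5 - 0.5

print("estimated slope:", round(local_iv_slope, 3))
print("true MTE(0.5):  ", true_mte_at_half)
```

Because the assumed MTE is linear, the finite difference over [0.40, 0.60] targets MTE(0.5) exactly; with a curved MTE it would instead recover the average MTE over that interval.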
Estimation Procedure
The Local IV Approach
The workhorse estimation procedure proceeds in three steps:
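Step 1: Estimate the propensity score. Estimate $P(Z) = \Pr(D = 1 \mid Z, X)$ using a probit or logit model, and denote the fitted values by $\hat{p}$.
Step 2: Estimate $E[Y \mid X, P(Z) = p]$ as a function of $p$. Use either:
- Parametric approach: regress $Y$ on $X$ (including a constant), $X \cdot \hat{p}$, and higher-order polynomial terms in $\hat{p}$ (for example $\hat{p}^2$ and $\hat{p}^3$)
- Semiparametric approach: local polynomial regression of $Y$ on $\hat{p}$
Step 3: Differentiate to obtain MTE. Compute $\partial \hat{E}[Y \mid X, P(Z) = p] / \partial p$ evaluated at $p = u_D$:
- For the parametric approach: differentiate the fitted polynomial analytically. With coefficient $\hat{\delta}$ on $X \cdot \hat{p}$ and $\hat{\gamma}_k$ on $\hat{p}^k$, $\widehat{\text{MTE}}(x, u_D) = x\hat{\delta} + \sum_{k \geq 2} k\,\hat{\gamma}_k\,u_D^{k-1}$
- For the semiparametric approach: numerical differentiation of the local polynomial fit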
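Steps 2 and 3 of the parametric approach can be sketched in a few lines. The example below is illustrative only: it skips Step 1 and takes the propensity scores as given, omits covariates, and uses noise-free hypothetical data generated from $E[Y \mid p] = 0.2 + 1.5p - 0.5p^2$, so that the implied MTE is $1.5 - u$. The helper names are hypothetical, and the least-squares fit is done by hand to keep the block dependency-free.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_polynomial(p, y, degree):
    """Least-squares coefficients [b0, b1, ...] of y on 1, p, p^2, ..."""
    k = degree + 1
    xtx = [[sum(pi ** (r + c) for pi in p) for c in range(k)] for r in range(k)]
    xty = [sum((pi ** r) * yi for pi, yi in zip(p, y)) for r in range(k)]
    return solve_linear_system(xtx, xty)


def mte_from_coefficients(coef, u):
    """Step 3: derivative of the fitted polynomial, evaluated at p = u."""
    return sum(j * coef[j] * u ** (j - 1) for j in range(1, len(coef)))


# Step 1 is skipped: propensity scores are taken as given (hypothetical values).
p_hat = [0.10 + 0.01 * i for i in range(81)]             # support [0.10, 0.90]
y_bar = [0.2 + 1.5 * p - 0.5 * p ** 2 for p in p_hat]    # noise-free E[Y | p]

coef = fit_polynomial(p_hat, y_bar, degree=2)            # Step 2
mte_at_half = mte_from_coefficients(coef, 0.5)           # Step 3
print("MTE(0.5) =", round(mte_at_half, 4))
```

With real data the fitted values would be noisy, and the derivative is only credible inside the observed range of the propensity score (here 0.10 to 0.90).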
C. Visual Intuition
The central object of the MTE framework is the MTE curve -- a plot of $\text{MTE}(u_D)$ against $u_D$. The shape of this curve reveals the nature of selection into treatment:
- Flat MTE: No essential heterogeneity. Treatment effects are homogeneous across the selection dimension. LATE = ATE = ATT. Standard IV is sufficient.
- Downward-sloping MTE: Positive selection on gains. Individuals who are most eager to participate (low $u_D$) benefit most. ATT > ATE > ATU. LATE overestimates ATE.
- Upward-sloping MTE: Negative selection on gains (rare). Reluctant participants benefit more. ATT < ATE < ATU.
- U-shaped or inverse-U MTE: Non-monotonic heterogeneity. The relationship between selection and gains is complex.
The weight functions determine how each estimand aggregates the MTE curve:
- ATE weights are uniform: every point on the MTE curve receives equal weight
- ATT weights tilt toward low $u_D$: the treated population is disproportionately composed of eager participants
- LATE weights are concentrated on the complier interval $[p_0, p_1]$: only the margin shifted by the instrument matters
- PRTE weights depend on the specific policy: a program expansion to the next 10% weights the portion of the MTE curve corresponding to those marginal participants
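The contrast between LATE weights and PRTE weights for an expansion can be made concrete. This is a minimal sketch with assumed numbers: a hypothetical linear MTE curve $1.5 - u$, an instrument whose compliers lie in [0.30, 0.60], and a stylized expansion that moves exactly the individuals with $U_D$ in [0.60, 0.70] into treatment (the "next 10%"). Under these assumptions both parameters are simple averages of the MTE over an interval.

```python
def average_mte(mte, lo, hi, n=10_000):
    """Average of the MTE curve over [lo, hi] using the midpoint rule."""
    h = (hi - lo) / n
    total = sum(mte(lo + (i + 0.5) * h) for i in range(n)) * h
    return total / (hi - lo)


def mte(u):
    """Hypothetical downward-sloping MTE curve."""
    return 1.5 - u


# LATE: uniform weight on the complier interval of the original instrument.
late = average_mte(mte, 0.30, 0.60)

# PRTE for the stylized expansion: uniform weight on the newly treated margin.
prte_next_10 = average_mte(mte, 0.60, 0.70)

print("LATE over [0.30, 0.60]:", round(late, 3))
print("PRTE over [0.60, 0.70]:", round(prte_next_10, 3))
```

Because the assumed MTE declines with resistance, the expansion reaches people with smaller gains than the original compliers, so the PRTE comes out below the LATE.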
MTE Curve: How Weight Functions Shape Estimands
Figure (interactive in the original): MTE(u) = 1.5 + (-1.0)u. ATE weights uniformly, ATT overweights eager participants (low u_D), and LATE averages over compliers in [0.30, 0.60].
Estimation Results
| Estimator | β̂ | Bias (relative to true β) |
|---|---|---|
| ATE (uniform) | 1.000 | +0.000 |
| ATT (treated) | 1.167 | +0.167 |
| LATE (compliers) | 1.050 | +0.050 |
| True β | 1.000 | |
Parameters behind the figure:
- Intercept = 1.5: MTE value at u_D = 0 (most eager participants)
- Slope = -1.0: how MTE changes with resistance (negative = positive selection on gains)
- p_0 = 0.30: lower bound of the complier subpopulation
- p_1 = 0.60: upper bound of the complier subpopulation
Why the difference?
The MTE slopes downward (slope = -1.0), indicating positive selection on gains: individuals with low unobserved resistance (eager participants) benefit more from treatment. ATT (1.167) exceeds ATE (1.000) because the treated are disproportionately eager participants who gain the most. Essential heterogeneity (ATT - ATE) = +0.167. This gap shows that extrapolating from the treated population to the whole population changes the estimated effect. LATE (1.050) captures the average treatment effect for compliers in the interval [0.30, 0.60]. Moving the complier interval shifts which part of the MTE curve the instrument identifies. A narrow interval near low u_D yields a LATE close to ATT; a narrow interval near high u_D pushes LATE below ATE.
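The three reported values can be verified by hand from MTE(u) = 1.5 - u. The calculation below is a sketch that assumes the propensity score is uniform on [0, 1], an assumption not stated in the text but consistent with the reported ATT; under it the ATT weight is 2(1 - u).

```latex
\begin{aligned}
\text{ATE}  &= \int_0^1 (1.5 - u)\,du = 1.5 - 0.5 = 1.000 \\
\text{ATT}  &= \int_0^1 (1.5 - u)\,2(1 - u)\,du
             = 2\int_0^1 \left(1.5 - 2.5u + u^2\right) du
             = 2\left(1.5 - 1.25 + \tfrac{1}{3}\right) = \tfrac{7}{6} \approx 1.167 \\
\text{LATE} &= \frac{1}{0.60 - 0.30}\int_{0.30}^{0.60} (1.5 - u)\,du
             = 1.5 - \frac{0.30 + 0.60}{2} = 1.050
\end{aligned}
```

The LATE line uses the fact that the average of a linear function over an interval equals its value at the midpoint, here u = 0.45.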
D. Mathematical Derivation
Don't worry about the notation yet — here's what this means in words: The marginal treatment effect is identified as the derivative of E[Y|P(Z)=p] with respect to the propensity score p, evaluated at p = u_D.
Setup. Under the threshold-crossing model, treatment selection is $D = 1[P(Z) \geq U_D]$, where $P(Z)$ is the propensity score and $U_D \sim \text{Uniform}[0, 1]$ captures unobserved resistance to treatment.
Step 1: Conditional expectation. The observed outcome conditional on the propensity score is:
$$E[Y \mid P(Z) = p] = E[Y_0] + \int_0^p \text{MTE}(u)\,du$$
This integral relationship shows that $E[Y \mid P(Z) = p]$ is, up to the constant $E[Y_0]$, a running sum of the MTE curve from 0 to $p$.
Step 2: Differentiate to recover MTE. Taking the derivative with respect to $p$:
$$\frac{\partial E[Y \mid P(Z) = p]}{\partial p} = \text{MTE}(p)$$
The MTE at any point $u_D$ is the slope of $E[Y \mid P(Z) = p]$ evaluated at $p = u_D$. Intuitively, a small increase in $p$ induces the next marginal person (with $U_D = p$) to take treatment, and the change in the average outcome reveals their treatment effect.
Step 3: Estimands as weighted integrals. Any treatment effect parameter can be written as:
$$\theta_j = \int_0^1 \text{MTE}(u)\,\omega_j(u)\,du$$
where $\omega_j(u)$ is a weight function specific to the estimand:
- ATE: $\omega_{ATE}(u) = 1$ (uniform)
- ATT: $\omega_{ATT}(u) = \Pr(P(Z) \geq u) / E[P(Z)]$ (tilted toward eager participants)
- LATE: $\omega_{LATE}(u) = 1[p_0 \leq u \leq p_1] / (p_1 - p_0)$ (concentrated on compliers)
The key identification result of Heckman and Vytlacil (2005) is that the MTE is recovered as the derivative of the conditional expectation of $Y$ with respect to the propensity score $p$.
Begin with the outcome equation under the threshold-crossing model. Because $Y = Y_0 + D(Y_1 - Y_0)$, the conditional expectation of $Y$ given $X = x$ and $P(Z) = p$ is:
$$E[Y \mid X = x, P(Z) = p] = E[Y_0 \mid X = x] + \int_0^p \text{MTE}(x, u)\,du$$
The integral accumulates treatment effects for all individuals whose unobserved resistance falls below the threshold $p$ -- these are the individuals induced into treatment. Differentiating both sides with respect to $p$ yields:
$$\frac{\partial E[Y \mid X = x, P(Z) = p]}{\partial p} = \text{MTE}(x, p)$$
Intuitively, a marginal increase in the propensity score from $p$ to $p + dp$ induces participation by individuals with $U_D \in (p, p + dp]$. The resulting change in the conditional expectation of $Y$ reveals the treatment effect for this marginal group -- precisely the MTE evaluated at $u_D = p$.
This derivative-based identification strategy is the foundation of the local IV estimator: estimate $E[Y \mid X, P(Z) = p]$ as a smooth function of $p$ (parametrically or semiparametrically), then differentiate to recover the MTE curve.
E. Implementation
# Using the ivmte package (Mogstad, Santos, Torgovitsky)
library(ivmte)
mte_fit <- ivmte(
  data = df,
  target = "ate",
  m0 = ~ x1 + x2 + u + I(u^2),
  m1 = ~ x1 + x2 + u + I(u^2),
  ivlike = y ~ treatment + x1 + x2,
  propensity = treatment ~ instrument + x1 + x2,
  instrument = ~ instrument
)
print(mte_fit)
F. Diagnostics
Test for Essential Heterogeneity
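The first diagnostic question is whether MTE is flat. If it is, LATE = ATE and the MTE framework adds nothing beyond standard IV. Test by adding nonlinear terms in the estimated propensity score $\hat{p}$ (and their interactions with covariates, where relevant) to the outcome equation.
A joint F-test on the nonlinear terms in $\hat{p}$ tests whether MTE varies with $u_D$: if $E[Y \mid P(Z) = p]$ is linear in $p$, its derivative, the MTE, is constant. Rejection implies essential heterogeneity: LATE != ATE in general.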
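One simple version of this diagnostic checks whether the outcome is linear in the propensity score. The sketch below is an assumption-laden illustration, not the procedure of any particular package: it omits covariates, takes propensity scores as given, uses hypothetical data with a curved $E[Y \mid p]$ plus small deterministic alternating "noise", and compares a linear fit with a quadratic fit through the usual nested-model F-statistic.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols_rss(columns, y):
    """Residual sum of squares from regressing y on the given regressor columns."""
    k = len(columns)
    xtx = [[sum(a * b for a, b in zip(columns[r], columns[c])) for c in range(k)]
           for r in range(k)]
    xty = [sum(a * b for a, b in zip(columns[r], y)) for r in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(beta[j] * columns[j][i] for j in range(k)) for i in range(len(y))]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


# Hypothetical data: curved E[Y | p] (so the MTE is not flat) plus small noise.
p_hat = [0.10 + 0.01 * i for i in range(81)]
noise = [0.02 if i % 2 == 0 else -0.02 for i in range(81)]
y = [0.2 + 1.5 * p - 0.5 * p ** 2 + e for p, e in zip(p_hat, noise)]

ones = [1.0] * len(p_hat)
p_squared = [p ** 2 for p in p_hat]

rss_linear = ols_rss([ones, p_hat], y)                 # restricted: flat MTE
rss_quadratic = ols_rss([ones, p_hat, p_squared], y)   # unrestricted: linear MTE

n_obs, n_params, n_restrictions = len(y), 3, 1
f_stat = ((rss_linear - rss_quadratic) / n_restrictions) / (
    rss_quadratic / (n_obs - n_params)
)
print("F statistic for the quadratic term:", round(f_stat, 1))
```

The statistic would be compared with an F(1, 78) critical value, which is roughly 4 at the 5% level. A real analysis would also need standard errors that account for the propensity score being estimated, for example via the bootstrap.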
Propensity Score Support
The MTE is identified only over the support of $P(Z)$. Report:
- The range of $P(Z)$ (e.g., the 1st and 99th percentiles)
- What fraction of $[0, 1]$ is covered
- Whether the target parameter (e.g., ATE, which requires full support) can be point-identified or only bounded
If the support is narrow, consider using the partial identification approach of Mogstad et al. (2018).
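The support report can be produced with a few lines of code. The following is a minimal sketch: the function name, the nearest-rank percentile rule, and the example scores are all illustrative assumptions, and the input would in practice be the fitted propensity scores from the first stage.

```python
import math


def support_summary(pscores, lower_pct=1.0, upper_pct=99.0):
    """Return (low, high, coverage) for a list of estimated propensity scores.

    Percentiles use the nearest-rank rule; coverage is the share of the
    unit interval [0, 1] spanned by [low, high].
    """
    ordered = sorted(pscores)
    n = len(ordered)

    def percentile(q):
        rank = max(1, math.ceil(q / 100.0 * n))  # 1-based nearest rank
        return ordered[rank - 1]

    low = percentile(lower_pct)
    high = percentile(upper_pct)
    return low, high, high - low


# Hypothetical fitted propensity scores spread evenly from 0.080 to 0.910.
example_scores = [i / 1000 for i in range(80, 911)]

low, high, coverage = support_summary(example_scores)
print(f"P(Z) support (1st to 99th percentile): [{low:.3f}, {high:.3f}]")
print(f"Share of [0, 1] covered: {coverage:.1%}")
```

Trimming at the 1st and 99th percentiles, rather than using the minimum and maximum, keeps a handful of extreme fitted values from overstating the region where the MTE is actually identified.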
Visual Inspection of the MTE Curve
Plot the estimated MTE curve with confidence bands. Look for:
- The slope: is MTE declining, rising, or flat?
- Confidence band width: is the MTE precisely estimated?
- Boundary behavior: are the endpoints of the identified region reliable?
Compare Estimands
Compute ATE, ATT, ATU, and LATE from the estimated MTE curve. Large differences signal important treatment effect heterogeneity and indicate that LATE should not be interpreted as a general treatment effect.
Interpreting Your Results
Reading the Output
The key outputs from an MTE analysis are:
| Output | Interpretation |
|---|---|
| MTE curve | Plot of treatment effect vs. unobserved resistance. Slope reveals selection patterns. |
| ATE | Population-average treatment effect -- uniform weight on MTE |
| ATT | Effect on the treated -- overweights eager participants |
| LATE | Effect for compliers -- concentrated on the instrument's margin |
| PRTE | Effect for the specific policy change -- weights determined by the policy |
| Essential heterogeneity test | F-test on interactions; rejection means LATE != ATE |
| P(Z) support | Range over which MTE is identified |
What to Report
A well-reported MTE analysis should include:
- The MTE curve with pointwise confidence bands
- ATE, ATT, LATE computed from the MTE, with standard errors
- PRTE for the specific policy change under consideration
- Essential heterogeneity test (F-test on interactions)
- Propensity score support -- range and coverage fraction
- Sensitivity to polynomial order and bandwidth
- First-stage diagnostics for the propensity score model
- Discussion of the threshold-crossing model's plausibility
G. What Can Go Wrong
LATE != ATE When Treatment Effects Are Heterogeneous
MTE is flat: treatment effect does not vary with unobserved resistance. All estimands agree.
LATE = 0.35, ATE = 0.35, ATT = 0.35. The MTE curve is a horizontal line. LATE generalizes perfectly to the entire population.
Insufficient Propensity Score Support
The instrument generates wide variation in P(Z), covering most of [0, 1]. MTE is identified over a broad range.
P(Z) ranges from 0.08 to 0.91. The MTE curve is estimated precisely over [0.08, 0.91], covering 83% of the unit interval. ATE can be computed with minimal extrapolation, and bounds are tight.
Violated Threshold-Crossing Model
Selection follows a threshold-crossing rule: individuals compare their propensity score P(Z) to their private resistance U_D and participate when P(Z) >= U_D.
The monotonicity assumption holds. The first stage is well-behaved. The MTE curve is smooth and the parametric and semiparametric estimates agree.
H. Practice
H.1 Concept Checks
The ATE weight function is uniform over [0, 1], but the LATE weight function is peaked at the complier margin. Why does this mean LATE can differ from ATE?
When does LATE equal ATE?
A researcher estimates the MTE curve using an instrument whose propensity score P(Z) ranges from 0.30 to 0.65. She then computes ATE = 0.25 by extrapolating the MTE curve to cover [0, 1]. What is wrong with this approach?
H.2 Guided Exercise
Interpreting an MTE Analysis of Returns to College
You study the returns to college education using proximity to a four-year college as an instrument, following the MTE approach. The propensity score (probability of attending college) is estimated via probit and ranges from 0.12 to 0.78. You estimate a quadratic MTE curve and compute target parameters. Your output:

| Parameter | Estimate | SE |
|---|---|---|
| MTE at u_D = 0.15 | 0.52 | 0.08 |
| MTE at u_D = 0.40 | 0.38 | 0.06 |
| MTE at u_D = 0.60 | 0.25 | 0.07 |
| MTE at u_D = 0.75 | 0.15 | 0.11 |
| ATE (over identified region) | 0.33 | 0.05 |
| ATT | 0.44 | 0.04 |
| LATE (proximity IV) | 0.38 | 0.06 |

Essential heterogeneity F-test: F = 6.8, p = 0.009

Sensitivity (polynomial order):
- Linear MTE: ATE = 0.31, ATT = 0.42
- Quadratic MTE: ATE = 0.33, ATT = 0.44
- Cubic MTE: ATE = 0.34, ATT = 0.45
H.3 Error Detective
Read the analysis below carefully and identify the errors.
Select all errors you can find:
H.4 You Are the Referee
Read the paper summary below and write a brief referee critique (2-3 sentences) of the identification strategy.
Paper Summary
The authors study the returns to a subsidized vocational training program using the MTE framework. They use distance to the nearest training center as an instrument for participation, arguing it satisfies the exclusion restriction. The propensity score, estimated via probit, ranges from 0.18 to 0.62. They estimate a linear MTE curve and find a downward slope, concluding that eager participants benefit more than reluctant ones. They then extrapolate the MTE to the full [0, 1] interval and report ATE = 0.15, ATT = 0.32, and LATE = 0.24. They recommend expanding the program based on the positive ATE.
Key Table
| Variable | Coefficient | SE | p-value |
|---|---|---|---|
| MTE intercept | 0.42 | 0.09 | <0.001 |
| MTE slope | -0.38 | 0.18 | 0.035 |
| ATE (extrapolated) | 0.15 | 0.07 | 0.032 |
| ATT | 0.32 | 0.05 | <0.001 |
| LATE | 0.24 | 0.06 | <0.001 |
| P(Z) support | [0.18, 0.62] | | |
| Essential heterogeneity F | 4.5 | | 0.035 |
| N | 3,800 | | |
Authors' Identification Claim
The authors argue that distance to the training center shifts participation without directly affecting earnings, and that the MTE framework allows them to recover the full curve of treatment effects and extrapolate to the population ATE.
I. Swap-In: When to Use Something Else
- IV/2SLS: when you only need the LATE and do not need to extrapolate to other target parameters. If essential heterogeneity is not a concern (or not testable), standard IV is simpler and more robust.
- Matching: when selection is on observables and you want ATE or ATT. Matching addresses a different selection problem (selection on $X$) than MTE (selection on $U_D$).
- Causal Forests: when you want to estimate heterogeneous treatment effects as a function of observed covariates. Causal forests estimate CATE but do not address selection on unobservables.
- Heckman Selection Model: when the selection issue is sample selection (observing outcomes only for a non-random subsample) rather than treatment effect heterogeneity along the selection dimension. The Heckman model and MTE share the same structural foundation but answer different questions.
Limitations
- Requires sufficient variation in the propensity score. With a binary instrument, the MTE is identified only over the interval $[p_0, p_1]$, where $p_0$ and $p_1$ are the propensity scores at the two instrument values. A narrow support means ATE and ATT cannot be point-identified without parametric extrapolation.
- Threshold-crossing model is restrictive. The model assumes a single latent index governs selection. This assumption may not hold with multiple treatments, strategic interactions, or complex decision processes.
- Computationally intensive. Semiparametric MTE estimation requires local polynomial regression and numerical differentiation, which can be sensitive to bandwidth and polynomial order choices.
- Requires a valid instrument. All the standard IV assumptions (relevance, exogeneity, exclusion restriction) must hold. MTE does not relax these requirements; it adds the threshold-crossing structure.
- Precision can be poor. The MTE involves estimating a derivative (a second-order object), which is inherently noisier than estimating a conditional mean (a first-order object). Confidence bands on the MTE curve can be wide, especially near the boundaries of the propensity score support.
Paper Library
Foundational (3)
Heckman, J. J., & Vytlacil, E. (2005). Structural Equations, Treatment Effects, and Econometric Policy Evaluation.
Heckman and Vytlacil developed the marginal treatment effect (MTE) framework, showing that the MTE -- defined as the treatment effect for individuals at the margin of indifference between treatment and no treatment -- is the fundamental building block of treatment effect parameters. They proved that ATE, ATT, LATE, and policy-relevant treatment effects (PRTE) are all weighted averages of the MTE curve, with each estimand using a different weight function. The paper unified the treatment effects literature by demonstrating that IV estimates with different instruments recover different weighted averages of the same underlying MTE curve, resolving the puzzle of why different instruments produce different LATE estimates. This framework provides the theoretical foundation for extrapolating from experimental or quasi-experimental estimates to policy-relevant treatment effects.
Brinch, C. N., Mogstad, M., & Wiswall, M. (2017). Beyond LATE with a Discrete Instrument.
Brinch, Mogstad, and Wiswall showed how to estimate the MTE curve even with a discrete (binary or multivalued) instrument, which is the most common case in practice. They demonstrated that the local IV approach -- estimating E[Y|P(Z)=p] as a function of the propensity score and differentiating -- can be implemented with discrete instruments by imposing functional-form restrictions (e.g., linearity) on the MTE over the identified region. Their empirical application studies the interaction between the quantity and quality of children, that is, the effect of family size on child outcomes. They found that the effects of family size vary in magnitude and even in sign along the distribution of unobserved resistance, heterogeneity that a single LATE estimate conceals.
Mogstad, M., Santos, A., & Torgovitsky, A. (2018). Using Instrumental Variables for Inference about Policy Relevant Treatment Parameters.
Mogstad, Santos, and Torgovitsky developed a framework for using instrumental variables to conduct inference on policy-relevant treatment effects under weaker assumptions than full MTE identification. They showed that even when the MTE is only partially identified (due to limited support of the propensity score), informative bounds on ATE, ATT, and PRTE can be derived by combining the identified portion of the MTE with shape restrictions. Their approach uses linear programming to compute sharp bounds on the target parameter given the data and assumptions. The paper provides the R package ivmte for implementation and demonstrates that useful policy conclusions can be drawn even without point-identifying the entire MTE curve.
Application (1)
Cornelissen, T., Dustmann, C., Raute, A., & Schonberg, U. (2016). From LATE to MTE: Alternative Methods for the Evaluation of Policy Interventions.
Cornelissen, Dustmann, Raute, and Schonberg provided an accessible applied guide to MTE estimation, illustrating the method in the context of child care subsidies in Germany. They estimated the MTE of child care attendance on child development, using variation in subsidy generosity as an instrument. The declining MTE curve showed that children whose parents were most eager to use child care (low unobserved resistance) benefited most, while children at the margin of the subsidy expansion benefited less. The paper clearly demonstrates how to move from LATE to policy-relevant treatment effects and serves as an exemplary applied MTE analysis, showing how the framework can inform the design of program expansions.