Lee Bounds for Attrition
When point identification fails — especially due to differential attrition — informative bounds can still be useful.
When You Cannot Pin It Down
Sometimes you have to admit that the data cannot tell you exactly what the answer is — only that it lies within a range. This situation is the world of partial identification, and if you find it frustrating, you are in good company. But it turns out that knowing an effect is "between 0.05 and 0.25" is often far more useful — and far more honest — than reporting a precise but biased point estimate.
The most common setting where partial identification arises in applied economics is sample selection: the outcome you care about is only observed for a non-random subset of your sample. The classic example is wages. You want to estimate the effect of a job training program on wages, but wages are only observed for people who are employed. If the training program itself changes who is employed (as it almost certainly does), then comparing wages among the employed is contaminated by selection.
Sample selection is not a minor technical issue. It threatens the validity of any study where the treatment affects whether the outcome is observed.
Why It Matters
If you ignore sample selection and simply compare outcomes among observed units, your treatment effect estimate is biased — potentially severely. Lee bounds give you honest, assumption-lean bounds on the true effect, letting you report what the data can actually support rather than a precise but misleading point estimate. Reviewers increasingly expect attrition analysis in experimental work, and Lee bounds are the standard tool for it.
Why Point Identification Fails with Differential Attrition
Consider a randomized experiment evaluating a job training program. You randomly assign 1,000 people to training and 1,000 to control. After six months, you observe:
- Training group: 700 employed (70%), wages observed
- Control group: 600 employed (60%), wages observed
You want to estimate the effect of training on wages. But here is the problem: the 700 employed in the treatment group and the 600 employed in the control group are different populations. Training caused 100 more people to be employed. Those 100 "marginal" workers — people who would not have been employed without the training — are probably different from the "always-employed" workers (e.g., lower skill, lower potential wages).
When you compare average wages among the employed, you are comparing:
- Treatment: a mixture of always-employed workers and newly employed workers
- Control: only always-employed workers
The comparison is contaminated by the composition change. You cannot separate the effect of training on wages from the effect of training on who is observed. Randomization guarantees balance in the full sample, but it says nothing about the selected subsample.
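The contamination is easy to see in a tiny simulation (an illustrative sketch, not data from the study above: the `skill` variable and all numbers are made up). The true wage effect is fixed at +1.00 for everyone, yet the employed-only comparison recovers something much smaller, because treatment pulls lower-skill workers into employment and drags down the treated group's observed mean:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Latent "skill" drives both employment and wages (illustrative numbers)
skill = rng.normal(0, 1, n)
treated = rng.integers(0, 2, n).astype(bool)

# Treatment lowers the employment threshold: marginal (low-skill)
# workers are employed only if treated
employed = np.where(treated, skill > -0.8, skill > -0.3)

# True wage effect of treatment is exactly +1.00 for every individual
wage = 15 + 2 * skill + 1.0 * treated

naive = wage[treated & employed].mean() - wage[~treated & employed].mean()
print(f"true effect: 1.00, naive employed-only estimate: {naive:.2f}")
```

The naive estimate lands near 0.50, roughly half the true effect, purely because the composition of the employed differs across arms.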
Two Approaches: Heckman vs. Bounds
The Heckman Selection Model
The traditional solution to sample selection is the Heckman (1979) model.
The Heckman model requires:
- A selection equation (a model of who is observed)
- An exclusion restriction (a variable that affects selection but not the outcome)
- Joint normality of the error terms
These conditions are strong requirements. The exclusion restriction is often hard to justify — what affects employment but not wages? Joint normality is a functional form assumption that may not hold. In practice, Heckman estimates can be fragile and sensitive to specification.
The Bounds Approach
The alternative, pioneered by Charles Manski and applied powerfully by David Lee, is to give up on point identification and instead bound the treatment effect using weaker assumptions.
The key insight (Manski, 2003): if you are willing to assume less, you learn less — but what you learn is more credible. A wide but honest bound beats a precise but questionable point estimate.
Lee (2009) Bounds: The Method
Lee (2009) developed a practical, widely used method for bounding treatment effects in the presence of sample selection.
The Monotonicity Assumption
Lee bounds require a single key assumption: monotonicity of selection. The treatment must affect selection in one direction only:

$$S_i(1) \geq S_i(0) \quad \text{for all } i$$

where $S_i(d)$ is the selection indicator (1 = observed, 0 = not observed) under treatment status $d$.
In the job training example: training can only increase (or leave unchanged) the probability of employment for every individual. No one who would have been employed without training becomes unemployed because of training.
The Trimming Procedure
Under monotonicity, the treatment group contains everyone the control group contains plus some extra individuals who were "brought in" by the treatment. To make the groups comparable, we need to remove those extra individuals. The question is: which ones?
We do not know. But we can construct the best and worst cases.
Step 1: Compute the selection rates:

$$p_1 = \Pr(S = 1 \mid D = 1), \qquad p_0 = \Pr(S = 1 \mid D = 0)$$

Under monotonicity with the treatment increasing selection: $p_1 \geq p_0$.
Step 2: Compute the trimming proportion:

$$q = \frac{p_1 - p_0}{p_1}$$
This quantity is the fraction of the treatment group's observed sample that was "brought in" by the treatment.
Step 3: Trim the treatment group to make it comparable:
- Upper bound: Remove the bottom $q$ fraction of the treated group's outcome distribution. The remaining treated individuals have the highest outcomes. Comparing them to the control group gives an upper bound on the treatment effect for always-observed individuals.
- Lower bound: Remove the top $q$ fraction of the treated group's outcome distribution. Comparing the remaining (lowest) outcomes to the control group gives a lower bound.
Don't worry about the notation yet — here's what this means in words: Under monotonicity, the extra individuals brought into the sample by treatment could be at any point in the outcome distribution. The worst case (lower bound) is that they are at the top; the best case (upper bound) is that they are at the bottom.
Under monotonicity, we can partition the treatment group's observed sample into two types:
- Always-observed ($S_i(1) = S_i(0) = 1$): individuals who would be observed regardless of treatment status.
- Compliers ($S_i(1) = 1, S_i(0) = 0$): individuals brought into the sample by treatment.
The control group's observed sample contains only always-observed types (under monotonicity). So the ideal comparison is:

$$\tau = E[Y_i(1) \mid S_i(1) = S_i(0) = 1] - E[Y_i(0) \mid S_i(1) = S_i(0) = 1]$$

We observe $E[Y_i(0) \mid S_i(1) = S_i(0) = 1] = E[Y \mid D = 0, S = 1]$ directly from the control group. But the treatment group mixes always-observed and compliers. We do not know which individuals are compliers.
The worst case for the treatment effect is that compliers have the highest outcomes (so removing them from the top gives the lowest remaining mean). The best case is that compliers have the lowest outcomes (so removing them from the bottom gives the highest remaining mean).
Formally:

$$\tau^{\text{UB}} = E[Y \mid D = 1, S = 1, Y \geq y_{q}] - E[Y \mid D = 0, S = 1]$$
$$\tau^{\text{LB}} = E[Y \mid D = 1, S = 1, Y \leq y_{1-q}] - E[Y \mid D = 0, S = 1]$$

where $y_{q}$ and $y_{1-q}$ are the $q$-th and $(1-q)$-th quantiles of the observed treated outcome distribution.
These bounds are sharp — they are the tightest possible bounds given only the monotonicity assumption and random assignment. No additional restriction can narrow them without additional assumptions.
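The trimming formulas translate into a few lines of code. This is an illustrative Python sketch on simulated data (`lee_trim_bounds` is a hypothetical helper; a fuller R implementation appears later in the section):

```python
import numpy as np

def lee_trim_bounds(y1, y0, p1, p0):
    """Lee bounds by trimming the treated outcomes.

    y1, y0: observed outcomes in treatment/control.
    Assumes monotonicity with treatment increasing selection (p1 >= p0)."""
    q = (p1 - p0) / p1
    # Upper bound: drop the bottom q of the treated distribution
    upper = y1[y1 >= np.quantile(y1, q)].mean() - y0.mean()
    # Lower bound: drop the top q of the treated distribution
    lower = y1[y1 <= np.quantile(y1, 1 - q)].mean() - y0.mean()
    return lower, upper

rng = np.random.default_rng(1)
y1 = rng.normal(15.5, 3, 700)   # 700 of 1,000 treated observed
y0 = rng.normal(15.0, 3, 600)   # 600 of 1,000 controls observed
lower, upper = lee_trim_bounds(y1, y0, 0.70, 0.60)
naive = y1.mean() - y0.mean()
print(f"bounds: [{lower:.2f}, {upper:.2f}], naive: {naive:.2f}")
```

By construction the bounds bracket the naive estimate: trimming from the bottom can only raise the treated mean, and trimming from the top can only lower it.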
A Worked Example
Return to our training program:
- Treatment: 700 of 1,000 employed (70%)
- Control: 600 of 1,000 employed (60%)
Step 1: $p_1 = 0.70$, $p_0 = 0.60$
Step 2: $q = \frac{0.70 - 0.60}{0.70} \approx 0.143$
So 14.3% of the treatment group's employed workers were "brought in" by the program.
Step 3: Trim the treatment group.
Suppose the average wage in the control group is $15/hour. The average wage in the full treatment group is $15.50/hour.
- Trim the bottom 14.3% of the treatment wage distribution. The remaining 85.7% have an average wage of $16.20. Upper bound = $16.20 - $15.00 = $1.20/hour.
- Trim the top 14.3% of the treatment wage distribution. The remaining 85.7% have an average wage of $14.80. Lower bound = $14.80 - $15.00 = -$0.20/hour.
The Lee bounds are [-$0.20, $1.20]. The training program's effect on wages (for always-employed workers) is somewhere in this range. Notice the bounds include zero — we cannot rule out that training has no wage effect, even though the naive comparison shows a positive difference.
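The arithmetic of the worked example can be verified directly (a quick sanity check; the trimmed means of $16.20 and $14.80 are stipulated in the text, not derived here):

```python
# Step 1-2: selection rates and trimming proportion
p1, p0 = 0.70, 0.60
q = (p1 - p0) / p1
print(f"trimming proportion: {q:.1%}")   # 14.3%

# Step 3: bounds from the (stipulated) trimmed treatment means
mean_control = 15.00
upper = 16.20 - mean_control   # mean after dropping the bottom 14.3%
lower = 14.80 - mean_control   # mean after dropping the top 14.3%
print(f"Lee bounds: [{lower:+.2f}, {upper:+.2f}]")   # [-0.20, +1.20]
```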
Interactive: Attrition and Bounds
Lee Bounds Explorer
Adjust the differential attrition rate — the gap between treatment and control group observation rates — and watch the Lee bounds widen. When attrition is symmetric, the bounds collapse to a narrow interval around the point estimate. As differential attrition grows, the bounds expand, reflecting increasing uncertainty about the treatment effect for always-observed individuals.
Try setting both observation rates to the same value. The bounds collapse because there is no differential selection. Now widen the gap and watch the bounds expand — this widening is the cost of not knowing who was "brought in" by the treatment.
Tightening the Bounds
Lee bounds can be wide, especially when differential attrition is large. Two main strategies help:
1. Condition on Pre-Treatment Covariates
If you have baseline covariates that predict the outcome, compute Lee bounds within covariate cells and then average. Within-cell outcome distributions are less dispersed, so the bounds within each cell are tighter. The overall bounds (a weighted average across cells) are tighter than the unconditional bounds.
This tightening works because trimming removes a fixed fraction of the outcome distribution. If the within-cell distribution has less spread, trimming removes less extreme values, producing tighter bounds.
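The cell-by-cell procedure can be sketched as follows (Python, simulated data, all names illustrative). Because the covariate strongly predicts the outcome, within-cell trimming removes less extreme values, and the weighted-average bounds come out tighter than the unconditional ones:

```python
import numpy as np

def trim_bounds(y1, y0, p1, p0):
    # Plain Lee trimming; assumes monotonicity with p1 >= p0
    q = (p1 - p0) / p1
    if q <= 0:
        d = y1.mean() - y0.mean()
        return d, d
    upper = y1[y1 >= np.quantile(y1, q)].mean() - y0.mean()
    lower = y1[y1 <= np.quantile(y1, 1 - q)].mean() - y0.mean()
    return lower, upper

def lee_bounds_by_cell(y, treated, observed, cell):
    # Bounds within each covariate cell, averaged with cell shares as weights
    lo = up = 0.0
    for c in np.unique(cell):
        m = cell == c
        p1 = observed[m & treated].mean()
        p0 = observed[m & ~treated].mean()
        b = trim_bounds(y[m & treated & observed],
                        y[m & ~treated & observed], p1, p0)
        lo += m.mean() * b[0]
        up += m.mean() * b[1]
    return lo, up

# Simulated data: a binary covariate (say, education) strongly predicts wages
rng = np.random.default_rng(2)
n = 40_000
cell = rng.integers(0, 2, n)
treated = rng.integers(0, 2, n).astype(bool)
y = 12 + 6 * cell + rng.normal(0, 1, n)          # true treatment effect is zero
observed = rng.random(n) < np.where(treated, 0.75, 0.60)

uncond = trim_bounds(y[treated & observed], y[~treated & observed],
                     observed[treated].mean(), observed[~treated].mean())
cond = lee_bounds_by_cell(y, treated, observed, cell)
print("unconditional width:", uncond[1] - uncond[0])
print("conditional width:  ", cond[1] - cond[0])
```

In this simulation the conditional bounds are several times narrower, because each cell's outcome distribution has far less spread than the pooled one.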
2. Reduce Differential Attrition
The most effective strategy is prevention. Minimize differential attrition through:
- Intensive follow-up and tracking
- Administrative data linkage (which eliminates survey non-response)
- Incentive payments for survey completion
- Short, simple outcome measures that maximize response
Every percentage point of differential attrition widens the bounds. A 2-percentage-point differential is far more manageable than a 15-percentage-point differential. Accounting for expected attrition in your power analysis at the design stage helps ensure the resulting bounds remain informative.
Lee Bounds vs. Manski Worst-Case Bounds
It is important not to confuse Lee bounds with Manski's "worst-case" bounds, which use no assumptions at all beyond the support of the outcome:
| | Manski Bounds | Lee Bounds |
|---|---|---|
| Assumption | Only that $Y$ is bounded in $[y_{\min}, y_{\max}]$ | Monotonicity of selection |
| Width | Often very wide (depends on outcome range) | Narrower (depends on differential attrition) |
| Impute missing outcomes as | Extreme values ($y_{\min}$ or $y_{\max}$) | Trimmed quantiles from observed data |
| When useful | As a baseline; when monotonicity is not credible | When monotonicity is plausible |
Manski bounds are often uninformatively wide because they impute the worst possible outcomes for missing data. Lee bounds are typically much tighter because monotonicity restricts the set of possible imputations.
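The contrast is easy to quantify. A minimal sketch of Manski worst-case bounds, applied to the worked example's numbers under the assumption (supplied by the analyst, not the data) that wages lie in [0, 50]:

```python
def manski_bounds(mean1, r1, mean0, r0, y_min, y_max):
    """Worst-case bounds on the average effect with missing outcomes.

    mean1/mean0: observed-outcome means; r1/r0: observation rates.
    Missing outcomes are imputed at the assumed support limits."""
    lower = (r1 * mean1 + (1 - r1) * y_min) - (r0 * mean0 + (1 - r0) * y_max)
    upper = (r1 * mean1 + (1 - r1) * y_max) - (r0 * mean0 + (1 - r0) * y_min)
    return lower, upper

# Worked example: 70% observed with mean $15.50 vs 60% with mean $15.00
lo, up = manski_bounds(15.50, 0.70, 15.00, 0.60, 0.0, 50.0)
print(f"Manski bounds: [{lo:.2f}, {up:.2f}]")   # [-18.15, 16.85]
```

Against the Lee bounds of [-$0.20, $1.20], the worst-case interval of roughly [-$18, $17] is essentially uninformative, which is exactly the price of dropping the monotonicity assumption.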
When to Use Lee Bounds
| Setting | Use Lee Bounds? | Why |
|---|---|---|
| RCT with differential attrition | Yes | The standard tool for this setting |
| RCT with symmetric attrition | Usually not needed | No differential selection; bias is unlikely |
| Natural experiment (e.g., DiD) with outcome only for selected sample | Yes, if monotonicity holds | Same logic applies to quasi-experiments |
| Study where treatment reduces observation (e.g., mortality) | Yes, but reverse the direction | If treatment reduces selection, trim the control group |
| Observational study with selection | Possible, but requires more caution | Lee bounds assume random assignment; additional assumptions needed |
How to Do It: Code
lee_bounds <- function(y, treatment, selection, alpha = 0.05, n_boot = 1000) {
  # Observation (selection) rates by arm
  p1 <- mean(selection[treatment == 1])
  p0 <- mean(selection[treatment == 0])
  # Observed outcomes
  y1 <- y[treatment == 1 & selection == 1]
  y0 <- y[treatment == 0 & selection == 1]
  compute_bounds <- function(y1, y0, p1, p0) {
    if (abs(p1 - p0) < 1e-10) {
      # No differential attrition: bounds collapse to the point estimate
      est <- mean(y1) - mean(y0)
      return(c(lower = est, upper = est))
    }
    if (p1 > p0) {
      # Treatment increases selection: trim the treatment group
      q <- (p1 - p0) / p1
      # Upper bound: drop the bottom q of the treated outcome distribution
      upper <- mean(y1[y1 >= quantile(y1, q)]) - mean(y0)
      # Lower bound: drop the top q of the treated outcome distribution
      lower <- mean(y1[y1 <= quantile(y1, 1 - q)]) - mean(y0)
    } else {
      # Treatment reduces selection: trim the control group instead
      q <- (p0 - p1) / p0
      # Upper bound: drop the top q of the control outcome distribution
      upper <- mean(y1) - mean(y0[y0 <= quantile(y0, 1 - q)])
      # Lower bound: drop the bottom q of the control outcome distribution
      lower <- mean(y1) - mean(y0[y0 >= quantile(y0, q)])
    }
    c(lower = lower, upper = upper)
  }
  # Point estimates of the bounds
  bounds <- compute_bounds(y1, y0, p1, p0)
  # Bootstrap confidence intervals (resample within arms)
  boot_bounds <- replicate(n_boot, {
    idx1 <- sample(which(treatment == 1), replace = TRUE)
    idx0 <- sample(which(treatment == 0), replace = TRUE)
    compute_bounds(y[idx1[selection[idx1] == 1]],
                   y[idx0[selection[idx0] == 1]],
                   mean(selection[idx1]), mean(selection[idx0]))
  })
  ci_lower <- quantile(boot_bounds["lower", ], c(alpha/2, 1 - alpha/2))
  ci_upper <- quantile(boot_bounds["upper", ], c(alpha/2, 1 - alpha/2))
  list(
    lower = bounds["lower"],
    upper = bounds["upper"],
    ci_lower = ci_lower,
    ci_upper = ci_upper,
    trimming_proportion = abs(p1 - p0) / max(p1, p0),
    p_treated = p1,
    p_control = p0
  )
}
# Usage
result <- lee_bounds(
y = df$wage,
treatment = df$treatment,
selection = df$employed
)
cat(sprintf("Lee Bounds: [%.3f, %.3f]\n", result$lower, result$upper))
cat(sprintf("95%% CI for lower bound: [%.3f, %.3f]\n",
result$ci_lower[1], result$ci_lower[2]))
cat(sprintf("95%% CI for upper bound: [%.3f, %.3f]\n",
result$ci_upper[1], result$ci_upper[2]))

Stata: Manual Implementation
* Step 1: Selection rates
sum employed if treatment == 1
local p1 = r(mean)
sum employed if treatment == 0
local p0 = r(mean)
* Step 2: Trimming proportion
local q = (`p1' - `p0') / `p1'
di "Trimming proportion: " `q'
* Step 3: Quantiles for trimming
_pctile wage if treatment == 1 & employed == 1, p(`=`q'*100')
local cutoff_lower = r(r1)
_pctile wage if treatment == 1 & employed == 1, p(`=(1-`q')*100')
local cutoff_upper = r(r1)
* Upper bound: keep above q-th percentile
sum wage if treatment == 1 & employed == 1 & wage >= `cutoff_lower'
local mean_t_upper = r(mean)
sum wage if treatment == 0 & employed == 1
local mean_c = r(mean)
di "Upper bound: " `mean_t_upper' - `mean_c'
* Lower bound: keep below (1-q)-th percentile
sum wage if treatment == 1 & employed == 1 & wage <= `cutoff_upper'
local mean_t_lower = r(mean)
di "Lower bound: " `mean_t_lower' - `mean_c'

How to Report Lee Bounds
A well-reported Lee bounds analysis includes:
- The attrition or selection rates for treatment and control groups.
- A test for differential attrition (is the difference statistically significant?).
- The monotonicity assumption, stated explicitly and justified for your setting.
- The bounds with confidence intervals (bootstrapped).
- Comparison with the naive estimate (ignoring selection).
- Whether covariates were used to tighten bounds.
Example write-up:
Employment rates are 70% in the treatment group and 60% in the control group (p < 0.001), indicating that the training program increased employment. Because wages are only observed for employed individuals, the naive comparison of treatment and control wages is contaminated by differential selection. Following Lee (2009), we compute bounds on the wage effect under the assumption that training weakly increases employment for all individuals (monotonicity). The trimming proportion is 14.3%. The Lee bounds for the treatment effect on hourly wages are [-$0.20, $1.20] (95% CI for the lower bound: [-$0.85, $0.45]; 95% CI for the upper bound: [$0.55, $1.85]). The bounds include zero, so we cannot reject that training has no effect on wages for always-employed workers. Conditioning on baseline covariates (age, gender, education) tightens the bounds to [$0.05, $0.95].
Common Mistakes
A related inferential point: confidence intervals for a partially identified parameter are constructed differently from standard intervals around a point estimate (Imbens & Manski, 2004).
Concept Check
In an RCT of a tutoring program, 90% of treatment students take the end-of-year exam, compared to 80% of control students. The monotonicity assumption for Lee bounds requires that tutoring weakly increases exam-taking for every student: no student who would have taken the exam without tutoring skips it because of tutoring.
Paper Library
Foundational (7)
Lee, D. S. (2009). Training, Wages, and Sample Selection: Estimating Sharp Bounds on Treatment Effects.
Lee developed sharp nonparametric bounds on treatment effects in the presence of sample selection, requiring only a monotonicity assumption (that treatment affects selection in one direction). These bounds are widely used to address attrition and selective sample composition in randomized experiments.
Manski, C. F. (1990). Nonparametric Bounds on Treatment Effects.
Manski introduced the partial identification approach to treatment effects, showing that even without strong assumptions, one can bound causal effects using the observed data. His worst-case bounds framework laid the theoretical foundation for Lee's sharper bounds under the monotonicity assumption.
Horowitz, J. L., & Manski, C. F. (2000). Nonparametric Analysis of Randomized Experiments with Missing Covariate and Outcome Data.
Horowitz and Manski extended the bounding approach to experiments with missing data on both covariates and outcomes. They showed how to construct valid bounds under different assumptions about the missing data mechanism, providing a principled alternative to complete-case analysis and imputation.
Gerard, F., Rokkanen, M., & Rothe, C. (2020). Bounds on Treatment Effects in Regression Discontinuity Designs with a Manipulated Running Variable.
Gerard, Rokkanen, and Rothe extended Lee-type bounding methods to regression discontinuity designs where the running variable is subject to manipulation. They showed how to construct bounds on treatment effects that account for strategic sorting around the cutoff.
Heckman, J. J. (1979). Sample Selection Bias as a Specification Error.
Heckman showed that sample selection—where the observed sample is not random—leads to omitted variable bias, and proposed a two-step correction using the inverse Mills ratio. This foundational paper on selection bias motivated later nonparametric bounding approaches, including Lee bounds, as alternatives that require weaker distributional assumptions.
Imbens, G. W., & Manski, C. F. (2004). Confidence Intervals for Partially Identified Parameters.
Imbens and Manski developed methods for constructing valid confidence intervals when parameters are only partially identified—that is, when the data and assumptions narrow the parameter to a set rather than a point. This paper provides the inferential foundation for reporting uncertainty around bounds estimates, including Lee bounds.
Manski, C. F. (2003). Partial Identification of Probability Distributions.
Manski's monograph provided a comprehensive treatment of partial identification, showing how to derive informative bounds on parameters of interest when point identification is not possible. This book formalized and extended his earlier work on bounding treatment effects and is the definitive reference for the theoretical framework underlying Lee bounds.
Application (4)
Angrist, J., Bettinger, E., & Kremer, M. (2006). Long-Term Educational Consequences of Secondary School Vouchers: Evidence from Administrative Records in Colombia.
Angrist, Bettinger, and Kremer applied Lee bounds to address attrition in a school voucher experiment in Colombia. This paper is one of the earliest and most prominent applications of Lee bounds in development economics, demonstrating how the method handles selective attrition in a real policy evaluation.
Semenova, V. (2025). Generalized Lee Bounds.
Semenova generalized Lee bounds to allow for covariates and machine learning estimation of nuisance functions, improving the tightness of bounds while maintaining their nonparametric validity. This paper connects the Lee bounds literature to the modern machine learning causal inference literature.
Kline, P., & Walters, C. R. (2016). Evaluating Public Programs with Close Substitutes: The Case of Head Start.
Kline and Walters applied bounding methods related to Lee bounds to evaluate Head Start in the presence of substitute programs. Their analysis demonstrates how partial identification and bounding approaches can address complex selection issues in program evaluation.
Crepon, B., Duflo, E., Gurgand, M., Rathelot, R., & Zamora, P. (2013). Do Labor Market Policies Have Displacement Effects? Evidence from a Clustered Randomized Experiment.
Crepon and colleagues evaluated a job placement assistance program in France using a large-scale clustered RCT and applied Lee bounds to address differential attrition from the sample. The paper demonstrates best-practice use of Lee bounds in a labor economics setting, showing that the program's employment effects remain robust to bounding even under worst-case attrition assumptions.