Lee Bounds for Attrition
When point identification fails — especially due to differential attrition — informative bounds can still be useful.
When You Cannot Pin It Down
Sometimes you have to admit that the data cannot tell you exactly what the answer is, only that it lies within a range. This situation is the world of partial identification, and if you find it frustrating, you are in good company. But it turns out that knowing an effect is "between 0.05 and 0.25" is often far more useful, and far more honest, than reporting a precise but biased point estimate.
The most common setting where partial identification arises in applied economics is sample selection: the outcome you care about is only observed for a non-random subset of your sample. The classic example is wages. You want to estimate the effect of a job training program on wages, but wages are only observed for people who are employed. If the training program itself changes who is employed (as it almost certainly does), then comparing wages among the employed is contaminated by selection.
Sample selection is not a minor technical issue. It threatens the validity of any study where the treatment affects whether the outcome is observed.
Why It Matters
If you ignore sample selection and simply compare outcomes among observed units, your treatment effect estimate is biased — potentially severely. Lee bounds give you honest, assumption-lean bounds on the true effect, letting you report what the data can actually support rather than a precise but misleading point estimate. Reviewers increasingly expect attrition analysis in experimental work, and Lee bounds are a standard tool for it.
Why Point Identification Fails with Differential Attrition
Consider a randomized experiment evaluating a job training program. You randomly assign 1,000 people to training and 1,000 to control. After six months, you observe:
- Training group: 700 employed (70%), wages observed
- Control group: 600 employed (60%), wages observed
You want to estimate the effect of training on wages. But here is the problem: the 700 employed in the treatment group and the 600 employed in the control group are different populations. Training caused 100 more people to be employed. Those 100 "marginal" workers — people who would not have been employed without the training — are probably different from the "always-employed" workers (e.g., lower skill, lower potential wages).
When you compare average wages among the employed, you are comparing:
- Treatment: a mixture of always-employed workers and newly employed workers
- Control: only always-employed workers
The comparison is contaminated by the composition change. You cannot separate the effect of training on wages from the effect of training on who is observed. Randomization guarantees balance in the full sample, but it says nothing about the selected subsample.
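A small simulation makes the contamination concrete. The data-generating process below is hypothetical: the true wage effect of training is set to zero, training raises employment by pulling in lower-skill workers, and the naive employed-only comparison is nonetheless clearly negative.

```r
# Hypothetical simulation: zero true wage effect, yet the naive
# employed-only comparison is biased by differential selection.
set.seed(42)
n     <- 100000
skill <- rnorm(n)                       # latent skill drives wages AND employment
treat <- rbinom(n, 1, 0.5)              # random assignment
# Training lowers the employment threshold, pulling in lower-skill workers
employed <- as.numeric(skill + 0.3 * treat > qnorm(0.4))  # ~60% in control
wage  <- 15 + 2 * skill                 # training has NO direct effect on wages
naive <- mean(wage[treat == 1 & employed == 1]) -
  mean(wage[treat == 0 & employed == 1])
round(naive, 2)  # negative: marginal workers drag down the treated mean
```

Despite a true effect of exactly zero, the naive difference is negative, because the treated employed pool contains the lower-skill marginal workers that the control employed pool does not.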
Two Approaches: Heckman vs. Bounds
The Heckman Approach
The traditional solution to sample selection is the Heckman (1979) model.
The Heckman selection model requires:
- A selection equation (a model of who is observed)
- An exclusion restriction (a variable that affects selection but not the outcome)
- Joint normality of the error terms
These are strong requirements. The exclusion restriction is often hard to justify: what affects employment but not wages? Joint normality is a functional-form assumption that may not hold. In practice, Heckman estimates can be fragile and sensitive to specification.
The Bounds Approach
The alternative, pioneered by Charles Manski and applied powerfully by David Lee, is to give up on point identification and instead bound the treatment effect using weaker assumptions (Manski, 2003).
The key insight: if you are willing to assume less, you learn less — but what you learn is more credible. A wide but honest bound beats a precise but questionable point estimate.
Lee (2009) Bounds: The Method
Lee (2009) developed a practical, widely used method for bounding treatment effects in the presence of sample selection.
The Monotonicity Assumption
Lee bounds require a single key assumption: monotonicity. The treatment must affect selection in one direction only:
S_i(1) ≥ S_i(0) for all i,
where S_i(d) is the selection indicator (1 = observed, 0 = not observed) under treatment status d.
In the job training example: training can only increase (or leave unchanged) the probability of employment for every individual. No one who would have been employed without training becomes unemployed because of training.
The Trimming Procedure
Under monotonicity, the treatment group contains everyone the control group contains plus some extra individuals who were "brought in" by the treatment. To make the groups comparable, we need to remove those extra individuals. The question is: which ones?
We do not know. But we can construct the best and worst cases.
Step 1: Compute the selection rates, p1 = P(S = 1 | D = 1) and p0 = P(S = 1 | D = 0).
Under monotonicity with the treatment increasing selection: p1 ≥ p0.
Step 2: Compute the trimming proportion, q = (p1 - p0) / p1.
This quantity is the fraction of the treatment group's observed sample that was "brought in" by the treatment.
Step 3: Trim the treatment group to make it comparable:
- Upper bound: Remove the bottom q fraction of the treated group's outcome distribution. The remaining treated individuals have the highest outcomes. Comparing them to the control group gives an upper bound on the treatment effect for always-observed individuals.
- Lower bound: Remove the top q fraction of the treated group's outcome distribution. Comparing the remaining (lowest) outcomes to the control group gives a lower bound.
Don't worry about the notation yet — here's what this means in words: Under monotonicity, the extra individuals brought into the sample by treatment could be at any point in the outcome distribution. The worst case (lower bound) is that they are at the top; the best case (upper bound) is that they are at the bottom.
Under monotonicity, we can partition the treatment group's observed sample into two types:
- Always-observed (S(1) = S(0) = 1): individuals who would be observed regardless of treatment status.
- Compliers (S(1) = 1, S(0) = 0): individuals brought into the sample by treatment.
The control group's observed sample contains only always-observed types (under monotonicity). So the ideal comparison is E[Y(1) | always-observed] - E[Y(0) | always-observed].
We observe E[Y(0) | always-observed] = E[Y | D = 0, S = 1] directly from the control group. But the treatment group's observed sample mixes always-observed and compliers. We do not know which individuals are compliers.
The worst case for the treatment effect is that compliers have the highest outcomes (so removing them from the top gives the lowest remaining mean). The best case is that compliers have the lowest outcomes (so removing them from the bottom gives the highest remaining mean).
Formally:
Lower bound: E[Y | D = 1, S = 1, Y ≤ y_{1-q}] - E[Y | D = 0, S = 1]
Upper bound: E[Y | D = 1, S = 1, Y ≥ y_q] - E[Y | D = 0, S = 1]
where y_q is the q-th quantile of the treated outcome distribution.
These bounds are sharp — they are the tightest possible bounds given only the monotonicity assumption and random assignment. No additional restriction can narrow them without additional assumptions.
A Worked Example
Return to our training program:
- Treatment: 700 of 1,000 employed (70%)
- Control: 600 of 1,000 employed (60%)
Step 1: p1 = 700/1000 = 0.70, p0 = 600/1000 = 0.60
Step 2: q = (0.70 - 0.60) / 0.70 ≈ 0.143
So 14.3% of the treatment group's employed workers were "brought in" by the program.
Step 3: Trim the treatment group.
Suppose the average wage in the control group is $15/hour. The average wage in the full treatment group is $15.50/hour.
- Trim the bottom 14.3% of the treatment wage distribution. The remaining 85.7% have an average wage of $16.20. Upper bound = $16.20 - $15.00 = $1.20/hour.
- Trim the top 14.3% of the treatment wage distribution. The remaining 85.7% have an average wage of $14.80. Lower bound = $14.80 - $15.00 = -$0.20/hour.
The Lee bounds are [-$0.20, $1.20]. The training program's effect on wages (for always-employed workers) is somewhere in this range. Notice the bounds include zero — we cannot rule out that training has no wage effect, even though the naive comparison shows a positive difference.
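The arithmetic of the worked example can be checked in a few lines of base R. Note that the trimmed means ($16.20 and $14.80) are stipulated numbers from the example, not computed from data:

```r
p1 <- 700 / 1000                 # treated observation rate
p0 <- 600 / 1000                 # control observation rate
q  <- (p1 - p0) / p1             # trimming proportion, ~0.143
upper <- 16.20 - 15.00           # trimmed-from-below mean minus control mean
lower <- 14.80 - 15.00           # trimmed-from-above mean minus control mean
c(lower = lower, upper = upper, q = round(q, 3))
```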
Attrition and Bounds
When the two observation rates are equal, the bounds collapse to a point: there is no differential selection. As the gap between the rates widens, the bounds expand; this widening is the price of not knowing who was "brought in" by the treatment.
Tightening the Bounds
Lee bounds can be wide, especially when differential attrition is large. Two main strategies help:
1. Condition on Pre-Treatment Covariates
If you have baseline covariates that predict the outcome, compute Lee bounds within covariate cells and then average. Within-cell outcome distributions are less dispersed, so the bounds within each cell are tighter. The overall bounds (a weighted average across cells) are tighter than the unconditional bounds.
This tightening works because trimming removes a fixed fraction of the outcome distribution. If the within-cell distribution has less spread, trimming removes less extreme values, producing tighter bounds.
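A minimal sketch of the covariate-cell strategy in base R. The function name and the choice to weight cells by the number of observed controls are illustrative assumptions, and the sketch assumes p1 ≥ p0 within every cell:

```r
# Sketch: tighten Lee bounds by trimming within covariate cells, then
# averaging across cells (cell weights: observed control counts).
lee_bounds_by_cell <- function(y, treat, sel, cell) {
  cells <- unique(cell)
  res <- sapply(cells, function(g) {
    i  <- cell == g
    p1 <- mean(sel[i & treat == 1])
    p0 <- mean(sel[i & treat == 0])
    q  <- (p1 - p0) / p1                       # cell-specific trimming proportion
    y1 <- y[i & treat == 1 & sel == 1]
    y0 <- y[i & treat == 0 & sel == 1]
    up <- mean(y1[y1 >= quantile(y1, q)]) - mean(y0)
    lo <- mean(y1[y1 <= quantile(y1, 1 - q)]) - mean(y0)
    c(lo = lo, up = up, w = sum(i & treat == 0 & sel == 1))
  })
  w <- res["w", ] / sum(res["w", ])            # normalized cell weights
  c(lower = sum(w * res["lo", ]), upper = sum(w * res["up", ]))
}
```

Because within-cell outcome distributions are less dispersed, each cell's trimmed means move less, so the weighted-average bounds are typically tighter than the unconditional ones.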
2. Reduce Differential Attrition
The most effective strategy is prevention. Minimize differential attrition through:
- Intensive follow-up and tracking
- Administrative data linkage (which eliminates survey non-response)
- Incentive payments for survey completion
- Short, simple outcome measures that maximize response
Every percentage point of differential attrition widens the bounds. A 2-percentage-point differential is far more manageable than a 15-percentage-point differential. Accounting for expected attrition in your power analysis at the design stage helps ensure the resulting bounds remain informative.
Lee Bounds vs. Manski Worst-Case Bounds
It is important not to confuse Lee bounds with Manski's "worst-case" bounds, which use no assumptions at all beyond the support of the outcome:
| | Manski Bounds | Lee Bounds |
|---|---|---|
| Assumption | Only that Y is bounded in [y_min, y_max] | Monotonicity of selection |
| Width | Often very wide (depends on outcome range) | Narrower (depends on differential attrition) |
| Impute missing outcomes as | Extreme values (y_min or y_max) | Trimmed quantiles from observed data |
| When useful | As a baseline, or when monotonicity is not credible | When monotonicity is plausible |
Manski bounds are often uninformatively wide because they impute the worst possible outcomes for missing data. Lee bounds are typically much tighter because monotonicity restricts the set of possible imputations.
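For comparison, Manski's worst-case bounds take only a few lines. The sketch below assumes a known outcome support [y_min, y_max] and imputes missing outcomes at the extremes; the function name is illustrative:

```r
# Sketch: Manski worst-case bounds on E[Y(1)] - E[Y(0)] under a known
# outcome support [y_min, y_max]; missing outcomes imputed at the extremes.
manski_bounds <- function(y, treat, sel, y_min, y_max) {
  p1 <- mean(sel[treat == 1]); p0 <- mean(sel[treat == 0])
  m1 <- mean(y[treat == 1 & sel == 1]); m0 <- mean(y[treat == 0 & sel == 1])
  # E[Y(d)] = P(observed) * observed mean + P(missing) * imputed value
  e1_lo <- p1 * m1 + (1 - p1) * y_min; e1_hi <- p1 * m1 + (1 - p1) * y_max
  e0_lo <- p0 * m0 + (1 - p0) * y_min; e0_hi <- p0 * m0 + (1 - p0) * y_max
  c(lower = e1_lo - e0_hi, upper = e1_hi - e0_lo)
}
```

The width of these bounds is (1 - p1 + 1 - p0) * (y_max - y_min), which grows with both the missing shares and the outcome range, which is why they are often uninformative.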
When to Use Lee Bounds
| Setting | Use Lee Bounds? | Why |
|---|---|---|
| RCT with differential attrition | Yes | The standard tool for this setting |
| RCT with symmetric attrition | Consider as robustness check | Equal attrition rates do not rule out differential selection on unobservables |
| Natural experiment (e.g., DiD) with outcome only for selected sample | Yes, if monotonicity holds | Same logic applies to quasi-experiments |
| Study where treatment reduces observation (e.g., mortality) | Yes, but reverse the direction | If treatment reduces selection, trim the control group |
| Observational study with selection | Possible, but requires more caution | Lee bounds assume random assignment; additional assumptions needed |
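For the reversed case in the table above, where treatment reduces observation, the roles flip and the control group is trimmed. A minimal sketch (hypothetical function name; assumes monotonicity in the opposite direction, p0 ≥ p1):

```r
# Sketch: Lee bounds when treatment REDUCES observation (p0 > p1),
# so the observed control sample is the one with "extra" individuals.
reverse_lee_bounds <- function(y, treat, sel) {
  p1 <- mean(sel[treat == 1]); p0 <- mean(sel[treat == 0])
  stopifnot(p0 >= p1)                 # this sketch covers only the reversed case
  q  <- (p0 - p1) / p0                # fraction of observed controls to trim
  y1 <- y[treat == 1 & sel == 1]
  y0 <- y[treat == 0 & sel == 1]
  # Lower bound: drop the BOTTOM q of controls (raises the control mean)
  lower <- mean(y1) - mean(y0[y0 >= quantile(y0, q)])
  # Upper bound: drop the TOP q of controls (lowers the control mean)
  upper <- mean(y1) - mean(y0[y0 <= quantile(y0, 1 - q)])
  c(lower = lower, upper = upper)
}
```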
How to Do It: Code
# Lee (2009) bounds with bootstrapped confidence intervals
# Requires: base R only (no additional packages)
lee_bounds <- function(y, treatment, selection, alpha = 0.05, n_boot = 1000) {
# --- Step 1: Compute observation (selection) rates by group ---
p1 <- mean(selection[treatment == 1]) # P(observed | treated)
p0 <- mean(selection[treatment == 0]) # P(observed | control)
# --- Step 2: Compute the trimming proportion ---
# Under monotonicity with treatment (weakly) increasing selection, the
# treated group is trimmed. If selection is higher in the control group,
# swap the groups and trim the control distribution instead.
if (p1 < p0) {
stop("p0 > p1: treatment appears to reduce selection; swap groups and trim the control group.")
}
q <- (p1 - p0) / p1  # fraction of the observed treated sample to trim
# --- Step 3: Extract observed outcomes for each group ---
y1 <- y[treatment == 1 & selection == 1]
y0 <- y[treatment == 0 & selection == 1]
# --- Step 4: Core trimming function ---
compute_bounds <- function(y1, y0, q) {
if (q < 1e-10) {
# No differential attrition: bounds collapse to a point
return(c(lower = mean(y1) - mean(y0), upper = mean(y1) - mean(y0)))
}
# Upper bound: trim the bottom q fraction (remove lowest outcomes)
cutoff_low <- quantile(y1, q)
upper <- mean(y1[y1 >= cutoff_low]) - mean(y0)
# Lower bound: trim the top q fraction (remove highest outcomes)
cutoff_high <- quantile(y1, 1 - q)
lower <- mean(y1[y1 <= cutoff_high]) - mean(y0)
c(lower = lower, upper = upper)
}
# Point estimates of the bounds
bounds <- compute_bounds(y1, y0, q)
# --- Step 5: Bootstrap confidence intervals ---
# Resample treatment and control groups separately to preserve design
n1 <- sum(treatment == 1)
n0 <- sum(treatment == 0)
boot_bounds <- replicate(n_boot, {
# Draw bootstrap samples within each group
idx1 <- sample(which(treatment == 1), n1, replace = TRUE)
idx0 <- sample(which(treatment == 0), n0, replace = TRUE)
# Recompute selection rates and bounds on bootstrap sample
b_y1 <- y[idx1[selection[idx1] == 1]]
b_y0 <- y[idx0[selection[idx0] == 1]]
b_p1 <- mean(selection[idx1])
b_p0 <- mean(selection[idx0])
b_q <- max(0, (b_p1 - b_p0) / b_p1)
compute_bounds(b_y1, b_y0, b_q)
})
# Percentile-based CIs for each bound endpoint
ci_lower <- quantile(boot_bounds["lower", ], c(alpha/2, 1-alpha/2))
ci_upper <- quantile(boot_bounds["upper", ], c(alpha/2, 1-alpha/2))
# --- Step 6: Return results ---
list(
lower = bounds["lower"],
upper = bounds["upper"],
ci_lower = ci_lower,
ci_upper = ci_upper,
trimming_proportion = q,
p_treated = p1,
p_control = p0
)
}
# --- Usage ---
result <- lee_bounds(
y = df$wage,
treatment = df$treatment,
selection = df$employed # 1 = observed, 0 = attrited
)
# Print the bounds and their confidence intervals
cat(sprintf("Lee Bounds: [%.3f, %.3f]\n", result$lower, result$upper))
cat(sprintf("95%% CI for lower bound: [%.3f, %.3f]\n",
result$ci_lower[1], result$ci_lower[2]))
cat(sprintf("95%% CI for upper bound: [%.3f, %.3f]\n",
result$ci_upper[1], result$ci_upper[2]))

Manual Implementation
# Requires: base R (no additional packages)
# Manual Lee bounds — step-by-step walkthrough
# --- Step 1: Compute selection (observation) rates ---
# p1 = P(observed | treated), p0 = P(observed | control)
p1 <- mean(df$employed[df$treatment == 1])
p0 <- mean(df$employed[df$treatment == 0])
# --- Step 2: Compute the trimming proportion ---
# q = fraction of the treated group "brought in" by treatment
q <- (p1 - p0) / p1
cat("Trimming proportion:", q, "\n")
# --- Step 3: Extract observed outcomes and find trimming cutoffs ---
y1 <- df$wage[df$treatment == 1 & df$employed == 1] # treated wages
y0 <- df$wage[df$treatment == 0 & df$employed == 1] # control wages
# Quantiles define where to cut the treated distribution
cutoff_lower <- quantile(y1, q) # for upper bound: trim below this
cutoff_upper <- quantile(y1, 1 - q) # for lower bound: trim above this
# --- Step 4: Compute bounds ---
# Upper bound: remove bottom q% (assume extra workers have lowest wages)
upper <- mean(y1[y1 >= cutoff_lower]) - mean(y0)
cat("Upper bound:", upper, "\n")
# Lower bound: remove top q% (assume extra workers have highest wages)
lower <- mean(y1[y1 <= cutoff_upper]) - mean(y0)
cat("Lower bound:", lower, "\n")

How to Report Lee Bounds
A well-reported Lee bounds analysis includes:
- The attrition or selection rates for treatment and control groups.
- A test for differential attrition (is the difference statistically significant?).
- The monotonicity assumption, stated explicitly and justified for your setting.
- The bounds with confidence intervals (bootstrapped).
- Comparison with the naive estimate (ignoring selection).
- Whether covariates were used to tighten bounds.
Example write-up:
Employment rates are 70% in the treatment group and 60% in the control group (p < 0.001), indicating that the training program increased employment. Because wages are only observed for employed individuals, the naive comparison of treatment and control wages is contaminated by differential selection. Following Lee (2009), we compute bounds on the wage effect under the assumption that training weakly increases employment for all individuals (monotonicity). The trimming proportion is 14.3%. The Lee bounds for the treatment effect on hourly wages are [-$0.20, $1.20] (95% CI for the lower bound: [-$0.85, $0.45]; 95% CI for the upper bound: [$0.55, $1.85]). The bounds include zero, so we cannot reject that training has no effect on wages for always-employed workers. Conditioning on baseline covariates (age, gender, education) tightens the bounds to [$0.05, $0.95].
Concept Check
In an RCT of a tutoring program, 90% of treatment students take the end-of-year exam, compared to 80% of control students. The monotonicity assumption for Lee bounds requires that tutoring weakly increases exam-taking for every student: no student who would have taken the exam without tutoring skips it because of tutoring. Under that assumption, the trimming proportion is q = (0.90 - 0.80) / 0.90 ≈ 0.111.
Paper Library
Foundational (8)
Gerard, F., Rokkanen, M., & Rothe, C. (2020). Bounds on Treatment Effects in Regression Discontinuity Designs with a Manipulated Running Variable.
Gerard, Rokkanen, and Rothe study regression-discontinuity settings in which the running variable is manipulated, so conventional point identification fails. They show that treatment effects are still partially identified and derive sharp bounds under a general model in which the extent of manipulation is learned from the data.
Heckman, J. J. (1979). Sample Selection Bias as a Specification Error.
Heckman introduces the two-step estimator for correcting sample selection bias using the inverse Mills ratio. The paper shows that selection bias can be treated as an omitted variable problem, where the omitted variable is the conditional expectation of the error term given selection. One of the most cited papers in econometrics.
Horowitz, J. L., & Manski, C. F. (2000). Nonparametric Analysis of Randomized Experiments with Missing Covariate and Outcome Data.
Horowitz and Manski extend the bounding approach to experiments with missing data on both covariates and outcomes. They show how to construct valid bounds under different assumptions about the missing data mechanism, providing a principled alternative to complete-case analysis and imputation.
Imbens, G. W., & Manski, C. F. (2004). Confidence Intervals for Partially Identified Parameters.
Imbens and Manski develop methods for constructing valid confidence intervals when parameters are only partially identified—that is, when the data and assumptions narrow the parameter to a set rather than a point. This paper provides the inferential foundation for reporting uncertainty around bounds estimates, including Lee bounds.
Lee, D. S. (2009). Training, Wages, and Sample Selection: Estimating Sharp Bounds on Treatment Effects.
Lee develops sharp nonparametric bounds on treatment effects in the presence of sample selection, requiring only a monotonicity assumption (that treatment affects selection in one direction). These bounds are widely used to address attrition and selective sample composition in randomized experiments.
Manski, C. F. (1990). Nonparametric Bounds on Treatment Effects.
Manski introduces the partial identification approach to treatment effects, showing that even without strong assumptions, one can bound causal effects using the observed data. His worst-case bounds framework lays the theoretical foundation for Lee's sharper bounds under the monotonicity assumption.
Manski, C. F. (2003). Partial Identification of Probability Distributions.
Manski's monograph provides a comprehensive treatment of partial identification, showing how to derive informative bounds on parameters of interest when point identification is not possible. This book formalizes and extends his earlier work on bounding treatment effects and is the standard reference for the theoretical framework underlying Lee bounds.
Semenova, V. (2025). Generalized Lee Bounds.
Semenova generalizes Lee bounds to allow for covariates and machine learning estimation of nuisance functions, improving the tightness of bounds while maintaining their nonparametric validity. This paper connects the Lee bounds literature to the modern machine learning causal inference literature.
Application (2)
Angrist, J., Bettinger, E., & Kremer, M. (2006). Long-Term Educational Consequences of Secondary School Vouchers: Evidence from Administrative Records in Colombia.
Angrist, Bettinger, and Kremer use administrative records to study the long-term effects of Colombia's PACES school voucher lottery, finding that vouchers increase secondary school completion rates by 15-20% and raise college admissions test scores by 0.2 standard deviations. They correct for differential test-taking rates between lottery winners and losers using bounding methods. The paper demonstrates how administrative data and lottery-based instruments enable credible long-term policy evaluation.
Kline, P., & Walters, C. R. (2016). Evaluating Public Programs with Close Substitutes: The Case of Head Start.
Kline and Walters develop a semi-parametric selection model to evaluate Head Start in the presence of close substitute preschool programs, estimating both average and marginal treatment effects. They find that Head Start's effects vary substantially with the quality of available alternatives, and that the program passes a cost-benefit test for the average participant. The paper demonstrates how accounting for alternative program availability changes the interpretation of experimental treatment effects.