Heckman Selection Model
Corrects for sample selection bias when the outcome is observed only for a non-random subset of the population, using a two-equation system with an exclusion restriction.
Quick Reference
- When to Use
- When your outcome variable is observed only for a selected sample -- e.g., wages only for employed workers, firm performance only for surviving firms, deal returns only for completed transactions.
- Key Assumption
- Joint normality of the error terms in the selection and outcome equations. An exclusion restriction: at least one variable affects selection but not the outcome. Without the exclusion restriction, identification relies entirely on the normality assumption.
- Common Mistake
- Not having a credible exclusion restriction and relying solely on the normality assumption for identification, which produces fragile estimates.
- Estimated Time
- 3 hours
One-Line Implementation
Stata: heckman wage education experience, select(employed = children married) twostep
R: selection(employed ~ children + married + education + experience, wage ~ education + experience, data = df)
Python: Heckman(df['wage'], df[['const','education','experience']], df['employed'], df[['const','children','married','education','experience']]).fit()
Motivating Example
A labor economist wants to estimate the returns to education for married women. She has a large survey with information on education, age, number of children, husband's income, and — for women who are employed — their hourly wage.
Here is the problem: wages are only observed for women who work. Of the 753 women in her sample, only 428 are employed. The remaining 325 women have missing wages — not because of data errors, but because they chose not to participate in the labor market.
If she simply runs OLS on the 428 working women, she estimates wages conditional on employment. But employment itself is a choice that depends on potential wages. Women with very low potential wages may choose not to work (because the opportunity cost of leisure or home production exceeds their market wage). By restricting the sample to employed women, the researcher is selecting on an outcome that is correlated with the error term in the wage equation.
This problem is known as sample selection bias (a form of incidental truncation). It is not the same as omitted variable bias or measurement error — it arises because the sample is non-randomly drawn from the population. The observed wage distribution is truncated from below: women with the lowest potential wages are disproportionately absent from the sample.
The Heckman selection model (Heckman, 1979) solves this problem. It jointly models two processes: (1) the selection equation — whether a woman works — and (2) the outcome equation — what she earns if she works. By estimating the correlation between the errors in these two equations, the model corrects for the non-random selection into the observed sample. The key correction term is the inverse Mills ratio, which captures the expected value of the error in the wage equation conditional on the woman being selected into the sample.
This approach earned James Heckman the 2000 Nobel Prize in Economics and remains one of the most widely used corrections for sample selection bias across the social sciences (Mroz, 1987).
A. Overview
What the Heckman Model Does
The Heckman selection model corrects for bias that arises when the outcome variable is observed only for a non-random subset of the population. It does so by modeling the selection process explicitly and incorporating information about selection into the outcome equation.
The model consists of two equations:
Selection equation (who is observed):

$$S_i^* = Z_i\gamma + v_i, \qquad S_i = 1 \text{ if } S_i^* > 0, \quad S_i = 0 \text{ otherwise}$$

where $S_i = 1$ if the outcome is observed (e.g., the woman works) and $S_i = 0$ otherwise. $Z_i$ is a vector of covariates that affect selection, and $v_i$ is the error term.
Outcome equation (what is the outcome, conditional on being observed):

$$Y_i = X_i\beta + \varepsilon_i, \quad \text{observed only if } S_i = 1$$

where $Y_i$ is the outcome of interest (e.g., wages), $X_i$ is a vector of covariates, and $\varepsilon_i$ is the error term.
The key assumption is that the errors are jointly normally distributed:

$$\begin{pmatrix} v_i \\ \varepsilon_i \end{pmatrix} \sim N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho\sigma_\varepsilon \\ \rho\sigma_\varepsilon & \sigma_\varepsilon^2 \end{pmatrix} \right)$$

where $\rho$ is the correlation between the selection and outcome errors, and $\sigma_\varepsilon$ is the standard deviation of $\varepsilon_i$. The variance of $v_i$ is normalized to 1 (as in a standard probit model).
The Selection Bias Problem
If $\rho \neq 0$, then $E[\varepsilon_i \mid S_i = 1] \neq 0$ — the expected error in the wage equation is non-zero for the selected sample. Specifically:

$$E[Y_i \mid X_i, S_i = 1] = X_i\beta + \rho\sigma_\varepsilon\,\lambda(Z_i\gamma)$$

The term $\lambda(Z_i\gamma) = \phi(Z_i\gamma)/\Phi(Z_i\gamma)$ is the inverse Mills ratio, where $\phi$ is the standard normal PDF and $\Phi$ is the standard normal CDF. Running OLS on the selected sample omits this term, producing biased estimates of $\beta$ whenever $\rho \neq 0$.
Two Estimation Approaches
Heckman Two-Step (Heckit):
- Estimate the selection equation as a probit: $P(S_i = 1 \mid Z_i) = \Phi(Z_i\gamma)$, obtaining $\hat\gamma$
- Compute the inverse Mills ratio: $\hat\lambda_i = \phi(Z_i\hat\gamma)/\Phi(Z_i\hat\gamma)$
- Include $\hat\lambda_i$ as an additional regressor in the outcome equation and estimate by OLS: $Y_i = X_i\beta + \beta_\lambda \hat\lambda_i + u_i$

The coefficient on $\hat\lambda_i$ estimates $\rho\sigma_\varepsilon$. Standard errors must be corrected because $\hat\lambda_i$ is a generated regressor (the standard OLS standard errors are too small).
Full Information Maximum Likelihood (FIML):
Estimate both equations simultaneously by maximizing the joint likelihood of the observed data. FIML is more efficient than the two-step estimator (especially when the exclusion restriction is weak) but requires the joint normality assumption to hold for the entire joint distribution, not just the conditional mean (Puhani, 2000).
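For reference, the objective FIML maximizes is the standard tobit-2 (Heckman) log-likelihood implied by the joint-normality setup above, with one term per observation:

```latex
\ln L = \sum_{S_i = 0} \ln \Phi(-Z_i\gamma)
      + \sum_{S_i = 1} \left[
          \ln \phi\!\left(\frac{Y_i - X_i\beta}{\sigma_\varepsilon}\right)
          - \ln \sigma_\varepsilon
          + \ln \Phi\!\left(\frac{Z_i\gamma + \rho\,(Y_i - X_i\beta)/\sigma_\varepsilon}{\sqrt{1-\rho^2}}\right)
        \right]
```

Non-selected observations contribute only the probability of non-selection; selected observations contribute the outcome density times the selection probability conditional on the realized outcome error.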
When to Use the Heckman Model
- Your outcome is observed only for a selected subsample and you believe the selection is non-random
- You have a credible exclusion restriction — a variable that affects selection but not the outcome
- Joint normality of the error terms is a reasonable approximation
- You want to estimate causal effects or population-level relationships, not just conditional associations for the selected sample
When NOT to Use the Heckman Model
- Your missing data is missing at random (MAR) — standard imputation or inverse probability weighting may suffice
- You have no credible exclusion restriction — without one, the model is identified only through functional form (normality), which is fragile (Bushway et al., 2007)
- The normality assumption is badly violated — consider semiparametric alternatives or Lee bounds
- You have a randomized experiment with attrition — Lee bounds provide a more robust approach
Common Confusions
When (Not) to Use This Method
Use the Heckman Model When:
- Your outcome is observed only for a non-randomly selected subset. Wages for workers, test scores for students who did not drop out, firm performance for firms that survived, analyst forecasts for covered firms. The key question is: would including the unobserved outcomes change your conclusions?
- You have a credible exclusion restriction. A variable that plausibly affects selection but not the outcome. Without one, the model relies on functional form assumptions that are typically indefensible.
- Joint normality is a reasonable approximation. If the outcome variable is continuous, approximately symmetric, and not heavily skewed or bounded, normality is more plausible.
- You want to recover population-level parameters. If you only care about the effect of education on wages for women who work, OLS on the selected sample is fine. But if you want the effect of education on potential wages for all women (workers and non-workers), you need the selection correction.
Do NOT Use the Heckman Model When:
- Your missing data is missing at random (MAR). If missingness depends only on observed covariates, standard methods (multiple imputation, inverse probability weighting) are sufficient and do not require the normality assumption.
- You have no exclusion restriction and cannot defend one. Without an exclusion restriction, you are relying entirely on the nonlinearity of the normal CDF for identification. Certo et al. (2016) find that many management papers apply Heckman without a credible exclusion restriction, rendering the correction unreliable.
- The normality assumption is doubtful. If the outcome variable is heavily skewed (e.g., firm value, patent counts) or bounded (e.g., proportions, ratings on a fixed scale), the joint normality assumption is suspect. Consider semiparametric alternatives or Lee bounds.
- You have a randomized experiment with differential attrition. Lee bounds provide a nonparametric approach that does not require normality or an exclusion restriction. They give bounds on the treatment effect rather than a point estimate, but the bounds are valid under weaker assumptions.
- Selection is on observables only. If you believe you have observed all the variables that drive selection, matching or inverse probability weighting can address selection without parametric distributional assumptions.
Connection to Other Methods
The Heckman model relates to several other methods for handling selection and endogeneity:
- Logit/Probit: the selection equation in the Heckman model is a probit. If you are only interested in modeling the selection decision itself (e.g., labor force participation), a standalone probit is sufficient. The Heckman model adds the outcome equation and the correlation between the two.
- OLS: the outcome equation estimated on the selected sample is OLS with selection bias. The Heckman model adds the inverse Mills ratio to correct this bias. If $\rho = 0$ (no selection bias), the Heckman model reduces to OLS on the selected sample.
- IV/2SLS: conceptually similar in that both require an exclusion restriction. In IV, the instrument affects the endogenous regressor but not the outcome directly. In Heckman, the excluded variable affects selection but not the outcome. The Heckman two-step can be viewed as a control function approach — adding a correction term rather than instrumenting.
- Lee bounds: a nonparametric alternative that bounds the treatment effect under weaker assumptions (monotonicity of selection, no functional form). Lee bounds are popular in program evaluation when normality is questionable. The trade-off is that you get bounds rather than a point estimate.
- Matching: addresses selection on observables. Matching assumes that, conditional on observed covariates, selection is as good as random. The Heckman model addresses selection on unobservables — the case where selection depends on factors correlated with the outcome error term even after conditioning on observables.
- Control function approach: the Heckman two-step is a special case of the control function approach. In the general control function framework, the correction term can take forms other than the inverse Mills ratio (e.g., for non-normal error distributions). Rivers and Vuong (1988) extend this to simultaneous equations with endogenous regressors.
B. Identification
For the Heckman model to provide valid correction, two key conditions must hold.
Condition 1: Joint Normality
Plain language: The unobserved factors affecting selection and the unobserved factors affecting the outcome must follow a bivariate normal distribution.
Formally: $(v_i, \varepsilon_i)' \sim N(\mathbf{0}, \Sigma)$ with $\Sigma = \begin{pmatrix} 1 & \rho\sigma_\varepsilon \\ \rho\sigma_\varepsilon & \sigma_\varepsilon^2 \end{pmatrix}$, where $\rho = \operatorname{Corr}(v_i, \varepsilon_i)$.
This assumption is crucial because the inverse Mills ratio correction is derived from the properties of the bivariate normal distribution. If the errors are not jointly normal, the functional form of the correction term is wrong, and the bias correction may itself be biased.
Condition 2: Exclusion Restriction
Plain language: At least one variable must appear in the selection equation but not in the outcome equation. This variable affects whether the outcome is observed but does not directly affect the outcome itself.
Formally: There exists a variable $W_i$ in $Z_i$ with a non-zero coefficient in the selection equation (it affects selection) but that does not appear in $X_i$ (it is excluded from the outcome equation).
Why it matters: Without an exclusion restriction, the Heckman model is identified only through the nonlinearity of the inverse Mills ratio — which comes from the normality assumption. If normality is even slightly wrong, this "identification through functional form" can produce wildly inaccurate corrections. Bushway et al. (2007) demonstrate that without a credible exclusion restriction, the Heckman correction provides no improvement over naive OLS and may increase bias.
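The fragility of identification through functional form can be seen directly: over the typical range of fitted probit indices, the inverse Mills ratio is close to linear, so without an excluded variable the correction term adds almost no independent variation beyond the outcome covariates themselves. A small Python illustration (the index range is chosen for illustration):

```python
import numpy as np
from scipy.stats import norm

# Inverse Mills ratio over a typical range of fitted probit indices
index = np.linspace(-1.5, 1.5, 200)
imr = norm.pdf(index) / norm.cdf(index)

# Near-perfect (negative) linear association: lambda is almost linear in
# the index, so with no exclusion restriction it is nearly collinear with
# the covariates that generate the index
corr = np.corrcoef(index, imr)[0, 1]
```

Only the mild curvature of the ratio separates it from an exact linear combination of the regressors, which is why the correction collapses when normality is even slightly wrong.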
Examples of exclusion restrictions in practice:
| Research setting | Selection equation | Exclusion restriction | Justification |
|---|---|---|---|
| Female labor supply | Works or not | Number of young children, husband's income | Affect labor force participation but not wage rate conditional on working |
| Firm R&D spending | Reports R&D or not | Industry peers' reporting practices | Affects disclosure but not the level of R&D |
| CEO compensation | Firm is publicly traded | State-level IPO regulations | Affect listing decision but not pay |
| Analyst coverage | Firm is covered or not | Geographic distance to nearest analyst | Affects coverage probability but not firm value |
C. Visual Intuition
The simulation below varies the selection correlation to show how naive OLS on the selected sample diverges from the true effect. When selection is strong, OLS is badly biased; the Heckman correction uses the inverse Mills ratio to recover the true coefficient.
Heckman Selection Correction vs. Naive OLS
DGP: Selection S* = 0.2 + Z + v, Outcome Y = 1 + 0.5·X + ε, Corr(v, ε) = -0.4. 276 of 500 observations selected.
Estimation Results
| Estimator | β̂ | SE | 95% CI | Bias |
|---|---|---|---|---|
| Naive OLS | 0.518 | 0.037 | [0.44, 0.59] | +0.018 |
| Heckman two-step | 0.516 | 0.037 | [0.44, 0.59] | +0.016 |
| True β | 0.500 | — | — | — |
Why the difference?
Selection is negatively correlated with the outcome (ρ = -0.4). Naive OLS on the selected sample is biased (β̂ = 0.518, bias = +0.018) because it ignores the non-random selection into the observed sample. The Heckman two-step estimator includes the inverse Mills ratio to correct for selection, reducing the bias by about 11% (β̂ = 0.516, bias = +0.016). The correction works because the exclusion restriction variable Z affects selection but not the outcome directly.
D. Mathematical Derivation
Derivation of the Inverse Mills Ratio Correction
Don't worry about the notation yet — here's what this means in words: The two-step correction derives from the conditional expectation of the outcome given selection, under joint normality of the error terms.
Setup. We have two equations:

Selection: $S_i^* = Z_i\gamma + v_i$, where $S_i = 1$ if $S_i^* > 0$ and $S_i = 0$ otherwise

Outcome: $Y_i = X_i\beta + \varepsilon_i$, observed only when $S_i = 1$

Assume $(v_i, \varepsilon_i)$ are jointly normal with correlation $\rho$, $\operatorname{Var}(v_i) = 1$, and $\operatorname{Var}(\varepsilon_i) = \sigma_\varepsilon^2$.

Step 1: Conditional expectation of the outcome.

$$E[Y_i \mid X_i, S_i = 1] = X_i\beta + E[\varepsilon_i \mid S_i = 1]$$

The OLS bias is $E[\varepsilon_i \mid S_i = 1]$, which is non-zero when selection is correlated with the outcome.

Step 2: Use the properties of the truncated bivariate normal.

Since $S_i = 1$ iff $v_i > -Z_i\gamma$, we need $E[\varepsilon_i \mid v_i > -Z_i\gamma]$.

For jointly normal $(v_i, \varepsilon_i)$:

$$E[\varepsilon_i \mid v_i] = \rho\sigma_\varepsilon v_i \quad\Rightarrow\quad E[\varepsilon_i \mid v_i > -Z_i\gamma] = \rho\sigma_\varepsilon\, E[v_i \mid v_i > -Z_i\gamma]$$

Step 3: Expected value of a truncated standard normal.

For $v_i \sim N(0, 1)$:

$$E[v_i \mid v_i > -Z_i\gamma] = \frac{\phi(Z_i\gamma)}{\Phi(Z_i\gamma)} \equiv \lambda(Z_i\gamma)$$

where $\phi$ is the standard normal PDF and $\Phi$ is the standard normal CDF. This ratio is the inverse Mills ratio.

Step 4: Combine.

$$E[Y_i \mid X_i, S_i = 1] = X_i\beta + \rho\sigma_\varepsilon\,\lambda(Z_i\gamma)$$

This derivation shows that OLS on the selected sample omits $\lambda(Z_i\gamma)$, producing an omitted variable bias. The two-step estimator corrects this by including $\hat\lambda_i$ as an additional regressor.
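The truncated-normal result in Step 3 is easy to verify numerically. This quick Monte Carlo check in Python (with an arbitrary cutoff standing in for the probit index) compares the simulated truncated mean to φ(c)/Φ(c):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
v = rng.standard_normal(2_000_000)

c = 0.7                                   # arbitrary value of the index Z_i @ gamma
empirical = v[v > -c].mean()              # E[v | v > -c] by simulation
theoretical = norm.pdf(c) / norm.cdf(c)   # inverse Mills ratio lambda(c)
```

The two quantities agree to Monte Carlo precision, confirming that the selected sample's mean error is exactly the inverse Mills ratio scaling derived above.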
Step 5: Standard error correction.
Because is estimated from the first-stage probit (it is a generated regressor), the OLS standard errors in the second stage are incorrect. The correct variance-covariance matrix accounts for the estimation uncertainty in . All standard software packages compute the corrected standard errors automatically.
E. Implementation
Heckman Selection Model with Diagnostics
library(sampleSelection)
# ---- Step 1: Prepare the data ----
# Outcome: log(wage), observed only for working women
# Selection: whether the woman works (lfp = 1)
# Exclusion restrictions: non-wife household income (nwifeinc) and
# number of young children (kidslt6)
# These affect labor force participation but not the wage rate
# ---- Step 2: Heckman two-step estimator ----
heck_2step <- heckit(
selection = lfp ~ age + I(age^2) + educ + nwifeinc + kidslt6 + kidsge6,
outcome = log(wage) ~ educ + exper + I(exper^2) + city,
data = df,
method = "2step"
)
summary(heck_2step)
# The coefficient on the inverse Mills ratio (lambda) estimates rho * sigma
# If lambda is significant, selection bias is present
# ---- Step 3: Full information MLE ----
heck_mle <- heckit(
selection = lfp ~ age + I(age^2) + educ + nwifeinc + kidslt6 + kidsge6,
outcome = log(wage) ~ educ + exper + I(exper^2) + city,
data = df,
method = "ml"
)
summary(heck_mle)
# MLE directly estimates rho and sigma (not just rho*sigma)
# Compare two-step and MLE results — large discrepancies suggest
# problems with normality or the exclusion restriction
# ---- Step 4: Diagnostics ----
# Test significance of inverse Mills ratio
# H0: rho = 0 (no selection bias)
# If not rejected, OLS on the selected sample is consistent
# Compare with naive OLS (ignoring selection)
ols_naive <- lm(log(wage) ~ educ + exper + I(exper^2) + city,
data = df[df$lfp == 1, ])
summary(ols_naive)
# If Heckman and OLS give similar coefficients, selection bias
# may be small (or the correction is not working due to
# weak exclusion restriction)

F. Diagnostics
F.1 Significance of the Inverse Mills Ratio (Lambda)
The most basic diagnostic: is $\hat\beta_\lambda$ (the coefficient on the inverse Mills ratio) statistically significant? Under the null hypothesis $\rho = 0$, there is no selection bias, and OLS on the selected sample is consistent. If $\hat\beta_\lambda$ is insignificant, either (a) there is genuinely no selection bias, or (b) the exclusion restriction is too weak to detect it.
- In the two-step estimator, test whether the coefficient on $\hat\lambda_i$ is significant (use corrected standard errors)
- In MLE, test $\rho = 0$ using a Wald test or a likelihood ratio test comparing the joint model to separate probit + OLS
F.2 Normality Tests
Since the correction relies on joint normality, assess whether this assumption is plausible:
- Residual normality: test the outcome equation residuals for normality (Shapiro-Wilk, Jarque-Bera, Q-Q plot). This checks marginal normality of $\varepsilon_i$, which is necessary but not sufficient for joint normality.
- Polynomial Mills ratio test: add $\hat\lambda_i^2$ and $\hat\lambda_i^3$ to the outcome equation. If these higher-order terms are significant, the linear Mills ratio correction is insufficient — evidence against normality. The polynomial Mills ratio test is a specification test for the functional form of the correction.
- Compare two-step and MLE: large discrepancies between the two estimators suggest normality violations (MLE relies more heavily on normality than the two-step estimator).
F.3 Exclusion Restriction Strength
A weak exclusion restriction produces imprecise estimates and makes the model fragile:
- Joint significance test: test whether the excluded variables are jointly significant in the probit selection equation (likelihood ratio or Wald chi-squared test). An insignificant exclusion restriction means the model is unidentified without functional form.
- Magnitude of the first-stage coefficients: the excluded variables should have economically meaningful effects on selection, not just statistical significance.
- Collinearity check: verify that the inverse Mills ratio is not highly collinear with the covariates in the outcome equation. High collinearity (VIF > 10 for $\hat\lambda_i$) indicates that the model cannot distinguish the selection correction from the direct effects of the covariates.
F.4 Two-Step vs. MLE Comparison
Estimate the model using both methods and compare:
- Similar results: reassuring. Both methods are estimating the same parameters, and the normality assumption is likely adequate.
- Different results: cause for concern. Possible explanations: (1) normality is violated (MLE is more sensitive), (2) the exclusion restriction is weak (MLE leverages different information than two-step), (3) the sample size is too small for MLE to converge properly.
library(sampleSelection)
# Fit two-step and MLE
heck_2s <- heckit(
selection = lfp ~ age + I(age^2) + educ + nwifeinc + kidslt6 + kidsge6,
outcome = log(wage) ~ educ + exper + I(exper^2) + city,
data = df, method = "2step"
)
heck_ml <- heckit(
selection = lfp ~ age + I(age^2) + educ + nwifeinc + kidslt6 + kidsge6,
outcome = log(wage) ~ educ + exper + I(exper^2) + city,
data = df, method = "ml"
)
# F.1 Test significance of lambda (rho * sigma)
summary(heck_2s) # Check p-value on "Inverse Mills Ratio"
# F.2 Polynomial Mills ratio test for normality
# Compute Mills ratio from first-stage probit
probit_fit <- glm(lfp ~ age + I(age^2) + educ + nwifeinc +
kidslt6 + kidsge6,
family = binomial(link = "probit"), data = df)
Zgamma <- predict(probit_fit, type = "link")
mills <- dnorm(Zgamma) / pnorm(Zgamma)
# Add squared and cubed Mills ratio to outcome equation
selected <- df[df$lfp == 1, ]
mills_sel <- mills[df$lfp == 1]
selected$mills <- mills_sel
selected$mills2 <- mills_sel^2
selected$mills3 <- mills_sel^3
norm_test <- lm(log(wage) ~ educ + exper + I(exper^2) + city +
mills + mills2 + mills3,
data = selected)
# Joint test of mills2 and mills3
library(car)
linearHypothesis(norm_test, c("mills2 = 0", "mills3 = 0"))
# F.3 Exclusion restriction strength
anova(
glm(lfp ~ age + I(age^2) + educ + kidsge6,
family = binomial(link = "probit"), data = df),
probit_fit, test = "Chisq"
)
# F.4 Compare two-step and MLE
cbind(TwoStep = coef(heck_2s, part = "outcome"),
      MLE = coef(heck_ml, part = "outcome"))

F.5 Interpreting Your Results
What Rho Tells You
The parameter $\rho$ — the correlation between the selection and outcome errors — is what drives the selection correction.
| $\rho$ | Sign of bias in naive OLS | Interpretation |
|---|---|---|
| $\rho > 0$ | OLS overestimates | Individuals who select in have higher unobserved outcome potential. Example: women who choose to work have higher unobserved ability, inflating observed wages. |
| $\rho = 0$ | No bias | Selection is independent of the outcome (conditional on covariates). OLS on the selected sample is consistent. |
| $\rho < 0$ | OLS underestimates | Individuals who select in have lower unobserved outcome potential. Example: workers who accept a dangerous job have lower outside options, deflating observed compensating differentials. |
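The first row of the table can be checked by simulation. In this Python sketch (values invented for illustration), with a positive error correlation the units that select in have a visibly higher mean outcome than the population:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000
# Correlated selection and outcome errors: Corr(v, eps) = 0.6
v, eps = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=n).T

y = 1.0 + eps        # potential outcome for everyone
selected = v > 0     # half the population selects in

gap = y[selected].mean() - y.mean()   # positive: selected units look better
```

The gap is approximately $\rho \cdot E[v \mid v > 0] \approx 0.6 \times 0.80 \approx 0.48$, exactly the positive-selection pattern in the first row.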
What the Lambda Coefficient Captures

The coefficient on the inverse Mills ratio in the two-step estimator is $\rho\sigma_\varepsilon$, often denoted $\hat\beta_\lambda$ or simply lambda. Its magnitude and significance determine:
- Significance of $\hat\beta_\lambda$: if significant, sample selection bias is present, and the Heckman correction is doing meaningful work. If insignificant, either selection bias is absent or the model lacks power to detect it (weak exclusion restriction).
- Magnitude of $\hat\beta_\lambda$: the product $\rho\sigma_\varepsilon$ determines the size of the bias correction. A large coefficient means the selection correction substantially changes the outcome equation coefficients.
- Sign of $\hat\beta_\lambda$: reveals the direction of selection. Because the inverse Mills ratio is positive, a positive coefficient ($\rho > 0$) means positive selection (those who select in have higher outcomes than a random draw from the population), and a negative coefficient means negative selection.
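Since the reported two-step lambda is just the product $\rho\sigma_\varepsilon$, you can move between the two parameterizations directly. Using the hypothetical estimates from the guided exercise later in this chapter:

```python
# lambda = rho * sigma_eps: converting between MLE-style output (rho, sigma)
# and the two-step IMR coefficient (illustrative numbers from the exercise)
rho, sigma_eps = -0.47, 0.664
lam = rho * sigma_eps            # coefficient on the inverse Mills ratio
implied_rho = lam / sigma_eps    # back out rho from a two-step fit
```

This is why MLE output (which reports $\rho$ and $\sigma_\varepsilon$ separately) and two-step output (which reports their product) describe the same correction.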
What to Report in a Table
A well-reported Heckman model should include:
- Both equations: report the full selection equation (probit coefficients) and the outcome equation (OLS coefficients with Mills ratio correction)
- Lambda ($\hat\beta_\lambda$): the coefficient on the inverse Mills ratio, with its standard error and p-value
- Rho ($\rho$) and sigma ($\sigma_\varepsilon$): if using MLE, report these separately
- Number of observations: total N, number selected (observed outcome), number not selected
- Exclusion restriction: clearly identify which variables appear in the selection equation but not the outcome equation
- Exclusion restriction strength: chi-squared test of the excluded variables in the selection equation
- Estimation method: two-step or MLE
- Naive OLS comparison: show how the coefficients change when selection is ignored
G. What Can Go Wrong
Each failure mode below is paired with a well-specified benchmark showing what the healthy case looks like.
No Exclusion Restriction
Heckman model with a credible exclusion restriction (number of young children affects labor force participation but not wage rate)
Returns to education: 0.108 (SE = 0.014). Lambda = -0.28 (SE = 0.11, p = 0.01). The selection correction is significant and the education coefficient is well-estimated because the exclusion restriction (kidslt6) is strong in the selection equation (chi-squared = 42.3, p < 0.001).
Normality Violation
Outcome variable (log wages) is approximately normally distributed. Joint normality of error terms is plausible.
Two-step estimate of returns to education: 0.108 (SE = 0.014). MLE estimate: 0.105 (SE = 0.012). The two methods agree closely, and the polynomial Mills ratio test does not reject normality (p = 0.61).
Weak Exclusion Restriction
Exclusion restriction (number of children under 6) is strongly predictive of labor force participation: probit coefficient = -0.87, chi-squared = 42.3, p < 0.001
Lambda = -0.28 (SE = 0.11). The inverse Mills ratio is precisely estimated, providing a meaningful selection correction. The VIF of lambda in the outcome equation is 2.1 — well below the danger zone.
Confusing Incidental Truncation with Sample Selection
True sample selection: wages are missing because women choose not to work. The selection decision is correlated with potential wages (women with low potential wages opt out).
Heckman correction is appropriate: the estimated error correlation is positive and sizable (rho = 0.45), consistent with women who select into employment having higher-than-average potential wages. OLS on the selected sample overestimates average wages in the population.
H. Practice
H.1 Concept Checks
A researcher estimates a Heckman selection model for CEO compensation, where compensation is observed only for public firms. She uses state-level IPO regulations as the exclusion restriction. She finds that the inverse Mills ratio coefficient is -0.45 (SE = 0.18, p = 0.01) and rho = -0.38. What does the negative rho tell us about the selection process?
A researcher applies the Heckman two-step estimator to study the effect of R&D spending on firm performance. She includes the same set of variables in both the selection equation (whether the firm reports R&D) and the outcome equation (firm performance given R&D is reported). She has no exclusion restriction. She finds that the inverse Mills ratio is significant (p = 0.04). Is this evidence that her model is working correctly?
You estimate a Heckman model and find that lambda (the coefficient on the inverse Mills ratio) is -0.15 with a standard error of 0.42 and p = 0.72. The naive OLS coefficient on your key variable is 0.35 (SE = 0.08), while the Heckman-corrected coefficient is 0.33 (SE = 0.14). What should you conclude?
H.2 Guided Exercise
Interpreting Heckman Selection Model Output
You study the effect of training programs on worker wages. Wages are observed only for employed workers. Your Heckman model produces:

**Selection Equation (Probit: Employed = 1)**

| Variable | Coeff | SE | p-value |
|---|---|---|---|
| Age | 0.045 | 0.012 | < 0.001 |
| Age-squared | -0.0005 | 0.0002 | 0.012 |
| Education | 0.112 | 0.021 | < 0.001 |
| Married | 0.284 | 0.098 | 0.004 |
| Num. children < 6 [EXCLUDED] | -0.432 | 0.076 | < 0.001 |
| Spouse income ($000s) [EXCLUDED] | -0.018 | 0.005 | < 0.001 |

Exclusion restriction test: chi-squared(2) = 52.4, p < 0.001

**Outcome Equation (Dep. var: log(wage))**

| Variable | Coeff | SE | p-value |
|---|---|---|---|
| Education | 0.098 | 0.015 | < 0.001 |
| Training program | 0.145 | 0.038 | < 0.001 |
| Experience | 0.032 | 0.008 | < 0.001 |
| Experience-squared | -0.0005 | 0.0002 | 0.012 |
| Inverse Mills ratio | -0.312 | 0.104 | 0.003 |

rho = -0.47, sigma = 0.664, lambda = rho × sigma = -0.312

Method: Two-step. N = 2,000 (1,340 employed, 660 not employed).
Naive OLS on employed workers: Training coefficient = 0.118 (SE = 0.032, p < 0.001).
H.3 Error Detective
Read the analysis below carefully and identify the errors.
Select all errors you can find:
H.4 You Are the Referee
Read the paper summary below and write a brief referee critique (2-3 sentences) of the identification strategy.
Paper Summary
The authors study returns to education for married women using a Heckman two-step selection model, correcting for the fact that wages are observed only for employed women. They use a sample of 3,200 married women (1,950 employed, 1,250 not employed) from a national household survey. As their exclusion restriction, they use whether the woman's mother was employed when the woman was age 14, arguing that maternal employment norms affect daughters' labor force participation but not their wages. They find a selection-corrected return to education of 11.2% per year (SE = 1.8%), compared to a naive OLS estimate of 8.5% (SE = 1.1%). Lambda is -0.28 (p = 0.02).
Key Table
| Variable | Coefficient | SE | p-value |
|---|---|---|---|
| Education (years) | 0.112 | 0.018 | <0.001 |
| Experience | 0.035 | 0.009 | <0.001 |
| Experience-squared | -0.001 | 0.000 | 0.008 |
| Inverse Mills ratio | -0.280 | 0.120 | 0.020 |
| N (total) | 3,200 | | |
| N (employed) | 1,950 | | |
Authors' Identification Claim
The authors argue that maternal employment status is a valid exclusion restriction: it affects daughters' labor force participation through intergenerational transmission of work norms but does not directly affect daughters' market wages.
I. Swap-In: When to Use Something Else
- Lee bounds: when normality is doubtful or no credible exclusion restriction exists. Lee bounds require only a monotonicity assumption (treatment does not make anyone less likely to be selected). You get an interval rather than a point estimate, but it is valid under weaker assumptions. Recommended as a robustness check alongside the Heckman model.
- Control function approach: a generalization of the Heckman two-step. Instead of assuming joint normality (which implies the correction term is the inverse Mills ratio), you can use nonparametric or semiparametric estimates of the control function. This relaxes the distributional assumption at the cost of requiring a stronger exclusion restriction and a larger sample.
- Semiparametric selection models: methods by Powell (1987), Newey (1988), and others estimate the selection correction without assuming normality. These use kernel or series estimators for the conditional expectation of the error. They require larger samples and are computationally more complex.
- Inverse probability weighting (IPW): weight each observation by the inverse of its probability of being selected (estimated from the selection equation). Unlike Heckman, IPW does not require normality and can accommodate nonlinear outcome models. However, IPW only addresses selection on observables — it cannot correct for selection on unobservables.
- Bivariate probit: when both the selection and the outcome are binary. The Heckman model assumes a continuous outcome; bivariate probit handles two binary equations with correlated errors. Uses the same exclusion restriction logic.
- Bounds approaches (Manski bounds): when you want the weakest possible assumptions. Manski worst-case bounds assume nothing about the selection process and provide very wide intervals. Lee bounds tighten these by adding a monotonicity assumption.
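To make the IPW contrast concrete, here is a small Python sketch (simulated data, names invented) in which selection depends only on an observable x; weighting the selected observations by 1/p recovers the population mean, something no amount of weighting can do when selection is on unobservables:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50_000
x = rng.standard_normal(n)
y = 2.0 + 0.5 * x + rng.standard_normal(n)      # population mean of y is 2.0

# Selection on observables only: P(selected | x) is a known logistic in x
p_sel = 1.0 / (1.0 + np.exp(-(0.3 + 1.2 * x)))
s = rng.random(n) < p_sel

naive_mean = y[s].mean()                             # biased toward high-x units
ipw_mean = np.average(y[s], weights=1.0 / p_sel[s])  # reweights to population
```

The naive mean over-represents high-x (high-y) units, while the IPW mean lands back near 2.0 without any normality assumption.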
J. Reviewer Checklist
Critical Reading Checklist
Paper Library
Foundational (1)
Heckman, J. J. (1979). Sample Selection Bias as a Specification Error.
Introduces the two-step estimator for correcting sample selection bias using the inverse Mills ratio. The paper shows that selection bias can be treated as an omitted variable problem, where the omitted variable is the conditional expectation of the error term given selection. One of the most cited papers in econometrics.
Application (2)
Mroz, T. A. (1987). The Sensitivity of an Empirical Model of Married Women's Hours of Work to Economic and Statistical Assumptions.
Classic application of the Heckman selection model to female labor supply. Shows that the two-step estimator's results are sensitive to the choice of exclusion restriction and the normality assumption. The Mroz dataset remains a standard teaching dataset for selection models.
Lennox, C. S., Francis, J. R., & Wang, Z. (2012). Selection Models in Accounting Research.
Reviews the use (and misuse) of Heckman selection models in accounting research. Documents common pitfalls including weak exclusion restrictions, failure to test normality, and mechanical application without economic justification for the selection equation.
Survey (3)
Puhani, P. A. (2000). The Heckman Correction for Sample Selection and Its Critique.
Comprehensive survey comparing Heckman two-step, MLE, and semiparametric alternatives. Discusses conditions under which the two-step estimator performs poorly (weak exclusion restriction, non-normality) and when MLE is preferable.
Bushway, S., Johnson, B. D., & Slocum, L. A. (2007). Is the Magic Still There? The Use of the Heckman Two-Step Correction for Selection Bias in Criminology.
Reviews Heckman model applications in criminology and finds widespread misapplication. Emphasizes that without a credible exclusion restriction, the Heckman correction provides no improvement over naive OLS and may even increase bias.
Certo, S. T., Busenbark, J. R., Woo, H., & Semadeni, M. (2016). Sample Selection Bias and Heckman Models in Strategic Management Research.
Reviews the use of Heckman models in strategic management. Provides practical guidance on when selection correction is needed, how to choose exclusion restrictions, and how to interpret results. Finds that many SMJ papers misapply the technique.