Heckman Selection Model
Corrects for sample selection bias when the outcome is observed only for a non-random subset of the population, using a two-equation system with an exclusion restriction.
One-Line Implementation
R: selection(employed ~ children + married + education + experience, wage ~ education + experience, data = df)
Stata: heckman wage education experience, select(employed = children married education experience) twostep
Manual two-step: probit selection, compute the inverse Mills ratio, then OLS with the Mills ratio as a regressor -- see the implementation section for the full code.
Motivating Example: Wages and Female Labor Supply
A labor economist wants to estimate the returns to education for married women. She has a large survey with information on education, age, number of children, husband's income, and — for women who are employed — their hourly wage.
Here is the problem: wages are only observed for women who work. Of the 753 women in her sample, only 428 are employed. The remaining 325 women have missing wages — not because of data errors, but because they chose not to participate in the labor market.
If she simply runs OLS on the 428 working women, she estimates wages conditional on employment. But employment itself is a choice that depends on potential wages. Women with very low potential wages may choose not to work (because the opportunity cost of leisure or home production exceeds their market wage). By restricting the sample to employed women, the researcher is selecting on an outcome that is correlated with the error term in the wage equation.
This problem is known as sample selection bias, or incidental truncation. It is not the same as omitted variable bias or measurement error — it arises because the sample is non-randomly drawn from the population. The observed wage distribution is truncated from below: women with the lowest potential wages are disproportionately absent from the sample.
The Heckman selection model (Heckman, 1979) solves this problem. It jointly models two processes: (1) the selection equation — whether a woman works — and (2) the outcome equation — what she earns if she works. By estimating the correlation between the errors in these two equations, the model corrects for the non-random selection into the observed sample. The key correction term is the inverse Mills ratio, which captures the expected value of the error in the wage equation conditional on the woman being selected into the sample.
This approach contributed to James Heckman's 2000 Nobel Prize in Economics (awarded for his broader development of theory and methods for analyzing selective samples) and remains one of the most widely used corrections for sample selection bias across the social sciences (Mroz, 1987).
A. Overview
What the Heckman Model Does
The Heckman selection model corrects for bias that arises when the outcome variable is observed only for a non-random subset of the population. It does so by modeling the selection process explicitly and incorporating information about selection into the outcome equation.
The model consists of two equations:
Selection equation (who is observed):

$s_i^* = z_i'\gamma + u_i, \qquad s_i = 1 \text{ if } s_i^* > 0, \text{ else } s_i = 0$

where $s_i = 1$ if the outcome is observed (e.g., the woman works) and $s_i = 0$ otherwise. $z_i$ is a vector of covariates that affect selection, and $u_i$ is the error term.
Outcome equation (what is the outcome, conditional on being observed):

$y_i = x_i'\beta + \varepsilon_i, \qquad \text{observed only when } s_i = 1$

where $y_i$ is the outcome of interest (e.g., wages), $x_i$ is a vector of covariates, and $\varepsilon_i$ is the error term.
The key assumption is that the errors are jointly normally distributed:

$(u_i, \varepsilon_i) \sim N\!\left(0, \begin{pmatrix} 1 & \rho\sigma_\varepsilon \\ \rho\sigma_\varepsilon & \sigma_\varepsilon^2 \end{pmatrix}\right)$

where $\rho$ is the correlation between the selection and outcome errors, and $\sigma_\varepsilon$ is the standard deviation of $\varepsilon_i$. The variance of $u_i$ is normalized to 1 (as in a standard probit model).
The Selection Bias Problem
If $\rho \neq 0$, then $E[\varepsilon_i \mid s_i = 1] \neq 0$ — the expected error in the wage equation is non-zero for the selected sample. Specifically:

$E[\varepsilon_i \mid s_i = 1] = \rho \sigma_\varepsilon \lambda(z_i'\gamma)$

The term $\lambda(z_i'\gamma) = \phi(z_i'\gamma)/\Phi(z_i'\gamma)$ is the inverse Mills ratio, where $\phi$ is the standard normal PDF and $\Phi$ is the standard normal CDF. Running OLS on the selected sample omits this term, producing biased estimates of $\beta$ whenever $\rho \neq 0$.
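As a quick numerical illustration (a sketch, not part of the original text), the inverse Mills ratio is easy to compute in base R with dnorm() and pnorm():

```r
# Inverse Mills ratio: lambda(c) = phi(c) / Phi(c)
imr <- function(c) dnorm(c) / pnorm(c)

imr(0)    # ~0.798: probit index 0 means a 50% selection probability
imr(2)    # ~0.055: selection is nearly certain, so the correction term is small
imr(-2)   # ~2.373: selection is unlikely; selected cases are heavily truncated
```

The pattern matches the intuition: the less likely an observation is to be selected, the larger the truncation correction $\rho\sigma_\varepsilon\lambda$ applied to it.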
Two Estimation Approaches
Heckman Two-Step (Heckit):
- Estimate the selection equation as a probit: $\Pr(s_i = 1 \mid z_i) = \Phi(z_i'\gamma)$
- Compute the inverse Mills ratio: $\hat\lambda_i = \phi(z_i'\hat\gamma)/\Phi(z_i'\hat\gamma)$
- Include $\hat\lambda_i$ as an additional regressor in the outcome equation and estimate by OLS: $y_i = x_i'\beta + \beta_\lambda \hat\lambda_i + e_i$
The coefficient on $\hat\lambda_i$ estimates $\rho\sigma_\varepsilon$. Standard errors must be corrected because $\hat\lambda_i$ is a generated regressor (the standard OLS standard errors are too small).
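The two steps above can be sketched by hand on simulated data (all variable names and parameter values here are illustrative, chosen so the truth is known):

```r
set.seed(1)
n <- 5000
z <- rnorm(n)                            # excluded variable: affects selection only
x <- rnorm(n)                            # outcome covariate
u <- rnorm(n)                            # selection error, Var = 1
eps <- 0.5 * u + sqrt(0.75) * rnorm(n)   # outcome error: rho = 0.5, sigma = 1

s <- as.numeric(1 + z + x + u > 0)        # selection indicator
y <- ifelse(s == 1, 1 + 2 * x + eps, NA)  # true slope on x is 2

# Step 1: probit for selection, then the inverse Mills ratio
probit <- glm(s ~ z + x, family = binomial(link = "probit"))
idx <- predict(probit, type = "link")
lambda <- dnorm(idx) / pnorm(idx)

# Step 2: OLS on the selected sample with lambda as an extra regressor
naive   <- lm(y ~ x, subset = s == 1)          # biased downward in this DGP
heckit2 <- lm(y ~ x + lambda, subset = s == 1)
coef(heckit2)  # slope on x near 2; coefficient on lambda estimates rho*sigma (~0.5)
```

In practice you would use a packaged routine (e.g., heckit() below) because the second-stage standard errors here are not corrected for the generated regressor.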
Full Information Maximum Likelihood (FIML):
Estimate both equations simultaneously by maximizing the joint likelihood of the observed data. FIML is more efficient than the two-step estimator when the joint normality assumption holds, but it requires this assumption for the entire joint distribution, not just the conditional mean. With a weak exclusion restriction, both methods are fragile — MLE leverages the functional form more aggressively, which helps only if normality is correct (Puhani, 2000).
When to Use the Heckman Model
- Your outcome is observed only for a selected subsample and you believe the selection is non-random
- You have a credible exclusion restriction — a variable that affects selection but not the outcome
- Joint normality of the error terms is a reasonable approximation
- You want to estimate causal effects or population-level relationships, not just conditional associations for the selected sample
When NOT to Use the Heckman Model
- Your missing data are missing at random (MAR) — standard imputation or inverse probability weighting may suffice
- You have no credible exclusion restriction — without one, the model is identified only through functional form (normality), which is fragile (Bushway et al., 2007)
- The normality assumption is substantially violated (e.g., the outcome distribution is heavily skewed, bounded, or multimodal) — consider semiparametric alternatives or Lee bounds
- You have a randomized experiment with attrition — Lee bounds provide a more robust approach
When (Not) to Use This Method
Use the Heckman Model When:
- Your outcome is observed only for a non-randomly selected subset. Wages for workers, test scores for students who did not drop out, firm performance for firms that survived, analyst forecasts for covered firms. The key question is: would including the unobserved outcomes change your conclusions?
- You have a credible exclusion restriction. A variable that plausibly affects selection but not the outcome. Without one, the model relies on functional form assumptions that are typically indefensible.
- Joint normality is a reasonable approximation. If the outcome variable is continuous, approximately symmetric, and not heavily skewed or bounded, normality is more plausible.
- You want to recover population-level parameters. If you only care about the effect of education on wages for women who work, OLS on the selected sample is fine. But if you want the effect of education on potential wages for all women (workers and non-workers), you need the selection correction.
Do NOT Use the Heckman Model When:
- Your missing data are missing at random (MAR). If missingness depends only on observed covariates, standard methods (multiple imputation, inverse probability weighting) are sufficient and do not require the normality assumption.
- You have no exclusion restriction and cannot defend one. Without an exclusion restriction, you are relying entirely on the nonlinearity of the normal CDF for identification. Certo et al. (2016) find that many management papers apply Heckman without a credible exclusion restriction, rendering the correction unreliable.
- The normality assumption is doubtful. If the outcome variable is heavily skewed (e.g., firm value, patent counts) or bounded (e.g., proportions, ratings on a fixed scale), the joint normality assumption is suspect. Consider semiparametric alternatives or Lee bounds.
- You have a randomized experiment with differential attrition. Lee bounds provide a nonparametric approach that does not require normality or an exclusion restriction. They give bounds on the treatment effect rather than a point estimate, but the bounds are valid under weaker assumptions.
- Selection is on observables only. If you believe you have observed all the variables that drive selection, matching or inverse probability weighting can address selection without parametric distributional assumptions.
Connection to Other Methods
The Heckman model relates to several other methods for handling selection and endogeneity:
- Logit/Probit: the selection equation in the Heckman model is a probit. If you are only interested in modeling the selection decision itself (e.g., labor force participation), a standalone probit is sufficient. The Heckman model adds the outcome equation and the correlation between the two.
- OLS: the outcome equation estimated on the selected sample is OLS with selection bias. The Heckman model adds the inverse Mills ratio to correct this bias. If $\rho = 0$ (no selection bias), the Heckman model reduces to OLS on the selected sample.
- IV/2SLS: conceptually similar in that both require an exclusion restriction. In IV, the instrument affects the endogenous regressor but not the outcome directly. In Heckman, the excluded variable affects selection but not the outcome. The Heckman two-step can be viewed as a control function approach — adding a correction term rather than instrumenting.
- Lee bounds: a nonparametric alternative that bounds the treatment effect under weaker assumptions (monotonicity of selection, no functional form). Lee bounds are popular in program evaluation when normality is questionable. The trade-off is that you get bounds rather than a point estimate.
- Matching: addresses selection on observables. Matching assumes that, conditional on observed covariates, selection is as good as random. The Heckman model addresses selection on unobservables — the case where selection depends on factors correlated with the outcome error term even after conditioning on observables.
- Control function approach: the Heckman two-step is a special case of the control function approach. In the general control function framework, the correction term can take forms other than the inverse Mills ratio (e.g., for non-normal error distributions). Rivers and Vuong (1988) extend this to simultaneous equations with endogenous regressors.
B. Identification
For the Heckman model to provide valid correction, two key conditions must hold.
Condition 1: Joint Normality
Plain language: The unobserved factors affecting selection and the unobserved factors affecting the outcome must follow a bivariate normal distribution.
Formally: $(u_i, \varepsilon_i) \sim N(0, \Sigma)$ with $\mathrm{Corr}(u_i, \varepsilon_i) = \rho$, $\mathrm{Var}(u_i) = 1$, and $\mathrm{Var}(\varepsilon_i) = \sigma_\varepsilon^2$.
This assumption is crucial because the inverse Mills ratio correction is derived from the properties of the bivariate normal distribution. If the errors are not jointly normal, the functional form of the correction term is wrong, and the bias correction may itself be biased.
Condition 2: Exclusion Restriction
Plain language: At least one variable must appear in the selection equation but not in the outcome equation. This variable affects whether the outcome is observed but does not directly affect the outcome itself.
Formally: There exists a variable $w_i$ in $z_i$ with a non-zero coefficient in the selection equation (it affects selection) but that does not appear in $x_i$ (it is excluded from the outcome equation).
Why it matters: Without an exclusion restriction, the Heckman model is identified only through the nonlinearity of the inverse Mills ratio — which comes from the normality assumption. If normality is even slightly wrong, this "identification through functional form" can produce wildly inaccurate corrections. Bushway et al. (2007) demonstrate that without a credible exclusion restriction, the Heckman correction provides no improvement over naive OLS and may increase bias.
Examples of exclusion restrictions in practice:
| Research setting | Selection equation | Exclusion restriction | Justification |
|---|---|---|---|
| Female labor supply | Works or not | Number of young children, husband's income | Affect labor force participation but not wage rate conditional on working |
| Firm R&D spending | Reports R&D or not | Industry peers' reporting practices | Affects disclosure but not the level of R&D |
| CEO compensation | Firm is publicly traded | State-level IPO regulations | Affect listing decision but not pay |
| Analyst coverage | Firm is covered or not | Geographic distance to nearest analyst | Affects coverage probability but not firm value |
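To see why identification through functional form alone is fragile, consider a sketch with made-up numbers: when the selection index contains only the outcome covariates, the Mills ratio is nearly a linear function of those covariates over the observed range, so the correction term is almost collinear with the very regressors it is supposed to fix.

```r
set.seed(3)
x <- rnorm(5000)
idx_no_excl <- 0.3 + 0.8 * x           # selection index uses ONLY the outcome covariate
lam <- dnorm(idx_no_excl) / pnorm(idx_no_excl)

cor(lam, x)                            # close to -1: lambda nearly collinear with x
summary(lm(lam ~ x))$r.squared         # identification rests on the small nonlinearity

# With an excluded variable z, lambda varies independently of x:
z <- rnorm(5000)
idx_excl <- 0.3 + 0.8 * x + 0.8 * z
lam2 <- dnorm(idx_excl) / pnorm(idx_excl)
summary(lm(lam2 ~ x))$r.squared        # much lower: lambda has independent variation
```

Without the excluded variable, the second-stage regression must separate $\beta$ from $\rho\sigma_\varepsilon$ using only the curvature of $\lambda(\cdot)$, which evaporates if normality is even mildly wrong.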
C. Visual Intuition
Adjust the selection correlation to see how naive OLS on the selected sample diverges from the true effect. When selection is strong, OLS is badly biased; the Heckman correction uses the inverse Mills ratio to recover the true coefficient.
Watch how non-random selection into the labor force distorts the observed wage-education relationship:
D. Mathematical Derivation
Derivation of the Inverse Mills Ratio Correction
Don't worry about the notation yet — here's what this means in words: The two-step correction derives from the conditional expectation of the outcome given selection, under joint normality of the error terms.
Setup. We have two equations:
Selection: $s_i = 1[z_i'\gamma + u_i > 0]$, where $u_i \sim N(0, 1)$
Outcome: $y_i = x_i'\beta + \varepsilon_i$, observed only when $s_i = 1$
Assume $(u_i, \varepsilon_i)$ are jointly normal with correlation $\rho$, $\mathrm{Var}(u_i) = 1$, and $\mathrm{Var}(\varepsilon_i) = \sigma_\varepsilon^2$.
Step 1: Conditional expectation of the outcome.

$E[y_i \mid x_i, s_i = 1] = x_i'\beta + E[\varepsilon_i \mid s_i = 1]$

The OLS bias is $E[\varepsilon_i \mid s_i = 1]$, which is non-zero when selection is correlated with the outcome.
Step 2: Use the properties of the truncated bivariate normal.
Since $s_i = 1$ iff $u_i > -z_i'\gamma$, we need $E[\varepsilon_i \mid u_i > -z_i'\gamma]$.
For jointly normal $(u_i, \varepsilon_i)$:

$E[\varepsilon_i \mid u_i > -z_i'\gamma] = \rho \sigma_\varepsilon \, E[u_i \mid u_i > -z_i'\gamma]$

Step 3: Expected value of a truncated standard normal.
For $u_i \sim N(0, 1)$:

$E[u_i \mid u_i > -z_i'\gamma] = \frac{\phi(-z_i'\gamma)}{1 - \Phi(-z_i'\gamma)} = \frac{\phi(z_i'\gamma)}{\Phi(z_i'\gamma)} = \lambda(z_i'\gamma)$

where $\phi$ and $\Phi$ are the standard normal PDF and CDF. This ratio is the inverse Mills ratio.
Step 4: Combine.

$E[y_i \mid x_i, s_i = 1] = x_i'\beta + \rho \sigma_\varepsilon \lambda(z_i'\gamma)$

This derivation shows that OLS on the selected sample omits $\rho \sigma_\varepsilon \lambda(z_i'\gamma)$, producing an omitted variable bias. The two-step estimator corrects this by including $\hat\lambda_i$ as an additional regressor.
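The identity in Steps 2-3 can be verified by simulation (a sketch with arbitrary parameter values, not tied to any dataset in the text):

```r
set.seed(42)
rho <- 0.6; sigma <- 2; zg <- 0.3        # arbitrary rho, sigma, probit index z'gamma
u   <- rnorm(1e6)                        # selection error, N(0, 1)
eps <- sigma * (rho * u + sqrt(1 - rho^2) * rnorm(1e6))  # correlated outcome error

simulated   <- mean(eps[u > -zg])                    # E[eps | u > -z'gamma], by Monte Carlo
theoretical <- rho * sigma * dnorm(zg) / pnorm(zg)   # rho * sigma * lambda(z'gamma)
c(simulated, theoretical)                # both approximately 0.74
```

The two numbers agree to Monte Carlo error, confirming that the omitted term in selected-sample OLS is exactly $\rho\sigma_\varepsilon\lambda(z'\gamma)$.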
Step 5: Standard error correction.
Because $\hat\lambda_i$ is estimated from the first-stage probit (it is a generated regressor), the OLS standard errors in the second stage are incorrect. The correct variance-covariance matrix accounts for the estimation uncertainty in $\hat\gamma$. Standard selection-model routines compute the corrected standard errors automatically.
E. Implementation
Heckman Selection Model with Diagnostics
# Requires: sampleSelection
# sampleSelection: R package for Heckman selection models (Toomet & Henningsen)
library(sampleSelection)
# --- Step 1: Prepare the data ---
# Outcome: log(wage), observed ONLY for working women (selected sample)
# Selection: whether the woman works (lfp = 1)
# Exclusion restriction: nwifeinc (non-wife income), kidslt6 (kids under 6)
# These affect labor force participation but should not directly affect wage rate
# --- Step 2: Heckman two-step estimator ---
# heckit() estimates the selection and outcome equations jointly
# selection: probit for participation (first stage)
# outcome: OLS for wages, corrected by the inverse Mills ratio (second stage)
heck_2step <- heckit(
selection = lfp ~ age + I(age^2) + educ + nwifeinc + kidslt6 + kidsge6,
outcome = log(wage) ~ educ + exper + I(exper^2) + city,
data = df,
method = "2step"
)
summary(heck_2step)
# Coefficient on lambda (inverse Mills ratio) estimates rho * sigma
# If lambda is significant, this is consistent with selection bias under the model
# --- Step 3: Full information MLE ---
# MLE estimates the full joint model under bivariate normality
# More efficient than two-step but more sensitive to normality violations
heck_mle <- heckit(
selection = lfp ~ age + I(age^2) + educ + nwifeinc + kidslt6 + kidsge6,
outcome = log(wage) ~ educ + exper + I(exper^2) + city,
data = df,
method = "ml"
)
summary(heck_mle)
# MLE directly estimates rho and sigma (not just rho*sigma)
# Compare two-step and MLE: large discrepancies suggest normality problems
# --- Step 4: Diagnostics ---
# Test H0: rho = 0 (no selection bias)
# If not rejected, OLS on the selected sample is consistent
# Compare with naive OLS (ignoring selection)
ols_naive <- lm(log(wage) ~ educ + exper + I(exper^2) + city,
data = df[df$lfp == 1, ])
summary(ols_naive)
# If Heckman and OLS coefficients differ substantially, selection bias
# is economically meaningful and the correction is needed
F. Diagnostics
F.1 Significance of the Inverse Mills Ratio (Lambda)
The most basic diagnostic: is $\hat\lambda$ (the coefficient on the inverse Mills ratio) statistically significant? Under the null hypothesis $\rho = 0$, there is no selection bias, and OLS on the selected sample is consistent. If $\hat\lambda$ is insignificant, either (a) there is genuinely no selection bias, or (b) the exclusion restriction is too weak to detect it.
- In the two-step estimator, test whether the coefficient on $\hat\lambda_i$ is significant (use corrected standard errors)
- In MLE, test $H_0: \rho = 0$ using a Wald test on $\hat\rho$ or a likelihood ratio test comparing the joint model to separate probit + OLS
F.2 Normality Tests
Since the correction relies on joint normality, assess whether this assumption is plausible:
- Residual normality: test the outcome equation residuals for normality (Shapiro-Wilk, Jarque-Bera, Q-Q plot). This checks marginal normality of $\varepsilon_i$, which is necessary but not sufficient for joint normality.
- Polynomial Mills ratio test: add $\hat\lambda_i^2$ and $\hat\lambda_i^3$ to the outcome equation. If these higher-order terms are significant, the linear Mills ratio correction is insufficient — evidence against normality. The polynomial Mills ratio test is a specification test for the functional form of the correction.
- Compare two-step and MLE: large discrepancies between the two estimators suggest normality violations (MLE relies more heavily on normality than the two-step estimator).
F.3 Exclusion Restriction Strength
A weak exclusion restriction produces imprecise estimates and makes the model fragile:
- Joint significance test: test whether the excluded variables are jointly significant in the probit selection equation (likelihood ratio or Wald chi-squared test). An insignificant exclusion restriction means the model is identified only through functional form.
- Magnitude of the first-stage coefficients: the excluded variables should have economically meaningful effects on selection, not just statistical significance.
- Collinearity check: verify that the inverse Mills ratio is not highly collinear with the covariates in the outcome equation. High collinearity (variance inflation factor (VIF) > 10 for $\hat\lambda_i$) indicates that the model cannot distinguish the selection correction from the direct effects of the covariates.
F.4 Two-Step vs. MLE Comparison
Estimate the model using both methods and compare:
- Similar results: reassuring. Both methods are estimating the same parameters, and the normality assumption is likely adequate.
- Different results: cause for concern. Possible explanations: (1) normality is violated (MLE is more sensitive), (2) the exclusion restriction is weak (MLE leverages different information than two-step), (3) the sample size is too small for MLE to converge properly.
library(sampleSelection)
# Fit two-step and MLE
heck_2s <- heckit(
selection = lfp ~ age + I(age^2) + educ + nwifeinc + kidslt6 + kidsge6,
outcome = log(wage) ~ educ + exper + I(exper^2) + city,
data = df, method = "2step"
)
heck_ml <- heckit(
selection = lfp ~ age + I(age^2) + educ + nwifeinc + kidslt6 + kidsge6,
outcome = log(wage) ~ educ + exper + I(exper^2) + city,
data = df, method = "ml"
)
# F.1 Test significance of lambda (rho * sigma)
summary(heck_2s) # Check p-value on "Inverse Mills Ratio"
# F.2 Polynomial Mills ratio test for normality
# Compute Mills ratio from first-stage probit
probit_fit <- glm(lfp ~ age + I(age^2) + educ + nwifeinc +
kidslt6 + kidsge6,
family = binomial(link = "probit"), data = df)
Zgamma <- predict(probit_fit, type = "link")
mills <- dnorm(Zgamma) / pnorm(Zgamma)
# Add squared and cubed Mills ratio to outcome equation
selected <- df[df$lfp == 1, ]
mills_sel <- mills[df$lfp == 1]
selected$mills <- mills_sel
selected$mills2 <- mills_sel^2
selected$mills3 <- mills_sel^3
norm_test <- lm(log(wage) ~ educ + exper + I(exper^2) + city +
mills + mills2 + mills3,
data = selected)
# Joint test of mills2 and mills3
library(car)
linearHypothesis(norm_test, c("mills2 = 0", "mills3 = 0"))
# F.3 Exclusion restriction strength
anova(
glm(lfp ~ age + I(age^2) + educ + kidsge6,
family = binomial(link = "probit"), data = df),
probit_fit, test = "Chisq"
)
# F.4 Compare two-step and MLE
cbind(TwoStep = coef(heck_2s, part = "outcome"),
      MLE = coef(heck_ml, part = "outcome"))
F.5 Interpreting Your Results
What Rho Tells You
The parameter $\rho$, the correlation between the selection and outcome errors, is the key parameter that drives the selection correction.
| $\rho$ | Sign of bias in naive OLS | Interpretation |
|---|---|---|
| $\rho < 0$ | OLS underestimates the mean outcome | Individuals who select in have lower unobserved outcome potential (negative selection). Since $\lambda > 0$, $E[\varepsilon_i \mid s_i = 1] = \rho\sigma_\varepsilon\lambda < 0$. |
| $\rho = 0$ | No bias | Selection is independent of the outcome (conditional on covariates). OLS on the selected sample is consistent. |
| $\rho > 0$ | OLS overestimates the mean outcome | Individuals who select in have higher unobserved outcome potential (positive selection). Since $\lambda > 0$, $E[\varepsilon_i \mid s_i = 1] = \rho\sigma_\varepsilon\lambda > 0$. |
What Lambda (the Inverse Mills Ratio Coefficient) Captures
The coefficient on the inverse Mills ratio in the two-step estimator is $\rho\sigma_\varepsilon$, sometimes denoted $\beta_\lambda$ or simply $\lambda$. Its magnitude and significance determine:
- Significance of $\hat\lambda$: if significant, this result is consistent with correlation between the selection and outcome disturbances under the maintained model — but significance alone does not prove selection bias is present (Certo et al., 2016). If insignificant, either selection bias is absent or the model lacks power to detect it (weak exclusion restriction).
- Magnitude of $\hat\lambda$: the product $\rho\sigma_\varepsilon$ determines the size of the bias correction. A large coefficient means the selection correction substantially changes the outcome equation coefficients.
- Sign of $\hat\lambda$: reveals the direction of selection. A negative coefficient means negative selection ($\rho < 0$) — those who select in have lower unobserved outcome potential than a random draw from the population, so OLS on the selected sample underestimates the population mean.
What to Report in a Table
A well-reported Heckman model should include:
- Both equations: report the full selection equation (probit coefficients) and the outcome equation (OLS coefficients with Mills ratio correction)
- Lambda ($\hat\lambda$): the coefficient on the inverse Mills ratio, with its standard error and p-value
- Rho ($\hat\rho$) and sigma ($\hat\sigma_\varepsilon$): if using MLE, report these separately
- Number of observations: total N, number selected (observed outcome), number not selected
- Exclusion restriction: clearly identify which variables appear in the selection equation but not the outcome equation
- Exclusion restriction strength: chi-squared test of the excluded variables in the selection equation
- Estimation method: two-step or MLE
- Naive OLS comparison: show how the coefficients change when selection is ignored
G. What Can Go Wrong
No Exclusion Restriction
Heckman model with a credible exclusion restriction (number of young children affects labor force participation but not wage rate)
Returns to education: 0.108 (SE = 0.014). Lambda = -0.28 (SE = 0.11, p = 0.01). The selection correction is significant and the education coefficient is well-estimated because the exclusion restriction (kidslt6) is strong in the selection equation (chi-squared = 42.3, p < 0.001).
Normality Violation
Outcome variable (log wages) is approximately normally distributed. Joint normality of error terms is plausible.
Two-step estimate of returns to education: 0.108 (SE = 0.014). MLE estimate: 0.105 (SE = 0.012). The two methods agree closely, and the polynomial Mills ratio test does not reject normality (p = 0.61).
Weak Exclusion Restriction
Exclusion restriction (number of children under 6) is strongly predictive of labor force participation: probit coefficient = -0.87, chi-squared = 42.3, p < 0.001
Lambda = -0.28 (SE = 0.11). The inverse Mills ratio is precisely estimated, providing a meaningful selection correction. The VIF of lambda in the outcome equation is 2.1 — well below the danger zone.
Confusing Incidental Truncation with Sample Selection
True sample selection: wages are missing because women choose not to work. The selection decision is correlated with potential wages (women with low potential wages opt out).
Heckman correction is appropriate: rho = 0.45. Women who select into employment have higher-than-average potential wages (positive selection). OLS on the selected sample overestimates average wages in the population.
H. Practice
H.1 Concept Checks
A researcher estimates a Heckman selection model for CEO compensation, where compensation is observed only for public firms. She uses state-level IPO regulations as the exclusion restriction. She finds that the inverse Mills ratio coefficient is -0.45 (SE = 0.18, p = 0.01) and rho = -0.38. What does the negative rho tell us about the selection process?
A researcher applies the Heckman two-step estimator to study the effect of R&D spending on firm performance. She includes the same set of variables in both the selection equation (whether the firm reports R&D) and the outcome equation (firm performance given R&D is reported). She has no exclusion restriction. She finds that the inverse Mills ratio is significant (p = 0.04). Is this evidence that her model is working correctly?
You estimate a Heckman model and find that lambda (the coefficient on the inverse Mills ratio) is -0.15 with a standard error of 0.42 and p = 0.72. The naive OLS coefficient on your key variable is 0.35 (SE = 0.08), while the Heckman-corrected coefficient is 0.33 (SE = 0.14). What should you conclude?
H.2 Guided Exercise
Interpreting Heckman Selection Model Output
You study the effect of training programs on worker wages. Wages are observed only for employed workers. Your Heckman model produces:
Selection Equation (Probit: Employed = 1)
| Variable | Coeff | SE | p-value |
|---|---|---|---|
| Age | 0.045 | 0.012 | < 0.001 |
| Age-squared | -0.0005 | 0.0002 | 0.012 |
| Education | 0.112 | 0.021 | < 0.001 |
| Married | 0.284 | 0.098 | 0.004 |
| Num. children < 6 | -0.432 | 0.076 | < 0.001 [EXCLUDED] |
| Spouse income ($000s) | -0.018 | 0.005 | < 0.001 [EXCLUDED] |
Exclusion restriction test: chi-squared(2) = 52.4, p < 0.001
Outcome Equation (Dep. var: log(wage))
| Variable | Coeff | SE | p-value |
|---|---|---|---|
| Education | 0.098 | 0.015 | < 0.001 |
| Training program | 0.145 | 0.038 | < 0.001 |
| Experience | 0.032 | 0.008 | < 0.001 |
| Experience-squared | -0.0005 | 0.0002 | 0.012 |
| Inverse Mills ratio | -0.312 | 0.104 | 0.003 |
rho = -0.47, sigma = 0.664, lambda = rho × sigma = -0.312
Method: Two-step. N = 2,000 (1,340 employed, 660 not employed).
Naive OLS on employed workers: Training coefficient = 0.118 (SE = 0.032, p < 0.001).
H.3 Error Detective
Read the analysis below carefully and identify the errors.
A management researcher studies whether board diversity affects firm performance. She argues that firm performance (Tobin's Q) is observed only for publicly listed firms, creating a selection problem. She estimates a Heckman model:
Selection equation: Listed = f(firm_age, total_assets, industry) Outcome equation: TobinsQ = f(board_diversity, firm_size, leverage, ROA, industry)
She uses firm_age and total_assets as exclusion restrictions, arguing they affect the listing decision but not performance. She reports:
- Board diversity coefficient (Heckman): 0.42 (p = 0.03)
- Board diversity coefficient (OLS): 0.28 (p = 0.08)
- Lambda: -0.89 (SE = 0.31, p = 0.004)
- Rho: -0.62
- Method: Two-step
She concludes: "After correcting for selection into public listing, board diversity has an even stronger positive effect on firm performance."
She does not report the selection equation coefficients or test the exclusion restriction strength.
Select all errors you can find:
Read the analysis below carefully and identify the errors.
A labor economist studies the gender wage gap. She estimates a Heckman model separately for men and women:
For women:
- Selection equation: Employed = f(age, education, married, children_under_5, spouse_income)
- Outcome equation: log(wage) = f(education, experience, experience^2, occupation)
- Exclusion restrictions: children_under_5, spouse_income
- Lambda = -0.31 (SE = 0.09, p < 0.001), rho = -0.42
For men:
- Selection equation: Employed = f(age, education, married, children_under_5, spouse_income)
- Outcome equation: log(wage) = f(education, experience, experience^2, occupation)
- Exclusion restrictions: children_under_5, spouse_income
- Lambda = 0.02 (SE = 0.15, p = 0.89), rho = 0.03
She reports: "The selection-corrected gender wage gap is 22 log points, compared to 18 log points in naive OLS. Selection correction matters for women but not for men."
She does not discuss whether the same exclusion restriction is appropriate for both genders.
Select all errors you can find:
H.4 You Are the Referee
Read the paper summary below and write a brief referee critique (2-3 sentences) of the identification strategy.
Paper Summary
The authors study returns to education for married women using a Heckman two-step selection model, correcting for the fact that wages are observed only for employed women. They use a sample of 3,200 married women (1,950 employed, 1,250 not employed) from a national household survey. As their exclusion restriction, they use whether the woman's mother was employed when the woman was age 14, arguing that maternal employment norms affect daughters' labor force participation but not their wages. They find a selection-corrected return to education of 11.2% per year (SE = 1.8%), compared to a naive OLS estimate of 8.5% (SE = 1.1%). Lambda is -0.28 (p = 0.02).
Key Table
| Variable | Coefficient | SE | p-value |
|---|---|---|---|
| Education (years) | 0.112 | 0.018 | <0.001 |
| Experience | 0.035 | 0.009 | <0.001 |
| Experience-squared | -0.001 | 0.0004 | 0.008 |
| Inverse Mills ratio | -0.280 | 0.120 | 0.020 |
| N (total) | 3,200 | | |
| N (employed) | 1,950 | | |
Authors' Identification Claim
The authors argue that maternal employment status is a valid exclusion restriction: it affects daughters' labor force participation through intergenerational transmission of work norms but does not directly affect daughters' market wages.
I. Swap-In: When to Use Something Else
- Lee bounds: when normality is doubtful or no credible exclusion restriction exists. Lee bounds require only a monotonicity assumption (treatment does not make anyone less likely to be selected). You get an interval rather than a point estimate, but it is valid under weaker assumptions. Recommended as a robustness check alongside the Heckman model.
- Control function approach: a generalization of the Heckman two-step. Instead of assuming joint normality (which implies the correction term is the inverse Mills ratio), you can use nonparametric or semiparametric estimates of the control function. This relaxes the distributional assumption at the cost of requiring a stronger exclusion restriction and a larger sample.
- Semiparametric selection models: methods by Newey, Powell, and others estimate the selection correction without assuming normality. These use kernel or series estimators for the conditional expectation of the error. They require larger samples and are computationally more complex.
- Inverse probability weighting (IPW): weight each observation by the inverse of its probability of being selected (estimated from the selection equation). Unlike Heckman, IPW does not require normality and can accommodate nonlinear outcome models. However, IPW only addresses selection on observables — it cannot correct for selection on unobservables.
- Bivariate probit: when both the selection and the outcome are binary. The Heckman model assumes a continuous outcome; bivariate probit handles two binary equations with correlated errors. Uses the same exclusion restriction logic.
- Bounds approaches (Manski bounds): when you want the weakest possible assumptions. Manski worst-case bounds assume nothing about the selection process and provide very wide intervals. Lee bounds tighten these by adding a monotonicity assumption.
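To contrast IPW with the Heckman setup, here is a minimal sketch under selection on observables (all names and numbers are illustrative): weighting the selected observations by the inverse of their estimated selection probability recovers the population mean of the outcome, while the naive selected-sample mean is biased.

```r
set.seed(7)
n <- 1e5
x <- rnorm(n)
s <- rbinom(n, 1, pnorm(0.5 + x))     # selection depends only on OBSERVED x
y <- 1 + 2 * x + rnorm(n)             # population mean of y is 1

# Estimated selection probabilities from a probit on the observables
p <- predict(glm(s ~ x, family = binomial(link = "probit")), type = "response")

mean(y[s == 1])                          # naive: biased upward (selected have high x)
weighted.mean(y[s == 1], 1 / p[s == 1])  # IPW: close to the population mean of 1
```

Note what this cannot do: if selection also depended on the unobserved error in y, no reweighting on x would remove the bias — that is exactly the case the Heckman model targets.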
J. Reviewer Checklist
Critical Reading Checklist
Paper Library
Foundational (4)
Heckman, J. J. (1979). Sample Selection Bias as a Specification Error.
Heckman introduces the two-step estimator for correcting sample selection bias using the inverse Mills ratio. The paper shows that selection bias can be treated as an omitted variable problem, where the omitted variable is the conditional expectation of the error term given selection. One of the most cited papers in econometrics.
Newey, W. K. (1999). Two Step Series Estimation of Sample Selection Models.
Newey proposes a semiparametric two-step estimator for sample selection models that replaces the parametric inverse Mills ratio with a flexible series (power series or regression spline) approximation to the unknown selection correction function. This approach avoids the normality assumption underlying the standard Heckman correction while retaining the computational convenience of a two-step procedure. Researchers concerned about distributional misspecification in selection models can use series-based selection corrections as a robust alternative to parametric methods.
Powell, J. L. (1987). Semiparametric Estimation of Bivariate Latent Variable Models.
Powell develops semiparametric methods for estimating bivariate latent variable models—including censored sample selection models—without imposing distributional assumptions on the error terms. This approach relaxes the bivariate normality requirement of the Heckman two-step estimator, requiring only an exclusion restriction and mild regularity conditions. Researchers who doubt the normality assumption in selection models can apply these methods to obtain consistent estimates under weaker conditions.
Rivers, D., & Vuong, Q. H. (1988). Limited Information Estimators and Exogeneity Tests for Simultaneous Probit Models.
Rivers and Vuong propose a computationally simple two-step maximum likelihood procedure for estimating simultaneous probit models with endogenous regressors, and derive simple exogeneity tests based on this estimator. The exogeneity tests are asymptotically equivalent to classical tests based on limited information maximum likelihood but require only probit and OLS regressions to implement. Applied researchers working with binary outcome models and suspected endogeneity can use the Rivers-Vuong procedure as a tractable alternative to full information maximum likelihood.
Application (2)
Mroz, T. A. (1987). The Sensitivity of an Empirical Model of Married Women's Hours of Work to Economic and Statistical Assumptions.
Mroz provides a classic application of the Heckman selection model to female labor supply. Shows that the two-step estimator's results are sensitive to the choice of exclusion restriction and the normality assumption. The Mroz dataset remains a standard teaching dataset for selection models.
Shaver, J. M. (1998). Accounting for Endogeneity When Assessing Strategy Performance: Does Entry Mode Choice Affect FDI Survival?
In this foundational strategy paper, Shaver demonstrates how ignoring endogeneity — specifically, the self-selection of firms into entry modes — biases performance estimates. He shows that the choice between greenfield entries and acquisitions reflects private information about expected survival, and uses a Heckman-style selection correction to obtain consistent estimates. One of the first papers to systematically demonstrate endogeneity problems in strategy research.
Survey (6)
Bushway, S., Johnson, B. D., & Slocum, L. A. (2007). Is the Magic Still There? The Use of the Heckman Two-Step Correction for Selection Bias in Criminology.
Bushway, Johnson, and Slocum review Heckman model applications in criminology and find widespread misapplication. Emphasizes that without a credible exclusion restriction, the Heckman correction provides no improvement over naive OLS and may even increase bias.
Certo, S. T., Busenbark, J. R., Woo, H., & Semadeni, M. (2016). Sample Selection Bias and Heckman Models in Strategic Management Research.
Certo, Busenbark, Woo, and Semadeni review the use of Heckman models in strategic management. They provide practical guidance on when selection correction is needed, how to choose exclusion restrictions, and how to interpret results. Finds that many SMJ papers misapply the technique.
Lennox, C. S., Francis, J. R., & Wang, Z. (2012). Selection Models in Accounting Research.
Lennox, Francis, and Wang review the use (and misuse) of Heckman selection models in accounting research. Documents common pitfalls including weak exclusion restrictions, failure to test normality, and mechanical application without economic justification for the selection equation.
Puhani, P. A. (2000). The Heckman Correction for Sample Selection and Its Critique.
Puhani provides a short overview of Monte Carlo evidence on the Heckman two-step estimator, comparing it with full-information MLE and OLS on the selected subsample. Finds MLE preferable when the inverse Mills ratio is not highly collinear with the other regressors (that is, when the exclusion restriction has real explanatory power), and subsample OLS the most robust of the simple estimators when collinearity is severe.
Wolfolds, S. E., & Siegel, J. (2019). Misaccounting for Endogeneity: The Peril of Relying on the Heckman Two-Step Method without a Valid Instrument.
Wolfolds and Siegel demonstrate that the Heckman selection correction is frequently misapplied in management research, particularly when the exclusion restriction is not credible. They show via simulation and replication that applying the Heckman correction without a valid instrument can introduce more bias than it removes. The paper provides a cautionary guide for researchers considering selection models and recommends transparent reporting of the exclusion restriction.
Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data.
Wooldridge's graduate textbook is the standard reference for cross-section and panel data econometrics. Chapter 19 gives a rigorous treatment of censored data, sample selection, and attrition, covering the Heckman two-step and maximum likelihood estimators, their identifying assumptions, and inverse probability weighting alternatives. The book covers both linear and nonlinear models with careful attention to assumptions.