MethodAtlas
Model-Based · Established

Heckman Selection Model

Corrects for sample selection bias when the outcome is observed only for a non-random subset of the population, using a two-equation system with an exclusion restriction.

Quick Reference

When to Use
When your outcome variable is observed only for a selected sample -- e.g., wages only for employed workers, firm performance only for surviving firms, deal returns only for completed transactions.
Key Assumption
Joint normality of the error terms in the selection and outcome equations. An exclusion restriction: at least one variable affects selection but not the outcome. Without the exclusion restriction, identification relies entirely on the normality assumption.
Common Mistake
Not having a credible exclusion restriction and relying solely on the normality assumption for identification, which produces fragile estimates.
Estimated Time
3 hours

One-Line Implementation

Stata: heckman wage education experience, select(employed = children married) twostep
R: selection(employed ~ children + married + education + experience, wage ~ education + experience, data = df)
Python: Heckman(df['wage'], df[['const','education','experience']], df['employed'], df[['const','children','married','education','experience']]).fit()


Motivating Example

A labor economist wants to estimate the returns to education for married women. She has a large survey with information on education, age, number of children, husband's income, and — for women who are employed — their hourly wage.

Here is the problem: wages are only observed for women who work. Of the 753 women in her sample, only 428 are employed. The remaining 325 women have missing wages — not because of data errors, but because they chose not to participate in the labor market.

If she simply runs OLS on the 428 working women, she estimates wages conditional on employment. But employment itself is a choice that depends on potential wages. Women with very low potential wages may choose not to work (because the opportunity cost of leisure or home production exceeds their market wage). By restricting the sample to employed women, the researcher is selecting on an outcome that is correlated with the error term in the wage equation.

This problem is sample selection bias, sometimes called incidental truncation. It is not the same as omitted variable bias or measurement error — it arises because the sample is non-randomly drawn from the population. The observed wage distribution is truncated from below: women with the lowest potential wages are disproportionately absent from the sample.

The Heckman selection model (Heckman, 1979) solves this problem. It jointly models two processes: (1) the selection equation — whether a woman works — and (2) the outcome equation — what she earns if she works. By estimating the correlation between the errors in these two equations, the model corrects for the non-random selection into the observed sample. The key correction term is the inverse Mills ratio, which captures the expected value of the error in the wage equation conditional on the woman being selected into the sample.

This approach earned James Heckman the 2000 Nobel Prize in Economics and remains one of the most widely used corrections for sample selection bias across the social sciences (Mroz, 1987).


A. Overview

What the Heckman Model Does

The Heckman selection model corrects for bias that arises when the outcome variable is observed only for a non-random subset of the population. It does so by modeling the selection process explicitly and incorporating information about selection into the outcome equation.

The model consists of two equations:

Selection equation (who is observed):

D_i^* = Z_i'\gamma + u_i, \quad D_i = \mathbf{1}(D_i^* > 0)

where D_i = 1 if the outcome is observed (e.g., the woman works) and D_i = 0 otherwise. Z_i is a vector of covariates that affect selection, and u_i is the error term.

Outcome equation (what is the outcome, conditional on being observed):

Y_i = X_i'\beta + \varepsilon_i, \quad \text{observed only if } D_i = 1

where Y_i is the outcome of interest (e.g., wages), X_i is a vector of covariates, and ε_i is the error term.

The key assumption is that the errors (u_i, ε_i) are jointly normally distributed:

\begin{pmatrix} u_i \\ \varepsilon_i \end{pmatrix} \sim N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho\sigma \\ \rho\sigma & \sigma^2 \end{pmatrix}\right)

where ρ is the correlation between the selection and outcome errors, and σ is the standard deviation of ε_i. The variance of u_i is normalized to 1 (as in a standard probit model).

The Selection Bias Problem

If ρ ≠ 0, then E[ε_i | D_i = 1] ≠ 0 — the expected error in the wage equation is non-zero for the selected sample. Specifically:

E[Y_i | X_i, D_i = 1] = X_i'\beta + \rho\sigma \cdot \frac{\phi(Z_i'\gamma)}{\Phi(Z_i'\gamma)}

The term λ(Z_i'γ) = φ(Z_i'γ) / Φ(Z_i'γ) is the inverse Mills ratio, where φ(·) is the standard normal PDF and Φ(·) is the standard normal CDF. Running OLS on the selected sample omits this term, producing biased estimates of β whenever ρ ≠ 0.
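
In code, the inverse Mills ratio is a one-liner built from standard normal functions. A minimal Python sketch (the helper name is mine, for illustration):

```python
import numpy as np
from scipy.stats import norm

def inverse_mills(c):
    """lambda(c) = phi(c) / Phi(c): the omitted term in the selected
    sample's conditional mean, up to the scale factor rho * sigma."""
    return norm.pdf(c) / norm.cdf(c)

# lambda is positive and decreasing in the selection index Z'gamma:
# observations that barely made it into the sample carry the largest
# correction term.
print(inverse_mills(np.array([-2.0, 0.0, 2.0])))
```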

Two Estimation Approaches

Heckman Two-Step (Heckit):

  1. Estimate the selection equation as a probit: P(D_i = 1 | Z_i) = Φ(Z_i'γ)
  2. Compute the inverse Mills ratio: λ̂_i = φ(Z_i'γ̂) / Φ(Z_i'γ̂)
  3. Include λ̂_i as an additional regressor in the outcome equation and estimate by OLS: Y_i = X_i'β + ρσ·λ̂_i + error

The coefficient on λ̂_i estimates ρσ. Standard errors must be corrected because λ̂_i is a generated regressor (the standard OLS standard errors are too small).
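
The three steps can be run end-to-end on simulated data. The sketch below (Python, with an illustrative DGP; the probit is hand-rolled so the example stays self-contained) recovers a true slope of 0.5 that naive OLS on the selected sample misses:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, beta, rho, sigma = 20_000, 0.5, 0.8, 1.0

x = rng.standard_normal(n)                  # outcome covariate
z = rng.standard_normal(n)                  # exclusion restriction
u = rng.standard_normal(n)                  # selection error, Var = 1
eps = sigma * (rho * u + np.sqrt(1 - rho**2) * rng.standard_normal(n))
d = 0.5 * x + 1.0 * z + u > 0               # D = 1(Z'gamma + u > 0)
y = 1.0 + beta * x + eps                    # observed only when d is True

# Step 1: probit of d on (1, x, z) by maximum likelihood
Zmat = np.column_stack([np.ones(n), x, z])
def negll(g):
    idx = Zmat @ g
    return -np.where(d, norm.logcdf(idx), norm.logcdf(-idx)).sum()
gamma_hat = minimize(negll, np.zeros(3), method="BFGS").x

# Step 2: inverse Mills ratio at the estimated selection index
idx = Zmat @ gamma_hat
mills = norm.pdf(idx) / norm.cdf(idx)

# Step 3: OLS of y on (1, x, mills) in the selected sample
X_naive = np.column_stack([np.ones(n), x])[d]
X_heck = np.column_stack([np.ones(n), x, mills])[d]
naive = np.linalg.lstsq(X_naive, y[d], rcond=None)[0]
heck = np.linalg.lstsq(X_heck, y[d], rcond=None)[0]

print("naive OLS slope:", round(naive[1], 3))        # biased away from 0.5
print("two-step slope: ", round(heck[1], 3))         # close to the true 0.5
print("Mills coefficient (rho*sigma):", round(heck[2], 3))
```

Note that the plain OLS standard errors in Step 3 are wrong for the reason given above (generated regressor); canned routines apply the correction automatically.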

Full Information Maximum Likelihood (FIML):

Estimate both equations simultaneously by maximizing the joint likelihood of the observed data. FIML is more efficient than the two-step estimator (especially when the exclusion restriction is weak) but requires the joint normality assumption to hold for the entire joint distribution, not just the conditional mean (Puhani, 2000).
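
To make the FIML objective concrete, here is a sketch of the Heckman log-likelihood evaluated on simulated data (Python; the DGP and parameter values are illustrative, and a real routine would maximize this function over all parameters rather than evaluate it at fixed ones):

```python
import numpy as np
from scipy.stats import norm

def heckman_loglik(y, X, Z, d, beta, gamma, rho, sigma):
    """Joint log-likelihood of the Heckman model.
    d = 0 contributes log Phi(-Z'gamma); d = 1 contributes the outcome
    density times the selection probability given the outcome residual."""
    zi = Z @ gamma
    ll = norm.logcdf(-zi[~d]).sum()                      # non-selected
    r = (y[d] - X[d] @ beta) / sigma                     # std. residual
    ll += (norm.logpdf(r) - np.log(sigma)).sum()         # outcome density
    ll += norm.logcdf((zi[d] + rho * r) / np.sqrt(1.0 - rho**2)).sum()
    return ll

# Simulated data with rho = 0.8, sigma = 1, beta = (1, 0.5), gamma = (0, 0.5, 1)
rng = np.random.default_rng(4)
n = 20_000
x, z, u = rng.standard_normal((3, n))
eps = 0.8 * u + 0.6 * rng.standard_normal(n)   # Corr(u, eps) = 0.8, Var = 1
d = 0.5 * x + z + u > 0
y = 1.0 + 0.5 * x + eps
X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), x, z])

beta0, gamma0 = np.array([1.0, 0.5]), np.array([0.0, 0.5, 1.0])
ll_true = heckman_loglik(y, X, Z, d, beta0, gamma0, 0.8, 1.0)
ll_norho = heckman_loglik(y, X, Z, d, beta0, gamma0, 0.0, 1.0)
print(ll_true > ll_norho)   # the joint likelihood favors the true rho
```

In practice the optimizer works over a transformed parameterization (e.g., atanh(ρ) and log σ) to keep ρ in (-1, 1) and σ positive; this is the objective that FIML routines such as `selection(..., method = "ml")` maximize.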

When to Use the Heckman Model

  • Your outcome is observed only for a selected subsample and you believe the selection is non-random
  • You have a credible exclusion restriction — a variable that affects selection but not the outcome
  • Joint normality of the error terms is a reasonable approximation
  • You want to estimate causal effects or population-level relationships, not just conditional associations for the selected sample

When NOT to Use the Heckman Model

  • Your missing data is missing at random (MAR) — standard imputation or inverse probability weighting may suffice
  • You have no credible exclusion restriction — without one, the model is identified only through functional form (normality), which is fragile (Bushway et al., 2007)
  • The normality assumption is badly violated — consider semiparametric alternatives or Lee bounds
  • You have a randomized experiment with attrition — Lee bounds provide a more robust approach


When (Not) to Use This Method

Use the Heckman Model When:

  1. Your outcome is observed only for a non-randomly selected subset. Wages for workers, test scores for students who did not drop out, firm performance for firms that survived, analyst forecasts for covered firms. The key question is: would including the unobserved outcomes change your conclusions?

  2. You have a credible exclusion restriction. A variable that plausibly affects selection but not the outcome. Without one, the model relies on functional form assumptions that are typically indefensible.

  3. Joint normality is a reasonable approximation. If the outcome variable is continuous, approximately symmetric, and not heavily skewed or bounded, normality is more plausible.

  4. You want to recover population-level parameters. If you only care about the effect of education on wages for women who work, OLS on the selected sample is fine. But if you want the effect of education on potential wages for all women (workers and non-workers), you need the selection correction.

Do NOT Use the Heckman Model When:

  1. Your missing data is missing at random (MAR). If missingness depends only on observed covariates, standard methods (multiple imputation, inverse probability weighting) are sufficient and do not require the normality assumption.

  2. You have no exclusion restriction and cannot defend one. Without an exclusion restriction, you are relying entirely on the nonlinearity of the normal CDF for identification. Certo et al. (2016) find that many management papers apply Heckman without a credible exclusion restriction, rendering the correction unreliable.

  3. The normality assumption is doubtful. If the outcome variable is heavily skewed (e.g., firm value, patent counts) or bounded (e.g., proportions, ratings on a fixed scale), the joint normality assumption is suspect. Consider semiparametric alternatives or Lee bounds.

  4. You have a randomized experiment with differential attrition. Lee bounds provide a nonparametric approach that does not require normality or an exclusion restriction. They give bounds on the treatment effect rather than a point estimate, but the bounds are valid under weaker assumptions.

  5. Selection is on observables only. If you believe you have observed all the variables that drive selection, matching or inverse probability weighting can address selection without parametric distributional assumptions.

Connection to Other Methods

The Heckman model relates to several other methods for handling selection and endogeneity:

  • Logit/Probit: the selection equation in the Heckman model is a probit. If you are only interested in modeling the selection decision itself (e.g., labor force participation), a standalone probit is sufficient. The Heckman model adds the outcome equation and the correlation between the two.

  • OLS: the outcome equation estimated on the selected sample is OLS with selection bias. The Heckman model adds the inverse Mills ratio to correct this bias. If ρ=0\rho = 0 (no selection bias), the Heckman model reduces to OLS on the selected sample.

  • IV/2SLS: conceptually similar in that both require an exclusion restriction. In IV, the instrument affects the endogenous regressor but not the outcome directly. In Heckman, the excluded variable affects selection but not the outcome. The Heckman two-step can be viewed as a control function approach — adding a correction term rather than instrumenting.

  • Lee bounds: a nonparametric alternative that bounds the treatment effect under weaker assumptions (monotonicity of selection, no functional form). Lee bounds are popular in program evaluation when normality is questionable. The trade-off is that you get bounds rather than a point estimate.

  • Matching: addresses selection on observables. Matching assumes that, conditional on observed covariates, selection is as good as random. The Heckman model addresses selection on unobservables — the case where selection depends on factors correlated with the outcome error term even after conditioning on observables.

  • Control function approach: the Heckman two-step is a special case of the control function approach. In the general control function framework, the correction term can take forms other than the inverse Mills ratio (e.g., for non-normal error distributions). Rivers-Vuong (1988) extends this to simultaneous equations with endogenous regressors.


B. Identification

For the Heckman model to provide valid correction, two key conditions must hold.

Condition 1: Joint Normality

Plain language: The unobserved factors affecting selection and the unobserved factors affecting the outcome must follow a bivariate normal distribution.

Formally: (u_i, ε_i) ~ N(0, Σ) where Σ = \begin{pmatrix} 1 & \rho\sigma \\ \rho\sigma & \sigma^2 \end{pmatrix}.

This assumption is crucial because the inverse Mills ratio correction is derived from the properties of the bivariate normal distribution. If the errors are not jointly normal, the functional form of the correction term λ(·) is wrong, and the bias correction may itself be biased.

Condition 2: Exclusion Restriction

Plain language: At least one variable must appear in the selection equation but not in the outcome equation. This variable affects whether the outcome is observed but does not directly affect the outcome itself.

Formally: There exists a variable Z_k such that γ_k ≠ 0 (it affects selection) but Z_k does not appear in X_i (it is excluded from the outcome equation).

Why it matters: Without an exclusion restriction, the Heckman model is identified only through the nonlinearity of the inverse Mills ratio — which comes from the normality assumption. If normality is even slightly wrong, this "identification through functional form" can produce wildly inaccurate corrections. Bushway et al. (2007) demonstrate that without a credible exclusion restriction, the Heckman correction provides no improvement over naive OLS and may increase bias.

Examples of exclusion restrictions in practice:

| Research setting | Selection equation | Exclusion restriction | Justification |
|---|---|---|---|
| Female labor supply | Works or not | Number of young children, husband's income | Affect labor force participation but not the wage rate conditional on working |
| Firm R&D spending | Reports R&D or not | Industry peers' reporting practices | Affect disclosure but not the level of R&D |
| CEO compensation | Firm is publicly traded | State-level IPO regulations | Affect listing decision but not pay |
| Analyst coverage | Firm is covered or not | Geographic distance to nearest analyst | Affects coverage probability but not firm value |

C. Visual Intuition

Adjust the selection correlation to see how naive OLS on the selected sample diverges from the true effect. When selection is strong, OLS is badly biased; the Heckman correction uses the inverse Mills ratio to recover the true coefficient.

Interactive Simulation

Heckman Selection Correction vs. Naive OLS

DGP: Selection S* = 0.2 + Z + v, Outcome Y = 1 + 0.5·X + ε, Corr(v, ε) = -0.4. 276 of 500 observations selected.

[Figure: scatter of outcome (Y) against covariate (X) for the selected sample, with fitted lines for naive OLS, the Heckman two-step, and the true effect.]

Estimation Results

| Estimator | β̂ | SE | 95% CI | Bias |
|---|---|---|---|---|
| Naive OLS | 0.518 | 0.037 | [0.44, 0.59] | +0.018 |
| Heckman two-step | 0.516 | 0.037 | [0.44, 0.59] | +0.016 |
| True β | 0.500 | | | |

Simulation parameters: 500 total observations (before selection); correlation between selection and outcome errors = -0.4; true causal effect of X on Y = 0.5; standard deviation of the outcome error = 1.00.

Why the difference?

Selection is negatively correlated with the outcome (ρ = -0.4). Naive OLS on the selected sample is biased (β̂ = 0.518, bias = +0.018) because it ignores the non-random selection into the observed sample. The Heckman two-step estimator includes the inverse Mills ratio to correct for selection, reducing the bias by about 11% (β̂ = 0.516, bias = +0.016). The correction works because the exclusion restriction variable Z affects selection but not the outcome directly.


D. Mathematical Derivation

Derivation of the Inverse Mills Ratio Correction

Don't worry about the notation yet — here's what this means in words: The two-step correction derives from the conditional expectation of the outcome given selection, under joint normality of the error terms.

Setup. We have two equations:

Selection: D_i^* = Z_i'\gamma + u_i, where D_i = \mathbf{1}(D_i^* > 0)

Outcome: Y_i = X_i'\beta + \varepsilon_i, observed only when D_i = 1

Assume (u_i, ε_i) are jointly normal with correlation ρ and Var(u_i) = 1, Var(ε_i) = σ².

Step 1: Conditional expectation of the outcome.

E[Y_i | X_i, D_i = 1] = X_i'\beta + E[\varepsilon_i | D_i = 1]

The OLS bias is E[ε_i | D_i = 1], which is non-zero when selection is correlated with the outcome.

Step 2: Use the properties of the truncated bivariate normal.

Since D_i = 1 iff u_i > -Z_i'γ, we need E[ε_i | u_i > -Z_i'γ].

For jointly normal (u_i, ε_i):

E[\varepsilon_i | u_i > -Z_i'\gamma] = \rho\sigma \cdot E[u_i | u_i > -Z_i'\gamma]

Step 3: Expected value of a truncated standard normal.

For u_i ~ N(0, 1):

E[u_i | u_i > -c] = \frac{\phi(c)}{\Phi(c)}

where c = Z_i'γ. This ratio λ(c) = φ(c) / Φ(c) is the inverse Mills ratio.
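
Step 3 is easy to sanity-check by Monte Carlo (an illustrative check, not part of the estimator):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
u = rng.standard_normal(2_000_000)           # draws from N(0, 1)

c = 0.7                                      # plays the role of Z'gamma
empirical = u[u > -c].mean()                 # E[u | u > -c] from the draws
theoretical = norm.pdf(c) / norm.cdf(c)      # phi(c) / Phi(c)
print(empirical, theoretical)                # the two should agree closely
```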

Step 4: Combine.

E[Y_i | X_i, D_i = 1] = X_i'\beta + \rho\sigma \cdot \lambda(Z_i'\gamma)

This derivation shows that OLS on the selected sample omits the term ρσ·λ(Z_i'γ), producing an omitted variable bias. The two-step estimator corrects this by including λ̂_i as an additional regressor.

Step 5: Standard error correction.

Because λ̂_i is estimated from the first-stage probit (it is a generated regressor), the OLS standard errors in the second stage are incorrect. The correct variance-covariance matrix accounts for the estimation uncertainty in γ̂. All standard software packages compute the corrected standard errors automatically.


E. Implementation

Heckman Selection Model with Diagnostics

library(sampleSelection)

# ---- Step 1: Prepare the data ----
# Outcome: log(wage), observed only for working women
# Selection: whether the woman works (lfp = 1)
# Exclusion restrictions: non-wife income (nwifeinc) and number of
# children (kidslt6, kidsge6)
# These affect labor force participation but not the wage rate

# ---- Step 2: Heckman two-step estimator ----
heck_2step <- heckit(
selection = lfp ~ age + I(age^2) + educ + nwifeinc + kidslt6 + kidsge6,
outcome   = log(wage) ~ educ + exper + I(exper^2) + city,
data = df,
method = "2step"
)
summary(heck_2step)

# The coefficient on the inverse Mills ratio (lambda) estimates rho * sigma
# If lambda is significant, selection bias is present

# ---- Step 3: Full information MLE ----
heck_mle <- heckit(
selection = lfp ~ age + I(age^2) + educ + nwifeinc + kidslt6 + kidsge6,
outcome   = log(wage) ~ educ + exper + I(exper^2) + city,
data = df,
method = "ml"
)
summary(heck_mle)

# MLE directly estimates rho and sigma (not just rho*sigma)
# Compare two-step and MLE results — large discrepancies suggest
# problems with normality or the exclusion restriction

# ---- Step 4: Diagnostics ----
# Test significance of inverse Mills ratio
# H0: rho = 0 (no selection bias)
# If not rejected, OLS on the selected sample is consistent

# Compare with naive OLS (ignoring selection)
ols_naive <- lm(log(wage) ~ educ + exper + I(exper^2) + city,
              data = df[df$lfp == 1, ])
summary(ols_naive)

# If Heckman and OLS give similar coefficients, selection bias
# may be small (or the correction is not working due to
# weak exclusion restriction)

F. Diagnostics

F.1 Significance of the Inverse Mills Ratio (Lambda)

The most basic diagnostic: is the coefficient on the inverse Mills ratio statistically significant? Under the null hypothesis H₀: ρ = 0, there is no selection bias, and OLS on the selected sample is consistent. If λ̂ is insignificant, either (a) there is genuinely no selection bias, or (b) the exclusion restriction is too weak to detect it.

  • In the two-step estimator, test whether the coefficient on λ̂_i is significant (use corrected standard errors)
  • In MLE, test H₀: ρ = 0 using the Wald test on ρ̂ or a likelihood ratio test comparing the joint model to separate probit + OLS

F.2 Normality Tests

Since the correction relies on joint normality, assess whether this assumption is plausible:

  1. Residual normality: test the outcome equation residuals for normality (Shapiro-Wilk, Jarque-Bera, Q-Q plot). This checks marginal normality of ε_i, which is necessary but not sufficient for joint normality.

  2. Polynomial Mills ratio test: add λ̂_i² and λ̂_i³ to the outcome equation. If these higher-order terms are significant, the linear Mills ratio correction is insufficient — evidence against normality. The polynomial Mills ratio test is a specification test for the functional form of the correction.

  3. Compare two-step and MLE: large discrepancies between the two estimators suggest normality violations (MLE relies more heavily on normality than the two-step estimator).

F.3 Exclusion Restriction Strength

A weak exclusion restriction produces imprecise estimates and makes the model fragile:

  1. Joint significance test: test whether the excluded variables are jointly significant in the probit selection equation (χ² test). An insignificant exclusion restriction means the model is unidentified without functional form.

  2. Magnitude of the first-stage coefficients: the excluded variables should have economically meaningful effects on selection, not just statistical significance.

  3. Collinearity check: verify that the inverse Mills ratio is not highly collinear with the covariates in the outcome equation. High collinearity (VIF > 10 for λ̂) indicates that the model cannot distinguish the selection correction from the direct effects of the covariates.
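
The collinearity check can be scripted with a generic VIF helper (illustrative Python; VIF_j = 1 / (1 − R²_j), where R²_j comes from regressing regressor j on the remaining regressors):

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of a regressor matrix.
    Columns are centered, so no intercept column is needed."""
    Xc = X - X.mean(axis=0)
    out = []
    for j in range(Xc.shape[1]):
        yj = Xc[:, j]
        others = np.delete(Xc, j, axis=1)
        resid = yj - others @ np.linalg.lstsq(others, yj, rcond=None)[0]
        r2 = 1.0 - resid.var() / yj.var()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

# Example: the third column nearly duplicates the first, so its VIF
# blows up, while the independent second column stays near 1.
rng = np.random.default_rng(2)
a, b = rng.standard_normal((2, 500))
X = np.column_stack([a, b, a + 0.1 * rng.standard_normal(500)])
print(vif(X).round(1))
```

In the Heckman context, X would be the outcome-equation covariates plus the estimated Mills ratio; a VIF far above 10 on the Mills ratio column signals the identification-through-functional-form problem discussed above.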

F.4 Two-Step vs. MLE Comparison

Estimate the model using both methods and compare:

  • Similar results: reassuring. Both methods are estimating the same parameters, and the normality assumption is likely adequate.
  • Different results: cause for concern. Possible explanations: (1) normality is violated (MLE is more sensitive), (2) the exclusion restriction is weak (MLE leverages different information than two-step), (3) the sample size is too small for MLE to converge properly.
library(sampleSelection)

# Fit two-step and MLE
heck_2s <- heckit(
selection = lfp ~ age + I(age^2) + educ + nwifeinc + kidslt6 + kidsge6,
outcome   = log(wage) ~ educ + exper + I(exper^2) + city,
data = df, method = "2step"
)
heck_ml <- heckit(
selection = lfp ~ age + I(age^2) + educ + nwifeinc + kidslt6 + kidsge6,
outcome   = log(wage) ~ educ + exper + I(exper^2) + city,
data = df, method = "ml"
)

# F.1 Test significance of lambda (rho * sigma)
summary(heck_2s)  # Check p-value on "Inverse Mills Ratio"

# F.2 Polynomial Mills ratio test for normality
# Compute Mills ratio from first-stage probit
probit_fit <- glm(lfp ~ age + I(age^2) + educ + nwifeinc +
                  kidslt6 + kidsge6,
                family = binomial(link = "probit"), data = df)
Zgamma <- predict(probit_fit, type = "link")
mills <- dnorm(Zgamma) / pnorm(Zgamma)

# Add squared and cubed Mills ratio to outcome equation
selected <- df[df$lfp == 1, ]
mills_sel <- mills[df$lfp == 1]
selected$mills  <- mills_sel
selected$mills2 <- mills_sel^2
selected$mills3 <- mills_sel^3

norm_test <- lm(log(wage) ~ educ + exper + I(exper^2) + city +
                mills + mills2 + mills3,
              data = selected)
# Joint test of mills2 and mills3
library(car)
linearHypothesis(norm_test, c("mills2 = 0", "mills3 = 0"))

# F.3 Exclusion restriction strength
anova(
glm(lfp ~ age + I(age^2) + educ + kidsge6,
    family = binomial(link = "probit"), data = df),
probit_fit, test = "Chisq"
)

# F.4 Compare two-step and MLE
cbind(TwoStep = coef(heck_2s, part = "outcome"),
    MLE = coef(heck_ml, part = "outcome"))

F.5 Interpreting Your Results

What Rho Tells You

The parameter ρ — the correlation between the selection and outcome errors — is the key parameter that drives the selection correction. Because the omitted term is ρσ·λ(Z_i'γ) and λ is decreasing in the selection index, the slope-bias directions below assume the selection probability is increasing in the covariate of interest.

| ρ | Bias in naive OLS | Interpretation |
|---|---|---|
| ρ > 0 | OLS tends to underestimate β | Positive selection: individuals who select in have higher unobserved outcome potential. Example: women who choose to work have higher unobserved ability, inflating observed wages relative to the population. |
| ρ = 0 | No bias | Selection is independent of the outcome (conditional on covariates). OLS on the selected sample is consistent. |
| ρ < 0 | OLS tends to overestimate β | Negative selection: individuals who select in have lower unobserved outcome potential. Example: workers who accept a dangerous job have lower outside options, deflating observed compensating differentials. |
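
The direction-of-selection logic follows directly from the derivation in Section D — E[ε | selected] = ρσλ, which is positive when ρ > 0 — and a quick Monte Carlo (with an illustrative ρ = 0.6) makes it concrete:

```python
import numpy as np

rng = np.random.default_rng(3)
n, rho = 1_000_000, 0.6
u = rng.standard_normal(n)                                    # selection error
eps = rho * u + np.sqrt(1 - rho**2) * rng.standard_normal(n)  # Corr(u, eps) = rho

selected = u > 0                 # select in when the selection shock is high
print(eps[selected].mean())      # positive: with rho > 0, selected draws
                                 # have above-average outcome errors
print(eps[~selected].mean())     # negative: those who stay out are below average
```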

What the Lambda (λ̂) Coefficient Captures

The coefficient on the inverse Mills ratio in the two-step estimator is ρ̂σ̂, sometimes denoted λ̂ or δ̂. Its magnitude and significance determine:

  1. Significance of λ̂: if significant, sample selection bias is present, and the Heckman correction is doing meaningful work. If insignificant, either selection bias is absent or the model lacks power to detect it (weak exclusion restriction).

  2. Magnitude of λ̂: the product ρσ determines the size of the bias correction. A large |λ̂| means the selection correction substantially changes the outcome equation coefficients.

  3. Sign of λ̂: reveals the direction of selection. A positive λ̂ coefficient (ρ > 0) indicates positive selection — those who select in have higher outcomes than a random draw from the population with the same covariates; a negative coefficient indicates negative selection.

What to Report in a Table

A well-reported Heckman model should include:

  1. Both equations: report the full selection equation (probit coefficients) and the outcome equation (OLS coefficients with Mills ratio correction)
  2. Lambda (ρσ): the coefficient on the inverse Mills ratio, with its standard error and p-value
  3. Rho (ρ̂) and sigma (σ̂): if using MLE, report these separately
  4. Number of observations: total N, number selected (observed outcome), number not selected
  5. Exclusion restriction: clearly identify which variables appear in the selection equation but not the outcome equation
  6. Exclusion restriction strength: chi-squared test of the excluded variables in the selection equation
  7. Estimation method: two-step or MLE
  8. Naive OLS comparison: show how the coefficients change when selection is ignored

G. What Can Go Wrong

Assumption Failure Demo

No Exclusion Restriction

Baseline (assumption satisfied): Heckman model with a credible exclusion restriction (number of young children affects labor force participation but not the wage rate).

Returns to education: 0.108 (SE = 0.014). Lambda = -0.28 (SE = 0.11, p = 0.01). The selection correction is significant and the education coefficient is well-estimated because the exclusion restriction (kidslt6) is strong in the selection equation (chi-squared = 42.3, p < 0.001).

Assumption Failure Demo

Normality Violation

Baseline (assumption satisfied): Outcome variable (log wages) is approximately normally distributed, so joint normality of the error terms is plausible.

Two-step estimate of returns to education: 0.108 (SE = 0.014). MLE estimate: 0.105 (SE = 0.012). The two methods agree closely, and the polynomial Mills ratio test does not reject normality (p = 0.61).

Assumption Failure Demo

Weak Exclusion Restriction

Baseline (assumption satisfied): The exclusion restriction (number of children under 6) is strongly predictive of labor force participation: probit coefficient = -0.87, chi-squared = 42.3, p < 0.001.

Lambda = -0.28 (SE = 0.11). The inverse Mills ratio is precisely estimated, providing a meaningful selection correction. The VIF of lambda in the outcome equation is 2.1 — well below the danger zone.

Assumption Failure Demo

Confusing Incidental Truncation with Sample Selection

True sample selection: wages are missing because women choose not to work. The selection decision is correlated with potential wages (women with low potential wages opt out).

Heckman correction is appropriate: rho = 0.45. Women who select into employment have higher-than-average potential wages. OLS on the selected sample overestimates average wages in the population.


H. Practice

H.1 Concept Checks

Concept Check

A researcher estimates a Heckman selection model for CEO compensation, where compensation is observed only for public firms. She uses state-level IPO regulations as the exclusion restriction. She finds that the inverse Mills ratio coefficient is -0.45 (SE = 0.18, p = 0.01) and rho = -0.38. What does the negative rho tell us about the selection process?

Concept Check

A researcher applies the Heckman two-step estimator to study the effect of R&D spending on firm performance. She includes the same set of variables in both the selection equation (whether the firm reports R&D) and the outcome equation (firm performance given R&D is reported). She has no exclusion restriction. She finds that the inverse Mills ratio is significant (p = 0.04). Is this evidence that her model is working correctly?

Concept Check

You estimate a Heckman model and find that lambda (the coefficient on the inverse Mills ratio) is -0.15 with a standard error of 0.42 and p = 0.72. The naive OLS coefficient on your key variable is 0.35 (SE = 0.08), while the Heckman-corrected coefficient is 0.33 (SE = 0.14). What should you conclude?

H.2 Guided Exercise

Guided Exercise

Interpreting Heckman Selection Model Output

You study the effect of training programs on worker wages. Wages are observed only for employed workers. Your Heckman model produces:

**Selection Equation (Probit: Employed = 1)**

| Variable | Coeff | SE | p-value |
|---|---|---|---|
| Age | 0.045 | 0.012 | < 0.001 |
| Age-squared | -0.0005 | 0.0002 | 0.012 |
| Education | 0.112 | 0.021 | < 0.001 |
| Married | 0.284 | 0.098 | 0.004 |
| Num. children < 6 [EXCLUDED] | -0.432 | 0.076 | < 0.001 |
| Spouse income ($000s) [EXCLUDED] | -0.018 | 0.005 | < 0.001 |

Exclusion restriction test: chi-squared(2) = 52.4, p < 0.001

**Outcome Equation (Dep. var: log(wage))**

| Variable | Coeff | SE | p-value |
|---|---|---|---|
| Education | 0.098 | 0.015 | < 0.001 |
| Training program | 0.145 | 0.038 | < 0.001 |
| Experience | 0.032 | 0.008 | < 0.001 |
| Experience-squared | -0.0005 | 0.0002 | 0.012 |
| Inverse Mills ratio | -0.312 | 0.104 | 0.003 |

rho = -0.47, sigma = 0.664, lambda = rho · sigma = -0.312

Method: Two-step. N = 2,000 (1,340 employed, 660 not employed).

Naive OLS on employed workers: Training coefficient = 0.118 (SE = 0.032, p < 0.001).

Are the exclusion restrictions strong? How do you know?

Is there evidence of sample selection bias? What is the direction of selection?

What is the selection-corrected effect of the training program? How does it differ from the naive OLS estimate, and why?

In plain language, what does rho = -0.47 mean for the population of non-workers?

H.3 Error Detective

Error Detective

Read the analysis below carefully and identify the errors.

A management researcher studies whether board diversity affects firm performance. She argues that firm performance (Tobin's Q) is observed only for publicly listed firms, creating a selection problem. She estimates a Heckman model:

Selection equation: Listed = f(firm_age, total_assets, industry)
Outcome equation: TobinsQ = f(board_diversity, firm_size, leverage, ROA, industry)

She uses firm_age and total_assets as exclusion restrictions, arguing they affect the listing decision but not performance. She reports:

  • Board diversity coefficient (Heckman): 0.42 (p = 0.03)
  • Board diversity coefficient (OLS): 0.28 (p = 0.08)
  • Lambda: -0.89 (SE = 0.31, p = 0.004)
  • Rho: -0.62
  • Method: Two-step

She concludes: "After correcting for selection into public listing, board diversity has an even stronger positive effect on firm performance." She does not report the selection equation coefficients or test the exclusion restriction strength.

Select all errors you can find:

Error Detective

Read the analysis below carefully and identify the errors.

A labor economist studies the gender wage gap. She estimates a Heckman model separately for men and women:

For women:
  • Selection equation: Employed = f(age, education, married, children_under_5, spouse_income)
  • Outcome equation: log(wage) = f(education, experience, experience^2, occupation)
  • Exclusion restrictions: children_under_5, spouse_income
  • Lambda = -0.31 (SE = 0.09, p < 0.001), rho = -0.42

For men:
  • Selection equation: Employed = f(age, education, married, children_under_5, spouse_income)
  • Outcome equation: log(wage) = f(education, experience, experience^2, occupation)
  • Exclusion restrictions: children_under_5, spouse_income
  • Lambda = 0.02 (SE = 0.15, p = 0.89), rho = 0.03

She reports: "The selection-corrected gender wage gap is 22 log points, compared to 18 log points in naive OLS. Selection correction matters for women but not for men." She does not discuss whether the same exclusion restriction is appropriate for both genders.

Select all errors you can find:

H.4 You Are the Referee

Referee Exercise

Read the paper summary below and write a brief referee critique (2-3 sentences) of the identification strategy.

Paper Summary

The authors study returns to education for married women using a Heckman two-step selection model, correcting for the fact that wages are observed only for employed women. They use a sample of 3,200 married women (1,950 employed, 1,250 not employed) from a national household survey. As their exclusion restriction, they use whether the woman's mother was employed when the woman was age 14, arguing that maternal employment norms affect daughters' labor force participation but not their wages. They find a selection-corrected return to education of 11.2% per year (SE = 1.8%), compared to a naive OLS estimate of 8.5% (SE = 1.1%). Lambda is -0.28 (p = 0.02).

Key Table

Variable              Coefficient   SE      p-value
Education (years)     0.112         0.018   <0.001
Experience            0.035         0.009   <0.001
Experience-squared    -0.001        0.000   0.008
Inverse Mills ratio   -0.280        0.120   0.020
N (total)             3,200
N (employed)          1,950

Authors' Identification Claim

The authors argue that maternal employment status is a valid exclusion restriction: it affects daughters' labor force participation through intergenerational transmission of work norms but does not directly affect daughters' market wages.


I. Swap-In: When to Use Something Else

  • Lee bounds: when normality is doubtful or no credible exclusion restriction exists. Lee bounds require only a monotonicity assumption (treatment does not make anyone less likely to be selected). You get an interval rather than a point estimate, but it is valid under weaker assumptions. Recommended as a robustness check alongside the Heckman model.

  • Control function approach: a generalization of the Heckman two-step. Instead of assuming joint normality (which implies the correction term is the inverse Mills ratio), you can use nonparametric or semiparametric estimates of the control function. This relaxes the distributional assumption at the cost of requiring a stronger exclusion restriction and a larger sample.

  • Semiparametric selection models: methods by Powell (1987), Newey (1988), and others estimate the selection correction without assuming normality. These use kernel or series estimators for the conditional expectation of the error. They require larger samples and are computationally more complex.

  • Inverse probability weighting (IPW): weight each observation by the inverse of its probability of being selected (estimated from the selection equation). Unlike Heckman, IPW does not require normality and can accommodate nonlinear outcome models. However, IPW only addresses selection on observables — it cannot correct for selection on unobservables.

  • Bivariate probit: when both the selection and the outcome are binary. The Heckman model assumes a continuous outcome; bivariate probit handles two binary equations with correlated errors. Uses the same exclusion restriction logic.

  • Bounds approaches (Manski bounds): when you want the weakest possible assumptions. Manski worst-case bounds assume nothing about the selection process and provide very wide intervals. Lee bounds tighten these by adding a monotonicity assumption.


J. Reviewer Checklist

Critical Reading Checklist


Paper Library

Foundational (1)

Heckman, J. J. (1979). Sample Selection Bias as a Specification Error.

Econometrica. DOI: 10.2307/1912352

Introduces the two-step estimator for correcting sample selection bias using the inverse Mills ratio. The paper shows that selection bias can be treated as an omitted variable problem, where the omitted variable is the conditional expectation of the error term given selection. One of the most cited papers in econometrics.

Application (2)

Mroz, T. A. (1987). The Sensitivity of an Empirical Model of Married Women's Hours of Work to Economic and Statistical Assumptions.

Econometrica. DOI: 10.2307/1911029

Classic application of the Heckman selection model to female labor supply. Shows that the two-step estimator's results are sensitive to the choice of exclusion restriction and the normality assumption. The Mroz dataset remains a standard teaching dataset for selection models.

Lennox, C. S., Francis, J. R., & Wang, Z. (2012). Selection Models in Accounting Research.

The Accounting Review. DOI: 10.2308/accr-10195

Reviews the use (and misuse) of Heckman selection models in accounting research. Documents common pitfalls including weak exclusion restrictions, failure to test normality, and mechanical application without economic justification for the selection equation.

Survey (3)

Puhani, P. A. (2000). The Heckman Correction for Sample Selection and Its Critique.

Journal of Economic Surveys. DOI: 10.1111/1467-6419.00104

Comprehensive survey comparing Heckman two-step, MLE, and semiparametric alternatives. Discusses conditions under which the two-step estimator performs poorly (weak exclusion restriction, non-normality) and when MLE is preferable.

Bushway, S., Johnson, B. D., & Slocum, L. A. (2007). Is the Magic Still There? The Use of the Heckman Two-Step Correction for Selection Bias in Criminology.

Journal of Quantitative Criminology. DOI: 10.1007/s10940-007-9024-4

Reviews Heckman model applications in criminology and finds widespread misapplication. Emphasizes that without a credible exclusion restriction, the Heckman correction provides no improvement over naive OLS and may even increase bias.

Certo, S. T., Busenbark, J. R., Woo, H., & Semadeni, M. (2016). Sample Selection Bias and Heckman Models in Strategic Management Research.

Strategic Management Journal. DOI: 10.1002/smj.2475

Reviews the use of Heckman models in strategic management. Provides practical guidance on when selection correction is needed, how to choose exclusion restrictions, and how to interpret results. Finds that many SMJ papers misapply the technique.

Tags

model-based, selection-bias, sample-selection