MethodAtlas
Method · Advanced · Model-Based · Established

Heckman Selection Model

Corrects for sample selection bias when the outcome is observed only for a non-random subset of the population, using a two-equation system with an exclusion restriction.

When to Use: When your outcome variable is observed only for a selected sample -- e.g., wages only for employed workers, firm performance only for surviving firms, deal returns only for completed transactions.
Assumption: Joint normality of the error terms in the selection and outcome equations. An exclusion restriction: at least one variable affects selection but not the outcome. Without the exclusion restriction, identification relies entirely on the normality assumption.
Mistake: Not having a credible exclusion restriction and relying solely on the normality assumption for identification, which produces fragile estimates.
Reading Time: ~20 min read · 11 sections · 7 interactive exercises

One-Line Implementation

R:      selection(employed ~ children + married + education + experience, wage ~ education + experience, data = df)
Stata:  heckman wage education experience, select(employed = children married education experience) twostep
Python: # Manual two-step: probit selection, compute IMR, OLS with IMR -- see method page for full implementation


Motivating Example: Wages and Female Labor Supply

A labor economist wants to estimate the returns to education for married women. She has a large survey with information on education, age, number of children, husband's income, and — for women who are employed — their hourly wage.

Here is the problem: wages are only observed for women who work. Of the 753 women in her sample, only 428 are employed. The remaining 325 women have missing wages — not because of data errors, but because they chose not to participate in the labor market.

If she simply runs OLS on the 428 working women, she estimates wages conditional on employment. But employment itself is a choice that depends on potential wages. Women with very low potential wages may choose not to work (because the opportunity cost of leisure or home production exceeds their market wage). By restricting the sample to employed women, the researcher is selecting on an outcome that is correlated with the error term in the wage equation.

This problem is sample selection bias. It is not the same as omitted variable bias or measurement error — it arises because the sample is non-randomly drawn from the population. The observed wage distribution is truncated from below: women with the lowest potential wages are disproportionately absent from the sample.

The Heckman selection model (Heckman, 1979) solves this problem. It jointly models two processes: (1) the selection equation — whether a woman works — and (2) the outcome equation — what she earns if she works. By estimating the correlation between the errors in these two equations, the model corrects for the non-random selection into the observed sample. The key correction term is the inverse Mills ratio, which captures the expected value of the error in the wage equation conditional on the woman being selected into the sample.

This approach contributed to James Heckman's 2000 Nobel Prize in Economics (awarded for his broader development of theory and methods for analyzing selective samples) and remains one of the most widely used corrections for sample selection bias across the social sciences (Mroz, 1987).


A. Overview

What the Heckman Model Does

The Heckman selection model corrects for bias that arises when the outcome variable is observed only for a non-random subset of the population. It does so by modeling the selection process explicitly and incorporating information about selection into the outcome equation.

The model consists of two equations:

Selection equation (who is observed):

D_i^* = Z_i'\gamma + u_i, \quad D_i = \mathbf{1}(D_i^* > 0)

where D_i = 1 if the outcome is observed (e.g., the woman works) and D_i = 0 otherwise. Z_i is a vector of covariates that affect selection, and u_i is the error term.

Outcome equation (what the outcome is, conditional on being observed):

Y_i = X_i'\beta + \varepsilon_i, \quad \text{observed only if } D_i = 1

where Y_i is the outcome of interest (e.g., wages), X_i is a vector of covariates, and ε_i is the error term.

The key assumption is that the errors (u_i, ε_i) are jointly normally distributed:

\begin{pmatrix} u_i \\ \varepsilon_i \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho\sigma \\ \rho\sigma & \sigma^2 \end{pmatrix} \right)

where ρ is the correlation between the selection and outcome errors, and σ is the standard deviation of ε_i. The variance of u_i is normalized to 1 (as in a standard probit model).
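This setup is straightforward to simulate. A minimal sketch in Python (numpy assumed; the values of ρ, σ, γ, and β are invented for illustration) generates the two-equation system and shows that the outcome-equation error has a nonzero mean in the selected sample:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
rho, sigma = -0.5, 1.0   # illustrative error correlation and outcome error SD

# Jointly normal errors: Var(u) = 1 (probit normalization), Cov(u, eps) = rho*sigma
cov = [[1.0, rho * sigma], [rho * sigma, sigma**2]]
u, eps = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

z = rng.normal(size=n)       # covariate in the selection equation
x = rng.normal(size=n)       # covariate in the outcome equation
gamma, beta = 1.0, 2.0       # invented true coefficients

d = (gamma * z + u) > 0      # selection: D_i = 1(Z_i'gamma + u_i > 0)
y = beta * x + eps           # outcome, observed only where d is True

print(d.mean())              # share selected: 0.5 by symmetry of this DGP
print(eps.mean())            # full-sample error mean: approximately zero
print(eps[d].mean())         # selected-sample error mean: clearly negative since rho < 0
```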

The Selection Bias Problem

If ρ ≠ 0, then E[ε_i | D_i = 1] ≠ 0 — the expected error in the wage equation is non-zero for the selected sample. Specifically:

E[Y_i \mid X_i, D_i = 1] = X_i'\beta + \rho\sigma \cdot \frac{\phi(Z_i'\gamma)}{\Phi(Z_i'\gamma)}

The term λ(Z_i'γ) = φ(Z_i'γ) / Φ(Z_i'γ) is the inverse Mills ratio, where φ(·) is the standard normal PDF and Φ(·) is the standard normal CDF. Running OLS on the selected sample omits this term, producing biased estimates of β whenever ρ ≠ 0.
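In code, the correction term is a one-liner. A small helper (Python with scipy assumed; computing it in log space is a numerical-stability habit, not part of the statistics):

```python
import numpy as np
from scipy.stats import norm

def inverse_mills(c):
    """Inverse Mills ratio: lambda(c) = phi(c) / Phi(c) for the standard normal."""
    # exp(logpdf - logcdf) avoids underflow when c is very negative
    return np.exp(norm.logpdf(c) - norm.logcdf(c))

print(inverse_mills(0.0))    # phi(0)/Phi(0) = 0.3989/0.5, about 0.798
print(inverse_mills(-3.0))   # large: observation selected against the odds
print(inverse_mills(3.0))    # near zero: selection was nearly certain
```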

Two Estimation Approaches

Heckman Two-Step (Heckit):

  1. Estimate the selection equation as a probit: P(D_i = 1 | Z_i) = Φ(Z_i'γ)
  2. Compute the inverse Mills ratio: λ̂_i = φ(Z_i'γ̂) / Φ(Z_i'γ̂)
  3. Include λ̂_i as an additional regressor in the outcome equation and estimate by OLS:

Y_i = X_i'\beta + \rho\sigma \cdot \hat{\lambda}_i + \text{error}

The coefficient on λ̂_i estimates ρσ. Standard errors must be corrected because λ̂_i is a generated regressor (the uncorrected OLS standard errors are too small).

Full Information Maximum Likelihood (FIML):

Estimate both equations simultaneously by maximizing the joint likelihood of the observed data. FIML is more efficient than the two-step estimator when the joint normality assumption holds, but it requires this assumption for the entire joint distribution, not just the conditional mean. With a weak exclusion restriction, both methods are fragile — MLE leverages the functional form more aggressively, which helps only if normality is correct (Puhani, 2000).

When to Use the Heckman Model

  • Your outcome is observed only for a selected subsample and you believe the selection is non-random
  • You have a credible exclusion restriction — a variable that affects selection but not the outcome
  • Joint normality of the error terms is a reasonable approximation
  • You want to estimate causal effects or population-level relationships, not just conditional associations for the selected sample

When NOT to Use the Heckman Model

  • Your missing data are missing at random (MAR) — standard imputation or inverse probability weighting may suffice
  • You have no credible exclusion restriction — without one, the model is identified only through functional form (normality), which is fragile (Bushway et al., 2007)
  • The normality assumption is substantially violated (e.g., the outcome distribution is heavily skewed, bounded, or multimodal) — consider semiparametric alternatives or Lee bounds
  • You have a randomized experiment with attrition — Lee bounds provide a more robust approach


When (Not) to Use This Method

Use the Heckman Model When:

  1. Your outcome is observed only for a non-randomly selected subset. Wages for workers, test scores for students who did not drop out, firm performance for firms that survived, analyst forecasts for covered firms. The key question is: would including the unobserved outcomes change your conclusions?

  2. You have a credible exclusion restriction. A variable that plausibly affects selection but not the outcome. Without one, the model relies on functional form assumptions that are typically indefensible.

  3. Joint normality is a reasonable approximation. If the outcome variable is continuous, approximately symmetric, and not heavily skewed or bounded, normality is more plausible.

  4. You want to recover population-level parameters. If you only care about the effect of education on wages for women who work, OLS on the selected sample is fine. But if you want the effect of education on potential wages for all women (workers and non-workers), you need the selection correction.

Do NOT Use the Heckman Model When:

  1. Your missing data are missing at random (MAR). If missingness depends only on observed covariates, standard methods (multiple imputation, inverse probability weighting) are sufficient and do not require the normality assumption.

  2. You have no exclusion restriction and cannot defend one. Without an exclusion restriction, you are relying entirely on the nonlinearity of the normal CDF for identification. Certo et al. (2016) find that many management papers apply Heckman without a credible exclusion restriction, rendering the correction unreliable.

  3. The normality assumption is doubtful. If the outcome variable is heavily skewed (e.g., firm value, patent counts) or bounded (e.g., proportions, ratings on a fixed scale), the joint normality assumption is suspect. Consider semiparametric alternatives or Lee bounds.

  4. You have a randomized experiment with differential attrition. Lee bounds provide a nonparametric approach that does not require normality or an exclusion restriction. They give bounds on the treatment effect rather than a point estimate, but the bounds are valid under weaker assumptions.

  5. Selection is on observables only. If you believe you have observed all the variables that drive selection, matching or inverse probability weighting can address selection without parametric distributional assumptions.

Connection to Other Methods

The Heckman model relates to several other methods for handling selection and endogeneity:

  • Logit/Probit: the selection equation in the Heckman model is a probit. If you are only interested in modeling the selection decision itself (e.g., labor force participation), a standalone probit is sufficient. The Heckman model adds the outcome equation and the correlation between the two.

  • OLS: the outcome equation estimated on the selected sample is OLS with selection bias. The Heckman model adds the inverse Mills ratio to correct this bias. If ρ=0\rho = 0 (no selection bias), the Heckman model reduces to OLS on the selected sample.

  • IV/2SLS: conceptually similar in that both require an exclusion restriction. In IV, the instrument affects the endogenous regressor but not the outcome directly. In Heckman, the excluded variable affects selection but not the outcome. The Heckman two-step can be viewed as a control function approach — adding a correction term rather than instrumenting.

  • Lee bounds: a nonparametric alternative that bounds the treatment effect under weaker assumptions (monotonicity of selection, no functional form). Lee bounds are popular in program evaluation when normality is questionable. The trade-off is that you get bounds rather than a point estimate.

  • Matching: addresses selection on observables. Matching assumes that, conditional on observed covariates, selection is as good as random. The Heckman model addresses selection on unobservables — the case where selection depends on factors correlated with the outcome error term even after conditioning on observables.

  • Control function approach: the Heckman two-step is a special case of the control function approach. In the general control function framework, the correction term can take forms other than the inverse Mills ratio (e.g., for non-normal error distributions). Rivers and Vuong (1988) extend this to simultaneous equations with endogenous regressors.


B. Identification

For the Heckman model to provide valid correction, two key conditions must hold.

Condition 1: Joint Normality

Plain language: The unobserved factors affecting selection and the unobserved factors affecting the outcome must follow a bivariate normal distribution.

Formally: (u_i, ε_i) ∼ N(0, Σ) where Σ = \begin{pmatrix} 1 & \rho\sigma \\ \rho\sigma & \sigma^2 \end{pmatrix}.

This assumption is crucial because the inverse Mills ratio correction is derived from the properties of the bivariate normal distribution. If the errors are not jointly normal, the functional form of the correction term λ(·) is wrong, and the bias correction may itself be biased.

Condition 2: Exclusion Restriction

Plain language: At least one variable must appear in the selection equation but not in the outcome equation. This variable affects whether the outcome is observed but does not directly affect the outcome itself.

Formally: there exists a variable Z_k such that γ_k ≠ 0 (it affects selection) but Z_k does not appear in X_i (it is excluded from the outcome equation).

Why it matters: Without an exclusion restriction, the Heckman model is identified only through the nonlinearity of the inverse Mills ratio — which comes from the normality assumption. If normality is even slightly wrong, this "identification through functional form" can produce wildly inaccurate corrections. Bushway et al. (2007) demonstrate that without a credible exclusion restriction, the Heckman correction provides no improvement over naive OLS and may increase bias.

Examples of exclusion restrictions in practice:

Research setting | Selection equation | Exclusion restriction | Justification
Female labor supply | Works or not | Number of young children, husband's income | Affect labor force participation but not the wage rate conditional on working
Firm R&D spending | Reports R&D or not | Industry peers' reporting practices | Affect disclosure but not the level of R&D
CEO compensation | Firm is publicly traded | State-level IPO regulations | Affect listing decision but not pay
Analyst coverage | Firm is covered or not | Geographic distance to nearest analyst | Affects coverage probability but not firm value

C. Visual Intuition

Adjust the selection correlation to see how naive OLS on the selected sample diverges from the true effect. When selection is strong, OLS is badly biased; the Heckman correction uses the inverse Mills ratio to recover the true coefficient.

Watch how non-random selection into the labor force distorts the observed wage-education relationship:


D. Mathematical Derivation

Derivation of the Inverse Mills Ratio Correction

Don't worry about the notation yet — here's what this means in words: The two-step correction derives from the conditional expectation of the outcome given selection, under joint normality of the error terms.

Setup. We have two equations:

Selection: D_i^* = Z_i'\gamma + u_i, where D_i = \mathbf{1}(D_i^* > 0)

Outcome: Y_i = X_i'\beta + \varepsilon_i, observed only when D_i = 1

Assume (u_i, ε_i) are jointly normal with correlation ρ, Var(u_i) = 1, and Var(ε_i) = σ².

Step 1: Conditional expectation of the outcome.

E[Y_i \mid X_i, D_i = 1] = X_i'\beta + E[\varepsilon_i \mid D_i = 1]

The OLS bias is E[ε_i | D_i = 1], which is non-zero when selection is correlated with the outcome.

Step 2: Use the properties of the truncated bivariate normal.

Since D_i = 1 iff u_i > -Z_i'γ, we need E[ε_i | u_i > -Z_i'γ].

For jointly normal (u_i, ε_i):

E[\varepsilon_i \mid u_i > -Z_i'\gamma] = \rho\sigma \cdot E[u_i \mid u_i > -Z_i'\gamma]

Step 3: Expected value of a truncated standard normal.

For u_i ∼ N(0, 1):

E[u_i \mid u_i > -c] = \frac{\phi(c)}{\Phi(c)}

where c = Z_i'γ. This ratio λ(c) = φ(c) / Φ(c) is the inverse Mills ratio.
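The Step 3 identity is easy to verify by Monte Carlo (Python with numpy/scipy assumed; the threshold values are arbitrary):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)
u = rng.normal(size=2_000_000)   # draws from the standard normal

for c in [-1.0, 0.0, 0.8, 2.0]:
    mc = u[u > -c].mean()                 # Monte Carlo E[u | u > -c]
    exact = norm.pdf(c) / norm.cdf(c)     # phi(c)/Phi(c), the inverse Mills ratio
    print(c, round(mc, 3), round(exact, 3))
```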

Step 4: Combine.

E[Y_i \mid X_i, D_i = 1] = X_i'\beta + \rho\sigma \cdot \lambda(Z_i'\gamma)

This derivation shows that OLS on the selected sample omits ρσ · λ(Z_i'γ), producing an omitted variable bias. The two-step estimator corrects this by including λ̂_i as an additional regressor.

Step 5: Standard error correction.

Because λ̂_i is estimated from the first-stage probit (it is a generated regressor), the OLS standard errors in the second stage are incorrect. The correct variance-covariance matrix accounts for the estimation uncertainty in γ̂. All standard software packages compute the corrected standard errors automatically.


E. Implementation

Heckman Selection Model with Diagnostics

# Requires: sampleSelection
# sampleSelection: R package for Heckman selection models (Toomet & Henningsen)
library(sampleSelection)

# --- Step 1: Prepare the data ---
# Outcome: log(wage), observed ONLY for working women (selected sample)
# Selection: whether the woman works (lfp = 1)
# Exclusion restriction: nwifeinc (non-wife income), kidslt6 (kids under 6)
# These affect labor force participation but should not directly affect wage rate

# --- Step 2: Heckman two-step estimator ---
# heckit() estimates the selection and outcome equations jointly
# selection: probit for participation (first stage)
# outcome: OLS for wages, corrected by the inverse Mills ratio (second stage)
heck_2step <- heckit(
selection = lfp ~ age + I(age^2) + educ + nwifeinc + kidslt6 + kidsge6,
outcome   = log(wage) ~ educ + exper + I(exper^2) + city,
data = df,
method = "2step"
)
summary(heck_2step)
# Coefficient on lambda (inverse Mills ratio) estimates rho * sigma
# If lambda is significant, this is consistent with selection bias under the model

# --- Step 3: Full information MLE ---
# MLE estimates the full joint model under bivariate normality
# More efficient than two-step but more sensitive to normality violations
heck_mle <- heckit(
selection = lfp ~ age + I(age^2) + educ + nwifeinc + kidslt6 + kidsge6,
outcome   = log(wage) ~ educ + exper + I(exper^2) + city,
data = df,
method = "ml"
)
summary(heck_mle)
# MLE directly estimates rho and sigma (not just rho*sigma)
# Compare two-step and MLE: large discrepancies suggest normality problems

# --- Step 4: Diagnostics ---
# Test H0: rho = 0 (no selection bias)
# If not rejected, OLS on the selected sample is consistent

# Compare with naive OLS (ignoring selection)
ols_naive <- lm(log(wage) ~ educ + exper + I(exper^2) + city,
              data = df[df$lfp == 1, ])
summary(ols_naive)
# If Heckman and OLS coefficients differ substantially, selection bias
# is economically meaningful and the correction is needed

F. Diagnostics

F.1 Significance of the Inverse Mills Ratio (Lambda)

The most basic diagnostic: is the coefficient on the inverse Mills ratio statistically significant? Under the null hypothesis H₀: ρ = 0, there is no selection bias, and OLS on the selected sample is consistent. If λ̂ is insignificant, either (a) there is genuinely no selection bias, or (b) the exclusion restriction is too weak to detect it.

  • In the two-step estimator, test whether the coefficient on λ̂_i is significant (use corrected standard errors)
  • In MLE, test H₀: ρ = 0 using a Wald test on ρ̂ or a likelihood ratio test comparing the joint model to separate probit + OLS

F.2 Normality Tests

Since the correction relies on joint normality, assess whether this assumption is plausible:

  1. Residual normality: test the outcome equation residuals for normality (Shapiro-Wilk, Jarque-Bera, Q-Q plot). This checks marginal normality of ε_i, which is necessary but not sufficient for joint normality.

  2. Polynomial Mills ratio test: add λ̂_i² and λ̂_i³ to the outcome equation. If these higher-order terms are significant, the linear Mills ratio correction is insufficient — evidence against normality. This is a specification test for the functional form of the correction.

  3. Compare two-step and MLE: large discrepancies between the two estimators suggest normality violations (MLE relies more heavily on normality than the two-step estimator).

F.3 Exclusion Restriction Strength

A weak exclusion restriction produces imprecise estimates and makes the model fragile:

  1. Joint significance test: test whether the excluded variables are jointly significant in the probit selection equation (χ² test). If the excluded variables are insignificant, the model is identified only through functional form.

  2. Magnitude of the first-stage coefficients: the excluded variables should have economically meaningful effects on selection, not just statistical significance.

  3. Collinearity check: verify that the inverse Mills ratio is not highly collinear with the covariates in the outcome equation. A high variance inflation factor (VIF > 10) for λ̂ indicates that the model cannot distinguish the selection correction from the direct effects of the covariates.

F.4 Two-Step vs. MLE Comparison

Estimate the model using both methods and compare:

  • Similar results: reassuring. Both methods are estimating the same parameters, and the normality assumption is likely adequate.
  • Different results: cause for concern. Possible explanations: (1) normality is violated (MLE is more sensitive), (2) the exclusion restriction is weak (MLE leverages different information than two-step), (3) the sample size is too small for MLE to converge properly.
library(sampleSelection)

# Fit two-step and MLE
heck_2s <- heckit(
selection = lfp ~ age + I(age^2) + educ + nwifeinc + kidslt6 + kidsge6,
outcome   = log(wage) ~ educ + exper + I(exper^2) + city,
data = df, method = "2step"
)
heck_ml <- heckit(
selection = lfp ~ age + I(age^2) + educ + nwifeinc + kidslt6 + kidsge6,
outcome   = log(wage) ~ educ + exper + I(exper^2) + city,
data = df, method = "ml"
)

# F.1 Test significance of lambda (rho * sigma)
summary(heck_2s)  # Check p-value on "Inverse Mills Ratio"

# F.2 Polynomial Mills ratio test for normality
# Compute Mills ratio from first-stage probit
probit_fit <- glm(lfp ~ age + I(age^2) + educ + nwifeinc +
                  kidslt6 + kidsge6,
                family = binomial(link = "probit"), data = df)
Zgamma <- predict(probit_fit, type = "link")
mills <- dnorm(Zgamma) / pnorm(Zgamma)

# Add squared and cubed Mills ratio to outcome equation
selected <- df[df$lfp == 1, ]
mills_sel <- mills[df$lfp == 1]
selected$mills  <- mills_sel
selected$mills2 <- mills_sel^2
selected$mills3 <- mills_sel^3

norm_test <- lm(log(wage) ~ educ + exper + I(exper^2) + city +
                mills + mills2 + mills3,
              data = selected)
# Joint test of mills2 and mills3
library(car)
linearHypothesis(norm_test, c("mills2 = 0", "mills3 = 0"))

# F.3 Exclusion restriction strength
anova(
glm(lfp ~ age + I(age^2) + educ + kidsge6,
    family = binomial(link = "probit"), data = df),
probit_fit, test = "Chisq"
)

# F.4 Compare two-step and MLE
cbind(TwoStep = coef(heck_2s, part = "outcome"),
    MLE = coef(heck_ml, part = "outcome"))

F.5 Interpreting Your Results

What Rho Tells You

The parameter ρ — the correlation between the selection and outcome errors — is the key parameter that drives the selection correction.

ρ | Sign of bias in naive OLS | Interpretation
ρ < 0 | OLS underestimates the mean outcome | Individuals who select in have lower unobserved outcome potential (negative selection). Since ρσ < 0 and the inverse Mills ratio > 0, E[ε_i | D_i = 1] < 0.
ρ = 0 | No bias | Selection is independent of the outcome (conditional on covariates). OLS on the selected sample is consistent.
ρ > 0 | OLS overestimates the mean outcome | Individuals who select in have higher unobserved outcome potential (positive selection). Since ρσ > 0, E[ε_i | D_i = 1] > 0.

What the Lambda (λ̂) Coefficient Captures

The coefficient on the inverse Mills ratio in the two-step estimator is ρ̂σ̂, sometimes denoted λ̂ or δ̂. Its significance, magnitude, and sign each carry information:

  1. Significance of λ̂: if significant, this result is consistent with correlation between the selection and outcome disturbances under the maintained model — but significance alone does not prove selection bias is present (Certo et al., 2016). If insignificant, either selection bias is absent or the model lacks power to detect it (weak exclusion restriction).

  2. Magnitude of λ̂: the product ρσ determines the size of the bias correction. A large |λ̂| means the selection correction substantially changes the outcome equation coefficients.

  3. Sign of λ̂: reveals the direction of selection. Negative λ̂ means negative selection (ρ < 0) — those who select in have lower unobserved outcome potential than a random draw from the population, so OLS on the selected sample underestimates the population mean.

What to Report in a Table

A well-reported Heckman model should include:

  1. Both equations: report the full selection equation (probit coefficients) and the outcome equation (OLS coefficients with Mills ratio correction)
  2. Lambda (ρσ): the coefficient on the inverse Mills ratio, with its standard error and p-value
  3. Rho (ρ̂) and sigma (σ̂): if using MLE, report these separately
  4. Number of observations: total N, number selected (observed outcome), number not selected
  5. Exclusion restriction: clearly identify which variables appear in the selection equation but not the outcome equation
  6. Exclusion restriction strength: chi-squared test of the excluded variables in the selection equation
  7. Estimation method: two-step or MLE
  8. Naive OLS comparison: show how the coefficients change when selection is ignored

G. What Can Go Wrong


No Exclusion Restriction

Heckman model with a credible exclusion restriction (number of young children affects labor force participation but not wage rate)

Returns to education: 0.108 (SE = 0.014). Lambda = -0.28 (SE = 0.11, p = 0.01). The selection correction is significant and the education coefficient is well-estimated because the exclusion restriction (kidslt6) is strong in the selection equation (chi-squared = 42.3, p < 0.001).


Normality Violation

Outcome variable (log wages) is approximately normally distributed. Joint normality of error terms is plausible.

Two-step estimate of returns to education: 0.108 (SE = 0.014). MLE estimate: 0.105 (SE = 0.012). The two methods agree closely, and the polynomial Mills ratio test does not reject normality (p = 0.61).


Weak Exclusion Restriction

Exclusion restriction (number of children under 6) is strongly predictive of labor force participation: probit coefficient = -0.87, chi-squared = 42.3, p < 0.001

Lambda = -0.28 (SE = 0.11). The inverse Mills ratio is precisely estimated, providing a meaningful selection correction. The VIF of lambda in the outcome equation is 2.1 — well below the danger zone.


Confusing Incidental Truncation with Sample Selection

True sample selection: wages are missing because women choose not to work. The selection decision is correlated with potential wages (women with low potential wages opt out).

Heckman correction is appropriate: rho = -0.45. Women who select into employment have higher-than-average potential wages. OLS on the selected sample overestimates average wages in the population.


H. Practice

H.1 Concept Checks

Concept Check

A researcher estimates a Heckman selection model for CEO compensation, where compensation is observed only for public firms. She uses state-level IPO regulations as the exclusion restriction. She finds that the inverse Mills ratio coefficient is -0.45 (SE = 0.18, p = 0.01) and rho = -0.38. What does the negative rho tell us about the selection process?

Concept Check

A researcher applies the Heckman two-step estimator to study the effect of R&D spending on firm performance. She includes the same set of variables in both the selection equation (whether the firm reports R&D) and the outcome equation (firm performance given R&D is reported). She has no exclusion restriction. She finds that the inverse Mills ratio is significant (p = 0.04). Is this evidence that her model is working correctly?

Concept Check

You estimate a Heckman model and find that lambda (the coefficient on the inverse Mills ratio) is -0.15 with a standard error of 0.42 and p = 0.72. The naive OLS coefficient on your key variable is 0.35 (SE = 0.08), while the Heckman-corrected coefficient is 0.33 (SE = 0.14). What should you conclude?

H.2 Guided Exercise

Guided Exercise

Interpreting Heckman Selection Model Output

You study the effect of training programs on worker wages. Wages are observed only for employed workers. Your Heckman model produces:

Selection Equation (Probit: Employed = 1)

Variable | Coeff | SE | p-value
Age | 0.045 | 0.012 | < 0.001
Age-squared | -0.0005 | 0.0002 | 0.012
Education | 0.112 | 0.021 | < 0.001
Married | 0.284 | 0.098 | 0.004
Num. children < 6 | -0.432 | 0.076 | < 0.001 [EXCLUDED]
Spouse income ($000s) | -0.018 | 0.005 | < 0.001 [EXCLUDED]

Exclusion restriction test: chi-squared(2) = 52.4, p < 0.001

Outcome Equation (Dep. var: log(wage))

Variable | Coeff | SE | p-value
Education | 0.098 | 0.015 | < 0.001
Training program | 0.145 | 0.038 | < 0.001
Experience | 0.032 | 0.008 | < 0.001
Experience-squared | -0.0005 | 0.0002 | 0.012
Inverse Mills ratio | -0.312 | 0.104 | 0.003

rho = -0.47, sigma = 0.664, lambda = rho × sigma = -0.312. Method: two-step. N = 2,000 (1,340 employed, 660 not employed).

Naive OLS on employed workers: Training coefficient = 0.118 (SE = 0.032, p < 0.001).

Are the exclusion restrictions strong? How do you know?

Is there evidence of sample selection bias? What is the direction of selection?

What is the selection-corrected effect of the training program? How does it differ from the naive OLS estimate, and why?

In plain language, what does rho = -0.47 mean for the population of non-workers?

H.3 Error Detective

Error Detective

Read the analysis below carefully and identify the errors.

A management researcher studies whether board diversity affects firm performance. She argues that firm performance (Tobin's Q) is observed only for publicly listed firms, creating a selection problem. She estimates a Heckman model:

Selection equation: Listed = f(firm_age, total_assets, industry)
Outcome equation: TobinsQ = f(board_diversity, firm_size, leverage, ROA, industry)

She uses firm_age and total_assets as exclusion restrictions, arguing they affect the listing decision but not performance. She reports:

- Board diversity coefficient (Heckman): 0.42 (p = 0.03)
- Board diversity coefficient (OLS): 0.28 (p = 0.08)
- Lambda: -0.89 (SE = 0.31, p = 0.004)
- Rho: -0.62
- Method: Two-step

She concludes: "After correcting for selection into public listing, board diversity has an even stronger positive effect on firm performance."

She does not report the selection equation coefficients or test the exclusion restriction strength.

Select all errors you can find:

Error Detective

Read the analysis below carefully and identify the errors.

A labor economist studies the gender wage gap. She estimates a Heckman model separately for men and women:

For women:
- Selection equation: Employed = f(age, education, married, children_under_5, spouse_income)
- Outcome equation: log(wage) = f(education, experience, experience^2, occupation)
- Exclusion restrictions: children_under_5, spouse_income
- Lambda = -0.31 (SE = 0.09, p < 0.001), rho = -0.42

For men:
- Selection equation: Employed = f(age, education, married, children_under_5, spouse_income)
- Outcome equation: log(wage) = f(education, experience, experience^2, occupation)
- Exclusion restrictions: children_under_5, spouse_income
- Lambda = 0.02 (SE = 0.15, p = 0.89), rho = 0.03

She reports: "The selection-corrected gender wage gap is 22 log points, compared to 18 log points in naive OLS. Selection correction matters for women but not for men."

She does not discuss whether the same exclusion restriction is appropriate for both genders.

Select all errors you can find:

H.4 You Are the Referee

Referee Exercise

Read the paper summary below and write a brief referee critique (2-3 sentences) of the identification strategy.

Paper Summary

The authors study returns to education for married women using a Heckman two-step selection model, correcting for the fact that wages are observed only for employed women. They use a sample of 3,200 married women (1,950 employed, 1,250 not employed) from a national household survey. As their exclusion restriction, they use whether the woman's mother was employed when the woman was age 14, arguing that maternal employment norms affect daughters' labor force participation but not their wages. They find a selection-corrected return to education of 11.2% per year (SE = 1.8%), compared to a naive OLS estimate of 8.5% (SE = 1.1%). Lambda is -0.28 (p = 0.02).

Key Table

| Variable | Coefficient | SE | p-value |
| --- | --- | --- | --- |
| Education (years) | 0.112 | 0.018 | <0.001 |
| Experience | 0.035 | 0.009 | <0.001 |
| Experience-squared | -0.001 | 0.0004 | 0.008 |
| Inverse Mills ratio | -0.280 | 0.120 | 0.020 |
| N (total) | 3,200 | | |
| N (employed) | 1,950 | | |

Authors' Identification Claim

The authors argue that maternal employment status is a valid exclusion restriction: it affects daughters' labor force participation through intergenerational transmission of work norms but does not directly affect daughters' market wages.


I. Swap-In: When to Use Something Else

  • Lee bounds: when normality is doubtful or no credible exclusion restriction exists. Lee bounds require only a monotonicity assumption (treatment does not make anyone less likely to be selected). You get an interval rather than a point estimate, but it is valid under weaker assumptions. Recommended as a robustness check alongside the Heckman model.

  • Control function approach: a generalization of the Heckman two-step. Instead of assuming joint normality (which implies the correction term is the inverse Mills ratio), you can use nonparametric or semiparametric estimates of the control function. This relaxes the distributional assumption at the cost of requiring a stronger exclusion restriction and a larger sample.

  • Semiparametric selection models: methods by Powell (1987), Newey (1999), and others estimate the selection correction without assuming normality. These use kernel or series estimators for the conditional expectation of the error. They require larger samples and are computationally more complex.

  • Inverse probability weighting (IPW): weight each observation by the inverse of its probability of being selected (estimated from the selection equation). Unlike Heckman, IPW does not require normality and can accommodate nonlinear outcome models. However, IPW only addresses selection on observables — it cannot correct for selection on unobservables.

  • Bivariate probit: when both the selection and the outcome are binary. The Heckman model assumes a continuous outcome; bivariate probit handles two binary equations with correlated errors. Uses the same exclusion restriction logic.

  • Bounds approaches (Manski bounds): when you want the weakest possible assumptions. Manski worst-case bounds assume nothing about the selection process and provide very wide intervals. Lee bounds tighten these by adding a monotonicity assumption.


J. Reviewer Checklist

Critical Reading Checklist


Paper Library

Foundational (4)

Heckman, J. J. (1979). Sample Selection Bias as a Specification Error.

Econometrica. DOI: 10.2307/1912352

Heckman introduces the two-step estimator for correcting sample selection bias using the inverse Mills ratio. The paper shows that selection bias can be treated as an omitted variable problem, where the omitted variable is the conditional expectation of the error term given selection. One of the most cited papers in econometrics.
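To make Heckman's correction term concrete: the "omitted variable" his two-step adds back is the inverse Mills ratio, the mean of a truncated standard normal. A minimal sketch (the function name is ours, not from any library):

```python
import numpy as np
from scipy.stats import norm

# Inverse Mills ratio lambda(c) = phi(c) / Phi(c): the conditional mean
# E[e | e > -c] of a standard normal error truncated from below at -c.
def inverse_mills(c):
    c = np.asarray(c, dtype=float)
    return norm.pdf(c) / norm.cdf(c)

print(inverse_mills(0.0))  # phi(0)/Phi(0) = 0.3989.../0.5 ~ 0.798
```

In the two-step procedure, c is the fitted probit index for each selected observation, and the ratio enters the outcome regression as an extra regressor whose coefficient equals rho * sigma.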

Newey, W. K. (1999). Two Step Series Estimation of Sample Selection Models.

MIT Department of Economics Working Paper 99-04

Newey proposes a semiparametric two-step estimator for sample selection models that replaces the parametric inverse Mills ratio with a flexible series (power series or regression spline) approximation to the unknown selection correction function. This approach avoids the normality assumption underlying the standard Heckman correction while retaining the computational convenience of a two-step procedure. Researchers concerned about distributional misspecification in selection models can use series-based selection corrections as a robust alternative to parametric methods.

Powell, J. L. (1987). Semiparametric Estimation of Bivariate Latent Variable Models.

SSRI Working Paper 8704, University of Wisconsin-Madison

Powell develops semiparametric methods for estimating bivariate latent variable models—including censored sample selection models—without imposing distributional assumptions on the error terms. This approach relaxes the bivariate normality requirement of the Heckman two-step estimator, requiring only an exclusion restriction and mild regularity conditions. Researchers who doubt the normality assumption in selection models can apply these methods to obtain consistent estimates under weaker conditions.

Rivers, D., & Vuong, Q. H. (1988). Limited Information Estimators and Exogeneity Tests for Simultaneous Probit Models.

Journal of Econometrics. DOI: 10.1016/0304-4076(88)90063-2

Rivers and Vuong propose a computationally simple two-step maximum likelihood procedure for estimating simultaneous probit models with endogenous regressors, and derive simple exogeneity tests based on this estimator. The exogeneity tests are asymptotically equivalent to classical tests based on limited information maximum likelihood but require only probit and OLS regressions to implement. Applied researchers working with binary outcome models and suspected endogeneity can use the Rivers-Vuong procedure as a tractable alternative to full information maximum likelihood.

Application (2)

Mroz, T. A. (1987). The Sensitivity of an Empirical Model of Married Women's Hours of Work to Economic and Statistical Assumptions.

Econometrica. DOI: 10.2307/1911029

Mroz provides a classic application of the Heckman selection model to female labor supply. Shows that the two-step estimator's results are sensitive to the choice of exclusion restriction and the normality assumption. The Mroz dataset remains a standard teaching dataset for selection models.

Shaver, J. M. (1998). Accounting for Endogeneity When Assessing Strategy Performance: Does Entry Mode Choice Affect FDI Survival?

Management Science. DOI: 10.1287/mnsc.44.4.571

Shaver demonstrates how ignoring endogeneity — specifically, the self-selection of firms into entry modes — biases performance estimates in this foundational strategy paper. He shows that the choice between greenfield entries and acquisitions reflects private information about expected survival, and uses a Heckman-style selection correction to obtain unbiased estimates. One of the first papers to systematically demonstrate endogeneity problems in strategy research.

Survey (6)

Bushway, S., Johnson, B. D., & Slocum, L. A. (2007). Is the Magic Still There? The Use of the Heckman Two-Step Correction for Selection Bias in Criminology.

Journal of Quantitative Criminology. DOI: 10.1007/s10940-007-9024-4

Bushway, Johnson, and Slocum review Heckman model applications in criminology and find widespread misapplication. Emphasizes that without a credible exclusion restriction, the Heckman correction provides no improvement over naive OLS and may even increase bias.

Certo, S. T., Busenbark, J. R., Woo, H., & Semadeni, M. (2016). Sample Selection Bias and Heckman Models in Strategic Management Research.

Strategic Management Journal. DOI: 10.1002/smj.2475

Certo, Busenbark, Woo, and Semadeni review the use of Heckman models in strategic management. They provide practical guidance on when selection correction is needed, how to choose exclusion restrictions, and how to interpret results. Finds that many SMJ papers misapply the technique.

Lennox, C. S., Francis, J. R., & Wang, Z. (2012). Selection Models in Accounting Research.

The Accounting Review. DOI: 10.2308/accr-10195

Lennox, Francis, and Wang review the use (and misuse) of Heckman selection models in accounting research. Documents common pitfalls including weak exclusion restrictions, failure to test normality, and mechanical application without economic justification for the selection equation.

Puhani, P. A. (2000). The Heckman Correction for Sample Selection and Its Critique.

Journal of Economic Surveys. DOI: 10.1111/1467-6419.00104

Puhani provides a short overview of Monte Carlo evidence on the Heckman two-step estimator, comparing it with full-information MLE and subsample OLS. Finds MLE preferable absent collinearity between the exclusion restriction and other regressors, but subsample OLS most robust when collinearity is present.

Wolfolds, S. E., & Siegel, J. (2019). Misaccounting for Endogeneity: The Peril of Relying on the Heckman Two-Step Method without a Valid Instrument.

Strategic Management Journal. DOI: 10.1002/smj.2995

Wolfolds and Siegel demonstrate that the Heckman selection correction is frequently misapplied in management research, particularly when the exclusion restriction is not credible. They show via simulation and replication that applying the Heckman correction without a valid instrument can introduce more bias than it removes. The paper provides a cautionary guide for researchers considering selection models and recommends transparent reporting of the exclusion restriction.

Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data.

MIT Press

Wooldridge's graduate textbook is the standard reference for cross-section and panel data econometrics. Chapter 19 (Censored Data, Sample Selection, and Attrition) gives a thorough treatment of incidental truncation, the Heckman two-step and maximum likelihood corrections, and attrition in panels, with careful attention to the assumptions each estimator requires. The general estimation chapters (MLE, GMM, M-estimation) supply the asymptotic theory behind two-step procedures with generated regressors.

Tags

model-based · selection-bias · sample-selection