Method·advanced·20 min read

Model-BasedEstablished

Heckman Selection Model

Corrects for sample selection bias when the outcome is observed only for a non-random subset of the population, using a two-equation system with an exclusion restriction.

When to UseWhen your outcome variable is observed only for a selected sample -- e.g., wages only for employed workers, firm performance only for surviving firms, deal returns only for completed transactions.

AssumptionJoint normality of the error terms in the selection and outcome equations. An exclusion restriction: at least one variable affects selection but not the outcome. Without the exclusion restriction, identification relies entirely on the normality assumption.

MistakeNot having a credible exclusion restriction and relying solely on the normality assumption for identification, which produces fragile estimates.

Reading Time~20 min read · 11 sections · 7 interactive exercises

One-Line Implementation

Rselection(employed ~ children + married + education + experience, wage ~ education + experience, data = df)

Stataheckman wage education experience, select(employed = children married education experience) twostep

Python# Manual two-step: probit selection, compute IMR, OLS with IMR -- see method page for full implementation

Download Full Analysis Code

Complete scripts with diagnostics, robustness checks, and result export.

Motivating Example: Wages and Female Labor Supply

A labor economist wants to estimate the returns to education for married women. She has a large survey with information on education, age, number of children, husband's income, and — for women who are employed — their hourly wage.

Here is the problem: wages are only observed for women who work. Of the 753 women in her sample, only 428 are employed. The remaining 325 women have missing wages — not because of data errors, but because they chose not to participate in the labor market.

If she simply runs OLS on the 428 working women, she estimates wages conditional on employment. But employment itself is a choice that depends on potential wages. Women with very low potential wages may choose not to work (because the opportunity cost of leisure or home production exceeds their market wage). By restricting the sample to employed women, the researcher is selecting on an outcome that is correlated with the error term in the wage equation.

This problem is . It is not the same as omitted variable bias or measurement error — it arises because the sample is non-randomly drawn from the population. The observed wage distribution is truncated from below: women with the lowest potential wages are disproportionately absent from the sample.

The Heckman selection model (Heckman, 1979) solves this problem. It jointly models two processes: (1) the selection equation — whether a woman works — and (2) the outcome equation — what she earns if she works. By estimating the correlation between the errors in these two equations, the model corrects for the non-random selection into the observed sample. The key correction term is the , which captures the expected value of the error in the wage equation conditional on the woman being selected into the sample.

This approach contributed to James Heckman's 2000 Nobel Prize in Economics (awarded for his broader development of theory and methods for analyzing selective samples) and remains one of the most widely used corrections for sample selection bias across the social sciences (Mroz, 1987).

AOverview

What the Heckman Model Does

The Heckman selection model corrects for bias that arises when the outcome variable is observed only for a non-random subset of the population. It does so by modeling the selection process explicitly and incorporating information about selection into the outcome equation.

The model consists of two equations:

Selection equation (who is observed):

D_i^* = Z_i^{\prime}\gamma + u_i, \quad D_i = \mathbf{1}(D_i^* > 0)

where $D_i = 1$ if the outcome is observed (e.g., the woman works) and $D_i = 0$ otherwise. $Z_i$ is a vector of covariates that affect selection, and $u_i$ is the error term.

Outcome equation (what is the outcome, conditional on being observed):

Y_i = X_i^{\prime}\beta + \varepsilon_i, \quad \text{observed only if } D_i = 1

where $Y_i$ is the outcome of interest (e.g., wages), $X_i$ is a vector of covariates, and $\varepsilon_i$ is the error term.

The key assumption is that the errors $(u_i, \varepsilon_i)$ are jointly normally distributed:

\begin{pmatrix} u_i \\ \varepsilon_i \end{pmatrix} \sim N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho\sigma \\ \rho\sigma & \sigma^2 \end{pmatrix}\right)

where $\rho$ is the correlation between the selection and outcome errors, and $\sigma$ is the standard deviation of $\varepsilon_i$ . The variance of $u_i$ is normalized to 1 (as in a standard probit model).

The Selection Bias Problem

If $\rho \neq 0$ , then $E[\varepsilon_i | D_i = 1] \neq 0$ — the expected error in the wage equation is non-zero for the selected sample. Specifically:

E[Y_i \mid X_i, D_i = 1] = X_i^{\prime}\beta + \rho\sigma \cdot \frac{\phi(Z_i^{\prime}\gamma)}{\Phi(Z_i^{\prime}\gamma)}

The term $\lambda(Z_i^{\prime}\gamma) = \frac{\phi(Z_i^{\prime}\gamma)}{\Phi(Z_i^{\prime}\gamma)}$ is the , where $\phi(\cdot)$ is the standard normal PDF and $\Phi(\cdot)$ is the standard normal CDF. Running OLS on the selected sample omits this term, producing biased estimates of $\beta$ whenever $\rho \neq 0$ .

Two Estimation Approaches

Heckman Two-Step (Heckit):

Estimate the as a probit: $P(D_i = 1 | Z_i) = \Phi(Z_i^{\prime}\gamma)$
Compute the inverse Mills ratio: $\hat{\lambda}_i = \phi(Z_i^{\prime}\hat{\gamma}) / \Phi(Z_i^{\prime}\hat{\gamma})$
Include $\hat{\lambda}_i$ as an additional regressor in the outcome equation and estimate by OLS:

$Y_i = X_i^{\prime}\beta + \rho\sigma \cdot \hat{\lambda}_i + \text{error}$

The coefficient on $\hat{\lambda}_i$ estimates $\rho\sigma$ . Standard errors must be corrected because $\hat{\lambda}_i$ is a generated regressor (the standard OLS standard errors are too small).

Full Information Maximum Likelihood (FIML):

Estimate both equations simultaneously by maximizing the joint likelihood of the observed data. FIML is more efficient than the two-step estimator when the joint normality assumption holds, but it requires this assumption for the entire joint distribution, not just the conditional mean. With a weak exclusion restriction, both methods are fragile — MLE leverages the functional form more aggressively, which helps only if normality is correct (Puhani, 2000).

When to Use the Heckman Model

Your outcome is observed only for a selected subsample and you believe the selection is non-random
You have a credible — a variable that affects selection but not the outcome
Joint normality of the error terms is a reasonable approximation
You want to estimate causal effects or population-level relationships, not just conditional associations for the selected sample

When NOT to Use the Heckman Model

Your missing data are missing at random (MAR) — standard imputation or inverse probability weighting may suffice
You have no credible exclusion restriction — without one, the model is identified only through functional form (normality), which is fragile (Bushway et al., 2007)
The normality assumption is substantially violated (e.g., the outcome distribution is heavily skewed, bounded, or multimodal) — consider semiparametric alternatives or Lee bounds
You have a randomized experiment with attrition — Lee bounds provide a more robust approach

Common Confusions

Frequently asked questions about the Heckman model

Q: Is the Heckman model the same as handling missing data? Not exactly. Standard missing data methods (multiple imputation, EM algorithm) assume data are missing at random (MAR) — the probability of missingness depends only on observed variables. The Heckman model addresses missing not at random (MNAR) — the probability of observing the outcome depends on the outcome itself (or its error term). These situations are fundamentally different problems.
Q: Do I always need an exclusion restriction? Technically, the model is identified without one through the nonlinearity of the inverse Mills ratio. In practice, identification without an exclusion restriction relies entirely on the normality assumption and is extremely fragile. Certo et al. (2016) and Lennox et al. (2012) both emphasize that a credible exclusion restriction is essential for applied work.
Q: What is the difference between the two-step estimator and MLE? The two-step estimator is computationally simpler and more robust to mild departures from normality in the selection equation (only the conditional mean needs to be correct). MLE is more efficient but requires the full joint normality assumption. When the exclusion restriction is weak, neither method is reliable — identification effectively comes from functional form rather than exclusion (Puhani, 2000).
Q: Can I use the Heckman model with a binary or count outcome? The standard Heckman model assumes a continuous outcome. Extensions exist for binary outcomes (Heckman probit) and count data, but they are more complex and less commonly used. For binary outcomes, consider bivariate probit instead.

When (Not) to Use This Method

Use the Heckman Model When:

Your outcome is observed only for a non-randomly selected subset. Wages for workers, test scores for students who did not drop out, firm performance for firms that survived, analyst forecasts for covered firms. The key question is: would including the unobserved outcomes change your conclusions?
You have a credible exclusion restriction. A variable that plausibly affects selection but not the outcome. Without one, the model relies on functional form assumptions that are typically indefensible.
Joint normality is a reasonable approximation. If the outcome variable is continuous, approximately symmetric, and not heavily skewed or bounded, normality is more plausible.
You want to recover population-level parameters. If you only care about the effect of education on wages for women who work, OLS on the selected sample is fine. But if you want the effect of education on potential wages for all women (workers and non-workers), you need the selection correction.

Do NOT Use the Heckman Model When:

Your missing data are missing at random (MAR). If missingness depends only on observed covariates, standard methods (multiple imputation, inverse probability weighting) are sufficient and do not require the normality assumption.
You have no exclusion restriction and cannot defend one. Without an exclusion restriction, you are relying entirely on the nonlinearity of the normal CDF for identification. Certo et al. (2016) find that many management papers apply Heckman without a credible exclusion restriction, rendering the correction unreliable.
The normality assumption is doubtful. If the outcome variable is heavily skewed (e.g., firm value, patent counts) or bounded (e.g., proportions, ratings on a fixed scale), the joint normality assumption is suspect. Consider semiparametric alternatives or Lee bounds.
You have a randomized experiment with differential attrition. Lee bounds provide a nonparametric approach that does not require normality or an exclusion restriction. They give bounds on the treatment effect rather than a point estimate, but the bounds are valid under weaker assumptions.
Selection is on observables only. If you believe you have observed all the variables that drive selection, matching or inverse probability weighting can address selection without parametric distributional assumptions.

Connection to Other Methods

The Heckman model relates to several other methods for handling selection and endogeneity:

Logit/Probit: the selection equation in the Heckman model is a probit. If you are only interested in modeling the selection decision itself (e.g., labor force participation), a standalone probit is sufficient. The Heckman model adds the outcome equation and the correlation between the two.
OLS: the outcome equation estimated on the selected sample is OLS with selection bias. The Heckman model adds the inverse Mills ratio to correct this bias. If $\rho = 0$ (no selection bias), the Heckman model reduces to OLS on the selected sample.
IV/2SLS: conceptually similar in that both require an exclusion restriction. In IV, the instrument affects the endogenous regressor but not the outcome directly. In Heckman, the excluded variable affects selection but not the outcome. The Heckman two-step can be viewed as a approach — adding a correction term rather than instrumenting.
Lee bounds: a nonparametric alternative that bounds the treatment effect under weaker assumptions (monotonicity of selection, no functional form). Lee bounds are popular in program evaluation when normality is questionable. The trade-off is that you get bounds rather than a point estimate.
Matching: addresses selection on observables. Matching assumes that, conditional on observed covariates, selection is as good as random. The Heckman model addresses selection on unobservables — the case where selection depends on factors correlated with the outcome error term even after conditioning on observables.
Control function approach: the Heckman two-step is a special case of the control function approach. In the general control function framework, the correction term can take forms other than the inverse Mills ratio (e.g., for non-normal error distributions). Rivers and Vuong (1988) extends this to simultaneous equations with endogenous regressors.

BIdentification

For the Heckman model to provide valid correction, two key conditions must hold.

Condition 1: Joint Normality

Plain language: The unobserved factors affecting selection and the unobserved factors affecting the outcome must follow a bivariate normal distribution.

Formally: $(u_i, \varepsilon_i) \sim N(0, \Sigma)$ where $\Sigma = \begin{pmatrix} 1 & \rho\sigma \\ \rho\sigma & \sigma^2 \end{pmatrix}$ .

This assumption is crucial because the inverse Mills ratio correction is derived from the properties of the bivariate normal distribution. If the errors are not jointly normal, the functional form of the correction term $\lambda(\cdot)$ is wrong, and the bias correction may itself be biased.

Joint normality is a strong assumption

Joint normality is not testable from the observed data alone (because the outcome is missing for the unselected observations). However, you can perform indirect checks: (1) test whether the residuals from the outcome equation are approximately normal, (2) examine whether the inverse Mills ratio enters linearly (a polynomial in $\hat{\lambda}$ should not be significant if normality holds), and (3) compare two-step and MLE results — large discrepancies suggest normality problems.

The two-step estimator's consistency requires only that $E[u | v] = \gamma v$ (linear conditional expectation), which is weaker than full bivariate normality. This property makes the two-step procedure more robust to departures from normality than MLE (Wooldridge, 2010).

Condition 2: Exclusion Restriction

Plain language: At least one variable must appear in the selection equation but not in the outcome equation. This variable affects whether the outcome is observed but does not directly affect the outcome itself.

Formally: There exists a variable $Z_k$ such that $\gamma_k \neq 0$ (it affects selection) but $Z_k$ does not appear in $X_i$ (it is excluded from the outcome equation).

Why it matters: Without an exclusion restriction, the Heckman model is identified only through the nonlinearity of the inverse Mills ratio — which comes from the normality assumption. If normality is even slightly wrong, this "identification through functional form" can produce wildly inaccurate corrections. Bushway et al. (2007) demonstrate that without a credible exclusion restriction, the Heckman correction provides no improvement over naive OLS and may increase bias.

Examples of exclusion restrictions in practice:

Research setting	Selection equation	Exclusion restriction	Justification
Female labor supply	Works or not	Number of young children, husband's income	Affect labor force participation but not wage rate conditional on working
Firm R&D spending	Reports R&D or not	Industry peers' reporting practices	Affects disclosure but not the level of R&D
CEO compensation	Firm is publicly traded	State-level IPO regulations	Affect listing decision but not pay
Analyst coverage	Firm is covered or not	Geographic distance to nearest analyst	Affects coverage probability but not firm value

CVisual Intuition

Adjust the selection correlation to see how naive OLS on the selected sample diverges from the true effect. When selection is strong, OLS is badly biased; the Heckman correction uses the inverse Mills ratio to recover the true coefficient.

Watch how non-random selection into the labor force distorts the observed wage-education relationship:

DMathematical Derivation

Derivation of the Inverse Mills Ratio Correction

Don't worry about the notation yet — here's what this means in words: The two-step correction derives from the conditional expectation of the outcome given selection, under joint normality of the error terms.

Setup. We have two equations:

Selection: $D_i^* = Z_i^{\prime}\gamma + u_i$ , where $D_i = \mathbf{1}(D_i^* > 0)$

Outcome: $Y_i = X_i^{\prime}\beta + \varepsilon_i$ , observed only when $D_i = 1$

Assume $(u_i, \varepsilon_i)$ are jointly normal with correlation $\rho$ and $\text{Var}(u_i) = 1$ , $\text{Var}(\varepsilon_i) = \sigma^2$ .

Step 1: Conditional expectation of the outcome.

E[Y_i \mid X_i, D_i = 1] = X_i^{\prime}\beta + E[\varepsilon_i \mid D_i = 1]

The OLS bias is $E[\varepsilon_i | D_i = 1]$ , which is non-zero when selection is correlated with the outcome.

Step 2: Use the properties of the truncated bivariate normal.

Since $D_i = 1$ iff $u_i > -Z_i^{\prime}\gamma$ , we need $E[\varepsilon_i | u_i > -Z_i^{\prime}\gamma]$ .

For jointly normal $(u_i, \varepsilon_i)$ :

E[\varepsilon_i \mid u_i > -Z_i^{\prime}\gamma] = \rho\sigma \cdot E[u_i \mid u_i > -Z_i^{\prime}\gamma]

Step 3: Expected value of a truncated standard normal.

For $u_i \sim N(0, 1)$ :

E[u_i \mid u_i > -c] = \frac{\phi(c)}{\Phi(c)}

where $c = Z_i^{\prime}\gamma$ . This ratio $\lambda(c) = \phi(c) / \Phi(c)$ is the inverse Mills ratio.

Step 4: Combine.

E[Y_i \mid X_i, D_i = 1] = X_i^{\prime}\beta + \rho\sigma \cdot \lambda(Z_i^{\prime}\gamma)

This derivation shows that OLS on the selected sample omits $\rho\sigma \cdot \lambda(Z_i^{\prime}\gamma)$ , producing an omitted variable bias. The two-step estimator corrects this by including $\hat{\lambda}_i$ as an additional regressor.

Step 5: Standard error correction.

Because $\hat{\lambda}_i$ is estimated from the first-stage probit (it is a generated regressor), the OLS standard errors in the second stage are incorrect. The correct variance-covariance matrix accounts for the estimation uncertainty in $\hat{\gamma}$ . All standard software packages compute the corrected standard errors automatically.

EImplementation

Heckman Selection Model with Diagnostics

1# Requires: sampleSelection
2# sampleSelection: R package for Heckman selection models (Toomet & Henningsen)
3library(sampleSelection)
4
5# --- Step 1: Prepare the data ---
6# Outcome: log(wage), observed ONLY for working women (selected sample)
7# Selection: whether the woman works (lfp = 1)
8# Exclusion restriction: nwifeinc (non-wife income), kidslt6 (kids under 6)
9# These affect labor force participation but should not directly affect wage rate
10
11# --- Step 2: Heckman two-step estimator ---
12# heckit() estimates the selection and outcome equations jointly
13# selection: probit for participation (first stage)
14# outcome: OLS for wages, corrected by the inverse Mills ratio (second stage)
15heck_2step <- heckit(
16selection = lfp ~ age + I(age^2) + educ + nwifeinc + kidslt6 + kidsge6,
17outcome   = log(wage) ~ educ + exper + I(exper^2) + city,
18data = df,
19method = "2step"
20)
21summary(heck_2step)
22# Coefficient on lambda (inverse Mills ratio) estimates rho * sigma
23# If lambda is significant, this is consistent with selection bias under the model
24
25# --- Step 3: Full information MLE ---
26# MLE estimates the full joint model under bivariate normality
27# More efficient than two-step but more sensitive to normality violations
28heck_mle <- heckit(
29selection = lfp ~ age + I(age^2) + educ + nwifeinc + kidslt6 + kidsge6,
30outcome   = log(wage) ~ educ + exper + I(exper^2) + city,
31data = df,
32method = "ml"
33)
34summary(heck_mle)
35# MLE directly estimates rho and sigma (not just rho*sigma)
36# Compare two-step and MLE: large discrepancies suggest normality problems
37
38# --- Step 4: Diagnostics ---
39# Test H0: rho = 0 (no selection bias)
40# If not rejected, OLS on the selected sample is consistent
41
42# Compare with naive OLS (ignoring selection)
43ols_naive <- lm(log(wage) ~ educ + exper + I(exper^2) + city,
44              data = df[df$lfp == 1, ])
45summary(ols_naive)
46# If Heckman and OLS coefficients differ substantially, selection bias
47# is economically meaningful and the correction is needed

FDiagnostics

F.1 Significance of the Inverse Mills Ratio (Lambda)

The most basic diagnostic: is $\hat{\lambda}$ (the coefficient on the inverse Mills ratio) statistically significant? Under the null hypothesis $H_0: \rho = 0$ , there is no selection bias, and OLS on the selected sample is consistent. If $\hat{\lambda}$ is insignificant, either (a) there is genuinely no selection bias, or (b) the exclusion restriction is too weak to detect it.

In the two-step estimator, test whether the coefficient on $\hat{\lambda}_i$ is significant (use corrected standard errors)
In MLE, test $H_0: \rho = 0$ using the Wald test on $\hat{\rho}$ or a likelihood ratio test comparing the joint model to separate probit + OLS

F.2 Normality Tests

Since the correction relies on joint normality, assess whether this assumption is plausible:

Residual normality: test the outcome equation residuals for normality (Shapiro-Wilk, Jarque-Bera, Q-Q plot). This checks marginal normality of $\varepsilon_i$ , which is necessary but not sufficient for joint normality.
Polynomial Mills ratio test: add $\hat{\lambda}_i^2$ and $\hat{\lambda}_i^3$ to the outcome equation. If these higher-order terms are significant, the linear Mills ratio correction is insufficient — evidence against normality. The polynomial Mills ratio test is a specification test for the functional form of the correction.
Compare two-step and MLE: large discrepancies between the two estimators suggest normality violations (MLE relies more heavily on normality than the two-step estimator).

F.3 Exclusion Restriction Strength

A weak exclusion restriction produces imprecise estimates and makes the model fragile:

Joint significance test: test whether the excluded variables are jointly significant in the probit selection equation ( $\chi^2$ test). An insignificant exclusion restriction means the model is unidentified without functional form.
Magnitude of the first-stage coefficients: the excluded variables should have economically meaningful effects on selection, not just statistical significance.
Collinearity check: verify that the inverse Mills ratio is not highly collinear with the covariates in the outcome equation. High collinearity (variance inflation factor (VIF) > 10 for $\hat{\lambda}$ ) indicates that the model cannot distinguish the selection correction from the direct effects of the covariates.

F.4 Two-Step vs. MLE Comparison

Estimate the model using both methods and compare:

Similar results: reassuring. Both methods are estimating the same parameters, and the normality assumption is likely adequate.
Different results: cause for concern. Possible explanations: (1) normality is violated (MLE is more sensitive), (2) the exclusion restriction is weak (MLE leverages different information than two-step), (3) the sample size is too small for MLE to converge properly.

1library(sampleSelection)
2
3# Fit two-step and MLE
4heck_2s <- heckit(
5selection = lfp ~ age + I(age^2) + educ + nwifeinc + kidslt6 + kidsge6,
6outcome   = log(wage) ~ educ + exper + I(exper^2) + city,
7data = df, method = "2step"
8)
9heck_ml <- heckit(
10selection = lfp ~ age + I(age^2) + educ + nwifeinc + kidslt6 + kidsge6,
11outcome   = log(wage) ~ educ + exper + I(exper^2) + city,
12data = df, method = "ml"
13)
14
15# F.1 Test significance of lambda (rho * sigma)
16summary(heck_2s)  # Check p-value on "Inverse Mills Ratio"
17
18# F.2 Polynomial Mills ratio test for normality
19# Compute Mills ratio from first-stage probit
20probit_fit <- glm(lfp ~ age + I(age^2) + educ + nwifeinc +
21                  kidslt6 + kidsge6,
22                family = binomial(link = "probit"), data = df)
23Zgamma <- predict(probit_fit, type = "link")
24mills <- dnorm(Zgamma) / pnorm(Zgamma)
25
26# Add squared and cubed Mills ratio to outcome equation
27selected <- df[df$lfp == 1, ]
28mills_sel <- mills[df$lfp == 1]
29selected$mills  <- mills_sel
30selected$mills2 <- mills_sel^2
31selected$mills3 <- mills_sel^3
32
33norm_test <- lm(log(wage) ~ educ + exper + I(exper^2) + city +
34                mills + mills2 + mills3,
35              data = selected)
36# Joint test of mills2 and mills3
37library(car)
38linearHypothesis(norm_test, c("mills2 = 0", "mills3 = 0"))
39
40# F.3 Exclusion restriction strength
41anova(
42glm(lfp ~ age + I(age^2) + educ + kidsge6,
43    family = binomial(link = "probit"), data = df),
44probit_fit, test = "Chisq"
45)
46
47# F.4 Compare two-step and MLE
48cbind(TwoStep = coef(heck_2s, part = "outcome"),
49    MLE = coef(heck_ml, part = "outcome"))

Requirescar

F.5 Interpreting Your Results

What Rho Tells You

The parameter $\rho$ — the correlation between the selection and outcome errors — is the key parameter that drives the selection correction.

$\rho$	Sign of bias in naive OLS	Interpretation
$\rho < 0$	OLS underestimates the mean outcome	Individuals who select in have lower unobserved outcome potential (negative selection). Since $\rho\sigma < 0$ and the inverse Mills ratio $> 0$ , $E[\varepsilon
$\rho = 0$	No bias	Selection is independent of the outcome (conditional on covariates). OLS on the selected sample is consistent.
$\rho > 0$	OLS overestimates the mean outcome	Individuals who select in have higher unobserved outcome potential (positive selection). Since $\rho\sigma > 0$ , $E[\varepsilon

What Lambda ( $\hat{\lambda}$ Coefficient) Captures

The coefficient on the inverse Mills ratio in the two-step estimator is $\hat{\rho}\hat{\sigma}$ , sometimes denoted $\hat{\lambda}$ or $\hat{\delta}$ . Its magnitude and significance determine:

Significance of $\hat{\lambda}$ : if significant, this result is consistent with correlation between the selection and outcome disturbances under the maintained model — but significance alone does not prove selection bias is present (Certo et al., 2016). If insignificant, either selection bias is absent or the model lacks power to detect it (weak exclusion restriction).
Magnitude of $\hat{\lambda}$ : the product $\rho\sigma$ determines the size of the bias correction. A large $|\hat{\lambda}|$ coefficient means the selection correction substantially changes the outcome equation coefficients.
Sign of $\hat{\lambda}$ : reveals the direction of selection. Negative $\hat{\lambda}$ means negative selection ( $\rho < 0$ ) — those who select in have lower unobserved outcome potential than a random draw from the population, so OLS on the selected sample underestimates the population mean.

What to Report in a Table

A well-reported Heckman model should include:

Both equations: report the full selection equation (probit coefficients) and the outcome equation (OLS coefficients with Mills ratio correction)
Lambda ( $\rho\sigma$ ): the coefficient on the inverse Mills ratio, with its standard error and p-value
Rho ( $\hat{\rho}$ ) and sigma ( $\hat{\sigma}$ ): if using MLE, report these separately
Number of observations: total N, number selected (observed outcome), number not selected
Exclusion restriction: clearly identify which variables appear in the selection equation but not the outcome equation
Exclusion restriction strength: chi-squared test of the excluded variables in the selection equation
Estimation method: two-step or MLE
Naive OLS comparison: show how the coefficients change when selection is ignored

Example write-up

"We estimate a Heckman selection model to correct for non-random selection into employment. The selection equation models labor force participation as a function of age, education, husband's income, and number of young children. The outcome equation models log wages as a function of education and experience. Husband's income and number of children under 6 serve as exclusion restrictions: they are jointly significant in the selection equation ( $\chi^2$ = 48.7, p < 0.001) but are excluded from the wage equation on the grounds that they affect the participation decision but not the market wage rate conditional on working.

The inverse Mills ratio is negative and significant ( $\hat{\lambda}$ = -0.28, SE = 0.11, p = 0.01), indicating negative selection into employment: women who choose to work have lower unobserved wage potential than non-workers (they work out of financial necessity rather than high market value). The estimated correlation between the selection and wage errors is $\hat{\rho}$ = -0.45. After correcting for selection, the return to education is 10.8% per year (SE = 1.4%), compared to 8.2% in naive OLS — consistent with negative selection attenuating the naive estimate. Results are robust to MLE estimation ( $\hat{\beta}_{\text{educ}}$ = 10.5%, SE = 1.2%)."

Common pitfalls

Do not say "the Heckman model controls for ." It corrects for one specific type of endogeneity — sample selection bias. Other sources of endogeneity (omitted variable bias, reverse causality, measurement error in the regressors) are not addressed.
Do not interpret a significant Mills ratio as proof that selection bias exists or that your model is correctly specified. A significant $\hat{\lambda}$ is consistent with selection bias under the maintained model, but the correction is only valid if the normality assumption and exclusion restriction hold (Certo et al., 2016).
Do not report only the outcome equation. The selection equation is essential for assessing the model. Readers need to see the exclusion restriction and its strength.
Do not apply the Heckman correction mechanically without discussing the economics of the selection process. Lennox et al. (2012) document that mechanical application without economic justification is one of the most common errors.

GWhat Can Go Wrong

What Can Go Wrong

No Exclusion Restriction

Heckman model with a credible exclusion restriction (number of young children affects labor force participation but not wage rate)

Returns to education: 0.108 (SE = 0.014). Lambda = -0.28 (SE = 0.11, p = 0.01). The selection correction is significant and the education coefficient is well-estimated because the exclusion restriction (kidslt6) is strong in the selection equation (chi-squared = 42.3, p < 0.001).

What Can Go Wrong

Normality Violation

Outcome variable (log wages) is approximately normally distributed. Joint normality of error terms is plausible.

Two-step estimate of returns to education: 0.108 (SE = 0.014). MLE estimate: 0.105 (SE = 0.012). The two methods agree closely, and the polynomial Mills ratio test does not reject normality (p = 0.61).

What Can Go Wrong

Weak Exclusion Restriction

Exclusion restriction (number of children under 6) is strongly predictive of labor force participation: probit coefficient = -0.87, chi-squared = 42.3, p < 0.001

Lambda = -0.28 (SE = 0.11). The inverse Mills ratio is precisely estimated, providing a meaningful selection correction. The VIF of lambda in the outcome equation is 2.1 — well below the danger zone.

What Can Go Wrong

Confusing Incidental Truncation with Sample Selection

True sample selection: wages are missing because women choose not to work. The selection decision is correlated with potential wages (women with low potential wages opt out).

Heckman correction is appropriate: rho = -0.45. Women who select into employment have higher-than-average potential wages. OLS on the selected sample overestimates average wages in the population.

HPractice

H.1 Concept Checks

Concept Check

A researcher estimates a Heckman selection model for CEO compensation, where compensation is observed only for public firms. She uses state-level IPO regulations as the exclusion restriction. She finds that the inverse Mills ratio coefficient is -0.45 (SE = 0.18, p = 0.01) and rho = -0.38. What does the negative rho tell us about the selection process?

Public firms have lower unobserved compensation potential than private firms. OLS on public firms underestimates average compensation.Firms that select into being publicly traded tend to have higher unobserved compensation. OLS on the selected sample of public firms overestimates average CEO pay relative to all firms.The exclusion restriction is invalid because rho is significant.The model has failed because rho should be zero if the exclusion restriction is valid.

Concept Check

A researcher applies the Heckman two-step estimator to study the effect of R&D spending on firm performance. She includes the same set of variables in both the selection equation (whether the firm reports R&D) and the outcome equation (firm performance given R&D is reported). She has no exclusion restriction. She finds that the inverse Mills ratio is significant (p = 0.04). Is this evidence that her model is working correctly?

Yes — a significant Mills ratio proves that selection bias exists and the model is correcting for it.No — without an exclusion restriction, the significance of the Mills ratio is driven by functional form (normality) rather than genuine identification. The model may be making the estimates worse, not better.It depends on whether the R-squared improved relative to OLS.Yes, but only if the two-step and MLE estimates agree.

Concept Check

You estimate a Heckman model and find that lambda (the coefficient on the inverse Mills ratio) is -0.15 with a standard error of 0.42 and p = 0.72. The naive OLS coefficient on your key variable is 0.35 (SE = 0.08), while the Heckman-corrected coefficient is 0.33 (SE = 0.14). What should you conclude?

The Heckman model is preferred because it corrects for selection bias.There is no evidence of selection bias (lambda is insignificant). The naive OLS estimate is more efficient and should be preferred, though you should report both results and discuss why selection may be absent.The insignificant lambda proves that there is no selection bias in the data.You should switch to MLE because the two-step estimator lacks power.

H.2 Guided Exercise

Guided Exercise

Interpreting Heckman Selection Model Output

You study the effect of training programs on worker wages. Wages are observed only for employed workers. Your Heckman model produces:

Selection Equation (Probit: Employed = 1)

Variable	Coeff	SE	p-value
Age	0.045	0.012	< 0.001
Age-squared	-0.0005	0.0002	0.012
Education	0.112	0.021	< 0.001
Married	0.284	0.098	0.004
Num. children < 6	-0.432	0.076	< 0.001 [EXCLUDED]
Spouse income ($000s)	-0.018	0.005	< 0.001 [EXCLUDED]

Exclusion restriction test: chi-squared(2) = 52.4, p < 0.001

Outcome Equation (Dep. var: log(wage))

Variable	Coeff	SE	p-value
Education	0.098	0.015	< 0.001
Training program	0.145	0.038	< 0.001
Experience	0.032	0.008	< 0.001
Experience-squared	-0.0005	0.0002	0.012
Inverse Mills ratio	-0.312	0.104	0.003

rho = -0.47, sigma = 0.664, lambda = rho * sigma = -0.312 Method: Two-step. N = 2,000 (1,340 employed, 660 not employed).

Naive OLS on employed workers: Training coefficient = 0.118 (SE = 0.032, p < 0.001).

H.3 Error Detective

Error Detective

Read the analysis below carefully and identify the errors.

A management researcher studies whether board diversity affects firm performance. She argues that firm performance (Tobin's Q) is observed only for publicly listed firms, creating a selection problem. She estimates a Heckman model:

Selection equation: Listed = f(firm_age, total_assets, industry) Outcome equation: TobinsQ = f(board_diversity, firm_size, leverage, ROA, industry)

She uses firm_age and total_assets as exclusion restrictions, arguing they affect the listing decision but not performance. She reports:

- Board diversity coefficient (Heckman): 0.42 (p = 0.03) - Board diversity coefficient (OLS): 0.28 (p = 0.08) - Lambda: -0.89 (SE = 0.31, p = 0.004) - Rho: -0.62 - Method: Two-step

She concludes: "After correcting for selection into public listing, board diversity has an even stronger positive effect on firm performance."

She does not report the selection equation coefficients or test the exclusion restriction strength.

Select all errors you can find:

Exclusion restriction is not credible: firm age and total assets plausibly affect Tobin's Q directly(Exclusion restriction)

Selection equation not reported and exclusion restriction strength not tested(Reporting)

No normality diagnostics reported(Diagnostics)

Causal language without addressing other sources of endogeneity(Interpretation)

Error Detective

Read the analysis below carefully and identify the errors.

A labor economist studies the gender wage gap. She estimates a Heckman model separately for men and women:

For women: - Selection equation: Employed = f(age, education, married, children_under_5, spouse_income) - Outcome equation: log(wage) = f(education, experience, experience^2, occupation) - Exclusion restrictions: children_under_5, spouse_income - Lambda = -0.31 (SE = 0.09, p < 0.001), rho = -0.42

For men: - Selection equation: Employed = f(age, education, married, children_under_5, spouse_income) - Outcome equation: log(wage) = f(education, experience, experience^2, occupation) - Exclusion restrictions: children_under_5, spouse_income - Lambda = 0.02 (SE = 0.15, p = 0.89), rho = 0.03

She reports: "The selection-corrected gender wage gap is 22 log points, compared to 18 log points in naive OLS. Selection correction matters for women but not for men."

She does not discuss whether the same exclusion restriction is appropriate for both genders.

Select all errors you can find:

Same exclusion restriction may not be valid for both genders(Exclusion restriction)

Occupation is included as a control in the wage equation, but it is potentially endogenous(Model specification)

Comparing selection-corrected estimates across different models is problematic(Interpretation)

H.4 You Are the Referee

Referee Exercise

Read the paper summary below and write a brief referee critique (2-3 sentences) of the identification strategy.

Paper Summary

The authors study returns to education for married women using a Heckman two-step selection model, correcting for the fact that wages are observed only for employed women. They use a sample of 3,200 married women (1,950 employed, 1,250 not employed) from a national household survey. As their exclusion restriction, they use whether the woman's mother was employed when the woman was age 14, arguing that maternal employment norms affect daughters' labor force participation but not their wages. They find a selection-corrected return to education of 11.2% per year (SE = 1.8%), compared to a naive OLS estimate of 8.5% (SE = 1.1%). Lambda is -0.28 (p = 0.02).

Key Table

Variable	Coefficient	SE	p-value
Education (years)	0.112	0.018	<0.001
Experience	0.035	0.009	<0.001
Experience-squared	-0.001	0.0004	0.008
Inverse Mills ratio	-0.280	0.120	0.020
N (total)	3,200
N (employed)	1,950

Authors' Identification Claim

The authors argue that maternal employment status is a valid exclusion restriction: it affects daughters' labor force participation through intergenerational transmission of work norms but does not directly affect daughters' market wages.

ISwap-In: When to Use Something Else

Lee bounds: when normality is doubtful or no credible exclusion restriction exists. Lee bounds require only a monotonicity assumption (treatment does not make anyone less likely to be selected). You get an interval rather than a point estimate, but it is valid under weaker assumptions. Recommended as a robustness check alongside the Heckman model.
Control function approach: a generalization of the Heckman two-step. Instead of assuming joint normality (which implies the correction term is the inverse Mills ratio), you can use nonparametric or semiparametric estimates of the control function. This relaxes the distributional assumption at the cost of requiring a stronger exclusion restriction and a larger sample.
Semiparametric selection models: methods by , , and others estimate the selection correction without assuming normality. These use kernel or series estimators for the conditional expectation of the error. They require larger samples and are computationally more complex.
Inverse probability weighting (IPW): weight each observation by the inverse of its probability of being selected (estimated from the selection equation). Unlike Heckman, IPW does not require normality and can accommodate nonlinear outcome models. However, IPW only addresses selection on observables — it cannot correct for selection on unobservables.
Bivariate probit: when both the selection and the outcome are binary. The Heckman model assumes a continuous outcome; bivariate probit handles two binary equations with correlated errors. Uses the same exclusion restriction logic.
Bounds approaches (Manski bounds): when you want the weakest possible assumptions. Manski worst-case bounds assume nothing about the selection process and provide very wide intervals. Lee bounds tighten these by adding a monotonicity assumption.

JReviewer Checklist

Paper Library

Has replication code

Foundational (4)

Heckman, J. J. (1979). Sample Selection Bias as a Specification Error.

EconometricaDOI: 10.2307/1912352

Heckman introduces the two-step estimator for correcting sample selection bias using the inverse Mills ratio. The paper shows that selection bias can be treated as an omitted variable problem, where the omitted variable is the conditional expectation of the error term given selection. One of the most cited papers in econometrics.

Newey, W. K. (1999). Two Step Series Estimation of Sample Selection Models.

MIT Department of Economics Working Paper 99-04

Newey proposes a semiparametric two-step estimator for sample selection models that replaces the parametric inverse Mills ratio with a flexible series (power series or regression spline) approximation to the unknown selection correction function. This approach avoids the normality assumption underlying the standard Heckman correction while retaining the computational convenience of a two-step procedure. Researchers concerned about distributional misspecification in selection models can use series-based selection corrections as a robust alternative to parametric methods.

Powell, J. L. (1987). Semiparametric Estimation of Bivariate Latent Variable Models.

SSRI Working Paper 8704, University of Wisconsin-Madison

Powell develops semiparametric methods for estimating bivariate latent variable models—including censored sample selection models—without imposing distributional assumptions on the error terms. This approach relaxes the bivariate normality requirement of the Heckman two-step estimator, requiring only an exclusion restriction and mild regularity conditions. Researchers who doubt the normality assumption in selection models can apply these methods to obtain consistent estimates under weaker conditions.

Rivers, D., & Vuong, Q. H. (1988). Limited Information Estimators and Exogeneity Tests for Simultaneous Probit Models.

Journal of EconometricsDOI: 10.1016/0304-4076(88)90063-2

Rivers and Vuong propose a computationally simple two-step maximum likelihood procedure for estimating simultaneous probit models with endogenous regressors, and derive simple exogeneity tests based on this estimator. The exogeneity tests are asymptotically equivalent to classical tests based on limited information maximum likelihood but require only probit and OLS regressions to implement. Applied researchers working with binary outcome models and suspected endogeneity can use the Rivers-Vuong procedure as a tractable alternative to full information maximum likelihood.

Application (2)

Mroz, T. A. (1987). The Sensitivity of an Empirical Model of Married Women's Hours of Work to Economic and Statistical Assumptions.

EconometricaDOI: 10.2307/1911029

Mroz provides a classic application of the Heckman selection model to female labor supply. Shows that the two-step estimator's results are sensitive to the choice of exclusion restriction and the normality assumption. The Mroz dataset remains a standard teaching dataset for selection models.

Shaver, J. M. (1998). Accounting for Endogeneity When Assessing Strategy Performance: Does Entry Mode Choice Affect FDI Survival?.

Management ScienceDOI: 10.1287/mnsc.44.4.571

Shaver demonstrates how ignoring endogeneity — specifically, the self-selection of firms into entry modes — biases performance estimates in this foundational strategy paper. He shows that the choice between greenfield entries and acquisitions reflects private information about expected survival, and uses a Heckman-style selection correction to obtain unbiased estimates. One of the first papers to systematically demonstrate endogeneity problems in strategy research.

Survey (6)

Bushway, S., Johnson, B. D., & Slocum, L. A. (2007). Is the Magic Still There? The Use of the Heckman Two-Step Correction for Selection Bias in Criminology.

Journal of Quantitative CriminologyDOI: 10.1007/s10940-007-9024-4

Bushway, Johnson, and Slocum review Heckman model applications in criminology and find widespread misapplication. Emphasizes that without a credible exclusion restriction, the Heckman correction provides no improvement over naive OLS and may even increase bias.

Certo, S. T., Busenbark, J. R., Woo, H., & Semadeni, M. (2016). Sample Selection Bias and Heckman Models in Strategic Management Research.

Strategic Management JournalDOI: 10.1002/smj.2475

Certo, Busenbark, Woo, and Semadeni review the use of Heckman models in strategic management. They provide practical guidance on when selection correction is needed, how to choose exclusion restrictions, and how to interpret results. Finds that many SMJ papers misapply the technique.

Lennox, C. S., Francis, J. R., & Wang, Z. (2012). Selection Models in Accounting Research.

The Accounting ReviewDOI: 10.2308/accr-10195

Lennox, Francis, and Wang review the use (and misuse) of Heckman selection models in accounting research. Documents common pitfalls including weak exclusion restrictions, failure to test normality, and mechanical application without economic justification for the selection equation.

Puhani, P. A. (2000). The Heckman Correction for Sample Selection and Its Critique.

Journal of Economic SurveysDOI: 10.1111/1467-6419.00104

Puhani provides a short overview of Monte Carlo evidence on the Heckman two-step estimator, comparing it with full-information MLE and subsample OLS. Finds MLE preferable absent collinearity between the exclusion restriction and other regressors, but subsample OLS most robust when collinearity is present.

Wolfolds, S. E., & Siegel, J. (2019). Misaccounting for Endogeneity: The Peril of Relying on the Heckman Two-Step Method without a Valid Instrument.

Strategic Management JournalDOI: 10.1002/smj.2995

Wolfolds and Siegel demonstrate that the Heckman selection correction is frequently misapplied in management research, particularly when the exclusion restriction is not credible. They show via simulation and replication that applying the Heckman correction without a valid instrument can introduce more bias than it removes. The paper provides a cautionary guide for researchers considering selection models and recommends transparent reporting of the exclusion restriction.

Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data.

MIT Press

Wooldridge's graduate textbook is the standard reference for cross-section and panel data econometrics. Chapters 10-11 provide a thorough treatment of fixed effects, random effects, and related panel data methods, while later chapters cover general estimation methodology (MLE, GMM, M-estimation) with panel data applications throughout. The book covers both linear and nonlinear models with careful attention to assumptions.

One-Line Implementation

Download Full Analysis Code

Motivating Example: Wages and Female Labor Supply#

AOverview#

What the Heckman Model Does#

The Selection Bias Problem#

Two Estimation Approaches#

When to Use the Heckman Model#

When NOT to Use the Heckman Model#

Common Confusions#

When (Not) to Use This Method#

Use the Heckman Model When:

Do NOT Use the Heckman Model When:

Connection to Other Methods#

BIdentification#

Condition 1: Joint Normality#

Condition 2: Exclusion Restriction#

CVisual Intuition#

DMathematical Derivation#

Derivation of the Inverse Mills Ratio Correction#

EImplementation#

Heckman Selection Model with Diagnostics#

FDiagnostics#

F.1 Significance of the Inverse Mills Ratio (Lambda)#

F.2 Normality Tests#

F.3 Exclusion Restriction Strength#

F.4 Two-Step vs. MLE Comparison#

F.5 Interpreting Your Results#

What Rho Tells You

What Lambda (λ^\hat{\lambda}λ^ Coefficient) Captures

What to Report in a Table

GWhat Can Go Wrong#

No Exclusion Restriction

Normality Violation

Weak Exclusion Restriction

Confusing Incidental Truncation with Sample Selection

HPractice#

H.1 Concept Checks#

H.2 Guided Exercise#

H.3 Error Detective#

H.4 You Are the Referee#

Paper Summary

Key Table

Authors' Identification Claim

ISwap-In: When to Use Something Else#

JReviewer Checklist#

Critical Reading Checklist

Paper Library

Foundational (4)

Application (2)

Survey (6)

Tags

Motivating Example: Wages and Female Labor Supply

AOverview

What the Heckman Model Does

The Selection Bias Problem

Two Estimation Approaches

When to Use the Heckman Model

When NOT to Use the Heckman Model

Common Confusions

When (Not) to Use This Method

Connection to Other Methods

BIdentification

Condition 1: Joint Normality

Condition 2: Exclusion Restriction

CVisual Intuition

DMathematical Derivation

Derivation of the Inverse Mills Ratio Correction

EImplementation

Heckman Selection Model with Diagnostics

FDiagnostics

F.1 Significance of the Inverse Mills Ratio (Lambda)

F.2 Normality Tests

F.3 Exclusion Restriction Strength

F.4 Two-Step vs. MLE Comparison

F.5 Interpreting Your Results

What Lambda ( $\hat{\lambda}$ Coefficient) Captures

GWhat Can Go Wrong

HPractice

H.1 Concept Checks

H.2 Guided Exercise

H.3 Error Detective

H.4 You Are the Referee

ISwap-In: When to Use Something Else

JReviewer Checklist