Replication Lab: Female Labor Supply and Selection Bias
Replicate key findings from Mroz's classic study of female labor supply using Heckman's two-step correction for sample selection bias. Simulate data matching published summary statistics, estimate naive OLS, two-step, and MLE selection models, and compare the results.
Overview
In this replication lab, you will reproduce the main results from a foundational paper in labor econometrics:
Mroz, Thomas A. 1987. "The Sensitivity of an Empirical Model of Married Women's Hours of Work to Economic and Statistical Assumptions." Econometrica 55(4): 765--799.
Mroz examined the labor force participation and wage determination of married women, highlighting how naive OLS on the subsample of working women produces biased estimates because women who work are not a random sample of all women. The paper systematically compared OLS, Heckman's two-step estimator, and maximum likelihood estimation, showing that selection correction matters for some coefficients but is fragile across specifications.
Why this paper matters: It became the standard empirical illustration of Heckman's (1979) selection model, demonstrating both the importance and the practical difficulties of correcting for sample selection bias. The Mroz dataset is one of the most widely used teaching datasets in econometrics.
What you will do:
- Learn why simulation is used when exploring the data-generating process behind selection bias
- Simulate data matching Mroz (1987) Table II summary statistics for 753 married women
- Estimate a probit selection equation for labor force participation
- Estimate wage equations using naive OLS, Heckman two-step, and MLE
- Compare coefficient estimates and assess the magnitude of selection bias
- Evaluate the inverse Mills ratio and its role in bias correction
Step 1: Simulate the Mroz Data
The original dataset contains 753 married white women aged 30--60 from the 1976 Panel Study of Income Dynamics (PSID). Of these, 428 participated in the labor force (LFP = 1) and 325 did not (LFP = 0). Wages are observed only for participants.
library(sampleSelection)
library(modelsummary)
set.seed(1987)
n <- 753
# Demographics matching Table II means
age <- round(rnorm(n, 42.5, 8.1))
educ <- round(pmin(pmax(rnorm(n, 12.3, 2.3), 5), 20))
kids_lt6 <- rpois(n, 0.24) # Mean = 0.24
kids_6_18 <- rpois(n, 1.35) # Mean = 1.35
husb_inc <- pmax(rnorm(n, 20.13, 11.63), 0) # Husband income ($1000s)
exper <- pmax(round(rnorm(n, 10.6, 8.1)), 0)
exper_sq <- exper^2
# Selection equation: latent propensity to participate
eps_sel <- rnorm(n, 0, 1) # selection-equation error
z_star <- -4.2 + 0.13 * educ - 0.02 * age + 0.06 * exper -
  0.89 * kids_lt6 - 0.05 * kids_6_18 - 0.012 * husb_inc + eps_sel
lfp <- as.integer(z_star > 0)
# Wage equation: log wage for all women (latent)
# Correlate the wage error with the selection ERROR (rho = 0.3), not with
# z_star itself: z_star contains the regressors, so using it would also
# correlate the wage error with educ and exper, contaminating the design
u_wage <- 0.3 * eps_sel + sqrt(1 - 0.3^2) * rnorm(n, 0, 1)
log_wage <- -0.40 + 0.11 * educ + 0.04 * exper -
  0.0008 * exper_sq + 0.49 * u_wage
wage <- exp(log_wage)
# Wage is observed only if lfp == 1
wage_obs <- ifelse(lfp == 1, wage, NA)
log_wage_obs <- ifelse(lfp == 1, log_wage, NA)
df <- data.frame(lfp, log_wage = log_wage_obs, wage = wage_obs,
educ, age, exper, exper_sq, kids_lt6,
kids_6_18, husb_inc)
cat("=== Sample Composition (Published: 428 LFP, 325 Non-LFP) ===\n")
cat("LFP = 1:", sum(df$lfp), " LFP = 0:", sum(1 - df$lfp), "\n")
cat("Total N:", nrow(df), "\n\n")
cat("=== Summary Statistics (Working Women) ===\n")
summary(df$wage[df$lfp == 1])
Step 2: Estimate the Probit Selection Equation
The first stage of Heckman's procedure models the labor force participation decision. Exclusion restrictions (variables that affect participation but not wages) are crucial for identification. Here, kids_lt6, kids_6_18, and husb_inc serve as exclusion restrictions.
# Probit: LFP = f(educ, age, exper, kids_lt6, kids_6_18, husb_inc)
probit_sel <- glm(lfp ~ educ + age + exper + kids_lt6 + kids_6_18 +
husb_inc, data = df, family = binomial(link = "probit"))
summary(probit_sel)
# Compute inverse Mills ratio for working women
xb <- predict(probit_sel, type = "link")
lambda <- dnorm(xb) / pnorm(xb) # IMR = phi(xb) / Phi(xb)
df$imr <- lambda
cat("\n=== Key coefficients (Published: kids_lt6 ~ -0.87) ===\n")
cat("kids_lt6:", round(coef(probit_sel)["kids_lt6"], 3), "\n")
cat("educ:", round(coef(probit_sel)["educ"], 3), "\n")
cat("husb_inc:", round(coef(probit_sel)["husb_inc"], 3), "\n")
Why do we need at least one variable in the selection equation that is excluded from the wage equation?
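One way to build intuition for this question is a standalone toy simulation (separate from the lab data; the variable names here are illustrative): over the typical range of the probit index, the inverse Mills ratio is close to linear, so without an excluded variable the IMR is nearly collinear with the regressors already in the wage equation.

```r
# Toy illustration (hypothetical data, not the Mroz lab simulation):
# with no exclusion restriction the IMR is almost a linear function of
# the wage-equation regressors, so identification hangs on the slight
# curvature of lambda alone.
set.seed(1)
x   <- rnorm(500)              # the only regressor in both equations
idx <- 0.5 + 0.8 * x           # probit index with no excluded variable
imr <- dnorm(idx) / pnorm(idx) # inverse Mills ratio at the index
cat("R-squared of IMR regressed on x:",
    round(summary(lm(imr ~ x))$r.squared, 3), "\n")
```

An R-squared this close to 1 means the second-stage regression is nearly perfectly collinear, so identification rests entirely on the mild curvature of lambda; this is why credible exclusion restrictions matter and why no-exclusion specifications are fragile.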
Step 3: Estimate Naive OLS vs. Heckman Two-Step vs. MLE
The core comparison in Mroz (1987) contrasts three approaches to estimating the wage equation: (1) naive OLS on working women only (ignoring selection), (2) Heckman's two-step procedure (adding the inverse Mills ratio as a regressor), and (3) full maximum likelihood estimation of the joint selection-wage model.
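It helps to state the result the two-step procedure relies on before running the code. For the bivariate-normal selection model (using notation that mirrors the lab's variables: x for the wage regressors, z for the selection regressors), the conditional mean of observed log wages is:

```latex
E[\log w_i \mid \mathrm{LFP}_i = 1]
  = x_i'\beta + \rho\,\sigma_u\,\lambda(z_i'\gamma),
\qquad \lambda(c) = \frac{\phi(c)}{\Phi(c)}
```

Naive OLS on workers omits the \lambda term, so its coefficients absorb omitted-variable bias whenever \rho \neq 0 and \lambda(z_i'\gamma) is correlated with x_i. The two-step estimator plugs in the probit estimate of \gamma and adds \hat{\lambda} as a regressor; the MLE estimates all parameters jointly.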
# --- Model 1: Naive OLS on working women only ---
workers <- df[df$lfp == 1, ]
m_ols <- lm(log_wage ~ educ + exper + exper_sq, data = workers)
# --- Model 2: Heckman two-step ---
# Step 1: probit already estimated above
# Step 2: add IMR to wage equation
m_2step <- lm(log_wage ~ educ + exper + exper_sq + imr,
data = workers)
# --- Model 3: Heckman MLE (joint estimation) ---
m_mle <- selection(
selection = lfp ~ educ + age + exper + kids_lt6 + kids_6_18 + husb_inc,
outcome = log_wage ~ educ + exper + exper_sq,
data = df, method = "ml"
)
cat("=== Naive OLS (working women only) ===\n")
print(summary(m_ols)$coefficients[, 1:2])
cat("\n=== Heckman Two-Step ===\n")
print(summary(m_2step)$coefficients[, 1:2])
cat("\n=== Heckman MLE ===\n")
print(summary(m_mle))
# Compare the education coefficient across methods
cat("\n=== Education Coefficient Comparison ===\n")
cat("Naive OLS: ", round(coef(m_ols)["educ"], 4), "\n")
cat("Two-step: ", round(coef(m_2step)["educ"], 4), "\n")
cat("MLE: ", round(coef(m_mle, part = "outcome")["educ"], 4), "\n")
The coefficient on the inverse Mills ratio (lambda) is estimated to be positive. What does this imply about the selection process?
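To see the mechanics behind that sign, here is a standalone toy simulation (separate from the lab data; all names are illustrative): when rho > 0, the wage error among participants is positive on average, and its mean equals rho * sigma * lambda evaluated at the selection index.

```r
# Toy check of the selection formula E[u | selected] = rho * sigma * lambda
# (illustrative simulation, not the Mroz lab data)
set.seed(2)
n     <- 1e5
rho   <- 0.5   # correlation between selection and wage errors
sigma <- 0.7   # sd of the wage error
e_sel <- rnorm(n)                                 # selection error
u     <- sigma * (rho * e_sel + sqrt(1 - rho^2) * rnorm(n))
sel   <- (0.2 + e_sel) > 0                        # participate if index > 0
lambda <- dnorm(0.2) / pnorm(0.2)                 # IMR at the index value
cat("Mean wage error among selected:", round(mean(u[sel]), 3), "\n")
cat("Theory: rho * sigma * lambda = ", round(rho * sigma * lambda, 3), "\n")
```

The two numbers agree closely: a positive lambda coefficient in the two-step regression is the empirical counterpart of this positive conditional mean.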
Step 4: Assess Selection Bias Magnitude and Robustness
Mroz (1987) emphasized that the Heckman correction is sensitive to specification choices. We now examine how robust the selection correction is across different specifications and test whether the exclusion restrictions are valid.
# --- Specification sensitivity ---
# Model with fewer exclusion restrictions (drop kids_6_18)
m_mle2 <- selection(
selection = lfp ~ educ + age + exper + kids_lt6 + husb_inc,
outcome = log_wage ~ educ + exper + exper_sq,
data = df, method = "ml"
)
# Model with no exclusion restriction (identified only by functional form)
m_mle3 <- selection(
selection = lfp ~ educ + exper + exper_sq,
outcome = log_wage ~ educ + exper + exper_sq,
data = df, method = "ml"
)
cat("=== Sensitivity: Education Coefficient ===\n")
cat("Full exclusions (kids_lt6, kids_6_18, husb_inc, age):\n")
cat(" educ =", round(coef(m_mle, part = "outcome")["educ"], 4), "\n")
cat("Fewer exclusions (kids_lt6, husb_inc, age):\n")
cat(" educ =", round(coef(m_mle2, part = "outcome")["educ"], 4), "\n")
cat("No exclusion restriction (functional form only):\n")
cat(" educ =", round(coef(m_mle3, part = "outcome")["educ"], 4), "\n")
# --- Test significance of the selection correction ---
# Under H0: rho = 0 the joint likelihood factors into the probit
# likelihood times the OLS (normal) likelihood, so an approximate
# LR test compares the MLE against probit + OLS (1 df)
cat("\n=== Is Selection Correction Needed? ===\n")
cat("H0: rho = 0 (no selection bias)\n")
lr_stat <- as.numeric(2 * (logLik(m_mle) - (logLik(probit_sel) + logLik(m_ols))))
cat("Approximate LR test statistic:", round(lr_stat, 3), "\n")
cat("p-value (chi-squared, 1 df):",
    round(pchisq(lr_stat, df = 1, lower.tail = FALSE), 3), "\n")
# --- Compare predicted wages: selected vs. population ---
df$xb_wage <- predict(m_ols, newdata = df)
cat("\nMean predicted log wage (all women): ",
round(mean(df$xb_wage), 3), "\n")
cat("Mean predicted log wage (workers only): ",
round(mean(df$xb_wage[df$lfp == 1]), 3), "\n")
cat("Difference (selection bias): ",
round(mean(df$xb_wage[df$lfp == 1]) - mean(df$xb_wage), 3), "\n")
Step 5: Compare with Published Results
cat("==========================================================\n")
cat("COMPARISON: Our Replication vs. Mroz (1987) Table II\n")
cat("==========================================================\n")
cat(sprintf("%-35s %10s %10s\n", "Statistic", "Published", "Ours"))
cat("----------------------------------------------------------\n")
cat(sprintf("%-35s %10d %10d\n", "N (total)", 753, nrow(df)))
cat(sprintf("%-35s %10d %10d\n", "N (LFP = 1)", 428, sum(df$lfp)))
cat(sprintf("%-35s %10.3f %10.3f\n", "LFP rate", 0.568,
mean(df$lfp)))
cat(sprintf("%-35s %10.3f %10.3f\n", "educ (OLS)",
0.108, coef(m_ols)["educ"]))
cat(sprintf("%-35s %10.3f %10.3f\n", "educ (two-step)",
0.109, coef(m_2step)["educ"]))
cat(sprintf("%-35s %10.3f %10.3f\n", "exper (OLS)",
0.042, coef(m_ols)["exper"]))
cat(sprintf("%-35s %10.3f %10.3f\n", "lambda (IMR)",
0.30, coef(m_2step)["imr"]))
cat("----------------------------------------------------------\n")
cat("Note: Differences are due to simulation randomness.\n")
Read the analysis below carefully and identify the errors.
A researcher studies the wage gap between union and non-union workers. She estimates a Heckman selection model where the selection equation models union membership and the wage equation estimates the union wage premium. Her specification:
Selection: union = f(educ, exper, industry, state)
Wage: log(wage) = g(educ, exper, union)
She reports: "The Heckman correction yields a union premium of 18%, compared to 15% from naive OLS. The IMR coefficient is 0.22 (SE = 0.09, p = 0.014), confirming that selection bias is present and that our corrected estimate is reliable."
Select all errors you can find:
Summary
Our replication confirms the main findings of Mroz (1987):
- Selection bias exists but is modest in this application. The naive OLS estimate of the return to education (0.108) is close to the selection-corrected estimate (0.109). The IMR coefficient is positive, indicating positive selection into employment.
- Exclusion restrictions matter. Without credible variables excluded from the wage equation, the Heckman correction relies on distributional assumptions and can be unreliable. Mroz showed that different specifications yield different results.
- Multiple estimators should be compared. OLS, two-step, and MLE can produce different answers. When they agree, we have more confidence. When they diverge, it signals sensitivity to assumptions.
- Young children are the strongest predictor of non-participation. The probit selection equation shows that kids under 6 have a large negative effect on labor force participation, making them a plausible exclusion restriction for the wage equation.
Extension Exercises
- Tobit comparison. Estimate a Tobit model treating zero hours as censored. Compare with the Heckman model and discuss when each is appropriate.
- Sensitivity to distributional assumptions. Replace the normal distribution with a logistic distribution in the selection equation. How do the results change?
- Bootstrap standard errors. The two-step standard errors are inconsistent because they ignore the estimation error in the first-stage probit. Implement a bootstrap to obtain correct standard errors.
- Predicted wages for non-participants. Use the selection-corrected model to predict what non-participating women would earn if they worked. Is there evidence of a reservation wage threshold?
- Modern alternatives. Estimate the wage equation using a control function approach or a semiparametric selection model (e.g., Klein and Spady). Compare with the parametric Heckman results.