Replication Lab: Female Labor Supply and Selection Bias
Replicate key findings from Mroz's classic study of female labor supply using Heckman's two-step correction for sample selection bias. Simulate data matching published summary statistics, estimate naive OLS, two-step, and MLE selection models, and compare the results.
Overview
In this replication lab, you will reproduce the main results from a foundational paper in labor econometrics:
Mroz, Thomas A. 1987. "The Sensitivity of an Empirical Model of Married Women's Hours of Work to Economic and Statistical Assumptions." Econometrica 55(4): 765--799.
Mroz examined the labor force participation and wage determination of married women, highlighting how naive OLS on the subsample of working women produces biased estimates because women who work are not a random sample of all women. The paper systematically compared OLS, Heckman's two-step estimator, and maximum likelihood estimation, showing that selection correction matters for some coefficients but is fragile across specifications.
Why this paper matters: It became the standard empirical illustration of Heckman's (1979) selection model, demonstrating both the importance and the practical difficulties of correcting for sample selection bias. The Mroz dataset is one of the most widely used teaching datasets in econometrics.
What you will do:
- Learn why simulation is used when exploring the data-generating process behind selection bias
- Simulate data matching Mroz (1987) Table II summary statistics for 753 married women
- Estimate a probit selection equation for labor force participation
- Estimate wage equations using naive OLS, Heckman two-step, and MLE
- Compare coefficient estimates and assess the magnitude of selection bias
- Evaluate the inverse Mills ratio and its role in bias correction
Step 1: Simulate the Mroz Data
The original dataset contains 753 married white women aged 30--60 from the 1976 Panel Study of Income Dynamics (PSID). Of these, 428 participated in the labor force (LFP = 1) and 325 did not (LFP = 0). Wages are observed only for participants.
library(sampleSelection)
library(modelsummary)
set.seed(1987)
n <- 753
# Demographics matching Table II means
age <- round(rnorm(n, 42.5, 8.1))
educ <- round(pmin(pmax(rnorm(n, 12.3, 2.3), 5), 20))
kids_lt6 <- rpois(n, 0.24) # Mean = 0.24
kids_6_18 <- rpois(n, 1.35) # Mean = 1.35
husb_inc <- pmax(rnorm(n, 20.13, 11.63), 0) # Husband income ($1000s)
exper <- pmax(round(rnorm(n, 10.6, 8.1)), 0)
exper_sq <- exper^2
# Selection equation: latent propensity to participate
eps_sel <- rnorm(n, 0, 1) # selection-equation error
z_star <- -4.2 + 0.13 * educ - 0.02 * age + 0.06 * exper -
  0.89 * kids_lt6 - 0.05 * kids_6_18 - 0.012 * husb_inc + eps_sel
lfp <- as.integer(z_star > 0)
# Wage equation: log wage for all women (latent)
# Correlate the wage error with the selection ERROR (rho = 0.3), not with
# z_star itself: z_star contains the regressors, so using it would also
# correlate the wage error with educ and exper, contaminating the design
u_wage <- 0.3 * eps_sel + sqrt(1 - 0.3^2) * rnorm(n, 0, 1)
log_wage <- -0.40 + 0.11 * educ + 0.04 * exper -
  0.0008 * exper_sq + 0.49 * u_wage
wage <- exp(log_wage)
# Wage is observed only if lfp == 1
wage_obs <- ifelse(lfp == 1, wage, NA)
log_wage_obs <- ifelse(lfp == 1, log_wage, NA)
df <- data.frame(lfp, log_wage = log_wage_obs, wage = wage_obs,
educ, age, exper, exper_sq, kids_lt6,
kids_6_18, husb_inc)
cat("=== Sample Composition (Published: 428 LFP, 325 Non-LFP) ===\n")
cat("LFP = 1:", sum(df$lfp), " LFP = 0:", sum(1 - df$lfp), "\n")
cat("Total N:", nrow(df), "\n\n")
cat("=== Summary Statistics (Working Women) ===\n")
summary(df$wage[df$lfp == 1])
Step 2: Estimate the Probit Selection Equation
The first stage of Heckman's procedure models the labor force participation decision. Exclusion restrictions (variables that affect participation but not wages) are crucial for identification. Here, kids_lt6, kids_6_18, and husb_inc serve as exclusion restrictions.
# Probit: LFP = f(educ, age, exper, kids_lt6, kids_6_18, husb_inc)
probit_sel <- glm(lfp ~ educ + age + exper + kids_lt6 + kids_6_18 +
husb_inc, data = df, family = binomial(link = "probit"))
summary(probit_sel)
# Compute inverse Mills ratio for working women
xb <- predict(probit_sel, type = "link")
lambda <- dnorm(xb) / pnorm(xb) # IMR = phi(xb) / Phi(xb)
df$imr <- lambda
cat("\n=== Key coefficients (Published: kids_lt6 ~ -0.87) ===\n")
cat("kids_lt6:", round(coef(probit_sel)["kids_lt6"], 3), "\n")
cat("educ:", round(coef(probit_sel)["educ"], 3), "\n")
cat("husb_inc:", round(coef(probit_sel)["husb_inc"], 3), "\n")
Why do we need at least one variable in the selection equation that is excluded from the wage equation?
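One way to build intuition for this question is a standalone toy simulation (separate from the lab data; the variable names here are illustrative): over the typical range of the probit index, the inverse Mills ratio is close to linear, so without an excluded variable the IMR is nearly collinear with the regressors already in the wage equation.

```r
# Toy illustration (hypothetical data, not the Mroz lab simulation):
# with no exclusion restriction the IMR is almost a linear function of
# the wage-equation regressors, so identification hangs on the slight
# curvature of lambda alone.
set.seed(1)
x   <- rnorm(500)              # the only regressor in both equations
idx <- 0.5 + 0.8 * x           # probit index with no excluded variable
imr <- dnorm(idx) / pnorm(idx) # inverse Mills ratio at the index
cat("R-squared of IMR regressed on x:",
    round(summary(lm(imr ~ x))$r.squared, 3), "\n")
```

An R-squared this close to 1 means the second-stage regression is nearly perfectly collinear, so identification rests entirely on the mild curvature of lambda; this is why credible exclusion restrictions matter and why no-exclusion specifications are fragile.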
Step 3: Estimate Naive OLS vs. Heckman Two-Step vs. MLE
The core comparison in Mroz (1987) contrasts three approaches to estimating the wage equation: (1) naive OLS on working women only (ignoring selection), (2) Heckman's two-step procedure (adding the inverse Mills ratio as a regressor), and (3) full maximum likelihood estimation of the joint selection-wage model.
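It helps to state the result the two-step procedure relies on before running the code. For the bivariate-normal selection model (using notation that mirrors the lab's variables: x for the wage regressors, z for the selection regressors), the conditional mean of observed log wages is:

```latex
E[\log w_i \mid \mathrm{LFP}_i = 1]
  = x_i'\beta + \rho\,\sigma_u\,\lambda(z_i'\gamma),
\qquad \lambda(c) = \frac{\phi(c)}{\Phi(c)}
```

Naive OLS on workers omits the \lambda term, so its coefficients absorb omitted-variable bias whenever \rho \neq 0 and \lambda(z_i'\gamma) is correlated with x_i. The two-step estimator plugs in the probit estimate of \gamma and adds \hat{\lambda} as a regressor; the MLE estimates all parameters jointly.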
# --- Model 1: Naive OLS on working women only ---
workers <- df[df$lfp == 1, ]
m_ols <- lm(log_wage ~ educ + exper + exper_sq, data = workers)
# --- Model 2: Heckman two-step ---
# Step 1: probit already estimated above
# Step 2: add IMR to wage equation
m_2step <- lm(log_wage ~ educ + exper + exper_sq + imr,
data = workers)
# --- Model 3: Heckman MLE (joint estimation) ---
m_mle <- selection(
selection = lfp ~ educ + age + exper + kids_lt6 + kids_6_18 + husb_inc,
outcome = log_wage ~ educ + exper + exper_sq,
data = df, method = "ml"
)
cat("=== Naive OLS (working women only) ===\n")
print(summary(m_ols)$coefficients[, 1:2])
cat("\n=== Heckman Two-Step ===\n")
print(summary(m_2step)$coefficients[, 1:2])
cat("\n=== Heckman MLE ===\n")
print(summary(m_mle))
# Compare the education coefficient across methods
cat("\n=== Education Coefficient Comparison ===\n")
cat("Naive OLS: ", round(coef(m_ols)["educ"], 4), "\n")
cat("Two-step: ", round(coef(m_2step)["educ"], 4), "\n")
cat("MLE: ", round(coef(m_mle, part = "outcome")["educ"], 4), "\n")
The coefficient on the inverse Mills ratio (lambda) is estimated to be positive. What does this imply about the selection process?
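To see the mechanics behind that sign, here is a standalone toy simulation (separate from the lab data; all names are illustrative): when rho > 0, the wage error among participants is positive on average, and its mean equals rho * sigma * lambda evaluated at the selection index.

```r
# Toy check of the selection formula E[u | selected] = rho * sigma * lambda
# (illustrative simulation, not the Mroz lab data)
set.seed(2)
n     <- 1e5
rho   <- 0.5   # correlation between selection and wage errors
sigma <- 0.7   # sd of the wage error
e_sel <- rnorm(n)                                 # selection error
u     <- sigma * (rho * e_sel + sqrt(1 - rho^2) * rnorm(n))
sel   <- (0.2 + e_sel) > 0                        # participate if index > 0
lambda <- dnorm(0.2) / pnorm(0.2)                 # IMR at the index value
cat("Mean wage error among selected:", round(mean(u[sel]), 3), "\n")
cat("Theory: rho * sigma * lambda = ", round(rho * sigma * lambda, 3), "\n")
```

The two numbers agree closely: a positive lambda coefficient in the two-step regression is the empirical counterpart of this positive conditional mean.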
Step 4: Assess Selection Bias Magnitude and Robustness
Mroz (1987) emphasized that the Heckman correction is sensitive to specification choices. We now examine how robust the selection correction is across different specifications and test whether the exclusion restrictions are valid.
# --- Specification sensitivity ---
# Model with fewer exclusion restrictions (drop kids_6_18)
m_mle2 <- selection(
selection = lfp ~ educ + age + exper + kids_lt6 + husb_inc,
outcome = log_wage ~ educ + exper + exper_sq,
data = df, method = "ml"
)
# Model with no exclusion restriction (identified only by functional form)
m_mle3 <- selection(
selection = lfp ~ educ + exper + exper_sq,
outcome = log_wage ~ educ + exper + exper_sq,
data = df, method = "ml"
)
cat("=== Sensitivity: Education Coefficient ===\n")
cat("Full exclusions (kids_lt6, kids_6_18, husb_inc, age):\n")
cat(" educ =", round(coef(m_mle, part = "outcome")["educ"], 4), "\n")
cat("Fewer exclusions (kids_lt6, husb_inc, age):\n")
cat(" educ =", round(coef(m_mle2, part = "outcome")["educ"], 4), "\n")
cat("No exclusion restriction (functional form only):\n")
cat(" educ =", round(coef(m_mle3, part = "outcome")["educ"], 4), "\n")
# --- Test significance of the selection correction ---
# Under H0: rho = 0 the joint likelihood factors into the probit
# likelihood times the OLS (normal) likelihood, so an approximate
# LR test compares the MLE against probit + OLS (1 df)
cat("\n=== Is Selection Correction Needed? ===\n")
cat("H0: rho = 0 (no selection bias)\n")
lr_stat <- as.numeric(2 * (logLik(m_mle) - (logLik(probit_sel) + logLik(m_ols))))
cat("Approximate LR test statistic:", round(lr_stat, 3), "\n")
cat("p-value (chi-squared, 1 df):",
    round(pchisq(lr_stat, df = 1, lower.tail = FALSE), 3), "\n")
# --- Compare predicted wages: selected vs. population ---
df$xb_wage <- predict(m_ols, newdata = df)
cat("\nMean predicted log wage (all women): ",
round(mean(df$xb_wage), 3), "\n")
cat("Mean predicted log wage (workers only): ",
round(mean(df$xb_wage[df$lfp == 1]), 3), "\n")
cat("Difference (selection bias): ",
round(mean(df$xb_wage[df$lfp == 1]) - mean(df$xb_wage), 3), "\n")
Step 5: Compare with Published Results
cat("==========================================================\n")
cat("COMPARISON: Our Replication vs. Mroz (1987) Table II\n")
cat("==========================================================\n")
cat(sprintf("%-35s %10s %10s\n", "Statistic", "Published", "Ours"))
cat("----------------------------------------------------------\n")
cat(sprintf("%-35s %10d %10d\n", "N (total)", 753, nrow(df)))
cat(sprintf("%-35s %10d %10d\n", "N (LFP = 1)", 428, sum(df$lfp)))
cat(sprintf("%-35s %10.3f %10.3f\n", "LFP rate", 0.568,
mean(df$lfp)))
cat(sprintf("%-35s %10.3f %10.3f\n", "educ (OLS)",
0.108, coef(m_ols)["educ"]))
cat(sprintf("%-35s %10.3f %10.3f\n", "educ (two-step)",
0.109, coef(m_2step)["educ"]))
cat(sprintf("%-35s %10.3f %10.3f\n", "exper (OLS)",
0.042, coef(m_ols)["exper"]))
cat(sprintf("%-35s %10.3f %10.3f\n", "lambda (IMR)",
0.30, coef(m_2step)["imr"]))
cat("----------------------------------------------------------\n")
cat("Note: Differences are due to simulation randomness.\n")
Read the analysis below carefully and identify the errors.
A researcher studies the wage gap between union and non-union workers. She estimates a Heckman selection model where the selection equation models union membership and the wage equation estimates the union wage premium. Her specification:
Selection: union = f(educ, exper, industry, state)
Wage: log(wage) = g(educ, exper, union)
She reports: "The Heckman correction yields a union premium of 18%, compared to 15% from naive OLS. The IMR coefficient is 0.22 (SE = 0.09, p = 0.014), confirming that selection bias is present and that our corrected estimate is reliable."
Select all errors you can find:
Summary
Our replication confirms the main findings of Mroz (1987):
- Selection bias exists but is modest in this application. The naive OLS estimate of the return to education (0.108) is close to the selection-corrected estimate (0.109). The IMR coefficient is positive, indicating positive selection into employment.
- Exclusion restrictions matter. Without credible variables excluded from the wage equation, the Heckman correction relies on distributional assumptions and can be unreliable. Mroz showed that different specifications yield different results.
- Multiple estimators should be compared. OLS, two-step, and MLE can produce different answers. When they agree, we have more confidence. When they diverge, it signals sensitivity to assumptions.
- Young children are the strongest predictor of non-participation. The probit selection equation shows that kids under 6 have a large negative effect on labor force participation, making them a plausible exclusion restriction for the wage equation.
Extension Exercises
- Tobit comparison. Estimate a Tobit model treating zero hours as censored. Compare with the Heckman model and discuss when each is appropriate.
- Sensitivity to distributional assumptions. Replace the normal distribution with a logistic distribution in the selection equation. How do the results change?
- Bootstrap standard errors. The two-step standard errors are inconsistent because they ignore the estimation error in the first-stage probit. Implement a bootstrap to obtain correct standard errors.
- Predicted wages for non-participants. Use the selection-corrected model to predict what non-participating women would earn if they worked. Is there evidence of a reservation wage threshold?
- Modern alternatives. Estimate the wage equation using a control function approach or a semiparametric selection model (e.g., Klein and Spady). Compare with the parametric Heckman results.