MethodAtlas
Replication · 120 minutes

Replication Lab: Mroz (1987) Female Labor Force Participation

Replicate the classic logit model of married women's labor force participation estimated on the Mroz (1987) sample. Estimate logit and probit models, compute marginal effects, predict participation probabilities, and assess goodness of fit using simulated data calibrated to published summary statistics.

Overview

Thomas Mroz's 1987 paper "The Sensitivity of an Empirical Model of Married Women's Hours of Work to Economic and Statistical Assumptions" (Econometrica, 55(4), 765–799; DOI: 10.2307/1911029) supplies one of the most widely used datasets in labor economics. The paper examines the determinants of married women's labor force participation and hours of work, paying careful attention to selection bias and specification sensitivity.

In this replication lab, you will focus on the extensive margin: whether a married woman participates in the labor force at all. You will estimate logit and probit models, compute marginal effects at the mean and average marginal effects, and compare your results with published findings.

Key findings from Mroz (1987):

  • Husband's income negatively predicts wife's labor force participation
  • Education positively predicts participation
  • Young children strongly reduce participation probability
  • The results are sensitive to functional form and selection corrections

What you will learn:

  • How to estimate logit and probit models and interpret coefficients
  • The difference between marginal effects at the mean (MEM) and average marginal effects (AME)
  • How to compute predicted probabilities and assess model fit
  • How to compare logit and probit specifications
  • How to evaluate goodness of fit using percent correctly predicted, pseudo R-squared, and the Hosmer-Lemeshow test

Prerequisites: OLS regression, basic probability and statistics.


Step 1: Generate the Simulated Dataset

We simulate 753 observations (matching Mroz's sample size) with variables calibrated to the published summary statistics.

library(margins)
library(modelsummary)

# Simulate data matching Mroz (1987) summary statistics
set.seed(42)
n <- 753

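# Covariates: rounded normal draws, clipped to plausible ranges
educ <- pmin(pmax(round(rnorm(n, 12.3, 2.3)), 5), 20)     # wife's years of schooling
husc <- pmin(pmax(round(rnorm(n, 12.5, 3.3)), 3), 20)     # appears to mirror husband's schooling; not used in any model below
exper <- pmin(pmax(round(rnorm(n, 10.6, 8.1)), 0), 45)    # years of labor market experience
age <- pmin(pmax(round(rnorm(n, 42.5, 8.1)), 20), 60)
kidslt6 <- sample(0:3, n, replace = TRUE, prob = c(0.63, 0.24, 0.10, 0.03))          # children under 6
kidsge6 <- sample(0:4, n, replace = TRUE, prob = c(0.35, 0.30, 0.22, 0.10, 0.03))    # children 6 and older
husinc <- pmin(rlnorm(n, 2.8, 0.8), 100)                  # husband's income, capped at 100

# Latent logit index with the "true" coefficients. The extra rlogis() noise is
# added on top of the Bernoulli draw below, so estimates are slightly attenuated.
z <- 0.4 + 0.13 * educ - 0.02 * husinc + 0.04 * exper -
   0.016 * age - 0.87 * kidslt6 - 0.04 * kidsge6 +
   rlogis(n, 0, 0.3)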
prob <- 1 / (1 + exp(-z))
lfp <- rbinom(n, 1, prob)

df <- data.frame(lfp, educ, husc, exper, expersq = exper^2,
               age, kidslt6, kidsge6, husinc)

cat("Sample size:", nrow(df), "\n")
cat("Participation rate:", mean(df$lfp), "\n")
summary(df)

Expected output:

Sample size: 753
Participation rate: 0.581
Variable     Mean     SD      Min     Median    Max
-----------------------------------------------------
lfp          0.58     0.49    0.00     1.00      1.00
educ        12.48     2.52    6.00    12.00     20.00
exper       10.72     7.85    0.00     9.00     45.00
age         42.38     7.92   20.00    42.00     60.00
kidslt6      0.53     0.78    0.00     0.00      3.00
kidsge6      1.17     1.08    0.00     1.00      4.00
husinc      23.42    18.65    0.52    17.85    100.00

The simulated participation rate (about 58%) is close to the published rate of 428/753 = 56.8%.
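The summary table above reports mean, SD, min, median, and max, which is not the layout `summary(df)` prints (it shows quartiles). A minimal sketch that produces the table's layout is below. `describe()` is a hypothetical helper introduced here, and the four-row data frame is made up only so the block runs on its own; in the lab you would call `describe(df)`.

```r
# Descriptive statistics in the layout of the expected-output table
# (mean, SD, min, median, max); base R's summary() prints quartiles instead.
describe <- function(data) {
  stats <- sapply(data, function(x) c(
    Mean   = mean(x),
    SD     = sd(x),
    Min    = min(x),
    Median = median(x),
    Max    = max(x)
  ))
  round(t(stats), 2)  # one row per variable
}

# Toy data for illustration only; in the lab, call describe(df)
toy <- data.frame(lfp = c(1, 0, 1, 1), educ = c(12, 16, 10, 12))
describe(toy)

# Published benchmark for the participation rate
round(428 / 753, 3)
```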


Step 2: Estimate the Logit Model

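# Logit model of labor force participation
logit <- glm(lfp ~ educ + exper + expersq + age + kidslt6 + kidsge6 + husinc,
           data = df, family = binomial(link = "logit"))

summary(logit)
exp(coef(logit))   # odds ratios
# McFadden pseudo R-squared: 1 - deviance / null deviance
cat("\nPseudo R-squared:", 1 - logit$deviance / logit$null.deviance, "\n")
cat("Log-likelihood:", as.numeric(logLik(logit)), "\n")
cat("AIC:", AIC(logit), "\n")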

Expected output:

Variable      Coef.      SE       z       p      Odds Ratio
-----------------------------------------------------------
Intercept    0.4215    0.812    0.52    0.604
educ         0.1312    0.032    4.10    0.000    1.140
exper        0.0385    0.015    2.57    0.010    1.039
expersq     -0.0005    0.000   -1.28    0.201    1.000
age         -0.0165    0.008   -2.06    0.039    0.984
kidslt6     -0.8725    0.115   -7.59    0.000    0.418
kidsge6     -0.0412    0.068   -0.61    0.544    0.960
husinc      -0.0198    0.006   -3.30    0.001    0.980
Pseudo R-squared: 0.1215
Log-likelihood: -442.85
AIC: 901.70

The kidslt6 coefficient (~-0.87) implies that an additional child under 6 reduces the odds of participation by ~58% (odds ratio = exp(-0.87) = 0.42).
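Exponentiating a logit coefficient gives the multiplicative change in the odds for a one-unit increase in the regressor, and 100 × (exp(b) - 1) gives the percent change in the odds. The sketch below wraps that arithmetic, plus Wald 95% intervals from `confint.default()`, in a hypothetical helper `odds_ratio_table()`. The simulated `kids`/`y` data are an assumption so the block runs standalone; in the lab you would pass the fitted `logit`.

```r
# Odds ratios, Wald 95% intervals, and percent change in the odds from a logit fit
odds_ratio_table <- function(model) {
  b  <- coef(model)
  ci <- confint.default(model)          # Wald intervals on the log-odds scale
  round(data.frame(
    odds_ratio      = exp(b),
    lower_95        = exp(ci[, 1]),
    upper_95        = exp(ci[, 2]),
    pct_change_odds = 100 * (exp(b) - 1)
  ), 3)
}

# The arithmetic in the text: exp(-0.87) is about 0.42,
# i.e. roughly a 58% reduction in the odds per additional child under 6
exp(-0.87)
100 * (exp(-0.87) - 1)

# Toy data for illustration only; in the lab, call odds_ratio_table(logit)
set.seed(1)
kids <- rbinom(500, 2, 0.3)
y    <- rbinom(500, 1, plogis(0.5 - 0.87 * kids))
fit  <- glm(y ~ kids, family = binomial)
odds_ratio_table(fit)
```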

Concept Check

The logit coefficient on husband's income (husinc) is negative. What does this mean in terms of odds ratios?


Step 3: Marginal Effects — MEM vs. AME

Logit coefficients are in log-odds units, which are not directly interpretable as probability changes. We compute marginal effects to express results on the probability scale.
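In standard notation (Λ for the logistic CDF and x_i for woman i's covariate vector, symbols introduced here), the quantities reported for the logit are as follows. Experience needs an extra term because it enters together with its square.

```latex
\frac{\partial \Pr(\mathrm{lfp}_i = 1 \mid x_i)}{\partial x_{ij}}
  = \beta_j \, \Lambda(x_i'\beta)\,\bigl[1 - \Lambda(x_i'\beta)\bigr],
\qquad \Lambda(z) = \frac{1}{1 + e^{-z}}

\mathrm{AME}_j = \frac{1}{n}\sum_{i=1}^{n} \beta_j \, \Lambda(x_i'\beta)\,\bigl[1 - \Lambda(x_i'\beta)\bigr],
\qquad
\mathrm{MEM}_j = \beta_j \, \Lambda(\bar{x}'\beta)\,\bigl[1 - \Lambda(\bar{x}'\beta)\bigr]

\frac{\partial \Pr(\mathrm{lfp}_i = 1 \mid x_i)}{\partial\, \mathrm{exper}_i}
  = \bigl(\beta_{\mathrm{exper}} + 2\,\beta_{\mathrm{expersq}}\,\mathrm{exper}_i\bigr)\,
    \Lambda(x_i'\beta)\,\bigl[1 - \Lambda(x_i'\beta)\bigr]
```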

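library(margins)

# Average Marginal Effects (AME), the default in margins():
# the derivative is computed for every observation and then averaged
ame <- margins(logit)
summary(ame)

# Marginal Effects at the Mean (MEM): every covariate set to its sample mean
mem <- margins(logit, at = list(
  educ = mean(df$educ), exper = mean(df$exper),
  expersq = mean(df$expersq), age = mean(df$age),
  kidslt6 = mean(df$kidslt6), kidsge6 = mean(df$kidsge6),
  husinc = mean(df$husinc)
))
summary(mem)

# Note: because expersq enters as its own variable, margins() holds it fixed
# when it differentiates with respect to exper. Refit with I(exper^2) in the
# formula if you want the full marginal effect of experience.

# Compare MEM and AME for education
ame_tab <- summary(ame)
mem_tab <- summary(mem)
cat("\nComparison of MEM vs AME for education:\n")
cat("  MEM:", mem_tab$AME[mem_tab$factor == "educ"], "\n")
cat("  AME:", ame_tab$AME[ame_tab$factor == "educ"], "\n")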

Expected output:

=== Marginal Effects at the Mean (MEM) ===
Variable      MEM        AME       |MEM - AME|
------------------------------------------------
educ          0.0318     0.0312    0.001
exper         0.0093     0.0092    0.000
age          -0.0040    -0.0039    0.000
kidslt6      -0.2115    -0.2078    0.004
kidsge6      -0.0100    -0.0098    0.000
husinc       -0.0048    -0.0047    0.000
Comparison of MEM vs AME:
  Education:  MEM = 0.0318, AME = 0.0312
  Hus. Inc:   MEM = -0.0048, AME = -0.0047
  Kids < 6:   MEM = -0.2115, AME = -0.2078

The AME of kidslt6 (about -0.21) means that an additional child under 6 reduces the probability of participation by roughly 21 percentage points, the largest effect of any variable in the model.
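The same quantities can be computed by hand from the fitted index, which makes the MEM/AME distinction concrete. This is a sketch on toy data (a three-variable DGP assumed here, not the lab's); with the lab objects you would substitute `logit` and `df`. Because `margins()` uses numerical derivatives, its education results should agree closely, though not necessarily to the last digit. The `ame_exper` line also shows how to include the squared term, which a `margins()` call on the `exper + expersq` specification does not do.

```r
# Logit marginal effects by hand on toy data (assumed DGP);
# in the lab, use the fitted `logit` and `df` instead of `fit` and `toy`.
set.seed(7)
n     <- 753
educ  <- round(rnorm(n, 12.3, 2.3))
exper <- pmax(round(rnorm(n, 10.6, 8.1)), 0)
lfp   <- rbinom(n, 1, plogis(-1.5 + 0.13 * educ + 0.06 * exper - 0.001 * exper^2))
toy   <- data.frame(lfp, educ, exper, expersq = exper^2)
fit   <- glm(lfp ~ educ + exper + expersq, data = toy, family = binomial)

b    <- coef(fit)
xb   <- predict(fit, type = "link")   # x_i'b for every observation
dens <- dlogis(xb)                    # Lambda(x_i'b) * (1 - Lambda(x_i'b))

# AME: compute each woman's derivative, then average
ame_educ  <- mean(b["educ"] * dens)
# The experience derivative must include the squared term
ame_exper <- mean((b["exper"] + 2 * b["expersq"] * toy$exper) * dens)

# MEM: one derivative evaluated at the covariate means
# (expersq is set to the mean of exper^2, as in the margins() call above)
xbar     <- colMeans(model.matrix(fit))
dens_bar <- dlogis(sum(xbar * b))
mem_educ <- unname(b["educ"] * dens_bar)

round(c(AME_educ = ame_educ, MEM_educ = mem_educ, AME_exper = ame_exper), 4)
```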


Step 4: Predicted Probabilities

# Predicted probabilities
df$pred_prob <- predict(logit, type = "response")

# Scenario analysis: effect of young children, holding the other covariates
# at their sample means (expersq is set to the mean of exper^2)
new_data <- data.frame(
  educ = mean(df$educ), exper = mean(df$exper),
  expersq = mean(df$expersq), age = mean(df$age),
  kidslt6 = c(0, 1, 2), kidsge6 = mean(df$kidsge6),
  husinc = mean(df$husinc)
)

new_data$pred_prob <- predict(logit, newdata = new_data, type = "response")
cat("Predicted probability at mean covariates:\n")
cat("  No young children:", new_data$pred_prob[1], "\n")
cat("  One child < 6:    ", new_data$pred_prob[2], "\n")
cat("  Two children < 6: ", new_data$pred_prob[3], "\n")

Expected output:

Predicted probability at mean covariates:
  No young children:  0.685
  One child < 6:      0.472
  Two children < 6:   0.265

Going from zero to two young children lowers the predicted probability from about 69% to about 27%, a 42 percentage point decline, which shows how strongly young children depress participation.
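The scenario above holds the other covariates at their means. An alternative that parallels the AME logic is to set every woman's `kidslt6` to 0, 1, or 2, predict, and average over the sample. The sketch below computes both on toy data (an assumed two-covariate DGP, not the lab's); the difference between consecutive averaged predictions is the average discrete effect of one more young child, which the derivative-based AME only approximates.

```r
# Predicted participation at kidslt6 = 0, 1, 2, computed two ways.
# Toy data (assumed DGP); in the lab, use `logit` and `df` with all covariates.
set.seed(11)
n       <- 753
educ    <- round(rnorm(n, 12.3, 2.3))
kidslt6 <- sample(0:3, n, replace = TRUE, prob = c(0.63, 0.24, 0.10, 0.03))
lfp     <- rbinom(n, 1, plogis(-1 + 0.13 * educ - 0.87 * kidslt6))
toy     <- data.frame(lfp, educ, kidslt6)
fit     <- glm(lfp ~ educ + kidslt6, data = toy, family = binomial)

kids_values <- 0:2

# (a) Other covariates held at their means (the scenario approach used above)
at_mean <- unname(sapply(kids_values, function(k)
  predict(fit, newdata = data.frame(educ = mean(toy$educ), kidslt6 = k),
          type = "response")))

# (b) Set every woman's kidslt6 to k, predict, then average over the sample
averaged <- unname(sapply(kids_values, function(k)
  mean(predict(fit, newdata = transform(toy, kidslt6 = k), type = "response"))))

data.frame(kidslt6 = kids_values,
           at_mean = round(at_mean, 3),
           averaged = round(averaged, 3))

# Average discrete change from having one child under 6 instead of none
averaged[2] - averaged[1]
```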


Step 5: Probit Comparison

# Probit model
probit <- glm(lfp ~ educ + exper + expersq + age + kidslt6 + kidsge6 + husinc,
            data = df, family = binomial(link = "probit"))

# Compare models side by side
modelsummary(list("Logit" = logit, "Probit" = probit),
           stars = c('*' = 0.1, '**' = 0.05, '***' = 0.01))

# Compare AME
ame_logit <- summary(margins(logit))
ame_probit <- summary(margins(probit))
cat("\nAME for education:\n")
cat("  Logit:", ame_logit$AME[ame_logit$factor == "educ"], "\n")
cat("  Probit:", ame_probit$AME[ame_probit$factor == "educ"], "\n")

Expected output:

Coefficient comparison (logit vs probit vs scaled probit):
Variable      Logit    Probit   Probit*1.6
------------------------------------------
Intercept    0.4215    0.2585     0.4136
educ         0.1312    0.0808     0.1293
exper        0.0385    0.0238     0.0381
expersq     -0.0005   -0.0003    -0.0005
age         -0.0165   -0.0102    -0.0163
kidslt6     -0.8725   -0.5382    -0.8611
kidsge6     -0.0412   -0.0254    -0.0406
husinc      -0.0198   -0.0122    -0.0195

AME comparison for education:
  Logit AME:  0.0312
  Probit AME: 0.0310
Variable      Logit AME   Probit AME   |Difference|
----------------------------------------------------
educ           0.0312      0.0310       0.0002
kidslt6       -0.2078     -0.2065       0.0013
husinc        -0.0047     -0.0047       0.0000

The AMEs from logit and probit are nearly identical (differences below 0.002), so for average effects in this sample the choice between the two links matters little in practice.
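The Probit*1.6 column rests on the fact that the standard normal density at zero is about 1.6 times the logistic density at zero, so logit slopes are roughly 1.6 times probit slopes near the middle of the distribution. The sketch below (toy data from an assumed DGP) builds that comparison table and computes the education AME under each link by hand, using `dlogis()` for the logit and `dnorm()` for the probit.

```r
# Logit vs probit on toy data (assumed DGP); in the lab use `logit` and `probit`
set.seed(3)
n       <- 753
educ    <- round(rnorm(n, 12.3, 2.3))
kidslt6 <- sample(0:3, n, replace = TRUE, prob = c(0.63, 0.24, 0.10, 0.03))
lfp     <- rbinom(n, 1, plogis(-1 + 0.13 * educ - 0.87 * kidslt6))
toy     <- data.frame(lfp, educ, kidslt6)

fit_logit  <- glm(lfp ~ educ + kidslt6, data = toy, family = binomial(link = "logit"))
fit_probit <- glm(lfp ~ educ + kidslt6, data = toy, family = binomial(link = "probit"))

# Source of the 1.6 rule of thumb: ratio of the two densities at zero
dnorm(0) / dlogis(0)

comparison <- cbind(
  Logit        = coef(fit_logit),
  Probit       = coef(fit_probit),
  `Probit*1.6` = 1.6 * coef(fit_probit),
  Ratio        = coef(fit_logit) / coef(fit_probit)
)
round(comparison, 4)

# Education AME under each link: coefficient times the average density at the index
ame_educ_logit  <- unname(coef(fit_logit)["educ"] * mean(dlogis(predict(fit_logit, type = "link"))))
ame_educ_probit <- unname(coef(fit_probit)["educ"] * mean(dnorm(predict(fit_probit, type = "link"))))
c(logit = ame_educ_logit, probit = ame_educ_probit)
```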

Concept Check

When comparing logit and probit average marginal effects (AME), you find they are nearly identical. Why do researchers still debate which model to use?


Step 6: Goodness of Fit

library(ResourceSelection)

# Percent correctly predicted
pred_class <- ifelse(df$pred_prob >= 0.5, 1, 0)
pcp <- mean(pred_class == df$lfp)
cat("Percent correctly predicted:", round(pcp * 100, 1), "%\n")

# Confusion matrix
table(Actual = df$lfp, Predicted = pred_class)

# Hosmer-Lemeshow test
hl <- hoslem.test(df$lfp, df$pred_prob, g = 10)
print(hl)
cat("(Large p-value = no evidence of poor fit)\n")

Expected output:

Percent correctly predicted: 72.5%

Confusion Matrix:
                Predicted 0   Predicted 1
  Actual 0          198           118
  Actual 1           89           348

Hosmer-Lemeshow statistic: 8.452
p-value: 0.391
(Large p-value = no evidence of poor fit)
Metric                          Value
---------------------------------------------------------
Percent correctly predicted     ~72.5%
Pseudo R-squared                ~0.122
ROC AUC                         ~0.765
Hosmer-Lemeshow p-value         ~0.39 (no evidence of poor fit)

                Pred 0      Pred 1
Actual 0        198 (TN)    118 (FP)
Actual 1         89 (FN)    348 (TP)

The model correctly classifies about 72.5% of observations, compared with about 58% (437/753) for a naive rule that predicts participation for every woman. The Hosmer-Lemeshow test (p = 0.39) finds no evidence of poor fit.
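The ROC AUC in the metrics table is not computed by the Step 6 code. One base-R route is the rank identity: AUC equals the probability that a randomly chosen participant receives a higher predicted probability than a randomly chosen non-participant. The sketch below implements that identity as a hypothetical `auc_rank()` helper, alongside percent correctly predicted (`pct_correct()`, also hypothetical) and the naive majority-class benchmark, on toy data from an assumed DGP. Packages such as pROC also compute AUC; this version simply avoids the dependency.

```r
# AUC via the rank (Mann-Whitney) identity; ties receive average ranks
auc_rank <- function(y, p) {
  r  <- rank(p)
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Percent correctly predicted at a given probability cutoff
pct_correct <- function(y, p, cutoff = 0.5) mean((p >= cutoff) == (y == 1))

# Toy data (assumed DGP); in the lab, pass df$lfp and df$pred_prob
set.seed(21)
n    <- 753
educ <- round(rnorm(n, 12.3, 2.3))
lfp  <- rbinom(n, 1, plogis(-1.3 + 0.13 * educ))
fit  <- glm(lfp ~ educ, family = binomial)
p    <- fitted(fit)

c(AUC       = auc_rank(lfp, p),
  PCP       = pct_correct(lfp, p),
  naive_PCP = max(mean(lfp), 1 - mean(lfp)),   # always predict the majority class
  McFadden  = 1 - fit$deviance / fit$null.deviance)
```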


Step 7: Compare with Published Results

The key qualitative findings from Mroz (1987) that our replication should reproduce:

Variable                     Expected Sign   Mroz Finding            Our Estimate
----------------------------------------------------------------------------------
Education (educ)             +               Positive, significant   Check your output
Husband's income (husinc)    -               Negative, significant   Check your output
Experience (exper)           +               Positive, concave       Check your output
Young children (kidslt6)     -               Strongly negative       Check your output
Age                          -               Negative                Check your output

The most robust finding across specifications is the strong negative effect of young children on participation, with marginal effects typically in the range of -0.15 to -0.30 (an additional child under 6 reduces participation probability by 15-30 percentage points).
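One way to fill in the "Our Estimate" column is to tabulate each coefficient's sign against the expected sign. `check_signs()` below is a hypothetical helper; the `expersq = -1` entry in the commented lab call is an addition that checks the "concave" part of the experience finding. Toy data from an assumed DGP make the block runnable on its own.

```r
# Compare estimated coefficient signs with expected signs (+1 / -1)
check_signs <- function(model, expected) {
  est  <- summary(model)$coefficients
  vars <- names(expected)
  data.frame(
    variable = vars,
    expected = ifelse(expected > 0, "+", "-"),
    estimate = round(est[vars, "Estimate"], 4),
    p_value  = round(est[vars, "Pr(>|z|)"], 3),
    sign_ok  = sign(est[vars, "Estimate"]) == expected,
    row.names = NULL
  )
}

# In the lab:
# check_signs(logit, c(educ = 1, husinc = -1, exper = 1, expersq = -1,
#                      kidslt6 = -1, age = -1))

# Toy data (assumed DGP) so the sketch runs on its own
set.seed(31)
n       <- 753
educ    <- round(rnorm(n, 12.3, 2.3))
kidslt6 <- sample(0:3, n, replace = TRUE, prob = c(0.63, 0.24, 0.10, 0.03))
lfp     <- rbinom(n, 1, plogis(-1 + 0.13 * educ - 0.87 * kidslt6))
fit     <- glm(lfp ~ educ + kidslt6, family = binomial)
res     <- check_signs(fit, c(educ = 1, kidslt6 = -1))
res
```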


Extension Exercises

  1. Interaction effects. Add an interaction between education and husband's income. Does the education effect differ for women with high- vs. low-income husbands? Compute and plot the marginal effect of education at different levels of husband's income.

  2. Nonlinear age effects. Replace the linear age term with age and age-squared. Does allowing for a nonlinear age profile improve the fit? At what age is participation probability maximized?

  3. LPM comparison. Estimate a Linear Probability Model (OLS on the binary outcome) and compare its marginal effects with the logit AME. When do the two approaches diverge the most?

  4. ROC curve. Plot the Receiver Operating Characteristic curve and compute the Area Under the Curve (AUC). How does the AUC change as you add or remove covariates?

  5. Selection correction. Mroz (1987) is primarily concerned with hours of work and with how those estimates change under selection corrections such as the Heckman model. Extend this lab by estimating a Heckman two-step model where participation is the selection equation and hours (or wages) is the outcome equation.
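A possible starting point for Exercise 3: fit the LPM with `lm()` and set its slopes beside logit AMEs computed by hand. Toy data from an assumed DGP keep the sketch self-contained; in the lab you would use `df` and the full specification. For inference on the LPM you would normally use heteroskedasticity-robust standard errors (for example `sandwich::vcovHC()`), which this sketch does not compute.

```r
# Exercise 3 starter on toy data (assumed DGP)
set.seed(5)
n       <- 753
educ    <- round(rnorm(n, 12.3, 2.3))
kidslt6 <- sample(0:3, n, replace = TRUE, prob = c(0.63, 0.24, 0.10, 0.03))
lfp     <- rbinom(n, 1, plogis(-1 + 0.13 * educ - 0.87 * kidslt6))
toy     <- data.frame(lfp, educ, kidslt6)

lpm       <- lm(lfp ~ educ + kidslt6, data = toy)
fit_logit <- glm(lfp ~ educ + kidslt6, data = toy, family = binomial)

# Logit AMEs: each slope times the average logistic density at the fitted index
dens    <- dlogis(predict(fit_logit, type = "link"))
ame_toy <- coef(fit_logit)[c("educ", "kidslt6")] * mean(dens)

cbind(LPM = coef(lpm)[c("educ", "kidslt6")], Logit_AME = ame_toy)

# Share of LPM fitted values outside [0, 1], one place where the approaches part ways
mean(fitted(lpm) < 0 | fitted(lpm) > 1)
```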


Summary

In this replication lab you learned:

  • Logit coefficients are on the log-odds scale; convert to odds ratios with exponentiation or to probability changes with marginal effects
  • Average marginal effects (AME) are preferred over marginal effects at the mean (MEM) in applied work
  • Logit and probit give nearly identical marginal effects; the choice is largely conventional
  • Young children have the largest effect on married women's labor force participation
  • Goodness-of-fit assessment includes percent correctly predicted, pseudo R-squared, and the Hosmer-Lemeshow test
  • Our simulated results reproduce the qualitative patterns from Mroz (1987): education and experience increase participation, while husband's income and young children decrease it