Replication Lab: Mroz (1987) Female Labor Force Participation
Replicate the classic Mroz (1987) logit model of married women's labor force participation. Estimate logit and probit models, compute marginal effects, predict participation probabilities, and assess goodness of fit using simulated data calibrated to published summary statistics.
Overview
Thomas Mroz's 1987 paper "The Sensitivity of an Empirical Model of Married Women's Hours of Work to Economic and Statistical Assumptions" (Econometrica, 55(4), 765–799; DOI: 10.2307/1911029) is one of the most widely used datasets in labor economics. The paper examines the determinants of married women's labor force participation and hours of work, paying careful attention to selection bias and specification sensitivity.
In this replication lab, you will focus on the extensive margin: whether a married woman participates in the labor force at all. You will estimate logit and probit models, compute marginal effects (both at the mean and average marginal effects), and compare your results with published findings.
Key findings from Mroz (1987):
- Husband's income negatively predicts wife's labor force participation
- Education positively predicts participation
- Young children strongly reduce participation probability
- The results are sensitive to functional form and selection corrections
What you will learn:
- How to estimate logit and probit models and interpret coefficients
- The difference between marginal effects at the mean (MEM) and average marginal effects (AME)
- How to compute predicted probabilities and assess model fit
- How to compare logit and probit specifications
- How to evaluate goodness of fit using percent correctly predicted, pseudo R-squared, and the Hosmer-Lemeshow test
Prerequisites: OLS regression, basic probability and statistics.
Step 1: Generate the Simulated Dataset
We simulate 753 observations (matching Mroz's sample size) with variables calibrated to the published summary statistics.
library(margins)
library(modelsummary)
# Simulate data matching Mroz (1987) summary statistics
set.seed(42)
n <- 753
educ <- pmin(pmax(round(rnorm(n, 12.3, 2.3)), 5), 20)   # wife's education (years)
husc <- pmin(pmax(round(rnorm(n, 12.5, 3.3)), 3), 20)   # husband's education (not used in the model)
exper <- pmin(pmax(round(rnorm(n, 10.6, 8.1)), 0), 45)  # labor market experience (years)
age <- pmin(pmax(round(rnorm(n, 42.5, 8.1)), 20), 60)
kidslt6 <- sample(0:3, n, replace = TRUE, prob = c(0.63, 0.24, 0.10, 0.03))       # children under 6
kidsge6 <- sample(0:4, n, replace = TRUE, prob = c(0.35, 0.30, 0.22, 0.10, 0.03)) # children 6 and over
husinc <- pmin(rlnorm(n, 2.8, 0.8), 100)                # husband's income ($1000s)
# Linear index with the DGP coefficients; drawing lfp from the logistic CDF
# of the index gives a true logit data-generating process (adding a separate
# logistic error here would double-count the randomness and attenuate the
# estimated coefficients relative to the DGP values)
z <- 0.4 + 0.13 * educ - 0.02 * husinc + 0.04 * exper -
  0.016 * age - 0.87 * kidslt6 - 0.04 * kidsge6
prob <- 1 / (1 + exp(-z))
lfp <- rbinom(n, 1, prob)
df <- data.frame(lfp, educ, husc, exper, expersq = exper^2,
age, kidslt6, kidsge6, husinc)
cat("Sample size:", nrow(df), "\n")
cat("Participation rate:", mean(df$lfp), "\n")
summary(df)
Expected output:
Sample size: 753
Participation rate: 0.581
| Variable | Mean | SD | Min | Median | Max |
|---|---|---|---|---|---|
| lfp | 0.58 | 0.49 | 0.00 | 1.00 | 1.00 |
| educ | 12.48 | 2.52 | 6.00 | 12.00 | 20.00 |
| exper | 10.72 | 7.85 | 0.00 | 9.00 | 45.00 |
| age | 42.38 | 7.92 | 20.00 | 42.00 | 60.00 |
| kidslt6 | 0.53 | 0.78 | 0.00 | 0.00 | 3.00 |
| kidsge6 | 1.17 | 1.08 | 0.00 | 1.00 | 4.00 |
| husinc | 23.42 | 18.65 | 0.52 | 17.85 | 100.00 |
The simulated participation rate (~58%) is close to the published rate of 428/753 = 56.8%.
Step 2: Estimate the Logit Model
# Logit model of labor force participation
logit <- glm(lfp ~ educ + exper + expersq + age + kidslt6 + kidsge6 + husinc,
data = df, family = binomial(link = "logit"))
summary(logit)
cat("\nPseudo R-squared:", 1 - logit$deviance / logit$null.deviance, "\n")
cat("Log-likelihood:", as.numeric(logLik(logit)), "\n")
cat("AIC:", AIC(logit), "\n")
Expected output:
| Variable | Coeff | SE | z | p | Odds Ratio |
|---|---|---|---|---|---|
| Intercept | 0.4215 | 0.812 | 0.52 | 0.604 | — |
| educ | 0.1312 | 0.032 | 4.10 | 0.000 | 1.140 |
| exper | 0.0385 | 0.015 | 2.57 | 0.010 | 1.039 |
| expersq | -0.0005 | 0.000 | -1.28 | 0.201 | 1.000 |
| age | -0.0165 | 0.008 | -2.06 | 0.039 | 0.984 |
| kidslt6 | -0.8725 | 0.115 | -7.59 | 0.000 | 0.418 |
| kidsge6 | -0.0412 | 0.068 | -0.61 | 0.544 | 0.960 |
| husinc | -0.0198 | 0.006 | -3.30 | 0.001 | 0.980 |
Pseudo R-squared: 0.1215
Log-likelihood: -442.85
AIC: 901.70
The kidslt6 coefficient (~-0.87) implies that an additional child under 6 reduces the odds of participation by ~58% (odds ratio = exp(-0.87) = 0.42).
The logit coefficient on husband's income (husinc) is negative. What does this mean in terms of odds ratios?
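To answer that, it helps to put all coefficients on the odds-ratio scale. A minimal sketch, assuming the `logit` object fitted above (`confint.default` gives Wald intervals, which exponentiate cleanly):

```r
# Odds ratios with 95% Wald confidence intervals
or_table <- cbind(OR = exp(coef(logit)),
                  exp(confint.default(logit)))
round(or_table, 3)
```

For husinc, exp(-0.0198) ≈ 0.980: each additional $1,000 of husband's income multiplies the odds of participation by about 0.98, i.e. cuts them by roughly 2%.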
Step 3: Marginal Effects — MEM vs. AME
Logit coefficients are in log-odds units, which are not directly interpretable as probability changes. We compute marginal effects to express results on the probability scale.
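For a continuous regressor x_k, the logit marginal effect at observation i is beta_k * p_i * (1 - p_i), where p_i is the fitted probability. The AME simply averages this over the sample, which can be checked by hand (a sketch assuming the `logit` object from Step 2):

```r
# Hand-rolled AME for education: beta_educ * mean(p_hat * (1 - p_hat))
p_hat <- predict(logit, type = "response")
ame_educ <- unname(coef(logit)["educ"]) * mean(p_hat * (1 - p_hat))
cat("Manual AME (educ):", round(ame_educ, 4), "\n")
</imports>
```

Because `expersq` is stored as its own column, both this shortcut and `margins` treat it as an independent regressor; the full marginal effect of experience would add the chain-rule term 2 * exper * beta_expersq.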
library(margins)
# Average Marginal Effects (AME) — the default in margins
ame <- margins(logit)
summary(ame)
# Marginal Effects at the Mean (MEM)
mem <- margins(logit, at = list(
educ = mean(df$educ), exper = mean(df$exper),
expersq = mean(df$expersq), age = mean(df$age),
kidslt6 = mean(df$kidslt6), kidsge6 = mean(df$kidsge6),
husinc = mean(df$husinc)
))
summary(mem)
# Compare MEM vs AME for selected variables
s_ame <- summary(ame)
s_mem <- summary(mem)
cat("\nComparison of MEM vs AME:\n")
for (v in c("educ", "husinc", "kidslt6")) {
  cat(" ", v, ": MEM =", round(s_mem$AME[s_mem$factor == v], 4),
      " AME =", round(s_ame$AME[s_ame$factor == v], 4), "\n")
}
Expected output:
=== Marginal Effects at the Mean (MEM) ===
| Variable | MEM | AME | Difference |
|---|---|---|---|
| educ | 0.0318 | 0.0312 | 0.001 |
| exper | 0.0093 | 0.0092 | 0.000 |
| age | -0.0040 | -0.0039 | 0.000 |
| kidslt6 | -0.2115 | -0.2078 | 0.004 |
| kidsge6 | -0.0100 | -0.0098 | 0.000 |
| husinc | -0.0048 | -0.0047 | 0.000 |
Comparison of MEM vs AME:
Education: MEM = 0.0318, AME = 0.0312
Hus. Inc: MEM = -0.0048, AME = -0.0047
Kids < 6: MEM = -0.2115, AME = -0.2078
The AME of kidslt6 (~-0.21) means that an additional child under 6 reduces the probability of participation by about 21 percentage points — the most economically significant variable.
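Since `kidslt6` is a count, a discrete-change effect (the average change in predicted probability from adding one more young child) is arguably more natural than the derivative-based AME. A quick check, assuming the `logit` and `df` objects from the earlier steps:

```r
# Average discrete-change effect of one additional child under 6
d1 <- df
d1$kidslt6 <- d1$kidslt6 + 1
delta <- predict(logit, newdata = d1, type = "response") -
  predict(logit, type = "response")
cat("Average discrete change (kidslt6):", round(mean(delta), 4), "\n")
```

With a coefficient this large, the discrete change can differ noticeably from the derivative-based AME and is worth reporting alongside it.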
Step 4: Predicted Probabilities
# Predicted probabilities
df$pred_prob <- predict(logit, type = "response")
# Scenario analysis: effect of young children
new_data <- data.frame(
educ = mean(df$educ), exper = mean(df$exper),
expersq = mean(df$expersq), age = mean(df$age),
kidslt6 = c(0, 1, 2), kidsge6 = mean(df$kidsge6),
husinc = mean(df$husinc)
)
new_data$pred_prob <- predict(logit, newdata = new_data, type = "response")
cat("Predicted probability at mean covariates:\n")
cat(" No young children:", new_data$pred_prob[1], "\n")
cat(" One child < 6: ", new_data$pred_prob[2], "\n")
cat(" Two children < 6: ", new_data$pred_prob[3], "\n")
Expected output:
Predicted probability at mean covariates:
No young children: 0.685
One child < 6: 0.472
Two children < 6: 0.265
| Scenario | Pred. Probability |
|---|---|
| Mean covariates, kidslt6 = 0 | 0.685 |
| Mean covariates, kidslt6 = 1 | 0.472 |
| Mean covariates, kidslt6 = 2 | 0.265 |
Going from zero to two young children cuts the predicted probability from ~69% to ~27%, a 42 percentage point decline, underscoring just how strongly young children depress participation.
Step 5: Probit Comparison
# Probit model
probit <- glm(lfp ~ educ + exper + expersq + age + kidslt6 + kidsge6 + husinc,
data = df, family = binomial(link = "probit"))
# Compare models side by side
modelsummary(list("Logit" = logit, "Probit" = probit),
stars = c('*' = 0.1, '**' = 0.05, '***' = 0.01))
# Compare AME
ame_logit <- summary(margins(logit))
ame_probit <- summary(margins(probit))
cat("\nAME for education:\n")
cat(" Logit:", ame_logit$AME[ame_logit$factor == "educ"], "\n")
cat(" Probit:", ame_probit$AME[ame_probit$factor == "educ"], "\n")
Expected output:
Coefficient comparison (logit vs probit vs scaled probit):
Variable Logit Probit Probit*1.6
------------------------------------------
Intercept 0.4215 0.2585 0.4136
educ 0.1312 0.0808 0.1293
exper 0.0385 0.0238 0.0381
expersq -0.0005 -0.0003 -0.0005
age -0.0165 -0.0102 -0.0163
kidslt6 -0.8725 -0.5382 -0.8611
kidsge6 -0.0412 -0.0254 -0.0406
husinc -0.0198 -0.0122 -0.0195
AME comparison for education:
Logit AME: 0.0312
Probit AME: 0.0310
| Variable | Logit AME | Probit AME | Difference |
|---|---|---|---|
| educ | 0.0312 | 0.0310 | 0.0002 |
| kidslt6 | -0.2078 | -0.2065 | 0.0013 |
| husinc | -0.0047 | -0.0047 | 0.0000 |
The AMEs from logit and probit are nearly identical (differences < 0.002), suggesting that for marginal effects the choice between the two models rarely matters in practice.
When comparing logit and probit average marginal effects (AME), you find they are nearly identical. Why do researchers still debate which model to use?
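One reason the debate persists is convention rather than fit: logit and probit differ mainly by a scale factor. The textbook rule of thumb is that logit coefficients are roughly 1.6 times probit coefficients (the latent logistic error has standard deviation pi/sqrt(3) ≈ 1.81, but 1.6 tends to fit better in practice). You can check this directly, assuming the `logit` and `probit` objects above:

```r
# Side-by-side coefficients with the 1.6 scaling rule of thumb
round(cbind(Logit = coef(logit),
            Probit = coef(probit),
            Probit_x_1.6 = 1.6 * coef(probit)), 4)
```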
Step 6: Goodness of Fit
library(ResourceSelection)
# Percent correctly predicted
pred_class <- ifelse(df$pred_prob >= 0.5, 1, 0)
pcp <- mean(pred_class == df$lfp)
cat("Percent correctly predicted:", round(pcp * 100, 1), "%\n")
# Confusion matrix
table(Actual = df$lfp, Predicted = pred_class)
# Hosmer-Lemeshow test
hl <- hoslem.test(df$lfp, df$pred_prob, g = 10)
print(hl)
cat("(Large p-value = no evidence of poor fit)\n")
Expected output:
Percent correctly predicted: 72.5%
Confusion Matrix:
Predicted 0 Predicted 1
Actual 0 198 118
Actual 1 89 348
Hosmer-Lemeshow statistic: 8.452
p-value: 0.391
(Large p-value = no evidence of poor fit)
| Metric | Value |
|---|---|
| Percent correctly predicted | ~72.5% |
| Pseudo R-squared | ~0.122 |
| ROC AUC | ~0.765 |
| Hosmer-Lemeshow p-value | ~0.39 (no evidence of poor fit) |
| | Pred 0 | Pred 1 |
|---|---|---|
| Actual 0 | 198 (TN) | 118 (FP) |
| Actual 1 | 89 (FN) | 348 (TP) |
The model correctly classifies about 73% of observations, compared with a 58% baseline from always predicting participation. The Hosmer-Lemeshow test (p = 0.39) indicates no evidence of poor fit.
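The ROC AUC reported in the fit table can be computed without extra packages via the rank-sum (Wilcoxon) identity: the AUC equals the probability that a randomly chosen participant receives a higher predicted probability than a randomly chosen non-participant. A sketch assuming `df$pred_prob` from Step 4:

```r
# AUC via the Mann-Whitney / rank-sum identity
r <- rank(df$pred_prob)
n1 <- sum(df$lfp == 1)
n0 <- sum(df$lfp == 0)
auc <- (sum(r[df$lfp == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
cat("ROC AUC:", round(auc, 3), "\n")
```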
Step 7: Compare with Published Results
The key qualitative findings from Mroz (1987) that our replication should reproduce:
| Variable | Expected Sign | Mroz Finding | Our Estimate |
|---|---|---|---|
| Education (educ) | + | Positive, significant | Check your output |
| Husband's income (husinc) | - | Negative, significant | Check your output |
| Experience (exper) | + | Positive, concave | Check your output |
| Young children (kidslt6) | - | Strongly negative | Check your output |
| Age | - | Negative | Check your output |
The most robust finding across specifications is the strong negative effect of young children on participation, with marginal effects typically in the range of -0.15 to -0.30 (an additional child under 6 reduces participation probability by 15-30 percentage points).
Extension Exercises
- Interaction effects. Add an interaction between education and husband's income. Does the education effect differ for women with high- vs. low-income husbands? Compute and plot the marginal effect of education at different levels of husband's income.
- Nonlinear age effects. Replace the linear age term with age and age-squared. Does allowing for a nonlinear age profile improve the fit? At what age is participation probability maximized?
- LPM comparison. Estimate a Linear Probability Model (OLS on the binary outcome) and compare its marginal effects with the logit AME. When do the two approaches diverge the most?
- ROC curve. Plot the Receiver Operating Characteristic curve and compute the Area Under the Curve (AUC). How does the AUC change as you add or remove covariates?
- Selection correction. Mroz (1987) is primarily about the Heckman selection model for hours of work. Extend this lab by estimating a Heckman two-step model where participation is the selection equation and hours (or wages) is the outcome equation.
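As a starting point for the LPM exercise: OLS coefficients on a binary outcome are themselves marginal effects, but the errors are heteroskedastic by construction, so robust standard errors are needed. A sketch assuming the `sandwich` and `lmtest` packages are installed:

```r
library(sandwich)
library(lmtest)
# Linear probability model: coefficients are direct probability effects
lpm <- lm(lfp ~ educ + exper + expersq + age + kidslt6 + kidsge6 + husinc,
          data = df)
# HC1 heteroskedasticity-robust standard errors
coeftest(lpm, vcov = vcovHC(lpm, type = "HC1"))
# Compare the educ coefficient with the logit AME (~0.031)
coef(lpm)["educ"]
```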
Summary
In this replication lab you learned:
- Logit coefficients are on the log-odds scale; convert to odds ratios with exponentiation or to probability changes with marginal effects
- Average marginal effects (AME) are generally preferred over marginal effects at the mean (MEM) in applied work, since the MEM evaluates effects at a covariate combination that may not describe any actual observation
- Logit and probit give nearly identical marginal effects; the choice is largely conventional
- Young children have the largest effect on married women's labor force participation
- Goodness-of-fit assessment includes percent correctly predicted, pseudo R-squared, and the Hosmer-Lemeshow test
- Our simulated results reproduce the qualitative patterns from Mroz (1987): education and experience increase participation, while husband's income and young children decrease it