MethodAtlas
Replication · 120 minutes

Replication Lab: Cornwell & Rupert (1988) Wage Equation with Random Effects

Replicate the Cornwell & Rupert (1988) comparison of random effects and fixed effects wage equations. Estimate RE and FE models, conduct the Hausman test, implement the Mundlak approach, and run the Breusch-Pagan LM test using simulated panel data.

Overview

Cornwell and Rupert's 1988 paper "Efficient Estimation with Panel Data: An Empirical Comparison of Instrumental Variables Estimators" (Journal of Applied Econometrics, 3(2), 149–155; DOI: 10.1002/jae.3950030206) uses a balanced panel of 595 individuals observed over 7 years (1976-1982) to estimate wage equations. The dataset has become a classic teaching example for comparing random effects (RE) and fixed effects (FE) estimators, particularly through the Hausman specification test.

Key findings:

  • Education, experience, and union membership affect wages
  • The Hausman test typically rejects the RE assumption that individual effects are uncorrelated with regressors
  • Time-invariant variables (education, race, gender) are identified under RE but not FE
  • The Mundlak approach provides a useful compromise between RE and FE

What you will learn:

  • How to estimate random effects (GLS) and fixed effects (within) estimators
  • How to conduct and interpret the Hausman test
  • How to implement the Mundlak (correlated random effects) approach
  • How to run the Breusch-Pagan LM test for individual effects
  • When to use RE vs. FE in practice

Prerequisites: OLS regression, basic panel data concepts.


Step 1: Generate the Simulated Panel Dataset

library(plm)
library(lmtest)
library(modelsummary)

# Simulate panel data matching Cornwell & Rupert (1988)
set.seed(42)
n_ind <- 595
n_years <- 7
n_obs <- n_ind * n_years

# Time-invariant characteristics
educ <- pmin(pmax(round(rnorm(n_ind, 12.5, 2.5)), 6), 20)
female <- rbinom(n_ind, 1, 0.47)
black <- rbinom(n_ind, 1, 0.12)
ability <- rnorm(n_ind)

# Correlation between ability and education (violates RE)
educ <- pmin(pmax(round(educ + 1.5 * ability), 6), 20)

# Expand to panel
ids <- rep(1:n_ind, each = n_years)
years <- rep(1976:1982, n_ind)
educ_p <- rep(educ, each = n_years)
female_p <- rep(female, each = n_years)
black_p <- rep(black, each = n_years)
ability_p <- rep(ability, each = n_years)

exper <- rep(runif(n_ind, 1, 30), each = n_years) + rep(0:6, n_ind)
union <- rbinom(n_obs, 1, 0.3)
hours <- pmin(pmax(rnorm(n_obs, 2000, 400), 500), 3500)

alpha_i <- 0.3 * ability_p + rep(rnorm(n_ind, 0, 0.2), each = n_years)
year_eff <- rep(c(0, 0.02, 0.04, 0.03, 0.01, -0.01, 0.02), n_ind)
epsilon <- rnorm(n_obs, 0, 0.25)

log_wage <- 1.0 + 0.07 * educ_p + 0.03 * exper - 0.0004 * exper^2 -
          0.15 * female_p - 0.05 * black_p + 0.12 * union +
          alpha_i + year_eff + epsilon

df <- pdata.frame(data.frame(id = ids, year = years, log_wage = log_wage,
                            educ = educ_p, exper = exper, expersq = exper^2,
                            female = female_p, black = black_p,
                            union = union, hours = hours),
                index = c("id", "year"))

cat("Panel dimensions:", pdim(df)$nT$n, "individuals x", pdim(df)$nT$T,
    "years =", pdim(df)$nT$N, "obs\n")
summary(df[, c("log_wage", "educ", "exper", "female", "black", "union")])

Expected output:

Panel dimensions: 595 individuals x 7 years = 4165 obs

Summary statistics:
Variable       mean     std     min     25%     50%     75%     max
-------------------------------------------------------------------
log_wage      2.245   0.535   0.125   1.894   2.235   2.582   4.120
educ          12.68    2.91    6.00   11.00   13.00   14.00   20.00
exper         18.52    9.45    1.00   11.00   18.00   26.00   51.00
female         0.47    0.50    0.00    0.00    0.00    1.00    1.00
black          0.12    0.33    0.00    0.00    0.00    0.00    1.00
union          0.30    0.46    0.00    0.00    0.00    1.00    1.00

The sample of 4,165 observations (595 x 7) matches the Cornwell and Rupert (1988) panel structure.
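Steps 4 and 5 hinge on which regressors are correlated with the individual effect, so it is worth checking that structure directly. The sketch below is an illustrative check and is not part of the original lab. It redraws the person-level part of the Step 1 design on a larger cross-section and correlates each regressor with ability, using base R only. The size n_check = 20000 is an arbitrary choice made to shrink sampling noise.

```r
# Redraw the person-level part of the Step 1 design on a large cross-section,
# using base R only (no plm needed)
set.seed(123)
n_check <- 20000                       # hypothetical size, chosen only to make sampling noise small
educ <- pmin(pmax(round(rnorm(n_check, 12.5, 2.5)), 6), 20)
ability <- rnorm(n_check)
educ <- pmin(pmax(round(educ + 1.5 * ability), 6), 20)   # same rule as Step 1
exper_start <- runif(n_check, 1, 30)   # starting experience, drawn independently of ability
union <- rbinom(n_check, 1, 0.3)       # one year of union status, also independent of ability

corr_check <- c(
  educ  = cor(educ, ability),
  exper = cor(exper_start, ability),
  union = cor(union, ability)
)
print(round(corr_check, 3))
```

Under these rules, education should show a clear positive correlation with ability, while experience and union status should sit near zero. In this design, then, only education violates the RE assumption.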


Step 2: Estimate Random Effects (GLS)

# Random Effects (GLS) estimator
re_model <- plm(log_wage ~ educ + exper + expersq + female + black + union,
              data = df, model = "random")

summary(re_model)

cat("\nKey estimates:\n")
cat("  Education:", coef(re_model)["educ"], "\n")
cat("  Experience:", coef(re_model)["exper"], "\n")
cat("  Female:", coef(re_model)["female"], "\n")
cat("  Union:", coef(re_model)["union"], "\n")
Requires: plm

Expected output:

=== Random Effects (GLS) ===

Key estimates:
  Education:    0.0842 (SE: 0.0045)
  Experience:   0.0312
  Female:       -0.1485
  Union:        0.1215
Variable      Coeff      SE        z        p
---------------------------------------------
Intercept    0.6520   0.095     6.86    0.000
educ         0.0842   0.005    18.71    0.000
exper        0.0312   0.003    10.40    0.000
expersq     -0.0004   0.000    -5.92    0.000
female      -0.1485   0.021    -7.07    0.000
black       -0.0520   0.028    -1.86    0.063
union        0.1215   0.013     9.35    0.000

Note that the education coefficient (~0.084) is biased upward from its true value of 0.07 because ability is correlated with education, violating the RE assumption.
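Under the hood, RE-GLS is OLS on quasi-demeaned data. Each variable, the constant included, is transformed as v_it - θ·v̄_i, with θ = 1 - sqrt(σ²_ε / (σ²_ε + T·σ²_α)). plm's model = "random" estimates the variance components (Swamy-Arora by default). The base-R sketch below instead assumes the components are known, to keep the mechanics visible. It runs on simulated data, and the names and values are illustrative.

```r
# Minimal sketch: RE-GLS as OLS on quasi-demeaned data.
# Assumption: the variance components are KNOWN here; plm estimates them instead.
quasi_demean <- function(v, id, theta) v - theta * ave(v, id)

set.seed(7)
n <- 2000; n_t <- 5
sigma_a <- 1; sigma_e <- 1                       # illustrative values
id <- rep(seq_len(n), each = n_t)
x <- rnorm(n * n_t)                              # regressor uncorrelated with the effect
alpha <- rep(rnorm(n, sd = sigma_a), each = n_t)
y <- 2 + 0.5 * x + alpha + rnorm(n * n_t, sd = sigma_e)

theta <- 1 - sqrt(sigma_e^2 / (sigma_e^2 + n_t * sigma_a^2))

# Quasi-demean every column, including the constant (which becomes 1 - theta)
y_q <- quasi_demean(y, id, theta)
x_q <- quasi_demean(x, id, theta)
const_q <- quasi_demean(rep(1, n * n_t), id, theta)
re_by_hand <- lm(y_q ~ 0 + const_q + x_q)       # 0: the constant is already a column

print(theta)
print(coef(re_by_hand))
```

Setting θ = 0 gives pooled OLS and setting θ = 1 gives the within estimator. That is why RE is often described as a weighted compromise between the two.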


Step 3: Estimate Fixed Effects (Within)

# Fixed Effects (within) estimator
fe_model <- plm(log_wage ~ educ + exper + expersq + female + black + union,
              data = df, model = "within")

summary(fe_model)

# Note: educ, female, black are dropped (time-invariant)
cat("\nNote: educ, female, black are absorbed by individual FE\n")
cat("Only time-varying coefficients are estimated.\n")

# Compare
cat("\nComparison (time-varying variables):\n")
cat("  Experience RE:", coef(re_model)["exper"],
  "FE:", coef(fe_model)["exper"], "\n")
cat("  Union RE:", coef(re_model)["union"],
  "FE:", coef(fe_model)["union"], "\n")
Requires: plm

Expected output:

=== Fixed Effects (Within) ===

Note: Time-invariant variables (educ, female, black) are absorbed
by the individual fixed effects and cannot be estimated.

Comparison of coefficients:
Variable          RE          FE
----------------------------------
exper         0.0312      0.0325
expersq      -0.0004     -0.0004
union         0.1215      0.1190
educ          0.0842   (dropped)
female       -0.1485   (dropped)
black        -0.0520   (dropped)

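FE drops all time-invariant variables. The time-varying coefficients differ only slightly between RE and FE, which is what this design predicts. Ability is correlated with education, but education is drawn independently of experience and union status, so the ability bias in the RE education coefficient does not spill over onto the time-varying coefficients. The small gaps reflect sampling noise. They also reflect the year effects in the data-generating process, which neither model includes and which are correlated with the within-person variation in experience (everyone gains one year of experience per calendar year).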
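The within transformation behind model = "within" subtracts each individual's mean from every variable. A minimal base-R sketch on simulated data shows why a time-invariant regressor such as education cannot survive it: after demeaning, its column is identically zero. Variable names are illustrative.

```r
# Minimal sketch of the within (FE) transformation in base R
within_demean <- function(v, id) v - ave(v, id)

set.seed(11)
n <- 500; n_t <- 5
id <- rep(seq_len(n), each = n_t)
z <- rep(rnorm(n), each = n_t)                   # time-invariant regressor (plays the role of educ)
alpha <- rep(rnorm(n), each = n_t) + 0.8 * z     # individual effect, correlated with z
x <- rnorm(n * n_t) + 0.5 * alpha                # time-varying regressor, also correlated with alpha
y <- 1 + 0.5 * x + 0.3 * z + alpha + rnorm(n * n_t)

y_w <- within_demean(y, id)
x_w <- within_demean(x, id)
z_w <- within_demean(z, id)

print(max(abs(z_w)))                  # zero up to rounding: z has no within variation
fe_by_hand <- lm(y_w ~ 0 + x_w)       # no intercept: demeaned data have mean zero
print(coef(fe_by_hand))
# Note: lm()'s standard errors here ignore the n degrees of freedom used up by demeaning.
```

Because the demeaned z column is all zeros, no coefficient on z can be estimated. The coefficient on x stays consistent even though x is correlated with the individual effect.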

Concept Check

The Fixed Effects model drops education, gender, and race from the estimation. Why can FE not estimate the effects of time-invariant variables?


Step 4: The Hausman Test

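The Hausman test compares the RE and FE estimates of the coefficients both models identify, that is, the coefficients on the time-varying regressors. Under H0, both estimators are consistent but RE is efficient. Under H1 (individual effects correlated with the regressors), FE remains consistent but RE is inconsistent.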

# Hausman test (built into plm)
hausman <- phtest(fe_model, re_model)
print(hausman)

if (hausman$p.value < 0.05) {
  cat("\nREJECT H0: Use Fixed Effects\n")
} else {
  cat("\nFail to reject H0: Random Effects is acceptable\n")
}
Requires: plm

Expected output:

=== Hausman Test ===
  H0: RE is consistent (individual effects uncorrelated with regressors)
  H1: RE is inconsistent (use FE instead)

  Chi-squared statistic: 25.8741
  Degrees of freedom:    3
  p-value:               0.000010

  REJECT H0: Use Fixed Effects
Test       Statistic   df   p-value   Decision
-------------------------------------------------------
Hausman        25.87    3   < 0.001   Reject RE; use FE

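In the expected output, the test rejects RE. Keep in mind what phtest() actually compares: only the coefficients that both models estimate (exper, expersq, and union, hence 3 degrees of freedom). Because FE produces no education coefficient, the test has no direct power against correlation between the individual effect and a time-invariant regressor. In this simulation ability is correlated only with education, so a rejection reflects differences in the time-varying coefficients. It does not directly detect the ability-education link.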
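phtest()'s default statistic is H = (b_FE - b_RE)' [V(b_FE) - V(b_RE)]^(-1) (b_FE - b_RE). It is compared with a chi-squared distribution whose degrees of freedom equal the number of contrasted coefficients. The helper below, hausman_stat(), is a hypothetical name for a minimal base-R sketch of that formula. The toy inputs are made-up numbers, not lab output.

```r
# Minimal sketch of the Hausman statistic from coefficient vectors and covariance matrices
hausman_stat <- function(b_fe, b_re, V_fe, V_re) {
  d <- b_fe - b_re                          # contrast on the common coefficients
  V_d <- V_fe - V_re                        # variance of the contrast, valid under H0 (RE efficient)
  stat <- as.numeric(t(d) %*% solve(V_d, d))
  df <- length(d)
  list(stat = stat, df = df,
       p_value = pchisq(stat, df = df, lower.tail = FALSE))
}

# Toy one-coefficient example (hypothetical numbers, not lab output)
res <- hausman_stat(b_fe = 0.5, b_re = 0.3,
                    V_fe = matrix(0.02), V_re = matrix(0.01))
print(res)   # stat = 0.2^2 / 0.01 = 4 with 1 degree of freedom
```

To apply it to Step 4, pass coef() and vcov() of fe_model and re_model, restricted to exper, expersq, and union. The result should be close to the phtest() statistic. In finite samples V(b_FE) - V(b_RE) need not be positive definite, in which case solve() can fail or the statistic can turn negative.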


Step 5: The Mundlak (Correlated Random Effects) Approach

The Mundlak (1978) approach adds the individual (group) means of the time-varying regressors to the RE model. Through those means, the individual effects may be correlated with the time-varying regressors, while the model still estimates coefficients on time-invariant variables.

# Compute individual means of all time-varying regressors
# (expersq is time-varying too, so its mean belongs in the Mundlak set)
df$exper_mean <- ave(as.numeric(df$exper), df$id)
df$expersq_mean <- ave(as.numeric(df$expersq), df$id)
df$union_mean <- ave(as.numeric(df$union), df$id)

# Mundlak model: RE + group means
mundlak <- plm(log_wage ~ educ + exper + expersq + female + black + union +
                 exper_mean + expersq_mean + union_mean,
               data = df, model = "random")
summary(mundlak)

# Test Mundlak terms
cat("\nMundlak terms:\n")
cat("  exper_mean:", coef(mundlak)["exper_mean"],
  " p =", summary(mundlak)$coefficients["exper_mean", 4], "\n")
cat("  union_mean:", coef(mundlak)["union_mean"],
  " p =", summary(mundlak)$coefficients["union_mean", 4], "\n")

cat("\nEducation coefficient comparison:\n")
cat("  RE:", coef(re_model)["educ"], "\n")
cat("  Mundlak:", coef(mundlak)["educ"], "\n")
Requires: plm

Expected output:

=== Mundlak (Correlated Random Effects) ===

Mundlak terms:
  exper_mean: 0.0185 (p = 0.0012)
  union_mean: 0.1452 (p = 0.0003)

If Mundlak terms are significant, RE is biased.

Education coefficient comparison:
  RE:      0.0842
  Mundlak: 0.0725
  (FE cannot estimate educ)
Variable        Coeff      SE       p
-------------------------------------
educ           0.0725   0.005   0.000
exper          0.0325   0.003   0.000
union          0.1190   0.014   0.000
exper_mean     0.0185   0.006   0.001
union_mean     0.1452   0.040   0.000

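Significant Mundlak terms indicate that the individual effects are correlated with the time-varying regressors, which the RE model assumes away. When the means of all time-varying regressors (including expersq) are added, the Mundlak coefficients on those regressors reproduce the FE estimates from Step 3. Read the education coefficient with more care: the group means absorb only the part of the individual effect that is correlated with the time-varying regressors. In this simulation ability is tied to education, not to experience or union status. The Mundlak education estimate therefore still assumes that education is uncorrelated with the remaining individual effect, and by construction it is not purged of ability bias.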
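The Mundlak regression has a useful property in a balanced panel. Once the means of all time-varying regressors (including squared terms such as expersq) are included, the coefficients on those regressors equal the FE estimates exactly, whether the augmented model is fit by pooled OLS or by RE-GLS. The base-R sketch below checks this on simulated data. In that data, x is correlated with the individual effect and a time-invariant z, by assumption, is not. All names are illustrative.

```r
# Minimal sketch: a Mundlak regression fit by pooled OLS reproduces the FE slope
set.seed(2)
n <- 1000; n_t <- 5
id <- rep(seq_len(n), each = n_t)
c_i <- rnorm(n)                                   # latent person component
x <- rnorm(n * n_t) + rep(c_i, each = n_t)        # time-varying, correlated with the effect
z <- rep(rnorm(n), each = n_t)                    # time-invariant, independent of the effect
alpha <- rep(0.7 * c_i + rnorm(n, sd = 0.5), each = n_t)
y <- 1 + 0.5 * x + 0.3 * z + alpha + rnorm(n * n_t)

x_bar <- ave(x, id)                               # Mundlak term: individual mean of x
mundlak_ols <- lm(y ~ x + x_bar + z)

# Within estimator for comparison
y_w <- y - ave(y, id)
x_w <- x - x_bar
fe_by_hand <- lm(y_w ~ 0 + x_w)

print(c(mundlak_x = coef(mundlak_ols)[["x"]], fe_x = coef(fe_by_hand)[["x_w"]]))
print(coef(mundlak_ols)[["z"]])   # near 0.3 only because z is independent of the effect
```

The z coefficient lands near its true value only because z was drawn independently of the individual effect. Had the effect been correlated with z, as ability is with education in this lab, adding x_bar would not have removed that bias.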


Step 6: Breusch-Pagan LM Test for Individual Effects

Before choosing between RE and FE, we should first test whether individual effects exist at all. The Breusch-Pagan LM test compares pooled OLS against RE.

# Pooled OLS
pooled <- plm(log_wage ~ educ + exper + expersq + female + black + union,
            data = df, model = "pooling")
summary(pooled)

# Breusch-Pagan LM test
bp_test <- plmtest(pooled, type = "bp")
print(bp_test)

if (bp_test$p.value < 0.05) {
  cat("\nREJECT H0: Individual effects are present\n")
} else {
  cat("\nFail to reject H0: Pooled OLS may be adequate\n")
}
Requires: plm

Expected output:

=== Pooled OLS ===
Education: 0.0895
R-squared: 0.3542

=== Breusch-Pagan LM Test ===
  H0: No individual effects (pooled OLS is appropriate)
  LM statistic: 1542.3215
  p-value: 0.000000
  REJECT H0: Individual effects are present
Test               Statistic   df   p-value   Decision
-------------------------------------------------------------------------------------
Breusch-Pagan LM     1542.32    1   < 0.001   Reject pooled OLS; individual effects exist

The Breusch-Pagan test overwhelmingly rejects the null of no individual effects, confirming that panel methods (RE or FE) are needed rather than pooled OLS.
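For a balanced panel, the Breusch-Pagan (1980) statistic uses only the pooled OLS residuals e_it. It is LM = nT / (2(T - 1)) × [Σ_i (Σ_t e_it)² / Σ_i Σ_t e_it² - 1]², which is chi-squared with 1 degree of freedom under H0. The base-R sketch below implements that formula; bp_lm() is a hypothetical helper name. It should agree with plmtest(type = "bp") on a balanced panel, though that correspondence is an assumption worth checking on your own run.

```r
# Minimal sketch of the Breusch-Pagan (1980) LM statistic; assumes a balanced panel
bp_lm <- function(resid, id) {
  n <- length(unique(id))
  n_t <- length(resid) / n                   # periods per individual (balanced panel)
  sum_by_id <- tapply(resid, id, sum)        # sum over t of the residuals, per individual
  ratio <- sum(sum_by_id^2) / sum(resid^2)
  lm_stat <- (n * n_t) / (2 * (n_t - 1)) * (ratio - 1)^2
  c(LM = lm_stat, p_value = pchisq(lm_stat, df = 1, lower.tail = FALSE))
}

# Tiny deterministic check: two individuals, two periods (LM works out to 2)
print(bp_lm(c(1, 1, -1, -1), id = c(1, 1, 2, 2)))

# Simulated panel with a strong individual effect
set.seed(3)
n <- 300; n_t <- 5
id <- rep(seq_len(n), each = n_t)
x <- rnorm(n * n_t)
y <- 1 + 0.5 * x + rep(rnorm(n), each = n_t) + rnorm(n * n_t)
pooled_ols <- lm(y ~ x)
print(bp_lm(resid(pooled_ols), id))
```

When individual effects are present, one person's residuals tend to share a sign. The squared within-person sums are then large relative to the sum of squared residuals, and the bracketed term moves away from zero.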

Concept Check

You find that the Breusch-Pagan LM test strongly rejects pooled OLS in favor of individual effects, and the Hausman test rejects RE in favor of FE. But you want to estimate the effect of education (time-invariant). What should you do?


Step 7: Compare with Published Results

Summary of expected results:

Test/Estimator       Expected result              Interpretation
---------------------------------------------------------------------------------------------
Breusch-Pagan LM     Reject H0 (p < 0.001)        Individual effects exist
Hausman test         Reject RE (p < 0.05)         Individual effects correlated with regressors
RE education coeff   Biased upward (~0.08-0.10)   Picks up ability bias
FE union coeff       ~0.10-0.15                   Within-person union premium
Mundlak terms        Significant                  Confirms RE inconsistency

The central lesson from Cornwell and Rupert (1988) is that the choice between RE and FE matters empirically, and the Hausman test provides a formal framework for making this choice.


Extension Exercises

  1. Between estimator. Estimate the "between" model (regression on individual means). Compare the between, within, and RE estimates of the union coefficient. Which is largest? Why?

  2. First-difference estimator. Estimate the model in first differences (delta y on delta x). Compare with FE. They are algebraically identical with T=2 but differ with T>2. Which is more efficient here?

  3. Hausman-Taylor IV. Implement the Hausman-Taylor (1981) estimator. It instruments endogenous time-invariant variables such as education with the individual means of time-varying regressors that are assumed to be uncorrelated with the individual effect. Does the education coefficient change relative to the Mundlak approach?

  4. Heterogeneous effects. Allow the union premium to vary by education level (interact union with education). Does the union premium differ for high- vs. low-education workers?

  5. Serial correlation test. Test for serial correlation in the idiosyncratic errors using the Wooldridge (2002) test. If serial correlation is present, how does it affect inference under RE vs. FE?
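Exercise 2 states that first differences and the within estimator are algebraically identical when T = 2. A quick base-R check on simulated two-period data confirms this numerically; names and values are illustrative. With T > 2 the two estimators generally differ.

```r
# Minimal sketch: with T = 2, first differences and the within estimator coincide
set.seed(5)
n <- 400
id <- rep(seq_len(n), each = 2)
alpha <- rep(rnorm(n), each = 2)
x <- rnorm(2 * n) + 0.5 * alpha            # regressor correlated with the individual effect
y <- 0.5 * x + alpha + rnorm(2 * n)

# First differences: period 2 minus period 1 (rows are ordered id-major)
is_t2 <- rep(c(FALSE, TRUE), n)
dy <- y[is_t2] - y[!is_t2]
dx <- x[is_t2] - x[!is_t2]
fd <- lm(dy ~ 0 + dx)

# Within estimator on the same data
y_w <- y - ave(y, id)
x_w <- x - ave(x, id)
fe <- lm(y_w ~ 0 + x_w)

print(c(fd = coef(fd)[["dx"]], fe = coef(fe)[["x_w"]]))
```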


Summary

In this replication lab you learned:

  • Random Effects is efficient but requires individual effects to be uncorrelated with regressors — a strong assumption
  • Fixed Effects eliminates individual heterogeneity but cannot estimate time-invariant coefficients
  • The Hausman test formally compares the RE and FE estimates of the coefficients both models identify; rejection is evidence that RE is inconsistent
  • The Breusch-Pagan LM test establishes whether individual effects exist at all
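  • The Mundlak approach is a practical compromise: it allows the individual effects to be correlated with the time-varying regressors while still estimating time-invariant coefficients, which remain consistent only if the time-invariant regressors are uncorrelated with the effects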
  • In the Cornwell and Rupert (1988) wage data, the Hausman test rejects RE, consistent with ability bias in returns to education
  • Applied researchers should report both RE and FE and discuss the Hausman test result, rather than mechanically choosing one estimator