Tutorial · 90 minutes

Lab: Random Effects Regression

Random effects regression: estimate RE, compare with FE, run the Hausman test, implement the Mundlak approach, and recognize when RE is appropriate.

Method: Random Effects
Languages: Python, R, Stata
Dataset: Simulated employee panel data (wages)

Overview

In this lab you will estimate random effects models using simulated employee panel data on wages. Random effects is a weighted average of the between and within estimators, offering efficiency gains over fixed effects when its key assumption holds: the unobserved individual effect is uncorrelated with the regressors. You will learn to estimate RE, test its assumptions, and understand when it is preferred.
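As a compact reference, the model behind this description can be sketched as follows. The notation is an assumption made here for illustration: $x_{it}$ collects the time-varying regressors, $z_i$ the time-invariant ones, and bars denote within-employee means.

```latex
\begin{aligned}
y_{it} &= x_{it}'\beta + z_i'\delta + \alpha_i + e_{it},
  \qquad \operatorname{Cov}(\alpha_i, x_{it}) = \operatorname{Cov}(\alpha_i, z_i) = 0, \\
y_{it} - \theta\,\bar{y}_i &= (x_{it} - \theta\,\bar{x}_i)'\beta + (1-\theta)\,z_i'\delta
  + (1-\theta)\,\alpha_i + (e_{it} - \theta\,\bar{e}_i), \\
\theta &= 1 - \sqrt{\frac{\sigma_e^2}{T\sigma_\alpha^2 + \sigma_e^2}} .
\end{aligned}
```

Setting $\theta = 0$ gives pooled OLS, and $\theta = 1$ gives the within (FE) transformation, which removes both $z_i$ and $\alpha_i$. RE sits between the two.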

What you will learn:

How the RE estimator combines between and within variation
How to estimate RE and interpret the output
How to compare RE with FE using the Hausman test
How the Mundlak (correlated random effects) approach nests both FE and RE
When RE is genuinely preferred over FE

Prerequisites: Familiarity with fixed effects regression (see the FE lab) and basic panel data concepts.

Step 1: Simulate Employee Panel Data

We create a panel of 500 employees observed over 8 years. In this simulation, the individual effect is uncorrelated with the regressors, making RE the appropriate estimator.

```r
# First-time setup: install.packages(c("plm", "fixest", "modelsummary"))
library(plm)
library(fixest)
library(modelsummary)

set.seed(42)
N <- 500     # employees
T_per <- 8   # years observed per employee

# Individual effect, drawn independently of every regressor (the RE assumption)
alpha_i <- rnorm(N, sd = 1.5)

employee_id <- rep(1:N, each = T_per)
year <- rep(2015:(2015 + T_per - 1), N)
alpha_rep <- rep(alpha_i, each = T_per)

# exper and tenure: a random starting level plus one unit per year.
# Both rise by exactly 1 each year, so their within-employee deviations are
# identical and FE cannot separate their linear effects in this simulation.
exper <- rep(0:(T_per - 1), N) + rep(runif(N, 0, 15), each = T_per)
tenure <- rep(0:(T_per - 1), N) + rep(runif(N, 0, 5), each = T_per)
# Time-invariant regressors (constant within employee)
educ <- rep(round(pmin(pmax(rnorm(N, 14, 3), 8), 22)), each = T_per)
female <- rep(rbinom(N, 1, 0.48), each = T_per)
# Union status varies over time
union <- rbinom(N * T_per, 1, 0.25)

# True DGP for log wages
log_wage <- 2.0 + alpha_rep + 0.05 * educ + 0.03 * exper -
  0.0004 * exper^2 + 0.02 * tenure - 0.12 * female +
  0.08 * union + rnorm(N * T_per, sd = 0.3)

df <- data.frame(employee_id = factor(employee_id), year = factor(year),
                 log_wage, exper, tenure, educ, female, union)

cat("Panel:", N, "employees x", T_per, "years =", nrow(df), "obs\n")
```

Requires: plm, fixest, modelsummary

Expected output:

Panel: 500 employees x 8 years = 4000 obs

	log_wage	exper	tenure	educ	female	union
count	4000.000	4000.000	4000.000	4000.000	4000.000	4000.000
mean	3.128	10.92	5.98	13.87	0.478	0.251
std	1.626	5.41	2.87	2.84	0.500	0.434
min	-1.850	0.00	0.00	8.00	0.00	0.00
25%	1.987	6.68	3.74	12.00	0.00	0.00
50%	3.115	10.85	5.92	14.00	0.00	0.00
75%	4.230	15.07	8.18	16.00	1.00	1.00
max	8.420	25.50	12.00	22.00	1.00	1.00

Step 2: Estimate the Random Effects Model

```r
# Random effects (GLS) on the employee-year panel
pdf <- pdata.frame(df, index = c("employee_id", "year"))
re_model <- plm(log_wage ~ exper + I(exper^2) + tenure + educ + female + union,
                data = pdf, model = "random")
summary(re_model)

# Variance components
cat("\nVariance decomposition:\n")
ercomp(re_model)
```

Requires: plm

Expected output:

=== Random Effects ===

Variable	Coeff	SE	t	p
Intercept	1.3450	0.121	11.12	0.000
exper	0.0305	0.002	15.25	0.000
I(exper^2)	-0.0004	0.000	-4.82	0.000
tenure	0.0198	0.003	6.60	0.000
educ	0.0512	0.006	8.53	0.000
female	-0.1185	0.054	-2.19	0.028
union	0.0815	0.011	7.41	0.000

Variance of individual effect (sigma_alpha^2): 2.1532
Variance of idiosyncratic error (sigma_e^2): 0.0908

All slope coefficients are close to their true values (exper=0.03, tenure=0.02, educ=0.05, female=-0.12, union=0.08). Unlike FE, RE can also estimate the coefficients on time-invariant regressors such as education and gender.

Step 3: Compare RE with FE and Pooled OLS

```r
# Fixed Effects
fe_model <- plm(log_wage ~ exper + I(exper^2) + tenure + educ + female + union,
                data = pdf, model = "within")

# Pooled OLS
pooled <- plm(log_wage ~ exper + I(exper^2) + tenure + educ + female + union,
              data = pdf, model = "pooling")

# Compare the three estimators side by side
modelsummary(list("Pooled" = pooled, "RE" = re_model, "FE" = fe_model),
             stars = TRUE,
             coef_map = c("exper" = "Experience", "tenure" = "Tenure",
                          "union" = "Union", "educ" = "Education",
                          "female" = "Female"))

cat("\nNote: FE drops time-invariant variables (educ, female)\n")
cat("RE estimates educ:", coef(re_model)["educ"],
    " female:", coef(re_model)["female"], "\n")
```

Requires: plm, modelsummary

Expected output:

Variable      True   Pooled       RE       FE
--------------------------------------------------
exper       0.0300   0.0302   0.0305   0.0308
tenure      0.0200   0.0195   0.0198   0.0202
union       0.0800   0.0798   0.0815   0.0823

Note: FE drops time-invariant variables (educ, female).
RE can estimate them:
  educ (true=0.05):   RE = 0.0512
  female (true=-0.12): RE = -0.1185

Variable	True	Pooled OLS	RE	FE
exper	0.0300	0.0302	0.0305	0.0308
tenure	0.0200	0.0195	0.0198	0.0202
union	0.0800	0.0798	0.0815	0.0823
educ	0.0500	0.0498	0.0512	— (dropped)
female	-0.1200	-0.1195	-0.1185	— (dropped)

Time-varying coefficients are similar across all three estimators. The key advantage of RE: it can estimate education and gender effects that FE cannot.
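To see why the estimators agree here, and why they would not if the individual effect were correlated with a regressor, the following is a minimal Python sketch using only the standard library. It is not the lab's DGP: it assumes a single time-varying regressor `x`, a hypothetical parameter `rho` that controls how strongly `x` loads on the individual effect, and 2000 units to keep the between estimate reasonably precise.

```python
import random


def simulate_slopes(rho, n_units=2000, n_periods=8, beta=0.5, seed=0):
    """Return (between, within) slope estimates for y = 2 + beta*x + alpha + e.

    rho controls how strongly x loads on the individual effect alpha;
    rho = 0 corresponds to the random-effects assumption.
    """
    rng = random.Random(seed)
    x_means, y_means = [], []
    sxy_within = sxx_within = 0.0
    for _ in range(n_units):
        alpha = rng.gauss(0.0, 1.5)  # individual effect
        x = [rho * alpha + rng.gauss(0.0, 1.0) for _ in range(n_periods)]
        y = [2.0 + beta * xi + alpha + rng.gauss(0.0, 0.3) for xi in x]
        x_bar, y_bar = sum(x) / n_periods, sum(y) / n_periods
        x_means.append(x_bar)
        y_means.append(y_bar)
        # Within variation: deviations from each unit's own mean
        sxy_within += sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
        sxx_within += sum((xi - x_bar) ** 2 for xi in x)
    # Between variation: regression of unit means of y on unit means of x
    gx = sum(x_means) / n_units
    gy = sum(y_means) / n_units
    sxy_between = sum((a - gx) * (b - gy) for a, b in zip(x_means, y_means))
    sxx_between = sum((a - gx) ** 2 for a in x_means)
    return sxy_between / sxx_between, sxy_within / sxx_within


between_ok, within_ok = simulate_slopes(rho=0.0)
between_bad, within_bad = simulate_slopes(rho=0.5)
print(f"rho=0.0  between={between_ok:.3f}  within={within_ok:.3f}")
print(f"rho=0.5  between={between_bad:.3f}  within={within_bad:.3f}")
```

With `rho = 0` both estimators target the true slope of 0.5, so any weighted average of them (such as RE) does too. With `rho = 0.5` the within estimate stays near 0.5 while the between estimate is pulled far above it, and RE inherits part of that bias.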

Concept Check

The FE estimator cannot estimate the effect of education or gender in this panel because these variables are time-invariant. RE can. Does this mean RE is always better for estimating time-invariant effects?

Yes. RE is always preferred when you want to estimate time-invariant effects.
No. RE estimates of time-invariant effects are only reliable if the individual effect is uncorrelated with the regressors. If this assumption fails, even the time-invariant coefficients are biased.
FE can estimate time-invariant effects if you add interaction terms.
You can make RE reliable for time-invariant effects by increasing the number of time periods.

Step 4: The Hausman Test

```r
# Hausman test: H0 is that FE and RE estimates do not differ systematically
ht <- phtest(fe_model, re_model)
print(ht)

if (ht$p.value > 0.05) {
  cat("\n=> Fail to reject H0: RE assumption appears valid.\n")
  cat("   RE appears appropriate (more efficient if exogeneity holds).\n")
} else {
  cat("\n=> Reject H0: RE assumption is violated. Use FE.\n")
}
```

Expected output:

Hausman test statistic: 3.2145
Degrees of freedom: 3
p-value: 0.3594

=> Fail to reject H0: RE assumption appears valid.
   RE appears appropriate (more efficient if exogeneity holds).

Test	Statistic	df	p-value	Decision
Hausman	3.21	3	0.359	Fail to reject; RE is appropriate

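The Hausman test fails to reject (p ≈ 0.36), as expected: in this DGP the individual effect (alpha_i) is uncorrelated with the regressors by construction. In applied work, keep in mind that a failure to reject does not prove the RE assumption; it only means the FE and RE estimates are not statistically distinguishable.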
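For reference, the textbook form of the statistic is sketched below. The symbols are introduced here for illustration: $\hat\beta_{FE}$ and $\hat\beta_{RE}$ are the estimates of the coefficients that both models can identify (the time-varying regressors), $\widehat{V}(\cdot)$ are their estimated covariance matrices, and $k$ is the number of coefficients compared.

```latex
H = (\hat\beta_{FE} - \hat\beta_{RE})'
    \left[\widehat{V}(\hat\beta_{FE}) - \widehat{V}(\hat\beta_{RE})\right]^{-1}
    (\hat\beta_{FE} - \hat\beta_{RE})
    \;\overset{a}{\sim}\; \chi^2_k \quad \text{under } H_0 .
```

A large $H$ means the two sets of estimates differ by more than sampling noise would explain, which points to correlation between the individual effect and the regressors.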

Step 5: The Mundlak (Correlated Random Effects) Approach

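The Mundlak approach adds the group (within-employee) means of the time-varying regressors to the RE model. The coefficients on the means capture any correlation between the individual effect and the regressors, so if they are jointly zero, the standard RE model is adequate. When the means of all time-varying regressors are included, the coefficients on the time-varying regressors coincide with the FE estimates, which is the sense in which the Mundlak augmentation nests FE within RE.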

```r
# First-time setup: install.packages(c("car"))
# Mundlak approach: add within-employee means of the time-varying regressors
df$exper_mean <- ave(df$exper, df$employee_id)
df$tenure_mean <- ave(df$tenure, df$employee_id)
df$union_mean <- ave(as.numeric(df$union), df$employee_id)

pdf_m <- pdata.frame(df, index = c("employee_id", "year"))
mundlak <- plm(log_wage ~ exper + I(exper^2) + tenure + educ + female + union +
                 exper_mean + tenure_mean + union_mean,
               data = pdf_m, model = "random")
summary(mundlak)

# Joint test on means
library(car)
linearHypothesis(mundlak, c("exper_mean = 0", "tenure_mean = 0", "union_mean = 0"))
```

Requires: car, plm

Expected output:

=== Mundlak Model ===

Variable	Coeff	SE	t	p
exper	0.0308	0.003	10.27	0.000
I(exper^2)	-0.0004	0.000	-4.71	0.000
tenure	0.0202	0.004	5.05	0.000
educ	0.0510	0.007	7.29	0.000
female	-0.1182	0.055	-2.15	0.032
union	0.0823	0.012	6.86	0.000
exper_mean	-0.0012	0.008	-0.15	0.881
tenure_mean	-0.0015	0.012	-0.13	0.900
union_mean	0.0085	0.040	0.21	0.832

Wald test on group means (Mundlak test):
  F-statistic: 0.4521
  p-value: 0.7162
  If p > 0.05: RE is appropriate (means are not needed)

The coefficients on the group means are individually and jointly insignificant (joint p ≈ 0.72), which is consistent with the RE assumption holding in this DGP.
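The link between the Mundlak regression and FE can be checked numerically. The sketch below is a simplified illustration, not the lab's model: it assumes one time-varying regressor `x`, a balanced panel, a hypothetical parameter `rho` for the correlation between `x` and the individual effect, and pooled OLS of `y` on `x` and its unit mean instead of RE-GLS (chosen to keep the example dependency-free).

```python
import random


def two_regressor_slopes(y, x1, x2):
    """OLS slopes of y on x1 and x2 (intercept included), via centered cross-products."""
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def mundlak_demo(rho, n_units=2000, n_periods=8, beta=0.5, seed=1):
    """Return (within slope, Mundlak slope on x, Mundlak slope on the unit mean of x)."""
    rng = random.Random(seed)
    y_all, x_all, xbar_all = [], [], []
    sxy_w = sxx_w = 0.0
    for _ in range(n_units):
        alpha = rng.gauss(0.0, 1.5)  # individual effect
        x = [rho * alpha + rng.gauss(0.0, 1.0) for _ in range(n_periods)]
        y = [2.0 + beta * xi + alpha + rng.gauss(0.0, 0.3) for xi in x]
        x_bar, y_bar = sum(x) / n_periods, sum(y) / n_periods
        x_all += x
        y_all += y
        xbar_all += [x_bar] * n_periods  # unit mean repeated for every period
        sxy_w += sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
        sxx_w += sum((xi - x_bar) ** 2 for xi in x)
    within = sxy_w / sxx_w
    b_x, gamma = two_regressor_slopes(y_all, x_all, xbar_all)
    return within, b_x, gamma


within0, b_x0, gamma0 = mundlak_demo(rho=0.0)
within1, b_x1, gamma1 = mundlak_demo(rho=0.5)
print(f"rho=0.0  within={within0:.4f}  mundlak_x={b_x0:.4f}  gamma={gamma0:.3f}")
print(f"rho=0.5  within={within1:.4f}  mundlak_x={b_x1:.4f}  gamma={gamma1:.3f}")
```

In both runs the coefficient on `x` in the augmented regression equals the within estimate up to floating-point error. The coefficient on the unit mean (`gamma`) is close to zero when `rho = 0` and clearly positive when `rho = 0.5`, which is the quantity the joint test on the means examines.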

Concept Check

In the Mundlak model, you add group means of time-varying regressors to the RE regression. If the coefficients on these means are all zero, what does that imply?

The fixed effects are zero.
The individual effects are uncorrelated with the regressors, so the standard RE model is correctly specified and preferred over FE.
The time-varying regressors have no effect.
The individual effects are identical across all units.

Step 6: When RE Is Preferred

```r
# Compare FE and RE standard errors on the time-varying regressors
cat("=== SE Comparison ===\n")
vars <- c("exper", "tenure", "union")
for (v in vars) {
  fe_se <- summary(fe_model)$coefficients[v, "Std. Error"]
  re_se <- summary(re_model)$coefficients[v, "Std. Error"]
  cat(v, "- FE SE:", round(fe_se, 5), " RE SE:", round(re_se, 5),
      " Ratio:", round(re_se / fe_se, 3), "\n")
}

cat("\nRE estimates of time-invariant effects:\n")
cat("  Education:", coef(re_model)["educ"], "(true: 0.05)\n")
cat("  Female:", coef(re_model)["female"], "(true: -0.12)\n")
```

Expected output:

=== SE Comparison (time-varying regressors) ===
Variable        FE SE      RE SE      RE/FE
exper        0.00235    0.00226      0.962
tenure       0.00395    0.00379      0.959
union        0.01250    0.01205      0.964

RE SEs are slightly smaller => marginally more precise.
With σ_α² >> σ_e², quasi-demeaning parameter θ ≈ 0.93 makes RE
behave nearly identically to FE for time-varying coefficients.

RE estimate of education effect: 0.0512 (true: 0.05)
RE estimate of female penalty: -0.1185 (true: -0.12)
FE cannot estimate these at all.

Variable	FE SE	RE SE	RE/FE Ratio
exper	0.00235	0.00226	0.962
tenure	0.00395	0.00379	0.959
union	0.01250	0.01205	0.964

RE standard errors are only marginally smaller than FE SEs in this DGP because the quasi-demeaning parameter $\theta = 1 - \sqrt{\sigma_e^2 / (T\sigma_\alpha^2 + \sigma_e^2)} \approx 0.93$ is close to 1, so RE almost fully demeans the data. The efficiency gain of RE over FE is larger when $\sigma_\alpha^2$ is small relative to $\sigma_e^2$ (smaller $\theta$).
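The arithmetic behind $\theta \approx 0.93$ can be reproduced directly from the formula. The short Python sketch below uses a helper name chosen for illustration, `quasi_demeaning_theta`; the third call uses a hypothetical, much smaller individual-effect standard deviation (0.1) to show how $\theta$ falls.

```python
import math


def quasi_demeaning_theta(sigma_alpha2, sigma_e2, n_periods):
    """theta = 1 - sqrt(sigma_e^2 / (T * sigma_alpha^2 + sigma_e^2))."""
    return 1.0 - math.sqrt(sigma_e2 / (n_periods * sigma_alpha2 + sigma_e2))


# True variances from the simulation: sd(alpha_i) = 1.5, sd(e_it) = 0.3, T = 8
theta_true = quasi_demeaning_theta(1.5 ** 2, 0.3 ** 2, 8)
# Estimated variance components reported in Step 2
theta_est = quasi_demeaning_theta(2.1532, 0.0908, 8)
# Hypothetical case with a much smaller individual effect: sd(alpha_i) = 0.1
theta_small = quasi_demeaning_theta(0.1 ** 2, 0.3 ** 2, 8)

print(f"{theta_true:.2f} {theta_est:.2f} {theta_small:.2f}")  # prints: 0.93 0.93 0.27
```

With $\theta$ near 0.27, RE keeps most of the between variation that FE discards, which is where its efficiency gain would come from.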

Step 7: Exercises

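Violate the RE assumption. Modify the simulation so that alpha_i is correlated with a time-varying regressor (e.g., let an employee's starting experience or union probability depend on alpha_i). Re-run the Hausman test and verify that it now rejects. Then correlate alpha_i only with education (e.g., alpha_i = 0.3*educ + noise) and note that the RE coefficient on educ is biased even though the Hausman test, which compares only the coefficients FE can estimate, is unlikely to reject.
Hausman-Taylor estimator. When some time-invariant variables are endogenous, the Hausman-Taylor estimator uses the exogenous time-varying regressors as instruments. Implement this using plm::pht (R) or xthtaylor (Stata).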
Between estimator. Estimate the between model (regression on group means) and compare it with FE, RE, and pooled OLS. Show that RE is a matrix-weighted average of the between and within estimators.
GLS by hand. Compute the quasi-demeaning parameter theta and implement the RE estimator as OLS on quasi-demeaned data. Verify your results match the packaged RE estimator.

Summary

In this lab you learned:

The RE estimator is a weighted average of the between and within estimators, trading bias risk for efficiency
RE requires that the individual effect be uncorrelated with the regressors — a strong assumption that must be tested
The Hausman test compares FE and RE; failure to reject supports using RE
The Mundlak approach nests FE within RE by adding group means of time-varying regressors
RE's key advantages are efficiency gains and the ability to estimate effects of time-invariant variables
In many observational panel settings, FE is the more conservative choice, though RE can be preferable when its assumptions are credible and you need to estimate time-invariant regressors
