MethodAtlas
Panel · Established

Random Effects

A more efficient alternative to fixed effects when the unobserved effect is uncorrelated with regressors.

Quick Reference

When to Use
When you believe the unobserved unit effect is uncorrelated with all regressors, and you want to estimate effects of time-invariant variables that FE cannot identify.
Key Assumption
The unobserved individual effect is uncorrelated with all regressors in all time periods. Check it with a Hausman or Mundlak test: if the test rejects, FE is preferred; a non-rejection does not prove the assumption holds.
Common Mistake
Using RE when the Hausman test rejects, which indicates that the RE assumption is violated and FE should be preferred. Also, not using the Mundlak (correlated RE) approach as a diagnostic.
Estimated Time
2 hours

One-Line Implementation

Stata: xtreg y x1 x2, re vce(robust)   // after xtset id t
R: plm::plm(y ~ x1 + x2, data = df, index = c('id', 't'), model = 'random')
Python: RandomEffects.from_formula('y ~ 1 + x1 + x2', data=df).fit()  # linearmodels; df needs an (id, t) MultiIndex


Motivating Example: Cross-Country Growth Regressions

You want to understand what drives economic growth across countries. Your panel has 100 countries observed over 30 years, and you are interested in the effects of institutions, trade openness, and human capital on GDP growth.

The challenge is that some of your most important variables — geography, colonial history, legal origin — are time-invariant. A fixed effects model would absorb all of these into the country fixed effects, making them impossible to estimate. But these time-invariant variables are precisely the ones you care about.

If you are willing to assume that the unobserved country-level factors (captured by the country effect $\alpha_i$) are uncorrelated with your regressors, the random effects model lets you estimate the effects of both time-varying and time-invariant variables. This assumption is strong, but it is defensible in some settings, and the payoff is substantial.

(Islam, 1995)

A. Overview: What Random Effects Does

The Model

The panel data model is the same as in fixed effects:

$$Y_{it} = X_{it}'\beta + Z_i'\gamma + \alpha_i + \varepsilon_{it}$$

where $Z_i$ are time-invariant covariates (like geography) and $\alpha_i$ is the unobserved unit effect. The key difference is in the assumption about $\alpha_i$.

  • Fixed effects treats $\alpha_i$ as a parameter to be estimated (or differenced away) and allows it to be arbitrarily correlated with $X_{it}$.
  • Random effects treats $\alpha_i$ as a random variable drawn from a distribution, and assumes it is uncorrelated with $X_{it}$ and $Z_i$.

Why Does the RE Assumption Matter?

If αi\alpha_i is uncorrelated with the regressors, RE produces more efficient estimates (smaller standard errors) than FE. Intuitively, RE uses both within-unit and between-unit variation, while FE discards the between-unit information entirely.

RE also allows you to estimate the coefficients on time-invariant variables ($\gamma$), which FE cannot.

The RE Estimator as a Weighted Average

The RE estimator is a matrix-weighted average of the within (FE) estimator and the between estimator. In the balanced-panel, single-regressor case, this simplifies to a scalar weight:

$$\hat{\beta}_{RE} = \omega \hat{\beta}_{FE} + (1 - \omega) \hat{\beta}_{BE}$$

where $\omega$ depends on the ratio of the idiosyncratic variance $\sigma^2_\varepsilon$ to the unit-effect variance $\sigma^2_\alpha$, on the number of time periods $T$, and on how much of the variation in the regressor is within units rather than between them. When $\sigma^2_\alpha$ is large (lots of persistent between-unit heterogeneity), RE puts more weight on the within estimator and approaches FE. In the general multivariate or unbalanced-panel case, $\omega$ becomes a matrix weight and the expression above no longer applies as a simple scalar formula.
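A minimal numerical check of the scalar formula, assuming a balanced panel, one regressor, and known variance components. It uses the standard closed form $\omega = W_{xx} / (W_{xx} + \phi^2 B_{xx})$, where $W_{xx}$ and $B_{xx}$ are the within and between sums of squares of $x$ and $\phi^2 = \sigma^2_\varepsilon / (\sigma^2_\varepsilon + T\sigma^2_\alpha)$. The data are simulated, and the helper names `simulate_panel` and `fe_be_re` are illustrative, not from any library.

```python
import numpy as np


def simulate_panel(n_units=200, n_periods=5, beta=2.0, sigma_alpha=1.0, sigma_eps=1.0, seed=0):
    """Illustrative balanced panel with one regressor; alpha_i is independent of x (RE assumption holds)."""
    rng = np.random.default_rng(seed)
    alpha = rng.normal(0.0, sigma_alpha, size=(n_units, 1))
    # x has a persistent unit component plus period-specific noise
    x = rng.normal(0.0, 1.0, size=(n_units, 1)) + rng.normal(0.0, 1.0, size=(n_units, n_periods))
    y = 1.0 + beta * x + alpha + rng.normal(0.0, sigma_eps, size=(n_units, n_periods))
    return x, y


def fe_be_re(x, y, sigma_alpha, sigma_eps):
    """Return (beta_FE, beta_BE, beta_RE, omega) for an (N, T) balanced panel with one regressor."""
    n_units, T = x.shape
    xbar_i = x.mean(axis=1, keepdims=True)
    ybar_i = y.mean(axis=1, keepdims=True)
    xbar, ybar = x.mean(), y.mean()

    # Within (FE) and between sums of squares and cross-products
    w_xx = np.sum((x - xbar_i) ** 2)
    w_xy = np.sum((x - xbar_i) * (y - ybar_i))
    b_xx = T * np.sum((xbar_i - xbar) ** 2)
    b_xy = T * np.sum((xbar_i - xbar) * (ybar_i - ybar))
    beta_fe = w_xy / w_xx
    beta_be = b_xy / b_xx

    # RE: OLS on quasi-demeaned data (intercept column becomes 1 - theta), true variance components
    theta = 1.0 - np.sqrt(sigma_eps**2 / (sigma_eps**2 + T * sigma_alpha**2))
    x_t = (x - theta * xbar_i).ravel()
    y_t = (y - theta * ybar_i).ravel()
    design = np.column_stack([np.full_like(x_t, 1.0 - theta), x_t])
    beta_re = np.linalg.lstsq(design, y_t, rcond=None)[0][1]

    # Scalar weight on the within estimator, with phi^2 = (1 - theta)^2
    phi2 = (1.0 - theta) ** 2
    omega = w_xx / (w_xx + phi2 * b_xx)
    return beta_fe, beta_be, beta_re, omega


x, y = simulate_panel()
beta_fe, beta_be, beta_re, omega = fe_be_re(x, y, sigma_alpha=1.0, sigma_eps=1.0)
print(f"FE={beta_fe:.4f}  BE={beta_be:.4f}  RE={beta_re:.4f}  omega={omega:.3f}")
print(f"weighted average = {omega * beta_fe + (1 - omega) * beta_be:.4f}")
```

The two printed RE values agree up to floating-point rounding, because the identity is exact in this setting rather than an approximation.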


Common Confusions


B. Identification: The RE Assumption

The Core Assumption

$$E[\alpha_i \mid X_{i1}, X_{i2}, \ldots, X_{iT}, Z_i] = E[\alpha_i] = 0$$

In words: the unobserved unit effect has mean zero whatever the values of the regressors in every period, which implies that it is uncorrelated with all of them. This rules out omitted-variable (selection) bias from unobserved time-invariant factors.

When Is the RE Assumption Plausible?

RE is most defensible when:

  1. Treatment is randomly assigned: In an RCT with panel data, the unit effect $\alpha_i$ is, by design, uncorrelated with treatment. RE is efficient here.
  2. The between and within estimates agree: If the Hausman test does not reject and the coefficients are substantively similar, RE may be appropriate.
  3. You are studying time-invariant variables: If the research question demands estimating time-invariant effects, RE, correlated RE (Mundlak), and the Hausman-Taylor estimator are the main options, and each still requires an assumption about how those variables relate to $\alpha_i$.
  4. Prediction is the goal: If you want to predict outcomes (not estimate causal effects), RE can be more useful than FE.

When Is It NOT Plausible?

RE is rarely credible when:

  1. Selection into treatment is likely: Units self-select into treatment based on unobserved characteristics.
  2. Cross-country regressions with endogenous institutions: Country-level unobservables (culture, geography) likely affect both institutions and growth.
  3. The Hausman test strongly rejects: A large test statistic means the FE and RE estimates differ by more than sampling error can explain, and FE is the safer choice.
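To see why correlation between $\alpha_i$ and the regressors matters, here is a small simulation sketch. It is an assumption-laden illustration, not the page's own simulation: $\alpha_i$ is built as `corr_strength` times the persistent part of $x$ plus noise, and RE is computed with the true variance components plugged into $\theta$ (infeasible GLS). The function name `fe_and_re_slopes` is hypothetical.

```python
import numpy as np


def fe_and_re_slopes(corr_strength, n_units=1000, n_periods=5, beta=2.0, seed=1):
    """Simulated FE and RE slopes when alpha_i = corr_strength * xi_i + noise.

    xi_i is the persistent unit-level part of x, so corr_strength != 0 violates the RE assumption.
    """
    rng = np.random.default_rng(seed)
    xi = rng.normal(0.0, 1.0, size=(n_units, 1))
    x = xi + rng.normal(0.0, 0.5, size=(n_units, n_periods))
    alpha = corr_strength * xi + rng.normal(0.0, 0.5, size=(n_units, 1))
    y = beta * x + alpha + rng.normal(0.0, 1.0, size=(n_units, n_periods))

    # FE: OLS on fully demeaned data
    x_dm = x - x.mean(axis=1, keepdims=True)
    y_dm = y - y.mean(axis=1, keepdims=True)
    beta_fe = np.sum(x_dm * y_dm) / np.sum(x_dm**2)

    # RE with the true variance components: sigma_eps^2 = 1, sigma_alpha^2 = corr^2 + 0.25
    sigma2_alpha = corr_strength**2 + 0.25
    theta = 1.0 - np.sqrt(1.0 / (1.0 + n_periods * sigma2_alpha))
    x_t = (x - theta * x.mean(axis=1, keepdims=True)).ravel()
    y_t = (y - theta * y.mean(axis=1, keepdims=True)).ravel()
    design = np.column_stack([np.full_like(x_t, 1.0 - theta), x_t])
    beta_re = np.linalg.lstsq(design, y_t, rcond=None)[0][1]
    return beta_fe, beta_re


for c in (0.0, 1.0):
    fe, re_ = fe_and_re_slopes(c)
    print(f"corr_strength={c}: FE={fe:.3f}, RE={re_:.3f} (true beta = 2)")
```

With `corr_strength=0` both estimators land near 2; with `corr_strength=1` FE stays near 2 while RE is pulled upward (roughly 2.4 in this design), because RE inherits part of the biased between-unit comparison.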

C. Visual Intuition

Recall the FE picture: each unit has its own intercept, and FE fits parallel within-unit regression lines that share a common slope. RE does something subtler: it partially pools the unit-specific intercepts toward the overall mean. Units with more observations (or more precise data) keep more of their own intercept; units with less data are pulled more strongly toward the grand mean.

Think of RE as a compromise. Unlike pooled OLS, it does not fully trust the between-unit comparisons; unlike FE, it does not discard them either. The estimated variance components ($\sigma^2_\alpha$ and $\sigma^2_\varepsilon$) determine the blend.

This partial pooling is exactly what hierarchical/multilevel models do. The basic RE model and the random-intercept hierarchical linear model (HLM) are the same model with different jargon: econometricians say "random effects," while psychometricians and education researchers say "multilevel models." For the random-intercept case the math is identical; multilevel models also extend it with random slopes and additional levels.
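The partial-pooling claim can be made concrete with the shrinkage factor of the random-intercept prediction. This is a sketch assuming known variance components and a known $\beta$ (with the intercept included in $X_{it}$); the function `shrinkage_factor` is a hypothetical helper. The predicted effect is $\hat\alpha_i = \lambda_i(\bar Y_i - \bar X_i'\beta)$ with $\lambda_i = \sigma^2_\alpha / (\sigma^2_\alpha + \sigma^2_\varepsilon / T_i)$, so $1 - \lambda_i$ is how far unit $i$ is pulled toward the grand mean.

```python
import numpy as np


def shrinkage_factor(sigma2_alpha, sigma2_eps, n_periods):
    """Weight on a unit's own mean residual when predicting its random intercept.

    lambda_i = sigma2_alpha / (sigma2_alpha + sigma2_eps / T_i); assumes known variance
    components. 1 - lambda_i is the share by which the prediction is pulled toward zero.
    """
    T = np.asarray(n_periods, dtype=float)
    return sigma2_alpha / (sigma2_alpha + sigma2_eps / T)


periods = [1, 2, 5, 20, 100]
lam = shrinkage_factor(sigma2_alpha=1.0, sigma2_eps=4.0, n_periods=periods)
for T_i, weight in zip(periods, lam):
    print(f"T_i = {T_i:>3}: keeps {weight:.0%} of its own mean residual")
```

With these illustrative values, a unit observed once keeps only 20% of its own deviation, while a unit observed 100 times keeps about 96%, which is the "more data, less pooling" pattern described above.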

Interactive Simulation

RE vs FE Trade-off

Random effects is efficient when between-group confounding is absent, but becomes biased as confounding grows. FE remains unbiased throughout, at the cost of precision.

[Interactive figure: simulated RE and FE estimates as between-group confounding varies]

Computed results: RE estimate 2.00; FE estimate (unbiased) 2.00; RE efficiency gain 30.0%.
Interactive Simulation

When Random Effects vs Fixed Effects?

Panel DGP: Yᵢₜ = αᵢ + 2.0 · Xᵢₜ + εᵢₜ, where αᵢ depends on X (endogeneity = 0.0). 10 units × 8 periods. Hausman statistic = 0.14.

[Figure: Outcome (Y) against Covariate (X), showing the Pooled OLS, Fixed Effects, and Random Effects fits and the true slope]

Estimation Results

Estimator | β̂ | SE | 95% CI | Bias
Pooled OLS | 2.119 | 0.080 | [1.96, 2.27] | +0.119
Fixed Effects (closest to truth) | 2.027 | 0.181 | [1.67, 2.38] | +0.027
Random Effects | 2.074 | 0.132 | [1.81, 2.33] | +0.074
True β | 2.000 | | |

Settings: endogeneity = 0.0 (how strongly αᵢ depends on X; 0 means the RE assumption holds, nonzero means RE is biased); 10 cross-sectional units; 8 time periods per unit; true causal effect of X on Y = 2.0.

Why the difference?

With near-zero correlation between the unit effects and X, Random Effects is consistent and more efficient than Fixed Effects. RE uses both within and between variation, yielding a smaller standard error (SE = 0.132 vs. FE SE = 0.181). The Hausman statistic is 0.14 with 1 degree of freedom (p ≈ 0.7), so we fail to reject: there is no evidence against RE, which is preferred for efficiency.


D. Mathematical Derivation

In words: RE applies a partial demeaning (quasi-demeaning) that blends within-unit and between-unit variation, with the blend set by how much of the total error variance is due to the unit effect. Under the RE assumptions, this blend is the efficient (GLS) one.

The composite error is $u_{it} = \alpha_i + \varepsilon_{it}$. Under RE assumptions:

$$\text{Var}(u_{it}) = \sigma^2_\alpha + \sigma^2_\varepsilon, \qquad \text{Cov}(u_{it}, u_{is}) = \sigma^2_\alpha \quad \text{for } t \neq s$$

The correlation structure within units motivates GLS. The RE estimator applies a quasi-demeaning transformation:

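$$\tilde{Y}_{it} = Y_{it} - \hat{\theta}\bar{Y}_i, \qquad \tilde{X}_{it} = X_{it} - \hat{\theta}\bar{X}_i$$

where

$$\theta = 1 - \sqrt{\frac{\sigma^2_\varepsilon}{\sigma^2_\varepsilon + T\sigma^2_\alpha}}$$

and $\hat{\theta}$ plugs in consistent estimates of the two variance components (feasible GLS). The intercept column is transformed the same way and becomes $1 - \hat{\theta}$.

A single scalar $\theta$ applies to balanced panels with any number of regressors. In unbalanced panels, $\theta$ varies by unit through $T_i$. In general, the transformation is (proportional to) $\Omega^{-1/2}$ applied to the stacked data, where $\Omega$ is the block-diagonal error covariance matrix.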

When $\sigma^2_\alpha$ is large relative to $\sigma^2_\varepsilon$ (lots of permanent unit differences), or when $T$ is large, $\theta \to 1$ and RE approaches FE (full demeaning). When $\sigma^2_\alpha = 0$ (no unit effects), $\theta = 0$ and RE reduces to pooled OLS.

The RE estimator is then OLS on the quasi-demeaned data:

$$\hat{\beta}_{RE} = \left(\sum_i \sum_t \tilde{X}_{it}\tilde{X}_{it}'\right)^{-1} \left(\sum_i \sum_t \tilde{X}_{it}\tilde{Y}_{it}\right)$$
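A quick numerical check that OLS on quasi-demeaned data reproduces GLS with the block-diagonal $\Omega$, assuming a balanced panel and known variance components. Each $T \times T$ block of $\Omega$ is $\sigma^2_\varepsilon I_T + \sigma^2_\alpha \iota\iota'$, matching the variance and covariance expressions above. The data are simulated and the helper `quasi_demean` is illustrative.

```python
import numpy as np


def quasi_demean(a, n_units, n_periods, theta):
    """Subtract theta times the unit mean; rows are stacked unit by unit (periods within unit)."""
    a3 = a.reshape(n_units, n_periods, -1)
    return (a3 - theta * a3.mean(axis=1, keepdims=True)).reshape(n_units * n_periods, -1)


rng = np.random.default_rng(42)
n_units, T = 50, 4
sigma2_alpha, sigma2_eps = 2.0, 1.0

# Illustrative balanced panel: intercept plus two regressors
X = np.column_stack([np.ones(n_units * T), rng.normal(size=(n_units * T, 2))])
alpha = np.repeat(rng.normal(0.0, np.sqrt(sigma2_alpha), n_units), T)
y = X @ np.array([1.0, 0.5, -0.3]) + alpha + rng.normal(0.0, np.sqrt(sigma2_eps), n_units * T)

# (1) GLS with block-diagonal Omega; each block is sigma2_eps * I + sigma2_alpha * (matrix of ones)
block = sigma2_eps * np.eye(T) + sigma2_alpha * np.ones((T, T))
omega_inv = np.kron(np.eye(n_units), np.linalg.inv(block))
beta_gls = np.linalg.solve(X.T @ omega_inv @ X, X.T @ omega_inv @ y)

# (2) OLS on quasi-demeaned data with the scalar theta from the text
theta = 1.0 - np.sqrt(sigma2_eps / (sigma2_eps + T * sigma2_alpha))
X_t = quasi_demean(X, n_units, T, theta)
y_t = quasi_demean(y, n_units, T, theta).ravel()
beta_qd = np.linalg.lstsq(X_t, y_t, rcond=None)[0]

print(beta_gls)
print(beta_qd)
```

The two coefficient vectors match because $(I - \theta \bar J_T)^2 / \sigma^2_\varepsilon = \Omega_i^{-1}$, where $\bar J_T$ is the $T \times T$ averaging matrix; the scale factor $\sigma_\varepsilon$ cancels in OLS.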

Hausman test statistic:

$$H = (\hat{\beta}_{FE} - \hat{\beta}_{RE})'\left[\widehat{\text{Var}}(\hat{\beta}_{FE}) - \widehat{\text{Var}}(\hat{\beta}_{RE})\right]^{-1}(\hat{\beta}_{FE} - \hat{\beta}_{RE})$$

Under $H_0$ (both estimators are consistent and RE is efficient), $H$ is asymptotically $\chi^2_k$, where $k$ is the number of time-varying regressors being compared.
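A minimal sketch of the Hausman formula for the single-regressor case, using the FE and RE rows of the simulation table earlier on this page (2.027 with SE 0.181; 2.074 with SE 0.132). The p-value helper relies on the identity $P(\chi^2_1 > h) = \operatorname{erfc}(\sqrt{h/2})$, which holds only for $k = 1$; for $k > 1$ a chi-squared survival function such as `scipy.stats.chi2.sf` would be needed. In finite samples the variance difference need not be positive definite, in which case this classical form breaks down. Both function names are hypothetical.

```python
import math

import numpy as np


def hausman_stat(b_fe, b_re, v_fe, v_re):
    """Classical Hausman statistic H = d' [V_FE - V_RE]^{-1} d, with d = b_FE - b_RE."""
    d = np.atleast_1d(np.asarray(b_fe, dtype=float) - np.asarray(b_re, dtype=float))
    v_diff = np.atleast_2d(np.asarray(v_fe, dtype=float) - np.asarray(v_re, dtype=float))
    return float(d @ np.linalg.solve(v_diff, d))


def chi2_1df_pvalue(h):
    """P(chi2_1 > h) = P(|Z| > sqrt(h)) = erfc(sqrt(h / 2)); valid only for one degree of freedom."""
    return math.erfc(math.sqrt(h / 2.0))


# One time-varying regressor: FE 2.027 (SE 0.181) vs. RE 2.074 (SE 0.132)
H = hausman_stat(2.027, 2.074, 0.181**2, 0.132**2)
p_value = chi2_1df_pvalue(H)
print(f"H = {H:.3f}, p = {p_value:.2f}")
```

With these inputs the sketch prints H = 0.144 and p = 0.70.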


E. Implementation

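library(plm)
library(fixest)

# Set up panel data (one row per country-year)
pdata <- pdata.frame(df, index = c("country_id", "year"))

# Random effects; this example treats institutions as (nearly) time-invariant,
# so RE rather than FE is needed to estimate its coefficient
re_fit <- plm(growth ~ trade_openness + human_capital + institutions,
              data = pdata, model = "random")
summary(re_fit)

# Fixed effects: time-invariant regressors are absorbed by the country effects
fe_fit <- plm(growth ~ trade_openness + human_capital,
              data = pdata, model = "within")

# Hausman test on the coefficients the two models share
phtest(fe_fit, re_fit)

# Mundlak (correlated RE): add country means of the time-varying regressors
df$mean_trade <- ave(df$trade_openness, df$country_id)  # ave() returns group means by default
df$mean_hcap  <- ave(df$human_capital, df$country_id)
mundlak <- feols(growth ~ trade_openness + human_capital + institutions +
                   mean_trade + mean_hcap,
                 data = df, cluster = ~country_id)
summary(mundlak)

# Joint Wald test that the coefficients on the country means are zero
wald(mundlak, keep = "^mean_")
Requires: plm, fixest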

F. Diagnostics

The Hausman Test in Practice

Run the Hausman test and report the statistic and p-value. But interpret it carefully:

  • p < 0.05: Reject RE. Use FE. The unobserved effect likely correlates with regressors.
  • p > 0.05: Cannot reject RE. But this does not prove RE is correct — you may simply lack power.

The Mundlak Test (A Better Alternative)

The Mundlak (1978) approach adds the group means of the time-varying regressors to the RE model. If the coefficients on the group means are jointly significant, the RE assumption is violated. This test is asymptotically equivalent to the classical Hausman test under homoskedasticity, but it is more intuitive and easier to make robust to heteroskedasticity and within-unit serial correlation (for example, with clustered SEs). In those settings the classical Hausman test is invalid, because it assumes RE is fully efficient.
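A sketch of why the Mundlak device works, on simulated data in which $\alpha_i$ is correlated with $x$ (all values illustrative). For brevity it uses pooled OLS on the augmented model instead of RE GLS; in a balanced panel its coefficient on $x_{it}$ reproduces the FE slope exactly, and the coefficient on the group mean equals the between slope minus the FE slope, so a nonzero mean coefficient signals that the RE assumption fails. In practice you would test that coefficient with clustered SEs; this sketch only shows the point estimates.

```python
import numpy as np

rng = np.random.default_rng(7)
n_units, T = 300, 6

# Illustrative panel in which alpha_i is correlated with x (RE assumption fails)
xi = rng.normal(size=(n_units, 1))
x = xi + rng.normal(size=(n_units, T))
alpha = 0.8 * xi + rng.normal(scale=0.5, size=(n_units, 1))
y = 2.0 * x + alpha + rng.normal(size=(n_units, T))

xbar_i = x.mean(axis=1, keepdims=True)
ybar_i = y.mean(axis=1, keepdims=True)

# FE (within) slope and between slope (regression of unit means with an intercept)
beta_fe = np.sum((x - xbar_i) * (y - ybar_i)) / np.sum((x - xbar_i) ** 2)
beta_be = np.polyfit(xbar_i.ravel(), ybar_i.ravel(), 1)[0]

# Mundlak: pooled OLS of y on [1, x_it, xbar_i]; rows stacked unit by unit
design = np.column_stack([np.ones(n_units * T), x.ravel(), np.repeat(xbar_i.ravel(), T)])
_, beta_x, gamma_mean = np.linalg.lstsq(design, y.ravel(), rcond=None)[0]

print(f"FE slope       = {beta_fe:.4f}")
print(f"Mundlak x coef = {beta_x:.4f}")
print(f"mean coef      = {gamma_mean:.4f}  (BE - FE = {beta_be - beta_fe:.4f})")
```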

Breusch-Pagan LM Test

Tests whether the variance of the unit effect ($\sigma^2_\alpha$) is zero. If you cannot reject $\sigma^2_\alpha = 0$, there is no evidence of unit effects and pooled OLS may be adequate. If you reject in favor of $\sigma^2_\alpha > 0$, use FE or RE.
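A minimal sketch of the Breusch-Pagan LM statistic for a balanced panel, computed from pooled-OLS residuals $e_{it}$ as $LM = \frac{NT}{2(T-1)}\left[\frac{\sum_i (\sum_t e_{it})^2}{\sum_i \sum_t e_{it}^2} - 1\right]^2$, which is approximately $\chi^2_1$ under $H_0$. The simulated data and both helper names are illustrative; unbalanced panels need a modified formula.

```python
import numpy as np


def breusch_pagan_lm(resid):
    """Breusch-Pagan LM statistic for H0: sigma2_alpha = 0 in a balanced panel.

    resid: (N, T) array of pooled-OLS residuals. Under H0, LM is approximately chi-squared(1).
    """
    n_units, T = resid.shape
    ratio = np.sum(resid.sum(axis=1) ** 2) / np.sum(resid**2)
    return n_units * T / (2.0 * (T - 1)) * (ratio - 1.0) ** 2


def pooled_ols_residuals(x, y):
    """Residuals from pooled OLS of y on [1, x], reshaped back to (N, T)."""
    design = np.column_stack([np.ones(x.size), x.ravel()])
    coef = np.linalg.lstsq(design, y.ravel(), rcond=None)[0]
    return (y.ravel() - design @ coef).reshape(y.shape)


rng = np.random.default_rng(3)
n_units, T = 200, 5
x = rng.normal(size=(n_units, T))
for sigma_alpha in (0.0, 1.0):
    alpha = rng.normal(scale=sigma_alpha, size=(n_units, 1))
    y = 1.0 + 2.0 * x + alpha + rng.normal(size=(n_units, T))
    lm = breusch_pagan_lm(pooled_ols_residuals(x, y))
    print(f"sigma_alpha = {sigma_alpha}: LM = {lm:.1f} (5% critical value of chi2_1 is 3.84)")
```

With no unit effects the statistic behaves like a $\chi^2_1$ draw; with $\sigma_\alpha = 1$ it is far above the critical value.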


Interpreting Results

  • RE coefficients on time-varying variables reflect both within-unit and between-unit variation. They are efficient but potentially biased if the RE assumption fails.
  • RE coefficients on time-invariant variables (like geography or gender) are identified entirely from between-unit variation. They are only valid if the RE assumption holds.
  • If FE and RE estimates are substantively similar (not just statistically indistinguishable), this agreement is reassuring. Report both.
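  • The Mundlak approach combines the strengths of both: FE-consistent estimates of time-varying effects plus estimates of time-invariant effects, all in one model. The time-invariant estimates still require that those variables be uncorrelated with the part of the unit effect not explained by the group means.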

G. What Can Go Wrong

Problem | What It Does | How to Fix It
Violated RE assumption | Coefficients are biased and inconsistent | Use FE or Mundlak CRE
Hausman test lacks power | Fails to reject RE even when it should | Use the Mundlak test; rely on economic reasoning
Interpreting RE as causal without justification | Reviewers will flag this immediately | Explicitly defend the uncorrelatedness assumption or use FE
Confusion with multilevel/HLM models | Same estimator, different jargon | Recognize they are equivalent; use whichever framing your audience expects
Using RE because FE "eats up too much variation" | Not a valid justification for RE | Low within-unit variation means low power, but FE is still consistent; RE gains efficiency at the cost of potential bias
Assumption Failure Demo

Violated RE Assumption (Correlated Unit Effects)

The unobserved country effect (e.g., institutional quality) is uncorrelated with regressors (trade openness, human capital)

RE estimate of trade openness effect: 0.032 (true effect: 0.030). Hausman test p = 0.61. RE is efficient and consistent.

Assumption Failure Demo

Relying on a Low-Power Hausman Test

Hausman test has adequate power due to sufficient within-unit variation and sample size

Hausman statistic = 18.4 (p = 0.001). Clear rejection of RE. Researcher uses FE, avoiding bias.

Assumption Failure Demo

Using RE to Estimate Time-Invariant Effects Without Justification

RE assumption is carefully defended: in an RCT with panel data, randomization ensures the unit effect is uncorrelated with treatment

RE estimate of the time-invariant treatment effect: 0.15 (SE = 0.04). Valid because randomization ensures that treatment is uncorrelated with the unit effect.

Concept Check

You run a Hausman test comparing FE and RE estimates. The test statistic is 3.2 with 4 degrees of freedom (p = 0.52). What is the correct interpretation?


H. Practice

Concept Check

A researcher wants to estimate the effect of a country's legal origin (common law vs. civil law) on economic growth using panel data. She uses random effects because 'legal origin is time-invariant and cannot be estimated with fixed effects.' Is this a valid justification for RE?

Concept Check

You estimate both FE and RE models. The FE coefficient on trade openness is 0.03 (SE = 0.01) and the RE coefficient is 0.09 (SE = 0.006). What does the large difference between these estimates suggest?

Concept Check

A Hausman test comparing FE and RE gives a test statistic of 2.1 with 4 degrees of freedom (p = 0.72). A colleague concludes: 'The RE assumption is satisfied, so RE is definitely the right model.' What is wrong with this reasoning?

Concept Check

In an RCT with panel data, a researcher uses random effects rather than fixed effects. Is this defensible?

Guided Exercise

A study of cross-country growth uses RE because the key variable of interest (colonial legal origin) is time-invariant. The Hausman test gives p = 0.23. A Mundlak test (adding group means of trade and human capital to the RE model) shows both group means are significant at the 5% level. What should the researcher do?

The Hausman and Mundlak tests disagree. Which is more reliable, and what estimator should be used?

What does the Hausman test conclude about RE?

What does the Mundlak test conclude about RE?

Given the conflicting tests, what estimator should the researcher use?

Error Detective

Read the analysis below carefully and identify the errors.

A researcher studies the effect of democracy (a slowly varying variable) on economic growth using panel data for 80 countries over 20 years. They estimate a random effects model because "fixed effects removes all the cross-country variation in democracy, leaving insufficient within-country variation." They report:

xtreg growth democracy trade human_capital, re

Coefficient on democracy: 0.45 (SE = 0.12, p < 0.001). They write: "The Hausman test gives p = 0.08, which does not reject RE at the 5% level. Therefore, RE is the appropriate estimator and democracy has a positive causal effect on growth."

Select all errors you can find:

Error Detective

Read the analysis below carefully and identify the errors.

A health economist studies the effect of hospital staffing ratios on patient outcomes across 500 hospitals over 10 years. They estimate both FE and RE models. The FE estimate of nurse-to-patient ratio on mortality is -0.03 (SE = 0.02, p = 0.13). The RE estimate is -0.08 (SE = 0.01, p < 0.001). They report: "The RE model is preferred because (1) the Hausman test does not reject (p = 0.19), and (2) the RE estimate is more precisely estimated. We conclude that increasing nurse staffing significantly reduces mortality."

Select all errors you can find:

Referee Exercise

Read the paper summary below and write a brief referee critique (2-3 sentences) of the identification strategy.

Paper Summary

A study examines the effect of teacher certification type (traditional vs. alternative) on student test scores. Using panel data from 2,000 schools over 8 years, the authors estimate a random effects model because certification type varies mostly between schools, not within schools over time. They argue RE is necessary to identify the effect of this near-time-invariant variable. The Hausman test gives p = 0.31. They find that traditional certification raises math scores by 0.15 standard deviations (p < 0.01).

Key Table

Variable | Coefficient | SE | p-value
Traditional cert. share | 0.150 | 0.042 | 0.000
Student-teacher ratio | -0.030 | 0.011 | 0.006
% Free lunch | -0.220 | 0.018 | 0.000
School RE | Yes
Year FE | Yes
Hausman test p-value | 0.31
N (school-years) | 16,000

Authors' Identification Claim

The Hausman test does not reject RE, supporting our use of the random effects estimator. Because certification type is nearly time-invariant, FE would absorb most of the identifying variation and produce imprecise estimates.


I. Swap-In: When to Use Something Else

  • Fixed effects: When the RE assumption (unit effects uncorrelated with regressors) is implausible — FE allows arbitrary correlation between unit effects and covariates at the cost of discarding between-unit variation.
  • Correlated Random Effects (Mundlak): When you want to test the RE assumption while still estimating time-invariant effects — add group means of time-varying regressors to the RE model.
  • First differencing: Equivalent to FE with two periods. With more periods, FE is generally more efficient unless errors follow a random walk.
  • Arellano-Bond GMM: For dynamic panels with a lagged dependent variable, where both FE and RE are biased.
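The two-period equivalence in the first-differencing item can be checked directly. This sketch assumes a simulated two-period panel with no time effects (so the FD regression has no intercept); with $T = 2$, the demeaned values are $\pm\Delta x/2$ and $\pm\Delta y/2$, so the two slopes coincide exactly.

```python
import numpy as np

rng = np.random.default_rng(5)
n_units = 400

# Illustrative two-period panel in which x is correlated with alpha
alpha = rng.normal(size=n_units)
x = alpha[:, None] + rng.normal(size=(n_units, 2))
y = 1.5 * x + alpha[:, None] + rng.normal(size=(n_units, 2))

# First differences (no intercept): regress dy on dx
dx, dy = x[:, 1] - x[:, 0], y[:, 1] - y[:, 0]
beta_fd = np.sum(dx * dy) / np.sum(dx**2)

# Fixed effects (within): regress demeaned y on demeaned x
x_dm = x - x.mean(axis=1, keepdims=True)
y_dm = y - y.mean(axis=1, keepdims=True)
beta_fe = np.sum(x_dm * y_dm) / np.sum(x_dm**2)

print(f"FD = {beta_fd:.6f}, FE = {beta_fe:.6f}")
```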

J. Reviewer Checklist

Critical Reading Checklist


Paper Library

Foundational (6)

Laird, N. M., & Ware, J. H. (1982). Random-Effects Models for Longitudinal Data.

Biometrics. DOI: 10.2307/2529876

Laird and Ware developed the general framework for random-effects models in longitudinal data, integrating fixed population parameters with random individual-level effects. This paper is foundational for the mixed-effects modeling approach widely used in biostatistics and social sciences.

Hausman, J. A. (1978). Specification Tests in Econometrics.

Econometrica. DOI: 10.2307/1913827

Hausman developed the specification test that compares fixed-effects and random-effects estimates to determine whether the random-effects assumption of no correlation between unobserved heterogeneity and regressors is valid. The Hausman test is a standard diagnostic in virtually all panel data analyses.

Mundlak, Y. (1978). On the Pooling of Time Series and Cross Section Data.

Econometrica. DOI: 10.2307/1913646

Mundlak proposed adding group means of time-varying covariates to the random-effects model, which produces the same slope estimates as fixed effects while retaining the ability to estimate coefficients on time-invariant variables. The 'Mundlak device' or correlated random-effects approach remains widely used.

Bell, A., & Jones, K. (2015). Explaining Fixed Effects: Random Effects Modeling of Time-Series Cross-Sectional and Panel Data.

Political Science Research and Methods. DOI: 10.1017/psrm.2014.7

Bell and Jones argued that the 'within-between' random-effects model (essentially the Mundlak approach) is often superior to pure fixed effects because it allows explicit decomposition of within- and between-unit effects while accounting for unobserved heterogeneity.

Wooldridge, J. M. (2019). Correlated Random Effects Models with Unbalanced Panels.

Journal of Econometrics. DOI: 10.1016/j.jeconom.2018.12.010

Wooldridge extended the correlated random effects (CRE) framework to handle unbalanced panels, which are the norm in applied research. This paper shows how to combine the flexibility of fixed effects with the ability to estimate effects of time-invariant variables, making the CRE approach practical for real-world datasets.

Hausman, J. A., & Taylor, W. E. (1981). Panel Data and Unobservable Individual Effects.

Econometrica. DOI: 10.2307/1911406

Hausman and Taylor developed an instrumental variables estimator for panel data that allows consistent estimation of coefficients on time-invariant variables even when individual effects are correlated with some regressors. The Hausman-Taylor estimator occupies a middle ground between fixed effects (which cannot estimate time-invariant coefficients) and random effects (which requires every regressor to be uncorrelated with the individual effects).

Application (5)

Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical Linear Models: Applications and Data Analysis Methods.

SAGE Publications

This influential textbook popularized hierarchical linear models (HLM), which are random-effects models for nested data structures such as students within schools. It became the standard reference for multilevel modeling in education, psychology, and organizational research.

Hofmann, D. A. (1997). An Overview of the Logic and Rationale of Hierarchical Linear Models.

Journal of Management. DOI: 10.1177/014920639702300602

Hofmann introduced hierarchical linear models to the management research community, explaining when and why multilevel random-effects models are appropriate for organizational data with nested structures. This tutorial was highly influential in promoting multilevel methods in management journals.

Aguinis, H., Gottfredson, R. K., & Culpepper, S. A. (2013). Best-Practice Recommendations for Estimating Cross-Level Interaction Effects Using Multilevel Modeling.

Journal of Management. DOI: 10.1177/0149206313478188

This paper provided detailed guidance for management researchers on estimating cross-level interaction effects in multilevel models, addressing common problems such as insufficient statistical power, centering decisions, and effect size reporting.

Peterson, M. F., Arregle, J.-L., & Martin, X. (2012). Multilevel Models in International Business Research.

Journal of International Business Studies. DOI: 10.1057/jibs.2011.59

This editorial reviewed the use of multilevel random-effects models in international business research, where firms are nested within countries. It discussed best practices for modeling cross-level effects and the importance of accounting for the hierarchical structure of international data.

Islam, N. (1995). Growth Empirics: A Panel Data Approach.

Quarterly Journal of Economics. DOI: 10.2307/2946651

Islam applied panel data methods—including random effects and fixed effects—to the cross-country growth regression framework, showing that accounting for unobserved country heterogeneity substantially changes estimates of convergence rates. This paper demonstrated the importance of choosing between fixed and random effects in macroeconomic growth empirics.

Survey (6)

Clark, T. S., & Linzer, D. A. (2015). Should I Use Fixed or Random Effects?

Political Science Research and Methods. DOI: 10.1017/psrm.2014.32

Clark and Linzer provided practical guidance on choosing between fixed and random effects, arguing the decision depends on the research question, sample size, and the degree of correlation between unit effects and covariates rather than simply defaulting to one approach.

Allison, P. D. (2009). Fixed Effects Regression Models.

SAGE Publications. DOI: 10.4135/9781412993869

Allison's concise and accessible monograph compares fixed effects and random effects models for panel data, providing practical guidance on model selection, estimation, and interpretation. It is particularly useful for social scientists seeking an intuitive understanding of when each approach is appropriate.

Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data.

MIT Press

Chapters 10–11 provide a rigorous treatment of random effects estimation, GLS, and the relationship between FE and RE in panel data. The standard graduate-level reference for panel data econometrics.

Baltagi, B. H. (2021). Econometric Analysis of Panel Data.

Springer

The standard graduate-level textbook on panel data econometrics, covering error component models, random effects, and extensions to unbalanced panels and dynamic models. Provides comprehensive treatment of both theoretical foundations and practical implementation.

Angrist, J. D., & Pischke, J.-S. (2009). Mostly Harmless Econometrics: An Empiricist's Companion.

Princeton University Press. DOI: 10.1515/9781400829828

Chapter 5 provides an accessible argument for preferring fixed effects over random effects in most applied settings, with clear discussion of when RE may be appropriate. This widely read textbook shaped how a generation of applied researchers think about the FE vs. RE choice.

Rabe-Hesketh, S., & Skrondal, A. (2012). Multilevel and Longitudinal Modeling Using Stata.

Stata Press

A comprehensive practical guide to multilevel (hierarchical) models in Stata, which generalize the random effects framework to more complex nested data structures. Essential reference for applied researchers implementing multilevel models.

Tags

panel, continuous-outcome