Random Effects
A more efficient alternative to fixed effects when the unobserved effect is uncorrelated with regressors.
Quick Reference
- When to Use: When you believe the unobserved unit effect is uncorrelated with all regressors, and you want to estimate effects of time-invariant variables that FE cannot identify.
- Key Assumption: The unobserved individual effect is uncorrelated with all regressors in all time periods. Check this with the Hausman test: if it rejects, FE is preferred.
- Common Mistake: Using RE when the Hausman test rejects, which indicates that the RE assumption is violated and FE should be preferred. Also, not considering the Mundlak/correlated RE approach as a diagnostic.
- Estimated Time: 2 hours
One-Line Implementation
- Stata: `xtreg y x1 x2, re vce(robust)`
- R: `plm::plm(y ~ x1 + x2, data = df, index = c('id', 't'), model = 'random')`
- Python (linearmodels): `RandomEffects(df['y'], df[['x1', 'x2']]).fit()`
Motivating Example: Cross-Country Growth Regressions
You want to understand what drives economic growth across countries. Your panel has 100 countries observed over 30 years, and you are interested in the effects of institutions, trade openness, and human capital on GDP growth.
The challenge is that some of your most important variables — geography, colonial history, legal origin — are time-invariant. A fixed effects model would absorb all of these into the country fixed effects, making them impossible to estimate. But these time-invariant variables are precisely the ones you care about.
If you are willing to assume that the unobserved country-level factors (captured by the country effect $\alpha_i$) are uncorrelated with your regressors, the random effects model lets you estimate the effects of both time-varying and time-invariant variables. This assumption is strong, but in some settings it is defensible, and the payoff is substantial.
(Islam, 1995)
A. Overview: What Random Effects Does
The Model
The panel data model is the same as in fixed effects:
$$Y_{it} = X_{it}'\beta + Z_i'\gamma + \alpha_i + \varepsilon_{it}$$
where $Z_i$ are time-invariant covariates (like geography) and $\alpha_i$ is the unobserved unit effect. The key difference is in the assumption about $\alpha_i$.
- Fixed effects treats $\alpha_i$ as a parameter to be estimated (or differenced away) and allows it to be arbitrarily correlated with $X_{it}$.
- Random effects treats $\alpha_i$ as a random variable drawn from a distribution, and assumes it is uncorrelated with $X_{it}$ and $Z_i$.
Why Does the RE Assumption Matter?
If $\alpha_i$ is uncorrelated with the regressors, RE produces more efficient estimates (smaller standard errors) than FE. Intuitively, RE uses both within-unit and between-unit variation, while FE discards the between-unit information entirely.
RE also allows you to estimate the coefficients on time-invariant variables ($\gamma$), which FE cannot.
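A minimal base-R sketch of why FE cannot identify the coefficients on time-invariant variables: the within transformation subtracts each unit's own mean, which wipes out anything that is constant within a unit. The simulated data, variable names, and coefficient values here are illustrative assumptions, not part of the growth example.

```r
set.seed(1)
n_units <- 50; n_t <- 5                      # hypothetical panel dimensions
id    <- rep(1:n_units, each = n_t)
z     <- rep(rnorm(n_units), each = n_t)     # time-invariant covariate (e.g., geography)
x     <- rnorm(n_units * n_t)                # time-varying covariate
alpha <- rep(rnorm(n_units), each = n_t)     # unit effect, drawn independently of x and z
y     <- 1 + 2 * x + 0.5 * z + alpha + rnorm(n_units * n_t)

# Within (FE) transformation: subtract each unit's own mean
demean <- function(v) v - ave(v, id)

z_within <- demean(z)
max(abs(z_within))     # numerically zero: z has no within-unit variation left

# The within estimator still recovers the slope on the time-varying x
fe_slope <- sum(demean(x) * demean(y)) / sum(demean(x)^2)
fe_slope
```

Because the demeaned `z` is a column of zeros, no regression on within-transformed data can attach a coefficient to it, whereas the slope on `x` is still estimable.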
The RE Estimator as a Weighted Average
The RE estimator is a matrix-weighted average of the within (FE) estimator and the between estimator. In the balanced-panel, single-regressor case, this simplifies to a scalar weight:
$$\hat{\beta}_{RE} = \lambda\,\hat{\beta}_{W} + (1 - \lambda)\,\hat{\beta}_{B}$$
where $\lambda \in [0, 1]$ depends on the ratio of the idiosyncratic variance $\sigma_\varepsilon^2$ to the unit-effect variance $\sigma_\alpha^2$ and the number of time periods $T$ (as well as on how much the regressor varies within versus between units). When $\sigma_\alpha^2$ is large (lots of persistent between-unit heterogeneity), RE puts more weight on the within estimator and approaches FE. In the general multivariate or unbalanced-panel case, $\lambda$ becomes a matrix weight and the expression above no longer applies as a simple scalar formula.
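The weighted-average identity can be checked numerically. The sketch below assumes a balanced panel with one regressor and, to keep things short, treats the variance components as known (a feasible RE estimator would estimate them first). The symbols `lambda` and `psi` and all simulated values are illustrative choices.

```r
set.seed(42)
n_units <- 200; n_t <- 5
sigma_a <- 1.5; sigma_e <- 1                               # assumed known error-component SDs
id    <- rep(1:n_units, each = n_t)
x     <- rnorm(n_units * n_t) + rep(rnorm(n_units), each = n_t)  # varies within and between
alpha <- rep(rnorm(n_units, sd = sigma_a), each = n_t)     # independent of x: RE assumption holds
y     <- 1 + 2 * x + alpha + rnorm(n_units * n_t, sd = sigma_e)

xbar <- ave(x, id); ybar <- ave(y, id)

# Within and between estimators
S_w       <- sum((x - xbar)^2)
S_b       <- sum((xbar - mean(x))^2)
b_within  <- sum((x - xbar) * (y - ybar)) / S_w
b_between <- sum((xbar - mean(x)) * (ybar - mean(y))) / S_b

# RE as OLS on quasi-demeaned data
theta <- 1 - sqrt(sigma_e^2 / (sigma_e^2 + n_t * sigma_a^2))
x_q   <- x - theta * xbar
y_q   <- y - theta * ybar
b_re  <- unname(coef(lm(y_q ~ x_q))[2])

# Scalar weight on the within estimator
psi    <- sigma_e^2 / (sigma_e^2 + n_t * sigma_a^2)        # equals (1 - theta)^2
lambda <- S_w / (S_w + psi * S_b)
b_avg  <- lambda * b_within + (1 - lambda) * b_between

c(re = b_re, weighted_average = b_avg, lambda = lambda)
```

The two numbers agree up to floating-point error. Raising `sigma_a` shrinks `psi`, pushes `lambda` toward 1, and moves the RE estimate toward the within estimate.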
B. Identification: The RE Assumption
The Core Assumption
The unobserved unit effect $\alpha_i$ is uncorrelated with all regressors in all time periods: $\mathrm{Cov}(\alpha_i, X_{it}) = 0$ for every $t$, and $\mathrm{Cov}(\alpha_i, Z_i) = 0$. This condition amounts to saying there is no selection bias from unobserved time-invariant factors.
When Is the RE Assumption Plausible?
RE is most defensible when:
- Treatment is randomly assigned — In an RCT with panel data, the unit effect is, by design, uncorrelated with treatment. RE is efficient here.
- The between and within estimates agree — If the Hausman test does not reject, and the coefficients are substantively similar, RE may be appropriate.
- You are studying time-invariant variables: If the research question demands estimating time-invariant effects, FE cannot identify them, so you need RE or a related approach such as Mundlak's correlated RE or the Hausman-Taylor estimator.
- Prediction is the goal — If you want to predict outcomes (not estimate causal effects), RE can be more useful than FE.
When Is It NOT Plausible?
RE is rarely credible when:
- Selection into treatment is likely — If units self-select into treatment based on unobserved characteristics.
- Cross-country regressions with endogenous institutions — Country-level unobservables (culture, geography) likely affect both institutions and growth.
- The Hausman test strongly rejects — A large test statistic means FE and RE estimates differ substantially, and FE is the safer choice.
C. Visual Intuition
Recall the FE picture: each unit has its own intercept, and FE fits separate within-unit regression lines. RE does something subtler. It partially pools the unit-specific intercepts toward the overall mean. Units with more observations (or more precise data) keep more of their own intercept; units with less data are pulled more toward the grand mean.
Think of RE as a compromise: it does not fully trust the between-unit comparisons (like pooled OLS), but it does not fully discard them either (like FE). It uses the data to find a well-suited blend.
This partial pooling is exactly what hierarchical/multilevel models do. RE and hierarchical linear models (HLM) are the same thing, just with different jargon. Econometricians say "random effects." Psychometricians and education researchers say "multilevel models." The math is identical.
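Partial pooling can be illustrated with an intercept-only model. The sketch below assumes the variance components are known and uses a hypothetical unbalanced panel, so the shrinkage weight for each unit is $T_i\sigma_\alpha^2 / (T_i\sigma_\alpha^2 + \sigma_\varepsilon^2)$. All numbers are made up for illustration.

```r
set.seed(7)
sigma_a <- 1; sigma_e <- 2                   # assumed known for illustration
T_i     <- c(2, 2, 5, 5, 20, 20)             # hypothetical observations per unit
n_units <- length(T_i)
id      <- rep(seq_len(n_units), times = T_i)
alpha   <- rnorm(n_units, sd = sigma_a)
y       <- 10 + alpha[id] + rnorm(length(id), sd = sigma_e)

# No pooling: each unit's own mean (what a unit-specific intercept would fit)
ybar_i <- as.numeric(tapply(y, id, mean))

# GLS estimate of the grand mean: units with more data get more weight
w      <- 1 / (sigma_a^2 + sigma_e^2 / T_i)
mu_hat <- sum(w * ybar_i) / sum(w)

# Partial pooling: fraction of each unit's deviation from the grand mean that is kept
shrink       <- T_i * sigma_a^2 / (T_i * sigma_a^2 + sigma_e^2)
re_intercept <- mu_hat + shrink * (ybar_i - mu_hat)

data.frame(T_i, shrink = round(shrink, 2), unit_mean = ybar_i, re_intercept)
```

Units with only two observations keep about a third of their deviation from the grand mean, while units with twenty observations keep most of it.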
RE vs FE Trade-off
Random effects is efficient when between-group confounding is absent, but becomes biased as confounding grows. FE remains unbiased throughout, at the cost of precision.
Computed results: RE estimate = 2.00; FE estimate (unbiased) = 2.00; RE efficiency gain = 30.0%.
When Random Effects vs Fixed Effects?
Panel DGP: Yᵢₜ = αᵢ + 2.0 · Xᵢₜ + εᵢₜ, where αᵢ depends on X (endogeneity = 0.0). 10 units × 8 periods. Hausman statistic = 0.14.
Estimation Results
| Estimator | β̂ | SE | 95% CI | Bias |
|---|---|---|---|---|
| Pooled OLS | 2.119 | 0.080 | [1.96, 2.27] | +0.119 |
| Fixed Effects (closest) | 2.027 | 0.181 | [1.67, 2.38] | +0.027 |
| Random Effects | 2.074 | 0.132 | [1.81, 2.33] | +0.074 |
| True β | 2.000 | — | — | — |
Why the difference?
With near-zero endogeneity between unit effects and X, Random Effects is consistent AND more efficient than Fixed Effects. RE uses both within and between variation, yielding smaller standard errors (SE = 0.132 vs FE SE = 0.181). Hausman test statistic = 0.14 (p ≈ 0.70 with 1 degree of freedom). We fail to reject: RE appears consistent and is preferred for efficiency.
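The trade-off can be reproduced with a small Monte Carlo. This is a sketch under stated assumptions, not the code behind the numbers above: the unit effect is built as `rho` times the unit-level component of X plus noise, RE is computed as OLS on data from which a fraction `theta` of each unit mean has been subtracted, and `theta` uses the simulated (known) variances. Panel size, replication count, and parameter values are arbitrary choices.

```r
set.seed(123)
n_units <- 100; n_t <- 5; beta <- 2; reps <- 300
id <- rep(1:n_units, each = n_t)

one_draw <- function(rho) {
  m     <- rnorm(n_units)                         # unit-level component of x
  x     <- rep(m, each = n_t) + rnorm(n_units * n_t)
  alpha <- rho * m + rnorm(n_units)               # rho = 0: RE assumption holds
  y     <- beta * x + rep(alpha, each = n_t) + rnorm(n_units * n_t)
  xbar  <- ave(x, id); ybar <- ave(y, id)
  fe    <- sum((x - xbar) * (y - ybar)) / sum((x - xbar)^2)
  # Var(alpha) = rho^2 + 1 and Var(eps) = 1 by construction
  theta <- 1 - sqrt(1 / (1 + n_t * (rho^2 + 1)))
  x_q   <- x - theta * xbar
  y_q   <- y - theta * ybar
  re    <- cov(x_q, y_q) / var(x_q)               # OLS slope with an intercept
  c(fe = fe, re = re)
}

summarise_rho <- function(rho) {
  draws <- replicate(reps, one_draw(rho))         # 2 x reps matrix
  c(rho     = rho,
    fe_bias = mean(draws["fe", ]) - beta,
    re_bias = mean(draws["re", ]) - beta,
    fe_sd   = sd(draws["fe", ]),
    re_sd   = sd(draws["re", ]))
}

results <- t(sapply(c(0, 0.5, 1), summarise_rho))
round(results, 3)
```

Reading across rows, the FE bias should stay near zero at every `rho`, while the RE bias grows as `rho` moves away from zero. The `fe_sd` and `re_sd` columns show the sampling spread of each estimator in this design.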
D. Mathematical Derivation
Don't worry about the notation yet — here's what this means in words: RE uses a partial demeaning (quasi-demeaning) that optimally blends within-unit and between-unit variation, depending on how much of the total variance is due to the unit effect.
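The composite error is $v_{it} = \alpha_i + \varepsilon_{it}$. Under RE assumptions:
$$\mathrm{Var}(v_{it}) = \sigma_\alpha^2 + \sigma_\varepsilon^2, \qquad \mathrm{Cov}(v_{it}, v_{is}) = \sigma_\alpha^2 \quad (t \neq s)$$
The correlation structure within units motivates GLS. The RE estimator applies a quasi-demeaning transformation:
$$Y_{it} - \theta\bar{Y}_i = (X_{it} - \theta\bar{X}_i)'\beta + (1 - \theta)\,Z_i'\gamma + (v_{it} - \theta\bar{v}_i)$$
where
$$\theta = 1 - \sqrt{\frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + T\sigma_\alpha^2}}$$
This scalar expression holds for balanced panels; with unbalanced panels, $\theta_i$ varies with each unit's number of periods $T_i$. In general, RE applies the matrix transformation $\Omega^{-1/2}$ to the stacked data, where $\Omega$ is the block-diagonal error covariance matrix.
When $\sigma_\alpha^2$ is large (lots of permanent unit differences), $\theta \to 1$, and RE approaches FE (full demeaning). When $\sigma_\alpha^2 = 0$ (no unit effects), $\theta = 0$, and RE reduces to pooled OLS.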
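A short base-R check of these limits, assuming the balanced-panel weight $\theta = 1 - \sqrt{\sigma_\varepsilon^2 / (\sigma_\varepsilon^2 + T\sigma_\alpha^2)}$ and a per-unit error covariance $\Omega_i = \sigma_\varepsilon^2 I_T + \sigma_\alpha^2 \iota\iota'$. The function name `theta_re` and the variance values are hypothetical. The last lines confirm that quasi-demeaning removes the within-unit correlation, which is why OLS on the transformed data behaves like GLS.

```r
# Quasi-demeaning weight for a balanced panel with n_t periods
theta_re <- function(sigma_a2, sigma_e2, n_t) {
  1 - sqrt(sigma_e2 / (sigma_e2 + n_t * sigma_a2))
}

theta_re(sigma_a2 = 0,   sigma_e2 = 1, n_t = 5)   # no unit effects: pooled OLS
theta_re(sigma_a2 = 1,   sigma_e2 = 1, n_t = 5)   # partial demeaning
theta_re(sigma_a2 = 1e6, sigma_e2 = 1, n_t = 5)   # dominant unit effects: close to FE

# Per-unit covariance of the composite error for n_t = 3
n_t <- 3; sigma_a2 <- 2; sigma_e2 <- 1
Omega_i <- sigma_e2 * diag(n_t) + sigma_a2 * matrix(1, n_t, n_t)
cov2cor(Omega_i)[1, 2]            # within-unit correlation sigma_a2 / (sigma_a2 + sigma_e2)

# Quasi-demeaning matrix: subtract theta times the unit mean
theta <- theta_re(sigma_a2, sigma_e2, n_t)
P     <- diag(n_t) - theta * matrix(1 / n_t, n_t, n_t)
round(P %*% Omega_i %*% P, 10)    # sigma_e2 times the identity: errors are uncorrelated
```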
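The RE estimator is then OLS on the quasi-demeaned data. With quasi-demeaning weight $\theta$, write $\tilde{Y}_{it} = Y_{it} - \theta\bar{Y}_i$ and stack the transformed regressors as $\tilde{W}_{it} = \big((X_{it} - \theta\bar{X}_i)',\ (1 - \theta)Z_i'\big)'$:
$$\begin{pmatrix}\hat{\beta}_{RE} \\ \hat{\gamma}_{RE}\end{pmatrix} = \Big(\sum_i \sum_t \tilde{W}_{it}\tilde{W}_{it}'\Big)^{-1} \sum_i \sum_t \tilde{W}_{it}\tilde{Y}_{it}$$
Hausman test statistic:
$$H = (\hat{\beta}_{FE} - \hat{\beta}_{RE})'\,\big[\widehat{\mathrm{Var}}(\hat{\beta}_{FE}) - \widehat{\mathrm{Var}}(\hat{\beta}_{RE})\big]^{-1}\,(\hat{\beta}_{FE} - \hat{\beta}_{RE})$$
Under $H_0$ (RE is consistent), $H \sim \chi^2_K$ asymptotically, where $K$ is the number of time-varying regressors.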
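The Hausman statistic is easy to compute by hand once both sets of estimates are available. The helper below is a hypothetical sketch (the name `hausman` is not a library function); it takes coefficient vectors and covariance matrices for the time-varying regressors. The example call plugs in the single-regressor FE and RE estimates and standard errors from the simulation table in Section C.

```r
# Classical Hausman statistic: quadratic form in the FE-RE coefficient difference
hausman <- function(b_fe, b_re, V_fe, V_re) {
  d    <- as.matrix(b_fe - b_re)                 # K x 1 difference
  V_d  <- as.matrix(V_fe) - as.matrix(V_re)      # Var(FE) - Var(RE)
  stat <- drop(t(d) %*% solve(V_d) %*% d)
  df   <- length(d)                              # number of time-varying regressors
  c(statistic = stat, df = df,
    p_value = pchisq(stat, df = df, lower.tail = FALSE))
}

# FE: 2.027 (SE 0.181); RE: 2.074 (SE 0.132)
h <- hausman(b_fe = 2.027, b_re = 2.074, V_fe = 0.181^2, V_re = 0.132^2)
round(h, 3)
```

In finite samples the variance difference can fail to be positive definite, in which case this classical form is not usable and a regression-based (Mundlak-style) test is the usual fallback.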
E. Implementation
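```r
library(plm)
library(fixest)

# Set up panel data (one row per country-year)
pdata <- pdata.frame(df, index = c("country_id", "year"))

# Random effects
re_fit <- plm(growth ~ trade_openness + human_capital + institutions,
              data = pdata, model = "random")
summary(re_fit)

# Fixed effects (institutions is left out here; it is treated as
# time-invariant, so the within transformation would drop it)
fe_fit <- plm(growth ~ trade_openness + human_capital,
              data = pdata, model = "within")

# Hausman test (compares the coefficients common to both models)
phtest(fe_fit, re_fit)

# Mundlak approach: add country means of the time-varying regressors
df$mean_trade <- ave(df$trade_openness, df$country_id)
df$mean_hcap  <- ave(df$human_capital, df$country_id)
mundlak <- feols(growth ~ trade_openness + human_capital + institutions +
                   mean_trade + mean_hcap,
                 data = df, vcov = ~country_id)  # SEs clustered by country
summary(mundlak)

# Joint test that the coefficients on the country means are zero
wald(mundlak, keep = "^mean_")
```
F. Diagnostics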
The Hausman Test in Practice
Run the Hausman test and report the statistic and p-value. But interpret it carefully:
- p < 0.05: Reject RE. Use FE. The unobserved effect likely correlates with regressors.
- p > 0.05: Cannot reject RE. But this does not prove RE is correct — you may simply lack power.
The Mundlak Test (A Better Alternative)
The Mundlak (1978) approach adds group means of time-varying regressors to the RE model. If the coefficients on the group means are jointly significant, the RE assumption is violated. This approach is asymptotically equivalent to the Hausman test under homoskedasticity, but is more intuitive and easier to implement with robust or clustered SEs (under which the classical Hausman test is invalid).
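The logic of the Mundlak regression can be seen in a base-R simulation. The sketch below assumes a balanced panel in which the unit effect is deliberately correlated with the regressor. It checks that the coefficient on `x` in a regression that includes the unit mean `xbar` matches the within (FE) estimate, and it computes a cluster-robust t-statistic for `xbar` by hand (no finite-sample correction). All names and parameter values are illustrative.

```r
set.seed(2024)
n_units <- 150; n_t <- 6
id    <- rep(1:n_units, each = n_t)
m     <- rnorm(n_units)
x     <- rep(m, each = n_t) + rnorm(n_units * n_t)
z     <- rep(rbinom(n_units, 1, 0.5), each = n_t)       # time-invariant covariate
alpha <- rep(0.8 * m + rnorm(n_units), each = n_t)      # correlated with x: RE assumption fails
y     <- 1 + 2 * x + 0.5 * z + alpha + rnorm(n_units * n_t)

xbar <- ave(x, id)
ybar <- ave(y, id)

# Mundlak regression: pooled OLS with the unit mean of x added
mundlak_fit <- lm(y ~ x + z + xbar)
b_mundlak   <- unname(coef(mundlak_fit)["x"])

# Within (FE) estimator for comparison
b_fe <- sum((x - xbar) * (y - ybar)) / sum((x - xbar)^2)

# Cluster-robust (by unit) sandwich variance, computed by hand
X       <- model.matrix(mundlak_fit)
e       <- resid(mundlak_fit)
bread   <- solve(crossprod(X))
score_i <- rowsum(X * e, id)                            # per-unit sums of x_it * e_it
V_cl    <- bread %*% crossprod(score_i) %*% bread
t_xbar  <- unname(coef(mundlak_fit)["xbar"] / sqrt(V_cl["xbar", "xbar"]))

c(mundlak_x = b_mundlak, fe_x = b_fe, t_stat_xbar = t_xbar)
```

A large t-statistic on `xbar` signals that the unit effect is correlated with `x`, which is the Mundlak version of a Hausman rejection. With several time-varying regressors, the analogous check is a joint Wald test on all the unit means.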
Breusch-Pagan LM Test
Tests whether the variance of the unit effect ($\sigma_\alpha^2$) is zero. If it is, there are no unit effects and pooled OLS is fine. If $\sigma_\alpha^2 > 0$, either FE or RE is needed.
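For a balanced panel, the Breusch-Pagan LM statistic can be computed directly from pooled OLS residuals $e_{it}$ as $LM = \frac{NT}{2(T-1)}\Big[\frac{\sum_i (\sum_t e_{it})^2}{\sum_i \sum_t e_{it}^2} - 1\Big]^2$, which is asymptotically $\chi^2_1$ under the null of no unit effects. The helper `bp_lm` below is a hypothetical hand-rolled sketch on simulated data, not a library function.

```r
# Breusch-Pagan LM test for unit effects (balanced panel assumed)
bp_lm <- function(resid_ols, id) {
  n_units <- length(unique(id))
  n_t     <- length(id) / n_units
  ratio   <- sum(rowsum(resid_ols, id)^2) / sum(resid_ols^2)
  stat    <- n_units * n_t / (2 * (n_t - 1)) * (ratio - 1)^2
  c(statistic = stat, p_value = pchisq(stat, df = 1, lower.tail = FALSE))
}

set.seed(11)
n_units <- 100; n_t <- 5
id  <- rep(1:n_units, each = n_t)
x   <- rnorm(n_units * n_t)
eps <- rnorm(n_units * n_t)

y_pooled <- 1 + 2 * x + eps                                  # no unit effects
y_panel  <- y_pooled + rep(rnorm(n_units), each = n_t)       # unit effects with variance 1

lm_panel  <- bp_lm(resid(lm(y_panel ~ x)), id)
lm_pooled <- bp_lm(resid(lm(y_pooled ~ x)), id)
rbind(with_unit_effects = lm_panel, without_unit_effects = lm_pooled)
```

The first row should show a very large statistic, since unit effects make residuals within a unit move together. The second row is a single draw from the null distribution.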
Interpreting Results
- RE coefficients on time-varying variables reflect both within-unit and between-unit variation. They are efficient but potentially biased if the RE assumption fails.
- RE coefficients on time-invariant variables (like geography or gender) are identified entirely from between-unit variation. They are only valid if the RE assumption holds.
- If FE and RE estimates are substantively similar (not just statistically indistinguishable), this agreement is reassuring. Report both.
- The Mundlak approach gives you the strengths of both approaches: FE-consistent estimates of time-varying effects plus estimates of time-invariant effects, all in one model.
G. What Can Go Wrong
| Problem | What It Does | How to Fix It |
|---|---|---|
| Violated RE assumption | Coefficients are biased and inconsistent | Use FE or Mundlak CRE |
| Hausman test lacks power | Fails to reject RE even when it should | Use Mundlak test; rely on economic reasoning |
| Interpreting RE as causal without justification | Reviewers will flag this immediately | Explicitly defend the assumption that unit effects are uncorrelated with the regressors, or use FE |
| Confusion with multilevel/HLM models | Same estimator, different jargon | Recognize they are equivalent; use whichever framing your audience expects |
| Using RE because FE "eats up too much variation" | Not a valid justification for RE | Low within-variation means low power, but FE is still consistent. RE gains efficiency at the cost of potential bias |
Violated RE Assumption (Correlated Unit Effects)
Case where the pitfall is avoided: the unobserved country effect (e.g., institutional quality) is uncorrelated with the regressors (trade openness, human capital).
RE estimate of trade openness effect: 0.032 (true effect: 0.030). Hausman test p = 0.61. RE is efficient and consistent.
Relying on a Low-Power Hausman Test
Case where the pitfall is avoided: the Hausman test has adequate power due to sufficient within-unit variation and sample size.
Hausman statistic = 18.4 (p = 0.001). Clear rejection of RE. Researcher uses FE, avoiding bias.
Using RE to Estimate Time-Invariant Effects Without Justification
Case where the pitfall is avoided: the RE assumption is carefully defended. In an RCT with panel data, randomization ensures the unit effect is uncorrelated with treatment.
RE estimate of the time-invariant treatment effect: 0.15 (SE = 0.04). Valid because randomization guarantees the RE assumption.
You run a Hausman test comparing FE and RE estimates. The test statistic is 3.2 with 4 degrees of freedom (p = 0.52). What is the correct interpretation?
H. Practice
A researcher wants to estimate the effect of a country's legal origin (common law vs. civil law) on economic growth using panel data. She uses random effects because 'legal origin is time-invariant and cannot be estimated with fixed effects.' Is this a valid justification for RE?
You estimate both FE and RE models. The FE coefficient on trade openness is 0.03 (SE = 0.01) and the RE coefficient is 0.09 (SE = 0.006). What does the large difference between these estimates suggest?
A Hausman test comparing FE and RE gives a test statistic of 2.1 with 4 degrees of freedom (p = 0.72). A colleague concludes: 'The RE assumption is satisfied, so RE is definitely the right model.' What is wrong with this reasoning?
In an RCT with panel data, a researcher uses random effects rather than fixed effects. Is this defensible?
A study of cross-country growth uses RE because the key variable of interest (colonial legal origin) is time-invariant. The Hausman test gives p = 0.23. A Mundlak test (adding group means of trade and human capital to the RE model) shows both group means are significant at the 5% level. What should the researcher do?
The Hausman and Mundlak tests disagree. Which is more reliable, and what estimator should be used?
Read the paper summary below and write a brief referee critique (2-3 sentences) of the identification strategy.
Paper Summary
A study examines the effect of teacher certification type (traditional vs. alternative) on student test scores. Using panel data from 2,000 schools over 8 years, the authors estimate a random effects model because certification type varies mostly between schools, not within schools over time. They argue RE is necessary to identify the effect of this near-time-invariant variable. The Hausman test gives p = 0.31. They find that traditional certification raises math scores by 0.15 standard deviations (p < 0.01).
Key Table
| Variable | Coefficient | SE | p-value |
|---|---|---|---|
| Traditional cert. share | 0.150 | 0.042 | 0.000 |
| Student-teacher ratio | -0.030 | 0.011 | 0.006 |
| % Free lunch | -0.220 | 0.018 | 0.000 |
| School RE | Yes | | |
| Year FE | Yes | | |
| Hausman test p-value | 0.31 | | |
| N (school-years) | 16,000 | | |
Authors' Identification Claim
The Hausman test does not reject RE, supporting our use of the random effects estimator. Because certification type is nearly time-invariant, FE would absorb most of the identifying variation and produce imprecise estimates.
I. Swap-In: When to Use Something Else
- Fixed effects: When the RE assumption (unit effects uncorrelated with regressors) is implausible — FE allows arbitrary correlation between unit effects and covariates at the cost of discarding between-unit variation.
- Correlated Random Effects (Mundlak): When you want to test the RE assumption while still estimating time-invariant effects — add group means of time-varying regressors to the RE model.
- First differencing: Equivalent to FE with two periods. With more periods, FE is generally more efficient unless errors follow a random walk.
- Arellano-Bond GMM: For dynamic panels with a lagged dependent variable, where both FE and RE are biased.
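The first-differencing point in the list above can be verified numerically: with exactly two periods, the first-difference estimator and the within (FE) estimator give the same slope. The sketch assumes a two-period balanced panel with no period effect, so the first-difference regression has no intercept. All values are simulated for illustration.

```r
set.seed(99)
n_units <- 300
x1    <- rnorm(n_units)
x2    <- x1 + rnorm(n_units)
alpha <- 0.5 * x1 + rnorm(n_units)          # unit effect correlated with x
y1    <- 2 * x1 + alpha + rnorm(n_units)
y2    <- 2 * x2 + alpha + rnorm(n_units)

# First-difference estimator (no intercept, since there is no period effect)
b_fd <- sum((x2 - x1) * (y2 - y1)) / sum((x2 - x1)^2)

# Within (FE) estimator on the stacked two-period data
id   <- rep(1:n_units, times = 2)
x    <- c(x1, x2)
y    <- c(y1, y2)
x_w  <- x - ave(x, id)
y_w  <- y - ave(y, id)
b_fe <- sum(x_w * y_w) / sum(x_w^2)

c(first_difference = b_fd, fixed_effects = b_fe)
```

With two periods, each unit's demeaned values are plus and minus half the first difference, so the two estimators are algebraically identical. With three or more periods they generally differ.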
Paper Library
Foundational (6)
Laird, N. M., & Ware, J. H. (1982). Random-Effects Models for Longitudinal Data.
Laird and Ware developed the general framework for random-effects models in longitudinal data, integrating fixed population parameters with random individual-level effects. This paper is foundational for the mixed-effects modeling approach widely used in biostatistics and social sciences.
Hausman, J. A. (1978). Specification Tests in Econometrics.
Hausman developed the specification test that compares fixed-effects and random-effects estimates to determine whether the random-effects assumption of no correlation between unobserved heterogeneity and regressors is valid. The Hausman test is a standard diagnostic in virtually all panel data analyses.
Mundlak, Y. (1978). On the Pooling of Time Series and Cross Section Data.
Mundlak proposed adding group means of time-varying covariates to the random-effects model, which produces the same slope estimates as fixed effects while retaining the ability to estimate coefficients on time-invariant variables. The 'Mundlak device' or correlated random-effects approach remains widely used.
Bell, A., & Jones, K. (2015). Explaining Fixed Effects: Random Effects Modeling of Time-Series Cross-Sectional and Panel Data.
Bell and Jones argued that the 'within-between' random-effects model (essentially the Mundlak approach) is often superior to pure fixed effects because it allows explicit decomposition of within- and between-unit effects while accounting for unobserved heterogeneity.
Wooldridge, J. M. (2019). Correlated Random Effects Models with Unbalanced Panels.
Wooldridge extended the correlated random effects (CRE) framework to handle unbalanced panels, which are the norm in applied research. This paper shows how to combine the flexibility of fixed effects with the ability to estimate effects of time-invariant variables, making the CRE approach practical for real-world datasets.
Hausman, J. A., & Taylor, W. E. (1981). Panel Data and Unobservable Individual Effects.
Hausman and Taylor developed an instrumental variables estimator for panel data that allows consistent estimation of coefficients on time-invariant variables even when individual effects are correlated with some regressors. The Hausman-Taylor estimator occupies a middle ground between fixed effects (which cannot estimate time-invariant coefficients) and random effects (which requires every regressor to be uncorrelated with the individual effects).
Application (5)
Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical Linear Models: Applications and Data Analysis Methods.
This influential textbook popularized hierarchical linear models (HLM), which are random-effects models for nested data structures such as students within schools. It became the standard reference for multilevel modeling in education, psychology, and organizational research.
Hofmann, D. A. (1997). An Overview of the Logic and Rationale of Hierarchical Linear Models.
Hofmann introduced hierarchical linear models to the management research community, explaining when and why multilevel random-effects models are appropriate for organizational data with nested structures. This tutorial was highly influential in promoting multilevel methods in management journals.
Aguinis, H., Gottfredson, R. K., & Culpepper, S. A. (2013). Best-Practice Recommendations for Estimating Cross-Level Interaction Effects Using Multilevel Modeling.
This paper provided detailed guidance for management researchers on estimating cross-level interaction effects in multilevel models, addressing common problems such as insufficient statistical power, centering decisions, and effect size reporting.
Peterson, M. F., Arregle, J.-L., & Martin, X. (2012). Multilevel Models in International Business Research.
This editorial reviewed the use of multilevel random-effects models in international business research, where firms are nested within countries. It discussed best practices for modeling cross-level effects and the importance of accounting for the hierarchical structure of international data.
Islam, N. (1995). Growth Empirics: A Panel Data Approach.
Islam applied panel data methods—including random effects and fixed effects—to the cross-country growth regression framework, showing that accounting for unobserved country heterogeneity substantially changes estimates of convergence rates. This paper demonstrated the importance of choosing between fixed and random effects in macroeconomic growth empirics.
Survey (6)
Clark, T. S., & Linzer, D. A. (2015). Should I Use Fixed or Random Effects?
Clark and Linzer provided practical guidance on choosing between fixed and random effects, arguing the decision depends on the research question, sample size, and the degree of correlation between unit effects and covariates rather than simply defaulting to one approach.
Allison, P. D. (2009). Fixed Effects Regression Models.
Allison's concise and accessible monograph compares fixed effects and random effects models for panel data, providing practical guidance on model selection, estimation, and interpretation. It is particularly useful for social scientists seeking an intuitive understanding of when each approach is appropriate.
Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data.
Chapters 10–11 provide a rigorous treatment of random effects estimation, GLS, and the relationship between FE and RE in panel data. The standard graduate-level reference for panel data econometrics.
Baltagi, B. H. (2021). Econometric Analysis of Panel Data.
The standard graduate-level textbook on panel data econometrics, covering error component models, random effects, and extensions to unbalanced panels and dynamic models. Provides comprehensive treatment of both theoretical foundations and practical implementation.
Angrist, J. D., & Pischke, J.-S. (2009). Mostly Harmless Econometrics: An Empiricist's Companion.
Chapter 5 provides an accessible argument for preferring fixed effects over random effects in most applied settings, with clear discussion of when RE may be appropriate. This widely read textbook shaped how a generation of applied researchers think about the FE vs. RE choice.
Rabe-Hesketh, S., & Skrondal, A. (2012). Multilevel and Longitudinal Modeling Using Stata.
A comprehensive practical guide to multilevel (hierarchical) models in Stata, which generalize the random effects framework to more complex nested data structures. Essential reference for applied researchers implementing multilevel models.