MethodAtlas
Method · beginner · 16 min read
Model-Based · Established

OLS (Robust SEs, Clustering)

The workhorse of empirical research — linear regression with modern standard error corrections.

When to Use: When you have a continuous outcome and want to estimate the conditional mean as a linear function of covariates, or as a starting point before applying more sophisticated identification strategies.
Assumption: For causal interpretation, E[ε|X] = 0 (zero conditional mean / exogeneity). The error term must be uncorrelated with all regressors — this is the assumption that separates descriptive regression from causal inference.
Mistake: Interpreting OLS coefficients as causal effects without addressing endogeneity (omitted variable bias), or confusing statistical significance with economic significance.
Reading Time: ~16 min read · 11 sections · 9 interactive exercises

One-Line Implementation

R: lm_robust(y ~ x1 + x2, data = df, se_type = 'HC1')
Stata: reg y x1 x2, vce(robust)
Python: smf.ols('y ~ x1 + x2', data=df).fit(cov_type='HC1')


Motivating Example: The Mincer Earnings Equation

Imagine you are a labor economist in the 1970s, and you want to answer a deceptively simple question: how much more does an additional year of education earn you?

Jacob Mincer formalized this question in what became the most-estimated equation in all of economics. The Mincer earnings equation says:

\ln(\text{wage}_i) = \beta_0 + \beta_1 \cdot \text{Education}_i + \beta_2 \cdot \text{Experience}_i + \beta_3 \cdot \text{Experience}_i^2 + \varepsilon_i

You run this regression on a sample of workers and find \hat{\beta}_1 = 0.10. Does that mean one more year of school causes a 10% increase in wages?

Maybe. But maybe not. People who get more education are also different in ways you cannot observe — they may be more motivated, have wealthier parents, or live in areas with better schools. If those unobserved factors also affect wages, your OLS estimate is picking up a mix of the true effect of education and the effect of everything correlated with education that you did not control for.

This tension — between what OLS estimates and what you want it to estimate — is the central challenge of applied empirical research, and it is the reason this entire website exists. OLS is the starting point. Every other method on this site exists because OLS alone is often not enough.


A. Overview

What OLS Does

OLS (Ordinary Least Squares) finds the linear function that best predicts your outcome variable, where "best" means the one that minimizes the sum of squared prediction errors. If you have an outcome Y and regressors X_1, X_2, \ldots, X_k, OLS finds the coefficients \hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k that minimize:

\sum_{i=1}^{n} (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \cdots - \hat{\beta}_k X_{ki})^2

In plain language: OLS draws the line (or hyperplane) through your data that makes the vertical distances from the points to the line as small as possible, on average.
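As a quick numeric sketch (the simulated data and variable names below are my own, not from the article), the fitted coefficients really do minimize the sum of squared errors: perturbing them in any direction increases it.

```python
# Illustrative only: verify that the OLS fit minimizes the sum of squared errors.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = 1.0 + 0.5 * x + rng.normal(0, 1, 200)

# Closed-form OLS with one regressor: slope = Cov(x, y) / Var(x)
slope = np.cov(x, y, bias=True)[0, 1] / np.var(x)
intercept = y.mean() - slope * x.mean()

def ssr(b0, b1):
    """Sum of squared residuals for candidate coefficients."""
    return np.sum((y - b0 - b1 * x) ** 2)

# Any perturbation of the OLS coefficients increases the SSR.
base = ssr(intercept, slope)
assert all(ssr(intercept, slope + d) > base for d in (-0.1, 0.1))
assert all(ssr(intercept + d, slope) > base for d in (-0.5, 0.5))
```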

What OLS Estimates

The OLS coefficient \hat{\beta}_1 estimates the best linear predictor (linear projection) of Y onto X, which coincides with the conditional expectation function (CEF) when the CEF is linear. This coefficient tells you: "On average, how does Y differ between observations that differ by one unit of X_1, holding the other regressors constant?"

Even when the CEF is nonlinear, OLS provides the minimum mean squared error linear approximation to it (Angrist & Pischke, 2009). This projection is a descriptive statement. It becomes causal only under additional assumptions (see Section B).
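A small simulation illustrates this projection property (the data-generating process below is a made-up example): even when the true CEF is nonlinear, the OLS slope equals Cov(X, Y)/Var(X), the minimum-MSE linear approximation.

```python
# Illustrative only: OLS slope = projection coefficient even with a nonlinear CEF.
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 4, 5000)
y = np.exp(0.3 * x) + rng.normal(0, 0.2, 5000)  # nonlinear CEF

# Fit OLS with an intercept via a generic least-squares solver.
X = np.column_stack([np.ones_like(x), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]

# The fitted slope matches the linear projection formula exactly.
proj_slope = np.cov(x, y, bias=True)[0, 1] / np.var(x)
assert np.isclose(beta[1], proj_slope)
```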

When to Use OLS

  • You have a continuous outcome variable
  • You want to characterize the conditional mean relationship between variables
  • You want a starting point before applying more sophisticated methods
  • Your research design has addressed endogeneity (e.g., through randomization, controls, or as a first stage for instrumental variables (IV))

When NOT to Use OLS

  • Your outcome is binary (use logit/probit), a count (use Poisson/negative binomial), or censored at zero or some threshold (consider a Tobit or other censored regression model, not covered in this catalog)
  • You suspect important unobserved confounders and have no strategy to address them — see the discussion of selection bias for why this matters
  • You want to claim causality but have no identification strategy



B. Identification

For OLS to give you unbiased estimates (and consistent estimates under weaker conditions), you need the following assumptions. We state them first in plain language, then formally.

Assumption 1: Linearity

Plain language: The true relationship between Y and the X's is (approximately) linear. If the relationship is actually curved, your straight line will miss it.

Formally: Y_i = X_i'\beta + \varepsilon_i for some true parameter vector \beta.

Assumption 2: Random Sampling

Plain language: Each observation in your dataset is drawn independently from the same population. This assumption fails, for example, if you oversample certain groups or if observations within clusters are correlated.

Assumption 3: No Perfect Collinearity

Plain language: No regressor is a perfect linear function of the others. If two variables are perfectly correlated (e.g., you include both "age" and "years since birth"), OLS cannot separate their effects and the estimation breaks entirely.

Assumption 4: Zero Conditional Mean (Exogeneity)

Plain language: The error term \varepsilon_i — which captures everything affecting Y that is not in your regressors — is on average zero for every value of X. This condition is the critical assumption. If people who get more education also have higher unobserved ability, and ability affects wages, then the error term is correlated with education and the exogeneity assumption fails.

Formally: E[\varepsilon_i | X_i] = 0.

Additional Assumptions (for efficiency)

If Assumptions 1-4 hold, OLS is unbiased. For OLS to be the Best Linear Unbiased Estimator (BLUE) — meaning no other linear estimator has smaller variance — you additionally need:

Assumption 5: Homoscedasticity

Plain language: The spread of the error terms is the same regardless of the value of X. If errors are larger for some observations than others (e.g., wage variance is higher for highly educated workers), OLS estimates remain unbiased but the conventional standard errors are wrong.

Formally: \text{Var}(\varepsilon_i | X_i) = \sigma^2 for all i.

Assumption 6: No Serial Correlation (time series/panel only)

Plain language: The error for one observation is not correlated with the error for another. This assumption is automatically satisfied with cross-sectional data under random sampling (Assumption 2) and is only a separate concern with time series or panel data.


C. Visual Intuition

Think of OLS geometrically. You have a cloud of data points in a scatter plot. OLS draws the line through the cloud that minimizes the sum of the squared vertical distances from each point to the line.


D. Mathematical Derivation

Don't worry about the notation yet — here's what this means in words: OLS finds the coefficients that minimize squared prediction errors. The formula is beta-hat equals (X'X) inverse times X'Y.

The OLS Problem. We want to find the vector \hat{\beta} that minimizes:

S(\beta) = (Y - X\beta)'(Y - X\beta) = Y'Y - 2\beta'X'Y + \beta'X'X\beta

Step 1: Take the first-order condition. Differentiate S(β)S(\beta) with respect to β\beta and set to zero:

\frac{\partial S}{\partial \beta} = -2X'Y + 2X'X\beta = 0

Step 2: Solve. Rearranging gives the normal equations:

X'X\hat{\beta} = X'Y

If X'X is invertible (Assumption 3), then:

\hat{\beta} = (X'X)^{-1}X'Y
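As a sanity check (toy data, illustrative only), the closed form \hat{\beta} = (X'X)^{-1}X'Y reproduces what a generic least-squares solver returns:

```python
# Illustrative only: solve the normal equations and compare with lstsq.
import numpy as np

rng = np.random.default_rng(2)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)    # normal equations
beta_ls = np.linalg.lstsq(X, y, rcond=None)[0]  # generic least-squares solver
assert np.allclose(beta_hat, beta_ls)
```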

Step 3: Show unbiasedness. Substitute Y = X\beta + \varepsilon:

\hat{\beta} = (X'X)^{-1}X'(X\beta + \varepsilon) = \beta + (X'X)^{-1}X'\varepsilon

Taking the conditional expectation:

E[\hat{\beta} \mid X] = \beta + (X'X)^{-1}X'E[\varepsilon \mid X]

If E[\varepsilon \mid X] = 0 (Assumption 4), then E[\hat{\beta} \mid X] = \beta. OLS is unbiased.

Step 4: Variance of the estimator. Under homoscedasticity (\text{Var}(\varepsilon|X) = \sigma^2 I):

\text{Var}(\hat{\beta} \mid X) = \sigma^2 (X'X)^{-1}

This expression is the formula behind conventional standard errors. When homoscedasticity fails, you need the heteroscedasticity-robust (sandwich) estimator instead:

\hat{V}_{\text{robust}} = (X'X)^{-1} \left(\sum_{i=1}^n \hat{\varepsilon}_i^2 x_i x_i'\right) (X'X)^{-1}

This sandwich estimator is the heteroscedasticity-consistent (HC) estimator from White (1980). The formula as written is HC0; HC1 applies an additional N/(N-k) finite-sample correction (the default in Stata). R's sandwich package defaults to HC3 (following Long and Ervin (2000)), while Python's statsmodels uses conventional (non-robust) SEs by default — HC1 is commonly specified explicitly via cov_type='HC1'.
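The sandwich formula can be sketched directly in a few lines (simulated heteroscedastic data; an illustration, not a replacement for estimatr or statsmodels):

```python
# Illustrative only: HC0 sandwich variance plus the HC1 n/(n-k) correction.
import numpy as np

rng = np.random.default_rng(3)
n, k = 500, 2
x = rng.uniform(1, 5, n)
X = np.column_stack([np.ones(n), x])
y = 1 + 0.5 * x + rng.normal(0, x)  # heteroscedastic: error sd grows with x

XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
e = y - X @ beta

# Conventional: sigma^2 (X'X)^{-1}
V_conv = (e @ e / (n - k)) * XtX_inv
# HC0 sandwich: (X'X)^{-1} (sum_i e_i^2 x_i x_i') (X'X)^{-1}
meat = X.T @ (e[:, None] ** 2 * X)
V_hc0 = XtX_inv @ meat @ XtX_inv
V_hc1 = V_hc0 * n / (n - k)

se_conv = np.sqrt(np.diag(V_conv))
se_hc1 = np.sqrt(np.diag(V_hc1))
# With variance rising in x, robust slope SEs exceed conventional ones here.
assert se_hc1[1] > se_conv[1]
```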

Each coefficient in a multivariate regression equals the bivariate slope coefficient from regressing Y on the residual from regressing that X on all other covariates. This result is the Frisch-Waugh-Lovell (FWL) theorem, also called the regression anatomy formula (Angrist & Pischke, 2009). The practical implication is that a multivariate OLS coefficient isolates only the variation in X that is not explained by the other regressors.
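The FWL result is easy to verify numerically (toy data of my own construction):

```python
# Illustrative only: residualize x1, then a bivariate slope recovers the
# multivariate coefficient (Frisch-Waugh-Lovell).
import numpy as np

rng = np.random.default_rng(4)
n = 300
x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)  # x1 correlated with x2
y = 1 + 2 * x1 - 1 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
b_full = np.linalg.lstsq(X, y, rcond=None)[0]

# Residualize x1 on the other regressors (constant and x2).
Z = np.column_stack([np.ones(n), x2])
x1_res = x1 - Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]

# Bivariate slope of y on the residual reproduces the multivariate coefficient.
b_fwl = (x1_res @ y) / (x1_res @ x1_res)
assert np.isclose(b_full[1], b_fwl)
```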


E. Implementation

Basic OLS with Robust Standard Errors

# Requires: estimatr
library(estimatr)

# --- Step 1: OLS with Robust Standard Errors ---
# lm_robust() fits OLS and computes heteroskedasticity-robust SEs.
# HC1 matches Stata's vce(robust) with degrees-of-freedom correction.
# I(exper^2) includes experience-squared to capture diminishing returns.
m1 <- lm_robust(lwage ~ educ + exper + I(exper^2),
              data = df,
              se_type = "HC1")
# summary() reports coefficients, robust SEs, t-stats, and R-squared.
# The educ coefficient = approximate % wage increase per year of education.
summary(m1)

# --- Step 2: Clustered Standard Errors ---
# When observations within a group (e.g., state) are correlated,
# cluster at that level to avoid understating uncertainty.
# CR2 (bias-reduced) is the estimatr default; better for few clusters.
m2 <- lm_robust(lwage ~ educ + exper + I(exper^2),
              data = df,
              clusters = state,
              se_type = "CR2")
# Compare SEs to Step 1: clustered SEs are typically larger,
# reflecting within-cluster correlation.
summary(m2)

F. Diagnostics

# --- Fit the base model ---
library(lmtest)
library(car)

m <- lm(lwage ~ educ + exper + I(exper^2), data = df)

# F.1 Residual plot: residuals vs fitted values
plot(fitted(m), resid(m),
   xlab = "Fitted values", ylab = "Residuals",
   main = "Residuals vs Fitted")
abline(h = 0, lty = 2, col = "red")

# F.2 Variance Inflation Factors
vif(m)

# F.3 Breusch-Pagan test for heteroscedasticity
bptest(m)

# F.4 Normality of residuals
shapiro.test(resid(m)[1:5000])  # Shapiro-Wilk (max 5000 obs)
qqnorm(resid(m)); qqline(resid(m), col = "red")

# F.5 Ramsey RESET test for functional form
resettest(m, power = 2:3, type = "fitted")

F.1 Residual Plots

Plot residuals (\hat{\varepsilon}_i) against fitted values (\hat{Y}_i) and against each regressor. You should see a random scatter with no pattern. A funnel shape indicates heteroscedasticity; a curve indicates misspecification.

F.2 Variance Inflation Factors (VIF)

VIF measures how much multicollinearity inflates the variance of each coefficient. A common rule of thumb: VIF > 10 signals a problem. In Stata: estat vif. In R: car::vif(model). In Python: statsmodels variance_inflation_factor.
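Under the hood, VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing regressor j on the remaining regressors. A minimal sketch with made-up collinear data:

```python
# Illustrative only: VIF computed by hand as 1 / (1 - R^2_j).
import numpy as np

rng = np.random.default_rng(5)
n = 400
x2 = rng.normal(size=n)
x1 = 0.9 * x2 + rng.normal(0, 0.5, n)  # deliberately collinear with x2

def vif(target, others):
    """VIF of `target` given the list of other regressors."""
    Z = np.column_stack([np.ones(len(target))] + others)
    fitted = Z @ np.linalg.lstsq(Z, target, rcond=None)[0]
    r2 = 1 - np.sum((target - fitted) ** 2) / np.sum((target - target.mean()) ** 2)
    return 1 / (1 - r2)

v = vif(x1, [x2])
assert 2 < v < 10  # correlated regressors inflate the variance
```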

F.3 Heteroscedasticity Tests

The Breusch-Pagan test and White test formally test whether the error variance is constant. In practice, robust standard errors are a common default — they are valid whether or not heteroscedasticity is present, though they can be anti-conservative in small samples (see Section E above).
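The Breusch-Pagan statistic itself is simple to compute by hand: n times the R^2 from regressing squared residuals on the regressors. A sketch on simulated heteroscedastic data (in practice you would use lmtest::bptest or statsmodels):

```python
# Illustrative only: hand-rolled Breusch-Pagan LM statistic, LM = n * R^2
# from the auxiliary regression of squared residuals on the regressors.
import numpy as np

rng = np.random.default_rng(6)
n = 500
x = rng.uniform(1, 5, n)
X = np.column_stack([np.ones(n), x])
y = 1 + 0.5 * x + rng.normal(0, x)  # variance rises with x

e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]

# Auxiliary regression of e^2 on X; under the null, LM ~ chi-squared(1).
e2 = e ** 2
fitted = X @ np.linalg.lstsq(X, e2, rcond=None)[0]
r2_aux = 1 - np.sum((e2 - fitted) ** 2) / np.sum((e2 - e2.mean()) ** 2)
lm = n * r2_aux
assert lm > 3.84  # exceeds the 5% chi-squared(1) critical value for this sample
```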

F.4 Normality of Residuals

OLS does not require normally distributed errors for consistency or unbiasedness. Normality is only needed for exact finite-sample inference (the t and F distributions). With large samples, the Central Limit Theorem ensures approximate normality of \hat{\beta} regardless. Do not reject an otherwise good model because of non-normal residuals.
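A quick Monte Carlo illustrates the CLT claim (simulation design is my own): with heavily skewed errors, nominal 95% intervals for the slope still cover the truth at roughly the advertised rate in moderately large samples.

```python
# Illustrative only: coverage of nominal 95% CIs with skewed (exponential) errors.
import numpy as np

rng = np.random.default_rng(7)
n, reps, beta1 = 200, 1000, 2.0
hits = 0
for _ in range(reps):
    x = rng.uniform(0, 1, n)
    eps = rng.exponential(1.0, n) - 1.0  # mean zero but heavily skewed
    y = 1.0 + beta1 * x + eps
    X = np.column_stack([np.ones(n), x])
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    se = np.sqrt((e @ e / (n - 2)) * XtX_inv[1, 1])  # conventional slope SE
    if abs(b[1] - beta1) <= 1.96 * se:
        hits += 1

coverage = hits / reps
assert 0.90 < coverage < 0.99  # close to the nominal 95%
```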

F.5 Ramsey RESET Test

Tests for omitted nonlinear terms. If the test rejects, consider adding quadratic or interaction terms.


Interpreting Your Results

Correct interpretation depends on the functional form of the regression and the units of measurement.

How to Interpret Coefficients

Linear-linear: \hat{\beta}_1 = 0.10 means a one-unit increase in X is associated with a 0.10-unit increase in Y, holding other variables constant.

Log-linear (log outcome): \hat{\beta}_1 = 0.10 means a one-unit increase in X is associated with approximately a 10% increase in Y.

Log-log (both logged): \hat{\beta}_1 = 0.10 means a 1% increase in X is associated with a 0.10% increase in Y (an elasticity).
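One caveat worth a line of arithmetic: the log-linear "10%" reading is an approximation; the exact percentage change is 100(e^{\beta} - 1).

```python
# Illustrative only: exact vs. approximate percent change for a log outcome.
import math

beta = 0.10
exact = 100 * (math.exp(beta) - 1)
# 0.10 log points is about a 10.5% increase, not exactly 10%.
assert abs(exact - 10.517) < 0.01
```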

Confidence Intervals

A 95% confidence interval of [0.05, 0.15] means: if you repeated this study many times and computed a confidence interval each time, 95% of those intervals would contain the true \beta. It does not mean there is a 95% probability that \beta lies in this particular interval.
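Mechanically, a large-sample 95% CI is the point estimate plus or minus 1.96 standard errors; with the made-up numbers below it reproduces an interval of [0.05, 0.15].

```python
# Illustrative only: constructing a 95% CI from a coefficient and its SE.
beta_hat, se = 0.10, 0.0255  # hypothetical estimate and standard error
lo, hi = beta_hat - 1.96 * se, beta_hat + 1.96 * se
assert round(lo, 3) == 0.05 and round(hi, 3) == 0.15
```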

Statistical vs. Economic Significance

A coefficient can be statistically significant (small p-value) but economically trivial (tiny effect size), or statistically insignificant (large p-value) but economically meaningful (large point estimate with wide confidence interval due to limited data). It is important to discuss both.

Common Misstatements to Avoid

  • Do not say "X causes Y" unless your design supports it
  • Do not say "X has no effect" when you mean "the coefficient is not statistically significant" — a null result could mean no effect or insufficient power
  • Do not interpret R-squared as a measure of model quality or causal validity
  • Do not compare R-squared across models with different dependent variables

G. What Can Go Wrong

Omitted Variable Bias

Scenario: the regression includes all relevant confounders. Result: estimated effect of education on wages is 0.06 (true effect: 0.06).

The short regression coefficient equals the long regression coefficient plus the effect of the omitted variable times the regression of the omitted on the included: \text{plim}(\hat{\beta}_{\text{short}}) = \beta_{\text{long}} + \beta_{\text{omitted}} \times \delta, where \delta is the coefficient from regressing the omitted variable on the included variable (Wooldridge, 2010). This formula shows that OVB is zero only when the omitted variable is uncorrelated with the included regressor (\delta = 0) or has no effect on the outcome (\beta_{\text{omitted}} = 0).
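The OVB decomposition holds exactly in sample, which makes it easy to verify (toy simulated data; variable names are invented):

```python
# Illustrative only: short coefficient = long coefficient + beta_omitted * delta.
import numpy as np

rng = np.random.default_rng(8)
n = 200_000
ability = rng.normal(size=n)
educ = 12 + 0.8 * ability + rng.normal(size=n)  # omitted variable predicts educ
lwage = 1 + 0.06 * educ + 0.10 * ability + rng.normal(0, 0.3, n)

def slope(x, y):
    """Bivariate OLS slope of y on x (with intercept)."""
    return np.cov(x, y, bias=True)[0, 1] / np.var(x)

b_short = slope(educ, lwage)  # short regression omits ability
delta = slope(educ, ability)  # regression of the omitted on the included

# Long regression of lwage on educ and ability.
X_long = np.column_stack([np.ones(n), educ, ability])
b_long = np.linalg.lstsq(X_long, lwage, rcond=None)[0]

b_implied = b_long[1] + b_long[2] * delta  # the OVB identity
assert np.isclose(b_short, b_implied)
```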

Heteroscedasticity with Conventional SEs

Scenario: using robust standard errors. Result: SE = 0.032, 95% CI [0.037, 0.163], correct coverage.

Multicollinearity

Scenario: regressors are moderately correlated (correlation = 0.3). Result: coefficients are stable, SEs are moderate, VIF = 1.1.

Measurement Error in Regressors

Scenario: the explanatory variable is measured precisely. Result: the estimated coefficient reflects the true relationship.

Classical measurement error in an explanatory variable produces attenuation bias — the coefficient is biased toward zero by the factor \sigma_x^2 / (\sigma_x^2 + \sigma_e^2), called the reliability ratio. Measurement error in the dependent variable does not bias coefficients but inflates standard errors (Wooldridge, 2010).
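A simulation of the attenuation result (my own toy data-generating process; "reliability" below is the ratio defined above):

```python
# Illustrative only: classical measurement error shrinks the slope by the
# reliability ratio var(x) / (var(x) + var(e)).
import numpy as np

rng = np.random.default_rng(9)
n = 500_000
x = rng.normal(0, 1, n)          # true regressor, variance 1
noise = rng.normal(0, 1, n)      # classical measurement error, variance 1
x_obs = x + noise                # observed, mismeasured regressor
y = 2.0 * x + rng.normal(0, 0.5, n)

b = np.cov(x_obs, y, bias=True)[0, 1] / np.var(x_obs)  # slope on mismeasured x
reliability = np.var(x) / (np.var(x) + np.var(noise))
assert abs(b - 2.0 * reliability) < 0.01  # attenuated toward zero as predicted
```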


H. Practice

H.1 Concept Checks

Concept Check

You regress wages on education and find a coefficient of 0.08. A classmate says the coefficient means one more year of education causes an 8% wage increase. What is wrong with this interpretation?

Concept Check

You run a regression of firm revenue on advertising spending and find that the coefficient on advertising is positive but not statistically significant (p = 0.15). Your co-author concludes: 'Advertising has no effect on revenue.' Is this correct?

Concept Check

You are studying how state minimum wage laws affect individual employment. Your data has 500,000 workers in 50 states. How should you compute standard errors?

Concept Check

Your colleague adds 15 control variables to a regression and the R-squared jumps from 0.12 to 0.65. They say the model is now much better. What should you be cautious about?

Concept Check

In a regression of test scores on class size, you find VIF = 22 for class size and VIF = 20 for school enrollment. What does this tell you and what might you do?

H.2 Guided Exercise

Guided Exercise

Fill in the blanks to complete this interpretation of an OLS regression.

You run the following Mincer regression on a sample of 5,000 workers and obtain: ln(wage) = 6.2 + 0.08*Education + 0.03*Experience - 0.0005*Experience^2, with robust standard error on Education = 0.015. The R-squared is 0.30.

One more year of education is associated with approximately what percent higher wages?

What is the lower bound of the 95% CI for the Education coefficient?

What is the upper bound of the 95% CI for the Education coefficient?

The model explains what percentage of the variance in log wages?

H.3 Error Detective

Error Detective

Read the analysis below carefully and identify the errors.

A researcher studies the effect of a new management training program on employee productivity. They run:

reg productivity training age tenure, vce(robust)

They find: coefficient on training = 5.2 (SE = 1.8, p = 0.004). They write: "The management training program causes a 5.2-unit increase in productivity (p < 0.01). The R-squared of 0.45 confirms that our model is well-specified. Since we control for age and tenure, our estimate is free of omitted variable bias."

Select all errors you can find:

Error Detective

Read the analysis below carefully and identify the errors.

A political scientist regresses voter turnout (county level) on whether the county adopted early voting (a state-level policy). They cluster standard errors at the county level and report:

Coefficient on early voting: 3.1 percentage points (clustered SE = 0.8, p < 0.001).

"We cluster at the county level to account for within-county correlation in voter behavior."

Select all errors you can find:

H.4 You Are the Referee

Referee Exercise

Read the paper summary below and write a brief referee critique (2-3 sentences) of the identification strategy.

Paper Summary

The authors study the effect of corporate board diversity on firm performance (ROA). Using a panel of S&P 500 firms (2010-2020), they regress ROA on the percentage of female board members, controlling for firm size (log assets), leverage, and industry fixed effects. They use robust standard errors and find that a 10 percentage point increase in female board representation is associated with a 0.8 percentage point increase in ROA (SE = 0.3, p = 0.008). They conclude that board diversity improves firm performance.

Key Table

Variable              Coefficient   Robust SE   p-value
Female Board Share    0.08          0.03        0.008
Log(Assets)           0.02          0.01        0.045
Leverage              -0.15         0.04        0.000
Industry FE           Yes
R-squared             0.23
N                     5,500

Authors' Identification Claim

By controlling for firm size, leverage, and industry fixed effects, we address the most important confounders. The robust standard errors account for heteroscedasticity.


I. Swap-In: When to Use Something Else

  • Fixed effects: When unobserved time-invariant confounders are the primary concern and panel data are available.
  • IV / 2SLS: When endogeneity from simultaneity, measurement error, or omitted variables cannot be addressed by controls alone, and a valid instrument is available.
  • Logit / Probit: When the outcome is binary (0/1). OLS (the linear probability model) can still be useful for average marginal effects, but logit/probit avoids predictions outside [0, 1].
  • Poisson / Negative Binomial: When the outcome is a non-negative count or inherently multiplicative (e.g., trade flows, patent counts).
  • Difference-in-differences: When a policy change creates a natural experiment with a comparison group and temporal variation.

J. Reviewer Checklist

Critical Reading Checklist


Paper Library

Foundational (10)

Abadie, A., Athey, S., Imbens, G. W., & Wooldridge, J. M. (2020). Sampling-Based versus Design-Based Uncertainty in Regression Analysis.

Econometrica. DOI: 10.3982/ECTA12675

Abadie et al. distinguish between sampling-based uncertainty (from drawing a sample from a population) and design-based uncertainty (from treatment assignment) in regression analysis. They show that conventional standard errors can be conservative when the sample includes a substantial fraction of the population, providing a rigorous framework for understanding what regression standard errors actually measure. This paper clarifies the conceptual foundations for inference in empirical work and complements their separate 2023 QJE paper on clustering.

Cameron, A. C., Gelbach, J. B., & Miller, D. L. (2008). Bootstrap-Based Improvements for Inference with Clustered Errors.

Review of Economics and Statistics. DOI: 10.1162/rest.90.3.414

Cameron, Gelbach, and Miller address what happens when clustering is necessary but the number of clusters is small (fewer than 30-50). They propose the wild cluster bootstrap as a solution, which has become the standard approach when researchers have too few clusters for asymptotic cluster-robust standard errors to be reliable.

Frisch, R., & Waugh, F. V. (1933). Partial Time Regressions as Compared with Individual Trends.

Econometrica. DOI: 10.2307/1907330

Frisch and Waugh establish that a coefficient in a multiple regression can be obtained by first residualizing both the outcome and the regressor against all other covariates. The Frisch-Waugh-Lovell (FWL) theorem provides the theoretical foundation for understanding what 'controlling for' means in multiple regression and is the basis for modern fixed-effects estimation.

Griliches, Z. (1977). Estimating the Returns to Schooling: Some Econometric Problems.

Econometrica. DOI: 10.2307/1913285

Griliches systematically examines the biases in OLS estimates of returns to schooling, including ability bias and measurement error. This paper is a classic illustration of why researchers must think carefully about omitted variables when interpreting OLS coefficients causally.

Hamilton, B. H., & Nickerson, J. A. (2003). Correcting for Endogeneity in Strategic Management Research.

Strategic Organization. DOI: 10.1177/1476127003001001218

Hamilton and Nickerson warn strategy researchers that naive OLS estimates of the strategy-performance relationship are often biased by endogeneity, because firms that adopt a strategy differ systematically from those that do not. They provide an accessible tutorial on endogeneity and point toward solutions including instrumental variables and Heckman selection models. The paper remains a key reference for understanding why strategic management research requires identification strategies beyond simple regression.

Holland, P. W. (1986). Statistics and Causal Inference.

Journal of the American Statistical Association. DOI: 10.1080/01621459.1986.10478354

Holland articulates the fundamental problem of causal inference—that we can never observe both potential outcomes for the same unit—and formalizes the Rubin Causal Model framework. His dictum 'no causation without manipulation' shapes how a generation of researchers thinks about the conditions under which statistical associations can be given causal interpretations.

Long, J. S., & Ervin, L. H. (2000). Using Heteroscedasticity Consistent Standard Errors in the Linear Regression Model.

The American Statistician. DOI: 10.1080/00031305.2000.10474549

Long and Ervin compare HC0, HC1, HC2, and HC3 heteroscedasticity-consistent standard error estimators in a simulation study. Their finding that HC3 performs best in finite samples has influenced applied practice, with many applied researchers preferring HC3 over the default HC0.

Moulton, B. R. (1990). An Illustration of a Pitfall in Estimating the Effects of Aggregate Variables on Micro Units.

Review of Economics and Statistics. DOI: 10.2307/2109724

Moulton demonstrates that when aggregate-level variables (such as state policies) are used to explain individual-level outcomes, OLS standard errors that ignore within-group correlation can be dramatically understated. This paper establishes the 'Moulton problem' and motivates the widespread adoption of clustered standard errors in applied microeconomics.

Newey, W. K., & West, K. D. (1987). A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix.

Econometrica. DOI: 10.2307/1913610

Newey and West extend White's robust standard errors to also account for autocorrelation in time-series data in this short but hugely influential paper. The 'Newey-West standard errors' or 'HAC standard errors' are standard practice whenever researchers work with data that have a time dimension.

White, H. (1980). A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity.

Econometrica. DOI: 10.2307/1912934

White introduces the now-standard 'robust standard errors' that researchers routinely use with OLS. Before White's correction, standard errors could be misleadingly small when the variance of the error term was not constant across observations. Nearly every empirical paper today uses some variant of this approach.

Application (5)

Huselid, M. A. (1995). The Impact of Human Resource Management Practices on Turnover, Productivity, and Corporate Financial Performance.

Academy of Management Journal. DOI: 10.2307/256741

Huselid uses OLS (and related cross-sectional methods) to estimate the relationship between HR practices and firm performance in this influential management study. It helps launch the field of strategic HRM and illustrates both the power and limitations of regression-based approaches in management research.

Krueger, A. B. (1999). Experimental Estimates of Education Production Functions.

Quarterly Journal of Economics. DOI: 10.1162/003355399556052

Krueger uses Tennessee's Project STAR randomized class-size experiment to estimate the effect of class size on student achievement via OLS. Because treatment is randomized, the OLS coefficient has a causal interpretation, demonstrating that the method is not the issue -- the research design is what determines causality.

Mincer, J. (1974). Schooling, Experience, and Earnings.

National Bureau of Economic Research / Columbia University Press

Mincer develops the canonical human-capital earnings function relating log wages to years of schooling and labor-market experience. The Mincer equation remains one of the most replicated empirical models in economics and remains the standard benchmark for wage-equation analysis, though it should not be read as having solved the causal identification problems surrounding returns to schooling.

Shaver, J. M. (1998). Accounting for Endogeneity When Assessing Strategy Performance: Does Entry Mode Choice Affect FDI Survival?

Management Science. DOI: 10.1287/mnsc.44.4.571

Shaver demonstrates how ignoring endogeneity — specifically, the self-selection of firms into entry modes — biases performance estimates in this foundational strategy paper. He shows that the choice between greenfield entries and acquisitions reflects private information about expected survival, and uses a Heckman-style selection correction to obtain unbiased estimates. One of the first papers to systematically demonstrate endogeneity problems in strategy research.

Villalonga, B., & Amit, R. (2006). How Do Family Ownership, Control and Management Affect Firm Value?

Journal of Financial Economics. DOI: 10.1016/j.jfineco.2004.12.005

Villalonga and Amit study how different forms of family involvement — ownership, control, and management — affect firm value using OLS regression with clustered standard errors on a panel of Fortune 500 firms. The paper disentangles the separate effects of family ownership, voting control through dual-class shares and pyramids, and family management on Tobin's q.

Survey (5)

Angrist, J. D., & Pischke, J.-S. (2009). Mostly Harmless Econometrics: An Empiricist's Companion.

Princeton University Press. DOI: 10.1515/9781400829828

Angrist and Pischke write one of the most influential modern textbooks on applied econometrics, organizing the field around a design-based approach to causal inference. The book provides essential treatments of instrumental variables, difference-in-differences, and regression discontinuity, each grounded in the potential outcomes framework. It remains the standard reference for graduate students learning to evaluate and implement identification strategies.

Angrist, J. D., & Pischke, J.-S. (2010). The Credibility Revolution in Empirical Economics: How Better Research Design Is Taking the Con out of Econometrics.

Journal of Economic Perspectives. DOI: 10.1257/jep.24.2.3

Angrist and Pischke provide the intellectual context for why applied economics moved from 'throw variables into OLS and see what sticks' to design-based causal inference. They help researchers understand where OLS fits in the larger methodological landscape and why credible identification strategies matter.

Cameron, A. C., & Miller, D. L. (2015). A Practitioner's Guide to Cluster-Robust Inference.

Journal of Human Resources. DOI: 10.3368/jhr.50.2.317

Cameron and Miller cover all aspects of cluster-robust inference in OLS regression in this highly practical survey, including when to cluster, at what level, and what to do when the number of clusters is small. It has become the essential reference for applied researchers deciding how to handle clustered data.

King, G., & Roberts, M. E. (2015). How Robust Standard Errors Expose Methodological Problems They Do Not Fix, and What to Do About It.

Political Analysis. DOI: 10.1093/pan/mpu015

King and Roberts argue that researchers often use robust standard errors as a band-aid rather than fixing the underlying model specification. They provide practical guidance on when robust SEs are appropriate and when the model itself needs to be reconsidered.

Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data.

MIT Press

Wooldridge's graduate textbook is the standard reference for cross-section and panel data econometrics. Chapters 10-11 provide a thorough treatment of fixed effects, random effects, and related panel data methods, while later chapters cover general estimation methodology (MLE, GMM, M-estimation) with panel data applications throughout. The book covers both linear and nonlinear models with careful attention to assumptions.

Tags

model-based · continuous-outcome · cross-sectional