
OLS (Robust SEs, Clustering)

The workhorse of empirical research — linear regression with modern standard error corrections.

Quick Reference

When to Use
When you have a continuous outcome and want to estimate the conditional mean as a linear function of covariates, or as a starting point before applying more sophisticated identification strategies.
Key Assumption
For causal interpretation: E[u|X] = 0 (zero conditional mean / exogeneity). The error term must be uncorrelated with all regressors — this is the assumption that separates descriptive regression from causal inference.
Common Mistake
Interpreting OLS coefficients as causal effects without addressing endogeneity (omitted variable bias), or confusing statistical significance with economic significance.
Estimated Time
3 hours

One-Line Implementation

Stata: reg y x1 x2, vce(robust)
R: lm_robust(y ~ x1 + x2, data = df, se_type = 'HC2')
Python: smf.ols('y ~ x1 + x2', data=df).fit(cov_type='HC1')


Motivating Example

Imagine you are a labor economist in the 1970s, and you want to answer a deceptively simple question: how much more does an additional year of education earn you?

Jacob Mincer formalized this question in what became the most-estimated equation in all of economics. The Mincer earnings equation says:

\ln(\text{wage}_i) = \beta_0 + \beta_1 \cdot \text{Education}_i + \beta_2 \cdot \text{Experience}_i + \beta_3 \cdot \text{Experience}_i^2 + u_i

You run this regression on a sample of workers and find \hat{\beta}_1 = 0.10. Does that mean one more year of school causes a 10% increase in wages?

Maybe. But maybe not. People who get more education are also different in ways you cannot observe — they may be more motivated, have wealthier parents, or live in areas with better schools. If those unobserved factors also affect wages, your OLS estimate is picking up a mix of the true effect of education and the effect of everything correlated with education that you did not control for.

This tension — between what OLS estimates and what you want it to estimate — is the central challenge of applied empirical research, and it is the reason this entire website exists. OLS is the starting point. Every other method on this site exists because OLS alone is often not enough.


A. Overview

What OLS Does

OLS (Ordinary Least Squares) finds the linear function that best predicts your outcome variable, where "best" means the one that minimizes the sum of squared prediction errors. If you have an outcome Y and regressors X_1, X_2, \ldots, X_k, OLS finds the coefficients \hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k that minimize:

\sum_{i=1}^{n} (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \cdots - \hat{\beta}_k X_{ki})^2

In plain language: OLS draws the line (or hyperplane) through your data that makes the vertical distances from the points to the line as small as possible, on average.
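To see this minimization concretely, here is a small numpy sketch (synthetic data; variable names are illustrative). It computes the closed-form OLS fit and checks that perturbing the coefficients in any direction increases the sum of squared residuals, which is exactly what "least squares" means.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)   # true intercept 1, slope 2

X = np.column_stack([np.ones(n), x])     # design matrix with a constant
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS fit

def ssr(b):
    """Sum of squared residuals at coefficient vector b."""
    resid = y - X @ b
    return resid @ resid

# Any perturbation of the OLS solution raises the SSR
base = ssr(beta_hat)
for shift in ([0.1, 0.0], [0.0, 0.1], [-0.1, 0.05]):
    assert ssr(beta_hat + np.array(shift)) > base

print(beta_hat)  # close to [1, 2]
```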

What OLS Estimates

The OLS coefficient \hat{\beta}_1 estimates the best linear predictor of Y given the regressors — specifically, the linear projection of Y onto X. This coefficient tells you: "On average, how does Y differ between observations that differ by one unit of X_1, holding the other regressors constant?"

This projection is a descriptive statement. It becomes causal only under additional assumptions (see Section B).

When to Use OLS

  • You have a continuous outcome variable
  • You want to characterize the conditional mean relationship between variables
  • You want a starting point before applying more sophisticated methods
  • Your research design has addressed endogeneity (e.g., through randomization, controls, or as a first stage for IV)

When NOT to Use OLS

  • Your outcome is binary (use logit/probit), a count (use Poisson/negative binomial), or censored at zero or some threshold (consider a Tobit or other censored regression model, not covered in this catalog)
  • You suspect important unobserved confounders and have no strategy to address them — see the discussion of selection bias for why this matters
  • You want to claim causality but have no identification strategy

Common Confusions


B. Identification

For OLS to give you unbiased and consistent estimates, you need the following assumptions. We state them first in plain language, then formally.

Assumption 1: Linearity

Plain language: The true relationship between Y and the X's is (approximately) linear. If the relationship is actually curved, your straight line will miss it.

Formally: Y_i = X_i'\beta + u_i for some true parameter vector \beta.

Assumption 2: Random Sampling

Plain language: Each observation in your dataset is drawn independently from the same population. This assumption fails, for example, if you oversample certain groups or if observations within clusters are correlated.

Assumption 3: No Perfect Multicollinearity

Plain language: No regressor is a perfect linear function of the others. If two variables are perfectly correlated (e.g., you include both "age" and "years since birth"), OLS cannot separate their effects and the estimation breaks entirely.

Assumption 4: Zero Conditional Mean (Exogeneity)

Plain language: The error term u_i — which captures everything affecting Y that is not in your regressors — is on average zero for every value of X. This condition is the critical assumption. If people who get more education also have higher unobserved ability, and ability affects wages, then the error term is correlated with education and this assumption fails.

Formally: E[u_i \mid X_i] = 0.

Gauss-Markov Assumptions (for efficiency)

If Assumptions 1-4 hold, OLS is unbiased. For OLS to be the Best Linear Unbiased Estimator (BLUE) — meaning no other linear estimator has smaller variance — you additionally need:

Assumption 5: Homoscedasticity

Plain language: The spread of the error terms is the same regardless of the value of XX. If errors are larger for some observations than others (e.g., wage variance is higher for highly educated workers), OLS estimates remain unbiased but the conventional standard errors are wrong.

Formally: \text{Var}(u_i \mid X_i) = \sigma^2 for all i.

Assumption 6: No Serial Correlation

Plain language: The error for one observation is not correlated with the error for another. This assumption typically fails with time series or panel data.


C. Visual Intuition

Think of OLS geometrically. You have a cloud of data points in a scatter plot. OLS draws the line through the cloud that minimizes the sum of the squared vertical distances from each point to the line.

Interactive Simulation

How Confounders Change OLS Estimates

Adjust the strength of confounding to see how the OLS estimate changes when you omit a relevant variable.

[Interactive scatter plot: X (Regressor) vs. Y (Outcome), showing the true slope (beta = 2) and the current OLS estimate (1.97); sliders control confounding strength and sample size.]

When confounding strength is zero, OLS recovers the true effect. As you increase confounding — meaning the omitted variable is more strongly correlated with both the regressor and the outcome — the OLS estimate drifts away from the truth. This drift is omitted variable bias in action.

Interactive Simulation

OLS Omitted Variable Bias: Full Simulation

Explore how omitting a confounding variable U biases the OLS estimate of the effect of X on Y. The true DGP is Y = 2.0·X + 2·U + ε, where Cor(X, U) = 0.50.

[Interactive scatter plot of X vs. Y showing two fitted lines: "Biased (omits U)" and "Controlled (includes U)"]

Regression Results

Estimator             β̂       Bias
OLS (omits U)         3.251    +1.251
OLS (controls for U)  2.090    +0.090
True β                2.000

Adjustable parameters: number of observations to generate (200), the causal effect of X on Y (2.0), the correlation between X and the omitted variable U (0.50), and the standard deviation of the error term (1.5).

Substantial bias detected. The omitted variable U inflates the naive OLS estimate by +1.25. Controlling for U recovers an estimate much closer to the true β.

The simulation above generates synthetic data where Y = β·X + 2·U + ε and X is correlated with the unobserved confounder U. The "biased" OLS estimate omits U; the "controlled" estimate includes U. Adjust the confounding strength slider to see how the bias magnitude changes. Try setting the confounding to zero — both estimators converge to the true β.
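The same experiment can be reproduced outside the widget. Below is a minimal numpy sketch (synthetic data; the parameterization is simplified, so the expected bias here is γ·ρ = 1 rather than the widget's +1.25): it fits the short regression that omits U, the long regression that includes it, and checks the short slope against the textbook omitted-variable-bias formula, bias = γ · Cov(X, U)/Var(X).

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
beta, gamma, rho = 2.0, 2.0, 0.5   # true effect, U's effect on Y, Cor(X, U)

u = rng.normal(size=n)
x = rho * u + np.sqrt(1 - rho**2) * rng.normal(size=n)  # Cor(x, u) = rho
y = beta * x + gamma * u + rng.normal(size=n)

ones = np.ones(n)
b_short, *_ = np.linalg.lstsq(np.column_stack([ones, x]), y, rcond=None)
b_long, *_ = np.linalg.lstsq(np.column_stack([ones, x, u]), y, rcond=None)

# OVB formula: plim of the short slope is beta + gamma * Cov(x, u)/Var(x)
predicted_short = beta + gamma * np.cov(x, u)[0, 1] / np.var(x)

print(f"short (omits U):   {b_short[1]:.3f}")   # ~3.0, biased upward by ~1.0
print(f"long (controls U): {b_long[1]:.3f}")    # ~2.0, recovers the truth
print(f"OVB formula:       {predicted_short:.3f}")
```

Setting rho = 0 in the sketch makes the bias term vanish, so both estimators converge to the true β, mirroring the slider experiment above.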

Interactive Simulation

When Does OLS Work — and When Does It Fail?

DGP: Y = 1 + 2.0·X + 2·U + ε, where Cor(X, U) = 0.60. W is an observed proxy with R² on U = 0.40. N = 200 observations.

[Interactive scatter plot: Covariate (X) vs. Outcome (Y), showing the Naive OLS, With controls (W), and Oracle (controls U) fits against the true slope, with points shaded by the level of U (low/mid/high).]

Estimation Results

Estimator           β̂       SE      95% CI         Bias
OLS (no controls)   3.033    0.113   [2.81, 3.25]   +1.033
OLS (with W)        2.672    0.106   [2.46, 2.88]   +0.672
Oracle (with U)     2.022    0.067   [1.89, 2.15]   +0.022
True β              2.000

Adjustable parameters: number of observations (200), the causal effect of X on Y (2.0), the correlation between X and the unobserved confounder (0.60; 0 = no confounding), and how well the observed control W captures U (0.40; 0 = useless, 0.9 = almost perfect).

Why the difference?

Naive OLS overestimates the true effect (β̂ = 3.033, bias = +1.033) because the confounder U is positively correlated with both X (Cor = 0.60) and Y (γ = 2). This is classic omitted variable bias: OLS attributes some of U's effect on Y to X. Adding the observed proxy W (which explains 40% of the variance in U) reduces the bias by about 35% (β̂ = 2.672). The proxy partially captures the confounder, but imperfect proxies leave residual bias. The oracle estimator, which controls for the true U, removes essentially all the bias (β̂ = 2.022, bias = +0.022). Of course, you never observe U in practice — which is exactly why methods like IV, DiD, and FE exist.


D. Mathematical Derivation

Don't worry about the notation yet — here's what this means in words: OLS finds the coefficients that minimize squared prediction errors. The formula is beta-hat equals (X'X) inverse times X'Y.

The OLS Problem. We want to find the vector \hat{\beta} that minimizes:

S(\beta) = (Y - X\beta)'(Y - X\beta) = Y'Y - 2\beta'X'Y + \beta'X'X\beta

Step 1: Take the first-order condition. Differentiate S(\beta) with respect to \beta and set to zero:

\frac{\partial S}{\partial \beta} = -2X'Y + 2X'X\beta = 0

Step 2: Solve. Rearranging gives the normal equations:

X'X\hat{\beta} = X'Y

If X'X is invertible (Assumption 3), then:

\hat{\beta} = (X'X)^{-1}X'Y

Step 3: Show unbiasedness. Substitute Y = X\beta + u:

\hat{\beta} = (X'X)^{-1}X'(X\beta + u) = \beta + (X'X)^{-1}X'u

Taking the conditional expectation:

E[\hat{\beta} \mid X] = \beta + (X'X)^{-1}X'E[u \mid X]

If E[u \mid X] = 0 (Assumption 4), then E[\hat{\beta} \mid X] = \beta. OLS is unbiased.

Step 4: Variance of the estimator. Under homoscedasticity (\text{Var}(u \mid X) = \sigma^2 I):

\text{Var}(\hat{\beta} \mid X) = \sigma^2 (X'X)^{-1}

This expression is the formula behind conventional standard errors. When homoscedasticity fails, you need the heteroscedasticity-robust (sandwich) estimator instead:

\hat{V}_{\text{robust}} = (X'X)^{-1} \left( \sum_{i=1}^n \hat{u}_i^2 x_i x_i' \right) (X'X)^{-1}

This sandwich estimator is the heteroscedasticity-consistent (HC) estimator from White (1980). The formula as written is HC0; HC1 applies an additional N/(N-k) finite-sample correction (the default in Stata). R's sandwich package defaults to HC3 (following Long and Ervin (2000)), while Python's statsmodels uses conventional (non-robust) SEs by default — HC1 is commonly specified explicitly via cov_type='HC1'.
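The derivation translates directly into code. The numpy sketch below (synthetic heteroscedastic data, illustrative names) computes β̂ = (X'X)⁻¹X'Y from the normal equations, then the conventional, HC0, and HC1 variance estimators from Step 4; the only difference between HC0 and HC1 is the N/(N−k) scaling.

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 500, 2
x = rng.normal(size=n)
u = rng.normal(size=n) * (1 + np.abs(x))   # heteroscedastic errors
y = 1.0 + 2.0 * x + u

X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y               # (X'X)^{-1} X'Y
resid = y - X @ beta_hat

# Conventional variance: sigma^2 (X'X)^{-1}, valid only under homoscedasticity
sigma2 = resid @ resid / (n - k)
V_conv = sigma2 * XtX_inv

# HC0 sandwich: (X'X)^{-1} (sum_i uhat_i^2 x_i x_i') (X'X)^{-1}
meat = X.T @ (resid[:, None] ** 2 * X)
V_hc0 = XtX_inv @ meat @ XtX_inv

# HC1 applies the n/(n-k) finite-sample correction (Stata's vce(robust))
V_hc1 = V_hc0 * n / (n - k)

def se(V):
    return np.sqrt(np.diag(V))

print("conventional SEs:", se(V_conv))
print("HC0 SEs:         ", se(V_hc0))
print("HC1 SEs:         ", se(V_hc1))
```

With errors whose variance grows in |x|, the robust SE on the slope comes out noticeably larger than the conventional one, which is exactly the failure the sandwich estimator guards against.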


E. Implementation

Basic OLS with Robust Standard Errors

library(estimatr)

# OLS with HC1 robust standard errors (matches Stata's vce(robust))
m1 <- lm_robust(lwage ~ educ + exper + I(exper^2),
              data = df,
              se_type = "HC1")
summary(m1)

# Clustered standard errors (CR2 is the estimatr default; bias-reduced)
m2 <- lm_robust(lwage ~ educ + exper + I(exper^2),
              data = df,
              clusters = state,
              se_type = "CR2")
summary(m2)
Requires: estimatr
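For Python users, statsmodels offers the same options (e.g. smf.ols(...).fit(cov_type='HC1') or cov_type='cluster' with cov_kwds={'groups': ...}). As a dependency-free sketch of what the clustered option computes, here is the cluster-robust sandwich by hand in numpy: scores x_i·û_i are summed within each cluster before forming the "meat" (synthetic data; the cluster structure and the CR1 correction shown are illustrative of Stata-style vce(cluster)).

```python
import numpy as np

rng = np.random.default_rng(3)
n_clusters, per = 50, 40
n = n_clusters * per
g = np.repeat(np.arange(n_clusters), per)        # cluster ids (e.g., states)

# Both the regressor and the error have a cluster-level component, so
# observations within a cluster are not independent (the Moulton problem)
x = rng.normal(size=n) + 0.5 * rng.normal(size=n_clusters)[g]
u = rng.normal(size=n_clusters)[g] + rng.normal(size=n)
y = 1.0 + 2.0 * x + u

X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat

# Cluster-robust "meat": sum the scores x_i * uhat_i within each cluster first
meat = np.zeros((2, 2))
for c in range(n_clusters):
    s = X[g == c].T @ resid[g == c]              # cluster score vector
    meat += np.outer(s, s)

V_cr0 = XtX_inv @ meat @ XtX_inv
G, k = n_clusters, X.shape[1]
V_cr1 = V_cr0 * (G / (G - 1)) * ((n - 1) / (n - k))  # CR1 small-sample scaling

iid_se = np.sqrt((resid @ resid / (n - k)) * XtX_inv[1, 1])
print("iid SE:", iid_se, " clustered SE:", np.sqrt(V_cr1[1, 1]))
```

Because both x and the errors are correlated within clusters, the clustered SE is several times the naive iid SE here, which is the Moulton (1990) point discussed in the Paper Library below.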

F. Diagnostics

# --- Fit the base model ---
library(lmtest)
library(car)

m <- lm(lwage ~ educ + exper + I(exper^2), data = df)

# F.1 Residual plot: residuals vs fitted values
plot(fitted(m), resid(m),
   xlab = "Fitted values", ylab = "Residuals",
   main = "Residuals vs Fitted")
abline(h = 0, lty = 2, col = "red")

# F.2 Variance Inflation Factors
vif(m)

# F.3 Breusch-Pagan test for heteroscedasticity
bptest(m)

# F.4 Normality of residuals
shapiro.test(head(resid(m), 5000))  # Shapiro-Wilk (accepts at most 5000 obs)
qqnorm(resid(m)); qqline(resid(m), col = "red")

# F.5 Ramsey RESET test for functional form
resettest(m, power = 2:3, type = "fitted")
Requires: lmtest, car
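The same diagnostics exist in Python (statsmodels ships het_breuschpagan, variance_inflation_factor, and linear_reset). To show what the Breusch-Pagan test actually does, here is a dependency-light numpy sketch on synthetic heteroscedastic data: regress the squared residuals on the regressors and form LM = n·R², which is asymptotically χ² with degrees of freedom equal to the number of slope regressors.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 1_000
x = rng.uniform(0.0, 2.0, size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n) * (1.0 + x)  # error sd rises with x

X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat

# Breusch-Pagan auxiliary regression: squared residuals on X, LM = n * R^2
u2 = resid ** 2
g_hat, *_ = np.linalg.lstsq(X, u2, rcond=None)
fitted = X @ g_hat
r2 = 1 - np.sum((u2 - fitted) ** 2) / np.sum((u2 - u2.mean()) ** 2)
lm_stat = n * r2

# chi^2(1) critical value at the 5% level is 3.84
print(f"BP LM statistic: {lm_stat:.1f} (reject homoscedasticity if > 3.84)")
```

On this DGP the statistic is far above 3.84, so the test correctly flags the non-constant error variance; bptest(m) in R computes the same quantity.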

F.1 Residual Plots

Plot residuals (\hat{u}_i) against fitted values (\hat{Y}_i) and against each regressor. You should see a random scatter with no pattern. A funnel shape indicates heteroscedasticity; a curve indicates misspecification.

F.2 Variance Inflation Factors (VIF)

VIF measures how much multicollinearity inflates the variance of each coefficient. A common rule of thumb: VIF > 10 signals a problem. In Stata: estat vif. In R: car::vif(model). In Python: statsmodels variance_inflation_factor.
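VIF has a simple definition worth seeing once: VIF_j = 1/(1 − R²_j), where R²_j comes from regressing X_j on all the other regressors. A small numpy sketch (synthetic regressors with correlation about 0.9, so both VIFs land near 1/(1 − 0.81) ≈ 5):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 500
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + np.sqrt(1 - 0.9**2) * rng.normal(size=n)  # Cor(x1, x2) ~ 0.9

def vif(X, j):
    """VIF_j = 1 / (1 - R^2_j), R^2_j from regressing X_j on the rest."""
    others = np.delete(X, j, axis=1)
    others = np.column_stack([np.ones(len(X)), others])
    coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
    resid = X[:, j] - others @ coef
    r2 = 1 - resid @ resid / np.sum((X[:, j] - X[:, j].mean()) ** 2)
    return 1 / (1 - r2)

X = np.column_stack([x1, x2])
print(vif(X, 0), vif(X, 1))  # both around 5, below the rule-of-thumb cutoff of 10
```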

F.3 Heteroscedasticity Tests

The Breusch-Pagan test and White test formally test whether the error variance is constant. In practice, robust standard errors are a common default — they are valid whether or not heteroscedasticity is present, though they can be anti-conservative in small samples (see Section E above).

F.4 Normality of Residuals

OLS does not require normally distributed errors for consistency or unbiasedness. Normality is only needed for exact finite-sample inference (the t and F distributions). With large samples, the Central Limit Theorem ensures approximate normality of β^\hat{\beta} regardless. Do not reject an otherwise good model because of non-normal residuals.

F.5 Ramsey RESET Test

Tests for omitted nonlinear terms. If the test rejects, consider adding quadratic or interaction terms.


Interpreting Your Results

How to Interpret Coefficients

Linear-linear: \hat{\beta}_1 = 0.10 means a one-unit increase in X is associated with a 0.10-unit increase in Y, holding other variables constant.

Log-linear (log outcome): \hat{\beta}_1 = 0.10 means a one-unit increase in X is associated with approximately a 10% increase in Y (exactly, 100 \cdot (e^{0.10} - 1) \approx 10.5\%).

Log-log (both logged): \hat{\beta}_1 = 0.10 means a 1% increase in X is associated with a 0.10% increase in Y (an elasticity).
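The log-linear approximation is worth checking numerically: the exact percentage change implied by a coefficient β on a logged outcome is 100·(e^β − 1), which is close to 100·β only when β is small. A quick check (values illustrative):

```python
import math

def exact_pct_change(beta):
    """Exact % change in Y for a one-unit change in X in a log-linear model."""
    return 100 * (math.exp(beta) - 1)

for b in (0.05, 0.10, 0.50):
    print(f"beta = {b:.2f}: approx {100 * b:.1f}%, exact {exact_pct_change(b):.1f}%")
# beta = 0.10 gives 10.5% exactly; the gap widens as beta grows (0.50 -> 64.9%)
```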

Confidence Intervals

A 95% confidence interval of [0.05, 0.15] means: if you repeated this study many times and computed a confidence interval each time, 95% of those intervals would contain the true \beta. It does not mean there is a 95% probability that \beta lies in this particular interval.
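The repeated-sampling interpretation can be verified by simulation. This numpy sketch draws many samples from one DGP, builds a 95% interval for the slope each time, and counts how often the interval covers the true β (synthetic DGP; the 1.96 critical value is the large-sample normal approximation):

```python
import numpy as np

rng = np.random.default_rng(2024)
true_beta, n, reps = 2.0, 200, 2_000
covered = 0

for _ in range(reps):
    x = rng.normal(size=n)
    y = 1.0 + true_beta * x + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    se = np.sqrt(resid @ resid / (n - 2) * XtX_inv[1, 1])
    lo, hi = b[1] - 1.96 * se, b[1] + 1.96 * se
    covered += (lo <= true_beta <= hi)

print(f"coverage: {covered / reps:.3f}")  # close to 0.95
```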

Statistical vs. Economic Significance

A coefficient can be statistically significant (small p-value) but economically trivial (tiny effect size), or statistically insignificant (large p-value) but economically meaningful (large point estimate with wide confidence interval due to limited data). It is important to discuss both.

Common Misstatements to Avoid

  • Do not say "X causes Y" unless your design supports it
  • Do not say "X has no effect" when you mean "the coefficient is not statistically significant" — a null result could mean no effect or insufficient power
  • Do not interpret R-squared as a measure of model quality or causal validity
  • Do not compare R-squared across models with different dependent variables

G. What Can Go Wrong

Assumption Failure Demo

Omitted Variable Bias

Regression includes all relevant confounders

Estimated effect of education on wages: 0.06 (true effect: 0.06)

Assumption Failure Demo

Heteroscedasticity with Conventional SEs

Using robust standard errors

SE = 0.032, 95% CI: [0.037, 0.163], correct coverage

Assumption Failure Demo

Multicollinearity

Regressors are moderately correlated (correlation = 0.3)

Coefficients are stable, SEs are moderate, VIF = 1.1


H. Practice

H.1 Concept Checks

Concept Check

You regress wages on education and find a coefficient of 0.08. A classmate says this means one more year of education causes an 8% wage increase. What is wrong with this interpretation?

Concept Check

You run a regression of firm revenue on advertising spending and find that the coefficient on advertising is positive but not statistically significant (p = 0.15). Your co-author concludes: 'Advertising has no effect on revenue.' Is this correct?

Concept Check

You are studying how state minimum wage laws affect individual employment. Your data has 500,000 workers in 50 states. How should you compute standard errors?

Concept Check

Your colleague adds 15 control variables to a regression and the R-squared jumps from 0.12 to 0.65. They say the model is now much better. What should you be cautious about?

Concept Check

In a regression of test scores on class size, you find VIF = 22 for class size and VIF = 20 for school enrollment. What does this tell you and what might you do?

H.2 Guided Exercise

Guided Exercise

Fill in the blanks to complete this interpretation of an OLS regression.

You run the following Mincer regression on a sample of 5,000 workers and obtain: ln(wage) = 6.2 + 0.08*Education + 0.03*Experience - 0.0005*Experience^2, with robust standard error on Education = 0.015. The R-squared is 0.30.

One more year of education is associated with approximately what percent higher wages?

What is the lower bound of the 95% CI for the Education coefficient?

What is the upper bound of the 95% CI for the Education coefficient?

The model explains what percentage of the variance in log wages?

H.3 Error Detective

Error Detective

Read the analysis below carefully and identify the errors.

A researcher studies the effect of a new management training program on employee productivity. They run: reg productivity training age tenure, vce(robust) They find: coefficient on training = 5.2 (SE = 1.8, p = 0.004). They write: "The management training program causes a 5.2-unit increase in productivity (p < 0.01). The R-squared of 0.45 confirms that our model is well-specified. Since we control for age and tenure, our estimate is free of omitted variable bias."

Select all errors you can find:

Error Detective

Read the analysis below carefully and identify the errors.

A political scientist regresses voter turnout (county level) on whether the county adopted early voting (a state-level policy). They cluster standard errors at the county level and report: Coefficient on early voting: 3.1 percentage points (clustered SE = 0.8, p < 0.001). "We cluster at the county level to account for within-county correlation in voter behavior."

Select all errors you can find:

H.5 You Are the Referee

Referee Exercise

Read the paper summary below and write a brief referee critique (2-3 sentences) of the identification strategy.

Paper Summary

The authors study the effect of corporate board diversity on firm performance (ROA). Using a panel of S&P 500 firms (2010-2020), they regress ROA on the percentage of female board members, controlling for firm size (log assets), leverage, and industry fixed effects. They use robust standard errors and find that a 10 percentage point increase in female board representation is associated with a 0.8 percentage point increase in ROA (SE = 0.3, p = 0.008). They conclude that board diversity improves firm performance.

Key Table

Variable             Coefficient   Robust SE   p-value
Female Board Share   0.08          0.03        0.008
Log(Assets)          0.02          0.01        0.045
Leverage             -0.15         0.04        0.000
Industry FE          Yes
R-squared            0.23
N                    5,500

Authors' Identification Claim

By controlling for firm size, leverage, and industry fixed effects, we address the most important confounders. The robust standard errors account for heteroscedasticity.


I. Swap-In: When to Use Something Else

  • Fixed effects: When unobserved time-invariant confounders are the primary concern and panel data are available.
  • IV / 2SLS: When endogeneity from simultaneity, measurement error, or omitted variables cannot be addressed by controls alone, and a valid instrument is available.
  • Logit / Probit: When the outcome is binary (0/1). OLS (the linear probability model) can still be useful for average marginal effects, but logit/probit avoids predictions outside [0, 1].
  • Poisson / Negative Binomial: When the outcome is a non-negative count or inherently multiplicative (e.g., trade flows, patent counts).
  • Difference-in-differences: When a policy change creates a natural experiment with a comparison group and temporal variation.

J. Reviewer Checklist

Critical Reading Checklist


Paper Library

Foundational (9)

White, H. (1980). A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity.

Econometrica. DOI: 10.2307/1912934

This paper introduced the now-standard 'robust standard errors' that researchers routinely use with OLS. Before White's correction, standard errors could be misleadingly small when the variance of the error term was not constant across observations. Nearly every empirical paper today uses some variant of this approach.

Newey, W. K., & West, K. D. (1987). A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix.

Econometrica. DOI: 10.2307/1913610

This short but hugely influential paper extended White's robust standard errors to also account for autocorrelation in time-series data. The 'Newey-West standard errors' or 'HAC standard errors' are standard practice whenever researchers work with data that have a time dimension.

Angrist, J. D., & Pischke, J.-S. (2009). Mostly Harmless Econometrics: An Empiricist's Companion.

Princeton University Press. DOI: 10.1515/9781400829828

This widely-read textbook explains OLS regression and its causal-inference extensions in an accessible way. It is the go-to reference for understanding when OLS estimates can be interpreted causally and how techniques like IV, DID, and RDD build on the OLS framework.

Abadie, A., Athey, S., Imbens, G. W., & Wooldridge, J. M. (2020). Sampling-Based versus Design-Based Uncertainty in Regression Analysis.

Econometrica. DOI: 10.3982/ECTA12675

This paper clarifies when and why researchers should cluster standard errors in regression analysis. It distinguishes between sampling-based uncertainty (from drawing a sample from a population) and design-based uncertainty (from treatment assignment), providing rigorous guidance on a question that affects nearly every applied OLS study.

Holland, P. W. (1986). Statistics and Causal Inference.

Journal of the American Statistical Association. DOI: 10.1080/01621459.1986.10478354

Holland articulated the fundamental problem of causal inference—that we can never observe both potential outcomes for the same unit—and formalized the Rubin Causal Model framework. His dictum 'no causation without manipulation' shaped how a generation of researchers thinks about the conditions under which statistical associations can be given causal interpretations.

Moulton, B. R. (1990). An Illustration of a Pitfall in Estimating the Effects of Aggregate Variables on Micro Units.

Review of Economics and Statistics. DOI: 10.2307/2109724

Moulton demonstrated that when aggregate-level variables (such as state policies) are used to explain individual-level outcomes, OLS standard errors that ignore within-group correlation can be dramatically understated. This paper established the 'Moulton problem' and motivated the widespread adoption of clustered standard errors in applied microeconomics.

Cameron, A. C., Gelbach, J. B., & Miller, D. L. (2008). Bootstrap-Based Improvements for Inference with Clustered Errors.

Review of Economics and Statistics. DOI: 10.1162/rest.90.3.414

Addresses what happens when clustering is necessary but the number of clusters is small (fewer than 30-50). Proposes the wild cluster bootstrap as a solution, which has become the standard approach when researchers have too few clusters for asymptotic cluster-robust standard errors to be reliable.

Frisch, R., & Waugh, F. V. (1933). Partial Time Regressions as Compared with Individual Trends.

Econometrica. DOI: 10.2307/1907330

The original result establishing that a coefficient in a multiple regression can be obtained by first residualizing both the outcome and the regressor against all other covariates. The Frisch-Waugh-Lovell (FWL) theorem provides the theoretical foundation for understanding what 'controlling for' means in multiple regression and is the basis for modern fixed-effects estimation.

Long, J. S., & Ervin, L. H. (2000). Using Heteroscedasticity Consistent Standard Errors in the Linear Regression Model.

The American Statistician. DOI: 10.1080/00031305.2000.10474549

A simulation study comparing HC0, HC1, HC2, and HC3 heteroscedasticity-consistent standard error estimators. Found that HC3 provides the best finite-sample performance, influencing R's sandwich package to adopt HC3 as its default.

Application (8)

Mincer, J. (1974). Schooling, Experience, and Earnings.

National Bureau of Economic Research / Columbia University Press

Mincer's earnings equation—regressing log wages on years of schooling and experience—became one of the most replicated OLS models in economics. It established the standard approach for estimating returns to education and remains a benchmark in labor economics.

Griliches, Z. (1977). Estimating the Returns to Schooling: Some Econometric Problems.

Econometrica. DOI: 10.2307/1913285

Griliches systematically examined the biases in OLS estimates of returns to schooling, including ability bias and measurement error. This paper is a classic illustration of why researchers must think carefully about omitted variables when interpreting OLS coefficients causally.

Huselid, M. A. (1995). The Impact of Human Resource Management Practices on Turnover, Productivity, and Corporate Financial Performance.

Academy of Management Journal. DOI: 10.2307/256741

This influential management study used OLS (and related cross-sectional methods) to estimate the relationship between HR practices and firm performance. It helped launch the field of strategic HRM and illustrates both the power and limitations of regression-based approaches in management research.

Hamilton, B. H., & Nickerson, J. A. (2003). Correcting for Endogeneity in Strategic Management Research.

Strategic Organization. DOI: 10.1177/1476127003001001218

This paper warned strategy researchers that naive OLS estimates of the strategy-performance relationship are often biased by endogeneity. It provided an accessible tutorial on the problem and pointed toward solutions like instrumental variables and selection models.

Bertrand, M., & Schoar, A. (2003). Managing with Style: The Effect of Managers on Firm Policies.

Quarterly Journal of Economics. DOI: 10.1162/003355303322552775

This paper used OLS with manager fixed effects to show that individual CEO 'style' matters for corporate decisions. It is a landmark application of regression methods to separate manager effects from firm effects in management and finance research.

Krueger, A. B. (1999). Experimental Estimates of Education Production Functions.

Quarterly Journal of Economics. DOI: 10.1162/003355399556052

Uses Tennessee's Project STAR randomized class-size experiment to estimate the effect of class size on student achievement via OLS. Because treatment was randomized, the OLS coefficient has a causal interpretation, demonstrating that the method is not the issue; the research design is what determines causality.

Bertrand, M., & Mullainathan, S. (2004). Are Emily and Greg More Employable Than Lakisha and Jamal? A Field Experiment on Labor Market Discrimination.

American Economic Review. DOI: 10.1257/0002828042002561

A landmark audit study using OLS on experimentally generated data. Demonstrates clear and clean use of OLS when the research design (randomized names on resumes) justifies causal interpretation, showing significant racial discrimination in hiring callbacks.

Bloom, N., & Van Reenen, J. (2007). Measuring and Explaining Management Practices Across Firms and Countries.

Quarterly Journal of Economics. DOI: 10.1162/qjec.2007.122.4.1351

Uses OLS with an extensive set of controls to characterize the relationship between management practices and firm performance across countries. A clear example of using OLS descriptively when experimental variation is unavailable, with careful attention to causal language.

Survey (3)

King, G., & Roberts, M. E. (2015). How Robust Standard Errors Expose Methodological Problems They Do Not Fix, and What to Do About It.

Political Analysis. DOI: 10.1093/pan/mpu015

This paper argues that researchers often use robust standard errors as a band-aid rather than fixing the underlying model specification. It provides practical guidance on when robust SEs are appropriate and when the model itself needs to be reconsidered.

Cameron, A. C., & Miller, D. L. (2015). A Practitioner's Guide to Cluster-Robust Inference.

Journal of Human Resources. DOI: 10.3368/jhr.50.2.317

This highly practical survey covers all aspects of cluster-robust inference in OLS regression, including when to cluster, at what level, and what to do when the number of clusters is small. It has become the essential reference for applied researchers deciding how to handle clustered data.

Angrist, J. D., & Pischke, J.-S. (2010). The Credibility Revolution in Empirical Economics: How Better Research Design Is Taking the Con out of Econometrics.

Journal of Economic Perspectives. DOI: 10.1257/jep.24.2.3

Provides the intellectual context for why applied economics moved from 'throw variables into OLS and see what sticks' to design-based causal inference. Helps researchers understand where OLS fits in the larger methodological landscape and why credible identification strategies matter.

Tags

model-based, continuous-outcome, cross-sectional