MethodAtlas
Method · beginner · 16 min read
Model-Based · Established

OLS (Robust SEs, Clustering)

The workhorse of empirical research — linear regression with modern standard error corrections.

When to Use: When you have a continuous outcome and want to estimate the conditional mean as a linear function of covariates, or as a starting point before applying more sophisticated identification strategies.
Assumption: For causal interpretation, E[ε|X] = 0 (zero conditional mean / exogeneity). The error term must be uncorrelated with all regressors — this is the assumption that separates descriptive regression from causal inference.
Mistake: Interpreting OLS coefficients as causal effects without addressing endogeneity (omitted variable bias), or confusing statistical significance with economic significance.
Reading Time: ~16 min read · 11 sections · 9 interactive exercises

One-Line Implementation

R: lm_robust(y ~ x1 + x2, data = df, se_type = 'HC1')
Stata: reg y x1 x2, vce(robust)
Python: smf.ols('y ~ x1 + x2', data=df).fit(cov_type='HC1')


Motivating Example: The Mincer Earnings Equation

Imagine you are a labor economist in the 1970s, and you want to answer a deceptively simple question: how much more does an additional year of education earn you?

Jacob Mincer formalized this question in what became the most-estimated equation in all of economics. The Mincer earnings equation says:

\ln(\text{wage}_i) = \beta_0 + \beta_1 \cdot \text{Education}_i + \beta_2 \cdot \text{Experience}_i + \beta_3 \cdot \text{Experience}_i^2 + \varepsilon_i

You run this regression on a sample of workers and find \hat{\beta}_1 = 0.10. Does that mean one more year of school causes a 10% increase in wages?

Maybe. But maybe not. People who get more education are also different in ways you cannot observe — they may be more motivated, have wealthier parents, or live in areas with better schools. If those unobserved factors also affect wages, your OLS estimate is picking up a mix of the true effect of education and the effect of everything correlated with education that you did not control for.

This tension — between what OLS estimates and what you want it to estimate — is the central challenge of applied empirical research, and it is the reason this entire website exists. OLS is the starting point. Every other method on this site exists because OLS alone is often not enough.


A. Overview

What OLS Does

OLS (Ordinary Least Squares) finds the linear function that best predicts your outcome variable, where "best" means the one that minimizes the sum of squared prediction errors. If you have an outcome Y and regressors X_1, X_2, \ldots, X_k, OLS finds the coefficients \hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k that minimize:

\sum_{i=1}^{n} (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \cdots - \hat{\beta}_k X_{ki})^2

In plain language: OLS draws the line (or hyperplane) through your data that makes the vertical distances from the points to the line as small as possible, on average.
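As a quick numeric sketch (the simulated data and variable names below are my own, not from the article), the fitted coefficients really do minimize the sum of squared errors: perturbing them in any direction increases it.

```python
# Illustrative only: verify that the OLS fit minimizes the sum of squared errors.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = 1.0 + 0.5 * x + rng.normal(0, 1, 200)

# Closed-form OLS with one regressor: slope = Cov(x, y) / Var(x)
slope = np.cov(x, y, bias=True)[0, 1] / np.var(x)
intercept = y.mean() - slope * x.mean()

def ssr(b0, b1):
    """Sum of squared residuals for candidate coefficients."""
    return np.sum((y - b0 - b1 * x) ** 2)

# Any perturbation of the OLS coefficients increases the SSR.
base = ssr(intercept, slope)
assert all(ssr(intercept, slope + d) > base for d in (-0.1, 0.1))
assert all(ssr(intercept + d, slope) > base for d in (-0.5, 0.5))
```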

What OLS Estimates

The OLS coefficient \hat{\beta}_1 estimates the best linear predictor (linear projection) of Y onto X, which coincides with the conditional expectation function (CEF) when the CEF is linear. This coefficient tells you: "On average, how does Y differ between observations that differ by one unit of X_1, holding the other regressors constant?"

Even when the CEF is nonlinear, OLS provides the minimum mean squared error linear approximation to it (Angrist & Pischke, 2009). This projection is a descriptive statement. It becomes causal only under additional assumptions (see Section B).
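A small simulation illustrates this projection property (the data-generating process below is a made-up example): even when the true CEF is nonlinear, the OLS slope equals Cov(X, Y)/Var(X), the minimum-MSE linear approximation.

```python
# Illustrative only: OLS slope = projection coefficient even with a nonlinear CEF.
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 4, 5000)
y = np.exp(0.3 * x) + rng.normal(0, 0.2, 5000)  # nonlinear CEF

# Fit OLS with an intercept via a generic least-squares solver.
X = np.column_stack([np.ones_like(x), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]

# The fitted slope matches the linear projection formula exactly.
proj_slope = np.cov(x, y, bias=True)[0, 1] / np.var(x)
assert np.isclose(beta[1], proj_slope)
```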

When to Use OLS

  • You have a continuous outcome variable
  • You want to characterize the conditional mean relationship between variables
  • You want a starting point before applying more sophisticated methods
  • Your research design has addressed endogeneity (e.g., through randomization, controls, or as a first stage for instrumental variables (IV))

When NOT to Use OLS

  • Your outcome is binary (use logit/probit), a count (use Poisson/negative binomial), or censored at zero or some threshold (consider a Tobit or other censored regression model, not covered in this catalog)
  • You suspect important unobserved confounders and have no strategy to address them — see the discussion of selection bias for why this matters
  • You want to claim causality but have no identification strategy



B. Identification

For OLS to give you unbiased estimates (and consistent estimates under weaker conditions), you need the following assumptions. We state them first in plain language, then formally.

Assumption 1: Linearity

Plain language: The true relationship between Y and the X's is (approximately) linear. If the relationship is actually curved, your straight line will miss it.

Formally: Y_i = X_i'\beta + \varepsilon_i for some true parameter vector \beta.

Assumption 2: Random Sampling

Plain language: Each observation in your dataset is drawn independently from the same population. This assumption fails, for example, if you oversample certain groups or if observations within clusters are correlated.

Assumption 3: No Perfect Collinearity

Plain language: No regressor is a perfect linear function of the others. If two variables are perfectly correlated (e.g., you include both "age" and "years since birth"), OLS cannot separate their effects and the estimation breaks entirely.

Assumption 4: Zero Conditional Mean (Exogeneity)

Plain language: The error term \varepsilon_i — which captures everything affecting Y that is not in your regressors — is on average zero for every value of X. This condition is the critical assumption. If people who get more education also have higher unobserved ability, and ability affects wages, then the error term is correlated with education and the exogeneity assumption fails.

Formally: E[\varepsilon_i | X_i] = 0.

Additional Assumptions (for efficiency)

If Assumptions 1-4 hold, OLS is unbiased. For OLS to be the Best Linear Unbiased Estimator (BLUE) — meaning no other linear estimator has smaller variance — you additionally need:

Assumption 5: Homoscedasticity

Plain language: The spread of the error terms is the same regardless of the value of X. If errors are larger for some observations than others (e.g., wage variance is higher for highly educated workers), OLS estimates remain unbiased but the conventional standard errors are wrong.

Formally: \text{Var}(\varepsilon_i | X_i) = \sigma^2 for all i.

Assumption 6: No Serial Correlation (time series/panel only)

Plain language: The error for one observation is not correlated with the error for another. This assumption is automatically satisfied with cross-sectional data under random sampling (Assumption 2) and is only a separate concern with time series or panel data.


C. Visual Intuition

Think of OLS geometrically. You have a cloud of data points in a scatter plot. OLS draws the line through the cloud that minimizes the sum of the squared vertical distances from each point to the line.


D. Mathematical Derivation

Don't worry about the notation yet — here's what this means in words: OLS finds the coefficients that minimize squared prediction errors. The formula is beta-hat equals (X'X) inverse times X'Y.

The OLS Problem. We want to find the vector \hat{\beta} that minimizes:

S(\beta) = (Y - X\beta)'(Y - X\beta) = Y'Y - 2\beta'X'Y + \beta'X'X\beta

Step 1: Take the first-order condition. Differentiate S(β)S(\beta) with respect to β\beta and set to zero:

\frac{\partial S}{\partial \beta} = -2X'Y + 2X'X\beta = 0

Step 2: Solve. Rearranging gives the normal equations:

X'X\hat{\beta} = X'Y

If X'X is invertible (Assumption 3), then:

\hat{\beta} = (X'X)^{-1}X'Y
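As a sanity check (toy data, illustrative only), the closed form \hat{\beta} = (X'X)^{-1}X'Y reproduces what a generic least-squares solver returns:

```python
# Illustrative only: solve the normal equations and compare with lstsq.
import numpy as np

rng = np.random.default_rng(2)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)    # normal equations
beta_ls = np.linalg.lstsq(X, y, rcond=None)[0]  # generic least-squares solver
assert np.allclose(beta_hat, beta_ls)
```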

Step 3: Show unbiasedness. Substitute Y = X\beta + \varepsilon:

\hat{\beta} = (X'X)^{-1}X'(X\beta + \varepsilon) = \beta + (X'X)^{-1}X'\varepsilon

Taking the conditional expectation:

E[\hat{\beta} \mid X] = \beta + (X'X)^{-1}X'E[\varepsilon \mid X]

If E[\varepsilon \mid X] = 0 (Assumption 4), then E[\hat{\beta} \mid X] = \beta. OLS is unbiased.

Step 4: Variance of the estimator. Under homoscedasticity (\text{Var}(\varepsilon|X) = \sigma^2 I):

\text{Var}(\hat{\beta} \mid X) = \sigma^2 (X'X)^{-1}

This expression is the formula behind conventional standard errors. When homoscedasticity fails, you need the heteroscedasticity-robust (sandwich) estimator instead:

\hat{V}_{\text{robust}} = (X'X)^{-1} \left(\sum_{i=1}^n \hat{\varepsilon}_i^2 x_i x_i'\right) (X'X)^{-1}

This sandwich estimator is the heteroscedasticity-consistent (HC) estimator from White (1980). The formula as written is HC0; HC1 applies an additional N/(N-k) finite-sample correction (the default in Stata). R's sandwich package defaults to HC3 (following Long and Ervin (2000)), while Python's statsmodels uses conventional (non-robust) SEs by default — HC1 is commonly specified explicitly via cov_type='HC1'.
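The sandwich formula can be sketched directly in a few lines (simulated heteroscedastic data; an illustration, not a replacement for estimatr or statsmodels):

```python
# Illustrative only: HC0 sandwich variance plus the HC1 n/(n-k) correction.
import numpy as np

rng = np.random.default_rng(3)
n, k = 500, 2
x = rng.uniform(1, 5, n)
X = np.column_stack([np.ones(n), x])
y = 1 + 0.5 * x + rng.normal(0, x)  # heteroscedastic: error sd grows with x

XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
e = y - X @ beta

# Conventional: sigma^2 (X'X)^{-1}
V_conv = (e @ e / (n - k)) * XtX_inv
# HC0 sandwich: (X'X)^{-1} (sum_i e_i^2 x_i x_i') (X'X)^{-1}
meat = X.T @ (e[:, None] ** 2 * X)
V_hc0 = XtX_inv @ meat @ XtX_inv
V_hc1 = V_hc0 * n / (n - k)

se_conv = np.sqrt(np.diag(V_conv))
se_hc1 = np.sqrt(np.diag(V_hc1))
# With variance rising in x, robust slope SEs exceed conventional ones here.
assert se_hc1[1] > se_conv[1]
```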

Each coefficient in a multivariate regression equals the bivariate slope coefficient from regressing Y on the residual from regressing that X on all other covariates. This result is the Frisch-Waugh-Lovell (FWL) theorem, also called the regression anatomy formula (Angrist & Pischke, 2009). The practical implication is that a multivariate OLS coefficient isolates only the variation in X that is not explained by the other regressors.
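The FWL result is easy to verify numerically (toy data of my own construction):

```python
# Illustrative only: residualize x1, then a bivariate slope recovers the
# multivariate coefficient (Frisch-Waugh-Lovell).
import numpy as np

rng = np.random.default_rng(4)
n = 300
x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)  # x1 correlated with x2
y = 1 + 2 * x1 - 1 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
b_full = np.linalg.lstsq(X, y, rcond=None)[0]

# Residualize x1 on the other regressors (constant and x2).
Z = np.column_stack([np.ones(n), x2])
x1_res = x1 - Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]

# Bivariate slope of y on the residual reproduces the multivariate coefficient.
b_fwl = (x1_res @ y) / (x1_res @ x1_res)
assert np.isclose(b_full[1], b_fwl)
```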


E. Implementation

Basic OLS with Robust Standard Errors

# Requires: estimatr
library(estimatr)

# --- Step 1: OLS with Robust Standard Errors ---
# lm_robust() fits OLS and computes heteroskedasticity-robust SEs.
# HC1 matches Stata's vce(robust) with degrees-of-freedom correction.
# I(exper^2) includes experience-squared to capture diminishing returns.
m1 <- lm_robust(lwage ~ educ + exper + I(exper^2),
              data = df,
              se_type = "HC1")
# summary() reports coefficients, robust SEs, t-stats, and R-squared.
# The educ coefficient = approximate % wage increase per year of education.
summary(m1)

# --- Step 2: Clustered Standard Errors ---
# When observations within a group (e.g., state) are correlated,
# cluster at that level to avoid understating uncertainty.
# CR2 (bias-reduced) is the estimatr default; better for few clusters.
m2 <- lm_robust(lwage ~ educ + exper + I(exper^2),
              data = df,
              clusters = state,
              se_type = "CR2")
# Compare SEs to Step 1: clustered SEs are typically larger,
# reflecting within-cluster correlation.
summary(m2)

F. Diagnostics

# --- Fit the base model ---
library(lmtest)
library(car)

m <- lm(lwage ~ educ + exper + I(exper^2), data = df)

# F.1 Residual plot: residuals vs fitted values
plot(fitted(m), resid(m),
   xlab = "Fitted values", ylab = "Residuals",
   main = "Residuals vs Fitted")
abline(h = 0, lty = 2, col = "red")

# F.2 Variance Inflation Factors
vif(m)

# F.3 Breusch-Pagan test for heteroscedasticity
bptest(m)

# F.4 Normality of residuals
shapiro.test(resid(m)[1:5000])  # Shapiro-Wilk (max 5000 obs)
qqnorm(resid(m)); qqline(resid(m), col = "red")

# F.5 Ramsey RESET test for functional form
resettest(m, power = 2:3, type = "fitted")

F.1 Residual Plots

Plot residuals (\hat{\varepsilon}_i) against fitted values (\hat{Y}_i) and against each regressor. You should see a random scatter with no pattern. A funnel shape indicates heteroscedasticity; a curve indicates misspecification.

F.2 Variance Inflation Factors (VIF)

VIF measures how much multicollinearity inflates the variance of each coefficient. A common rule of thumb: VIF > 10 signals a problem. In Stata: estat vif. In R: car::vif(model). In Python: statsmodels variance_inflation_factor.
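Under the hood, VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing regressor j on the remaining regressors. A minimal sketch with made-up collinear data:

```python
# Illustrative only: VIF computed by hand as 1 / (1 - R^2_j).
import numpy as np

rng = np.random.default_rng(5)
n = 400
x2 = rng.normal(size=n)
x1 = 0.9 * x2 + rng.normal(0, 0.5, n)  # deliberately collinear with x2

def vif(target, others):
    """VIF of `target` given the list of other regressors."""
    Z = np.column_stack([np.ones(len(target))] + others)
    fitted = Z @ np.linalg.lstsq(Z, target, rcond=None)[0]
    r2 = 1 - np.sum((target - fitted) ** 2) / np.sum((target - target.mean()) ** 2)
    return 1 / (1 - r2)

v = vif(x1, [x2])
assert 2 < v < 10  # correlated regressors inflate the variance
```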

F.3 Heteroscedasticity Tests

The Breusch-Pagan test and White test formally test whether the error variance is constant. In practice, robust standard errors are a common default — they are valid whether or not heteroscedasticity is present, though they can be anti-conservative in small samples (see Section E above).
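The Breusch-Pagan statistic itself is simple to compute by hand: n times the R^2 from regressing squared residuals on the regressors. A sketch on simulated heteroscedastic data (in practice you would use lmtest::bptest or statsmodels):

```python
# Illustrative only: hand-rolled Breusch-Pagan LM statistic, LM = n * R^2
# from the auxiliary regression of squared residuals on the regressors.
import numpy as np

rng = np.random.default_rng(6)
n = 500
x = rng.uniform(1, 5, n)
X = np.column_stack([np.ones(n), x])
y = 1 + 0.5 * x + rng.normal(0, x)  # variance rises with x

e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]

# Auxiliary regression of e^2 on X; under the null, LM ~ chi-squared(1).
e2 = e ** 2
fitted = X @ np.linalg.lstsq(X, e2, rcond=None)[0]
r2_aux = 1 - np.sum((e2 - fitted) ** 2) / np.sum((e2 - e2.mean()) ** 2)
lm = n * r2_aux
assert lm > 3.84  # exceeds the 5% chi-squared(1) critical value for this sample
```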

F.4 Normality of Residuals

OLS does not require normally distributed errors for consistency or unbiasedness. Normality is only needed for exact finite-sample inference (the t and F distributions). With large samples, the Central Limit Theorem ensures approximate normality of \hat{\beta} regardless. Do not reject an otherwise good model because of non-normal residuals.
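A quick Monte Carlo illustrates the CLT claim (simulation design is my own): with heavily skewed errors, nominal 95% intervals for the slope still cover the truth at roughly the advertised rate in moderately large samples.

```python
# Illustrative only: coverage of nominal 95% CIs with skewed (exponential) errors.
import numpy as np

rng = np.random.default_rng(7)
n, reps, beta1 = 200, 1000, 2.0
hits = 0
for _ in range(reps):
    x = rng.uniform(0, 1, n)
    eps = rng.exponential(1.0, n) - 1.0  # mean zero but heavily skewed
    y = 1.0 + beta1 * x + eps
    X = np.column_stack([np.ones(n), x])
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    se = np.sqrt((e @ e / (n - 2)) * XtX_inv[1, 1])  # conventional slope SE
    if abs(b[1] - beta1) <= 1.96 * se:
        hits += 1

coverage = hits / reps
assert 0.90 < coverage < 0.99  # close to the nominal 95%
```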

F.5 Ramsey RESET Test

Tests for omitted nonlinear terms. If the test rejects, consider adding quadratic or interaction terms.


Interpreting Your Results

Correct interpretation depends on the functional form of the regression and the units of measurement.

How to Interpret Coefficients

Linear-linear: \hat{\beta}_1 = 0.10 means a one-unit increase in X is associated with a 0.10-unit increase in Y, holding other variables constant.

Log-linear (log outcome): \hat{\beta}_1 = 0.10 means a one-unit increase in X is associated with approximately a 10% increase in Y.

Log-log (both logged): \hat{\beta}_1 = 0.10 means a 1% increase in X is associated with a 0.10% increase in Y (an elasticity).
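One caveat worth a line of arithmetic: the log-linear "10%" reading is an approximation; the exact percentage change is 100(e^{\beta} - 1).

```python
# Illustrative only: exact vs. approximate percent change for a log outcome.
import math

beta = 0.10
exact = 100 * (math.exp(beta) - 1)
# 0.10 log points is about a 10.5% increase, not exactly 10%.
assert abs(exact - 10.517) < 0.01
```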

Confidence Intervals

A 95% confidence interval of [0.05, 0.15] means: if you repeated this study many times and computed a confidence interval each time, 95% of those intervals would contain the true \beta. It does not mean there is a 95% probability that \beta lies in this particular interval.
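Mechanically, a large-sample 95% CI is the point estimate plus or minus 1.96 standard errors; with the made-up numbers below it reproduces an interval of [0.05, 0.15].

```python
# Illustrative only: constructing a 95% CI from a coefficient and its SE.
beta_hat, se = 0.10, 0.0255  # hypothetical estimate and standard error
lo, hi = beta_hat - 1.96 * se, beta_hat + 1.96 * se
assert round(lo, 3) == 0.05 and round(hi, 3) == 0.15
```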

Statistical vs. Economic Significance

A coefficient can be statistically significant (small p-value) but economically trivial (tiny effect size), or statistically insignificant (large p-value) but economically meaningful (large point estimate with wide confidence interval due to limited data). It is important to discuss both.

Common Misstatements to Avoid

  • Do not say "X causes Y" unless your design supports it
  • Do not say "X has no effect" when you mean "the coefficient is not statistically significant" — a null result could mean no effect or insufficient power
  • Do not interpret R-squared as a measure of model quality or causal validity
  • Do not compare R-squared across models with different dependent variables

G. What Can Go Wrong

Omitted Variable Bias

Scenario: the regression includes all relevant confounders. Result: estimated effect of education on wages is 0.06 (true effect: 0.06).

The short regression coefficient equals the long regression coefficient plus the effect of the omitted variable times the regression of the omitted on the included: \text{plim}(\hat{\beta}_{\text{short}}) = \beta_{\text{long}} + \beta_{\text{omitted}} \times \delta, where \delta is the coefficient from regressing the omitted variable on the included variable (Wooldridge, 2010). This formula shows that OVB is zero only when the omitted variable is uncorrelated with the included regressor (\delta = 0) or has no effect on the outcome (\beta_{\text{omitted}} = 0).
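The OVB decomposition holds exactly in sample, which makes it easy to verify (toy simulated data; variable names are invented):

```python
# Illustrative only: short coefficient = long coefficient + beta_omitted * delta.
import numpy as np

rng = np.random.default_rng(8)
n = 200_000
ability = rng.normal(size=n)
educ = 12 + 0.8 * ability + rng.normal(size=n)  # omitted variable predicts educ
lwage = 1 + 0.06 * educ + 0.10 * ability + rng.normal(0, 0.3, n)

def slope(x, y):
    """Bivariate OLS slope of y on x (with intercept)."""
    return np.cov(x, y, bias=True)[0, 1] / np.var(x)

b_short = slope(educ, lwage)  # short regression omits ability
delta = slope(educ, ability)  # regression of the omitted on the included

# Long regression of lwage on educ and ability.
X_long = np.column_stack([np.ones(n), educ, ability])
b_long = np.linalg.lstsq(X_long, lwage, rcond=None)[0]

b_implied = b_long[1] + b_long[2] * delta  # the OVB identity
assert np.isclose(b_short, b_implied)
```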

Heteroscedasticity with Conventional SEs

Scenario: using robust standard errors. Result: SE = 0.032, 95% CI [0.037, 0.163], correct coverage.

Multicollinearity

Scenario: regressors are moderately correlated (correlation = 0.3). Result: coefficients are stable, SEs are moderate, VIF = 1.1.

Measurement Error in Regressors

Scenario: the explanatory variable is measured precisely. Result: the estimated coefficient reflects the true relationship.

Classical measurement error in an explanatory variable produces attenuation bias — the coefficient is biased toward zero by the factor \sigma_x^2 / (\sigma_x^2 + \sigma_e^2), called the reliability ratio. Measurement error in the dependent variable does not bias coefficients but inflates standard errors (Wooldridge, 2010).
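A simulation of the attenuation result (my own toy data-generating process; "reliability" below is the ratio defined above):

```python
# Illustrative only: classical measurement error shrinks the slope by the
# reliability ratio var(x) / (var(x) + var(e)).
import numpy as np

rng = np.random.default_rng(9)
n = 500_000
x = rng.normal(0, 1, n)          # true regressor, variance 1
noise = rng.normal(0, 1, n)      # classical measurement error, variance 1
x_obs = x + noise                # observed, mismeasured regressor
y = 2.0 * x + rng.normal(0, 0.5, n)

b = np.cov(x_obs, y, bias=True)[0, 1] / np.var(x_obs)  # slope on mismeasured x
reliability = np.var(x) / (np.var(x) + np.var(noise))
assert abs(b - 2.0 * reliability) < 0.01  # attenuated toward zero as predicted
```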


H. Practice

H.1 Concept Checks

Concept Check

You regress wages on education and find a coefficient of 0.08. A classmate says the coefficient means one more year of education causes an 8% wage increase. What is wrong with this interpretation?

Concept Check

You run a regression of firm revenue on advertising spending and find that the coefficient on advertising is positive but not statistically significant (p = 0.15). Your co-author concludes: 'Advertising has no effect on revenue.' Is this correct?

Concept Check

You are studying how state minimum wage laws affect individual employment. Your data has 500,000 workers in 50 states. How should you compute standard errors?

Concept Check

Your colleague adds 15 control variables to a regression and the R-squared jumps from 0.12 to 0.65. They say the model is now much better. What should you be cautious about?

Concept Check

In a regression of test scores on class size, you find VIF = 22 for class size and VIF = 20 for school enrollment. What does this tell you and what might you do?

H.2 Guided Exercise

Guided Exercise

Fill in the blanks to complete this interpretation of an OLS regression.

You run the following Mincer regression on a sample of 5,000 workers and obtain: ln(wage) = 6.2 + 0.08*Education + 0.03*Experience - 0.0005*Experience^2, with robust standard error on Education = 0.015. The R-squared is 0.30.

One more year of education is associated with approximately what percent higher wages?

What is the lower bound of the 95% CI for the Education coefficient?

What is the upper bound of the 95% CI for the Education coefficient?

The model explains what percentage of the variance in log wages?

H.3 Error Detective

Error Detective

Read the analysis below carefully and identify the errors.

A researcher studies the effect of a new management training program on employee productivity. They run:

reg productivity training age tenure, vce(robust)

They find: coefficient on training = 5.2 (SE = 1.8, p = 0.004). They write: "The management training program causes a 5.2-unit increase in productivity (p < 0.01). The R-squared of 0.45 confirms that our model is well-specified. Since we control for age and tenure, our estimate is free of omitted variable bias."

Select all errors you can find:

Error Detective

Read the analysis below carefully and identify the errors.

A political scientist regresses voter turnout (county level) on whether the county adopted early voting (a state-level policy). They cluster standard errors at the county level and report:

Coefficient on early voting: 3.1 percentage points (clustered SE = 0.8, p < 0.001).

"We cluster at the county level to account for within-county correlation in voter behavior."

Select all errors you can find:

H.4 You Are the Referee

Referee Exercise

Read the paper summary below and write a brief referee critique (2-3 sentences) of the identification strategy.

Paper Summary

The authors study the effect of corporate board diversity on firm performance (ROA). Using a panel of S&P 500 firms (2010-2020), they regress ROA on the percentage of female board members, controlling for firm size (log assets), leverage, and industry fixed effects. They use robust standard errors and find that a 10 percentage point increase in female board representation is associated with a 0.8 percentage point increase in ROA (SE = 0.3, p = 0.008). They conclude that board diversity improves firm performance.

Key Table

Variable              Coefficient   Robust SE   p-value
Female Board Share    0.08          0.03        0.008
Log(Assets)           0.02          0.01        0.045
Leverage              -0.15         0.04        0.000
Industry FE           Yes
R-squared             0.23
N                     5,500

Authors' Identification Claim

By controlling for firm size, leverage, and industry fixed effects, we address the most important confounders. The robust standard errors account for heteroscedasticity.


I. Swap-In: When to Use Something Else

  • Fixed effects: When unobserved time-invariant confounders are the primary concern and panel data are available.
  • IV / 2SLS: When endogeneity from simultaneity, measurement error, or omitted variables cannot be addressed by controls alone, and a valid instrument is available.
  • Logit / Probit: When the outcome is binary (0/1). OLS (the linear probability model) can still be useful for average marginal effects, but logit/probit avoids predictions outside [0, 1].
  • Poisson / Negative Binomial: When the outcome is a non-negative count or inherently multiplicative (e.g., trade flows, patent counts).
  • Difference-in-differences: When a policy change creates a natural experiment with a comparison group and temporal variation.

J. Reviewer Checklist

Critical Reading Checklist


Paper Library

Foundational (10)

Abadie, A., Athey, S., Imbens, G. W., & Wooldridge, J. M. (2020). Sampling-Based versus Design-Based Uncertainty in Regression Analysis.

Econometrica. DOI: 10.3982/ECTA12675

Abadie et al. distinguish between sampling-based uncertainty (from drawing a sample from a population) and design-based uncertainty (from treatment assignment) in regression analysis. They show that conventional standard errors can be conservative when the sample includes a substantial fraction of the population, providing a rigorous framework for understanding what regression standard errors actually measure. This paper clarifies the conceptual foundations for inference in empirical work and complements their separate 2023 QJE paper on clustering.

Cameron, A. C., Gelbach, J. B., & Miller, D. L. (2008). Bootstrap-Based Improvements for Inference with Clustered Errors.

Review of Economics and Statistics. DOI: 10.1162/rest.90.3.414

Cameron, Gelbach, and Miller address what happens when clustering is necessary but the number of clusters is small (fewer than 30-50). They propose the wild cluster bootstrap as a solution, which has become the standard approach when researchers have too few clusters for asymptotic cluster-robust standard errors to be reliable.

Frisch, R., & Waugh, F. V. (1933). Partial Time Regressions as Compared with Individual Trends.

Econometrica. DOI: 10.2307/1907330

Frisch and Waugh establish that a coefficient in a multiple regression can be obtained by first residualizing both the outcome and the regressor against all other covariates. The Frisch-Waugh-Lovell (FWL) theorem provides the theoretical foundation for understanding what 'controlling for' means in multiple regression and is the basis for modern fixed-effects estimation.

Griliches, Z. (1977). Estimating the Returns to Schooling: Some Econometric Problems.

Econometrica. DOI: 10.2307/1913285

Griliches systematically examines the biases in OLS estimates of returns to schooling, including ability bias and measurement error. This paper is a classic illustration of why researchers must think carefully about omitted variables when interpreting OLS coefficients causally.

Hamilton, B. H., & Nickerson, J. A. (2003). Correcting for Endogeneity in Strategic Management Research.

Strategic Organization. DOI: 10.1177/1476127003001001218

Hamilton and Nickerson warn strategy researchers that naive OLS estimates of the strategy-performance relationship are often biased by endogeneity, because firms that adopt a strategy differ systematically from those that do not. They provide an accessible tutorial on endogeneity and point toward solutions including instrumental variables and Heckman selection models. The paper remains a key reference for understanding why strategic management research requires identification strategies beyond simple regression.

Holland, P. W. (1986). Statistics and Causal Inference.

Journal of the American Statistical Association. DOI: 10.1080/01621459.1986.10478354

Holland articulates the fundamental problem of causal inference—that we can never observe both potential outcomes for the same unit—and formalizes the Rubin Causal Model framework. His dictum 'no causation without manipulation' shapes how a generation of researchers thinks about the conditions under which statistical associations can be given causal interpretations.

Long, J. S., & Ervin, L. H. (2000). Using Heteroscedasticity Consistent Standard Errors in the Linear Regression Model.

The American Statistician. DOI: 10.1080/00031305.2000.10474549

Long and Ervin compare HC0, HC1, HC2, and HC3 heteroscedasticity-consistent standard error estimators in a simulation study. Their finding that HC3 performs best in finite samples has influenced applied practice, with many applied researchers preferring HC3 over the default HC0.

Moulton, B. R. (1990). An Illustration of a Pitfall in Estimating the Effects of Aggregate Variables on Micro Units.

Review of Economics and Statistics. DOI: 10.2307/2109724

Moulton demonstrates that when aggregate-level variables (such as state policies) are used to explain individual-level outcomes, OLS standard errors that ignore within-group correlation can be dramatically understated. This paper establishes the 'Moulton problem' and motivates the widespread adoption of clustered standard errors in applied microeconomics.

Newey, W. K., & West, K. D. (1987). A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix.

Econometrica. DOI: 10.2307/1913610

Newey and West extend White's robust standard errors to also account for autocorrelation in time-series data in this short but hugely influential paper. The 'Newey-West standard errors' or 'HAC standard errors' are standard practice whenever researchers work with data that have a time dimension.

White, H. (1980). A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity.

Econometrica. DOI: 10.2307/1912934

White introduces the now-standard 'robust standard errors' that researchers routinely use with OLS. Before White's correction, standard errors could be misleadingly small when the variance of the error term was not constant across observations. Nearly every empirical paper today uses some variant of this approach.

Application (5)

Huselid, M. A. (1995). The Impact of Human Resource Management Practices on Turnover, Productivity, and Corporate Financial Performance.

Academy of Management Journal. DOI: 10.2307/256741

Huselid uses OLS (and related cross-sectional methods) to estimate the relationship between HR practices and firm performance in this influential management study. It helps launch the field of strategic HRM and illustrates both the power and limitations of regression-based approaches in management research.

Krueger, A. B. (1999). Experimental Estimates of Education Production Functions.

Quarterly Journal of Economics. DOI: 10.1162/003355399556052

Krueger uses Tennessee's Project STAR randomized class-size experiment to estimate the effect of class size on student achievement via OLS. Because treatment is randomized, the OLS coefficient has a causal interpretation, demonstrating that the method is not the issue -- the research design is what determines causality.

Mincer, J. (1974). Schooling, Experience, and Earnings.

National Bureau of Economic Research / Columbia University Press

Mincer develops the canonical human-capital earnings function relating log wages to years of schooling and labor-market experience. The Mincer equation remains one of the most replicated empirical models in economics and remains the standard benchmark for wage-equation analysis, though it should not be read as having solved the causal identification problems surrounding returns to schooling.

Shaver, J. M. (1998). Accounting for Endogeneity When Assessing Strategy Performance: Does Entry Mode Choice Affect FDI Survival?

Management Science. DOI: 10.1287/mnsc.44.4.571

Shaver demonstrates how ignoring endogeneity — specifically, the self-selection of firms into entry modes — biases performance estimates in this foundational strategy paper. He shows that the choice between greenfield entries and acquisitions reflects private information about expected survival, and uses a Heckman-style selection correction to obtain unbiased estimates. One of the first papers to systematically demonstrate endogeneity problems in strategy research.

Villalonga, B., & Amit, R. (2006). How Do Family Ownership, Control and Management Affect Firm Value?

Journal of Financial Economics. DOI: 10.1016/j.jfineco.2004.12.005

Villalonga and Amit study how different forms of family involvement — ownership, control, and management — affect firm value using OLS regression with clustered standard errors on a panel of Fortune 500 firms. The paper disentangles the separate effects of family ownership, voting control through dual-class shares and pyramids, and family management on Tobin's q.

Survey (5)

Angrist, J. D., & Pischke, J.-S. (2009). Mostly Harmless Econometrics: An Empiricist's Companion.

Princeton University Press. DOI: 10.1515/9781400829828

Angrist and Pischke write one of the most influential modern textbooks on applied econometrics, organizing the field around a design-based approach to causal inference. The book provides essential treatments of instrumental variables, difference-in-differences, and regression discontinuity, each grounded in the potential outcomes framework. It remains the standard reference for graduate students learning to evaluate and implement identification strategies.

Angrist, J. D., & Pischke, J.-S. (2010). The Credibility Revolution in Empirical Economics: How Better Research Design Is Taking the Con out of Econometrics.

Journal of Economic Perspectives. DOI: 10.1257/jep.24.2.3

Angrist and Pischke provide the intellectual context for why applied economics moved from 'throw variables into OLS and see what sticks' to design-based causal inference. They help researchers understand where OLS fits in the larger methodological landscape and why credible identification strategies matter.

Cameron, A. C., & Miller, D. L. (2015). A Practitioner's Guide to Cluster-Robust Inference.

Journal of Human Resources. DOI: 10.3368/jhr.50.2.317

Cameron and Miller cover all aspects of cluster-robust inference in OLS regression in this highly practical survey, including when to cluster, at what level, and what to do when the number of clusters is small. It has become the essential reference for applied researchers deciding how to handle clustered data.

King, G., & Roberts, M. E. (2015). How Robust Standard Errors Expose Methodological Problems They Do Not Fix, and What to Do About It.

Political Analysis. DOI: 10.1093/pan/mpu015

King and Roberts argue that researchers often use robust standard errors as a band-aid rather than fixing the underlying model specification. They provide practical guidance on when robust SEs are appropriate and when the model itself needs to be reconsidered.

Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data.

MIT Press

Wooldridge's graduate textbook is the standard reference for cross-section and panel data econometrics. Chapters 10-11 provide a thorough treatment of fixed effects, random effects, and related panel data methods, while later chapters cover general estimation methodology (MLE, GMM, M-estimation) with panel data applications throughout. The book covers both linear and nonlinear models with careful attention to assumptions.

Tags

model-based · continuous-outcome · cross-sectional