MethodAtlas
Design-Based · Established

Instrumental Variables / 2SLS

Uses an external source of variation (instrument) that affects treatment but not the outcome directly.

Quick Reference

When to Use
When your key regressor is endogenous (correlated with the error term) and you have an instrument — a variable that affects the treatment but has no direct effect on the outcome.
Key Assumption
Relevance (instrument predicts the endogenous regressor, first-stage F > 10), exogeneity (instrument uncorrelated with the error), and the exclusion restriction (instrument affects the outcome only through the endogenous regressor). The exclusion restriction is untestable with a single instrument.
Common Mistake
Using a weak instrument (first-stage F < 10) without acknowledging the resulting bias toward OLS, or not reporting the first-stage F-statistic. Also, not recognizing that IV estimates LATE (the effect for compliers), not the ATE.
Estimated Time
3.5 hours

One-Line Implementation

Stata: ivregress 2sls y x1 (treatment = instrument), first vce(robust)
R: feols(y ~ x1 | 0 | treatment ~ instrument, data = df, vcov = 'HC1')
Python: IV2SLS(dependent=df['y'], exog=df[['const','x1']], endog=df['treatment'], instruments=df['instrument']).fit(cov_type='robust')

Motivating Example: Colonial Origins of Comparative Development

Why are some countries rich and others poor? Acemoglu et al. (2001) proposed that institutions — the rules governing economic activity — are the key driver. But institutions are endogenous: rich countries invest in better institutions, creating a classic chicken-and-egg problem.

Their solution was an instrument: settler mortality in the colonial era. The argument runs as follows:

  1. In places where European settlers faced high mortality (tropical diseases), colonizers set up extractive institutions designed to transfer wealth to the metropole.
  2. In places where settlers could survive (temperate climates), they created inclusive institutions with property rights and rule of law.
  3. These institutional differences persisted and shaped modern economic outcomes.
  4. Critically, settler mortality from centuries ago affects current GDP only through its effect on institutions — it has no direct effect on economic performance today.

This claim is the exclusion restriction: the instrument (settler mortality) affects the outcome (GDP) only through the endogenous variable (institutions). If this holds, 2SLS can recover the causal effect of institutions on development. A related strategy that builds on this IV logic is the shift-share (Bartik) instrument, which interacts local exposure shares with national-level shocks to generate cross-sectional variation.

Whether the exclusion restriction actually holds in this case has been debated for two decades. That debate is itself a masterclass in IV methodology.

(Albouy, 2012)

This strategy is the fundamental logic of instrumental variables: find an external source of variation that shifts the endogenous regressor without directly affecting the outcome, and use that variation to recover causal effects.


A. Overview

The Endogeneity Problem

Consider the regression:

$$Y_i = \beta_0 + \beta_1 D_i + X_i'\gamma + \varepsilon_i$$

If $\text{Cov}(D_i, \varepsilon_i) \neq 0$, that is, if the treatment or key regressor is correlated with the error, then OLS is biased and inconsistent. This endogeneity arises from omitted variables, reverse causality, or measurement error. Standard sensitivity analysis techniques can quantify how severe confounding must be to explain the estimated relationship, but when confounding is clearly present, OLS adjustment alone is insufficient.

The IV Solution

An instrumental variable $Z_i$ solves this endogeneity problem by isolating the part of $D_i$ that is uncorrelated with $\varepsilon_i$. The instrument must satisfy three conditions for consistent estimation, plus a fourth for the LATE interpretation:

  1. Relevance: $\text{Cov}(Z_i, D_i) \neq 0$. The instrument must actually affect the endogenous variable. This condition is testable.
  2. Independence (Exogeneity): $\text{Cov}(Z_i, \varepsilon_i) = 0$. The instrument must be uncorrelated with the error term. This condition is not directly testable with a single instrument.
  3. Exclusion Restriction: $Z_i$ affects $Y_i$ only through $D_i$; there is no direct effect. This restriction is a maintained assumption that must be argued substantively.
  4. Monotonicity (for LATE interpretation): The instrument affects treatment status in only one direction for all units, so there are no "defiers." This assumption is required for the IV estimate to be interpretable as the average effect for compliers.

Two-Stage Least Squares (2SLS)

The estimation proceeds in two stages:

Stage 1: Regress the endogenous variable on the instrument(s) and controls:

$$D_i = \pi_0 + \pi_1 Z_i + X_i'\delta + v_i$$

Stage 2: Regress the outcome on the predicted values $\hat{D}_i$ from Stage 1 and the controls:

$$Y_i = \beta_0 + \beta_1 \hat{D}_i + X_i'\gamma + \varepsilon_i$$

The coefficient $\hat{\beta}_1$ uses only the variation in $D_i$ that is driven by $Z_i$, purging the endogenous component.
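The two stages can be traced by hand in a few lines. The following is a minimal numpy sketch, not a replacement for a dedicated IV command: the function names, the simulated data, and every coefficient in the data-generating process are assumptions made for illustration. It returns only the point estimate, because standard errors taken from a manual second stage are not valid.

```python
import numpy as np


def ols(X, y):
    """Return OLS coefficients for y on X (X must already contain a constant)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def two_stage_point_estimate(y, d, z, x=None):
    """2SLS point estimate of the coefficient on d, computed in two explicit stages.

    y, d, z are 1-D arrays; x is an optional array of exogenous controls.
    Only the point estimate is returned; second-stage OLS standard errors would be wrong.
    """
    n = len(y)
    const = np.ones((n, 1))
    controls = const if x is None else np.column_stack([const, x])
    # Stage 1: regress D on the instrument and the controls, keep the fitted values.
    W = np.column_stack([controls, z])
    d_hat = W @ ols(W, d)
    # Stage 2: regress Y on the fitted values and the same controls.
    X2 = np.column_stack([controls, d_hat])
    return ols(X2, y)[-1]


# Hypothetical data: u is an unobserved confounder, x an observed control,
# and the true effect of d on y is 2.0.
rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)
z = rng.normal(size=n)
u = rng.normal(size=n)
d = 0.8 * z + 0.5 * x + 0.5 * u + rng.normal(size=n)
y = 2.0 * d + 1.0 * x + 2.0 * u + rng.normal(size=n)

beta_2sls = two_stage_point_estimate(y, d, z, x)
beta_ols = ols(np.column_stack([np.ones(n), x, d]), y)[-1]
print(f"OLS: {beta_ols:.3f}   2SLS: {beta_2sls:.3f}   truth: 2.000")
```

In this setup the OLS coefficient is pulled upward by the confounder, while the two-stage estimate should land near the true value.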

IV Estimates as LATE

When treatment effects are heterogeneous, the IV estimator does not recover the average treatment effect (ATE) for the entire population. Instead, it recovers the Local Average Treatment Effect (LATE): the average effect for compliers, i.e., units whose treatment status is affected by the instrument.

(Imbens & Angrist, 1994)

This locality means the IV estimate may differ from the OLS estimate even if OLS were unbiased, because they estimate effects for different subpopulations. For settings where treatment effects are heterogeneous and researchers want to understand the full distribution of effects, methods like causal forests provide complementary tools.



B. Identification

Formal Conditions

For the model $Y_i = \beta D_i + u_i$ (suppressing controls for clarity), the IV estimator is consistent if:

  1. $E[Z_i u_i] = 0$ (exogeneity / exclusion)
  2. $E[Z_i D_i] \neq 0$ (relevance)

Under these conditions:

$$\hat{\beta}_{IV} = \frac{\widehat{\text{Cov}}(Z, Y)}{\widehat{\text{Cov}}(Z, D)} \xrightarrow{p} \frac{\text{Cov}(Z, Y)}{\text{Cov}(Z, D)} = \frac{\beta \cdot \text{Cov}(Z, D)}{\text{Cov}(Z, D)} = \beta$$

The LATE Theorem

(Angrist et al., 1996)

With a binary instrument $Z \in \{0,1\}$ and binary treatment $D \in \{0,1\}$, there are four types of units:

  • Compliers: $D(1) = 1, D(0) = 0$ (take treatment when encouraged, do not when not)
  • Always-takers: $D(1) = D(0) = 1$ (always take treatment)
  • Never-takers: $D(1) = D(0) = 0$ (never take treatment)
  • Defiers: $D(1) = 0, D(0) = 1$ (do the opposite of encouragement)

Under the monotonicity assumption (no defiers), the Wald estimator

$$\hat{\beta}_{IV} = \frac{E[Y|Z=1] - E[Y|Z=0]}{E[D|Z=1] - E[D|Z=0]}$$

identifies the LATE: the average treatment effect for compliers.
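A small simulation can make the LATE result concrete. The sketch below is illustrative only: the population shares (50% compliers, 30% never-takers, 20% always-takers) and the type-specific effects are assumed values chosen for the example, and there are no defiers by construction.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical population of compliance types (no defiers, so monotonicity holds).
types = rng.choice(["complier", "never", "always"], size=n, p=[0.5, 0.3, 0.2])
z = rng.integers(0, 2, size=n)  # randomly assigned binary instrument

# Potential treatments D(0) and D(1) implied by each type.
d0 = (types == "always").astype(int)
d1 = ((types == "always") | (types == "complier")).astype(int)
d = np.where(z == 1, d1, d0)

# Assumed heterogeneous effects: compliers 1.0, always-takers 3.0, never-takers 0.5.
effect = np.where(types == "complier", 1.0, np.where(types == "always", 3.0, 0.5))
y0 = rng.normal(size=n) + 1.0 * (types == "always")  # baseline outcome differs by type
y = y0 + effect * d


def wald_estimate(y, d, z):
    """Ratio of the difference in mean outcomes to the difference in mean take-up."""
    reduced_form = y[z == 1].mean() - y[z == 0].mean()
    first_stage = d[z == 1].mean() - d[z == 0].mean()
    return reduced_form / first_stage


late_hat = wald_estimate(y, d, z)
ate = effect.mean()
print(f"Wald estimate: {late_hat:.3f}   complier effect: 1.000   ATE: {ate:.3f}")
```

Under these assumptions the Wald ratio should track the complier effect rather than the population average effect, even though always-takers and never-takers have very different effects.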


C. Visual Intuition

Picture a scatter plot of $Y$ vs. $D$ where $D$ is endogenous. The OLS line through this cloud is biased because unobserved factors push both $D$ and $Y$ in the same direction.

Now imagine the instrument $Z$ as a lever that shifts $D$ left or right. Some units get a "push" (high $Z$) and others do not (low $Z$). The IV estimator looks at how much $Y$ changes per unit of $Z$-induced change in $D$. It ignores the endogenous part of $D$ entirely.

Think of it as plumbing. The ordinary relationship between $D$ and $Y$ is contaminated: dirty water (endogeneity) flows in. The instrument $Z$ provides a clean source of variation in $D$. By isolating the variation in $D$ that comes from $Z$ (the clean water), you can estimate the effect of $D$ on $Y$ without contamination.

When confounding is zero, IV and OLS agree. As confounding increases, OLS drifts away from the truth while IV stays on target — but with wider confidence intervals. When instrument strength approaches zero, IV becomes erratic (the weak instrument problem).

Interactive Simulation

Instrument Strength and Bias

IV recovers the true causal effect when the instrument is strong and the exclusion restriction holds. A weak instrument (low first-stage F) amplifies any small exclusion restriction violation into large bias.

[Interactive chart: simulated value plotted against the simulation parameters]

Computed Results

  • IV Estimate: 2.00
  • OLS Estimate (biased): 3.50
  • IV Bias from Violation: 0.000
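The amplification of exclusion restriction violations by a weak instrument can be made explicit with a short derivation. This is a sketch under added assumptions: a single instrument, a constant effect $\beta$, a first stage $D = \pi Z + v$, and a direct effect $\gamma$ of $Z$ on $Y$, with $Z$ uncorrelated with both $v$ and $u$. The symbols $\pi$, $\gamma$, $v$, and $u$ are introduced here for illustration.

$$Y = \beta D + \gamma Z + u$$

$$\text{plim}\; \hat{\beta}_{IV} = \frac{\text{Cov}(Z, Y)}{\text{Cov}(Z, D)} = \frac{\beta \pi \, \text{Var}(Z) + \gamma \, \text{Var}(Z)}{\pi \, \text{Var}(Z)} = \beta + \frac{\gamma}{\pi}$$

For a fixed violation $\gamma$, the bias $\gamma / \pi$ grows as the first stage $\pi$ shrinks. For example, $\gamma = 0.1$ produces a bias of $0.125$ when $\pi = 0.8$, but a bias of $1.0$ when $\pi = 0.1$.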
Interactive Simulation

Instrumental Variables (IV / 2SLS)

Explore how IV/2SLS corrects for endogeneity when an unobserved confounder U biases OLS. The DGP is D = 0.8·Z + 0.50·U + ν and Y = 2.0·D + 2·U + ε.

[Interactive scatter plot: Y (outcome) against D (treatment), with fitted lines for OLS (biased), IV / 2SLS, and the true β]

Regression Results

| Estimator | β̂ | Bias | 1st-stage F |
|---|---|---|---|
| OLS (biased) | 2.563 | +0.563 | |
| IV / 2SLS | 2.180 | +0.180 | 137.7 |
| True β | 2.000 | | |

Parameters

  • Number of observations to generate: 200
  • The causal effect of D on Y: 2.0
  • Coefficient of unobserved confounder U in the treatment equation D = π·Z + δ·U + v: 0.50
  • First-stage coefficient of Z on D: 0.8
  • Direct effect of Z on Y (should be 0 for valid IV): 0.0

IV corrects the bias. OLS is biased by +0.56 due to the unobserved confounder, while the IV estimate (2.18) is much closer to the true β = 2.0.
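The simulation can be approximated offline. The sketch below uses the stated DGP, D = 0.8·Z + 0.50·U + ν and Y = 2.0·D + 2·U + ε, and assumes that Z, U, ν, and ε are independent standard normal draws (the page does not state their distributions). The function names are hypothetical, and the numbers will not match the page exactly because the random draws differ.

```python
import numpy as np


def simulate(n, beta=2.0, pi=0.8, delta=0.5, gamma=0.0, seed=0):
    """Draw (y, d, z) from the DGP; gamma is the direct effect of Z on Y."""
    rng = np.random.default_rng(seed)
    z, u, nu, eps = rng.normal(size=(4, n))  # assumed standard normal shocks
    d = pi * z + delta * u + nu
    y = beta * d + 2.0 * u + gamma * z + eps
    return y, d, z


def slope(x, y):
    """Slope from a simple regression of y on x with an intercept."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)


def first_stage_f(d, z):
    """Homoskedastic first-stage F for one instrument: squared t-statistic on z."""
    n = len(d)
    zc = z - z.mean()
    pi_hat = slope(z, d)
    resid = (d - d.mean()) - pi_hat * zc
    se = np.sqrt(resid @ resid / (n - 2) / (zc @ zc))
    return (pi_hat / se) ** 2


y, d, z = simulate(n=200)
beta_ols = slope(d, y)
beta_iv = slope(z, y) / slope(z, d)  # Wald ratio Cov(Z, Y) / Cov(Z, D)
f_stat = first_stage_f(d, z)
print(f"OLS: {beta_ols:.3f}   IV: {beta_iv:.3f}   first-stage F: {f_stat:.1f}")
```

Passing a nonzero `gamma` to `simulate` reproduces an exclusion restriction violation, and lowering `pi` weakens the instrument.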

Interactive Simulation

Why Instruments? Isolating Exogenous Variation

IV DGP: Y = 2.0·D + 2·U + ε, where D = 0.7·Z + 1.2·U + ν. Confounding strength = 0.6. Exclusion violation = 0.0.

[Interactive figure, two panels. First Stage (D vs Z): Treatment (D) against Instrument (Z), F = 97.4. Y vs D: Outcome (Y) against Endogenous D, with fitted lines for OLS, IV / 2SLS, and the true effect.]

Estimation Results

| Estimator | β̂ | SE | 95% CI | Bias |
|---|---|---|---|---|
| OLS | 2.866 | 0.056 | [2.76, 2.98] | +0.866 |
| IV / 2SLS (closest) | 2.179 | 0.330 | [1.53, 2.83] | +0.179 |
| True β | 2.000 | | | |

Parameters

  • Number of observations: 300
  • The true effect of D on Y: 2.0
  • How strongly U affects both D and Y (0 = no endogeneity): 0.60
  • First-stage effect of Z on D (higher = stronger instrument): 0.70
  • Direct effect of Z on Y (should be 0 for valid IV): 0.00

First-stage F = 97.4

Why the difference?

OLS is biased (+0.87) because D is endogenous: the confounder U pushes both D and Y in the same direction, inflating the estimated relationship. With confounding strength = 0.6, OLS attributes to D the effect that actually comes from U. IV isolates the exogenous variation in D driven by instrument Z (π̂ = 0.77, F = 97.4). The Wald ratio Cov(Z,Y)/Cov(Z,D) = 2.179 removes the confounding bias, yielding an estimate much closer to the truth.


D. Mathematical Derivation

In words: 2SLS projects the endogenous regressors onto the space spanned by the instruments, then runs OLS on the projected values. The resulting formula is $\hat{\beta}_{2SLS} = (X'P_Z X)^{-1} X'P_Z Y$.

Let $Z$ be the matrix of instruments (and exogenous regressors), $X$ be the matrix of all regressors (including endogenous ones), and $Y$ the outcome vector.

Step 1: Project X onto instrument space.

$$P_Z = Z(Z'Z)^{-1}Z'$$

$$\hat{X} = P_Z X$$

Step 2: OLS of Y on projected X.

$$\hat{\beta}_{2SLS} = (\hat{X}'\hat{X})^{-1}\hat{X}'Y = (X'P_Z X)^{-1} X'P_Z Y$$

Consistency: Substitute $Y = X\beta + u$:

$$\hat{\beta}_{2SLS} = \beta + (X'P_Z X)^{-1} X'P_Z u$$

Under $E[Z'u] = 0$ and relevance ($\text{rank}(E[Z'X]) = k$):

$$\hat{\beta}_{2SLS} \xrightarrow{p} \beta$$

Variance (robust):

$$\hat{V}_{2SLS} = (X'P_Z X)^{-1} \left(\sum_i \hat{u}_i^2 \hat{X}_i \hat{X}_i'\right) (X'P_Z X)^{-1}$$

Important: The residuals that enter the Stage 2 standard errors must be computed using the original $X$, not $\hat{X}$: $\hat{u} = Y - X\hat{\beta}_{2SLS}$. Running two separate OLS regressions manually computes the residuals from $\hat{X}$ and therefore gives incorrect SEs. Use a dedicated 2SLS command.
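To see the difference between correct and manual standard errors, the matrix formulas can be coded directly. This is a minimal numpy sketch under assumptions: the function names and the simulated data are hypothetical, the robust variance is the HC0 form of the sandwich formula without a small-sample correction, and production work should rely on a dedicated IV routine.

```python
import numpy as np


def tsls(y, X, Z):
    """2SLS via (X'P_Z X)^{-1} X'P_Z y with HC0 robust standard errors.

    X: (n, k) regressors including endogenous ones.
    Z: (n, m) instruments including the exogenous regressors, with m >= k.
    """
    # X_hat = P_Z X, computed by least squares instead of forming the n x n projection.
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    A_inv = np.linalg.inv(X_hat.T @ X)  # equals (X'P_Z X)^{-1}
    beta = A_inv @ (X_hat.T @ y)
    u_hat = y - X @ beta  # residuals use the ORIGINAL X
    meat = (X_hat * u_hat[:, None] ** 2).T @ X_hat
    V = A_inv @ meat @ A_inv
    return beta, np.sqrt(np.diag(V)), X_hat


def naive_second_stage_se(y, X_hat, beta):
    """OLS standard errors from a manual second stage (residuals use X_hat): incorrect."""
    n, k = X_hat.shape
    e = y - X_hat @ beta
    s2 = e @ e / (n - k)
    return np.sqrt(np.diag(s2 * np.linalg.inv(X_hat.T @ X_hat)))


# Hypothetical data with an unobserved confounder u; the true effect is 2.0.
rng = np.random.default_rng(2)
n = 20_000
z = rng.normal(size=n)
u = rng.normal(size=n)
d = 0.8 * z + 0.5 * u + rng.normal(size=n)
y = 2.0 * d + 2.0 * u + rng.normal(size=n)

const = np.ones(n)
X = np.column_stack([const, d])
Z = np.column_stack([const, z])

beta, se_robust, X_hat = tsls(y, X, Z)
se_naive = naive_second_stage_se(y, X_hat, beta)
print(f"beta: {beta[1]:.3f}   robust SE: {se_robust[1]:.4f}   manual SE: {se_naive[1]:.4f}")
```

The point estimate is identical in both approaches; only the residuals differ, and in this setup the manual second-stage residuals absorb the first-stage error, which distorts the reported standard error.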

Bias of 2SLS in finite samples:

$$E[\hat{\beta}_{2SLS}] - \beta \approx \frac{1}{F}(\hat{\beta}_{OLS} - \beta)$$

where $F$ is the first-stage F-statistic. In the just-identified case (one instrument), the relative bias (as a fraction of the OLS bias) is approximately $1/F$. This follows because the concentration parameter $\mu^2 = K \cdot F$, and the Nagar (1959) approximation gives relative bias of order $K/\mu^2 = 1/F$. When $F$ is large, bias vanishes. When $F$ is small, 2SLS inherits much of the OLS bias. Note that with many instruments, or with heteroskedastic, serially correlated, or clustered errors, the conventional F-statistic can be misleading; the effective F-statistic of Montiel Olea and Pflueger (2013) is the appropriate measure in those cases.

LIML (Limited Information Maximum Likelihood): An alternative to 2SLS that is less biased with weak instruments. In just-identified models (one instrument per endogenous variable), LIML and 2SLS are numerically identical.


E. Implementation

# R. `controls` is a placeholder for your exogenous covariates.
library(fixest)
library(ivreg)
library(ivmodel)

# 2SLS with fixest (the `| 0 |` slot means no fixed effects)
iv_fit <- feols(gdp_pc ~ controls | 0 | institutions ~ settler_mortality,
                data = df, vcov = "HC1")
summary(iv_fit)

# First-stage diagnostics
fitstat(iv_fit, type = "ivf")    # First-stage F
fitstat(iv_fit, type = "sargan") # Over-ID test (only when instruments outnumber endogenous regressors)

# 2SLS with ivreg (more diagnostics); exogenous controls appear on both sides of `|`
iv_fit2 <- ivreg(gdp_pc ~ institutions + controls | settler_mortality + controls,
                 data = df)
summary(iv_fit2, diagnostics = TRUE)

# LIML (via the ivmodel package)
iv_mod <- ivmodel(Y = df$gdp_pc, D = df$institutions, Z = df$settler_mortality,
                  X = as.matrix(df[, "controls", drop = FALSE]))
LIML(iv_mod)


F. Diagnostics

First-Stage F-Statistic

A central diagnostic for IV. The rule of thumb from Staiger and Stock (1997) is $F > 10$.

More recent guidance from Lee et al. (2022) shows that the standard first-stage F must exceed 104.7 for the conventional $t$-ratio critical value of 1.96 to control size at 5% in the just-identified case. Below that threshold, the authors provide an adjusted critical-value function ($tF$ procedure) that remains valid. In practice, F-statistics between 10 and 104.7 warrant the use of weak-instrument-robust inference methods such as the Anderson-Rubin test or the $tF$ procedure.

Reduced Form

It is recommended to report the reduced-form regression: regress $Y$ directly on $Z$ (and controls). If the reduced form is insignificant, IV will be imprecise (even if the first stage is strong). The reduced form is the ITT analog in the IV framework.
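The link between the reduced form and the IV estimate can be written out explicitly. As a sketch for the just-identified case with a single instrument, let the reduced form be the regression below, where $\rho_0$, $\rho_1$, $\kappa$, and $e_i$ are notation introduced here for illustration, and let $\pi_1$ be the first-stage coefficient on $Z_i$:

$$Y_i = \rho_0 + \rho_1 Z_i + X_i'\kappa + e_i$$

$$\hat{\beta}_{IV} = \frac{\hat{\rho}_1}{\hat{\pi}_1}$$

The IV estimate is the reduced-form effect rescaled by the first stage, so a reduced-form coefficient of zero implies an IV estimate of zero. As a rough check against the simulation output reported earlier ($\hat{\pi} = 0.77$, IV estimate $2.179$), the implied reduced-form coefficient is about $0.77 \times 2.179 \approx 1.68$, using the rounded values shown there.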

Over-Identification Test (Sargan-Hansen J-Test)

When you have more instruments than endogenous variables, the J-test checks whether the instruments are consistent with each other. A rejection suggests at least one instrument violates the exclusion restriction. But beware: the test has low power when all instruments are invalid in the same direction.

Weak Instrument Robust Inference

When the first-stage F is low, use the Anderson-Rubin (AR) test, which is valid regardless of instrument strength. The AR confidence set inverts a test of the reduced-form null and has correct size even with arbitrarily weak instruments.
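One way to build an AR confidence set is to test each candidate value on a grid and keep those that are not rejected. The sketch below is a simplified illustration under assumptions: one instrument, no controls beyond an intercept, a heteroskedasticity-robust squared t-statistic compared with the asymptotic chi-square(1) critical value, and hypothetical function names and simulated data.

```python
import numpy as np

CHI2_1_CRIT_95 = 3.841  # 95% critical value of a chi-square with 1 degree of freedom


def ar_statistic(beta0, y, d, z):
    """AR statistic for H0: beta = beta0 with one instrument and an intercept.

    Regress (y - beta0 * d) on z. Under H0 and a valid instrument the slope is
    zero, whatever the strength of the first stage.
    """
    zc = z - z.mean()
    r = y - beta0 * d
    rc = r - r.mean()
    slope = (zc @ rc) / (zc @ zc)
    resid = rc - slope * zc
    var_slope = np.sum(zc**2 * resid**2) / (zc @ zc) ** 2  # HC0 variance
    return slope**2 / var_slope


def ar_confidence_set(y, d, z, grid):
    """Grid values of beta0 that the AR test does not reject at the 5% level."""
    keep = [b for b in grid if ar_statistic(b, y, d, z) <= CHI2_1_CRIT_95]
    return np.array(keep)


def simulate(n, pi, seed):
    """Hypothetical DGP with true effect 2.0 and first-stage coefficient pi."""
    rng = np.random.default_rng(seed)
    z, u, nu, eps = rng.normal(size=(4, n))
    d = pi * z + 0.5 * u + nu
    y = 2.0 * d + 2.0 * u + eps
    return y, d, z


# Weak-instrument example: the set can be wide, unbounded, or disjoint, and the
# grid only reveals the part that falls inside its range.
y, d, z = simulate(n=500, pi=0.1, seed=3)
grid = np.arange(-10.0, 10.0, 0.05)
ar_set = ar_confidence_set(y, d, z, grid)
if ar_set.size:
    print(f"{ar_set.size} of {grid.size} grid points retained, "
          f"from {ar_set.min():.2f} to {ar_set.max():.2f}")
else:
    print("No grid point retained")
```

With a strong instrument the retained set should be a short interval around the IV estimate; with a weak one it widens, which is the honest reflection of how little the data say.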

Hausman/Durbin-Wu-Hausman Test (OLS vs. IV)

Compare OLS and IV estimates formally. Under the null that OLS is consistent, OLS and IV should give similar estimates. If they differ significantly, OLS is likely biased.
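A common regression-based form of this comparison adds the first-stage residuals to the outcome regression and tests their coefficient. The sketch below is illustrative and makes several assumptions: one instrument, no additional controls, homoskedastic standard errors, and hypothetical function names and simulated data.

```python
import numpy as np


def ols_with_se(X, y):
    """OLS coefficients and homoskedastic standard errors."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    s2 = resid @ resid / (n - k)
    return beta, np.sqrt(np.diag(s2 * XtX_inv))


def dwh_control_function(y, d, z):
    """Augmented-regression endogeneity test.

    Step 1: regress d on z and keep the residuals v_hat.
    Step 2: regress y on d and v_hat; a t-test on v_hat tests exogeneity of d.
    """
    const = np.ones(len(y))
    Z = np.column_stack([const, z])
    pi_hat, _ = ols_with_se(Z, d)
    v_hat = d - Z @ pi_hat
    X_aug = np.column_stack([const, d, v_hat])
    beta, se = ols_with_se(X_aug, y)
    return {"beta_d": beta[1], "coef_v": beta[2], "t_v": beta[2] / se[2]}


# Hypothetical data in which d is endogenous through the confounder u.
rng = np.random.default_rng(5)
n = 5_000
z = rng.normal(size=n)
u = rng.normal(size=n)
d = 0.8 * z + 0.5 * u + rng.normal(size=n)
y = 2.0 * d + 2.0 * u + rng.normal(size=n)

result = dwh_control_function(y, d, z)
beta_iv = np.cov(z, y)[0, 1] / np.cov(z, d)[0, 1]
print(f"t-statistic on first-stage residual: {result['t_v']:.2f}")
print(f"coefficient on d: {result['beta_d']:.3f}   IV estimate: {beta_iv:.3f}")
```

In this linear, just-identified setup the coefficient on `d` in the augmented regression coincides with the IV estimate, and a large t-statistic on the residual indicates that OLS and IV differ by more than sampling noise.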


Interpreting Results

  • Sign and magnitude: IV estimates are often larger than OLS (in absolute value). Two common explanations: (1) measurement error in $D$ attenuates OLS estimates, and IV corrects this attenuation; (2) IV identifies the LATE for compliers, who may respond more strongly to treatment than the average person. These explanations are not mutually exclusive, and distinguishing between them requires additional analysis.
  • Precision: IV estimates are typically much less precise than OLS. Wide confidence intervals are the norm, not the exception.
  • Compliers: Think carefully about who the compliers are. In the Acemoglu et al. example, compliers are countries whose institutional quality was determined by settler mortality. In the Angrist and Krueger (1991) quarter-of-birth example, compliers are people who stayed in school longer than they otherwise would have because compulsory schooling laws required it.

G. What Can Go Wrong

| Problem | What It Does | How to Fix It |
|---|---|---|
| Weak instruments ($F < 10$) | IV is biased toward OLS; confidence intervals have wrong coverage | Use Anderson-Rubin test; find a stronger instrument; use LIML |
| Exclusion restriction violated | IV is biased in an unknown direction; bias may be worse than OLS | Argue substantively; sensitivity analysis (Conley et al., 2012) |
| Forbidden regression | Running 2SLS manually with two OLS steps gives wrong SEs | Use dedicated IV commands |
| Many weak instruments | Bias toward OLS increases with number of instruments | Use LIML estimator or JIVE; reduce instrument count |
| Heterogeneous effects ignored | Interpreting LATE as ATE when they differ | Discuss complier characteristics; present OLS alongside |
| Instrument not excludable | Direct effect of Z on Y biases the IV estimate | Argue exclusion carefully; sensitivity analysis |

Assumption Failure Demo

Weak Instrument Bias (F < 10)

Strong instrument: settler mortality has a first-stage F-statistic of 22.9

IV estimate of effect of institutions on log GDP: 0.94 (SE = 0.16). Bias relative to OLS is approximately 1/22.9 = 4%. The IV estimate is reliable.

Assumption Failure Demo

Exclusion Restriction Violation

Rainfall affects civil conflict only through its effect on economic growth (the endogenous variable)

IV estimate of growth on conflict: -0.12 (SE = 0.04). If rainfall has no direct effect on conflict except through the economy, the exclusion restriction holds and the estimate is consistent.

Assumption Failure Demo

The Forbidden Regression (Manual Two-Step OLS)

Use a dedicated 2SLS command that computes standard errors correctly

ivregress 2sls y (D = Z), vce(robust). SE = 0.16. Correct inference because the command computes the residuals from the original D (not the fitted D-hat) in the variance formula.

(Conley et al., 2012)

H. Practice

Guided Exercise

IV Validity: Instrumenting Military Service with Draft Lottery Numbers

Angrist (1990) estimates the effect of Vietnam-era military service on long-run earnings. The problem is that who serves is not random — men from disadvantaged backgrounds are more likely to enlist. The famous solution is to use draft lottery numbers (randomly assigned by birth date) as an instrument for military service.

What is the relevance condition for this instrument?

What is the exclusion restriction for this instrument?

If men with low lottery numbers were more likely to drop out of college (to avoid the draft), would this violate the exclusion restriction?

The 2SLS estimate represents the effect of military service for which group of men?

Error Detective

Read the analysis below carefully and identify the errors.

A researcher studies the effect of immigration on native wages. They instrument local immigrant share with historical immigrant settlement patterns (a shift-share instrument). Using data from 200 metropolitan areas, they report:

ivregress 2sls native_wage controls (immigrant_share = historical_share), vce(robust)

Coefficient on immigrant_share: -0.35 (SE = 0.12, p = 0.004). First-stage F = 45.

"We find that a 1 percentage point increase in immigrant share causes a 0.35% decrease in native wages. The strong first stage (F = 45) confirms instrument validity."

Select all errors you can find:

Error Detective

Read the analysis below carefully and identify the errors.

A finance researcher instruments CEO overconfidence (measured by option exercise behavior) with the CEO's birth order (first-born vs. later-born), citing psychology literature that first-borns are more confident. Using a cross-section of 800 firms:

First-stage F-statistic: 6.8
IV estimate of overconfidence on firm investment: 0.42 (SE = 0.25, p = 0.09)

"Although the first-stage F is below 10, the IV estimate is marginally significant, suggesting overconfident CEOs invest more aggressively. Birth order is clearly exogenous because it is determined at birth."

Select all errors you can find:

Concept Check

A researcher instruments 'years of schooling' with 'quarter of birth' to estimate the return to education. The first-stage F-statistic is 4.2. What is the main concern?

Concept Check

You estimate the effect of institutions on GDP using IV (settler mortality instrument) and OLS. The OLS estimate is 0.52 (SE = 0.06) and the IV estimate is 0.94 (SE = 0.16). Both are statistically significant. Why might the IV estimate be nearly twice as large?

Referee Exercise

Read the paper summary below and write a brief referee critique (2-3 sentences) of the identification strategy.

Paper Summary

A study examines whether R&D spending affects firm revenue. The authors instrument R&D spending with 'industry-average R&D spending' (the average R&D of all OTHER firms in the same industry). Using a panel of 5,000 firms over 10 years, they find a first-stage F-statistic of 28 and estimate that a \$1M increase in R&D raises revenue by \$4.2M (p < 0.01). They include firm and year fixed effects.

Key Table

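| Variable | Coefficient | Robust SE | p-value |
|---|---|---|---|
| R&D (instrumented) | 4.200 | 1.100 | 0.000 |
| Firm Size (log) | 0.015 | 0.006 | 0.012 |
| Firm FE | Yes | | |
| Year FE | Yes | | |
| First-stage F | 28 | | |
| N | 50,000 | | |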

Authors' Identification Claim

Industry-average R&D (excluding the focal firm) is correlated with the focal firm's R&D through technology spillovers but is uncorrelated with firm-specific revenue shocks.


I. Swap-In: When to Use Something Else

  • OLS with controls: When conditional exogeneity (selection on observables) is more credible than the exclusion restriction, and a rich set of covariates is available.
  • Regression discontinuity: When treatment is assigned by a threshold on a running variable — RDD provides a more transparent and locally randomized design.
  • Difference-in-differences: When a policy change provides before/after and treated/untreated variation without requiring an instrument.
  • Matching: When selection into treatment is primarily on observables and the overlap condition is satisfied.
  • Reduced form only: When the instrument is valid but weak (F < 10), reporting the reduced-form effect of the instrument on the outcome avoids the bias amplification of 2SLS.

J. Reviewer Checklist

Critical Reading Checklist


Paper Library

Foundational (9)

Angrist, J. D., & Krueger, A. B. (1991). Does Compulsory School Attendance Affect Schooling and Earnings?

Quarterly Journal of Economics. DOI: 10.2307/2937954

Angrist and Krueger used quarter of birth as an instrument for years of schooling, exploiting the fact that compulsory schooling laws interact with birth timing. This paper is one of the most-taught examples of instrumental variables in economics and also sparked important debates about weak instruments.

Angrist, J. D., Imbens, G. W., & Rubin, D. B. (1996). Identification of Causal Effects Using Instrumental Variables.

Journal of the American Statistical Association. DOI: 10.1080/01621459.1996.10476902

This paper clarified what IV actually estimates: the Local Average Treatment Effect (LATE), which is the causal effect for 'compliers'—people whose treatment status is changed by the instrument. This reinterpretation fundamentally changed how researchers think about IV estimates and their external validity.

Stock, J. H., & Yogo, M. (2005). Testing for Weak Instruments in Linear IV Regression.

Identification and Inference for Econometric Models: Essays in Honor of Thomas Rothenberg. DOI: 10.1017/CBO9780511614491.006

Stock and Yogo developed critical values for testing whether instruments are 'weak', that is, only weakly correlated with the endogenous variable. The rule of thumb that the first-stage F-statistic should exceed 10, which goes back to Staiger and Stock (1997), is probably the most widely used diagnostic in applied IV research.

Staiger, D., & Stock, J. H. (1997). Instrumental Variables Regression with Weak Instruments.

Econometrica. DOI: 10.2307/2171753

Staiger and Stock showed formally that when instruments are weak, 2SLS estimates are biased toward OLS and standard inference breaks down. This paper established the theoretical foundations for the weak instruments problem that Stock and Yogo (2005) later provided practical tests for.

Lee, D. S., McCrary, J., Moreira, M. J., & Porter, J. (2022). Valid t-Ratio Inference for IV.

American Economic Review. DOI: 10.1257/aer.20211063

Lee, McCrary, Moreira, and Porter showed that the conventional t-ratio in IV regression has correct size when the first-stage F-statistic exceeds 104.7, far above the traditional rule-of-thumb threshold of 10. This paper fundamentally raised the bar for what constitutes a sufficiently strong instrument and has prompted researchers to reconsider previously accepted IV results.

Imbens, G. W., & Angrist, J. D. (1994). Identification and Estimation of Local Average Treatment Effects.

Econometrica. DOI: 10.2307/2951620

The foundational paper on LATE. Showed that IV identifies the average causal effect for compliers -- the subpopulation whose treatment status is changed by the instrument -- under the monotonicity assumption. This reinterpretation fundamentally changed how researchers understand what IV estimates.

Montiel Olea, J. L., & Pflueger, C. (2013). A Robust Test for Weak Instruments.

Journal of Business & Economic Statistics. DOI: 10.1080/00401706.2013.806694

Proposes an effective F-statistic for testing weak instruments that is robust to heteroscedasticity, serial correlation, and clustering — unlike the conventional first-stage F. The effective F is now the standard diagnostic for instrument strength in applied IV research.

Angrist, J. D. (1990). Lifetime Earnings and the Vietnam Era Draft Lottery: Evidence from Social Security Administrative Records.

American Economic Review

A landmark application of instrumental variables using the Vietnam-era draft lottery as a natural experiment. Angrist showed that randomly assigned lottery numbers provide an instrument for military service, allowing causal estimation of the earnings effect of military service.

Manski, C. F. (1993). Identification of Endogenous Social Effects: The Reflection Problem.

Review of Economic Studies. DOI: 10.2307/2298123

Formalized the reflection problem: when individual outcomes depend on group averages, the group average is simultaneously determined by its members, making it impossible to distinguish true social (endogenous) effects from correlated effects without additional structure.

Application (7)

Acemoglu, D., Johnson, S., & Robinson, J. A. (2001). The Colonial Origins of Comparative Development: An Empirical Investigation.

American Economic Review. DOI: 10.1257/aer.91.5.1369

This celebrated paper used historical settler mortality as an instrument for institutional quality to estimate the causal effect of institutions on economic development. It is one of the most influential IV applications in economics and demonstrates the creativity required to find a plausible instrument.

Levitt, S. D. (1997). Using Electoral Cycles in Police Hiring to Estimate the Effect of Police on Crime.

American Economic Review

Levitt used the timing of mayoral and gubernatorial elections as an instrument for police hiring to estimate the causal effect of police on crime. The paper illustrates the IV approach in a policy-relevant setting where the key concern is reverse causality (more crime leads to more police).

Bloom, N., & Van Reenen, J. (2007). Measuring and Explaining Management Practices Across Firms and Countries.

Quarterly Journal of Economics. DOI: 10.1162/qjec.2007.122.4.1351

Bloom and Van Reenen developed a survey-based measure of management practices and used IV strategies (including firm age and governance rules) to study the causal relationship between management quality and firm productivity. This paper is a prominent IV application in management and organizational economics.

Semadeni, M., Withers, M. C., & Certo, S. T. (2014). The Perils of Endogeneity and Instrumental Variables in Strategy Research: Understanding through Simulations.

Strategic Management Journal. DOI: 10.1002/smj.2136

This paper used Monte Carlo simulations to demonstrate the dangers of using weak or invalid instruments in strategy research. It provides practical guidance for management scholars on when and how to use IV, and when it may do more harm than good.

Albouy, D. Y. (2012). The Colonial Origins of Comparative Development: An Empirical Investigation: Comment.

American Economic Review. DOI: 10.1257/aer.102.6.3059

Albouy critically re-examined the settler mortality instrument used in Acemoglu et al. (2001), showing that the original results are sensitive to data coding decisions and the sample of countries included. This comment is a cautionary tale about instrument validity and the fragility of influential IV estimates.

Miguel, E., Satyanath, S., & Sergenti, E. (2004). Economic Shocks and Civil Conflict: An Instrumental Variables Approach.

Journal of Political Economy. DOI: 10.1086/421174

Instruments for economic growth using rainfall variation to estimate the causal effect of economic shocks on civil conflict in Sub-Saharan Africa. A clean and widely cited example of using weather as an instrumental variable, illustrating both the power and the exclusion restriction challenges of weather-based instruments.

Young, A. (2022). Consistency Without Inference: Instrumental Variables in Practical Application.

European Economic Review. DOI: 10.1016/j.euroecorev.2022.104112

A provocative assessment showing that many published IV applications have first-stage F-statistics too weak for reliable inference when examined under modern standards. Highlights the gap between theoretical requirements for valid IV and actual practice in published research.

Survey (6)

Andrews, I., Stock, J. H., & Sun, L. (2019). Weak Instruments in Instrumental Variables Regression: Theory and Practice.

Annual Review of Economics. DOI: 10.1146/annurev-economics-080218-025643

This survey provides an up-to-date review of the weak instruments problem, covering modern diagnostic tests, robust inference procedures, and practical recommendations. It is an excellent starting point for understanding the current best practices in IV estimation.

Stock, J. H., Wright, J. H., & Yogo, M. (2002). A Survey of Weak Instruments and Weak Identification in Generalized Method of Moments.

Journal of Business & Economic Statistics. DOI: 10.1198/073500102288618658

A comprehensive treatment of weak instruments and their consequences for inference in IV and GMM settings. Covers the theoretical foundations of the weak instrument problem and practical diagnostic tools.

Angrist, J. D., & Pischke, J.-S. (2009). Mostly Harmless Econometrics: An Empiricist's Companion.

Princeton University Press. DOI: 10.1515/9781400829828

Chapter 4 provides an accessible yet rigorous treatment of instrumental variables, two-stage least squares, and the LATE framework. The go-to textbook reference for understanding IV estimation in the context of modern applied econometrics.

Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data.

MIT Press

Chapter 5 offers a comprehensive graduate-level treatment of IV estimation, including GMM, tests for overidentification, and the relationship between IV and control function approaches. The standard graduate econometrics textbook reference for IV methods.

Angrist, J. D., & Krueger, A. B. (2001). Instrumental Variables and the Search for Identification: From Supply and Demand to Natural Experiments.

Journal of Economic Perspectives. DOI: 10.1257/jep.15.4.69

A historical survey tracing the evolution of IV from its origins in supply-and-demand estimation to modern natural experiments. Provides valuable context for understanding how IV methodology developed and why it became central to applied economics.

Murray, M. P. (2006). Avoiding Invalid Instruments and Coping with Weak Instruments.

Journal of Economic Perspectives. DOI: 10.1257/jep.20.4.111

Practical guidance on evaluating instrument validity and dealing with weak instruments in applied work. Written in an accessible style, it helps applied researchers think critically about their instrument choices and provides concrete strategies for addressing common IV pitfalls.

Tags

design-based, endogeneity, LATE