MethodAtlas
Tutorial · 2 hours

Lab: Instrumental Variables and 2SLS from Scratch

Implement two-stage least squares step by step. Simulate endogeneity, construct an instrument, manually perform both stages, and learn to diagnose weak instruments and check the exclusion restriction.

Overview

Instrumental Variables (IV) is the workhorse method for dealing with endogenous regressors — variables that are correlated with the error term, making OLS biased. IV uses an external source of variation (the instrument) that affects the treatment but does not directly affect the outcome.

What you will learn:

  • Why OLS is biased when the treatment is endogenous
  • How to implement 2SLS (two-stage least squares) manually and via packages
  • How to assess instrument strength (first-stage F-statistic)
  • Why the manually-computed second-stage standard errors are incorrect and must be avoided
  • What LATE (Local Average Treatment Effect) means and why IV estimates differ from OLS
  • How to think about the exclusion restriction

Prerequisites: OLS regression (see the OLS lab).


Step 1: The Endogeneity Problem

We want to estimate the causal effect of education on wages. The problem: unobserved ability affects both education and wages, biasing OLS.

library(estimatr)
library(fixest)

set.seed(42)
n <- 10000

ability <- rnorm(n)
qob <- sample(1:4, n, replace = TRUE)
born_q1 <- as.integer(qob == 1)

educ <- pmin(pmax(round(12 + 2 * ability - 0.3 * born_q1 + rnorm(n, sd = 1.5)), 8), 20)

true_beta <- 0.06
log_wage <- 1.0 + true_beta * educ + 0.10 * ability + rnorm(n, sd = 0.4)

df <- data.frame(log_wage, educ, ability, qob, born_q1)

# OLS (biased)
m_ols <- lm_robust(log_wage ~ educ, data = df, se_type = "HC2")
cat("OLS Estimate (biased):", coef(m_ols)["educ"], "\n")
cat("True causal effect:", true_beta, "\n")
cat("Bias:", coef(m_ols)["educ"] - true_beta, "\n")

Expected output:

Estimator    | Coefficient on educ | True Effect | Bias
OLS (biased) | ~0.090              | 0.060       | ~0.030

The OLS coefficient is substantially above the true causal effect of 0.06 because omitted ability biases the estimate upward.
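
The size of this bias can be checked against the omitted-variable-bias identity: the slope from the short regression (without ability) equals the slope from the long regression (with ability) plus the ability coefficient times the slope from regressing ability on educ. The sketch below is a minimal illustration using base R only; it re-simulates data with the lab's parameters but, as a simplifying assumption, skips the rounding and clipping of educ. The names b_short, long, delta, and b_implied are introduced here for illustration.

```r
# Sketch: verify the omitted-variable-bias identity on simulated data.
# Assumption: same parameters as the lab, but educ is left continuous.
set.seed(1)
n <- 10000
ability <- rnorm(n)
educ <- 12 + 2 * ability + rnorm(n, sd = 1.5)
log_wage <- 1.0 + 0.06 * educ + 0.10 * ability + rnorm(n, sd = 0.4)

# Short regression omits ability; long regression controls for it
b_short <- coef(lm(log_wage ~ educ))["educ"]
long <- coef(lm(log_wage ~ educ + ability))

# Auxiliary regression: how strongly the omitted variable moves with educ
delta <- coef(lm(ability ~ educ))["educ"]

# Identity: short slope = long slope + (ability coefficient) * delta
implied_bias <- unname(long["ability"] * delta)
b_implied <- unname(long["educ"]) + implied_bias

cat("Short regression slope:      ", unname(b_short), "\n")
cat("Long slope + implied bias:   ", b_implied, "\n")
cat("Implied omitted-variable bias:", implied_bias, "\n")
```

The first two printed numbers agree up to floating-point error, because the identity holds exactly in any sample. In real data ability is unobserved, so the long regression is not available; that is why an instrument is needed.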


Step 2: Check the Three IV Conditions

Before running 2SLS, check your instrument (quarter of birth) against the three IV conditions. Only the first can be tested directly; the other two must be argued:

1. Relevance: The instrument must affect the treatment (education).

# First stage
first_stage <- lm_robust(educ ~ born_q1, data = df, se_type = "HC2")
summary(first_stage)

fstat <- (coef(first_stage)["born_q1"] / first_stage$std.error["born_q1"])^2
cat("First-stage F-statistic:", round(fstat, 2), "\n")
cat("(Need F > 10 for strong instrument)\n")

Expected output:

Variable | Coefficient | Robust SE | t-statistic | F-statistic
born_q1  | ~-0.30      | ~0.07     | ~-4.0       | ~16.0

The first-stage F-statistic is above the rule-of-thumb threshold of 10, indicating a relevant (though not overwhelmingly strong) instrument. The negative sign is built into the simulation: it mimics the compulsory-schooling mechanism, under which Q1-born individuals can leave school with slightly less education.

2. Independence (Exogeneity): The instrument must be uncorrelated with the error term. Quarter of birth is plausibly random — you do not choose when you are born.

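3. Exclusion Restriction: The instrument affects wages ONLY through its effect on education. With a single instrument this assumption cannot be tested from the data; it has to be defended on substantive grounds. Quarter of birth should not affect wages through any channel other than schooling. This restriction has been debated in the literature: critics argue, for example, that season of birth may be correlated with family background or health, which would open a direct path to wages.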
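
To see why the exclusion restriction matters, the sketch below adds a small direct effect of the instrument on wages and compares the resulting IV estimate with the valid case. It is a minimal illustration in base R under stated assumptions: the direct effect theta = -0.01 is a hypothetical value chosen for this example, z stands in for born_q1, educ is left continuous, and the helper iv() is defined here rather than taken from a package.

```r
# Sketch: how a violated exclusion restriction biases IV.
# Assumption: theta is a hypothetical direct effect of the instrument on wages.
set.seed(2)
n <- 100000
ability <- rnorm(n)
z <- rbinom(n, 1, 0.25)                       # stand-in for born_q1
educ <- 12 + 2 * ability - 0.3 * z + rnorm(n, sd = 1.5)
y_valid <- 1.0 + 0.06 * educ + 0.10 * ability + rnorm(n, sd = 0.4)

theta <- -0.01
y_bad <- y_valid + theta * z                  # instrument now affects wages directly

# Just-identified IV estimator written as a covariance ratio
iv <- function(y, d, z) cov(y, z) / cov(d, z)
pi_hat <- cov(educ, z) / var(z)               # first-stage slope

iv_valid <- iv(y_valid, educ, z)
iv_bad <- iv(y_bad, educ, z)

cat("IV, exclusion holds:   ", iv_valid, "\n")
cat("IV, exclusion violated:", iv_bad, "\n")
cat("Shift = theta / pi_hat:", theta / pi_hat, "\n")
```

The shift equals the direct effect divided by the first-stage slope. Because the first-stage slope is small, even a tiny direct effect is magnified, which is why a modest first stage makes the exclusion restriction especially important.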


Step 3: Manual 2SLS

Two-Stage Least Squares (2SLS) works in two steps:

  1. First stage: Regress the endogenous variable (education) on the instrument (quarter of birth). Save the predicted values.
  2. Second stage: Regress the outcome (wages) on the predicted values from step 1.

# Stage 1
stage1 <- lm(educ ~ born_q1, data = df)
df$educ_hat <- fitted(stage1)

cat("Stage 1: born_q1 coefficient =", coef(stage1)["born_q1"], "\n\n")

# Stage 2 (INCORRECT standard errors — for illustration only!)
stage2 <- lm(log_wage ~ educ_hat, data = df)

cat("Stage 2 (manual, wrong SEs!):", coef(stage2)["educ_hat"], "\n")
cat("True effect:", true_beta, "\n")
cat("OLS estimate:", coef(m_ols)["educ"], "\n")
cat("\n*** Never use manual Stage 2 SEs! ***\n")

Expected output:

Stage   | Regression          | Coefficient | Note
Stage 1 | educ ~ born_q1      | b ~ -0.30   | F-stat ~ 16
Stage 2 | log_wage ~ educ_hat | d ~ 0.06    | SEs are WRONG

Comparison                | Estimate
Manual IV (Stage 2 coeff) | ~0.060
OLS estimate (biased)     | ~0.090
True causal effect        | 0.060

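The manual 2SLS point estimate is close to the true effect, but the standard errors from the second stage are incorrect and must not be used for inference. The second-stage regression computes its residuals from the predicted values educ_hat rather than the actual educ, so its estimate of the error variance, and every standard error built on it, is wrong.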
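
The correction can be sketched by hand: keep the second-stage coefficients, but rebuild the residuals from the actual regressor before estimating the error variance. The block below is a minimal base-R illustration assuming homoskedastic errors and a single instrument; it re-simulates data with the lab's parameters (educ left continuous, z standing in for born_q1), and the names se_naive and se_2sls are introduced here.

```r
# Sketch: naive vs. corrected standard error for manual 2SLS.
# Assumptions: homoskedastic errors, one instrument, no extra controls.
set.seed(3)
n <- 10000
ability <- rnorm(n)
z <- rbinom(n, 1, 0.25)
educ <- 12 + 2 * ability - 0.3 * z + rnorm(n, sd = 1.5)
log_wage <- 1.0 + 0.06 * educ + 0.10 * ability + rnorm(n, sd = 0.4)

# Two manual stages
educ_hat <- fitted(lm(educ ~ z))
stage2 <- lm(log_wage ~ educ_hat)
b <- unname(coef(stage2))                     # b[1] intercept, b[2] slope

# Naive SE: residuals are built from educ_hat (what lm reports)
se_naive <- summary(stage2)$coefficients["educ_hat", "Std. Error"]

# Corrected SE: residuals are built from the ACTUAL educ
resid_2sls <- log_wage - b[1] - b[2] * educ
sigma2 <- sum(resid_2sls^2) / (n - 2)
se_2sls <- sqrt(sigma2 / sum((educ_hat - mean(educ_hat))^2))

cat("2SLS slope:  ", b[2], "\n")
cat("Naive SE:    ", se_naive, "\n")
cat("Corrected SE:", se_2sls, "\n")
```

The two standard errors share the same denominator and differ only in the residuals used, so the naive one can be too large or too small depending on the data. A packaged 2SLS routine performs this correction (and offers robust variants) automatically.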


Step 4: Proper 2SLS Estimation

Now use proper IV/2SLS commands that compute correct standard errors.

# Using fixest (recommended); vcov = "hetero" requests heteroskedasticity-robust SEs
iv_model <- feols(log_wage ~ 1 | 0 | educ ~ born_q1, data = df, vcov = "hetero")
summary(iv_model)

cat("\nComparison:\n")
cat("OLS:", coef(m_ols)["educ"], "\n")
cat("IV:", coef(iv_model)["fit_educ"], "\n")
cat("True:", true_beta, "\n")

Requires: fixest

Expected output:

Estimator        | Coefficient on educ | Robust SE | 95% CI
OLS (biased)     | ~0.090              | ~0.003    | [0.084, 0.096]
IV/2SLS (proper) | ~0.060              | ~0.025    | [0.011, 0.109]
True effect      | 0.060               |           |

Method  | Estimate | Compared to True (0.06)
OLS     | ~0.090   | Biased upward by ~0.030
IV/2SLS | ~0.060   | Close to true effect

The IV estimate is closer to the true causal effect (0.06) than OLS. Note that the IV standard error is substantially larger than the OLS standard error — this larger variance is the price of identification.
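
The variance cost can be made explicit with a standard textbook approximation. The expression below assumes one endogenous regressor D, one instrument Z, and homoskedastic structural errors with variance \sigma_u^2; the symbols D, Z, u, and \rho_{DZ} are notation introduced here, not names from the lab's code.

```latex
\[
\operatorname{Var}\bigl(\hat\beta_{IV}\bigr)
\approx \frac{\sigma_u^2}{n \,\operatorname{Var}(D)\, \rho_{DZ}^2},
\qquad
\rho_{DZ} = \operatorname{Corr}(D, Z).
\]
```

Without the \rho_{DZ}^2 term this is the familiar OLS-type variance, so the IV standard error is inflated by a factor of roughly 1/|\rho_{DZ}|. The weaker the first stage, the smaller the correlation and the wider the confidence interval.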

Concept Check

The IV estimate is typically larger in magnitude than the OLS estimate in the returns-to-education literature. Given what you learned about omitted variable bias (OVB) pushing OLS upward, does this seem contradictory? Why might IV give a larger estimate?


Step 5: Weak Instruments

What happens when the instrument barely predicts the endogenous variable? This scenario is the weak instrument problem.

# Weak instrument simulation
weak_iv <- rnorm(n)
df$weak_iv <- weak_iv
df$educ_weak <- pmin(pmax(round(12 + 2 * ability + 0.01 * weak_iv +
                               rnorm(n, sd = 1.5)), 8), 20)

# Outcome generated from educ_weak, so weak_iv reaches wages only through schooling
df$log_wage_weak <- 1.0 + true_beta * df$educ_weak + 0.10 * ability +
  rnorm(n, sd = 0.4)

# First stage
fs_weak <- lm_robust(educ_weak ~ weak_iv, data = df, se_type = "HC2")
fstat_weak <- (coef(fs_weak)["weak_iv"] / fs_weak$std.error["weak_iv"])^2
cat("First-stage F with weak IV:", round(fstat_weak, 2), "\n")
cat("(Far below the F > 10 threshold)\n")

# IV with weak instrument
iv_weak <- feols(log_wage_weak ~ 1 | 0 | educ_weak ~ weak_iv, data = df)
cat("\nIV estimate with weak instrument:", coef(iv_weak)["fit_educ_weak"], "\n")
cat("This estimate is unreliable.\n")

Expected output:

Instrument       | First-Stage F | IV Estimate | IV SE      | Reliable?
born_q1 (strong) | ~16.0         | ~0.060      | ~0.025     | Yes
weak_iv (weak)   | ~0.01         | Unstable    | Very large | No

Diagnostic           | Strong Instrument | Weak Instrument
F-statistic          | > 10              | << 10
IV bias              | Small             | Approaches OLS bias
Standard errors      | Moderate          | Extremely large
Confidence intervals | Reliable          | Wrong coverage

With a weak instrument (F << 10), the IV estimate is unreliable: the point estimate becomes erratic, the standard error becomes enormous, and the estimator is biased toward OLS.
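
A single draw cannot show erratic behavior, so a small Monte Carlo helps. The sketch below repeats the simulation many times with a strong and a weak first stage and compares the spread of the IV estimates. It uses base R only; the sample size, the number of replications, the first-stage coefficients (1.0 and 0.01), and the helper sim_iv() are all choices made for this illustration, and educ is left continuous.

```r
# Sketch: sampling distribution of IV under strong vs. weak first stages.
# Assumptions: n, reps, and the first-stage coefficients are illustrative.
set.seed(4)
n <- 500
reps <- 500
beta <- 0.06

sim_iv <- function(first_stage) {
  ability <- rnorm(n)
  z <- rnorm(n)
  d <- 12 + 2 * ability + first_stage * z + rnorm(n, sd = 1.5)
  y <- 1.0 + beta * d + 0.10 * ability + rnorm(n, sd = 0.4)
  c(iv = cov(y, z) / cov(d, z),               # just-identified IV
    ols = cov(y, d) / var(d))                 # OLS slope for comparison
}

strong <- replicate(reps, sim_iv(1.0))        # 2 x reps matrix, rows "iv" and "ols"
weak <- replicate(reps, sim_iv(0.01))

report <- function(label, x) {
  cat(label, "median =", round(median(x), 3), " IQR =", round(IQR(x), 3), "\n")
}
report("Strong IV:", strong["iv", ])
report("Weak IV:  ", weak["iv", ])
report("OLS:      ", weak["ols", ])
```

Medians and interquartile ranges are reported instead of means and standard deviations because the weak-instrument estimator has very heavy tails. The weak-instrument estimates should be far more dispersed than the strong-instrument ones, and their median tends to sit closer to the OLS value than to the true effect.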


Step 6: Overidentification Test (with Multiple Instruments)

When you have more instruments than endogenous variables, you can test whether the instruments are consistent with each other using the Sargan-Hansen overidentification test.

# Multiple instruments
df$born_q2 <- as.integer(df$qob == 2)
df$born_q3 <- as.integer(df$qob == 3)

# Overidentified model
iv_over <- feols(log_wage ~ 1 | 0 | educ ~ born_q1 + born_q2 + born_q3,
               data = df)
summary(iv_over)

# For the Sargan test, use ivreg or AER package
library(AER)
iv_aer <- ivreg(log_wage ~ educ | born_q1 + born_q2 + born_q3, data = df)
summary(iv_aer, diagnostics = TRUE)

Requires: AER (or the standalone ivreg package)

Expected output:

Statistic                      | Value
IV estimate (overidentified)   | ~0.060
Number of instruments          | 3 (born_q1, born_q2, born_q3)
Number of endogenous variables | 1 (educ)
Degrees of overidentification  | 2
Sargan test statistic          | ~0.5–3.0
Sargan p-value                 | > 0.05 (fail to reject)

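A Sargan p-value above 0.05 means we fail to reject the null that all instruments are valid, so the instruments are internally consistent with each other. This result is not proof of validity: the test compares instruments against one another, so it cannot detect the case where all of them share the same violation.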
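
The mechanics of the test can be sketched by hand: take the 2SLS residuals, regress them on all instruments, and compare n times the R-squared with a chi-squared distribution whose degrees of freedom equal the number of overidentifying restrictions. The block below is a minimal base-R illustration assuming homoskedastic errors; it re-simulates data with the lab's parameters (educ left continuous), and the variable names are introduced here. The result should be close to what a packaged diagnostic reports, though that has not been verified against any particular package.

```r
# Sketch: Sargan overidentification statistic computed by hand.
# Assumptions: homoskedastic errors; three instruments, one endogenous regressor.
set.seed(5)
n <- 10000
ability <- rnorm(n)
qob <- sample(1:4, n, replace = TRUE)
q1 <- as.integer(qob == 1)
q2 <- as.integer(qob == 2)
q3 <- as.integer(qob == 3)
educ <- 12 + 2 * ability - 0.3 * q1 + rnorm(n, sd = 1.5)
log_wage <- 1.0 + 0.06 * educ + 0.10 * ability + rnorm(n, sd = 0.4)

# 2SLS point estimates from two manual stages
educ_hat <- fitted(lm(educ ~ q1 + q2 + q3))
b <- unname(coef(lm(log_wage ~ educ_hat)))

# 2SLS residuals use the ACTUAL regressor, not educ_hat
u_hat <- log_wage - b[1] - b[2] * educ

# Auxiliary regression of the residuals on all instruments
aux <- lm(u_hat ~ q1 + q2 + q3)
sargan <- n * summary(aux)$r.squared
df_over <- 3 - 1                              # instruments minus endogenous regressors
p_value <- pchisq(sargan, df = df_over, lower.tail = FALSE)

cat("Sargan statistic:", round(sargan, 3), " df =", df_over,
    " p-value =", round(p_value, 3), "\n")
```

If the instruments are valid, the residuals should be unrelated to them, so the auxiliary R-squared is small and the statistic stays within the range expected under the chi-squared reference distribution.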


Step 7: Complete Checklist for IV

When using IV in your research, it is important to report and discuss:

  1. The economic argument for the instrument. Why is the exclusion restriction plausible?
  2. The first-stage regression. Show the coefficient and F-statistic on the instrument.
  3. The reduced form. Regress the outcome directly on the instrument. This coefficient should be statistically distinguishable from zero if both the first stage and the causal effect are real.
  4. The IV estimate with correct standard errors. Use a proper 2SLS command.
  5. Sensitivity checks. How does the estimate change with different instruments, control sets, or subsamples?

# Complete IV checklist

# First stage
fs <- lm_robust(educ ~ born_q1, data = df, se_type = "HC2")
cat("=== FIRST STAGE ===\n")
cat("Instrument coeff:", coef(fs)["born_q1"], "\n")

# Reduced form
rf <- lm_robust(log_wage ~ born_q1, data = df, se_type = "HC2")
cat("\n=== REDUCED FORM ===\n")
cat("Instrument on outcome:", coef(rf)["born_q1"], "\n")

# Wald estimate
cat("\n=== WALD ESTIMATE ===\n")
cat("RF / FS =", coef(rf)["born_q1"] / coef(fs)["born_q1"], "\n")

Expected output:

Component      | Variable            | Coefficient | SE     | Interpretation
First Stage    | born_q1 on educ     | ~-0.30      | ~0.07  | Instrument is relevant (F ~ 16)
Reduced Form   | born_q1 on log_wage | ~-0.018     | ~0.009 | Significant negative effect
IV / 2SLS      | educ on log_wage    | ~0.060      | ~0.025 | Close to true effect (0.06)
Wald (RF / FS) | (-0.018) / (-0.30)  | ~0.060      |        | Identical to 2SLS (single binary IV)

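The Wald estimate (reduced form divided by first stage) equals the 2SLS estimate when using a single binary instrument. More generally, the ratio matches 2SLS in any just-identified model with one instrument, provided both regressions use the same controls. This identity provides a useful sanity check: if the two differ, something has gone wrong in the computation.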
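
The identity follows from writing both regressions as covariance ratios. The short derivation below assumes a binary instrument Z, treatment D (educ), outcome Y (log_wage), and no additional controls; the bar notation for group means is introduced here for illustration.

```latex
\[
\hat\beta_{Wald}
= \frac{\bar Y_{Z=1} - \bar Y_{Z=0}}{\bar D_{Z=1} - \bar D_{Z=0}}
= \frac{\widehat{\operatorname{Cov}}(Y, Z) \,/\, \widehat{\operatorname{Var}}(Z)}
       {\widehat{\operatorname{Cov}}(D, Z) \,/\, \widehat{\operatorname{Var}}(Z)}
= \frac{\widehat{\operatorname{Cov}}(Y, Z)}{\widehat{\operatorname{Cov}}(D, Z)}
= \hat\beta_{2SLS}.
\]
```

The numerator of the second expression is the reduced-form slope and the denominator is the first-stage slope. With a binary regressor, an OLS slope is exactly the difference in group means, which gives the first equality.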


Step 8: Exercises

  1. Hausman test. Test whether OLS and IV give statistically different estimates. If they do not, OLS may be adequate (less noisy). If they do, endogeneity is a problem and IV is preferred.

  2. Multiple endogenous variables. What if both education and experience are endogenous? You need at least one instrument per endogenous variable. Try adding a second instrument.

  3. Weak IV robust inference. With a first-stage F around 10-15, report the Anderson-Rubin confidence set, which is valid regardless of instrument strength.

  4. Just-identified vs. overidentified. Compare estimates from the just-identified model (one instrument) versus the overidentified model (multiple quarter-of-birth dummies). Discuss the tradeoffs.
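
As one possible starting point for exercise 3, the Anderson-Rubin confidence set can be built by test inversion: for each candidate value beta0, regress log_wage minus beta0 times educ on the instrument and keep the values for which the instrument's coefficient is not significant. The sketch below is a minimal base-R illustration assuming homoskedastic errors and a single instrument; the grid limits, the step size, and the helper ar_pvalue() are choices made here, and the data are re-simulated with educ left continuous.

```r
# Sketch: Anderson-Rubin confidence set by grid search.
# Assumptions: homoskedastic errors, one instrument, illustrative grid.
set.seed(6)
n <- 10000
ability <- rnorm(n)
z <- rbinom(n, 1, 0.25)                       # stand-in for born_q1
educ <- 12 + 2 * ability - 0.3 * z + rnorm(n, sd = 1.5)
log_wage <- 1.0 + 0.06 * educ + 0.10 * ability + rnorm(n, sd = 0.4)

# p-value of the AR test of H0: beta = beta0
ar_pvalue <- function(beta0) {
  fit <- lm(I(log_wage - beta0 * educ) ~ z)
  summary(fit)$coefficients["z", "Pr(>|t|)"]
}

grid <- seq(-0.2, 0.4, by = 0.002)
p_vals <- vapply(grid, ar_pvalue, numeric(1))
ar_set <- grid[p_vals > 0.05]                 # values not rejected at the 5% level

beta_iv <- cov(log_wage, z) / cov(educ, z)
cat("IV point estimate:", round(beta_iv, 3), "\n")
# If the set touches the grid edges, widen the grid before interpreting it
cat("AR 95% set spans: ", range(ar_set), "\n")
```

The IV point estimate always lies inside the set, because at that value the instrument's coefficient is exactly zero. With a very weak instrument the set can become extremely wide or unbounded, which is the honest answer in that situation.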


Summary

In this lab you learned:

  • OLS is biased when the treatment is endogenous (correlated with the error term)
  • IV/2SLS uses an instrument to isolate exogenous variation in the treatment
  • Three conditions: relevance (testable), exogeneity (assumed), exclusion restriction (untestable and crucial)
  • Check the first-stage F-statistic: as a rule of thumb, a value below 10 signals a weak instrument and unreliable IV estimates
  • Never use standard errors from a manual second-stage regression; use a proper 2SLS command
  • IV estimates LATE (the effect for compliers), which may differ from ATE
  • The Wald estimator (reduced form divided by first stage) equals 2SLS with a single binary instrument
  • It is recommended to report the first stage, reduced form, and IV estimate together