Tutorial · 120 minutes

Lab: Instrumental Variables and 2SLS from Scratch

Implement 2SLS step by step: simulate endogeneity, construct an instrument, perform both stages, and diagnose weak instruments and the exclusion restriction.

Method: Instrumental Variables / 2SLS
Languages: Python, R, Stata
Dataset: Returns to education using compulsory schooling (simulated Angrist & Krueger style)

Overview

Instrumental Variables (IV) is the workhorse method for dealing with endogenous regressors — variables that are correlated with the error term, making OLS biased. IV uses an external source of variation (the instrument) that affects the treatment but does not directly affect the outcome.

What you will learn:

Why OLS is biased when the treatment is endogenous
How to implement 2SLS (two-stage least squares) manually and via packages
How to assess instrument strength (first-stage F-statistic)
Why the manually-computed second-stage standard errors are incorrect and must be avoided
What LATE (Local Average Treatment Effect) means and why IV estimates differ from OLS
How to think about the exclusion restriction

Prerequisites: OLS regression (see the OLS lab).

Step 1: The Endogeneity Problem

We want to estimate the causal effect of education on wages. The problem: unobserved ability affects both education and wages, biasing OLS.

# First-time setup: install.packages(c("estimatr", "fixest"))
library(estimatr)
library(fixest)

set.seed(42)
n <- 10000

ability <- rnorm(n)
qob <- sample(1:4, n, replace = TRUE)
born_q1 <- as.integer(qob == 1)

educ <- pmin(pmax(round(12 + 2 * ability - 0.3 * born_q1 + rnorm(n, sd = 1.5)), 8), 20)

true_beta <- 0.06
log_wage <- 1.0 + true_beta * educ + 0.10 * ability + rnorm(n, sd = 0.4)

df <- data.frame(log_wage, educ, ability, qob, born_q1)

# OLS (biased)
m_ols <- lm_robust(log_wage ~ educ, data = df, se_type = "HC2")
cat("OLS Estimate (biased):", coef(m_ols)["educ"], "\n")
cat("True causal effect:", true_beta, "\n")
cat("Bias:", coef(m_ols)["educ"] - true_beta, "\n")

Expected output:

Estimator	Coefficient on educ	True Effect	Bias
OLS (biased)	~0.090	0.060	~0.030

The OLS coefficient is substantially above the true causal effect of 0.06 because omitted ability biases the estimate upward.
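The size of this bias follows from the omitted-variable-bias identity: the slope from the short regression (without ability) equals the slope on educ from the long regression (with ability) plus the ability coefficient times the slope from regressing ability on educ. The sketch below is a minimal illustration in base R. It re-simulates data with the same design as above, and the `_demo` names are hypothetical, chosen only so that the lab's own objects are not overwritten.

```r
# Sketch: omitted-variable-bias identity, base R only.
# The *_demo objects are illustrative and not part of the lab's code.
set.seed(42)
n_demo <- 10000
ability_demo <- rnorm(n_demo)
born_q1_demo <- as.integer(sample(1:4, n_demo, replace = TRUE) == 1)
educ_demo <- pmin(pmax(round(12 + 2 * ability_demo - 0.3 * born_q1_demo +
                               rnorm(n_demo, sd = 1.5)), 8), 20)
log_wage_demo <- 1.0 + 0.06 * educ_demo + 0.10 * ability_demo +
  rnorm(n_demo, sd = 0.4)

# Short regression omits ability; long regression controls for it
b_short <- unname(coef(lm(log_wage_demo ~ educ_demo))["educ_demo"])
b_long  <- coef(lm(log_wage_demo ~ educ_demo + ability_demo))

# Slope of the omitted variable (ability) on the included regressor (educ)
delta <- unname(coef(lm(ability_demo ~ educ_demo))["educ_demo"])

# Identity: short slope = long slope on educ + (ability coefficient) * delta
b_implied <- unname(b_long["educ_demo"] + b_long["ability_demo"] * delta)

cat("Short-regression slope on educ:", b_short, "\n")
cat("Long slope + ability coef * delta:", b_implied, "\n")
```

The two printed numbers agree up to floating-point error, because the identity holds exactly in sample. In real data ability is unobserved, so the long regression cannot be run; that is the reason an instrument is needed.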

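Step 2: Check the Four IV Conditions

Before running 2SLS, verify that your instrument (quarter of birth) satisfies the four conditions below. The first three are needed for identification; the fourth is needed to interpret the estimate as a LATE: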

1. Relevance: The instrument must affect the treatment (education).

# First stage
first_stage <- lm_robust(educ ~ born_q1, data = df, se_type = "HC2")
summary(first_stage)

fstat <- (coef(first_stage)["born_q1"] / first_stage$std.error["born_q1"])^2
cat("First-stage F-statistic:", round(fstat, 2), "\n")
cat("(Staiger-Stock screening rule: F > 10; LMMP 2022: F > 104.7 in just-identified case)\n")

Expected output:

Variable	Coefficient	Robust SE	t-statistic	F-statistic
born_q1	~-0.30	~0.07	~-4.0	~16.0

The first-stage F-statistic is above the rule-of-thumb threshold of 10, indicating a relevant (though not overwhelmingly strong) instrument. The negative sign matches the simulation design, in which Q1-born individuals obtain slightly less education, mimicking the effect of compulsory schooling laws.

2. Independence (Exogeneity): The instrument must be uncorrelated with the error term. Quarter of birth is plausibly random — you typically do not choose when you are born.

3. Exclusion Restriction: The instrument affects wages ONLY through its effect on education. This condition cannot be tested directly with the data. Quarter of birth should not directly affect wages through any channel other than schooling. This exclusion restriction has been debated in the literature.

4. Monotonicity: The instrument affects treatment in the same direction for everyone — there are no "defiers" (individuals for whom being born in Q1 would increase their education). This assumption is required for the IV estimate to be interpretable as a LATE for compliers.
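In a simulation, unlike in real data, the independence condition can be inspected because ability is observed. The sketch below is a minimal illustration in base R under that assumption. It uses hypothetical `_demo` names and the classical (non-robust) F-statistic, so its numbers will differ slightly from the HC2 results above.

```r
# Sketch: checking relevance and independence in simulated data (base R only).
# The *_demo objects are illustrative and not part of the lab's code.
set.seed(1)
n_demo <- 10000
ability_demo <- rnorm(n_demo)
born_q1_demo <- as.integer(sample(1:4, n_demo, replace = TRUE) == 1)
educ_demo <- pmin(pmax(round(12 + 2 * ability_demo - 0.3 * born_q1_demo +
                               rnorm(n_demo, sd = 1.5)), 8), 20)

# Relevance (testable): classical first-stage F; with one instrument F = t^2
fs <- summary(lm(educ_demo ~ born_q1_demo))
f_stat <- unname(fs$fstatistic["value"])
t_stat <- fs$coefficients["born_q1_demo", "t value"]
cat("First-stage F:", round(f_stat, 2), " t^2:", round(t_stat^2, 2), "\n")

# Independence: only checkable here because ability is observed in a simulation
balance <- t.test(ability_demo ~ born_q1_demo)
cat("Correlation of instrument with ability:",
    round(cor(ability_demo, born_q1_demo), 4), "\n")
cat("Balance test p-value:", round(balance$p.value, 3), "\n")
```

With real data the analogous check is a balance test on observed pre-treatment covariates, which can support the independence assumption but never prove it. The exclusion restriction and monotonicity have no comparable data-based check.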

Step 3: Manual 2SLS

Two-Stage Least Squares (2SLS) works in two steps:

First stage: Regress the endogenous variable (education) on the instrument (quarter of birth). Save the predicted values.
Second stage: Regress the outcome (wages) on the predicted values from the first stage.

# Stage 1
stage1 <- lm(educ ~ born_q1, data = df)
df$educ_hat <- fitted(stage1)

cat("Stage 1: born_q1 coefficient =", coef(stage1)["born_q1"], "\n\n")

# Stage 2 (INCORRECT standard errors; for illustration only!)
stage2 <- lm(log_wage ~ educ_hat, data = df)

cat("Stage 2 (manual, wrong SEs!):", coef(stage2)["educ_hat"], "\n")
cat("True effect:", true_beta, "\n")
cat("OLS estimate:", coef(m_ols)["educ"], "\n")
cat("\n*** Never use manual Stage 2 SEs! ***\n")

Expected output:

Stage	Regression	Coefficient	Note
Stage 1	educ ~ born_q1	b ~ -0.30	F-stat ~ 16
Stage 2	log_wage ~ educ_hat	d ~ 0.06	SEs are WRONG

Comparison	Estimate
Manual IV (Stage 2 coeff)	~0.060
OLS estimate (biased)	~0.090
True causal effect	0.060

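The manual 2SLS point estimate is close to the true effect, but the standard errors from the second stage are incorrect and must not be used for inference. The second-stage regression computes its residuals from the fitted values educ_hat rather than from actual educ, so its estimate of the error variance is wrong.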
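To make the source of the error concrete, the sketch below computes the 2SLS coefficient with matrix algebra and then forms the standard error twice: once with residuals based on the fitted values (what a manual second stage does) and once with residuals based on actual education (what a 2SLS routine does). It is a minimal base-R illustration under homoskedasticity, with hypothetical `_demo` names, and is not meant to reproduce the package output exactly.

```r
# Sketch: naive vs. corrected 2SLS standard errors (homoskedastic, base R only).
# The *_demo objects are illustrative and not part of the lab's code.
set.seed(42)
n_demo <- 10000
ability_demo <- rnorm(n_demo)
born_q1_demo <- as.integer(sample(1:4, n_demo, replace = TRUE) == 1)
educ_demo <- pmin(pmax(round(12 + 2 * ability_demo - 0.3 * born_q1_demo +
                               rnorm(n_demo, sd = 1.5)), 8), 20)
log_wage_demo <- 1.0 + 0.06 * educ_demo + 0.10 * ability_demo +
  rnorm(n_demo, sd = 0.4)

y <- log_wage_demo
X <- cbind(1, educ_demo)      # regressors: intercept and endogenous variable
Z <- cbind(1, born_q1_demo)   # instruments: intercept and excluded instrument

# Stage 1: project X onto Z; Stage 2: regress y on the projection
X_hat <- Z %*% solve(crossprod(Z), crossprod(Z, X))
beta_2sls <- drop(solve(crossprod(X_hat), crossprod(X_hat, y)))

# Naive residuals use X_hat; correct residuals use the actual X
resid_naive   <- drop(y - X_hat %*% beta_2sls)
resid_correct <- drop(y - X %*% beta_2sls)

k <- ncol(X)
xtx_inv <- solve(crossprod(X_hat))
se_naive   <- sqrt(sum(resid_naive^2)   / (n_demo - k) * diag(xtx_inv))[2]
se_correct <- sqrt(sum(resid_correct^2) / (n_demo - k) * diag(xtx_inv))[2]

cat("2SLS coefficient on educ:", beta_2sls[2], "\n")
cat("Naive second-stage SE:   ", se_naive, "\n")
cat("Corrected 2SLS SE:       ", se_correct, "\n")
```

The point estimate is the same in both cases; only the residual variance changes. Dedicated routines additionally offer heteroskedasticity-robust versions, which this sketch does not implement.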

Step 4: Proper 2SLS Estimation

Now use proper IV/2SLS commands that compute correct standard errors.

# Using fixest (recommended)
# vcov = "hetero" requests heteroskedasticity-robust SEs (the feols default here is IID)
iv_model <- feols(log_wage ~ 1 | 0 | educ ~ born_q1, data = df, vcov = "hetero")
summary(iv_model)

cat("\nComparison:\n")
cat("OLS:", coef(m_ols)["educ"], "\n")
cat("IV:", coef(iv_model)["fit_educ"], "\n")
cat("True:", true_beta, "\n")

Expected output:

Estimator	Coefficient on educ	Robust SE	95% CI
OLS (biased)	~0.090	~0.003	[0.084, 0.096]
IV/2SLS (proper)	~0.060	~0.025	[0.011, 0.109]
True effect	0.060	—	—

Method	Estimate	Compared to True (0.06)
OLS	~0.090	Biased upward by ~0.030
IV/2SLS	~0.060	Close to true effect

The IV estimate is closer to the true causal effect (0.06) than OLS. Note that the IV standard error is substantially larger than the OLS standard error — this larger variance is the price of identification.
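The size of that price can be read off a simple formula. Under homoskedasticity, with one regressor and one instrument, the IV standard error is roughly the OLS standard error divided by the absolute correlation between the instrument and the endogenous variable. The sketch below checks this in base R on re-simulated data; the `_demo` names and the use of classical rather than robust standard errors are simplifying assumptions.

```r
# Sketch: why IV standard errors are larger than OLS (homoskedastic formulas).
# The *_demo objects are illustrative and not part of the lab's code.
set.seed(42)
n_demo <- 10000
ability_demo <- rnorm(n_demo)
born_q1_demo <- as.integer(sample(1:4, n_demo, replace = TRUE) == 1)
educ_demo <- pmin(pmax(round(12 + 2 * ability_demo - 0.3 * born_q1_demo +
                               rnorm(n_demo, sd = 1.5)), 8), 20)
log_wage_demo <- 1.0 + 0.06 * educ_demo + 0.10 * ability_demo +
  rnorm(n_demo, sd = 0.4)

# Demeaned outcome and regressor
y_c <- log_wage_demo - mean(log_wage_demo)
x_c <- educ_demo - mean(educ_demo)

# Slope estimates
b_ols <- cov(educ_demo, log_wage_demo) / var(educ_demo)
b_iv  <- cov(born_q1_demo, log_wage_demo) / cov(born_q1_demo, educ_demo)

# Residual variances, both computed with actual educ
s2_ols <- sum((y_c - b_ols * x_c)^2) / (n_demo - 2)
s2_iv  <- sum((y_c - b_iv  * x_c)^2) / (n_demo - 2)

# IV divides the usable variation in educ by rho^2
rho_xz <- cor(educ_demo, born_q1_demo)
s_xx   <- sum(x_c^2)
se_ols <- sqrt(s2_ols / s_xx)
se_iv  <- sqrt(s2_iv / (s_xx * rho_xz^2))

cat("cor(educ, born_q1):", round(rho_xz, 3), "\n")
cat("SE ratio (IV / OLS):", round(se_iv / se_ols, 1), "\n")
cat("1 / |cor|:          ", round(1 / abs(rho_xz), 1), "\n")
```

Because quarter of birth explains only a small share of the variation in education, the correlation is small and the IV standard error is many times the OLS one.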

Concept Check

In this simulation, IV (~0.06) is smaller than OLS (~0.09) because OLS is biased upward by ability confounding. Yet in the empirical returns-to-education literature, IV estimates are often *larger* than OLS. Why might IV give a larger estimate in practice?

It is contradictory: IV should always give a smaller estimate when OLS is biased upward.
IV estimates the LATE (Local Average Treatment Effect), the effect for compliers whose education is changed by the instrument. Compliers may have higher returns to education than the average person.
The instrument is invalid, so the IV estimate is meaningless.
The IV estimate is larger because of measurement error in education.

Step 5: Weak Instruments

What happens when the instrument barely predicts the endogenous variable? This scenario is the weak instrument problem.

# Weak instrument simulation
weak_iv <- rnorm(n)
df$educ_weak <- pmin(pmax(round(12 + 2 * ability + 0.01 * weak_iv +
                               rnorm(n, sd = 1.5)), 8), 20)
df$weak_iv <- weak_iv

# First stage
fs_weak <- lm_robust(educ_weak ~ weak_iv, data = df, se_type = "HC2")
fstat_weak <- (coef(fs_weak)["weak_iv"] / fs_weak$std.error["weak_iv"])^2
cat("First-stage F with weak IV:", round(fstat_weak, 2), "\n")
cat("(Far below the F > 10 Staiger-Stock screening threshold)\n")

# IV with weak instrument
iv_weak <- feols(log_wage ~ 1 | 0 | educ_weak ~ weak_iv, data = df)
cat("\nIV estimate with weak instrument:", coef(iv_weak)["fit_educ_weak"], "\n")
cat("This estimate is unreliable.\n")

Expected output:

Instrument	First-Stage F	IV Estimate	IV SE	Reliable?
born_q1 (strong)	~16.0	~0.060	~0.025	Yes
weak_iv (weak)	~1 or lower (varies by draw)	Unstable	Very large	No

Diagnostic	Strong Instrument	Weak Instrument
F-statistic	> 10	<< 10
IV bias	Small	Approaches OLS bias
Standard errors	Moderate	Extremely large
Confidence intervals	Reliable	Wrong coverage

With a weak instrument (F << 10), the IV estimate is unreliable: the point estimate becomes erratic, the standard error becomes enormous, and the estimator is biased toward OLS.
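A single draw cannot show bias or erratic behaviour, so a small Monte Carlo helps. The sketch below repeats the simulation many times with a strong and a weak continuous instrument and compares the spread and centre of the IV estimates. It is a simplified illustration in base R: the function `simulate_iv`, the first-stage coefficients 0.5 and 0.01, the sample size, and the omission of rounding and clipping of education are all assumptions made for this example.

```r
# Sketch: sampling distribution of IV with a strong vs. a weak instrument.
# simulate_iv() and its parameters are illustrative, not part of the lab's code.
set.seed(123)

simulate_iv <- function(pi_z, n = 2000) {
  ability  <- rnorm(n)
  z        <- rnorm(n)                                   # continuous instrument
  educ     <- 12 + 2 * ability + pi_z * z + rnorm(n, sd = 1.5)
  log_wage <- 1 + 0.06 * educ + 0.10 * ability + rnorm(n, sd = 0.4)
  cov(z, log_wage) / cov(z, educ)                        # just-identified IV slope
}

reps <- 2000
iv_strong <- replicate(reps, simulate_iv(pi_z = 0.5))
iv_weak   <- replicate(reps, simulate_iv(pi_z = 0.01))

# Medians and IQRs are used because the weak-IV estimator has very heavy tails
cat("Strong IV: median =", round(median(iv_strong), 3),
    " IQR =", round(IQR(iv_strong), 3), "\n")
cat("Weak IV:   median =", round(median(iv_weak), 3),
    " IQR =", round(IQR(iv_weak), 3), "\n")
```

In this design the strong-instrument estimates cluster tightly around the true value of 0.06, while the weak-instrument estimates are far more dispersed and centred closer to the biased OLS value.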

Step 6: Overidentification Test (with Multiple Instruments)

When you have more instruments than endogenous variables, you can test whether the instruments are consistent with each other using the Sargan-Hansen overidentification test.

# First-time setup: install.packages("AER")
# Multiple instruments
df$born_q2 <- as.integer(df$qob == 2)
df$born_q3 <- as.integer(df$qob == 3)

# Overidentified model
iv_over <- feols(log_wage ~ 1 | 0 | educ ~ born_q1 + born_q2 + born_q3,
               data = df)
summary(iv_over)

# For the Sargan test, use ivreg() from the AER package
library(AER)
iv_aer <- ivreg(log_wage ~ educ | born_q1 + born_q2 + born_q3, data = df)
summary(iv_aer, diagnostics = TRUE)

Expected output:

Statistic	Value
IV estimate (overidentified)	~0.060
Number of instruments	3 (born_q1, born_q2, born_q3)
Number of endogenous variables	1 (educ)
Degrees of overidentification	2
Sargan test statistic	~0.5–3.0
Sargan p-value	> 0.05 (fail to reject)

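A Sargan p-value above 0.05 means we fail to reject the null hypothesis that the overidentifying restrictions hold. This shows only that the instruments point to the same estimate; it cannot detect the case where all instruments are invalid in the same way, so it does not replace the economic argument for the exclusion restriction. Note also that in this simulation only born_q1 enters the education equation, so born_q2 and born_q3 add no first-stage strength.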
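The Sargan statistic itself is simple to compute by hand: regress the 2SLS residuals on all instruments and multiply the R-squared of that auxiliary regression by the sample size. The sketch below does this in base R on re-simulated data. It is the homoskedastic Sargan form rather than the robust Hansen J, and the `_demo` names are hypothetical.

```r
# Sketch: Sargan overidentification statistic by hand (homoskedastic form).
# The *_demo objects are illustrative and not part of the lab's code.
set.seed(42)
n_demo <- 10000
ability_demo <- rnorm(n_demo)
qob_demo <- sample(1:4, n_demo, replace = TRUE)
q1_demo <- as.integer(qob_demo == 1)
q2_demo <- as.integer(qob_demo == 2)
q3_demo <- as.integer(qob_demo == 3)
educ_demo <- pmin(pmax(round(12 + 2 * ability_demo - 0.3 * q1_demo +
                               rnorm(n_demo, sd = 1.5)), 8), 20)
log_wage_demo <- 1.0 + 0.06 * educ_demo + 0.10 * ability_demo +
  rnorm(n_demo, sd = 0.4)

y <- log_wage_demo
X <- cbind(1, educ_demo)
Z <- cbind(1, q1_demo, q2_demo, q3_demo)

# 2SLS with three excluded instruments
X_hat <- Z %*% solve(crossprod(Z), crossprod(Z, X))
beta_2sls <- solve(crossprod(X_hat, X), crossprod(X_hat, y))

# Residuals use the actual regressors, not the fitted values
u_hat <- drop(y - X %*% beta_2sls)

# Auxiliary regression of the residuals on all instruments
aux <- lm(u_hat ~ q1_demo + q2_demo + q3_demo)
sargan_stat <- n_demo * summary(aux)$r.squared

# Degrees of freedom: excluded instruments minus endogenous regressors
df_overid <- (ncol(Z) - 1) - (ncol(X) - 1)
sargan_p <- pchisq(sargan_stat, df = df_overid, lower.tail = FALSE)

cat("Sargan statistic:", round(sargan_stat, 3),
    " df:", df_overid, " p-value:", round(sargan_p, 3), "\n")
```

The statistic is compared with a chi-squared distribution whose degrees of freedom equal the number of overidentifying restrictions, here 2.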

Step 7: Complete Checklist for IV

When using IV in your research, it is important to report and discuss:

The economic argument for the instrument. Why is the exclusion restriction plausible?
The first-stage regression. Show the coefficient and F-statistic on the instrument.
The reduced form. Regress the outcome directly on the instrument. This coefficient should be significant if both the first stage and the IV estimate are real.
The IV estimate with correct standard errors. Use a proper 2SLS command.
Sensitivity checks. How does the estimate change with different instruments, control sets, or subsamples?

# Complete IV checklist

# First stage
fs <- lm_robust(educ ~ born_q1, data = df, se_type = "HC2")
cat("=== FIRST STAGE ===\n")
cat("Instrument coeff:", coef(fs)["born_q1"], "\n")

# Reduced form
rf <- lm_robust(log_wage ~ born_q1, data = df, se_type = "HC2")
cat("\n=== REDUCED FORM ===\n")
cat("Instrument on outcome:", coef(rf)["born_q1"], "\n")

# Wald estimate
cat("\n=== WALD ESTIMATE ===\n")
cat("RF / FS =", coef(rf)["born_q1"] / coef(fs)["born_q1"], "\n")

Expected output:

Component	Variable	Coefficient	SE	Interpretation
First Stage	born_q1 on educ	~-0.30	~0.07	Instrument is relevant (F ~ 16)
Reduced Form	born_q1 on log_wage	~-0.018	~0.009	Significant negative effect
IV / 2SLS	educ on log_wage	~0.060	~0.025	Close to true effect (0.06)
Wald (RF / FS)	(-0.018) / (-0.30)	~0.060	—	Identical to 2SLS (single binary IV)

The Wald estimate (reduced form divided by first stage) equals the 2SLS estimate when using a single binary instrument. This identity provides a useful sanity check: if the two differ, something has gone wrong in the computation.
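The identity can be written out in one line. In the notation assumed here (not used elsewhere in the lab), Y is log wage, D is education, and Z is the binary instrument born_q1. With a binary regressor, each OLS slope is a difference in group means, which in turn equals a sample covariance divided by the sample variance of Z:

```latex
\hat{\beta}_{\text{Wald}}
  = \frac{\bar{Y}_{Z=1} - \bar{Y}_{Z=0}}{\bar{D}_{Z=1} - \bar{D}_{Z=0}}
  = \frac{\widehat{\operatorname{Cov}}(Y, Z) \,/\, \widehat{\operatorname{Var}}(Z)}
         {\widehat{\operatorname{Cov}}(D, Z) \,/\, \widehat{\operatorname{Var}}(Z)}
  = \frac{\widehat{\operatorname{Cov}}(Y, Z)}{\widehat{\operatorname{Cov}}(D, Z)}
  = \hat{\beta}_{\text{2SLS}}
```

The numerator is the reduced-form coefficient and the denominator is the first-stage coefficient, so the variance of Z cancels and the ratio reproduces the just-identified 2SLS estimate.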

Step 8: Exercises

Hausman test. Test whether OLS and IV give statistically different estimates. If they do not, OLS may be adequate (less noisy). If they do, endogeneity is a problem and IV is preferred.
Multiple endogenous variables. What if both education and experience are endogenous? You need at least one instrument per endogenous variable. Try adding a second instrument.
Weak IV robust inference. With a first-stage F around 10-20, as in this lab, report the Anderson-Rubin confidence set, which is valid regardless of instrument strength.
Just-identified vs. overidentified. Compare estimates from the just-identified model (one instrument) versus the overidentified model (multiple quarter-of-birth dummies). Discuss the tradeoffs.
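As a starting point for the weak-IV exercise, the Anderson-Rubin set can be built by test inversion: for each candidate value beta0, regress log_wage minus beta0 times educ on the instrument and keep the candidates for which the instrument is not significant. The sketch below is one possible base-R implementation under homoskedasticity. The grid limits, the step size, the function name `ar_pvalue`, and the `_demo` names are all assumptions for illustration.

```r
# Sketch: Anderson-Rubin 95% confidence set by grid inversion (homoskedastic).
# The grid, ar_pvalue(), and *_demo objects are illustrative assumptions.
set.seed(42)
n_demo <- 10000
ability_demo <- rnorm(n_demo)
born_q1_demo <- as.integer(sample(1:4, n_demo, replace = TRUE) == 1)
educ_demo <- pmin(pmax(round(12 + 2 * ability_demo - 0.3 * born_q1_demo +
                               rnorm(n_demo, sd = 1.5)), 8), 20)
log_wage_demo <- 1.0 + 0.06 * educ_demo + 0.10 * ability_demo +
  rnorm(n_demo, sd = 0.4)

# p-value for H0: beta = beta0, from regressing (y - beta0 * x) on z.
# For a simple regression the slope t-statistic is a function of the correlation.
ar_pvalue <- function(beta0) {
  r <- cor(log_wage_demo - beta0 * educ_demo, born_q1_demo)
  t_stat <- r * sqrt((n_demo - 2) / (1 - r^2))
  2 * pt(abs(t_stat), df = n_demo - 2, lower.tail = FALSE)
}

grid <- seq(-0.5, 0.5, by = 0.001)
p_vals <- vapply(grid, ar_pvalue, numeric(1))
ar_set <- grid[p_vals > 0.05]   # candidates that are not rejected at 5%

b_iv <- cov(born_q1_demo, log_wage_demo) / cov(born_q1_demo, educ_demo)
cat("IV point estimate:", round(b_iv, 3), "\n")
cat("AR 95% set on the grid: [", min(ar_set), ",", max(ar_set), "]\n")
```

With a very weak instrument the set can be unbounded or disjoint, in which case a finite grid will only show part of it. Reporting the minimum and maximum as an interval is therefore appropriate only when the accepted grid points are contiguous and lie strictly inside the grid.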

Expected output

If your code runs correctly, expect to see:

OLS (biased): Coefficient on education around 0.08–0.10 (upward biased due to omitted ability; true value: 0.06)
First stage (born_q1 on education): Coefficient around -0.25 to -0.35 (negative, since Q1 births get slightly less schooling)
First-stage F-statistic: Around 10–20 (instrument is relevant but not overwhelmingly strong)
Reduced form (born_q1 on log_wage): Small negative coefficient, around -0.01 to -0.03
2SLS (IV) estimate: Around 0.04–0.08, closer to the true value of 0.06 than OLS
Wald estimate: Reduced form / first stage, identical to 2SLS with a single binary instrument
Manual 2SLS SEs (wrong): Different from package 2SLS SEs — demonstrating why you should generally avoid manual second-stage SEs and use a dedicated 2SLS routine instead
Sample size: 10,000 observations

Summary

In this lab you learned:

OLS is biased when the treatment is endogenous (correlated with the error term)
IV/2SLS uses an instrument to isolate exogenous variation in the treatment
Core conditions: relevance (testable), independence (assumed), and the exclusion restriction (untestable and crucial), plus monotonicity for the LATE interpretation
It is important to check the first-stage F-statistic: a value below the Staiger-Stock (1997) screening threshold of 10 signals a weak instrument and unreliable IV inference; in the just-identified case, LMMP (2022) show that conventional t-ratio inference requires F > 104.7
Generally avoid using standard errors from a manual second-stage regression
IV estimates LATE (the effect for compliers), which may differ from ATE
The Wald estimator (reduced form divided by first stage) equals 2SLS with a single binary instrument
It is recommended to report the first stage, reduced form, and IV estimate together
