MethodAtlas
Replication · 120 minutes

Replication Lab: Incumbency Advantage in U.S. House Elections

Replicate David Lee's regression discontinuity analysis of incumbency advantage. Visualize the discontinuity at the 50% vote-share threshold, estimate local linear regressions, conduct bandwidth sensitivity analysis, and perform the McCrary density test.

Overview

In this replication lab, you will reproduce the main findings from one of the foundational papers in regression discontinuity design:

Lee, David S. 2008. "Randomized Experiments from Non-Random Selection in U.S. House Elections." Journal of Econometrics 142(2): 675–697.

Lee exploits the fact that in a two-candidate election, the candidate who barely wins (vote share just above 50%) is quasi-randomly assigned to incumbency. By comparing electoral outcomes of bare winners and bare losers in subsequent elections, he estimates the causal effect of incumbency on the probability of winning the next election.
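This comparison can be previewed with a toy sketch (made-up data, not the lab's simulation below): inside a narrow window around the threshold, bare winners and bare losers should be comparable on everything except incumbency, so a simple difference in mean outcomes approximates the causal effect.

```r
# Toy sketch of the RD comparison: simulate a margin and a next-election
# outcome with a built-in 0.08 incumbency jump, then compare mean outcomes
# for bare winners vs. bare losers inside a +/- 1 point window.
set.seed(1)
margin <- runif(50000, -0.5, 0.5)         # Democratic margin, centered at 0
incumbent <- margin > 0                   # bare winners hold the seat next time
next_vote <- 0.50 + 0.40 * margin + 0.08 * incumbent + rnorm(50000, 0, 0.08)

near <- abs(margin) < 0.01                # "bare" winners and losers only
diff_means <- mean(next_vote[near & incumbent]) -
  mean(next_vote[near & !incumbent])
round(diff_means, 3)                      # close to the built-in jump of 0.08
```

The rest of the lab refines this naive comparison: local regressions remove the slope in the running variable, and validity checks probe whether the window comparison is trustworthy.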

Why this paper matters: It demonstrated that RDD can be applied to elections, provided a clean estimate of incumbency advantage, and established best practices for RDD estimation (bandwidth choice, functional form, density tests) that are now standard.

What you will do:

  • Simulate election data matching the published RDD estimates
  • Visualize the discontinuity at the 50% threshold
  • Estimate local linear regressions with various bandwidths
  • Use rdrobust for optimal bandwidth selection
  • Conduct the McCrary density test
  • Compare your results to the published ~8 percentage point incumbency advantage

Step 1: Simulate Election Data

The running variable is the Democratic vote share margin (centered at 50%). Observations just above zero are bare Democratic winners (incumbents in the next election); those just below zero are bare Democratic losers.

library(estimatr)
library(rdrobust)

set.seed(2008)
n <- 6558

# Running variable: Democratic vote-share margin, truncated to [-0.5, 0.5]
margin <- pmin(pmax(rnorm(n, 0, 0.20), -0.5), 0.5)
win <- as.integer(margin > 0)  # bare winners become next-election incumbents

# Next-election vote share: smooth in margin, plus a 0.08 jump at the cutoff
next_vote <- 0.45 + 0.40 * margin + 0.08 * win -
  0.20 * margin^2 + 0.15 * margin * win + rnorm(n, 0, 0.08)
next_win <- as.integer(next_vote > 0.5)

df <- data.frame(margin, win, next_vote, next_win)
cat("N =", nrow(df), "elections\n")
summary(df)

Expected output:

| margin | win | next_vote | next_win |
|--------|-----|-----------|----------|
| -0.032 | 0   | 0.428     | 0        |
| 0.147  | 1   | 0.574     | 1        |
| -0.189 | 0   | 0.361     | 0        |
| 0.056  | 1   | 0.527     | 1        |
| -0.104 | 0   | 0.401     | 0        |

Summary statistics:

| Statistic | Value |
|-----------|-------|
| N (elections) | 6,558 |
| Mean vote margin | ~0.00 (centered at zero) |
| Fraction winning current election | ~0.50 |
| Mean next-election vote share | ~0.49 |
| SD of next-election vote share | ~0.13 |

Step 2: Visualize the Discontinuity

The most compelling evidence for an RDD comes from a visual plot showing a clear jump at the threshold.

# RDD plot using rdrobust
rdplot(y = df$next_vote, x = df$margin, c = 0,
     title = "Next-Election Vote Share",
     x.label = "Current Vote Margin",
     y.label = "Next-Election Vote Share")

rdplot(y = df$next_win, x = df$margin, c = 0,
     title = "Win Probability",
     x.label = "Current Vote Margin",
     y.label = "Pr(Win Next Election)")
Requires: rdrobust
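rdplot is essentially a binned scatter. The same idea can be hand-rolled in base R as a rough sketch (the snippet rebuilds the Step 1 data so it runs standalone; the 0.025 bin width is an arbitrary choice):

```r
# Binned scatter by hand: average the outcome within narrow margin bins
# and look for a jump at zero. Rebuilds the Step 1 data to run standalone.
set.seed(2008)
n <- 6558
margin <- pmin(pmax(rnorm(n, 0, 0.20), -0.5), 0.5)
win <- as.integer(margin > 0)
next_vote <- 0.45 + 0.40 * margin + 0.08 * win -
  0.20 * margin^2 + 0.15 * margin * win + rnorm(n, 0, 0.08)

bins <- seq(-0.5, 0.5, by = 0.025)
bin_mid <- head(bins, -1) + 0.0125                 # bin midpoints
bin_mean <- tapply(next_vote, cut(margin, bins), mean)
plot(bin_mid, bin_mean, pch = 19,
     xlab = "Current Vote Margin", ylab = "Mean Next-Election Vote Share")
abline(v = 0, lty = 2)                             # the 50% threshold
```

Binning is purely for visualization; estimation in Step 3 uses the raw data. rdplot additionally chooses the number of bins in a data-driven way.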

Step 3: Local Linear Regression and rdrobust

The standard approach is to fit local linear regressions on each side of the cutoff within a bandwidth window.

# rdrobust with optimal bandwidth
rd <- rdrobust(y = df$next_vote, x = df$margin, c = 0)
summary(rd)

# Manual estimation at various bandwidths
bandwidths <- c(0.05, 0.10, 0.15, 0.20, 0.25)
cat("\n=== Bandwidth Sensitivity ===\n")
for (bw in bandwidths) {
  local_df <- df[abs(df$margin) <= bw, ]
  m <- lm_robust(next_vote ~ win * margin, data = local_df, se_type = "HC1")
  cat("BW =", bw, ": Estimate =", round(coef(m)["win"], 4),
      " SE =", round(m$std.error["win"], 4),
      " N =", nrow(local_df), "\n")
}
Requires: rdrobust, estimatr

Expected output: Local linear RDD estimates

| Bandwidth | Estimate | SE | N (left) | N (right) |
|-----------|----------|-----|----------|-----------|
| 0.05 | 0.078 | 0.015 | ~850 | ~850 |
| 0.10 | 0.081 | 0.010 | ~1,650 | ~1,650 |
| 0.15 | 0.083 | 0.008 | ~2,400 | ~2,400 |
| 0.20 | 0.085 | 0.007 | ~3,100 | ~3,100 |
| 0.25 | 0.086 | 0.006 | ~3,250 | ~3,250 |

Published estimate: approximately 0.08 (8 percentage points).

Expected output: rdrobust optimal bandwidth estimate

| Detail | Value |
|--------|-------|
| Method | Local linear, triangular kernel |
| Conventional estimate | ~0.080 |
| Bias-corrected estimate | ~0.079 |
| Robust SE | ~0.012 |
| Optimal bandwidth (h) | ~0.12 |
| Bias-correction bandwidth (b) | ~0.20 |
| N (effective, left + right) | ~3,900 |

The estimates are stable across bandwidths and closely match the published incumbency advantage of approximately 8 percentage points. The bias-corrected confidence interval from rdrobust excludes zero, confirming statistical significance.
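Under the hood, rdrobust's conventional point estimate is roughly a triangular-kernel-weighted local linear fit on each side of the cutoff. A base-R sketch of that idea (rebuilding the Step 1 data so it runs standalone; h = 0.12 is hand-picked here, and the sketch omits rdrobust's bias correction and robust variance):

```r
# Triangular-kernel local linear RD estimate by hand.
# Rebuilds the Step 1 data so the snippet runs standalone.
set.seed(2008)
n <- 6558
margin <- pmin(pmax(rnorm(n, 0, 0.20), -0.5), 0.5)
win <- as.integer(margin > 0)
next_vote <- 0.45 + 0.40 * margin + 0.08 * win -
  0.20 * margin^2 + 0.15 * margin * win + rnorm(n, 0, 0.08)

h <- 0.12                                 # hand-picked bandwidth
w <- pmax(0, 1 - abs(margin) / h)         # triangular weights, zero outside h
fit <- lm(next_vote ~ win * margin, weights = w)
round(coef(fit)[["win"]], 4)              # jump at margin = 0; roughly 0.08
```

Interacting margin with win allows different slopes on each side of the cutoff, so the `win` coefficient is the vertical gap at zero, which is the RD estimand.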

Concept Check

As you decrease the bandwidth in a local linear RDD, what happens to the bias-variance tradeoff?


Step 4: McCrary Density Test

A key validity check for RDD is that units cannot precisely manipulate the running variable to sort above or below the threshold. The McCrary (2008) test checks for a discontinuity in the density of the running variable at the cutoff.

# McCrary density test using rddensity
library(rddensity)
density_test <- rddensity(X = df$margin, c = 0)
summary(density_test)

# Plot
rdplotdensity(density_test, df$margin,
            title = "McCrary Density Test")
Requires: rddensity

Expected output: McCrary density test (informal)

| Test | Value |
|------|-------|
| Observations in [-0.02, 0) | ~330 |
| Observations in [0, 0.02) | ~330 |
| Density ratio (right/left) | ~1.00 |
| Interpretation | No evidence of manipulation |

The ratio near 1.0 indicates that the density of the running variable is continuous through the cutoff. Lee (2008) argues that elections are inherently noisy, making precise manipulation of vote shares effectively impossible. Our simulated data exhibits this by construction: there is no bunching because the running variable is drawn from a smooth, symmetric distribution.
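The informal version of this check is just a bin count on either side of the cutoff (rebuilding the Step 1 running variable so the snippet runs standalone; the 0.02 window width is an arbitrary choice):

```r
# Informal density check: count observations just left and right of the cutoff.
# Rebuilds the Step 1 running variable so the snippet runs standalone.
set.seed(2008)
n <- 6558
margin <- pmin(pmax(rnorm(n, 0, 0.20), -0.5), 0.5)

left  <- sum(margin >= -0.02 & margin < 0)
right <- sum(margin >= 0 & margin < 0.02)
c(left = left, right = right, ratio = round(right / left, 2))
# A smooth, symmetric running variable should give a ratio near 1
```

The formal McCrary/rddensity test does the same comparison with local polynomial density estimators and provides a p-value, so it does not depend on an arbitrary window choice.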

Concept Check

The McCrary density test checks for bunching at the cutoff. Why would bunching be a problem for the RDD?


Step 5: Compare with Published Results

cat("=== Comparison with Lee (2008) ===\n")
cat("Published incumbency advantage: ~8 percentage points\n")
cat("Our estimate:", round(rd$coef[1], 4), "\n")
cat("Our SE:", round(rd$se[1], 4), "\n")
cat("Optimal bandwidth:", round(rd$bws[1,1], 4), "\n")

Expected output: Comparison with Lee (2008)

| Statistic | Published | Ours |
|-----------|-----------|------|
| Incumbency advantage (vote share) | ~0.08 | ~0.08 |
| SE | ~0.01 | ~0.01 |
| N (elections) | 6,558 | 6,558 |
| McCrary test p-value | > 0.10 | pass |

The published finding is an approximately 8 percentage point incumbency advantage, robust to bandwidth choice and polynomial order. Our simulation reproduces this central result. Differences in exact standard errors and bandwidth-specific estimates are expected because the data is simulated rather than real.


Summary

Our replication confirms the central finding of Lee (2008):

  1. Incumbency confers a substantial electoral advantage. Barely winning an election increases the Democratic candidate's vote share in the next election by approximately 8 percentage points.

  2. The effect is visually striking. The RDD plot shows a clear, sharp discontinuity at the 50% threshold — the hallmark of a compelling RDD.

  3. Results are robust to bandwidth choice. Estimates are stable across a wide range of bandwidths, from narrow (high variance, low bias) to wide (lower variance, potentially more bias).

  4. No evidence of manipulation. The McCrary density test finds no bunching at the cutoff, supporting the assumption that candidates cannot precisely control their vote share around 50%.

  5. Differences from published results are due to simulation. With Lee's actual election data, estimates would line up much more closely with his published tables.


Extension Exercises

  1. Polynomial sensitivity. Estimate the RDD with local quadratic and local cubic regressions. How do the estimates change? Why do Gelman and Imbens (2019) recommend against high-order polynomials?

  2. Win probability as outcome. Re-estimate the RDD using an indicator for winning the next election (instead of vote share) as the outcome. How does the magnitude compare?

  3. Covariate balance. Test whether pre-determined covariates (e.g., district demographics, prior incumbency status) show a discontinuity at the cutoff. They should not if the RDD is valid.

  4. Donut hole test. Drop observations very close to the cutoff (e.g., within 0.5 percentage points) and re-estimate. If results are driven by suspicious data near the cutoff, estimates will change substantially.

  5. Fuzzy RDD. Modify the simulation so that winning an election only increases (but does not guarantee) running again. Estimate a fuzzy RDD where the first stage is the effect of winning on running in the next election.