Lab: Fuzzy Regression Discontinuity Design
Implement a fuzzy RDD step by step. Learn to visualize the running variable, diagnose imperfect compliance, estimate the local average treatment effect via 2SLS at the cutoff, use rdrobust for bias-corrected inference, and assess bandwidth sensitivity.
Overview
In this lab you will estimate the effect of a scholarship on college GPA using a fuzzy regression discontinuity design. Unlike a sharp RDD where treatment jumps from 0 to 1 at the cutoff, in a fuzzy RDD the probability of treatment jumps discontinuously but not from 0 to 1 — some eligible students decline the scholarship and some ineligible students receive it through appeals. You will learn to handle this imperfect compliance using instrumental variables at the cutoff.
What you will learn:
- The difference between sharp and fuzzy RDD
- How to visualize the running variable, the first stage, and the reduced form
- How to estimate the fuzzy RDD as 2SLS (IV) at the cutoff
- How to use rdrobust for bias-corrected robust inference
- How to assess sensitivity to bandwidth choice and polynomial order
Prerequisites: Familiarity with instrumental variables (2SLS) and basic RDD concepts (see the sharp RDD lab).
Step 1: Simulate Scholarship Eligibility Data
Students take an entrance exam (the running variable). Those scoring 70 or above are eligible for a scholarship, but compliance is imperfect: some eligible students decline, and some ineligible students appeal successfully.
library(estimatr)
library(rdrobust)
library(modelsummary)
set.seed(42)
n <- 2000
exam_score <- round(rnorm(n, 70, 12), 1)
above_cutoff <- as.integer(exam_score >= 70)
prob_scholarship <- 0.15 + 0.65 * above_cutoff
dist_from_cutoff <- abs(exam_score - 70)
prob_scholarship <- pmin(pmax(
prob_scholarship + 0.01 * dist_from_cutoff * above_cutoff -
0.005 * dist_from_cutoff * (1 - above_cutoff), 0.05), 0.95)
scholarship <- rbinom(n, 1, prob_scholarship)
ability <- 0.02 * (exam_score - 70) + rnorm(n, sd = 0.5)
gpa <- 2.5 + 0.03 * (exam_score - 70) - 0.0005 * (exam_score - 70)^2 +
0.50 * scholarship + ability + rnorm(n, sd = 0.4)
df <- data.frame(exam_score, above_cutoff, scholarship, gpa,
score_centered = exam_score - 70)
cat("P(scholarship | above):", mean(df$scholarship[df$above_cutoff == 1]), "\n")
cat("P(scholarship | below):", mean(df$scholarship[df$above_cutoff == 0]), "\n")

Expected output:
| | exam_score | above_cutoff | scholarship | gpa | score_centered |
|---|---|---|---|---|---|
| 0 | 73.9 | 1 | 1 | 3.41 | 3.9 |
| 1 | 68.3 | 0 | 0 | 2.18 | -1.7 |
| 2 | 77.8 | 1 | 1 | 3.85 | 7.8 |
| 3 | 62.1 | 0 | 0 | 2.05 | -7.9 |
| 4 | 70.4 | 1 | 0 | 2.60 | 0.4 |
Summary statistics:
| Statistic | Value |
|---|---|
| Sample size | 2,000 |
| Above cutoff | ~1,000 (approximately 50%) |
| P(scholarship given above cutoff) | ~0.80 |
| P(scholarship given below cutoff) | ~0.15 |
| First-stage jump | ~0.65 |
| Mean GPA | ~2.8 |
The first-stage jump of approximately 0.65 means that crossing the exam score cutoff of 70 increases the probability of receiving the scholarship by about 65 percentage points. Compliance is imperfect: some students above the cutoff decline the scholarship (~20%), and some below appeal successfully (~15%).
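To see the non-compliance directly, a quick cross-tab of eligibility against receipt helps. A minimal sketch, assuming the `df` from Step 1 is still in memory:

```r
# Sketch: share receiving the scholarship on each side of the cutoff.
# Under monotonicity, below-cutoff recipients are (approximately)
# always-takers and above-cutoff decliners are never-takers;
# compliers make up the remaining mass.
tab <- prop.table(table(above = df$above_cutoff,
                        scholarship = df$scholarship), margin = 1)
round(tab, 2)
```

Row `0` shows the below-cutoff receipt rate (~0.15) and row `1` the above-cutoff rate (~0.80); their difference is the first-stage jump.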
Step 2: Visualize the Discontinuity
Plot three things: (1) the running variable distribution, (2) the first stage (treatment probability), and (3) the reduced form (outcome).
par(mfrow = c(1, 3))
# (1) Running variable density
hist(df$exam_score, breaks = 50, main = "Running Variable",
xlab = "Exam Score", col = "lightblue")
abline(v = 70, col = "red", lwd = 2, lty = 2)
# (2) First stage
rdplot(df$scholarship, df$score_centered, c = 0,
title = "First Stage", x.label = "Score (centered)",
y.label = "P(Scholarship)")
# (3) Reduced form
rdplot(df$gpa, df$score_centered, c = 0,
title = "Reduced Form", x.label = "Score (centered)",
y.label = "GPA")

Step 3: Estimate the Fuzzy RDD with 2SLS
The fuzzy RDD is a local IV regression: the instrument is the above-cutoff indicator, the endogenous variable is the treatment (scholarship), and we restrict to observations near the cutoff.
# Bandwidth selection
bw <- 10
df_bw <- df[abs(df$score_centered) <= bw, ]
cat("Observations in bandwidth:", nrow(df_bw), "\n")
# First stage
fs <- lm(scholarship ~ above_cutoff + score_centered, data = df_bw)
# With a single instrument, the relevant first-stage F is the squared
# t-statistic on the excluded instrument, not the overall regression F
fs_t <- summary(fs)$coefficients["above_cutoff", "t value"]
cat("First stage F:", fs_t^2, "\n")
# Reduced form
rf <- lm(gpa ~ above_cutoff + score_centered, data = df_bw)
cat("Reduced form jump:", coef(rf)["above_cutoff"], "\n")
# Wald estimate
cat("Wald estimate:", coef(rf)["above_cutoff"] / coef(fs)["above_cutoff"], "\n")
# Formal 2SLS with iv_robust
iv_model <- iv_robust(gpa ~ score_centered + scholarship |
score_centered + above_cutoff,
data = df_bw, se_type = "HC2")
summary(iv_model)

Expected output: First stage
| Variable | Coeff | SE | t | p |
|---|---|---|---|---|
| Intercept | 0.153 | 0.015 | 10.2 | 0.000 |
| above_cutoff | 0.648 | 0.020 | 32.4 | 0.000 |
| score_centered | -0.003 | 0.001 | -2.1 | 0.036 |
| Detail | Value |
|---|---|
| First-stage F-statistic | ~1,050 (well above 10, strong instrument) |
| Observations in bandwidth | ~1,500 |
Expected output: Reduced form (ITT)
| Variable | Coeff | SE | t | p |
|---|---|---|---|---|
| Intercept | 2.49 | 0.04 | 62.3 | 0.000 |
| above_cutoff | 0.35 | 0.05 | 7.0 | 0.000 |
| score_centered | 0.05 | 0.003 | 16.7 | 0.000 |
Expected output: Fuzzy RD (2SLS)
| Variable | Coeff | SE | t | p |
|---|---|---|---|---|
| scholarship | 0.50 | 0.08 | 6.3 | 0.000 |
| score_centered | 0.05 | 0.003 | 16.7 | 0.000 |
| Detail | Value |
|---|---|
| Method | 2SLS, robust SEs |
| Wald (IV) estimate (RF / FS) | ~0.35 / 0.65 = ~0.54 |
| 2SLS estimate | ~0.50 |
| True LATE | 0.50 |
| Bandwidth | 10 exam score points |
The 2SLS estimate of approximately 0.50 is close to the true LATE. The Wald ratio (reduced form divided by first stage) gives a similar number. This estimate is the local average treatment effect for compliers — students whose scholarship receipt is determined by whether they score above or below 70.
In a fuzzy RDD, the 2SLS estimate is the ratio of the reduced-form jump to the first-stage jump. What does this estimand identify?
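To make the Wald logic concrete, an approximate standard error for the ratio can be computed by the delta method. A minimal sketch, assuming `fs` and `rf` from the chunk above are in memory; it ignores the covariance between the two jump estimates, so it is only a rough check against the 2SLS output:

```r
# Sketch: delta-method SE for the Wald ratio (reduced form / first stage).
# Approximation: Var(R/F) ~= (R/F)^2 * [(se_R/R)^2 + (se_F/F)^2],
# dropping the covariance term between the two jumps.
rf_b  <- coef(rf)["above_cutoff"]
rf_se <- summary(rf)$coefficients["above_cutoff", "Std. Error"]
fs_b  <- coef(fs)["above_cutoff"]
fs_se <- summary(fs)$coefficients["above_cutoff", "Std. Error"]
wald    <- rf_b / fs_b
wald_se <- abs(wald) * sqrt((rf_se / rf_b)^2 + (fs_se / fs_b)^2)
cat("Wald:", round(wald, 3), " approx. SE:", round(wald_se, 3), "\n")
```

The resulting SE should be in the same ballpark as the 2SLS standard error on `scholarship`, which is why the Wald and 2SLS columns in the tables above agree so closely.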
Step 4: Use rdrobust for Bias-Corrected Inference
The rdrobust package implements optimal bandwidth selection and bias-corrected confidence intervals.
# Fuzzy RDD with rdrobust
rd_result <- rdrobust(y = df$gpa, x = df$score_centered,
fuzzy = df$scholarship, c = 0)
summary(rd_result)
# Bandwidth selection
bw_result <- rdbwselect(y = df$gpa, x = df$score_centered,
fuzzy = df$scholarship, c = 0)
summary(bw_result)

Expected output: rdrobust fuzzy RDD
| Estimator | Coeff | SE | 95% CI |
|---|---|---|---|
| Conventional | ~0.48 | ~0.10 | [0.28, 0.68] |
| Bias-corrected | ~0.50 | ~0.10 | [0.30, 0.70] |
| Robust | ~0.50 | ~0.12 | [0.27, 0.73] |
| Detail | Value |
|---|---|
| Method | Local linear, triangular kernel |
| MSE-optimal bandwidth (h) | ~8–12 exam score points |
| Bias-correction bandwidth (b) | ~14–18 exam score points |
| N (effective, left + right) | ~1,200–1,600 |
The bias-corrected robust confidence interval covers the true LATE of 0.50. The MSE-optimal bandwidth is data-driven and balances bias against variance.
Step 5: Bandwidth Sensitivity Analysis
A credible RDD should produce similar estimates across a reasonable range of bandwidths.
# Bandwidth sensitivity
bandwidths <- c(5, 7, 10, 12, 15, 20)
results <- data.frame()
for (bw in bandwidths) {
rd <- rdrobust(df$gpa, df$score_centered,
fuzzy = df$scholarship, c = 0, h = bw)
results <- rbind(results, data.frame(
bw = bw,
    est = rd$coef[1],      # conventional point estimate
    se = rd$se[3],         # robust SE
    ci_lo = rd$ci[3, 1],   # robust CI, lower bound
    ci_hi = rd$ci[3, 2],   # robust CI, upper bound
    n_eff = sum(rd$N_h)    # effective observations, left + right
))
}
print(results)
# Plot
plot(results$bw, results$est, type = "b", pch = 19,
ylim = range(c(results$ci_lo, results$ci_hi)),
xlab = "Bandwidth", ylab = "Estimate",
main = "Bandwidth Sensitivity")
arrows(results$bw, results$ci_lo, results$bw, results$ci_hi,
angle = 90, code = 3, length = 0.05)
abline(h = 0.5, col = "red", lty = 2)

Expected output: Bandwidth sensitivity
| Bandwidth | N (effective) | Estimate | SE | 95% CI |
|---|---|---|---|---|
| 5 | ~600 | ~0.55 | ~0.18 | [0.20, 0.90] |
| 7 | ~900 | ~0.52 | ~0.14 | [0.25, 0.79] |
| 10 | ~1,300 | ~0.49 | ~0.11 | [0.28, 0.70] |
| 12 | ~1,500 | ~0.48 | ~0.10 | [0.29, 0.67] |
| 15 | ~1,700 | ~0.47 | ~0.09 | [0.30, 0.64] |
| 20 | ~1,900 | ~0.45 | ~0.08 | [0.30, 0.60] |
The true LATE is 0.50. Estimates are relatively stable across bandwidths, ranging from approximately 0.45 to 0.55. Narrower bandwidths produce noisier estimates (wider confidence intervals) but less bias; wider bandwidths are more precise but may introduce slight bias. All confidence intervals cover the true value.
You find that the fuzzy RDD estimate is 0.50 with bandwidth 10 but jumps to 1.20 with bandwidth 5. What might explain this large difference, and what should you do?
Step 6: Validity Tests
# McCrary density test
library(rddensity)
density_test <- rddensity(df$score_centered, c = 0)
summary(density_test)
# Placebo cutoffs
cat("\n=== Placebo Cutoffs ===\n")
for (pc in c(-15, -10, -5, 5, 10, 15)) {
tryCatch({
rd_p <- rdrobust(df$gpa, df$score_centered,
fuzzy = df$scholarship, c = pc)
cat("Cutoff at", pc, ": est =", round(rd_p$coef[1], 3),
", p =", round(rd_p$pv[1], 3), "\n")
}, error = function(e) {
cat("Cutoff at", pc, ": insufficient data\n")
})
}

Expected output: McCrary density test
| Test | Value |
|---|---|
| T-statistic | ~0.30 |
| p-value | > 0.05 (not significant) |
| Interpretation | No evidence of manipulation at the cutoff |
Since exam scores are drawn from a smooth normal distribution, the density is continuous through the cutoff. There is no manipulation in the simulated data.
Expected output: Placebo cutoffs
| Placebo cutoff | Estimate | p-value | Significant? |
|---|---|---|---|
| -15 | ~0.10 | ~0.60 | No |
| -10 | ~-0.15 | ~0.55 | No |
| -5 | ~0.08 | ~0.70 | No |
| +5 | ~-0.05 | ~0.80 | No |
| +10 | ~0.12 | ~0.50 | No |
| +15 | ~-0.20 | ~0.45 | No |
No significant effects appear at the placebo cutoffs (stated in centered units, i.e., 5 to 15 points away from the true cutoff of 70). This null result supports the identifying assumption: the discontinuity in the outcome is specific to the policy-relevant cutoff, not an artifact of the functional form.
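A third common validity check is covariate balance at the cutoff: predetermined covariates should not jump. A minimal base-R sketch, assuming `df` from Step 1 is in memory; `hs_gpa` is a hypothetical covariate simulated here to depend smoothly on the running variable but, by construction, not on the scholarship:

```r
# Sketch: covariate balance at the cutoff.
# hs_gpa is hypothetical (not part of the Step 1 simulation): it varies
# smoothly with the running variable and is unaffected by treatment,
# so its estimated jump at the cutoff should be near zero.
set.seed(7)
df$hs_gpa <- 2.8 + 0.02 * df$score_centered + rnorm(nrow(df), sd = 0.3)
bal <- lm(hs_gpa ~ above_cutoff + score_centered,
          data = df[abs(df$score_centered) <= 10, ])
summary(bal)$coefficients["above_cutoff", ]
```

A small, statistically insignificant jump is the expected result here; a large jump in a genuinely predetermined covariate would cast doubt on the design.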
Step 7: Exercises
Try these on your own:
- Sharp RDD comparison. Re-estimate the model as a sharp RDD (using above_cutoff as the treatment directly, ignoring non-compliance). Compare this reduced-form / ITT estimate with the fuzzy LATE. Verify that ITT = LATE times the compliance rate.
- Polynomial order. Estimate the fuzzy RDD with local linear (p = 1) and local quadratic (p = 2) specifications. How sensitive are the results?
- Covariate adjustment. Add baseline covariates (e.g., high school GPA) to the rdrobust estimation. Covariates should not change the point estimate much in a valid RDD but may reduce the standard error.
- Donut RDD. Drop observations within 1 point of the cutoff (the "donut hole") to address potential manipulation concerns. Re-estimate and compare.
Summary
In this lab you learned:
- Fuzzy RDD handles imperfect compliance by treating the cutoff indicator as an instrument for treatment
- The fuzzy RDD estimand is the LATE for compliers at the cutoff, estimated as the reduced-form jump divided by the first-stage jump
- rdrobust provides MSE-optimal bandwidths and bias-corrected confidence intervals and is among the most widely used tools for RDD estimation
- Bandwidth sensitivity analysis is essential: stable estimates across bandwidths strengthen credibility
- Validity tests include the McCrary density test (no manipulation), placebo cutoffs (no effect away from the true cutoff), and covariate balance at the cutoff
- The first-stage F-statistic must be strong; a weak first stage inflates and destabilizes the fuzzy RDD estimate