MethodAtlas
tutorial90 minutes

Lab: Fuzzy Regression Discontinuity Design

Implement a fuzzy RDD step by step. Learn to visualize the running variable, diagnose imperfect compliance, estimate the local average treatment effect via 2SLS at the cutoff, use rdrobust for bias-corrected inference, and assess bandwidth sensitivity.

Overview

In this lab you will estimate the effect of a scholarship on college GPA using a fuzzy regression discontinuity design. Unlike a sharp RDD where treatment jumps from 0 to 1 at the cutoff, in a fuzzy RDD the probability of treatment jumps discontinuously but not from 0 to 1 — some eligible students decline the scholarship and some ineligible students receive it through appeals. You will learn to handle this imperfect compliance using instrumental variables at the cutoff.

What you will learn:

  • The difference between sharp and fuzzy RDD
  • How to visualize the running variable, the first stage, and the reduced form
  • How to estimate the fuzzy RDD as 2SLS (IV) at the cutoff
  • How to use rdrobust for bias-corrected robust inference
  • How to assess sensitivity to bandwidth choice and polynomial order

Prerequisites: Familiarity with instrumental variables (2SLS) and basic RDD concepts (see the sharp RDD lab).


Step 1: Simulate Scholarship Eligibility Data

Students take an entrance exam (the running variable). Those scoring above 70 are eligible for a scholarship, but compliance is imperfect: some eligible students decline, and some ineligible students appeal successfully.

library(estimatr)
library(rdrobust)
library(modelsummary)

set.seed(42)
n <- 2000

exam_score <- round(rnorm(n, 70, 12), 1)
above_cutoff <- as.integer(exam_score >= 70)

prob_scholarship <- 0.15 + 0.65 * above_cutoff
dist_from_cutoff <- abs(exam_score - 70)
prob_scholarship <- pmin(pmax(
prob_scholarship + 0.01 * dist_from_cutoff * above_cutoff -
0.005 * dist_from_cutoff * (1 - above_cutoff), 0.05), 0.95)
scholarship <- rbinom(n, 1, prob_scholarship)

ability <- 0.02 * (exam_score - 70) + rnorm(n, sd = 0.5)
gpa <- 2.5 + 0.03 * (exam_score - 70) - 0.0005 * (exam_score - 70)^2 +
     0.50 * scholarship + ability + rnorm(n, sd = 0.4)

df <- data.frame(exam_score, above_cutoff, scholarship, gpa,
               score_centered = exam_score - 70)

cat("P(scholarship | above):", mean(df$scholarship[df$above_cutoff == 1]), "\n")
cat("P(scholarship | below):", mean(df$scholarship[df$above_cutoff == 0]), "\n")

Expected output:

exam_scoreabove_cutoffscholarshipgpascore_centered
073.9113.413.9
168.3002.18-1.7
277.8113.857.8
362.1002.05-7.9
470.4102.600.4

Summary statistics:

StatisticValue
Sample size2,000
Above cutoff~1,000 (approximately 50%)
P(scholarship given above cutoff)~0.80
P(scholarship given below cutoff)~0.15
First-stage jump~0.65
Mean GPA~2.8

The first-stage jump of approximately 0.65 means that crossing the exam score cutoff of 70 increases the probability of receiving the scholarship by about 65 percentage points. Compliance is imperfect: some students above the cutoff decline the scholarship (~20%), and some below appeal successfully (~15%).


Step 2: Visualize the Discontinuity

Plot three things: (1) the running variable distribution, (2) the first stage (treatment probability), and (3) the reduced form (outcome).

par(mfrow = c(1, 3))

# (1) Running variable density
hist(df$exam_score, breaks = 50, main = "Running Variable",
   xlab = "Exam Score", col = "lightblue")
abline(v = 70, col = "red", lwd = 2, lty = 2)

# (2) First stage
rdplot(df$scholarship, df$score_centered, c = 0,
     title = "First Stage", x.label = "Score (centered)",
     y.label = "P(Scholarship)")

# (3) Reduced form
rdplot(df$gpa, df$score_centered, c = 0,
     title = "Reduced Form", x.label = "Score (centered)",
     y.label = "GPA")

Step 3: Estimate the Fuzzy RDD with 2SLS

The fuzzy RDD is a local IV regression: the instrument is the above-cutoff indicator, the endogenous variable is the treatment (scholarship), and we restrict to observations near the cutoff.

# Bandwidth selection
bw <- 10
df_bw <- df[abs(df$score_centered) <= bw, ]
cat("Observations in bandwidth:", nrow(df_bw), "\n")

# First stage
fs <- lm(scholarship ~ above_cutoff + score_centered, data = df_bw)
cat("First stage F:", summary(fs)$fstatistic[1], "\n")

# Reduced form
rf <- lm(gpa ~ above_cutoff + score_centered, data = df_bw)
cat("Reduced form jump:", coef(rf)["above_cutoff"], "\n")

# Wald estimate
cat("Wald estimate:", coef(rf)["above_cutoff"] / coef(fs)["above_cutoff"], "\n")

# Formal 2SLS with iv_robust
iv_model <- iv_robust(gpa ~ score_centered + scholarship |
                     score_centered + above_cutoff,
                     data = df_bw, se_type = "HC2")
summary(iv_model)

Expected output: First stage

VariableCoeffSEtp
Intercept0.1530.01510.20.000
above_cutoff0.6480.02032.40.000
score_centered-0.0030.001-2.10.036
DetailValue
First-stage F-statistic~1,050 (well above 10, strong instrument)
Observations in bandwidth~1,500

Expected output: Reduced form (ITT)

VariableCoeffSEtp
Intercept2.490.0462.30.000
above_cutoff0.350.057.00.000
score_centered0.050.00316.70.000

Expected output: Fuzzy RD (2SLS)

VariableCoeffSEtp
scholarship0.500.086.30.000
score_centered0.050.00316.70.000
DetailValue
Method2SLS, robust SEs
Wald (IV) estimate (RF / FS)~0.35 / 0.65 = ~0.54
2SLS estimate~0.50
True LATE0.50
Bandwidth10 exam score points

The 2SLS estimate of approximately 0.50 is close to the true LATE. The Wald ratio (reduced form divided by first stage) gives a similar number. This estimate is the local average treatment effect for compliers — students whose scholarship receipt is determined by whether they score above or below 70.

Concept Check

In a fuzzy RDD, the 2SLS estimate is the ratio of the reduced-form jump to the first-stage jump. What does this estimand identify?


Step 4: Use rdrobust for Bias-Corrected Inference

The rdrobust package implements optimal bandwidth selection and bias-corrected confidence intervals.

# Fuzzy RDD with rdrobust
rd_result <- rdrobust(y = df$gpa, x = df$score_centered,
                     fuzzy = df$scholarship, c = 0)
summary(rd_result)

# Bandwidth selection
bw_result <- rdbwselect(y = df$gpa, x = df$score_centered,
                       fuzzy = df$scholarship, c = 0)
summary(bw_result)
Requiresrdrobust

Expected output: rdrobust fuzzy RDD

EstimatorCoeffSE95% CI
Conventional~0.48~0.10[0.28, 0.68]
Bias-corrected~0.50~0.10[0.30, 0.70]
Robust~0.50~0.12[0.27, 0.73]
DetailValue
MethodLocal linear, triangular kernel
MSE-optimal bandwidth (h)~8–12 exam score points
Bias-correction bandwidth (b)~14–18 exam score points
N (effective, left + right)~1,200–1,600

The bias-corrected robust confidence interval covers the true LATE of 0.50. The MSE-optimal bandwidth is data-driven and balances bias against variance.


Step 5: Bandwidth Sensitivity Analysis

A credible RDD should produce similar estimates across a reasonable range of bandwidths.

# Bandwidth sensitivity
bandwidths <- c(5, 7, 10, 12, 15, 20)
results <- data.frame()

for (bw in bandwidths) {
rd <- rdrobust(df$gpa, df$score_centered,
                fuzzy = df$scholarship, c = 0, h = bw)
results <- rbind(results, data.frame(
  bw = bw,
  est = rd$coef[1],
  se = rd$se[3],
  ci_lo = rd$ci[3, 1],
  ci_hi = rd$ci[3, 2],
  n_eff = sum(rd$N_h)
))
}

print(results)

# Plot
plot(results$bw, results$est, type = "b", pch = 19,
   ylim = range(c(results$ci_lo, results$ci_hi)),
   xlab = "Bandwidth", ylab = "Estimate",
   main = "Bandwidth Sensitivity")
arrows(results$bw, results$ci_lo, results$bw, results$ci_hi,
     angle = 90, code = 3, length = 0.05)
abline(h = 0.5, col = "red", lty = 2)
Requiresrdrobust

Expected output: Bandwidth sensitivity

BandwidthN (effective)EstimateSE95% CI
5~600~0.55~0.18[0.20, 0.90]
7~900~0.52~0.14[0.25, 0.79]
10~1,300~0.49~0.11[0.28, 0.70]
12~1,500~0.48~0.10[0.29, 0.67]
15~1,700~0.47~0.09[0.30, 0.64]
20~1,900~0.45~0.08[0.30, 0.60]

The true LATE is 0.50. Estimates are relatively stable across bandwidths, ranging from approximately 0.45 to 0.55. Narrower bandwidths produce noisier estimates (wider confidence intervals) but less bias; wider bandwidths are more precise but may introduce slight bias. All confidence intervals cover the true value.

Concept Check

You find that the fuzzy RDD estimate is 0.50 with bandwidth 10 but jumps to 1.20 with bandwidth 5. What might explain this large difference, and what should you do?


Step 6: Validity Tests

# McCrary density test
library(rddensity)
density_test <- rddensity(df$score_centered, c = 0)
summary(density_test)

# Placebo cutoffs
cat("\n=== Placebo Cutoffs ===\n")
for (pc in c(-15, -10, -5, 5, 10, 15)) {
tryCatch({
  rd_p <- rdrobust(df$gpa, df$score_centered,
                    fuzzy = df$scholarship, c = pc)
  cat("Cutoff at", pc, ": est =", round(rd_p$coef[1], 3),
      ", p =", round(rd_p$pv[1], 3), "\n")
}, error = function(e) {
  cat("Cutoff at", pc, ": insufficient data\n")
})
}

Expected output: McCrary density test

TestValue
T-statistic~0.30
p-value> 0.05 (not significant)
InterpretationNo evidence of manipulation at the cutoff

Since exam scores are drawn from a smooth normal distribution, the density is continuous through the cutoff. There is no manipulation in the simulated data.

Expected output: Placebo cutoffs

Placebo cutoffEstimatep-valueSignificant?
-15~0.10~0.60No
-10~-0.15~0.55No
-5~0.08~0.70No
+5~-0.05~0.80No
+10~0.12~0.50No
+15~-0.20~0.45No

No significant effects are found at placebo cutoffs away from the true cutoff of 70. This null result supports the identifying assumption: the discontinuity in the outcome is specific to the policy-relevant cutoff, not an artifact of the functional form.


Step 7: Exercises

Try these on your own:

  1. Sharp RDD comparison. Re-estimate the model as a sharp RDD (using above_cutoff as the treatment directly, ignoring non-compliance). Compare this reduced-form / ITT estimate with the fuzzy LATE. Verify that ITT = LATE times compliance rate.

  2. Polynomial order. Estimate the fuzzy RDD with local linear (p=1) and local quadratic (p=2) specifications. How sensitive are the results?

  3. Covariate adjustment. Add baseline covariates (e.g., high school GPA) to the rdrobust estimation. Covariates should not change the point estimate much in a valid RDD but may reduce the standard error.

  4. Donut RDD. Drop observations within 1 point of the cutoff (the "donut hole") to address potential manipulation concerns. Re-estimate and compare.


Summary

In this lab you learned:

  • Fuzzy RDD handles imperfect compliance by treating the cutoff indicator as an instrument for treatment
  • The fuzzy RDD estimand is the LATE for compliers at the cutoff, estimated as the reduced-form jump divided by the first-stage jump
  • rdrobust provides MSE-optimal bandwidths and bias-corrected confidence intervals and is among the most widely used tools for RDD estimation
  • Bandwidth sensitivity analysis is essential: stable estimates across bandwidths strengthen credibility
  • Validity tests include the McCrary density test (no manipulation), placebo cutoffs (no effect away from the true cutoff), and covariate balance at the cutoff
  • The first-stage F-statistic must be strong; a weak first stage inflates and destabilizes the fuzzy RDD estimate