Lab: Fuzzy Regression Discontinuity Design
Implement a fuzzy RDD step by step. Learn to visualize the running variable, diagnose imperfect compliance, estimate the local average treatment effect via 2SLS at the cutoff, use rdrobust for bias-corrected inference, and assess bandwidth sensitivity.
Overview
In this lab you will estimate the effect of a scholarship on college GPA using a fuzzy regression discontinuity design. Unlike a sharp RDD where treatment jumps from 0 to 1 at the cutoff, in a fuzzy RDD the probability of treatment jumps discontinuously but not from 0 to 1 — some eligible students decline the scholarship and some ineligible students receive it through appeals. You will learn to handle this imperfect compliance using instrumental variables at the cutoff.
What you will learn:
- The difference between sharp and fuzzy RDD
- How to visualize the running variable, the first stage, and the reduced form
- How to estimate the fuzzy RDD as 2SLS (IV) at the cutoff
- How to use rdrobust for bias-corrected robust inference
- How to assess sensitivity to bandwidth choice and polynomial order
Prerequisites: Familiarity with instrumental variables (2SLS) and basic RDD concepts (see the sharp RDD lab).
Step 1: Simulate Scholarship Eligibility Data
Students take an entrance exam (the running variable). Those scoring 70 or above are eligible for a scholarship, but compliance is imperfect: some eligible students decline, and some ineligible students appeal successfully.
library(estimatr)
library(rdrobust)
library(modelsummary)
set.seed(42)
n <- 2000
exam_score <- round(rnorm(n, 70, 12), 1)
above_cutoff <- as.integer(exam_score >= 70)
prob_scholarship <- 0.15 + 0.65 * above_cutoff
dist_from_cutoff <- abs(exam_score - 70)
prob_scholarship <- pmin(pmax(
prob_scholarship + 0.01 * dist_from_cutoff * above_cutoff -
0.005 * dist_from_cutoff * (1 - above_cutoff), 0.05), 0.95)
scholarship <- rbinom(n, 1, prob_scholarship)
ability <- 0.02 * (exam_score - 70) + rnorm(n, sd = 0.5)
gpa <- 2.5 + 0.03 * (exam_score - 70) - 0.0005 * (exam_score - 70)^2 +
0.50 * scholarship + ability + rnorm(n, sd = 0.4)
df <- data.frame(exam_score, above_cutoff, scholarship, gpa,
score_centered = exam_score - 70)
cat("P(scholarship | above):", mean(df$scholarship[df$above_cutoff == 1]), "\n")
cat("P(scholarship | below):", mean(df$scholarship[df$above_cutoff == 0]), "\n")

Expected output:
| | exam_score | above_cutoff | scholarship | gpa | score_centered |
|---|---|---|---|---|---|
| 0 | 73.9 | 1 | 1 | 3.41 | 3.9 |
| 1 | 68.3 | 0 | 0 | 2.18 | -1.7 |
| 2 | 77.8 | 1 | 1 | 3.85 | 7.8 |
| 3 | 62.1 | 0 | 0 | 2.05 | -7.9 |
| 4 | 70.4 | 1 | 0 | 2.60 | 0.4 |
Summary statistics:
| Statistic | Value |
|---|---|
| Sample size | 2,000 |
| Above cutoff | ~1,000 (approximately 50%) |
| P(scholarship given above cutoff) | ~0.80 |
| P(scholarship given below cutoff) | ~0.15 |
| First-stage jump | ~0.65 |
| Mean GPA | ~2.8 |
The first-stage jump of approximately 0.65 means that crossing the exam score cutoff of 70 increases the probability of receiving the scholarship by about 65 percentage points. Compliance is imperfect: some students above the cutoff decline the scholarship (~20%), and some below appeal successfully (~15%).
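To see the non-compliance directly, a quick cross-tab of eligibility against receipt helps. A minimal sketch, assuming the `df` from Step 1 is still in memory:

```r
# Sketch: share receiving the scholarship on each side of the cutoff.
# Under monotonicity, below-cutoff recipients are (approximately)
# always-takers and above-cutoff decliners are never-takers;
# compliers make up the remaining mass.
tab <- prop.table(table(above = df$above_cutoff,
                        scholarship = df$scholarship), margin = 1)
round(tab, 2)
```

Row `0` shows the below-cutoff receipt rate (~0.15) and row `1` the above-cutoff rate (~0.80); their difference is the first-stage jump.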
Step 2: Visualize the Discontinuity
Plot three things: (1) the running variable distribution, (2) the first stage (treatment probability), and (3) the reduced form (outcome).
par(mfrow = c(1, 3))
# (1) Running variable density
hist(df$exam_score, breaks = 50, main = "Running Variable",
xlab = "Exam Score", col = "lightblue")
abline(v = 70, col = "red", lwd = 2, lty = 2)
# (2) First stage
rdplot(df$scholarship, df$score_centered, c = 0,
title = "First Stage", x.label = "Score (centered)",
y.label = "P(Scholarship)")
# (3) Reduced form
rdplot(df$gpa, df$score_centered, c = 0,
title = "Reduced Form", x.label = "Score (centered)",
y.label = "GPA")

Step 3: Estimate the Fuzzy RDD with 2SLS
The fuzzy RDD is a local IV regression: the instrument is the above-cutoff indicator, the endogenous variable is the treatment (scholarship), and we restrict to observations near the cutoff.
# Bandwidth selection
bw <- 10
df_bw <- df[abs(df$score_centered) <= bw, ]
cat("Observations in bandwidth:", nrow(df_bw), "\n")
# First stage
fs <- lm(scholarship ~ above_cutoff + score_centered, data = df_bw)
# With a single instrument, the relevant first-stage F is the squared
# t-statistic on the excluded instrument, not the overall regression F
fs_t <- summary(fs)$coefficients["above_cutoff", "t value"]
cat("First stage F:", fs_t^2, "\n")
# Reduced form
rf <- lm(gpa ~ above_cutoff + score_centered, data = df_bw)
cat("Reduced form jump:", coef(rf)["above_cutoff"], "\n")
# Wald estimate
cat("Wald estimate:", coef(rf)["above_cutoff"] / coef(fs)["above_cutoff"], "\n")
# Formal 2SLS with iv_robust
iv_model <- iv_robust(gpa ~ score_centered + scholarship |
score_centered + above_cutoff,
data = df_bw, se_type = "HC2")
summary(iv_model)

Expected output: First stage
| Variable | Coeff | SE | t | p |
|---|---|---|---|---|
| Intercept | 0.153 | 0.015 | 10.2 | 0.000 |
| above_cutoff | 0.648 | 0.020 | 32.4 | 0.000 |
| score_centered | -0.003 | 0.001 | -2.1 | 0.036 |
| Detail | Value |
|---|---|
| First-stage F-statistic | ~1,050 (well above 10, strong instrument) |
| Observations in bandwidth | ~1,500 |
Expected output: Reduced form (ITT)
| Variable | Coeff | SE | t | p |
|---|---|---|---|---|
| Intercept | 2.49 | 0.04 | 62.3 | 0.000 |
| above_cutoff | 0.35 | 0.05 | 7.0 | 0.000 |
| score_centered | 0.05 | 0.003 | 16.7 | 0.000 |
Expected output: Fuzzy RD (2SLS)
| Variable | Coeff | SE | t | p |
|---|---|---|---|---|
| scholarship | 0.50 | 0.08 | 6.3 | 0.000 |
| score_centered | 0.05 | 0.003 | 16.7 | 0.000 |
| Detail | Value |
|---|---|
| Method | 2SLS, robust SEs |
| Wald (IV) estimate (RF / FS) | ~0.35 / 0.65 = ~0.54 |
| 2SLS estimate | ~0.50 |
| True LATE | 0.50 |
| Bandwidth | 10 exam score points |
The 2SLS estimate of approximately 0.50 is close to the true LATE. The Wald ratio (reduced form divided by first stage) gives a similar number. This estimate is the local average treatment effect for compliers — students whose scholarship receipt is determined by whether they score above or below 70.
In a fuzzy RDD, the 2SLS estimate is the ratio of the reduced-form jump to the first-stage jump. What does this estimand identify?
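To make the Wald logic concrete, an approximate standard error for the ratio can be computed by the delta method. A minimal sketch, assuming `fs` and `rf` from the chunk above are in memory; it ignores the covariance between the two jump estimates, so it is only a rough check against the 2SLS output:

```r
# Sketch: delta-method SE for the Wald ratio (reduced form / first stage).
# Approximation: Var(R/F) ~= (R/F)^2 * [(se_R/R)^2 + (se_F/F)^2],
# dropping the covariance term between the two jumps.
rf_b  <- coef(rf)["above_cutoff"]
rf_se <- summary(rf)$coefficients["above_cutoff", "Std. Error"]
fs_b  <- coef(fs)["above_cutoff"]
fs_se <- summary(fs)$coefficients["above_cutoff", "Std. Error"]
wald    <- rf_b / fs_b
wald_se <- abs(wald) * sqrt((rf_se / rf_b)^2 + (fs_se / fs_b)^2)
cat("Wald:", round(wald, 3), " approx. SE:", round(wald_se, 3), "\n")
```

The resulting SE should be in the same ballpark as the 2SLS standard error on `scholarship`, which is why the Wald and 2SLS columns in the tables above agree so closely.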
Step 4: Use rdrobust for Bias-Corrected Inference
The rdrobust package implements optimal bandwidth selection and bias-corrected confidence intervals.
# Fuzzy RDD with rdrobust
rd_result <- rdrobust(y = df$gpa, x = df$score_centered,
fuzzy = df$scholarship, c = 0)
summary(rd_result)
# Bandwidth selection
bw_result <- rdbwselect(y = df$gpa, x = df$score_centered,
fuzzy = df$scholarship, c = 0)
summary(bw_result)

Expected output: rdrobust fuzzy RDD
| Estimator | Coeff | SE | 95% CI |
|---|---|---|---|
| Conventional | ~0.48 | ~0.10 | [0.28, 0.68] |
| Bias-corrected | ~0.50 | ~0.10 | [0.30, 0.70] |
| Robust | ~0.50 | ~0.12 | [0.27, 0.73] |
| Detail | Value |
|---|---|
| Method | Local linear, triangular kernel |
| MSE-optimal bandwidth (h) | ~8–12 exam score points |
| Bias-correction bandwidth (b) | ~14–18 exam score points |
| N (effective, left + right) | ~1,200–1,600 |
The bias-corrected robust confidence interval covers the true LATE of 0.50. The MSE-optimal bandwidth is data-driven and balances bias against variance.
Step 5: Bandwidth Sensitivity Analysis
A credible RDD should produce similar estimates across a reasonable range of bandwidths.
# Bandwidth sensitivity
bandwidths <- c(5, 7, 10, 12, 15, 20)
results <- data.frame()
for (bw in bandwidths) {
rd <- rdrobust(df$gpa, df$score_centered,
fuzzy = df$scholarship, c = 0, h = bw)
results <- rbind(results, data.frame(
bw = bw,
    est = rd$coef[1],      # conventional point estimate
    se = rd$se[3],         # robust SE
    ci_lo = rd$ci[3, 1],   # robust CI, lower bound
    ci_hi = rd$ci[3, 2],   # robust CI, upper bound
    n_eff = sum(rd$N_h)    # effective observations, left + right
))
}
print(results)
# Plot
plot(results$bw, results$est, type = "b", pch = 19,
ylim = range(c(results$ci_lo, results$ci_hi)),
xlab = "Bandwidth", ylab = "Estimate",
main = "Bandwidth Sensitivity")
arrows(results$bw, results$ci_lo, results$bw, results$ci_hi,
angle = 90, code = 3, length = 0.05)
abline(h = 0.5, col = "red", lty = 2)

Expected output: Bandwidth sensitivity
| Bandwidth | N (effective) | Estimate | SE | 95% CI |
|---|---|---|---|---|
| 5 | ~600 | ~0.55 | ~0.18 | [0.20, 0.90] |
| 7 | ~900 | ~0.52 | ~0.14 | [0.25, 0.79] |
| 10 | ~1,300 | ~0.49 | ~0.11 | [0.28, 0.70] |
| 12 | ~1,500 | ~0.48 | ~0.10 | [0.29, 0.67] |
| 15 | ~1,700 | ~0.47 | ~0.09 | [0.30, 0.64] |
| 20 | ~1,900 | ~0.45 | ~0.08 | [0.30, 0.60] |
The true LATE is 0.50. Estimates are relatively stable across bandwidths, ranging from approximately 0.45 to 0.55. Narrower bandwidths produce noisier estimates (wider confidence intervals) but less bias; wider bandwidths are more precise but may introduce slight bias. All confidence intervals cover the true value.
You find that the fuzzy RDD estimate is 0.50 with bandwidth 10 but jumps to 1.20 with bandwidth 5. What might explain this large difference, and what should you do?
Step 6: Validity Tests
# McCrary density test
library(rddensity)
density_test <- rddensity(df$score_centered, c = 0)
summary(density_test)
# Placebo cutoffs
cat("\n=== Placebo Cutoffs ===\n")
for (pc in c(-15, -10, -5, 5, 10, 15)) {
tryCatch({
rd_p <- rdrobust(df$gpa, df$score_centered,
fuzzy = df$scholarship, c = pc)
cat("Cutoff at", pc, ": est =", round(rd_p$coef[1], 3),
", p =", round(rd_p$pv[1], 3), "\n")
}, error = function(e) {
cat("Cutoff at", pc, ": insufficient data\n")
})
}

Expected output: McCrary density test
| Test | Value |
|---|---|
| T-statistic | ~0.30 |
| p-value | > 0.05 (not significant) |
| Interpretation | No evidence of manipulation at the cutoff |
Since exam scores are drawn from a smooth normal distribution, the density is continuous through the cutoff. There is no manipulation in the simulated data.
Expected output: Placebo cutoffs
| Placebo cutoff | Estimate | p-value | Significant? |
|---|---|---|---|
| -15 | ~0.10 | ~0.60 | No |
| -10 | ~-0.15 | ~0.55 | No |
| -5 | ~0.08 | ~0.70 | No |
| +5 | ~-0.05 | ~0.80 | No |
| +10 | ~0.12 | ~0.50 | No |
| +15 | ~-0.20 | ~0.45 | No |
No significant effects appear at the placebo cutoffs (stated in centered units, i.e., 5 to 15 points away from the true cutoff of 70). This null result supports the identifying assumption: the discontinuity in the outcome is specific to the policy-relevant cutoff, not an artifact of the functional form.
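A third common validity check is covariate balance at the cutoff: predetermined covariates should not jump. A minimal base-R sketch, assuming `df` from Step 1 is in memory; `hs_gpa` is a hypothetical covariate simulated here to depend smoothly on the running variable but, by construction, not on the scholarship:

```r
# Sketch: covariate balance at the cutoff.
# hs_gpa is hypothetical (not part of the Step 1 simulation): it varies
# smoothly with the running variable and is unaffected by treatment,
# so its estimated jump at the cutoff should be near zero.
set.seed(7)
df$hs_gpa <- 2.8 + 0.02 * df$score_centered + rnorm(nrow(df), sd = 0.3)
bal <- lm(hs_gpa ~ above_cutoff + score_centered,
          data = df[abs(df$score_centered) <= 10, ])
summary(bal)$coefficients["above_cutoff", ]
```

A small, statistically insignificant jump is the expected result here; a large jump in a genuinely predetermined covariate would cast doubt on the design.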
Step 7: Exercises
Try these on your own:
- Sharp RDD comparison. Re-estimate the model as a sharp RDD (using above_cutoff as the treatment directly, ignoring non-compliance). Compare this reduced-form / ITT estimate with the fuzzy LATE. Verify that ITT = LATE times the compliance rate.
- Polynomial order. Estimate the fuzzy RDD with local linear (p = 1) and local quadratic (p = 2) specifications. How sensitive are the results?
- Covariate adjustment. Add baseline covariates (e.g., high school GPA) to the rdrobust estimation. Covariates should not change the point estimate much in a valid RDD but may reduce the standard error.
- Donut RDD. Drop observations within 1 point of the cutoff (the "donut hole") to address potential manipulation concerns. Re-estimate and compare.
Summary
In this lab you learned:
- Fuzzy RDD handles imperfect compliance by treating the cutoff indicator as an instrument for treatment
- The fuzzy RDD estimand is the LATE for compliers at the cutoff, estimated as the reduced-form jump divided by the first-stage jump
- rdrobust provides MSE-optimal bandwidths and bias-corrected confidence intervals and is among the most widely used tools for RDD estimation
- Bandwidth sensitivity analysis is essential: stable estimates across bandwidths strengthen credibility
- Validity tests include the McCrary density test (no manipulation), placebo cutoffs (no effect away from the true cutoff), and covariate balance at the cutoff
- The first-stage F-statistic must be strong; a weak first stage inflates and destabilizes the fuzzy RDD estimate