Lab: Regression Kink Design from Scratch
Implement a regression kink design step by step. Simulate a kinked treatment assignment, estimate kinks in the outcome, compute the RKD ratio estimator, assess bandwidth sensitivity, and run a density continuity test.
Overview
The Regression Kink Design (RKD) exploits a change in the slope (not the level) of a treatment assignment function at a known threshold. Unlike the regression discontinuity design (RDD), which requires a jump in the treatment at the cutoff, RKD identifies causal effects from a kink — a point where the derivative of the treatment with respect to the running variable changes.
What you will learn:
- How a kink in the treatment assignment function differs from a discontinuity
- How to simulate a kinked benefit schedule
- How to estimate the RKD ratio (change in outcome slope divided by change in treatment slope)
- How bandwidth choice affects precision and bias
- How to run a density continuity test (in the spirit of McCrary) to check for manipulation
Prerequisites: Regression discontinuity design (see the RDD tutorial lab), local polynomial regression.
Step 1: Simulate a Kinked Treatment Schedule
Unemployment insurance (UI) benefit formulas often create kinks: benefits are a fixed fraction of prior earnings up to a cap, after which the replacement rate drops. We simulate this structure.
library(estimatr)
library(rdrobust)
set.seed(2015)
n <- 5000
# Running variable: prior weekly earnings (centered at the kink)
# Kink is at earnings = 0 (centered)
x <- runif(n, -300, 300)
# Treatment: weekly UI benefit amount
# Below the kink (x < 0): replacement rate = 0.55
# Above the kink (x >= 0): replacement rate = 0.30
# This creates a kink in benefits as a function of earnings
benefit_base <- 300 # Benefit at the kink point
benefit <- ifelse(x < 0,
  benefit_base + 0.55 * x,  # Steeper slope below kink
  benefit_base + 0.30 * x   # Flatter slope above kink
)
# Add noise to treatment (fuzzy kink)
benefit <- benefit + rnorm(n, 0, 15)
# Outcome: unemployment duration (weeks)
# True causal effect of benefit on duration: 0.10 weeks per dollar
true_effect <- 0.10
duration <- 20 + true_effect * benefit + 0.02 * x + rnorm(n, 0, 4)
df <- data.frame(x, benefit, duration)
# Visualize the kink in treatment
cat("=== Treatment Schedule ===\n")
cat("Slope below kink (replacement rate):", 0.55, "\n")
cat("Slope above kink (replacement rate):", 0.30, "\n")
cat("Change in slope at kink:", 0.30 - 0.55, "\n")
Expected output:
| Statistic | Value |
|---|---|
| N | 5,000 |
| Slope below kink | 0.55 |
| Slope above kink | 0.30 |
| Change in slope (first-stage kink) | -0.25 |
| Mean benefit | ~281 |
| Mean duration | ~48 weeks |
The treatment (UI benefit) has a kink at x = 0: the replacement rate drops from 0.55 to 0.30. Unlike an RDD, there is no jump in benefits at the threshold — the benefit function is continuous but changes slope.
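The "continuous but kinked" property can be checked numerically on the noise-free schedule. The following is a minimal sketch; the function name `benefit_schedule` and its argument names are illustrative and do not appear in the lab code.

```r
# Deterministic benefit schedule from Step 1 (no noise).
# The function and argument names are illustrative assumptions.
benefit_schedule <- function(x, base = 300, rate_below = 0.55, rate_above = 0.30) {
  ifelse(x < 0, base + rate_below * x, base + rate_above * x)
}

# Level just left and just right of the kink: the gap shrinks with eps.
eps <- 1e-6
jump <- benefit_schedule(eps) - benefit_schedule(-eps)

# One-sided slopes over a one-dollar step on each side of the kink.
slope_left  <- benefit_schedule(0) - benefit_schedule(-1)
slope_right <- benefit_schedule(1) - benefit_schedule(0)
kink <- slope_right - slope_left

cat("Jump in level at kink:", signif(jump, 3), "\n")
cat("Slope left:", slope_left, " Slope right:", slope_right, "\n")
cat("Change in slope:", kink, "\n")
```

The level gap vanishes as `eps` goes to zero while the slope difference stays at the change in replacement rates, which is the feature the RKD relies on.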
Step 2: Estimate the Kink in the Outcome
If the treatment causally affects the outcome, the kink in the treatment schedule should induce a corresponding kink in the outcome. We estimate separate linear regressions on each side of the kink.
# Estimate slopes of benefit on each side of the kink
df$below <- as.integer(df$x < 0)
df$x_below <- df$x * df$below
df$x_above <- df$x * (1 - df$below)
# First stage: kink in treatment
fs <- lm(benefit ~ x_below + x_above, data = df)
slope_below_t <- coef(fs)["x_below"]
slope_above_t <- coef(fs)["x_above"]
kink_t <- slope_above_t - slope_below_t
cat("=== First-Stage Kink (Treatment) ===\n")
cat("Slope below:", round(slope_below_t, 4), "\n")
cat("Slope above:", round(slope_above_t, 4), "\n")
cat("Change in slope:", round(kink_t, 4), "\n\n")
# Reduced form: kink in outcome
rf <- lm(duration ~ x_below + x_above, data = df)
slope_below_y <- coef(rf)["x_below"]
slope_above_y <- coef(rf)["x_above"]
kink_y <- slope_above_y - slope_below_y
cat("=== Reduced-Form Kink (Outcome) ===\n")
cat("Slope below:", round(slope_below_y, 4), "\n")
cat("Slope above:", round(slope_above_y, 4), "\n")
cat("Change in slope:", round(kink_y, 4), "\n")
Expected output:
| Component | Slope Below | Slope Above | Change in Slope |
|---|---|---|---|
| Treatment (benefit) | ~0.55 | ~0.30 | ~-0.25 |
| Outcome (duration) | ~0.075 | ~0.050 | ~-0.025 |
The treatment has a clear kink: the slope changes by approximately -0.25 at the threshold. The outcome also exhibits a kink — this kink is the reduced-form evidence that the kink in benefits affects unemployment duration.
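The same kinks can be read off a single coefficient by regressing on `x` and `pmax(x, 0)`: the coefficient on the second term equals the slope above minus the slope below, and `lm` reports a standard error for it directly. The sketch below re-creates the Step 1 data so that it runs on its own; the column name `x_pos` is an illustrative choice, and the standard errors are the default homoskedastic ones.

```r
# Re-create the Step 1 data so this block runs on its own.
set.seed(2015)
n <- 5000
x <- runif(n, -300, 300)
benefit <- ifelse(x < 0, 300 + 0.55 * x, 300 + 0.30 * x) + rnorm(n, 0, 15)
duration <- 20 + 0.10 * benefit + 0.02 * x + rnorm(n, 0, 4)
df <- data.frame(x, benefit, duration)

# pmax(x, 0) is zero below the kink and equals x above it, so its
# coefficient is the change in slope (slope above minus slope below).
df$x_pos <- pmax(df$x, 0)  # illustrative column name

fs_kink <- lm(benefit ~ x + x_pos, data = df)
rf_kink <- lm(duration ~ x + x_pos, data = df)

# Coefficient table rows: estimate, standard error, t value, p value.
kink_t_row <- summary(fs_kink)$coefficients["x_pos", ]
kink_y_row <- summary(rf_kink)$coefficients["x_pos", ]
print(round(rbind(treatment = kink_t_row, outcome = kink_y_row), 4))
```

This parameterization spans the same regressors as the `x_below` / `x_above` version, so the point estimates of the slope change are identical; it only saves the subtraction and gives a t-statistic for the kink.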
Step 3: Compute the RKD Ratio Estimator
The RKD estimator is the ratio of the change in the outcome slope to the change in the treatment slope at the kink point, analogous to the Wald estimator in IV:
RKD = (change in outcome slope) / (change in treatment slope)
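Written with one-sided limits, the ratio looks as follows. The symbols are chosen here for illustration: Y is duration, B is the benefit, and X is centered earnings. The numerical values are the population slopes implied by the Step 1 data-generating process, where the outcome slope on each side is 0.10 times the replacement rate plus the direct earnings slope of 0.02.

```latex
\[
\tau_{\mathrm{RKD}}
= \frac{\lim_{x \downarrow 0} \frac{d}{dx} E[Y \mid X = x]
      - \lim_{x \uparrow 0} \frac{d}{dx} E[Y \mid X = x]}
       {\lim_{x \downarrow 0} \frac{d}{dx} E[B \mid X = x]
      - \lim_{x \uparrow 0} \frac{d}{dx} E[B \mid X = x]}
= \frac{0.050 - 0.075}{0.30 - 0.55}
= \frac{-0.025}{-0.25}
= 0.10
\]
```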
# RKD ratio estimator
rkd_estimate <- kink_y / kink_t
cat("=== RKD Estimate ===\n")
cat("Reduced-form kink (outcome):", round(kink_y, 5), "\n")
cat("First-stage kink (treatment):", round(kink_t, 5), "\n")
cat("RKD ratio (kink_y / kink_t):", round(rkd_estimate, 4), "\n")
cat("True causal effect:", true_effect, "\n")
cat("Bias:", round(rkd_estimate - true_effect, 4), "\n\n")
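# Using rdrobust for local-polynomial RKD estimation
# deriv = 1 tells rdrobust to estimate the change in the first derivative (the kink)
# Each call selects its own bandwidth, so the ratio below is a rough cross-check;
# rdrobust's fuzzy argument combined with deriv = 1 gives a one-step fuzzy kink estimate
rd_first <- rdrobust(df$benefit, df$x, deriv = 1)
rd_reduced <- rdrobust(df$duration, df$x, deriv = 1)
rkd_robust <- rd_reduced$coef[1] / rd_first$coef[1]
cat("rdrobust RKD estimate:", round(rkd_robust, 4), "\n")
Expected output: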
| Estimator | Estimate | True Effect | Bias |
|---|---|---|---|
| Manual RKD ratio | ~0.10 | 0.10 | ~0.00 |
| rdrobust RKD | ~0.10 | 0.10 | ~0.00 |
The RKD ratio recovers the true causal effect of 0.10 — each additional dollar of weekly UI benefits extends unemployment duration by approximately 0.10 weeks.
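The analogy with IV can be made concrete: with the kink term `pmax(x, 0)` as the single instrument for the benefit and `x` as a control, two-stage least squares returns the same number as the ratio of kinks. The sketch below uses only base R and re-creates the Step 1 data; the variable name `z` is an illustrative choice.

```r
# Re-create the Step 1 data so this block runs on its own.
set.seed(2015)
n <- 5000
x <- runif(n, -300, 300)
benefit <- ifelse(x < 0, 300 + 0.55 * x, 300 + 0.30 * x) + rnorm(n, 0, 15)
duration <- 20 + 0.10 * benefit + 0.02 * x + rnorm(n, 0, 4)
z <- pmax(x, 0)  # instrument: the kink term (illustrative name)

# Ratio form: reduced-form kink over first-stage kink.
fs <- lm(benefit ~ x + z)
rf <- lm(duration ~ x + z)
ratio <- unname(coef(rf)["z"] / coef(fs)["z"])

# Two-stage form: replace benefit by its first-stage fitted values.
benefit_hat <- fitted(fs)
ss <- lm(duration ~ benefit_hat + x)
tsls <- unname(coef(ss)["benefit_hat"])

cat("Ratio estimate:", round(ratio, 4), "\n")
cat("2SLS estimate: ", round(tsls, 4), "\n")
# The standard errors reported by summary(ss) are not valid 2SLS standard
# errors, because they ignore that benefit_hat was itself estimated.
```

The two numbers agree because the model is just-identified: one instrument, one endogenous regressor. For inference, a dedicated IV routine or a bootstrap is the safer route.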
How does the RKD estimator differ from the standard RDD (Wald) estimator?
Step 4: Bandwidth Sensitivity
The choice of bandwidth determines which observations contribute to the estimate. A narrow bandwidth reduces bias but increases variance; a wide bandwidth does the reverse.
# Bandwidth sensitivity analysis
bandwidths <- c(50, 75, 100, 150, 200, 250)
cat("=== Bandwidth Sensitivity ===\n")
cat(sprintf("%-10s %-12s %-12s %-12s\n",
            "Bandwidth", "RKD Est.", "N (left)", "N (right)"))
for (h in bandwidths) {
  sub <- df[abs(df$x) <= h, ]
  sub$x_below <- sub$x * (sub$x < 0)
  sub$x_above <- sub$x * (sub$x >= 0)
  fs_h <- lm(benefit ~ x_below + x_above, data = sub)
  rf_h <- lm(duration ~ x_below + x_above, data = sub)
  kink_t_h <- coef(fs_h)["x_above"] - coef(fs_h)["x_below"]
  kink_y_h <- coef(rf_h)["x_above"] - coef(rf_h)["x_below"]
  rkd_h <- kink_y_h / kink_t_h
  n_left <- sum(sub$x < 0)
  n_right <- sum(sub$x >= 0)
  cat(sprintf("%-10d %-12.4f %-12d %-12d\n",
              h, rkd_h, n_left, n_right))
}
# Optimal bandwidth from rdrobust
rd_opt <- rdrobust(df$duration, df$x, deriv = 1)
cat("\nrdrobust optimal bandwidth:", round(rd_opt$bws[1], 1), "\n")
Expected output:
| Bandwidth | RKD Estimate | N (left) | N (right) |
|---|---|---|---|
| 50 | ~0.09–0.12 | ~420 | ~420 |
| 75 | ~0.09–0.11 | ~630 | ~630 |
| 100 | ~0.09–0.11 | ~830 | ~830 |
| 150 | ~0.09–0.11 | ~1250 | ~1250 |
| 200 | ~0.09–0.11 | ~1670 | ~1670 |
| 250 | ~0.09–0.11 | ~2080 | ~2080 |
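The RKD estimate should hover around the true effect of 0.10 across bandwidths, with more scatter at the narrow end: only about 830 observations fall within h = 50, and a change in slope is hard to pin down over a short range of x, so the narrowest rows can land outside the ranges shown. In general, small bandwidths give noisier estimates with less bias, while large bandwidths are more precise but can be biased if the relationship is nonlinear away from the kink. In this simulation the data-generating process is exactly piecewise linear, so widening the bandwidth adds precision without adding bias.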
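To put a number on the precision side of this trade-off, one option is a pairs bootstrap of the ratio at a few bandwidths. The sketch below is an illustration, not part of the original lab: the helper names `rkd_ratio` and `boot_se`, the 200 replications, and the resampling seed are all arbitrary choices, and it uses the same uniform-kernel subsetting as the loop above.

```r
# Re-create the Step 1 data so this block runs on its own.
set.seed(2015)
n <- 5000
x <- runif(n, -300, 300)
benefit <- ifelse(x < 0, 300 + 0.55 * x, 300 + 0.30 * x) + rnorm(n, 0, 15)
duration <- 20 + 0.10 * benefit + 0.02 * x + rnorm(n, 0, 4)
df <- data.frame(x, benefit, duration)

# RKD ratio within bandwidth h (uniform kernel: keep |x| <= h).
rkd_ratio <- function(data, h) {
  sub <- data[abs(data$x) <= h, ]
  sub$z <- pmax(sub$x, 0)
  fs <- lm(benefit ~ x + z, data = sub)
  rf <- lm(duration ~ x + z, data = sub)
  unname(coef(rf)["z"] / coef(fs)["z"])
}

# Pairs bootstrap: resample rows, re-estimate, take the spread of the draws.
boot_se <- function(data, h, reps = 200) {
  draws <- replicate(reps, {
    idx <- sample(nrow(data), replace = TRUE)
    rkd_ratio(data[idx, ], h)
  })
  sd(draws)
}

set.seed(1)  # separate, arbitrary seed for the resampling
bandwidths <- c(50, 150, 250)
results <- data.frame(
  h        = bandwidths,
  estimate = sapply(bandwidths, function(h) rkd_ratio(df, h)),
  boot_se  = sapply(bandwidths, function(h) boot_se(df, h))
)
print(round(results, 4))
```

At narrow bandwidths the first-stage kink in the denominator is itself imprecise, so a few bootstrap draws can be extreme and inflate the standard deviation; that is a feature of ratio estimators with a noisy denominator rather than a bug in the resampling.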
Why is the first-stage kink strength important in the RKD design?
Step 5: Density Continuity Test
A key threat to RKD validity is manipulation of the running variable at the kink point. If individuals can precisely sort to one side of the kink, the density of the running variable will exhibit a discontinuity.
# Density continuity test using rddensity (local polynomial density estimator; a successor to the McCrary test)
library(rddensity)
density_test <- rddensity(df$x, c = 0)
cat("=== Density Continuity Test ===\n")
cat("Test statistic:", round(density_test$test$t_jk, 3), "\n")
cat("p-value:", round(density_test$test$p_jk, 4), "\n")
cat("Interpretation:", ifelse(density_test$test$p_jk > 0.05,
  "No evidence of manipulation (p > 0.05)",
  "Evidence of manipulation (p <= 0.05)"), "\n")
# Also check visually with a histogram
hist(df$x, breaks = 60, main = "Distribution of Running Variable",
     xlab = "Prior Earnings (centered)", col = "lightblue",
     border = "white")
abline(v = 0, col = "red", lwd = 2, lty = 2)
Expected output:
| Diagnostic | Value |
|---|---|
| Test statistic (rddensity), absolute value | ~0.1–1.5 |
| p-value | > 0.05 |
| N just below kink | ~160–180 |
| N just above kink | ~160–180 |
| Interpretation | No evidence of manipulation |
With simulated data drawn from a uniform distribution, the density is continuous at the kink. In real applications, bunching at kinks (e.g., taxpayers clustering at tax bracket boundaries) would indicate strategic manipulation and threaten the RKD identifying assumptions.
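A cruder check that needs no extra package is to count observations in a narrow window on each side of the kink and compare the counts with a binomial test. This is a sketch meant to complement the formal density test, not to replace it. The window half-width `w = 20` is an assumption chosen for illustration, and the 50/50 null is only reasonable when the window is narrow enough for the density to be roughly flat across it.

```r
# Re-create the running variable from Step 1 so this block runs on its own.
set.seed(2015)
n <- 5000
x <- runif(n, -300, 300)

w <- 20  # window half-width in dollars (an assumption, not a tuned value)
n_below <- sum(x >= -w & x < 0)
n_above <- sum(x >= 0 & x < w)

# Under no sorting and a locally flat density, an observation in the
# window is equally likely to fall on either side of the kink.
bt <- binom.test(n_below, n_below + n_above, p = 0.5)

cat("N just below kink:", n_below, "\n")
cat("N just above kink:", n_above, "\n")
cat("Binomial test p-value:", round(bt$p.value, 4), "\n")
```

A lopsided split with a small p-value would point to sorting around the threshold and would be a reason to look more closely at the histogram and the formal test.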
Step 6: Guided Exercise
Computing and Interpreting the RKD Estimate
You are studying the effect of unemployment insurance benefits on job search duration. The Austrian UI system pays benefits equal to 55% of prior earnings up to a cap, then 30% above the cap. You estimate the following at the kink point (prior earnings = cap):
Your output:
| Component | Slope Below | Slope Above | Change in Slope |
|---|---|---|---|
| Treatment (weekly benefit) | 0.548 | 0.298 | -0.250 |
| Outcome (search duration) | 0.073 | 0.048 | -0.025 |
Bandwidth sensitivity:
| Bandwidth (h) | RKD Estimate | SE |
|---|---|---|
| 50 | 0.103 | 0.042 |
| 100 | 0.098 | 0.028 |
| 150 | 0.101 | 0.023 |
| 200 | 0.095 | 0.020 |
Density test at kink: p = 0.41
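One way to work through this output is to recompute the ratio from the slope changes and attach normal-approximation confidence intervals to each bandwidth. The sketch below only rearranges the numbers given in the exercise; the 1.96 multiplier assumes approximate normality of the estimator.

```r
# Numbers copied from the exercise output above.
kink_t <- -0.250  # change in treatment slope
kink_y <- -0.025  # change in outcome slope
rkd_point <- kink_y / kink_t

sens <- data.frame(
  h   = c(50, 100, 150, 200),
  rkd = c(0.103, 0.098, 0.101, 0.095),
  se  = c(0.042, 0.028, 0.023, 0.020)
)

# Normal-approximation 95% intervals: estimate +/- 1.96 * SE.
sens$ci_low  <- sens$rkd - 1.96 * sens$se
sens$ci_high <- sens$rkd + 1.96 * sens$se
sens$t_stat  <- sens$rkd / sens$se

cat("RKD from slope changes:", rkd_point, "\n")
print(round(sens, 3))
```

The ratio of slope changes is 0.10, each interval excludes zero, and the intervals overlap heavily across bandwidths, with standard errors shrinking as h grows. Together with a density test p-value of 0.41, which does not reject continuity, the output is consistent with a stable positive effect of benefits on search duration.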
Step 7: Exercises
- Nonlinear DGP. Add a quadratic term to the outcome equation (e.g., 0.0001 * x^2). How does this affect the RKD estimate at different bandwidths? When does bias become apparent?
- Fuzzy RKD. Add measurement error to the treatment (increase the noise standard deviation from 15 to 50). How does fuzziness affect the estimate and its precision?
- Placebo kinks. Test for kinks at false thresholds (e.g., x = -100 or x = +100, where no policy kink exists). The estimated changes in slope for both the treatment and the outcome should be near zero at placebo thresholds; the RKD ratio itself is unstable there because its denominator is close to zero.
- Covariates. Add covariates (age, education) to the simulation and verify they show no kink at the threshold (balance test).
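As a possible starting point for the placebo exercise, the helper below estimates the change in slope at an arbitrary cutoff. It is a sketch under stated assumptions: the name `kink_at`, the bandwidth of 100, and the choice to report slope changes rather than the ratio are all illustrative. With h = 100, the placebo windows at -100 and +100 stop at the true kink and never cross it.

```r
# Re-create the Step 1 data so this block runs on its own.
set.seed(2015)
n <- 5000
x <- runif(n, -300, 300)
benefit <- ifelse(x < 0, 300 + 0.55 * x, 300 + 0.30 * x) + rnorm(n, 0, 15)
duration <- 20 + 0.10 * benefit + 0.02 * x + rnorm(n, 0, 4)

# Change in slope of y at cutoff c0, using observations within h of c0.
kink_at <- function(y, x, c0, h) {
  keep <- abs(x - c0) <= h
  yk <- y[keep]
  xc <- x[keep] - c0           # re-center the running variable at c0
  fit <- lm(yk ~ xc + pmax(xc, 0))
  unname(coef(fit)[3])         # third coefficient is the kink term
}

cutoffs <- c(0, -100, 100)     # true kink first, then two placebos
res <- data.frame(
  cutoff        = cutoffs,
  kink_benefit  = sapply(cutoffs, function(c0) kink_at(benefit, x, c0, 100)),
  kink_duration = sapply(cutoffs, function(c0) kink_at(duration, x, c0, 100))
)
print(round(res, 4))
```

Only the first row should show slope changes clearly different from zero; the placebo rows should be small relative to their sampling noise.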
Summary
In this lab you learned:
- The RKD exploits a change in the slope (not level) of the treatment assignment at a kink point
- The RKD estimator is the ratio of the change in outcome slope to the change in treatment slope — analogous to the Wald estimator in IV
- A strong first-stage kink (large change in treatment slope) is important for precision, just as a strong first stage is important for IV
- Bandwidth sensitivity analysis is important: the estimate should be stable across reasonable bandwidths
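- The density continuity test checks for manipulation of the running variable at the kink; if it reveals bunching, the RKD identifying assumptions are in doubt and the ratio should not be read as a causal effect without further analysis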
- RKD is particularly useful when policy formulas create kinks (tax brackets, benefit schedules, subsidy caps) but no jumps