Tutorial · 90 minutes

Lab: Regression Kink Design from Scratch

Implement a regression kink design: simulate a kinked assignment, estimate outcome kinks, compute the RKD ratio, assess bandwidth, and run density tests.

Method: Regression Kink Design (RKD)
Languages: Python, R, Stata
Dataset: Unemployment benefits schedule (simulated)

Overview

The Regression Kink Design (RKD) exploits a change in the slope (not the level) of a treatment assignment function at a known threshold. Unlike the regression discontinuity design (RDD), which requires a jump in the treatment at the cutoff, RKD identifies causal effects from a kink — a point where the derivative of the treatment with respect to the running variable changes.
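To make the distinction concrete, the minimal Python sketch below compares a kinked schedule with a schedule that jumps. The function names and the 50-dollar jump are hypothetical and chosen only for illustration; the 0.55 and 0.30 slopes match the schedule simulated in Step 1.

```python
def kinked(x, base=300.0):
    """Continuous schedule whose slope drops from 0.55 to 0.30 at x = 0 (RKD setting)."""
    return base + (0.55 if x < 0 else 0.30) * x


def jumped(x, base=300.0, jump=50.0):
    """Same slope on both sides, but the level jumps at x = 0 (RDD setting).
    The 50-dollar jump is an illustrative assumption."""
    return base + 0.55 * x + (jump if x >= 0 else 0.0)


def level_gap(f, eps=1e-6):
    """Right limit minus left limit of f at 0, approximated numerically."""
    return f(eps) - f(-eps)


def slope_gap(f, eps=1e-6, step=1.0):
    """Right-hand slope minus left-hand slope of f at 0, approximated numerically."""
    right = (f(eps + step) - f(eps)) / step
    left = (f(-eps) - f(-eps - step)) / step
    return right - left


for name, f in [("kinked", kinked), ("jumped", jumped)]:
    print(f"{name}: level gap = {level_gap(f):.3f}, slope gap = {slope_gap(f):.3f}")
```

The kinked schedule has essentially no level gap but a slope gap of about -0.25, while the jumping schedule shows the opposite pattern. RKD uses the first kind of variation, RDD the second.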

What you will learn:

How a kink in the treatment assignment function differs from a discontinuity
How to simulate a kinked benefit schedule
How to estimate the RKD ratio (change in outcome slope divided by change in treatment slope)
How bandwidth choice affects precision and bias
How to run a density continuity test (in the spirit of McCrary) to check for manipulation

Prerequisites: Regression discontinuity design (see the RDD tutorial lab), local polynomial regression.

Step 1: Simulate a Kinked Treatment Schedule

Unemployment insurance (UI) benefit formulas often create kinks: benefits are a fixed fraction of prior earnings up to a cap, after which the replacement rate drops. We simulate this structure.

# First-time setup: install.packages(c("estimatr", "rdrobust"))
library(estimatr)
library(rdrobust)

set.seed(2015)
n <- 5000

# Running variable: prior weekly earnings (centered at the kink)
# Kink is at earnings = 0 (centered)
x <- runif(n, -300, 300)

# Treatment: weekly UI benefit amount
# Below the kink (x < 0): replacement rate = 0.55
# Above the kink (x >= 0): replacement rate = 0.30
# This creates a kink in benefits as a function of earnings
benefit_base <- 300  # Benefit at the kink point
benefit <- ifelse(x < 0,
  benefit_base + 0.55 * x,   # Steeper slope below kink
  benefit_base + 0.30 * x    # Flatter slope above kink
)

# Add noise to treatment (fuzzy kink)
benefit <- benefit + rnorm(n, 0, 15)

# Outcome: unemployment duration (weeks)
# True causal effect of benefit on duration: 0.10 weeks per dollar
true_effect <- 0.10
duration <- 20 + true_effect * benefit + 0.02 * x + rnorm(n, 0, 4)

df <- data.frame(x, benefit, duration)

# Summarize the kink in treatment and the simulated sample
cat("=== Treatment Schedule ===\n")
cat("N:", n, "\n")
cat("Slope below kink (replacement rate):", 0.55, "\n")
cat("Slope above kink (replacement rate):", 0.30, "\n")
cat("Change in slope at kink:", 0.30 - 0.55, "\n")
cat("Mean benefit:", round(mean(df$benefit), 1), "\n")
cat("Mean duration (weeks):", round(mean(df$duration), 1), "\n")

Requires: estimatr, rdrobust

Expected output:

Statistic	Value
N	5,000
Slope below kink	0.55
Slope above kink	0.30
Change in slope (first-stage kink)	-0.25
Mean benefit	~281
Mean duration	~48 weeks

The treatment (UI benefit) has a kink at x = 0: the replacement rate drops from 0.55 to 0.30. Unlike an RDD, there is no jump in benefits at the threshold — the benefit function is continuous but changes slope.
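For readers following along in Python, the sketch below mirrors the same data-generating process using only the standard library. It is a rough equivalent rather than a port: Python's random number generator differs from R's, so the draws (and therefore the printed means) will not match the R output exactly even with the same seed value.

```python
import random

random.seed(2015)  # same seed value as the R code, but a different generator
n = 5000
benefit_base = 300.0   # benefit at the kink point
true_effect = 0.10     # weeks of duration per dollar of benefit

rows = []
for _ in range(n):
    x = random.uniform(-300, 300)        # centered prior weekly earnings
    slope = 0.55 if x < 0 else 0.30      # replacement rate on each side of the kink
    benefit = benefit_base + slope * x + random.gauss(0, 15)   # fuzzy kink
    duration = 20 + true_effect * benefit + 0.02 * x + random.gauss(0, 4)
    rows.append((x, benefit, duration))

mean_benefit = sum(r[1] for r in rows) / n
mean_duration = sum(r[2] for r in rows) / n
print(f"N = {n}, mean benefit = {mean_benefit:.1f}, mean duration = {mean_duration:.1f}")
```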

Step 2: Estimate the Kink in the Outcome

If the treatment causally affects the outcome, the kink in the treatment schedule should induce a corresponding kink in the outcome. We estimate separate linear regressions on each side of the kink.

# Estimate slopes of benefit on each side of the kink
df$below <- as.integer(df$x < 0)
df$x_below <- df$x * df$below
df$x_above <- df$x * (1 - df$below)

# First stage: kink in treatment
fs <- lm(benefit ~ x_below + x_above, data = df)
slope_below_t <- coef(fs)["x_below"]
slope_above_t <- coef(fs)["x_above"]
kink_t <- slope_above_t - slope_below_t

cat("=== First-Stage Kink (Treatment) ===\n")
cat("Slope below:", round(slope_below_t, 4), "\n")
cat("Slope above:", round(slope_above_t, 4), "\n")
cat("Change in slope:", round(kink_t, 4), "\n\n")

# Reduced form: kink in outcome
rf <- lm(duration ~ x_below + x_above, data = df)
slope_below_y <- coef(rf)["x_below"]
slope_above_y <- coef(rf)["x_above"]
kink_y <- slope_above_y - slope_below_y

cat("=== Reduced-Form Kink (Outcome) ===\n")
cat("Slope below:", round(slope_below_y, 4), "\n")
cat("Slope above:", round(slope_above_y, 4), "\n")
cat("Change in slope:", round(kink_y, 4), "\n")

Expected output:

Component	Slope Below	Slope Above	Change in Slope
Treatment (benefit)	~0.55	~0.30	~-0.25
Outcome (duration)	~0.075	~0.050	~-0.025

The treatment has a clear kink: the slope changes by approximately -0.25 at the threshold. The outcome also exhibits a kink — this kink is the reduced-form evidence that the kink in benefits affects unemployment duration.
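The same two regressions can be sketched in Python without external packages. The `ols` helper below is a hypothetical utility written for this example (it solves the normal equations directly); in practice a statistics library would be used instead. The specification matches the R code: a shared intercept plus separate slope terms below and above the kink.

```python
import random


def ols(columns, y):
    """Least-squares coefficients for y on an intercept plus the given columns,
    obtained by Gauss-Jordan elimination on the normal equations."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(X[0])
    # Augmented system [X'X | X'y]
    A = [[sum(r[a] * r[b] for r in X) for b in range(k)]
         + [sum(r[a] * yi for r, yi in zip(X, y))] for a in range(k)]
    for i in range(k):
        pivot = max(range(i, k), key=lambda r: abs(A[r][i]))
        A[i], A[pivot] = A[pivot], A[i]
        for r in range(k):
            if r != i:
                factor = A[r][i] / A[i][i]
                A[r] = [u - factor * v for u, v in zip(A[r], A[i])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Simulate data with the same structure as Step 1
random.seed(1)
x = [random.uniform(-300, 300) for _ in range(5000)]
benefit = [300 + (0.55 if xi < 0 else 0.30) * xi + random.gauss(0, 15) for xi in x]
duration = [20 + 0.10 * b + 0.02 * xi + random.gauss(0, 4) for b, xi in zip(benefit, x)]

# Separate slope terms on each side of the kink
x_below = [xi if xi < 0 else 0.0 for xi in x]
x_above = [xi if xi >= 0 else 0.0 for xi in x]

_, fs_below, fs_above = ols([x_below, x_above], benefit)   # first stage
_, rf_below, rf_above = ols([x_below, x_above], duration)  # reduced form
kink_t = fs_above - fs_below
kink_y = rf_above - rf_below

print(f"First-stage kink:  {kink_t:.4f}")
print(f"Reduced-form kink: {kink_y:.4f}")
```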

Step 3: Compute the RKD Ratio Estimator

The RKD estimator is the ratio of the change in the outcome slope to the change in the treatment slope at the kink point, analogous to the Wald estimator in IV:

RKD = (change in outcome slope) / (change in treatment slope)
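Written out, the ratio can be stated as the estimand below. The notation is an assumption introduced here for illustration: X is centered prior earnings, B is the benefit, Y is unemployment duration, and the limits are one-sided derivatives at the kink. The second expression plugs in the slopes implied by the Step 1 simulation (outcome slope = 0.10 times the benefit slope, plus the direct earnings slope of 0.02).

```latex
\tau_{\mathrm{RKD}}
  = \frac{\lim_{x \downarrow 0} \frac{d}{dx} E[Y \mid X = x]
          - \lim_{x \uparrow 0} \frac{d}{dx} E[Y \mid X = x]}
         {\lim_{x \downarrow 0} \frac{d}{dx} E[B \mid X = x]
          - \lim_{x \uparrow 0} \frac{d}{dx} E[B \mid X = x]}

\tau_{\mathrm{RKD}}
  = \frac{(0.10 \cdot 0.30 + 0.02) - (0.10 \cdot 0.55 + 0.02)}{0.30 - 0.55}
  = \frac{0.050 - 0.075}{-0.25}
  = \frac{-0.025}{-0.25}
  = 0.10
```

The direct earnings slope (0.02) is the same on both sides of the kink, so it cancels in the numerator. This is why a smooth direct effect of the running variable does not contaminate the estimate.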

# RKD ratio estimator
rkd_estimate <- kink_y / kink_t
cat("=== RKD Estimate ===\n")
cat("Reduced-form kink (outcome):", round(kink_y, 5), "\n")
cat("First-stage kink (treatment):", round(kink_t, 5), "\n")
cat("RKD ratio (kink_y / kink_t):", round(rkd_estimate, 4), "\n")
cat("True causal effect:", true_effect, "\n")
cat("Bias:", round(rkd_estimate - true_effect, 4), "\n\n")

# Using rdrobust for formal RKD estimation
# deriv = 1 tells rdrobust to estimate the kink (first derivative)
# Note: each call selects its own bandwidth, so the two kinks come from different windows.
# Alternative: rdrobust(df$duration, df$x, fuzzy = df$benefit, deriv = 1)
# estimates the fuzzy kink ratio in a single call.
rd_first <- rdrobust(df$benefit, df$x, deriv = 1)
rd_reduced <- rdrobust(df$duration, df$x, deriv = 1)

# coef[1] is the conventional (not bias-corrected) point estimate
rkd_robust <- rd_reduced$coef[1] / rd_first$coef[1]
cat("rdrobust RKD estimate:", round(rkd_robust, 4), "\n")

Requires: rdrobust

Expected output:

Estimator	Estimate	True Effect	Bias
Manual RKD ratio	~0.10	0.10	~0.00
rdrobust RKD	~0.10	0.10	~0.00

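The manual RKD ratio recovers the true causal effect of 0.10: each additional dollar of weekly UI benefits extends unemployment duration by approximately 0.10 weeks. The rdrobust-based ratio is built from local polynomial fits within data-driven bandwidths that the two calls select independently, so in a single draw it is typically noisier than the full-sample manual ratio and may land some distance from 0.10.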

Concept Check

How does the RKD estimator differ from the standard RDD (Wald) estimator?

RKD uses a different bandwidth selector, but otherwise the estimator is the same as RDD.
RKD estimates the ratio of changes in slopes (first derivatives) at the kink, while RDD estimates the ratio of changes in levels (jumps) at the cutoff.
RKD is valid only with a sharp kink, while RDD works with both sharp and fuzzy designs.
RKD requires a continuous running variable, while RDD works with discrete running variables.

Step 4: Bandwidth Sensitivity

The choice of bandwidth determines which observations contribute to the estimate. A narrow bandwidth reduces bias but increases variance; a wide bandwidth does the reverse.

# Bandwidth sensitivity analysis
bandwidths <- c(50, 75, 100, 150, 200, 250)

cat("=== Bandwidth Sensitivity ===\n")
cat(sprintf("%-10s %-12s %-12s %-12s\n",
  "Bandwidth", "RKD Est.", "N (left)", "N (right)"))

for (h in bandwidths) {
  sub <- df[abs(df$x) <= h, ]
  sub$x_below <- sub$x * (sub$x < 0)
  sub$x_above <- sub$x * (sub$x >= 0)

  fs_h <- lm(benefit ~ x_below + x_above, data = sub)
  rf_h <- lm(duration ~ x_below + x_above, data = sub)

  kink_t_h <- coef(fs_h)["x_above"] - coef(fs_h)["x_below"]
  kink_y_h <- coef(rf_h)["x_above"] - coef(rf_h)["x_below"]
  rkd_h <- kink_y_h / kink_t_h

  n_left <- sum(sub$x < 0)
  n_right <- sum(sub$x >= 0)

  cat(sprintf("%-10d %-12.4f %-12d %-12d\n",
    h, rkd_h, n_left, n_right))
}

# Optimal bandwidth from rdrobust
rd_opt <- rdrobust(df$duration, df$x, deriv = 1)
cat("\nrdrobust optimal bandwidth:", round(rd_opt$bws[1], 1), "\n")

Requires: rdrobust

Expected output:

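Bandwidth	RKD Estimate (approx. ±1 SE)	N (left)	N (right)
50	~0.10 ± 0.08	~420	~420
75	~0.10 ± 0.04	~630	~630
100	~0.10 ± 0.03	~830	~830
150	~0.10 ± 0.015	~1250	~1250
200	~0.10 ± 0.01	~1670	~1670
250	~0.10 ± 0.007	~2080	~2080

The RKD estimate should be centered on the true effect of 0.10 at every bandwidth, because the simulated relationship is exactly linear on each side of the kink. The ± values are rough one-standard-error ranges implied by the simulation design, not output from the code: at h = 50 a single draw can land far from 0.10, and the estimate tightens quickly as the window widens. Small bandwidths produce noisier estimates but less bias; large bandwidths are more precise but may introduce bias if the relationship is nonlinear away from the kink.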
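The precision side of this trade-off can be approximated on the back of an envelope. The Python sketch below is a rough calculation under stated assumptions, not part of the lab's R workflow: it assumes a uniform running variable on [-300, 300], homoskedastic outcome noise with standard deviation 4 (as in Step 1), a local linear fit with a shared intercept, and it treats the first-stage kink of -0.25 as known, so it somewhat understates the true uncertainty. The function name `approx_rkd_se` is hypothetical.

```python
import math


def approx_rkd_se(h, n_total=5000, half_range=300.0, sigma_y=4.0, kink_t=-0.25):
    """Approximate standard error of the RKD ratio for a window |x| <= h.

    The piecewise-linear fit is equivalent to regressing y on 1, x and |x|,
    with kink = 2 * coef(|x|). For uniform x, Var(|x|) = h^2 / 12, so
    Var(kink) = 4 * sigma^2 / (N_h * h^2 / 12) = 48 * sigma^2 / (N_h * h^2).
    """
    n_h = n_total * h / half_range          # expected observations inside the window
    var_kink_y = 48.0 * sigma_y ** 2 / (n_h * h ** 2)
    return math.sqrt(var_kink_y) / abs(kink_t)


for h in (50, 75, 100, 150, 200, 250):
    print(f"h = {h:>3}: approx SE of RKD ratio = {approx_rkd_se(h):.3f}")
```

Under these assumptions the standard error shrinks in proportion to h^(-3/2): widening the window adds observations and also spreads them further from the kink, both of which sharpen a slope estimate. This is why kink estimates are far more bandwidth-hungry than the level estimates used in RDD.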

Concept Check

Why is the first-stage kink strength important in the RKD design?

A larger first-stage kink makes the RKD estimate less biased because it reduces confounding.
A larger first-stage kink produces a more precise RKD estimate because it reduces the variance of the ratio estimator, similar to how a strong first stage improves IV precision.
The first-stage kink is irrelevant as long as it is statistically different from zero.
A larger first-stage kink means the identifying assumptions are more likely satisfied.

Step 5: Density Continuity Test

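A key threat to RKD validity is manipulation of the running variable at the kink point. If individuals can precisely sort to one side of the kink, the density of the running variable will exhibit a discontinuity or a change in slope at the threshold. RKD identification requires the density to be smooth there, so a density test is a standard diagnostic.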

# First-time setup: install.packages(c("rddensity"))
# Density continuity test in the spirit of McCrary
# (rddensity uses a local polynomial density estimator)
library(rddensity)

density_test <- rddensity(df$x, c = 0)

cat("=== Density Continuity Test ===\n")
cat("Test statistic:", round(density_test$test$t_jk, 3), "\n")
cat("p-value:", round(density_test$test$p_jk, 4), "\n")
cat("Interpretation:", ifelse(density_test$test$p_jk > 0.05,
  "No evidence of manipulation (p > 0.05)",
  "Evidence of manipulation (p <= 0.05)"), "\n")

# Also check visually with a histogram
hist(df$x, breaks = 60, main = "Distribution of Running Variable",
  xlab = "Prior Earnings (centered)", col = "lightblue",
  border = "white")
abline(v = 0, col = "red", lwd = 2, lty = 2)

Requires: rddensity

Expected output:

Diagnostic	Value
Test statistic (rddensity)	~0.1–1.5
p-value	> 0.05
N just below kink	~160–180
N just above kink	~160–180
Interpretation	No evidence of manipulation

With simulated data drawn from a uniform distribution, the density is continuous at the kink. In real applications, bunching at kinks (e.g., taxpayers clustering at tax bracket boundaries) would indicate strategic manipulation and threaten the RKD identifying assumptions.
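The intuition behind the test can be illustrated with a much cruder check: count observations in a narrow window on each side of the kink and ask whether the split is consistent with 50/50. The Python sketch below is a simplified stand-in, not the local polynomial procedure that rddensity implements. The function name `side_count_test`, the 20-dollar window, and the artificial bunching rule are all assumptions made for illustration.

```python
import math
import random


def side_count_test(x, cutoff=0.0, window=20.0):
    """Two-sided binomial test (normal approximation) that observations within
    `window` of the cutoff are equally likely to fall on either side."""
    below = sum(1 for xi in x if cutoff - window <= xi < cutoff)
    above = sum(1 for xi in x if cutoff <= xi <= cutoff + window)
    total = below + above
    z = (above - total / 2) / math.sqrt(total / 4)
    p_value = math.erfc(abs(z) / math.sqrt(2))   # equals 2 * (1 - Phi(|z|))
    return below, above, z, p_value


random.seed(2015)
x = [random.uniform(-300, 300) for _ in range(5000)]

# Artificial manipulation: about half of those just above the kink move just below it
bunched = [-xi if 0 <= xi < 20 and random.random() < 0.5 else xi for xi in x]

for label, sample in [("uniform", x), ("bunched", bunched)]:
    below, above, z, p = side_count_test(sample)
    print(f"{label}: below = {below}, above = {above}, z = {z:.2f}, p = {p:.4f}")
```

With a uniform running variable, roughly 167 observations are expected on each side of a 20-dollar window, and the split should look balanced in most draws. The bunched sample produces a lopsided split and a very small p-value. A count comparison like this detects a jump in the density but not a change in its slope, which is one reason to prefer a formal density estimator in real applications.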

Step 6: Guided Exercise

Computing and Interpreting the RKD Estimate

You are studying the effect of unemployment insurance benefits on job search duration. The Austrian UI system pays benefits equal to 55% of prior earnings up to a cap, then 30% above the cap. You estimate the following at the kink point (prior earnings = cap):

Your output:

Component	Slope Below	Slope Above	Change in Slope
Treatment (weekly benefit)	0.548	0.298	-0.250
Outcome (search duration)	0.073	0.048	-0.025

Bandwidth sensitivity:

Bandwidth (h)	RKD Estimate	SE
50	0.103	0.042
100	0.098	0.028
150	0.101	0.023
200	0.095	0.020

Density test at kink: p = 0.41

Step 7: Exercises

Nonlinear DGP. Add a quadratic term to the outcome equation (e.g., 0.0001 * x^2). How does this affect the RKD estimate at different bandwidths? When does bias become apparent?
Fuzzy RKD. Add measurement error to the treatment (increase the noise standard deviation from 15 to 50). How does fuzziness affect the estimate and its precision?
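Placebo kinks. Test for kinks at false thresholds (e.g., x = -100 or x = +100 where no policy kink exists), using windows that exclude the true kink at x = 0. The estimated slope changes in both the treatment and the outcome should be near zero at placebo kinks; the ratio itself is unstable there because its denominator is also close to zero.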
Covariates. Add covariates (age, education) to the simulation and verify they show no kink at the threshold (balance test).
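As a possible starting point for the placebo-kink exercise, the Python sketch below estimates the slope change at a candidate threshold by fitting a separate simple regression on each side. The helper names `slope` and `kink_at` and the 80-dollar window are assumptions made for this sketch; the window is chosen so that placebo thresholds at -100 and +100 never include the true kink at 0. Unlike the lab's R code, this version allows a separate intercept on each side.

```python
import random


def slope(xs, ys):
    """Simple-regression slope of ys on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx


def kink_at(x, y, c, window=80.0):
    """Slope just above c minus slope just below c, within `window` of c."""
    left = [(xi, yi) for xi, yi in zip(x, y) if c - window <= xi < c]
    right = [(xi, yi) for xi, yi in zip(x, y) if c <= xi <= c + window]
    return slope(*zip(*right)) - slope(*zip(*left))


# Same data-generating process as Step 1
random.seed(7)
x = [random.uniform(-300, 300) for _ in range(5000)]
benefit = [300 + (0.55 if xi < 0 else 0.30) * xi + random.gauss(0, 15) for xi in x]
duration = [20 + 0.10 * b + 0.02 * xi + random.gauss(0, 4) for b, xi in zip(benefit, x)]

kinks = {}
for c in (-100.0, 0.0, 100.0):
    kinks[c] = (kink_at(x, benefit, c), kink_at(x, duration, c))
    print(f"c = {c:>6}: benefit kink = {kinks[c][0]:.3f}, duration kink = {kinks[c][1]:.3f}")
```

Only the threshold at 0 should show a clear slope change in the benefit. With a window this narrow the outcome kink at 0 is small relative to its sampling noise, which is itself a useful illustration of how demanding RKD is in terms of sample size.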

Summary

In this lab you learned:

The RKD exploits a change in the slope (not level) of the treatment assignment at a kink point
The RKD estimator is the ratio of the change in outcome slope to the change in treatment slope — analogous to the Wald estimator in IV
A strong first-stage kink (large change in treatment slope) is important for precision, just as a strong first stage is important for IV
Bandwidth sensitivity analysis is important: the estimate should be stable across reasonable bandwidths
The density continuity test checks for manipulation at the kink; bunching at the threshold signals strategic sorting and undermines the RKD identifying assumptions
RKD is particularly useful when policy formulas create kinks (tax brackets, benefit schedules, subsidy caps) but no jumps
