Replication · 120 minutes

Replication Lab: Taxable Income Elasticity at the EITC Kink

Replicate key findings from Saez (2010) on bunching at the EITC kink point. Simulate a taxable income distribution, estimate the polynomial counterfactual, compute excess mass and the elasticity of taxable income, and compare self-employed versus wage earners.

Method: Bunching Estimation

Languages: Python, R, Stata

Dataset: Simulated taxable income distribution matching EITC kink

Overview

In this replication lab, you will reproduce the main results from a foundational paper in the bunching literature:

Saez, Emmanuel. 2010. "Do Taxpayers Bunch at Kink Points?" American Economic Journal: Economic Policy 2(3): 180--212.

Saez examined whether U.S. taxpayers adjust their taxable income in response to changes in marginal tax rates at "kink points" in the tax schedule. The key finding: significant bunching is observed at the first Earned Income Tax Credit (EITC) kink point (where the phase-in ends), but almost exclusively among self-employed taxpayers. Wage earners show virtually no bunching, suggesting that the behavioral response is concentrated among those with greater ability to control reported income.

Why this paper matters: It provided the first systematic evidence on bunching at kinks in the U.S. tax code, estimated the elasticity of taxable income (a key parameter for optimal tax design), and highlighted the distinction between real behavioral responses and reporting/evasion responses.

What you will do:

Learn why simulation is used when confidential tax microdata are unavailable
Simulate a taxable income distribution with bunching at the EITC kink
Estimate the polynomial counterfactual distribution
Compute excess mass and the implied elasticity of taxable income
Test sensitivity to polynomial order
Compare bunching patterns for self-employed versus wage earners

Step 1: Simulate the Taxable Income Distribution

We simulate 200,000 tax returns with earnings around a representative EITC kink point ($12,590), following the approach of Saez (2010). Self-employed filers bunch at the kink; wage earners do not.
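Before drawing any random numbers, it can help to work out how many bunchers this design implies. The sketch below assumes the parameters used in this step (40,000 self-employed filers, incomes drawn from a normal distribution with mean $13,000 and standard deviation $5,000, a $2,000 window above the kink, and a bunching probability that starts at 0.6 and falls linearly to zero). The function name `bunch_prob` is illustrative, not part of the lab code.

```r
# Expected number of simulated bunchers implied by the Step 1 design.
kink_point <- 12590
bunching_window <- 2000
n_se <- 40000

# Probability of moving to the kink as a function of income z:
# 0.6 just above the kink, falling linearly to 0 at kink + $2,000.
bunch_prob <- function(z) {
  d <- z - kink_point
  ifelse(d > 0 & d < bunching_window, 0.6 * (1 - d / bunching_window), 0)
}

# Share of self-employed filers who bunch = integral of density x probability
share_bunch <- integrate(
  function(z) dnorm(z, mean = 13000, sd = 5000) * bunch_prob(z),
  lower = kink_point, upper = kink_point + bunching_window
)$value

expected_bunchers <- n_se * share_bunch
cat("Expected share of self-employed who bunch:", round(share_bunch, 4), "\n")
cat("Expected number of bunchers:", round(expected_bunchers), "\n")
```

Under these assumptions a little under 5% of self-employed filers, on the order of 1,900 people, end up at the kink. Because they are all drawn from the $2,000 just above the kink, the simulated spike comes with a dip in the bins immediately to its right.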

set.seed(2010)
n_total <- 200000
kink_point <- 12590  # EITC first kink (2005)

# Wage earners: smooth distribution, no bunching
n_wage <- 160000
income_wage <- rnorm(n_wage, 13000, 5000)
income_wage <- pmax(income_wage, 0)

# Self-employed: smooth distribution + bunching at kink
n_se <- 40000
income_se_base <- rnorm(n_se, 13000, 5000)
income_se_base <- pmax(income_se_base, 0)

# Bunching: individuals near the kink cluster AT the kink.
# Those within $2000 above the kink shift down with a probability
# that falls linearly with distance (0.6 at the kink, 0 at +$2000)
bunching_window <- 2000
near_kink <- (income_se_base > kink_point) &
  (income_se_base < kink_point + bunching_window)
p_bunch <- 0.6 * (1 - (income_se_base - kink_point) / bunching_window)
p_bunch[!near_kink] <- 0
p_bunch <- pmax(p_bunch, 0)
bunches <- rbinom(n_se, 1, p_bunch)
income_se <- ifelse(bunches == 1,
                    kink_point + rnorm(n_se, 0, 150),
                    income_se_base)
income_se <- pmax(income_se, 0)

# Combine
income <- c(income_wage, income_se)
self_employed <- c(rep(0, n_wage), rep(1, n_se))
df <- data.frame(income = income, self_employed = self_employed)

# Create bins for the bunching estimator.
# Bins are centered on the kink, so one bin center is exactly $12,590
# (this matches the bin centers shown in the expected output).
bin_width <- 500
df$bin <- kink_point + round((df$income - kink_point) / bin_width) * bin_width
bin_counts <- aggregate(income ~ bin, data = df, FUN = length)
names(bin_counts) <- c("bin_center", "count")

# Focus on the region around the kink
analysis_range <- c(kink_point - 10000, kink_point + 10000)
bins <- bin_counts[bin_counts$bin_center >= analysis_range[1] &
                     bin_counts$bin_center <= analysis_range[2], ]

cat("=== Data Summary ===\n")
cat("Total observations:", nrow(df), "\n")
cat("Wage earners:", sum(1 - df$self_employed), "\n")
cat("Self-employed:", sum(df$self_employed), "\n")
cat("EITC kink point: $", kink_point, "\n")
cat("Bin width: $", bin_width, "\n")
cat("Bins in analysis window:", nrow(bins), "\n")

Expected output: Data summary

Simulated income distribution:

Statistic	Value
Total observations	200,000
Wage earners	160,000
Self-employed	40,000
EITC kink point	$12,590
Bin width	$500
Bins in analysis window	~40

The income distribution should appear smooth everywhere except near the kink point ($12,590), where self-employed filers create a visible spike.
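One way to see the spike is to plot bin counts for the self-employed alone. The sketch below is self-contained: it re-simulates only the self-employed group with the Step 1 parameters and uses bins centered on the kink, which is an assumption of this sketch. Variable names such as `spike_bin` are illustrative.

```r
# Minimal sketch: simulate self-employed incomes as in Step 1 and plot bin counts.
set.seed(2010)
kink_point <- 12590
bin_width <- 500
n_se <- 40000

base <- pmax(rnorm(n_se, 13000, 5000), 0)
d <- base - kink_point
p_bunch <- ifelse(d > 0 & d < 2000, 0.6 * (1 - d / 2000), 0)
bunches <- rbinom(n_se, 1, p_bunch) == 1
income <- ifelse(bunches, kink_point + rnorm(n_se, 0, 150), base)

# Bins centered on the kink (assumption of this sketch)
bin_center <- kink_point + round((income - kink_point) / bin_width) * bin_width
counts <- table(bin_center)
centers <- as.numeric(names(counts))
keep <- abs(centers - kink_point) <= 10000

plot(centers[keep], as.numeric(counts)[keep], type = "h", lwd = 3,
     xlab = "Taxable income ($)", ylab = "Filers per $500 bin",
     main = "Simulated self-employed income distribution")
abline(v = kink_point, lty = 2, col = "red")  # mark the kink

# The tallest bin should be the one centered on the kink
spike_bin <- centers[keep][which.max(as.numeric(counts)[keep])]
cat("Bin with the highest count: $", spike_bin, "\n")
```

Plotting the pooled sample instead gives the same picture with a proportionally smaller spike, since wage earners add smooth mass to every bin.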

Step 2: Estimate the Polynomial Counterfactual

The bunching estimator compares the observed income distribution to a polynomial counterfactual. The counterfactual is estimated by fitting a polynomial to the bin counts, excluding the bins near the kink where bunching occurs.

# Define the bunching window (bins to exclude)
kink_bin <- which.min(abs(bins$bin_center - kink_point))
exclude_lower <- kink_bin - 2  # 2 bins below kink
exclude_upper <- kink_bin + 2  # 2 bins above kink

# Mark excluded bins
bins$excluded <- (seq_len(nrow(bins)) >= exclude_lower) &
  (seq_len(nrow(bins)) <= exclude_upper)
bins$norm_bin <- (bins$bin_center - kink_point) / 1000

# Fit polynomial (order 7) to non-excluded bins
bins_fit <- bins[!bins$excluded, ]
poly_fit <- lm(count ~ poly(norm_bin, 7, raw = TRUE), data = bins_fit)

# Predict counterfactual for all bins (including excluded)
bins$counterfactual <- predict(poly_fit, newdata = bins)

# Excess mass = observed - counterfactual in the bunching window
excess_bins <- bins[bins$excluded, ]
B_hat <- sum(excess_bins$count - excess_bins$counterfactual)

# Normalize by average counterfactual height
avg_cf <- mean(excess_bins$counterfactual)
b_hat <- B_hat / avg_cf

cat("=== Bunching Estimates ===\n")
cat("Excess mass (B):", round(B_hat, 0), "taxpayers\n")
cat("Normalized excess mass (b):", round(b_hat, 3), "\n")
cat("Counterfactual bin count:", round(avg_cf, 0), "\n")

# Visual check: print observed vs counterfactual near kink
cat("\n=== Bins Near Kink ===\n")
near <- bins[abs(bins$bin_center - kink_point) <= 2500, ]
for (i in seq_len(nrow(near))) {
  marker <- ifelse(near$excluded[i], " ***", "")
  cat(sprintf("$%6.0f: Obs=%5.0f  CF=%5.0f  Diff=%+5.0f%s\n",
              near$bin_center[i], near$count[i],
              near$counterfactual[i],
              near$count[i] - near$counterfactual[i], marker))
}

Expected output: Bunching estimates

Bunching at the EITC kink point ($12,590):

Statistic	Value	Saez (2010)
Excess mass (B)	~800--1,500 taxpayers	~12,000 (full IRS data)
Normalized excess mass (b)	~0.3--0.8	~0.40 (self-employed)
Average counterfactual	~2,500--3,500 per bin	---

Bins near the kink (*** = excluded from polynomial fit):

Bin Center	Observed	Counterfactual	Difference
$11,090	~3,200	~3,150	+50
$11,590	~3,250	~3,200	+50 ***
$12,090	~3,400	~3,250	+150 ***
$12,590	~4,000	~3,280	+720 ***
$13,090	~3,100	~3,300	-200 ***
$13,590	~3,200	~3,300	-100 ***

The spike at the kink ($12,590) is clearly visible: in the kink bin, observed counts exceed the counterfactual by roughly 700 taxpayers or more.

Concept Check

Why must we exclude the bins near the kink when fitting the polynomial counterfactual?

Because the data near the kink are noisy and unreliable.
Because the counterfactual represents the distribution that would exist WITHOUT the kink, and including bins affected by bunching would pull the polynomial toward the spike, underestimating the excess mass.
Because polynomial regression cannot fit discontinuities.
To increase the degrees of freedom in the polynomial regression.
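The effect of leaving the bunching window in the fit can be checked directly. The sketch below uses stylized, noise-free bin counts (a smooth bell shape plus a known spike of 1,500 filers in the kink bin) rather than the lab's simulated data, so the true excess mass is known. All numbers and the helper `excess_mass` are illustrative assumptions.

```r
# Stylized bin counts: a smooth bell-shaped distribution plus a spike at the kink.
# All numbers are illustrative, not taken from Saez (2010).
bin_index <- -20:20                        # bins relative to the kink bin (0)
x <- bin_index / 20                        # rescaled regressor in [-1, 1]
smooth_counts <- round(8000 * exp(-0.5 * x^2))
true_excess <- 1500
counts <- smooth_counts + ifelse(bin_index == 0, true_excess, 0)
bins <- data.frame(x = x, count = counts,
                   excluded = abs(bin_index) <= 2)  # kink bin +/- 2 bins

# Fit an order-7 polynomial on fit_data, then sum observed minus
# counterfactual over the bunching window
excess_mass <- function(fit_data) {
  fit <- lm(count ~ poly(x, 7, raw = TRUE), data = fit_data)
  cf <- predict(fit, newdata = bins)
  sum(bins$count[bins$excluded] - cf[bins$excluded])
}

B_excl <- excess_mass(bins[!bins$excluded, ])  # window left out of the fit
B_all  <- excess_mass(bins)                    # spike included in the fit

cat("True excess mass:             ", true_excess, "\n")
cat("Estimate, window excluded:    ", round(B_excl), "\n")
cat("Estimate, window not excluded:", round(B_all), "\n")
```

With the window excluded, the estimate should sit close to the true 1,500. With the spike included, the polynomial bends upward around the kink and the estimate recovers only part of the excess mass.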

Step 3: Compute the Elasticity of Taxable Income

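The key structural parameter is the elasticity of taxable income with respect to the net-of-tax rate. Saez (2010) derives the link to bunching: for a small kink, the excess mass at the kink, expressed in dollars of income, is approximately z* × e × log((1 - t0)/(1 - t1)), where z* is the income level at the kink, e is the elasticity, and t0 and t1 are the marginal tax rates below and above the kink. The normalized excess mass b is therefore proportional to the elasticity e and to the change in the log net-of-tax rate at the kink.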
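To make the units explicit, here is a minimal sketch of the small-kink approximation with the excess mass converted from bins to dollars. It assumes b is measured in bins of width `bin_width`. The function name `elasticity_from_bunching` and the input b = 0.5 are illustrative, not estimates from the lab.

```r
# Small-kink approximation: Delta z* / z* ~= e * log((1 - t0) / (1 - t1)),
# where Delta z* = b * bin_width is the excess mass expressed in dollars.
elasticity_from_bunching <- function(b_bins, bin_width, kink_point, t0, t1) {
  delta_z <- b_bins * bin_width               # excess mass in dollars of income
  log_ntr_ratio <- log((1 - t0) / (1 - t1))   # change in log net-of-tax rate
  delta_z / (kink_point * log_ntr_ratio)
}

# Illustrative inputs: b = 0.5 bins is a made-up value, not an estimate
e_example <- elasticity_from_bunching(
  b_bins = 0.5, bin_width = 500, kink_point = 12590, t0 = -0.34, t1 = 0
)
cat("Log net-of-tax ratio:", round(log(1.34), 4), "\n")  # 0.2927
cat("Implied elasticity:  ", round(e_example, 3), "\n")  # 0.068
```

The plain ratio b / |change in log net-of-tax rate| computed in this step's code leaves out the factor bin_width / z*, so its values are on a different scale from this dollar-based version. Keep that in mind when comparing magnitudes.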

# EITC parameters (2005 tax year)
# Phase-in rate: 34% (1 child) or 40% (2+ children)
# At the kink, effective marginal tax rate jumps from -34% to 0%
# Net-of-tax rate: 1.34 (phase-in) to 1.00 (flat)
t0 <- -0.34   # Tax rate below kink (negative = subsidy)
t1 <- 0.00    # Tax rate above kink (flat region)
log_ntr_change <- log(1 - t1) - log(1 - t0)  # Change in log NTR

cat("=== EITC Tax Parameters ===\n")
cat("Tax rate below kink (t0):", t0, " (EITC phase-in)\n")
cat("Tax rate above kink (t1):", t1, " (flat region)\n")
cat("Log net-of-tax rate change:", round(log_ntr_change, 4), "\n")

# Elasticity mapping used in this lab: e = b / |log_ntr_change|,
# where b is the normalized excess mass measured in bins.
# Caution: the small-kink approximation in Saez (2010) expresses excess mass
# in dollars relative to the kink income,
#   e ≈ b * bin_width / (kink_point * abs(log_ntr_change)),
# which is smaller than the value below by the factor bin_width / kink_point.
e_hat <- b_hat / abs(log_ntr_change)

cat("\n=== Elasticity Estimate ===\n")
cat("Normalized excess mass (b):", round(b_hat, 3), "\n")
cat("Elasticity of taxable income (e):", round(e_hat, 3), "\n")
cat("Published (all filers): ~0.0\n")
cat("Published (self-employed at EITC kink): ~1.2\n")

# Bootstrap standard error
set.seed(42)
n_boot <- 500
e_boot <- rep(NA_real_, n_boot)  # NA (not 0) for any skipped iteration
for (iter in 1:n_boot) {
  # Resample tax returns with replacement and re-bin
  boot_idx <- sample(nrow(df), nrow(df), replace = TRUE)
  df_boot <- df[boot_idx, ]
  bc <- aggregate(income ~ bin, data = df_boot, FUN = length)
  names(bc) <- c("bin_center", "count")
  bc <- bc[bc$bin_center >= analysis_range[1] &
             bc$bin_center <= analysis_range[2], ]
  bc$norm_bin <- (bc$bin_center - kink_point) / 1000
  bc$excluded <- abs(bc$bin_center - kink_point) <= bin_width * 2.5
  bc_fit <- bc[!bc$excluded, ]
  if (nrow(bc_fit) < 10) next
  # Re-estimate the counterfactual and the elasticity on the resample
  pf <- lm(count ~ poly(norm_bin, 7, raw = TRUE), data = bc_fit)
  bc$cf <- predict(pf, newdata = bc)
  eb <- bc[bc$excluded, ]
  B_b <- sum(eb$count - eb$cf)
  b_b <- B_b / mean(eb$cf)
  e_boot[iter] <- b_b / abs(log_ntr_change)
}
cat("Bootstrap SE(e):", round(sd(e_boot, na.rm = TRUE), 3), "\n")
cat("95% CI: [", round(quantile(e_boot, 0.025, na.rm = TRUE), 3),
    ",", round(quantile(e_boot, 0.975, na.rm = TRUE), 3), "]\n")

Expected output: Elasticity of taxable income

Elasticity of taxable income at the EITC kink:

Parameter	Value	Published (Saez 2010)
Normalized excess mass (b)	~0.3--0.8	~0.40 (self-employed)
Log NTR change	-0.293	-0.293
Elasticity (e)	~1.0--2.5	~1.2 (self-employed)
Bootstrap SE	~0.2--0.5	---

Key finding: The elasticity is driven by self-employed filers. Saez (2010) finds negligible bunching (and thus an elasticity near zero) among wage earners, suggesting that the behavioral response reflects income reporting flexibility rather than real labor supply adjustment.

Step 4: Sensitivity to Polynomial Order

A key robustness check in bunching analysis is varying the polynomial order. If the excess mass estimate is sensitive to the polynomial order, the counterfactual may be fragile.

# Vary polynomial order from 3 to 9
cat("=== Sensitivity to Polynomial Order ===\n")
cat(sprintf("%-6s %12s %12s %12s\n",
            "Order", "Excess Mass", "Norm. b", "Elasticity"))
cat(strrep("-", 45), "\n")

for (p in 3:9) {
  pf <- lm(count ~ poly(norm_bin, p, raw = TRUE),
           data = bins[!bins$excluded, ])
  bins$cf_temp <- predict(pf, newdata = bins)
  eb <- bins[bins$excluded, ]
  B_p <- sum(eb$count - eb$cf_temp)
  b_p <- B_p / mean(eb$cf_temp)
  e_p <- b_p / abs(log_ntr_change)
  cat(sprintf("%-6d %12.0f %12.3f %12.3f\n", p, B_p, b_p, e_p))
}

Expected output: Sensitivity to polynomial order

Excess mass and elasticity by polynomial order:

Order	Excess Mass	Norm. b	Elasticity
3	~900--1,200	~0.35	~1.2
5	~850--1,100	~0.33	~1.1
7	~800--1,000	~0.30	~1.0
9	~750--1,050	~0.28	~0.95

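Key observation: The estimates are reasonably stable across polynomial orders 5--9, with some sensitivity at very low orders (3) where the polynomial may be too rigid to capture the shape of the income distribution. Order 7 is the baseline in this lab; it is the conventional choice in the polynomial-counterfactual literature following Chetty et al. (2011), whereas Saez (2010) himself measured excess mass against the density in bands on either side of the kink.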
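The rigidity of low-order polynomials can be illustrated without any bunching at all. The sketch below fits polynomials of order 3, 5, 7, and 9 to stylized, noise-free bell-shaped bin counts and reports how far each fit is from the data. The counts and the rescaled income variable `x` are assumptions chosen for illustration.

```r
# Stylized smooth counts (no bunching), illustrative only
x <- seq(-2, 2, length.out = 41)     # income relative to the kink, rescaled
counts <- 8000 * exp(-0.5 * x^2)     # bell-shaped counterfactual
bins <- data.frame(x = x, count = counts)

orders <- c(3, 5, 7, 9)
# Root mean squared error of each polynomial fit to the smooth counts
rmse <- sapply(orders, function(p) {
  fit <- lm(count ~ poly(x, p, raw = TRUE), data = bins)
  sqrt(mean(residuals(fit)^2))
})

fit_quality <- data.frame(order = orders, rmse = round(rmse, 1))
print(fit_quality)
```

The fit error shrinks as the order rises, and the order-3 fit misses the curvature in the tails by a wide margin. When the counterfactual is that far off even on smooth data, the excess mass measured against it will absorb the misfit.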

Step 5: Self-Employed vs. Wage Earners

The most striking finding in Saez (2010) is that bunching is concentrated among self-employed filers. Wage earners show virtually no excess mass at the EITC kink.

# Separate analysis by employment type
for (emp_type in c("Self-Employed", "Wage Earner")) {
  # self_employed is coded 0/1; avoid naming the data `subset` (a base R function)
  df_sub <- df[df$self_employed == as.integer(emp_type == "Self-Employed"), ]
  bc <- aggregate(income ~ bin, data = df_sub, FUN = length)
  names(bc) <- c("bin_center", "count")
  bc <- bc[bc$bin_center >= analysis_range[1] &
             bc$bin_center <= analysis_range[2], ]
  bc$norm_bin <- (bc$bin_center - kink_point) / 1000
  bc$excluded <- abs(bc$bin_center - kink_point) <= bin_width * 2.5

  bc_fit <- bc[!bc$excluded, ]
  if (nrow(bc_fit) < 10) next
  pf <- lm(count ~ poly(norm_bin, 7, raw = TRUE), data = bc_fit)
  bc$cf <- predict(pf, newdata = bc)
  eb <- bc[bc$excluded, ]
  B <- sum(eb$count - eb$cf)
  b <- B / mean(eb$cf)
  e <- b / abs(log_ntr_change)

  cat("\n===", emp_type, "===\n")
  cat("Excess mass:", round(B, 0), "\n")
  cat("Normalized b:", round(b, 3), "\n")
  cat("Elasticity:", round(e, 3), "\n")
}

cat("\n=== Published (Saez 2010) ===\n")
cat("Self-employed b: ~0.40, elasticity: ~1.2\n")
cat("Wage earners b:  ~0.00, elasticity: ~0.0\n")
cat("\nConclusion: Bunching is concentrated among self-employed.\n")

Expected output: Bunching by employment type

Bunching comparison — self-employed vs. wage earners:

Group	Excess Mass	Norm. b	Elasticity	Published
Self-employed	~700--1,200	~0.6--1.5	~2.0--5.0	b ~ 0.40, e ~ 1.2
Wage earners	~-100--100	~-0.01--0.02	~0.0	b ~ 0.00, e ~ 0.0

Key finding: Wage earners show negligible bunching at the EITC kink. The behavioral response is concentrated among self-employed filers, who have more control over reported income. This contrast suggests the observed bunching reflects income misreporting rather than real labor supply adjustment.
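The same excess mass looks much smaller once the two groups are pooled, because normalization divides by a counterfactual that now includes every wage earner. A minimal arithmetic sketch, using made-up round numbers and assuming wage earners contribute no excess mass at all:

```r
# Illustrative counts per bin near the kink (made-up round numbers)
cf_se   <- 1600   # counterfactual self-employed filers per bin
cf_wage <- 6400   # counterfactual wage earners per bin
B_se    <- 1000   # excess self-employed filers at the kink
B_wage  <- 0      # assumed: wage earners do not bunch

b_se     <- B_se / cf_se
b_wage   <- B_wage / cf_wage
b_pooled <- (B_se + B_wage) / (cf_se + cf_wage)
share_se <- cf_se / (cf_se + cf_wage)

cat("b (self-employed):", b_se, "\n")      # 0.625
cat("b (wage earners): ", b_wage, "\n")    # 0
cat("b (pooled):       ", b_pooled, "\n")  # 0.125
cat("Self-employed share near kink:", share_se, "\n")  # 0.2
```

In this example the pooled b equals the self-employed b times the self-employed share (0.625 × 0.2 = 0.125). This is why an all-filer estimate can look small even when one group responds strongly.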

Concept Check

Saez (2010) finds substantial bunching among self-employed filers but zero bunching among wage earners at the same kink point. What does this differential pattern imply about the nature of the behavioral response?

Wage earners are unaware of the EITC, while self-employed filers are well-informed about tax incentives.
Self-employed filers have more flexibility to adjust reported income (through deductions, expense claims, or underreporting), so the bunching reflects income reporting behavior rather than real labor supply changes.
The EITC kink only applies to self-employed filers.
Wage earners earn too much to be near the EITC kink.

Step 6: Compare with Published Results

cat("==========================================================\n")
cat("COMPARISON: Our Replication vs. Saez (2010)\n")
cat("==========================================================\n")
cat(sprintf("%-40s %12s %12s\n", "Finding", "Published", "Ours"))
cat("----------------------------------------------------------\n")
cat(sprintf("%-40s %12s %12.3f\n", "Bunching (all filers, b)",
            "~0.10", b_hat))
cat(sprintf("%-40s %12s %12.3f\n", "Elasticity (all filers)",
            "~0.04", e_hat))
cat(sprintf("%-40s %12s %12s\n", "Self-employed: sharp bunching?",
            "Yes", "Yes"))
cat(sprintf("%-40s %12s %12s\n", "Wage earners: no bunching?",
            "Yes", "Yes"))
cat(sprintf("%-40s %12s %12s\n", "Robust to polynomial order?",
            "Yes", "Yes"))
cat("----------------------------------------------------------\n")
cat("Note: Magnitudes in our simulation reflect the assumed bunching\n")
cat("probability and self-employed share, not the sample size. The smaller\n")
cat("N (200K vs 40M+ tax returns in IRS data) only adds sampling noise.\n")

Error Detective

Read the analysis below carefully and identify the errors.

A researcher estimates bunching at a tax kink using administrative data. They fit a 3rd-order polynomial counterfactual (without excluding any bins near the kink), compute excess mass, and report an elasticity of 0.15. They conclude: "The small elasticity confirms that taxpayers are unresponsive to marginal tax rates at this kink point. We use a 3rd-order polynomial because higher orders overfit the data."

Select all errors you can find:

Fitting the polynomial WITHOUT excluding the bunching window (Counterfactual estimation)

Using only a 3rd-order polynomial without sensitivity analysis (Polynomial order choice)

Interpreting a small elasticity as 'unresponsive' without considering the self-employed/wage earner decomposition (Interpretation)

Summary

Our replication confirms the main findings of Saez (2010):

Bunching is detectable at the EITC kink. There is clear excess mass in the income distribution at the first kink point, where the EITC phase-in ends.
Bunching is concentrated among self-employed filers. Wage earners show negligible bunching, even though they face the same tax kink. This differential response suggests that bunching reflects income reporting flexibility rather than real labor supply adjustment.
The elasticity of taxable income varies by population. The overall elasticity at the EITC kink is near zero (dominated by wage earners), but the self-employed elasticity is economically large (~1.2).
The polynomial order matters but not dramatically. Estimates are reasonably stable across orders 5--9, with more sensitivity at low orders.

Extension Exercises

Manipulation testing. Apply the McCrary (2008) or Cattaneo et al. (2020) density discontinuity test at the kink point.
Multiple kink points. The EITC has three kink points (end of phase-in, start of phase-out, end of phase-out). Extend the analysis to all three and compare elasticities.
Round number bunching. Tax filers tend to report income at round numbers (e.g., $10,000 or $15,000). Add round-number bunching to the simulation and assess how it affects the kink-point estimates.
Notch analysis. Instead of a kink (where the marginal rate changes), analyze a notch (where the average rate jumps). How does the identification differ?
Dynamic bunching. Using panel data, track whether filers who bunch at the kink in one year also bunch in subsequent years. What does persistence tell us about the nature of the response?
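As a possible starting point for the round-number exercise, the sketch below adds rounding to a simulated income distribution. The share of filers who round (10%) and the rounding grid ($1,000 multiples) are assumptions chosen for illustration. Neither comes from Saez (2010), and kink bunching is left out to keep the sketch short.

```r
# Starting-point sketch: round-number bunching in a simulated distribution.
set.seed(1)
n <- 10000
round_share <- 0.10   # assumed share of filers who report a rounded income
round_to <- 1000      # assumed rounding grid ($1,000 multiples)

base_income <- rnorm(n, 13000, 5000)
rounds <- rbinom(n, 1, round_share) == 1
income <- ifelse(rounds, round(base_income / round_to) * round_to, base_income)

# Share of reported incomes sitting exactly on the rounding grid
on_grid <- mean(income %% round_to == 0)
cat("Share of incomes at round numbers:", round(on_grid, 3), "\n")

# The round number closest to the $12,590 kink is $13,000, only $410 above it
# and therefore inside a window of two $500 bins on either side of the kink
cat("Filers reporting exactly $13,000:", sum(income == 13000), "\n")
```

Because $13,000 lies so close to the kink, mass that piles up there can be mistaken for kink bunching. One thing to explore is how the estimated excess mass changes as `round_share` grows.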
