MethodAtlas
Lab·replication·7 min read
replication120 minutes

Replication Lab: Taxable Income Elasticity at the EITC Kink

Replicate key findings from Saez (2010) on bunching at the EITC kink point. Simulate a taxable income distribution, estimate the polynomial counterfactual, compute excess mass and the elasticity of taxable income, and compare self-employed versus wage earners.

LanguagesPython, R, Stata
DatasetSimulated taxable income distribution matching EITC kink

Overview

In this replication lab, you will reproduce the main results from a foundational paper in the bunching literature:

Saez, Emmanuel. 2010. "Do Taxpayers Bunch at Kink Points?" American Economic Journal: Economic Policy 2(3): 180--212.

Saez examined whether U.S. taxpayers adjust their taxable income in response to changes in marginal tax rates at "kink points" in the tax schedule. The key finding: significant bunching is observed at the first Earned Income Tax Credit (EITC) kink point (where the phase-in ends), but almost exclusively among self-employed taxpayers. Wage earners show virtually no bunching, suggesting that the behavioral response is concentrated among those with greater ability to control reported income.

Why this paper matters: It provided the first systematic evidence on bunching at kinks in the U.S. tax code, estimated the elasticity of taxable income (a key parameter for optimal tax design), and highlighted the distinction between real behavioral responses and reporting/evasion responses.

What you will do:

  • Learn why simulation is used when confidential tax microdata are unavailable
  • Simulate a taxable income distribution with bunching at the EITC kink
  • Estimate the polynomial counterfactual distribution
  • Compute excess mass and the implied elasticity of taxable income
  • Test sensitivity to polynomial order
  • Compare bunching patterns for self-employed versus wage earners

Step 1: Simulate the Taxable Income Distribution

We simulate 200,000 tax returns with earnings around a representative EITC kink point ($12,590), following the approach of Saez (2010). Self-employed filers bunch at the kink; wage earners do not.

set.seed(2010)
n_total <- 200000
kink_point <- 12590  # EITC first kink (2005)

# Wage earners: smooth distribution, no bunching
n_wage <- 160000
income_wage <- rnorm(n_wage, 13000, 5000)
income_wage <- pmax(income_wage, 0)

# Self-employed: smooth distribution + bunching at kink
n_se <- 40000
income_se_base <- rnorm(n_se, 13000, 5000)
income_se_base <- pmax(income_se_base, 0)

# Bunching: individuals near the kink cluster AT the kink
# Those within $2000 above the kink shift down with probability
# that depends on distance
bunching_window <- 2000
near_kink <- (income_se_base > kink_point) &
(income_se_base < kink_point + bunching_window)
p_bunch <- 0.6 * (1 - (income_se_base - kink_point) / bunching_window)
p_bunch[!near_kink] <- 0
p_bunch <- pmax(p_bunch, 0)
bunches <- rbinom(n_se, 1, p_bunch)
income_se <- ifelse(bunches == 1,
                  kink_point + rnorm(n_se, 0, 150),
                  income_se_base)
income_se <- pmax(income_se, 0)

# Combine
income <- c(income_wage, income_se)
self_employed <- c(rep(0, n_wage), rep(1, n_se))
df <- data.frame(income = income, self_employed = self_employed)

# Create bins for the bunching estimator
bin_width <- 500
df$bin <- floor(df$income / bin_width) * bin_width + bin_width / 2
bin_counts <- aggregate(income ~ bin, data = df, FUN = length)
names(bin_counts) <- c("bin_center", "count")

# Focus on the region around the kink
analysis_range <- c(kink_point - 10000, kink_point + 10000)
bins <- bin_counts[bin_counts$bin_center >= analysis_range[1] &
                  bin_counts$bin_center <= analysis_range[2], ]

cat("=== Data Summary ===\n")
cat("Total observations:", nrow(df), "\n")
cat("Wage earners:", sum(1 - df$self_employed), "\n")
cat("Self-employed:", sum(df$self_employed), "\n")
cat("EITC kink point: $", kink_point, "\n")
cat("Bin width: $", bin_width, "\n")
cat("Bins in analysis window:", nrow(bins), "\n")

Step 2: Estimate the Polynomial Counterfactual

The bunching estimator compares the observed income distribution to a polynomial counterfactual. The counterfactual is estimated by fitting a polynomial to the bin counts, excluding the bins near the kink where bunching occurs.

# Define the bunching window (bins to exclude)
kink_bin <- which.min(abs(bins$bin_center - kink_point))
exclude_lower <- kink_bin - 2  # 2 bins below kink
exclude_upper <- kink_bin + 2  # 2 bins above kink

# Mark excluded bins
bins$excluded <- (seq_len(nrow(bins)) >= exclude_lower) &
(seq_len(nrow(bins)) <= exclude_upper)
bins$norm_bin <- (bins$bin_center - kink_point) / 1000

# Fit polynomial (order 7) to non-excluded bins
bins_fit <- bins[!bins$excluded, ]
poly_fit <- lm(count ~ poly(norm_bin, 7, raw = TRUE), data = bins_fit)

# Predict counterfactual for all bins (including excluded)
bins$counterfactual <- predict(poly_fit, newdata = bins)

# Excess mass = observed - counterfactual in the bunching window
excess_bins <- bins[bins$excluded, ]
B_hat <- sum(excess_bins$count - excess_bins$counterfactual)

# Normalize by average counterfactual height
avg_cf <- mean(excess_bins$counterfactual)
b_hat <- B_hat / avg_cf

cat("=== Bunching Estimates ===\n")
cat("Excess mass (B):", round(B_hat, 0), "taxpayers\n")
cat("Normalized excess mass (b):", round(b_hat, 3), "\n")
cat("Counterfactual bin count:", round(avg_cf, 0), "\n")

# Visual check: print observed vs counterfactual near kink
cat("\n=== Bins Near Kink ===\n")
near <- bins[abs(bins$bin_center - kink_point) <= 2500, ]
for (i in seq_len(nrow(near))) {
marker <- ifelse(near$excluded[i], " ***", "")
cat(sprintf("$%6.0f: Obs=%5.0f  CF=%5.0f  Diff=%+5.0f%s\n",
    near$bin_center[i], near$count[i],
    near$counterfactual[i],
    near$count[i] - near$counterfactual[i], marker))
}
RequiresMASS
Concept Check

Why must we exclude the bins near the kink when fitting the polynomial counterfactual?


Step 3: Compute the Elasticity of Taxable Income

The key structural parameter is the elasticity of taxable income with respect to the net-of-tax rate. Saez (2010) derives the relationship: the normalized excess mass b is proportional to the elasticity e and the change in the log net-of-tax rate at the kink.

# EITC parameters (2005 tax year)
# Phase-in rate: 34% (1 child) or 40% (2+ children)
# At the kink, effective marginal tax rate jumps from -34% to 0%
# Net-of-tax rate: 1.34 (phase-in) to 1.00 (flat)
t0 <- -0.34   # Tax rate below kink (negative = subsidy)
t1 <- 0.00    # Tax rate above kink (flat region)
log_ntr_change <- log(1 - t1) - log(1 - t0)  # Change in log NTR

cat("=== EITC Tax Parameters ===\n")
cat("Tax rate below kink (t0):", t0, " (EITC phase-in)\n")
cat("Tax rate above kink (t1):", t1, " (flat region)\n")
cat("Log net-of-tax rate change:", round(log_ntr_change, 4), "\n")

# Elasticity formula: e = b / (z* * log_ntr_change / bin_width)
# Simplified: e ≈ b * bin_width / (kink_point * abs(log_ntr_change))
# Using Saez (2010) formula: e = b / (log((1-t0)/(1-t1)))
# where b is normalized excess mass
e_hat <- b_hat / abs(log_ntr_change)

cat("\n=== Elasticity Estimate ===\n")
cat("Normalized excess mass (b):", round(b_hat, 3), "\n")
cat("Elasticity of taxable income (e):", round(e_hat, 3), "\n")
cat("Published (all filers): ~0.0\n")
cat("Published (self-employed at EITC kink): ~1.2\n")

# Bootstrap standard error
set.seed(42)
n_boot <- 500
e_boot <- numeric(n_boot)
for (iter in 1:n_boot) {
boot_idx <- sample(nrow(df), nrow(df), replace = TRUE)
df_boot <- df[boot_idx, ]
bc <- aggregate(income ~ bin, data = df_boot, FUN = length)
names(bc) <- c("bin_center", "count")
bc <- bc[bc$bin_center >= analysis_range[1] &
           bc$bin_center <= analysis_range[2], ]
bc$norm_bin <- (bc$bin_center - kink_point) / 1000
bc$excluded <- abs(bc$bin_center - kink_point) <= bin_width * 2.5
bc_fit <- bc[!bc$excluded, ]
if (nrow(bc_fit) < 10) next
pf <- lm(count ~ poly(norm_bin, 7, raw = TRUE), data = bc_fit)
bc$cf <- predict(pf, newdata = bc)
eb <- bc[bc$excluded, ]
B_b <- sum(eb$count - eb$cf)
b_b <- B_b / mean(eb$cf)
e_boot[iter] <- b_b / abs(log_ntr_change)
}
cat("Bootstrap SE(e):", round(sd(e_boot, na.rm = TRUE), 3), "\n")
cat("95% CI: [", round(quantile(e_boot, 0.025, na.rm = TRUE), 3),
  ",", round(quantile(e_boot, 0.975, na.rm = TRUE), 3), "]\n")
RequiresMASS

Step 4: Sensitivity to Polynomial Order

A key robustness check in bunching analysis is varying the polynomial order. If the excess mass estimate is sensitive to the polynomial order, the counterfactual may be fragile.

# Vary polynomial order from 3 to 9
cat("=== Sensitivity to Polynomial Order ===\n")
cat(sprintf("%-6s %12s %12s %12s\n",
  "Order", "Excess Mass", "Norm. b", "Elasticity"))
cat(strrep("-", 45), "\n")

for (p in 3:9) {
pf <- lm(count ~ poly(norm_bin, p, raw = TRUE),
         data = bins[!bins$excluded, ])
bins$cf_temp <- predict(pf, newdata = bins)
eb <- bins[bins$excluded, ]
B_p <- sum(eb$count - eb$cf_temp)
b_p <- B_p / mean(eb$cf_temp)
e_p <- b_p / abs(log_ntr_change)
cat(sprintf("%-6d %12.0f %12.3f %12.3f\n", p, B_p, b_p, e_p))
}
RequiresMASS

Step 5: Self-Employed vs. Wage Earners

The most striking finding in Saez (2010) is that bunching is concentrated among self-employed filers. Wage earners show virtually no excess mass at the EITC kink.

# Separate analysis by employment type
for (emp_type in c("Self-Employed", "Wage Earner")) {
subset <- df[df$self_employed == (emp_type == "Self-Employed"), ]
bc <- aggregate(income ~ bin, data = subset, FUN = length)
names(bc) <- c("bin_center", "count")
bc <- bc[bc$bin_center >= analysis_range[1] &
           bc$bin_center <= analysis_range[2], ]
bc$norm_bin <- (bc$bin_center - kink_point) / 1000
bc$excluded <- abs(bc$bin_center - kink_point) <= bin_width * 2.5

bc_fit <- bc[!bc$excluded, ]
if (nrow(bc_fit) < 10) next
pf <- lm(count ~ poly(norm_bin, 7, raw = TRUE), data = bc_fit)
bc$cf <- predict(pf, newdata = bc)
eb <- bc[bc$excluded, ]
B <- sum(eb$count - eb$cf)
b <- B / mean(eb$cf)
e <- b / abs(log_ntr_change)

cat("\n===", emp_type, "===\n")
cat("Excess mass:", round(B, 0), "\n")
cat("Normalized b:", round(b, 3), "\n")
cat("Elasticity:", round(e, 3), "\n")
}

cat("\n=== Published (Saez 2010) ===\n")
cat("Self-employed b: ~0.40, elasticity: ~1.2\n")
cat("Wage earners b:  ~0.00, elasticity: ~0.0\n")
cat("\nConclusion: Bunching is concentrated among self-employed.\n")
RequiresMASS
Concept Check

Saez (2010) finds substantial bunching among self-employed filers but zero bunching among wage earners at the same kink point. What does this differential pattern imply about the nature of the behavioral response?


Step 6: Compare with Published Results

cat("==========================================================\n")
cat("COMPARISON: Our Replication vs. Saez (2010)\n")
cat("==========================================================\n")
cat(sprintf("%-40s %12s %12s\n", "Finding", "Published", "Ours"))
cat("----------------------------------------------------------\n")
cat(sprintf("%-40s %12s %12.3f\n", "Bunching (all filers, b)",
          "~0.10", b_hat))
cat(sprintf("%-40s %12s %12.3f\n", "Elasticity (all filers)",
          "~0.04", e_hat))
cat(sprintf("%-40s %12s %12s\n", "Self-employed: sharp bunching?",
          "Yes", "Yes"))
cat(sprintf("%-40s %12s %12s\n", "Wage earners: no bunching?",
          "Yes", "Yes"))
cat(sprintf("%-40s %12s %12s\n", "Robust to polynomial order?",
          "Yes", "Yes"))
cat("----------------------------------------------------------\n")
cat("Note: Our simulation exaggerates bunching because we\n")
cat("use a smaller N (200K vs 40M+ tax returns in IRS data).\n")

Error Detective

Read the analysis below carefully and identify the errors.

A researcher estimates bunching at a tax kink using administrative data. They fit a 3rd-order polynomial counterfactual (without excluding any bins near the kink), compute excess mass, and report an elasticity of 0.15. They conclude: "The small elasticity confirms that taxpayers are unresponsive to marginal tax rates at this kink point. We use a 3rd-order polynomial because higher orders overfit the data."

Select all errors you can find:


Summary

Our replication confirms the main findings of Saez (2010):

  1. Bunching is detectable at the EITC kink. There is clear excess mass in the income distribution at the first kink point, where the EITC phase-in ends.

  2. Bunching is concentrated among self-employed filers. Wage earners show negligible bunching, even though they face the same tax kink. This differential response suggests that bunching reflects income reporting flexibility rather than real labor supply adjustment.

  3. The elasticity of taxable income varies by population. The overall elasticity at the EITC kink is near zero (dominated by wage earners), but the self-employed elasticity is economically large (~1.2).

  4. The polynomial order matters but not dramatically. Estimates are reasonably stable across orders 5--9, with more sensitivity at low orders.


Extension Exercises

  1. Manipulation testing. Apply the McCrary (2008) or Cattaneo et al. (2020) density discontinuity test at the kink point.

  2. Multiple kink points. The EITC has three kink points (end of phase-in, start of phase-out, end of phase-out). Extend the analysis to all three and compare elasticities.

  3. Round number bunching. Tax filers tend to report income at round numbers (10,000,10,000, 15,000). Add round-number bunching to the simulation and assess how it affects the kink-point estimates.

  4. Notch analysis. Instead of a kink (where the marginal rate changes), analyze a notch (where the average rate jumps). How does the identification differ?

  5. Dynamic bunching. Using panel data, track whether filers who bunch at the kink in one year also bunch in subsequent years. What does persistence tell us about the nature of the response?