MethodAtlas
Replication · 120 minutes

Replication Lab: Synthetic Difference-in-Differences

Replicate the key results from Arkhangelsky et al. (2021) on synthetic difference-in-differences. Simulate a panel with heterogeneous state-specific trends, compare DiD, synthetic control, and SDID, examine the unit and time weight structures, and conduct placebo inference.

Overview

In this replication lab, you will explore the core methodology from a landmark paper in modern causal inference:

Arkhangelsky, Dmitry, Susan Athey, David A. Hirshberg, Guido W. Imbens, and Stefan Wager. 2021. "Synthetic Difference-in-Differences." American Economic Review 111(12): 4088–4118.

Standard difference-in-differences (DiD) assumes parallel trends and gives equal weight to all control units and pre-treatment periods. Synthetic control (SC) optimizes unit weights to match pre-treatment levels but does not difference out common time effects. Synthetic difference-in-differences (SDID) combines the best of both approaches — it reweights control units (like SC) and differences out time effects (like DiD), while also optimizing time weights to concentrate on the most informative pre-treatment periods.
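To fix ideas before the simulation, all three estimators can be written as one weighted double difference on an outcome matrix; they differ only in how the unit weights (omega) and time weights (lambda) are chosen. The base-R sketch below is illustrative: the toy data and uniform weights are placeholders, not the optimized weights the synthdid package computes.

```r
# Weighted double difference shared by DiD, SC, and SDID (base R sketch).
# Y: outcome matrix with rows 1..N0 = controls, row N0+1 = treated unit;
# columns 1..T0 are pre-treatment.  omega: unit weights over controls,
# lambda: time weights over pre-periods.
#   DiD : uniform omega and uniform lambda
#   SC  : optimized omega, no differencing over pre-periods
#   SDID: optimized omega AND lambda inside the double difference
weighted_dd <- function(Y, N0, T0, omega, lambda) {
  Tt <- ncol(Y)
  tr_post <- mean(Y[N0 + 1, (T0 + 1):Tt])
  tr_pre  <- sum(lambda * Y[N0 + 1, 1:T0])
  co_post <- sum(omega * rowMeans(Y[1:N0, (T0 + 1):Tt, drop = FALSE]))
  co_pre  <- sum(omega * (Y[1:N0, 1:T0, drop = FALSE] %*% lambda))
  (tr_post - tr_pre) - (co_post - co_pre)
}

# Toy panel with an exact two-way (unit + time) structure and a
# treatment effect of 3 on the last unit's post-periods:
N0 <- 3; T0 <- 4; Tt <- 6; tau <- 3
alpha <- c(1, 2, 3, 4); beta <- c(0, 1, 2, 3, 4, 5)
Y <- outer(alpha, rep(1, Tt)) + outer(rep(1, N0 + 1), beta)
Y[N0 + 1, (T0 + 1):Tt] <- Y[N0 + 1, (T0 + 1):Tt] + tau

# With uniform weights the formula reduces to plain DiD, which is exact
# here because parallel trends holds in the toy data:
est <- weighted_dd(Y, N0, T0, omega = rep(1/N0, N0), lambda = rep(1/T0, T0))
est  # equals 3
```

In the toy data any choice of lambda recovers tau exactly, because the two-way model holds; the weights only start to matter once unit-specific trends break parallel trends, as in the simulation below.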

Why the Arkhangelsky et al. paper matters: it unified two of the most widely used causal inference methods, DiD and synthetic control, into a single framework with theoretical guarantees and practical algorithms. In the paper's simulations and empirical applications, SDID performs at least as well as the better of DiD and SC.

What you will do:

  • Simulate a state-level panel with heterogeneous unit-specific trends
  • Estimate treatment effects using standard TWFE DiD
  • Estimate treatment effects using synthetic control
  • Implement the full SDID estimator with unit and time weights
  • Compare all three estimators against the true ATT
  • Conduct placebo-based inference

Step 1: Simulate the State-Level Panel

The DGP features a balanced panel of 40 states over 30 periods. State 1 receives treatment at period 21. Each state has a fixed effect, a state-specific linear trend, and idiosyncratic noise. The differential trends violate the parallel trends assumption.

library(synthdid)
library(fixest)

set.seed(2021)

N <- 40; T_total <- 30; T_pre <- 20; tau_true <- 5.0

alpha <- rnorm(N, 50, 10)
delta <- cumsum(rnorm(T_total, 0.3, 0.15))
gamma <- rnorm(N, 0, 0.15)
gamma[1] <- 0.4  # treated state

df <- expand.grid(state = 1:N, time = 1:T_total)
df <- df[order(df$state, df$time), ]
df$Y <- alpha[df$state] + delta[df$time] +
      gamma[df$state] * df$time + rnorm(nrow(df), 0, 1)
df$treated_unit <- as.integer(df$state == 1)
df$post <- as.integer(df$time > T_pre)
df$D <- df$treated_unit * df$post
df$Y[df$D == 1] <- df$Y[df$D == 1] + tau_true

cat("Panel:", N, "states x", T_total, "periods =", nrow(df), "obs\n")
cat("Treated: state 1, from period", T_pre + 1, "\n")
cat(sprintf("Pre-treatment: %d, Post-treatment: %d\n", T_pre, T_total - T_pre))
cat(sprintf("True ATT: %.1f\n", tau_true))
cat(sprintf("\nTreated state trend: gamma = %.2f\n", gamma[1]))
cat(sprintf("Control trends: mean ~ %.2f, sd ~ %.2f\n",
            mean(gamma[-1]), sd(gamma[-1])))

Expected output:

Panel: 40 states x 30 periods = 1200 obs
Treated: state 1, from period 21
Pre-treatment: 20, Post-treatment: 10
True ATT: 5.0

Treated state trend: gamma = 0.40
Control trends: mean ~ 0.00, sd ~ 0.15

Step 2: Standard DiD (TWFE) Estimator

Standard TWFE DiD assumes parallel trends. When state-specific trends differ, TWFE is biased.

# TWFE DiD
did_est <- feols(Y ~ D | state + time, data = df)
cat("TWFE:", coef(did_est)["D"], "\n")
cat("True:", tau_true, "\n")
cat("Bias:", coef(did_est)["D"] - tau_true, "\n")

Expected output:

Method      Estimate   Bias
TWFE DiD    ~11.0      +6.0
True ATT    5.00       ---
Concept Check

The TWFE DiD estimate is biased upward by roughly 6. What is the source of the bias?


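The size of the distortion can be checked analytically. With a single treated unit and a balanced panel, the TWFE coefficient on D reduces to the simple 2x2 difference of group means, so the non-parallel-trends bias has a closed form. A back-of-the-envelope check in base R, using the DGP constants from Step 1:

```r
# Analytic TWFE bias under differential trends:
#   bias = (treated trend - mean control trend)
#          x (mean post-treatment date - mean pre-treatment date)
T_pre <- 20; T_total <- 30
gamma_treated <- 0.4       # treated state's trend, as set in Step 1
gamma_control_mean <- 0    # control trends are drawn with mean zero
time_gap <- mean((T_pre + 1):T_total) - mean(1:T_pre)   # 25.5 - 10.5 = 15
bias <- (gamma_treated - gamma_control_mean) * time_gap
bias  # 6: the trend differential accounts for the upward bias
```

In any finite sample the realized mean of the 39 control trends differs slightly from zero, so the simulated estimate fluctuates around this analytic value.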
Step 3: Synthetic Control Estimator

Synthetic control constructs a weighted combination of control units that matches the treated unit's pre-treatment trajectory.

setup <- panel.matrices(df, unit = "state", time = "time",
                       outcome = "Y", treatment = "D")

sc_est <- sc_estimate(setup$Y, setup$N0, setup$T0)
cat("SC:", c(sc_est), "\n")
cat("Bias:", c(sc_est) - tau_true, "\n")

Expected output:

Method              Estimate   Bias
Synthetic Control   ~5.40      +0.40
True ATT            5.00       ---

SC performs better than DiD by finding control states with similar trends. However, SC matches levels without differencing, so imperfect fit creates residual bias.
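What is the weight optimization actually doing? The synthdid package solves a regularized constrained least-squares problem over all control units; the deliberately tiny base-R illustration below strips that down to two controls, so the simplex constraint leaves a single free parameter that optimize() can search directly. The function name sc_weight_2 is made up for this sketch.

```r
# Tiny illustration of synthetic-control weighting (base R only).
# With two control units the weights are (w, 1 - w) with w in [0, 1],
# so matching the treated unit's pre-treatment path is one-dimensional.
sc_weight_2 <- function(y0a, y0b, y_treated_pre) {
  pre_fit_loss <- function(w) sum((w * y0a + (1 - w) * y0b - y_treated_pre)^2)
  optimize(pre_fit_loss, interval = c(0, 1), tol = 1e-8)$minimum
}

# Controls with different trends; treated pre-path is a 70/30 mix:
y0a <- 1:10          # steep control
y0b <- rep(5, 10)    # flat control
y_tr <- 0.7 * y0a + 0.3 * y0b
w_hat <- sc_weight_2(y0a, y0b, y_tr)
w_hat  # close to 0.7
```

Because the treated path here is an exact convex combination of the controls, the recovered weight reproduces the pre-treatment trajectory perfectly; in the simulation no exact combination exists, which is precisely the residual-fit bias discussed above.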


Step 4: Synthetic Difference-in-Differences (SDID)

SDID combines unit weights (like SC), time weights (unique to SDID), and differencing (like DiD). The time weights concentrate on the most informative pre-treatment periods.

# SDID using synthdid package
sdid_est <- synthdid_estimate(setup$Y, setup$N0, setup$T0)
did_est_pkg <- did_estimate(setup$Y, setup$N0, setup$T0)

cat("=== Comparison ===\n")
cat("DiD:", c(did_est_pkg), " bias:", c(did_est_pkg) - tau_true, "\n")
cat("SC:", c(sc_est), " bias:", c(sc_est) - tau_true, "\n")
cat("SDID:", c(sdid_est), " bias:", c(sdid_est) - tau_true, "\n")
cat("True:", tau_true, "\n")

# Examine weights
sdid_w <- attr(sdid_est, "weights")
cat("\nUnit weights (top 5):\n")
omega <- sdid_w$omega
top5 <- order(omega, decreasing = TRUE)[1:5]
for (i in top5) cat("  State", i+1, ":", round(omega[i], 4), "\n")

cat("\nTime weights on last 5 periods:",
  round(tail(sdid_w$lambda, 5), 3), "\n")

# Plot
plot(sdid_est, main = "Synthetic DiD")

Expected output:

Method              Estimate   Bias
DiD (TWFE)          ~11.0      +6.0
Synthetic Control   ~5.40      +0.40
SDID                ~5.10      +0.10
True ATT            5.00       ---
Time weight concentration:
  Last 5 pre-periods:  ~0.75
  First 5 pre-periods: ~0.05

SDID produces the closest estimate to the true ATT. The time weights concentrate on the later pre-treatment periods, which are most informative for extrapolating the counterfactual.

Concept Check

SDID outperforms both DiD and SC. What specific combination of features allows SDID to achieve smaller bias than either method alone?


Step 5: Placebo Inference

# Placebo inference: estimate SE by treating each control unit as if it were treated
# vcov(..., method = "placebo") runs SDID on each control unit and takes the SD of results
se_sdid <- sqrt(vcov(sdid_est, method = "placebo"))
# t-statistic: SDID estimate divided by placebo-based standard error
t_stat <- c(sdid_est) / se_sdid

cat("=== Placebo Inference ===\n")
cat("SDID:", c(sdid_est), "\n")
cat("SE:", se_sdid, "\n")
cat("t:", t_stat, "\n")
# 95% confidence interval using normal approximation
ci <- c(sdid_est) + c(-1, 1) * 1.96 * se_sdid
cat("95% CI: [", ci[1], ",", ci[2], "]\n")
# Check whether the CI contains the true treatment effect (simulation diagnostic)
cat("Covers true:", ci[1] <= tau_true & ci[2] >= tau_true, "\n")

Expected output — Placebo inference:

SDID estimate:   ~5.10
Placebo SE:      ~0.50
t-statistic:     ~10.2
95% CI:          [~4.12, ~6.08]
Covers true ATT: TRUE
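The vcov(..., method = "placebo") call hides the mechanics, but the idea is easy to hand-roll. The base-R sketch below is illustrative only: for brevity it re-estimates a plain DiD for each placebo unit, whereas synthdid re-runs the full SDID estimator; the function names (did_for_unit, placebo_se) are made up for this sketch.

```r
# Hand-rolled placebo inference on an N x Tt outcome matrix Y where the
# truly treated unit is row N and columns (T0+1):Tt are post-treatment.
did_for_unit <- function(Y, unit, T0) {
  Tt <- ncol(Y)
  others <- setdiff(seq_len(nrow(Y)), unit)
  (mean(Y[unit, (T0 + 1):Tt]) - mean(Y[unit, 1:T0])) -
    (mean(Y[others, (T0 + 1):Tt]) - mean(Y[others, 1:T0]))
}

placebo_se <- function(Y, N0, T0) {
  Y_controls <- Y[1:N0, , drop = FALSE]   # drop the truly treated unit
  placebo_ests <- sapply(1:N0, function(i) did_for_unit(Y_controls, i, T0))
  sd(placebo_ests)                        # spread of the placebo estimates
}

# Toy example with an exact two-way structure:
set.seed(1)
N <- 10; Tt <- 8; T0 <- 5
Y <- outer(rnorm(N), rep(1, Tt)) + outer(rep(1, N), rnorm(Tt))
Y[N, (T0 + 1):Tt] <- Y[N, (T0 + 1):Tt] + 2   # true effect of 2 on unit N
se_hat <- placebo_se(Y, N0 = N - 1, T0 = T0)
se_hat  # exactly 0 here: the placebo units satisfy parallel trends
```

On the toy data every placebo estimate is exactly zero because the controls share a common trend; on the simulated panel from Step 1 the placebo spread reflects both the noise and the heterogeneous state trends.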
Concept Check

SDID achieves both smaller bias and smaller variance than SC in this design. What explains the variance reduction?


Summary

The replication of Arkhangelsky et al. (2021) demonstrates:

  1. DiD fails under differential trends. When the treated unit has a steeper trajectory, TWFE absorbs the trend into the treatment effect estimate.

  2. SC improves on DiD by reweighting. Finding control units with similar trajectories reduces but does not eliminate bias.

  3. SDID combines the best of both. Unit reweighting (like SC) plus differencing (like DiD) plus time reweighting achieves a form of double robustness.

  4. SDID is efficient. It achieves both smaller bias and smaller variance than SC, and dramatically smaller bias than DiD.

  5. Placebo inference works. Assigning treatment to each control unit in turn produces a valid null distribution for constructing p-values and confidence intervals.


Extension Exercises

  1. Multiple treated units. Modify the DGP so that 5 states receive treatment. How do DiD, SC, and SDID compare with multiple treated units?

  2. Staggered adoption. Have different states adopt treatment in different periods. Combine SDID with staggered adoption methods.

  3. Stronger trends. Increase gamma from 0.4 to 1.0. How does DiD bias scale with the trend differential? Does SDID remain robust?

  4. Short pre-treatment. Reduce T_pre from 20 to 5 periods. How does the quality of SDID unit weights degrade?

  5. Non-stationary errors. Replace i.i.d. noise with AR(1) errors. Does SDID inference remain valid?

  6. Covariates. Add time-varying covariates that partially explain state-specific trends. Estimate an augmented SDID.

  7. Jackknife inference. Compare placebo-based SEs with jackknife SEs. When do the two approaches differ?

  8. California Proposition 99. Apply SDID to the classic Proposition 99 dataset (available in the synthdid package). Compare SDID, SC, and DiD estimates.
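As a starting point for exercise 5, serially correlated disturbances can be generated per state with base R's arima.sim. The sketch below produces only the error vector; the choice rho = 0.6 is an illustrative assumption, and the innovation sd is scaled so the marginal variance matches the i.i.d. N(0, 1) errors of Step 1.

```r
# AR(1) errors for exercise 5 (base R starter).  Each state gets its own
# AR(1) series with autocorrelation rho; innovations are scaled so the
# marginal variance is 1.
set.seed(2021)
N <- 40; T_total <- 30; rho <- 0.6
ar1_errors <- function(n_periods, rho) {
  as.numeric(arima.sim(model = list(ar = rho), n = n_periods,
                       sd = sqrt(1 - rho^2)))
}
eps <- unlist(lapply(1:N, function(i) ar1_errors(T_total, rho)))
length(eps)  # 1200: one draw per state-period, ordered by state then time
```

Because df in Step 1 is sorted by state and then time, eps can be substituted directly for the rnorm(nrow(df), 0, 1) term when building Y.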