MethodAtlas
Lab · Replication · Estimated time: 120 minutes

Replication Lab: Synthetic Difference-in-Differences

Replicate the key results from Arkhangelsky et al. (2021) on synthetic difference-in-differences. Simulate a panel with heterogeneous state-specific trends, compare DiD, synthetic control, and SDID, examine the unit and time weight structures, and conduct placebo inference.

Languages: Python, R, Stata
Dataset: Simulated state-level panel matching the Arkhangelsky et al. (2021) DGP

Overview

In this replication lab, you will explore the core methodology from a landmark paper in modern causal inference:

Arkhangelsky, Dmitry, Susan Athey, David A. Hirshberg, Guido W. Imbens, and Stefan Wager. 2021. "Synthetic Difference-in-Differences." American Economic Review 111(12): 4088–4118.

Standard difference-in-differences (DiD) assumes parallel trends and gives equal weight to all control units and pre-treatment periods. Synthetic control (SC) optimizes unit weights to match pre-treatment levels but does not difference out common time effects. Synthetic difference-in-differences (SDID) combines the best of both approaches — it reweights control units (like SC) and differences out time effects (like DiD), while also optimizing time weights to concentrate on the most informative pre-treatment periods.
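Concretely, the SDID estimator is defined as the treatment coefficient from a weighted two-way fixed-effects problem (restating the paper's definition, with unit weights omega-hat and time weights lambda-hat):

```latex
(\hat{\tau}^{\mathrm{sdid}}, \hat{\mu}, \hat{\alpha}, \hat{\beta})
  = \arg\min_{\tau,\mu,\alpha,\beta}
    \sum_{i=1}^{N}\sum_{t=1}^{T}
    \bigl(Y_{it} - \mu - \alpha_i - \beta_t - W_{it}\,\tau\bigr)^{2}\,
    \hat{\omega}_i\,\hat{\lambda}_t
```

Setting all weights to uniform values recovers ordinary DiD; synthetic control corresponds (roughly) to dropping the unit fixed effects and the time weights while keeping optimized unit weights.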

Why the Arkhangelsky et al. (2021) paper matters: It unified two of the most widely used causal inference methods — DiD and synthetic control — into a single framework, providing theoretical guarantees and practical algorithms. Their Monte Carlo evidence and analytical results suggest that SDID tends to perform at least as well as the better of DiD or SC across a range of data-generating processes.

What you will do:

  • Simulate a state-level panel with heterogeneous unit-specific trends
  • Estimate treatment effects using standard TWFE DiD
  • Estimate treatment effects using synthetic control
  • Implement the full SDID estimator with unit and time weights
  • Compare all three estimators against the true ATT
  • Conduct placebo-based inference

Step 1: Simulate the State-Level Panel

The DGP features a balanced panel of 40 states over 30 periods. State 1 receives treatment at period 21. Each state has a fixed effect, a state-specific linear trend, and idiosyncratic noise. The differential trends violate the parallel trends assumption.

library(synthdid)  # Arkhangelsky et al. (2021) SDID implementation
library(fixest)    # fast fixed-effects estimation for TWFE comparison

set.seed(2021)

N <- 40; T_total <- 30; T_pre <- 20; tau_true <- 5.0

# State fixed effects: permanent level differences across states
alpha <- rnorm(N, 50, 10)
# Time fixed effects: common macro shocks (random walk)
delta <- cumsum(rnorm(T_total, 0.3, 0.15))
# State-specific linear trends: key source of parallel trends violation
gamma <- rnorm(N, 0, 0.15)
gamma[1] <- 0.4  # treated state has steeper trend than controls

# Build balanced panel
df <- expand.grid(state = 1:N, time = 1:T_total)
df <- df[order(df$state, df$time), ]
# Outcome = state FE + time FE + differential trend + noise
df$Y <- alpha[df$state] + delta[df$time] +
      gamma[df$state] * df$time + rnorm(nrow(df), 0, 1)
df$treated_unit <- as.integer(df$state == 1)
df$post <- as.integer(df$time > T_pre)
df$D <- df$treated_unit * df$post         # treatment indicator
df$Y[df$D == 1] <- df$Y[df$D == 1] + tau_true  # add known treatment effect

cat("Panel:", N, "states x", T_total, "periods =", nrow(df), "obs\n")
cat("Treated: state 1, from period", T_pre + 1, "\n")
cat("Pre-treatment:", T_pre, " Post-treatment:", T_total - T_pre, "\n")
cat("True ATT:", tau_true, "\n\n")
cat("Treated state trend: gamma =", gamma[1], "\n")
cat("Control trends: mean ~", round(mean(gamma[-1]), 2),
    " sd ~", round(sd(gamma[-1]), 2), "\n")

Expected output:

Panel: 40 states x 30 periods = 1200 obs
Treated: state 1, from period 21
Pre-treatment: 20, Post-treatment: 10
True ATT: 5.0

Treated state trend: gamma = 0.40
Control trends: mean ~ 0.00, sd ~ 0.15

Step 2: Standard DiD (TWFE) Estimator

Standard TWFE DiD assumes parallel trends. When the treated state's trend differs from the controls' trends, the TWFE estimate absorbs the trend gap into the treatment coefficient and is biased.

# TWFE DiD: state + time FEs absorb additive level differences
# Under parallel trends TWFE is unbiased; here differential trends cause bias
did_est <- feols(Y ~ D | state + time, data = df)
cat("TWFE:", coef(did_est)["D"], "\n")
cat("True:", tau_true, "\n")
cat("Bias:", coef(did_est)["D"] - tau_true, "\n")  # positive bias from steeper treated trend

Expected output:

Method      Estimate   Bias
TWFE DiD    ~8–12      +3 to +7
True ATT    5.00       ---

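A back-of-envelope check in Python (one of the lab's listed languages) shows where the bias magnitude comes from. With a single treated unit in a balanced panel, the TWFE DiD bias under this DGP is the trend differential times the gap between the average post- and pre-treatment time indices; the parameter values below mirror the simulation.

```python
# Back-of-envelope TWFE bias under the lab's DGP (a sketch, not the R run):
# bias ≈ (treated trend - mean control trend) * (mean post t - mean pre t)
T_pre, T_post = 20, 10
gamma_treated = 0.4          # treated state's trend in the DGP
gamma_controls_mean = 0.0    # control trends are mean-zero in expectation

mean_t_pre = sum(range(1, T_pre + 1)) / T_pre                     # 10.5
mean_t_post = sum(range(T_pre + 1, T_pre + T_post + 1)) / T_post  # 25.5

bias = (gamma_treated - gamma_controls_mean) * (mean_t_post - mean_t_pre)
print(bias)   # expected bias is about 6, so the DiD estimate centers near 11
```

This is why the expected TWFE estimate lands well above the true ATT of 5: the trend gap accumulates over the post-treatment window.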
Concept Check

The TWFE DiD estimate is substantially biased upward. What is the source of the bias?


Step 3: Synthetic Control Estimator

Synthetic control constructs a weighted combination of control units that matches the treated unit's pre-treatment trajectory.

# Reshape panel into the matrix format required by synthdid
setup <- panel.matrices(df, unit = "state", time = "time",
                       outcome = "Y", treatment = "D")

# SC matches pre-treatment levels without differencing — residual bias from imperfect fit
sc_est <- sc_estimate(setup$Y, setup$N0, setup$T0)
cat("SC:", c(sc_est), "\n")
cat("Bias:", c(sc_est) - tau_true, "\n")  # smaller than TWFE but not zero

Expected output:

Method              Estimate   Bias
Synthetic Control   ~5.40      +0.40
True ATT            5.00       ---

SC performs better than DiD by finding control states with similar trends. However, SC matches levels without differencing, so imperfect fit creates residual bias.
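The core of SC is a least-squares fit with weights constrained to the simplex (nonnegative, summing to one). A minimal Python sketch using Frank-Wolfe illustrates the constrained problem on made-up toy data; this mirrors the idea only, not synthdid's actual solver, which also includes an intercept shift and ridge regularization.

```python
import random

random.seed(0)
T_pre, N_co = 12, 4                     # toy sizes, not the lab's panel
# Control pre-treatment outcomes Y[i][t]
Y = [[random.gauss(0, 1) for _ in range(T_pre)] for _ in range(N_co)]
w_true = [0.5, 0.3, 0.2, 0.0]           # treated is an exact convex combination
y_tr = [sum(w_true[i] * Y[i][t] for i in range(N_co)) for t in range(T_pre)]

def sse(w):
    # pre-treatment fit of the weighted control combination
    return sum((sum(w[i] * Y[i][t] for i in range(N_co)) - y_tr[t]) ** 2
               for t in range(T_pre))

w = [1.0 / N_co] * N_co                 # start at uniform weights (on the simplex)
sse0 = sse(w)
for k in range(2000):                   # Frank-Wolfe: iterates stay on the simplex
    resid = [sum(w[i] * Y[i][t] for i in range(N_co)) - y_tr[t]
             for t in range(T_pre)]
    grad = [2 * sum(Y[i][t] * resid[t] for t in range(T_pre))
            for i in range(N_co)]
    j = grad.index(min(grad))           # best vertex of the simplex
    step = 2.0 / (k + 2)                # standard Frank-Wolfe step size
    w = [(1 - step) * wi + (step if i == j else 0.0)
         for i, wi in enumerate(w)]
```

The convex-combination update keeps the weights nonnegative and summing to one at every step, which is exactly the constraint SC imposes on its donor weights.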


Step 4: Synthetic Difference-in-Differences (SDID)

SDID combines unit weights (like SC), time weights (unique to SDID), and differencing (like DiD). The time weights concentrate on the most informative pre-treatment periods.

# SDID: combines unit reweighting (like SC) + differencing (like DiD) + time weights
sdid_est <- synthdid_estimate(setup$Y, setup$N0, setup$T0)
# DiD from the synthdid package for consistent comparison
did_est_pkg <- did_estimate(setup$Y, setup$N0, setup$T0)

# Compare all three estimators against the true ATT
cat("=== Comparison ===\n")
cat("DiD:", c(did_est_pkg), " bias:", c(did_est_pkg) - tau_true, "\n")
cat("SC:", c(sc_est), " bias:", c(sc_est) - tau_true, "\n")
cat("SDID:", c(sdid_est), " bias:", c(sdid_est) - tau_true, "\n")
cat("True:", tau_true, "\n")

# Examine SDID weights — unit weights select similar donors, time weights
# concentrate on the most informative pre-treatment periods
sdid_w <- attr(sdid_est, "weights")
cat("\nUnit weights (top 5):\n")
omega <- sdid_w$omega
top5 <- order(omega, decreasing = TRUE)[1:5]
for (i in top5) cat("  State", i+1, ":", round(omega[i], 4), "\n")

# Time weights should concentrate on later pre-treatment periods
cat("\nTime weights on last 5 periods:",
  round(tail(sdid_w$lambda, 5), 3), "\n")

# Diagnostic plot: treated unit, synthetic, and the SDID counterfactual
plot(sdid_est, main = "Synthetic DiD")

Expected output:

Method              Estimate   Bias
DiD (TWFE)          ~7.20      +2.20
Synthetic Control   ~5.40      +0.40
SDID                ~5.10      +0.10
True ATT            5.00       ---

Time weight concentration:
  Last 5 pre-periods:  ~0.75
  First 5 pre-periods: ~0.05

In this simulation, SDID produces the closest estimate to the true ATT. The time weights concentrate on the later pre-treatment periods, which are most informative for extrapolating the counterfactual.
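The weighted double difference at the heart of SDID can be verified by hand on a tiny panel (a Python sketch with made-up numbers, not the simulation above): when the omega-weighted average of the control trends matches the treated trend, the estimator recovers tau exactly, even though no single control matches.

```python
# Tiny hand-built panel: 2 controls + 1 treated, 3 pre + 2 post periods.
# All values are illustrative; the true treatment effect is tau = 5.
T_pre, T_post, tau = 3, 2, 5.0
alphas = {"c1": 10.0, "c2": 20.0, "tr": 15.0}   # unit fixed effects
gammas = {"c1": 0.2, "c2": 0.6, "tr": 0.4}      # unit-specific linear trends
delta = [1.0, 0.5, 2.0, 1.5, 3.0]               # common time shocks

def y(unit, t):                                  # outcome at period t = 1..5
    base = alphas[unit] + delta[t - 1] + gammas[unit] * t
    return base + (tau if unit == "tr" and t > T_pre else 0.0)

omega = {"c1": 0.5, "c2": 0.5}  # unit weights: 0.5*0.2 + 0.5*0.6 = 0.4 = treated trend
lam = [1.0 / T_pre] * T_pre     # uniform time weights (SDID would optimize these)

def pre_w(u):                   # lambda-weighted pre-treatment average
    return sum(l * y(u, t + 1) for t, l in enumerate(lam))

def post_mean(u):               # simple post-treatment average
    return sum(y(u, t) for t in range(T_pre + 1, T_pre + T_post + 1)) / T_post

# SDID double difference: treated change minus omega-weighted control change
tau_hat = (post_mean("tr") - pre_w("tr")) - sum(
    w * (post_mean(c) - pre_w(c)) for c, w in omega.items())
print(tau_hat)   # recovers tau = 5 up to float rounding
```

The fixed effects, common shocks, and trends all cancel in the double difference once the weighted control trend equals the treated trend, which is exactly the matching that the optimized omega weights aim for.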

Concept Check

In this simulation, SDID outperforms both DiD and SC. What specific combination of features allows SDID to achieve smaller bias than either method alone?


Step 5: Placebo Inference and Monte Carlo

# Placebo inference: run SDID on each control unit to build the null distribution
# vcov(..., method = "placebo") computes the SD of placebo estimates as the SE
se_sdid <- sqrt(vcov(sdid_est, method = "placebo"))
t_stat <- c(sdid_est) / se_sdid  # large t = estimate is significant

cat("=== Placebo Inference ===\n")
cat("SDID:", c(sdid_est), "\n")
cat("SE:", se_sdid, "\n")
cat("t:", t_stat, "\n")
# 95% CI via normal approximation (valid asymptotically)
ci <- c(sdid_est) + c(-1, 1) * 1.96 * se_sdid
cat("95% CI: [", ci[1], ",", ci[2], "]\n")
# Simulation check: does the CI contain the known true effect?
cat("Covers true:", ci[1] <= tau_true & ci[2] >= tau_true, "\n")

Expected output — Placebo inference:

SDID estimate:   ~5.10
Placebo mean:    ~0.01
Placebo SD (SE): ~0.50
t-statistic:     ~10.2
Placebo p-value: 0.000
95% CI:          [~4.12, ~6.08]
Covers true ATT: TRUE

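The Monte Carlo half of this step is left to the R workflow; a minimal Python sketch of the DiD arm alone (a stylized DGP with control trends fixed at zero, an assumption for brevity) shows how the bias concentrates around the trend gap times 15 across replications. The SC and SDID arms would loop synthdid over simulated panels in R.

```python
import random

random.seed(2021)
N, T_pre, T_post, tau = 40, 20, 10, 5.0
g_treated, reps = 0.4, 200      # trend gap and replication count (assumptions)

def unit_mean(gamma, periods, effect=0.0):
    # Mean outcome over the given periods; unit and time fixed effects are
    # omitted because they cancel exactly in the 2x2 difference below.
    return sum(gamma * t + random.gauss(0, 1) + effect
               for t in periods) / len(periods)

def did_once():
    pre = range(1, T_pre + 1)
    post = range(T_pre + 1, T_pre + T_post + 1)
    tr = unit_mean(g_treated, post, tau) - unit_mean(g_treated, pre)
    co = sum(unit_mean(0.0, post) - unit_mean(0.0, pre)
             for _ in range(N - 1)) / (N - 1)
    return tr - co   # equals TWFE DiD when there is a single treated unit

estimates = [did_once() for _ in range(reps)]
mean_bias = sum(estimates) / reps - tau
print(round(mean_bias, 2))   # concentrates near 0.4 * (25.5 - 10.5) = 6
```

Averaging over replications separates the systematic trend bias from per-draw noise, which is the point of the Monte Carlo comparison in the paper.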
Concept Check

SDID has both smaller bias and smaller variance than SC in the Monte Carlo. What explains the variance reduction?


Summary

The replication of Arkhangelsky et al. (2021) demonstrates:

  1. DiD fails under differential trends. When the treated unit has a steeper trajectory, TWFE absorbs the trend into the treatment effect estimate.

  2. SC improves on DiD by reweighting. Finding control units with similar trajectories reduces but does not eliminate bias.

  3. SDID combines the best of both. Unit reweighting (like SC) plus differencing (like DiD) plus time reweighting achieves a form of double robustness.

  4. SDID is efficient. Both smaller bias and smaller variance than SC, and dramatically smaller bias than DiD.

  5. Placebo inference works. Assigning treatment to each control unit in turn produces a valid null distribution for constructing p-values and confidence intervals.
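Summary point 5 can be illustrated without synthdid: a minimal Python placebo loop (using a simple 2x2 DiD in place of SDID, on a stylized DGP with hypothetical parameters) reassigns treatment to each control in turn and compares the actual estimate against the resulting null distribution.

```python
import random

random.seed(7)
N, T_pre, T_post, tau, g_tr = 40, 20, 10, 5.0, 0.4  # stylized DGP values

def simulate(trend, effect):
    # One unit's (pre mean, post mean); fixed effects omitted because
    # they cancel in the 2x2 difference-in-differences below
    pre = [trend * t + random.gauss(0, 1) for t in range(1, T_pre + 1)]
    post = [trend * t + random.gauss(0, 1) + effect
            for t in range(T_pre + 1, T_pre + T_post + 1)]
    return sum(pre) / T_pre, sum(post) / T_post

units = [simulate(0.0, 0.0) for _ in range(N - 1)]   # controls, zero trend
tr_pre, tr_post = simulate(g_tr, tau)                # treated unit

def did(pre, post, pool):
    co_pre = sum(p for p, _ in pool) / len(pool)
    co_post = sum(q for _, q in pool) / len(pool)
    return (post - pre) - (co_post - co_pre)

actual = did(tr_pre, tr_post, units)
# Placebo: pretend each control was treated, estimate against the rest
placebos = [did(units[j][0], units[j][1], units[:j] + units[j + 1:])
            for j in range(N - 1)]
p_value = sum(abs(pb) >= abs(actual) for pb in placebos) / len(placebos)
```

The placebo estimates cluster around zero while the actual estimate sits far in the tail, so the placebo p-value is effectively zero, mirroring the logic the synthdid package applies with `method = "placebo"`.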


Extension Exercises

  1. Multiple treated units. Modify the DGP so that 5 states receive treatment. How do DiD, SC, and SDID compare with multiple treated units?

  2. Staggered adoption. Have different states adopt treatment in different periods. Combine SDID with staggered adoption methods.

  3. Stronger trends. Increase gamma from 0.4 to 1.0. How does DiD bias scale with the trend differential? Does SDID remain robust?

  4. Short pre-treatment. Reduce T_pre from 20 to 5 periods. How does the quality of SDID unit weights degrade?

  5. Non-stationary errors. Replace i.i.d. noise with AR(1) errors. Does SDID inference remain valid?

  6. Covariates. Add time-varying covariates that partially explain state-specific trends. Estimate an augmented SDID.

  7. Jackknife inference. Compare placebo-based SEs with jackknife SEs. When do the two approaches differ?

  8. California Proposition 99. Apply SDID to the classic Proposition 99 dataset (available in the synthdid package). Compare SDID, SC, and DiD estimates.