MethodAtlas
Replication · 120 minutes

Replication Lab: Synthetic Difference-in-Differences

Replicate the key results from Arkhangelsky et al. (2021) on synthetic difference-in-differences. Simulate a panel with heterogeneous state-specific trends, compare DiD, synthetic control, and SDID, examine the unit and time weight structures, and conduct placebo inference.

Overview

In this replication lab, you will explore the core methodology from a landmark paper in modern causal inference:

Arkhangelsky, Dmitry, Susan Athey, David A. Hirshberg, Guido W. Imbens, and Stefan Wager. 2021. "Synthetic Difference-in-Differences." American Economic Review 111(12): 4088–4118.

Standard difference-in-differences (DiD) assumes parallel trends and gives equal weight to all control units and pre-treatment periods. Synthetic control (SC) optimizes unit weights to match pre-treatment levels but does not difference out common time effects. Synthetic difference-in-differences (SDID) combines the best of both approaches — it reweights control units (like SC) and differences out time effects (like DiD), while also optimizing time weights to concentrate on the most informative pre-treatment periods.
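To fix ideas before the simulation, all three estimators can be written as one weighted double difference on an outcome matrix; they differ only in how the unit weights (omega) and time weights (lambda) are chosen. The base-R sketch below is illustrative: the toy data and uniform weights are placeholders, not the optimized weights the synthdid package computes.

```r
# Weighted double difference shared by DiD, SC, and SDID (base R sketch).
# Y: outcome matrix with rows 1..N0 = controls, row N0+1 = treated unit;
# columns 1..T0 are pre-treatment.  omega: unit weights over controls,
# lambda: time weights over pre-periods.
#   DiD : uniform omega and uniform lambda
#   SC  : optimized omega, no differencing over pre-periods
#   SDID: optimized omega AND lambda inside the double difference
weighted_dd <- function(Y, N0, T0, omega, lambda) {
  Tt <- ncol(Y)
  tr_post <- mean(Y[N0 + 1, (T0 + 1):Tt])
  tr_pre  <- sum(lambda * Y[N0 + 1, 1:T0])
  co_post <- sum(omega * rowMeans(Y[1:N0, (T0 + 1):Tt, drop = FALSE]))
  co_pre  <- sum(omega * (Y[1:N0, 1:T0, drop = FALSE] %*% lambda))
  (tr_post - tr_pre) - (co_post - co_pre)
}

# Toy panel with an exact two-way (unit + time) structure and a
# treatment effect of 3 on the last unit's post-periods:
N0 <- 3; T0 <- 4; Tt <- 6; tau <- 3
alpha <- c(1, 2, 3, 4); beta <- c(0, 1, 2, 3, 4, 5)
Y <- outer(alpha, rep(1, Tt)) + outer(rep(1, N0 + 1), beta)
Y[N0 + 1, (T0 + 1):Tt] <- Y[N0 + 1, (T0 + 1):Tt] + tau

# With uniform weights the formula reduces to plain DiD, which is exact
# here because parallel trends holds in the toy data:
est <- weighted_dd(Y, N0, T0, omega = rep(1/N0, N0), lambda = rep(1/T0, T0))
est  # equals 3
```

In the toy data any choice of lambda recovers tau exactly, because the two-way model holds; the weights only start to matter once unit-specific trends break parallel trends, as in the simulation below.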

Why the Arkhangelsky et al. paper matters: it unified two of the most widely used causal inference methods, DiD and synthetic control, into a single framework with theoretical guarantees and practical algorithms. In the paper's simulations and empirical applications, SDID performs at least as well as the better of DiD and SC.

What you will do:

  • Simulate a state-level panel with heterogeneous unit-specific trends
  • Estimate treatment effects using standard TWFE DiD
  • Estimate treatment effects using synthetic control
  • Implement the full SDID estimator with unit and time weights
  • Compare all three estimators against the true ATT
  • Conduct placebo-based inference

Step 1: Simulate the State-Level Panel

The DGP features a balanced panel of 40 states over 30 periods. State 1 receives treatment at period 21. Each state has a fixed effect, a state-specific linear trend, and idiosyncratic noise. The differential trends violate the parallel trends assumption.

library(synthdid)
library(fixest)

set.seed(2021)

N <- 40; T_total <- 30; T_pre <- 20; tau_true <- 5.0

alpha <- rnorm(N, 50, 10)
delta <- cumsum(rnorm(T_total, 0.3, 0.15))
gamma <- rnorm(N, 0, 0.15)
gamma[1] <- 0.4  # treated state

df <- expand.grid(state = 1:N, time = 1:T_total)
df <- df[order(df$state, df$time), ]
df$Y <- alpha[df$state] + delta[df$time] +
      gamma[df$state] * df$time + rnorm(nrow(df), 0, 1)
df$treated_unit <- as.integer(df$state == 1)
df$post <- as.integer(df$time > T_pre)
df$D <- df$treated_unit * df$post
df$Y[df$D == 1] <- df$Y[df$D == 1] + tau_true

cat("Panel:", N, "states x", T_total, "periods =", nrow(df), "obs\n")
cat("Treated: state 1, from period", T_pre + 1, "\n")
cat(sprintf("Pre-treatment: %d, Post-treatment: %d\n", T_pre, T_total - T_pre))
cat(sprintf("True ATT: %.1f\n", tau_true))
cat(sprintf("\nTreated state trend: gamma = %.2f\n", gamma[1]))
cat(sprintf("Control trends: mean ~ %.2f, sd ~ %.2f\n",
            mean(gamma[-1]), sd(gamma[-1])))

Expected output:

Panel: 40 states x 30 periods = 1200 obs
Treated: state 1, from period 21
Pre-treatment: 20, Post-treatment: 10
True ATT: 5.0

Treated state trend: gamma = 0.40
Control trends: mean ~ 0.00, sd ~ 0.15

Step 2: Standard DiD (TWFE) Estimator

Standard TWFE DiD assumes parallel trends. When state-specific trends differ, TWFE is biased.

# TWFE DiD
did_est <- feols(Y ~ D | state + time, data = df)
cat("TWFE:", coef(did_est)["D"], "\n")
cat("True:", tau_true, "\n")
cat("Bias:", coef(did_est)["D"] - tau_true, "\n")

Expected output:

Method      Estimate   Bias
TWFE DiD    ~11.0      +6.0
True ATT    5.00       ---
Concept Check

The TWFE DiD estimate is biased upward by roughly 6. What is the source of the bias?


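The size of the distortion can be checked analytically. With a single treated unit and a balanced panel, the TWFE coefficient on D reduces to the simple 2x2 difference of group means, so the non-parallel-trends bias has a closed form. A back-of-the-envelope check in base R, using the DGP constants from Step 1:

```r
# Analytic TWFE bias under differential trends:
#   bias = (treated trend - mean control trend)
#          x (mean post-treatment date - mean pre-treatment date)
T_pre <- 20; T_total <- 30
gamma_treated <- 0.4       # treated state's trend, as set in Step 1
gamma_control_mean <- 0    # control trends are drawn with mean zero
time_gap <- mean((T_pre + 1):T_total) - mean(1:T_pre)   # 25.5 - 10.5 = 15
bias <- (gamma_treated - gamma_control_mean) * time_gap
bias  # 6: the trend differential accounts for the upward bias
```

In any finite sample the realized mean of the 39 control trends differs slightly from zero, so the simulated estimate fluctuates around this analytic value.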
Step 3: Synthetic Control Estimator

Synthetic control constructs a weighted combination of control units that matches the treated unit's pre-treatment trajectory.

setup <- panel.matrices(df, unit = "state", time = "time",
                       outcome = "Y", treatment = "D")

sc_est <- sc_estimate(setup$Y, setup$N0, setup$T0)
cat("SC:", c(sc_est), "\n")
cat("Bias:", c(sc_est) - tau_true, "\n")

Expected output:

Method              Estimate   Bias
Synthetic Control   ~5.40      +0.40
True ATT            5.00       ---

SC performs better than DiD by finding control states with similar trends. However, SC matches levels without differencing, so imperfect fit creates residual bias.
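What is the weight optimization actually doing? The synthdid package solves a regularized constrained least-squares problem over all control units; the deliberately tiny base-R illustration below strips that down to two controls, so the simplex constraint leaves a single free parameter that optimize() can search directly. The function name sc_weight_2 is made up for this sketch.

```r
# Tiny illustration of synthetic-control weighting (base R only).
# With two control units the weights are (w, 1 - w) with w in [0, 1],
# so matching the treated unit's pre-treatment path is one-dimensional.
sc_weight_2 <- function(y0a, y0b, y_treated_pre) {
  pre_fit_loss <- function(w) sum((w * y0a + (1 - w) * y0b - y_treated_pre)^2)
  optimize(pre_fit_loss, interval = c(0, 1), tol = 1e-8)$minimum
}

# Controls with different trends; treated pre-path is a 70/30 mix:
y0a <- 1:10          # steep control
y0b <- rep(5, 10)    # flat control
y_tr <- 0.7 * y0a + 0.3 * y0b
w_hat <- sc_weight_2(y0a, y0b, y_tr)
w_hat  # close to 0.7
```

Because the treated path here is an exact convex combination of the controls, the recovered weight reproduces the pre-treatment trajectory perfectly; in the simulation no exact combination exists, which is precisely the residual-fit bias discussed above.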


Step 4: Synthetic Difference-in-Differences (SDID)

SDID combines unit weights (like SC), time weights (unique to SDID), and differencing (like DiD). The time weights concentrate on the most informative pre-treatment periods.

# SDID using synthdid package
sdid_est <- synthdid_estimate(setup$Y, setup$N0, setup$T0)
did_est_pkg <- did_estimate(setup$Y, setup$N0, setup$T0)

cat("=== Comparison ===\n")
cat("DiD:", c(did_est_pkg), " bias:", c(did_est_pkg) - tau_true, "\n")
cat("SC:", c(sc_est), " bias:", c(sc_est) - tau_true, "\n")
cat("SDID:", c(sdid_est), " bias:", c(sdid_est) - tau_true, "\n")
cat("True:", tau_true, "\n")

# Examine weights
sdid_w <- attr(sdid_est, "weights")
cat("\nUnit weights (top 5):\n")
omega <- sdid_w$omega
top5 <- order(omega, decreasing = TRUE)[1:5]
for (i in top5) cat("  State", i+1, ":", round(omega[i], 4), "\n")

cat("\nTime weights on last 5 periods:",
  round(tail(sdid_w$lambda, 5), 3), "\n")

# Plot
plot(sdid_est, main = "Synthetic DiD")

Expected output:

Method              Estimate   Bias
DiD (TWFE)          ~11.0      +6.0
Synthetic Control   ~5.40      +0.40
SDID                ~5.10      +0.10
True ATT            5.00       ---
Time weight concentration:
  Last 5 pre-periods:  ~0.75
  First 5 pre-periods: ~0.05

SDID produces the closest estimate to the true ATT. The time weights concentrate on the later pre-treatment periods, which are most informative for extrapolating the counterfactual.

Concept Check

SDID outperforms both DiD and SC. What specific combination of features allows SDID to achieve smaller bias than either method alone?


Step 5: Placebo Inference

# Placebo inference: estimate SE by treating each control unit as if it were treated
# vcov(..., method = "placebo") runs SDID on each control unit and takes the SD of results
se_sdid <- sqrt(vcov(sdid_est, method = "placebo"))
# t-statistic: SDID estimate divided by placebo-based standard error
t_stat <- c(sdid_est) / se_sdid

cat("=== Placebo Inference ===\n")
cat("SDID:", c(sdid_est), "\n")
cat("SE:", se_sdid, "\n")
cat("t:", t_stat, "\n")
# 95% confidence interval using normal approximation
ci <- c(sdid_est) + c(-1, 1) * 1.96 * se_sdid
cat("95% CI: [", ci[1], ",", ci[2], "]\n")
# Check whether the CI contains the true treatment effect (simulation diagnostic)
cat("Covers true:", ci[1] <= tau_true & ci[2] >= tau_true, "\n")

Expected output — Placebo inference:

SDID estimate:   ~5.10
Placebo SE:      ~0.50
t-statistic:     ~10.2
95% CI:          [~4.12, ~6.08]
Covers true ATT: TRUE
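The vcov(..., method = "placebo") call hides the mechanics, but the idea is easy to hand-roll. The base-R sketch below is illustrative only: for brevity it re-estimates a plain DiD for each placebo unit, whereas synthdid re-runs the full SDID estimator; the function names (did_for_unit, placebo_se) are made up for this sketch.

```r
# Hand-rolled placebo inference on an N x Tt outcome matrix Y where the
# truly treated unit is row N and columns (T0+1):Tt are post-treatment.
did_for_unit <- function(Y, unit, T0) {
  Tt <- ncol(Y)
  others <- setdiff(seq_len(nrow(Y)), unit)
  (mean(Y[unit, (T0 + 1):Tt]) - mean(Y[unit, 1:T0])) -
    (mean(Y[others, (T0 + 1):Tt]) - mean(Y[others, 1:T0]))
}

placebo_se <- function(Y, N0, T0) {
  Y_controls <- Y[1:N0, , drop = FALSE]   # drop the truly treated unit
  placebo_ests <- sapply(1:N0, function(i) did_for_unit(Y_controls, i, T0))
  sd(placebo_ests)                        # spread of the placebo estimates
}

# Toy example with an exact two-way structure:
set.seed(1)
N <- 10; Tt <- 8; T0 <- 5
Y <- outer(rnorm(N), rep(1, Tt)) + outer(rep(1, N), rnorm(Tt))
Y[N, (T0 + 1):Tt] <- Y[N, (T0 + 1):Tt] + 2   # true effect of 2 on unit N
se_hat <- placebo_se(Y, N0 = N - 1, T0 = T0)
se_hat  # exactly 0 here: the placebo units satisfy parallel trends
```

On the toy data every placebo estimate is exactly zero because the controls share a common trend; on the simulated panel from Step 1 the placebo spread reflects both the noise and the heterogeneous state trends.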
Concept Check

SDID achieves both smaller bias and smaller variance than SC in this design. What explains the variance reduction?


Summary

The replication of Arkhangelsky et al. (2021) demonstrates:

  1. DiD fails under differential trends. When the treated unit has a steeper trajectory, TWFE absorbs the trend into the treatment effect estimate.

  2. SC improves on DiD by reweighting. Finding control units with similar trajectories reduces but does not eliminate bias.

  3. SDID combines the best of both. Unit reweighting (like SC) plus differencing (like DiD) plus time reweighting achieves a form of double robustness.

  4. SDID is efficient. It achieves both smaller bias and smaller variance than SC, and dramatically smaller bias than DiD.

  5. Placebo inference works. Assigning treatment to each control unit in turn produces a valid null distribution for constructing p-values and confidence intervals.


Extension Exercises

  1. Multiple treated units. Modify the DGP so that 5 states receive treatment. How do DiD, SC, and SDID compare with multiple treated units?

  2. Staggered adoption. Have different states adopt treatment in different periods. Combine SDID with staggered adoption methods.

  3. Stronger trends. Increase gamma from 0.4 to 1.0. How does DiD bias scale with the trend differential? Does SDID remain robust?

  4. Short pre-treatment. Reduce T_pre from 20 to 5 periods. How does the quality of SDID unit weights degrade?

  5. Non-stationary errors. Replace i.i.d. noise with AR(1) errors. Does SDID inference remain valid?

  6. Covariates. Add time-varying covariates that partially explain state-specific trends. Estimate an augmented SDID.

  7. Jackknife inference. Compare placebo-based SEs with jackknife SEs. When do the two approaches differ?

  8. California Proposition 99. Apply SDID to the classic Proposition 99 dataset (available in the synthdid package). Compare SDID, SC, and DiD estimates.
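As a starting point for exercise 5, serially correlated disturbances can be generated per state with base R's arima.sim. The sketch below produces only the error vector; the choice rho = 0.6 is an illustrative assumption, and the innovation sd is scaled so the marginal variance matches the i.i.d. N(0, 1) errors of Step 1.

```r
# AR(1) errors for exercise 5 (base R starter).  Each state gets its own
# AR(1) series with autocorrelation rho; innovations are scaled so the
# marginal variance is 1.
set.seed(2021)
N <- 40; T_total <- 30; rho <- 0.6
ar1_errors <- function(n_periods, rho) {
  as.numeric(arima.sim(model = list(ar = rho), n = n_periods,
                       sd = sqrt(1 - rho^2)))
}
eps <- unlist(lapply(1:N, function(i) ar1_errors(T_total, rho)))
length(eps)  # 1200: one draw per state-period, ordered by state then time
```

Because df in Step 1 is sorted by state and then time, eps can be substituted directly for the rnorm(nrow(df), 0, 1) term when building Y.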