Lab: Synthetic Control Method
Construct a synthetic control unit from donor states to estimate the causal effect of a policy intervention on a single treated unit. Learn to run placebo tests and assess statistical significance in small-N comparative case studies.
Overview
In this lab you will estimate the effect of a hypothetical anti-smoking policy (modeled on California's Proposition 99) using the synthetic control method of Abadie et al. (2010). You will construct a weighted combination of untreated states that closely matches the treated state's pre-treatment trajectory, then measure how the treated state diverges after the intervention.
What you will learn:
- How to set up panel data for synthetic control estimation
- How to construct a synthetic control unit by optimizing donor weights
- How to interpret the gap between the treated unit and its synthetic counterpart
- How to run placebo tests (in-space and in-time) for inference
- How to assess statistical significance with permutation-based p-values
Prerequisites: Familiarity with panel data structure and basic regression. Understanding of the potential outcomes framework is helpful.
Step 1: Simulate State-Level Panel Data
We create a balanced panel of 40 states observed over 30 years (1970–1999). State 1 receives a policy treatment in 1989.
library(Synth)
set.seed(42)
J <- 40; T_len <- 30
treat_unit <- 1; treat_year <- 1989
state_fe <- rnorm(J, 50, 10)
time_fe <- cumsum(rnorm(T_len, 0.5, 0.3))
df <- expand.grid(state = 1:J, year = 1970:1999)
df <- df[order(df$state, df$year), ]
df$income <- 5000 + 100 * state_fe[df$state] / 50 + 50 * time_fe[df$year - 1969] + rnorm(nrow(df), 0, 200)
df$beer <- 20 + 0.5 * state_fe[df$state] / 50 + rnorm(nrow(df), 0, 3)
df$retprice <- 60 + rnorm(nrow(df), 0, 10)
df$cigsale <- state_fe[df$state] + time_fe[df$year - 1969] +
0.002 * df$income - 0.1 * df$retprice + 0.5 * df$beer +
rnorm(nrow(df), 0, 3)
# Apply treatment effect
df$cigsale[df$state == treat_unit & df$year >= treat_year] <-
df$cigsale[df$state == treat_unit & df$year >= treat_year] - 20
head(df, 10)
cat("Panel:", length(unique(df$state)), "states,", length(unique(df$year)), "years\n")Expected output:
| state | year | cigsale | income | beer | retprice |
|---|---|---|---|---|---|
| 1 | 1970 | 75.2 | 5,200 | 21.4 | 58.3 |
| 1 | 1971 | 76.1 | 5,350 | 20.8 | 62.1 |
| 1 | 1972 | 76.8 | 5,500 | 21.5 | 55.7 |
| 1 | 1973 | 77.5 | 5,620 | 20.2 | 63.4 |
| 1 | 1974 | 78.0 | 5,780 | 21.1 | 59.8 |
Panel: 40 states, 30 years
Treated unit: state 1, treatment year: 1989
Summary statistics:
| Variable | Mean | Std Dev | Min | Max |
|---|---|---|---|---|
| cigsale | 75.0 | 12.5 | 35 | 110 |
| income | 5,100 | 350 | 4,200 | 6,200 |
| beer | 20.5 | 3.2 | 12 | 30 |
| retprice | 60.0 | 10.0 | 30 | 90 |
Step 2: Construct the Synthetic Control Unit
We find weights for the donor states (2–40) so that the weighted average matches state 1's pre-treatment outcomes and covariates as closely as possible.
# Prepare data for Synth package
dataprep_out <- dataprep(
foo = df,
predictors = c("income", "beer", "retprice"),
predictors.op = "mean",
dependent = "cigsale",
unit.variable = "state",
time.variable = "year",
treatment.identifier = 1,
controls.identifier = 2:40,
time.predictors.prior = 1970:1988,
time.optimize.ssr = 1970:1988,
time.plot = 1970:1999
)
synth_out <- synth(dataprep_out)
# Print top donor weights
w <- synth_out$solution.w
rownames(w) <- 2:40
top5 <- head(w[order(-w[,1]), , drop = FALSE], 5)
print(round(top5, 4))
Expected output: Donor weights
| State | Weight |
|---|---|
| State 8 | ~0.35 |
| State 15 | ~0.25 |
| State 22 | ~0.18 |
| State 31 | ~0.12 |
| State 5 | ~0.06 |
| Remaining 34 states | ~0.04 (total) |
The weights are sparse: 3–5 donor states receive meaningful positive weights (>0.05), while most states receive near-zero weight. The selected donors are the states whose pre-treatment cigarette sales trajectories most closely resemble state 1's trajectory.
Why do we constrain the donor weights to be non-negative and sum to one?
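To see what the constraint buys, here is a minimal base-R sketch of the weight optimization behind synth(): least squares over the probability simplex, implemented with a softmax parameterization and optim(). The toy data and names here are illustrative, and Synth's actual solver (which also optimizes predictor weights) differs.

```r
set.seed(1)
# Toy pre-treatment outcomes: 19 years x 5 donor states
X0 <- matrix(rnorm(19 * 5, 70, 5), nrow = 19)
# Treated unit is, by construction, a convex combination of donors 2 and 4
X1 <- 0.6 * X0[, 2] + 0.4 * X0[, 4] + rnorm(19, 0, 0.5)

# Softmax keeps the weights non-negative and summing to one automatically
loss <- function(theta) {
  w <- exp(theta) / sum(exp(theta))
  sum((X1 - X0 %*% w)^2)
}
fit <- optim(rep(0, 5), loss, method = "BFGS")
w_hat <- as.vector(exp(fit$par) / sum(exp(fit$par)))
round(w_hat, 2)  # weight concentrates on donors 2 and 4
```

The simplex constraint keeps the synthetic control an interpolation of observed donors rather than an extrapolation outside the support of the data, and it makes the weights directly interpretable as shares.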
Step 3: Estimate the Treatment Effect
Compare the treated unit's actual outcomes to the synthetic control's outcomes after the intervention.
# Plot: treated vs synthetic
path.plot(synth_out, dataprep_out,
Main = "Treated vs Synthetic Control",
Ylab = "Per-capita cigarette sales",
Xlab = "Year",
Legend = c("State 1", "Synthetic"),
Legend.position = "bottomleft")
abline(v = 1989, lty = 2, col = "gray")
# Gap plot
gaps.plot(synth_out, dataprep_out,
Main = "Gap: Treated - Synthetic",
Ylab = "Gap in cigarette sales")
abline(h = 0, lty = 2, col = "gray")
# Average post-treatment gap
Y1 <- dataprep_out$Y1plot
Y0 <- dataprep_out$Y0plot %*% synth_out$solution.w
gap <- Y1 - Y0
post_idx <- which(as.numeric(rownames(Y1)) >= treat_year)
cat("Average treatment effect:", mean(gap[post_idx]), "\n")
Expected output:
Average treatment effect (post-1989): ~-20.0
True effect: -20.00
Step 4: Placebo Tests (In-Space)
To assess whether the estimated gap is statistically unusual, we run the same procedure pretending each donor state was the treated unit.
# In-space placebo: iterate over all states, treating each as if it were treated
library(Synth)
# Allocate matrix: rows = years (1970-1999), cols = one per state
placebo_gaps <- matrix(NA, nrow = T_len, ncol = J)
for (s in 1:J) {
# Donor pool: all states except s and the truly treated state, whose
# post-1989 outcomes would otherwise contaminate the placebo synthetics
donors <- setdiff(1:J, c(s, treat_unit))
# Prepare data treating state s as the treated unit
dp <- dataprep(foo = df, predictors = c("income", "beer", "retprice"),
predictors.op = "mean", dependent = "cigsale",
unit.variable = "state", time.variable = "year",
treatment.identifier = s, controls.identifier = donors,
time.predictors.prior = 1970:1988,
time.optimize.ssr = 1970:1988, time.plot = 1970:1999)
# Use tryCatch so a failed optimization for one state doesn't abort the loop
so <- tryCatch(suppressMessages(synth(dp)), error = function(e) NULL)
if (!is.null(so)) {
# Store the gap (actual - synthetic) for this placebo state
placebo_gaps[, s] <- dp$Y1plot - dp$Y0plot %*% so$solution.w
}
}
# Plot all placebo gaps in light gray; treated state's gap in blue
matplot(1970:1999, placebo_gaps, type = "l", col = "lightgray", lty = 1,
ylab = "Gap", xlab = "Year", main = "In-Space Placebo Test")
lines(1970:1999, placebo_gaps[, 1], col = "blue", lwd = 2)
# Vertical line marks treatment year; horizontal marks zero effect
abline(v = 1989, lty = 2); abline(h = 0, lty = 2)
Expected output:
Permutation p-value: ~0.025 (1/40)
Only state 1 (the treated unit) has a post/pre RMSPE ratio at the top of the distribution, yielding a permutation p-value of 1/40 = 0.025, significant at the 5% level. A gap this large is unlikely to arise by chance among the placebos, which supports a genuine treatment effect.
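The post/pre RMSPE ratio and permutation p-value can be computed directly from a matrix of placebo gaps. A self-contained sketch with a simulated gap matrix (column 1 plays the treated unit; the numbers are illustrative, not the lab's actual output):

```r
set.seed(7)
years <- 1970:1999; J <- 40
# Donor gaps are pure noise; the treated unit (column 1) drops 20 after 1989
gaps <- matrix(rnorm(length(years) * J, 0, 2), nrow = length(years))
gaps[years >= 1989, 1] <- gaps[years >= 1989, 1] - 20

pre <- years < 1989
rmspe <- function(g, idx) sqrt(mean(g[idx]^2))
ratio <- apply(gaps, 2, function(g) rmspe(g, !pre) / rmspe(g, pre))
# p-value: share of all units (treated included) with a ratio at least as extreme
p_val <- mean(ratio >= ratio[1])
c(treated_ratio = round(ratio[1], 1), p_value = p_val)
```

Dividing by the pre-treatment RMSPE penalizes placebos that fit poorly before treatment, so a large post-period gap only counts as extreme when the pre-period fit was good.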
You run placebo tests on all 39 donor states and find that only 1 out of 40 (including the treated state) has a post/pre RMSPE ratio as large as the treated state. What is the implied p-value?
Step 5: In-Time Placebo Test
Assign a fake treatment date before the actual intervention to check whether a 'gap' appears when no treatment occurred.
# In-time placebo: pretend treatment in 1982 (before the real 1989 cutoff)
# Restrict to years before 1989 so there is no post-treatment contamination
dp_fake <- dataprep(
foo = df[df$year < 1989, ],
predictors = c("income", "beer", "retprice"),
predictors.op = "mean", dependent = "cigsale",
unit.variable = "state", time.variable = "year",
treatment.identifier = 1, controls.identifier = 2:40,
# Pre-period for matching is 1970-1981 (before the fake treatment date)
time.predictors.prior = 1970:1981,
time.optimize.ssr = 1970:1981,
# Plot window covers 1970-1988 (the full pre-Prop-99 period)
time.plot = 1970:1988
)
# Fit the synthetic control under the fake treatment assumption
so_fake <- synth(dp_fake)
# A large gap at 1982 would undermine pre-treatment fit credibility
gaps.plot(so_fake, dp_fake, Main = "In-Time Placebo (fake treatment 1982)")
abline(v = 1982, col = "red", lty = 2)Expected output:
If no gap appears at the fake date, the pre-treatment fit is credible.
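"No gap appears" can be made quantitative by comparing the gap's RMSPE before and after the fake date; a ratio near one means the fit did not deteriorate at 1982. A self-contained sketch with a simulated no-effect gap series (values are illustrative):

```r
set.seed(3)
years <- 1970:1988
fake_year <- 1982
gap <- rnorm(length(years), 0, 1.5)  # placebo gap: pure noise, no effect
rmspe <- function(x) sqrt(mean(x^2))
ratio_fake <- rmspe(gap[years >= fake_year]) / rmspe(gap[years < fake_year])
ratio_fake  # stays near 1 when nothing happened at the fake date
```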
Step 6: Sensitivity and Extensions
A leave-one-out check re-estimates the synthetic control after dropping each influential donor in turn; if the estimate barely moves, it does not hinge on any single state.
# Leave-one-out: drop each top donor and re-estimate
top5_states <- as.integer(rownames(top5))
# Treated unit actual outcomes
Y1_vec <- dataprep_out$Y1plot
Y_synth_full <- dataprep_out$Y0plot %*% synth_out$solution.w
par(mfrow = c(1, 1))
plot(1970:1999, Y1_vec, type = "l", col = "blue",
lwd = 2, ylim = range(Y1_vec), ylab = "Cigarette sales", xlab = "Year",
main = "Leave-One-Out Robustness")
lines(1970:1999, Y_synth_full, col = "red", lwd = 2, lty = 2)
# For each top donor, re-estimate without it
for (d in top5_states) {
donors_loo <- setdiff(2:40, d)
dp_loo <- dataprep(foo = df, predictors = c("income", "beer", "retprice"),
predictors.op = "mean", dependent = "cigsale",
unit.variable = "state", time.variable = "year",
treatment.identifier = 1, controls.identifier = donors_loo,
time.predictors.prior = 1970:1988,
time.optimize.ssr = 1970:1988, time.plot = 1970:1999)
so_loo <- tryCatch(suppressMessages(synth(dp_loo)), error = function(e) NULL)
if (!is.null(so_loo)) {
synth_loo <- dp_loo$Y0plot %*% so_loo$solution.w
lines(1970:1999, synth_loo, col = "gray", lty = 2)
}
}
abline(v = 1989, lty = 2)
Expected output:
The gray leave-one-out trajectories track the full synthetic control (red) closely, indicating the estimate does not depend on any single donor.
Exercises
- Change the treatment magnitude. Set the true effect to -5 instead of -20 and re-run the analysis. Can the synthetic control method still detect this smaller effect? What happens to the placebo p-value?
- Add anticipation effects. Modify the DGP so that the treated state begins responding 2 years before the formal treatment date. How does this affect your estimates if you use 1989 as the treatment date? What should you do?
- Increase the number of donors. Expand to 100 states and re-estimate. Does the pre-treatment fit improve? Does the estimate get closer to the truth?
- Try augmented synthetic control. Use the augsynth package in R (or SparseSC in Python) and compare the results with the standard synthetic control. When does the augmented version help?
Summary
In this lab you learned:
- The synthetic control method constructs a data-driven counterfactual from donor units, ideal for case studies with a single treated unit
- Pre-treatment fit is the key diagnostic: if the synthetic control does not track the treated unit before the intervention, the post-treatment gap is not credible
- Inference comes from permutation (placebo) tests, not standard asymptotic theory
- The post/pre RMSPE ratio provides a principled test statistic for significance
- Leave-one-out checks assess whether results depend on any single donor
- The method works best with a long pre-treatment period, a sharp intervention, and a rich donor pool