MethodAtlas
Tutorial · 2 hours

Lab: Difference-in-Differences from Scratch

Implement the canonical 2x2 difference-in-differences design step by step. Simulate data, estimate the treatment effect, assess the parallel trends assumption, and learn to spot common pitfalls.

Overview

Difference-in-Differences (DiD) is one of the most widely used causal inference methods in economics, political science, and management research. In this lab, you will implement the classic 2x2 DiD design from scratch, building intuition for why it works and when it fails.

What you will learn:

  • How to set up and estimate a 2x2 DiD design
  • Why the parallel trends assumption is essential and how to assess it
  • How to implement DiD with both manual computation and regression
  • How to properly cluster standard errors
  • What happens when parallel trends is violated

Prerequisites: OLS regression basics (see the OLS lab). Understanding of panel data structure.


Step 1: Understand the Setup

The DiD design requires four groups of observations arranged in a 2x2 table:

             Before Treatment    After Treatment
  Treated    A                   B
  Control    C                   D

The DiD estimator is: (B - A) - (D - C)

This subtracts the change in the control group from the change in the treated group, removing any common time trend.
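Plugging hypothetical numbers into the 2x2 table makes the arithmetic concrete (the cell values below are illustrative, not from the simulation in Step 2):

```r
# Illustrative cell means for the 2x2 table (made-up numbers)
A <- 20  # treated, before
B <- 25  # treated, after
C <- 19  # control, before
D <- 21  # control, after

did <- (B - A) - (D - C)
cat("Change in treated:", B - A, "\n")  # 5
cat("Change in control:", D - C, "\n")  # 2
cat("DiD estimate:", did, "\n")         # 5 - 2 = 3
```

The control group's change (2) is netted out as the common time trend, so only 3 of the treated group's 5-unit change is attributed to the treatment.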


Step 2: Simulate the Data

We will simulate a dataset inspired by Card and Krueger (1994): fast-food restaurant employment before and after a minimum wage increase.

library(estimatr)
library(fixest)

set.seed(2024)
n_restaurants <- 400

restaurant_id <- 1:n_restaurants
treated <- c(rep(1, 200), rep(0, 200))

base_emp <- 20 + 3 * rnorm(n_restaurants)
restaurant_fe <- rnorm(n_restaurants) * 2

# Build panel
df <- data.frame()
for (t in 0:1) {
time_trend <- 1.5 * t
treatment_effect <- 2.5 * t * treated
employment <- base_emp + restaurant_fe + time_trend +
              treatment_effect + rnorm(n_restaurants) * 1.5
df <- rbind(df, data.frame(
  restaurant_id = restaurant_id,
  state = ifelse(treated == 1, "NJ", "PA"),
  treated = treated,
  post = t,
  employment = employment
))
}

cat("Dataset:", nrow(df), "observations\n")
aggregate(employment ~ state + post, data = df, FUN = mean)

Expected output:

Dataset: 800 observations (400 restaurants x 2 periods)

Group means by state and period:

  State   Period   Mean Employment
  NJ      Before   20.03
  NJ      After    24.08
  PA      Before   19.88
  PA      After    21.32

Step 3: Compute DiD Manually

Before running any regression, compute the DiD estimate by hand using group means. This exercise builds intuition for exactly what the estimator does.

# Group means
means <- aggregate(employment ~ treated + post, data = df, FUN = mean)

nj_before <- means$employment[means$treated == 1 & means$post == 0]
nj_after  <- means$employment[means$treated == 1 & means$post == 1]
pa_before <- means$employment[means$treated == 0 & means$post == 0]
pa_after  <- means$employment[means$treated == 0 & means$post == 1]

did_manual <- (nj_after - nj_before) - (pa_after - pa_before)

cat("Change in NJ:", nj_after - nj_before, "\n")
cat("Change in PA:", pa_after - pa_before, "\n")
cat("DiD Estimate:", did_manual, "\n")
cat("True Effect:  2.50\n")

Expected output — manual DiD computation:

  Component      Value
  NJ Before      20.03
  NJ After       24.08
  PA Before      19.88
  PA After       21.32
  Change in NJ   +4.05
  Change in PA   +1.44
  DiD Estimate   2.61
  True Effect    2.50

The manual DiD estimate should be close to the true treatment effect of 2.50. Small deviations are due to sampling variability.

Concept Check

In the DiD framework, what does the control group (PA) help us estimate?


Step 4: DiD via Regression

The same DiD estimate can be obtained from a regression. The coefficient on the interaction term treated x post is exactly the DiD estimate.

# DiD regression
m_did <- lm_robust(employment ~ treated * post, data = df,
                   clusters = restaurant_id, se_type = "CR2")
summary(m_did)

cat("\nDiD estimate (treated:post):", coef(m_did)["treated:post"], "\n")
cat("Manual estimate:", did_manual, "\n")

Expected output — DiD regression:

  Variable       Coefficient   SE     t-stat   p-value
  Intercept      19.88         0.27   73.6     0.000
  treated        0.15          0.38   0.4      0.694
  post           1.44          0.15   9.6      0.000
  treated:post   2.61          0.21   12.4     0.000

The coefficient on treated:post is the DiD estimate. It should match the manual computation exactly (around 2.61). Standard errors are clustered at the restaurant level.
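The match is not a coincidence: the interaction model is saturated, with one parameter per cell, so its fitted values are exactly the four cell means. A self-contained toy check (the simulated data and parameter values here are our own, not the lab's):

```r
# Toy data: a saturated interaction model fits the four cell means
# exactly, so each coefficient is a combination of A, B, C, D.
set.seed(1)
toy <- expand.grid(unit = 1:50, treated = 0:1, post = 0:1)
toy$y <- 20 + 2 * toy$treated + 1.5 * toy$post +
  3 * toy$treated * toy$post + rnorm(nrow(toy))

b <- coef(lm(y ~ treated * post, data = toy))
m <- aggregate(y ~ treated + post, data = toy, FUN = mean)
A <- m$y[m$treated == 1 & m$post == 0]  # treated, before
B <- m$y[m$treated == 1 & m$post == 1]  # treated, after
C <- m$y[m$treated == 0 & m$post == 0]  # control, before
D <- m$y[m$treated == 0 & m$post == 1]  # control, after

# Intercept = C, treated = A - C, post = D - C, treated:post = DiD
all.equal(unname(b["treated:post"]), (B - A) - (D - C))  # TRUE
```

The same identities hold for the lab's df: the intercept is the control pre-period mean, and the interaction coefficient is the 2x2 DiD.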


Step 5: DiD with Fixed Effects

In most applications, you will use unit and time fixed effects instead of the simple interaction specification. This approach absorbs all time-invariant restaurant characteristics and common time shocks.

library(fixest)

# Create interaction term
df$treat_post <- df$treated * df$post

# DiD with two-way fixed effects (best practice)
m_fe <- feols(employment ~ treat_post | restaurant_id + post,
              data = df, vcov = ~restaurant_id)
summary(m_fe)

cat("\nDiD with TWFE:", coef(m_fe)["treat_post"], "\n")

Expected output — TWFE regression:

  Variable         Coefficient   SE
  treat_post       2.61          0.21
  Restaurant FEs   Yes
  Time FEs         Yes
  N                800

With a balanced panel and no additional covariates, the two-way fixed effects (TWFE) estimate is identical to the simple DiD regression estimate. The unit fixed effects absorb all time-invariant restaurant characteristics, and the time fixed effect absorbs the common time trend.
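The equivalence is easy to verify in a self-contained toy panel (the data and coefficients below are made up for illustration), using base R's lm with explicit unit dummies in place of fixest:

```r
# Toy balanced two-period panel: TWFE estimate == interaction DiD estimate.
set.seed(7)
n <- 40
toy <- data.frame(
  id      = rep(1:n, each = 2),
  treated = rep(rep(0:1, each = n / 2), each = 2),
  post    = rep(0:1, times = n)
)
toy$y <- 10 + rep(rnorm(n), each = 2) + 0.8 * toy$post +
  1.7 * toy$treated * toy$post + rnorm(2 * n)
toy$tp <- toy$treated * toy$post

# Interaction specification vs. explicit unit dummies + time dummy
b_int <- coef(lm(y ~ treated * post, data = toy))["treated:post"]
b_fe  <- coef(lm(y ~ tp + factor(id) + post, data = toy))["tp"]

all.equal(unname(b_int), unname(b_fe))  # TRUE in a balanced panel
```

Note that the level term `treated` drops out of the fixed-effects specification: it is time-invariant and therefore absorbed by the unit dummies.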


Step 6: Test the Parallel Trends Assumption

The key untestable assumption of DiD is that treated and control groups would have followed the same trend absent treatment. While we cannot test this directly, we can check whether trends were parallel in the pre-treatment period. With only two periods, we cannot do this, so let us extend our simulation.

set.seed(2024)
n_rest <- 200
treated_ids <- c(rep(1, 100), rep(0, 100))
base <- 20 + 3 * rnorm(n_rest)
rest_fe <- rnorm(n_rest) * 2

# 8 periods, treatment turns on at t = 5
df_ext <- data.frame()
for (t in 0:7) {
  time_trend <- 0.5 * t
  treat_effect <- 2.5 * as.integer(t >= 5) * treated_ids
  emp <- base + rest_fe + time_trend + treat_effect + rnorm(n_rest) * 1.5
  df_ext <- rbind(df_ext, data.frame(
    restaurant_id = 1:n_rest,
    treated = treated_ids,
    period = t,
    post = as.integer(t >= 5),
    employment = emp
  ))
}

# Plot group means over time
library(ggplot2)
means_by_time <- aggregate(employment ~ treated + period, data = df_ext, FUN = mean)
means_by_time$group <- ifelse(means_by_time$treated == 1, "Treated (NJ)", "Control (PA)")

ggplot(means_by_time, aes(x = period, y = employment, color = group)) +
  geom_point() +
  geom_line() +
  geom_vline(xintercept = 4.5, linetype = "dashed", color = "red") +
  labs(title = "Parallel Trends Check", x = "Period", y = "Avg Employment") +
  theme_minimal()

Pre-treatment group means (extended simulation):

  Period   Treated (NJ)   Control (PA)   Difference
  0        20.05          19.92          +0.13
  1        20.53          20.44          +0.09
  2        21.07          20.90          +0.17
  3        21.50          21.42          +0.08
  4        22.04          21.93          +0.11
  5        25.02          22.41          +2.61
  6        25.55          22.89          +2.66
  7        26.05          23.43          +2.62

Periods 0–4 are pre-treatment; treatment begins at period 5. The difference between groups is roughly constant in the pre-treatment period (around +0.1), consistent with parallel trends. After treatment, the gap jumps to around +2.6, reflecting the constant treatment effect of 2.5 plus sampling noise.
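Beyond eyeballing the plot, a common formal check is a placebo DiD estimated on pre-treatment data only, using a fake treatment date. The sketch below re-simulates just the pre-period DGP (restaurant fixed effects omitted for brevity) and places an arbitrary placebo cutoff at period 3; the cutoff choice is ours, not part of the lab:

```r
# Placebo DiD: pretend treatment began at period 3 and use only the
# pre-treatment periods 0-4. Under parallel trends the "effect" is ~0.
set.seed(2024)
n <- 200
g <- c(rep(1, 100), rep(0, 100))  # treated indicator
base <- 20 + 3 * rnorm(n)
pre <- do.call(rbind, lapply(0:4, function(t) data.frame(
  id = 1:n, treated = g, period = t,
  employment = base + 0.5 * t + rnorm(n) * 1.5  # same trend in both groups
)))

pre$fake_post <- as.integer(pre$period >= 3)  # arbitrary placebo cutoff
placebo <- lm(employment ~ treated * fake_post, data = pre)
coef(placebo)["treated:fake_post"]  # should be close to zero
```

A placebo estimate near zero (and statistically insignificant) supports, but never proves, the parallel trends assumption.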


Step 7: What Happens When Parallel Trends Fails

Let us simulate a violation of parallel trends to see how it biases the DiD estimate.

# Violation: treated group is on a steeper pre-existing trend
set.seed(2024)
df_bad <- data.frame()
for (t in 0:7) {
  time_trend_ctrl  <- 0.5 * t
  time_trend_treat <- 0.5 * t + 0.3 * t  # extra 0.3-per-period trend
  treat_effect <- 2.5 * as.integer(t >= 5) * treated_ids
  trend <- ifelse(treated_ids == 1, time_trend_treat, time_trend_ctrl)
  emp <- base + rest_fe + trend + treat_effect + rnorm(n_rest) * 1.5
  df_bad <- rbind(df_bad, data.frame(
    restaurant_id = 1:n_rest,
    treated = treated_ids,
    period = t,
    post = as.integer(t >= 5),
    employment = emp
  ))
}

df_bad$treat_post <- df_bad$treated * df_bad$post
m_bad <- feols(employment ~ treat_post | restaurant_id + period,
               data = df_bad, vcov = ~restaurant_id)

cat("DiD with violated parallel trends:", coef(m_bad)["treat_post"], "\n")
cat("True effect: 2.50\n")
cat("The estimate is biased upward.\n")

Expected output — biased DiD with violated parallel trends:

  Metric                           Value
  DiD estimate (violated trends)   ~3.70
  True treatment effect            2.50
  Bias                             ~+1.20

The extra pre-existing upward trend of 0.3 per period in the treated group inflates the DiD estimate. With pre-periods 0–4 and post-periods 5–7, the bias equals the differential trend (0.3) times the gap between the average post-treatment period and the average pre-treatment period: 0.3 x (6 - 2) = 1.2. Because DiD attributes all of the post-treatment divergence to the treatment, any pre-existing differential trend is incorrectly captured as a treatment effect.
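The magnitude of the bias can be sketched directly from the period layout (a back-of-the-envelope, assuming pre-periods 0–4, post-periods 5–7, and a linear differential trend):

```r
# The TWFE treat_post estimate compares average post- vs pre-treatment
# outcomes across groups, so a differential trend of delta per period
# contributes delta * (mean post period - mean pre period) of bias.
delta <- 0.3
bias <- delta * (mean(5:7) - mean(0:4))
cat("Approximate bias:", bias, "\n")  # 0.3 * (6 - 2) = 1.2
```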

Concept Check

If the treated group was already growing faster than the control group before treatment, how does this bias the DiD estimate?


Step 8: Exercises

  1. Vary the treatment effect. Change the true treatment effect from 2.5 to 0 (no effect) and re-run the analysis. Can you correctly conclude there is no significant effect?

  2. Add covariates. Add restaurant-level controls (e.g., chain affiliation, number of registers) to the regression. Does the estimate change? Why or why not?

  3. Try different clustering levels. Instead of clustering at the restaurant level, try clustering at the state level (only 2 clusters). What happens to inference? Why is this problematic?

  4. Event study specification. Using the extended data (df_ext), create lead and lag indicators and estimate an event study regression. Plot the coefficients to visualize both pre-trends and dynamic treatment effects.


Summary

In this lab you learned:

  • DiD estimates causal effects by comparing changes over time between treated and control groups
  • The manual computation and regression approaches give identical point estimates
  • The parallel trends assumption is the key identification condition and is fundamentally untestable
  • Plotting pre-treatment trends is informative but not conclusive
  • When parallel trends fails, DiD estimates are biased in a predictable direction
  • In most settings, cluster standard errors at the level where treatment varies
  • Two-way fixed effects (unit + time) is the standard implementation in applied work