Lab: Difference-in-Differences from Scratch
Implement the canonical 2x2 difference-in-differences design step by step. Simulate data, estimate the treatment effect, test the parallel trends assumption, and learn to spot common pitfalls.
Overview
Difference-in-Differences (DiD) is one of the most widely used causal inference methods in economics, political science, and management research. In this lab, you will implement the classic 2x2 DiD design from scratch, building intuition for why it works and when it fails.
What you will learn:
- How to set up and estimate a 2x2 DiD design
- Why the parallel trends assumption is essential and how to assess it
- How to implement DiD with both manual computation and regression
- How to properly cluster standard errors
- What happens when parallel trends is violated
Prerequisites: OLS regression basics (see the OLS lab). Understanding of panel data structure.
Step 1: Understand the Setup
The DiD design requires four groups of observations arranged in a 2x2 table:
| | Before Treatment | After Treatment |
|---|---|---|
| Treated | A | B |
| Control | C | D |
The DiD estimator is: (B - A) - (D - C)
This subtracts the change in the control group from the change in the treated group, removing any common time trend.
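As a quick arithmetic sketch, the estimator is just a double difference of four cell means. The values below are made up for illustration and are not from the data we simulate later:

```r
# Hypothetical cell means for the 2x2 table above (illustrative only)
A <- 20  # treated, before
B <- 24  # treated, after
C <- 19  # control, before
D <- 21  # control, after
did <- (B - A) - (D - C)  # (4) - (2)
did  # 2
```

The common trend (+2 in both groups) cancels, leaving only the treated group's extra change.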
Step 2: Simulate the Data
We will simulate a dataset inspired by Card and Krueger (1994): fast-food restaurant employment in New Jersey and Pennsylvania before and after a minimum wage increase.
library(estimatr)
library(fixest)
set.seed(2024)
n_restaurants <- 400 # 200 NJ (treated), 200 PA (control)
restaurant_id <- 1:n_restaurants
treated <- c(rep(1, 200), rep(0, 200)) # First 200 are NJ
# Baseline employment and restaurant-level fixed effects
base_emp <- 20 + 3 * rnorm(n_restaurants)
restaurant_fe <- rnorm(n_restaurants) * 2
# Build panel: two periods (before=0, after=1)
df <- data.frame()
for (t in 0:1) {
time_trend <- 1.5 * t # Common trend for both groups
treatment_effect <- 2.5 * t * treated # True DiD effect = 2.5
employment <- base_emp + restaurant_fe + time_trend +
treatment_effect + rnorm(n_restaurants) * 1.5
df <- rbind(df, data.frame(
restaurant_id = restaurant_id,
state = ifelse(treated == 1, "NJ", "PA"),
treated = treated,
post = t,
employment = employment
))
}
cat("Dataset:", nrow(df), "observations\n")
# Check group means by state and period
aggregate(employment ~ state + post, data = df, FUN = mean)
Expected output:
Dataset: 800 observations (400 restaurants x 2 periods)
Group means by state and period:
| State | Period | Mean Employment |
|---|---|---|
| NJ | Before | 20.03 |
| NJ | After | 24.08 |
| PA | Before | 19.88 |
| PA | After | 21.32 |
Step 3: Compute DiD Manually
Before running any regression, compute the DiD estimate by hand using group means. This exercise builds intuition for exactly what the estimator does.
# Compute group means by treatment status and period
means <- aggregate(employment ~ treated + post, data = df, FUN = mean)
# Extract the four cell means for the 2x2 table
nj_before <- means$employment[means$treated == 1 & means$post == 0]
nj_after <- means$employment[means$treated == 1 & means$post == 1]
pa_before <- means$employment[means$treated == 0 & means$post == 0]
pa_after <- means$employment[means$treated == 0 & means$post == 1]
# DiD = (change in treated) - (change in control)
did_manual <- (nj_after - nj_before) - (pa_after - pa_before)
cat("Change in NJ:", nj_after - nj_before, "\n")
cat("Change in PA:", pa_after - pa_before, "\n")
cat("DiD Estimate:", did_manual, "\n")
cat("True Effect: 2.50\n")
Expected output — manual DiD computation:
| Component | Value |
|---|---|
| NJ Before | 20.03 |
| NJ After | 24.08 |
| PA Before | 19.88 |
| PA After | 21.32 |
| Change in NJ | +4.05 |
| Change in PA | +1.44 |
| DiD Estimate | 2.61 |
| True Effect | 2.50 |
The manual DiD estimate should be close to the true treatment effect of 2.50. Small deviations are due to sampling variability.
Quick check: in the DiD framework, what does the control group (PA) help us estimate?
Step 4: DiD via Regression
The same DiD estimate can be obtained from a regression. The coefficient on the interaction term treated x post is exactly the DiD estimate.
# DiD regression: the interaction coefficient IS the DiD estimate
m_did <- lm_robust(employment ~ treated * post, data = df,
clusters = restaurant_id, se_type = "CR2")
summary(m_did)
# Verify the regression DiD matches the manual computation
cat("\nDiD estimate (treated:post):", coef(m_did)["treated:post"], "\n")
cat("Manual estimate:", did_manual, "\n")
Expected output — DiD regression:
| Variable | Coefficient | SE | t-stat | p-value |
|---|---|---|---|---|
| Intercept | 19.88 | 0.27 | 73.6 | 0.000 |
| treated | 0.15 | 0.38 | 0.4 | 0.694 |
| post | 1.44 | 0.15 | 9.6 | 0.000 |
| treated:post | 2.61 | 0.21 | 12.4 | 0.000 |
The coefficient on treated:post is the DiD estimate, and it matches the manual computation exactly (both round to 2.61 here). Standard errors are clustered at the restaurant level.
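To see why the interaction coefficient is the DiD estimate, write out the four cell means implied by the regression equation. The coefficient values below are plugged in for illustration only:

```r
# E[employment] = b0 + b1*treated + b2*post + b3*treated*post implies:
#   control, before (C):  b0
#   control, after  (D):  b0 + b2
#   treated, before (A):  b0 + b1
#   treated, after  (B):  b0 + b1 + b2 + b3
# Any illustrative values show the double difference recovers b3:
b0 <- 19.88; b1 <- 0.15; b2 <- 1.44; b3 <- 2.61
A <- b0 + b1; B <- b0 + b1 + b2 + b3
C <- b0;      D <- b0 + b2
(B - A) - (D - C)  # equals b3
```

The change in the treated group is b2 + b3, the change in the control group is b2, so the double difference isolates b3.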
Step 5: DiD with Fixed Effects
In most applications, you will use unit and time fixed effects instead of the simple interaction specification. This approach absorbs all time-invariant restaurant characteristics and common time shocks.
library(fixest)
# Create the treatment x post interaction manually
df$treat_post <- df$treated * df$post
# Two-way FE: absorb restaurant (unit) and post (time) fixed effects
# Cluster SEs at the restaurant level to account for serial correlation
m_fe <- feols(employment ~ treat_post | restaurant_id + post,
data = df, vcov = ~restaurant_id)
summary(m_fe)
cat("\nDiD with TWFE:", coef(m_fe)["treat_post"], "\n")
Expected output — TWFE regression:
| Variable | Coefficient | SE |
|---|---|---|
| treat_post | 2.61 | 0.21 |
| Restaurant FEs | Yes | — |
| Time FEs | Yes | — |
| N | 800 | — |
With a balanced panel and no additional covariates, the two-way fixed effects (TWFE) estimate is identical to the simple DiD regression estimate. The unit fixed effects absorb all time-invariant restaurant characteristics, and the time fixed effect absorbs the common time trend.
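This equivalence can be verified in a small self-contained example using base R only, with unit and time fixed effects entered as factor dummies. The data-generating numbers below are arbitrary:

```r
# Toy balanced two-period panel: interaction spec vs. dummy-variable TWFE
set.seed(1)
id <- rep(1:50, each = 2)                    # 50 units, 2 periods each
t  <- rep(0:1, times = 50)                   # period indicator
tr <- rep(rep(0:1, each = 25), each = 2)     # units 1-25 control, 26-50 treated
y  <- 1 + 0.5 * t + 2 * tr * t + rnorm(100)  # true DiD effect = 2
b_interaction <- coef(lm(y ~ tr * t))["tr:t"]
b_twfe <- coef(lm(y ~ I(tr * t) + factor(id) + factor(t)))["I(tr * t)"]
all.equal(unname(b_interaction), unname(b_twfe))  # TRUE
```

With two periods and a balanced panel, both reduce to the same double difference of cell means; the equivalence breaks down once treatment timing is staggered.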
Step 6: Test the Parallel Trends Assumption
The key untestable assumption of DiD is that treated and control groups would have followed the same trend absent treatment. While we cannot test the parallel trends assumption directly, we can check whether trends were parallel in the pre-treatment period. With only two periods, we cannot perform such a check, so let us extend our simulation.
set.seed(2024)
n_rest <- 200
treated_ids <- c(rep(1, 100), rep(0, 100))
base <- 20 + 3 * rnorm(n_rest)
rest_fe <- rnorm(n_rest) * 2
# Extended simulation: 8 periods, treatment starts at t=5
df_ext <- data.frame()
for (t in 0:7) {
time_trend <- 0.5 * t # Common trend
treat_effect <- 2.5 * as.integer(t >= 5) * treated_ids # Constant effect after t=5
emp <- base + rest_fe + time_trend + treat_effect + rnorm(n_rest) * 1.5
df_ext <- rbind(df_ext, data.frame(
restaurant_id = 1:n_rest,
treated = treated_ids,
period = t,
post = as.integer(t >= 5),
employment = emp
))
}
# Plot group means over time to check parallel pre-trends
library(ggplot2)
means_by_time <- aggregate(employment ~ treated + period, data = df_ext, FUN = mean)
means_by_time$group <- ifelse(means_by_time$treated == 1, "Treated (NJ)", "Control (PA)")
ggplot(means_by_time, aes(x = period, y = employment, color = group)) +
geom_point() + geom_line() +
geom_vline(xintercept = 4.5, linetype = "dashed", color = "red") +
labs(title = "Parallel Trends Check", x = "Period", y = "Avg Employment") +
theme_minimal()
Pre-treatment group means (extended simulation):
| Period | Treated (NJ) | Control (PA) | Difference |
|---|---|---|---|
| 0 | 20.05 | 19.92 | +0.13 |
| 1 | 20.53 | 20.44 | +0.09 |
| 2 | 21.07 | 20.90 | +0.17 |
| 3 | 21.50 | 21.42 | +0.08 |
| 4 | 22.04 | 21.93 | +0.11 |
| 5 | 25.02 | 22.41 | +2.61 |
| 6 | 25.55 | 22.89 | +2.66 |
| 7 | 26.05 | 23.43 | +2.62 |
Periods 0–4 are pre-treatment; treatment begins at period 5. The difference between groups is roughly constant in the pre-treatment period (around +0.1), confirming parallel trends. After treatment, the gap jumps to around +2.6, reflecting the constant treatment effect of 2.5.
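One common (though not conclusive) formalization of the visual check is a pre-trend regression: restrict the sample to pre-treatment periods and test whether the treated group's slope differs from the control group's. The sketch below rebuilds a panel like df_ext so it runs on its own; the object names are illustrative:

```r
# Rebuild an extended panel like df_ext (parallel trends hold by construction)
set.seed(2024)
n_rest <- 200
treated_ids <- c(rep(1, 100), rep(0, 100))
base <- 20 + 3 * rnorm(n_rest)
rest_fe <- 2 * rnorm(n_rest)
panel <- do.call(rbind, lapply(0:7, function(t) data.frame(
  treated = treated_ids,
  period = t,
  employment = base + rest_fe + 0.5 * t +
    2.5 * as.integer(t >= 5) * treated_ids + 1.5 * rnorm(n_rest)
)))
# Pre-trend test: on pre-treatment data only, do the groups' slopes differ?
pre <- subset(panel, period < 5)
m_pre <- lm(employment ~ treated * period, data = pre)
summary(m_pre)$coefficients["treated:period", ]
# With parallel trends built in, the treated:period slope should be near zero.
```

A significant treated:period coefficient would signal diverging pre-trends; an insignificant one is reassuring but does not prove that trends would have stayed parallel after treatment.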
Step 7: What Happens When Parallel Trends Fails
Let us simulate a violation of parallel trends to see how it biases the DiD estimate.
# Violation: treated group has an extra upward pre-trend of 0.3 per period
set.seed(2024)
df_bad <- data.frame()
for (t in 0:7) {
time_trend_ctrl <- 0.5 * t
time_trend_treat <- 0.5 * t + 0.3 * t # Differential trend violates PT
treat_effect <- 2.5 * as.integer(t >= 5) * treated_ids
for (i in 1:n_rest) {
trend <- ifelse(treated_ids[i] == 1, time_trend_treat, time_trend_ctrl)
emp <- base[i] + rest_fe[i] + trend + treat_effect[i] + rnorm(1) * 1.5
df_bad <- rbind(df_bad, data.frame(
restaurant_id = i, treated = treated_ids[i],
period = t, post = as.integer(t >= 5), employment = emp
))
}
}
df_bad$treat_post <- df_bad$treated * df_bad$post
m_bad <- feols(employment ~ treat_post | restaurant_id + period,
data = df_bad, vcov = ~restaurant_id)
cat("DiD with violated parallel trends:", coef(m_bad)["treat_post"], "\n")
cat("True effect: 2.50\n")
cat("The estimate is biased upward.\n")
Expected output — biased DiD with violated parallel trends:
| Metric | Value |
|---|---|
| DiD estimate (violated trends) | 3.40 |
| True treatment effect | 2.50 |
| Bias | +0.90 |
The extra pre-existing upward trend of 0.3 per period in the treated group inflates the DiD estimate. Because DiD attributes all of the post-treatment divergence between the groups to the treatment, any pre-existing differential trend is incorrectly captured as a treatment effect, and the bias grows with both the size of the differential trend and the length of the post-treatment window.
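The mechanism is easiest to see in a toy 2x2 with no true treatment effect at all; every number below is made up:

```r
# No treatment effect; the treated group simply grows 1 unit faster
A <- 20; B <- 23   # treated: +2 common trend, +1 differential trend
C <- 20; D <- 22   # control: +2 common trend only
(B - A) - (D - C)  # 1: pure bias, reported as a "treatment effect"
```

DiD cannot distinguish this differential trend from a genuine effect, which is why the pre-trend diagnostics of Step 6 matter.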
Quick check: if the treated group was already growing faster than the control group before treatment, how does this bias the DiD estimate?
Step 8: Exercises
- Vary the treatment effect. Change the true treatment effect from 2.5 to 0 (no effect) and re-run the analysis. Can you correctly conclude there is no significant effect?
- Add covariates. Add restaurant-level controls (e.g., chain affiliation, number of registers) to the regression. Does the estimate change? Why or why not?
- Try different clustering levels. Instead of clustering at the restaurant level, try clustering at the state level (only 2 clusters). What happens to inference? Why is this problematic?
- Event study specification. Using the extended data (df_ext), create lead and lag indicators and estimate an event study regression. Plot the coefficients to visualize both pre-trends and dynamic treatment effects.
Summary
In this lab you learned:
- DiD estimates causal effects by comparing changes over time between treated and control groups
- The manual computation and regression approaches give identical point estimates
- The parallel trends assumption is the key identification condition and is fundamentally untestable
- Plotting pre-treatment trends is informative but not conclusive
- When parallel trends fails, DiD estimates are biased in a predictable direction
- In most settings, cluster standard errors at the level where treatment varies
- Two-way fixed effects (unit + time) is the most common implementation in applied work