Lab: Difference-in-Differences from Scratch
Implement the canonical 2x2 difference-in-differences design step by step. Simulate data, estimate the treatment effect, check the parallel trends assumption, and learn to spot common pitfalls.
Overview
Difference-in-Differences (DiD) is one of the most widely used causal inference methods in economics, political science, and management research. In this lab, you will implement the classic 2x2 DiD design from scratch, building intuition for why it works and when it fails.
What you will learn:
- How to set up and estimate a 2x2 DiD design
- Why the parallel trends assumption is essential and how to assess it
- How to implement DiD with both manual computation and regression
- How to properly cluster standard errors
- What happens when parallel trends is violated
Prerequisites: OLS regression basics (see the OLS lab). Understanding of panel data structure.
Step 1: Understand the Setup
The DiD design requires four groups of observations arranged in a 2x2 table:
| | Before Treatment | After Treatment |
|---|---|---|
| Treated | A | B |
| Control | C | D |
The DiD estimator is: (B - A) - (D - C)
This subtracts the change in the control group from the change in the treated group, removing any common time trend.
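To make the formula concrete, here is a tiny worked example with made-up cell means (the numbers below are illustrative, not the simulated data from this lab):

```r
# Hypothetical cell means for the 2x2 table (illustrative values only)
A <- 20.0  # Treated, before
B <- 24.0  # Treated, after
C <- 19.9  # Control, before
D <- 21.4  # Control, after

# Change in the treated group minus change in the control group
did <- (B - A) - (D - C)
cat("DiD estimate:", did, "\n")  # 4.0 - 1.5 = 2.5
```

The common component of the change over time (here 1.5) cancels out; only the extra movement in the treated group survives.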
Step 2: Simulate the Data
We will simulate a dataset inspired by Card and Krueger (1994): fast-food restaurant employment before and after a minimum wage increase.
library(estimatr)
library(fixest)
set.seed(2024)
n_restaurants <- 400
restaurant_id <- 1:n_restaurants
treated <- c(rep(1, 200), rep(0, 200))
base_emp <- 20 + 3 * rnorm(n_restaurants)
restaurant_fe <- rnorm(n_restaurants) * 2
# Build panel
df <- data.frame()
for (t in 0:1) {
time_trend <- 1.5 * t
treatment_effect <- 2.5 * t * treated
employment <- base_emp + restaurant_fe + time_trend +
treatment_effect + rnorm(n_restaurants) * 1.5
df <- rbind(df, data.frame(
restaurant_id = restaurant_id,
state = ifelse(treated == 1, "NJ", "PA"),
treated = treated,
post = t,
employment = employment
))
}
cat("Dataset:", nrow(df), "observations\n")
aggregate(employment ~ state + post, data = df, FUN = mean)

Expected output:
Dataset: 800 observations (400 restaurants x 2 periods)
Group means by state and period:
| State | Period | Mean Employment |
|---|---|---|
| NJ | Before | 20.03 |
| NJ | After | 24.08 |
| PA | Before | 19.88 |
| PA | After | 21.32 |
Step 3: Compute DiD Manually
Before running any regression, compute the DiD estimate by hand using group means. This exercise builds intuition for exactly what the estimator does.
# Group means
means <- aggregate(employment ~ treated + post, data = df, FUN = mean)
nj_before <- means$employment[means$treated == 1 & means$post == 0]
nj_after <- means$employment[means$treated == 1 & means$post == 1]
pa_before <- means$employment[means$treated == 0 & means$post == 0]
pa_after <- means$employment[means$treated == 0 & means$post == 1]
did_manual <- (nj_after - nj_before) - (pa_after - pa_before)
cat("Change in NJ:", nj_after - nj_before, "\n")
cat("Change in PA:", pa_after - pa_before, "\n")
cat("DiD Estimate:", did_manual, "\n")
cat("True Effect: 2.50\n")

Expected output — manual DiD computation:
| Component | Value |
|---|---|
| NJ Before | 20.03 |
| NJ After | 24.08 |
| PA Before | 19.88 |
| PA After | 21.32 |
| Change in NJ | +4.05 |
| Change in PA | +1.44 |
| DiD Estimate | 2.61 |
| True Effect | 2.50 |
The manual DiD estimate should be close to the true treatment effect of 2.50. Small deviations are due to sampling variability.
In the DiD framework, what does the control group (PA) help us estimate?
Step 4: DiD via Regression
The same DiD estimate can be obtained from a regression. The coefficient on the interaction term treated x post is exactly the DiD estimate.
# DiD regression
m_did <- lm_robust(employment ~ treated * post, data = df,
clusters = restaurant_id, se_type = "CR2")
summary(m_did)
cat("\nDiD estimate (treated:post):", coef(m_did)["treated:post"], "\n")
cat("Manual estimate:", did_manual, "\n")

Expected output — DiD regression:
| Variable | Coefficient | SE | t-stat | p-value |
|---|---|---|---|---|
| Intercept | 19.88 | 0.27 | 73.6 | 0.000 |
| treated | 0.15 | 0.38 | 0.4 | 0.694 |
| post | 1.44 | 0.15 | 9.6 | 0.000 |
| treated:post | 2.61 | 0.21 | 12.4 | 0.000 |
The coefficient on treated:post is the DiD estimate; it matches the manual computation exactly (2.61). Standard errors are clustered at the restaurant level.
Step 5: DiD with Fixed Effects
In most applications, you will use unit and time fixed effects instead of the simple interaction specification. This approach absorbs all time-invariant restaurant characteristics and common time shocks.
library(fixest)
# Create interaction term
df$treat_post <- df$treated * df$post
# DiD with two-way fixed effects (best practice)
m_fe <- feols(employment ~ treat_post | restaurant_id + post,
data = df, vcov = ~restaurant_id)
summary(m_fe)
cat("\nDiD with TWFE:", coef(m_fe)["treat_post"], "\n")

Expected output — TWFE regression:
| Variable | Coefficient | SE |
|---|---|---|
| treat_post | 2.61 | 0.21 |
| Restaurant FEs | Yes | — |
| Time FEs | Yes | — |
| N | 800 | — |
With a balanced panel and no additional covariates, the two-way fixed effects (TWFE) estimate is identical to the simple DiD regression estimate. The unit fixed effects absorb all time-invariant restaurant characteristics, and the time fixed effect absorbs the common time trend.
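You can verify this equivalence on a minimal example. The toy panel below (two hypothetical units, two periods, values chosen for illustration) shows that hand-rolled unit and period dummies recover the same interaction coefficient as the simple 2x2 specification:

```r
# Toy panel: one treated unit, one control unit, two periods
toy <- data.frame(
  id      = c(1, 1, 2, 2),
  treated = c(1, 1, 0, 0),
  post    = c(0, 1, 0, 1),
  y       = c(20.0, 24.5, 19.0, 20.5)
)
# Interaction specification
m1 <- lm(y ~ treated * post, data = toy)
# Unit and period dummies: a hand-rolled two-way fixed effects model
m2 <- lm(y ~ factor(id) + factor(post) + I(treated * post), data = toy)
coef(m1)["treated:post"]       # (24.5 - 20.0) - (20.5 - 19.0) = 3
coef(m2)["I(treated * post)"]  # same: 3
```

Note that the unit dummies absorb the treated main effect (treated is constant within unit), which is exactly why TWFE drops it.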
Step 6: Check the Parallel Trends Assumption
The key identifying assumption of DiD is that, absent treatment, the treated and control groups would have followed the same trend. We can never test this directly, but we can check whether trends were parallel before treatment. Our two-period dataset has no pre-treatment history to inspect, so let us extend the simulation to eight periods.
set.seed(2024)
n_rest <- 200
treated_ids <- c(rep(1, 100), rep(0, 100))
base <- 20 + 3 * rnorm(n_rest)
rest_fe <- rnorm(n_rest) * 2
# 8 periods, treatment at t=5
df_ext <- data.frame()
for (t in 0:7) {
time_trend <- 0.5 * t
treat_effect <- 2.5 * as.integer(t >= 5) * treated_ids  # Constant effect of 2.5 once treated
emp <- base + rest_fe + time_trend + treat_effect + rnorm(n_rest) * 1.5
df_ext <- rbind(df_ext, data.frame(
restaurant_id = 1:n_rest,
treated = treated_ids,
period = t,
post = as.integer(t >= 5),
employment = emp
))
}
# Plot
library(ggplot2)
means_by_time <- aggregate(employment ~ treated + period, data = df_ext, FUN = mean)
means_by_time$group <- ifelse(means_by_time$treated == 1, "Treated (NJ)", "Control (PA)")
ggplot(means_by_time, aes(x = period, y = employment, color = group)) +
geom_point() + geom_line() +
geom_vline(xintercept = 4.5, linetype = "dashed", color = "red") +
labs(title = "Parallel Trends Check", x = "Period", y = "Avg Employment") +
theme_minimal()

Pre-treatment group means (extended simulation):
| Period | Treated (NJ) | Control (PA) | Difference |
|---|---|---|---|
| 0 | 20.05 | 19.92 | +0.13 |
| 1 | 20.53 | 20.44 | +0.09 |
| 2 | 21.07 | 20.90 | +0.17 |
| 3 | 21.50 | 21.42 | +0.08 |
| 4 | 22.04 | 21.93 | +0.11 |
| 5 | 25.02 | 22.41 | +2.61 |
| 6 | 25.55 | 22.89 | +2.66 |
| 7 | 26.05 | 23.43 | +2.62 |
Periods 0–4 are pre-treatment; treatment begins at period 5. The gap between the groups is roughly constant before treatment (around +0.1), consistent with parallel trends. After treatment, the gap jumps to around +2.6: the pre-existing +0.1 difference plus the treatment effect of 2.5.
Step 7: What Happens When Parallel Trends Fails
Let us simulate a violation of parallel trends to see how it biases the DiD estimate.
# Violation: treated group has a different trend
set.seed(2024)
df_bad <- data.frame()
for (t in 0:7) {
time_trend_ctrl <- 0.5 * t
time_trend_treat <- 0.5 * t + 0.3 * t # Extra 0.3 per period
treat_effect <- 2.5 * as.integer(t >= 5) * treated_ids
for (i in 1:n_rest) {
trend <- ifelse(treated_ids[i] == 1, time_trend_treat, time_trend_ctrl)
emp <- base[i] + rest_fe[i] + trend + treat_effect[i] + rnorm(1) * 1.5
df_bad <- rbind(df_bad, data.frame(
restaurant_id = i, treated = treated_ids[i],
period = t, post = as.integer(t >= 5), employment = emp
))
}
}
df_bad$treat_post <- df_bad$treated * df_bad$post
m_bad <- feols(employment ~ treat_post | restaurant_id + period,
data = df_bad, vcov = ~restaurant_id)
cat("DiD with violated parallel trends:", coef(m_bad)["treat_post"], "\n")
cat("True effect: 2.50\n")
cat("The estimate is biased upward.\n")

Expected output — biased DiD with violated parallel trends:
| Metric | Value |
|---|---|
| DiD estimate (violated trends) | 3.70 |
| True treatment effect | 2.50 |
| Bias | +1.20 |
The extra pre-existing upward trend of 0.3 per period in the treated group inflates the DiD estimate. In this design, the bias equals the differential trend (0.3) times the difference between the average post-treatment period (mean of 5, 6, 7 = 6) and the average pre-treatment period (mean of 0–4 = 2): 0.3 x 4 = 1.2. Because DiD attributes all of the post-treatment divergence to the treatment, any pre-existing differential trend is incorrectly captured as a treatment effect.
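With a single treatment date and a linear differential trend, this contamination can be predicted before running any regression: it is the trend slope times the gap between the average post-treatment period and the average pre-treatment period. A quick back-of-envelope check using the periods from this simulation:

```r
# Predicted bias from a linear differential trend of 0.3 per period
slope        <- 0.3
pre_periods  <- 0:4   # periods before treatment
post_periods <- 5:7   # periods after treatment
bias <- slope * (mean(post_periods) - mean(pre_periods))
cat("Predicted bias:", bias, "\n")  # 0.3 * (6 - 2) = 1.2
```

Comparing this prediction to the estimated bias is a useful sanity check whenever you suspect a differential trend.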
If the treated group was already growing faster than the control group before treatment, how does this bias the DiD estimate?
Step 8: Exercises
- Vary the treatment effect. Change the true treatment effect from 2.5 to 0 (no effect) and re-run the analysis. Can you correctly conclude there is no significant effect?
- Add covariates. Add restaurant-level controls (e.g., chain affiliation, number of registers) to the regression. Does the estimate change? Why or why not?
- Try different clustering levels. Instead of clustering at the restaurant level, try clustering at the state level (only 2 clusters). What happens to inference? Why is this problematic?
- Event study specification. Using the extended data (df_ext), create lead and lag indicators and estimate an event study regression. Plot the coefficients to visualize both pre-trends and dynamic treatment effects.
Summary
In this lab you learned:
- DiD estimates causal effects by comparing changes over time between treated and control groups
- The manual computation and regression approaches give identical point estimates
- The parallel trends assumption is the key identification condition and is fundamentally untestable
- Plotting pre-treatment trends is informative but not conclusive
- When parallel trends fails, DiD estimates are biased in a predictable direction
- In most settings, cluster standard errors at the level where treatment varies
- Two-way fixed effects (unit + time) is the standard implementation in applied work