Difference-in-Differences (Canonical 2×2)
Estimates causal effects by comparing changes over time between treated and control groups.
One-Line Implementation
R:      feols(y ~ treat_post | unit + time, data = df, vcov = ~state)
Stata:  reghdfe y treat_post, absorb(unit time) vce(cluster state)
Python: PanelOLS.from_formula('y ~ treat_post + EntityEffects + TimeEffects', data=df).fit(cov_type='clustered', clusters=df['state'])
Motivating Example: The New Jersey Minimum Wage Increase
On April 1, 1992, New Jersey raised its minimum wage from $4.25 to $5.05 per hour. Neighboring Pennsylvania did not. David Card and Alan Krueger saw an opportunity.
The conventional economic wisdom at the time was clear: raising the minimum wage should reduce employment. If labor costs go up, firms hire fewer workers. Theory said so, and decades of empirical work appeared to confirm it.
Card and Krueger did something simple but powerful. They surveyed fast-food restaurants in New Jersey (where the minimum wage went up) and eastern Pennsylvania (where it did not), both before and after the wage increase. If you just compared employment in New Jersey before and after, you would conflate the minimum wage effect with everything else happening to the New Jersey economy. If you just compared New Jersey to Pennsylvania at one point in time, you would conflate the minimum wage effect with all the other ways the two states differ.
But by taking the difference in the change over time — how employment changed in New Jersey minus how employment changed in Pennsylvania — you can difference out both time-invariant state differences and common time trends. What remained, they argued, was the effect of the minimum wage.
Their finding shocked the profession: employment in New Jersey fast-food restaurants did not fall. If anything, it rose slightly. This study is one of the most cited papers in economics, not because it settled the minimum wage debate (it did not), but because it demonstrated the power of the difference-in-differences research design.
A. Overview
What DiD Does
Difference-in-Differences is a research design that estimates causal effects by comparing the change over time in an outcome for a group affected by a treatment to the change over time for a group not affected.
The core logic has two "differences":
- First difference (over time): How did the outcome change from before to after the treatment, for each group?
- Second difference (across groups): How does the change for the treated group compare to the change for the control group?
The second difference removes any common time trends — events that affect both groups equally — leaving (under assumptions) the causal effect of the treatment.
When to Use DiD
- A treatment or policy change affects one group but not another
- You observe both groups both before and after the treatment
- The treated and control groups were on similar trajectories before the treatment
- Treatment timing is sharp and known
When NOT to Use DiD
- There is no credible comparison group
- Treatment adoption is staggered across many groups at different times (you may need modern staggered DiD estimators — see the Staggered DiD page)
- The parallel trends assumption is clearly violated (e.g., the groups were already diverging before treatment)
- The treatment is anticipated and agents change behavior before it takes effect
The Taxonomy Position
DiD is a method. Its credibility comes from the research design — the particular setting and the parallel trends argument — rather than from controlling for observed confounders alone. It sits between pure experimental designs (randomized controlled trials) and pure model-based approaches (OLS with controls).
B. Identification
Assumption 1: Parallel Trends
Plain language: In the absence of treatment, the treated and control groups would have experienced the same change in the outcome over time. Note: they do not need the same level of the outcome, just the same trend.
Formally: for all periods $t$,

$$E[Y_{it}(0) - Y_{i,t-1}(0) \mid D_i = 1] = E[Y_{it}(0) - Y_{i,t-1}(0) \mid D_i = 0]$$

where $Y_{it}(0)$ is the potential outcome without treatment for unit $i$ at time $t$, and $D_i$ indicates treatment group membership.
Assumption 2: No Anticipation
Plain language: The treated group does not change its behavior before the treatment takes effect. If New Jersey restaurants started adjusting employment in anticipation of the minimum wage increase, the pre-treatment period is contaminated.
Assumption 3: Stable Composition (No Selective Attrition)
Plain language: The composition of the treated and control groups does not change because of the treatment. If the minimum wage increase caused some restaurants to close (and you only observe survivors), your estimates are biased.
Assumption 4: SUTVA (Stable Unit Treatment Value Assumption)
Plain language: The treatment of one unit does not affect the outcomes of other units. If New Jersey restaurants losing workers caused Pennsylvania restaurants to gain them, the "control" group is contaminated by spillovers.
The Omitted Variable Bias Connection
DiD eliminates a specific class of confounders: variables that are constant over time within groups (group fixed effects) and variables that are constant across groups at each point in time (time fixed effects). It does not eliminate confounders that change differentially over time across groups. That remaining threat is what the parallel trends assumption fundamentally addresses. Sensitivity analysis can help quantify how large a violation of parallel trends would have to be to overturn your results.
C. Visual Intuition
Parallel Trends: The Core of DiD
Two groups start at different levels but follow the same trend before treatment.
The 2x2 Table
Think of DiD through a simple table:
| | Before Treatment | After Treatment | Change (After − Before) |
|---|---|---|---|
| Treated | $\bar{Y}_{T,0}$ | $\bar{Y}_{T,1}$ | $\bar{Y}_{T,1} - \bar{Y}_{T,0}$ |
| Control | $\bar{Y}_{C,0}$ | $\bar{Y}_{C,1}$ | $\bar{Y}_{C,1} - \bar{Y}_{C,0}$ |
| Difference | | | $(\bar{Y}_{T,1} - \bar{Y}_{T,0}) - (\bar{Y}_{C,1} - \bar{Y}_{C,0})$ |
The DiD estimate is the difference in the differences: how much the treated group's outcome changed, minus how much the control group's outcome changed.
In Card and Krueger's data:
| | Before (Feb 1992) | After (Nov 1992) | Change |
|---|---|---|---|
| NJ (treated) | 20.44 FTEs | 21.03 FTEs | +0.59 |
| PA (control) | 23.33 FTEs | 21.17 FTEs | −2.16 |
| DiD estimate | | | +2.75 |
The DiD estimate of +2.75 full-time equivalent employees suggests the minimum wage increase was associated with a modest increase in employment, contrary to the standard prediction. DiD designs have since been applied across many settings in management and strategy — for example, Choudhury et al. (2021) use a DiD design to study the productivity effects of a work-from-anywhere natural experiment.
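The table's arithmetic can be reproduced in a few lines. A minimal Python sketch using only the group means shown above:

```python
# Group-period mean FTE employment from the Card and Krueger table above
nj_before, nj_after = 20.44, 21.03   # New Jersey (treated)
pa_before, pa_after = 23.33, 21.17   # Pennsylvania (control)

change_nj = nj_after - nj_before     # first difference, treated:  +0.59
change_pa = pa_after - pa_before     # first difference, control:  -2.16
did = change_nj - change_pa          # second difference (DiD):    +2.75

print(round(did, 2))  # → 2.75
```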
D. Mathematical Derivation
Don't worry about the notation yet — here's what this means in words: DiD takes the change over time for the treated group and subtracts the change over time for the control group, netting out both group-level and time-level confounders.
Setup. Suppose we have two groups ($D_i \in \{0, 1\}$) and two time periods ($t \in \{0, 1\}$). Treatment is applied only to group $D_i = 1$ in period $t = 1$.

The potential outcomes model:

$$Y_{it} = \alpha + \gamma D_i + \lambda t + \delta (D_i \times t) + \varepsilon_{it}$$

where:
- $\alpha$ is the baseline level
- $\gamma$ captures the time-invariant difference between groups
- $\lambda$ captures the common time trend
- $\delta$ is the treatment effect (the parameter of interest)
- $\varepsilon_{it}$ is the idiosyncratic error
Step 1: Compute group-time means.

$$E[Y_{it} \mid D_i = 1, t = 0] = \alpha + \gamma$$
$$E[Y_{it} \mid D_i = 1, t = 1] = \alpha + \gamma + \lambda + \delta$$
$$E[Y_{it} \mid D_i = 0, t = 0] = \alpha$$
$$E[Y_{it} \mid D_i = 0, t = 1] = \alpha + \lambda$$

Step 2: First difference (within each group).

Treated: $(\alpha + \gamma + \lambda + \delta) - (\alpha + \gamma) = \lambda + \delta$
Control: $(\alpha + \lambda) - \alpha = \lambda$

Step 3: Second difference (across groups).

$$(\lambda + \delta) - \lambda = \delta$$

The common time trend $\lambda$ cancels out. The group fixed effect $\gamma$ was already differenced out in Step 2.
Regression implementation. The DiD estimator is numerically identical to OLS on:

$$Y_{it} = \beta_0 + \beta_1 D_i + \beta_2 \, \text{Post}_t + \beta_3 (D_i \times \text{Post}_t) + \varepsilon_{it}$$

Here $\hat{\beta}_3 = \hat{\delta}_{\text{DiD}}$. In a panel with unit and time fixed effects:

$$Y_{it} = \delta \, \text{TreatPost}_{it} + \mu_i + \tau_t + \varepsilon_{it}$$

where $\text{TreatPost}_{it} = D_i \times \text{Post}_t$, and $\mu_i$ and $\tau_t$ are unit and time fixed effects that absorb $\gamma D_i$ and $\lambda t$ respectively. The coefficient $\hat{\delta}$ is the DiD estimate.
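The numerical equivalence between the interaction regression and the double difference of cell means can be checked directly. A sketch using plain numpy on simulated data (all parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400
treated = rng.integers(0, 2, n)   # group indicator D_i
post = rng.integers(0, 2, n)      # period indicator Post_t
# Generate y from the 2x2 model with true treatment effect delta = 2.0
y = 1.0 + 0.5 * treated + 0.3 * post + 2.0 * treated * post + rng.normal(0, 1, n)

# OLS with intercept, group, period, and interaction terms
X = np.column_stack([np.ones(n), treated, post, treated * post])
beta = np.linalg.lstsq(X, y, rcond=None)[0]

# Double difference of the four cell means
cell = lambda d, p: y[(treated == d) & (post == p)].mean()
did = (cell(1, 1) - cell(1, 0)) - (cell(0, 1) - cell(0, 0))

# The interaction coefficient equals the double difference (up to float error)
print(abs(beta[3] - did) < 1e-8)  # → True
```

Because the 2x2 model is saturated, this equality holds exactly, not just approximately.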
E. Implementation
# Requires: fixest
# fixest: fast fixed-effects estimation with multi-way clustering (Berge)
library(fixest)
# --- Step 1: Two-way fixed effects (TWFE) DiD ---
# feols() estimates OLS with absorbed fixed effects
# treat_post = interaction of treated group x post-treatment period (the DiD term)
# | unit_id + year: absorbs unit and time fixed effects
# vcov = ~state: clusters SEs at state level because treatment varies at state level
m1 <- feols(y ~ treat_post | unit_id + year,
            data = df,
            vcov = ~state)
summary(m1)
# Coefficient on treat_post: estimated ATT (Average Treatment Effect on the Treated)
# --- Step 2: Event study specification ---
# i(event_time, treated, ref = -1) creates interactions between event-time dummies
# and the treated indicator, omitting k=-1 as the reference (normalization) period
# Pre-treatment coefficients near zero support parallel trends assumption
m2 <- feols(y ~ i(event_time, treated, ref = -1) | unit_id + year,
            data = df,
            vcov = ~state)
# iplot() displays event-study coefficients with confidence intervals
iplot(m2, main = "Event Study Plot")
# Pre-treatment coefficients: test for parallel trends (should be near zero)
# Post-treatment coefficients: dynamic treatment effect path

F. Diagnostics
F.1 Event Study Plot (The Key Diagnostic)
The event study plot is a central diagnostic for DiD. It shows the estimated treatment effect for each period relative to the treatment date. Before treatment, the coefficients should be close to zero (supporting parallel trends). After treatment, they show the dynamic treatment effect.
How to read it:
- x-axis: Periods relative to treatment (negative = pre-treatment, 0 or 1 = first treated period)
- y-axis: Estimated coefficient (difference between treated and control relative to the reference period)
- Pre-treatment coefficients near zero: Supports parallel trends
- Pre-treatment coefficients trending: Warns that parallel trends may fail
- Post-treatment coefficients: Show how the treatment effect evolves over time
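Under the hood, each event-study coefficient is just a DiD contrast of one period against the reference period. A simulation sketch in plain numpy (rather than the fixest call above; all names and parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n_units, n_periods, t0 = 200, 6, 4          # treatment begins in period 4
treated = np.repeat(rng.integers(0, 2, n_units), n_periods)
period = np.tile(np.arange(n_periods), n_units)

# Dynamic true effect for treated units: 1.5 at event time 0, 2.5 at event time 1
effect = 1.5 * (period == t0) + 2.5 * (period == t0 + 1)
y = 0.5 * treated + 0.2 * period + effect * treated + rng.normal(0, 1, treated.size)

# Saturated event-study regression: intercept, treated, period dummies,
# and treated x period interactions for every period except the reference t0 - 1 (k = -1)
cols = [np.ones_like(y), treated.astype(float)]
cols += [(period == p).astype(float) for p in range(1, n_periods)]
event_periods = [p for p in range(n_periods) if p != t0 - 1]
cols += [(treated * (period == p)).astype(float) for p in event_periods]
beta = np.linalg.lstsq(np.column_stack(cols), y, rcond=None)[0]

# Map the interaction coefficients back to event time k = period - t0
event_coefs = dict(zip([p - t0 for p in event_periods], beta[-len(event_periods):]))
# Pre-treatment coefficients (k < -1) should be near zero;
# k = 0 should be near 1.5 and k = 1 near 2.5
```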
F.2 Placebo Test (Fake Treatment Date)
Run DiD using a fake treatment date (before the actual treatment). If you find an "effect" at the fake date, your parallel trends assumption is likely violated.
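One way to implement this check, sketched on simulated data (illustrative Python, not the chapter's R workflow): estimate a 2x2 DiD using only pre-treatment periods, with a fake cutoff in the middle.

```python
import numpy as np

rng = np.random.default_rng(1)
n_units, n_periods = 50, 6
treated = np.repeat(rng.integers(0, 2, n_units), n_periods)
period = np.tile(np.arange(n_periods), n_units)
# True treatment starts at period 4 with effect 2.0; common trend of 0.3 per period
y = (0.5 * treated + 0.3 * period
     + 2.0 * treated * (period >= 4) + rng.normal(0, 1, treated.size))

def did_2x2(cutoff, last_period):
    """Mean-based 2x2 DiD using periods < last_period, split at `cutoff`."""
    keep = period < last_period
    post, pre = (period >= cutoff) & keep, (period < cutoff) & keep
    t, c = treated == 1, treated == 0
    return ((y[t & post].mean() - y[t & pre].mean())
            - (y[c & post].mean() - y[c & pre].mean()))

placebo = did_2x2(cutoff=2, last_period=4)  # fake date inside the pre-period
real = did_2x2(cutoff=4, last_period=6)     # actual treatment date
# placebo should be near zero; real should be near the true effect of 2.0
```

A nonzero placebo estimate at the fake date would be evidence of differential pre-trends.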
F.3 Alternative Control Groups
If you have multiple potential control groups, check that the DiD estimate is similar across different choices. Sensitivity to the control group suggests the parallel trends assumption may depend on which group you choose.
F.4 Inclusion of Group-Specific Trends
Adding group-specific linear time trends to the regression allows the treated and control groups to have different (linear) pre-trends. If results change substantially, the parallel trends assumption may be fragile. But be cautious: group-specific trends can absorb real treatment effects if the effect is gradual.
Interpreting Your Results
How to interpret the DiD coefficient: "The treatment is associated with a [coefficient] change in [outcome] relative to the comparison group, controlling for group and time fixed effects."
Example write-up:
Common misstatements:
- Do not say "the treated and control groups had parallel trends" — say "we find no evidence against parallel trends in the pre-treatment period"
- Do not omit the event study plot — reviewers now expect it
- Do not forget to report the number of clusters and the clustering level
G. What Can Go Wrong
Pitfall 1: Violation of Parallel Trends
When avoided: treated and control groups have parallel pre-trends.
Result: DiD estimate = 2.0, close to the true effect of 2.0. Event study shows flat pre-treatment coefficients.

Pitfall 2: Clustering at the Wrong Level
When avoided: standard errors are clustered at the state level (the level of treatment variation).
Result: SE = 1.36, t-stat = 2.03, marginally significant. Honest uncertainty about the estimate.

Pitfall 3: Anticipation Effects
When avoided: no behavioral change before the treatment date.
Result: DiD estimate = -2.16 (SE = 1.06), p = 0.04. Event study pre-treatment coefficients are -0.08, 0.12, -0.05, and 0.03 for periods t-4 through t-1, all statistically insignificant (largest |t| = 0.41). Clean break at the treatment date.
H. Practice
H.1 Concept Checks
In a canonical 2x2 DiD, what does the parallel trends assumption require?
You run a DiD regression and your colleague asks you to add county fixed effects, year fixed effects, AND state-specific linear time trends. What concern might you have about the state-specific trends?
A study has 50 states, 10 of which adopted a policy. The authors cluster standard errors at the individual level (N = 500,000). The coefficient is significant at p < 0.001. Should you be concerned?
Your event study plot shows pre-treatment coefficients (relative to t = −1) of −1.3 at t = −5, −0.9 at t = −4, −0.5 at t = −3, and −0.1 at t = −2. What does this pattern suggest?
You are studying the effect of a city-level smoking ban on restaurant revenue. You use DiD comparing restaurants in cities that adopted the ban to restaurants in cities that did not. What violation of SUTVA might concern you?
H.2 Guided Exercise
You have the following data from a DiD analysis. Fill in the blanks.
A state implemented a subsidized childcare program. You compare counties in the treated state to counties in a neighboring state. Average female labor force participation rates:
Treated state, before: 55% Treated state, after: 62% Control state, before: 58% Control state, after: 60%
H.3 Error Detective
Read the analysis below carefully and identify the errors.
A researcher studies the effect of a state-level paid family leave policy on birth rates. Three states adopted the policy in 2015; 47 states did not. Using state-year panel data (2010-2020), they estimate:
reghdfe birth_rate treat_post, absorb(state year) vce(robust)
They find: coefficient = 1.8 births per 1000, SE = 0.4, p < 0.001. They conclude: "The paid family leave policy caused a statistically significant increase in birth rates."
Select all errors you can find:
Read the analysis below carefully and identify the errors.
A management researcher studies whether adopting agile methodology improves software team productivity. They observe 200 teams at a tech company, 80 of which adopted agile in Q1 2020 (the rest continued with waterfall). Using quarterly panel data from 2018-2022:
reg productivity agile_team##post_Q12020 team_size experience, vce(cluster team)
Coefficient on interaction: 12.5 (SE = 3.1, p < 0.001). "Agile adoption increased productivity by 12.5 units."
The event study shows flat pre-trends from 2018-2019.
Select all errors you can find:
H.4 You Are the Referee
Read the paper summary below and write a brief referee critique (2-3 sentences) of the identification strategy.
Paper Summary
The authors study the effect of a 2016 European regulation requiring large firms to disclose gender pay gap data. Using a DiD design, they compare firms above the size threshold (treated) to firms just below it (control), before and after the regulation. They find that the pay gap narrowed by 2.1 percentage points (SE = 0.7) in treated firms relative to control firms. The event study shows no significant pre-trends in the 3 years before the regulation.
Key Table
| Variable | Coefficient | Clustered SE | p-value |
|---|---|---|---|
| Treated x Post | -2.10 | 0.70 | 0.003 |
| Firm Size (log) | 0.45 | 0.12 | 0.000 |
| Industry FE | Yes | ||
| Firm FE | Yes | ||
| Year FE | Yes | ||
| Clusters (firms) | 1,200 | ||
| N (firm-years) | 9,600 | ||
| Pre-trend F-test p-value | 0.42 |
Authors' Identification Claim
The regulation created a quasi-experiment by treating firms above the size threshold and not those below it. The parallel trends assumption is supported by the non-significant pre-trends.
I. Swap-In: When to Use Something Else
- Staggered DiD: When treatment is adopted at different times by different units — canonical 2×2 DiD is a special case, and staggered estimators handle heterogeneous treatment effects.
- Synthetic control: When there is a single (or very few) treated unit and constructing a data-driven counterfactual from donor units is more credible than parallel trends.
- Event studies: When the full time profile of treatment effects is of primary interest, or when you need to visualize pre-trends as a diagnostic.
- RDD: When treatment is assigned by a threshold on a running variable rather than by group membership over time.
- Fixed effects: When treatment varies within unit over time but the parallel trends assumption is suspect — FE removes time-invariant confounders without requiring a comparison group.
J. Reviewer Checklist
Critical Reading Checklist
Paper Library
Foundational (6)
Abadie, A., Athey, S., Imbens, G. W., & Wooldridge, J. M. (2023). When Should You Adjust Standard Errors for Clustering?
Abadie et al. provide guidance on when clustering standard errors is necessary. They show that clustering can be motivated by sampling-based uncertainty (e.g., two-stage sampling of clusters then units) or design-based uncertainty (e.g., treatment assigned at the cluster level), and that whether to cluster, and at what level, is a substantive question tied to the sampling and assignment process — not a purely mechanical rule.
Ashenfelter, O. (1978). Estimating the Effect of Training Programs on Earnings.
Ashenfelter provides one of the earliest applications of the difference-in-differences logic, comparing the earnings of trainees before and after a job training program to a comparison group. The key insight is that differencing removes time-invariant unobserved differences between treatment and control groups. This paper also documents the 'Ashenfelter dip' — the pre-program earnings decline among trainees — which becomes a canonical example of why parallel trends cannot be taken for granted.
Bertrand, M., Duflo, E., & Mullainathan, S. (2004). How Much Should We Trust Differences-in-Differences Estimates?.
Bertrand, Duflo, and Mullainathan show that standard errors in DiD studies are often far too small because they ignore serial correlation within units over time. They propose clustering standard errors at the group level as a simple fix, which is now widely recommended practice in DiD applications.
Card, D., & Krueger, A. B. (1994). Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania.
Card and Krueger compare fast-food employment in New Jersey (which raised its minimum wage) with neighboring Pennsylvania (which did not) in perhaps the most famous DID study in economics. They find no negative employment effect, challenging the standard textbook prediction. This paper popularizes DID as a research design.
Frake, J., Gibbs, A., Goldfarb, B., Hiraiwa, T., Starr, E., & Yamaguchi, S. (2025). From Perfect to Practical: Partial Identification Methods for Causal Inference in Strategic Management Research.
Frake and colleagues introduce partial identification methods to strategic management, providing a practical framework for assessing the sensitivity of difference-in-differences and instrumental variables estimates to violations of identifying assumptions. The paper demonstrates how researchers can construct informative bounds on treatment effects when parallel trends or exclusion restriction assumptions are relaxed. It bridges the gap between the theoretical ideal of point identification and the practical reality that identifying assumptions are rarely perfectly satisfied.
Roth, J. (2022). Pretest with Caution: Event-Study Estimates after Testing for Parallel Trends.
Roth shows that the common practice of testing for parallel pre-trends and proceeding conditional on 'passing' can lead to distorted inference. He proposes honest confidence intervals that account for pre-testing, fundamentally changing how researchers should think about event study pre-trends in DiD designs.
Application (6)
Autor, D. H. (2003). Outsourcing at Will: The Contribution of Unjust Dismissal Doctrine to the Growth of Employment Outsourcing.
Autor uses a DiD design that exploits the staggered adoption of wrongful-discharge protections across U.S. states. He finds that stronger employment protections led firms to outsource more jobs. This paper is a model for using staggered state-level policy changes in a DiD framework.
Choudhury, P., Foroughi, C., & Larson, B. (2021). Work-from-anywhere: The Productivity Effects of Geographic Flexibility.
Choudhury, Foroughi, and Larson use a difference-in-differences design to study the productivity effects of a work-from-anywhere policy at the U.S. Patent and Trademark Office. They find that geographic flexibility increases output by approximately 4.4% without reducing quality. The paper demonstrates the application of DiD to a natural experiment in organizational design and is a leading example of causal inference in the future-of-work literature.
Duflo, E. (2001). Schooling and Labor Market Consequences of School Construction in Indonesia: Evidence from an Unusual Policy Experiment.
Duflo uses DiD comparing cohorts exposed to a massive school construction program in Indonesia to older cohorts not exposed, across regions with different program intensity. A beautifully clean application showing how DiD can exploit variation in treatment intensity across space and cohorts.
Gruber, J. (1994). The Incidence of Mandated Maternity Benefits.
Gruber uses a DiD design exploiting variation in state-level mandated maternity benefits to show that the costs of these benefits are shifted to workers in the form of lower wages. This study is a classic example of how DiD can exploit policy variation across states and time.
Neumark, D., & Wascher, W. (2000). Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania: Comment.
Neumark and Wascher challenge Card and Krueger's (1994) minimum wage findings by re-analyzing the data using payroll records instead of survey responses, finding negative employment effects. The exchange illustrates the importance of data quality and measurement choices in difference-in-differences designs.
Singh, J., & Agrawal, A. (2011). Recruiting for Ideas: How Firms Exploit the Prior Inventions of New Hires.
Singh and Agrawal use a difference-in-differences approach, comparing citation rates to recruits' patents before and after the move against matched control patents, to study how hiring inventors affects knowledge flows to the hiring firm. They find that hiring an inventor increases the hiring firm's citations to the recruit's prior patents, indicating knowledge transfer. The paper demonstrates how DiD with matched controls can identify causal effects in knowledge flow studies.
Survey (3)
Angrist, J. D., & Pischke, J.-S. (2009). Mostly Harmless Econometrics: An Empiricist's Companion.
Angrist and Pischke write one of the most influential modern textbooks on applied econometrics, organizing the field around a design-based approach to causal inference. The book provides essential treatments of instrumental variables, difference-in-differences, and regression discontinuity, each grounded in the potential outcomes framework. It remains the standard reference for graduate students learning to evaluate and implement identification strategies.
Cunningham, S. (2021). Causal Inference: The Mixtape.
Cunningham provides an accessible textbook with an excellent DiD chapter that walks through the intuition, the math, and the code (in Stata and R). Freely available online at mixtape.scunning.com, it is a valuable companion for students who want worked examples alongside formal treatment.
Roth, J., Sant'Anna, P. H. C., Bilinski, A., & Poe, J. (2023). What's Trending in Difference-in-Differences? A Synthesis of the Recent Econometrics Literature.
Roth et al. synthesize the explosion of recent econometric work on DiD in this comprehensive survey, covering staggered treatment timing, heterogeneous treatment effects, pre-trends testing, and new estimators. It is the essential starting point for understanding the modern DiD literature.