MethodAtlas
Method · Intermediate · 13 min read
Design-Based · Established

Difference-in-Differences (Canonical 2×2)

Estimates causal effects by comparing changes over time between treated and control groups.

When to Use: When a treatment or policy change affects one group but not another, you observe both groups before and after the treatment, and both groups were on similar trajectories before treatment.
Assumption: Parallel trends. Absent treatment, the treated and control groups would have followed the same trend over time. They do not need the same level, only the same change. Also requires no anticipation and SUTVA.
Mistake: Failing to cluster standard errors at the level of treatment assignment (e.g., the state level for state policies), or interpreting non-significant pre-trends as proof that parallel trends holds.
Reading Time: ~13 min read · 11 sections · 10 interactive exercises

One-Line Implementation

R:      feols(y ~ treat_post | unit + time, data = df, vcov = ~state)
Stata:  reghdfe y treat_post, absorb(unit time) vce(cluster state)
Python: PanelOLS.from_formula('y ~ treat_post + EntityEffects + TimeEffects', data=df).fit(cov_type='clustered', clusters=df['state'])


Motivating Example: The New Jersey Minimum Wage Increase

On April 1, 1992, New Jersey raised its minimum wage from $4.25 to $5.05 per hour. Neighboring Pennsylvania did not. David Card and Alan Krueger saw an opportunity.

The conventional economic wisdom at the time was clear: raising the minimum wage should reduce employment. If labor costs go up, firms hire fewer workers. Theory said so, and decades of empirical work appeared to confirm it.

Card and Krueger did something simple but powerful. They surveyed fast-food restaurants in New Jersey (where the minimum wage went up) and eastern Pennsylvania (where it did not), both before and after the wage increase. If you just compared employment in New Jersey before and after, you would conflate the minimum wage effect with everything else happening to the New Jersey economy. If you just compared New Jersey to Pennsylvania at one point in time, you would conflate the minimum wage effect with all the other ways the two states differ.

But by taking the difference in the change over time — how employment changed in New Jersey minus how employment changed in Pennsylvania — you can difference out both time-invariant state differences and common time trends. What remained, they argued, was the effect of the minimum wage.

Their finding shocked the profession: employment in New Jersey fast-food restaurants did not fall. If anything, it rose slightly. This study is one of the most cited papers in economics, not because it settled the minimum wage debate (it did not), but because it demonstrated the power of the difference-in-differences research design.


A. Overview

What DiD Does

Difference-in-Differences is a research design that estimates causal effects by comparing the change over time in an outcome for a group affected by a treatment to the change over time for a group not affected.

The core logic has two "differences":

  1. First difference (over time): How did the outcome change from before to after the treatment, for each group?
  2. Second difference (across groups): How does the change for the treated group compare to the change for the control group?

The second difference removes any common time trends — events that affect both groups equally — leaving (under assumptions) the causal effect of the treatment.
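The two differences can be sketched with toy numbers (all values hypothetical, chosen so the common trend is +1.0 and the true effect is +2.0):

```python
# Hypothetical group means: both groups share a +1.0 time trend;
# only the treated group receives the +2.0 treatment effect.
treated_pre, treated_post = 10.0, 13.0   # change = +3.0 (trend + effect)
control_pre, control_post = 6.0, 7.0     # change = +1.0 (trend only)

# First difference: change over time within each group
delta_treated = treated_post - treated_pre   # 3.0
delta_control = control_post - control_pre   # 1.0

# Second difference: the common trend cancels, leaving the effect
did = delta_treated - delta_control
print(did)  # 2.0
```

The common trend (+1.0) appears in both first differences and cancels in the second, which is exactly the logic described above.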

When to Use DiD

  • A treatment or policy change affects one group but not another
  • You observe both groups both before and after the treatment
  • The treated and control groups were on similar trajectories before the treatment
  • Treatment timing is sharp and known

When NOT to Use DiD

  • There is no credible comparison group
  • Treatment adoption is staggered across many groups at different times (you may need modern staggered DiD estimators — see the Staggered DiD page)
  • The parallel trends assumption is clearly violated
  • The treatment is anticipated and agents change behavior before it takes effect

The Taxonomy Position

DiD is a design-based method. Its credibility comes from the research design (the particular setting and the parallel trends argument) rather than from controlling for observed confounders alone. It sits between pure experimental designs (randomized control trials) and pure model-based approaches (OLS with controls).




B. Identification

Assumption 1: Parallel Trends

Plain language: In the absence of treatment, the treated and control groups would have experienced the same change in the outcome over time. Note: they do not need the same level of the outcome, just the same trend.

Formally: E[Y_{it}(0) - Y_{i,t-1}(0) \mid D_i = 1] = E[Y_{it}(0) - Y_{i,t-1}(0) \mid D_i = 0] for all t

where Y_{it}(0) is the potential outcome without treatment for unit i at time t, and D_i indicates treatment group membership.

Assumption 2: No Anticipation

Plain language: The treated group does not change its behavior before the treatment takes effect. If New Jersey restaurants started adjusting employment in anticipation of the minimum wage increase, the pre-treatment period is contaminated.

Assumption 3: Stable Composition (No Selective Attrition)

Plain language: The composition of the treated and control groups does not change because of the treatment. If the minimum wage increase caused some restaurants to close (and you only observe survivors), your estimates are biased.

Assumption 4: SUTVA (Stable Unit Treatment Value Assumption)

Plain language: The treatment of one unit does not affect the outcomes of other units. If New Jersey restaurants losing workers caused Pennsylvania restaurants to gain them, the "control" group is contaminated by spillovers.

The Omitted Variable Bias Connection

DiD eliminates a specific class of confounders: variables that are constant over time within groups (group fixed effects) and variables that are constant across groups at each point in time (time fixed effects). It does not eliminate confounders that change differentially over time across groups. That remaining threat is what the parallel trends assumption fundamentally addresses. Sensitivity analysis can help quantify how large a violation of parallel trends would need to be to overturn your results.


C. Visual Intuition

[Animated explanation: Parallel Trends, the Core of DiD. Two groups start at different levels but follow the same trend before treatment; the treatment effect is identified by comparing changes over time between treated and control groups.]

The 2x2 Table

Think of DiD through a simple table:

             | Before Treatment | After Treatment  | Change (After - Before)
Treated      | \bar{Y}_{T,pre}  | \bar{Y}_{T,post} | \Delta\bar{Y}_T
Control      | \bar{Y}_{C,pre}  | \bar{Y}_{C,post} | \Delta\bar{Y}_C
Difference   |                  |                  | \Delta\bar{Y}_T - \Delta\bar{Y}_C = \hat{\tau}_{DiD}

The DiD estimate is the difference in the differences: how much the treated group's outcome changed, minus how much the control group's outcome changed.

In Card and Krueger's data:

             | Before (Feb 1992) | After (Nov 1992) | Change
NJ (treated) | 20.44 FTEs        | 21.03 FTEs       | +0.59
PA (control) | 23.33 FTEs        | 21.17 FTEs       | -2.16
DiD estimate |                   |                  | +2.75

The DiD estimate of +2.75 full-time equivalent employees suggests the minimum wage increase was associated with a modest increase in employment, contrary to the standard prediction. DiD designs have since been applied across many settings in management and strategy — for example, Choudhury et al. (2021) use a DiD design to study the productivity effects of a work-from-anywhere natural experiment.
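The arithmetic behind the table can be reproduced directly (numbers from the table above):

```python
# Card-Krueger group means, FTE employment
nj_pre, nj_post = 20.44, 21.03   # New Jersey (treated)
pa_pre, pa_post = 23.33, 21.17   # Pennsylvania (control)

delta_nj = nj_post - nj_pre      # +0.59
delta_pa = pa_post - pa_pre      # -2.16
did = delta_nj - delta_pa
print(round(did, 2))  # 2.75
```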


D. Mathematical Derivation

Don't worry about the notation yet — here's what this means in words: DiD takes the change over time for the treated group and subtracts the change over time for the control group, netting out both group-level and time-level confounders.

Setup. Suppose we have two groups (D \in \{0, 1\}) and two time periods (t \in \{0, 1\}). Treatment is applied only to group D = 1 in period t = 1.

The potential outcomes model:

Y_{it} = \alpha + \gamma D_i + \lambda t + \tau (D_i \times t) + \varepsilon_{it}

where:

  • \alpha is the baseline level
  • \gamma captures the time-invariant difference between groups
  • \lambda captures the common time trend
  • \tau is the treatment effect (the parameter of interest)
  • \varepsilon_{it} is the idiosyncratic error

Step 1: Compute group-time means.

\begin{aligned}
E[Y \mid D=1, t=1] &= \alpha + \gamma + \lambda + \tau \\
E[Y \mid D=1, t=0] &= \alpha + \gamma \\
E[Y \mid D=0, t=1] &= \alpha + \lambda \\
E[Y \mid D=0, t=0] &= \alpha
\end{aligned}

Step 2: First difference (within each group).

\begin{aligned}
\Delta_T &= E[Y \mid D=1, t=1] - E[Y \mid D=1, t=0] = \lambda + \tau \\
\Delta_C &= E[Y \mid D=0, t=1] - E[Y \mid D=0, t=0] = \lambda
\end{aligned}

Step 3: Second difference (across groups).

\hat{\tau}_{DiD} = \Delta_T - \Delta_C = (\lambda + \tau) - \lambda = \tau

The common time trend \lambda cancels out. The group fixed effect \gamma was already differenced out in Step 2.

Regression implementation. The DiD estimator is numerically identical to OLS on:

Y_{it} = \beta_0 + \beta_1 D_i + \beta_2 \text{Post}_t + \beta_3 (D_i \times \text{Post}_t) + \varepsilon_{it}

Here \hat{\beta}_3 = \hat{\tau}_{DiD}. In a panel with unit and time fixed effects:

Y_{it} = \alpha_i + \lambda_t + \tau \cdot \text{Treat}_{it} + \varepsilon_{it}

where \text{Treat}_{it} = D_i \times \text{Post}_t, and \alpha_i and \lambda_t are unit and time fixed effects that absorb \gamma and \lambda respectively. The coefficient \hat{\tau} is the DiD estimate.
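The numerical equivalence between the 2x2 means and the interaction regression can be checked on noiseless simulated data (a sketch; the parameter values are arbitrary):

```python
import numpy as np

# Generate noiseless data from Y = alpha + gamma*D + lambda*Post + tau*(D x Post)
alpha, gamma, lam, tau = 5.0, 2.0, 1.5, 3.0
D    = np.array([0, 0, 1, 1] * 25)   # group indicator (100 obs, all four cells)
post = np.array([0, 1, 0, 1] * 25)   # period indicator
y = alpha + gamma * D + lam * post + tau * D * post

# OLS on the interaction specification: intercept, D, Post, D x Post
X = np.column_stack([np.ones_like(y), D, post, D * post])
beta = np.linalg.lstsq(X, y, rcond=None)[0]

# Difference in the group-time mean differences
did = (y[(D == 1) & (post == 1)].mean() - y[(D == 1) & (post == 0)].mean()) \
    - (y[(D == 0) & (post == 1)].mean() - y[(D == 0) & (post == 0)].mean())

print(beta[3], did)  # both recover tau = 3.0 (up to floating point)
```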


E. Implementation

# Requires: fixest
# fixest: fast fixed-effects estimation with multi-way clustering (Berge)
library(fixest)

# --- Step 1: Two-way fixed effects (TWFE) DiD ---
# feols() estimates OLS with absorbed fixed effects
# treat_post = interaction of treated group x post-treatment period (the DiD term)
# | unit_id + year: absorbs unit and time fixed effects
# vcov = ~state: clusters SEs at state level because treatment varies at state level
m1 <- feols(y ~ treat_post | unit_id + year,
          data = df,
          vcov = ~state)
summary(m1)
# Coefficient on treat_post: estimated ATT (Average Treatment Effect on the Treated)

# --- Step 2: Event study specification ---
# i(event_time, treated, ref = -1) creates interactions between event-time dummies
# and the treated indicator, omitting k=-1 as the reference (normalization) period
# Pre-treatment coefficients near zero support parallel trends assumption
m2 <- feols(y ~ i(event_time, treated, ref = -1) | unit_id + year,
          data = df,
          vcov = ~state)
# iplot() displays event-study coefficients with confidence intervals
iplot(m2, main = "Event Study Plot")
# Pre-treatment coefficients: test for parallel trends (should be near zero)
# Post-treatment coefficients: dynamic treatment effect path

F. Diagnostics

F.1 Event Study Plot (The Key Diagnostic)

The event study plot is a central diagnostic for DiD. It shows the estimated treatment effect for each period relative to the treatment date. Before treatment, the coefficients should be close to zero (supporting parallel trends). After treatment, they show the dynamic treatment effect.

How to read it:

  • x-axis: Periods relative to treatment (negative = pre-treatment, 0 or 1 = first treated period)
  • y-axis: Estimated coefficient (difference between treated and control relative to the reference period)
  • Pre-treatment coefficients near zero: Supports parallel trends
  • Pre-treatment coefficients trending: Warns that parallel trends may fail
  • Post-treatment coefficients: Show how the treatment effect evolves over time
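In the two-group case with no covariates, the event-study coefficients are simply the treated-control gap in each period, re-centered at the reference period k = -1. A minimal sketch with hypothetical group means:

```python
import pandas as pd

# Hypothetical period-level group means; treatment starts at event_time = 0
df = pd.DataFrame({
    "event_time": [-2, -1, 0, 1] * 2,
    "treated":    [1] * 4 + [0] * 4,
    "y":          [9.0, 10.0, 13.0, 14.0,   # treated group
                   5.0,  6.0,  7.0,  8.0],  # control group
})

# Treated-control gap in each period
gap = (df[df.treated == 1].set_index("event_time")["y"]
       - df[df.treated == 0].set_index("event_time")["y"])

# Re-center at the omitted reference period k = -1
coefs = gap - gap.loc[-1]
print(coefs.to_dict())  # {-2: 0.0, -1: 0.0, 0: 2.0, 1: 2.0}
```

The pre-period coefficient at k = -2 is zero (consistent with parallel trends), and the post-period coefficients trace out the treatment effect path.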

F.2 Placebo Test (Fake Treatment Date)

Run DiD using a fake treatment date (before the actual treatment). If you find an "effect" at the fake date, your parallel trends assumption is likely violated.
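A placebo test along these lines can be sketched as follows, assuming long-format data with columns `y`, `treated`, and `year` (the column names and the helper `placebo_did` are hypothetical):

```python
import pandas as pd

def placebo_did(df, fake_year, true_year):
    """2x2 DiD using only pre-treatment years, with a fake adoption date.

    A sizable 'effect' at the fake date is evidence against parallel trends.
    Column names (y, treated, year) are assumptions about the data layout.
    """
    pre = df[df["year"] < true_year].copy()            # drop truly treated periods
    pre["post"] = (pre["year"] >= fake_year).astype(int)
    cell = pre.groupby(["treated", "post"])["y"].mean()
    return (cell.loc[1, 1] - cell.loc[1, 0]) - (cell.loc[0, 1] - cell.loc[0, 0])

# Parallel pre-trends (both groups rise 1 per year): placebo DiD should be ~0
df = pd.DataFrame({
    "year":    [2010, 2011, 2012, 2013] * 2,
    "treated": [1] * 4 + [0] * 4,
    "y":       [10.0, 11.0, 12.0, 13.0, 5.0, 6.0, 7.0, 8.0],
})
print(placebo_did(df, fake_year=2012, true_year=2014))  # 0.0
```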

F.3 Alternative Control Groups

If you have multiple potential control groups, check that the DiD estimate is similar across different choices. Sensitivity to the control group suggests the parallel trends assumption may depend on which group you choose.

F.4 Group-Specific Linear Trends

Adding group-specific linear time trends to the regression allows the treated and control groups to have different (linear) pre-trends. If results change substantially, the parallel trends assumption may be fragile. But be cautious: group-specific trends can absorb real treatment effects if the effect is gradual.

Interpreting Your Results

How to interpret the DiD coefficient: "The treatment is associated with a [coefficient] change in [outcome] relative to the comparison group, controlling for group and time fixed effects."


Common misstatements:

  • Do not say "the treated and control groups had parallel trends" — say "we find no evidence against parallel trends in the pre-treatment period"
  • Do not omit the event study plot — reviewers now expect it
  • Do not forget to report the number of clusters and the clustering level

G. What Can Go Wrong

Each failure mode below is illustrated against its clean baseline in simulated data.

Violation of Parallel Trends

Baseline: treated and control groups have parallel pre-trends. DiD estimate = 2.0, close to the true effect of 2.0. Event study shows flat pre-treatment coefficients.

Clustering at the Wrong Level

Baseline: standard errors clustered at the state level (the level of treatment variation). SE = 1.36, t-stat = 2.03, marginally significant; honest uncertainty about the estimate.

Anticipation Effects

Baseline: no behavioral change before the treatment date. DiD estimate = -2.16 (SE = 1.06), p = 0.04. Event study pre-treatment coefficients are -0.08, 0.12, -0.05, and 0.03 for periods t-4 through t-1, all statistically insignificant (largest |t| = 0.41). Clean break at the treatment date.


H. Practice

H.1 Concept Checks

Concept Check

In a canonical 2x2 DiD, what does the parallel trends assumption require?

Concept Check

You run a DiD regression and your colleague asks you to add county fixed effects, year fixed effects, AND state-specific linear time trends. What concern might you have about the state-specific trends?

Concept Check

A study has 50 states, 10 of which adopted a policy. The authors cluster standard errors at the individual level (N = 500,000). The coefficient is significant at p < 0.001. Should you be concerned?

Concept Check

Your event study plot shows pre-treatment coefficients (relative to t = −1) of −1.3 at t = −5, −0.9 at t = −4, −0.5 at t = −3, and −0.1 at t = −2. What does this pattern suggest?

Concept Check

You are studying the effect of a city-level smoking ban on restaurant revenue. You use DiD comparing restaurants in cities that adopted the ban to restaurants in cities that did not. What violation of SUTVA might concern you?

H.2 Guided Exercise

Guided Exercise

You have the following data from a DiD analysis. Fill in the blanks.

A state implemented a subsidized childcare program. You compare counties in the treated state to counties in a neighboring state. Average female labor force participation rates:

Treated state, before: 55%
Treated state, after: 62%
Control state, before: 58%
Control state, after: 60%

What is the before-after change for the treated state?

What is the before-after change for the control state?

What is the DiD estimate?

Interpret the DiD estimate (with units).

H.3 Error Detective

Error Detective

Read the analysis below carefully and identify the errors.

A researcher studies the effect of a state-level paid family leave policy on birth rates. Three states adopted the policy in 2015; 47 states did not. Using state-year panel data (2010-2020), they estimate:

reghdfe birth_rate treat_post, absorb(state year) vce(robust)

They find: coefficient = 1.8 births per 1000, SE = 0.4, p < 0.001. They conclude: "The paid family leave policy caused a statistically significant increase in birth rates."

Select all errors you can find:

Error Detective

Read the analysis below carefully and identify the errors.

A management researcher studies whether adopting agile methodology improves software team productivity. They observe 200 teams at a tech company, 80 of which adopted agile in Q1 2020 (the rest continued with waterfall). Using quarterly panel data from 2018-2022:

reg productivity agile_team##post_Q12020 team_size experience, vce(cluster team)

Coefficient on interaction: 12.5 (SE = 3.1, p < 0.001). "Agile adoption increased productivity by 12.5 units."

The event study shows flat pre-trends from 2018-2019.

Select all errors you can find:

H.4 You Are the Referee

Referee Exercise

Read the paper summary below and write a brief referee critique (2-3 sentences) of the identification strategy.

Paper Summary

The authors study the effect of a 2016 European regulation requiring large firms to disclose gender pay gap data. Using a DiD design, they compare firms above the size threshold (treated) to firms just below it (control), before and after the regulation. They find that the pay gap narrowed by 2.1 percentage points (SE = 0.7) in treated firms relative to control firms. The event study shows no significant pre-trends in the 3 years before the regulation.

Key Table

Variable                 | Coefficient | Clustered SE | p-value
Treated x Post           | -2.10       | 0.70         | 0.003
Firm Size (log)          | 0.45        | 0.12         | 0.000
Industry FE              | Yes
Firm FE                  | Yes
Year FE                  | Yes
Clusters (firms)         | 1,200
N (firm-years)           | 9,600
Pre-trend F-test p-value | 0.42

Authors' Identification Claim

The regulation created a quasi-experiment by treating firms above the size threshold and not those below it. The parallel trends assumption is supported by the non-significant pre-trends.


I. Swap-In: When to Use Something Else

  • Staggered DiD: When treatment is adopted at different times by different units — canonical 2×2 DiD is a special case, and staggered estimators handle heterogeneous treatment effects.
  • Synthetic control: When there is a single (or very few) treated unit and constructing a data-driven counterfactual from donor units is more credible than parallel trends.
  • Event studies: When the full time profile of treatment effects is of primary interest, or when you need to visualize pre-trends as a diagnostic.
  • RDD: When treatment is assigned by a threshold on a running variable rather than by group membership over time.
  • Fixed effects: When treatment varies within unit over time but the parallel trends assumption is suspect — FE removes time-invariant confounders without requiring a comparison group.

J. Reviewer Checklist

Critical Reading Checklist


Paper Library

Foundational (6)

Abadie, A., Athey, S., Imbens, G. W., & Wooldridge, J. M. (2023). When Should You Adjust Standard Errors for Clustering?

Quarterly Journal of Economics. DOI: 10.1093/qje/qjac038

Abadie et al. provide guidance on when clustering standard errors is necessary. They show that clustering can be motivated by sampling-based uncertainty (e.g., two-stage sampling of clusters then units) or design-based uncertainty (e.g., treatment assigned at the cluster level), and that whether to cluster, and at what level, is a substantive question tied to the sampling and assignment process — not a purely mechanical rule.

Ashenfelter, O. (1978). Estimating the Effect of Training Programs on Earnings.

Review of Economics and Statistics. DOI: 10.2307/1924332

Ashenfelter provides one of the earliest applications of the difference-in-differences logic, comparing the earnings of trainees before and after a job training program to a comparison group. The key insight is that differencing removes time-invariant unobserved differences between treatment and control groups. This paper also documents the 'Ashenfelter dip' — the pre-program earnings decline among trainees — which becomes a canonical example of why parallel trends cannot be taken for granted.

Bertrand, M., Duflo, E., & Mullainathan, S. (2004). How Much Should We Trust Differences-in-Differences Estimates?

Quarterly Journal of Economics. DOI: 10.1162/003355304772839588

Bertrand, Duflo, and Mullainathan show that standard errors in DID studies are often far too small because they ignore serial correlation within units over time. They propose clustering standard errors at the group level as a simple fix, which is now widely recommended practice in DID applications.

Card, D., & Krueger, A. B. (1994). Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania.

American Economic Review

Card and Krueger compare fast-food employment in New Jersey (which raised its minimum wage) with neighboring Pennsylvania (which did not) in perhaps the most famous DID study in economics. They find no negative employment effect, challenging the standard textbook prediction. This paper popularizes DID as a research design.

Frake, J., Gibbs, A., Goldfarb, B., Hiraiwa, T., Starr, E., & Yamaguchi, S. (2025). From Perfect to Practical: Partial Identification Methods for Causal Inference in Strategic Management Research.

Strategic Management Journal. DOI: 10.1002/smj.3714

Frake and colleagues introduce partial identification methods to strategic management, providing a practical framework for assessing the sensitivity of difference-in-differences and instrumental variables estimates to violations of identifying assumptions. The paper demonstrates how researchers can construct informative bounds on treatment effects when parallel trends or exclusion restriction assumptions are relaxed. It bridges the gap between the theoretical ideal of point identification and the practical reality that identifying assumptions are rarely perfectly satisfied.

Roth, J. (2022). Pretest with Caution: Event-Study Estimates after Testing for Parallel Trends.

American Economic Review: Insights. DOI: 10.1257/aeri.20210236

Roth shows that the common practice of testing for parallel pre-trends and proceeding conditional on 'passing' can lead to distorted inference. He proposes honest confidence intervals that account for pre-testing, fundamentally changing how researchers should think about event study pre-trends in DiD designs.

Application (6)

Autor, D. H. (2003). Outsourcing at Will: The Contribution of Unjust Dismissal Doctrine to the Growth of Employment Outsourcing.

Journal of Labor Economics. DOI: 10.1086/344122

Autor uses a DID design that exploits the staggered adoption of wrongful-discharge protections across U.S. states. He finds that stronger employment protections led firms to outsource more jobs. This paper is a model for using staggered state-level policy changes in a DID framework.

Choudhury, P., Foroughi, C., & Larson, B. (2021). Work-from-anywhere: The Productivity Effects of Geographic Flexibility.

Strategic Management Journal. DOI: 10.1002/smj.3251

Choudhury, Foroughi, and Larson use a difference-in-differences design to study the productivity effects of a work-from-anywhere policy at the U.S. Patent and Trademark Office. They find that geographic flexibility increases output by approximately 4.4% without reducing quality. The paper demonstrates the application of DiD to a natural experiment in organizational design and is a leading example of causal inference in the future-of-work literature.

Duflo, E. (2001). Schooling and Labor Market Consequences of School Construction in Indonesia: Evidence from an Unusual Policy Experiment.

American Economic Review. DOI: 10.1257/aer.91.4.795

Duflo uses DiD comparing cohorts exposed to a massive school construction program in Indonesia to older cohorts not exposed, across regions with different program intensity. A beautifully clean application showing how DiD can exploit variation in treatment intensity across space and cohorts.

Gruber, J. (1994). The Incidence of Mandated Maternity Benefits.

American Economic Review

Gruber uses a DID design exploiting variation in state-level mandated maternity benefits to show that the costs of these benefits are shifted to workers in the form of lower wages. This study is a classic example of how DID can exploit policy variation across states and time.

Neumark, D., & Wascher, W. (2000). Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania: Comment.

American Economic Review. DOI: 10.1257/aer.90.5.1362

Neumark and Wascher challenge Card and Krueger's (1994) minimum wage findings by re-analyzing the data using payroll records instead of survey responses, finding negative employment effects. The exchange illustrates the importance of data quality and measurement choices in difference-in-differences designs.

Singh, J., & Agrawal, A. (2011). Recruiting for Ideas: How Firms Exploit the Prior Inventions of New Hires.

Management Science. DOI: 10.1287/mnsc.1100.1253

Singh and Agrawal use a difference-in-differences approach, comparing citation rates to recruits' patents before and after the move against matched control patents, to study how hiring inventors affects knowledge flows to the hiring firm. They find that hiring an inventor increases the hiring firm's citations to the recruit's prior patents, indicating knowledge transfer. The paper demonstrates how DiD with matched controls can identify causal effects in knowledge flow studies.

Survey (3)

Angrist, J. D., & Pischke, J.-S. (2009). Mostly Harmless Econometrics: An Empiricist's Companion.

Princeton University Press. DOI: 10.1515/9781400829828

Angrist and Pischke write one of the most influential modern textbooks on applied econometrics, organizing the field around a design-based approach to causal inference. The book provides essential treatments of instrumental variables, difference-in-differences, and regression discontinuity, each grounded in the potential outcomes framework. It remains the standard reference for graduate students learning to evaluate and implement identification strategies.

Cunningham, S. (2021). Causal Inference: The Mixtape.

Yale University Press. DOI: 10.12987/9780300255881

Cunningham provides an accessible textbook with an excellent DiD chapter that walks through the intuition, the math, and the code (in Stata and R). Freely available online at mixtape.scunning.com, it is a valuable companion for students who want worked examples alongside formal treatment.

Roth, J., Sant'Anna, P. H. C., Bilinski, A., & Poe, J. (2023). What's Trending in Difference-in-Differences? A Synthesis of the Recent Econometrics Literature.

Journal of Econometrics. DOI: 10.1016/j.jeconom.2023.03.008

Roth et al. synthesize the explosion of recent econometric work on DID in this comprehensive survey, covering staggered treatment timing, heterogeneous treatment effects, pre-trends testing, and new estimators. It is the essential starting point for understanding the modern DID literature.

Tags

design-based · panel · continuous-outcome · policy-evaluation