Difference-in-Differences (Canonical 2×2)
Estimates causal effects by comparing changes over time between treated and control groups.
Quick Reference
- When to Use
- When a treatment or policy change affects one group but not another, you observe both groups before and after the treatment, and both groups were on similar trajectories before treatment.
- Key Assumption
- Parallel trends: absent treatment, the treated and control groups would have followed the same trend over time. They do not need the same level — only the same change. Also requires no anticipation and SUTVA.
- Common Mistake
- Failing to cluster standard errors at the level of treatment assignment (e.g., state level for state policies), or interpreting non-significant pre-trends as proof that parallel trends holds.
- Estimated Time
- 3 hours with tutorial lab
One-Line Implementation
reghdfe y treat_post, absorb(unit time) vce(cluster state)feols(y ~ treat_post | unit + time, data = df, vcov = ~state)PanelOLS.from_formula('y ~ treat_post + EntityEffects + TimeEffects', data=df).fit(cov_type='clustered', clusters=df['state'])Download Full Analysis Code
Complete scripts with diagnostics, robustness checks, and result export.
Motivating Example
On April 1, 1992, New Jersey raised its minimum wage from $4.25 to $5.05 per hour. Neighboring Pennsylvania did not. David Card and Alan Krueger saw an opportunity (Card & Krueger, 1994).
The conventional economic wisdom at the time was clear: raising the minimum wage should reduce employment. If labor costs go up, firms hire fewer workers. Theory said so, and decades of empirical work appeared to confirm it.
Card and Krueger did something simple but powerful. They surveyed fast-food restaurants in New Jersey (where the minimum wage went up) and eastern Pennsylvania (where it did not), both before and after the wage increase. If you just compared employment in New Jersey before and after, you would conflate the minimum wage effect with everything else happening to the New Jersey economy. If you just compared New Jersey to Pennsylvania at one point in time, you would conflate the minimum wage effect with all the other ways the two states differ.
But by taking the difference in the change over time — how employment changed in New Jersey minus how employment changed in Pennsylvania — you can difference out both time-invariant state differences and common time trends. What remained, they argued, was the effect of the minimum wage.
Their finding shocked the profession: employment in New Jersey fast-food restaurants did not fall. If anything, it rose slightly. This study is one of the most cited papers in economics, not because it settled the minimum wage debate (it did not), but because it demonstrated the power of the difference-in-differences research design.
A. Overview
What DiD Does
Difference-in-Differences is a research design that estimates causal effects by comparing the change over time in an outcome for a group affected by a treatment to the change over time for a group not affected.
The core logic has two "differences":
- First difference (over time): How did the outcome change from before to after the treatment, for each group?
- Second difference (across groups): How does the change for the treated group compare to the change for the control group?
The second difference removes any common time trends — events that affect both groups equally — leaving (under assumptions) the causal effect of the treatment.
When to Use DiD
- A treatment or policy change affects one group but not another
- You observe both groups both before and after the treatment
- The treated and control groups were on similar trajectories before the treatment
- Treatment timing is sharp and known
When NOT to Use DiD
- There is no credible comparison group
- Treatment adoption is staggered across many groups at different times (you may need modern staggered DiD estimators — see the Staggered DiD page)
- The parallel trends assumption is clearly violated
- The treatment is anticipated and agents change behavior before it takes effect
The Taxonomy Position
DiD is a method. Its credibility comes from the research design — the particular setting and the parallel trends argument — rather than from controlling for observed confounders alone. It sits between pure experimental designs (randomized control trials) and pure model-based approaches (OLS with controls).
B. Identification
Assumption 1: Parallel Trends
Plain language: In the absence of treatment, the treated and control groups would have experienced the same change in the outcome over time. Note: they do not need the same level of the outcome, just the same trend.
Formally:

$$\mathbb{E}[Y_{i1}(0) - Y_{i0}(0) \mid D_i = 1] = \mathbb{E}[Y_{i1}(0) - Y_{i0}(0) \mid D_i = 0]$$

where $Y_{it}(0)$ is the potential outcome without treatment, $t \in \{0, 1\}$ indexes pre/post, and $D_i \in \{0, 1\}$ indicates treatment status.
Assumption 2: No Anticipation
Plain language: The treated group does not change its behavior before the treatment takes effect. If New Jersey restaurants started adjusting employment in anticipation of the minimum wage increase, the pre-treatment period is contaminated.
Assumption 3: Stable Composition (No Selective Attrition)
Plain language: The composition of the treated and control groups does not change because of the treatment. If the minimum wage increase caused some restaurants to close (and you only observe survivors), your estimates are biased.
Assumption 4: SUTVA (Stable Unit Treatment Value Assumption)
Plain language: The treatment of one unit does not affect the outcomes of other units. If New Jersey restaurants losing workers caused Pennsylvania restaurants to gain them, the "control" group is contaminated by spillovers.
The Omitted Variable Bias Connection
DiD eliminates a specific class of confounders: variables that are constant over time within groups (group fixed effects) and variables that are constant across groups at each point in time (time fixed effects). It does not eliminate confounders that change differentially over time across groups. That remaining threat is precisely what the parallel trends assumption addresses. Sensitivity analysis can help quantify how large a violation of parallel trends would need to be to overturn your results.
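To see the remaining threat concretely, here is a small illustrative simulation (not from the chapter; all names and parameter values are assumptions for the sketch): a confounder that trends differentially for the treated group passes straight through into the DiD estimate, one-for-one.

```python
import random

random.seed(0)

def simulate(delta_trend, n=20000, true_effect=3.0):
    """2x2 DiD where the treated group has an extra drift of
    `delta_trend` between the pre and post periods (a confounder
    that changes differentially over time across groups)."""
    cells = {("T", 0): [], ("T", 1): [], ("C", 0): [], ("C", 1): []}
    for g in ("T", "C"):
        for t in (0, 1):
            for _ in range(n):
                y = 1.0 * (g == "T")                  # group fixed effect
                y += 2.0 * t                          # common time trend
                y += true_effect * (g == "T") * t     # treatment effect
                y += delta_trend * (g == "T") * t     # differential trend (the threat)
                y += random.gauss(0, 1)
                cells[(g, t)].append(y)
    mean = lambda xs: sum(xs) / len(xs)
    return (mean(cells[("T", 1)]) - mean(cells[("T", 0)])) \
         - (mean(cells[("C", 1)]) - mean(cells[("C", 0)]))

print(round(simulate(0.0), 1))   # ~3.0: parallel trends hold, DiD is unbiased
print(round(simulate(1.5), 1))   # ~4.5: the differential trend loads onto the estimate
```

Group and time fixed effects are fully differenced out in both runs; only the differential trend survives, which is why it is the assumption that needs defending.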
C. Visual Intuition
Parallel Trends: The Core of DiD
Two groups start at different levels but follow the same trend before treatment.
The 2x2 Table
Think of DiD through a simple table:
| | Before Treatment | After Treatment | Change (After − Before) |
|---|---|---|---|
| Treated | $\bar{Y}_{T,\text{pre}}$ | $\bar{Y}_{T,\text{post}}$ | $\Delta_T$ |
| Control | $\bar{Y}_{C,\text{pre}}$ | $\bar{Y}_{C,\text{post}}$ | $\Delta_C$ |
| Difference | | | $\Delta_T - \Delta_C$ |
The DiD estimate is the difference in the differences: how much the treated group's outcome changed, minus how much the control group's outcome changed.
In Card and Krueger's data:
| | Before (Feb 1992) | After (Nov 1992) | Change |
|---|---|---|---|
| NJ (treated) | 20.44 FTEs | 21.03 FTEs | +0.59 |
| PA (control) | 23.33 FTEs | 21.17 FTEs | −2.16 |
| DiD estimate | | | +2.75 |
The DiD estimate of +2.75 full-time equivalent employees suggests the minimum wage increase was associated with a modest increase in employment, contrary to the standard prediction.
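The arithmetic in the table takes two lines to reproduce. A minimal sketch, using the published group means from the table above:

```python
# Group-time mean employment (full-time equivalents) from Card & Krueger (1994)
nj_before, nj_after = 20.44, 21.03   # treated: New Jersey
pa_before, pa_after = 23.33, 21.17   # control: Pennsylvania

# First difference: change over time within each group
nj_change = nj_after - nj_before     # +0.59
pa_change = pa_after - pa_before     # -2.16

# Second difference: treated change minus control change
did = nj_change - pa_change
print(round(did, 2))                 # 2.75
```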
Build Your Own DiD
Adjust the treatment effect and parallel trends violation to see how they affect the DiD estimate.
Difference-in-Differences Simulator
Explore how difference-in-differences identifies treatment effects by comparing changes over time between treatment and control groups. The DGP is $Y_{it} = \alpha_i + \gamma_t + \beta D_{it} + \delta\,\text{trend}_{it} + \varepsilon_{it}$.
Estimation Results
| Estimator | Estimate | Bias |
|---|---|---|
| DiD estimate | 2.859 | -0.141 |
| Simple pre-post (no control) | 5.020 | +2.020 |
| True β | 3.000 | — |
DiD removes common trends. The simple pre-post estimate is biased by +2.02 because it confounds the treatment effect with the common time trend. DiD differences this out.
Why DiD? Seeing the Parallel Trends
DGP: Yᵢₜ = αᵢ + γₜ + 3.0·Dᵢₜ + 0.0·t·Treatᵢ + ε. 50 units per group, 5 pre-periods, 5 post-periods, σ = 1.5.
Estimation Results
| Estimator | β̂ | SE | 95% CI | Bias |
|---|---|---|---|---|
| Pre-Post | 5.244 | 0.151 | [4.95, 5.54] | +2.244 |
| Cross-Section | 4.828 | 0.152 | [4.53, 5.13] | +1.828 |
| DiD | 2.881 | 0.214 | [2.46, 3.30] | -0.119 |
| OLS + Group/Time FE | 2.881 | 0.190 | [2.51, 3.25] | -0.119 |
| True β | 3.000 | — | — | — |
Why the difference?
The pre-post estimator (treatment group only) yields β̂ = 5.244 (bias = +2.244). It confounds the treatment effect with the common time trend: anything that changed between periods for *everyone* inflates the estimate. The cross-section estimator (post period only) yields β̂ = 4.828 (bias = +1.828). It confounds the treatment effect with the level difference between groups: the treatment group already had higher outcomes before treatment. DiD removes both confounders by differencing out the group level difference *and* the common time trend, yielding β̂ = 2.881 (bias = -0.119). With parallel trends satisfied (δ ≈ 0), DiD recovers the true effect. The OLS regression with group and time fixed effects (β̂ = 2.881) matches DiD closely, as expected — in the canonical 2×2 setting, the regression implementation is numerically equivalent to the double-differencing procedure.
D. Mathematical Derivation
Don't worry about the notation yet — here's what this means in words: DiD takes the change over time for the treated group and subtracts the change over time for the control group, netting out both group-level and time-level confounders.
Setup. Suppose we have two groups ($g \in \{T, C\}$) and two time periods ($t \in \{0, 1\}$). Treatment is applied only to group $T$ in period $1$.

The potential outcomes model:

$$Y_{gt} = \alpha + \gamma\,\mathbb{1}[g = T] + \lambda\,\mathbb{1}[t = 1] + \delta\,\mathbb{1}[g = T]\,\mathbb{1}[t = 1] + \varepsilon_{gt}$$

where:
- $\alpha$ is the baseline level
- $\gamma$ captures the time-invariant difference between groups
- $\lambda$ captures the common time trend
- $\delta$ is the treatment effect (the parameter of interest)
- $\varepsilon_{gt}$ is the idiosyncratic error
Step 1: Compute group-time means.

$$\mathbb{E}[Y_{C0}] = \alpha, \quad \mathbb{E}[Y_{C1}] = \alpha + \lambda, \quad \mathbb{E}[Y_{T0}] = \alpha + \gamma, \quad \mathbb{E}[Y_{T1}] = \alpha + \gamma + \lambda + \delta$$

Step 2: First difference (within each group).

$$\mathbb{E}[Y_{T1}] - \mathbb{E}[Y_{T0}] = \lambda + \delta, \qquad \mathbb{E}[Y_{C1}] - \mathbb{E}[Y_{C0}] = \lambda$$

Step 3: Second difference (across groups).

$$(\lambda + \delta) - \lambda = \delta$$

The common time trend $\lambda$ cancels out. The group fixed effect $\gamma$ was already differenced out in Step 2.
Regression implementation. The DiD estimator is numerically identical to OLS on:

$$Y_{it} = \beta_0 + \beta_1\,\text{Treat}_i + \beta_2\,\text{Post}_t + \delta\,(\text{Treat}_i \times \text{Post}_t) + \varepsilon_{it}$$

Here $\hat{\delta}$, the coefficient on the interaction, is the DiD estimate. In a panel with unit and time fixed effects:

$$Y_{it} = \alpha_i + \gamma_t + \delta\,D_{it} + \varepsilon_{it}$$

where $D_{it} = \text{Treat}_i \times \text{Post}_t$, and $\alpha_i$ and $\gamma_t$ are unit and time fixed effects that absorb $\text{Treat}_i$ and $\text{Post}_t$ respectively. The coefficient $\hat{\delta}$ is the DiD estimate.
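The numerical equivalence between the interaction regression and the double difference can be checked directly. A sketch using `numpy` (assumed available; the DGP and variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated 2x2 panel: two groups, two periods, n observations per cell
n = 500
treat = np.repeat([1, 0], n * 2)          # group indicator
post = np.tile(np.repeat([0, 1], n), 2)   # period indicator
y = (1.5 * treat            # group fixed effect
     + 0.8 * post           # common time trend
     + 3.0 * treat * post   # treatment effect (true delta = 3.0)
     + rng.normal(0, 1, 4 * n))

# Double difference from the four cell means
did = ((y[(treat == 1) & (post == 1)].mean() - y[(treat == 1) & (post == 0)].mean())
     - (y[(treat == 0) & (post == 1)].mean() - y[(treat == 0) & (post == 0)].mean()))

# OLS with the Treat x Post interaction
X = np.column_stack([np.ones_like(y), treat, post, treat * post])
beta = np.linalg.lstsq(X, y, rcond=None)[0]

print(np.isclose(beta[3], did))   # True: the interaction coefficient IS the double difference
```

Because the regression is saturated (four parameters, four cells), OLS fits the cell means exactly, which is why the equivalence is numerical rather than approximate.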
E. Implementation
library(fixest)
# Method 1: Two-way fixed effects
m1 <- feols(y ~ treat_post | unit_id + year,
data = df,
vcov = ~state)
summary(m1)
# Method 2: Event study
m2 <- feols(y ~ i(event_time, treated, ref = -1) | unit_id + year,
data = df,
vcov = ~state)
iplot(m2, main = "Event Study Plot")
# The i() function creates interactions with a reference period
# ref = -1 means the period just before treatment is the baseline

F. Diagnostics
F.1 Event Study Plot (The Essential Diagnostic)
The event study plot is a central diagnostic for DiD. It shows the estimated treatment effect for each period relative to the treatment date. Before treatment, the coefficients should be close to zero (supporting parallel trends). After treatment, they show the dynamic treatment effect.
How to read it:
- x-axis: Periods relative to treatment (negative = pre-treatment, 0 or 1 = first treated period)
- y-axis: Estimated coefficient (difference between treated and control relative to the reference period)
- Pre-treatment coefficients near zero: Supports parallel trends
- Pre-treatment coefficients trending: Warns that parallel trends may fail
- Post-treatment coefficients: Show how the treatment effect evolves over time
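In a balanced two-group panel, the event-study coefficient for period $t$ is simply the treated-minus-control gap in that period, re-centered at the reference period $t = -1$. A minimal sketch of that calculation (simulated data; all names and parameter values are assumptions for the illustration):

```python
import numpy as np

rng = np.random.default_rng(7)

periods = np.arange(-4, 4)   # event time: 4 pre-periods, 4 post-periods
n = 400                      # units per group
true_effect = 2.0

# Treated-minus-control gap at each event time (common trend, effect from t >= 0)
coefs = {}
for t in periods:
    y_treated = 5.0 + 0.5 * t + true_effect * (t >= 0) + rng.normal(0, 1, n)
    y_control = 3.0 + 0.5 * t + rng.normal(0, 1, n)
    coefs[t] = y_treated.mean() - y_control.mean()

# Re-center so the period just before treatment (t = -1) is the baseline
event_study = {t: coefs[t] - coefs[-1] for t in periods}

for t in periods:
    flag = "pre " if t < 0 else "post"
    print(f"{flag} t={t:+d}: {event_study[t]:+.2f}")
# Pre-period coefficients hover near zero; post-period coefficients near 2.0
```

With parallel trends satisfied, the pre-period coefficients are pure noise around zero, which is exactly the visual pattern the diagnostic asks you to check.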
Pre-Trend Power Lens
Failure to reject parallel trends does not mean parallel trends hold — it might just mean your test lacks power. Explore how sample size, number of pre-periods, the magnitude of a true pre-trend violation, and noise all affect the probability of detecting a violation.
Good power. At N = 60, the test has 100% power. Failing to reject H0 here would provide more meaningful evidence that the pre-trend is small.
F.2 Placebo Tests
Run DiD using a fake treatment date (before the actual treatment). If you find an "effect" at the fake date, your parallel trends assumption is likely violated.
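The mechanics of a placebo test: restrict the sample to pre-treatment periods, pretend treatment happened at an earlier date, and re-estimate. An illustrative simulation (names and values assumed; under parallel trends the placebo DiD should be near zero):

```python
import numpy as np

rng = np.random.default_rng(3)

n = 4000                        # units per group per period
pre_periods = [-4, -3, -2, -1]  # all before the real treatment date

# Under parallel trends, both groups share the same pre-treatment drift
means = {}
for t in pre_periods:
    means[("T", t)] = (4.0 + 0.6 * t + rng.normal(0, 1, n)).mean()
    means[("C", t)] = (1.0 + 0.6 * t + rng.normal(0, 1, n)).mean()

# Fake treatment date at t = -2: "post" = {-2, -1}, "pre" = {-4, -3}
def cell(g, ts):
    return np.mean([means[(g, t)] for t in ts])

placebo_did = ((cell("T", [-2, -1]) - cell("T", [-4, -3]))
             - (cell("C", [-2, -1]) - cell("C", [-4, -3])))

print(abs(placebo_did) < 0.2)   # True: no "effect" at the fake date
```

A placebo estimate that is large relative to its standard error at the fake date is evidence that the two groups were already diverging before treatment.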
F.3 Alternative Control Groups
If you have multiple potential control groups, check that the DiD estimate is similar across different choices. Sensitivity to the control group suggests the parallel trends assumption may depend on which group you choose.
F.4 Inclusion of Group-Specific Trends
Adding group-specific linear time trends to the regression allows the treated and control groups to have different (linear) pre-trends. If results change substantially, the parallel trends assumption may be fragile. But be cautious: group-specific trends can absorb real treatment effects if the effect is gradual.
Interpreting Your Results
How to interpret the DiD coefficient: "The treatment is associated with a [coefficient] change in [outcome] relative to the comparison group, controlling for group and time fixed effects."
Example write-up: "Employment at New Jersey fast-food restaurants rose by 2.75 FTEs relative to Pennsylvania restaurants after the minimum wage increase, net of state and period effects. Standard errors are clustered at the state level, and the event study shows no evidence of differential pre-trends."
Common misstatements:
- Do not say "the treated and control groups had parallel trends" — say "we find no evidence against parallel trends in the pre-treatment period"
- Do not omit the event study plot — reviewers now expect it
- Do not forget to report the number of clusters and the clustering level
G. What Can Go Wrong
Violation of Parallel Trends
When parallel trends hold (treated and control groups have parallel pre-trends), the DiD estimate is 2.0, close to the true effect of 2.0, and the event study shows flat pre-treatment coefficients. A differential trend would load directly onto the estimate.
Clustering at the Wrong Level
When standard errors are clustered at the state level (the level of treatment variation), SE = 1.36 and t = 2.03, marginally significant, reflecting honest uncertainty about the estimate. Clustering at a finer level would dramatically understate the uncertainty.
Anticipation Effects
When there is no behavioral change before the treatment date, the outcome shows a clean break at the treatment date and the event study shows zero pre-treatment effects. Anticipation would contaminate the pre-period baseline.
H. Practice
H.1 Concept Checks
In a canonical 2x2 DiD, what does the parallel trends assumption require?
You run a DiD regression and your colleague asks you to add county fixed effects, year fixed effects, AND state-specific linear time trends. What concern might you have about the state-specific trends?
A study has 50 states, 10 of which adopted a policy. The authors cluster standard errors at the individual level (N = 500,000). The coefficient is significant at p < 0.001. Should you be concerned?
Your event study plot shows pre-treatment coefficients of -0.1, -0.5, -0.9, -1.3 (increasingly negative as you move further before treatment). What does this suggest?
You are studying the effect of a city-level smoking ban on restaurant revenue. You use DiD comparing restaurants in cities that adopted the ban to restaurants in cities that did not. What violation of SUTVA might concern you?
H.2 Guided Exercise
You have the following data from a DiD analysis. Fill in the blanks.
A state implemented a subsidized childcare program. You compare counties in the treated state to counties in a neighboring state. Average female labor force participation rates:
- Treated state, before: 55%
- Treated state, after: 62%
- Control state, before: 58%
- Control state, after: 60%
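Once you have filled in the blanks, you can check your answer with the same two-step arithmetic used throughout the chapter (numbers taken from the exercise above):

```python
# Changes within each state (percentage points)
treated_change = 62 - 55   # +7
control_change = 60 - 58   # +2

# DiD estimate: the program is associated with a +5 pp rise in participation
did = treated_change - control_change
print(did)   # 5
```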
H.3 Error Detective
Read the analysis below carefully and identify the errors.
Select all errors you can find:
H.4 You Are the Referee
Read the paper summary below and write a brief referee critique (2-3 sentences) of the identification strategy.
Paper Summary
The authors study the effect of a 2016 European regulation requiring large firms to disclose gender pay gap data. Using a DiD design, they compare firms above the size threshold (treated) to firms just below it (control), before and after the regulation. They find that the pay gap narrowed by 2.1 percentage points (SE = 0.7) in treated firms relative to control firms. The event study shows no significant pre-trends in the 3 years before the regulation.
Key Table
| Variable | Coefficient | Clustered SE | p-value |
|---|---|---|---|
| Treated x Post | -2.10 | 0.70 | 0.003 |
| Firm Size (log) | 0.45 | 0.12 | 0.000 |
| Industry FE | Yes | ||
| Firm FE | Yes | ||
| Year FE | Yes | ||
| Clusters (firms) | 1,200 | ||
| N (firm-years) | 9,600 | ||
| Pre-trend F-test p-value | 0.42 |
Authors' Identification Claim
The regulation created a quasi-experiment by treating firms above the size threshold and not those below it. The parallel trends assumption is supported by the non-significant pre-trends.
I. Swap-In: When to Use Something Else
- Staggered DiD: When treatment is adopted at different times by different units — canonical 2×2 DiD is a special case, and staggered estimators handle heterogeneous treatment effects.
- Synthetic control: When there is a single (or very few) treated unit and constructing a data-driven counterfactual from donor units is more credible than parallel trends.
- Event studies: When the full time profile of treatment effects is of primary interest, or when you need to visualize pre-trends as a diagnostic.
- RDD: When treatment is assigned by a threshold on a running variable rather than by group membership over time.
- Fixed effects: When treatment varies within unit over time but the parallel trends assumption is suspect — FE removes time-invariant confounders without requiring a comparison group.
J. Reviewer Checklist
Critical Reading Checklist
Paper Library
Foundational (4)
Ashenfelter, O. (1978). Estimating the Effect of Training Programs on Earnings.
This paper is one of the earliest applications of the difference-in-differences logic. Ashenfelter compared the earnings of trainees before and after a job training program to a comparison group, introducing the idea that you can remove time-invariant unobserved differences by looking at changes over time.
Card, D., & Krueger, A. B. (1994). Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania.
Perhaps the most famous DID study in economics. Card and Krueger compared fast-food employment in New Jersey (which raised its minimum wage) with neighboring Pennsylvania (which did not). They found no negative employment effect, challenging the standard textbook prediction. This paper popularized DID as a research design.
Bertrand, M., Duflo, E., & Mullainathan, S. (2004). How Much Should We Trust Differences-in-Differences Estimates?.
This critical paper showed that standard errors in DID studies are often far too small because they ignore serial correlation within units over time. It proposed clustering standard errors at the group level as a simple fix, which is now standard practice in all DID analyses.
Roth, J. (2022). Pretest with Caution: Event-Study Estimates after Testing for Parallel Trends.
Shows that the common practice of testing for parallel pre-trends and proceeding conditional on 'passing' can lead to distorted inference. Proposes honest confidence intervals that account for pre-testing. Fundamentally changes how researchers should think about event study pre-trends in DiD designs.
Application (7)
Gruber, J. (1994). The Incidence of Mandated Maternity Benefits.
Gruber used a DID design exploiting variation in state-level mandated maternity benefits to show that the costs of these benefits were shifted to workers in the form of lower wages. This study is a classic example of how DID can exploit policy variation across states and time.
Autor, D. H. (2003). Outsourcing at Will: The Contribution of Unjust Dismissal Doctrine to the Growth of Employment Outsourcing.
Autor used a DID design that exploited the staggered adoption of wrongful-discharge protections across U.S. states. He found that stronger employment protections led firms to outsource more jobs. This paper is a model for using staggered state-level policy changes in a DID framework.
Lerner, J., & Wulf, J. (2007). Innovation and Incentives: Evidence from Corporate R&D.
This paper applied panel data methods including DID-style designs to study how compensation incentives for R&D managers affect innovation outcomes. It illustrates how DID thinking can be applied to management and innovation questions.
Flammer, C. (2015). Does Corporate Social Responsibility Lead to Superior Financial Performance? A Regression Discontinuity Approach.
Although primarily an RDD paper, Flammer also used DID-style before-after comparisons around shareholder proposals on CSR. Published in Management Science, it is a prominent example of quasi-experimental methods in top management journals.
Kellogg, R. (2011). Learning by Drilling: Interfirm Learning and Relationship Persistence in the Texas Oilpatch.
Kellogg used a DID approach leveraging oil price shocks to study how interfirm relationships affect productivity in the Texas oil industry. It is an excellent example of DID applied to organizational learning and firm boundaries questions relevant to strategy scholars.
Duflo, E. (2001). Schooling and Labor Market Consequences of School Construction in Indonesia: Evidence from an Unusual Policy Experiment.
Uses DiD comparing cohorts exposed to a massive school construction program in Indonesia to older cohorts not exposed, across regions with different program intensity. A beautifully clean application showing how DiD can exploit variation in treatment intensity across space and cohorts.
Agarwal, R., & Ohyama, A. (2013). Industry or Academia, Basic or Applied? Career Choices and Earnings Trajectories of Scientists.
Uses panel data on scientists' career choices with DiD-style comparisons to identify the effect of early career environment on long-run earnings trajectories. A management-journal application showing how DiD logic can be applied to career and human capital questions.
Survey (2)
Roth, J., Sant'Anna, P. H. C., Bilinski, A., & Poe, J. (2023). What's Trending in Difference-in-Differences? A Synthesis of the Recent Econometrics Literature.
This comprehensive survey synthesizes the explosion of recent econometric work on DID, covering staggered treatment timing, heterogeneous treatment effects, pre-trends testing, and new estimators. It is the essential starting point for understanding the modern DID literature.
Cunningham, S. (2021). Causal Inference: The Mixtape.
An accessible textbook with an excellent DiD chapter that walks through the intuition, the math, and the code (in Stata and R). Freely available online at mixtape.scunning.com, it is a valuable companion for students who want worked examples alongside formal treatment.