Difference-in-Differences (Canonical 2×2)
Estimates causal effects by comparing changes over time between treated and control groups.
One-Line Implementation
R:      feols(y ~ treat_post | unit + time, data = df, vcov = ~state)
Stata:  reghdfe y treat_post, absorb(unit time) vce(cluster state)
Python: PanelOLS.from_formula('y ~ treat_post + EntityEffects + TimeEffects', data=df).fit(cov_type='clustered', clusters=df['state'])
Motivating Example: The New Jersey Minimum Wage Increase
On April 1, 1992, New Jersey raised its minimum wage from $4.25 to $5.05 per hour. Neighboring Pennsylvania did not. David Card and Alan Krueger saw an opportunity.
The conventional economic wisdom at the time was clear: raising the minimum wage should reduce employment. If labor costs go up, firms hire fewer workers. Theory said so, and decades of empirical work appeared to confirm it.
Card and Krueger did something simple but powerful. They surveyed fast-food restaurants in New Jersey (where the minimum wage went up) and eastern Pennsylvania (where it did not), both before and after the wage increase. If you just compared employment in New Jersey before and after, you would conflate the minimum wage effect with everything else happening to the New Jersey economy. If you just compared New Jersey to Pennsylvania at one point in time, you would conflate the minimum wage effect with all the other ways the two states differ.
But by taking the difference in the change over time — how employment changed in New Jersey minus how employment changed in Pennsylvania — you can difference out both time-invariant state differences and common time trends. What remained, they argued, was the effect of the minimum wage.
Their finding shocked the profession: employment in New Jersey fast-food restaurants did not fall. If anything, it rose slightly. This study is one of the most cited papers in economics, not because it settled the minimum wage debate (it did not), but because it demonstrated the power of the difference-in-differences research design.
A. Overview
What DiD Does
Difference-in-Differences is a research design that estimates causal effects by comparing the change over time in an outcome for a group affected by a treatment to the change over time for a group not affected.
The core logic has two "differences":
- First difference (over time): How did the outcome change from before to after the treatment, for each group?
- Second difference (across groups): How does the change for the treated group compare to the change for the control group?
The second difference removes any common time trends — events that affect both groups equally — leaving (under assumptions) the causal effect of the treatment.
When to Use DiD
- A treatment or policy change affects one group but not another
- You observe both groups both before and after the treatment
- The treated and control groups were on similar trajectories before the treatment
- Treatment timing is sharp and known
When NOT to Use DiD
- There is no credible comparison group
- Treatment adoption is staggered across many groups at different times (you may need modern staggered DiD estimators — see the Staggered DiD page)
- The parallel trends assumption is clearly violated (e.g., the groups were already diverging before treatment)
- The treatment is anticipated and agents change behavior before it takes effect
The Taxonomy Position
DiD is a method. Its credibility comes from the research design — the particular setting and the parallel trends argument — rather than from controlling for observed confounders alone. It sits between pure experimental designs (randomized controlled trials) and pure model-based approaches (OLS with controls).
B. Identification
Assumption 1: Parallel Trends
Plain language: In the absence of treatment, the treated and control groups would have experienced the same change in the outcome over time. Note: they do not need the same level of the outcome, just the same trend.
Formally: for all periods $t$,

$$E[Y_{it}(0) - Y_{i,t-1}(0) \mid D_i = 1] = E[Y_{it}(0) - Y_{i,t-1}(0) \mid D_i = 0]$$

where $Y_{it}(0)$ is the potential outcome without treatment for unit $i$ at time $t$, and $D_i$ indicates treatment group membership.
Assumption 2: No Anticipation
Plain language: The treated group does not change its behavior before the treatment takes effect. If New Jersey restaurants started adjusting employment in anticipation of the minimum wage increase, the pre-treatment period is contaminated.
Assumption 3: Stable Composition (No Selective Attrition)
Plain language: The composition of the treated and control groups does not change because of the treatment. If the minimum wage increase caused some restaurants to close (and you only observe survivors), your estimates are biased.
Assumption 4: SUTVA (Stable Unit Treatment Value Assumption)
Plain language: The treatment of one unit does not affect the outcomes of other units. If New Jersey restaurants losing workers caused Pennsylvania restaurants to gain them, the "control" group is contaminated by spillovers.
The Omitted Variable Bias Connection
DiD eliminates a specific class of confounders: variables that are constant over time within groups (group fixed effects) and variables that are constant across groups at each point in time (time fixed effects). It does not eliminate confounders that change differentially over time across groups. That remaining threat is what the parallel trends assumption fundamentally addresses. Sensitivity analysis can help quantify how large a violation of parallel trends would have to be to overturn your results.
C. Visual Intuition
Parallel Trends: The Core of DiD
Two groups start at different levels but follow the same trend before treatment.
The 2x2 Table
Think of DiD through a simple table:
| | Before Treatment | After Treatment | Change (After − Before) |
|---|---|---|---|
| Treated | $\bar{Y}_{T,0}$ | $\bar{Y}_{T,1}$ | $\bar{Y}_{T,1} - \bar{Y}_{T,0}$ |
| Control | $\bar{Y}_{C,0}$ | $\bar{Y}_{C,1}$ | $\bar{Y}_{C,1} - \bar{Y}_{C,0}$ |
| Difference | | | $(\bar{Y}_{T,1} - \bar{Y}_{T,0}) - (\bar{Y}_{C,1} - \bar{Y}_{C,0})$ |
The DiD estimate is the difference in the differences: how much the treated group's outcome changed, minus how much the control group's outcome changed.
In Card and Krueger's data:
| | Before (Feb 1992) | After (Nov 1992) | Change |
|---|---|---|---|
| NJ (treated) | 20.44 FTEs | 21.03 FTEs | +0.59 |
| PA (control) | 23.33 FTEs | 21.17 FTEs | −2.16 |
| DiD estimate | | | +2.75 |
The DiD estimate of +2.75 full-time equivalent employees suggests the minimum wage increase was associated with a modest increase in employment, contrary to the standard prediction. DiD designs have since been applied across many settings in management and strategy — for example, Choudhury et al. (2021) use a DiD design to study the productivity effects of a work-from-anywhere natural experiment.
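The table's arithmetic can be reproduced in a few lines. A minimal Python sketch using only the group means shown above:

```python
# Group-period mean FTE employment from the Card and Krueger table above
nj_before, nj_after = 20.44, 21.03   # New Jersey (treated)
pa_before, pa_after = 23.33, 21.17   # Pennsylvania (control)

change_nj = nj_after - nj_before     # first difference, treated:  +0.59
change_pa = pa_after - pa_before     # first difference, control:  -2.16
did = change_nj - change_pa          # second difference (DiD):    +2.75

print(round(did, 2))  # → 2.75
```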
D. Mathematical Derivation
Don't worry about the notation yet — here's what this means in words: DiD takes the change over time for the treated group and subtracts the change over time for the control group, netting out both group-level and time-level confounders.
Setup. Suppose we have two groups ($D_i \in \{0, 1\}$) and two time periods ($t \in \{0, 1\}$). Treatment is applied only to group $D_i = 1$ in period $t = 1$.

The potential outcomes model:

$$Y_{it} = \alpha + \gamma D_i + \lambda t + \delta (D_i \times t) + \varepsilon_{it}$$

where:
- $\alpha$ is the baseline level
- $\gamma$ captures the time-invariant difference between groups
- $\lambda$ captures the common time trend
- $\delta$ is the treatment effect (the parameter of interest)
- $\varepsilon_{it}$ is the idiosyncratic error
Step 1: Compute group-time means.

$$E[Y_{it} \mid D_i = 1, t = 0] = \alpha + \gamma$$
$$E[Y_{it} \mid D_i = 1, t = 1] = \alpha + \gamma + \lambda + \delta$$
$$E[Y_{it} \mid D_i = 0, t = 0] = \alpha$$
$$E[Y_{it} \mid D_i = 0, t = 1] = \alpha + \lambda$$

Step 2: First difference (within each group).

Treated: $(\alpha + \gamma + \lambda + \delta) - (\alpha + \gamma) = \lambda + \delta$
Control: $(\alpha + \lambda) - \alpha = \lambda$

Step 3: Second difference (across groups).

$$(\lambda + \delta) - \lambda = \delta$$

The common time trend $\lambda$ cancels out. The group fixed effect $\gamma$ was already differenced out in Step 2.
Regression implementation. The DiD estimator is numerically identical to OLS on:

$$Y_{it} = \beta_0 + \beta_1 D_i + \beta_2 \, \text{Post}_t + \beta_3 (D_i \times \text{Post}_t) + \varepsilon_{it}$$

Here $\hat{\beta}_3 = \hat{\delta}_{\text{DiD}}$. In a panel with unit and time fixed effects:

$$Y_{it} = \delta \, \text{TreatPost}_{it} + \mu_i + \tau_t + \varepsilon_{it}$$

where $\text{TreatPost}_{it} = D_i \times \text{Post}_t$, and $\mu_i$ and $\tau_t$ are unit and time fixed effects that absorb $\gamma D_i$ and $\lambda t$ respectively. The coefficient $\hat{\delta}$ is the DiD estimate.
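The numerical equivalence between the interaction regression and the double difference of cell means can be checked directly. A sketch using plain numpy on simulated data (all parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400
treated = rng.integers(0, 2, n)   # group indicator D_i
post = rng.integers(0, 2, n)      # period indicator Post_t
# Generate y from the 2x2 model with true treatment effect delta = 2.0
y = 1.0 + 0.5 * treated + 0.3 * post + 2.0 * treated * post + rng.normal(0, 1, n)

# OLS with intercept, group, period, and interaction terms
X = np.column_stack([np.ones(n), treated, post, treated * post])
beta = np.linalg.lstsq(X, y, rcond=None)[0]

# Double difference of the four cell means
cell = lambda d, p: y[(treated == d) & (post == p)].mean()
did = (cell(1, 1) - cell(1, 0)) - (cell(0, 1) - cell(0, 0))

# The interaction coefficient equals the double difference (up to float error)
print(abs(beta[3] - did) < 1e-8)  # → True
```

Because the 2x2 model is saturated, this equality holds exactly, not just approximately.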
E. Implementation
# Requires: fixest
# fixest: fast fixed-effects estimation with multi-way clustering (Berge)
library(fixest)
# --- Step 1: Two-way fixed effects (TWFE) DiD ---
# feols() estimates OLS with absorbed fixed effects
# treat_post = interaction of treated group x post-treatment period (the DiD term)
# | unit_id + year: absorbs unit and time fixed effects
# vcov = ~state: clusters SEs at state level because treatment varies at state level
m1 <- feols(y ~ treat_post | unit_id + year,
            data = df,
            vcov = ~state)
summary(m1)
# Coefficient on treat_post: estimated ATT (Average Treatment Effect on the Treated)
# --- Step 2: Event study specification ---
# i(event_time, treated, ref = -1) creates interactions between event-time dummies
# and the treated indicator, omitting k=-1 as the reference (normalization) period
# Pre-treatment coefficients near zero support parallel trends assumption
m2 <- feols(y ~ i(event_time, treated, ref = -1) | unit_id + year,
            data = df,
            vcov = ~state)
# iplot() displays event-study coefficients with confidence intervals
iplot(m2, main = "Event Study Plot")
# Pre-treatment coefficients: test for parallel trends (should be near zero)
# Post-treatment coefficients: dynamic treatment effect path

F. Diagnostics
F.1 Event Study Plot (The Key Diagnostic)
The event study plot is a central diagnostic for DiD. It shows the estimated treatment effect for each period relative to the treatment date. Before treatment, the coefficients should be close to zero (supporting parallel trends). After treatment, they show the dynamic treatment effect.
How to read it:
- x-axis: Periods relative to treatment (negative = pre-treatment, 0 or 1 = first treated period)
- y-axis: Estimated coefficient (difference between treated and control relative to the reference period)
- Pre-treatment coefficients near zero: Supports parallel trends
- Pre-treatment coefficients trending: Warns that parallel trends may fail
- Post-treatment coefficients: Show how the treatment effect evolves over time
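Under the hood, each event-study coefficient is just a DiD contrast of one period against the reference period. A simulation sketch in plain numpy (rather than the fixest call above; all names and parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n_units, n_periods, t0 = 200, 6, 4          # treatment begins in period 4
treated = np.repeat(rng.integers(0, 2, n_units), n_periods)
period = np.tile(np.arange(n_periods), n_units)

# Dynamic true effect for treated units: 1.5 at event time 0, 2.5 at event time 1
effect = 1.5 * (period == t0) + 2.5 * (period == t0 + 1)
y = 0.5 * treated + 0.2 * period + effect * treated + rng.normal(0, 1, treated.size)

# Saturated event-study regression: intercept, treated, period dummies,
# and treated x period interactions for every period except the reference t0 - 1 (k = -1)
cols = [np.ones_like(y), treated.astype(float)]
cols += [(period == p).astype(float) for p in range(1, n_periods)]
event_periods = [p for p in range(n_periods) if p != t0 - 1]
cols += [(treated * (period == p)).astype(float) for p in event_periods]
beta = np.linalg.lstsq(np.column_stack(cols), y, rcond=None)[0]

# Map the interaction coefficients back to event time k = period - t0
event_coefs = dict(zip([p - t0 for p in event_periods], beta[-len(event_periods):]))
# Pre-treatment coefficients (k < -1) should be near zero;
# k = 0 should be near 1.5 and k = 1 near 2.5
```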
F.2 Placebo Test (Fake Treatment Date)
Run DiD using a fake treatment date (before the actual treatment). If you find an "effect" at the fake date, your parallel trends assumption is likely violated.
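One way to implement this check, sketched on simulated data (illustrative Python, not the chapter's R workflow): estimate a 2x2 DiD using only pre-treatment periods, with a fake cutoff in the middle.

```python
import numpy as np

rng = np.random.default_rng(1)
n_units, n_periods = 50, 6
treated = np.repeat(rng.integers(0, 2, n_units), n_periods)
period = np.tile(np.arange(n_periods), n_units)
# True treatment starts at period 4 with effect 2.0; common trend of 0.3 per period
y = (0.5 * treated + 0.3 * period
     + 2.0 * treated * (period >= 4) + rng.normal(0, 1, treated.size))

def did_2x2(cutoff, last_period):
    """Mean-based 2x2 DiD using periods < last_period, split at `cutoff`."""
    keep = period < last_period
    post, pre = (period >= cutoff) & keep, (period < cutoff) & keep
    t, c = treated == 1, treated == 0
    return ((y[t & post].mean() - y[t & pre].mean())
            - (y[c & post].mean() - y[c & pre].mean()))

placebo = did_2x2(cutoff=2, last_period=4)  # fake date inside the pre-period
real = did_2x2(cutoff=4, last_period=6)     # actual treatment date
# placebo should be near zero; real should be near the true effect of 2.0
```

A nonzero placebo estimate at the fake date would be evidence of differential pre-trends.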
F.3 Alternative Control Groups
If you have multiple potential control groups, check that the DiD estimate is similar across different choices. Sensitivity to the control group suggests the parallel trends assumption may depend on which group you choose.
F.4 Inclusion of Group-Specific Trends
Adding group-specific linear time trends to the regression allows the treated and control groups to have different (linear) pre-trends. If results change substantially, the parallel trends assumption may be fragile. But be cautious: group-specific trends can absorb real treatment effects if the effect is gradual.
Interpreting Your Results
How to interpret the DiD coefficient: "The treatment is associated with a [coefficient] change in [outcome] relative to the comparison group, controlling for group and time fixed effects."
Example write-up:
Common misstatements:
- Do not say "the treated and control groups had parallel trends" — say "we find no evidence against parallel trends in the pre-treatment period"
- Do not omit the event study plot — reviewers now expect it
- Do not forget to report the number of clusters and the clustering level
G. What Can Go Wrong
Pitfall 1: Violation of Parallel Trends
When avoided: treated and control groups have parallel pre-trends.
Result: DiD estimate = 2.0, close to the true effect of 2.0. Event study shows flat pre-treatment coefficients.

Pitfall 2: Clustering at the Wrong Level
When avoided: standard errors are clustered at the state level (the level of treatment variation).
Result: SE = 1.36, t-stat = 2.03, marginally significant. Honest uncertainty about the estimate.

Pitfall 3: Anticipation Effects
When avoided: no behavioral change before the treatment date.
Result: DiD estimate = -2.16 (SE = 1.06), p = 0.04. Event study pre-treatment coefficients are -0.08, 0.12, -0.05, and 0.03 for periods t-4 through t-1, all statistically insignificant (largest |t| = 0.41). Clean break at the treatment date.
H. Practice
H.1 Concept Checks
In a canonical 2x2 DiD, what does the parallel trends assumption require?
You run a DiD regression and your colleague asks you to add county fixed effects, year fixed effects, AND state-specific linear time trends. What concern might you have about the state-specific trends?
A study has 50 states, 10 of which adopted a policy. The authors cluster standard errors at the individual level (N = 500,000). The coefficient is significant at p < 0.001. Should you be concerned?
Your event study plot shows pre-treatment coefficients (relative to t = −1) of −1.3 at t = −5, −0.9 at t = −4, −0.5 at t = −3, and −0.1 at t = −2. What does this pattern suggest?
You are studying the effect of a city-level smoking ban on restaurant revenue. You use DiD comparing restaurants in cities that adopted the ban to restaurants in cities that did not. What violation of SUTVA might concern you?
H.2 Guided Exercise
You have the following data from a DiD analysis. Fill in the blanks.
A state implemented a subsidized childcare program. You compare counties in the treated state to counties in a neighboring state. Average female labor force participation rates:
Treated state, before: 55% Treated state, after: 62% Control state, before: 58% Control state, after: 60%
H.3 Error Detective
Read the analysis below carefully and identify the errors.
A researcher studies the effect of a state-level paid family leave policy on birth rates. Three states adopted the policy in 2015; 47 states did not. Using state-year panel data (2010-2020), they estimate:
reghdfe birth_rate treat_post, absorb(state year) vce(robust)
They find: coefficient = 1.8 births per 1000, SE = 0.4, p < 0.001. They conclude: "The paid family leave policy caused a statistically significant increase in birth rates."
Select all errors you can find:
Read the analysis below carefully and identify the errors.
A management researcher studies whether adopting agile methodology improves software team productivity. They observe 200 teams at a tech company, 80 of which adopted agile in Q1 2020 (the rest continued with waterfall). Using quarterly panel data from 2018-2022:
reg productivity agile_team##post_Q12020 team_size experience, vce(cluster team)
Coefficient on interaction: 12.5 (SE = 3.1, p < 0.001). "Agile adoption increased productivity by 12.5 units."
The event study shows flat pre-trends from 2018-2019.
Select all errors you can find:
H.4 You Are the Referee
Read the paper summary below and write a brief referee critique (2-3 sentences) of the identification strategy.
Paper Summary
The authors study the effect of a 2016 European regulation requiring large firms to disclose gender pay gap data. Using a DiD design, they compare firms above the size threshold (treated) to firms just below it (control), before and after the regulation. They find that the pay gap narrowed by 2.1 percentage points (SE = 0.7) in treated firms relative to control firms. The event study shows no significant pre-trends in the 3 years before the regulation.
Key Table
| Variable | Coefficient | Clustered SE | p-value |
|---|---|---|---|
| Treated x Post | -2.10 | 0.70 | 0.003 |
| Firm Size (log) | 0.45 | 0.12 | 0.000 |
| Industry FE | Yes | ||
| Firm FE | Yes | ||
| Year FE | Yes | ||
| Clusters (firms) | 1,200 | ||
| N (firm-years) | 9,600 | ||
| Pre-trend F-test p-value | 0.42 |
Authors' Identification Claim
The regulation created a quasi-experiment by treating firms above the size threshold and not those below it. The parallel trends assumption is supported by the non-significant pre-trends.
I. Swap-In: When to Use Something Else
- Staggered DiD: When treatment is adopted at different times by different units — canonical 2×2 DiD is a special case, and staggered estimators handle heterogeneous treatment effects.
- Synthetic control: When there is a single (or very few) treated unit and constructing a data-driven counterfactual from donor units is more credible than parallel trends.
- Event studies: When the full time profile of treatment effects is of primary interest, or when you need to visualize pre-trends as a diagnostic.
- RDD: When treatment is assigned by a threshold on a running variable rather than by group membership over time.
- Fixed effects: When treatment varies within unit over time but the parallel trends assumption is suspect — FE removes time-invariant confounders without requiring a comparison group.
J. Reviewer Checklist
Critical Reading Checklist
Paper Library
Foundational (6)
Abadie, A., Athey, S., Imbens, G. W., & Wooldridge, J. M. (2023). When Should You Adjust Standard Errors for Clustering?
Abadie et al. provide guidance on when clustering standard errors is necessary. They show that clustering can be motivated by sampling-based uncertainty (e.g., two-stage sampling of clusters then units) or design-based uncertainty (e.g., treatment assigned at the cluster level), and that whether to cluster, and at what level, is a substantive question tied to the sampling and assignment process — not a purely mechanical rule.
Ashenfelter, O. (1978). Estimating the Effect of Training Programs on Earnings.
Ashenfelter provides one of the earliest applications of the difference-in-differences logic, comparing the earnings of trainees before and after a job training program to a comparison group. The key insight is that differencing removes time-invariant unobserved differences between treatment and control groups. This paper also documents the 'Ashenfelter dip' — the pre-program earnings decline among trainees — which becomes a canonical example of why parallel trends cannot be taken for granted.
Bertrand, M., Duflo, E., & Mullainathan, S. (2004). How Much Should We Trust Differences-in-Differences Estimates?.
Bertrand, Duflo, and Mullainathan show that standard errors in DiD studies are often far too small because they ignore serial correlation within units over time. They propose clustering standard errors at the group level as a simple fix, which is now widely recommended practice in DiD applications.
Card, D., & Krueger, A. B. (1994). Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania.
Card and Krueger compare fast-food employment in New Jersey (which raised its minimum wage) with neighboring Pennsylvania (which did not) in perhaps the most famous DID study in economics. They find no negative employment effect, challenging the standard textbook prediction. This paper popularizes DID as a research design.
Frake, J., Gibbs, A., Goldfarb, B., Hiraiwa, T., Starr, E., & Yamaguchi, S. (2025). From Perfect to Practical: Partial Identification Methods for Causal Inference in Strategic Management Research.
Frake and colleagues introduce partial identification methods to strategic management, providing a practical framework for assessing the sensitivity of difference-in-differences and instrumental variables estimates to violations of identifying assumptions. The paper demonstrates how researchers can construct informative bounds on treatment effects when parallel trends or exclusion restriction assumptions are relaxed. It bridges the gap between the theoretical ideal of point identification and the practical reality that identifying assumptions are rarely perfectly satisfied.
Roth, J. (2022). Pretest with Caution: Event-Study Estimates after Testing for Parallel Trends.
Roth shows that the common practice of testing for parallel pre-trends and proceeding conditional on 'passing' can lead to distorted inference. He proposes honest confidence intervals that account for pre-testing, fundamentally changing how researchers should think about event study pre-trends in DiD designs.
Application (6)
Autor, D. H. (2003). Outsourcing at Will: The Contribution of Unjust Dismissal Doctrine to the Growth of Employment Outsourcing.
Autor uses a DiD design that exploits the staggered adoption of wrongful-discharge protections across U.S. states. He finds that stronger employment protections led firms to outsource more jobs. This paper is a model for using staggered state-level policy changes in a DiD framework.
Choudhury, P., Foroughi, C., & Larson, B. (2021). Work-from-anywhere: The Productivity Effects of Geographic Flexibility.
Choudhury, Foroughi, and Larson use a difference-in-differences design to study the productivity effects of a work-from-anywhere policy at the U.S. Patent and Trademark Office. They find that geographic flexibility increases output by approximately 4.4% without reducing quality. The paper demonstrates the application of DiD to a natural experiment in organizational design and is a leading example of causal inference in the future-of-work literature.
Duflo, E. (2001). Schooling and Labor Market Consequences of School Construction in Indonesia: Evidence from an Unusual Policy Experiment.
Duflo uses DiD comparing cohorts exposed to a massive school construction program in Indonesia to older cohorts not exposed, across regions with different program intensity. A beautifully clean application showing how DiD can exploit variation in treatment intensity across space and cohorts.
Gruber, J. (1994). The Incidence of Mandated Maternity Benefits.
Gruber uses a DiD design exploiting variation in state-level mandated maternity benefits to show that the costs of these benefits are shifted to workers in the form of lower wages. This study is a classic example of how DiD can exploit policy variation across states and time.
Neumark, D., & Wascher, W. (2000). Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania: Comment.
Neumark and Wascher challenge Card and Krueger's (1994) minimum wage findings by re-analyzing the data using payroll records instead of survey responses, finding negative employment effects. The exchange illustrates the importance of data quality and measurement choices in difference-in-differences designs.
Singh, J., & Agrawal, A. (2011). Recruiting for Ideas: How Firms Exploit the Prior Inventions of New Hires.
Singh and Agrawal use a difference-in-differences approach, comparing citation rates to recruits' patents before and after the move against matched control patents, to study how hiring inventors affects knowledge flows to the hiring firm. They find that hiring an inventor increases the hiring firm's citations to the recruit's prior patents, indicating knowledge transfer. The paper demonstrates how DiD with matched controls can identify causal effects in knowledge flow studies.
Survey (3)
Angrist, J. D., & Pischke, J.-S. (2009). Mostly Harmless Econometrics: An Empiricist's Companion.
Angrist and Pischke write one of the most influential modern textbooks on applied econometrics, organizing the field around a design-based approach to causal inference. The book provides essential treatments of instrumental variables, difference-in-differences, and regression discontinuity, each grounded in the potential outcomes framework. It remains the standard reference for graduate students learning to evaluate and implement identification strategies.
Cunningham, S. (2021). Causal Inference: The Mixtape.
Cunningham provides an accessible textbook with an excellent DiD chapter that walks through the intuition, the math, and the code (in Stata and R). Freely available online at mixtape.scunning.com, it is a valuable companion for students who want worked examples alongside formal treatment.
Roth, J., Sant'Anna, P. H. C., Bilinski, A., & Poe, J. (2023). What's Trending in Difference-in-Differences? A Synthesis of the Recent Econometrics Literature.
Roth et al. synthesize the explosion of recent econometric work on DiD in this comprehensive survey, covering staggered treatment timing, heterogeneous treatment effects, pre-trends testing, and new estimators. It is the essential starting point for understanding the modern DiD literature.