MethodAtlas

Staggered Adoption DiD (Modern Estimators)

Under staggered adoption with heterogeneous effects, traditional TWFE can produce biased estimates — modern estimators correct for this.

Quick Reference

When to Use
When treatment is adopted at different times by different units (staggered rollout) and treatment effects may be heterogeneous across cohorts or over time. A recommended first step is to run the Goodman-Bacon decomposition as a diagnostic.
Key Assumption
Parallel trends holds for each cohort separately; no anticipation effects; irreversibility (units do not switch treatment off once adopted). The not-yet-treated group is the preferred control.
Common Mistake
Using standard TWFE when treatment effects are heterogeneous across cohorts or over time — already-treated units serve as implicit controls, contaminating estimates. Run a Goodman-Bacon decomposition before choosing an estimator.
Estimated Time
3 hours

One-Line Implementation

Stata: csdid y x1 x2, ivar(unit) time(year) gvar(first_treat) method(dripw)
R: att_gt(yname='y', tname='year', idname='unit', gname='first_treat', data=df, control_group='notyettreated')
Python: # No standard Python package; use R did via rpy2 or differences

Motivating Example

Imagine you are studying the effect of state-level anti-discrimination laws on hiring outcomes for minority workers. These laws were not all passed at once — different states adopted them at different times between 1970 and 2000. This pattern is staggered adoption, and it is incredibly common in policy evaluation. The canonical 2x2 DiD framework handles a single treatment date cleanly, but staggered rollout introduces new complications.

You might naturally reach for a two-way fixed effects regression:

Y_{it} = \alpha_i + \lambda_t + \delta \cdot D_{it} + \varepsilon_{it}

where D_{it} = 1 once state i has adopted the law. This specification seems perfectly reasonable. You have unit fixed effects to absorb time-invariant state differences, year fixed effects to absorb common shocks, and \delta captures the treatment effect.

But here is the surprising result that has reshaped applied econometrics since roughly 2019: when treatment effects are heterogeneous across cohorts or evolve dynamically over time, this regression can give you a negative estimate even when the true treatment effect is positive for every single unit (Goodman-Bacon, 2021; de Chaisemartin & D'Haultfoeuille, 2020).

This result is not a typo. It is a consequence of how TWFE constructs comparisons under staggered timing — specifically, the use of already-treated units as controls, whose changing outcomes contaminate the estimate. Understanding this problem is now important for applied researchers working with staggered adoption designs.
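The sign flip is easy to reproduce. Below is a minimal, self-contained Python sketch (the adoption dates and effect paths are invented for illustration): two cohorts, no noise, every treated observation has a strictly positive effect, yet the TWFE coefficient, computed by two-way demeaning on the balanced panel, comes out negative.

```python
import numpy as np

# Two units observed over t = 1..10; no noise and no unit/time effects in
# the DGP, so the TWFE coefficient is exactly its implied weighted average.
T = 10
adopt = {"early": 4, "late": 8}       # adoption periods (illustrative)
slope = {"early": 3.0, "late": 1.0}   # effect growth per period of exposure

D = np.zeros((2, T))
Y = np.zeros((2, T))
for i, unit in enumerate(["early", "late"]):
    for t in range(1, T + 1):
        if t >= adopt[unit]:
            D[i, t - 1] = 1.0
            # strictly positive, dynamically growing treatment effect
            Y[i, t - 1] = slope[unit] * (t - adopt[unit] + 1)

def two_way_demean(M):
    # On a balanced panel, double demeaning reproduces TWFE exactly
    return M - M.mean(axis=1, keepdims=True) - M.mean(axis=0, keepdims=True) + M.mean()

Dd, Yd = two_way_demean(D), two_way_demean(Y)
twfe = (Dd * Yd).sum() / (Dd ** 2).sum()
true_att = Y[D == 1].mean()   # average effect over treated observations

print(f"TWFE = {twfe:.2f}, true ATT = {true_att:.2f}")
# prints: TWFE = -0.50, true ATT = 9.00
```

With a constant, homogeneous effect (replace the growing path with a fixed number for both cohorts), the same code recovers the true effect exactly, which is the homogeneity condition under which TWFE is fine.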

A. Overview: The TWFE Problem

The core issue is the following: when treatment rolls out at different times, TWFE does not just compare treated units to never-treated units. It also makes "forbidden comparisons" — comparing later-treated units to already-treated units. When already-treated units serve as controls, their treatment effects contaminate the estimate.

If treatment effects are homogeneous (the same for every unit at every point in time), this weighting does not matter. But if effects differ across cohorts or evolve over time — which is common in practice — TWFE can assign negative weights to some treatment effects, producing an estimate that is a distorted average of the true effects. Visualizing these dynamics through an event study is essential for diagnosing the problem.

(Goodman-Bacon, 2021)

Goodman-Bacon (2021) showed that the TWFE estimator is a weighted average of all possible 2x2 DiD comparisons, including the problematic ones where early-treated units serve as controls for later-treated units.

Common Confusions

"Does TWFE always give wrong answers?" No. If treatment effects are truly homogeneous across cohorts and over time, TWFE works fine. The problem arises specifically when effects are heterogeneous, and heterogeneous effects are common in many applied settings: treatment effects often vary across cohorts, grow or fade over time, or depend on implementation context. The difficulty is that you usually cannot tell in advance whether the heterogeneity is severe enough to matter; running a sensitivity analysis on the degree of heterogeneity can help you assess how much the bias matters in your specific application.

"Can I just add cohort-by-time interactions?" Adding interactions can help, but it changes what you are estimating and can create its own problems. The modern estimators are specifically designed to handle this complication correctly.

"Do I need to re-do all my old papers?" Not necessarily. It is advisable to run the Goodman-Bacon decomposition on your old estimates to check whether the problematic comparisons drive your results. If the weights are mostly positive and the problematic comparisons are a small share, your original results may be fine.

"Which modern estimator should I use?" There is no single best answer. Callaway and Sant'Anna (2021) is among the most flexible and widely adopted options as of 2024. de Chaisemartin and D'Haultfoeuille (2020) is well-suited when you want a simple overall average effect. Borusyak et al. (2024) offer an imputation-based approach that is efficient and intuitive. We discuss each below.

B. Identification

The Goodman-Bacon Decomposition

Goodman-Bacon (2021) proved that the TWFE estimator \hat{\delta}^{TWFE} can be written as:

\hat{\delta}^{TWFE} = \sum_{k} \sum_{l \neq k} w_{kl} \cdot \hat{\delta}_{kl}

where \hat{\delta}_{kl} is the simple 2x2 DiD estimate comparing cohort k (treated) to cohort l (control), and w_{kl} are non-negative weights (proportional to group sizes and treatment timing variance) that sum to one. The Goodman-Bacon weights themselves are never negative; the problem is that some \hat{\delta}_{kl} comparisons use already-treated units as controls, which biases those individual estimates. (Separately, de Chaisemartin and D'Haultfoeuille (2020) show a different decomposition where TWFE weights on individual treatment effects can be negative.)

The weights depend on:

  • Group sizes (larger groups get more weight)
  • Variance in treatment timing (more variation = more weight)
  • Whether the comparison is "clean" (treated vs. never-treated) or "contaminated" (treated vs. already-treated)

The contaminated comparisons are the problem. When an already-treated unit's outcomes have been affected by treatment, using it as a control biases the comparison.

Modern Solutions

All modern estimators share a common strategy: avoid the forbidden comparisons. They differ in how they achieve this goal.

Callaway and Sant'Anna (2021) estimate group-time average treatment effects ATT(g,t) — the effect for cohort g at time t — using only never-treated or not-yet-treated units as controls. These building blocks are then aggregated into summary measures.

de Chaisemartin and D'Haultfoeuille (2020) estimate a different parameter: the average effect of switching treatment on among the switchers. They show that this parameter is robust to heterogeneous effects.

Key Assumptions

All approaches require:

  1. Parallel trends for each cohort: In the absence of treatment, each cohort would have followed the same trend as its control group.
  2. No anticipation: Treatment has no effect before it is implemented.
  3. Irreversibility (often): Units do not switch treatment off once adopted. (Some estimators, such as de Chaisemartin and D'Haultfoeuille, can accommodate treatment reversals.)

C. Visual Intuition

The Goodman-Bacon decomposition provides an illuminating diagnostic plot. It shows each 2x2 comparison (treated vs. control cohort pair) as a dot, with the x-axis showing the weight and the y-axis showing the 2x2 estimate. The TWFE estimate is the weighted average.

If the dots are clustered together, TWFE is fine. If the "clean" comparisons (treated vs. never-treated) give very different estimates from the "contaminated" comparisons (treated vs. already-treated), and the contaminated comparisons have large weights, you have a problem.

Interactive Simulation

TWFE Decomposition

Set the treatment effect for early adopters and late adopters. When effects are heterogeneous, watch how TWFE diverges from the true average effect. The contaminated comparisons (using already-treated as controls) pull the estimate in the wrong direction.

Interactive Visualization

Bad Comparisons in Staggered DiD

This heatmap shows which comparisons TWFE implicitly makes when treatment is adopted at different times. Red cells indicate already-treated units that TWFE incorrectly uses as controls for later-adopting cohorts.

[Heatmap: cohorts adopting at t* = 2, 3, 4, 5, plus a never-treated group (rows), by time periods t = 1 through 7 (columns), showing valid and forbidden comparisons in TWFE estimation.]

Legend

Not yet treated
Treated
Already treated (used as control)
Never treated

TWFE Comparison Summary

TWFE uses 12 forbidden comparisons out of 29 total control cells (41.4%).

Why this matters: Under treatment effect heterogeneity, using already-treated units as controls contaminates the comparisons TWFE averages over and biases the estimate. Early adopters who have already responded to treatment are poor counterfactuals for later adopters, because the comparison captures the difference in treatment effects rather than the treatment effect itself.

Interactive Simulation

Why Modern DiD? TWFE vs. Clean Estimators

DGP: 3 cohorts with staggered adoption + never-treated group. Base effect = 3.0, heterogeneity = ±3.0, dynamic growth = 0.5/period, σ = 1.5. 50 units per cohort.

[Line chart: mean outcome Y by period for Cohort 1 (t*=2), Cohort 2 (t*=3), Cohort 3 (t*=4), and the never-treated group.]

Estimation Results

Estimator        β̂      SE     95% CI        Bias
TWFE             4.305   0.164  [3.98, 4.63]  +0.360
Clean (Modern)   4.156   0.637  [2.91, 5.41]  +0.211
True β           3.944

Why the difference?

TWFE yields β̂ = 4.305 (bias = +0.360). Under staggered adoption with heterogeneous effects, TWFE implicitly compares later-adopting cohorts against already-treated cohorts. When early adopters' outcomes have already shifted, this "forbidden comparison" contaminates the estimate. The clean estimator uses only not-yet-treated and never-treated units as controls, yielding β̂ = 4.156 (bias = +0.211). By avoiding forbidden comparisons, it recovers the true ATT (3.944) much more accurately. Cohort-specific ATTs: Cohort 1: 1.50, Cohort 2: 4.25, Cohort 3: 7.00. The variation across cohorts is what makes TWFE unreliable — it weights these effects in a way that can produce estimates outside the range of any individual cohort effect.

D. Mathematical Derivation

Don't worry about the notation yet — here's what this means in words: TWFE takes a weighted average of all possible two-group, two-period DiD estimates. The Goodman-Bacon weights on these 2×2 comparisons are non-negative, but some comparisons use already-treated units as controls, producing contaminated estimates. Under the de Chaisemartin and D'Haultfoeuille (2020) decomposition, the implied weights on individual treatment effects can be negative.

Consider a setting with three groups: Early treated (E), Late treated (L), and Never treated (N). Denote the treatment adoption dates as G_E < G_L.

The TWFE regression is:

Y_{it} = \alpha_i + \lambda_t + \delta D_{it} + \varepsilon_{it}

Goodman-Bacon (2021) shows:

\hat{\delta}^{TWFE} = w_{EN} \hat{\delta}_{EN} + w_{LN} \hat{\delta}_{LN} + w_{EL}^{pre} \hat{\delta}_{EL}^{pre} + w_{EL}^{post} \hat{\delta}_{EL}^{post}

where:

  • \hat{\delta}_{EN}: DiD comparing Early to Never-treated (clean)
  • \hat{\delta}_{LN}: DiD comparing Late to Never-treated (clean)
  • \hat{\delta}_{EL}^{pre}: DiD comparing Early to Late, before Late is treated (clean)
  • \hat{\delta}_{EL}^{post}: DiD comparing Late to Early, after Early is already treated (contaminated)

The last term is problematic. After G_E, the Early group's outcomes have been shifted by their treatment effect. If this effect is growing over time (dynamic effects), then \hat{\delta}_{EL}^{post} subtracts the change in the Early group's treatment effect from the Late group's treatment effect.

In extreme cases, if the Early group's effect is growing fast, \hat{\delta}_{EL}^{post} can be negative even if the true effect for Late adopters is positive. When this term gets enough weight, the overall TWFE estimate can be negative.
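The contaminated term can be made concrete with a small noise-free Python example (adoption dates and effect paths are invented for illustration). Computing each of the four 2x2 estimates directly shows that the three clean comparisons are positive, while the Early-as-control post comparison is negative because the Early group's growing effect gets differenced out of the Late group's effect.

```python
import numpy as np

t = np.arange(1, 9)                           # periods 1..8 (illustrative)
y_E = np.where(t >= 3, 3.0 * (t - 2), 0.0)    # Early: treated at t=3, effect grows 3/period
y_L = np.where(t >= 6, 1.0 * (t - 5), 0.0)    # Late: treated at t=6, effect grows 1/period
y_N = np.zeros_like(t, dtype=float)           # Never treated

def did_2x2(y_treat, y_ctrl, pre, post):
    """Simple 2x2 DiD: change for the treated group minus change for the control."""
    return (y_treat[post].mean() - y_treat[pre].mean()) \
         - (y_ctrl[post].mean() - y_ctrl[pre].mean())

d_EN      = did_2x2(y_E, y_N, t < 3, t >= 3)               # clean: Early vs Never
d_LN      = did_2x2(y_L, y_N, t < 6, t >= 6)               # clean: Late vs Never
d_EL_pre  = did_2x2(y_E, y_L, t < 3, (t >= 3) & (t < 6))   # clean: before Late treats
d_EL_post = did_2x2(y_L, y_E, (t >= 3) & (t < 6), t >= 6)  # contaminated: Early as control

print(d_EN, d_LN, d_EL_pre, d_EL_post)
# prints: 10.5 2.0 6.0 -7.0
```

All individual treatment effects in this DGP are positive, yet the contaminated 2x2 is -7.0: exactly the mechanism described above.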

Callaway and Sant'Anna solution: Estimate ATT(g,t) for each group g at each time t using only clean comparisons:

ATT(g,t) = E[Y_t - Y_{g-1} | G_i = g] - E[Y_t - Y_{g-1} | G_i = \infty]

where G_i = \infty denotes never-treated units (or not-yet-treated, depending on the specification). These group-time effects can then be aggregated:

\hat{\delta}^{CS} = \sum_g \sum_t w_{g,t} \cdot \widehat{ATT}(g,t)

with only positive weights.
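The ATT(g,t) building blocks are simple enough to compute by hand. A minimal Python sketch under a noise-free DGP (cohort dates and effect slopes are illustrative, with never-treated units as the control, as in the formula above):

```python
import numpy as np

periods = np.arange(1, 9)
cohorts = {3: 3.0, 6: 1.0}        # adoption period g -> effect slope per period of exposure
y_never = np.zeros(len(periods))  # never-treated baseline (no drift, no noise)

def outcome(g, slope):
    # Treated outcome: zero before adoption, slope * exposure afterwards
    return np.where(periods >= g, slope * (periods - g + 1), 0.0)

def att_gt(y_g, g, t):
    """ATT(g,t) = E[Y_t - Y_{g-1} | G=g] - E[Y_t - Y_{g-1} | G=infinity]."""
    i_base, i_t = g - 2, t - 1    # 0-based indices of periods g-1 and t
    return (y_g[i_t] - y_g[i_base]) - (y_never[i_t] - y_never[i_base])

atts = {(g, t): att_gt(outcome(g, slope), g, t)
        for g, slope in cohorts.items()
        for t in range(g, len(periods) + 1)}

# With no noise, each ATT(g,t) equals the true effect slope * (t - g + 1)
print(atts[(3, 4)], atts[(6, 8)])
# prints: 6.0 3.0
```

An overall summary is then just a weighted average of these cells; the did package's aggte() implements the standard weighting schemes.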

E. Implementation

library(did)          # Callaway & Sant'Anna
library(fixest)       # Sun & Abraham via sunab()
library(bacondecomp)  # Goodman-Bacon decomposition
library(ggplot2)      # Plotting

# Step 1: Goodman-Bacon decomposition
# Decomposes the TWFE estimate into all 2x2 DiD comparisons
bacon_out <- bacon(y ~ D, data = df, id_var = "unit_id", time_var = "year")

# Plot each 2x2 comparison: weight on x-axis, estimate on y-axis
# Points colored by type (treated vs never-treated, already-treated, etc.)
ggplot(bacon_out) + aes(x = weight, y = estimate, color = type) +
geom_point() + geom_hline(yintercept = 0)

# Step 2: Callaway & Sant'Anna estimator
# Estimates group-time ATT(g,t) using only clean comparisons
# control_group = "notyettreated" uses not-yet-treated units as controls
cs_out <- att_gt(
yname = "y", tname = "year", idname = "unit_id", gname = "first_treat",
data = df, control_group = "notyettreated"
)
summary(cs_out)
ggdid(cs_out)  # Event-study plot of group-time ATTs

# Step 3: Aggregate to overall ATT
# type = "simple" gives the weighted average across all group-time cells
agg_cs <- aggte(cs_out, type = "simple")
summary(agg_cs)

# Step 4: Sun & Abraham via fixest
# sunab(cohort_var, time_var) implements interaction-weighted estimation
est_sa <- feols(y ~ sunab(first_treat, year) | unit_id + year,
              data = df, vcov = ~unit_id)  # cluster at the unit level
iplot(est_sa)

F. Diagnostics

  1. Run the Goodman-Bacon decomposition first. Before reaching for a modern estimator, decompose your TWFE estimate. If the problematic (already-treated as control) comparisons have small weights, TWFE may be fine.

  2. Compare TWFE to modern estimators. If they give similar results, the heterogeneity bias is small. If they diverge, report the modern estimator.

  3. Check group-time effects. Plot all the ATT(g,t) estimates from Callaway and Sant'Anna. This disaggregation reveals heterogeneity across cohorts and over time.

  4. Test for pre-trends within each cohort. The overall event study might look clean, but individual cohorts might have pre-trends that cancel each other out.

  5. Sensitivity to control group choice. Compare results using never-treated vs. not-yet-treated as the control group. If they diverge substantially, investigate why.
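Diagnostic 5 can be illustrated with a toy calculation (all numbers invented): when the never-treated group has its own drift, the two control-group choices produce different ATT estimates, and the gap itself is the warning sign.

```python
import numpy as np

periods = np.arange(1, 9)
# Early cohort adopts at t=3, Late cohort at t=6 (illustrative effect paths)
y_early = np.where(periods >= 3, 3.0 * (periods - 2), 0.0)
y_late  = np.where(periods >= 6, 1.0 * (periods - 5), 0.0)
# Never-treated units drift upward 0.5 per period: parallel trends fails for them
y_never = 0.5 * (periods - 1)

def att(y_treat, y_ctrl, g, t):
    # Long difference from the pre-adoption baseline period g-1 to period t
    i_base, i_t = g - 2, t - 1
    return (y_treat[i_t] - y_treat[i_base]) - (y_ctrl[i_t] - y_ctrl[i_base])

# ATT(3, 4) under the two control-group choices
att_notyet = att(y_early, y_late, 3, 4)    # Late cohort is still untreated at t=4
att_never  = att(y_early, y_never, 3, 4)   # never-treated, but drifting

print(att_notyet, att_never)
```

Here the not-yet-treated control recovers the true effect (6.0) while the drifting never-treated control gives 5.0; a gap like this between the two specifications is the cue to investigate why the never-treated group differs.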

Interpreting Your Results

TWFE and modern estimator agree: Encouraging, but not conclusive on its own. Agreement suggests the heterogeneity bias may be small in your setting, especially if the Goodman-Bacon decomposition confirms that contaminated comparisons receive little weight. Report both estimators and note the agreement, but do not treat agreement alone as proof that TWFE is unbiased — agreement can also occur when both estimators are affected by a common violation (e.g., a parallel trends failure that affects all cohorts similarly).

TWFE and modern estimator diverge: Report the modern estimator as your main result. Show the Goodman-Bacon decomposition to explain why TWFE differs. This divergence is actually a compelling narrative for your paper — it shows you understand the methodology.

Group-time effects vary substantially: This heterogeneity is often substantively interesting. Why do early adopters have different effects than late adopters? Is it because the policy is different, the context is different, or the treated populations are different?

G. What Can Go Wrong

Assumption Failure Demo

Negative TWFE Estimate Despite Uniformly Positive Treatment Effects

Researcher studies staggered adoption of right-to-carry (RTC) gun laws across US states from 1980 to 2010 using the Callaway and Sant'Anna (2021) estimator with not-yet-treated states as controls. Treatment effects are allowed to vary by adoption cohort and time since treatment.

Group-time ATTs reveal that early-adopting states (1980s cohort) show violent crime reductions of -8% that grow to -12% over 10 years, while late-adopting states (2000s cohort) show smaller reductions of -3%. The overall ATT aggregated with proper positive weights is -6.2%.

Assumption Failure Demo

Using Never-Treated as Controls When They Are Systematically Different

Researcher studying staggered Medicaid expansion uses both never-treated and not-yet-treated states as alternative control groups and compares results. They also test for pre-trends separately for each treatment cohort.

Results using not-yet-treated controls show ATT of +4.8 pp in insurance coverage. Results using never-treated controls show ATT of +7.1 pp. The discrepancy arises because the 14 states that never expanded Medicaid are systematically more conservative and had different baseline coverage trends. Cohort-specific pre-trend tests reject parallel trends for the never-treated control group.

Assumption Failure Demo

Ignoring Treatment Effect Heterogeneity Across Cohorts

Researcher studying the effect of state minimum wage increases on teen employment estimates cohort-specific effects and discovers that states raising minimum wages during recessions (2008-2010 cohort) show employment effects of -2.1%, while states raising during expansions (2014-2016 cohort) show effects of -0.4%.

The researcher reports the heterogeneity and discusses how macroeconomic conditions moderate the employment effect of minimum wages, providing evidence relevant to the policy debate about timing of wage increases.

H. Practice

Concept Check

You are studying the effect of state-level policies adopted between 2005 and 2015. Your TWFE estimate is 0.02 (p = 0.03). The Goodman-Bacon decomposition shows that 60% of the weight comes from comparisons where already-treated states serve as controls, and those comparisons yield estimates near zero. Clean comparisons (treated vs. never-treated) yield estimates around 0.05. What should you conclude?

Guided Exercise

Staggered DiD: State Marijuana Legalization and Traffic Fatalities

A public health researcher studies whether legalizing recreational marijuana increases traffic fatalities. Between 2012 and 2020, 14 states legalized recreational marijuana at different times. She has annual traffic fatality rates for all 50 states from 2005 to 2022. She estimates a two-way fixed effects (TWFE) regression and finds a positive but imprecise effect.

Why is standard TWFE potentially biased in this staggered adoption setting?

What does 'heterogeneous treatment effects' mean in this context?

Name one modern estimator that handles staggered adoption correctly and what it does differently from TWFE.

If you find TWFE = +0.8 but Callaway-Sant'Anna = +2.1, what does the difference suggest?

Error Detective

Read the analysis below carefully and identify the errors.

A researcher studies the effect of state-level paid family leave mandates on female labor force participation. Six states adopted paid leave between 2004 and 2018. The researcher estimates a TWFE regression: Y_it = alpha_i + lambda_t + delta * D_it + epsilon_it, clustered at the state level. They find delta = 0.023 (p = 0.04). They then estimate an event study using the same TWFE specification and report: "The event-study plot shows no pre-trends and a persistent positive effect. The TWFE estimate of 2.3 percentage points is our preferred specification because it is more efficient than the Callaway and Sant'Anna estimator, which yields a noisier estimate of 3.1 percentage points."

Select all errors you can find:

Error Detective

Read the analysis below carefully and identify the errors.

A researcher studies the staggered rollout of electronic health record (EHR) mandates across 30 hospitals in a health system from 2012 to 2019. They use the Callaway and Sant'Anna estimator with not-yet-treated hospitals as controls. They report: "The aggregate ATT is a 12% reduction in medication errors (p < 0.01). We find no evidence of heterogeneity across cohorts (F-test p = 0.34)." They do not report group-time specific effects or discuss which hospitals adopted early versus late. Their dataset has 5 hospitals that adopted in 2012, 8 in 2014, 10 in 2016, and 7 in 2019.

Select all errors you can find:

Referee Exercise

Read the paper summary below and write a brief referee critique (2-3 sentences) of the identification strategy.

Paper Summary

The authors study the staggered rollout of state-level renewable portfolio standards (RPS) on electricity prices using data from 50 states over 2000-2020. Twenty-nine states adopted RPS between 2002 and 2015. They implement both TWFE and Callaway-Sant'Anna estimators. The TWFE estimate suggests RPS increases electricity prices by 1.2 cents/kWh (p = 0.02). The Callaway-Sant'Anna estimate is 2.8 cents/kWh (p = 0.01). They report the Callaway-Sant'Anna result as their preferred specification.

Key Table

Estimator            ATT (cents/kWh)   SE    p-value
TWFE                 1.2               0.5   0.02
Callaway-Sant'Anna   2.8               1.1   0.01
Sun-Abraham          2.5               0.9   0.01
Goodman-Bacon Decomposition:
  Treated vs Never-treated:    3.1 (weight: 0.35)
  Treated vs Not-yet-treated:  2.4 (weight: 0.25)
  Already-treated vs Later:    -0.6 (weight: 0.40)

Authors' Identification Claim

Parallel trends are supported by flat pre-treatment event-study coefficients in the Callaway-Sant'Anna specification. The large discrepancy between TWFE and modern estimators demonstrates the importance of using heterogeneity-robust methods.

I. Swap-In: When to Use Something Else

  • Canonical 2×2 DiD: When treatment is adopted simultaneously by all treated units — the classic two-group, two-period design avoids the negative-weighting issues of staggered settings.
  • Event studies: When the full time profile of dynamic treatment effects is of primary interest, rather than a single summary parameter.
  • Synthetic DiD: When the parallel trends assumption is suspect and reweighting control units to match the pre-treatment trajectory of treated units improves credibility.
  • Synthetic control: When the number of treated units is very small (one to five) and constructing a data-driven counterfactual is more transparent than assuming parallel trends.

J. Reviewer Checklist

Critical Reading Checklist

Paper Library

Foundational (5)

Goodman-Bacon, A. (2021). Difference-in-Differences with Variation in Treatment Timing.

Journal of Econometrics. DOI: 10.1016/j.jeconom.2021.03.014

Goodman-Bacon decomposed the two-way fixed-effects DID estimator into a weighted average of all possible two-group, two-period DID comparisons, revealing that some comparisons use already-treated units as controls. This paper sparked the modern revolution in staggered DID methods by exposing the bias problem.

Callaway, B., & Sant'Anna, P. H. C. (2021). Difference-in-Differences with Multiple Time Periods.

Journal of Econometrics. DOI: 10.1016/j.jeconom.2020.12.001

Callaway and Sant'Anna proposed group-time average treatment effects (ATT(g,t)) that avoid the problematic comparisons in TWFE. Their framework allows for heterogeneous treatment effects across groups and time and provides aggregation schemes for summary parameters.

Sun, L., & Abraham, S. (2021). Estimating Dynamic Treatment Effects in Event Studies with Heterogeneous Treatment Effects.

Journal of Econometrics. DOI: 10.1016/j.jeconom.2020.09.006

Sun and Abraham showed that conventional event-study regression coefficients are contaminated by treatment effect heterogeneity across cohorts and proposed an interaction-weighted estimator that recovers clean dynamic treatment effects. This paper is the key reference for event-study plots in staggered settings.

de Chaisemartin, C., & D'Haultfoeuille, X. (2020). Two-Way Fixed Effects Estimators with Heterogeneous Treatment Effects.

American Economic Review. DOI: 10.1257/aer.20181169

De Chaisemartin and D'Haultfoeuille showed that the TWFE estimator can assign negative weights to some treatment effects, potentially producing estimates with the wrong sign. They proposed an alternative estimator and a diagnostic test for the presence of negative weights.

Borusyak, K., Jaravel, X., & Spiess, J. (2024). Revisiting Event-Study Designs: Robust and Efficient Estimation.

Review of Economic Studies. DOI: 10.1093/restud/rdae007

Borusyak, Jaravel, and Spiess proposed an imputation estimator for staggered DID that first estimates unit and time fixed effects from untreated observations, then imputes the counterfactual outcomes. This approach is efficient, flexible, and avoids the negative weighting problem of TWFE.

Application (4)

Baker, A. C., Larcker, D. F., & Wang, C. C. Y. (2022). How Much Should We Trust Staggered Difference-in-Differences Estimates?.

Journal of Financial Economics. DOI: 10.1016/j.jfineco.2022.01.004

Baker, Larcker, and Wang demonstrated that the staggered DID problems identified in the econometrics literature are empirically relevant in finance research. They re-analyzed prominent finance studies and showed that results can change substantially when using robust estimators.

Barrios, J. M. (2021). Staggered Rollout Designs in Accounting Research.

Working Paper, Washington University in St. Louis

Barrios examined the prevalence of staggered DID designs in accounting research and showed that many published results are sensitive to the choice of estimator. This paper raised awareness of the staggered DID problem in the accounting and management fields. [UNVERIFIED: Working paper status may have changed.]

Deshpande, M., & Li, Y. (2019). Who Is Screened Out? Application Costs and the Targeting of Disability Programs.

American Economic Journal: Economic Policy. DOI: 10.1257/pol.20180076

Deshpande and Li used staggered closings of Social Security field offices across the United States to estimate the effects of application costs on disability program participation. The staggered timing of office closures across locations provides a natural setting for modern staggered DID methods, and the paper demonstrates how treatment-timing variation can be leveraged for credible policy evaluation.

Flammer, C. (2015). Does Corporate Social Responsibility Lead to Superior Financial Performance? A Regression Discontinuity Approach.

Management Science. DOI: 10.1287/mnsc.2014.2038

Flammer used staggered adoption of constituency statutes across U.S. states as a natural experiment to examine the effect of stakeholder orientation on firm performance. The staggered passage of these laws across states and years provides an archetypal setting for staggered DID, and the paper is a prominent example in the management literature of leveraging staggered policy rollouts for causal identification.

Survey (1)

Roth, J., Sant'Anna, P. H. C., Bilinski, A., & Poe, J. (2023). What's Trending in Difference-in-Differences? A Synthesis of the Recent Econometrics Literature.

Journal of Econometrics. DOI: 10.1016/j.jeconom.2023.03.008

This comprehensive survey synthesizes the explosion of recent econometric work on DID, covering staggered treatment timing, heterogeneous treatment effects, pre-trends testing, and new estimators. It is the essential starting point for understanding the modern staggered DID literature.

Tags

design-based · panel · staggered-treatment · heterogeneous-effects