Natural Experiment Workflow
A decision tree and end-to-end workflow for identifying, validating, and exploiting natural experiments for causal inference. Covers source identification, estimator selection, diagnostics, and reporting.
What Makes a Natural Experiment
A natural experiment occurs when some force outside the researcher's control — a policy change, an institutional rule, a geographic boundary, a lottery — generates as-if random variation in a treatment of interest. Unlike a randomized controlled trial, the researcher does not assign treatment. Instead, the researcher discovers a source of variation that approximates random assignment and exploits it for causal inference (Angrist & Pischke, 2009). This approach is central to the credibility revolution in empirical social science. Dunning (2012) provides a systematic framework for identifying and evaluating natural experiments across the social sciences.
The quality of a natural experiment depends entirely on one question: Is the variation in treatment plausibly exogenous? Everything in this workflow serves that question (Imbens & Rubin, 2015).
Phase 1: Identify the Source of Variation
The first and most important step is finding a credible source of exogenous variation. This search is where research intuition, institutional knowledge, and creativity matter most.
Common Sources of Variation
Policy changes. A new law, regulation, or program that affects some units but not others. State-level policy variation in the United States is especially valuable because states adopt policies at different times, providing staggered treatment variation. Example: state-level adoption of medical marijuana laws.
Institutional cutoffs. Age thresholds, test score cutoffs, size thresholds, or regulatory boundaries that create sharp discontinuities in treatment. Example: firms above a certain number of employees must comply with a regulation; firms just below do not.
Geographic boundaries. County borders, state borders, or school district boundaries that create discontinuities in policy exposure for otherwise similar populations. Example: comparing outcomes for students on either side of a school district boundary.
Timing variation. Events that affect different units at different times, creating natural variation in treatment exposure. Example: staggered rollout of broadband internet across regions.
Lotteries and quasi-random assignment. Draft lotteries, charter school admission lotteries, random judge assignment, alphabetical assignment, and other institutional mechanisms that approximate random allocation.
Validating the Variation
Once you have identified a potential source of variation, ask these critical questions:
- Could units anticipate the treatment? If firms knew a regulation was coming and adjusted their behavior before it took effect, your "before" period is contaminated.
- Could units select into treatment? If individuals moved to a state because of a policy change, your treated and control groups are no longer comparable.
- Is the variation correlated with other changes? If a policy change coincided with an economic recession, your treatment effect is confounded with the recession's effect.
Worked check: a state passes a corporate tax cut in 2015, and you plan a DiD comparing firms in that state to firms in neighboring states. The most important threat is that the tax cut was not adopted at random: the state may have cut taxes in response to, or in anticipation of, local economic conditions, so its firms could have been on a different trend than neighboring firms even without the cut. Anticipation and pre-trend checks (Phase 4) are essential before trusting this design.
Phase 2: Choose the Right Estimator
The type of variation determines which estimator is appropriate. Use this decision tree:
Is treatment determined by crossing a threshold on a continuous variable? If yes, use Regression Discontinuity Design (sharp if compliance is perfect, fuzzy if not).
Is there a before-and-after comparison with treated and untreated groups? If yes:
- If treatment is adopted simultaneously by all treated units, use canonical DiD.
- If treatment rolls out at different times, use staggered DiD with modern estimators (Callaway-Sant'Anna, Sun-Abraham, or imputation).
- If there are very few treated units at an aggregate level, use Synthetic Control.
Do you have an instrument that affects treatment but not the outcome directly? If yes, use IV/2SLS. If the instrument is constructed from shares and shocks, consider the shift-share framework.
Is treatment randomly assigned but with imperfect compliance? If yes, use the IV/LATE framework with the randomized assignment as the instrument.
None of the above, but you observe rich covariates? You are in selection-on-observables territory. Consider matching, doubly robust estimation, or double machine learning. Be transparent that your identification relies on the assumption that all confounders are observed. The observational data workflow covers this path in detail.
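The decision tree above can be sketched as a small helper function. The flag names and return labels are illustrative, not a standard API; they simply encode the branch order described in this section:

```python
def choose_estimator(
    threshold_assignment: bool = False,
    perfect_compliance: bool = True,
    before_after_groups: bool = False,
    staggered_adoption: bool = False,
    few_treated_aggregate: bool = False,
    has_instrument: bool = False,
    randomized_imperfect_compliance: bool = False,
) -> str:
    """Map the type of variation to a candidate estimator (illustrative)."""
    if threshold_assignment:
        # Treatment determined by crossing a threshold on a running variable
        return "Sharp RDD" if perfect_compliance else "Fuzzy RDD"
    if before_after_groups:
        if few_treated_aggregate:
            return "Synthetic Control"
        if staggered_adoption:
            return "Staggered DiD (Callaway-Sant'Anna / Sun-Abraham / imputation)"
        return "Canonical DiD"
    if has_instrument or randomized_imperfect_compliance:
        return "IV/2SLS (LATE)"
    # Fallback: identification rests on observing all confounders
    return "Selection on observables (matching / doubly robust / DML)"
```

Encoding the branches this way also makes the priority ordering explicit: threshold-based designs are checked before panel designs, and selection on observables is the explicit fallback, not a default.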
Phase 3: Assemble and Clean the Data
With your variation identified and estimator chosen, assemble the data. Be meticulous at this stage — data errors are the most common source of incorrect results.
Define the treatment precisely. What exactly changed? When did it change? For whom? Document the treatment assignment rule with enough precision that another researcher could reconstruct your treatment indicator from source data.
Construct the treatment variable carefully. For staggered designs, you need the exact adoption date for each unit. For RDD, you need the running variable measured precisely. For IV, you need both the instrument and the endogenous treatment.
Define the sample. What time period? Which units are included? Document every sample restriction and justify each one. Arbitrary sample restrictions invite suspicion.
Measure outcomes consistently. Make sure the outcome variable is measured the same way before and after treatment. Measurement changes that coincide with treatment will be mistaken for treatment effects.
Assemble pre-treatment covariates. These serve two purposes: balance checks (are treated and control groups comparable?) and controls (to improve precision and, in some designs, reduce bias).
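For staggered designs, the treatment indicator should be reconstructible from a documented rule. A minimal sketch of such a rule, assuming annual data and an "on from the adoption year onward" convention (real projects must document edge cases like mid-year adoptions or partial exposure):

```python
def build_treatment_panel(adoption_year, units, years):
    """Construct a staggered treatment indicator: 1 from the adoption
    year onward, 0 before, and 0 always for never-treated units.

    adoption_year: dict mapping unit -> first treated year (None = never treated).
    Returns a dict mapping (unit, year) -> 0/1.
    """
    panel = {}
    for u in units:
        adopt = adoption_year.get(u)
        for t in years:
            panel[(u, t)] = int(adopt is not None and t >= adopt)
    return panel
```

Writing the rule as code forces you to confront ambiguities (what counts as "adopted"? calendar year or fiscal year?) before estimation rather than after.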
Phase 4: Run Diagnostics Before Estimation
Do not jump straight to the treatment effect. First, validate your design:
For Difference-in-Differences
- Plot the event study. Are pre-treatment coefficients close to zero and statistically insignificant? Look at both the magnitude and the confidence intervals — non-significant pre-trends with wide confidence intervals are not reassuring.
- Test for differential pre-trends. A joint F-test of all pre-treatment coefficients complements eyeballing the plot, but failing to reject does not establish parallel trends — pre-trend tests can have low power.
- Check for anticipation effects. If the coefficient starts moving before the treatment date, either treatment was anticipated or your treatment timing is wrong.
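The logic of the event-study plot can be illustrated with a deliberately crude sketch: treated-minus-control outcome gaps by year, normalized to the last pre-treatment year. This omits covariates, fixed effects, and inference, so it is a diagnostic intuition pump, not a substitute for a proper event-study regression:

```python
from statistics import mean

def event_study_gaps(data, treat_year):
    """Treated-minus-control outcome gap by year, normalized to the
    year before treatment. Pre-treatment gaps near zero are the
    pattern the event-study plot is checking for.

    data: list of (group, year, y) tuples, group in {"treated", "control"}.
    """
    years = sorted({year for _, year, _ in data})
    gap = {}
    for t in years:
        yt = [y for g, yr, y in data if g == "treated" and yr == t]
        yc = [y for g, yr, y in data if g == "control" and yr == t]
        gap[t] = mean(yt) - mean(yc)
    base = gap[treat_year - 1]  # normalize to the last pre-period
    return {t: gap[t] - base for t in years}
```

If the normalized gaps are flat at zero before `treat_year` and jump afterward, the picture is consistent with parallel trends; drift in the pre-period is the warning sign described above.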
For Regression Discontinuity
- McCrary density test. Is there bunching at the cutoff? If units can manipulate the running variable to be just above or below the threshold, the design fails.
- Covariate smoothness. Do observable covariates change smoothly through the cutoff? Discontinuities in covariates suggest that units are sorting around the threshold.
- Bandwidth sensitivity. How do results change with different bandwidth choices? Use the Calonico-Cattaneo-Titiunik optimal bandwidth procedure and show robustness to alternatives.
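A quick sanity check in the spirit of the McCrary test — not the local-polynomial density test itself (use a dedicated package such as rddensity for that) — is an exact binomial test that observations in narrow windows just below and just above the cutoff are equally likely:

```python
from math import comb

def bunching_pvalue(n_below, n_above):
    """Two-sided exact binomial test of equal counts in narrow windows
    on either side of the cutoff (null: each side has probability 0.5).
    A very small p-value suggests bunching, i.e. possible manipulation
    of the running variable.
    """
    n = n_below + n_above
    k = max(n_below, n_above)
    # P(X >= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

Balanced counts like 50 vs 50 yield a p-value of 1, while a split like 90 vs 10 is essentially impossible under the null and flags manipulation.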
For Instrumental Variables
- First-stage strength. Report the F-statistic from the first stage. Is it well above the conventional rule of thumb of 10? (Recent work argues for substantially higher thresholds with a single instrument.) With multiple instruments, use the Kleibergen-Paap statistic (robust to heteroskedasticity) or the Cragg-Donald statistic.
- Exclusion restriction plausibility. This assumption cannot be tested statistically. Make the argument on substantive grounds and address the most plausible violations.
- Overidentification tests. If you have more instruments than endogenous variables, run Hansen's J-test. But remember: it tests a joint hypothesis, and passing it does not prove validity.
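For a single binary instrument, the IV estimate reduces to the Wald estimator: the reduced-form effect of the instrument on the outcome divided by the first-stage effect on treatment. A sketch of that ratio, illustrative only (use proper 2SLS software for standard errors and covariates):

```python
from statistics import mean

def wald_iv(z, d, y):
    """Wald estimator for a binary instrument z.

    Returns (beta_iv, first_stage): the ratio of the reduced-form
    difference in y to the first-stage difference in d, which is the
    LATE under the usual IV assumptions, plus the first stage itself
    so its strength can be inspected directly.
    """
    y1 = mean(yi for zi, yi in zip(z, y) if zi == 1)
    y0 = mean(yi for zi, yi in zip(z, y) if zi == 0)
    d1 = mean(di for zi, di in zip(z, d) if zi == 1)
    d0 = mean(di for zi, di in zip(z, d) if zi == 0)
    first_stage = d1 - d0  # weak if close to zero
    return (y1 - y0) / first_stage, first_stage
```

Writing the estimator as a ratio makes the weak-instrument problem visible: a small first stage in the denominator blows up both the estimate and its sampling variability.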
For All Designs
- Balance tables. Compare treated and control units on pre-treatment observables. Significant imbalances suggest selection problems (Imbens & Rubin, 2015).
- Placebo outcomes. Test your design on outcomes that should not be affected by the treatment. Significant effects on placebo outcomes indicate a design problem.
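A common summary statistic for balance tables is the standardized mean difference per covariate. A minimal sketch, using the pooled standard deviation and the widely used (rule-of-thumb) flag of |SMD| > 0.1:

```python
from statistics import mean, pstdev

def standardized_diff(x_treated, x_control):
    """Standardized mean difference for one pre-treatment covariate:
    (mean_treated - mean_control) / pooled SD. An absolute value above
    roughly 0.1 is a common rule-of-thumb flag for imbalance.
    """
    m1, m0 = mean(x_treated), mean(x_control)
    s1, s0 = pstdev(x_treated), pstdev(x_control)
    pooled = ((s1 ** 2 + s0 ** 2) / 2) ** 0.5
    return (m1 - m0) / pooled
```

Unlike t-statistics, SMDs do not grow with sample size, so they compare imbalance on a common scale across covariates and across studies.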
Phase 5: Estimate and Report
With diagnostics complete:
- Run your main specification. Report the coefficient, standard error, confidence interval, and sample size.
- Interpret the effect size in context. What percentage of the dependent variable mean? How many standard deviations? How does it compare to other interventions?
- Run robustness checks. Alternative specifications, different samples, placebo tests, sensitivity analysis (Oster bounds, specification curves).
- Report all pre-specified analyses. If you wrote a pre-analysis plan, follow it. Do not selectively report only the results that "worked."
- Discuss limitations honestly. What threats remain? How severe would confounding need to be to explain your result?
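The effect-size-in-context step above is simple arithmetic, but it is worth making routine. A small sketch that converts a point estimate into percent of the outcome mean and standard-deviation units:

```python
from statistics import mean, pstdev

def effect_in_context(beta, outcome):
    """Express a treatment-effect estimate relative to the outcome's
    scale: percent of the outcome mean and number of (population)
    standard deviations.
    """
    m, s = mean(outcome), pstdev(outcome)
    return {"pct_of_mean": 100 * beta / m, "sd_units": beta / s}
```

Reporting both conversions guards against effects that sound large in raw units but are trivial relative to the outcome's variation, and vice versa.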
Decision Flowchart
Use this as a quick reference when starting a new project:
- Find the variation — What event, policy, or rule creates treatment variation?
- Assess exogeneity — Could units anticipate, select into, or manipulate the treatment?
- Match estimator to variation — DiD, RDD, IV, SC, or selection-on-observables?
- Build the dataset — Treatment indicator, outcomes, covariates, panel structure
- Validate the design — Pre-trends, density tests, first-stage, balance
- Estimate the effect — Main specification, then add controls and fixed effects
- Stress-test the result — Placebo tests, sensitivity analysis, specification curve
- Report with full transparency — Estimand, assumptions, limitations, all pre-specified analyses