Natural Experiment Workflow
A decision tree and end-to-end workflow for identifying, validating, and exploiting natural experiments for causal inference. Covers source identification, estimator selection, diagnostics, and reporting.
What Makes a Natural Experiment
A natural experiment occurs when some force outside the researcher's control — a policy change, an institutional rule, a geographic boundary, a lottery — generates as-if random variation in a treatment of interest. Unlike a randomized controlled trial, the researcher does not assign treatment. Instead, the researcher discovers a source of variation that approximates random assignment and exploits it for causal inference (Angrist & Pischke, 2009). This approach is central to the credibility revolution in empirical social science. Dunning (2012) provides a systematic framework for identifying and evaluating natural experiments across the social sciences.
The quality of a natural experiment depends entirely on one question: Is the variation in treatment plausibly exogenous? Everything in this workflow serves that question (Imbens & Rubin, 2015).
Phase 1: Identify the Source of Variation
The first and most important step is finding a credible source of exogenous variation. This search is where research intuition, institutional knowledge, and creativity matter most.
Common Sources of Variation
Policy changes. A new law, regulation, or program that affects some units but not others. State-level policy variation in the United States is especially valuable because states adopt policies at different times, providing staggered treatment variation. Example: state-level adoption of medical marijuana laws.
Institutional cutoffs. Age thresholds, test score cutoffs, size thresholds, or regulatory boundaries that create sharp discontinuities in treatment. Example: firms above a certain number of employees must comply with a regulation; firms just below do not.
Geographic boundaries. County borders, state borders, or school district boundaries that create discontinuities in policy exposure for otherwise similar populations. Example: comparing outcomes for students on either side of a school district boundary.
Timing variation. Events that affect different units at different times, creating natural variation in treatment exposure. Example: staggered rollout of broadband internet across regions.
Lotteries and quasi-random assignment. Draft lotteries, charter school admission lotteries, random judge assignment, alphabetical assignment, and other institutional mechanisms that approximate random allocation.
Validating the Variation
Once you have identified a potential source of variation, ask these critical questions:
- Could units anticipate the treatment? If firms knew a regulation was coming and adjusted their behavior before it took effect, your "before" period is contaminated.
- Could units select into treatment? If individuals moved to a state because of a policy change, your treated and control groups are no longer comparable.
- Is the variation correlated with other changes? If a policy change coincided with an economic recession, your treatment effect is confounded with the recession's effect.
Worked check: a state passes a corporate tax cut in 2015, and you plan a DiD comparing firms in that state to firms in neighboring states. The most important threat is that the tax cut was not adopted at random: the state may have cut taxes in response to, or in anticipation of, local economic conditions, so its firms could have been on a different trend than neighboring firms even without the cut. Anticipation and pre-trend checks (Phase 4) are essential before trusting this design.
Phase 2: Choose the Right Estimator
The type of variation determines which estimator is appropriate. Use this decision tree:
Is treatment determined by crossing a threshold on a continuous variable? If yes, use Regression Discontinuity Design (sharp if compliance is perfect, fuzzy if not).
Is there a before-and-after comparison with treated and untreated groups? If yes:
- If treatment is adopted simultaneously by all treated units, use canonical DiD.
- If treatment rolls out at different times, use staggered DiD with modern estimators (Callaway-Sant'Anna, Sun-Abraham, or imputation).
- If there are very few treated units at an aggregate level, use Synthetic Control.
Do you have an instrument that affects treatment but not the outcome directly? If yes, use IV/2SLS. If the instrument is constructed from shares and shocks, consider the shift-share framework.
Is treatment randomly assigned but with imperfect compliance? If yes, use the IV/LATE framework with the randomized assignment as the instrument.
None of the above, but you observe rich covariates? You are in selection-on-observables territory. Consider matching, doubly robust estimation, or double machine learning. Be transparent that your identification relies on the assumption that all confounders are observed. The observational data workflow covers this path in detail.
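The decision tree above can be sketched as a small helper function. The flag names and return labels are illustrative, not a standard API; they simply encode the branch order described in this section:

```python
def choose_estimator(
    threshold_assignment: bool = False,
    perfect_compliance: bool = True,
    before_after_groups: bool = False,
    staggered_adoption: bool = False,
    few_treated_aggregate: bool = False,
    has_instrument: bool = False,
    randomized_imperfect_compliance: bool = False,
) -> str:
    """Map the type of variation to a candidate estimator (illustrative)."""
    if threshold_assignment:
        # Treatment determined by crossing a threshold on a running variable
        return "Sharp RDD" if perfect_compliance else "Fuzzy RDD"
    if before_after_groups:
        if few_treated_aggregate:
            return "Synthetic Control"
        if staggered_adoption:
            return "Staggered DiD (Callaway-Sant'Anna / Sun-Abraham / imputation)"
        return "Canonical DiD"
    if has_instrument or randomized_imperfect_compliance:
        return "IV/2SLS (LATE)"
    # Fallback: identification rests on observing all confounders
    return "Selection on observables (matching / doubly robust / DML)"
```

Encoding the branches this way also makes the priority ordering explicit: threshold-based designs are checked before panel designs, and selection on observables is the explicit fallback, not a default.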
Phase 3: Assemble and Clean the Data
With your variation identified and estimator chosen, assemble the data. Be meticulous at this stage — data errors are the most common source of incorrect results.
Define the treatment precisely. What exactly changed? When did it change? For whom? Document the treatment assignment rule with enough precision that another researcher could reconstruct your treatment indicator from source data.
Construct the treatment variable carefully. For staggered designs, you need the exact adoption date for each unit. For RDD, you need the running variable measured precisely. For IV, you need both the instrument and the endogenous treatment.
Define the sample. What time period? Which units are included? Document every sample restriction and justify each one. Arbitrary sample restrictions invite suspicion.
Measure outcomes consistently. Make sure the outcome variable is measured the same way before and after treatment. Measurement changes that coincide with treatment will be mistaken for treatment effects.
Assemble pre-treatment covariates. These serve two purposes: balance checks (are treated and control groups comparable?) and controls (to improve precision and, in some designs, reduce bias).
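For staggered designs, the treatment indicator should be reconstructible from a documented rule. A minimal sketch of such a rule, assuming annual data and an "on from the adoption year onward" convention (real projects must document edge cases like mid-year adoptions or partial exposure):

```python
def build_treatment_panel(adoption_year, units, years):
    """Construct a staggered treatment indicator: 1 from the adoption
    year onward, 0 before, and 0 always for never-treated units.

    adoption_year: dict mapping unit -> first treated year (None = never treated).
    Returns a dict mapping (unit, year) -> 0/1.
    """
    panel = {}
    for u in units:
        adopt = adoption_year.get(u)
        for t in years:
            panel[(u, t)] = int(adopt is not None and t >= adopt)
    return panel
```

Writing the rule as code forces you to confront ambiguities (what counts as "adopted"? calendar year or fiscal year?) before estimation rather than after.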
Phase 4: Run Diagnostics Before Estimation
Do not jump straight to the treatment effect. First, validate your design:
For Difference-in-Differences
- Plot the event study. Are pre-treatment coefficients close to zero and statistically insignificant? Look at both the magnitude and the confidence intervals — non-significant pre-trends with wide confidence intervals are not reassuring.
- Test for differential pre-trends. A joint F-test of all pre-treatment coefficients complements eyeballing the plot, but failing to reject does not establish parallel trends — pre-trend tests can have low power.
- Check for anticipation effects. If the coefficient starts moving before the treatment date, either treatment was anticipated or your treatment timing is wrong.
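The logic of the event-study plot can be illustrated with a deliberately crude sketch: treated-minus-control outcome gaps by year, normalized to the last pre-treatment year. This omits covariates, fixed effects, and inference, so it is a diagnostic intuition pump, not a substitute for a proper event-study regression:

```python
from statistics import mean

def event_study_gaps(data, treat_year):
    """Treated-minus-control outcome gap by year, normalized to the
    year before treatment. Pre-treatment gaps near zero are the
    pattern the event-study plot is checking for.

    data: list of (group, year, y) tuples, group in {"treated", "control"}.
    """
    years = sorted({year for _, year, _ in data})
    gap = {}
    for t in years:
        yt = [y for g, yr, y in data if g == "treated" and yr == t]
        yc = [y for g, yr, y in data if g == "control" and yr == t]
        gap[t] = mean(yt) - mean(yc)
    base = gap[treat_year - 1]  # normalize to the last pre-period
    return {t: gap[t] - base for t in years}
```

If the normalized gaps are flat at zero before `treat_year` and jump afterward, the picture is consistent with parallel trends; drift in the pre-period is the warning sign described above.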
For Regression Discontinuity
- McCrary density test. Is there bunching at the cutoff? If units can manipulate the running variable to be just above or below the threshold, the design fails.
- Covariate smoothness. Do observable covariates change smoothly through the cutoff? Discontinuities in covariates suggest that units are sorting around the threshold.
- Bandwidth sensitivity. How do results change with different bandwidth choices? Use the Calonico-Cattaneo-Titiunik optimal bandwidth procedure and show robustness to alternatives.
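A quick sanity check in the spirit of the McCrary test — not the local-polynomial density test itself (use a dedicated package such as rddensity for that) — is an exact binomial test that observations in narrow windows just below and just above the cutoff are equally likely:

```python
from math import comb

def bunching_pvalue(n_below, n_above):
    """Two-sided exact binomial test of equal counts in narrow windows
    on either side of the cutoff (null: each side has probability 0.5).
    A very small p-value suggests bunching, i.e. possible manipulation
    of the running variable.
    """
    n = n_below + n_above
    k = max(n_below, n_above)
    # P(X >= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

Balanced counts like 50 vs 50 yield a p-value of 1, while a split like 90 vs 10 is essentially impossible under the null and flags manipulation.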
For Instrumental Variables
- First-stage strength. Report the F-statistic from the first stage. Is it well above the conventional rule of thumb of 10? (Recent work argues for substantially higher thresholds with a single instrument.) With multiple instruments, use the Kleibergen-Paap statistic (robust to heteroskedasticity) or the Cragg-Donald statistic.
- Exclusion restriction plausibility. This assumption cannot be tested statistically. Make the argument on substantive grounds and address the most plausible violations.
- Overidentification tests. If you have more instruments than endogenous variables, run Hansen's J-test. But remember: it tests a joint hypothesis, and passing it does not prove validity.
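For a single binary instrument, the IV estimate reduces to the Wald estimator: the reduced-form effect of the instrument on the outcome divided by the first-stage effect on treatment. A sketch of that ratio, illustrative only (use proper 2SLS software for standard errors and covariates):

```python
from statistics import mean

def wald_iv(z, d, y):
    """Wald estimator for a binary instrument z.

    Returns (beta_iv, first_stage): the ratio of the reduced-form
    difference in y to the first-stage difference in d, which is the
    LATE under the usual IV assumptions, plus the first stage itself
    so its strength can be inspected directly.
    """
    y1 = mean(yi for zi, yi in zip(z, y) if zi == 1)
    y0 = mean(yi for zi, yi in zip(z, y) if zi == 0)
    d1 = mean(di for zi, di in zip(z, d) if zi == 1)
    d0 = mean(di for zi, di in zip(z, d) if zi == 0)
    first_stage = d1 - d0  # weak if close to zero
    return (y1 - y0) / first_stage, first_stage
```

Writing the estimator as a ratio makes the weak-instrument problem visible: a small first stage in the denominator blows up both the estimate and its sampling variability.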
For All Designs
- Balance tables. Compare treated and control units on pre-treatment observables. Significant imbalances suggest selection problems (Imbens & Rubin, 2015).
- Placebo outcomes. Test your design on outcomes that should not be affected by the treatment. Significant effects on placebo outcomes indicate a design problem.
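A common summary statistic for balance tables is the standardized mean difference per covariate. A minimal sketch, using the pooled standard deviation and the widely used (rule-of-thumb) flag of |SMD| > 0.1:

```python
from statistics import mean, pstdev

def standardized_diff(x_treated, x_control):
    """Standardized mean difference for one pre-treatment covariate:
    (mean_treated - mean_control) / pooled SD. An absolute value above
    roughly 0.1 is a common rule-of-thumb flag for imbalance.
    """
    m1, m0 = mean(x_treated), mean(x_control)
    s1, s0 = pstdev(x_treated), pstdev(x_control)
    pooled = ((s1 ** 2 + s0 ** 2) / 2) ** 0.5
    return (m1 - m0) / pooled
```

Unlike t-statistics, SMDs do not grow with sample size, so they compare imbalance on a common scale across covariates and across studies.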
Phase 5: Estimate and Report
With diagnostics complete:
- Run your main specification. Report the coefficient, standard error, confidence interval, and sample size.
- Interpret the effect size in context. What percentage of the dependent variable mean? How many standard deviations? How does it compare to other interventions?
- Run robustness checks. Alternative specifications, different samples, placebo tests, sensitivity analysis (Oster bounds, specification curves).
- Report all pre-specified analyses. If you wrote a pre-analysis plan, follow it. Do not selectively report only the results that "worked."
- Discuss limitations honestly. What threats remain? How severe would confounding need to be to explain your result?
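The effect-size-in-context step above is simple arithmetic, but it is worth making routine. A small sketch that converts a point estimate into percent of the outcome mean and standard-deviation units:

```python
from statistics import mean, pstdev

def effect_in_context(beta, outcome):
    """Express a treatment-effect estimate relative to the outcome's
    scale: percent of the outcome mean and number of (population)
    standard deviations.
    """
    m, s = mean(outcome), pstdev(outcome)
    return {"pct_of_mean": 100 * beta / m, "sd_units": beta / s}
```

Reporting both conversions guards against effects that sound large in raw units but are trivial relative to the outcome's variation, and vice versa.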
Decision Flowchart
Use this as a quick reference when starting a new project:
- Find the variation — What event, policy, or rule creates treatment variation?
- Assess exogeneity — Could units anticipate, select into, or manipulate the treatment?
- Match estimator to variation — DiD, RDD, IV, SC, or selection-on-observables?
- Build the dataset — Treatment indicator, outcomes, covariates, panel structure
- Validate the design — Pre-trends, density tests, first-stage, balance
- Estimate the effect — Main specification, then add controls and fixed effects
- Stress-test the result — Placebo tests, sensitivity analysis, specification curve
- Report with full transparency — Estimand, assumptions, limitations, all pre-specified analyses