A Taxonomy of Identification Strategies
The full toolkit — design-based, model-based, experimental, and quasi-experimental approaches.
Here are all the tools we have to solve problems like the training mystery.
Every Tool in the Toolbox
We have spent five pages building up to this moment. You understand why correlation is not causation. You have seen how a research design structures the causal argument. You know that selection bias is the enemy. You can speak the language of estimands and identification. You can draw a DAG and figure out which backdoor paths need blocking.
Now the question is: how do you actually block those paths?
The answer depends on your setting — what data you have, what variation exists, and what assumptions you are willing to make. Different settings call for different tools. This page is your map of the entire toolkit — what Angrist and Pischke (2009) describe as the core toolkit of applied econometrics, plus the methods that have emerged since. Think of it as the table of contents for every method you will learn on this site, organized by the logic of how each method solves the identification problem.
The Big Divide: Design-Based vs. Model-Based
The most fundamental distinction in modern empirical research is between design-based and model-based approaches.
Design-Based Approaches
Design-based methods rely on features of how the treatment was assigned or how variation arose in the real world. The credibility of the estimate comes from the research design — the institutional setting, the policy, the natural experiment — rather than from the statistical model.
The philosophy: "I found a setting where treatment was assigned in a way that is as if random (or at least, where I can isolate exogenous variation). My statistical model can be simple because the design does the heavy lifting."
Examples: randomized experiments, difference-in-differences, regression discontinuity, instrumental variables.
Model-Based Approaches
Model-based methods rely on the statistical model itself — its functional form, its distributional assumptions, its conditioning set — to achieve identification. The credibility comes from getting the model "right."
The philosophy: "I will control for enough variables, in the right way, to eliminate confounding. My model does the heavy lifting."
Examples: OLS with controls, matching, inverse probability weighting (IPW), and doubly robust estimation.
Three Tiers of Evidence
Another useful taxonomy organizes methods by how close they come to a true experiment:
Tier 1: Experimental (Randomized)
What it is: The researcher (or someone) randomly assigns treatment. Randomization ensures that, in expectation, treated and control groups are balanced on all confounders — observed and unobserved.
For the training mystery: Randomly assign some job seekers to receive training and others to a control group. Compare earnings. This approach is exactly what the National Supported Work (NSW) Demonstration did in the 1970s — a landmark study we will revisit in The Credibility Revolution. You can learn the mechanics of designing and analyzing such studies on the experimental design page.
Limitation: Often infeasible or unethical. You cannot randomly assign firm bankruptcy, gender, or economic recessions.
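The Tier 1 logic can be seen in a few lines of simulation. This is a minimal sketch with made-up numbers: a hypothetical confounder ("motivation") drives earnings, but because treatment is a coin flip, a simple difference in means recovers the assumed true effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical setup: "motivation" would confound training in observational
# data, but random assignment severs its link to treatment.
motivation = rng.normal(0, 1, n)
treated = rng.integers(0, 2, n)          # coin-flip assignment
true_effect = 1_800                      # assumed true earnings gain (invented)
earnings = (20_000 + 3_000 * motivation
            + true_effect * treated
            + rng.normal(0, 5_000, n))

# Under randomization, the simple difference in means is unbiased for the ATE.
ate_hat = earnings[treated == 1].mean() - earnings[treated == 0].mean()
```

Because `treated` is independent of `motivation` by construction, `ate_hat` lands near 1,800 without any controls — the design, not the model, does the work.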
Tier 2: Quasi-Experimental (Natural Experiments)
What it is: The researcher finds a setting where some naturally occurring event, policy change, or institutional feature created variation in treatment that is as if random — or at least, provides exogenous variation that can be isolated. Dunning (2012) provides a comprehensive framework for evaluating the credibility of such natural experiments.
For the training mystery: Perhaps the program was rolled out in some cities before others (difference-in-differences). Perhaps eligibility depended on a test score cutoff (regression discontinuity). Perhaps draft lottery numbers affected who enrolled (instrumental variables).
Limitation: Requires finding the right setting. The identifying variation must be genuinely exogenous, which is often debatable.
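For the city-rollout version of the training mystery, the 2x2 difference-in-differences computation is short enough to sketch directly. All numbers below are invented for illustration; the point is that differencing removes both the fixed city gap and the common time trend, leaving the treatment effect.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

# Hypothetical two-city rollout: "City A" gets the program, "City B" does not.
# Both cities share a common earnings trend (the parallel-trends assumption).
city_a = rng.integers(0, 2, n)             # 1 = City A (eventually treated)
post = rng.integers(0, 2, n)               # 1 = after the program launched
true_effect = 1_500
earnings = (22_000
            + 2_000 * city_a               # fixed level difference between cities
            + 1_000 * post                 # common time trend
            + true_effect * city_a * post  # treatment effect
            + rng.normal(0, 4_000, n))

def cell_mean(city, period):
    m = (city_a == city) & (post == period)
    return earnings[m].mean()

# 2x2 DiD: difference the two cities' before/after changes.
did = ((cell_mean(1, 1) - cell_mean(1, 0))
       - (cell_mean(0, 1) - cell_mean(0, 0)))
```

The first difference removes City A's fixed advantage; subtracting City B's change removes the shared trend. If parallel trends fails, the second difference removes the wrong thing.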
Tier 3: Observational (Selection on Observables)
What it is: No experiment, no natural experiment. The researcher uses observational data and relies on controlling for the right set of confounders to eliminate selection bias.
For the training mystery: Use survey data on trainees and non-trainees, control for education, age, prior earnings, motivation proxies, and hope that you have captured all the relevant confounders.
Limitation: Relies on the assumption that there are no unobserved confounders — an assumption that can never be verified from the data.
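The Tier 3 logic — and its fragility — can also be sketched in simulation. In this hypothetical, the confounder ("motivation") is observed, so regression adjustment recovers the assumed effect; in real data, the whole debate is whether anything like it is observed.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000

# Hypothetical selection story: more-motivated workers both enroll in
# training more often and earn more regardless of training.
motivation = rng.normal(0, 1, n)
trained = (motivation + rng.normal(0, 1, n) > 0).astype(float)
true_effect = 1_000
earnings = (20_000 + 3_000 * motivation
            + true_effect * trained
            + rng.normal(0, 2_000, n))

# The naive comparison is biased upward by selection.
naive = earnings[trained == 1].mean() - earnings[trained == 0].mean()

# Regression adjustment: controlling for the confounder recovers the effect,
# but only because motivation is observed here. In real data it rarely is.
X = np.column_stack([np.ones(n), trained, motivation])
beta = np.linalg.lstsq(X, earnings, rcond=None)[0]
adjusted = beta[1]  # coefficient on trained
```

Swap the observed `motivation` for an unobserved one and `adjusted` would be just as biased as `naive` — which is exactly the untestable assumption Tier 3 rests on.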
The Complete Method Map
Below is every method covered on this site, organized by category. Click any method to jump to its full page.
Method Taxonomy Map
Randomized experiments are widely regarded as the strongest basis for causal inference — random assignment eliminates confounding in expectation. When you can randomize, the identification argument is comparatively simple, though practical complications (noncompliance, attrition, spillovers) still require care.
Method Summary Table
Here is a compact reference for all the methods you will encounter. Do not worry about memorizing this table — you will learn each method in depth on its own page. Use this table to orient yourself and to find the right method for your setting.
| Method | Category | What It Exploits | Key Assumption | When to Use |
|---|---|---|---|---|
| Experimental Design | Experimental | Random assignment | Random assignment, Stable Unit Treatment Value Assumption (SUTVA) | You can randomize treatment |
| OLS | Model-based | Conditional independence | Zero conditional mean of errors | Baseline; building block for other methods |
| Logit / Probit | Model-based | Latent variable threshold | Correct functional form | Binary outcome variable |
| Poisson / Negative Binomial | Model-based | Conditional mean specification | Correct mean function | Count outcome variable |
| Cox Proportional Hazard | Model-based | Partial likelihood | Proportional hazards, non-informative censoring | Duration / time-to-event data |
| Fixed Effects | Design-based | Within-unit variation | All unobserved confounders are time-invariant | Panel data with repeated observations |
| Random Effects | Model-based | GLS quasi-demeaning | Unit effects uncorrelated with regressors | Panel data with time-invariant variables of interest |
| Difference-in-Differences | Design-based | Treatment timing variation | Parallel trends | Policy change, staggered rollout |
| Interrupted Time Series | Design-based | Time series break | No concurrent events, stable trend | Single unit, many time periods |
| RDD -- Sharp | Design-based | Score cutoff | Continuity at the cutoff | Treatment assigned by a threshold |
| Regression Kink Design | Design-based | Slope change at kink | Twice-differentiable conditional expectation at the kink | Treatment intensity changes at a kink |
| RDD -- Fuzzy | Design-based | Score cutoff with noncompliance | Continuity, monotonicity | Threshold affects treatment probability |
| Matching | Model-based | Selection on observables | Conditional independence, common support | Rich covariate data, no instrument |
| Heckman Selection Model | Model-based | Exclusion restriction in selection | Joint normality, valid exclusion restriction | Outcome observed only for a selected sample |
| Instrumental Variables / 2SLS | Design-based | Exogenous instrument | Relevance, exogeneity, exclusion restriction, monotonicity | Endogenous treatment with a valid instrument |
| Event Studies | Design-based | Pre/post treatment dynamics | Parallel trends (visible in pre-period) | Complement to DiD; visualizing dynamic effects |
| Staggered DiD | Design-based | Differential treatment timing | Parallel trends (heterogeneity-robust) | Multiple units treated at different times |
| Synthetic Control | Design-based | Weighted donor pool | Good pre-treatment fit | Few treated units, aggregate data |
| Shift-Share / Bartik | Design-based | Differential exposure to shocks | Exogenous shocks or exogenous shares | Regional/industry exposure variation |
| Bunching Estimation | Design-based | Mass point at threshold | Smooth counterfactual density | Tax kinks, notches, regulatory thresholds |
| Doubly Robust / AIPW | Model-based | Outcome + propensity model | At least one model correctly specified | Want insurance against model misspecification |
| Quantile Treatment Effects | Varies by design | Distributional variation | Rank invariance or rank similarity (+ design-specific assumptions) | Effects vary across the outcome distribution |
| Causal Mediation | Mechanism | Treatment → mediator → outcome | Sequential ignorability | Understand the causal pathway |
| Synthetic DiD | Design-based | Combines DiD + synthetic control | Relaxed parallel trends | When parallel trends alone is too strong |
| Double/Debiased ML | ML + Causal | Neyman orthogonality | Conditional independence + ML convergence | High-dimensional confounders |
| Causal Forests / HTE | ML + Causal | Honest estimation | Unconfoundedness, overlap | Estimate treatment effect heterogeneity |
| Marginal Treatment Effects | Design-based | IV + threshold-crossing model | Monotonicity, valid instrument | Understand selection into treatment |
Choosing a Method: First Principles
When you face a research question, here is how to think about method choice:
Step 1: What is your estimand? Do you want the ATE, ATT, or LATE? This narrows your options.
Step 2: What does your DAG look like? What are the confounders? Are any of them unobservable?
Step 3: What variation exists? This is the crucial question:
- Was treatment randomized? Use an experiment.
- Is there a cutoff? Consider regression discontinuity.
- Is there a before-and-after, with a comparison group? Consider DiD.
- Is there an instrument? Consider instrumental variables.
- Do you only have observational data with rich covariates? Consider matching, IPW, or doubly robust estimation methods.
- Do you need to handle many covariates flexibly? Consider DML or causal forests.
Step 4: Are the assumptions plausible? Every method requires assumptions. The best method is the one whose assumptions are most credible in your setting — not the one that sounds most sophisticated.
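Step 3's checklist can be caricatured as a lookup. This toy function is purely illustrative — the keys and ordering are inventions, and real method choice also weighs the estimand (Step 1), the DAG (Step 2), and assumption plausibility (Step 4).

```python
# Toy sketch of the Step 3 logic: map features of a setting to a candidate
# method. The ordering mirrors the bullet list above, from strongest
# variation (randomization) to weakest (covariates alone).
def suggest_method(setting: dict) -> str:
    if setting.get("randomized"):
        return "Randomized experiment"
    if setting.get("cutoff"):
        return "Regression discontinuity"
    if setting.get("pre_post") and setting.get("comparison_group"):
        return "Difference-in-differences"
    if setting.get("instrument"):
        return "Instrumental variables"
    if setting.get("rich_covariates"):
        return "Matching / IPW / doubly robust"
    return "No credible strategy yet -- look for better variation"

# The two-city training rollout from this page:
suggestion = suggest_method({"pre_post": True, "comparison_group": True})
```

No real setting reduces to boolean flags, but the priority order — design-based variation first, covariate adjustment last — is the substantive point.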
A researcher wants to estimate the effect of a new job training program on earnings. The program was available to all unemployed workers in City A but not in City B. Both cities are similar. She has earnings data for workers in both cities before and after the program launched. Which identification strategy is most natural for this setting?
How the Training Mystery Gets Solved (Preview)
Our running question — "Did the job training program work?" — has been studied with nearly every method on this site:
- Randomized experiment: The NSW Demonstration randomly assigned participants to training or a control group.
- Matching and reweighting: Dehejia and Wahba (1999) showed that propensity score methods could recover the experimental benchmark from observational data.
- Difference-in-differences: Training programs rolled out at different times across regions have been studied using DiD.
- Instrumental variables: Draft lotteries and eligibility rules have been used as instruments for training participation.
Each approach has strengths and weaknesses. The credibility revolution is partly the story of researchers learning which approach works best in which setting — and developing tools to assess credibility rather than taking any single method on faith.
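The instrumental-variables approach to the training mystery can be sketched with a simulated lottery-style instrument. With a single binary instrument, 2SLS reduces to the Wald ratio — the reduced form over the first stage. All numbers here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

# Hypothetical lottery-style instrument: assignment Z shifts training
# take-up but affects earnings only through training (exclusion restriction).
ability = rng.normal(0, 1, n)              # unobserved confounder
z = rng.integers(0, 2, n)                  # lottery draw, independent of ability
trained = ((0.8 * z + 0.5 * ability + rng.normal(0, 1, n)) > 0.4).astype(float)
true_effect = 1_200
earnings = (20_000 + 2_500 * ability
            + true_effect * trained
            + rng.normal(0, 3_000, n))

# Wald / 2SLS with one binary instrument:
# effect of Z on earnings, divided by effect of Z on training.
reduced_form = earnings[z == 1].mean() - earnings[z == 0].mean()
first_stage = trained[z == 1].mean() - trained[z == 0].mean()
late = reduced_form / first_stage
```

A naive trained-vs-untrained comparison here would be contaminated by `ability`; the Wald ratio is not, because the lottery is independent of ability and moves earnings only through training.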
Cross-Cutting Practices
Beyond choosing a method, credible empirical research requires a set of practices that apply to every method:
- Sensitivity analysis: How much would your results change if your assumptions were slightly wrong?
- Power analysis: Did you have enough data to detect the effect you were looking for?
- Multiple testing corrections: If you tested many outcomes or subgroups, did you account for the increased chance of false positives?
- Pre-registration: Did you commit to your analysis plan before seeing the results?
- Specification curves: How robust are your results across reasonable alternative specifications?
These practices strengthen the credibility of any empirical analysis and are increasingly expected in top journals. We cover each in the Practices section of this site.
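To give a flavor of the power-analysis practice above, here is a back-of-the-envelope sample-size calculation for a two-arm comparison of means, using only the standard library. The effect size and standard deviation are illustrative, not from the source.

```python
from statistics import NormalDist

def n_per_arm(effect, sd, alpha=0.05, power=0.80):
    """Sample size per arm to detect `effect` with a two-sided z-test
    on a difference in means, equal allocation, outcome sd `sd`."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value, two-sided test
    z_beta = z.inv_cdf(power)            # quantile for desired power
    return 2 * ((z_alpha + z_beta) * sd / effect) ** 2

# Illustrative: detecting a $1,800 earnings gain when earnings have sd $15,000
# requires roughly 1,100 workers per arm.
n = n_per_arm(effect=1_800, sd=15_000)
```

The punchline generalizes: power scales with the square of `sd / effect`, so noisy outcomes and small effects get expensive fast — which is why underpowered studies are a recurring credibility problem.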
What Comes Next
Before we dive into specific methods, there is one more foundational skill: working with data. The next page covers loading, cleaning, reshaping, and constructing variables — the practical skills you need before you can implement any method.