A Taxonomy of Identification Strategies
The full toolkit — design-based, model-based, experimental, and quasi-experimental approaches.
Here are all the tools we have to solve problems like the training mystery.
Every Tool in the Toolbox
We have spent five pages building up to this moment. You understand why correlation is not causation. You have seen how a research design structures the causal argument. You know that selection bias is the enemy. You can speak the language of estimands and identification. You can draw a DAG and figure out which backdoor paths need blocking.
Now the question is: how do you actually block those paths?
The answer depends on your setting — what data you have, what variation exists, and what assumptions you are willing to make. Different settings call for different tools. This page is your map of the entire toolkit — what Angrist and Pischke (2009) describe as the core toolkit of applied econometrics, plus the methods that have emerged since. Think of it as the table of contents for every method you will learn on this site, organized by the logic of how each method solves the identification problem.
The Big Divide: Design-Based vs. Model-Based
The most fundamental distinction in modern empirical research is between design-based and model-based approaches.
Design-Based Approaches
Design-based methods rely on features of how the treatment was assigned or how variation arose in the real world. The credibility of the estimate comes from the research design — the institutional setting, the policy, the natural experiment — rather than from the statistical model.
The philosophy: "I found a setting where treatment was assigned in a way that is as if random (or at least, where I can isolate exogenous variation). My statistical model can be simple because the design does the heavy lifting."
Examples: randomized experiments, difference-in-differences, regression discontinuity, instrumental variables.
Model-Based Approaches
Model-based methods rely on the statistical model itself — its functional form, its distributional assumptions, its conditioning set — to achieve identification. The credibility comes from getting the model "right."
The philosophy: "I will control for enough variables, in the right way, to eliminate confounding. My model does the heavy lifting."
Examples: OLS with controls, matching, inverse probability weighting (IPW), and doubly robust estimation.
Three Tiers of Evidence
Another useful taxonomy organizes methods by how close they come to a true experiment:
Tier 1: Experimental (Randomized)
What it is: The researcher (or someone) randomly assigns treatment. Randomization ensures that, in expectation, treated and control groups are balanced on all confounders — observed and unobserved.
For the training mystery: Randomly assign some job seekers to receive training and others to a control group. Compare earnings. This approach is exactly what the National Supported Work (NSW) Demonstration did in the 1970s — a landmark study we will revisit in The Credibility Revolution. You can learn the mechanics of designing and analyzing such studies on the experimental design page.
Limitation: Often infeasible or unethical. You cannot randomly assign firm bankruptcy, gender, or economic recessions.
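The Tier 1 logic can be seen in a few lines of simulation. This is a minimal sketch with made-up numbers: a hypothetical confounder ("motivation") drives earnings, but because treatment is a coin flip, a simple difference in means recovers the assumed true effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical setup: "motivation" would confound training in observational
# data, but random assignment severs its link to treatment.
motivation = rng.normal(0, 1, n)
treated = rng.integers(0, 2, n)          # coin-flip assignment
true_effect = 1_800                      # assumed true earnings gain (invented)
earnings = (20_000 + 3_000 * motivation
            + true_effect * treated
            + rng.normal(0, 5_000, n))

# Under randomization, the simple difference in means is unbiased for the ATE.
ate_hat = earnings[treated == 1].mean() - earnings[treated == 0].mean()
```

Because `treated` is independent of `motivation` by construction, `ate_hat` lands near 1,800 without any controls — the design, not the model, does the work.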
Tier 2: Quasi-Experimental (Natural Experiments)
What it is: The researcher finds a setting where some naturally occurring event, policy change, or institutional feature created variation in treatment that is as if random — or at least, provides exogenous variation that can be isolated. Dunning (2012) provides a comprehensive framework for evaluating the credibility of such natural experiments.
For the training mystery: Perhaps the program was rolled out in some cities before others (difference-in-differences). Perhaps eligibility depended on a test score cutoff (regression discontinuity). Perhaps draft lottery numbers affected who enrolled (instrumental variables).
Limitation: Requires finding the right setting. The identifying variation must be genuinely exogenous, which is often debatable.
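For the city-rollout version of the training mystery, the 2x2 difference-in-differences computation is short enough to sketch directly. All numbers below are invented for illustration; the point is that differencing removes both the fixed city gap and the common time trend, leaving the treatment effect.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

# Hypothetical two-city rollout: "City A" gets the program, "City B" does not.
# Both cities share a common earnings trend (the parallel-trends assumption).
city_a = rng.integers(0, 2, n)             # 1 = City A (eventually treated)
post = rng.integers(0, 2, n)               # 1 = after the program launched
true_effect = 1_500
earnings = (22_000
            + 2_000 * city_a               # fixed level difference between cities
            + 1_000 * post                 # common time trend
            + true_effect * city_a * post  # treatment effect
            + rng.normal(0, 4_000, n))

def cell_mean(city, period):
    m = (city_a == city) & (post == period)
    return earnings[m].mean()

# 2x2 DiD: difference the two cities' before/after changes.
did = ((cell_mean(1, 1) - cell_mean(1, 0))
       - (cell_mean(0, 1) - cell_mean(0, 0)))
```

The first difference removes City A's fixed advantage; subtracting City B's change removes the shared trend. If parallel trends fails, the second difference removes the wrong thing.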
Tier 3: Observational (Selection on Observables)
What it is: No experiment, no natural experiment. The researcher uses observational data and relies on controlling for the right set of confounders to eliminate selection bias.
For the training mystery: Use survey data on trainees and non-trainees, control for education, age, prior earnings, motivation proxies, and hope that you have captured all the relevant confounders.
Limitation: Relies on the assumption that there are no unobserved confounders — an assumption that can never be verified from the data.
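The Tier 3 logic — and its fragility — can also be sketched in simulation. In this hypothetical, the confounder ("motivation") is observed, so regression adjustment recovers the assumed effect; in real data, the whole debate is whether anything like it is observed.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000

# Hypothetical selection story: more-motivated workers both enroll in
# training more often and earn more regardless of training.
motivation = rng.normal(0, 1, n)
trained = (motivation + rng.normal(0, 1, n) > 0).astype(float)
true_effect = 1_000
earnings = (20_000 + 3_000 * motivation
            + true_effect * trained
            + rng.normal(0, 2_000, n))

# The naive comparison is biased upward by selection.
naive = earnings[trained == 1].mean() - earnings[trained == 0].mean()

# Regression adjustment: controlling for the confounder recovers the effect,
# but only because motivation is observed here. In real data it rarely is.
X = np.column_stack([np.ones(n), trained, motivation])
beta = np.linalg.lstsq(X, earnings, rcond=None)[0]
adjusted = beta[1]  # coefficient on trained
```

Swap the observed `motivation` for an unobserved one and `adjusted` would be just as biased as `naive` — which is exactly the untestable assumption Tier 3 rests on.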
The Complete Method Map
Below is every method covered on this site, organized by category. Click any method to jump to its full page.
Method Taxonomy Map
Randomized experiments are widely regarded as the strongest basis for causal inference — random assignment eliminates confounding in expectation. When you can randomize, the identification argument is comparatively simple, though practical complications (noncompliance, attrition, spillovers) still require care.
Method Summary Table
Here is a compact reference for all the methods you will encounter. Do not worry about memorizing this table — you will learn each method in depth on its own page. Use this table to orient yourself and to find the right method for your setting.
| Method | Category | What It Exploits | Key Assumption | When to Use |
|---|---|---|---|---|
| Experimental Design | Experimental | Random assignment | Random assignment, Stable Unit Treatment Value Assumption (SUTVA) | You can randomize treatment |
| OLS | Model-based | Conditional independence | Zero conditional mean of errors | Baseline; building block for other methods |
| Logit / Probit | Model-based | Latent variable threshold | Correct functional form | Binary outcome variable |
| Poisson / Negative Binomial | Model-based | Conditional mean specification | Correct mean function | Count outcome variable |
| Cox Proportional Hazard | Model-based | Partial likelihood | Proportional hazards, non-informative censoring | Duration / time-to-event data |
| Fixed Effects | Design-based | Within-unit variation | All unobserved confounders are time-invariant | Panel data with repeated observations |
| Random Effects | Model-based | GLS quasi-demeaning | Unit effects uncorrelated with regressors | Panel data with time-invariant variables of interest |
| Difference-in-Differences | Design-based | Treatment timing variation | Parallel trends | Policy change, staggered rollout |
| Interrupted Time Series | Design-based | Time series break | No concurrent events, stable trend | Single unit, many time periods |
| RDD -- Sharp | Design-based | Score cutoff | Continuity at the cutoff | Treatment assigned by a threshold |
| Regression Kink Design | Design-based | Slope change at kink | Twice-differentiable conditional expectation at the kink | Treatment intensity changes at a kink |
| RDD -- Fuzzy | Design-based | Score cutoff with noncompliance | Continuity, monotonicity | Threshold affects treatment probability |
| Matching | Model-based | Selection on observables | Conditional independence, common support | Rich covariate data, no instrument |
| Heckman Selection Model | Model-based | Exclusion restriction in selection | Joint normality, valid exclusion restriction | Outcome observed only for a selected sample |
| Instrumental Variables / 2SLS | Design-based | Exogenous instrument | Relevance, exogeneity, exclusion restriction, monotonicity | Endogenous treatment with a valid instrument |
| Event Studies | Design-based | Pre/post treatment dynamics | Parallel trends (visible in pre-period) | Complement to DiD; visualizing dynamic effects |
| Staggered DiD | Design-based | Differential treatment timing | Parallel trends (heterogeneity-robust) | Multiple units treated at different times |
| Synthetic Control | Design-based | Weighted donor pool | Good pre-treatment fit | Few treated units, aggregate data |
| Shift-Share / Bartik | Design-based | Differential exposure to shocks | Exogenous shocks or exogenous shares | Regional/industry exposure variation |
| Bunching Estimation | Design-based | Mass point at threshold | Smooth counterfactual density | Tax kinks, notches, regulatory thresholds |
| Doubly Robust / AIPW | Model-based | Outcome + propensity model | At least one model correctly specified | Want insurance against model misspecification |
| Quantile Treatment Effects | Varies by design | Distributional variation | Rank invariance or rank similarity (+ design-specific assumptions) | Effects vary across the outcome distribution |
| Causal Mediation | Mechanism | Treatment → mediator → outcome | Sequential ignorability | Understand the causal pathway |
| Synthetic DiD | Design-based | Combines DiD + synthetic control | Relaxed parallel trends | When parallel trends alone is too strong |
| Double/Debiased ML | ML + Causal | Neyman orthogonality | Conditional independence + ML convergence | High-dimensional confounders |
| Causal Forests / HTE | ML + Causal | Honest estimation | Unconfoundedness, overlap | Estimate treatment effect heterogeneity |
| Marginal Treatment Effects | Design-based | IV + threshold-crossing model | Monotonicity, valid instrument | Understand selection into treatment |
Choosing a Method: First Principles
When you face a research question, here is how to think about method choice:
Step 1: What is your estimand? Do you want the ATE, ATT, or LATE? This narrows your options.
Step 2: What does your DAG look like? What are the confounders? Are any of them unobservable?
Step 3: What variation exists? This is the crucial question:
- Was treatment randomized? Use an experiment.
- Is there a cutoff? Consider regression discontinuity.
- Is there a before-and-after, with a comparison group? Consider DiD.
- Is there an instrument? Consider instrumental variables.
- Do you only have observational data with rich covariates? Consider matching, IPW, or doubly robust estimation methods.
- Do you need to handle many covariates flexibly? Consider DML or causal forests.
Step 4: Are the assumptions plausible? Every method requires assumptions. The best method is the one whose assumptions are most credible in your setting — not the one that sounds most sophisticated.
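Step 3's checklist can be caricatured as a lookup. This toy function is purely illustrative — the keys and ordering are inventions, and real method choice also weighs the estimand (Step 1), the DAG (Step 2), and assumption plausibility (Step 4).

```python
# Toy sketch of the Step 3 logic: map features of a setting to a candidate
# method. The ordering mirrors the bullet list above, from strongest
# variation (randomization) to weakest (covariates alone).
def suggest_method(setting: dict) -> str:
    if setting.get("randomized"):
        return "Randomized experiment"
    if setting.get("cutoff"):
        return "Regression discontinuity"
    if setting.get("pre_post") and setting.get("comparison_group"):
        return "Difference-in-differences"
    if setting.get("instrument"):
        return "Instrumental variables"
    if setting.get("rich_covariates"):
        return "Matching / IPW / doubly robust"
    return "No credible strategy yet -- look for better variation"

# The two-city training rollout from this page:
suggestion = suggest_method({"pre_post": True, "comparison_group": True})
```

No real setting reduces to boolean flags, but the priority order — design-based variation first, covariate adjustment last — is the substantive point.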
A researcher wants to estimate the effect of a new job training program on earnings. The program was available to all unemployed workers in City A but not in City B. Both cities are similar. She has earnings data for workers in both cities before and after the program launched. Which identification strategy is most natural for this setting?
How the Training Mystery Gets Solved (Preview)
Our running question — "Did the job training program work?" — has been studied with nearly every method on this site:
- Randomized experiment: The NSW Demonstration randomly assigned participants to training or a control group.
- Matching and reweighting: Dehejia and Wahba (1999) showed that propensity score methods could recover the experimental benchmark from observational data.
- Difference-in-differences: Training programs rolled out at different times across regions have been studied using DiD.
- Instrumental variables: Draft lotteries and eligibility rules have been used as instruments for training participation.
Each approach has strengths and weaknesses. The credibility revolution is partly the story of researchers learning which approach works best in which setting — and developing tools to assess credibility rather than taking any single method on faith.
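The instrumental-variables approach to the training mystery can be sketched with a simulated lottery-style instrument. With a single binary instrument, 2SLS reduces to the Wald ratio — the reduced form over the first stage. All numbers here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

# Hypothetical lottery-style instrument: assignment Z shifts training
# take-up but affects earnings only through training (exclusion restriction).
ability = rng.normal(0, 1, n)              # unobserved confounder
z = rng.integers(0, 2, n)                  # lottery draw, independent of ability
trained = ((0.8 * z + 0.5 * ability + rng.normal(0, 1, n)) > 0.4).astype(float)
true_effect = 1_200
earnings = (20_000 + 2_500 * ability
            + true_effect * trained
            + rng.normal(0, 3_000, n))

# Wald / 2SLS with one binary instrument:
# effect of Z on earnings, divided by effect of Z on training.
reduced_form = earnings[z == 1].mean() - earnings[z == 0].mean()
first_stage = trained[z == 1].mean() - trained[z == 0].mean()
late = reduced_form / first_stage
```

A naive trained-vs-untrained comparison here would be contaminated by `ability`; the Wald ratio is not, because the lottery is independent of ability and moves earnings only through training.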
Cross-Cutting Practices
Beyond choosing a method, credible empirical research requires a set of practices that apply to every method:
- Sensitivity analysis: How much would your results change if your assumptions were slightly wrong?
- Power analysis: Did you have enough data to detect the effect you were looking for?
- Multiple testing corrections: If you tested many outcomes or subgroups, did you account for the increased chance of false positives?
- Pre-registration: Did you commit to your analysis plan before seeing the results?
- Specification curves: How robust are your results across reasonable alternative specifications?
These practices strengthen the credibility of any empirical analysis and are increasingly expected in top journals. We cover each in the Practices section of this site.
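To give a flavor of the power-analysis practice above, here is a back-of-the-envelope sample-size calculation for a two-arm comparison of means, using only the standard library. The effect size and standard deviation are illustrative, not from the source.

```python
from statistics import NormalDist

def n_per_arm(effect, sd, alpha=0.05, power=0.80):
    """Sample size per arm to detect `effect` with a two-sided z-test
    on a difference in means, equal allocation, outcome sd `sd`."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value, two-sided test
    z_beta = z.inv_cdf(power)            # quantile for desired power
    return 2 * ((z_alpha + z_beta) * sd / effect) ** 2

# Illustrative: detecting a $1,800 earnings gain when earnings have sd $15,000
# requires roughly 1,100 workers per arm.
n = n_per_arm(effect=1_800, sd=15_000)
```

The punchline generalizes: power scales with the square of `sd / effect`, so noisy outcomes and small effects get expensive fast — which is why underpowered studies are a recurring credibility problem.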
What Comes Next
Before we dive into specific methods, there is one more foundational skill: working with data. The next page covers loading, cleaning, reshaping, and constructing variables — the practical skills you need before you can implement any method.