Chapter 4 of 8
The Language of Identification
Estimand, estimator, estimate. ATE, ATT, LATE. The precise vocabulary you need.
By now, you have a solid intuition for why causal inference is hard. You know that comparing treated and untreated groups is misleading when selection bias is present. You know that omitted variables distort regression estimates. You can even sign the direction of bias.
But if you sit in an empirical methods seminar right now, you will hear people say things like:
"What is your estimand?" "This design identifies the ATT, not the ATE." "The identification relies on exogenous variation in exposure to the policy."
And if you are like most first-year students, you will nod politely while internally panicking.
This page fixes that gap. We are going to equip you with the precise vocabulary of causal inference — not as abstract definitions to memorize, but as tools for thinking about your own research. Every term will be grounded in our running example: the job training program.
Estimand, Estimator, Estimate
These three words sound similar, but they refer to very different things. Confusing them is one of the most common conceptual errors in applied work.
The Estimand
The **estimand** is the thing you are trying to learn — defined in terms of potential outcomes, before you look at any data or choose any statistical method. The potential outcomes framework was formalized by Rubin (1974) and is the foundation for modern causal inference (Imbens & Rubin, 2015).
Think of it as the question you are asking, stated with mathematical precision.
For our training program, the estimand might be:
"What is the average causal effect of the training program on participants' earnings?"
In potential outcomes notation:

E[Y(1) − Y(0)]

where Y(1) and Y(0) are a person's earnings with and without the training.
Notice: the estimand says nothing about regression, matching, instrumental variables, or any other statistical technique. It is a conceptual quantity — a feature of reality that exists whether or not you are smart enough to measure it.
The Estimator
The **estimator** is the statistical recipe you apply to data to try to learn the estimand.
Different estimators can target the same estimand. For the training program, you might use:
- A simple difference in means (naive, biased if selection is present)
- OLS with control variables (less biased if controls are sufficient)
- Propensity score matching (relies on selection-on-observables)
- Difference-in-differences (relies on parallel trends)
- Instrumental variables (relies on a valid instrument)
Each estimator comes with its own assumptions. If those assumptions hold, the estimator consistently recovers the estimand. If they do not hold, the estimator converges to something else — something that is not the causal effect.
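To see how an estimator's assumptions matter, here is a minimal simulation (all numbers hypothetical): the true effect of training is $2,000 for everyone, but motivation drives both enrollment and earnings, so the naive difference in means converges to something larger than the causal effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical data-generating process: motivation raises both
# enrollment and baseline earnings (the selection-bias story).
motivation = rng.normal(0, 1, n)
trained = motivation + rng.normal(0, 1, n) > 0.5            # self-selection into training
y0 = 30_000 + 5_000 * motivation + rng.normal(0, 2_000, n)  # earnings without training
y1 = y0 + 2_000                                             # true effect: +$2,000 for everyone
earnings = np.where(trained, y1, y0)

naive = earnings[trained].mean() - earnings[~trained].mean()
print(f"true effect: 2,000   naive difference in means: {naive:,.0f}")
```

The naive estimator lands far above $2,000 because it mixes the causal effect with the earnings gap that motivated people would have had anyway.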
The Estimate
The estimate is the number you get when you apply your estimator to your specific dataset.
It is a single realization — one number, like "$2,347." It is subject to sampling variability (a different sample would give a different number), which is why we report standard errors and confidence intervals alongside it.
Why does this distinction matter? Because one of the most common mistakes in applied work is choosing an estimator without first defining the estimand. Researchers jump straight to "I'll run a regression" without asking "what causal quantity am I trying to recover, and under what assumptions does my regression recover it?"
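The three concepts can be kept apart with a few lines of simulated data (hypothetical numbers): the estimand is a fixed $2,000, the estimator is the difference in means, and each new sample yields a different estimate.

```python
import numpy as np

rng = np.random.default_rng(1)

def draw_sample(n=500):
    # Hypothetical randomized experiment; the estimand (true ATE) is $2,000.
    treated = rng.random(n) < 0.5
    earnings = 30_000 + 2_000 * treated + rng.normal(0, 5_000, n)
    return treated, earnings

# One estimand, one estimator (difference in means),
# five samples -> five different estimates.
estimates = []
for _ in range(5):
    t, y = draw_sample()
    estimates.append(y[t].mean() - y[~t].mean())
print([round(e) for e in estimates])  # five numbers scattered around 2,000
```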
Four Flavors of Causal Effect
Not all causal effects are the same. Depending on whose outcomes you are averaging over, you get different quantities — and they can have very different values and policy implications.
ATE: Average Treatment Effect
The **ATE** is the average causal effect across the entire population:

ATE = E[Y(1) − Y(0)]
For our training program: if you could somehow force every person in the population — both those who would volunteer and those who would not — to take the training, how much would average earnings increase?
ATT: Average Treatment Effect on the Treated
The **ATT** is the average causal effect among those who actually received treatment:

ATT = E[Y(1) − Y(0) | D = 1]

where D = 1 indicates receiving the treatment.
For our training program: among the people who actually enrolled, how much did the program increase their earnings?
The ATT is often what policymakers care about most: "Did the program help the people we served?" But notice it is a different question from the ATE. If motivated people both benefit more from training and are more likely to enroll, then ATT > ATE. The program works better for its actual participants than it would for a random person off the street.
ATU: Average Treatment Effect on the Untreated
The **Average Treatment Effect on the Untreated (ATU)** is the average causal effect among those who did not receive treatment:

ATU = E[Y(1) − Y(0) | D = 0]
For our training program: if you could have given training to the people who did not enroll, how much would it have helped them?
The ATU estimand might seem obscure, but it matters enormously for policy. If the government is considering expanding the program to reach non-participants, the ATU tells you what to expect. And it might be very different from the ATT — maybe the people who did not enroll would benefit less (or more) from training.
LATE: Local Average Treatment Effect
The **LATE** is the average causal effect for a specific subpopulation called **compliers**, the people whose treatment status is changed by an instrument:

LATE = E[Y(1) − Y(0) | complier]
The LATE estimand takes a bit more setup. Suppose the government sends letters encouraging people to enroll in the training program. Some people who get the letter enroll (they would not have enrolled without it). Some people who get the letter ignore it. Some people enroll regardless of whether they get the letter.
The LATE is the causal effect for the people whose behavior was changed by the letter — the compliers. It is the estimand you get when you use an instrumental variable (the letter) to estimate the effect of the treatment (training) (Imbens & Angrist, 1994). You will learn much more about this estimand when you study instrumental variables.
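A small simulation of the letter experiment (all numbers and type shares hypothetical) shows how the Wald/IV estimator recovers the compliers' effect even though the three types of people have different effects:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500_000

# Hypothetical letter experiment. Three types:
#   always-takers enroll regardless, never-takers never enroll,
#   compliers enroll only if they receive the letter.
kind = rng.choice(["always", "never", "complier"], size=n, p=[0.2, 0.5, 0.3])
letter = rng.random(n) < 0.5                                   # randomized instrument
enrolled = (kind == "always") | ((kind == "complier") & letter)

# Effects differ by type; the never-takers' effect is never observed.
effect = np.select([kind == "always", kind == "complier"], [1_000, 2_000], 3_000)
earnings = 30_000 + rng.normal(0, 1_000, n) + effect * enrolled

# Wald / IV estimator: reduced form divided by first stage.
wald = (earnings[letter].mean() - earnings[~letter].mean()) / (
    enrolled[letter].mean() - enrolled[~letter].mean()
)
print(f"IV estimate: {wald:,.0f}  (true complier effect: 2,000)")
```

The letter shifts enrollment only for compliers, so scaling the letter's effect on earnings by its effect on enrollment isolates the compliers' $2,000 effect.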
Let us make these distinctions concrete with our training program:
| Estimand | In Words | Value Might Be... | Who Cares |
|---|---|---|---|
| ATE | Effect on a random person | $1,500 | Economists studying the general impact |
| ATT | Effect on actual trainees | $3,000 | Program administrators evaluating success |
| ATU | Effect on non-trainees if they had trained | $500 | Policymakers considering expansion |
| LATE | Effect on people nudged by outreach letters | $2,000 | Researchers using the letter as an instrument |
The listed estimands are all different quantities, and a study that estimates one is not necessarily telling you about the others.
Estimand Explorer
See how ATE, ATT, and ATU can differ when treatment effects vary across individuals. Adjust the parameters to create scenarios where the estimands diverge dramatically.
Try these experiments:
- Set both treatment effects equal. The ATE, ATT, and ATU all converge — there is no selection on gains.
- Make the effect for motivated people much larger. Now ATT > ATE > ATU, because the people who enroll benefit more.
- Set the unmotivated effect to zero or negative. Expanding the program to non-participants would be wasteful, even though the ATT looks impressive.
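The explorer's logic can be sketched in code. In this hypothetical population, motivated people both benefit more and enroll more often, so the three estimands diverge:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Hypothetical population: motivated people benefit more AND enroll more often.
motivated = rng.random(n) < 0.4
effect = np.where(motivated, 3_500, 500)             # individual treatment effects
treated = rng.random(n) < np.where(motivated, 0.8, 0.2)

ate = effect.mean()            # average over everyone
att = effect[treated].mean()   # average over enrollees
atu = effect[~treated].mean()  # average over non-enrollees
print(f"ATE={ate:,.0f}  ATT={att:,.0f}  ATU={atu:,.0f}")  # ATT > ATE > ATU
```

Setting both effects equal in this sketch makes the three numbers coincide, mirroring the first experiment above.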
What Does "Identification" Mean?
You will hear the word **identification** constantly. Informally, it means: can you recover the causal parameter you care about from the data you have, given the assumptions you are willing to make?
More precisely, an estimand is identified under a research design when the design's assumptions are sufficient to express it — a causal quantity involving unobservable potential outcomes — as a function of the distribution of observable data. When point identification is not achievable, researchers can sometimes derive informative bounds on the parameter of interest.
Here is the key insight: identification is about assumptions, not about statistical techniques. You do not identify a causal effect by running a regression. You identify it by providing an argument — grounded in institutional knowledge, research design, or both — for why the variation you are exploiting is unrelated to confounders. As Holland (1986) emphasized, the "fundamental problem of causal inference" is that we can never observe both potential outcomes for the same unit — which is why identification strategies are essential. Angrist and Pischke (2009) provide an accessible treatment of these identification strategies in practice.
Think of it this way:
- Identified: "Because the training slots were assigned by lottery, the treated and untreated groups are comparable in expectation. Therefore, a simple difference in means identifies the ATE."
- Not identified: "I ran a regression of earnings on training participation." (Why should I believe the coefficient is causal? What ensures that trainees and non-trainees are comparable?)
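The contrast can be made concrete with a simulation (hypothetical data-generating process): the same difference-in-means estimator applied to the same population, once under lottery assignment and once under self-selection. Only the first recovers the causal effect.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000
motivation = rng.normal(0, 1, n)  # the confounder

def diff_in_means(treated):
    # Same estimator either way; only the assignment mechanism differs.
    y = 30_000 + 5_000 * motivation + 2_000 * treated + rng.normal(0, 2_000, n)
    return y[treated].mean() - y[~treated].mean()

lottery = rng.random(n) < 0.5                           # unrelated to motivation
self_selected = motivation + rng.normal(0, 1, n) > 0.5  # driven by the confounder

lottery_est = diff_in_means(lottery)
selected_est = diff_in_means(self_selected)
print(f"lottery: {lottery_est:,.0f}   self-selected: {selected_est:,.0f}")
# the lottery comparison lands near the true 2,000; self-selection does not
```

Identical code, identical estimator; what changes is whether the assignment mechanism justifies the comparison.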
A researcher regresses earnings on a job training indicator and finds a coefficient of $3,200 (p < 0.01). She concludes that the training program caused a $3,200 increase in earnings. What is the most fundamental problem with this conclusion?
What Does "Exogenous Variation" Mean?
When researchers say they are exploiting "exogenous variation," they mean they have found a source of variation in the treatment variable that is not driven by the confounders that create selection bias.
The word **exogenous** means "coming from outside." In this context, it means: the variation is determined by something outside the system of confounders you are worried about.
Examples:
- A lottery that assigns training slots generates exogenous variation in training — this lottery mechanism is the logic behind randomized experiments. Whether you get trained is determined by a random number, not by your motivation or ability.
- A policy change in one state but not a neighboring state generates plausibly exogenous variation in the policy variable — which state you live in was not determined by your anticipated response to the policy (usually).
- An arbitrary bureaucratic cutoff (e.g., schools with enrollment above 25 get an additional teacher) generates exogenous variation near the cutoff — whether your school has 24 or 26 students is approximately random.
In each case, the researcher is arguing that the variation they exploit is "as good as random" — not perfectly random, but close enough that the confounders that plague naive comparisons are not driving the variation.
Which of the following is the best example of exogenous variation in exposure to a job training program?
Putting It All Together
Let us revisit our training mystery one more time, now with the precise vocabulary:
- The estimand we want: The ATT — the average causal effect of training on those who participated, E[Y(1) − Y(0) | D = 1].
- The estimator we used (naively): A simple difference in means between trainees and non-trainees.
- The estimate we got: $7,500.
- The problem: Our estimator is not identified for the ATT because the variation in training participation is endogenous — driven by motivation and ability, which also affect earnings. The estimate captures both the causal effect and selection bias.
- What we need: A source of exogenous variation in training participation — something that changes who gets trained but is not correlated with the confounders. The argument for that source of variation is an identification strategy, and the anatomy of a research design shows how it fits into the broader pipeline of credible empirical work.
- What we would say in a seminar: "The naive comparison does not identify the ATT because training take-up is endogenous. We need exogenous variation — such as a lottery, a policy discontinuity, or a quasi-experimental design — to separate the causal effect from selection bias."
You can now speak this language. It will serve you for the rest of your career.
Key Takeaways
- The estimand is the causal question, the estimator is the statistical recipe, and the estimate is the number you get from your data.
- ATE, ATT, ATU, and LATE are different quantities; a study that estimates one is not necessarily informative about the others.
- Identification is about assumptions and research design, not about statistical techniques.
- "Exogenous variation" is variation in treatment that is not driven by the confounders behind selection bias.
What Comes Next
You now have the conceptual vocabulary and the intuition. But there is a powerful visual tool that makes the relationships between variables, confounders, and causal pathways much easier to see — and to reason about formally. On the next page, you will learn to draw Directed Acyclic Graphs (DAGs): simple diagrams that let you see exactly where selection bias comes from and what you need to block it.
We will return to our training mystery and draw the problem. And once you see the picture, the solution will become much clearer.
Next Step: DAGs for Beginners — A visual tool for thinking about causal relationships. Draw the problem, see the solution.