Chapter 4 of 8
The Language of Identification
Estimand, estimator, estimate. ATE, ATT, LATE. The precise vocabulary you need.
By now, you have a solid intuition for why causal inference is hard. You know that comparing treated and untreated groups is misleading when selection bias is present. You know that omitted variables distort regression estimates. You can even sign the direction of bias.
But if you sit in an empirical methods seminar right now, you will hear people say things like:
"What is your estimand?" "This design identifies the ATT, not the ATE." "The identification relies on exogenous variation in exposure to the policy."
And if you are like most first-year students, you will nod politely while internally panicking.
This page fixes that gap. We are going to equip you with the precise vocabulary of causal inference — not as abstract definitions to memorize, but as tools for thinking about your own research. Every term will be grounded in our running example: the job training program.
Estimand, Estimator, Estimate
These three words sound similar, but they refer to very different things. Confusing them is one of the most common conceptual errors in applied work.
The Estimand
The **estimand** is the thing you are trying to learn — defined in terms of potential outcomes, before you look at any data or choose any statistical method. The potential outcomes framework was formalized by Rubin (1974) and is the foundation for modern causal inference (Imbens & Rubin, 2015).
Think of it as the question you are asking, stated with mathematical precision.
For our training program, the estimand might be:
"What is the average causal effect of the training program on participants' earnings?"
In potential outcomes notation:

E[Y(1) − Y(0)]

where Y(1) and Y(0) are a person's earnings with and without the training.
Notice: the estimand says nothing about regression, matching, instrumental variables, or any other statistical technique. It is a conceptual quantity — a feature of reality that exists whether or not you are smart enough to measure it.
The Estimator
The **estimator** is the statistical recipe you apply to data to try to learn the estimand.
Different estimators can target the same estimand. For the training program, you might use:
- A simple difference in means (naive, biased if selection is present)
- OLS with control variables (less biased if controls are sufficient)
- Propensity score matching (relies on selection-on-observables)
- Difference-in-differences (relies on parallel trends)
- Instrumental variables (relies on a valid instrument)
Each estimator comes with its own assumptions. If those assumptions hold, the estimator consistently recovers the estimand. If they do not hold, the estimator converges to something else — something that is not the causal effect.
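To see how an estimator's assumptions matter, here is a minimal simulation (all numbers hypothetical): the true effect of training is $2,000 for everyone, but motivation drives both enrollment and earnings, so the naive difference in means converges to something larger than the causal effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical data-generating process: motivation raises both
# enrollment and baseline earnings (the selection-bias story).
motivation = rng.normal(0, 1, n)
trained = motivation + rng.normal(0, 1, n) > 0.5            # self-selection into training
y0 = 30_000 + 5_000 * motivation + rng.normal(0, 2_000, n)  # earnings without training
y1 = y0 + 2_000                                             # true effect: +$2,000 for everyone
earnings = np.where(trained, y1, y0)

naive = earnings[trained].mean() - earnings[~trained].mean()
print(f"true effect: 2,000   naive difference in means: {naive:,.0f}")
```

The naive estimator lands far above $2,000 because it mixes the causal effect with the earnings gap that motivated people would have had anyway.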
The Estimate
The estimate is the number you get when you apply your estimator to your specific dataset.
It is a single realization — one number, like "$2,347." It is subject to sampling variability (a different sample would give a different number), which is why we report standard errors and confidence intervals alongside it.
Why does this distinction matter? Because one of the most common mistakes in applied work is choosing an estimator without first defining the estimand. Researchers jump straight to "I'll run a regression" without asking "what causal quantity am I trying to recover, and under what assumptions does my regression recover it?"
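The three concepts can be kept apart with a few lines of simulated data (hypothetical numbers): the estimand is a fixed $2,000, the estimator is the difference in means, and each new sample yields a different estimate.

```python
import numpy as np

rng = np.random.default_rng(1)

def draw_sample(n=500):
    # Hypothetical randomized experiment; the estimand (true ATE) is $2,000.
    treated = rng.random(n) < 0.5
    earnings = 30_000 + 2_000 * treated + rng.normal(0, 5_000, n)
    return treated, earnings

# One estimand, one estimator (difference in means),
# five samples -> five different estimates.
estimates = []
for _ in range(5):
    t, y = draw_sample()
    estimates.append(y[t].mean() - y[~t].mean())
print([round(e) for e in estimates])  # five numbers scattered around 2,000
```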
Four Flavors of Causal Effect
Not all causal effects are the same. Depending on whose outcomes you are averaging over, you get different quantities — and they can have very different values and policy implications.
ATE: Average Treatment Effect
The **ATE** is the average causal effect across the entire population:

ATE = E[Y(1) − Y(0)]
For our training program: if you could somehow force every person in the population — both those who would volunteer and those who would not — to take the training, how much would average earnings increase?
ATT: Average Treatment Effect on the Treated
The **ATT** is the average causal effect among those who actually received treatment:

ATT = E[Y(1) − Y(0) | D = 1]

where D = 1 indicates receiving the treatment.
For our training program: among the people who actually enrolled, how much did the program increase their earnings?
The ATT is often what policymakers care about most: "Did the program help the people we served?" But notice it is a different question from the ATE. If motivated people both benefit more from training and are more likely to enroll, then ATT > ATE. The program works better for its actual participants than it would for a random person off the street.
ATU: Average Treatment Effect on the Untreated
The **Average Treatment Effect on the Untreated (ATU)** is the average causal effect among those who did not receive treatment:

ATU = E[Y(1) − Y(0) | D = 0]
For our training program: if you could have given training to the people who did not enroll, how much would it have helped them?
The ATU estimand might seem obscure, but it matters enormously for policy. If the government is considering expanding the program to reach non-participants, the ATU tells you what to expect. And it might be very different from the ATT — maybe the people who did not enroll would benefit less (or more) from training.
LATE: Local Average Treatment Effect
The **LATE** is the average causal effect for a specific subpopulation called **compliers**, the people whose treatment status is changed by an instrument:

LATE = E[Y(1) − Y(0) | complier]
The LATE estimand takes a bit more setup. Suppose the government sends letters encouraging people to enroll in the training program. Some people who get the letter enroll (they would not have enrolled without it). Some people who get the letter ignore it. Some people enroll regardless of whether they get the letter.
The LATE is the causal effect for the people whose behavior was changed by the letter — the compliers. It is the estimand you get when you use an instrumental variable (the letter) to estimate the effect of the treatment (training) (Imbens & Angrist, 1994). You will learn much more about this estimand when you study instrumental variables.
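A small simulation of the letter experiment (all numbers and type shares hypothetical) shows how the Wald/IV estimator recovers the compliers' effect even though the three types of people have different effects:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500_000

# Hypothetical letter experiment. Three types:
#   always-takers enroll regardless, never-takers never enroll,
#   compliers enroll only if they receive the letter.
kind = rng.choice(["always", "never", "complier"], size=n, p=[0.2, 0.5, 0.3])
letter = rng.random(n) < 0.5                                   # randomized instrument
enrolled = (kind == "always") | ((kind == "complier") & letter)

# Effects differ by type; the never-takers' effect is never observed.
effect = np.select([kind == "always", kind == "complier"], [1_000, 2_000], 3_000)
earnings = 30_000 + rng.normal(0, 1_000, n) + effect * enrolled

# Wald / IV estimator: reduced form divided by first stage.
wald = (earnings[letter].mean() - earnings[~letter].mean()) / (
    enrolled[letter].mean() - enrolled[~letter].mean()
)
print(f"IV estimate: {wald:,.0f}  (true complier effect: 2,000)")
```

The letter shifts enrollment only for compliers, so scaling the letter's effect on earnings by its effect on enrollment isolates the compliers' $2,000 effect.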
Let us make these distinctions concrete with our training program:
| Estimand | In Words | Value Might Be... | Who Cares |
|---|---|---|---|
| ATE | Effect on a random person | $1,500 | Economists studying the general impact |
| ATT | Effect on actual trainees | $3,000 | Program administrators evaluating success |
| ATU | Effect on non-trainees if they had trained | $500 | Policymakers considering expansion |
| LATE | Effect on people nudged by outreach letters | $2,000 | Researchers using the letter as an instrument |
The listed estimands are all different quantities, and a study that estimates one is not necessarily telling you about the others.
Estimand Explorer
See how ATE, ATT, and ATU can differ when treatment effects vary across individuals. Adjust the parameters to create scenarios where the estimands diverge dramatically.
Try these experiments:
- Set both treatment effects equal. The ATE, ATT, and ATU all converge — there is no selection on gains.
- Make the effect for motivated people much larger. Now ATT > ATE > ATU, because the people who enroll benefit more.
- Set the unmotivated effect to zero or negative. Expanding the program to non-participants would be wasteful, even though the ATT looks impressive.
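The explorer's logic can be sketched in code. In this hypothetical population, motivated people both benefit more and enroll more often, so the three estimands diverge:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Hypothetical population: motivated people benefit more AND enroll more often.
motivated = rng.random(n) < 0.4
effect = np.where(motivated, 3_500, 500)             # individual treatment effects
treated = rng.random(n) < np.where(motivated, 0.8, 0.2)

ate = effect.mean()            # average over everyone
att = effect[treated].mean()   # average over enrollees
atu = effect[~treated].mean()  # average over non-enrollees
print(f"ATE={ate:,.0f}  ATT={att:,.0f}  ATU={atu:,.0f}")  # ATT > ATE > ATU
```

Setting both effects equal in this sketch makes the three numbers coincide, mirroring the first experiment above.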
What Does "Identification" Mean?
You will hear the word **identification** constantly. Informally, it means: can you recover the causal parameter you care about from the data you have, given the assumptions you are willing to make?
More precisely, an estimand is identified under a research design when the design's assumptions are sufficient to express it — a causal quantity involving unobservable potential outcomes — as a function of the distribution of observable data. When point identification is not achievable, researchers can sometimes derive informative bounds on the parameter of interest.
Here is the key insight: identification is about assumptions, not about statistical techniques. You do not identify a causal effect by running a regression. You identify it by providing an argument — grounded in institutional knowledge, research design, or both — for why the variation you are exploiting is unrelated to confounders. As Holland (1986) emphasized, the "fundamental problem of causal inference" is that we can never observe both potential outcomes for the same unit — which is why identification strategies are essential. Angrist and Pischke (2009) provide an accessible treatment of these identification strategies in practice.
Think of it this way:
- Identified: "Because the training slots were assigned by lottery, the treated and untreated groups are comparable in expectation. Therefore, a simple difference in means identifies the ATE."
- Not identified: "I ran a regression of earnings on training participation." (Why should I believe the coefficient is causal? What ensures that trainees and non-trainees are comparable?)
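The contrast can be made concrete with a simulation (hypothetical data-generating process): the same difference-in-means estimator applied to the same population, once under lottery assignment and once under self-selection. Only the first recovers the causal effect.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000
motivation = rng.normal(0, 1, n)  # the confounder

def diff_in_means(treated):
    # Same estimator either way; only the assignment mechanism differs.
    y = 30_000 + 5_000 * motivation + 2_000 * treated + rng.normal(0, 2_000, n)
    return y[treated].mean() - y[~treated].mean()

lottery = rng.random(n) < 0.5                           # unrelated to motivation
self_selected = motivation + rng.normal(0, 1, n) > 0.5  # driven by the confounder

lottery_est = diff_in_means(lottery)
selected_est = diff_in_means(self_selected)
print(f"lottery: {lottery_est:,.0f}   self-selected: {selected_est:,.0f}")
# the lottery comparison lands near the true 2,000; self-selection does not
```

Identical code, identical estimator; what changes is whether the assignment mechanism justifies the comparison.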
A researcher regresses earnings on a job training indicator and finds a coefficient of $3,200 (p < 0.01). She concludes that the training program caused a $3,200 increase in earnings. What is the most fundamental problem with this conclusion?
What Does "Exogenous Variation" Mean?
When researchers say they are exploiting "exogenous variation," they mean they have found a source of variation in the treatment variable that is not driven by the confounders that create selection bias.
The word **exogenous** means "coming from outside." In this context, it means: the variation is determined by something outside the system of confounders you are worried about.
Examples:
- A lottery that assigns training slots generates exogenous variation in training — this lottery mechanism is the logic behind randomized experiments. Whether you get trained is determined by a random number, not by your motivation or ability.
- A policy change in one state but not a neighboring state generates plausibly exogenous variation in the policy variable — which state you live in was not determined by your anticipated response to the policy (usually).
- An arbitrary bureaucratic cutoff (e.g., schools with enrollment above 25 get an additional teacher) generates exogenous variation near the cutoff — whether your school has 24 or 26 students is approximately random.
In each case, the researcher is arguing that the variation they exploit is "as good as random" — not perfectly random, but close enough that the confounders that plague naive comparisons are not driving the variation.
Which of the following is the best example of exogenous variation in exposure to a job training program?
Putting It All Together
Let us revisit our training mystery one more time, now with the precise vocabulary:
- The estimand we want: The ATT — the average causal effect of training on those who participated, E[Y(1) − Y(0) | D = 1].
- The estimator we used (naively): A simple difference in means between trainees and non-trainees.
- The estimate we got: $7,500.
- The problem: Our estimator is not identified for the ATT because the variation in training participation is endogenous — driven by motivation and ability, which also affect earnings. The estimate captures both the causal effect and selection bias.
- What we need: A source of exogenous variation in training participation — something that changes who gets trained but is not correlated with the confounders. The argument for that source of variation is an identification strategy, and the anatomy of a research design shows how it fits into the broader pipeline of credible empirical work.
- What we would say in a seminar: "The naive comparison does not identify the ATT because training take-up is endogenous. We need exogenous variation — such as a lottery, a policy discontinuity, or a quasi-experimental design — to separate the causal effect from selection bias."
You can now speak this language. It will serve you for the rest of your career.
Key Takeaways
- The estimand is the causal question, the estimator is the statistical recipe, and the estimate is the number you get from your data.
- ATE, ATT, ATU, and LATE are different quantities; a study that estimates one is not necessarily informative about the others.
- Identification is about assumptions and research design, not about statistical techniques.
- "Exogenous variation" is variation in treatment that is not driven by the confounders behind selection bias.
What Comes Next
You now have the conceptual vocabulary and the intuition. But there is a powerful visual tool that makes the relationships between variables, confounders, and causal pathways much easier to see — and to reason about formally. On the next page, you will learn to draw Directed Acyclic Graphs (DAGs): simple diagrams that let you see exactly where selection bias comes from and what you need to block it.
We will return to our training mystery and draw the problem. And once you see the picture, the solution will become much clearer.
Next Step: DAGs for Beginners — A visual tool for thinking about causal relationships. Draw the problem, see the solution.