MethodAtlas

Chapter 1 of 8

Why Causal Inference?

The fundamental problem of causal inference — why comparing treated and untreated is not enough.

The Mystery: A government launches a job training program. Trainees earn more. Did the program work?

The Research Mystery

Here is a story you will encounter — in some form — in nearly every empirical paper you read during your PhD.

A state government launches a new job training program. It is expensive. Legislators want to know: did the program actually work? A policy analyst pulls the data and finds that, one year later, trainees earn an average of $2,400 more than non-trainees.

Case closed?

Not even close.

Think about it for a second. Who signs up for a job training program? Probably people who are already motivated to improve their situation. People who are actively looking for better jobs. People who have the time, transportation, and childcare to attend sessions. Maybe people who are slightly better educated, or who live in neighborhoods with better access to program sites.

In other words, the people who chose to participate might have earned more even without the program. The $2,400 difference could be entirely explained by the fact that motivated, resourceful people both (a) sign up for training and (b) earn more — regardless of whether the training itself did anything.

The concern is not a minor technical quibble. The confounding problem is the central problem of empirical social science, and it has a name. (We will dissect it fully on the Selection Bias and Confounding page.)

The Fundamental Problem of Causal Inference

Here is what we fundamentally want to know: for a specific person — say, Maria — what would her earnings be if she participated in the training program, compared to what her earnings would be if she did not participate?

The difference between those two numbers is the causal effect of the program on Maria.

The trouble is devastating in its simplicity: we can never observe both. Maria either participates or she does not. We see one outcome. The other outcome — the one that would have happened under the alternative scenario — is forever unobserved.

The impossibility is called the Fundamental Problem of Causal Inference, and it was articulated with beautiful clarity by the statistician Paul Holland in 1986.

(Holland, 1986)

Read that sentence again, because it is genuinely profound: we can never observe the causal effect of a treatment on a single individual. Not because our data are noisy or our methods are imperfect, but because causation requires comparing two states of the world, and any individual can only ever exist in one.

If the impossibility seems unsettling, it should be. The fundamental limitation is precisely what motivates the entire field of causal inference.

Potential Outcomes: A Framework for Thinking Clearly

So how do researchers think about causal effects if we can never directly observe them? They use a framework called potential outcomes, developed primarily by the statistician Donald Rubin and sometimes called the Rubin Causal Model.

(Rubin, 1974)

The idea is simple but powerful. For every person in our study, we imagine two potential outcomes:

  • Y(1): the outcome (earnings) if the person receives the training
  • Y(0): the outcome (earnings) if the person does not receive the training

The causal effect for person i is simply the difference:

\tau_i = Y_i(1) - Y_i(0)

Let us make this concrete. Imagine we could somehow peek into parallel universes and see both outcomes for five people:

| Person | Y(1): Earnings with Training | Y(0): Earnings without Training | Causal Effect |
|--------|------------------------------|---------------------------------|---------------|
| Maria  | $32,000 | $28,000 | +$4,000 |
| James  | $26,000 | $25,000 | +$1,000 |
| Aisha  | $35,000 | $30,000 | +$5,000 |
| David  | $22,000 | $24,000 | -$2,000 |
| Lin    | $29,000 | $27,000 | +$2,000 |

Notice something important: the training helps most people but actually hurts David (maybe it pulled him away from a better opportunity). The average causal effect across these five people is:

\text{ATE} = \frac{4000 + 1000 + 5000 + (-2000) + 2000}{5} = \$2{,}000
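The arithmetic above is easy to verify directly. Here is a minimal sketch (using the chapter's hypothetical five-person table) that computes each individual causal effect and the ATE:

```python
# The five-person parallel-universe table from the text:
# name -> (Y(1): earnings with training, Y(0): earnings without training)
people = {
    "Maria": (32_000, 28_000),
    "James": (26_000, 25_000),
    "Aisha": (35_000, 30_000),
    "David": (22_000, 24_000),
    "Lin":   (29_000, 27_000),
}

# Individual causal effect: tau_i = Y_i(1) - Y_i(0)
effects = {name: y1 - y0 for name, (y1, y0) in people.items()}

# Average treatment effect: mean of the individual effects
ate = sum(effects.values()) / len(effects)

print(effects["David"])  # -2000: training hurts David
print(ate)               # 2000.0
```

Of course, this code only runs because we granted ourselves both columns of the table, which reality never does.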

The average treatment effect (ATE) is $2,000, one of several causal estimands you will learn to distinguish. But here is the catch: we can never see this table in real life. Each person either takes the training or does not. We observe only one column per person.

The Problem of Comparison

"Fine," you might say. "I cannot observe both outcomes for the same person. But I can compare the average earnings of people who did train to those who did not train. Is that not the same thing?"

It would be — if the people who trained and the people who did not train were identical in every way except for the training itself. But they almost never are.

Let us go back to our five people. Suppose Maria, Aisha, and Lin choose to participate (they are the more motivated ones), while James and David do not. Here is what we actually observe:

| Person | Trained? | Observed Earnings   | What We Cannot See  |
|--------|----------|---------------------|---------------------|
| Maria  | Yes      | $32,000 (her Y(1))  | $28,000 (her Y(0))  |
| James  | No       | $25,000 (his Y(0))  | $26,000 (his Y(1))  |
| Aisha  | Yes      | $35,000 (her Y(1))  | $30,000 (her Y(0))  |
| David  | No       | $24,000 (his Y(0))  | $22,000 (his Y(1))  |
| Lin    | Yes      | $29,000 (her Y(1))  | $27,000 (her Y(0))  |

The naive comparison — average earnings of trainees minus average earnings of non-trainees — gives us:

\frac{32{,}000 + 35{,}000 + 29{,}000}{3} - \frac{25{,}000 + 24{,}000}{2} = 32{,}000 - 24{,}500 = \$7{,}500

But we know the true ATE is only $2,000. The naive estimate of $7,500 is wildly inflated. Where does that extra $5,500 come from?

It comes from selection bias. The people who chose to train were already going to earn more, even without the program. They were not a random sample; they selected into treatment.
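You can see the bias emerge by replaying the comparison with only the observed column. This sketch (again using the chapter's hypothetical table) contrasts the naive difference in means with the true ATE:

```python
# name -> (Y(1), Y(0), trained?) per the example: the motivated three enroll.
people = {
    "Maria": (32_000, 28_000, True),
    "James": (26_000, 25_000, False),
    "Aisha": (35_000, 30_000, True),
    "David": (22_000, 24_000, False),
    "Lin":   (29_000, 27_000, True),
}

# In reality we observe Y(1) for trainees and Y(0) for everyone else.
treated   = [y1 for y1, y0, d in people.values() if d]
untreated = [y0 for y1, y0, d in people.values() if not d]

naive = sum(treated) / len(treated) - sum(untreated) / len(untreated)
true_ate = sum(y1 - y0 for y1, y0, _ in people.values()) / len(people)

print(naive)             # 7500.0: the naive difference in means
print(naive - true_ate)  # 5500.0: pure selection bias
```

The $5,500 gap is not noise; it is the systematic difference between the kinds of people who enroll and the kinds who do not.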

Concept Check

Using the table above, the naive comparison of trainees vs. non-trainees gives \$7,500. The true ATE is \$2,000. Why is the naive estimate so much larger than the true causal effect?

Seeing Selection Bias in Action

The next insight is particularly important. The gap between what you estimate and what is true is not random noise. It is systematic — and it can point in either direction, making your estimate too large or too small depending on how selection works.

The simulation below lets you experience this dynamic directly. You can adjust how strongly people's characteristics (motivation, education, prior earnings) influence both their decision to enroll and their future earnings. When selection is strong, the naive comparison diverges dramatically from the truth. When selection is zero — meaning enrollment is effectively random — the naive comparison recovers the true effect.

Interactive Simulation

Selection Bias Simulator

Adjust the strength of selection bias and watch how the naive comparison diverges from the true treatment effect. When selection is zero, comparing trainees to non-trainees gives you the right answer. As selection increases, the naive estimate becomes increasingly misleading.

[Interactive chart: x-axis Selection Strength (0.0 to 1.0), y-axis Estimated Effect ($), comparing the True Effect line to the Naive Estimate line.]

Play with this simulator for a minute. Set the selection strength to zero and notice how the naive estimate hovers around the true effect. Then crank it up to 0.8 or 0.9. The divergence is striking — and the same divergence is what happens in most observational studies.
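If you want to recreate the simulator's behavior yourself, here is a minimal Monte Carlo sketch (not the site's actual simulator code; the earnings numbers and the latent "motivation" confounder are illustrative assumptions). A single unobserved trait raises both the chance of enrolling and earnings, and the naive difference in means drifts away from the true $2,000 effect as selection strengthens:

```python
import random

def simulate(selection_strength, n=100_000, true_effect=2_000, seed=0):
    """Naive trainee-vs-non-trainee earnings gap when a latent trait
    ("motivation") drives both enrollment and earnings.
    selection_strength ranges from 0 (random enrollment) to 1."""
    rng = random.Random(seed)
    treated, untreated = [], []
    for _ in range(n):
        motivation = rng.gauss(0, 1)  # unobserved confounder
        # Motivated people become likelier to enroll as selection grows.
        enrolls = rng.random() < 0.5 + 0.4 * selection_strength * (motivation > 0)
        # Motivation also raises earnings regardless of training.
        y0 = 25_000 + 3_000 * motivation + rng.gauss(0, 1_000)
        y = y0 + true_effect if enrolls else y0
        (treated if enrolls else untreated).append(y)
    return sum(treated) / len(treated) - sum(untreated) / len(untreated)

print(round(simulate(0.0)))  # ~2000: no selection, naive estimate recovers truth
print(round(simulate(0.9)))  # far above 2000: upward-biased naive estimate
```

The design choice worth noticing: nothing about the treatment changed between the two calls. Only who selects into it changed, and that alone is enough to wreck the naive comparison.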

What This Means for You

If you are reading this page at the beginning of your PhD, you might be wondering: "If we can never observe counterfactuals, how does anyone ever figure out whether anything causes anything?"

The question is precisely what the rest of the site is about.

Here is the short answer: researchers have developed an extraordinary toolkit of strategies — randomized experiments, natural experiments, difference-in-differences, instrumental variables, regression discontinuity, and more — that each, in their own way, try to solve or approximate the comparison problem. Every single one of these methods is, at its core, a strategy for constructing a credible counterfactual: what would have happened to the treated group if they had not been treated?

Some of these strategies are more convincing than others. Some require stronger assumptions. Some only work in specific settings. Learning when each tool is appropriate, what assumptions it requires, and how to assess whether those assumptions are credible — that judgment is the intellectual core of modern empirical research. Even the simplest tool, OLS regression, requires careful thought about what it can and cannot identify.

Key Takeaways

  • The causal effect for an individual compares two potential outcomes, Y(1) and Y(0); we can only ever observe one of them.
  • This is the Fundamental Problem of Causal Inference: no amount of data lets us see an individual's counterfactual.
  • Naively comparing treated and untreated groups mixes the true effect with selection bias, because people who select into treatment differ systematically from those who do not.
  • Every causal inference method is, at its core, a strategy for constructing a credible counterfactual.

What Comes Next

So how DO we figure out if the program worked? How do researchers go from "we observe that trainees earn more" to "we are confident the program caused higher earnings"?

The next page introduces the full pipeline of a research design — from formulating your question to writing up your results — and walks through a real published study that tackled exactly this kind of question. You will see how every stage of the pipeline works together to produce credible evidence.

Next Step: The Anatomy of a Research Design — See how the pieces of a credible study fit together, through the lens of a landmark paper.