Tutorial · 90 minutes

Lab: Causal Mediation Analysis

Decompose a total effect into direct and indirect pathways using the Baron-Kenny approach and the Imai-Keele-Tingley framework, then run a sensitivity analysis for the sequential ignorability assumption.

Method: Causal Mediation Analysis
Languages: Python, R, Stata
Dataset: Simulated training program with skills mediator

Overview

In this lab you will analyze a simulated job training program where the treatment (training) affects earnings both directly (e.g., signaling to employers) and indirectly through improved skills (the mediator). You will decompose the total effect into the average causal mediation effect (ACME) and the average direct effect (ADE).

What you will learn:

How to implement the classic Baron-Kenny mediation approach
How to use the modern Imai et al. (2010) framework for causal mediation
The sequential ignorability assumption and why it is strong
How to perform sensitivity analysis for violations of sequential ignorability
How to handle treatment-mediator interactions

Prerequisites: Familiarity with OLS regression and the concept of causal pathways. Understanding of potential outcomes is helpful.

Step 1: Simulate Training Program Data

The treatment (training) affects a mediator (skills) which in turn affects the outcome (earnings). There is also a direct effect of training on earnings.

# First-time setup: install.packages(c("mediation"))
library(mediation)

set.seed(42)
n <- 1500

# Pre-treatment covariates (age and education are clipped to plausible ranges)
age <- pmin(pmax(rnorm(n, 30, 8), 18), 60)
educ <- pmin(pmax(rnorm(n, 12, 2.5), 8), 20)
motivation <- rnorm(n)

# Randomized treatment: each person is trained with probability 0.5
treat <- rbinom(n, 1, 0.5)

# Mediator: skills (training raises skills by 5 points)
skills <- 50 + 5 * treat + 2 * educ + 3 * motivation + rnorm(n, 0, 5)

# Outcome: earnings (direct effect of 1000, plus 200 per skill point)
earnings <- 20000 + 1000 * treat + 200 * skills +
          300 * educ + 150 * age + 500 * motivation + rnorm(n, 0, 3000)

df <- data.frame(earnings, treat, skills, age, educ, motivation)

cat("True total:", 2000, "\nTrue ACME:", 1000, "\nTrue ADE:", 1000, "\n")

Requires: mediation

Expected output:

Variable	Mean	Std Dev	Min	Max
earnings	43,900	4,200	28,000	60,000
treat	0.50	0.50	0	1
skills	76.5	8.0	52	102
age	30.2	7.5	18.0	60.0
educ	12.0	2.4	8.0	20.0
motivation	0.0	1.0	-3.2	3.1

True total effect:  2000
True ACME (indirect): 1000
True ADE (direct):    1000
Mediation share: 50%
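The true values above follow from the coefficients in the Step 1 equations. A small sketch of that arithmetic, with the coefficients copied by hand (the names a_true, b_true, and direct_true are labels chosen here, not objects used elsewhere in the lab):

```r
# Structural coefficients copied from the Step 1 data-generating process
a_true      <- 5     # effect of treat on skills (mediator equation)
b_true      <- 200   # effect of skills on earnings (outcome equation)
direct_true <- 1000  # effect of treat on earnings with skills held fixed

acme_true  <- a_true * b_true         # indirect effect through skills
ade_true   <- direct_true             # direct effect
total_true <- acme_true + ade_true    # total effect
share_true <- acme_true / total_true  # proportion mediated

cat("ACME:", acme_true, "ADE:", ade_true,
    "Total:", total_true, "Share:", share_true, "\n")
```

The product rule works here because both equations are linear and the skills coefficient does not depend on treatment status.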

Step 2: The Baron-Kenny Approach

The classic approach involves three regressions: (1) total effect, (2) mediator model, (3) outcome model conditioning on the mediator.

# Step 1: Total effect
total <- lm(earnings ~ treat + age + educ + motivation, data = df)
cat("Total effect:", coef(total)["treat"], "\n")

# Step 2: Mediator model
med_model <- lm(skills ~ treat + age + educ + motivation, data = df)
cat("Treatment -> Skills:", coef(med_model)["treat"], "\n")

# Step 3: Outcome model with mediator
outcome <- lm(earnings ~ treat + skills + age + educ + motivation, data = df)
cat("Direct effect:", coef(outcome)["treat"], "\n")
cat("Skills coef:", coef(outcome)["skills"], "\n")

# Baron-Kenny
a <- coef(med_model)["treat"]
b <- coef(outcome)["skills"]
cat("\nIndirect (a*b):", a * b, "\n")
cat("Direct (c'):", coef(outcome)["treat"], "\n")

Expected output:

Baron-Kenny Step	Regression	Key Coefficient	Estimate	True Value
Step 1: Total effect	earnings ~ treat + controls	treat	~2,000	2,000
Step 2: Mediator model	skills ~ treat + controls	treat	~5.0	5.0
Step 3: Outcome model	earnings ~ treat + skills + controls	treat (direct)	~1,000	1,000
Step 3: Outcome model	earnings ~ treat + skills + controls	skills	~200	200

=== Baron-Kenny Decomposition ===
Indirect (a*b): ~1000  (5.0 * 200)
Direct (c'):    ~1000
Total:          ~2000
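The decomposition adds up by construction, not by luck. When all three OLS regressions use the same covariates, the total-effect coefficient equals the direct coefficient plus a*b exactly in sample. A minimal sketch on a toy data set (the data frame toy and its coefficients are made up for illustration and are not the lab's data):

```r
set.seed(1)
n_toy <- 1000

# Toy data: one covariate x, randomized treatment t, mediator m, outcome y
toy <- data.frame(x = rnorm(n_toy), t = rbinom(n_toy, 1, 0.5))
toy$m <- 2 * toy$t + 0.5 * toy$x + rnorm(n_toy)
toy$y <- 1 * toy$t + 3 * toy$m + toy$x + rnorm(n_toy)

fit_total_toy   <- lm(y ~ t + x, data = toy)      # total effect
fit_med_toy     <- lm(m ~ t + x, data = toy)      # mediator model
fit_outcome_toy <- lm(y ~ t + m + x, data = toy)  # outcome model with mediator

c_total_toy  <- unname(coef(fit_total_toy)["t"])
a_toy        <- unname(coef(fit_med_toy)["t"])
b_toy        <- unname(coef(fit_outcome_toy)["m"])
c_direct_toy <- unname(coef(fit_outcome_toy)["t"])

# Identity: total = direct + indirect (holds up to floating-point error)
all.equal(c_total_toy, c_direct_toy + a_toy * b_toy)  # TRUE
```

The identity is purely algebraic. Whether a*b deserves a causal reading still depends on the no-confounding assumptions discussed in the rest of the lab.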

Concept Check

In the Baron-Kenny framework, controlling for the mediator (skills) in the outcome regression is essential for identifying the direct effect. What assumption makes this valid?

The treatment must be randomized.
There must be no unobserved confounders of the mediator-outcome relationship (sequential ignorability).
The mediator must be continuous.
The outcome model must be linear.

Step 3: Imai-Keele-Tingley Framework

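The modern approach defines mediation effects in the potential outcomes framework and estimates them by simulation, which yields confidence intervals for the ACME and ADE. The mediate() function uses a quasi-Bayesian Monte Carlo approximation by default; setting boot = TRUE switches to a nonparametric bootstrap.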
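Because the lab's data are simulated, the potential outcomes behind the ACME and ADE definitions can be written out directly. The sketch below rebuilds the Step 1 structural equations as functions and evaluates both potential mediators and all four potential outcomes for every person. The object names (po, skills_po, earn_po) are hypothetical, and the random draws are not the same as those in df.

```r
set.seed(42)
n_po <- 1500

# Covariates and noise terms, drawn as in the Step 1 design
po <- data.frame(
  age        = pmin(pmax(rnorm(n_po, 30, 8), 18), 60),
  educ       = pmin(pmax(rnorm(n_po, 12, 2.5), 8), 20),
  motivation = rnorm(n_po),
  e_skills   = rnorm(n_po, 0, 5),    # mediator noise
  e_earn     = rnorm(n_po, 0, 3000)  # outcome noise
)

# Potential mediator M(t): skills each person would have under treatment t
skills_po <- function(t) {
  50 + 5 * t + 2 * po$educ + 3 * po$motivation + po$e_skills
}

# Potential outcome Y(t, m): earnings under treatment t with skills set to m
earn_po <- function(t, m) {
  20000 + 1000 * t + 200 * m + 300 * po$educ + 150 * po$age +
    500 * po$motivation + po$e_earn
}

# ACME(t): change the mediator from M(0) to M(1), hold treatment at t
acme_t <- sapply(0:1, function(t) {
  mean(earn_po(t, skills_po(1)) - earn_po(t, skills_po(0)))
})

# ADE(t): change treatment from 0 to 1, hold the mediator at M(t)
ade_t <- sapply(0:1, function(t) {
  mean(earn_po(1, skills_po(t)) - earn_po(0, skills_po(t)))
})

# Total effect: change treatment and let the mediator respond
total_po <- mean(earn_po(1, skills_po(1)) - earn_po(0, skills_po(0)))

rbind(acme = acme_t, ade = ade_t)  # columns correspond to t = 0 and t = 1
total_po
```

Real data reveal only one of these potential outcomes per person, which is why estimation has to rely on fitted mediator and outcome models plus the sequential ignorability assumption.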

# Use the mediation package: fit the mediator and outcome models first
med_fit <- lm(skills ~ treat + age + educ + motivation, data = df)
out_fit <- lm(earnings ~ treat + skills + age + educ + motivation, data = df)

# boot = TRUE requests a nonparametric bootstrap with sims resamples
med_result <- mediate(med_fit, out_fit,
                     treat = "treat", mediator = "skills",
                     sims = 1000, boot = TRUE)
summary(med_result)

# Key outputs:
# ACME = Average Causal Mediation Effect (indirect)
# ADE = Average Direct Effect
# Total = ACME + ADE
# Prop. Mediated = ACME / Total

Requires: mediation

Expected output:

Effect	Estimate	95% CI	True Value
ACME (indirect)	~1,000	[800, 1,200]	1,000
ADE (direct)	~1,000	[600, 1,400]	1,000
Total effect	~2,000	[1,600, 2,400]	2,000
Prop. mediated	~50%	[35%, 65%]	50%

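The Baron-Kenny product a*b is only a point estimate. The Imai-Keele-Tingley procedure adds confidence intervals that account for parameter uncertainty in both the mediator and outcome models. With linear models and no treatment-mediator interaction, as here, the two approaches give essentially the same point estimates.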
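To see where such an interval comes from, the following sketch runs a percentile bootstrap of the product a*b by hand in base R. It is a hand-rolled illustration for the linear, no-interaction case on a simplified data set (the data frame bs and the helper indirect_ab are hypothetical names), not a reproduction of what mediate() does internally.

```r
set.seed(2024)
n_bs <- 1500

# Simplified version of the lab's design with a single covariate
bs <- data.frame(treat = rbinom(n_bs, 1, 0.5), educ = rnorm(n_bs, 12, 2.5))
bs$skills   <- 50 + 5 * bs$treat + 2 * bs$educ + rnorm(n_bs, 0, 5)
bs$earnings <- 20000 + 1000 * bs$treat + 200 * bs$skills +
  300 * bs$educ + rnorm(n_bs, 0, 3000)

# Product-of-coefficients estimate of the indirect effect on one data set
indirect_ab <- function(d) {
  a <- coef(lm(skills ~ treat + educ, data = d))["treat"]
  b <- coef(lm(earnings ~ treat + skills + educ, data = d))["skills"]
  unname(a * b)
}

ab_hat <- indirect_ab(bs)

# Resample rows with replacement and refit BOTH models on each resample,
# so the interval reflects uncertainty in a and in b
ab_boot <- replicate(500, indirect_ab(bs[sample(n_bs, replace = TRUE), ]))
ab_ci   <- quantile(ab_boot, c(0.025, 0.975))

ab_hat
ab_ci
```

Refitting both models inside each resample is the key design choice: holding one model fixed would understate the width of the interval.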

Step 4: Sensitivity Analysis

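Sequential ignorability has two parts. First, treatment must be ignorable given the pre-treatment covariates, which randomization guarantees here. Second, the mediator must be ignorable given treatment and those covariates, which rules out unobserved confounders of the mediator-outcome relationship. The second part is untestable even in a randomized experiment. Sensitivity analysis asks: how large would such a confounder need to be to overturn the results?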
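A small simulation makes the stakes concrete. In the sketch below, motivation affects both skills and earnings, as in the lab's DGP, and the decomposition is estimated once with motivation in the models and once as if it were unobserved. The data frame oc and the helper bk_split are hypothetical names, and the design is simplified to one confounder.

```r
set.seed(7)
n_oc <- 5000

# motivation raises both skills (mediator) and earnings (outcome)
oc <- data.frame(treat = rbinom(n_oc, 1, 0.5), motivation = rnorm(n_oc))
oc$skills   <- 50 + 5 * oc$treat + 3 * oc$motivation + rnorm(n_oc, 0, 5)
oc$earnings <- 20000 + 1000 * oc$treat + 200 * oc$skills +
  500 * oc$motivation + rnorm(n_oc, 0, 3000)

# Baron-Kenny style split from a fitted mediator model and outcome model
bk_split <- function(med, out) {
  a <- unname(coef(med)["treat"])
  b <- unname(coef(out)["skills"])
  c(ACME = a * b, ADE = unname(coef(out)["treat"]))
}

# Confounder observed and controlled: close to the true 1000 / 1000 split
with_conf <- bk_split(
  lm(skills ~ treat + motivation, data = oc),
  lm(earnings ~ treat + skills + motivation, data = oc)
)

# Confounder omitted: the skills coefficient absorbs part of motivation's effect
without_conf <- bk_split(
  lm(skills ~ treat, data = oc),
  lm(earnings ~ treat + skills, data = oc)
)

round(rbind(with_conf, without_conf))
```

Omitting the confounder inflates the ACME and shrinks the ADE even though treatment is randomized, because randomizing treatment does not randomize the mediator.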

# Sensitivity analysis in the mediation package
sens <- medsens(med_result, rho.by = 0.05, effect.type = "indirect")
summary(sens)

# Plot: ACME as a function of rho
plot(sens, main = "Sensitivity of ACME to Unobserved Confounding")

# The plot shows at what value of rho the ACME crosses zero
# Larger |rho_critical| means more robust results

Requires: mediation

Expected output:

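ACME crosses zero at rho = ~0.33
Here rho is the correlation between the error terms of the mediator and outcome models. If you believe the true rho is plausibly below this value, the conclusion that the ACME is positive is robust.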
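For linear models without an interaction, the critical rho can be worked out by hand. If the mediator-model and outcome-model errors have correlation rho, the ACME is zero exactly when rho equals the correlation between the mediator-model residuals and the residuals of the outcome model fitted without the mediator. The sketch below is an illustration of that linear-model result on a simplified data set (sens_df, rho_tilde, and acme_at_rho are hypothetical names), not a reproduction of medsens().

```r
set.seed(42)
n_s <- 1500

# Simplified design: same effect sizes and noise levels as the lab, no covariates
sens_df <- data.frame(treat = rbinom(n_s, 1, 0.5))
sens_df$skills   <- 50 + 5 * sens_df$treat + rnorm(n_s, 0, 5)
sens_df$earnings <- 20000 + 1000 * sens_df$treat + 200 * sens_df$skills +
  rnorm(n_s, 0, 3000)

fit_m  <- lm(skills ~ treat, data = sens_df)             # mediator model
fit_y1 <- lm(earnings ~ treat, data = sens_df)           # outcome WITHOUT mediator
fit_y3 <- lm(earnings ~ treat + skills, data = sens_df)  # outcome WITH mediator

e1 <- resid(fit_y1)
e2 <- resid(fit_m)
rho_tilde <- cor(e1, e2)  # value of rho at which the ACME is zero
a_hat <- unname(coef(fit_m)["treat"])

# ACME implied by an assumed error correlation rho (linear, no interaction)
acme_at_rho <- function(rho) {
  a_hat * sd(e1) / sd(e2) *
    (rho_tilde - rho * sqrt((1 - rho_tilde^2) / (1 - rho^2)))
}

rho_tilde
acme_at_rho(0)          # equals the usual a*b estimate
acme_at_rho(rho_tilde)  # zero
```

With the lab's noise levels the population value of this correlation is about 0.32, which is consistent with the crossing point reported above.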

Step 5: Treatment-Mediator Interaction

Allow the effect of skills on earnings to differ by treatment status.

# Interaction model
out_int <- lm(earnings ~ treat * skills + age + educ + motivation, data = df)
summary(out_int)

# Re-run mediation with interaction
med_int <- mediate(med_fit, out_int,
                  treat = "treat", mediator = "skills",
                  sims = 1000)
summary(med_int)
# Now reports ACME(0), ACME(1), ADE(0), ADE(1) separately

Requires: mediation

Expected output:

Variable	Coeff	SE	t	p
Intercept	~20,000	~1,200	~16.7	<0.001
treat	~1,000	~2,800	~0.4	~0.72
skills	~200	~15	~13.3	<0.001
treat:skills	~0.0	~35	~0.0	~0.99
age	~150	~10	~15.0	<0.001
educ	~300	~35	~8.6	<0.001
motivation	~500	~80	~6.3	<0.001

ACME for control group: ~1000
ACME for treated group:  ~1000
Interaction coefficient: ~0.0

In this DGP the treatment-mediator interaction is approximately zero because the effect of skills on earnings does not depend on treatment status. In practice, a significant interaction would mean ACME(0) differs from ACME(1).
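To see what a nonzero interaction does, the sketch below simulates a variant of the design in which each skill point is worth 100 more for the treated, then computes the condition-specific effects from the fitted coefficients. The DGP is deliberately exaggerated and hypothetical, as are the object names; the formulas are the linear-model expressions ACME(t) = a(b + d·t) and ADE(t) = c' + d·E[M(t)], where d is the interaction coefficient.

```r
set.seed(11)
n_int <- 5000

# Hypothetical DGP: skills pay 200 per point for controls, 300 for the treated
ix <- data.frame(treat = rbinom(n_int, 1, 0.5))
ix$skills   <- 50 + 5 * ix$treat + rnorm(n_int, 0, 5)
ix$earnings <- 20000 + 1000 * ix$treat + 200 * ix$skills +
  100 * ix$treat * ix$skills + rnorm(n_int, 0, 3000)

fit_med_ix <- lm(skills ~ treat, data = ix)
fit_out_ix <- lm(earnings ~ treat * skills, data = ix)

a_ix <- unname(coef(fit_med_ix)["treat"])         # treat -> skills
b_ix <- unname(coef(fit_out_ix)["skills"])        # skills slope under control
d_ix <- unname(coef(fit_out_ix)["treat:skills"])  # extra slope under treatment
c_ix <- unname(coef(fit_out_ix)["treat"])         # treat coefficient at skills = 0

# Fitted mean of the mediator under control and under treatment
m_bar <- unname(c(coef(fit_med_ix)["(Intercept)"], sum(coef(fit_med_ix))))

acme_ix <- a_ix * (b_ix + d_ix * c(0, 1))  # ACME(0), ACME(1)
ade_ix  <- c_ix + d_ix * m_bar             # ADE(0),  ADE(1)

round(rbind(acme = acme_ix, ade = ade_ix))  # columns correspond to t = 0 and t = 1
```

In this design the true values are ACME(0) = 1000 and ACME(1) = 1500. The total effect can be split either as ACME(1) + ADE(0) or as ACME(0) + ADE(1), and both sums agree.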

Concept Check

If there is a significant treatment-mediator interaction, what does this imply for the decomposition into ACME and ADE?

The decomposition is invalid and should not be reported.
ACME and ADE are the same for treated and control groups.
There are now separate ACME and ADE for the treated (ACME(1), ADE(1)) and control (ACME(0), ADE(0)) groups, and they may differ.
You need to use nonlinear models instead.

Exercises

Add a second mediator. Introduce a second mediator (e.g., confidence) that is also affected by treatment and affects earnings. Decompose the total effect through both pathways.
Binary mediator. Replace the continuous skills variable with a binary indicator (passed/failed the skills exam). Re-run the mediation analysis using probit for the mediator model.
Vary the mediation share. Change the DGP so that 90% of the total effect is mediated. How does this affect the sensitivity analysis?
Causal forests for heterogeneous mediation. Use machine learning (e.g., causal forests) to estimate how the ACME varies across subgroups defined by age and education.

Expected output

If your code runs correctly, expect to see:

Total effect of training on earnings: Around $1,800–$2,200 (true value: $2,000)
Average causal mediation effect (ACME): Around $800–$1,200 (true value: $1,000), representing the indirect path through skills
Average direct effect (ADE): Around $800–$1,200 (true value: $1,000), representing the direct effect of training
Mediation share (ACME / Total): Around 40–60% (true value: 50%)
Baron-Kenny Step 2 (treatment on mediator): Training increases skills by approximately 5 points (true value: 5)
Baron-Kenny Step 3 (skills coefficient): Around 200 per skill point (true value: 200)
Sensitivity analysis: At rho = 0 (no unobserved confounders), results match the main analysis; the ACME crosses zero at rho around 0.3–0.5
Sample size: 1,500 participants (randomized treatment)

Summary

In this lab you learned:

Mediation analysis decomposes a total effect into direct (ADE) and indirect/mediated (ACME) components
The Baron-Kenny approach is simple but relies on strong linearity and no-confounding assumptions
The Imai-Keele-Tingley framework provides valid inference under the sequential ignorability assumption using simulation
Sequential ignorability (no unobserved mediator-outcome confounders) is untestable and often questionable in practice
Sensitivity analysis quantifies how robust mediation findings are to violations of this assumption
Treatment-mediator interactions mean the mediation effect differs between the treated and control conditions, so ACME(1) and ACME(0) (and ADE(1) and ADE(0)) are reported separately
