Lab: Causal Mediation Analysis
Decompose a total treatment effect into direct and indirect (mediated) pathways using the Baron-Kenny approach and the modern Imai-Keele-Tingley framework. Learn to perform sensitivity analysis for the sequential ignorability assumption.
Overview
In this lab you will analyze a simulated job training program where the treatment (training) affects earnings both directly (e.g., signaling to employers) and indirectly through improved skills (the mediator). You will decompose the total effect into the average causal mediation effect (ACME) and the average direct effect (ADE).
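In potential-outcomes notation, the two quantities can be written as follows. The symbols $\delta$, $\zeta$, and $\tau$ are notation introduced here for illustration and are not used by the code in this lab; $T_i$ is the treatment, $M_i(t)$ the potential value of the mediator under treatment status $t$, and $Y_i(t, m)$ the potential outcome.

```latex
\begin{aligned}
\text{ACME:}\quad \delta(t) &= \mathbb{E}\big[\,Y_i(t, M_i(1)) - Y_i(t, M_i(0))\,\big], \\
\text{ADE:}\quad  \zeta(t)  &= \mathbb{E}\big[\,Y_i(1, M_i(t)) - Y_i(0, M_i(t))\,\big], \\
\text{Total:}\quad \tau     &= \mathbb{E}\big[\,Y_i(1, M_i(1)) - Y_i(0, M_i(0))\,\big]
                             = \delta(t) + \zeta(1 - t), \qquad t \in \{0, 1\}.
\end{aligned}
```

When there is no treatment-mediator interaction, $\delta(0) = \delta(1)$ and $\zeta(0) = \zeta(1)$, so the total effect is simply ACME + ADE.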
What you will learn:
- How to implement the classic Baron-Kenny mediation approach
- How to use the modern Imai-Keele-Tingley (2010) framework for causal mediation
- The sequential ignorability assumption and why it is strong
- How to perform sensitivity analysis for violations of sequential ignorability
- How to handle treatment-mediator interactions
Prerequisites: Familiarity with OLS regression and the concept of causal pathways. Understanding of potential outcomes is helpful.
Step 1: Simulate Training Program Data
The treatment (training) affects a mediator (skills) which in turn affects the outcome (earnings). There is also a direct effect of training on earnings.
library(mediation)
set.seed(42)
n <- 1500
age <- pmin(pmax(rnorm(n, 30, 8), 18), 60)
educ <- pmin(pmax(rnorm(n, 12, 2.5), 8), 20)
motivation <- rnorm(n)
treat <- rbinom(n, 1, 0.5)
# Mediator: skills
skills <- 50 + 5 * treat + 2 * educ + 3 * motivation + rnorm(n, 0, 5)
# Outcome: earnings
earnings <- 20000 + 1000 * treat + 200 * skills +
  300 * educ + 150 * age + 500 * motivation + rnorm(n, 0, 3000)
df <- data.frame(earnings, treat, skills, age, educ, motivation)
cat("True total:", 2000, "\nTrue ACME:", 1000, "\nTrue ADE:", 1000, "\n")
Expected output:
| Variable | Mean | Std Dev | Min | Max |
|---|---|---|---|---|
| earnings | ~43,900 | ~4,000 | ~31,000 | ~57,000 |
| treat | 0.50 | 0.50 | 0 | 1 |
| skills | ~76.6 | ~8.0 | 52 | 102 |
| age | 30.2 | 7.5 | 18.0 | 60.0 |
| educ | 12.0 | 2.4 | 8.0 | 20.0 |
| motivation | 0.0 | 1.0 | -3.2 | 3.1 |
True total effect: 2000
True ACME (indirect): 1000
True ADE (direct): 1000
Mediation share: 50%
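The Step 1 code does not show how the summary table is produced or where the true effects come from. One way to get both, using base R only, is sketched below. The names `desc`, `a_true`, `b_true`, and `direct_true` are illustrative and not part of the original lab; the data-generating lines are copied from Step 1.

```r
# Regenerate the Step 1 data (same seed and draw order as above).
set.seed(42)
n <- 1500
age <- pmin(pmax(rnorm(n, 30, 8), 18), 60)
educ <- pmin(pmax(rnorm(n, 12, 2.5), 8), 20)
motivation <- rnorm(n)
treat <- rbinom(n, 1, 0.5)
skills <- 50 + 5 * treat + 2 * educ + 3 * motivation + rnorm(n, 0, 5)
earnings <- 20000 + 1000 * treat + 200 * skills +
  300 * educ + 150 * age + 500 * motivation + rnorm(n, 0, 3000)
df <- data.frame(earnings, treat, skills, age, educ, motivation)

# One row per variable: mean, standard deviation, minimum, maximum.
desc <- t(sapply(df, function(x) {
  c(Mean = mean(x), SD = sd(x), Min = min(x), Max = max(x))
}))
print(round(desc, 1))

# True effects implied by the DGP coefficients (illustrative names).
a_true <- 5          # effect of treat on skills
b_true <- 200        # effect of skills on earnings
direct_true <- 1000  # effect of treat on earnings, holding skills fixed
acme_true <- a_true * b_true
total_true <- direct_true + acme_true
share_true <- acme_true / total_true
cat("ACME:", acme_true, " ADE:", direct_true,
    " Total:", total_true, " Share:", share_true, "\n")
```

Because the mediator and outcome equations are linear with no interaction, the true indirect effect is just the product of the two path coefficients.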
Step 2: The Baron-Kenny Approach
The classic approach involves three regressions: (1) total effect, (2) mediator model, (3) outcome model conditioning on the mediator.
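Written out, the three regressions look like this. The coefficient names $a$, $b$, and $c'$ match the code in this step; $c$, the intercepts $\alpha_j$, the covariate coefficients $\xi_j$, and the covariate vector $X_i$ (age, educ, motivation) are notation assumed here for illustration.

```latex
\begin{aligned}
\text{(1) Total effect:}\quad   Y_i &= \alpha_1 + c\,T_i + \xi_1^\top X_i + \varepsilon_{i1}, \\
\text{(2) Mediator model:}\quad M_i &= \alpha_2 + a\,T_i + \xi_2^\top X_i + \varepsilon_{i2}, \\
\text{(3) Outcome model:}\quad  Y_i &= \alpha_3 + c'\,T_i + b\,M_i + \xi_3^\top X_i + \varepsilon_{i3}.
\end{aligned}
\qquad\Longrightarrow\qquad
\text{indirect} = a\,b, \quad \text{direct} = c', \quad c = c' + a\,b .
```

In this lab's DGP, $a = 5$, $b = 200$, and $c' = 1000$, so $c = 1000 + 5 \times 200 = 2000$.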
# Step 1: Total effect
total <- lm(earnings ~ treat + age + educ + motivation, data = df)
cat("Total effect:", coef(total)["treat"], "\n")
# Step 2: Mediator model
med_model <- lm(skills ~ treat + age + educ + motivation, data = df)
cat("Treatment -> Skills:", coef(med_model)["treat"], "\n")
# Step 3: Outcome model with mediator
outcome <- lm(earnings ~ treat + skills + age + educ + motivation, data = df)
cat("Direct effect:", coef(outcome)["treat"], "\n")
cat("Skills coef:", coef(outcome)["skills"], "\n")
# Baron-Kenny
a <- coef(med_model)["treat"]
b <- coef(outcome)["skills"]
cat("\nIndirect (a*b):", a * b, "\n")
cat("Direct (c'):", coef(outcome)["treat"], "\n")
Expected output:
| Baron-Kenny Step | Regression | Key Coefficient | Estimate | True Value |
|---|---|---|---|---|
| Step 1: Total effect | earnings ~ treat + controls | treat | ~2,000 | 2,000 |
| Step 2: Mediator model | skills ~ treat + controls | treat | ~5.0 | 5.0 |
| Step 3: Outcome model | earnings ~ treat + skills + controls | treat (direct) | ~1,000 | 1,000 |
| Step 3: Outcome model | earnings ~ treat + skills + controls | skills | ~200 | 200 |
=== Baron-Kenny Decomposition ===
Indirect (a*b): ~1000 (5.0 * 200)
Direct (c'): ~1000
Total: ~2000
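Two things are worth checking by hand at this point: that the total effect equals direct plus indirect, and how uncertain the product a*b is. The sketch below uses base R only. The first-order (Sobel) standard error is a technique brought in here for illustration; the original lab does not use it, and the names `gap` and `se_ab` are hypothetical.

```r
# Regenerate the Step 1 data (same seed and draw order).
set.seed(42)
n <- 1500
age <- pmin(pmax(rnorm(n, 30, 8), 18), 60)
educ <- pmin(pmax(rnorm(n, 12, 2.5), 8), 20)
motivation <- rnorm(n)
treat <- rbinom(n, 1, 0.5)
skills <- 50 + 5 * treat + 2 * educ + 3 * motivation + rnorm(n, 0, 5)
earnings <- 20000 + 1000 * treat + 200 * skills +
  300 * educ + 150 * age + 500 * motivation + rnorm(n, 0, 3000)
df <- data.frame(earnings, treat, skills, age, educ, motivation)

total   <- lm(earnings ~ treat + age + educ + motivation, data = df)
med     <- lm(skills ~ treat + age + educ + motivation, data = df)
outcome <- lm(earnings ~ treat + skills + age + educ + motivation, data = df)

a       <- unname(coef(med)["treat"])
b       <- unname(coef(outcome)["skills"])
c_total <- unname(coef(total)["treat"])
c_prime <- unname(coef(outcome)["treat"])

# With OLS on the same sample and covariates, c = c' + a*b holds exactly
# (up to floating-point error), so the gap should be numerically zero.
gap <- c_total - (c_prime + a * b)
cat("c - (c' + a*b):", signif(gap, 3), "\n")

# First-order (Sobel) standard error and normal-approximation 95% CI for a*b.
se_a  <- sqrt(vcov(med)["treat", "treat"])
se_b  <- sqrt(vcov(outcome)["skills", "skills"])
se_ab <- sqrt(a^2 * se_b^2 + b^2 * se_a^2)
ci    <- a * b + c(-1, 1) * qnorm(0.975) * se_ab
cat("a*b:", round(a * b), " SE:", round(se_ab),
    " 95% CI: [", round(ci[1]), ",", round(ci[2]), "]\n")
```

The normal approximation can be poor when a or b is close to zero, which is one reason to prefer the resampling-based intervals used in the next step.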
In the Baron-Kenny framework, controlling for the mediator (skills) in the outcome regression is essential for identifying the direct effect. What assumption makes this valid?
Step 3: Imai-Keele-Tingley Framework
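The modern approach defines mediation effects within the potential outcomes framework and estimates them by simulating from the fitted mediator and outcome models. The code below sets boot = TRUE, so the confidence intervals come from a nonparametric bootstrap rather than the package's default quasi-Bayesian approximation. Either way, the intervals are valid only if sequential ignorability holds.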
# Use the mediation package
med_fit <- lm(skills ~ treat + age + educ + motivation, data = df)
out_fit <- lm(earnings ~ treat + skills + age + educ + motivation, data = df)
med_result <- mediate(med_fit, out_fit,
treat = "treat", mediator = "skills",
sims = 1000, boot = TRUE)
summary(med_result)
# Key outputs:
# ACME = Average Causal Mediation Effect (indirect)
# ADE = Average Direct Effect
# Total = ACME + ADE
# Prop. Mediated = ACME / Total
Expected output:
| Effect | Estimate | 95% CI | True Value |
|---|---|---|---|
| ACME (indirect) | ~1,000 | [800, 1,200] | 1,000 |
| ADE (direct) | ~1,000 | [600, 1,400] | 1,000 |
| Total effect | ~2,000 | [1,600, 2,400] | 2,000 |
| Prop. mediated | ~50% | [35%, 65%] | 50% |
ACME (indirect): ~1000 95% CI: [800, 1200]
ADE (direct): ~1000 95% CI: [600, 1400]
Total: ~2000
Prop. mediated: ~50%
The Baron-Kenny steps in Step 2 report point estimates only. The Imai-Keele-Tingley framework adds confidence intervals that account for parameter uncertainty in both the mediator and outcome models, which is why each effect now comes with an interval rather than a single number.
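To see where an interval for the indirect effect comes from, you can bootstrap the product a*b by hand. This is a minimal base-R sketch of a percentile bootstrap, not the implementation inside mediate(); the helper `ab_hat` and the choice of 500 resamples are assumptions made for illustration.

```r
# Regenerate the Step 1 data (same seed and draw order).
set.seed(42)
n <- 1500
age <- pmin(pmax(rnorm(n, 30, 8), 18), 60)
educ <- pmin(pmax(rnorm(n, 12, 2.5), 8), 20)
motivation <- rnorm(n)
treat <- rbinom(n, 1, 0.5)
skills <- 50 + 5 * treat + 2 * educ + 3 * motivation + rnorm(n, 0, 5)
earnings <- 20000 + 1000 * treat + 200 * skills +
  300 * educ + 150 * age + 500 * motivation + rnorm(n, 0, 3000)
df <- data.frame(earnings, treat, skills, age, educ, motivation)

# Hypothetical helper: refit both models on a data set and return a*b.
ab_hat <- function(d) {
  a <- coef(lm(skills ~ treat + age + educ + motivation, data = d))["treat"]
  b <- coef(lm(earnings ~ treat + skills + age + educ + motivation,
               data = d))["skills"]
  unname(a * b)
}

# Resample rows with replacement and re-estimate the indirect effect each time.
B <- 500
boot_ab <- replicate(B, ab_hat(df[sample.int(n, replace = TRUE), ]))

# Percentile 95% interval.
ci <- quantile(boot_ab, c(0.025, 0.975))
cat("a*b:", round(ab_hat(df)),
    " bootstrap 95% CI: [", round(ci[1]), ",", round(ci[2]), "]\n")
```

Because both models are refit in every resample, the interval reflects uncertainty in a and in b at the same time.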
Step 4: Sensitivity Analysis
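The key assumption, sequential ignorability, has two parts: (1) treatment is ignorable given pre-treatment covariates, which randomization guarantees here, and (2) the mediator is ignorable given treatment and those covariates, meaning there are no unobserved confounders of the mediator-outcome relationship. Part (2) is untestable even in a randomized experiment. Sensitivity analysis asks: how strong would such confounding need to be to overturn the results?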
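Stated formally, with $X_i$ denoting the pre-treatment covariates (notation assumed here, not used in the lab's code), sequential ignorability requires:

```latex
\begin{aligned}
\{\,Y_i(t', m),\; M_i(t)\,\} \;&\perp\!\!\!\perp\; T_i \;\mid\; X_i = x, \\
Y_i(t', m) \;&\perp\!\!\!\perp\; M_i(t) \;\mid\; T_i = t,\; X_i = x,
\end{aligned}
\qquad \text{for all } t, t' \in \{0, 1\} \text{ and all } m, x,
```

together with $0 < \Pr(T_i = t \mid X_i = x)$ and $0 < p(M_i = m \mid T_i = t, X_i = x)$. The second line is the part that randomizing the treatment does not deliver.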
# Sensitivity analysis in the mediation package
sens <- medsens(med_result, rho.by = 0.05, effect.type = "indirect")
summary(sens)
# Plot: ACME as a function of rho
plot(sens, main = "Sensitivity of ACME to Unobserved Confounding")
# The plot shows at what value of rho the ACME crosses zero
# Larger |rho_critical| means more robust results
Expected output:
ACME crosses zero at rho = ~0.33
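Here rho is the correlation between the error terms of the mediator and outcome models, which is zero under sequential ignorability. If you believe rho is plausibly below this value, the sign of the mediation finding is robust.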
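For linear models without an interaction, the critical rho can be approximated by hand. The sketch below assumes the closed-form expression ACME(rho) = a * (s_y / s_m) * (rho_tilde - rho * sqrt((1 - rho_tilde^2) / (1 - rho^2))), where rho_tilde is the correlation between the mediator-model residuals and the residuals of the outcome model fitted without the mediator. Treat this formula as an assumption to verify against the framework's papers; it is not the medsens() implementation, and `acme_at`, `rho_tilde`, `s_y`, and `s_m` are illustrative names.

```r
# Regenerate the Step 1 data (same seed and draw order).
set.seed(42)
n <- 1500
age <- pmin(pmax(rnorm(n, 30, 8), 18), 60)
educ <- pmin(pmax(rnorm(n, 12, 2.5), 8), 20)
motivation <- rnorm(n)
treat <- rbinom(n, 1, 0.5)
skills <- 50 + 5 * treat + 2 * educ + 3 * motivation + rnorm(n, 0, 5)
earnings <- 20000 + 1000 * treat + 200 * skills +
  300 * educ + 150 * age + 500 * motivation + rnorm(n, 0, 3000)
df <- data.frame(earnings, treat, skills, age, educ, motivation)

med     <- lm(skills ~ treat + age + educ + motivation, data = df)
reduced <- lm(earnings ~ treat + age + educ + motivation, data = df)
outcome <- lm(earnings ~ treat + skills + age + educ + motivation, data = df)

a <- unname(coef(med)["treat"])
b <- unname(coef(outcome)["skills"])
res_m <- resid(med)      # mediator-model residuals
res_y <- resid(reduced)  # outcome residuals, mediator left out

rho_tilde <- cor(res_y, res_m)
s_y <- sd(res_y)
s_m <- sd(res_m)

# Assumed closed form for the ACME as a function of rho (see lead-in).
acme_at <- function(rho) {
  a * s_y / s_m * (rho_tilde - rho * sqrt((1 - rho_tilde^2) / (1 - rho^2)))
}

rho_grid <- seq(-0.9, 0.9, by = 0.05)
sens_tab <- data.frame(rho = rho_grid, acme = acme_at(rho_grid))
print(head(sens_tab))
cat("ACME at rho = 0:", round(acme_at(0)), " (a*b =", round(a * b), ")\n")
cat("ACME crosses zero at rho =", round(rho_tilde, 2), "\n")
```

At rho = 0 the expression reduces to a*b, and it equals zero exactly when rho = rho_tilde, so rho_tilde plays the role of the critical value read off the sensitivity plot.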
Step 5: Treatment-Mediator Interaction
Allow the effect of skills on earnings to differ by treatment status.
# Interaction model
out_int <- lm(earnings ~ treat * skills + age + educ + motivation, data = df)
summary(out_int)
# Re-run mediation with interaction
med_int <- mediate(med_fit, out_int,
treat = "treat", mediator = "skills",
sims = 1000)
summary(med_int)
# Now reports ACME(0), ACME(1), ADE(0), ADE(1) separately
Expected output:
| Variable | Coeff | SE | t | p |
|---|---|---|---|---|
| Intercept | ~20,000 | ~1,200 | ~16.7 | <0.001 |
| treat | ~1,000 | ~2,800 | ~0.4 | ~0.72 |
| skills | ~200 | ~15 | ~13.3 | <0.001 |
| treat:skills | ~0.0 | ~35 | ~0.0 | ~0.99 |
| age | ~150 | ~10 | ~15.0 | <0.001 |
| educ | ~300 | ~35 | ~8.6 | <0.001 |
| motivation | ~500 | ~80 | ~6.3 | <0.001 |
ACME for control group: ~1000
ACME for treated group: ~1000
Interaction coefficient: ~0.0
In this DGP the treatment-mediator interaction is essentially zero because the effect of skills on earnings does not depend on treatment status. In practice, a significant interaction would mean ACME(0) differs from ACME(1).
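With a linear outcome model that includes a treat:skills term, the group-specific effects have simple closed forms: ACME(t) = a * (b + d * t) and ADE(t) = c' + d * E[M(t)], where d is the interaction coefficient. The base-R sketch below computes them directly from the fitted coefficients; it illustrates the algebra rather than reproducing how mediate() computes its estimates, and `d`, `m0`, `m1`, `acme`, and `ade` are illustrative names.

```r
# Regenerate the Step 1 data (same seed and draw order).
set.seed(42)
n <- 1500
age <- pmin(pmax(rnorm(n, 30, 8), 18), 60)
educ <- pmin(pmax(rnorm(n, 12, 2.5), 8), 20)
motivation <- rnorm(n)
treat <- rbinom(n, 1, 0.5)
skills <- 50 + 5 * treat + 2 * educ + 3 * motivation + rnorm(n, 0, 5)
earnings <- 20000 + 1000 * treat + 200 * skills +
  300 * educ + 150 * age + 500 * motivation + rnorm(n, 0, 3000)
df <- data.frame(earnings, treat, skills, age, educ, motivation)

med     <- lm(skills ~ treat + age + educ + motivation, data = df)
out_int <- lm(earnings ~ treat * skills + age + educ + motivation, data = df)

a       <- unname(coef(med)["treat"])
b       <- unname(coef(out_int)["skills"])
d       <- unname(coef(out_int)["treat:skills"])
c_prime <- unname(coef(out_int)["treat"])

# Average predicted mediator with everyone set to control / treated.
m0 <- mean(predict(med, newdata = transform(df, treat = 0)))
m1 <- mean(predict(med, newdata = transform(df, treat = 1)))

# ACME(t): effect of moving skills from M(0) to M(1), holding treatment at t.
acme <- c(control = a * b, treated = a * (b + d))
# ADE(t): effect of treatment, holding skills at the level M(t).
ade  <- c(control = c_prime + d * m0, treated = c_prime + d * m1)

print(round(rbind(ACME = acme, ADE = ade)))
cat("Total via ACME(1) + ADE(0):", round(acme[["treated"]] + ade[["control"]]), "\n")
cat("Total via ACME(0) + ADE(1):", round(acme[["control"]] + ade[["treated"]]), "\n")
```

Both ways of adding up give the same total, but when d is far from zero the split between direct and indirect depends on which pair you report.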
If there is a significant treatment-mediator interaction, what does this imply for the decomposition into ACME and ADE?
Exercises
- Add a second mediator. Introduce a second mediator (e.g., confidence) that is also affected by treatment and affects earnings. Decompose the total effect through both pathways.
- Binary mediator. Replace the continuous skills variable with a binary indicator (passed/failed the skills exam). Re-run the mediation analysis using probit for the mediator model.
- Vary the mediation share. Change the DGP so that 90% of the total effect is mediated. How does this affect the sensitivity analysis?
- Causal forests for heterogeneous mediation. Use machine learning (e.g., causal forests) to estimate how the ACME varies across subgroups defined by age and education.
Summary
In this lab you learned:
- Mediation analysis decomposes a total effect into direct (ADE) and indirect/mediated (ACME) components
- The Baron-Kenny approach is simple but relies on strong linearity and no-confounding assumptions
- The Imai-Keele-Tingley framework provides valid inference under the sequential ignorability assumption using simulation
- Sequential ignorability (no unobserved mediator-outcome confounders) is untestable and often questionable in practice
- Sensitivity analysis quantifies how robust mediation findings are to violations of this assumption
- Treatment-mediator interactions mean the mediation effect differs for treated and control groups