MethodAtlas
Tutorial, 90 minutes

Lab: Causal Mediation Analysis

Decompose a total treatment effect into direct and indirect (mediated) pathways using the Baron-Kenny approach and the modern Imai-Keele-Tingley framework. Learn to perform sensitivity analysis for the sequential ignorability assumption.

Overview

In this lab you will analyze a simulated job training program where the treatment (training) affects earnings both directly (e.g., signaling to employers) and indirectly through improved skills (the mediator). You will decompose the total effect into the average causal mediation effect (ACME) and the average direct effect (ADE).

What you will learn:

  • How to implement the classic Baron-Kenny mediation approach
  • How to use the modern Imai-Keele-Tingley (2010) framework for causal mediation
  • The sequential ignorability assumption and why it is strong
  • How to perform sensitivity analysis for violations of sequential ignorability
  • How to handle treatment-mediator interactions

Prerequisites: Familiarity with OLS regression and the concept of causal pathways. Understanding of potential outcomes is helpful.


Step 1: Simulate Training Program Data

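The treatment (training) is randomly assigned and affects a mediator (skills), which in turn affects the outcome (earnings). Training also has a direct effect on earnings. Motivation raises both skills and earnings, but it is observed and included as a control, so in this simulation sequential ignorability holds by construction.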

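library(mediation)

set.seed(42)
n <- 1500

# Pre-treatment covariates (age and education truncated to plausible ranges)
age <- pmin(pmax(rnorm(n, 30, 8), 18), 60)
educ <- pmin(pmax(rnorm(n, 12, 2.5), 8), 20)
motivation <- rnorm(n)

# Randomized treatment: job training
treat <- rbinom(n, 1, 0.5)

# Mediator: skills (training raises skills by 5 points)
skills <- 50 + 5 * treat + 2 * educ + 3 * motivation + rnorm(n, 0, 5)

# Outcome: earnings (direct effect of training = 1000; each skill point = 200)
earnings <- 20000 + 1000 * treat + 200 * skills +
            300 * educ + 150 * age + 500 * motivation + rnorm(n, 0, 3000)

df <- data.frame(earnings, treat, skills, age, educ, motivation)

# Summary statistics
print(round(t(sapply(df, function(x) c(Mean = mean(x), SD = sd(x),
                                        Min = min(x), Max = max(x)))), 2))

# True effects implied by the DGP
true_acme <- 5 * 200   # (treat -> skills) x (skills -> earnings)
true_ade <- 1000       # coefficient on treat in the earnings equation
cat("True total effect:", true_acme + true_ade, "\n")
cat("True ACME (indirect):", true_acme, "\n")
cat("True ADE (direct):", true_ade, "\n")
cat("Mediation share:", paste0(100 * true_acme / (true_acme + true_ade), "%"), "\n")
Requires: mediation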

Expected output:

Variable     Mean      SD       Min       Max
earnings     ~44,000   ~4,100   ~31,000   ~57,000
treat        ~0.50     ~0.50    0         1
skills       ~76.6     ~8.0     ~52       ~102
age          ~30.2     ~7.5     18.0      ~55
educ         ~12.0     ~2.4     8.0       ~20.0
motivation   ~0.0      ~1.0     ~-3.2     ~3.1
True total effect: 2000
True ACME (indirect): 1000
True ADE (direct): 1000
Mediation share: 50%
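Because the data are simulated, the potential-outcome definitions of the ACME and ADE can be checked directly instead of estimated. The sketch below re-creates the lab's random draws (same seed and draw order) but keeps each person's noise terms, so it can evaluate counterfactuals such as Y(1, M(0)), earnings with training but with the skills the person would have had without it. The helpers skills_po() and earnings_po() are hypothetical names for the lab's two structural equations, not part of the mediation package.

```r
# Potential-outcome definitions of ACME and ADE, evaluated on the known DGP.
# Re-creates the lab's draws (same seed, same order) but keeps the noise terms.
set.seed(42)
n <- 1500
age <- pmin(pmax(rnorm(n, 30, 8), 18), 60)
educ <- pmin(pmax(rnorm(n, 12, 2.5), 8), 20)
motivation <- rnorm(n)
treat <- rbinom(n, 1, 0.5)   # not used below; drawn so later draws match the lab
e_skills <- rnorm(n, 0, 5)   # mediator noise, held fixed across counterfactuals
e_earn <- rnorm(n, 0, 3000)  # outcome noise, held fixed across counterfactuals

# Hypothetical helpers: M(t) and Y(t, m) from the lab's structural equations
skills_po <- function(t) 50 + 5 * t + 2 * educ + 3 * motivation + e_skills
earnings_po <- function(t, m) {
  20000 + 1000 * t + 200 * m + 300 * educ + 150 * age + 500 * motivation + e_earn
}

M0 <- skills_po(0)  # skills without training
M1 <- skills_po(1)  # skills with training

acme  <- mean(earnings_po(1, M1) - earnings_po(1, M0))  # ACME(1): shift only the mediator
ade   <- mean(earnings_po(1, M0) - earnings_po(0, M0))  # ADE(0): shift only the treatment
total <- mean(earnings_po(1, M1) - earnings_po(0, M0))  # total effect

cat("ACME:", acme, "\nADE:", ade, "\nTotal:", total, "\n")
```

Since the noise cancels within each person, the three averages equal the true values 1000, 1000 and 2000 up to floating-point error. With real data these counterfactuals are never observed, which is why the rest of the lab estimates them from fitted models.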

Step 2: The Baron-Kenny Approach

The classic approach involves three regressions: (1) total effect, (2) mediator model, (3) outcome model conditioning on the mediator.

# Baron-Kenny step 1: total effect of training on earnings
total <- lm(earnings ~ treat + age + educ + motivation, data = df)
cat("Total effect:", coef(total)["treat"], "\n")

# Baron-Kenny step 2: effect of training on the mediator (path a)
med_model <- lm(skills ~ treat + age + educ + motivation, data = df)
cat("Treatment -> Skills:", coef(med_model)["treat"], "\n")

# Baron-Kenny step 3: outcome model conditioning on the mediator (paths c' and b)
outcome <- lm(earnings ~ treat + skills + age + educ + motivation, data = df)
cat("Direct effect:", coef(outcome)["treat"], "\n")
cat("Skills coef:", coef(outcome)["skills"], "\n")

# Product-of-coefficients decomposition
a <- unname(coef(med_model)["treat"])
b <- unname(coef(outcome)["skills"])
c_prime <- unname(coef(outcome)["treat"])
cat("\n=== Baron-Kenny Decomposition ===\n")
cat("Indirect (a*b):", a * b, "\n")
cat("Direct (c'):", c_prime, "\n")
cat("Total:", a * b + c_prime, "\n")

Expected output:

Baron-Kenny step         Regression                              Key coefficient   Estimate   True value
Step 1: Total effect     earnings ~ treat + controls             treat             ~2,000     2,000
Step 2: Mediator model   skills ~ treat + controls               treat             ~5.0       5.0
Step 3: Outcome model    earnings ~ treat + skills + controls    treat (direct)    ~1,000     1,000
Step 3: Outcome model    earnings ~ treat + skills + controls    skills            ~200       200
=== Baron-Kenny Decomposition ===
Indirect (a*b): ~1000  (5.0 * 200)
Direct (c'):    ~1000
Total:          ~2000
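The Baron-Kenny pieces add up for a mechanical reason: when the three OLS regressions share the same sample and covariates, the omitted-variable-bias formula gives c = c' + a*b exactly. The base-R sketch below re-simulates the lab's data (same seed and equations) so it runs on its own, then checks the identity.

```r
# Checks the OLS identity c = c' + a * b behind the Baron-Kenny decomposition.
set.seed(42)
n <- 1500
age <- pmin(pmax(rnorm(n, 30, 8), 18), 60)
educ <- pmin(pmax(rnorm(n, 12, 2.5), 8), 20)
motivation <- rnorm(n)
treat <- rbinom(n, 1, 0.5)
skills <- 50 + 5 * treat + 2 * educ + 3 * motivation + rnorm(n, 0, 5)
earnings <- 20000 + 1000 * treat + 200 * skills +
  300 * educ + 150 * age + 500 * motivation + rnorm(n, 0, 3000)
df <- data.frame(earnings, treat, skills, age, educ, motivation)

fit_total <- lm(earnings ~ treat + age + educ + motivation, data = df)
fit_med   <- lm(skills ~ treat + age + educ + motivation, data = df)
fit_out   <- lm(earnings ~ treat + skills + age + educ + motivation, data = df)

c_total  <- unname(coef(fit_total)["treat"])  # total effect c
a        <- unname(coef(fit_med)["treat"])    # treat -> skills
c_direct <- unname(coef(fit_out)["treat"])    # direct effect c'
b        <- unname(coef(fit_out)["skills"])   # skills -> earnings

cat("c        =", c_total, "\n")
cat("c' + a*b =", c_direct + a * b, "\n")
```

The identity is an algebraic property of OLS, so it holds in every sample. It does not by itself give a*b a causal interpretation; that requires the identifying assumption discussed next.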
Concept Check

In the Baron-Kenny framework, controlling for the mediator (skills) in the outcome regression is essential for identifying the direct effect. What assumption makes this valid?


Step 3: Imai-Keele-Tingley Framework

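The modern approach defines the ACME and ADE in terms of potential outcomes and estimates them by simulating predicted mediator and outcome values from the two fitted models. Because it does not depend on multiplying regression coefficients, the same procedure works for nonlinear models (for example, a probit mediator model) and produces confidence intervals for each effect.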

# Use the mediation package
med_fit <- lm(skills ~ treat + age + educ + motivation, data = df)
out_fit <- lm(earnings ~ treat + skills + age + educ + motivation, data = df)

med_result <- mediate(med_fit, out_fit,
                     treat = "treat", mediator = "skills",
                     sims = 1000, boot = TRUE)
summary(med_result)

# Key outputs:
# ACME = Average Causal Mediation Effect (indirect)
# ADE = Average Direct Effect
# Total = ACME + ADE
# Prop. Mediated = ACME / Total
Requires: mediation

Expected output:

Effect            Estimate   95% CI           True value
ACME (indirect)   ~1,000     [800, 1,200]     1,000
ADE (direct)      ~1,000     [600, 1,400]     1,000
Total effect      ~2,000     [1,600, 2,400]   2,000
Prop. mediated    ~50%       [35%, 65%]       50%

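Setting boot = TRUE makes mediate() compute these intervals by nonparametric bootstrap (sims = 1000 resamples) instead of its default quasi-Bayesian Monte Carlo approximation. In a linear model without interactions, the ACME point estimate essentially reproduces the Baron-Kenny product a*b. What the framework adds is a confidence interval for the indirect effect that reflects estimation uncertainty in both the mediator and outcome models, plus effect definitions that carry over to nonlinear models.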
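To see the idea behind boot = TRUE, here is a base-R sketch of a percentile bootstrap for the indirect effect. It assumes linear mediator and outcome models without a treatment-mediator interaction, where the ACME reduces to a*b; it illustrates the resampling logic and is not the mediation package's implementation. The helper name indirect_ab() and the choice of B = 500 replicates are assumptions made for the example.

```r
# Percentile bootstrap for the indirect effect a * b (linear models, no interaction).
set.seed(42)
n <- 1500
age <- pmin(pmax(rnorm(n, 30, 8), 18), 60)
educ <- pmin(pmax(rnorm(n, 12, 2.5), 8), 20)
motivation <- rnorm(n)
treat <- rbinom(n, 1, 0.5)
skills <- 50 + 5 * treat + 2 * educ + 3 * motivation + rnorm(n, 0, 5)
earnings <- 20000 + 1000 * treat + 200 * skills +
  300 * educ + 150 * age + 500 * motivation + rnorm(n, 0, 3000)
df <- data.frame(earnings, treat, skills, age, educ, motivation)

# Hypothetical helper: indirect effect a * b estimated on a data set d
indirect_ab <- function(d) {
  a <- coef(lm(skills ~ treat + age + educ + motivation, data = d))["treat"]
  b <- coef(lm(earnings ~ treat + skills + age + educ + motivation, data = d))["skills"]
  unname(a * b)
}

acme_hat <- indirect_ab(df)

B <- 500  # bootstrap replicates; chosen for speed, not accuracy
boot_draws <- replicate(B, indirect_ab(df[sample.int(nrow(df), replace = TRUE), ]))
ci <- unname(quantile(boot_draws, c(0.025, 0.975)))

cat("ACME (a*b):", round(acme_hat), "\n")
cat("95% percentile CI:", round(ci), "\n")
```

Each replicate refits both models on a resampled data set, so the spread of boot_draws captures uncertainty in a and b jointly, which a single product of point estimates cannot show.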


Step 4: Sensitivity Analysis

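Identification of the ACME rests on sequential ignorability, which has two parts: (1) treatment is as good as randomly assigned given the pre-treatment covariates, and (2) the mediator is as good as randomly assigned given treatment and those covariates. Randomizing training guarantees part (1) but not part (2), which rules out unobserved confounders of the mediator-outcome relationship and cannot be tested from the data. The sensitivity parameter rho is the correlation between the error terms of the mediator and outcome models; sequential ignorability implies rho = 0. Sensitivity analysis asks how large rho would have to be to drive the ACME to zero.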

# Sensitivity analysis in the mediation package
sens <- medsens(med_result, rho.by = 0.05, effect.type = "indirect")
summary(sens)

# Plot: ACME as a function of rho
plot(sens, main = "Sensitivity of ACME to Unobserved Confounding")

# The plot shows at what value of rho the ACME crosses zero
# Larger |rho_critical| means more robust results
Requires: mediation

Expected output:

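Rho at which ACME = 0: ~0.3
An unobserved confounder would have to induce a correlation of about 0.3 between the mediator-model and outcome-model errors to drive the ACME to zero. If you judge a correlation that large implausible, the mediation finding is robust.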
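For linear mediator and outcome models, the sensitivity curve has a closed form (Imai, Keele and Yamamoto 2010): ACME(rho) = beta2 * (sigma1 / sigma2) * (rho_tilde - rho * sqrt((1 - rho_tilde^2) / (1 - rho^2))), where beta2 is the treatment coefficient in the mediator model, sigma2 its residual SD, sigma1 the residual SD of the outcome regression that omits the mediator, and rho_tilde the correlation between those two sets of residuals. The ACME therefore reaches zero exactly at rho = rho_tilde. The base-R sketch below evaluates this formula by hand, assuming the linear-linear setup of this lab; it should roughly track the medsens() curve but is not the package's code.

```r
# Closed-form ACME sensitivity curve for linear mediator and outcome models.
set.seed(42)
n <- 1500
age <- pmin(pmax(rnorm(n, 30, 8), 18), 60)
educ <- pmin(pmax(rnorm(n, 12, 2.5), 8), 20)
motivation <- rnorm(n)
treat <- rbinom(n, 1, 0.5)
skills <- 50 + 5 * treat + 2 * educ + 3 * motivation + rnorm(n, 0, 5)
earnings <- 20000 + 1000 * treat + 200 * skills +
  300 * educ + 150 * age + 500 * motivation + rnorm(n, 0, 3000)
df <- data.frame(earnings, treat, skills, age, educ, motivation)

fit_med   <- lm(skills ~ treat + age + educ + motivation, data = df)            # mediator model
fit_total <- lm(earnings ~ treat + age + educ + motivation, data = df)          # outcome, no mediator
fit_out   <- lm(earnings ~ treat + skills + age + educ + motivation, data = df)  # outcome, with mediator

beta2  <- unname(coef(fit_med)["treat"])
sigma1 <- sigma(fit_total)  # residual SD, outcome model without mediator
sigma2 <- sigma(fit_med)    # residual SD, mediator model
rho_tilde <- cor(resid(fit_total), resid(fit_med))

# ACME as a function of the sensitivity parameter rho
acme_at_rho <- function(rho) {
  beta2 * sigma1 / sigma2 * (rho_tilde - rho * sqrt((1 - rho_tilde^2) / (1 - rho^2)))
}

ab <- beta2 * unname(coef(fit_out)["skills"])  # product-of-coefficients ACME
rho_grid <- seq(-0.9, 0.9, by = 0.05)
sens_curve <- data.frame(rho = rho_grid, acme = acme_at_rho(rho_grid))

cat("ACME at rho = 0:", round(acme_at_rho(0)), " (a*b =", round(ab), ")\n")
cat("ACME = 0 at rho =", round(rho_tilde, 3), "\n")
```

At rho = 0 the formula collapses to a*b, and the crossing point rho_tilde is computable from two ordinary regressions, which makes it a quick cross-check on the value summary(sens) reports on its rho.by grid.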

Step 5: Treatment-Mediator Interaction

Allow the effect of skills on earnings to differ by treatment status.

# Interaction model
out_int <- lm(earnings ~ treat * skills + age + educ + motivation, data = df)
summary(out_int)

# Re-run mediation with interaction
med_int <- mediate(med_fit, out_int,
                  treat = "treat", mediator = "skills",
                  sims = 1000)
summary(med_int)
# Now reports ACME(0), ACME(1), ADE(0), ADE(1) separately
Requires: mediation

Expected output:

Variable       Coef.      SE       t       p
Intercept      ~20,000    ~1,200   ~17     <0.001
treat          ~1,000     ~2,800   ~0.4    ~0.7
skills         ~200       ~15      ~13.3   <0.001
treat:skills   ~0         ~35      ~0      n.s.
age            ~150       ~10      ~15.0   <0.001
educ           ~300       ~35      ~8.6    <0.001
motivation     ~500       ~80      ~6.3    <0.001
ACME for control group: ~1000
ACME for treated group: ~1000
Interaction coefficient: ~0

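In this DGP the true interaction is exactly zero (the earnings equation has no treat:skills term), so the estimated coefficient differs from zero only by sampling noise and ACME(0) ≈ ACME(1). When a real interaction is present, ACME(0) differs from ACME(1) and ADE(0) from ADE(1), and the total effect decomposes in two equally valid ways: Total = ACME(1) + ADE(0) = ACME(0) + ADE(1).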
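With linear models and an interaction, the four quantities can be written by hand: ACME(t) = a * (b + theta * t) and ADE(t) = c' + theta * mean(M(t)), where theta is the treat:skills coefficient and mean(M(t)) is the average predicted mediator when everyone is assigned treatment status t. The base-R sketch below computes them under that linear-model assumption and checks that both pairings give the same total. Its point estimates should be close to those from mediate(), which simulates the same quantities, but they are not guaranteed to be identical.

```r
# Hand-computed ACME(t) and ADE(t) for a linear outcome model with treat:skills.
set.seed(42)
n <- 1500
age <- pmin(pmax(rnorm(n, 30, 8), 18), 60)
educ <- pmin(pmax(rnorm(n, 12, 2.5), 8), 20)
motivation <- rnorm(n)
treat <- rbinom(n, 1, 0.5)
skills <- 50 + 5 * treat + 2 * educ + 3 * motivation + rnorm(n, 0, 5)
earnings <- 20000 + 1000 * treat + 200 * skills +
  300 * educ + 150 * age + 500 * motivation + rnorm(n, 0, 3000)
df <- data.frame(earnings, treat, skills, age, educ, motivation)

fit_med <- lm(skills ~ treat + age + educ + motivation, data = df)
fit_int <- lm(earnings ~ treat * skills + age + educ + motivation, data = df)

a     <- unname(coef(fit_med)["treat"])         # treat -> skills
b     <- unname(coef(fit_int)["skills"])        # skills slope for the control group
c_dir <- unname(coef(fit_int)["treat"])         # treatment effect at skills = 0
theta <- unname(coef(fit_int)["treat:skills"])  # treatment-mediator interaction

# Average predicted skills when everyone is assigned treatment status tr
m_bar <- function(tr) mean(predict(fit_med, newdata = transform(df, treat = tr)))

acme <- function(tr) a * (b + theta * tr)      # mediator shifts, treatment held at tr
ade  <- function(tr) c_dir + theta * m_bar(tr) # treatment shifts, mediator held at M(tr)

cat("ACME(0):", round(acme(0)), " ACME(1):", round(acme(1)), "\n")
cat("ADE(0): ", round(ade(0)), " ADE(1): ", round(ade(1)), "\n")
cat("ACME(1) + ADE(0):", round(acme(1) + ade(0)), "\n")
cat("ACME(0) + ADE(1):", round(acme(0) + ade(1)), "\n")
```

ACME(1) and ACME(0) differ by exactly a * theta, so in this DGP, where theta is zero in truth, the gap between them is sampling noise.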

Concept Check

If there is a significant treatment-mediator interaction, what does this imply for the decomposition into ACME and ADE?


Exercises

  1. Add a second mediator. Introduce a second mediator (e.g., confidence) that is also affected by treatment and affects earnings. Decompose the total effect through both pathways.

  2. Binary mediator. Replace the continuous skills variable with a binary indicator (passed/failed the skills exam). Re-run the mediation analysis using probit for the mediator model.

  3. Vary the mediation share. Change the DGP so that 90% of the total effect is mediated. How does this affect the sensitivity analysis?

  4. Causal forests for heterogeneous mediation. Use machine learning (e.g., causal forests) to estimate how the ACME varies across subgroups defined by age and education.


Summary

In this lab you learned:

  • Mediation analysis decomposes a total effect into direct (ADE) and indirect/mediated (ACME) components
  • The Baron-Kenny approach is simple but relies on strong linearity and no-confounding assumptions
  • The Imai-Keele-Tingley framework provides valid inference under the sequential ignorability assumption using simulation
  • Sequential ignorability (no unobserved mediator-outcome confounders) is untestable and often questionable in practice
  • Sensitivity analysis quantifies how robust mediation findings are to violations of this assumption
  • Treatment-mediator interactions mean the mediation effect differs for treated and control groups