Replication Lab: Causal Mediation Analysis
Replicate the causal mediation analysis from Imai et al. (2010). Simulate data with direct and indirect effects, estimate mediation effects using the Baron-Kenny and modern causal mediation approaches, and conduct sensitivity analysis for sequential ignorability.
Overview
In this replication lab, you will reproduce the key methodological contributions from:
Imai, Kosuke, Luke Keele, and Dustin Tingley. 2010. "A General Approach to Causal Mediation Analysis." Psychological Methods 15(4): 309–334.
Imai, Kosuke, Luke Keele, and Teppei Yamamoto. 2010. "Identification, Inference and Sensitivity Analysis for Causal Mediation Effects." Statistical Science 25(1): 51–71.
Mediation analysis asks: through what mechanism does a treatment affect an outcome? The classic Baron-Kenny approach decomposes the total effect into a direct effect and an indirect effect operating through a mediator. Imai et al. placed mediation analysis on a rigorous causal footing by defining the average causal mediation effect (ACME) and the average direct effect (ADE) in terms of potential outcomes, identifying the sequential ignorability assumption needed for causal interpretation, and providing sensitivity analysis tools.
Why the Imai et al. paper matters: It formalized causal mediation under the potential outcomes framework and showed that the traditional Baron-Kenny approach implicitly relies on sequential ignorability, a strong assumption that cannot be verified from the observed data. The paper also provided general-purpose estimation and sensitivity analysis tools that extend beyond linear models.
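For reference, the estimands and the identifying assumption can be written out in potential-outcomes notation. The symbols below ($\delta$, $\zeta$, $\tau$, and the unit index $i$) are notation chosen for this summary, with $Y_i(t, m)$ the potential outcome under treatment $t$ and mediator value $m$, and $M_i(t)$ the potential mediator under treatment $t$.

```latex
% Average causal mediation effect, average direct effect, and total effect
\begin{align*}
\delta(t) &= \mathbb{E}\bigl[\,Y_i(t, M_i(1)) - Y_i(t, M_i(0))\,\bigr] && \text{(ACME)} \\
\zeta(t)  &= \mathbb{E}\bigl[\,Y_i(1, M_i(t)) - Y_i(0, M_i(t))\,\bigr] && \text{(ADE)} \\
\tau      &= \mathbb{E}\bigl[\,Y_i(1, M_i(1)) - Y_i(0, M_i(0))\,\bigr]
           = \delta(t) + \zeta(1 - t), \qquad t \in \{0, 1\}
\end{align*}

% Sequential ignorability, for all t, t', m and covariate values x
\begin{align*}
\{Y_i(t', m),\, M_i(t)\} &\perp\!\!\!\perp T_i \mid X_i = x \\
Y_i(t', m) &\perp\!\!\!\perp M_i(t) \mid T_i = t,\ X_i = x
\end{align*}
```

The first independence statement holds by design when treatment is randomized. The second, which treats the mediator as if it were randomized given treatment and pre-treatment covariates, is the part that cannot be checked from data.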
What you will do:
- Simulate data with known direct and indirect effects
- Estimate mediation effects using the Baron-Kenny (product-of-coefficients) approach
- Estimate mediation effects using the modern causal mediation approach (ACME/ADE)
- Conduct sensitivity analysis for violations of sequential ignorability
- Compare all estimates to the known true effects
Step 1: Simulate Data with Direct and Indirect Effects
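The data-generating process (DGP) has a randomized binary treatment T (named Tr in the code, since T is R's shorthand for TRUE) that affects the outcome Y both directly and indirectly through a mediator M. A pre-treatment covariate X affects both M and Y. Because both models are linear with no treatment-mediator interaction, the true ACME is alpha_T * beta_M and the true ADE is beta_T.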
library(mediation)
set.seed(2010)
n <- 2000
X <- rnorm(n)
Tr <- rbinom(n, 1, 0.5)
alpha_T <- 0.6; alpha_X <- 0.3
M <- 1 + alpha_T * Tr + alpha_X * X + rnorm(n, 0, 0.5)
beta_T <- 0.5; beta_M <- 0.5; beta_X <- 0.4
Y <- 2 + beta_T * Tr + beta_M * M + beta_X * X + rnorm(n)
df <- data.frame(Y, Tr, M, X)
true_acme <- alpha_T * beta_M # 0.30
true_ade <- beta_T # 0.50
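cat("Sample size:", n, "\n")
cat("Treatment rate:", round(100 * mean(Tr)), "%\n")
cat("True ACME:", true_acme, "\n")
cat("True ADE:", true_ade, "\n")
cat("True total:", true_acme + true_ade, "\n")
cat("Prop. mediated:", true_acme / (true_acme + true_ade), "\n")
Expected output: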
Sample size: 2000
Treatment rate: ~50%
True effects:
ACME (indirect): 0.300
ADE (direct): 0.500
Total effect: 0.800
Prop. mediated: 0.375
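The true effects listed above can also be checked directly against their potential-outcome definitions. The sketch below regenerates the noise terms and evaluates the structural equations under each treatment and mediator combination. The helper functions `m_pot` and `y_pot` are hypothetical names introduced here, and the check relies on each unit keeping the same noise draws across its potential outcomes.

```r
set.seed(2010)
n <- 2000
X <- rnorm(n)
eps_M <- rnorm(n, 0, 0.5)  # mediator noise, held fixed across potential mediators
eps_Y <- rnorm(n)          # outcome noise, held fixed across potential outcomes

alpha_T <- 0.6; alpha_X <- 0.3
beta_T <- 0.5; beta_M <- 0.5; beta_X <- 0.4

# Hypothetical helpers: structural equations evaluated at a chosen treatment/mediator
m_pot <- function(t) 1 + alpha_T * t + alpha_X * X + eps_M
y_pot <- function(t, m) 2 + beta_T * t + beta_M * m + beta_X * X + eps_Y

acme_1 <- mean(y_pot(1, m_pot(1)) - y_pot(1, m_pot(0)))  # ACME under treatment
acme_0 <- mean(y_pot(0, m_pot(1)) - y_pot(0, m_pot(0)))  # ACME under control
ade_0  <- mean(y_pot(1, m_pot(0)) - y_pot(0, m_pot(0)))  # ADE at M(0)
total  <- mean(y_pot(1, m_pot(1)) - y_pot(0, m_pot(0)))  # total effect

cat("ACME(1):", acme_1, " ACME(0):", acme_0, "\n")
cat("ADE(0):", ade_0, " Total:", total, "\n")
```

Because the DGP has no treatment-mediator interaction, ACME(1) and ACME(0) coincide, and every unit-level contrast equals the same constant, so the averages match alpha_T * beta_M, beta_T, and their sum up to floating-point error.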
Step 2: Baron-Kenny Mediation (Traditional Approach)
The traditional Baron-Kenny approach estimates mediation through three regressions, each adjusting here for the covariate X: (1) Y on T, which gives the total effect c; (2) M on T, which gives path a; and (3) Y on T and M, which gives path b (the coefficient on M) and the direct effect c' (the coefficient on T). The indirect effect is the product a * b.
# Baron-Kenny
step1 <- lm(Y ~ Tr + X, data = df)
step2 <- lm(M ~ Tr + X, data = df)
step3 <- lm(Y ~ Tr + M + X, data = df)
a_path <- coef(step2)["Tr"]
b_path <- coef(step3)["M"]
direct_bk <- coef(step3)["Tr"]
indirect_bk <- a_path * b_path
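total_bk <- coef(step1)["Tr"]  # total effect c from the regression without M
cat("Path a (T->M):", a_path, "\n")
cat("Path b (M->Y):", b_path, "\n")
cat("Direct (c'):", direct_bk, "\n")
cat("Indirect (a*b):", indirect_bk, "\n")
cat("Total (c):", total_bk, "\n")
cat("Prop. mediated:", indirect_bk / total_bk, "\n")
Expected output: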
| Component | Estimate | True |
|---|---|---|
| Path a (T on M) | ~0.60 | 0.60 |
| Path b (M on Y, controlling T) | ~0.50 | 0.50 |
| Indirect effect (a * b) | ~0.30 | 0.30 |
| Direct effect (c') | ~0.50 | 0.50 |
| Total effect | ~0.80 | 0.80 |
| Prop. mediated | ~0.375 | 0.375 |
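Two properties of the product-of-coefficients estimator are worth checking numerically. First, when all three regressions are OLS fits on the same sample with the same covariates, the total effect decomposes exactly as c = c' + a * b. Second, a delta-method (Sobel) standard error for a * b can be built from the two coefficient standard errors. The sketch below regenerates data from the Step 1 DGP so that it runs on its own; the variable names are chosen for this example.

```r
set.seed(2010)
n <- 2000
X <- rnorm(n)
Tr <- rbinom(n, 1, 0.5)
M <- 1 + 0.6 * Tr + 0.3 * X + rnorm(n, 0, 0.5)
Y <- 2 + 0.5 * Tr + 0.5 * M + 0.4 * X + rnorm(n)

fit_total <- lm(Y ~ Tr + X)      # total effect c
fit_med   <- lm(M ~ Tr + X)      # path a
fit_out   <- lm(Y ~ Tr + M + X)  # path b and direct effect c'

a        <- unname(coef(fit_med)["Tr"])
b        <- unname(coef(fit_out)["M"])
c_total  <- unname(coef(fit_total)["Tr"])
c_direct <- unname(coef(fit_out)["Tr"])

# Exact in-sample identity for nested OLS fits: c = c' + a * b
cat("c:", c_total, " c' + a*b:", c_direct + a * b, "\n")

# First-order delta-method (Sobel) standard error of the product a * b
se_a <- summary(fit_med)$coefficients["Tr", "Std. Error"]
se_b <- summary(fit_out)$coefficients["M", "Std. Error"]
sobel_se <- sqrt(a^2 * se_b^2 + b^2 * se_a^2)
cat("Indirect:", a * b, " Sobel SE:", sobel_se, "\n")
```

The Sobel interval assumes the product a * b is approximately normal, which tends to be a poor approximation in small samples. That is one motivation for the resampling-based intervals used in the next step.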
The Baron-Kenny approach estimates the indirect effect as the product a*b. Under what assumption does the product-of-coefficients identify the causal indirect effect?
Step 3: Modern Causal Mediation (ACME/ADE)
The Imai et al. approach defines the ACME and ADE in terms of potential outcomes and uses simulation-based estimation.
# Causal mediation using the mediation package
med_fit <- lm(M ~ Tr + X, data = df)
out_fit <- lm(Y ~ Tr + M + X, data = df)
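# boot = TRUE requests nonparametric bootstrap CIs based on `sims` resamples;
# the default (boot = FALSE) uses quasi-Bayesian Monte Carlo simulation instead
med_out <- mediate(med_fit, out_fit, treat = "Tr", mediator = "M",
                   sims = 1000, boot = TRUE)
summary(med_out)
cat("\nTrue ACME:", true_acme, "\n")
cat("True ADE:", true_ade, "\n")
Expected output: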
| Effect | Estimate | 95% CI | True |
|---|---|---|---|
| ACME (indirect) | ~0.300 | [0.24, 0.37] | 0.300 |
| ADE (direct) | ~0.500 | [0.38, 0.62] | 0.500 |
| Total effect | ~0.800 | [0.68, 0.92] | 0.800 |
| Prop. mediated | ~0.375 | [0.30, 0.46] | 0.375 |
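Both the Baron-Kenny and modern causal mediation approaches recover the true effects in the linear DGP, up to sampling error. The modern approach additionally provides resampling-based confidence intervals (the nonparametric bootstrap here, because boot = TRUE; quasi-Bayesian simulation by default) and a formal framework for sensitivity analysis.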
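To make the simulation-based logic concrete, the following is a minimal hand-rolled sketch of the quasi-Bayesian idea for this linear setting: draw coefficient vectors from their approximate sampling distributions, compute the implied ACME for each draw, and summarize the draws with percentiles. It is an illustration under stated assumptions, not the `mediation` package's implementation. The helper `draw_coefs` is a hypothetical name, and the sketch relies on the models being linear with no treatment-mediator interaction, so the ACME for each draw reduces to the product of the two relevant coefficients.

```r
set.seed(2010)
n <- 2000
X <- rnorm(n)
Tr <- rbinom(n, 1, 0.5)
M <- 1 + 0.6 * Tr + 0.3 * X + rnorm(n, 0, 0.5)
Y <- 2 + 0.5 * Tr + 0.5 * M + 0.4 * X + rnorm(n)
med_fit <- lm(M ~ Tr + X)
out_fit <- lm(Y ~ Tr + M + X)

# Hypothetical helper: draw `sims` coefficient vectors from N(coef, vcov)
draw_coefs <- function(fit, sims) {
  k <- length(coef(fit))
  Z <- matrix(rnorm(sims * k), nrow = sims)
  # chol() returns upper-triangular R with t(R) %*% R = vcov, so Z %*% R has covariance vcov
  draws <- sweep(Z %*% chol(vcov(fit)), 2, coef(fit), "+")
  colnames(draws) <- names(coef(fit))
  draws
}

sims <- 1000
med_draws <- draw_coefs(med_fit, sims)
out_draws <- draw_coefs(out_fit, sims)

# With linear models and no interaction, the predicted mediator shift M(1) - M(0)
# is the Tr coefficient, and the outcome responds to it through the M coefficient
acme_draws <- out_draws[, "M"] * med_draws[, "Tr"]
acme_est <- mean(acme_draws)
acme_ci <- quantile(acme_draws, c(0.025, 0.975))

cat("ACME estimate:", acme_est, "\n")
cat("95% interval:", acme_ci[1], "to", acme_ci[2], "\n")
```

With nonlinear models the shortcut in the last step no longer applies, and each draw would instead require predicting potential mediators and potential outcomes unit by unit before averaging.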
Step 4: Sensitivity Analysis for Sequential Ignorability
Sequential ignorability is untestable. Imai et al. propose a sensitivity analysis that examines how the ACME changes under varying degrees of unmeasured confounding (parameterized by the correlation rho between the residuals of the mediator and outcome models).
# Sensitivity analysis using mediation package
sens_out <- medsens(med_out, rho.by = 0.05, sims = 1000)
summary(sens_out)
# Plot sensitivity
# plot(sens_out, main = "Sensitivity of ACME to rho")
# Report the rho at which the ACME point estimate equals zero
cat("\nThe ACME crosses zero at rho ~=", round(sens_out$err.cr.d, 2), "\n")
Expected output (sensitivity analysis):
| rho | Adjusted ACME | ACME > 0? |
|---|---|---|
| -0.3 | ~0.68 | Yes |
| -0.2 | ~0.54 | Yes |
| -0.1 | ~0.42 | Yes |
| 0.0 | ~0.30 | Yes |
| 0.1 | ~0.18 | Yes |
| 0.2 | ~0.06 | Yes |
| 0.3 | ~-0.08 | No |
The adjusted ACME falls as rho increases. For this DGP (path a = 0.6, path b = 0.5, residual standard deviations of 0.5 for M and 1 for Y), the population value of the adjusted ACME is 0.3 - 1.2 * rho / sqrt(1 - rho^2), which reaches zero at rho of about 0.24; sample estimates will differ slightly. The mediation finding therefore survives any negative rho and small positive rho, but a positive error correlation of roughly 0.25 or more would eliminate or reverse it. Whether a correlation of that size is plausible is a substantive judgment that the data cannot settle.
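For linear mediator and outcome models with no treatment-mediator interaction, the dependence of the ACME on rho has a closed form that can be evaluated from the two fitted regressions as a cross-check on `medsens`. The sketch below rests on standard omitted-correlation algebra, derived for this example rather than taken from the package source: if the structural errors have correlation rho, the OLS coefficient on M overstates path b by rho * sigma_Y / sigma_M, and the OLS residual standard deviation of the outcome model estimates sigma_Y * sqrt(1 - rho^2). The function name `acme_at_rho` is hypothetical.

```r
set.seed(2010)
n <- 2000
X <- rnorm(n)
Tr <- rbinom(n, 1, 0.5)
M <- 1 + 0.6 * Tr + 0.3 * X + rnorm(n, 0, 0.5)
Y <- 2 + 0.5 * Tr + 0.5 * M + 0.4 * X + rnorm(n)
med_fit <- lm(M ~ Tr + X)
out_fit <- lm(Y ~ Tr + M + X)

a_hat <- unname(coef(med_fit)["Tr"])
b_hat <- unname(coef(out_fit)["M"])
s_M <- summary(med_fit)$sigma  # residual SD of the mediator model
s_Y <- summary(out_fit)$sigma  # residual SD of the outcome model

# Hypothetical helper: ACME implied by an assumed error correlation rho
acme_at_rho <- function(rho) {
  a_hat * (b_hat - rho * s_Y / (s_M * sqrt(1 - rho^2)))
}

rho_grid <- seq(-0.3, 0.3, by = 0.1)
print(data.frame(rho = rho_grid, acme = round(acme_at_rho(rho_grid), 3)))

# Value of rho at which the implied ACME is zero
rho_zero <- uniroot(acme_at_rho, interval = c(-0.9, 0.9))$root
cat("ACME crosses zero at rho ~=", round(rho_zero, 2), "\n")
```

`medsens` estimates the same quantity by refitting the models under each assumed rho and also reports confidence bands, so its values may differ slightly from this point calculation.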
Why is sensitivity analysis particularly important for mediation analysis, compared to other causal inference methods?
Step 5: Comparison with the True Effects
cat("=== Final Comparison ===\n")
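cat("Baron-Kenny ACME:", indirect_bk, " ADE:", direct_bk, "\n")
# d0, z0, and tau.coef hold the ACME (control), ADE (control), and total effect
cat("Causal Mediation ACME:", med_out$d0, " ADE:", med_out$z0,
    " Total:", med_out$tau.coef, "\n")
cat("True ACME:", true_acme, " ADE:", true_ade,
    " Total:", true_acme + true_ade, "\n\n")
cat("Both methods recover the true mediation effects.\n")
Expected output (final comparison):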
| Method | ACME | ADE | Total |
|---|---|---|---|
| Baron-Kenny | ~0.300 | ~0.500 | ~0.800 |
| Causal Mediation | ~0.300 | ~0.500 | ~0.800 |
| True DGP | 0.300 | 0.500 | 0.800 |
Imai et al. define the ACME using potential outcomes; under the treated condition it is ACME(1) = E[Y(1, M(1)) - Y(1, M(0))]. Why is the potential outcome M(0) (the mediator value under no treatment) needed to define the indirect effect, even for treated individuals?
Summary
The replication of Imai et al. (2010) demonstrates:
- Causal mediation formalizes indirect effects. The ACME and ADE are defined as potential outcome contrasts, providing a rigorous foundation for mediation analysis.
- Baron-Kenny and modern approaches agree in linear models. When all models are linear with no treatment-mediator interaction, the product of coefficients matches the ACME from the mediation package up to resampling error.
- Sequential ignorability is the key assumption. Even with randomized treatment, the mediator is not randomized, so unmeasured confounders of the mediator-outcome relationship can invalidate the ACME.
- Sensitivity analysis is essential. Quantifying how the ACME changes under violations of sequential ignorability helps assess the robustness of mediation findings.
- The modern framework generalizes. Unlike Baron-Kenny, the Imai et al. approach accommodates nonlinear models and multiple mediators, and it provides formal inference.
Extension Exercises
- Nonlinear models. Replace the linear outcome model with a logistic regression (binary outcome). Compare Baron-Kenny (which requires linearity) with the modern approach (which handles nonlinear models).
- Interaction effects. Add a treatment-mediator interaction (beta_TM * T * M) to the DGP. Show that Baron-Kenny gives a single indirect effect while the modern approach gives separate ACME(0) and ACME(1).
- Multiple mediators. Add a second mediator M2 and decompose the total effect into the indirect effect through M1, the indirect effect through M2, and the direct effect.
- Binary mediator. Make the mediator binary (logistic mediator model) and compare estimation approaches.
- Violated sequential ignorability. Add an unobserved confounder that affects both M and Y. Show that the ACME estimate is biased. Because the violation cannot be detected from the data, use the sensitivity analysis to check whether the adjusted ACME at the rho implied by the confounder is close to the true ACME.
- Bootstrapped confidence intervals. Implement the nonparametric bootstrap for the indirect effect by hand and compare with the quasi-Bayesian CIs from the mediation package (boot = FALSE).
- Instrumental variable approach. Introduce an instrument for the mediator and estimate the ACME using IV methods. Compare with the standard approach.
- Real data application. Apply causal mediation analysis to a real dataset (e.g., effect of education on wages mediated by occupation choice) and conduct sensitivity analysis.
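As a possible starting point for the violated-sequential-ignorability exercise, the sketch below adds an unobserved confounder U to the Step 1 DGP and compares the naive product-of-coefficients estimate with an oracle estimate that adjusts for U. The confounder strengths `gamma_M` and `gamma_Y`, the sample size, and all variable names are hypothetical choices for illustration.

```r
set.seed(2010)
n <- 5000
X <- rnorm(n)
U <- rnorm(n)  # confounder of M and Y, unobserved by the analyst
Tr <- rbinom(n, 1, 0.5)

gamma_M <- 0.4; gamma_Y <- 0.6  # hypothetical confounder strengths
M <- 1 + 0.6 * Tr + 0.3 * X + gamma_M * U + rnorm(n, 0, 0.5)
Y <- 2 + 0.5 * Tr + 0.5 * M + 0.4 * X + gamma_Y * U + rnorm(n)

# Naive analysis: U omitted, as it would be in practice
naive_acme <- unname(coef(lm(M ~ Tr + X))["Tr"] *
                     coef(lm(Y ~ Tr + M + X))["M"])

# Oracle analysis: U included, feasible only in a simulation
oracle_acme <- unname(coef(lm(M ~ Tr + X + U))["Tr"] *
                      coef(lm(Y ~ Tr + M + X + U))["M"])

# Correlation between the composite errors (gamma * U + noise) implied by the DGP
rho_implied <- (gamma_M * gamma_Y) /
  sqrt((gamma_M^2 + 0.5^2) * (gamma_Y^2 + 1))

cat("Naive ACME:", naive_acme, "\n")
cat("Oracle ACME:", oracle_acme, " (true value 0.30)\n")
cat("Implied rho:", rho_implied, "\n")
```

Since U raises both M and Y, the naive estimate overstates the indirect effect. A natural next step is to run the sensitivity analysis on the naive fits and read off the adjusted ACME near the implied rho.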