Lab · Replication · 120 minutes

Replication Lab: Causal Mediation Analysis

Replicate Imai et al. (2010) on causal mediation: simulate direct and indirect effects, estimate them with Baron-Kenny and modern estimators, and run a sensitivity analysis.

Method: Causal Mediation Analysis
Languages: Python, R, Stata
Dataset: Simulated data matching Imai et al. (2010) DGP

Overview

In this replication lab, you will reproduce the key methodological contributions from:

Imai, Kosuke, Luke Keele, and Dustin Tingley. 2010. "A General Approach to Causal Mediation Analysis." Psychological Methods 15(4): 309–334.

Imai, Kosuke, Luke Keele, and Teppei Yamamoto. 2010. "Identification, Inference and Sensitivity Analysis for Causal Mediation Effects." Statistical Science 25(1): 51–71.

Mediation analysis asks: through what mechanism does a treatment affect an outcome? The classic Baron-Kenny approach decomposes the total effect into a direct effect and an indirect effect operating through a mediator. Imai et al. (2010) placed mediation analysis on a rigorous causal footing by defining the average causal mediation effect (ACME) and the average direct effect (ADE) in terms of potential outcomes, identifying the sequential ignorability assumption needed for causal interpretation, and providing sensitivity analysis tools.

Why the Imai et al. (2010) paper matters: It formalized causal mediation under the potential outcomes framework and showed that the traditional Baron-Kenny approach implicitly relies on sequential ignorability, an assumption that cannot be verified from the observed data. The paper also provided general-purpose estimation and sensitivity analysis tools, which made mediation analysis more rigorous.
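
For reference, the estimands described above can be written compactly in potential-outcomes notation. The symbols (delta for the mediation effect, zeta for the direct effect, tau for the total effect) follow the usual convention of the cited papers; this is a summary sketch, not a full statement of the identification results.

```latex
\begin{align*}
\delta_i(t) &= Y_i\bigl(t, M_i(1)\bigr) - Y_i\bigl(t, M_i(0)\bigr)
  && \text{(causal mediation effect)} \\
\zeta_i(t)  &= Y_i\bigl(1, M_i(t)\bigr) - Y_i\bigl(0, M_i(t)\bigr)
  && \text{(direct effect)} \\
\tau_i      &= Y_i\bigl(1, M_i(1)\bigr) - Y_i\bigl(0, M_i(0)\bigr)
             = \delta_i(t) + \zeta_i(1 - t)
  && \text{(total effect)} \\[6pt]
\text{ACME}(t) &= \mathbb{E}\bigl[\delta_i(t)\bigr], \qquad
\text{ADE}(t) = \mathbb{E}\bigl[\zeta_i(t)\bigr] \\[6pt]
\bigl\{Y_i(t', m),\, M_i(t)\bigr\} &\perp\!\!\!\perp T_i \mid X_i = x
  && \text{(sequential ignorability, part 1)} \\
Y_i(t', m) &\perp\!\!\!\perp M_i(t) \mid T_i = t,\, X_i = x
  && \text{(sequential ignorability, part 2)}
\end{align*}
```

The decomposition in the third line holds for t = 0 and t = 1. Without a treatment-mediator interaction, as in this lab's DGP, the effects do not depend on t.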

What you will do:

Simulate data with known direct and indirect effects
Estimate mediation effects using the Baron-Kenny (product-of-coefficients) approach
Estimate mediation effects using the modern causal mediation approach (ACME/ADE)
Conduct sensitivity analysis for violations of sequential ignorability
Compare all estimates to the known true effects

Step 1: Simulate Data with Direct and Indirect Effects

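The data-generating process (DGP) has a randomized binary treatment T that affects the outcome Y both directly and indirectly through a mediator M. A pre-treatment covariate X enters both the mediator and the outcome equations.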

# First-time setup: install.packages(c("mediation"))
library(mediation)

set.seed(2010)
n <- 2000

# Pre-treatment covariate and randomized binary treatment
X <- rnorm(n)
Tr <- rbinom(n, 1, 0.5)

# Mediator equation: alpha_T is path a (T -> M)
alpha_T <- 0.6; alpha_X <- 0.3
M <- 1 + alpha_T * Tr + alpha_X * X + rnorm(n, 0, 0.5)

# Outcome equation: beta_T is the direct effect, beta_M is path b (M -> Y)
beta_T <- 0.5; beta_M <- 0.5; beta_X <- 0.4
Y <- 2 + beta_T * Tr + beta_M * M + beta_X * X + rnorm(n)

df <- data.frame(Y, Tr, M, X)

# True effects implied by the linear DGP
true_acme <- alpha_T * beta_M  # 0.30
true_ade <- beta_T             # 0.50
cat("True ACME:", true_acme, "\n")
cat("True ADE:", true_ade, "\n")
cat("True total:", true_acme + true_ade, "\n")

Requires: mediation

Expected output:

Sample size: 2000
Treatment rate: ~50%

True effects:
  ACME (indirect): 0.300
  ADE (direct):    0.500
  Total effect:    0.800
  Prop. mediated:  0.375
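
To see why the true ACME equals alpha_T * beta_M, it can help to write the DGP in potential-outcome form and compute the unit-level contrasts directly. The sketch below does this in Python (one of the lab's listed languages) with the standard library only. The helper names `mediator` and `outcome` are illustrative, and the random draws will not match the R session above.

```python
import random

# Structural parameters copied from the lab's DGP.
ALPHA_T, ALPHA_X = 0.6, 0.3             # mediator equation
BETA_T, BETA_M, BETA_X = 0.5, 0.5, 0.4  # outcome equation


def mediator(t, x, e_m):
    """Potential mediator M(t) for a unit with covariate x and noise e_m."""
    return 1 + ALPHA_T * t + ALPHA_X * x + e_m


def outcome(t, m, x, e_y):
    """Potential outcome Y(t, m) for a unit with covariate x and noise e_y."""
    return 2 + BETA_T * t + BETA_M * m + BETA_X * x + e_y


rng = random.Random(2010)
n = 2000
acme_units, ade_units = [], []
for _ in range(n):
    x = rng.gauss(0, 1)
    e_m = rng.gauss(0, 0.5)
    e_y = rng.gauss(0, 1)
    m0, m1 = mediator(0, x, e_m), mediator(1, x, e_m)
    # Indirect effect: move the mediator from M(0) to M(1), hold T at 1.
    acme_units.append(outcome(1, m1, x, e_y) - outcome(1, m0, x, e_y))
    # Direct effect: move T from 0 to 1, hold the mediator at M(0).
    ade_units.append(outcome(1, m0, x, e_y) - outcome(0, m0, x, e_y))

acme = sum(acme_units) / n
ade = sum(ade_units) / n
print(f"ACME: {acme:.3f}  ADE: {ade:.3f}  Total: {acme + ade:.3f}")
# prints: ACME: 0.300  ADE: 0.500  Total: 0.800
```

Because the DGP is linear with no treatment-mediator interaction, every unit has the same indirect effect (0.6 * 0.5) and the same direct effect (0.5), so the averages are exact and not subject to sampling noise. Both potential mediators are visible here only because the data are simulated; in real data one of them is always unobserved.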

Step 2: Baron-Kenny Mediation (Traditional Approach)

The traditional Baron-Kenny approach estimates mediation through three regressions, each adjusted for the covariate X in this lab: (1) Y on T, (2) M on T, and (3) Y on T and M. The indirect effect equals the product of the coefficient of T on M (path a) and the coefficient of M on Y controlling for T (path b).

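# Baron-Kenny: three regressions, each adjusting for X
step1 <- lm(Y ~ Tr + X, data = df)      # total effect (path c)
step2 <- lm(M ~ Tr + X, data = df)      # path a: T -> M
step3 <- lm(Y ~ Tr + M + X, data = df)  # path b and direct effect (c')

a_path <- coef(step2)["Tr"]
b_path <- coef(step3)["M"]
direct_bk <- coef(step3)["Tr"]
indirect_bk <- a_path * b_path
total_bk <- coef(step1)["Tr"]

cat("Path a (T->M):", a_path, "\n")
cat("Path b (M->Y):", b_path, "\n")
cat("Direct (c'):", direct_bk, "\n")
cat("Indirect (a*b):", indirect_bk, "\n")
cat("Total (c):", total_bk, "\n")
cat("Prop. mediated:", indirect_bk / total_bk, "\n")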

Expected output:

Component	Estimate	True
Path a (T on M)	~0.60	0.60
Path b (M on Y, controlling T)	~0.50	0.50
Indirect effect (a * b)	~0.30	0.30
Direct effect (c')	~0.50	0.50
Total effect	~0.80	0.80
Prop. mediated	~0.375	0.375
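
With ordinary least squares and the same covariates in all three regressions, the total effect splits exactly into c = c' + a*b, so the product of coefficients and the difference of coefficients give the same indirect effect. The Python sketch below checks this on freshly simulated data. The `ols` helper is a hypothetical, standard-library-only solver written for this illustration, and the draws differ from the R session.

```python
import random


def ols(columns, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in zip(*columns)]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Same DGP as the lab: a = 0.6, b = 0.5, direct effect = 0.5.
rng = random.Random(2010)
n = 2000
X = [rng.gauss(0, 1) for _ in range(n)]
T = [1.0 if rng.random() < 0.5 else 0.0 for _ in range(n)]
M = [1 + 0.6 * t + 0.3 * x + rng.gauss(0, 0.5) for t, x in zip(T, X)]
Y = [2 + 0.5 * t + 0.5 * m + 0.4 * x + rng.gauss(0, 1)
     for t, m, x in zip(T, M, X)]

c = ols([T, X], Y)[1]                 # total effect:  Y ~ T + X
a = ols([T, X], M)[1]                 # path a:        M ~ T + X
_, c_prime, b, _ = ols([T, M, X], Y)  # direct effect and path b: Y ~ T + M + X

print(f"a*b    = {a * b:.4f}")
print(f"c - c' = {c - c_prime:.4f}")
```

The two printed numbers agree up to floating-point error. The identity is a property of linear least squares and says nothing about causality: whether a*b can be read as a causal indirect effect still depends on the assumptions discussed in the Concept Check.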

Concept Check

The Baron-Kenny approach estimates the indirect effect as the product a*b. Under what assumption does the product-of-coefficients identify the causal indirect effect?

The mediator must be randomly assigned.
Sequential ignorability: (1) conditional on covariates, treatment assignment is ignorable, and (2) conditional on treatment and covariates, the mediator is ignorable, meaning there are no unobserved confounders of the mediator-outcome relationship, even after conditioning on treatment.
The outcome model must be linear.
The treatment must have no direct effect on the outcome.

Step 3: Modern Causal Mediation (ACME/ADE)

The Imai et al. (2010) approach defines the ACME and ADE in terms of potential outcomes and uses simulation-based estimation.

# Causal mediation using the mediation package
med_fit <- lm(M ~ Tr + X, data = df)      # mediator model
out_fit <- lm(Y ~ Tr + M + X, data = df)  # outcome model

# boot = TRUE requests nonparametric bootstrap intervals; the default
# (boot = FALSE) uses the quasi-Bayesian Monte Carlo approximation.
med_out <- mediate(med_fit, out_fit, treat = "Tr", mediator = "M",
                   sims = 1000, boot = TRUE)
summary(med_out)

cat("\nTrue ACME:", true_acme, "\n")
cat("True ADE:", true_ade, "\n")

Expected output:

Effect	Estimate	95% CI	True
ACME (indirect)	~0.300	[0.24, 0.37]	0.300
ADE (direct)	~0.500	[0.38, 0.62]	0.500
Total effect	~0.800	[0.68, 0.92]	0.800
Prop. mediated	~0.375	[0.30, 0.46]	0.375

Both the Baron-Kenny and modern causal mediation approaches recover the true effects in this linear DGP. The modern approach additionally provides bootstrap or simulation-based confidence intervals and a formal framework for sensitivity analysis.
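
The idea behind the potential-outcomes estimator can be sketched in a few steps: fit the mediator and outcome models, predict each unit's mediator under treatment and under control, pass both predictions through the outcome model, and average the contrasts. The Python sketch below is a deliberately simplified version that uses point estimates only and reuses each unit's fitted mediator residual. It is not the mediation package's algorithm, which also propagates parameter uncertainty to produce intervals. The `ols` helper is hypothetical and repeated here so the block runs on its own.

```python
import random


def ols(columns, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in zip(*columns)]
    k = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for col in range(k):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Simulate the lab's DGP (draws differ from the R session).
rng = random.Random(2010)
n = 2000
X = [rng.gauss(0, 1) for _ in range(n)]
T = [1.0 if rng.random() < 0.5 else 0.0 for _ in range(n)]
M = [1 + 0.6 * t + 0.3 * x + rng.gauss(0, 0.5) for t, x in zip(T, X)]
Y = [2 + 0.5 * t + 0.5 * m + 0.4 * x + rng.gauss(0, 1)
     for t, m, x in zip(T, M, X)]

# Step 1: fit the mediator model and the outcome model.
a0, a_t, a_x = ols([T, X], M)
b0, b_t, b_m, b_x = ols([T, M, X], Y)


def predict_y(t, m, x):
    """Fitted outcome model evaluated at treatment t and mediator value m."""
    return b0 + b_t * t + b_m * m + b_x * x


# Steps 2-4: predict potential mediators, then potential outcomes, then average.
acme_sum = ade_sum = 0.0
for t_obs, m_obs, x in zip(T, M, X):
    resid = m_obs - (a0 + a_t * t_obs + a_x * x)  # unit's mediator residual
    m0 = a0 + a_x * x + resid                      # predicted M(0)
    m1 = a0 + a_t + a_x * x + resid                # predicted M(1)
    acme_sum += predict_y(1, m1, x) - predict_y(1, m0, x)
    ade_sum += predict_y(1, m0, x) - predict_y(0, m0, x)

acme_hat, ade_hat = acme_sum / n, ade_sum / n
print(f"ACME estimate: {acme_hat:.4f}   a*b: {a_t * b_m:.4f}")
print(f"ADE estimate:  {ade_hat:.4f}   c':  {b_t:.4f}")
```

With linear models the averaged contrasts reduce to a*b and c', which is why the two approaches agree in this lab. The same steps still apply when `predict_y` comes from a nonlinear model, where the product of coefficients no longer has this interpretation.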

Step 4: Sensitivity Analysis for Sequential Ignorability

Sequential ignorability is untestable. Imai et al. (2010) propose a sensitivity analysis that examines how the ACME changes under varying degrees of unmeasured confounding (parameterized by the correlation rho between the residuals of the mediator and outcome models).

# Sensitivity analysis using the mediation package
sens_out <- medsens(med_out, rho.by = 0.05, sims = 1000)
summary(sens_out)

# Plot the ACME as a function of rho
# plot(sens_out, main = "Sensitivity of ACME to rho")

# Report the rho at which the estimated ACME equals zero
cat("\nThe ACME crosses zero at rho ~=", round(sens_out$err.cr.d, 2), "\n")

Expected output — Sensitivity analysis:

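rho	Adjusted ACME	Sign of ACME
-0.3	~0.68	Positive
-0.2	~0.54	Positive
-0.1	~0.42	Positive
0.0	~0.30	Positive
0.1	~0.18	Positive
0.2	~0.06	Positive (near zero)
0.3	~-0.08	Negative

The values above are the population values implied by this DGP; sample estimates will differ slightly. The ACME stays positive for negative rho and for small positive rho, but it shrinks quickly as rho grows and reaches zero near rho = 0.24. The mediation finding is therefore robust only to modest positive confounding of the mediator-outcome relationship: a residual correlation of about 0.25 would be enough to explain away the indirect effect. The threshold is this low because the mediator's error variance (0.25) is small relative to the outcome's (1.0), so a small correlation accounts for a large share of the estimated path b.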
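
For linear mediator and outcome models, the ACME implied by an assumed rho has a closed form that depends on the error variances of the mediator model and of the outcome model without the mediator. The Python sketch below plugs in the population values of this lab's DGP rather than sample estimates, so its numbers are a benchmark for the `medsens` output and not a reproduction of it. The function name `acme_given_rho` is illustrative.

```python
import math

# Population quantities implied by the lab's DGP (no sampling noise).
ALPHA_T, BETA_M = 0.6, 0.5  # path a and path b
SD_M, SD_Y = 0.5, 1.0       # error SDs of the mediator and outcome equations

# Error of the outcome model that omits the mediator (Y on T and X):
# e1 = BETA_M * e_M + e_Y. Its SD and its correlation with e_M are:
sd_1 = math.sqrt(BETA_M**2 * SD_M**2 + SD_Y**2)
rho_tilde = BETA_M * SD_M / sd_1


def acme_given_rho(rho):
    """ACME implied by an assumed mediator-outcome error correlation rho."""
    scale = math.sqrt((1 - rho_tilde**2) / (1 - rho**2))
    return ALPHA_T * sd_1 / SD_M * (rho_tilde - rho * scale)


for rho in (-0.3, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3):
    print(f"rho = {rho:+.1f}  ACME = {acme_given_rho(rho):.3f}")
print(f"ACME = 0 at rho = {rho_tilde:.3f}")
```

At rho = 0 the formula returns the true ACME of 0.30, and it returns zero exactly when rho equals `rho_tilde` (about 0.24 here). That threshold is the correlation between the mediator error and the error of the outcome model without the mediator, so it can be computed before running any sensitivity analysis.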

Concept Check

Why is sensitivity analysis particularly important for mediation analysis, compared to other causal inference methods?

Because mediation models have more parameters than other models.
Because sequential ignorability requires no unmeasured confounders of the mediator-outcome relationship, which is much harder to guarantee than no unmeasured confounders of the treatment-outcome relationship. Even when treatment is randomized, the mediator is not randomized, so post-treatment confounders can invalidate the indirect effect estimate.
Because mediation analysis requires a larger sample size.
Because linear models are always misspecified.

Step 5: Comparison with Published Results

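cat("=== Final Comparison ===\n")
cat("Baron-Kenny ACME:", indirect_bk, "\n")
cat("Causal Mediation ACME:", med_out$d0, "\n")
cat("True ACME:", true_acme, "\n\n")
cat("Baron-Kenny ADE:", direct_bk, "\n")
cat("Causal Mediation ADE:", med_out$z0, "\n")
cat("True ADE:", true_ade, "\n\n")
cat("Both methods recover the true mediation effects.\n")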
Expected output — Final comparison:

Method	ACME	ADE	Total
Baron-Kenny	~0.300	~0.500	~0.800
Causal Mediation	~0.300	~0.500	~0.800
True DGP	0.300	0.500	0.800

Concept Check

Imai et al. define the ACME using potential outcomes: ACME = E[Y(1, M(1)) - Y(1, M(0))]. Why is the potential outcome M(0) (the mediator value under no treatment) needed to define the indirect effect, even for treated individuals?

Because M(0) is always observable for the treated group.
Because the indirect effect measures how much the outcome changes when the mediator shifts from its control value M(0) to its treated value M(1), holding the treatment fixed at T=1. The counterfactual M(0) represents the mediator value that would have occurred without treatment, isolating the portion of the treatment effect operating through the mediator.
Because M(0) is needed for the Baron-Kenny product formula.
Because without M(0) the total effect cannot be computed.

Summary

The replication of Imai et al. (2010) demonstrates:

Causal mediation formalizes indirect effects. The ACME and ADE are defined as potential outcome contrasts, providing a rigorous foundation for mediation analysis.
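Baron-Kenny and modern approaches agree in linear models. When both models are linear and there is no treatment-mediator interaction, the product of coefficients matches the ACME point estimate from the modern approach, up to simulation or bootstrap error.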
Sequential ignorability is the key assumption. Even with randomized treatment, the mediator is not randomized, so unmeasured confounders of the mediator-outcome relationship can invalidate the ACME.
Sensitivity analysis is essential. Quantifying how the ACME changes under violations of sequential ignorability helps assess the robustness of mediation findings.
The modern framework generalizes. Unlike Baron-Kenny, the Imai et al. (2010) approach accommodates nonlinear models and multiple mediators, and it provides formal inference.

Extension Exercises

Nonlinear models. Replace the linear outcome model with a logistic regression (binary outcome). Compare Baron-Kenny (which requires linearity) with the modern approach (which handles nonlinear models).
Interaction effects. Add a treatment-mediator interaction (beta_TM * T * M) to the DGP. Show that Baron-Kenny gives a single indirect effect while the modern approach gives separate ACME(0) and ACME(1).
Multiple mediators. Add a second mediator M2 and decompose the total effect into the indirect effect through M1, the indirect effect through M2, and the direct effect.
Binary mediator. Make the mediator binary (logistic mediator model) and compare estimation approaches.
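Violated sequential ignorability. Add an unobserved confounder that affects both M and Y. Show that the ACME estimate is biased, and use the sensitivity analysis to check whether the ACME at the rho implied by your confounder is close to the true value. The analysis cannot detect the violation on its own; it only shows how the estimate changes for each assumed rho.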
Bootstrapped confidence intervals. Implement the nonparametric bootstrap for the indirect effect and compare with the simulation-based CIs from the mediation package.
Instrumental variable approach. Introduce an instrument for the mediator and estimate the ACME using IV methods. Compare with the standard approach.
Real data application. Apply causal mediation analysis to a real dataset (e.g., effect of education on wages mediated by occupation choice) and conduct sensitivity analysis.
