MethodAtlas
Replication · 120 minutes

Replication Lab: Causal Forests and Heterogeneous Treatment Effects

Replicate the key simulation results from Wager and Athey (2018) on causal forests. Simulate data with heterogeneous treatment effects, estimate ATE with OLS, estimate CATE with causal forests, assess variable importance, and conduct calibration tests.

Overview

In this replication lab, you will explore the methodology from two foundational papers on causal forests:

Athey, Susan, and Guido Imbens. 2016. "Recursive Partitioning for Heterogeneous Causal Effects." Proceedings of the National Academy of Sciences 113(27): 7353–7360.

Wager, Stefan, and Susan Athey. 2018. "Estimation and Inference of Heterogeneous Treatment Effects using Random Forests." Journal of the American Statistical Association 113(523): 1228–1242.

Causal forests extend random forests from prediction to causal inference. Instead of predicting Y, a causal forest estimates the conditional average treatment effect (CATE): tau(x) = E[Y(1) - Y(0) | X = x]. The method splits the covariate space to maximize treatment effect heterogeneity rather than outcome prediction accuracy. Combined with honesty (separate subsamples for splitting and estimation), causal forests produce pointwise asymptotically normal and consistent CATE estimates.
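To make the honesty idea concrete, here is a minimal base-R sketch of a single honest split (an illustration only, not the grf implementation): one subsample chooses the split point by grid search, and a disjoint subsample estimates the leaf-level treatment effects.

```r
set.seed(1)
n <- 2000
x <- rnorm(n)                 # single covariate
w <- rbinom(n, 1, 0.5)        # randomized treatment
tau <- ifelse(x > 0, 2, 0)    # true effect differs across x
y <- tau * w + rnorm(n)

# Honesty: disjoint subsamples for splitting and estimation
split_idx <- sample(n, n / 2)
est_idx <- setdiff(seq_len(n), split_idx)

# Leaf-level treatment effect: difference in means within a leaf
leaf_effect <- function(idx) {
  mean(y[idx][w[idx] == 1]) - mean(y[idx][w[idx] == 0])
}

# Splitting sample picks the cut that maximizes effect heterogeneity
cuts <- quantile(x[split_idx], seq(0.1, 0.9, 0.1))
het <- sapply(cuts, function(cc) {
  left  <- split_idx[x[split_idx] <= cc]
  right <- split_idx[x[split_idx] > cc]
  abs(leaf_effect(left) - leaf_effect(right))
})
best_cut <- cuts[which.max(het)]

# Estimation sample computes honest leaf effects at the chosen cut
tau_left  <- leaf_effect(est_idx[x[est_idx] <= best_cut])
tau_right <- leaf_effect(est_idx[x[est_idx] > best_cut])
cat("cut:", round(best_cut, 2),
    " tau_left:", round(tau_left, 2),
    " tau_right:", round(tau_right, 2), "\n")
```

Because the estimation sample played no role in choosing the cut, the leaf-level effects are not contaminated by the split search; this is the property that underlies the validity of pointwise confidence intervals in the full method.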

Why the Wager and Athey paper matters: It provided the first theoretically grounded method for estimating heterogeneous treatment effects using tree-based methods, complete with valid confidence intervals. The causal forest has become the standard tool for CATE estimation in economics.

What you will do:

  • Simulate data with known treatment effect heterogeneity
  • Estimate the ATE using OLS (ignoring heterogeneity)
  • Estimate the CATE function using a causal forest
  • Assess variable importance to identify effect modifiers
  • Conduct calibration tests comparing estimated and true CATEs

Step 1: Simulate Data with Heterogeneous Treatment Effects

The DGP features a randomized experiment (treatment is independent of covariates) with treatment effect heterogeneity driven by X1 and X2.

library(grf)
library(data.table)

set.seed(2018)

n <- 4000; p <- 10
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("X", 1:p)

W <- rbinom(n, 1, 0.5)
tau_true <- 1 + 2 * X[,1] + X[,2]
mu_0 <- 2 * X[,1] + X[,2]^2 + 0.5 * X[,3] * X[,4] + sin(X[,5])
Y <- mu_0 + tau_true * W + rnorm(n)

true_ate <- mean(tau_true)
cat("Sample size:", n, "\n")
cat("Covariates:", p, "\n")
cat("Treatment rate:", round(mean(W), 3), "\n")
cat("True ATE:", round(true_ate, 3), "\n")
cat("CATE range:", round(range(tau_true), 2), "\n")
cat("CATE std:", round(sd(tau_true), 3), "\n")

Expected output:

Sample size: 4000
Covariates: 10
Treatment rate: ~50%

True CATE function: tau(x) = 1 + 2*X1 + X2
True ATE: ~1.00
CATE range: [-6.50, 8.50]
CATE std: ~2.24

Step 2: Estimate ATE with OLS (Ignoring Heterogeneity)

Before using causal forests, start with a simple OLS estimate that captures the average effect but misses heterogeneity.

# Simple difference in means
ols_simple <- lm(Y ~ W)
cat("Difference in means:", round(coef(ols_simple)["W"], 3),
    "(SE:", round(summary(ols_simple)$coefficients["W", 2], 3), ")\n")

# OLS with additive controls
ols_ctrl <- lm(Y ~ W + X)
cat("OLS + controls:", round(coef(ols_ctrl)["W"], 3),
    "(SE:", round(summary(ols_ctrl)$coefficients["W", 2], 3), ")\n")

cat("True ATE:", round(true_ate, 3), "\n")

Expected output:

Model                  ATE      SE
Difference in means    ~1.00    ~0.07
OLS + controls         ~1.00    ~0.05
True ATE               1.00     ---
Concept Check

In a randomized experiment, why would you want to estimate the CATE function tau(x) rather than just the ATE?


Step 3: Estimate CATE with Causal Forest

The causal forest estimates tau(x) = E[Y(1) - Y(0) | X = x] by splitting the covariate space to maximize treatment effect heterogeneity.

# Fit causal forest using grf
cf <- causal_forest(X, Y, W,
                    num.trees = 2000,
                    min.node.size = 5,
                    seed = 42)

# Predict CATE
tau_hat <- predict(cf)$predictions

# ATE
ate_cf <- average_treatment_effect(cf)
cat("=== Causal Forest ===\n")
cat("ATE:", ate_cf["estimate"], "(SE:", ate_cf["std.err"], ")\n")
cat("True ATE:", true_ate, "\n")

# CATE accuracy
mse <- mean((tau_hat - tau_true)^2)
corr <- cor(tau_hat, tau_true)
cat("CATE MSE:", round(mse, 4), "\n")
cat("CATE correlation:", round(corr, 4), "\n")
Requires: grf

Expected output:

Metric                 Value
ATE (causal forest)    ~1.00
ATE 95% CI             [0.87, 1.13]
True ATE               1.00
CATE MSE               ~0.45
CATE correlation       ~0.92

Step 4: Variable Importance and Calibration

# Variable importance
vi <- variable_importance(cf)
cat("=== Variable Importance ===\n")
vi_df <- data.frame(Variable = colnames(X), Importance = vi)
vi_df <- vi_df[order(-vi_df$Importance), ]
print(vi_df)

# Omnibus calibration test from grf
cal_test <- test_calibration(cf)
cat("\n=== Calibration ===\n")
print(cal_test)

# In a simulation we can also regress the true CATE on the estimate
# (ideal: intercept 0, slope 1)
cal_lm <- lm(tau_true ~ tau_hat)
cat("Intercept:", round(coef(cal_lm)[1], 3), "\n")
cat("Slope:", round(coef(cal_lm)[2], 3), "\n")
cat("R-squared:", round(summary(cal_lm)$r.squared, 3), "\n")

Expected output — Variable importance:

Covariate   Importance / Corr   True Corr   Note
X1          ~0.85               0.89        True effect modifier
X2          ~0.42               0.45        True effect modifier
X3          ~0.02               0.00        Noise
X4–X10      ~0.01               0.00        Noise

Calibration (true CATE regressed on estimated CATE):

Intercept: ~0.05 (ideal: 0)
Slope:     ~0.95 (ideal: 1)
R-squared: ~0.85
Concept Check

The causal forest correctly identifies X1 and X2 as treatment effect modifiers, even though X3–X5 also affect Y through the baseline function mu_0. Why does the causal forest distinguish between outcome predictors and effect modifiers?


Step 5: Sorted-Group Analysis and Final Comparison

# Sorted-group analysis: bin units by estimated CATE quintile
tau_q <- cut(tau_hat, breaks = quantile(tau_hat, seq(0, 1, 0.2)),
             include.lowest = TRUE, labels = 1:5)
cat("=== CATE by Quintile ===\n")
for (q in 1:5) {
  idx <- tau_q == q
  dim_q <- mean(Y[idx & W == 1]) - mean(Y[idx & W == 0])
  cat("Q", q, ": hat =", round(mean(tau_hat[idx]), 3),
      ", true =", round(mean(tau_true[idx]), 3),
      ", diff-in-means =", round(dim_q, 3), "\n")
}

cat("\nATE (causal forest):", round(ate_cf["estimate"], 3), "\n")
cat("True ATE:", round(true_ate, 3), "\n")

Expected output — Sorted-group analysis:

Quintile      Mean tau_hat   Mean tau_true   Diff-in-means
1 (lowest)    -2.30          -2.15           -2.10
2             -0.40          -0.35           -0.50
3              0.95           1.00            0.90
4              2.30           2.40            2.55
5 (highest)    4.50           4.20            4.30
Concept Check

Wager and Athey (2018) prove that causal forests are pointwise consistent and asymptotically normal. What feature makes pointwise confidence intervals valid?


Summary

The replication of Wager and Athey (2018) demonstrates:

  1. Causal forests recover the CATE function. Estimated effects correlate highly (r > 0.9) with the true treatment effect function.

  2. Variable importance identifies true effect modifiers. The causal forest correctly distinguishes between covariates that drive heterogeneity and outcome-only predictors.

  3. Good calibration. Units predicted to have high effects truly do have high effects.

  4. Valid inference. Pointwise confidence intervals are enabled by honesty and the infinitesimal jackknife.

  5. Complementary to ATE. The causal forest recovers the ATE and additionally reveals the full distribution of treatment effects.


Extension Exercises

  1. Step-function treatment effect. Set tau(x) as a step function: tau = 3 if X1 > 0, tau = -1 otherwise. How does the causal forest handle discontinuities?

  2. Observational study. Make treatment depend on covariates. Compare the causal forest with and without propensity score adjustment.

  3. Best linear projection. Use the best_linear_projection function in grf to recover a linear approximation of tau(x). Compare with the true CATE coefficients.

  4. Policy learning. Construct an optimal treatment assignment rule from the estimated CATE. Calculate the value of targeting the top 50% vs. treating everyone.

  5. Larger p. Increase covariates to p = 50 or 200 while keeping 2 true effect modifiers. How does variable selection degrade?

  6. Coverage study. Run 200 Monte Carlo simulations and check pointwise CI coverage. Does coverage approach 95%?

  7. BART comparison. Fit Bayesian Additive Regression Trees and compare CATE estimates. Discuss relative advantages.

  8. Real data application. Apply the causal forest to an RCT dataset and identify subgroups with the largest and smallest treatment effects.
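As a starting point for Exercise 4, here is a self-contained base-R sketch of the policy-value comparison. It stands in for the lab's pipeline by using the DGP's true CATE plus noise as a hypothetical estimate, and (as one can only do in simulation) evaluates each rule against the true tau.

```r
set.seed(7)
n <- 5000
x1 <- rnorm(n); x2 <- rnorm(n)
tau <- 1 + 2 * x1 + x2               # true CATE, as in the lab's DGP
tau_hat <- tau + rnorm(n, sd = 1)    # stand-in for a noisy CATE estimate

# Policy: treat the 50% of units with the highest estimated effects
treat_top <- tau_hat >= median(tau_hat)

# Per-capita value of each rule, evaluated with the true tau
value_all <- mean(tau)               # treat everyone
value_top <- mean(tau * treat_top)   # treat only targeted units
cat("Treat everyone:", round(value_all, 3), "\n")
cat("Treat top 50%: ", round(value_top, 3), "\n")
```

Targeting beats blanket treatment here because some units have negative true effects, and the noisy estimate is still informative enough to screen most of them out.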