Replication Lab: Causal Forests and Heterogeneous Treatment Effects
Replicate the key simulation results from Wager and Athey (2018) on causal forests. Simulate data with heterogeneous treatment effects, estimate ATE with OLS, estimate CATE with causal forests, assess variable importance, and conduct calibration tests.
Overview
In this replication lab, you will explore the methodology from two foundational papers on causal forests:
Athey, Susan, and Guido Imbens. 2016. "Recursive Partitioning for Heterogeneous Causal Effects." Proceedings of the National Academy of Sciences 113(27): 7353–7360.
Wager, Stefan, and Susan Athey. 2018. "Estimation and Inference of Heterogeneous Treatment Effects using Random Forests." Journal of the American Statistical Association 113(523): 1228–1242.
Causal forests extend random forests from prediction to causal inference. Instead of predicting Y, a causal forest estimates the conditional average treatment effect (CATE): tau(x) = E[Y(1) - Y(0) | X = x]. The method splits the covariate space to maximize treatment effect heterogeneity rather than outcome prediction accuracy. Combined with honesty (using separate subsamples to choose splits and to estimate effects within leaves), causal forests produce consistent and pointwise asymptotically normal CATE estimates.
Why the Wager and Athey (2018) paper matters: It provided the first theoretically grounded method for estimating heterogeneous treatment effects using tree-based methods, complete with valid confidence intervals. The causal forest has become one of the most widely used tools for CATE estimation in economics.
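Because honesty is central to the method, a toy sketch may help fix ideas before the lab: one half of the sample chooses a single split point, and the other half estimates the within-leaf treatment effects on fresh data. This is a conceptual base-R illustration only, not the grf algorithm; all variable names are illustrative.

```r
set.seed(1)
n <- 2000
x <- rnorm(n)                                 # one covariate
w <- rbinom(n, 1, 0.5)                        # randomized treatment
y <- x + ifelse(x > 0, 2, 0) * w + rnorm(n)   # true effect: 2 if x > 0, else 0

# Honest split: half A chooses the split, half B estimates the leaf effects
a <- sample(n, n / 2)
b <- setdiff(seq_len(n), a)
leaf_effect <- function(idx) mean(y[idx][w[idx] == 1]) - mean(y[idx][w[idx] == 0])

# On half A, pick the candidate split that maximizes effect heterogeneity
cands <- quantile(x[a], seq(0.1, 0.9, 0.1))
gap <- sapply(cands, function(s) abs(leaf_effect(a[x[a] > s]) - leaf_effect(a[x[a] <= s])))
split <- cands[which.max(gap)]

# On half B, estimate the two leaf effects with data not used for splitting
tau_left  <- leaf_effect(b[x[b] <= split])
tau_right <- leaf_effect(b[x[b] > split])
cat("split at", round(split, 2),
    "| tau_left =", round(tau_left, 2),
    "| tau_right =", round(tau_right, 2), "\n")
```

Without honesty, the same observations would both pick the split and estimate the effects, biasing the leaf estimates toward exaggerated heterogeneity; grf automates this sample splitting across many subsampled trees.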
What you will do:
- Simulate data with known treatment effect heterogeneity
- Estimate the ATE using OLS (ignoring heterogeneity)
- Estimate the CATE function using a causal forest
- Assess variable importance to identify effect modifiers
- Conduct calibration tests comparing estimated and true CATEs
Step 1: Simulate Data with Heterogeneous Treatment Effects
The DGP features a randomized experiment (treatment is independent of covariates) with treatment effect heterogeneity driven by X1 and X2.
library(grf)
library(data.table)
set.seed(2018)
# DGP parameters: 4000 observations, 10 covariates
n <- 4000; p <- 10
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("X", 1:p)
# Randomized treatment with 50% probability
W <- rbinom(n, 1, 0.5)
# True CATE depends only on X1 and X2
tau_true <- 1 + 2 * X[,1] + X[,2]
# Baseline outcome is nonlinear in X (creates a challenging estimation problem)
mu_0 <- 2 * X[,1] + X[,2]^2 + 0.5 * X[,3] * X[,4] + sin(X[,5])
# Observed outcome = baseline + treatment effect * treatment + noise
Y <- mu_0 + tau_true * W + rnorm(n)
true_ate <- mean(tau_true)
cat("n =", n, ", p =", p, "\n")
cat("True ATE:", round(true_ate, 3), "\n")
cat("CATE range:", round(range(tau_true), 2), "\n")
Expected output:
n = 4000 , p = 10
True ATE: ~1.00 (tau(x) = 1 + 2*X1 + X2, so E[tau(X)] = 1)
CATE range: roughly -6.5 to 8.5
CATE sd: sqrt(5) ≈ 2.24
Treatment rate: ~50%
Step 2: Estimate ATE with OLS (Ignoring Heterogeneity)
Before using causal forests, start with a simple OLS estimate that captures the average effect but misses heterogeneity.
# Simple difference in means (unbiased in an RCT)
ols_simple <- lm(Y ~ W)
cat("Difference in means:", coef(ols_simple)["W"], "\n")
# OLS with all X controls (reduces residual variance, improves precision)
ols_ctrl <- lm(Y ~ W + X)
cat("OLS + controls:", coef(ols_ctrl)["W"], "\n")
cat("True ATE:", true_ate, "\n")
Expected output:
| Model | ATE | SE |
|---|---|---|
| Difference in means | ~1.00 | ~0.07 |
| OLS + controls | ~1.00 | ~0.05 |
| True ATE | ~1.00 | --- |
In a randomized experiment, why would you want to estimate the CATE function tau(x) rather than just the ATE?
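One way to see the answer: even a perfectly estimated ATE can mask large, policy-relevant differences across subgroups. A minimal base-R sketch with a simplified version of the DGP (names illustrative):

```r
set.seed(7)
n <- 10000
x1 <- rnorm(n)
w <- rbinom(n, 1, 0.5)
y <- (1 + 2 * x1) * w + rnorm(n)   # tau(x) = 1 + 2*x1, so ATE = 1

# The overall difference in means recovers the ATE...
ate_hat <- mean(y[w == 1]) - mean(y[w == 0])
# ...but the effect differs sharply across subgroups
hi <- x1 > 0
tau_hi <- mean(y[hi & w == 1]) - mean(y[hi & w == 0])
tau_lo <- mean(y[!hi & w == 1]) - mean(y[!hi & w == 0])
cat("ATE:", round(ate_hat, 2),
    "| tau(x1 > 0):", round(tau_hi, 2),
    "| tau(x1 <= 0):", round(tau_lo, 2), "\n")
```

Here treatment helps units with x1 > 0 far more than units with x1 <= 0 (true subgroup effects of about 2.6 and -0.6), so targeting treatment by x1 would beat treating everyone even though the ATE is positive.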
Step 3: Estimate CATE with Causal Forest
The causal forest estimates tau(x) = E[Y(1) - Y(0) | X = x] by splitting the covariate space to maximize treatment effect heterogeneity.
# Fit causal forest using grf (honest splitting for valid inference)
cf <- causal_forest(X, Y, W,
num.trees = 2000, # Number of trees
min.node.size = 5, # Minimum leaf size
seed = 42)
# Out-of-bag CATE predictions for each observation
tau_hat <- predict(cf)$predictions
# Forest-based ATE with valid confidence interval
ate_cf <- average_treatment_effect(cf)
cat("=== Causal Forest ===\n")
cat("ATE:", ate_cf["estimate"], "(SE:", ate_cf["std.err"], ")\n")
cat("True ATE:", true_ate, "\n")
# Evaluate CATE accuracy against the known true effects
mse <- mean((tau_hat - tau_true)^2)
corr <- cor(tau_hat, tau_true)
cat("CATE MSE:", round(mse, 4), "\n")
cat("CATE correlation:", round(corr, 4), "\n")
Expected output:
| Metric | Value |
|---|---|
| ATE (causal forest) | ~1.00 |
| ATE 95% CI | [0.87, 1.13] |
| True ATE | 1.00 |
| CATE MSE | ~0.45 |
| CATE correlation | ~0.92 |
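Beyond point estimates, grf reports pointwise variances from the infinitesimal jackknife via predict(..., estimate.variance = TRUE); these yield the confidence intervals that Wager and Athey (2018) justify. A self-contained sketch on a smaller simulation (requires the grf package; the coverage figure is indicative, not exact):

```r
library(grf)
set.seed(42)
n <- 2000; p <- 5
X <- matrix(rnorm(n * p), n, p)
W <- rbinom(n, 1, 0.5)
tau <- 1 + 2 * X[, 1]                       # true CATE
Y <- X[, 1] + tau * W + rnorm(n)

cf_small <- causal_forest(X, Y, W, num.trees = 1000, seed = 42)
pred <- predict(cf_small, estimate.variance = TRUE)   # OOB predictions + variances
se_hat <- sqrt(pred$variance.estimates)               # infinitesimal-jackknife SEs

# Empirical coverage of the true CATE by pointwise 95% intervals
covered <- tau >= pred$predictions - 1.96 * se_hat &
           tau <= pred$predictions + 1.96 * se_hat
cat("Pointwise 95% CI coverage:", round(mean(covered), 3), "\n")
```

In finite samples coverage can fall somewhat short of the nominal 95%, which is exactly what the coverage-study extension exercise below investigates systematically.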
Step 4: Variable Importance and Calibration
# Variable importance: how much each covariate drives splitting
vi <- variable_importance(cf)
cat("=== Variable Importance ===\n")
vi_df <- data.frame(Variable = colnames(X), Importance = vi)
vi_df <- vi_df[order(-vi_df$Importance), ] # Sort by importance
print(vi_df)
# Omnibus calibration test: "mean.forest.prediction" near 1 means the average
# prediction is correct; "differential.forest.prediction" near 1 means the
# forest captures the underlying heterogeneity
cal_test <- test_calibration(cf)
cat("\n=== Calibration ===\n")
print(cal_test)
# Because tau_true is known in simulation, also regress truth on predictions
# (ideal: intercept 0, slope 1)
cal_lm <- lm(tau_true ~ tau_hat)
print(summary(cal_lm)$coefficients)
Expected output — Variable importance:
| Covariate | Importance | Corr with true tau | Note |
|---|---|---|---|
| X1 | ~0.85 | 0.89 | True effect modifier |
| X2 | ~0.42 | 0.45 | True effect modifier |
| X3–X5 | ~0.01–0.02 | 0.00 | Affect Y, but not tau |
| X6–X10 | ~0.01 | 0.00 | Pure noise |
Calibration (regression of tau_true on tau_hat):
Intercept: ~0.05 (ideal: 0)
Slope: ~0.95 (ideal: 1)
R-squared: ~0.85
The causal forest correctly identifies X1 and X2 as treatment effect modifiers, even though X3–X5 also affect Y. Why does the causal forest distinguish between outcome predictors and effect modifiers?
Step 5: Sorted-Group Analysis and Final Comparison
# Sorted-group analysis: bin observations by estimated CATE quintile
tau_q <- cut(tau_hat, breaks = quantile(tau_hat, seq(0,1,0.2)),
include.lowest = TRUE, labels = 1:5)
cat("=== CATE by Quintile ===\n")
# Compare predicted vs. true mean CATE within each quintile
for (q in 1:5) {
  idx <- tau_q == q
  # Within-quintile difference in means (valid here because W is randomized)
  dim_q <- mean(Y[idx & W == 1]) - mean(Y[idx & W == 0])
  cat("Q", q, ": hat =", round(mean(tau_hat[idx]), 3),
      ", true =", round(mean(tau_true[idx]), 3),
      ", diff-in-means =", round(dim_q, 3), "\n")
}
cat("\nATE (causal forest):", round(ate_cf["estimate"], 3), "\n")
cat("True ATE:", round(true_ate, 3), "\n")
Expected output — Sorted-group analysis:
| Quintile | Mean tau_hat | Mean tau_true | Diff-in-means |
|---|---|---|---|
| 1 (lowest) | -2.30 | -2.15 | -2.10 |
| 2 | -0.40 | -0.35 | -0.50 |
| 3 | 0.95 | 1.00 | 0.90 |
| 4 | 2.30 | 2.40 | 2.55 |
| 5 (highest) | 4.50 | 4.20 | 4.30 |
Wager and Athey (2018) prove that causal forests are pointwise consistent and asymptotically normal. What feature makes pointwise confidence intervals valid?
Summary
The replication of Wager and Athey (2018) demonstrates:
- Causal forests recover the CATE function. Estimated effects correlate highly (r > 0.9) with the true treatment effect function.
- Variable importance identifies true effect modifiers. The causal forest correctly distinguishes between covariates that drive heterogeneity and outcome-only predictors.
- Good calibration. Units predicted to have high effects truly do have high effects.
- Valid inference. Pointwise confidence intervals are enabled by honesty and the infinitesimal jackknife.
- Complementary to ATE. The causal forest recovers the ATE and additionally reveals the full distribution of treatment effects.
Extension Exercises
- Binary treatment effect. Set tau(x) as a step function: tau = 3 if X1 > 0, tau = -1 otherwise. How does the causal forest handle discontinuities?
- Observational study. Make treatment depend on covariates. Compare the causal forest with and without propensity score adjustment.
- Best linear projection. Use the best_linear_projection function in grf to recover a linear approximation of tau(x). Compare with the true CATE coefficients.
- Policy learning. Construct an optimal treatment assignment rule from the estimated CATE. Calculate the value of targeting the top 50% vs. treating everyone.
- Larger p. Increase covariates to p = 50 or 200 while keeping 2 true effect modifiers. How does variable selection degrade?
- Coverage study. Run 200 Monte Carlo simulations and check pointwise CI coverage. Does coverage approach 95%?
- BART comparison. Fit Bayesian Additive Regression Trees (BART) and compare CATE estimates. Discuss relative advantages.
- Real data application. Apply the causal forest to an RCT dataset and identify subgroups with the largest and smallest treatment effects.