
Replication · 120 minutes

Replication Lab: Distributional Effects of Welfare Reform

Replicate key findings from Bitler, Gelbach, and Hoynes (2006) on the distributional effects of welfare reform. Simulate experimental data matching the Jobs First program, estimate quantile treatment effects across the earnings distribution, and compare with the OLS average treatment effect.

Method: Quantile Treatment Effects (QTE)

Languages: Python, R, Stata

Dataset: Simulated to match Jobs First welfare reform experimental data

Overview

In this replication lab, you will reproduce the main results from an influential paper that demonstrated why average treatment effects can be misleading:

Bitler, Marianne P., Jonah B. Gelbach, and Hilary W. Hoynes. 2006. "What Mean Impacts Miss: Distributional Effects of Welfare Reform Experiments." American Economic Review 96(4): 988–1012.

Bitler, Gelbach, and Hoynes (BGH) examined Connecticut's Jobs First welfare reform experiment, which combined a more generous earnings disregard with a time limit, in contrast to the standard Aid to Families with Dependent Children (AFDC) program. The headline finding: while the average (mean) effect on earnings was modest and only marginally significant, quantile treatment effects revealed substantial heterogeneity. The effects were zero at the bottom of the earnings distribution (where both groups had zero earnings), positive in the middle (as the program drew non-workers into employment), and negative at the top (where the time limit and benefit structure reduced work incentives for those already earning well).

Why this paper matters: It provided a methodological template for examining treatment effect heterogeneity using quantile treatment effects (QTEs) and demonstrated that reporting only average effects can obscure policy-relevant heterogeneity.

What you will do:

Learn why simulation is used when administrative experimental data are unavailable
Simulate experimental data matching the Jobs First treatment-control structure
Estimate the OLS average treatment effect
Estimate quantile treatment effects at multiple quantiles
Test for heterogeneity across the distribution
Compare QTE patterns with the RIF regression approach

Step 1: Simulate the Jobs First Experimental Data

The original Jobs First experiment randomly assigned 6,606 welfare recipients in Connecticut to either the Jobs First program (treatment) or the standard AFDC program (control). We simulate earnings data designed to mimic the key distributional patterns.

```r
library(quantreg)

set.seed(2006)
n <- 6606  # Match original sample size

# Random assignment to treatment (Jobs First) vs control (AFDC)
treat <- rbinom(n, 1, 0.5)
n_treat <- sum(treat)
n_control <- sum(1 - treat)

# Baseline characteristics (balanced by randomization)
age <- round(rnorm(n, 30, 6))
educ_years <- round(pmin(pmax(rnorm(n, 11, 2), 6), 16))
n_children <- rpois(n, 1.8) + 1
prior_earnings <- pmax(rnorm(n, 3000, 4000), 0)

# Earnings DGP with heterogeneous treatment effects
# Control group: mixture of zeros and log-normals
u <- runif(n)
latent_type <- cut(u, breaks = c(0, 0.35, 0.70, 1.0),
                   labels = c("non-worker", "low-earner", "high-earner"))

# Control (AFDC) earnings
earnings_control <- ifelse(
  latent_type == "non-worker", 0,
  ifelse(latent_type == "low-earner",
         exp(rnorm(n, 7.5, 0.8)),   # median ~$1,800
         exp(rnorm(n, 9.2, 0.6)))   # median ~$9,900
)

# Treatment effects vary by type:
#   Non-workers:  positive (drawn into work), ~+$1,500
#   Low-earners:  positive (higher disregard), ~+$800
#   High-earners: negative (time limit effect), ~-$1,200
te <- ifelse(
  latent_type == "non-worker", pmax(rnorm(n, 1500, 1000), 0),
  ifelse(latent_type == "low-earner",
         rnorm(n, 800, 600),
         rnorm(n, -1200, 800))
)

# Observed earnings, floored at zero
earnings <- ifelse(treat == 1,
                   pmax(earnings_control + te, 0),
                   pmax(earnings_control, 0))

df <- data.frame(earnings, treat, age, educ_years,
                 n_children, prior_earnings)

cat("=== Sample Summary ===\n")
cat("Treatment:", n_treat, "  Control:", n_control, "\n")
cat("\n=== Earnings by Group ===\n")
cat("Control mean:", round(mean(df$earnings[df$treat == 0]), 0), "\n")
cat("Treatment mean:", round(mean(df$earnings[df$treat == 1]), 0), "\n")
cat("Difference:", round(mean(df$earnings[df$treat == 1]) -
                         mean(df$earnings[df$treat == 0]), 0), "\n")
cat("\n=== Earnings Distribution ===\n")
cat("% with zero earnings (control):",
    round(mean(df$earnings[df$treat == 0] == 0) * 100, 1), "\n")
cat("% with zero earnings (treatment):",
    round(mean(df$earnings[df$treat == 1] == 0) * 100, 1), "\n")
```

Expected output: Sample summary

Sample composition (Published: Treatment = 3,315, Control = 3,291):

Group	N	Mean Earnings
Treatment (Jobs First)	~3,250–3,400	~$4,500–5,500
Control (AFDC)	~3,200–3,350	~$4,300–5,300
Difference	n/a	~$100–300

The average treatment effect is modest — a few hundred dollars. This masks the heterogeneity we will uncover in the quantile analysis.
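Where the "few hundred dollars" comes from can be read off the Step 1 DGP directly. The sketch below averages the three types' built-in effects by hand, using the standard formula for the mean of a normal variable floored at zero for the non-workers' `pmax(rnorm(n, 1500, 1000), 0)` draws. It ignores the zero floor on treated low- and high-earners' earnings, so it slightly understates the population mean difference; the names `shares`, `mean_te`, `contrib`, and `implied_ate` are illustrative.

```r
# Back-of-the-envelope average effect implied by the Step 1 DGP (sketch).
# Ignores the zero floor on treated low- and high-earners' earnings,
# so it slightly understates the population mean difference.
shares <- c(non_worker = 0.35, low_earner = 0.35, high_earner = 0.30)

# Non-workers' effect is pmax(X, 0) with X ~ N(1500, 1000); for a normal X,
# E[max(X, 0)] = mu * pnorm(mu / s) + s * dnorm(mu / s).
mu <- 1500
s <- 1000
mean_te <- c(non_worker  = mu * pnorm(mu / s) + s * dnorm(mu / s),
             low_earner  = 800,
             high_earner = -1200)

contrib <- shares * mean_te   # each type's contribution to the average
implied_ate <- sum(contrib)

print(round(contrib))
cat("Implied average effect:", round(implied_ate), "\n")
```

The two positive contributions are partly cancelled by the high-earners' negative one, which is exactly the offsetting a single mean cannot reveal.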

Step 2: Estimate the OLS Average Treatment Effect

```r
# Model 1: Simple difference in means (ATE)
m_ate <- lm(earnings ~ treat, data = df)

# Model 2: With covariates
m_ate_cov <- lm(earnings ~ treat + age + educ_years +
                  n_children + prior_earnings, data = df)

cat("=== OLS Average Treatment Effect ===\n")
cat("\nNo controls:\n")
cat("  ATE:", round(coef(m_ate)["treat"], 0),
    " SE:", round(summary(m_ate)$coefficients["treat", 2], 0),
    " p:", round(summary(m_ate)$coefficients["treat", 4], 3), "\n")

cat("\nWith controls:\n")
cat("  ATE:", round(coef(m_ate_cov)["treat"], 0),
    " SE:", round(summary(m_ate_cov)$coefficients["treat", 2], 0),
    " p:", round(summary(m_ate_cov)$coefficients["treat", 4], 3), "\n")

cat("\nPublished ATE (8-quarter earnings): ~$350-550\n")
cat("Published significance: marginally significant or insignificant\n")

cat("\n=== The Problem with Averages ===\n")
cat("The ATE hides potentially important heterogeneity.\n")
cat("The treatment may help some and hurt others,\n")
cat("with effects canceling out in the mean.\n")
```

Expected output: OLS average treatment effect

OLS average treatment effect on earnings:

Specification	ATE	SE	p-value
No controls	~$100–400	~$150–250	~0.05–0.30
With controls	~$100–400	~$145–240	~0.04–0.25

The ATE is small and may or may not reach conventional significance levels. This is the "what mean impacts miss" problem identified by BGH: in their data, a small average masks zero effects at the very bottom (where both groups have zero earnings), positive effects in the middle quantiles (where the program draws non-workers into employment), and negative effects at the top (where time limits reduce work incentives).

Step 3: Estimate Quantile Treatment Effects

The key innovation of BGH is to estimate treatment effects at multiple quantiles of the earnings distribution, revealing the full pattern of heterogeneity.
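For a randomized binary treatment without covariates, the QTE at quantile τ is simply the difference between the τ-th quantiles of the treatment and control earnings distributions. The base-R sketch below computes it directly on illustrative data (not the Jobs First simulation) in which treatment compresses the distribution around the same median; the helper `qte_by_quantiles()` is hypothetical.

```r
# QTE(tau) = Q_{Y|D=1}(tau) - Q_{Y|D=0}(tau): the gap between the tau-th
# quantiles of the two arms' marginal outcome distributions.
qte_by_quantiles <- function(y, d, taus) {
  q1 <- quantile(y[d == 1], probs = taus, type = 1, names = FALSE)
  q0 <- quantile(y[d == 0], probs = taus, type = 1, names = FALSE)
  setNames(q1 - q0, paste0("tau=", taus))
}

set.seed(42)
n <- 4000
d <- rbinom(n, 1, 0.5)
# Illustrative DGP: treatment shrinks the spread, raising low quantiles
# and lowering high ones
y <- ifelse(d == 1, exp(rnorm(n, 8, 0.5)), exp(rnorm(n, 8, 1.0)))

taus <- c(0.10, 0.25, 0.50, 0.75, 0.90)
qte_hat <- qte_by_quantiles(y, d, taus)
print(round(qte_hat))
cat("Difference in means:", round(mean(y[d == 1]) - mean(y[d == 0])), "\n")
```

In the lab, `rq(earnings ~ treat, tau = tau)` targets the same quantity: with a single dummy regressor, the intercept is the control group's τ-th quantile and intercept plus slope is the treated group's, up to the non-uniqueness of sample quantiles when n·τ is an integer.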

```r
# Quantile treatment effects on a grid of tau from 0.10 to 0.90
taus <- c(0.10, 0.15, 0.20, 0.25, 0.30, 0.40, 0.50,
          0.60, 0.70, 0.75, 0.80, 0.85, 0.90)

qte_results <- data.frame(
  tau = taus,
  qte = NA, se = NA, ci_lower = NA, ci_upper = NA
)

for (i in seq_along(taus)) {
  # Coefficient on treat = QTE at taus[i]
  qr_fit <- rq(earnings ~ treat, tau = taus[i], data = df)
  qr_sum <- summary(qr_fit, se = "boot", R = 200)  # bootstrap SEs
  qte_results$qte[i] <- coef(qr_fit)["treat"]
  qte_results$se[i] <- qr_sum$coefficients["treat", 2]
  # Normal-approximation 95% confidence interval
  qte_results$ci_lower[i] <- qte_results$qte[i] - 1.96 * qte_results$se[i]
  qte_results$ci_upper[i] <- qte_results$qte[i] + 1.96 * qte_results$se[i]
}

cat("=== Quantile Treatment Effects ===\n")
cat(sprintf("%-6s %10s %8s %20s\n",
            "Tau", "QTE", "SE", "95% CI"))
cat(strrep("-", 48), "\n")
for (i in seq_len(nrow(qte_results))) {
  cat(sprintf("%-6.2f %10.0f %8.0f   [%7.0f, %7.0f]\n",
              qte_results$tau[i], qte_results$qte[i],
              qte_results$se[i],
              qte_results$ci_lower[i], qte_results$ci_upper[i]))
}

cat("\nOLS ATE:", round(coef(m_ate)["treat"], 0),
    " (shown for comparison)\n")
cat("\nPattern: Positive QTEs at low quantiles,\n")
cat("         negative QTEs at high quantiles.\n")
```

Expected output: Quantile treatment effects

Quantile treatment effects on earnings:

Quantile (tau)	QTE	SE	95% CI	Significant?
0.10	~$500–1,500	~$200	[+200, +1,800]	Yes
0.25	~$400–1,200	~$250	[+100, +1,500]	Yes
0.50	~$100–500	~$250	[-200, +800]	Maybe
0.75	~-$400 to -$100	~$300	[-900, +300]	Maybe
0.90	~-$1,500 to -$600	~$400	[-2,200, -100]	Yes

OLS ATE: ~$200 (for comparison)

Key finding: In the simulated data, the modest average treatment effect masks substantial heterogeneity. At low quantiles (the bottom of the earnings distribution), the Jobs First treatment raises earnings, consistent with drawing non-workers into employment via more generous earnings disregards. At high quantiles (the top of the distribution), it lowers earnings, consistent with the time limit and benefit structure reducing work incentives for those already earning well. Bitler et al. (2006) find the same sign reversal in the actual data, except that their QTEs are zero at the very bottom, where both groups have zero earnings, and turn positive only in the middle of the distribution.
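Because the DGP is known, the target of the Step 3 estimates can be approximated directly. This sketch copies the Step 1 parameters, draws both potential outcomes for a large hypothetical population (`N = 200000` is an arbitrary choice), and differences the marginal quantiles; comparing its output with your `qte_results` shows how close the estimates come to the truth built into the simulation.

```r
# Population QTEs implied by the Step 1 DGP, approximated by simulating
# both potential outcomes for a large hypothetical population (sketch).
set.seed(123)
N <- 200000
type <- cut(runif(N), breaks = c(0, 0.35, 0.70, 1.0),
            labels = c("non-worker", "low-earner", "high-earner"))

# Untreated potential outcome (same parameters as Step 1)
y0 <- ifelse(type == "non-worker", 0,
      ifelse(type == "low-earner", exp(rnorm(N, 7.5, 0.8)),
                                   exp(rnorm(N, 9.2, 0.6))))
# Individual effects and treated potential outcome, floored at zero
te <- ifelse(type == "non-worker", pmax(rnorm(N, 1500, 1000), 0),
      ifelse(type == "low-earner", rnorm(N, 800, 600),
                                   rnorm(N, -1200, 800)))
y1 <- pmax(y0 + te, 0)

taus <- c(0.10, 0.25, 0.50, 0.75, 0.90)
true_qte <- quantile(y1, taus, names = FALSE) - quantile(y0, taus, names = FALSE)
names(true_qte) <- paste0("tau=", taus)

print(round(true_qte))
cat("Population mean effect:", round(mean(y1 - y0)), "\n")
```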

Concept Check

Suppose the OLS average treatment effect is near zero, but the QTE is positive at the 10th percentile and negative at the 90th percentile, as in our simulated data. Does this mean the program helped the poor and hurt the rich?

Yes: the QTE at tau = 0.10 tells us the effect for the poorest 10% of individuals.
Not necessarily. The QTE at tau = 0.10 is the difference between the 10th percentile of the treatment group and the 10th percentile of the control group. Without rank invariance (the assumption that individuals maintain their relative position in the distribution under both treatment and control), we cannot attribute QTEs to specific individuals.
Yes: because this is a randomized experiment, we can identify individual-level effects at each quantile.
No: the QTEs are not statistically significant, so we cannot draw conclusions.
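A deliberately extreme, hypothetical example makes the rank-invariance point concrete. If treatment reversed everyone's rank, the treated and control marginal distributions would be identical, so every QTE would be zero, even though the poorest individuals gain a great deal and the richest lose. The sketch below simulates both potential outcomes for 1,000 people in base R; real data never reveal `individual`, which is exactly why QTEs cannot be read as individual effects.

```r
# Hypothetical rank reversal: identical marginals, very different individuals.
set.seed(7)
n <- 1000
y0 <- sort(exp(rnorm(n, 8, 1)))   # untreated outcomes, poorest first
y1 <- rev(y0)                     # person i gets the (n + 1 - i)-th smallest y0

taus <- c(0.10, 0.50, 0.90)
qte <- quantile(y1, taus) - quantile(y0, taus)
individual <- y1 - y0             # individual effects (unobservable in practice)

print(qte)                        # same values in both arms, so every QTE is zero
cat("Mean effect, poorest 10%:", round(mean(individual[1:100])), "\n")
cat("Mean effect, richest 10%:", round(mean(individual[901:1000])), "\n")
```

Only under rank invariance, where each person holds the same rank in both potential-outcome distributions, does QTE(τ) equal the effect on the person at rank τ.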

Step 4: Test for Heterogeneity

We formally test whether the treatment effects differ across quantiles, that is, whether we can reject the null hypothesis that the QTE is constant across the distribution.

```r
# Test: QTE(0.10) = QTE(0.90)?
# Simultaneous quantile regression at five quantiles
sqr <- rq(earnings ~ treat, tau = c(0.10, 0.25, 0.50, 0.75, 0.90),
          data = df)
sqr_sum <- summary(sqr, se = "boot", R = 500)  # bootstrap SEs at each tau

# Wald tests for equality of the treat coefficient across quantiles.
# anova() on a multi-tau rq fit builds these from analytic sandwich
# covariances (here the kernel "ker" version) rather than the bootstrap.
# With a mass point at zero earnings these analytic SEs can be fragile,
# so lean mainly on the bootstrap test below.
anova_qr <- anova(sqr, se = "ker", joint = FALSE)
cat("=== Test: Equal QTEs Across Quantiles ===\n")
print(anova_qr)

# Manual test: QTE(0.10) vs QTE(0.90)
qte_10 <- rq(earnings ~ treat, tau = 0.10, data = df)
qte_90 <- rq(earnings ~ treat, tau = 0.90, data = df)
diff_qte <- coef(qte_10)["treat"] - coef(qte_90)["treat"]

# Nonparametric bootstrap of the difference (resampling whole rows)
set.seed(99)
boot_diff <- numeric(500)
for (b in 1:500) {
  idx <- sample(nrow(df), nrow(df), replace = TRUE)
  d_b <- df[idx, ]
  q10 <- coef(rq(earnings ~ treat, tau = 0.10, data = d_b))["treat"]
  q90 <- coef(rq(earnings ~ treat, tau = 0.90, data = d_b))["treat"]
  boot_diff[b] <- q10 - q90
}

cat("\n=== QTE(0.10) - QTE(0.90) ===\n")
cat("Difference:", round(diff_qte, 0), "\n")
cat("Bootstrap SE:", round(sd(boot_diff), 0), "\n")
cat("t-stat:", round(diff_qte / sd(boot_diff), 2), "\n")
cat("p-value:", round(2 * pnorm(-abs(diff_qte / sd(boot_diff))), 4), "\n")
cat("\nIf significant, reject the null of constant treatment\n")
cat("effects across the distribution.\n")

# Kolmogorov-Smirnov test of equal treatment and control distributions
# (ties at zero make the p-value approximate; R warns about this)
ks <- ks.test(df$earnings[df$treat == 1], df$earnings[df$treat == 0])
cat("\nKS statistic:", round(ks$statistic, 3),
    " p-value:", signif(ks$p.value, 3), "\n")
```

Expected output: Heterogeneity test

Test for constant treatment effects:

Test	Statistic	p-value
QTE(0.10) - QTE(0.90)	~$1,500–2,500	< 0.01
KS test (distributions)	~0.30–0.35 (driven by ~35% zero earnings in control vs. a few percent in treatment)	< 0.001

Conclusion: We reject the null hypothesis that the treatment effect is constant across the earnings distribution. The Jobs First program had qualitatively different effects at different points in the distribution — a finding that would be completely missed by reporting only the average treatment effect.
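The rejected null has a concrete meaning: an effect that is the same for everyone shifts the whole distribution by a constant, so every QTE equals that constant. The sketch below uses hypothetical numbers (a $400 constant effect versus +$900 below the median and -$300 above it) to contrast that pure location shift with a rank-dependent effect on the same untreated outcomes.

```r
# Constant vs rank-dependent effects on the same untreated outcomes (sketch).
set.seed(11)
y0 <- exp(rnorm(5000, 8, 1))
taus <- c(0.10, 0.25, 0.50, 0.75, 0.90)

y1_constant <- y0 + 400                                # pure location shift
y1_hetero <- y0 + ifelse(y0 < median(y0), 900, -300)   # effect depends on rank

qte_constant <- quantile(y1_constant, taus) - quantile(y0, taus)
qte_hetero <- quantile(y1_hetero, taus) - quantile(y0, taus)

print(round(rbind(constant = qte_constant, heterogeneous = qte_hetero)))
```

Under the location shift, QTE(0.10) - QTE(0.90) is zero; under the rank-dependent effect it is large, which is the contrast the bootstrap test detects.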

Step 5: RIF Regression for Unconditional Quantile Effects

An alternative to conditional quantile regression is Recentered Influence Function (RIF) regression (Firpo et al., 2009). RIF regression estimates the effect of covariates on unconditional quantiles, which has a more intuitive interpretation for policy analysis.

```r
# RIF regression (manual implementation)
# The RIF for quantile tau is:
#   RIF(y; q_tau) = q_tau + (tau - I(y <= q_tau)) / f_Y(q_tau)
# For low tau, the pooled q_tau can sit on the mass point at zero
# earnings, where this density-based RIF is poorly behaved.

compute_rif <- function(y, tau) {
  q_tau <- quantile(y, tau, names = FALSE)
  # Kernel density estimate of f_Y evaluated at q_tau
  f_q <- density(y, from = q_tau, to = q_tau, n = 1)$y
  q_tau + (tau - as.integer(y <= q_tau)) / f_q
}

cat("=== RIF Regression (Unconditional Quantile Effects) ===\n")
cat(sprintf("%-6s %12s %8s %12s %8s\n",
            "Tau", "RIF-OLS", "SE", "Cond. QR", "SE"))
cat(strrep("-", 50), "\n")

for (tau in c(0.10, 0.25, 0.50, 0.75, 0.90)) {
  # RIF-OLS (lm() SEs ignore estimation error in q_tau and f_Y(q_tau))
  rif_y <- compute_rif(df$earnings, tau)
  rif_fit <- lm(rif_y ~ treat + age + educ_years + n_children,
                data = df)
  rif_coef <- coef(rif_fit)["treat"]
  rif_se <- summary(rif_fit)$coefficients["treat", 2]

  # Conditional QR with the same covariates, for comparison
  qr_fit <- rq(earnings ~ treat + age + educ_years + n_children,
               tau = tau, data = df)
  qr_coef <- coef(qr_fit)["treat"]
  qr_se <- summary(qr_fit, se = "boot", R = 200)$coefficients["treat", 2]

  cat(sprintf("%-6.2f %12.0f %8.0f %12.0f %8.0f\n",
              tau, rif_coef, rif_se, qr_coef, qr_se))
}

cat("\nRIF-OLS estimates unconditional quantile effects.\n")
cat("Conditional QR estimates conditional quantile effects.\n")
cat("They can differ when covariates shift the distribution.\n")
```

Expected output: RIF vs. conditional quantile regression

Comparison: RIF-OLS vs. conditional quantile regression:

Tau	RIF-OLS	SE	Cond. QR	SE
0.10	~$800–1,500	~$250	~$700–1,300	~$200
0.25	~$500–1,000	~$250	~$400–900	~$250
0.50	~$100–400	~$250	~$100–500	~$250
0.75	~-$400 to $0	~$300	~-$300 to -$100	~$300
0.90	~-$1,500 to -$500	~$400	~-$1,200 to -$400	~$400

RIF-OLS estimates how treatment shifts the unconditional distribution (the policy-relevant question: "how does the program change the 10th percentile of earnings in the population?"). Conditional QR estimates the effect at the tau-th conditional quantile, which has a less direct policy interpretation.
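With a single binary regressor, RIF-OLS has a simple closed form: the slope is the difference in group means of the RIF, which reduces to (F₀(q) - F₁(q)) / f(q), where q and f are the pooled τ-quantile and density. The sketch below checks this identity on illustrative lognormal data (not the lab's DGP), reusing the lab's `density(..., n = 1)` idiom for f(q), and prints the direct difference in group quantiles alongside. The RIF slope is a first-order approximation to that direct QTE, so the two can differ when treatment moves the distribution a lot.

```r
# RIF-OLS with a single binary regressor (sketch on illustrative data).
set.seed(2009)
n <- 3000
d <- rbinom(n, 1, 0.5)
# Hypothetical outcome: treatment nudges the center up and shrinks the spread
y <- exp(rnorm(n, mean = 8 + 0.05 * d, sd = 1 - 0.1 * d))

tau <- 0.90
q <- quantile(y, tau, names = FALSE)             # pooled tau-quantile
f_q <- density(y, from = q, to = q, n = 1)$y     # pooled density at q
rif <- q + (tau - as.numeric(y <= q)) / f_q

b_rif <- unname(coef(lm(rif ~ d))["d"])
# Closed form: difference in group means of the RIF
b_closed <- (mean(y[d == 0] <= q) - mean(y[d == 1] <= q)) / f_q
# Direct difference in group quantiles, for comparison
qte_direct <- quantile(y[d == 1], tau, names = FALSE) -
  quantile(y[d == 0], tau, names = FALSE)

cat("RIF-OLS slope:", round(b_rif), "  closed form:", round(b_closed),
    "  direct QTE:", round(qte_direct), "\n")
```

Relatedly, when treatment is randomized and no covariates are included, conditional QR of earnings on `treat` alone also targets the unconditional QTE; the two approaches part ways once covariates enter the model.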

Concept Check

What is the key advantage of RIF regression over standard conditional quantile regression for policy evaluation?

RIF regression is computationally faster.
RIF regression estimates effects on unconditional quantiles of the outcome distribution, which directly answer the policy question "how does the program change the 10th percentile of earnings in the population?" Conditional quantile regression answers a different question about within-group quantiles.
RIF regression does not require the rank invariance assumption.
RIF regression produces consistent estimates even when the model is misspecified.

Step 6: Compare with Published Results

```r
# Uses m_ate (Step 2) and qte_10, qte_90, diff_qte, boot_diff (Step 4)
yes_no <- function(x) ifelse(x, "Yes", "No")
het_p <- 2 * pnorm(-abs(diff_qte / sd(boot_diff)))

cat("==========================================================\n")
cat("COMPARISON: Our Replication vs. BGH (2006)\n")
cat("==========================================================\n")
cat(sprintf("%-40s %12s %12s\n", "Finding", "Published", "Ours"))
cat("----------------------------------------------------------\n")
cat(sprintf("%-40s %12s %12.0f\n", "ATE (mean effect)",
            "~$400", coef(m_ate)["treat"]))
cat(sprintf("%-40s %12s %12s\n", "ATE significant?",
            "Marginal",
            ifelse(summary(m_ate)$coefficients["treat", 4] < 0.05,
                   "Yes", "No/Marginal")))
# BGH's QTEs are zero at the very bottom, where both groups earn nothing;
# our DGP draws most non-workers into work, so ours are positive there
cat(sprintf("%-40s %12s %12s\n", "QTE(0.10) positive?", "No (zero)",
            yes_no(coef(qte_10)["treat"] > 0)))
cat(sprintf("%-40s %12s %12s\n", "QTE(0.90) negative?", "Yes",
            yes_no(coef(qte_90)["treat"] < 0)))
cat(sprintf("%-40s %12s %12s\n", "Heterogeneity significant?",
            "Yes", yes_no(het_p < 0.05)))
cat("----------------------------------------------------------\n")
cat("\nBGH's qualitative conclusions:\n")
cat("1. Small/insignificant ATE masks important heterogeneity\n")
cat("2. No effect at the bottom, positive in the middle (employment entry)\n")
cat("3. Negative effects at the top (reduced work incentives)\n")
```

Error Detective

Read the analysis below carefully and identify the errors.

A researcher evaluates a job training program using experimental data (N = 2,000, randomly assigned). They estimate quantile treatment effects at tau = 0.25, 0.50, and 0.75 and find:

QTE(0.25) = $800 (p = 0.02), QTE(0.50) = $200 (p = 0.45), QTE(0.75) = -$500 (p = 0.08)

They interpret: "The program increases earnings by $800 for workers in the bottom quartile of skills, has no effect on median workers, and reduces earnings by $500 for workers in the top quartile. This shows the program helps low-skilled workers but hurts high-skilled workers. We recommend targeting the program to the bottom quartile."

Select all errors you can find:

Interpreting conditional quantile effects as effects on specific subgroups of workers (Interpretation of QTEs)

Making targeting recommendations based on quantile regression without identifying who benefits (Policy recommendation)

Testing only three quantiles and missing potentially important non-monotonic patterns (Insufficient distributional analysis)

Summary

Our replication confirms the central message of Bitler et al. (2006):

Mean impacts miss important heterogeneity. The average treatment effect of the Jobs First program is small and only marginally significant, yet it averages over effects of opposite sign in different parts of the earnings distribution.
QTEs reveal the full picture. The pattern of zero effects at the bottom, positive effects in the middle, and negative effects at the top is consistent with the program design: generous earnings disregards drew non-workers into employment (positive in the middle), while time limits reduced work incentives for those already earning well (negative at the top).
Interpretation requires care. QTEs describe how the treatment shifts the shape of the distribution. Without the rank invariance assumption, they cannot be interpreted as effects on identifiable individuals or subgroups.
RIF regression provides unconditional effects. For policy purposes, unconditional quantile partial effects (via RIF regression) are often more directly relevant than conditional quantile effects.

Extension Exercises

Conditional quantile treatment effects. Estimate QTEs separately for subgroups defined by prior earnings or education. Does the distributional pattern differ?
Distributional decomposition. Use the Firpo et al. (2009) decomposition to separate the composition effect from the structural effect.
Counterfactual distributions. Construct the entire counterfactual earnings distribution under no treatment and compare with the observed treatment distribution.
Causal forests for heterogeneity. Use a causal forest (Athey and Imbens, 2019) to identify which observable characteristics predict treatment effect heterogeneity. Compare with the QTE approach.
Power analysis. Given the estimated QTEs and their standard errors, compute the minimum detectable effect at each quantile for a study with N = 3,000 vs. N = 10,000.
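A possible starting point for the power exercise is the usual minimum detectable effect formula, MDE = (z₁₋α/₂ + z_power) × SE, together with the assumption that each SE scales as 1/√N with the treatment share held fixed. The SEs below are illustrative values read off the approximate Step 3 expected-output table (about $200, $250, and $400 at τ = 0.10, 0.50, 0.90 with N = 6,606); substitute your own estimates. The helper `mde_at()` is hypothetical.

```r
# Minimum detectable effect sketch: MDE = (z_{1 - alpha/2} + z_power) * SE,
# assuming SE scales as 1 / sqrt(N) with the treatment share held fixed.
alpha <- 0.05
target_power <- 0.80
multiplier <- qnorm(1 - alpha / 2) + qnorm(target_power)

n_ref <- 6606
# Illustrative SEs from the approximate Step 3 expected-output table
se_ref <- c(tau_0.10 = 200, tau_0.50 = 250, tau_0.90 = 400)

mde_at <- function(n_new) multiplier * se_ref * sqrt(n_ref / n_new)

mde <- rbind(N_3000 = mde_at(3000), N_10000 = mde_at(10000))
print(round(mde))
```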
