MethodAtlas
Replication · 120 minutes

Replication Lab: Angrist & Lavy (1999) Maimonides' Rule and Class Size

Replicate the Angrist & Lavy (1999) fuzzy regression discontinuity analysis of the effect of class size on student achievement. Exploit Maimonides' rule (maximum 40 students per class) to estimate local average treatment effects at enrollment cutoffs.

Overview

Angrist and Lavy's 1999 paper "Using Maimonides' Rule to Estimate the Effect of Class Size on Scholastic Achievement" (Quarterly Journal of Economics, 114(2), 533–575; DOI: 10.1162/003355399556061) is a landmark study in the regression discontinuity literature. The paper exploits a centuries-old rule attributed to the 12th-century scholar Maimonides, which caps class size at 40 students. When enrollment crosses multiples of 40, an additional class must be formed, creating discontinuous drops in predicted class size.

Key findings:

  • Class size has a negative causal effect on test scores (smaller classes improve achievement)
  • The effect is identified at the enrollment cutoffs (41, 81, 121, etc.)
  • Compliance with the rule is imperfect (fuzzy RD), requiring IV estimation
  • The estimated LATE at the first cutoff suggests that reducing class size by one student raises test scores by on the order of 0.1–0.3 standard deviations, depending on the specification

What you will learn:

  • How fuzzy RD differs from sharp RD (imperfect compliance at the cutoff)
  • How to implement the first stage, reduced form, and fuzzy RD estimation
  • How to use rdrobust for bias-corrected inference
  • How to assess bandwidth sensitivity
  • How to test validity assumptions (manipulation, covariate balance)

Prerequisites: Sharp RDD (see the RDD tutorial lab), instrumental variables concepts.


Step 1: Understanding Maimonides' Rule

Maimonides' rule states that a class should have no more than 40 students. This rule creates a deterministic function mapping enrollment to predicted class size:

  • Enrollment 1-40: 1 class, predicted size = enrollment
  • Enrollment 41-80: 2 classes, predicted size = enrollment/2
  • Enrollment 81-120: 3 classes, predicted size = enrollment/3

The predicted class size function is: f(enrollment) = enrollment / ceil(enrollment / 40)

At enrollment = 41, predicted class size drops from 40 to 20.5. This sharp discontinuity in predicted class size creates a fuzzy RD because actual class size does not jump as dramatically (schools have discretion).
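The formula can be checked directly with a quick sketch (the helper `f` below simply restates the rule given above):

```r
# Predicted class size under Maimonides' rule: enrollment / ceil(enrollment / 40)
f <- function(enrollment) enrollment / ceiling(enrollment / 40)

f(40)  # 40: one class of 40
f(41)  # 20.5: two classes, the discontinuous drop
f(80)  # 40: two classes of 40
f(81)  # 27: three classes
```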

library(rdrobust)
library(modelsummary)
library(AER)

# Simulate school-level data with Maimonides' rule
set.seed(42)
n_schools <- 2000

enrollment <- c(
sample(20:59, 800, replace = TRUE),
sample(60:99, 600, replace = TRUE),
sample(100:139, 400, replace = TRUE),
sample(10:160, 200, replace = TRUE)
)
enrollment <- pmin(pmax(sample(enrollment, n_schools), 10), 160)

# Maimonides' predicted class size
n_classes_pred <- ceiling(enrollment / 40)
pred_classsize <- enrollment / n_classes_pred

# Actual class size (fuzzy compliance)
actual_classsize <- pmin(pmax(pred_classsize + rnorm(n_schools, 0, 3) + 2, 10), 45)

pct_disadvantaged <- pmin(rbeta(n_schools, 2, 5), 1)
school_quality <- rnorm(n_schools)

test_score <- 70 - 0.3 * actual_classsize + 5 * school_quality -
            10 * pct_disadvantaged + rnorm(n_schools, 0, 5)

df <- data.frame(enrollment, pred_classsize, actual_classsize,
               test_score, pct_disadvantaged, n_classes = n_classes_pred,
               school_quality)

cat("Sample size:", n_schools, "\n")
cat("Correlation (predicted, actual):", cor(pred_classsize, actual_classsize), "\n")
summary(df[, c("enrollment", "pred_classsize", "actual_classsize", "test_score")])

Expected output (illustrative first rows of df):

  enrollment  pred_classsize  actual_classsize  test_score  pct_disadvantaged
1         35            35.0              36.2        63.5               0.18
2         52            26.0              28.8        68.1               0.22
3         41            20.5              23.1        71.4               0.14
4         78            39.0              40.5        55.2               0.35
5        105            35.0              36.8        62.8               0.28

Summary statistics:

Statistic                          Value
Sample size                        2,000 schools
Enrollment range                   [10, 160]
Mean test score                    ~60
Predicted class size (mean)        ~28
Actual class size (mean)           ~30
Correlation (predicted, actual)    ~0.85

Note that actual class sizes deviate from predicted class sizes (correlation approximately 0.85, not 1.0), reflecting the fuzzy compliance with Maimonides' rule. The +2 bias in the DGP means actual class sizes tend to be slightly larger than predicted.


Step 2: Visualize the Discontinuity

par(mfrow = c(1, 3))

# Panel A: Maimonides' rule
enr <- 10:160
pred <- enr / ceiling(enr / 40)
plot(enr, pred, type = "l", col = "blue", lwd = 2,
   xlab = "Enrollment", ylab = "Predicted Class Size",
   main = "A: Maimonides Rule")
abline(v = c(40, 80, 120), col = "red", lty = 2)

# Panel B: Actual vs predicted
plot(df$enrollment, df$actual_classsize, pch = 16, cex = 0.3, col = "grey60",
   xlab = "Enrollment", ylab = "Actual Class Size",
   main = "B: Fuzzy Compliance")
lines(enr, pred, col = "red", lwd = 2)

# Panel C: Test scores near first cutoff
sub <- df[df$enrollment >= 20 & df$enrollment <= 60, ]
means <- tapply(sub$test_score, sub$enrollment, mean)
plot(as.numeric(names(means)), means, pch = 16,
   xlab = "Enrollment", ylab = "Mean Test Score",
   main = "C: Scores Near Cutoff")
abline(v = 40, col = "red", lty = 2)

Step 3: First Stage — Does the Instrument Predict Class Size?

Focus on the first cutoff at enrollment = 40. The running variable is enrollment, and the instrument is predicted class size (or equivalently, being above the cutoff).

# Focus on first cutoff (enrollment = 40)
df$running <- df$enrollment - 40
df$above <- as.integer(df$enrollment > 40)

# Window around cutoff
bw <- 15
df_bw <- df[abs(df$running) <= bw, ]

# First stage
fs1 <- lm(actual_classsize ~ above * running, data = df_bw)
summary(fs1)

# With a single excluded instrument, the first-stage F is the squared t-stat on 'above'
cat("\nFirst-stage F-statistic:", coef(summary(fs1))["above", "t value"]^2, "\n")
cat("Coefficient on above:", coef(fs1)["above"], "\n")

Expected output: First stage (bandwidth = 15)

Variable       Coeff    SE     t       p
Intercept      37.5     0.30   125.0   0.000
above          -10.5    0.50   -21.0   0.000
running        0.35     0.03   11.7    0.000
above:running  -0.30    0.05   -6.00   0.000

Detail                     Value
First-stage F-statistic    ~440 (well above 10, strong instrument)
Observations in bandwidth  ~700
Interpretation             Crossing the enrollment = 40 cutoff reduces actual class size by approximately 10.5 students

The large first-stage F-statistic confirms that Maimonides' rule is a strong instrument for actual class size. The negative coefficient on above reflects that when enrollment exceeds 40, an additional class is formed, reducing average class size.


Step 4: Reduced Form and Fuzzy RD Estimation

# Reduced Form (ITT)
rf <- lm(test_score ~ above * running, data = df_bw)
cat("=== Reduced Form ===\n")
cat("Score jump at cutoff:", coef(rf)["above"], "\n\n")

# Fuzzy RD via 2SLS
library(AER)
iv <- ivreg(test_score ~ actual_classsize + running |
          above + above:running + running,
          data = df_bw)
summary(iv, diagnostics = TRUE)

cat("\nLATE of class size:", coef(iv)["actual_classsize"], "\n")

# Wald ratio
wald <- coef(rf)["above"] / coef(fs1)["above"]
cat("Wald ratio:", wald, "\n")
Requires: AER (ivreg)

Expected output: Reduced form (ITT)

Variable       Coeff    SE     t      p
Intercept      59.0     0.50   118.0  0.000
above          3.2      0.80   4.00   0.000
running        -0.08    0.05   -1.60  0.110
above:running  0.05     0.08   0.60   0.549

The positive coefficient on above means that crossing the enrollment cutoff (which triggers smaller classes) increases test scores by approximately 3.2 points. This coefficient is the intention-to-treat effect.

Expected output: Fuzzy RD (2SLS)

Variable          Coeff    SE     t      p
actual_classsize  -0.30    0.08   -3.75  0.000
running           0.02     0.05   0.40   0.689

Detail                Value
Method                2SLS, robust SEs
LATE estimate         ~-0.30 (per student)
Wald ratio (RF / FS)  3.2 / (-10.5) ≈ -0.30
True DGP coefficient  -0.30
Interpretation        Reducing class size by 1 student raises test scores by ~0.30 points
Bandwidth             15 enrollment units

The 2SLS estimate of approximately -0.30 matches the true DGP parameter. The negative sign confirms that larger classes reduce test scores: each additional student in the class reduces average scores by about 0.30 points.
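The Wald logic behind this estimate can be seen in a stripped-down sketch (standalone toy data, not the lab's df): with a single binary instrument and no covariates, the IV estimate is exactly the reduced-form jump divided by the first-stage jump, which also equals a ratio of covariances.

```r
# Toy fuzzy-RD-style data: 'above' instruments class size (base R only)
set.seed(1)
n <- 500
above <- rbinom(n, 1, 0.5)                      # binary instrument
classsize <- 30 - 10 * above + rnorm(n, 0, 3)   # first stage: jump of -10
score <- 70 - 0.3 * classsize + rnorm(n, 0, 5)  # true effect: -0.3 per student

# Wald ratio: reduced-form jump / first-stage jump
wald <- coef(lm(score ~ above))["above"] / coef(lm(classsize ~ above))["above"]

# Equivalent covariance-ratio form of the IV estimand
iv_cov <- cov(score, above) / cov(classsize, above)

all.equal(unname(wald), unname(iv_cov))  # identical up to rounding
unname(wald)                             # close to the true -0.3
```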

Concept Check

The fuzzy RD estimate of the class size effect is larger in magnitude than a naive OLS regression of test scores on class size. Why?


Step 5: Bias-Corrected Estimation with rdrobust

library(rdrobust)

# rdrobust: Fuzzy RD
rd_fuzzy <- rdrobust(y = df$test_score, x = df$running, c = 0,
                    fuzzy = df$actual_classsize)
summary(rd_fuzzy)

# Compare bandwidths
cat("\nOptimal bandwidth (MSE):", rd_fuzzy$bws[1, 1], "\n")
cat("Bias-corrected estimate:", rd_fuzzy$coef[3], "\n")
cat("Robust p-value:", rd_fuzzy$pv[3], "\n")
Requires: rdrobust

Expected output: rdrobust fuzzy RD

Estimator       Coeff    SE      95% CI
Conventional    ~-0.32   ~0.10   [-0.52, -0.12]
Bias-corrected  ~-0.30   ~0.10   [-0.50, -0.10]
Robust          ~-0.30   ~0.12   [-0.54, -0.06]

Detail                       Value
Method                       Local linear, triangular kernel
MSE-optimal bandwidth (h)    ~12–16 enrollment units
Bias-correction bandwidth (b)  ~20–25 enrollment units
N (effective, left + right)  ~600–800
The bias-corrected confidence interval excludes zero, confirming a statistically significant negative effect of class size on test scores. The MSE-optimal bandwidth selects a window of approximately 12–16 enrollment units around the cutoff.


Step 6: Bandwidth Sensitivity

# Bandwidth sensitivity
bandwidths <- c(5, 8, 10, 12, 15, 18, 20, 25)
results <- data.frame()

for (bw in bandwidths) {
  sub <- df[abs(df$running) <= bw, ]
  if (nrow(sub) < 50) next

  iv_bw <- tryCatch(
    ivreg(test_score ~ actual_classsize + running |
            above + above:running + running, data = sub),
    error = function(e) NULL
  )

  if (!is.null(iv_bw)) {
    est <- coef(iv_bw)["actual_classsize"]
    se <- sqrt(vcovHC(iv_bw, type = "HC1")["actual_classsize", "actual_classsize"])
    results <- rbind(results, data.frame(bw = bw, n = nrow(sub),
                                         estimate = est, se = se))
  }
}

print(results)
cat("\nEstimates should be reasonably stable across bandwidths.\n")
Requires: AER (ivreg)

Expected output: Bandwidth sensitivity

BW   N       Estimate  SE     95% CI
5    ~250    ~-0.35    ~0.20  [-0.74, 0.04]
8    ~400    ~-0.32    ~0.14  [-0.59, -0.05]
10   ~500    ~-0.31    ~0.11  [-0.53, -0.09]
12   ~600    ~-0.30    ~0.10  [-0.50, -0.10]
15   ~700    ~-0.30    ~0.08  [-0.46, -0.14]
18   ~850    ~-0.29    ~0.07  [-0.43, -0.15]
20   ~950    ~-0.28    ~0.07  [-0.42, -0.14]
25   ~1,150  ~-0.27    ~0.06  [-0.39, -0.15]

Estimates are reasonably stable across bandwidths from 8 to 25, ranging between approximately -0.27 and -0.32. The true DGP parameter is -0.30. At the narrowest bandwidth (5), the estimate is noisier with a much wider confidence interval that barely excludes zero. At wider bandwidths (20+), there is slight attenuation as the linear specification absorbs curvature from distant observations.

Concept Check

You run the fuzzy RD analysis at multiple bandwidths and find that the estimate is stable at -0.3 for bandwidths of 8-20, but jumps to -0.8 at a bandwidth of 5 (with a much larger standard error). What is the most likely explanation?


Step 7: Compare with Published Results

Key comparisons with Angrist and Lavy (1999):

Feature                   A&L (1999)                      Our Simulation
First-stage F             >> 10                           Check your F-statistic
Class size effect (LATE)  ~-0.2 to -0.5 per student       Check your 2SLS estimate
Direction                 Smaller classes improve scores  Should be negative
Compliance                Fuzzy (partial)                 Check actual vs. predicted
Bandwidth sensitivity     Estimates stable                Check your sensitivity table

The central result — that reducing class size improves student achievement — should be robust across specifications and bandwidths.


Extension Exercises

  1. Multiple cutoffs. Repeat the analysis at the second cutoff (enrollment = 80) and third cutoff (enrollment = 120). Pool the estimates using a stacked regression. Are the effects similar across cutoffs?

  2. McCrary density test. Test for manipulation of enrollment at the cutoffs. If schools strategically manipulate enrollment to avoid splitting classes, the RD design is invalid. Plot the enrollment density and test for discontinuities.

  3. Covariate balance. Test whether predetermined covariates (percent disadvantaged) are smooth through the cutoff. Discontinuities in covariates would suggest sorting, invalidating the design.

  4. Placebo cutoffs. Run the same analysis at placebo cutoffs (enrollment = 30, 50, 70) where no discontinuity should exist. Finding "effects" at non-cutoffs suggests specification problems.

  5. Donut hole. Exclude observations very close to the cutoff (within 1-2 enrollment units) and re-estimate. If manipulation occurs precisely at the cutoff, this "donut RD" can reduce bias.
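As a starting point for Exercise 4, one possible placebo check is sketched below. It uses standalone toy data rather than the lab's df so it runs on its own, and `jump_at` is a hypothetical helper written for this sketch, not part of any package.

```r
# Placebo-cutoff sketch: estimate the reduced-form jump at a fake cutoff
set.seed(7)
n <- 1000
enrollment <- sample(10:70, n, replace = TRUE)
pred <- enrollment / ceiling(enrollment / 40)            # Maimonides' rule
score <- 70 - 0.3 * (pred + rnorm(n, 0, 3)) + rnorm(n, 0, 5)

jump_at <- function(cutoff, bw = 10) {
  run <- enrollment - cutoff
  keep <- abs(run) <= bw
  d <- data.frame(score = score[keep], run = run[keep],
                  above = as.integer(run[keep] > 0))
  coef(lm(score ~ above * run, data = d))["above"]
}

jump_at(30)  # placebo: no rule change at 30, jump should be near zero
jump_at(40)  # real cutoff: predicted size drops 40 -> 20.5, scores jump up
```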


Summary

In this replication lab you learned:

  • Fuzzy RD designs arise when compliance with the cutoff rule is imperfect — the probability of treatment changes at the cutoff but not from 0 to 1
  • The fuzzy RD estimand is a LATE: the causal effect for compliers at the cutoff
  • Estimation proceeds via 2SLS: the predicted treatment (from the rule) instruments for actual treatment
  • A first-stage F-statistic well above the conventional threshold of 10 helps rule out weak instrument concerns
  • Bandwidth choice involves a bias-variance tradeoff; rdrobust provides an optimal bandwidth with bias-corrected inference
  • Our simulated results reproduce the key finding from Angrist and Lavy (1999): smaller class sizes improve student test scores
  • Validity checks (manipulation tests, covariate balance, bandwidth sensitivity, placebo cutoffs) are essential complements to the main estimate