MethodAtlas
Lab · Replication · 8 min read
Estimated completion time: 120 minutes

Replication Lab: The Effect of a Smoking Ban on Hospital Admissions

Replicate key findings from a public health interrupted time series study of a smoking ban's effect on hospital admissions. Simulate monthly admission data, estimate segmented regression models, diagnose and correct for autocorrelation, and control for seasonality.

Languages: Python, R, Stata
Dataset: Simulated monthly hospital admission counts

Overview

In this replication lab, you will reproduce key findings from the public health literature on the effect of smoking bans on hospital admissions for acute myocardial infarction (AMI):

Barone-Adesi, Francesco, et al. 2011. "Effects of Italian Smoking Regulation on Rates of Hospital Admission for Acute Coronary Events: A Country-Wide Study." PLoS ONE 6(3): e17419.

The interrupted time series (ITS) design compares the trend in hospital admissions before and after the implementation of a smoking ban. The headline finding: comprehensive smoking bans are associated with an immediate reduction of approximately 4--11% in AMI hospital admissions, with effects appearing within the first months and persisting over time.

Why this paper matters: ITS is one of the strongest quasi-experimental designs for evaluating policies that are implemented at a single point in time across an entire population. The smoking ban literature provides a textbook application of segmented regression with autocorrelation diagnostics.

What you will do:

  • Learn why simulation is used when individual-level health records are unavailable
  • Simulate monthly hospital admission counts matching stylized patterns
  • Estimate segmented regression (level change and slope change)
  • Diagnose and correct for autocorrelation using Durbin-Watson and Newey-West SEs
  • Add seasonality controls and assess robustness
  • Compare your results to published findings

Step 1: Simulate Monthly Hospital Admission Data

We simulate 96 months of data (8 years: 4 pre-ban and 4 post-ban). The smoking ban takes effect at month 49.

library(lmtest)
library(sandwich)
library(nlme)

set.seed(2011)
T_total <- 96  # 8 years of monthly data
t_interv <- 49  # Intervention at month 49

# Time variables
time <- 1:T_total
post <- as.integer(time >= t_interv)
time_since <- pmax(time - t_interv, 0)

# Seasonality: winter peaks (month 1,2,12 = higher admissions)
month_in_year <- rep(1:12, 8)
seasonal <- 15 * sin(2 * pi * (month_in_year - 1) / 12) +
  8 * cos(2 * pi * (month_in_year - 1) / 12)

# Baseline trend: slight decline (aging population offset by
# medical advances)
baseline <- 450 - 0.3 * time

# Intervention effect: level drop of ~25 admissions (~5-6%)
# and slight additional decline in trend (~0.2/month)
intervention_effect <- -25 * post - 0.2 * time_since

# Autocorrelated errors (AR(1) with rho ~ 0.3)
e <- numeric(T_total)
e[1] <- rnorm(1, 0, 12)
for (i in 2:T_total) {
  e[i] <- 0.3 * e[i - 1] + rnorm(1, 0, 12)
}

admissions <- round(baseline + seasonal + intervention_effect + e)

df <- data.frame(time, month_in_year, post, time_since,
                 admissions)

cat("=== Data Summary ===\n")
cat("Pre-ban mean admissions:", round(mean(df$admissions[df$post == 0]), 1),
  "\n")
cat("Post-ban mean admissions:", round(mean(df$admissions[df$post == 1]), 1),
  "\n")
cat("Raw difference (pre - post):", round(mean(df$admissions[df$post == 0]) -
  mean(df$admissions[df$post == 1]), 1), "\n")
cat("Months: Pre =", sum(1 - df$post), "  Post =", sum(df$post), "\n")

Step 2: Estimate the Segmented Regression (ITS Model)

The standard ITS model estimates both a level change (immediate effect) and a slope change (gradual effect) at the intervention point:

admissions_t = b0 + b1 * time + b2 * post + b3 * time_since + e_t

# Basic ITS model (OLS)
its_ols <- lm(admissions ~ time + post + time_since, data = df)
summary(its_ols)

cat("\n=== ITS Parameter Interpretation ===\n")
cat("b0 (intercept):    ", round(coef(its_ols)[1], 2),
  " — baseline level at time 0\n")
cat("b1 (time):         ", round(coef(its_ols)[2], 4),
  " — pre-intervention slope\n")
cat("b2 (post):         ", round(coef(its_ols)[3], 2),
  " — immediate level change at intervention\n")
cat("b3 (time_since):   ", round(coef(its_ols)[4], 4),
  " — change in slope after intervention\n")

# Predicted admissions at intervention point
pred_at_interv <- coef(its_ols)[1] + coef(its_ols)[2] * t_interv
cat("\nPredicted admissions at intervention (counterfactual):",
  round(pred_at_interv, 1), "\n")
cat("Actual level at intervention:",
  round(pred_at_interv + coef(its_ols)[3], 1), "\n")
cat("Percentage change:", round(coef(its_ols)[3] / pred_at_interv * 100, 1),
  "%\n")

Concept Check

In the ITS model, what does the coefficient b3 (time_since) represent?


Step 3: Diagnose Autocorrelation

Time series data almost always exhibit autocorrelation — successive observations are correlated. If we ignore autocorrelation, standard errors are biased (usually too small), leading to inflated significance.

# Durbin-Watson test
dw <- dwtest(its_ols)
cat("=== Durbin-Watson Test ===\n")
cat("DW statistic:", round(dw$statistic, 3), "\n")
cat("p-value:", round(dw$p.value, 4), "\n")
cat("H0: No first-order autocorrelation\n")
cat("DW ~ 2: no autocorrelation; DW < 2: positive; DW > 2: negative\n")

# ACF of residuals
resids <- residuals(its_ols)
acf_vals <- acf(resids, lag.max = 12, plot = FALSE)
cat("\n=== ACF of Residuals ===\n")
for (k in 1:6) {
  cat("Lag", k, ":", round(acf_vals$acf[k + 1], 3), "\n")
}

# Ljung-Box test
lb <- Box.test(resids, lag = 12, type = "Ljung-Box")
cat("\nLjung-Box test (12 lags): chi-sq =", round(lb$statistic, 2),
  " p =", round(lb$p.value, 4), "\n")

# Breusch-Godfrey test for higher-order autocorrelation
bg <- bgtest(its_ols, order = 4)
cat("Breusch-Godfrey test (order 4): p =", round(bg$p.value, 4), "\n")

Step 4: Correct for Autocorrelation — Newey-West SEs and GLS

# --- Method 1: Newey-West HAC standard errors ---
nw_vcov <- NeweyWest(its_ols, lag = 4, prewhite = FALSE)
its_nw <- coeftest(its_ols, vcov = nw_vcov)
cat("=== Newey-West HAC Standard Errors (lag = 4) ===\n")
print(its_nw)

# --- Method 2: GLS with AR(1) errors (a Prais-Winsten-type correction) ---
its_pw <- gls(admissions ~ time + post + time_since, data = df,
              correlation = corAR1(form = ~ time))
cat("\n=== Prais-Winsten GLS (AR(1)) ===\n")
summary(its_pw)

# --- Compare standard errors ---
cat("\n=== SE Comparison for b2 (level change) ===\n")
cat("OLS SE:          ", round(summary(its_ols)$coefficients["post", 2], 3), "\n")
cat("Newey-West SE:   ", round(its_nw["post", 2], 3), "\n")
cat("Prais-Winsten SE:", round(summary(its_pw)$tTable["post", 2], 3), "\n")
cat("\nAutocorrelation-robust SEs are typically LARGER than OLS SEs.\n")

Concept Check

A researcher estimates an ITS model and finds DW = 1.15. They proceed to report OLS standard errors and claim significance at the 1% level. What is the problem?


Step 5: Add Seasonality Controls

Hospital admissions for cardiovascular events show strong seasonal patterns (higher in winter). Failing to control for seasonality can bias the ITS estimates if the intervention coincides with a seasonal peak or trough.

# Add month dummies for seasonality
df$month_factor <- factor(df$month_in_year)

# Model with seasonality
its_season <- lm(admissions ~ time + post + time_since + month_factor,
               data = df)

# Newey-West SEs with seasonality
nw_season <- coeftest(its_season,
                     vcov = NeweyWest(its_season, lag = 4,
                                      prewhite = FALSE))

# Alternative: harmonic (Fourier) terms for seasonality
df$sin12 <- sin(2 * pi * df$month_in_year / 12)
df$cos12 <- cos(2 * pi * df$month_in_year / 12)
df$sin6 <- sin(2 * pi * df$month_in_year / 6)
df$cos6 <- cos(2 * pi * df$month_in_year / 6)

its_fourier <- lm(admissions ~ time + post + time_since +
                  sin12 + cos12 + sin6 + cos6, data = df)
nw_fourier <- coeftest(its_fourier,
                      vcov = NeweyWest(its_fourier, lag = 4,
                                       prewhite = FALSE))

cat("=== ITS with Seasonality Controls (Newey-West SEs) ===\n")
cat("\nMonth Dummies:\n")
cat("  Level change (post):", round(nw_season["post", 1], 2),
  " SE:", round(nw_season["post", 2], 3), "\n")
cat("  Slope change:", round(nw_season["time_since", 1], 4),
  " SE:", round(nw_season["time_since", 2], 4), "\n")

cat("\nFourier Terms:\n")
cat("  Level change (post):", round(nw_fourier["post", 1], 2),
  " SE:", round(nw_fourier["post", 2], 3), "\n")
cat("  Slope change:", round(nw_fourier["time_since", 1], 4),
  " SE:", round(nw_fourier["time_since", 2], 4), "\n")

cat("\n=== Comparison with No Seasonality ===\n")
cat("No controls - post:", round(its_nw["post", 1], 2), "\n")
cat("Month dummies - post:", round(nw_season["post", 1], 2), "\n")
cat("Fourier terms - post:", round(nw_fourier["post", 1], 2), "\n")

Step 6: Compare with Published Results

cat("==========================================================\n")
cat("COMPARISON: Our Replication vs. ITS Literature\n")
cat("==========================================================\n")
pct_drop <- round(-nw_season["post", 1] /
  mean(df$admissions[df$post == 0]) * 100, 1)
cat(sprintf("%-40s %10s %10s\n", "Finding", "Literature", "Ours"))
cat("----------------------------------------------------------\n")
cat(sprintf("%-40s %10s %10.1f\n", "Level drop (admissions/month)",
          "20-45", -nw_season["post", 1]))
cat(sprintf("%-40s %10s %10.1f\n", "Percentage drop (%)",
          "4-11%", pct_drop))
cat(sprintf("%-40s %10s %10s\n", "Significant after AC correction?",
          "Yes", ifelse(nw_season["post", 4] < 0.05, "Yes", "No")))
cat(sprintf("%-40s %10s %10s\n", "Seasonality controlled?",
          "Yes", "Yes"))
cat("----------------------------------------------------------\n")

Error Detective

Read the analysis below carefully and identify the errors.

A public health researcher evaluates a city-wide policy intervention using monthly data from 2015-2021 (84 months). The policy was implemented in January 2019 (month 49). They estimate:

admissions = 320 - 0.5*time - 18*post + e

Results: "The policy reduced admissions by 18 per month (p < 0.001, OLS). The R-squared is 0.72, confirming a strong model fit." The researcher does not test for autocorrelation, does not include a slope change term, and does not control for seasonality. Notably, the policy was implemented in January (a month with typically high admissions).

Select all errors you can find:


Summary

Our replication confirms the key findings from the ITS literature on smoking bans:

  1. Smoking bans reduce AMI hospital admissions. The estimated level drop is approximately 25 admissions per month (~5--6%), consistent with published estimates of 4--11%.

  2. Autocorrelation is present and must be addressed. Naive OLS standard errors understate uncertainty. Newey-West HAC standard errors and Prais-Winsten GLS provide more reliable inference.

  3. Seasonality controls improve precision. Controlling for seasonal patterns reduces residual variance without substantially changing the point estimate, confirming that the intervention effect is not an artifact of seasonal variation.

  4. The ITS design is powerful but requires careful diagnostics. The three key threats are autocorrelation, seasonality, and concurrent events. Reporting should always include autocorrelation tests, seasonal controls, and discussion of potential co-interventions.


Extension Exercises

  1. Negative binomial model. Since admissions are count data, re-estimate using a negative binomial regression. Compare with the linear model.

  2. Control series. Add a control outcome (e.g., hospital admissions for a condition unaffected by smoking) to create a controlled ITS design.

  3. Lagged effects. Allow the intervention effect to phase in over several months using distributed lag terms.

  4. Sensitivity to bandwidth. Vary the number of pre- and post-intervention periods. How stable is the estimate?

  5. ARIMA-based ITS. Fit an ARIMA model to the pre-intervention series, forecast the counterfactual, and compare with the observed post-intervention series.
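As a starting point for Exercise 1, the sketch below re-estimates the ITS model as a negative binomial regression. It re-simulates the Step 1 data so it runs standalone, and it assumes `MASS::glm.nb` for the count model (an assumption: the lab does not prescribe a particular package). Because the model uses a log link, `exp(b2) - 1` gives the proportional level change, which can be compared directly with the percentage drops reported above.

```r
library(MASS)  # glm.nb for negative binomial regression

# Re-create the simulated series from Step 1 so this sketch is self-contained
set.seed(2011)
T_total <- 96
t_interv <- 49
time <- 1:T_total
post <- as.integer(time >= t_interv)
time_since <- pmax(time - t_interv, 0)
month_in_year <- rep(1:12, 8)
seasonal <- 15 * sin(2 * pi * (month_in_year - 1) / 12) +
  8 * cos(2 * pi * (month_in_year - 1) / 12)
e <- numeric(T_total)
e[1] <- rnorm(1, 0, 12)
for (i in 2:T_total) {
  e[i] <- 0.3 * e[i - 1] + rnorm(1, 0, 12)
}
admissions <- round(450 - 0.3 * time - 25 * post - 0.2 * time_since +
                      seasonal + e)
df <- data.frame(admissions, time, post, time_since, month_in_year)

# Negative binomial ITS with month dummies; the log link makes the
# coefficient on post a proportional (multiplicative) level change
its_nb <- glm.nb(admissions ~ time + post + time_since +
                   factor(month_in_year), data = df)

cat("Proportional level change:",
    round((exp(coef(its_nb)["post"]) - 1) * 100, 1), "%\n")
```

A point of comparison: the linear model's level change is in admissions per month, while the negative binomial coefficient is scale-free, so the count model is often easier to compare across hospitals or regions with different baseline volumes.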