MethodAtlas
Lab · Replication · 8 min read
Estimated completion time: 120 minutes

Replication Lab: The Effect of a Smoking Ban on Hospital Admissions

Replicate key findings from a public health interrupted time series study of a smoking ban's effect on hospital admissions. Simulate monthly admission data, estimate segmented regression models, diagnose and correct for autocorrelation, and control for seasonality.

Languages: Python, R, Stata
Dataset: Simulated monthly hospital admission counts

Overview

In this replication lab, you will reproduce key findings from the public health literature on the effect of smoking bans on hospital admissions for acute myocardial infarction (AMI):

Barone-Adesi, Francesco, et al. 2011. "Effects of Italian Smoking Regulation on Rates of Hospital Admission for Acute Coronary Events: A Country-Wide Study." PLoS ONE 6(3): e17419.

The interrupted time series (ITS) design compares the trend in hospital admissions before and after the implementation of a smoking ban. The headline finding: comprehensive smoking bans are associated with an immediate reduction of approximately 4--11% in AMI hospital admissions, with effects appearing within the first months and persisting over time.

Why this paper matters: ITS is one of the strongest quasi-experimental designs for evaluating policies that are implemented at a single point in time across an entire population. The smoking ban literature provides a textbook application of segmented regression with autocorrelation diagnostics.

What you will do:

  • Learn why simulation is used when individual-level health records are unavailable
  • Simulate monthly hospital admission counts matching stylized patterns
  • Estimate segmented regression (level change and slope change)
  • Diagnose and correct for autocorrelation using Durbin-Watson and Newey-West SEs
  • Add seasonality controls and assess robustness
  • Compare your results to published findings

Step 1: Simulate Monthly Hospital Admission Data

We simulate 96 months of data (8 years: 4 pre-ban and 4 post-ban). The smoking ban takes effect at month 49.

library(lmtest)
library(sandwich)
library(nlme)

set.seed(2011)
T_total <- 96  # 8 years of monthly data
t_interv <- 49  # Intervention at month 49

# Time variables
time <- 1:T_total
post <- as.integer(time >= t_interv)
time_since <- pmax(time - t_interv, 0)

# Seasonality: winter peaks (month 1,2,12 = higher admissions)
month_in_year <- rep(1:12, 8)
seasonal <- 15 * sin(2 * pi * (month_in_year - 1) / 12) +
  8 * cos(2 * pi * (month_in_year - 1) / 12)

# Baseline trend: slight decline (aging population offset by
# medical advances)
baseline <- 450 - 0.3 * time

# Intervention effect: level drop of ~25 admissions (~5-6%)
# and slight additional decline in trend (~0.2/month)
intervention_effect <- -25 * post - 0.2 * time_since

# Autocorrelated errors (AR(1) with rho ~ 0.3)
e <- numeric(T_total)
e[1] <- rnorm(1, 0, 12)
for (i in 2:T_total) {
  e[i] <- 0.3 * e[i - 1] + rnorm(1, 0, 12)
}

admissions <- round(baseline + seasonal + intervention_effect + e)

df <- data.frame(time, month_in_year, post, time_since,
                 admissions)

cat("=== Data Summary ===\n")
cat("Pre-ban mean admissions:", round(mean(df$admissions[df$post == 0]), 1),
  "\n")
cat("Post-ban mean admissions:", round(mean(df$admissions[df$post == 1]), 1),
  "\n")
cat("Raw difference (pre - post):", round(mean(df$admissions[df$post == 0]) -
  mean(df$admissions[df$post == 1]), 1), "\n")
cat("Months: Pre =", sum(1 - df$post), "  Post =", sum(df$post), "\n")

Step 2: Estimate the Segmented Regression (ITS Model)

The standard ITS model estimates both a level change (immediate effect) and a slope change (gradual effect) at the intervention point:

admissions_t = b0 + b1 * time + b2 * post + b3 * time_since + e_t

# Basic ITS model (OLS)
its_ols <- lm(admissions ~ time + post + time_since, data = df)
summary(its_ols)

cat("\n=== ITS Parameter Interpretation ===\n")
cat("b0 (intercept):    ", round(coef(its_ols)[1], 2),
  " — baseline level at time 0\n")
cat("b1 (time):         ", round(coef(its_ols)[2], 4),
  " — pre-intervention slope\n")
cat("b2 (post):         ", round(coef(its_ols)[3], 2),
  " — immediate level change at intervention\n")
cat("b3 (time_since):   ", round(coef(its_ols)[4], 4),
  " — change in slope after intervention\n")

# Predicted admissions at intervention point
pred_at_interv <- coef(its_ols)[1] + coef(its_ols)[2] * t_interv
cat("\nPredicted admissions at intervention (counterfactual):",
  round(pred_at_interv, 1), "\n")
cat("Actual level at intervention:",
  round(pred_at_interv + coef(its_ols)[3], 1), "\n")
cat("Percentage change:", round(coef(its_ols)[3] / pred_at_interv * 100, 1),
  "%\n")

Concept Check

In the ITS model, what does the coefficient b3 (time_since) represent?


Step 3: Diagnose Autocorrelation

Time series data almost always exhibit autocorrelation — successive observations are correlated. If we ignore autocorrelation, standard errors are biased (usually too small), leading to inflated significance.

# Durbin-Watson test
dw <- dwtest(its_ols)
cat("=== Durbin-Watson Test ===\n")
cat("DW statistic:", round(dw$statistic, 3), "\n")
cat("p-value:", round(dw$p.value, 4), "\n")
cat("H0: No first-order autocorrelation\n")
cat("DW ~ 2: no autocorrelation; DW < 2: positive; DW > 2: negative\n")

# ACF of residuals
resids <- residuals(its_ols)
acf_vals <- acf(resids, lag.max = 12, plot = FALSE)
cat("\n=== ACF of Residuals ===\n")
for (k in 1:6) {
  cat("Lag", k, ":", round(acf_vals$acf[k + 1], 3), "\n")
}

# Ljung-Box test
lb <- Box.test(resids, lag = 12, type = "Ljung-Box")
cat("\nLjung-Box test (12 lags): chi-sq =", round(lb$statistic, 2),
  " p =", round(lb$p.value, 4), "\n")

# Breusch-Godfrey test for higher-order autocorrelation
bg <- bgtest(its_ols, order = 4)
cat("Breusch-Godfrey test (order 4): p =", round(bg$p.value, 4), "\n")

Step 4: Correct for Autocorrelation — Newey-West SEs and GLS

# --- Method 1: Newey-West HAC standard errors ---
nw_vcov <- NeweyWest(its_ols, lag = 4, prewhite = FALSE)
its_nw <- coeftest(its_ols, vcov = nw_vcov)
cat("=== Newey-West HAC Standard Errors (lag = 4) ===\n")
print(its_nw)

# --- Method 2: GLS with AR(1) errors (a Prais-Winsten-type correction) ---
its_pw <- gls(admissions ~ time + post + time_since, data = df,
              correlation = corAR1(form = ~ time))
cat("\n=== Prais-Winsten GLS (AR(1)) ===\n")
summary(its_pw)

# --- Compare standard errors ---
cat("\n=== SE Comparison for b2 (level change) ===\n")
cat("OLS SE:          ", round(summary(its_ols)$coefficients["post", 2], 3), "\n")
cat("Newey-West SE:   ", round(its_nw["post", 2], 3), "\n")
cat("Prais-Winsten SE:", round(summary(its_pw)$tTable["post", 2], 3), "\n")
cat("\nAutocorrelation-robust SEs are typically LARGER than OLS SEs.\n")

Concept Check

A researcher estimates an ITS model and finds DW = 1.15. They proceed to report OLS standard errors and claim significance at the 1% level. What is the problem?


Step 5: Add Seasonality Controls

Hospital admissions for cardiovascular events show strong seasonal patterns (higher in winter). Failing to control for seasonality can bias the ITS estimates if the intervention coincides with a seasonal peak or trough.

# Add month dummies for seasonality
df$month_factor <- factor(df$month_in_year)

# Model with seasonality
its_season <- lm(admissions ~ time + post + time_since + month_factor,
               data = df)

# Newey-West SEs with seasonality
nw_season <- coeftest(its_season,
                     vcov = NeweyWest(its_season, lag = 4,
                                      prewhite = FALSE))

# Alternative: harmonic (Fourier) terms for seasonality
df$sin12 <- sin(2 * pi * df$month_in_year / 12)
df$cos12 <- cos(2 * pi * df$month_in_year / 12)
df$sin6 <- sin(2 * pi * df$month_in_year / 6)
df$cos6 <- cos(2 * pi * df$month_in_year / 6)

its_fourier <- lm(admissions ~ time + post + time_since +
                  sin12 + cos12 + sin6 + cos6, data = df)
nw_fourier <- coeftest(its_fourier,
                      vcov = NeweyWest(its_fourier, lag = 4,
                                       prewhite = FALSE))

cat("=== ITS with Seasonality Controls (Newey-West SEs) ===\n")
cat("\nMonth Dummies:\n")
cat("  Level change (post):", round(nw_season["post", 1], 2),
  " SE:", round(nw_season["post", 2], 3), "\n")
cat("  Slope change:", round(nw_season["time_since", 1], 4),
  " SE:", round(nw_season["time_since", 2], 4), "\n")

cat("\nFourier Terms:\n")
cat("  Level change (post):", round(nw_fourier["post", 1], 2),
  " SE:", round(nw_fourier["post", 2], 3), "\n")
cat("  Slope change:", round(nw_fourier["time_since", 1], 4),
  " SE:", round(nw_fourier["time_since", 2], 4), "\n")

cat("\n=== Comparison with No Seasonality ===\n")
cat("No controls - post:", round(its_nw["post", 1], 2), "\n")
cat("Month dummies - post:", round(nw_season["post", 1], 2), "\n")
cat("Fourier terms - post:", round(nw_fourier["post", 1], 2), "\n")

Step 6: Compare with Published Results

cat("==========================================================\n")
cat("COMPARISON: Our Replication vs. ITS Literature\n")
cat("==========================================================\n")
pct_drop <- round(-nw_season["post", 1] /
  mean(df$admissions[df$post == 0]) * 100, 1)
cat(sprintf("%-40s %10s %10s\n", "Finding", "Literature", "Ours"))
cat("----------------------------------------------------------\n")
cat(sprintf("%-40s %10s %10.1f\n", "Level drop (admissions/month)",
          "20-45", -nw_season["post", 1]))
cat(sprintf("%-40s %10s %10.1f\n", "Percentage drop (%)",
          "4-11%", pct_drop))
cat(sprintf("%-40s %10s %10s\n", "Significant after AC correction?",
          "Yes", ifelse(nw_season["post", 4] < 0.05, "Yes", "No")))
cat(sprintf("%-40s %10s %10s\n", "Seasonality controlled?",
          "Yes", "Yes"))
cat("----------------------------------------------------------\n")

Error Detective

Read the analysis below carefully and identify the errors.

A public health researcher evaluates a city-wide policy intervention using monthly data from 2015-2021 (84 months). The policy was implemented in January 2019 (month 49). They estimate:

admissions = 320 - 0.5*time - 18*post + e

Results: "The policy reduced admissions by 18 per month (p < 0.001, OLS). The R-squared is 0.72, confirming a strong model fit." The researcher does not test for autocorrelation, does not include a slope change term, and does not control for seasonality. Notably, the policy was implemented in January (a month with typically high admissions).

Select all errors you can find:


Summary

Our replication confirms the key findings from the ITS literature on smoking bans:

  1. Smoking bans reduce AMI hospital admissions. The estimated level drop is approximately 25 admissions per month (~5--6%), consistent with published estimates of 4--11%.

  2. Autocorrelation is present and must be addressed. Naive OLS standard errors understate uncertainty. Newey-West HAC standard errors and Prais-Winsten GLS provide more reliable inference.

  3. Seasonality controls improve precision. Controlling for seasonal patterns reduces residual variance without substantially changing the point estimate, confirming that the intervention effect is not an artifact of seasonal variation.

  4. The ITS design is powerful but requires careful diagnostics. The three key threats are autocorrelation, seasonality, and concurrent events. Reporting should always include autocorrelation tests, seasonal controls, and discussion of potential co-interventions.


Extension Exercises

  1. Negative binomial model. Since admissions are count data, re-estimate using a negative binomial regression. Compare with the linear model.

  2. Control series. Add a control outcome (e.g., hospital admissions for a condition unaffected by smoking) to create a controlled ITS design.

  3. Lagged effects. Allow the intervention effect to phase in over several months using distributed lag terms.

  4. Sensitivity to bandwidth. Vary the number of pre- and post-intervention periods. How stable is the estimate?

  5. ARIMA-based ITS. Fit an ARIMA model to the pre-intervention series, forecast the counterfactual, and compare with the observed post-intervention series.
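As a starting point for Exercise 1, the sketch below re-estimates the ITS model as a negative binomial regression. It re-simulates the Step 1 data so it runs standalone, and it assumes `MASS::glm.nb` for the count model (an assumption: the lab does not prescribe a particular package). Because the model uses a log link, `exp(b2) - 1` gives the proportional level change, which can be compared directly with the percentage drops reported above.

```r
library(MASS)  # glm.nb for negative binomial regression

# Re-create the simulated series from Step 1 so this sketch is self-contained
set.seed(2011)
T_total <- 96
t_interv <- 49
time <- 1:T_total
post <- as.integer(time >= t_interv)
time_since <- pmax(time - t_interv, 0)
month_in_year <- rep(1:12, 8)
seasonal <- 15 * sin(2 * pi * (month_in_year - 1) / 12) +
  8 * cos(2 * pi * (month_in_year - 1) / 12)
e <- numeric(T_total)
e[1] <- rnorm(1, 0, 12)
for (i in 2:T_total) {
  e[i] <- 0.3 * e[i - 1] + rnorm(1, 0, 12)
}
admissions <- round(450 - 0.3 * time - 25 * post - 0.2 * time_since +
                      seasonal + e)
df <- data.frame(admissions, time, post, time_since, month_in_year)

# Negative binomial ITS with month dummies; the log link makes the
# coefficient on post a proportional (multiplicative) level change
its_nb <- glm.nb(admissions ~ time + post + time_since +
                   factor(month_in_year), data = df)

cat("Proportional level change:",
    round((exp(coef(its_nb)["post"]) - 1) * 100, 1), "%\n")
```

A point of comparison: the linear model's level change is in admissions per month, while the negative binomial coefficient is scale-free, so the count model is often easier to compare across hospitals or regions with different baseline volumes.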