Lab: Interrupted Time Series from Scratch
Implement an interrupted time series analysis step by step. Simulate monthly data with an intervention, fit segmented regression, interpret level and slope changes, test for autocorrelation, and apply Newey-West standard errors.
Overview
Interrupted Time Series (ITS) is a quasi-experimental design that evaluates the effect of an intervention by comparing the trend in an outcome before and after the intervention. It is one of the strongest non-experimental designs when the intervention occurs at a clearly defined point in time and affects an entire population simultaneously.
What you will learn:
- How to set up and estimate a segmented regression for ITS
- How to interpret the level change and slope change parameters
- How to construct the counterfactual trajectory
- How to test for and address autocorrelation (Durbin-Watson and Breusch-Godfrey tests)
- How to compute Newey-West standard errors that are robust to autocorrelation
- How to compute the cumulative effect at specific post-intervention time points
Prerequisites: OLS regression, basic understanding of time series concepts (trends, autocorrelation).
Step 1: Simulate Monthly Time Series Data
We simulate monthly hospital admissions over 8 years (96 months). A policy intervention occurs at month 49, reducing admissions immediately (level change) and altering the trend (slope change).
library(lmtest)
library(sandwich)
set.seed(42)
T_total <- 96 # 96 months (8 years)
T_interv <- 49 # Intervention at month 49
# Time variable
time <- 1:T_total
# Intervention indicator: 1 after intervention
D <- as.integer(time >= T_interv)
# Time since intervention (0 before, 1, 2, ... after)
time_after <- pmax(time - T_interv + 1, 0) * D
# True parameters
true_intercept <- 500 # Baseline admissions at time 0
true_trend <- -0.5 # Pre-intervention trend (slight decline)
true_level_change <- -30 # Immediate drop at intervention
true_slope_change <- -1.5 # Additional decline per month after intervention
# Generate data with autocorrelated errors (AR(1) with phi = 0.4)
phi <- 0.4
epsilon <- numeric(T_total)
epsilon[1] <- rnorm(1, sd = 10)
for (t in 2:T_total) {
epsilon[t] <- phi * epsilon[t - 1] + rnorm(1, sd = 10)
}
# Outcome: admissions
admissions <- true_intercept + true_trend * time +
true_level_change * D +
true_slope_change * time_after + epsilon
df <- data.frame(time, admissions, D, time_after)
cat("Pre-intervention mean admissions:", round(mean(admissions[D == 0]), 1), "\n")
cat("Post-intervention mean admissions:", round(mean(admissions[D == 1]), 1), "\n")
cat("Intervention at month:", T_interv, "\n")
Expected output:
| Period | Mean Admissions |
|---|---|
| Pre-intervention (months 1–48) | ~475–485 |
| Post-intervention (months 49–96) | ~400–430 |
The raw comparison of means shows a decline, but ITS allows us to separate the intervention effect from the pre-existing trend.
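If you want to sanity-check the AR(1) error recursion outside R, the same process is easy to reproduce. A minimal Python/NumPy sketch (illustrative only, not part of the R lab) simulates a long AR(1) series and confirms that the sample lag-1 autocorrelation lands near phi = 0.4:

```python
import numpy as np

rng = np.random.default_rng(42)
phi, sigma, n = 0.4, 10.0, 5000  # long series so the sample ACF is stable

# Same recursion as the R loop: epsilon[t] = phi * epsilon[t-1] + noise
eps = np.empty(n)
eps[0] = rng.normal(scale=sigma)
for t in range(1, n):
    eps[t] = phi * eps[t - 1] + rng.normal(scale=sigma)

# The sample lag-1 autocorrelation should be close to phi
r1 = np.corrcoef(eps[:-1], eps[1:])[0, 1]
print(round(r1, 2))
```

With only 96 observations, as in the lab, the sample autocorrelation is noisier around 0.4, which is worth keeping in mind when interpreting the residual ACF in Step 4.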
Step 2: Fit the Segmented Regression
The ITS segmented regression model is:
Y_t = beta_0 + beta_1 * time + beta_2 * D_t + beta_3 * time_after_t + epsilon_t
where beta_2 is the level change and beta_3 is the slope change at the intervention.
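To see why the post-intervention effect is beta_2 plus beta_3 times the elapsed time, subtract the counterfactual expectation (intervention terms set to zero) from the model's expectation; in LaTeX notation, using the same symbols as above:

```latex
\begin{aligned}
\mathbb{E}[Y_t \mid \text{intervention}]    &= \beta_0 + \beta_1 t + \beta_2 + \beta_3 \,\mathrm{time\_after}_t \\
\mathbb{E}[Y_t \mid \text{no intervention}] &= \beta_0 + \beta_1 t \\
\text{Effect}_t &= \beta_2 + \beta_3 \,\mathrm{time\_after}_t
\end{aligned}
```

The baseline intercept and pre-trend cancel, leaving only the level change and the accumulated slope change.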
# Fit segmented regression (standard OLS)
its_model <- lm(admissions ~ time + D + time_after, data = df)
summary(its_model)
cat("\n=== ITS Estimates vs. True Values ===\n")
cat(sprintf("%-20s %10s %10s\n", "Parameter", "Estimate", "True"))
cat(sprintf("%-20s %10.3f %10.3f\n", "Intercept", coef(its_model)[1], true_intercept))
cat(sprintf("%-20s %10.3f %10.3f\n", "Pre-trend", coef(its_model)["time"], true_trend))
cat(sprintf("%-20s %10.3f %10.3f\n", "Level change", coef(its_model)["D"], true_level_change))
cat(sprintf("%-20s %10.3f %10.3f\n", "Slope change", coef(its_model)["time_after"], true_slope_change))
Expected output:
| Parameter | Estimate | True Value | Interpretation |
|---|---|---|---|
| Intercept (beta_0) | ~500 | 500 | Baseline admissions at time 0 |
| Pre-trend (beta_1) | ~-0.5 | -0.5 | Monthly decline before intervention |
| Level change (beta_2) | ~-30 | -30 | Immediate drop at intervention |
| Slope change (beta_3) | ~-1.5 | -1.5 | Additional monthly decline after intervention |
| Post-intervention slope | ~-2.0 | -2.0 | beta_1 + beta_3: total monthly decline after intervention |
The level change (beta_2) captures the immediate effect of the intervention. The slope change (beta_3) captures how the intervention altered the trajectory. The post-intervention slope (beta_1 + beta_3) describes the total monthly change after the intervention.
Step 3: Construct the Counterfactual
The counterfactual is what would have happened without the intervention — the extrapolation of the pre-intervention trend.
# Counterfactual: predicted values if intervention had NOT occurred
# Set D = 0 and time_after = 0 for all time points
df$counterfactual <- coef(its_model)["(Intercept)"] +
coef(its_model)["time"] * df$time
# Predicted values (actual model)
df$predicted <- fitted(its_model)
# Plot
plot(df$time, df$admissions, pch = 16, cex = 0.5, col = "gray40",
xlab = "Month", ylab = "Admissions",
main = "ITS: Observed, Fitted, and Counterfactual")
lines(df$time, df$predicted, col = "blue", lwd = 2)
lines(df$time[df$D == 1], df$counterfactual[df$D == 1],
col = "red", lwd = 2, lty = 2)
abline(v = T_interv - 0.5, lty = 3)
legend("topright", legend = c("Observed", "Fitted", "Counterfactual"),
col = c("gray40", "blue", "red"), pch = c(16, NA, NA),
lty = c(NA, 1, 2), lwd = c(NA, 2, 2))
# Effect at specific post-intervention time points
for (m in c(1, 12, 24, 48)) {
effect <- coef(its_model)["D"] + coef(its_model)["time_after"] * m
cat(sprintf("Effect at %d months post-intervention: %.1f admissions\n", m, effect))
}
Expected output:
| Months Post-Intervention | Effect (admissions reduction) |
|---|---|
| 1 | ~-31.5 |
| 12 | ~-48.0 |
| 24 | ~-66.0 |
| 48 | ~-102.0 |
The effect grows over time because the slope change adds to the level change. The counterfactual line shows where admissions would have been without the intervention.
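The arithmetic behind the table is just the level change plus the slope change times the months elapsed, mirroring the R loop above. A quick check using the true simulation parameters (Python here purely for illustration; the formula is the point):

```python
# Cumulative effect at m months post-intervention: beta_2 + beta_3 * m
level_change = -30.0   # beta_2 (true value from Step 1)
slope_change = -1.5    # beta_3 (true value from Step 1)

for m in (1, 12, 24, 48):
    effect = level_change + slope_change * m
    print(f"{m:2d} months: {effect:.1f}")
# 1 -> -31.5, 12 -> -48.0, 24 -> -66.0, 48 -> -102.0
```

The estimated effects in the table differ slightly from these values because the fitted coefficients carry sampling noise.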
In the ITS model Y_t = beta_0 + beta_1*time + beta_2*D + beta_3*time_after + e_t, what is the total effect of the intervention at 24 months after it was implemented?
Step 4: Test for Autocorrelation
Time series data are often autocorrelated — adjacent observations are not independent. Autocorrelation does not bias OLS coefficients, but it makes standard errors incorrect (usually too small).
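Why are the naive standard errors too small under positive autocorrelation? The variance of a mean (or a regression estimate) computed over an AR(1) series is larger than the iid formula implies, because neighboring observations carry overlapping information. A small Monte Carlo sketch (Python, illustrative only) makes the inflation concrete:

```python
import numpy as np

rng = np.random.default_rng(0)
phi, sigma, n, reps = 0.4, 1.0, 96, 2000  # n matches the lab's 96 months

# Simulate many AR(1) series and record the mean of each
means = np.empty(reps)
for r in range(reps):
    eps = np.empty(n)
    eps[0] = rng.normal(scale=sigma)
    for t in range(1, n):
        eps[t] = phi * eps[t - 1] + rng.normal(scale=sigma)
    means[r] = eps.mean()

# iid formula: Var(mean) = marginal variance / n,
# where the marginal variance of an AR(1) is sigma^2 / (1 - phi^2)
iid_var = (sigma**2 / (1 - phi**2)) / n
true_var = means.var()
print(round(true_var / iid_var, 2))  # ratio > 1: iid formula understates the variance
```

With phi = 0.4 the true variance of the mean is roughly double what the iid formula gives, which is exactly the gap that Newey-West standard errors (Step 5) are designed to close.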
# Durbin-Watson test for autocorrelation
dw_test <- dwtest(its_model)
cat("Durbin-Watson Test:\n")
cat(" DW statistic:", dw_test$statistic, "\n")
cat(" p-value:", dw_test$p.value, "\n")
cat(" (DW ~ 2: no autocorrelation; DW < 2: positive; DW > 2: negative)\n")
# Breusch-Godfrey test (more general: tests for higher-order AC)
bg_test <- bgtest(its_model, order = 4)
cat("\nBreusch-Godfrey Test (up to lag 4):\n")
cat(" LM statistic:", bg_test$statistic, "\n")
cat(" p-value:", bg_test$p.value, "\n")
# Examine residual autocorrelation
resid <- residuals(its_model)
acf_vals <- acf(resid, plot = FALSE, lag.max = 12)
cat("\nResidual ACF at lag 1:", round(acf_vals$acf[2], 3), "\n")
cat("Residual ACF at lag 2:", round(acf_vals$acf[3], 3), "\n")
cat("\nIf ACF at lag 1 is large, autocorrelation is present.\n")
Expected output:
| Test | Statistic | p-value | Interpretation |
|---|---|---|---|
| Durbin-Watson | ~1.2–1.5 | < 0.05 | Positive autocorrelation detected |
| Breusch-Godfrey (lag 4) | ~10–25 | < 0.05 | Autocorrelation present |
| ACF at lag 1 | ~0.30–0.45 | — | Substantial first-order autocorrelation |
The DW statistic is below 2, confirming positive autocorrelation in the residuals. This finding is expected — we simulated AR(1) errors with phi = 0.4. The OLS coefficient estimates are still unbiased, but the standard errors are too small.
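The Durbin-Watson statistic itself is simple to compute by hand: DW = sum_t (e_t - e_{t-1})^2 / sum_t e_t^2, and approximately DW = 2(1 - r1), where r1 is the lag-1 residual autocorrelation. A hand-rolled sketch (Python, illustrative; lmtest's dwtest also supplies the p-value, which this does not):

```python
import numpy as np

def durbin_watson(resid):
    """DW: sum of squared successive differences over the sum of squares."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

# Alternating residuals: strong negative autocorrelation, DW well above 2
print(durbin_watson([1, -1, 1, -1]))  # 3.0

# A smooth, positively autocorrelated run pushes DW below 2
print(durbin_watson([1, 2, 3, 2, 1]))
```

The approximation DW = 2(1 - r1) shows why DW near 1.2 in the table corresponds to residual autocorrelation around 0.4.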
Your ITS model shows significant autocorrelation in the residuals (DW = 1.3, BG p < 0.01). How does this affect your results?
Step 5: Apply Newey-West Standard Errors
Newey-West standard errors are robust to both heteroskedasticity and autocorrelation (HAC). They correct the standard errors without changing the point estimates.
# Newey-West standard errors
# Rule of thumb for bandwidth: floor(T^(1/3)) or floor(0.75 * T^(1/3))
bw <- floor(T_total^(1/3))
cat("Newey-West bandwidth (lags):", bw, "\n\n")
nw_vcov <- NeweyWest(its_model, lag = bw, prewhite = FALSE)
nw_test <- coeftest(its_model, vcov = nw_vcov)
cat("=== Comparison: OLS SEs vs. Newey-West SEs ===\n")
cat(sprintf("%-15s %10s %10s %10s\n", "Parameter", "Estimate", "OLS SE", "NW SE"))
params <- c("time", "D", "time_after")
for (p in params) {
cat(sprintf("%-15s %10.3f %10.3f %10.3f\n",
p, coef(its_model)[p],
summary(its_model)$coef[p, "Std. Error"],
nw_test[p, "Std. Error"]))
}
cat("\nNewey-West SEs are larger than OLS SEs,\n")
cat("reflecting the autocorrelation in the data.\n")
Expected output:
| Parameter | Estimate | OLS SE | Newey-West SE | Ratio |
|---|---|---|---|---|
| Pre-trend | ~-0.50 | ~0.08 | ~0.12 | ~1.5x |
| Level change | ~-30.0 | ~3.5 | ~5.5 | ~1.6x |
| Slope change | ~-1.50 | ~0.12 | ~0.18 | ~1.5x |
The Newey-West standard errors are 40–60% larger than OLS standard errors, reflecting the positive autocorrelation that OLS ignores. The level change and slope change are still statistically significant with the corrected standard errors, but confidence intervals are wider.
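Under the hood, Newey-West down-weights the sample autocovariances with a Bartlett kernel, w_l = 1 - l/(L+1), and sums them into a long-run variance. A minimal sketch of the weighting scheme (Python, illustrative; sandwich::NeweyWest handles the full regression version for you):

```python
import numpy as np

def newey_west_lrv(x, lags):
    """Long-run variance of a series: gamma_0 + 2 * sum_l w_l * gamma_l,
    with Bartlett weights w_l = 1 - l / (lags + 1)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    lrv = np.sum(x * x) / n  # gamma_0
    for l in range(1, lags + 1):
        w = 1.0 - l / (lags + 1)
        gamma_l = np.sum(x[l:] * x[:-l]) / n
        lrv += 2.0 * w * gamma_l
    return lrv

# Bartlett weights for the bandwidth used above (floor(96^(1/3)) = 4)
print([round(1 - l / 5, 1) for l in range(1, 5)])  # [0.8, 0.6, 0.4, 0.2]
```

Because the weights shrink linearly to zero at the bandwidth, the estimator stays positive semi-definite while still absorbing nearby autocovariances; that is why the choice of bandwidth matters less than including some lags at all.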
Step 6: Alternative — Prais-Winsten GLS Estimation
An alternative to Newey-West is to model the autocorrelation structure directly using Prais-Winsten GLS estimation.
# Prais-Winsten estimation (GLS for AR(1) errors)
library(prais)
pw_model <- prais_winsten(admissions ~ time + D + time_after,
                          data = df, index = "time")  # index gives the time ordering (required in recent prais versions)
summary(pw_model)
cat("\n=== All Three Approaches ===\n")
cat(sprintf("%-20s %10s %10s %10s\n", "Method", "Level Chg", "SE", "Slope Chg"))
cat(sprintf("%-20s %10.2f %10.2f %10.3f\n", "OLS",
coef(its_model)["D"], summary(its_model)$coef["D", "Std. Error"],
coef(its_model)["time_after"]))
cat(sprintf("%-20s %10.2f %10.2f %10.3f\n", "Newey-West",
coef(its_model)["D"], nw_test["D", "Std. Error"],
coef(its_model)["time_after"]))
cat(sprintf("%-20s %10.2f %10.2f %10.3f\n", "Prais-Winsten",
coef(pw_model)["D"], summary(pw_model)$coef["D", "Std. Error"],
coef(pw_model)["time_after"]))
# pw_model$rho stores rho from each iteration; report the final estimate
cat("\nEstimated AR(1) coefficient:", tail(pw_model$rho, 1), "\n")
cat("True AR(1) coefficient:", phi, "\n")
Expected output:
| Method | Level Change | SE | Slope Change |
|---|---|---|---|
| OLS | ~-30.0 | ~3.5 | ~-1.50 |
| Newey-West | ~-30.0 | ~5.5 | ~-1.50 |
| Prais-Winsten | ~-30.0 | ~5.0 | ~-1.50 |
All three methods give similar point estimates. The Prais-Winsten and Newey-West standard errors are larger than OLS, properly accounting for autocorrelation. Prais-Winsten may be slightly more efficient because it models the error structure parametrically.
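Prais-Winsten works by quasi-differencing: each observation is transformed to y_t - rho * y_{t-1} (regressors likewise), which turns AR(1) errors back into white noise, and the first observation is rescaled by sqrt(1 - rho^2) rather than dropped (dropping it is the Cochrane-Orcutt variant). A sketch of the transform (Python, illustrative only):

```python
import numpy as np

def quasi_difference(y, rho):
    """Prais-Winsten transform: rescale the first observation, quasi-difference the rest."""
    y = np.asarray(y, dtype=float)
    out = np.empty_like(y)
    out[0] = np.sqrt(1 - rho**2) * y[0]
    out[1:] = y[1:] - rho * y[:-1]
    return out

# Build an AR(1) series from known innovations u; quasi-differencing with
# the true rho recovers the innovations (after the first observation).
rng = np.random.default_rng(1)
rho = 0.4
u = rng.normal(size=200)
eps = np.empty_like(u)
eps[0] = u[0]
for t in range(1, len(u)):
    eps[t] = rho * eps[t - 1] + u[t]

recovered = quasi_difference(eps, rho)
print(np.allclose(recovered[1:], u[1:]))  # True
```

In practice rho is unknown, so Prais-Winsten iterates between estimating rho from the residuals and re-transforming the data, which is why its efficiency gain depends on the AR(1) assumption being roughly right.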
Step 7: Robustness Checks
# Robustness 1: Placebo test — move intervention to month 25
df$D_placebo <- as.integer(df$time >= 25)
df$time_after_placebo <- pmax(df$time - 25 + 1, 0) * df$D_placebo
# Only use pre-intervention data
placebo_model <- lm(admissions ~ time + D_placebo + time_after_placebo,
data = df[df$time < T_interv, ])
cat("=== Placebo Test (intervention at month 25) ===\n")
cat("Level change:", coef(placebo_model)["D_placebo"],
"(should be ~0)\n")
cat("Slope change:", coef(placebo_model)["time_after_placebo"],
"(should be ~0)\n")
cat("p-value (level):", summary(placebo_model)$coef["D_placebo", "Pr(>|t|)"],
"\n")
# Robustness 2: Add quadratic time trend
its_quad <- lm(admissions ~ time + I(time^2) + D + time_after, data = df)
cat("\n=== Quadratic Time Trend ===\n")
cat("Level change (quadratic):", coef(its_quad)["D"], "\n")
cat("Level change (linear):", coef(its_model)["D"], "\n")
# Robustness 3: Exclude observations near the intervention
df_donut <- df[abs(df$time - T_interv) > 3, ]
its_donut <- lm(admissions ~ time + D + time_after, data = df_donut)
cat("\n=== Donut Specification (exclude 3 months around intervention) ===\n")
cat("Level change (donut):", coef(its_donut)["D"], "\n")
Expected output:
| Robustness Check | Level Change | Significant? |
|---|---|---|
| Main model | ~-30.0 | Yes |
| Placebo (month 25) | ~0.0 | No |
| Quadratic trend | ~-30.0 | Yes |
| Donut (exclude 3 months) | ~-30.0 | Yes |
The placebo test should find no significant effect, confirming that the effect is specific to the true intervention date. The quadratic trend and donut specifications should give similar estimates, suggesting the result is robust to functional form and observations near the discontinuity.
Step 8: Exercises
Computing the Effect at a Specific Post-Intervention Time
You estimate an ITS model and find: pre-trend = -0.5 admissions/month, level change = -30 admissions, slope change = -1.5 admissions/month. Compute the intervention effect at 12 months post-intervention.
- Seasonality. Add monthly seasonal dummies to the model. In real hospital data, admissions often follow seasonal patterns. How do the ITS estimates change?
- Control series. If you have a control group that was not exposed to the intervention, add it as a comparison series (controlled ITS). This strengthens the design by controlling for contemporaneous shocks.
- Poisson ITS. When the outcome is a count variable, consider using Poisson or negative binomial regression instead of OLS. Refit the model using a GLM with a log link.
Key Takeaways
- Interrupted time series is a powerful quasi-experimental design for evaluating interventions that affect an entire population at a known time point
- The segmented regression model separates the intervention effect into a level change (immediate shift) and a slope change (altered trajectory)
- The total effect at t months post-intervention is beta_2 + beta_3 * t, which grows over time if the slope change is non-zero
- The counterfactual is the extrapolation of the pre-intervention trend — what would have happened without the intervention
- Autocorrelation is common in time series data; it leaves OLS coefficients unbiased but makes OLS standard errors incorrect (typically too small)
- Always test for autocorrelation (Durbin-Watson, Breusch-Godfrey) and use robust standard errors (Newey-West) or model the error structure (Prais-Winsten)
- Robustness checks should include placebo tests, alternative functional forms, and donut specifications
- ITS is strongest when the intervention date is clearly defined and there are no contemporaneous confounders — a control series strengthens the design