MethodAtlas
Lab tutorial · 90 minutes

Lab: Interrupted Time Series from Scratch

Implement an interrupted time series analysis step by step. Simulate monthly data with an intervention, fit segmented regression, interpret level and slope changes, test for autocorrelation, and apply Newey-West standard errors.

Languages: Python, R, Stata
Dataset: Hospital admissions (simulated monthly data)

Overview

Interrupted Time Series (ITS) is a quasi-experimental design that evaluates the effect of an intervention by comparing the trend in an outcome before and after the intervention. It is one of the strongest non-experimental designs when the intervention occurs at a clearly defined point in time and affects an entire population simultaneously.

What you will learn:

  • How to set up and estimate a segmented regression for ITS
  • How to interpret the level change and slope change parameters
  • How to construct the counterfactual trajectory
  • How to test for and address autocorrelation (Durbin-Watson test)
  • How to compute Newey-West standard errors that are robust to autocorrelation
  • How to compute the cumulative effect at specific post-intervention time points

Prerequisites: OLS regression, basic understanding of time series concepts (trends, autocorrelation).


Step 1: Simulate Monthly Time Series Data

We simulate monthly hospital admissions over 8 years (96 months). A policy intervention occurs at month 49, reducing admissions immediately (level change) and altering the trend (slope change).

library(lmtest)
library(sandwich)

set.seed(42)
T_total <- 96       # 96 months (8 years)
T_interv <- 49      # Intervention at month 49

# Time variable
time <- 1:T_total

# Intervention indicator: 1 after intervention
D <- as.integer(time >= T_interv)

# Time since intervention (0 before, 1, 2, ... after)
time_after <- pmax(time - T_interv + 1, 0) * D

# True parameters
true_intercept   <- 500    # Baseline admissions at time 0
true_trend       <- -0.5   # Pre-intervention trend (slight decline)
true_level_change <- -30   # Immediate drop at intervention
true_slope_change <- -1.5  # Additional decline per month after intervention

# Generate data with autocorrelated errors (AR(1) with phi = 0.4)
phi <- 0.4
epsilon <- numeric(T_total)
epsilon[1] <- rnorm(1, sd = 10)
for (t in 2:T_total) {
  epsilon[t] <- phi * epsilon[t - 1] + rnorm(1, sd = 10)
}

# Outcome: admissions
admissions <- true_intercept + true_trend * time +
            true_level_change * D +
            true_slope_change * time_after + epsilon

df <- data.frame(time, admissions, D, time_after)

cat("Pre-intervention mean admissions:", round(mean(admissions[D == 0]), 1), "\n")
cat("Post-intervention mean admissions:", round(mean(admissions[D == 1]), 1), "\n")
cat("Intervention at month:", T_interv, "\n")

Expected output:

Period                             Mean Admissions
Pre-intervention (months 1–48)     ~475–485
Post-intervention (months 49–96)   ~400–430

The raw comparison of means shows a decline, but ITS allows us to separate the intervention effect from the pre-existing trend.


Step 2: Fit the Segmented Regression

The ITS segmented regression model is:

Y_t = beta_0 + beta_1 * time + beta_2 * D_t + beta_3 * time_after_t + epsilon_t

where beta_2 is the level change and beta_3 is the slope change at the intervention.

# Fit segmented regression (standard OLS)
its_model <- lm(admissions ~ time + D + time_after, data = df)
summary(its_model)

cat("\n=== ITS Estimates vs. True Values ===\n")
cat(sprintf("%-20s %10s %10s\n", "Parameter", "Estimate", "True"))
cat(sprintf("%-20s %10.3f %10.3f\n", "Intercept", coef(its_model)[1], true_intercept))
cat(sprintf("%-20s %10.3f %10.3f\n", "Pre-trend", coef(its_model)["time"], true_trend))
cat(sprintf("%-20s %10.3f %10.3f\n", "Level change", coef(its_model)["D"], true_level_change))
cat(sprintf("%-20s %10.3f %10.3f\n", "Slope change", coef(its_model)["time_after"], true_slope_change))

Expected output:

Parameter                 Estimate   True Value   Interpretation
Intercept (beta_0)        ~500       500          Baseline admissions at time 0
Pre-trend (beta_1)        ~-0.5      -0.5         Monthly decline before intervention
Level change (beta_2)     ~-30       -30          Immediate drop at intervention
Slope change (beta_3)     ~-1.5      -1.5         Additional monthly decline after intervention
Post-intervention slope   ~-2.0      -2.0         beta_1 + beta_3: total monthly decline after intervention

The level change (beta_2) captures the immediate effect of the intervention. The slope change (beta_3) captures how the intervention altered the trajectory. The post-intervention slope (beta_1 + beta_3) describes the total monthly change after the intervention.
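Since the lab lists Python among its languages, the same simulation and segmented regression can be reproduced as a cross-check using only the standard library, by solving the OLS normal equations directly. This is a minimal sketch (in practice `statsmodels` OLS would be the idiomatic choice); the variable names and seed are illustrative, not from the R code:

```python
import random

random.seed(42)
T_TOTAL, T_INTERV = 96, 49

# True parameters, matching the R simulation
B0, B1, B2, B3 = 500.0, -0.5, -30.0, -1.5
PHI, SIGMA = 0.4, 10.0

# AR(1) errors: e[t] = phi * e[t-1] + N(0, sigma)
eps = [random.gauss(0, SIGMA)]
for _ in range(1, T_TOTAL):
    eps.append(PHI * eps[-1] + random.gauss(0, SIGMA))

# Design matrix columns: intercept, time, D, time_after
rows, y = [], []
for t in range(1, T_TOTAL + 1):
    d = 1 if t >= T_INTERV else 0
    ta = (t - T_INTERV + 1) * d          # months since intervention (1, 2, ...)
    rows.append([1.0, t, d, ta])
    y.append(B0 + B1 * t + B2 * d + B3 * ta + eps[t - 1])

# Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination
k = 4
xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
aug = [xtx[i] + [xty[i]] for i in range(k)]
for c in range(k):
    p = max(range(c, k), key=lambda r: abs(aug[r][c]))  # partial pivoting
    aug[c], aug[p] = aug[p], aug[c]
    for r in range(k):
        if r != c:
            f = aug[r][c] / aug[c][c]
            aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
beta = [aug[i][k] / aug[i][i] for i in range(k)]

print("intercept    %8.2f  (true  500)" % beta[0])
print("pre-trend    %8.2f  (true -0.5)" % beta[1])
print("level change %8.2f  (true  -30)" % beta[2])
print("slope change %8.2f  (true -1.5)" % beta[3])
```

The estimates should land within sampling error of the true values, just as in the R output above.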


Step 3: Construct the Counterfactual

The counterfactual is what would have happened without the intervention — the extrapolation of the pre-intervention trend.

# Counterfactual: predicted values if intervention had NOT occurred
# Set D = 0 and time_after = 0 for all time points
df$counterfactual <- coef(its_model)["(Intercept)"] +
                   coef(its_model)["time"] * df$time

# Predicted values (actual model)
df$predicted <- fitted(its_model)

# Plot
plot(df$time, df$admissions, pch = 16, cex = 0.5, col = "gray40",
   xlab = "Month", ylab = "Admissions",
   main = "ITS: Observed, Fitted, and Counterfactual")
lines(df$time, df$predicted, col = "blue", lwd = 2)
lines(df$time[df$D == 1], df$counterfactual[df$D == 1],
    col = "red", lwd = 2, lty = 2)
abline(v = T_interv - 0.5, lty = 3)
legend("topright", legend = c("Observed", "Fitted", "Counterfactual"),
     col = c("gray40", "blue", "red"), pch = c(16, NA, NA),
     lty = c(NA, 1, 2), lwd = c(NA, 2, 2))

# Effect at specific post-intervention time points
for (m in c(1, 12, 24, 48)) {
  effect <- coef(its_model)["D"] + coef(its_model)["time_after"] * m
  cat(sprintf("Effect at %d months post-intervention: %.1f admissions\n", m, effect))
}

Expected output:

Months Post-Intervention   Effect (admissions reduction)
1                          ~-31.5
12                         ~-48.0
24                         ~-66.0
48                         ~-102.0

The effect grows over time because the slope change adds to the level change. The counterfactual line shows where admissions would have been without the intervention.
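The cumulative-effect calculation in the R loop above reduces to a one-line formula. A Python illustration, plugging in the point estimates from the table (assumed here for illustration):

```python
level_change = -30.0   # beta_2, point estimate
slope_change = -1.5    # beta_3, point estimate

def effect(m):
    """Total intervention effect m months post-intervention."""
    return level_change + slope_change * m

for m in (1, 12, 24, 48):
    print(f"Effect at {m:2d} months: {effect(m):.1f}")
# Effect at  1 months: -31.5
# Effect at 12 months: -48.0
# Effect at 24 months: -66.0
# Effect at 48 months: -102.0
```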

Concept Check

In the ITS model Y_t = beta_0 + beta_1*time + beta_2*D + beta_3*time_after + e_t, what is the total effect of the intervention at 24 months after it was implemented?


Step 4: Test for Autocorrelation

Time series data are often autocorrelated — adjacent observations are not independent. Autocorrelation does not bias OLS coefficients, but it makes standard errors incorrect (usually too small).

# Durbin-Watson test for autocorrelation
dw_test <- dwtest(its_model)
cat("Durbin-Watson Test:\n")
cat("  DW statistic:", dw_test$statistic, "\n")
cat("  p-value:", dw_test$p.value, "\n")
cat("  (DW ~ 2: no autocorrelation; DW < 2: positive; DW > 2: negative)\n")

# Breusch-Godfrey test (more general: tests for higher-order AC)
bg_test <- bgtest(its_model, order = 4)
cat("\nBreusch-Godfrey Test (up to lag 4):\n")
cat("  LM statistic:", bg_test$statistic, "\n")
cat("  p-value:", bg_test$p.value, "\n")

# Examine residual autocorrelation
resid <- residuals(its_model)
acf_vals <- acf(resid, plot = FALSE, lag.max = 12)
cat("\nResidual ACF at lag 1:", round(acf_vals$acf[2], 3), "\n")
cat("Residual ACF at lag 2:", round(acf_vals$acf[3], 3), "\n")
cat("\nIf ACF at lag 1 is large, autocorrelation is present.\n")

Expected output:

Test                      Statistic    p-value   Interpretation
Durbin-Watson             ~1.2–1.5     < 0.05    Positive autocorrelation detected
Breusch-Godfrey (lag 4)   ~10–25       < 0.05    Autocorrelation present
ACF at lag 1              ~0.30–0.45   —         Substantial first-order autocorrelation

The DW statistic is below 2, confirming positive autocorrelation in the residuals. This finding is expected — we simulated AR(1) errors with phi = 0.4. The OLS coefficient estimates are still unbiased, but the standard errors are too small.
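The Durbin-Watson statistic itself is simple: the sum of squared first differences of the residuals over the sum of squared residuals, which is approximately 2*(1 - rho) for AR(1) residuals. A stdlib-only Python sketch on a simulated AR(1) series (a toy series, not the lab data):

```python
import random

def durbin_watson(resid):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e * e for e in resid)
    return num / den

# Sanity check: AR(1) with rho = 0.4 should give DW near 2*(1 - 0.4) = 1.2
random.seed(1)
e = [random.gauss(0, 1)]
for _ in range(4999):
    e.append(0.4 * e[-1] + random.gauss(0, 1))

print(round(durbin_watson(e), 2))   # close to 1.2 for a long series
```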

Concept Check

Your ITS model shows significant autocorrelation in the residuals (DW = 1.3, BG p < 0.01). How does this affect your results?


Step 5: Apply Newey-West Standard Errors

Newey-West standard errors are robust to both heteroskedasticity and autocorrelation (HAC). They correct the standard errors without changing the point estimates.

# Newey-West standard errors
# Rule of thumb for bandwidth: floor(T^(1/3)) or floor(0.75 * T^(1/3))
bw <- floor(T_total^(1/3))
cat("Newey-West bandwidth (lags):", bw, "\n\n")

nw_vcov <- NeweyWest(its_model, lag = bw, prewhite = FALSE)
nw_test <- coeftest(its_model, vcov = nw_vcov)

cat("=== Comparison: OLS SEs vs. Newey-West SEs ===\n")
cat(sprintf("%-15s %10s %10s %10s\n", "Parameter", "Estimate", "OLS SE", "NW SE"))
params <- c("time", "D", "time_after")
for (p in params) {
  cat(sprintf("%-15s %10.3f %10.3f %10.3f\n",
      p, coef(its_model)[p],
      summary(its_model)$coef[p, "Std. Error"],
      nw_test[p, "Std. Error"]))
}

cat("\nNewey-West SEs are larger than OLS SEs,\n")
cat("reflecting the autocorrelation in the data.\n")

Expected output:

Parameter      Estimate   OLS SE   Newey-West SE   Ratio
Pre-trend      ~-0.50     ~0.08    ~0.12           ~1.5x
Level change   ~-30.0     ~3.5     ~5.5            ~1.6x
Slope change   ~-1.50     ~0.12    ~0.18           ~1.5x

The Newey-West standard errors are 40–60% larger than OLS standard errors, reflecting the positive autocorrelation that OLS ignores. The level change and slope change are still statistically significant with the corrected standard errors, but confidence intervals are wider.
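The full Newey-West estimator sandwiches a Bartlett-kernel-weighted covariance between (X'X)^-1 terms; the kernel-weighting idea is easiest to see for the standard error of a sample mean. A stdlib-only Python sketch (a toy AR(1) series, not the lab data; the function name is illustrative):

```python
import random

def nw_se_of_mean(x, lags):
    """Newey-West (Bartlett-kernel) standard error of the sample mean."""
    n = len(x)
    mean = sum(x) / n
    e = [xi - mean for xi in x]
    lrv = sum(ei * ei for ei in e) / n              # gamma_0
    for lag in range(1, lags + 1):
        w = 1 - lag / (lags + 1)                    # Bartlett weight, shrinks to 0
        gamma = sum(e[t] * e[t - lag] for t in range(lag, n)) / n
        lrv += 2 * w * gamma
    return (lrv / n) ** 0.5

random.seed(7)
x = [random.gauss(0, 1)]
for _ in range(1999):
    x.append(0.4 * x[-1] + random.gauss(0, 1))

n = len(x)
mean = sum(x) / n
naive = (sum((xi - mean) ** 2 for xi in x) / n / n) ** 0.5   # ignores autocorrelation
robust = nw_se_of_mean(x, lags=int(n ** (1 / 3)))            # rule-of-thumb bandwidth
print(f"naive SE {naive:.4f}  vs  Newey-West SE {robust:.4f}")
```

With positive autocorrelation the robust SE comes out larger than the naive one, mirroring the pattern in the regression table above.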


Step 6: Alternative — Prais-Winsten GLS Estimation

An alternative to Newey-West is to model the autocorrelation structure directly using Prais-Winsten GLS estimation.

# Prais-Winsten estimation (GLS for AR(1) errors)
library(prais)

pw_model <- prais_winsten(admissions ~ time + D + time_after,
                          data = df, index = "time")  # index: time variable
summary(pw_model)

cat("\n=== All Three Approaches ===\n")
cat(sprintf("%-20s %10s %10s %10s\n", "Method", "Level Chg", "SE", "Slope Chg"))
cat(sprintf("%-20s %10.2f %10.2f %10.3f\n", "OLS",
  coef(its_model)["D"], summary(its_model)$coef["D", "Std. Error"],
  coef(its_model)["time_after"]))
cat(sprintf("%-20s %10.2f %10.2f %10.3f\n", "Newey-West",
  coef(its_model)["D"], nw_test["D", "Std. Error"],
  coef(its_model)["time_after"]))
cat(sprintf("%-20s %10.2f %10.2f %10.3f\n", "Prais-Winsten",
  coef(pw_model)["D"], summary(pw_model)$coef["D", "Std. Error"],
  coef(pw_model)["time_after"]))

cat("\nEstimated AR(1) coefficient:", tail(pw_model$rho, 1), "\n")  # final iteration
cat("True AR(1) coefficient:", phi, "\n")

Expected output:

Method          Level Change   SE     Slope Change
OLS             ~-30.0         ~3.5   ~-1.50
Newey-West      ~-30.0         ~5.5   ~-1.50
Prais-Winsten   ~-30.0         ~5.0   ~-1.50

All three methods give similar point estimates. The Prais-Winsten and Newey-West standard errors are larger than OLS, properly accounting for autocorrelation. Prais-Winsten may be slightly more efficient because it models the error structure parametrically.
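Under the hood, Prais-Winsten quasi-differences the data: each observation becomes y_t - rho*y_{t-1}, and the first observation is rescaled by sqrt(1 - rho^2) rather than dropped (the rescaling is what distinguishes it from Cochrane-Orcutt). A Python sketch of the transform with rho taken as known (in practice rho is estimated iteratively, and the regressors are transformed the same way before running OLS):

```python
import math

def pw_transform(y, rho):
    """Prais-Winsten quasi-differencing for AR(1) errors with known rho."""
    out = [math.sqrt(1 - rho ** 2) * y[0]]   # first obs rescaled, not dropped
    out.extend(y[t] - rho * y[t - 1] for t in range(1, len(y)))
    return out

print(pw_transform([1.0, 2.0, 3.0], 0.5))
```

OLS on the transformed outcome and regressors is equivalent to GLS with AR(1) errors, which is why the Prais-Winsten standard errors in the table above are valid under autocorrelation.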


Step 7: Robustness Checks

# Robustness 1: Placebo test — move intervention to month 25
df$D_placebo <- as.integer(df$time >= 25)
df$time_after_placebo <- pmax(df$time - 25 + 1, 0) * df$D_placebo

# Only use pre-intervention data
placebo_model <- lm(admissions ~ time + D_placebo + time_after_placebo,
                  data = df[df$time < T_interv, ])
cat("=== Placebo Test (intervention at month 25) ===\n")
cat("Level change:", coef(placebo_model)["D_placebo"],
  "(should be ~0)\n")
cat("Slope change:", coef(placebo_model)["time_after_placebo"],
  "(should be ~0)\n")
cat("p-value (level):", summary(placebo_model)$coef["D_placebo", "Pr(>|t|)"],
  "\n")

# Robustness 2: Add quadratic time trend
its_quad <- lm(admissions ~ time + I(time^2) + D + time_after, data = df)
cat("\n=== Quadratic Time Trend ===\n")
cat("Level change (quadratic):", coef(its_quad)["D"], "\n")
cat("Level change (linear):", coef(its_model)["D"], "\n")

# Robustness 3: Exclude observations near the intervention
df_donut <- df[abs(df$time - T_interv) > 3, ]
its_donut <- lm(admissions ~ time + D + time_after, data = df_donut)
cat("\n=== Donut Specification (exclude 3 months around intervention) ===\n")
cat("Level change (donut):", coef(its_donut)["D"], "\n")

Expected output:

Robustness Check           Level Change   Significant?
Main model                 ~-30.0         Yes
Placebo (month 25)         ~0.0           No
Quadratic trend            ~-30.0         Yes
Donut (exclude 3 months)   ~-30.0         Yes

The placebo test should find no significant effect, confirming that the effect is specific to the true intervention date. The quadratic trend and donut specifications should give similar estimates, suggesting the result is robust to functional form and observations near the discontinuity.


Step 8: Exercises

Guided Exercise

Computing the Effect at a Specific Post-Intervention Time

You estimate an ITS model and find: pre-trend = -0.5 admissions/month, level change = -30 admissions, slope change = -1.5 admissions/month. You want to compute the intervention effect at 12 months post-intervention.

What is the total effect at 12 months post-intervention? (effect = level_change + slope_change * 12)

What is the post-intervention slope (total monthly change after intervention)? (answer = pre_trend + slope_change)

At how many months post-intervention will the total effect reach -60 admissions? (solve: -30 + (-1.5)*m = -60)
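A quick way to check your answers to the three questions, using the values from the prompt:

```python
pre_trend, level_change, slope_change = -0.5, -30.0, -1.5

effect_12 = level_change + slope_change * 12          # total effect at 12 months
post_slope = pre_trend + slope_change                 # post-intervention slope
months_to_60 = (-60 - level_change) / slope_change    # solve -30 - 1.5*m = -60

print(effect_12, post_slope, months_to_60)   # -48.0 -2.0 20.0
```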

Challenge Exercises

  1. Seasonality. Add monthly seasonal dummies to the model. In real hospital data, admissions often follow seasonal patterns. How do the ITS estimates change?

  2. Control series. If you have a control group that was not exposed to the intervention, add it as a comparison series (controlled ITS). This strengthens the design by controlling for contemporaneous shocks.

  3. Poisson ITS. When the outcome is a count variable, consider using Poisson or negative binomial regression instead of OLS. Refit the model using a GLM with a log link.


Key Takeaways

  • Interrupted time series is a powerful quasi-experimental design for evaluating interventions that affect an entire population at a known time point
  • The segmented regression model separates the intervention effect into a level change (immediate shift) and a slope change (altered trajectory)
  • The total effect at t months post-intervention is beta_2 + beta_3 * t, which grows over time if the slope change is non-zero
  • The counterfactual is the extrapolation of the pre-intervention trend — what would have happened without the intervention
  • Autocorrelation is common in time series data; it leaves OLS coefficients unbiased but makes conventional standard errors incorrect (typically too small)
  • Always test for autocorrelation (Durbin-Watson, Breusch-Godfrey) and use robust standard errors (Newey-West) or model the error structure (Prais-Winsten)
  • Robustness checks should include placebo tests, alternative functional forms, and donut specifications
  • ITS is strongest when the intervention date is clearly defined and there are no contemporaneous confounders — a control series strengthens the design