Interrupted Time Series (ITS)
Estimates causal effects of interventions by modeling level and slope changes in a single unit's time series at the intervention point.
Quick Reference
- When to Use
- When you have a single treated unit (or group) with a long pre-intervention time series and no suitable control group, and the intervention date is known and sharp.
- Key Assumption
- The pre-intervention trend would have continued unchanged in the absence of the intervention. No concurrent events affect the outcome at the intervention time.
- Common Mistake
- Ignoring autocorrelation in the time series, which inflates t-statistics and produces false positives. Use Newey-West SEs or model the autocorrelation structure explicitly.
- Estimated Time
- 2.5 hours
One-Line Implementation
Stata:

```stata
itsa outcome, single trperiod(intervention_date) lag(1) posttrend
```

R:

```r
lm(y ~ time + intervention + time_since_intervention, data = df) |>
  coeftest(vcov = NeweyWest)
```

Python:

```python
smf.ols('y ~ time + intervention + time_since', data=df).fit(cov_type='HAC', cov_kwds={'maxlags': 4})
```
Motivating Example
A public health researcher wants to know whether a comprehensive smoking ban introduced in January 2010 reduced hospital admissions for acute coronary events. She collects monthly hospital admission counts from January 2004 through December 2015 -- six years before and six years after the ban.
Here is the problem: she cannot randomly assign the smoking ban to some months and not others. The ban was a single policy change applied to an entire jurisdiction at a specific point in time. There is no control group -- every hospital in the region was affected simultaneously.
She cannot simply compare the average admission rate before and after the ban, either. Hospital admissions were already declining over time due to secular trends in cardiovascular health, improvements in emergency medicine, and other public health initiatives. A naive before-after comparison would attribute the entire pre-existing downward trend to the ban, vastly overstating its effect.
The interrupted time series design solves this problem (Wagner et al., 2002). It uses the pre-intervention trend as a counterfactual -- projecting what would have happened without the ban -- and then tests whether the post-intervention data deviates from that projection. The deviation, if any, is attributed to the intervention.
Specifically, the researcher fits a model that allows both the level and the slope of the time series to change at the moment of the intervention. A sudden drop in admissions at the ban date indicates an immediate level change; a steeper post-ban decline indicates a gradual slope change. Both are policy-relevant: the level change captures the immediate effect, and the slope change captures whether the effect grows or diminishes over time.
A. Overview
What the ITS Design Does
The interrupted time series design estimates the causal effect of an intervention that occurs at a known point in time by modeling the outcome as a function of time, with a structural break at the intervention date. The standard segmented regression model is:

Yₜ = β₀ + β₁·t + β₂·Dₜ + β₃·Sₜ + εₜ

where:
- Yₜ is the outcome at time t (e.g., monthly hospital admissions)
- t is the time elapsed since the start of the series (1, 2, 3, ...)
- Dₜ is a dummy variable equal to 1 after the intervention and 0 before
- Sₜ is the time elapsed since the intervention (0 before; 1, 2, 3, ... after)
The four parameters have clear interpretations:
- β₀: baseline level at t = 0
- β₁: pre-intervention slope (the secular trend)
- β₂: immediate level change at the intervention -- the jump (or drop) in the outcome the moment the policy takes effect
- β₃: change in slope after the intervention -- the difference between the post-intervention trend and the pre-intervention trend
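To make the design variables concrete, here is a minimal Python sketch that builds the time index, the post-intervention dummy, and the time-since-intervention variable, then traces the gap between the fitted path and the projected pre-trend. The coefficient values are illustrative assumptions, not estimates from any dataset.

```python
# Sketch: ITS design variables and the implied effect path.
# All coefficient values below are illustrative assumptions, not estimates.
t0 = 25                                        # first post-intervention period (assumed)
N = 48                                         # series length (assumed)
b0, b1, b2, b3 = 320.0, -0.35, -11.2, -0.40    # beta_0 .. beta_3 (assumed)

time = list(range(1, N + 1))                               # t = 1..N
post = [1 if t >= t0 else 0 for t in time]                 # D_t
time_since = [(t - t0 + 1) * d for t, d in zip(time, post)]  # S_t: 0 pre; 1, 2, ... post

fitted = [b0 + b1 * t + b2 * d + b3 * s
          for t, d, s in zip(time, post, time_since)]
counterfactual = [b0 + b1 * t for t in time]   # pre-trend projected forward

# In the post-period the gap between the two paths is beta_2 + beta_3 * S_t:
gap_at_12_months = fitted[t0 + 10] - counterfactual[t0 + 10]   # at S_t = 12
print(round(gap_at_12_months, 1))  # beta_2 + 12 * beta_3 = -16.0
```

The gap grows by β₃ each period, which is exactly the "slope change" interpretation above.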
Level Change vs. Slope Change
Different interventions produce different patterns:
- Level change only (β₂ ≠ 0, β₃ = 0): the intervention causes an immediate, permanent shift. Example: a new billing code instantly changes how diagnoses are recorded.
- Slope change only (β₂ = 0, β₃ ≠ 0): the intervention has no immediate effect but gradually changes the trajectory. Example: a new medical guideline slowly changes physician behavior.
- Both (β₂ ≠ 0, β₃ ≠ 0): the intervention causes an immediate shift and a change in trajectory. Example: a smoking ban immediately reduces exposure and also accelerates a downward trend as compliance increases.
How It Differs from Simple Before-After Comparison
A simple before-after comparison estimates the difference in mean outcomes, Ȳ_post − Ȳ_pre. This comparison conflates the intervention effect with any pre-existing trend. ITS explicitly models the pre-trend and asks whether the post-intervention data deviates from what the pre-trend would have predicted. This feature is why ITS is sometimes called "the strongest quasi-experimental design when randomization is not possible".
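A tiny simulation makes the conflation concrete: with a deterministic declining trend and no intervention effect at all, the naive before-after comparison still reports a large "effect", while comparing the post-period data to the projected pre-trend correctly reports zero. This is a pure-Python sketch with an invented DGP.

```python
# Sketch: a pure deterministic downward trend with NO intervention effect.
# The naive before-after comparison still reports a large "effect".
t0, N = 25, 48
y = [100.0 - 0.5 * t for t in range(1, N + 1)]      # secular trend only

pre, post = y[:t0 - 1], y[t0 - 1:]
naive = sum(post) / len(post) - sum(pre) / len(pre)

# ITS-style comparison: deviation of post data from the projected pre-trend
projected = [100.0 - 0.5 * t for t in range(t0, N + 1)]
its_style = sum(p - q for p, q in zip(post, projected)) / len(post)

print(round(naive, 2), round(its_style, 2))  # -12.0 0.0
```

The naive estimator attributes the entire trend-driven decline to a non-existent intervention; the trend-projection comparison does not.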
When to Use ITS
- Your intervention occurs at a single known time point applied to a population
- You have a sufficient number of time points before and after the intervention (at least 8-12 per segment, ideally more) (Kontopantelis et al., 2015)
- The pre-intervention trend is reasonably stable and estimable
- No control group is available (though adding one strengthens the design -- see Section D)
When NOT to Use ITS
- The intervention was phased in gradually with no clear start date
- The outcome is measured at only a few time points before or after (use DiD instead)
- Multiple major changes happened at the same time as the intervention
- The pre-intervention trend is highly volatile or nonlinear, making extrapolation unreliable
B. Identification
For the ITS design to provide valid causal estimates, three key assumptions must hold (Lopez Bernal et al., 2017).
Assumption 1: Stable Pre-Intervention Trend
Plain language: The pre-intervention trend must be well-characterized and would have continued unchanged in the absence of the intervention. The counterfactual is the projection of the pre-trend into the post-period.
Formally: E[Yₜ(0)] = β₀ + β₁·t for all t ≥ t₀, where Yₜ(0) is the potential outcome without the intervention and t₀ is the intervention time.
This assumption is violated if the pre-intervention trend was nonlinear (e.g., admissions were already accelerating downward before the ban), if it was driven by a transient shock, or if there was a "regression to the mean" effect from a temporary spike just before the intervention.
Assumption 2: No Concurrent Events (History Threat)
Plain language: Nothing else that could affect the outcome happened at the same time as the intervention. If a new cardiac treatment was introduced in the same month as the smoking ban, the estimated effect of the ban is confounded.
Concurrent events are the most common threat to ITS validity. The researcher must carefully document the policy landscape and argue that no other plausible cause of the observed change coincided with the intervention.
Assumption 3: No Anticipation Effects
Plain language: Individuals, firms, or institutions did not change their behavior before the intervention in anticipation of it. If hospitals reduced admissions or smokers quit in the months leading up to the ban (because the ban was announced in advance), the pre-trend is contaminated and the level change at t₀ is attenuated.
If anticipation is plausible, the researcher can:
- Move the intervention date earlier to the announcement date
- Exclude a "transition window" around the intervention
- Test for a structural break before the official date
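The transition-window option can be sketched as a simple filter on the analysis sample. The window width w is a judgment call; the value here is hypothetical.

```python
# Sketch: exclude a transition window of w periods around the intervention
# date before fitting, to guard against anticipation effects.
# The window width w is a hypothetical choice for illustration.
t0, N, w = 25, 48, 3
sample = list(range(1, N + 1))

kept = [t for t in sample if abs(t - t0) >= w]   # drops t0-2 .. t0+2
print(len(sample), len(kept))  # 48 43
```

The segmented regression is then fit on the kept periods only, so any anticipatory drift just before t₀ does not contaminate the pre-trend estimate.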
When to Use
- A policy or event occurs at a single known date. Smoking bans, speed limit changes, new regulations, product launches, organizational restructurings -- any clearly dated intervention that applies to an entire population.
- You have a long time series before and after the intervention. At least 8 observations per segment, ideally 24+ for seasonal data (Kontopantelis et al., 2015).
- No suitable control group exists. ITS does not require a control group (though one helps). This flexibility makes it ideal for nationwide policies where everyone is treated.
- You want to separate immediate from gradual effects. The level and slope change parameters distinguish immediate shifts from long-term trend changes.
Do NOT Use ITS When:
- The intervention timing is ambiguous. If the policy was phased in over months or years, the sharp break assumed by segmented regression is inappropriate.
- You have very few time points. With 3-4 observations per segment, you cannot reliably estimate the pre-trend or the slope change. Consider DiD with panel data instead.
- The pre-trend is chaotic or nonlinear. If the outcome fluctuates wildly before the intervention, the linear pre-trend extrapolation is unreliable and the counterfactual is poorly identified.
- Multiple interventions overlap. If several policies changed simultaneously, ITS cannot disentangle their individual effects without strong additional assumptions.
Connection to Other Methods
The ITS design relates to several other causal inference methods:
- Difference-in-Differences (DiD): DiD uses a control group to net out common time trends; ITS uses the pre-trend of the treated group as the counterfactual. When you add a control group to ITS, you get a controlled ITS (CITS), which is DiD with more flexible time trends. DiD is preferred when you have a good control group but few time points; ITS is preferred when you have many time points but no control group.
- Regression Discontinuity (RDD): Both ITS and RDD exploit a discontinuity, but the running variable differs. In RDD, the running variable is a score that determines treatment (e.g., test scores above a cutoff). In ITS, the running variable is time. ITS can be thought of as "RDD in time" (Lopez Bernal et al., 2017).
- Synthetic Control: When no single control group is available, synthetic control constructs a weighted combination of untreated units that matches the treated unit's pre-trend. ITS uses the treated unit's own pre-trend as the counterfactual. Synthetic control is preferred when you have a panel of potential control units; ITS is preferred when you have a single treated unit with a long time series.
- Event Studies: Event study designs estimate dynamic treatment effects at multiple leads and lags around the intervention. ITS can be viewed as a parametric event study that constrains the pre- and post-effects to follow linear trends. Event studies are more flexible but require more data.
C. Visual Intuition
Compare three approaches to estimating the intervention effect. The naive pre-post difference ignores the pre-existing trend, the OLS trend model misses the slope change, and the segmented regression correctly captures both the level shift and the change in trajectory.
Why Segmented Regression? Three Estimators on the Same Data
DGP: Yₜ = 50 + 0.3·t − 5.0·Dₜ − 0.5·(t−t₀)·Dₜ + 2.0·εₜ. Intervention at t₀ = 24, N = 48 periods.
Estimation Results
| Estimator | β̂ | SE | 95% CI | Bias |
|---|---|---|---|---|
| Naive pre-post | -3.315 | 0.924 | [-5.13, -1.50] | +1.685 |
| OLS with trend | -5.674 | 1.825 | [-9.25, -2.10] | -0.674 |
| Segmented regression | -5.982 | 1.322 | [-8.57, -3.39] | -0.982 |
| True β | -5.000 | — | — | — |
Why the difference?
The naive pre-post estimator yields a level change of -3.31 (bias = +1.69). It ignores the pre-existing upward trend of 0.3 per period, so it attributes trend-driven changes to the intervention. OLS with a linear trend controls for the secular trend but assumes no slope change, yielding a level change of -5.67 (bias = -0.67). Because the true DGP includes a slope change, forcing a common slope across pre and post periods introduces bias. The segmented regression models both a level shift and a slope change, yielding a level change of -5.98 (bias = -0.98) and a slope change of -0.617. This is the correct specification for this DGP.
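The comparison can be reproduced on a noiseless version of the same DGP (σ = 0), where segmented regression recovers the true parameters exactly while the naive pre-post difference remains biased. The OLS fit below solves the normal equations directly in pure Python -- a sketch for intuition, not a substitute for lm() or statsmodels.

```python
# Sketch: naive pre-post vs segmented regression on a noiseless version
# of the DGP above (sigma = 0, so estimates are exact).
t0, N = 24, 48
b_true = (50.0, 0.3, -5.0, -0.5)                   # beta_0, beta_1, beta_2, beta_3

X = []
for t in range(1, N + 1):
    d = 1.0 if t >= t0 else 0.0
    X.append([1.0, float(t), d, (t - t0) * d])     # [1, t, D_t, (t - t0) * D_t]
y = [sum(b * x for b, x in zip(b_true, row)) for row in X]

pre = [yi for yi, row in zip(y, X) if row[2] == 0.0]
post = [yi for yi, row in zip(y, X) if row[2] == 1.0]
naive = sum(post) / len(post) - sum(pre) / len(pre)   # biased by the trend

# Solve (X'X) beta = X'y by Gauss-Jordan elimination with partial pivoting:
k = 4
A = [[sum(r[i] * r[j] for r in X) for j in range(k)] +
     [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
for c in range(k):
    p = max(range(c, k), key=lambda r: abs(A[r][c]))
    A[c], A[p] = A[p], A[c]
    for r in range(k):
        if r != c:
            f = A[r][c] / A[c][c]
            A[r] = [a - f * b for a, b in zip(A[r], A[c])]
beta = [A[i][k] / A[i][i] for i in range(k)]

print(round(naive, 2))              # -3.8: trend contaminates the naive estimate
print([round(b, 3) for b in beta])  # [50.0, 0.3, -5.0, -0.5]: exact recovery
```

With noise added, the estimates scatter around these values, which is what the table above shows.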
D. Mathematical Derivation
Don't worry about the notation yet — here's what this means in words: The segmented regression model estimates both level and slope changes at the intervention point, with appropriate standard errors for autocorrelated time-series data.
Setup. Suppose we observe an outcome Yₜ for t = 1, …, N, with an intervention occurring at time t₀.
Step 1: Define the design variables.
- t = 1, 2, …, N (time index)
- Dₜ = 1 if t ≥ t₀ and 0 otherwise (post-intervention indicator)
- Sₜ = (t − t₀ + 1)·Dₜ (time since intervention, zero in pre-period)
Step 2: Fit the model.

Yₜ = β₀ + β₁·t + β₂·Dₜ + β₃·Sₜ + εₜ

Under the null hypothesis of no intervention effect, β₂ = 0 and β₃ = 0.
Step 3: Counterfactual construction. The predicted value at post-intervention time t without the intervention is:

Ŷₜ(0) = β̂₀ + β̂₁·t

The predicted value with the intervention is:

Ŷₜ(1) = β̂₀ + β̂₁·t + β̂₂ + β̂₃·Sₜ

The estimated effect at time t is the difference:

τ̂ₜ = Ŷₜ(1) − Ŷₜ(0) = β̂₂ + β̂₃·Sₜ

This effect grows (or shrinks) linearly over time if β̂₃ ≠ 0.
Step 4: Autocorrelation-robust inference. Because Yₜ is a time series, the errors εₜ are typically serially correlated. OLS standard errors assume independence and will be too small, leading to false positives. Use Newey-West standard errors, GLS with an AR(1) error structure, or ARIMA-based approaches.
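The Newey-West idea is easiest to see in its simplest case, the long-run variance of a sample mean: add Bartlett-weighted sample autocovariances to the lag-0 variance. This is a pedagogical pure-Python sketch; in practice use sandwich::NeweyWest in R or a statsmodels HAC fit in Python.

```python
# Sketch: Newey-West (Bartlett kernel) variance of a sample mean.
# Positive autocorrelation inflates the variance relative to the iid
# formula, which is exactly why naive OLS standard errors are too small.
import random

def nw_var_of_mean(y, L):
    n = len(y)
    m = sum(y) / n
    u = [v - m for v in y]
    s = sum(e * e for e in u) / n                  # gamma_0
    for lag in range(1, L + 1):
        w = 1 - lag / (L + 1)                      # Bartlett weight
        g = sum(u[t] * u[t - lag] for t in range(lag, n)) / n
        s += 2 * w * g
    return s / n                                   # estimated Var of the mean

# Strongly autocorrelated AR(1)-style series (illustrative):
random.seed(42)
y = [0.0]
for _ in range(299):
    y.append(0.8 * y[-1] + random.gauss(0, 1))

iid_var = nw_var_of_mean(y, 0)   # lag 0 only: the iid variance formula
nw_var = nw_var_of_mean(y, 8)    # HAC variance with 8 lags
print(nw_var > iid_var)          # the HAC interval is wider
```

The bandwidth L plays the same role as the `lag` argument to `NeweyWest()` and `maxlags` in statsmodels.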
E. Implementation
Segmented Regression with Autocorrelation-Robust SEs
library(lmtest)
library(sandwich)
# ---- Step 1: Construct ITS variables ----
# Assume df has columns: month (a Date), admissions
df$time <- 1:nrow(df)
df$post <- as.integer(df$month >= as.Date("2010-01-01"))
df$time_since <- ifelse(df$post == 1,
df$time - min(df$time[df$post == 1]) + 1, 0)
# ---- Step 2: Fit segmented regression (OLS) ----
its_ols <- lm(admissions ~ time + post + time_since, data = df)
summary(its_ols)
# ---- Step 3: Newey-West HAC standard errors ----
# Bandwidth = floor(0.75 * N^(1/3)) is a common rule of thumb
bw <- floor(0.75 * nrow(df)^(1/3))
coeftest(its_ols, vcov = NeweyWest(its_ols, lag = bw,
prewhite = FALSE))
# ---- Step 4: Check for autocorrelation ----
dwtest(its_ols) # Durbin-Watson test
bgtest(its_ols, order = 12) # Breusch-Godfrey (up to lag 12)
acf(resid(its_ols), main = "ACF of Residuals")
# ---- Step 5: GLS with AR(1) errors (alternative) ----
library(nlme)
its_gls <- gls(admissions ~ time + post + time_since,
data = df,
correlation = corARMA(p = 1, q = 0))
summary(its_gls)
# ---- Step 6: Plot the ITS ----
plot(df$time, df$admissions, pch = 19, cex = 0.6,
xlab = "Month", ylab = "Hospital Admissions",
main = "Interrupted Time Series: Smoking Ban")
abline(v = min(df$time[df$post == 1]) - 0.5,
lty = 2, col = "red", lwd = 2)
# Pre-intervention fitted line
pre <- df[df$post == 0, ]
lines(pre$time, predict(its_ols, pre), col = "blue", lwd = 2)
# Post-intervention fitted line
pst <- df[df$post == 1, ]
lines(pst$time, predict(its_ols, pst), col = "darkgreen", lwd = 2)
# Counterfactual projection
cf <- data.frame(time = pst$time, post = 0, time_since = 0)
lines(pst$time, predict(its_ols, cf),
col = "blue", lwd = 2, lty = 3)
legend("topright",
c("Pre-trend", "Post-trend", "Counterfactual"),
col = c("blue", "darkgreen", "blue"),
lty = c(1, 1, 3), lwd = 2)
F. Diagnostics
F.1 Durbin-Watson Test
The Durbin-Watson test checks for first-order autocorrelation in OLS residuals. The test statistic ranges from 0 to 4: values near 2 indicate no autocorrelation, values significantly below 2 indicate positive autocorrelation (common in time series), and values above 2 indicate negative autocorrelation.
- In R: `dwtest(model)` from the `lmtest` package
- In Stata: `estat dwatson` after `regress`
- In Python: `durbin_watson(results.resid)` from `statsmodels`
If the Durbin-Watson statistic is far from 2, OLS standard errors are invalid and you must use Newey-West SEs, GLS, or an ARIMA-based approach.
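The statistic itself is one line of arithmetic; a pure-Python sketch for intuition (use the library implementations above in practice):

```python
# Sketch: Durbin-Watson from first principles. DW is approximately
# 2 * (1 - r1): near 2 means no lag-1 autocorrelation, near 0 strong
# positive autocorrelation, near 4 strong negative autocorrelation.
def durbin_watson(resid):
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e * e for e in resid)
    return num / den

print(durbin_watson([1.0, 1.0, -1.0, -1.0]))  # 1.0 (positive autocorrelation)
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0 (negative autocorrelation)
```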
F.2 Ljung-Box Test
The Ljung-Box test checks for autocorrelation at multiple lags simultaneously. Unlike Durbin-Watson (which only tests lag 1), Ljung-Box tests whether the first k autocorrelations are jointly zero. Multi-lag testing is especially important for seasonal data where autocorrelation may be present at lag 12 (monthly) or lag 4 (quarterly) even if lag 1 autocorrelation is mild.
- In R: `Box.test(resid(model), lag = 12, type = "Ljung-Box")`
- In Stata: `estat bgodfrey, lags(1/12)` (Breusch-Godfrey, similar purpose)
- In Python: `acorr_ljungbox(results.resid, lags=12)`
F.3 Residual Autocorrelation Plots
Plot the autocorrelation function (ACF) and partial autocorrelation function (PACF) of the model residuals. Under correct specification:
- ACF should show no significant spikes beyond lag 0
- PACF should show no significant spikes beyond lag 0
Significant spikes at lag 1 suggest AR(1) errors. Spikes at seasonal lags (12 for monthly data) suggest seasonality not captured by the model. Either add seasonal dummies or use seasonal ARIMA.
F.4 Seasonal Decomposition
For monthly or quarterly data, decompose the outcome into trend, seasonal, and residual components before fitting the ITS model. If strong seasonality is present, add seasonal dummies (month indicators) to the segmented regression:
Yₜ = β₀ + β₁·t + β₂·Dₜ + β₃·Sₜ + Σₘ γₘ·Mₘₜ + εₜ

where Mₘₜ are month dummies (one month omitted as the reference category). Omitting seasonal controls when seasonality is present will bias the level-change estimate if the intervention happens to coincide with a seasonal peak or trough (Lopez Bernal et al., 2017).
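Building the month dummies is mechanical; a pure-Python sketch assuming the series starts in January (the start month and column layout are illustrative assumptions):

```python
# Sketch: month-of-year dummies for the seasonal segmented regression.
# Assumes a January start; 11 dummies with January as the omitted
# reference category. Column layout is illustrative.
t0, N = 25, 48
rows = []
for t in range(1, N + 1):
    month = (t - 1) % 12 + 1                       # calendar month 1..12
    d = 1.0 if t >= t0 else 0.0
    s = (t - t0 + 1) * d
    month_dummies = [1.0 if month == m else 0.0 for m in range(2, 13)]
    rows.append([1.0, float(t), d, s] + month_dummies)

print(len(rows), len(rows[0]))  # 48 15  (4 base columns + 11 dummies)
```

This mirrors what `factor(format(df$month, "%m"))` does inside `lm()` in the R block below.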
F.5 Visual Inspection of the Fit
Always plot three elements:
- Raw data points -- the observed time series
- Fitted segments -- the pre- and post-intervention regression lines
- Counterfactual projection -- the dotted extension of the pre-trend into the post-period
The gap between the post-trend line and the counterfactual is the estimated intervention effect. If the data points do not track the fitted lines reasonably well, the linear specification may be wrong.
library(lmtest)
its_fit <- lm(admissions ~ time + post + time_since, data = df)
# F.1 Durbin-Watson
dwtest(its_fit)
# F.2 Breusch-Godfrey for higher-order autocorrelation
bgtest(its_fit, order = 12)
# F.3 ACF / PACF plots
par(mfrow = c(1, 2))
acf(resid(its_fit), lag.max = 24, main = "ACF of Residuals")
pacf(resid(its_fit), lag.max = 24, main = "PACF of Residuals")
# F.4 Seasonal decomposition (pre-intervention only)
pre_ts <- ts(df$admissions[df$post == 0], frequency = 12)
decomp <- decompose(pre_ts)
plot(decomp)
# F.5 Add seasonal dummies if needed
df$month_of_year <- factor(format(df$month, "%m"))
its_seasonal <- lm(admissions ~ time + post + time_since +
month_of_year, data = df)
bgtest(its_seasonal, order = 12)
Reading the Coefficients
The four coefficients from the segmented regression have direct policy interpretations:
| Parameter | Estimate | Interpretation |
|---|---|---|
| β₀ | 320 | Estimated admissions at time 0 (intercept) |
| β₁ | -0.35 | Pre-ban trend: admissions falling by 0.35/month |
| β₂ | -11.2 | Immediate effect: admissions dropped by 11.2 at the ban date |
| β₃ | -0.40 | Trend change: post-ban decline is 0.40/month steeper |
The total effect at time s after the intervention is:

τ̂(s) = β̂₂ + β̂₃·s

At 12 months post-ban: τ̂(12) = -11.2 + (-0.40)(12) = -16.0 admissions per month relative to counterfactual.
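The same arithmetic as a one-line Python check, using the illustrative estimates from the table above:

```python
# Sketch: total effect s months after the intervention,
# tau(s) = beta_2 + beta_3 * s, with the illustrative estimates above.
b2, b3 = -11.2, -0.40

def tau(s):
    return b2 + b3 * s

print(round(tau(12), 1))  # -16.0: immediate drop plus 12 months of steeper decline
```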
What to Report
A well-reported ITS analysis should include:
- Number of time points before and after the intervention
- Level change (β₂) with confidence interval and p-value
- Slope change (β₃) with confidence interval and p-value
- Pre-intervention trend (β₁) to show what the counterfactual trajectory looks like
- Standard error type (Newey-West, GLS, etc.) and bandwidth if applicable
- Autocorrelation diagnostics (Durbin-Watson, ACF)
- A plot showing the raw data, fitted segments, and counterfactual projection
G. What Can Go Wrong
Each pitfall below is paired with the design feature that avoids it and the estimate a sound analysis reports.
Concurrent Event Confounds the Intervention
ITS applied to a smoking ban where no other major cardiovascular policy changed at the same time
Level change: -11.2 admissions/month (SE = 3.1, p < 0.001). Slope change: -0.4/month (SE = 0.15, p = 0.008). The ban is associated with an immediate drop and an accelerating decline in admissions.
Autocorrelation Ignored — False Precision
Segmented regression with Newey-West HAC standard errors to account for serial correlation in monthly admissions data
Level change: -8.3 admissions/month. Newey-West SE = 4.2. 95% CI: [-16.5, -0.1]. p = 0.047. The effect is marginally significant with appropriately wide confidence intervals.
Short Pre-Period — Unreliable Counterfactual
ITS with 72 monthly observations before the intervention (6 years), providing a stable and well-estimated pre-trend
Pre-trend slope: -0.35 admissions/month (SE = 0.05). The pre-trend is precisely estimated with narrow confidence intervals, yielding a credible counterfactual projection.
Anticipation Effects Contaminate the Pre-Trend
An unannounced policy change — hospitals and patients had no advance knowledge of the smoking ban, so behavior did not change before the implementation date
Level change: -10.5 admissions/month. The discontinuity at the intervention date is sharp and clearly visible in the data.
H. Practice
H.1 Concept Checks
A researcher fits a segmented regression to monthly data and finds β₂ = -15 (p < 0.001) and β₃ = +1.2 (p = 0.03). What does this pattern tell you about the intervention effect?
You run a Durbin-Watson test on the residuals of your ITS segmented regression and get DW = 0.95. What does this imply, and what should you do?
A public health study uses ITS to evaluate a vaccination campaign launched in March 2015. The analysis uses monthly disease incidence from 2010-2019. The researcher finds a significant level drop at March 2015. A critic points out that a new diagnostic test was introduced in the same region in February 2015, which reduced reported disease cases by changing the diagnostic threshold. How should the researcher respond?
H.2 Guided Exercise
Interpreting ITS Output from a Smoking Ban Study
You evaluate the effect of a citywide smoking ban (effective January 2012) on monthly emergency room visits for respiratory complaints. You fit a segmented regression to 48 pre-intervention months (2008-2011) and 48 post-intervention months (2012-2015). Your output:

| Variable | Coefficient | Newey-West SE | 95% CI | p-value |
|---|---|---|---|---|
| Intercept | 245.0 | 8.2 | [228.9, 261.1] | < 0.001 |
| Time (pre-trend) | -0.50 | 0.12 | [-0.74, -0.26] | < 0.001 |
| Post (level) | -18.3 | 5.6 | [-29.3, -7.3] | 0.001 |
| Time_since (slope) | -0.65 | 0.22 | [-1.08, -0.22] | 0.004 |

Durbin-Watson = 1.82. Ljung-Box (12 lags) p = 0.31. N = 96 monthly observations (48 pre, 48 post).
H.3 Error Detective
Read the analysis below carefully and identify the errors.
Select all errors you can find:
H.4 You Are the Referee
Read the paper summary below and write a brief referee critique (2-3 sentences) of the identification strategy.
Paper Summary
The authors evaluate the effect of a statewide opioid prescribing limit (enacted July 2019, capping initial prescriptions at 7 days) on monthly opioid-related emergency department visits. They collect monthly ED visit counts from January 2017 to December 2020 (30 pre-intervention months, 18 post-intervention months) for a single state. They fit a segmented regression with OLS and report a significant level drop of 42 visits per month (p = 0.003) and a non-significant slope change. They present a plot of the fitted model but do not report autocorrelation diagnostics.
Key Table
| Variable | Coefficient | SE | p-value |
|---|---|---|---|
| Intercept | 312.0 | 14.5 | <0.001 |
| Time (pre-trend) | -1.80 | 0.42 | <0.001 |
| Post (level change) | -42.0 | 13.8 | 0.003 |
| Time_since (slope) | +0.35 | 0.68 | 0.610 |
| N (months) | 48 | | |
| Pre-intervention | 30 | | |
| Post-intervention | 18 | | |
Authors' Identification Claim
The prescribing limit created a clear intervention point. The authors argue that the stable pre-trend validates the counterfactual projection, and that no other major opioid policy changes occurred in the state during the study period.
I. Swap-In: When to Use Something Else
- Difference-in-Differences (DiD) with a control group: when you have a suitable comparison group not affected by the intervention but with a similar pre-trend. DiD-with-control is more credible than single-group ITS because it nets out time-varying confounders that affect both groups. Preferred when control groups are available, even if you have only a few time periods.
- Synthetic Control: when you have one treated unit and a panel of untreated units that can be combined to match the treated unit's pre-trend. Synthetic control is the method of choice when a single jurisdiction adopts a policy and you have data on many non-adopting jurisdictions. More flexible than ITS because it does not assume a linear counterfactual.
- ARIMA-based ITS: when the time series has strong autocorrelation, seasonality, or nonlinear trends that the simple segmented regression cannot capture. ARIMA models (e.g., intervention analysis via transfer functions) fit the autocorrelation structure directly rather than relying on post-hoc corrections. More complex to specify but handles autocorrelation better.
J. Reviewer Checklist
Critical Reading Checklist
- Is the intervention date sharp, well-documented, and correctly specified?
- Are there enough time points per segment (at least 8-12, more for seasonal data)?
- Are autocorrelation diagnostics reported (Durbin-Watson, Ljung-Box, ACF/PACF)?
- Are standard errors autocorrelation-robust (Newey-West, GLS, or ARIMA-based)?
- Are concurrent events and anticipation effects discussed and ruled out?
- Is seasonality modeled for monthly or quarterly data?
- Does a figure show the raw data, fitted segments, and counterfactual projection?
Paper Library
Foundational (2)
Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and Quasi-Experimental Designs for Generalized Causal Inference.
The definitive textbook on quasi-experimental designs, including a comprehensive treatment of interrupted time series. Discusses threats to validity (history, instrumentation, selection-maturation interaction) specific to ITS designs and provides guidance on when ITS is most credible.
Wagner, A. K., Soumerai, S. B., Zhang, F., & Ross-Degnan, D. (2002). Segmented Regression Analysis of Interrupted Time Series Studies in Medication Use Research.
Foundational paper formalizing segmented regression for ITS in health services research. Clearly specifies the model with level-change and slope-change parameters, discusses autocorrelation correction, and provides practical recommendations for minimum series length and model diagnostics.
Survey (4)
Lopez Bernal, J., Cummins, S., & Gasparrini, A. (2017). Interrupted Time Series Regression for the Evaluation of Public Health Interventions: A Tutorial.
Accessible tutorial on ITS regression for public health researchers. Covers the segmented regression model, autocorrelation diagnostics, Newey-West standard errors, and practical guidance on minimum number of time points. An excellent starting point for applied researchers.
Kontopantelis, E., Doran, T., Springate, D. A., Buchan, I., & Reeves, D. (2015). Regression Based Quasi-Experimental Approach When Randomisation Is Not an Option: Interrupted Time Series Analysis.
Practical guide to ITS analysis published in the BMJ. Covers model specification, autocorrelation testing, sensitivity analyses, and the addition of control series. Provides clear visual examples of level and slope changes and discusses common pitfalls.
Linden, A. (2015). Conducting Interrupted Time-Series Analysis for Single- and Multiple-Group Comparisons.
Introduces the itsa command in Stata for single- and multiple-group ITS analysis. Covers Newey-West standard errors for autocorrelation, Prais-Winsten estimation, and the extension to controlled ITS with a comparison group. A key reference for Stata users.
Lopez Bernal, J., Cummins, S., & Gasparrini, A. (2018). The Use of Controls in Interrupted Time Series Studies of Public Health Interventions.
Tutorial on extending ITS analysis with control groups to strengthen causal inference. Discusses controlled ITS (CITS) designs that combine the ITS framework with a comparison series, addressing the key threat of concurrent events confounding the intervention effect.