Interrupted Time Series (ITS)
Estimates causal effects of interventions by modeling level and slope changes in a single unit's time series at the intervention point.
Quick Reference
- When to Use
- When you have a single treated unit (or group) with a long pre-intervention time series and no suitable control group, and the intervention date is known and sharp.
- Key Assumption
- The pre-intervention trend would have continued unchanged in the absence of the intervention. No concurrent events affect the outcome at the intervention time.
- Common Mistake
- Ignoring autocorrelation in the time series, which inflates t-statistics and produces false positives. Use Newey-West SEs or model the autocorrelation structure explicitly.
- Estimated Time
- 2.5 hours
One-Line Implementation
Stata:

```stata
itsa outcome, single trperiod(intervention_date) lag(1) posttrend
```

R:

```r
lm(y ~ time + intervention + time_since_intervention, data = df) |>
  coeftest(vcov = NeweyWest)
```

Python:

```python
smf.ols('y ~ time + intervention + time_since', data=df).fit(cov_type='HAC', cov_kwds={'maxlags': 4})
```
Motivating Example
A public health researcher wants to know whether a comprehensive smoking ban introduced in January 2010 reduced hospital admissions for acute coronary events. She collects monthly hospital admission counts from January 2004 through December 2015 -- six years before and six years after the ban.
Here is the problem: she cannot randomly assign the smoking ban to some months and not others. The ban was a single policy change applied to an entire jurisdiction at a specific point in time. There is no control group -- every hospital in the region was affected simultaneously.
She cannot simply compare the average admission rate before and after the ban, either. Hospital admissions were already declining over time due to secular trends in cardiovascular health, improvements in emergency medicine, and other public health initiatives. A naive before-after comparison would attribute the entire pre-existing downward trend to the ban, vastly overstating its effect.
The interrupted time series design solves this problem (Wagner et al., 2002). It uses the pre-intervention trend as a counterfactual -- projecting what would have happened without the ban -- and then tests whether the post-intervention data deviates from that projection. The deviation, if any, is attributed to the intervention.
Specifically, the researcher fits a model that allows both the level and the slope of the time series to change at the moment of the intervention. A sudden drop in admissions at the ban date indicates an immediate level change; a steeper post-ban decline indicates a gradual slope change. Both are policy-relevant: the level change captures the immediate effect, and the slope change captures whether the effect grows or diminishes over time.
A. Overview
What the ITS Design Does
The interrupted time series design estimates the causal effect of an intervention that occurs at a known point in time by modeling the outcome as a function of time, with a structural break at the intervention date. The standard segmented regression model is:

Yₜ = β₀ + β₁·t + β₂·Dₜ + β₃·Sₜ + εₜ

where:
- Yₜ is the outcome at time t (e.g., monthly hospital admissions)
- t is the time elapsed since the start of the series (1, 2, 3, ...)
- Dₜ is a dummy variable equal to 1 after the intervention and 0 before
- Sₜ is the time elapsed since the intervention (0 before; 1, 2, 3, ... after)
The four parameters have clear interpretations:
- β₀: baseline level at t = 0
- β₁: pre-intervention slope (the secular trend)
- β₂: immediate level change at the intervention -- the jump (or drop) in the outcome the moment the policy takes effect
- β₃: change in slope after the intervention -- the difference between the post-intervention trend and the pre-intervention trend
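To make the design variables concrete, here is a minimal Python sketch that builds the time index, the post-intervention dummy, and the time-since-intervention variable, then traces the gap between the fitted path and the projected pre-trend. The coefficient values are illustrative assumptions, not estimates from any dataset.

```python
# Sketch: ITS design variables and the implied effect path.
# All coefficient values below are illustrative assumptions, not estimates.
t0 = 25                                        # first post-intervention period (assumed)
N = 48                                         # series length (assumed)
b0, b1, b2, b3 = 320.0, -0.35, -11.2, -0.40    # beta_0 .. beta_3 (assumed)

time = list(range(1, N + 1))                               # t = 1..N
post = [1 if t >= t0 else 0 for t in time]                 # D_t
time_since = [(t - t0 + 1) * d for t, d in zip(time, post)]  # S_t: 0 pre; 1, 2, ... post

fitted = [b0 + b1 * t + b2 * d + b3 * s
          for t, d, s in zip(time, post, time_since)]
counterfactual = [b0 + b1 * t for t in time]   # pre-trend projected forward

# In the post-period the gap between the two paths is beta_2 + beta_3 * S_t:
gap_at_12_months = fitted[t0 + 10] - counterfactual[t0 + 10]   # at S_t = 12
print(round(gap_at_12_months, 1))  # beta_2 + 12 * beta_3 = -16.0
```

The gap grows by β₃ each period, which is exactly the "slope change" interpretation above.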
Level Change vs. Slope Change
Different interventions produce different patterns:
- Level change only (β₂ ≠ 0, β₃ = 0): the intervention causes an immediate, permanent shift. Example: a new billing code instantly changes how diagnoses are recorded.
- Slope change only (β₂ = 0, β₃ ≠ 0): the intervention has no immediate effect but gradually changes the trajectory. Example: a new medical guideline slowly changes physician behavior.
- Both (β₂ ≠ 0, β₃ ≠ 0): the intervention causes an immediate shift and a change in trajectory. Example: a smoking ban immediately reduces exposure and also accelerates a downward trend as compliance increases.
How It Differs from Simple Before-After Comparison
A simple before-after comparison estimates the difference in mean outcomes, Ȳ_post − Ȳ_pre. This comparison conflates the intervention effect with any pre-existing trend. ITS explicitly models the pre-trend and asks whether the post-intervention data deviates from what the pre-trend would have predicted. This feature is why ITS is sometimes called "the strongest quasi-experimental design when randomization is not possible".
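A tiny simulation makes the conflation concrete: with a deterministic declining trend and no intervention effect at all, the naive before-after comparison still reports a large "effect", while comparing the post-period data to the projected pre-trend correctly reports zero. This is a pure-Python sketch with an invented DGP.

```python
# Sketch: a pure deterministic downward trend with NO intervention effect.
# The naive before-after comparison still reports a large "effect".
t0, N = 25, 48
y = [100.0 - 0.5 * t for t in range(1, N + 1)]      # secular trend only

pre, post = y[:t0 - 1], y[t0 - 1:]
naive = sum(post) / len(post) - sum(pre) / len(pre)

# ITS-style comparison: deviation of post data from the projected pre-trend
projected = [100.0 - 0.5 * t for t in range(t0, N + 1)]
its_style = sum(p - q for p, q in zip(post, projected)) / len(post)

print(round(naive, 2), round(its_style, 2))  # -12.0 0.0
```

The naive estimator attributes the entire trend-driven decline to a non-existent intervention; the trend-projection comparison does not.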
When to Use ITS
- Your intervention occurs at a single known time point applied to a population
- You have a sufficient number of time points before and after the intervention (at least 8-12 per segment, ideally more) (Kontopantelis et al., 2015)
- The pre-intervention trend is reasonably stable and estimable
- No control group is available (though adding one strengthens the design -- see Section D)
When NOT to Use ITS
- The intervention was phased in gradually with no clear start date
- The outcome is measured at only a few time points before or after (use DiD instead)
- Multiple major changes happened at the same time as the intervention
- The pre-intervention trend is highly volatile or nonlinear, making extrapolation unreliable
B. Identification
For the ITS design to provide valid causal estimates, three key assumptions must hold (Lopez Bernal et al., 2017).
Assumption 1: Stable Pre-Intervention Trend
Plain language: The pre-intervention trend must be well-characterized and would have continued unchanged in the absence of the intervention. The counterfactual is the projection of the pre-trend into the post-period.
Formally: E[Yₜ(0)] = β₀ + β₁·t for all t ≥ t₀, where Yₜ(0) is the potential outcome without the intervention and t₀ is the intervention time.
This assumption is violated if the pre-intervention trend was nonlinear (e.g., admissions were already accelerating downward before the ban), if it was driven by a transient shock, or if there was a "regression to the mean" effect from a temporary spike just before the intervention.
Assumption 2: No Concurrent Events (History Threat)
Plain language: Nothing else that could affect the outcome happened at the same time as the intervention. If a new cardiac treatment was introduced in the same month as the smoking ban, the estimated effect of the ban is confounded.
Concurrent events are the most common threat to ITS validity. The researcher must carefully document the policy landscape and argue that no other plausible cause of the observed change coincided with the intervention.
Assumption 3: No Anticipation Effects
Plain language: Individuals, firms, or institutions did not change their behavior before the intervention in anticipation of it. If hospitals reduced admissions or smokers quit in the months leading up to the ban (because the ban was announced in advance), the pre-trend is contaminated and the level change at t₀ is attenuated.
If anticipation is plausible, the researcher can:
- Move the intervention date earlier to the announcement date
- Exclude a "transition window" around the intervention
- Test for a structural break before the official date
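The transition-window option can be sketched as a simple filter on the analysis sample. The window width w is a judgment call; the value here is hypothetical.

```python
# Sketch: exclude a transition window of w periods around the intervention
# date before fitting, to guard against anticipation effects.
# The window width w is a hypothetical choice for illustration.
t0, N, w = 25, 48, 3
sample = list(range(1, N + 1))

kept = [t for t in sample if abs(t - t0) >= w]   # drops t0-2 .. t0+2
print(len(sample), len(kept))  # 48 43
```

The segmented regression is then fit on the kept periods only, so any anticipatory drift just before t₀ does not contaminate the pre-trend estimate.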
When to Use
- A policy or event occurs at a single known date. Smoking bans, speed limit changes, new regulations, product launches, organizational restructurings -- any clearly dated intervention that applies to an entire population.
- You have a long time series before and after the intervention. At least 8 observations per segment, ideally 24+ for seasonal data (Kontopantelis et al., 2015).
- No suitable control group exists. ITS does not require a control group (though one helps). This flexibility makes it ideal for nationwide policies where everyone is treated.
- You want to separate immediate from gradual effects. The level and slope change parameters distinguish immediate shifts from long-term trend changes.
Do NOT Use ITS When:
- The intervention timing is ambiguous. If the policy was phased in over months or years, the sharp break assumed by segmented regression is inappropriate.
- You have very few time points. With 3-4 observations per segment, you cannot reliably estimate the pre-trend or the slope change. Consider DiD with panel data instead.
- The pre-trend is chaotic or nonlinear. If the outcome fluctuates wildly before the intervention, the linear pre-trend extrapolation is unreliable and the counterfactual is poorly identified.
- Multiple interventions overlap. If several policies changed simultaneously, ITS cannot disentangle their individual effects without strong additional assumptions.
Connection to Other Methods
The ITS design relates to several other causal inference methods:
- Difference-in-Differences (DiD): DiD uses a control group to net out common time trends; ITS uses the pre-trend of the treated group as the counterfactual. When you add a control group to ITS, you get a controlled ITS (CITS), which is DiD with more flexible time trends. DiD is preferred when you have a good control group but few time points; ITS is preferred when you have many time points but no control group.
- Regression Discontinuity (RDD): Both ITS and RDD exploit a discontinuity, but the running variable differs. In RDD, the running variable is a score that determines treatment (e.g., test scores above a cutoff). In ITS, the running variable is time. ITS can be thought of as "RDD in time" (Lopez Bernal et al., 2017).
- Synthetic Control: When no single control group is available, synthetic control constructs a weighted combination of untreated units that matches the treated unit's pre-trend. ITS uses the treated unit's own pre-trend as the counterfactual. Synthetic control is preferred when you have a panel of potential control units; ITS is preferred when you have a single treated unit with a long time series.
- Event Studies: Event study designs estimate dynamic treatment effects at multiple leads and lags around the intervention. ITS can be viewed as a parametric event study that constrains the pre- and post-effects to follow linear trends. Event studies are more flexible but require more data.
C. Visual Intuition
Compare three approaches to estimating the intervention effect. The naive pre-post difference ignores the pre-existing trend, the OLS trend model misses the slope change, and the segmented regression correctly captures both the level shift and the change in trajectory.
Why Segmented Regression? Three Estimators on the Same Data
DGP: Yₜ = 50 + 0.3·t − 5.0·Dₜ − 0.5·(t−t₀)·Dₜ + 2.0·εₜ. Intervention at t₀ = 24, N = 48 periods.
Estimation Results
| Estimator | β̂ | SE | 95% CI | Bias |
|---|---|---|---|---|
| Naive pre-post | -3.315 | 0.924 | [-5.13, -1.50] | +1.685 |
| OLS with trend | -5.674 | 1.825 | [-9.25, -2.10] | -0.674 |
| Segmented regression | -5.982 | 1.322 | [-8.57, -3.39] | -0.982 |
| True β | -5.000 | — | — | — |
Why the difference?
The naive pre-post estimator yields a level change of -3.31 (bias = +1.69). It ignores the pre-existing upward trend of 0.3 per period, so it attributes trend-driven changes to the intervention. OLS with a linear trend controls for the secular trend but assumes no slope change, yielding a level change of -5.67 (bias = -0.67). Because the true DGP includes a slope change, forcing a common slope across pre and post periods introduces bias. The segmented regression models both a level shift and a slope change, yielding a level change of -5.98 (bias = -0.98) and a slope change of -0.617. This is the correct specification for this DGP.
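The comparison can be reproduced on a noiseless version of the same DGP (σ = 0), where segmented regression recovers the true parameters exactly while the naive pre-post difference remains biased. The OLS fit below solves the normal equations directly in pure Python -- a sketch for intuition, not a substitute for lm() or statsmodels.

```python
# Sketch: naive pre-post vs segmented regression on a noiseless version
# of the DGP above (sigma = 0, so estimates are exact).
t0, N = 24, 48
b_true = (50.0, 0.3, -5.0, -0.5)                   # beta_0, beta_1, beta_2, beta_3

X = []
for t in range(1, N + 1):
    d = 1.0 if t >= t0 else 0.0
    X.append([1.0, float(t), d, (t - t0) * d])     # [1, t, D_t, (t - t0) * D_t]
y = [sum(b * x for b, x in zip(b_true, row)) for row in X]

pre = [yi for yi, row in zip(y, X) if row[2] == 0.0]
post = [yi for yi, row in zip(y, X) if row[2] == 1.0]
naive = sum(post) / len(post) - sum(pre) / len(pre)   # biased by the trend

# Solve (X'X) beta = X'y by Gauss-Jordan elimination with partial pivoting:
k = 4
A = [[sum(r[i] * r[j] for r in X) for j in range(k)] +
     [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
for c in range(k):
    p = max(range(c, k), key=lambda r: abs(A[r][c]))
    A[c], A[p] = A[p], A[c]
    for r in range(k):
        if r != c:
            f = A[r][c] / A[c][c]
            A[r] = [a - f * b for a, b in zip(A[r], A[c])]
beta = [A[i][k] / A[i][i] for i in range(k)]

print(round(naive, 2))              # -3.8: trend contaminates the naive estimate
print([round(b, 3) for b in beta])  # [50.0, 0.3, -5.0, -0.5]: exact recovery
```

With noise added, the estimates scatter around these values, which is what the table above shows.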
D. Mathematical Derivation
Don't worry about the notation yet — here's what this means in words: The segmented regression model estimates both level and slope changes at the intervention point, with appropriate standard errors for autocorrelated time-series data.
Setup. Suppose we observe an outcome Yₜ for t = 1, …, N, with an intervention occurring at time t₀.
Step 1: Define the design variables.
- t = 1, 2, …, N (time index)
- Dₜ = 1 if t ≥ t₀ and 0 otherwise (post-intervention indicator)
- Sₜ = (t − t₀ + 1)·Dₜ (time since intervention, zero in pre-period)
Step 2: Fit the model.

Yₜ = β₀ + β₁·t + β₂·Dₜ + β₃·Sₜ + εₜ

Under the null hypothesis of no intervention effect, β₂ = 0 and β₃ = 0.
Step 3: Counterfactual construction. The predicted value at post-intervention time t without the intervention is:

Ŷₜ(0) = β̂₀ + β̂₁·t

The predicted value with the intervention is:

Ŷₜ(1) = β̂₀ + β̂₁·t + β̂₂ + β̂₃·Sₜ

The estimated effect at time t is the difference:

τ̂ₜ = Ŷₜ(1) − Ŷₜ(0) = β̂₂ + β̂₃·Sₜ

This effect grows (or shrinks) linearly over time if β̂₃ ≠ 0.
Step 4: Autocorrelation-robust inference. Because Yₜ is a time series, the errors εₜ are typically serially correlated. OLS standard errors assume independence and will be too small, leading to false positives. Use Newey-West standard errors, GLS with an AR(1) error structure, or ARIMA-based approaches.
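The Newey-West idea is easiest to see in its simplest case, the long-run variance of a sample mean: add Bartlett-weighted sample autocovariances to the lag-0 variance. This is a pedagogical pure-Python sketch; in practice use sandwich::NeweyWest in R or a statsmodels HAC fit in Python.

```python
# Sketch: Newey-West (Bartlett kernel) variance of a sample mean.
# Positive autocorrelation inflates the variance relative to the iid
# formula, which is exactly why naive OLS standard errors are too small.
import random

def nw_var_of_mean(y, L):
    n = len(y)
    m = sum(y) / n
    u = [v - m for v in y]
    s = sum(e * e for e in u) / n                  # gamma_0
    for lag in range(1, L + 1):
        w = 1 - lag / (L + 1)                      # Bartlett weight
        g = sum(u[t] * u[t - lag] for t in range(lag, n)) / n
        s += 2 * w * g
    return s / n                                   # estimated Var of the mean

# Strongly autocorrelated AR(1)-style series (illustrative):
random.seed(42)
y = [0.0]
for _ in range(299):
    y.append(0.8 * y[-1] + random.gauss(0, 1))

iid_var = nw_var_of_mean(y, 0)   # lag 0 only: the iid variance formula
nw_var = nw_var_of_mean(y, 8)    # HAC variance with 8 lags
print(nw_var > iid_var)          # the HAC interval is wider
```

The bandwidth L plays the same role as the `lag` argument to `NeweyWest()` and `maxlags` in statsmodels.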
E. Implementation
Segmented Regression with Autocorrelation-Robust SEs
library(lmtest)
library(sandwich)
# ---- Step 1: Construct ITS variables ----
# Assume df has columns: month (a Date), admissions
df$time <- 1:nrow(df)
df$post <- as.integer(df$month >= as.Date("2010-01-01"))
df$time_since <- ifelse(df$post == 1,
df$time - min(df$time[df$post == 1]) + 1, 0)
# ---- Step 2: Fit segmented regression (OLS) ----
its_ols <- lm(admissions ~ time + post + time_since, data = df)
summary(its_ols)
# ---- Step 3: Newey-West HAC standard errors ----
# Bandwidth = floor(0.75 * N^(1/3)) is a common rule of thumb
bw <- floor(0.75 * nrow(df)^(1/3))
coeftest(its_ols, vcov = NeweyWest(its_ols, lag = bw,
prewhite = FALSE))
# ---- Step 4: Check for autocorrelation ----
dwtest(its_ols) # Durbin-Watson test
bgtest(its_ols, order = 12) # Breusch-Godfrey (up to lag 12)
acf(resid(its_ols), main = "ACF of Residuals")
# ---- Step 5: GLS with AR(1) errors (alternative) ----
library(nlme)
its_gls <- gls(admissions ~ time + post + time_since,
data = df,
correlation = corARMA(p = 1, q = 0))
summary(its_gls)
# ---- Step 6: Plot the ITS ----
plot(df$time, df$admissions, pch = 19, cex = 0.6,
xlab = "Month", ylab = "Hospital Admissions",
main = "Interrupted Time Series: Smoking Ban")
abline(v = min(df$time[df$post == 1]) - 0.5,
lty = 2, col = "red", lwd = 2)
# Pre-intervention fitted line
pre <- df[df$post == 0, ]
lines(pre$time, predict(its_ols, pre), col = "blue", lwd = 2)
# Post-intervention fitted line
pst <- df[df$post == 1, ]
lines(pst$time, predict(its_ols, pst), col = "darkgreen", lwd = 2)
# Counterfactual projection
cf <- data.frame(time = pst$time, post = 0, time_since = 0)
lines(pst$time, predict(its_ols, cf),
col = "blue", lwd = 2, lty = 3)
legend("topright",
c("Pre-trend", "Post-trend", "Counterfactual"),
col = c("blue", "darkgreen", "blue"),
lty = c(1, 1, 3), lwd = 2)
F. Diagnostics
F.1 Durbin-Watson Test
The Durbin-Watson test checks for first-order autocorrelation in OLS residuals. The test statistic ranges from 0 to 4: values near 2 indicate no autocorrelation, values significantly below 2 indicate positive autocorrelation (common in time series), and values above 2 indicate negative autocorrelation.
- In R: `dwtest(model)` from the `lmtest` package
- In Stata: `estat dwatson` after `regress`
- In Python: `durbin_watson(results.resid)` from `statsmodels`
If the Durbin-Watson statistic is far from 2, OLS standard errors are invalid and you must use Newey-West SEs, GLS, or an ARIMA-based approach.
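The statistic itself is one line of arithmetic; a pure-Python sketch for intuition (use the library implementations above in practice):

```python
# Sketch: Durbin-Watson from first principles. DW is approximately
# 2 * (1 - r1): near 2 means no lag-1 autocorrelation, near 0 strong
# positive autocorrelation, near 4 strong negative autocorrelation.
def durbin_watson(resid):
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e * e for e in resid)
    return num / den

print(durbin_watson([1.0, 1.0, -1.0, -1.0]))  # 1.0 (positive autocorrelation)
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0 (negative autocorrelation)
```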
F.2 Ljung-Box Test
The Ljung-Box test checks for autocorrelation at multiple lags simultaneously. Unlike Durbin-Watson (which only tests lag 1), Ljung-Box tests whether the first k autocorrelations are jointly zero. Multi-lag testing is especially important for seasonal data where autocorrelation may be present at lag 12 (monthly) or lag 4 (quarterly) even if lag 1 autocorrelation is mild.
- In R: `Box.test(resid(model), lag = 12, type = "Ljung-Box")`
- In Stata: `estat bgodfrey, lags(1/12)` (Breusch-Godfrey, similar purpose)
- In Python: `acorr_ljungbox(results.resid, lags=12)`
F.3 Residual Autocorrelation Plots
Plot the autocorrelation function (ACF) and partial autocorrelation function (PACF) of the model residuals. Under correct specification:
- ACF should show no significant spikes beyond lag 0
- PACF should show no significant spikes beyond lag 0
Significant spikes at lag 1 suggest AR(1) errors. Spikes at seasonal lags (12 for monthly data) suggest seasonality not captured by the model. Either add seasonal dummies or use seasonal ARIMA.
F.4 Seasonal Decomposition
For monthly or quarterly data, decompose the outcome into trend, seasonal, and residual components before fitting the ITS model. If strong seasonality is present, add seasonal dummies (month indicators) to the segmented regression:
Yₜ = β₀ + β₁·t + β₂·Dₜ + β₃·Sₜ + Σₘ γₘ·Mₘₜ + εₜ

where Mₘₜ are month dummies (one month omitted as the reference category). Omitting seasonal controls when seasonality is present will bias the level-change estimate if the intervention happens to coincide with a seasonal peak or trough (Lopez Bernal et al., 2017).
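Building the month dummies is mechanical; a pure-Python sketch assuming the series starts in January (the start month and column layout are illustrative assumptions):

```python
# Sketch: month-of-year dummies for the seasonal segmented regression.
# Assumes a January start; 11 dummies with January as the omitted
# reference category. Column layout is illustrative.
t0, N = 25, 48
rows = []
for t in range(1, N + 1):
    month = (t - 1) % 12 + 1                       # calendar month 1..12
    d = 1.0 if t >= t0 else 0.0
    s = (t - t0 + 1) * d
    month_dummies = [1.0 if month == m else 0.0 for m in range(2, 13)]
    rows.append([1.0, float(t), d, s] + month_dummies)

print(len(rows), len(rows[0]))  # 48 15  (4 base columns + 11 dummies)
```

This mirrors what `factor(format(df$month, "%m"))` does inside `lm()` in the R block below.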
F.5 Visual Inspection of the Fit
Always plot three elements:
- Raw data points -- the observed time series
- Fitted segments -- the pre- and post-intervention regression lines
- Counterfactual projection -- the dotted extension of the pre-trend into the post-period
The gap between the post-trend line and the counterfactual is the estimated intervention effect. If the data points do not track the fitted lines reasonably well, the linear specification may be wrong.
library(lmtest)
its_fit <- lm(admissions ~ time + post + time_since, data = df)
# F.1 Durbin-Watson
dwtest(its_fit)
# F.2 Breusch-Godfrey for higher-order autocorrelation
bgtest(its_fit, order = 12)
# F.3 ACF / PACF plots
par(mfrow = c(1, 2))
acf(resid(its_fit), lag.max = 24, main = "ACF of Residuals")
pacf(resid(its_fit), lag.max = 24, main = "PACF of Residuals")
# F.4 Seasonal decomposition (pre-intervention only)
pre_ts <- ts(df$admissions[df$post == 0], frequency = 12)
decomp <- decompose(pre_ts)
plot(decomp)
# F.5 Add seasonal dummies if needed
df$month_of_year <- factor(format(df$month, "%m"))
its_seasonal <- lm(admissions ~ time + post + time_since +
month_of_year, data = df)
bgtest(its_seasonal, order = 12)
Reading the Coefficients
The four coefficients from the segmented regression have direct policy interpretations:
| Parameter | Estimate | Interpretation |
|---|---|---|
| β₀ | 320 | Estimated admissions at time 0 (intercept) |
| β₁ | -0.35 | Pre-ban trend: admissions falling by 0.35/month |
| β₂ | -11.2 | Immediate effect: admissions dropped by 11.2 at the ban date |
| β₃ | -0.40 | Trend change: post-ban decline is 0.40/month steeper |
The total effect at time s after the intervention is:

τ̂(s) = β̂₂ + β̂₃·s

At 12 months post-ban: τ̂(12) = -11.2 + (-0.40)(12) = -16.0 admissions per month relative to counterfactual.
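The same arithmetic as a one-line Python check, using the illustrative estimates from the table above:

```python
# Sketch: total effect s months after the intervention,
# tau(s) = beta_2 + beta_3 * s, with the illustrative estimates above.
b2, b3 = -11.2, -0.40

def tau(s):
    return b2 + b3 * s

print(round(tau(12), 1))  # -16.0: immediate drop plus 12 months of steeper decline
```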
What to Report
A well-reported ITS analysis should include:
- Number of time points before and after the intervention
- Level change (β₂) with confidence interval and p-value
- Slope change (β₃) with confidence interval and p-value
- Pre-intervention trend (β₁) to show what the counterfactual trajectory looks like
- Standard error type (Newey-West, GLS, etc.) and bandwidth if applicable
- Autocorrelation diagnostics (Durbin-Watson, ACF)
- A plot showing the raw data, fitted segments, and counterfactual projection
G. What Can Go Wrong
Each pitfall below is paired with the design feature that avoids it and the estimate a sound analysis reports.
Concurrent Event Confounds the Intervention
ITS applied to a smoking ban where no other major cardiovascular policy changed at the same time
Level change: -11.2 admissions/month (SE = 3.1, p < 0.001). Slope change: -0.4/month (SE = 0.15, p = 0.008). The ban is associated with an immediate drop and an accelerating decline in admissions.
Autocorrelation Ignored — False Precision
Segmented regression with Newey-West HAC standard errors to account for serial correlation in monthly admissions data
Level change: -8.3 admissions/month. Newey-West SE = 4.2. 95% CI: [-16.5, -0.1]. p = 0.047. The effect is marginally significant with appropriately wide confidence intervals.
Short Pre-Period — Unreliable Counterfactual
ITS with 72 monthly observations before the intervention (6 years), providing a stable and well-estimated pre-trend
Pre-trend slope: -0.35 admissions/month (SE = 0.05). The pre-trend is precisely estimated with narrow confidence intervals, yielding a credible counterfactual projection.
Anticipation Effects Contaminate the Pre-Trend
An unannounced policy change — hospitals and patients had no advance knowledge of the smoking ban, so behavior did not change before the implementation date
Level change: -10.5 admissions/month. The discontinuity at the intervention date is sharp and clearly visible in the data.
H. Practice
H.1 Concept Checks
A researcher fits a segmented regression to monthly data and finds β₂ = -15 (p < 0.001) and β₃ = +1.2 (p = 0.03). What does this pattern tell you about the intervention effect?
You run a Durbin-Watson test on the residuals of your ITS segmented regression and get DW = 0.95. What does this imply, and what should you do?
A public health study uses ITS to evaluate a vaccination campaign launched in March 2015. The analysis uses monthly disease incidence from 2010-2019. The researcher finds a significant level drop at March 2015. A critic points out that a new diagnostic test was introduced in the same region in February 2015, which reduced reported disease cases by changing the diagnostic threshold. How should the researcher respond?
H.2 Guided Exercise
Interpreting ITS Output from a Smoking Ban Study
You evaluate the effect of a citywide smoking ban (effective January 2012) on monthly emergency room visits for respiratory complaints. You fit a segmented regression to 48 pre-intervention months (2008-2011) and 48 post-intervention months (2012-2015). Your output:

| Variable | Coefficient | Newey-West SE | 95% CI | p-value |
|---|---|---|---|---|
| Intercept | 245.0 | 8.2 | [228.9, 261.1] | < 0.001 |
| Time (pre-trend) | -0.50 | 0.12 | [-0.74, -0.26] | < 0.001 |
| Post (level) | -18.3 | 5.6 | [-29.3, -7.3] | 0.001 |
| Time_since (slope) | -0.65 | 0.22 | [-1.08, -0.22] | 0.004 |

Durbin-Watson = 1.82. Ljung-Box (12 lags) p = 0.31. N = 96 monthly observations (48 pre, 48 post).
H.3 Error Detective
Read the analysis below carefully and identify the errors.
Select all errors you can find:
H.4 You Are the Referee
Read the paper summary below and write a brief referee critique (2-3 sentences) of the identification strategy.
Paper Summary
The authors evaluate the effect of a statewide opioid prescribing limit (enacted July 2019, capping initial prescriptions at 7 days) on monthly opioid-related emergency department visits. They collect monthly ED visit counts from January 2017 to December 2020 (30 pre-intervention months, 18 post-intervention months) for a single state. They fit a segmented regression with OLS and report a significant level drop of 42 visits per month (p = 0.003) and a non-significant slope change. They present a plot of the fitted model but do not report autocorrelation diagnostics.
Key Table
| Variable | Coefficient | SE | p-value |
|---|---|---|---|
| Intercept | 312.0 | 14.5 | <0.001 |
| Time (pre-trend) | -1.80 | 0.42 | <0.001 |
| Post (level change) | -42.0 | 13.8 | 0.003 |
| Time_since (slope) | +0.35 | 0.68 | 0.610 |
| N (months) | 48 | | |
| Pre-intervention | 30 | | |
| Post-intervention | 18 | | |
Authors' Identification Claim
The prescribing limit created a clear intervention point. The authors argue that the stable pre-trend validates the counterfactual projection, and that no other major opioid policy changes occurred in the state during the study period.
I. Swap-In: When to Use Something Else
- Difference-in-Differences (DiD) with a control group: when you have a suitable comparison group not affected by the intervention but with a similar pre-trend. DiD-with-control is more credible than single-group ITS because it nets out time-varying confounders that affect both groups. Preferred when control groups are available, even if you have only a few time periods.
- Synthetic Control: when you have one treated unit and a panel of untreated units that can be combined to match the treated unit's pre-trend. Synthetic control is the method of choice when a single jurisdiction adopts a policy and you have data on many non-adopting jurisdictions. More flexible than ITS because it does not assume a linear counterfactual.
- ARIMA-based ITS: when the time series has strong autocorrelation, seasonality, or nonlinear trends that the simple segmented regression cannot capture. ARIMA models (e.g., intervention analysis via transfer functions) fit the autocorrelation structure directly rather than relying on post-hoc corrections. More complex to specify but handles autocorrelation better.
J. Reviewer Checklist
Critical Reading Checklist
- Is the intervention date sharp, well-documented, and correctly specified?
- Are there enough time points per segment (at least 8-12, more for seasonal data)?
- Are autocorrelation diagnostics reported (Durbin-Watson, Ljung-Box, ACF/PACF)?
- Are standard errors autocorrelation-robust (Newey-West, GLS, or ARIMA-based)?
- Are concurrent events and anticipation effects discussed and ruled out?
- Is seasonality modeled for monthly or quarterly data?
- Does a figure show the raw data, fitted segments, and counterfactual projection?
Paper Library
Foundational (2)
Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and Quasi-Experimental Designs for Generalized Causal Inference.
The definitive textbook on quasi-experimental designs, including a comprehensive treatment of interrupted time series. Discusses threats to validity (history, instrumentation, selection-maturation interaction) specific to ITS designs and provides guidance on when ITS is most credible.
Wagner, A. K., Soumerai, S. B., Zhang, F., & Ross-Degnan, D. (2002). Segmented Regression Analysis of Interrupted Time Series Studies in Medication Use Research.
Foundational paper formalizing segmented regression for ITS in health services research. Clearly specifies the model with level-change and slope-change parameters, discusses autocorrelation correction, and provides practical recommendations for minimum series length and model diagnostics.
Survey (4)
Lopez Bernal, J., Cummins, S., & Gasparrini, A. (2017). Interrupted Time Series Regression for the Evaluation of Public Health Interventions: A Tutorial.
Accessible tutorial on ITS regression for public health researchers. Covers the segmented regression model, autocorrelation diagnostics, Newey-West standard errors, and practical guidance on minimum number of time points. An excellent starting point for applied researchers.
Kontopantelis, E., Doran, T., Springate, D. A., Buchan, I., & Reeves, D. (2015). Regression Based Quasi-Experimental Approach When Randomisation Is Not an Option: Interrupted Time Series Analysis.
Practical guide to ITS analysis published in the BMJ. Covers model specification, autocorrelation testing, sensitivity analyses, and the addition of control series. Provides clear visual examples of level and slope changes and discusses common pitfalls.
Linden, A. (2015). Conducting Interrupted Time-Series Analysis for Single- and Multiple-Group Comparisons.
Introduces the itsa command in Stata for single- and multiple-group ITS analysis. Covers Newey-West standard errors for autocorrelation, Prais-Winsten estimation, and the extension to controlled ITS with a comparison group. A key reference for Stata users.
Lopez Bernal, J., Cummins, S., & Gasparrini, A. (2018). The Use of Controls in Interrupted Time Series Studies of Public Health Interventions.
Tutorial on extending ITS analysis with control groups to strengthen causal inference. Discusses controlled ITS (CITS) designs that combine the ITS framework with a comparison series, addressing the key threat of concurrent events confounding the intervention effect.