Diagnostics & Pitfalls
Every causal inference method requires diagnostic checks before, during, and after estimation. This page collects the most important checks and the most common pitfalls in one place.
Pre-Estimation Checks
Balance Tables
Compare covariate distributions across treatment and control groups. Large imbalances signal selection bias.
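A common balance statistic is the standardized mean difference; a minimal sketch on simulated data (names and the 0.1 threshold are illustrative conventions, not a fixed rule):

```python
import numpy as np

def standardized_mean_diff(x_treat, x_ctrl):
    """(mean_treat - mean_ctrl) / pooled SD; |SMD| > 0.1 is a common imbalance flag."""
    pooled_sd = np.sqrt((np.var(x_treat, ddof=1) + np.var(x_ctrl, ddof=1)) / 2)
    return (x_treat.mean() - x_ctrl.mean()) / pooled_sd

rng = np.random.default_rng(0)
age_treat = rng.normal(45, 10, 500)  # simulated covariate: treated group skews older
age_ctrl = rng.normal(40, 10, 500)
smd = standardized_mean_diff(age_treat, age_ctrl)  # well above 0.1: imbalanced
```

In practice one row like this is computed per covariate and collected into the balance table.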
Common Support / Overlap
Check that treated and control units share similar covariate distributions. Without overlap, extrapolation is unreliable.
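A crude overlap check (illustrative; real diagnostics usually compare propensity score distributions) is the fraction of treated units whose covariate value falls inside the observed control range:

```python
import numpy as np

def support_coverage(x_treat, x_ctrl):
    """Fraction of treated units inside the observed control covariate range.
    Values well below 1 flag poor overlap and reliance on extrapolation."""
    lo, hi = x_ctrl.min(), x_ctrl.max()
    return np.mean((x_treat >= lo) & (x_treat <= hi))

rng = np.random.default_rng(1)
x_ctrl = rng.normal(0, 1, 300)
x_treat = rng.normal(2, 1, 300)   # treated distribution shifted right
coverage = support_coverage(x_treat, x_ctrl)  # noticeably below 1
```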
Pre-Trend Tests (Event Study)
Plot leads and lags of treatment. Pre-treatment coefficients near zero are consistent with parallel trends, though a non-significant pre-trend test does not guarantee the assumption holds (Roth, 2022).
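A minimal event-study sketch on simulated panel data (pure-numpy OLS; the design and names are illustrative). Treatment hits at event time 0, the period just before treatment (t = -1) is the omitted reference, and lead coefficients should sit near zero:

```python
import numpy as np

rng = np.random.default_rng(2)
n_units = 200
periods = np.arange(-4, 4)                 # event time -4..3; treatment at t = 0
rows = []
for u in range(n_units):
    alpha = rng.normal(0, 1)               # unit effect
    for t in periods:
        effect = 1.0 if t >= 0 else 0.0    # true dynamic effect
        rows.append((u, t, alpha + effect + rng.normal(0, 0.5)))
data = np.array(rows)
y = data[:, 2]

# Dummies for each event time except t = -1 (reference period)
dummy_times = [t for t in periods if t != -1]
X = np.column_stack([np.ones(len(y))] +
                    [(data[:, 1] == t).astype(float) for t in dummy_times])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
coefs = dict(zip(dummy_times, beta[1:]))   # leads (t < -1) near zero, lags near 1
```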
McCrary Density Test
Test for bunching of the running variable at the cutoff. Bunching raises concerns about manipulation of the running variable, which would violate the continuity assumption underlying RDD.
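The actual McCrary (2008) test fits local linear densities on each side of the cutoff; as a rough illustration of what it looks for, a simple count ratio in a small window already flags crude bunching (simulated data, names illustrative):

```python
import numpy as np

def density_jump(running, cutoff, h):
    """Ratio of observation counts just above vs. just below the cutoff
    within bandwidth h. A ratio far from 1 motivates a formal density test."""
    left = np.sum((running >= cutoff - h) & (running < cutoff))
    right = np.sum((running >= cutoff) & (running < cutoff + h))
    return right / max(left, 1)

rng = np.random.default_rng(3)
smooth = rng.uniform(-1, 1, 2000)                             # no manipulation
bunched = np.concatenate([smooth, rng.uniform(0, 0.1, 400)])  # piling up above cutoff
r_smooth = density_jump(smooth, 0.0, 0.1)    # near 1
r_bunched = density_jump(bunched, 0.0, 0.1)  # far above 1
```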
Estimation Diagnostics
First-Stage F-Statistic
For IV: the widely used F < 10 heuristic (Staiger & Stock, 1997) suggests a weak instrument, though more recent work recommends the effective F-statistic with a threshold that depends on the tolerable bias (Montiel Olea & Pflueger, 2013). Consider weak-instrument robust inference (e.g., Anderson-Rubin) when the first stage is borderline.
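With a single instrument, the first-stage F-statistic is just the squared t-statistic on the instrument; a simulated sketch (homoskedastic SEs, illustrative parameters):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000
z = rng.normal(size=n)                 # instrument
x = 0.3 * z + rng.normal(size=n)       # endogenous regressor's first stage

# First-stage OLS of x on z (with constant); F = t^2 with one instrument
X = np.column_stack([np.ones(n), z])
beta = np.linalg.lstsq(X, x, rcond=None)[0]
resid = x - X @ beta
sigma2 = resid @ resid / (n - 2)
var_b = sigma2 * np.linalg.inv(X.T @ X)[1, 1]
F = beta[1] ** 2 / var_b                # compare against the F > 10 rule of thumb
```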
Overidentification Tests
With multiple instruments, test whether all instruments are exogenous (Hansen J test). Rejection suggests at least one instrument may be invalid, though the test has low power in some settings and depends on the other instruments being valid.
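A sketch of the homoskedastic version of this test (the Sargan statistic; the heteroskedasticity-robust Hansen J follows the same logic) with two valid simulated instruments and one endogenous regressor:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 2000
z1, z2 = rng.normal(size=n), rng.normal(size=n)   # two valid instruments
u = rng.normal(size=n)                            # structural error
x = 0.5 * z1 + 0.5 * z2 + u + rng.normal(size=n)  # endogenous: correlated with u
y = 1.0 * x + u

Z = np.column_stack([np.ones(n), z1, z2])
Xe = np.column_stack([np.ones(n), x])
Px = Z @ np.linalg.solve(Z.T @ Z, Z.T @ Xe)       # projection of X onto instruments
beta = np.linalg.solve(Px.T @ Xe, Px.T @ y)       # 2SLS coefficients
resid = y - Xe @ beta

# Sargan statistic: n * R^2 from regressing 2SLS residuals on the instruments;
# distributed chi2(#instruments - #endogenous) = chi2(1) if all instruments valid
g = np.linalg.lstsq(Z, resid, rcond=None)[0]
r2 = 1 - np.sum((resid - Z @ g) ** 2) / np.sum((resid - resid.mean()) ** 2)
J = n * r2   # small here, consistent with validity
```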
Hausman Test (FE vs RE)
Tests whether the random effects assumption holds. Rejection provides evidence in favor of fixed effects, though the test can be sensitive to other specification issues.
Goodness of Fit (Pre-Treatment)
For synthetic control: how well does the synthetic unit match the treated unit before treatment?
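The standard fit statistic is the pre-treatment root mean squared prediction error (RMSPE) between the treated and synthetic series; a minimal sketch with made-up numbers:

```python
import numpy as np

def pre_rmspe(treated, synthetic, t0):
    """RMSPE over the pre-treatment periods [0, t0)."""
    gap = treated[:t0] - synthetic[:t0]
    return np.sqrt(np.mean(gap ** 2))

treated = np.array([10.0, 11.0, 12.0, 13.0, 20.0])   # treatment starts at t = 4
synthetic = np.array([10.1, 10.9, 12.2, 12.8, 13.5])
fit = pre_rmspe(treated, synthetic, t0=4)            # small: good pre-period match
```

A small pre-treatment RMSPE relative to the post-treatment gap is what lends the post-period divergence a causal interpretation.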
Post-Estimation Robustness
Sensitivity Analysis (Oster, Cinelli-Hazlett)
How much unobserved confounding would be needed to explain away your result?
Placebo Tests
Apply the method to outcomes, groups, or time periods where the treatment should have no effect.
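One mechanical version of this idea is a permutation placebo: re-estimate the effect under random relabellings of treatment, where by construction there should be no effect, and ask where the actual estimate falls (simulated data, names illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 400
treat = rng.integers(0, 2, n).astype(bool)
y = 1.0 * treat + rng.normal(size=n)          # true effect = 1

actual = y[treat].mean() - y[~treat].mean()
# Placebo estimates under random relabelling cluster around zero
placebos = []
for _ in range(500):
    fake = rng.permutation(treat)
    placebos.append(y[fake].mean() - y[~fake].mean())
p_value = np.mean(np.abs(placebos) >= abs(actual))   # share of placebos as extreme
```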
Specification Curve Analysis
How sensitive are results to analyst degrees of freedom? Run every defensible specification and examine the full distribution of estimates rather than a single preferred model.
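A minimal sketch: loop over every subset of candidate controls and record the treatment coefficient. Here (simulated) the specifications that omit the true confounder c1 give visibly larger estimates, so the curve is spread out:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(7)
n = 500
c1, c2, c3 = rng.normal(size=(3, n))
treat = (rng.normal(size=n) + 0.5 * c1 > 0).astype(float)  # selection on c1
y = 1.0 * treat + 0.5 * c1 + rng.normal(size=n)            # true effect = 1

controls = {"c1": c1, "c2": c2, "c3": c3}
estimates = []
# Run every subset of controls and record the treatment coefficient
for k in range(len(controls) + 1):
    for subset in combinations(controls, k):
        X = np.column_stack([np.ones(n), treat] + [controls[s] for s in subset])
        beta = np.linalg.lstsq(X, y, rcond=None)[0]
        estimates.append((subset, beta[1]))
spread = max(e for _, e in estimates) - min(e for _, e in estimates)
```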
Bandwidth Sensitivity (RDD)
Check whether RDD estimates are robust to different bandwidth choices around the cutoff.
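A sketch of the check on simulated data: re-fit a local linear regression (separate slopes on each side of the cutoff) for several bandwidths and compare the estimated jumps:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 4000
running = rng.uniform(-1, 1, n)
d = (running >= 0).astype(float)
y = 2.0 * d + 0.5 * running + rng.normal(0, 0.5, n)   # true jump at cutoff = 2

def rdd_estimate(h):
    """Local linear fit within bandwidth h; separate slope on each side."""
    m = np.abs(running) <= h
    X = np.column_stack([np.ones(m.sum()), d[m], running[m], d[m] * running[m]])
    return np.linalg.lstsq(X, y[m], rcond=None)[0][1]

estimates = {h: rdd_estimate(h) for h in (0.1, 0.2, 0.4)}  # stable across h
```

Estimates that drift sharply as the bandwidth changes suggest functional-form sensitivity rather than a clean discontinuity.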
Leave-One-Out (Synthetic Control)
Remove each donor unit one at a time. If results change dramatically, the synthetic control is fragile.
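A simplified sketch of the mechanics (unconstrained least-squares weights stand in for the usual constrained synthetic control optimizer, so this is illustrative only): refit after dropping each donor and watch the pre-treatment fit.

```python
import numpy as np

rng = np.random.default_rng(9)
T0, n_donors = 10, 6
donors = rng.normal(size=(T0, n_donors))              # pre-treatment donor outcomes
true_w = np.array([0.5, 0.5, 0, 0, 0, 0])             # treated unit mixes donors 0, 1
treated_pre = donors @ true_w + rng.normal(0, 0.05, T0)

def fit_gap(keep):
    """Refit weights on a donor subset (unconstrained least squares as a
    stand-in for the constrained optimizer); return pre-treatment RMSPE."""
    w = np.linalg.lstsq(donors[:, keep], treated_pre, rcond=None)[0]
    resid = treated_pre - donors[:, keep] @ w
    return np.sqrt(np.mean(resid ** 2))

baseline = fit_gap(list(range(n_donors)))
# Leave-one-out: dropping a heavily weighted donor wrecks the fit
loo = {j: fit_gap([k for k in range(n_donors) if k != j]) for j in range(n_donors)}
```

Here dropping donor 0 degrades the fit sharply, signalling that the synthetic unit leans heavily on it.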
Common Pitfalls
Bad Controls
Including post-treatment variables as controls can introduce collider bias. As a general principle, control for pre-treatment covariates rather than post-treatment variables.
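A simulated illustration of how severe this can be: here the post-treatment variable is a common effect of treatment and the outcome, and conditioning on it wipes out a true effect of 1 (all parameters illustrative):

```python
import numpy as np

rng = np.random.default_rng(10)
n = 5000
treat = rng.integers(0, 2, n).astype(float)
y = 1.0 * treat + rng.normal(size=n)       # true effect = 1
post = treat + y + rng.normal(size=n)      # post-treatment collider

b_naive = np.linalg.lstsq(
    np.column_stack([np.ones(n), treat]), y, rcond=None)[0][1]        # near 1
b_bad = np.linalg.lstsq(
    np.column_stack([np.ones(n), treat, post]), y, rcond=None)[0][1]  # near 0
```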
Incorrect Clustering
Standard errors are typically clustered at the level of treatment assignment rather than the level of observation.
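A sketch of why this matters, with treatment assigned at the cluster level and a common within-cluster shock (CR0 sandwich formula, no small-sample correction; simulated data):

```python
import numpy as np

rng = np.random.default_rng(11)
n_clusters, per = 50, 20
cluster = np.repeat(np.arange(n_clusters), per)
treat = rng.integers(0, 2, n_clusters).astype(float)[cluster]  # cluster-level treatment
shock = rng.normal(0, 1, n_clusters)[cluster]                  # common cluster shock
y = 0.5 * treat + shock + rng.normal(0, 1, n_clusters * per)

X = np.column_stack([np.ones(len(y)), treat])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta
XtX_inv = np.linalg.inv(X.T @ X)

# Cluster-robust sandwich: sum outer products of within-cluster score vectors
meat = np.zeros((2, 2))
for g in range(n_clusters):
    m = cluster == g
    s = X[m].T @ resid[m]
    meat += np.outer(s, s)
se_cluster = np.sqrt((XtX_inv @ meat @ XtX_inv)[1, 1])

# Naive iid SE ignores the common shock and is far too small here
se_iid = np.sqrt(resid @ resid / (len(y) - 2) * XtX_inv[1, 1])
```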
p-Hacking / Specification Searching
Running many specifications and reporting only significant ones inflates false positive rates. Pre-registration can help mitigate this concern.
Winner's Curse
Published effect sizes are systematically overestimated when publication selects for statistical significance: conditional on clearing the significance threshold, estimates overshoot the true effect on average.
Weak Instruments
IV with weak instruments can exhibit bias toward the OLS estimate, and in finite samples this bias can exceed that of OLS itself (Bound, Jaeger & Baker, 1995; Staiger & Stock, 1997). Standard practice is to check the first-stage F-statistic and consider weak-instrument robust inference.
Negative Weights in TWFE
With staggered treatment and heterogeneous effects, TWFE can produce negative weights on some ATTs. Goodman-Bacon (2021, Journal of Econometrics) showed that the TWFE estimator is a weighted average of all 2×2 DiD comparisons, including problematic ones using already-treated units as controls. A common recommendation is to run the Goodman-Bacon decomposition as a diagnostic and, when contaminated comparisons have substantial weight, use modern DiD estimators that are robust to treatment effect heterogeneity.
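To see the problem concretely, here is a simulated sketch (all numbers illustrative): an early cohort whose effect grows over time serves as a "control" for a late cohort, and the TWFE coefficient lands far from the average effect on the treated; in this simulation it even has the wrong sign.

```python
import numpy as np

rng = np.random.default_rng(12)
n_units, n_periods = 100, 10
first_treat = np.where(np.arange(n_units) < 50, 2, 6)   # early vs. late cohort
rows = []
for u in range(n_units):
    for t in range(n_periods):
        d = 1.0 if t >= first_treat[u] else 0.0
        # Early cohort: effect grows each period; late cohort: constant effect 1
        eff = d * (3.0 * (t - first_treat[u] + 1) if first_treat[u] == 2 else 1.0)
        rows.append((u, t, d, eff, eff + rng.normal(0, 0.1)))
u_id, t_id, d, eff, y = np.array(rows).T
true_att = eff[d == 1].mean()                           # avg effect on the treated

# TWFE via explicit unit and time dummies (t = 0 omitted)
Ud = (u_id[:, None] == np.arange(n_units)).astype(float)
Td = (t_id[:, None] == np.arange(1, n_periods)).astype(float)
X = np.column_stack([d, Ud, Td])
twfe = np.linalg.lstsq(X, y, rcond=None)[0][0]
# twfe is far below true_att because already-treated early units, whose
# effects are still growing, serve as controls for the late cohort
```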
Related Practices
These research practices provide formal frameworks for many of the diagnostic checks described above.