Method Comparison Demonstrations
The same dataset, analyzed with different methods, can produce strikingly different estimates. These demonstrations show why method choice matters and help you understand what each method identifies.
Demo 1: OLS vs Fixed Effects vs Difference-in-Differences
Dataset: simulated panel (N=500 firms, T=10 periods, true effect=0.20, selection bias=0.8), analyzed with each of the methods below.
OLS (Pooled)
Key assumption: exogeneity (training is uncorrelated with the error term)
Estimate: 1.23
CI: [1.11, 1.34]
Notes: In this simulated DGP the estimate is biased upward (1.23 against a true effect of 0.20): firms that adopt training tend to already be higher-performing. The magnitude of the bias depends on the strength of selection.
Fixed Effects (TWFE)
Key assumption: no time-varying confounders
Estimate: 0.18
CI: [-0.05, 0.41]
Notes: Controls for time-invariant firm-level factors. The smaller estimate is consistent with selection bias in pooled OLS, though other explanations are possible.
Difference-in-Differences
Key assumption: parallel trends (absent training, treated and control firms would have trended similarly)
Estimate: 0.18
CI: [0.01, 0.36]
Notes: An event study can provide evidence about whether pre-trends are present, though a non-significant pre-trend test does not guarantee that parallel trends holds (Roth, 2022). Pre-treatment dynamics should be stable for the design to be credible.
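The difference-in-differences logic can be illustrated with a small simulation. This is a minimal sketch under assumptions that are not taken from the demo itself: a hypothetical DGP in which training starts for all treated firms in period 5, the common time trend is linear, and adoption depends on time-invariant firm ability. All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n_firms, n_periods, first_post = 500, 10, 5   # assumed: common adoption date
true_effect, selection = 0.20, 0.8

ability = rng.normal(size=n_firms)            # time-invariant firm ability
# Assumed selection rule: higher-ability firms are more likely to adopt training.
treated = selection * ability + rng.normal(size=n_firms) > 0
post = np.arange(n_periods) >= first_post
trend = 0.05 * np.arange(n_periods)           # common trend, so parallel trends holds

y = (ability[:, None] + trend[None, :]
     + true_effect * (treated[:, None] & post[None, :])
     + rng.normal(size=(n_firms, n_periods)))

# Gap between treated and control group means, period by period.
gap = y[treated].mean(axis=0) - y[~treated].mean(axis=0)

naive_post_gap = gap[post].mean()             # contaminated by the ability gap
did = gap[post].mean() - gap[~post].mean()    # the pre-period gap is differenced out

print("pre-period gaps:", np.round(gap[~post], 2))
print(f"naive post gap: {naive_post_gap:.2f}, DiD: {did:.2f}, truth: {true_effect}")
```

The pre-period gaps are a crude stand-in for an event-study plot: under this DGP they should be roughly flat, and the DiD estimate removes their level. Flat pre-period gaps in a real dataset remain only suggestive evidence for parallel trends, as the note above cautions.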
DGP Parameters
N: number of firms in the panel
T: number of observation periods
True effect: the actual causal effect of training on firm performance
Selection bias: how strongly firm ability predicts treatment adoption (0 = random, 2 = strong selection)
Generates a fresh random dataset and recomputes all estimates.
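The gap between the pooled OLS and fixed-effects rows in Demo 1 can be reproduced in spirit with a few lines of NumPy. The DGP below is an assumption, not the one used by the demo: firm ability enters the outcome directly and shifts the probability of training in every period. The sketch also uses firm fixed effects only (a within transformation), since this hypothetical DGP has no period effects.

```python
import numpy as np

def simulate_panel(n_firms=500, n_periods=10, true_effect=0.20, selection=0.8, seed=0):
    """Hypothetical DGP: ability raises both performance and training adoption."""
    rng = np.random.default_rng(seed)
    ability = rng.normal(size=n_firms)
    firm = np.repeat(np.arange(n_firms), n_periods)
    n_obs = n_firms * n_periods
    trained = (selection * ability[firm] + rng.normal(size=n_obs) > 0).astype(float)
    y = true_effect * trained + ability[firm] + rng.normal(size=n_obs)
    return firm, trained, y

def slope(x, y):
    """Bivariate OLS slope of y on x (with an intercept)."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))

def demean_by_firm(v, firm):
    """Within transformation: subtract each firm's own mean."""
    firm_means = np.bincount(firm, weights=v) / np.bincount(firm)
    return v - firm_means[firm]

firm, trained, y = simulate_panel()
ols_estimate = slope(trained, y)   # absorbs the ability gap between adopters and others
fe_estimate = slope(demean_by_firm(trained, firm), demean_by_firm(y, firm))

print(f"pooled OLS: {ols_estimate:.2f}, firm FE: {fe_estimate:.2f}, truth: 0.20")
```

Setting `selection=0` in `simulate_panel` removes the correlation between ability and training, and the two estimates should then land close to each other.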
Demo 2: OLS vs Matching vs Doubly Robust (AIPW)
Dataset: simulated cross-section (N=2000, true effect=0.25, nonlinearity=0.30, confounding=0.40), analyzed with each of the methods below.
OLS (Linear)
Key assumption: conditional independence + correct outcome functional form
Estimate: 0.29
CI: [0.20, 0.37]
Notes: Preferred when the conditional expectation E[Y|D,X] is well-approximated by a linear function; under correct specification it is efficient. The linear specification may be misspecified if the relationship is nonlinear in X.
Propensity Score Matching
Key assumption: selection on observables (CIA); overlap/common support; correctly specified propensity score
Estimate: 0.33
CI: [0.24, 0.43]
Notes: Preferred when the analyst is unwilling to impose a linear outcome model but is willing to model the treatment assignment process. The wider CI (738 matched pairs) reflects information loss from using only matched units. The estimator targets the ATT on the matched sample rather than the ATE.
Doubly Robust (AIPW)
Key assumption: CIA holds; consistent if either the outcome model or the PS model is correct
Estimate: 0.28
CI: [0.19, 0.37]
Notes: Robust to misspecification of one of the two nuisance models (outcome or propensity score), but not both. Preferred when the analyst wants insurance against misspecification in a single nuisance model; still requires CIA and overlap. Performance in finite samples depends on the quality of both nuisance estimates.
DGP Parameters
N: number of observations
True effect: the actual causal effect of R&D on innovation
Nonlinearity: strength of the X2-squared term in the outcome model (0 = linear, 1 = strong nonlinearity)
Confounding: how strongly covariates predict treatment (0 = random, 1 = strong selection)
Generates a fresh random dataset and recomputes all estimates.
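The doubly robust idea in Demo 2 can be sketched with a hand-rolled AIPW estimator. Everything below is an assumption made for illustration: the DGP (two covariates, a logistic treatment rule, an X2-squared term in the outcome), the helper names, and the choice to clip propensity scores at 0.01. The outcome model is deliberately linear, so it misses the squared term, while the logistic propensity model matches the assumed treatment rule.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def fit_logit(X, d, n_iter=25):
    """Logistic regression by Newton-Raphson; X must contain an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        grad = X.T @ (d - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def fit_ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

def aipw_ate(X, d, y, clip=0.01):
    """AIPW estimate of the ATE with a naive standard error."""
    ps = np.clip(sigmoid(X @ fit_logit(X, d)), clip, 1.0 - clip)
    mu1 = X @ fit_ols(X[d == 1], y[d == 1])   # outcome model fit on treated units
    mu0 = X @ fit_ols(X[d == 0], y[d == 0])   # outcome model fit on control units
    # Outcome-model prediction plus inverse-probability-weighted residual correction.
    psi = mu1 - mu0 + d * (y - mu1) / ps - (1.0 - d) * (y - mu0) / (1.0 - ps)
    return psi.mean(), psi.std(ddof=1) / np.sqrt(len(y))

# Hypothetical DGP loosely following the demo's parameter names.
rng = np.random.default_rng(0)
n, true_effect, nonlinearity, confounding = 2000, 0.25, 0.30, 0.40
x1, x2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
d = (rng.uniform(size=n) < sigmoid(confounding * (x1 + x2))).astype(float)
y = true_effect * d + x1 + x2 + nonlinearity * x2**2 + rng.normal(size=n)

ate, se = aipw_ate(X, d, y)
print(f"AIPW ATE: {ate:.2f} (naive SE {se:.2f}), truth: {true_effect}")
```

The standard error here ignores estimation error in the nuisance models beyond what the influence-function formula already covers, and no cross-fitting is used; both would matter with flexible learners in place of the parametric fits.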
Demo 3: Intent-to-Treat vs IV/LATE vs Lee Bounds
Dataset: simulated RCT (N=3000, true LATE=0.30, compliance=40%, differential attrition=15%), analyzed with each of the methods below.
Intent-to-Treat (ITT)
Key assumption: random assignment only
Estimate: 0.04
CI: [-0.04, 0.13]
Notes: Estimates the effect of being offered treatment, not the effect of taking it. Under random assignment with full follow-up it is unbiased for the ITT estimand; with imperfect compliance it is attenuated toward zero relative to the LATE on compliers. Differential attrition can introduce bias even with random assignment.
IV / LATE (Wald Estimator)
Key assumption: random assignment + exclusion restriction + monotonicity
Estimate: 0.10
CI: [-0.11, 0.32]
Notes: Identifies the Local Average Treatment Effect for compliers only, not the ATE; the LATE equals the ATE only if treatment effects are homogeneous across compliance types. Larger in magnitude than the ITT here because the Wald estimator scales the ITT by 1/first-stage (roughly 1/0.40) to adjust for ~60% non-compliance.
Lee Bounds
Key assumption: monotonicity of sample selection
Bounds: [-0.29, 0.38] (bounds, not a point estimate)
Notes: Accounts for 15 percentage-point differential attrition (Lee, 2009). Width of the bounds reflects the cost of partial identification when sample selection is non-random; preferred when point identification via assumed selection mechanisms is not defensible.
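The relationship between the ITT and IV/LATE rows can be made concrete with the Wald ratio: the ITT divided by the first stage. The sketch below assumes a simplified DGP that is not the demo's own: one-sided non-compliance (nobody in the control arm takes treatment), a constant effect for those who take it, and no attrition.

```python
import numpy as np

rng = np.random.default_rng(0)
n, true_late, compliance = 3000, 0.30, 0.40

z = rng.integers(0, 2, size=n)                 # random assignment (the offer)
complier = rng.uniform(size=n) < compliance    # assumed: compliers and never-takers only
d = z * complier                               # take-up requires an offer and compliance
y = true_late * d + rng.normal(size=n)

itt = y[z == 1].mean() - y[z == 0].mean()          # effect of the offer
first_stage = d[z == 1].mean() - d[z == 0].mean()  # effect of the offer on take-up
wald = itt / first_stage                           # LATE estimate for compliers

print(f"ITT: {itt:.2f}, first stage: {first_stage:.2f}, Wald/LATE: {wald:.2f}")
```

Because the first stage is well below one, the Wald estimate is always larger in magnitude than the ITT, and its sampling noise is scaled up by the same factor. That scaling is why the IV confidence interval in the table is so much wider than the ITT interval.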
DGP Parameters
N: number of subjects in the RCT
True LATE: the true treatment effect for compliers
Compliance: share of subjects who comply with their assignment
Differential attrition: extra attrition in the control group vs treatment (0 = no differential attrition)
Generates a fresh random dataset and recomputes all estimates.
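The trimming procedure behind Lee bounds can be sketched as follows. This is an illustrative implementation under stated assumptions, not the demo's code: the arm with the higher retention rate is trimmed by the difference in retention rates, once from the top and once from the bottom of its outcome distribution. The simulated data are hypothetical, with a latent variable `v` that drives both attrition and outcomes, and the bounds apply to units that would be observed under either assignment.

```python
import numpy as np

def lee_bounds(y, z, observed):
    """Lee (2009)-style trimming bounds on the effect for always-observed units."""
    y1 = np.sort(y[(z == 1) & observed])
    y0 = np.sort(y[(z == 0) & observed])
    q1, q0 = observed[z == 1].mean(), observed[z == 0].mean()  # retention rates
    if q1 >= q0:
        # Treatment arm retains more units: trim its top or bottom tail.
        k = int(np.floor((q1 - q0) / q1 * len(y1)))
        lower = y1[:len(y1) - k].mean() - y0.mean()
        upper = y1[k:].mean() - y0.mean()
    else:
        # Control arm retains more units: trim the control outcomes instead.
        k = int(np.floor((q0 - q1) / q0 * len(y0)))
        lower = y1.mean() - y0[k:].mean()
        upper = y1.mean() - y0[:len(y0) - k].mean()
    return lower, upper

# Hypothetical data: 15 percentage points of extra attrition in the control arm.
rng = np.random.default_rng(0)
n, effect = 3000, 0.12
z = rng.integers(0, 2, size=n)
v = rng.uniform(size=n)                       # latent trait driving attrition
observed = v < np.where(z == 1, 0.90, 0.75)   # monotone selection by construction
y = effect * z + 0.5 * (v - 0.5) + rng.normal(size=n)

lower, upper = lee_bounds(y, z, observed)
naive = y[(z == 1) & observed].mean() - y[(z == 0) & observed].mean()
print(f"naive difference: {naive:.2f}, Lee bounds: [{lower:.2f}, {upper:.2f}]")
```

The naive difference in observed means always falls inside the bounds by construction. The wider the gap in retention rates, the larger the trimmed share and the wider the interval.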
Key Takeaway
Different methods answer different questions and make different assumptions. There is no universally “best” method. The right choice depends on your research design, the data you have, and the assumptions you are willing to defend. Understanding what each method estimates is the first step toward credible empirical research.