Method Comparison Demonstrations
The same dataset, analyzed with different methods, can produce strikingly different estimates. These demonstrations show why method choice matters and help you understand what each method identifies.
Demo 1: OLS vs Fixed Effects vs Difference-in-Differences
Same dataset (simulated panel: N=500 firms, T=10 periods, true effect = 0.20, selection bias = 0.8) analyzed with different methods.
OLS (Pooled)
Key assumption: exogeneity (training is uncorrelated with the error term)
Estimate: 1.23, CI: [1.11, 1.34]
In this simulated DGP, likely biased upward: firms that adopt training tend to already be higher-performing, though the magnitude of bias depends on the strength of selection.
Fixed Effects (TWFE)
Key assumption: no time-varying confounders
Estimate: 0.18, CI: [-0.03, 0.40]
Controls for firm-level time-invariant factors. Smaller estimate is consistent with selection bias in OLS, though other explanations are possible.
Difference-in-Differences
Key assumption: parallel trends (absent training, treated and control firms would have trended similarly)
Estimate: 0.18, CI: [0.01, 0.36]
An event study can provide evidence about whether pre-trends are present, though a non-significant pre-trend test does not guarantee parallel trends holds (Roth, 2022). More credible when pre-treatment dynamics are stable. Typically needs a clear treatment event with pre/post observations for both groups, whereas FE only requires panel variation in the regressor.
DGP Parameters
N: number of firms in the panel
T: number of observation periods
True effect: the actual causal effect of training on firm performance
Selection bias: how strongly firm ability predicts treatment adoption (0 = random, 2 = strong selection)
Generates a fresh random dataset and recomputes all estimates.
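The Demo 1 comparison can be reproduced in a short script. Below is a minimal numpy sketch under an assumed DGP: high-ability firms are more likely to adopt training, and adopters train from period 5 onward. The adoption timing, noise scale, and logistic selection rule are illustrative choices, not the demo's actual code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Demo 1 parameters; the DGP details below are illustrative assumptions
N, T, TRUE_EFFECT, SELECTION = 500, 10, 0.20, 0.8

ability = rng.normal(size=N)                     # time-invariant confounder
# higher-ability firms are more likely to adopt training (selection bias)
adopt_prob = 1 / (1 + np.exp(-SELECTION * ability))
treated = rng.random(N) < adopt_prob             # adopters train from t = 5 on
firm = np.repeat(np.arange(N), T)
t = np.tile(np.arange(T), N)
D = (treated[firm] & (t >= 5)).astype(float)
y = TRUE_EFFECT * D + ability[firm] + rng.normal(scale=0.5, size=N * T)

def slope(x, z):
    """Univariate OLS slope: cov(x, z) / var(x)."""
    x, z = x - x.mean(), z - z.mean()
    return (x @ z) / (x @ x)

# 1) Pooled OLS: ignores ability, so the estimate absorbs the selection bias
ols_est = slope(D, y)

# 2) Firm fixed effects: demean within firm, removing time-invariant ability
y_within = y - np.bincount(firm, y)[firm] / T
D_within = D - np.bincount(firm, D)[firm] / T
fe_est = slope(D_within, y_within)

# 3) 2x2 DiD: (treated post - treated pre) - (control post - control pre)
post = t >= 5
def cell(g, p):
    return y[(treated[firm] == g) & (post == p)].mean()
did_est = (cell(True, True) - cell(True, False)) - (cell(False, True) - cell(False, False))

print(f"OLS {ols_est:.2f}  FE {fe_est:.2f}  DiD {did_est:.2f}  (true {TRUE_EFFECT})")
```

With strong selection, pooled OLS lands well above the true 0.20, while the within-firm and difference-in-differences estimates recover it, mirroring the pattern in the estimates above.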
Demo 2: OLS vs Matching vs Doubly Robust (AIPW)
Same dataset (simulated cross-section: N=2000, true effect = 0.25, nonlinearity = 0.30, confounding = 0.40) analyzed with different methods.
OLS (Linear)
Key assumption: correct functional form and exogeneity
Estimate: 0.29, CI: [0.20, 0.37]
Linear specification may be misspecified if the relationship is nonlinear.
Propensity Score Matching
Key assumption: selection on observables (CIA) and common support
Estimate: 0.33, CI: [0.24, 0.43]
Wider CI reflects the smaller matched sample (738 matched pairs). Covariate balance improves after matching.
Doubly Robust (AIPW)
Key assumption: CIA holds; consistent if either the outcome model or the propensity score model is correctly specified
Estimate: 0.29, CI: [0.19, 0.38]
Combines both models for robustness. Often used when selection on observables is assumed; performance depends on model specification and the plausibility of the conditional independence assumption.
DGP Parameters
N: number of observations
True effect: the actual causal effect of R&D on innovation
Nonlinearity: strength of the X2-squared term in the outcome model (0 = linear, 1 = strong nonlinearity)
Confounding: how strongly covariates predict treatment (0 = random, 1 = strong selection)
Generates a fresh random dataset and recomputes all estimates.
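A comparable numpy sketch for Demo 2, again under an assumed DGP (logistic selection on X1 + X2, an X2-squared term in the outcome, unit-variance noise). The propensity score is fit with a few Newton/IRLS steps rather than a library call, and nearest-neighbor matching is done on the estimated score; all of this is illustrative, not the demo's actual code.

```python
import numpy as np

rng = np.random.default_rng(1)

# Demo 2 parameters; the DGP details below are illustrative assumptions
N, TRUE_EFFECT, NONLIN, CONF = 2000, 0.25, 0.30, 0.40

X1, X2 = rng.normal(size=N), rng.normal(size=N)
e_true = 1 / (1 + np.exp(-CONF * (X1 + X2)))     # treatment depends on covariates
D = (rng.random(N) < e_true).astype(float)
y = TRUE_EFFECT * D + X1 + NONLIN * X2**2 + rng.normal(size=N)

def ols(Xmat, yvec):
    """Least-squares coefficients via lstsq."""
    return np.linalg.lstsq(Xmat, yvec, rcond=None)[0]

one = np.ones(N)

# 1) Linear OLS: y ~ D + X1 + X2 (omits the X2-squared term)
ate_ols = ols(np.column_stack([one, D, X1, X2]), y)[1]

# 2) Logistic propensity score via Newton (IRLS) steps
Z = np.column_stack([one, X1, X2])
g = np.zeros(3)
for _ in range(25):
    p = 1 / (1 + np.exp(-Z @ g))
    W = p * (1 - p)
    g += np.linalg.solve(Z.T @ (W[:, None] * Z), Z.T @ (D - p))
ps = 1 / (1 + np.exp(-Z @ g))

# Nearest-neighbor matching on the propensity score (one control per treated)
t_idx, c_idx = np.where(D == 1)[0], np.where(D == 0)[0]
matches = c_idx[np.abs(ps[c_idx][None, :] - ps[t_idx][:, None]).argmin(axis=1)]
att_match = (y[t_idx] - y[matches]).mean()

# 3) AIPW: outcome models (including X2-squared) plus propensity weights
Q = np.column_stack([one, X1, X2, X2**2])
mu1 = Q @ ols(Q[D == 1], y[D == 1])
mu0 = Q @ ols(Q[D == 0], y[D == 0])
ate_aipw = np.mean(D * (y - mu1) / ps + mu1) - np.mean((1 - D) * (y - mu0) / (1 - ps) + mu0)

print(f"OLS {ate_ols:.2f}  Matching {att_match:.2f}  AIPW {ate_aipw:.2f}  (true {TRUE_EFFECT})")
```

Because the treatment effect is constant in this sketch, the matching estimand (ATT) and the AIPW estimand (ATE) coincide, which keeps the three numbers directly comparable.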
Demo 3: Intent-to-Treat vs IV/LATE vs Lee Bounds
Same dataset (simulated RCT: N=3000, true LATE = 0.30, compliance = 40%, differential attrition = 15%) analyzed with different methods.
Intent-to-Treat (ITT)
Key assumption: random assignment only
Estimate: 0.04, CI: [-0.04, 0.13]
Captures effect of being offered treatment, not treatment itself. Generally conservative; unbiased under random assignment, though differential attrition can introduce bias.
IV / LATE (Wald Estimator)
Key assumption: random assignment, exclusion restriction, and monotonicity
Estimate: 0.10, CI: [-0.11, 0.32]
LATE for compliers only. Larger than the ITT because it rescales the ITT by the ~40% compliance rate (0.04 / 0.40 ≈ 0.10), equivalently adjusting for the ~60% non-compliance.
Lee Bounds
Key assumption: monotonicity of sample selection
Estimate: bounds [-0.29, 0.38] (a range, not a point estimate)
Accounts for 15% differential attrition. Wide bounds reflect the cost of honest inference.
DGP Parameters
N: number of subjects in the RCT
True LATE: the true treatment effect for compliers
Compliance: share of subjects who comply with their assignment
Differential attrition: extra attrition in the control group relative to the treatment group (0 = no differential attrition)
Generates a fresh random dataset and recomputes all estimates.
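A numpy sketch of Demo 3 under two simplifying assumptions: all noncompliers are never-takers, and attrition is random within each arm at an assumed 90% retention in treatment (real attrition may be outcome-related, which is exactly the case Lee bounds guard against). None of these details come from the demo's actual code.

```python
import numpy as np

rng = np.random.default_rng(2)

# Demo 3 parameters; the DGP details below are illustrative assumptions
N, LATE, COMPLIANCE, DIFF_ATTRIT = 3000, 0.30, 0.40, 0.15

Z = rng.random(N) < 0.5                          # random assignment
complier = rng.random(N) < COMPLIANCE            # the rest are never-takers
D = (Z & complier).astype(float)                 # treatment actually received
y = LATE * D + rng.normal(size=N)

# attrition: control arm loses an extra 15% (here unrelated to y)
keep_prob = np.where(Z, 0.90, 0.90 - DIFF_ATTRIT)
obs = rng.random(N) < keep_prob
z1, z0 = obs & Z, obs & ~Z

# 1) ITT: effect of assignment, among those still observed
itt = y[z1].mean() - y[z0].mean()

# 2) Wald/IV: ITT scaled by the first stage (share of compliers)
first_stage = D[z1].mean() - D[z0].mean()
late_hat = itt / first_stage

# 3) Lee bounds: trim the better-retained arm by the retention gap
p1, p0 = z1.sum() / Z.sum(), z0.sum() / (~Z).sum()
q = (p1 - p0) / p1                               # trimming fraction
y1 = np.sort(y[z1])
k = int(np.floor(q * y1.size))
lower = y1[:y1.size - k].mean() - y[z0].mean()   # drop the top q share
upper = y1[k:].mean() - y[z0].mean()             # drop the bottom q share

print(f"ITT {itt:.2f}  LATE {late_hat:.2f}  Lee bounds [{lower:.2f}, {upper:.2f}]")
```

Note how the three outputs line up with the demo: the ITT is attenuated by non-compliance, the Wald estimator rescales it back toward the true LATE, and the Lee bounds bracket the ITT at the cost of a wide interval.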
Key Takeaway
Different methods answer different questions and make different assumptions. There is no universally “best” method. The right choice depends on your research design, the data you have, and the assumptions you are willing to defend. Understanding what each method estimates is the first step toward credible empirical research.