MethodAtlas

Method Comparison Demonstrations

The same dataset, analyzed with different methods, can produce strikingly different estimates. These demonstrations show why method choice matters and help you understand what each method identifies.

Method Comparison

Demo 1: OLS vs Fixed Effects vs Difference-in-Differences

The same dataset (a simulated panel: N = 500 firms, T = 10 periods, true effect = 0.20, selection bias = 0.8) is analyzed with three different methods.

OLS (Pooled)

Key Assumption: exogeneity (training is uncorrelated with the error term)

Estimate: 1.23

95% CI: [1.11, 1.34]

Diagnostic: In this simulated DGP, OLS is likely biased upward: firms that adopt training tend to already be higher-performing, though the magnitude of the bias depends on the strength of selection.

Fixed Effects (TWFE)

Key Assumption: no time-varying confounders

Estimate: 0.18

95% CI: [-0.03, 0.40]

Diagnostic: Controls for firm-level time-invariant factors. The smaller estimate is consistent with selection bias in OLS, though other explanations are possible.

Difference-in-Differences

Key Assumption: parallel trends (absent training, treated and control firms would have trended similarly)

Estimate: 0.18

95% CI: [0.01, 0.36]

Diagnostic: An event study can provide evidence about whether pre-trends are present, though a non-significant pre-trend test does not guarantee that parallel trends holds (Roth, 2022). More credible when pre-treatment dynamics are stable. DiD typically needs a clear treatment event with pre- and post-treatment observations for both groups, whereas FE only requires panel variation in the regressor.

DGP Parameters

N = 500: number of firms in the panel

T = 10: number of observation periods

True effect = 0.20: the actual causal effect of training on firm performance

Selection bias = 0.8: how strongly firm ability predicts treatment adoption (0 = random, 2 = strong selection)

Generates a fresh random dataset and recomputes all estimates.
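The OLS-versus-FE contrast above can be sketched with a small simulation. This is a minimal sketch, not the demo's actual DGP: the logistic adoption rule, the adoption period (period 5), the unit ability coefficient, and the noise level are all assumptions, so the pooled-OLS bias will not match the 1.23 shown in the card.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 500, 10                        # firms, periods (as in the demo)
true_effect, selection = 0.20, 0.8

ability = rng.normal(size=N)          # unobserved firm quality (the fixed effect)
# higher-ability firms are more likely to adopt training (selection on ability)
adopter = rng.random(N) < 1 / (1 + np.exp(-selection * ability))
firm = np.repeat(np.arange(N), T)
period = np.tile(np.arange(T), N)
# adopters train from period 5 onward, so treatment varies within firm
D = (adopter[firm] & (period >= 5)).astype(float)
y = true_effect * D + ability[firm] + rng.normal(scale=0.5, size=N * T)

def slope(x, y):
    """Univariate OLS slope after demeaning."""
    x, y = x - x.mean(), y - y.mean()
    return (x @ y) / (x @ x)

b_ols = slope(D, y)                   # pooled OLS: confounded by ability

# within (fixed effects) transform: subtract each firm's mean
Dw = D - np.bincount(firm, D)[firm] / T
yw = y - np.bincount(firm, y)[firm] / T
b_fe = slope(Dw, yw)                  # FE: ability is differenced out

print("pooled OLS:", round(b_ols, 2), "fixed effects:", round(b_fe, 2))
```

The within transform removes anything constant within a firm, which is exactly why FE recovers the true 0.20 here while pooled OLS absorbs the ability-driven selection into its slope.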


Demo 2: OLS vs Matching vs Doubly Robust (AIPW)

The same dataset (a simulated cross-section: N = 2000, true effect = 0.25, nonlinearity = 0.30, confounding = 0.40) is analyzed with three different methods.

OLS (Linear)

Key Assumption: correct functional form and exogeneity

Estimate: 0.29

95% CI: [0.20, 0.37]

Diagnostic: The linear specification may be misspecified if the true relationship is nonlinear.

Propensity Score Matching

Key Assumption: selection on observables (conditional independence assumption, CIA); common support

Estimate: 0.33

95% CI: [0.24, 0.43]

Diagnostic: The wider CI reflects the reduced matched sample (738 matched pairs). Covariate balance improves after matching.

Doubly Robust (AIPW)

Key Assumption: CIA holds; the estimator is consistent if either the outcome model or the propensity score model is correctly specified

Estimate: 0.29

95% CI: [0.19, 0.38]

Diagnostic: Combines both models for robustness. Often used when selection on observables is assumed; performance depends on model specification and on the plausibility of the conditional independence assumption.

DGP Parameters

N = 2000: number of observations

True effect = 0.25: the actual causal effect of R&D on innovation

Nonlinearity = 0.30: strength of the X2-squared term in the outcome model (0 = linear, 1 = strong nonlinearity)

Confounding = 0.40: how strongly covariates predict treatment (0 = random, 1 = strong selection)

Generates a fresh random dataset and recomputes all estimates.
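The AIPW mechanics can be sketched as follows. This is an illustrative simulation, not the demo's exact DGP: for simplicity it plugs in the true propensity score rather than an estimated one, and it puts selection on the squared covariate so that the deliberately misspecified linear outcome models would be biased on their own.

```python
import numpy as np

rng = np.random.default_rng(1)
n, true_effect = 2000, 0.25          # as in the demo
x1, x2 = rng.normal(size=n), rng.normal(size=n)

# treatment depends on x2**2, a term the linear outcome model omits
e = 1 / (1 + np.exp(-0.4 * (x1 + x2**2 - 1)))   # true propensity score
D = (rng.random(n) < e).astype(float)
y = true_effect * D + x1 + 0.3 * x2**2 + rng.normal(scale=0.5, size=n)

tau_naive = y[D == 1].mean() - y[D == 0].mean()  # confounded difference in means

# deliberately misspecified linear outcome models, fit separately by arm
Z = np.column_stack([np.ones(n), x1, x2])
b1, *_ = np.linalg.lstsq(Z[D == 1], y[D == 1], rcond=None)
b0, *_ = np.linalg.lstsq(Z[D == 0], y[D == 0], rcond=None)
m1, m0 = Z @ b1, Z @ b0

# AIPW: regression prediction plus an inverse-propensity-weighted residual
# correction; consistent here because e is correct even though m1, m0 are not
tau_aipw = np.mean(m1 - m0
                   + D * (y - m1) / e
                   - (1 - D) * (y - m0) / (1 - e))
```

Swapping in a correct outcome model and a wrong propensity model would also leave `tau_aipw` consistent; that is the "doubly robust" property the card describes.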


Demo 3: Intent-to-Treat vs IV/LATE vs Lee Bounds

The same dataset (a simulated RCT: N = 3000, true LATE = 0.30, compliance = 40%, differential attrition = 15%) is analyzed with three different methods.

Intent-to-Treat (ITT)

Key Assumption: random assignment only

Estimate: 0.04

95% CI: [-0.04, 0.13]

Diagnostic: Captures the effect of being offered treatment, not of treatment itself. Generally conservative; unbiased under random assignment, though differential attrition can introduce bias.

IV / LATE (Wald Estimator)

Key Assumption: random assignment + exclusion restriction + monotonicity

Estimate: 0.10

95% CI: [-0.11, 0.32]

Diagnostic: Identifies the LATE for compliers only. Larger than the ITT because the Wald estimator divides the ITT by the ~40% first-stage compliance rate.
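The ITT-to-LATE scaling can be sketched with a toy simulation of one-sided non-compliance. Assumptions not in the demo: a larger sample (20,000 rather than 3,000, to tighten the estimates), a baseline outcome shift for compliers, and no attrition, so the numbers will not match the cards above.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000
late, compliance = 0.30, 0.40          # true LATE and compliance rate, as in the demo

Z = rng.random(n) < 0.5                # random assignment (the instrument)
complier = rng.random(n) < compliance  # latent complier type
D = (Z & complier).astype(float)       # one-sided non-compliance: takeup only if offered
# compliers differ at baseline (+0.2), but that shift is balanced across arms
y = late * D + 0.2 * complier + rng.normal(size=n)

itt = y[Z].mean() - y[~Z].mean()             # effect of the offer
first_stage = D[Z].mean() - D[~Z].mean()     # estimated compliance rate
wald = itt / first_stage                     # LATE for compliers
```

With 40% compliance the ITT is diluted to roughly `0.40 * 0.30 = 0.12`, and dividing by the first stage recovers the complier effect of about 0.30.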

Lee Bounds

Key Assumption: monotonicity of sample selection

Estimate: [-0.29, 0.38]

95% CI: not applicable (bounds, not a point estimate)

Diagnostic: Accounts for the 15% differential attrition. The wide bounds reflect the cost of honest inference.

DGP Parameters

N = 3000: number of subjects in the RCT

True LATE = 0.30: the true treatment effect for compliers

Compliance = 40%: share of subjects who comply with their assignment

Differential attrition = 15%: extra attrition in the control group relative to treatment (0 = no differential attrition)

Generates a fresh random dataset and recomputes all estimates.
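Lee's trimming idea can be sketched as follows, under assumptions chosen for clarity rather than taken from the demo: an illustrative offer effect of 0.10, attrition that is purely random within the control arm (so the bounds should bracket the truth), and point bounds only, without the confidence intervals a full Lee-bounds analysis would add.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 30_000
effect = 0.10                          # illustrative offer effect (an assumption)
Z = rng.random(n) < 0.5                # random assignment
y = effect * Z + rng.normal(size=n)
# monotone selection: treated always respond; control loses an extra 15% at random
observed = Z | (rng.random(n) < 0.85)

p1, p0 = observed[Z].mean(), observed[~Z].mean()
q = (p1 - p0) / p1                     # excess share of responders in the treated arm

y1 = np.sort(y[Z & observed])
y0_mean = y[~Z & observed].mean()
k = int(np.floor(q * y1.size))
# trim the k largest treated outcomes -> lower bound; the k smallest -> upper bound
lower = y1[: y1.size - k].mean() - y0_mean
upper = y1[k:].mean() - y0_mean
```

Because we cannot know which treated responders would have attrited under control, the procedure brackets the worst cases: the interval `[lower, upper]` contains the effect for always-responders, and its width grows with the attrition gap.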

Key Takeaway

Different methods answer different questions and make different assumptions. There is no universally “best” method. The right choice depends on your research design, the data you have, and the assumptions you are willing to defend. Understanding what each method estimates is the first step toward credible empirical research.