Method Comparison Demonstrations

The same dataset, analyzed with different methods, can produce strikingly different estimates. These demonstrations show why method choice matters and help you understand what each method identifies.

Method Comparison

Demo 1: OLS vs Fixed Effects vs Difference-in-Differences

The same dataset (simulated panel: N=500 firms, T=10 periods, true effect=0.20, selection bias=0.8) is analyzed with each method below.

OLS (Pooled)

Key Assumption:

Exogeneity: training is uncorrelated with error term

Estimate:

1.23

95% CI:

[1.11, 1.34]

Diagnostic:

Biased upward in this simulated DGP (1.23 against a true effect of 0.20): firms that adopt training tend to already be higher-performing. The size of the bias depends on the strength of selection.

Fixed Effects (TWFE)

Key Assumption:

No time-varying confounders, conditional on firm and period fixed effects

Estimate:

0.18

95% CI:

[-0.03, 0.40]

Diagnostic:

Controls for firm-level time-invariant factors. Smaller estimate is consistent with selection bias in OLS, though other explanations are possible.
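The gap between the pooled OLS and fixed-effects estimates can be reproduced in a minimal sketch. The DGP below is an assumption, not the demo's actual generator: unobserved firm ability raises both the chance of training and performance, using the header's parameters (N=500, T=10, true effect=0.20, selection=0.8). It removes firm effects only (no period effects), and the function names are hypothetical.

```python
import numpy as np

def simulate_panel(n_firms=500, n_periods=10, true_effect=0.20, selection=0.8, seed=0):
    """Simulate a panel where unobserved ability drives both training and performance."""
    rng = np.random.default_rng(seed)
    ability = rng.normal(size=n_firms)               # time-invariant and unobserved
    firm = np.repeat(np.arange(n_firms), n_periods)  # firm index for each firm-period row
    n_obs = n_firms * n_periods
    # Assumed selection rule: higher-ability firms train more often in any period.
    latent = selection * ability[firm] + rng.normal(size=n_obs)
    train = (latent > 0).astype(float)
    y = true_effect * train + ability[firm] + rng.normal(scale=0.5, size=n_obs)
    return firm, train, y

def pooled_ols(train, y):
    """Slope from regressing y on an intercept and the training dummy."""
    X = np.column_stack([np.ones_like(train), train])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

def within_estimator(firm, train, y):
    """Firm fixed effects by demeaning: subtract each firm's mean from y and train."""
    n = firm.max() + 1
    counts = np.bincount(firm, minlength=n)
    y_dm = y - (np.bincount(firm, weights=y, minlength=n) / counts)[firm]
    d_dm = train - (np.bincount(firm, weights=train, minlength=n) / counts)[firm]
    return (d_dm @ y_dm) / (d_dm @ d_dm)

firm, train, y = simulate_panel()
print(f"pooled OLS: {pooled_ols(train, y):.2f}")
print(f"firm FE:    {within_estimator(firm, train, y):.2f}")
```

Under this assumed DGP, demeaning removes the ability term exactly, so the within estimate should land near the true effect while pooled OLS absorbs the ability gap between trained and untrained firm-periods.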

Difference-in-Differences

Key Assumption:

Parallel trends: absent training, treated and control firms would have trended similarly

Estimate:

0.18

95% CI:

[0.01, 0.36]

Diagnostic:

An event study can provide evidence about whether pre-trends are present, though a non-significant pre-trend test does not guarantee that parallel trends holds (Roth, 2022). The design is more credible when pre-treatment dynamics are stable. It typically needs a clear treatment event with pre- and post-treatment observations for both groups, whereas FE only requires panel variation in the regressor.
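A 2x2 difference-in-differences can be sketched from four group means. The setup below is an illustrative assumption rather than the demo's generator: half the firms are treated from period 5 onward, treated firms start at a higher level, and both groups share a common trend so that parallel trends holds by construction. The placebo at the end applies the same calculation inside the pre-treatment window, a crude stand-in for an event-study check.

```python
import numpy as np

def simulate_did(n_firms=500, n_periods=10, true_effect=0.20, seed=0):
    """Firm-by-period outcomes with a level gap, a common trend, and a post-period effect."""
    rng = np.random.default_rng(seed)
    treated = rng.random(n_firms) < 0.5
    level = 1.0 * treated + rng.normal(size=n_firms)  # treated firms start higher
    periods = np.arange(n_periods)
    post = periods >= n_periods // 2                  # treatment begins halfway through
    trend = 0.05 * periods                            # shared by both groups
    effect = true_effect * (treated[:, None] & post[None, :])
    noise = rng.normal(scale=0.5, size=(n_firms, n_periods))
    y = level[:, None] + trend[None, :] + effect + noise
    return y, treated, post

def did_2x2(y, treated, post):
    """(Post minus pre change for treated) minus (post minus pre change for controls)."""
    change_treated = y[treated][:, post].mean() - y[treated][:, ~post].mean()
    change_control = y[~treated][:, post].mean() - y[~treated][:, ~post].mean()
    return change_treated - change_control

y, treated, post = simulate_did()
print(f"DiD estimate:     {did_2x2(y, treated, post):.2f}")

# Placebo: pretend treatment started at period 2, using pre-treatment periods only.
pre = y[:, ~post]
fake_post = np.arange(pre.shape[1]) >= 2
print(f"placebo estimate: {did_2x2(pre, treated, fake_post):.2f}")
```

The level gap between groups differences out, which is why DiD tolerates the kind of selection that biases pooled OLS. A placebo near zero is reassuring but, as noted above, does not prove parallel trends.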

DGP Parameters

Firms (N): 500

Number of firms in the panel

Time periods (T): 10

Number of observation periods

True treatment effect: 0.20

The actual causal effect of training on firm performance

Selection bias strength: 0.8

How strongly firm ability predicts treatment adoption (0 = random, 2 = strong selection)

Demo 2: OLS vs Matching vs Doubly Robust (AIPW)

The same dataset (simulated cross-section: N=2000, true effect=0.25, nonlinearity=0.30, confounding=0.40) is analyzed with each method below.

OLS (Linear)

Key Assumption:

Correct functional form, exogeneity

Estimate:

0.29

95% CI:

[0.20, 0.37]

Diagnostic:

Linear specification may be misspecified if the relationship is nonlinear.

Propensity Score Matching

Key Assumption:

Selection on observables (CIA); common support

Estimate:

0.33

95% CI:

[0.24, 0.43]

Diagnostic:

Wider CI due to matched sample (738 matched pairs). Balance improves after matching.

Doubly Robust (AIPW)

Key Assumption:

CIA holds; consistent if either the outcome model or the propensity score (PS) model is correctly specified

Estimate:

0.29

95% CI:

[0.19, 0.38]

Diagnostic:

Combines both models for robustness. Often used when selection on observables is assumed; performance depends on model specification and the plausibility of the conditional independence assumption.
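The doubly robust idea can be illustrated with a short AIPW sketch. Everything about the DGP here is an assumption chosen to echo the demo's parameters: two covariates drive treatment through a logistic model, and the outcome includes a squared X2 term that the linear outcome model deliberately omits. The propensity model is fitted with a hand-written Newton-Raphson logit to keep the example NumPy-only; function names are hypothetical.

```python
import numpy as np

def simulate_cross_section(n=2000, true_effect=0.25, nonlinearity=0.30,
                           confounding=0.40, seed=0):
    """Covariates affect both treatment and outcome; outcome is nonlinear in X2."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 2))
    p = 1.0 / (1.0 + np.exp(-confounding * (X[:, 0] + X[:, 1])))
    d = (rng.random(n) < p).astype(float)
    y = (true_effect * d + X[:, 0] + X[:, 1]
         + nonlinearity * X[:, 1] ** 2 + rng.normal(size=n))
    return y, d, X

def fit_logit(X, d, n_iter=25):
    """Logistic regression coefficients via Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (d - p)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X
        beta += np.linalg.solve(hess, grad)
    return beta

def aipw(y, d, X):
    """AIPW estimate of the average treatment effect and its standard error."""
    Xc = np.column_stack([np.ones(len(y)), X])
    ps = 1.0 / (1.0 + np.exp(-Xc @ fit_logit(Xc, d)))
    ps = np.clip(ps, 0.01, 0.99)  # guard against extreme weights
    # Linear outcome models fitted separately by arm (misspecified: no X2 squared).
    b1 = np.linalg.lstsq(Xc[d == 1], y[d == 1], rcond=None)[0]
    b0 = np.linalg.lstsq(Xc[d == 0], y[d == 0], rcond=None)[0]
    m1, m0 = Xc @ b1, Xc @ b0
    # Outcome-model contrast plus inverse-probability-weighted residual corrections.
    psi = m1 - m0 + d * (y - m1) / ps - (1.0 - d) * (y - m0) / (1.0 - ps)
    return psi.mean(), psi.std(ddof=1) / np.sqrt(len(y))

est, se = aipw(*simulate_cross_section())
print(f"AIPW: {est:.2f} (SE {se:.2f})")
```

In this sketch the propensity model is correctly specified while the outcome model is not, which is one of the two cases in which AIPW remains consistent. If both models were wrong, the estimator would carry no such guarantee.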

DGP Parameters

Sample size (N): 2000

Number of observations

True treatment effect: 0.25

The actual causal effect of R&D on innovation

Nonlinearity: 0.30

Strength of X2-squared term in the outcome model (0 = linear, 1 = strong nonlinearity)

Confounding strength: 0.40

How strongly covariates predict treatment (0 = random, 1 = strong selection)

Demo 3: Intent-to-Treat vs IV/LATE vs Lee Bounds

The same dataset (simulated RCT: N=3000, true LATE=0.30, compliance=40%, differential attrition=15%) is analyzed with each method below.

Intent-to-Treat (ITT)

Key Assumption:

Random assignment only

Estimate:

0.04

95% CI:

[-0.04, 0.13]

Diagnostic:

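Captures the effect of being offered treatment, not of receiving it. Generally conservative, because non-compliance dilutes it relative to the effect of treatment itself. Unbiased for the effect of the offer under random assignment, though differential attrition can introduce bias.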

IV / LATE (Wald Estimator)

Key Assumption:

Random assignment + exclusion restriction + monotonicity

Estimate:

0.10

95% CI:

[-0.11, 0.32]

Diagnostic:

LATE for compliers only. Larger than ITT because the Wald estimator divides the ITT by the compliance rate (~40%), which adjusts for the ~60% non-compliance rate: 0.04 / 0.40 = 0.10.
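The Wald estimator is the ratio of two differences in means, which a few lines can make concrete. The simulation is a simplified assumption: one-sided non-compliance (nobody assigned to control takes treatment), a constant effect among compliers, and no attrition, so it will not reproduce the demo's numbers. Function names are hypothetical.

```python
import numpy as np

def simulate_rct(n=3000, late=0.30, compliance=0.40, seed=0):
    """Random assignment z, take-up d among compliers only, outcome y."""
    rng = np.random.default_rng(seed)
    z = rng.integers(0, 2, size=n)         # random assignment (offer)
    complier = rng.random(n) < compliance  # would take treatment if offered
    d = z * complier                       # one-sided non-compliance (assumed)
    y = late * d + rng.normal(size=n)
    return z, d, y

def wald(z, d, y):
    """Return (ITT, first stage, Wald/LATE estimate)."""
    itt = y[z == 1].mean() - y[z == 0].mean()          # effect of the offer
    first_stage = d[z == 1].mean() - d[z == 0].mean()  # effect of the offer on take-up
    return itt, first_stage, itt / first_stage

itt, first_stage, late_hat = wald(*simulate_rct())
print(f"ITT: {itt:.2f}, first stage: {first_stage:.2f}, LATE: {late_hat:.2f}")
```

Because the ITT is divided by a first stage well below one, sampling noise in the ITT is scaled up by the same factor, which is consistent with the IV confidence interval above being wider than the ITT interval.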

Lee Bounds

Key Assumption:

Random assignment + monotonicity of sample selection (assignment affects attrition in one direction only)

Estimate:

[-0.29, 0.38]

95% CI:

Bounds, not point estimate

Diagnostic:

Accounts for 15% differential attrition. Wide bounds reflect the cost of honest inference.
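The trimming logic behind Lee bounds can be sketched as follows. The DGP is an assumption for illustration: retention is 90% under treatment and 75% under control (control has the extra attrition, as in the parameter description), attrition is correlated with the outcome, and the effect is a constant 0.12. Since the treatment group retains more units, it is the group that gets trimmed. Function names are hypothetical.

```python
import numpy as np

def simulate_attrition(n=3000, effect=0.12, keep_treated=0.90, keep_control=0.75, seed=0):
    """Outcome-dependent attrition that is monotone in assignment."""
    rng = np.random.default_rng(seed)
    z = rng.integers(0, 2, size=n)
    eps = rng.normal(size=n)
    y = effect * z + eps
    latent = 0.5 * eps + rng.normal(size=n)  # low-outcome units drop out more often
    cut_t = np.quantile(latent, 1.0 - keep_treated)
    cut_c = np.quantile(latent, 1.0 - keep_control)
    # cut_t < cut_c, so anyone observed under control would also be observed if treated.
    observed = latent > np.where(z == 1, cut_t, cut_c)
    return y, z, observed

def lee_bounds(y, z, observed):
    """Trim the treated group's observed outcomes to match the control retention rate."""
    q1, q0 = observed[z == 1].mean(), observed[z == 0].mean()
    if q1 < q0:
        raise ValueError("this sketch assumes higher retention in the treatment group")
    y1 = np.sort(y[(z == 1) & observed])
    y0_mean = y[(z == 0) & observed].mean()
    k = int(np.floor((q1 - q0) / q1 * len(y1)))  # number of treated outcomes to trim
    lower = y1[: len(y1) - k].mean() - y0_mean   # drop the k largest outcomes
    upper = y1[k:].mean() - y0_mean              # drop the k smallest outcomes
    return lower, upper

lower, upper = lee_bounds(*simulate_attrition())
print(f"Lee bounds: [{lower:.2f}, {upper:.2f}]")
```

The bounds come from not knowing which treated units are the extra ones retained because of treatment, so the sketch considers the two extreme cases. The wider the retention gap, the more observations are trimmed and the wider the interval.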

DGP Parameters

Sample size (N): 3000

Number of subjects in the RCT

True LATE: 0.30

The true treatment effect for compliers

Compliance rate: 40%

Share of subjects who comply with their assignment

Differential attrition: 15%

Extra attrition in the control group vs treatment (0 = no differential attrition)

Key Takeaway

Different methods answer different questions and make different assumptions. There is no universally “best” method. The right choice depends on your research design, the data you have, and the assumptions you are willing to defend. Understanding what each method estimates is the first step toward credible empirical research.