MethodAtlas Guide

Working with Observational Data

A structured workflow for pursuing causal inference from observational data. Covers when observational designs are appropriate, choosing estimators, managing the assumption burden, and strategies for building credibility.

The Observational Data Challenge

Most empirical research in the social sciences uses observational data — data collected from the world as it is, without experimental intervention. The fundamental challenge is that treatment assignment is not random. People, firms, and governments choose their treatments based on factors that also affect outcomes. This self-selection creates confounding, and confounding is the enemy of causal inference.

This challenge does not mean causal inference from observational data is impossible. It means that every causal claim rests on assumptions, and your job is to make those assumptions as credible as possible, to test them where you can, and to be transparent about where they might fail.

Phase 1: Assess Your Identification Options

Before committing to an identification strategy, honestly evaluate what sources of variation are available in your data.

The Identification Hierarchy

In rough order of credibility (with important caveats):

Randomized experiment. If you can randomize treatment, do it. Randomization is not always possible, but it is worth considering first. Field experiments, survey experiments, and lab experiments all have a place in observational research programs.

Natural experiment. Can you find an event, policy, rule, or lottery that creates as-if random variation in your treatment? If yes, use the natural experiment workflow and the appropriate design-based estimator (DiD, RDD, IV). A natural experiment is the strongest observational design.

Selection on observables. If no natural experiment is available, you may argue that all confounders are observed and can be controlled for. This unconfoundedness claim is a strong assumption, but it is sometimes defensible — particularly when you have rich administrative data with many covariates.

Panel data with fixed effects. If you have repeated observations of the same units over time, fixed effects remove all time-invariant confounders. This strategy is powerful but does not address time-varying confounders that coincide with treatment.

Sensitivity analysis with observational methods. When none of the above is fully convincing, you can still produce useful evidence by combining observational methods with rigorous sensitivity analysis that quantifies how much confounding would need to exist to explain your results.

Concept Check

You want to study the effect of corporate board diversity on firm performance. You have panel data on S&P 500 firms over 20 years. No relevant policy change or cutoff exists. What is the most honest framing for your identification strategy?

Phase 2: Choose Your Estimator

When You Have Panel Data

Fixed effects. Include unit and time fixed effects to absorb time-invariant unobservables and common time shocks. This specification is the workhorse of observational panel data studies. The key remaining assumption is that no time-varying confounders coincide with treatment changes.

Random effects. A more efficient alternative to FE when the individual effect is uncorrelated with the regressors. Use the Hausman test to compare FE and RE, but remember that failing to reject does not prove RE is valid — it may simply reflect low power. Correlated random effects (the Mundlak/Chamberlain approach) relaxes the independence assumption by including group means of time-varying regressors, combining the consistency of FE with some of the efficiency gains of RE.

When You Have Cross-Sectional Data

Propensity score matching (PSM). Match treated units to similar control units based on the estimated probability of treatment. Reduces selection bias from observed covariates but does nothing for unobservable confounders.

Coarsened exact matching (CEM). A nonparametric alternative to PSM that matches on coarsened versions of covariates. Avoids some of PSM's model-dependence issues.

Inverse probability weighting (IPW). Reweight observations by the inverse of their propensity score to create a pseudo-population in which treatment is independent of observed covariates.

Doubly robust estimation (AIPW). Combines outcome modeling and propensity score weighting. Consistent if either the outcome model or the propensity score model is correctly specified. This double protection is a major advantage.

Double/debiased machine learning (DML). Uses ML algorithms (random forests, LASSO, neural networks) to estimate nuisance parameters (propensity scores and conditional outcome means) while maintaining valid inference on the causal parameter. Particularly useful with high-dimensional covariates.
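As a minimal illustration of the propensity-score logic these estimators share, here is an IPW sketch on simulated data with a single binary confounder, where the propensity score can be estimated nonparametrically as the treated share within each stratum. Every data-generating number is an assumption of the toy example:

```python
import random

# IPW sketch: one binary confounder z drives both treatment d and
# outcome y. The propensity score e(z) is estimated as the treated
# share within each stratum of z. The true effect is 1.0 (illustrative).
random.seed(1)
n, tau = 20000, 1.0

z = [random.random() < 0.5 for _ in range(n)]
d = [random.random() < (0.8 if zi else 0.2) for zi in z]   # e(1)=.8, e(0)=.2
y = [tau * di + 2.0 * zi + random.gauss(0, 1) for di, zi in zip(d, z)]

# Naive difference in means is confounded by z
n1 = sum(d)
naive = (sum(yi for yi, di in zip(y, d) if di) / n1
         - sum(yi for yi, di in zip(y, d) if not di) / (n - n1))

# Stratum-level propensity estimates
e = {s: sum(d[i] for i in range(n) if z[i] == s)
        / sum(1 for i in range(n) if z[i] == s)
     for s in (True, False)}

# Hajek (normalized) IPW estimate of the ATE
w1 = [1 / e[z[i]] if d[i] else 0.0 for i in range(n)]
w0 = [0.0 if d[i] else 1 / (1 - e[z[i]]) for i in range(n)]
ipw = (sum(w * yi for w, yi in zip(w1, y)) / sum(w1)
       - sum(w * yi for w, yi in zip(w0, y)) / sum(w0))

print(f"naive: {naive:.2f}  IPW: {ipw:.2f}")
```

With real covariates the propensity model is a logistic regression or an ML learner rather than a stratum share, but the reweighting step is the same.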

Choosing Controls

What to include:

  • Pre-treatment covariates that predict both treatment and outcome (confounders)
  • Pre-treatment covariates that predict the outcome but not treatment (precision gains)

What to exclude:

  • Post-treatment variables. Including a variable affected by treatment induces post-treatment bias and can mask or create spurious effects. This mistake is one of the most common anti-patterns in applied research.
  • Colliders. Variables caused by both treatment and outcome. Conditioning on them opens a spurious, non-causal path between treatment and outcome and introduces bias.
  • Instruments. Variables that affect treatment but not the outcome directly. Including them as controls removes the exogenous variation you need.

Phase 3: Manage the Assumption Burden

The core assumption of all selection-on-observables methods is conditional independence (also called unconfoundedness or ignorability): conditional on observed covariates, treatment assignment is independent of potential outcomes. This assumption is untestable. Here is how to make it more plausible:

Strategy 1: Rich Covariates

The more covariates you observe and control for, the more plausible conditional independence becomes. Administrative data with detailed individual or firm characteristics is particularly valuable. When you have hundreds of potential covariates, use DML to handle the dimensionality.

Strategy 2: Institutional Knowledge

Use your knowledge of the treatment assignment process to argue that specific confounders are the most important and that you observe them. A detailed description of why units receive treatment helps the reader assess whether your covariates are sufficient.

Strategy 3: Formal Sensitivity Analysis

(Oster, 2019)

Oster (2019) provides a formal framework for assessing how much selection on unobservables would be needed to explain your result. The key parameter, delta, measures the relative importance of unobserved versus observed selection. If delta must exceed 1 (meaning unobserved selection must be stronger than observed selection) to eliminate your result, this magnitude is reassuring.
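Oster's commonly used linear approximation to the bias-adjusted coefficient can be written down directly. The sketch below implements that approximation (not the exact solution from the paper), and the input numbers are illustrative assumptions, not values from any real study:

```python
def oster_beta_star(beta_short, r2_short, beta_ctrl, r2_ctrl,
                    r2_max=1.0, delta=1.0):
    """Approximate bias-adjusted coefficient in the spirit of Oster (2019).

    beta_short, r2_short: coefficient and R^2 without controls.
    beta_ctrl, r2_ctrl: coefficient and R^2 with controls.
    delta: relative strength of unobserved vs observed selection.
    This is the linear approximation, not the exact solution.
    """
    return beta_ctrl - delta * (beta_short - beta_ctrl) * \
        (r2_max - r2_ctrl) / (r2_ctrl - r2_short)

# Illustrative numbers: the coefficient moves from 0.50 to 0.40 as
# controls raise R^2 from 0.10 to 0.30; assume R_max = 0.5, delta = 1.
adjusted = oster_beta_star(0.50, 0.10, 0.40, 0.30, r2_max=0.5)
print(round(adjusted, 3))
```

If the adjusted coefficient keeps its sign under delta = 1 and a defensible R_max, the result is harder to attribute to omitted variables alone.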

(Cinelli & Hazlett, 2020)

Cinelli and Hazlett (2020) extend Oster's framework and provide tools for visualizing sensitivity. Their R package sensemakr produces contour plots showing how different levels of confounding would affect your estimate.

Strategy 4: Specification Curve Analysis

Run your analysis across the full space of defensible specifications: different control sets, different sample definitions, different functional forms, different estimators. A specification curve provides a systematic way to visualize this exercise. If your result is robust across specifications, it is less likely to be an artifact of one particular set of choices.

Phase 4: Estimate with Transparency

The Results Progression

Structure your results to build credibility:

  1. Naive OLS. Show the raw bivariate relationship. This regression is your baseline: biased on its own, but the gap between it and the controlled estimates reveals the direction and magnitude of selection.
  2. Add controls progressively. Add covariates in meaningful groups. If the coefficient is stable as you add controls, this stability suggests limited selection bias (at least from observables).
  3. Preferred specification. Your main result with the full set of controls, fixed effects, and the appropriate standard error correction.
  4. Matching or weighting estimator. Show that your result survives a non-parametric approach that does not impose linearity.
  5. Sensitivity analysis. Report the Oster delta or Cinelli-Hazlett robustness value. How strong would confounding need to be to eliminate your result?

Reporting Standards

  • Report the estimand explicitly — e.g., the average treatment effect on the treated (ATT) or the population average treatment effect (ATE)
  • Describe the conditional independence assumption in your specific context (not just generically)
  • Present balance tables or covariate balance diagnostics after matching/weighting
  • Report the overlap (common support) region — if many treated units have no comparable controls, your estimate relies heavily on extrapolation
  • Include formal sensitivity analysis (Oster bounds, Cinelli-Hazlett, or specification curve)
  • Discuss the most plausible threats to conditional independence and why you believe they are not severe enough to explain your result
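A minimal common-support check along these lines might flag treated units whose estimated propensity score falls outside the range observed among controls. The function name and the scores below are hypothetical, and real diagnostics typically also trim or inspect near-boundary scores:

```python
# Simple common-support check: count treated units whose propensity
# score lies outside the range spanned by the control group.
def overlap_report(ps_treated, ps_control):
    lo, hi = min(ps_control), max(ps_control)
    off = [p for p in ps_treated if not (lo <= p <= hi)]
    return {"control_range": (lo, hi),
            "n_treated": len(ps_treated),
            "n_off_support": len(off)}

report = overlap_report(
    ps_treated=[0.35, 0.60, 0.92, 0.97],   # illustrative scores
    ps_control=[0.10, 0.30, 0.55, 0.95],
)
print(report)   # the treated unit at 0.97 is outside the controls' range
```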

Phase 5: Be Honest About Limitations

Every observational study has limitations. Acknowledging them does not weaken your paper — it strengthens it by demonstrating intellectual honesty.

State what you cannot rule out. "Our design cannot address time-varying confounders that coincide with treatment adoption" is an honest and useful statement.

Quantify the concern. "Our Oster delta is 2.3, meaning unobserved selection would need to be 2.3 times as important as the combined observed selection to fully explain our result" is much more informative than "we cannot rule out omitted variable bias."

Suggest what would strengthen the evidence. "A staggered policy change in this domain would permit a DiD design that does not rely on selection on observables" signals that you understand the limitations of your own approach and know what better evidence would look like.