MethodAtlas
Practice·Robustness Stage·9 min read

Sensitivity Analysis for Unobservables

How robust are your results to omitted variable bias? Oster (2019) and Cinelli & Hazlett (2020) provide formal answers.

The Problem You Cannot See

You have run your regression, included every control variable you can think of, and your treatment effect is statistically significant. Your advisor nods. A referee writes: "How robust is this result to omitted variable bias?"

Addressing this question is central to building a convincing empirical argument.

Nearly every observational study — and even many quasi-experimental ones — faces the same uncomfortable question: what if there is an unobserved variable that, if you could control for it, would shrink your estimate to zero? You can never prove that such a confounder does not exist. What you can do is formally characterize how strong such a confounder would have to be in order to eliminate your findings. If the answer is "implausibly strong," your result gains credibility. If the answer is "a modest confounder would do it," you should worry.

This formalization is what sensitivity analysis for unobservables does. It does not prove your result is causal. It tells you — and your readers — exactly how much unobserved confounding it would take to overturn it.


Why Does This Matter?

The traditional approach to omitted variable bias was informal: run a regression with few controls, then add more, and argue that if the coefficient does not move much, unobservables probably would not change things either. This logic was formalized by Altonji et al. (2005), who proposed comparing the coefficient with and without controls as a way to bound the bias from unobservables.

The intuition is clean: if adding observed controls barely moves the coefficient, then unobserved confounders — which, under certain assumptions, produce bias of similar magnitude — probably would not move it much either. But this argument had limitations. It was hard to make precise, and it ignored the role of R-squared movements.
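The coefficient-movement intuition is easy to see in a simulation. The sketch below (in Python, with made-up data, not from any real study) regresses an outcome on a treatment with and without an observed confounder. An unobserved confounder remains, so even the long regression stays biased, which is exactly why coefficient stability alone is not enough:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

x = rng.normal(size=n)   # observed confounder
u = rng.normal(size=n)   # unobserved confounder
d = x + u + rng.normal(size=n)            # treatment, driven by both confounders
y = 0.5 * d + x + u + rng.normal(size=n)  # outcome; true treatment effect = 0.5

def ols_slope(y, *regressors):
    """Coefficient on the first regressor from an OLS fit with intercept."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

beta_short = ols_slope(y, d)    # short regression: y ~ d
beta_long = ols_slope(y, d, x)  # long regression: y ~ d + x

# Controlling for x moves the estimate toward the truth (0.5),
# but the unobserved u keeps it biased upward.
```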

Two frameworks have since emerged as the standard tools for formalizing this argument. Both are worth knowing.


Framework 1: Oster's Delta and Bias-Adjusted Estimates

Oster (2019) extended the Altonji, Elder, and Taber logic into a complete framework that accounts for both coefficient movements and R-squared movements when controls are added.

The Core Idea: Coefficient Stability

Oster's approach asks: if selection on unobservables is proportional to selection on observables, how large would that proportionality factor (called delta, $\delta$) have to be to drive the treatment effect to zero?

Here is the setup. You run two regressions:

  1. Short regression (no controls): coefficient $\dot{\beta}$, R-squared $\dot{R}$
  2. Long regression (with controls): coefficient $\tilde{\beta}$, R-squared $\tilde{R}$

If adding observables moves the coefficient from $\dot{\beta}$ to $\tilde{\beta}$ and moves R-squared from $\dot{R}$ to $\tilde{R}$, then the bias from unobservables depends on two things:

  • Delta ($\delta$): the ratio of selection on unobservables to selection on observables. If $\delta = 1$, unobservables are as important as observables for confounding; if $\delta = 2$, they are twice as important.
  • $R_{\max}$: the R-squared you would obtain if you could include all relevant variables, observed and unobserved. This bound caps how much explanatory power remains to be captured.

Don't worry about the notation yet — here's what this means in words: The bias-adjusted treatment effect equals the controlled estimate minus a correction term that depends on how the coefficient and R-squared move when you add controls, scaled by delta and R-max.

The commonly used first-order approximation of the bias-adjusted estimate is:

$$\beta^* \approx \tilde{\beta} - \delta \,(\dot{\beta} - \tilde{\beta}) \cdot \frac{R_{\max} - \tilde{R}}{\tilde{R} - \dot{R}}$$

Note: The exact result in Oster (2019), Proposition 2, involves the real root of a cubic equation. The formula above is a widely used approximation that captures the key intuition. For precise computation, use the psacalc (Stata) or robomit (R) packages, which implement the exact solution.

When $\delta = 1$ (proportional selection) and a specific $R_{\max}$ is assumed, you can solve for the value of $\beta^*$. Alternatively, you can set $\beta^* = 0$ and solve for the value of $\delta$ that would be required to explain away the result entirely. This procedure is the identified-set approach.

Oster recommends $R_{\max} = \min(1.3\,\tilde{R},\ 1)$ as a default, capped at 1 because $R^2$ cannot exceed 1, based on calibrating observational estimates against RCT benchmarks. Sensitivity to the choice of $R_{\max}$ is worth exploring.
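To make this concrete, here is a back-of-envelope calculation (in Python, with hypothetical regression summaries) using the first-order approximation above. A real analysis should use psacalc or robomit, which implement the exact cubic solution:

```python
# Hypothetical short- and long-regression summaries (illustrative only)
beta_dot, R_dot = 0.25, 0.10      # no controls
beta_tilde, R_tilde = 0.15, 0.30  # with all observed controls

R_max = min(1.3 * R_tilde, 1.0)   # Oster's default bound, capped at 1

# First-order approximation of the bias-adjusted estimate at delta = 1
delta = 1.0
adjustment = (beta_dot - beta_tilde) * (R_max - R_tilde) / (R_tilde - R_dot)
beta_star = beta_tilde - delta * adjustment   # 0.15 - 0.045 = 0.105

# Delta needed to drive the estimate to zero (set beta* = 0, solve for delta)
delta_zero = beta_tilde / adjustment          # 0.15 / 0.045 ≈ 3.33
```

With these numbers a $\delta$ of about 3.3 would be needed to explain away the result, which under the rule of thumb below would count as robust.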

Recent management applications of the Oster method include Lee (2022), who uses it to assess the robustness of start-up hierarchy effects, and Starr et al. (2019), who apply it to evaluate the sensitivity of findings on noncompete externalities.


How to Interpret Delta

  • $\delta > 1$: unobservables would need to be more important than observables to explain away the result. The further $\delta$ is above 1, the more robust the result, though a $\delta$ only slightly above 1 provides little reassurance.
  • $\delta < 1$: even modest unobserved confounding could overturn the result. This vulnerability is concerning.
  • $\delta < 0$: the result would actually get stronger if the confounder were included. This sign reversal can happen and is generally reassuring, though it also merits scrutiny of whether the proportional-selection assumption fits the setting.

Framework 2: Cinelli & Hazlett's Partial R-Squared Approach

Cinelli and Hazlett (2020) propose a different — and in many ways more intuitive — framework based on partial R-squared values.

The Core Idea: Partial R-Squared

Instead of asking "how proportional is selection on unobservables to selection on observables?", Cinelli and Hazlett ask: how much residual variance in the treatment and in the outcome would a confounder have to explain in order to change the conclusion?

They parameterize the confounder's strength using two quantities:

  • $R^2_{Y \sim Z \mid X, D}$: partial R-squared of the confounder $Z$ with the outcome $Y$, after partialing out treatment $D$ and controls $X$
  • $R^2_{D \sim Z \mid X}$: partial R-squared of the confounder $Z$ with the treatment $D$, after partialing out controls $X$

The key result is that the omitted variable bias from leaving out $Z$ is a function of these two partial R-squared values. This framework lets you ask: "How strong would a confounder have to be — in terms of its association with treatment and outcome — to change my conclusion?"

Robustness Values

The framework produces two key quantities:

  • Robustness Value (RV): the minimum confounding strength that would reduce the estimate to zero. Formally, it is the common value $R^2_{Y \sim Z \mid X, D} = R^2_{D \sim Z \mid X}$ (along the 45-degree line of the contour plot) at which the bias equals the point estimate. A confounder whose partial R-squareds with both treatment and outcome exceed the RV would be strong enough to explain away the result. A related quantity, $RV_{q,\alpha}$, gives the minimum confounding strength that would make the estimate statistically insignificant at level $\alpha$.
  • Benchmarking: you compare the robustness value against the partial R-squared of observed covariates. If no observed covariate is as strong as the robustness value, an unobserved confounder would need to be stronger than anything you observe.
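For intuition, the robustness value can be computed by hand from the treatment coefficient's t-statistic and degrees of freedom. The sketch below (in Python, with hypothetical numbers) uses the formulas from Cinelli and Hazlett (2020) and cross-checks the result against their omitted-variable-bias bound; in practice sensemakr reports all of this for you:

```python
import math

# Hypothetical regression output (illustrative only)
estimate, se, dof = 0.15, 0.05, 996

t = estimate / se            # t-statistic of the treatment coefficient
f = abs(t) / math.sqrt(dof)  # partial Cohen's f of treatment with outcome

# Robustness value: the common partial R^2 (with treatment and outcome)
# at which the bias exactly equals the point estimate
rv = 0.5 * (math.sqrt(f**4 + 4 * f**2) - f**2)

def ovb_bound(r2_y, r2_d):
    """Maximum bias from a confounder with the given partial R^2 values."""
    return se * math.sqrt(dof) * math.sqrt(r2_y * r2_d / (1 - r2_d))

# At (rv, rv) the bound reproduces the point estimate
assert abs(ovb_bound(rv, rv) - estimate) < 1e-9
```

With these numbers the RV is about 0.09: a confounder explaining roughly 9% of the residual variance in both treatment and outcome would suffice to zero out the estimate.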

Contour Plots

The most distinctive feature of this framework is the contour plot: a two-dimensional graph with $R^2_{D \sim Z \mid X}$ on one axis and $R^2_{Y \sim Z \mid X, D}$ on the other, showing contour lines for different levels of bias. Benchmark covariates are plotted as points, giving readers an immediate visual sense of how strong a confounder would have to be.


When to Use Which Framework

Feature | Oster (2019) | Cinelli and Hazlett (2020)
Key parameter | $\delta$ (proportionality of selection) | partial $R^2$ (confounder strength)
Requires | Short and long regression | Only the long regression
Benchmarking | Implicit (via $R_{\max}$) | Explicit (observed covariates)
Visualization | Bias-adjusted $\beta^*$ as a function of $\delta$ | Contour plots
Best for | Economics papers, applied micro | Papers with many potential confounders
Software | psacalc (Stata), robomit (R) | sensemakr (R, Stata, Python)

In practice, many papers now report both. They complement rather than substitute for each other. For an alternative robustness approach that focuses on researcher degrees of freedom rather than unobservables, see specification curve analysis.


How to Do It: Code

Cinelli & Hazlett (sensemakr)

library(sensemakr)

# Fit the main regression
model <- lm(outcome ~ treatment + x1 + x2 + x3, data = df)

# Run sensitivity analysis
sensitivity <- sensemakr(
  model = model,
  treatment = "treatment",
  benchmark_covariates = c("x1", "x2", "x3"),  # observed covariates for benchmarking
  kd = 1:3,  # multiples of each benchmark's strength to consider
  ky = 1:3
)

# Print summary
summary(sensitivity)

# Contour plot
plot(sensitivity)

# Contour plot for statistical significance (t-value)
plot(sensitivity, sensitivity.of = "t-value")

# Extreme scenario plot
plot(sensitivity, type = "extreme")
Requires: sensemakr

Oster (2019) Coefficient Stability

Oster (2019) recommends $R_{\max} = 1.3\,\tilde{R}$, capped at 1, where $\tilde{R}$ is the $R^2$ from the regression with all observed controls. robomit's R2max argument (like psacalc's rmax() option) takes an absolute $R^2$ value, not a multiplier, so you must compute it first.

library(robomit)

# Step 1: Estimate the fully controlled model to get R-tilde
full_model <- lm(outcome ~ treatment + x1 + x2 + x3, data = df)
R_tilde <- summary(full_model)$r.squared

# Step 2: Apply Oster's heuristic with cap at 1
R2max <- min(1.3 * R_tilde, 1)

# Compute delta for beta = 0
o_delta(
  y = "outcome",          # outcome variable
  x = "treatment",        # treatment variable
  con = "x1 + x2 + x3",   # controls (as a formula string)
  beta = 0,               # hypothesized true beta (0 = test for sign change)
  R2max = R2max,          # absolute R-max value (not a multiplier)
  type = "lm",
  data = df
)

# Bias-adjusted beta for delta = 1
o_beta(
  y = "outcome",
  x = "treatment",
  con = "x1 + x2 + x3",
  delta = 1,
  R2max = R2max,          # same computed R-max
  type = "lm",
  data = df
)
Requires: robomit

How to Report Sensitivity Analysis

A good sensitivity analysis section includes:

  1. The controlled estimate and its standard error.
  2. The robustness value (Cinelli & Hazlett) or delta (Oster) — stated clearly in words: "A confounder would need to explain X% of the residual variation in both treatment and outcome to reduce the estimate to zero."
  3. Benchmarking against observed covariates: "The strongest observed covariate explains Y% of residual variation; a confounder would need to be Z times as strong."
  4. A contour plot or table showing sensitivity across a range of assumptions.
  5. Bias-adjusted estimates for key scenarios ($\delta = 1$, $R_{\max} = 1.3\,\tilde{R}$ for Oster; benchmark multiples for Cinelli & Hazlett).

Here is an example of how to write this up:

We assess robustness to omitted variable bias using the partial R-squared framework of Cinelli and Hazlett (2020). The robustness value for our main estimate is 0.12, meaning that an unobserved confounder would need to explain at least 12% of the residual variance in both treatment assignment and the outcome to reduce the point estimate to zero. For comparison, the strongest observed predictor in our model (household income) explains only 4% of the residual variance in treatment. A confounder would therefore need to be three times as strong as income — a scenario we consider implausible given the institutional setting.


Concept Check


You run a regression of wages on a job training dummy, controlling for education and experience. The coefficient on training is 0.15 (SE = 0.05). You compute a robustness value (RV) of 0.08 using the Cinelli & Hazlett framework. Education has a partial R-squared of 0.10 with both treatment and outcome. Should you be worried about omitted variable bias?


Paper Library

Foundational (7)

Altonji, J. G., Elder, T. E., & Taber, C. R. (2005). Selection on Observed and Unobserved Variables: Assessing the Effectiveness of Catholic Schools.

Journal of Political Economy. DOI: 10.1086/426036

Altonji, Elder, and Taber develop the idea that if selection on observables is informative about selection on unobservables, one can bound the bias from omitted variables. Their approach becomes the basis for the widely used Oster (2019) sensitivity framework.

Cinelli, C., & Hazlett, C. (2020). Making Sense of Sensitivity: Extending Omitted Variable Bias.

Journal of the Royal Statistical Society: Series B. DOI: 10.1111/rssb.12348

Cinelli and Hazlett develop a modern framework for sensitivity analysis based on partial R-squared measures, extending the omitted variable bias formula. Their approach allows researchers to benchmark the strength of hypothetical confounders against observed covariates, making sensitivity analysis more interpretable.

Cinelli, C., Ferwerda, J., & Hazlett, C. (2024). Sensemakr: Sensitivity Analysis Tools for OLS in R and Stata.

Observational Studies. DOI: 10.1353/obs.2024.a946583

Cinelli, Ferwerda, and Hazlett develop the sensemakr R and Stata package implementing their partial R-squared sensitivity analysis framework. They demonstrate the tool with applications to studies of violence and political attitudes, showing how researchers can benchmark potential confounders against observed covariates to assess the robustness of causal claims from observational data.

Frank, K. A. (2000). Impact of a Confounding Variable on a Regression Coefficient.

Sociological Methods & Research. DOI: 10.1177/0049124100029002001

Frank develops the impact threshold for a confounding variable (ITCV), which calculates how much bias an omitted variable would need to introduce to invalidate an inference. This approach is widely adopted in education and management research.

Masten, M. A., & Poirier, A. (2021). Salvaging Falsified Instrumental Variable Models.

Econometrica. DOI: 10.3982/ECTA17969

Masten and Poirier study what researchers can do when an IV model is falsified. They introduce the falsification frontier and the falsification adaptive set, which quantify minimal relaxations of the baseline assumptions and report the parameter values consistent with minimally nonfalsified models, providing a structured sensitivity-analysis framework for IV.

Oster, E. (2019). Unobservable Selection and Coefficient Stability: Theory and Evidence.

Journal of Business & Economic Statistics. DOI: 10.1080/07350015.2016.1227711

Oster extends the Altonji, Elder, and Taber approach to assess the robustness of regression estimates to omitted variable bias. She proposes a bounding method based on the proportional selection assumption and coefficient stability across specifications, now widely used in applied economics.

VanderWeele, T. J., & Ding, P. (2017). Sensitivity Analysis in Observational Research: Introducing the E-Value.

Annals of Internal Medicine. DOI: 10.7326/M16-2607

VanderWeele and Ding introduce the E-value, a simple and intuitive measure of the minimum strength of association that an unmeasured confounder would need to have with both the treatment and outcome to fully explain away an observed treatment-outcome association. The E-value is widely adopted in epidemiology and increasingly discussed in social science.

Application (4)

Busenbark, J. R., Yoon, H., Gamache, D. L., & Withers, M. C. (2022). Omitted Variable Bias: Examining Management Research with the Impact Threshold of a Confounding Variable (ITCV).

Journal of Management. DOI: 10.1177/01492063211006458

Busenbark and colleagues provide a practical guide to conducting sensitivity analysis in management research using the ITCV framework. They review its application in strategic management and organizational behavior, and demonstrate how to interpret and report results for management audiences.

Cinelli, C., & Hazlett, C. (2025). An Omitted Variable Bias Framework for Sensitivity Analysis of Instrumental Variables.

Cinelli and Hazlett extend their OLS sensitivity framework to instrumental variables settings, showing how to assess the robustness of IV estimates to violations of the exclusion restriction. They derive bounds on IV bias as a function of the partial R-squared of a hypothetical confounder with both the instrument and the outcome, providing practical tools for benchmarking the plausibility of IV assumptions.

Lee, S. (2022). The Myth of the Flat Start-Up: Reconsidering the Organizational Structure of Start-Ups.

Strategic Management Journal. DOI: 10.1002/smj.3333

Lee examines the relationship between organizational hierarchy and start-up creative and commercial success in the video game industry. She uses Oster's (2019) coefficient stability method to assess robustness to omitted variable bias, demonstrating how partial identification techniques complement standard empirical approaches in strategy research.

Starr, E., Frake, J., & Agarwal, R. (2019). Mobility Constraint Externalities.

Organization Science. DOI: 10.1287/orsc.2018.1252

Starr, Frake, and Agarwal study how noncompete agreements generate externalities for all workers in a labor market, not just those directly constrained. They use Oster's (2019) coefficient stability diagnostic to assess robustness of findings to omitted variable bias, demonstrating that enforceable noncompetes are associated with reduced job offers, mobility, and wages even for unconstrained workers.

Survey (1)

Rosenbaum, P. R. (2002). Observational Studies.

Rosenbaum provides the standard textbook on observational study design, covering matching, sensitivity analysis, and design principles for drawing causal inferences from non-experimental data. His framework for sensitivity analysis (Rosenbaum bounds) is the standard tool for assessing how much unobserved confounding would be needed to overturn a matching-based finding.