235 Terms

Glossary

Key terms in causal inference and empirical research methods, defined precisely and accessibly.

Search terms/

10 terms

Always-Takers: In the instrumental variables framework: units who always take treatment regardless of the instrument value. The instrument has no effect on their treatment status, so they do not contribute to the LATE estimate.
Anticipation Effects: Changes in behavior that occur before treatment is actually implemented, because agents foresee or learn about the upcoming intervention. Anticipation violates the no-anticipation assumption required by difference-in-differences and event study designs: if treated units adjust their behavior before the treatment date, pre-treatment outcomes no longer represent untreated potential outcomes, and the parallel trends assumption may appear to fail even when it holds in the absence of anticipation.
Attenuation Bias: The systematic downward bias (toward zero) in a regression coefficient caused by classical measurement error in the corresponding independent variable. The magnitude of attenuation depends on the reliability ratio — the ratio of true signal variance to total (signal + noise) variance. The term strictly refers to the bivariate case; in multivariate OLS, classical measurement error in one regressor can bias the coefficients on the other (correctly-measured) regressors in either direction, depending on how the mismeasured variable covaries with the others. Attenuation bias is a common concern when using survey-based or proxy measures in management research.
Attrition: Loss of sample units after initial assignment. Attrition can bias treatment effect estimates if it differs by treatment status — for example, if treated units are more likely to drop out. Lee bounds and other partial-identification methods can address attrition bias.
Audit Study: A field experiment that tests for discrimination or differential treatment by sending matched applications — such as fictitious resumes, emails, or loan requests — that are identical except for a randomly varied characteristic of interest (e.g., name, race, gender). Also called a correspondence study when conducted via written materials. Audit studies provide some of the most credible evidence on discrimination because random assignment of the characteristic of interest eliminates confounding.
Autocorrelation: Correlation between observations at different time points in a time series. In ITS analysis, positive autocorrelation (common in health and policy data) causes conventional standard errors to be too small, producing spurious significance. Newey-West or ARIMA-based standard errors correct for this.
Average Marginal Effect (AME): The mean of the marginal effects of a variable computed separately at each observation's covariate values. In nonlinear models (logit, probit, Poisson), raw coefficients are not directly interpretable as marginal effects because the link function is nonlinear. The AME provides a single summary measure of a variable's effect on the outcome probability or count, averaged across the sample's covariate distribution.
Average Treatment Effect (ATE): The average causal effect of a treatment across the entire population. Formally: ATE = E[Y(1) - Y(0)]. Contrast with ATT (effect on the treated only, conditioning on D=1) and LATE (effect on compliers only, identified by IV).
Average Treatment Effect on the Treated (ATT): The average causal effect of a treatment on those who actually received it. Formally: E[Y(1) - Y(0) | D=1].
Average Treatment Effect on the Untreated (ATU): The average causal effect of a treatment on those who did not receive it. Formally: ATU = E[Y(1) - Y(0) | D = 0]. The ATU answers the question of what would happen if the untreated were to receive treatment, and differs from the ATT when there is selection into treatment based on potential gains.

14 terms

Backdoor Criterion: A graphical condition in a DAG that determines whether a set of covariates Z is sufficient to identify the causal effect of X on Y. The criterion is satisfied when Z blocks all backdoor paths from X to Y and does not include any descendant of X. When the backdoor criterion is met, conditioning on Z removes all confounding and the interventional distribution P(Y | do(X)) equals the conditional distribution P(Y | X, Z) averaged over Z.
Backdoor Path: A non-causal path between treatment and outcome in a DAG — specifically, any path that begins with an arrow pointing into the treatment node. Backdoor paths transmit spurious associations and must be blocked — by conditioning on appropriate variables — to identify causal effects.
Bad Control: A variable that, when included in a regression, introduces or amplifies bias in the causal estimate rather than reducing it. Angrist and Pischke (2009) define bad controls as 'variables that are themselves outcome variables in the notional experiment at hand' — that is, variables affected by the treatment. Common bad controls include post-treatment variables (which can block part of the causal effect or change the estimand) and colliders (which can open spurious non-causal paths when conditioned on). Cinelli, Forney, and Pearl (2022) provide a comprehensive taxonomy of good and bad controls.
Balance Test: A diagnostic check verifying that treatment and control groups are similar on observable baseline characteristics. In a well-executed randomized experiment, differences between groups should be small and attributable to chance. Balance tests typically report means by treatment arm and p-values from tests of equality. Significant imbalance on many covariates may signal problems with randomization, though some imbalance is expected by chance alone.
Bandwidth: The window width around a cutoff in regression discontinuity or regression kink designs that determines which observations inform the local estimate. A wider bandwidth includes more data but introduces more bias from observations far from the cutoff; a narrower bandwidth reduces bias but increases variance. Optimal bandwidth selection methods (Imbens and Kalyanaraman, 2012; Calonico, Cattaneo, and Titiunik, 2014) balance this tradeoff.
Bandwidth Selection: The choice of how much data around the cutoff to use in an RDD analysis. Narrower bandwidths reduce bias from misspecification but increase variance. Optimal bandwidths (e.g., Imbens-Kalyanaraman, Calonico-Cattaneo-Titiunik) balance this bias-variance trade-off.
Benjamini-Hochberg (BH) Procedure: A step-up procedure that controls the false discovery rate (FDR) at level q. Sort p-values in ascending order, find the largest k such that p(k) ≤ kq/m, and reject all hypotheses with p-values up to p(k). Less conservative than Bonferroni; appropriate when some false positives among discoveries are tolerable.
Bias: The systematic deviation of an estimator from the true parameter value. An estimator is unbiased if E[β̂] = β. Bias can arise from omitted variables, measurement error, selection, simultaneity, or misspecification. Consistency (convergence as n → ∞) is a weaker but often sufficient property when unbiasedness is unattainable.
Blocking: An experimental design technique that groups units into homogeneous blocks based on pre-treatment characteristics before randomizing treatment within each block. Blocking is the experimental analog of stratification: it ensures balance on the blocking variable and increases precision by removing between-block variation from the treatment effect estimate. In the extreme case of matched pairs, each block contains exactly two units, one assigned to treatment and one to control.
Bonferroni Correction: A multiple testing correction that controls the FWER by dividing the significance level α by the number of tests m: reject if pⱼ ≤ α/m. Simple but conservative — it controls the FWER under arbitrary dependence but may have low power when many tests are conducted.
Bootstrap: A resampling method that approximates the sampling distribution of an estimator by repeatedly drawing samples with replacement from the observed data and recomputing the statistic of interest. The bootstrap provides standard errors, confidence intervals, and p-values without relying on analytical formulas or distributional assumptions. Requires that the resampling scheme respects the data's dependence structure (e.g., resampling clusters in clustered data).
Bounds: An interval [lower, upper] within which the true parameter value must lie given the data and maintained assumptions. Bounds are the output of partial identification analysis. They are informative if the interval is narrow enough to be policy-relevant, even when the true value cannot be pinpointed.
Bunching Estimator: A method that estimates behavioral responses from the excess mass of units clustering at kink or notch points in a schedule (e.g., a tax bracket threshold). The estimator compares the observed density to a counterfactual density constructed by fitting a polynomial to the distribution away from the threshold.
Bunching Mass: The excess number of observations concentrated at a threshold (kink or notch point) relative to a smooth counterfactual density. Bunching mass B is computed as the integral of the observed density minus the counterfactual density over the bunching region. The implied elasticity is a function of B — specifically, B divided by the size of the kink in the log net-of-tax rate, normalised by the density of earnings at the threshold (Saez 2010).

30 terms

Caliper Matching: A matching method that restricts matches to treated-control pairs whose propensity scores (or covariate distances) differ by less than a specified maximum distance (the caliper). Caliper matching prevents poor matches between units with very different propensity scores, at the cost of potentially dropping treated units for which no sufficiently close control exists. The caliper width controls the bias-variance tradeoff: tighter calipers improve match quality but reduce the matched sample size.
Callaway-Sant'Anna Estimator: A modern DiD estimator for staggered treatment timing that computes group-time average treatment effects (ATT(g,t)) for each cohort at each time period, using either never-treated or not-yet-treated units as controls. Aggregation of these group-time effects into summary parameters avoids the negative weighting problem of TWFE.
Causal Forest: A machine learning method (Wager and Athey, 2018) that estimates heterogeneous treatment effects by growing an ensemble of causal trees, each splitting the covariate space to maximize treatment effect heterogeneity. Honest splitting — using separate subsamples for tree construction and leaf-effect estimation — is the key innovation that, together with overlap and regularity conditions, underwrites the pointwise asymptotic normality result. Causal forests produce pointwise estimates of the conditional average treatment effect (CATE) and, under those conditions, asymptotic confidence intervals. They are a leading method for data-driven discovery of treatment effect heterogeneity.
Censoring: When the outcome of interest (e.g., survival time) is only partially observed. Right censoring occurs when the event has not yet occurred by the end of the study period; left censoring occurs when the start time is unknown. Ignoring censoring by dropping censored observations or treating censoring times as event times introduces bias. Valid analysis requires methods designed for censored data, such as the Cox proportional hazards model or the Kaplan-Meier estimator.
Cluster Randomization: Random assignment at the group level — for example, randomizing entire stores, classrooms, or firms rather than individuals within them. Required when the treatment operates at the group level or when individual randomization is infeasible. Cluster randomization reduces effective sample size relative to individual randomization (by a factor related to the intraclass correlation) and requires clustered standard errors for valid inference.
Cluster-Robust Standard Errors: Standard errors that account for within-cluster correlation of residuals. Required when treatment is assigned at a higher level (e.g., state) than the unit of observation (e.g., individual). Computed via the sandwich estimator applied to cluster-level residual products. Also called clustered standard errors or Rogers standard errors.
Clustered Standard Errors: Standard error correction that accounts for correlation of errors within groups (e.g., students within schools, workers within firms). Usually required when treatment is assigned at the group level.
Coarsened Exact Matching (CEM): A matching method that temporarily coarsens continuous covariates into discrete bins, performs exact matching within bins, and then uses the original uncoarsened data for analysis (Iacus, King & Porro, 2012). By restricting matches to the same coarsened stratum, CEM bounds the maximum imbalance ex ante. The coarsening level controls the bias–variance tradeoff: finer bins yield better matches but fewer matched pairs.
Collider: A variable that is a common effect of two other variables on a path in a DAG (both arrows point into it). Conditioning on a collider opens a spurious association between its causes, a phenomenon known as collider bias or Berkson's paradox.
Common Support: The requirement that for every combination of covariate values, there is a positive probability of being in both the treatment and control groups. Formally: 0 < P(D=1 | X) < 1. Without common support, treatment effects cannot be estimated for units with covariate values found only in one group.
Complier: In the IV framework: a unit whose treatment status is changed by the instrument. LATE estimates the effect only for compliers.
Concentration Parameter: A measure of instrument strength in IV estimation, defined as the non-centrality parameter of the first-stage F distribution. In simple homoskedastic settings, it is related to the F-statistic by approximately E[F] ≈ 1 + (concentration parameter)/(number of instruments) in homoskedastic linear IV with fixed first-stage strength asymptotics (Stock-Yogo 2005); the mapping is more complex under heteroskedasticity or clustering. A low concentration parameter indicates weak instruments, which cause 2SLS to be biased toward OLS and produce unreliable inference. Stock and Yogo (2005) provide critical values for weak instrument tests based on the concentration parameter.
Conditional Average Treatment Effect (CATE): The average treatment effect for a subpopulation defined by a specific value of covariates X: CATE(x) = E[Y(1) − Y(0) | X = x]. CATE captures treatment effect heterogeneity — the idea that treatment may help some subgroups more than others. Estimation methods include subgroup analysis, interaction terms, and machine learning approaches such as causal forests, which estimate CATE nonparametrically across the covariate space.
Conditional Expectation Function (CEF): The function E[Y | X = x] that gives the expected value of the outcome Y for each value of the covariates X. OLS provides the best linear approximation to the CEF, and under certain assumptions the CEF itself has a causal interpretation.
Conditional Independence Assumption (CIA): The formal notation (Y(0), Y(1)) ⊥ D | X for the assumption defined under 'unconfoundedness' — that treatment assignment is independent of potential outcomes given observed covariates X. Also referred to as selection on observables. The two terms are interchangeable.
Conditioning: The act of holding a variable fixed — by controlling for it in a regression, stratifying, or matching — to block non-causal paths in a DAG. Conditioning on the right variables removes confounding, but conditioning on the wrong variables (e.g., colliders or mediators) can introduce or amplify bias.
Confidence Interval: A range of values constructed from sample data that, under repeated sampling, would contain the true parameter value at a specified frequency (e.g., 95% of the time). A 95% confidence interval does not mean there is a 95% probability the parameter lies in the interval; rather, 95% of intervals constructed by this procedure across repeated samples would contain the true value.
Confounder: A variable that causally affects both the treatment and the outcome, creating a spurious association between them. Confounders must be controlled for — through conditioning, design, or identification strategy — to recover causal effects.
Consistency: An estimator is consistent if it converges in probability to the true parameter value as the sample size approaches infinity. Consistency is weaker than unbiasedness: a consistent estimator may be biased in finite samples but the bias vanishes as n grows. Many commonly used estimators (e.g., IV, MLE, GMM) are consistent but not unbiased.
Construct Validity: The degree to which a measure accurately captures the theoretical concept it claims to represent. In management research, construct validity is critical because key concepts — such as dynamic capabilities, absorptive capacity, or organizational ambidexterity — are abstract and difficult to operationalize. Poor construct validity undermines causal inference regardless of the identification strategy employed.
Control Function: An approach to handling endogeneity that explicitly models the source of correlation between the regressor and the error term, then includes an estimate of that correlation as a control variable in the outcome equation. The Heckman two-step correction is a special case. Control function methods are particularly useful in nonlinear models where standard IV/2SLS may be inconsistent.
Correlated Random Effects (CRE): A panel data approach that models the correlation between unit-specific effects and regressors parametrically — typically by including the time-averaged values of the time-varying regressors (the Mundlak device) in a random effects specification. CRE produces identical slope estimates to fixed effects for time-varying regressors but additionally allows estimation of coefficients on time-invariant variables and facilitates testing of the RE assumption via significance of the Mundlak terms.
Correspondence Study: A type of audit study that uses written correspondence — typically fictitious resumes, cover letters, or emails — rather than in-person testers. The researcher sends matched pairs of applications that differ only in the characteristic under study (e.g., a name signaling race or gender) and measures callback rates. Correspondence studies avoid the confounds of in-person audit studies (e.g., differences in tester behavior) but are limited to measuring callbacks rather than job offers or wages.
Counterfactual: The outcome that would have occurred under an alternative treatment status. For a treated unit, the counterfactual is Y(0) — the outcome had it not been treated. The fundamental challenge of causal inference is that counterfactuals are never directly observed for any individual unit; all causal inference methods are strategies for imputing or estimating counterfactual outcomes.
Counterfactual Density: In bunching estimation, the hypothetical density of the running variable that would prevail in the absence of the threshold. Estimated by fitting a polynomial to the observed density histogram, excluding bins in the bunching region, then interpolating through the excluded region.
Covariate Balance: The condition where the distribution of observed covariates is similar across treatment and control groups. Balance is guaranteed in expectation by randomization and can be achieved in observational studies via matching, weighting, or stratification. Assessed using standardized mean differences, variance ratios, or KS statistics.
Cox Proportional Hazard Model: A semiparametric survival model that estimates the effect of covariates on the hazard rate without specifying the baseline hazard function. The proportional hazards assumption requires that the ratio of hazards for any two individuals is constant over time. Identification additionally requires non-informative censoring (censoring time conditionally independent of event time). Coefficients are interpreted as log hazard ratios.
Credibility Revolution: The methodological shift in empirical economics (roughly 1990s–2000s) toward research designs that provide more credible causal identification — emphasizing transparent identification strategies, natural experiments, and design-based methods over structural models with strong parametric assumptions. The term is attributed to Angrist and Pischke (2010), who argued that this shift improved the quality and believability of empirical work.
Cross-Fitting: A sample-splitting procedure used in double/debiased machine learning (DML) that partitions the data into K folds, trains nuisance models (outcome and treatment models) on the out-of-fold observations, and generates predictions for the held-out fold. By separating the data used for ML training from the data used for causal estimation, cross-fitting prevents overfitting bias that would arise if the same observations were used for both purposes. Common choices are K = 5 or K = 10 folds.
Cross-Validation: A model selection technique that partitions the data into training and validation sets to assess out-of-sample prediction performance. In K-fold cross-validation, the data are divided into K non-overlapping subsets; the model is trained on K−1 folds and evaluated on the held-out fold, rotating through all K folds. Cross-validation helps guard against overfitting and is widely used for tuning regularization parameters in machine learning methods.

17 terms

d-Separation/dee-separation/: A graphical criterion in DAGs for determining whether two variables are conditionally independent given a set of conditioning variables. Two variables are d-separated if every path between them is blocked — either by a non-collider that is conditioned on, or by a collider that is not conditioned on (and has no conditioned-on descendants).
Debiased Machine Learning (DML): A framework (Chernozhukov et al., 2018) for valid inference on low-dimensional causal parameters while using machine learning to estimate high-dimensional nuisance functions. DML combines Neyman orthogonality — which makes the causal parameter insensitive to first-order errors in nuisance estimation — with cross-fitting to avoid overfitting bias. Under regularity conditions — most importantly, sufficient nuisance-estimation convergence rates (the product of the two nuisance errors faster than n^(-1/2), e.g., each at o_p(n^(-1/4))) — the result is √n-consistent and asymptotically normal estimates of the causal parameter.
Defiers: In the instrumental variables framework: units who do the opposite of what the instrument prescribes — they take treatment when the instrument discourages it, and vice versa. The monotonicity assumption rules out defiers, ensuring that LATE is well-defined.
Demand Effects: Changes in participant behavior driven by cues about what the experimenter expects or desires, rather than by the treatment itself. Participants may unconsciously (or consciously) adjust their responses to confirm the researcher's hypothesis. Demand effects are a primary concern in lab experiments and surveys; field experiments conducted without participant awareness largely avoid them.
Derivative Ratio Estimator: In RKD, the causal effect is identified as the ratio of the change in slope of E[Y|X] at the kink to the change in slope of the treatment rule T(X) at the kink. In practice, researchers estimate local polynomial derivatives of the outcome and treatment equations on either side of the kink and take their ratio, analogous to the Wald estimator in fuzzy RDD.
Design Effect (DEFF): The ratio of the variance of an estimator under a complex sampling or experimental design to the variance under simple random sampling. Clustering typically produces a design effect greater than 1 (increasing effective variance), while stratification can produce a design effect less than 1 (decreasing effective variance). A design effect above 1 means a larger sample size is needed to achieve the same precision as simple random sampling.
Design-Based Inference: An approach to causal inference that derives identification from features of the research design — such as randomization, a natural experiment, or a known assignment mechanism — rather than from functional form assumptions on the outcome model. Examples include RCTs, DiD, RDD, and IV.
Difference in Means: The simplest estimator of a treatment effect: the average outcome in the treated group minus the average outcome in the control group. Under random assignment, the difference in means is an unbiased estimator of the average treatment effect (ATE). In observational studies, the difference in means is generally biased due to confounding.
Difference-in-Differences (DiD): A quasi-experimental design that estimates causal effects by comparing the change in outcomes over time between a treatment group and a control group (Card & Krueger, 1994; Ashenfelter, 1978). DiD removes time-invariant unobserved confounders by differencing within units, and removes common time trends by differencing across groups. The key identifying assumption is parallel trends: absent treatment, the treated and control groups would have followed the same trajectory.
Directed Acyclic Graph (DAG): A visual diagram showing causal relationships between variables. Arrows indicate the direction of causation. 'Acyclic' means no variable can cause itself through a chain of effects.
Dominated Region: In bunching estimation at a notch, the range of values just above the threshold where agents are strictly worse off than at the threshold itself. Because the average rate (not just the marginal rate) jumps at a notch, earning slightly above the threshold yields less after-tax income than earning exactly at the threshold. All rational agents in the dominated region should relocate to the threshold, producing a 'hole' (missing mass) in the income distribution.
Donor Pool: The set of untreated units from which the synthetic control is constructed. A good donor pool consists of units that are similar to the treated unit and were not affected by the same intervention. Including inappropriate donors can introduce bias; excluding too many reduces the quality of the match.
Donut-Hole RDD: A regression discontinuity specification that excludes observations in a narrow window immediately around the cutoff, creating a 'donut' of missing data at the threshold. Used as a robustness check when there is concern about manipulation of the running variable near the cutoff — if units can sort precisely at the threshold, excluding those observations may reduce bias. The donut-hole approach trades bias reduction for reduced power and changes the estimand slightly.
Doubly Robust (DR / AIPW): An estimator that combines an outcome regression model with a propensity score model and is consistent if either is correctly specified (Robins, Rotnitzky & Zhao, 1994). The augmented inverse probability weighting (AIPW) implementation additionally achieves the semiparametric efficiency bound under regularity conditions.
Doubly Robust Estimation: An estimation approach that combines an outcome regression model with inverse probability weighting (Robins, Rotnitzky & Zhao, 1994). The estimator is consistent if either the outcome model or the propensity score model is correctly specified — hence 'doubly robust.' This property provides insurance against model misspecification: the researcher gets two chances to get it right. Augmented inverse probability weighting (AIPW) is the most common doubly robust estimator.
Dynamic Completeness: The assumption that, once sufficient lags of the dependent variable and covariates are included in the model, no further lags have predictive power for the current outcome. Formally, it requires E[Yₜ | Yₜ₋₁, ..., Y₁, Xₜ, ..., X₁] = E[Yₜ | Yₜ₋₁, ..., Yₜ₋ₚ, Xₜ, ..., Xₜ₋q] for finite p and q. Dynamic completeness justifies the lag truncation in autoregressive distributed lag (ARDL) and VAR models.
Dynamic Panel Bias (Nickell Bias): Bias that arises when a lagged dependent variable is included in a fixed effects regression. Because the within-group transformation mechanically correlates the transformed lagged dependent variable with the transformed error, fixed effects estimates of the autoregressive coefficient are biased downward in short panels. The bias is of order 1/T and disappears as the number of time periods grows. Arellano–Bond and Blundell–Bond GMM estimators were designed to address this.

14 terms

Ecological Fallacy: The error of inferring individual-level relationships from aggregate data. Finding that industries with higher R&D spending have higher profits does not mean individual firms benefit from R&D — the relationship may be driven by composition effects or confounders that operate at different levels. Multilevel or hierarchical models can help distinguish individual-level from aggregate-level effects.
Effect Size: A measure of the magnitude of a treatment effect or association. Some effect sizes are standardized (e.g., Cohen's d for mean differences, with conventional benchmarks of 0.2 = small, 0.5 = medium, 0.8 = large), while others are scale-specific or model-specific (e.g., odds ratios, hazard ratios, elasticities, or percentage-point effects). Standardized effect sizes facilitate comparison across studies using different scales, but no single metric is universal across methods.
Effective Number of Clusters: A measure of the information content available for clustered inference, accounting for cluster size variation and within-cluster correlation. When the effective number of clusters is small (typically below 20-30), standard cluster-robust standard errors can severely under-reject, and the wild cluster bootstrap or randomization inference should be used.
Elasticity: The percentage change in one variable in response to a one-percent change in another variable. Elasticities are unit-free, making them comparable across contexts. In bunching estimation, the elasticity of taxable income measures how sensitively taxpayers adjust their reported income in response to changes in the net-of-tax rate. In regression with log-transformed variables, the coefficient on log(X) in a log-log specification directly estimates the elasticity.
Endogeneity/en-DOJ-en-ee-tee/: When the treatment or key regressor is correlated with the error term — meaning OLS estimates are biased and inconsistent (they do not converge to the true parameter as n grows). The central problem of observational research.
Entropy Balancing: A preprocessing method that reweights control observations to exactly match the treated group on specified moments (mean, variance, skewness) of covariate distributions. Unlike propensity score methods, entropy balancing directly targets covariate balance rather than modeling the treatment assignment mechanism (Hainmueller, 2012), guaranteeing exact balance on the chosen moments while retaining as much information as possible from the control group.
Essential Heterogeneity: A condition where individuals who select into treatment have systematically different treatment effects from those who do not. When essential heterogeneity is present, the OLS estimate, the IV estimate (LATE), and the ATE will generally differ because each identifies a different weighted average of individual treatment effects.
Estimand: The quantity you are trying to estimate — defined in terms of potential outcomes, not in terms of any particular statistical method.
Estimator: The statistical procedure you apply to data to estimate the estimand. Different estimators can target the same estimand.
Event Study: In panel causal inference, a dynamic version of difference-in-differences that estimates treatment effects at each time period relative to the treatment date, producing a sequence of period-specific coefficients. Event studies allow visualization of pre-trends (to assess the parallel trends assumption) and dynamic treatment effects (to examine how effects evolve over time). One pre-treatment period must be normalized to zero as the reference category. This usage is distinct from the classic financial event-study methodology (FFJR 1969, MacKinlay 1997), which measures abnormal stock returns around information events.
Event Time (Relative Time): Time measured relative to when a unit first receives treatment. In event studies and staggered DiD, event time recenters the calendar-time data so that all units have their treatment date at period 0, enabling comparison of dynamic effects across units with different treatment dates.
Exclusion Restriction: The assumption that the instrument affects the outcome only through the treatment — not through any other channel. The key (and untestable) assumption of instrumental variables.
Exogeneity/ek-SOJ-en-ee-tee/: When a variable is determined outside the system of interest — uncorrelated with the error term. Exogenous variation is the foundation of credible causal inference: it mimics random assignment.
External Validity: Whether findings from one study generalize to other populations, settings, or time periods.

10 terms

False Discovery Rate (FDR): The expected proportion of rejected hypotheses that are false rejections. FDR control is less conservative than FWER control and is appropriate when the goal is to limit the fraction of false positives among discoveries rather than preventing any false positive. The Benjamini-Hochberg procedure controls the FDR.
Falsification Test: A diagnostic test that checks a prediction that must hold if the identification strategy is valid. If a treatment should only affect certain outcomes but not others, showing a null effect on 'unaffected' outcomes strengthens the causal claim. For example, in a DiD study of a policy change, showing no effect on a group known to be unaffected provides evidence supporting the identification strategy.
Family-Wise Error Rate (FWER): The probability of making at least one Type I error (false rejection) across a family of hypothesis tests. Corrections such as Bonferroni, Holm, and Romano-Wolf control the FWER to guard against spurious discoveries when testing multiple hypotheses.
Field Experiment: A randomized experiment conducted in a real-world setting — an actual workplace, market, or organization — rather than in a laboratory. Field experiments combine the internal validity of randomization with the external validity of naturalistic conditions. Harrison and List (2004) distinguish artefactual, framed, and natural field experiments based on the subject pool, information set, and environment.
First Stage: In IV/2SLS, the regression of the endogenous treatment variable on the instrument(s). A strong first stage is necessary for reliable inference; the classic rule of thumb is F > 10 (Staiger and Stock, 1997); Stock and Yogo (2005) provide formal critical values, and Lee, McCrary, Moreira, and Porter (2022) show that valid 5% t-test inference in the just-identified case requires F > 104.7.
Fixed Effects: A panel data estimator that removes time-invariant unobserved heterogeneity by using only within-unit variation over time. Implemented by demeaning (subtracting unit means) or equivalently by including unit indicator variables. Fixed effects require the strict exogeneity assumption and cannot estimate the effects of time-invariant regressors.
Forbidden Regression: A common but generally invalid IV procedure in which fitted values from a nonlinear first stage (e.g., probit or logit) are substituted directly into a linear second-stage outcome regression. This predictor-substitution approach is generally inconsistent for the linear IV estimand because the nonlinear fitted values do not equal the linear projection onto the instrument set. Valid alternatives include standard 2SLS (which uses the linear first stage), control-function / two-stage residual inclusion methods, or using the nonlinear fitted values as instruments rather than direct substitutes.
Frisch-Waugh-Lovell Theorem: A theorem showing that the coefficient on a regressor in a multiple regression equals the coefficient from a simple regression of the residualized outcome on the residualized regressor, after partialling out all other covariates. The FWL theorem provides the theoretical foundation for understanding what "controlling for" variables means in regression and underlies modern methods like double/debiased machine learning.
Fundamental Problem of Causal Inference: The impossibility of observing both potential outcomes for the same unit at the same time — a treated unit's Y(0) and an untreated unit's Y(1) are never observed. Because individual causal effects Y(1) - Y(0) require both potential outcomes, they cannot be directly computed for any unit. All causal inference methods are strategies for overcoming this problem by using data from other units, time periods, or assumptions to impute the missing counterfactual. Named by Holland (1986).
Fuzzy RDD: A regression discontinuity design where crossing the cutoff increases the probability of treatment but does not determine it perfectly. Fuzzy RDD uses the cutoff as an instrument for actual treatment receipt, identifying a LATE for compliers near the cutoff via a Wald-type ratio.

4 terms

Garden of Forking Paths: The many researcher degrees of freedom in data analysis — choices about variable definitions, sample selection, control variables, functional form, outlier treatment, and subgroup analysis — that can collectively inflate the probability of finding a significant result even without deliberate manipulation. Named by Andrew Gelman and Eric Loken.
Gauss-Markov Theorem: A theorem stating that under the classical linear model assumptions (linearity, random sampling, no perfect collinearity, zero conditional mean of errors, homoscedasticity, and no autocorrelation across observations — the latter two often combined as spherical errors), OLS is the Best Linear Unbiased Estimator (BLUE). "Best" means lowest variance among all linear unbiased estimators. When homoscedasticity or no-autocorrelation is violated, OLS remains unbiased but is no longer efficient.
Generalized Method of Moments (GMM): An estimation framework that nests OLS, IV, and many other estimators as special cases. In management research, GMM is primarily used for dynamic panel models (Arellano–Bond, Blundell–Bond) where lagged dependent variables create dynamic panel bias that standard fixed effects cannot solve. GMM uses lagged levels and differences as internal instruments. Requires careful attention to instrument proliferation and specification tests (Hansen J-test, Arellano–Bond AR(2) test).
Goodman-Bacon Decomposition: A diagnostic that decomposes the two-way fixed effects (TWFE) DiD estimator with staggered treatment timing into a weighted average of all possible two-group, two-period DiD comparisons. Goodman-Bacon (2021) shows that TWFE assigns weights based on group size and timing variance, and that some comparisons use already-treated units as controls — producing potentially negative weights and biased estimates when treatment effects are heterogeneous across cohorts or over time.

7 terms

Hausman Test: A specification test that compares two estimators — one consistent under both the null and alternative hypotheses (e.g., fixed effects) and one efficient under the null but inconsistent under the alternative (e.g., random effects). A large test statistic rejects the null that the efficient estimator is consistent, indicating the assumptions of the efficient estimator are violated. Most commonly used to choose between fixed and random effects in panel data, though it has well-known power limitations in small samples.
Hawthorne Effect: A change in behavior caused by participants' awareness of being observed or studied, rather than by the treatment itself. Named after the Hawthorne Works factory experiments of the 1920s–1930s. The Hawthorne effect threatens experimental validity because the observed treatment effect may partly reflect the novelty of attention rather than the treatment's true impact. Natural field experiments, where participants do not know they are in a study, avoid this problem.
Hazard Rate: The instantaneous rate of experiencing an event at time t, conditional on having survived to time t. Formally: h(t) = lim(dt→0) P(t ≤ T < t+dt | T ≥ t) / dt. The hazard rate can increase, decrease, or remain constant over time depending on the process.
Hazard Ratio: The ratio of hazard rates between two groups, exp(β) in the Cox model. Under the proportional hazards assumption, the hazard ratio is constant over time — so a hazard ratio of 1.5 means the treatment group experiences the event at 1.5 times the rate of the control group at every time point. When proportional hazards is violated, the reported hazard ratio is a time-averaged summary rather than a constant. Unlike odds ratios, hazard ratios incorporate the timing of events.
Heckman Correction: A two-step procedure for correcting selection bias when the sample is non-randomly selected (Heckman, 1979). Step 1 estimates a probit model predicting selection (e.g., firm survival, market entry). Step 2 includes the inverse Mills ratio from Step 1 as a control in the outcome equation. Identification relies on joint normality of the error terms and an exclusion restriction — a variable that affects selection but not the outcome directly. Widely used in strategy research; sensitive to violations of normality and to weak exclusion restrictions (which produce near-collinearity between the inverse Mills ratio and the other regressors).
Heteroscedasticity/het-er-oh-skeh-das-TIS-ih-tee/: When the variance of the error term is not constant across observations. OLS coefficient estimates remain unbiased, but conventional standard errors are invalid; robust standard errors are required.
Homoscedasticity/ho-mo-skeh-das-TIS-ih-tee/: The assumption that the variance of the error term is constant across all values of the independent variables. When this assumption is violated (heteroscedasticity), conventional OLS standard errors are invalid — they converge to the wrong variance — though the coefficient estimates themselves remain unbiased. Heteroscedasticity-robust or clustered standard errors should be used instead.

12 terms

Identification: A research design is 'identified' when its assumptions are sufficient to recover the causal parameter of interest from observed data.
Identification Strategy: The specific source of exogenous variation and set of assumptions used to recover a causal parameter from observational data. Examples include random assignment (RCTs), parallel trends (DiD), continuity at the cutoff (RDD), and exclusion restrictions (IV). A credible identification strategy makes transparent what assumptions are required for the causal interpretation to hold.
Incidental Parameters Problem: A problem arising in nonlinear panel models (e.g., logit or probit with fixed effects) where the number of parameters grows with the sample size, causing inconsistency of the remaining parameters of interest. With a fixed number of time periods T, the unit-specific intercepts cannot be consistently estimated, and their inconsistency contaminates the slope coefficients. The conditional logit (Chamberlain, 1980) and bias correction methods (Fernandez-Val and Weidner, 2016) are solutions.
Independence Assumption: The assumption that treatment assignment D is statistically independent of potential outcomes: {Y(0), Y(1)} is independent of D. Unconditional independence is stronger than the conditional ignorability/unconfoundedness assumption and is guaranteed in expectation by randomization. When it holds, the simple difference in outcomes identifies the ATE without any need to condition on covariates.
Instrumental Variables (IV): An estimation strategy that uses an exogenous variable (the instrument) to isolate variation in the endogenous treatment that is uncorrelated with the error term. Valid instruments must satisfy three conditions: (1) relevance — the instrument affects the treatment, (2) independence/exogeneity — the instrument is uncorrelated with the error term, and (3) the exclusion restriction — the instrument affects the outcome only through the treatment. The exclusion restriction is untestable with a single instrument. With heterogeneous treatment effects, IV identifies the local average treatment effect (LATE) for compliers.
Intent-to-Treat (ITT): The average effect of being assigned to treatment, regardless of whether units actually comply with the assignment. The ITT is estimated by the reduced-form regression of the outcome on the assignment indicator. Under the standard IV assumptions (independence of the instrument, exclusion restriction, monotonicity, and a non-zero first stage), LATE = ITT / first-stage effect (the difference in treatment take-up between Z=1 and Z=0 groups), which is the Wald estimator. Distinct from the treatment-on-the-treated (TOT), which conditions on actual treatment receipt rather than assignment.
Interaction Effect: The additional effect of combining two variables beyond their separate individual effects. In regression, an interaction term (X × Z) tests whether the effect of X on Y changes at different levels of Z. In linear models, the interaction coefficient directly measures this differential effect. In nonlinear models (logit, probit), interaction effects cannot be read directly from the interaction coefficient and require computing cross-partial derivatives or marginal effects at representative values.
Internal Validity: Whether a study correctly estimates the causal effect for the population and setting it actually studies.
Interrupted Time Series (ITS): A quasi-experimental design that estimates the causal effect of an intervention by comparing the level and trend of a time series before and after a known intervention point. The key assumptions are that the pre-intervention trend would have continued in the absence of the intervention, no co-occurring interventions affect the outcome around the same time, and the counterfactual trend is correctly specified (e.g., linear vs nonlinear).
Intraclass Correlation Coefficient (ICC): The proportion of total variance in an outcome that is attributable to between-cluster (rather than within-cluster) variation. A high ICC means observations within clusters are similar, which reduces the effective sample size and must be accounted for in power calculations and standard errors.
Inverse Mills Ratio: The ratio of the standard normal PDF to the CDF, evaluated at the linear predictor from the selection equation: λ(z) = φ(z)/Φ(z). In the Heckman two-step estimator, the inverse Mills ratio is included as an additional regressor in the outcome equation to correct for selection bias. Its coefficient (ρ · σ_u, where σ_u is the standard deviation of the outcome-equation error) measures the direction and magnitude of selection.
Inverse Probability Weighting (IPW): A method that weights each observation by the inverse of its probability of receiving the treatment it actually received — 1/e(X) for treated units and 1/(1−e(X)) for untreated units, where e(X) is the propensity score (Horvitz & Thompson, 1952). Creates a pseudo-population in which treatment assignment is independent of observed covariates. IPW is an alternative to matching for estimating causal effects under the conditional independence assumption, and is a building block of doubly robust estimation.

2 terms

Kaplan-Meier Estimator: A nonparametric estimator of the survival function S(t) = P(T > t) from censored data (Kaplan & Meier, 1958). The Kaplan-Meier curve is a step function that decreases at each observed event time, with the step size depending on the number at risk. Censored observations contribute to the denominator until their censoring time. Validity requires non-informative (independent) censoring — the censoring time and event time are conditionally independent.
Kink: A point in a schedule (e.g., a tax bracket boundary) where the slope changes but the level is continuous — the marginal rate changes while the average rate remains continuous. Agents near a kink may adjust their behavior to locate at or near the kink point, producing bunching in the income distribution. The regression kink design (RKD) exploits kinks to identify causal effects from changes in slopes rather than levels. Contrast with a notch, where the level itself jumps.

6 terms

LASSO (Least Absolute Shrinkage and Selection Operator)/LASS-oh/: A penalized regression method that adds an L1 penalty (the sum of absolute coefficient values) to the OLS objective function, producing sparse coefficient estimates by shrinking some coefficients exactly to zero. LASSO performs simultaneous estimation and variable selection. In causal inference, post-LASSO and double-selection LASSO (Belloni, Chernozhukov, and Hansen, 2014) are used for principled covariate selection.
Linear Exponential Family: A class of distributions — including the normal, Poisson, binomial, and gamma — for which the score of the log-likelihood is linear in the dependent variable. Wooldridge (2010) shows that QMLE based on any member of the linear exponential family is consistent for the conditional mean parameters when the mean is correctly specified, regardless of the true distribution. This property provides a theoretical justification for Poisson QMLE with non-count data.
Linear Projection: The population linear function L[Y | 1, X] that minimizes the mean squared prediction error among all linear functions of X. Unlike the conditional expectation function (CEF), the linear projection is always well-defined and does not require E[Y | X] to be linear. OLS estimates the coefficients of the linear projection, which equal the CEF coefficients when the CEF is linear — e.g., in a saturated model (all discrete regressors with full interactions) or under joint normality of (Y, X) (Angrist & Pischke 2009, ch. 3).
Local Average Treatment Effect (LATE): Under monotonicity (no defiers), the average causal effect for the subpopulation of compliers — those whose treatment status is changed by the instrument. Formally: E[Y(1) − Y(0) | D(1) > D(0)], where D(z) is potential treatment status under instrument value z. Without monotonicity the IV estimand is a weighted mixture of complier and defier effects. Introduced by Imbens and Angrist (1994).
Local Linear Regression: A nonparametric regression method that fits a weighted linear regression within a neighborhood (bandwidth) of each evaluation point. In RDD, it is the standard estimation method because it has better boundary properties than local constant (Nadaraya-Watson) regression at the cutoff.
Local Polynomial Regression: A nonparametric regression method that fits polynomial functions locally within a bandwidth window around each evaluation point. In regression discontinuity designs, local linear (degree 1) or local quadratic (degree 2) regression is the standard approach for estimating treatment effects at the cutoff. Local linear regression avoids the boundary bias that affects kernel regression and is implemented in the rdrobust package.

14 terms

Manipulation Testing: A diagnostic check in RDD that tests whether units can precisely manipulate the running variable to sort above or below the cutoff. If manipulation is possible, the assumption that units near the cutoff are comparable breaks down. The McCrary density test and Cattaneo-Jansson-Ma test are standard tools.
Marginal Treatment Effect (MTE): The treatment effect for individuals at the margin of participation — those whose unobserved resistance to treatment U_D equals a particular value u. Formally: MTE(x, u) = E[Y(1) - Y(0) | X = x, U_D = u]. The MTE curve traces how treatment effects vary with the unobserved propensity to select into treatment. ATE, ATT, and LATE are all weighted averages of the MTE curve under different weight functions.
Maximum Likelihood Estimation (MLE): An estimation method that finds the parameter values maximizing the probability (likelihood) of observing the sample data given the assumed statistical model. MLE is asymptotically efficient (achieves the Cramer-Rao lower bound), consistent, and asymptotically normal under correct model specification. It nests many common estimators: logit, probit, Poisson, and Tobit are all maximum likelihood estimators.
McCrary Density Test: A diagnostic test for manipulation of the running variable in regression discontinuity designs. McCrary (2008) tests whether the density of the running variable is continuous at the cutoff. A discontinuity in the density — more observations just above or just below the cutoff than expected — suggests that units can manipulate their position relative to the threshold, violating the continuity assumption required for RDD identification.
Measurement Error: Discrepancy between a variable's true value and its recorded value. In the bivariate case, classical measurement error in the treatment variable biases its OLS coefficient toward zero (attenuation bias); in multivariate regression the direction of bias on other coefficients is ambiguous. Measurement error in the outcome increases noise but does not bias coefficients. Measurement error in control variables can transmit bias to the treatment coefficient. IV estimation can correct for measurement error in the treatment if the instrument is measured without error.
Mediation: A causal mechanism in which an independent variable X affects an outcome Y through an intermediate variable M (the mediator). The indirect effect is X → M → Y; the direct effect is X → Y net of M. In management research, mediation tests theoretical mechanisms — for example, whether CEO characteristics affect firm performance through strategic choices. Modern causal mediation analysis requires the sequential ignorability assumption.
Minimum Detectable Effect (MDE): The smallest treatment effect that a study is powered to detect at a given significance level and statistical power. MDE is a key output of power analysis and depends on sample size, variance, and the desired Type I and Type II error rates.
Model-Based Inference: An approach to causal inference that relies on correctly specifying a statistical model — including its functional form and distributional assumptions — to identify causal effects. Contrasted with design-based inference, where identification comes from the research design itself.
Moderation: When the effect of an independent variable X on an outcome Y depends on a third variable Z (the moderator). In management research, moderation tests boundary conditions — for example, whether the effect of diversification on performance depends on industry dynamism. Estimated via interaction terms (X × Z) in regression. Distinct from mediation, which concerns causal mechanisms rather than contingencies.
Monotonicity: In the instrumental variables framework, the assumption that the instrument affects treatment status in only one direction for all units — there are no 'defiers' who do the opposite of what the instrument encourages. The IV estimand is identified without monotonicity, but the LATE interpretation as a positive-weighted average of complier-level effects (Imbens and Angrist, 1994) requires it; without monotonicity the IV estimand mixes complier and defier effects with potentially canceling signs.
Moulton Factor: The variance-inflation factor that arises when within-cluster dependence is ignored. For a regressor that is constant within clusters (e.g., a state-level treatment dummy), the simplified inflation factor is √(1 + (m−1)·ρ), where m is the average cluster size and ρ is the intraclass correlation of the errors. Even modest ρ (e.g., 0.05) with large clusters (m = 50) yields a Moulton factor of approximately 1.86, meaning naive SEs are 86% too small. In general, the inflation also depends on within-cluster regressor correlation and cluster-size structure.
Multicollinearity/mul-tee-koh-lin-ee-AIR-ih-tee/: A condition in which two or more regressors are highly (but not perfectly) correlated, making it difficult to isolate their individual effects. Multicollinearity inflates the variance of coefficient estimates without introducing bias. Symptoms include large standard errors on individually insignificant regressors whose joint F-test is significant. Perfect collinearity causes the OLS estimator to be undefined.
Multiple Hypothesis Testing: The problem that testing many hypotheses simultaneously inflates the probability of at least one false positive beyond the nominal significance level. With m independent tests at α = 0.05, the probability of at least one false rejection is 1 − (1 − α)ᵐ. Corrections include Bonferroni and Holm (controlling the family-wise error rate) and Benjamini-Hochberg (controlling the false discovery rate).
Mundlak Device: A technique that adds group-level means of time-varying covariates to a random effects model, allowing simultaneous estimation of between-group and within-group effects. Proposed by Mundlak (1978), it bridges fixed and random effects: if the Mundlak terms are jointly significant, the random effects estimator is inconsistent and fixed effects is preferred. Also called correlated random effects.

9 terms

Natural Experiment: A situation in which some external event or institutional feature creates variation in treatment assignment that is plausibly exogenous — mimicking random assignment without deliberate experimental intervention. Examples include policy changes, lotteries, and geographic boundaries.
Negative Weights: In two-way fixed effects (TWFE) regressions with staggered treatment and heterogeneous effects, some group-time treatment effects receive negative weights in the overall TWFE estimate. The pattern occurs because TWFE implicitly uses already-treated units as controls, and the resulting 'DiD' comparison subtracts a positive treatment effect from the comparison group. Under sufficient heterogeneity in treatment effects over time, negative weights can cause the TWFE estimate to have the opposite sign from every group-time ATT. The Goodman-Bacon decomposition reveals these weights.
Never-Takers: In the instrumental variables framework: units who never take treatment regardless of the instrument value. Like always-takers, the instrument has no effect on their treatment status and they do not contribute to the LATE estimate.
Never-Treated: Units that do not receive treatment at any point during the study period. In staggered DiD designs, never-treated units serve as a clean comparison group because their trends are not contaminated by treatment effects. Some modern DiD estimators require a never-treated group.
Neyman Orthogonality: A condition ensuring that estimation of nuisance parameters does not affect the first-order bias of the target causal parameter. Neyman orthogonality is the key property that enables double/debiased machine learning (DML) to use flexible ML estimators for nuisance functions while maintaining √n consistency for the parameter of interest.
Noncompliance: When experimental subjects do not follow their randomly assigned treatment status — either failing to take the treatment when assigned to it, or taking the treatment when assigned to control. Noncompliance breaks the simple link between assignment and treatment, requiring the researcher to distinguish between the intent-to-treat (ITT) effect of assignment and the local average treatment effect (LATE) of actual treatment receipt. IV/2SLS with assignment as the instrument is the standard remedy.
Not-Yet-Treated: Units that have not yet received treatment at a given calendar time but will receive it later. Under the assumption that treatment effects do not anticipate future adoption, not-yet-treated units can serve as valid controls for currently-treated units in staggered DiD designs.
Notch: A point in a schedule (e.g., a tax or benefit schedule) where the average rate jumps discontinuously — the level of the schedule itself changes, not just its slope. At a notch, there is a 'dominated region' just above the threshold where agents are strictly worse off than at the threshold, creating strong incentives to bunch. Notches generate both excess mass at the threshold and a 'hole' (missing mass) in the dominated region above it. Contrast with a kink, where only the marginal rate changes.
Nuisance Parameter: A parameter that is not of direct interest but must be estimated to identify the target causal parameter. In DML, the conditional expectations E[Y | X] and E[D | X] are nuisance parameters — they must be estimated to partial out confounders, but they are not the causal effect of interest. Neyman orthogonality ensures that first-order errors in nuisance parameter estimation do not bias the causal estimate.

7 terms

Odds Ratio: The ratio of the odds of an event occurring in one group to the odds in another group. In logistic regression, exp(β) gives the odds ratio associated with a one-unit change in the regressor. An odds ratio of 1 indicates no association; values above 1 indicate higher odds of the event for a one-unit increase in the regressor. Odds ratios approximate relative risks only when the outcome is rare (the rare disease assumption).
Omitted Variable Bias (OVB): Bias in a coefficient estimate caused by excluding a relevant variable that is correlated with both the treatment and the outcome. The direction and magnitude of OVB depend on the correlation of the omitted variable with the treatment and its partial effect on the outcome.
Optimization Friction: Costs or barriers that prevent agents from perfectly adjusting to thresholds. Frictions attenuate observed bunching below the frictionless prediction, causing the naive bunching estimator to underestimate the true behavioral elasticity. Kleven and Waseem (2013) develop a structural model that accounts for frictions.
Overdispersion: When the variance of a count outcome exceeds its conditional mean, violating the Poisson assumption that the variance equals the mean. Overdispersion is common in empirical count data and, if ignored, leads to standard errors that are too small and inflated test statistics. The negative binomial model explicitly accommodates overdispersion by adding a dispersion parameter; alternatively, Poisson quasi-maximum likelihood estimation (QMLE) with robust standard errors remains consistent for the conditional mean.
Overfitting: When a statistical model fits the training data too closely — capturing noise rather than the true underlying relationship — resulting in poor out-of-sample prediction. In causal inference, overfitting is particularly dangerous when ML methods are used to estimate nuisance functions, because overfitted predictions can bias the causal parameter estimate. Cross-fitting and regularization are standard remedies.
Overidentification Test: A test of instrument validity when there are more instruments than endogenous regressors (the overidentified case). The Hansen J-test (or Sargan test in the homoscedastic case) checks whether the instruments are mutually consistent — whether they all point to the same causal estimate. Rejection suggests at least one instrument violates the exclusion restriction, though the test has low power when all instruments are invalid in the same direction.
Overlap Assumption: The assumption that for every value of the covariates, both treatment and control are possible: 0 < P(D=1|X) < 1. Without overlap, some units have no comparable counterparts in the opposite treatment group, and the treatment effect is not identified for those units.

15 terms

p-Hacking: The practice of manipulating data analysis — trying multiple specifications, subsamples, variable definitions, or statistical tests — until a statistically significant result (p < 0.05) is obtained. p-hacking inflates false positive rates and produces findings that fail to replicate. Pre-registration, specification curve analysis, and multiple testing corrections are remedies.
Panel Data: Data in which the same cross-sectional units (individuals, firms, countries) are observed across multiple time periods. Panel data enable methods like fixed effects and difference-in-differences that exploit within-unit variation over time to control for time-invariant unobserved confounders.
Parallel Trends Assumption: The assumption that, in the absence of treatment, the treated and control groups would have followed the same trend over time. This assumption applies to the counterfactual trend (which is never observed) and is fundamentally untestable — pre-treatment parallel trends are suggestive but do not prove the assumption holds post-treatment. The key assumption of difference-in-differences.
Partial Identification: An approach that acknowledges when data and assumptions are insufficient to point-identify a causal parameter and instead derives informative bounds on the parameter. Examples include Manski bounds, Lee bounds for attrition, and sensitivity analyses that report a range of estimates under varying assumptions.
Phantom Counterfactual: The unobserved potential outcome for a unit — the outcome it would have experienced under the alternative treatment state. For a treated unit, the phantom counterfactual is Y(0); for an untreated unit, it is Y(1). It is 'phantom' because it is fundamentally unobservable: the same unit cannot simultaneously be treated and untreated (the fundamental problem of causal inference). All causal inference methods — randomization, matching, differencing, instrumental variables — are strategies for imputing or averaging over phantom counterfactuals using observed data from other units.
Placebo Test: A specific falsification test that applies the 'treatment' where no effect should exist — either at a different time, in a different population, or using a fake treatment variable. In DiD, a common placebo test estimates the treatment effect using pre-treatment periods only; a significant 'effect' suggests the parallel trends assumption is violated. In synthetic control, placebo tests permute the treated unit across donor pool members.
Poisson Regression: A regression model for count outcomes that specifies the conditional mean as an exponential function: E[Y|X] = exp(X'β). Coefficients are interpreted as semi-elasticities: a one-unit change in Xⱼ multiplies the conditional mean by exp(βⱼ). The Poisson QMLE is consistent for the conditional mean parameters even when the Poisson distributional assumption is violated, provided the conditional mean is correctly specified (Gourieroux, Monfort, and Trognon, 1984).
Policy-Relevant Treatment Effect (PRTE): The average treatment effect for the specific subpopulation whose treatment status would change under a proposed policy change. Unlike LATE (which corresponds to the compliers induced by a particular instrument), PRTE uses the weight function implied by the specific policy under consideration. PRTE is computed as a weighted average of the MTE curve, where the weights reflect which individuals are moved into or out of treatment by the policy.
Positivity (Overlap): The assumption that every unit has a nonzero probability of receiving each treatment level, conditional on covariates: 0 < P(D = 1 | X) < 1 for all values of X in the population. Positivity ensures that both treated and untreated units exist at every covariate value, so treatment effects are estimable. Violations — propensity scores near 0 or 1 — force extrapolation rather than interpolation and inflate variance in inverse probability weighting and doubly robust estimators.
Potential Outcomes: The outcomes a unit would experience under each possible treatment status. Y(1) is the outcome if treated; Y(0) is the outcome if not treated. We can only observe one.
Pre-Analysis Plan: A detailed specification of the hypotheses, outcome variables, estimating equations, and sample restrictions that a researcher commits to before examining the data. Pre-analysis plans combat p-hacking and researcher degrees of freedom by distinguishing pre-specified (confirmatory) analyses from post-hoc (exploratory) ones. Standard practice in clinical trials and increasingly expected in social science experiments.
Pre-Trends Test: A diagnostic test in difference-in-differences designs that examines whether the treatment and control groups exhibited similar outcome trends before the intervention. Statistically insignificant pre-treatment coefficients in an event study provide suggestive (but not conclusive) evidence of parallel trends. Roth (2022) demonstrates that pre-trends tests have low power against many plausible violations and that conditioning on passing a pre-test can distort inference.
Propensity Score: The probability of receiving treatment conditional on observed covariates, e(X) = P(D=1 | X). The propensity score is always a balancing score: conditioning on e(X) makes treatment assignment independent of the observed covariates X (Rosenbaum & Rubin, 1983, Theorem 1) — no assumption on potential outcomes is needed for this property. Under the additional conditional independence (unconfoundedness) assumption together with positivity/overlap (0 < e(X) < 1 on the support of interest), conditioning on e(X) also suffices to make treatment ignorable for potential outcomes, enabling matching, stratification, or inverse probability weighting to identify causal effects.
Proportional Hazards Assumption: The assumption that covariate effects multiply the baseline hazard by a constant factor that does not change over time: h(t|X) = h₀(t) · exp(Xβ). The implication is that the hazard ratio for any two individuals is constant across time. Test with Schoenfeld residuals or log-log survival plots.
Publication Bias: The systematic tendency for statistically significant, positive, or novel results to be more likely to be published than null or negative findings. Publication bias distorts the scientific literature by creating a biased sample of all studies conducted, making effects appear larger and more consistent than they truly are. Meta-analytic methods (funnel plots, trim-and-fill, p-curve) can detect and partially correct for publication bias.

3 terms

Quantile Regression: A regression method that estimates the conditional quantile function Qτ(Y|X) instead of the conditional mean E[Y|X]. At τ = 0.5, quantile regression estimates the conditional median. Minimizes a check function (asymmetric absolute loss) rather than squared errors. Introduced by Koenker and Bassett (1978).
Quantile Treatment Effect (QTE): The difference in the quantiles of the potential outcome distributions: QTE(τ) = the τ-th quantile of Y(1) minus the τ-th quantile of Y(0). QTE reveals how treatment effects vary across the outcome distribution, not just at the mean. QTEs are causally meaningful as differences in marginal potential-outcome quantiles without additional assumptions; stronger assumptions such as rank invariance or rank similarity are only needed for interpretations that link individuals' ranks across treatment states.
Quasi-Maximum Likelihood Estimation (QMLE): An estimation method that maximizes a likelihood function that may be misspecified for the true data-generating process, but still yields consistent estimates of the conditional mean parameters under certain conditions. The Poisson QMLE, for example, consistently estimates E[Y | X] = exp(X'β) even when the data are not Poisson-distributed, provided the conditional mean is correctly specified. Robust (sandwich) standard errors are required because the information matrix equality does not hold under misspecification.

19 terms

Random Effects: A panel data estimator that models unit-specific heterogeneity as random draws from a distribution, using a weighted average of between-unit and within-unit variation. Random effects allow estimation of time-invariant regressors but require the stronger assumption that unit effects are uncorrelated with all regressors (past, present, and future). The Hausman test compares fixed and random effects to assess this assumption.
Randomization Inference: A mode of statistical inference that derives the distribution of a test statistic by considering all possible random assignments of treatment, rather than relying on large-sample asymptotics (Fisher, 1935). Particularly useful when the number of clusters or treated units is small and conventional standard errors are unreliable.
Rank Invariance: The assumption that treatment does not change the rank ordering of individuals in the outcome distribution — the person at the τ-th quantile under treatment is the same person at the τ-th quantile without treatment. Rank invariance is required to interpret quantile treatment effects (QTEs) as individual-level effects rather than distributional shifts. The assumption is plausible for small perturbations but implausible when treatment fundamentally reshuffles outcomes.
Recentered Influence Function (RIF): A transformation of the outcome variable that allows OLS to estimate effects on distributional statistics (quantiles, Gini, variance). For the τ-th quantile: RIF(Y; Qτ) = Qτ + (τ − 1(Y ≤ Qτ)) / fY(Qτ). Running OLS on the RIF-transformed outcome gives unconditional quantile effects.
Reduced Form: In instrumental variables, the regression of the outcome directly on the instrument, bypassing the endogenous treatment variable. In the just-identified case, the reduced-form coefficient equals the product of the first-stage effect and the causal effect. A significant reduced form is evidence that the instrument affects the outcome through some channel.
Regression Discontinuity Design (RDD): A quasi-experimental design that exploits a known cutoff in a continuous assignment variable (the running variable) to estimate local causal effects (Thistlethwaite & Campbell, 1960). Identification requires continuity of the conditional expectations of potential outcomes in the running variable at the cutoff (Hahn, Todd & Van der Klaauw, 2001), so units just above and just below the cutoff are comparable in the limit, and the discontinuity in the outcome at the cutoff identifies the local average treatment effect. RDD can be sharp (treatment is deterministic at the cutoff) or fuzzy (the probability of treatment jumps at the cutoff but is not deterministic).
Regression Kink Design: A quasi-experimental design that identifies causal effects from a change in the slope (not level) of the treatment assignment function at a known threshold. The estimand is the ratio of the derivative of the conditional outcome expectation to the derivative of the treatment function at the kink point.
Regression to the Mean: The statistical tendency for extreme observations to be closer to the average on subsequent measurement, even absent any intervention. In management research, this threatens pre-post comparisons: a firm selected for poor performance will tend to improve regardless of treatment. Regression to the mean is often mistaken for a treatment effect in before-after studies without a control group.
Regularization: A technique that adds a penalty term to the objective function to prevent overfitting by shrinking coefficient estimates toward zero. L1 regularization (LASSO) produces sparse models by setting some coefficients exactly to zero; L2 regularization (ridge) shrinks all coefficients but retains all variables. In causal inference, regularization introduces bias in exchange for reduced variance, and Neyman orthogonality is needed to ensure this regularization bias does not contaminate the causal estimate.
Researcher Degrees of Freedom: The many decisions a researcher makes during data analysis — variable definitions, sample restrictions, model specifications, outcome measures — that are not dictated by theory and can be used (consciously or not) to obtain desired results. Pre-registration and specification curve analysis are remedies.
Reverse Audit Study: A field experiment that reverses the direction of a traditional audit study: instead of sending fictitious applications from job seekers to employers, the researcher sends recruitment materials from an employer to prospective job seekers and measures their responses. This design identifies how organizational characteristics (e.g., hierarchy, culture, compensation structure) causally affect the composition of applicant pools, complementing traditional audit studies that identify employer-side discrimination.
Reverse Causality: A threat to causal inference where the presumed effect actually causes the presumed cause. In management research, this commonly arises when studying the relationship between firm practices and performance — for example, whether governance practices improve performance or high-performing firms adopt different governance. Instrumental variables, lagged variables, and quasi-experimental designs are common strategies for addressing reverse causality.
Right Censoring: When the exact event time is not observed because follow-up ended before the event occurred. The observation is censored at the last known survival time. Ignoring censoring (e.g., dropping censored observations or treating censoring time as event time) introduces bias. Non-informative censoring — where the censoring mechanism is independent of the event process — is required for valid inference.
Robust Standard Errors: Standard error estimates that remain consistent under heteroscedasticity (Huber-White HC estimators) or within-cluster correlation (cluster-robust estimators), without requiring correct specification of the error variance structure. Robust standard errors adjust the variance-covariance matrix of the coefficient estimates using the observed residuals. They do not affect point estimates — only the precision of inference.
Robustness Check: A supplementary analysis testing whether the main result holds under alternative specifications, samples, measures, or assumptions. Common robustness checks include alternative dependent variables, different control sets, subsample analyses, and different estimation methods. In causal inference, robustness checks assess sensitivity to modeling choices, while falsification and placebo tests assess the causal identification itself.
Romano-Wolf Correction: A stepwise multiple testing procedure that controls the FWER while accounting for the dependence structure among test statistics via a bootstrap or resampling approach. More powerful than Bonferroni because it exploits the correlation structure of the test statistics rather than treating them as independent.
Root Mean Squared Prediction Error (RMSPE): A measure of how well the synthetic control matches the treated unit in the pre-treatment period. Computed as the square root of the average squared difference between the treated unit's outcome and the synthetic control's outcome across pre-treatment periods. A small pre-treatment RMSPE indicates a good counterfactual fit. In placebo inference, the ratio of post-treatment RMSPE to pre-treatment RMSPE is compared across treated and placebo units to construct p-values.
Rubin Causal Model: The potential outcomes framework for causal inference, formalized by Donald Rubin building on Jerzy Neyman's earlier work (Rubin, 1974; Neyman, 1923). Each unit i has potential outcomes Yᵢ(1) and Yᵢ(0) corresponding to treatment and control; the individual causal effect is Yᵢ(1) − Yᵢ(0). Because only one potential outcome is observed per unit (the fundamental problem of causal inference), causal inference requires assumptions that link observed data to the missing potential outcomes.
Running Variable: The continuous variable in a regression discontinuity design that determines treatment assignment based on whether it falls above or below a known cutoff. Also called the forcing variable or assignment variable. Units cannot precisely manipulate the running variable around the cutoff for RDD to be valid.

28 terms

Sample Selection Bias: Bias that arises when the sample available for analysis is not representative of the population of interest because of a non-random selection process. For example, observing wages only for employed workers means the sample excludes those who chose not to work -- and the decision to work may be correlated with unobserved determinants of wages.
Saturated Model: A regression model that includes a separate parameter for every possible combination of values of the discrete regressors, so that the number of parameters equals the number of cells in the cross-tabulation. Saturated models impose no functional form restrictions and yield fitted values equal to the cell means. OLS on a saturated model recovers the CEF exactly for discrete regressors.
Segmented Regression: A regression model that allows different intercepts and slopes before and after an intervention point. In ITS: Yₜ = β₀ + β₁·time + β₂·intervention + β₃·time_since_intervention + eₜ. β₂ captures the immediate level change; β₃ captures the slope change.
Selection Bias: Bias arising from non-random treatment assignment, where systematic differences between treated and control groups exist before treatment — making simple comparisons misleading.
Selection Equation: In the Heckman model, the first-stage equation that models the binary selection decision (e.g., whether to participate in the labor force). Typically estimated as a probit: P(Sᵢ = 1 | Zᵢ) = Φ(Zᵢ'α). Identification is technically possible through the nonlinearity of the inverse Mills ratio alone (and depends on the joint-normality assumption that produces the closed-form selection correction), but in practice the selection equation should contain at least one variable (an exclusion restriction) that is not in the outcome equation — without one, the second-stage estimates are typically near-collinear and unstable.
Self-Selection: A threat to causal inference where units choose their own treatment status based on factors related to the outcome. In management, firms self-select into strategies, alliances, and markets based on private information about expected returns. Naively comparing treated and untreated firms confounds treatment effects with selection effects. Matching, Heckman selection models, instrumental variables, and regression discontinuity are common remedies.
Sensitivity Analysis: Methods for assessing how robust causal conclusions are to violations of identifying assumptions, particularly unconfoundedness. The Oster (2019) delta framework quantifies how much selection on unobservables relative to observables would be needed to explain away the estimated effect. The Cinelli and Hazlett (2020) partial R-squared approach provides benchmarked bounds on omitted variable bias.
Sequential Exogeneity: The assumption that the error term in each period is uncorrelated with current and past (but not necessarily future) values of the regressors: E[εᵢₜ | Xᵢ₁, ..., Xᵢₜ, αᵢ] = 0. Sequential exogeneity is weaker than strict exogeneity and permits feedback from past outcomes to current regressors, making it appropriate for dynamic panel models. It is the key identifying assumption for GMM estimators such as Arellano-Bond.
Sequential Ignorability: The key assumption of causal mediation analysis: (1) treatment is unconfounded conditional on observed covariates, and (2) the mediator is as-if randomly assigned conditional on treatment and observed confounders. The second part is very strong.
Sharp Null Hypothesis: The hypothesis that the treatment effect is exactly zero for every individual unit, not just on average. Under the sharp null, each unit's potential outcomes are identical regardless of treatment, which enables exact randomization inference by imputing all missing potential outcomes.
Sharp RDD: A regression discontinuity design where treatment is a deterministic function of the running variable crossing a cutoff: D = 1(X ≥ c). All units above the cutoff are treated, all below are untreated. The treatment effect is identified by the discontinuity in the conditional expectation of the outcome at the cutoff.
Shift-Share Instrument (Bartik Instrument): An instrumental variable constructed as a weighted sum of sectoral shocks (shifts), where the weights are pre-determined local exposure shares. Originally used by Bartik (1991) to instrument for local labor demand. Identification can rely on exogeneity of the shares (Goldsmith-Pinkham, Sorkin, and Swift, 2020) or exogeneity of the shocks (Borusyak, Hull, and Jaravel, 2022), with different implications for inference.
Simple Difference in Outcomes (SDO): The raw difference in average outcomes between the treated and untreated groups: SDO = E[Y | D=1] - E[Y | D=0]. The SDO equals the ATE only under random assignment; otherwise it conflates the causal effect with selection bias and (when treatment effects are heterogeneous) a weighted difference between ATT and ATU. The full decomposition (Cunningham, Mixtape) is: SDO = ATE + selection bias + (1−π)(ATT − ATU), where π is the share treated; the last term vanishes under constant treatment effects.
Simultaneity: A form of endogeneity in which two variables are jointly determined — each causes the other in equilibrium. For example, a firm's R&D spending and market share may be simultaneously determined: R&D builds market share, but market share funds R&D. OLS estimates are biased because the regressor is correlated with the error term. Instrumental variables or structural equation models are typical solutions.
Spillovers: When the treatment of one unit affects the outcomes of other units, violating SUTVA. Also called interference or contamination. Spillovers are common in settings with social interactions, geographic proximity, or market-level treatments.
Stable Unit Treatment Value Assumption (SUTVA): The assumption that one unit's treatment does not affect another unit's outcome, and that there is only one version of each treatment level. SUTVA rules out interference (spillovers) between units and hidden variations of the treatment.
Staggered Treatment Adoption: A research setting where different units adopt treatment at different times rather than all at once. Common in management research when firms adopt policies, enter markets, or respond to regulatory changes at varying dates. Standard two-way fixed effects estimators can produce biased estimates under staggered adoption with heterogeneous treatment effects. Modern DiD estimators (Callaway and Sant'Anna, Sun and Abraham, de Chaisemartin and d'Haultfoeuille) are designed for this setting.
Standard Error: The estimated standard deviation of the sampling distribution of an estimator. The standard error measures the precision of an estimate: smaller standard errors indicate more precise estimation. Standard errors are used to construct confidence intervals and test statistics. Their validity depends on correct assumptions about the error structure (homoscedasticity, independence, or appropriate corrections for violations).
Statistical Power: The probability that a statistical test correctly rejects a false null hypothesis. Power = 1 − P(Type II error). Power depends on sample size, effect size, significance level, and residual variance. A study with low power (e.g., below 0.80) is unlikely to detect true effects and, conditional on obtaining significance, is prone to overstating effect magnitudes (the winner's curse).
Stratified Randomization: Randomization performed separately within strata (subgroups) defined by important baseline covariates — such as firm size, industry, or baseline performance. Stratification ensures balance on the stratifying variables by construction and typically improves statistical power by reducing residual variance. The analysis should account for the stratification, either by including stratum fixed effects or by using randomization inference that respects the stratification.
Strict Exogeneity: The assumption E[εᵢₜ | Xᵢ₁, ..., XᵢT, αᵢ] = 0 for all t — that the error in each period is uncorrelated with the regressors in all periods (past, present, and future). Strict exogeneity is required for the consistency of the fixed effects estimator. It rules out feedback effects where past outcomes affect future regressors, which is violated when lagged dependent variables are included (dynamic panel bias).
Sun-Abraham Estimator: An interaction-weighted estimator for staggered DiD that estimates cohort-specific treatment effects by interacting relative-time indicators with cohort indicators. It provides consistent estimates of dynamic treatment effects under treatment effect heterogeneity across cohorts.
Survival Function: The probability of surviving (not experiencing the event) beyond time t: S(t) = P(T > t) = 1 - F(t) where F(t) is the CDF of the event time distribution. S(0) = 1 and S(t) decreases monotonically. Estimated nonparametrically by the Kaplan-Meier estimator.
Survivorship Bias: A selection bias that occurs when analysis is restricted to units that 'survived' some process, omitting those that exited. In strategy research, studying only surviving firms overstates average performance and masks failure patterns. For example, analyzing acquisition performance only among firms that still exist years later ignores acquirers that were driven to bankruptcy. Heckman selection models, inverse probability weighting, or Lee bounds can partially address this.
Switching Equation: An equation that expresses the observed outcome as a function of treatment status and both potential outcomes: Y = D * Y(1) + (1-D) * Y(0). The switching equation makes explicit that the observed outcome 'switches' between potential outcomes depending on treatment status, and is the basis for decomposing observed differences into causal effects and selection bias.
Synthetic Control Method: A method for estimating the causal effect of an intervention on a single treated unit by constructing a weighted combination of untreated units that best reproduces the treated unit's pre-treatment outcome trajectory. The post-treatment gap between the treated unit and its synthetic counterfactual estimates the treatment effect. Introduced by Abadie and Gardeazabal (2003) and formalized by Abadie, Diamond, and Hainmueller (2010). Identification requires no spillovers from the intervention to the donor pool, no anticipation, and good pre-treatment fit (low pre-period RMSPE between the treated unit and its synthetic counterpart).
Synthetic Control Weights: The non-negative weights assigned to donor units that construct the synthetic control — a weighted average of untreated units designed to match the treated unit's pre-treatment characteristics and outcomes. Good pre-treatment fit (low MSPE) strengthens the credibility of the counterfactual.
Synthetic Difference-in-Differences (SDID): A method that combines the strengths of DiD and synthetic control by reweighting both units and time periods to improve pre-treatment fit. SDID relaxes the strict parallel trends assumption of DiD while avoiding the requirement of a single treated unit in standard synthetic control.

5 terms

Table 2 Fallacy: The error of interpreting coefficients on control variables in a multivariate regression as causal effects, when those coefficients were not the target of the identification strategy. In a typical empirical paper, 'Table 2' reports the main regression with control variables included to reduce omitted variable bias in the treatment coefficient. The controls serve this purpose, but their own coefficients may be biased because the controls themselves are not exogenous and no identification strategy was designed for them. Westreich and Greenland (2013) coined the term in the American Journal of Epidemiology.
Treatment Effect Heterogeneity: Variation in causal effects across subgroups or individuals. When treatment effects are heterogeneous, the ATE may mask important differences. Conditional average treatment effects (CATE) can be estimated using methods like causal forests, sorted effects, and subgroup analysis.
Two-Way Fixed Effects (TWFE): A regression specification that includes both unit and time fixed effects, commonly used to implement difference-in-differences designs. With a single common treatment date and parallel trends, TWFE recovers a valid ATT (with or without effect heterogeneity). Under staggered treatment adoption with heterogeneous effects, however, recent econometric work (de Chaisemartin and d'Haultfoeuille, 2020; Goodman-Bacon, 2021; Sun and Abraham, 2021) shows that TWFE can produce severely biased estimates because it implicitly uses already-treated units as controls, generating potentially negative weights on some group-time ATTs.
Type I Error: Rejecting the null hypothesis when it is actually true (a false positive). The significance level α is the maximum tolerable probability of a Type I error. At α = 0.05, a researcher accepts a 5% chance of falsely declaring a significant effect when none exists. Multiple hypothesis testing inflates the overall Type I error rate beyond the nominal α.
Type II Error: Failing to reject the null hypothesis when it is actually false (a false negative). The probability of a Type II error is denoted β; statistical power is 1 − β. Type II errors are more likely in underpowered studies with small samples, small effect sizes, or high residual variance.

4 terms

Unconditional Quantile Effect: The effect of a treatment on a specific quantile of the unconditional (population) outcome distribution. Distinguished from the conditional quantile effect, which conditions on covariates X. Estimated via recentered influence function (RIF) regression (Firpo, Fortin & Lemieux, 2009).
Unconfoundedness: The assumption that treatment assignment is independent of potential outcomes conditional on observed covariates: {Y(0), Y(1)} ⊥ D | X. Also called the conditional independence assumption (CIA) or selection on observables. Combined with positivity/overlap (0 < P(D=1|X) < 1 on the support of interest) it is sometimes labelled strong ignorability. Unconfoundedness is fundamentally untestable because it involves unobserved potential outcomes, but sensitivity analysis methods can assess how robust conclusions are to its violation.
Unobserved Heterogeneity: Systematic differences across units (firms, individuals, teams) that are not captured by observed variables but affect outcomes. In panel data, unobserved heterogeneity — such as managerial ability, organizational culture, or firm-specific capabilities — confounds causal estimates if correlated with treatment. Fixed effects, instrumental variables, and the Mundlak device are standard remedies.
Unobserved Resistance (U_D): In the MTE framework, the unobserved component of the selection equation that determines treatment take-up. An individual selects into treatment when the propensity score P(Z) exceeds their unobserved resistance U_D: D = 1[P(Z) ≥ U_D]. Low U_D means eagerness to participate (low resistance); high U_D means reluctance. U_D is normalized to be uniformly distributed on [0, 1].

5 terms

Wald Estimator: The ratio of the reduced-form effect of the instrument on the outcome to the first-stage effect of the instrument on the treatment: β(IV) = Cov(Z, Y) / Cov(Z, D). For a binary instrument the classic Wald (1940) form is (E[Y|Z=1] - E[Y|Z=0]) / (E[D|Z=1] - E[D|Z=0]). In the just-identified case (one instrument for one endogenous variable), the Wald estimator equals the IV/2SLS estimate. Under the standard IV assumptions with heterogeneous effects, it identifies the local average treatment effect (LATE) for compliers.
Weak Instruments: Instruments that have little predictive power for the endogenous treatment variable in the first stage of 2SLS. Weak instruments cause the IV estimator to be biased toward the OLS estimate, inflate standard errors, and produce unreliable confidence intervals. The conventional diagnostic is the first-stage F-statistic: the classic screening threshold is F > 10 (Staiger and Stock, 1997); Stock and Yogo (2005) provide formal critical values, Lee, McCrary, Moreira, and Porter (2022) show that valid 5% t-test inference in the just-identified case requires F > 104.7, and weak-instrument-robust inference methods (Anderson-Rubin test, tF procedure) do not rely on any single F threshold.
Wild Bootstrap: A bootstrap variant designed for heteroscedastic data that resamples residuals by multiplying them by random weights (e.g., Rademacher or Webb weights) rather than resampling observations. The wild cluster bootstrap is the standard tool for inference with few clusters in DiD and panel settings.
Wild Cluster Bootstrap: A bootstrap procedure that resamples at the cluster level using Rademacher or Webb weights to construct test statistics. Provides reliable inference with few clusters (G < 20-30) where conventional cluster-robust standard errors underestimate true uncertainty. Implemented via boottest (Stata), fwildclusterboot (R), or wildboottest (Python).
Winner's Curse: The phenomenon where statistically significant estimates from underpowered studies tend to overstate the true effect size, because only large (possibly inflated) estimates cross the significance threshold. The winner's curse is a form of selection bias in the distribution of published estimates and contributes to replication failures. Adequately powered studies and pre-registration mitigate this problem.