MethodAtlas
380 References

Bibliography

All papers referenced across Method Atlas, formatted in APA 7th edition.

Spanning 1933–2025 · over nine decades of research
Coverage: 350 with DOI · 6 with replication package
Showing 380 of 380 references

A (48)
  1. Abadie, A., & Gardeazabal, J. (2003). The Economic Costs of Conflict: A Case Study of the Basque Country. American Economic Review, 93(1), 113–132.

    doi.org/10.1257/000282803321455188

    Foundational · on synthetic control
    terrorism · Basque-Country · economic-costs
    Annotation

    Abadie and Gardeazabal introduce the synthetic control idea in the context of estimating the economic costs of terrorism in the Basque Country. They construct a synthetic Basque Country from other Spanish regions and show that terrorism reduced GDP per capita by about 10 percentage points.

  2. Abadie, A., & Imbens, G. W. (2006). Large Sample Properties of Matching Estimators for Average Treatment Effects. Econometrica, 74(1), 235–267.

    doi.org/10.1111/j.1468-0262.2006.00655.x

    Foundational · on matching methods
    nearest-neighbor · large-sample-theory · variance-estimation
    Annotation

    Abadie and Imbens derive the large-sample properties of nearest-neighbor matching estimators, showing that such estimators are not root-N consistent in general and do not attain the semiparametric efficiency bound. Their main practical contribution is a consistent analytical variance estimator that does not require nonparametric estimation of unknown functions. Bootstrap invalidity for matching is established separately in Abadie and Imbens (2008), and the bias-corrected matching estimator is developed in Abadie and Imbens (2011).
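    The mechanics of single-nearest-neighbor matching can be sketched with simulated data (a toy illustration, not the paper's estimator in full generality: one covariate, one match per treated unit, all numbers hypothetical):

```python
import numpy as np

rng = np.random.default_rng(4)
# Hypothetical data: selection into treatment depends on a single covariate x
n = 1000
x = rng.normal(size=n)
d = rng.random(n) < 1 / (1 + np.exp(-x))         # treated units tend to have high x
y = x + 2.0 * d + rng.normal(scale=0.1, size=n)  # true treatment effect = 2
xt, yt = x[d], y[d]
xc, yc = x[~d], y[~d]

# 1-nearest-neighbor matching: pair each treated unit with its closest control on x
idx = np.abs(xt[:, None] - xc[None, :]).argmin(axis=1)
att = (yt - yc[idx]).mean()       # matching estimate of the ATT, close to 2
naive = yt.mean() - yc.mean()     # biased upward: treated units have higher x
```

    With a fixed number of matches, inference for such an estimate should rely on the Abadie–Imbens analytical variance rather than the bootstrap (see entry 3 below).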

  3. Abadie, A., & Imbens, G. W. (2008). On the Failure of the Bootstrap for Matching Estimators. Econometrica, 76(6), 1537–1557.

    doi.org/10.3982/ECTA6474

    Foundational · on matching methods
    bootstrap · matching-inference · variance-estimation
    Annotation

    Abadie and Imbens show that the standard bootstrap is inconsistent for nearest-neighbor matching estimators with a fixed number of matches, even though these estimators are asymptotically normal. Researchers should use the analytical variance estimator from Abadie and Imbens (2006) instead of bootstrapping.

  4. Abadie, A., Diamond, A., & Hainmueller, J. (2010). Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California's Tobacco Control Program. Journal of the American Statistical Association, 105(490), 493–505.

    doi.org/10.1198/jasa.2009.ap08746

    synthetic-control · tobacco-policy · California
    Annotation

    Abadie, Diamond, and Hainmueller formalize and popularize the synthetic control method, which constructs a weighted combination of control units to approximate the counterfactual for a single treated unit. The application to California's Proposition 99 tobacco control program becomes the canonical example of the method.
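    The core weight-finding step can be sketched as a constrained least-squares problem (simulated data, all values hypothetical, not the Proposition 99 application; the real method also matches pre-treatment predictors, not just outcomes):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
# Hypothetical donor pool: 10 control units observed for 40 pre-treatment periods
T0, J = 40, 10
Y0 = rng.normal(size=(T0, J))                        # control-unit outcome paths
true_w = np.array([0.6, 0.4] + [0.0] * 8)
Y1 = Y0 @ true_w + rng.normal(scale=0.01, size=T0)   # treated unit's pre-period path

# Synthetic control weights: nonnegative, sum to one, best pre-treatment fit
res = minimize(lambda w: np.sum((Y1 - Y0 @ w) ** 2),
               np.full(J, 1 / J),
               bounds=[(0, 1)] * J,
               constraints={"type": "eq", "fun": lambda w: w.sum() - 1})
w = res.x
synthetic_path = Y0 @ w   # counterfactual outcome path for the treated unit
```

    The simplex constraint (nonnegative weights summing to one) is what keeps the synthetic unit an interpolation of donors rather than an extrapolation.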

  5. Abadie, A., & Imbens, G. W. (2011). Bias-Corrected Matching Estimators for Average Treatment Effects. Journal of Business & Economic Statistics, 29(1), 1–11.

    doi.org/10.1198/jbes.2009.07333

    Foundational · on matching methods
    bias-correction · nearest-neighbor · regression-adjustment
    Annotation

    Abadie and Imbens develop bias-corrected matching estimators that adjust for the finite-sample bias inherent in nearest-neighbor matching when matching is not exact. Their bias correction uses a regression adjustment within matched pairs and has become a standard recommendation for applied researchers using matching methods.

  6. Abadie, A., Diamond, A., & Hainmueller, J. (2015). Comparative Politics and the Synthetic Control Method. American Journal of Political Science, 59(2), 495–510.

    doi.org/10.1111/ajps.12116

    Application · on synthetic control
    German-reunification · comparative-politics · permutation-test
    Annotation

    Abadie, Diamond, and Hainmueller apply the synthetic control method to estimate the economic impact of German reunification, constructing a synthetic West Germany from OECD countries. They demonstrate the method's applicability to major political events and discuss its use in comparative politics as a bridge between quantitative and qualitative approaches. The application illustrates synthetic control's value for case studies where only one unit is treated.

  7. Abadie, A., Athey, S., Imbens, G. W., & Wooldridge, J. M. (2020). Sampling-Based versus Design-Based Uncertainty in Regression Analysis. Econometrica, 88(1), 265–296.

    doi.org/10.3982/ECTA12675

    Foundational · on OLS regression
    clustering · standard-errors · inference · research-design
    Annotation

    Abadie et al. distinguish between sampling-based uncertainty (from drawing a sample from a population) and design-based uncertainty (from treatment assignment) in regression analysis. They show that conventional standard errors can be conservative when the sample includes a substantial fraction of the population, providing a rigorous framework for understanding what regression standard errors actually measure. This paper clarifies the conceptual foundations for inference in empirical work and complements their separate 2023 QJE paper on clustering.

  8. Abadie, A. (2021). Using Synthetic Controls: Feasibility, Data Requirements, and Methodological Aspects. Journal of Economic Literature, 59(2), 391–425.

    doi.org/10.1257/jel.20191450

    survey · placebo-tests · methodology
    Annotation

    Abadie provides a comprehensive methodological overview of synthetic control, covering data requirements, inference via placebo tests, extensions to multiple treated units, and common pitfalls. This paper is the authoritative practitioner's guide to the method.

  9. Abadie, A., Athey, S., Imbens, G. W., & Wooldridge, J. M. (2023). When Should You Adjust Standard Errors for Clustering? Quarterly Journal of Economics, 138(1), 1–35.

    doi.org/10.1093/qje/qjac038

    clustering · standard-errors · inference · design-based
    Annotation

    Abadie et al. provide guidance on when clustering standard errors is necessary. They show that clustering can be motivated by sampling-based uncertainty (e.g., two-stage sampling of clusters then units) or design-based uncertainty (e.g., treatment assigned at the cluster level), and that whether to cluster, and at what level, is a substantive question tied to the sampling and assignment process — not a purely mechanical rule.
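    The design-based case — treatment assigned at the cluster level — can be illustrated by computing a cluster-robust (CRV1) variance by hand (simulated data, hypothetical numbers):

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical sample: 50 clusters of 20 units; treatment assigned at cluster level
G, n = 50, 20
N = G * n
cluster = np.repeat(np.arange(G), n)
D = np.repeat(rng.integers(0, 2, G), n).astype(float)         # cluster-level treatment
y = 1.0 + 2.0 * D + np.repeat(rng.normal(size=G), n) + rng.normal(size=N)

X = np.column_stack([np.ones(N), D])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ beta

# Sandwich variance with within-cluster score sums (CRV1)
bread = np.linalg.inv(X.T @ X)
scores = [X[cluster == g].T @ e[cluster == g] for g in range(G)]
meat = sum(np.outer(s, s) for s in scores)
c = G / (G - 1) * (N - 1) / (N - X.shape[1])                  # small-sample correction
se_cluster = np.sqrt(np.diag(c * bread @ meat @ bread))
```

    Because both the cluster shock and the treatment vary only at the cluster level, the clustered standard error on `D` is far larger than an iid formula would suggest — the design, not a mechanical rule, motivates the clustering.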

  10. Abowd, J. M., Kramarz, F., & Margolis, D. N. (1999). High Wage Workers and High Wage Firms. Econometrica, 67(2), 251–333.

    doi.org/10.1111/1468-0262.00020

    Application · on fixed effects
    worker-fixed-effects · firm-fixed-effects · wage-decomposition
    Annotation

    Abowd, Kramarz, and Margolis use worker and firm fixed effects jointly to decompose wage variation into worker ability and firm pay premia in this landmark paper. The 'AKM' model has become the standard framework for studying labor market sorting, wage inequality, and the role of firms in wage-setting.

  11. Acemoglu, D., Johnson, S., & Robinson, J. A. (2001). The Colonial Origins of Comparative Development: An Empirical Investigation. American Economic Review, 91(5), 1369–1401.

    doi.org/10.1257/aer.91.5.1369

    institutions · economic-development · colonial-history
    Annotation

    Acemoglu, Johnson, and Robinson use historical settler mortality as an instrument for institutional quality to estimate the causal effect of institutions on economic development in this celebrated paper. It is one of the most influential IV applications in economics and demonstrates the creativity required to find a plausible instrument.

  12. Acharya, A., Blackwell, M., & Sen, M. (2016). Explaining Causal Findings Without Bias: Detecting and Assessing Direct Effects. American Political Science Review, 110(3), 512–529.

    doi.org/10.1017/S0003055416000216

    controlled-direct-effects · sequential-g-estimation · observational-studies · collider-bias
    Annotation

    Acharya, Blackwell, and Sen develop a sequential g-estimation approach for estimating controlled direct effects in observational studies, addressing the problem that conditioning on a post-treatment mediator can introduce collider bias. Their method is particularly useful in political science and social science settings where intermediate confounders make standard mediation analysis unreliable.

  13. Acquisti, A., & Fong, C. M. (2020). An Experiment in Hiring Discrimination via Online Social Networks. Management Science, 66(3), 1005–1024.

    doi.org/10.1287/mnsc.2018.3269

    Application (Mgmt) · on experimental design
    audit-study · discrimination · social-media · hiring · field-experiment
    Annotation

    Acquisti and Fong conduct a correspondence experiment using social media profiles to study hiring discrimination based on religion and sexual orientation. They find no significant national-level discrimination against Muslim or gay candidates, but significant anti-Muslim discrimination emerges in Republican-leaning areas. The paper illustrates how online information creates new channels for employment discrimination that vary with local attitudes.

  14. Adao, R., Kolesar, M., & Morales, E. (2019). Shift-Share Designs: Theory and Inference. Quarterly Journal of Economics, 134(4), 1949–2010.

    doi.org/10.1093/qje/qjz025

    inference · standard-errors · spatial-correlation
    Annotation

    Adao, Kolesar, and Morales show that standard errors in shift-share regressions are too small when computed with conventional clustering because residuals are correlated across regions that share similar industry compositions. They propose an inference procedure that accounts for this dependence.

  15. Aguinis, H., Beaty, J. C., Boik, R. J., & Pierce, C. A. (2005). Effect Size and Power in Assessing Moderating Effects of Categorical Variables Using Multiple Regression: A 30-Year Review. Journal of Applied Psychology, 90(1), 94–107.

    doi.org/10.1037/0021-9010.90.1.94

    Application · on power analysis
    moderation · interaction-effects · applied-psychology
    Annotation

    Aguinis, Beaty, Boik, and Pierce review 30 years of moderator analysis in applied psychology and management, finding that most studies are severely underpowered to detect interaction effects. They provide guidelines for computing power for moderated regression.

  16. Aguinis, H., Gottfredson, R. K., & Culpepper, S. A. (2013). Best-Practice Recommendations for Estimating Cross-Level Interaction Effects Using Multilevel Modeling. Journal of Management, 39(6), 1490–1528.

    doi.org/10.1177/0149206313478188

    Application (Mgmt) · on random effects
    cross-level-interactions · multilevel · best-practices
    Annotation

    Aguinis, Gottfredson, and Culpepper provide detailed guidance for management researchers on estimating cross-level interaction effects in multilevel models. They address common problems including insufficient statistical power, centering decisions, and effect size reporting that frequently lead to unreliable results in organizational research. The paper offers concrete recommendations for sample size, model specification, and interpretation that improve the credibility of multilevel interaction analyses.

  17. Aguinis, H., Edwards, J. R., & Bradley, K. J. (2017). Improving Our Understanding of Moderation and Mediation in Strategic Management Research. Organizational Research Methods, 20(4), 665–685.

    doi.org/10.1177/1094428115627498

    Application (Mgmt) · on causal mediation analysis
    management-methodology · moderation · best-practices
    Annotation

    Aguinis, Edwards, and Bradley review how mediation and moderation analyses are conducted in strategic management research and identify common errors. They provide recommendations for improving practice, including using causal mediation frameworks and proper inference procedures.

  18. Aguinis, H., Ramani, R. S., & Alabduljader, N. (2018). What You See Is What You Get? Enhancing Methodological Transparency in Management Research. Academy of Management Annals, 12(1), 83–110.

    doi.org/10.5465/annals.2016.0011

    Application (Mgmt) · on pre-registration
    management-methodology · transparency · open-science
    Annotation

    Aguinis, Ramani, and Alabduljader review methodological transparency in management research and advocate for pre-registration, open data, and open materials. They document the extent of undisclosed analytical flexibility in management studies and propose concrete steps for improvement.

  19. Ahuja, G. (2000). Collaboration Networks, Structural Holes, and Innovation: A Longitudinal Study. Administrative Science Quarterly, 45(3), 425–455.

    doi.org/10.2307/2667105

    Application (Mgmt) · on Poisson/negative binomial
    networks · structural-holes · innovation · patents · negative-binomial
    Annotation

    Ahuja uses a random effects Poisson model (following Hausman, Hall, and Griliches 1984) to model patent counts as a function of collaboration network structure in this landmark network study. He finds that direct ties and indirect ties both increase innovation, while structural holes (gaps between partners) decrease it — challenging Burt's structural holes theory in the context of innovation. The paper demonstrates the use of count models with panel data in management research, with fixed effects Poisson estimated as a robustness check.

  20. Ai, C., & Norton, E. C. (2003). Interaction Terms in Logit and Probit Models. Economics Letters, 80(1), 123–129.

    doi.org/10.1016/S0165-1765(03)00032-6

    Foundational · on logit/probit
    interaction-effects · marginal-effects · nonlinear-models
    Annotation

    Ai and Norton show that the interpretation of interaction terms in nonlinear models like logit and probit is much more complicated than in linear models. The marginal effect of an interaction is not simply the coefficient on the interaction term, a mistake that is widespread in applied research.
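    A small numeric check makes the point (hypothetical coefficients: the interaction effect is the cross-partial of the predicted probability, which differs from the interaction coefficient and varies with the covariates):

```python
import numpy as np

# Hypothetical logit model: P(y=1) = logistic(b0 + b1*x1 + b2*x2 + b12*x1*x2)
b0, b1, b2, b12 = -1.0, 0.5, 0.5, 0.25
prob = lambda x1, x2: 1 / (1 + np.exp(-(b0 + b1 * x1 + b2 * x2 + b12 * x1 * x2)))

# Interaction effect = cross-partial of the probability, via finite differences
x1, x2, h = 1.0, 1.0, 1e-4
interaction_effect = (prob(x1 + h, x2 + h) - prob(x1 + h, x2 - h)
                      - prob(x1 - h, x2 + h) + prob(x1 - h, x2 - h)) / (4 * h * h)
# At this point interaction_effect is about 0.044, far from b12 = 0.25,
# and its value (even its sign, in general) depends on where it is evaluated
```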

  21. Akerlof, G. A. (1982). Labor Contracts as Partial Gift Exchange. Quarterly Journal of Economics, 97(4), 543–569.

    doi.org/10.2307/1885099

    Annotation

    Akerlof proposes the gift exchange model of labor markets, in which firms pay above-market wages and workers reciprocate with above-minimum effort. This framework provides a behavioral foundation for efficiency wages and has been tested extensively in laboratory and field experiments.

  22. Albouy, D. Y. (2012). The Colonial Origins of Comparative Development: An Empirical Investigation: Comment. American Economic Review, 102(6), 3059–3076.

    doi.org/10.1257/aer.102.6.3059

    instrument-validity · replication · colonial-origins · sensitivity
    Annotation

    Albouy critically re-examines the settler mortality instrument used in Acemoglu et al. (2001), showing that the original results are sensitive to data coding decisions and the sample of countries included. This comment is a cautionary tale about instrument validity and the fragility of influential IV estimates.

  23. Allison, P. D. (1999). Comparing Logit and Probit Coefficients Across Groups. Sociological Methods & Research, 28(2), 186–208.

    doi.org/10.1177/0049124199028002003

    Foundational · on logit/probit
    logit · probit · group-comparisons · coefficient-scaling
    Annotation

    Allison shows that naive comparisons of logit or probit coefficients across groups are misleading because differences in residual variation across groups rescale the coefficients. He proposes a method to adjust for this confound, which is essential for interpreting interaction effects and group comparisons in nonlinear models.

  24. Allison, P. D. (2009). Fixed Effects Regression Models. SAGE Publications.

    doi.org/10.4135/9781412993869

    fixed-vs-random · panel-data · textbook · practical-guidance
    Annotation

    Allison's concise and accessible monograph compares fixed effects and random effects models for panel data, providing practical guidance on model selection, estimation, and interpretation. It is particularly useful for social scientists seeking an intuitive understanding of when each approach is appropriate.

  25. Altonji, J. G., Elder, T. E., & Taber, C. R. (2005). Selection on Observed and Unobserved Variables: Assessing the Effectiveness of Catholic Schools. Journal of Political Economy, 113(1), 151–184.

    doi.org/10.1086/426036

    Foundational · on sensitivity analysis
    selection-on-observables · Catholic-schools · bounding
    Annotation

    Altonji, Elder, and Taber develop the idea that if selection on observables is informative about selection on unobservables, one can bound the bias from omitted variables. Their approach becomes the basis for the widely used Oster (2019) sensitivity framework.

  26. Amemiya, T. (1981). Qualitative Response Models: A Survey. Journal of Economic Literature, 19(4), 1483–1536.

    Foundational · on logit/probit
    survey · qualitative-response · maximum-likelihood
    Annotation

    Amemiya provides a comprehensive survey of qualitative response models including logit, probit, and tobit. This survey organizes the theoretical properties, estimation methods, and specification tests for binary and multinomial choice models and becomes a standard reference for applied researchers.

  27. Anderson, M. L. (2008). Multiple Inference and Gender Differences in the Effects of Early Intervention: A Reevaluation of the Abecedarian, Perry Preschool, and Early Training Projects. Journal of the American Statistical Association, 103(484), 1481–1495.

    doi.org/10.1198/016214508000000841

    Foundational · on multiple testing
    index-tests · Westfall-Young · program-evaluation
    Annotation

    Anderson proposes using summary index tests and familywise error rate corrections to address multiple inference in program evaluation. Reanalyzing the Abecedarian, Perry Preschool, and Early Training Projects, he finds that girls garner substantial short- and long-term benefits from early interventions, but there are no significant long-term benefits for boys after correcting for multiple testing.
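    A minimal version of the summary-index idea (equal weights for simplicity; Anderson's estimator instead weights components by the inverse of their covariance matrix — data here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical data: 5 related outcomes for 200 subjects, sharing a common factor
n, K = 200, 5
outcomes = rng.normal(size=(n, K)) + 0.3 * rng.normal(size=(n, 1))

# Standardize each outcome, average, then re-standardize:
# the analyst runs one test on the index instead of K separate tests
z = (outcomes - outcomes.mean(axis=0)) / outcomes.std(axis=0)
index = z.mean(axis=1)
index = (index - index.mean()) / index.std()
```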

  28. Andrews, I., Stock, J. H., & Sun, L. (2019). Weak Instruments in Instrumental Variables Regression: Theory and Practice. Annual Review of Economics, 11, 727–753.

    doi.org/10.1146/annurev-economics-080218-025643

    weak-instruments · survey · robust-inference
    Annotation

    Andrews, Stock, and Sun provide an up-to-date review of the weak instruments problem, covering modern diagnostic tests, robust inference procedures, and practical recommendations. It is an excellent starting point for understanding the current best practices in IV estimation.

  29. Angrist, J. D. (1990). Lifetime Earnings and the Vietnam Era Draft Lottery: Evidence from Social Security Administrative Records. American Economic Review, 80(3), 313–336.

    Foundational · on instrumental variables
    instrumental-variables · natural-experiment · draft-lottery · LATE
    Annotation

    Angrist uses the Vietnam-era draft lottery as a natural experiment in this landmark application of instrumental variables. He shows that randomly assigned lottery numbers provide an instrument for military service, allowing causal estimation of the earnings effect of military service.

  30. Angrist, J. D., & Krueger, A. B. (1991). Does Compulsory School Attendance Affect Schooling and Earnings? Quarterly Journal of Economics, 106(4), 979–1014.

    doi.org/10.2307/2937954

    Foundational · on instrumental variables
    returns-to-education · quarter-of-birth · compulsory-schooling
    Annotation

    Angrist and Krueger use quarter of birth as an instrument for years of schooling, exploiting the fact that compulsory schooling laws interact with birth timing. This paper is one of the most-taught examples of instrumental variables in economics and also sparks important debates about weak instruments.

  31. Angrist, J. D., Imbens, G. W., & Rubin, D. B. (1996). Identification of Causal Effects Using Instrumental Variables. Journal of the American Statistical Association, 91(434), 444–455.

    doi.org/10.1080/01621459.1996.10476902

    LATE · compliers · instrumental-variables · potential-outcomes
    Annotation

    Angrist, Imbens, and Rubin formalize the LATE framework — originally introduced in Imbens and Angrist (1994) — within the Rubin Causal Model, providing a detailed treatment of the assumptions required for causal interpretation of IV estimates. This paper introduces the complier taxonomy (always-takers, never-takers, compliers, defiers) that is now standard in the IV literature. The practical implication is that IV estimates should be interpreted as local to the complier subpopulation, not as average effects for the entire population.
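    The complier logic shows up directly in a simulated Wald estimate (all numbers hypothetical: a randomized binary instrument, 60% compliers, the rest never-takers):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000
z = rng.integers(0, 2, n)                # randomized binary instrument (e.g., lottery)
complier = rng.random(n) < 0.6           # compliers take treatment iff encouraged
d = (z == 1) & complier                  # take-up; never-takers ignore z
y = 1.0 + 2.0 * d + rng.normal(size=n)   # effect of 2 for those actually treated

# Wald / IV estimate: reduced-form contrast divided by first-stage contrast
late = (y[z == 1].mean() - y[z == 0].mean()) / (d[z == 1].mean() - d[z == 0].mean())
# late recovers roughly 2 — the average effect for compliers, not for everyone
```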

  32. Angrist, J. D., & Lavy, V. (1999). Using Maimonides' Rule to Estimate the Effect of Class Size on Scholastic Achievement. Quarterly Journal of Economics, 114(2), 533–575.

    doi.org/10.1162/003355399556061

    class-size · education · Maimonides-rule
    Annotation

    Angrist and Lavy exploit a rule that caps class sizes at 40 students, creating discontinuities in class size as enrollment crosses multiples of 40. The imperfect compliance with the rule makes this a fuzzy RDD. This paper is one of the most widely taught examples of the fuzzy RDD approach.

  33. Angrist, J. D., & Krueger, A. B. (2001). Instrumental Variables and the Search for Identification: From Supply and Demand to Natural Experiments. Journal of Economic Perspectives, 15(4), 69–85.

    doi.org/10.1257/jep.15.4.69

    history-of-IV · natural-experiments · supply-and-demand · identification
    Annotation

    Angrist and Krueger trace the evolution of IV from its origins in supply-and-demand estimation to modern natural experiments in this historical survey. They provide valuable context for understanding how IV methodology developed and why it becomes central to applied economics.

  34. Angrist, J. D., Chernozhukov, V., & Fernandez-Val, I. (2006). Quantile Regression under Misspecification, with an Application to the U.S. Wage Structure. Econometrica, 74(2), 539–563.

    doi.org/10.1111/j.1468-0262.2006.00671.x

    application · wage-structure · returns-to-education
    Annotation

    Angrist, Chernozhukov, and Fernandez-Val study quantile regression under misspecification, showing that QR coefficients minimize a weighted mean-squared specification-error loss and deriving an omitted-variable-bias formula for quantile regression. Applying this framework to U.S. Census wage data, they document continued residual inequality growth in the 1990s, primarily in the upper half of the distribution.

  35. Angrist, J., Bettinger, E., & Kremer, M. (2006). Long-Term Educational Consequences of Secondary School Vouchers: Evidence from Administrative Records in Colombia. American Economic Review, 96(3), 847–862.

    doi.org/10.1257/aer.96.3.847

    Application · on Lee bounds
    school-vouchers · attrition · Colombia
    Annotation

    Angrist, Bettinger, and Kremer use administrative records to study the long-term effects of Colombia's PACES school voucher lottery, finding that vouchers increase secondary school completion rates by 15-20% and raise college admissions test scores by 0.2 standard deviations. They correct for differential test-taking rates between lottery winners and losers using bounding methods. The paper demonstrates how administrative data and lottery-based instruments enable credible long-term policy evaluation.

  36. Angrist, J. D., & Pischke, J.-S. (2009). Mostly Harmless Econometrics: An Empiricist's Companion. Princeton University Press.

    doi.org/10.1515/9781400829828

    textbook · causal-inference · design-based · credibility-revolution
    Annotation

    Angrist and Pischke write one of the most influential modern textbooks on applied econometrics, organizing the field around a design-based approach to causal inference. The book provides essential treatments of instrumental variables, difference-in-differences, and regression discontinuity, each grounded in the potential outcomes framework. It remains the standard reference for graduate students learning to evaluate and implement identification strategies.

  37. Angrist, J. D., & Pischke, J.-S. (2010). The Credibility Revolution in Empirical Economics: How Better Research Design Is Taking the Con out of Econometrics. Journal of Economic Perspectives, 24(2), 3–30.

    doi.org/10.1257/jep.24.2.3

    credibility-revolution · research-design · causal-inference · methodology
    Annotation

    Angrist and Pischke provide the intellectual context for why applied economics moved from 'throw variables into OLS and see what sticks' to design-based causal inference. They help researchers understand where OLS fits in the larger methodological landscape and why credible identification strategies matter.

  38. Arkhangelsky, D., Athey, S., Hirshberg, D. A., Imbens, G. W., & Wager, S. (2021). Synthetic Difference-in-Differences. American Economic Review, 111(12), 4088–4118.

    doi.org/10.1257/aer.20190159

    synthetic-DID · unit-weights · time-weights
    Annotation

    Arkhangelsky et al. introduce the synthetic difference-in-differences estimator, which combines the strengths of DID (parallel trends assumption) and synthetic control (re-weighting to improve pre-treatment fit). The method uses both unit weights and time weights to construct a more credible counterfactual, and provides valid inference without requiring a large donor pool.

  39. Arkhangelsky, D., & Imbens, G. W. (2022). Doubly Robust Identification for Causal Panel Data Models. Econometrics Journal, 25(3), 649–674.

    doi.org/10.1093/ectj/utac019

    doubly-robust · causal-panel-data · SDID-extension
    Annotation

    Arkhangelsky and Imbens develop doubly robust identification strategies for causal panel data models, combining outcome modeling with re-weighting to provide consistent estimates if either the outcome model or the weighting scheme is correctly specified. The framework is broader than synthetic DID specifically but directly relevant to it, strengthening the theoretical foundations for panel-data treatment effect estimation.

  40. Ashenfelter, O. (1978). Estimating the Effect of Training Programs on Earnings. Review of Economics and Statistics, 60(1), 47–57.

    doi.org/10.2307/1924332

    training-programs · earnings · early-DID
    Annotation

    Ashenfelter provides one of the earliest applications of the difference-in-differences logic, comparing the earnings of trainees before and after a job training program to a comparison group. The key insight is that differencing removes time-invariant unobserved differences between treatment and control groups. This paper also documents the 'Ashenfelter dip' — the pre-program earnings decline among trainees — which becomes a canonical example of why parallel trends cannot be taken for granted.
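    The 2×2 differencing logic reduces to a few lines (earnings figures hypothetical):

```python
# Hypothetical group-period mean earnings (thousands of dollars)
earnings = {
    ("trainees", "pre"): 8.0,   ("trainees", "post"): 10.5,
    ("comparison", "pre"): 9.0, ("comparison", "post"): 10.0,
}

# Each group's change over time; differencing the changes removes
# any fixed (time-invariant) gap between the groups
change_treated = earnings[("trainees", "post")] - earnings[("trainees", "pre")]
change_control = earnings[("comparison", "post")] - earnings[("comparison", "pre")]
did = change_treated - change_control   # = 1.5
```

    The Ashenfelter dip is exactly the case where this logic fails: if trainee earnings dip just before the program, the pre-period level is not a stable baseline and the differenced estimate is contaminated.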

  41. Athey, S., & Imbens, G. W. (2016). Recursive Partitioning for Heterogeneous Causal Effects. Proceedings of the National Academy of Sciences, 113(27), 7353–7360.

    doi.org/10.1073/pnas.1510489113

    Foundational · on causal forests
    causal-trees · honest-estimation · heterogeneous-effects
    Annotation

    Athey and Imbens introduce causal trees, adapting the CART algorithm to estimate heterogeneous treatment effects with valid inference. They propose the honest estimation approach, where one subsample is used for tree construction and another for estimation, ensuring valid confidence intervals.

  42. Athey, S., & Imbens, G. W. (2017). The Econometrics of Randomized Experiments. Handbook of Economic Field Experiments, 1, 73–140.

    doi.org/10.1016/bs.hefe.2016.10.003

    field-experiments · randomization-inference · design
    Annotation

    Athey and Imbens provide a modern, rigorous treatment of the econometrics behind randomized experiments. They cover design, analysis, and inference issues such as stratification, clustering, and multiple hypothesis testing. It is an excellent reference for researchers running field experiments.

  43. Athey, S., & Imbens, G. W. (2019). Machine Learning Methods That Economists Should Know About. Annual Review of Economics, 11, 685–725.

    doi.org/10.1146/annurev-economics-080217-053433

    survey · machine-learning · economics
    Annotation

    Athey and Imbens provide a broad survey of machine learning methods relevant to economists, covering supervised learning, unsupervised learning, matrix completion, and methods at the intersection of ML and causal inference including DML and causal forests. The paper explains when and why machine learning methods can improve both prediction and causal inference in economics. It serves as an accessible entry point for applied researchers seeking to understand the full landscape of ML tools available for economic applications.

  44. Athey, S., Tibshirani, J., & Wager, S. (2019). Generalized Random Forests. Annals of Statistics, 47(2), 1148–1178.

    doi.org/10.1214/18-AOS1709

    generalized-random-forests · estimating-equations · grf-package
    Annotation

    Athey, Tibshirani, and Wager introduce the generalized random forest (GRF) framework, which extends causal forests to a broad class of estimating equations including quantile regression, IV, and local average treatment effects. GRF provides the theoretical foundation and the widely used grf R package.

  45. Autor, D. H. (2003). Outsourcing at Will: The Contribution of Unjust Dismissal Doctrine to the Growth of Employment Outsourcing. Journal of Labor Economics, 21(1), 1–42.

    doi.org/10.1086/344122

    employment-law · outsourcing · staggered-adoption
    Annotation

    Autor uses a DID design that exploits the staggered adoption of wrongful-discharge protections across U.S. states. He finds that stronger employment protections led firms to outsource more jobs. This paper is a model for using staggered state-level policy changes in a DID framework.

  46. Autor, D. H., Dorn, D., & Hanson, G. H. (2013). The China Syndrome: Local Labor Market Effects of Import Competition in the United States. American Economic Review, 103(6), 2121–2168.

    doi.org/10.1257/aer.103.6.2121

    China-shock · trade · labor-markets
    Annotation

    Autor, Dorn, and Hanson use a shift-share instrument to study how Chinese import competition affected U.S. local labor markets, instrumenting U.S. import exposure with Chinese exports to other high-income countries. This paper is one of the most influential and widely discussed shift-share applications.

  47. Azoulay, P., Graff Zivin, J. S., & Wang, J. (2010). Superstar Extinction. Quarterly Journal of Economics, 125(2), 549–589.

    doi.org/10.1162/qjec.2010.125.2.549

    Application · on matching methods
    superstar-scientists · collaboration · innovation · science-of-science
    Annotation

    Azoulay and coauthors exploit the premature and unexpected deaths of 112 academic superstars as a natural experiment, using coarsened exact matching to construct a control group of comparable collaborators. They find that the death of a superstar leads to a lasting 5-8% decline in the quality-adjusted publication rates of their collaborators, with spillovers circumscribed in idea space but less so in physical or social space. This study is an elegant application of a natural experiment combined with matching in the economics of science and innovation.

  48. Azoulay, P., Stuart, T., & Wang, Y. (2014). Matthew: Effect or Fable? Management Science, 60(1), 92–109.

    doi.org/10.1287/mnsc.2013.1755

    Application (Mgmt) · on matching methods
    matching · coarsened-exact-matching · Matthew-effect · cumulative-advantage · science-of-science
    Annotation

    Azoulay, Stuart, and Wang investigate whether mid-career recognition (Howard Hughes Medical Institute appointment) creates a cumulative advantage or 'Matthew effect' in science. They use coarsened exact matching to construct a comparison group of equally productive scientists, addressing the selection problem inherent in studying prestigious awards. The study finds a small, short-lived citation boost to papers published before HHMI appointment, suggesting a status or halo effect on pre-existing work rather than a sustained productivity advantage.

B
36
  1. Bach, P., Chernozhukov, V., Kurz, M. S., & Spindler, M. (2022). DoubleML – An Object-Oriented Implementation of Double Machine Learning in Python. Journal of Machine Learning Research, 23(53), 1–6.

    softwarePythonRimplementation
    Annotation

    Bach and colleagues develop the DoubleML Python package, providing a user-friendly object-oriented implementation of the DML framework. The package supports partially linear, interactive, and instrumental variable models with a variety of machine learning methods for nuisance estimation. A companion R package is described separately.

  2. Baker, A. C., Larcker, D. F., & Wang, C. C. Y. (2022). How Much Should We Trust Staggered Difference-in-Differences Estimates? Journal of Financial Economics, 144(2), 370–395.

    doi.org/10.1016/j.jfineco.2022.01.004

    financereplicationTWFE-bias
    Annotation

    Baker, Larcker, and Wang demonstrate that the staggered DID problems identified in the econometrics literature are empirically relevant in finance research. They re-analyze prominent finance studies and show that results can change substantially once robust estimators are used.

  3. Baltagi, B. H. (2021). Econometric Analysis of Panel Data (6th ed.). Springer.

    doi.org/10.1007/978-3-030-53953-5

    textbookpanel-dataerror-componentsdynamic-panels
    Annotation

    Baltagi provides the standard graduate-level textbook on panel data econometrics, covering fixed effects, random effects, error component models, and extensions to unbalanced panels and dynamic models. The book offers comprehensive treatment of both the theoretical foundations of panel data estimators and their practical implementation across statistical software. It is the primary reference for researchers who need to understand the assumptions, properties, and trade-offs of different panel data methods.

  4. Bandiera, O., Barankay, I., & Rasul, I. (2005). Social Preferences and the Response to Incentives: Evidence from Personnel Data. Quarterly Journal of Economics, 120(3), 917–962.

    doi.org/10.1093/qje/120.3.917

    Applicationon experimental design
    incentivesfield-experimentpersonnel-economics
    Annotation

    Bandiera, Barankay, and Rasul use a field experiment in a fruit-picking firm to study how switching from relative to piece-rate pay affects productivity. They demonstrate that social preferences among workers matter for incentive design, bridging experimental economics and management.

  5. Banerjee, A., Duflo, E., Goldberg, N., Karlan, D., Osei, R., Pariente, W., Shapiro, J., Thuysbaert, B., & Udry, C. (2015). A Multifaceted Program Causes Lasting Progress for the Very Poor: Evidence from Six Countries. Science, 348(6236), 1260799.

    doi.org/10.1126/science.1260799

    Applicationon experimental design
    development-economicsmulti-country-RCTpoverty
    Annotation

    Banerjee, Duflo, and colleagues conduct a large-scale RCT across six countries, demonstrating that a multifaceted anti-poverty program produces sustained economic gains for the ultra-poor. The study is notable for its multi-site design, which provides rare multi-country evidence on how the same intervention performs across diverse contexts. It demonstrates both the power of randomized evaluation at scale and the importance of bundled interventions when individual components may be insufficient.

  6. Bang, H., & Robins, J. M. (2005). Doubly Robust Estimation in Missing Data and Causal Inference Models. Biometrics, 61(4), 962–973.

    doi.org/10.1111/j.1541-0420.2005.00377.x

    double-robustnesssimulationtutorial
    Annotation

    Bang and Robins study doubly robust estimators for missing data and causal inference models, showing that the estimator remains consistent if either the outcome regression or the propensity (missingness) model is correctly specified. Their simulations clarify when the double robustness property provides meaningful protection, helping make the method accessible to applied researchers.

  7. Baron, R. M., & Kenny, D. A. (1986). The Moderator-Mediator Variable Distinction in Social Psychological Research: Conceptual, Strategic, and Statistical Considerations. Journal of Personality and Social Psychology, 51(6), 1173–1182.

    doi.org/10.1037/0022-3514.51.6.1173

    mediationmoderator-mediatorsocial-psychology
    Annotation

    Baron and Kenny introduce the widely used four-step approach to testing mediation, comparing total, direct, and indirect effects using sequential regressions. While later work has identified limitations of this approach, it remains one of the most cited papers in all of social science.

  8. Barone-Adesi, F., Gasparrini, A., Vizzini, L., Merletti, F., & Richiardi, L. (2011). Effects of Italian Smoking Regulation on Rates of Hospital Admission for Acute Coronary Events: A Country-Wide Study. PLoS ONE, 6(3), e17419.

    doi.org/10.1371/journal.pone.0017419

    Applicationon interrupted time series
    Annotation

    Barone-Adesi et al. use an interrupted time series design to estimate the effect of Italy's 2005 smoking ban on acute coronary event admissions, finding a significant reduction among those under 70 in the months following implementation.

  9. Bartik, T. J. (1991). Who Benefits from State and Local Economic Development Policies? W.E. Upjohn Institute for Employment Research.

    doi.org/10.17848/9780585223940

    Bartik-instrumentlocal-labor-marketsemployment
    Annotation

    Bartik introduces the shift-share instrument—constructing predicted local employment growth from national industry growth rates interacted with initial local industry composition. This 'Bartik instrument' has become one of the most widely used instruments in labor and urban economics.
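
    The construction lends itself to a compact sketch. The following example is illustrative only, with made-up shares and growth rates (not drawn from Bartik's data): a location's predicted employment growth is its initial industry shares weighted by national industry growth rates.

```python
import numpy as np

# Illustrative shift-share (Bartik) construction with invented numbers:
# L locations, K industries; s[l, k] is location l's initial employment
# share in industry k (each row sums to 1), g[k] is the national growth
# rate of industry k. Leave-one-out refinements are omitted for brevity.
rng = np.random.default_rng(0)
L, K = 5, 3
s = rng.dirichlet(np.ones(K), size=L)  # initial local industry shares
g = np.array([0.02, -0.01, 0.04])      # national industry growth rates

# Predicted local employment growth: z_l = sum_k s[l, k] * g[k]
z = s @ g
```

    Because each row of `s` sums to one, every predicted growth rate is a weighted average of the national rates; the instrument varies across locations only through their initial industry mix.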

  10. Battistin, E., & Rettore, E. (2008). Ineligibles and Eligible Non-Participants as a Double Comparison Group in Regression-Discontinuity Designs. Journal of Econometrics, 142(2), 715–730.

    doi.org/10.1016/j.jeconom.2007.05.006

    imperfect-compliancedouble-comparisonboundsfuzzy-RDD
    Annotation

    Battistin and Rettore propose using ineligible units and eligible non-participants as a double comparison group in regression-discontinuity designs. This specification-testing strategy allows researchers to assess the validity of RDD assumptions by checking whether the two comparison groups yield consistent estimates, strengthening the credibility of RDD-based inference.

  11. Bell, A., & Jones, K. (2015). Explaining Fixed Effects: Random Effects Modeling of Time-Series Cross-Sectional and Panel Data. Political Science Research and Methods, 3(1), 133–153.

    doi.org/10.1017/psrm.2014.7

    Foundationalon random effects
    within-betweenpanel-datamodel-choice
    Annotation

    Bell and Jones argue that the 'within-between' random-effects model (closely related to the Mundlak approach) can outperform pure fixed effects in certain settings because it allows explicit decomposition of within- and between-unit effects while accounting for unobserved heterogeneity. This approach retains the unbiasedness of the within estimator for time-varying regressors while also estimating between-unit effects that fixed effects discard. The paper provides practical guidance for researchers who need to estimate both types of effects or who have time-invariant regressors that fixed effects cannot identify.

  12. Belloni, A., Chernozhukov, V., & Hansen, C. (2014). Inference on Treatment Effects after Selection among High-Dimensional Controls. Review of Economic Studies, 81(2), 608–650.

    doi.org/10.1093/restud/rdt044

    LASSOpost-double-selectionhigh-dimensional
    Annotation

    Belloni, Chernozhukov, and Hansen introduce the post-double-selection LASSO method for inference on treatment effects with many potential controls. This paper is a key precursor to DML, demonstrating how regularized selection in both the treatment and outcome equations can yield valid inference.

  13. Ben-Michael, E., Feller, A., & Rothstein, J. (2021). The Augmented Synthetic Control Method. Journal of the American Statistical Association, 116(536), 1789–1803.

    doi.org/10.1080/01621459.2021.1929245

    augmented-SCMbias-reductiondoubly-robust
    Annotation

    Ben-Michael, Feller, and Rothstein propose augmenting the synthetic control estimator with an outcome model to reduce bias when the synthetic control does not achieve perfect pre-treatment fit. The resulting doubly robust estimator is consistent if either the outcome model or the weighting is correct, providing a practical improvement for applied synthetic control studies.

  14. Ben-Michael, E., Feller, A., & Rothstein, J. (2022). Synthetic Controls with Staggered Adoption. Journal of the Royal Statistical Society: Series B, 84(2), 351–381.

    doi.org/10.1111/rssb.12448

    staggered-adoptioncollective-bargainingeducation-policy
    Annotation

    Ben-Michael, Feller, and Rothstein extend synthetic control and synthetic DID methods to staggered adoption settings where multiple units adopt treatment at different times. They demonstrate the approach by estimating the effects of teacher collective bargaining laws on school spending across U.S. states, showing how synthetic DID-style reweighting improves counterfactual estimation when treatment rolls out over time.

  15. Benjamini, Y., & Hochberg, Y. (1995). Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society: Series B, 57(1), 289–300.

    doi.org/10.1111/j.2517-6161.1995.tb02031.x

    Foundationalon multiple testing
    FDRstep-up-procedurefalse-discovery-rate
    Annotation

    Benjamini and Hochberg introduce the false discovery rate (FDR) as an alternative to family-wise error rate control. Their step-up procedure for controlling FDR is less conservative than Bonferroni while still providing meaningful protection against false positives, and has become the standard in many fields.
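
    The step-up rule is straightforward to implement. A minimal sketch (illustrative code, not from the paper): sort the p-values, find the largest k with p_(k) <= (k/m)·q, and reject the k smallest.

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Return a boolean mask of rejected hypotheses under BH FDR control.

    Step-up rule: with m p-values sorted in ascending order, find the
    largest k such that p_(k) <= (k / m) * q, then reject the k
    hypotheses with the smallest p-values.
    """
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = q * (np.arange(1, m + 1) / m)
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])   # largest index satisfying the rule
        reject[order[: k + 1]] = True
    return reject
```

    On the p-values (0.01, 0.02, 0.03, 0.04, 0.05) with q = 0.05, the step-up procedure rejects all five hypotheses, whereas the Bonferroni threshold of 0.05/5 = 0.01 rejects only the first.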

  16. Bennedsen, M., Nielsen, K. M., Pérez-González, F., & Wolfenzon, D. (2007). Inside the Family Firm: The Role of Families in Succession Decisions and Performance. Quarterly Journal of Economics, 122(2), 647–691.

    doi.org/10.1162/qjec.122.2.647

    corporate-governanceCEO-successionnatural-experimentfamily-firms
    Annotation

    Bennedsen et al. use the gender of the controlling family's firstborn child as an instrument for whether the successor CEO is a family member or a professional outsider. They find that family successions cause a large negative impact on firm performance, with operating profitability falling by at least four percentage points. The paper demonstrates how a creative natural experiment can address endogeneity in corporate governance research.

  17. Bertrand, M., & Schoar, A. (2003). Managing with Style: The Effect of Managers on Firm Policies. Quarterly Journal of Economics, 118(4), 1169–1208.

    doi.org/10.1162/003355303322552775

    Applicationon fixed effects
    manager-fixed-effectsCEO-stylecorporate-policy
    Annotation

    Bertrand and Schoar use manager fixed effects (tracking CEOs who moved between firms) to show that individual managerial 'style' explains a significant portion of the variation in corporate investment, financial, and organizational practices. This paper is a key reference linking fixed effects methods to management questions.

  18. Bertrand, M., Duflo, E., & Mullainathan, S. (2004). How Much Should We Trust Differences-in-Differences Estimates? Quarterly Journal of Economics, 119(1), 249–275.

    doi.org/10.1162/003355304772839588

    serial-correlationclustered-standard-errorsinference
    Annotation

    Bertrand, Duflo, and Mullainathan show that standard errors in DID studies are often far too small because they ignore serial correlation within units over time. They propose clustering standard errors at the group level as a simple fix, which is now widely recommended practice in DID applications.

  19. Bertrand, M., & Mullainathan, S. (2004). Are Emily and Greg More Employable Than Lakisha and Jamal? A Field Experiment on Labor Market Discrimination. American Economic Review, 94(4), 991–1013.

    doi.org/10.1257/0002828042002561

    Applicationon experimental design
    audit-studydiscriminationlabor-marketfield-experiment
    Annotation

    Bertrand and Mullainathan send fictitious resumes with randomly assigned names to employers and find that 'white-sounding' names receive 50% more callbacks in this famous audit study. It is one of the most widely cited field experiments in social science and a powerful example of how randomization can identify discrimination.

  20. Bitler, M. P., Gelbach, J. B., & Hoynes, H. W. (2006). What Mean Impacts Miss: Distributional Effects of Welfare Reform Experiments. American Economic Review, 96(4), 988–1012.

    doi.org/10.1257/aer.96.4.988

    applicationwelfare-reformdistributional-effects
    Annotation

    Bitler, Gelbach, and Hoynes apply quantile treatment effects to experimental data from the Connecticut Jobs First welfare reform program. They show that the average treatment effect masks dramatic heterogeneity: the program had no impact at the bottom of the earnings distribution, increased earnings in the middle, and decreased earnings at the top. The paper demonstrates why distributional analysis is essential for evaluating social programs whose effects vary across the outcome distribution.

  21. Blanchard, O. J., & Katz, L. F. (1992). Regional Evolutions. Brookings Papers on Economic Activity, 1992(1), 1–76.

    doi.org/10.2307/2534556

    regional-adjustmentmigrationlabor-markets
    Annotation

    Blanchard and Katz study regional labor market adjustment in the United States, analyzing how local employment shocks affect wages, unemployment, and migration. They construct a predicted-employment instrument using national industry growth interacted with local industry shares—the approach the subsequent literature calls the Bartik or shift-share instrument.

  22. Blomquist, S., Newey, W. K., Kumar, A., & Liang, C.-Y. (2021). On Bunching and Identification of the Taxable Income Elasticity. Journal of Political Economy, 129(8), 2320–2343.

    doi.org/10.1086/714446

    Foundationalon bunching estimation
    identificationtaxable-income-elasticitycritiquenonparametric
    Annotation

    Blomquist, Newey, Kumar, and Liang provide a critical examination of the identification assumptions underlying bunching estimation. They show that the standard bunching estimator identifies the elasticity only under strong assumptions about the functional form of the counterfactual density and the distribution of preferences. Without these assumptions, the amount of bunching is consistent with a range of elasticities. The paper sparks an important methodological debate about what bunching can and cannot identify, and motivates subsequent work on tightening identification in bunching designs.

  23. Bloom, H. S. (1995). Minimum Detectable Effects: A Simple Way to Report the Statistical Power of Experimental Designs. Evaluation Review, 19(5), 547–556.

    doi.org/10.1177/0193841X9501900504

    Foundationalon power analysis
    MDEminimum-detectable-effectprogram-evaluation
    Annotation

    Bloom introduces the minimum detectable effect (MDE) framework, which reports the smallest effect size a study can reliably detect given its design and sample size. This approach is now the standard way to discuss statistical power in program evaluation and experimental economics.
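
    Bloom's framework reduces to a one-line formula under a normal approximation. A hedged sketch (the paper itself works with t critical values; the function name and defaults here are ours):

```python
from statistics import NormalDist

def mde(se, alpha=0.05, power=0.80):
    """Minimum detectable effect for a two-sided test at level `alpha`
    with target `power`, given the standard error of the impact
    estimate: MDE = (z_{1 - alpha/2} + z_{power}) * SE."""
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) * se

# With alpha = 0.05 and power = 0.80 the multiplier is about 2.80, so an
# impact estimate with SE = 0.10 can reliably detect effects of roughly
# 0.28 or larger.
```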

  24. Bloom, N., & Van Reenen, J. (2007). Measuring and Explaining Management Practices Across Firms and Countries. Quarterly Journal of Economics, 122(4), 1351–1408.

    doi.org/10.1162/qjec.2007.122.4.1351

    management-practicesproductivityfirm-performance
    Annotation

    Bloom and Van Reenen develop a survey-based measure of management practices and document that better management is strongly associated with higher productivity, profitability, and growth. They use IV strategies (including product market competition and primogeniture rules for family management succession) to investigate why management quality varies, finding that poor management is more prevalent when competition is weak and when family firms follow primogeniture. The paper is foundational for the measurement of management practices; the IV analysis is one component of a broader measurement and descriptive study.

  25. Bloom, N., Liang, J., Roberts, J., & Ying, Z. J. (2015). Does Working from Home Work? Evidence from a Chinese Experiment. Quarterly Journal of Economics, 130(1), 165–218.

    doi.org/10.1093/qje/qju032

    Applicationon experimental design
    remote-workproductivityfield-experimentmanagement
    Annotation

    Bloom and colleagues conduct a large-scale randomized experiment at a Chinese travel agency, finding that working from home leads to a 13% performance increase. The study becomes a landmark reference in management and labor economics for its clean experimental design applied to a practical workplace question.

  26. Bonferroni, C. E. (1936). Teoria statistica delle classi e calcolo delle probabilità [Statistical theory of classes and calculus of probabilities]. Pubblicazioni del R Istituto Superiore di Scienze Economiche e Commerciali di Firenze, 8, 3–62.

    Foundationalon multiple testing
    Bonferroni-correctionFWERclassical
    Annotation

    Bonferroni develops the classical correction for multiple comparisons, which controls the family-wise error rate by dividing the significance level by the number of tests. While conservative, the Bonferroni correction remains widely used due to its simplicity and broad applicability.
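
    The correction itself is a single division; a trivial sketch (names are ours):

```python
def bonferroni_threshold(alpha, m):
    """Per-test significance threshold that controls the family-wise
    error rate at level `alpha` across `m` tests: reject a hypothesis
    only when its p-value is at most alpha / m."""
    return alpha / m

# Running 20 tests at an overall alpha of 0.05 means judging each
# individual test against a threshold of 0.0025.
```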

  27. Borusyak, K., Hull, P., & Jaravel, X. (2022). Quasi-Experimental Shift-Share Research Designs. Review of Economic Studies, 89(1), 181–213.

    doi.org/10.1093/restud/rdab030

    shock-exogeneitymany-shocksidentification
    Annotation

    Borusyak, Hull, and Jaravel provide an alternative framework where identification comes from the exogeneity of the shocks rather than the shares. They show that with many independent shocks, the instrument is valid even if shares are endogenous, greatly expanding the range of credible applications.

  28. Borusyak, K., Jaravel, X., & Spiess, J. (2024). Revisiting Event-Study Designs: Robust and Efficient Estimation. Review of Economic Studies, 91(6), 3253–3285.

    doi.org/10.1093/restud/rdae007

    imputation-estimatorefficiencyevent-study
    Annotation

    Borusyak, Jaravel, and Spiess propose an imputation estimator for staggered DID that first estimates unit and time fixed effects from untreated observations, then imputes the counterfactual outcomes. This approach is efficient, flexible, and avoids the negative weighting problem of TWFE.
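
    The mechanics can be illustrated on a toy noiseless panel. This is a hedged sketch of the imputation idea without covariates or inference, not the authors' implementation: fit unit and time effects on untreated cells only, impute Y(0) for treated cells, and average the differences.

```python
import numpy as np

# Toy balanced panel: N units, T periods, no noise, homogeneous effect tau.
rng = np.random.default_rng(1)
N, T, tau = 6, 4, 1.0
alpha = rng.normal(size=N)            # unit fixed effects
gamma = rng.normal(size=T)            # time fixed effects
treated = np.zeros((N, T), dtype=bool)
treated[:3, 2:] = True                # first 3 units treated from t = 2
y = alpha[:, None] + gamma[None, :] + tau * treated

# Step 1: fit alpha_i + gamma_t by least squares on untreated cells only.
rows, cols = np.nonzero(~treated)
X = np.zeros((rows.size, N + T))
X[np.arange(rows.size), rows] = 1.0       # unit dummies
X[np.arange(rows.size), N + cols] = 1.0   # time dummies
coef, *_ = np.linalg.lstsq(X, y[~treated], rcond=None)
a_hat, g_hat = coef[:N], coef[N:]

# Step 2: impute the untreated counterfactual and average over treated cells.
y0_hat = a_hat[:, None] + g_hat[None, :]
att = (y - y0_hat)[treated].mean()        # recovers tau in this noiseless toy
```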

  29. Bound, J., Jaeger, D. A., & Baker, R. M. (1995). Problems with Instrumental Variables Estimation When the Correlation Between the Instruments and the Endogenous Explanatory Variable Is Weak. Journal of the American Statistical Association, 90(430), 443–450.

    doi.org/10.1080/01621459.1995.10476536

    Foundationalon instrumental variables
    weak-instrumentsIV-biasfinite-samplefirst-stage-F
    Annotation

    Bound, Jaeger, and Baker demonstrate that instrumental variables estimates can be severely biased when instruments are weakly correlated with the endogenous regressor. They show that with weak instruments, the finite-sample bias of IV approaches that of OLS, and that the standard IV confidence intervals can have coverage far below their nominal levels. The paper motivates the widespread practice of reporting first-stage F-statistics as a diagnostic for instrument strength.

  30. Brand, J. E., Xu, J., Koch, B., & Geraldo, P. (2021). Uncovering Sociological Effect Heterogeneity Using Tree-Based Machine Learning. Sociological Methodology, 51(2), 189–223.

    doi.org/10.1177/0081175021993503

    Applicationon causal forests
    social-sciencereturns-to-educationvariable-importance
    Annotation

    Brand and colleagues provide a practical guide to using causal trees and forests in social science research. They discuss honest estimation, variable importance for understanding which covariates drive heterogeneity, and apply the methods to study heterogeneous returns to college education.

  31. Brinch, C. N., Mogstad, M., & Wiswall, M. (2017). Beyond LATE with a Discrete Instrument. Journal of Political Economy, 125(4), 985–1039.

    doi.org/10.1086/692712

    MTEdiscrete-instrumentsemiparametricquantity-qualityLATE
    Annotation

    Brinch, Mogstad, and Wiswall show how to estimate the MTE curve semiparametrically even with a discrete (binary or multivalued) instrument, which is a common case in practice. They demonstrate that the local IV approach can be implemented with discrete instruments by imposing additive separability between observed covariates and unobserved heterogeneity along with a parametric structure on the MTE. Applied to the quantity-quality tradeoff of children using twin births and sibling sex composition as instruments for family size, they find that MTE varies with unobserved resistance to having additional children, demonstrating how discrete instruments can recover policy-relevant heterogeneity beyond LATE.

  32. Brown, S. J., & Warner, J. B. (1985). Using Daily Stock Returns: The Case of Event Studies. Journal of Financial Economics, 14(1), 3–31.

    doi.org/10.1016/0304-405X(85)90042-X

    Foundationalon event studies
    daily-returnstest-statisticssimulation
    Annotation

    Brown and Warner extend the event study framework from monthly to daily stock returns and examine the statistical properties of various test statistics. Their simulations show that simple methods perform well in most settings, providing practical reassurance for applied researchers.

  33. Bruhn, M., & McKenzie, D. (2009). In Pursuit of Balance: Randomization in Practice in Development Field Experiments. American Economic Journal: Applied Economics, 1(4), 200–232.

    doi.org/10.1257/app.1.4.200

    Foundationalon experimental design
    stratificationbalancerandomization-methodsfield-experiments
    Annotation

    Bruhn and McKenzie compare different randomization methods—simple, stratified, and pairwise—in practice and show that stratified randomization substantially improves balance on baseline covariates and increases statistical power. They provide practical recommendations for choosing among randomization procedures in field experiments.

  34. Buchanan, A. L., Hudgens, M. G., Cole, S. R., Mollan, K. R., Sax, P. E., Daar, E. S., Adimora, A. A., Eron, J. J., & Mugavero, M. J. (2018). Generalizing Evidence from Randomized Trials Using Inverse Probability of Sampling Weights. Journal of the Royal Statistical Society: Series A, 181(4), 1193–1209.

    doi.org/10.1111/rssa.12357

    Foundationalon external validity
    Annotation

    Buchanan and colleagues develop inverse probability of sampling weighted (IPSW) estimators for generalizing treatment effect estimates from randomized trials to well-defined target populations, and derive consistent sandwich-type variance estimators. The method models the probability of trial participation as a function of observed covariates, reweighting trial outcomes to represent the target population. Researchers seeking to transport trial results to a broader population can apply IPSW when a probability sample or census of the target population is available for comparison.

  35. Busenbark, J. R., Yoon, H., Gamache, D. L., & Withers, M. C. (2022). Omitted Variable Bias: Examining Management Research with the Impact Threshold of a Confounding Variable (ITCV). Journal of Management, 48(1), 17–48.

    doi.org/10.1177/01492063211006458

    ApplicationMgmton sensitivity analysis
    management-methodologyITCVbest-practices
    Annotation

    Busenbark and colleagues provide a practical guide to conducting sensitivity analysis in management research using the ITCV framework. They review its application in strategic management and organizational behavior, and demonstrate how to interpret and report results for management audiences.

  36. Bushway, S., Johnson, B. D., & Slocum, L. A. (2007). Is the Magic Still There? The Use of the Heckman Two-Step Correction for Selection Bias in Criminology. Journal of Quantitative Criminology, 23(2), 151–178.

    doi.org/10.1007/s10940-007-9024-4

    surveycriminologymisapplication
    Annotation

    Bushway, Johnson, and Slocum review applications of the Heckman model in criminology and find widespread misapplication. The paper emphasizes that without a credible exclusion restriction, the Heckman correction provides no improvement over naive OLS and may even increase bias.

C
57
  1. Callaway, B., & Sant'Anna, P. H. C. (2021). Difference-in-Differences with Multiple Time Periods. Journal of Econometrics, 225(2), 200–230.

    doi.org/10.1016/j.jeconom.2020.12.001

    group-time-ATTheterogeneous-effectsaggregation
    Annotation

    Callaway and Sant'Anna propose group-time average treatment effects (ATT(g,t)) that avoid the problematic comparisons in TWFE. Their framework allows for heterogeneous treatment effects across groups and time and provides aggregation schemes for summary parameters.

  2. Calonico, S., Cattaneo, M. D., & Titiunik, R. (2014). Robust Nonparametric Confidence Intervals for Regression-Discontinuity Designs. Econometrica, 82(6), 2295–2326.

    doi.org/10.3982/ECTA11757

    bias-correctionbandwidthrdrobustinference
    Annotation

    Calonico, Cattaneo, and Titiunik develop bias-corrected confidence intervals for RDD that address the problem of conventional confidence intervals being invalid when using optimal bandwidth selectors. Their rdrobust software package has become the standard tool for implementing RDD in practice.

  3. Cameron, A. C., & Trivedi, P. K. (1986). Econometric Models Based on Count Data: Comparisons and Applications of Some Estimators and Tests. Journal of Applied Econometrics, 1(1), 29–53.

    doi.org/10.1002/jae.3950010104

    overdispersionmodel-selectioncount-data
    Annotation

    Cameron and Trivedi compare Poisson, negative binomial, and other count data models, providing tests for overdispersion and guidance on model selection. This paper helps establish the practical toolkit for applied researchers working with count outcomes.

  4. Cameron, A. C., & Trivedi, P. K. (1990). Regression-based Tests for Overdispersion in the Poisson Model. Journal of Econometrics, 46(3), 347–364.

    doi.org/10.1016/0304-4076(90)90014-K

    overdispersionPoissonmodel-selectioncount-data
    Annotation

    Cameron and Trivedi develop regression-based tests for overdispersion in count data models, enabling formal testing of whether the Poisson equidispersion assumption holds. Their tests compare the observed variance to the Poisson-implied mean, providing the foundation for model selection between Poisson and negative binomial specifications. Researchers working with count outcomes should use these tests before defaulting to either model.

  5. Cameron, A. C., & Trivedi, P. K. (2005). Microeconometrics: Methods and Applications. Cambridge University Press.

    doi.org/10.1017/CBO9780511811241

    textbookpanel-datamicroeconometricsdynamic-panels
    Annotation

    Cameron and Trivedi cover panel data methods comprehensively in Chapter 21, including fixed effects, random effects, and dynamic panel models. A standard graduate-level reference for microeconometric methods.

  6. Cameron, A. C., Gelbach, J. B., & Miller, D. L. (2008). Bootstrap-Based Improvements for Inference with Clustered Errors. Review of Economics and Statistics, 90(3), 414–427.

    doi.org/10.1162/rest.90.3.414

    cluster-bootstrapfew-clustersinferencewild-bootstrap
    Annotation

    Cameron, Gelbach, and Miller address what happens when clustering is necessary but the number of clusters is small (fewer than 30-50). They propose the wild cluster bootstrap as a solution, which has become the standard approach when researchers have too few clusters for asymptotic cluster-robust standard errors to be reliable.

  7. Cameron, A. C., Gelbach, J. B., & Miller, D. L. (2011). Robust Inference with Multiway Clustering. Journal of Business & Economic Statistics, 29(2), 238–249.

    doi.org/10.1198/jbes.2010.07136

    Foundationalon clustering inference
    two-way-clusteringmultiway
    Annotation

    Cameron, Gelbach, and Miller extend cluster-robust variance estimation to settings with two-way (or multi-way) clustering. The variance estimator adds the two one-way cluster-robust variance matrices and subtracts the matrix clustered on the intersection of the two dimensions, which reduces to the heteroscedasticity-robust matrix when each intersection contains a single observation.
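
    The combination rule can be stated compactly. For two clustering dimensions G and H (a standard way of writing the estimator; notation ours):

```latex
\widehat{V}_{\text{two-way}}
  \;=\; \widehat{V}_{G} \;+\; \widehat{V}_{H} \;-\; \widehat{V}_{G \cap H}
```

    Each term on the right is a one-way cluster-robust variance matrix, and $\widehat{V}_{G \cap H}$ clusters on the intersection of the two groupings; when every intersection cell contains a single observation, it reduces to the heteroscedasticity-robust matrix.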

  8. Cameron, A. C., & Trivedi, P. K. (2013). Regression Analysis of Count Data. Cambridge University Press.

    doi.org/10.1017/CBO9781139013567

    textbookcount-datazero-inflationpanel-data
    Annotation

    Cameron and Trivedi provide the standard reference on count data regression, covering Poisson, negative binomial, zero-inflated, hurdle, and panel count models. They provide both the theoretical foundations and practical implementation guidance that applied researchers need.

  9. Cameron, A. C., & Miller, D. L. (2015). A Practitioner's Guide to Cluster-Robust Inference. Journal of Human Resources, 50(2), 317–372.

    doi.org/10.3368/jhr.50.2.317

    cluster-robuststandard-errorsinferencepractical-guide
    Annotation

    Cameron and Miller cover all aspects of cluster-robust inference in OLS regression in this highly practical survey, including when to cluster, at what level, and what to do when the number of clusters is small. It has become the essential reference for applied researchers deciding how to handle clustered data.

  10. Camuffo, A., Cordova, A., Gambardella, A., & Spina, C. (2020). A Scientific Approach to Entrepreneurial Decision Making: Evidence from a Randomized Control Trial. Management Science, 66(2), 564–586.

    doi.org/10.1287/mnsc.2018.3249

    ApplicationMgmton experimental design
    RCTentrepreneurshipdecision-makingscientific-method
    Annotation

    Camuffo and colleagues conduct a randomized controlled trial with 116 Italian startups, randomly assigning half to receive training in a 'scientific' approach to entrepreneurial decision-making (formulating and testing hypotheses before committing resources). Treated startups perform better, are more likely to pivot, and are not more likely to drop out, providing experimental evidence that structured decision-making improves entrepreneurial outcomes.

  11. Camuffo, A., Gambardella, A., Messinese, D., Novelli, E., Paolucci, E., & Spina, C. (2024). A Scientific Approach to Entrepreneurial Decision-Making: Large-Scale Replication and Extension. Strategic Management Journal, 45(6), 1209–1237.

    doi.org/10.1002/smj.3580

    ApplicationMgmton experimental design
    RCTreplicationentrepreneurshipscientific-methodexternal-validity
    Annotation

    Camuffo and colleagues conduct four randomized controlled trials with 759 firms across Italy, the UK, and India, replicating and extending their earlier finding that training entrepreneurs to adopt a 'scientific' approach to decision-making improves venture performance. The multi-site, multi-country design provides strong evidence on the external validity of the original RCT findings.

  12. Capron, L., & Pistre, N. (2002). When Do Acquirers Earn Abnormal Returns? Strategic Management Journal, 23(9), 781–794.

    doi.org/10.1002/smj.262

    ApplicationMgmton event studies
    M&Aacquirer-returnsstrategy
    Annotation

    Capron and Pistre use event study methodology to examine when acquiring firms earn positive abnormal returns from mergers and acquisitions. They find that acquirers earn positive returns only when they are the primary source of value creation, contributing to the M&A strategy literature.

  13. Card, D., & Krueger, A. B. (1994). Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania. American Economic Review, 84(4), 772–793.

    minimum-wage · employment · natural-experiment
    Annotation

    Card and Krueger compare fast-food employment in New Jersey (which raised its minimum wage) with neighboring Pennsylvania (which did not) in perhaps the most famous DID study in economics. They find no negative employment effect, challenging the standard textbook prediction. This paper popularizes DID as a research design.

  14. Card, D. (2001). Immigrant Inflows, Native Outflows, and the Local Labor Market Impacts of Higher Immigration. Journal of Labor Economics, 19(1), 22–64.

    doi.org/10.1086/209979

    immigration · enclave-instrument · labor-markets
    Annotation

    Card uses a shift-share instrument based on historical settlement patterns of immigrant groups to predict current immigration flows to U.S. cities. This 'enclave instrument' is adopted in hundreds of subsequent immigration studies and is a classic example of the shift-share approach.

  15. Card, D., Lee, D. S., Pei, Z., & Weber, A. (2015). Inference on Causal Effects in a Generalized Regression Kink Design. Econometrica, 83(6), 2453–2483.

    doi.org/10.3982/ECTA11224

    Foundational · on regression kink design
    RKD-foundations · kink-design · unemployment-insurance · derivative-ratio
    Annotation

    Card, Lee, Pei, and Weber formalize the regression kink design, establishing conditions under which a kink in the treatment assignment function identifies causal effects. They show that the estimand is the ratio of the change in the slope of the conditional expectation of the outcome to the change in the slope of the treatment function at the kink point. The paper develops inference procedures and applies the method to estimate the effect of unemployment insurance benefits on unemployment duration.
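
    In generic notation (ours, not necessarily the paper's), with running variable $V$, kink at $v = 0$, and policy schedule $b(\cdot)$, the sharp RKD estimand described above is the ratio of slope changes at the kink:

```latex
\tau_{\mathrm{RKD}}
  = \frac{\displaystyle \lim_{v \downarrow 0} \frac{d\,E[Y \mid V = v]}{dv}
        \;-\; \lim_{v \uparrow 0} \frac{d\,E[Y \mid V = v]}{dv}}
         {\displaystyle \lim_{v \downarrow 0} \frac{d\,b(v)}{dv}
        \;-\; \lim_{v \uparrow 0} \frac{d\,b(v)}{dv}}
```

    The numerator is the change in the slope of the conditional expectation of the outcome; the denominator is the change in the slope of the treatment (benefit) schedule.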

  16. Card, D., Kluve, J., & Weber, A. (2018). What Works? A Meta Analysis of Recent Active Labor Market Program Evaluations. Journal of the European Economic Association, 16(3), 894–931.

    doi.org/10.1093/jeea/jvx028

    Annotation

    Card, Kluve, and Weber conduct a meta-analysis of over 200 active labor market program evaluations across multiple countries, classifying estimates by program type, participant group, and post-program time horizon. They find that average impacts are near zero in the short run but become more positive two to three years after program completion, with human capital programs showing the largest medium-term gains and public employment subsidies proving less effective. Policy researchers designing labor market interventions should consider program type and evaluation time horizon when interpreting treatment effect estimates.

  17. Carneiro, P., Heckman, J. J., & Vytlacil, E. J. (2011). Estimating Marginal Returns to Education. American Economic Review, 101(6), 2754–2781.

    doi.org/10.1257/aer.101.6.2754

    Application · on lab mte replication
    Annotation

    Carneiro, Heckman, and Vytlacil use the marginal treatment effect framework to estimate heterogeneous returns to college. They find a declining MTE curve (individuals most likely to attend college benefit the most), demonstrating that conventional treatment effect parameters (ATE, ATT, LATE) differ substantially due to essential heterogeneity.

  18. Casey, K., Glennerster, R., & Miguel, E. (2012). Reshaping Institutions: Evidence on Aid Impacts Using a Preanalysis Plan. Quarterly Journal of Economics, 127(4), 1755–1812.

    doi.org/10.1093/qje/qje027

    field-experiment · pre-analysis-plan · development-economics · Westfall-Young
    Annotation

    Casey, Glennerster, and Miguel pre-register their analysis plan for a community-driven development program in Sierra Leone and apply multiple testing corrections (including the Westfall-Young step-down procedure and family-wise error rate adjustments) across outcome families. This paper is one of the most prominent examples of rigorous multiple testing adjustment in a field experiment, demonstrating that many individually significant effects lose significance after correction.

  19. Cattaneo, M. D., Drukker, D. M., & Holland, A. D. (2013). Estimation of Multivalued Treatment Effects Under Conditional Independence. Stata Journal, 13(3), 407–450.

    doi.org/10.1177/1536867X1301300301

    Foundational · on matching methods
    multi-valued-treatment · dose-response · inverse-probability-weighting · Stata
    Annotation

    Cattaneo, Drukker, and Holland extend matching and inverse probability weighting methods to settings with multi-valued (rather than binary) treatments, developing estimators for dose-response functions under conditional independence. Their accompanying Stata implementation made these methods readily accessible to applied researchers.

  20. Cattaneo, M. D., Frandsen, B. R., & Titiunik, R. (2015). Randomization Inference in the Regression Discontinuity Design: An Application to Party Advantages in the U.S. Senate. Journal of Causal Inference, 3(1), 1–24.

    doi.org/10.1515/jci-2013-0010

    RDD · local-randomization · elections · finite-sample
    Annotation

    Cattaneo, Frandsen, and Titiunik develop a randomization inference framework for regression discontinuity designs, exploiting the local randomization interpretation of close elections. They apply the method to estimate party advantages in U.S. Senate elections, demonstrating how Fisher-style permutation tests can provide finite-sample exact inference in RDD settings where asymptotic approximations may be unreliable.

  21. Cattaneo, M. D., Titiunik, R., & Vazquez-Bare, G. (2019). Power Calculations for Regression-Discontinuity Designs. Stata Journal, 19(1), 210–245.

    doi.org/10.1177/1536867X19830919

    power-calculations · sample-size · study-design · software
    Annotation

    Cattaneo, Titiunik, and Vazquez-Bare provide methods and software for power calculations in RDD, essential for study design and determining adequate sample sizes near the cutoff. The associated rdsampsi command enables researchers to plan appropriately powered RDD studies before data collection.

  22. Cattaneo, M. D., Idrobo, N., & Titiunik, R. (2020). A Practical Introduction to Regression Discontinuity Designs: Foundations. Cambridge University Press.

    doi.org/10.1017/9781108684606

    RDD-practical-guide · rdrobust · textbook · fuzzy-RDD
    Annotation

    Cattaneo, Idrobo, and Titiunik provide a practical and accessible guide to implementing regression discontinuity designs, covering both sharp and fuzzy cases with worked examples and code. Part of the Cambridge Elements series, the book offers step-by-step guidance on bandwidth selection, estimation, and inference using the rdrobust toolkit.

  23. Cattaneo, M. D., Jansson, M., & Ma, X. (2020). Simple Local Polynomial Density Estimators. Journal of the American Statistical Association, 115(531), 1449–1455.

    doi.org/10.1080/01621459.2019.1635480

    manipulation-testing · density-estimation · rddensity
    Annotation

    Cattaneo, Jansson, and Ma propose a local polynomial density estimator for manipulation testing in regression discontinuity designs. Implemented in the rddensity package, it provides a modern alternative to the McCrary (2008) density test with better boundary properties.

  24. Cattaneo, M. D., & Titiunik, R. (2022). Regression Discontinuity Designs. Annual Review of Economics, 14, 821–851.

    doi.org/10.1146/annurev-economics-051520-021409

    survey · state-of-the-art · fuzzy-RDD · geographic-RDD · multi-cutoff
    Annotation

    Cattaneo and Titiunik survey the state of the art in RDD methodology, including extensions to fuzzy designs, geographic RDD, and multi-cutoff designs. They provide guidance on current recommended practices and an excellent entry point to the modern RDD literature.

  25. Cattaneo, M. D., Idrobo, N., & Titiunik, R. (2024). A Practical Introduction to Regression Discontinuity Designs: Extensions. Cambridge University Press.

    doi.org/10.1017/9781009441896

    textbook · extensions · multi-score · geographic-RDD · kink-design
    Annotation

    Cattaneo, Idrobo, and Titiunik cover extensions of the regression discontinuity framework in this follow-up volume, including multi-score designs, geographic RDD, kink designs, and discrete running variables. They provide practical guidance and software implementations for these more advanced settings, making it an essential companion for applied researchers going beyond the standard sharp RDD.

  26. Certo, S. T., Busenbark, J. R., Woo, H., & Semadeni, M. (2016). Sample Selection Bias and Heckman Models in Strategic Management Research. Strategic Management Journal, 37(13), 2639–2657.

    doi.org/10.1002/smj.2475

    survey · management · best-practices
    Annotation

    Certo, Busenbark, Woo, and Semadeni review the use of Heckman models in strategic management. They provide practical guidance on when selection correction is needed, how to choose exclusion restrictions, and how to interpret results, and they find that many SMJ papers misapply the technique.

  27. Chamberlain, G. (1980). Analysis of Covariance with Qualitative Data. Review of Economic Studies, 47(1), 225–238.

    doi.org/10.2307/2297110

    Foundational · on fixed effects, logit probit
    nonlinear-models · conditional-logit · discrete-choice
    Annotation

    Chamberlain extends the fixed effects approach to nonlinear models like logit, showing how to condition out the fixed effects in discrete choice settings. This work is fundamental for researchers who need fixed effects in models where the dependent variable is binary or categorical.

  28. Chatterji, A. K., Findley, M., Jensen, N. M., Meier, S., & Nielson, D. (2016). Field Experiments in Strategy Research. Strategic Management Journal, 37(1), 116–132.

    doi.org/10.1002/smj.2449

    Application · Mgmt · on experimental design
    field-experiments · strategy · methodology
    Annotation

    Chatterji, Findley, Jensen, Meier, and Nielson make the case for using field experiments in strategy research and provide practical guidance for doing so. They discuss internal validity, external validity, and ethical considerations specific to strategy scholars.

  29. Chava, S., & Roberts, M. R. (2008). How Does Financing Impact Investment? The Role of Debt Covenants. Journal of Finance, 63(5), 2085–2121.

    doi.org/10.1111/j.1540-6261.2008.01391.x

    debt-covenants · corporate-finance · investment
    Annotation

    Chava and Roberts use an RDD around debt covenant thresholds to study how covenant violations affect firm investment. This paper is an important early application of RDD in corporate finance, where accounting-based thresholds create natural discontinuities.

  30. Chernozhukov, V., & Hansen, C. (2005). An IV Model of Quantile Treatment Effects. Econometrica, 73(1), 245–261.

    doi.org/10.1111/j.1468-0262.2005.00570.x

    foundational · instrumental-variables · quantile-regression
    Annotation

    Chernozhukov and Hansen develop an instrumental variable framework for quantile regression to address endogeneity. They propose the inverse quantile regression (IQR) method, which exploits moment conditions implied by the structural quantile model, and provide conditions under which quantile treatment effects are identified with endogenous treatments, extending quantile regression to credible causal inference settings.

  31. Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., & Robins, J. (2018). Double/Debiased Machine Learning for Treatment and Structural Parameters. Econometrics Journal, 21(1), C1–C68.

    doi.org/10.1111/ectj.12097

    Neyman-orthogonality · cross-fitting · partially-linear-model
    Annotation

    Chernozhukov et al. introduce double/debiased machine learning (DML), showing how to combine Neyman orthogonality with cross-fitting to obtain root-n consistent and asymptotically normal estimates of low-dimensional causal parameters while using high-dimensional machine learning for nuisance functions. This paper provides the theoretical foundation for valid inference when first-stage estimation uses flexible ML methods that would otherwise invalidate standard asymptotic arguments. The cross-fitting procedure it introduces is now standard practice for any application combining ML prediction with causal parameter estimation.
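
    To make the cross-fitting recipe concrete, here is a minimal sketch of partialling-out DML for a partially linear model. This is an illustration written for this bibliography, not the paper's replication code; the simulated data-generating process and the random-forest nuisance learners are assumptions of the example.

```python
# Minimal sketch of cross-fitted partialling-out DML (illustrative, not the
# authors' code). Nuisance functions E[Y|X] and E[D|X] are fit by random
# forests on the complement fold; theta comes from residual-on-residual
# regression, a Neyman-orthogonal moment.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

def dml_plr(y, d, X, n_folds=2, seed=0):
    y_res = np.zeros_like(y, dtype=float)
    d_res = np.zeros_like(d, dtype=float)
    for train, test in KFold(n_folds, shuffle=True, random_state=seed).split(X):
        # Cross-fitting: nuisances are trained only on the complement sample
        m_y = RandomForestRegressor(random_state=seed).fit(X[train], y[train])
        m_d = RandomForestRegressor(random_state=seed).fit(X[train], d[train])
        y_res[test] = y[test] - m_y.predict(X[test])
        d_res[test] = d[test] - m_d.predict(X[test])
    # Orthogonal score: theta = E[d_res * y_res] / E[d_res^2]
    return float(d_res @ y_res / (d_res @ d_res))

# Simulated example with true treatment effect 2.0 (an assumption of the sketch)
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
d = X[:, 0] + rng.normal(size=500)
y = 2.0 * d + X[:, 0] + rng.normal(size=500)
theta = dml_plr(y, d, X)
```

    Training each fold's nuisance functions only on the complement sample is what restores valid root-n asymptotics when flexible machine learners are used in the first stage.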

  32. Chernozhukov, V., Wuthrich, K., & Zhu, Y. (2021). An Exact and Robust Conformal Inference Method for Counterfactual and Synthetic Controls. Journal of the American Statistical Association, 116(536), 1849–1864.

    doi.org/10.1080/01621459.2021.1920957

    Foundational · on synthetic control
    conformal-inference · counterfactual · finite-sample
    Annotation

    Chernozhukov, Wuthrich, and Zhu develop a conformal inference method for synthetic control that provides exact, finite-sample valid p-values and confidence intervals without requiring a large number of control units. This approach offers a modern, robust alternative to placebo-based inference for counterfactual and synthetic control estimators.

  33. Chernozhukov, V., Escanciano, J. C., Ichimura, H., Newey, W. K., & Robins, J. M. (2022). Locally Robust Semiparametric Estimation. Econometrica, 90(4), 1501–1535.

    doi.org/10.3982/ECTA16294

    semiparametric · local-robustness · debiasing
    Annotation

    Chernozhukov, Escanciano, Ichimura, Newey, and Robins develop locally robust semiparametric estimators that extend the DML framework, demonstrating how automatic debiasing with machine learning first-stage estimates can be applied broadly. Their approach yields root-n consistent estimates of causal and structural parameters even when nuisance functions are estimated with regularized machine learning methods.

  34. Chetty, R., Friedman, J. N., Olsen, T., & Pistaferri, L. (2011). Adjustment Costs, Firm Responses, and Micro vs. Macro Labor Supply Elasticities: Evidence from Danish Tax Records. Quarterly Journal of Economics, 126(2), 749–804.

    doi.org/10.1093/qje/qjr013

    Application · on bunching estimation
    labor-supply · adjustment-costs · Denmark · tax-kinks · frictions
    Annotation

    Chetty, Friedman, Olsen, and Pistaferri use Danish administrative tax data to reconcile the gap between micro and macro labor supply elasticities using bunching methods. They show that adjustment frictions explain why micro estimates from bunching at tax kinks are small: many workers cannot freely adjust hours, so observed bunching understates the frictionless elasticity. They estimate that accounting for frictions raises the implied elasticity substantially. The paper is a landmark application of bunching to the micro-macro elasticity puzzle and introduces key methods for dealing with frictions in bunching designs.

  35. Chetty, R., Friedman, J. N., & Rockoff, J. E. (2014). Measuring the Impacts of Teachers I: Evaluating Bias in Teacher Value-Added Estimates. American Economic Review, 104(9), 2593–2632.

    doi.org/10.1257/aer.104.9.2593

    Application · on fixed effects
    teacher-value-added · education · causal-validation
    Annotation

    Chetty, Friedman, and Rockoff use teacher fixed effects (value-added models) and quasi-experimental validation to measure individual teachers' causal impacts on student outcomes. They demonstrate that teacher fixed effects capture real causal effects, not just selection, and their work has influenced education policy worldwide.

  36. Choudhury, P., Foroughi, C., & Larson, B. (2021). Work-from-anywhere: The Productivity Effects of Geographic Flexibility. Strategic Management Journal, 42(4), 655–683.

    doi.org/10.1002/smj.3251

    Application · Mgmt · on difference in differences
    difference-in-differences · remote-work · natural-experiment · productivity
    Annotation

    Choudhury, Foroughi, and Larson use a difference-in-differences design to study the productivity effects of a work-from-anywhere policy at the U.S. Patent and Trademark Office. They find that geographic flexibility increases output by approximately 4.4% without reducing quality. The paper demonstrates the application of DiD to a natural experiment in organizational design and is a leading example of causal inference in the future-of-work literature.

  37. Christensen, G., & Miguel, E. (2018). Transparency, Reproducibility, and the Credibility of Economics Research. Journal of Economic Literature, 56(3), 920–980.

    doi.org/10.1257/jel.20171350

    transparency · reproducibility · AEA-registry · economics
    Annotation

    Christensen and Miguel survey the transparency and reproducibility landscape in economics, documenting the growing adoption of pre-registration through the AEA RCT Registry and other platforms. They present evidence on the prevalence of specification searching and publication bias, and make the case that pre-registration combined with pre-analysis plans substantially improves the credibility of empirical findings.

  38. Cinelli, C., & Hazlett, C. (2020). Making Sense of Sensitivity: Extending Omitted Variable Bias. Journal of the Royal Statistical Society: Series B, 82(1), 39–67.

    doi.org/10.1111/rssb.12348

    Foundational · on sensitivity analysis
    omitted-variable-bias · partial-R-squared · benchmarking
    Annotation

    Cinelli and Hazlett develop a modern framework for sensitivity analysis based on partial R-squared measures, extending the omitted variable bias formula. Their approach allows researchers to benchmark the strength of hypothetical confounders against observed covariates, making sensitivity analysis more interpretable.
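
    As a concrete illustration (a sketch based on our reading of the paper, not the sensemakr source), the framework's robustness value for eliminating 100·q percent of an estimate can be computed from a coefficient's t-statistic and residual degrees of freedom:

```python
# Hedged sketch of the robustness value RV_q from Cinelli & Hazlett (2020):
# the minimal partial R^2 a confounder would need, with both treatment and
# outcome, to reduce the point estimate by 100*q percent. Illustration only.
import math

def robustness_value(t_stat, dof, q=1.0):
    f = q * abs(t_stat) / math.sqrt(dof)          # scaled partial Cohen's f
    return 0.5 * (math.sqrt(f**4 + 4 * f**2) - f**2)

# Hypothetical example: t = 4.18 with 783 residual degrees of freedom
rv = robustness_value(4.18, 783)
```

    For this example the robustness value is roughly 0.14: a confounder would need to explain about 14% of the residual variance of both treatment and outcome to drive the point estimate to zero.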

  39. Cinelli, C., Ferwerda, J., & Hazlett, C. (2024). Sensemakr: Sensitivity Analysis Tools for OLS in R and Stata. Observational Studies, 10(2), 93–127.

    doi.org/10.1353/obs.2024.a946583

    software · partial-R-squared · benchmarking · R-package
    Annotation

    Cinelli, Ferwerda, and Hazlett develop the sensemakr R and Stata package implementing their partial R-squared sensitivity analysis framework. They demonstrate the tool with applications to studies of violence and political attitudes, showing how researchers can benchmark potential confounders against observed covariates to assess the robustness of causal claims from observational data.

  40. Cinelli, C., & Hazlett, C. (2025). An Omitted Variable Bias Framework for Sensitivity Analysis of Instrumental Variables. Biometrika, 112(2), asaf004.

    doi.org/10.1093/biomet/asaf004

    Application · on sensitivity analysis
    instrumental-variables · exclusion-restriction · omitted-variable-bias · IV-sensitivity
    Annotation

    Cinelli and Hazlett extend their OLS sensitivity framework to instrumental variables settings, showing how to assess the robustness of IV estimates to violations of the exclusion restriction. They derive bounds on IV bias as a function of the partial R-squared of a hypothetical confounder with both the instrument and the outcome, providing practical tools for benchmarking the plausibility of IV assumptions.

  41. Clark, T. S., & Linzer, D. A. (2015). Should I Use Fixed or Random Effects? Political Science Research and Methods, 3(2), 399–408.

    doi.org/10.1017/psrm.2014.32

    fixed-vs-random · model-selection · practical-guidance
    Annotation

    Clark and Linzer provide practical guidance on choosing between fixed and random effects, arguing the decision depends on the research question, sample size, and the degree of correlation between unit effects and covariates. They demonstrate via simulation that random effects can outperform fixed effects when the number of units is small or when between-unit variation is of substantive interest. The paper challenges the common practice of defaulting to fixed effects solely because the Hausman test rejects.

  42. Clarke, D., Romano, J. P., & Wolf, M. (2020). The Romano-Wolf Multiple-Hypothesis Correction in Stata. Stata Journal, 20(4), 812–843.

    doi.org/10.1177/1536867X20976314

    Foundational · on multiple testing
    Stata · software · Romano-Wolf · FWER
    Annotation

    Clarke, Romano, and Wolf develop a Stata implementation of the Romano-Wolf stepwise multiple testing correction, which controls the family-wise error rate while accounting for the dependence structure among test statistics via resampling. This correction is more powerful than Bonferroni or Holm procedures when test statistics are correlated, which is the typical case in applied research with related outcomes. The rwolf command provides applied researchers with an accessible tool for rigorous multiple hypothesis testing.
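
    The stepdown logic can be sketched as follows. This is an illustration of the max-t idea behind the procedure, not the rwolf implementation, and the simulated null distribution is an assumption of the example.

```python
# Hedged sketch of Romano-Wolf-style stepdown adjusted p-values: at each step
# the observed statistic is compared with the resampled maximum over the
# hypotheses still in contention, which preserves the dependence structure.
import numpy as np

def stepdown_pvalues(t_obs, t_null):
    """t_obs: (k,) observed statistics; t_null: (B, k) statistics under the null."""
    order = np.argsort(-np.abs(t_obs))                 # most significant first
    p_adj = np.empty(len(t_obs), dtype=float)
    prev = 0.0
    for step, j in enumerate(order):
        # max over the not-yet-tested (less significant) hypotheses
        max_null = np.abs(t_null[:, order[step:]]).max(axis=1)
        p = (1 + (max_null >= abs(t_obs[j])).sum()) / (1 + t_null.shape[0])
        prev = max(prev, p)                            # enforce monotonicity
        p_adj[j] = prev
    return p_adj

# Hypothetical example: one strong effect, two nulls, simulated null draws
rng = np.random.default_rng(1)
t_null = rng.standard_normal((2000, 3))
p = stepdown_pvalues(np.array([5.0, 0.2, 0.1]), t_null)
```

    Because the comparison uses the joint resampled maximum rather than a Bonferroni bound, correlated test statistics cost less power, which is the practical advantage the annotation describes.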

  43. Clarke, D., Pailanir, D., Athey, S., & Imbens, G. (2024). On Synthetic Difference-in-Differences and Related Estimation Methods in Stata. Stata Journal, 24(4), 557–598.

    doi.org/10.1177/1536867X241297914

    Stata · software · implementation
    Annotation

    Clarke and colleagues develop the sdid Stata package for implementing synthetic DID, providing detailed documentation and empirical examples. This paper makes the method accessible to applied researchers and demonstrates implementation with real policy evaluation data.

  44. Cleves, M., Gould, W., & Marchenko, Y. (2016). An Introduction to Survival Analysis Using Stata. Stata Press.

    survey · Stata · practical-guide
    Annotation

    Cleves, Gould, and Marchenko provide a comprehensive practical guide to survival analysis in Stata. The book covers Kaplan-Meier estimation, Cox regression, parametric models, competing risks, and frailty models, with extensive Stata code examples and diagnostic procedures.

  45. Coffman, L. C., & Niederle, M. (2015). Pre-Analysis Plans Have Limited Upside, Especially Where Replications Are Feasible. Journal of Economic Perspectives, 29(3), 81–98.

    doi.org/10.1257/jep.29.3.81

    skepticism · replication · flexibility
    Annotation

    Coffman and Niederle offer a skeptical perspective on pre-analysis plans, arguing that their benefits are limited when replication is feasible and that rigid adherence to pre-specified analyses can prevent researchers from learning from the data. This paper provides important counterarguments in the pre-registration debate.

  46. Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences. Lawrence Erlbaum Associates.

    doi.org/10.4324/9780203771587

    Foundational · on power analysis
    textbook · effect-size · sample-size
    Annotation

    Cohen's foundational textbook introduces the concepts of effect size, statistical power, and sample size determination that became standard in the behavioral sciences. He provides power tables and conventions for small, medium, and large effect sizes that remain widely used across disciplines.

  47. Conley, T. G. (1999). GMM Estimation with Cross Sectional Dependence. Journal of Econometrics, 92(1), 1–45.

    doi.org/10.1016/S0304-4076(98)00084-0

    Annotation

    Conley develops GMM estimators and nonparametric, positive semi-definite covariance matrix estimators that account for cross-sectional dependence characterized by economic or geographic distance between observations. The approach extends HAC-style inference to spatial settings by allowing error correlations to decline smoothly with distance, and the covariance estimator remains consistent even when distances are imprecisely measured. Researchers with spatially distributed data should use Conley standard errors when observations within a defined neighborhood are likely correlated.
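
    A stripped-down version of the idea for OLS looks like this (a sketch, not Conley's GMM estimator): a Bartlett kernel in distance weights the cross products of scores, with weights vanishing beyond a chosen cutoff.

```python
# Hedged sketch of a Conley-style spatial HAC variance for OLS (illustration
# only): score cross products are weighted by a Bartlett kernel that declines
# linearly in distance and hits zero at the cutoff.
import numpy as np

def conley_se(X, resid, coords, cutoff):
    n, k = X.shape
    scores = X * resid[:, None]                    # per-observation moment contributions
    # pairwise Euclidean distances between observation locations
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    w = np.clip(1.0 - dist / cutoff, 0.0, None)    # Bartlett weights, 0 beyond cutoff
    meat = scores.T @ (w @ scores) / n
    bread = np.linalg.inv(X.T @ X / n)
    V = bread @ meat @ bread / n
    return np.sqrt(np.diag(V))

# Hypothetical example with points spaced farther apart than the cutoff,
# so no pair is spatially correlated by construction
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.5]) + rng.normal(size=n)
resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
coords = np.column_stack([np.arange(n) * 10.0, np.zeros(n)])
se = conley_se(X, resid, coords, cutoff=5.0)
```

    With all observations farther apart than the cutoff, the weight matrix is diagonal and the formula collapses to the ordinary heteroskedasticity-robust variance, a useful sanity check on an implementation.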

  48. Conley, T. G., Hansen, C. B., & Rossi, P. E. (2012). Plausibly Exogenous. Review of Economics and Statistics, 94(1), 260–272.

    doi.org/10.1162/REST_a_00139

    Foundational · on instrumental variables
    exclusion-restriction · sensitivity-analysis · plausible-exogeneity
    Annotation

    Conley, Hansen, and Rossi develop methods for inference when the exclusion restriction is 'plausibly' rather than exactly satisfied, parameterizing the degree of violation and constructing valid confidence intervals. This approach provides a formal sensitivity analysis for IV estimates, answering the question: how large would the violation of the exclusion restriction need to be to overturn the result? Applied researchers can use these methods to transparently assess the robustness of IV findings to a common critique.

  49. Cornelissen, T., Dustmann, C., Raute, A., & Schonberg, U. (2016). From LATE to MTE: Alternative Methods for the Evaluation of Policy Interventions. Labour Economics, 41, 47–60.

    doi.org/10.1016/j.labeco.2016.06.004

    MTE · child-care · policy-evaluation · applied · Germany
    Annotation

    Cornelissen, Dustmann, Raute, and Schonberg provide an accessible methodological guide to MTE estimation, covering the theoretical foundations and practical steps for moving from LATE to the full marginal treatment effect curve. The paper explains how to use local instrumental variables to trace out how treatment effects vary with individuals' unobserved propensity to participate. It serves as a tutorial for applied researchers seeking to implement MTE methods, with clear exposition of identification, estimation, and interpretation.

  50. Cornwell, C., & Rupert, P. (1988). Efficient Estimation with Panel Data: An Empirical Comparison of Instrumental Variables Estimators. Journal of Applied Econometrics, 3(2), 149–155.

    doi.org/10.1002/jae.3950030206

    Application · on lab re replication
    Annotation

    Cornwell and Rupert compare the efficiency of alternative instrumental variables estimators for panel data models with correlated individual effects, including the Hausman-Taylor, Amemiya-MaCurdy, and Breusch-Mizon-Schmidt estimators. Using a Mincer wage equation on PSID data, they find that efficiency gains from the more complex estimators are limited to the coefficients of time-invariant endogenous variables.

  51. Correia, S. (2017). Linear Models with High-Dimensional Fixed Effects: An Efficient and Feasible Estimator. Working Paper.

    Foundational · on fixed effects
    reghdfe · high-dimensional-FE · Stata · computational
    Annotation

    Correia develops an efficient iterative demeaning estimator for linear models with multiple high-dimensional fixed effects that scales to very large datasets. The estimator handles arbitrary numbers of fixed-effect dimensions and supports cluster-robust standard errors. Its implementation as the reghdfe Stata command has become the standard tool for applied researchers working with high-dimensional fixed effects in panel data.
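
    The iterative demeaning at the heart of such estimators can be sketched in a few lines. This is an illustration of the alternating-projections idea, not the reghdfe code, and the two-way example data are an assumption of the sketch.

```python
# Hedged sketch of alternating demeaning for multiple fixed-effect dimensions:
# repeatedly subtract each dimension's group means until the variable stops
# changing, then run OLS on the demeaned variables (the within estimator).
import numpy as np

def demean(v, *group_ids, tol=1e-10, max_iter=1000):
    v = v.astype(float).copy()
    for _ in range(max_iter):
        prev = v.copy()
        for g in group_ids:
            # subtract current group means for this fixed-effect dimension
            means = np.bincount(g, weights=v) / np.bincount(g)
            v -= means[g]
        if np.max(np.abs(v - prev)) < tol:
            break
    return v

# Hypothetical two-way example with no noise: within-OLS recovers the slope
rng = np.random.default_rng(0)
firm = np.repeat(np.arange(20), 10)
year = np.tile(np.arange(10), 20)
x = rng.normal(size=200)
y = 2.0 * x + 0.5 * firm + 1.5 * year
yd, xd = demean(y, firm, year), demean(x, firm, year)
beta = xd @ yd / (xd @ xd)
```

    With one fixed-effect dimension a single sweep suffices; with two or more, the sweeps must be iterated because demeaning one dimension re-introduces small imbalances in the others.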

  52. Correia, S., Guimaraes, P., & Zylkin, T. (2020). Fast Poisson Estimation with High-Dimensional Fixed Effects. Stata Journal, 20(1), 95–115.

    doi.org/10.1177/1536867X20909691

    ppmlhdfe · high-dimensional-FE · PPML · Stata
    Annotation

    Correia, Guimaraes, and Zylkin introduce the ppmlhdfe Stata command for fast Poisson estimation with multiple levels of fixed effects, making PPML feasible for large datasets with high-dimensional fixed effects. This tool has become standard for applied researchers working with count data in panel settings.

  53. Cox, D. R. (1972). Regression Models and Life-Tables. Journal of the Royal Statistical Society: Series B (Methodological), 34(2), 187–220.

    doi.org/10.1111/j.2517-6161.1972.tb00899.x

    foundational · proportional-hazards · partial-likelihood
    Annotation

    Cox introduces the proportional hazards model with an unspecified baseline hazard, estimated via a conditional likelihood argument (later formalized as partial likelihood in Cox, 1975). The semiparametric approach avoids distributional assumptions on the baseline hazard while allowing covariate effects to be estimated consistently. It is one of the most cited papers in statistics.
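
    In standard notation (assumed here, not quoted from the paper), with event indicator $\delta_i$, covariates $x_i$, and risk set $R(t_i)$ at each observed failure time $t_i$, the partial likelihood is:

```latex
L(\beta) \;=\; \prod_{i \,:\, \delta_i = 1}
  \frac{\exp(x_i^{\prime}\beta)}
       {\sum_{j \in R(t_i)} \exp(x_j^{\prime}\beta)}
```

    The baseline hazard cancels from each factor, which is why no distributional assumption on it is needed.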

  54. Crepon, B., Duflo, E., Gurgand, M., Rathelot, R., & Zamora, P. (2013). Do Labor Market Policies Have Displacement Effects? Evidence from a Clustered Randomized Experiment. Quarterly Journal of Economics, 128(2), 531–580.

    doi.org/10.1093/qje/qjt001

    Application · on experimental design
    job-placement · displacement-effects · cluster-RCT · France
    Annotation

    Crepon and colleagues evaluate a job placement assistance program in France using a two-step clustered randomization design that varies treatment intensity across 235 labor markets. The paper's key contribution is identifying displacement effects: treated job seekers gain at the expense of untreated competitors, particularly in weak labor markets and among workers with similar skills. This innovative experimental design allows estimation of both direct and indirect (general equilibrium) effects of active labor market policies.

  55. Cunat, V., Gine, M., & Guadalupe, M. (2012). The Vote Is Cast: The Effect of Corporate Governance on Shareholder Value. Journal of Finance, 67(5), 1943–1977.

    doi.org/10.1111/j.1540-6261.2012.01776.x

    corporate-governance · shareholder-value · close-votes
    Annotation

    Cunat, Gine, and Guadalupe use a fuzzy RDD around the majority threshold in shareholder governance proposals to estimate the causal effect of governance provisions on firm value. This paper is a leading example of fuzzy RDD applied to corporate governance and finance.

  56. Cunningham, S., & Shah, M. (2018). Decriminalizing Indoor Prostitution: Implications for Sexual Violence and Public Health. Review of Economic Studies, 85(3), 1683–1715.

    doi.org/10.1093/restud/rdx065

    Application · on synthetic control
    policy-evaluation · public-health · crime
    Annotation

    Cunningham and Shah use the synthetic control method to study how Rhode Island's accidental decriminalization of indoor prostitution affected sex crimes and STI rates. This study is a well-known application that illustrates how synthetic control can exploit a unique policy change affecting a single unit.

  57. Cunningham, S. (2021). Causal Inference: The Mixtape. Yale University Press.

    doi.org/10.12987/9780300255881

    textbook · causal-inference · accessible · code-examples
    Annotation

    Cunningham provides an accessible textbook with an excellent DiD chapter that walks through the intuition, the math, and the code (in Stata and R). Freely available online at mixtape.scunning.com, it is a valuable companion for students who want worked examples alongside formal treatment.

D
15
  1. Dahabreh, I. J., Robertson, S. E., Tchetgen Tchetgen, E. J., Stuart, E. A., & Hernan, M. A. (2019). Generalizing Causal Inferences from Individuals in Randomized Trials to All Trial-Eligible Individuals. Biometrics, 75(2), 685–694.

    doi.org/10.1111/biom.13009

    Foundational · on external validity
    Annotation

    Dahabreh, Robertson, Tchetgen Tchetgen, Stuart, and Hernan develop a formal framework for generalizing causal inferences from randomized trial participants to all trial-eligible individuals in a target population, using baseline covariate data from both randomized and non-randomized individuals. They establish identifiability conditions and propose inverse probability weighting, outcome modeling, and doubly robust estimators for the target population average treatment effect. Researchers conducting trials nested within observational cohorts can apply this framework to estimate treatment effects for the full eligible population rather than only for those who enrolled.

  2. Davis, J., & Heller, S. B. (2017). Using Causal Forests to Predict Treatment Heterogeneity: An Application to Summer Jobs. American Economic Review, 107(5), 546–550.

    doi.org/10.1257/aer.p20171000

    Application · on causal forests
    policy-evaluation · summer-jobs · targeting
    Annotation

    Davis and Heller apply causal forests to a randomized summer jobs program for disadvantaged youth in Chicago, exploring how useful predicted treatment effect heterogeneity is in practice. They find the method can identify heterogeneity for some outcomes that standard interaction methods miss, while highlighting limitations of the approach.

  3. de Chaisemartin, C., & D'Haultfoeuille, X. (2020). Two-Way Fixed Effects Estimators with Heterogeneous Treatment Effects. American Economic Review, 110(9), 2964–2996.

    doi.org/10.1257/aer.20181169

    negative-weights · TWFE · heterogeneous-effects
    Annotation

    De Chaisemartin and D'Haultfoeuille show that the TWFE estimator can assign negative weights to some treatment effects, potentially producing estimates with the wrong sign. They propose an alternative estimator and a decomposition that reveals which group-time effects receive negative weights.

  4. de Chaisemartin, C., & D'Haultfoeuille, X. (2023). Two-Way Fixed Effects and Differences-in-Differences with Heterogeneous Treatment Effects: A Survey. Econometrics Journal, 26(3), C1–C30.

    doi.org/10.1093/ectj/utac017

    Survey · on event studies
    TWFE · heterogeneous-effects · survey · DID
    Annotation

    De Chaisemartin and D'Haultfoeuille provide a comprehensive survey of the recent literature on problems with two-way fixed effects estimators under heterogeneous treatment effects. They cover the key diagnostic tests (including the Goodman-Bacon decomposition), alternative estimators that are robust to heterogeneity, and practical guidance for choosing among them. The survey is essential reading for applied researchers working with event-study and difference-in-differences designs who need to understand when standard TWFE is and is not appropriate.

  5. Dehejia, R. H., & Wahba, S. (1999). Causal Effects in Nonexperimental Studies: Reevaluating the Evaluation of Training Programs. Journal of the American Statistical Association, 94(448), 1053–1062.

    doi.org/10.1080/01621459.1999.10473858

    Application · on matching methods
    propensity-score · program-evaluation · experimental-benchmark
    Annotation

    Dehejia and Wahba show that propensity score matching can replicate experimental estimates of a job training program using observational data, revisiting LaLonde's influential critique. The paper demonstrates the practical value of matching by showing that propensity score methods yield estimates much closer to the experimental benchmark than the nonexperimental estimators LaLonde had examined.

  6. Dell, M. (2010). The Persistent Effects of Peru's Mining Mita. Econometrica, 78(6), 1863–1903.

    doi.org/10.3982/ECTA8121

    geographic-RDD · colonial-institutions · persistence · spatial-discontinuity
    Annotation

    Dell uses a geographic RDD exploiting the historical boundary of the mita forced labor system in Peru to estimate the persistent effect of colonial institutions on economic outcomes centuries later. The study demonstrates how RDD can exploit spatial discontinuities, not just score-based cutoffs.

  7. Deshpande, M., & Li, Y. (2019). Who Is Screened Out? Application Costs and the Targeting of Disability Programs. American Economic Journal: Economic Policy, 11(4), 213–248.

    doi.org/10.1257/pol.20180076

    disability-policy · staggered-rollout · field-office-closures
    Annotation

    Deshpande and Li use staggered closings of Social Security field offices across the United States to estimate the effects of application costs on disability program participation. The staggered timing of office closures provides quasi-experimental variation in application costs, and the paper demonstrates how treatment-timing variation can be leveraged for credible policy evaluation.

  8. Dong, Y. (2015). Regression Discontinuity Applications with Rounding Errors in the Running Variable. Journal of Applied Econometrics, 30(3), 422–446.

    doi.org/10.1002/jae.2369

    rounding-errors · discrete-running-variable · diagnostics · measurement
    Annotation

    Dong examines regression discontinuity designs when the running variable is subject to rounding or heaping, a common practical concern. She shows that standard RD estimators can be biased in such settings and derives correction formulas for the resulting discretization bias, extending the applicability of RDD to settings with imperfect measurement of the running variable.

  9. Dong, Y., & Lewbel, A. (2015). Identifying the Effect of Changing the Policy Threshold in Regression Discontinuity Models. Review of Economics and Statistics, 97(5), 1081–1092.

    doi.org/10.1162/REST_a_00510

    policy-threshold · counterfactual · fuzzy-RDD-extensions
    Annotation

    Dong and Lewbel show that the derivative of the RD treatment effect with respect to the running variable at the cutoff is identified. Under a local policy-invariance interpretation, this derivative can be used to evaluate counterfactual policies that shift the eligibility threshold, broadening the policy relevance of RDD beyond the effect at the existing cutoff.

  10. Doudchenko, N., & Imbens, G. W. (2016). Balancing, Regression, Difference-in-Differences and Synthetic Control Methods: A Synthesis. NBER Working Paper No. 22791.

    doi.org/10.3386/w22791

    Foundational · on synthetic control
    unification · DID-connection · penalized-regression
    Annotation

    Doudchenko and Imbens place synthetic control within a broader framework that includes DID and regression as special cases, proposing extensions that relax the non-negativity and adding-up constraints on weights. This paper helps researchers understand the connections between synthetic control and other methods.
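
    As a concrete illustration of the constrained-weights problem these extensions relax, here is a minimal sketch of the canonical synthetic control fit: least squares over non-negative weights that sum to one, solved by Frank-Wolfe on simulated data (the data-generating process and function names are illustrative, not from the paper).

```python
import numpy as np

def synth_weights(Y0, y1, iters=5000):
    """Frank-Wolfe minimization of ||y1 - Y0 @ w||^2 over the simplex
    (w >= 0, sum w = 1), the constraint set of canonical synthetic control."""
    J = Y0.shape[1]
    w = np.full(J, 1.0 / J)
    for k in range(iters):
        grad = Y0.T @ (Y0 @ w - y1)                 # gradient of the squared error
        s = np.zeros(J); s[np.argmin(grad)] = 1.0   # best simplex vertex
        w += (2.0 / (k + 2.0)) * (s - w)            # standard Frank-Wolfe step
    return w

rng = np.random.default_rng(0)
Y0 = rng.normal(size=(20, 8))                 # 20 pre-periods, 8 donor units
true_w = np.array([0.5, 0.3, 0.2, 0, 0, 0, 0, 0])
y1 = Y0 @ true_w                              # treated unit is a convex combination
w = synth_weights(Y0, y1)
print(np.round(w, 2))                         # weights concentrate on donors 0-2
```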

  11. Dranove, D., & Olsen, C. (1994). The Economic Side Effects of Dangerous Drug Announcements. Journal of Law and Economics, 37(2), 323–348.

    doi.org/10.1086/467316

    Application · on event studies
    pharmaceutical · FDA · regulation · stock-market
    Annotation

    Dranove and Olsen use event studies to measure the stock market impact of FDA drug safety announcements on pharmaceutical firms. This application demonstrates how event studies can quantify the financial consequences of regulatory actions in health care and management contexts.

  12. Dube, A., Girardi, D., Jordà, Ò., & Taylor, A. M. (2025). A Local Projections Approach to Difference-in-Differences. Journal of Applied Econometrics, 40(7), 741–758.

    doi.org/10.1002/jae.70000

    local-projections · dynamic-effects · event-study
    Annotation

    Dube and colleagues propose a local projections (LP) approach to difference-in-differences estimation that combines LPs with a flexible 'clean control' condition to define appropriate treated and control units. The LP-DiD estimator subsumes many recent solutions to negative weighting problems, accommodates covariates and nonabsorbing treatments, and is simple to implement.

  13. Duflo, E. (2001). Schooling and Labor Market Consequences of School Construction in Indonesia: Evidence from an Unusual Policy Experiment. American Economic Review, 91(4), 795–813.

    doi.org/10.1257/aer.91.4.795

    education · school-construction · Indonesia · treatment-intensity
    Annotation

    Duflo uses DiD to compare cohorts exposed to a massive school construction program in Indonesia with older cohorts not exposed, across regions with different program intensity. The study is an unusually clean application showing how DiD can exploit variation in treatment intensity across space and cohorts.

  14. Duflo, E., Glennerster, R., & Kremer, M. (2007). Using Randomization in Development Economics Research: A Toolkit. Handbook of Development Economics, 4, 3895–3962.

    doi.org/10.1016/S1573-4471(07)04061-2

    development-economics · toolkit · field-experiments · practical-guide
    Annotation

    Duflo, Glennerster, and Kremer write a comprehensive practical guide to running randomized experiments in development economics. The chapter covers all stages from design to analysis, including power calculations, stratification, dealing with attrition, and estimating treatment effects with imperfect compliance. It has become required reading for anyone designing a field experiment.

  15. Dunning, T. (2012). Natural Experiments in the Social Sciences: A Design-Based Approach. Cambridge University Press.

    doi.org/10.1017/CBO9781139084444

    Foundational · on experimental design
    natural-experiments · design-based · textbook · social-sciences
    Annotation

    Dunning provides a systematic framework for identifying and analyzing natural experiments across the social sciences. The book covers as-if random assignment, instrumental variables, regression discontinuity, and difference-in-differences through a unified design-based lens, making it essential reading for researchers exploiting natural variation for causal inference.

F
17
  1. Fama, E. F., Fisher, L., Jensen, M. C., & Roll, R. (1969). The Adjustment of Stock Prices to New Information. International Economic Review, 10(1), 1–21.

    doi.org/10.2307/2525569

    Foundational · on event studies
    stock-prices · abnormal-returns · market-efficiency
    Annotation

    Fama, Fisher, Jensen, and Roll establish the modern event study methodology by studying how stock prices adjust to stock splits. They develop the framework of measuring abnormal returns around corporate events using a market model to construct the counterfactual return. This methodology has become the standard tool for studying how information events affect asset prices and is used in thousands of subsequent studies across finance and strategy.

  2. Fan, Q., Hsu, Y.-C., Lieli, R. P., & Zhang, Y. (2022). Estimation of Conditional Average Treatment Effects with High-Dimensional Data. Journal of Business & Economic Statistics, 40(1), 313–327.

    doi.org/10.1080/07350015.2020.1811102

    CATE · high-dimensional · doubly-robust
    Annotation

    Fan and colleagues propose nonparametric estimators for conditional average treatment effects in high-dimensional settings. Their approach uses machine learning to estimate nuisance functions in a first stage, then applies local linear regression for the CATE function of interest, with functional limit theory and multiplier-bootstrap uniform inference.

  3. Fine, J. P., & Gray, R. J. (1999). A Proportional Hazards Model for the Subdistribution of a Competing Risk. Journal of the American Statistical Association, 94(446), 496–509.

    doi.org/10.1080/01621459.1999.10474144

    foundational · competing-risks · subdistribution
    Annotation

    Fine and Gray develop a regression model for the cumulative incidence function under competing risks. The Fine-Gray model extends the Cox framework to settings where multiple event types compete, allowing estimation of covariate effects on the subdistribution hazard.

  4. Finkelstein, A., Taubman, S., Wright, B., Bernstein, M., Gruber, J., Newhouse, J. P., Allen, H., Baicker, K., & The Oregon Health Study Group (2012). The Oregon Health Insurance Experiment: Evidence from the First Year. Quarterly Journal of Economics, 127(3), 1057–1106.

    doi.org/10.1093/qje/qjs020

    Application · on experimental design
    health-insurance · lottery · LATE · field-experiment
    Annotation

    Finkelstein and colleagues analyze the Oregon Health Insurance Experiment, in which uninsured low-income adults are selected by lottery for the chance to apply for Medicaid. Using this randomized controlled design with IV to handle noncompliance, they estimate the local average treatment effect of Medicaid coverage on health care utilization, financial strain, and self-reported health. The study demonstrates the practical difference between intent-to-treat and LATE estimates in a real-world experiment where not all lottery winners enrolled.

  5. Firpo, S., Fortin, N. M., & Lemieux, T. (2009). Unconditional Quantile Regressions. Econometrica, 77(3), 953–973.

    doi.org/10.3982/ECTA6822

    foundational · unconditional-quantile · RIF
    Annotation

    Firpo, Fortin, and Lemieux introduce the recentered influence function (RIF) regression for estimating unconditional quantile effects. They show that standard quantile regression estimates conditional quantile effects that do not aggregate to unconditional effects. RIF regression transforms the outcome variable so that OLS on the transformed outcome recovers the effect of covariates on unconditional quantiles. This innovation is what enables policy-relevant distributional analysis.
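
    The central object can be stated compactly. For the τ-th quantile q_τ of the outcome distribution, with marginal density f_Y (standard notation, my transcription rather than the paper's exact display):

```latex
\operatorname{RIF}(y;\,q_\tau) \;=\; q_\tau \;+\; \frac{\tau - \mathbf{1}\{y \le q_\tau\}}{f_Y(q_\tau)}
```

    Regressing this transformed outcome on covariates by OLS recovers the effect of marginal shifts in the covariate distribution on the unconditional quantile.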

  6. Firpo, S., & Possebom, V. (2018). Synthetic Control Method: Inference, Sensitivity Analysis and Confidence Sets. Journal of Causal Inference, 6(2), 1–26.

    doi.org/10.1515/jci-2016-0026

    Foundational · on synthetic control
    inference · sensitivity-analysis · confidence-sets
    Annotation

    Firpo and Possebom develop formal inference procedures for the synthetic control method, including sensitivity analysis tools and confidence sets. Their framework provides a more rigorous basis for statistical inference in synthetic control applications beyond the standard permutation-based placebo tests.

  7. Fisher, R. A. (1935). The Design of Experiments. Oliver & Boyd.

    randomization · factorial-design · foundations
    Annotation

    Fisher's classic book lays the foundations of experimental design, introducing concepts like randomization, blocking, and factorial designs. The 'lady tasting tea' example from this book remains one of the most famous illustrations of hypothesis testing and the logic of controlled experiments.

  8. Flammer, C. (2015). Does Corporate Social Responsibility Lead to Superior Financial Performance? A Regression Discontinuity Approach. Management Science, 61(11), 2549–2568.

    doi.org/10.1287/mnsc.2014.2038

    CSR · shareholder-voting · management-science
    Annotation

    Flammer uses a regression discontinuity design around close-call shareholder votes on CSR proposals, comparing proposals that pass or fail by a small margin as a quasi-experiment. She finds that adopting CSR proposals leads to positive announcement returns and superior accounting performance, with effects operating through labor productivity and sales growth. Published in Management Science, it is a prominent example of RDD in top management journals.

  9. Fleming, L., & Sorenson, O. (2001). Technology as a Complex Adaptive System: Evidence from Patent Data. Research Policy, 30(7), 1019–1039.

    doi.org/10.1016/S0048-7333(00)00135-9

    patent-citations · technology-complexity · innovation
    Annotation

    Fleming and Sorenson use negative binomial regression on patent citation counts to study how the complexity of technological combinations affects the usefulness of inventions. This paper is a prominent application of count models in the innovation and technology management literature.

  10. Frake, J., Gibbs, A., Goldfarb, B., Hiraiwa, T., Starr, E., & Yamaguchi, S. (2025). From Perfect to Practical: Partial Identification Methods for Causal Inference in Strategic Management Research. Strategic Management Journal, 46(8), 1894–1929.

    doi.org/10.1002/smj.3714

    partial-identification · sensitivity-analysis · bounds · management
    Annotation

    Frake and colleagues introduce partial identification methods to strategic management, providing a practical framework for assessing the sensitivity of difference-in-differences and instrumental variables estimates to violations of identifying assumptions. The paper demonstrates how researchers can construct informative bounds on treatment effects when parallel trends or exclusion restriction assumptions are relaxed. It bridges the gap between the theoretical ideal of point identification and the practical reality that identifying assumptions are rarely perfectly satisfied.

  11. Frank, K. A. (2000). Impact of a Confounding Variable on a Regression Coefficient. Sociological Methods & Research, 29(2), 147–194.

    doi.org/10.1177/0049124100029002001

    Foundational · on sensitivity analysis
    ITCV · confounding-variable · threshold
    Annotation

    Frank develops the impact threshold for a confounding variable (ITCV), which calculates how much bias an omitted variable would need to introduce to invalidate an inference. This approach is widely adopted in education and management research.

  12. Freeman, R. B., & Medoff, J. L. (1984). What Do Unions Do? Basic Books.

    Application · on fixed effects
    union-wage-premium · fixed-effects · labor-economics
    Annotation

    Freeman and Medoff examine the effects of unions on wages, productivity, inequality, and workplace governance, drawing on a wide range of data sources and econometric methods including longitudinal analysis. The book argues that unions have both a monopoly face (raising wages above competitive levels) and a collective voice face (improving workplace communication and reducing turnover). It remains influential as a comprehensive empirical assessment of union effects and a common pedagogical motivation for fixed effects methods in labor economics.

  13. Fremeth, A. R., Holburn, G. L. F., & Richter, B. K. (2016). Bridging Qualitative and Quantitative Methods in Organizational Research: Applications of Synthetic Control Methodology in the U.S. Automobile Industry. Organization Science, 27(2), 462–482.

    doi.org/10.1287/orsc.2015.1034

    Application · Mgmt · on synthetic control
    management · strategy · firm-level-synthetic-control
    Annotation

    Fremeth, Holburn, and Richter introduce synthetic control methodology to strategic management research, demonstrating its application for studying the causal effect of organizational and regulatory events on individual firms. The paper shows how data-driven counterfactuals can replace ad-hoc comparison group selection in comparative case studies. It provides a template for strategy researchers seeking to apply synthetic control methods to firm-level outcome data with few treated units.

  14. Freyaldenhoven, S., Hansen, C., & Shapiro, J. M. (2019). Pre-Event Trends in the Panel Event-Study Design. American Economic Review, 109(9), 3307–3338.

    doi.org/10.1257/aer.20180609

    Foundational · on event studies
    pre-trends · panel-data · instrumental-variables
    Annotation

    Freyaldenhoven, Hansen, and Shapiro study panel event-study designs in which unobserved confounds can generate pre-event trends. They show how causal effects can still be identified by exploiting covariates related to the policy only through the confounds, yielding a 2SLS estimator that remains valid even when endogeneity induces pre-trends.

  15. Friebel, G., Heinz, M., & Zubanov, N. (2022). Middle Managers, Personnel Turnover, and Performance: A Long-Term Field Experiment in a Retail Chain. Management Science, 68(1), 211–229.

    doi.org/10.1287/mnsc.2020.3905

    Application · Mgmt · on experimental design
    field-experiment · RCT · management-practices · turnover · retail
    Annotation

    Friebel, Heinz, and Zubanov conduct a long-term randomized field experiment in a large Eastern European retail chain, in which the CEO asked treated store managers to reduce employee quit rates. The intervention decreased the quit rate by a fifth to a quarter; the effect lasted about nine months before petering out, then reappeared after a reminder. There was, however, no treatment effect on sales, illustrating that reducing turnover does not automatically translate into improved store performance.

  16. Frisch, R., & Waugh, F. V. (1933). Partial Time Regressions as Compared with Individual Trends. Econometrica, 1(4), 387–401.

    doi.org/10.2307/1907330

    Foundational · on OLS regression
    FWL-theorem · partialling-out · multiple-regression · fixed-effects
    Annotation

    Frisch and Waugh establish that a coefficient in a multiple regression can be obtained by first residualizing both the outcome and the regressor against all other covariates. The Frisch-Waugh-Lovell (FWL) theorem provides the theoretical foundation for understanding what 'controlling for' means in multiple regression and is the basis for modern fixed-effects estimation.
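
    The theorem is easy to verify numerically; a minimal sketch with simulated data (variable names and the data-generating process are mine):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
W = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # controls + intercept
x = W @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)      # regressor of interest
y = 2.0 * x + W @ np.array([0.2, 1.0, 0.7]) + rng.normal(size=n)

# Coefficient on x in the full multiple regression of y on [x, W].
full = np.linalg.lstsq(np.column_stack([x, W]), y, rcond=None)[0][0]

# FWL: residualize both y and x on the controls W, then regress
# residual on residual; the coefficient is identical.
x_res = x - W @ np.linalg.lstsq(W, x, rcond=None)[0]
y_res = y - W @ np.linalg.lstsq(W, y, rcond=None)[0]
fwl = (x_res @ y_res) / (x_res @ x_res)
print(np.isclose(full, fwl))
```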

  17. Funk, M. J., Westreich, D., Wiesen, C., Stürmer, T., Brookhart, M. A., & Davidian, M. (2011). Doubly Robust Estimation of Causal Effects. American Journal of Epidemiology, 173(7), 761–767.

    doi.org/10.1093/aje/kwq439

    epidemiology · tutorial · AIPW
    Annotation

    Funk and colleagues provide a practical tutorial on doubly robust estimation for epidemiologists, demonstrating through a worked example how the AIPW estimator protects against misspecification of either the outcome model or the propensity score model. This paper helps spread the method in health sciences.
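
    A compact sketch of an AIPW estimator of the kind the tutorial describes, on simulated data with hand-rolled OLS and logistic fits (all names and the data-generating process are illustrative, not the paper's worked example):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000
x = rng.normal(size=n)
p_true = 1 / (1 + np.exp(-0.5 * x))           # true propensity score
d = (rng.uniform(size=n) < p_true).astype(float)
y = 1.0 * d + 2.0 * x + rng.normal(size=n)    # true ATE = 1

X = np.column_stack([np.ones(n), x])

# Outcome regressions fit separately within each treatment arm.
m1 = X @ np.linalg.lstsq(X[d == 1], y[d == 1], rcond=None)[0]
m0 = X @ np.linalg.lstsq(X[d == 0], y[d == 0], rcond=None)[0]

# Propensity model: logistic regression fit by Newton's method.
g = np.zeros(2)
for _ in range(25):
    p = 1 / (1 + np.exp(-(X @ g)))
    Wt = (p * (1 - p))[:, None]
    g += np.linalg.solve(X.T @ (Wt * X), X.T @ (d - p))
p_hat = 1 / (1 + np.exp(-(X @ g)))

# AIPW combines both models; it is consistent if either one is correct.
psi = m1 - m0 + d * (y - m1) / p_hat - (1 - d) * (y - m0) / (1 - p_hat)
ate = psi.mean()
print(round(ate, 2))  # close to the true ATE of 1
```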

G
19
  1. Gelman, A., & Carlin, J. (2014). Beyond Power Calculations: Assessing Type S (Sign) and Type M (Magnitude) Errors. Perspectives on Psychological Science, 9(6), 641–651.

    doi.org/10.1177/1745691614551642

    Foundational · on power analysis
    Type-S-error · Type-M-error · exaggeration-ratio
    Annotation

    Gelman and Carlin extend traditional power analysis by introducing Type S (sign) errors (the probability a significant estimate has the wrong sign) and Type M (magnitude) errors (the expected exaggeration ratio of significant estimates). These concepts provide a richer understanding of what happens in underpowered studies.
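
    Both quantities are easy to compute by simulation; a sketch with assumed numbers (a true effect of 0.1 and a standard error of 0.5, chosen for illustration rather than taken from the paper):

```python
import numpy as np

true_effect, se = 0.1, 0.5                         # assumed design parameters
rng = np.random.default_rng(3)
est = rng.normal(true_effect, se, size=1_000_000)  # replicated estimates
sig = np.abs(est / se) > 1.96                      # two-sided 5% significance

power = sig.mean()                        # chance of significance (low here)
type_s = (est[sig] < 0).mean()            # wrong sign among significant results
type_m = np.abs(est[sig]).mean() / true_effect    # average exaggeration ratio
print(round(power, 3), round(type_s, 2), round(type_m, 1))
```

    In this underpowered design, roughly a quarter of significant estimates have the wrong sign and the significant ones exaggerate the true effect by an order of magnitude.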

  2. Gelman, A., & Loken, E. (2014). The Statistical Crisis in Science. American Scientist, 102(6), 460–465.

    doi.org/10.1511/2014.111.460

    Foundational · on pre-registration
    replication-crisis · statistical-crisis · pre-registration
    Annotation

    Gelman and Loken argue that data-dependent analysis creates a 'garden of forking paths' that explains why many statistically significant comparisons do not hold up. They emphasize that researchers' analytical choices conditional on data characteristics inflate false positive rates even without deliberate p-hacking.

  3. Gelman, A., & Imbens, G. W. (2019). Why High-Order Polynomials Should Not Be Used in Regression Discontinuity Designs. Journal of Business & Economic Statistics, 37(3), 447–456.

    doi.org/10.1080/07350015.2017.1366909

    polynomial-order · local-polynomial · best-practices · bandwidth
    Annotation

    Gelman and Imbens show that using high-order global polynomials in RDD leads to noisy estimates, sensitivity to the degree of polynomial, and poor coverage of confidence intervals. They recommend local linear or quadratic fits with appropriate bandwidth selection instead, fundamentally changing best practice for RDD estimation.
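
    A minimal sketch of the recommended local linear estimator on simulated data (the bandwidth and data-generating process are illustrative; in practice the bandwidth would be chosen by a data-driven selector):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50_000
x = rng.uniform(-1, 1, size=n)                # running variable, cutoff at 0
treat = (x >= 0).astype(float)
y = np.sin(2 * x) + 2.0 * treat + rng.normal(scale=0.5, size=n)  # true jump = 2

# Local linear fit: keep observations within a bandwidth of the cutoff and
# allow separate intercepts and slopes on each side; the coefficient on
# `treat` estimates the discontinuity.
h = 0.1
keep = np.abs(x) < h
Z = np.column_stack([np.ones(keep.sum()), treat[keep], x[keep], (treat * x)[keep]])
beta = np.linalg.lstsq(Z, y[keep], rcond=None)[0]
print(round(beta[1], 2))  # close to the true jump of 2
```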

  4. Gerard, F., Rokkanen, M., & Rothe, C. (2020). Bounds on Treatment Effects in Regression Discontinuity Designs with a Manipulated Running Variable. Quantitative Economics, 11(3), 839–870.

    doi.org/10.3982/QE1079

    Foundational · on Lee bounds
    RDD · manipulation · running-variable
    Annotation

    Gerard, Rokkanen, and Rothe study regression-discontinuity settings in which the running variable is manipulated, so conventional point identification fails. They show that treatment effects are still partially identified and derive sharp bounds under a general model in which the extent of manipulation is learned from the data.

  5. Gerber, A. S., & Green, D. P. (2012). Field Experiments: Design, Analysis, and Interpretation. W. W. Norton.

    field-experiments · textbook · political-science
    Annotation

    Gerber and Green write a comprehensive textbook on field experiments covering randomization, blocking, clustering, noncompliance, and attrition. The book provides rigorous treatment of experimental design principles with practical guidance drawn from political science and public policy applications. It is particularly valuable for its coverage of complications that arise in real-world experiments, including how to handle noncompliance through intent-to-treat analysis and instrumental variables.

  6. Glynn, A. N., & Quinn, K. M. (2010). An Introduction to the Augmented Inverse Propensity Weighted Estimator. Political Analysis, 18(1), 36–56.

    doi.org/10.1093/pan/mpp036

    political-science · tutorial · AIPW
    Annotation

    Glynn and Quinn introduce the AIPW estimator to political scientists, providing intuition, simulation evidence, and practical guidance. This tutorial demonstrates the advantages of doubly robust methods over propensity score weighting or outcome regression alone in social science applications.

  7. Gneezy, U., & List, J. A. (2006). Putting Behavioral Economics to Work: Testing for Gift Exchange in Labor Markets Using Field Experiments. Econometrica, 74(5), 1365–1384.

    doi.org/10.1111/j.1468-0262.2006.00707.x

    Annotation

    Gneezy and List conduct field experiments to test gift exchange in labor markets. Workers who received an unexpectedly higher wage initially increased effort, but the effect dissipated within hours, suggesting that strong forms of gift exchange may not persist outside the laboratory.

  8. Gobillon, L., & Magnac, T. (2016). Regional Policy Evaluation: Interactive Fixed Effects and Synthetic Controls. Review of Economics and Statistics, 98(3), 535–551.

    doi.org/10.1162/REST_a_00537

    Foundational · on synthetic control
    interactive-fixed-effects · factor-models · regional-policy
    Annotation

    Gobillon and Magnac connect synthetic control to interactive fixed-effects models, showing that synthetic control can be interpreted as an estimator that allows for time-varying factor loadings. This paper bridges the synthetic control and factor model literatures.

  9. Goldfarb, B., & King, A. A. (2016). Scientific Apophenia in Strategic Management Research: Significance Tests & Mistaken Inference. Strategic Management Journal, 37(1), 167–176.

    doi.org/10.1002/smj.2459

    Application · Mgmt · on specification curve
    apophenia · strategic-management · robustness
    Annotation

    Goldfarb and King use distributional matching and posterior predictive checks to estimate that 24–40% of significant coefficients in strategic management research would become insignificant if the studies were repeated. They document the problem of apophenia (finding patterns in noise) and offer practical suggestions for reducing false and inflated findings at both the individual and field level.

  10. Goldsmith-Pinkham, P., Sorkin, I., & Swift, H. (2020). Bartik Instruments: What, When, Why, and How. American Economic Review, 110(8), 2586–2624.

    doi.org/10.1257/aer.20181047

    share-exogeneity · decomposition · identification
    Annotation

    Goldsmith-Pinkham, Sorkin, and Swift provide a rigorous econometric framework for shift-share instruments, showing that the Bartik instrument can be decomposed into a weighted sum of individual share-based instruments. They clarify that identification requires exogeneity of the initial shares, not the shocks.

  11. Goodman-Bacon, A. (2021). Difference-in-Differences with Variation in Treatment Timing. Journal of Econometrics, 225(2), 254–277.

    doi.org/10.1016/j.jeconom.2021.03.014

    TWFE-decomposition · treatment-timing · negative-weights
    Annotation

    Goodman-Bacon decomposes the two-way fixed-effects DID estimator into a weighted average of all possible two-group, two-period DID comparisons, revealing that some comparisons use already-treated units as controls. The decomposition clarifies when already-treated units enter as controls and why this can make the estimator difficult to interpret under treatment-effect heterogeneity.

  12. Gornall, W., & Strebulaev, I. A. (2025). Gender, Race, and Entrepreneurship: A Randomized Field Experiment on Venture Capitalists and Angels. Management Science, 71(6), 5308–5327.

    doi.org/10.1287/mnsc.2024.4990

    Application · Mgmt · on experimental design
    audit-study · correspondence-study · discrimination · venture-capital · entrepreneurship · +2
    Annotation

    Gornall and Strebulaev conduct a large-scale correspondence experiment, sending approximately 80,000 pitch emails from fictitious startups to 28,000 venture capitalists and angel investors. By randomly varying the entrepreneur's name to signal gender and race, they find that female entrepreneurs received 9% more interested replies than male entrepreneurs, and Asian-surname entrepreneurs received 6% more responses than White-surname entrepreneurs, indicating favorable rather than adverse bias. The paper provides large-scale experimental evidence on investor response patterns by entrepreneur demographics in entrepreneurial finance.

  13. Gourieroux, C., Monfort, A., & Trognon, A. (1984). Pseudo Maximum Likelihood Methods: Theory. Econometrica, 52(3), 681–700.

    doi.org/10.2307/1913471

    pseudo-MLE · Poisson-regression · robust-estimation · PPML
    Annotation

    Gourieroux, Monfort, and Trognon develop the general theory of pseudo maximum likelihood estimation for cases in which the likelihood family may be misspecified. They derive conditions for consistency and asymptotic normality and characterize efficiency bounds in this broader framework. The Poisson PML result — consistency for the conditional mean under misspecification — is a special case that underpins the later widespread use of Poisson regression with robust standard errors.
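
    The Poisson PML robustness property is easy to see in simulation; a sketch with an overdispersed data-generating process (my construction, not an example from the paper):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 20_000
x = rng.normal(size=n)
mu = np.exp(0.5 + 0.8 * x)                   # true conditional mean
# Overdispersed counts: gamma heterogeneity makes y non-Poisson, but the
# conditional mean E[y|x] = mu is intact, which is all Poisson PML requires.
y = rng.poisson(mu * rng.gamma(1.0, 1.0, size=n))

X = np.column_stack([np.ones(n), x])
beta = np.array([np.log(y.mean()), 0.0])     # simple starting value
for _ in range(50):                          # Newton steps on the Poisson score
    m = np.exp(X @ beta)
    beta += np.linalg.solve(X.T @ (m[:, None] * X), X.T @ (y - m))
print(np.round(beta, 2))  # close to (0.5, 0.8) despite the misspecification
```

    In applied work the same point is usually made by running Poisson regression with robust (sandwich) standard errors rather than trusting the Poisson likelihood's variance.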

  14. Grambsch, P. M., & Therneau, T. M. (1994). Proportional Hazards Tests and Diagnostics Based on Weighted Residuals. Biometrika, 81(3), 515–526.

    doi.org/10.1093/biomet/81.3.515

    foundational · diagnostics · schoenfeld-residuals
    Annotation

    Grambsch and Therneau introduce the scaled Schoenfeld residual test for the proportional hazards assumption. Plotting scaled Schoenfeld residuals against time reveals time-varying effects. The test is the standard diagnostic in applied survival analysis.

  15. Grant, A. M. (2008). The Significance of Task Significance: Job Performance Effects, Relational Mechanisms, and Boundary Conditions. Journal of Applied Psychology, 93(1), 108–124.

    doi.org/10.1037/0021-9010.93.1.108

    Application · on experimental design
    task-significance · motivation · organizational-behavior · field-experiment
    Annotation

    Grant conducts field experiments showing that briefly exposing workers to the beneficiaries of their work significantly increased their motivation and performance. This paper is a well-known example of experimental design applied within organizational behavior research.

  16. Greve, H. R. (2003). A Behavioral Theory of R&D Expenditures and Innovations: Evidence from Shipbuilding. Academy of Management Journal, 46(6), 685–702.

    doi.org/10.5465/30040661

    Application · Mgmt · on Poisson negative binomial
    behavioral-theory · aspiration-levels · innovation · R&D · negative-binomial · +1
    Annotation

    Greve tests behavioral theory predictions about how performance relative to aspiration levels affects R&D investment and innovation output using count models in the Japanese shipbuilding industry. He finds that low performance triggers problemistic search (increasing R&D), high slack triggers slack search (also increasing R&D), and low performance increases risk tolerance for launching innovations. The paper demonstrates how to model count-based innovation outcomes with firm-level panel data in a management context.

  17. Griliches, Z. (1977). Estimating the Returns to Schooling: Some Econometric Problems. Econometrica, 45(1), 1–22.

    doi.org/10.2307/1913285

    Foundationalon ols regression
    ability-bias · returns-to-education · omitted-variables
    Annotation

    Griliches systematically examines the biases in OLS estimates of returns to schooling, including ability bias and measurement error. This paper is a classic illustration of why researchers must think carefully about omitted variables when interpreting OLS coefficients causally.

  18. Griliches, Z. (1990). Patent Statistics as Economic Indicators: A Survey. Journal of Economic Literature, 28(4), 1661–1707.

    patents · innovation · economic-indicators
    Annotation

    Griliches surveys the use of patent data as economic indicators, establishing patent counts as a key measure of innovative output. This survey motivates much of the subsequent applied work using Poisson and negative binomial models to study innovation.

  19. Gruber, J. (1994). The Incidence of Mandated Maternity Benefits. American Economic Review, 84(3), 622–641.

    maternity-benefits · labor-economics · policy-evaluation
    Annotation

    Gruber uses a DID design exploiting variation in state-level mandated maternity benefits to show that the costs of these benefits are shifted to workers in the form of lower wages. This study is a classic example of how DID can exploit policy variation across states and time.

H
26
  1. Hahn, J. (1998). On the Role of the Propensity Score in Efficient Semiparametric Estimation of Average Treatment Effects. Econometrica, 66(2), 315–331.

    doi.org/10.2307/2998560

    semiparametric-efficiency · propensity-score · efficiency-bound
    Annotation

    Hahn derives the semiparametric efficiency bound for estimating average treatment effects and shows that knowledge of the propensity score does not improve the bound—it is ancillary for ATE. The efficient estimators take the form of sample averages completed by nonparametric imputation. This paper is foundational for understanding efficient semiparametric estimation of treatment effects.

  2. Hahn, J., Todd, P., & Van der Klaauw, W. (2001). Identification and Estimation of Treatment Effects with a Regression-Discontinuity Design. Econometrica, 69(1), 201–209.

    doi.org/10.1111/1468-0262.00183

    identificationnonparametricWald-estimator
    Annotation

    Hahn, Todd, and Van der Klaauw provide the formal econometric framework for both sharp and fuzzy regression discontinuity designs. For the fuzzy case, they show that the treatment effect can be identified as the ratio of the discontinuity in the outcome to the discontinuity in the treatment probability, analogous to a Wald estimator.
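
    The fuzzy RD estimand they describe has a direct numerical analogue: estimate the discontinuities in the outcome and in treatment take-up at the cutoff and take their ratio. A minimal sketch on simulated data (the bandwidth, local linear fits, and data-generating process below are illustrative assumptions, not the paper's estimator):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
x = rng.uniform(-1, 1, n)                  # running variable, cutoff at 0
z = (x >= 0).astype(float)                 # indicator for crossing the cutoff
d = rng.binomial(1, 0.2 + 0.6 * z)         # fuzzy take-up: probability jumps by 0.6
y = 1.0 + 0.5 * x + 2.0 * d + rng.normal(0, 1, n)   # true treatment effect = 2

h = 0.2                                    # bandwidth around the cutoff (assumed)
left = (x > -h) & (x < 0)
right = (x >= 0) & (x < h)

def intercept_at_cutoff(mask, v):
    # fit v ~ a + b*x on one side; the intercept is the fitted value at x = 0
    slope, intercept = np.polyfit(x[mask], v[mask], 1)
    return intercept

jump_y = intercept_at_cutoff(right, y) - intercept_at_cutoff(left, y)
jump_d = intercept_at_cutoff(right, d) - intercept_at_cutoff(left, d)
late = jump_y / jump_d                     # Wald-type fuzzy RD estimate
print(round(late, 2))                      # close to the true effect of 2
```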

  3. Hainmueller, J. (2012). Entropy Balancing for Causal Effects: A Multivariate Reweighting Method to Produce Balanced Samples in Observational Studies. Political Analysis, 20(1), 25–46.

    doi.org/10.1093/pan/mpr025

    Foundationalon matching methods
    entropy-balancingreweightingcovariate-balanceobservational-studies
    Annotation

    Hainmueller introduces entropy balancing, a reweighting scheme that directly targets covariate balance by finding weights that satisfy pre-specified balance constraints while remaining as close to uniform as possible. Entropy balancing has become a popular alternative to propensity score matching because it achieves exact balance on specified moments by construction.
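
    The reweighting scheme has a convenient dual form: the weights are exponential tilts of the uniform base weights, with the tilt chosen so the reweighted control means hit the target moments exactly. A sketch under assumed data and targets (not Hainmueller's application):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
Xc = rng.normal(0.0, 1.0, size=(300, 2))     # control-group covariates (simulated)
target = np.array([0.5, -0.3])               # treated-group means to match (assumed)

def dual(lam):
    # entropy-balancing dual objective: log-sum-exp of the tilts minus lam'target
    return np.log(np.exp(Xc @ lam).sum()) - target @ lam

res = minimize(dual, x0=np.zeros(2), method="BFGS")
w = np.exp(Xc @ res.x)
w /= w.sum()                                 # normalized entropy-balancing weights

print(np.round(w @ Xc, 3))                   # reweighted means hit the targets
```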

  4. Hamilton, B. H., & Nickerson, J. A. (2003). Correcting for Endogeneity in Strategic Management Research. Strategic Organization, 1(1), 51–78.

    doi.org/10.1177/1476127003001001218

    FoundationalMgmton ols regression
    endogeneitystrategyself-selection
    Annotation

    Hamilton and Nickerson warn strategy researchers that naive OLS estimates of the strategy-performance relationship are often biased by endogeneity, because firms that adopt a strategy differ systematically from those that do not. They provide an accessible tutorial on endogeneity and point toward solutions including instrumental variables and Heckman selection models. The paper remains a key reference for understanding why strategic management research requires identification strategies beyond simple regression.

  5. Harrison, G. W., & List, J. A. (2004). Field Experiments. Journal of Economic Literature, 42(4), 1009–1055.

    doi.org/10.1257/0022051043004577

    field-experimentstaxonomyexternal-validity
    Annotation

    Harrison and List provide an influential taxonomy of field experiments, distinguishing artefactual, framed, and natural field experiments from conventional lab experiments. The paper helps establish field experiments as a mainstream methodology in economics.

  6. Haushofer, J., & Shapiro, J. (2016). The Short-Term Impact of Unconditional Cash Transfers to the Poor: Experimental Evidence from Kenya. Quarterly Journal of Economics, 131(4), 1973–2042.

    doi.org/10.1093/qje/qjw025

    Applicationon multiple testing
    cash-transfersRCTFDRdevelopment-economics
    Annotation

    Haushofer and Shapiro evaluate GiveDirectly's unconditional cash transfer program in Kenya, testing effects across many outcome domains including consumption, assets, food security, health, and psychological well-being. They apply FWER corrections with bootstrapped p-values across outcome families, providing a model for how to handle multiple testing transparently in large-scale randomized evaluations. A 2017 erratum (QJE 132(4): 2057–2060) corrected the FWER-adjusted p-values in Tables I and II, which had used insufficient bootstrap iterations.

  7. Hausman, J. A. (1978). Specification Tests in Econometrics. Econometrica, 46(6), 1251–1271.

    doi.org/10.2307/1913827

    Hausman-testspecification-testfixed-vs-random
    Annotation

    Hausman develops a general framework for specification testing based on comparing two estimators: one consistent under a broad set of assumptions and one efficient under a narrower null hypothesis. The test's most well-known application compares fixed effects (consistent if unit effects are correlated with regressors) against random effects (efficient under the null of no correlation), but the framework applies broadly to IV, simultaneous equations, and time-series cross-section models. The test statistic has a chi-squared distribution under the null and remains one of the most widely used diagnostic tools in applied econometrics.
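
    The estimator-contrast idea reduces to a quadratic form in the difference between the two estimates. A minimal numeric sketch with made-up FE/RE estimates and covariance matrices (purely illustrative):

```python
import numpy as np
from scipy import stats

# hypothetical fixed-effects and random-effects results (not from any real dataset)
b_fe = np.array([1.10, -0.52])
V_fe = np.array([[0.040, 0.002], [0.002, 0.030]])
b_re = np.array([0.95, -0.48])
V_re = np.array([[0.025, 0.001], [0.001, 0.020]])

diff = b_fe - b_re
H = diff @ np.linalg.inv(V_fe - V_re) @ diff   # Hausman statistic
p = stats.chi2.sf(H, df=len(diff))             # chi-squared under the null
print(round(H, 2), round(p, 3))                # H ~ 1.75, p ~ 0.42: no rejection here
```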

  8. Hausman, J. A., & Taylor, W. E. (1981). Panel Data and Unobservable Individual Effects. Econometrica, 49(6), 1377–1398.

    doi.org/10.2307/1911406

    Foundationalon random effects
    Hausman-Taylortime-invariant-variablespanel-datainstrumental-variables
    Annotation

    Hausman and Taylor develop an instrumental variables estimator for panel data that allows consistent estimation of coefficients on time-invariant variables even when individual effects are correlated with some regressors. The Hausman-Taylor estimator occupies a middle ground between fixed effects (which cannot estimate time-invariant coefficients) and random effects (which requires strict exogeneity).

  9. Hausman, J., Hall, B. H., & Griliches, Z. (1984). Econometric Models for Count Data with an Application to the Patents–R&D Relationship. Econometrica, 52(4), 909–938.

    doi.org/10.2307/1911191

    count-datapatentsR&Dpanel-data
    Annotation

    Hausman, Hall, and Griliches develop the econometric framework for Poisson and negative binomial regression models applied to count data, using the relationship between R&D spending and patent counts as the motivating application. The paper is a classic early econometric treatment of count-data models in panel settings.

  10. Hausman, J., & McFadden, D. (1984). Specification Tests for the Multinomial Logit Model. Econometrica, 52(5), 1219–1240.

    doi.org/10.2307/1910997

    Foundationalon logit probit
    IIAspecification-testmultinomial-logit
    Annotation

    Hausman and McFadden develop a specification test for the independence of irrelevant alternatives (IIA) assumption in multinomial logit. The test allows researchers to assess whether the logit model's restrictive substitution patterns are appropriate for their data, which is critical for applied work with multiple choice categories.

  11. Haven, T. L., & Van Grootel, L. (2019). Preregistering Qualitative Research. Accountability in Research, 26(3), 229–244.

    doi.org/10.1080/08989621.2019.1580147

    qualitative-researchpre-registrationextension
    Annotation

    Haven and Van Grootel explore extending pre-registration to qualitative research, discussing what elements of qualitative studies can and should be pre-registered. This paper broadens the pre-registration conversation beyond quantitative experimental designs.

  12. Heckman, J. J. (1979). Sample Selection Bias as a Specification Error. Econometrica, 47(1), 153–161.

    doi.org/10.2307/1912352

    foundationalselection-biasinverse-mills-ratio
    Annotation

    Heckman introduces the two-step estimator for correcting sample selection bias using the inverse Mills ratio. The paper shows that selection bias can be treated as an omitted variable problem, where the omitted variable is the conditional expectation of the error term given selection. One of the most cited papers in econometrics.
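
    The two-step logic can be sketched on simulated data (the data-generating process, the 0.7 error correlation, and the exclusion restriction z are all assumptions for illustration): a probit selection equation yields the inverse Mills ratio, which then enters the outcome OLS as an extra regressor.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(2)
n = 5_000
x = rng.normal(size=n)                         # outcome regressor
z = rng.normal(size=n)                         # exclusion restriction: selection only
u, e = rng.multivariate_normal([0, 0], [[1, 0.7], [0.7, 1]], n).T
s = (0.5 + 1.0 * z + 0.8 * x + u) > 0          # selection equation
y = 1.0 + 2.0 * x + e                          # outcome, observed only when s is True

# step 1: probit of selection, fitted by maximum likelihood
W = np.column_stack([np.ones(n), z, x])

def nll(g):
    return -norm.logcdf((2 * s - 1) * (W @ g)).sum()

g = minimize(nll, np.zeros(3), method="BFGS").x
imr = norm.pdf(W @ g) / norm.cdf(W @ g)        # inverse Mills ratio

# step 2: OLS on the selected sample with the IMR as a control
X2 = np.column_stack([np.ones(s.sum()), x[s], imr[s]])
beta = np.linalg.lstsq(X2, y[s], rcond=None)[0]
print(np.round(beta, 2))   # slope on x near 2; IMR coefficient near rho*sigma = 0.7
```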

  13. Heckman, J. J., Ichimura, H., & Todd, P. E. (1997). Matching as an Econometric Evaluation Estimator: Evidence from Evaluating a Job Training Programme. Review of Economic Studies, 64(4), 605–654.

    doi.org/10.2307/2971733

    Foundationalon matching methods
    matching-estimatorcommon-supportprogram-evaluation
    Annotation

    Heckman, Ichimura, and Todd develop the econometric theory behind matching estimators, including conditions for identification and the importance of common support. They apply these methods to evaluate job training programs and show when matching works well and when it does not.

  14. Heckman, J. J., & Vytlacil, E. (2005). Structural Equations, Treatment Effects, and Econometric Policy Evaluation. Econometrica, 73(3), 669–738.

    doi.org/10.1111/j.1468-0262.2005.00594.x

    MTEtreatment-effectsLATEATEATT+2
    Annotation

    Heckman and Vytlacil use the marginal treatment effect (MTE) to connect the treatment-effects literature with structural econometric policy evaluation. A central result is that commonly used treatment-effect parameters (ATE, ATT, LATE, PRTE) can be expressed as weighted averages of the MTE curve, with each estimand using a different weight function. The framework shows how IV estimates with different instruments recover different weighted averages of the same underlying MTE, providing the theoretical foundation for understanding instrument-dependent variation in treatment-effect estimates.

  15. Henderson, A. D., Miller, D., & Hambrick, D. C. (2006). How Quickly Do CEOs Become Obsolete? Industry Dynamism, CEO Tenure, and Company Performance. Strategic Management Journal, 27(5), 447–460.

    doi.org/10.1002/smj.524

    ApplicationMgmton fixed effects
    CEO-tenurefirm-performanceindustry-dynamism
    Annotation

    Henderson, Miller, and Hambrick study how CEO tenure affects performance in dynamic versus stable industries in this longitudinal strategy paper. In the stable food industry, performance improved steadily with tenure, declining only after 10–15 years; in the dynamic computer industry, performance declined steadily from the start. The paper demonstrates that the relationship between CEO tenure and performance is contingent on industry dynamism.

  16. Heß, S. (2017). Randomization Inference with Stata: A Guide and Software. Stata Journal, 17(3), 630–651.

    doi.org/10.1177/1536867X1701700306

    Statasoftwareimplementation
    Annotation

    Heß develops the ritest Stata command for randomization inference, providing a practical tool for conducting permutation tests under arbitrary randomization procedures in experimental and quasi-experimental settings. The command accommodates stratified, clustered, and blocked randomization designs, and produces exact finite-sample p-values without distributional assumptions. The paper serves as both a software introduction and a practical guide to randomization inference for applied researchers.
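
    ritest is a Stata command, but the underlying permutation logic fits in a few lines of Python (a sketch of simple complete re-randomization on simulated data; stratified or clustered designs would permute within the appropriate blocks instead):

```python
import numpy as np

rng = np.random.default_rng(3)
treat = np.array([1] * 20 + [0] * 20)
y = rng.normal(0, 1, 40) + 0.8 * treat          # simulated outcomes, true effect 0.8

def diff_means(t):
    return y[t == 1].mean() - y[t == 0].mean()

obs = diff_means(treat)

# re-draw the assignment many times and recompute the statistic under the sharp null
perm = np.array([diff_means(rng.permutation(treat)) for _ in range(10_000)])
p_exact = (np.abs(perm) >= np.abs(obs)).mean()  # finite-sample permutation p-value
print(round(float(obs), 2), round(float(p_exact), 4))
```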

  17. Ho, D. E., Imai, K., King, G., & Stuart, E. A. (2007). Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference. Political Analysis, 15(3), 199–236.

    doi.org/10.1093/pan/mpl013

    Foundationalon matching methods
    preprocessingmodel-dependencenonparametriccausal-inference
    Annotation

    Ho, Imai, King, and Stuart argue that matching should be used as a preprocessing step before parametric modeling, reducing model dependence and improving robustness of causal estimates. This influential paper reframed matching not as a standalone estimator but as a way to make subsequent parametric analyses less sensitive to specification choices.

  18. Hoenig, J. M., & Heisey, D. M. (2001). The Abuse of Power: The Pervasive Fallacy of Power Calculations for Data Analysis. The American Statistician, 55(1), 19–24.

    doi.org/10.1198/000313001300339897

    Foundationalon power analysis
    post-hoc-powerstatistical-fallacypower-calculations
    Annotation

    Hoenig and Heisey demonstrate that post hoc (observed) power calculations are fundamentally flawed because they are a monotone function of the p-value and add no information beyond the test result itself. This paper is essential reading for understanding why power analysis must be conducted before data collection.
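
    Their point can be verified directly: for a two-sided z-test, "observed power" can be computed from the p-value alone, so it carries no information beyond the test result. The function below is a standard textbook formula, not code from the paper; note the famous implication that p = 0.05 corresponds to observed power of about 50%.

```python
import numpy as np
from scipy.stats import norm

def observed_power(p, alpha=0.05):
    # post hoc power of a two-sided z-test, computed from the p-value alone:
    # treat the observed |z| as the true effect and ask how often it would reject
    z = norm.ppf(1 - p / 2)                # |z| implied by the two-sided p-value
    c = norm.ppf(1 - alpha / 2)
    return norm.cdf(z - c) + norm.cdf(-z - c)

pvals = np.array([0.01, 0.05, 0.20, 0.50])
powers = observed_power(pvals)
print(np.round(powers, 3))                 # strictly decreasing in p
```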

  19. Hoetker, G. (2007). The Use of Logit and Probit Models in Strategic Management Research: Critical Issues. Strategic Management Journal, 28(4), 331–343.

    doi.org/10.1002/smj.582

    ApplicationMgmton logit probit
    strategy-researchmethodologycoefficient-comparison
    Annotation

    Hoetker reviews how strategy researchers use logit and probit models and identifies common pitfalls, including misinterpretation of coefficients across groups and incorrect use of interaction terms. This paper provides concrete guidance for improving practice in management journals.

  20. Hofmann, D. A. (1997). An Overview of the Logic and Rationale of Hierarchical Linear Models. Journal of Management, 23(6), 723–744.

    doi.org/10.1177/014920639702300602

    FoundationalMgmton random effects
    HLMmanagement-methodologymultilevel
    Annotation

    Hofmann introduces hierarchical linear models to the management research community, explaining when and why multilevel random-effects models are appropriate for organizational data with nested structures. This tutorial is highly influential in promoting multilevel methods in management journals.

  21. Holland, P. W. (1986). Statistics and Causal Inference. Journal of the American Statistical Association, 81(396), 945–960.

    doi.org/10.1080/01621459.1986.10478354

    Foundationalon ols regression
    causal-inferencepotential-outcomesRubin-causal-modelfundamental-problem
    Annotation

    Holland articulates the fundamental problem of causal inference—that we can never observe both potential outcomes for the same unit—and formalizes the Rubin Causal Model framework. His dictum 'no causation without manipulation' shapes how a generation of researchers thinks about the conditions under which statistical associations can be given causal interpretations.

  22. Hollenbeck, J. R., & Wright, P. M. (2017). Harking, Sharking, and Tharking: Making the Case for Post Hoc Analysis of Scientific Data. Journal of Management, 43(1), 5–18.

    doi.org/10.1177/0149206316679487

    FoundationalMgmton pre registration
    HARKingpost-hoc-analysismanagement-methodology
    Annotation

    Hollenbeck and Wright introduce the concept of 'Tharking' (Transparently Hypothesizing After Results Are Known), arguing that post hoc analysis of scientific data is valuable when conducted and reported transparently. They distinguish destructive HARKing from constructive post hoc exploration, making the case that management researchers should embrace exploratory analysis in discussion sections rather than disguising it as confirmatory.

  23. Hoogendoorn, S., Parker, S. C., & van Praag, M. (2017). Smart or Diverse Start-up Teams? Evidence from a Field Experiment. Organization Science, 28(6), 1010–1028.

    doi.org/10.1287/orsc.2017.1158

    ApplicationMgmton experimental design
    field-experimentteam-diversityentrepreneurshipperformance
    Annotation

    Hoogendoorn, Parker, and van Praag conduct a field experiment with 573 students randomly assigned to 49 startup teams that varied in cognitive ability dispersion. They find an inverted U-shaped relationship between ability dispersion and team performance: teams with moderate ability dispersion outperform both homogeneous and highly dispersed teams. The random assignment to teams ensures that ability composition is exogenous, providing clean experimental identification of the effect of team cognitive diversity on venture performance.

  24. Horowitz, J. L., & Manski, C. F. (2000). Nonparametric Analysis of Randomized Experiments with Missing Covariate and Outcome Data. Journal of the American Statistical Association, 95(449), 77–84.

    doi.org/10.1080/01621459.2000.10473902

    Foundationalon lee bounds
    missing-datanonparametric-boundsrandomized-experiments
    Annotation

    Horowitz and Manski extend the bounding approach to experiments with missing data on both covariates and outcomes. They show how to construct valid bounds under different assumptions about the missing data mechanism, providing a principled alternative to complete-case analysis and imputation.

  25. Hurst, R., Lee, S., & Frake, J. (2024). The Effect of Flatter Hierarchy on Applicant Pool Gender Diversity: Evidence from Experiments. Strategic Management Journal, 45(8), 1446–1484.

    doi.org/10.1002/smj.3590

    ApplicationMgmton experimental design
    reverse-audit-studyfield-experimentgenderhierarchyrecruitment+1
    Annotation

    Hurst, Lee, and Frake conduct a reverse audit study in partnership with a U.S. healthcare startup, sending recruitment emails to approximately 8,400 job seekers with randomly varied descriptions of the firm's organizational hierarchy. Featuring a flatter hierarchy did not significantly affect applicant pool size but significantly decreased women's representation, because women perceived flatter structures as offering fewer career advancement opportunities and greater workload burdens.

  26. Huselid, M. A. (1995). The Impact of Human Resource Management Practices on Turnover, Productivity, and Corporate Financial Performance. Academy of Management Journal, 38(3), 635–672.

    doi.org/10.2307/256741

    ApplicationMgmton ols regression
    human-resource-managementfirm-performancestrategic-HRM
    Annotation

    Huselid uses OLS (and related cross-sectional methods) to estimate the relationship between HR practices and firm performance in this influential management study. It helps launch the field of strategic HRM and illustrates both the power and limitations of regression-based approaches in management research.

I
12
  1. Iacus, S. M., King, G., & Porro, G. (2012). Causal Inference without Balance Checking: Coarsened Exact Matching. Political Analysis, 20(1), 1–24.

    doi.org/10.1093/pan/mpr013

    Foundationalon matching methods
    coarsened-exact-matchingCEMbalance
    Annotation

    Iacus, King, and Porro introduce Coarsened Exact Matching (CEM), which coarsens covariates into bins and then performs exact matching within those bins. CEM avoids many pitfalls of propensity score matching, such as the need to check balance iteratively, and gives the researcher direct control over the matching quality.
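
    The coarsen-then-exact-match recipe can be sketched on a single covariate (the bin width, data-generating process, and ATT weighting below are illustrative assumptions; the authors' cem software handles the multivariate bookkeeping):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 1_000
age = rng.uniform(20, 60, n)
treat = rng.binomial(1, 1 / (1 + np.exp(-(age - 40) / 10)), n)   # older -> treated
y = 10 + 0.1 * age + 2.0 * treat + rng.normal(0, 1, n)           # true effect 2

# coarsen the covariate into 5-year bins, then match exactly within bins
bins = np.digitize(age, np.arange(20, 61, 5))
effects, weights = [], []
for b in np.unique(bins):
    in_b = bins == b
    t, c = in_b & (treat == 1), in_b & (treat == 0)
    if t.any() and c.any():                 # keep only strata with common support
        effects.append(y[t].mean() - y[c].mean())
        weights.append(t.sum())             # weight strata by treated count (ATT)

att = np.average(effects, weights=weights)
print(round(float(att), 2))                 # close to the true effect of 2
```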

  2. Imai, K., Keele, L., & Tingley, D. (2010). A General Approach to Causal Mediation Analysis. Psychological Methods, 15(4), 309–334.

    doi.org/10.1037/a0020761

    potential-outcomessequential-ignorabilitysensitivity-analysis
    Annotation

    Imai, Keele, and Tingley develop a general framework for causal mediation analysis grounded in the potential outcomes framework. They clarify the assumptions needed for identifying causal mediation effects, particularly the sequential ignorability assumption, and provide sensitivity analyses for violations.

  3. Imai, K., & Kim, I. S. (2019). When Should We Use Unit Fixed Effects Regression Models for Causal Inference with Longitudinal Data? American Journal of Political Science, 63(2), 467–490.

    doi.org/10.1111/ajps.12417

    Foundationalon fixed effects
    causal-inferencelongitudinal-datatreatment-historyassumptions
    Annotation

    Imai and Kim provide a modern causal-inference framework for understanding when unit fixed effects regression yields unbiased estimates with longitudinal data. They clarify the often-implicit assumptions about treatment history and carryover effects, offering a more rigorous foundation for applied fixed effects analysis.

  4. Imbens, G. W., & Angrist, J. D. (1994). Identification and Estimation of Local Average Treatment Effects. Econometrica, 62(2), 467–475.

    doi.org/10.2307/2951620

    Foundationalon instrumental variables
    LATEcompliersmonotonicityidentification
    Annotation

    In this foundational paper on the LATE, Imbens and Angrist show that IV identifies the average causal effect for compliers (the subpopulation whose treatment status is changed by the instrument) under the monotonicity assumption. This reinterpretation fundamentally changes how researchers understand what IV estimates.
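
    The sample analogue of their result is the Wald estimator: the reduced-form effect of the instrument on the outcome divided by its effect on treatment take-up. A simulated sketch (the compliance shares and constant treatment effect are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 50_000
z = rng.binomial(1, 0.5, n)                    # randomized binary instrument
always = rng.random(n) < 0.2                   # always-takers
complier = rng.random(n) < 0.5                 # compliers take treatment iff encouraged
d = (always | (complier & (z == 1))).astype(float)   # monotone response to z
y = 1.0 + 2.0 * d + rng.normal(0, 1, n)        # effect is 2 for everyone here,
                                               # so the complier effect is also 2
first_stage = d[z == 1].mean() - d[z == 0].mean()
reduced_form = y[z == 1].mean() - y[z == 0].mean()
wald = reduced_form / first_stage              # IV / LATE estimate
print(round(float(wald), 2))
```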

  5. Imbens, G. W. (2004). Nonparametric Estimation of Average Treatment Effects Under Exogeneity: A Review. Review of Economics and Statistics, 86(1), 4–29.

    doi.org/10.1162/003465304323023651

    average-treatment-effectunconfoundednessnonparametricsurvey
    Annotation

    Imbens provides a comprehensive review of nonparametric methods for estimating average treatment effects under the unconfoundedness assumption, covering matching, weighting, and subclassification estimators. This survey unifies the theoretical foundations of matching methods and clarifies the connections between different estimators used in program evaluation.

  6. Imbens, G. W., & Manski, C. F. (2004). Confidence Intervals for Partially Identified Parameters. Econometrica, 72(6), 1845–1857.

    doi.org/10.1111/j.1468-0262.2004.00555.x

    Foundationalon lee bounds
    partial-identificationconfidence-intervalsboundsinference
    Annotation

    Imbens and Manski develop methods for constructing valid confidence intervals when parameters are only partially identified—that is, when the data and assumptions narrow the parameter to a set rather than a point. This paper provides the inferential foundation for reporting uncertainty around bounds estimates, including Lee bounds.
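
    The construction can be sketched numerically: the Imbens-Manski critical value interpolates between the one-sided 1.645 and the two-sided 1.96 depending on how wide the identified set is relative to sampling error. The bound estimates and standard errors below are made up for illustration:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

lo, hi = 0.10, 0.50          # estimated lower and upper bounds (hypothetical)
se_lo, se_hi = 0.04, 0.06    # their standard errors (hypothetical)
alpha = 0.05

def coverage_gap(c):
    # Imbens-Manski (2004) condition defining the critical value c
    delta = (hi - lo) / max(se_lo, se_hi)
    return norm.cdf(c + delta) - norm.cdf(-c) - (1 - alpha)

c = brentq(coverage_gap, 0.0, 3.0)
ci = (lo - c * se_lo, hi + c * se_hi)
print(round(c, 3), np.round(ci, 3))   # wide identified set here, so c is near 1.645
```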

  7. Imbens, G. W., & Lemieux, T. (2008). Regression Discontinuity Designs: A Guide to Practice. Journal of Econometrics, 142(2), 615–635.

    doi.org/10.1016/j.jeconom.2007.05.001

    practical-guidebandwidth-selectionlocal-IV
    Annotation

    Imbens and Lemieux provide a comprehensive practical guide to implementing RDD, covering bandwidth selection, functional form, and graphical analysis. Their treatment of fuzzy RDD as a local IV estimator clarifies the interpretation and implementation for applied researchers.

  8. Imbens, G., & Kalyanaraman, K. (2012). Optimal Bandwidth Choice for the Regression Discontinuity Estimator. Review of Economic Studies, 79(3), 933–959.

    doi.org/10.1093/restud/rdr043

    bandwidth-selectionlocal-linearoptimal-bandwidth
    Annotation

    Imbens and Kalyanaraman derive the asymptotically optimal bandwidth for the local linear regression discontinuity estimator and propose a simple data-driven bandwidth selector. The IK bandwidth was the standard choice until the robust bias-corrected refinement of Calonico, Cattaneo, and Titiunik (2014).

  9. Imbens, G. W. (2015). Matching Methods in Practice: Three Examples. Journal of Human Resources, 50(2), 373–419.

    doi.org/10.3368/jhr.50.2.373

    Applicationon matching methods
    practical-guidepropensity-scorebalancesensitivity-analysis
    Annotation

    Imbens demonstrates how to implement matching methods in practice through three detailed empirical examples, covering propensity score estimation, covariate balance assessment, overlap and trimming, and robustness to alternative estimators. This paper is an invaluable practical guide that bridges the gap between matching theory and applied research.

  10. Imbens, G. W., & Rubin, D. B. (2015). Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction. Cambridge University Press.

    doi.org/10.1017/CBO9781139025751

    causal-inferencepotential-outcomespropensity-scoretextbook
    Annotation

    Imbens and Rubin provide a comprehensive textbook grounding causal inference in the potential outcomes framework, with detailed treatment of matching, propensity scores, and subclassification. The book gives rigorous foundations for selection-on-observables methods.

  11. Ioannidis, J. P. A., Stanley, T. D., & Doucouliagos, H. (2017). The Power of Bias in Economics Research. Economic Journal, 127(605), F236–F265.

    doi.org/10.1111/ecoj.12461

    underpowered-studiespublication-biasmeta-science
    Annotation

    Ioannidis, Stanley, and Doucouliagos conduct a large-scale assessment of statistical power in economics research and find that the median power to detect typical effect sizes is only 18%. They document widespread underpowering and publication bias, highlighting the importance of ex ante power analysis.

  12. Islam, N. (1995). Growth Empirics: A Panel Data Approach. Quarterly Journal of Economics, 110(4), 1127–1170.

    doi.org/10.2307/2946651

    Applicationon random effects
    growth-empiricsconvergencecross-countrypanel-data
    Annotation

    Islam applies panel data methods—including random effects and fixed effects—to the cross-country growth regression framework, showing that accounting for unobserved country heterogeneity substantially changes estimates of convergence rates. This paper demonstrates the importance of choosing between fixed and random effects in macroeconomic growth empirics.

J
2
  1. Jaeger, D. A., Ruist, J., & Stuhler, J. (2018). Shift-Share Instruments and the Impact of Immigration. NBER Working Paper No. 24285.

    doi.org/10.3386/w24285

    immigrationserial-correlationexclusion-restriction
    Annotation

    Jaeger, Ruist, and Stuhler highlight a threat to shift-share instruments in immigration research: serial correlation in immigrant inflows can bias estimates if past immigration affects current outcomes through channels other than current immigration. This paper raises important concerns about the exclusion restriction.

  2. Jia, N., Luo, X., Fang, Z., & Liao, C. (2024). When and How Artificial Intelligence Augments Employee Creativity. Academy of Management Journal, 67(1), 5–32.

    doi.org/10.5465/amj.2022.0426

    ApplicationMgmton experimental design
    field-experimentRCTartificial-intelligencecreativitydouble-randomization
    Annotation

    Jia, Luo, Fang, and Liao conduct a field experiment examining how AI assistance affects creative work through a sequential division of labor. They find that AI augmentation improves average output quality but reduces the novelty of top-performing work, with effects moderated by employee skill level. The paper provides causal evidence on the productivity implications of human-AI collaboration in knowledge work.

K
18
  1. Kang, J. D. Y., & Schafer, J. L. (2007). Demystifying Double Robustness: A Comparison of Alternative Strategies for Estimating a Population Mean from Incomplete Data. Statistical Science, 22(4), 523–539.

    doi.org/10.1214/07-STS227

    model-misspecificationsimulationcritical-assessment
    Annotation

    Kang and Schafer show through simulations that doubly robust estimators can perform poorly when both models are moderately misspecified, even though they remain consistent when one model is correct. This influential paper tempers enthusiasm and motivates further methodological work on practical performance.

  2. Kang, S. K., DeCelles, K. A., Tilcsik, A., & Jun, S. (2016). Whitened Résumés: Race and Self-Presentation in the Labor Market. Administrative Science Quarterly, 61(3), 469–502.

    doi.org/10.1177/0001839216639577

    ApplicationMgmton experimental design
    audit-studydiscriminationhiringracerésumés
    Annotation

    Kang and colleagues conduct a résumé audit study sending fictitious applications to real employers, finding that minority applicants who 'whitened' their résumés received significantly more callbacks. The study combines a correspondence experiment with qualitative interviews, providing a powerful example of how audit studies can identify discrimination in hiring.

  3. Kaplan, E. L., & Meier, P. (1958). Nonparametric Estimation from Incomplete Observations. Journal of the American Statistical Association, 53(282), 457–481.

    doi.org/10.1080/01621459.1958.10501452

    foundationalnonparametricsurvival-function
    Annotation

    Kaplan and Meier introduce the product-limit estimator (Kaplan-Meier estimator) for the survival function from right-censored data. The KM curve is the standard nonparametric tool for visualizing survival and comparing groups before fitting regression models.
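
    The product-limit estimator is short enough to write from scratch (toy data assumed for illustration): at each observed event time, multiply the running survival estimate by one minus the fraction of at-risk subjects who fail at that time.

```python
import numpy as np

# toy right-censored data: follow-up time and event indicator (1 = event, 0 = censored)
time = np.array([2, 3, 3, 5, 6, 7, 8, 9])
event = np.array([1, 1, 0, 1, 0, 1, 1, 0])

S = 1.0
curve = {}
for t in np.unique(time[event == 1]):          # step only at observed event times
    n_at_risk = (time >= t).sum()              # still under observation just before t
    d = ((time == t) & (event == 1)).sum()     # events occurring at t
    S *= 1 - d / n_at_risk                     # product-limit update
    curve[int(t)] = S

print({t: round(s, 3) for t, s in curve.items()})
# {2: 0.875, 3: 0.75, 5: 0.6, 7: 0.4, 8: 0.2}
```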

  4. Katila, R., & Ahuja, G. (2002). Something Old, Something New: A Longitudinal Study of Search Behavior and New Product Introduction. Academy of Management Journal, 45(6), 1183–1194.

    doi.org/10.2307/3069433

    ApplicationMgmton poisson negative binomial
    knowledge-searchnew-productsinnovation-management
    Annotation

    Katila and Ahuja use negative binomial models to study how the depth and scope of a firm's knowledge search affect new product introductions. This paper is a widely cited application of count data models in the strategic management and innovation literature.

  5. Kaul, A., Klossner, S., Pfeifer, G., & Schieler, M. (2022). Standard Synthetic Control Methods: The Case of Using All Preintervention Outcomes Together With Covariates. Journal of Business & Economic Statistics, 40(3), 1362–1376.

    doi.org/10.1080/07350015.2021.1930012

    Applicationon synthetic control
    synthetic-controlmatching-pitfallspre-treatment-outcomes
    Annotation

    Kaul et al. show that using all pre-treatment outcome lags as predictors in synthetic control (a form of matching for aggregate units) renders other covariates irrelevant, threatening unbiasedness. Their finding highlights pitfalls when matching on pre-treatment outcomes and is relevant for understanding matching assumptions more broadly.

  6. King, G., & Zeng, L. (2001). Logistic Regression in Rare Events Data. Political Analysis, 9(2), 137–163.

    doi.org/10.1093/oxfordjournals.pan.a004868

    Foundationalon logit probit
    rare-eventslogistic-regressionbinary-outcomesmethodology
    Annotation

    King and Zeng develop a correction for logistic regression when the outcome event is rare. Standard logit underestimates the probability of rare events; their rare-events logit (relogit) applies a correction based on prior information about the event rate in the population. Essential reference for binary outcome studies with highly imbalanced classes.
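
    One ingredient of their approach, the prior correction for case-control sampling, only moves the intercept: when events are oversampled relative to their population rate, the fitted logit intercept is shifted by a known log-odds factor. The rates and coefficient below are illustrative assumptions:

```python
import numpy as np

tau = 0.01        # true population event rate (assumed known)
ybar = 0.50       # event rate in the oversampled estimation sample
b0_sample = -0.2  # hypothetical intercept from the logit fit on the oversampled data

# prior correction: subtract the log of the relative sampling odds
b0_corrected = b0_sample - np.log(((1 - tau) / tau) * (ybar / (1 - ybar)))
print(round(float(b0_corrected), 3))   # the slopes are unaffected by this step
```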

  7. King, G., & Roberts, M. E. (2015). How Robust Standard Errors Expose Methodological Problems They Do Not Fix, and What to Do About It. Political Analysis, 23(2), 159–179.

    doi.org/10.1093/pan/mpu015

    robust-standard-errorsmodel-specificationmethodology
    Annotation

    King and Roberts argue that researchers often use robust standard errors as a band-aid rather than fixing the underlying model specification. They provide practical guidance on when robust SEs are appropriate and when the model itself needs to be reconsidered.

  8. King, G., & Nielsen, R. (2019). Why Propensity Scores Should Not Be Used for Matching. Political Analysis, 27(4), 435–454.

    doi.org/10.1017/pan.2019.11

    propensity-score-critiquebalancemodel-dependence
    Annotation

    King and Nielsen argue that propensity score matching can increase imbalance, model dependence, and bias relative to other matching methods. This provocative paper has influenced a shift toward alternatives like CEM and Mahalanobis distance matching in applied research.

  9. Kleven, H. J., & Waseem, M. (2013). Using Notches to Uncover Optimization Frictions and Structural Elasticities: Theory and Evidence from Pakistan. Quarterly Journal of Economics, 128(2), 669–723.

    doi.org/10.1093/qje/qjt004

    Foundationalon bunching estimation
    notchoptimization-frictionsstructural-estimationPakistan
    Annotation

    Kleven and Waseem extend bunching estimation from kinks to notches, discrete jumps in the tax schedule where the average tax rate changes discontinuously. They develop a structural framework that distinguishes between frictionless and frictional bunching, showing that optimization frictions attenuate observed bunching and cause the naive estimator to understate the true elasticity. Their model identifies both the structural elasticity and the friction distribution from the observed bunching pattern. Applied to Pakistan's income tax notches, they demonstrate that frictions are empirically important and that ignoring them substantially biases elasticity estimates downward.

  10. Kleven, H. J. (2016). Bunching. Annual Review of Economics, 8, 435–464.

    doi.org/10.1146/annurev-economics-080315-015234

    surveykinknotchfrictionsmethodology
    Annotation

    Kleven provides a comprehensive survey of the bunching methodology, covering both kink and notch designs, the role of optimization frictions, and extensions to multiple applications beyond taxation. The survey unifies the theoretical frameworks from Saez (2010) and Kleven and Waseem (2013), discusses practical implementation issues (polynomial order, bandwidth, bin width), and catalogs the growing literature applying bunching to estimate behavioral elasticities in public finance, labor economics, and regulation. Essential reading for anyone starting with bunching methods.

  11. Kline, P., & Walters, C. R. (2016). Evaluating Public Programs with Close Substitutes: The Case of Head Start. Quarterly Journal of Economics, 131(4), 1795–1848.

    doi.org/10.1093/qje/qjw027

    Application · on lee bounds
    Head-Start · program-evaluation · substitution
    Annotation

    Kline and Walters develop a semi-parametric selection model to evaluate Head Start in the presence of close substitute preschool programs, estimating both average and marginal treatment effects. They find that Head Start's effects vary substantially with the quality of available alternatives, and that the program passes a cost-benefit test for the average participant. The paper demonstrates how accounting for alternative program availability changes the interpretation of experimental treatment effects.

  12. Knaus, M. C., Lechner, M., & Strittmatter, A. (2021). Machine Learning Estimation of Heterogeneous Causal Effects: Empirical Monte Carlo Evidence. Econometrics Journal, 24(1), 134–161.

    doi.org/10.1093/ectj/utaa014

    labor-market-policy · heterogeneous-effects · empirical-Monte-Carlo
    Annotation

    Knaus, Lechner, and Strittmatter conduct an empirical Monte Carlo study benchmarking eleven causal machine learning estimators for heterogeneous treatment effects across 24 data-generating processes based on real labor market data. They find that no single estimator dominates across all settings, and that ensemble methods combining multiple learners perform well overall. The study provides practical guidance on when different CATE estimators (causal forests, DML-based methods, meta-learners) are most reliable.

  13. Koenker, R., & Bassett, G., Jr. (1978). Regression Quantiles. Econometrica, 46(1), 33–50.

    doi.org/10.2307/1913643

    foundational · quantile-regression · econometrics
    Annotation

    Koenker and Bassett introduce quantile regression, proposing to estimate conditional quantile functions by minimizing an asymmetric absolute loss (check function), generalizing least absolute deviations to arbitrary quantiles. Establishes asymptotic theory and demonstrates robustness to outliers and heteroscedasticity relative to OLS.
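
The check function at the heart of the method is simple enough to state directly. As a minimal sketch (with a search over data points standing in for the linear programming used in practice), minimizing the asymmetric absolute loss over a constant recovers the sample quantile:

```python
def check_loss(u, tau):
    # Koenker-Bassett check (pinball) loss: tau*u for u >= 0, (tau-1)*u for u < 0
    return tau * u if u >= 0 else (tau - 1) * u

def quantile_via_check(y, tau):
    # The tau-th sample quantile minimizes total check loss; a minimizer
    # is always attained at one of the observations, so search over them.
    return min(y, key=lambda q: sum(check_loss(yi - q, tau) for yi in y))

y = [1, 2, 3, 4, 100]
assert quantile_via_check(y, 0.5) == 3    # median: robust to the outlier 100
assert quantile_via_check(y, 0.25) == 2   # lower quartile
```

Replacing the constant with a linear index x'beta gives regression quantiles, estimated by linear programming rather than enumeration.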

  14. Koenker, R., & Machado, J. A. F. (1999). Goodness of Fit and Related Inference Processes for Quantile Regression. Journal of the American Statistical Association, 94(448), 1296–1310.

    doi.org/10.1080/01621459.1999.10473882

    Annotation

    Koenker and Machado introduce a goodness-of-fit measure for quantile regression analogous to the R-squared of least squares, based on the ratio of minimized check functions across restricted and unrestricted models. They also develop related inference processes for testing composite hypotheses about covariate effects over an entire range of quantiles, with asymptotic behavior linked to Bessel processes. Practitioners estimating quantile regressions can use this pseudo-R-squared and joint significance tests to assess model fit across the conditional distribution.
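
A minimal sketch of the pseudo-R-squared, assuming fitted values from an unrestricted and an intercept-only quantile fit are already in hand:

```python
def check_loss(u, tau):
    # Asymmetric absolute (check) loss
    return u * (tau - (u < 0))

def r1(y, fit_full, fit_restricted, tau):
    # Koenker-Machado R1(tau): one minus the ratio of minimized check-loss
    # sums under the full model and the intercept-only model.
    v_hat = sum(check_loss(yi - f, tau) for yi, f in zip(y, fit_full))
    v_tilde = sum(check_loss(yi - f, tau) for yi, f in zip(y, fit_restricted))
    return 1 - v_hat / v_tilde

y = [1, 2, 3, 4]
assert r1(y, [1, 2, 3, 4], [2, 2, 2, 2], 0.5) == 1.0   # perfect fit
assert r1(y, [1, 2, 3, 3], [2, 2, 2, 2], 0.5) == 0.75
```

Unlike R-squared, R1 is local: it measures fit at one particular quantile tau, so it can differ across the conditional distribution.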

  15. Kontopantelis, E., Doran, T., Springate, D. A., Buchan, I., & Reeves, D. (2015). Regression Based Quasi-Experimental Approach When Randomisation Is Not an Option: Interrupted Time Series Analysis. BMJ, 350, h2750.

    doi.org/10.1136/bmj.h2750

    survey · practical-guide · health-policy
    Annotation

    Kontopantelis and colleagues provide a practical guide to ITS analysis published in the BMJ. Covers model specification, autocorrelation testing, sensitivity analyses, and the addition of control series. Provides clear visual examples of level and slope changes and discusses common pitfalls.

  16. Kothari, S. P., & Warner, J. B. (2007). Econometrics of Event Studies. Handbook of Empirical Corporate Finance, 1, 3–36.

    doi.org/10.1016/B978-0-444-53265-7.50015-9

    Survey · on event studies
    long-horizon · cross-sectional · econometrics
    Annotation

    Kothari and Warner provide an updated survey of event study methods, covering long-horizon event studies, cross-sectional regression approaches, and the econometric challenges that arise with overlapping events and event-induced variance changes. The survey documents how the basic FFJR framework is extended and refined over four decades. It is an essential reference for researchers designing event studies who need to understand the full menu of methodological choices and their trade-offs.

  17. Krueger, A. B. (1999). Experimental Estimates of Education Production Functions. Quarterly Journal of Economics, 114(2), 497–532.

    doi.org/10.1162/003355399556052

    Application · on ols regression
    education · class-size · randomized-experiment · Project-STAR
    Annotation

    Krueger uses Tennessee's Project STAR randomized class-size experiment to estimate the effect of class size on student achievement via OLS. Because treatment is randomized, the OLS coefficient has a causal interpretation, demonstrating that the method is not the issue -- the research design is what determines causality.

  18. Künzel, S. R., Sekhon, J. S., Bickel, P. J., & Yu, B. (2019). Metalearners for Estimating Heterogeneous Treatment Effects Using Machine Learning. Proceedings of the National Academy of Sciences, 116(10), 4156–4165.

    doi.org/10.1073/pnas.1804597116

    Foundational · on causal forests
    X-learner · meta-learners · CATE
    Annotation

    Künzel and colleagues propose the X-learner meta-algorithm for estimating CATEs and systematically compare it with T-learners and S-learners using random forests and BART as base learners. The paper provides practical guidance on when different meta-learning strategies perform well or poorly.
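
The three stages of the X-learner are easiest to see with a deliberately trivial base learner (group means over a discrete covariate); in practice the base learners are random forests, BART, or other flexible regressors:

```python
from statistics import mean

def fit_group_mean(xs, ys):
    # Toy base learner: predict the mean outcome within each value of x
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x, []).append(y)
    model = {x: mean(v) for x, v in groups.items()}
    return lambda x: model[x]

def x_learner(x1, y1, x0, y0, g=0.5):
    # Stage 1: outcome models for the treated and control arms
    mu1, mu0 = fit_group_mean(x1, y1), fit_group_mean(x0, y0)
    # Stage 2: imputed individual effects, then a CATE model per arm
    d1 = [y - mu0(x) for x, y in zip(x1, y1)]
    d0 = [mu1(x) - y for x, y in zip(x0, y0)]
    tau1, tau0 = fit_group_mean(x1, d1), fit_group_mean(x0, d0)
    # Stage 3: weighted combination, g typically set to the propensity score
    return lambda x: g * tau0(x) + (1 - g) * tau1(x)

tau = x_learner(x1=[0, 0, 1], y1=[3, 3, 5],
                x0=[0, 0, 1, 1, 1], y0=[1, 1, 2, 2, 2])
assert tau(0) == 2.0 and tau(1) == 3.0    # per-group treatment effects
```

The crossing of arms in stage 2 (the control-arm outcome model applied to treated units and vice versa) is what gives the X-learner its advantage when treated and control samples are very unequal in size.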

L
22
  1. Laird, N. M., & Ware, J. H. (1982). Random-Effects Models for Longitudinal Data. Biometrics, 38(4), 963–974.

    doi.org/10.2307/2529876

    Foundational · on random effects
    longitudinal-data · mixed-effects · biostatistics
    Annotation

    Laird and Ware develop the general framework for random-effects models in longitudinal data, integrating fixed population parameters with random individual-level effects. This paper is foundational for the mixed-effects modeling approach widely used in biostatistics and social sciences.

  2. LaLonde, R. J. (1986). Evaluating the Econometric Evaluations of Training Programs with Experimental Data. American Economic Review, 76(4), 604–620.

    Foundational · on matching methods
    experimental-benchmark · program-evaluation · job-training · non-experimental-methods
    Annotation

    LaLonde compares econometric estimates of a job training program's effect with experimental benchmarks from a randomized trial, finding that non-experimental methods often failed to replicate the experimental results. This paper establishes the standard test bed for evaluating matching and other observational causal methods.

  3. Lambert, D. (1992). Zero-Inflated Poisson Regression, with an Application to Defects in Manufacturing. Technometrics, 34(1), 1–14.

    doi.org/10.2307/1269547

    zero-inflated-Poisson · excess-zeros · manufacturing · count-data
    Annotation

    Lambert introduces the zero-inflated Poisson (ZIP) model, which accounts for excess zeros in count data by mixing a point mass at zero with a Poisson distribution. The ZIP model has become a standard tool for count outcomes where a subpopulation generates only zeros.
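
The ZIP probability mass function is a two-part mixture, which a short sketch makes concrete (the parameter values here are arbitrary):

```python
from math import exp, factorial

def zip_pmf(k, pi, lam):
    # Zero-inflated Poisson: a point mass at zero with probability pi,
    # plus a Poisson(lam) component with probability 1 - pi.
    poisson = exp(-lam) * lam**k / factorial(k)
    return pi * (k == 0) + (1 - pi) * poisson

pi, lam = 0.3, 2.0
total = sum(zip_pmf(k, pi, lam) for k in range(50))
assert abs(total - 1.0) < 1e-9                     # proper distribution
assert zip_pmf(0, pi, lam) > exp(-lam)             # more zeros than Poisson(lam)
mean_y = sum(k * zip_pmf(k, pi, lam) for k in range(50))
assert abs(mean_y - (1 - pi) * lam) < 1e-9         # E[Y] = (1 - pi) * lam
```

In Lambert's regression version both pi and lam depend on covariates (through logit and log links respectively), estimated jointly by maximum likelihood.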

  4. Landais, C. (2015). Assessing the Welfare Effects of Unemployment Benefits Using the Regression Kink Design. American Economic Journal: Economic Policy, 7(4), 243–278.

    doi.org/10.1257/pol.20130248

    unemployment-insurance · US · benefit-schedules · social-insurance
    Annotation

    Landais uses the regression kink design to decompose the moral hazard and liquidity effects of unemployment insurance benefits using US data. The progressive UI benefit formula creates kinks that provide quasi-experimental variation in benefit levels. This paper demonstrates the power of RKD for evaluating social insurance programs where benefits change slope at known thresholds.

  5. Leamer, E. E. (1983). Let's Take the Con Out of Econometrics. American Economic Review, 73(1), 31–43.

    Foundational · on specification curve
    extreme-bounds · robustness · specification-sensitivity
    Annotation

    Leamer's classic paper argues that the sensitivity of empirical results to specification choices undermines the credibility of econometric evidence. He proposes extreme bounds analysis, an early form of systematic robustness testing that anticipates modern specification curve analysis by several decades.

  6. Lee, D. S. (2008). Randomized Experiments from Non-random Selection in U.S. House Elections. Journal of Econometrics, 142(2), 675–697.

    doi.org/10.1016/j.jeconom.2007.05.004

    elections · local-randomization · manipulation
    Annotation

    Lee formalizes the conditions under which an RDD is 'as good as' a randomized experiment—namely, when agents cannot precisely manipulate the running variable around the cutoff. Applied to U.S. House elections, this paper establishes the modern theoretical foundation for sharp RDD.

  7. Lee, D. S. (2009). Training, Wages, and Sample Selection: Estimating Sharp Bounds on Treatment Effects. Review of Economic Studies, 76(3), 1071–1102.

    doi.org/10.1111/j.1467-937X.2009.00536.x

    Foundational · on lee bounds
    sharp-bounds · sample-selection · monotonicity
    Annotation

    Lee develops sharp nonparametric bounds on treatment effects in the presence of sample selection, requiring only a monotonicity assumption (that treatment affects selection in one direction). These bounds are widely used to address attrition and selective sample composition in randomized experiments.
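
The trimming logic can be sketched in a few lines. This simplified version takes observed outcomes and selection rates as given, and trims an integer number of observations rather than an exact quantile; the outcome lists and selection rates are made-up numbers:

```python
def lee_bounds(y_treat, y_ctrl, s_treat, s_ctrl):
    # Excess selection share in the treated group (monotonicity assumed)
    p = (s_treat - s_ctrl) / s_treat
    k = int(round(p * len(y_treat)))                      # observations to trim
    ys = sorted(y_treat)
    mc = sum(y_ctrl) / len(y_ctrl)
    lower = sum(ys[:len(ys) - k]) / (len(ys) - k) - mc    # trim largest treated outcomes
    upper = sum(ys[k:]) / (len(ys) - k) - mc              # trim smallest treated outcomes
    return lower, upper

y_t = [1, 2, 3, 4, 5, 6, 7, 8]      # outcomes among selected treated units
y_c = [2, 3, 4, 5, 6, 7]            # outcomes among selected control units
lo, hi = lee_bounds(y_t, y_c, s_treat=0.8, s_ctrl=0.6)
assert (lo, hi) == (-1.0, 1.0)
```

The width of the bounds grows with the differential selection rate, which is why Lee bounds are most informative when attrition differs only modestly across arms.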

  8. Lee, D. S., & Lemieux, T. (2010). Regression Discontinuity Designs in Economics. Journal of Economic Literature, 48(2), 281–355.

    doi.org/10.1257/jel.48.2.281

    survey · validity-tests · econometric-theory
    Annotation

    Lee and Lemieux write the standard survey of RDD methods in economics, covering both sharp and fuzzy designs, validity tests, and extensions. This paper is the standard reference for understanding the econometric theory and practical implementation of RDD.

  9. Lee, S. (2022). The Myth of the Flat Start-Up: Reconsidering the Organizational Structure of Start-Ups. Strategic Management Journal, 43(1), 58–92.

    doi.org/10.1002/smj.3333

    Application · Mgmt · on sensitivity analysis
    Oster-method · coefficient-stability · omitted-variable-bias · start-ups · organizational-structure · +1
    Annotation

    Lee examines the relationship between organizational hierarchy and start-ups' creative and commercial success in the video game industry. She uses Oster's (2019) coefficient stability method to assess robustness to omitted variable bias, demonstrating how partial identification techniques complement standard empirical approaches in strategy research.

  10. Lee, D. S., McCrary, J., Moreira, M. J., & Porter, J. (2022). Valid t-Ratio Inference for IV. American Economic Review, 112(10), 3260–3290.

    doi.org/10.1257/aer.20211063

    Foundational · on instrumental variables
    weak-instruments · t-ratio · F-statistic · inference
    Annotation

    Lee, McCrary, Moreira, and Porter address the potentially severe large-sample distortions of t-ratio-based inference in the single-IV model. They introduce the tF critical value function, a standard error adjustment that is a smooth function of the first-stage F-statistic, which corrects for weak instrument bias. They find that for one-quarter of specifications in 61 AER papers, corrected standard errors are at least 49% larger than conventional 2SLS standard errors at the 5% significance level. The practical implication is that researchers using IV should apply their tF correction rather than relying on conventional standard errors.
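
The tF critical values themselves come from tables in the paper, but the inputs are the conventional ingredients of just-identified IV: the first-stage F-statistic and the 2SLS estimate. A simulation sketch (instrument strength, sample size, and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
z = rng.normal(0, 1, n)                    # instrument
u = rng.normal(0, 1, n)                    # confounder
x = 0.5 * z + u + rng.normal(0, 1, n)      # endogenous regressor
y = 1.0 * x + u + rng.normal(0, 1, n)      # true structural effect = 1

# First stage: regress x on z, F-statistic on the excluded instrument
Z = np.column_stack([np.ones(n), z])
pi = np.linalg.lstsq(Z, x, rcond=None)[0]
res = x - Z @ pi
var_pi = res @ res / (n - 2) * np.linalg.inv(Z.T @ Z)[1, 1]
F = pi[1] ** 2 / var_pi

# Just-identified 2SLS = ratio of reduced-form to first-stage slopes
ry = np.linalg.lstsq(Z, y, rcond=None)[0][1]
beta_iv = ry / pi[1]

assert F > 10                              # strong first stage in this draw
assert abs(beta_iv - 1.0) < 0.5            # near the true effect
```

Under the tF procedure, the 2SLS standard error is then inflated by a factor that decreases smoothly in F, rather than applying the all-or-nothing F > 10 rule of thumb.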

  11. Lennox, C. S., Francis, J. R., & Wang, Z. (2012). Selection Models in Accounting Research. The Accounting Review, 87(2), 589–616.

    doi.org/10.2308/accr-10195

    survey · accounting · best-practices
    Annotation

    Lennox, Francis, and Wang review the use (and misuse) of Heckman selection models in accounting research. Documents common pitfalls including weak exclusion restrictions, failure to test normality, and mechanical application without economic justification for the selection equation.

  12. Levitt, S. D. (1997). Using Electoral Cycles in Police Hiring to Estimate the Effect of Police on Crime. American Economic Review, 87(3), 270–290.

    crime · police · electoral-cycles · reverse-causality
    Annotation

    Levitt uses the timing of mayoral and gubernatorial elections as an instrument for police hiring to estimate the causal effect of police on crime. The paper illustrates the IV approach in a policy-relevant setting where the key concern is reverse causality (more crime leads to more police).

  13. Lin, D. Y., Wei, L. J., & Ying, Z. (1993). Checking the Cox Model with Cumulative Sums of Martingale-Based Residuals. Biometrika, 80(3), 557–572.

    doi.org/10.1093/biomet/80.3.557

    foundational · diagnostics · model-checking
    Annotation

    Lin, Wei, and Ying develop graphical and numerical methods for checking the Cox model using cumulative sums of martingale-based residuals. Provides formal tests for the proportional hazards assumption, functional form of covariates, and overall model adequacy.

  14. Lin, W. (2013). Agnostic Notes on Regression Adjustments to Experimental Data: Reexamining Freedman's Critique. Annals of Applied Statistics, 7(1), 295–318.

    doi.org/10.1214/12-AOAS583

    Annotation

    Lin shows that OLS regression adjustment with a full set of treatment-covariate interactions yields an estimator that is asymptotically no less precise than the unadjusted difference in means in randomized experiments, even without assuming correct model specification. This result resolves Freedman's critique of regression adjustment by demonstrating that the interacted specification, combined with Huber-White standard errors, produces valid inference under Neyman's randomization model. Experimentalists should include treatment-by-covariate interactions and use robust standard errors when adjusting for baseline covariates.
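
Lin's interacted specification is algebraically equivalent to running separate regressions in each arm and comparing predictions at the overall covariate mean, an identity a short simulation confirms (the data-generating values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
d = (np.arange(n) % 2).astype(float)       # randomized treatment indicator
x = rng.normal(0, 1, n)                    # baseline covariate
y = 1 + 0.8 * d + 0.5 * x + 0.3 * d * x + rng.normal(0, 1, n)

xc = x - x.mean()                          # center the covariate
X = np.column_stack([np.ones(n), d, xc, d * xc])
b = np.linalg.lstsq(X, y, rcond=None)[0]   # fully interacted OLS

def arm_prediction(mask):
    # Arm-specific OLS prediction evaluated at the overall covariate mean
    Xa = np.column_stack([np.ones(mask.sum()), x[mask]])
    ba = np.linalg.lstsq(Xa, y[mask], rcond=None)[0]
    return ba[0] + ba[1] * x.mean()

assert np.isclose(b[1], arm_prediction(d == 1) - arm_prediction(d == 0))
```

In applied work the coefficient on d would be reported with heteroscedasticity-robust standard errors, per Lin's argument.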

  15. Linden, A. (2015). Conducting Interrupted Time-Series Analysis for Single- and Multiple-Group Comparisons. Stata Journal, 15(2), 480–500.

    doi.org/10.1177/1536867X1501500208

    survey · stata · software
    Annotation

    Linden introduces the itsa command in Stata for single- and multiple-group ITS analysis. Covers Newey-West standard errors for autocorrelation, Prais-Winsten estimation, and the extension to controlled ITS with a comparison group. A key reference for Stata users.

  16. List, J. A., Sadoff, S., & Wagner, M. (2011). So You Want to Run an Experiment, Now What? Some Simple Rules of Thumb for Optimal Experimental Design. Experimental Economics, 14(4), 439–457.

    doi.org/10.1007/s10683-011-9275-7

    power-analysis · sample-size · design-guide
    Annotation

    In this practical guide, List, Sadoff, and Wagner provide rules of thumb for sample size, treatment assignment, and other design decisions in field experiments. It is a useful starting point for researchers planning their first experiment.
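
One such design decision concerns the minimum detectable effect. Below is a stylized sketch for a 50/50 allocation with 80% power and a 5% two-sided test, extended with the usual design-effect inflation for clustered assignment; the formula and constants are textbook-standard, not specific to this paper:

```python
from math import sqrt

def mde(n, sigma, icc=0.0, m=1, z_alpha=1.96, z_power=0.84):
    # Minimum detectable effect for a 50/50 two-arm design with n units
    # total, outcome SD sigma, clusters of size m, intraclass corr. icc.
    deff = 1 + (m - 1) * icc                 # design effect for clustering
    se = sigma * sqrt(deff) * sqrt(4.0 / n)  # SE of the difference in means
    return (z_alpha + z_power) * se

assert abs(mde(4000, 1.0) - 2.8 * sqrt(4.0 / 4000)) < 1e-9
assert mde(4000, 1.0, icc=0.05, m=50) > mde(4000, 1.0)   # clustering costs power
```

Inverting the same formula for n gives the required sample size for a target effect, which is how such rules of thumb are typically used at the design stage.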

  17. List, J. A., Shaikh, A. M., & Xu, Y. (2019). Multiple Hypothesis Testing in Experimental Economics. Experimental Economics, 22(4), 773–793.

    doi.org/10.1007/s10683-018-09597-5

    experimental-economics · field-experiments · practical-guide
    Annotation

    List, Shaikh, and Xu provide practical guidance on addressing multiple hypothesis testing in experimental economics. They compare various correction methods including Bonferroni, Holm, and FDR procedures, and demonstrate their application to field experiments with multiple outcome variables.

  18. Long, J. S. (1997). Regression Models for Categorical and Limited Dependent Variables. SAGE Publications.

    Survey · on logit probit
    textbook · categorical-data · limited-dependent-variables
    Annotation

    Long provides a comprehensive reference for applied researchers working with binary, ordinal, multinomial, and count outcome models. The textbook covers maximum likelihood estimation, marginal effects computation, and model diagnostics with clear exposition and software implementation guidance. It remains the standard practical guide for researchers who need to move beyond OLS to handle categorical and limited dependent variables.

  19. Long, J. S., & Ervin, L. H. (2000). Using Heteroscedasticity Consistent Standard Errors in the Linear Regression Model. The American Statistician, 54(3), 217–224.

    doi.org/10.1080/00031305.2000.10474549

    Foundational · on ols regression
    robust-standard-errors · heteroscedasticity · HC3 · simulation
    Annotation

    Long and Ervin compare HC0, HC1, HC2, and HC3 heteroscedasticity-consistent standard error estimators in a simulation study. Their finding that HC3 performs best in finite samples has influenced applied practice, with many applied researchers preferring HC3 over the default HC0.
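
The estimators being compared differ only in how squared residuals are weighted by leverage. A compact sketch of HC0 versus HC3 on simulated heteroscedastic data:

```python
import numpy as np

def ols_hc_se(X, y, kind="HC3"):
    # OLS coefficients with a heteroscedasticity-consistent sandwich variance
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)              # leverages h_ii
    omega = e**2 if kind == "HC0" else e**2 / (1 - h) ** 2   # HC3 weighting
    cov = XtX_inv @ (X.T * omega) @ X @ XtX_inv
    return beta, np.sqrt(np.diag(cov))

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 60)
X = np.column_stack([np.ones(60), x])
y = 1 + 2 * x + rng.normal(0, 0.1 + x, 60)        # heteroscedastic errors
_, se_hc0 = ols_hc_se(X, y, "HC0")
_, se_hc3 = ols_hc_se(X, y, "HC3")
assert np.all(se_hc3 >= se_hc0)    # HC3 inflates SEs, most at high-leverage points
```

HC1 and HC2 sit between these two, rescaling the squared residuals by n/(n-k) and 1/(1-h) respectively.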

  20. Lopez Bernal, J., Cummins, S., & Gasparrini, A. (2017). Interrupted Time Series Regression for the Evaluation of Public Health Interventions: A Tutorial. International Journal of Epidemiology, 46(1), 348–355.

    doi.org/10.1093/ije/dyw098

    survey · tutorial · public-health
    Annotation

    Lopez Bernal, Cummins, and Gasparrini provide an accessible tutorial on ITS regression for public health researchers. Covers the segmented regression model, autocorrelation diagnostics, Newey-West standard errors, and practical guidance on minimum number of time points. An excellent starting point for applied researchers.
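
The segmented regression model in the tutorial has four terms: intercept, baseline trend, level change, and slope change. A noise-free sketch shows the parameterization (the series and intervention date are invented):

```python
import numpy as np

t = np.arange(24)                         # 24 monthly observations
post = (t >= 12).astype(float)            # intervention at month 12
tsi = np.maximum(t - 12, 0)               # time since intervention
# Level drop of 3 and slope change of -0.2 on a 0.5/month baseline trend
y = 10 + 0.5 * t - 3 * post - 0.2 * tsi

X = np.column_stack([np.ones_like(t), t, post, tsi])
b = np.linalg.lstsq(X, y, rcond=None)[0]
assert np.allclose(b, [10, 0.5, -3, -0.2])   # all four parameters recovered
```

With real data the same design matrix would be combined with Newey-West or Prais-Winsten inference to handle the autocorrelation the tutorial emphasizes.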

  21. Lopez Bernal, J., Cummins, S., & Gasparrini, A. (2018). The Use of Controls in Interrupted Time Series Studies of Public Health Interventions. International Journal of Epidemiology, 47(6), 2082–2093.

    doi.org/10.1093/ije/dyy135

    survey · tutorial · controlled-its
    Annotation

    Lopez Bernal and colleagues provide a tutorial on extending ITS analysis with control groups to strengthen causal inference. Discusses controlled ITS (CITS) designs that combine the ITS framework with a comparison series, addressing the key threat of concurrent events confounding the intervention effect.

  22. Lunceford, J. K., & Davidian, M. (2004). Stratification and Weighting via the Propensity Score in Estimation of Causal Treatment Effects: A Comparative Study. Statistics in Medicine, 23(19), 2937–2960.

    doi.org/10.1002/sim.1903

    propensity-score · comparison · simulation
    Annotation

    Lunceford and Davidian compare propensity-score stratification, inverse probability weighting, and doubly robust estimators in a systematic simulation study. The paper provides a side-by-side assessment of these approaches for estimating causal treatment effects from observational data.

M
24
  1. Machado, J. A. F., & Santos Silva, J. M. C. (2019). Quantiles via Moments. Journal of Econometrics, 213(1), 145–173.

    doi.org/10.1016/j.jeconom.2019.04.009

    foundational · panel-data · fixed-effects
    Annotation

    Machado and Santos Silva show that, under a conditional location-scale structure, regression quantiles can be estimated by estimating conditional means. This 'quantiles via moments' approach makes it possible to use tools developed for mean regression in distributional-effects settings, and it can be adapted to panel data with fixed effects by avoiding the incidental parameters problem.

  2. MacKinlay, A. C. (1997). Event Studies in Economics and Finance. Journal of Economic Literature, 35(1), 13–39.

    Survey · on event studies
    survey · methodology · abnormal-returns · statistical-testing
    Annotation

    MacKinlay provides a comprehensive methodological survey of event studies, covering the statistical framework, estimation windows, abnormal return calculations, and testing procedures. This paper remains the standard reference for researchers designing and implementing event studies.

  3. MacKinnon, D. P., Fairchild, A. J., & Fritz, M. S. (2007). Mediation Analysis. Annual Review of Psychology, 58, 593–614.

    doi.org/10.1146/annurev.psych.58.110405.085542

    psychology · Sobel-test · bootstrapping
    Annotation

    MacKinnon, Fairchild, and Fritz provide an accessible review of mediation analysis methods for psychologists, covering the Baron-Kenny approach, the Sobel test, bootstrapping methods, and extensions to multiple mediators. This survey helped bridge the gap between traditional and modern approaches.

  4. Manski, C. F. (1990). Nonparametric Bounds on Treatment Effects. American Economic Review: Papers & Proceedings, 80(2), 319–323.

    Foundational · on lee bounds
    partial-identification · worst-case-bounds · nonparametric
    Annotation

    Manski introduces the partial identification approach to treatment effects, showing that even without strong assumptions, one can bound causal effects using the observed data. His worst-case bounds framework lays the theoretical foundation for Lee's sharper bounds under the monotonicity assumption.

  5. Manski, C. F. (1993). Identification of Endogenous Social Effects: The Reflection Problem. Review of Economic Studies, 60(3), 531–542.

    doi.org/10.2307/2298123

    Foundational · on instrumental variables
    identification · social-interactions · peer-effects · reflection-problem
    Annotation

    Manski formalizes the reflection problem in the analysis of social interactions: when individual outcomes depend on group averages, the group average is simultaneously determined by its members. This simultaneity makes it impossible to distinguish true social (endogenous) effects from correlated effects without additional structure or exclusion restrictions. The paper is essential reading for any researcher attempting to estimate peer effects or social spillovers.

  6. Manski, C. F. (2003). Partial Identification of Probability Distributions. Springer.

    doi.org/10.1007/b97478

    Foundational · on lee bounds
    partial-identification · textbook · bounds · nonparametric
    Annotation

    Manski's monograph provides a comprehensive treatment of partial identification, showing how to derive informative bounds on parameters of interest when point identification is not possible. This book formalizes and extends his earlier work on bounding treatment effects and is the standard reference for the theoretical framework underlying Lee bounds.

  7. Masicampo, E. J., & Lalande, D. (2012). A Peculiar Prevalence of p Values Just Below .05. Quarterly Journal of Experimental Psychology, 65(11), 2271–2279.

    doi.org/10.1080/17470218.2012.711335

    Application · on specification curve
    p-values · publication-bias · specification-searching
    Annotation

    Masicampo and Lalande document a suspicious clustering of p-values just below the .05 threshold in psychology journals, providing empirical evidence of publication bias and researcher degrees of freedom. They discuss potential sources of this pattern and its implications for the credibility of published findings in the social sciences.

  8. Masten, M. A., & Poirier, A. (2021). Salvaging Falsified Instrumental Variable Models. Econometrica, 89(3), 1449–1469.

    doi.org/10.3982/ECTA17969

    Foundational · on sensitivity analysis
    instrumental-variables · falsification · partial-identification · bounds
    Annotation

    Masten and Poirier study what researchers can do when an IV model is falsified. They introduce the falsification frontier and the falsification adaptive set, which quantify minimal relaxations of the baseline assumptions and report the parameter values consistent with minimally nonfalsified models, providing a structured sensitivity-analysis framework for IV.

  9. McCrary, J. (2008). Manipulation of the Running Variable in the Regression Discontinuity Design: A Density Test. Journal of Econometrics, 142(2), 698–714.

    doi.org/10.1016/j.jeconom.2007.05.005

    manipulation-test · density-test · validity
    Annotation

    McCrary develops the standard test for whether agents are manipulating the running variable to sort around the cutoff. If the density of the running variable shows a discontinuity at the cutoff, the RDD is compromised. This density test is now a routine validity check in all RDD papers.

  10. McFadden, D. (1974). Conditional Logit Analysis of Qualitative Choice Behavior. Frontiers in Econometrics, 105–142.

    Foundational · on logit probit
    conditional-logit · discrete-choice · random-utility
    Annotation

    McFadden develops the conditional logit model grounded in random utility theory, showing how discrete choices among alternatives can be modeled by assuming individuals maximize utility with an extreme-value distributed error. This work earns him the 2000 Nobel Prize and remains the foundation of discrete choice analysis.

  11. McKenzie, D. (2012). Beyond Baseline and Follow-Up: The Case for More T in Experiments. Journal of Development Economics, 99(2), 210–221.

    doi.org/10.1016/j.jdeveco.2012.01.002

    Foundational · on power analysis
    ANCOVA · multiple-periods · development-economics
    Annotation

    McKenzie shows that collecting multiple rounds of data can substantially increase statistical power in randomized experiments. He demonstrates that ANCOVA with baseline data and difference-in-differences with multiple time periods can substantially reduce the required sample size, which is particularly valuable in development economics.
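
The variance comparison behind this argument fits in a few lines. With autocorrelation rho between baseline and follow-up, the per-unit variance factors (relative to a post-only comparison, normalized to 1) are:

```python
def variance_factors(rho):
    # Variance of the treatment estimator relative to a post-only comparison:
    ancova = 1 - rho**2        # ANCOVA: control for the baseline outcome
    did = 2 * (1 - rho)        # difference-in-differences with one baseline
    return ancova, did

a, d = variance_factors(0.3)
assert a < 1 < d               # low rho: DiD is WORSE than ignoring baseline
a, d = variance_factors(0.9)
assert a < d < 1               # high rho: both help, ANCOVA helps more
```

ANCOVA weakly dominates at every rho, and the gap is largest for noisy outcomes with low autocorrelation, which is where McKenzie recommends adding more survey rounds.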

  12. McWilliams, A., & Siegel, D. (1997). Event Studies in Management Research: Theoretical and Empirical Issues. Academy of Management Journal, 40(3), 626–657.

    doi.org/10.2307/257056

    Survey · Mgmt · on event studies
    management-methodology · tutorial · strategy-research
    Annotation

    McWilliams and Siegel provide a critical assessment of event study methodology as applied in management research, identifying common theoretical and design pitfalls including confounding events, improper event window selection, and thin trading. The paper outlines procedures for appropriate use of event studies and serves as a widely cited methodological guide for strategy and management researchers conducting event studies.

  13. Miguel, E., Satyanath, S., & Sergenti, E. (2004). Economic Shocks and Civil Conflict: An Instrumental Variables Approach. Journal of Political Economy, 112(4), 725–753.

    doi.org/10.1086/421174

    civil-conflict · rainfall-instrument · weather-IV · Africa
    Annotation

    Miguel, Satyanath, and Sergenti instrument for economic growth using rainfall variation to estimate the causal effect of economic shocks on civil conflict in Sub-Saharan Africa. Their paper is a clean and widely cited example of using weather as an instrumental variable, illustrating both the power and the exclusion restriction challenges of weather-based instruments.

  14. Miguel, E., Camerer, C., Casey, K., Cohen, J., Esterling, K. M., Gerber, A., Glennerster, R., Green, D. P., Humphreys, M., Imbens, G., Laitin, D., Madon, T., Nelson, L., Nosek, B. A., Petersen, M., Sedlmayr, R., Simmons, J. P., Simonsohn, U., & Van der Laan, M. (2014). Promoting Transparency in Social Science Research. Science, 343(6166), 30–31.

    doi.org/10.1126/science.1245317

    Foundational · on pre registration
    transparency · open-science · social-science
    Annotation

    Miguel and a coalition of leading social scientists call for greater transparency in research, including pre-registration of studies and analysis plans, open data, and replication. This short but influential piece in Science helps establish the norms and infrastructure for pre-registration in social science.

  15. Mincer, J. (1974). Schooling, Experience, and Earnings. National Bureau of Economic Research / Columbia University Press.

    Application · on ols regression
    returns-to-education · labor-economics · wage-equation
    Annotation

    Mincer develops the canonical human-capital earnings function relating log wages to years of schooling and labor-market experience. The Mincer equation is one of the most replicated empirical models in economics and remains the standard benchmark for wage-equation analysis, though it should not be read as having solved the causal identification problems surrounding returns to schooling.

  16. Mogstad, M., Santos, A., & Torgovitsky, A. (2018). Using Instrumental Variables for Inference about Policy Relevant Treatment Parameters. Econometrica, 86(5), 1589–1619.

    doi.org/10.3982/ECTA15463

    MTE · partial-identification · bounds · policy-evaluation · ivmte
    Annotation

    Mogstad, Santos, and Torgovitsky develop a framework for using instrumental variables to conduct inference on policy-relevant treatment effects under weaker assumptions than full MTE identification. They show that even when the MTE is only partially identified (due to limited support of the propensity score), informative bounds on ATE, ATT, and PRTE can be derived by combining the identified portion of the MTE with shape restrictions. Their approach uses linear programming to compute sharp bounds on the target parameter given the data and assumptions. The paper provides the R package ivmte for implementation and demonstrates that useful policy conclusions can be drawn even without point-identifying the entire MTE curve.

  17. Montiel Olea, J. L., & Pflueger, C. (2013). A Robust Test for Weak Instruments. Journal of Business & Economic Statistics, 31(3), 358–369.

    doi.org/10.1080/07350015.2013.806694

    Foundational · on instrumental variables
    weak-instruments · robust-inference · F-statistic
    Annotation

    Montiel Olea and Pflueger propose an effective F-statistic for testing weak instruments that is robust to heteroscedasticity, serial correlation, and clustering — unlike the conventional first-stage F. The effective F is now the standard diagnostic for instrument strength in applied IV research.

  18. Moulton, B. R. (1990). An Illustration of a Pitfall in Estimating the Effects of Aggregate Variables on Micro Units. Review of Economics and Statistics, 72(2), 334–338.

    doi.org/10.2307/2109724

    Foundational · on ols regression
    clustering · aggregate-variables · standard-errors · Moulton-problem
    Annotation

    Moulton demonstrates that when aggregate-level variables (such as state policies) are used to explain individual-level outcomes, OLS standard errors that ignore within-group correlation can be dramatically understated. This paper establishes the 'Moulton problem' and motivates the widespread adoption of clustered standard errors in applied microeconomics.
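
The magnitude of the understatement is governed by the familiar design-effect formula. A minimal sketch for equal-sized groups (the numbers are illustrative):

```python
def moulton_factor(m, rho):
    # Approximate factor by which OLS standard errors on a group-level
    # regressor should be inflated: sqrt(1 + (m - 1) * rho), where m is
    # the group size and rho the intraclass correlation of the errors.
    return (1 + (m - 1) * rho) ** 0.5

# Even a small intraclass correlation is costly with large groups:
assert abs(moulton_factor(100, 0.05) - (1 + 99 * 0.05) ** 0.5) < 1e-12   # ~2.44x
assert moulton_factor(1, 0.9) == 1.0    # one observation per group: no inflation
```

Cluster-robust standard errors address the same problem without assuming equal group sizes or a constant intraclass correlation.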

  19. Mroz, T. A. (1987). The Sensitivity of an Empirical Model of Married Women's Hours of Work to Economic and Statistical Assumptions. Econometrica, 55(4), 765–799.

    doi.org/10.2307/1911029

    application · labor-supply · sensitivity
    Annotation

    Mroz provides a classic application of the Heckman selection model to female labor supply. Shows that the two-step estimator's results are sensitive to the choice of exclusion restriction and the normality assumption. The Mroz dataset remains a standard teaching dataset for selection models.

  20. Mullainathan, S., & Spiess, J. (2017). Machine Learning: An Applied Econometric Approach. Journal of Economic Perspectives, 31(2), 87–106.

    doi.org/10.1257/jep.31.2.87

    machine-learning · prediction-vs-causation · economics
    Annotation

    Mullainathan and Spiess provide an accessible introduction to supervised machine learning for economists, emphasizing how ML differs from classical parameter estimation and where prediction-oriented tools can be useful in empirical economics. The paper is a broad ML-for-economists survey, not a foundational paper on double/debiased machine learning specifically.

  21. Munafo, M. R., Nosek, B. A., Bishop, D. V. M., Button, K. S., Chambers, C. D., Percie du Sert, N., Simonsohn, U., Wagenmakers, E.-J., Ware, J. J., & Ioannidis, J. P. A. (2017). A Manifesto for Reproducible Science. Nature Human Behaviour, 1, 0021.

    doi.org/10.1038/s41562-016-0021

    Foundational · on specification curve
    reproducibility · open-science · manifesto
    Annotation

    Munafo, Nosek, and colleagues identify threats to reproducible science and propose a broad reform agenda spanning methods, reporting, reproducibility practices, evaluation, and incentives. The article is a general reproducibility manifesto that provides the broader scientific reform context motivating robustness-analysis approaches.

  22. Mundlak, Y. (1978). On the Pooling of Time Series and Cross Section Data. Econometrica, 46(1), 69–85.

    doi.org/10.2307/1913646

    correlated-random-effectspanel-datapooling
    Annotation

    Mundlak shows that the fixed effects estimator can be understood as an OLS regression that includes the group means of all time-varying regressors. This 'correlated random effects' interpretation bridges the fixed effects and random effects models and clarifies exactly what assumption is being relaxed.
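Mundlak's equivalence is easy to verify numerically. In this sketch (simulated balanced panel, all variable names hypothetical), the coefficient on x from OLS with unit means of x added coincides with the within (fixed effects) estimator:

```python
import numpy as np

# balanced panel where x is correlated with the unit effect alpha
rng = np.random.default_rng(0)
n_units, n_periods = 50, 6
unit = np.repeat(np.arange(n_units), n_periods)
alpha = rng.normal(size=n_units)
x = alpha[unit] + rng.normal(size=unit.size)
y = 2.0 * x + alpha[unit] + rng.normal(size=unit.size)

counts = np.bincount(unit)

def demean(v):
    """Subtract unit means (the within transformation)."""
    return v - (np.bincount(unit, weights=v) / counts)[unit]

# within (fixed effects) estimator
beta_fe = (demean(x) @ demean(y)) / (demean(x) @ demean(x))

# Mundlak regression: OLS of y on a constant, x, and the unit means of x
xbar = (np.bincount(unit, weights=x) / counts)[unit]
X = np.column_stack([np.ones_like(x), x, xbar])
beta_mundlak = np.linalg.lstsq(X, y, rcond=None)[0][1]
# beta_fe and beta_mundlak coincide up to floating-point error
```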

  23. Muralidharan, K., Niehaus, P., & Sukhtankar, S. (2016). Building State Capacity: Evidence from Biometric Smartcards in India. American Economic Review, 106(10), 2895–2929.

    doi.org/10.1257/aer.20141346

    Applicationon power analysis
    cluster-RCTMDEdevelopment-economicsstate-capacity
    Annotation

    Muralidharan, Niehaus, and Sukhtankar evaluate a large-scale randomized rollout of biometric smartcards for welfare payments in India, finding that the reform improved payment speed, predictability, and integrity. The paper includes detailed ex ante power calculations that demonstrate best practices for reporting minimum detectable effects in cluster-randomized designs.

  24. Murray, M. P. (2006). Avoiding Invalid Instruments and Coping with Weak Instruments. Journal of Economic Perspectives, 20(4), 111–132.

    doi.org/10.1257/jep.20.4.111

    instrument-validityweak-instrumentspractical-guideapplied-work
    Annotation

    Murray provides practical guidance on evaluating instrument validity and dealing with weak instruments in applied work. Written in an accessible style, it helps applied researchers think critically about their instrument choices and provides concrete strategies for addressing common IV pitfalls.

N
7
  1. Neumark, D., & Wascher, W. (2000). Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania: Comment. American Economic Review, 90(5), 1362–1396.

    doi.org/10.1257/aer.90.5.1362

    minimum-wagereplicationmeasurementdid
    Annotation

    Neumark and Wascher challenge Card and Krueger's (1994) minimum wage findings by re-analyzing the data using payroll records instead of survey responses, finding negative employment effects. The exchange illustrates the importance of data quality and measurement choices in difference-in-differences designs.

  2. Newey, W. K., & West, K. D. (1987). A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix. Econometrica, 55(3), 703–708.

    doi.org/10.2307/1913610

    Foundationalon ols regression
    HACautocorrelationtime-seriesstandard-errors
    Annotation

    In this short but hugely influential paper, Newey and West extend White's robust standard errors to account for autocorrelation in time-series data. 'Newey-West' or 'HAC' standard errors are now standard practice whenever researchers work with data that have a time dimension.
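The estimator itself is compact. A minimal NumPy sketch of the Newey-West covariance with a Bartlett kernel, applied to a toy autocorrelated series (the data-generating process and lag choice are illustrative assumptions, not from the paper):

```python
import numpy as np

def newey_west(X, y, lags):
    """OLS coefficients with Newey-West (HAC) standard errors, Bartlett kernel."""
    n, k = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    u = y - X @ beta
    Xu = X * u[:, None]                      # per-observation score contributions
    S = Xu.T @ Xu / n                        # lag-0 (White) term
    for lag in range(1, lags + 1):
        w = 1.0 - lag / (lags + 1.0)         # Bartlett weights keep S positive semi-definite
        gamma = Xu[lag:].T @ Xu[:-lag] / n
        S += w * (gamma + gamma.T)
    bread = np.linalg.inv(X.T @ X / n)
    V = bread @ S @ bread / n                # sandwich covariance
    return beta, np.sqrt(np.diag(V))

# toy series where both the regressor and the error follow an AR(1)
rng = np.random.default_rng(1)
n = 2_000
x = np.zeros(n)
e = np.zeros(n)
for t in range(1, n):
    x[t] = 0.7 * x[t - 1] + rng.normal()
    e[t] = 0.7 * e[t - 1] + rng.normal()
y = 0.5 * x + e
beta_hat, se_hac = newey_west(np.column_stack([np.ones(n), x]), y, lags=12)
```

With both series autocorrelated, the HAC standard error on the slope exceeds the naive OLS one, which is the correction the paper supplies.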

  3. Newey, W. K. (1999). Two Step Series Estimation of Sample Selection Models. MIT Department of Economics Working Paper 99-04.

    Annotation

    Newey proposes a semiparametric two-step estimator for sample selection models that replaces the parametric inverse Mills ratio with a flexible series (power series or regression spline) approximation to the unknown selection correction function. This approach avoids the normality assumption underlying the standard Heckman correction while retaining the computational convenience of a two-step procedure. Researchers concerned about distributional misspecification in selection models can use series-based selection corrections as a robust alternative to parametric methods.

  4. Nickell, S. (1981). Biases in Dynamic Models with Fixed Effects. Econometrica, 49(6), 1417–1426.

    doi.org/10.2307/1911408

    Foundationalon fixed effects
    dynamic-panelsNickell-biaslagged-dependent-variable
    Annotation

    Nickell shows that including a lagged dependent variable in a fixed effects regression creates a bias that does not vanish as the number of cross-sectional units grows. This 'Nickell bias' is a critical concern for researchers using fixed effects in dynamic panel models with short time series.

  5. Nie, X., & Wager, S. (2021). Quasi-Oracle Estimation of Heterogeneous Treatment Effects. Biometrika, 108(2), 299–319.

    doi.org/10.1093/biomet/asaa076

    Foundationalon causal forests
    R-learnerCATEmeta-learners
    Annotation

    Nie and Wager propose the R-learner, a two-step approach for estimating heterogeneous treatment effects that first residualizes outcomes and treatment on covariates, then estimates the CATE by regressing outcome residuals on treatment residuals. This approach can use any machine learning method including causal forests.

  6. Nielsen, H. S., Sorensen, T., & Taber, C. (2010). Estimating the Effect of Student Aid on College Enrollment: Evidence from a Government Grant Policy Reform. American Economic Journal: Economic Policy, 2(2), 185–215.

    doi.org/10.1257/pol.2.2.185

    student-aidcollege-enrollmentDenmarkearly-application
    Annotation

    Nielsen, Sorensen, and Taber apply a regression kink design to estimate the effect of student financial aid on college enrollment in Denmark. The Danish student aid formula creates a kink in the relationship between parental income and aid received. They exploit this kink to identify causal effects, providing one of the earliest applications of the RKD methodology.

  7. Nosek, B. A., Ebersole, C. R., DeHaven, A. C., & Mellor, D. T. (2018). The Preregistration Revolution. Proceedings of the National Academy of Sciences, 115(11), 2600–2606.

    doi.org/10.1073/pnas.1708274114

    Foundationalon pre registration
    pre-registrationopen-sciencecredibility
    Annotation

    Nosek and colleagues make the case for widespread adoption of pre-registration, arguing that it distinguishes confirmatory from exploratory analyses, reduces publication bias, and increases the credibility of empirical research. This paper helps catalyze the pre-registration movement across the social sciences.

O
4
  1. Olken, B. A. (2015). Promises and Perils of Pre-Analysis Plans. Journal of Economic Perspectives, 29(3), 61–80.

    doi.org/10.1257/jep.29.3.61

    pre-analysis-plansdevelopment-economicstradeoffs
    Annotation

    Olken provides a balanced assessment of pre-analysis plans in development economics, discussing both benefits (reduced specification searching, increased credibility) and costs (loss of flexibility, difficulty specifying analyses in advance). This paper is essential reading for understanding the practical tradeoffs of pre-registration.

  2. Oprescu, M., Syrgkanis, V., & Wu, Z. S. (2019). Orthogonal Random Forest for Causal Inference. Proceedings of the 36th International Conference on Machine Learning, 97, 4932–4941.

    orthogonal-forestsEconMLDML
    Annotation

    Oprescu, Syrgkanis, and Wu propose orthogonal random forests, which combine Neyman-orthogonal moments with generalized random forests to reduce sensitivity to nuisance-estimation error. The paper provides theoretical results and shows how the method can be used for heterogeneous-effect estimation with discrete or continuous treatments.

  3. Orben, A., & Przybylski, A. K. (2019). The Association between Adolescent Well-Being and Digital Technology Use. Nature Human Behaviour, 3(2), 173–182.

    doi.org/10.1038/s41562-018-0506-1

    Applicationon specification curve
    digital-technologywell-beinglarge-scale-applicationpsychology
    Annotation

    Orben and Przybylski apply specification curve analysis to the hotly debated question of whether digital technology use harms adolescent well-being, running over 20,000 specifications across three large datasets. They find that technology use has a negligible negative association with well-being, far smaller than commonly assumed, demonstrating how specification curve analysis can bring clarity to contested empirical questions by mapping the full space of defensible analytical choices.

  4. Oster, E. (2019). Unobservable Selection and Coefficient Stability: Theory and Evidence. Journal of Business & Economic Statistics, 37(2), 187–204.

    doi.org/10.1080/07350015.2016.1227711

    Foundationalon sensitivity analysis
    coefficient-stabilityproportional-selectionbounding
    Annotation

    Oster extends the Altonji, Elder, and Taber approach to assess the robustness of regression estimates to omitted variable bias. She proposes a bounding method based on the proportional selection assumption and coefficient stability across specifications, now widely used in applied economics.

P
11
  1. Palepu, K. G. (1986). Predicting Takeover Targets: A Methodological and Empirical Analysis. Journal of Accounting and Economics, 8(1), 3–35.

    doi.org/10.1016/0165-4101(86)90008-X

    Applicationon logit probit
    takeover-predictioncorporate-governancefinance
    Annotation

    Palepu uses logit models to study takeover prediction and identifies methodological flaws in prior prediction studies, showing that targets are more difficult to predict than earlier work suggests. The paper highlights the importance of proper classification criteria and sampling methodology when applying binary choice models to rare-event corporate outcomes.

  2. Pearl, J. (2001). Direct and Indirect Effects. Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence, 411–420.

    structural-causal-modelsnatural-effectsdo-calculus
    Annotation

    Pearl formalizes the concepts of natural direct and indirect effects using structural causal models and do-calculus. This paper establishes the nonparametric identification conditions for mediation effects and shows that traditional mediation analysis conflates causal and non-causal pathways.

  3. Pearl, J. (2009). Causality: Models, Reasoning, and Inference. Cambridge University Press.

    doi.org/10.1017/CBO9780511803161

    Foundationalon matching methods
    DAGsdo-calculusstructural-causal-modelsfoundations
    Annotation

    Pearl provides a comprehensive treatment of causal inference using directed acyclic graphs, the do-calculus, and structural causal models. The book formalizes the rules for reading conditional independence from graphs and establishes when causal effects are identifiable from observational data. It is the foundational reference for any researcher using DAGs to reason about confounding, mediation, and causal identification.

  4. Pearl, J. (2014). Interpretation and Identification of Causal Mediation. Psychological Methods, 19(4), 459–481.

    doi.org/10.1037/a0036434

    structural-causal-modelsidentificationnatural-effectsgraphical-criteria
    Annotation

    Pearl provides a structural causal model perspective on mediation, clarifying the interpretation and identification of natural direct and indirect effects. He shows how graphical criteria can determine when mediation effects are identifiable and contrasts the structural approach with the potential outcomes framework used by Imai, Keele, and Tingley.

  5. Peterson, M. F., Arregle, J.-L., & Martin, X. (2012). Multilevel Models in International Business Research. Journal of International Business Studies, 43(5), 451–457.

    doi.org/10.1057/jibs.2011.59

    SurveyMgmton random effects
    international-businessmultilevelcross-country
    Annotation

    Peterson, Arregle, and Martin review the use of multilevel random-effects models in international business research, where firms are nested within countries. They discuss best practices for modeling cross-level effects and the importance of accounting for the hierarchical structure of international data.

  6. Pongeluppe, L. S. (2024). The Allegory of the Favela: The Multifaceted Effects of Socioeconomic Mobility. Administrative Science Quarterly, 69(3), 619–654.

    doi.org/10.1177/00018392241240469

    ApplicationMgmton experimental design
    RCTfield-experimentsocioeconomic-mobilitystigmaentrepreneurship+1
    Annotation

    Pongeluppe conducts a randomized controlled trial of a business training program offered to residents of Brazilian favelas, complementing the experiment with quantile regressions, field visits, and interviews. The results show that training improves economic outcomes such as income and entrepreneurship participation, but also intensifies participants' experiences of favela-related stigma, revealing that socioeconomic mobility can simultaneously generate material benefits and psychosocial costs.

  7. Porreca, Z. (2022). Synthetic Difference-in-Differences Estimation with Staggered Treatment Timing. Economics Letters, 220, 110874.

    doi.org/10.1016/j.econlet.2022.110874

    staggered-adoptionextensionpolicy-evaluation
    Annotation

    Porreca extends the synthetic DID estimator to staggered treatment adoption settings, where multiple units adopt treatment at different times. The method constructs a localized estimator in which treated units are compared to a never-treated control group weighted on both the time and unit dimensions.

  8. Powell, J. L. (1987). Semiparametric Estimation of Bivariate Latent Variable Models. SSRI Working Paper 8704, University of Wisconsin-Madison.

    Annotation

    Powell develops semiparametric methods for estimating bivariate latent variable models—including censored sample selection models—without imposing distributional assumptions on the error terms. This approach relaxes the bivariate normality requirement of the Heckman two-step estimator, requiring only an exclusion restriction and mild regularity conditions. Researchers who doubt the normality assumption in selection models can apply these methods to obtain consistent estimates under weaker conditions.

  9. Preacher, K. J., & Hayes, A. F. (2008). Asymptotic and Resampling Strategies for Assessing and Comparing Indirect Effects in Multiple Mediator Models. Behavior Research Methods, 40(3), 879–891.

    doi.org/10.3758/BRM.40.3.879

    multiple-mediatorsbootstrappingsoftware
    Annotation

    Preacher and Hayes develop methods and software for testing indirect effects through multiple mediators simultaneously, using bootstrapping to construct confidence intervals. Their approach and accompanying SPSS and SAS macros become extremely widely used in psychology and management research.

  10. Puhani, P. A. (2000). The Heckman Correction for Sample Selection and Its Critique. Journal of Economic Surveys, 14(1), 53–68.

    doi.org/10.1111/1467-6419.00104

    surveycomparisontwo-step-vs-mle
    Annotation

    Puhani provides a short overview of Monte Carlo evidence on the Heckman two-step estimator, comparing it with full-information MLE and subsample OLS. He finds MLE preferable absent collinearity between the exclusion restriction and the other regressors, but subsample OLS most robust when collinearity is present.

  11. Pustejovsky, J. E., & Tipton, E. (2018). Small-Sample Methods for Cluster-Robust Variance Estimation and Hypothesis Testing in Fixed Effects Models. Journal of Business & Economic Statistics, 36(4), 672–683.

    doi.org/10.1080/07350015.2016.1247004

    Foundationalon clustering inference
    cluster-robustfew-clustersCR2
    Annotation

    Pustejovsky and Tipton develop the CR2 bias-reduced cluster-robust variance estimator for fixed effects models with few clusters. The CR2 correction improves coverage relative to the standard CR1 estimator when the number of clusters is small.

R
15
  1. Rabe-Hesketh, S., & Skrondal, A. (2012). Multilevel and Longitudinal Modeling Using Stata. Stata Press, 3rd edition.

    multilevel-modelsStatapractical-guidehierarchical
    Annotation

    Rabe-Hesketh and Skrondal provide a comprehensive practical guide to multilevel (hierarchical) models in Stata, which generalize the random effects framework to more complex nested data structures. It is an essential reference for applied researchers implementing multilevel models.

  2. Rambachan, A., & Roth, J. (2023). A More Credible Approach to Parallel Trends. Review of Economic Studies, 90(5), 2555–2591.

    doi.org/10.1093/restud/rdad018

    Foundationalon event studies
    parallel-trendssensitivity-analysishonest-confidence-intervals
    Annotation

    Rambachan and Roth develop a sensitivity analysis framework for assessing the robustness of event-study and difference-in-differences estimates to violations of the parallel trends assumption. Their approach constructs honest confidence intervals under restrictions on how pre-trends can extrapolate into the post-treatment period, providing a disciplined alternative to informal pre-trend tests.

  3. Rathje, J., Katila, R., & Reineke, P. (2024). Making the Most of AI and Machine Learning in Organizations and Strategy Research: Supervised Machine Learning, Causal Inference, and Matching Models. Strategic Management Journal, 45(10), 1926–1953.

    doi.org/10.1002/smj.3604

    SurveyMgmton matching methods
    machine-learningmatchingpropensity-scorecausal-inferencemethodology+1
    Annotation

    Rathje, Katila, and Reineke review how supervised machine learning can support causal-inference workflows in strategy research, with emphasis on two-stage matching models for sample-selection problems. Using technology invention data, they demonstrate ML-based approaches to covariate selection and matching while discussing the broader potential and limits of ML in organizational research.

  4. Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical Linear Models: Applications and Data Analysis Methods. SAGE Publications.

    HLMmultilevel-modelingnested-datatextbook
    Annotation

    In this influential textbook, Raudenbush and Bryk popularize hierarchical linear models (HLM), random-effects models for nested data structures such as students within schools. The book becomes the standard reference for multilevel modeling in education, psychology, and organizational research.

  5. Rivers, D., & Vuong, Q. H. (1988). Limited Information Estimators and Exogeneity Tests for Simultaneous Probit Models. Journal of Econometrics, 39(3), 347–366.

    doi.org/10.1016/0304-4076(88)90063-2

    Annotation

    Rivers and Vuong propose a computationally simple two-step maximum likelihood procedure for estimating simultaneous probit models with endogenous regressors, and derive simple exogeneity tests based on this estimator. The exogeneity tests are asymptotically equivalent to classical tests based on limited information maximum likelihood but require only probit and OLS regressions to implement. Applied researchers working with binary outcome models and suspected endogeneity can use the Rivers-Vuong procedure as a tractable alternative to full information maximum likelihood.

  6. Robins, J. M., & Greenland, S. (1992). Identifiability and Exchangeability for Direct and Indirect Effects. Epidemiology, 3(2), 143–155.

    doi.org/10.1097/00001648-199203000-00013

    direct-effectsindirect-effectsepidemiology
    Annotation

    Robins and Greenland provide early formal conditions for identifying direct and indirect causal effects in epidemiology. Their work on controlled direct effects and the assumptions required for mediation analysis lays important groundwork for the modern causal mediation literature.

  7. Robins, J. M., Rotnitzky, A., & Zhao, L. P. (1994). Estimation of Regression Coefficients When Some Regressors Are Not Always Observed. Journal of the American Statistical Association, 89(427), 846–866.

    doi.org/10.1080/01621459.1994.10476818

    AIPWmissing-datasemiparametric
    Annotation

    Robins, Rotnitzky, and Zhao introduce the augmented inverse probability weighting (AIPW) estimator, which combines outcome modeling and propensity score weighting. The key insight is that the estimator is consistent if either the outcome model or the propensity score model is correctly specified, providing a double layer of protection against misspecification.
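The AIPW estimand is a one-line formula. A minimal simulation sketch (effect size, variable names, and the deliberately wrong propensity model are all illustrative assumptions) showing the double-robustness property the annotation describes:

```python
import numpy as np

def aipw_ate(y, t, e_hat, mu1_hat, mu0_hat):
    """Augmented IPW (doubly robust) estimate of the average treatment effect."""
    return np.mean(
        mu1_hat - mu0_hat
        + t * (y - mu1_hat) / e_hat
        - (1 - t) * (y - mu0_hat) / (1 - e_hat)
    )

# simulation with a known effect of 1.0 and confounding through x
rng = np.random.default_rng(2)
n = 50_000
x = rng.normal(size=n)
e = 1 / (1 + np.exp(-x))                  # true propensity score
t = (rng.random(n) < e).astype(int)
y = 1.0 * t + x + rng.normal(size=n)

# double robustness: a correct outcome model rescues a wrong propensity model
ate_hat = aipw_ate(y, t, e_hat=np.full(n, 0.5), mu1_hat=x + 1.0, mu0_hat=x)
```

Even though the propensity model is misspecified (a constant 0.5), the estimate stays close to the true effect because the outcome models are correct; the symmetric case holds as well.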

  8. Robinson, P. M. (1988). Root-N-Consistent Semiparametric Regression. Econometrica, 56(4), 931–954.

    doi.org/10.2307/1912705

    partially-linearsemiparametricroot-n-consistency
    Annotation

    Robinson develops the partially linear regression estimator that achieves root-n consistency for the parametric component by partialling out nonparametric nuisance functions. This paper provides the semiparametric foundation that DML generalizes to the machine learning setting.
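The two-step partialling-out logic can be sketched directly. In this toy example (simulated data; a polynomial fit stands in for Robinson's kernel estimator, and all names are hypothetical), residualizing both outcome and treatment on the covariate recovers the parametric coefficient:

```python
import numpy as np

# partially linear model y = theta*d + g(x) + noise, with d also depending on x
rng = np.random.default_rng(3)
n = 5_000
x = rng.uniform(-2, 2, n)
d = np.sin(x) + rng.normal(size=n)
y = 1.5 * d + np.cos(x) + rng.normal(size=n)

def residualize(v):
    """Step 1: flexibly estimate E[v|x] and return the residual."""
    return v - np.polyval(np.polyfit(x, v, deg=7), x)

# Step 2: OLS of outcome residuals on treatment residuals recovers theta
y_res, d_res = residualize(y), residualize(d)
theta_hat = (d_res @ y_res) / (d_res @ d_res)
```

This residual-on-residual regression is the template that double/debiased machine learning later generalizes, with ML methods replacing the nonparametric first stage.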

  9. Rohrer, J. M., Egloff, B., & Schmukle, S. C. (2017). Probing Birth-Order Effects on Narrow Traits Using Specification-Curve Analysis. Psychological Science, 28(12), 1821–1832.

    doi.org/10.1177/0956797617723726

    Applicationon specification curve
    birth-orderpersonalityapplied-example
    Annotation

    Rohrer, Egloff, and Schmukle apply specification curve analysis to the long-debated question of whether birth order affects personality traits. By running all defensible specifications, they show that most previously reported birth-order effects disappear, demonstrating the method's power to resolve contested empirical questions.

  10. Romano, J. P., & Wolf, M. (2005). Stepwise Multiple Testing as Formalized Data Snooping. Econometrica, 73(4), 1237–1282.

    doi.org/10.1111/j.1468-0262.2005.00615.x

    Foundationalon multiple testing
    stepwise-testingresamplingFWER
    Annotation

    Romano and Wolf develop a stepwise multiple testing procedure that controls the family-wise error rate while being less conservative than Bonferroni by resampling from the joint distribution of test statistics. Their method accounts for the correlation structure among tests and is widely used in economics.

  11. Rosenbaum, P. R., & Rubin, D. B. (1983). The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika, 70(1), 41–55.

    doi.org/10.1093/biomet/70.1.41

    Foundationalon matching methods
    propensity-scoreselection-on-observablescausal-inference
    Annotation

    Rosenbaum and Rubin introduce the propensity score as a dimension-reduction tool for matching, showing that conditioning on the scalar probability of treatment is sufficient to remove selection bias when the unconfoundedness assumption holds. This paper establishes the theoretical foundation for all propensity-score-based methods, including matching, stratification, and inverse probability weighting. The key practical insight is that matching on a single score avoids the curse of dimensionality that makes direct covariate matching infeasible with many confounders.
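The dimension-reduction point is visible in a few lines. A minimal sketch of 1-nearest-neighbor matching on the propensity score, with replacement, on simulated data with a known treatment effect (the data-generating process and function names are illustrative assumptions):

```python
import numpy as np

def match_att(y, t, pscore):
    """ATT via 1-NN matching on the propensity score, with replacement."""
    treated = np.flatnonzero(t == 1)
    controls = np.flatnonzero(t == 0)
    gap = np.abs(pscore[treated, None] - pscore[None, controls])
    matched = controls[gap.argmin(axis=1)]     # closest control for each treated unit
    return np.mean(y[treated] - y[matched])

# toy data: treatment effect of 2.0, selection into treatment through x
rng = np.random.default_rng(4)
n = 4_000
x = rng.normal(size=n)
p = 1 / (1 + np.exp(-x))                       # true propensity score
t = (rng.random(n) < p).astype(int)
y = 2.0 * t + x + rng.normal(size=n)
att_hat = match_att(y, t, p)
```

Matching on the scalar score, rather than on x directly, is exactly the trick that scales to many confounders; in practice the score is estimated rather than known.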

  12. Rosenbaum, P. R. (2002). Observational Studies. Springer.

    doi.org/10.1007/978-1-4757-3692-2

    observational-studiessensitivity-analysisRosenbaum-boundstextbook
    Annotation

    Rosenbaum provides the standard textbook on observational study design, covering matching, sensitivity analysis, and design principles for drawing causal inferences from non-experimental data. His framework for sensitivity analysis (Rosenbaum bounds) is the standard tool for assessing how much unobserved confounding would be needed to overturn a matching-based finding.

  13. Roth, J. (2022). Pretest with Caution: Event-Study Estimates after Testing for Parallel Trends. American Economic Review: Insights, 4(3), 305–322.

    doi.org/10.1257/aeri.20210236

    pre-trendspre-testinghonest-confidence-intervalsevent-study
    Annotation

    Roth shows that the common practice of testing for parallel pre-trends and proceeding conditional on 'passing' can lead to distorted inference. He proposes honest confidence intervals that account for pre-testing, fundamentally changing how researchers should think about event study pre-trends in DiD designs.

  14. Roth, J., Sant'Anna, P. H. C., Bilinski, A., & Poe, J. (2023). What's Trending in Difference-in-Differences? A Synthesis of the Recent Econometrics Literature. Journal of Econometrics, 235(2), 2218–2244.

    doi.org/10.1016/j.jeconom.2023.03.008

    surveystaggered-DIDheterogeneous-effectspre-trends
    Annotation

    In this comprehensive survey, Roth et al. synthesize the explosion of recent econometric work on DID, covering staggered treatment timing, heterogeneous treatment effects, pre-trends testing, and new estimators. It is the essential starting point for understanding the modern DID literature.

  15. Rubin, D. B. (1974). Estimating Causal Effects of Treatments in Randomized and Nonrandomized Studies. Journal of Educational Psychology, 66(5), 688–701.

    doi.org/10.1037/h0037350

    Foundationalon experimental design
    potential-outcomescausal-inferenceRubin-causal-model
    Annotation

    Rubin formalizes the 'potential outcomes' framework that is now central to causal inference. The idea is simple but powerful: each unit has a potential outcome under treatment and under control, and the causal effect is the difference. This paper is the origin of what is now called the Rubin Causal Model.

S
24
  1. Saez, E. (2010). Do Taxpayers Bunch at Kink Points? American Economic Journal: Economic Policy, 2(3), 180–212.

    doi.org/10.1257/pol.2.3.180

    Foundationalon bunching estimation
    bunchingkink-pointelasticityincome-taxEITC
    Annotation

    Saez introduces the modern bunching methodology by examining taxpayer responses to kink points in the US income tax schedule, where marginal tax rates change discretely. He shows how to estimate the compensated elasticity of reported income from the excess mass of taxpayers at kink points relative to a smooth counterfactual density fitted by polynomial. The paper establishes the standard empirical approach: bin the data, fit a polynomial excluding the bunching region, and compute the excess mass. He finds modest elasticities overall but sharp bunching among the self-employed near the first EITC kink.
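The bin-fit-compare recipe described above can be sketched on toy data. This is a deliberately simplified illustration (simulated earnings, arbitrary window and polynomial degree); the real method also handles the integration constraint and round-number bunching that this sketch ignores:

```python
import numpy as np

# toy earnings data with excess mass at a kink in the tax schedule
rng = np.random.default_rng(5)
kink = 0.0
z = rng.normal(0.0, 2.0, 50_000)
movers = (z > kink) & (z < 0.6) & (rng.random(50_000) < 0.5)
z[movers] = kink                                   # bunchers pile up at the kink

# 1) bin the data
bins = np.arange(-4.0, 4.0 + 1e-9, 0.1)
counts, _ = np.histogram(z, bins)
centers = (bins[:-1] + bins[1:]) / 2

# 2) fit a polynomial counterfactual, excluding the bunching window
window = np.abs(centers - kink) < 0.26
coef = np.polyfit(centers[~window], counts[~window], deg=5)
counterfactual = np.polyval(coef, centers)

# 3) excess mass = actual minus counterfactual counts inside the window
excess = counts[window].sum() - counterfactual[window].sum()
b_hat = excess / counterfactual[window].mean()     # normalized excess mass
```

In the paper, the normalized excess mass is then mapped into a compensated elasticity using the size of the change in the net-of-tax rate at the kink.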

  2. Sant'Anna, P. H. C., & Zhao, J. (2020). Doubly Robust Difference-in-Differences Estimators. Journal of Econometrics, 219(1), 101–122.

    doi.org/10.1016/j.jeconom.2020.06.003

    DIDdoubly-robustATT
    Annotation

    Sant'Anna and Zhao develop doubly robust DID estimators that combine outcome regression and inverse probability weighting. The estimator is consistent for the ATT if either the outcome evolution model or the propensity score model for treatment group membership is correctly specified.

  3. Scharfstein, D. O., Rotnitzky, A., & Robins, J. M. (1999). Adjusting for Nonignorable Drop-Out Using Semiparametric Nonresponse Models. Journal of the American Statistical Association, 94(448), 1096–1120.

    doi.org/10.1080/01621459.1999.10473862

    missing-datadropoutsemiparametric-efficiency
    Annotation

    Scharfstein, Rotnitzky, and Robins develop a semiparametric sensitivity analysis framework for nonignorable dropout in longitudinal studies. They propose treating the selection bias parameter as known, then varying it over a plausible range to assess how inferences change. This paper provides foundational methods for sensitivity analysis under nonignorable missing data.

  4. Semadeni, M., Withers, M. C., & Certo, S. T. (2014). The Perils of Endogeneity and Instrumental Variables in Strategy Research: Understanding through Simulations. Strategic Management Journal, 35(7), 1070–1079.

    doi.org/10.1002/smj.2136

    weak-instrumentsstrategy-researchsimulationmethodology
    Annotation

    Semadeni, Withers, and Certo use Monte Carlo simulations to demonstrate the dangers of using weak or invalid instruments in strategy research. They provide practical guidance for management scholars on when and how to use IV, and when it may do more harm than good.

  5. Semenova, V., & Chernozhukov, V. (2021). Debiased Machine Learning of Conditional Average Treatment Effects and Other Causal Functions. Econometrics Journal, 24(2), 264–289.

    doi.org/10.1093/ectj/utaa027

    CATEheterogeneous-effectsgroup-effects
    Annotation

    Semenova and Chernozhukov extend DML to estimate conditional average treatment effects (CATEs) and other causal functions, allowing researchers to characterize treatment effect heterogeneity. They provide inference methods for projections of the CATE onto interpretable subgroups.

  6. Semenova, V. (2025). Generalized Lee Bounds. Journal of Econometrics, 251, 106055.

    doi.org/10.1016/j.jeconom.2025.106055

    Foundationalon lee bounds
    machine-learningcovariatestighter-bounds
    Annotation

    Semenova generalizes Lee bounds to allow for covariates and machine learning estimation of nuisance functions, improving the tightness of bounds while maintaining their nonparametric validity. This paper connects the Lee bounds literature to the modern machine learning causal inference literature.

  7. Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Houghton Mifflin.

    foundationaltextbookquasi-experimental
    Annotation

    Shadish, Cook, and Campbell write the standard textbook on quasi-experimental designs, including a comprehensive treatment of interrupted time series. The book discusses threats to validity specific to ITS designs (history, instrumentation, selection-maturation interaction) and provides guidance on when ITS is most credible.

  8. Shaver, J. M. (1998). Accounting for Endogeneity When Assessing Strategy Performance: Does Entry Mode Choice Affect FDI Survival? Management Science, 44(4), 571–585.

    doi.org/10.1287/mnsc.44.4.571

    endogeneityself-selectionentry-modeFDIHeckman-correction+1
    Annotation

    In this foundational strategy paper, Shaver demonstrates how ignoring endogeneity, specifically the self-selection of firms into entry modes, biases performance estimates. He shows that the choice between greenfield entries and acquisitions reflects private information about expected survival, and uses a Heckman-style selection correction to obtain unbiased estimates. It is one of the first papers to systematically demonstrate endogeneity problems in strategy research.

  9. Shipman, J. E., Swanquist, Q. T., & Whited, R. L. (2017). Propensity Score Matching in Accounting Research. The Accounting Review, 92(1), 213–244.

    doi.org/10.2308/accr-51449

    propensity-scoreaccountingbest-practicesmethodology
    Annotation

    Shipman, Swanquist, and Whited review how propensity score matching is used (and sometimes misused) in accounting research. They provide practical guidelines on common pitfalls such as matching on post-treatment variables, inadequate balance checks, and ignoring the unconfoundedness assumption.

  10. Shumway, T. (2001). Forecasting Bankruptcy More Accurately: A Simple Hazard Model. Journal of Business, 74(1), 101–124.

    doi.org/10.1086/209665

    applicationfinancebankruptcy
    Annotation

    Shumway shows that discrete-time hazard models outperform static logit models for bankruptcy prediction because they properly account for the time dimension and censoring. The paper demonstrates the importance of a survival-analysis framing for event prediction in finance.

  11. Silva, J. M. C. S., & Tenreyro, S. (2006). The Log of Gravity. Review of Economics and Statistics, 88(4), 641–658.

    doi.org/10.1162/rest.88.4.641

    gravity-modelPPMLtradeheteroskedasticity
    Annotation

    Silva and Tenreyro demonstrate that OLS estimation of log-linearized gravity models produces inconsistent estimates in the presence of heteroskedasticity. They show that Poisson pseudo-maximum-likelihood (PPML) provides consistent estimates and naturally handles zero trade flows, transforming the trade literature.

  12. Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant. Psychological Science, 22(11), 1359–1366.

    doi.org/10.1177/0956797611417632

    Foundational on pre-registration
    p-hacking · researcher-degrees-of-freedom · false-positives
    Annotation

    Simmons, Nelson, and Simonsohn demonstrate how researcher degrees of freedom in data collection and analysis can inflate false-positive rates dramatically. Their paper, which proposes disclosure requirements and pre-registration as solutions, is one of the catalysts for the replication crisis and pre-registration movement.

  13. Simonsohn, U., Simmons, J. P., & Nelson, L. D. (2020). Specification Curve Analysis. Nature Human Behaviour, 4(11), 1208–1214.

    doi.org/10.1038/s41562-020-0912-z

    Foundational on specification curve
    specification-curve · robustness · analytical-flexibility
    Annotation

    Simonsohn, Simmons, and Nelson introduce specification curve analysis, which systematically runs all reasonable specifications of a model and displays the distribution of estimates. This approach replaces selective reporting of specifications with a comprehensive view of how results depend on analytical choices.
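
    A toy version of the procedure (simulated data, not from the paper): estimate the same treatment coefficient under every combination of a small set of controls and inspect the sorted distribution of estimates, which here splits cleanly by whether the confounder is included:

```python
from itertools import combinations
import numpy as np

rng = np.random.default_rng(1)
n = 2000
c1, c2, c3 = rng.normal(size=(3, n))
d = 0.5 * c1 + rng.normal(size=n)             # treatment, confounded by c1
y = 1.0 * d + 0.8 * c1 + rng.normal(size=n)   # true effect of d is 1.0

controls = {"c1": c1, "c2": c2, "c3": c3}
curve = []
for k in range(len(controls) + 1):
    for spec in combinations(sorted(controls), k):
        X = np.column_stack([np.ones(n), d] + [controls[v] for v in spec])
        beta = np.linalg.lstsq(X, y, rcond=None)[0]
        curve.append((spec, beta[1]))          # coefficient on d

# The "curve" is the sorted set of estimates across all 2^3 = 8 specifications.
for spec, est in sorted(curve, key=lambda t: t[1]):
    print(spec, round(est, 3))
```

    Real applications enumerate hundreds or thousands of defensible specifications and add permutation-based inference on the whole curve; the loop structure is the same.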

  14. Singer, J. D., & Willett, J. B. (2003). Applied Longitudinal Data Analysis: Modeling Change and Event Occurrence. Oxford University Press.

    doi.org/10.1093/acprof:oso/9780195152968.001.0001

    survey · textbook · discrete-time
    Annotation

    Singer and Willett write an accessible textbook covering both growth curve models and discrete-time survival analysis. Chapters 9–15 provide a clear introduction to hazard modeling for social science researchers, with worked examples and practical guidance.

  15. Singh, J., & Agrawal, A. (2011). Recruiting for Ideas: How Firms Exploit the Prior Inventions of New Hires. Management Science, 57(1), 129–150.

    doi.org/10.1287/mnsc.1100.1253

    Application (Mgmt) on difference-in-differences
    knowledge-transfer · inventor-mobility · patent-citations
    Annotation

    Singh and Agrawal use a difference-in-differences approach, comparing citation rates to recruits' patents before and after the move against matched control patents, to study how hiring inventors affects knowledge flows to the hiring firm. They find that hiring an inventor increases the hiring firm's citations to the recruit's prior patents, indicating knowledge transfer. The paper demonstrates how DiD with matched controls can identify causal effects in knowledge flow studies.

  16. Smith, J. A., & Todd, P. E. (2005). Does Matching Overcome LaLonde's Critique of Nonexperimental Estimators? Journal of Econometrics, 125(1–2), 305–353.

    doi.org/10.1016/j.jeconom.2004.04.011

    Foundational on matching methods
    LaLonde-critique · propensity-score · external-validity
    Annotation

    Smith and Todd reexamine the Dehejia and Wahba (1999) reanalysis of LaLonde (1986), showing that the matching results are sensitive to specific sample and specification choices. They demonstrate that matching methods cannot solve fundamental problems when treated and comparison groups come from very different populations.

  17. Staiger, D., & Stock, J. H. (1997). Instrumental Variables Regression with Weak Instruments. Econometrica, 65(3), 557–586.

    doi.org/10.2307/2171753

    Foundational on instrumental variables
    weak-instruments · 2SLS-bias · asymptotic-theory
    Annotation

    Staiger and Stock show formally that when instruments are weak, 2SLS estimates are biased toward OLS and standard inference breaks down. This paper establishes the theoretical foundations for the weak instruments problem that Stock and Yogo (2005) later provided practical tests for.

  18. Starr, E., Frake, J., & Agarwal, R. (2019). Mobility Constraint Externalities. Organization Science, 30(5), 961–980.

    doi.org/10.1287/orsc.2018.1252

    Application (Mgmt) on sensitivity analysis
    Oster-method · coefficient-stability · noncompete-agreements · labor-mobility · externalities
    Annotation

    Starr, Frake, and Agarwal study how noncompete agreements generate externalities for all workers in a labor market, not just those directly constrained. They use Oster's (2019) coefficient stability diagnostic to assess robustness of findings to omitted variable bias, demonstrating that enforceable noncompetes are associated with reduced job offers, mobility, and wages even for unconstrained workers.

  19. Steegen, S., Tuerlinckx, F., Gelman, A., & Vanpaemel, W. (2016). Increasing Transparency Through a Multiverse Analysis. Perspectives on Psychological Science, 11(5), 702–712.

    doi.org/10.1177/1745691616658637

    Foundational on specification curve
    multiverse-analysis · garden-of-forking-paths · transparency
    Annotation

    Steegen and colleagues introduce multiverse analysis, which examines how results vary across the full set of defensible data processing and analytical decisions. This approach is closely related to specification curve analysis and emphasizes transparency about the garden of forking paths in data analysis.

  20. Stock, J. H., Wright, J. H., & Yogo, M. (2002). A Survey of Weak Instruments and Weak Identification in Generalized Method of Moments. Journal of Business & Economic Statistics, 20(4), 518–529.

    doi.org/10.1198/073500102288618658

    weak-instruments · GMM · weak-identification · survey
    Annotation

    Stock, Wright, and Yogo survey the weak instruments and weak identification literature in IV and GMM settings, covering finite-sample bias toward OLS, size distortions in Wald tests, and practical diagnostic tools. The paper provides a comprehensive review of the theoretical landscape; the formal critical value tables now standard in applied work appear in the separate Stock and Yogo (2005) chapter.

  21. Stock, J. H., & Yogo, M. (2005). Testing for Weak Instruments in Linear IV Regression. In D. W. K. Andrews & J. H. Stock (Eds.), Identification and Inference for Econometric Models: Essays in Honor of Thomas Rothenberg (pp. 80–108). Cambridge University Press.

    doi.org/10.1017/CBO9780511614491.006

    Foundational on instrumental variables
    weak-instruments · F-statistic · diagnostic-test
    Annotation

    Stock and Yogo develop formal critical value tables for testing whether instruments are 'weak'—that is, only weakly correlated with the endogenous variable. Their tables formalize the Staiger and Stock (1997) rule of thumb that the first-stage F-statistic should exceed 10, and are probably the most widely used diagnostic in applied IV research.
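
    In the just-identified single-instrument case, the diagnostic reduces to the squared first-stage t-statistic on the instrument. A simulated sketch (illustrative numbers, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)
n, pi = 500, 0.3                      # pi: strength of the instrument
z = rng.normal(size=n)                # instrument
x = pi * z + rng.normal(size=n)       # endogenous regressor (first stage)

# First stage x = a + pi*z + v; with one instrument, the F-statistic for
# "the instrument is irrelevant" is the squared t-statistic on z.
Z = np.column_stack([np.ones(n), z])
coef, *_ = np.linalg.lstsq(Z, x, rcond=None)
resid = x - Z @ coef
s2 = resid @ resid / (n - 2)
se2 = s2 * np.linalg.inv(Z.T @ Z)[1, 1]
F = coef[1] ** 2 / se2
print(round(F, 1))  # compare against the rule-of-thumb threshold of 10
```

    Note that this homoskedastic F is exactly what the Stock–Yogo tables are built for; with non-iid errors, later work (e.g., Young 2022, below) argues the diagnostic can be misleading.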

  22. Stuart, E. A. (2010). Matching Methods for Causal Inference: A Review and a Look Forward. Statistical Science, 25(1), 1–21.

    doi.org/10.1214/09-STS313

    matching-review · propensity-score · practical-guidance · survey
    Annotation

    Stuart provides a comprehensive review of matching methods including propensity score matching, Mahalanobis distance matching, and coarsened exact matching, with practical guidance on implementation. She offers an accessible overview of when and how to use different matching approaches.

  23. Stuart, E. A., Cole, S. R., Bradshaw, C. P., & Leaf, P. J. (2011). The Use of Propensity Scores to Assess the Generalizability of Results from Randomized Trials. Journal of the Royal Statistical Society: Series A, 174(2), 369–386.

    doi.org/10.1111/j.1467-985X.2010.00673.x

    Foundational on external validity
    Annotation

    Stuart, Cole, Bradshaw, and Leaf propose propensity-score-based metrics for quantifying the similarity between randomized trial participants and a target population, using a model that predicts trial participation given observed covariates. The resulting scores enable matching, subclassification, or weighting of trial outcomes to the population, providing a diagnostic framework for assessing external validity. Researchers planning to generalize trial findings should use these propensity score diagnostics to evaluate whether their trial sample adequately represents the intended target population.

  24. Sun, L., & Abraham, S. (2021). Estimating Dynamic Treatment Effects in Event Studies with Heterogeneous Treatment Effects. Journal of Econometrics, 225(2), 175–199.

    doi.org/10.1016/j.jeconom.2020.09.006

    event-study · interaction-weighted · dynamic-effects
    Annotation

    Sun and Abraham show that conventional event-study regression coefficients are contaminated by treatment effect heterogeneity across cohorts and propose an interaction-weighted estimator that recovers clean dynamic treatment effects. This paper is the key reference for event-study plots in staggered settings.

T
3
  1. Therneau, T. M., & Grambsch, P. M. (2000). Modeling Survival Data: Extending the Cox Model. Springer.

    doi.org/10.1007/978-1-4757-3294-8

    survey · textbook · cox-extensions
    Annotation

    Therneau and Grambsch provide an authoritative reference on extensions of the Cox model including time-varying covariates, stratification, frailty models, and multistate models. The R survival package is maintained by Therneau and implements the methods described here.

  2. Thistlethwaite, D. L., & Campbell, D. T. (1960). Regression-Discontinuity Analysis: An Alternative to the Ex Post Facto Experiment. Journal of Educational Psychology, 51(6), 309–317.

    doi.org/10.1037/h0044319

    RDD-origins · cutoff-design · quasi-experiment
    Annotation

    Thistlethwaite and Campbell introduce the regression discontinuity design, proposing to compare units just above and just below a cutoff score to estimate causal effects, reasoning that units near the cutoff are as-good-as randomly assigned. The idea lies dormant for decades before being rediscovered by economists.

  3. Train, K. E. (2009). Discrete Choice Methods with Simulation. Cambridge University Press.

    doi.org/10.1017/CBO9780511805271

    Survey on logit/probit
    textbook · discrete-choice · simulation-estimation
    Annotation

    Train's textbook provides a comprehensive and accessible treatment of logit, probit, mixed logit, and other discrete choice models. It covers both theory and practical simulation-based estimation methods and is widely used in economics, marketing, and transportation research.

V
5
  1. Van der Klaauw, W. (2002). Estimating the Effect of Financial Aid Offers on College Enrollment: A Regression-Discontinuity Approach. International Economic Review, 43(4), 1249–1287.

    doi.org/10.1111/1468-2354.t01-1-00055

    financial-aid · education · fuzzy-RDD
    Annotation

    Van der Klaauw applies a fuzzy RDD to study how financial aid offers affect college enrollment decisions, exploiting discontinuities in an aid assignment rule where eligibility changes at GPA thresholds but compliance is imperfect. This paper is one of the earliest and most influential applications of fuzzy RDD.

  2. VanderWeele, T. J. (2015). Explanation in Causal Inference: Methods for Mediation and Interaction. Oxford University Press.

    textbook · mediation · interaction · sensitivity
    Annotation

    VanderWeele's comprehensive textbook unifies the causal mediation literature, covering potential outcomes and structural equation approaches, sensitivity analysis, time-varying treatments, and interaction effects. It is the standard reference for researchers conducting mediation analysis.

  3. VanderWeele, T. J. (2016). Mediation Analysis: A Practitioner's Guide. Annual Review of Public Health, 37, 17–32.

    doi.org/10.1146/annurev-publhealth-032315-021402

    practitioners-guide · sensitivity-analysis · public-health · survey
    Annotation

    VanderWeele provides an accessible practitioner-oriented guide to modern causal mediation analysis, covering the assumptions required for identification, sensitivity analysis for unmeasured confounding, and extensions to multiple mediators and interactions. This review is an excellent entry point for applied researchers seeking to move beyond the Baron-Kenny framework.

  4. VanderWeele, T. J., & Ding, P. (2017). Sensitivity Analysis in Observational Research: Introducing the E-Value. Annals of Internal Medicine, 167(4), 268–274.

    doi.org/10.7326/M16-2607

    Foundational on sensitivity analysis
    E-value · unmeasured-confounding · epidemiology
    Annotation

    VanderWeele and Ding introduce the E-value, a simple and intuitive measure of the minimum strength of association that an unmeasured confounder would need to have with both the treatment and outcome to fully explain away an observed treatment-outcome association. The E-value is widely adopted in epidemiology and increasingly discussed in social science.
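
    The E-value has a closed form: for an observed risk ratio RR ≥ 1 it is RR + sqrt(RR · (RR − 1)), with protective estimates inverted first. A minimal implementation:

```python
import math

def e_value(rr: float) -> float:
    """Minimum risk ratio an unmeasured confounder would need with BOTH
    treatment and outcome to fully explain away an observed risk ratio
    (VanderWeele & Ding, 2017)."""
    if rr < 1:
        rr = 1 / rr  # protective estimates are inverted first
    return rr + math.sqrt(rr * (rr - 1))

print(round(e_value(2.0), 2))  # 3.41: an observed RR of 2 requires confounder associations of about 3.4
```

    The same formula applied to the confidence-interval limit closest to the null gives the E-value for the interval, which VanderWeele and Ding recommend reporting alongside the point estimate.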

  5. Villalonga, B., & Amit, R. (2006). How Do Family Ownership, Control and Management Affect Firm Value? Journal of Financial Economics, 80(2), 385–417.

    doi.org/10.1016/j.jfineco.2004.12.005

    Application on OLS regression
    family-firms · corporate-governance · firm-value
    Annotation

    Villalonga and Amit study how different forms of family involvement — ownership, control, and management — affect firm value using OLS regression with clustered standard errors on a panel of Fortune 500 firms. The paper disentangles the separate effects of family ownership, voting control through dual-class shares and pyramids, and family management on Tobin's q.

W
9
  1. Wager, S., & Athey, S. (2018). Estimation and Inference of Heterogeneous Treatment Effects Using Random Forests. Journal of the American Statistical Association, 113(523), 1228–1242.

    doi.org/10.1080/01621459.2017.1319839

    Foundational on causal forests
    causal-forests · random-forests · asymptotic-normality
    Annotation

    Wager and Athey develop causal forests by extending random forests to estimate conditional average treatment effects. They prove pointwise consistency and asymptotic normality under regularity conditions, enabling valid confidence intervals for individualized treatment effect estimates.

  2. Wagner, A. K., Soumerai, S. B., Zhang, F., & Ross-Degnan, D. (2002). Segmented Regression Analysis of Interrupted Time Series Studies in Medication Use Research. Journal of Clinical Pharmacy and Therapeutics, 27(4), 299–309.

    doi.org/10.1046/j.1365-2710.2002.00430.x

    foundational · segmented-regression · health-services
    Annotation

    Wagner and colleagues formalize segmented regression for ITS in health services research. The paper clearly specifies the model with level-change and slope-change parameters, discusses autocorrelation correction, and provides practical recommendations for minimum series length and model diagnostics.
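
    The model they specify can be written as Y_t = β0 + β1·t + β2·post_t + β3·(t − t0)·post_t + e_t, where β2 is the level change and β3 the slope change at the interruption t0. A simulated sketch (hypothetical parameter values, ignoring the autocorrelation correction the paper discusses):

```python
import numpy as np

rng = np.random.default_rng(4)
T, t0 = 48, 24                       # 48 periods, interruption at t = 24
t = np.arange(T)
post = (t >= t0).astype(float)

# true parameters: baseline level 10, pre-slope 0.2, level change -3, slope change 0.5
y = 10 + 0.2 * t - 3.0 * post + 0.5 * (t - t0) * post + rng.normal(0, 0.2, size=T)

# segmented regression design matrix: intercept, time, level-change, slope-change
X = np.column_stack([np.ones(T), t, post, (t - t0) * post])
b0, b1, b2, b3 = np.linalg.lstsq(X, y, rcond=None)[0]
print(round(b2, 2), round(b3, 2))    # estimated level change and slope change
```

    With autocorrelated errors, the point estimates stay consistent but the naive standard errors do not; the paper's recommendation is a Durbin–Watson check followed by an AR error model when needed.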

  3. Webb, M. D. (2023). Reworking Wild Bootstrap-Based Inference for Clustered Errors. Canadian Journal of Economics, 56(3), 839–858.

    doi.org/10.1111/caje.12661

    Foundational on clustering inference
    wild-bootstrap · few-clusters · Webb-weights
    Annotation

    Webb introduces the six-point distribution as an alternative to Rademacher weights for the wild cluster bootstrap. The Webb weights improve finite-sample performance when the number of clusters is very small.
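
    The six-point distribution puts probability 1/6 on each of ±√(1/2), ±1, ±√(3/2), matching the zero mean and unit variance of Rademacher weights while allowing many more distinct bootstrap samples when clusters are few (6^G rather than 2^G). A sketch of one wild-cluster draw (hypothetical variable names):

```python
import numpy as np

# Webb's six-point weights: +/- sqrt(1/2), +/- 1, +/- sqrt(3/2), each with prob 1/6
WEBB = np.array([-np.sqrt(1.5), -1.0, -np.sqrt(0.5),
                 np.sqrt(0.5), 1.0, np.sqrt(1.5)])

def wild_cluster_draw(residuals, cluster_ids, rng):
    """Perturb residuals with one Webb weight per cluster (one bootstrap replication)."""
    clusters = np.unique(cluster_ids)
    w = dict(zip(clusters, rng.choice(WEBB, size=len(clusters))))
    return residuals * np.array([w[c] for c in cluster_ids])

# mean 0 and variance 1, like Rademacher weights, but with six support points
print(round(WEBB.mean(), 10), round((WEBB**2).mean(), 10))
```

    Each bootstrap replication rebuilds the outcome from restricted-model fitted values plus the perturbed residuals and re-estimates the test statistic; with, say, 5 clusters, Rademacher weights allow only 32 distinct replications while Webb weights allow 7,776.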

  4. Westfall, P. H., & Young, S. S. (1993). Resampling-Based Multiple Testing: Examples and Methods for p-Value Adjustment. Wiley.

    Foundationalon multiple testing
    resampling · permutation · step-down · textbook
    Annotation

    Westfall and Young develop resampling-based methods for multiple testing that account for the dependence structure among test statistics. Their permutation-based step-down procedure is less conservative than Bonferroni and becomes a standard reference for multiple testing adjustments in applied research.

  5. White, H. (1980). A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity. Econometrica, 48(4), 817–838.

    doi.org/10.2307/1912934

    Foundational on OLS regression
    robust-standard-errors · heteroskedasticity · inference
    Annotation

    White introduces the now-standard 'robust standard errors' that researchers routinely use with OLS. Before White's correction, standard errors could be misleadingly small when the variance of the error term was not constant across observations. Nearly every empirical paper today uses some variant of this approach.

  6. Wolfolds, S. E., & Siegel, J. (2019). Misaccounting for Endogeneity: The Peril of Relying on the Heckman Two-Step Method without a Valid Instrument. Strategic Management Journal, 40(3), 432–462.

    doi.org/10.1002/smj.2995

    Heckman-correction · exclusion-restriction · selection-models · misapplication
    Annotation

    Wolfolds and Siegel demonstrate that the Heckman selection correction is frequently misapplied in management research, particularly when the exclusion restriction is not credible. They show via simulation and replication that applying the Heckman correction without a valid instrument can introduce more bias than it removes. The paper provides a cautionary guide for researchers considering selection models and recommends transparent reporting of the exclusion restriction.

  7. Wooldridge, J. M. (1999). Distribution-Free Estimation of Some Nonlinear Panel Data Models. Journal of Econometrics, 90(1), 77–97.

    doi.org/10.1016/S0304-4076(98)00033-5

    quasi-MLE · panel-data · robustness
    Annotation

    Wooldridge shows that Poisson quasi-maximum-likelihood estimation in panel data models is consistent for the conditional mean even if the data are not Poisson-distributed, as long as the mean is correctly specified. This result justifies the widespread use of Poisson regression for non-count continuous outcomes and provides the foundation for distribution-free estimation of nonlinear panel data models.

  8. Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data (2nd ed.). MIT Press.

    textbook · panel-data · reference
    Annotation

    Wooldridge's graduate textbook is the standard reference for cross-section and panel data econometrics. Chapters 10–11 provide a thorough treatment of fixed effects, random effects, and related panel data methods, while later chapters cover general estimation methodology (MLE, GMM, M-estimation) with panel data applications throughout. The book covers both linear and nonlinear models with careful attention to assumptions.

  9. Wooldridge, J. M. (2019). Correlated Random Effects Models with Unbalanced Panels. Journal of Econometrics, 211(1), 137–150.

    doi.org/10.1016/j.jeconom.2018.12.010

    Foundational on random effects
    correlated-random-effects · unbalanced-panels · panel-data · CRE
    Annotation

    Wooldridge extends the correlated random effects (CRE) framework to handle unbalanced panels, which are the norm in applied research. This paper shows how to combine the flexibility of fixed effects with the ability to estimate effects of time-invariant variables, making the CRE approach practical for real-world datasets.

Y
3
  1. Young, C., & Holsteen, K. (2017). Model Uncertainty and Robustness: A Computational Framework for Multimodel Analysis. Sociological Methods & Research, 46(1), 3–40.

    doi.org/10.1177/0049124115610347

    Foundational on specification curve
    model-uncertainty · multimodel-analysis · sociology
    Annotation

    Young and Holsteen develop a computational framework for systematically exploring model uncertainty by running thousands of plausible specifications. Their approach is one of the earliest implementations of what would become known as specification curve or multiverse analysis, applied to sociological research.

  2. Young, A. (2019). Channeling Fisher: Randomization Tests and the Statistical Insignificance of Seemingly Significant Experimental Results. Quarterly Journal of Economics, 134(2), 557–598.

    doi.org/10.1093/qje/qjy029

    replication · experimental-economics · inference
    Annotation

    Young applies randomization inference to a large sample of experimental papers published in top economics journals and finds that many results that appear significant under conventional inference are insignificant under randomization tests. This paper demonstrates the practical importance of randomization inference for credible empirical research.

  3. Young, A. (2022). Consistency Without Inference: Instrumental Variables in Practical Application. European Economic Review, 147, 104112.

    doi.org/10.1016/j.euroecorev.2022.104112

    weak-instruments · published-research · replication · inference-failures
    Annotation

    Young reexamines published IV applications and argues that standard first-stage F-statistic diagnostics are largely uninformative of both size and bias under non-iid errors and high leverage. The paper finds that IV estimates in practice rarely demonstrate that OLS is biased, raising broader questions about the reliability of IV as commonly implemented.

Z
3
  1. Zelner, B. A. (2009). Using Simulation to Interpret Results from Logit, Probit, and Other Nonlinear Models. Strategic Management Journal, 30(12), 1335–1348.

    doi.org/10.1002/smj.783

    Application (Mgmt) on logit/probit
    simulation · interpretation · predicted-probabilities
    Annotation

    Zelner advocates using simulation-based approaches to interpret and present results from nonlinear models in management research. By computing predicted probabilities and marginal effects via simulation, researchers can convey substantive significance more clearly than raw coefficients.

  2. Zhao, X., Lynch, J. G., & Chen, Q. (2010). Reconsidering Baron and Kenny: Myths and Truths about Mediation Analysis. Journal of Consumer Research, 37(2), 197–206.

    doi.org/10.1086/651257

    mediation-classification · Baron-Kenny-critique · consumer-research
    Annotation

    Zhao, Lynch, and Chen provide an important critique of the Baron and Kenny mediation framework from within the marketing literature. They argue that the 'step 1' requirement of a significant total effect is unnecessary and introduces a more sensible classification of mediation types (complementary, competitive, indirect-only, direct-only, no-effect). While still operating within the regression framework rather than the full causal framework, this paper is a significant step forward for applied researchers.

  3. Zhao, Q., Small, D. S., & Bhattacharya, B. B. (2019). Sensitivity Analysis for Inverse Probability Weighting Estimators via the Percentile Bootstrap. Journal of the Royal Statistical Society: Series B, 81(4), 735–761.

    doi.org/10.1111/rssb.12327

    sensitivity-analysis · healthcare · bootstrap · AIPW
    Annotation

    Zhao, Small, and Bhattacharya develop sensitivity analysis tools for inverse probability weighted and augmented IPW estimators via the percentile bootstrap. They apply the methods to evaluate the causal effect of fish consumption on blood mercury levels, demonstrating practical use of AIPW sensitivity analysis in an observational study context. The paper provides a computationally convenient approach for assessing how sensitive doubly robust estimates are to violations of the unconfoundedness assumption.