MethodAtlas

Bibliography

All 311 papers referenced across Method Atlas, formatted for clarity. Search, filter, sort, and export the full collection as BibTeX.


  1. Abadie, A., & Gardeazabal, J. (2003). The Economic Costs of Conflict: A Case Study of the Basque Country. American Economic Review, 93(1), 113–132.

    https://doi.org/10.1257/000282803321455188

    Foundational · Cited on: synthetic control
    Annotation

    This earlier paper introduced the synthetic control idea in the context of estimating the economic costs of terrorism in the Basque Country. It constructed a synthetic Basque Country from other Spanish regions and showed that terrorism reduced GDP per capita by about 10 percentage points.

  2. Abadie, A., & Imbens, G. W. (2006). Large Sample Properties of Matching Estimators for Average Treatment Effects. Econometrica, 74(1), 235–267.

    https://doi.org/10.1111/j.1468-0262.2006.00655.x

    Foundational · Cited on: matching methods
    Annotation

    Abadie and Imbens derived the large-sample properties of nearest-neighbor matching estimators and showed that the standard bootstrap is not valid for inference with matching. They proposed a bias-corrected estimator and proper variance formula that have become standard in practice.

  3. Abadie, A., Diamond, A., & Hainmueller, J. (2010). Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California's Tobacco Control Program. Journal of the American Statistical Association, 105(490), 493–505.

    https://doi.org/10.1198/jasa.2009.ap08746

    Annotation

    This paper formalized and popularized the synthetic control method, which constructs a weighted combination of control units to approximate the counterfactual for a single treated unit. The application to California's Proposition 99 tobacco control program became the canonical example of the method.
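    The weighting idea described above can be sketched in a few lines. This is a toy illustration with made-up data and only two donor units solved by grid search, not the paper's estimator, which solves a constrained optimization over many donors and pre-treatment predictors.

```python
def synthetic_control_weight(treated_pre, donor_a_pre, donor_b_pre, steps=1000):
    """Toy synthetic control: find the convex weight w on donor A
    (with 1 - w on donor B) minimizing squared pre-treatment distance
    to the treated unit. All inputs are lists of pre-period outcomes."""
    best_w, best_loss = 0.0, float("inf")
    for i in range(steps + 1):
        w = i / steps
        loss = sum(
            (y - (w * a + (1 - w) * b)) ** 2
            for y, a, b in zip(treated_pre, donor_a_pre, donor_b_pre)
        )
        if loss < best_loss:
            best_w, best_loss = w, loss
    return best_w
```

    With a treated unit sitting exactly between the two donors, the recovered weight is 0.5; the post-treatment counterfactual is then the same weighted average extended forward in time.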

  4. Abadie, A., & Imbens, G. W. (2011). Bias-Corrected Matching Estimators for Average Treatment Effects. Journal of Business & Economic Statistics, 29(1), 1–11.

    https://doi.org/10.1198/jbes.2009.07333

    Foundational · Cited on: matching methods
    Annotation

    Abadie and Imbens developed bias-corrected matching estimators that adjust for the finite-sample bias inherent in nearest-neighbor matching when matching is not exact. Their bias correction uses a regression adjustment within matched pairs and has become a standard recommendation for applied researchers using matching methods.

  5. Abadie, A., Diamond, A., & Hainmueller, J. (2015). Comparative Politics and the Synthetic Control Method. American Journal of Political Science, 59(2), 495–510.

    https://doi.org/10.1111/ajps.12116

    Application · Cited on: synthetic control
    Annotation

    This paper applied the synthetic control method to estimate the economic impact of German reunification, constructing a synthetic West Germany from OECD countries. It demonstrated the method's applicability to major political events and provided inference procedures based on permutation tests.

  6. Abadie, A., Athey, S., Imbens, G. W., & Wooldridge, J. M. (2020). Sampling-Based versus Design-Based Uncertainty in Regression Analysis. Econometrica, 88(1), 265–296.

    https://doi.org/10.3982/ECTA12675

    Foundational · Cited on: ols regression
    Annotation

    This paper clarifies when and why researchers should cluster standard errors in regression analysis. It distinguishes between sampling-based uncertainty (from drawing a sample from a population) and design-based uncertainty (from treatment assignment), providing rigorous guidance on a question that affects nearly every applied OLS study.

  7. Abadie, A. (2021). Using Synthetic Controls: Feasibility, Data Requirements, and Methodological Aspects. Journal of Economic Literature, 59(2), 391–425.

    https://doi.org/10.1257/jel.20191450

    Foundational · Cited on: synthetic control
    Annotation

    Abadie provided a comprehensive methodological overview of synthetic control, covering data requirements, inference via placebo tests, extensions to multiple treated units, and common pitfalls. This paper is the authoritative practitioner's guide to the method.

  8. Abowd, J. M., Kramarz, F., & Margolis, D. N. (1999). High Wage Workers and High Wage Firms. Econometrica, 67(2), 251–333.

    https://doi.org/10.1111/1468-0262.00020

    Application · Cited on: fixed effects
    Annotation

    This landmark paper used worker and firm fixed effects jointly to decompose wage variation into worker ability and firm pay premia. The 'AKM' model has become the standard framework for studying labor market sorting, wage inequality, and the role of firms in wage-setting.

  9. Acemoglu, D., Johnson, S., & Robinson, J. A. (2001). The Colonial Origins of Comparative Development: An Empirical Investigation. American Economic Review, 91(5), 1369–1401.

    https://doi.org/10.1257/aer.91.5.1369

    Application · Cited on: instrumental variables
    Annotation

    This celebrated paper used historical settler mortality as an instrument for institutional quality to estimate the causal effect of institutions on economic development. It is one of the most influential IV applications in economics and demonstrates the creativity required to find a plausible instrument.

  10. Acharya, A., Blackwell, M., & Sen, M. (2016). Explaining Causal Findings Without Bias: Detecting and Assessing Direct Effects. American Political Science Review, 110(3), 512–529.

    https://doi.org/10.1017/S0003055416000216

    Foundational · Cited on: causal mediation analysis
    Annotation

    Acharya, Blackwell, and Sen developed a sequential g-estimation approach for estimating controlled direct effects in observational studies, addressing the problem that conditioning on a post-treatment mediator can introduce collider bias. Their method is particularly useful in political science and social science settings where intermediate confounders make standard mediation analysis unreliable.

  11. Adão, R., Kolesár, M., & Morales, E. (2019). Shift-Share Designs: Theory and Inference. Quarterly Journal of Economics, 134(4), 1949–2010.

    https://doi.org/10.1093/qje/qjz025

    Foundational · Cited on: shift share instruments
    Annotation

    Adão, Kolesár, and Morales showed that standard errors in shift-share regressions are too small when computed with conventional clustering because residuals are correlated across regions that share similar industry compositions. They proposed an inference procedure that accounts for this dependence.

  12. Agarwal, R., & Ohyama, A. (2013). Industry or Academia, Basic or Applied? Career Choices and Earnings Trajectories of Scientists. Management Science, 59(4), 950–970.

    https://doi.org/10.1287/mnsc.1120.1582

    Application · Management journal · Cited on: difference in differences
    Annotation

    Uses panel data on scientists' career choices with DiD-style comparisons to identify the effect of early career environment on long-run earnings trajectories. A management-journal application showing how DiD logic can be applied to career and human capital questions.

  13. Aguinis, H., Beaty, J. C., Boik, R. J., & Pierce, C. A. (2005). Effect Size and Power in Assessing Moderating Effects of Categorical Variables Using Multiple Regression: A 30-Year Review. Journal of Applied Psychology, 90(1), 94–107.

    https://doi.org/10.1037/0021-9010.90.1.94

    Application · Management journal · Cited on: power analysis
    Annotation

    Aguinis and colleagues reviewed 30 years of moderator analysis in applied psychology and management, finding that most studies were severely underpowered to detect interaction effects. They provided guidelines for computing power for moderated regression, which is highly relevant to management researchers testing contingency hypotheses.

  14. Aguinis, H., Gottfredson, R. K., & Culpepper, S. A. (2013). Best-Practice Recommendations for Estimating Cross-Level Interaction Effects Using Multilevel Modeling. Journal of Management, 39(6), 1490–1528.

    https://doi.org/10.1177/0149206313478188

    Application · Management journal · Cited on: random effects
    Annotation

    This paper provided detailed guidance for management researchers on estimating cross-level interaction effects in multilevel models, addressing common problems such as insufficient statistical power, centering decisions, and effect size reporting.

  15. Aguinis, H., Edwards, J. R., & Bradley, K. J. (2017). Improving Our Understanding of Moderation and Mediation in Strategic Management Research. Organizational Research Methods, 20(4), 665–685.

    https://doi.org/10.1177/1094428115627498

    Application · Management journal · Cited on: causal mediation analysis
    Annotation

    Aguinis, Edwards, and Bradley reviewed how mediation and moderation analyses are conducted in strategic management research and identified common errors. They provided recommendations for improving practice, including using causal mediation frameworks and proper inference procedures.

  16. Aguinis, H., Ramani, R. S., & Alabduljader, N. (2018). What You See Is What You Get? Enhancing Methodological Transparency in Management Research. Academy of Management Annals, 12(1), 83–110.

    https://doi.org/10.5465/annals.2016.0011

    Application · Management journal · Cited on: pre registration
    Annotation

    Aguinis, Ramani, and Alabduljader reviewed methodological transparency in management research and advocated for pre-registration, open data, and open materials. They documented the extent of undisclosed analytical flexibility in management studies and proposed concrete steps for improvement.

  17. Ai, C., & Norton, E. C. (2003). Interaction Terms in Logit and Probit Models. Economics Letters, 80(1), 123–129.

    https://doi.org/10.1016/S0165-1765(03)00032-6

    Foundational · Cited on: logit probit
    Annotation

    Ai and Norton showed that the interpretation of interaction terms in nonlinear models like logit and probit is much more complicated than in linear models. The marginal effect of an interaction is not simply the coefficient on the interaction term, a mistake that was widespread in applied research.
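    Their point can be illustrated numerically. For a logit model P = Λ(β0 + β1·x1 + β2·x2 + β12·x1·x2), the interaction effect is the cross-partial derivative of the probability, not the interaction coefficient times the density. A minimal sketch with made-up coefficients (this is an illustration of the idea, not the authors' code):

```python
import math

def logistic(u):
    return 1.0 / (1.0 + math.exp(-u))

def interaction_effect(b0, b1, b2, b12, x1, x2):
    """Correct interaction effect in a logit: the cross-partial
    d2P / dx1 dx2 of P = logistic(b0 + b1*x1 + b2*x2 + b12*x1*x2)."""
    u = b0 + b1 * x1 + b2 * x2 + b12 * x1 * x2
    p = logistic(u)
    lam1 = p * (1 - p)           # first derivative of logistic at u
    lam2 = lam1 * (1 - 2 * p)    # second derivative of logistic at u
    return b12 * lam1 + (b1 + b12 * x2) * (b2 + b12 * x1) * lam2

def naive_effect(b0, b1, b2, b12, x1, x2):
    """The common mistake: reading b12 times the density as the effect."""
    u = b0 + b1 * x1 + b2 * x2 + b12 * x1 * x2
    p = logistic(u)
    return b12 * p * (1 - p)
```

    At β = (0, 1, 1, 0.5) and x1 = x2 = 1, the true interaction effect is negative while the naive quantity is positive: the interaction effect can differ even in sign from the interaction coefficient.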

  18. Albouy, D. Y. (2012). The Colonial Origins of Comparative Development: An Empirical Investigation: Comment. American Economic Review, 102(6), 3059–3076.

    https://doi.org/10.1257/aer.102.6.3059

    Application · Cited on: instrumental variables
    Annotation

    Albouy critically re-examined the settler mortality instrument used in Acemoglu et al. (2001), showing that the original results are sensitive to data coding decisions and the sample of countries included. This comment is a cautionary tale about instrument validity and the fragility of influential IV estimates.

  19. Allison, P. D. (2009). Fixed Effects Regression Models. SAGE Publications.

    https://doi.org/10.4135/9781412993869

    Survey · Cited on: random effects
    Annotation

    Allison's concise and accessible monograph compares fixed effects and random effects models for panel data, providing practical guidance on model selection, estimation, and interpretation. It is particularly useful for social scientists seeking an intuitive understanding of when each approach is appropriate.

  20. Altonji, J. G., Elder, T. E., & Taber, C. R. (2005). Selection on Observed and Unobserved Variables: Assessing the Effectiveness of Catholic Schools. Journal of Political Economy, 113(1), 151–184.

    https://doi.org/10.1086/426036

    Foundational · Cited on: sensitivity analysis
    Annotation

    Altonji, Elder, and Taber developed the idea that if selection on observables is informative about selection on unobservables, one can bound the bias from omitted variables. Their approach became the basis for the widely used Oster (2019) sensitivity framework.

  21. Amemiya, T. (1981). Qualitative Response Models: A Survey. Journal of Economic Literature, 19(4), 1483–1536.

    Foundational · Cited on: logit probit
    Annotation

    Amemiya provided a comprehensive survey of qualitative response models including logit, probit, and tobit. This survey organized the theoretical properties, estimation methods, and specification tests for binary and multinomial choice models and became a standard reference for applied researchers.

  22. Anderson, M. L. (2008). Multiple Inference and Gender Differences in the Effects of Early Intervention: A Reevaluation of the Abecedarian, Perry Preschool, and Early Training Projects. Journal of the American Statistical Association, 103(484), 1481–1495.

    https://doi.org/10.1198/016214508000000841

    Foundational · Cited on: multiple testing
    Annotation

    Anderson proposed using index tests and the Westfall-Young step-down procedure to address multiple testing in program evaluation. He demonstrated that many previously reported significant gender differences in early childhood interventions disappeared after proper multiple testing corrections.

  23. Andrews, I., Stock, J. H., & Sun, L. (2019). Weak Instruments in Instrumental Variables Regression: Theory and Practice. Annual Review of Economics, 11, 727–753.

    https://doi.org/10.1146/annurev-economics-080218-025643

    Survey · Cited on: instrumental variables
    Annotation

    This survey provides an up-to-date review of the weak instruments problem, covering modern diagnostic tests, robust inference procedures, and practical recommendations. It is an excellent starting point for understanding the current best practices in IV estimation.

  24. Angrist, J. D. (1990). Lifetime Earnings and the Vietnam Era Draft Lottery: Evidence from Social Security Administrative Records. American Economic Review, 80(3), 313–336.

    Foundational · Cited on: instrumental variables
    Annotation

    A landmark application of instrumental variables using the Vietnam-era draft lottery as a natural experiment. Angrist showed that randomly assigned lottery numbers provide an instrument for military service, allowing causal estimation of the earnings effect of military service.

  25. Angrist, J. D., & Krueger, A. B. (1991). Does Compulsory School Attendance Affect Schooling and Earnings? Quarterly Journal of Economics, 106(4), 979–1014.

    https://doi.org/10.2307/2937954

    Foundational · Cited on: instrumental variables
    Annotation

    Angrist and Krueger used quarter of birth as an instrument for years of schooling, exploiting the fact that compulsory schooling laws interact with birth timing. This paper is one of the most-taught examples of instrumental variables in economics and also sparked important debates about weak instruments.

  26. Angrist, J. D., Imbens, G. W., & Rubin, D. B. (1996). Identification of Causal Effects Using Instrumental Variables. Journal of the American Statistical Association, 91(434), 444–455.

    https://doi.org/10.1080/01621459.1996.10476902

    Annotation

    The foundational paper on LATE and the complier framework, establishing when and what IV (and randomization with non-compliance) identifies.
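    Under the paper's assumptions, the simplest version of what IV identifies with a binary instrument is the Wald ratio: the reduced-form effect of the instrument on the outcome divided by its effect on treatment take-up. A toy sketch with hypothetical 0/1 data:

```python
def wald_late(y, d, z):
    """Wald (IV) estimate of the LATE with a binary instrument z:
    (E[Y|Z=1] - E[Y|Z=0]) / (E[D|Z=1] - E[D|Z=0]).
    Inputs are plain lists; data in the test are made up."""
    def mean(xs):
        return sum(xs) / len(xs)
    y1 = mean([yi for yi, zi in zip(y, z) if zi == 1])
    y0 = mean([yi for yi, zi in zip(y, z) if zi == 0])
    d1 = mean([di for di, zi in zip(d, z) if zi == 1])
    d0 = mean([di for di, zi in zip(d, z) if zi == 0])
    return (y1 - y0) / (d1 - d0)
```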

  27. Angrist, J. D., & Lavy, V. (1999). Using Maimonides' Rule to Estimate the Effect of Class Size on Scholastic Achievement. Quarterly Journal of Economics, 114(2), 533–575.

    https://doi.org/10.1162/003355399556061

    Annotation

    Angrist and Lavy exploited a rule that caps class sizes at 40 students, creating discontinuities in class size as enrollment crosses multiples of 40. The imperfect compliance with the rule makes this a fuzzy RDD. This paper is one of the most widely taught examples of the fuzzy RDD approach.

  28. Angrist, J. D., & Krueger, A. B. (2001). Instrumental Variables and the Search for Identification: From Supply and Demand to Natural Experiments. Journal of Economic Perspectives, 15(4), 69–85.

    https://doi.org/10.1257/jep.15.4.69

    Survey · Cited on: instrumental variables
    Annotation

    A historical survey tracing the evolution of IV from its origins in supply-and-demand estimation to modern natural experiments. Provides valuable context for understanding how IV methodology developed and why it became central to applied economics.

  29. Angrist, J., Bettinger, E., & Kremer, M. (2006). Long-Term Educational Consequences of Secondary School Vouchers: Evidence from Administrative Records in Colombia. American Economic Review, 96(3), 847–862.

    https://doi.org/10.1257/aer.96.3.847

    Application · Cited on: lee bounds
    Annotation

    Angrist, Bettinger, and Kremer applied Lee bounds to address attrition in a school voucher experiment in Colombia. This paper is one of the earliest and most prominent applications of Lee bounds in development economics, demonstrating how the method handles selective attrition in a real policy evaluation.

  30. Angrist, J. D., & Pischke, J.-S. (2009). Mostly Harmless Econometrics: An Empiricist's Companion. Princeton University Press.

    https://doi.org/10.1515/9781400829828

    Annotation

    The most influential modern textbook on applied econometrics, with essential chapters on experimental analysis, IV, and the design-based approach to causal inference.

  31. Angrist, J. D., & Pischke, J.-S. (2010). The Credibility Revolution in Empirical Economics: How Better Research Design Is Taking the Con out of Econometrics. Journal of Economic Perspectives, 24(2), 3–30.

    https://doi.org/10.1257/jep.24.2.3

    Annotation

    Provides the intellectual context for why applied economics moved from 'throw variables into OLS and see what sticks' to design-based causal inference. Helps researchers understand where OLS fits in the larger methodological landscape and why credible identification strategies matter.

  32. Arkhangelsky, D., Athey, S., Hirshberg, D. A., Imbens, G. W., & Wager, S. (2021). Synthetic Difference-in-Differences. American Economic Review, 111(12), 4088–4118.

    https://doi.org/10.1257/aer.20190159

    Annotation

    This paper introduced the synthetic difference-in-differences estimator, which combines the strengths of DID (parallel trends assumption) and synthetic control (re-weighting to improve pre-treatment fit). The method uses both unit weights and time weights to construct a more credible counterfactual, and provides valid inference without requiring a large donor pool.

  33. Arkhangelsky, D., & Imbens, G. W. (2023). Doubly Robust Identification for Causal Panel Data Models. Econometrics Journal, 26(3), C48–C76.

    https://doi.org/10.1093/ectj/utad018

    Annotation

    Arkhangelsky and Imbens extended the synthetic DID framework by developing doubly robust identification strategies for causal panel data models. Their approach combines outcome modeling with re-weighting, providing consistent estimates if either the outcome model or the weighting scheme is correctly specified, thereby strengthening the theoretical foundations of the SDID approach.

  34. Arrfelt, M., Wiseman, R. M., & Hult, G. T. M. (2013). Looking Backward Instead of Forward: Aspiration-Driven Influences on the Efficiency of the Capital Allocation Process. Academy of Management Journal, 56(4), 1081–1103.

    https://doi.org/10.5465/amj.2010.0879

    Application · Management journal · Cited on: matching methods
    Annotation

    This paper used propensity score matching alongside other methods to study how performance relative to aspirations affects capital allocation in diversified firms. Published in AMJ, it is an example of how matching methods have been adopted in top management journals to address selection concerns.

  35. Ashenfelter, O. (1978). Estimating the Effect of Training Programs on Earnings. Review of Economics and Statistics, 60(1), 47–57.

    https://doi.org/10.2307/1924332

    Foundational · Cited on: difference in differences
    Annotation

    This paper is one of the earliest applications of the difference-in-differences logic. Ashenfelter compared the earnings of trainees before and after a job training program to a comparison group, introducing the idea that you can remove time-invariant unobserved differences by looking at changes over time.
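    The two-group, two-period logic described above reduces to one line of arithmetic; the group means here are hypothetical:

```python
def did_estimate(treat_pre, treat_post, control_pre, control_post):
    """Canonical 2x2 difference-in-differences from four group means:
    the treated group's change over time minus the control group's change.
    Time-invariant level differences between groups cancel out."""
    return (treat_post - treat_pre) - (control_post - control_pre)
```

    For example, if trainees' earnings rise from 10 to 15 while the comparison group rises from 10 to 12, the DID estimate of the program effect is 3.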

  36. Athey, S., & Imbens, G. W. (2016). Recursive Partitioning for Heterogeneous Causal Effects. Proceedings of the National Academy of Sciences, 113(27), 7353–7360.

    https://doi.org/10.1073/pnas.1510489113

    Foundational · Cited on: causal forests
    Annotation

    Athey and Imbens introduced causal trees, adapting the CART algorithm to estimate heterogeneous treatment effects with valid inference. They proposed the honest estimation approach, where one subsample is used for tree construction and another for estimation, ensuring valid confidence intervals.

  37. Athey, S., & Imbens, G. W. (2017). The Econometrics of Randomized Experiments. Handbook of Economic Field Experiments, 1, 73–140.

    https://doi.org/10.1016/bs.hefe.2016.10.003

    Annotation

    This chapter provides a modern, rigorous treatment of the econometrics behind randomized experiments. It covers design, analysis, and inference issues such as stratification, clustering, and multiple hypothesis testing. It is an excellent reference for researchers running field experiments.

  38. Athey, S., & Imbens, G. W. (2019). Machine Learning Methods That Economists Should Know About. Annual Review of Economics, 11, 685–725.

    https://doi.org/10.1146/annurev-economics-080217-053433

    Annotation

    Athey and Imbens provided a comprehensive overview of machine learning methods relevant to economists, with DML as a centerpiece. They explained when and why machine learning methods can improve causal inference and prediction in economics, making these tools accessible to applied researchers.

  39. Athey, S., Tibshirani, J., & Wager, S. (2019). Generalized Random Forests. Annals of Statistics, 47(2), 1148–1178.

    https://doi.org/10.1214/18-AOS1709

    Foundational · Replication package · Cited on: causal forests
    Annotation

    This paper introduced the generalized random forest (GRF) framework, which extends causal forests to a broad class of estimating equations including quantile regression, IV, and local average treatment effects. GRF provides the theoretical foundation and the widely used grf R package.

  40. Autor, D. H. (2003). Outsourcing at Will: The Contribution of Unjust Dismissal Doctrine to the Growth of Employment Outsourcing. Journal of Labor Economics, 21(1), 1–42.

    https://doi.org/10.1086/344122

    Application · Cited on: difference in differences
    Annotation

    Autor used a DID design that exploited the staggered adoption of wrongful-discharge protections across U.S. states. He found that stronger employment protections led firms to outsource more jobs. This paper is a model for using staggered state-level policy changes in a DID framework.

  41. Autor, D. H., Dorn, D., & Hanson, G. H. (2013). The China Syndrome: Local Labor Market Effects of Import Competition in the United States. American Economic Review, 103(6), 2121–2168.

    https://doi.org/10.1257/aer.103.6.2121

    Application · Cited on: shift share instruments
    Annotation

    Autor, Dorn, and Hanson used a shift-share instrument to study how Chinese import competition affected U.S. local labor markets, instrumenting U.S. import exposure with Chinese exports to other high-income countries. This paper is one of the most influential and widely discussed shift-share applications.

  42. Azoulay, P., Graff Zivin, J. S., & Wang, J. (2010). Superstar Extinction. Quarterly Journal of Economics, 125(2), 549–589.

    https://doi.org/10.1162/qjec.2010.125.2.549

    Application · Cited on: matching methods
    Annotation

    Azoulay and coauthors used propensity score matching to construct a control group of scientists who did not experience the unexpected death of a 'superstar' collaborator. They found that the death of a superstar leads to a lasting decline in the productivity of their collaborators. This study is an elegant application of matching in the economics of science and innovation.

  43. Bach, P., Chernozhukov, V., Kurz, M. S., & Spindler, M. (2024). DoubleML: An Object-Oriented Implementation of Double Machine Learning in Python. Journal of Machine Learning Research, 25(53), 1–6.

    Annotation

    Bach and colleagues developed the DoubleML package for Python (with a companion R implementation), providing a user-friendly, object-oriented implementation of the DML framework. The package supports partially linear, interactive, and instrumental variable models with a variety of machine learning methods for nuisance estimation.

  44. Baker, A. C., Larcker, D. F., & Wang, C. C. Y. (2022). How Much Should We Trust Staggered Difference-in-Differences Estimates? Journal of Financial Economics, 144(2), 370–395.

    https://doi.org/10.1016/j.jfineco.2022.01.004

    Annotation

    Baker, Larcker, and Wang demonstrated that the staggered DID problems identified in the econometrics literature are empirically relevant in finance research. They re-analyzed prominent finance studies and showed that results can change substantially when using robust estimators.

  45. Baltagi, B. H. (2021). Econometric Analysis of Panel Data. Springer, 6th edition.

    Survey · Cited on: random effects
    Annotation

    The standard graduate-level textbook on panel data econometrics, covering error component models, random effects, and extensions to unbalanced panels and dynamic models. Provides comprehensive treatment of both theoretical foundations and practical implementation.

  46. Bandiera, O., Barankay, I., & Rasul, I. (2005). Social Preferences and the Response to Incentives: Evidence from Personnel Data. Quarterly Journal of Economics, 120(3), 917–962.

    https://doi.org/10.1093/qje/120.3.917

    Application · Cited on: experimental design
    Annotation

    This paper used a field experiment in a fruit-picking firm to study how switching from relative to piece-rate pay affected productivity. It demonstrated that social preferences among workers matter for incentive design, bridging experimental economics and management.

  47. Banerjee, A. V., & Duflo, E. (2009). The Experimental Approach to Development Economics. Annual Review of Economics.

    https://doi.org/10.1146/annurev.economics.050708.143235

    Survey · Cited on: experimental design
    Annotation

    Lays out the case for randomized experiments in development economics, discussing design choices, ethical considerations, and the generalizability of experimental findings.

  48. Banerjee, A., Duflo, E., Goldberg, N., Karlan, D., Osei, R., Pariente, W., Shapiro, J., Thuysbaert, B., & Udry, C. (2015). A Multifaceted Program Causes Lasting Progress for the Very Poor: Evidence from Six Countries. Science, 348(6236), 1260799.

    https://doi.org/10.1126/science.1260799

    Application · Cited on: experimental design
    Annotation

    A large-scale RCT across six countries demonstrating that a multifaceted anti-poverty program produces sustained economic gains for the ultra-poor.

  49. Bang, H., & Robins, J. M. (2005). Doubly Robust Estimation in Missing Data and Causal Inference Models. Biometrics, 61(4), 962–973.

    https://doi.org/10.1111/j.1541-0420.2005.00377.x

    Foundational · Cited on: doubly robust estimation
    Annotation

    Bang and Robins provided an accessible exposition of doubly robust estimators, demonstrating their properties through simulations and clarifying when the double robustness property provides meaningful protection. This paper helped make the method more accessible to applied researchers.
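    The doubly robust (AIPW) estimator the paper discusses can be written down directly once the propensity scores and outcome-model predictions are in hand. A sketch, not the authors' code; all inputs are assumed already estimated:

```python
def aipw_ate(y, d, e, m1, m0):
    """Augmented IPW (doubly robust) estimate of the ATE.
    y: outcomes, d: 0/1 treatment, e: propensity scores P(D=1|X),
    m1/m0: outcome-model predictions under treatment/control.
    Consistent if either the propensity model or the outcome model
    is correctly specified."""
    psi = [
        m1i - m0i
        + di * (yi - m1i) / ei
        - (1 - di) * (yi - m0i) / (1 - ei)
        for yi, di, ei, m1i, m0i in zip(y, d, e, m1, m0)
    ]
    return sum(psi) / len(psi)
```

    When the outcome model is exactly right, the correction terms vanish and the estimate equals the average of m1 - m0, which is the source of the "double robustness" protection.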

  50. Baron, R. M., & Kenny, D. A. (1986). The Moderator-Mediator Variable Distinction in Social Psychological Research: Conceptual, Strategic, and Statistical Considerations. Journal of Personality and Social Psychology, 51(6), 1173–1182.

    https://doi.org/10.1037/0022-3514.51.6.1173

    Foundational · Cited on: causal mediation analysis
    Annotation

    Baron and Kenny introduced the widely used four-step approach to testing mediation, comparing total, direct, and indirect effects using sequential regressions. While later work has identified limitations of this approach, it remains one of the most cited papers in all of social science.

  51. Barrios, J. M. (2021). Staggered Rollout Designs in Accounting Research. Working Paper, Washington University in St. Louis.

    Annotation

    Barrios examined the prevalence of staggered DID designs in accounting research and showed that many published results are sensitive to the choice of estimator. This paper raised awareness of the staggered DID problem in the accounting and management fields. [UNVERIFIED: Working paper status may have changed.]

  52. Bartik, T. J. (1991). Who Benefits from State and Local Economic Development Policies? W.E. Upjohn Institute for Employment Research.

    https://doi.org/10.17848/9780585223940

    Foundational · Cited on: shift share instruments
    Annotation

    Bartik introduced the shift-share instrument—constructing predicted local employment growth from national industry growth rates interacted with initial local industry composition. This 'Bartik instrument' has become one of the most widely used instruments in labor and urban economics.
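    The instrument's construction is simple enough to show directly: predicted local growth is initial local industry shares interacted with national industry growth rates. Region and industry names below are hypothetical toy data:

```python
def bartik_instrument(local_shares, national_growth):
    """Shift-share (Bartik) instrument: for each region, sum over
    industries of (initial local employment share) x (national
    industry growth rate). local_shares maps region -> {industry: share};
    national_growth maps industry -> growth rate."""
    return {
        region: sum(shares[k] * national_growth[k] for k in shares)
        for region, shares in local_shares.items()
    }
```

    A region heavily exposed to a nationally shrinking industry gets a low predicted growth rate, giving variation in local conditions that is driven by national shifts rather than local shocks.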

  53. Battistin, E., & Rettore, E. (2008). Ineligibles and Eligible Non-Participants as a Double Comparison Group in Regression-Discontinuity Designs. Journal of Econometrics, 142(2), 715–730.

    https://doi.org/10.1016/j.jeconom.2007.05.006

    Foundational · Cited on: regression discontinuity fuzzy
    Annotation

    Battistin and Rettore addressed the problem of imperfect compliance in fuzzy RDD by proposing a double comparison group strategy that uses both ineligible units and eligible non-participants to bound treatment effects. Their framework clarified how partial compliance affects identification and offered practical tools for strengthening fuzzy RDD inference.

  54. Bell, A., & Jones, K. (2015). Explaining Fixed Effects: Random Effects Modeling of Time-Series Cross-Sectional and Panel Data. Political Science Research and Methods, 3(1), 133–153.

    https://doi.org/10.1017/psrm.2014.7

    Foundational · Cited on: random effects
    Annotation

    Bell and Jones argued that the 'within-between' random-effects model (essentially the Mundlak approach) is often superior to pure fixed effects because it allows explicit decomposition of within- and between-unit effects while accounting for unobserved heterogeneity.

  55. Belloni, A., Chernozhukov, V., & Hansen, C. (2014). Inference on Treatment Effects after Selection among High-Dimensional Controls. Review of Economic Studies, 81(2), 608–650.

    https://doi.org/10.1093/restud/rdt044

    Foundational · Cited on: double debiased machine learning
    Annotation

    Belloni, Chernozhukov, and Hansen introduced the post-double-selection LASSO method for inference on treatment effects with many potential controls. This paper was a key precursor to DML, demonstrating how regularized selection in both the treatment and outcome equations can yield valid inference.

  56. Ben-Michael, E., Feller, A., & Rothstein, J. (2021). The Augmented Synthetic Control Method. Journal of the American Statistical Association, 116(536), 1789–1803.

    https://doi.org/10.1080/01621459.2021.1929245

    Annotation

    Ben-Michael, Feller, and Rothstein proposed augmenting the synthetic control estimator with an outcome model to reduce bias when the synthetic control does not achieve perfect pre-treatment fit. The resulting doubly robust estimator is consistent if either the outcome model or the weighting is correct, providing a practical improvement for applied synthetic control studies.

  57. Ben-Michael, E., Feller, A., & Rothstein, J. (2022). Synthetic Controls with Staggered Adoption. Journal of the Royal Statistical Society: Series B, 84(2), 351–381.

    https://doi.org/10.1111/rssb.12448

    Annotation

    Ben-Michael, Feller, and Rothstein extended synthetic control and synthetic DID methods to staggered adoption settings where multiple units adopt treatment at different times. They demonstrated the approach by estimating the effects of teacher collective bargaining laws on school spending across U.S. states, showing how synthetic DID-style reweighting improves counterfactual estimation when treatment rolls out over time.

  58. Benjamini, Y., & Hochberg, Y. (1995). Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society: Series B, 57(1), 289–300.

    https://doi.org/10.1111/j.2517-6161.1995.tb02031.x

    FoundationalCited on: multiple testing
    Annotation

    Benjamini and Hochberg introduced the false discovery rate (FDR) as an alternative to family-wise error rate control. Their step-up procedure for controlling FDR is less conservative than Bonferroni while still providing meaningful protection against false positives, and has become the standard in many fields.
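The step-up procedure described above is simple enough to sketch directly; here is a minimal, self-contained Python illustration (the function name and p-values are hypothetical, chosen only for the example):

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Return a rejection decision for each p-value, controlling FDR at level q.

    Step-up rule: find the largest k with p_(k) <= (k/m) * q, then reject
    the hypotheses with the k smallest p-values.
    """
    m = len(pvalues)
    # Sort p-values while remembering their original positions.
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * q:
            k_max = rank
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            reject[i] = True
    return reject

pvals = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
print(benjamini_hochberg(pvals, q=0.05))  # → [True, True, False, False, False, False]
```

Note that a plain Bonferroni cutoff of 0.05 / 6 ≈ 0.0083 would reject only the first hypothesis here, illustrating why the BH procedure is less conservative.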

  59. Bennedsen, M., Nielsen, K. M., Pérez-González, F., & Wolfenzon, D. (2007). Inside the Family Firm: The Role of Families in Succession Decisions and Performance. Quarterly Journal of Economics, 122(2), 647–691.

    https://doi.org/10.1162/qjec.122.2.647

    ApplicationCited on: fixed effects
    Annotation

    Uses exogenous variation in CEO succession decisions driven by the gender of the departing CEO's firstborn child to study the effect of family versus professional management on firm performance. A widely cited example of using a natural experiment to address endogeneity in corporate governance research.

  60. Bertrand, M., & Schoar, A. (2003). Managing with Style: The Effect of Managers on Firm Policies. Quarterly Journal of Economics, 118(4), 1169–1208.

    https://doi.org/10.1162/003355303322552775

    ApplicationCited on: fixed effects, ols regression
    Annotation

    Bertrand and Schoar used manager fixed effects (tracking CEOs who moved between firms) to show that individual managerial 'style' explains a significant portion of the variation in corporate investment, financial, and organizational practices. This paper is a key reference linking fixed effects methods to management questions.

  61. Bertrand, M., Duflo, E., & Mullainathan, S. (2004). How Much Should We Trust Differences-in-Differences Estimates?. Quarterly Journal of Economics, 119(1), 249–275.

    https://doi.org/10.1162/003355304772839588

    Annotation

    This critical paper showed that standard errors in DID studies are often far too small because they ignore serial correlation within units over time. It proposed clustering standard errors at the group level as a simple fix, which is now standard practice in all DID analyses.

  62. Bertrand, M., & Mullainathan, S. (2004). Are Emily and Greg More Employable Than Lakisha and Jamal? A Field Experiment on Labor Market Discrimination. American Economic Review, 94(4), 991–1013.

    https://doi.org/10.1257/0002828042002561

    ApplicationCited on: experimental design, ols regression
    Annotation

    This famous audit study sent fictitious resumes with randomly assigned names to employers and found that 'white-sounding' names received 50% more callbacks. It is one of the most widely cited field experiments in social science and a powerful example of how randomization can identify discrimination.

  63. Blanchard, O. J., & Katz, L. F. (1992). Regional Evolutions. Brookings Papers on Economic Activity, 1992(1), 1–75.

    https://doi.org/10.2307/2534556

    ApplicationCited on: shift share instruments
    Annotation

    Blanchard and Katz used the Bartik shift-share instrument to study regional labor market adjustment in the United States, analyzing how local employment shocks affect wages, unemployment, and migration. This paper is one of the earliest and most influential applications of the shift-share IV strategy.

  64. Bloom, H. S. (1995). Minimum Detectable Effects: A Simple Way to Report the Statistical Power of Experimental Designs. Evaluation Review, 19(5), 547–556.

    https://doi.org/10.1177/0193841X9501900504

    FoundationalCited on: power analysis
    Annotation

    Bloom introduced the minimum detectable effect (MDE) framework, which reports the smallest effect size a study can reliably detect given its design and sample size. This approach is now the standard way to discuss statistical power in program evaluation and experimental economics.
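A back-of-the-envelope version of the MDE calculation can be sketched as follows, assuming a two-arm design, equal outcome variance across arms, and a normal approximation (the function name and parameter values are illustrative, not from the paper):

```python
import math
from statistics import NormalDist

def mde(n, sigma, p_treat=0.5, alpha=0.05, power=0.80):
    """Minimum detectable effect for a two-arm randomized design with n total
    units, outcome standard deviation sigma, and treatment share p_treat."""
    z = NormalDist().inv_cdf
    multiplier = z(1 - alpha / 2) + z(power)   # ~1.96 + 0.84 = 2.80
    se = sigma * math.sqrt(1.0 / (p_treat * (1 - p_treat) * n))
    return multiplier * se

# With 1,000 units, an outcome SD of 1, and a 50/50 split, the smallest
# reliably detectable effect is about 0.18 standard deviations.
print(round(mde(1000, 1.0), 3))  # → 0.177
```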

  65. Bloom, N., & Van Reenen, J. (2007). Measuring and Explaining Management Practices Across Firms and Countries. Quarterly Journal of Economics, 122(4), 1351–1408.

    https://doi.org/10.1162/qjec.2007.122.4.1351

    Annotation

    Bloom and Van Reenen developed a survey-based measure of management practices and used IV strategies (including firm age and governance rules) to study the causal relationship between management quality and firm productivity. This paper is a prominent IV application in management and organizational economics.

  66. Bloom, N., Liang, J., Roberts, J., & Ying, Z. J. (2015). Does Working from Home Work? Evidence from a Chinese Experiment. Quarterly Journal of Economics, 130(1), 165–218.

    https://doi.org/10.1093/qje/qju032

    ApplicationCited on: experimental design
    Annotation

    A large-scale randomized experiment at a Chinese travel agency found that working from home led to a 13% performance increase. This study became a landmark reference in management and labor economics for its clean experimental design applied to a practical workplace question.

  67. Bonferroni, C. (1936). Teoria Statistica delle Classi e Calcolo delle Probabilità. Pubblicazioni del R. Istituto Superiore di Scienze Economiche e Commerciali di Firenze, 8, 3–62.

    FoundationalCited on: multiple testing
    Annotation

    Bonferroni developed the classical correction for multiple comparisons, which controls the family-wise error rate by dividing the significance level by the number of tests. While conservative, the Bonferroni correction remains widely used due to its simplicity and broad applicability.
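The correction itself is one line; a minimal sketch with hypothetical p-values:

```python
def bonferroni_reject(pvalues, alpha=0.05):
    """Reject hypothesis i iff p_i <= alpha / m, controlling the
    family-wise error rate at level alpha across m tests."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]

# Three tests at alpha = 0.05: each is judged against 0.05 / 3 ≈ 0.0167.
print(bonferroni_reject([0.001, 0.008, 0.039]))  # → [True, True, False]
```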

  68. Borusyak, K., Hull, P., & Jaravel, X. (2022). Quasi-Experimental Shift-Share Research Designs. Review of Economic Studies, 89(1), 181–213.

    https://doi.org/10.1093/restud/rdab030

    FoundationalCited on: shift share instruments
    Annotation

    Borusyak, Hull, and Jaravel provided an alternative framework where identification comes from the exogeneity of the shocks rather than the shares. They showed that with many independent shocks, the instrument is valid even if shares are endogenous, greatly expanding the range of credible applications.

  69. Borusyak, K., Jaravel, X., & Spiess, J. (2024). Revisiting Event-Study Designs: Robust and Efficient Estimation. Review of Economic Studies, 91(6), 3253–3285.

    https://doi.org/10.1093/restud/rdae007

    Annotation

    Borusyak, Jaravel, and Spiess proposed an imputation estimator for staggered DID that first estimates unit and time fixed effects from untreated observations, then imputes the counterfactual outcomes. This approach is efficient, flexible, and avoids the negative weighting problem of TWFE.

  70. Brand, J. E., Xu, J., Koch, B., & Gerber, R. (2021). Uncovering Sociological Effect Heterogeneity Using Tree-Based Machine Learning. Sociological Methodology, 51(2), 189–223.

    https://doi.org/10.1177/0081175021993503

    ApplicationCited on: causal forests
    Annotation

    Brand and colleagues provided a practical guide to using causal trees and forests in social science research. They discussed honest estimation, variable importance for understanding which covariates drive heterogeneity, and applied the methods to study heterogeneous returns to college education.

  71. Brown, S. J., & Warner, J. B. (1985). Using Daily Stock Returns: The Case of Event Studies. Journal of Financial Economics, 14(1), 3–31.

    https://doi.org/10.1016/0304-405X(85)90042-X

    FoundationalCited on: event studies
    Annotation

    Brown and Warner extended the event study framework from monthly to daily stock returns and examined the statistical properties of various test statistics. Their simulations showed that simple methods perform well in most settings, providing practical reassurance for applied researchers.

  72. Bruhn, M., & McKenzie, D. (2009). In Pursuit of Balance: Randomization in Practice in Development Field Experiments. American Economic Journal: Applied Economics, 1(4), 200–232.

    https://doi.org/10.1257/app.1.4.200

    FoundationalCited on: experimental design
    Annotation

    Bruhn and McKenzie compared different randomization methods—simple, stratified, and pairwise—in practice and showed that stratified randomization substantially improves balance on baseline covariates and increases statistical power. They provided practical recommendations for choosing among randomization procedures in field experiments.

  73. Busenbark, J. R., Yoon, H., Gamache, D. L., & Withers, M. C. (2022). Omitted Variable Bias: Examining Management Research with the Impact Threshold of a Confounding Variable (ITCV). Journal of Management, 48(1), 17–48.

    https://doi.org/10.1177/01492063211006458

    ApplicationManagement journalCited on: sensitivity analysis
    Annotation

    Busenbark and colleagues provided a practical guide to conducting sensitivity analysis in management research using the ITCV framework. They reviewed its application in strategic management and organizational behavior, and demonstrated how to interpret and report results for management audiences.

  74. Callaway, B., & Sant'Anna, P. H. C. (2021). Difference-in-Differences with Multiple Time Periods. Journal of Econometrics, 225(2), 200–230.

    https://doi.org/10.1016/j.jeconom.2020.12.001

    Annotation

    Callaway and Sant'Anna proposed group-time average treatment effects (ATT(g,t)) that avoid the problematic comparisons in TWFE. Their framework allows for heterogeneous treatment effects across groups and time and provides aggregation schemes for summary parameters.
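The core building block, ATT(g, t), can be illustrated on a toy balanced panel. The data structure and helper below are hypothetical simplifications (never-treated comparison group, no covariates), not the authors' estimator:

```python
def att_gt(panel, g, t):
    """ATT(g, t): average outcome change from period g-1 to t for the cohort
    first treated in period g, minus the same change for never-treated units.
    panel maps unit -> (first_treated_period_or_None, {period: outcome})."""
    base = g - 1  # last pre-treatment period for cohort g
    treated = [ys[t] - ys[base] for grp, ys in panel.values() if grp == g]
    control = [ys[t] - ys[base] for grp, ys in panel.values() if grp is None]
    return sum(treated) / len(treated) - sum(control) / len(control)

# Toy panel: a common +1 trend each period, plus a +3 effect for cohort 2.
panel = {
    "a": (2,    {1: 0.0, 2: 4.0}),   # treated in period 2: trend + effect
    "b": (2,    {1: 1.0, 2: 5.0}),
    "c": (None, {1: 0.0, 2: 1.0}),   # never treated: trend only
    "d": (None, {1: 2.0, 2: 3.0}),
}
print(att_gt(panel, g=2, t=2))  # → 3.0
```

Aggregating these ATT(g, t) building blocks across groups and periods then yields the summary parameters the paper proposes.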

  75. Calonico, S., Cattaneo, M. D., & Titiunik, R. (2014). Robust Nonparametric Confidence Intervals for Regression-Discontinuity Designs. Econometrica, 82(6), 2295–2326.

    https://doi.org/10.3982/ECTA11757

    Annotation

    Calonico, Cattaneo, and Titiunik developed bias-corrected confidence intervals for RDD that address the problem of conventional confidence intervals being invalid when using optimal bandwidth selectors. Their rdrobust software package has become the standard tool for implementing RDD in practice.

  76. Cameron, A. C., & Trivedi, P. K. (1986). Econometric Models Based on Count Data: Comparisons and Applications of Some Estimators and Tests. Journal of Applied Econometrics, 1(1), 29–53.

    https://doi.org/10.1002/jae.3950010104

    FoundationalCited on: poisson negative binomial
    Annotation

    Cameron and Trivedi compared Poisson, negative binomial, and other count data models, providing tests for overdispersion and guidance on model selection. This paper helped establish the practical toolkit for applied researchers working with count outcomes.

  77. Cameron, A. C., & Trivedi, P. K. (1990). Regression-Based Tests for Overdispersion in the Poisson Model. Journal of Econometrics, 46(3), 347–364.

    https://doi.org/10.1016/0304-4076(90)90014-K

    MethodologicalCited on: poisson negative binomial
    Annotation

    Develops regression-based test statistics for detecting overdispersion in Poisson regression models.

  78. Cameron, A. C., & Trivedi, P. K. (2005). Microeconometrics: Methods and Applications. Cambridge University Press.

    SurveyCited on: fixed effects, logit probit
    Annotation

    Chapter 21 covers panel data methods comprehensively, including fixed effects, random effects, and dynamic panel models. A standard graduate-level reference for microeconometric methods.

  79. Cameron, A. C., Gelbach, J. B., & Miller, D. L. (2008). Bootstrap-Based Improvements for Inference with Clustered Errors. Review of Economics and Statistics, 90(3), 414–427.

    https://doi.org/10.1162/rest.90.3.414

    Annotation

    Addresses what happens when clustering is necessary but the number of clusters is small (fewer than 30–50). Proposes the wild cluster bootstrap as a solution, which has become the standard approach when researchers have too few clusters for asymptotic cluster-robust standard errors to be reliable.
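A stripped-down sketch of the procedure (cluster-level Rademacher weights, null imposed) is below, assuming numpy; the function names are mine, and a real application would use a dedicated implementation rather than this illustration:

```python
import numpy as np

def cluster_t(y, X, cluster, j):
    """OLS t-statistic for coefficient j with a basic (CR0) cluster-robust variance."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    u = y - X @ beta
    meat = sum(np.outer(X[cluster == g].T @ u[cluster == g],
                        X[cluster == g].T @ u[cluster == g])
               for g in np.unique(cluster))
    V = XtX_inv @ meat @ XtX_inv
    return beta[j] / np.sqrt(V[j, j])

def wild_cluster_p(y, X, cluster, j=1, reps=999, seed=0):
    """Wild cluster bootstrap p-value for H0: beta_j = 0, imposing the null
    and flipping restricted residuals with cluster-level Rademacher weights."""
    rng = np.random.default_rng(seed)
    t_obs = cluster_t(y, X, cluster, j)
    Xr = np.delete(X, j, axis=1)                 # restricted model (beta_j = 0)
    beta_r, *_ = np.linalg.lstsq(Xr, y, rcond=None)
    u_r = y - Xr @ beta_r
    groups = np.unique(cluster)
    extreme = 0
    for _ in range(reps):
        w = dict(zip(groups, rng.choice([-1.0, 1.0], size=len(groups))))
        y_star = Xr @ beta_r + u_r * np.array([w[g] for g in cluster])
        if abs(cluster_t(y_star, X, cluster, j)) >= abs(t_obs):
            extreme += 1
    return (extreme + 1) / (reps + 1)
```

The key idea is that the bootstrap resamples at the cluster level, preserving within-cluster dependence, while studentizing each draw with the same cluster-robust t-statistic used for the observed data.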

  80. Cameron, A. C., & Trivedi, P. K. (2013). Regression Analysis of Count Data. Cambridge University Press.

    https://doi.org/10.1017/CBO9781139013567

    Annotation

    This textbook is the definitive reference on count data regression, covering Poisson, negative binomial, zero-inflated, hurdle, and panel count models. It provides both the theoretical foundations and practical implementation guidance that applied researchers need.

  81. Cameron, A. C., & Miller, D. L. (2015). A Practitioner's Guide to Cluster-Robust Inference. Journal of Human Resources, 50(2), 317–372.

    https://doi.org/10.3368/jhr.50.2.317

    SurveyCited on: ols regression
    Annotation

    This highly practical survey covers all aspects of cluster-robust inference in OLS regression, including when to cluster, at what level, and what to do when the number of clusters is small. It has become the essential reference for applied researchers deciding how to handle clustered data.

  82. Capron, L., & Pistre, N. (2002). When Do Acquirers Earn Abnormal Returns?. Strategic Management Journal, 23(9), 781–794.

    https://doi.org/10.1002/smj.262

    ApplicationManagement journalCited on: event studies
    Annotation

    Capron and Pistre used event study methodology to examine when acquiring firms earn positive abnormal returns from mergers and acquisitions. They found that acquirers earn positive returns only when they are the primary source of value creation, contributing to the M&A strategy literature.

  83. Card, D., & Krueger, A. B. (1994). Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania. American Economic Review, 84(4), 772–793.

    https://doi.org/10.1257/aer.84.4.772

    FoundationalCited on: difference in differences
    Annotation

    Perhaps the most famous DID study in economics. Card and Krueger compared fast-food employment in New Jersey (which raised its minimum wage) with neighboring Pennsylvania (which did not). They found no negative employment effect, challenging the standard textbook prediction. This paper popularized DID as a research design.

  84. Card, D. (2001). Immigrant Inflows, Native Outflows, and the Local Labor Market Impacts of Higher Immigration. Journal of Labor Economics, 19(1), 22–64.

    https://doi.org/10.1086/209979

    ApplicationCited on: shift share instruments
    Annotation

    Card used a shift-share instrument based on historical settlement patterns of immigrant groups to predict current immigration flows to U.S. cities. This 'enclave instrument' has been adopted in hundreds of subsequent immigration studies and is a classic example of the shift-share approach.

  85. Casey, K., Glennerster, R., & Miguel, E. (2012). Reshaping Institutions: Evidence on Aid Impacts Using a Preanalysis Plan. Quarterly Journal of Economics, 127(4), 1755–1812.

    https://doi.org/10.1093/qje/qje027

    ApplicationCited on: multiple testing, pre registration
    Annotation

    Casey, Glennerster, and Miguel pre-registered their analysis plan for a community-driven development program in Sierra Leone and applied multiple testing corrections (including the Westfall-Young step-down procedure and family-wise error rate adjustments) across outcome families. This paper is one of the most prominent examples of rigorous multiple testing adjustment in a field experiment, demonstrating that many individually significant effects lose significance after correction.

  86. Cattaneo, M. D. (2010). Efficient Semiparametric Estimation of Multi-Valued Treatment Effects under Ignorability. Journal of Econometrics.

    https://doi.org/10.1016/j.jeconom.2009.09.023

    FoundationalCited on: doubly robust estimation
    Annotation

    Extended doubly robust estimation to multi-valued treatments and established semiparametric efficiency bounds. Showed how to combine flexible nonparametric estimation with valid inference for treatment effects.
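For the binary-treatment special case, the doubly robust (AIPW) estimator that this line of work generalizes can be sketched as follows; the function name and toy numbers are hypothetical:

```python
def aipw_ate(y, t, e, m1, m0):
    """Augmented inverse-probability-weighted (doubly robust) ATE estimate.
    y: outcomes; t: 0/1 treatment; e: propensity scores P(T=1|X);
    m1, m0: outcome-model predictions under treatment and control."""
    terms = [
        m1[i] - m0[i]
        + t[i] * (y[i] - m1[i]) / e[i]
        - (1 - t[i]) * (y[i] - m0[i]) / (1 - e[i])
        for i in range(len(y))
    ]
    return sum(terms) / len(terms)

# When both nuisance models are exactly right, the correction terms vanish
# and the estimate is the average of m1 - m0 (here, a constant effect of 2).
y  = [2.0, 0.0, 2.0, 0.0]
t  = [1,   0,   1,   0  ]
e  = [0.5, 0.5, 0.5, 0.5]
m1 = [2.0, 2.0, 2.0, 2.0]
m0 = [0.0, 0.0, 0.0, 0.0]
print(aipw_ate(y, t, e, m1, m0))  # → 2.0
```

The estimator remains consistent if either the propensity model e or the outcome models (m1, m0) is correct, which is what "doubly robust" means.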

  87. Cattaneo, M. D., Drukker, D. M., & Holland, A. D. (2013). Estimation of Multivalued Treatment Effects Under Conditional Independence. Stata Journal, 13(3), 407–450.

    https://doi.org/10.1177/1536867X1301300301

    FoundationalCited on: matching methods
    Annotation

    Cattaneo, Drukker, and Holland extended matching and inverse probability weighting methods to settings with multi-valued (rather than binary) treatments, developing estimators for dose-response functions under conditional independence. Their accompanying Stata implementation made these methods readily accessible to applied researchers.

  88. Cattaneo, M. D., Frandsen, B. R., & Titiunik, R. (2015). Randomization Inference in the Regression Discontinuity Design: An Application to Party Advantages in the U.S. Senate. Journal of Causal Inference, 3(1), 1–24.

    https://doi.org/10.1515/jci-2013-0010

    ApplicationCited on: randomization inference
    Annotation

    Cattaneo, Frandsen, and Titiunik developed a randomization inference framework for regression discontinuity designs, exploiting the local randomization interpretation of close elections. They applied the method to estimate party advantages in U.S. Senate elections, demonstrating how Fisher-style permutation tests can provide finite-sample exact inference in RDD settings where asymptotic approximations may be unreliable.
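The Fisher-style permutation logic is easy to sketch for a simple difference in means; the helper below is a generic illustration (hypothetical names and data), not the paper's RDD-specific procedure:

```python
import random

def permutation_pvalue(y, t, reps=2000, seed=0):
    """Fisher randomization test of the sharp null of no effect for any unit:
    re-randomize treatment labels and recompute the difference in means."""
    rng = random.Random(seed)

    def diff_in_means(labels):
        y1 = [yi for yi, ti in zip(y, labels) if ti == 1]
        y0 = [yi for yi, ti in zip(y, labels) if ti == 0]
        return sum(y1) / len(y1) - sum(y0) / len(y0)

    t_obs = abs(diff_in_means(t))
    labels = list(t)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(labels)
        if abs(diff_in_means(labels)) >= t_obs:
            extreme += 1
    return (extreme + 1) / (reps + 1)
```

In the local-randomization RDD version, y and t would be restricted to observations in a small window around the cutoff, where treatment is treated as if randomly assigned.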

  89. Cattaneo, M. D., Titiunik, R., & Vazquez-Bare, G. (2019). Power Calculations for Regression-Discontinuity Designs. Stata Journal, 19(1), 210–245.

    Annotation

    Provides methods and software for power calculations in RDD, essential for study design and determining adequate sample sizes near the cutoff. The associated rdsampsi command enables researchers to plan appropriately powered RDD studies before data collection.

  90. Cattaneo, M. D., Idrobo, N., & Titiunik, R. (2020). A Practical Introduction to Regression Discontinuity Designs: Foundations. Cambridge University Press.

    https://doi.org/10.1017/9781108684606

    Annotation

    A practical and accessible guide to implementing regression discontinuity designs, covering both sharp and fuzzy cases with worked examples and code. Part of the Cambridge Elements series, it provides step-by-step guidance on bandwidth selection, estimation, and inference using the rdrobust toolkit.

  91. Cattaneo, M. D., & Titiunik, R. (2022). Regression Discontinuity Designs. Annual Review of Economics, 14, 821–851.

    https://doi.org/10.1146/annurev-economics-051520-021409

    Annotation

    A recent survey covering the state of the art in RDD methodology, including extensions to fuzzy designs, geographic RDD, and multi-cutoff designs. Provides guidance on current recommended practices and is an excellent entry point to the modern RDD literature.

  92. Cattaneo, M. D., Idrobo, N., & Titiunik, R. (2024). A Practical Introduction to Regression Discontinuity Designs: Extensions. Cambridge University Press.

    https://doi.org/10.1017/9781009441896

    FoundationalCited on: regression discontinuity sharp
    Annotation

    This follow-up volume to Cattaneo, Idrobo, and Titiunik's first book covers extensions of the regression discontinuity framework, including multi-score designs, geographic RDD, kink designs, and discrete running variables. It provides practical guidance and software implementations for these more advanced settings, making it an essential companion for applied researchers going beyond the standard sharp RDD.

  93. Chamberlain, G. (1980). Analysis of Covariance with Qualitative Data. Review of Economic Studies, 47(1), 225–238.

    https://doi.org/10.2307/2297110

    FoundationalCited on: fixed effects, logit probit
    Annotation

    Chamberlain extended the fixed effects approach to nonlinear models like logit, showing how to condition out the fixed effects in discrete choice settings. This work is fundamental for researchers who need fixed effects in models where the dependent variable is binary or categorical.

  94. Chatterji, A. K., Findley, M., Jensen, N. M., Meier, S., & Nielson, D. (2016). Field Experiments in Strategy Research. Strategic Management Journal, 37(1), 116–132.

    https://doi.org/10.1002/smj.2449

    ApplicationManagement journalCited on: experimental design
    Annotation

    This paper makes the case for using field experiments in strategy research and provides practical guidance for doing so. It discusses internal validity, external validity, and ethical considerations specific to strategy scholars.

  95. Chava, S., & Roberts, M. R. (2008). How Does Financing Impact Investment? The Role of Debt Covenants. Journal of Finance, 63(5), 2085–2121.

    https://doi.org/10.1111/j.1540-6261.2008.01391.x

    ApplicationCited on: regression discontinuity sharp
    Annotation

    Chava and Roberts used an RDD around debt covenant thresholds to study how covenant violations affect firm investment. This paper is an important early application of RDD in corporate finance, where accounting-based thresholds create natural discontinuities.

  96. Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., & Robins, J. (2018). Double/Debiased Machine Learning for Treatment and Structural Parameters. Econometrics Journal, 21(1), C1–C68.

    https://doi.org/10.1111/ectj.12097

    FoundationalCited on: double debiased machine learning
    Annotation

    The foundational paper introducing double/debiased machine learning (DML). Chernozhukov and colleagues showed how to combine Neyman orthogonality with cross-fitting to obtain root-n consistent and asymptotically normal estimates of low-dimensional causal parameters while using high-dimensional machine learning for nuisance functions.

  97. Chernozhukov, V., Wüthrich, K., & Zhu, Y. (2021). An Exact and Robust Conformal Inference Method for Counterfactual and Synthetic Controls. Journal of the American Statistical Association, 116(536), 1849–1864.

    https://doi.org/10.1080/01621459.2021.1920957

    FoundationalCited on: synthetic control
    Annotation

    Chernozhukov, Wüthrich, and Zhu developed a conformal inference method for synthetic control that provides exact, finite-sample valid p-values and confidence intervals without requiring a large number of control units. This approach offers a modern, robust alternative to placebo-based inference for counterfactual and synthetic control estimators.

  98. Chernozhukov, V., Hausman, J. A., & Newey, W. K. (2022). Locally Robust Semiparametric Estimation. Econometrica, 90(4), 1501–1535.

    https://doi.org/10.3982/ECTA16294

    Annotation

    Chernozhukov, Hausman, and Newey developed locally robust semiparametric estimators that extend the DML framework, demonstrating how automatic debiasing with machine learning first-stage estimates can be applied broadly. Their approach yields root-n consistent estimates of causal and structural parameters even when nuisance functions are estimated with regularized machine learning methods.

  99. Chetty, R., Friedman, J. N., & Rockoff, J. E. (2014). Measuring the Impacts of Teachers I: Evaluating Bias in Teacher Value-Added Estimates. American Economic Review, 104(9), 2593–2632.

    https://doi.org/10.1257/aer.104.9.2593

    ApplicationCited on: fixed effects
    Annotation

    This paper used teacher fixed effects (value-added models) and quasi-experimental validation to measure individual teachers' causal impacts on student outcomes. It demonstrated that teacher fixed effects capture real causal effects, not just selection, and has influenced education policy worldwide.

  100. Choudhury, P., Allen, R. T., & Endres, M. G. (2021). Machine Learning for Pattern Discovery in Management Research. Strategic Management Journal, 42(1), 30–57.

    https://doi.org/10.1002/smj.3215

    ApplicationManagement journalCited on: causal forests
    Annotation

    Choudhury, Allen, and Endres discussed how machine learning methods including causal forests can be used for pattern discovery in management research. They provided guidance on when tree-based methods for heterogeneous treatment effects are appropriate for strategy and organizational questions.

  101. Christensen, G., & Miguel, E. (2018). Transparency, Reproducibility, and the Credibility of Economics Research. Journal of Economic Literature, 56(3), 920–980.

    https://doi.org/10.1257/jel.20171350

    ApplicationCited on: pre registration
    Annotation

    Christensen and Miguel surveyed the transparency and reproducibility landscape in economics, documenting the growing adoption of pre-registration through the AEA RCT Registry and other platforms. They presented evidence on the prevalence of specification searching and publication bias, and made the case that pre-registration combined with pre-analysis plans substantially improves the credibility of empirical findings.

  102. Cinelli, C., & Hazlett, C. (2020). Making Sense of Sensitivity: Extending Omitted Variable Bias. Journal of the Royal Statistical Society: Series B, 82(1), 39–67.

    https://doi.org/10.1111/rssb.12348

    FoundationalCited on: sensitivity analysis
    Annotation

    Cinelli and Hazlett developed a modern framework for sensitivity analysis based on partial R-squared measures, extending the omitted variable bias formula. Their approach allows researchers to benchmark the strength of hypothetical confounders against observed covariates, making sensitivity analysis more interpretable.

  103. Cinelli, C., Ferwerda, J., & Hazlett, C. (2022). Sensemakr: Sensitivity Analysis Tools for OLS in R and Stata. Journal of Statistical Software, 104(11), 1–33.

    https://doi.org/10.18637/jss.v104.i11

    Annotation

    Cinelli, Ferwerda, and Hazlett developed the sensemakr R and Stata package implementing their partial R-squared sensitivity analysis framework. They demonstrate the tool with applications to studies of violence and political attitudes, showing how researchers can benchmark potential confounders against observed covariates to assess the robustness of causal claims from observational data.

  104. Cinelli, C., & Hazlett, C. (2022). An Omitted Variable Bias Framework for Sensitivity Analysis of Instrumental Variables. Working Paper.

    ApplicationCited on: sensitivity analysis
    Annotation

    Cinelli and Hazlett extended their OLS sensitivity framework to instrumental variables settings, showing how to assess the robustness of IV estimates to violations of the exclusion restriction. They derived bounds on IV bias as a function of the partial R-squared of a hypothetical confounder with both the instrument and the outcome, providing practical tools for benchmarking the plausibility of IV assumptions.

  105. Clark, T. S., & Linzer, D. A. (2015). Should I Use Fixed or Random Effects?. Political Science Research and Methods, 3(2), 399–408.

    https://doi.org/10.1017/psrm.2014.32

    SurveyCited on: random effects
    Annotation

    Clark and Linzer provided practical guidance on choosing between fixed and random effects, arguing the decision depends on the research question, sample size, and the degree of correlation between unit effects and covariates rather than simply defaulting to one approach.

  106. Clarke, D., Romano, J. P., & Wolf, M. (2020). The Romano-Wolf Multiple-Hypothesis Correction in Stata. Stata Journal, 20(4), 812–843.

    https://doi.org/10.1177/1536867X20976314

    ApplicationCited on: multiple testing
    Annotation

    Clarke, Romano, and Wolf developed a Stata implementation of the Romano-Wolf stepwise multiple testing correction, providing applied researchers with an accessible tool for controlling the family-wise error rate while accounting for the dependence structure among test statistics.

  107. Clarke, D., Pailanir, D., Romano, J. P., & Wolf, M. (2023). Multi-Cutoff RD Designs with Synthetic Controls. Working Paper.

    Annotation

    Extended SDID methods to regression discontinuity settings with multiple cutoffs, demonstrating the flexibility of synthetic control ideas beyond the standard panel data framework.

  108. Clarke, D., Pailanir, D., Athey, S., & Imbens, G. (2024). On Synthetic Difference-in-Differences and Related Estimation Methods in Stata. Stata Journal, 24(4), 557–598.

    https://doi.org/10.1177/1536867X241297914

    Annotation

    Clarke and colleagues developed the sdid Stata package for implementing synthetic DID, providing detailed documentation and empirical examples. This paper makes the method accessible to applied researchers and demonstrates implementation with real policy evaluation data.

  109. Coffman, L. C., & Niederle, M. (2015). Pre-Analysis Plans Have Limited Upside, Especially Where Replications Are Feasible. Journal of Economic Perspectives, 29(3), 81–98.

    https://doi.org/10.1257/jep.29.3.81

    ApplicationCited on: pre registration
    Annotation

    Coffman and Niederle offered a skeptical perspective on pre-analysis plans, arguing that their benefits are limited when replication is feasible and that rigid adherence to pre-specified analyses can prevent researchers from learning from the data. This paper provides important counterarguments in the pre-registration debate.

  110. Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences. Lawrence Erlbaum Associates.

    https://doi.org/10.4324/9780203771587

    FoundationalCited on: power analysis
    Annotation

    Cohen's foundational textbook introduced the concepts of effect size, statistical power, and sample size determination that became standard in the behavioral sciences. He provided power tables and conventions for small, medium, and large effect sizes that remain widely used across disciplines.
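Cohen's framework translates directly into a power calculation; here is a normal-approximation sketch for a two-sided, two-sample comparison of means (the function name is mine, and the approximation ignores the t-distribution's finite-sample correction):

```python
from math import sqrt
from statistics import NormalDist

def power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample test with standardized
    effect size d (Cohen's d) and n_per_group observations per arm."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    ncp = d * sqrt(n_per_group / 2)   # noncentrality parameter
    return (1 - nd.cdf(z_crit - ncp)) + nd.cdf(-z_crit - ncp)

# Cohen's 'medium' effect (d = 0.5) needs roughly 64 units per group
# for about 80% power at the conventional 5% level.
print(round(power_two_sample(0.5, 64), 2))  # → 0.81
```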

  111. Cohn, J. B., Liu, Z., & Wardlaw, M. I. (2022). Count (and Count-Like) Data in Finance. Journal of Financial Economics.

    https://doi.org/10.1016/j.jfineco.2021.08.004

    Annotation

    Practical guide on when and how to use Poisson regression in finance research. Shows that many finance variables (patents, citations, analyst coverage) are count data and that OLS/log-OLS can give misleading results.

  112. Conley, T. G., Hansen, C. B., & Rossi, P. E. (2012). Plausibly Exogenous. Review of Economics and Statistics, 94(1), 260–272.

    https://doi.org/10.1162/REST_a_00139

    MethodologicalCited on: instrumental variables
    Annotation

    Develops methods for IV estimation when the exclusion restriction may be only approximately valid.

  113. Correia, S. (2016). Linear Models with Multi-Way Fixed Effects and Clustered Standard Errors. Working Paper.

    FoundationalCited on: fixed effects
    Annotation

    Introduces the reghdfe Stata command for fast estimation of linear models with multiple levels of fixed effects, now the standard tool for applied researchers working with high-dimensional fixed effects in panel data.

  114. Correia, S., Guimarães, P., & Zylkin, T. (2020). Fast Poisson Estimation with High-Dimensional Fixed Effects. Stata Journal, 20(1), 95–115.

    https://doi.org/10.1177/1536867X20909691

    FoundationalCited on: poisson negative binomial
    Annotation

    Introduces the ppmlhdfe Stata command for fast Poisson estimation with multiple levels of fixed effects, making PPML feasible for large datasets with high-dimensional fixed effects. This tool has become standard for applied researchers working with count data in panel settings.

  115. Crépon, B., Duflo, E., Gurgand, M., Rathelot, R., & Zamora, P. (2013). Do Labor Market Policies Have Displacement Effects? Evidence from a Clustered Randomized Experiment. Quarterly Journal of Economics, 128(2), 531–580.

    https://doi.org/10.1093/qje/qjt001

    ApplicationCited on: lee bounds
    Annotation

    Crépon and colleagues evaluated a job placement assistance program in France using a large-scale clustered RCT and applied Lee bounds to address differential attrition from the sample. The paper demonstrates best-practice use of Lee bounds in a labor economics setting, showing that the program's employment effects remain robust to bounding even under worst-case attrition assumptions.

  116. Cunat, V., Gine, M., & Guadalupe, M. (2012). The Vote Is Cast: The Effect of Corporate Governance on Shareholder Value. Journal of Finance, 67(5), 1943–1977.

    https://doi.org/10.1111/j.1540-6261.2012.01776.x

    Annotation

    Cunat, Gine, and Guadalupe used a fuzzy RDD around the majority threshold in shareholder governance proposals to estimate the causal effect of governance provisions on firm value. This paper is a leading example of fuzzy RDD applied to corporate governance and finance.

  117. Cunningham, S., & Shah, M. (2018). Decriminalizing Indoor Prostitution: Implications for Sexual Violence and Public Health. Review of Economic Studies, 85(3), 1683–1715.

    https://doi.org/10.1093/restud/rdx065

    ApplicationCited on: synthetic control
    Annotation

    Cunningham and Shah used the synthetic control method to study how Rhode Island's accidental decriminalization of indoor prostitution affected sex crimes and STI rates. This study is a well-known application that illustrates how synthetic control can exploit a unique policy change affecting a single unit.

  118. Cunningham, S. (2021). Causal Inference: The Mixtape. Yale University Press.

    Annotation

    An accessible textbook with an excellent DiD chapter that walks through the intuition, the math, and the code (in Stata and R). Freely available online at mixtape.scunning.com, it is a valuable companion for students who want worked examples alongside formal treatment.

  119. Davis, J., & Heller, S. B. (2017). Using Causal Forests to Predict Treatment Heterogeneity: An Application to Summer Jobs. American Economic Review: Papers & Proceedings, 107(5), 546–550.

    https://doi.org/10.1257/aer.p20171000

    ApplicationCited on: causal forests
    Annotation

    Davis and Heller applied causal forests to a randomized summer jobs program for disadvantaged youth in Chicago, demonstrating how the method can identify which subpopulations benefit most from a policy intervention. This paper is an accessible applied introduction to causal forests.

  120. de Chaisemartin, C., & D'Haultfoeuille, X. (2020). Two-Way Fixed Effects Estimators with Heterogeneous Treatment Effects. American Economic Review, 110(9), 2964–2996.

    https://doi.org/10.1257/aer.20181169

    Annotation

    De Chaisemartin and D'Haultfoeuille showed that the TWFE estimator can assign negative weights to some treatment effects, potentially producing estimates with the wrong sign. They proposed an alternative estimator and a diagnostic test for the presence of negative weights.

  121. de Chaisemartin, C., & D'Haultfoeuille, X. (2023). Two-Way Fixed Effects and Differences-in-Differences with Heterogeneous Treatment Effects: A Survey. Econometrics Journal, 26(3), C1–C30.

    https://doi.org/10.1093/ectj/utac017

    SurveyCited on: event studies
    Annotation

    De Chaisemartin and D'Haultfoeuille provided a comprehensive survey of the recent literature on problems with two-way fixed effects estimators under heterogeneous treatment effects, covering the key diagnostic tests, alternative estimators, and practical guidance for applied researchers working with event-study and difference-in-differences designs.

  122. Dehejia, R. H., & Wahba, S. (1999). Causal Effects in Nonexperimental Studies: Reevaluating the Evaluation of Training Programs. Journal of the American Statistical Association, 94(448), 1053–1062.

    https://doi.org/10.1080/01621459.1999.10473858

    ApplicationCited on: matching methods
    Annotation

    Dehejia and Wahba showed that propensity score matching could replicate experimental estimates of a job training program using observational data. This influential paper demonstrated the practical value of matching and made propensity score methods mainstream in applied social science.

  123. Dell, M. (2010). The Persistent Effects of Peru's Mining Mita. Econometrica, 78(6), 1863–1903.

    https://doi.org/10.3982/ECTA8121

    ApplicationCited on: regression discontinuity sharp
    Annotation

    Uses a geographic RDD exploiting the historical boundary of the mita forced labor system in Peru to estimate the persistent effect of colonial institutions on economic outcomes centuries later. Demonstrates how RDD can exploit spatial discontinuities, not just score-based cutoffs.

  124. Deshpande, M., & Li, Y. (2019). Who Is Screened Out? Application Costs and the Targeting of Disability Programs. American Economic Journal: Economic Policy, 11(4), 213–248.

    https://doi.org/10.1257/pol.20180076

    Annotation

    Deshpande and Li used staggered closings of Social Security field offices across the United States to estimate the effects of application costs on disability program participation. The staggered timing of office closures across locations provides a natural setting for modern staggered DID methods, and the paper demonstrates how treatment-timing variation can be leveraged for credible policy evaluation.

  125. Dong, Y., & Lewbel, A. (2015). Identifying the Effect of Changing the Policy Threshold in Regression Discontinuity Models. Review of Economics and Statistics, 97(5), 1081–1092.

    https://doi.org/10.1162/REST_a_00510

    FoundationalCited on: regression discontinuity fuzzy
    Annotation

    Dong and Lewbel extended fuzzy RDD by showing how to identify the effect of changing the policy threshold, not just the effect of treatment at the existing cutoff. This approach allows researchers to evaluate counterfactual policies that shift the eligibility boundary, broadening the policy relevance of fuzzy RDD estimates.

  126. Doudchenko, N., & Imbens, G. W. (2016). Balancing, Regression, Difference-in-Differences and Synthetic Control Methods: A Synthesis. NBER Working Paper No. 22791.

    https://doi.org/10.3386/w22791

    FoundationalCited on: synthetic control
    Annotation

    Doudchenko and Imbens placed synthetic control within a broader framework that includes DID and regression as special cases, proposing extensions that relax the non-negativity and adding-up constraints on weights. This paper helps researchers understand the connections between synthetic control and other methods.

  127. Dranove, D., & Olsen, C. (1994). The Economic Side Effects of Dangerous Drug Announcements. Journal of Law and Economics, 37(2), 323–348.

    https://doi.org/10.1086/467316

    ApplicationCited on: event studies
    Annotation

    Dranove and Olsen used event studies to measure the stock market impact of FDA drug safety announcements on pharmaceutical firms. This application demonstrated how event studies can quantify the financial consequences of regulatory actions in health care and management contexts.

  128. Dube, A., Girardi, D., Jorda, O., & Taylor, A. M. (2023). A Local Projections Approach to Difference-in-Differences Event Studies. NBER Working Paper No. 31184.

    https://doi.org/10.3386/w31184

    Annotation

    Dube and colleagues connected local projections to DID event studies and demonstrated how synthetic DID-type weighting can improve estimation of dynamic treatment effects. This paper shows the broader applicability of the synthetic DID idea beyond the original static setting.

  129. Duflo, E. (2001). Schooling and Labor Market Consequences of School Construction in Indonesia: Evidence from an Unusual Policy Experiment. American Economic Review, 91(4), 795–813.

    https://doi.org/10.1257/aer.91.4.795

    ApplicationCited on: difference in differences
    Annotation

    Uses DiD comparing cohorts exposed to a massive school construction program in Indonesia to older cohorts not exposed, across regions with different program intensity. A beautifully clean application showing how DiD can exploit variation in treatment intensity across space and cohorts.

  130. Duflo, E., Glennerster, R., & Kremer, M. (2007). Using Randomization in Development Economics Research: A Toolkit. Handbook of Development Economics, 4, 3895–3962.

    https://doi.org/10.1016/S1573-4471(07)04061-2

    Annotation

    Duflo, Glennerster, and Kremer wrote the definitive practical guide to running randomized experiments in development economics. The chapter covers all stages from design to analysis, including power calculations, stratification, dealing with attrition, and estimating treatment effects with imperfect compliance. It has become required reading for anyone designing a field experiment.

  131. Dunning, T. (2012). Natural Experiments in the Social Sciences: A Design-Based Approach. Cambridge University Press.

    https://doi.org/10.1017/CBO9781139084444

    FoundationalCited on: experimental design
    Annotation

    Dunning provided a systematic framework for identifying and analyzing natural experiments across the social sciences. The book covers as-if random assignment, instrumental variables, regression discontinuity, and difference-in-differences through a unified design-based lens, making it essential reading for researchers exploiting natural variation for causal inference.

  132. Fama, E. F., Fisher, L., Jensen, M. C., & Roll, R. (1969). The Adjustment of Stock Prices to New Information. International Economic Review, 10(1), 1–21.

    https://doi.org/10.2307/2525569

    FoundationalCited on: event studies
    Annotation

    This paper is the origin of the modern event study methodology in finance. Fama, Fisher, Jensen, and Roll studied how stock prices adjust to stock splits and established the basic framework of measuring abnormal returns around corporate events that has been used in thousands of subsequent studies.

  133. Fan, Q., Hsu, Y.-C., Lieli, R. P., & Zhang, Y. (2022). Estimation of Conditional Average Treatment Effects with High-Dimensional Data. Journal of Business & Economic Statistics, 40(1), 313–327.

    https://doi.org/10.1080/07350015.2020.1811102

    Annotation

    Fan and colleagues developed methods for estimating CATEs using DML-type approaches in high-dimensional settings with applications to economics and business research. They showed how doubly robust estimation combined with machine learning can uncover meaningful treatment effect heterogeneity.

  134. Ferman, B., & Pinto, C. (2021). Synthetic Controls with Imperfect Pre-Treatment Fit. Quantitative Economics.

    https://doi.org/10.3982/QE1596

    FoundationalCited on: synthetic control
    Annotation

    Analyzed the consequences of imperfect pre-treatment fit in synthetic control, showing that poor fit can lead to biased estimates. Provided conditions under which the method remains valid despite imperfect fit.

  135. Finkelstein, A., Taubman, S., Wright, B., Bernstein, M., Gruber, J., Newhouse, J. P., Allen, H., & Baicker, K. (2012). The Oregon Health Insurance Experiment: Evidence from the First Year. Quarterly Journal of Economics, 127(3), 1057–1106.

    https://doi.org/10.1093/qje/qjs020

    ApplicationCited on: experimental design
    Annotation

    The Oregon Health Insurance Experiment is a masterclass in analyzing a lottery-based experiment with non-compliance, using IV to estimate the local average treatment effect of Medicaid coverage.

  136. Firpo, S., & Possebom, V. (2018). Synthetic Control Method: Inference, Sensitivity Analysis and Confidence Sets. Journal of Causal Inference, 6(2).

    https://doi.org/10.1515/jci-2016-0026

    FoundationalCited on: synthetic control
    Annotation

    Firpo and Possebom developed formal inference procedures for the synthetic control method, including sensitivity analysis tools and confidence sets. Their framework provides a more rigorous basis for statistical inference in synthetic control applications beyond the standard permutation-based placebo tests.

  137. Fisher, R. A. (1935). The Design of Experiments. Oliver and Boyd.

    Annotation

    Fisher's classic book laid the foundations of experimental design, introducing concepts like randomization, blocking, and factorial designs. The 'lady tasting tea' example from this book remains one of the most famous illustrations of hypothesis testing and the logic of controlled experiments.

  138. Flammer, C. (2015). Does Corporate Social Responsibility Lead to Superior Financial Performance? A Regression Discontinuity Approach. Management Science, 61(11), 2549–2568.

    https://doi.org/10.1287/mnsc.2014.2038

    Annotation

    Although primarily an RDD paper, Flammer also used DID-style before-after comparisons around shareholder proposals on CSR. Published in Management Science, it is a prominent example of quasi-experimental methods in top management journals.

  139. Fleming, L., & Sorenson, O. (2001). Technology as a Complex Adaptive System: Evidence from Patent Data. Research Policy, 30(7), 1019–1039.

    https://doi.org/10.1016/S0048-7333(00)00135-9

    ApplicationCited on: poisson negative binomial
    Annotation

    Fleming and Sorenson used negative binomial regression on patent citation counts to study how the complexity of technological combinations affects the usefulness of inventions. This paper is a prominent application of count models in the innovation and technology management literature.

  140. Frank, K. A. (2000). Impact of a Confounding Variable on a Regression Coefficient. Sociological Methods & Research, 29(2), 147–194.

    https://doi.org/10.1177/0049124100029002001

    ApplicationCited on: sensitivity analysis
    Annotation

    Frank developed the impact threshold for a confounding variable (ITCV), which calculates how much bias an omitted variable would need to introduce to invalidate an inference. This approach has been widely adopted in education and management research.

  141. Freeman, S., Eddy, S. L., McDonough, M., Smith, M. K., Okoroafor, N., Jordt, H., & Wenderoth, M. P. (2014). Active Learning Increases Student Performance in Science, Engineering, and Mathematics. Proceedings of the National Academy of Sciences, 111(23), 8410–8415.

    https://doi.org/10.1073/pnas.1319030111

    FoundationalCited on: about
    Annotation

    A landmark meta-analysis of 225 studies showing that active learning raises examination performance by 0.47 standard deviations in STEM courses, and that failure rates under traditional lecturing are 55% higher than under active learning.

  142. Fremeth, A. R., Richter, B. K., & Schotter, A. (2016). Eliciting Cooperation: A Principal Agent Experiment. Management Science, 62(10), 2914–2936.

    https://doi.org/10.1287/mnsc.2015.2278

    ApplicationManagement journalCited on: synthetic control
    Annotation

    While primarily an experimental study, Fremeth and colleagues' work illustrates the growing adoption of quasi-experimental causal inference methods in management research. Synthetic control is increasingly used in management to study the effect of regulations, leadership changes, and other shocks on individual firms. [UNVERIFIED: This specific paper may not use synthetic control; it is included as an example of causal methods in management journals.]

  143. Freyaldenhoven, S., Hansen, C., & Shapiro, J. M. (2019). Pre-Event Trends in the Panel Event-Study Design. American Economic Review, 109(9), 3307–3338.

    https://doi.org/10.1257/aer.20180609

    FoundationalCited on: event studies
    Annotation

    Freyaldenhoven, Hansen, and Shapiro developed diagnostic tools and an instrumental-variables-based estimator for panel event-study designs when pre-event trends may be present. Their framework helps researchers distinguish true anticipation effects from confounding trends, addressing a central challenge in event-study credibility.

  144. Frisch, R., & Waugh, F. V. (1933). Partial Time Regressions as Compared with Individual Trends. Econometrica, 1(4), 387–401.

    https://doi.org/10.2307/1907330

    FoundationalCited on: ols regression
    Annotation

    The original result establishing that a coefficient in a multiple regression can be obtained by first residualizing both the outcome and the regressor against all other covariates. The Frisch-Waugh-Lovell (FWL) theorem provides the theoretical foundation for understanding what 'controlling for' means in multiple regression and is the basis for modern fixed-effects estimation.
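    The theorem is easy to verify numerically. A minimal pure-Python sketch with one additional covariate, using simulated data and the two-regressor closed-form OLS coefficient (illustrative only):

```python
import random

random.seed(0)
n = 1000
w = [random.gauss(0, 1) for _ in range(n)]                              # covariate
x = [0.6 * wi + random.gauss(0, 1) for wi in w]                         # regressor of interest
y = [2.0 * xi + 1.5 * wi + random.gauss(0, 1) for xi, wi in zip(x, w)]  # outcome

def demean(v):
    m = sum(v) / len(v)
    return [vi - m for vi in v]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

xd, yd, wd = demean(x), demean(y), demean(w)  # demeaning handles the intercept

# Coefficient on x in the multiple regression of y on (x, w), closed form
Sxx, Sww, Sxw = dot(xd, xd), dot(wd, wd), dot(xd, wd)
Sxy, Swy = dot(xd, yd), dot(wd, yd)
beta_multiple = (Sww * Sxy - Sxw * Swy) / (Sxx * Sww - Sxw ** 2)

# FWL: residualize y and x on w, then run the simple regression of residuals
rx = [xi - (Sxw / Sww) * wi for xi, wi in zip(xd, wd)]
ry = [yi - (Swy / Sww) * wi for yi, wi in zip(yd, wd)]
beta_fwl = dot(rx, ry) / dot(rx, rx)
```

    The two coefficients agree exactly (up to floating-point error), which is the content of the theorem: "controlling for" w is the same as regressing on what is left of x after w is partialled out.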

  145. Funk, M. J., Westreich, D., Wiesen, C., Sturmer, T., Brookhart, M. A., & Davidian, M. (2011). Doubly Robust Estimation of Causal Effects. American Journal of Epidemiology, 173(7), 761–767.

    https://doi.org/10.1093/aje/kwq439

    ApplicationCited on: doubly robust estimation
    Annotation

    Funk and colleagues provided a practical tutorial on doubly robust estimation for epidemiologists, demonstrating through a worked example how the AIPW estimator protects against misspecification of either the outcome model or the propensity score model. This paper helped spread the method in health sciences.

  146. Gelman, A., & Loken, E. (2013). The Garden of Forking Paths: Why Multiple Comparisons Can Be a Problem, Even When There Is No 'Fishing Expedition' or 'p-Hacking' and the Research Hypothesis Was Posited Ahead of Time. Unpublished manuscript, Columbia University.

    FoundationalCited on: pre registration
    Annotation

    Gelman and Loken argued that even without deliberate p-hacking, the multitude of defensible analytical choices creates a 'garden of forking paths' that inflates false-positive rates. This influential working paper provided a key intellectual motivation for pre-registration by showing that researcher degrees of freedom are unavoidable without pre-commitment.

  147. Gelman, A., & Carlin, J. (2014). Beyond Power Calculations: Assessing Type S (Sign) and Type M (Magnitude) Errors. Perspectives on Psychological Science, 9(6), 641–651.

    https://doi.org/10.1177/1745691614551642

    ApplicationCited on: power analysis
    Annotation

    Gelman and Carlin extended traditional power analysis by introducing Type S (sign) errors (the probability a significant estimate has the wrong sign) and Type M (magnitude) errors (the expected exaggeration ratio of significant estimates). These concepts provide a richer understanding of what happens in underpowered studies.
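    Both quantities are easy to compute by simulation. A sketch under assumed numbers (true effect 0.2, standard error 0.5, so the hypothetical study is badly underpowered):

```python
import random

random.seed(2)
true_effect, se, z_crit = 0.2, 0.5, 1.96   # assumed: effect is only 0.4 standard errors
estimates = [random.gauss(true_effect, se) for _ in range(100_000)]
significant = [b for b in estimates if abs(b) / se > z_crit]

power = len(significant) / len(estimates)
# Type S error: share of significant estimates with the wrong sign
type_s = sum(1 for b in significant if b < 0) / len(significant)
# Type M error: average exaggeration ratio among significant estimates
type_m = sum(abs(b) for b in significant) / len(significant) / true_effect
```

    At these assumed values, power is under 10%, roughly one in ten significant results has the wrong sign, and significant estimates overstate the true effect several-fold, which is the paper's central warning about underpowered designs.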

  148. Gelman, A., & Loken, E. (2014). The Statistical Crisis in Science. American Scientist, 102(6), 460.

    https://doi.org/10.1511/2014.111.460

    FoundationalCited on: pre registration
    Annotation

    Gelman and Loken summarized the statistical crisis in science, emphasizing how researcher degrees of freedom and the garden of forking paths lead to unreliable findings. This accessible piece extended their 2013 working paper and reinforced the case for pre-registration as a solution to the replication crisis.

  149. Gelman, A., & Imbens, G. W. (2019). Why High-Order Polynomials Should Not Be Used in Regression Discontinuity Designs. Journal of Business & Economic Statistics, 37(3), 447–456.

    https://doi.org/10.1080/07350015.2017.1366909

    FoundationalCited on: regression discontinuity sharp
    Annotation

    Gelman and Imbens showed that using high-order global polynomials in RDD leads to noisy estimates, sensitivity to the degree of polynomial, and poor coverage of confidence intervals. They recommended local linear or quadratic fits with appropriate bandwidth selection instead, fundamentally changing best practice for RDD estimation.

  150. Gerard, F., Rokkanen, M., & Rothe, C. (2020). Bounds on Treatment Effects in Regression Discontinuity Designs with a Manipulated Running Variable. Quantitative Economics, 11(3), 839–870.

    https://doi.org/10.3982/QE1079

    FoundationalCited on: lee bounds
    Annotation

    Gerard, Rokkanen, and Rothe extended Lee-type bounding methods to regression discontinuity designs where the running variable is subject to manipulation. They showed how to construct bounds on treatment effects that account for strategic sorting around the cutoff.

  151. Gerber, A. S., & Green, D. P. (2012). Field Experiments: Design, Analysis, and Interpretation. W. W. Norton.

    SurveyCited on: experimental design
    Annotation

    A comprehensive textbook on field experiments covering randomization, blocking, clustering, non-compliance, and attrition, with examples from political science and public policy.

  152. Glynn, A. N., & Quinn, K. M. (2010). An Introduction to the Augmented Inverse Propensity Weighted Estimator. Political Analysis, 18(1), 36–56.

    https://doi.org/10.1093/pan/mpp036

    ApplicationCited on: doubly robust estimation
    Annotation

    Glynn and Quinn introduced the AIPW estimator to political scientists, providing intuition, simulation evidence, and practical guidance. This tutorial demonstrated the advantages of doubly robust methods over propensity score weighting or outcome regression alone in social science applications.

  153. Gobillon, L., & Magnac, T. (2016). Regional Policy Evaluation: Interactive Fixed Effects and Synthetic Controls. Review of Economics and Statistics, 98(3), 535–551.

    https://doi.org/10.1162/REST_a_00537

    ApplicationCited on: synthetic control
    Annotation

    Gobillon and Magnac connected synthetic control to interactive fixed-effects models, showing that synthetic control can be interpreted as an estimator that allows for time-varying factor loadings. This paper bridges the synthetic control and factor model literatures.

  154. Goldfarb, B., & King, A. A. (2016). Scientific Apophenia in Strategic Management Research: Significance Tests & Mistaken Inference. Strategic Management Journal, 37(1), 167–176.

    https://doi.org/10.1002/smj.2459

    ApplicationManagement journalCited on: specification curve
    Annotation

    Goldfarb and King documented the problem of apophenia (finding patterns in noise) in strategic management research, driven partly by selective reporting of favorable specifications. They argued for multiverse-style robustness checks, making the case for specification curve analysis in management.

  155. Goldsmith-Pinkham, P., Sorkin, I., & Swift, H. (2020). Bartik Instruments: What, When, Why, and How. American Economic Review, 110(8), 2586–2624.

    https://doi.org/10.1257/aer.20181047

    FoundationalCited on: shift share instruments
    Annotation

    This paper provided the first rigorous econometric framework for shift-share instruments, showing that the Bartik instrument can be decomposed into a weighted sum of individual share-based instruments. They clarified that identification requires exogeneity of the initial shares, not the shocks.

  156. Goodman-Bacon, A. (2021). Difference-in-Differences with Variation in Treatment Timing. Journal of Econometrics, 225(2), 254–277.

    https://doi.org/10.1016/j.jeconom.2021.03.014

    Annotation

    Goodman-Bacon decomposed the two-way fixed-effects DID estimator into a weighted average of all possible two-group, two-period DID comparisons, revealing that some comparisons use already-treated units as controls. This paper sparked the modern revolution in staggered DID methods by exposing the bias problem.

  157. Grant, A. M. (2008). The Significance of Task Significance: Job Performance Effects, Relational Mechanisms, and Boundary Conditions. Journal of Applied Psychology, 93(1), 108–124.

    https://doi.org/10.1037/0021-9010.93.1.108

    ApplicationCited on: experimental design
    Annotation

    Grant conducted field experiments showing that briefly exposing workers to the beneficiaries of their work significantly increased their motivation and performance. This paper is a well-known example of experimental design applied within organizational behavior research.

  158. Greenland, A., & Loualiche, E. (2024). Financial Implications of Supply Chain Disruptions: Evidence from the Japanese Tsunami. Management Science, 70(5), 2928–2950.

    https://doi.org/10.1287/mnsc.2023.4855

    ApplicationManagement journalCited on: shift share instruments
    Annotation

    Greenland and Loualiche used a shift-share instrument based on pre-existing supplier linkages and the geographic incidence of the 2011 Japanese tsunami to identify the causal effects of supply chain disruptions on U.S. firms' stock returns and real outcomes. The paper illustrates how the Bartik-style approach extends naturally to settings where firm-level exposure shares interact with exogenous shocks, providing a clean identification strategy in management and finance research.

  159. Griliches, Z. (1977). Estimating the Returns to Schooling: Some Econometric Problems. Econometrica, 45(1), 1–22.

    https://doi.org/10.2307/1913285

    ApplicationCited on: ols regression
    Annotation

    Griliches systematically examined the biases in OLS estimates of returns to schooling, including ability bias and measurement error. This paper is a classic illustration of why researchers must think carefully about omitted variables when interpreting OLS coefficients causally.

  160. Griliches, Z. (1990). Patent Statistics as Economic Indicators: A Survey. Journal of Economic Literature, 28(4), 1661–1707.

    ApplicationCited on: poisson negative binomial
    Annotation

    Griliches surveyed the use of patent data as economic indicators, establishing patent counts as a key measure of innovative output. This survey motivated much of the subsequent applied work using Poisson and negative binomial models to study innovation.

  161. Gruber, J. (1994). The Incidence of Mandated Maternity Benefits. American Economic Review, 84(3), 622–641.

    ApplicationCited on: difference in differences
    Annotation

    Gruber used a DID design exploiting variation in state-level mandated maternity benefits to show that the costs of these benefits were shifted to workers in the form of lower wages. This study is a classic example of how DID can exploit policy variation across states and time.

  162. Hahn, J. (1998). On the Role of the Propensity Score in Efficient Semiparametric Estimation of Average Treatment Effects. Econometrica, 66(2), 315–331.

    https://doi.org/10.2307/2998560

    FoundationalCited on: doubly robust estimation
    Annotation

    Hahn derived the semiparametric efficiency bound for estimating average treatment effects and showed that knowledge of the propensity score does not improve the bound, but using estimated propensity scores can achieve efficiency. This paper provided the theoretical foundation for why doubly robust estimators can attain semiparametric efficiency.

  163. Hahn, J., Todd, P., & Van der Klaauw, W. (2001). Identification and Estimation of Treatment Effects with a Regression-Discontinuity Design. Econometrica, 69(1), 201–209.

    https://doi.org/10.1111/1468-0262.00183

    FoundationalCited on: regression discontinuity fuzzy
    Annotation

    This paper provided the formal econometric framework for both sharp and fuzzy regression discontinuity designs. For the fuzzy case, it showed that the treatment effect can be identified as the ratio of the discontinuity in the outcome to the discontinuity in the treatment probability, analogous to a Wald estimator.
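    The Wald-ratio logic can be sketched on simulated data with local linear fits on each side of an assumed cutoff at zero (a toy illustration, not the authors' estimator):

```python
import random

random.seed(1)
# Simulated fuzzy RDD: crossing the cutoff raises treatment probability
# from 0.2 to 0.8; the true treatment effect is 2.0.
data = []
for _ in range(6000):
    r = random.uniform(-1, 1)
    d = 1 if random.random() < (0.8 if r >= 0 else 0.2) else 0
    y = 1.0 * r + 2.0 * d + random.gauss(0, 0.5)
    data.append((r, d, y))

def fit_at_cutoff(points):
    """OLS line through (r, v) pairs, evaluated at r = 0."""
    rs, vs = [p[0] for p in points], [p[1] for p in points]
    rbar, vbar = sum(rs) / len(rs), sum(vs) / len(vs)
    slope = sum((ri - rbar) * (vi - vbar) for ri, vi in zip(rs, vs)) \
        / sum((ri - rbar) ** 2 for ri in rs)
    return vbar - slope * rbar   # intercept = predicted value at the cutoff

h = 0.25  # bandwidth
left = [(r, d, y) for r, d, y in data if -h <= r < 0]
right = [(r, d, y) for r, d, y in data if 0 <= r < h]

jump_y = fit_at_cutoff([(r, y) for r, _, y in right]) - fit_at_cutoff([(r, y) for r, _, y in left])
jump_d = fit_at_cutoff([(r, d) for r, d, _ in right]) - fit_at_cutoff([(r, d) for r, d, _ in left])
late = jump_y / jump_d   # Wald ratio: outcome jump scaled by the compliance jump
```

    Dividing the outcome discontinuity by the treatment-probability discontinuity recovers an estimate close to the true effect of 2.0, exactly the ratio form the paper identifies.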

  164. Hainmueller, J. (2012). Entropy Balancing for Causal Effects: A Multivariate Reweighting Method to Produce Balanced Samples in Observational Studies. Political Analysis, 20(1), 25–46.

    https://doi.org/10.1093/pan/mpr025

    FoundationalCited on: matching methods
    Annotation

    Hainmueller introduced entropy balancing, a reweighting scheme that directly targets covariate balance by finding weights that satisfy pre-specified balance constraints while remaining as close to uniform as possible. Entropy balancing has become a popular alternative to propensity score matching because it achieves exact balance on specified moments by construction.
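    The core mechanism can be sketched for a single moment constraint: entropy-balancing weights take an exponential-tilting form, and the tilt parameter is chosen so the reweighted control mean exactly hits the treated-group target. All numbers below are hypothetical:

```python
import math
import random

random.seed(3)
control_x = [random.gauss(0.0, 1.0) for _ in range(500)]  # control-group covariate
target_mean = 0.5                                         # treated-group mean to match

def weighted_mean(lam):
    """Mean of control_x under weights proportional to exp(lam * x)."""
    ws = [math.exp(lam * x) for x in control_x]
    return sum(w * x for w, x in zip(ws, control_x)) / sum(ws)

# weighted_mean is increasing in lam, so bisect for the tilt parameter
lo, hi = -10.0, 10.0
for _ in range(100):
    mid = (lo + hi) / 2
    if weighted_mean(mid) < target_mean:
        lo = mid
    else:
        hi = mid
lam = (lo + hi) / 2
balanced_mean = weighted_mean(lam)   # matches target_mean by construction
```

    With more moments the scalar tilt becomes a vector solved by convex optimization, but the principle is the same: exact balance on the specified moments with weights as close to uniform as possible.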

  165. Hamilton, B. H., & Nickerson, J. A. (2003). Correcting for Endogeneity in Strategic Management Research. Strategic Organization, 1(1), 51–78.

    https://doi.org/10.1177/1476127003001001218

    ApplicationCited on: ols regression
    Annotation

    This paper warned strategy researchers that naive OLS estimates of the strategy-performance relationship are often biased by endogeneity. It provided an accessible tutorial on the problem and pointed toward solutions like instrumental variables and selection models.

  166. Harrison, G. W., & List, J. A. (2004). Field Experiments. Journal of Economic Literature, 42(4), 1009–1055.

    https://doi.org/10.1257/0022051043004577

    Annotation

    This paper provided an influential taxonomy of field experiments, distinguishing artefactual, framed, and natural field experiments from conventional lab experiments. It helped establish field experiments as a mainstream methodology in economics.

  167. Haushofer, J., & Shapiro, J. (2016). The Short-Term Impact of Unconditional Cash Transfers to the Poor: Experimental Evidence from Kenya. Quarterly Journal of Economics, 131(4), 1973–2042.

    https://doi.org/10.1093/qje/qjw025

    ApplicationCited on: multiple testing
    Annotation

    Haushofer and Shapiro evaluated GiveDirectly's unconditional cash transfer program in Kenya, testing effects across many outcome domains including consumption, assets, food security, health, and psychological well-being. They rigorously applied FDR corrections (Benjamini-Hochberg) across outcome families, providing a model for how to handle multiple testing transparently in large-scale randomized evaluations.
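    The Benjamini-Hochberg step-up rule they applied is simple to state: sort the p-values, find the largest rank k with p_(k) ≤ qk/m, and reject the k smallest. A sketch with hypothetical p-values:

```python
def benjamini_hochberg(pvals, q=0.05):
    """Return indices of hypotheses rejected at FDR level q (BH step-up)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p-value
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= q * rank / m:
            k = rank          # largest rank passing the step-up criterion
    return sorted(order[:k])

# Hypothetical p-values from ten outcome tests
rejected = benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.042,
                               0.06, 0.074, 0.205, 0.212, 0.216])
```

    Here only the two smallest p-values survive, even though five of them fall below the unadjusted 0.05 threshold, which is the discipline FDR control imposes.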

  168. Hausman, J. A. (1978). Specification Tests in Econometrics. Econometrica, 46(6), 1251–1271.

    https://doi.org/10.2307/1913827

    FoundationalCited on: fixed effects, random effects
    Annotation

    Hausman developed a general test for comparing two estimators—one that remains consistent under weaker assumptions (fixed effects) and one that is efficient under stronger assumptions (random effects). The 'Hausman test' for choosing between fixed and random effects is one of the most frequently used specification tests in applied economics.
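    In its standard form for k coefficients, the statistic compares the two estimates directly and is asymptotically chi-squared under the null that random effects is valid:

```latex
H = \left(\hat{\beta}_{FE} - \hat{\beta}_{RE}\right)'
    \left[\widehat{\operatorname{Var}}(\hat{\beta}_{FE}) - \widehat{\operatorname{Var}}(\hat{\beta}_{RE})\right]^{-1}
    \left(\hat{\beta}_{FE} - \hat{\beta}_{RE}\right)
    \;\overset{a}{\sim}\; \chi^2_k
```

    A large H indicates that the two estimates diverge by more than sampling error allows, favoring fixed effects.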

  169. Hausman, J. A., & Taylor, W. E. (1981). Panel Data and Unobservable Individual Effects. Econometrica, 49(6), 1377–1398.

    https://doi.org/10.2307/1911406

    FoundationalCited on: random effects
    Annotation

    Hausman and Taylor developed an instrumental variables estimator for panel data that allows consistent estimation of coefficients on time-invariant variables even when individual effects are correlated with some regressors. The Hausman-Taylor estimator occupies a middle ground between fixed effects (which cannot estimate time-invariant coefficients) and random effects (which requires strict exogeneity).

  170. Hausman, J., Hall, B. H., & Griliches, Z. (1984). Econometric Models for Count Data with an Application to the Patents–R&D Relationship. Econometrica, 52(4), 909–938.

    https://doi.org/10.2307/1911191

    FoundationalCited on: poisson negative binomial
    Annotation

    This paper developed the econometric framework for Poisson and negative binomial regression models applied to count data, using the relationship between R&D spending and patent counts as the motivating application. It established the standard approach for modeling count outcomes in economics.

  171. Hausman, J., & McFadden, D. (1984). Specification Tests for the Multinomial Logit Model. Econometrica, 52(5), 1219–1240.

    https://doi.org/10.2307/1910997

    FoundationalCited on: logit probit
    Annotation

    This paper developed a specification test for the independence of irrelevant alternatives (IIA) assumption in multinomial logit. The test allows researchers to assess whether the logit model's restrictive substitution patterns are appropriate for their data, which is critical for applied work with multiple choice categories.

  172. Haven, T. L., & Van Grootel, L. (2019). Preregistering Qualitative Research. Accountability in Research, 26(3), 229–244.

    https://doi.org/10.1080/08989621.2019.1580147

    SurveyCited on: pre registration
    Annotation

    Haven and Van Grootel explored extending pre-registration to qualitative research, discussing what elements of qualitative studies can and should be pre-registered. This paper broadens the pre-registration conversation beyond quantitative experimental designs.

  173. Heckman, J. J. (1979). Sample Selection Bias as a Specification Error. Econometrica, 47(1), 153–161.

    https://doi.org/10.2307/1912352

    FoundationalCited on: lee bounds
    Annotation

    Heckman showed that sample selection—where the observed sample is not random—leads to omitted variable bias, and proposed a two-step correction using the inverse Mills ratio. This foundational paper on selection bias motivated later nonparametric bounding approaches, including Lee bounds, as alternatives that require weaker distributional assumptions.
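The correction's key ingredient is easy to state in code. Below is an illustrative helper only, not Heckman's full two-step estimator (which also requires a first-stage probit for the selection equation):

```python
from math import erf, exp, pi, sqrt

def norm_pdf(z):
    """Standard normal density phi(z)."""
    return exp(-0.5 * z * z) / sqrt(2 * pi)

def norm_cdf(z):
    """Standard normal CDF Phi(z)."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def inverse_mills(z):
    """Inverse Mills ratio lambda(z) = phi(z) / Phi(z): the correction
    term entered as an extra regressor in the second-stage equation."""
    return norm_pdf(z) / norm_cdf(z)
```

In the two-step procedure, z is the fitted index from a first-stage probit of selection; including the estimated inverse Mills ratio as a regressor in the outcome equation absorbs the selection term.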

  174. Heckman, J. J., Ichimura, H., & Todd, P. E. (1997). Matching as an Econometric Evaluation Estimator: Evidence from Evaluating a Job Training Programme. Review of Economic Studies, 64(4), 605–654.

    https://doi.org/10.2307/2971733

    FoundationalCited on: matching methods
    Annotation

    Heckman, Ichimura, and Todd developed the econometric theory behind matching estimators, including conditions for identification and the importance of common support. They applied these methods to evaluate job training programs and showed when matching works well and when it does not.

  175. Henderson, A. D., Miller, D., & Hambrick, D. C. (2006). How Quickly Do CEOs Become Obsolete? Industry Dynamism, CEO Tenure, and Company Performance. Strategic Management Journal, 27(5), 447–460.

    https://doi.org/10.1002/smj.524

    ApplicationManagement journalCited on: fixed effects
    Annotation

    This strategy paper used firm fixed effects to study how CEO tenure affects performance in dynamic versus stable industries. It found that long CEO tenure becomes a liability in fast-changing industries, demonstrating the value of fixed effects for controlling for stable firm characteristics in strategy research.

  176. Hess, S. (2017). Randomization Inference with Stata: A Guide and Software. Stata Journal, 17(3), 630–651.

    https://doi.org/10.1177/1536867X1701700306

    ApplicationCited on: randomization inference
    Annotation

    Hess developed the ritest Stata command and provided a practical guide to implementing randomization inference. The paper covers standard and clustered randomization designs and demonstrates how to conduct Fisher-style randomization (permutation) tests for a variety of experimental and quasi-experimental settings.
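The core procedure the command automates can be sketched in a few lines: a plain permutation test for a difference in means (an illustrative sketch, not the ritest implementation):

```python
import random

def randomization_pvalue(y_treat, y_control, n_perm=5000, seed=0):
    """Fisher-style randomization test for a difference in means.

    Repeatedly permutes treatment labels and recomputes the statistic;
    the two-sided p-value is the share of permuted statistics at least
    as extreme as the observed one."""
    rng = random.Random(seed)
    pooled = list(y_treat) + list(y_control)
    n_t = len(y_treat)
    obs = sum(y_treat) / n_t - sum(y_control) / len(y_control)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        stat = (sum(pooled[:n_t]) / n_t
                - sum(pooled[n_t:]) / (len(pooled) - n_t))
        if abs(stat) >= abs(obs):
            count += 1
    return count / n_perm
```

Clustered designs permute treatment at the cluster level instead of the unit level, which is the main extension the paper covers.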

  177. Ho, D. E., Imai, K., King, G., & Stuart, E. A. (2007). Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference. Political Analysis, 15(3), 199–236.

    https://doi.org/10.1093/pan/mpl013

    FoundationalCited on: matching methods
    Annotation

    Argues that matching should be used as a preprocessing step before parametric modeling, reducing model dependence and improving robustness of causal estimates. This influential paper reframed matching not as a standalone estimator but as a way to make subsequent parametric analyses less sensitive to specification choices.

  178. Hoenig, J. M., & Heisey, D. M. (2001). The Abuse of Power: The Pervasive Fallacy of Power Calculations for Data Analysis. American Statistician, 55(1), 19–24.

    https://doi.org/10.1198/000313001300339897

    FoundationalCited on: power analysis
    Annotation

    Hoenig and Heisey demonstrated that post hoc (observed) power calculations are fundamentally flawed because they are a monotone function of the p-value and add no information beyond the test result itself. This paper is essential reading for understanding why power analysis must be conducted before data collection.
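The monotone relationship is easy to exhibit: for a two-sided z-test, "observed power" is a deterministic function of the p-value alone (a sketch assuming a normal test statistic):

```python
from scipy.stats import norm

def observed_power(p, alpha=0.05):
    """'Post hoc' power of a two-sided z-test, computed from the p-value
    alone by treating the observed z-statistic as the true effect size --
    which is exactly why it adds nothing beyond p itself."""
    z_obs = norm.ppf(1 - p / 2)       # |z| implied by the two-sided p-value
    z_crit = norm.ppf(1 - alpha / 2)  # rejection threshold
    return norm.sf(z_crit - z_obs) + norm.cdf(-z_crit - z_obs)
```

A corollary Hoenig and Heisey emphasize: when p equals alpha, observed power is approximately 0.5, so "high observed power" is merely a restatement of a small p-value.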

  179. Hoetker, G. (2007). The Use of Logit and Probit Models in Strategic Management Research: Critical Issues. Strategic Management Journal, 28(4), 331–343.

    https://doi.org/10.1002/smj.582

    ApplicationManagement journalCited on: logit probit
    Annotation

    Hoetker reviewed how strategy researchers use logit and probit models and identified common pitfalls, including misinterpretation of coefficients across groups and incorrect use of interaction terms. This paper provided concrete guidance for improving practice in management journals.

  180. Hofmann, D. A. (1997). An Overview of the Logic and Rationale of Hierarchical Linear Models. Journal of Management, 23(6), 723–744.

    https://doi.org/10.1177/014920639702300602

    ApplicationManagement journalCited on: random effects
    Annotation

    Hofmann introduced hierarchical linear models to the management research community, explaining when and why multilevel random-effects models are appropriate for organizational data with nested structures. This tutorial was highly influential in promoting multilevel methods in management journals.

  181. Holland, P. W. (1986). Statistics and Causal Inference. Journal of the American Statistical Association, 81(396), 945–960.

    https://doi.org/10.1080/01621459.1986.10478354

    FoundationalCited on: ols regression
    Annotation

    Holland articulated the fundamental problem of causal inference—that we can never observe both potential outcomes for the same unit—and formalized the Rubin Causal Model framework. His dictum 'no causation without manipulation' shaped how a generation of researchers thinks about the conditions under which statistical associations can be given causal interpretations.

  182. Hollenbeck, J. R., & Wright, P. M. (2017). Harking, Sharking, and Tharking: Making the Case for Post Hoc Analysis of Scientific Data. Journal of Management, 43(1), 5–18.

    https://doi.org/10.1177/0149206316679487

    ApplicationManagement journalCited on: multiple testing
    Annotation

    Hollenbeck and Wright discussed the multiple testing problem in management research in the context of post hoc analyses. They argued for transparency about data exploration while maintaining statistical rigor, emphasizing the importance of adjusting for multiple comparisons when testing is exploratory.

  183. Horowitz, J. L., & Manski, C. F. (2000). Nonparametric Analysis of Randomized Experiments with Missing Covariate and Outcome Data. Journal of the American Statistical Association, 95(449), 77–84.

    https://doi.org/10.1080/01621459.2000.10473902

    FoundationalCited on: lee bounds
    Annotation

    Horowitz and Manski extended the bounding approach to experiments with missing data on both covariates and outcomes. They showed how to construct valid bounds under different assumptions about the missing data mechanism, providing a principled alternative to complete-case analysis and imputation.

  184. Huselid, M. A. (1995). The Impact of Human Resource Management Practices on Turnover, Productivity, and Corporate Financial Performance. Academy of Management Journal, 38(3), 635–672.

    https://doi.org/10.2307/256741

    ApplicationManagement journalCited on: ols regression
    Annotation

    This influential management study used OLS (and related cross-sectional methods) to estimate the relationship between HR practices and firm performance. It helped launch the field of strategic HRM and illustrates both the power and limitations of regression-based approaches in management research.

  185. Iacus, S. M., King, G., & Porro, G. (2012). Causal Inference without Balance Checking: Coarsened Exact Matching. Political Analysis, 20(1), 1–24.

    https://doi.org/10.1093/pan/mpr013

    FoundationalCited on: matching methods
    Annotation

    This paper introduced Coarsened Exact Matching (CEM), which coarsens covariates into bins and then performs exact matching within those bins. CEM avoids many pitfalls of propensity score matching, such as the need to check balance iteratively, and gives the researcher direct control over the matching quality.
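The basic recipe (coarsen each covariate, exact-match on the resulting bin signature, discard strata lacking both groups) can be sketched as follows. Equal-width binning is an illustrative choice here, not the paper's recommendation, and real implementations also reweight within strata:

```python
import numpy as np

def cem_strata(X, treated, n_bins=4):
    """Minimal coarsened exact matching sketch: bin each covariate into
    equal-width intervals, then keep only strata (bin signatures) that
    contain at least one treated and one control unit."""
    X = np.asarray(X, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    # Coarsen: equal-width bins per covariate column
    bins = np.stack([
        np.digitize(X[:, j],
                    np.linspace(X[:, j].min(), X[:, j].max(), n_bins + 1)[1:-1])
        for j in range(X.shape[1])
    ], axis=1)
    keep = np.zeros(len(X), dtype=bool)
    for sig in {tuple(row) for row in bins}:
        in_stratum = (bins == sig).all(axis=1)
        if treated[in_stratum].any() and (~treated[in_stratum]).any():
            keep |= in_stratum
    return keep
```

Because the researcher fixes the coarsening up front, balance is guaranteed by construction within strata; there is no iterative propensity-score balance checking.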

  186. Imai, K., Keele, L., & Tingley, D. (2010). A General Approach to Causal Mediation Analysis. Psychological Methods, 15(4), 309–334.

    https://doi.org/10.1037/a0020761

    FoundationalCited on: causal mediation analysis
    Annotation

    Imai, Keele, and Tingley developed a general framework for causal mediation analysis grounded in the potential outcomes framework. They clarified the assumptions needed for identifying causal mediation effects, particularly the sequential ignorability assumption, and provided sensitivity analyses for violations.

  187. Imai, K., & Kim, I. S. (2019). When Should We Use Unit Fixed Effects Regression Models for Causal Inference with Longitudinal Data? American Journal of Political Science, 63(2), 467–490.

    https://doi.org/10.1111/ajps.12417

    FoundationalCited on: fixed effects
    Annotation

    Imai and Kim provided a modern causal-inference framework for understanding when unit fixed effects regression yields unbiased estimates with longitudinal data. They clarified the often-implicit assumptions about treatment history and carryover effects, offering a more rigorous foundation for applied fixed effects analysis.

  188. Imbens, G. W., & Angrist, J. D. (1994). Identification and Estimation of Local Average Treatment Effects. Econometrica, 62(2), 467–475.

    https://doi.org/10.2307/2951620

    FoundationalCited on: instrumental variables
    Annotation

    The foundational paper on LATE. Showed that IV identifies the average causal effect for compliers -- the subpopulation whose treatment status is changed by the instrument -- under the monotonicity assumption. This reinterpretation fundamentally changed how researchers understand what IV estimates.

  189. Imbens, G. W. (2004). Nonparametric Estimation of Average Treatment Effects Under Exogeneity: A Review. Review of Economics and Statistics, 86(1), 4–29.

    https://doi.org/10.1162/003465304323023651

    SurveyCited on: matching methods
    Annotation

    Imbens provided a comprehensive review of nonparametric methods for estimating average treatment effects under the unconfoundedness assumption, covering matching, weighting, and subclassification estimators. This survey unified the theoretical foundations of matching methods and clarified the connections between different estimators used in program evaluation.

  190. Imbens, G. W., & Manski, C. F. (2004). Confidence Intervals for Partially Identified Parameters. Econometrica, 72(6), 1845–1857.

    https://doi.org/10.1111/j.1468-0262.2004.00555.x

    FoundationalCited on: lee bounds
    Annotation

    Imbens and Manski developed methods for constructing valid confidence intervals when parameters are only partially identified—that is, when the data and assumptions narrow the parameter to a set rather than a point. This paper provides the inferential foundation for reporting uncertainty around bounds estimates, including Lee bounds.
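Their interval reduces to solving a one-dimensional equation for the critical value; a sketch assuming approximately normal estimators of the two bound endpoints:

```python
from scipy.stats import norm
from scipy.optimize import brentq

def imbens_manski_ci(lo, hi, se_lo, se_hi, alpha=0.05):
    """Imbens-Manski confidence interval for a partially identified
    parameter with estimated bounds [lo, hi]. The critical value C solves
    Phi(C + (hi - lo)/max(se)) - Phi(-C) = 1 - alpha, interpolating
    between one-sided and two-sided coverage."""
    delta = max(hi - lo, 0.0)
    se_max = max(se_lo, se_hi)
    f = lambda c: norm.cdf(c + delta / se_max) - norm.cdf(-c) - (1 - alpha)
    c = brentq(f, 0.0, 10.0)
    return lo - c * se_lo, hi + c * se_hi
```

When the bounds collapse to a point the critical value approaches 1.96 (two-sided coverage of the parameter); when the bounds are wide it approaches 1.645, since each endpoint then only needs one-sided coverage.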

  191. Imbens, G. W., & Lemieux, T. (2008). Regression Discontinuity Designs: A Guide to Practice. Journal of Econometrics, 142(2), 615–635.

    https://doi.org/10.1016/j.jeconom.2007.05.001

    Annotation

    Imbens and Lemieux provided a comprehensive practical guide to implementing RDD, covering bandwidth selection, functional form, and graphical analysis. Their treatment of fuzzy RDD as a local IV estimator clarified the interpretation and implementation for applied researchers.

  192. Imbens, G. W. (2015). Matching Methods in Practice: Three Examples. Journal of Human Resources, 50(2), 373–419.

    https://doi.org/10.3368/jhr.50.2.373

    ApplicationCited on: matching methods
    Annotation

    Imbens demonstrated how to implement matching methods in practice through three detailed empirical examples, covering propensity score estimation, covariate balance assessment, and sensitivity analysis. This paper is an invaluable practical guide that bridges the gap between matching theory and applied research.

  193. Imbens, G. W., & Rubin, D. B. (2015). Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction. Cambridge University Press.

    https://doi.org/10.1017/CBO9781139025751

    Annotation

    A comprehensive textbook grounding causal inference in the potential outcomes framework, with detailed treatment of matching, propensity scores, and subclassification. Provides rigorous foundations for selection-on-observables methods.

  194. Ioannidis, J. P. A., Stanley, T. D., & Doucouliagos, H. (2017). The Power of Bias in Economics Research. Economic Journal, 127(605), F236–F265.

    https://doi.org/10.1111/ecoj.12461

    ApplicationCited on: power analysis
    Annotation

    Ioannidis, Stanley, and Doucouliagos conducted a large-scale assessment of statistical power in economics research and found that the median power to detect typical effect sizes was only 18%. They documented widespread underpowering and publication bias, highlighting the importance of ex ante power analysis.

  195. Islam, N. (1995). Growth Empirics: A Panel Data Approach. Quarterly Journal of Economics, 110(4), 1127–1170.

    https://doi.org/10.2307/2946651

    ApplicationCited on: random effects
    Annotation

    Islam applied panel data methods—including random effects and fixed effects—to the cross-country growth regression framework, showing that accounting for unobserved country heterogeneity substantially changes estimates of convergence rates. This paper demonstrated the importance of choosing between fixed and random effects in macroeconomic growth empirics.

  196. Jaeger, D. A., Ruist, J., & Stuhler, J. (2018). Shift-Share Instruments and the Impact of Immigration. NBER Working Paper No. 24285.

    https://doi.org/10.3386/w24285

    SurveyCited on: shift share instruments
    Annotation

    Jaeger, Ruist, and Stuhler highlighted a threat to shift-share instruments in immigration research: serial correlation in immigrant inflows can bias estimates if past immigration affects current outcomes through channels other than current immigration. This paper raised important concerns about the exclusion restriction.

  197. Kang, J. D. Y., & Schafer, J. L. (2007). Demystifying Double Robustness: A Comparison of Alternative Strategies for Estimating a Population Mean from Incomplete Data. Statistical Science, 22(4), 523–539.

    https://doi.org/10.1214/07-STS227

    SurveyCited on: doubly robust estimation
    Annotation

    Kang and Schafer showed through simulations that doubly robust estimators can perform poorly when both models are moderately misspecified, even though they remain consistent when one model is correct. This influential paper tempered enthusiasm and motivated further methodological work on practical performance.

  198. Katila, R., & Ahuja, G. (2002). Something Old, Something New: A Longitudinal Study of Search Behavior and New Product Introduction. Academy of Management Journal, 45(6), 1183–1194.

    https://doi.org/10.2307/3069433

    ApplicationManagement journalCited on: poisson negative binomial
    Annotation

    Katila and Ahuja used negative binomial models to study how the depth and scope of a firm's knowledge search affect new product introductions. This paper is a widely cited application of count data models in the strategic management and innovation literature.

  199. Kaul, A., Klossner, S., Pfeifer, G., & Schieler, M. (2022). Standard Synthetic Control Methods: The Case of Using a False Predictor. Journal of Business & Economic Statistics, 40(2), 829–838.

    https://doi.org/10.1080/07350015.2021.1930012

    ApplicationCited on: matching methods
    Annotation

    While focused on synthetic control (a form of matching for aggregate units), this paper highlights pitfalls when matching on pre-treatment outcomes and is relevant for understanding matching assumptions more broadly. [UNVERIFIED - publication year/details may differ from working paper version]

  200. Kellogg, R. (2011). Learning by Drilling: Interfirm Learning and Relationship Persistence in the Texas Oilpatch. Quarterly Journal of Economics, 126(4), 1961–2004.

    https://doi.org/10.1093/qje/qjr039

    ApplicationCited on: difference in differences
    Annotation

    Kellogg used a DID approach leveraging oil price shocks to study how interfirm relationships affect productivity in the Texas oil industry. It is an excellent example of DID applied to organizational learning and firm boundaries questions relevant to strategy scholars.

  201. King, G., & Roberts, M. E. (2015). How Robust Standard Errors Expose Methodological Problems They Do Not Fix, and What to Do About It. Political Analysis, 23(2), 159–179.

    https://doi.org/10.1093/pan/mpu015

    SurveyCited on: ols regression
    Annotation

    This paper argues that researchers often use robust standard errors as a band-aid rather than fixing the underlying model specification. It provides practical guidance on when robust SEs are appropriate and when the model itself needs to be reconsidered.

  202. King, G., & Nielsen, R. (2019). Why Propensity Scores Should Not Be Used for Matching. Political Analysis, 27(4), 435–454.

    https://doi.org/10.1017/pan.2019.11

    SurveyCited on: matching methods
    Annotation

    King and Nielsen argue that propensity score matching can increase imbalance, model dependence, and bias relative to other matching methods. This provocative paper has influenced a shift toward alternatives like CEM and Mahalanobis distance matching in applied research.

  203. Kline, P., & Walters, C. R. (2016). Evaluating Public Programs with Close Substitutes: The Case of Head Start. Quarterly Journal of Economics, 131(4), 1795–1848.

    https://doi.org/10.1093/qje/qjw027

    ApplicationCited on: lee bounds
    Annotation

    Kline and Walters applied bounding methods related to Lee bounds to evaluate Head Start in the presence of substitute programs. Their analysis demonstrates how partial identification and bounding approaches can address complex selection issues in program evaluation.

  204. Knaus, M. C., Lechner, M., & Strittmatter, A. (2021). Machine Learning Estimation of Heterogeneous Causal Effects: Empirical Monte Carlo Evidence. Econometrics Journal, 24(1), 134–161.

    https://doi.org/10.1093/ectj/utaa014

    Annotation

    Knaus, Lechner, and Strittmatter applied DML-based methods to estimate heterogeneous causal effects of a Swiss active labor market program, comparing causal forests, DML, and other machine learning approaches. The paper provides an empirical Monte Carlo framework that uses real data to benchmark different estimators, offering practical guidance for applied researchers choosing among machine learning causal inference tools.

  205. Koh, P.-S., & Reeb, D. M. (2015). Missing R&D. Journal of Accounting and Economics, 60(1), 73–94.

    https://doi.org/10.1016/j.jacceco.2015.03.004

    ApplicationCited on: event studies
    Annotation

    Koh and Reeb used event study methodology to examine stock market reactions to firms that report versus do not report R&D expenditures, showing that missing R&D data is not random and has implications for how investors value innovation. This paper illustrates event study methods applied to accounting and disclosure questions.

  206. Kothari, S. P., & Warner, J. B. (2007). Econometrics of Event Studies. Handbook of Empirical Corporate Finance, 1, 3–36.

    https://doi.org/10.1016/B978-0-444-53265-7.50015-9

    FoundationalCited on: event studies
    Annotation

    Kothari and Warner updated the survey of event study methods, covering long-horizon event studies, cross-sectional regression approaches, and the econometric challenges that arise with overlapping events and event-induced variance changes.

  207. Krueger, A. B. (1999). Experimental Estimates of Education Production Functions. Quarterly Journal of Economics, 114(2), 497–532.

    https://doi.org/10.1162/003355399556052

    ApplicationCited on: ols regression
    Annotation

    Uses Tennessee's Project STAR randomized class-size experiment to estimate the effect of class size on student achievement via OLS. Because treatment was randomized, the OLS coefficient has a causal interpretation, demonstrating that the method is not the issue -- the research design is what determines causality.

  208. Kunzel, S. R., Sekhon, J. S., Bickel, P. J., & Yu, B. (2019). Metalearners for Estimating Heterogeneous Treatment Effects Using Machine Learning. Proceedings of the National Academy of Sciences, 116(10), 4156–4165.

    https://doi.org/10.1073/pnas.1804597116

    ApplicationCited on: causal forests
    Annotation

    Kunzel and colleagues proposed the X-learner meta-algorithm for estimating CATEs and systematically compared it with T-learners and S-learners. The paper provides practical guidance on when different meta-learning strategies, including those based on causal forests, perform well or poorly.

  209. Laird, N. M., & Ware, J. H. (1982). Random-Effects Models for Longitudinal Data. Biometrics, 38(4), 963–974.

    https://doi.org/10.2307/2529876

    FoundationalCited on: random effects
    Annotation

    Laird and Ware developed the general framework for random-effects models in longitudinal data, integrating fixed population parameters with random individual-level effects. This paper is foundational for the mixed-effects modeling approach widely used in biostatistics and social sciences.

  210. LaLonde, R. J. (1986). Evaluating the Econometric Evaluations of Training Programs with Experimental Data. American Economic Review, 76(4), 604–620.

    FoundationalCited on: matching methods
    Annotation

    LaLonde compared econometric estimates of a job training program's effect with experimental benchmarks from a randomized trial, finding that non-experimental methods often failed to replicate the experimental results. This paper established the standard test bed for evaluating matching and other observational causal methods.

  211. Lambert, D. (1992). Zero-Inflated Poisson Regression, with an Application to Defects in Manufacturing. Technometrics, 34(1), 1–14.

    https://doi.org/10.2307/1269547

    FoundationalCited on: poisson negative binomial
    Annotation

    Lambert introduced the zero-inflated Poisson (ZIP) model, which accounts for excess zeros in count data by mixing a point mass at zero with a Poisson distribution. The ZIP model has become a standard tool for count outcomes where a subpopulation generates only zeros, such as patent counts for non-innovating firms.
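The model is a two-part mixture, easy to state as a pmf. This sketch covers the distribution only; Lambert's paper additionally lets pi and lambda depend on covariates and estimates them jointly by maximum likelihood:

```python
from math import exp, factorial

def zip_pmf(k, lam, pi_zero):
    """Zero-inflated Poisson pmf: with probability pi_zero the unit is a
    structural zero (e.g., a firm that never patents); otherwise counts
    are drawn from Poisson(lam)."""
    poisson = exp(-lam) * lam ** k / factorial(k)
    if k == 0:
        return pi_zero + (1 - pi_zero) * poisson
    return (1 - pi_zero) * poisson
```

Setting pi_zero to 0 recovers the ordinary Poisson pmf, which is why the ZIP model nests Poisson and can be tested against it.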

  212. Leamer, E. E. (1983). Let's Take the Con Out of Econometrics. American Economic Review, 73(1), 31–43.

    FoundationalCited on: specification curve
    Annotation

    Leamer's classic paper argued that the sensitivity of empirical results to specification choices undermines the credibility of econometric evidence. He proposed extreme bounds analysis, an early form of systematic robustness testing that anticipated modern specification curve analysis by several decades.

  213. Lee, D. S. (2008). Randomized Experiments from Non-random Selection in U.S. House Elections. Journal of Econometrics, 142(2), 675–697.

    https://doi.org/10.1016/j.jeconom.2007.05.004

    FoundationalCited on: regression discontinuity sharp
    Annotation

    Lee formalized the conditions under which an RDD is 'as good as' a randomized experiment—namely, when agents cannot precisely manipulate the running variable around the cutoff. Applied to U.S. House elections, this paper established the modern theoretical foundation for sharp RDD.

  214. Lee, D. S. (2009). Training, Wages, and Sample Selection: Estimating Sharp Bounds on Treatment Effects. Review of Economic Studies, 76(3), 1071–1102.

    https://doi.org/10.1111/j.1467-937X.2009.00536.x

    FoundationalCited on: lee bounds
    Annotation

    Lee developed sharp nonparametric bounds on treatment effects in the presence of sample selection, requiring only a monotonicity assumption (that treatment affects selection in one direction). These bounds are widely used to address attrition and selective sample composition in randomized experiments.
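A stylized version of the trimming procedure can be sketched as follows, assuming equal-sized randomized arms (so observed sample sizes proxy selection rates) and that treatment weakly increases selection; Lee's paper works with selection rates directly and derives sharpness:

```python
import numpy as np

def lee_bounds(y_treat, y_control):
    """Stylized Lee-bounds sketch: y_treat and y_control are the observed
    (non-attrited) outcomes from equal-sized randomized arms, with more
    units observed under treatment. Trim the excess share p of treated
    outcomes from the top (lower bound) or bottom (upper bound), then
    difference the means."""
    y_t = np.sort(np.asarray(y_treat, dtype=float))
    y_c = np.asarray(y_control, dtype=float)
    p = (len(y_t) - len(y_c)) / len(y_t)   # excess selection share among treated
    k = int(round(p * len(y_t)))           # number of treated outcomes to trim
    lower = y_t[:len(y_t) - k].mean() - y_c.mean()   # drop top k
    upper = y_t[k:].mean() - y_c.mean()              # drop bottom k
    return lower, upper
```

The bounds tighten as the differential attrition share p shrinks, collapsing to the simple difference in means when attrition is balanced.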

  215. Lee, D. S., & Lemieux, T. (2010). Regression Discontinuity Designs in Economics. Journal of Economic Literature, 48(2), 281–355.

    https://doi.org/10.1257/jel.48.2.281

    Annotation

    Lee and Lemieux wrote the definitive survey of RDD methods in economics, covering both sharp and fuzzy designs, validity tests, and extensions. This paper is the standard reference for understanding the econometric theory and practical implementation of RDD.

  216. Lee, D. S., McCrary, J., Moreira, M. J., & Porter, J. (2022). Valid t-Ratio Inference for IV. American Economic Review, 112(10), 3260–3290.

    https://doi.org/10.1257/aer.20211063

    FoundationalCited on: instrumental variables
    Annotation

    Lee, McCrary, Moreira, and Porter showed that the conventional t-ratio test in IV regression has correct size only when the first-stage F-statistic exceeds 104.7, far above the traditional rule-of-thumb threshold of 10. This paper fundamentally raised the bar for what constitutes a sufficiently strong instrument and has prompted researchers to reconsider previously accepted IV results.

  217. Lerner, J., & Wulf, J. (2007). Innovation and Incentives: Evidence from Corporate R&D. Review of Economics and Statistics, 89(4), 634–644.

    https://doi.org/10.1162/rest.89.4.634

    ApplicationCited on: difference in differences
    Annotation

    This paper applied panel data methods including DID-style designs to study how compensation incentives for R&D managers affect innovation outcomes. It illustrates how DID thinking can be applied to management and innovation questions.

  218. Levitt, S. D. (1997). Using Electoral Cycles in Police Hiring to Estimate the Effect of Police on Crime. American Economic Review, 87(3), 270–290.

    ApplicationCited on: instrumental variables
    Annotation

    Levitt used the timing of mayoral and gubernatorial elections as an instrument for police hiring to estimate the causal effect of police on crime. The paper illustrates the IV approach in a policy-relevant setting where the key concern is reverse causality (more crime leads to more police).

  219. List, J. A., Sadoff, S., & Wagner, M. (2011). So You Want to Run an Experiment, Now What? Some Simple Rules of Thumb for Optimal Experimental Design. Experimental Economics, 14(4), 439–457.

    https://doi.org/10.1007/s10683-011-9275-7

    SurveyCited on: experimental design
    Annotation

    This practical guide provides rules of thumb for sample size, treatment assignment, and other design decisions in field experiments. It is a useful starting point for researchers planning their first experiment.

  220. List, J. A., Shaikh, A. M., & Xu, Y. (2019). Multiple Hypothesis Testing in Experimental Economics. Experimental Economics, 22(4), 773–793.

    https://doi.org/10.1007/s10683-018-09597-5

    ApplicationCited on: multiple testing
    Annotation

    List, Shaikh, and Xu provided practical guidance on addressing multiple hypothesis testing in experimental economics. They compared various correction methods including Bonferroni, Holm, and FDR procedures, and demonstrated their application to field experiments with multiple outcome variables.
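Two of the procedures they compare can be sketched directly: Holm's step-down method (familywise error control) and the Benjamini-Hochberg step-up method (false discovery rate control):

```python
def holm(pvals, alpha=0.05):
    """Holm step-down: test the smallest p against alpha/m, the next
    against alpha/(m-1), and so on; stop at the first failure. FWER control."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if pvals[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break
    return reject

def benjamini_hochberg(pvals, alpha=0.05):
    """BH step-up: find the largest k with p_(k) <= k*alpha/m and reject
    every hypothesis with a p-value at or below that cutoff. FDR control."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    thresh = 0.0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            thresh = pvals[i]
    return [p <= thresh for p in pvals]
```

On the same p-values BH typically rejects at least as many hypotheses as Holm, reflecting the weaker FDR criterion; which criterion is appropriate depends on whether any false rejection or only the rate of false rejections is costly.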

  221. Long, J. S. (1997). Regression Models for Categorical and Limited Dependent Variables. Sage Publications.

    SurveyCited on: logit probit
    Annotation

    A widely used reference for applied researchers working with binary, ordinal, multinomial, and count outcome models, with clear exposition of interpretation and software implementation.

  222. Long, J. S., & Ervin, L. H. (2000). Using Heteroscedasticity Consistent Standard Errors in the Linear Regression Model. The American Statistician, 54(3), 217–224.

    https://doi.org/10.1080/00031305.2000.10474549

    FoundationalCited on: ols regression
    Annotation

    A simulation study comparing HC0, HC1, HC2, and HC3 heteroscedasticity-consistent standard error estimators. Found that HC3 provides the best finite-sample performance, influencing R's sandwich package to adopt HC3 as its default.
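The HC3 estimator inflates each squared residual by its leverage before forming the sandwich; a compact numpy sketch (illustrative, not the sandwich package's code):

```python
import numpy as np

def hc3_se(X, y):
    """OLS with HC3 heteroscedasticity-consistent standard errors:
    cov = (X'X)^-1 X' diag(e_i^2 / (1 - h_ii)^2) X (X'X)^-1."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    h = np.einsum('ij,jk,ik->i', X, XtX_inv, X)   # leverages h_ii
    omega = (resid / (1 - h)) ** 2                # HC3 weights
    cov = XtX_inv @ (X.T * omega) @ X @ XtX_inv
    return beta, np.sqrt(np.diag(cov))
```

HC0 uses e_i^2 unadjusted, HC1 applies a degrees-of-freedom factor, and HC2 divides by (1 - h_ii) once; HC3's squared-leverage adjustment is what gives it the better small-sample behavior the paper documents.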

  223. Lovell, M. C. (1963). Seasonal Adjustment of Economic Time Series and Multiple Regression Analysis. Journal of the American Statistical Association.

    https://doi.org/10.1080/01621459.1963.10480682

    FoundationalCited on: ols regression
    Annotation

    Generalized the Frisch-Waugh (1933) partitioned-regression result to seasonal adjustment and arbitrary partitions of the regressor matrix, completing the theorem now known as Frisch-Waugh-Lovell (FWL).
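The FWL theorem is easy to verify numerically: the coefficient on a regressor of interest from the full regression equals the coefficient from regressing residualized y on the residualized regressor. The data below are simulated purely for illustration:

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(0)
n = 200
X2 = np.column_stack([np.ones(n), rng.normal(size=n)])  # controls incl. intercept
x1 = rng.normal(size=n) + X2[:, 1]                      # regressor of interest
y = 2.0 * x1 + X2 @ np.array([1.0, -0.5]) + rng.normal(size=n)

# Full regression: coefficient on x1
beta_full = ols(np.column_stack([x1, X2]), y)[0]

# FWL two-step: residualize y and x1 on X2, then regress residual on residual
M = lambda v: v - X2 @ ols(X2, v)                       # residual-maker for X2
beta_fwl = ols(M(x1)[:, None], M(y))[0]
```

The two coefficients agree to machine precision, which is the partitioned-regression result Lovell completed.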

  224. Luca, M. (2016). Reviews, Reputation, and Revenue: The Case of Yelp.com. Harvard Business School Working Paper No. 12-016.

    ApplicationCited on: experimental design
    Annotation

    Luca used regression discontinuity on Yelp's rounding thresholds as a quasi-experimental design to show that a one-star increase in Yelp rating causes a 5-9% increase in restaurant revenue. This paper illustrates creative experimental thinking applied to platform and strategy questions. [UNVERIFIED - working paper; widely cited but DOI may vary by version]

  225. Lunceford, J. K., & Davidian, M. (2004). Stratification and Weighting via the Propensity Score in Estimation of Causal Treatment Effects: A Comparative Study. Statistics in Medicine, 23(19), 2937–2960.

    https://doi.org/10.1002/sim.1903

    ApplicationCited on: doubly robust estimation
    Annotation

    Lunceford and Davidian compared propensity score methods including doubly robust estimators in a systematic simulation study. They showed that doubly robust estimators generally perform well and recommended them as a default approach for causal inference from observational data.

  226. MacKinlay, A. C. (1997). Event Studies in Economics and Finance. Journal of Economic Literature, 35(1), 13–39.

    FoundationalCited on: event studies
    Annotation

    MacKinlay provided a comprehensive methodological survey of event studies, covering the statistical framework, estimation windows, abnormal return calculations, and testing procedures. This paper remains the standard reference for researchers designing and implementing event studies.

  227. MacKinnon, D. P., Fairchild, A. J., & Fritz, M. S. (2007). Mediation Analysis. Annual Review of Psychology, 58, 593–614.

    https://doi.org/10.1146/annurev.psych.58.110405.085542

    ApplicationCited on: causal mediation analysis
    Annotation

    MacKinnon, Fairchild, and Fritz provided an accessible review of mediation analysis methods for psychologists, covering the Baron-Kenny approach, the Sobel test, bootstrapping methods, and extensions to multiple mediators. This survey helped bridge the gap between traditional and modern approaches.

  228. Manski, C. F. (1990). Nonparametric Bounds on Treatment Effects. American Economic Review: Papers & Proceedings, 80(2), 319–323.

    FoundationalCited on: lee bounds
    Annotation

    Manski introduced the partial identification approach to treatment effects, showing that even without strong assumptions, one can bound causal effects using the observed data. His worst-case bounds framework laid the theoretical foundation for Lee's sharper bounds under the monotonicity assumption.
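For an outcome known to lie in a bounded interval, the worst-case bounds follow from filling in the unobserved potential outcomes with the extremes. A sketch for the ATE, with [0, 1] as an illustrative default range:

```python
def manski_bounds(y_treat, y_control, y_min=0.0, y_max=1.0):
    """Manski-style worst-case ATE bounds for an outcome in [y_min, y_max],
    with no assumptions about selection: unobserved counterfactuals are
    replaced by the interval endpoints."""
    n1, n0 = len(y_treat), len(y_control)
    p1 = n1 / (n1 + n0)                    # share treated
    m1 = sum(y_treat) / n1                 # E[Y | D=1]
    m0 = sum(y_control) / n0               # E[Y | D=0]
    ey1 = (m1 * p1 + y_min * (1 - p1), m1 * p1 + y_max * (1 - p1))
    ey0 = (m0 * (1 - p1) + y_min * p1, m0 * (1 - p1) + y_max * p1)
    return ey1[0] - ey0[1], ey1[1] - ey0[0]
```

The resulting interval always has width y_max - y_min, so it always contains zero; that is precisely what motivates sharper assumptions, such as the monotonicity behind Lee bounds.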

  229. Manski, C. F. (1993). Identification of Endogenous Social Effects: The Reflection Problem. Review of Economic Studies, 60(3), 531–542.

    https://doi.org/10.2307/2298123

    FoundationalCited on: instrumental variables
    Annotation

    Formalized the reflection problem: when individual outcomes depend on group averages, the group average is simultaneously determined by its members, making it impossible to distinguish true social (endogenous) effects from correlated effects without additional structure.

  230. Manski, C. F. (2003). Partial Identification of Probability Distributions. Springer.

    https://doi.org/10.1007/b97478

    FoundationalCited on: lee bounds
    Annotation

    Manski's monograph provided a comprehensive treatment of partial identification, showing how to derive informative bounds on parameters of interest when point identification is not possible. This book formalized and extended his earlier work on bounding treatment effects and is the definitive reference for the theoretical framework underlying Lee bounds.

  231. Masicampo, E. J., & Lalande, D. (2012). A Peculiar Prevalence of p Values Just Below .05. Quarterly Journal of Experimental Psychology, 65(11), 2271–2279.

    https://doi.org/10.1080/17470218.2012.711335

    ApplicationCited on: specification curve
    Annotation

    Masicampo and Lalande documented a suspicious clustering of p-values just below the .05 threshold in psychology journals, providing empirical evidence of publication bias and specification searching. Their findings motivate the use of specification curve analysis as a tool for assessing the robustness of results across analytical choices.

  232. Masten, M. A., & Poirier, A. (2021). Salvaging Falsified Instrumental Variable Models. Working Paper.

    ApplicationCited on: sensitivity analysis
    Annotation

    Masten and Poirier developed methods for recovering useful causal conclusions from instrumental variable models that fail overidentification tests. Rather than discarding a falsified model entirely, they showed how to construct bounds on the causal parameter that remain valid under weaker assumptions, providing a structured approach to sensitivity analysis when standard IV assumptions are violated.

  233. McCrary, J. (2008). Manipulation of the Running Variable in the Regression Discontinuity Design: A Density Test. Journal of Econometrics, 142(2), 698–714.

    https://doi.org/10.1016/j.jeconom.2007.05.005

    ApplicationCited on: regression discontinuity sharp
    Annotation

    McCrary developed the standard test for whether agents are manipulating the running variable to sort around the cutoff. If the density of the running variable shows a discontinuity at the cutoff, the RDD is compromised. This density test is now a routine validity check in all RDD papers.
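The intuition behind the test can be sketched in a few lines: if the running variable's density is smooth at the cutoff, counts just below and just above it should be close. The toy sketch below (simulated data with deliberate sorting; McCrary's actual test uses local-linear density estimation, not raw bin counts) flags the discontinuity with a crude count-based z-statistic:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, 20_000)          # running variable
# Simulate manipulation: half of the mass just below the cutoff jumps above it
near_miss = (x > -0.05) & (x < 0.0)
x[near_miss & (rng.random(x.size) < 0.5)] += 0.05

h = 0.05                                   # bin width at the cutoff
n_below = int(np.sum((x >= -h) & (x < 0.0)))
n_above = int(np.sum((x >= 0.0) & (x < h)))

# Under a smooth density the two counts should be close; a crude z-statistic:
z = (n_above - n_below) / np.sqrt(n_above + n_below)
manipulation_flagged = bool(abs(z) > 1.96)
```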

  234. McFadden, D. (1974). Conditional Logit Analysis of Qualitative Choice Behavior. In P. Zarembka (Ed.), Frontiers in Econometrics (pp. 105–142). Academic Press.

    FoundationalCited on: logit probit
    Annotation

    McFadden developed the conditional logit model grounded in random utility theory, showing how discrete choices among alternatives can be modeled by assuming individuals maximize utility with an extreme-value distributed error. This work earned him the 2000 Nobel Prize and remains the foundation of discrete choice analysis.

  235. McKenzie, D. (2012). Beyond Baseline and Follow-Up: The Case for More T in Experiments. Journal of Development Economics, 99(2), 210–221.

    https://doi.org/10.1016/j.jdeveco.2012.01.002

    FoundationalCited on: power analysis
    Annotation

    McKenzie showed that collecting multiple rounds of data dramatically increases statistical power in randomized experiments. He demonstrated that ANCOVA with baseline data and difference-in-differences with multiple time periods can substantially reduce the required sample size, which is particularly valuable in development economics.

  236. McWilliams, A., & Siegel, D. (1997). Event Studies in Management Research: Theoretical and Empirical Issues. Academy of Management Journal, 40(3), 626–657.

    https://doi.org/10.2307/257056

    ApplicationManagement journalCited on: event studies
    Annotation

    McWilliams and Siegel introduced event study methods to the management research community, explaining the assumptions, methodology, and common pitfalls. This tutorial article led to widespread adoption of event studies in strategic management research.

  237. Miguel, E., Satyanath, S., & Sergenti, E. (2004). Economic Shocks and Civil Conflict: An Instrumental Variables Approach. Journal of Political Economy, 112(4), 725–753.

    https://doi.org/10.1086/421174

    ApplicationCited on: instrumental variables
    Annotation

    Instruments for economic growth using rainfall variation to estimate the causal effect of economic shocks on civil conflict in Sub-Saharan Africa. A clean and widely cited example of using weather as an instrumental variable, illustrating both the power and the exclusion restriction challenges of weather-based instruments.

  238. Miguel, E., Camerer, C., Casey, K., Cohen, J., Esterling, K. M., Gerber, A., Glennerster, R., Green, D. P., Humphreys, M., Imbens, G., Laitin, D., Madon, T., Nelson, L., Nosek, B. A., Petersen, M., Sedlmayr, R., Simmons, J. P., Simonsohn, U., & Van der Laan, M. (2014). Promoting Transparency in Social Science Research. Science, 343(6166), 30–31.

    https://doi.org/10.1126/science.1245317

    FoundationalCited on: pre registration
    Annotation

    A coalition of leading social scientists called for greater transparency in research, including pre-registration of studies and analysis plans, open data, and replication. This short but influential piece in Science helped establish the norms and infrastructure for pre-registration in social science.

  239. Mincer, J. (1974). Schooling, Experience, and Earnings. National Bureau of Economic Research / Columbia University Press.

    ApplicationCited on: ols regression
    Annotation

    Mincer's earnings equation—regressing log wages on years of schooling and experience—became one of the most replicated OLS models in economics. It established the standard approach for estimating returns to education and remains a benchmark in labor economics.

  240. Montiel Olea, J. L., & Pflueger, C. (2013). A Robust Test for Weak Instruments. Journal of Business & Economic Statistics, 31(3), 358–369.

    https://doi.org/10.1080/00401706.2013.806694

    FoundationalCited on: instrumental variables
    Annotation

    Proposes an effective F-statistic for testing weak instruments that is robust to heteroskedasticity, serial correlation, and clustering — unlike the conventional first-stage F. The effective F is now the standard diagnostic for instrument strength in applied IV research.

  241. Moulton, B. R. (1990). An Illustration of a Pitfall in Estimating the Effects of Aggregate Variables on Micro Units. Review of Economics and Statistics, 72(2), 334–338.

    https://doi.org/10.2307/2109724

    FoundationalCited on: ols regression
    Annotation

    Moulton demonstrated that when aggregate-level variables (such as state policies) are used to explain individual-level outcomes, OLS standard errors that ignore within-group correlation can be dramatically understated. This paper established the 'Moulton problem' and motivated the widespread adoption of clustered standard errors in applied microeconomics.
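The size of the Moulton understatement is easy to see in simulation. The sketch below (our own minimal example, not Moulton's) compares the naive homoskedastic OLS standard error with the cluster-robust sandwich estimator for a cluster-level regressor:

```python
import numpy as np

rng = np.random.default_rng(2)
G, m = 50, 40                        # 50 clusters (e.g. states) of 40 units each
g = np.repeat(np.arange(G), m)
x_g = rng.normal(size=G)             # aggregate (cluster-level) regressor
u_g = rng.normal(size=G)             # cluster-level error -> within-cluster correlation
y = 1.0 + 0.5 * x_g[g] + u_g[g] + rng.normal(size=G * m)

X = np.column_stack([np.ones(G * m), x_g[g]])
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
e = y - X @ beta

# Naive homoskedastic OLS standard error (ignores clustering)
se_ols = np.sqrt((e @ e) / (len(y) - 2) * XtX_inv[1, 1])

# Cluster-robust sandwich: sum the outer products of per-cluster score sums
meat = np.zeros((2, 2))
for j in range(G):
    s = X[g == j].T @ e[g == j]
    meat += np.outer(s, s)
se_cluster = np.sqrt((XtX_inv @ meat @ XtX_inv)[1, 1])
```

With an intraclass correlation of 0.5 and 40 units per cluster, the clustered standard error here is several times the naive one, which is the Moulton point.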

  242. Mullainathan, S., & Spiess, J. (2017). Machine Learning: An Applied Econometric Approach. Journal of Economic Perspectives, 31(2), 87–106.

    https://doi.org/10.1257/jep.31.2.87

    Annotation

    Mullainathan and Spiess provided an accessible introduction to machine learning for economists, clarifying the distinction between prediction and causal inference tasks. They discussed how methods like DML use machine learning for prediction of nuisance functions while maintaining valid causal inference, a framing widely adopted in management and strategy research.

  243. Munafò, M. R., Nosek, B. A., Bishop, D. V. M., Button, K. S., Chambers, C. D., Percie du Sert, N., Simonsohn, U., Wagenmakers, E.-J., Ware, J. J., & Ioannidis, J. P. A. (2017). A Manifesto for Reproducible Science. Nature Human Behaviour, 1, 0021.

    https://doi.org/10.1038/s41562-016-0021

    FoundationalCited on: specification curve
    Annotation

    This manifesto identified threats to reproducible science, including analytical flexibility and specification searching, and proposed solutions including pre-registration and multiverse analysis. It provided the broader scientific reform context within which specification curve analysis emerged as a practical tool.

  244. Mundlak, Y. (1978). On the Pooling of Time Series and Cross Section Data. Econometrica, 46(1), 69–85.

    https://doi.org/10.2307/1913646

    FoundationalCited on: fixed effects, random effects
    Annotation

    Mundlak showed that the fixed effects estimator can be understood as an OLS regression that includes the group means of all time-varying regressors. This 'correlated random effects' interpretation bridges the fixed effects and random effects models and clarifies exactly what assumption is being relaxed.
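Mundlak's equivalence can be verified numerically in a few lines: once the group mean of x is included, the pooled OLS coefficient on x matches the within (fixed effects) estimator exactly. A numpy sketch on simulated panel data (setup and names are ours):

```python
import numpy as np

rng = np.random.default_rng(3)
G, m = 30, 5                          # 30 groups observed for 5 periods each
g = np.repeat(np.arange(G), m)
alpha = rng.normal(size=G)            # group effects, correlated with x by construction
x = alpha[g] + rng.normal(size=G * m)
y = 2.0 * x + alpha[g] + rng.normal(size=G * m)

# Within (fixed effects) estimator: demean y and x by group
xbar = np.array([x[g == j].mean() for j in range(G)])[g]
ybar = np.array([y[g == j].mean() for j in range(G)])[g]
beta_fe = ((x - xbar) @ (y - ybar)) / ((x - xbar) @ (x - xbar))

# Mundlak regression: pooled OLS of y on x plus the group mean of x
X = np.column_stack([np.ones(G * m), x, xbar])
beta_mundlak = np.linalg.lstsq(X, y, rcond=None)[0][1]   # coefficient on x
```

The two coefficients agree to numerical precision; the equality follows from the Frisch-Waugh-Lovell theorem, since the within-group variation of x is orthogonal to its group means.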

  245. Muralidharan, K., Niehaus, P., & Sukhtankar, S. (2019). Building State Capacity: Evidence from Biometric Smartcards in India. American Economic Review, 109(10), 3542–3597.

    https://doi.org/10.1257/aer.20141346

    ApplicationCited on: power analysis
    Annotation

    Muralidharan, Niehaus, and Sukhtankar conducted a large-scale cluster-randomized evaluation of biometric smartcards for welfare payments in India, featuring detailed ex ante power calculations for their primary and secondary outcomes across districts. The paper demonstrates best practices for power analysis in a complex cluster-randomized design, showing how minimum detectable effects were computed and reported to justify the experimental design.

  246. Murray, M. P. (2006). Avoiding Invalid Instruments and Coping with Weak Instruments. Journal of Economic Perspectives, 20(4), 111–132.

    https://doi.org/10.1257/jep.20.4.111

    SurveyCited on: instrumental variables
    Annotation

    Practical guidance on evaluating instrument validity and dealing with weak instruments in applied work. Written in an accessible style, it helps applied researchers think critically about their instrument choices and provides concrete strategies for addressing common IV pitfalls.

  247. Newey, W. K., & West, K. D. (1987). A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix. Econometrica, 55(3), 703–708.

    https://doi.org/10.2307/1913610

    FoundationalCited on: ols regression
    Annotation

    This short but hugely influential paper extended White's robust standard errors to also account for autocorrelation in time-series data. The 'Newey-West standard errors' or 'HAC standard errors' are standard practice whenever researchers work with data that have a time dimension.
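The estimator itself is compact: White's "meat" matrix is augmented with Bartlett-weighted autocovariances of the score up to a truncation lag. A self-contained numpy sketch on simulated AR(1) data (our example, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(4)
T = 500
# AR(1) regressor and AR(1) errors induce serial correlation in the scores
x = np.zeros(T)
u = np.zeros(T)
for t in range(1, T):
    x[t] = 0.7 * x[t - 1] + rng.normal()
    u[t] = 0.7 * u[t - 1] + rng.normal()
y = 1.0 + 0.3 * x + u

X = np.column_stack([np.ones(T), x])
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
e = y - X @ beta
Z = X * e[:, None]                   # score contributions x_t * e_t

L = 8                                # truncation lag (bandwidth)
S = Z.T @ Z                          # lag-0 term: White's heteroskedasticity part
for lag in range(1, L + 1):
    w = 1 - lag / (L + 1)            # Bartlett kernel weight -> PSD covariance
    Gl = Z[lag:].T @ Z[:-lag]
    S += w * (Gl + Gl.T)

se_nw = np.sqrt(np.diag(XtX_inv @ S @ XtX_inv))             # Newey-West (HAC)
se_white = np.sqrt(np.diag(XtX_inv @ (Z.T @ Z) @ XtX_inv))  # no autocorrelation term
```

With positively autocorrelated scores, the HAC standard error on the slope is well above the White standard error, which is exactly the correction the paper supplies.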

  248. Nickell, S. (1981). Biases in Dynamic Models with Fixed Effects. Econometrica, 49(6), 1417–1426.

    https://doi.org/10.2307/1911408

    FoundationalCited on: fixed effects
    Annotation

    Nickell showed that including a lagged dependent variable in a fixed effects regression creates a bias that does not vanish as the number of cross-sectional units grows. This 'Nickell bias' is a critical concern for researchers using fixed effects in dynamic panel models with short time series.

  249. Nie, X., & Wager, S. (2021). Quasi-Oracle Estimation of Heterogeneous Treatment Effects. Biometrika, 108(2), 299–319.

    https://doi.org/10.1093/biomet/asaa076

    FoundationalCited on: causal forests
    Annotation

    Nie and Wager proposed the R-learner, a two-step approach for estimating heterogeneous treatment effects that first residualizes outcomes and treatment on covariates, then estimates the CATE by regressing outcome residuals on treatment residuals. This approach can use any machine learning method including causal forests.
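For a constant treatment effect, the R-learner's second step collapses to a residual-on-residual regression, which makes the idea easy to sketch. Below, simple polynomial fits stand in for the machine-learning nuisance estimators (simulated data; all names are ours):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5000
x = rng.normal(size=n)
e_true = 1 / (1 + np.exp(-x))            # propensity depends on x (confounding)
d = (rng.random(n) < e_true).astype(float)
y = x + 2.0 * d + rng.normal(size=n)     # true constant CATE = 2.0

# Step 1: nuisance estimates m(x) = E[y|x] and e(x) = E[d|x]
# (cubic polynomial fits standing in for any ML learner)
Phi = np.column_stack([np.ones(n), x, x**2, x**3])
m_hat = Phi @ np.linalg.lstsq(Phi, y, rcond=None)[0]
e_hat = np.clip(Phi @ np.linalg.lstsq(Phi, d, rcond=None)[0], 0.01, 0.99)

# Step 2: R-learner with a constant CATE model = residual-on-residual OLS
y_res, d_res = y - m_hat, d - e_hat
tau_hat = (d_res @ y_res) / (d_res @ d_res)
```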

  250. Nosek, B. A., Ebersole, C. R., DeHaven, A. C., & Mellor, D. T. (2018). The Preregistration Revolution. Proceedings of the National Academy of Sciences, 115(11), 2600–2606.

    https://doi.org/10.1073/pnas.1708274114

    FoundationalCited on: pre registration
    Annotation

    Nosek and colleagues made the case for widespread adoption of pre-registration, arguing that it distinguishes confirmatory from exploratory analyses, reduces publication bias, and increases the credibility of empirical research. This paper helped catalyze the pre-registration movement across the social sciences.

  251. Olken, B. A. (2015). Promises and Perils of Pre-Analysis Plans. Journal of Economic Perspectives, 29(3), 61–80.

    https://doi.org/10.1257/jep.29.3.61

    ApplicationCited on: pre registration
    Annotation

    Olken provided a balanced assessment of pre-analysis plans in development economics, discussing both benefits (reduced specification searching, increased credibility) and costs (loss of flexibility, difficulty specifying analyses in advance). This paper is essential reading for understanding the practical tradeoffs of pre-registration.

  252. Oprescu, M., Syrgkanis, V., & Wu, Z. S. (2019). Orthogonal Random Forest for Causal Inference. Proceedings of the 36th International Conference on Machine Learning, 97, 4932–4941.

    Annotation

    Oprescu, Syrgkanis, and Wu combined orthogonal moment conditions from DML with random forests, creating orthogonal random forests that are robust to estimation of nuisance components. This approach bridges the DML and causal forest literatures and is implemented in Microsoft's EconML package.

  253. Orben, A., & Przybylski, A. K. (2019). The Association between Adolescent Well-Being and Digital Technology Use. Nature Human Behaviour, 3(2), 173–182.

    https://doi.org/10.1038/s41562-018-0506-1

    ApplicationCited on: specification curve
    Annotation

    Orben and Przybylski applied specification curve analysis to the hotly debated question of whether digital technology use harms adolescent well-being, running over 20,000 specifications across three large datasets. They found that technology use has a negligible negative association with well-being, far smaller than commonly assumed, demonstrating how specification curve analysis can bring clarity to contested empirical questions by mapping the full space of defensible analytical choices.

  254. Oster, E. (2019). Unobservable Selection and Coefficient Stability: Theory and Evidence. Journal of Business & Economic Statistics, 37(2), 187–204.

    https://doi.org/10.1080/07350015.2016.1227711

    FoundationalCited on: sensitivity analysis
    Annotation

    Oster extended the Altonji, Elder, and Taber approach to assess the robustness of regression estimates to omitted variable bias. She proposed a bounding method based on the proportional selection assumption and coefficient stability across specifications, now widely used in applied economics.

  255. Palepu, K. G. (1986). Predicting Takeover Targets: A Methodological and Empirical Analysis. Journal of Accounting and Economics, 8(1), 3–35.

    https://doi.org/10.1016/0165-4101(86)90008-X

    ApplicationCited on: logit probit
    Annotation

    Palepu used logit models to predict which firms would become takeover targets based on financial and market characteristics. This influential paper demonstrated the practical application of binary choice models to corporate strategy and governance questions.

  256. Pearl, J. (2001). Direct and Indirect Effects. Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence, 411–420.

    FoundationalCited on: causal mediation analysis
    Annotation

    Pearl formalized the concepts of natural direct and indirect effects using structural causal models and do-calculus. This paper established the nonparametric identification conditions for mediation effects and showed that traditional mediation analysis conflates causal and non-causal pathways.

  257. Pearl, J. (2009). Causality: Models, Reasoning, and Inference. Cambridge University Press.

    https://doi.org/10.1017/CBO9780511803161

    FoundationalCited on: dags for beginners
    Annotation

    The foundational textbook on causal inference using directed acyclic graphs, structural causal models, and the do-calculus.

  258. Pearl, J. (2014). Interpretation and Identification of Causal Mediation. Psychological Methods, 19(4), 459–481.

    https://doi.org/10.1037/a0036434

    FoundationalCited on: causal mediation analysis
    Annotation

    Pearl provided a structural causal model perspective on mediation, clarifying the interpretation and identification of natural direct and indirect effects. He showed how graphical criteria can determine when mediation effects are identifiable and contrasted the structural approach with the potential outcomes framework used by Imai, Keele, and Tingley.

  259. Peterson, M. F., Arregle, J.-L., & Martin, X. (2012). Multilevel Models in International Business Research. Journal of International Business Studies, 43(5), 451–457.

    https://doi.org/10.1057/jibs.2011.59

    ApplicationManagement journalCited on: random effects
    Annotation

    This editorial reviewed the use of multilevel random-effects models in international business research, where firms are nested within countries. It discussed best practices for modeling cross-level effects and the importance of accounting for the hierarchical structure of international data.

  260. Porreca, Z. (2022). Synthetic Difference-in-Differences Estimation with Staggered Treatment Timing. Economics Letters, 220, 110874.

    https://doi.org/10.1016/j.econlet.2022.110874

    Annotation

    Porreca extended the synthetic DID estimator to staggered treatment adoption settings, where multiple units adopt treatment at different times. The method constructs a localized estimator in which treated units are compared to a never-treated control group weighted on both the time and unit dimensions.

  261. Preacher, K. J., & Hayes, A. F. (2008). Asymptotic and Resampling Strategies for Assessing and Comparing Indirect Effects in Multiple Mediator Models. Behavior Research Methods, 40(3), 879–891.

    https://doi.org/10.3758/BRM.40.3.879

    ApplicationCited on: causal mediation analysis
    Annotation

    Preacher and Hayes developed methods and software for testing indirect effects through multiple mediators simultaneously, using bootstrapping to construct confidence intervals. Their approach and accompanying SPSS and SAS macros became extremely widely used in psychology and management research.
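The core of the procedure, estimating a (the X-to-M path) and b (the M-to-Y path controlling for X) and then bootstrapping the product a*b, fits in a short numpy sketch. This is a single-mediator toy version on simulated data, not their SPSS/SAS macros:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 400
x = rng.normal(size=n)
m = 0.5 * x + rng.normal(size=n)              # mediator (true a = 0.5)
y = 0.4 * m + 0.2 * x + rng.normal(size=n)    # outcome (true b = 0.4)

def indirect(x, m, y):
    a = np.polyfit(x, m, 1)[0]                # slope of M on X
    Xmat = np.column_stack([np.ones(len(x)), m, x])
    b = np.linalg.lstsq(Xmat, y, rcond=None)[0][1]  # slope of Y on M given X
    return a * b

boots = np.empty(2000)
for i in range(2000):                          # nonparametric bootstrap resamples
    idx = rng.integers(0, n, n)
    boots[i] = indirect(x[idx], m[idx], y[idx])

ci_low, ci_high = np.percentile(boots, [2.5, 97.5])
significant = not (ci_low <= 0 <= ci_high)     # CI excluding zero -> indirect effect
```

The percentile interval here recovers the true indirect effect of 0.5 * 0.4 = 0.2 without assuming normality of a*b, which is the advantage over the Sobel test.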

  262. Rabe-Hesketh, S., & Skrondal, A. (2012). Multilevel and Longitudinal Modeling Using Stata (3rd ed.). Stata Press.

    SurveyCited on: random effects
    Annotation

    A comprehensive practical guide to multilevel (hierarchical) models in Stata, which generalize the random effects framework to more complex nested data structures. Essential reference for applied researchers implementing multilevel models.

  263. Rambachan, A., & Roth, J. (2023). A More Credible Approach to Parallel Trends. Review of Economic Studies, 90(5), 2555–2591.

    https://doi.org/10.1093/restud/rdad018

    FoundationalCited on: event studies
    Annotation

    Rambachan and Roth developed a sensitivity analysis framework for assessing the robustness of event-study and difference-in-differences estimates to violations of the parallel trends assumption. Their approach constructs honest confidence intervals under restrictions on how pre-trends can extrapolate into the post-treatment period, providing a disciplined alternative to informal pre-trend tests.

  264. Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical Linear Models: Applications and Data Analysis Methods. Sage Publications.

    ApplicationCited on: random effects
    Annotation

    This influential textbook popularized hierarchical linear models (HLM), which are random-effects models for nested data structures such as students within schools. It became the standard reference for multilevel modeling in education, psychology, and organizational research.

  265. Robins, J. M., & Greenland, S. (1992). Identifiability and Exchangeability for Direct and Indirect Effects. Epidemiology, 3(2), 143–155.

    https://doi.org/10.1097/00001648-199203000-00013

    FoundationalCited on: causal mediation analysis
    Annotation

    Robins and Greenland provided early formal conditions for identifying direct and indirect causal effects in epidemiology. Their work on controlled direct effects and the assumptions required for mediation analysis laid important groundwork for the modern causal mediation literature.

  266. Robins, J. M., Rotnitzky, A., & Zhao, L. P. (1994). Estimation of Regression Coefficients When Some Regressors Are Not Always Observed. Journal of the American Statistical Association, 89(427), 846–866.

    https://doi.org/10.1080/01621459.1994.10476818

    FoundationalCited on: doubly robust estimation
    Annotation

    This paper introduced the augmented inverse probability weighting (AIPW) estimator, which combines outcome modeling and propensity score weighting. The key insight is that the estimator is consistent if either the outcome model or the propensity score model is correctly specified, providing a double layer of protection against misspecification.
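The double-protection property is easy to demonstrate: below, the outcome model is deliberately misspecified (intercept only) while the propensity score is correct, and the AIPW estimate still recovers the true effect where the naive contrast does not. A numpy sketch on simulated data (our own example; the true propensity is used for clarity):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 20_000
x = rng.normal(size=n)
e = 1 / (1 + np.exp(-x))                 # propensity score (known here)
d = (rng.random(n) < e).astype(float)
y = x + 1.5 * d + rng.normal(size=n)     # true ATE = 1.5; x confounds d and y

# Deliberately WRONG outcome model: intercept-only predictions per arm
mu1_hat = np.full(n, y[d == 1].mean())
mu0_hat = np.full(n, y[d == 0].mean())

# AIPW score: outcome-model contrast plus inverse-probability-weighted residuals
psi = (mu1_hat - mu0_hat
       + d * (y - mu1_hat) / e
       - (1 - d) * (y - mu0_hat) / (1 - e))
ate_aipw = psi.mean()

# Naive difference in means is biased upward by confounding through x
ate_naive = y[d == 1].mean() - y[d == 0].mean()
```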

  267. Robinson, P. M. (1988). Root-N-Consistent Semiparametric Regression. Econometrica, 56(4), 931–954.

    https://doi.org/10.2307/1912705

    FoundationalCited on: double debiased machine learning
    Annotation

    Robinson developed the partially linear regression estimator that achieves root-n consistency for the parametric component by partialling out nonparametric nuisance functions. This paper provided the semiparametric foundation that DML generalizes to the machine learning setting.

  268. Rohrer, J. M., Egloff, B., & Schmukle, S. C. (2017). Probing Birth-Order Effects on Narrow Traits Using Specification-Curve Analysis. Psychological Science, 28(12), 1821–1832.

    https://doi.org/10.1177/0956797617723726

    ApplicationCited on: specification curve
    Annotation

    Rohrer, Egloff, and Schmukle applied specification curve analysis to the long-debated question of whether birth order affects personality traits. By running all defensible specifications, they showed that most previously reported birth-order effects disappear, demonstrating the method's power to resolve contested empirical questions.

  269. Romano, J. P., & Wolf, M. (2005). Stepwise Multiple Testing as Formalized Data Snooping. Econometrica, 73(4), 1237–1282.

    https://doi.org/10.1111/j.1468-0262.2005.00615.x

    FoundationalCited on: multiple testing
    Annotation

    Romano and Wolf developed a stepwise multiple testing procedure that controls the family-wise error rate while being less conservative than Bonferroni by resampling from the joint distribution of test statistics. Their method accounts for the correlation structure among tests and is widely used in economics.

  270. Rosenbaum, P. R., & Rubin, D. B. (1983). The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika, 70(1), 41–55.

    https://doi.org/10.1093/biomet/70.1.41

    FoundationalCited on: matching methods
    Annotation

    This paper introduced propensity score matching. Rosenbaum and Rubin showed that instead of matching on many covariates simultaneously, it suffices to match on a single number—the propensity score (the predicted probability of treatment)—to remove selection bias under the assumption of no unobserved confounders.

  271. Rosenbaum, P. R. (2002). Observational Studies. Springer.

    https://doi.org/10.1007/978-1-4757-3692-2

    Annotation

    The definitive textbook on observational study design, covering matching, sensitivity analysis, and design principles for drawing causal inferences from non-experimental data. Rosenbaum's framework for sensitivity analysis (Rosenbaum bounds) is the standard tool for assessing how much unobserved confounding would be needed to overturn a matching-based finding.

  272. Roth, J. (2022). Pretest with Caution: Event-Study Estimates after Testing for Parallel Trends. American Economic Review: Insights, 4(3), 305–322.

    https://doi.org/10.1257/aeri.20210236

    Annotation

    Shows that the common practice of testing for parallel pre-trends and proceeding conditional on 'passing' can lead to distorted inference. Proposes honest confidence intervals that account for pre-testing. Fundamentally changes how researchers should think about event study pre-trends in DiD designs.

  273. Roth, J., Sant'Anna, P. H. C., Bilinski, A., & Poe, J. (2023). What's Trending in Difference-in-Differences? A Synthesis of the Recent Econometrics Literature. Journal of Econometrics, 235(2), 2218–2244.

    https://doi.org/10.1016/j.jeconom.2023.03.008

    Annotation

    This comprehensive survey synthesizes the explosion of recent econometric work on DID, covering staggered treatment timing, heterogeneous treatment effects, pre-trends testing, and new estimators. It is the essential starting point for understanding the modern DID literature.

  274. Rubin, D. B. (1974). Estimating Causal Effects of Treatments in Randomized and Nonrandomized Studies. Journal of Educational Psychology, 66(5), 688–701.

    https://doi.org/10.1037/h0037350

    FoundationalCited on: experimental design
    Annotation

    Rubin formalized the 'potential outcomes' framework that is now central to causal inference. The idea is simple but powerful: each unit has a potential outcome under treatment and under control, and the causal effect is the difference. This paper is the origin of what is now called the Rubin Causal Model.

  275. Sant'Anna, P. H. C., & Zhao, J. (2020). Doubly Robust Difference-in-Differences Estimators. Journal of Econometrics, 219(1), 101–122.

    https://doi.org/10.1016/j.jeconom.2020.06.003

    FoundationalCited on: doubly robust estimation
    Annotation

    Sant'Anna and Zhao developed doubly robust DID estimators that combine outcome regression and inverse probability weighting. The estimator is consistent for the ATT if either the outcome evolution model or the propensity score model for treatment group membership is correctly specified.

  276. Scharfstein, D. O., Rotnitzky, A., & Robins, J. M. (1999). Adjusting for Nonignorable Drop-Out Using Semiparametric Nonresponse Models. Journal of the American Statistical Association, 94(448), 1096–1120.

    https://doi.org/10.1080/01621459.1999.10473862

    FoundationalCited on: doubly robust estimation
    Annotation

    Scharfstein, Rotnitzky, and Robins extended the doubly robust framework to handle nonignorable missing data and dropout in longitudinal studies. This paper further developed the semiparametric efficiency theory underlying doubly robust estimation.

  277. Schmidheiny, K., & Siegloch, S. (2023). On Event Studies and Distributed-Lags in Two-Way Fixed Effects Models: Identification, Equivalence, and Generalization. Journal of Applied Econometrics.

    https://doi.org/10.1002/jae.2971

    FoundationalCited on: event studies
    Annotation

    Clarified the relationship between event study designs and distributed-lag models in TWFE settings. Showed their equivalence and provided guidance on normalization, binning endpoints, and identification.

  278. Schunck, R. (2013). Within and Between Effects in Fixed and Random Effects Models: Advantages and Caveats. Methods, Data, Analyses.

    https://doi.org/10.12758/mda.2013.011

    SurveyCited on: random effects
    Annotation

    A clear pedagogical comparison of FE and RE estimators, emphasizing that the hybrid/within-between RE model can simultaneously estimate within and between effects while testing the RE assumption.

  279. Semadeni, M., Withers, M. C., & Certo, S. T. (2014). The Perils of Endogeneity and Instrumental Variables in Strategy Research: Understanding through Simulations. Strategic Management Journal, 35(7), 1070–1079.

    https://doi.org/10.1002/smj.2136

    ApplicationManagement journalCited on: instrumental variables
    Annotation

    This paper used Monte Carlo simulations to demonstrate the dangers of using weak or invalid instruments in strategy research. It provides practical guidance for management scholars on when and how to use IV, and when it may do more harm than good.

  280. Semenova, V., & Chernozhukov, V. (2021). Debiased Machine Learning of Conditional Average Treatment Effects and Other Causal Functions. Econometrics Journal, 24(2), 264–289.

    https://doi.org/10.1093/ectj/utaa027

    FoundationalCited on: double debiased machine learning
    Annotation

    Semenova and Chernozhukov extended DML to estimate conditional average treatment effects (CATEs) and other causal functions, allowing researchers to characterize treatment effect heterogeneity. They provided inference methods for projections of the CATE onto interpretable subgroups.

  281. Semenova, V. (2025). Generalized Lee Bounds. Journal of Econometrics, 252, 106119.

    https://doi.org/10.1016/j.jeconom.2025.106119

    ApplicationCited on: lee bounds
    Annotation

    Semenova generalized Lee bounds to allow for covariates and machine learning estimation of nuisance functions, improving the tightness of bounds while maintaining their nonparametric validity. This paper connects the Lee bounds literature to the modern machine learning causal inference literature.

  282. Shipman, J. E., Swanquist, Q. T., & Whited, R. L. (2017). Propensity Score Matching in Accounting Research. The Accounting Review, 92(1), 213–244.

    https://doi.org/10.2308/accr-51449

    SurveyCited on: matching methods
    Annotation

    This paper reviews how propensity score matching has been used (and sometimes misused) in accounting research. It provides practical guidelines on common pitfalls such as matching on post-treatment variables, inadequate balance checks, and ignoring the unconfoundedness assumption.

  283. Silva, J. M. C. S., & Tenreyro, S. (2006). The Log of Gravity. Review of Economics and Statistics, 88(4), 641–658.

    https://doi.org/10.1162/rest.88.4.641

    FoundationalCited on: poisson negative binomial
    Annotation

    Silva and Tenreyro demonstrated that OLS estimation of log-linearized gravity models produces inconsistent estimates in the presence of heteroskedasticity. They showed that Poisson pseudo-maximum-likelihood (PPML) provides consistent estimates and naturally handles zero trade flows, transforming the trade literature.
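Poisson pseudo-maximum-likelihood requires only that the conditional mean exp(x'b) be correctly specified, and the estimator is a short Newton (IRLS) loop. A numpy sketch on simulated count data with zeros (our example, not a gravity dataset):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 3000
x = rng.normal(size=n)
y = rng.poisson(np.exp(0.5 + 0.8 * x)).astype(float)  # counts, including zeros

X = np.column_stack([np.ones(n), x])
beta = np.array([np.log(y.mean()), 0.0])   # start at the intercept-only fit
for _ in range(50):                        # Newton/IRLS iterations for the PML score
    w = np.exp(X @ beta)                   # conditional mean = working weight
    step = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (y - w))
    beta = beta + step
    if np.max(np.abs(step)) < 1e-10:
        break
```

Unlike a log-linear OLS fit, no observation has to be dropped or transformed when y = 0, which is the practical point of the paper.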

  284. Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant. Psychological Science, 22(11), 1359–1366.

    https://doi.org/10.1177/0956797611417632

    FoundationalCited on: pre registration
    Annotation

    Simmons, Nelson, and Simonsohn demonstrated how researcher degrees of freedom in data collection and analysis can inflate false-positive rates dramatically. Their paper, which proposed disclosure requirements and pre-registration as solutions, was one of the catalysts for the replication crisis and pre-registration movement.

  285. Simonsohn, U., Simmons, J. P., & Nelson, L. D. (2020). Specification Curve Analysis. Nature Human Behaviour, 4(11), 1208–1214.

    https://doi.org/10.1038/s41562-020-0912-z

    FoundationalCited on: specification curve
    Annotation

    Simonsohn, Simmons, and Nelson introduced specification curve analysis, which systematically runs all reasonable specifications of a model and displays the distribution of estimates. This approach replaces selective reporting of specifications with a comprehensive view of how results depend on analytical choices.
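The mechanics are simple to sketch: enumerate every defensible specification, estimate each one, and inspect the distribution of the focal coefficient rather than a single hand-picked estimate. A toy numpy version over all 2^3 subsets of three optional controls (our example):

```python
import itertools
import numpy as np

rng = np.random.default_rng(10)
n = 2000
controls = rng.normal(size=(n, 3))            # three optional control variables
d = rng.normal(size=n)                         # focal regressor
y = 0.3 * d + controls @ np.array([0.5, -0.2, 0.1]) + rng.normal(size=n)

# Run every specification and record the focal coefficient
estimates = []
for k in range(4):
    for subset in itertools.combinations(range(3), k):
        X = np.column_stack([np.ones(n), d] + [controls[:, j] for j in subset])
        estimates.append(np.linalg.lstsq(X, y, rcond=None)[0][1])
estimates = np.sort(np.array(estimates))       # the "specification curve"

n_specs = len(estimates)                       # 2^3 = 8 specifications
median_est = np.median(estimates)
```

In a real application the sorted estimates are plotted with markers for each analytical choice; here the focal regressor is independent of the controls, so all eight estimates cluster around the true 0.3.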

  286. Singh, J., & Agrawal, A. (2011). Recruiting for Ideas: How Firms Exploit the Prior Inventions of New Hires. Management Science, 57(1), 129–150.

    https://doi.org/10.1287/mnsc.1100.1253

    ApplicationManagement journalCited on: poisson negative binomial
    Annotation

    Singh and Agrawal used negative binomial regression to study how hiring inventors affects the knowledge flows to the hiring firm, as measured by citation counts. This paper demonstrates the application of count models to questions of knowledge transfer and human capital in organizations.

  287. Staiger, D., & Stock, J. H. (1997). Instrumental Variables Regression with Weak Instruments. Econometrica, 65(3), 557–586.

    https://doi.org/10.2307/2171753

    FoundationalCited on: instrumental variables
    Annotation

    Staiger and Stock showed formally that when instruments are weak, 2SLS estimates are biased toward OLS and standard inference breaks down. This paper established the theoretical foundations for the weak instruments problem that Stock and Yogo (2005) later provided practical tests for.

  288. Steegen, S., Tuerlinckx, F., Gelman, A., & Vanpaemel, W. (2016). Increasing Transparency Through a Multiverse Analysis. Perspectives on Psychological Science, 11(5), 702–712.

    https://doi.org/10.1177/1745691616658637

    Foundational · Cited on: specification curve
    Annotation

    Steegen and colleagues introduced multiverse analysis, which examines how results vary across the full set of defensible data processing and analytical decisions. This approach is closely related to specification curve analysis and emphasizes transparency about the garden of forking paths in data analysis.

  289. Stock, J. H., Wright, J. H., & Yogo, M. (2002). A Survey of Weak Instruments and Weak Identification in Generalized Method of Moments. Journal of Business & Economic Statistics, 20(4), 518–529.

    https://doi.org/10.1198/073500102288618658

    Survey · Cited on: instrumental variables
    Annotation

    A comprehensive treatment of weak instruments and their consequences for inference in IV and GMM settings. Covers the theoretical foundations of the weak instrument problem and practical diagnostic tools.

  290. Stock, J. H., & Yogo, M. (2005). Testing for Weak Instruments in Linear IV Regression. Identification and Inference for Econometric Models: Essays in Honor of Thomas Rothenberg, 80–108.

    https://doi.org/10.1017/CBO9780511614491.006

    Foundational · Cited on: instrumental variables
    Annotation

    Stock and Yogo developed critical values for testing whether instruments are 'weak'—that is, only weakly correlated with the endogenous variable. Their rule of thumb that the first-stage F-statistic should exceed 10 is probably the most widely used diagnostic in applied IV research.
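    The diagnostic itself is easy to compute: it is the F-statistic for the joint significance of the instruments in the first-stage regression. A minimal NumPy sketch (illustrative, not from the chapter; the data are simulated):

```python
import numpy as np

def first_stage_f(x, z):
    """First-stage F-statistic: joint test that the instrument
    coefficients are zero in a regression of the endogenous
    regressor x on a constant and the instruments z."""
    n = len(x)
    Z = np.column_stack([np.ones(n), z])
    q = Z.shape[1] - 1                        # number of instruments
    beta, *_ = np.linalg.lstsq(Z, x, rcond=None)
    rss_u = ((x - Z @ beta) ** 2).sum()       # unrestricted RSS
    rss_r = ((x - x.mean()) ** 2).sum()       # restricted: constant only
    return ((rss_r - rss_u) / q) / (rss_u / (n - Z.shape[1]))

# A strong instrument comfortably clears the F > 10 rule of thumb:
rng = np.random.default_rng(0)
z = rng.normal(size=500)
x = 0.5 * z + rng.normal(size=500)
print(first_stage_f(x, z) > 10)  # True
```

    Note that Stock and Yogo's actual critical values vary with the number of instruments and the tolerated bias; F > 10 is the popular shorthand, not the full table.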

  291. Stuart, E. A. (2010). Matching Methods for Causal Inference: A Review and a Look Forward. Statistical Science, 25(1), 1–21.

    https://doi.org/10.1214/09-STS313

    Survey · Cited on: matching methods
    Annotation

    A comprehensive review of matching methods including propensity score matching, Mahalanobis distance matching, and coarsened exact matching, with practical guidance on implementation. Provides an accessible overview of when and how to use different matching approaches.

  292. Sun, L., & Abraham, S. (2021). Estimating Dynamic Treatment Effects in Event Studies with Heterogeneous Treatment Effects. Journal of Econometrics, 225(2), 175–199.

    https://doi.org/10.1016/j.jeconom.2020.09.006

    Annotation

    Sun and Abraham showed that conventional event-study regression coefficients are contaminated by treatment effect heterogeneity across cohorts and proposed an interaction-weighted estimator that recovers clean dynamic treatment effects. This paper is the key reference for event-study plots in staggered settings.

  293. Thistlethwaite, D. L., & Campbell, D. T. (1960). Regression-Discontinuity Analysis: An Alternative to the Ex Post Facto Experiment. Journal of Educational Psychology, 51(6), 309–317.

    https://doi.org/10.1037/h0044319

    Foundational · Cited on: regression discontinuity sharp
    Annotation

    This paper introduced the regression discontinuity design. Thistlethwaite and Campbell proposed comparing units just above and just below a cutoff score to estimate causal effects, reasoning that units near the cutoff are essentially randomly assigned. The idea lay dormant for decades before being rediscovered by economists.

  294. Train, K. E. (2009). Discrete Choice Methods with Simulation. Cambridge University Press.

    https://doi.org/10.1017/CBO9780511805271

    Survey · Cited on: logit probit
    Annotation

    Train's textbook provides a comprehensive and accessible treatment of logit, probit, mixed logit, and other discrete choice models. It covers both theory and practical simulation-based estimation methods and is widely used in economics, marketing, and transportation research.

  295. Van der Klaauw, W. (2002). Estimating the Effect of Financial Aid Offers on College Enrollment: A Regression-Discontinuity Approach. International Economic Review, 43(4), 1249–1287.

    https://doi.org/10.1111/1468-2354.t01-1-00055

    Application · Cited on: regression discontinuity fuzzy
    Annotation

    Van der Klaauw applied a fuzzy RDD to study how financial aid offers affect college enrollment decisions, exploiting discontinuities in an aid assignment rule where eligibility changes at GPA thresholds but compliance is imperfect. This paper is one of the earliest and most influential applications of fuzzy RDD.

  296. VanderWeele, T. J. (2015). Explanation in Causal Inference: Methods for Mediation and Interaction. Oxford University Press.

    Application · Cited on: causal mediation analysis
    Annotation

    VanderWeele's comprehensive textbook unified the causal mediation literature, covering potential outcomes and structural equation approaches, sensitivity analysis, time-varying treatments, and interaction effects. It is the standard reference for researchers conducting mediation analysis.

  297. VanderWeele, T. J. (2016). Mediation Analysis: A Practitioner's Guide. Annual Review of Public Health, 37, 17–32.

    https://doi.org/10.1146/annurev-publhealth-032315-021402

    Annotation

    VanderWeele provided an accessible practitioner-oriented guide to modern causal mediation analysis, covering the assumptions required for identification, sensitivity analysis for unmeasured confounding, and extensions to multiple mediators and interactions. This review is an excellent entry point for applied researchers seeking to move beyond the Baron-Kenny framework.

  298. VanderWeele, T. J., & Ding, P. (2017). Sensitivity Analysis in Observational Research: Introducing the E-Value. Annals of Internal Medicine, 167(4), 268–274.

    https://doi.org/10.7326/M16-2607

    Application · Cited on: sensitivity analysis
    Annotation

    VanderWeele and Ding introduced the E-value, a simple and intuitive measure of the minimum strength of association that an unmeasured confounder would need to have with both the treatment and outcome to fully explain away an observed treatment-outcome association. The E-value has been widely adopted in epidemiology and social science.
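    For a risk ratio RR ≥ 1, the E-value has the closed form RR + sqrt(RR × (RR − 1)); protective estimates are handled by first inverting the ratio. A minimal sketch (illustrative code, not from the paper):

```python
import math

def e_value(rr):
    """E-value for an observed risk ratio (VanderWeele & Ding 2017):
    the minimum risk-ratio association an unmeasured confounder would
    need with both treatment and outcome to explain the estimate away."""
    if rr < 1:
        rr = 1.0 / rr          # protective effects: apply to the inverse
    return rr + math.sqrt(rr * (rr - 1))

# An observed risk ratio of 2 yields an E-value of about 3.41: a confounder
# would need risk-ratio associations of at least 3.41 with both treatment
# and outcome to fully account for the association.
print(round(e_value(2.0), 2))  # 3.41
```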

  299. Villalonga, B., & Amit, R. (2006). How Do Family Ownership, Control and Management Affect Firm Value? Journal of Financial Economics, 80(2), 385–417.

    https://doi.org/10.1016/j.jfineco.2004.12.005

    Application · Cited on: matching methods
    Annotation

    This paper studied how different forms of family involvement in firms affect value, using matching and regression methods to compare family and non-family firms. It illustrates how matching can help address selection issues in corporate governance research.

  300. Wager, S., & Athey, S. (2018). Estimation and Inference of Heterogeneous Treatment Effects Using Random Forests. Journal of the American Statistical Association, 113(523), 1228–1242.

    https://doi.org/10.1080/01621459.2017.1319839

    Foundational · Cited on: causal forests
    Annotation

    Wager and Athey developed causal forests by extending random forests to estimate conditional average treatment effects. They proved pointwise consistency and asymptotic normality under regularity conditions, enabling valid confidence intervals for individualized treatment effect estimates.

  301. Westfall, P. H., & Young, S. S. (1993). Resampling-Based Multiple Testing: Examples and Methods for p-Value Adjustment. Wiley.

    Foundational · Cited on: multiple testing
    Annotation

    Westfall and Young developed resampling-based methods for multiple testing that account for the dependence structure among test statistics. Their permutation-based step-down procedure is less conservative than Bonferroni and became a standard reference for multiple testing adjustments in applied research.

  302. White, H. (1980). A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity. Econometrica, 48(4), 817–838.

    https://doi.org/10.2307/1912934

    Foundational · Cited on: ols regression
    Annotation

    This paper introduced the now-standard 'robust standard errors' that researchers routinely use with OLS. Before White's correction, standard errors could be misleadingly small when the variance of the error term was not constant across observations. Nearly every empirical paper today uses some variant of this approach.
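    White's original correction (HC0) is the sandwich estimator (X'X)⁻¹ X' diag(eᵢ²) X (X'X)⁻¹, where the eᵢ are the OLS residuals. A minimal NumPy sketch with simulated heteroskedastic data (illustrative only; all names are invented):

```python
import numpy as np

def hc0_standard_errors(x, y):
    """OLS point estimates with White's heteroskedasticity-consistent
    (HC0) standard errors: (X'X)^{-1} X' diag(e_i^2) X (X'X)^{-1}."""
    X = np.column_stack([np.ones(len(y)), x])      # add an intercept
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y                       # OLS coefficients
    resid = y - X @ beta
    meat = X.T @ (resid[:, None] ** 2 * X)         # X' diag(e^2) X
    se = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
    return beta, se

# Simulated data where the error variance grows with |x|:
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 1.0 + 2.0 * x + rng.normal(size=1000) * (1 + np.abs(x))
beta, se = hc0_standard_errors(x, y)
print(beta.round(2), se.round(3))
```

    Later refinements (HC1–HC3) rescale the squared residuals, but the sandwich structure above is common to all of them.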

  303. Wooldridge, J. M. (1999). Distribution-Free Estimation of Some Nonlinear Panel Data Models. Journal of Econometrics, 90(1), 77–97.

    https://doi.org/10.1016/S0304-4076(98)00033-5

    Foundational · Cited on: poisson negative binomial
    Annotation

    Wooldridge showed that Poisson quasi-maximum-likelihood estimation is consistent for the conditional mean even if the data are not Poisson-distributed, as long as the mean is correctly specified. This result justifies the widespread use of Poisson regression for nonnegative outcomes that are not true counts.

  304. Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data (2nd ed.). MIT Press.

    https://doi.org/10.7551/mitpress/8548.001.0001

    Annotation

    Wooldridge's graduate textbook is the standard reference for panel data econometrics. Chapters 10–14 provide a thorough treatment of fixed effects, random effects, and related panel data methods, covering both linear and nonlinear models with careful attention to assumptions.

  305. Wooldridge, J. M. (2019). Correlated Random Effects Models with Unbalanced Panels. Journal of Econometrics, 211(1), 137–150.

    https://doi.org/10.1016/j.jeconom.2018.12.010

    Foundational · Cited on: random effects
    Annotation

    Wooldridge extended the correlated random effects (CRE) framework to handle unbalanced panels, which are the norm in applied research. This paper shows how to combine the flexibility of fixed effects with the ability to estimate effects of time-invariant variables, making the CRE approach practical for real-world datasets.

  306. Young, C., & Holsteen, K. (2017). Model Uncertainty and Robustness: A Computational Framework for Multimodel Analysis. Sociological Methods & Research, 46(1), 3–40.

    https://doi.org/10.1177/0049124115610347

    Application · Cited on: specification curve
    Annotation

    Young and Holsteen developed a computational framework for systematically exploring model uncertainty by running thousands of plausible specifications. Their approach is one of the earliest implementations of what would become known as specification curve or multiverse analysis, applied to sociological research.

  307. Young, A. (2019). Channeling Fisher: Randomization Tests and the Statistical Insignificance of Seemingly Significant Experimental Results. Quarterly Journal of Economics, 134(2), 557–598.

    https://doi.org/10.1093/qje/qjy029

    Foundational · Cited on: randomization inference
    Annotation

    Young applied randomization inference to a large sample of experimental papers published in top economics journals and found that many results that appear significant under conventional inference are insignificant under randomization tests. This paper demonstrated the practical importance of randomization inference for credible empirical research.
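    A basic Fisher randomization test is straightforward to sketch: permute the treatment labels and ask how often the permuted statistic is at least as extreme as the observed one (illustrative code, not Young's implementation; the data are simulated):

```python
import numpy as np

def randomization_p_value(y, d, n_perm=2000, seed=0):
    """Two-sided Fisher randomization test for a difference in means:
    re-randomize the treatment labels and compare the observed
    statistic against the permutation distribution."""
    rng = np.random.default_rng(seed)
    obs = y[d == 1].mean() - y[d == 0].mean()
    hits = 0
    for _ in range(n_perm):
        dp = rng.permutation(d)
        stat = y[dp == 1].mean() - y[dp == 0].mean()
        hits += abs(stat) >= abs(obs)
    return hits / n_perm

# With a genuine treatment effect the randomization p-value is tiny:
rng = np.random.default_rng(1)
d = np.repeat([0, 1], 50)
y = 2.0 * d + rng.normal(size=100)
print(randomization_p_value(y, d) < 0.05)  # True
```

    In Young's setting the permutation respects the experiment's actual assignment mechanism (e.g., stratification or clustering), which this sketch omits.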

  308. Young, A. (2022). Consistency Without Inference: Instrumental Variables in Practical Application. European Economic Review, 147, 104112.

    https://doi.org/10.1016/j.euroecorev.2022.104112

    Application · Cited on: instrumental variables
    Annotation

    A provocative assessment showing that many published IV applications have first-stage F-statistics too weak for reliable inference when examined under modern standards. Highlights the gap between theoretical requirements for valid IV and actual practice in published research.

  309. Zelner, B. A. (2009). Using Simulation to Interpret Results from Logit, Probit, and Other Nonlinear Models. Strategic Management Journal, 30(12), 1335–1348.

    https://doi.org/10.1002/smj.783

    Application · Management journal · Cited on: logit probit
    Annotation

    Zelner advocated using simulation-based approaches to interpret and present results from nonlinear models in management research. By computing predicted probabilities and marginal effects via simulation, researchers can convey substantive significance more clearly than raw coefficients.

  310. Zhao, X., Lynch, J. G., & Chen, Q. (2010). Reconsidering Baron and Kenny: Myths and Truths about Mediation Analysis. Journal of Consumer Research, 37(2), 197–206.

    https://doi.org/10.1086/651257

    Annotation

    Zhao, Lynch, and Chen provided an important critique of the Baron and Kenny mediation framework from within the consumer research literature. They argued that the 'step 1' requirement of a significant total effect is unnecessary and introduced a more sensible classification of mediation types (complementary, competitive, indirect-only, direct-only, no-effect). While still operating within the regression framework rather than the full causal framework, this paper was a significant step forward for applied researchers.

  311. Zhao, Q., Small, D. S., & Bhatt, D. L. (2019). Sensitivity Analysis for Inverse Probability Weighting Estimators via the Percentile Bootstrap. Journal of the Royal Statistical Society: Series B, 81(4), 735–761.

    https://doi.org/10.1111/rssb.12327

    Application · Cited on: doubly robust estimation
    Annotation

    Zhao, Small, and Bhatt developed sensitivity analysis tools for inverse probability weighted and doubly robust estimators, applying them to evaluate the causal effect of bariatric surgery on mortality using health-care claims data. The paper demonstrates practical use of AIPW in a medical decision-making context while addressing concerns about unobserved confounding.