MethodAtlas
380 References

Bibliography

All papers referenced across Method Atlas, formatted in APA 7th edition.

Spanning 1933–2025 · over nine decades of research
Coverage: 350 with DOI · 6 with replication package
Showing 380 of 380 references

A (48)
  1. Abadie, A., & Gardeazabal, J. (2003). The Economic Costs of Conflict: A Case Study of the Basque Country. American Economic Review, 93(1), 113–132.

    doi.org/10.1257/000282803321455188

    Foundational · on synthetic control
    terrorism · Basque-Country · economic-costs
    Annotation

    Abadie and Gardeazabal introduce the synthetic control idea in the context of estimating the economic costs of terrorism in the Basque Country. They construct a synthetic Basque Country from other Spanish regions and show that terrorism reduced GDP per capita by about 10 percentage points.

  2. Abadie, A., & Imbens, G. W. (2006). Large Sample Properties of Matching Estimators for Average Treatment Effects. Econometrica, 74(1), 235–267.

    doi.org/10.1111/j.1468-0262.2006.00655.x

    Foundational · on matching methods
    nearest-neighbor · large-sample-theory · variance-estimation
    Annotation

    Abadie and Imbens derive the large-sample properties of nearest-neighbor matching estimators, showing that such estimators are not root-N consistent in general and do not attain the semiparametric efficiency bound. Their main practical contribution is a consistent analytical variance estimator that does not require nonparametric estimation of unknown functions. Bootstrap invalidity for matching is established separately in Abadie and Imbens (2008), and the bias-corrected matching estimator is developed in Abadie and Imbens (2011).
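    The mechanics of single-nearest-neighbor matching can be sketched with simulated data (a toy illustration, not the paper's estimator in full generality: one covariate, one match per treated unit, all numbers hypothetical):

```python
import numpy as np

rng = np.random.default_rng(4)
# Hypothetical data: selection into treatment depends on a single covariate x
n = 1000
x = rng.normal(size=n)
d = rng.random(n) < 1 / (1 + np.exp(-x))         # treated units tend to have high x
y = x + 2.0 * d + rng.normal(scale=0.1, size=n)  # true treatment effect = 2
xt, yt = x[d], y[d]
xc, yc = x[~d], y[~d]

# 1-nearest-neighbor matching: pair each treated unit with its closest control on x
idx = np.abs(xt[:, None] - xc[None, :]).argmin(axis=1)
att = (yt - yc[idx]).mean()       # matching estimate of the ATT, close to 2
naive = yt.mean() - yc.mean()     # biased upward: treated units have higher x
```

    With a fixed number of matches, inference for such an estimate should rely on the Abadie–Imbens analytical variance rather than the bootstrap (see entry 3 below).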

  3. Abadie, A., & Imbens, G. W. (2008). On the Failure of the Bootstrap for Matching Estimators. Econometrica, 76(6), 1537–1557.

    doi.org/10.3982/ECTA6474

    Foundational · on matching methods
    bootstrap · matching-inference · variance-estimation
    Annotation

    Abadie and Imbens show that the standard bootstrap is inconsistent for nearest-neighbor matching estimators with a fixed number of matches, even though these estimators are asymptotically normal. Researchers should use the analytical variance estimator from Abadie and Imbens (2006) instead of bootstrapping.

  4. Abadie, A., Diamond, A., & Hainmueller, J. (2010). Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California's Tobacco Control Program. Journal of the American Statistical Association, 105(490), 493–505.

    doi.org/10.1198/jasa.2009.ap08746

    synthetic-control · tobacco-policy · California
    Annotation

    Abadie, Diamond, and Hainmueller formalize and popularize the synthetic control method, which constructs a weighted combination of control units to approximate the counterfactual for a single treated unit. The application to California's Proposition 99 tobacco control program becomes the canonical example of the method.
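    The core weight-finding step can be sketched as a constrained least-squares problem (simulated data, all values hypothetical, not the Proposition 99 application; the real method also matches pre-treatment predictors, not just outcomes):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
# Hypothetical donor pool: 10 control units observed for 40 pre-treatment periods
T0, J = 40, 10
Y0 = rng.normal(size=(T0, J))                        # control-unit outcome paths
true_w = np.array([0.6, 0.4] + [0.0] * 8)
Y1 = Y0 @ true_w + rng.normal(scale=0.01, size=T0)   # treated unit's pre-period path

# Synthetic control weights: nonnegative, sum to one, best pre-treatment fit
res = minimize(lambda w: np.sum((Y1 - Y0 @ w) ** 2),
               np.full(J, 1 / J),
               bounds=[(0, 1)] * J,
               constraints={"type": "eq", "fun": lambda w: w.sum() - 1})
w = res.x
synthetic_path = Y0 @ w   # counterfactual outcome path for the treated unit
```

    The simplex constraint (nonnegative weights summing to one) is what keeps the synthetic unit an interpolation of donors rather than an extrapolation.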

  5. Abadie, A., & Imbens, G. W. (2011). Bias-Corrected Matching Estimators for Average Treatment Effects. Journal of Business & Economic Statistics, 29(1), 1–11.

    doi.org/10.1198/jbes.2009.07333

    Foundational · on matching methods
    bias-correction · nearest-neighbor · regression-adjustment
    Annotation

    Abadie and Imbens develop bias-corrected matching estimators that adjust for the finite-sample bias inherent in nearest-neighbor matching when matching is not exact. Their bias correction uses a regression adjustment within matched pairs and has become a standard recommendation for applied researchers using matching methods.

  6. Abadie, A., Diamond, A., & Hainmueller, J. (2015). Comparative Politics and the Synthetic Control Method. American Journal of Political Science, 59(2), 495–510.

    doi.org/10.1111/ajps.12116

    Application · on synthetic control
    German-reunification · comparative-politics · permutation-test
    Annotation

    Abadie, Diamond, and Hainmueller apply the synthetic control method to estimate the economic impact of German reunification, constructing a synthetic West Germany from OECD countries. They demonstrate the method's applicability to major political events and discuss its use in comparative politics as a bridge between quantitative and qualitative approaches. The application illustrates synthetic control's value for case studies where only one unit is treated.

  7. Abadie, A., Athey, S., Imbens, G. W., & Wooldridge, J. M. (2020). Sampling-Based versus Design-Based Uncertainty in Regression Analysis. Econometrica, 88(1), 265–296.

    doi.org/10.3982/ECTA12675

    Foundational · on OLS regression
    clustering · standard-errors · inference · research-design
    Annotation

    Abadie et al. distinguish between sampling-based uncertainty (from drawing a sample from a population) and design-based uncertainty (from treatment assignment) in regression analysis. They show that conventional standard errors can be conservative when the sample includes a substantial fraction of the population, providing a rigorous framework for understanding what regression standard errors actually measure. This paper clarifies the conceptual foundations for inference in empirical work and complements their separate 2023 QJE paper on clustering.

  8. Abadie, A. (2021). Using Synthetic Controls: Feasibility, Data Requirements, and Methodological Aspects. Journal of Economic Literature, 59(2), 391–425.

    doi.org/10.1257/jel.20191450

    survey · placebo-tests · methodology
    Annotation

    Abadie provides a comprehensive methodological overview of synthetic control, covering data requirements, inference via placebo tests, extensions to multiple treated units, and common pitfalls. This paper is the authoritative practitioner's guide to the method.

  9. Abadie, A., Athey, S., Imbens, G. W., & Wooldridge, J. M. (2023). When Should You Adjust Standard Errors for Clustering? Quarterly Journal of Economics, 138(1), 1–35.

    doi.org/10.1093/qje/qjac038

    clustering · standard-errors · inference · design-based
    Annotation

    Abadie et al. provide guidance on when clustering standard errors is necessary. They show that clustering can be motivated by sampling-based uncertainty (e.g., two-stage sampling of clusters then units) or design-based uncertainty (e.g., treatment assigned at the cluster level), and that whether to cluster, and at what level, is a substantive question tied to the sampling and assignment process — not a purely mechanical rule.
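    The design-based case — treatment assigned at the cluster level — can be illustrated by computing a cluster-robust (CRV1) variance by hand (simulated data, hypothetical numbers):

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical sample: 50 clusters of 20 units; treatment assigned at cluster level
G, n = 50, 20
N = G * n
cluster = np.repeat(np.arange(G), n)
D = np.repeat(rng.integers(0, 2, G), n).astype(float)         # cluster-level treatment
y = 1.0 + 2.0 * D + np.repeat(rng.normal(size=G), n) + rng.normal(size=N)

X = np.column_stack([np.ones(N), D])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ beta

# Sandwich variance with within-cluster score sums (CRV1)
bread = np.linalg.inv(X.T @ X)
scores = [X[cluster == g].T @ e[cluster == g] for g in range(G)]
meat = sum(np.outer(s, s) for s in scores)
c = G / (G - 1) * (N - 1) / (N - X.shape[1])                  # small-sample correction
se_cluster = np.sqrt(np.diag(c * bread @ meat @ bread))
```

    Because both the cluster shock and the treatment vary only at the cluster level, the clustered standard error on `D` is far larger than an iid formula would suggest — the design, not a mechanical rule, motivates the clustering.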

  10. Abowd, J. M., Kramarz, F., & Margolis, D. N. (1999). High Wage Workers and High Wage Firms. Econometrica, 67(2), 251–333.

    doi.org/10.1111/1468-0262.00020

    Application · on fixed effects
    worker-fixed-effects · firm-fixed-effects · wage-decomposition
    Annotation

    Abowd, Kramarz, and Margolis use worker and firm fixed effects jointly to decompose wage variation into worker ability and firm pay premia in this landmark paper. The 'AKM' model has become the standard framework for studying labor market sorting, wage inequality, and the role of firms in wage-setting.

  11. Acemoglu, D., Johnson, S., & Robinson, J. A. (2001). The Colonial Origins of Comparative Development: An Empirical Investigation. American Economic Review, 91(5), 1369–1401.

    doi.org/10.1257/aer.91.5.1369

    institutions · economic-development · colonial-history
    Annotation

    Acemoglu, Johnson, and Robinson use historical settler mortality as an instrument for institutional quality to estimate the causal effect of institutions on economic development in this celebrated paper. It is one of the most influential IV applications in economics and demonstrates the creativity required to find a plausible instrument.

  12. Acharya, A., Blackwell, M., & Sen, M. (2016). Explaining Causal Findings Without Bias: Detecting and Assessing Direct Effects. American Political Science Review, 110(3), 512–529.

    doi.org/10.1017/S0003055416000216

    controlled-direct-effects · sequential-g-estimation · observational-studies · collider-bias
    Annotation

    Acharya, Blackwell, and Sen develop a sequential g-estimation approach for estimating controlled direct effects in observational studies, addressing the problem that conditioning on a post-treatment mediator can introduce collider bias. Their method is particularly useful in political science and social science settings where intermediate confounders make standard mediation analysis unreliable.

  13. Acquisti, A., & Fong, C. M. (2020). An Experiment in Hiring Discrimination via Online Social Networks. Management Science, 66(3), 1005–1024.

    doi.org/10.1287/mnsc.2018.3269

    Application (Mgmt) · on experimental design
    audit-study · discrimination · social-media · hiring · field-experiment
    Annotation

    Acquisti and Fong conduct a correspondence experiment using social media profiles to study hiring discrimination based on religion and sexual orientation. They find no significant national-level discrimination against Muslim or gay candidates, but significant anti-Muslim discrimination emerges in Republican-leaning areas. The paper illustrates how online information creates new channels for employment discrimination that vary with local attitudes.

  14. Adao, R., Kolesar, M., & Morales, E. (2019). Shift-Share Designs: Theory and Inference. Quarterly Journal of Economics, 134(4), 1949–2010.

    doi.org/10.1093/qje/qjz025

    inference · standard-errors · spatial-correlation
    Annotation

    Adao, Kolesar, and Morales show that standard errors in shift-share regressions are too small when computed with conventional clustering because residuals are correlated across regions that share similar industry compositions. They propose an inference procedure that accounts for this dependence.

  15. Aguinis, H., Beaty, J. C., Boik, R. J., & Pierce, C. A. (2005). Effect Size and Power in Assessing Moderating Effects of Categorical Variables Using Multiple Regression: A 30-Year Review. Journal of Applied Psychology, 90(1), 94–107.

    doi.org/10.1037/0021-9010.90.1.94

    Application · on power analysis
    moderation · interaction-effects · applied-psychology
    Annotation

    Aguinis, Beaty, Boik, and Pierce review 30 years of moderator analysis in applied psychology and management, finding that most studies are severely underpowered to detect interaction effects. They provide guidelines for computing power for moderated regression.

  16. Aguinis, H., Gottfredson, R. K., & Culpepper, S. A. (2013). Best-Practice Recommendations for Estimating Cross-Level Interaction Effects Using Multilevel Modeling. Journal of Management, 39(6), 1490–1528.

    doi.org/10.1177/0149206313478188

    Application (Mgmt) · on random effects
    cross-level-interactions · multilevel · best-practices
    Annotation

    Aguinis, Gottfredson, and Culpepper provide detailed guidance for management researchers on estimating cross-level interaction effects in multilevel models. They address common problems including insufficient statistical power, centering decisions, and effect size reporting that frequently lead to unreliable results in organizational research. The paper offers concrete recommendations for sample size, model specification, and interpretation that improve the credibility of multilevel interaction analyses.

  17. Aguinis, H., Edwards, J. R., & Bradley, K. J. (2017). Improving Our Understanding of Moderation and Mediation in Strategic Management Research. Organizational Research Methods, 20(4), 665–685.

    doi.org/10.1177/1094428115627498

    Application (Mgmt) · on causal mediation analysis
    management-methodology · moderation · best-practices
    Annotation

    Aguinis, Edwards, and Bradley review how mediation and moderation analyses are conducted in strategic management research and identify common errors. They provide recommendations for improving practice, including using causal mediation frameworks and proper inference procedures.

  18. Aguinis, H., Ramani, R. S., & Alabduljader, N. (2018). What You See Is What You Get? Enhancing Methodological Transparency in Management Research. Academy of Management Annals, 12(1), 83–110.

    doi.org/10.5465/annals.2016.0011

    Application (Mgmt) · on pre-registration
    management-methodology · transparency · open-science
    Annotation

    Aguinis, Ramani, and Alabduljader review methodological transparency in management research and advocate for pre-registration, open data, and open materials. They document the extent of undisclosed analytical flexibility in management studies and propose concrete steps for improvement.

  19. Ahuja, G. (2000). Collaboration Networks, Structural Holes, and Innovation: A Longitudinal Study. Administrative Science Quarterly, 45(3), 425–455.

    doi.org/10.2307/2667105

    Application (Mgmt) · on Poisson/negative binomial
    networks · structural-holes · innovation · patents · negative-binomial
    Annotation

    Ahuja uses a random effects Poisson model (following Hausman, Hall, and Griliches 1984) to model patent counts as a function of collaboration network structure in this landmark network study. He finds that direct ties and indirect ties both increase innovation, while structural holes (gaps between partners) decrease it — challenging Burt's structural holes theory in the context of innovation. The paper demonstrates the use of count models with panel data in management research, with fixed effects Poisson estimated as a robustness check.

  20. Ai, C., & Norton, E. C. (2003). Interaction Terms in Logit and Probit Models. Economics Letters, 80(1), 123–129.

    doi.org/10.1016/S0165-1765(03)00032-6

    Foundational · on logit/probit
    interaction-effects · marginal-effects · nonlinear-models
    Annotation

    Ai and Norton show that the interpretation of interaction terms in nonlinear models like logit and probit is much more complicated than in linear models. The marginal effect of an interaction is not simply the coefficient on the interaction term, a mistake that is widespread in applied research.
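    A small numeric check makes the point (hypothetical coefficients: the interaction effect is the cross-partial of the predicted probability, which differs from the interaction coefficient and varies with the covariates):

```python
import numpy as np

# Hypothetical logit model: P(y=1) = logistic(b0 + b1*x1 + b2*x2 + b12*x1*x2)
b0, b1, b2, b12 = -1.0, 0.5, 0.5, 0.25
prob = lambda x1, x2: 1 / (1 + np.exp(-(b0 + b1 * x1 + b2 * x2 + b12 * x1 * x2)))

# Interaction effect = cross-partial of the probability, via finite differences
x1, x2, h = 1.0, 1.0, 1e-4
interaction_effect = (prob(x1 + h, x2 + h) - prob(x1 + h, x2 - h)
                      - prob(x1 - h, x2 + h) + prob(x1 - h, x2 - h)) / (4 * h * h)
# At this point interaction_effect is about 0.044, far from b12 = 0.25,
# and its value (even its sign, in general) depends on where it is evaluated
```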

  21. Akerlof, G. A. (1982). Labor Contracts as Partial Gift Exchange. Quarterly Journal of Economics, 97(4), 543–569.

    doi.org/10.2307/1885099

    Annotation

    Akerlof proposes the gift exchange model of labor markets, in which firms pay above-market wages and workers reciprocate with above-minimum effort. This framework provides a behavioral foundation for efficiency wages and has been tested extensively in laboratory and field experiments.

  22. Albouy, D. Y. (2012). The Colonial Origins of Comparative Development: An Empirical Investigation: Comment. American Economic Review, 102(6), 3059–3076.

    doi.org/10.1257/aer.102.6.3059

    instrument-validity · replication · colonial-origins · sensitivity
    Annotation

    Albouy critically re-examines the settler mortality instrument used in Acemoglu et al. (2001), showing that the original results are sensitive to data coding decisions and the sample of countries included. This comment is a cautionary tale about instrument validity and the fragility of influential IV estimates.

  23. Allison, P. D. (1999). Comparing Logit and Probit Coefficients Across Groups. Sociological Methods & Research, 28(2), 186–208.

    doi.org/10.1177/0049124199028002003

    Foundational · on logit/probit
    logit · probit · group-comparisons · coefficient-scaling
    Annotation

    Allison shows that naive comparisons of logit or probit coefficients across groups are misleading because differences in residual variation across groups rescale the coefficients. He proposes a method to adjust for this confound, which is essential for interpreting interaction effects and group comparisons in nonlinear models.

  24. Allison, P. D. (2009). Fixed Effects Regression Models. SAGE Publications.

    doi.org/10.4135/9781412993869

    fixed-vs-random · panel-data · textbook · practical-guidance
    Annotation

    Allison's concise and accessible monograph compares fixed effects and random effects models for panel data, providing practical guidance on model selection, estimation, and interpretation. It is particularly useful for social scientists seeking an intuitive understanding of when each approach is appropriate.

  25. Altonji, J. G., Elder, T. E., & Taber, C. R. (2005). Selection on Observed and Unobserved Variables: Assessing the Effectiveness of Catholic Schools. Journal of Political Economy, 113(1), 151–184.

    doi.org/10.1086/426036

    Foundational · on sensitivity analysis
    selection-on-observables · Catholic-schools · bounding
    Annotation

    Altonji, Elder, and Taber develop the idea that if selection on observables is informative about selection on unobservables, one can bound the bias from omitted variables. Their approach becomes the basis for the widely used Oster (2019) sensitivity framework.

  26. Amemiya, T. (1981). Qualitative Response Models: A Survey. Journal of Economic Literature, 19(4), 1483–1536.

    Foundational · on logit/probit
    survey · qualitative-response · maximum-likelihood
    Annotation

    Amemiya provides a comprehensive survey of qualitative response models including logit, probit, and tobit. This survey organizes the theoretical properties, estimation methods, and specification tests for binary and multinomial choice models and becomes a standard reference for applied researchers.

  27. Anderson, M. L. (2008). Multiple Inference and Gender Differences in the Effects of Early Intervention: A Reevaluation of the Abecedarian, Perry Preschool, and Early Training Projects. Journal of the American Statistical Association, 103(484), 1481–1495.

    doi.org/10.1198/016214508000000841

    Foundational · on multiple testing
    index-tests · Westfall-Young · program-evaluation
    Annotation

    Anderson proposes using summary index tests and familywise error rate corrections to address multiple inference in program evaluation. Reanalyzing the Abecedarian, Perry Preschool, and Early Training Projects, he finds that girls garner substantial short- and long-term benefits from early interventions, but there are no significant long-term benefits for boys after correcting for multiple testing.
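    A minimal version of the summary-index idea (equal weights for simplicity; Anderson's estimator instead weights components by the inverse of their covariance matrix — data here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical data: 5 related outcomes for 200 subjects, sharing a common factor
n, K = 200, 5
outcomes = rng.normal(size=(n, K)) + 0.3 * rng.normal(size=(n, 1))

# Standardize each outcome, average, then re-standardize:
# the analyst runs one test on the index instead of K separate tests
z = (outcomes - outcomes.mean(axis=0)) / outcomes.std(axis=0)
index = z.mean(axis=1)
index = (index - index.mean()) / index.std()
```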

  28. Andrews, I., Stock, J. H., & Sun, L. (2019). Weak Instruments in Instrumental Variables Regression: Theory and Practice. Annual Review of Economics, 11, 727–753.

    doi.org/10.1146/annurev-economics-080218-025643

    weak-instruments · survey · robust-inference
    Annotation

    Andrews, Stock, and Sun provide an up-to-date review of the weak instruments problem, covering modern diagnostic tests, robust inference procedures, and practical recommendations. It is an excellent starting point for understanding the current best practices in IV estimation.

  29. Angrist, J. D. (1990). Lifetime Earnings and the Vietnam Era Draft Lottery: Evidence from Social Security Administrative Records. American Economic Review, 80(3), 313–336.

    Foundational · on instrumental variables
    instrumental-variables · natural-experiment · draft-lottery · LATE
    Annotation

    Angrist uses the Vietnam-era draft lottery as a natural experiment in this landmark application of instrumental variables. He shows that randomly assigned lottery numbers provide an instrument for military service, allowing causal estimation of the earnings effect of military service.

  30. Angrist, J. D., & Krueger, A. B. (1991). Does Compulsory School Attendance Affect Schooling and Earnings? Quarterly Journal of Economics, 106(4), 979–1014.

    doi.org/10.2307/2937954

    Foundational · on instrumental variables
    returns-to-education · quarter-of-birth · compulsory-schooling
    Annotation

    Angrist and Krueger use quarter of birth as an instrument for years of schooling, exploiting the fact that compulsory schooling laws interact with birth timing. This paper is one of the most-taught examples of instrumental variables in economics and also sparks important debates about weak instruments.

  31. Angrist, J. D., Imbens, G. W., & Rubin, D. B. (1996). Identification of Causal Effects Using Instrumental Variables. Journal of the American Statistical Association, 91(434), 444–455.

    doi.org/10.1080/01621459.1996.10476902

    LATE · compliers · instrumental-variables · potential-outcomes
    Annotation

    Angrist, Imbens, and Rubin formalize the LATE framework — originally introduced in Imbens and Angrist (1994) — within the Rubin Causal Model, providing a detailed treatment of the assumptions required for causal interpretation of IV estimates. This paper introduces the complier taxonomy (always-takers, never-takers, compliers, defiers) that is now standard in the IV literature. The practical implication is that IV estimates should be interpreted as local to the complier subpopulation, not as average effects for the entire population.
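    The complier logic shows up directly in a simulated Wald estimate (all numbers hypothetical: a randomized binary instrument, 60% compliers, the rest never-takers):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000
z = rng.integers(0, 2, n)                # randomized binary instrument (e.g., lottery)
complier = rng.random(n) < 0.6           # compliers take treatment iff encouraged
d = (z == 1) & complier                  # take-up; never-takers ignore z
y = 1.0 + 2.0 * d + rng.normal(size=n)   # effect of 2 for those actually treated

# Wald / IV estimate: reduced-form contrast divided by first-stage contrast
late = (y[z == 1].mean() - y[z == 0].mean()) / (d[z == 1].mean() - d[z == 0].mean())
# late recovers roughly 2 — the average effect for compliers, not for everyone
```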

  32. Angrist, J. D., & Lavy, V. (1999). Using Maimonides' Rule to Estimate the Effect of Class Size on Scholastic Achievement. Quarterly Journal of Economics, 114(2), 533–575.

    doi.org/10.1162/003355399556061

    class-size · education · Maimonides-rule
    Annotation

    Angrist and Lavy exploit a rule that caps class sizes at 40 students, creating discontinuities in class size as enrollment crosses multiples of 40. The imperfect compliance with the rule makes this a fuzzy RDD. This paper is one of the most widely taught examples of the fuzzy RDD approach.

  33. Angrist, J. D., & Krueger, A. B. (2001). Instrumental Variables and the Search for Identification: From Supply and Demand to Natural Experiments. Journal of Economic Perspectives, 15(4), 69–85.

    doi.org/10.1257/jep.15.4.69

    history-of-IV · natural-experiments · supply-and-demand · identification
    Annotation

    Angrist and Krueger trace the evolution of IV from its origins in supply-and-demand estimation to modern natural experiments in this historical survey. They provide valuable context for understanding how IV methodology developed and why it becomes central to applied economics.

  34. Angrist, J. D., Chernozhukov, V., & Fernandez-Val, I. (2006). Quantile Regression under Misspecification, with an Application to the U.S. Wage Structure. Econometrica, 74(2), 539–563.

    doi.org/10.1111/j.1468-0262.2006.00671.x

    application · wage-structure · returns-to-education
    Annotation

    Angrist, Chernozhukov, and Fernandez-Val study quantile regression under misspecification, showing that QR coefficients minimize a weighted mean-squared specification-error loss and deriving an omitted-variable-bias formula for quantile regression. Applying this framework to U.S. Census wage data, they document continued residual inequality growth in the 1990s, primarily in the upper half of the distribution.

  35. Angrist, J., Bettinger, E., & Kremer, M. (2006). Long-Term Educational Consequences of Secondary School Vouchers: Evidence from Administrative Records in Colombia. American Economic Review, 96(3), 847–862.

    doi.org/10.1257/aer.96.3.847

    Application · on Lee bounds
    school-vouchers · attrition · Colombia
    Annotation

    Angrist, Bettinger, and Kremer use administrative records to study the long-term effects of Colombia's PACES school voucher lottery, finding that vouchers increase secondary school completion rates by 15-20% and raise college admissions test scores by 0.2 standard deviations. They correct for differential test-taking rates between lottery winners and losers using bounding methods. The paper demonstrates how administrative data and lottery-based instruments enable credible long-term policy evaluation.

  36. Angrist, J. D., & Pischke, J.-S. (2009). Mostly Harmless Econometrics: An Empiricist's Companion. Princeton University Press.

    doi.org/10.1515/9781400829828

    textbook · causal-inference · design-based · credibility-revolution
    Annotation

    Angrist and Pischke write one of the most influential modern textbooks on applied econometrics, organizing the field around a design-based approach to causal inference. The book provides essential treatments of instrumental variables, difference-in-differences, and regression discontinuity, each grounded in the potential outcomes framework. It remains the standard reference for graduate students learning to evaluate and implement identification strategies.

  37. Angrist, J. D., & Pischke, J.-S. (2010). The Credibility Revolution in Empirical Economics: How Better Research Design Is Taking the Con out of Econometrics. Journal of Economic Perspectives, 24(2), 3–30.

    doi.org/10.1257/jep.24.2.3

    credibility-revolution · research-design · causal-inference · methodology
    Annotation

    Angrist and Pischke provide the intellectual context for why applied economics moved from 'throw variables into OLS and see what sticks' to design-based causal inference. They help researchers understand where OLS fits in the larger methodological landscape and why credible identification strategies matter.

  38. Arkhangelsky, D., Athey, S., Hirshberg, D. A., Imbens, G. W., & Wager, S. (2021). Synthetic Difference-in-Differences. American Economic Review, 111(12), 4088–4118.

    doi.org/10.1257/aer.20190159

    synthetic-DID · unit-weights · time-weights
    Annotation

    Arkhangelsky et al. introduce the synthetic difference-in-differences estimator, which combines the strengths of DID (parallel trends assumption) and synthetic control (re-weighting to improve pre-treatment fit). The method uses both unit weights and time weights to construct a more credible counterfactual, and provides valid inference without requiring a large donor pool.

  39. Arkhangelsky, D., & Imbens, G. W. (2022). Doubly Robust Identification for Causal Panel Data Models. Econometrics Journal, 25(3), 649–674.

    doi.org/10.1093/ectj/utac019

    doubly-robust · causal-panel-data · SDID-extension
    Annotation

    Arkhangelsky and Imbens develop doubly robust identification strategies for causal panel data models, combining outcome modeling with re-weighting to provide consistent estimates if either the outcome model or the weighting scheme is correctly specified. The framework is broader than synthetic DID specifically but directly relevant to it, strengthening the theoretical foundations for panel-data treatment effect estimation.

  40. Ashenfelter, O. (1978). Estimating the Effect of Training Programs on Earnings. Review of Economics and Statistics, 60(1), 47–57.

    doi.org/10.2307/1924332

    training-programs · earnings · early-DID
    Annotation

    Ashenfelter provides one of the earliest applications of the difference-in-differences logic, comparing the earnings of trainees before and after a job training program to a comparison group. The key insight is that differencing removes time-invariant unobserved differences between treatment and control groups. This paper also documents the 'Ashenfelter dip' — the pre-program earnings decline among trainees — which becomes a canonical example of why parallel trends cannot be taken for granted.
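    The 2×2 differencing logic reduces to a few lines (earnings figures hypothetical):

```python
# Hypothetical group-period mean earnings (thousands of dollars)
earnings = {
    ("trainees", "pre"): 8.0,   ("trainees", "post"): 10.5,
    ("comparison", "pre"): 9.0, ("comparison", "post"): 10.0,
}

# Each group's change over time; differencing the changes removes
# any fixed (time-invariant) gap between the groups
change_treated = earnings[("trainees", "post")] - earnings[("trainees", "pre")]
change_control = earnings[("comparison", "post")] - earnings[("comparison", "pre")]
did = change_treated - change_control   # = 1.5
```

    The Ashenfelter dip is exactly the case where this logic fails: if trainee earnings dip just before the program, the pre-period level is not a stable baseline and the differenced estimate is contaminated.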

  41. Athey, S., & Imbens, G. W. (2016). Recursive Partitioning for Heterogeneous Causal Effects. Proceedings of the National Academy of Sciences, 113(27), 7353–7360.

    doi.org/10.1073/pnas.1510489113

    Foundational · on causal forests
    causal-trees · honest-estimation · heterogeneous-effects
    Annotation

    Athey and Imbens introduce causal trees, adapting the CART algorithm to estimate heterogeneous treatment effects with valid inference. They propose the honest estimation approach, where one subsample is used for tree construction and another for estimation, ensuring valid confidence intervals.

  42. Athey, S., & Imbens, G. W. (2017). The Econometrics of Randomized Experiments. Handbook of Economic Field Experiments, 1, 73–140.

    doi.org/10.1016/bs.hefe.2016.10.003

    field-experiments · randomization-inference · design
    Annotation

    Athey and Imbens provide a modern, rigorous treatment of the econometrics behind randomized experiments. They cover design, analysis, and inference issues such as stratification, clustering, and multiple hypothesis testing. It is an excellent reference for researchers running field experiments.

  43. Athey, S., & Imbens, G. W. (2019). Machine Learning Methods That Economists Should Know About. Annual Review of Economics, 11, 685–725.

    doi.org/10.1146/annurev-economics-080217-053433

    survey · machine-learning · economics
    Annotation

    Athey and Imbens provide a broad survey of machine learning methods relevant to economists, covering supervised learning, unsupervised learning, matrix completion, and methods at the intersection of ML and causal inference including DML and causal forests. The paper explains when and why machine learning methods can improve both prediction and causal inference in economics. It serves as an accessible entry point for applied researchers seeking to understand the full landscape of ML tools available for economic applications.

  44. Athey, S., Tibshirani, J., & Wager, S. (2019). Generalized Random Forests. Annals of Statistics, 47(2), 1148–1178.

    doi.org/10.1214/18-AOS1709

    generalized-random-forests · estimating-equations · grf-package
    Annotation

    Athey, Tibshirani, and Wager introduce the generalized random forest (GRF) framework, which extends causal forests to a broad class of estimating equations including quantile regression, IV, and local average treatment effects. GRF provides the theoretical foundation and the widely used grf R package.

  45. Autor, D. H. (2003). Outsourcing at Will: The Contribution of Unjust Dismissal Doctrine to the Growth of Employment Outsourcing. Journal of Labor Economics, 21(1), 1–42.

    doi.org/10.1086/344122

    employment-law · outsourcing · staggered-adoption
    Annotation

    Autor uses a DID design that exploits the staggered adoption of wrongful-discharge protections across U.S. states. He finds that stronger employment protections led firms to outsource more jobs. This paper is a model for using staggered state-level policy changes in a DID framework.

  46. Autor, D. H., Dorn, D., & Hanson, G. H. (2013). The China Syndrome: Local Labor Market Effects of Import Competition in the United States. American Economic Review, 103(6), 2121–2168.

    doi.org/10.1257/aer.103.6.2121

    China-shock · trade · labor-markets
    Annotation

    Autor, Dorn, and Hanson use a shift-share instrument to study how Chinese import competition affected U.S. local labor markets, instrumenting U.S. import exposure with Chinese exports to other high-income countries. This paper is one of the most influential and widely discussed shift-share applications.

  47. Azoulay, P., Graff Zivin, J. S., & Wang, J. (2010). Superstar Extinction. Quarterly Journal of Economics, 125(2), 549–589.

    doi.org/10.1162/qjec.2010.125.2.549

    Application · on matching methods
    superstar-scientists · collaboration · innovation · science-of-science
    Annotation

    Azoulay and coauthors exploit the premature and unexpected deaths of 112 academic superstars as a natural experiment, using coarsened exact matching to construct a control group of comparable collaborators. They find that the death of a superstar leads to a lasting 5-8% decline in the quality-adjusted publication rates of their collaborators, with spillovers circumscribed in idea space but less so in physical or social space. This study is an elegant application of a natural experiment combined with matching in the economics of science and innovation.

  48. Azoulay, P., Stuart, T., & Wang, Y. (2014). Matthew: Effect or Fable? Management Science, 60(1), 92–109.

    doi.org/10.1287/mnsc.2013.1755

    Application (Mgmt) · on matching methods
    matching · coarsened-exact-matching · Matthew-effect · cumulative-advantage · science-of-science
    Annotation

    Azoulay, Stuart, and Wang investigate whether mid-career recognition (Howard Hughes Medical Institute appointment) creates a cumulative advantage or 'Matthew effect' in science. They use coarsened exact matching to construct a comparison group of equally productive scientists, addressing the selection problem inherent in studying prestigious awards. The study finds a small, short-lived citation boost to papers published before HHMI appointment, suggesting a status or halo effect on pre-existing work rather than a sustained productivity advantage.

B
36
  1. Bach, P., Chernozhukov, V., Kurz, M. S., & Spindler, M. (2022). DoubleML – An Object-Oriented Implementation of Double Machine Learning in Python. Journal of Machine Learning Research, 23(53), 1–6.

    softwarePythonRimplementation
    Annotation

    Bach and colleagues develop the DoubleML Python package, providing a user-friendly object-oriented implementation of the DML framework. The package supports partially linear, interactive, and instrumental variable models with a variety of machine learning methods for nuisance estimation. A companion R package is described separately.

  2. Baker, A. C., Larcker, D. F., & Wang, C. C. Y. (2022). How Much Should We Trust Staggered Difference-in-Differences Estimates? Journal of Financial Economics, 144(2), 370–395.

    doi.org/10.1016/j.jfineco.2022.01.004

    financereplicationTWFE-bias
    Annotation

    Baker, Larcker, and Wang demonstrate that the staggered DID problems identified in the econometrics literature are empirically relevant in finance research. They re-analyze prominent finance studies and show that results can change substantially once robust estimators are used.

  3. Baltagi, B. H. (2021). Econometric Analysis of Panel Data (6th ed.). Springer.

    doi.org/10.1007/978-3-030-53953-5

    textbookpanel-dataerror-componentsdynamic-panels
    Annotation

    Baltagi provides the standard graduate-level textbook on panel data econometrics, covering fixed effects, random effects, error component models, and extensions to unbalanced panels and dynamic models. The book offers comprehensive treatment of both the theoretical foundations of panel data estimators and their practical implementation across statistical software. It is the primary reference for researchers who need to understand the assumptions, properties, and trade-offs of different panel data methods.

  4. Bandiera, O., Barankay, I., & Rasul, I. (2005). Social Preferences and the Response to Incentives: Evidence from Personnel Data. Quarterly Journal of Economics, 120(3), 917–962.

    doi.org/10.1093/qje/120.3.917

    Applicationon experimental design
    incentivesfield-experimentpersonnel-economics
    Annotation

    Bandiera, Barankay, and Rasul use a field experiment in a fruit-picking firm to study how switching from relative to piece-rate pay affects productivity. They demonstrate that social preferences among workers matter for incentive design, bridging experimental economics and management.

  5. Banerjee, A., Duflo, E., Goldberg, N., Karlan, D., Osei, R., Pariente, W., Shapiro, J., Thuysbaert, B., & Udry, C. (2015). A Multifaceted Program Causes Lasting Progress for the Very Poor: Evidence from Six Countries. Science, 348(6236), 1260799.

    doi.org/10.1126/science.1260799

    Applicationon experimental design
    development-economicsmulti-country-RCTpoverty
    Annotation

    Banerjee, Duflo, and colleagues conduct a large-scale RCT across six countries, demonstrating that a multifaceted anti-poverty program produces sustained economic gains for the ultra-poor. The study is notable for its multi-site design, which provides rare multi-country evidence on how the same intervention performs across diverse contexts. It demonstrates both the power of randomized evaluation at scale and the importance of bundled interventions when individual components may be insufficient.

  6. Bang, H., & Robins, J. M. (2005). Doubly Robust Estimation in Missing Data and Causal Inference Models. Biometrics, 61(4), 962–973.

    doi.org/10.1111/j.1541-0420.2005.00377.x

    double-robustnesssimulationtutorial
    Annotation

    Bang and Robins study doubly robust estimators for missing data and causal inference models, showing that the estimator remains consistent if either the outcome regression or the propensity (missingness) model is correctly specified. Their simulations clarify when the double robustness property provides meaningful protection, helping make the method accessible to applied researchers.

  7. Baron, R. M., & Kenny, D. A. (1986). The Moderator-Mediator Variable Distinction in Social Psychological Research: Conceptual, Strategic, and Statistical Considerations. Journal of Personality and Social Psychology, 51(6), 1173–1182.

    doi.org/10.1037/0022-3514.51.6.1173

    mediationmoderator-mediatorsocial-psychology
    Annotation

    Baron and Kenny introduce the widely used four-step approach to testing mediation, comparing total, direct, and indirect effects using sequential regressions. While later work has identified limitations of this approach, it remains one of the most cited papers in all of social science.

  8. Barone-Adesi, F., Gasparrini, A., Vizzini, L., Merletti, F., & Richiardi, L. (2011). Effects of Italian Smoking Regulation on Rates of Hospital Admission for Acute Coronary Events: A Country-Wide Study. PLoS ONE, 6(3), e17419.

    doi.org/10.1371/journal.pone.0017419

    Applicationon interrupted time series
    Annotation

    Barone-Adesi et al. use an interrupted time series design to estimate the effect of Italy's 2005 smoking ban on acute coronary event admissions, finding a significant reduction among those under 70 in the months following implementation.

  9. Bartik, T. J. (1991). Who Benefits from State and Local Economic Development Policies? W.E. Upjohn Institute for Employment Research.

    doi.org/10.17848/9780585223940

    Bartik-instrumentlocal-labor-marketsemployment
    Annotation

    Bartik introduces the shift-share instrument—constructing predicted local employment growth from national industry growth rates interacted with initial local industry composition. This 'Bartik instrument' has become one of the most widely used instruments in labor and urban economics.
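
    The construction lends itself to a compact sketch. The following example is illustrative only, with made-up shares and growth rates (not drawn from Bartik's data): a location's predicted employment growth is its initial industry shares weighted by national industry growth rates.

```python
import numpy as np

# Illustrative shift-share (Bartik) construction with invented numbers:
# L locations, K industries; s[l, k] is location l's initial employment
# share in industry k (each row sums to 1), g[k] is the national growth
# rate of industry k. Leave-one-out refinements are omitted for brevity.
rng = np.random.default_rng(0)
L, K = 5, 3
s = rng.dirichlet(np.ones(K), size=L)  # initial local industry shares
g = np.array([0.02, -0.01, 0.04])      # national industry growth rates

# Predicted local employment growth: z_l = sum_k s[l, k] * g[k]
z = s @ g
```

    Because each row of `s` sums to one, every predicted growth rate is a weighted average of the national rates; the instrument varies across locations only through their initial industry mix.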

  10. Battistin, E., & Rettore, E. (2008). Ineligibles and Eligible Non-Participants as a Double Comparison Group in Regression-Discontinuity Designs. Journal of Econometrics, 142(2), 715–730.

    doi.org/10.1016/j.jeconom.2007.05.006

    imperfect-compliancedouble-comparisonboundsfuzzy-RDD
    Annotation

    Battistin and Rettore propose using ineligible units and eligible non-participants as a double comparison group in regression-discontinuity designs. This specification-testing strategy allows researchers to assess the validity of RDD assumptions by checking whether the two comparison groups yield consistent estimates, strengthening the credibility of RDD-based inference.

  11. Bell, A., & Jones, K. (2015). Explaining Fixed Effects: Random Effects Modeling of Time-Series Cross-Sectional and Panel Data. Political Science Research and Methods, 3(1), 133–153.

    doi.org/10.1017/psrm.2014.7

    Foundationalon random effects
    within-betweenpanel-datamodel-choice
    Annotation

    Bell and Jones argue that the 'within-between' random-effects model (closely related to the Mundlak approach) can outperform pure fixed effects in certain settings because it allows explicit decomposition of within- and between-unit effects while accounting for unobserved heterogeneity. This approach retains the unbiasedness of the within estimator for time-varying regressors while also estimating between-unit effects that fixed effects discard. The paper provides practical guidance for researchers who need to estimate both types of effects or who have time-invariant regressors that fixed effects cannot identify.

  12. Belloni, A., Chernozhukov, V., & Hansen, C. (2014). Inference on Treatment Effects after Selection among High-Dimensional Controls. Review of Economic Studies, 81(2), 608–650.

    doi.org/10.1093/restud/rdt044

    LASSOpost-double-selectionhigh-dimensional
    Annotation

    Belloni, Chernozhukov, and Hansen introduce the post-double-selection LASSO method for inference on treatment effects with many potential controls. This paper is a key precursor to DML, demonstrating how regularized selection in both the treatment and outcome equations can yield valid inference.

  13. Ben-Michael, E., Feller, A., & Rothstein, J. (2021). The Augmented Synthetic Control Method. Journal of the American Statistical Association, 116(536), 1789–1803.

    doi.org/10.1080/01621459.2021.1929245

    augmented-SCMbias-reductiondoubly-robust
    Annotation

    Ben-Michael, Feller, and Rothstein propose augmenting the synthetic control estimator with an outcome model to reduce bias when the synthetic control does not achieve perfect pre-treatment fit. The resulting doubly robust estimator is consistent if either the outcome model or the weighting is correct, providing a practical improvement for applied synthetic control studies.

  14. Ben-Michael, E., Feller, A., & Rothstein, J. (2022). Synthetic Controls with Staggered Adoption. Journal of the Royal Statistical Society: Series B, 84(2), 351–381.

    doi.org/10.1111/rssb.12448

    staggered-adoptioncollective-bargainingeducation-policy
    Annotation

    Ben-Michael, Feller, and Rothstein extend synthetic control and synthetic DID methods to staggered adoption settings where multiple units adopt treatment at different times. They demonstrate the approach by estimating the effects of teacher collective bargaining laws on school spending across U.S. states, showing how synthetic DID-style reweighting improves counterfactual estimation when treatment rolls out over time.

  15. Benjamini, Y., & Hochberg, Y. (1995). Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society: Series B, 57(1), 289–300.

    doi.org/10.1111/j.2517-6161.1995.tb02031.x

    Foundationalon multiple testing
    FDRstep-up-procedurefalse-discovery-rate
    Annotation

    Benjamini and Hochberg introduce the false discovery rate (FDR) as an alternative to family-wise error rate control. Their step-up procedure for controlling FDR is less conservative than Bonferroni while still providing meaningful protection against false positives, and has become the standard in many fields.
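
    The step-up rule is straightforward to implement. A minimal sketch (illustrative code, not from the paper): sort the p-values, find the largest k with p_(k) <= (k/m)·q, and reject the k smallest.

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Return a boolean mask of rejected hypotheses under BH FDR control.

    Step-up rule: with m p-values sorted in ascending order, find the
    largest k such that p_(k) <= (k / m) * q, then reject the k
    hypotheses with the smallest p-values.
    """
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = q * (np.arange(1, m + 1) / m)
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])   # largest index satisfying the rule
        reject[order[: k + 1]] = True
    return reject
```

    On the p-values (0.01, 0.02, 0.03, 0.04, 0.05) with q = 0.05, the step-up procedure rejects all five hypotheses, whereas the Bonferroni threshold of 0.05/5 = 0.01 rejects only the first.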

  16. Bennedsen, M., Nielsen, K. M., Pérez-González, F., & Wolfenzon, D. (2007). Inside the Family Firm: The Role of Families in Succession Decisions and Performance. Quarterly Journal of Economics, 122(2), 647–691.

    doi.org/10.1162/qjec.122.2.647

    corporate-governanceCEO-successionnatural-experimentfamily-firms
    Annotation

    Bennedsen et al. use the gender of the controlling family's firstborn child as an instrument for whether the successor CEO is a family member or a professional outsider. They find that family successions cause a large negative impact on firm performance, with operating profitability falling by at least four percentage points. The paper demonstrates how a creative natural experiment can address endogeneity in corporate governance research.

  17. Bertrand, M., & Schoar, A. (2003). Managing with Style: The Effect of Managers on Firm Policies. Quarterly Journal of Economics, 118(4), 1169–1208.

    doi.org/10.1162/003355303322552775

    Applicationon fixed effects
    manager-fixed-effectsCEO-stylecorporate-policy
    Annotation

    Bertrand and Schoar use manager fixed effects (tracking CEOs who moved between firms) to show that individual managerial 'style' explains a significant portion of the variation in corporate investment, financial, and organizational practices. This paper is a key reference linking fixed effects methods to management questions.

  18. Bertrand, M., Duflo, E., & Mullainathan, S. (2004). How Much Should We Trust Differences-in-Differences Estimates? Quarterly Journal of Economics, 119(1), 249–275.

    doi.org/10.1162/003355304772839588

    serial-correlationclustered-standard-errorsinference
    Annotation

    Bertrand, Duflo, and Mullainathan show that standard errors in DID studies are often far too small because they ignore serial correlation within units over time. They propose clustering standard errors at the group level as a simple fix, which is now widely recommended practice in DID applications.

  19. Bertrand, M., & Mullainathan, S. (2004). Are Emily and Greg More Employable Than Lakisha and Jamal? A Field Experiment on Labor Market Discrimination. American Economic Review, 94(4), 991–1013.

    doi.org/10.1257/0002828042002561

    Applicationon experimental design
    audit-studydiscriminationlabor-marketfield-experiment
    Annotation

    Bertrand and Mullainathan send fictitious resumes with randomly assigned names to employers and find that 'white-sounding' names receive 50% more callbacks in this famous audit study. It is one of the most widely cited field experiments in social science and a powerful example of how randomization can identify discrimination.

  20. Bitler, M. P., Gelbach, J. B., & Hoynes, H. W. (2006). What Mean Impacts Miss: Distributional Effects of Welfare Reform Experiments. American Economic Review, 96(4), 988–1012.

    doi.org/10.1257/aer.96.4.988

    applicationwelfare-reformdistributional-effects
    Annotation

    Bitler, Gelbach, and Hoynes apply quantile treatment effects to experimental data from the Connecticut Jobs First welfare reform program. They show that the average treatment effect masks dramatic heterogeneity: the program had no impact at the bottom of the earnings distribution, increased earnings in the middle, and decreased earnings at the top. The paper demonstrates why distributional analysis is essential for evaluating social programs whose effects vary across the outcome distribution.

  21. Blanchard, O. J., & Katz, L. F. (1992). Regional Evolutions. Brookings Papers on Economic Activity, 1992(1), 1–76.

    doi.org/10.2307/2534556

    regional-adjustmentmigrationlabor-markets
    Annotation

    Blanchard and Katz study regional labor market adjustment in the United States, analyzing how local employment shocks affect wages, unemployment, and migration. They construct a predicted-employment instrument using national industry growth interacted with local industry shares—the approach the subsequent literature calls the Bartik or shift-share instrument.

  22. Blomquist, S., Newey, W. K., Kumar, A., & Liang, C.-Y. (2021). On Bunching and Identification of the Taxable Income Elasticity. Journal of Political Economy, 129(8), 2320–2343.

    doi.org/10.1086/714446

    Foundationalon bunching estimation
    identificationtaxable-income-elasticitycritiquenonparametric
    Annotation

    Blomquist, Newey, Kumar, and Liang provide a critical examination of the identification assumptions underlying bunching estimation. They show that the standard bunching estimator identifies the elasticity only under strong assumptions about the functional form of the counterfactual density and the distribution of preferences. Without these assumptions, the amount of bunching is consistent with a range of elasticities. The paper sparks an important methodological debate about what bunching can and cannot identify, and motivates subsequent work on tightening identification in bunching designs.

  23. Bloom, H. S. (1995). Minimum Detectable Effects: A Simple Way to Report the Statistical Power of Experimental Designs. Evaluation Review, 19(5), 547–556.

    doi.org/10.1177/0193841X9501900504

    Foundationalon power analysis
    MDEminimum-detectable-effectprogram-evaluation
    Annotation

    Bloom introduces the minimum detectable effect (MDE) framework, which reports the smallest effect size a study can reliably detect given its design and sample size. This approach is now the standard way to discuss statistical power in program evaluation and experimental economics.
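
    Bloom's framework reduces to a one-line formula under a normal approximation. A hedged sketch (the paper itself works with t critical values; the function name and defaults here are ours):

```python
from statistics import NormalDist

def mde(se, alpha=0.05, power=0.80):
    """Minimum detectable effect for a two-sided test at level `alpha`
    with target `power`, given the standard error of the impact
    estimate: MDE = (z_{1 - alpha/2} + z_{power}) * SE."""
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) * se

# With alpha = 0.05 and power = 0.80 the multiplier is about 2.80, so an
# impact estimate with SE = 0.10 can reliably detect effects of roughly
# 0.28 or larger.
```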

  24. Bloom, N., & Van Reenen, J. (2007). Measuring and Explaining Management Practices Across Firms and Countries. Quarterly Journal of Economics, 122(4), 1351–1408.

    doi.org/10.1162/qjec.2007.122.4.1351

    management-practicesproductivityfirm-performance
    Annotation

    Bloom and Van Reenen develop a survey-based measure of management practices and document that better management is strongly associated with higher productivity, profitability, and growth. They use IV strategies (including product market competition and primogeniture rules for family management succession) to investigate why management quality varies, finding that poor management is more prevalent when competition is weak and when family firms follow primogeniture. The paper is foundational for the measurement of management practices; the IV analysis is one component of a broader measurement and descriptive study.

  25. Bloom, N., Liang, J., Roberts, J., & Ying, Z. J. (2015). Does Working from Home Work? Evidence from a Chinese Experiment. Quarterly Journal of Economics, 130(1), 165–218.

    doi.org/10.1093/qje/qju032

    Applicationon experimental design
    remote-workproductivityfield-experimentmanagement
    Annotation

    Bloom and colleagues conduct a large-scale randomized experiment at a Chinese travel agency, finding that working from home leads to a 13% performance increase. The study becomes a landmark reference in management and labor economics for its clean experimental design applied to a practical workplace question.

  26. Bonferroni, C. E. (1936). Teoria statistica delle classi e calcolo delle probabilità [Statistical theory of classes and calculus of probabilities]. Pubblicazioni del R Istituto Superiore di Scienze Economiche e Commerciali di Firenze, 8, 3–62.

    Foundationalon multiple testing
    Bonferroni-correctionFWERclassical
    Annotation

    Bonferroni develops the classical correction for multiple comparisons, which controls the family-wise error rate by dividing the significance level by the number of tests. While conservative, the Bonferroni correction remains widely used due to its simplicity and broad applicability.
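
    The correction itself is a single division; a trivial sketch (names are ours):

```python
def bonferroni_threshold(alpha, m):
    """Per-test significance threshold that controls the family-wise
    error rate at level `alpha` across `m` tests: reject a hypothesis
    only when its p-value is at most alpha / m."""
    return alpha / m

# Running 20 tests at an overall alpha of 0.05 means judging each
# individual test against a threshold of 0.0025.
```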

  27. Borusyak, K., Hull, P., & Jaravel, X. (2022). Quasi-Experimental Shift-Share Research Designs. Review of Economic Studies, 89(1), 181–213.

    doi.org/10.1093/restud/rdab030

    shock-exogeneitymany-shocksidentification
    Annotation

    Borusyak, Hull, and Jaravel provide an alternative framework where identification comes from the exogeneity of the shocks rather than the shares. They show that with many independent shocks, the instrument is valid even if shares are endogenous, greatly expanding the range of credible applications.

  28. Borusyak, K., Jaravel, X., & Spiess, J. (2024). Revisiting Event-Study Designs: Robust and Efficient Estimation. Review of Economic Studies, 91(6), 3253–3285.

    doi.org/10.1093/restud/rdae007

    imputation-estimatorefficiencyevent-study
    Annotation

    Borusyak, Jaravel, and Spiess propose an imputation estimator for staggered DID that first estimates unit and time fixed effects from untreated observations, then imputes the counterfactual outcomes. This approach is efficient, flexible, and avoids the negative weighting problem of TWFE.
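
    The mechanics can be illustrated on a toy noiseless panel. This is a hedged sketch of the imputation idea without covariates or inference, not the authors' implementation: fit unit and time effects on untreated cells only, impute Y(0) for treated cells, and average the differences.

```python
import numpy as np

# Toy balanced panel: N units, T periods, no noise, homogeneous effect tau.
rng = np.random.default_rng(1)
N, T, tau = 6, 4, 1.0
alpha = rng.normal(size=N)            # unit fixed effects
gamma = rng.normal(size=T)            # time fixed effects
treated = np.zeros((N, T), dtype=bool)
treated[:3, 2:] = True                # first 3 units treated from t = 2
y = alpha[:, None] + gamma[None, :] + tau * treated

# Step 1: fit alpha_i + gamma_t by least squares on untreated cells only.
rows, cols = np.nonzero(~treated)
X = np.zeros((rows.size, N + T))
X[np.arange(rows.size), rows] = 1.0       # unit dummies
X[np.arange(rows.size), N + cols] = 1.0   # time dummies
coef, *_ = np.linalg.lstsq(X, y[~treated], rcond=None)
a_hat, g_hat = coef[:N], coef[N:]

# Step 2: impute the untreated counterfactual and average over treated cells.
y0_hat = a_hat[:, None] + g_hat[None, :]
att = (y - y0_hat)[treated].mean()        # recovers tau in this noiseless toy
```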

  29. Bound, J., Jaeger, D. A., & Baker, R. M. (1995). Problems with Instrumental Variables Estimation When the Correlation Between the Instruments and the Endogenous Explanatory Variable Is Weak. Journal of the American Statistical Association, 90(430), 443–450.

    doi.org/10.1080/01621459.1995.10476536

    Foundationalon instrumental variables
    weak-instrumentsIV-biasfinite-samplefirst-stage-F
    Annotation

    Bound, Jaeger, and Baker demonstrate that instrumental variables estimates can be severely biased when instruments are weakly correlated with the endogenous regressor. They show that with weak instruments, the finite-sample bias of IV approaches that of OLS, and that the standard IV confidence intervals can have coverage far below their nominal levels. The paper motivates the widespread practice of reporting first-stage F-statistics as a diagnostic for instrument strength.

  30. Brand, J. E., Xu, J., Koch, B., & Geraldo, P. (2021). Uncovering Sociological Effect Heterogeneity Using Tree-Based Machine Learning. Sociological Methodology, 51(2), 189–223.

    doi.org/10.1177/0081175021993503

    Applicationon causal forests
    social-sciencereturns-to-educationvariable-importance
    Annotation

    Brand and colleagues provide a practical guide to using causal trees and forests in social science research. They discuss honest estimation, variable importance for understanding which covariates drive heterogeneity, and apply the methods to study heterogeneous returns to college education.

  31. Brinch, C. N., Mogstad, M., & Wiswall, M. (2017). Beyond LATE with a Discrete Instrument. Journal of Political Economy, 125(4), 985–1039.

    doi.org/10.1086/692712

    MTEdiscrete-instrumentsemiparametricquantity-qualityLATE
    Annotation

    Brinch, Mogstad, and Wiswall show how to estimate the MTE curve semiparametrically even with a discrete (binary or multivalued) instrument, which is a common case in practice. They demonstrate that the local IV approach can be implemented with discrete instruments by imposing additive separability between observed covariates and unobserved heterogeneity along with a parametric structure on the MTE. Applied to the quantity-quality tradeoff of children using twin births and sibling sex composition as instruments for family size, they find that MTE varies with unobserved resistance to having additional children, demonstrating how discrete instruments can recover policy-relevant heterogeneity beyond LATE.

  32. Brown, S. J., & Warner, J. B. (1985). Using Daily Stock Returns: The Case of Event Studies. Journal of Financial Economics, 14(1), 3–31.

    doi.org/10.1016/0304-405X(85)90042-X

    Foundationalon event studies
    daily-returnstest-statisticssimulation
    Annotation

    Brown and Warner extend the event study framework from monthly to daily stock returns and examine the statistical properties of various test statistics. Their simulations show that simple methods perform well in most settings, providing practical reassurance for applied researchers.

  33. Bruhn, M., & McKenzie, D. (2009). In Pursuit of Balance: Randomization in Practice in Development Field Experiments. American Economic Journal: Applied Economics, 1(4), 200–232.

    doi.org/10.1257/app.1.4.200

    Foundationalon experimental design
    stratificationbalancerandomization-methodsfield-experiments
    Annotation

    Bruhn and McKenzie compare different randomization methods—simple, stratified, and pairwise—in practice and show that stratified randomization substantially improves balance on baseline covariates and increases statistical power. They provide practical recommendations for choosing among randomization procedures in field experiments.

  34. Buchanan, A. L., Hudgens, M. G., Cole, S. R., Mollan, K. R., Sax, P. E., Daar, E. S., Adimora, A. A., Eron, J. J., & Mugavero, M. J. (2018). Generalizing Evidence from Randomized Trials Using Inverse Probability of Sampling Weights. Journal of the Royal Statistical Society: Series A, 181(4), 1193–1209.

    doi.org/10.1111/rssa.12357

    Foundationalon external validity
    Annotation

    Buchanan and colleagues develop inverse probability of sampling weighted (IPSW) estimators for generalizing treatment effect estimates from randomized trials to well-defined target populations, and derive consistent sandwich-type variance estimators. The method models the probability of trial participation as a function of observed covariates, reweighting trial outcomes to represent the target population. Researchers seeking to transport trial results to a broader population can apply IPSW when a probability sample or census of the target population is available for comparison.

  35. Busenbark, J. R., Yoon, H., Gamache, D. L., & Withers, M. C. (2022). Omitted Variable Bias: Examining Management Research with the Impact Threshold of a Confounding Variable (ITCV). Journal of Management, 48(1), 17–48.

    doi.org/10.1177/01492063211006458

    ApplicationMgmton sensitivity analysis
    management-methodologyITCVbest-practices
    Annotation

    Busenbark and colleagues provide a practical guide to conducting sensitivity analysis in management research using the ITCV framework. They review its application in strategic management and organizational behavior, and demonstrate how to interpret and report results for management audiences.

  36. Bushway, S., Johnson, B. D., & Slocum, L. A. (2007). Is the Magic Still There? The Use of the Heckman Two-Step Correction for Selection Bias in Criminology. Journal of Quantitative Criminology, 23(2), 151–178.

    doi.org/10.1007/s10940-007-9024-4

    surveycriminologymisapplication
    Annotation

    Bushway, Johnson, and Slocum review applications of the Heckman model in criminology and find widespread misapplication. The paper emphasizes that without a credible exclusion restriction, the Heckman correction provides no improvement over naive OLS and may even increase bias.

C
57
  1. Callaway, B., & Sant'Anna, P. H. C. (2021). Difference-in-Differences with Multiple Time Periods. Journal of Econometrics, 225(2), 200–230.

    doi.org/10.1016/j.jeconom.2020.12.001

    group-time-ATTheterogeneous-effectsaggregation
    Annotation

    Callaway and Sant'Anna propose group-time average treatment effects (ATT(g,t)) that avoid the problematic comparisons in TWFE. Their framework allows for heterogeneous treatment effects across groups and time and provides aggregation schemes for summary parameters.

  2. Calonico, S., Cattaneo, M. D., & Titiunik, R. (2014). Robust Nonparametric Confidence Intervals for Regression-Discontinuity Designs. Econometrica, 82(6), 2295–2326.

    doi.org/10.3982/ECTA11757

    bias-correctionbandwidthrdrobustinference
    Annotation

    Calonico, Cattaneo, and Titiunik develop bias-corrected confidence intervals for RDD that address the problem of conventional confidence intervals being invalid when using optimal bandwidth selectors. Their rdrobust software package has become the standard tool for implementing RDD in practice.

  3. Cameron, A. C., & Trivedi, P. K. (1986). Econometric Models Based on Count Data: Comparisons and Applications of Some Estimators and Tests. Journal of Applied Econometrics, 1(1), 29–53.

    doi.org/10.1002/jae.3950010104

    overdispersionmodel-selectioncount-data
    Annotation

    Cameron and Trivedi compare Poisson, negative binomial, and other count data models, providing tests for overdispersion and guidance on model selection. This paper helps establish the practical toolkit for applied researchers working with count outcomes.

  4. Cameron, A. C., & Trivedi, P. K. (1990). Regression-based Tests for Overdispersion in the Poisson Model. Journal of Econometrics, 46(3), 347–364.

    doi.org/10.1016/0304-4076(90)90014-K

    overdispersionPoissonmodel-selectioncount-data
    Annotation

    Cameron and Trivedi develop regression-based tests for overdispersion in count data models, enabling formal testing of whether the Poisson equidispersion assumption holds. Their tests compare the observed variance to the Poisson-implied mean, providing the foundation for model selection between Poisson and negative binomial specifications. Researchers working with count outcomes should use these tests before defaulting to either model.

  5. Cameron, A. C., & Trivedi, P. K. (2005). Microeconometrics: Methods and Applications. Cambridge University Press.

    doi.org/10.1017/CBO9780511811241

    textbookpanel-datamicroeconometricsdynamic-panels
    Annotation

    Cameron and Trivedi cover panel data methods comprehensively in Chapter 21, including fixed effects, random effects, and dynamic panel models. A standard graduate-level reference for microeconometric methods.

  6. Cameron, A. C., Gelbach, J. B., & Miller, D. L. (2008). Bootstrap-Based Improvements for Inference with Clustered Errors. Review of Economics and Statistics, 90(3), 414–427.

    doi.org/10.1162/rest.90.3.414

    cluster-bootstrapfew-clustersinferencewild-bootstrap
    Annotation

    Cameron, Gelbach, and Miller address what happens when clustering is necessary but the number of clusters is small (fewer than 30-50). They propose the wild cluster bootstrap as a solution, which has become the standard approach when researchers have too few clusters for asymptotic cluster-robust standard errors to be reliable.

  7. Cameron, A. C., Gelbach, J. B., & Miller, D. L. (2011). Robust Inference with Multiway Clustering. Journal of Business & Economic Statistics, 29(2), 238–249.

    doi.org/10.1198/jbes.2010.07136

    Foundationalon clustering inference
    two-way-clusteringmultiway
    Annotation

    Cameron, Gelbach, and Miller extend cluster-robust variance estimation to settings with two-way (or multi-way) clustering. The variance estimator adds the two one-way cluster-robust variance matrices and subtracts the matrix clustered on the intersection of the two dimensions, which reduces to the heteroscedasticity-robust matrix when each intersection contains a single observation.
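
    The combination rule can be stated compactly. For two clustering dimensions G and H (a standard way of writing the estimator; notation ours):

```latex
\widehat{V}_{\text{two-way}}
  \;=\; \widehat{V}_{G} \;+\; \widehat{V}_{H} \;-\; \widehat{V}_{G \cap H}
```

    Each term on the right is a one-way cluster-robust variance matrix, and $\widehat{V}_{G \cap H}$ clusters on the intersection of the two groupings; when every intersection cell contains a single observation, it reduces to the heteroscedasticity-robust matrix.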

  8. Cameron, A. C., & Trivedi, P. K. (2013). Regression Analysis of Count Data. Cambridge University Press.

    doi.org/10.1017/CBO9781139013567

    textbookcount-datazero-inflationpanel-data
    Annotation

    Cameron and Trivedi provide the standard reference on count data regression, covering Poisson, negative binomial, zero-inflated, hurdle, and panel count models. They provide both the theoretical foundations and practical implementation guidance that applied researchers need.

  9. Cameron, A. C., & Miller, D. L. (2015). A Practitioner's Guide to Cluster-Robust Inference. Journal of Human Resources, 50(2), 317–372.

    doi.org/10.3368/jhr.50.2.317

    cluster-robuststandard-errorsinferencepractical-guide
    Annotation

    Cameron and Miller cover all aspects of cluster-robust inference in OLS regression in this highly practical survey, including when to cluster, at what level, and what to do when the number of clusters is small. It has become the essential reference for applied researchers deciding how to handle clustered data.

  10. Camuffo, A., Cordova, A., Gambardella, A., & Spina, C. (2020). A Scientific Approach to Entrepreneurial Decision Making: Evidence from a Randomized Control Trial. Management Science, 66(2), 564–586.

    doi.org/10.1287/mnsc.2018.3249

    ApplicationMgmton experimental design
    RCTentrepreneurshipdecision-makingscientific-method
    Annotation

    Camuffo and colleagues conduct a randomized controlled trial with 116 Italian startups, randomly assigning half to receive training in a 'scientific' approach to entrepreneurial decision-making (formulating and testing hypotheses before committing resources). Treated startups perform better, are more likely to pivot, and are not more likely to drop out, providing experimental evidence that structured decision-making improves entrepreneurial outcomes.

  11. Camuffo, A., Gambardella, A., Messinese, D., Novelli, E., Paolucci, E., & Spina, C. (2024). A Scientific Approach to Entrepreneurial Decision-Making: Large-Scale Replication and Extension. Strategic Management Journal, 45(6), 1209–1237.

    doi.org/10.1002/smj.3580

    ApplicationMgmton experimental design
    RCTreplicationentrepreneurshipscientific-methodexternal-validity
    Annotation

    Camuffo and colleagues conduct four randomized controlled trials with 759 firms across Italy, the UK, and India, replicating and extending their earlier finding that training entrepreneurs to adopt a 'scientific' approach to decision-making improves venture performance. The multi-site, multi-country design provides strong evidence on the external validity of the original RCT findings.

  12. Capron, L., & Pistre, N. (2002). When Do Acquirers Earn Abnormal Returns? Strategic Management Journal, 23(9), 781–794.

    doi.org/10.1002/smj.262

    ApplicationMgmton event studies
    M&Aacquirer-returnsstrategy
    Annotation

    Capron and Pistre use event study methodology to examine when acquiring firms earn positive abnormal returns from mergers and acquisitions. They find that acquirers earn positive returns only when they are the primary source of value creation, contributing to the M&A strategy literature.

  13. Card, D., & Krueger, A. B. (1994). Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania. American Economic Review, 84(4), 772–793.

    minimum-wage · employment · natural-experiment
    Annotation

    Card and Krueger compare fast-food employment in New Jersey (which raised its minimum wage) with neighboring Pennsylvania (which did not) in perhaps the most famous DID study in economics. They find no negative employment effect, challenging the standard textbook prediction. This paper popularizes DID as a research design.

  14. Card, D. (2001). Immigrant Inflows, Native Outflows, and the Local Labor Market Impacts of Higher Immigration. Journal of Labor Economics, 19(1), 22–64.

    doi.org/10.1086/209979

    immigration · enclave-instrument · labor-markets
    Annotation

    Card uses a shift-share instrument based on historical settlement patterns of immigrant groups to predict current immigration flows to U.S. cities. This 'enclave instrument' is adopted in hundreds of subsequent immigration studies and is a classic example of the shift-share approach.

  15. Card, D., Lee, D. S., Pei, Z., & Weber, A. (2015). Inference on Causal Effects in a Generalized Regression Kink Design. Econometrica, 83(6), 2453–2483.

    doi.org/10.3982/ECTA11224

    Foundational · on regression kink design
    RKD-foundations · kink-design · unemployment-insurance · derivative-ratio
    Annotation

    Card, Lee, Pei, and Weber formalize the regression kink design, establishing conditions under which a kink in the treatment assignment function identifies causal effects. They show that the estimand is the ratio of the change in the slope of the conditional expectation of the outcome to the change in the slope of the treatment function at the kink point. The paper develops inference procedures and applies the method to estimate the effect of unemployment insurance benefits on unemployment duration.
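
    In generic notation (ours, not necessarily the paper's), with running variable $V$, kink at $v = 0$, and policy schedule $b(\cdot)$, the sharp RKD estimand described above is the ratio of slope changes at the kink:

```latex
\tau_{\mathrm{RKD}}
  = \frac{\displaystyle \lim_{v \downarrow 0} \frac{d\,E[Y \mid V = v]}{dv}
        \;-\; \lim_{v \uparrow 0} \frac{d\,E[Y \mid V = v]}{dv}}
         {\displaystyle \lim_{v \downarrow 0} \frac{d\,b(v)}{dv}
        \;-\; \lim_{v \uparrow 0} \frac{d\,b(v)}{dv}}
```

    The numerator is the change in the slope of the conditional expectation of the outcome; the denominator is the change in the slope of the treatment (benefit) schedule.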

  16. Card, D., Kluve, J., & Weber, A. (2018). What Works? A Meta Analysis of Recent Active Labor Market Program Evaluations. Journal of the European Economic Association, 16(3), 894–931.

    doi.org/10.1093/jeea/jvx028

    Annotation

    Card, Kluve, and Weber conduct a meta-analysis of over 200 active labor market program evaluations across multiple countries, classifying estimates by program type, participant group, and post-program time horizon. They find that average impacts are near zero in the short run but become more positive two to three years after program completion, with human capital programs showing the largest medium-term gains and public employment subsidies proving less effective. Policy researchers designing labor market interventions should consider program type and evaluation time horizon when interpreting treatment effect estimates.

  17. Carneiro, P., Heckman, J. J., & Vytlacil, E. J. (2011). Estimating Marginal Returns to Education. American Economic Review, 101(6), 2754–2781.

    doi.org/10.1257/aer.101.6.2754

    Application · on lab mte replication
    Annotation

    Carneiro, Heckman, and Vytlacil use the marginal treatment effect framework to estimate heterogeneous returns to college. They find a declining MTE curve (individuals most likely to attend college benefit the most), demonstrating that conventional treatment effect parameters (ATE, ATT, LATE) differ substantially due to essential heterogeneity.

  18. Casey, K., Glennerster, R., & Miguel, E. (2012). Reshaping Institutions: Evidence on Aid Impacts Using a Preanalysis Plan. Quarterly Journal of Economics, 127(4), 1755–1812.

    doi.org/10.1093/qje/qje027

    field-experiment · pre-analysis-plan · development-economics · Westfall-Young
    Annotation

    Casey, Glennerster, and Miguel pre-register their analysis plan for a community-driven development program in Sierra Leone and apply multiple testing corrections (including the Westfall-Young step-down procedure and family-wise error rate adjustments) across outcome families. This paper is one of the most prominent examples of rigorous multiple testing adjustment in a field experiment, demonstrating that many individually significant effects lose significance after correction.

  19. Cattaneo, M. D., Drukker, D. M., & Holland, A. D. (2013). Estimation of Multivalued Treatment Effects Under Conditional Independence. Stata Journal, 13(3), 407–450.

    doi.org/10.1177/1536867X1301300301

    Foundational · on matching methods
    multi-valued-treatment · dose-response · inverse-probability-weighting · Stata
    Annotation

    Cattaneo, Drukker, and Holland extend matching and inverse probability weighting methods to settings with multi-valued (rather than binary) treatments, developing estimators for dose-response functions under conditional independence. Their accompanying Stata implementation made these methods readily accessible to applied researchers.

  20. Cattaneo, M. D., Frandsen, B. R., & Titiunik, R. (2015). Randomization Inference in the Regression Discontinuity Design: An Application to Party Advantages in the U.S. Senate. Journal of Causal Inference, 3(1), 1–24.

    doi.org/10.1515/jci-2013-0010

    RDD · local-randomization · elections · finite-sample
    Annotation

    Cattaneo, Frandsen, and Titiunik develop a randomization inference framework for regression discontinuity designs, exploiting the local randomization interpretation of close elections. They apply the method to estimate party advantages in U.S. Senate elections, demonstrating how Fisher-style permutation tests can provide finite-sample exact inference in RDD settings where asymptotic approximations may be unreliable.

  21. Cattaneo, M. D., Titiunik, R., & Vazquez-Bare, G. (2019). Power Calculations for Regression-Discontinuity Designs. Stata Journal, 19(1), 210–245.

    doi.org/10.1177/1536867X19830919

    power-calculations · sample-size · study-design · software
    Annotation

    Cattaneo, Titiunik, and Vazquez-Bare provide methods and software for power calculations in RDD, essential for study design and determining adequate sample sizes near the cutoff. The associated rdsampsi command enables researchers to plan appropriately powered RDD studies before data collection.

  22. Cattaneo, M. D., Idrobo, N., & Titiunik, R. (2020). A Practical Introduction to Regression Discontinuity Designs: Foundations. Cambridge University Press.

    doi.org/10.1017/9781108684606

    RDD-practical-guide · rdrobust · textbook · fuzzy-RDD
    Annotation

    Cattaneo, Idrobo, and Titiunik provide a practical and accessible guide to implementing regression discontinuity designs, covering both sharp and fuzzy cases with worked examples and code. Part of the Cambridge Elements series, the book offers step-by-step guidance on bandwidth selection, estimation, and inference using the rdrobust toolkit.

  23. Cattaneo, M. D., Jansson, M., & Ma, X. (2020). Simple Local Polynomial Density Estimators. Journal of the American Statistical Association, 115(531), 1449–1455.

    doi.org/10.1080/01621459.2019.1635480

    manipulation-testing · density-estimation · rddensity
    Annotation

    Cattaneo, Jansson, and Ma propose a local polynomial density estimator for manipulation testing in regression discontinuity designs. Implemented in the rddensity package, it provides a modern alternative to the McCrary (2008) density test with better boundary properties.

  24. Cattaneo, M. D., & Titiunik, R. (2022). Regression Discontinuity Designs. Annual Review of Economics, 14, 821–851.

    doi.org/10.1146/annurev-economics-051520-021409

    survey · state-of-the-art · fuzzy-RDD · geographic-RDD · multi-cutoff
    Annotation

    Cattaneo and Titiunik survey the state of the art in RDD methodology, including extensions to fuzzy designs, geographic RDD, and multi-cutoff designs. They provide guidance on current recommended practices and an excellent entry point to the modern RDD literature.

  25. Cattaneo, M. D., Idrobo, N., & Titiunik, R. (2024). A Practical Introduction to Regression Discontinuity Designs: Extensions. Cambridge University Press.

    doi.org/10.1017/9781009441896

    textbook · extensions · multi-score · geographic-RDD · kink-design
    Annotation

    Cattaneo, Idrobo, and Titiunik cover extensions of the regression discontinuity framework in this follow-up volume, including multi-score designs, geographic RDD, kink designs, and discrete running variables. They provide practical guidance and software implementations for these more advanced settings, making it an essential companion for applied researchers going beyond the standard sharp RDD.

  26. Certo, S. T., Busenbark, J. R., Woo, H., & Semadeni, M. (2016). Sample Selection Bias and Heckman Models in Strategic Management Research. Strategic Management Journal, 37(13), 2639–2657.

    doi.org/10.1002/smj.2475

    survey · management · best-practices
    Annotation

    Certo, Busenbark, Woo, and Semadeni review the use of Heckman models in strategic management. They provide practical guidance on when selection correction is needed, how to choose exclusion restrictions, and how to interpret results, and they find that many SMJ papers misapply the technique.

  27. Chamberlain, G. (1980). Analysis of Covariance with Qualitative Data. Review of Economic Studies, 47(1), 225–238.

    doi.org/10.2307/2297110

    Foundational · on fixed effects, logit probit
    nonlinear-models · conditional-logit · discrete-choice
    Annotation

    Chamberlain extends the fixed effects approach to nonlinear models like logit, showing how to condition out the fixed effects in discrete choice settings. This work is fundamental for researchers who need fixed effects in models where the dependent variable is binary or categorical.

  28. Chatterji, A. K., Findley, M., Jensen, N. M., Meier, S., & Nielson, D. (2016). Field Experiments in Strategy Research. Strategic Management Journal, 37(1), 116–132.

    doi.org/10.1002/smj.2449

    Application · Mgmt · on experimental design
    field-experiments · strategy · methodology
    Annotation

    Chatterji, Findley, Jensen, Meier, and Nielson make the case for using field experiments in strategy research and provide practical guidance for doing so. They discuss internal validity, external validity, and ethical considerations specific to strategy scholars.

  29. Chava, S., & Roberts, M. R. (2008). How Does Financing Impact Investment? The Role of Debt Covenants. Journal of Finance, 63(5), 2085–2121.

    doi.org/10.1111/j.1540-6261.2008.01391.x

    debt-covenants · corporate-finance · investment
    Annotation

    Chava and Roberts use an RDD around debt covenant thresholds to study how covenant violations affect firm investment. This paper is an important early application of RDD in corporate finance, where accounting-based thresholds create natural discontinuities.

  30. Chernozhukov, V., & Hansen, C. (2005). An IV Model of Quantile Treatment Effects. Econometrica, 73(1), 245–261.

    doi.org/10.1111/j.1468-0262.2005.00570.x

    foundational · instrumental-variables · quantile-regression
    Annotation

    Chernozhukov and Hansen develop an instrumental variable framework for quantile regression to address endogeneity. They propose the inverse quantile regression (IQR) method, which exploits moment conditions implied by the structural quantile model, and provide conditions under which quantile treatment effects are identified with endogenous treatments, extending quantile regression to credible causal inference settings.

  31. Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., & Robins, J. (2018). Double/Debiased Machine Learning for Treatment and Structural Parameters. Econometrics Journal, 21(1), C1–C68.

    doi.org/10.1111/ectj.12097

    Neyman-orthogonality · cross-fitting · partially-linear-model
    Annotation

    Chernozhukov et al. introduce double/debiased machine learning (DML), showing how to combine Neyman orthogonality with cross-fitting to obtain root-n consistent and asymptotically normal estimates of low-dimensional causal parameters while using high-dimensional machine learning for nuisance functions. This paper provides the theoretical foundation for valid inference when first-stage estimation uses flexible ML methods that would otherwise invalidate standard asymptotic arguments. The cross-fitting procedure it introduces is now standard practice for any application combining ML prediction with causal parameter estimation.
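
    To make the cross-fitting recipe concrete, here is a minimal sketch of partialling-out DML for a partially linear model. This is an illustration written for this bibliography, not the paper's replication code; the simulated data-generating process and the random-forest nuisance learners are assumptions of the example.

```python
# Minimal sketch of cross-fitted partialling-out DML (illustrative, not the
# authors' code). Nuisance functions E[Y|X] and E[D|X] are fit by random
# forests on the complement fold; theta comes from residual-on-residual
# regression, a Neyman-orthogonal moment.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

def dml_plr(y, d, X, n_folds=2, seed=0):
    y_res = np.zeros_like(y, dtype=float)
    d_res = np.zeros_like(d, dtype=float)
    for train, test in KFold(n_folds, shuffle=True, random_state=seed).split(X):
        # Cross-fitting: nuisances are trained only on the complement sample
        m_y = RandomForestRegressor(random_state=seed).fit(X[train], y[train])
        m_d = RandomForestRegressor(random_state=seed).fit(X[train], d[train])
        y_res[test] = y[test] - m_y.predict(X[test])
        d_res[test] = d[test] - m_d.predict(X[test])
    # Orthogonal score: theta = E[d_res * y_res] / E[d_res^2]
    return float(d_res @ y_res / (d_res @ d_res))

# Simulated example with true treatment effect 2.0 (an assumption of the sketch)
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
d = X[:, 0] + rng.normal(size=500)
y = 2.0 * d + X[:, 0] + rng.normal(size=500)
theta = dml_plr(y, d, X)
```

    Training each fold's nuisance functions only on the complement sample is what restores valid root-n asymptotics when flexible machine learners are used in the first stage.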

  32. Chernozhukov, V., Wuthrich, K., & Zhu, Y. (2021). An Exact and Robust Conformal Inference Method for Counterfactual and Synthetic Controls. Journal of the American Statistical Association, 116(536), 1849–1864.

    doi.org/10.1080/01621459.2021.1920957

    Foundational · on synthetic control
    conformal-inference · counterfactual · finite-sample
    Annotation

    Chernozhukov, Wuthrich, and Zhu develop a conformal inference method for synthetic control that provides exact, finite-sample valid p-values and confidence intervals without requiring a large number of control units. This approach offers a modern, robust alternative to placebo-based inference for counterfactual and synthetic control estimators.

  33. Chernozhukov, V., Escanciano, J. C., Ichimura, H., Newey, W. K., & Robins, J. M. (2022). Locally Robust Semiparametric Estimation. Econometrica, 90(4), 1501–1535.

    doi.org/10.3982/ECTA16294

    semiparametric · local-robustness · debiasing
    Annotation

    Chernozhukov, Escanciano, Ichimura, Newey, and Robins develop locally robust semiparametric estimators that extend the DML framework, demonstrating how automatic debiasing with machine learning first-stage estimates can be applied broadly. Their approach yields root-n consistent estimates of causal and structural parameters even when nuisance functions are estimated with regularized machine learning methods.

  34. Chetty, R., Friedman, J. N., Olsen, T., & Pistaferri, L. (2011). Adjustment Costs, Firm Responses, and Micro vs. Macro Labor Supply Elasticities: Evidence from Danish Tax Records. Quarterly Journal of Economics, 126(2), 749–804.

    doi.org/10.1093/qje/qjr013

    Application · on bunching estimation
    labor-supply · adjustment-costs · Denmark · tax-kinks · frictions
    Annotation

    Chetty, Friedman, Olsen, and Pistaferri use Danish administrative tax data to reconcile the gap between micro and macro labor supply elasticities using bunching methods. They show that adjustment frictions explain why micro estimates from bunching at tax kinks are small: many workers cannot freely adjust hours, so observed bunching understates the frictionless elasticity. They estimate that accounting for frictions raises the implied elasticity substantially. The paper is a landmark application of bunching to the micro-macro elasticity puzzle and introduces key methods for dealing with frictions in bunching designs.

  35. Chetty, R., Friedman, J. N., & Rockoff, J. E. (2014). Measuring the Impacts of Teachers I: Evaluating Bias in Teacher Value-Added Estimates. American Economic Review, 104(9), 2593–2632.

    doi.org/10.1257/aer.104.9.2593

    Application · on fixed effects
    teacher-value-added · education · causal-validation
    Annotation

    Chetty, Friedman, and Rockoff use teacher fixed effects (value-added models) and quasi-experimental validation to measure individual teachers' causal impacts on student outcomes. They demonstrate that teacher fixed effects capture real causal effects, not just selection, and their work has influenced education policy worldwide.

  36. Choudhury, P., Foroughi, C., & Larson, B. (2021). Work-from-anywhere: The Productivity Effects of Geographic Flexibility. Strategic Management Journal, 42(4), 655–683.

    doi.org/10.1002/smj.3251

    Application · Mgmt · on difference in differences
    difference-in-differences · remote-work · natural-experiment · productivity
    Annotation

    Choudhury, Foroughi, and Larson use a difference-in-differences design to study the productivity effects of a work-from-anywhere policy at the U.S. Patent and Trademark Office. They find that geographic flexibility increases output by approximately 4.4% without reducing quality. The paper demonstrates the application of DiD to a natural experiment in organizational design and is a leading example of causal inference in the future-of-work literature.

  37. Christensen, G., & Miguel, E. (2018). Transparency, Reproducibility, and the Credibility of Economics Research. Journal of Economic Literature, 56(3), 920–980.

    doi.org/10.1257/jel.20171350

    transparency · reproducibility · AEA-registry · economics
    Annotation

    Christensen and Miguel survey the transparency and reproducibility landscape in economics, documenting the growing adoption of pre-registration through the AEA RCT Registry and other platforms. They present evidence on the prevalence of specification searching and publication bias, and make the case that pre-registration combined with pre-analysis plans substantially improves the credibility of empirical findings.

  38. Cinelli, C., & Hazlett, C. (2020). Making Sense of Sensitivity: Extending Omitted Variable Bias. Journal of the Royal Statistical Society: Series B, 82(1), 39–67.

    doi.org/10.1111/rssb.12348

    Foundational · on sensitivity analysis
    omitted-variable-bias · partial-R-squared · benchmarking
    Annotation

    Cinelli and Hazlett develop a modern framework for sensitivity analysis based on partial R-squared measures, extending the omitted variable bias formula. Their approach allows researchers to benchmark the strength of hypothetical confounders against observed covariates, making sensitivity analysis more interpretable.
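
    As a concrete illustration (a sketch based on our reading of the paper, not the sensemakr source), the framework's robustness value for eliminating 100·q percent of an estimate can be computed from a coefficient's t-statistic and residual degrees of freedom:

```python
# Hedged sketch of the robustness value RV_q from Cinelli & Hazlett (2020):
# the minimal partial R^2 a confounder would need, with both treatment and
# outcome, to reduce the point estimate by 100*q percent. Illustration only.
import math

def robustness_value(t_stat, dof, q=1.0):
    f = q * abs(t_stat) / math.sqrt(dof)          # scaled partial Cohen's f
    return 0.5 * (math.sqrt(f**4 + 4 * f**2) - f**2)

# Hypothetical example: t = 4.18 with 783 residual degrees of freedom
rv = robustness_value(4.18, 783)
```

    For this example the robustness value is roughly 0.14: a confounder would need to explain about 14% of the residual variance of both treatment and outcome to drive the point estimate to zero.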

  39. Cinelli, C., Ferwerda, J., & Hazlett, C. (2024). Sensemakr: Sensitivity Analysis Tools for OLS in R and Stata. Observational Studies, 10(2), 93–127.

    doi.org/10.1353/obs.2024.a946583

    software · partial-R-squared · benchmarking · R-package
    Annotation

    Cinelli, Ferwerda, and Hazlett develop the sensemakr R and Stata package implementing their partial R-squared sensitivity analysis framework. They demonstrate the tool with applications to studies of violence and political attitudes, showing how researchers can benchmark potential confounders against observed covariates to assess the robustness of causal claims from observational data.

  40. Cinelli, C., & Hazlett, C. (2025). An Omitted Variable Bias Framework for Sensitivity Analysis of Instrumental Variables. Biometrika, 112(2), asaf004.

    doi.org/10.1093/biomet/asaf004

    Application · on sensitivity analysis
    instrumental-variables · exclusion-restriction · omitted-variable-bias · IV-sensitivity
    Annotation

    Cinelli and Hazlett extend their OLS sensitivity framework to instrumental variables settings, showing how to assess the robustness of IV estimates to violations of the exclusion restriction. They derive bounds on IV bias as a function of the partial R-squared of a hypothetical confounder with both the instrument and the outcome, providing practical tools for benchmarking the plausibility of IV assumptions.

  41. Clark, T. S., & Linzer, D. A. (2015). Should I Use Fixed or Random Effects? Political Science Research and Methods, 3(2), 399–408.

    doi.org/10.1017/psrm.2014.32

    fixed-vs-random · model-selection · practical-guidance
    Annotation

    Clark and Linzer provide practical guidance on choosing between fixed and random effects, arguing the decision depends on the research question, sample size, and the degree of correlation between unit effects and covariates. They demonstrate via simulation that random effects can outperform fixed effects when the number of units is small or when between-unit variation is of substantive interest. The paper challenges the common practice of defaulting to fixed effects solely because the Hausman test rejects.

  42. Clarke, D., Romano, J. P., & Wolf, M. (2020). The Romano-Wolf Multiple-Hypothesis Correction in Stata. Stata Journal, 20(4), 812–843.

    doi.org/10.1177/1536867X20976314

    Foundational · on multiple testing
    Stata · software · Romano-Wolf · FWER
    Annotation

    Clarke, Romano, and Wolf develop a Stata implementation of the Romano-Wolf stepwise multiple testing correction, which controls the family-wise error rate while accounting for the dependence structure among test statistics via resampling. This correction is more powerful than Bonferroni or Holm procedures when test statistics are correlated, which is the typical case in applied research with related outcomes. The rwolf command provides applied researchers with an accessible tool for rigorous multiple hypothesis testing.
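
    The stepdown logic can be sketched as follows. This is an illustration of the max-t idea behind the procedure, not the rwolf implementation, and the simulated null distribution is an assumption of the example.

```python
# Hedged sketch of Romano-Wolf-style stepdown adjusted p-values: at each step
# the observed statistic is compared with the resampled maximum over the
# hypotheses still in contention, which preserves the dependence structure.
import numpy as np

def stepdown_pvalues(t_obs, t_null):
    """t_obs: (k,) observed statistics; t_null: (B, k) statistics under the null."""
    order = np.argsort(-np.abs(t_obs))                 # most significant first
    p_adj = np.empty(len(t_obs), dtype=float)
    prev = 0.0
    for step, j in enumerate(order):
        # max over the not-yet-tested (less significant) hypotheses
        max_null = np.abs(t_null[:, order[step:]]).max(axis=1)
        p = (1 + (max_null >= abs(t_obs[j])).sum()) / (1 + t_null.shape[0])
        prev = max(prev, p)                            # enforce monotonicity
        p_adj[j] = prev
    return p_adj

# Hypothetical example: one strong effect, two nulls, simulated null draws
rng = np.random.default_rng(1)
t_null = rng.standard_normal((2000, 3))
p = stepdown_pvalues(np.array([5.0, 0.2, 0.1]), t_null)
```

    Because the comparison uses the joint resampled maximum rather than a Bonferroni bound, correlated test statistics cost less power, which is the practical advantage the annotation describes.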

  43. Clarke, D., Pailanir, D., Athey, S., & Imbens, G. (2024). On Synthetic Difference-in-Differences and Related Estimation Methods in Stata. Stata Journal, 24(4), 557–598.

    doi.org/10.1177/1536867X241297914

    Stata · software · implementation
    Annotation

    Clarke and colleagues develop the sdid Stata package for implementing synthetic DID, providing detailed documentation and empirical examples. This paper makes the method accessible to applied researchers and demonstrates implementation with real policy evaluation data.

  44. Cleves, M., Gould, W., & Marchenko, Y. (2016). An Introduction to Survival Analysis Using Stata. Stata Press.

    survey · Stata · practical-guide
    Annotation

    Cleves, Gould, and Marchenko provide a comprehensive practical guide to survival analysis in Stata. The book covers Kaplan-Meier estimation, Cox regression, parametric models, competing risks, and frailty models, with extensive Stata code examples and diagnostic procedures.

  45. Coffman, L. C., & Niederle, M. (2015). Pre-Analysis Plans Have Limited Upside, Especially Where Replications Are Feasible. Journal of Economic Perspectives, 29(3), 81–98.

    doi.org/10.1257/jep.29.3.81

    skepticism · replication · flexibility
    Annotation

    Coffman and Niederle offer a skeptical perspective on pre-analysis plans, arguing that their benefits are limited when replication is feasible and that rigid adherence to pre-specified analyses can prevent researchers from learning from the data. This paper provides important counterarguments in the pre-registration debate.

  46. Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences. Lawrence Erlbaum Associates.

    doi.org/10.4324/9780203771587

    Foundational · on power analysis
    textbook · effect-size · sample-size
    Annotation

    Cohen's foundational textbook introduces the concepts of effect size, statistical power, and sample size determination that became standard in the behavioral sciences. He provides power tables and conventions for small, medium, and large effect sizes that remain widely used across disciplines.

  47. Conley, T. G. (1999). GMM Estimation with Cross Sectional Dependence. Journal of Econometrics, 92(1), 1–45.

    doi.org/10.1016/S0304-4076(98)00084-0

    Annotation

    Conley develops GMM estimators and nonparametric, positive semi-definite covariance matrix estimators that account for cross-sectional dependence characterized by economic or geographic distance between observations. The approach extends HAC-style inference to spatial settings by allowing error correlations to decline smoothly with distance, and the covariance estimator remains consistent even when distances are imprecisely measured. Researchers with spatially distributed data should use Conley standard errors when observations within a defined neighborhood are likely correlated.
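
    A stripped-down version of the idea for OLS looks like this (a sketch, not Conley's GMM estimator): a Bartlett kernel in distance weights the cross products of scores, with weights vanishing beyond a chosen cutoff.

```python
# Hedged sketch of a Conley-style spatial HAC variance for OLS (illustration
# only): score cross products are weighted by a Bartlett kernel that declines
# linearly in distance and hits zero at the cutoff.
import numpy as np

def conley_se(X, resid, coords, cutoff):
    n, k = X.shape
    scores = X * resid[:, None]                    # per-observation moment contributions
    # pairwise Euclidean distances between observation locations
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    w = np.clip(1.0 - dist / cutoff, 0.0, None)    # Bartlett weights, 0 beyond cutoff
    meat = scores.T @ (w @ scores) / n
    bread = np.linalg.inv(X.T @ X / n)
    V = bread @ meat @ bread / n
    return np.sqrt(np.diag(V))

# Hypothetical example with points spaced farther apart than the cutoff,
# so no pair is spatially correlated by construction
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.5]) + rng.normal(size=n)
resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
coords = np.column_stack([np.arange(n) * 10.0, np.zeros(n)])
se = conley_se(X, resid, coords, cutoff=5.0)
```

    With all observations farther apart than the cutoff, the weight matrix is diagonal and the formula collapses to the ordinary heteroskedasticity-robust variance, a useful sanity check on an implementation.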

  48. Conley, T. G., Hansen, C. B., & Rossi, P. E. (2012). Plausibly Exogenous. Review of Economics and Statistics, 94(1), 260–272.

    doi.org/10.1162/REST_a_00139

    Foundational · on instrumental variables
    exclusion-restriction · sensitivity-analysis · plausible-exogeneity
    Annotation

    Conley, Hansen, and Rossi develop methods for inference when the exclusion restriction is 'plausibly' rather than exactly satisfied, parameterizing the degree of violation and constructing valid confidence intervals. This approach provides a formal sensitivity analysis for IV estimates, answering the question: how large would the violation of the exclusion restriction need to be to overturn the result? Applied researchers can use these methods to transparently assess the robustness of IV findings to a common critique.

  49. Cornelissen, T., Dustmann, C., Raute, A., & Schonberg, U. (2016). From LATE to MTE: Alternative Methods for the Evaluation of Policy Interventions. Labour Economics, 41, 47–60.

    doi.org/10.1016/j.labeco.2016.06.004

    MTE · child-care · policy-evaluation · applied · Germany
    Annotation

    Cornelissen, Dustmann, Raute, and Schonberg provide an accessible methodological guide to MTE estimation, covering the theoretical foundations and practical steps for moving from LATE to the full marginal treatment effect curve. The paper explains how to use local instrumental variables to trace out how treatment effects vary with individuals' unobserved propensity to participate. It serves as a tutorial for applied researchers seeking to implement MTE methods, with clear exposition of identification, estimation, and interpretation.

  50. Cornwell, C., & Rupert, P. (1988). Efficient Estimation with Panel Data: An Empirical Comparison of Instrumental Variables Estimators. Journal of Applied Econometrics, 3(2), 149–155.

    doi.org/10.1002/jae.3950030206

    Application · on lab re replication
    Annotation

    Cornwell and Rupert compare the efficiency of alternative instrumental variables estimators for panel data models with correlated individual effects, including the Hausman-Taylor, Amemiya-MaCurdy, and Breusch-Mizon-Schmidt estimators. Using a Mincer wage equation on PSID data, they find that efficiency gains from the more complex estimators are limited to the coefficients of time-invariant endogenous variables.

  51. Correia, S. (2017). Linear Models with High-Dimensional Fixed Effects: An Efficient and Feasible Estimator. Working Paper.

    Foundational · on fixed effects
    reghdfe · high-dimensional-FE · Stata · computational
    Annotation

    Correia develops an efficient iterative demeaning estimator for linear models with multiple high-dimensional fixed effects that scales to very large datasets. The estimator handles arbitrary numbers of fixed-effect dimensions and supports cluster-robust standard errors. Its implementation as the reghdfe Stata command has become the standard tool for applied researchers working with high-dimensional fixed effects in panel data.
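
    The iterative demeaning at the heart of such estimators can be sketched in a few lines. This is an illustration of the alternating-projections idea, not the reghdfe code, and the two-way example data are an assumption of the sketch.

```python
# Hedged sketch of alternating demeaning for multiple fixed-effect dimensions:
# repeatedly subtract each dimension's group means until the variable stops
# changing, then run OLS on the demeaned variables (the within estimator).
import numpy as np

def demean(v, *group_ids, tol=1e-10, max_iter=1000):
    v = v.astype(float).copy()
    for _ in range(max_iter):
        prev = v.copy()
        for g in group_ids:
            # subtract current group means for this fixed-effect dimension
            means = np.bincount(g, weights=v) / np.bincount(g)
            v -= means[g]
        if np.max(np.abs(v - prev)) < tol:
            break
    return v

# Hypothetical two-way example with no noise: within-OLS recovers the slope
rng = np.random.default_rng(0)
firm = np.repeat(np.arange(20), 10)
year = np.tile(np.arange(10), 20)
x = rng.normal(size=200)
y = 2.0 * x + 0.5 * firm + 1.5 * year
yd, xd = demean(y, firm, year), demean(x, firm, year)
beta = xd @ yd / (xd @ xd)
```

    With one fixed-effect dimension a single sweep suffices; with two or more, the sweeps must be iterated because demeaning one dimension re-introduces small imbalances in the others.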

  52. Correia, S., Guimaraes, P., & Zylkin, T. (2020). Fast Poisson Estimation with High-Dimensional Fixed Effects. Stata Journal, 20(1), 95–115.

    doi.org/10.1177/1536867X20909691

    ppmlhdfe · high-dimensional-FE · PPML · Stata
    Annotation

    Correia, Guimaraes, and Zylkin introduce the ppmlhdfe Stata command for fast Poisson estimation with multiple levels of fixed effects, making PPML feasible for large datasets with high-dimensional fixed effects. This tool has become standard for applied researchers working with count data in panel settings.

  53. Cox, D. R. (1972). Regression Models and Life-Tables. Journal of the Royal Statistical Society: Series B (Methodological), 34(2), 187–220.

    doi.org/10.1111/j.2517-6161.1972.tb00899.x

    foundational · proportional-hazards · partial-likelihood
    Annotation

    Cox introduces the proportional hazards model with an unspecified baseline hazard, estimated via a conditional likelihood argument (later formalized as partial likelihood in Cox, 1975). The semiparametric approach avoids distributional assumptions on the baseline hazard while allowing covariate effects to be estimated consistently. It is one of the most cited papers in statistics.
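
    In standard notation (assumed here, not quoted from the paper), with event indicator $\delta_i$, covariates $x_i$, and risk set $R(t_i)$ at each observed failure time $t_i$, the partial likelihood is:

```latex
L(\beta) \;=\; \prod_{i \,:\, \delta_i = 1}
  \frac{\exp(x_i^{\prime}\beta)}
       {\sum_{j \in R(t_i)} \exp(x_j^{\prime}\beta)}
```

    The baseline hazard cancels from each factor, which is why no distributional assumption on it is needed.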

  54. Crepon, B., Duflo, E., Gurgand, M., Rathelot, R., & Zamora, P. (2013). Do Labor Market Policies Have Displacement Effects? Evidence from a Clustered Randomized Experiment. Quarterly Journal of Economics, 128(2), 531–580.

    doi.org/10.1093/qje/qjt001

    Application · on experimental design
    job-placement · displacement-effects · cluster-RCT · France
    Annotation

    Crepon and colleagues evaluate a job placement assistance program in France using a two-step clustered randomization design that varies treatment intensity across 235 labor markets. The paper's key contribution is identifying displacement effects: treated job seekers gain at the expense of untreated competitors, particularly in weak labor markets and among workers with similar skills. This innovative experimental design allows estimation of both direct and indirect (general equilibrium) effects of active labor market policies.

  55. Cunat, V., Gine, M., & Guadalupe, M. (2012). The Vote Is Cast: The Effect of Corporate Governance on Shareholder Value. Journal of Finance, 67(5), 1943–1977.

    doi.org/10.1111/j.1540-6261.2012.01776.x

    corporate-governance · shareholder-value · close-votes
    Annotation

    Cunat, Gine, and Guadalupe use a fuzzy RDD around the majority threshold in shareholder governance proposals to estimate the causal effect of governance provisions on firm value. This paper is a leading example of fuzzy RDD applied to corporate governance and finance.

  56. Cunningham, S., & Shah, M. (2018). Decriminalizing Indoor Prostitution: Implications for Sexual Violence and Public Health. Review of Economic Studies, 85(3), 1683–1715.

    doi.org/10.1093/restud/rdx065

    Application · on synthetic control
    policy-evaluation · public-health · crime
    Annotation

    Cunningham and Shah use the synthetic control method to study how Rhode Island's accidental decriminalization of indoor prostitution affected sex crimes and STI rates. This study is a well-known application that illustrates how synthetic control can exploit a unique policy change affecting a single unit.

  57. Cunningham, S. (2021). Causal Inference: The Mixtape. Yale University Press.

    doi.org/10.12987/9780300255881

    textbook · causal-inference · accessible · code-examples
    Annotation

    Cunningham provides an accessible textbook with an excellent DiD chapter that walks through the intuition, the math, and the code (in Stata and R). Freely available online at mixtape.scunning.com, it is a valuable companion for students who want worked examples alongside formal treatment.

D
15
  1. Dahabreh, I. J., Robertson, S. E., Tchetgen Tchetgen, E. J., Stuart, E. A., & Hernan, M. A. (2019). Generalizing Causal Inferences from Individuals in Randomized Trials to All Trial-Eligible Individuals. Biometrics, 75(2), 685–694.

    doi.org/10.1111/biom.13009

    Foundational · on external validity
    Annotation

    Dahabreh, Robertson, Tchetgen Tchetgen, Stuart, and Hernan develop a formal framework for generalizing causal inferences from randomized trial participants to all trial-eligible individuals in a target population, using baseline covariate data from both randomized and non-randomized individuals. They establish identifiability conditions and propose inverse probability weighting, outcome modeling, and doubly robust estimators for the target population average treatment effect. Researchers conducting trials nested within observational cohorts can apply this framework to estimate treatment effects for the full eligible population rather than only for those who enrolled.

  2. Davis, J., & Heller, S. B. (2017). Using Causal Forests to Predict Treatment Heterogeneity: An Application to Summer Jobs. American Economic Review, 107(5), 546–550.

    doi.org/10.1257/aer.p20171000

    Application · on causal forests
    policy-evaluation · summer-jobs · targeting
    Annotation

    Davis and Heller apply causal forests to a randomized summer jobs program for disadvantaged youth in Chicago, exploring how useful predicted treatment effect heterogeneity is in practice. They find the method can identify heterogeneity for some outcomes that standard interaction methods miss, while highlighting limitations of the approach.

  3. de Chaisemartin, C., & D'Haultfoeuille, X. (2020). Two-Way Fixed Effects Estimators with Heterogeneous Treatment Effects. American Economic Review, 110(9), 2964–2996.

    doi.org/10.1257/aer.20181169

    negative-weights · TWFE · heterogeneous-effects
    Annotation

    De Chaisemartin and D'Haultfoeuille show that the TWFE estimator can assign negative weights to some treatment effects, potentially producing estimates with the wrong sign. They propose an alternative estimator and a decomposition that reveals which group-time effects receive negative weights.

  4. de Chaisemartin, C., & D'Haultfoeuille, X. (2023). Two-Way Fixed Effects and Differences-in-Differences with Heterogeneous Treatment Effects: A Survey. Econometrics Journal, 26(3), C1–C30.

    doi.org/10.1093/ectj/utac017

    Survey · on event studies
    TWFE · heterogeneous-effects · survey · DID
    Annotation

    De Chaisemartin and D'Haultfoeuille provide a comprehensive survey of the recent literature on problems with two-way fixed effects estimators under heterogeneous treatment effects. They cover the key diagnostic tests (including the Goodman-Bacon decomposition), alternative estimators that are robust to heterogeneity, and practical guidance for choosing among them. The survey is essential reading for applied researchers working with event-study and difference-in-differences designs who need to understand when standard TWFE is and is not appropriate.

  5. Dehejia, R. H., & Wahba, S. (1999). Causal Effects in Nonexperimental Studies: Reevaluating the Evaluation of Training Programs. Journal of the American Statistical Association, 94(448), 1053–1062.

    doi.org/10.1080/01621459.1999.10473858

    Application · on matching methods
    propensity-score · program-evaluation · experimental-benchmark
    Annotation

    Dehejia and Wahba show that propensity score matching can replicate experimental estimates of a job training program using observational data, revisiting LaLonde's influential critique. The paper demonstrates the practical value of matching by showing that propensity score methods yield estimates much closer to the experimental benchmark than the nonexperimental estimators LaLonde had examined.

  6. Dell, M. (2010). The Persistent Effects of Peru's Mining Mita. Econometrica, 78(6), 1863–1903.

    doi.org/10.3982/ECTA8121

    geographic-RDD · colonial-institutions · persistence · spatial-discontinuity
    Annotation

    Dell uses a geographic RDD exploiting the historical boundary of the mita forced labor system in Peru to estimate the persistent effect of colonial institutions on economic outcomes centuries later. The study demonstrates how RDD can exploit spatial discontinuities, not just score-based cutoffs.

  7. Deshpande, M., & Li, Y. (2019). Who Is Screened Out? Application Costs and the Targeting of Disability Programs. American Economic Journal: Economic Policy, 11(4), 213–248.

    doi.org/10.1257/pol.20180076

    disability-policy · staggered-rollout · field-office-closures
    Annotation

    Deshpande and Li use staggered closings of Social Security field offices across the United States to estimate the effects of application costs on disability program participation. The staggered timing of office closures provides quasi-experimental variation in application costs, and the paper demonstrates how treatment-timing variation can be leveraged for credible policy evaluation.

  8. Dong, Y. (2015). Regression Discontinuity Applications with Rounding Errors in the Running Variable. Journal of Applied Econometrics, 30(3), 422–446.

    doi.org/10.1002/jae.2369

    rounding-errors · discrete-running-variable · diagnostics · measurement
    Annotation

    Dong examines regression discontinuity designs when the running variable is subject to rounding or heaping, a common practical concern. She shows that standard RD estimators can be biased in such settings and derives correction formulas for the resulting discretization bias, extending the applicability of RDD to settings with imperfect measurement of the running variable.

  9. Dong, Y., & Lewbel, A. (2015). Identifying the Effect of Changing the Policy Threshold in Regression Discontinuity Models. Review of Economics and Statistics, 97(5), 1081–1092.

    doi.org/10.1162/REST_a_00510

    policy-threshold · counterfactual · fuzzy-RDD-extensions
    Annotation

    Dong and Lewbel show that the derivative of the RD treatment effect with respect to the running variable at the cutoff is identified. Under a local policy-invariance interpretation, this derivative can be used to evaluate counterfactual policies that shift the eligibility threshold, broadening the policy relevance of RDD beyond the effect at the existing cutoff.

  10. Doudchenko, N., & Imbens, G. W. (2016). Balancing, Regression, Difference-in-Differences and Synthetic Control Methods: A Synthesis. NBER Working Paper No. 22791.

    doi.org/10.3386/w22791

    Foundational · on synthetic control
    unification · DID-connection · penalized-regression
    Annotation

    Doudchenko and Imbens place synthetic control within a broader framework that includes DID and regression as special cases, proposing extensions that relax the non-negativity and adding-up constraints on weights. This paper helps researchers understand the connections between synthetic control and other methods.
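
    As a concrete illustration of the constrained-weights problem these extensions relax, here is a minimal sketch of the canonical synthetic control fit: least squares over non-negative weights that sum to one, solved by Frank-Wolfe on simulated data (the data-generating process and function names are illustrative, not from the paper).

```python
import numpy as np

def synth_weights(Y0, y1, iters=5000):
    """Frank-Wolfe minimization of ||y1 - Y0 @ w||^2 over the simplex
    (w >= 0, sum w = 1), the constraint set of canonical synthetic control."""
    J = Y0.shape[1]
    w = np.full(J, 1.0 / J)
    for k in range(iters):
        grad = Y0.T @ (Y0 @ w - y1)                 # gradient of the squared error
        s = np.zeros(J); s[np.argmin(grad)] = 1.0   # best simplex vertex
        w += (2.0 / (k + 2.0)) * (s - w)            # standard Frank-Wolfe step
    return w

rng = np.random.default_rng(0)
Y0 = rng.normal(size=(20, 8))                 # 20 pre-periods, 8 donor units
true_w = np.array([0.5, 0.3, 0.2, 0, 0, 0, 0, 0])
y1 = Y0 @ true_w                              # treated unit is a convex combination
w = synth_weights(Y0, y1)
print(np.round(w, 2))                         # weights concentrate on donors 0-2
```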

  11. Dranove, D., & Olsen, C. (1994). The Economic Side Effects of Dangerous Drug Announcements. Journal of Law and Economics, 37(2), 323–348.

    doi.org/10.1086/467316

    Application · on event studies
    pharmaceutical · FDA · regulation · stock-market
    Annotation

    Dranove and Olsen use event studies to measure the stock market impact of FDA drug safety announcements on pharmaceutical firms. This application demonstrates how event studies can quantify the financial consequences of regulatory actions in health care and management contexts.

  12. Dube, A., Girardi, D., Jordà, Ò., & Taylor, A. M. (2025). A Local Projections Approach to Difference-in-Differences. Journal of Applied Econometrics, 40(7), 741–758.

    doi.org/10.1002/jae.70000

    local-projections · dynamic-effects · event-study
    Annotation

    Dube and colleagues propose a local projections (LP) approach to difference-in-differences estimation that combines LPs with a flexible 'clean control' condition to define appropriate treated and control units. The LP-DiD estimator subsumes many recent solutions to negative weighting problems, accommodates covariates and nonabsorbing treatments, and is simple to implement.

  13. Duflo, E. (2001). Schooling and Labor Market Consequences of School Construction in Indonesia: Evidence from an Unusual Policy Experiment. American Economic Review, 91(4), 795–813.

    doi.org/10.1257/aer.91.4.795

    education · school-construction · Indonesia · treatment-intensity
    Annotation

    Duflo uses DiD to compare cohorts exposed to a massive school construction program in Indonesia with older cohorts not exposed, across regions with different program intensity. The study is an unusually clean application showing how DiD can exploit variation in treatment intensity across space and cohorts.

  14. Duflo, E., Glennerster, R., & Kremer, M. (2007). Using Randomization in Development Economics Research: A Toolkit. Handbook of Development Economics, 4, 3895–3962.

    doi.org/10.1016/S1573-4471(07)04061-2

    development-economics · toolkit · field-experiments · practical-guide
    Annotation

    Duflo, Glennerster, and Kremer write a comprehensive practical guide to running randomized experiments in development economics. The chapter covers all stages from design to analysis, including power calculations, stratification, dealing with attrition, and estimating treatment effects with imperfect compliance. It has become required reading for anyone designing a field experiment.

  15. Dunning, T. (2012). Natural Experiments in the Social Sciences: A Design-Based Approach. Cambridge University Press.

    doi.org/10.1017/CBO9781139084444

    Foundational · on experimental design
    natural-experiments · design-based · textbook · social-sciences
    Annotation

    Dunning provides a systematic framework for identifying and analyzing natural experiments across the social sciences. The book covers as-if random assignment, instrumental variables, regression discontinuity, and difference-in-differences through a unified design-based lens, making it essential reading for researchers exploiting natural variation for causal inference.

F
17
  1. Fama, E. F., Fisher, L., Jensen, M. C., & Roll, R. (1969). The Adjustment of Stock Prices to New Information. International Economic Review, 10(1), 1–21.

    doi.org/10.2307/2525569

    Foundational · on event studies
    stock-prices · abnormal-returns · market-efficiency
    Annotation

    Fama, Fisher, Jensen, and Roll establish the modern event study methodology by studying how stock prices adjust to stock splits. They develop the framework of measuring abnormal returns around corporate events using a market model to construct the counterfactual return. This methodology has become the standard tool for studying how information events affect asset prices and is used in thousands of subsequent studies across finance and strategy.

  2. Fan, Q., Hsu, Y.-C., Lieli, R. P., & Zhang, Y. (2022). Estimation of Conditional Average Treatment Effects with High-Dimensional Data. Journal of Business & Economic Statistics, 40(1), 313–327.

    doi.org/10.1080/07350015.2020.1811102

    CATE · high-dimensional · doubly-robust
    Annotation

    Fan and colleagues propose nonparametric estimators for conditional average treatment effects in high-dimensional settings. Their approach uses machine learning to estimate nuisance functions in a first stage, then applies local linear regression for the CATE function of interest, with functional limit theory and multiplier-bootstrap uniform inference.

  3. Fine, J. P., & Gray, R. J. (1999). A Proportional Hazards Model for the Subdistribution of a Competing Risk. Journal of the American Statistical Association, 94(446), 496–509.

    doi.org/10.1080/01621459.1999.10474144

    foundational · competing-risks · subdistribution
    Annotation

    Fine and Gray develop a regression model for the cumulative incidence function under competing risks. The Fine-Gray model extends the Cox framework to settings where multiple event types compete, allowing estimation of covariate effects on the subdistribution hazard.

  4. Finkelstein, A., Taubman, S., Wright, B., Bernstein, M., Gruber, J., Newhouse, J. P., Allen, H., Baicker, K., & The Oregon Health Study Group (2012). The Oregon Health Insurance Experiment: Evidence from the First Year. Quarterly Journal of Economics, 127(3), 1057–1106.

    doi.org/10.1093/qje/qjs020

    Application · on experimental design
    health-insurance · lottery · LATE · field-experiment
    Annotation

    Finkelstein and colleagues analyze the Oregon Health Insurance Experiment, in which uninsured low-income adults are selected by lottery for the chance to apply for Medicaid. Using this randomized controlled design with IV to handle noncompliance, they estimate the local average treatment effect of Medicaid coverage on health care utilization, financial strain, and self-reported health. The study demonstrates the practical difference between intent-to-treat and LATE estimates in a real-world experiment where not all lottery winners enrolled.

  5. Firpo, S., Fortin, N. M., & Lemieux, T. (2009). Unconditional Quantile Regressions. Econometrica, 77(3), 953–973.

    doi.org/10.3982/ECTA6822

    foundational · unconditional-quantile · RIF
    Annotation

    Firpo, Fortin, and Lemieux introduce the recentered influence function (RIF) regression for estimating unconditional quantile effects. They show that standard quantile regression estimates conditional quantile effects that do not aggregate to unconditional effects. RIF regression transforms the outcome variable so that OLS on the transformed outcome recovers the effect of covariates on unconditional quantiles. This innovation is what enables policy-relevant distributional analysis.
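
    The central object can be stated compactly. For the τ-th quantile q_τ of the outcome distribution, with marginal density f_Y (standard notation, my transcription rather than the paper's exact display):

```latex
\operatorname{RIF}(y;\,q_\tau) \;=\; q_\tau \;+\; \frac{\tau - \mathbf{1}\{y \le q_\tau\}}{f_Y(q_\tau)}
```

    Regressing this transformed outcome on covariates by OLS recovers the effect of marginal shifts in the covariate distribution on the unconditional quantile.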

  6. Firpo, S., & Possebom, V. (2018). Synthetic Control Method: Inference, Sensitivity Analysis and Confidence Sets. Journal of Causal Inference, 6(2), 1–26.

    doi.org/10.1515/jci-2016-0026

    Foundational · on synthetic control
    inference · sensitivity-analysis · confidence-sets
    Annotation

    Firpo and Possebom develop formal inference procedures for the synthetic control method, including sensitivity analysis tools and confidence sets. Their framework provides a more rigorous basis for statistical inference in synthetic control applications beyond the standard permutation-based placebo tests.

  7. Fisher, R. A. (1935). The Design of Experiments. Oliver & Boyd.

    randomization · factorial-design · foundations
    Annotation

    Fisher's classic book lays the foundations of experimental design, introducing concepts like randomization, blocking, and factorial designs. The 'lady tasting tea' example from this book remains one of the most famous illustrations of hypothesis testing and the logic of controlled experiments.

  8. Flammer, C. (2015). Does Corporate Social Responsibility Lead to Superior Financial Performance? A Regression Discontinuity Approach. Management Science, 61(11), 2549–2568.

    doi.org/10.1287/mnsc.2014.2038

    CSR · shareholder-voting · management-science
    Annotation

    Flammer uses a regression discontinuity design around close-call shareholder votes on CSR proposals, comparing proposals that pass or fail by a small margin as a quasi-experiment. She finds that adopting CSR proposals leads to positive announcement returns and superior accounting performance, with effects operating through labor productivity and sales growth. Published in Management Science, it is a prominent example of RDD in top management journals.

  9. Fleming, L., & Sorenson, O. (2001). Technology as a Complex Adaptive System: Evidence from Patent Data. Research Policy, 30(7), 1019–1039.

    doi.org/10.1016/S0048-7333(00)00135-9

    patent-citations · technology-complexity · innovation
    Annotation

    Fleming and Sorenson use negative binomial regression on patent citation counts to study how the complexity of technological combinations affects the usefulness of inventions. This paper is a prominent application of count models in the innovation and technology management literature.

  10. Frake, J., Gibbs, A., Goldfarb, B., Hiraiwa, T., Starr, E., & Yamaguchi, S. (2025). From Perfect to Practical: Partial Identification Methods for Causal Inference in Strategic Management Research. Strategic Management Journal, 46(8), 1894–1929.

    doi.org/10.1002/smj.3714

    partial-identification · sensitivity-analysis · bounds · management
    Annotation

    Frake and colleagues introduce partial identification methods to strategic management, providing a practical framework for assessing the sensitivity of difference-in-differences and instrumental variables estimates to violations of identifying assumptions. The paper demonstrates how researchers can construct informative bounds on treatment effects when parallel trends or exclusion restriction assumptions are relaxed. It bridges the gap between the theoretical ideal of point identification and the practical reality that identifying assumptions are rarely perfectly satisfied.

  11. Frank, K. A. (2000). Impact of a Confounding Variable on a Regression Coefficient. Sociological Methods & Research, 29(2), 147–194.

    doi.org/10.1177/0049124100029002001

    Foundational · on sensitivity analysis
    ITCV · confounding-variable · threshold
    Annotation

    Frank develops the impact threshold for a confounding variable (ITCV), which calculates how much bias an omitted variable would need to introduce to invalidate an inference. This approach is widely adopted in education and management research.

  12. Freeman, R. B., & Medoff, J. L. (1984). What Do Unions Do? Basic Books.

    Application · on fixed effects
    union-wage-premium · fixed-effects · labor-economics
    Annotation

    Freeman and Medoff examine the effects of unions on wages, productivity, inequality, and workplace governance, drawing on a wide range of data sources and econometric methods including longitudinal analysis. The book argues that unions have both a monopoly face (raising wages above competitive levels) and a collective voice face (improving workplace communication and reducing turnover). It remains influential as a comprehensive empirical assessment of union effects and a common pedagogical motivation for fixed effects methods in labor economics.

  13. Fremeth, A. R., Holburn, G. L. F., & Richter, B. K. (2016). Bridging Qualitative and Quantitative Methods in Organizational Research: Applications of Synthetic Control Methodology in the U.S. Automobile Industry. Organization Science, 27(2), 462–482.

    doi.org/10.1287/orsc.2015.1034

    Application · Mgmt · on synthetic control
    management · strategy · firm-level-synthetic-control
    Annotation

    Fremeth, Holburn, and Richter introduce synthetic control methodology to strategic management research, demonstrating its application for studying the causal effect of organizational and regulatory events on individual firms. The paper shows how data-driven counterfactuals can replace ad-hoc comparison group selection in comparative case studies. It provides a template for strategy researchers seeking to apply synthetic control methods to firm-level outcome data with few treated units.

  14. Freyaldenhoven, S., Hansen, C., & Shapiro, J. M. (2019). Pre-Event Trends in the Panel Event-Study Design. American Economic Review, 109(9), 3307–3338.

    doi.org/10.1257/aer.20180609

    Foundational · on event studies
    pre-trends · panel-data · instrumental-variables
    Annotation

    Freyaldenhoven, Hansen, and Shapiro study panel event-study designs in which unobserved confounds can generate pre-event trends. They show how causal effects can still be identified by exploiting covariates related to the policy only through the confounds, yielding a 2SLS estimator that remains valid even when endogeneity induces pre-trends.

  15. Friebel, G., Heinz, M., & Zubanov, N. (2022). Middle Managers, Personnel Turnover, and Performance: A Long-Term Field Experiment in a Retail Chain. Management Science, 68(1), 211–229.

    doi.org/10.1287/mnsc.2020.3905

    Application · Mgmt · on experimental design
    field-experiment · RCT · management-practices · turnover · retail
    Annotation

    Friebel, Heinz, and Zubanov conduct a long-term randomized field experiment in a large Eastern European retail chain, in which the CEO asked treated store managers to reduce employee quit rates. The intervention decreased the quit rate by a fifth to a quarter; the effect lasted about nine months before petering out, then reappeared after a reminder. There was, however, no treatment effect on sales, illustrating that reducing turnover does not automatically translate into improved store performance.

  16. Frisch, R., & Waugh, F. V. (1933). Partial Time Regressions as Compared with Individual Trends. Econometrica, 1(4), 387–401.

    doi.org/10.2307/1907330

    Foundational · on OLS regression
    FWL-theorem · partialling-out · multiple-regression · fixed-effects
    Annotation

    Frisch and Waugh establish that a coefficient in a multiple regression can be obtained by first residualizing both the outcome and the regressor against all other covariates. The Frisch-Waugh-Lovell (FWL) theorem provides the theoretical foundation for understanding what 'controlling for' means in multiple regression and is the basis for modern fixed-effects estimation.
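
    The theorem is easy to verify numerically; a minimal sketch with simulated data (variable names and the data-generating process are mine):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
W = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # controls + intercept
x = W @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)      # regressor of interest
y = 2.0 * x + W @ np.array([0.2, 1.0, 0.7]) + rng.normal(size=n)

# Coefficient on x in the full multiple regression of y on [x, W].
full = np.linalg.lstsq(np.column_stack([x, W]), y, rcond=None)[0][0]

# FWL: residualize both y and x on the controls W, then regress
# residual on residual; the coefficient is identical.
x_res = x - W @ np.linalg.lstsq(W, x, rcond=None)[0]
y_res = y - W @ np.linalg.lstsq(W, y, rcond=None)[0]
fwl = (x_res @ y_res) / (x_res @ x_res)
print(np.isclose(full, fwl))
```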

  17. Funk, M. J., Westreich, D., Wiesen, C., Stürmer, T., Brookhart, M. A., & Davidian, M. (2011). Doubly Robust Estimation of Causal Effects. American Journal of Epidemiology, 173(7), 761–767.

    doi.org/10.1093/aje/kwq439

    epidemiology · tutorial · AIPW
    Annotation

    Funk and colleagues provide a practical tutorial on doubly robust estimation for epidemiologists, demonstrating through a worked example how the AIPW estimator protects against misspecification of either the outcome model or the propensity score model. This paper helps spread the method in health sciences.
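
    A compact sketch of an AIPW estimator of the kind the tutorial describes, on simulated data with hand-rolled OLS and logistic fits (all names and the data-generating process are illustrative, not the paper's worked example):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000
x = rng.normal(size=n)
p_true = 1 / (1 + np.exp(-0.5 * x))           # true propensity score
d = (rng.uniform(size=n) < p_true).astype(float)
y = 1.0 * d + 2.0 * x + rng.normal(size=n)    # true ATE = 1

X = np.column_stack([np.ones(n), x])

# Outcome regressions fit separately within each treatment arm.
m1 = X @ np.linalg.lstsq(X[d == 1], y[d == 1], rcond=None)[0]
m0 = X @ np.linalg.lstsq(X[d == 0], y[d == 0], rcond=None)[0]

# Propensity model: logistic regression fit by Newton's method.
g = np.zeros(2)
for _ in range(25):
    p = 1 / (1 + np.exp(-(X @ g)))
    Wt = (p * (1 - p))[:, None]
    g += np.linalg.solve(X.T @ (Wt * X), X.T @ (d - p))
p_hat = 1 / (1 + np.exp(-(X @ g)))

# AIPW combines both models; it is consistent if either one is correct.
psi = m1 - m0 + d * (y - m1) / p_hat - (1 - d) * (y - m0) / (1 - p_hat)
ate = psi.mean()
print(round(ate, 2))  # close to the true ATE of 1
```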

G
19
  1. Gelman, A., & Carlin, J. (2014). Beyond Power Calculations: Assessing Type S (Sign) and Type M (Magnitude) Errors. Perspectives on Psychological Science, 9(6), 641–651.

    doi.org/10.1177/1745691614551642

    Foundational · on power analysis
    Type-S-error · Type-M-error · exaggeration-ratio
    Annotation

    Gelman and Carlin extend traditional power analysis by introducing Type S (sign) errors (the probability a significant estimate has the wrong sign) and Type M (magnitude) errors (the expected exaggeration ratio of significant estimates). These concepts provide a richer understanding of what happens in underpowered studies.
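
    Both quantities are easy to compute by simulation; a sketch with assumed numbers (a true effect of 0.1 and a standard error of 0.5, chosen for illustration rather than taken from the paper):

```python
import numpy as np

true_effect, se = 0.1, 0.5                         # assumed design parameters
rng = np.random.default_rng(3)
est = rng.normal(true_effect, se, size=1_000_000)  # replicated estimates
sig = np.abs(est / se) > 1.96                      # two-sided 5% significance

power = sig.mean()                        # chance of significance (low here)
type_s = (est[sig] < 0).mean()            # wrong sign among significant results
type_m = np.abs(est[sig]).mean() / true_effect    # average exaggeration ratio
print(round(power, 3), round(type_s, 2), round(type_m, 1))
```

    In this underpowered design, roughly a quarter of significant estimates have the wrong sign and the significant ones exaggerate the true effect by an order of magnitude.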

  2. Gelman, A., & Loken, E. (2014). The Statistical Crisis in Science. American Scientist, 102(6), 460–465.

    doi.org/10.1511/2014.111.460

    Foundational · on pre-registration
    replication-crisis · statistical-crisis · pre-registration
    Annotation

    Gelman and Loken argue that data-dependent analysis creates a 'garden of forking paths' that explains why many statistically significant comparisons do not hold up. They emphasize that researchers' analytical choices conditional on data characteristics inflate false positive rates even without deliberate p-hacking.

  3. Gelman, A., & Imbens, G. W. (2019). Why High-Order Polynomials Should Not Be Used in Regression Discontinuity Designs. Journal of Business & Economic Statistics, 37(3), 447–456.

    doi.org/10.1080/07350015.2017.1366909

    polynomial-order · local-polynomial · best-practices · bandwidth
    Annotation

    Gelman and Imbens show that using high-order global polynomials in RDD leads to noisy estimates, sensitivity to the degree of polynomial, and poor coverage of confidence intervals. They recommend local linear or quadratic fits with appropriate bandwidth selection instead, fundamentally changing best practice for RDD estimation.
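
    A minimal sketch of the recommended local linear estimator on simulated data (the bandwidth and data-generating process are illustrative; in practice the bandwidth would be chosen by a data-driven selector):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50_000
x = rng.uniform(-1, 1, size=n)                # running variable, cutoff at 0
treat = (x >= 0).astype(float)
y = np.sin(2 * x) + 2.0 * treat + rng.normal(scale=0.5, size=n)  # true jump = 2

# Local linear fit: keep observations within a bandwidth of the cutoff and
# allow separate intercepts and slopes on each side; the coefficient on
# `treat` estimates the discontinuity.
h = 0.1
keep = np.abs(x) < h
Z = np.column_stack([np.ones(keep.sum()), treat[keep], x[keep], (treat * x)[keep]])
beta = np.linalg.lstsq(Z, y[keep], rcond=None)[0]
print(round(beta[1], 2))  # close to the true jump of 2
```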

  4. Gerard, F., Rokkanen, M., & Rothe, C. (2020). Bounds on Treatment Effects in Regression Discontinuity Designs with a Manipulated Running Variable. Quantitative Economics, 11(3), 839–870.

    doi.org/10.3982/QE1079

    Foundational · on Lee bounds
    RDD · manipulation · running-variable
    Annotation

    Gerard, Rokkanen, and Rothe study regression-discontinuity settings in which the running variable is manipulated, so conventional point identification fails. They show that treatment effects are still partially identified and derive sharp bounds under a general model in which the extent of manipulation is learned from the data.

  5. Gerber, A. S., & Green, D. P. (2012). Field Experiments: Design, Analysis, and Interpretation. W. W. Norton.

    field-experiments · textbook · political-science
    Annotation

    Gerber and Green write a comprehensive textbook on field experiments covering randomization, blocking, clustering, noncompliance, and attrition. The book provides rigorous treatment of experimental design principles with practical guidance drawn from political science and public policy applications. It is particularly valuable for its coverage of complications that arise in real-world experiments, including how to handle noncompliance through intent-to-treat analysis and instrumental variables.

  6. Glynn, A. N., & Quinn, K. M. (2010). An Introduction to the Augmented Inverse Propensity Weighted Estimator. Political Analysis, 18(1), 36–56.

    doi.org/10.1093/pan/mpp036

    political-science · tutorial · AIPW
    Annotation

    Glynn and Quinn introduce the AIPW estimator to political scientists, providing intuition, simulation evidence, and practical guidance. This tutorial demonstrates the advantages of doubly robust methods over propensity score weighting or outcome regression alone in social science applications.

  7. Gneezy, U., & List, J. A. (2006). Putting Behavioral Economics to Work: Testing for Gift Exchange in Labor Markets Using Field Experiments. Econometrica, 74(5), 1365–1384.

    doi.org/10.1111/j.1468-0262.2006.00707.x

    Annotation

    Gneezy and List conduct field experiments to test gift exchange in labor markets. Workers who received an unexpectedly higher wage initially increased effort, but the effect dissipated within hours, suggesting that strong forms of gift exchange may not persist outside the laboratory.

  8. Gobillon, L., & Magnac, T. (2016). Regional Policy Evaluation: Interactive Fixed Effects and Synthetic Controls. Review of Economics and Statistics, 98(3), 535–551.

    doi.org/10.1162/REST_a_00537

    Foundational · on synthetic control
    interactive-fixed-effects · factor-models · regional-policy
    Annotation

    Gobillon and Magnac connect synthetic control to interactive fixed-effects models, showing that synthetic control can be interpreted as an estimator that allows for time-varying factor loadings. This paper bridges the synthetic control and factor model literatures.

  9. Goldfarb, B., & King, A. A. (2016). Scientific Apophenia in Strategic Management Research: Significance Tests & Mistaken Inference. Strategic Management Journal, 37(1), 167–176.

    doi.org/10.1002/smj.2459

    Application · Mgmt · on specification curve
    apophenia · strategic-management · robustness
    Annotation

    Goldfarb and King use distributional matching and posterior predictive checks to estimate that 24–40% of significant coefficients in strategic management research would become insignificant if the studies were repeated. They document the problem of apophenia (finding patterns in noise) and offer practical suggestions for reducing false and inflated findings at both the individual and field level.

  10. Goldsmith-Pinkham, P., Sorkin, I., & Swift, H. (2020). Bartik Instruments: What, When, Why, and How. American Economic Review, 110(8), 2586–2624.

    doi.org/10.1257/aer.20181047

    share-exogeneity · decomposition · identification
    Annotation

    Goldsmith-Pinkham, Sorkin, and Swift provide a rigorous econometric framework for shift-share instruments, showing that the Bartik instrument can be decomposed into a weighted sum of individual share-based instruments. They clarify that identification requires exogeneity of the initial shares, not the shocks.

  11. Goodman-Bacon, A. (2021). Difference-in-Differences with Variation in Treatment Timing. Journal of Econometrics, 225(2), 254–277.

    doi.org/10.1016/j.jeconom.2021.03.014

    TWFE-decomposition · treatment-timing · negative-weights
    Annotation

    Goodman-Bacon decomposes the two-way fixed-effects DID estimator into a weighted average of all possible two-group, two-period DID comparisons, revealing that some comparisons use already-treated units as controls. The decomposition clarifies when already-treated units enter as controls and why this can make the estimator difficult to interpret under treatment-effect heterogeneity.

  12. Gornall, W., & Strebulaev, I. A. (2025). Gender, Race, and Entrepreneurship: A Randomized Field Experiment on Venture Capitalists and Angels. Management Science, 71(6), 5308–5327.

    doi.org/10.1287/mnsc.2024.4990

    Application · Mgmt · on experimental design
    audit-study · correspondence-study · discrimination · venture-capital · entrepreneurship · +2
    Annotation

    Gornall and Strebulaev conduct a large-scale correspondence experiment, sending approximately 80,000 pitch emails from fictitious startups to 28,000 venture capitalists and angel investors. By randomly varying the entrepreneur's name to signal gender and race, they find that female entrepreneurs received 9% more interested replies than male entrepreneurs, and Asian-surname entrepreneurs received 6% more responses than White-surname entrepreneurs, indicating favorable rather than adverse bias. The paper provides large-scale experimental evidence on investor response patterns by entrepreneur demographics in entrepreneurial finance.

  13. Gourieroux, C., Monfort, A., & Trognon, A. (1984). Pseudo Maximum Likelihood Methods: Theory. Econometrica, 52(3), 681–700.

    doi.org/10.2307/1913471

    pseudo-MLE · Poisson-regression · robust-estimation · PPML
    Annotation

    Gourieroux, Monfort, and Trognon develop the general theory of pseudo maximum likelihood estimation for cases in which the likelihood family may be misspecified. They derive conditions for consistency and asymptotic normality and characterize efficiency bounds in this broader framework. The Poisson PML result — consistency for the conditional mean under misspecification — is a special case that underpins the later widespread use of Poisson regression with robust standard errors.
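
    The Poisson PML robustness property is easy to see in simulation; a sketch with an overdispersed data-generating process (my construction, not an example from the paper):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 20_000
x = rng.normal(size=n)
mu = np.exp(0.5 + 0.8 * x)                   # true conditional mean
# Overdispersed counts: gamma heterogeneity makes y non-Poisson, but the
# conditional mean E[y|x] = mu is intact, which is all Poisson PML requires.
y = rng.poisson(mu * rng.gamma(1.0, 1.0, size=n))

X = np.column_stack([np.ones(n), x])
beta = np.array([np.log(y.mean()), 0.0])     # simple starting value
for _ in range(50):                          # Newton steps on the Poisson score
    m = np.exp(X @ beta)
    beta += np.linalg.solve(X.T @ (m[:, None] * X), X.T @ (y - m))
print(np.round(beta, 2))  # close to (0.5, 0.8) despite the misspecification
```

    In applied work the same point is usually made by running Poisson regression with robust (sandwich) standard errors rather than trusting the Poisson likelihood's variance.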

  14. Grambsch, P. M., & Therneau, T. M. (1994). Proportional Hazards Tests and Diagnostics Based on Weighted Residuals. Biometrika, 81(3), 515–526.

    doi.org/10.1093/biomet/81.3.515

    foundational · diagnostics · schoenfeld-residuals
    Annotation

    Grambsch and Therneau introduce the scaled Schoenfeld residual test for the proportional hazards assumption. Plotting scaled Schoenfeld residuals against time reveals time-varying effects. The test is the standard diagnostic in applied survival analysis.

  15. Grant, A. M. (2008). The Significance of Task Significance: Job Performance Effects, Relational Mechanisms, and Boundary Conditions. Journal of Applied Psychology, 93(1), 108–124.

    doi.org/10.1037/0021-9010.93.1.108

    Application · on experimental design
    task-significance · motivation · organizational-behavior · field-experiment
    Annotation

    Grant conducts field experiments showing that briefly exposing workers to the beneficiaries of their work significantly increased their motivation and performance. This paper is a well-known example of experimental design applied within organizational behavior research.

  16. Greve, H. R. (2003). A Behavioral Theory of R&D Expenditures and Innovations: Evidence from Shipbuilding. Academy of Management Journal, 46(6), 685–702.

    doi.org/10.5465/30040661

    Application · Mgmt · on Poisson negative binomial
    behavioral-theory · aspiration-levels · innovation · R&D · negative-binomial · +1
    Annotation

    Greve tests behavioral theory predictions about how performance relative to aspiration levels affects R&D investment and innovation output using count models in the Japanese shipbuilding industry. He finds that low performance triggers problemistic search (increasing R&D), high slack triggers slack search (also increasing R&D), and low performance increases risk tolerance for launching innovations. The paper demonstrates how to model count-based innovation outcomes with firm-level panel data in a management context.

  17. Griliches, Z. (1977). Estimating the Returns to Schooling: Some Econometric Problems. Econometrica, 45(1), 1–22.

    doi.org/10.2307/1913285

    Foundationalon ols regression
    ability-bias · returns-to-education · omitted-variables
    Annotation

    Griliches systematically examines the biases in OLS estimates of returns to schooling, including ability bias and measurement error. This paper is a classic illustration of why researchers must think carefully about omitted variables when interpreting OLS coefficients causally.

  18. Griliches, Z. (1990). Patent Statistics as Economic Indicators: A Survey. Journal of Economic Literature, 28(4), 1661–1707.

    patents · innovation · economic-indicators
    Annotation

    Griliches surveys the use of patent data as economic indicators, establishing patent counts as a key measure of innovative output. This survey motivates much of the subsequent applied work using Poisson and negative binomial models to study innovation.

  19. Gruber, J. (1994). The Incidence of Mandated Maternity Benefits. American Economic Review, 84(3), 622–641.

    maternity-benefits · labor-economics · policy-evaluation
    Annotation

    Gruber uses a DID design exploiting variation in state-level mandated maternity benefits to show that the costs of these benefits are shifted to workers in the form of lower wages. This study is a classic example of how DID can exploit policy variation across states and time.

H
26
  1. Hahn, J. (1998). On the Role of the Propensity Score in Efficient Semiparametric Estimation of Average Treatment Effects. Econometrica, 66(2), 315–331.

    doi.org/10.2307/2998560

    semiparametric-efficiency · propensity-score · efficiency-bound
    Annotation

    Hahn derives the semiparametric efficiency bound for estimating average treatment effects and shows that knowledge of the propensity score does not improve the bound—it is ancillary for ATE. The efficient estimators take the form of sample averages completed by nonparametric imputation. This paper is foundational for understanding efficient semiparametric estimation of treatment effects.

  2. Hahn, J., Todd, P., & Van der Klaauw, W. (2001). Identification and Estimation of Treatment Effects with a Regression-Discontinuity Design. Econometrica, 69(1), 201–209.

    doi.org/10.1111/1468-0262.00183

    identificationnonparametricWald-estimator
    Annotation

    Hahn, Todd, and Van der Klaauw provide the formal econometric framework for both sharp and fuzzy regression discontinuity designs. For the fuzzy case, they show that the treatment effect can be identified as the ratio of the discontinuity in the outcome to the discontinuity in the treatment probability, analogous to a Wald estimator.
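
    The fuzzy RD estimand they describe has a direct numerical analogue: estimate the discontinuities in the outcome and in treatment take-up at the cutoff and take their ratio. A minimal sketch on simulated data (the bandwidth, local linear fits, and data-generating process below are illustrative assumptions, not the paper's estimator):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
x = rng.uniform(-1, 1, n)                  # running variable, cutoff at 0
z = (x >= 0).astype(float)                 # indicator for crossing the cutoff
d = rng.binomial(1, 0.2 + 0.6 * z)         # fuzzy take-up: probability jumps by 0.6
y = 1.0 + 0.5 * x + 2.0 * d + rng.normal(0, 1, n)   # true treatment effect = 2

h = 0.2                                    # bandwidth around the cutoff (assumed)
left = (x > -h) & (x < 0)
right = (x >= 0) & (x < h)

def intercept_at_cutoff(mask, v):
    # fit v ~ a + b*x on one side; the intercept is the fitted value at x = 0
    slope, intercept = np.polyfit(x[mask], v[mask], 1)
    return intercept

jump_y = intercept_at_cutoff(right, y) - intercept_at_cutoff(left, y)
jump_d = intercept_at_cutoff(right, d) - intercept_at_cutoff(left, d)
late = jump_y / jump_d                     # Wald-type fuzzy RD estimate
print(round(late, 2))                      # close to the true effect of 2
```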

  3. Hainmueller, J. (2012). Entropy Balancing for Causal Effects: A Multivariate Reweighting Method to Produce Balanced Samples in Observational Studies. Political Analysis, 20(1), 25–46.

    doi.org/10.1093/pan/mpr025

    Foundationalon matching methods
    entropy-balancingreweightingcovariate-balanceobservational-studies
    Annotation

    Hainmueller introduces entropy balancing, a reweighting scheme that directly targets covariate balance by finding weights that satisfy pre-specified balance constraints while remaining as close to uniform as possible. Entropy balancing has become a popular alternative to propensity score matching because it achieves exact balance on specified moments by construction.
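
    The reweighting scheme has a convenient dual form: the weights are exponential tilts of the uniform base weights, with the tilt chosen so the reweighted control means hit the target moments exactly. A sketch under assumed data and targets (not Hainmueller's application):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
Xc = rng.normal(0.0, 1.0, size=(300, 2))     # control-group covariates (simulated)
target = np.array([0.5, -0.3])               # treated-group means to match (assumed)

def dual(lam):
    # entropy-balancing dual objective: log-sum-exp of the tilts minus lam'target
    return np.log(np.exp(Xc @ lam).sum()) - target @ lam

res = minimize(dual, x0=np.zeros(2), method="BFGS")
w = np.exp(Xc @ res.x)
w /= w.sum()                                 # normalized entropy-balancing weights

print(np.round(w @ Xc, 3))                   # reweighted means hit the targets
```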

  4. Hamilton, B. H., & Nickerson, J. A. (2003). Correcting for Endogeneity in Strategic Management Research. Strategic Organization, 1(1), 51–78.

    doi.org/10.1177/1476127003001001218

    FoundationalMgmton ols regression
    endogeneitystrategyself-selection
    Annotation

    Hamilton and Nickerson warn strategy researchers that naive OLS estimates of the strategy-performance relationship are often biased by endogeneity, because firms that adopt a strategy differ systematically from those that do not. They provide an accessible tutorial on endogeneity and point toward solutions including instrumental variables and Heckman selection models. The paper remains a key reference for understanding why strategic management research requires identification strategies beyond simple regression.

  5. Harrison, G. W., & List, J. A. (2004). Field Experiments. Journal of Economic Literature, 42(4), 1009–1055.

    doi.org/10.1257/0022051043004577

    field-experimentstaxonomyexternal-validity
    Annotation

    Harrison and List provide an influential taxonomy of field experiments, distinguishing artefactual, framed, and natural field experiments from conventional lab experiments. The paper helps establish field experiments as a mainstream methodology in economics.

  6. Haushofer, J., & Shapiro, J. (2016). The Short-Term Impact of Unconditional Cash Transfers to the Poor: Experimental Evidence from Kenya. Quarterly Journal of Economics, 131(4), 1973–2042.

    doi.org/10.1093/qje/qjw025

    Applicationon multiple testing
    cash-transfersRCTFDRdevelopment-economics
    Annotation

    Haushofer and Shapiro evaluate GiveDirectly's unconditional cash transfer program in Kenya, testing effects across many outcome domains including consumption, assets, food security, health, and psychological well-being. They apply FWER corrections with bootstrapped p-values across outcome families, providing a model for how to handle multiple testing transparently in large-scale randomized evaluations. A 2017 erratum (QJE 132(4): 2057–2060) corrected the FWER-adjusted p-values in Tables I and II, which had used insufficient bootstrap iterations.

  7. Hausman, J. A. (1978). Specification Tests in Econometrics. Econometrica, 46(6), 1251–1271.

    doi.org/10.2307/1913827

    Hausman-testspecification-testfixed-vs-random
    Annotation

    Hausman develops a general framework for specification testing based on comparing two estimators: one consistent under a broad set of assumptions and one efficient under a narrower null hypothesis. The test's most well-known application compares fixed effects (consistent if unit effects are correlated with regressors) against random effects (efficient under the null of no correlation), but the framework applies broadly to IV, simultaneous equations, and time-series cross-section models. The test statistic has a chi-squared distribution under the null and remains one of the most widely used diagnostic tools in applied econometrics.
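
    The estimator-contrast idea reduces to a quadratic form in the difference between the two estimates. A minimal numeric sketch with made-up FE/RE estimates and covariance matrices (purely illustrative):

```python
import numpy as np
from scipy import stats

# hypothetical fixed-effects and random-effects results (not from any real dataset)
b_fe = np.array([1.10, -0.52])
V_fe = np.array([[0.040, 0.002], [0.002, 0.030]])
b_re = np.array([0.95, -0.48])
V_re = np.array([[0.025, 0.001], [0.001, 0.020]])

diff = b_fe - b_re
H = diff @ np.linalg.inv(V_fe - V_re) @ diff   # Hausman statistic
p = stats.chi2.sf(H, df=len(diff))             # chi-squared under the null
print(round(H, 2), round(p, 3))                # H ~ 1.75, p ~ 0.42: no rejection here
```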

  8. Hausman, J. A., & Taylor, W. E. (1981). Panel Data and Unobservable Individual Effects. Econometrica, 49(6), 1377–1398.

    doi.org/10.2307/1911406

    Foundationalon random effects
    Hausman-Taylortime-invariant-variablespanel-datainstrumental-variables
    Annotation

    Hausman and Taylor develop an instrumental variables estimator for panel data that allows consistent estimation of coefficients on time-invariant variables even when individual effects are correlated with some regressors. The Hausman-Taylor estimator occupies a middle ground between fixed effects (which cannot estimate time-invariant coefficients) and random effects (which requires strict exogeneity).

  9. Hausman, J., Hall, B. H., & Griliches, Z. (1984). Econometric Models for Count Data with an Application to the Patents–R&D Relationship. Econometrica, 52(4), 909–938.

    doi.org/10.2307/1911191

    count-datapatentsR&Dpanel-data
    Annotation

    Hausman, Hall, and Griliches develop the econometric framework for Poisson and negative binomial regression models applied to count data, using the relationship between R&D spending and patent counts as the motivating application. The paper is a classic early econometric treatment of count-data models in panel settings.

  10. Hausman, J., & McFadden, D. (1984). Specification Tests for the Multinomial Logit Model. Econometrica, 52(5), 1219–1240.

    doi.org/10.2307/1910997

    Foundationalon logit probit
    IIAspecification-testmultinomial-logit
    Annotation

    Hausman and McFadden develop a specification test for the independence of irrelevant alternatives (IIA) assumption in multinomial logit. The test allows researchers to assess whether the logit model's restrictive substitution patterns are appropriate for their data, which is critical for applied work with multiple choice categories.

  11. Haven, T. L., & Van Grootel, L. (2019). Preregistering Qualitative Research. Accountability in Research, 26(3), 229–244.

    doi.org/10.1080/08989621.2019.1580147

    qualitative-researchpre-registrationextension
    Annotation

    Haven and Van Grootel explore extending pre-registration to qualitative research, discussing what elements of qualitative studies can and should be pre-registered. This paper broadens the pre-registration conversation beyond quantitative experimental designs.

  12. Heckman, J. J. (1979). Sample Selection Bias as a Specification Error. Econometrica, 47(1), 153–161.

    doi.org/10.2307/1912352

    foundationalselection-biasinverse-mills-ratio
    Annotation

    Heckman introduces the two-step estimator for correcting sample selection bias using the inverse Mills ratio. The paper shows that selection bias can be treated as an omitted variable problem, where the omitted variable is the conditional expectation of the error term given selection. One of the most cited papers in econometrics.
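
    The two-step logic can be sketched on simulated data (the data-generating process, the 0.7 error correlation, and the exclusion restriction z are all assumptions for illustration): a probit selection equation yields the inverse Mills ratio, which then enters the outcome OLS as an extra regressor.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(2)
n = 5_000
x = rng.normal(size=n)                         # outcome regressor
z = rng.normal(size=n)                         # exclusion restriction: selection only
u, e = rng.multivariate_normal([0, 0], [[1, 0.7], [0.7, 1]], n).T
s = (0.5 + 1.0 * z + 0.8 * x + u) > 0          # selection equation
y = 1.0 + 2.0 * x + e                          # outcome, observed only when s is True

# step 1: probit of selection, fitted by maximum likelihood
W = np.column_stack([np.ones(n), z, x])

def nll(g):
    return -norm.logcdf((2 * s - 1) * (W @ g)).sum()

g = minimize(nll, np.zeros(3), method="BFGS").x
imr = norm.pdf(W @ g) / norm.cdf(W @ g)        # inverse Mills ratio

# step 2: OLS on the selected sample with the IMR as a control
X2 = np.column_stack([np.ones(s.sum()), x[s], imr[s]])
beta = np.linalg.lstsq(X2, y[s], rcond=None)[0]
print(np.round(beta, 2))   # slope on x near 2; IMR coefficient near rho*sigma = 0.7
```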

  13. Heckman, J. J., Ichimura, H., & Todd, P. E. (1997). Matching as an Econometric Evaluation Estimator: Evidence from Evaluating a Job Training Programme. Review of Economic Studies, 64(4), 605–654.

    doi.org/10.2307/2971733

    Foundationalon matching methods
    matching-estimatorcommon-supportprogram-evaluation
    Annotation

    Heckman, Ichimura, and Todd develop the econometric theory behind matching estimators, including conditions for identification and the importance of common support. They apply these methods to evaluate job training programs and show when matching works well and when it does not.

  14. Heckman, J. J., & Vytlacil, E. (2005). Structural Equations, Treatment Effects, and Econometric Policy Evaluation. Econometrica, 73(3), 669–738.

    doi.org/10.1111/j.1468-0262.2005.00594.x

    MTEtreatment-effectsLATEATEATT+2
    Annotation

    Heckman and Vytlacil use the marginal treatment effect (MTE) to connect the treatment-effects literature with structural econometric policy evaluation. A central result is that commonly used treatment-effect parameters (ATE, ATT, LATE, PRTE) can be expressed as weighted averages of the MTE curve, with each estimand using a different weight function. The framework shows how IV estimates with different instruments recover different weighted averages of the same underlying MTE, providing the theoretical foundation for understanding instrument-dependent variation in treatment-effect estimates.

  15. Henderson, A. D., Miller, D., & Hambrick, D. C. (2006). How Quickly Do CEOs Become Obsolete? Industry Dynamism, CEO Tenure, and Company Performance. Strategic Management Journal, 27(5), 447–460.

    doi.org/10.1002/smj.524

    ApplicationMgmton fixed effects
    CEO-tenurefirm-performanceindustry-dynamism
    Annotation

    Henderson, Miller, and Hambrick study how CEO tenure affects performance in dynamic versus stable industries in this longitudinal strategy paper. In the stable food industry, performance improved steadily with tenure, declining only after 10–15 years; in the dynamic computer industry, performance declined steadily from the start. The paper demonstrates that the relationship between CEO tenure and performance is contingent on industry dynamism.

  16. Heß, S. (2017). Randomization Inference with Stata: A Guide and Software. Stata Journal, 17(3), 630–651.

    doi.org/10.1177/1536867X1701700306

    Statasoftwareimplementation
    Annotation

    Heß develops the ritest Stata command for randomization inference, providing a practical tool for conducting permutation tests under arbitrary randomization procedures in experimental and quasi-experimental settings. The command accommodates stratified, clustered, and blocked randomization designs, and produces exact finite-sample p-values without distributional assumptions. The paper serves as both a software introduction and a practical guide to randomization inference for applied researchers.
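
    ritest is a Stata command, but the underlying permutation logic fits in a few lines of Python (a sketch of simple complete re-randomization on simulated data; stratified or clustered designs would permute within the appropriate blocks instead):

```python
import numpy as np

rng = np.random.default_rng(3)
treat = np.array([1] * 20 + [0] * 20)
y = rng.normal(0, 1, 40) + 0.8 * treat          # simulated outcomes, true effect 0.8

def diff_means(t):
    return y[t == 1].mean() - y[t == 0].mean()

obs = diff_means(treat)

# re-draw the assignment many times and recompute the statistic under the sharp null
perm = np.array([diff_means(rng.permutation(treat)) for _ in range(10_000)])
p_exact = (np.abs(perm) >= np.abs(obs)).mean()  # finite-sample permutation p-value
print(round(float(obs), 2), round(float(p_exact), 4))
```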

  17. Ho, D. E., Imai, K., King, G., & Stuart, E. A. (2007). Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference. Political Analysis, 15(3), 199–236.

    doi.org/10.1093/pan/mpl013

    Foundationalon matching methods
    preprocessingmodel-dependencenonparametriccausal-inference
    Annotation

    Ho, Imai, King, and Stuart argue that matching should be used as a preprocessing step before parametric modeling, reducing model dependence and improving robustness of causal estimates. This influential paper reframed matching not as a standalone estimator but as a way to make subsequent parametric analyses less sensitive to specification choices.

  18. Hoenig, J. M., & Heisey, D. M. (2001). The Abuse of Power: The Pervasive Fallacy of Power Calculations for Data Analysis. The American Statistician, 55(1), 19–24.

    doi.org/10.1198/000313001300339897

    Foundationalon power analysis
    post-hoc-powerstatistical-fallacypower-calculations
    Annotation

    Hoenig and Heisey demonstrate that post hoc (observed) power calculations are fundamentally flawed because they are a monotone function of the p-value and add no information beyond the test result itself. This paper is essential reading for understanding why power analysis must be conducted before data collection.
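
    Their point can be verified directly: for a two-sided z-test, "observed power" can be computed from the p-value alone, so it carries no information beyond the test result. The function below is a standard textbook formula, not code from the paper; note the famous implication that p = 0.05 corresponds to observed power of about 50%.

```python
import numpy as np
from scipy.stats import norm

def observed_power(p, alpha=0.05):
    # post hoc power of a two-sided z-test, computed from the p-value alone:
    # treat the observed |z| as the true effect and ask how often it would reject
    z = norm.ppf(1 - p / 2)                # |z| implied by the two-sided p-value
    c = norm.ppf(1 - alpha / 2)
    return norm.cdf(z - c) + norm.cdf(-z - c)

pvals = np.array([0.01, 0.05, 0.20, 0.50])
powers = observed_power(pvals)
print(np.round(powers, 3))                 # strictly decreasing in p
```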

  19. Hoetker, G. (2007). The Use of Logit and Probit Models in Strategic Management Research: Critical Issues. Strategic Management Journal, 28(4), 331–343.

    doi.org/10.1002/smj.582

    ApplicationMgmton logit probit
    strategy-researchmethodologycoefficient-comparison
    Annotation

    Hoetker reviews how strategy researchers use logit and probit models and identifies common pitfalls, including misinterpretation of coefficients across groups and incorrect use of interaction terms. This paper provides concrete guidance for improving practice in management journals.

  20. Hofmann, D. A. (1997). An Overview of the Logic and Rationale of Hierarchical Linear Models. Journal of Management, 23(6), 723–744.

    doi.org/10.1177/014920639702300602

    FoundationalMgmton random effects
    HLMmanagement-methodologymultilevel
    Annotation

    Hofmann introduces hierarchical linear models to the management research community, explaining when and why multilevel random-effects models are appropriate for organizational data with nested structures. This tutorial is highly influential in promoting multilevel methods in management journals.

  21. Holland, P. W. (1986). Statistics and Causal Inference. Journal of the American Statistical Association, 81(396), 945–960.

    doi.org/10.1080/01621459.1986.10478354

    Foundationalon ols regression
    causal-inferencepotential-outcomesRubin-causal-modelfundamental-problem
    Annotation

    Holland articulates the fundamental problem of causal inference—that we can never observe both potential outcomes for the same unit—and formalizes the Rubin Causal Model framework. His dictum 'no causation without manipulation' shapes how a generation of researchers thinks about the conditions under which statistical associations can be given causal interpretations.

  22. Hollenbeck, J. R., & Wright, P. M. (2017). Harking, Sharking, and Tharking: Making the Case for Post Hoc Analysis of Scientific Data. Journal of Management, 43(1), 5–18.

    doi.org/10.1177/0149206316679487

    FoundationalMgmton pre registration
    HARKingpost-hoc-analysismanagement-methodology
    Annotation

    Hollenbeck and Wright introduce the concept of 'Tharking' (Transparently Hypothesizing After Results Are Known), arguing that post hoc analysis of scientific data is valuable when conducted and reported transparently. They distinguish destructive HARKing from constructive post hoc exploration, making the case that management researchers should embrace exploratory analysis in discussion sections rather than disguising it as confirmatory.

  23. Hoogendoorn, S., Parker, S. C., & van Praag, M. (2017). Smart or Diverse Start-up Teams? Evidence from a Field Experiment. Organization Science, 28(6), 1010–1028.

    doi.org/10.1287/orsc.2017.1158

    ApplicationMgmton experimental design
    field-experimentteam-diversityentrepreneurshipperformance
    Annotation

    Hoogendoorn, Parker, and van Praag conduct a field experiment with 573 students randomly assigned to 49 startup teams that varied in cognitive ability dispersion. They find an inverted U-shaped relationship between ability dispersion and team performance: teams with moderate ability dispersion outperform both homogeneous and highly dispersed teams. The random assignment to teams ensures that ability composition is exogenous, providing clean experimental identification of the effect of team cognitive diversity on venture performance.

  24. Horowitz, J. L., & Manski, C. F. (2000). Nonparametric Analysis of Randomized Experiments with Missing Covariate and Outcome Data. Journal of the American Statistical Association, 95(449), 77–84.

    doi.org/10.1080/01621459.2000.10473902

    Foundationalon lee bounds
    missing-datanonparametric-boundsrandomized-experiments
    Annotation

    Horowitz and Manski extend the bounding approach to experiments with missing data on both covariates and outcomes. They show how to construct valid bounds under different assumptions about the missing data mechanism, providing a principled alternative to complete-case analysis and imputation.

  25. Hurst, R., Lee, S., & Frake, J. (2024). The Effect of Flatter Hierarchy on Applicant Pool Gender Diversity: Evidence from Experiments. Strategic Management Journal, 45(8), 1446–1484.

    doi.org/10.1002/smj.3590

    ApplicationMgmton experimental design
    reverse-audit-studyfield-experimentgenderhierarchyrecruitment+1
    Annotation

    Hurst, Lee, and Frake conduct a reverse audit study in partnership with a U.S. healthcare startup, sending recruitment emails to approximately 8,400 job seekers with randomly varied descriptions of the firm's organizational hierarchy. Featuring a flatter hierarchy did not significantly affect applicant pool size but significantly decreased women's representation, because women perceived flatter structures as offering fewer career advancement opportunities and greater workload burdens.

  26. Huselid, M. A. (1995). The Impact of Human Resource Management Practices on Turnover, Productivity, and Corporate Financial Performance. Academy of Management Journal, 38(3), 635–672.

    doi.org/10.2307/256741

    ApplicationMgmton ols regression
    human-resource-managementfirm-performancestrategic-HRM
    Annotation

    Huselid uses OLS (and related cross-sectional methods) to estimate the relationship between HR practices and firm performance in this influential management study. It helps launch the field of strategic HRM and illustrates both the power and limitations of regression-based approaches in management research.

I
12
  1. Iacus, S. M., King, G., & Porro, G. (2012). Causal Inference without Balance Checking: Coarsened Exact Matching. Political Analysis, 20(1), 1–24.

    doi.org/10.1093/pan/mpr013

    Foundationalon matching methods
    coarsened-exact-matchingCEMbalance
    Annotation

    Iacus, King, and Porro introduce Coarsened Exact Matching (CEM), which coarsens covariates into bins and then performs exact matching within those bins. CEM avoids many pitfalls of propensity score matching, such as the need to check balance iteratively, and gives the researcher direct control over the matching quality.
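
    The coarsen-then-exact-match recipe can be sketched on a single covariate (the bin width, data-generating process, and ATT weighting below are illustrative assumptions; the authors' cem software handles the multivariate bookkeeping):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 1_000
age = rng.uniform(20, 60, n)
treat = rng.binomial(1, 1 / (1 + np.exp(-(age - 40) / 10)), n)   # older -> treated
y = 10 + 0.1 * age + 2.0 * treat + rng.normal(0, 1, n)           # true effect 2

# coarsen the covariate into 5-year bins, then match exactly within bins
bins = np.digitize(age, np.arange(20, 61, 5))
effects, weights = [], []
for b in np.unique(bins):
    in_b = bins == b
    t, c = in_b & (treat == 1), in_b & (treat == 0)
    if t.any() and c.any():                 # keep only strata with common support
        effects.append(y[t].mean() - y[c].mean())
        weights.append(t.sum())             # weight strata by treated count (ATT)

att = np.average(effects, weights=weights)
print(round(float(att), 2))                 # close to the true effect of 2
```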

  2. Imai, K., Keele, L., & Tingley, D. (2010). A General Approach to Causal Mediation Analysis. Psychological Methods, 15(4), 309–334.

    doi.org/10.1037/a0020761

    potential-outcomessequential-ignorabilitysensitivity-analysis
    Annotation

    Imai, Keele, and Tingley develop a general framework for causal mediation analysis grounded in the potential outcomes framework. They clarify the assumptions needed for identifying causal mediation effects, particularly the sequential ignorability assumption, and provide sensitivity analyses for violations.

  3. Imai, K., & Kim, I. S. (2019). When Should We Use Unit Fixed Effects Regression Models for Causal Inference with Longitudinal Data? American Journal of Political Science, 63(2), 467–490.

    doi.org/10.1111/ajps.12417

    Foundationalon fixed effects
    causal-inferencelongitudinal-datatreatment-historyassumptions
    Annotation

    Imai and Kim provide a modern causal-inference framework for understanding when unit fixed effects regression yields unbiased estimates with longitudinal data. They clarify the often-implicit assumptions about treatment history and carryover effects, offering a more rigorous foundation for applied fixed effects analysis.

  4. Imbens, G. W., & Angrist, J. D. (1994). Identification and Estimation of Local Average Treatment Effects. Econometrica, 62(2), 467–475.

    doi.org/10.2307/2951620

    Foundationalon instrumental variables
    LATEcompliersmonotonicityidentification
    Annotation

    In this foundational paper on the LATE, Imbens and Angrist show that IV identifies the average causal effect for compliers (the subpopulation whose treatment status is changed by the instrument) under the monotonicity assumption. This reinterpretation fundamentally changes how researchers understand what IV estimates.
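
    The sample analogue of their result is the Wald estimator: the reduced-form effect of the instrument on the outcome divided by its effect on treatment take-up. A simulated sketch (the compliance shares and constant treatment effect are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 50_000
z = rng.binomial(1, 0.5, n)                    # randomized binary instrument
always = rng.random(n) < 0.2                   # always-takers
complier = rng.random(n) < 0.5                 # compliers take treatment iff encouraged
d = (always | (complier & (z == 1))).astype(float)   # monotone response to z
y = 1.0 + 2.0 * d + rng.normal(0, 1, n)        # effect is 2 for everyone here,
                                               # so the complier effect is also 2
first_stage = d[z == 1].mean() - d[z == 0].mean()
reduced_form = y[z == 1].mean() - y[z == 0].mean()
wald = reduced_form / first_stage              # IV / LATE estimate
print(round(float(wald), 2))
```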

  5. Imbens, G. W. (2004). Nonparametric Estimation of Average Treatment Effects Under Exogeneity: A Review. Review of Economics and Statistics, 86(1), 4–29.

    doi.org/10.1162/003465304323023651

    average-treatment-effectunconfoundednessnonparametricsurvey
    Annotation

    Imbens provides a comprehensive review of nonparametric methods for estimating average treatment effects under the unconfoundedness assumption, covering matching, weighting, and subclassification estimators. This survey unifies the theoretical foundations of matching methods and clarifies the connections between different estimators used in program evaluation.

  6. Imbens, G. W., & Manski, C. F. (2004). Confidence Intervals for Partially Identified Parameters. Econometrica, 72(6), 1845–1857.

    doi.org/10.1111/j.1468-0262.2004.00555.x

    Foundationalon lee bounds
    partial-identificationconfidence-intervalsboundsinference
    Annotation

    Imbens and Manski develop methods for constructing valid confidence intervals when parameters are only partially identified—that is, when the data and assumptions narrow the parameter to a set rather than a point. This paper provides the inferential foundation for reporting uncertainty around bounds estimates, including Lee bounds.
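
    The construction can be sketched numerically: the Imbens-Manski critical value interpolates between the one-sided 1.645 and the two-sided 1.96 depending on how wide the identified set is relative to sampling error. The bound estimates and standard errors below are made up for illustration:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

lo, hi = 0.10, 0.50          # estimated lower and upper bounds (hypothetical)
se_lo, se_hi = 0.04, 0.06    # their standard errors (hypothetical)
alpha = 0.05

def coverage_gap(c):
    # Imbens-Manski (2004) condition defining the critical value c
    delta = (hi - lo) / max(se_lo, se_hi)
    return norm.cdf(c + delta) - norm.cdf(-c) - (1 - alpha)

c = brentq(coverage_gap, 0.0, 3.0)
ci = (lo - c * se_lo, hi + c * se_hi)
print(round(c, 3), np.round(ci, 3))   # wide identified set here, so c is near 1.645
```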

  7. Imbens, G. W., & Lemieux, T. (2008). Regression Discontinuity Designs: A Guide to Practice. Journal of Econometrics, 142(2), 615–635.

    doi.org/10.1016/j.jeconom.2007.05.001

    practical-guidebandwidth-selectionlocal-IV
    Annotation

    Imbens and Lemieux provide a comprehensive practical guide to implementing RDD, covering bandwidth selection, functional form, and graphical analysis. Their treatment of fuzzy RDD as a local IV estimator clarifies the interpretation and implementation for applied researchers.

  8. Imbens, G., & Kalyanaraman, K. (2012). Optimal Bandwidth Choice for the Regression Discontinuity Estimator. Review of Economic Studies, 79(3), 933–959.

    doi.org/10.1093/restud/rdr043

    bandwidth-selectionlocal-linearoptimal-bandwidth
    Annotation

    Imbens and Kalyanaraman derive the asymptotically optimal bandwidth for the local linear regression discontinuity estimator and propose a simple data-driven bandwidth selector. The IK bandwidth was the standard choice until the robust bias-corrected refinement of Calonico, Cattaneo, and Titiunik (2014).

  9. Imbens, G. W. (2015). Matching Methods in Practice: Three Examples. Journal of Human Resources, 50(2), 373–419.

    doi.org/10.3368/jhr.50.2.373

    Applicationon matching methods
    practical-guidepropensity-scorebalancesensitivity-analysis
    Annotation

    Imbens demonstrates how to implement matching methods in practice through three detailed empirical examples, covering propensity score estimation, covariate balance assessment, overlap and trimming, and robustness to alternative estimators. This paper is an invaluable practical guide that bridges the gap between matching theory and applied research.

  10. Imbens, G. W., & Rubin, D. B. (2015). Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction. Cambridge University Press.

    doi.org/10.1017/CBO9781139025751

    causal-inferencepotential-outcomespropensity-scoretextbook
    Annotation

    Imbens and Rubin provide a comprehensive textbook grounding causal inference in the potential outcomes framework, with detailed treatment of matching, propensity scores, and subclassification. The book gives rigorous foundations for selection-on-observables methods.

  11. Ioannidis, J. P. A., Stanley, T. D., & Doucouliagos, H. (2017). The Power of Bias in Economics Research. Economic Journal, 127(605), F236–F265.

    doi.org/10.1111/ecoj.12461

    underpowered-studiespublication-biasmeta-science
    Annotation

    Ioannidis, Stanley, and Doucouliagos conduct a large-scale assessment of statistical power in economics research and find that the median power to detect typical effect sizes is only 18%. They document widespread underpowering and publication bias, highlighting the importance of ex ante power analysis.

  12. Islam, N. (1995). Growth Empirics: A Panel Data Approach. Quarterly Journal of Economics, 110(4), 1127–1170.

    doi.org/10.2307/2946651

    Applicationon random effects
    growth-empiricsconvergencecross-countrypanel-data
    Annotation

    Islam applies panel data methods—including random effects and fixed effects—to the cross-country growth regression framework, showing that accounting for unobserved country heterogeneity substantially changes estimates of convergence rates. This paper demonstrates the importance of choosing between fixed and random effects in macroeconomic growth empirics.

J
2
  1. Jaeger, D. A., Ruist, J., & Stuhler, J. (2018). Shift-Share Instruments and the Impact of Immigration. NBER Working Paper No. 24285.

    doi.org/10.3386/w24285

    immigrationserial-correlationexclusion-restriction
    Annotation

    Jaeger, Ruist, and Stuhler highlight a threat to shift-share instruments in immigration research: serial correlation in immigrant inflows can bias estimates if past immigration affects current outcomes through channels other than current immigration. This paper raises important concerns about the exclusion restriction.

  2. Jia, N., Luo, X., Fang, Z., & Liao, C. (2024). When and How Artificial Intelligence Augments Employee Creativity. Academy of Management Journal, 67(1), 5–32.

    doi.org/10.5465/amj.2022.0426

    ApplicationMgmton experimental design
    field-experimentRCTartificial-intelligencecreativitydouble-randomization
    Annotation

    Jia, Luo, Fang, and Liao conduct a field experiment examining how AI assistance affects creative work through a sequential division of labor. They find that AI augmentation improves average output quality but reduces the novelty of top-performing work, with effects moderated by employee skill level. The paper provides causal evidence on the productivity implications of human-AI collaboration in knowledge work.

K
18
  1. Kang, J. D. Y., & Schafer, J. L. (2007). Demystifying Double Robustness: A Comparison of Alternative Strategies for Estimating a Population Mean from Incomplete Data. Statistical Science, 22(4), 523–539.

    doi.org/10.1214/07-STS227

    model-misspecificationsimulationcritical-assessment
    Annotation

    Kang and Schafer show through simulations that doubly robust estimators can perform poorly when both models are moderately misspecified, even though they remain consistent when one model is correct. This influential paper tempers enthusiasm and motivates further methodological work on practical performance.

  2. Kang, S. K., DeCelles, K. A., Tilcsik, A., & Jun, S. (2016). Whitened Résumés: Race and Self-Presentation in the Labor Market. Administrative Science Quarterly, 61(3), 469–502.

    doi.org/10.1177/0001839216639577

    ApplicationMgmton experimental design
    audit-studydiscriminationhiringracerésumés
    Annotation

    Kang and colleagues conduct a résumé audit study sending fictitious applications to real employers, finding that minority applicants who 'whitened' their résumés received significantly more callbacks. The study combines a correspondence experiment with qualitative interviews, providing a powerful example of how audit studies can identify discrimination in hiring.

  3. Kaplan, E. L., & Meier, P. (1958). Nonparametric Estimation from Incomplete Observations. Journal of the American Statistical Association, 53(282), 457–481.

    doi.org/10.1080/01621459.1958.10501452

    foundationalnonparametricsurvival-function
    Annotation

    Kaplan and Meier introduce the product-limit estimator (Kaplan-Meier estimator) for the survival function from right-censored data. The KM curve is the standard nonparametric tool for visualizing survival and comparing groups before fitting regression models.
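
    The product-limit estimator is short enough to write from scratch (toy data assumed for illustration): at each observed event time, multiply the running survival estimate by one minus the fraction of at-risk subjects who fail at that time.

```python
import numpy as np

# toy right-censored data: follow-up time and event indicator (1 = event, 0 = censored)
time = np.array([2, 3, 3, 5, 6, 7, 8, 9])
event = np.array([1, 1, 0, 1, 0, 1, 1, 0])

S = 1.0
curve = {}
for t in np.unique(time[event == 1]):          # step only at observed event times
    n_at_risk = (time >= t).sum()              # still under observation just before t
    d = ((time == t) & (event == 1)).sum()     # events occurring at t
    S *= 1 - d / n_at_risk                     # product-limit update
    curve[int(t)] = S

print({t: round(s, 3) for t, s in curve.items()})
# {2: 0.875, 3: 0.75, 5: 0.6, 7: 0.4, 8: 0.2}
```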

  4. Katila, R., & Ahuja, G. (2002). Something Old, Something New: A Longitudinal Study of Search Behavior and New Product Introduction. Academy of Management Journal, 45(6), 1183–1194.

    doi.org/10.2307/3069433

    ApplicationMgmton poisson negative binomial
    knowledge-searchnew-productsinnovation-management
    Annotation

    Katila and Ahuja use negative binomial models to study how the depth and scope of a firm's knowledge search affect new product introductions. This paper is a widely cited application of count data models in the strategic management and innovation literature.

  5. Kaul, A., Klossner, S., Pfeifer, G., & Schieler, M. (2022). Standard Synthetic Control Methods: The Case of Using All Preintervention Outcomes Together With Covariates. Journal of Business & Economic Statistics, 40(3), 1362–1376.

    doi.org/10.1080/07350015.2021.1930012

    Applicationon synthetic control
    synthetic-controlmatching-pitfallspre-treatment-outcomes
    Annotation

    Kaul et al. show that using all pre-treatment outcome lags as predictors in synthetic control (a form of matching for aggregate units) renders other covariates irrelevant, threatening unbiasedness. Their finding highlights pitfalls when matching on pre-treatment outcomes and is relevant for understanding matching assumptions more broadly.

  6. King, G., & Zeng, L. (2001). Logistic Regression in Rare Events Data. Political Analysis, 9(2), 137–163.

    doi.org/10.1093/oxfordjournals.pan.a004868

    Foundationalon logit probit
    rare-eventslogistic-regressionbinary-outcomesmethodology
    Annotation

    King and Zeng develop a correction for logistic regression when the outcome event is rare. Standard logit underestimates the probability of rare events; their rare-events logit (relogit) applies a correction based on prior information about the event rate in the population. Essential reference for binary outcome studies with highly imbalanced classes.
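
    One ingredient of their approach, the prior correction for case-control sampling, only moves the intercept: when events are oversampled relative to their population rate, the fitted logit intercept is shifted by a known log-odds factor. The rates and coefficient below are illustrative assumptions:

```python
import numpy as np

tau = 0.01        # true population event rate (assumed known)
ybar = 0.50       # event rate in the oversampled estimation sample
b0_sample = -0.2  # hypothetical intercept from the logit fit on the oversampled data

# prior correction: subtract the log of the relative sampling odds
b0_corrected = b0_sample - np.log(((1 - tau) / tau) * (ybar / (1 - ybar)))
print(round(float(b0_corrected), 3))   # the slopes are unaffected by this step
```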

  7. King, G., & Roberts, M. E. (2015). How Robust Standard Errors Expose Methodological Problems They Do Not Fix, and What to Do About It. Political Analysis, 23(2), 159–179.

    doi.org/10.1093/pan/mpu015

    robust-standard-errorsmodel-specificationmethodology
    Annotation

    King and Roberts argue that researchers often use robust standard errors as a band-aid rather than fixing the underlying model specification. They provide practical guidance on when robust SEs are appropriate and when the model itself needs to be reconsidered.

  8. King, G., & Nielsen, R. (2019). Why Propensity Scores Should Not Be Used for Matching. Political Analysis, 27(4), 435–454.

    doi.org/10.1017/pan.2019.11

    propensity-score-critiquebalancemodel-dependence
    Annotation

    King and Nielsen argue that propensity score matching can increase imbalance, model dependence, and bias relative to other matching methods. This provocative paper has influenced a shift toward alternatives like CEM and Mahalanobis distance matching in applied research.

  9. Kleven, H. J., & Waseem, M. (2013). Using Notches to Uncover Optimization Frictions and Structural Elasticities: Theory and Evidence from Pakistan. Quarterly Journal of Economics, 128(2), 669–723.

    doi.org/10.1093/qje/qjt004

    Foundationalon bunching estimation
    notchoptimization-frictionsstructural-estimationPakistan
    Annotation

    Kleven and Waseem extend bunching estimation from kinks to notches, discrete jumps in the tax schedule where the average tax rate changes discontinuously. They develop a structural framework that distinguishes between frictionless and frictional bunching, showing that optimization frictions attenuate observed bunching and cause the naive estimator to understate the true elasticity. Their model identifies both the structural elasticity and the friction distribution from the observed bunching pattern. Applied to Pakistan's income tax notches, they demonstrate that frictions are empirically important and that ignoring them substantially biases elasticity estimates downward.

  10. Kleven, H. J. (2016). Bunching. Annual Review of Economics, 8, 435–464.

    doi.org/10.1146/annurev-economics-080315-015234

    surveykinknotchfrictionsmethodology
    Annotation

    Kleven provides a comprehensive survey of the bunching methodology, covering both kink and notch designs, the role of optimization frictions, and extensions to multiple applications beyond taxation. The survey unifies the theoretical frameworks from Saez (2010) and Kleven and Waseem (2013), discusses practical implementation issues (polynomial order, bandwidth, bin width), and catalogs the growing literature applying bunching to estimate behavioral elasticities in public finance, labor economics, and regulation. Essential reading for anyone starting with bunching methods.

  11. Kline, P., & Walters, C. R. (2016). Evaluating Public Programs with Close Substitutes: The Case of Head Start. Quarterly Journal of Economics, 131(4), 1795–1848.

    doi.org/10.1093/qje/qjw027

    Application · on lee bounds
    Head-Start · program-evaluation · substitution
    Annotation

    Kline and Walters develop a semi-parametric selection model to evaluate Head Start in the presence of close substitute preschool programs, estimating both average and marginal treatment effects. They find that Head Start's effects vary substantially with the quality of available alternatives, and that the program passes a cost-benefit test for the average participant. The paper demonstrates how accounting for alternative program availability changes the interpretation of experimental treatment effects.

  12. Knaus, M. C., Lechner, M., & Strittmatter, A. (2021). Machine Learning Estimation of Heterogeneous Causal Effects: Empirical Monte Carlo Evidence. Econometrics Journal, 24(1), 134–161.

    doi.org/10.1093/ectj/utaa014

    labor-market-policy · heterogeneous-effects · empirical-Monte-Carlo
    Annotation

    Knaus, Lechner, and Strittmatter conduct an empirical Monte Carlo study benchmarking eleven causal machine learning estimators for heterogeneous treatment effects across 24 data-generating processes based on real labor market data. They find that no single estimator dominates across all settings, and that ensemble methods combining multiple learners perform well overall. The study provides practical guidance on when different CATE estimators (causal forests, DML-based methods, meta-learners) are most reliable.

  13. Koenker, R., & Bassett, G., Jr. (1978). Regression Quantiles. Econometrica, 46(1), 33–50.

    doi.org/10.2307/1913643

    foundational · quantile-regression · econometrics
    Annotation

    Koenker and Bassett introduce quantile regression, proposing to estimate conditional quantile functions by minimizing an asymmetric absolute loss (check function), generalizing least absolute deviations to arbitrary quantiles. Establishes asymptotic theory and demonstrates robustness to outliers and heteroscedasticity relative to OLS.
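
The check function at the heart of the method is simple enough to state directly. As a minimal sketch (with a search over data points standing in for the linear programming used in practice), minimizing the asymmetric absolute loss over a constant recovers the sample quantile:

```python
def check_loss(u, tau):
    # Koenker-Bassett check (pinball) loss: tau*u for u >= 0, (tau-1)*u for u < 0
    return tau * u if u >= 0 else (tau - 1) * u

def quantile_via_check(y, tau):
    # The tau-th sample quantile minimizes total check loss; a minimizer
    # is always attained at one of the observations, so search over them.
    return min(y, key=lambda q: sum(check_loss(yi - q, tau) for yi in y))

y = [1, 2, 3, 4, 100]
assert quantile_via_check(y, 0.5) == 3    # median: robust to the outlier 100
assert quantile_via_check(y, 0.25) == 2   # lower quartile
```

Replacing the constant with a linear index x'beta gives regression quantiles, estimated by linear programming rather than enumeration.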

  14. Koenker, R., & Machado, J. A. F. (1999). Goodness of Fit and Related Inference Processes for Quantile Regression. Journal of the American Statistical Association, 94(448), 1296–1310.

    doi.org/10.1080/01621459.1999.10473882

    Annotation

    Koenker and Machado introduce a goodness-of-fit measure for quantile regression analogous to the R-squared of least squares, based on the ratio of minimized check functions across restricted and unrestricted models. They also develop related inference processes for testing composite hypotheses about covariate effects over an entire range of quantiles, with asymptotic behavior linked to Bessel processes. Practitioners estimating quantile regressions can use this pseudo-R-squared and joint significance tests to assess model fit across the conditional distribution.
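
A minimal sketch of the pseudo-R-squared, assuming fitted values from an unrestricted and an intercept-only quantile fit are already in hand:

```python
def check_loss(u, tau):
    # Asymmetric absolute (check) loss
    return u * (tau - (u < 0))

def r1(y, fit_full, fit_restricted, tau):
    # Koenker-Machado R1(tau): one minus the ratio of minimized check-loss
    # sums under the full model and the intercept-only model.
    v_hat = sum(check_loss(yi - f, tau) for yi, f in zip(y, fit_full))
    v_tilde = sum(check_loss(yi - f, tau) for yi, f in zip(y, fit_restricted))
    return 1 - v_hat / v_tilde

y = [1, 2, 3, 4]
assert r1(y, [1, 2, 3, 4], [2, 2, 2, 2], 0.5) == 1.0   # perfect fit
assert r1(y, [1, 2, 3, 3], [2, 2, 2, 2], 0.5) == 0.75
```

Unlike R-squared, R1 is local: it measures fit at one particular quantile tau, so it can differ across the conditional distribution.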

  15. Kontopantelis, E., Doran, T., Springate, D. A., Buchan, I., & Reeves, D. (2015). Regression Based Quasi-Experimental Approach When Randomisation Is Not an Option: Interrupted Time Series Analysis. BMJ, 350, h2750.

    doi.org/10.1136/bmj.h2750

    survey · practical-guide · health-policy
    Annotation

    Kontopantelis and colleagues provide a practical guide to ITS analysis published in the BMJ. Covers model specification, autocorrelation testing, sensitivity analyses, and the addition of control series. Provides clear visual examples of level and slope changes and discusses common pitfalls.

  16. Kothari, S. P., & Warner, J. B. (2007). Econometrics of Event Studies. Handbook of Empirical Corporate Finance, 1, 3–36.

    doi.org/10.1016/B978-0-444-53265-7.50015-9

    Survey · on event studies
    long-horizon · cross-sectional · econometrics
    Annotation

    Kothari and Warner provide an updated survey of event study methods, covering long-horizon event studies, cross-sectional regression approaches, and the econometric challenges that arise with overlapping events and event-induced variance changes. The survey documents how the basic FFJR framework is extended and refined over four decades. It is an essential reference for researchers designing event studies who need to understand the full menu of methodological choices and their trade-offs.

  17. Krueger, A. B. (1999). Experimental Estimates of Education Production Functions. Quarterly Journal of Economics, 114(2), 497–532.

    doi.org/10.1162/003355399556052

    Application · on ols regression
    education · class-size · randomized-experiment · Project-STAR
    Annotation

    Krueger uses Tennessee's Project STAR randomized class-size experiment to estimate the effect of class size on student achievement via OLS. Because treatment is randomized, the OLS coefficient has a causal interpretation, demonstrating that the method is not the issue -- the research design is what determines causality.

  18. Künzel, S. R., Sekhon, J. S., Bickel, P. J., & Yu, B. (2019). Metalearners for Estimating Heterogeneous Treatment Effects Using Machine Learning. Proceedings of the National Academy of Sciences, 116(10), 4156–4165.

    doi.org/10.1073/pnas.1804597116

    Foundational · on causal forests
    X-learner · meta-learners · CATE
    Annotation

    Künzel and colleagues propose the X-learner meta-algorithm for estimating CATEs and systematically compare it with T-learners and S-learners using random forests and BART as base learners. The paper provides practical guidance on when different meta-learning strategies perform well or poorly.
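
The three stages of the X-learner are easiest to see with a deliberately trivial base learner (group means over a discrete covariate); in practice the base learners are random forests, BART, or other flexible regressors:

```python
from statistics import mean

def fit_group_mean(xs, ys):
    # Toy base learner: predict the mean outcome within each value of x
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x, []).append(y)
    model = {x: mean(v) for x, v in groups.items()}
    return lambda x: model[x]

def x_learner(x1, y1, x0, y0, g=0.5):
    # Stage 1: outcome models for the treated and control arms
    mu1, mu0 = fit_group_mean(x1, y1), fit_group_mean(x0, y0)
    # Stage 2: imputed individual effects, then a CATE model per arm
    d1 = [y - mu0(x) for x, y in zip(x1, y1)]
    d0 = [mu1(x) - y for x, y in zip(x0, y0)]
    tau1, tau0 = fit_group_mean(x1, d1), fit_group_mean(x0, d0)
    # Stage 3: weighted combination, g typically set to the propensity score
    return lambda x: g * tau0(x) + (1 - g) * tau1(x)

tau = x_learner(x1=[0, 0, 1], y1=[3, 3, 5],
                x0=[0, 0, 1, 1, 1], y0=[1, 1, 2, 2, 2])
assert tau(0) == 2.0 and tau(1) == 3.0    # per-group treatment effects
```

The crossing of arms in stage 2 (the control-arm outcome model applied to treated units and vice versa) is what gives the X-learner its advantage when treated and control samples are very unequal in size.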

L
22
  1. Laird, N. M., & Ware, J. H. (1982). Random-Effects Models for Longitudinal Data. Biometrics, 38(4), 963–974.

    doi.org/10.2307/2529876

    Foundational · on random effects
    longitudinal-data · mixed-effects · biostatistics
    Annotation

    Laird and Ware develop the general framework for random-effects models in longitudinal data, integrating fixed population parameters with random individual-level effects. This paper is foundational for the mixed-effects modeling approach widely used in biostatistics and social sciences.

  2. LaLonde, R. J. (1986). Evaluating the Econometric Evaluations of Training Programs with Experimental Data. American Economic Review, 76(4), 604–620.

    Foundational · on matching methods
    experimental-benchmark · program-evaluation · job-training · non-experimental-methods
    Annotation

    LaLonde compares econometric estimates of a job training program's effect with experimental benchmarks from a randomized trial, finding that non-experimental methods often failed to replicate the experimental results. This paper establishes the standard test bed for evaluating matching and other observational causal methods.

  3. Lambert, D. (1992). Zero-Inflated Poisson Regression, with an Application to Defects in Manufacturing. Technometrics, 34(1), 1–14.

    doi.org/10.2307/1269547

    zero-inflated-Poisson · excess-zeros · manufacturing · count-data
    Annotation

    Lambert introduces the zero-inflated Poisson (ZIP) model, which accounts for excess zeros in count data by mixing a point mass at zero with a Poisson distribution. The ZIP model has become a standard tool for count outcomes where a subpopulation generates only zeros.
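
The ZIP probability mass function is a two-part mixture, which a short sketch makes concrete (the parameter values here are arbitrary):

```python
from math import exp, factorial

def zip_pmf(k, pi, lam):
    # Zero-inflated Poisson: a point mass at zero with probability pi,
    # plus a Poisson(lam) component with probability 1 - pi.
    poisson = exp(-lam) * lam**k / factorial(k)
    return pi * (k == 0) + (1 - pi) * poisson

pi, lam = 0.3, 2.0
total = sum(zip_pmf(k, pi, lam) for k in range(50))
assert abs(total - 1.0) < 1e-9                     # proper distribution
assert zip_pmf(0, pi, lam) > exp(-lam)             # more zeros than Poisson(lam)
mean_y = sum(k * zip_pmf(k, pi, lam) for k in range(50))
assert abs(mean_y - (1 - pi) * lam) < 1e-9         # E[Y] = (1 - pi) * lam
```

In Lambert's regression version both pi and lam depend on covariates (through logit and log links respectively), estimated jointly by maximum likelihood.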

  4. Landais, C. (2015). Assessing the Welfare Effects of Unemployment Benefits Using the Regression Kink Design. American Economic Journal: Economic Policy, 7(4), 243–278.

    doi.org/10.1257/pol.20130248

    unemployment-insurance · US · benefit-schedules · social-insurance
    Annotation

    Landais uses the regression kink design to decompose the moral hazard and liquidity effects of unemployment insurance benefits using US data. The progressive UI benefit formula creates kinks that provide quasi-experimental variation in benefit levels. This paper demonstrates the power of RKD for evaluating social insurance programs where benefits change slope at known thresholds.

  5. Leamer, E. E. (1983). Let's Take the Con Out of Econometrics. American Economic Review, 73(1), 31–43.

    Foundational · on specification curve
    extreme-bounds · robustness · specification-sensitivity
    Annotation

    Leamer's classic paper argues that the sensitivity of empirical results to specification choices undermines the credibility of econometric evidence. He proposes extreme bounds analysis, an early form of systematic robustness testing that anticipates modern specification curve analysis by several decades.

  6. Lee, D. S. (2008). Randomized Experiments from Non-random Selection in U.S. House Elections. Journal of Econometrics, 142(2), 675–697.

    doi.org/10.1016/j.jeconom.2007.05.004

    elections · local-randomization · manipulation
    Annotation

    Lee formalizes the conditions under which an RDD is 'as good as' a randomized experiment—namely, when agents cannot precisely manipulate the running variable around the cutoff. Applied to U.S. House elections, this paper establishes the modern theoretical foundation for sharp RDD.

  7. Lee, D. S. (2009). Training, Wages, and Sample Selection: Estimating Sharp Bounds on Treatment Effects. Review of Economic Studies, 76(3), 1071–1102.

    doi.org/10.1111/j.1467-937X.2009.00536.x

    Foundational · on lee bounds
    sharp-bounds · sample-selection · monotonicity
    Annotation

    Lee develops sharp nonparametric bounds on treatment effects in the presence of sample selection, requiring only a monotonicity assumption (that treatment affects selection in one direction). These bounds are widely used to address attrition and selective sample composition in randomized experiments.
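
The trimming logic can be sketched in a few lines. This simplified version takes observed outcomes and selection rates as given, and trims an integer number of observations rather than an exact quantile; the outcome lists and selection rates are made-up numbers:

```python
def lee_bounds(y_treat, y_ctrl, s_treat, s_ctrl):
    # Excess selection share in the treated group (monotonicity assumed)
    p = (s_treat - s_ctrl) / s_treat
    k = int(round(p * len(y_treat)))                      # observations to trim
    ys = sorted(y_treat)
    mc = sum(y_ctrl) / len(y_ctrl)
    lower = sum(ys[:len(ys) - k]) / (len(ys) - k) - mc    # trim largest treated outcomes
    upper = sum(ys[k:]) / (len(ys) - k) - mc              # trim smallest treated outcomes
    return lower, upper

y_t = [1, 2, 3, 4, 5, 6, 7, 8]      # outcomes among selected treated units
y_c = [2, 3, 4, 5, 6, 7]            # outcomes among selected control units
lo, hi = lee_bounds(y_t, y_c, s_treat=0.8, s_ctrl=0.6)
assert (lo, hi) == (-1.0, 1.0)
```

The width of the bounds grows with the differential selection rate, which is why Lee bounds are most informative when attrition differs only modestly across arms.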

  8. Lee, D. S., & Lemieux, T. (2010). Regression Discontinuity Designs in Economics. Journal of Economic Literature, 48(2), 281–355.

    doi.org/10.1257/jel.48.2.281

    survey · validity-tests · econometric-theory
    Annotation

    Lee and Lemieux write the standard survey of RDD methods in economics, covering both sharp and fuzzy designs, validity tests, and extensions. This paper is the standard reference for understanding the econometric theory and practical implementation of RDD.

  9. Lee, S. (2022). The Myth of the Flat Start-Up: Reconsidering the Organizational Structure of Start-Ups. Strategic Management Journal, 43(1), 58–92.

    doi.org/10.1002/smj.3333

    Application · Mgmt · on sensitivity analysis
    Oster-method · coefficient-stability · omitted-variable-bias · start-ups · organizational-structure · +1
    Annotation

    Lee examines the relationship between organizational hierarchy and start-ups' creative and commercial success in the video game industry. She uses Oster's (2019) coefficient stability method to assess robustness to omitted variable bias, demonstrating how partial identification techniques complement standard empirical approaches in strategy research.

  10. Lee, D. S., McCrary, J., Moreira, M. J., & Porter, J. (2022). Valid t-Ratio Inference for IV. American Economic Review, 112(10), 3260–3290.

    doi.org/10.1257/aer.20211063

    Foundational · on instrumental variables
    weak-instruments · t-ratio · F-statistic · inference
    Annotation

    Lee, McCrary, Moreira, and Porter address the potentially severe large-sample distortions of t-ratio-based inference in the single-IV model. They introduce the tF critical value function, a standard error adjustment that is a smooth function of the first-stage F-statistic, which corrects for weak instrument bias. They find that for one-quarter of specifications in 61 AER papers, corrected standard errors are at least 49% larger than conventional 2SLS standard errors at the 5% significance level. The practical implication is that researchers using IV should apply their tF correction rather than relying on conventional standard errors.
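
The tF critical values themselves come from tables in the paper, but the inputs are the conventional ingredients of just-identified IV: the first-stage F-statistic and the 2SLS estimate. A simulation sketch (instrument strength, sample size, and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
z = rng.normal(0, 1, n)                    # instrument
u = rng.normal(0, 1, n)                    # confounder
x = 0.5 * z + u + rng.normal(0, 1, n)      # endogenous regressor
y = 1.0 * x + u + rng.normal(0, 1, n)      # true structural effect = 1

# First stage: regress x on z, F-statistic on the excluded instrument
Z = np.column_stack([np.ones(n), z])
pi = np.linalg.lstsq(Z, x, rcond=None)[0]
res = x - Z @ pi
var_pi = res @ res / (n - 2) * np.linalg.inv(Z.T @ Z)[1, 1]
F = pi[1] ** 2 / var_pi

# Just-identified 2SLS = ratio of reduced-form to first-stage slopes
ry = np.linalg.lstsq(Z, y, rcond=None)[0][1]
beta_iv = ry / pi[1]

assert F > 10                              # strong first stage in this draw
assert abs(beta_iv - 1.0) < 0.5            # near the true effect
```

Under the tF procedure, the 2SLS standard error is then inflated by a factor that decreases smoothly in F, rather than applying the all-or-nothing F > 10 rule of thumb.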

  11. Lennox, C. S., Francis, J. R., & Wang, Z. (2012). Selection Models in Accounting Research. The Accounting Review, 87(2), 589–616.

    doi.org/10.2308/accr-10195

    survey · accounting · best-practices
    Annotation

    Lennox, Francis, and Wang review the use (and misuse) of Heckman selection models in accounting research. Documents common pitfalls including weak exclusion restrictions, failure to test normality, and mechanical application without economic justification for the selection equation.

  12. Levitt, S. D. (1997). Using Electoral Cycles in Police Hiring to Estimate the Effect of Police on Crime. American Economic Review, 87(3), 270–290.

    crime · police · electoral-cycles · reverse-causality
    Annotation

    Levitt uses the timing of mayoral and gubernatorial elections as an instrument for police hiring to estimate the causal effect of police on crime. The paper illustrates the IV approach in a policy-relevant setting where the key concern is reverse causality (more crime leads to more police).

  13. Lin, D. Y., Wei, L. J., & Ying, Z. (1993). Checking the Cox Model with Cumulative Sums of Martingale-Based Residuals. Biometrika, 80(3), 557–572.

    doi.org/10.1093/biomet/80.3.557

    foundational · diagnostics · model-checking
    Annotation

    Lin, Wei, and Ying develop graphical and numerical methods for checking the Cox model using cumulative sums of martingale-based residuals. Provides formal tests for the proportional hazards assumption, functional form of covariates, and overall model adequacy.

  14. Lin, W. (2013). Agnostic Notes on Regression Adjustments to Experimental Data: Reexamining Freedman's Critique. Annals of Applied Statistics, 7(1), 295–318.

    doi.org/10.1214/12-AOAS583

    Annotation

    Lin shows that OLS regression adjustment with a full set of treatment-covariate interactions yields an estimator that is asymptotically no less precise than the unadjusted difference in means in randomized experiments, even without assuming correct model specification. This result resolves Freedman's critique of regression adjustment by demonstrating that the interacted specification, combined with Huber-White standard errors, produces valid inference under Neyman's randomization model. Experimentalists should include treatment-by-covariate interactions and use robust standard errors when adjusting for baseline covariates.
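
Lin's interacted specification is algebraically equivalent to running separate regressions in each arm and comparing predictions at the overall covariate mean, an identity a short simulation confirms (the data-generating values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
d = (np.arange(n) % 2).astype(float)       # randomized treatment indicator
x = rng.normal(0, 1, n)                    # baseline covariate
y = 1 + 0.8 * d + 0.5 * x + 0.3 * d * x + rng.normal(0, 1, n)

xc = x - x.mean()                          # center the covariate
X = np.column_stack([np.ones(n), d, xc, d * xc])
b = np.linalg.lstsq(X, y, rcond=None)[0]   # fully interacted OLS

def arm_prediction(mask):
    # Arm-specific OLS prediction evaluated at the overall covariate mean
    Xa = np.column_stack([np.ones(mask.sum()), x[mask]])
    ba = np.linalg.lstsq(Xa, y[mask], rcond=None)[0]
    return ba[0] + ba[1] * x.mean()

assert np.isclose(b[1], arm_prediction(d == 1) - arm_prediction(d == 0))
```

In applied work the coefficient on d would be reported with heteroscedasticity-robust standard errors, per Lin's argument.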

  15. Linden, A. (2015). Conducting Interrupted Time-Series Analysis for Single- and Multiple-Group Comparisons. Stata Journal, 15(2), 480–500.

    doi.org/10.1177/1536867X1501500208

    survey · stata · software
    Annotation

    Linden introduces the itsa command in Stata for single- and multiple-group ITS analysis. Covers Newey-West standard errors for autocorrelation, Prais-Winsten estimation, and the extension to controlled ITS with a comparison group. A key reference for Stata users.

  16. List, J. A., Sadoff, S., & Wagner, M. (2011). So You Want to Run an Experiment, Now What? Some Simple Rules of Thumb for Optimal Experimental Design. Experimental Economics, 14(4), 439–457.

    doi.org/10.1007/s10683-011-9275-7

    power-analysis · sample-size · design-guide
    Annotation

    In this practical guide, List, Sadoff, and Wagner provide rules of thumb for sample size, treatment assignment, and other design decisions in field experiments. It is a useful starting point for researchers planning their first experiment.
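
One such design decision concerns the minimum detectable effect. Below is a stylized sketch for a 50/50 allocation with 80% power and a 5% two-sided test, extended with the usual design-effect inflation for clustered assignment; the formula and constants are textbook-standard, not specific to this paper:

```python
from math import sqrt

def mde(n, sigma, icc=0.0, m=1, z_alpha=1.96, z_power=0.84):
    # Minimum detectable effect for a 50/50 two-arm design with n units
    # total, outcome SD sigma, clusters of size m, intraclass corr. icc.
    deff = 1 + (m - 1) * icc                 # design effect for clustering
    se = sigma * sqrt(deff) * sqrt(4.0 / n)  # SE of the difference in means
    return (z_alpha + z_power) * se

assert abs(mde(4000, 1.0) - 2.8 * sqrt(4.0 / 4000)) < 1e-9
assert mde(4000, 1.0, icc=0.05, m=50) > mde(4000, 1.0)   # clustering costs power
```

Inverting the same formula for n gives the required sample size for a target effect, which is how such rules of thumb are typically used at the design stage.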

  17. List, J. A., Shaikh, A. M., & Xu, Y. (2019). Multiple Hypothesis Testing in Experimental Economics. Experimental Economics, 22(4), 773–793.

    doi.org/10.1007/s10683-018-09597-5

    experimental-economics · field-experiments · practical-guide
    Annotation

    List, Shaikh, and Xu provide practical guidance on addressing multiple hypothesis testing in experimental economics. They compare various correction methods including Bonferroni, Holm, and FDR procedures, and demonstrate their application to field experiments with multiple outcome variables.

  18. Long, J. S. (1997). Regression Models for Categorical and Limited Dependent Variables. SAGE Publications.

    Survey · on logit probit
    textbook · categorical-data · limited-dependent-variables
    Annotation

    Long provides a comprehensive reference for applied researchers working with binary, ordinal, multinomial, and count outcome models. The textbook covers maximum likelihood estimation, marginal effects computation, and model diagnostics with clear exposition and software implementation guidance. It remains the standard practical guide for researchers who need to move beyond OLS to handle categorical and limited dependent variables.

  19. Long, J. S., & Ervin, L. H. (2000). Using Heteroscedasticity Consistent Standard Errors in the Linear Regression Model. The American Statistician, 54(3), 217–224.

    doi.org/10.1080/00031305.2000.10474549

    Foundational · on ols regression
    robust-standard-errors · heteroscedasticity · HC3 · simulation
    Annotation

    Long and Ervin compare HC0, HC1, HC2, and HC3 heteroscedasticity-consistent standard error estimators in a simulation study. Their finding that HC3 performs best in finite samples has influenced applied practice, with many applied researchers preferring HC3 over the default HC0.
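
The estimators being compared differ only in how squared residuals are weighted by leverage. A compact sketch of HC0 versus HC3 on simulated heteroscedastic data:

```python
import numpy as np

def ols_hc_se(X, y, kind="HC3"):
    # OLS coefficients with a heteroscedasticity-consistent sandwich variance
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)              # leverages h_ii
    omega = e**2 if kind == "HC0" else e**2 / (1 - h) ** 2   # HC3 weighting
    cov = XtX_inv @ (X.T * omega) @ X @ XtX_inv
    return beta, np.sqrt(np.diag(cov))

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 60)
X = np.column_stack([np.ones(60), x])
y = 1 + 2 * x + rng.normal(0, 0.1 + x, 60)        # heteroscedastic errors
_, se_hc0 = ols_hc_se(X, y, "HC0")
_, se_hc3 = ols_hc_se(X, y, "HC3")
assert np.all(se_hc3 >= se_hc0)    # HC3 inflates SEs, most at high-leverage points
```

HC1 and HC2 sit between these two, rescaling the squared residuals by n/(n-k) and 1/(1-h) respectively.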

  20. Lopez Bernal, J., Cummins, S., & Gasparrini, A. (2017). Interrupted Time Series Regression for the Evaluation of Public Health Interventions: A Tutorial. International Journal of Epidemiology, 46(1), 348–355.

    doi.org/10.1093/ije/dyw098

    survey · tutorial · public-health
    Annotation

    Lopez Bernal, Cummins, and Gasparrini provide an accessible tutorial on ITS regression for public health researchers. Covers the segmented regression model, autocorrelation diagnostics, Newey-West standard errors, and practical guidance on minimum number of time points. An excellent starting point for applied researchers.
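
The segmented regression model in the tutorial has four terms: intercept, baseline trend, level change, and slope change. A noise-free sketch shows the parameterization (the series and intervention date are invented):

```python
import numpy as np

t = np.arange(24)                         # 24 monthly observations
post = (t >= 12).astype(float)            # intervention at month 12
tsi = np.maximum(t - 12, 0)               # time since intervention
# Level drop of 3 and slope change of -0.2 on a 0.5/month baseline trend
y = 10 + 0.5 * t - 3 * post - 0.2 * tsi

X = np.column_stack([np.ones_like(t), t, post, tsi])
b = np.linalg.lstsq(X, y, rcond=None)[0]
assert np.allclose(b, [10, 0.5, -3, -0.2])   # all four parameters recovered
```

With real data the same design matrix would be combined with Newey-West or Prais-Winsten inference to handle the autocorrelation the tutorial emphasizes.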

  21. Lopez Bernal, J., Cummins, S., & Gasparrini, A. (2018). The Use of Controls in Interrupted Time Series Studies of Public Health Interventions. International Journal of Epidemiology, 47(6), 2082–2093.

    doi.org/10.1093/ije/dyy135

    survey · tutorial · controlled-its
    Annotation

    Lopez Bernal and colleagues provide a tutorial on extending ITS analysis with control groups to strengthen causal inference. Discusses controlled ITS (CITS) designs that combine the ITS framework with a comparison series, addressing the key threat of concurrent events confounding the intervention effect.

  22. Lunceford, J. K., & Davidian, M. (2004). Stratification and Weighting via the Propensity Score in Estimation of Causal Treatment Effects: A Comparative Study. Statistics in Medicine, 23(19), 2937–2960.

    doi.org/10.1002/sim.1903

    propensity-score · comparison · simulation
    Annotation

    Lunceford and Davidian compare propensity-score stratification, inverse probability weighting, and doubly robust estimators in a systematic simulation study. The paper provides a side-by-side assessment of these approaches for estimating causal treatment effects from observational data.

M
24
  1. Machado, J. A. F., & Santos Silva, J. M. C. (2019). Quantiles via Moments. Journal of Econometrics, 213(1), 145–173.

    doi.org/10.1016/j.jeconom.2019.04.009

    foundational · panel-data · fixed-effects
    Annotation

    Machado and Santos Silva show that, under a conditional location-scale structure, regression quantiles can be estimated by estimating conditional means. This 'quantiles via moments' approach makes it possible to use tools developed for mean regression in distributional-effects settings, and it can be adapted to panel data with fixed effects by avoiding the incidental parameters problem.

  2. MacKinlay, A. C. (1997). Event Studies in Economics and Finance. Journal of Economic Literature, 35(1), 13–39.

    Survey · on event studies
    survey · methodology · abnormal-returns · statistical-testing
    Annotation

    MacKinlay provides a comprehensive methodological survey of event studies, covering the statistical framework, estimation windows, abnormal return calculations, and testing procedures. This paper remains the standard reference for researchers designing and implementing event studies.

  3. MacKinnon, D. P., Fairchild, A. J., & Fritz, M. S. (2007). Mediation Analysis. Annual Review of Psychology, 58, 593–614.

    doi.org/10.1146/annurev.psych.58.110405.085542

    psychology · Sobel-test · bootstrapping
    Annotation

    MacKinnon, Fairchild, and Fritz provide an accessible review of mediation analysis methods for psychologists, covering the Baron-Kenny approach, the Sobel test, bootstrapping methods, and extensions to multiple mediators. This survey helped bridge the gap between traditional and modern approaches.

  4. Manski, C. F. (1990). Nonparametric Bounds on Treatment Effects. American Economic Review: Papers & Proceedings, 80(2), 319–323.

    Foundational · on lee bounds
    partial-identification · worst-case-bounds · nonparametric
    Annotation

    Manski introduces the partial identification approach to treatment effects, showing that even without strong assumptions, one can bound causal effects using the observed data. His worst-case bounds framework lays the theoretical foundation for Lee's sharper bounds under the monotonicity assumption.

  5. Manski, C. F. (1993). Identification of Endogenous Social Effects: The Reflection Problem. Review of Economic Studies, 60(3), 531–542.

    doi.org/10.2307/2298123

    Foundational · on instrumental variables
    identification · social-interactions · peer-effects · reflection-problem
    Annotation

    Manski formalizes the reflection problem in the analysis of social interactions: when individual outcomes depend on group averages, the group average is simultaneously determined by its members. This simultaneity makes it impossible to distinguish true social (endogenous) effects from correlated effects without additional structure or exclusion restrictions. The paper is essential reading for any researcher attempting to estimate peer effects or social spillovers.

  6. Manski, C. F. (2003). Partial Identification of Probability Distributions. Springer.

    doi.org/10.1007/b97478

    Foundational · on lee bounds
    partial-identification · textbook · bounds · nonparametric
    Annotation

    Manski's monograph provides a comprehensive treatment of partial identification, showing how to derive informative bounds on parameters of interest when point identification is not possible. This book formalizes and extends his earlier work on bounding treatment effects and is the standard reference for the theoretical framework underlying Lee bounds.

  7. Masicampo, E. J., & Lalande, D. (2012). A Peculiar Prevalence of p Values Just Below .05. Quarterly Journal of Experimental Psychology, 65(11), 2271–2279.

    doi.org/10.1080/17470218.2012.711335

    Application · on specification curve
    p-values · publication-bias · specification-searching
    Annotation

    Masicampo and Lalande document a suspicious clustering of p-values just below the .05 threshold in psychology journals, providing empirical evidence of publication bias and researcher degrees of freedom. They discuss potential sources of this pattern and its implications for the credibility of published findings in the social sciences.

  8. Masten, M. A., & Poirier, A. (2021). Salvaging Falsified Instrumental Variable Models. Econometrica, 89(3), 1449–1469.

    doi.org/10.3982/ECTA17969

    Foundational · on sensitivity analysis
    instrumental-variables · falsification · partial-identification · bounds
    Annotation

    Masten and Poirier study what researchers can do when an IV model is falsified. They introduce the falsification frontier and the falsification adaptive set, which quantify minimal relaxations of the baseline assumptions and report the parameter values consistent with minimally nonfalsified models, providing a structured sensitivity-analysis framework for IV.

  9. McCrary, J. (2008). Manipulation of the Running Variable in the Regression Discontinuity Design: A Density Test. Journal of Econometrics, 142(2), 698–714.

    doi.org/10.1016/j.jeconom.2007.05.005

    manipulation-test · density-test · validity
    Annotation

    McCrary develops the standard test for whether agents are manipulating the running variable to sort around the cutoff. If the density of the running variable shows a discontinuity at the cutoff, the RDD is compromised. This density test is now a routine validity check in all RDD papers.

  10. McFadden, D. (1974). Conditional Logit Analysis of Qualitative Choice Behavior. Frontiers in Econometrics, 105–142.

    Foundational · on logit probit
    conditional-logit · discrete-choice · random-utility
    Annotation

    McFadden develops the conditional logit model grounded in random utility theory, showing how discrete choices among alternatives can be modeled by assuming individuals maximize utility with an extreme-value distributed error. This work earns him the 2000 Nobel Prize and remains the foundation of discrete choice analysis.

  11. McKenzie, D. (2012). Beyond Baseline and Follow-Up: The Case for More T in Experiments. Journal of Development Economics, 99(2), 210–221.

    doi.org/10.1016/j.jdeveco.2012.01.002

    Foundational · on power analysis
    ANCOVA · multiple-periods · development-economics
    Annotation

    McKenzie shows that collecting multiple rounds of data can substantially increase statistical power in randomized experiments. He demonstrates that ANCOVA with baseline data and difference-in-differences with multiple time periods can substantially reduce the required sample size, which is particularly valuable in development economics.
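
The variance comparison behind this argument fits in a few lines. With autocorrelation rho between baseline and follow-up, the per-unit variance factors (relative to a post-only comparison, normalized to 1) are:

```python
def variance_factors(rho):
    # Variance of the treatment estimator relative to a post-only comparison:
    ancova = 1 - rho**2        # ANCOVA: control for the baseline outcome
    did = 2 * (1 - rho)        # difference-in-differences with one baseline
    return ancova, did

a, d = variance_factors(0.3)
assert a < 1 < d               # low rho: DiD is WORSE than ignoring baseline
a, d = variance_factors(0.9)
assert a < d < 1               # high rho: both help, ANCOVA helps more
```

ANCOVA weakly dominates at every rho, and the gap is largest for noisy outcomes with low autocorrelation, which is where McKenzie recommends adding more survey rounds.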

  12. McWilliams, A., & Siegel, D. (1997). Event Studies in Management Research: Theoretical and Empirical Issues. Academy of Management Journal, 40(3), 626–657.

    doi.org/10.2307/257056

    Survey · Mgmt · on event studies
    management-methodology · tutorial · strategy-research
    Annotation

    McWilliams and Siegel provide a critical assessment of event study methodology as applied in management research, identifying common theoretical and design pitfalls including confounding events, improper event window selection, and thin trading. The paper outlines procedures for appropriate use of event studies and serves as a widely cited methodological guide for strategy and management researchers conducting event studies.

  13. Miguel, E., Satyanath, S., & Sergenti, E. (2004). Economic Shocks and Civil Conflict: An Instrumental Variables Approach. Journal of Political Economy, 112(4), 725–753.

    doi.org/10.1086/421174

    civil-conflict · rainfall-instrument · weather-IV · Africa
    Annotation

    Miguel, Satyanath, and Sergenti instrument for economic growth using rainfall variation to estimate the causal effect of economic shocks on civil conflict in Sub-Saharan Africa. Their paper is a clean and widely cited example of using weather as an instrumental variable, illustrating both the power and the exclusion restriction challenges of weather-based instruments.

  14. Miguel, E., Camerer, C., Casey, K., Cohen, J., Esterling, K. M., Gerber, A., Glennerster, R., Green, D. P., Humphreys, M., Imbens, G., Laitin, D., Madon, T., Nelson, L., Nosek, B. A., Petersen, M., Sedlmayr, R., Simmons, J. P., Simonsohn, U., & Van der Laan, M. (2014). Promoting Transparency in Social Science Research. Science, 343(6166), 30–31.

    doi.org/10.1126/science.1245317

    Foundational · on pre registration
    transparency · open-science · social-science
    Annotation

    Miguel and a coalition of leading social scientists call for greater transparency in research, including pre-registration of studies and analysis plans, open data, and replication. This short but influential piece in Science helps establish the norms and infrastructure for pre-registration in social science.

  15. Mincer, J. (1974). Schooling, Experience, and Earnings. National Bureau of Economic Research / Columbia University Press.

    Application · on ols regression
    returns-to-education · labor-economics · wage-equation
    Annotation

    Mincer develops the canonical human-capital earnings function relating log wages to years of schooling and labor-market experience. The Mincer equation is one of the most replicated empirical models in economics and remains the standard benchmark for wage-equation analysis, though it should not be read as having solved the causal identification problems surrounding returns to schooling.

  16. Mogstad, M., Santos, A., & Torgovitsky, A. (2018). Using Instrumental Variables for Inference about Policy Relevant Treatment Parameters. Econometrica, 86(5), 1589–1619.

    doi.org/10.3982/ECTA15463

    MTE · partial-identification · bounds · policy-evaluation · ivmte
    Annotation

    Mogstad, Santos, and Torgovitsky develop a framework for using instrumental variables to conduct inference on policy-relevant treatment effects under weaker assumptions than full MTE identification. They show that even when the MTE is only partially identified (due to limited support of the propensity score), informative bounds on ATE, ATT, and PRTE can be derived by combining the identified portion of the MTE with shape restrictions. Their approach uses linear programming to compute sharp bounds on the target parameter given the data and assumptions. The paper provides the R package ivmte for implementation and demonstrates that useful policy conclusions can be drawn even without point-identifying the entire MTE curve.

  17. Montiel Olea, J. L., & Pflueger, C. (2013). A Robust Test for Weak Instruments. Journal of Business & Economic Statistics, 31(3), 358–369.

    doi.org/10.1080/07350015.2013.806694

    Foundational · on instrumental variables
    weak-instruments · robust-inference · F-statistic
    Annotation

    Montiel Olea and Pflueger propose an effective F-statistic for testing weak instruments that is robust to heteroscedasticity, serial correlation, and clustering — unlike the conventional first-stage F. The effective F is now the standard diagnostic for instrument strength in applied IV research.

  18. Moulton, B. R. (1990). An Illustration of a Pitfall in Estimating the Effects of Aggregate Variables on Micro Units. Review of Economics and Statistics, 72(2), 334–338.

    doi.org/10.2307/2109724

    Foundational · on ols regression
    clustering · aggregate-variables · standard-errors · Moulton-problem
    Annotation

    Moulton demonstrates that when aggregate-level variables (such as state policies) are used to explain individual-level outcomes, OLS standard errors that ignore within-group correlation can be dramatically understated. This paper establishes the 'Moulton problem' and motivates the widespread adoption of clustered standard errors in applied microeconomics.
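
The magnitude of the understatement is governed by the familiar design-effect formula. A minimal sketch for equal-sized groups (the numbers are illustrative):

```python
def moulton_factor(m, rho):
    # Approximate factor by which OLS standard errors on a group-level
    # regressor should be inflated: sqrt(1 + (m - 1) * rho), where m is
    # the group size and rho the intraclass correlation of the errors.
    return (1 + (m - 1) * rho) ** 0.5

# Even a small intraclass correlation is costly with large groups:
assert abs(moulton_factor(100, 0.05) - (1 + 99 * 0.05) ** 0.5) < 1e-12   # ~2.44x
assert moulton_factor(1, 0.9) == 1.0    # one observation per group: no inflation
```

Cluster-robust standard errors address the same problem without assuming equal group sizes or a constant intraclass correlation.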

  19. Mroz, T. A. (1987). The Sensitivity of an Empirical Model of Married Women's Hours of Work to Economic and Statistical Assumptions. Econometrica, 55(4), 765–799.

    doi.org/10.2307/1911029

    application · labor-supply · sensitivity
    Annotation

    Mroz provides a classic application of the Heckman selection model to female labor supply. Shows that the two-step estimator's results are sensitive to the choice of exclusion restriction and the normality assumption. The Mroz dataset remains a standard teaching dataset for selection models.

  20. Mullainathan, S., & Spiess, J. (2017). Machine Learning: An Applied Econometric Approach. Journal of Economic Perspectives, 31(2), 87–106.

    doi.org/10.1257/jep.31.2.87

    machine-learning · prediction-vs-causation · economics
    Annotation

    Mullainathan and Spiess provide an accessible introduction to supervised machine learning for economists, emphasizing how ML differs from classical parameter estimation and where prediction-oriented tools can be useful in empirical economics. The paper is a broad ML-for-economists survey, not a foundational paper on double/debiased machine learning specifically.

  21. Munafo, M. R., Nosek, B. A., Bishop, D. V. M., Button, K. S., Chambers, C. D., Percie du Sert, N., Simonsohn, U., Wagenmakers, E.-J., Ware, J. J., & Ioannidis, J. P. A. (2017). A Manifesto for Reproducible Science. Nature Human Behaviour, 1, 0021.

    doi.org/10.1038/s41562-016-0021

    Foundational · on specification curve
    reproducibility · open-science · manifesto
    Annotation

    Munafo, Nosek, and colleagues identify threats to reproducible science and propose a broad reform agenda spanning methods, reporting, reproducibility practices, evaluation, and incentives. The article is a general reproducibility manifesto that provides the broader scientific reform context motivating robustness-analysis approaches.

  22. Mundlak, Y. (1978). On the Pooling of Time Series and Cross Section Data. Econometrica, 46(1), 69–85.

    doi.org/10.2307/1913646

    correlated-random-effectspanel-datapooling
    Annotation

    Mundlak shows that the fixed effects estimator can be understood as an OLS regression that includes the group means of all time-varying regressors. This 'correlated random effects' interpretation bridges the fixed effects and random effects models and clarifies exactly what assumption is being relaxed.
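Mundlak's equivalence is easy to verify numerically. In this sketch (simulated balanced panel, all variable names hypothetical), the coefficient on x from OLS with unit means of x added coincides with the within (fixed effects) estimator:

```python
import numpy as np

# balanced panel where x is correlated with the unit effect alpha
rng = np.random.default_rng(0)
n_units, n_periods = 50, 6
unit = np.repeat(np.arange(n_units), n_periods)
alpha = rng.normal(size=n_units)
x = alpha[unit] + rng.normal(size=unit.size)
y = 2.0 * x + alpha[unit] + rng.normal(size=unit.size)

counts = np.bincount(unit)

def demean(v):
    """Subtract unit means (the within transformation)."""
    return v - (np.bincount(unit, weights=v) / counts)[unit]

# within (fixed effects) estimator
beta_fe = (demean(x) @ demean(y)) / (demean(x) @ demean(x))

# Mundlak regression: OLS of y on a constant, x, and the unit means of x
xbar = (np.bincount(unit, weights=x) / counts)[unit]
X = np.column_stack([np.ones_like(x), x, xbar])
beta_mundlak = np.linalg.lstsq(X, y, rcond=None)[0][1]
# beta_fe and beta_mundlak coincide up to floating-point error
```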

  23. Muralidharan, K., Niehaus, P., & Sukhtankar, S. (2016). Building State Capacity: Evidence from Biometric Smartcards in India. American Economic Review, 106(10), 2895–2929.

    doi.org/10.1257/aer.20141346

    Applicationon power analysis
    cluster-RCTMDEdevelopment-economicsstate-capacity
    Annotation

    Muralidharan, Niehaus, and Sukhtankar evaluate a large-scale randomized rollout of biometric smartcards for welfare payments in India, finding that the reform improved payment speed, predictability, and integrity. The paper includes detailed ex ante power calculations that demonstrate best practices for reporting minimum detectable effects in cluster-randomized designs.

  24. Murray, M. P. (2006). Avoiding Invalid Instruments and Coping with Weak Instruments. Journal of Economic Perspectives, 20(4), 111–132.

    doi.org/10.1257/jep.20.4.111

    instrument-validityweak-instrumentspractical-guideapplied-work
    Annotation

    Murray provides practical guidance on evaluating instrument validity and dealing with weak instruments in applied work. Written in an accessible style, it helps applied researchers think critically about their instrument choices and provides concrete strategies for addressing common IV pitfalls.

N
7
  1. Neumark, D., & Wascher, W. (2000). Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania: Comment. American Economic Review, 90(5), 1362–1396.

    doi.org/10.1257/aer.90.5.1362

    minimum-wagereplicationmeasurementdid
    Annotation

    Neumark and Wascher challenge Card and Krueger's (1994) minimum wage findings by re-analyzing the data using payroll records instead of survey responses, finding negative employment effects. The exchange illustrates the importance of data quality and measurement choices in difference-in-differences designs.

  2. Newey, W. K., & West, K. D. (1987). A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix. Econometrica, 55(3), 703–708.

    doi.org/10.2307/1913610

    Foundationalon ols regression
    HACautocorrelationtime-seriesstandard-errors
    Annotation

    In this short but hugely influential paper, Newey and West extend White's robust standard errors to account for autocorrelation in time-series data. 'Newey-West' or 'HAC' standard errors are now standard practice whenever researchers work with data that have a time dimension.
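The estimator itself is compact. A minimal NumPy sketch of the Newey-West covariance with a Bartlett kernel, applied to a toy autocorrelated series (the data-generating process and lag choice are illustrative assumptions, not from the paper):

```python
import numpy as np

def newey_west(X, y, lags):
    """OLS coefficients with Newey-West (HAC) standard errors, Bartlett kernel."""
    n, k = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    u = y - X @ beta
    Xu = X * u[:, None]                      # per-observation score contributions
    S = Xu.T @ Xu / n                        # lag-0 (White) term
    for lag in range(1, lags + 1):
        w = 1.0 - lag / (lags + 1.0)         # Bartlett weights keep S positive semi-definite
        gamma = Xu[lag:].T @ Xu[:-lag] / n
        S += w * (gamma + gamma.T)
    bread = np.linalg.inv(X.T @ X / n)
    V = bread @ S @ bread / n                # sandwich covariance
    return beta, np.sqrt(np.diag(V))

# toy series where both the regressor and the error follow an AR(1)
rng = np.random.default_rng(1)
n = 2_000
x = np.zeros(n)
e = np.zeros(n)
for t in range(1, n):
    x[t] = 0.7 * x[t - 1] + rng.normal()
    e[t] = 0.7 * e[t - 1] + rng.normal()
y = 0.5 * x + e
beta_hat, se_hac = newey_west(np.column_stack([np.ones(n), x]), y, lags=12)
```

With both series autocorrelated, the HAC standard error on the slope exceeds the naive OLS one, which is the correction the paper supplies.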

  3. Newey, W. K. (1999). Two Step Series Estimation of Sample Selection Models. MIT Department of Economics Working Paper 99-04.

    Annotation

    Newey proposes a semiparametric two-step estimator for sample selection models that replaces the parametric inverse Mills ratio with a flexible series (power series or regression spline) approximation to the unknown selection correction function. This approach avoids the normality assumption underlying the standard Heckman correction while retaining the computational convenience of a two-step procedure. Researchers concerned about distributional misspecification in selection models can use series-based selection corrections as a robust alternative to parametric methods.

  4. Nickell, S. (1981). Biases in Dynamic Models with Fixed Effects. Econometrica, 49(6), 1417–1426.

    doi.org/10.2307/1911408

    Foundationalon fixed effects
    dynamic-panelsNickell-biaslagged-dependent-variable
    Annotation

    Nickell shows that including a lagged dependent variable in a fixed effects regression creates a bias that does not vanish as the number of cross-sectional units grows. This 'Nickell bias' is a critical concern for researchers using fixed effects in dynamic panel models with short time series.

  5. Nie, X., & Wager, S. (2021). Quasi-Oracle Estimation of Heterogeneous Treatment Effects. Biometrika, 108(2), 299–319.

    doi.org/10.1093/biomet/asaa076

    Foundationalon causal forests
    R-learnerCATEmeta-learners
    Annotation

    Nie and Wager propose the R-learner, a two-step approach for estimating heterogeneous treatment effects that first residualizes outcomes and treatment on covariates, then estimates the CATE by regressing outcome residuals on treatment residuals. This approach can use any machine learning method including causal forests.

  6. Nielsen, H. S., Sorensen, T., & Taber, C. (2010). Estimating the Effect of Student Aid on College Enrollment: Evidence from a Government Grant Policy Reform. American Economic Journal: Economic Policy, 2(2), 185–215.

    doi.org/10.1257/pol.2.2.185

    student-aidcollege-enrollmentDenmarkearly-application
    Annotation

    Nielsen, Sorensen, and Taber apply a regression kink design to estimate the effect of student financial aid on college enrollment in Denmark. The Danish student aid formula creates a kink in the relationship between parental income and aid received. They exploit this kink to identify causal effects, providing one of the earliest applications of the RKD methodology.

  7. Nosek, B. A., Ebersole, C. R., DeHaven, A. C., & Mellor, D. T. (2018). The Preregistration Revolution. Proceedings of the National Academy of Sciences, 115(11), 2600–2606.

    doi.org/10.1073/pnas.1708274114

    Foundationalon pre registration
    pre-registrationopen-sciencecredibility
    Annotation

    Nosek and colleagues make the case for widespread adoption of pre-registration, arguing that it distinguishes confirmatory from exploratory analyses, reduces publication bias, and increases the credibility of empirical research. This paper helps catalyze the pre-registration movement across the social sciences.

O
4
  1. Olken, B. A. (2015). Promises and Perils of Pre-Analysis Plans. Journal of Economic Perspectives, 29(3), 61–80.

    doi.org/10.1257/jep.29.3.61

    pre-analysis-plansdevelopment-economicstradeoffs
    Annotation

    Olken provides a balanced assessment of pre-analysis plans in development economics, discussing both benefits (reduced specification searching, increased credibility) and costs (loss of flexibility, difficulty specifying analyses in advance). This paper is essential reading for understanding the practical tradeoffs of pre-registration.

  2. Oprescu, M., Syrgkanis, V., & Wu, Z. S. (2019). Orthogonal Random Forest for Causal Inference. Proceedings of the 36th International Conference on Machine Learning, 97, 4932–4941.

    orthogonal-forestsEconMLDML
    Annotation

    Oprescu, Syrgkanis, and Wu propose orthogonal random forests, which combine Neyman-orthogonal moments with generalized random forests to reduce sensitivity to nuisance-estimation error. The paper provides theoretical results and shows how the method can be used for heterogeneous-effect estimation with discrete or continuous treatments.

  3. Orben, A., & Przybylski, A. K. (2019). The Association between Adolescent Well-Being and Digital Technology Use. Nature Human Behaviour, 3(2), 173–182.

    doi.org/10.1038/s41562-018-0506-1

    Applicationon specification curve
    digital-technologywell-beinglarge-scale-applicationpsychology
    Annotation

    Orben and Przybylski apply specification curve analysis to the hotly debated question of whether digital technology use harms adolescent well-being, running over 20,000 specifications across three large datasets. They find that technology use has a negligible negative association with well-being, far smaller than commonly assumed, demonstrating how specification curve analysis can bring clarity to contested empirical questions by mapping the full space of defensible analytical choices.

  4. Oster, E. (2019). Unobservable Selection and Coefficient Stability: Theory and Evidence. Journal of Business & Economic Statistics, 37(2), 187–204.

    doi.org/10.1080/07350015.2016.1227711

    Foundationalon sensitivity analysis
    coefficient-stabilityproportional-selectionbounding
    Annotation

    Oster extends the Altonji, Elder, and Taber approach to assess the robustness of regression estimates to omitted variable bias. She proposes a bounding method based on the proportional selection assumption and coefficient stability across specifications, now widely used in applied economics.

P
11
  1. Palepu, K. G. (1986). Predicting Takeover Targets: A Methodological and Empirical Analysis. Journal of Accounting and Economics, 8(1), 3–35.

    doi.org/10.1016/0165-4101(86)90008-X

    Applicationon logit probit
    takeover-predictioncorporate-governancefinance
    Annotation

    Palepu uses logit models to study takeover prediction and identifies methodological flaws in prior prediction studies, showing that targets are more difficult to predict than earlier work suggests. The paper highlights the importance of proper classification criteria and sampling methodology when applying binary choice models to rare-event corporate outcomes.

  2. Pearl, J. (2001). Direct and Indirect Effects. Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence, 411–420.

    structural-causal-modelsnatural-effectsdo-calculus
    Annotation

    Pearl formalizes the concepts of natural direct and indirect effects using structural causal models and do-calculus. This paper establishes the nonparametric identification conditions for mediation effects and shows that traditional mediation analysis conflates causal and non-causal pathways.

  3. Pearl, J. (2009). Causality: Models, Reasoning, and Inference. Cambridge University Press.

    doi.org/10.1017/CBO9780511803161

    Foundationalon matching methods
    DAGsdo-calculusstructural-causal-modelsfoundations
    Annotation

    Pearl provides a comprehensive treatment of causal inference using directed acyclic graphs, the do-calculus, and structural causal models. The book formalizes the rules for reading conditional independence from graphs and establishes when causal effects are identifiable from observational data. It is the foundational reference for any researcher using DAGs to reason about confounding, mediation, and causal identification.

  4. Pearl, J. (2014). Interpretation and Identification of Causal Mediation. Psychological Methods, 19(4), 459–481.

    doi.org/10.1037/a0036434

    structural-causal-modelsidentificationnatural-effectsgraphical-criteria
    Annotation

    Pearl provides a structural causal model perspective on mediation, clarifying the interpretation and identification of natural direct and indirect effects. He shows how graphical criteria can determine when mediation effects are identifiable and contrasts the structural approach with the potential outcomes framework used by Imai, Keele, and Tingley.

  5. Peterson, M. F., Arregle, J.-L., & Martin, X. (2012). Multilevel Models in International Business Research. Journal of International Business Studies, 43(5), 451–457.

    doi.org/10.1057/jibs.2011.59

    SurveyMgmton random effects
    international-businessmultilevelcross-country
    Annotation

    Peterson, Arregle, and Martin review the use of multilevel random-effects models in international business research, where firms are nested within countries. They discuss best practices for modeling cross-level effects and the importance of accounting for the hierarchical structure of international data.

  6. Pongeluppe, L. S. (2024). The Allegory of the Favela: The Multifaceted Effects of Socioeconomic Mobility. Administrative Science Quarterly, 69(3), 619–654.

    doi.org/10.1177/00018392241240469

    ApplicationMgmton experimental design
    RCTfield-experimentsocioeconomic-mobilitystigmaentrepreneurship+1
    Annotation

    Pongeluppe conducts a randomized controlled trial of a business training program offered to residents of Brazilian favelas, complementing the experiment with quantile regressions, field visits, and interviews. The results show that training improves economic outcomes such as income and entrepreneurship participation, but also intensifies participants' experiences of favela-related stigma, revealing that socioeconomic mobility can simultaneously generate material benefits and psychosocial costs.

  7. Porreca, Z. (2022). Synthetic Difference-in-Differences Estimation with Staggered Treatment Timing. Economics Letters, 220, 110874.

    doi.org/10.1016/j.econlet.2022.110874

    staggered-adoptionextensionpolicy-evaluation
    Annotation

    Porreca extends the synthetic DID estimator to staggered treatment adoption settings, where multiple units adopt treatment at different times. The method constructs a localized estimator in which treated units are compared to a never-treated control group weighted on both the time and unit dimensions.

  8. Powell, J. L. (1987). Semiparametric Estimation of Bivariate Latent Variable Models. SSRI Working Paper 8704, University of Wisconsin-Madison.

    Annotation

    Powell develops semiparametric methods for estimating bivariate latent variable models—including censored sample selection models—without imposing distributional assumptions on the error terms. This approach relaxes the bivariate normality requirement of the Heckman two-step estimator, requiring only an exclusion restriction and mild regularity conditions. Researchers who doubt the normality assumption in selection models can apply these methods to obtain consistent estimates under weaker conditions.

  9. Preacher, K. J., & Hayes, A. F. (2008). Asymptotic and Resampling Strategies for Assessing and Comparing Indirect Effects in Multiple Mediator Models. Behavior Research Methods, 40(3), 879–891.

    doi.org/10.3758/BRM.40.3.879

    multiple-mediatorsbootstrappingsoftware
    Annotation

    Preacher and Hayes develop methods and software for testing indirect effects through multiple mediators simultaneously, using bootstrapping to construct confidence intervals. Their approach and accompanying SPSS and SAS macros become extremely widely used in psychology and management research.

  10. Puhani, P. A. (2000). The Heckman Correction for Sample Selection and Its Critique. Journal of Economic Surveys, 14(1), 53–68.

    doi.org/10.1111/1467-6419.00104

    surveycomparisontwo-step-vs-mle
    Annotation

    Puhani provides a short overview of Monte Carlo evidence on the Heckman two-step estimator, comparing it with full-information MLE and subsample OLS. He finds MLE preferable absent collinearity between the exclusion restriction and the other regressors, but subsample OLS most robust when collinearity is present.

  11. Pustejovsky, J. E., & Tipton, E. (2018). Small-Sample Methods for Cluster-Robust Variance Estimation and Hypothesis Testing in Fixed Effects Models. Journal of Business & Economic Statistics, 36(4), 672–683.

    doi.org/10.1080/07350015.2016.1247004

    Foundationalon clustering inference
    cluster-robustfew-clustersCR2
    Annotation

    Pustejovsky and Tipton develop the CR2 bias-reduced cluster-robust variance estimator for fixed effects models with few clusters. The CR2 correction improves coverage relative to the standard CR1 estimator when the number of clusters is small.

R
15
  1. Rabe-Hesketh, S., & Skrondal, A. (2012). Multilevel and Longitudinal Modeling Using Stata. Stata Press, 3rd edition.

    multilevel-modelsStatapractical-guidehierarchical
    Annotation

    Rabe-Hesketh and Skrondal provide a comprehensive practical guide to multilevel (hierarchical) models in Stata, which generalize the random effects framework to more complex nested data structures. It is an essential reference for applied researchers implementing multilevel models.

  2. Rambachan, A., & Roth, J. (2023). A More Credible Approach to Parallel Trends. Review of Economic Studies, 90(5), 2555–2591.

    doi.org/10.1093/restud/rdad018

    Foundationalon event studies
    parallel-trendssensitivity-analysishonest-confidence-intervals
    Annotation

    Rambachan and Roth develop a sensitivity analysis framework for assessing the robustness of event-study and difference-in-differences estimates to violations of the parallel trends assumption. Their approach constructs honest confidence intervals under restrictions on how pre-trends can extrapolate into the post-treatment period, providing a disciplined alternative to informal pre-trend tests.

  3. Rathje, J., Katila, R., & Reineke, P. (2024). Making the Most of AI and Machine Learning in Organizations and Strategy Research: Supervised Machine Learning, Causal Inference, and Matching Models. Strategic Management Journal, 45(10), 1926–1953.

    doi.org/10.1002/smj.3604

    SurveyMgmton matching methods
    machine-learningmatchingpropensity-scorecausal-inferencemethodology+1
    Annotation

    Rathje, Katila, and Reineke review how supervised machine learning can support causal-inference workflows in strategy research, with emphasis on two-stage matching models for sample-selection problems. Using technology invention data, they demonstrate ML-based approaches to covariate selection and matching while discussing the broader potential and limits of ML in organizational research.

  4. Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical Linear Models: Applications and Data Analysis Methods. SAGE Publications.

    HLMmultilevel-modelingnested-datatextbook
    Annotation

    In this influential textbook, Raudenbush and Bryk popularize hierarchical linear models (HLM), random-effects models for nested data structures such as students within schools. The book becomes the standard reference for multilevel modeling in education, psychology, and organizational research.

  5. Rivers, D., & Vuong, Q. H. (1988). Limited Information Estimators and Exogeneity Tests for Simultaneous Probit Models. Journal of Econometrics, 39(3), 347–366.

    doi.org/10.1016/0304-4076(88)90063-2

    Annotation

    Rivers and Vuong propose a computationally simple two-step maximum likelihood procedure for estimating simultaneous probit models with endogenous regressors, and derive simple exogeneity tests based on this estimator. The exogeneity tests are asymptotically equivalent to classical tests based on limited information maximum likelihood but require only probit and OLS regressions to implement. Applied researchers working with binary outcome models and suspected endogeneity can use the Rivers-Vuong procedure as a tractable alternative to full information maximum likelihood.

  6. Robins, J. M., & Greenland, S. (1992). Identifiability and Exchangeability for Direct and Indirect Effects. Epidemiology, 3(2), 143–155.

    doi.org/10.1097/00001648-199203000-00013

    direct-effectsindirect-effectsepidemiology
    Annotation

    Robins and Greenland provide early formal conditions for identifying direct and indirect causal effects in epidemiology. Their work on controlled direct effects and the assumptions required for mediation analysis lays important groundwork for the modern causal mediation literature.

  7. Robins, J. M., Rotnitzky, A., & Zhao, L. P. (1994). Estimation of Regression Coefficients When Some Regressors Are Not Always Observed. Journal of the American Statistical Association, 89(427), 846–866.

    doi.org/10.1080/01621459.1994.10476818

    AIPWmissing-datasemiparametric
    Annotation

    Robins, Rotnitzky, and Zhao introduce the augmented inverse probability weighting (AIPW) estimator, which combines outcome modeling and propensity score weighting. The key insight is that the estimator is consistent if either the outcome model or the propensity score model is correctly specified, providing a double layer of protection against misspecification.
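The AIPW estimand is a one-line formula. A minimal simulation sketch (effect size, variable names, and the deliberately wrong propensity model are all illustrative assumptions) showing the double-robustness property the annotation describes:

```python
import numpy as np

def aipw_ate(y, t, e_hat, mu1_hat, mu0_hat):
    """Augmented IPW (doubly robust) estimate of the average treatment effect."""
    return np.mean(
        mu1_hat - mu0_hat
        + t * (y - mu1_hat) / e_hat
        - (1 - t) * (y - mu0_hat) / (1 - e_hat)
    )

# simulation with a known effect of 1.0 and confounding through x
rng = np.random.default_rng(2)
n = 50_000
x = rng.normal(size=n)
e = 1 / (1 + np.exp(-x))                  # true propensity score
t = (rng.random(n) < e).astype(int)
y = 1.0 * t + x + rng.normal(size=n)

# double robustness: a correct outcome model rescues a wrong propensity model
ate_hat = aipw_ate(y, t, e_hat=np.full(n, 0.5), mu1_hat=x + 1.0, mu0_hat=x)
```

Even though the propensity model is misspecified (a constant 0.5), the estimate stays close to the true effect because the outcome models are correct; the symmetric case holds as well.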

  8. Robinson, P. M. (1988). Root-N-Consistent Semiparametric Regression. Econometrica, 56(4), 931–954.

    doi.org/10.2307/1912705

    partially-linearsemiparametricroot-n-consistency
    Annotation

    Robinson develops the partially linear regression estimator that achieves root-n consistency for the parametric component by partialling out nonparametric nuisance functions. This paper provides the semiparametric foundation that DML generalizes to the machine learning setting.
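The two-step partialling-out logic can be sketched directly. In this toy example (simulated data; a polynomial fit stands in for Robinson's kernel estimator, and all names are hypothetical), residualizing both outcome and treatment on the covariate recovers the parametric coefficient:

```python
import numpy as np

# partially linear model y = theta*d + g(x) + noise, with d also depending on x
rng = np.random.default_rng(3)
n = 5_000
x = rng.uniform(-2, 2, n)
d = np.sin(x) + rng.normal(size=n)
y = 1.5 * d + np.cos(x) + rng.normal(size=n)

def residualize(v):
    """Step 1: flexibly estimate E[v|x] and return the residual."""
    return v - np.polyval(np.polyfit(x, v, deg=7), x)

# Step 2: OLS of outcome residuals on treatment residuals recovers theta
y_res, d_res = residualize(y), residualize(d)
theta_hat = (d_res @ y_res) / (d_res @ d_res)
```

This residual-on-residual regression is the template that double/debiased machine learning later generalizes, with ML methods replacing the nonparametric first stage.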

  9. Rohrer, J. M., Egloff, B., & Schmukle, S. C. (2017). Probing Birth-Order Effects on Narrow Traits Using Specification-Curve Analysis. Psychological Science, 28(12), 1821–1832.

    doi.org/10.1177/0956797617723726

    Applicationon specification curve
    birth-orderpersonalityapplied-example
    Annotation

    Rohrer, Egloff, and Schmukle apply specification curve analysis to the long-debated question of whether birth order affects personality traits. By running all defensible specifications, they show that most previously reported birth-order effects disappear, demonstrating the method's power to resolve contested empirical questions.

  10. Romano, J. P., & Wolf, M. (2005). Stepwise Multiple Testing as Formalized Data Snooping. Econometrica, 73(4), 1237–1282.

    doi.org/10.1111/j.1468-0262.2005.00615.x

    Foundationalon multiple testing
    stepwise-testingresamplingFWER
    Annotation

    Romano and Wolf develop a stepwise multiple testing procedure that controls the family-wise error rate while being less conservative than Bonferroni by resampling from the joint distribution of test statistics. Their method accounts for the correlation structure among tests and is widely used in economics.

  11. Rosenbaum, P. R., & Rubin, D. B. (1983). The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika, 70(1), 41–55.

    doi.org/10.1093/biomet/70.1.41

    Foundationalon matching methods
    propensity-scoreselection-on-observablescausal-inference
    Annotation

    Rosenbaum and Rubin introduce the propensity score as a dimension-reduction tool for matching, showing that conditioning on the scalar probability of treatment is sufficient to remove selection bias when the unconfoundedness assumption holds. This paper establishes the theoretical foundation for all propensity-score-based methods, including matching, stratification, and inverse probability weighting. The key practical insight is that matching on a single score avoids the curse of dimensionality that makes direct covariate matching infeasible with many confounders.
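The dimension-reduction point is visible in a few lines. A minimal sketch of 1-nearest-neighbor matching on the propensity score, with replacement, on simulated data with a known treatment effect (the data-generating process and function names are illustrative assumptions):

```python
import numpy as np

def match_att(y, t, pscore):
    """ATT via 1-NN matching on the propensity score, with replacement."""
    treated = np.flatnonzero(t == 1)
    controls = np.flatnonzero(t == 0)
    gap = np.abs(pscore[treated, None] - pscore[None, controls])
    matched = controls[gap.argmin(axis=1)]     # closest control for each treated unit
    return np.mean(y[treated] - y[matched])

# toy data: treatment effect of 2.0, selection into treatment through x
rng = np.random.default_rng(4)
n = 4_000
x = rng.normal(size=n)
p = 1 / (1 + np.exp(-x))                       # true propensity score
t = (rng.random(n) < p).astype(int)
y = 2.0 * t + x + rng.normal(size=n)
att_hat = match_att(y, t, p)
```

Matching on the scalar score, rather than on x directly, is exactly the trick that scales to many confounders; in practice the score is estimated rather than known.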

  12. Rosenbaum, P. R. (2002). Observational Studies. Springer.

    doi.org/10.1007/978-1-4757-3692-2

    observational-studiessensitivity-analysisRosenbaum-boundstextbook
    Annotation

    Rosenbaum provides the standard textbook on observational study design, covering matching, sensitivity analysis, and design principles for drawing causal inferences from non-experimental data. His framework for sensitivity analysis (Rosenbaum bounds) is the standard tool for assessing how much unobserved confounding would be needed to overturn a matching-based finding.

  13. Roth, J. (2022). Pretest with Caution: Event-Study Estimates after Testing for Parallel Trends. American Economic Review: Insights, 4(3), 305–322.

    doi.org/10.1257/aeri.20210236

    pre-trendspre-testinghonest-confidence-intervalsevent-study
    Annotation

    Roth shows that the common practice of testing for parallel pre-trends and proceeding conditional on 'passing' can lead to distorted inference. He proposes honest confidence intervals that account for pre-testing, fundamentally changing how researchers should think about event study pre-trends in DiD designs.

  14. Roth, J., Sant'Anna, P. H. C., Bilinski, A., & Poe, J. (2023). What's Trending in Difference-in-Differences? A Synthesis of the Recent Econometrics Literature. Journal of Econometrics, 235(2), 2218–2244.

    doi.org/10.1016/j.jeconom.2023.03.008

    surveystaggered-DIDheterogeneous-effectspre-trends
    Annotation

    In this comprehensive survey, Roth et al. synthesize the explosion of recent econometric work on DID, covering staggered treatment timing, heterogeneous treatment effects, pre-trends testing, and new estimators. It is the essential starting point for understanding the modern DID literature.

  15. Rubin, D. B. (1974). Estimating Causal Effects of Treatments in Randomized and Nonrandomized Studies. Journal of Educational Psychology, 66(5), 688–701.

    doi.org/10.1037/h0037350

    Foundationalon experimental design
    potential-outcomescausal-inferenceRubin-causal-model
    Annotation

    Rubin formalizes the 'potential outcomes' framework that is now central to causal inference. The idea is simple but powerful: each unit has a potential outcome under treatment and under control, and the causal effect is the difference. This paper is the origin of what is now called the Rubin Causal Model.

S
24
  1. Saez, E. (2010). Do Taxpayers Bunch at Kink Points? American Economic Journal: Economic Policy, 2(3), 180–212.

    doi.org/10.1257/pol.2.3.180

    Foundationalon bunching estimation
    bunchingkink-pointelasticityincome-taxEITC
    Annotation

    Saez introduces the modern bunching methodology by examining taxpayer responses to kink points in the US income tax schedule, where marginal tax rates change discretely. He shows how to estimate the compensated elasticity of reported income from the excess mass of taxpayers at kink points relative to a smooth counterfactual density fitted by polynomial. The paper establishes the standard empirical approach: bin the data, fit a polynomial excluding the bunching region, and compute the excess mass. He finds modest elasticities overall but sharp bunching among the self-employed near the first EITC kink.
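The bin-fit-compare recipe described above can be sketched on toy data. This is a deliberately simplified illustration (simulated earnings, arbitrary window and polynomial degree); the real method also handles the integration constraint and round-number bunching that this sketch ignores:

```python
import numpy as np

# toy earnings data with excess mass at a kink in the tax schedule
rng = np.random.default_rng(5)
kink = 0.0
z = rng.normal(0.0, 2.0, 50_000)
movers = (z > kink) & (z < 0.6) & (rng.random(50_000) < 0.5)
z[movers] = kink                                   # bunchers pile up at the kink

# 1) bin the data
bins = np.arange(-4.0, 4.0 + 1e-9, 0.1)
counts, _ = np.histogram(z, bins)
centers = (bins[:-1] + bins[1:]) / 2

# 2) fit a polynomial counterfactual, excluding the bunching window
window = np.abs(centers - kink) < 0.26
coef = np.polyfit(centers[~window], counts[~window], deg=5)
counterfactual = np.polyval(coef, centers)

# 3) excess mass = actual minus counterfactual counts inside the window
excess = counts[window].sum() - counterfactual[window].sum()
b_hat = excess / counterfactual[window].mean()     # normalized excess mass
```

In the paper, the normalized excess mass is then mapped into a compensated elasticity using the size of the change in the net-of-tax rate at the kink.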

  2. Sant'Anna, P. H. C., & Zhao, J. (2020). Doubly Robust Difference-in-Differences Estimators. Journal of Econometrics, 219(1), 101–122.

    doi.org/10.1016/j.jeconom.2020.06.003

    DIDdoubly-robustATT
    Annotation

    Sant'Anna and Zhao develop doubly robust DID estimators that combine outcome regression and inverse probability weighting. The estimator is consistent for the ATT if either the outcome evolution model or the propensity score model for treatment group membership is correctly specified.

  3. Scharfstein, D. O., Rotnitzky, A., & Robins, J. M. (1999). Adjusting for Nonignorable Drop-Out Using Semiparametric Nonresponse Models. Journal of the American Statistical Association, 94(448), 1096–1120.

    doi.org/10.1080/01621459.1999.10473862

    missing-datadropoutsemiparametric-efficiency
    Annotation

    Scharfstein, Rotnitzky, and Robins develop a semiparametric sensitivity analysis framework for nonignorable dropout in longitudinal studies. They propose treating the selection bias parameter as known, then varying it over a plausible range to assess how inferences change. This paper provides foundational methods for sensitivity analysis under nonignorable missing data.

  4. Semadeni, M., Withers, M. C., & Certo, S. T. (2014). The Perils of Endogeneity and Instrumental Variables in Strategy Research: Understanding through Simulations. Strategic Management Journal, 35(7), 1070–1079.

    doi.org/10.1002/smj.2136

    weak-instrumentsstrategy-researchsimulationmethodology
    Annotation

    Semadeni, Withers, and Certo use Monte Carlo simulations to demonstrate the dangers of using weak or invalid instruments in strategy research. They provide practical guidance for management scholars on when and how to use IV, and when it may do more harm than good.

  5. Semenova, V., & Chernozhukov, V. (2021). Debiased Machine Learning of Conditional Average Treatment Effects and Other Causal Functions. Econometrics Journal, 24(2), 264–289.

    doi.org/10.1093/ectj/utaa027

    CATEheterogeneous-effectsgroup-effects
    Annotation

    Semenova and Chernozhukov extend DML to estimate conditional average treatment effects (CATEs) and other causal functions, allowing researchers to characterize treatment effect heterogeneity. They provide inference methods for projections of the CATE onto interpretable subgroups.

  6. Semenova, V. (2025). Generalized Lee Bounds. Journal of Econometrics, 251, 106055.

    doi.org/10.1016/j.jeconom.2025.106055

    Foundationalon lee bounds
    machine-learningcovariatestighter-bounds
    Annotation

    Semenova generalizes Lee bounds to allow for covariates and machine learning estimation of nuisance functions, improving the tightness of bounds while maintaining their nonparametric validity. This paper connects the Lee bounds literature to the modern machine learning causal inference literature.

  7. Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Houghton Mifflin.

    foundationaltextbookquasi-experimental
    Annotation

    Shadish, Cook, and Campbell write the standard textbook on quasi-experimental designs, including a comprehensive treatment of interrupted time series. The book discusses threats to validity specific to ITS designs (history, instrumentation, selection-maturation interaction) and provides guidance on when ITS is most credible.

  8. Shaver, J. M. (1998). Accounting for Endogeneity When Assessing Strategy Performance: Does Entry Mode Choice Affect FDI Survival? Management Science, 44(4), 571–585.

    doi.org/10.1287/mnsc.44.4.571

    endogeneityself-selectionentry-modeFDIHeckman-correction+1
    Annotation

    In this foundational strategy paper, Shaver demonstrates how ignoring endogeneity, specifically the self-selection of firms into entry modes, biases performance estimates. He shows that the choice between greenfield entries and acquisitions reflects private information about expected survival, and uses a Heckman-style selection correction to obtain unbiased estimates. It is one of the first papers to systematically demonstrate endogeneity problems in strategy research.

  9. Shipman, J. E., Swanquist, Q. T., & Whited, R. L. (2017). Propensity Score Matching in Accounting Research. The Accounting Review, 92(1), 213–244.

    doi.org/10.2308/accr-51449

    propensity-scoreaccountingbest-practicesmethodology
    Annotation

    Shipman, Swanquist, and Whited review how propensity score matching is used (and sometimes misused) in accounting research. They provide practical guidelines on common pitfalls such as matching on post-treatment variables, inadequate balance checks, and ignoring the unconfoundedness assumption.

  10. Shumway, T. (2001). Forecasting Bankruptcy More Accurately: A Simple Hazard Model. Journal of Business, 74(1), 101–124.

    doi.org/10.1086/209665

    applicationfinancebankruptcy
    Annotation

    Shumway shows that discrete-time hazard models outperform static logit models for bankruptcy prediction because they properly account for the time dimension and censoring. The paper demonstrates the importance of a survival-analysis framing for event prediction in finance.

  11. Silva, J. M. C. S., & Tenreyro, S. (2006). The Log of Gravity. Review of Economics and Statistics, 88(4), 641–658.

    doi.org/10.1162/rest.88.4.641

    gravity-modelPPMLtradeheteroskedasticity
    Annotation

    Silva and Tenreyro demonstrate that OLS estimation of log-linearized gravity models produces inconsistent estimates in the presence of heteroskedasticity. They show that Poisson pseudo-maximum-likelihood (PPML) provides consistent estimates and naturally handles zero trade flows, transforming the trade literature.

  12. Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant. Psychological Science, 22(11), 1359–1366.

    doi.org/10.1177/0956797611417632

    Foundational on pre-registration
    p-hacking · researcher-degrees-of-freedom · false-positives
    Annotation

    Simmons, Nelson, and Simonsohn demonstrate how researcher degrees of freedom in data collection and analysis can inflate false-positive rates dramatically. Their paper, which proposes disclosure requirements and pre-registration as solutions, is one of the catalysts for the replication crisis and pre-registration movement.

  13. Simonsohn, U., Simmons, J. P., & Nelson, L. D. (2020). Specification Curve Analysis. Nature Human Behaviour, 4(11), 1208–1214.

    doi.org/10.1038/s41562-020-0912-z

    Foundational on specification curve
    specification-curve · robustness · analytical-flexibility
    Annotation

    Simonsohn, Simmons, and Nelson introduce specification curve analysis, which systematically runs all reasonable specifications of a model and displays the distribution of estimates. This approach replaces selective reporting of specifications with a comprehensive view of how results depend on analytical choices.
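
    A toy version of the procedure (simulated data, not from the paper): estimate the same treatment coefficient under every combination of a small set of controls and inspect the sorted distribution of estimates, which here splits cleanly by whether the confounder is included:

```python
from itertools import combinations
import numpy as np

rng = np.random.default_rng(1)
n = 2000
c1, c2, c3 = rng.normal(size=(3, n))
d = 0.5 * c1 + rng.normal(size=n)             # treatment, confounded by c1
y = 1.0 * d + 0.8 * c1 + rng.normal(size=n)   # true effect of d is 1.0

controls = {"c1": c1, "c2": c2, "c3": c3}
curve = []
for k in range(len(controls) + 1):
    for spec in combinations(sorted(controls), k):
        X = np.column_stack([np.ones(n), d] + [controls[v] for v in spec])
        beta = np.linalg.lstsq(X, y, rcond=None)[0]
        curve.append((spec, beta[1]))          # coefficient on d

# The "curve" is the sorted set of estimates across all 2^3 = 8 specifications.
for spec, est in sorted(curve, key=lambda t: t[1]):
    print(spec, round(est, 3))
```

    Real applications enumerate hundreds or thousands of defensible specifications and add permutation-based inference on the whole curve; the loop structure is the same.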

  14. Singer, J. D., & Willett, J. B. (2003). Applied Longitudinal Data Analysis: Modeling Change and Event Occurrence. Oxford University Press.

    doi.org/10.1093/acprof:oso/9780195152968.001.0001

    survey · textbook · discrete-time
    Annotation

    Singer and Willett write an accessible textbook covering both growth curve models and discrete-time survival analysis. Chapters 9–15 provide a clear introduction to hazard modeling for social science researchers, with worked examples and practical guidance.

  15. Singh, J., & Agrawal, A. (2011). Recruiting for Ideas: How Firms Exploit the Prior Inventions of New Hires. Management Science, 57(1), 129–150.

    doi.org/10.1287/mnsc.1100.1253

    Application (Mgmt) on difference-in-differences
    knowledge-transfer · inventor-mobility · patent-citations
    Annotation

    Singh and Agrawal use a difference-in-differences approach, comparing citation rates to recruits' patents before and after the move against matched control patents, to study how hiring inventors affects knowledge flows to the hiring firm. They find that hiring an inventor increases the hiring firm's citations to the recruit's prior patents, indicating knowledge transfer. The paper demonstrates how DiD with matched controls can identify causal effects in knowledge flow studies.

  16. Smith, J. A., & Todd, P. E. (2005). Does Matching Overcome LaLonde's Critique of Nonexperimental Estimators? Journal of Econometrics, 125(1–2), 305–353.

    doi.org/10.1016/j.jeconom.2004.04.011

    Foundational on matching methods
    LaLonde-critique · propensity-score · external-validity
    Annotation

    Smith and Todd reexamine the Dehejia and Wahba (1999) reanalysis of LaLonde (1986), showing that the matching results are sensitive to specific sample and specification choices. They demonstrate that matching methods cannot solve fundamental problems when treated and comparison groups come from very different populations.

  17. Staiger, D., & Stock, J. H. (1997). Instrumental Variables Regression with Weak Instruments. Econometrica, 65(3), 557–586.

    doi.org/10.2307/2171753

    Foundational on instrumental variables
    weak-instruments · 2SLS-bias · asymptotic-theory
    Annotation

    Staiger and Stock show formally that when instruments are weak, 2SLS estimates are biased toward OLS and standard inference breaks down. This paper establishes the theoretical foundations for the weak instruments problem that Stock and Yogo (2005) later provided practical tests for.

  18. Starr, E., Frake, J., & Agarwal, R. (2019). Mobility Constraint Externalities. Organization Science, 30(5), 961–980.

    doi.org/10.1287/orsc.2018.1252

    Application (Mgmt) on sensitivity analysis
    Oster-method · coefficient-stability · noncompete-agreements · labor-mobility · externalities
    Annotation

    Starr, Frake, and Agarwal study how noncompete agreements generate externalities for all workers in a labor market, not just those directly constrained. They use Oster's (2019) coefficient stability diagnostic to assess robustness of findings to omitted variable bias, demonstrating that enforceable noncompetes are associated with reduced job offers, mobility, and wages even for unconstrained workers.

  19. Steegen, S., Tuerlinckx, F., Gelman, A., & Vanpaemel, W. (2016). Increasing Transparency Through a Multiverse Analysis. Perspectives on Psychological Science, 11(5), 702–712.

    doi.org/10.1177/1745691616658637

    Foundational on specification curve
    multiverse-analysis · garden-of-forking-paths · transparency
    Annotation

    Steegen and colleagues introduce multiverse analysis, which examines how results vary across the full set of defensible data processing and analytical decisions. This approach is closely related to specification curve analysis and emphasizes transparency about the garden of forking paths in data analysis.

  20. Stock, J. H., Wright, J. H., & Yogo, M. (2002). A Survey of Weak Instruments and Weak Identification in Generalized Method of Moments. Journal of Business & Economic Statistics, 20(4), 518–529.

    doi.org/10.1198/073500102288618658

    weak-instruments · GMM · weak-identification · survey
    Annotation

    Stock, Wright, and Yogo survey the weak instruments and weak identification literature in IV and GMM settings, covering finite-sample bias toward OLS, size distortions in Wald tests, and practical diagnostic tools. The paper provides a comprehensive review of the theoretical landscape; the formal critical value tables now standard in applied work appear in the separate Stock and Yogo (2005) chapter.

  21. Stock, J. H., & Yogo, M. (2005). Testing for Weak Instruments in Linear IV Regression. In D. W. K. Andrews & J. H. Stock (Eds.), Identification and Inference for Econometric Models: Essays in Honor of Thomas Rothenberg (pp. 80–108). Cambridge University Press.

    doi.org/10.1017/CBO9780511614491.006

    Foundational on instrumental variables
    weak-instruments · F-statistic · diagnostic-test
    Annotation

    Stock and Yogo develop formal critical value tables for testing whether instruments are 'weak'—that is, only weakly correlated with the endogenous variable. Their tables formalize the Staiger and Stock (1997) rule of thumb that the first-stage F-statistic should exceed 10, and are probably the most widely used diagnostic in applied IV research.
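
    In the just-identified single-instrument case, the diagnostic reduces to the squared first-stage t-statistic on the instrument. A simulated sketch (illustrative numbers, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)
n, pi = 500, 0.3                      # pi: strength of the instrument
z = rng.normal(size=n)                # instrument
x = pi * z + rng.normal(size=n)       # endogenous regressor (first stage)

# First stage x = a + pi*z + v; with one instrument, the F-statistic for
# "the instrument is irrelevant" is the squared t-statistic on z.
Z = np.column_stack([np.ones(n), z])
coef, *_ = np.linalg.lstsq(Z, x, rcond=None)
resid = x - Z @ coef
s2 = resid @ resid / (n - 2)
se2 = s2 * np.linalg.inv(Z.T @ Z)[1, 1]
F = coef[1] ** 2 / se2
print(round(F, 1))  # compare against the rule-of-thumb threshold of 10
```

    Note that this homoskedastic F is exactly what the Stock–Yogo tables are built for; with non-iid errors, later work (e.g., Young 2022, below) argues the diagnostic can be misleading.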

  22. Stuart, E. A. (2010). Matching Methods for Causal Inference: A Review and a Look Forward. Statistical Science, 25(1), 1–21.

    doi.org/10.1214/09-STS313

    matching-review · propensity-score · practical-guidance · survey
    Annotation

    Stuart provides a comprehensive review of matching methods including propensity score matching, Mahalanobis distance matching, and coarsened exact matching, with practical guidance on implementation. She offers an accessible overview of when and how to use different matching approaches.

  23. Stuart, E. A., Cole, S. R., Bradshaw, C. P., & Leaf, P. J. (2011). The Use of Propensity Scores to Assess the Generalizability of Results from Randomized Trials. Journal of the Royal Statistical Society: Series A, 174(2), 369–386.

    doi.org/10.1111/j.1467-985X.2010.00673.x

    Foundational on external validity
    Annotation

    Stuart, Cole, Bradshaw, and Leaf propose propensity-score-based metrics for quantifying the similarity between randomized trial participants and a target population, using a model that predicts trial participation given observed covariates. The resulting scores enable matching, subclassification, or weighting of trial outcomes to the population, providing a diagnostic framework for assessing external validity. Researchers planning to generalize trial findings should use these propensity score diagnostics to evaluate whether their trial sample adequately represents the intended target population.

  24. Sun, L., & Abraham, S. (2021). Estimating Dynamic Treatment Effects in Event Studies with Heterogeneous Treatment Effects. Journal of Econometrics, 225(2), 175–199.

    doi.org/10.1016/j.jeconom.2020.09.006

    event-study · interaction-weighted · dynamic-effects
    Annotation

    Sun and Abraham show that conventional event-study regression coefficients are contaminated by treatment effect heterogeneity across cohorts and propose an interaction-weighted estimator that recovers clean dynamic treatment effects. This paper is the key reference for event-study plots in staggered settings.

T
3
  1. Therneau, T. M., & Grambsch, P. M. (2000). Modeling Survival Data: Extending the Cox Model. Springer.

    doi.org/10.1007/978-1-4757-3294-8

    survey · textbook · cox-extensions
    Annotation

    Therneau and Grambsch provide an authoritative reference on extensions of the Cox model including time-varying covariates, stratification, frailty models, and multistate models. The R survival package is maintained by Therneau and implements the methods described here.

  2. Thistlethwaite, D. L., & Campbell, D. T. (1960). Regression-Discontinuity Analysis: An Alternative to the Ex Post Facto Experiment. Journal of Educational Psychology, 51(6), 309–317.

    doi.org/10.1037/h0044319

    RDD-origins · cutoff-design · quasi-experiment
    Annotation

    Thistlethwaite and Campbell introduce the regression discontinuity design, proposing to compare units just above and just below a cutoff score to estimate causal effects, reasoning that units near the cutoff are as-good-as randomly assigned. The idea lies dormant for decades before being rediscovered by economists.

  3. Train, K. E. (2009). Discrete Choice Methods with Simulation. Cambridge University Press.

    doi.org/10.1017/CBO9780511805271

    Survey on logit/probit
    textbook · discrete-choice · simulation-estimation
    Annotation

    Train's textbook provides a comprehensive and accessible treatment of logit, probit, mixed logit, and other discrete choice models. It covers both theory and practical simulation-based estimation methods and is widely used in economics, marketing, and transportation research.

V
5
  1. Van der Klaauw, W. (2002). Estimating the Effect of Financial Aid Offers on College Enrollment: A Regression-Discontinuity Approach. International Economic Review, 43(4), 1249–1287.

    doi.org/10.1111/1468-2354.t01-1-00055

    financial-aid · education · fuzzy-RDD
    Annotation

    Van der Klaauw applies a fuzzy RDD to study how financial aid offers affect college enrollment decisions, exploiting discontinuities in an aid assignment rule where eligibility changes at GPA thresholds but compliance is imperfect. This paper is one of the earliest and most influential applications of fuzzy RDD.

  2. VanderWeele, T. J. (2015). Explanation in Causal Inference: Methods for Mediation and Interaction. Oxford University Press.

    textbook · mediation · interaction · sensitivity
    Annotation

    VanderWeele's comprehensive textbook unifies the causal mediation literature, covering potential outcomes and structural equation approaches, sensitivity analysis, time-varying treatments, and interaction effects. It is the standard reference for researchers conducting mediation analysis.

  3. VanderWeele, T. J. (2016). Mediation Analysis: A Practitioner's Guide. Annual Review of Public Health, 37, 17–32.

    doi.org/10.1146/annurev-publhealth-032315-021402

    practitioners-guide · sensitivity-analysis · public-health · survey
    Annotation

    VanderWeele provides an accessible practitioner-oriented guide to modern causal mediation analysis, covering the assumptions required for identification, sensitivity analysis for unmeasured confounding, and extensions to multiple mediators and interactions. This review is an excellent entry point for applied researchers seeking to move beyond the Baron-Kenny framework.

  4. VanderWeele, T. J., & Ding, P. (2017). Sensitivity Analysis in Observational Research: Introducing the E-Value. Annals of Internal Medicine, 167(4), 268–274.

    doi.org/10.7326/M16-2607

    Foundational on sensitivity analysis
    E-value · unmeasured-confounding · epidemiology
    Annotation

    VanderWeele and Ding introduce the E-value, a simple and intuitive measure of the minimum strength of association that an unmeasured confounder would need to have with both the treatment and outcome to fully explain away an observed treatment-outcome association. The E-value is widely adopted in epidemiology and increasingly discussed in social science.
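
    The E-value has a closed form: for an observed risk ratio RR ≥ 1 it is RR + sqrt(RR · (RR − 1)), with protective estimates inverted first. A minimal implementation:

```python
import math

def e_value(rr: float) -> float:
    """Minimum risk ratio an unmeasured confounder would need with BOTH
    treatment and outcome to fully explain away an observed risk ratio
    (VanderWeele & Ding, 2017)."""
    if rr < 1:
        rr = 1 / rr  # protective estimates are inverted first
    return rr + math.sqrt(rr * (rr - 1))

print(round(e_value(2.0), 2))  # 3.41: an observed RR of 2 requires confounder associations of about 3.4
```

    The same formula applied to the confidence-interval limit closest to the null gives the E-value for the interval, which VanderWeele and Ding recommend reporting alongside the point estimate.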

  5. Villalonga, B., & Amit, R. (2006). How Do Family Ownership, Control and Management Affect Firm Value? Journal of Financial Economics, 80(2), 385–417.

    doi.org/10.1016/j.jfineco.2004.12.005

    Application on OLS regression
    family-firms · corporate-governance · firm-value
    Annotation

    Villalonga and Amit study how different forms of family involvement — ownership, control, and management — affect firm value using OLS regression with clustered standard errors on a panel of Fortune 500 firms. The paper disentangles the separate effects of family ownership, voting control through dual-class shares and pyramids, and family management on Tobin's q.

W
9
  1. Wager, S., & Athey, S. (2018). Estimation and Inference of Heterogeneous Treatment Effects Using Random Forests. Journal of the American Statistical Association, 113(523), 1228–1242.

    doi.org/10.1080/01621459.2017.1319839

    Foundational on causal forests
    causal-forests · random-forests · asymptotic-normality
    Annotation

    Wager and Athey develop causal forests by extending random forests to estimate conditional average treatment effects. They prove pointwise consistency and asymptotic normality under regularity conditions, enabling valid confidence intervals for individualized treatment effect estimates.

  2. Wagner, A. K., Soumerai, S. B., Zhang, F., & Ross-Degnan, D. (2002). Segmented Regression Analysis of Interrupted Time Series Studies in Medication Use Research. Journal of Clinical Pharmacy and Therapeutics, 27(4), 299–309.

    doi.org/10.1046/j.1365-2710.2002.00430.x

    foundational · segmented-regression · health-services
    Annotation

    Wagner and colleagues formalize segmented regression for ITS in health services research. The paper clearly specifies the model with level-change and slope-change parameters, discusses autocorrelation correction, and provides practical recommendations for minimum series length and model diagnostics.
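
    The model they specify can be written as Y_t = β0 + β1·t + β2·post_t + β3·(t − t0)·post_t + e_t, where β2 is the level change and β3 the slope change at the interruption t0. A simulated sketch (hypothetical parameter values, ignoring the autocorrelation correction the paper discusses):

```python
import numpy as np

rng = np.random.default_rng(4)
T, t0 = 48, 24                       # 48 periods, interruption at t = 24
t = np.arange(T)
post = (t >= t0).astype(float)

# true parameters: baseline level 10, pre-slope 0.2, level change -3, slope change 0.5
y = 10 + 0.2 * t - 3.0 * post + 0.5 * (t - t0) * post + rng.normal(0, 0.2, size=T)

# segmented regression design matrix: intercept, time, level-change, slope-change
X = np.column_stack([np.ones(T), t, post, (t - t0) * post])
b0, b1, b2, b3 = np.linalg.lstsq(X, y, rcond=None)[0]
print(round(b2, 2), round(b3, 2))    # estimated level change and slope change
```

    With autocorrelated errors, the point estimates stay consistent but the naive standard errors do not; the paper's recommendation is a Durbin–Watson check followed by an AR error model when needed.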

  3. Webb, M. D. (2023). Reworking Wild Bootstrap-Based Inference for Clustered Errors. Canadian Journal of Economics, 56(3), 839–858.

    doi.org/10.1111/caje.12661

    Foundational on clustering inference
    wild-bootstrap · few-clusters · Webb-weights
    Annotation

    Webb introduces the six-point distribution as an alternative to Rademacher weights for the wild cluster bootstrap. The Webb weights improve finite-sample performance when the number of clusters is very small.
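
    The six-point distribution puts probability 1/6 on each of ±√(1/2), ±1, ±√(3/2), matching the zero mean and unit variance of Rademacher weights while allowing many more distinct bootstrap samples when clusters are few (6^G rather than 2^G). A sketch of one wild-cluster draw (hypothetical variable names):

```python
import numpy as np

# Webb's six-point weights: +/- sqrt(1/2), +/- 1, +/- sqrt(3/2), each with prob 1/6
WEBB = np.array([-np.sqrt(1.5), -1.0, -np.sqrt(0.5),
                 np.sqrt(0.5), 1.0, np.sqrt(1.5)])

def wild_cluster_draw(residuals, cluster_ids, rng):
    """Perturb residuals with one Webb weight per cluster (one bootstrap replication)."""
    clusters = np.unique(cluster_ids)
    w = dict(zip(clusters, rng.choice(WEBB, size=len(clusters))))
    return residuals * np.array([w[c] for c in cluster_ids])

# mean 0 and variance 1, like Rademacher weights, but with six support points
print(round(WEBB.mean(), 10), round((WEBB**2).mean(), 10))
```

    Each bootstrap replication rebuilds the outcome from restricted-model fitted values plus the perturbed residuals and re-estimates the test statistic; with, say, 5 clusters, Rademacher weights allow only 32 distinct replications while Webb weights allow 7,776.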

  4. Westfall, P. H., & Young, S. S. (1993). Resampling-Based Multiple Testing: Examples and Methods for p-Value Adjustment. Wiley.

    Foundationalon multiple testing
    resampling · permutation · step-down · textbook
    Annotation

    Westfall and Young develop resampling-based methods for multiple testing that account for the dependence structure among test statistics. Their permutation-based step-down procedure is less conservative than Bonferroni and becomes a standard reference for multiple testing adjustments in applied research.

  5. White, H. (1980). A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity. Econometrica, 48(4), 817–838.

    doi.org/10.2307/1912934

    Foundational on OLS regression
    robust-standard-errors · heteroskedasticity · inference
    Annotation

    White introduces the now-standard 'robust standard errors' that researchers routinely use with OLS. Before White's correction, standard errors could be misleadingly small when the variance of the error term was not constant across observations. Nearly every empirical paper today uses some variant of this approach.

  6. Wolfolds, S. E., & Siegel, J. (2019). Misaccounting for Endogeneity: The Peril of Relying on the Heckman Two-Step Method without a Valid Instrument. Strategic Management Journal, 40(3), 432–462.

    doi.org/10.1002/smj.2995

    Heckman-correction · exclusion-restriction · selection-models · misapplication
    Annotation

    Wolfolds and Siegel demonstrate that the Heckman selection correction is frequently misapplied in management research, particularly when the exclusion restriction is not credible. They show via simulation and replication that applying the Heckman correction without a valid instrument can introduce more bias than it removes. The paper provides a cautionary guide for researchers considering selection models and recommends transparent reporting of the exclusion restriction.

  7. Wooldridge, J. M. (1999). Distribution-Free Estimation of Some Nonlinear Panel Data Models. Journal of Econometrics, 90(1), 77–97.

    doi.org/10.1016/S0304-4076(98)00033-5

    quasi-MLE · panel-data · robustness
    Annotation

    Wooldridge shows that Poisson quasi-maximum-likelihood estimation in panel data models is consistent for the conditional mean even if the data are not Poisson-distributed, as long as the mean is correctly specified. This result justifies the widespread use of Poisson regression for non-count continuous outcomes and provides the foundation for distribution-free estimation of nonlinear panel data models.

  8. Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data (2nd ed.). MIT Press.

    textbook · panel-data · reference
    Annotation

    Wooldridge's graduate textbook is the standard reference for cross-section and panel data econometrics. Chapters 10–11 provide a thorough treatment of fixed effects, random effects, and related panel data methods, while later chapters cover general estimation methodology (MLE, GMM, M-estimation) with panel data applications throughout. The book covers both linear and nonlinear models with careful attention to assumptions.

  9. Wooldridge, J. M. (2019). Correlated Random Effects Models with Unbalanced Panels. Journal of Econometrics, 211(1), 137–150.

    doi.org/10.1016/j.jeconom.2018.12.010

    Foundational on random effects
    correlated-random-effects · unbalanced-panels · panel-data · CRE
    Annotation

    Wooldridge extends the correlated random effects (CRE) framework to handle unbalanced panels, which are the norm in applied research. This paper shows how to combine the flexibility of fixed effects with the ability to estimate effects of time-invariant variables, making the CRE approach practical for real-world datasets.

Y
3
  1. Young, C., & Holsteen, K. (2017). Model Uncertainty and Robustness: A Computational Framework for Multimodel Analysis. Sociological Methods & Research, 46(1), 3–40.

    doi.org/10.1177/0049124115610347

    Foundational on specification curve
    model-uncertainty · multimodel-analysis · sociology
    Annotation

    Young and Holsteen develop a computational framework for systematically exploring model uncertainty by running thousands of plausible specifications. Their approach is one of the earliest implementations of what would become known as specification curve or multiverse analysis, applied to sociological research.

  2. Young, A. (2019). Channeling Fisher: Randomization Tests and the Statistical Insignificance of Seemingly Significant Experimental Results. Quarterly Journal of Economics, 134(2), 557–598.

    doi.org/10.1093/qje/qjy029

    replication · experimental-economics · inference
    Annotation

    Young applies randomization inference to a large sample of experimental papers published in top economics journals and finds that many results that appear significant under conventional inference are insignificant under randomization tests. This paper demonstrates the practical importance of randomization inference for credible empirical research.

  3. Young, A. (2022). Consistency Without Inference: Instrumental Variables in Practical Application. European Economic Review, 147, 104112.

    doi.org/10.1016/j.euroecorev.2022.104112

    weak-instruments · published-research · replication · inference-failures
    Annotation

    Young reexamines published IV applications and argues that standard first-stage F-statistic diagnostics are largely uninformative of both size and bias under non-iid errors and high leverage. The paper finds that IV estimates in practice rarely demonstrate that OLS is biased, raising broader questions about the reliability of IV as commonly implemented.

Z
3
  1. Zelner, B. A. (2009). Using Simulation to Interpret Results from Logit, Probit, and Other Nonlinear Models. Strategic Management Journal, 30(12), 1335–1348.

    doi.org/10.1002/smj.783

    Application (Mgmt) on logit/probit
    simulation · interpretation · predicted-probabilities
    Annotation

    Zelner advocates using simulation-based approaches to interpret and present results from nonlinear models in management research. By computing predicted probabilities and marginal effects via simulation, researchers can convey substantive significance more clearly than raw coefficients.

  2. Zhao, X., Lynch, J. G., & Chen, Q. (2010). Reconsidering Baron and Kenny: Myths and Truths about Mediation Analysis. Journal of Consumer Research, 37(2), 197–206.

    doi.org/10.1086/651257

    mediation-classification · Baron-Kenny-critique · consumer-research
    Annotation

    Zhao, Lynch, and Chen provide an important critique of the Baron and Kenny mediation framework from within the marketing literature. They argue that the 'step 1' requirement of a significant total effect is unnecessary and introduces a more sensible classification of mediation types (complementary, competitive, indirect-only, direct-only, no-effect). While still operating within the regression framework rather than the full causal framework, this paper is a significant step forward for applied researchers.

  3. Zhao, Q., Small, D. S., & Bhattacharya, B. B. (2019). Sensitivity Analysis for Inverse Probability Weighting Estimators via the Percentile Bootstrap. Journal of the Royal Statistical Society: Series B, 81(4), 735–761.

    doi.org/10.1111/rssb.12327

    sensitivity-analysis · healthcare · bootstrap · AIPW
    Annotation

    Zhao, Small, and Bhattacharya develop sensitivity analysis tools for inverse probability weighted and augmented IPW estimators via the percentile bootstrap. They apply the methods to evaluate the causal effect of fish consumption on blood mercury levels, demonstrating practical use of AIPW sensitivity analysis in an observational study context. The paper provides a computationally convenient approach for assessing how sensitive doubly robust estimates are to violations of the unconfoundedness assumption.