How to Replicate a Study
A step-by-step guide for replicating published causal inference papers. Covers choosing a paper, obtaining data, reproducing results, diagnosing discrepancies, and extending the analysis.
Why Replication Is the Best Way to Learn
You cannot truly understand a method by reading about it. You understand it by implementing it on real data, encountering every practical complication that the textbook omits, and debugging your code until your estimates match (or do not match) published results. Replication is the bridge between textbook knowledge and research competence.
Beyond personal learning, replication serves science. Published results that cannot be reproduced should not be trusted. Every successful replication adds to the credibility of a finding; every failed replication signals that something needs further investigation.
Step 1: Choose a Paper
Not every paper is a good candidate for replication. Look for these qualities:
Publicly available data. Many top journals now require data and code deposits. Check the journal website (AEA Data Archive, Harvard Dataverse, ICPSR), the paper's online appendix, or the authors' personal websites. Papers published before the open data era may still have available data if the underlying sources are public.
Clear identification strategy. Papers with clean designs — difference-in-differences, regression discontinuity, experiments with randomized assignment — are easier to replicate than papers with complex structural models or proprietary identification schemes. Start simple.
Reasonable scope. For your first replication, choose a paper with one main table and a few robustness checks. A paper with twenty tables and a hundred-page appendix is not a learning exercise; it is a career.
Replication code available. Having the authors' original code is not cheating — it is a reference implementation. Write your own code first, then compare. Discrepancies between your code and theirs will teach you more than either alone.
Step 2: Read the Paper Thoroughly
Before touching data, read the paper at least twice using the framework from the reading guide:
- Identify the research question and estimand. What causal effect is being estimated? ATE, ATT, or LATE?
- Map the identification strategy. What is the source of exogenous variation? What are the key assumptions?
- Draw the DAG. Place the treatment, outcome, confounders, and instrument (if applicable) on the graph.
- Inventory every table and figure. Make a checklist of everything you need to reproduce. Note which tables contain the main results versus robustness checks.
- Document data transformations. Read the data section carefully. Note every sample restriction, variable construction, and recoding. These details are often buried in footnotes or appendix paragraphs, and they matter enormously.
Step 3: Obtain and Explore the Data
Once you have the data:
Verify the basics. How many observations? How many unique units? What is the time coverage? Compare your counts to the paper's description. Discrepancies here indicate you have the wrong data or are applying different sample restrictions.
Reproduce Table 1. Almost every empirical paper starts with descriptive statistics. Compute means, standard deviations, and sample sizes for every variable in the paper's summary statistics table. If your numbers do not match, stop and figure out why before proceeding.
Explore the data beyond what the paper shows. Look for outliers, missing data patterns, and distributional anomalies. Sometimes you will discover data issues that the paper does not discuss, which can inform your own extensions later.
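The verification steps above can be sketched in a few lines of pandas. This is a minimal illustration on synthetic stand-in data; every column name, the observation count, and the time coverage are hypothetical, and in a real replication you would load the paper's dataset (e.g. with `pd.read_csv`) and check against the numbers it reports.

```python
import numpy as np
import pandas as pd

# Stand-in panel data -- in practice, load the paper's actual dataset.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "unit_id": np.repeat(np.arange(100), 5),
    "year": np.tile(np.arange(2001, 2006), 100),
    "outcome": rng.normal(10, 2, 500),
    "treated": rng.integers(0, 2, 500),
})

# Verify the basics against the paper's description.
assert len(df) == 500                         # does N match the paper?
assert df["unit_id"].nunique() == 100         # same number of unique units?
assert df["year"].between(2001, 2005).all()   # same time coverage?

# Reproduce the summary-statistics table (Table 1).
table1 = df[["outcome", "treated"]].agg(["mean", "std", "count"]).T
table1.columns = ["Mean", "SD", "N"]
print(table1.round(3))
```

If any assertion fails, stop: you are working with the wrong data or a different sample than the paper's.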
Step 4: Reproduce the Main Results
Work column by column through the main results table:
- Construct the exact sample. Apply every sample restriction mentioned in the paper, in the order described. Drop the same observations. Restrict to the same time period.
- Create variables. This step is often the hardest. Variable constructions that take one sentence to describe can require hours of data wrangling to implement. Pay close attention to how variables are winsorized, logged, lagged, or interacted.
- Run the specification. Use the same estimator, the same fixed effects, the same standard error correction. Match every detail.
- Compare results. Your coefficients should match to at least two decimal places. Standard errors may differ slightly due to software version differences, convergence criteria, or rounding at intermediate steps.
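A sketch of what "match every detail" looks like in code, using statsmodels on synthetic data. The variable names, the two-way fixed-effects specification, and the "published" values of 0.50 and 0.09 are all illustrative assumptions, not taken from any actual paper; the point is to estimate with the same fixed effects and the same clustering, then compare side by side.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic panel standing in for the paper's data (all names hypothetical).
rng = np.random.default_rng(42)
n_units, n_years = 100, 5
df = pd.DataFrame({
    "unit": np.repeat(np.arange(n_units), n_years),
    "year": np.tile(np.arange(n_years), n_units),
})
df["treated"] = (rng.random(len(df)) < 0.5).astype(int)
df["y"] = 0.5 * df["treated"] + rng.normal(0, 1, len(df))

# Same estimator, same fixed effects, same standard-error correction as the
# paper: two-way FE via dummies, standard errors clustered by unit.
m = smf.ols("y ~ treated + C(unit) + C(year)", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["unit"]}
)

published_coef, published_se = 0.50, 0.09   # hypothetical published values
print(f"replicated: {m.params['treated']:.3f} ({m.bse['treated']:.3f})")
print(f"published:  {published_coef:.3f} ({published_se:.3f})")
```

Absorbing the fixed effects (e.g. with a dedicated panel estimator) rather than including dummies should give the same coefficient, though standard errors can differ slightly; see Step 5.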
A test of judgment: suppose your replicated coefficient is 0.342 against a published 0.347, with a standard error of 0.089 versus the published 0.091. How should you interpret this? The gap is a small fraction of a standard error, so it is substantively negligible, but it narrowly misses the two-decimal-place standard above. Record it in your replication log and look for a minor implementation difference before calling the match acceptable.
Step 5: Diagnose Discrepancies
When your results do not match (and at some point they will not), diagnose systematically:
Check sample sizes first. If your N differs from the paper's N, you have a sample construction problem. Sample-size mismatches are the most common source of discrepancies; resolve them before investigating anything else.
Compare variable distributions. If N matches but coefficients differ, compare the mean and standard deviation of every variable in the regression. This comparison will reveal variable construction differences.
Check fixed effects and clustering. Subtle differences in how fixed effects are implemented (absorbed vs. included as dummies) or how clustering is handled can produce noticeable differences, especially with few clusters.
Consult the authors' code. If available, read their code line by line. Often you will discover an undocumented sample restriction, a specific winsorization threshold, or a variable transformation that is not described in the paper.
Contact the authors. If you have exhausted other options, email the authors. Be specific: "I can reproduce your Table 2, Column 3 but not Column 4. My sample size matches yours, but my coefficient is 0.15 instead of 0.23. Here is my code." Most researchers appreciate replication efforts and will help.
Step 6: Reproduce Robustness Checks
After matching the main results, reproduce the key robustness checks. These checks are often more instructive than the main results because they reveal the sensitivity (or resilience) of the findings to analytical choices.
Pay particular attention to:
- Placebo tests. Are the placebo estimates truly null, or are some borderline significant?
- Alternative specifications. How much do coefficients move across specifications?
- Sensitivity analysis. If the paper includes Oster bounds, reproduce them. They are straightforward to implement and highly informative.
Step 7: Extend the Analysis
A completed replication is the perfect foundation for original research. Consider these extensions:
Apply diagnostics the paper did not include. Run an Oster (2019) sensitivity analysis if the paper did not. Compute a specification curve. Apply modern staggered DiD estimators if the paper used standard TWFE with staggered treatment.
Update with new data. Does the result hold with more recent observations? Temporal stability is a strong form of robustness.
Explore heterogeneity. Are effects concentrated in particular subgroups that the paper did not examine? Heterogeneity analysis using causal forests can reveal patterns that pre-specified subgroup analysis misses.
Apply a different method. If the paper used DiD, can the same question be answered with an IV or RDD? If methods that rely on different assumptions produce similar estimates, the finding is more credible.
Replication Checklist
Use this checklist to track your progress:
- Paper selected with available data and clear identification strategy
- Paper read twice; DAG drawn; table checklist created
- Data downloaded and summary statistics match Table 1
- Main results table reproduced (coefficients match to 2+ decimal places)
- Standard errors and significance levels match
- Key robustness checks reproduced
- Discrepancies (if any) diagnosed and documented
- At least one extension attempted
- Code documented, version-controlled, and reproducible
- Replication log maintained throughout the process