Writing Up Your Results
How to present causal inference results in academic papers. Covers structuring the results section, formatting tables, reporting effect sizes, discussing robustness, and avoiding common writing mistakes.
The Results Section Is Where Credibility Lives
Your results section does not merely report numbers. It builds a case. Every table, every figure, every sentence should serve the same purpose: convincing a skeptical reader that your identification strategy works and that your estimate is credible. If the reader reaches your main table without understanding why your coefficient is causal, you have failed.
Core Principles
Lead with the design, not the data. Before showing any regression table, remind the reader of your identification strategy. What is the source of exogenous variation? What does the estimator identify? The reader should know how to interpret the coefficient before they see it.
Report the estimand, not just the estimate. State clearly whether you are estimating the ATE, ATT, or LATE. This choice determines the population your results speak to. A difference-in-differences estimate is typically an ATT. An instrumental variables estimate is a LATE for compliers. These estimands are different objects, and conflating them is a fundamental error that reviewers will catch.
Economic significance matters as much as statistical significance. A coefficient with p = 0.001 that implies a 0.2% change in the outcome is statistically significant but may not be economically important. Interpret your effect size: How many standard deviations? What percentage of the dependent variable mean? How does it compare to other interventions in the same domain?
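Translating a raw coefficient into standard-deviation units and a percentage of the outcome mean is simple arithmetic, but doing it systematically helps. A minimal sketch (the function name and the numbers are illustrative, not from any study or package):

```python
def interpret_effect(coef, dep_mean, dep_sd):
    """Translate a raw coefficient into interpretable magnitudes."""
    return {
        "pct_of_mean": 100 * coef / dep_mean,  # effect as % of the outcome mean
        "sd_units": coef / dep_sd,             # effect in standard deviations
    }

# Illustrative numbers: a coefficient of 2.3 on an outcome with mean 45, SD 12
effect = interpret_effect(2.3, dep_mean=45, dep_sd=12)
print(f"{effect['pct_of_mean']:.1f}% of the mean, {effect['sd_units']:.2f} SD")
```

Reporting both quantities lets readers compare your effect to other interventions measured on different scales.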
Show your work on robustness. Reviewers expect sensitivity analysis. Report what checks you ran, which assumptions you tested, and what happened when you varied your specification choices. Anticipate the reviewer's objections and address them before they are raised.
Structuring the Results Section
A well-organized results section typically follows this sequence:
1. Descriptive Statistics and Balance
Start with summary statistics for key variables. For experiments or matching designs, include a balance table comparing treated and control groups on observables. For DiD, show that treated and control groups looked similar before treatment. This summary is Table 1 in most papers, and it is more important than many authors realize.
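One row of a balance table can be computed directly from the raw covariate values. A minimal sketch, assuming the normalized difference of Imbens and Rubin (2015) as the balance metric; `balance_row` is a hypothetical helper, and the covariate values are made up:

```python
from statistics import mean, pstdev

def balance_row(treated_vals, control_vals):
    """One balance-table row: group means and the normalized difference
    (difference in means over the pooled standard deviation)."""
    m1, m0 = mean(treated_vals), mean(control_vals)
    s1, s0 = pstdev(treated_vals), pstdev(control_vals)
    norm_diff = (m1 - m0) / ((s1**2 + s0**2) / 2) ** 0.5
    return {"treated_mean": m1, "control_mean": m0, "norm_diff": norm_diff}

# Hypothetical covariate values for a small balance check
row = balance_row([3.1, 2.9, 3.4, 3.0], [2.8, 3.2, 3.1, 2.7])
print(row)
```

Unlike a t-statistic, the normalized difference does not grow mechanically with sample size, which makes it the more informative balance metric in large samples.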
2. Main Results
Present your primary specification — the one that most directly implements your identification strategy. Build up systematically: start with the simplest version (no controls, minimal fixed effects), then progressively add controls and fixed effects. Each column should add something, and you should explain why.
3. Pre-trend and Validity Tests
For DiD designs, show the event study plot with pre-treatment coefficients. For RDD, show the McCrary density test and covariate smoothness checks. For IV, report the first-stage table with the F-statistic. These diagnostics are not optional appendix material — they belong in the main paper because they directly support your identification.
4. Robustness Checks
This section should address the most important threats to your identification. Common checks include:
- Alternative specifications: Different sets of controls, different functional forms, different fixed effects structures
- Placebo tests: Show null effects where you expect them (wrong outcome, wrong time, wrong group)
- Sample restrictions: Drop outliers, restrict to different time windows, exclude potentially problematic units
- Sensitivity analysis: Oster (2019) bounds for selection on unobservables, or Cinelli and Hazlett (2020) for omitted variable bias
5. Heterogeneity (If Warranted)
Subgroup analysis can reveal important variation in treatment effects. But be honest about multiple testing — if you examine twenty subgroups, one will be significant by chance. Pre-register heterogeneity analyses when possible, or apply multiple testing corrections.
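The two standard corrections are easy to state precisely in code. A minimal pure-Python sketch of Bonferroni (family-wise error control) and Benjamini-Hochberg (false discovery rate control); in practice you would likely use a library implementation such as statsmodels' `multipletests`:

```python
def bonferroni(pvals, alpha=0.05):
    """Reject H0_i when p_i <= alpha / m (controls the family-wise error rate)."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]

def benjamini_hochberg(pvals, alpha=0.05):
    """Step-up FDR procedure: reject the k smallest p-values, where k is the
    largest rank with p_(k) <= (k/m) * alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * alpha:
            k_max = rank
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            reject[i] = True
    return reject

pvals = [0.001, 0.01, 0.02, 0.04, 0.20]
print(bonferroni(pvals))          # only the most extreme p-values survive
print(benjamini_hochberg(pvals))  # FDR control is less conservative
```

With twenty subgroups, Bonferroni requires p < 0.0025 for a result to survive at the 5% level, which is a useful reality check on any "significant" subgroup finding.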
6. Mechanism Analysis (If Warranted)
If you claim to understand why the treatment works, use proper causal mediation analysis. Do not simply "add the mediator as a control" — this shortcut is not a valid test of mediation and can introduce bias.
Table Formatting Best Practices
Three horizontal lines only. Top border, below the header row, and bottom border. No vertical lines. No grid lines. No shading for alternating rows. This format is the standard across most social science journals.
Standard errors in parentheses. Report standard errors directly below coefficients in parentheses. State the type: robust, clustered (and at what level), or bootstrapped. Avoid reporting t-statistics in their place.
Include the dependent variable mean. Readers need this to interpret the magnitude of your coefficient. A treatment effect of 500 means very different things when the mean is 1,000 versus 50,000.
Show N clearly. Report the number of observations in every column. If N changes across columns, explain exactly why. Unexplained drops in sample size raise red flags about selective sample construction.
Use asterisks consistently. If using significance stars, define them in the table notes: * p < 0.10, ** p < 0.05, *** p < 0.01. Better yet, report confidence intervals or exact p-values when space permits.
Label everything. Every column needs a clear header. Fixed effects should be indicated with "Yes/No" rows. Controls should be listed or described. The reader should understand every element of the table without consulting the text.
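The cell conventions above (coefficient with stars, standard error in parentheses below) can be encoded in a small formatting helper. A minimal sketch; `stars` and `cell` are hypothetical helper names, not from any table package:

```python
def stars(p):
    """Significance stars under the convention * p<0.10, ** p<0.05, *** p<0.01."""
    return "***" if p < 0.01 else "**" if p < 0.05 else "*" if p < 0.10 else ""

def cell(coef, se, p, decimals=3):
    """One table cell: starred coefficient with the standard error beneath."""
    return f"{coef:.{decimals}f}{stars(p)}\n({se:.{decimals}f})"

print(cell(2.3, 0.8, 0.004))
```

Generating cells programmatically also guarantees the star convention stated in your table notes matches what the table actually shows.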
Sentence Templates
These templates are not formulas to copy mechanically. They are scaffolding to help you hit the right notes of precision, context, and honesty.
Introducing the main result:
Column (1) of Table 2 reports estimates from our baseline specification. The coefficient on [treatment variable] is [value] (SE = [value]), implying that [treatment] increased [outcome] by [X]%, relative to the dependent variable mean of [value].
Showing stability across specifications:
Columns (2) through (4) progressively add [controls/fixed effects]. The point estimate remains stable at [range], suggesting that the estimated effect is not driven by [specific concern that the additional controls address].
Discussing a robustness check:
Table 3 reports sensitivity of our main result to [specific variation]. In Column (2), we [describe change]; the coefficient moves to [value], within the confidence interval of our baseline estimate.
Acknowledging limitations:
Our design identifies the effect of [treatment] on [outcome] for [specific population/context]. We cannot rule out that [specific threat to validity]. To assess the severity of this concern, we conduct [sensitivity analysis], which shows that confounding would need to be [X times] as strong as [strongest observed covariate] to fully explain our result.
Worked example: suppose a DiD regression yields a treatment effect of 2.3 (SE = 0.8) on an outcome with a dependent variable mean of 45. The best report combines all three pieces: "The treatment increased the outcome by 2.3 units (SE = 0.8), a 5.1% increase relative to the dependent variable mean of 45."
Common Mistakes
Burying the identification. If the reader encounters your main regression table before understanding why your estimate is causal, the table will be unpersuasive. Set up the identification strategy clearly before presenting results.
Reporting only statistical significance. "The effect is significant at the 5% level" tells the reader almost nothing about whether the effect matters. A tiny effect can be statistically significant with a large enough sample. Report and interpret effect sizes.
Omitting standard error details. "Robust standard errors" is insufficient. Clustered at what level? Why that level? If treatment is assigned at the state level, you need state-level clustering. Getting this wrong can dramatically overstate your precision.
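The cluster-robust variance formula itself is short enough to write by hand, which makes clear what clustering actually does: it allows arbitrary correlation of errors within each cluster. A minimal sketch of the CR1 estimator, assuming numpy; in practice you would use statsmodels' `cov_type='cluster'` or Stata's `cluster()` option, and the simulated data here is purely illustrative:

```python
import numpy as np

def cluster_robust_se(X, y, clusters):
    """OLS with cluster-robust (CR1) standard errors, computed by hand.
    X: (n, k) design matrix (include a constant column yourself)."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # "Meat" of the sandwich: sum over clusters g of X_g' u_g u_g' X_g
    meat = np.zeros((k, k))
    for g in np.unique(clusters):
        score = X[clusters == g].T @ resid[clusters == g]
        meat += np.outer(score, score)
    G = len(np.unique(clusters))
    c = (G / (G - 1)) * ((n - 1) / (n - k))  # CR1 small-sample correction
    V = c * XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(V))

# Illustrative data: 10 clusters of 20 with a common within-cluster shock
rng = np.random.default_rng(0)
g = np.repeat(np.arange(10), 20)
shock = rng.normal(size=10)[g]
x = rng.normal(size=200)
y = 1.0 + 0.5 * x + shock + rng.normal(size=200)
X = np.column_stack([np.ones(200), x])
beta, se = cluster_robust_se(X, y, g)
```

Note that G, the number of clusters, drives the reliability of this estimator: with few clusters (roughly under 40), consider wild-cluster bootstrap inference instead.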
Overselling robustness. Twelve specifications that all use the same data, similar models, and identical identifying assumptions do not constitute strong robustness evidence. True robustness checks vary the assumptions, not just the details.
Cherry-picking specifications. If you tried twenty specifications and report the five that support your hypothesis, you are p-hacking. Pre-analysis plans help, but at minimum, report the full range of estimates across reasonable specifications using a specification curve, not just the favorable ones.
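The core of a specification curve is mechanical: estimate the treatment coefficient under every subset of candidate controls and report the full distribution. A minimal sketch, assuming numpy; `spec_curve` is a hypothetical helper and the simulated data is illustrative only:

```python
from itertools import combinations
import numpy as np

def spec_curve(y, treat, controls):
    """Treatment coefficient from every subset of candidate controls.
    controls: dict mapping name -> (n,) array. Returns (subset, estimate)
    pairs sorted by estimate, the raw material for a specification curve."""
    names = list(controls)
    n = len(y)
    estimates = []
    for r in range(len(names) + 1):
        for subset in combinations(names, r):
            cols = [np.ones(n), treat] + [controls[c] for c in subset]
            X = np.column_stack(cols)
            beta, *_ = np.linalg.lstsq(X, y, rcond=None)
            estimates.append((subset, beta[1]))  # coefficient on treatment
    return sorted(estimates, key=lambda t: t[1])

# Illustrative data: three candidate controls, one of which confounds treatment
rng = np.random.default_rng(1)
n = 500
c1, c2, c3 = rng.normal(size=(3, n))
treat = (rng.normal(size=n) + 0.3 * c1 > 0).astype(float)
y = 2.0 + 1.0 * treat + 0.5 * c1 - 0.2 * c2 + rng.normal(size=n)
curve = spec_curve(y, treat, {"c1": c1, "c2": c2, "c3": c3})
lo, hi = curve[0][1], curve[-1][1]
print(f"{len(curve)} specifications, estimates range [{lo:.2f}, {hi:.2f}]")
```

Reporting the full range, rather than one favorable column, is exactly what distinguishes a specification curve from cherry-picking.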
Confusing the estimand. Claiming your IV estimate applies to the full population when it is a LATE for compliers is a fundamental error. Be precise about who your estimate is about and resist the temptation to overclaim.