Poisson / Negative Binomial
Models for count outcomes — patents filed, citations received, number of acquisitions.
Quick Reference
- **When to Use:** When your outcome is a non-negative integer count (patents, citations, acquisitions, events). Poisson with robust SEs (PPML) is often all you need, even with overdispersion.
- **Key Assumption:** Poisson: conditional mean correctly specified as exp(X'beta). With robust SEs, equidispersion is not required — only the mean specification must be correct (Poisson pseudo-maximum likelihood).
- **Common Mistake:** Using OLS on count data, which can produce negative predicted values and ignores the discrete, non-negative nature of the outcome. Also, reflexively switching to negative binomial when Poisson with robust SEs already handles overdispersion.
- **Estimated Time:** 2.5 hours
One-Line Implementation
Stata: poisson y x1 x2, vce(robust)
R: fepois(y ~ x1 + x2, data = df, vcov = 'hetero')
Python: smf.poisson('y ~ x1 + x2', data=df).fit(cov_type='HC1')
Motivating Example: Patent Citations
You are studying what predicts how many citations a patent receives. Citations are the currency of patent value — more citations typically mean a more influential invention. Your data has 50,000 patents, and the citation counts look like this distribution: a huge spike at zero (many patents are never cited), a long right tail (a few blockbuster patents get hundreds of citations), and everything in between.
The mean number of citations is 8.2, but the variance is 142.6. That gap is a variance-to-mean ratio of about 17:1.
If you run OLS on this outcome, three problems arise:
- You predict negative citations for some patents (impossible).
- Your residuals are wildly heteroskedastic (variance increases with the mean).
- The normality assumption for inference is badly violated.
Count models are designed for exactly this situation.
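The three failure modes above are easy to see in a quick simulation. The sketch below (numpy only, with made-up parameters) draws counts from an exponential-mean DGP and shows that a linear OLS fit predicts negative counts for a nontrivial share of observations, while an exponential mean cannot:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)
# Illustrative count DGP: E[Y|X] = exp(-1 + 1.5 x), so low-x units have tiny rates
y = rng.poisson(np.exp(-1 + 1.5 * x))

# OLS fit of y on x
X = np.column_stack([np.ones(n), x])
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta_ols

print("share of negative OLS predictions:", (fitted < 0).mean())
# An exponential mean exp(x'b) is positive for any b, so this floor is structural
print("minimum exponential-mean prediction:", np.exp(fitted).min())
```

With these parameters roughly a quarter of the OLS predictions come out negative, which is impossible for a count.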
A. Overview: Why Count Models Exist
The Poisson Model
The Poisson regression models the conditional mean of a count outcome as:

E[Y | X] = exp(X'beta)

The exponential function ensures that predicted counts are always positive. Estimation is by maximum likelihood, assuming Y | X ~ Poisson(exp(X'beta)).
The key property of the Poisson distribution is that the mean equals the variance:

Var(Y | X) = E[Y | X]

This restriction is called equidispersion. In practice, real count data very frequently violate this assumption — the variance exceeds the mean, a condition called overdispersion.
The Negative Binomial Model
The negative binomial relaxes equidispersion by adding a dispersion parameter alpha:

Var(Y | X) = mu + alpha * mu^2,  where mu = E[Y | X] = exp(X'beta)

When alpha = 0, this collapses to Poisson. When alpha > 0, the variance exceeds the mean, and the negative binomial accommodates the extra variability.
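The gamma-Poisson mixture behind the negative binomial is easy to verify numerically. This sketch (numpy only, with illustrative values mu = 5 and alpha = 0.5) draws gamma heterogeneity with mean 1 and variance alpha, then checks that the resulting counts have variance close to mu + alpha * mu^2:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, alpha, n = 5.0, 0.5, 200_000

# Y | v ~ Poisson(mu * v), with v ~ Gamma(1/alpha, scale=alpha): mean 1, variance alpha
v = rng.gamma(shape=1 / alpha, scale=alpha, size=n)
y = rng.poisson(mu * v)

print("sample mean:", y.mean())        # close to mu = 5
print("sample variance:", y.var())     # close to mu + alpha*mu^2 = 17.5
print("Poisson would predict variance:", mu)
```

The mixture leaves the mean at mu but inflates the variance by alpha * mu^2, exactly the NB2 variance function above.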
An Important Subtlety: Poisson with Robust SEs
An important result is that even if the Poisson variance assumption is wrong (and it usually is), the Poisson model with robust standard errors still gives consistent estimates of the coefficients. This consistency holds because the Poisson MLE only requires the mean to be correctly specified — you do not need the variance to be correct.
This result (Gourieroux et al., 1984) is the basis for Poisson Pseudo-Maximum Likelihood (PPML). You use the Poisson likelihood as a working model but make no assumption about the variance. With robust SEs, inference is valid as long as the conditional mean is correctly specified (Santos Silva & Tenreyro, 2006).
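This point can be illustrated from scratch. The sketch below (numpy only, simulated overdispersed data) solves the Poisson score equation by Newton's method, then compares the model-based standard errors (which assume variance = mean) to sandwich (robust) standard errors. It is a hand-rolled illustration, not a substitute for `vce(robust)` or `vcov = 'hetero'`:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
beta_true = np.array([0.5, 0.3])

# Overdispersed DGP: gamma-Poisson mixture, so Var(Y|X) > E[Y|X]
mu = np.exp(X @ beta_true)
v = rng.gamma(2.0, 0.5, size=n)            # heterogeneity with mean 1
y = rng.poisson(mu * v)

# Poisson MLE via Newton's method on the score sum_i (y_i - exp(x_i'b)) x_i = 0
b = np.zeros(2)
for _ in range(50):
    m = np.exp(X @ b)
    H = X.T @ (X * m[:, None])             # Fisher information
    step = np.linalg.solve(H, X.T @ (y - m))
    b += step
    if np.abs(step).max() < 1e-10:
        break

m = np.exp(X @ b)
H_inv = np.linalg.inv(X.T @ (X * m[:, None]))
se_model = np.sqrt(np.diag(H_inv))                      # assumes equidispersion
meat = X.T @ (X * ((y - m) ** 2)[:, None])
se_robust = np.sqrt(np.diag(H_inv @ meat @ H_inv))      # sandwich / PPML SEs

print("beta_hat:", b)             # close to beta_true despite non-Poisson variance
print("model-based SE:", se_model)
print("robust SE:", se_robust)    # larger, reflecting the overdispersion
```

The coefficient estimates are consistent because only the mean is misspecified-free; the default SEs, however, are too small, and the sandwich SEs correct them.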
B. Identification
Like OLS and logit/probit, count models require exogeneity for causal interpretation: the conditional mean E[Y | X] = exp(X'beta) must be correctly specified, with the regressors uncorrelated with unobserved determinants of Y. If your key regressor is endogenous, you need an identification strategy. Common approaches include:
- Poisson with fixed effects — removes time-invariant confounders, just like linear FE.
- Control function approach — the nonlinear equivalent of IV for exponential models.
- PPML with instrumental variables — available in specialized software.
The classic paper by Hausman, Hall, and Griliches (1984) first applied Poisson and negative binomial models to patent data, establishing the methodology that most innovation researchers still use.
C. Visual Intuition
Picture the distribution of patent citations. It looks nothing like a bell curve. It is a histogram bunched up at zero, rising to a peak around 2-5 citations, and then trailing off in a long right tail. A few patents have 50, 100, or even 500 citations.
The Poisson model says: the average number of citations depends on covariates through an exponential function, and the spread around that average follows a Poisson distribution. But if the actual spread is much wider than the Poisson predicts (which it very frequently is), you have overdispersion.
Visually, overdispersion means the histogram is wider and has thicker tails than the Poisson would predict. The negative binomial adds a "mixing" distribution that spreads things out more, better matching the observed data.
Count Data: Poisson vs OLS
OLS ignores the non-negative, discrete nature of count outcomes, producing negative predicted values when the true rate is low. When overdispersion is present, default (model-based) Poisson standard errors are too small; the Negative Binomial and robust Poisson SEs correctly account for the extra variance.
Why Poisson / Negative Binomial?
Count DGP: log E[Y|X] = 0.5 + 0.5 · X, overdispersion α = 0.5. N = 200. Estimates are Average Marginal Effects (AMEs) so all models are on the same scale.
Estimation Results
| Estimator | β̂ | SE | 95% CI | Bias |
|---|---|---|---|---|
| OLS | 1.195 | 0.187 | [0.83, 1.56] | -0.463 |
| Poisson | 1.293 | 0.118 | [1.06, 1.52] | -0.366 |
| Neg. Binomial (closest to truth) | 1.324 | 0.187 | [0.96, 1.69] | -0.334 |
| True β | 1.658 | — | — | — |
Why the difference?
In this sample OLS avoids negative predictions, but with a wider X range or different parameters the linear model would inevitably predict negative counts. The fundamental issue remains: OLS ignores the non-negative, discrete nature of count data. With dispersion alpha = 0.5, the variance exceeds the mean, violating Poisson's equidispersion assumption. The negative binomial accommodates the extra variability, providing more reliable standard errors, and recovers the true AME more accurately (AME = 1.324) than Poisson (AME = 1.293). All estimates are reported as Average Marginal Effects (AMEs), putting OLS and the exponential-mean models on the same scale. Log-likelihood comparison: Poisson = -501.9, NB = -434.5.
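The comparison can be reproduced in miniature. The sketch below (numpy only; fresh random draws and an assumed standard-normal X, so the numbers will not match the table exactly) estimates the OLS and Poisson AMEs on data with the same overdispersed exponential-mean structure (log E[Y|X] = 0.5 + 0.5 X, alpha = 0.5, N = 200):

```python
import numpy as np

rng = np.random.default_rng(3)
n, alpha = 200, 0.5
x = rng.normal(size=n)                    # assumed X distribution
mu = np.exp(0.5 + 0.5 * x)
v = rng.gamma(1 / alpha, alpha, size=n)   # overdispersion via gamma mixing
y = rng.poisson(mu * v)
X = np.column_stack([np.ones(n), x])

# OLS: its slope is already an AME on the count scale
b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Poisson MLE by Newton's method
b = np.zeros(2)
for _ in range(100):
    m = np.exp(X @ b)
    step = np.linalg.solve(X.T @ (X * m[:, None]), X.T @ (y - m))
    b += step
    if np.abs(step).max() < 1e-10:
        break

# Poisson AME: average of beta * exp(x'b) over the sample
ame_pois = b[1] * np.exp(X @ b).mean()
ame_true = 0.5 * np.exp(0.5 + 0.5 * x).mean()   # true AME at the drawn x's

print("OLS AME:", b_ols[1])
print("Poisson AME:", ame_pois)
print("true AME:", ame_true)
```

Because the Poisson AME rescales the coefficient by the fitted mean, all three numbers live on the same count scale and can be compared directly, as in the table above.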
D. Mathematical Derivation
Don't worry about the notation yet — here's what this means in words: The Poisson model maximizes the likelihood of observing the counts you see, given an exponential mean function. The overdispersion test checks whether the Poisson variance assumption holds.
Poisson log-likelihood:

For Y_i | X_i ~ Poisson(mu_i) with mu_i = exp(X_i'beta):

ln L(beta) = sum_i [ Y_i * X_i'beta - exp(X_i'beta) - ln(Y_i!) ]

The first-order conditions are:

sum_i (Y_i - exp(X_i'beta)) * X_i = 0

Note the similarity to the OLS normal equations — replace the linear predictor with the exponential.
Overdispersion test (Cameron & Trivedi, 1990):

Regress (Y_i - mu_hat_i)^2 - Y_i on mu_hat_i (or mu_hat_i^2). Under the Poisson assumption, the coefficient should be zero. A positive and significant coefficient indicates overdispersion.
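The auxiliary regression is a few lines of numpy. This sketch (simulated data with alpha = 0.5 by construction; a plain OLS-through-the-origin standard error for the t-statistic) fits a Poisson first and then runs the test:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
alpha_true = 0.5
y = rng.poisson(np.exp(0.5 + 0.3 * x) * rng.gamma(1 / alpha_true, alpha_true, size=n))

# Step 1: Poisson fit by Newton's method
b = np.zeros(2)
for _ in range(50):
    m = np.exp(X @ b)
    step = np.linalg.solve(X.T @ (X * m[:, None]), X.T @ (y - m))
    b += step
    if np.abs(step).max() < 1e-10:
        break
mu_hat = np.exp(X @ b)

# Step 2: regress (y - mu)^2 - y on mu^2 through the origin.
# Under equidispersion the slope is zero; a positive slope estimates alpha
# in Var(Y|X) = mu + alpha * mu^2.
z = (y - mu_hat) ** 2 - y
g = mu_hat ** 2
alpha_hat = (g @ z) / (g @ g)
resid = z - alpha_hat * g
se = np.sqrt((resid @ resid) / (n - 1) / (g @ g))
print("alpha_hat:", alpha_hat, " t-stat:", alpha_hat / se)
```

With genuinely overdispersed data the slope comes out near the true alpha and the t-statistic is large, rejecting equidispersion.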
Negative binomial:

The NegBin adds a gamma-distributed heterogeneity term, leading to:

Var(Y | X) = mu + alpha * mu^2

Testing H0: alpha = 0 is a test of Poisson against NegBin.
Coefficient interpretation:

For a continuous covariate x_j:

dE[Y | X] / dx_j = beta_j * E[Y | X]

So beta_j is a semi-elasticity: a one-unit increase in x_j changes the expected count by approximately 100 * beta_j percent.
E. Implementation
library(fixest)
library(MASS)
library(pscl)
# Poisson with robust SEs
pois_fit <- fepois(citations ~ rd_spending + firm_age | tech_class + year,
data = df, vcov = ~firm_id)
summary(pois_fit)
# Negative binomial
nb_fit <- glm.nb(citations ~ rd_spending + firm_age + factor(tech_class), data = df)
summary(nb_fit)
# Zero-inflated Poisson
zip_fit <- zeroinfl(citations ~ rd_spending + firm_age | small_firm, data = df)
summary(zip_fit)
# Overdispersion test
library(AER)
dispersiontest(glm(citations ~ rd_spending + firm_age, family = poisson, data = df))
F. Diagnostics
Testing for Overdispersion
- Deviance test: Compare the deviance (or Pearson chi-squared) to the degrees of freedom. A ratio much greater than 1 suggests overdispersion.
- Cameron-Trivedi test: A regression-based test (see derivation above).
- Compare Poisson and NegBin: If the NegBin dispersion parameter is significantly different from zero, overdispersion is present.
Zero-Inflation
If your data has more zeros than the Poisson (or NegBin) predicts, you may need a zero-inflated model. Two options:
- Zero-inflated Poisson/NegBin (ZIP/ZINB): A two-part model where one equation determines the probability of being a "structural zero" (e.g., a patent that could never receive citations), and a second equation models the count for the rest.
- Hurdle model: The first part is a binary model for zero vs. positive, and the second part is a truncated count model for positive counts.
The Vuong test can help distinguish between standard Poisson and zero-inflated Poisson, though it has known size issues.
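A tiny simulation makes the excess-zeros point concrete. Under hypothetical parameters (30% structural zeros, Poisson rate 3 for the at-risk group), the observed share of zeros far exceeds what a single Poisson with the same overall mean implies:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
pi = 0.3           # probability of a structural zero
lam = 3.0          # Poisson rate for the at-risk group

# Zero-inflated Poisson draw: point mass at zero mixed with a Poisson
structural = rng.random(n) < pi
y = np.where(structural, 0, rng.poisson(lam, size=n))

overall_mean = y.mean()                 # about (1 - pi) * lam = 2.1
p0_observed = (y == 0).mean()           # about pi + (1 - pi) * exp(-lam)
p0_poisson = np.exp(-overall_mean)      # zero share a plain Poisson with the
                                        # same mean would predict
print("observed P(Y=0):", p0_observed)
print("Poisson-implied P(Y=0):", p0_poisson)
```

Here the observed zero share is roughly 0.33 while the same-mean Poisson predicts roughly 0.12: the gap is the signature of zero-inflation rather than ordinary overdispersion.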
PPML for Gravity Models
In international trade, the gravity equation models bilateral trade flows as a function of GDP and distance. PPML has become the standard estimator because it:
- Handles zeros naturally (many country pairs have zero trade).
- Is robust to heteroskedasticity in levels.
- Provides consistent estimates even with non-integer outcomes.
Interpreting Results
Incidence Rate Ratios (IRRs)
The exponentiated coefficient exp(beta_j) is the incidence rate ratio. If exp(beta_hat_j) = 1.15, a one-unit increase in x_j multiplies the expected count by 1.15 — a 15% increase.
Semi-Elasticities
The raw coefficient beta_j is a semi-elasticity: a one-unit increase in x_j is associated with an approximate 100 * beta_j % change in the expected count. This approximation is good for small beta_j (say, below 0.3 in absolute value).
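The quality of this approximation can be checked numerically for a few illustrative coefficient values, comparing the exact percent change exp(beta) - 1 with the 100 * beta shortcut:

```python
import numpy as np

# Exact percent change implied by a Poisson coefficient vs the 100*beta shortcut
for beta in [0.05, 0.12, 0.30, 0.80]:
    exact = 100 * (np.exp(beta) - 1)   # IRR - 1, in percent
    approx = 100 * beta
    print(f"beta={beta:.2f}  exact={exact:6.1f}%  approx={approx:6.1f}%")
```

At beta = 0.12 the exact change is about 12.7% versus the 12% shortcut; at beta = 0.80 the exact change is about 122.6% versus 80%, so for large coefficients report exp(beta) - 1 rather than the approximation.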
Comparing Poisson and NegBin Coefficients
Unlike logit (where rescaling makes comparison difficult), Poisson and NegBin coefficients are directly comparable because both target the same conditional mean function. If the coefficients differ substantially, it suggests the conditional mean may be misspecified.
G. What Can Go Wrong
| Problem | What It Does | How to Fix It |
|---|---|---|
| Using OLS on counts | Negative predictions, wrong SEs, inefficient | Use Poisson or NegBin |
| Using log(Y+1) | Jensen's inequality bias, arbitrary constant | Use PPML or count models |
| Ignoring overdispersion with default Poisson SEs | Standard errors are too small, false significance | Use robust/clustered SEs or switch to NegBin |
| Confusing zero-inflation with overdispersion | Both produce excess zeros but for different reasons | Zero-inflated models for structural zeros; NegBin for general overdispersion |
| Incidental parameters problem with NegBin FE | NegBin with unit FE can be inconsistent | Use Poisson FE (which is consistent) or the Hausman-Hall-Griliches approach |
Using OLS on Count Data
Fix: Poisson regression with robust standard errors, which guarantees non-negative predicted counts.
Example result: Estimated effect of R&D on citations: coefficient = 0.12 (SE = 0.03). A $1M increase in R&D is associated with a 12% increase in expected citations. All predicted counts are positive.
Using log(Y+1) Instead of Count Models
Fix: PPML (Poisson PML) directly models the conditional mean E[Y|X] = exp(X'beta), handling zeros naturally.
Example result: PPML coefficient on FTA membership: 0.38 (SE = 0.09). Free trade agreements increase bilateral trade by approximately 46% (exp(0.38) - 1 = 0.46). Zeros in trade flows are included in estimation.
Ignoring Overdispersion with Default Poisson SEs
Fix: Poisson regression with robust (sandwich) standard errors that account for overdispersion.
Example result: Coefficient on R&D: 0.12. Robust SE = 0.031. 95% CI: [0.059, 0.181]. The confidence interval has correct coverage despite the variance being 17x the mean.
H. Practice
You estimate a Poisson regression of patent citations on R&D spending (in millions). The coefficient on R&D is 0.12 (SE = 0.03). How do you interpret this coefficient?
You run a Poisson regression and find that the variance of your outcome (patent citations) is 15 times larger than the mean. A colleague says: 'Your Poisson model is invalid — you must switch to negative binomial.' Is the colleague correct?
A researcher studies hospital readmissions (a count outcome). She uses an exposure variable (length of initial stay in days) because patients with longer stays have more time at risk of readmission. How should she incorporate this exposure variable in the Poisson model?
A colleague uses ln(patents + 1) as the dependent variable and runs OLS. She argues this is equivalent to Poisson regression because 'both model the log of the outcome.' Is she correct?
An innovation researcher estimates a negative binomial regression with firm fixed effects to study how R&D tax credits affect patent counts. She has a panel of 3,000 firms over 8 years. A reviewer says she should use Poisson FE instead. Why?
You estimate two models for patent citations. The Poisson model gives a deviance of 8,500 with 4,000 degrees of freedom. The NegBin model gives an estimated dispersion parameter alpha = 2.3 (SE = 0.4).
Is the deviance-to-df ratio consistent with overdispersion? Is the NegBin dispersion parameter significant?
Read the paper summary below and write a brief referee critique (2-3 sentences) of the identification strategy.
Paper Summary
The authors study whether venture capital (VC) investment increases startup patent output. Using a sample of 8,000 startups observed annually over 2005-2018, they estimate a Poisson regression of annual patent counts on a VC funding dummy, controlling for firm age, industry, founding year, and total employees. They include year fixed effects. Standard errors are clustered at the firm level. They find that VC-backed startups produce 65% more patents (IRR = 1.65, p < 0.001) and conclude that VC funding causally increases innovation.
Key Table
| Variable | IRR | Clustered SE | p-value |
|---|---|---|---|
| VC funded (0/1) | 1.65 | 0.12 | 0.000 |
| Firm age | 1.08 | 0.02 | 0.000 |
| Log(employees) | 1.32 | 0.05 | 0.000 |
| Year FE | Yes | ||
| Industry FE | Yes | ||
| Firm FE | No | ||
| N (firm-years) | 42,000 | ||
| Alpha (dispersion) | 2.1 |
Authors' Identification Claim
By controlling for firm age, industry, size, and year effects, we isolate the independent effect of VC funding on patent production. Clustering at the firm level accounts for serial correlation.
I. Swap-In: When to Use Something Else
- OLS on log(Y): If all your counts are large (say, above 20) and you have no zeros, taking the log and running OLS is approximately valid. But with zeros or small counts, do not do this transformation.
- Tobit: If your count is in fact a censored continuous variable (e.g., hours worked), Tobit may be more appropriate.
- Hurdle models: If the process that generates zeros is different from the process that generates positive counts (e.g., the decision to patent at all vs. how many patents to file), a hurdle model captures this two-stage structure.
- Quasi-Poisson: In some fields, quasi-Poisson (which adjusts the variance without specifying a full distribution) is used as a middle ground.
J. Reviewer Checklist
Paper Library
Foundational (6)
Hausman, J., Hall, B. H., & Griliches, Z. (1984). Econometric Models for Count Data with an Application to the Patents–R&D Relationship.
This paper developed the econometric framework for Poisson and negative binomial regression models applied to count data, using the relationship between R&D spending and patent counts as the motivating application. It established the standard approach for modeling count outcomes in economics.
Cameron, A. C., & Trivedi, P. K. (1986). Econometric Models Based on Count Data: Comparisons and Applications of Some Estimators and Tests.
Cameron and Trivedi compared Poisson, negative binomial, and other count data models, providing tests for overdispersion and guidance on model selection. This paper helped establish the practical toolkit for applied researchers working with count outcomes.
Wooldridge, J. M. (1999). Distribution-Free Estimation of Some Nonlinear Panel Data Models.
Wooldridge showed that Poisson quasi-maximum-likelihood estimation is consistent for the conditional mean even if the data are not Poisson-distributed, as long as the mean is correctly specified. This result justifies the widespread use of Poisson regression, including for non-negative continuous outcomes.
Santos Silva, J. M. C., & Tenreyro, S. (2006). The Log of Gravity.
Santos Silva and Tenreyro demonstrated that OLS estimation of log-linearized gravity models produces inconsistent estimates in the presence of heteroskedasticity. They showed that Poisson pseudo-maximum-likelihood (PPML) provides consistent estimates and naturally handles zero trade flows, transforming the trade literature.
Lambert, D. (1992). Zero-Inflated Poisson Regression, with an Application to Defects in Manufacturing.
Lambert introduced the zero-inflated Poisson (ZIP) model, which accounts for excess zeros in count data by mixing a point mass at zero with a Poisson distribution. The ZIP model has become a standard tool for count outcomes where a subpopulation generates only zeros, such as patent counts for non-innovating firms.
Correia, S., Guimaraes, P., & Zylkin, T. (2020). Fast Poisson Estimation with High-Dimensional Fixed Effects.
Introduces the ppmlhdfe Stata command for fast Poisson estimation with multiple levels of fixed effects, making PPML feasible for large datasets with high-dimensional fixed effects. This tool has become standard for applied researchers working with count data in panel settings.
Application (4)
Griliches, Z. (1990). Patent Statistics as Economic Indicators: A Survey.
Griliches surveyed the use of patent data as economic indicators, establishing patent counts as a key measure of innovative output. This survey motivated much of the subsequent applied work using Poisson and negative binomial models to study innovation.
Fleming, L., & Sorenson, O. (2001). Technology as a Complex Adaptive System: Evidence from Patent Data.
Fleming and Sorenson used negative binomial regression on patent citation counts to study how the complexity of technological combinations affects the usefulness of inventions. This paper is a prominent application of count models in the innovation and technology management literature.
Katila, R., & Ahuja, G. (2002). Something Old, Something New: A Longitudinal Study of Search Behavior and New Product Introduction.
Katila and Ahuja used negative binomial models to study how the depth and scope of a firm's knowledge search affect new product introductions. This paper is a widely cited application of count data models in the strategic management and innovation literature.
Singh, J., & Agrawal, A. (2011). Recruiting for Ideas: How Firms Exploit the Prior Inventions of New Hires.
Singh and Agrawal used negative binomial regression to study how hiring inventors affects the knowledge flows to the hiring firm, as measured by citation counts. This paper demonstrates the application of count models to questions of knowledge transfer and human capital in organizations.
Survey (2)
Cameron, A. C., & Trivedi, P. K. (2013). Regression Analysis of Count Data.
This textbook is the definitive reference on count data regression, covering Poisson, negative binomial, zero-inflated, hurdle, and panel count models. It provides both the theoretical foundations and practical implementation guidance that applied researchers need.
Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data.
Chapter 18 provides a rigorous treatment of count data models in the context of panel data and cross-sectional settings, covering Poisson, negative binomial, and related estimators with careful attention to the quasi-MLE properties that justify Poisson estimation even when the data are not Poisson-distributed.