Poisson / Negative Binomial
Models for count outcomes — patents filed, citations received, number of acquisitions.
One-Line Implementation
R: fepois(y ~ x1 + x2, data = df, vcov = 'hetero')
Stata: poisson y x1 x2, vce(robust)
Python: smf.poisson('y ~ x1 + x2', data=df).fit(cov_type='HC1')
Download Full Analysis Code
Complete scripts with diagnostics, robustness checks, and result export.
Motivating Example: Patent Citations
You are studying what predicts how many citations a patent receives. Citations are a widely used proxy for patent value — more citations typically suggest a more influential invention. Your dataset has 50,000 patents, and the citation counts look like this distribution: a huge spike at zero (many patents are never cited), a long right tail (a few blockbuster patents get hundreds of citations), and everything in between.
The mean number of citations is 8.2, but the variance is 142.6. That gap is a variance-to-mean ratio of about 17:1.
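To see what a variance-to-mean ratio like 17:1 looks like, here is a small sketch with simulated data (a Poisson-gamma mixture; the numbers are illustrative, not the dataset described above):

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate overdispersed "citation counts" as a Poisson-gamma mixture
# (equivalent to a negative binomial): each patent's rate is 8 times a
# gamma-distributed heterogeneity term with mean 1.
n = 50_000
heterogeneity = rng.gamma(shape=0.5, scale=2.0, size=n)  # mean 1, variance 2
counts = rng.poisson(8.0 * heterogeneity)

mean, var = counts.mean(), counts.var()
# A ratio far above 1 signals overdispersion (Poisson implies ratio = 1).
print(f"mean = {mean:.1f}, variance = {var:.1f}, ratio = {var / mean:.1f}")
```

The gamma mixing inflates the variance to mu + 2*mu^2 while leaving the mean at 8, reproducing the spike at zero and the long right tail described above.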
If you run OLS on this outcome, three problems arise:
- You predict negative citations for some patents (impossible).
- Your residuals are wildly heteroskedastic (variance increases with the mean).
- The normality assumption for inference is badly violated.
Count models are designed for exactly this situation.
A. Overview
The Poisson Model
The Poisson model specifies the conditional mean of a count outcome as:

E[Y|X] = exp(X'beta)

The exponential function ensures that predicted counts are always positive. Estimation is by maximum likelihood, assuming Y|X ~ Poisson(exp(X'beta)).

The key property of the Poisson distribution is that the mean equals the variance:

Var(Y|X) = E[Y|X]

This restriction is called equidispersion. In practice, real count data very frequently violate this assumption — the variance exceeds the mean, a condition called overdispersion.
The Negative Binomial Model
The negative binomial relaxes equidispersion by adding a dispersion parameter alpha:

Var(Y|X) = mu + alpha * mu^2, where mu = E[Y|X]

When alpha = 0, this collapses to Poisson. When alpha > 0, the variance exceeds the mean, and the negative binomial accommodates the extra variability. This specification is the Negative Binomial type 2 (NB2) parameterization (Cameron & Trivedi, 1986). An alternative is Negative Binomial type 1 (NB1), where Var(Y|X) = (1 + alpha) * mu — overdispersion proportional to the mean rather than quadratic (Wooldridge, 2010).
An Important Subtlety: Poisson with Robust SEs
An important result is that even if the Poisson variance assumption is wrong (and it usually is), the Poisson model with robust standard errors still gives consistent estimates of the coefficients. This consistency holds because the Poisson maximum likelihood estimator (MLE) only requires the mean to be correctly specified — you do not need the variance to be correct (Gourieroux et al., 1984).
This result is the basis for Poisson pseudo-maximum likelihood (PPML). You use the Poisson likelihood as a working model but make no assumption about the variance. With robust SEs, inference is valid as long as the conditional mean is correctly specified.
B. Identification
Like OLS and logit/probit, count models require exogeneity for causal interpretation:

Y = exp(X'beta) * eta, where E[eta|X] = 1.

If your key regressor is endogenous, you need an identification strategy. Common approaches include:
- Poisson with fixed effects — removes time-invariant confounders, just like linear FE.
- Control function approach — the nonlinear equivalent of IV for exponential models.
- PPML with instrumental variables — available in specialized software.
Hausman et al. (1984) first applied Poisson and negative binomial models to patent data, establishing the methodology that many innovation researchers still use. In management, count models are widely used to study innovation and organizational behavior — for example, Ahuja (2000) used negative binomial regression to examine how network structure and structural holes affect firm innovation output, and Greve (2003) modeled R&D expenditures and innovation counts to test behavioral theories of the firm.
C. Visual Intuition
Picture the distribution of patent citations. It looks nothing like a bell curve. It is a histogram bunched up at zero, rising to a peak around 2-5 citations, and then trailing off in a long right tail. A few patents have 50, 100, or even 500 citations.
The Poisson model says: the average number of citations depends on covariates through an exponential function, and the spread around that average follows a Poisson distribution. But if the actual spread is much wider than the Poisson predicts (which it very frequently is), you have overdispersion.
Visually, overdispersion means the histogram is wider and has thicker tails than the Poisson would predict. The negative binomial adds a "mixing" distribution that spreads things out more, better matching the observed data.
D. Mathematical Derivation
Don't worry about the notation yet — here's what this means in words: The Poisson model maximizes the likelihood of observing the counts you see, given an exponential mean function. The overdispersion test checks whether the Poisson variance assumption holds.
Poisson log-likelihood:

For y_i in {0, 1, 2, ...} with mu_i = exp(x_i'beta):

ln L(beta) = sum_i [ y_i * (x_i'beta) - exp(x_i'beta) - ln(y_i!) ]

The first-order conditions are:

sum_i x_i * (y_i - exp(x_i'beta)) = 0

Note the similarity to the OLS normal equations — replace the linear predictor with the exponential mean.
Overdispersion test (Cameron & Trivedi, 1990):
Compute z_i = ((y_i - mu_hat_i)^2 - y_i) / mu_hat_i and regress it on mu_hat_i (for the NB2 variance form) or on a constant (for NB1). Under the Poisson assumption, the coefficient should be zero. A positive and significant coefficient indicates overdispersion.
Negative binomial:
The NegBin adds a gamma-distributed heterogeneity term, leading to:

Var(Y|X) = mu + alpha * mu^2

Testing H0: alpha = 0 is a test of Poisson against NegBin.
Coefficient interpretation:
For a continuous covariate x_j:

d ln E[Y|X] / d x_j = beta_j

So beta_j is a semi-elasticity: a one-unit increase in x_j changes the expected count by approximately 100 * beta_j percent.
E. Implementation
# Requires: fixest, MASS, pscl, AER
library(fixest)
library(MASS)
library(pscl)
# --- Step 1: Poisson with Robust SEs (PPML) ---
# fepois() from fixest fits Poisson pseudo-maximum likelihood.
# Coefficients are semi-elasticities: beta = approx. % change in E[Y] per unit X.
# vcov = ~firm_id clusters standard errors at the firm level.
# The "|" syntax absorbs tech_class and year fixed effects.
pois_fit <- fepois(citations ~ rd_spending + firm_age | tech_class + year,
data = df, vcov = ~firm_id)
# Output: coefficients in log units. Exponentiate for IRRs: exp(beta).
summary(pois_fit)
# --- Step 2: Negative Binomial ---
# glm.nb() from MASS adds a dispersion parameter (alpha) to relax
# the Poisson equidispersion assumption (mean = variance).
# Use when you want model-based SEs that account for overdispersion.
nb_fit <- glm.nb(citations ~ rd_spending + firm_age + factor(tech_class), data = df)
# Check theta (= 1/alpha): small theta = high overdispersion.
summary(nb_fit)
# --- Step 3: Zero-Inflated Poisson ---
# zeroinfl() fits a two-part model: (1) logit for structural zeros
# (e.g., patents that can never be cited), (2) Poisson for counts.
# The "|" separates the count model from the zero-inflation model.
zip_fit <- zeroinfl(citations ~ rd_spending + firm_age | small_firm, data = df)
# Output includes both count and zero-inflation model coefficients.
summary(zip_fit)
# --- Step 4: Overdispersion Test ---
# dispersiontest() from AER tests H0: Var(Y) = E[Y] (equidispersion).
# Rejection (p < 0.05) indicates overdispersion — use robust SEs or NegBin.
library(AER)
dispersiontest(glm(citations ~ rd_spending + firm_age, family = poisson, data = df))
F. Diagnostics
Testing for Overdispersion
- Deviance test: Compare the deviance (or Pearson chi-squared) to the degrees of freedom. A ratio much greater than 1 suggests overdispersion.
- Cameron-Trivedi test: A regression-based test (see derivation above).
- Compare Poisson and NegBin: If the NegBin dispersion parameter is significantly different from zero, overdispersion is present.
Zero-Inflation
If your data have more zeros than the Poisson (or NegBin) predicts, you may need a zero-inflated model. Two options:
- Zero-inflated Poisson/NegBin (ZIP/ZINB): A two-part model where one equation determines the probability of being a "structural zero" (e.g., a patent that could never receive citations), and a second equation models the count for the rest.
- Hurdle model: The first part is a binary model for zero vs. positive, and the second part is a truncated count model for positive counts.
The Vuong test can help distinguish between standard Poisson and zero-inflated Poisson, though it has known size issues.
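Before reaching for ZIP/ZINB, a quick excess-zeros check compares the observed share of zeros with what a Poisson with the same mean implies (illustrative simulation with 30% structural zeros):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000
# 30% structural zeros; the rest draw from Poisson with mean 4
structural_zero = rng.random(n) < 0.30
y = np.where(structural_zero, 0, rng.poisson(4.0, size=n))

observed_zeros = (y == 0).mean()
# Share of zeros a single Poisson with the same overall mean implies:
# P(Y = 0) = exp(-mean). (With covariates, compare instead to the
# average of exp(-mu_hat_i) across observations.)
poisson_zeros = np.exp(-y.mean())
print(f"observed zeros: {observed_zeros:.3f}, Poisson-implied: {poisson_zeros:.3f}")
```

A large gap between the observed and Poisson-implied zero shares, as here, is the visual signature of zero-inflation.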
PPML for Gravity Models
In international trade, the gravity equation models bilateral trade flows as a function of GDP and distance. PPML has become the standard estimator (Silva & Tenreyro, 2006) because it:
- Handles zeros naturally (many country pairs have zero trade).
- Is robust to heteroskedasticity in levels.
- Provides consistent estimates even with non-integer outcomes.
Incidence Rate Ratios (IRRs)
The exponentiated coefficient exp(beta_j) is the incidence rate ratio. If exp(beta_j) = 1.15, a one-unit increase in x_j multiplies the expected count by 1.15 — a 15% increase.
Semi-Elasticities
The raw coefficient beta_j is a semi-elasticity: a one-unit increase in x_j is associated with an approximate 100 * beta_j percent change in the expected count. This approximation is good for small beta_j (say, below 0.3 in absolute value).
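The quality of the approximation is easy to check: the exact percent change is 100 * (exp(beta) - 1). A short sketch with illustrative coefficient values:

```python
import numpy as np

# Approximate (100*beta) vs. exact (100*(exp(beta)-1)) percent change
# in E[Y] per one-unit increase in x_j, for illustrative coefficients.
for beta in (0.05, 0.12, 0.30, 0.80):
    exact = 100 * (np.exp(beta) - 1)
    print(f"beta = {beta:.2f}: approx {100 * beta:5.1f}%, exact {exact:5.1f}%")
```

The approximation is close below 0.3 and breaks down for large coefficients (at beta = 0.80 the exact change exceeds 120%), which is why large coefficients should be reported as IRRs.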
Comparing Poisson and NegBin Coefficients
Unlike logit (where rescaling makes comparison difficult), Poisson and NegBin coefficients are directly comparable because both target the same conditional mean function. If the coefficients differ substantially, it suggests the conditional mean may be misspecified.
G. What Can Go Wrong
| Problem | What It Does | How to Fix It |
|---|---|---|
| Using OLS on counts | Negative predictions, wrong SEs, inefficient | Use Poisson or NegBin |
| Using log(Y+1) | Jensen's inequality bias, arbitrary constant | Use PPML or count models |
| Ignoring overdispersion with default Poisson SEs | Standard errors are too small, false significance | Use robust/clustered SEs or switch to NegBin |
| Confusing zero-inflation with overdispersion | Both produce excess zeros but for different reasons | Zero-inflated models for structural zeros; NegBin for general overdispersion |
| Incidental parameters problem with NegBin FE | NegBin with unit FE can be inconsistent | Use Poisson FE (which is consistent) or the Hausman-Hall-Griliches approach |
Using OLS on Count Data
Fix: Poisson regression with robust standard errors, which guarantees strictly positive predicted counts.
Example result: Estimated effect of R&D on citations: coefficient = 0.12 (SE = 0.03). A $1M increase in R&D is associated with an approximately 12% increase in expected citations. All predicted counts are positive.
Using log(Y+1) Instead of Count Models
Fix: PPML (Poisson PML), which directly models the conditional mean E[Y|X] = exp(X'beta) and handles zeros naturally.
Example result: PPML coefficient on FTA membership: 0.38 (SE = 0.09). Free trade agreements increase bilateral trade by approximately 46% (exp(0.38) - 1 = 0.46). Zeros in trade flows are included in estimation.
Ignoring Overdispersion with Default Poisson SEs
Fix: Poisson regression with robust (sandwich) standard errors that account for overdispersion.
Example result: Coefficient on R&D: 0.12. Robust SE = 0.031. 95% CI: [0.059, 0.181]. The confidence interval has correct coverage despite the variance being 17x the mean.
H. Practice

You estimate a Poisson regression of patent citations on R&D spending (in millions). The coefficient on R&D is 0.12 (SE = 0.03). How do you interpret this coefficient?
You run a Poisson regression and find that the variance of your outcome (patent citations) is 15 times larger than the mean. A colleague says: 'Your Poisson model is invalid — you must switch to negative binomial.' Is the colleague correct?
A researcher studies hospital readmissions (a count outcome). She uses an exposure variable (length of initial stay in days) because patients with longer stays have more time at risk of readmission. How should she incorporate this exposure variable in the Poisson model?
A colleague uses ln(patents + 1) as the dependent variable and runs OLS. She argues the approach is equivalent to Poisson regression because 'both model the log of the outcome.' Is she correct?
An innovation researcher estimates a negative binomial regression with firm fixed effects to study how R&D tax credits affect patent counts. She has a panel of 3,000 firms over 8 years. A reviewer says she should use Poisson FE instead. Why?
You estimate two models for patent citations. The Poisson model gives a deviance of 8,500 with 4,000 degrees of freedom. The NegBin model gives an estimated dispersion parameter alpha = 2.3 (SE = 0.4).
Is the deviance-to-df ratio consistent with overdispersion? Is the NegBin dispersion parameter significant?
Read the analysis below carefully and identify the errors.
An innovation researcher studies the effect of R&D tax credits on patent output. Using a panel of 5,000 firms over 10 years, they estimate a negative binomial regression with firm fixed effects. They report:
"The coefficient on R&D tax credit (binary) is 0.18 (p = 0.01), meaning that firms receiving tax credits produce 0.18 more patents per year. We use negative binomial with firm fixed effects to control for unobserved firm heterogeneity. The dispersion parameter alpha = 1.8 confirms that negative binomial is preferred over Poisson."
Select all errors you can find:
Read the analysis below carefully and identify the errors.
A trade economist estimates a gravity model of bilateral trade flows between 180 countries. Many country pairs have zero trade. The researcher takes ln(trade + 1) and runs OLS with exporter and importer fixed effects. They report:
"We estimate: ln(trade_ij + 1) = alpha_i + gamma_j + 0.85*ln(GDP_i*GDP_j) - 1.2*ln(distance_ij) + 0.45*FTA_ij + epsilon_ij. The coefficient on FTA membership indicates that free trade agreements increase bilateral trade by 45%. We add 1 to trade before taking the log to handle zeros."
Select all errors you can find:
Read the paper summary below and write a brief referee critique (2-3 sentences) of the identification strategy.
Paper Summary
The authors study whether venture capital (VC) investment increases startup patent output. Using a sample of 8,000 startups observed annually over 2005-2018, they estimate a Poisson regression of annual patent counts on a VC funding dummy, controlling for firm age, industry, founding year, and total employees. They include year fixed effects. Standard errors are clustered at the firm level. They find that VC-backed startups produce 65% more patents (IRR = 1.65, p < 0.001) and conclude that VC funding causally increases innovation.
Key Table
| Variable | IRR | Clustered SE | p-value |
|---|---|---|---|
| VC funded (0/1) | 1.65 | 0.12 | 0.000 |
| Firm age | 1.08 | 0.02 | 0.000 |
| Log(employees) | 1.32 | 0.05 | 0.000 |
| Year FE | Yes | ||
| Industry FE | Yes | ||
| Firm FE | No | ||
| N (firm-years) | 42,000 | ||
| Alpha (dispersion) | 2.1 |
Authors' Identification Claim
By controlling for firm age, industry, size, and year effects, we isolate the independent effect of VC funding on patent production. Clustering at the firm level accounts for serial correlation.
I. Swap-In: When to Use Something Else
- OLS on log(Y): If all your counts are large (say, above 20) and you have no zeros, taking the log and running OLS is approximately valid. But with zeros or small counts, do not do this transformation.
- Tobit: If your count is in fact a censored continuous variable (e.g., hours worked), Tobit may be more appropriate.
- Hurdle models: If the process that generates zeros is different from the process that generates positive counts (e.g., the decision to patent at all vs. how many patents to file), a hurdle model captures this two-stage structure.
- Quasi-Poisson: In some fields, quasi-Poisson (which adjusts the variance without specifying a full distribution) is used as a middle ground.
J. Reviewer Checklist
Critical Reading Checklist
Paper Library
Foundational (8)
Cameron, A. C., & Trivedi, P. K. (1986). Econometric Models Based on Count Data: Comparisons and Applications of Some Estimators and Tests.
Cameron and Trivedi compare Poisson, negative binomial, and other count data models, providing tests for overdispersion and guidance on model selection. This paper helps establish the practical toolkit for applied researchers working with count outcomes.
Cameron, A. C., & Trivedi, P. K. (1990). Regression-based Tests for Overdispersion in the Poisson Model.
Cameron and Trivedi develop regression-based tests for overdispersion in count data models, enabling formal testing of whether the Poisson equidispersion assumption holds. Their tests compare the observed variance to the Poisson-implied mean, providing the foundation for model selection between Poisson and negative binomial specifications. Researchers working with count outcomes should use these tests before defaulting to either model.
Correia, S., Guimaraes, P., & Zylkin, T. (2020). Fast Poisson Estimation with High-Dimensional Fixed Effects.
Correia, Guimaraes, and Zylkin introduce the ppmlhdfe Stata command for fast Poisson estimation with multiple levels of fixed effects, making PPML feasible for large datasets with high-dimensional fixed effects. This tool has become standard for applied researchers working with count data in panel settings.
Gourieroux, C., Monfort, A., & Trognon, A. (1984). Pseudo Maximum Likelihood Methods: Theory.
Gourieroux, Monfort, and Trognon develop the general theory of pseudo maximum likelihood estimation for cases in which the likelihood family may be misspecified. They derive conditions for consistency and asymptotic normality and characterize efficiency bounds in this broader framework. The Poisson PML result — consistency for the conditional mean under misspecification — is a special case that underpins the later widespread use of Poisson regression with robust standard errors.
Hausman, J., Hall, B. H., & Griliches, Z. (1984). Econometric Models for Count Data with an Application to the Patents–R&D Relationship.
Hausman, Hall, and Griliches develop the econometric framework for Poisson and negative binomial regression models applied to count data, using the relationship between R&D spending and patent counts as the motivating application. The paper is a classic early econometric treatment of count-data models in panel settings.
Lambert, D. (1992). Zero-Inflated Poisson Regression, with an Application to Defects in Manufacturing.
Lambert introduces the zero-inflated Poisson (ZIP) model, which accounts for excess zeros in count data by mixing a point mass at zero with a Poisson distribution. The ZIP model has become a standard tool for count outcomes where a subpopulation generates only zeros.
Silva, J. M. C. S., & Tenreyro, S. (2006). The Log of Gravity.
Silva and Tenreyro demonstrate that OLS estimation of log-linearized gravity models produces inconsistent estimates in the presence of heteroskedasticity. They show that Poisson pseudo-maximum-likelihood (PPML) provides consistent estimates and naturally handles zero trade flows, transforming the trade literature.
Wooldridge, J. M. (1999). Distribution-Free Estimation of Some Nonlinear Panel Data Models.
Wooldridge shows that Poisson quasi-maximum-likelihood estimation in panel data models is consistent for the conditional mean even if the data are not Poisson-distributed, as long as the mean is correctly specified. This result justifies the widespread use of Poisson regression for non-count continuous outcomes and provides the foundation for distribution-free estimation of nonlinear panel data models.
Application (4)
Ahuja, G. (2000). Collaboration Networks, Structural Holes, and Innovation: A Longitudinal Study.
Ahuja uses a random effects Poisson model (following Hausman, Hall, and Griliches 1984) to model patent counts as a function of collaboration network structure in this landmark network study. He finds that direct ties and indirect ties both increase innovation, while structural holes (gaps between partners) decrease it — challenging Burt's structural holes theory in the context of innovation. The paper demonstrates the use of count models with panel data in management research, with fixed effects Poisson estimated as a robustness check.
Fleming, L., & Sorenson, O. (2001). Technology as a Complex Adaptive System: Evidence from Patent Data.
Fleming and Sorenson use negative binomial regression on patent citation counts to study how the complexity of technological combinations affects the usefulness of inventions. This paper is a prominent application of count models in the innovation and technology management literature.
Greve, H. R. (2003). A Behavioral Theory of R&D Expenditures and Innovations: Evidence from Shipbuilding.
Greve tests behavioral theory predictions about how performance relative to aspiration levels affects R&D investment and innovation output using count models in the Japanese shipbuilding industry. He finds that low performance triggers problemistic search (increasing R&D), high slack triggers slack search (also increasing R&D), and low performance increases risk tolerance for launching innovations. The paper demonstrates how to model count-based innovation outcomes with firm-level panel data in a management context.
Katila, R., & Ahuja, G. (2002). Something Old, Something New: A Longitudinal Study of Search Behavior and New Product Introduction.
Katila and Ahuja use negative binomial models to study how the depth and scope of a firm's knowledge search affect new product introductions. This paper is a widely cited application of count data models in the strategic management and innovation literature.
Survey (3)
Cameron, A. C., & Trivedi, P. K. (2013). Regression Analysis of Count Data.
Cameron and Trivedi provide the standard reference on count data regression, covering Poisson, negative binomial, zero-inflated, hurdle, and panel count models. They provide both the theoretical foundations and practical implementation guidance that applied researchers need.
Griliches, Z. (1990). Patent Statistics as Economic Indicators: A Survey.
Griliches surveys the use of patent data as economic indicators, establishing patent counts as a key measure of innovative output. This survey motivates much of the subsequent applied work using Poisson and negative binomial models to study innovation.
Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data.
Wooldridge's graduate textbook is the standard reference for cross-section and panel data econometrics. Chapters 10-11 provide a thorough treatment of fixed effects, random effects, and related panel data methods, while later chapters cover general estimation methodology (MLE, GMM, M-estimation) with panel data applications throughout. The book covers both linear and nonlinear models with careful attention to assumptions.