MethodAtlas

Poisson / Negative Binomial

Models for count outcomes — patents filed, citations received, number of acquisitions.

When to Use: When your outcome is a non-negative integer count (patents, citations, acquisitions, events). Poisson with robust SEs (PPML) is often all you need, even with overdispersion.
Assumption: Poisson requires the conditional mean to be correctly specified as exp(X'beta). With robust SEs, equidispersion is not required — only the mean specification must be correct (Poisson pseudo-maximum likelihood).
Mistake: Using OLS on count data, which can produce negative predicted values and ignores the discrete, non-negative nature of the outcome. Also, reflexively switching to negative binomial when Poisson with robust SEs already handles overdispersion.
Reading Time: ~11 min · 11 sections · 9 interactive exercises

One-Line Implementation

R:      fepois(y ~ x1 + x2, data = df, vcov = 'hetero')
Stata:  poisson y x1 x2, vce(robust)
Python: smf.poisson('y ~ x1 + x2', data=df).fit(cov_type='HC1')


Motivating Example: Patent Citations

You are studying what predicts how many citations a patent receives. Citations are a widely used proxy for patent value — more citations typically suggest a more influential invention. Your dataset has 50,000 patents, and the citation counts follow a characteristic shape: a huge spike at zero (many patents are never cited), a long right tail (a few blockbuster patents get hundreds of citations), and everything in between.

The mean number of citations is 8.2, but the variance is 142.6. That gap is a variance-to-mean ratio of about 17:1.
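A dispersion gap of this size is easy to reproduce by simulation. The sketch below is a hypothetical illustration, not the actual patent data: the NB2 dispersion value alpha = 2.0 is an assumed parameter chosen so that the variance formula mu(1 + alpha*mu) reproduces roughly the 17:1 ratio above.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, alpha = 8.2, 2.0              # assumed mean and NB2 dispersion
n_shape = 1 / alpha               # numpy parameterization: var = mu * (1 + alpha * mu)
p = n_shape / (n_shape + mu)
y = rng.negative_binomial(n_shape, p, size=50_000)
ratio = y.var() / y.mean()
print(f"mean = {y.mean():.1f}, variance = {y.var():.1f}, variance/mean = {ratio:.1f}")
```

With these parameters the NB2 variance is 8.2 * (1 + 2.0 * 8.2) = 142.7, close to the sample variance reported above.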

If you run OLS on this outcome, three problems arise:

  1. You predict negative citations for some patents (impossible).
  2. Your residuals are wildly heteroskedastic (variance increases with the mean).
  3. The normality assumption for inference is badly violated.

Count models are designed for exactly this situation.


A. Overview

The Poisson Model

The Poisson model specifies the conditional mean of a count outcome as:

E[Y_i \mid X_i] = \mu_i = \exp(X_i'\beta)

The exponential function ensures that predicted counts are always positive. Estimation is by maximum likelihood, assuming Y_i \mid X_i \sim \text{Poisson}(\mu_i).

The defining property of the Poisson distribution is that the mean equals the variance:

E[Y_i \mid X_i] = \text{Var}(Y_i \mid X_i) = \mu_i

This restriction is called equidispersion. In practice, real count data very frequently violate this assumption — the variance exceeds the mean, a condition called overdispersion.

The Negative Binomial Model

The negative binomial relaxes equidispersion by adding a dispersion parameter \alpha:

\text{Var}(Y_i \mid X_i) = \mu_i(1 + \alpha \mu_i)

When \alpha = 0, this collapses to Poisson. When \alpha > 0, the variance exceeds the mean, and the negative binomial accommodates the extra variability. This specification is the Negative Binomial type 2 (NB2) parameterization (Cameron & Trivedi, 1986). An alternative is Negative Binomial type 1 (NB1), where \text{Var}(Y_i \mid X_i) = \sigma^2 \mu_i — overdispersion proportional to the mean rather than quadratic (Wooldridge, 2010).
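The gamma-mixture interpretation of NB2 can be checked numerically. In this sketch (parameter values assumed purely for illustration), a Poisson rate is multiplied by mean-one gamma heterogeneity, and the simulated variance is compared to the NB2 formula mu(1 + alpha*mu):

```python
import numpy as np

rng = np.random.default_rng(2)
mu, alpha = 5.0, 0.8                       # assumed values for illustration
# NB2 as a gamma mixture: v ~ Gamma(shape = 1/alpha, scale = alpha), so E[v] = 1
v = rng.gamma(shape=1 / alpha, scale=alpha, size=500_000)
y = rng.poisson(mu * v)
nb2_var = mu * (1 + alpha * mu)            # theoretical NB2 variance
print(f"simulated mean {y.mean():.2f} vs target {mu}")
print(f"simulated var  {y.var():.2f} vs NB2 formula {nb2_var:.2f}")
```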

An Important Subtlety: Poisson with Robust SEs

An important result is that even if the Poisson variance assumption is wrong (and it usually is), the Poisson model with robust standard errors still gives consistent estimates of the coefficients. This consistency holds because the Poisson maximum likelihood estimator (MLE) only requires the mean to be correctly specified — you do not need the variance to be correct (Gourieroux et al., 1984).

This result is the basis for Poisson pseudo-maximum likelihood (PPML). You use the Poisson likelihood as a working model but make no assumption about the variance. With robust SEs, inference is valid as long as the conditional mean is correctly specified.
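The consistency result can be seen in a small simulation. The sketch below uses assumed coefficient values and a hand-rolled Newton solver for the Poisson score (not a library routine): the counts are heavily overdispersed, but because the conditional mean is exp(X'beta), the Poisson pseudo-MLE still recovers beta.

```python
import numpy as np

rng = np.random.default_rng(3)
n_obs = 20_000
beta_true = np.array([0.5, 0.3])                      # assumed coefficients
X = np.column_stack([np.ones(n_obs), rng.normal(size=n_obs)])
mu = np.exp(X @ beta_true)
# Gamma heterogeneity wrecks the Poisson variance but leaves the mean intact
y = rng.poisson(mu * rng.gamma(shape=0.5, scale=2.0, size=n_obs))

# Newton iterations on the Poisson score: sum_i (y_i - exp(x_i'b)) x_i = 0
beta = np.zeros(2)
for _ in range(25):
    m = np.exp(X @ beta)
    beta += np.linalg.solve(X.T @ (X * m[:, None]), X.T @ (y - m))
print("PPML estimate:", beta.round(3), "(true values: 0.5, 0.3)")
```

The point estimates are close to the truth even though the data are far from Poisson; only the standard errors need the robust correction.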




B. Identification

Like OLS and logit/probit, count models require exogeneity for causal interpretation:

E[Y_i \mid X_i] = \exp(X_i'\beta) \quad \Rightarrow \quad E[\varepsilon_i \mid X_i] = 0

where \varepsilon_i = Y_i - \exp(X_i'\beta). If your key regressor is endogenous, you need an identification strategy. Common approaches include:

  • Poisson with fixed effects — removes time-invariant confounders, just like linear FE.
  • Control function approach — the nonlinear equivalent of IV for exponential models.
  • PPML with instrumental variables — available in specialized software.

Hausman et al. (1984) first applied Poisson and negative binomial models to patent data, establishing the methodology that many innovation researchers still use. In management, count models are widely used to study innovation and organizational behavior — for example, Ahuja (2000) used negative binomial regression to examine how network structure and structural holes affect firm innovation output, and Greve (2003) modeled R&D expenditures and innovation counts to test behavioral theories of the firm.


C. Visual Intuition

Picture the distribution of patent citations. It looks nothing like a bell curve. It is a histogram bunched up at zero, rising to a peak around 2-5 citations, and then trailing off in a long right tail. A few patents have 50, 100, or even 500 citations.

The Poisson model says: the average number of citations depends on covariates through an exponential function, and the spread around that average follows a Poisson distribution. But if the actual spread is much wider than the Poisson predicts (which it very frequently is), you have overdispersion.

Visually, overdispersion means the histogram is wider and has thicker tails than the Poisson would predict. The negative binomial adds a "mixing" distribution that spreads things out more, better matching the observed data.


D. Mathematical Derivation

Don't worry about the notation yet — here's what this means in words: The Poisson model maximizes the likelihood of observing the counts you see, given an exponential mean function. The overdispersion test checks whether the Poisson variance assumption holds.

Poisson log-likelihood:

For Y_i \sim \text{Poisson}(\mu_i) with \mu_i = \exp(X_i'\beta):

\ell(\beta) = \sum_{i=1}^{n} \left[ Y_i \cdot X_i'\beta - \exp(X_i'\beta) - \ln(Y_i!) \right]

The first-order conditions are:

\sum_{i=1}^{n} \left( Y_i - \exp(X_i'\beta) \right) X_i = 0

Note the similarity to OLS normal equations — replace the linear predictor with the exponential.

Overdispersion test (Cameron & Trivedi, 1990):

Regress (Y_i - \hat{\mu}_i)^2 - Y_i on \hat{\mu}_i (or \hat{\mu}_i^2). Under the Poisson assumption, the coefficient should be zero. A positive and significant coefficient indicates overdispersion.
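A minimal version of this auxiliary regression can be sketched as follows. All data are simulated, and for simplicity the fitted means are taken as known constants rather than estimated from a regression:

```python
import numpy as np

def overdispersion_tstat(y, mu_hat):
    """Cameron-Trivedi auxiliary regression: ((y - mu)^2 - y) on mu, no intercept."""
    z = (y - mu_hat) ** 2 - y
    slope = (mu_hat @ z) / (mu_hat @ mu_hat)
    resid = z - slope * mu_hat
    se = np.sqrt((resid @ resid) / (len(y) - 1) / (mu_hat @ mu_hat))
    return slope / se

rng = np.random.default_rng(4)
n, mu = 50_000, 8.2
mu_hat = np.full(n, mu)                       # stand-in for fitted Poisson means
t_pois = overdispersion_tstat(rng.poisson(mu, n), mu_hat)
t_over = overdispersion_tstat(rng.poisson(mu * rng.gamma(2.0, 0.5, n)), mu_hat)
print(f"equidispersed data:  t = {t_pois:.2f}")   # near zero: no overdispersion
print(f"overdispersed data:  t = {t_over:.2f}")   # large: reject equidispersion
```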

Negative binomial:

The NegBin adds a gamma-distributed heterogeneity term, leading to:

\text{Var}(Y_i \mid X_i) = \mu_i(1 + \alpha \mu_i)

Testing H_0: \alpha = 0 is a test of Poisson against NegBin.

Coefficient interpretation:

For a continuous covariate X_j:

\frac{\partial \ln E[Y_i \mid X_i]}{\partial X_j} = \beta_j

So \beta_j is a semi-elasticity: a one-unit increase in X_j changes the expected count by approximately 100 \times \beta_j percent.


E. Implementation

# Requires: fixest, MASS, pscl, AER
library(fixest)
library(MASS)
library(pscl)

# --- Step 1: Poisson with Robust SEs (PPML) ---
# fepois() from fixest fits Poisson pseudo-maximum likelihood.
# Coefficients are semi-elasticities: beta = approx. % change in E[Y] per unit X.
# vcov = ~firm_id clusters standard errors at the firm level.
# The "|" syntax absorbs tech_class and year fixed effects.
pois_fit <- fepois(citations ~ rd_spending + firm_age | tech_class + year,
                   data = df, vcov = ~firm_id)
# Output: coefficients in log units. Exponentiate for IRRs: exp(beta).
summary(pois_fit)

# --- Step 2: Negative Binomial ---
# glm.nb() from MASS adds a dispersion parameter (alpha) to relax
# the Poisson equidispersion assumption (mean = variance).
# Use when you want model-based SEs that account for overdispersion.
nb_fit <- glm.nb(citations ~ rd_spending + firm_age + factor(tech_class), data = df)
# Check theta (= 1/alpha): small theta = high overdispersion.
summary(nb_fit)

# --- Step 3: Zero-Inflated Poisson ---
# zeroinfl() fits a two-part model: (1) logit for structural zeros
# (e.g., patents that can never be cited), (2) Poisson for counts.
# The "|" separates the count model from the zero-inflation model.
zip_fit <- zeroinfl(citations ~ rd_spending + firm_age | small_firm, data = df)
# Output includes both count and zero-inflation model coefficients.
summary(zip_fit)

# --- Step 4: Overdispersion Test ---
# dispersiontest() from AER tests H0: Var(Y) = E[Y] (equidispersion).
# Rejection (p < 0.05) indicates overdispersion — use robust SEs or NegBin.
library(AER)
dispersiontest(glm(citations ~ rd_spending + firm_age, family = poisson, data = df))

F. Diagnostics

Testing for Overdispersion

  1. Deviance test: Compare the deviance (or Pearson chi-squared) to the degrees of freedom. A ratio much greater than 1 suggests overdispersion.
  2. Cameron-Trivedi test: A regression-based test (see derivation above).
  3. Compare Poisson and NegBin: If the NegBin dispersion parameter \alpha is significantly different from zero, overdispersion is present.
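The deviance/Pearson check in item 1 can be sketched numerically. In this hedged example (simulated data, intercept-only fitted values assumed known), the Pearson statistic divided by degrees of freedom sits near 1 under equidispersion and well above 1 under overdispersion:

```python
import numpy as np

rng = np.random.default_rng(6)
n, mu = 10_000, 8.2
mu_hat = np.full(n, mu)                           # stand-in fitted values
y_eq = rng.poisson(mu, n)                         # equidispersed counts
y_ov = rng.poisson(mu * rng.gamma(2.0, 0.5, n))   # variance roughly 5x the mean
disp_eq = ((y_eq - mu_hat) ** 2 / mu_hat).sum() / (n - 1)
disp_ov = ((y_ov - mu_hat) ** 2 / mu_hat).sum() / (n - 1)
print(f"Pearson chi2/df, equidispersed: {disp_eq:.2f}")   # near 1
print(f"Pearson chi2/df, overdispersed: {disp_ov:.2f}")   # well above 1
```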

Zero-Inflation

If your data have more zeros than the Poisson (or NegBin) predicts, you may need a zero-inflated model. Two options:

  • Zero-inflated Poisson/NegBin (ZIP/ZINB): A two-part model where one equation determines the probability of being a "structural zero" (e.g., a patent that could never receive citations), and a second equation models the count for the rest.
  • Hurdle model: The first part is a binary model for zero vs. positive, and the second part is a truncated count model for positive counts.

The Vuong test can help distinguish between standard Poisson and zero-inflated Poisson, though it has known size issues.
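A quick first diagnostic for excess zeros compares the observed share of zeros to the Poisson-implied share. The sketch below (all parameter values assumed) simulates a zero-inflated process and shows the gap, using exp(-ybar) as a rough benchmark for the zero probability of a Poisson with the same mean:

```python
import numpy as np

rng = np.random.default_rng(5)
n, mu, pi0 = 100_000, 3.0, 0.25     # pi0 = structural-zero share (assumed values)
structural = rng.random(n) < pi0
y = np.where(structural, 0, rng.poisson(mu, n))
obs_zeros = (y == 0).mean()
pois_zeros = np.exp(-y.mean())      # rough benchmark: zero prob. of Poisson(ybar)
print(f"observed zero share:      {obs_zeros:.3f}")
print(f"Poisson(ybar) zero share: {pois_zeros:.3f}")
```

A large gap like this one is a signal to consider zero-inflated or hurdle specifications, though overdispersion alone can also inflate the zero share.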

PPML for Gravity Models

In international trade, the gravity equation models bilateral trade flows as a function of GDP and distance. PPML has become the standard estimator (Silva & Tenreyro, 2006) because it:

  1. Handles zeros naturally (many country pairs have zero trade).
  2. Is robust to heteroskedasticity in levels.
  3. Provides consistent estimates even with non-integer outcomes.

Incidence Rate Ratios (IRRs)

The exponentiated coefficient e^{\beta_j} is the incidence rate ratio. If e^{\beta_j} = 1.15, a one-unit increase in X_j multiplies the expected count by 1.15 — a 15% increase.

Semi-Elasticities

The raw coefficient \beta_j is a semi-elasticity: a one-unit increase in X_j is associated with an approximate 100 \times \beta_j\% change in the expected count. This approximation is good for small \beta_j (say, below 0.3 in absolute value).
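The quality of the approximation is easy to tabulate, since exp(beta) - 1 is the exact proportional change:

```python
import numpy as np

betas = np.array([0.05, 0.12, 0.30, 0.80])
exact = np.exp(betas) - 1          # exact proportional change (IRR minus one)
approx = betas                     # semi-elasticity approximation
for b, a, e in zip(betas, approx, exact):
    print(f"beta = {b:4.2f}:  approx {a:6.1%}   exact {e:6.1%}")
```

For beta = 0.12 the two agree to within a percentage point; by beta = 0.80 the approximation is badly off (80% vs. the exact 123%), so report exp(beta) - 1 for large coefficients.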

Comparing Poisson and NegBin Coefficients

Unlike logit (where rescaling makes comparison difficult), Poisson and NegBin coefficients are directly comparable because both target the same conditional mean function. If the coefficients differ substantially, it suggests the conditional mean may be misspecified.


G. What Can Go Wrong

| Problem | What It Does | How to Fix It |
| --- | --- | --- |
| Using OLS on counts | Negative predictions, wrong SEs, inefficient | Use Poisson or NegBin |
| Using log(Y+1) | Jensen's inequality bias, arbitrary constant | Use PPML or count models |
| Ignoring overdispersion with default Poisson SEs | Standard errors are too small, false significance | Use robust/clustered SEs or switch to NegBin |
| Confusing zero-inflation with overdispersion | Both produce excess zeros but for different reasons | Zero-inflated models for structural zeros; NegBin for general overdispersion |
| Incidental parameters problem with NegBin FE | NegBin with unit FE can be inconsistent | Use Poisson FE (which is consistent) or the Hausman-Hall-Griliches approach |
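The Jensen's-inequality problem with log(Y+1) noted above is easy to demonstrate. In this sketch (assumed coefficients, simulated data), counts are generated with a true semi-elasticity of 0.30, yet OLS of log(y + 1) on x recovers a substantially attenuated slope:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50_000
x = rng.normal(size=n)
y = rng.poisson(np.exp(0.5 + 0.3 * x))          # true semi-elasticity = 0.30 (assumed)
slope = np.cov(x, np.log1p(y))[0, 1] / x.var()  # OLS slope of log(y + 1) on x
print(f"log(y+1) OLS slope: {slope:.3f}  (true semi-elasticity: 0.300)")
```

The bias depends on the scale of the counts and the arbitrary added constant, which is exactly why PPML is preferred.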
Corrected examples:

Using OLS on count data — Fix: Poisson regression with robust standard errors, which guarantees non-negative predicted counts. Example result: the estimated effect of R&D on citations is 0.12 (SE = 0.03), so a $1M increase in R&D is associated with a 12% increase in expected citations, and all predicted counts are positive.

Using log(Y+1) instead of count models — Fix: PPML (Poisson PML) directly models the conditional mean E[Y|X] = exp(X'beta), handling zeros naturally. Example result: the PPML coefficient on FTA membership is 0.38 (SE = 0.09), so free trade agreements increase bilateral trade by approximately 46% (exp(0.38) - 1 = 0.46), and zeros in trade flows are included in estimation.

Ignoring overdispersion with default Poisson SEs — Fix: Poisson regression with robust (sandwich) standard errors that account for overdispersion. Example result: the coefficient on R&D is 0.12 with robust SE = 0.031 and 95% CI [0.059, 0.181]; the confidence interval has correct coverage despite the variance being 17x the mean.

Concept Check

You estimate a Poisson regression of patent citations on R&D spending (in millions). The coefficient on R&D is 0.12 (SE = 0.03). How do you interpret this coefficient?


H. Practice

Concept Check

You run a Poisson regression and find that the variance of your outcome (patent citations) is 15 times larger than the mean. A colleague says: 'Your Poisson model is invalid — you must switch to negative binomial.' Is the colleague correct?

Concept Check

A researcher studies hospital readmissions (a count outcome). She uses an exposure variable (length of initial stay in days) because patients with longer stays have more time at risk of readmission. How should she incorporate this exposure variable in the Poisson model?

Concept Check

A colleague uses ln(patents + 1) as the dependent variable and runs OLS. She argues the approach is equivalent to Poisson regression because 'both model the log of the outcome.' Is she correct?

Concept Check

An innovation researcher estimates a negative binomial regression with firm fixed effects to study how R&D tax credits affect patent counts. She has a panel of 3,000 firms over 8 years. A reviewer says she should use Poisson FE instead. Why?

Guided Exercise

You estimate two models for patent citations. The Poisson model gives a deviance of 8,500 with 4,000 degrees of freedom. The NegBin model gives an estimated dispersion parameter alpha = 2.3 (SE = 0.4).

Is the deviance-to-df ratio consistent with overdispersion? Is the NegBin dispersion parameter significant?

What is the deviance-to-degrees-of-freedom ratio?

Is the data overdispersed? (yes/no)

Is the NegBin dispersion parameter statistically significant? (yes/no)

Error Detective

Read the analysis below carefully and identify the errors.

An innovation researcher studies the effect of R&D tax credits on patent output. Using a panel of 5,000 firms over 10 years, they estimate a negative binomial regression with firm fixed effects. They report:

"The coefficient on R&D tax credit (binary) is 0.18 (p = 0.01), meaning that firms receiving tax credits produce 0.18 more patents per year. We use negative binomial with firm fixed effects to control for unobserved firm heterogeneity. The dispersion parameter alpha = 1.8 confirms that negative binomial is preferred over Poisson."

Select all errors you can find:

Error Detective

Read the analysis below carefully and identify the errors.

A trade economist estimates a gravity model of bilateral trade flows between 180 countries. Many country pairs have zero trade. The researcher takes ln(trade + 1) and runs OLS with exporter and importer fixed effects. They report:

"We estimate: ln(trade_ij + 1) = alpha_i + gamma_j + 0.85*ln(GDP_i*GDP_j) - 1.2*ln(distance_ij) + 0.45*FTA_ij + epsilon_ij. The coefficient on FTA membership indicates that free trade agreements increase bilateral trade by 45%. We add 1 to trade before taking the log to handle zeros."

Select all errors you can find:

Referee Exercise

Read the paper summary below and write a brief referee critique (2-3 sentences) of the identification strategy.

Paper Summary

The authors study whether venture capital (VC) investment increases startup patent output. Using a sample of 8,000 startups observed annually over 2005-2018, they estimate a Poisson regression of annual patent counts on a VC funding dummy, controlling for firm age, industry, founding year, and total employees. They include year fixed effects. Standard errors are clustered at the firm level. They find that VC-backed startups produce 65% more patents (IRR = 1.65, p < 0.001) and conclude that VC funding causally increases innovation.

Key Table

VariableIRRClustered SEp-value
VC funded (0/1)1.650.120.000
Firm age1.080.020.000
Log(employees)1.320.050.000
Year FEYes
Industry FEYes
Firm FENo
N (firm-years)42,000
Alpha (dispersion)2.1

Authors' Identification Claim

By controlling for firm age, industry, size, and year effects, we isolate the independent effect of VC funding on patent production. Clustering at the firm level accounts for serial correlation.


I. Swap-In: When to Use Something Else

  • OLS on log(Y): If all your counts are large (say, above 20) and you have no zeros, taking the log and running OLS is approximately valid. But with zeros or small counts, do not do this transformation.
  • Tobit: If your count is in fact a censored continuous variable (e.g., hours worked), Tobit may be more appropriate.
  • Hurdle models: If the process that generates zeros is different from the process that generates positive counts (e.g., the decision to patent at all vs. how many patents to file), a hurdle model captures this two-stage structure.
  • Quasi-Poisson: In some fields, quasi-Poisson (which adjusts the variance without specifying a full distribution) is used as a middle ground.

J. Reviewer Checklist

Critical Reading Checklist


Paper Library

Foundational (8)

Cameron, A. C., & Trivedi, P. K. (1986). Econometric Models Based on Count Data: Comparisons and Applications of Some Estimators and Tests.

Journal of Applied Econometrics. DOI: 10.1002/jae.3950010104

Cameron and Trivedi compare Poisson, negative binomial, and other count data models, providing tests for overdispersion and guidance on model selection. This paper helps establish the practical toolkit for applied researchers working with count outcomes.

Cameron, A. C., & Trivedi, P. K. (1990). Regression-based Tests for Overdispersion in the Poisson Model.

Journal of Econometrics. DOI: 10.1016/0304-4076(90)90014-K

Cameron and Trivedi develop regression-based tests for overdispersion in count data models, enabling formal testing of whether the Poisson equidispersion assumption holds. Their tests compare the observed variance to the Poisson-implied mean, providing the foundation for model selection between Poisson and negative binomial specifications. Researchers working with count outcomes should use these tests before defaulting to either model.

Correia, S., Guimaraes, P., & Zylkin, T. (2020). Fast Poisson Estimation with High-Dimensional Fixed Effects.

Correia, Guimaraes, and Zylkin introduce the ppmlhdfe Stata command for fast Poisson estimation with multiple levels of fixed effects, making PPML feasible for large datasets with high-dimensional fixed effects. This tool has become standard for applied researchers working with count data in panel settings.

Gourieroux, C., Monfort, A., & Trognon, A. (1984). Pseudo Maximum Likelihood Methods: Theory.

Econometrica. DOI: 10.2307/1913471

Gourieroux, Monfort, and Trognon develop the general theory of pseudo maximum likelihood estimation for cases in which the likelihood family may be misspecified. They derive conditions for consistency and asymptotic normality and characterize efficiency bounds in this broader framework. The Poisson PML result — consistency for the conditional mean under misspecification — is a special case that underpins the later widespread use of Poisson regression with robust standard errors.

Hausman, J., Hall, B. H., & Griliches, Z. (1984). Econometric Models for Count Data with an Application to the Patents–R&D Relationship.

Econometrica. DOI: 10.2307/1911191

Hausman, Hall, and Griliches develop the econometric framework for Poisson and negative binomial regression models applied to count data, using the relationship between R&D spending and patent counts as the motivating application. The paper is a classic early econometric treatment of count-data models in panel settings.

Lambert, D. (1992). Zero-Inflated Poisson Regression, with an Application to Defects in Manufacturing.

Technometrics. DOI: 10.2307/1269547

Lambert introduces the zero-inflated Poisson (ZIP) model, which accounts for excess zeros in count data by mixing a point mass at zero with a Poisson distribution. The ZIP model has become a standard tool for count outcomes where a subpopulation generates only zeros.

Silva, J. M. C. S., & Tenreyro, S. (2006). The Log of Gravity.

Review of Economics and Statistics. DOI: 10.1162/rest.88.4.641

Silva and Tenreyro demonstrate that OLS estimation of log-linearized gravity models produces inconsistent estimates in the presence of heteroskedasticity. They show that Poisson pseudo-maximum-likelihood (PPML) provides consistent estimates and naturally handles zero trade flows, transforming the trade literature.

Wooldridge, J. M. (1999). Distribution-Free Estimation of Some Nonlinear Panel Data Models.

Journal of Econometrics. DOI: 10.1016/S0304-4076(98)00033-5

Wooldridge shows that Poisson quasi-maximum-likelihood estimation in panel data models is consistent for the conditional mean even if the data are not Poisson-distributed, as long as the mean is correctly specified. This result justifies the widespread use of Poisson regression for non-count continuous outcomes and provides the foundation for distribution-free estimation of nonlinear panel data models.

Application (4)

Ahuja, G. (2000). Collaboration Networks, Structural Holes, and Innovation: A Longitudinal Study.

Administrative Science Quarterly. DOI: 10.2307/2667105

Ahuja uses a random effects Poisson model (following Hausman, Hall, and Griliches 1984) to model patent counts as a function of collaboration network structure in this landmark network study. He finds that direct ties and indirect ties both increase innovation, while structural holes (gaps between partners) decrease it — challenging Burt's structural holes theory in the context of innovation. The paper demonstrates the use of count models with panel data in management research, with fixed effects Poisson estimated as a robustness check.

Fleming, L., & Sorenson, O. (2001). Technology as a Complex Adaptive System: Evidence from Patent Data.

Fleming and Sorenson use negative binomial regression on patent citation counts to study how the complexity of technological combinations affects the usefulness of inventions. This paper is a prominent application of count models in the innovation and technology management literature.

Greve, H. R. (2003). A Behavioral Theory of R&D Expenditures and Innovations: Evidence from Shipbuilding.

Academy of Management Journal. DOI: 10.5465/30040661

Greve tests behavioral theory predictions about how performance relative to aspiration levels affects R&D investment and innovation output using count models in the Japanese shipbuilding industry. He finds that low performance triggers problemistic search (increasing R&D), high slack triggers slack search (also increasing R&D), and low performance increases risk tolerance for launching innovations. The paper demonstrates how to model count-based innovation outcomes with firm-level panel data in a management context.

Katila, R., & Ahuja, G. (2002). Something Old, Something New: A Longitudinal Study of Search Behavior and New Product Introduction.

Academy of Management Journal. DOI: 10.2307/3069433

Katila and Ahuja use negative binomial models to study how the depth and scope of a firm's knowledge search affect new product introductions. This paper is a widely cited application of count data models in the strategic management and innovation literature.

Survey (3)

Cameron, A. C., & Trivedi, P. K. (2013). Regression Analysis of Count Data.

Cambridge University Press. DOI: 10.1017/CBO9781139013567

Cameron and Trivedi provide the standard reference on count data regression, covering Poisson, negative binomial, zero-inflated, hurdle, and panel count models. They provide both the theoretical foundations and practical implementation guidance that applied researchers need.

Griliches, Z. (1990). Patent Statistics as Economic Indicators: A Survey.

Journal of Economic Literature

Griliches surveys the use of patent data as economic indicators, establishing patent counts as a key measure of innovative output. This survey motivates much of the subsequent applied work using Poisson and negative binomial models to study innovation.

Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data.

MIT Press

Wooldridge's graduate textbook is the standard reference for cross-section and panel data econometrics. Chapters 10-11 provide a thorough treatment of fixed effects, random effects, and related panel data methods, while later chapters cover general estimation methodology (MLE, GMM, M-estimation) with panel data applications throughout. The book covers both linear and nonlinear models with careful attention to assumptions.

Tags

count-models · count-outcome