Poisson / Negative Binomial
Models for count outcomes — patents filed, citations received, number of acquisitions.
One-Line Implementation
R: fepois(y ~ x1 + x2, data = df, vcov = 'hetero')
Stata: poisson y x1 x2, vce(robust)
Python: smf.poisson('y ~ x1 + x2', data=df).fit(cov_type='HC1')
Download Full Analysis Code
Complete scripts with diagnostics, robustness checks, and result export.
Motivating Example: Patent Citations
You are studying what predicts how many citations a patent receives. Citations are a widely used proxy for patent value — more citations typically suggest a more influential invention. Your dataset has 50,000 patents, and the citation counts look like this distribution: a huge spike at zero (many patents are never cited), a long right tail (a few blockbuster patents get hundreds of citations), and everything in between.
The mean number of citations is 8.2, but the variance is 142.6. That gap is a variance-to-mean ratio of about 17:1.
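To see what a variance-to-mean ratio like 17:1 looks like, here is a small sketch with simulated data (a Poisson-gamma mixture; the numbers are illustrative, not the dataset described above):

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate overdispersed "citation counts" as a Poisson-gamma mixture
# (equivalent to a negative binomial): each patent's rate is 8 times a
# gamma-distributed heterogeneity term with mean 1.
n = 50_000
heterogeneity = rng.gamma(shape=0.5, scale=2.0, size=n)  # mean 1, variance 2
counts = rng.poisson(8.0 * heterogeneity)

mean, var = counts.mean(), counts.var()
# A ratio far above 1 signals overdispersion (Poisson implies ratio = 1).
print(f"mean = {mean:.1f}, variance = {var:.1f}, ratio = {var / mean:.1f}")
```

The gamma mixing inflates the variance to mu + 2*mu^2 while leaving the mean at 8, reproducing the spike at zero and the long right tail described above.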
If you run OLS on this outcome, three problems arise:
- You predict negative citations for some patents (impossible).
- Your residuals are wildly heteroskedastic (variance increases with the mean).
- The normality assumption for inference is badly violated.
Count models are designed for exactly this situation.
A. Overview
The Poisson Model
The Poisson model specifies the conditional mean of a count outcome as:

E[Y|X] = exp(X'beta)

The exponential function ensures that predicted counts are always positive. Estimation is by maximum likelihood, assuming Y|X ~ Poisson(exp(X'beta)).

The key property of the Poisson distribution is that the mean equals the variance:

Var(Y|X) = E[Y|X]

This restriction is called equidispersion. In practice, real count data very frequently violate this assumption — the variance exceeds the mean, a condition called overdispersion.
The Negative Binomial Model
The negative binomial relaxes equidispersion by adding a dispersion parameter alpha:

Var(Y|X) = mu + alpha * mu^2, where mu = E[Y|X]

When alpha = 0, this collapses to Poisson. When alpha > 0, the variance exceeds the mean, and the negative binomial accommodates the extra variability. This specification is the Negative Binomial type 2 (NB2) parameterization (Cameron & Trivedi, 1986). An alternative is Negative Binomial type 1 (NB1), where Var(Y|X) = (1 + alpha) * mu — overdispersion proportional to the mean rather than quadratic (Wooldridge, 2010).
An Important Subtlety: Poisson with Robust SEs
An important result is that even if the Poisson variance assumption is wrong (and it usually is), the Poisson model with robust standard errors still gives consistent estimates of the coefficients. This consistency holds because the Poisson maximum likelihood estimator (MLE) only requires the mean to be correctly specified — you do not need the variance to be correct (Gourieroux et al., 1984).
This result is the basis for Poisson pseudo-maximum likelihood (PPML). You use the Poisson likelihood as a working model but make no assumption about the variance. With robust SEs, inference is valid as long as the conditional mean is correctly specified.
B. Identification
Like OLS and logit/probit, count models require exogeneity for causal interpretation:

Y = exp(X'beta) * eta, where E[eta|X] = 1.

If your key regressor is endogenous, you need an identification strategy. Common approaches include:
- Poisson with fixed effects — removes time-invariant confounders, just like linear FE.
- Control function approach — the nonlinear equivalent of IV for exponential models.
- PPML with instrumental variables — available in specialized software.
Hausman et al. (1984) first applied Poisson and negative binomial models to patent data, establishing the methodology that many innovation researchers still use. In management, count models are widely used to study innovation and organizational behavior — for example, Ahuja (2000) used negative binomial regression to examine how network structure and structural holes affect firm innovation output, and Greve (2003) modeled R&D expenditures and innovation counts to test behavioral theories of the firm.
C. Visual Intuition
Picture the distribution of patent citations. It looks nothing like a bell curve. It is a histogram bunched up at zero, rising to a peak around 2-5 citations, and then trailing off in a long right tail. A few patents have 50, 100, or even 500 citations.
The Poisson model says: the average number of citations depends on covariates through an exponential function, and the spread around that average follows a Poisson distribution. But if the actual spread is much wider than the Poisson predicts (which it very frequently is), you have overdispersion.
Visually, overdispersion means the histogram is wider and has thicker tails than the Poisson would predict. The negative binomial adds a "mixing" distribution that spreads things out more, better matching the observed data.
D. Mathematical Derivation
Don't worry about the notation yet — here's what this means in words: The Poisson model maximizes the likelihood of observing the counts you see, given an exponential mean function. The overdispersion test checks whether the Poisson variance assumption holds.
Poisson log-likelihood:

For y_i in {0, 1, 2, ...} with mu_i = exp(x_i'beta):

ln L(beta) = sum_i [ y_i * (x_i'beta) - exp(x_i'beta) - ln(y_i!) ]

The first-order conditions are:

sum_i x_i * (y_i - exp(x_i'beta)) = 0

Note the similarity to the OLS normal equations — replace the linear predictor with the exponential mean.
Overdispersion test (Cameron & Trivedi, 1990):
Compute z_i = ((y_i - mu_hat_i)^2 - y_i) / mu_hat_i and regress it on mu_hat_i (for the NB2 variance form) or on a constant (for NB1). Under the Poisson assumption, the coefficient should be zero. A positive and significant coefficient indicates overdispersion.
Negative binomial:
The NegBin adds a gamma-distributed heterogeneity term, leading to:

Var(Y|X) = mu + alpha * mu^2

Testing H0: alpha = 0 is a test of Poisson against NegBin.
Coefficient interpretation:
For a continuous covariate x_j:

d ln E[Y|X] / d x_j = beta_j

So beta_j is a semi-elasticity: a one-unit increase in x_j changes the expected count by approximately 100 * beta_j percent.
E. Implementation
# Requires: fixest, MASS, pscl, AER
library(fixest)
library(MASS)
library(pscl)
# --- Step 1: Poisson with Robust SEs (PPML) ---
# fepois() from fixest fits Poisson pseudo-maximum likelihood.
# Coefficients are semi-elasticities: beta = approx. % change in E[Y] per unit X.
# vcov = ~firm_id clusters standard errors at the firm level.
# The "|" syntax absorbs tech_class and year fixed effects.
pois_fit <- fepois(citations ~ rd_spending + firm_age | tech_class + year,
data = df, vcov = ~firm_id)
# Output: coefficients in log units. Exponentiate for IRRs: exp(beta).
summary(pois_fit)
# --- Step 2: Negative Binomial ---
# glm.nb() from MASS adds a dispersion parameter (alpha) to relax
# the Poisson equidispersion assumption (mean = variance).
# Use when you want model-based SEs that account for overdispersion.
nb_fit <- glm.nb(citations ~ rd_spending + firm_age + factor(tech_class), data = df)
# Check theta (= 1/alpha): small theta = high overdispersion.
summary(nb_fit)
# --- Step 3: Zero-Inflated Poisson ---
# zeroinfl() fits a two-part model: (1) logit for structural zeros
# (e.g., patents that can never be cited), (2) Poisson for counts.
# The "|" separates the count model from the zero-inflation model.
zip_fit <- zeroinfl(citations ~ rd_spending + firm_age | small_firm, data = df)
# Output includes both count and zero-inflation model coefficients.
summary(zip_fit)
# --- Step 4: Overdispersion Test ---
# dispersiontest() from AER tests H0: Var(Y) = E[Y] (equidispersion).
# Rejection (p < 0.05) indicates overdispersion — use robust SEs or NegBin.
library(AER)
dispersiontest(glm(citations ~ rd_spending + firm_age, family = poisson, data = df))
F. Diagnostics
Testing for Overdispersion
- Deviance test: Compare the deviance (or Pearson chi-squared) to the degrees of freedom. A ratio much greater than 1 suggests overdispersion.
- Cameron-Trivedi test: A regression-based test (see derivation above).
- Compare Poisson and NegBin: If the NegBin dispersion parameter is significantly different from zero, overdispersion is present.
Zero-Inflation
If your data have more zeros than the Poisson (or NegBin) predicts, you may need a zero-inflated model. Two options:
- Zero-inflated Poisson/NegBin (ZIP/ZINB): A two-part model where one equation determines the probability of being a "structural zero" (e.g., a patent that could never receive citations), and a second equation models the count for the rest.
- Hurdle model: The first part is a binary model for zero vs. positive, and the second part is a truncated count model for positive counts.
The Vuong test can help distinguish between standard Poisson and zero-inflated Poisson, though it has known size issues.
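Before reaching for ZIP/ZINB, a quick excess-zeros check compares the observed share of zeros with what a Poisson with the same mean implies (illustrative simulation with 30% structural zeros):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000
# 30% structural zeros; the rest draw from Poisson with mean 4
structural_zero = rng.random(n) < 0.30
y = np.where(structural_zero, 0, rng.poisson(4.0, size=n))

observed_zeros = (y == 0).mean()
# Share of zeros a single Poisson with the same overall mean implies:
# P(Y = 0) = exp(-mean). (With covariates, compare instead to the
# average of exp(-mu_hat_i) across observations.)
poisson_zeros = np.exp(-y.mean())
print(f"observed zeros: {observed_zeros:.3f}, Poisson-implied: {poisson_zeros:.3f}")
```

A large gap between the observed and Poisson-implied zero shares, as here, is the visual signature of zero-inflation.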
PPML for Gravity Models
In international trade, the gravity equation models bilateral trade flows as a function of GDP and distance. PPML has become the standard estimator (Silva & Tenreyro, 2006) because it:
- Handles zeros naturally (many country pairs have zero trade).
- Is robust to heteroskedasticity in levels.
- Provides consistent estimates even with non-integer outcomes.
Incidence Rate Ratios (IRRs)
The exponentiated coefficient exp(beta_j) is the incidence rate ratio. If exp(beta_j) = 1.15, a one-unit increase in x_j multiplies the expected count by 1.15 — a 15% increase.
Semi-Elasticities
The raw coefficient beta_j is a semi-elasticity: a one-unit increase in x_j is associated with an approximate 100 * beta_j percent change in the expected count. This approximation is good for small beta_j (say, below 0.3 in absolute value).
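The quality of the approximation is easy to check: the exact percent change is 100 * (exp(beta) - 1). A short sketch with illustrative coefficient values:

```python
import numpy as np

# Approximate (100*beta) vs. exact (100*(exp(beta)-1)) percent change
# in E[Y] per one-unit increase in x_j, for illustrative coefficients.
for beta in (0.05, 0.12, 0.30, 0.80):
    exact = 100 * (np.exp(beta) - 1)
    print(f"beta = {beta:.2f}: approx {100 * beta:5.1f}%, exact {exact:5.1f}%")
```

The approximation is close below 0.3 and breaks down for large coefficients (at beta = 0.80 the exact change exceeds 120%), which is why large coefficients should be reported as IRRs.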
Comparing Poisson and NegBin Coefficients
Unlike logit (where rescaling makes comparison difficult), Poisson and NegBin coefficients are directly comparable because both target the same conditional mean function. If the coefficients differ substantially, it suggests the conditional mean may be misspecified.
G. What Can Go Wrong
| Problem | What It Does | How to Fix It |
|---|---|---|
| Using OLS on counts | Negative predictions, wrong SEs, inefficient | Use Poisson or NegBin |
| Using log(Y+1) | Jensen's inequality bias, arbitrary constant | Use PPML or count models |
| Ignoring overdispersion with default Poisson SEs | Standard errors are too small, false significance | Use robust/clustered SEs or switch to NegBin |
| Confusing zero-inflation with overdispersion | Both produce excess zeros but for different reasons | Zero-inflated models for structural zeros; NegBin for general overdispersion |
| Incidental parameters problem with NegBin FE | NegBin with unit FE can be inconsistent | Use Poisson FE (which is consistent) or the Hausman-Hall-Griliches approach |
Using OLS on Count Data
Fix: Poisson regression with robust standard errors, which guarantees strictly positive predicted counts.
Example result: Estimated effect of R&D on citations: coefficient = 0.12 (SE = 0.03). A $1M increase in R&D is associated with an approximately 12% increase in expected citations. All predicted counts are positive.
Using log(Y+1) Instead of Count Models
Fix: PPML (Poisson PML), which directly models the conditional mean E[Y|X] = exp(X'beta) and handles zeros naturally.
Example result: PPML coefficient on FTA membership: 0.38 (SE = 0.09). Free trade agreements increase bilateral trade by approximately 46% (exp(0.38) - 1 = 0.46). Zeros in trade flows are included in estimation.
Ignoring Overdispersion with Default Poisson SEs
Fix: Poisson regression with robust (sandwich) standard errors that account for overdispersion.
Example result: Coefficient on R&D: 0.12. Robust SE = 0.031. 95% CI: [0.059, 0.181]. The confidence interval has correct coverage despite the variance being 17x the mean.
H. Practice

You estimate a Poisson regression of patent citations on R&D spending (in millions). The coefficient on R&D is 0.12 (SE = 0.03). How do you interpret this coefficient?
You run a Poisson regression and find that the variance of your outcome (patent citations) is 15 times larger than the mean. A colleague says: 'Your Poisson model is invalid — you must switch to negative binomial.' Is the colleague correct?
A researcher studies hospital readmissions (a count outcome). She uses an exposure variable (length of initial stay in days) because patients with longer stays have more time at risk of readmission. How should she incorporate this exposure variable in the Poisson model?
A colleague uses ln(patents + 1) as the dependent variable and runs OLS. She argues the approach is equivalent to Poisson regression because 'both model the log of the outcome.' Is she correct?
An innovation researcher estimates a negative binomial regression with firm fixed effects to study how R&D tax credits affect patent counts. She has a panel of 3,000 firms over 8 years. A reviewer says she should use Poisson FE instead. Why?
You estimate two models for patent citations. The Poisson model gives a deviance of 8,500 with 4,000 degrees of freedom. The NegBin model gives an estimated dispersion parameter alpha = 2.3 (SE = 0.4).
Is the deviance-to-df ratio consistent with overdispersion? Is the NegBin dispersion parameter significant?
Read the analysis below carefully and identify the errors.
An innovation researcher studies the effect of R&D tax credits on patent output. Using a panel of 5,000 firms over 10 years, they estimate a negative binomial regression with firm fixed effects. They report:
"The coefficient on R&D tax credit (binary) is 0.18 (p = 0.01), meaning that firms receiving tax credits produce 0.18 more patents per year. We use negative binomial with firm fixed effects to control for unobserved firm heterogeneity. The dispersion parameter alpha = 1.8 confirms that negative binomial is preferred over Poisson."
Select all errors you can find:
Read the analysis below carefully and identify the errors.
A trade economist estimates a gravity model of bilateral trade flows between 180 countries. Many country pairs have zero trade. The researcher takes ln(trade + 1) and runs OLS with exporter and importer fixed effects. They report:
"We estimate: ln(trade_ij + 1) = alpha_i + gamma_j + 0.85*ln(GDP_i*GDP_j) - 1.2*ln(distance_ij) + 0.45*FTA_ij + epsilon_ij. The coefficient on FTA membership indicates that free trade agreements increase bilateral trade by 45%. We add 1 to trade before taking the log to handle zeros."
Select all errors you can find:
Read the paper summary below and write a brief referee critique (2-3 sentences) of the identification strategy.
Paper Summary
The authors study whether venture capital (VC) investment increases startup patent output. Using a sample of 8,000 startups observed annually over 2005-2018, they estimate a Poisson regression of annual patent counts on a VC funding dummy, controlling for firm age, industry, founding year, and total employees. They include year fixed effects. Standard errors are clustered at the firm level. They find that VC-backed startups produce 65% more patents (IRR = 1.65, p < 0.001) and conclude that VC funding causally increases innovation.
Key Table
| Variable | IRR | Clustered SE | p-value |
|---|---|---|---|
| VC funded (0/1) | 1.65 | 0.12 | 0.000 |
| Firm age | 1.08 | 0.02 | 0.000 |
| Log(employees) | 1.32 | 0.05 | 0.000 |
| Year FE | Yes | ||
| Industry FE | Yes | ||
| Firm FE | No | ||
| N (firm-years) | 42,000 | ||
| Alpha (dispersion) | 2.1 |
Authors' Identification Claim
By controlling for firm age, industry, size, and year effects, we isolate the independent effect of VC funding on patent production. Clustering at the firm level accounts for serial correlation.
I. Swap-In: When to Use Something Else
- OLS on log(Y): If all your counts are large (say, above 20) and you have no zeros, taking the log and running OLS is approximately valid. But with zeros or small counts, do not do this transformation.
- Tobit: If your count is in fact a censored continuous variable (e.g., hours worked), Tobit may be more appropriate.
- Hurdle models: If the process that generates zeros is different from the process that generates positive counts (e.g., the decision to patent at all vs. how many patents to file), a hurdle model captures this two-stage structure.
- Quasi-Poisson: In some fields, quasi-Poisson (which adjusts the variance without specifying a full distribution) is used as a middle ground.
J. Reviewer Checklist
Critical Reading Checklist
Paper Library
Foundational (8)
Cameron, A. C., & Trivedi, P. K. (1986). Econometric Models Based on Count Data: Comparisons and Applications of Some Estimators and Tests.
Cameron and Trivedi compare Poisson, negative binomial, and other count data models, providing tests for overdispersion and guidance on model selection. This paper helps establish the practical toolkit for applied researchers working with count outcomes.
Cameron, A. C., & Trivedi, P. K. (1990). Regression-based Tests for Overdispersion in the Poisson Model.
Cameron and Trivedi develop regression-based tests for overdispersion in count data models, enabling formal testing of whether the Poisson equidispersion assumption holds. Their tests compare the observed variance to the Poisson-implied mean, providing the foundation for model selection between Poisson and negative binomial specifications. Researchers working with count outcomes should use these tests before defaulting to either model.
Correia, S., Guimaraes, P., & Zylkin, T. (2020). Fast Poisson Estimation with High-Dimensional Fixed Effects.
Correia, Guimaraes, and Zylkin introduce the ppmlhdfe Stata command for fast Poisson estimation with multiple levels of fixed effects, making PPML feasible for large datasets with high-dimensional fixed effects. This tool has become standard for applied researchers working with count data in panel settings.
Gourieroux, C., Monfort, A., & Trognon, A. (1984). Pseudo Maximum Likelihood Methods: Theory.
Gourieroux, Monfort, and Trognon develop the general theory of pseudo maximum likelihood estimation for cases in which the likelihood family may be misspecified. They derive conditions for consistency and asymptotic normality and characterize efficiency bounds in this broader framework. The Poisson PML result — consistency for the conditional mean under misspecification — is a special case that underpins the later widespread use of Poisson regression with robust standard errors.
Hausman, J., Hall, B. H., & Griliches, Z. (1984). Econometric Models for Count Data with an Application to the Patents–R&D Relationship.
Hausman, Hall, and Griliches develop the econometric framework for Poisson and negative binomial regression models applied to count data, using the relationship between R&D spending and patent counts as the motivating application. The paper is a classic early econometric treatment of count-data models in panel settings.
Lambert, D. (1992). Zero-Inflated Poisson Regression, with an Application to Defects in Manufacturing.
Lambert introduces the zero-inflated Poisson (ZIP) model, which accounts for excess zeros in count data by mixing a point mass at zero with a Poisson distribution. The ZIP model has become a standard tool for count outcomes where a subpopulation generates only zeros.
Silva, J. M. C. S., & Tenreyro, S. (2006). The Log of Gravity.
Silva and Tenreyro demonstrate that OLS estimation of log-linearized gravity models produces inconsistent estimates in the presence of heteroskedasticity. They show that Poisson pseudo-maximum-likelihood (PPML) provides consistent estimates and naturally handles zero trade flows, transforming the trade literature.
Wooldridge, J. M. (1999). Distribution-Free Estimation of Some Nonlinear Panel Data Models.
Wooldridge shows that Poisson quasi-maximum-likelihood estimation in panel data models is consistent for the conditional mean even if the data are not Poisson-distributed, as long as the mean is correctly specified. This result justifies the widespread use of Poisson regression for non-count continuous outcomes and provides the foundation for distribution-free estimation of nonlinear panel data models.
Application (4)
Ahuja, G. (2000). Collaboration Networks, Structural Holes, and Innovation: A Longitudinal Study.
Ahuja uses a random effects Poisson model (following Hausman, Hall, and Griliches 1984) to model patent counts as a function of collaboration network structure in this landmark network study. He finds that direct ties and indirect ties both increase innovation, while structural holes (gaps between partners) decrease it — challenging Burt's structural holes theory in the context of innovation. The paper demonstrates the use of count models with panel data in management research, with fixed effects Poisson estimated as a robustness check.
Fleming, L., & Sorenson, O. (2001). Technology as a Complex Adaptive System: Evidence from Patent Data.
Fleming and Sorenson use negative binomial regression on patent citation counts to study how the complexity of technological combinations affects the usefulness of inventions. This paper is a prominent application of count models in the innovation and technology management literature.
Greve, H. R. (2003). A Behavioral Theory of R&D Expenditures and Innovations: Evidence from Shipbuilding.
Greve tests behavioral theory predictions about how performance relative to aspiration levels affects R&D investment and innovation output using count models in the Japanese shipbuilding industry. He finds that low performance triggers problemistic search (increasing R&D), high slack triggers slack search (also increasing R&D), and low performance increases risk tolerance for launching innovations. The paper demonstrates how to model count-based innovation outcomes with firm-level panel data in a management context.
Katila, R., & Ahuja, G. (2002). Something Old, Something New: A Longitudinal Study of Search Behavior and New Product Introduction.
Katila and Ahuja use negative binomial models to study how the depth and scope of a firm's knowledge search affect new product introductions. This paper is a widely cited application of count data models in the strategic management and innovation literature.
Survey (3)
Cameron, A. C., & Trivedi, P. K. (2013). Regression Analysis of Count Data.
Cameron and Trivedi provide the standard reference on count data regression, covering Poisson, negative binomial, zero-inflated, hurdle, and panel count models. They provide both the theoretical foundations and practical implementation guidance that applied researchers need.
Griliches, Z. (1990). Patent Statistics as Economic Indicators: A Survey.
Griliches surveys the use of patent data as economic indicators, establishing patent counts as a key measure of innovative output. This survey motivates much of the subsequent applied work using Poisson and negative binomial models to study innovation.
Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data.
Wooldridge's graduate textbook is the standard reference for cross-section and panel data econometrics. Chapters 10-11 provide a thorough treatment of fixed effects, random effects, and related panel data methods, while later chapters cover general estimation methodology (MLE, GMM, M-estimation) with panel data applications throughout. The book covers both linear and nonlinear models with careful attention to assumptions.