Logit / Probit
Models for binary outcomes — when your dependent variable is yes/no, pass/fail, or adopt/don't adopt.
One-Line Implementation
R:      glm(y ~ x1 + x2, family = binomial(link = 'logit'), data = df) |> lmtest::coeftest(vcov. = sandwich::vcovHC)
Stata:  logit y x1 x2, vce(robust)
Python: smf.logit('y ~ x1 + x2', data=df).fit(cov_type='HC1')
Motivating Example: Firm Adoption of a New Technology
Imagine you are studying why some firms adopt a new manufacturing technology and others do not. Your outcome variable is binary: $Y_i = 1$ if firm $i$ adopts, $Y_i = 0$ otherwise. You want to know how firm size, R&D spending, and industry competition affect the probability of adoption.
You could try running OLS — regressing the 0/1 outcome on your covariates. This approach is called the linear probability model (LPM), and it is a reasonable starting point. But it has problems. The predicted probabilities can fall outside $[0, 1]$, the error term is necessarily heteroskedastic, and the marginal effect of a covariate is assumed to be constant regardless of where you are on the probability scale.
Logit and probit models address these problems by modeling the probability through a nonlinear link function that keeps predictions bounded between 0 and 1.
A. Overview
The Problem with OLS on Binary Outcomes
When you run OLS with a binary $Y_i$, you are modeling:

$$P(Y_i = 1 \mid X_i) = X_i'\beta$$
This equation is the LPM. It works surprisingly well in many cases, especially near the center of the data. But at the extremes, it can predict probabilities below 0 or above 1, which is nonsensical.
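To see the boundary problem concretely, here is a quick Python sketch (the data and covariate values are made up for illustration): fitting OLS to a 0/1 outcome and predicting at extreme covariate values yields "probabilities" below 0 and above 1.

```python
import numpy as np

# Hypothetical 0/1 outcome and a single covariate (illustrative data only).
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([0, 0, 0, 1, 0, 1, 1, 1], dtype=float)

# OLS fit of y on x: the linear probability model.
slope, intercept = np.polyfit(x, y, 1)

# LPM predictions at covariate values outside the observed range.
p_low = intercept + slope * 0.0    # falls below 0
p_high = intercept + slope * 12.0  # exceeds 1
print(round(p_low, 2), round(p_high, 2))
```

Near the center of the data the fitted line behaves sensibly; the failure is at the extremes, exactly as described above.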
Link Functions: The Core Idea
Both logit and probit model the probability through a nonlinear transformation:

$$P(Y_i = 1 \mid X_i) = G(X_i'\beta)$$

where $G(\cdot)$ is a function that maps any real number to the $(0, 1)$ interval.

- Logit uses the logistic function: $G(z) = \Lambda(z) = \dfrac{e^z}{1 + e^z}$
- Probit uses the standard normal CDF: $G(z) = \Phi(z) = \displaystyle\int_{-\infty}^{z} \phi(t)\,dt$
Both are S-shaped curves. They are nearly identical in practice — probit is slightly steeper at the center and slightly thinner at the tails. In most applications, they give very similar results.
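The near-equivalence is easy to verify numerically. Here is a small Python sketch using only the standard library; the factor of roughly 1.6 is the usual back-of-the-envelope rescaling between the logit and probit scales:

```python
import math

def logistic_cdf(z):
    # Logit link: Lambda(z) = e^z / (1 + e^z)
    return 1.0 / (1.0 + math.exp(-z))

def normal_cdf(z):
    # Probit link: standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# After absorbing the scale difference (roughly 1.6), the curves nearly coincide.
for z in [-2.0, -1.0, 0.0, 1.0, 2.0]:
    print(z, round(logistic_cdf(z), 3), round(normal_cdf(z / 1.6), 3))
```

The two columns agree to within about one percentage point across the whole range, which is why the choice of link rarely changes substantive conclusions.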
When Does It Matter Which You Choose?
In most applications, it does not. The choice between logit and probit rarely changes substantive conclusions. Logit is more common in epidemiology and management because of the convenient interpretation. Probit is more common in economics, partly by convention and partly because it connects naturally to latent variable models.
B. Identification
The identification strategy for logit/probit is the same as for OLS: you need exogeneity of the regressors. The logit/probit framework does not solve endogeneity problems — it just handles the functional form for binary outcomes.
If your regressors are endogenous, you need an identification strategy (IV, DiD, matching, etc.) combined with the appropriate binary outcome model. For IV with binary outcomes, see the bivariate probit or IV-probit approach. It is also advisable to consider sensitivity analysis to assess how robust your estimates are to potential unobserved confounders.
The Latent Variable Interpretation
Both models can be motivated by a latent variable $Y_i^*$:

$$Y_i^* = X_i'\beta + \varepsilon_i, \qquad Y_i = \mathbf{1}\{Y_i^* > 0\}$$

If $\varepsilon_i$ follows a logistic distribution, you get logit. If $\varepsilon_i$ follows a standard normal distribution, you get probit. The firm adopts the technology when the latent net benefit $Y_i^*$ exceeds zero.
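A simulation makes the latent-variable story concrete. The Python sketch below (standard library only; the coefficient and covariate value are made up) draws logistic errors, applies the threshold rule, and checks that the simulated adoption frequency matches the logit probability:

```python
import math
import random

random.seed(0)
beta = 1.0    # hypothetical coefficient
x_val = 0.5   # covariate value at which we evaluate the adoption probability

n = 100_000
adopt = 0
for _ in range(n):
    u = random.random()
    eps = math.log(u / (1.0 - u))   # logistic(0, 1) draw via inverse CDF
    latent = x_val * beta + eps     # latent net benefit Y*
    if latent > 0:                  # adopt when Y* exceeds zero
        adopt += 1

empirical = adopt / n
implied = 1.0 / (1.0 + math.exp(-x_val * beta))  # Lambda(x * beta)
print(round(empirical, 3), round(implied, 3))    # the two should nearly agree
```

Swapping the logistic draw for a standard normal draw would reproduce the probit probability instead.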
C. Visual Intuition
Think of the probability curve as a hill. At the bottom (low probability of adoption), even a large change in firm size barely moves the probability — you are pushing against inertia. At the top (high probability), the same is true — most firms have already adopted. The steepest part of the hill is in the middle, around 50% probability. This middle region is where a change in X has the biggest effect on the probability.
This nonlinearity is why marginal effects depend on where you evaluate them. A one-unit increase in firm size might raise adoption probability by 8 percentage points for a mid-sized firm (on the steep part of the curve) but only 2 percentage points for a very large firm (on the flat part).
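This curvature is easy to verify directly. For logit, the marginal effect is the logistic density times the coefficient; the Python sketch below (with a made-up coefficient of 0.4) evaluates it on the steep and flat parts of the curve:

```python
import math

def logit_marginal_effect(xb, beta):
    # Derivative of the logistic CDF at xb, times the coefficient:
    # dP/dx = Lambda(xb) * (1 - Lambda(xb)) * beta
    p = 1.0 / (1.0 + math.exp(-xb))
    return p * (1.0 - p) * beta

beta = 0.4  # hypothetical logit coefficient on firm size

print(round(logit_marginal_effect(0.0, beta), 3))  # at p = 0.5: the steep middle
print(round(logit_marginal_effect(3.0, beta), 3))  # at p = 0.95: the flat top
```

The same coefficient implies an effect several times larger in the middle of the curve than near the top, which is exactly why marginal effects must be evaluated at specific points or averaged.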
D. Mathematical Derivation
Don't worry about the notation yet — here's what this means in words: We find the coefficients that make the observed data most likely, by maximizing the probability of seeing the 1s and 0s we actually observe.
For a binary outcome with probability $p_i = G(x_i'\beta)$, the likelihood for observation $i$ is:

$$L_i(\beta) = p_i^{\,y_i} (1 - p_i)^{1 - y_i}$$

The log-likelihood for the full sample is:

$$\ell(\beta) = \sum_{i=1}^{n} \Big[ y_i \ln G(x_i'\beta) + (1 - y_i) \ln\big(1 - G(x_i'\beta)\big) \Big]$$

Taking the derivative and, for logit, using the fact that $\Lambda'(z) = \Lambda(z)\big(1 - \Lambda(z)\big)$, the first-order condition is:

$$\sum_{i=1}^{n} \big( y_i - \Lambda(x_i'\beta) \big)\, x_i = 0$$

These first-order conditions have no closed-form solution and must be solved numerically via iteratively reweighted least squares (IRLS) or Newton-Raphson.

Marginal effects: The partial effect of $x_j$ on the probability is:

$$\frac{\partial P(Y_i = 1 \mid x_i)}{\partial x_{ij}} = g(x_i'\beta)\,\beta_j$$

where $g(\cdot)$ is the derivative (density) of $G$. This expression depends on $x_i$, which is why you must evaluate it at specific values or average it across the sample.
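To demystify the numerical step, here is a minimal Newton-Raphson implementation of the logit MLE in Python with NumPy, run on simulated data with known coefficients (all parameter values are illustrative). Each iteration solves the linear system built from the score and weighted Hessian derived above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data with known coefficients (illustrative values).
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([-0.5, 1.0])
p_true = 1.0 / (1.0 + np.exp(-X @ beta_true))
y = rng.binomial(1, p_true)

# Newton-Raphson: beta <- beta + (X'WX)^{-1} X'(y - p)
beta = np.zeros(2)
for _ in range(25):
    p_hat = 1.0 / (1.0 + np.exp(-X @ beta))
    score = X.T @ (y - p_hat)              # first-order condition
    W = p_hat * (1.0 - p_hat)              # Lambda'(x'beta) weights
    hessian = X.T @ (X * W[:, None])       # negative Hessian
    step = np.linalg.solve(hessian, score)
    beta = beta + step
    if np.max(np.abs(step)) < 1e-10:       # convergence check
        break

print(np.round(beta, 2))  # should land near beta_true
```

This is the same iteration that glm() and statsmodels run under the hood, modulo convergence safeguards.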
E. Implementation
# Requires: marginaleffects
library(marginaleffects)
# --- Step 1: Fit the Logit Model ---
# glm() with family=binomial(link="logit") estimates via maximum likelihood.
# Coefficients are in LOG-ODDS units, not probability units.
logit_fit <- glm(adopt ~ firm_size + rd_spending + competition,
family = binomial(link = "logit"), data = df)
# summary() shows log-odds coefficients, SEs, z-values, and Akaike Information Criterion (AIC)
summary(logit_fit)
# --- Step 2: Compute Average Marginal Effects (AMEs) ---
# AMEs translate log-odds coefficients into probability-scale effects.
# Each AME represents the average change in P(Y=1) for a one-unit change in X,
# averaged across all observations (accounting for nonlinearity).
# marginaleffects::avg_slopes() is the modern replacement for the archived margins package.
ame <- avg_slopes(logit_fit)
# Output: AME in percentage-point terms — the primary quantity to report
print(ame)
# --- Step 3: Compute Odds Ratios ---
# Exponentiate coefficients to get odds ratios: exp(beta).
# An OR of 1.35 means a one-unit increase in X multiplies the odds by 1.35.
exp(coef(logit_fit))
# Confidence intervals for odds ratios (profile likelihood-based)
exp(confint(logit_fit))
# --- Step 4: Fit Probit for Robustness ---
# Probit uses the normal CDF as the link function instead of logistic.
# Results should be substantively similar to logit — showing both
# demonstrates robustness to the choice of link function.
probit_fit <- glm(adopt ~ firm_size + rd_spending + competition,
family = binomial(link = "probit"), data = df)
# Compare probit AMEs to logit AMEs — they should nearly agree
print(avg_slopes(probit_fit))

F. Diagnostics
Pseudo R-Squared
There is no true $R^2$ for logit/probit. McFadden's pseudo-$R^2$ compares the log-likelihood of your model to a null model (intercept only):

$$R^2_{\text{McFadden}} = 1 - \frac{\ln L_{\text{full}}}{\ln L_{\text{null}}}$$
Values above 0.2 are sometimes informally considered indicative of good fit, though interpretation depends on context and there is no universal threshold. Do not compare pseudo-$R^2$ values across different link functions.
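The computation itself is one line; the Python sketch below uses made-up log-likelihood values:

```python
# Hypothetical log-likelihoods from a fitted logit and an intercept-only model.
ll_full = -420.5
ll_null = -560.0

# McFadden's pseudo R-squared: 1 - ln(L_full) / ln(L_null)
pseudo_r2 = 1.0 - ll_full / ll_null
print(round(pseudo_r2, 3))  # 0.249
```

Because log-likelihoods of binary models are negative, a better-fitting model makes the ratio smaller and the pseudo-$R^2$ larger.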
Classification Table
Predict $\hat{Y}_i = 1$ if $\hat{p}_i > c$ (usually $c = 0.5$) and compute the confusion matrix. Report sensitivity (true positive rate), specificity (true negative rate), and overall accuracy. But be cautious: classification accuracy is sensitive to class imbalance.
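A small Python sketch of the bookkeeping, with made-up fitted probabilities and outcomes:

```python
# Hypothetical fitted probabilities and observed outcomes.
p_hat = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
y_obs = [1,   1,   0,   1,   1,   0,   0,   0]

cutoff = 0.5
y_pred = [1 if p > cutoff else 0 for p in p_hat]

# Confusion-matrix cells.
tp = sum(yp == 1 and yo == 1 for yp, yo in zip(y_pred, y_obs))
tn = sum(yp == 0 and yo == 0 for yp, yo in zip(y_pred, y_obs))
fp = sum(yp == 1 and yo == 0 for yp, yo in zip(y_pred, y_obs))
fn = sum(yp == 0 and yo == 1 for yp, yo in zip(y_pred, y_obs))

sensitivity = tp / (tp + fn)       # true positive rate
specificity = tn / (tn + fp)       # true negative rate
accuracy = (tp + tn) / len(y_obs)
print(sensitivity, specificity, accuracy)
```

With a heavily imbalanced outcome, always predicting the majority class can score high accuracy while sensitivity collapses, which is why all three numbers should be reported.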
Hosmer-Lemeshow Test
Groups observations into deciles of predicted probability and tests whether observed frequencies match predicted frequencies. A significant test suggests poor calibration, but the test has low power and is sensitive to the number of groups.
Three Ways to Report Logit Results
- Log-odds coefficients — the raw output. Hard to interpret; mainly useful for checking sign and significance.
- Odds ratios — $e^{\hat\beta}$. "A one-unit increase in X multiplies the odds of Y=1 by $e^{\hat\beta}$." Common in epidemiology and management.
- Marginal effects — the change in probability. Most intuitive. Preferred in economics.
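The three scales are mechanical transformations of one another. A quick Python sketch with a made-up coefficient of 0.3 and a baseline probability of 0.30:

```python
import math

beta = 0.3          # hypothetical log-odds coefficient (scale 1)
p_baseline = 0.30   # baseline probability for an example observation

# Scale 2 - odds ratio: exp(beta), the multiplicative effect on the odds.
odds_ratio = math.exp(beta)

# Scale 3 - marginal effect at p_baseline: p * (1 - p) * beta,
# the approximate change on the probability scale.
marginal_effect = p_baseline * (1.0 - p_baseline) * beta

print(round(odds_ratio, 2), round(marginal_effect, 3))
```

Note the asymmetry: the odds ratio is a single number, while the marginal effect depends on the baseline probability at which it is evaluated.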
G. What Can Go Wrong
| Problem | What It Does | How to Fix It |
|---|---|---|
| Reporting coefficients as marginal effects | Overstates/understates the effect | Compute and report AMEs |
| Perfect separation | Maximum likelihood estimation (MLE) does not converge; coefficients explode to infinity | Drop the problematic variable, use penalized likelihood (Firth logit), or combine categories |
| Rare events | Finite-sample bias in predicted probabilities and intercept estimates when Y=1 is very rare (e.g., well under 5%). Slope coefficients are less affected, but predicted event probabilities can be substantially downward-biased. | Use rare-events logit (King & Zeng, 2001) or exact logit |
| Ignoring heteroskedasticity | Standard errors are wrong | Use robust SEs |
| Comparing coefficients across models | Logit coefficients are not comparable across models with different covariates (rescaling problem) (Allison, 1999) | Compare marginal effects instead |
| Neglected heterogeneity | When an unobserved variable independent of X is omitted, probit and logit coefficients are attenuated toward zero. However, average partial effects (marginal effects) remain consistently estimated. This result is one of the strongest arguments for reporting marginal effects rather than raw coefficients (Wooldridge, 2010). | Report AMEs rather than raw coefficients |
Interpreting Logit Coefficients as Marginal Effects
Researcher computes average marginal effects after logit estimation
AME of firm size on adoption probability: 0.05 (SE = 0.017). A one-unit increase in firm size raises the probability of adoption by about 5 percentage points on average.
Perfect Separation
All covariate values have some variation in the outcome — both 0s and 1s appear at every level of X
Logit converges normally. Coefficient on industry dummy: 1.8 (SE = 0.4). MLE is well-defined and standard errors are reliable.
Comparing Logit Coefficients Across Models
Researcher compares average marginal effects across a baseline model and a model with additional controls
AME of R&D on adoption: 0.08 (baseline model) vs. 0.06 (with controls). The 2 percentage point decrease suggests modest confounding by the added covariates.
A logit regression of firm adoption on firm size produces a coefficient of 0.3 with robust SE 0.1. The average marginal effect is 0.05. How do you interpret the result?
H. Practice
A researcher runs a logit model and a probit model on the same data. The logit coefficient on firm size is 0.48 and the probit coefficient is 0.28. She concludes that the logit model estimates a much larger effect. Is she correct?
A logit model predicting loan default produces a coefficient of -0.8 on credit score (standardized). The odds ratio is exp(-0.8) = 0.45. A manager asks: 'So a one-SD increase in credit score cuts the default probability in half?' Is the manager correct?
A colleague says: 'I always use logit for binary outcomes because OLS can predict probabilities outside [0,1].' When might the linear probability model (LPM) actually be a reasonable choice?
You add an interaction term (firm_size * rd_spending) to a logit model. The coefficient on the interaction is 0.15 (p = 0.03). A reviewer says you cannot interpret the interaction effect by looking at this coefficient alone. Why?
Interpreting Logit Results: Loan Default Prediction
A bank analyst runs a logit regression to predict whether a small business loan will default. The dependent variable is Default (1 = defaulted, 0 = repaid). The key predictor is `Years_in_business` (continuous). The estimated logit coefficient is -0.4 and the average marginal effect is -0.06. The baseline default probability in the sample is 20%.
Read the analysis below carefully and identify the errors.
A researcher studies whether receiving venture capital funding affects the probability that a startup goes public (IPO). They run a logit regression of IPO (0/1) on `VC_funded` (0/1), controlling for firm age, industry, and founder experience. They report: 'The coefficient on `VC_funded` is 1.2 (p < 0.01), meaning that VC funding increases the probability of IPO by 120 percentage points.'
Select all errors you can find:
Read the paper summary below and write a brief referee critique (2-3 sentences) of the identification strategy.
Paper Summary
The authors study whether firms with female CEOs are more likely to adopt environmental sustainability practices. Using a cross-section of 3,200 publicly traded firms, they run a logit regression of sustainability adoption (0/1) on a female CEO dummy, controlling for firm size (log revenue), industry dummies, ROA, and firm age. They report that the odds ratio on female CEO is 1.85 (p = 0.002) and conclude that female leadership causes firms to be 85% more likely to adopt sustainability practices.
Key Table
| Variable | Odds Ratio | Robust SE | p-value |
|---|---|---|---|
| Female CEO | 1.85 | 0.35 | 0.002 |
| Log(Revenue) | 1.42 | 0.08 | 0.000 |
| ROA | 1.10 | 0.22 | 0.640 |
| Firm age | 1.01 | 0.003 | 0.001 |
| Industry FE | Yes | | |
| Pseudo R-squared | 0.18 | | |
| N | 3,200 | | |
Authors' Identification Claim
By controlling for firm size, profitability, firm age, and industry, we isolate the independent effect of CEO gender on sustainability adoption.
I. Swap-In: When to Use Something Else
- Linear Probability Model (LPM): If your probabilities are between 0.2 and 0.8 for most observations, the LPM with robust SEs often gives nearly identical average marginal effects. Easier to interpret and to combine with FE or IV.
- Conditional logit (fixed effects logit): For panel data with unit fixed effects. Only uses within-unit variation. See Chamberlain (1980). Unlike logit, probit does not have an analogous conditional MLE that eliminates fixed effects. FE probit suffers from the incidental parameters problem and is inconsistent with fixed T (Wooldridge, 2010).
- Multinomial logit: When the outcome has more than two unordered categories.
- Ordered logit/probit: When the outcome has ordered categories (e.g., strongly disagree to strongly agree).
- Count models: When the outcome is a non-negative integer (number of events), see Poisson / Negative Binomial instead.
J. Reviewer Checklist
Critical Reading Checklist
Paper Library
Foundational (7)
Ai, C., & Norton, E. C. (2003). Interaction Terms in Logit and Probit Models.
Ai and Norton show that the interpretation of interaction terms in nonlinear models like logit and probit is much more complicated than in linear models. The marginal effect of an interaction is not simply the coefficient on the interaction term, a mistake that is widespread in applied research.
Allison, P. D. (1999). Comparing Logit and Probit Coefficients Across Groups.
Allison shows that naive comparisons of logit or probit coefficients across groups are misleading because differences in residual variation across groups rescale the coefficients. He proposes a method to adjust for this confound, which is essential for interpreting interaction effects and group comparisons in nonlinear models.
Amemiya, T. (1981). Qualitative Response Models: A Survey.
Amemiya provides a comprehensive survey of qualitative response models including logit, probit, and tobit. This survey organizes the theoretical properties, estimation methods, and specification tests for binary and multinomial choice models and becomes a standard reference for applied researchers.
Chamberlain, G. (1980). Analysis of Covariance with Qualitative Data.
Chamberlain extends the fixed effects approach to nonlinear models like logit, showing how to condition out the fixed effects in discrete choice settings. This work is fundamental for researchers who need fixed effects in models where the dependent variable is binary or categorical.
Hausman, J., & McFadden, D. (1984). Specification Tests for the Multinomial Logit Model.
Hausman and McFadden develop a specification test for the independence of irrelevant alternatives (IIA) assumption in multinomial logit. The test allows researchers to assess whether the logit model's restrictive substitution patterns are appropriate for their data, which is critical for applied work with multiple choice categories.
King, G., & Zeng, L. (2001). Logistic Regression in Rare Events Data.
King and Zeng develop a correction for logistic regression when the outcome event is rare. Standard logit underestimates the probability of rare events; their rare-events logit (relogit) applies a correction based on prior information about the event rate in the population. Essential reference for binary outcome studies with highly imbalanced classes.
McFadden, D. (1974). Conditional Logit Analysis of Qualitative Choice Behavior.
McFadden develops the conditional logit model grounded in random utility theory, showing how discrete choices among alternatives can be modeled by assuming individuals maximize utility with an extreme-value distributed error. This work earns him the 2000 Nobel Prize and remains the foundation of discrete choice analysis.
Application (3)
Hoetker, G. (2007). The Use of Logit and Probit Models in Strategic Management Research: Critical Issues.
Hoetker reviews how strategy researchers use logit and probit models and identifies common pitfalls, including misinterpretation of coefficients across groups and incorrect use of interaction terms. This paper provides concrete guidance for improving practice in management journals.
Palepu, K. G. (1986). Predicting Takeover Targets: A Methodological and Empirical Analysis.
Palepu uses logit models to study takeover prediction and identifies methodological flaws in prior prediction studies, showing that targets are more difficult to predict than earlier work suggests. The paper highlights the importance of proper classification criteria and sampling methodology when applying binary choice models to rare-event corporate outcomes.
Zelner, B. A. (2009). Using Simulation to Interpret Results from Logit, Probit, and Other Nonlinear Models.
Zelner advocates using simulation-based approaches to interpret and present results from nonlinear models in management research. By computing predicted probabilities and marginal effects via simulation, researchers can convey substantive significance more clearly than raw coefficients.
Survey (5)
Angrist, J. D., & Pischke, J.-S. (2009). Mostly Harmless Econometrics: An Empiricist's Companion.
Angrist and Pischke write one of the most influential modern textbooks on applied econometrics, organizing the field around a design-based approach to causal inference. The book provides essential treatments of instrumental variables, difference-in-differences, and regression discontinuity, each grounded in the potential outcomes framework. It remains the standard reference for graduate students learning to evaluate and implement identification strategies.
Cameron, A. C., & Trivedi, P. K. (2005). Microeconometrics: Methods and Applications.
Cameron and Trivedi provide a comprehensive graduate-level treatment of microeconometric methods, including maximum likelihood estimation and binary, multinomial, and other limited dependent variable models. A standard reference for applied microeconometrics.
Long, J. S. (1997). Regression Models for Categorical and Limited Dependent Variables.
Long provides a comprehensive reference for applied researchers working with binary, ordinal, multinomial, and count outcome models. The textbook covers maximum likelihood estimation, marginal effects computation, and model diagnostics with clear exposition and software implementation guidance. It remains the standard practical guide for researchers who need to move beyond OLS to handle categorical and limited dependent variables.
Train, K. E. (2009). Discrete Choice Methods with Simulation.
Train's textbook provides a comprehensive and accessible treatment of logit, probit, mixed logit, and other discrete choice models. It covers both theory and practical simulation-based estimation methods and is widely used in economics, marketing, and transportation research.
Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data.
Wooldridge's graduate textbook is the standard reference for cross-section and panel data econometrics. Chapters 10-11 provide a thorough treatment of fixed effects, random effects, and related panel data methods, while later chapters cover general estimation methodology (MLE, GMM, M-estimation) with panel data applications throughout. The book covers both linear and nonlinear models with careful attention to assumptions.