Logit / Probit
Models for binary outcomes — when your dependent variable is yes/no, pass/fail, or adopt/don't adopt.
Quick Reference
- When to Use
- When your outcome variable is binary (0/1, yes/no, adopt/don't adopt) and the linear probability model is inadequate, especially when predicted probabilities near 0 or 1 matter.
- Key Assumption
- Correct specification of the link function (logistic or normal CDF). For causal interpretation, the same exogeneity condition as OLS applies.
- Common Mistake
- Reporting logit coefficients as if they were marginal effects — logit coefficients are in log-odds units, not probability units. Computing and reporting average marginal effects is standard practice.
- Estimated Time
- 2.5 hours
One-Line Implementation
- Stata: `logit y x1 x2, vce(robust)` followed by `margins, dydx(*)`
- R: `margins::margins(glm(y ~ x1 + x2, family = binomial(link = "logit"), data = df))`
- Python: `smf.logit('y ~ x1 + x2', data=df).fit().get_margeff().summary()`
Motivating Example: Firm Adoption of a New Technology
Imagine you are studying why some firms adopt a new manufacturing technology and others do not. Your outcome variable is binary: Y = 1 if the firm adopts, Y = 0 otherwise. You want to know how firm size, R&D spending, and industry competition affect the probability of adoption.
You could try running OLS — regressing the 0/1 outcome on your covariates. This approach is called the linear probability model (LPM), and it is a reasonable starting point. But it has problems. The predicted probabilities can fall outside [0, 1], the error term is necessarily heteroskedastic, and the marginal effect of a covariate is assumed to be constant regardless of where you are on the probability scale.
Logit and probit models address these problems by modeling the probability through a nonlinear link function that keeps predictions bounded between 0 and 1.
A. Overview: Binary Outcome Models
The Problem with OLS on Binary Outcomes
When you run OLS with a binary outcome Y, you are modeling:

P(Y = 1 | X) = β₀ + β₁X₁ + ⋯ + βₖXₖ
This equation is the LPM. It works surprisingly well in many cases, especially near the center of the data. But at the extremes, it can predict probabilities below 0 or above 1, which is nonsensical.
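A toy illustration of the boundary problem — the intercept and slope here are made up, not estimated from anything:

```python
# Linear probability model: P(Y=1|x) = a + b*x (illustrative numbers)
a, b = 0.5, 0.3

for x in [-3, 0, 3]:
    p_hat = a + b * x
    print(x, p_hat)  # at the extremes, "probabilities" of about -0.4 and 1.4
```

Nothing in the linear functional form stops the fitted line from leaving the unit interval once x is far enough from the center of the data.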
Link Functions: The Core Idea
Both logit and probit model the probability through a nonlinear transformation:

P(Y = 1 | X) = G(Xβ)

where G is a function that maps any real number to the [0, 1] interval.

- Logit uses the logistic function: G(z) = Λ(z) = eᶻ / (1 + eᶻ)
- Probit uses the standard normal CDF: G(z) = Φ(z)
Both are S-shaped curves. They are nearly identical in practice — probit is slightly steeper at the center and slightly thinner at the tails. In most applications, they give very similar results.
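The near-equivalence can be checked numerically. A minimal sketch, using the usual rule of thumb that rescaling the logistic index by roughly 1.6 makes the two curves almost coincide:

```python
import math

def logistic_cdf(z):
    """Logistic link: 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + math.exp(-z))

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# The logistic CDF with index scaled by ~1.6 tracks the normal CDF closely
for z in [-2.0, -1.0, 0.0, 1.0, 2.0]:
    print(z, round(normal_cdf(z), 3), round(logistic_cdf(1.6 * z), 3))
```

This scaling factor is also why logit coefficients tend to be roughly 1.6 times the probit coefficients on the same data.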
When Does It Matter Which You Choose?
In most applications, it does not. The choice between logit and probit rarely changes substantive conclusions. Logit is more common in epidemiology and management because of the convenient odds ratio interpretation. Probit is more common in economics, partly by convention and partly because it connects naturally to latent variable models.
B. Identification
The identification strategy for logit/probit is the same as for OLS: you need exogeneity of the regressors. The logit/probit framework does not solve endogeneity problems — it just handles the functional form for binary outcomes.
If your regressors are endogenous, you need an identification strategy (IV, DiD, matching, etc.) combined with the appropriate binary outcome model. For IV with binary outcomes, see the bivariate probit or IV-probit approach. It is also advisable to consider sensitivity analysis to assess how robust your estimates are to potential unobserved confounders.
The Latent Variable Interpretation
Both models can be motivated by a latent variable Y*:

Y* = Xβ + ε, with Y = 1 if Y* > 0 and Y = 0 otherwise

If ε follows a logistic distribution, you get logit. If ε follows a standard normal distribution, you get probit. The firm adopts the technology when the latent net benefit Y* exceeds zero.
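The latent-variable story can be verified by simulation — drawing the logistic error via its inverse CDF and counting how often the latent index crosses zero. The coefficient values and seed below are illustrative:

```python
import math
import random

random.seed(1)
a, b = 0.0, 1.5      # illustrative latent-index coefficients
x = 0.8

# Probability implied directly by the logistic link
p_link = 1.0 / (1.0 + math.exp(-(a + b * x)))

# Latent-variable simulation: Y = 1 when Y* = a + b*x + e > 0, e ~ logistic(0, 1)
draws, adopt = 200_000, 0
for _ in range(draws):
    u = random.random()
    e = math.log(u / (1.0 - u))  # inverse-CDF draw from the standard logistic
    adopt += (a + b * x + e > 0)

print(round(p_link, 3), round(adopt / draws, 3))  # the two probabilities agree
```

Swapping the logistic draw for a standard normal one would reproduce the probit probability instead.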
C. Visual Intuition
Think of the probability curve as a hill. At the bottom (low probability of adoption), even a large change in firm size barely moves the probability — you are pushing against inertia. At the top (high probability), the same is true — most firms have already adopted. The steepest part of the hill is in the middle, around 50% probability. This middle region is where a change in X has the biggest effect on the probability.
This nonlinearity is why marginal effects depend on where you evaluate them. A one-unit increase in firm size might raise adoption probability by 8 percentage points for a mid-sized firm (on the steep part of the curve) but only 2 percentage points for a very large firm (on the flat part).
Logit Marginal Effects
The marginal effect of X on P(Y=1) is not constant in logit: it peaks near the 50% baseline probability and shrinks toward zero at the extremes, unlike OLS where the marginal effect equals the coefficient everywhere.
Computed Results
- Baseline probability P(Y=1): 0.500
- Marginal effect (AME at baseline): 0.375
- Peak ME (at p = 0.5, equal to β/4): 0.375
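These numbers follow from the logit marginal-effect formula, ME = β·p·(1 − p): with the slope β = 1.5 used in the simulation below, the peak at p = 0.5 is β/4 = 0.375. A quick check:

```python
def logit_me(beta, p):
    """Marginal effect of x on P(Y=1) in logit: beta * p * (1 - p)."""
    return beta * p * (1 - p)

beta = 1.5                    # slope from the simulated DGP
print(logit_me(beta, 0.5))    # 0.375 — the peak, equal to beta/4
print(logit_me(beta, 0.9))    # ≈ 0.135 — much smaller on the flat part of the curve
```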
Why Logit / Probit?
Binary DGP: P(Y=1|X) = sigmoid(0.0 + 1.5 · X). N = 200. Comparing average marginal effects (AMEs) across estimators. LPM produces 34 predictions outside [0, 1].
Estimation Results
| Estimator | AME | SE | 95% CI | Bias |
|---|---|---|---|---|
| LPM | 0.138 | 0.010 | [0.12, 0.16] | -0.000 |
| Logit | 0.136 | 0.019 | [0.10, 0.17] | -0.002 |
| Probit | 0.135 | 0.016 | [0.10, 0.17] | -0.003 |
| True AME | 0.138 | — | — | — |
Simulation parameters: N is the number of observations; the slope is the coefficient in the latent index (steeper = more extreme probabilities); the intercept shifts the probability curve left/right.
Why the difference?
The Linear Probability Model predicts outside [0, 1] for 34 of 200 observations (18 below 0, 16 above 1). These nonsensical probabilities are a fundamental problem with applying OLS to binary outcomes. On the average marginal effect (AME) scale, logit recovers the true AME more accurately here because the DGP uses a logistic link. Both logit and probit correctly bound predictions to [0, 1] and model the inherent nonlinearity of binary outcomes. The table compares AMEs rather than raw coefficients, since the logit slope (log-odds), probit slope (latent-index), and LPM slope (linear probability) are not on the same scale. AMEs express each estimator's effect as the average change in P(Y=1) for a unit increase in X.
D. Mathematical Derivation
Don't worry about the notation yet — here's what this means in words: We find the coefficients that make the observed data most likely, by maximizing the probability of seeing the 1s and 0s we actually observe.
For a binary outcome with probability pᵢ = G(xᵢ′β), the likelihood contribution of observation i is:

Lᵢ(β) = pᵢ^(yᵢ) · (1 − pᵢ)^(1 − yᵢ)

The log-likelihood for the full sample is:

ℓ(β) = Σᵢ [ yᵢ log G(xᵢ′β) + (1 − yᵢ) log(1 − G(xᵢ′β)) ]

For logit, taking the derivative and using the fact that Λ′(z) = Λ(z)(1 − Λ(z)) gives the first-order condition:

∂ℓ/∂β = Σᵢ (yᵢ − Λ(xᵢ′β)) xᵢ = 0

These equations have no closed-form solution and must be solved numerically, via iteratively reweighted least squares (IRLS) or Newton-Raphson.

Marginal effects: The partial effect of xⱼ on the probability is:

∂P(Y = 1 | x) / ∂xⱼ = g(x′β) βⱼ

where g = G′ is the density of the link (for logit, g(z) = Λ(z)(1 − Λ(z))). This expression depends on x, which is why you must evaluate it at specific values or average it across the sample.
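The pieces above — the score, the Newton-Raphson update, and the sample-averaged marginal effect — can be put together in a minimal pure-Python sketch on simulated data. The seed, sample size, and β = 1.5 DGP are illustrative, echoing the simulation earlier on this page:

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Simulate a binary DGP: P(Y=1|x) = sigmoid(0.0 + 1.5*x), x ~ N(0, 1)
random.seed(0)
n = 2000
X = [random.gauss(0.0, 1.0) for _ in range(n)]
Y = [1 if random.random() < sigmoid(1.5 * x) else 0 for x in X]

# Newton-Raphson for a logit with intercept a and slope b
a, b = 0.0, 0.0
for _ in range(25):
    ga = gb = haa = hab = hbb = 0.0
    for x, y in zip(X, Y):
        p = sigmoid(a + b * x)
        ga += y - p                   # score for the intercept
        gb += (y - p) * x             # score for the slope
        w = p * (1.0 - p)             # weight in the (negative) Hessian
        haa += w; hab += w * x; hbb += w * x * x
    det = haa * hbb - hab * hab
    a += (hbb * ga - hab * gb) / det  # Newton step: theta += (-H)^{-1} score
    b += (haa * gb - hab * ga) / det

# Average marginal effect: mean over the sample of b * p_i * (1 - p_i)
ame = sum(b * sigmoid(a + b * x) * (1.0 - sigmoid(a + b * x)) for x in X) / n
print(round(b, 2), round(ame, 2))  # slope near 1.5; AME well below the raw slope
```

The gap between the fitted slope and the AME is the coefficient-versus-marginal-effect distinction the rest of this page keeps emphasizing.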
E. Implementation
library(margins)
# Logit
logit_fit <- glm(adopt ~ firm_size + rd_spending + competition,
family = binomial(link = "logit"), data = df)
summary(logit_fit)
# Average marginal effects
ame <- margins(logit_fit)
summary(ame)
# Odds ratios
exp(coef(logit_fit))
exp(confint(logit_fit))
# Probit
probit_fit <- glm(adopt ~ firm_size + rd_spending + competition,
family = binomial(link = "probit"), data = df)
summary(margins(probit_fit))
F. Diagnostics and Model Fit
Pseudo R-Squared
There is no true R² for logit/probit. McFadden's pseudo-R² compares the log-likelihood of your model to a null model (intercept only):

pseudo-R² = 1 − ℓ(model) / ℓ(null)

Values above 0.2 are considered quite good. Do not compare pseudo-R² values across different link functions.
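For instance, with hypothetical log-likelihood values:

```python
# McFadden's pseudo-R^2 = 1 - ll(model) / ll(null); the numbers are made up
ll_model = -420.5   # log-likelihood of the fitted model
ll_null = -540.2    # log-likelihood of the intercept-only model
pseudo_r2 = 1.0 - ll_model / ll_null
print(round(pseudo_r2, 3))  # 0.222
```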
Classification Table
Predict Ŷ = 1 if the predicted probability exceeds a cutoff c (usually c = 0.5) and compute the confusion matrix. Report sensitivity (true positive rate), specificity (true negative rate), and overall accuracy. But be cautious: classification accuracy is sensitive to class imbalance.
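A toy confusion-matrix computation at the usual 0.5 cutoff (the predicted probabilities and outcomes are made up):

```python
# Classify at a 0.5 cutoff and summarize the confusion matrix
probs = [0.9, 0.7, 0.6, 0.4, 0.3, 0.2, 0.8, 0.1]  # toy predicted probabilities
y     = [1,   1,   0,   1,   0,   0,   1,   0]    # toy observed outcomes
pred  = [1 if p >= 0.5 else 0 for p in probs]

tp = sum(1 for yh, yt in zip(pred, y) if yh == 1 and yt == 1)
tn = sum(1 for yh, yt in zip(pred, y) if yh == 0 and yt == 0)
fp = sum(1 for yh, yt in zip(pred, y) if yh == 1 and yt == 0)
fn = sum(1 for yh, yt in zip(pred, y) if yh == 0 and yt == 1)

sensitivity = tp / (tp + fn)   # true positive rate
specificity = tn / (tn + fp)   # true negative rate
accuracy = (tp + tn) / len(y)
print(sensitivity, specificity, accuracy)
```

With a rare outcome (say 3% of observations), always predicting 0 would score 97% accuracy, which is why accuracy alone is a poor summary under class imbalance.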
Hosmer-Lemeshow Test
Groups observations into deciles of predicted probability and tests whether observed frequencies match predicted frequencies. A significant test suggests poor calibration, but the test has low power and is sensitive to the number of groups.
Interpreting Results
Three Ways to Report Logit Results
- Log-odds coefficients — the raw output. Hard to interpret; mainly useful for checking sign and significance.
- Odds ratios — exp(β). "A one-unit increase in X multiplies the odds of Y=1 by exp(β)." Common in epidemiology and management.
- Marginal effects — the change in probability. Most intuitive. Preferred in economics.
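The three scales are linked by simple transformations. A sketch with a made-up coefficient and baseline probability shows how to move from log-odds to an odds ratio to a probability change:

```python
import math

beta = 0.3    # hypothetical logit coefficient (log-odds scale)
p0 = 0.40     # hypothetical baseline probability

odds_ratio = math.exp(beta)             # one-unit increase multiplies the odds by this
odds1 = (p0 / (1.0 - p0)) * odds_ratio  # odds after a one-unit increase in X
p1 = odds1 / (1.0 + odds1)              # back to the probability scale

print(round(odds_ratio, 2))  # 1.35
print(round(p1 - p0, 3))     # 0.074 — the probability change at this baseline
```

Note that the odds ratio is the same at every baseline, but the implied probability change is not — repeat the last two lines with p0 = 0.9 and the change shrinks.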
G. What Can Go Wrong
| Problem | What It Does | How to Fix It |
|---|---|---|
| Reporting coefficients as marginal effects | Overstates/understates the effect | Compute and report AMEs |
| Perfect separation | MLE does not converge; coefficients explode to infinity | Drop the problematic variable, use penalized likelihood (Firth logit), or combine categories |
| Rare events | Finite-sample bias in coefficient estimates when Y=1 is very rare (under 5%) | Use rare-events logit (King & Zeng, 2001) or exact logit |
| Ignoring heteroskedasticity | Standard errors are wrong | Use robust SEs |
| Comparing coefficients across models | Logit coefficients are not comparable across models with different covariates (rescaling problem) | Compare marginal effects instead |
Interpreting Logit Coefficients as Marginal Effects
Researcher computes average marginal effects after logit estimation
AME of firm size on adoption probability: 0.05 (SE = 0.017). A one-unit increase in firm size raises the probability of adoption by about 5 percentage points on average.
Perfect Separation
All covariate values have some variation in the outcome — both 0s and 1s appear at every level of X
Logit converges normally. Coefficient on industry dummy: 1.8 (SE = 0.4). MLE is well-defined and standard errors are reliable.
Comparing Logit Coefficients Across Models
Researcher compares average marginal effects across a baseline model and a model with additional controls
AME of R&D on adoption: 0.08 (baseline model) vs. 0.06 (with controls). The 2 percentage point decrease suggests modest confounding by the added covariates.
A logit regression of firm adoption on firm size produces a coefficient of 0.3 with robust SE 0.1. The average marginal effect is 0.05. How do you interpret the result?
H. Practice
A researcher runs a logit model and a probit model on the same data. The logit coefficient on firm size is 0.48 and the probit coefficient is 0.28. She concludes that the logit model estimates a much larger effect. Is she correct?
A logit model predicting loan default produces a coefficient of -0.8 on credit score (standardized). The odds ratio is exp(-0.8) = 0.45. A manager asks: 'So a one-SD increase in credit score cuts the default probability in half?' Is the manager correct?
A colleague says: 'I always use logit for binary outcomes because OLS can predict probabilities outside [0,1].' When might the linear probability model (LPM) actually be a reasonable choice?
You add an interaction term (firm_size * rd_spending) to a logit model. The coefficient on the interaction is 0.15 (p = 0.03). A reviewer says you cannot interpret the interaction effect by looking at this coefficient alone. Why?
Interpreting Logit Results: Loan Default Prediction
A bank analyst runs a logit regression to predict whether a small business loan will default. The dependent variable is Default (1 = defaulted, 0 = repaid). The key predictor is Years_in_business (continuous). The estimated logit coefficient is -0.4 and the average marginal effect is -0.06. The baseline default probability in the sample is 20%.
Read the paper summary below and write a brief referee critique (2-3 sentences) of the identification strategy.
Paper Summary
The authors study whether firms with female CEOs are more likely to adopt environmental sustainability practices. Using a cross-section of 3,200 publicly traded firms, they run a logit regression of sustainability adoption (0/1) on a female CEO dummy, controlling for firm size (log revenue), industry dummies, ROA, and firm age. They report that the odds ratio on female CEO is 1.85 (p = 0.002) and conclude that female leadership causes firms to be 85% more likely to adopt sustainability practices.
Key Table
| Variable | Odds Ratio | Robust SE | p-value |
|---|---|---|---|
| Female CEO | 1.85 | 0.35 | 0.002 |
| Log(Revenue) | 1.42 | 0.08 | 0.000 |
| ROA | 1.10 | 0.22 | 0.640 |
| Firm age | 1.01 | 0.003 | 0.001 |
| Industry FE | Yes | — | — |
| Pseudo R-squared | 0.18 | — | — |
| N | 3,200 | — | — |
Authors' Identification Claim
By controlling for firm size, profitability, firm age, and industry, we isolate the independent effect of CEO gender on sustainability adoption.
I. Swap-In: When to Use Something Else
- Linear Probability Model (LPM): If your probabilities are between 0.2 and 0.8 for most observations, the LPM with robust SEs often gives nearly identical average marginal effects. Easier to interpret and to combine with FE or IV.
- Conditional logit (fixed effects logit): For panel data with unit fixed effects. Only uses within-unit variation. See Chamberlain (1980).
- Multinomial logit: When the outcome has more than two unordered categories.
- Ordered logit/probit: When the outcome has ordered categories (e.g., strongly disagree to strongly agree).
- Count models: When the outcome is a non-negative integer (number of events), see Poisson / Negative Binomial instead.
J. Reviewer Checklist
Paper Library
Foundational (6)
McFadden, D. (1974). Conditional Logit Analysis of Qualitative Choice Behavior.
McFadden developed the conditional logit model grounded in random utility theory, showing how discrete choices among alternatives can be modeled by assuming individuals maximize utility with an extreme-value distributed error. This work earned him the 2000 Nobel Prize and remains the foundation of discrete choice analysis.
Amemiya, T. (1981). Qualitative Response Models: A Survey.
Amemiya provided a comprehensive survey of qualitative response models including logit, probit, and tobit. This survey organized the theoretical properties, estimation methods, and specification tests for binary and multinomial choice models and became a standard reference for applied researchers.
Hausman, J., & McFadden, D. (1984). Specification Tests for the Multinomial Logit Model.
This paper developed a specification test for the independence of irrelevant alternatives (IIA) assumption in multinomial logit. The test allows researchers to assess whether the logit model's restrictive substitution patterns are appropriate for their data, which is critical for applied work with multiple choice categories.
Ai, C., & Norton, E. C. (2003). Interaction Terms in Logit and Probit Models.
Ai and Norton showed that the interpretation of interaction terms in nonlinear models like logit and probit is much more complicated than in linear models. The marginal effect of an interaction is not simply the coefficient on the interaction term, a mistake that was widespread in applied research.
Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data.
Wooldridge's graduate textbook provides a comprehensive and rigorous treatment of logit, probit, and other discrete choice models in both cross-sectional and panel data settings. Chapters 15–16 cover binary response models, multinomial models, and the econometric issues specific to nonlinear estimation with unobserved heterogeneity.
Chamberlain, G. (1980). Analysis of Covariance with Qualitative Data.
Chamberlain showed how to incorporate fixed effects into logit models via conditional maximum likelihood, which is essential for panel data applications where unobserved unit-level heterogeneity must be controlled for.
Application (4)
Angrist, J. D., & Pischke, J.-S. (2009). Mostly Harmless Econometrics: An Empiricist's Companion.
Angrist and Pischke argue that for causal inference purposes, the linear probability model (OLS on a binary outcome) is often preferable to logit or probit because it avoids functional form assumptions and yields easily interpretable coefficients. This influential perspective has shifted many applied researchers toward LPM.
Hoetker, G. (2007). The Use of Logit and Probit Models in Strategic Management Research: Critical Issues.
Hoetker reviewed how strategy researchers use logit and probit models and identified common pitfalls, including misinterpretation of coefficients across groups and incorrect use of interaction terms. This paper provided concrete guidance for improving practice in management journals.
Zelner, B. A. (2009). Using Simulation to Interpret Results from Logit, Probit, and Other Nonlinear Models.
Zelner advocated using simulation-based approaches to interpret and present results from nonlinear models in management research. By computing predicted probabilities and marginal effects via simulation, researchers can convey substantive significance more clearly than raw coefficients.
Palepu, K. G. (1986). Predicting Takeover Targets: A Methodological and Empirical Analysis.
Palepu used logit models to predict which firms would become takeover targets based on financial and market characteristics. This influential paper demonstrated the practical application of binary choice models to corporate strategy and governance questions.
Survey (3)
Train, K. E. (2009). Discrete Choice Methods with Simulation.
Train's textbook provides a comprehensive and accessible treatment of logit, probit, mixed logit, and other discrete choice models. It covers both theory and practical simulation-based estimation methods and is widely used in economics, marketing, and transportation research.
Cameron, A. C., & Trivedi, P. K. (2005). Microeconometrics: Methods and Applications.
Chapters 14–15 offer comprehensive coverage of binary and multinomial choice models, with detailed discussion of estimation and specification testing.
Long, J. S. (1997). Regression Models for Categorical and Limited Dependent Variables.
A widely used reference for applied researchers working with binary, ordinal, multinomial, and count outcome models, with clear exposition of interpretation and software implementation.