MethodAtlas
Discrete Choice · Established

Logit / Probit

Models for binary outcomes — when your dependent variable is yes/no, pass/fail, or adopt/don't adopt.

Quick Reference

When to Use
When your outcome variable is binary (0/1, yes/no, adopt/don't adopt) and the linear probability model is inadequate, especially when predicted probabilities near 0 or 1 matter.
Key Assumption
Correct specification of the link function (logistic or normal CDF). For causal interpretation, the same exogeneity condition as OLS applies.
Common Mistake
Reporting logit coefficients as if they were marginal effects — logit coefficients are in log-odds units, not probability units. Computing and reporting average marginal effects is standard practice.
Estimated Time
2.5 hours

One-Line Implementation

Stata: logit y x1 x2, vce(robust)
R: margins::margins(glm(y ~ x1 + x2, family = binomial(link = 'logit'), data = df))
Python: smf.logit('y ~ x1 + x2', data=df).fit().get_margeff().summary()


Motivating Example: Firm Adoption of a New Technology

Imagine you are studying why some firms adopt a new manufacturing technology and others do not. Your outcome variable is binary: Y_i = 1 if firm i adopts, Y_i = 0 otherwise. You want to know how firm size, R&D spending, and industry competition affect the probability of adoption.

You could try running OLS — regressing the 0/1 outcome on your covariates. This approach is called the linear probability model (LPM), and it is a reasonable starting point. But it has problems. The predicted probabilities can fall outside [0, 1], the error term is necessarily heteroskedastic, and the marginal effect of a covariate is assumed to be constant regardless of where you are on the probability scale.

Logit and probit models address these problems by modeling the probability through a nonlinear link function that keeps predictions bounded between 0 and 1.


A. Overview: Binary Outcome Models

The Problem with OLS on Binary Outcomes

When you run Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i with Y_i \in \{0, 1\}, you are modeling:

E[Y_i | X_i] = P(Y_i = 1 | X_i) = \beta_0 + \beta_1 X_i

This equation is the LPM. It works surprisingly well in many cases, especially near the center of the data. But at the extremes, it can predict probabilities below 0 or above 1, which is nonsensical.
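A quick simulation makes the boundary problem concrete. This is an illustrative sketch (the DGP, sample size, and variable names are invented here, not taken from the chapter's example): fit OLS to a binary outcome generated from a logistic DGP and count fitted "probabilities" that escape [0, 1].

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(-3, 3, n)                    # predictor
p_true = 1 / (1 + np.exp(-1.5 * x))          # true P(Y=1|X), logistic link
y = rng.binomial(1, p_true)                  # observed binary outcome

# Linear probability model: OLS of y on a constant and x
X = np.column_stack([np.ones(n), x])
beta_lpm, *_ = np.linalg.lstsq(X, y, rcond=None)
p_hat = X @ beta_lpm                         # LPM fitted "probabilities"

n_outside = int(np.sum((p_hat < 0) | (p_hat > 1)))
print(f"LPM slope: {beta_lpm[1]:.3f}")
print(f"Fitted values outside [0, 1]: {n_outside} of {n}")
```

Near the center of the x range the LPM fit tracks the true probabilities well; the out-of-bounds predictions all occur at the extremes, which is exactly where the linear approximation breaks down.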

Link Functions: The Core Idea

Both logit and probit model the probability through a nonlinear transformation:

P(Y_i = 1 | X_i) = G(X_i'\beta)

where G(\cdot) is a function that maps any real number to the (0, 1) interval.

  • Logit uses the logistic function: G(z) = \frac{e^z}{1 + e^z} = \Lambda(z)
  • Probit uses the standard normal CDF: G(z) = \Phi(z)

Both are S-shaped curves. They are nearly identical in practice — probit is slightly steeper at the center and slightly thinner at the tails. In most applications, they give very similar results.
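The similarity is easy to check numerically. A common rule of thumb is that logit coefficients are roughly 1.6 to 1.8 times the corresponding probit coefficients, because the logistic CDF at z approximately equals the normal CDF at z/1.7. A minimal sketch (stdlib only, using the error function for the normal CDF):

```python
import math

def logistic_cdf(z):
    # Lambda(z) = e^z / (1 + e^z)
    return 1 / (1 + math.exp(-z))

def normal_cdf(z):
    # Phi(z) via the error function: Phi(z) = (1 + erf(z / sqrt(2))) / 2
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

print("  z   Lambda(z)   Phi(z)   Phi(z/1.7)")
for z in [-4, -2, -1, 0, 1, 2, 4]:
    print(f"{z:4d}  {logistic_cdf(z):9.3f}  {normal_cdf(z):7.3f}  {normal_cdf(z/1.7):9.3f}")
```

Both links cross 0.5 at z = 0, and after rescaling by 1.7 the two curves differ by less than about 0.01 everywhere, which is why the choice rarely matters in practice.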

When Does It Matter Which You Choose?

In most applications, it does not. The choice between logit and probit rarely changes substantive conclusions. Logit is more common in epidemiology and management because of the convenient odds ratio interpretation. Probit is more common in economics, partly by convention and partly because it connects naturally to latent variable models.




B. Identification

The identification strategy for logit/probit is the same as for OLS: you need exogeneity of the regressors. The logit/probit framework does not solve identification problems; it only handles the functional form for binary outcomes.

E[\varepsilon_i | X_i] = 0

If your regressors are endogenous, you need an identification strategy (IV, DiD, matching, etc.) combined with the appropriate binary outcome model. For IV with binary outcomes, see the bivariate probit or IV-probit approach. It is also advisable to consider sensitivity analysis to assess how robust your estimates are to potential unobserved confounders.

The Latent Variable Interpretation

Both models can be motivated by a latent variable Y_i^*:

Y_i^* = X_i'\beta + \varepsilon_i, \quad Y_i = \mathbf{1}(Y_i^* > 0)

If \varepsilon_i follows a logistic distribution, you get logit. If \varepsilon_i follows a standard normal, you get probit. The firm adopts the technology when the latent net benefit exceeds zero.
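The latent-variable story can be verified by simulation. This sketch uses an arbitrary index value of 1.0 (a hypothetical firm, not one from the text): draw logistic errors, apply the threshold rule, and the empirical adoption rate should match Λ(X'β).

```python
import numpy as np

rng = np.random.default_rng(42)
index = 1.0                                        # X_i'beta for a hypothetical firm
eps = rng.logistic(loc=0, scale=1, size=200_000)   # logistic errors -> logit model

y_star = index + eps                # latent net benefit of adopting
adopt = (y_star > 0)                # observed decision: adopt iff Y* > 0

empirical = adopt.mean()
implied = 1 / (1 + np.exp(-index))  # Lambda(index), the logit probability
print(f"empirical adoption rate: {empirical:.3f}")
print(f"Lambda(index):           {implied:.3f}")
```

Swapping `rng.logistic` for `rng.standard_normal` and `Lambda` for the normal CDF gives the probit version of the same check.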


C. Visual Intuition

Think of the probability curve as a hill. At the bottom (low probability of adoption), even a large change in firm size barely moves the probability — you are pushing against inertia. At the top (high probability), the same is true — most firms have already adopted. The steepest part of the hill is in the middle, around 50% probability. This middle region is where a change in X has the biggest effect on the probability.

This nonlinearity is why marginal effects depend on where you evaluate them. A one-unit increase in firm size might raise adoption probability by 8 percentage points for a mid-sized firm (on the steep part of the curve) but only 2 percentage points for a very large firm (on the flat part).

Interactive Simulation

Logit Marginal Effects

The marginal effect of X on P(Y=1) is not constant in logit: it peaks near the 50% baseline probability and shrinks toward zero at the extremes, unlike OLS where the marginal effect equals the coefficient everywhere.


Computed Results

  • Baseline probability P(Y=1): 0.500
  • Marginal effect (AME at baseline): 0.375
  • Peak ME (at p = 0.5, equal to β/4): 0.375
Interactive Simulation

Why Logit / Probit?

Binary DGP: P(Y=1|X) = sigmoid(0.0 + 1.5 · X). N = 200. Comparing average marginal effects (AMEs) across estimators. LPM produces 34 predictions outside [0, 1].

[Figure: fitted P(Y = 1 | X) against the predictor X for the LPM, logit, and probit, overlaid on the true DGP curve]

Estimation Results

| Estimator | β̂ (AME) | SE | 95% CI | Bias |
| --- | --- | --- | --- | --- |
| LPM (closest) | 0.138 | 0.010 | [0.12, 0.16] | -0.000 |
| Logit | 0.136 | 0.019 | [0.10, 0.17] | -0.002 |
| Probit | 0.135 | 0.016 | [0.10, 0.17] | -0.003 |
| True β | 0.138 | | | |
Parameters

  • N = 200 (number of observations)
  • β = 1.5 (coefficient in the latent index; steeper = more extreme probabilities)
  • Intercept = 0.0 (shifts the probability curve left/right)

Why the difference?

The Linear Probability Model predicts outside [0, 1] for 34 of 200 observations (18 below 0, 16 above 1). These nonsensical probabilities are a fundamental problem with applying OLS to binary outcomes. Both logit and probit correctly bound predictions to [0, 1] and model the inherent nonlinearity of binary outcomes. The table compares average marginal effects (AMEs) rather than raw coefficients, since the logit slope (log-odds), probit slope (latent index), and LPM slope (linear probability) are not on the same scale; AMEs express each estimator's effect as the average change in P(Y=1) for a unit increase in X. On that scale, all three estimators land close to the truth in this draw, a reminder that the LPM often approximates the true AME well even when its pointwise predictions are nonsensical.


D. Mathematical Derivation

Don't worry about the notation yet — here's what this means in words: We find the coefficients that make the observed data most likely, by maximizing the probability of seeing the 1s and 0s we actually observe.

For a binary outcome Y_i \in \{0, 1\} with probability p_i = P(Y_i = 1 | X_i) = \Lambda(X_i'\beta), the likelihood for observation i is:

L_i(\beta) = p_i^{Y_i} (1 - p_i)^{1 - Y_i}

The log-likelihood for the full sample is:

\ell(\beta) = \sum_{i=1}^{n} \left[ Y_i \ln(\Lambda(X_i'\beta)) + (1 - Y_i) \ln(1 - \Lambda(X_i'\beta)) \right]

Taking the derivative and using the fact that \Lambda'(z) = \Lambda(z)(1 - \Lambda(z)):

\frac{\partial \ell}{\partial \beta} = \sum_{i=1}^{n} (Y_i - \Lambda(X_i'\beta)) X_i = 0

These first-order conditions have no closed-form solution; the likelihood must be maximized numerically via Newton-Raphson or iteratively reweighted least squares (IRLS).
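The Newton-Raphson update fits in a few lines. This is a sketch on simulated data (the sample size and true coefficients here are invented): iterate β ← β + (X'WX)⁻¹ X'(y − p), where W = diag(p(1 − p)) comes from the negative Hessian, until the score X'(y − p) is numerically zero.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one covariate
beta_true = np.array([0.5, 1.0])
y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta_true)))

beta = np.zeros(2)                      # start from zero
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ beta))     # Lambda(X'beta)
    score = X.T @ (y - p)               # gradient of the log-likelihood
    W = p * (1 - p)                     # Lambda'(X'beta): the IRLS weights
    hessian = X.T @ (X * W[:, None])    # negative Hessian: X' W X
    beta = beta + np.linalg.solve(hessian, score)
    if np.max(np.abs(score)) < 1e-8:    # score ~ 0 at the maximum
        break

print("Logit MLE:", beta.round(3))
```

Convergence is typically reached in a handful of iterations; `glm` in R and `smf.logit` in Python run essentially this loop (as IRLS) under the hood.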

Marginal effects: The partial effect of X_j on the probability is:

\frac{\partial P(Y_i = 1 | X_i)}{\partial X_{ij}} = \Lambda(X_i'\beta)(1 - \Lambda(X_i'\beta)) \cdot \beta_j = \lambda(X_i'\beta) \cdot \beta_j

This expression depends on X_i, which is why you must evaluate it at specific values or average it across the sample.
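In code, the average marginal effect simply averages λ(X_i'β)·β_j over the sample. A sketch with an assumed coefficient vector (these are made-up numbers, not estimates from real data):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([0.0, 1.5])           # assumed logit coefficients

lam_cdf = 1 / (1 + np.exp(-X @ beta))  # Lambda(X_i'beta)
density = lam_cdf * (1 - lam_cdf)      # lambda(X_i'beta), peaks at 0.25

me_i = density * beta[1]               # observation-level marginal effects
ame = me_i.mean()                      # average marginal effect

print(f"AME of X on P(Y=1):        {ame:.3f}")
print(f"Upper bound (beta/4):      {beta[1] / 4:.3f}")
```

Because the logistic density never exceeds 1/4, the AME is always below β/4; the gap between the two is a quick gauge of how much of the sample sits away from the steep middle of the curve.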


E. Implementation

library(margins)

# Logit
logit_fit <- glm(adopt ~ firm_size + rd_spending + competition,
                 family = binomial(link = "logit"), data = df)
summary(logit_fit)

# Average marginal effects
ame <- margins(logit_fit)
summary(ame)

# Odds ratios
exp(coef(logit_fit))
exp(confint(logit_fit))

# Probit
probit_fit <- glm(adopt ~ firm_size + rd_spending + competition,
                  family = binomial(link = "probit"), data = df)
summary(margins(probit_fit))

Requires the margins package.

F. Diagnostics and Model Fit

Pseudo R-Squared

There is no true R^2 for logit/probit. McFadden's pseudo-R^2 compares the log-likelihood of your model to a null model (intercept only):

\text{Pseudo-}R^2 = 1 - \frac{\ell(\hat{\beta})}{\ell(\hat{\beta}_0)}
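Computing it requires only two numbers: the fitted model's log-likelihood and the intercept-only log-likelihood, which is determined entirely by the sample share of 1s. The values below are hypothetical, chosen for illustration:

```python
import numpy as np

# Hypothetical fitted-model log-likelihood (assumed for illustration)
ll_model = -412.7
n, p_bar = 1000, 0.35   # assumed sample size and share of ones

# Null (intercept-only) log-likelihood: every observation gets p_bar
ll_null = n * (p_bar * np.log(p_bar) + (1 - p_bar) * np.log(1 - p_bar))

pseudo_r2 = 1 - ll_model / ll_null
print(f"ll_null = {ll_null:.1f}")
print(f"McFadden pseudo-R2 = {pseudo_r2:.3f}")
```

Note that both log-likelihoods are negative, so the ratio is positive and the pseudo-R^2 lies between 0 (no improvement over the null) and 1.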

McFadden suggested that values between 0.2 and 0.4 indicate an excellent fit. Do not compare pseudo-R^2 values across different link functions.

Classification Table

Predict \hat{Y}_i = 1 if \hat{p}_i > c (usually c = 0.5) and compute the confusion matrix. Report sensitivity (true positive rate), specificity (true negative rate), and overall accuracy. But be cautious: classification accuracy is sensitive to class imbalance.
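A minimal classification-table sketch, using toy predicted probabilities and outcomes (not output from any model above):

```python
import numpy as np

# Toy predicted probabilities and true outcomes
p_hat = np.array([0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1])
y     = np.array([1,   1,   0,   1,   0,   1,   0,   0])

c = 0.5                              # classification cutoff
y_pred = (p_hat > c).astype(int)

tp = int(np.sum((y_pred == 1) & (y == 1)))   # true positives
tn = int(np.sum((y_pred == 0) & (y == 0)))   # true negatives
fp = int(np.sum((y_pred == 1) & (y == 0)))   # false positives
fn = int(np.sum((y_pred == 0) & (y == 1)))   # false negatives

sensitivity = tp / (tp + fn)         # true positive rate
specificity = tn / (tn + fp)         # true negative rate
accuracy = (tp + tn) / len(y)
print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}, accuracy={accuracy:.2f}")
```

The class-imbalance warning in the text is easy to see here: if 95% of observations were 0, predicting 0 for everyone would score 95% accuracy with zero sensitivity.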

Hosmer-Lemeshow Test

Groups observations into deciles of predicted probability and tests whether observed frequencies match predicted frequencies. A significant test suggests poor calibration, but the test has low power and is sensitive to the number of groups.


Interpreting Results

Three Ways to Report Logit Results

  1. Log-odds coefficients — the raw output. Hard to interpret; mainly useful for checking sign and significance.
  2. Odds ratios — e^{\beta_j}. "A one-unit increase in X multiplies the odds of Y=1 by e^{\beta_j}." Common in epidemiology and management.
  3. Marginal effects — the change in probability. Most intuitive. Preferred in economics.
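The three scales are easy to reconcile numerically. The coefficient below (β = 0.7) is made up for illustration and does not come from any model in this chapter:

```python
import math

beta = 0.7   # hypothetical log-odds coefficient

# Scale 1: log-odds. A one-unit increase in X raises the log-odds by 0.7.
# Scale 2: odds ratio. The odds of Y=1 are multiplied by exp(beta).
odds_ratio = math.exp(beta)

# Scale 3: marginal effect at baseline probability p is beta * p * (1 - p).
def marginal_effect(beta, p):
    return beta * p * (1 - p)

print(f"odds ratio:   {odds_ratio:.2f}")
print(f"ME at p=0.5:  {marginal_effect(beta, 0.5):.3f}")   # the maximum, beta/4
print(f"ME at p=0.9:  {marginal_effect(beta, 0.9):.3f}")   # much smaller near the boundary
```

The same coefficient thus reads as "log-odds up 0.7", "odds roughly doubled", or "probability up 17.5 points for a firm at 50% baseline, but only 6.3 points at 90%", which is why stating the scale explicitly matters.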

G. What Can Go Wrong

| Problem | What It Does | How to Fix It |
| --- | --- | --- |
| Reporting coefficients as marginal effects | Overstates/understates the effect | Compute and report AMEs |
| Perfect separation | MLE does not converge; coefficients explode to infinity | Drop the problematic variable, use penalized likelihood (Firth logit), or combine categories |
| Rare events | Finite-sample bias in coefficient estimates when Y=1 is very rare (under 5%) | Use rare-events logit (King & Zeng, 2001) or exact logit |
| Ignoring heteroskedasticity | Standard errors are wrong | Use robust SEs |
| Comparing coefficients across models | Logit coefficients are not comparable across models with different covariates (rescaling problem; Allison, 1999) | Compare marginal effects instead |
Assumption Failure Demo

Interpreting Logit Coefficients as Marginal Effects

Researcher computes average marginal effects after logit estimation

AME of firm size on adoption probability: 0.05 (SE = 0.017). A one-unit increase in firm size raises the probability of adoption by about 5 percentage points on average.

Assumption Failure Demo

Perfect Separation

All covariate values have some variation in the outcome — both 0s and 1s appear at every level of X

Logit converges normally. Coefficient on industry dummy: 1.8 (SE = 0.4). MLE is well-defined and standard errors are reliable.

Assumption Failure Demo

Comparing Logit Coefficients Across Models

Researcher compares average marginal effects across a baseline model and a model with additional controls

AME of R&D on adoption: 0.08 (baseline model) vs. 0.06 (with controls). The 2 percentage point decrease suggests modest confounding by the added covariates.

Concept Check

A logit regression of firm adoption on firm size produces a coefficient of 0.3 with robust SE 0.1. The average marginal effect is 0.05. How do you interpret the result?


H. Practice

Concept Check

A researcher runs a logit model and a probit model on the same data. The logit coefficient on firm size is 0.48 and the probit coefficient is 0.28. She concludes that the logit model estimates a much larger effect. Is she correct?

Concept Check

A logit model predicting loan default produces a coefficient of -0.8 on credit score (standardized). The odds ratio is exp(-0.8) = 0.45. A manager asks: 'So a one-SD increase in credit score cuts the default probability in half?' Is the manager correct?

Concept Check

A colleague says: 'I always use logit for binary outcomes because OLS can predict probabilities outside [0,1].' When might the linear probability model (LPM) actually be a reasonable choice?

Concept Check

You add an interaction term (firm_size * rd_spending) to a logit model. The coefficient on the interaction is 0.15 (p = 0.03). A reviewer says you cannot interpret the interaction effect by looking at this coefficient alone. Why?

Guided Exercise

Interpreting Logit Results: Loan Default Prediction

A bank analyst runs a logit regression to predict whether a small business loan will default. The dependent variable is Default (1 = defaulted, 0 = repaid). The key predictor is Years_in_business (continuous). The estimated logit coefficient is -0.4 and the average marginal effect is -0.06. The baseline default probability in the sample is 20%.

In what units is the logit coefficient (-0.4) expressed?

How do you interpret the average marginal effect of -0.06?

If a colleague says the odds of default decrease by 40% per additional year, are they correct?

Why can you not interpret the logit coefficient directly as a probability change?

Error Detective

Read the analysis below carefully and identify the errors.

A researcher studies whether receiving venture capital funding affects the probability that a startup goes public (IPO). They run a logit regression of IPO (0/1) on VC_funded (0/1), controlling for firm age, industry, and founder experience. They report: 'The coefficient on VC_funded is 1.2 (p < 0.01), meaning that VC funding increases the probability of IPO by 120 percentage points.'

Select all errors you can find:

Referee Exercise

Read the paper summary below and write a brief referee critique (2-3 sentences) of the identification strategy.

Paper Summary

The authors study whether firms with female CEOs are more likely to adopt environmental sustainability practices. Using a cross-section of 3,200 publicly traded firms, they run a logit regression of sustainability adoption (0/1) on a female CEO dummy, controlling for firm size (log revenue), industry dummies, ROA, and firm age. They report that the odds ratio on female CEO is 1.85 (p = 0.002) and conclude that female leadership causes firms to be 85% more likely to adopt sustainability practices.

Key Table

| Variable | Odds Ratio | Robust SE | p-value |
| --- | --- | --- | --- |
| Female CEO | 1.85 | 0.35 | 0.002 |
| Log(Revenue) | 1.42 | 0.08 | 0.000 |
| ROA | 1.10 | 0.22 | 0.640 |
| Firm age | 1.01 | 0.003 | 0.001 |
| Industry FE | Yes | | |
| Pseudo R-squared | 0.18 | | |
| N | 3,200 | | |

Authors' Identification Claim

By controlling for firm size, profitability, firm age, and industry, we isolate the independent effect of CEO gender on sustainability adoption.


I. Swap-In: When to Use Something Else

  • Linear Probability Model (LPM): If your probabilities are between 0.2 and 0.8 for most observations, the LPM with robust SEs often gives nearly identical average marginal effects. Easier to interpret and to combine with FE or IV.
  • Conditional logit (fixed effects logit): For panel data with unit fixed effects. Only uses within-unit variation. See Chamberlain (1980).
  • Multinomial logit: When the outcome has more than two unordered categories.
  • Ordered logit/probit: When the outcome has ordered categories (e.g., strongly disagree to strongly agree).
  • Count models: When the outcome is a non-negative integer (number of events), see Poisson / Negative Binomial instead.

J. Reviewer Checklist



Paper Library

Foundational (6)

McFadden, D. (1974). Conditional Logit Analysis of Qualitative Choice Behavior.

Frontiers in Econometrics

McFadden developed the conditional logit model grounded in random utility theory, showing how discrete choices among alternatives can be modeled by assuming individuals maximize utility with an extreme-value distributed error. This work earned him the 2000 Nobel Prize and remains the foundation of discrete choice analysis.

Amemiya, T. (1981). Qualitative Response Models: A Survey.

Journal of Economic Literature

Amemiya provided a comprehensive survey of qualitative response models including logit, probit, and tobit. This survey organized the theoretical properties, estimation methods, and specification tests for binary and multinomial choice models and became a standard reference for applied researchers.

Hausman, J., & McFadden, D. (1984). Specification Tests for the Multinomial Logit Model.

Econometrica. DOI: 10.2307/1910997

This paper developed a specification test for the independence of irrelevant alternatives (IIA) assumption in multinomial logit. The test allows researchers to assess whether the logit model's restrictive substitution patterns are appropriate for their data, which is critical for applied work with multiple choice categories.

Ai, C., & Norton, E. C. (2003). Interaction Terms in Logit and Probit Models.

Economics Letters. DOI: 10.1016/S0165-1765(03)00032-6

Ai and Norton showed that the interpretation of interaction terms in nonlinear models like logit and probit is much more complicated than in linear models. The marginal effect of an interaction is not simply the coefficient on the interaction term, a mistake that was widespread in applied research.

Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data.

Wooldridge's graduate textbook provides a comprehensive and rigorous treatment of logit, probit, and other discrete choice models in both cross-sectional and panel data settings. Chapters 15–16 cover binary response models, multinomial models, and the econometric issues specific to nonlinear estimation with unobserved heterogeneity.

Chamberlain, G. (1980). Analysis of Covariance with Qualitative Data.

Review of Economic Studies. DOI: 10.2307/2297110

Chamberlain showed how to incorporate fixed effects into logit models via conditional maximum likelihood, which is essential for panel data applications where unobserved unit-level heterogeneity must be controlled for.

Application (4)

Angrist, J. D., & Pischke, J.-S. (2009). Mostly Harmless Econometrics: An Empiricist's Companion.

Princeton University Press. DOI: 10.1515/9781400829828

Angrist and Pischke argue that for causal inference purposes, the linear probability model (OLS on a binary outcome) is often preferable to logit or probit because it avoids functional form assumptions and yields easily interpretable coefficients. This influential perspective has shifted many applied researchers toward LPM.

Hoetker, G. (2007). The Use of Logit and Probit Models in Strategic Management Research: Critical Issues.

Strategic Management Journal. DOI: 10.1002/smj.582

Hoetker reviewed how strategy researchers use logit and probit models and identified common pitfalls, including misinterpretation of coefficients across groups and incorrect use of interaction terms. This paper provided concrete guidance for improving practice in management journals.

Zelner, B. A. (2009). Using Simulation to Interpret Results from Logit, Probit, and Other Nonlinear Models.

Strategic Management Journal. DOI: 10.1002/smj.783

Zelner advocated using simulation-based approaches to interpret and present results from nonlinear models in management research. By computing predicted probabilities and marginal effects via simulation, researchers can convey substantive significance more clearly than raw coefficients.

Palepu, K. G. (1986). Predicting Takeover Targets: A Methodological and Empirical Analysis.

Journal of Accounting and Economics. DOI: 10.1016/0165-4101(86)90008-X

Palepu used logit models to predict which firms would become takeover targets based on financial and market characteristics. This influential paper demonstrated the practical application of binary choice models to corporate strategy and governance questions.

Survey (3)

Train, K. E. (2009). Discrete Choice Methods with Simulation.

Cambridge University Press. DOI: 10.1017/CBO9780511805271

Train's textbook provides a comprehensive and accessible treatment of logit, probit, mixed logit, and other discrete choice models. It covers both theory and practical simulation-based estimation methods and is widely used in economics, marketing, and transportation research.

Cameron, A. C., & Trivedi, P. K. (2005). Microeconometrics: Methods and Applications.

Cambridge University Press

Chapters 14–15 offer comprehensive coverage of binary and multinomial choice models, with detailed discussion of estimation and specification testing.

Long, J. S. (1997). Regression Models for Categorical and Limited Dependent Variables.

Sage Publications

A widely used reference for applied researchers working with binary, ordinal, multinomial, and count outcome models, with clear exposition of interpretation and software implementation.

Tags

discrete-choice · binary-outcome · cross-sectional