MethodAtlas
Method · Intermediate · 10 min read
Discrete Choice · Established

Logit / Probit

Models for binary outcomes — when your dependent variable is yes/no, pass/fail, or adopt/don't adopt.

When to Use: When your outcome variable is binary (0/1, yes/no, adopt/don't adopt) and the linear probability model is inadequate, especially when predicted probabilities near 0 or 1 matter.
Assumption: Correct specification of the link function (logistic or normal CDF). For causal interpretation, the same exogeneity condition as OLS applies.
Mistake: Reporting logit coefficients as if they were marginal effects — logit coefficients are in log-odds units, not probability units. Computing and reporting average marginal effects is standard practice.

One-Line Implementation

R: glm(y ~ x1 + x2, family = binomial(link = 'logit'), data = df) |> lmtest::coeftest(vcov. = sandwich::vcovHC)
Stata: logit y x1 x2, vce(robust)
Python: smf.logit('y ~ x1 + x2', data=df).fit(cov_type='HC1')


Motivating Example: Firm Adoption of a New Technology

Imagine you are studying why some firms adopt a new manufacturing technology and others do not. Your outcome variable is binary: Y_i = 1 if firm i adopts, Y_i = 0 otherwise. You want to know how firm size, R&D spending, and industry competition affect the probability of adoption.

You could try running OLS — regressing the 0/1 outcome on your covariates. This approach is called the linear probability model (LPM), and it is a reasonable starting point. But it has problems. The predicted probabilities can fall outside [0, 1], the error term is necessarily heteroskedastic, and the marginal effect of a covariate is assumed to be constant regardless of where you are on the probability scale.
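The out-of-bounds problem is easy to see numerically. Here is a minimal sketch with a made-up fitted LPM; the intercept and slope are hypothetical, chosen only to illustrate the boundary issue:

```python
# Hypothetical fitted LPM for the adoption example: P-hat = -0.15 + 0.25 * firm_size
b0, b1 = -0.15, 0.25

for size in (0.2, 2.0, 5.0):
    p_hat = b0 + b1 * size   # linear in firm size, so nothing bounds the prediction
    print(f"firm_size = {size}: predicted probability = {p_hat:.2f}")
```

The smallest firm gets a negative "probability" and the largest gets one above 1, exactly the pathology the nonlinear link functions below are designed to rule out.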

Logit and probit models address these problems by modeling the probability through a nonlinear link function that keeps predictions bounded between 0 and 1.


A. Overview

The Problem with OLS on Binary Outcomes

When you run Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i with Y_i \in \{0, 1\}, you are modeling:

E[Y_i \mid X_i] = P(Y_i = 1 \mid X_i) = \beta_0 + \beta_1 X_i

This equation is the LPM. It works surprisingly well in many cases, especially near the center of the data. But at the extremes, it can predict probabilities below 0 or above 1, which is nonsensical.

Both logit and probit model the probability through a nonlinear transformation:

P(Y_i = 1 \mid X_i) = G(X_i'\beta)

where G(\cdot) is a function that maps any real number to the (0, 1) interval.

  • Logit uses the logistic function: G(z) = \frac{e^z}{1 + e^z} = \Lambda(z)
  • Probit uses the standard normal CDF: G(z) = \Phi(z)

Both are S-shaped curves. They are nearly identical in practice — probit is slightly steeper at the center and slightly thinner at the tails. In most applications, they give very similar results.
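To see how close the two links are, here is a small sketch using only the standard library; the 1.6 rescaling factor is a common rule of thumb for comparing logit and probit indices, not an exact constant:

```python
import math

def logistic_cdf(z):
    # Lambda(z) = e^z / (1 + e^z), written in an equivalent overflow-safe form
    return 1.0 / (1.0 + math.exp(-z))

def normal_cdf(z):
    # Phi(z) via the error function (standard library only, no SciPy)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# After rescaling the probit index by roughly 1.6, the two S-curves nearly coincide.
for z in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"z = {z:+.1f}  logit: {logistic_cdf(z):.3f}  probit(z/1.6): {normal_cdf(z / 1.6):.3f}")
```

The printed probabilities differ by well under one percentage point across this range, which is why the choice of link rarely changes substantive conclusions.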

When Does It Matter Which You Choose?

In most applications, it does not. The choice between logit and probit rarely changes substantive conclusions. Logit is more common in epidemiology and management because of the convenient odds-ratio interpretation. Probit is more common in economics, partly by convention and partly because it connects naturally to latent variable models.




B. Identification

The identification strategy for logit/probit is the same as for OLS: you need exogeneity of the regressors. The logit/probit framework does not solve endogeneity problems; it only handles the functional form for binary outcomes.

E[\varepsilon_i \mid X_i] = 0

If your regressors are endogenous, you need an identification strategy (IV, DiD, matching, etc.) combined with the appropriate binary outcome model. For IV with binary outcomes, see the bivariate probit or IV-probit approach. It is also advisable to consider sensitivity analysis to assess how robust your estimates are to potential unobserved confounders.

The Latent Variable Interpretation

Both models can be motivated by a latent variable Y_i^*:

Y_i^* = X_i'\beta + \varepsilon_i, \quad Y_i = \mathbf{1}(Y_i^* > 0)

If \varepsilon_i follows a logistic distribution, you get logit. If \varepsilon_i follows a standard normal, you get probit. The firm adopts the technology when the latent net benefit exceeds zero.
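The latent-variable story can be checked by simulation. The sketch below draws logistic errors by inverse-CDF sampling and confirms that the share of simulated adopters matches the closed-form logit probability; the coefficients and covariate value are hypothetical:

```python
import math
import random

random.seed(0)
b0, b1 = -1.0, 0.8   # hypothetical coefficients
x = 1.5              # evaluate at one covariate value

# Logistic errors via inverse-CDF sampling: eps = ln(u / (1 - u)), u ~ Uniform(0, 1)
n = 200_000
adopted = 0
for _ in range(n):
    u = random.random()
    eps = math.log(u / (1.0 - u))
    y_star = b0 + b1 * x + eps   # latent net benefit
    adopted += y_star > 0        # adopt when the net benefit is positive

sim_prob = adopted / n
closed_form = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))   # Lambda(X'beta)
print(f"simulated: {sim_prob:.3f}   closed form: {closed_form:.3f}")
```

Swapping the logistic draw for `random.gauss(0, 1)` and the logistic CDF for the normal CDF gives the probit version of the same exercise.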


C. Visual Intuition

Think of the probability curve as a hill. At the bottom (low probability of adoption), even a large change in firm size barely moves the probability — you are pushing against inertia. At the top (high probability), the same is true — most firms have already adopted. The steepest part of the hill is in the middle, around 50% probability. This middle region is where a change in X has the biggest effect on the probability.

This nonlinearity is why marginal effects depend on where you evaluate them. A one-unit increase in firm size might raise adoption probability by 8 percentage points for a mid-sized firm (on the steep part of the curve) but only 2 percentage points for a very large firm (on the flat part).
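This dependence on location follows directly from the marginal-effect formula. A small sketch, using a hypothetical logit coefficient of 0.3 on firm size:

```python
import math

def lam(z):
    # logistic CDF
    return 1.0 / (1.0 + math.exp(-z))

beta_size = 0.3   # hypothetical logit coefficient on firm size

# Marginal effect = Lambda(z)(1 - Lambda(z)) * beta, which peaks at p = 0.5
# and shrinks toward both tails of the probability scale.
for p in (0.10, 0.50, 0.90):
    z = math.log(p / (1.0 - p))              # index value that yields probability p
    me = lam(z) * (1.0 - lam(z)) * beta_size
    print(f"at p = {p:.2f}: marginal effect = {me:.3f}")
```

The same coefficient implies a 7.5 percentage-point effect at the steep middle of the curve but under 3 percentage points near either tail.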


D. Mathematical Derivation

Don't worry about the notation yet — here's what this means in words: We find the coefficients that make the observed data most likely, by maximizing the probability of seeing the 1s and 0s we actually observe.

For a binary outcome Y_i \in \{0, 1\} with probability p_i = P(Y_i = 1 \mid X_i) = \Lambda(X_i'\beta), the likelihood for observation i is:

L_i(\beta) = p_i^{Y_i} (1 - p_i)^{1 - Y_i}

The log-likelihood for the full sample is:

\ell(\beta) = \sum_{i=1}^{n} \left[ Y_i \ln(\Lambda(X_i'\beta)) + (1 - Y_i) \ln(1 - \Lambda(X_i'\beta)) \right]

Taking the derivative and using the fact that \Lambda'(z) = \Lambda(z)(1 - \Lambda(z)):

\frac{\partial \ell}{\partial \beta} = \sum_{i=1}^{n} (Y_i - \Lambda(X_i'\beta)) X_i = 0

These first-order conditions have no closed-form solution and must be solved numerically, typically via iteratively reweighted least squares (IRLS) or Newton-Raphson.
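As a concrete illustration of the Newton-Raphson route, here is a minimal logit fitter for an intercept plus one covariate; the helper name `fit_logit` and the tiny adoption sample are made up for this sketch:

```python
import math

def fit_logit(xs, ys, iters=25):
    """Newton-Raphson for a logit with intercept b0 and one slope b1.

    Iterates beta <- beta + H^{-1} g, where g is the score
    sum_i (y_i - Lambda(b0 + b1 x_i)) (1, x_i)' and H is the negative
    Hessian sum_i w_i (1, x_i)(1, x_i)' with w_i = p_i (1 - p_i).
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = 0.0            # score vector
        h00 = h01 = h11 = 0.0    # negative Hessian (2x2, symmetric)
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)    # Lambda'(z)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det   # 2x2 solve of H d = g
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Tiny made-up adoption sample: larger firms adopt more often,
# with enough overlap that there is no perfect separation.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
b0_hat, b1_hat = fit_logit(xs, ys)
print(f"b0 = {b0_hat:.3f}, b1 = {b1_hat:.3f}")
```

Because the logit log-likelihood is globally concave, Newton-Raphson converges quickly here; at the solution the score is numerically zero, which is exactly the first-order condition above.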

Marginal effects: The partial effect of X_j on the probability is:

\frac{\partial P(Y_i = 1 \mid X_i)}{\partial X_{ij}} = \Lambda(X_i'\beta)(1 - \Lambda(X_i'\beta)) \cdot \beta_j = \Lambda'(X_i'\beta) \cdot \beta_j

This expression depends on X_i, which is why you must evaluate it at specific values or average it across the sample.


E. Implementation

# Requires: marginaleffects
library(marginaleffects)

# --- Step 1: Fit the Logit Model ---
# glm() with family=binomial(link="logit") estimates via maximum likelihood.
# Coefficients are in LOG-ODDS units, not probability units.
logit_fit <- glm(adopt ~ firm_size + rd_spending + competition,
               family = binomial(link = "logit"), data = df)
# summary() shows log-odds coefficients, SEs, z-values, and Akaike Information Criterion (AIC)
summary(logit_fit)

# --- Step 2: Compute Average Marginal Effects (AMEs) ---
# AMEs translate log-odds coefficients into probability-scale effects.
# Each AME represents the average change in P(Y=1) for a one-unit change in X,
# averaged across all observations (accounting for nonlinearity).
# marginaleffects::avg_slopes() is the modern replacement for the archived margins package.
ame <- avg_slopes(logit_fit)
# Output: AME in percentage-point terms — the primary quantity to report
print(ame)

# --- Step 3: Compute Odds Ratios ---
# Exponentiate coefficients to get odds ratios: exp(beta).
# An OR of 1.35 means a one-unit increase in X multiplies the odds by 1.35.
exp(coef(logit_fit))
# Confidence intervals for odds ratios (profile likelihood-based)
exp(confint(logit_fit))

# --- Step 4: Fit Probit for Robustness ---
# Probit uses the normal CDF as the link function instead of logistic.
# Results should be substantively similar to logit — showing both
# demonstrates robustness to the choice of link function.
probit_fit <- glm(adopt ~ firm_size + rd_spending + competition,
                family = binomial(link = "probit"), data = df)
# Compare probit AMEs to logit AMEs — they should nearly agree
print(avg_slopes(probit_fit))

F. Diagnostics

Pseudo R-Squared

There is no true R^2 for logit/probit. McFadden's pseudo-R^2 compares the log-likelihood of your model to a null model (intercept only):

\text{Pseudo-}R^2 = 1 - \frac{\ell(\hat{\beta})}{\ell(\hat{\beta}_0)}

Values above 0.2 are sometimes informally considered indicative of good fit, though interpretation depends on context and there is no universal threshold. Do not compare pseudo-R^2 values across different link functions.
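The computation is a one-liner once you have the two log-likelihoods. In the sketch below the fitted-model log-likelihood is hypothetical, while the null log-likelihood is derived from a made-up base rate (it depends only on the sample share of ones):

```python
import math

# Hypothetical sample: 1,000 firms, 300 adopters
n, n_ones = 1000, 300
ybar = n_ones / n

# Intercept-only (null) log-likelihood depends only on the base rate
ll_null = n * (ybar * math.log(ybar) + (1.0 - ybar) * math.log(1.0 - ybar))

ll_model = -520.0   # hypothetical log-likelihood of the fitted model
pseudo_r2 = 1.0 - ll_model / ll_null
print(f"ll_null = {ll_null:.1f}, McFadden pseudo-R2 = {pseudo_r2:.3f}")
```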

Classification Table

Predict \hat{Y}_i = 1 if \hat{p}_i > c (usually c = 0.5) and compute the confusion matrix. Report sensitivity (true positive rate), specificity (true negative rate), and overall accuracy. But be cautious: classification accuracy is sensitive to class imbalance.
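A minimal sketch of these classification metrics at a 0.5 cutoff, using made-up outcomes and predicted probabilities:

```python
# Hypothetical outcomes and fitted probabilities for ten observations
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
p_hat  = [0.8, 0.7, 0.4, 0.9, 0.3, 0.6, 0.2, 0.1, 0.4, 0.2]

c = 0.5
y_pred = [1 if p > c else 0 for p in p_hat]

# Confusion-matrix cells
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)

sensitivity = tp / (tp + fn)          # true positive rate
specificity = tn / (tn + fp)          # true negative rate
accuracy = (tp + tn) / len(y_true)
print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}, accuracy = {accuracy:.2f}")
```

Note how a model could reach high accuracy on an imbalanced sample simply by predicting the majority class, which is why sensitivity and specificity should be reported alongside accuracy.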

Hosmer-Lemeshow Test

Groups observations into deciles of predicted probability and tests whether observed frequencies match predicted frequencies. A significant test suggests poor calibration, but the test has low power and is sensitive to the number of groups.


Three Ways to Report Logit Results

  1. Log-odds coefficients — the raw output. Hard to interpret; mainly useful for checking sign and significance.
  2. Odds ratios — e^{\beta_j}. "A one-unit increase in X multiplies the odds of Y=1 by e^{\beta_j}." Common in epidemiology and management.
  3. Marginal effects — the change in probability. Most intuitive. Preferred in economics.
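The three scales are simple transformations of one another. A sketch with a hypothetical coefficient of 0.30 and a hypothetical baseline probability of 0.4:

```python
import math

beta = 0.30   # hypothetical logit coefficient (log-odds units)

# Scale 1, log-odds: a one-unit increase in X raises the log-odds by 0.30.
# Scale 2, odds ratio: the same increase multiplies the odds by exp(0.30).
odds_ratio = math.exp(beta)

# Scale 3, marginal effect at a point: Lambda'(z) * beta = p(1 - p) * beta,
# evaluated here at a baseline probability of 0.4.
p = 0.4
marginal_effect = p * (1.0 - p) * beta
print(f"odds ratio = {odds_ratio:.3f}, marginal effect at p = 0.4: {marginal_effect:.3f}")
```

An average marginal effect would repeat the scale-3 calculation at every observation's fitted probability and average the results.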

G. What Can Go Wrong

| Problem | What It Does | How to Fix It |
| --- | --- | --- |
| Reporting coefficients as marginal effects | Overstates/understates the effect | Compute and report AMEs |
| Perfect separation | Maximum likelihood estimation (MLE) does not converge; coefficients explode to infinity | Drop the problematic variable, use penalized likelihood (Firth logit), or combine categories |
| Rare events | Finite-sample bias in predicted probabilities and intercept estimates when Y=1 is very rare (e.g., well under 5%). Slope coefficients are less affected, but predicted event probabilities can be substantially downward-biased. | Use rare-events logit (King & Zeng, 2001) or exact logit |
| Ignoring heteroskedasticity | Standard errors are wrong | Use robust SEs |
| Comparing coefficients across models | Logit coefficients are not comparable across models with different covariates (rescaling problem) (Allison, 1999) | Compare marginal effects instead |
| Neglected heterogeneity | When an unobserved variable independent of X is omitted, probit and logit coefficients are attenuated toward zero, but average partial effects (marginal effects) remain consistently estimated (Wooldridge, 2010). This result is one of the strongest arguments for reporting marginal effects rather than raw coefficients. | Report AMEs rather than raw coefficients |

Interpreting Logit Coefficients as Marginal Effects

Researcher computes average marginal effects after logit estimation

AME of firm size on adoption probability: 0.05 (SE = 0.017). A one-unit increase in firm size raises the probability of adoption by about 5 percentage points on average.


Perfect Separation

All covariate values have some variation in the outcome — both 0s and 1s appear at every level of X

Logit converges normally. Coefficient on industry dummy: 1.8 (SE = 0.4). MLE is well-defined and standard errors are reliable.


Comparing Logit Coefficients Across Models

Researcher compares average marginal effects across a baseline model and a model with additional controls

AME of R&D on adoption: 0.08 (baseline model) vs. 0.06 (with controls). The 2 percentage point decrease suggests modest confounding by the added covariates.

Concept Check

A logit regression of firm adoption on firm size produces a coefficient of 0.3 with robust SE 0.1. The average marginal effect is 0.05. How do you interpret the result?


H. Practice

Concept Check

A researcher runs a logit model and a probit model on the same data. The logit coefficient on firm size is 0.48 and the probit coefficient is 0.28. She concludes that the logit model estimates a much larger effect. Is she correct?

Concept Check

A logit model predicting loan default produces a coefficient of -0.8 on credit score (standardized). The odds ratio is exp(-0.8) = 0.45. A manager asks: 'So a one-SD increase in credit score cuts the default probability in half?' Is the manager correct?

Concept Check

A colleague says: 'I always use logit for binary outcomes because OLS can predict probabilities outside [0,1].' When might the linear probability model (LPM) actually be a reasonable choice?

Concept Check

You add an interaction term (firm_size * rd_spending) to a logit model. The coefficient on the interaction is 0.15 (p = 0.03). A reviewer says you cannot interpret the interaction effect by looking at this coefficient alone. Why?

Guided Exercise

Interpreting Logit Results: Loan Default Prediction

A bank analyst runs a logit regression to predict whether a small business loan will default. The dependent variable is Default (1 = defaulted, 0 = repaid). The key predictor is `Years_in_business` (continuous). The estimated logit coefficient is -0.4 and the average marginal effect is -0.06. The baseline default probability in the sample is 20%.

In what units is the logit coefficient (-0.4) expressed?

How do you interpret the average marginal effect of -0.06?

If a colleague says the odds of default decrease by 40% per additional year, are they correct?

Why can you not interpret the logit coefficient directly as a probability change?

Error Detective

Read the analysis below carefully and identify the errors.

A researcher studies whether receiving venture capital funding affects the probability that a startup goes public (IPO). They run a logit regression of IPO (0/1) on `VC_funded` (0/1), controlling for firm age, industry, and founder experience. They report: 'The coefficient on `VC_funded` is 1.2 (p < 0.01), meaning that VC funding increases the probability of IPO by 120 percentage points.'

Select all errors you can find:

Referee Exercise

Read the paper summary below and write a brief referee critique (2-3 sentences) of the identification strategy.

Paper Summary

The authors study whether firms with female CEOs are more likely to adopt environmental sustainability practices. Using a cross-section of 3,200 publicly traded firms, they run a logit regression of sustainability adoption (0/1) on a female CEO dummy, controlling for firm size (log revenue), industry dummies, ROA, and firm age. They report that the odds ratio on female CEO is 1.85 (p = 0.002) and conclude that female leadership causes firms to be 85% more likely to adopt sustainability practices.

Key Table

| Variable | Odds Ratio | Robust SE | p-value |
| --- | --- | --- | --- |
| Female CEO | 1.85 | 0.35 | 0.002 |
| Log(Revenue) | 1.42 | 0.08 | 0.000 |
| ROA | 1.10 | 0.22 | 0.640 |
| Firm age | 1.01 | 0.003 | 0.001 |
| Industry FE | Yes | | |
| Pseudo R-squared | 0.18 | | |
| N | 3,200 | | |

Authors' Identification Claim

By controlling for firm size, profitability, firm age, and industry, we isolate the independent effect of CEO gender on sustainability adoption.


I. Swap-In: When to Use Something Else

  • Linear Probability Model (LPM): If your probabilities are between 0.2 and 0.8 for most observations, the LPM with robust SEs often gives nearly identical average marginal effects. Easier to interpret and to combine with FE or IV.
  • Conditional logit (fixed effects logit): For panel data with unit fixed effects. Only uses within-unit variation. See Chamberlain (1980). Unlike logit, probit does not have an analogous conditional MLE that eliminates fixed effects; FE probit suffers from the incidental parameters problem and is inconsistent with fixed T (Wooldridge, 2010).
  • Multinomial logit: When the outcome has more than two unordered categories.
  • Ordered logit/probit: When the outcome has ordered categories (e.g., strongly disagree to strongly agree).
  • Count models: When the outcome is a non-negative integer (number of events), see Poisson / Negative Binomial instead.

J. Reviewer Checklist

Critical Reading Checklist


Paper Library

Foundational (7)

Ai, C., & Norton, E. C. (2003). Interaction Terms in Logit and Probit Models.

Economics Letters. DOI: 10.1016/S0165-1765(03)00032-6

Ai and Norton show that the interpretation of interaction terms in nonlinear models like logit and probit is much more complicated than in linear models. The marginal effect of an interaction is not simply the coefficient on the interaction term, a mistake that is widespread in applied research.

Allison, P. D. (1999). Comparing Logit and Probit Coefficients Across Groups.

Sociological Methods & Research. DOI: 10.1177/0049124199028002003

Allison shows that naive comparisons of logit or probit coefficients across groups are misleading because differences in residual variation across groups rescale the coefficients. He proposes a method to adjust for this confound, which is essential for interpreting interaction effects and group comparisons in nonlinear models.

Amemiya, T. (1981). Qualitative Response Models: A Survey.

Journal of Economic Literature

Amemiya provides a comprehensive survey of qualitative response models including logit, probit, and tobit. This survey organizes the theoretical properties, estimation methods, and specification tests for binary and multinomial choice models and becomes a standard reference for applied researchers.

Chamberlain, G. (1980). Analysis of Covariance with Qualitative Data.

Review of Economic Studies. DOI: 10.2307/2297110

Chamberlain extends the fixed effects approach to nonlinear models like logit, showing how to condition out the fixed effects in discrete choice settings. This work is fundamental for researchers who need fixed effects in models where the dependent variable is binary or categorical.

Hausman, J., & McFadden, D. (1984). Specification Tests for the Multinomial Logit Model.

Econometrica. DOI: 10.2307/1910997

Hausman and McFadden develop a specification test for the independence of irrelevant alternatives (IIA) assumption in multinomial logit. The test allows researchers to assess whether the logit model's restrictive substitution patterns are appropriate for their data, which is critical for applied work with multiple choice categories.

King, G., & Zeng, L. (2001). Logistic Regression in Rare Events Data.

King and Zeng develop a correction for logistic regression when the outcome event is rare. Standard logit underestimates the probability of rare events; their rare-events logit (relogit) applies a correction based on prior information about the event rate in the population. Essential reference for binary outcome studies with highly imbalanced classes.

McFadden, D. (1974). Conditional Logit Analysis of Qualitative Choice Behavior.

Frontiers in Econometrics

McFadden develops the conditional logit model grounded in random utility theory, showing how discrete choices among alternatives can be modeled by assuming individuals maximize utility with an extreme-value distributed error. This work earns him the 2000 Nobel Prize and remains the foundation of discrete choice analysis.

Application (3)

Hoetker, G. (2007). The Use of Logit and Probit Models in Strategic Management Research: Critical Issues.

Strategic Management Journal. DOI: 10.1002/smj.582

Hoetker reviews how strategy researchers use logit and probit models and identifies common pitfalls, including misinterpretation of coefficients across groups and incorrect use of interaction terms. This paper provides concrete guidance for improving practice in management journals.

Palepu, K. G. (1986). Predicting Takeover Targets: A Methodological and Empirical Analysis.

Journal of Accounting and Economics. DOI: 10.1016/0165-4101(86)90008-X

Palepu uses logit models to study takeover prediction and identifies methodological flaws in prior prediction studies, showing that targets are more difficult to predict than earlier work suggests. The paper highlights the importance of proper classification criteria and sampling methodology when applying binary choice models to rare-event corporate outcomes.

Zelner, B. A. (2009). Using Simulation to Interpret Results from Logit, Probit, and Other Nonlinear Models.

Strategic Management Journal. DOI: 10.1002/smj.783

Zelner advocates using simulation-based approaches to interpret and present results from nonlinear models in management research. By computing predicted probabilities and marginal effects via simulation, researchers can convey substantive significance more clearly than raw coefficients.

Survey (5)

Angrist, J. D., & Pischke, J.-S. (2009). Mostly Harmless Econometrics: An Empiricist's Companion.

Princeton University Press. DOI: 10.1515/9781400829828

Angrist and Pischke write one of the most influential modern textbooks on applied econometrics, organizing the field around a design-based approach to causal inference. The book provides essential treatments of instrumental variables, difference-in-differences, and regression discontinuity, each grounded in the potential outcomes framework. It remains the standard reference for graduate students learning to evaluate and implement identification strategies.

Cameron, A. C., & Trivedi, P. K. (2005). Microeconometrics: Methods and Applications.

Cambridge University Press. DOI: 10.1017/CBO9780511811241

Cameron and Trivedi cover panel data methods comprehensively in Chapter 21, including fixed effects, random effects, and dynamic panel models. A standard graduate-level reference for microeconometric methods.

Long, J. S. (1997). Regression Models for Categorical and Limited Dependent Variables.

SAGE Publications

Long provides a comprehensive reference for applied researchers working with binary, ordinal, multinomial, and count outcome models. The textbook covers maximum likelihood estimation, marginal effects computation, and model diagnostics with clear exposition and software implementation guidance. It remains the standard practical guide for researchers who need to move beyond OLS to handle categorical and limited dependent variables.

Train, K. E. (2009). Discrete Choice Methods with Simulation.

Cambridge University Press. DOI: 10.1017/CBO9780511805271

Train's textbook provides a comprehensive and accessible treatment of logit, probit, mixed logit, and other discrete choice models. It covers both theory and practical simulation-based estimation methods and is widely used in economics, marketing, and transportation research.

Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data.

MIT Press

Wooldridge's graduate textbook is the standard reference for cross-section and panel data econometrics. Chapters 10-11 provide a thorough treatment of fixed effects, random effects, and related panel data methods, while later chapters cover general estimation methodology (MLE, GMM, M-estimation) with panel data applications throughout. The book covers both linear and nonlinear models with careful attention to assumptions.

Tags

discrete-choice · binary-outcome · cross-sectional