MethodAtlas
Model-Based · Established

Cox Proportional Hazard Model

Models the hazard rate of an event (failure, exit, adoption) as a function of covariates, using a semiparametric baseline hazard that does not require distributional assumptions.

Quick Reference

When to Use
When your outcome is time-to-event (e.g., time to firm exit, CEO tenure, technology adoption, employee turnover) and you have right-censored observations (subjects who have not yet experienced the event by the end of the observation window).
Key Assumption
Proportional hazards: the ratio of hazard rates for any two individuals is constant over time. The baseline hazard h_0(t) is left completely unspecified (semiparametric). Non-informative censoring is also required.
Common Mistake
Not testing the proportional hazards assumption (use Schoenfeld residuals). If the assumption fails, the Cox model produces a weighted average of time-varying effects that may be misleading.
Estimated Time
3 hours

One-Line Implementation

Stata: stcox treatment x1 x2, efron vce(robust)
R: coxph(Surv(time, event) ~ treatment + x1 + x2, data = df, ties = 'efron')
Python: CoxPHFitter().fit(df, duration_col='time', event_col='event', formula='treatment + x1 + x2')


Motivating Example

A management researcher wants to know whether founder-CEOs stay in their positions longer than professional (externally hired) CEOs. She collects data on 1,200 CEO spells at publicly traded firms between 1995 and 2015, recording the date each CEO took office and, if applicable, the date they left.

Here is the problem: 40% of the CEOs in her sample are still in their position at the end of the study period in 2015. These are right-censored observations — she knows the CEO was still active at the end of 2015, but she does not know when they will eventually leave.

If she simply compares average observed tenure between the two groups, she will underestimate average tenure for both groups (because she is treating the end of the observation window as though it were the event date). Worse, if founder-CEOs are disproportionately censored (because they tend to stay longer), the bias will be asymmetric — she will underestimate founder-CEO tenure more than professional-CEO tenure, potentially masking the very difference she wants to detect.
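The direction and size of this bias are easy to see in a small simulation — a sketch with made-up numbers (exponential tenures with a true mean of 8 years, censored at a 20-year observation window):

```python
import random

random.seed(42)

def observed_tenures(true_rate, window, n):
    """Draw exponential tenures and censor them at the observation window."""
    return [min(random.expovariate(true_rate), window) for _ in range(n)]

# Hypothetical numbers: true mean tenure of 8 years, 20-year observation window
true_mean = 8.0
obs = observed_tenures(1 / true_mean, window=20.0, n=50_000)
naive_mean = sum(obs) / len(obs)

print(f"true mean tenure:    {true_mean:.2f} years")
print(f"naive observed mean: {naive_mean:.2f} years")  # biased downward
```

The naive average of observed durations comes out well below the true mean, because every censored spell is treated as if it ended at the window's edge.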

She cannot simply drop the censored observations either. Doing so would select on the outcome: she would be left with only CEOs who departed, which is a non-random subset. If the censored CEOs are systematically different (e.g., better performing), dropping them introduces selection bias.

The Cox proportional hazard model (Cox, 1972) solves this problem. It models the instantaneous rate of departure (the hazard rate) as a function of covariates, while properly accounting for censored observations. Censored CEOs contribute information up to the point they are last observed — they were "at risk" of departing during all the time they were observed, and the model uses this information without requiring knowledge of when they eventually leave.


A. Overview

What the Cox Model Does

The Cox proportional hazard model estimates hazard ratios — the multiplicative effect of covariates on the instantaneous rate of experiencing an event. The model is:

h(t | X_i) = h_0(t) \cdot \exp(X_i'\beta)

where:

  • h(t | X_i) is the hazard rate for individual i at time t
  • h_0(t) is the baseline hazard — the hazard when all covariates equal zero
  • \exp(X_i'\beta) is the multiplicative shift due to covariates

The critical feature of the Cox model is that it is semiparametric: the baseline hazard h_0(t) is left completely unspecified. You do not need to assume that it follows any particular distribution (exponential, Weibull, log-normal, etc.). The model only estimates how covariates shift the hazard, not the shape of the hazard itself.

Key Concepts

  • Survival function S(t) = P(T > t): the probability of surviving (not experiencing the event) beyond time t. The Kaplan-Meier estimator provides a nonparametric estimate of this function (Kaplan & Meier, 1958).

  • Hazard rate h(t): the instantaneous risk of the event at time t, conditional on having survived to t. Unlike a probability, the hazard rate can exceed 1 because it is a rate (events per unit time), not a probability.

  • Right censoring: when the event has not occurred by the end of the observation period. The Cox model handles this by including censored observations in the risk set up to the censoring time but not requiring them to contribute an event.

  • Proportional hazards: the assumption that the ratio of hazards for any two individuals is constant over time. Proportional hazards is the defining assumption of the Cox model.
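The product-limit logic behind the Kaplan-Meier estimator can be sketched in a few lines of pure Python — a toy illustration, not a replacement for `survfit()` in R or lifelines in Python:

```python
def kaplan_meier(times, events):
    """Product-limit estimate of S(t) from right-censored data.

    times:  observed durations (event or censoring time)
    events: 1 if the event occurred, 0 if censored
    """
    # Distinct event times, in increasing order
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    surv, s = [], 1.0
    for tk in event_times:
        at_risk = sum(1 for t in times if t >= tk)                     # risk set size
        d = sum(1 for t, e in zip(times, events) if t == tk and e == 1)
        s *= 1 - d / at_risk                                           # product-limit step
        surv.append((tk, s))
    return surv

# Toy data: events at t = 2, 4, 6; one subject censored at t = 5
times  = [2, 4, 5, 6]
events = [1, 1, 0, 1]
for t, s in kaplan_meier(times, events):
    print(t, round(s, 3))
```

Note how the subject censored at t = 5 shrinks no survival step itself but still sits in the risk set for the events at t = 2 and t = 4 — exactly the sense in which censored observations contribute information.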

How It Differs from OLS

The key difference from OLS is that the Cox model correctly handles censored observations. In a regression of observed duration on covariates, OLS treats censored observations as though their observed time is the true duration — biasing coefficients downward. The Cox model instead uses the partial likelihood, which conditions on the observed ordering of events and does not require knowing the full distribution of event times.

When to Use the Cox Model

  • Your outcome is time-to-event: time to firm exit, CEO departure, technology adoption, employee turnover, patent citation, project completion, loan default
  • You have right-censored observations: subjects who have not experienced the event by the end of your study
  • You want to estimate how covariates affect the rate of the event rather than a binary yes/no outcome
  • You do not want to assume a particular parametric form for the baseline hazard

When NOT to Use the Cox Model

  • Your outcome is binary with no meaningful time dimension — use logit/probit
  • Your outcome is a continuous variable that is not a duration — use OLS
  • The proportional hazards assumption is violated and cannot be fixed by stratification — consider accelerated failure time (AFT) models
  • You have multiple competing event types (e.g., CEO can leave voluntarily, be fired, or retire) and you want event-specific effects — use competing risks models (Fine & Gray, 1999)

When to Use (Detailed)

  1. Your outcome is a duration with censoring. Time to firm exit, CEO tenure, time to technology adoption, employee turnover, time to loan default, patent lifetime — any setting where you observe a time-to-event and some subjects have not yet experienced the event.

  2. You want to estimate covariate effects without specifying the baseline hazard. If you do not have a strong prior about the shape of the hazard over time (whether it increases, decreases, or is constant), the semiparametric Cox model is the safe choice.

  3. You want to compare groups controlling for covariates. The Cox model provides hazard ratios that quantify how much faster or slower the event occurs for one group relative to another, holding other factors constant.

When NOT to Use (Detailed)

  1. The proportional hazards assumption is badly violated. If a treatment effect wears off sharply over time, the Cox model produces a single "average" hazard ratio that may not describe any actual time period well. Consider:

    • Stratified Cox model: allows different baseline hazards for different groups
    • Time-varying coefficients: interact the covariate with time
    • Accelerated failure time (AFT) models: model log(time) directly
  2. You have interval-censored data. If you only know that the event occurred between two time points (e.g., between annual surveys), use discrete-time hazard models rather than the standard Cox model.

  3. Multiple event types compete. If a CEO can leave voluntarily, be fired, or retire, standard Cox treats the competing events as censoring events, which can bias results. Use the Fine-Gray competing risks model (Fine & Gray, 1999) or cause-specific hazard models instead.

  4. You need to estimate the baseline hazard. The Cox model does not directly estimate h_0(t). If you need the baseline hazard for prediction or simulation, consider parametric alternatives (Weibull, Gompertz).


Connection to Other Methods

The Cox model sits within a broader ecosystem of methods for different outcome types:

  • Logit/Probit: models whether the event occurred (binary 0/1) but ignores when. Use logit when the time dimension is not meaningful. Use Cox when timing matters. Shumway (2001) shows that switching from static logit to a hazard model significantly improves bankruptcy prediction accuracy.

  • OLS on duration: regressing observed duration on covariates ignores censoring. Censored observations are treated as though their observed time is the true event time, biasing all coefficients toward zero (attenuation bias). This approach is never appropriate when censoring is present.

  • Competing risks (Fine-Gray): extends the Cox framework to settings with multiple event types (Fine & Gray, 1999). The standard Cox model treats competing events as censored, which can be problematic. The Fine-Gray model estimates the effect of covariates on the cumulative incidence of a specific event type.

  • Parametric survival models: Weibull, exponential, Gompertz, and log-normal models fully specify the baseline hazard. They are more efficient than Cox when the distributional assumption is correct, but biased when it is wrong.

  • Fixed effects: in panel settings with repeated spells, stratified Cox models can be used to absorb time-invariant unobserved heterogeneity (analogous to fixed effects).
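The parametric alternatives above fully specify the baseline hazard. A minimal sketch of the Weibull hazard shows what that commitment buys: the shape parameter p fixes whether risk rises, stays flat, or falls over time (the scale lam = 5 here is arbitrary):

```python
def weibull_hazard(t, lam, p):
    """Weibull hazard h(t) = (p / lam) * (t / lam) ** (p - 1)."""
    return (p / lam) * (t / lam) ** (p - 1)

# Shape p < 1: falling risk; p = 1: constant (exponential); p > 1: rising risk
for p in (0.5, 1.0, 2.0):
    h = [weibull_hazard(t, lam=5.0, p=p) for t in (1.0, 5.0, 10.0)]
    print(p, [round(v, 4) for v in h])
```

If the true hazard really is Weibull, imposing this shape is more efficient than leaving h_0(t) unspecified; if it is not, the Cox model's agnosticism is the safer bet.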


Common Confusions


B. Identification

For the Cox model to provide valid inference, three key assumptions must hold.

Assumption 1: Proportional Hazards

Plain language: The effect of a covariate on the hazard rate is constant over time. If founder-CEOs have a 30% lower departure hazard than professional CEOs at year 1, they also have a 30% lower hazard at year 5, year 10, and so on.

Formally: h(t | X_i) = h_0(t) \cdot \exp(X_i'\beta), where \beta does not depend on t.

This property means the hazard ratio for any two individuals is:

\frac{h(t | X_i)}{h(t | X_j)} = \frac{h_0(t) \exp(X_i'\beta)}{h_0(t) \exp(X_j'\beta)} = \exp((X_i - X_j)'\beta)

The baseline hazard h_0(t) cancels, and the ratio is constant over time. If the proportional hazards assumption fails — for example, if a treatment effect wears off over time — the Cox model estimates a weighted average of the time-varying effect, which may be misleading (Grambsch & Therneau, 1994).
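A quick numeric check of this cancellation — a sketch with a hypothetical coefficient beta = -0.36 and two deliberately different, arbitrary baseline hazard shapes:

```python
import math

beta = -0.36  # hypothetical coefficient for a binary covariate (e.g., founder-CEO)

def hazard(t, x, baseline):
    """Cox hazard: h(t | x) = baseline(t) * exp(x * beta)."""
    return baseline(t) * math.exp(x * beta)

# Two very different (arbitrary) baseline hazard shapes
rising  = lambda t: 0.02 * t
falling = lambda t: 0.5 * math.exp(-0.3 * t)

for h0 in (rising, falling):
    ratios = [hazard(t, 1, h0) / hazard(t, 0, h0) for t in (1.0, 5.0, 10.0)]
    print([round(r, 3) for r in ratios])  # identical at every t, for both baselines
```

Whatever shape the baseline takes, the ratio of hazards is exp(-0.36) at every time point — exactly the proportional hazards property.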

Assumption 2: Non-Informative Censoring

Plain language: The reason an observation is censored is unrelated to the likelihood of the event occurring. In the CEO example, this means that the end of the study period, firm going private, or data unavailability is not systematically related to whether the CEO would have left soon.

Formally: T \perp C \mid X, where T is the true event time and C is the censoring time.

This assumption is violated, for example, if healthier patients selectively drop out of a clinical trial, or if firms with troubled CEOs are more likely to be acquired (and thus disappear from the sample).

Assumption 3: Correct Specification (for Causal Interpretation)

Plain language: For the Cox model coefficients to have a causal interpretation, the covariates must be exogenous — the same zero conditional mean assumption required for OLS. If there are unobserved confounders that affect both the covariates and the hazard, the coefficient estimates are biased.

Formally: E[\varepsilon_i | X_i] = 0, where \varepsilon_i captures all unobserved factors affecting the hazard.

This requirement is the same exogeneity concern as in any regression model. The Cox model does not solve endogeneity — it handles censoring and the functional form for duration data.


C. Visual Intuition

Adjust the hazard ratio and censoring rate to see how the Kaplan-Meier survival curves diverge between treatment and control groups. Higher censoring makes the curves noisier and the log-rank test less powerful.

Interactive Simulation

Survival Curve Simulator

Simulate Kaplan-Meier survival curves for treatment and control groups with exponential survival times. Adjust the hazard ratio, sample size, censoring rate, and baseline hazard to see how they affect the survival curves and the log-rank test.

[Figure: simulated Kaplan-Meier survival curves for Control and Treatment (HR = 0.70), survival probability vs. time; log-rank p < 0.001]

Results

  • Hazard Ratio (HR): 0.70
  • N per group: 200
  • Events (control): 173
  • Events (treatment): 178
  • Log-rank p-value: < 0.001

Simulation parameters:

  • Subjects per group: 200 (number of subjects in each group)
  • Hazard ratio: 0.7 (HR < 1: treatment protective; HR > 1: treatment harmful)
  • Censoring rate: 0.30 (proportion of subjects subject to random right-censoring)
  • Baseline hazard: 0.10 (hazard rate for the control group)

Statistically significant difference. The log-rank test rejects the null hypothesis of equal survival curves (p < 0.001).


D. Mathematical Derivation

Partial Likelihood Derivation

Don't worry about the notation yet — here's what this means in words: The partial likelihood eliminates the baseline hazard by conditioning on the set of individuals at risk at each event time. It estimates beta without requiring distributional assumptions on h_0(t).

Setup. Suppose there are K distinct event times t_1 < t_2 < \cdots < t_K. At event time t_k, let \mathcal{R}(t_k) denote the risk set — the set of individuals who are still under observation (have not yet experienced the event or been censored) just before t_k.

Step 1: Conditional probability of failure. At time t_k, exactly one individual (say, individual j_k) experiences the event. Conditional on one event occurring at t_k, the probability that it is individual j_k (rather than anyone else in the risk set) is:

P(\text{individual } j_k \text{ fails} \mid \text{one failure at } t_k) = \frac{h(t_k | X_{j_k})}{\sum_{l \in \mathcal{R}(t_k)} h(t_k | X_l)}

Step 2: Cancel the baseline hazard. Under the proportional hazards model:

\frac{h_0(t_k) \exp(X_{j_k}'\beta)}{\sum_{l \in \mathcal{R}(t_k)} h_0(t_k) \exp(X_l'\beta)} = \frac{\exp(X_{j_k}'\beta)}{\sum_{l \in \mathcal{R}(t_k)} \exp(X_l'\beta)}

The baseline hazard h_0(t_k) cancels. This cancellation is the key insight of the partial likelihood: by conditioning on the risk set, we eliminate the nuisance parameter h_0(t).

Step 3: Construct the partial likelihood. The partial likelihood is the product over all event times:

PL(\beta) = \prod_{k=1}^{K} \frac{\exp(X_{j_k}'\beta)}{\sum_{l \in \mathcal{R}(t_k)} \exp(X_l'\beta)}

The log partial likelihood is:

\ell(\beta) = \sum_{k=1}^{K} \left[ X_{j_k}'\beta - \ln\left(\sum_{l \in \mathcal{R}(t_k)} \exp(X_l'\beta)\right) \right]

Step 4: Estimate \beta. Maximize \ell(\beta) numerically (Newton-Raphson). The resulting \hat{\beta} is consistent and asymptotically normal under regularity conditions.

Step 5: Variance estimation. The variance of \hat{\beta} is estimated from the inverse of the observed information matrix. For robust inference, use the sandwich (Lin-Wei) variance estimator (Lin et al., 1993), which is analogous to robust standard errors in OLS.

Handling ties. When multiple events occur at the same time, the partial likelihood must be adjusted. The Efron approximation is preferred over the Breslow approximation (which is biased when ties are common). All code on this page uses the Efron method.
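The no-ties version of Steps 3-4 is compact enough to sketch directly. This is a toy illustration with a single covariate and grid search in place of Newton-Raphson; real analyses should use coxph or lifelines, which also handle ties via the Efron method:

```python
import math

def log_partial_likelihood(beta, times, events, x):
    """Cox log partial likelihood for one covariate, assuming no tied event times."""
    ll = 0.0
    for i, (ti, ei) in enumerate(zip(times, events)):
        if ei == 0:
            continue  # censored subjects contribute only through risk sets
        # Risk set: everyone still under observation just before t_i
        risk = [j for j in range(len(times)) if times[j] >= ti]
        ll += x[i] * beta - math.log(sum(math.exp(x[j] * beta) for j in risk))
    return ll

# Toy data: subjects with x = 1 tend to fail later (protective covariate)
times  = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
events = [1,   1,   1,   0,   1,   1,   0,   1]
x      = [0,   0,   1,   0,   0,   1,   1,   1]

# Crude 1-D maximization by grid search (real software uses Newton-Raphson)
grid = [b / 100 for b in range(-300, 301)]
beta_hat = max(grid, key=lambda b: log_partial_likelihood(b, times, events, x))
print(f"beta_hat = {beta_hat:.2f}, HR = {math.exp(beta_hat):.2f}")
```

Note that no baseline hazard appears anywhere in the likelihood — only the ordering of events and the composition of each risk set, which is the semiparametric point of Steps 1-2.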


E. Implementation

Cox Regression with Diagnostics

library(survival)

# ---- Step 1: Kaplan-Meier curves ----
# Visualize survival by group before fitting a regression
km_fit <- survfit(Surv(tenure, departed) ~ founder_ceo, data = df)
plot(km_fit, col = c("blue", "red"), lwd = 2,
   xlab = "Years", ylab = "Survival probability",
   main = "CEO Tenure by Type")
legend("topright", c("Professional CEO", "Founder CEO"),
     col = c("blue", "red"), lwd = 2)

# Log-rank test: are the survival curves significantly different?
survdiff(Surv(tenure, departed) ~ founder_ceo, data = df)

# ---- Step 2: Cox regression ----
cox_fit <- coxph(Surv(tenure, departed) ~ founder_ceo + firm_size +
                 roa + industry,
               data = df,
               ties = "efron")
summary(cox_fit)

# Hazard ratios with 95% confidence intervals
exp(cbind(HR = coef(cox_fit), confint(cox_fit)))

# ---- Step 3: Test proportional hazards ----
ph_test <- cox.zph(cox_fit)
print(ph_test)
plot(ph_test)  # Scaled Schoenfeld residuals vs time

# ---- Step 4: Predicted survival curves ----
# Survival curves for a specific covariate profile
newdata <- data.frame(founder_ceo = c(0, 1),
                    firm_size = median(df$firm_size),
                    roa = median(df$roa),
                    industry = "Manufacturing")
surv_pred <- survfit(cox_fit, newdata = newdata)
plot(surv_pred, col = c("blue", "red"), lwd = 2,
   xlab = "Years", ylab = "Survival probability")
legend("topright", c("Professional CEO", "Founder CEO"),
     col = c("blue", "red"), lwd = 2)

F. Diagnostics

F.1 Schoenfeld Residuals (Proportional Hazards Test)

The most important diagnostic for the Cox model. Grambsch and Therneau (1994) proposed testing the PH assumption by examining whether Schoenfeld residuals trend with time. Under the null hypothesis of proportional hazards, the scaled Schoenfeld residuals for each covariate should show no systematic pattern over time.

  • Global test: tests whether any covariate violates PH (reported by cox.zph() in R, estat phtest in Stata, check_assumptions() in lifelines)
  • Covariate-specific test: tests each covariate individually
  • Visual inspection: plot scaled Schoenfeld residuals against time. A flat line (zero slope) supports PH; a trend suggests violation

If the PH assumption is violated for a specific covariate:

  1. Stratify: coxph(Surv(time, event) ~ x1 + strata(x2), data = df) — allows different baseline hazards by strata
  2. Time interaction: include x \times \ln(t) or x \times t to allow the effect to vary over time
  3. Split the time axis: estimate separate models for early and late periods

F.2 Cox-Snell Residuals (Overall Fit)

Cox-Snell residuals assess overall model fit. If the model is correct, these residuals should follow a unit exponential distribution. Plot the Nelson-Aalen cumulative hazard of the Cox-Snell residuals against the residuals themselves — the plot should follow a 45-degree line. Systematic departures indicate poor overall fit.

F.3 Log-Log Survival Plot

Plot \ln(-\ln(\hat{S}(t))) versus \ln(t) for different groups. Under proportional hazards, these curves should be approximately parallel. Crossing or converging curves indicate PH violation. This graphical diagnostic is especially useful for categorical covariates.
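The algebra behind the parallelism is simple: under PH, S_1(t) = S_0(t)^{HR}, so the log-log curves differ by the constant \ln(HR) at every t. A quick numeric check (hypothetical HR and an arbitrary baseline survival function):

```python
import math

HR = 0.65                                   # hypothetical hazard ratio
S0 = lambda t: math.exp(-0.1 * t ** 1.3)    # arbitrary baseline survival function

def cloglog(s):
    """Complementary log-log transform ln(-ln(s)) of a survival probability."""
    return math.log(-math.log(s))

# Vertical gap between the two log-log curves at several times
gaps = [cloglog(S0(t) ** HR) - cloglog(S0(t)) for t in (1.0, 3.0, 7.0, 15.0)]
print([round(g, 6) for g in gaps])          # every gap equals ln(0.65)
```

A constant gap of \ln(0.65) at all times is exactly what "parallel curves" means in this plot; any t-dependence in the gap signals a PH violation.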

F.4 Martingale Residuals (Functional Form)

Martingale residuals assess whether continuous covariates enter the model with the correct functional form (Lin et al., 1993). Plot martingale residuals from a null model (no covariates) against each covariate. A nonlinear pattern suggests the covariate should be transformed (log, polynomial) or categorized.

F.5 Deviance Residuals (Outliers)

Deviance residuals are a normalized transformation of martingale residuals that are more symmetrically distributed around zero. Observations with large positive deviance residuals experienced the event "too soon" relative to the model's prediction; large negative values experienced it "too late" or were censored unexpectedly early.

library(survival)

cox_fit <- coxph(Surv(tenure, departed) ~ founder_ceo + firm_size + roa,
               data = df, ties = "efron")

# F.1 Schoenfeld residuals — PH test
ph_test <- cox.zph(cox_fit)
print(ph_test)          # Global and per-covariate tests
par(mfrow = c(1, 3))
plot(ph_test)            # Scaled Schoenfeld residuals vs time

# F.2 Cox-Snell residuals — overall fit
cs_resid <- df$departed - resid(cox_fit, type = "martingale")
surv_cs <- survfit(Surv(cs_resid, df$departed) ~ 1)
plot(surv_cs, fun = "cumhaz",
   xlab = "Cox-Snell residuals",
   ylab = "Cumulative hazard",
   main = "Cox-Snell Residual Plot")
abline(0, 1, col = "red", lty = 2)

# F.3 Log-log survival plot
km <- survfit(Surv(tenure, departed) ~ founder_ceo, data = df)
plot(km, fun = "cloglog",
   xlab = "ln(time)", ylab = "ln(-ln(S(t)))",
   col = c("blue", "red"), main = "Log-Log Plot")
legend("topleft", c("Professional", "Founder"),
     col = c("blue", "red"), lwd = 2)

# F.4 Martingale residuals — functional form
null_fit <- coxph(Surv(tenure, departed) ~ 1, data = df)
mart_resid <- resid(null_fit, type = "martingale")
plot(df$firm_size, mart_resid,
   xlab = "Firm size", ylab = "Martingale residuals",
   main = "Functional Form Check")
lines(lowess(df$firm_size, mart_resid), col = "red", lwd = 2)

Hazard Ratios vs. Coefficients

The Cox model estimates \beta coefficients, but results are typically reported as hazard ratios \exp(\beta):

Coefficient \beta | Hazard Ratio \exp(\beta) | Interpretation
\beta = -0.36 | HR = 0.70 | 30% lower hazard (event rate)
\beta = 0 | HR = 1.00 | No effect on hazard
\beta = 0.41 | HR = 1.50 | 50% higher hazard
\beta = 0.69 | HR = 2.00 | Doubled hazard

For a continuous covariate: HR = 1.05 means a one-unit increase in X is associated with a 5% increase in the instantaneous event rate, holding other covariates constant.

For a binary covariate: HR = 0.65 means the group with X = 1 has a hazard that is 65% of the reference group's hazard — a 35% lower event rate at every point in time.
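Converting reported coefficients into hazard ratios with confidence intervals is simple arithmetic: exponentiate the coefficient and the endpoints of its normal-theory interval. A sketch using a hypothetical founder-CEO-style estimate (beta = -0.43, SE = 0.11):

```python
import math

def hazard_ratio(beta, se, z=1.96):
    """Convert a Cox coefficient and its standard error to an HR with a 95% CI."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# Hypothetical estimate: beta = -0.43, SE = 0.11
hr, lo, hi = hazard_ratio(-0.43, 0.11)
print(f"HR = {hr:.2f}, 95% CI = [{lo:.2f}, {hi:.2f}]")  # → HR = 0.65, 95% CI = [0.52, 0.81]
```

Because exponentiation is monotone, the CI for the HR is just the exponentiated CI for beta — but note that it is no longer symmetric around the point estimate.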

What to Report in a Table

A well-reported Cox regression table should include:

  1. Hazard ratios (not just coefficients) with 95% confidence intervals
  2. Number of subjects and number of events (not just total N)
  3. Median follow-up time or total person-time at risk
  4. Proportion censored
  5. PH test results (Schoenfeld global test p-value)
  6. Tie-handling method (Efron vs. Breslow)
  7. Standard error type (robust/sandwich if used)

G. What Can Go Wrong

Assumption Failure Demo

Ignoring Censoring (OLS on Observed Durations)

Cox model properly handles right-censored observations

Hazard ratio for founder CEO: 0.65 (SE = 0.08). Founder-CEOs have a 35% lower departure rate at any point in time, consistent with the Kaplan-Meier curves showing longer survival.

Assumption Failure Demo

Proportional Hazards Violation

A clinical trial where the treatment reduces hazard at a constant rate over time (PH holds)

Cox HR = 0.60 (SE = 0.12). The treatment reduces the hazard by 40% at all time points. Schoenfeld residual test p = 0.45 (PH not rejected).

Assumption Failure Demo

Informative Censoring

Censoring is administrative — all subjects are followed until a fixed end date, with censoring due only to the study ending

Cox HR = 0.70 (SE = 0.09). Censoring is non-informative because it depends only on the calendar date, not on patient characteristics.

Assumption Failure Demo

Competing Risks Ignored

Researcher accounts for competing risks — CEO departure modeled separately for voluntary resignation, forced dismissal, and retirement using cause-specific or Fine-Gray models

Cause-specific HR for forced dismissal: 1.8 for low-performing firms. HR for voluntary resignation: 1.1 (not significant). The effect is concentrated in forced departures.


H. Practice

H.1 Concept Checks

Concept Check

A researcher studies time to firm bankruptcy using a sample of 500 firms observed over 10 years. Of these, 150 go bankrupt and 350 survive to the end of the study. The researcher runs OLS: duration = beta_0 + beta_1 * leverage + beta_2 * size + epsilon. What is wrong with this approach?

Concept Check

A Cox regression of employee turnover on job satisfaction produces a hazard ratio of 0.70 (95% CI: [0.58, 0.85]). How should this be interpreted?

Concept Check

After fitting a Cox model for CEO tenure, you run the Schoenfeld residual test and find that the global test p-value is 0.002, with the covariate 'firm_performance' showing a significant trend (p = 0.001). What does this mean and what should you do?

H.2 Guided Exercise

Guided Exercise

Interpreting Cox Regression Output

You study CEO tenure at S&P 500 firms. Your Cox model produces the following output:

Variable | Coeff (beta) | SE | HR [exp(beta)] | 95% CI HR | p-value
Founder CEO | -0.43 | 0.11 | 0.65 | [0.52, 0.81] | < 0.001
Firm size (log) | -0.15 | 0.06 | 0.86 | [0.76, 0.97] | 0.012
ROA | 0.02 | 0.08 | 1.02 | [0.87, 1.19] | 0.820
Board independence | 0.31 | 0.14 | 1.36 | [1.04, 1.79] | 0.027

N = 1,200 CEO spells; 720 events (departures); 480 censored (40%). Schoenfeld global test p = 0.38. Efron method for ties.

What does the hazard ratio of 0.65 for Founder CEO mean?

How do you interpret the HR of 1.36 for Board independence?

Is ROA a significant predictor of CEO departure? How do you know?

Is the proportional hazards assumption satisfied? How do you know?

H.3 Error Detective

Error Detective

Read the analysis below carefully and identify the errors.

A management researcher studies time to first international expansion using a sample of 800 domestic firms observed from 2000 to 2020. Of these, 300 expanded internationally and 500 remained domestic. The researcher runs:

coxph(Surv(years_observed, expanded) ~ firm_size + rd_intensity + industry, data = df)

She reports: "Large firms internationalize 2.3 times faster than small firms (HR = 2.3, p < 0.001). The Kaplan-Meier curve shows a median time to internationalization of 8 years." She does not report any diagnostic tests. In the data, 60 firms were acquired during the study period (and thus could no longer expand internationally). These were coded as censored (expanded = 0).

Select all errors you can find:

Error Detective

Read the analysis below carefully and identify the errors.

A health economist studies time to hospital readmission after heart surgery. She has 2,000 patients, of whom 800 are readmitted within 1 year and 1,200 are not readmitted. She fits a Cox model:

stset readmit_days, failure(readmitted)
stcox age female diabetes surgery_type, efron

She finds HR for diabetes = 1.45 (p = 0.02). She concludes: "Diabetic patients have a 45% higher probability of readmission." She also notes that the Schoenfeld test for diabetes gives p = 0.04 but does not discuss this result. She reports that 200 patients died during follow-up and were coded as censored.

Select all errors you can find:

H.4 You Are the Referee

Referee Exercise

Read the paper summary below and write a brief referee critique (2-3 sentences) of the identification strategy.

Paper Summary

The authors study whether founder-CEOs survive longer in their positions than professional CEOs using a Cox proportional hazard model on 950 CEO spells at publicly traded firms (2000-2018). They find that founder-CEOs have a significantly lower departure hazard (HR = 0.58, p < 0.001), controlling for firm size, ROA, board independence, and industry fixed effects. They do not test the proportional hazards assumption. Forty-five CEOs in the sample died in office; these were coded as censored observations.

Key Table

Variable | Coefficient | SE | p-value
Founder CEO | -0.54 | 0.13 | <0.001
Firm size (log) | -0.18 | 0.07 | 0.010
ROA | 0.05 | 0.09 | 0.578
Board independence | 0.29 | 0.15 | 0.053

N = 950; events (departures) = 580; censored = 370.

Authors' Identification Claim

The authors argue that the Cox model identifies the causal effect of founder status on CEO tenure by controlling for observable firm characteristics.


I. Swap-In: When to Use Something Else

  • Parametric models (Weibull, exponential, Gompertz): when you need to estimate the baseline hazard — for prediction, simulation, or forecasting. More efficient than Cox when the distributional assumption is correct. The Weibull is the most commonly used: its shape parameter can accommodate monotonically increasing or decreasing hazards. See Therneau and Grambsch (2000) for extensions.

  • Accelerated failure time (AFT) models: when the proportional hazards assumption is violated. AFT models parameterize the effect of covariates as a multiplicative shift in the time scale, not the hazard scale: \ln(T) = X'\gamma + \sigma\varepsilon. This formulation is analogous to OLS on log(duration) but with proper censoring handling. Common distributions: log-normal, log-logistic, Weibull.

  • Discrete-time hazard models: when time is measured in discrete intervals (e.g., annual employment spells, quarterly observations) or when you have interval-censored data. These are logit or complementary log-log models applied to person-period data (Singer & Willett, 2003).

  • Competing risks (Fine-Gray): when multiple event types exist and you want to estimate effects on the cumulative incidence of a specific event, rather than the cause-specific hazard (Fine & Gray, 1999). The Fine-Gray model accounts for the fact that experiencing a competing event removes the subject from risk for the event of interest.

  • Frailty models (random effects): when you have clustered data (e.g., employees within firms, patients within hospitals) and want to account for unobserved heterogeneity. A frailty term adds a random effect to the Cox model (Therneau & Grambsch, 2000).
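For the discrete-time route above, the first practical step is reshaping spell data into person-period format: one row per subject per interval, with a binary indicator for whether the event occurred in that interval, ready for a logit or cloglog fit. A minimal sketch (the helper and its column layout are illustrative, not from any library):

```python
def person_periods(subject_id, duration, event):
    """Expand one spell into (subject, period, event_this_period) rows.

    duration: number of completed discrete periods observed
    event:    1 if the spell ended with the event, 0 if censored
    """
    return [(subject_id, period, int(event == 1 and period == duration))
            for period in range(1, duration + 1)]

# Subject "A" departs in year 3; subject "B" is censored after year 2
print(person_periods("A", 3, 1))  # → [('A', 1, 0), ('A', 2, 0), ('A', 3, 1)]
print(person_periods("B", 2, 0))  # → [('B', 1, 0), ('B', 2, 0)]
```

Censored spells simply contribute rows with no event — the discrete-time analogue of staying in the risk set until the censoring time.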


J. Reviewer Checklist

Critical Reading Checklist


Paper Library

Foundational (5)

Cox, D. R. (1972). Regression Models and Life-Tables.

Journal of the Royal Statistical Society: Series B (Methodological). DOI: 10.1111/j.2517-6161.1972.tb00899.x

Introduces the proportional hazards model with an unspecified baseline hazard, estimated via partial likelihood. The semiparametric approach avoids distributional assumptions on the baseline hazard while allowing covariate effects to be estimated consistently. One of the most cited papers in statistics.

Kaplan, E. L., & Meier, P. (1958). Nonparametric Estimation from Incomplete Observations.

Journal of the American Statistical Association. DOI: 10.1080/01621459.1958.10501452

Introduces the product-limit estimator (Kaplan-Meier estimator) for the survival function from right-censored data. The KM curve is the standard nonparametric tool for visualizing survival and comparing groups before fitting regression models.

Lin, D. Y., Wei, L. J., & Ying, Z. (1993). Checking the Cox Model with Cumulative Sums of Martingale-Based Residuals.

Develops graphical and numerical methods for checking the Cox model using cumulative sums of martingale-based residuals. Provides formal tests for the proportional hazards assumption, functional form of covariates, and overall model adequacy.

Grambsch, P. M., & Therneau, T. M. (1994). Proportional Hazards Tests and Diagnostics Based on Weighted Residuals.

Introduces the scaled Schoenfeld residual test for the proportional hazards assumption. Plotting scaled Schoenfeld residuals against time reveals time-varying effects. The test is the standard diagnostic in applied survival analysis.

Fine, J. P., & Gray, R. J. (1999). A Proportional Hazards Model for the Subdistribution of a Competing Risk.

Journal of the American Statistical Association. DOI: 10.1080/01621459.1999.10474144

Develops a regression model for the cumulative incidence function under competing risks. The Fine-Gray model extends the Cox framework to settings where multiple event types compete, allowing estimation of covariate effects on the subdistribution hazard.

Application (1)

Shumway, T. (2001). Forecasting Bankruptcy More Accurately: A Simple Hazard Model.

Journal of Business. DOI: 10.1086/209665

Shows that discrete-time hazard models outperform static logit models for bankruptcy prediction because they properly account for the time dimension and censoring. Demonstrates the importance of survival analysis framing for event prediction in finance.

Survey (3)

Singer, J. D., & Willett, J. B. (2003). Applied Longitudinal Data Analysis: Modeling Change and Event Occurrence.

Accessible textbook covering both growth curve models and discrete-time survival analysis. Chapters 9-15 provide a clear introduction to hazard modeling for social science researchers, with worked examples and practical guidance.

Cleves, M., Gould, W., & Marchenko, Y. (2016). An Introduction to Survival Analysis Using Stata.

Stata Press

Comprehensive practical guide to survival analysis in Stata. Covers Kaplan-Meier estimation, Cox regression, parametric models, competing risks, and frailty models with extensive Stata code examples and diagnostic procedures.

Therneau, T. M., & Grambsch, P. M. (2000). Modeling Survival Data: Extending the Cox Model.

Authoritative reference on extensions of the Cox model including time-varying covariates, stratification, frailty models, and multistate models. The R survival package is maintained by Therneau and implements the methods described here.

Tags

model-basedsurvival-analysisdurationhazard-rate