MethodAtlas

Cox Proportional Hazard Model

Models the hazard rate of an event (failure, exit, adoption) as a function of covariates, using a semiparametric baseline hazard that does not require distributional assumptions.

When to Use: When your outcome is time-to-event (e.g., time to firm exit, CEO tenure, technology adoption, employee turnover) and you have right-censored observations (subjects who have not yet experienced the event by the end of the observation window).
Key Assumption: Proportional hazards. The ratio of hazard rates for any two individuals is constant over time. The baseline hazard h_0(t) is left completely unspecified (semiparametric). Non-informative censoring is also required.
Common Mistake: Not testing the proportional hazards assumption (use Schoenfeld residuals). If the assumption fails, the Cox model produces a weighted average of time-varying effects that may be misleading.

One-Line Implementation

R: coxph(Surv(time, event) ~ treatment + x1 + x2, data = df, ties = "efron")
Stata: stcox treatment x1 x2, efron vce(robust)
Python: CoxPHFitter().fit(df, duration_col='time', event_col='event', formula='treatment + x1 + x2')


Motivating Example: Founder vs. Professional CEO Tenure

A management researcher wants to know whether founder-CEOs stay in their positions longer than professional (externally hired) CEOs. She collects data on 1,200 CEO spells at publicly traded firms between 1995 and 2015, recording the date each CEO took office and, if applicable, the date they left.

Here is the problem: 40% of the CEOs in her sample are still in their position at the end of the study period in 2015. These cases are right-censored observations — she knows the CEO was still active at the end of 2015, but she does not know when they will eventually leave.

If she simply compares average observed tenure between the two groups, she will underestimate average tenure for both groups (because she is treating the end of the observation window as though it were the event date). Worse, if founder-CEOs are disproportionately censored (because they tend to stay longer), the bias will be asymmetric — she will underestimate founder-CEO tenure more than professional-CEO tenure, potentially masking the very difference she wants to detect.

She cannot simply drop the censored observations either. Doing so would select on the outcome: she would be left with only CEOs who departed, which is a non-random subset. If the censored CEOs are systematically different (e.g., better performing), dropping them introduces survivorship bias.

The Cox proportional hazards model (Cox, 1972) solves this problem. It models the instantaneous rate of departure (the hazard rate) as a function of covariates, while properly accounting for censored observations. Censored CEOs contribute information up to the point they are last observed — they were "at risk" of departing during all the time they were observed, and the model uses this information without requiring knowledge of when they eventually leave.
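The bias from treating censoring times as event times can be seen in a small simulation. This is an illustrative sketch, not the article's data: the exponential tenure distribution, the 10-year mean, and the 8-year window are all made-up assumptions.

```python
# Illustrative simulation (all numbers are assumptions, not the article's data):
# treating censored tenures as completed tenures understates average tenure.
import random

random.seed(42)
N, WINDOW = 10_000, 8.0                    # observation ends after 8 years (assumed)
true_tenure = [random.expovariate(1 / 10.0) for _ in range(N)]  # true mean = 10 years

observed = [min(t, WINDOW) for t in true_tenure]     # censored spells cut at the window
censored_share = sum(t > WINDOW for t in true_tenure) / N

print(f"true mean tenure : {sum(true_tenure) / N:.2f}")
print(f"naive 'mean'     : {sum(observed) / N:.2f}")  # biased downward
print(f"share censored   : {censored_share:.0%}")
```

With roughly 45% of spells censored, the naive average sits far below the true mean, which is exactly the mechanism described above.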


A. Overview

What the Cox Model Does

The Cox proportional hazard model estimates hazard ratios — the multiplicative effect of covariates on the instantaneous rate of experiencing an event. The model is:

h(t | X_i) = h_0(t) \cdot \exp(X_i'\beta)

where:

  • h(t | X_i) is the hazard rate for individual i at time t
  • h_0(t) is the baseline hazard — the hazard when all covariates equal zero
  • \exp(X_i'\beta) is the multiplicative shift due to covariates

The critical feature of the Cox model is that it is semiparametric: the baseline hazard h_0(t) is left completely unspecified. You do not need to assume that it follows any particular distribution (exponential, Weibull, log-normal, etc.). The model only estimates how covariates shift the hazard, not the shape of the hazard itself.

Key Concepts

  • Survival function S(t) = P(T > t): the probability of surviving (not experiencing the event) beyond time t. The Kaplan-Meier estimator provides a nonparametric estimate of this function (Kaplan & Meier, 1958).

  • Hazard rate h(t): the instantaneous risk of the event at time t, conditional on having survived to t. Unlike a probability, the hazard rate can exceed 1 because it is a rate (events per unit time), not a probability.

  • Right censoring: when the event has not occurred by the end of the observation period. The Cox model handles this by including censored observations in the risk set up to the censoring time but not requiring them to contribute an event.

  • Proportional hazards: the assumption that the ratio of hazards for any two individuals is constant over time. Proportional hazards is the defining assumption of the Cox model.
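The product-limit idea behind the survival function estimate can be written in a few lines. A from-scratch sketch in stdlib Python (toy data, for intuition only — real analyses should use survival, stcox, or lifelines):

```python
# A from-scratch Kaplan-Meier estimator (stdlib Python), to make the
# product-limit formula concrete: S(t) = product over event times t_i <= t
# of (1 - d_i / n_i), with d_i events and n_i subjects at risk at t_i.
from collections import Counter

def kaplan_meier(times, events):
    """times: observed durations; events: 1 = event occurred, 0 = censored."""
    d = Counter(t for t, e in zip(times, events) if e == 1)  # events per time
    leaving = Counter(times)                                 # events + censorings
    at_risk, s, surv = len(times), 1.0, {}
    for t in sorted(set(times)):
        if d[t]:
            s *= 1 - d[t] / at_risk       # product-limit step at an event time
            surv[t] = s
        at_risk -= leaving[t]             # everyone observed at t leaves the risk set
    return surv

# Toy data: the spell at t = 3 is censored, so it only shrinks the risk set
print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 1, 0, 1]))   # -> {1: 0.8, 2: 0.4, 4: 0.0}
```

Note how the censored spell at t = 3 contributes no drop in the curve but still reduces the number at risk for the later event — the same way censored CEOs contribute to the Cox risk sets.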

How It Differs from OLS

The key difference from OLS is that the Cox model correctly handles censored observations. In a regression of observed duration on covariates, OLS treats censored observations as though their observed time is the true duration — biasing coefficients downward. The Cox model instead uses the partial likelihood, which conditions on the observed ordering of events and does not require knowing the full distribution of event times.

When to Use the Cox Model

  • Your outcome is time-to-event: time to firm exit, CEO departure, technology adoption, employee turnover, patent citation, project completion, loan default
  • You have right-censored observations: subjects who have not experienced the event by the end of your study
  • You want to estimate how covariates affect the rate of the event rather than a binary yes/no outcome
  • You do not want to assume a particular parametric form for the baseline hazard

When NOT to Use the Cox Model

  • Your outcome is binary with no meaningful time dimension — use logit/probit
  • Your outcome is a continuous variable that is not a duration — use OLS
  • The proportional hazards assumption is violated and cannot be fixed by stratification — consider accelerated failure time (AFT) models
  • You have multiple competing event types (e.g., CEO can leave voluntarily, be fired, or retire) and you want event-specific effects — use competing risks models (Fine & Gray, 1999)

When to Use (Detailed)

  1. Your outcome is a duration with censoring. Time to firm exit, CEO tenure, time to technology adoption, employee turnover, time to loan default, patent lifetime — any setting where you observe a time-to-event and some subjects have not yet experienced the event.

  2. You want to estimate covariate effects without specifying the baseline hazard. If you do not have a strong prior about the shape of the hazard over time (whether it increases, decreases, or is constant), the semiparametric Cox model is the safe choice.

  3. You want to compare groups controlling for covariates. The Cox model provides hazard ratios that quantify how much faster or slower the event occurs for one group relative to another, holding other factors constant.

When NOT to Use (Detailed)

  1. The proportional hazards assumption is badly violated. If a treatment effect wears off sharply over time, the Cox model produces a single "average" hazard ratio that may not describe any actual time period well. Consider:

    • Stratified Cox model: allows different baseline hazards for different groups
    • Time-varying coefficients: interact the covariate with time
    • Accelerated failure time (AFT) models: model log(time) directly
  2. You have interval-censored data. If you only know that the event occurred between two time points (e.g., between annual surveys), use discrete-time hazard models rather than the standard Cox model.

  3. Multiple event types compete. If a CEO can leave voluntarily, be fired, or retire, you must choose an appropriate competing risks approach. To estimate cause-specific hazard ratios (how covariates affect the instantaneous rate among those still event-free), fit separate Cox models for each event type, censoring the others — standard Cox software handles this directly. To estimate effects on the cumulative incidence of a specific event type, use the Fine-Gray subdistribution hazard model (Fine & Gray, 1999).

  4. You need to estimate the baseline hazard. The Cox model does not directly estimate h_0(t). If you need the baseline hazard for prediction or simulation, consider parametric alternatives (Weibull, Gompertz).
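The cause-specific recoding described in point 3 is mechanical: to study one departure type, mark only that type as an event and treat competing departures as censored. A minimal sketch (the event codes here are hypothetical):

```python
# Sketch of cause-specific recoding for competing risks (event codes are
# hypothetical): keep one cause as the event, censor all competing causes,
# then fit a standard Cox model on the resulting indicator.
def cause_specific_indicator(cause, target):
    """1 if the spell ended in `target`; competing causes count as censored."""
    return int(cause == target)

causes = ["voluntary", "fired", None, "retired", "fired"]  # None = still in office
fired_events = [cause_specific_indicator(c, "fired") for c in causes]
print(fired_events)   # -> [0, 1, 0, 0, 1]
```

Repeating this for each cause gives one cause-specific Cox model per event type; estimating effects on cumulative incidence instead requires the Fine-Gray model.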


Connection to Other Methods

The Cox model sits within a broader ecosystem of methods for different outcome types:

  • Logit/Probit: models whether the event occurred (binary 0/1) but ignores when. Use logit when the time dimension is not meaningful. Use Cox when timing matters. Shumway (2001) shows that switching from static logit to a hazard model significantly improves bankruptcy prediction accuracy.

  • OLS on duration: regressing observed duration on covariates ignores censoring. Censored observations are treated as though their observed time is the true event time, biasing all coefficients toward zero (attenuation bias). This approach is never appropriate when censoring is present.

  • Competing risks: with multiple event types, two approaches target different estimands (Fine & Gray, 1999). Cause-specific hazard models estimate how covariates affect the instantaneous event rate among those still event-free (fit with standard Cox, censoring competing events). Fine-Gray models estimate effects on the cumulative incidence function (subdistribution hazard). The right choice depends on whether you want to understand cause-specific rates or absolute risks.

  • Parametric survival models: Weibull, exponential, Gompertz, and log-normal models fully specify the baseline hazard. They are more efficient than Cox when the distributional assumption is correct, but biased when it is wrong.

  • Fixed effects: in panel settings with repeated spells, stratified Cox models can be used to absorb time-invariant unobserved heterogeneity (analogous to fixed effects).




B. Identification

For the Cox model to provide valid inference, three key assumptions must hold.

Assumption 1: Proportional Hazards

Plain language: The effect of a covariate on the hazard rate is constant over time. If founder-CEOs have a 30% lower departure hazard than professional CEOs at year 1, they also have a 30% lower hazard at year 5, year 10, and so on.

Formally: h(t | X_i) = h_0(t) \cdot \exp(X_i'\beta), where \beta does not depend on t.

This property means the hazard ratio for any two individuals is:

\frac{h(t | X_i)}{h(t | X_j)} = \frac{h_0(t) \exp(X_i'\beta)}{h_0(t) \exp(X_j'\beta)} = \exp((X_i - X_j)'\beta)

The baseline hazard h_0(t) cancels, and the ratio is constant over time. If the proportional hazards assumption fails — for example, if a treatment effect wears off over time — the Cox model estimates a weighted average of the time-varying effect, which may be misleading (Grambsch & Therneau, 1994).
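The cancellation is easy to check numerically. A quick sketch with made-up coefficients (the covariates and values below are illustrative, not estimates from the article):

```python
# Numeric check of the cancellation above (coefficients are made up): the
# hazard ratio between two individuals is the same for any baseline hazard.
import math

beta = [-0.43, 0.10]                 # founder indicator, log firm size (assumed)
x_i, x_j = [1, 2.0], [0, 2.0]        # founder vs. professional CEO, same firm size

def hazard(h0_t, x):
    return h0_t * math.exp(sum(b * v for b, v in zip(beta, x)))

ratios = [hazard(h0, x_i) / hazard(h0, x_j) for h0 in (0.05, 0.2, 1.3)]
print(ratios)                        # identical for every baseline hazard value
print(math.exp(-0.43))               # = exp((x_i - x_j)' beta)
```

Whatever value the baseline hazard takes at a given t, the ratio collapses to exp((x_i - x_j)'β) — which is why the Cox model never needs to estimate h_0(t).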

Assumption 2: Non-Informative Censoring

Plain language: The reason an observation is censored is unrelated to the likelihood of the event occurring. In the CEO example, this assumption means that the end of the study period, firm going private, or data unavailability is not systematically related to whether the CEO would have left soon.

Formally: T \perp C \mid X, where T is the true event time and C is the censoring time.

This assumption is violated, for example, if healthier patients selectively drop out of a clinical trial, or if firms with troubled CEOs are more likely to be acquired (and thus disappear from the sample).

Assumption 3: Correct Specification (for Causal Interpretation)

Plain language: For the Cox model coefficients to have a causal interpretation, the covariates must be exogenous — the same zero conditional mean assumption required for OLS. If there are unobserved confounders that affect both the covariates and the hazard, the coefficient estimates are biased.

Formally: Conditional on X_i, the event times are independent of any unobserved factors that affect the hazard. This requirement is the analog of the exogeneity condition in the proportional hazards framework.

This requirement is the same exogeneity concern as in any regression model. The Cox model does not solve endogeneity — it handles censoring and the functional form for duration data.


C. Visual Intuition

Adjust the hazard ratio and censoring rate to see how the Kaplan-Meier survival curves diverge between treatment and control groups. Higher censoring makes the curves noisier and the log-rank test less powerful.

Explore how the proportional hazards assumption breaks down when the founder CEO advantage erodes over time.


D. Mathematical Derivation

Partial Likelihood Derivation

Don't worry about the notation yet — here's what this means in words: The partial likelihood eliminates the baseline hazard by conditioning on the set of individuals at risk at each event time. It estimates beta without requiring distributional assumptions on h_0(t).

Setup. Suppose there are K distinct event times t_1 < t_2 < \cdots < t_K. At event time t_k, let \mathcal{R}(t_k) denote the risk set — the set of individuals who are still under observation (have not yet experienced the event or been censored) just before t_k.

Step 1: Conditional probability of failure. At time t_k, exactly one individual (say, individual j_k) experiences the event. Conditional on one event occurring at t_k, the probability that it is individual j_k (rather than anyone else in the risk set) is:

P(\text{individual } j_k \text{ fails} \mid \text{one failure at } t_k) = \frac{h(t_k \mid X_{j_k})}{\sum_{l \in \mathcal{R}(t_k)} h(t_k \mid X_l)}

Step 2: Cancel the baseline hazard. Under the proportional hazards model:

\frac{h_0(t_k) \exp(X_{j_k}'\beta)}{\sum_{l \in \mathcal{R}(t_k)} h_0(t_k) \exp(X_l'\beta)} = \frac{\exp(X_{j_k}'\beta)}{\sum_{l \in \mathcal{R}(t_k)} \exp(X_l'\beta)}

The baseline hazard h_0(t_k) cancels. This cancellation is the key insight of the partial likelihood: by conditioning on the risk set, we eliminate the nuisance parameter h_0(t).

Step 3: Construct the partial likelihood. The partial likelihood is the product over all event times:

PL(\beta) = \prod_{k=1}^{K} \frac{\exp(X_{j_k}'\beta)}{\sum_{l \in \mathcal{R}(t_k)} \exp(X_l'\beta)}

The log partial likelihood is:

\ell(\beta) = \sum_{k=1}^{K} \left[ X_{j_k}'\beta - \ln\left(\sum_{l \in \mathcal{R}(t_k)} \exp(X_l'\beta)\right) \right]

Step 4: Estimate \beta. Maximize \ell(\beta) numerically (Newton-Raphson). The resulting \hat{\beta} is consistent and asymptotically normal under regularity conditions.

Step 5: Variance estimation. The variance of \hat{\beta} is estimated from the inverse of the observed information matrix. For robust inference, use the sandwich variance estimator (analogous to robust standard errors in OLS), available in all major survival analysis packages.

Handling ties. When multiple events occur at the same time, the partial likelihood must be adjusted. The Efron approximation is preferred over the Breslow approximation (which is biased when ties are common). R's survival package and Python's lifelines use the Efron method by default; Stata's stcox defaults to the Breslow method, so specify the efron option explicitly when ties are common.
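Steps 3 and 4 can be made concrete with a tiny from-scratch sketch: one binary covariate, toy data with no ties, and a crude grid search standing in for Newton-Raphson. Everything below is illustrative.

```python
# Minimal sketch of Steps 3-4 (toy data, one binary covariate, no ties):
# build the log partial likelihood and maximize it by crude grid search.
import math

# (time, event, x): event = 0 means the spell is censored
data = [(2, 1, 1), (3, 1, 0), (5, 0, 1), (7, 1, 0), (9, 1, 1), (11, 0, 0)]

def log_partial_likelihood(beta):
    ll = 0.0
    for t_k, event, x_k in data:
        if not event:
            continue                                   # censored: no numerator term
        risk = [x for (t, _, x) in data if t >= t_k]   # at risk just before t_k
        ll += beta * x_k - math.log(sum(math.exp(beta * x) for x in risk))
    return ll

beta_hat = max((b / 100 for b in range(-300, 301)), key=log_partial_likelihood)
print(f"beta_hat = {beta_hat:.2f}, HR = {math.exp(beta_hat):.2f}")
```

Note that censored spells never appear in a numerator, yet they do appear in every risk-set denominator up to their censoring time — exactly the sense in which censored observations "contribute information" without contributing an event.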


E. Implementation

Cox Regression with Diagnostics

# Requires: survival
# survival: R's core package for survival analysis (Therneau & Grambsch)
library(survival)

# --- Step 1: Kaplan-Meier survival curves ---
# Visualize non-parametric survival estimates by group before modeling
# Surv() creates a survival object: (time, event indicator)
km_fit <- survfit(Surv(tenure, departed) ~ founder_ceo, data = df)
plot(km_fit, col = c("blue", "red"), lwd = 2,
   xlab = "Years", ylab = "Survival probability",
   main = "CEO Tenure by Type")
legend("topright", c("Professional CEO", "Founder CEO"),
     col = c("blue", "red"), lwd = 2)

# Log-rank test: non-parametric test for equality of survival curves
# H0: survival functions are the same across groups
survdiff(Surv(tenure, departed) ~ founder_ceo, data = df)

# --- Step 2: Cox proportional hazards regression ---
# coxph() estimates the semiparametric Cox model: h(t|X) = h0(t) * exp(X*beta)
# ties = "efron": Efron approximation for tied event times (more accurate than Breslow)
cox_fit <- coxph(Surv(tenure, departed) ~ founder_ceo + firm_size +
                 roa + industry,
               data = df,
               ties = "efron")
summary(cox_fit)

# Hazard ratios with 95% confidence intervals
# exp(coef) = hazard ratio: HR > 1 means higher hazard (shorter survival)
exp(cbind(HR = coef(cox_fit), confint(cox_fit)))

# --- Step 3: Test the proportional hazards assumption ---
# cox.zph() tests whether Schoenfeld residuals trend with time
# H0: hazard ratios are constant over time (PH holds)
# A significant p-value indicates PH violation for that covariate
ph_test <- cox.zph(cox_fit)
print(ph_test)
plot(ph_test)  # Flat line = PH holds; trend = PH violated

# --- Step 4: Predicted survival curves ---
# Generate survival curves at specific covariate values for comparison
# Setting covariates to median creates a "representative" profile
newdata <- data.frame(founder_ceo = c(0, 1),
                    firm_size = median(df$firm_size),
                    roa = median(df$roa),
                    industry = "Manufacturing")
surv_pred <- survfit(cox_fit, newdata = newdata)
plot(surv_pred, col = c("blue", "red"), lwd = 2,
   xlab = "Years", ylab = "Survival probability")
legend("topright", c("Professional CEO", "Founder CEO"),
     col = c("blue", "red"), lwd = 2)

F. Diagnostics

F.1 Schoenfeld Residuals (Proportional Hazards Test)

The most important diagnostic for the Cox model. Grambsch and Therneau (1994) proposed testing the PH assumption by examining whether Schoenfeld residuals trend with time. Under the null hypothesis of proportional hazards, the scaled Schoenfeld residuals for each covariate should show no systematic pattern over time.

  • Global test: tests whether any covariate violates PH (reported by cox.zph() in R, estat phtest in Stata, check_assumptions() in lifelines)
  • Covariate-specific test: tests each covariate individually
  • Visual inspection: plot scaled Schoenfeld residuals against time. A flat line (zero slope) supports PH; a trend suggests violation

If the PH assumption is violated for a specific covariate:

  1. Stratify: coxph(Surv(time, event) ~ x1 + strata(x2), data = df) — allows different baseline hazards by strata
  2. Time interaction: include x \times \ln(t) or x \times t to allow the effect to vary over time
  3. Split the time axis: estimate separate models for early and late periods

F.2 Cox-Snell Residuals (Overall Fit)

Cox-Snell residuals assess overall model fit. If the model is correct, these residuals should follow a unit exponential distribution. Plot the Nelson-Aalen cumulative hazard of the Cox-Snell residuals against the residuals themselves — the plot should follow a 45-degree line. Systematic departures indicate poor overall fit.

F.3 Log-Log Survival Plot

Plot \ln(-\ln(\hat{S}(t))) versus \ln(t) for different groups. Under proportional hazards, these curves should be approximately parallel. Crossing or converging curves indicate PH violation. This graphical diagnostic is especially useful for categorical covariates.

F.4 Martingale Residuals (Functional Form)

Martingale residuals assess whether continuous covariates enter the model with the correct functional form (Lin et al., 1993). Plot martingale residuals from a null model (no covariates) against each covariate. A nonlinear pattern suggests the covariate should be transformed (log, polynomial) or categorized.

F.5 Deviance Residuals (Outliers)

Deviance residuals are a normalized transformation of martingale residuals that are more symmetrically distributed around zero. Observations with large positive deviance residuals experienced the event "too soon" relative to the model's prediction; large negative values experienced it "too late" or were censored unexpectedly early.

library(survival)

cox_fit <- coxph(Surv(tenure, departed) ~ founder_ceo + firm_size + roa,
               data = df, ties = "efron")

# F.1 Schoenfeld residuals — PH test
ph_test <- cox.zph(cox_fit)
print(ph_test)          # Global and per-covariate tests
par(mfrow = c(1, 3))
plot(ph_test)            # Scaled Schoenfeld residuals vs time

# F.2 Cox-Snell residuals — overall fit
cs_resid <- df$departed - resid(cox_fit, type = "martingale")
surv_cs <- survfit(Surv(cs_resid, df$departed) ~ 1)
plot(surv_cs, fun = "cumhaz",
   xlab = "Cox-Snell residuals",
   ylab = "Cumulative hazard",
   main = "Cox-Snell Residual Plot")
abline(0, 1, col = "red", lty = 2)

# F.3 Log-log survival plot
km <- survfit(Surv(tenure, departed) ~ founder_ceo, data = df)
plot(km, fun = "cloglog",
   xlab = "ln(time)", ylab = "ln(-ln(S(t)))",
   col = c("blue", "red"), main = "Log-Log Plot")
legend("topleft", c("Professional", "Founder"),
     col = c("blue", "red"), lwd = 2)

# F.4 Martingale residuals — functional form
null_fit <- coxph(Surv(tenure, departed) ~ 1, data = df)
mart_resid <- resid(null_fit, type = "martingale")
plot(df$firm_size, mart_resid,
   xlab = "Firm size", ylab = "Martingale residuals",
   main = "Functional Form Check")
lines(lowess(df$firm_size, mart_resid), col = "red", lwd = 2)

Hazard Ratios vs. Coefficients

The Cox model estimates \beta coefficients, but results are typically reported as hazard ratios, HR = \exp(\beta):

Coefficient β | Hazard Ratio exp(β) | Interpretation
β = -0.36 | HR = 0.70 | 30% lower hazard (event rate)
β = 0 | HR = 1.00 | No effect on hazard
β = 0.41 | HR = 1.50 | 50% higher hazard
β = 0.69 | HR = 2.00 | Doubled hazard

For a continuous covariate: HR = 1.05 means a one-unit increase in X is associated with a 5% increase in the instantaneous event rate, holding other covariates constant.

For a binary covariate: HR = 0.65 means the group with X = 1 has a hazard that is 65% of the reference group's hazard — a 35% lower event rate at every point in time.
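These conversions can be verified directly, since β = ln(HR):

```python
# Checking the beta <-> hazard ratio conversions from the table above.
import math

for hr in (0.70, 1.00, 1.50, 2.00):
    print(f"HR = {hr:.2f}  <->  beta = ln(HR) = {math.log(hr):+.2f}")

# A reported HR of 0.65 (the founder-CEO example) corresponds to:
print(round(math.log(0.65), 2))   # -> -0.43
```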

What to Report in a Table

A well-reported Cox regression table should include:

  1. Hazard ratios (not just coefficients) with 95% confidence intervals
  2. Number of subjects and number of events (not just total N)
  3. Median follow-up time or total person-time at risk
  4. Proportion censored
  5. PH test results (Schoenfeld global test p-value)
  6. Tie-handling method (Efron vs. Breslow)
  7. Standard error type (robust/sandwich if used)

G. What Can Go Wrong


Ignoring Censoring (OLS on Observed Durations)

Cox model properly handles right-censored observations

Hazard ratio for founder CEO: 0.65 (SE = 0.08). Founder-CEOs have a 35% lower departure rate at any point in time, consistent with the Kaplan-Meier curves showing longer survival.


Proportional Hazards Violation

A clinical trial where the treatment reduces hazard at a constant rate over time (PH holds)

Cox HR = 0.60 (SE = 0.12). The treatment reduces the hazard by 40% at all time points. Schoenfeld residual test p = 0.45 (PH not rejected).


Informative Censoring

Censoring is administrative — all subjects are followed until a fixed end date, with censoring due only to the study ending

Cox HR = 0.70 (SE = 0.09). Censoring is non-informative because it depends only on the calendar date, not on patient characteristics.


Competing Risks Ignored

Researcher accounts for competing risks — CEO departure modeled separately for voluntary resignation, forced dismissal, and retirement using cause-specific or Fine-Gray models

Cause-specific HR for forced dismissal: 1.8 for low-performing firms. HR for voluntary resignation: 1.1 (not significant). The effect is concentrated in forced departures.


H. Practice

H.1 Concept Checks

Concept Check

A researcher studies time to firm bankruptcy using a sample of 500 firms observed over 10 years. Of these, 150 go bankrupt and 350 survive to the end of the study. The researcher runs OLS: duration = beta_0 + beta_1 * leverage + beta_2 * size + epsilon. What is wrong with this approach?

Concept Check

A Cox regression of employee turnover on job satisfaction produces a hazard ratio of 0.70 (95% CI: [0.58, 0.85]). How should this be interpreted?

Concept Check

After fitting a Cox model for CEO tenure, you run the Schoenfeld residual test and find that the global test p-value is 0.002, with the covariate 'firm_performance' showing a significant trend (p = 0.001). What does this mean and what should you do?

H.2 Guided Exercise

Guided Exercise

Interpreting Cox Regression Output

You study CEO tenure at S&P 500 firms. Your Cox model produces the following output:

Variable | Coeff (beta) | SE | HR [exp(beta)] | 95% CI HR | p-value
Founder CEO | -0.43 | 0.11 | 0.65 | [0.52, 0.81] | < 0.001
Firm size (log) | -0.15 | 0.06 | 0.86 | [0.76, 0.97] | 0.012
ROA | 0.02 | 0.08 | 1.02 | [0.87, 1.19] | 0.820
Board independence | 0.31 | 0.14 | 1.36 | [1.04, 1.79] | 0.027

N = 1,200 CEO spells; 720 events (departures); 480 censored (40%). Schoenfeld global test p = 0.38. Efron method for ties.

What does the hazard ratio of 0.65 for Founder CEO mean?

How do you interpret the HR of 1.36 for Board independence?

Is ROA a significant predictor of CEO departure? How do you know?

Is the proportional hazards assumption satisfied? How do you know?

H.3 Error Detective

Error Detective

Read the analysis below carefully and identify the errors.

A management researcher studies time to first international expansion using a sample of 800 domestic firms observed from 2000 to 2020. Of these, 300 expanded internationally and 500 remained domestic. The researcher runs:

coxph(Surv(years_observed, expanded) ~ firm_size + rd_intensity + industry, data = df)

She reports: "Large firms internationalize 2.3 times faster than small firms (HR = 2.3, p < 0.001). The Kaplan-Meier curve shows a median time to internationalization of 8 years."

She does not report any diagnostic tests. In the data, 60 firms were acquired during the study period (and thus could no longer expand internationally). These acquisitions were coded as censored (expanded = 0).

Select all errors you can find:

Error Detective

Read the analysis below carefully and identify the errors.

A health economist studies time to hospital readmission after heart surgery. She has 2,000 patients, of whom 800 are readmitted within 1 year and 1,200 are not readmitted. She fits a Cox model:

stset readmit_days, failure(readmitted)
stcox age female diabetes surgery_type, efron

She finds HR for diabetes = 1.45 (p = 0.02). She concludes: "Diabetic patients have a 45% higher probability of readmission." She also notes that the Schoenfeld test for diabetes gives p = 0.04 but does not discuss this result. She reports that 200 patients died during follow-up and were coded as censored.

Select all errors you can find:

H.4 You Are the Referee

Referee Exercise

Read the paper summary below and write a brief referee critique (2-3 sentences) of the identification strategy.

Paper Summary

The authors study whether founder-CEOs survive longer in their positions than professional CEOs using a Cox proportional hazard model on 950 CEO spells at publicly traded firms (2000-2018). They find that founder-CEOs have a significantly lower departure hazard (HR = 0.58, p < 0.001), controlling for firm size, ROA, board independence, and industry fixed effects. They do not test the proportional hazards assumption. Forty-five CEOs in the sample died in office; these deaths were coded as censored observations.

Key Table

Variable | Coefficient | SE | p-value
Founder CEO | -0.54 | 0.13 | <0.001
Firm size (log) | -0.18 | 0.07 | 0.010
ROA | 0.05 | 0.09 | 0.578
Board independence | 0.29 | 0.15 | 0.053
N | 950
Events (departures) | 580
Censored | 370

Authors' Identification Claim

The authors argue that the Cox model identifies the causal effect of founder status on CEO tenure by controlling for observable firm characteristics.


I. Swap-In: When to Use Something Else

  • Parametric models (Weibull, exponential, Gompertz): when you need to estimate the baseline hazard — for prediction, simulation, or forecasting. More efficient than Cox when the distributional assumption is correct. Weibull is the simplest two-parameter family (can model increasing or decreasing hazards). See Therneau and Grambsch (2000) for extensions.

  • Accelerated failure time (AFT) models: when the proportional hazards assumption is violated. AFT models parameterize the effect of covariates as a multiplicative shift in the time scale, not the hazard scale: \ln(T) = X'\gamma + \sigma\varepsilon. This formulation is analogous to OLS on log(duration) but with proper censoring handling. Common distributions: log-normal, log-logistic, Weibull.

  • Discrete-time hazard models: when time is measured in discrete intervals (e.g., annual employment spells, quarterly observations) or when you have interval-censored data. These models are logit or complementary log-log specifications applied to person-period data (Singer & Willett, 2003).

  • Competing risks (Fine-Gray): when multiple event types exist and you want to estimate effects on the cumulative incidence of a specific event, rather than the cause-specific hazard (Fine & Gray, 1999). The Fine-Gray model accounts for the fact that experiencing a competing event removes the subject from risk for the event of interest.

  • Frailty models (random effects): when you have clustered data (e.g., employees within firms, patients within hospitals) and want to account for unobserved heterogeneity. A frailty term adds a random effect to the Cox model (Therneau & Grambsch, 2000).


J. Reviewer Checklist


Paper Library

Foundational (5)

Cox, D. R. (1972). Regression Models and Life-Tables.

Journal of the Royal Statistical Society: Series B (Methodological). DOI: 10.1111/j.2517-6161.1972.tb00899.x

Cox introduces the proportional hazards model with an unspecified baseline hazard, estimated via a conditional likelihood argument (later formalized as partial likelihood in Cox, 1975). The semiparametric approach avoids distributional assumptions on the baseline hazard while allowing covariate effects to be estimated consistently. One of the most cited papers in statistics.

Fine, J. P., & Gray, R. J. (1999). A Proportional Hazards Model for the Subdistribution of a Competing Risk.

Journal of the American Statistical Association. DOI: 10.1080/01621459.1999.10474144

Fine and Gray develop a regression model for the cumulative incidence function under competing risks. The Fine-Gray model extends the Cox framework to settings where multiple event types compete, allowing estimation of covariate effects on the subdistribution hazard.

Grambsch, P. M., & Therneau, T. M. (1994). Proportional Hazards Tests and Diagnostics Based on Weighted Residuals.

Biometrika

Grambsch and Therneau introduce the scaled Schoenfeld residual test for the proportional hazards assumption. Plotting scaled Schoenfeld residuals against time reveals time-varying effects. The test is the standard diagnostic in applied survival analysis.

Kaplan, E. L., & Meier, P. (1958). Nonparametric Estimation from Incomplete Observations.

Journal of the American Statistical Association. DOI: 10.1080/01621459.1958.10501452

Kaplan and Meier introduce the product-limit estimator (Kaplan-Meier estimator) for the survival function from right-censored data. The KM curve is the standard nonparametric tool for visualizing survival and comparing groups before fitting regression models.

Lin, D. Y., Wei, L. J., & Ying, Z. (1993). Checking the Cox Model with Cumulative Sums of Martingale-Based Residuals.

Journal of the American Statistical Association

Lin, Wei, and Ying develop graphical and numerical methods for checking the Cox model using cumulative sums of martingale-based residuals. Provides formal tests for the proportional hazards assumption, functional form of covariates, and overall model adequacy.

Application (1)

Shumway, T. (2001). Forecasting Bankruptcy More Accurately: A Simple Hazard Model.

Journal of Business. DOI: 10.1086/209665

Shumway shows that discrete-time hazard models outperform static logit models for bankruptcy prediction because they properly account for the time dimension and censoring. Demonstrates the importance of survival analysis framing for event prediction in finance.

Survey (4)

Cleves, M., Gould, W., & Marchenko, Y. (2016). An Introduction to Survival Analysis Using Stata.

Stata Press

Cleves, Gould, and Marchenko provide a comprehensive practical guide to survival analysis in Stata. Covers Kaplan-Meier estimation, Cox regression, parametric models, competing risks, and frailty models with extensive Stata code examples and diagnostic procedures.

Singer, J. D., & Willett, J. B. (2003). Applied Longitudinal Data Analysis: Modeling Change and Event Occurrence.

Oxford University Press

Singer and Willett write an accessible textbook covering both growth curve models and discrete-time survival analysis. Chapters 9-15 provide a clear introduction to hazard modeling for social science researchers, with worked examples and practical guidance.

Therneau, T. M., & Grambsch, P. M. (2000). Modeling Survival Data: Extending the Cox Model.

Springer

Therneau and Grambsch provide an authoritative reference on extensions of the Cox model including time-varying covariates, stratification, frailty models, and multistate models. The R survival package is maintained by Therneau and implements the methods described here.

Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data.

MIT Press

Wooldridge's graduate textbook is the standard reference for cross-section and panel data econometrics. Chapters 10-11 provide a thorough treatment of fixed effects, random effects, and related panel data methods, while later chapters cover general estimation methodology (MLE, GMM, M-estimation) with panel data applications throughout. The book covers both linear and nonlinear models with careful attention to assumptions.

Tags

model-based · survival-analysis · duration · hazard-rate