When should I use Cox Proportional Hazard Model?

When your outcome is time-to-event (e.g., time to firm exit, CEO tenure, technology adoption, employee turnover) and you have right-censored observations (subjects who have not yet experienced the event by the end of the observation window).

What is the key assumption of Cox Proportional Hazard Model?

Proportional hazards: the ratio of hazard rates for any two individuals is constant over time. The baseline hazard h_0(t) is left completely unspecified (semiparametric). Non-informative censoring is also required.

What is the most common mistake with Cox Proportional Hazard Model?

Not testing the proportional hazards assumption (use Schoenfeld residuals). If the assumption fails, the Cox model produces a weighted average of time-varying effects that may be misleading.

Method·intermediate·16 min read

Model-BasedEstablished

Cox Proportional Hazard Model

Models the hazard rate of an event (failure, exit, adoption) as a function of covariates, using a semiparametric baseline hazard that does not require distributional assumptions.

When to Use: When your outcome is time-to-event (e.g., time to firm exit, CEO tenure, technology adoption, employee turnover) and you have right-censored observations (subjects who have not yet experienced the event by the end of the observation window).
Assumption: Proportional hazards: the ratio of hazard rates for any two individuals is constant over time. The baseline hazard h_0(t) is left completely unspecified (semiparametric). Non-informative censoring is also required.
Mistake: Not testing the proportional hazards assumption (use Schoenfeld residuals). If the assumption fails, the Cox model produces a weighted average of time-varying effects that may be misleading.
Reading Time: ~16 min read · 11 sections · 7 interactive exercises

One-Line Implementation

Rcoxph(Surv(time, event) ~ treatment + x1 + x2, data = df, ties = 'efron')

Statastcox treatment x1 x2, efron vce(robust)

PythonCoxPHFitter().fit(df, duration_col='time', event_col='event', formula='treatment + x1 + x2')

Download Full Analysis Code

Complete scripts with diagnostics, robustness checks, and result export.

Motivating Example: Founder vs. Professional CEO Tenure

A management researcher wants to know whether founder-CEOs stay in their positions longer than professional (externally hired) CEOs. She collects data on 1,200 CEO spells at publicly traded firms between 1995 and 2015, recording the date each CEO took office and, if applicable, the date they left.

Here is the problem: 40% of the CEOs in her sample are still in their position at the end of the study period in 2015. These cases are observations — she knows the CEO was still active at the end of 2015, but she does not know when they will eventually leave.

If she simply compares average observed tenure between the two groups, she will underestimate average tenure for both groups (because she is treating the end of the observation window as though it were the event date). Worse, if founder-CEOs are disproportionately censored (because they tend to stay longer), the bias will be asymmetric — she will underestimate founder-CEO tenure more than professional-CEO tenure, potentially masking the very difference she wants to detect.

She cannot simply drop the censored observations either. Doing so would select on the outcome: she would be left with only CEOs who departed, which is a non-random subset. If the censored CEOs are systematically different (e.g., better performing), dropping them introduces .

The model (Cox, 1972) solves this problem. It models the instantaneous rate of departure (the ) as a function of covariates, while properly accounting for censored observations. Censored CEOs contribute information up to the point they are last observed — they were "at risk" of departing during all the time they were observed, and the model uses this information without requiring knowledge of when they eventually leave.

AOverview

What the Cox Model Does

The Cox proportional hazard model estimates — the multiplicative effect of covariates on the instantaneous rate of experiencing an event. The model is:

h(t \mid X_i) = h_0(t) \cdot \exp(X_i'\beta)

where:

$h(t \mid X_i)$ is the hazard rate for individual $i$ at time $t$
$h_0(t)$ is the baseline hazard — the hazard when all covariates equal zero
$\exp(X_i'\beta)$ is the multiplicative shift due to covariates

The critical feature of the Cox model is that it is semiparametric: the baseline hazard $h_0(t)$ is left completely unspecified. You typically do not need to assume that it follows any particular distribution (exponential, Weibull, log-normal, etc.). The model only estimates how covariates shift the hazard, not the shape of the hazard itself.

Key Concepts

$S(t) = P(T > t)$ : the probability of surviving (not experiencing the event) beyond time $t$ . The provides a nonparametric estimate of this function (Kaplan & Meier, 1958).
Hazard rate $h(t)$ : the instantaneous risk of the event at time $t$ , conditional on having survived to $t$ . Unlike a probability, the hazard rate can exceed 1 because it is a rate (events per unit time), not a probability.
: when the event has not occurred by the end of the observation period. The Cox model handles this by including censored observations in the risk set up to the censoring time but not requiring them to contribute an event.
: the assumption that the ratio of hazards for any two individuals is constant over time. Proportional hazards is the defining assumption of the Cox model.

How It Differs from OLS

The key difference from OLS is that the Cox model correctly handles censored observations. In a regression of observed duration on covariates, OLS treats censored observations as though their observed time is the true duration — biasing coefficients downward. The Cox model instead uses the partial likelihood, which conditions on the observed ordering of events and does not require knowing the full distribution of event times.

When to Use the Cox Model

Your outcome is a duration with censoring. Time to firm exit, CEO tenure, time to technology adoption, employee turnover, time to loan default, patent lifetime — any setting where you observe a time-to-event and some subjects have not yet experienced the event.
You want to estimate covariate effects without specifying the baseline hazard. If you typically do not have a strong prior about the shape of the hazard over time (whether it increases, decreases, or is constant), the semiparametric Cox model is the safe choice.
You want to compare groups controlling for covariates. The Cox model provides hazard ratios that quantify how much faster or slower the event occurs for one group relative to another, holding other factors constant.

When NOT to Use the Cox Model

The proportional hazards assumption is badly violated. If a treatment effect wears off sharply over time, the Cox model produces a single "average" hazard ratio that may not describe any actual time period well. Consider:
- Stratified Cox model: allows different baseline hazards for different groups
- Time-varying coefficients: interact the covariate with time
- Accelerated failure time (AFT) models: model log(time) directly
You have interval-censored data. If you only know that the event occurred between two time points (e.g., between annual surveys), use discrete-time hazard models rather than the standard Cox model.
Multiple event types compete. If a CEO can leave voluntarily, be fired, or retire, you typically need to choose an appropriate competing risks approach. To estimate cause-specific hazard ratios (how covariates affect the instantaneous rate among those still event-free), fit separate Cox models for each event type, censoring the others — standard Cox software handles this directly. To estimate effects on the cumulative incidence of a specific event type, use the Fine-Gray subdistribution hazard model (Fine & Gray, 1999).
You need to estimate the baseline hazard. The Cox model does not directly estimate $h_0(t)$ . If you need the baseline hazard for prediction or simulation, consider parametric alternatives (Weibull, Gompertz).

Connection to Other Methods

The Cox model sits within a broader ecosystem of methods for different outcome types:

Logit/Probit: models whether the event occurred (binary 0/1) but ignores when. Use logit when the time dimension is not meaningful. Use Cox when timing matters. Shumway (2001) shows that switching from static logit to a hazard model significantly improves bankruptcy prediction accuracy.
OLS on duration: regressing observed duration on covariates ignores censoring. Censored observations are treated as though their observed time is the true event time, biasing all coefficients toward zero (attenuation bias). Duration OLS is generally inappropriate when a non-negligible fraction of the sample is censored.
Competing risks: with multiple event types, two approaches target different estimands (Fine & Gray, 1999). Cause-specific hazard models estimate how covariates affect the instantaneous event rate among those still event-free (fit with standard Cox, censoring competing events). Fine-Gray models estimate effects on the cumulative incidence function (subdistribution hazard). The right choice depends on whether you want to understand cause-specific rates or absolute risks.
Parametric survival models: Weibull, exponential, Gompertz, and log-normal models fully specify the baseline hazard. They are more efficient than Cox when the distributional assumption is correct, but biased when it is wrong.
Fixed effects: in panel settings with repeated spells, stratified Cox models can be used to absorb time-invariant unobserved heterogeneity (analogous to fixed effects).

Common Confusions

Frequently asked questions about survival analysis

Q: Can I just use logit to model whether the event happened? You can, but you lose information. Logit tells you whether the event occurred by the end of the study but ignores when it occurred. Two CEOs who both left — one after 2 years and one after 15 years — look identical in a logit model. The Cox model uses the full timing information. Shumway (2001) demonstrates how hazard models outperform static logit for bankruptcy prediction precisely because they account for the time dimension.
Q: What is the difference between a hazard rate and a probability? A hazard rate is an instantaneous rate, not a probability. It represents the rate of event occurrence at a specific instant, conditional on survival to that point. It can exceed 1 (e.g., a hazard of 2 means events are occurring at twice the rate per unit time). The survival probability $S(t)$ is the cumulative result of the hazard rate over time.
Q: Do I need to worry about the baseline hazard? Not for estimating covariate effects — that is the beauty of the semiparametric approach. The partial likelihood factors out the baseline hazard. However, if you want to predict survival probabilities or plot survival curves for specific covariate profiles, you will need to estimate the baseline hazard (using the Breslow or Nelson-Aalen estimator).
Q: What does "semiparametric" mean? The Cox model is parametric in the covariates (it assumes $\exp(X'\beta)$ multiplicatively shifts the hazard) but nonparametric in the baseline hazard ( $h_0(t)$ is left unspecified). This combination is why it is called semiparametric.

BIdentification

For the Cox model to provide valid inference, three key assumptions must hold.

Assumption 1: Proportional Hazards

Plain language: The effect of a covariate on the hazard rate is constant over time. If founder-CEOs have a 30% lower departure hazard than professional CEOs at year 1, they also have a 30% lower hazard at year 5, year 10, and so on.

Formally: $h(t \mid X_i) = h_0(t) \cdot \exp(X_i'\beta)$ , where $\beta$ does not depend on $t$ .

This property means the for any two individuals is:

\frac{h(t \mid X_i)}{h(t | X_j)} = \frac{h_0(t) \exp(X_i'\beta)}{h_0(t) \exp(X_j'\beta)} = \exp((X_i - X_j)'\beta)

The baseline hazard $h_0(t)$ cancels, and the ratio is constant over time. If the proportional hazards assumption fails — for example, if a treatment effect wears off over time — the Cox model estimates a weighted average of the time-varying effect, which may be misleading (Grambsch & Therneau, 1994).

Assumption 2: Non-Informative Censoring

Plain language: The reason an observation is censored is unrelated to the likelihood of the event occurring. In the CEO example, this assumption means that the end of the study period, firm going private, or data unavailability is not systematically related to whether the CEO would have left soon.

Formally: $T \perp C | X$ , where $T$ is the true event time and $C$ is the censoring time.

This assumption is violated, for example, if healthier patients selectively drop out of a clinical trial, or if firms with troubled CEOs are more likely to be acquired (and thus disappear from the sample).

Assumption 3: Correct Specification (for Causal Interpretation)

Plain language: For the Cox model coefficients to have a causal interpretation, the covariates must be exogenous — the same zero conditional mean assumption required for OLS. If there are unobserved confounders that affect both the covariates and the hazard, the coefficient estimates are biased.

Formally: Conditional on $X_i$ , the event times are independent of any unobserved factors that affect the hazard. This requirement is the analog of the exogeneity condition in the proportional hazards framework.

This requirement is the same exogeneity concern as in any regression model. The Cox model does not solve endogeneity — it handles censoring and the functional form for duration data.

CVisual Intuition

Adjust the hazard ratio and censoring rate to see how the Kaplan-Meier survival curves diverge between treatment and control groups. Higher censoring makes the curves noisier and the log-rank test less powerful.

Explore how the proportional hazards assumption breaks down when the founder CEO advantage erodes over time:

DMathematical Derivation

Partial Likelihood Derivation

Don't worry about the notation yet — here's what this means in words: The partial likelihood eliminates the baseline hazard by conditioning on the set of individuals at risk at each event time. It estimates beta without requiring distributional assumptions on h_0(t).

Setup. Suppose there are $K$ distinct event times $t_1 < t_2 < \cdots < t_K$ . At event time $t_k$ , let $\mathcal{R}(t_k)$ denote the risk set — the set of individuals who are still under observation (have not yet experienced the event or been censored) just before $t_k$ .

Step 1: Conditional probability of failure. At time $t_k$ , exactly one individual (say, individual $j_k$ ) experiences the event. Conditional on one event occurring at $t_k$ , the probability that it is individual $j_k$ (rather than anyone else in the risk set) is:

P(\text{individual } j_k \text{ fails} \mid \text{one failure at } t_k) = \frac{h(t_k \mid X_{j_k})}{\sum_{l \in \mathcal{R}(t_k)} h(t_k \mid X_l)}

Step 2: Cancel the baseline hazard. Under the proportional hazards model:

\frac{h_0(t_k) \exp(X_{j_k}'\beta)}{\sum_{l \in \mathcal{R}(t_k)} h_0(t_k) \exp(X_l'\beta)} = \frac{\exp(X_{j_k}'\beta)}{\sum_{l \in \mathcal{R}(t_k)} \exp(X_l'\beta)}

The baseline hazard $h_0(t_k)$ cancels. This cancellation is the key insight of the partial likelihood: by conditioning on the risk set, we eliminate the nuisance parameter $h_0(t)$ .

Step 3: Construct the partial likelihood. The partial likelihood is the product over all event times:

PL(\beta) = \prod_{k=1}^{K} \frac{\exp(X_{j_k}'\beta)}{\sum_{l \in \mathcal{R}(t_k)} \exp(X_l'\beta)}

The log partial likelihood is:

\ell(\beta) = \sum_{k=1}^{K} \left[ X_{j_k}'\beta - \ln\left(\sum_{l \in \mathcal{R}(t_k)} \exp(X_l'\beta)\right) \right]

Step 4: Estimate $\beta$ . Maximize $\ell(\beta)$ numerically (Newton-Raphson). The resulting $\hat{\beta}$ is consistent and asymptotically normal under regularity conditions.

Step 5: Variance estimation. The variance of $\hat{\beta}$ is estimated from the inverse of the observed information matrix. For robust inference, use the sandwich variance estimator (analogous to robust standard errors in OLS), available in all major survival analysis packages.

Handling ties. When multiple events occur at the same time, the partial likelihood must be adjusted. The Efron approximation is preferred over the Breslow approximation (which is biased when ties are common). R's survival::coxph and Python's lifelines.CoxPHFitter both default to Efron. Stata's stcox defaults to Breslow; pass the efron option (or use texreg/estout tabulations after stcox …, efron) when ties are common.

EImplementation

Cox Regression with Diagnostics

1# Requires: survival
2# survival: R's core package for survival analysis (Therneau & Grambsch)
3library(survival)
4
5# --- Step 1: Kaplan-Meier survival curves ---
6# Visualize non-parametric survival estimates by group before modeling
7# Surv() creates a survival object: (time, event indicator)
8km_fit <- survfit(Surv(tenure, departed) ~ founder_ceo, data = df)
9plot(km_fit, col = c("blue", "red"), lwd = 2,
10   xlab = "Years", ylab = "Survival probability",
11   main = "CEO Tenure by Type")
12legend("topright", c("Professional CEO", "Founder CEO"),
13     col = c("blue", "red"), lwd = 2)
14
15# Log-rank test: non-parametric test for equality of survival curves
16# H0: survival functions are the same across groups
17survdiff(Surv(tenure, departed) ~ founder_ceo, data = df)
18
19# --- Step 2: Cox proportional hazards regression ---
20# coxph() estimates the semiparametric Cox model: h(t|X) = h0(t) * exp(X*beta)
21# ties = "efron": Efron approximation for tied event times (more accurate than Breslow)
22cox_fit <- coxph(Surv(tenure, departed) ~ founder_ceo + firm_size +
23                 roa + industry,
24               data = df,
25               ties = "efron")
26summary(cox_fit)
27
28# Hazard ratios with 95% confidence intervals
29# exp(coef) = hazard ratio: HR > 1 means higher hazard (shorter survival)
30exp(cbind(HR = coef(cox_fit), confint(cox_fit)))
31
32# --- Step 3: Test the proportional hazards assumption ---
33# cox.zph() tests whether Schoenfeld residuals trend with time
34# H0: hazard ratios are constant over time (PH holds)
35# A significant p-value indicates PH violation for that covariate
36ph_test <- cox.zph(cox_fit)
37print(ph_test)
38plot(ph_test)  # Flat line = PH holds; trend = PH violated
39
40# --- Step 4: Predicted survival curves ---
41# Generate survival curves at specific covariate values for comparison
42# Setting covariates to median creates a "representative" profile
43newdata <- data.frame(founder_ceo = c(0, 1),
44                    firm_size = median(df$firm_size),
45                    roa = median(df$roa),
46                    industry = "Manufacturing")
47surv_pred <- survfit(cox_fit, newdata = newdata)
48plot(surv_pred, col = c("blue", "red"), lwd = 2,
49   xlab = "Years", ylab = "Survival probability")
50legend("topright", c("Professional CEO", "Founder CEO"),
51     col = c("blue", "red"), lwd = 2)

FDiagnostics

F.1 Schoenfeld Residuals (Proportional Hazards Test)

The most important diagnostic for the Cox model. Grambsch and Therneau (1994) proposed testing the PH assumption by examining whether Schoenfeld residuals trend with time. Under the null hypothesis of proportional hazards, the scaled Schoenfeld residuals for each covariate should show no systematic pattern over time.

Global test: tests whether any covariate violates PH (reported by cox.zph() in R, estat phtest in Stata, check_assumptions() in lifelines)
Covariate-specific test: tests each covariate individually
Visual inspection: plot scaled Schoenfeld residuals against time. A flat line (zero slope) supports PH; a trend suggests violation

If the PH assumption is violated for a specific covariate:

Stratify: coxph(Surv(time, event) ~ x1 + strata(x2), data = df) — allows different baseline hazards by strata
Time interaction: include $x \times \ln(t)$ or $x \times t$ to allow the effect to vary over time
Split the time axis: estimate separate models for early and late periods

F.2 Cox-Snell Residuals (Overall Fit)

Cox-Snell residuals assess overall model fit. If the model is correct, these residuals should follow a unit exponential distribution. Plot the Nelson-Aalen cumulative hazard of the Cox-Snell residuals against the residuals themselves — the plot should follow a 45-degree line. Systematic departures indicate poor overall fit.

F.3 Log-Log Survival Plot

Plot $\ln(-\ln(\hat{S}(t)))$ versus $\ln(t)$ for different groups. Under proportional hazards, these curves should be approximately parallel. Crossing or converging curves indicate PH violation. This graphical diagnostic is especially useful for categorical covariates.

F.4 Martingale Residuals (Functional Form)

Martingale residuals assess whether continuous covariates enter the model with the correct functional form (Lin et al., 1993). Plot martingale residuals from a null model (no covariates) against each covariate. A nonlinear pattern suggests the covariate should be transformed (log, polynomial) or categorized.

F.5 Deviance Residuals (Outliers)

Deviance residuals are a normalized transformation of martingale residuals that are more symmetrically distributed around zero. Observations with large positive deviance residuals experienced the event "too soon" relative to the model's prediction; large negative values experienced it "too late" or were censored unexpectedly early.

1library(survival)
2
3cox_fit <- coxph(Surv(tenure, departed) ~ founder_ceo + firm_size + roa,
4               data = df, ties = "efron")
5
6# F.1 Schoenfeld residuals — PH test
7ph_test <- cox.zph(cox_fit)
8print(ph_test)          # Global and per-covariate tests
9par(mfrow = c(1, 3))
10plot(ph_test)            # Scaled Schoenfeld residuals vs time
11
12# F.2 Cox-Snell residuals — overall fit
13cs_resid <- df$departed - resid(cox_fit, type = "martingale")
14surv_cs <- survfit(Surv(cs_resid, df$departed) ~ 1)
15plot(surv_cs, fun = "cumhaz",
16   xlab = "Cox-Snell residuals",
17   ylab = "Cumulative hazard",
18   main = "Cox-Snell Residual Plot")
19abline(0, 1, col = "red", lty = 2)
20
21# F.3 Log-log survival plot
22km <- survfit(Surv(tenure, departed) ~ founder_ceo, data = df)
23plot(km, fun = "cloglog",
24   xlab = "ln(time)", ylab = "ln(-ln(S(t)))",
25   col = c("blue", "red"), main = "Log-Log Plot")
26legend("topleft", c("Professional", "Founder"),
27     col = c("blue", "red"), lwd = 2)
28
29# F.4 Martingale residuals — functional form
30null_fit <- coxph(Surv(tenure, departed) ~ 1, data = df)
31mart_resid <- resid(null_fit, type = "martingale")
32plot(df$firm_size, mart_resid,
33   xlab = "Firm size", ylab = "Martingale residuals",
34   main = "Functional Form Check")
35lines(lowess(df$firm_size, mart_resid), col = "red", lwd = 2)

Hazard Ratios vs. Coefficients

The Cox model estimates $\beta$ coefficients, but results are typically reported as hazard ratios (HR) $\exp(\beta)$ :

Coefficient $\beta$	Hazard Ratio $\exp(\beta)$	Interpretation
$\beta = -0.36$	$HR = 0.70$	30% lower hazard (event rate)
$\beta = 0$	$HR = 1.00$	No effect on hazard
$\beta = 0.41$	$HR = 1.50$	50% higher hazard
$\beta = 0.69$	$HR = 2.00$	Doubled hazard

For a continuous covariate: $HR = 1.05$ means a one-unit increase in $X$ is associated with a 5% increase in the instantaneous event rate, holding other covariates constant.

For a binary covariate: $HR = 0.65$ means the group with $X = 1$ has a hazard that is 65% of the reference group's hazard — a 35% lower event rate at every point in time.

What to Report in a Table

A well-reported Cox regression table should include:

Hazard ratios (not just coefficients) with 95% confidence intervals
Number of subjects and number of events (not just total N)
Median follow-up time or total person-time at risk
Proportion censored
PH test results (Schoenfeld global test p-value)
Tie-handling method (Efron vs. Breslow)
Standard error type (robust/sandwich if used)

GWhat Can Go Wrong

What Can Go Wrong

Ignoring Censoring (OLS on Observed Durations)

Cox model properly handles right-censored observations

Hazard ratio for founder CEO: 0.65 (SE = 0.08). Founder-CEOs have a 35% lower departure rate at any point in time, consistent with the Kaplan-Meier curves showing longer survival.

What Can Go Wrong

Proportional Hazards Violation

A clinical trial where the treatment reduces hazard at a constant rate over time (PH holds)

Cox HR = 0.60 (SE = 0.12). The treatment reduces the hazard by 40% at all time points. Schoenfeld residual test p = 0.45 (PH not rejected).

What Can Go Wrong

Informative Censoring

Censoring is administrative — all subjects are followed until a fixed end date, with censoring due only to the study ending

Cox HR = 0.70 (SE = 0.09). Censoring is non-informative because it depends only on the calendar date, not on patient characteristics.

What Can Go Wrong

Competing Risks Ignored

Researcher accounts for competing risks — CEO departure modeled separately for voluntary resignation, forced dismissal, and retirement using cause-specific or Fine-Gray models

Cause-specific HR for forced dismissal: 1.8 for low-performing firms. HR for voluntary resignation: 1.1 (not significant). The effect is concentrated in forced departures.

HPractice

H.1 Concept Checks

Concept Check

A researcher studies time to firm bankruptcy using a sample of 500 firms observed over 10 years. Of these, 150 go bankrupt and 350 survive to the end of the study. The researcher runs OLS: duration = beta_0 + beta_1 * leverage + beta_2 * size + epsilon. What is wrong with this approach?

Nothing — OLS is always appropriate for continuous outcomes.The 350 surviving firms are right-censored. OLS treats their observed time as their true bankruptcy time, biasing coefficients toward zero. A survival model (Cox, parametric) is needed.The problem is that bankruptcy is binary, so logit should be used.The problem is heteroscedasticity in the error term.

Concept Check

A Cox regression of employee turnover on job satisfaction produces a hazard ratio of 0.70 (95% CI: [0.58, 0.85]). How should this be interpreted?

A one-unit increase in job satisfaction reduces turnover probability by 30%.A one-unit increase in job satisfaction is associated with a 30% lower instantaneous turnover rate at any point in time, holding other covariates constant.Employees with higher job satisfaction stay 30% longer on average.The odds of turnover decrease by 30% for each unit of job satisfaction.

Concept Check

After fitting a Cox model for CEO tenure, you run the Schoenfeld residual test and find that the global test p-value is 0.002, with the covariate 'firm_performance' showing a significant trend (p = 0.001). What does this mean and what should you do?

The model is invalid and should be discarded entirely.The proportional hazards assumption is violated for firm_performance. The effect of firm_performance on the departure hazard changes over time. You generally want to either stratify by firm_performance, interact it with time, or use an alternative model.The result means firm_performance is endogenous and should be removed.Switch to logit because the Cox model assumptions are violated.

H.2 Guided Exercise

Guided Exercise

Interpreting Cox Regression Output

You study CEO tenure at S&P 500 firms. Your Cox model produces the following output:

Variable	Coeff (beta)	SE	HR [exp(beta)]	95% CI HR	p-value
Founder CEO	-0.43	0.11	0.65	[0.52, 0.81]	< 0.001
Firm size (log)	-0.15	0.06	0.86	[0.76, 0.97]	0.012
ROA	0.02	0.08	1.02	[0.87, 1.19]	0.820
Board independence	0.31	0.14	1.36	[1.04, 1.79]	0.027

N = 1,200 CEO spells; 720 events (departures); 480 censored (40%). Schoenfeld global test p = 0.38. Efron method for ties.

H.3 Error Detective

Error Detective

Read the analysis below carefully and identify the errors.

A management researcher studies time to first international expansion using a sample of 800 domestic firms observed from 2000 to 2020. Of these, 300 expanded internationally and 500 remained domestic. The researcher runs:

coxph(Surv(years_observed, expanded) ~ firm_size + rd_intensity + industry, data = df)

She reports: "Large firms internationalize 2.3 times faster than small firms (HR = 2.3, p < 0.001). The Kaplan-Meier curve shows a median time to internationalization of 8 years."

She does not report any diagnostic tests. In the data, 60 firms were acquired during the study period (and thus could no longer expand internationally). These acquisitions were coded as censored (expanded = 0).

Select all errors you can find:

Competing risks not addressed: acquisitions prevent internationalization(Event/censoring coding)

No proportional hazards test reported(Missing diagnostics)

Causal language without identification strategy(Results interpretation)

Error Detective

Read the analysis below carefully and identify the errors.

A health economist studies time to hospital readmission after heart surgery. She has 2,000 patients, of whom 800 are readmitted within 1 year and 1,200 are not readmitted. She fits a Cox model:

stset readmit_days, failure(readmitted) stcox age female diabetes surgery_type, efron

She finds HR for diabetes = 1.45 (p = 0.02). She concludes: "Diabetic patients have a 45% higher probability of readmission." She also notes that the Schoenfeld test for diabetes gives p = 0.04 but does not discuss this result. She reports that 200 patients died during follow-up and were coded as censored.

Select all errors you can find:

Hazard ratio misinterpreted as probability change(Results interpretation)

PH violation detected but ignored(Diagnostics)

Death is a competing risk, not a censoring event(Event/censoring coding)

H.4 You Are the Referee

Referee Exercise

Read the paper summary below and write a brief referee critique (2-3 sentences) of the identification strategy.

Paper Summary

The authors study whether founder-CEOs survive longer in their positions than professional CEOs using a Cox proportional hazard model on 950 CEO spells at publicly traded firms (2000-2018). They find that founder-CEOs have a significantly lower departure hazard (HR = 0.58, p < 0.001), controlling for firm size, ROA, board independence, and industry fixed effects. They do not test the proportional hazards assumption. Forty-five CEOs in the sample died in office; these deaths were coded as censored observations.

Key Table

Variable	Coefficient	SE	p-value
Founder CEO	-0.54	0.13	<0.001
Firm size (log)	-0.18	0.07	0.010
ROA	0.05	0.09	0.578
Board independence	0.29	0.15	0.053
N	950
Events (departures)	580
Censored	370

Authors' Identification Claim

The authors argue that the Cox model identifies the causal effect of founder status on CEO tenure by controlling for observable firm characteristics.

ISwap-In: When to Use Something Else

Parametric models (Weibull, exponential, Gompertz): when you need to estimate the baseline hazard — for prediction, simulation, or forecasting. More efficient than Cox when the distributional assumption is correct. Weibull is the simplest two-parameter family (can model increasing or decreasing hazards). See Therneau and Grambsch (2000) for extensions.
Accelerated failure time (AFT) models: when the proportional hazards assumption is violated. AFT models parameterize the effect of covariates as a multiplicative shift in the time scale, not the hazard scale: $\ln(T) = X'\gamma + \sigma \varepsilon$ . This formulation is analogous to OLS on log(duration) but with proper censoring handling. Common distributions: log-normal, log-logistic, Weibull.
Discrete-time hazard models: when time is measured in discrete intervals (e.g., annual employment spells, quarterly observations) or when you have interval-censored data. These models are logit or complementary log-log specifications applied to person-period data (Singer & Willett, 2003).
Competing risks (Fine-Gray): when multiple event types exist and you want to estimate effects on the cumulative incidence of a specific event, rather than the cause-specific hazard (Fine & Gray, 1999). The Fine-Gray model accounts for the fact that experiencing a competing event removes the subject from risk for the event of interest.
Frailty models (random effects): when you have clustered data (e.g., employees within firms, patients within hospitals) and want to account for unobserved heterogeneity. A frailty term adds a random effect to the Cox model (Therneau & Grambsch, 2000).

JReviewer Checklist

Paper Library

Has replication code

Foundational (5)

Cox, D. R. (1972). Regression Models and Life-Tables.

Journal of the Royal Statistical Society: Series B (Methodological)DOI: 10.1111/j.2517-6161.1972.tb00899.x

Cox introduces the proportional hazards model with an unspecified baseline hazard, estimated via partial likelihood. The semiparametric approach avoids distributional assumptions on the baseline hazard while allowing covariate effects to be estimated consistently. One of the most cited papers in statistics.

Fine, J. P., & Gray, R. J. (1999). A Proportional Hazards Model for the Subdistribution of a Competing Risk.

Journal of the American Statistical AssociationDOI: 10.1080/01621459.1999.10474144

Fine and Gray develop a regression model for the cumulative incidence function under competing risks. The Fine-Gray model extends the Cox framework to settings where multiple event types compete, allowing estimation of covariate effects on the subdistribution hazard.

Grambsch, P. M., & Therneau, T. M. (1994). Proportional Hazards Tests and Diagnostics Based on Weighted Residuals.

BiometrikaDOI: 10.1093/biomet/81.3.515

Grambsch and Therneau introduce the scaled Schoenfeld residual test for the proportional hazards assumption. Plotting scaled Schoenfeld residuals against time reveals time-varying effects. The test is the standard diagnostic in applied survival analysis.

Kaplan, E. L., & Meier, P. (1958). Nonparametric Estimation from Incomplete Observations.

Journal of the American Statistical AssociationDOI: 10.1080/01621459.1958.10501452

Kaplan and Meier introduce the product-limit estimator (Kaplan-Meier estimator) for the survival function from right-censored data. The KM curve is the standard nonparametric tool for visualizing survival and comparing groups before fitting regression models.

Lin, D. Y., Wei, L. J., & Ying, Z. (1993). Checking the Cox Model with Cumulative Sums of Martingale-Based Residuals.

BiometrikaDOI: 10.1093/biomet/80.3.557

Lin, Wei, and Ying develop graphical and numerical methods for checking the Cox model using cumulative sums of martingale-based residuals. Provides formal tests for the proportional hazards assumption, functional form of covariates, and overall model adequacy.

Application (1)

Shumway, T. (2001). Forecasting Bankruptcy More Accurately: A Simple Hazard Model.

Journal of BusinessDOI: 10.1086/209665

Shumway shows that discrete-time hazard models outperform static logit models for bankruptcy prediction because they properly account for the time dimension and censoring. Demonstrates the importance of survival analysis framing for event prediction in finance.

Survey (4)

Cleves, M., Gould, W., & Marchenko, Y. (2016). An Introduction to Survival Analysis Using Stata.

Stata Press

Cleves, Gould, and Marchenko provide a comprehensive practical guide to survival analysis in Stata. Covers Kaplan-Meier estimation, Cox regression, parametric models, competing risks, and frailty models with extensive Stata code examples and diagnostic procedures.

Singer, J. D., & Willett, J. B. (2003). Applied Longitudinal Data Analysis: Modeling Change and Event Occurrence.

Oxford University PressDOI: 10.1093/acprof:oso/9780195152968.001.0001

Singer and Willett write an accessible textbook covering both growth curve models and discrete-time survival analysis. Chapters 9-15 provide a clear introduction to hazard modeling for social science researchers, with worked examples and practical guidance.

Therneau, T. M., & Grambsch, P. M. (2000). Modeling Survival Data: Extending the Cox Model.

SpringerDOI: 10.1007/978-1-4757-3294-8

Therneau and Grambsch provide an authoritative reference on extensions of the Cox model including time-varying covariates, stratification, frailty models, and multistate models. The R survival package is maintained by Therneau and implements the methods described here.

Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data.

MIT Press

Wooldridge's graduate textbook covers duration and hazard models in Chapter 22, including the Cox proportional hazard model, parametric alternatives (Weibull, exponential), and the treatment of censoring and truncation in survival data.

One-Line Implementation

Download Full Analysis Code

Motivating Example: Founder vs. Professional CEO Tenure#

AOverview#

What the Cox Model Does#

Key Concepts#

How It Differs from OLS#

When to Use the Cox Model#

When NOT to Use the Cox Model#

Connection to Other Methods#

Common Confusions#

BIdentification#

Assumption 1: Proportional Hazards#

Assumption 2: Non-Informative Censoring#

Assumption 3: Correct Specification (for Causal Interpretation)#

CVisual Intuition#

DMathematical Derivation#

Partial Likelihood Derivation#

EImplementation#

Cox Regression with Diagnostics#

FDiagnostics#

F.1 Schoenfeld Residuals (Proportional Hazards Test)#

F.2 Cox-Snell Residuals (Overall Fit)#

F.3 Log-Log Survival Plot#

F.4 Martingale Residuals (Functional Form)#

F.5 Deviance Residuals (Outliers)#

Hazard Ratios vs. Coefficients#

What to Report in a Table#

GWhat Can Go Wrong#

Ignoring Censoring (OLS on Observed Durations)

Proportional Hazards Violation

Informative Censoring

Competing Risks Ignored

HPractice#

H.1 Concept Checks#

H.2 Guided Exercise#

H.3 Error Detective#

H.4 You Are the Referee#

Paper Summary

Key Table

Authors' Identification Claim

ISwap-In: When to Use Something Else#

JReviewer Checklist#

Critical Reading Checklist

Paper Library

Foundational (5)

Application (1)

Survey (4)

Tags

Motivating Example: Founder vs. Professional CEO Tenure

AOverview

What the Cox Model Does

Key Concepts

How It Differs from OLS

When to Use the Cox Model

When NOT to Use the Cox Model

Connection to Other Methods

Common Confusions

BIdentification

Assumption 1: Proportional Hazards

Assumption 2: Non-Informative Censoring

Assumption 3: Correct Specification (for Causal Interpretation)

CVisual Intuition

DMathematical Derivation

Partial Likelihood Derivation

EImplementation

Cox Regression with Diagnostics

FDiagnostics

F.1 Schoenfeld Residuals (Proportional Hazards Test)

F.2 Cox-Snell Residuals (Overall Fit)

F.3 Log-Log Survival Plot

F.4 Martingale Residuals (Functional Form)

F.5 Deviance Residuals (Outliers)

Hazard Ratios vs. Coefficients

What to Report in a Table

GWhat Can Go Wrong

HPractice

H.1 Concept Checks

H.2 Guided Exercise

H.3 Error Detective

H.4 You Are the Referee

ISwap-In: When to Use Something Else

JReviewer Checklist