9.4.3 The Cox Regression Model

When analysing survival data, one is often interested in elucidating the influence of explanatory variables on the survivor and hazard functions. For instance, when analysing the Heart Valve dataset, one is probably interested in knowing the influence of a patient’s age on the chances of survival.

Let $h_1(t)$ and $h_2(t)$ be the hazards of death at time $t$ for two groups, 1 and 2. The Cox regression model allows elucidating the influence of the group variable using the proportional hazards assumption, i.e., the assumption that the hazards can be expressed as:

$$h_1(t) = \psi\, h_2(t), \tag{9.36}$$

where the positive constant $\psi$ is known as the hazard ratio, mentioned in 9.3. Let $X$ be an indicator variable such that its value for the $i$th individual, $x_i$, is 1 or 0, according to the group membership of the individual. In order to impose a positive value to $\psi$, we rewrite formula 9.36 as:

$$h_i(t) = e^{\beta x_i} h_0(t). \tag{9.37}$$

Thus $h_2(t) = h_0(t)$ and $\psi = e^{\beta}$. This model can be generalised for $p$ explanatory variables:

$$h_i(t) = e^{\eta_i} h_0(t), \quad \text{with } \eta_i = \beta_1 x_{1i} + \beta_2 x_{2i} + \ldots + \beta_p x_{pi}, \tag{9.38}$$

where $\eta_i$ is known as the risk score and $h_0(t)$ is the baseline hazard function, i.e., the hazard that one would obtain if all independent explanatory variables were zero.
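The role of the risk score can be illustrated with a minimal Python sketch. The function names and the coefficient values below are hypothetical, chosen only for illustration; the point is that the ratio of two hazards depends on the difference of risk scores and not on the baseline hazard, which cancels out.

```python
import math

def risk_score(betas, x):
    """Risk score eta_i = beta_1*x_1i + ... + beta_p*x_pi for one individual."""
    return sum(b * xi for b, xi in zip(betas, x))

def hazard_ratio(betas, x_a, x_b):
    """Ratio h_a(t) / h_b(t) = exp(eta_a - eta_b); h_0(t) cancels out."""
    return math.exp(risk_score(betas, x_a) - risk_score(betas, x_b))

# Hypothetical coefficients for two explanatory variables (illustration only).
betas = [0.5, -0.2]

# An individual with all explanatory variables at zero has the baseline hazard.
print(hazard_ratio(betas, [0, 0], [0, 0]))

# Single group indicator (x = 1 versus x = 0): the ratio equals psi = exp(beta).
print(hazard_ratio([0.5], [1], [0]))
```

With a single indicator variable the second call reproduces the two-group case, where the hazard ratio is the exponential of the coefficient.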

The Cox regression model is the most general of the regression models for survival data since it does not assume any particular underlying survival distribution. The model is fitted to the data by first estimating the risk score using a log-likelihood approach and finally computing the baseline hazard by an iterative procedure. As a result of the model fitting process, one can obtain parameter estimates and plots for specific values of the explanatory variables.
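The two-stage fitting described above can be sketched in Python with NumPy. This is a rough illustration under stated assumptions, not the procedure used by any particular package: the coefficient is estimated by Newton-Raphson on the Cox partial log-likelihood (one covariate, Breslow handling of ties), and the baseline survivor function is then obtained with the closed-form Breslow estimator, used here as a simple stand-in for the iterative procedure mentioned in the text. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def fit_cox_1d(time, event, x, n_iter=25, tol=1e-10):
    """Newton-Raphson on the Cox partial log-likelihood for one covariate.

    Tied event times are handled with the Breslow approximation.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    x = np.asarray(x, dtype=float)
    event_times = np.unique(time[event == 1])
    beta = 0.0
    for _ in range(n_iter):
        grad, info = 0.0, 0.0
        for t in event_times:
            at_risk = time >= t                  # risk set just before t
            dying = (time == t) & (event == 1)   # events observed at t
            d = dying.sum()
            w = np.exp(beta * x[at_risk])        # relative risks exp(eta_j)
            mean_x = np.sum(w * x[at_risk]) / w.sum()
            mean_x2 = np.sum(w * x[at_risk] ** 2) / w.sum()
            grad += x[dying].sum() - d * mean_x  # score contribution
            info += d * (mean_x2 - mean_x ** 2)  # information contribution
        if info == 0.0:
            break
        step = grad / info
        beta += step
        if abs(step) < tol:
            break
    return beta

def breslow_baseline(time, event, x, beta):
    """Breslow estimate of the baseline survivor function S_0(t)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    x = np.asarray(x, dtype=float)
    event_times = np.unique(time[event == 1])
    increments = np.array([
        np.sum((time == t) & (event == 1)) / np.sum(np.exp(beta * x[time >= t]))
        for t in event_times
    ])
    return event_times, np.exp(-np.cumsum(increments))

# Synthetic data (assumption): hazard exp(0.5 * x) with unit baseline hazard,
# plus independent random censoring.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
t_event = rng.exponential(scale=1.0 / np.exp(0.5 * x))
t_cens = rng.exponential(scale=3.0, size=n)
time = np.minimum(t_event, t_cens)
event = (t_event <= t_cens).astype(int)

beta_hat = fit_cox_1d(time, event, x)
times0, s0 = breslow_baseline(time, event, x, beta_hat)
print("estimated beta:", beta_hat)
```

The estimate should land near the value 0.5 used to generate the data, and `s0` gives the baseline survivor curve at the observed event times. In practice one would rely on an established survival analysis package rather than on a hand-written loop like this one.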

Example 9.10

Q: Determine the Cox regression solution for the Heart Valve dataset (event-free survival time), using Age as the explanatory variable. Compare the survivor functions and determine the estimated percentages of an event-free 10-year post-operative period for the mean age and for 20- and 60-year-old patients as well.

A: STATISTICA determines the parameter $\beta_{Age} = 0.0214$ for the Cox regression model. The chi-square test under the null hypothesis of “no Age influence” yields an observed p = 0.004. Therefore, variable Age is highly significant in the estimation of survival times, i.e., is an explanatory variable.

Figure 9.9a shows the baseline survivor function. Figures 9.9b, c and d, show the survivor function plots for 20, 47.17 (mean age) and 60 years, respectively. As expected, the probability of a given post-operative event-free period decreases with age (survivor curves lower with age). From these plots, we see that the estimated percentages of patients with post-operative event-free 10-year periods are 80%, 65% and 59% for 20, 47.17 (mean age) and 60 year-old patients, respectively.
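The percentages quoted above can be cross-checked with a short Python sketch. Under the proportional hazards model, the survivor function satisfies S(t | x) = S_0(t)^exp(βx), so the survivor values at two ages are related by S(t | age) = S(t | age_ref)^exp(β(age − age_ref)). The sketch takes the reported coefficient 0.0214 and the 80% value for 20-year-old patients as inputs; the function name is hypothetical.

```python
import math

BETA_AGE = 0.0214  # Cox coefficient for Age reported in the example

def survivor_at_age(s_ref, age_ref, age, beta=BETA_AGE):
    """S(t | age) from S(t | age_ref) under proportional hazards."""
    return s_ref ** math.exp(beta * (age - age_ref))

s20 = 0.80  # 10-year event-free proportion at age 20, as quoted in the text

s_mean = survivor_at_age(s20, 20, 47.17)  # mean age
s60 = survivor_at_age(s20, 20, 60)

print(round(s_mean, 2), round(s60, 2))

# Hazard ratio for a 10-year increase in age.
print(round(math.exp(10 * BETA_AGE), 2))
```

This gives roughly 0.67 and 0.59 for the mean age and for 60 years, close to the 65% and 59% read from the plots; small differences are to be expected since the plot values are read by eye.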

[Figure 9.9, panels a to d: survivor curves with Cumulative Proportion Surviving on the vertical axis and Survival Time on the horizontal axis; panels c and d are titled "Age = 47.17" and "Age = 60".]

Figure 9.9. Baseline survivor function (a) and survivor functions for patients of different ages (b, c and d) submitted to heart valve implant (Heart Valve dataset), obtained by Cox regression in STATISTICA. The survival times are in days. Age = 47.17 (years) corresponds to the sample mean age.

Exercises

9.1 Determine the probability of having no complaint in the first year for the Car Sale dataset using the life table and Kaplan-Meier estimates of the survivor function.

9.2 Redo Example 9.3 for the iron specimens submitted to high loads using the Kaplan-Meier estimate of the survivor function.

9.3 Redo the previous Exercise 9.2 for the aluminium specimens submitted to low and high loads. Compare the results.

9.4 Consider the Heart Valve dataset. Compute the Kaplan-Meier estimate for the following events: death after 1st operation, death after 1st or 2nd operations, re-operation and endocarditis occurrence. Compute the following statistics:

a) Percentage of patients surviving 5 years.
b) Percentage of patients without endocarditis in the first 5 years.
c) Median survival time with 95% confidence interval.

9.5 Compute the median time until breaking for all specimen types of the Fatigue dataset.

9.6 Redo Example 9.7 for the high amplitude load groups of the Fatigue dataset. Compare the survival times of the iron and aluminium specimens using the Log-Rank or Peto-Wilcoxon tests. Discuss which of these tests is more appropriate.

9.7 Consider the following two groups of patients submitted to heart valve implant (Heart Valve dataset), according to the pre-surgery heart functional class:

i. Patients with mild or no symptoms before the operation (PRE C < 3).
ii. Patients with severe symptoms before the operation (PRE C ≥ 3).

Compare the survival time until death of these two groups using the most appropriate of the Log-Rank or Peto-Wilcoxon tests.

9.8 Determine the exponential and Weibull estimates of the survivor function for the Car Sale dataset. Verify that a Weibull model is more appropriate than the exponential model and compute the median time until complaint for that model.

9.9 Redo Example 9.9 for all group specimens of the Fatigue dataset. Determine which groups are better modelled by the Weibull distribution.

9.10 Consider the Weather dataset (Data 1) containing daily measurements of wind speed in m/s at 12H00. Assuming that a wind stroke at 12H00 was used to light an electric lamp by means of an electric dynamo, the time that the lamp would glow is proportional to the wind speed. The wind speed data can thus be interpreted as survival data. Fit a Weibull model to this data using n = 10, 20 and 30 time intervals. Compare the corresponding parameter estimates.

9.11 Compare the survivor functions for the wind speed data of the previous Exercise 9.10 for the groups corresponding to the two seasons: winter and summer. Use the most appropriate of the Log-Rank or Peto-Wilcoxon tests.

9.12 Using the Heart Valve dataset, determine the Cox regression solution for the survival time until death of patients undergoing heart valve implant with Age as the explanatory variable. Determine the estimated percentage of a 10-year survival time after operation for 30-year-old patients.

9.13 Using the Cox regression model for the time until breaking of the aluminium specimens of the Fatigue dataset, verify the following results:

a) The load amplitude (AMP variable) is an explanatory variable, with chi-square p = 0.
b) The probability of surviving 2 million cycles for amplitude loads of 80 and 100 MPa is 0.6 and 0.17, respectively (point estimates).

9.14 Using the Cox regression model, show that the load amplitude (AMP variable) cannot be accepted as an explanatory variable for the time until breaking of the iron specimens of the Fatigue dataset. Verify that the survivor functions are approximately the same for different values of AMP.