The Life Table Analysis

9.2.1 The Life Table Analysis

In survival analysis, the survivor and hazard functions are estimated from the observed survival times. Consider a set of ordered survival times $t_1, t_2, \ldots, t_k$. One may then estimate any particular value of the survivor function, $S(t_i)$, in the following way:

$$S(t_i) = P(\text{surviving to time } t_i) = P(\text{surviving to time } t_1) \times P(\text{surviving to time } t_2 \mid \text{survived to time } t_1) \times \ldots \times P(\text{surviving to time } t_i \mid \text{survived to time } t_{i-1}). \tag{9.6}$$

Let us denote by $n_j$ the number of individuals that are alive at the start of the interval $[t_j, t_{j+1}[$, and by $d_j$ the number of individuals that die during that interval. We then derive the following non-parametric estimate:

$$\hat{P}(\text{surviving to } t_{j+1} \mid \text{survived to } t_j) = \frac{n_j - d_j}{n_j}, \tag{9.7}$$

from which we estimate $S(t_i)$ using formula 9.6.

Example 9.1

Q: A car stand has a record of the sale date and the date that a complaint was first presented for three different cars (this is a subset of the Car Sale dataset in Appendix E). These dates are shown in Table 9.1. Compute the estimate of the time-to-complaint probability for t = 300 days.

A: In this example, the time-to-complaint, “Complaint Date” − “Sale Date”, is the survival time. The computed times in days are shown in the last column of Table 9.1. Since there are no complaints occurring between days 261 and 300, we may apply 9.6 and 9.7 as follows:

9.2 Non-Parametric Analysis of Survival Data

$$\hat{S}(300) = \hat{S}(261) = \hat{P}(\text{surviving to } 240)\,\hat{P}(\text{surviving to } 261 \mid \text{survived to } 240) = \frac{3-1}{3} \times \frac{2-1}{2} = \frac{1}{3}.$$

Alternatively, one could also compute this estimate as (3 – 2)/3, considering the [0, 261] interval.

Table 9.1. Time-to-complaint data in car sales (3 cars).

Car   Sale Date   Complaint Date   Time-to-complaint (days)
#1    1-Nov-00    29-Jun-01        240
#2    22-Nov-00   10-Aug-01        261
#3    16-Feb-01   30-Jan-02        348
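The computation in Example 9.1 can be reproduced with a few lines of Python. This is only a sketch: the function name `survivor_estimate` is introduced here for illustration, and it assumes there are no censored cases, as in this example.

```python
from datetime import date

# Sale and first-complaint dates of the three cars in Table 9.1.
records = [
    (date(2000, 11, 1), date(2001, 6, 29)),
    (date(2000, 11, 22), date(2001, 8, 10)),
    (date(2001, 2, 16), date(2002, 1, 30)),
]

# Survival time = complaint date - sale date, in days.
times = sorted((complaint - sale).days for sale, complaint in records)


def survivor_estimate(times, t):
    """Estimate S(t) as the product of (n_j - d_j) / n_j over event times <= t.

    Hypothetical helper; valid only when no observation is censored.
    """
    s = 1.0
    for t_j in sorted(set(times)):
        if t_j > t:
            break
        n_j = sum(1 for x in times if x >= t_j)  # alive just before t_j
        d_j = sum(1 for x in times if x == t_j)  # events occurring at t_j
        s *= (n_j - d_j) / n_j
    return s


print(times)                          # [240, 261, 348]
print(survivor_estimate(times, 300))  # about 0.333, i.e. (2/3) * (1/2)
```

The same value is obtained by counting directly: one of the three cars is still without complaint at day 300.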

In a survival study, information concerning the “death” times of one or more cases that entered the study is often not available, either because the cases were “lost” during the study or because they are still “alive” at the end of the study. These are the so-called censored cases¹.

The information of the censored cases must also be taken into consideration when estimating the survivor function. Let us denote by $c_j$ the number of cases censored in the interval $[t_j, t_{j+1}[$. The actuarial or life-table estimate of the survivor function is a non-parametric estimate that assumes that the censored survival times occur uniformly throughout that interval, so that the average number of individuals that are at risk of dying during $[t_j, t_{j+1}[$ is:

$$n^*_j = n_j - c_j/2. \tag{9.8}$$

Taking into account formulas 9.6 and 9.7, the life-table estimate of the survivor function is computed as:

$$\hat{S}(t) = \prod_{j=1}^{k} \frac{n^*_j - d_j}{n^*_j}, \quad \text{for } t_k \le t < t_{k+1}. \tag{9.9}$$

The estimate of the hazard function, based on 9.5, is given by:

$$\hat{h}(t) = \frac{d_j}{(n^*_j - d_j/2)\,\tau_j}, \quad \text{for } t_j \le t < t_{j+1},$$

where $\tau_j$ is the length of the $j$th time interval.
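As a worked illustration, consider a single interval with hypothetical counts $n_j = 5$, $c_j = 1$, $d_j = 1$ and length $\tau_j = 75$ days. The hazard expression used here is the usual actuarial form, with $d_j/2$ subtracted from the number at risk; it is stated as an assumption.

```latex
\begin{align*}
n^*_j &= n_j - c_j/2 = 5 - 1/2 = 4.5,\\
\hat{P}(\text{surviving to } t_{j+1} \mid \text{survived to } t_j)
      &= \frac{n^*_j - d_j}{n^*_j} = \frac{3.5}{4.5} \approx 0.778,\\
\hat{h}(t) &= \frac{d_j}{(n^*_j - d_j/2)\,\tau_j}
            = \frac{1}{4 \times 75} \approx 0.0033 \text{ per day}.
\end{align*}
```

The censored case counts as half an individual at risk, which is why the conditional survival probability is 3.5/4.5 rather than 4/5.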

¹ The type of censoring described here, known as right censoring, is the one most frequently encountered. There are other, less frequent types of censoring.

9 Survival Analysis

Example 9.2

Q: Consider that the data records of the car stand (Car Sale dataset), presented in the previous example, were enlarged to 8 cars, as shown in Table 9.2. Determine the survivor and hazard functions using the life-table estimate.

A: We now have two sources of censored data: the three cars that are known to have had no complaints at the end of the study, and one car whose owner could not be contacted at the end of the study, but whose car was known to have had no complaint at a previous date. We can summarise this information as shown in Table 9.3.

Using SPSS, with the time-to-complaint and censored columns of Table 9.3 and a specification of displaying time intervals 0 through 600 days by 75 days, we obtain the life-table estimate results shown in Table 9.4. Figure 9.1 shows the survivor function plot. Note that it is a monotonic decreasing function.

Table 9.2. Time-to-complaint data in car sales (8 cars).

Car   Sale Date   Complaint Date   Without Complaint at    Last Date Known to be
                                   the End of the Study    Without Complaint
#1    12-Sep-00   -                31-Mar-02               -
#2    26-Oct-00   31-Mar-02        -                       -
#3    01-Nov-00   29-Jun-01        -                       -
#4    22-Nov-00   10-Aug-01        -                       -
#5    18-Jan-01   -                31-Mar-02               -
#6    02-Jul-01   -                -                       24-Sep-01
#7    16-Feb-01   30-Jan-02        -                       -
#8    03-May-01   -                31-Mar-02               -

Table 9.3. Summary table of the time-to-complaint data in car sales (8 cars).

Car   Start Date   Stop Date   Censored   Time-to-complaint (days)
#1    12-Sep-00    31-Mar-02   TRUE       565
#2    26-Oct-00    31-Mar-02   FALSE      521
#3    01-Nov-00    29-Jun-01   FALSE      240
#4    22-Nov-00    10-Aug-01   FALSE      261
#5    18-Jan-01    31-Mar-02   TRUE       437
#6    02-Jul-01    24-Sep-01   TRUE       84
#7    16-Feb-01    30-Jan-02   FALSE      348
#8    03-May-01    31-Mar-02   TRUE       332


Columns 2 through 5 of Table 9.4 list the successive values of $n_j$, $c_j$, $n^*_j$, and $d_j$, respectively. The “Propn Surviving” column is obtained by applying formula 9.7 with correction for censored data (formula 9.8). The “Cumul Propn Surv at End” column lists the values of $\hat{S}(t)$ obtained with formula 9.9. The “Propn Terminating” column is the complement of the “Propn Surviving” column. Finally, the last two columns list the values of the probability density and hazard functions, computed with the finite difference approximation of $f(t) = \Delta F(t)/\Delta t$ and formula 9.5, respectively.

Table 9.4. Life-table of the time-to-complaint data, obtained with SPSS.

Columns: Intrvl Start Time; Number Entrng this Intrvl; Number Wdrawn During Intrvl; Number Exposd to Risk; Number of Termnl Events; Propn Terminating; Propn Surviving; Cumul Propn Surv at End; Probability Density; Hazard Rate.

Figure 9.1. Life-table estimate of the survivor function for the time-to-complaint data (first eight cases of the Car Sale dataset) obtained with SPSS.
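The life-table computation of Example 9.2 can be sketched in Python as follows. This is not the SPSS output: the function `life_table` is a hypothetical helper written for this illustration, and the hazard column assumes the usual actuarial form $d_j / ((n^*_j - d_j/2)\,\tau_j)$, so small differences in rounding or conventions with respect to Table 9.4 are possible.

```python
# Time-to-complaint (days) and censoring flags from Table 9.3.
times = [565, 521, 240, 261, 437, 84, 348, 332]
censored = [True, False, False, False, True, True, False, True]


def life_table(times, censored, width, end):
    """Actuarial life table over intervals [start, start + width) up to `end`.

    Hypothetical helper; each row is
    (start, n_j, c_j, n*_j, d_j, propn surviving, cumul propn surviving, hazard).
    """
    rows = []
    cum_surv = 1.0
    for start in range(0, end, width):
        stop = start + width
        n = sum(1 for t in times if t >= start)  # entering the interval
        c = sum(1 for t, z in zip(times, censored) if z and start <= t < stop)
        d = sum(1 for t, z in zip(times, censored) if not z and start <= t < stop)
        n_star = n - c / 2                       # formula 9.8
        p_surv = (n_star - d) / n_star if n_star > 0 else 1.0
        cum_surv *= p_surv                       # running product, formula 9.9
        # Assumed actuarial hazard: d / ((n* - d/2) * width).
        hazard = d / ((n_star - d / 2) * width) if n_star > d / 2 else 0.0
        rows.append((start, n, c, n_star, d, p_surv, cum_surv, hazard))
    return rows


for row in life_table(times, censored, width=75, end=600):
    print("%4d %2d %2d %4.1f %2d %.4f %.4f %.5f" % row)
```

With these data the first complaints fall in the interval starting at day 225, where 2 of the 7 exposed cases terminate and the cumulative proportion surviving drops to 5/7.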

Example 9.3

Q: Consider the amount of time until breaking of iron specimens, submitted to low amplitude sinusoidal loads (Group 1) in fatigue tests, a sample of which is given in the Fatigue dataset. Determine the survivor, hazard and density functions using the life-table estimate procedure. What is the estimated percentage of specimens breaking beyond 2 million cycles? In addition, determine the estimated percentage of specimens that will break at 500000 cycles.

A: We first convert the time data, given in number of 20 Hz cycles, to a lower range of values by dividing it by 10000. Next, we use this data with SPSS, assigning the Break variable as a censored data indicator (Break = 1 if the specimen has broken), and obtain the plots of the requested functions between 0 and 400 with steps of 20, shown in Figure 9.2.

Note the right tailed, positively skewed aspect of the density function, shown in Figure 9.2b, typical of survival data. From Figure 9.2a, we see that the estimated percentage of specimens surviving beyond 2 million cycles (marked 200 in the t axis) is over 45%. From Figure 9.2c, we expect a break rate of about 0.4% at 500000 cycles (marked 50 in the t axis).

Figure 9.2. Survival functions for the group 1 iron specimens of the Fatigue dataset, obtained with SPSS: a) Survivor function, S(t); b) Density function, f(t); c) Hazard function, h(t). The time scale is given in $10^4$ cycles.

Commands 9.1. SPSS, STATISTICA, MATLAB and R commands used to perform survival analysis.

SPSS         Analyze; Survival
STATISTICA   Statistics; Advanced Linear/Nonlinear Models; Survival Analysis; Life tables & Distributions | Kaplan & Meier | Comparing two samples | Regression models
MATLAB       [par, pci] = expfit(x,alpha)
             [par, pci] = weibfit(x,alpha)
R            Surv(time,event); survfit(survobject)
             survdiff(survobject ~ group, rho); coxph(survobject ~ factor)


SPSS uses as input data in survival analysis the survival time (e.g. last column of Table 9.3) and a censoring variable (Status). STATISTICA allows, as an alternative, the specification of the start and stop dates (e.g., second and third columns of Table 9.3) either in date format or as separate columns for day, month and year. All the features described in the present chapter are easily found in SPSS or STATISTICA windows.

The MATLAB Statistics Toolbox does not have specific functions for survival analysis. It has, however, the expfit and weibfit functions, which can be used for parametric survival analysis (see section 9.4) since they compute the maximum likelihood estimates of the parameters of the exponential and Weibull distributions, respectively, fitting the data vector x. The parameter estimates are returned in par. The 100(1 − alpha)% confidence intervals of the parameters are returned in pci.

A suite of R functions for survival analysis, together with functions for operating with dates, is available in the survival package. Be sure to load it first with library(survival). The Surv function is used as a preliminary operation to create an object (a Surv object) that serves as an argument for other functions. The arguments of Surv are a time vector and an event vector. The event vector contains the censoring information. Let us illustrate the use of Surv for the Example 9.2 dataset. We assume that the last two columns of Table 9.3 are stored in t and ev, respectively for “Time-to-complaint” and “Censored”, and that the ev values are 1 for “censored” and 0 for “not censored”. We then apply Surv as follows:

> library(survival)
> x <- Surv(t[1:8],ev[1:8]==0)
> x
[1] 565+ 521 240 261 437+ 84+ 348 332+

The event argument of Surv must specify which value corresponds to “not censored”; hence the specification ev[1:8]==0. In the list above, the values marked with “+” are the censored observations (any observation with an event label different from 0 is deemed “censored”). We may next proceed, for instance, to create a Kaplan-Meier estimate of the data using survfit(x ~ 1) (or, if preferred, survfit(Surv(t[1:8],ev[1:8]==0) ~ 1)). Recent versions of the survival package require this formula form, whereas older versions also accepted survfit(x).
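To make explicit what a Kaplan-Meier estimate computes for these data, here is a minimal Python sketch. It is not the survfit implementation: the function `kaplan_meier` is a hypothetical helper, and the vectors t and ev follow the coding assumed above (ev equal to 0 means “not censored”).

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    Hypothetical helper; events[i] is True when case i is an observed
    event (not censored).
    """
    estimate = []
    s = 1.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for x in times if x >= t)  # cases still followed at t
        deaths = sum(1 for x, e in zip(times, events) if e and x == t)
        s *= (at_risk - deaths) / at_risk
        estimate.append((t, s))
    return estimate


# Example 9.2 data; ev == 0 marks "not censored", mirroring ev[1:8]==0.
t = [565, 521, 240, 261, 437, 84, 348, 332]
ev = [1, 0, 0, 0, 1, 1, 0, 1]
for time, s in kaplan_meier(t, [e == 0 for e in ev]):
    print(time, round(s, 4))
```

The estimate changes only at the four observed complaint times (240, 261, 348 and 521 days); censored cases merely reduce the number at risk for later times.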

The survdiff function provides tests for comparing groups of survival data. The argument rho can be 0 or 1 depending on whether one wants the log-rank or the Peto-Wilcoxon test, respectively.
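As a rough illustration of what the log-rank test (rho = 0) computes for two groups, consider the following Python sketch. It is not the survdiff implementation: the function `logrank` is a hypothetical helper, the data are invented for the example, and the Peto-Wilcoxon weighting (rho = 1) is not covered.

```python
import math


def logrank(times, events, groups):
    """Two-group log-rank test; returns (chi-square statistic, p-value).

    Hypothetical helper; events[i] is True for an observed event.
    """
    labels = sorted(set(groups))
    assert len(labels) == 2, "this sketch handles exactly two groups"
    first = labels[0]
    obs_minus_exp = 0.0
    variance = 0.0
    for t in sorted({x for x, e in zip(times, events) if e}):
        n = sum(1 for x in times if x >= t)  # at risk, both groups
        n1 = sum(1 for x, g in zip(times, groups) if x >= t and g == first)
        d = sum(1 for x, e in zip(times, events) if e and x == t)
        d1 = sum(1 for x, e, g in zip(times, events, groups)
                 if e and x == t and g == first)
        obs_minus_exp += d1 - d * n1 / n     # observed minus expected, first group
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / variance if variance > 0 else 0.0
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square tail, 1 d.f.
    return chi2, p_value


# Invented survival times for two groups, A and B (False = censored).
times = [6, 7, 10, 15, 19, 25, 30, 34]
events = [True, True, False, True, True, False, True, True]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
chi2, p = logrank(times, events, groups)
print(round(chi2, 3), round(p, 4))
```

A large statistic (small p-value) indicates that the observed number of events in one group departs from what would be expected if both groups shared the same survivor function.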

The coxph function fits a Cox regression model for a specified factor.