9.2.2 The Kaplan-Meier Analysis
The Kaplan-Meier estimate of the survivor function, also known as the product-limit estimate, is another type of non-parametric estimate, which uses intervals starting at “death” times. The formula for computing the estimate of the survivor function is similar to formula 9.9, using $n_j$ instead of $n_j^*$:
$$\hat{S}(t) = \prod_{j=1}^{k} \frac{n_j - d_j}{n_j}, \quad \text{for } t_k \le t < t_{k+1}.$$
Since, by construction, there are $n_j$ individuals who are alive just before $t_j$ and $d_j$ deaths occurring at $t_j$, the probability that an individual dies between $t_j - \delta$ and $t_j$ is estimated by $d_j/n_j$. Thus, the probability that an individual survives through $[t_j, t_{j+1}[$ is estimated by $(n_j - d_j)/n_j$.
The only influence of the censored data is in the computation of the number of individuals, $n_j$, who are alive just before $t_j$. If a censored survival time occurs simultaneously with one or more deaths, then the censored survival time is taken to occur immediately after the death time.
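The product-limit computation and the convention for tied censored times can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used by STATISTICA, SPSS or R; the function name `kaplan_meier` and the five data values are hypothetical.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return a list of (t_j, n_j, d_j, S_j) rows, one per distinct death time.

    times  -- observed survival times
    events -- True for a "death", False for a censored observation
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    rows, s = [], 1.0
    for t_j in sorted(deaths):
        # n_j: individuals alive just before t_j. A censored time equal to
        # t_j is counted, since it is taken to occur just after the death.
        n_j = sum(1 for t in times if t >= t_j)
        d_j = deaths[t_j]
        s *= (n_j - d_j) / n_j  # multiply by the conditional survival proportion
        rows.append((t_j, n_j, d_j, s))
    return rows

# Hypothetical data: five individuals, one censored at the same time as a death.
times = [2, 3, 3, 5, 8]
events = [True, True, False, True, False]
for row in kaplan_meier(times, events):
    print(row)
```

In this sketch the individual censored at time 3 still belongs to the risk set of the death at time 3 (so $n_j = 4$ there) and is dropped only from later risk sets.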
The Kaplan-Meier estimate of the hazard function is given by:
$$\hat{h}(t) = \frac{d_j}{n_j \tau_j}, \quad \text{for } t_j \le t < t_{j+1}, \tag{9.12}$$
where $\tau_j$ is the length of the $j$th time interval. For details, see e.g. (Collet D, 1994) or (Kleinbaum DG, Klein M, 2005).
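As a rough illustration of the hazard estimate, the following Python sketch computes $d_j/(n_j \tau_j)$ for each interval between consecutive death times. The function name `km_hazard` and the input values are hypothetical. The sketch assumes the last death time has no following death time and therefore produces no estimate for it.

```python
def km_hazard(death_times, n_at_risk, n_deaths):
    """Return (t_j, h_j) pairs with h_j = d_j / (n_j * tau_j).

    tau_j = t_{j+1} - t_j; the last death time has no following death time,
    so no interval length (and no estimate) is produced for it here.
    """
    hazards = []
    for j in range(len(death_times) - 1):
        tau_j = death_times[j + 1] - death_times[j]
        hazards.append((death_times[j], n_deaths[j] / (n_at_risk[j] * tau_j)))
    return hazards

# Hypothetical death times and risk-set sizes, for illustration only.
print(km_hazard([2, 3, 5], [5, 4, 2], [1, 1, 1]))
```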
Example 9.4
Q: Redo Example 9.2 using the Kaplan-Meier estimate.
A: Table 9.5 summarises the computations needed to obtain the Kaplan-Meier estimate for the “time-to-complaint” data. Figure 9.3 shows the respective survivor function plot obtained with STATISTICA. The computed data in Table 9.5 agree with the results obtained with either STATISTICA or SPSS.
In R one uses the survfit function to obtain the Kaplan-Meier estimate. Assuming one has created the Surv object x as explained in Commands 9.1, one proceeds to call survfit(x); in current versions of the survival package the call is usually written with a formula, survfit(x ~ 1). A plot as in Figure 9.3, with Greenwood’s confidence interval (see section 9.2.3), can be obtained with plot(survfit(x)). Applying summary to survfit(x) displays the confidence intervals for S(t) under the following column headings:
time n.risk n.event survival std.err lower 95% CI upper 95% CI
Table 9.5. Kaplan-Meier estimate of the survivor function for the first eight cases of the Car Sale dataset.
Interval Start   Event      n_j   d_j   p_j      S_j
…                …          …     1     …        0.7143
348              “Death”    4     1     0.75     0.5357
437              Censored   3
…
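The row starting at 348 days can be checked by hand, taking 0.7143 as the estimate carried over from the preceding death time and reading $p_j$ as the conditional survival proportion $(n_j - d_j)/n_j$:

$$p_j = \frac{n_j - d_j}{n_j} = \frac{4 - 1}{4} = 0.75, \qquad S_j = 0.7143 \times 0.75 \approx 0.5357.$$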
[Plot: Cumulative Proportion Surviving versus Survival Time]
Figure 9.3. Kaplan-Meier estimate of the survivor function for the first eight cases of the Car Sale dataset, obtained with STATISTICA. (The “Complete” cases are the “deaths”.)
Example 9.5
Q: Consider the Heart Valve dataset containing operation dates for heart valve implants at São João Hospital, Porto, Portugal, and dates of subsequent event occurrences, namely death, re-operation and endocarditis. Compute the Kaplan-Meier estimate for the event-free survival time, that is, survival time without occurrence of death, re-operation or endocarditis events. What is the percentage of patients surviving 5 years without any event occurring?
A: The Heart Valve Survival datasheet contains the computed final date for the study (variable DATE_STOP). This is the date of the first event, if one occurred, or otherwise the last date on which the patient was known to be alive and well. The survivor function estimate shown in Figure 9.4 is obtained by using STATISTICA with DATE_OP and DATE_STOP as initial and final dates, and variable EVENT as the censored data indicator. From this figure, one can estimate that about 85% of patients survive five years (1825 days) without any event occurring.
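Reading a survival proportion off the step plot at a given time amounts to taking the estimate at the last death time not exceeding that time. A minimal Python sketch of this lookup follows; the function name `survival_at` and the step values are hypothetical and are not taken from the Heart Valve dataset.

```python
from bisect import bisect_right

def survival_at(death_times, s_values, t):
    """Evaluate a Kaplan-Meier step function at time t.

    death_times -- increasing death times t_1 < t_2 < ...
    s_values    -- estimate S(t_j) holding on [t_j, t_{j+1}[
    """
    k = bisect_right(death_times, t)  # number of death times <= t
    return 1.0 if k == 0 else s_values[k - 1]

# Hypothetical step values, for illustration only.
death_times = [400, 1200, 1700, 2100]
s_values = [0.9, 0.8, 0.7, 0.6]
print(survival_at(death_times, s_values, 5 * 365))  # evaluates S(1825)
```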
[Plot: S(t) versus t (days); legend: Complete, Censored]
Figure 9.4. Kaplan-Meier estimate of the survivor function for the event-free survival of patients with heart valve implant, obtained with STATISTICA.