
Lecture 14: Nonparametric Survival Analysis Methods

James J. Dignam

Department of Public Health Sciences
University of Chicago

J. Dignam (UChicago) Lecture 14 Feb. 25, 2020 1 / 42


Non-parametric Estimators in Survival Analysis

In this lecture, we review non-parametric (distribution-free) methods to estimate the survivor function and the hazard function.
We will assume that non-informative right censoring is in effect.
To motivate the derivation of these estimators, we will first consider a set of survival times in which there is no censoring.



If there is no censoring

Consider a single sample of survival times, each completely observed to failure, so that none of the observations are censored: $T_1, \ldots, T_n \overset{\text{iid}}{\sim} F(t)$.

The survivor function $S(t) = P(T \ge t)$ is the probability that an individual survives to a time greater than or equal to $t$. It can be estimated by the empirical survivor function, the proportion of individuals with survival times greater than or equal to $t$:

$$\hat S(t) = \frac{\text{Number of individuals with } t_i \ge t}{\text{Number of individuals in the (initial) sample}} \qquad (1)$$



Example 1: Pulmonary Metastasis

One complication in the management of patients with a malignant bone tumor, or osteosarcoma, is that the tumor often metastasizes to the lungs. The following data give the survival times, in months, of eleven male patients in a study of treatment for pulmonary metastasis arising from osteosarcoma.

11 13 13 13 13 13 14 14 15 15 17

This is a case where all the survival times are fully observed (no censoring), a situation that seldom occurs in medical studies.



Pulmonary metastasis: estimate S(t)

11 13 13 13 13 13 14 14 15 15 17

One approach: estimate $S(t)$ by computing the survival proportions

$$\hat S(t) = \frac{\text{Number of individuals with } t_i \ge t}{\text{Number of individuals in the data set}}$$

$$
\begin{aligned}
0 < t \le 11: &\quad \hat S(t) = \hat P(T \ge 11) = 11/11 = 1\\
11 < t \le 13: &\quad \hat S(t) = \hat P(T \ge 13) = 10/11 = 0.909\\
13 < t \le 14: &\quad \hat S(t) = \hat P(T \ge 14) = 5/11 = 0.455\\
14 < t \le 15: &\quad \hat S(t) = \hat P(T \ge 15) = 3/11 = 0.273\\
15 < t \le 17: &\quad \hat S(t) = \hat P(T \ge 17) = 1/11 = 0.091\\
17 < t: &\quad \hat S(t) = \hat P(T > 17) = 0/11 = 0
\end{aligned}
$$
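Equation (1) is simple enough to check directly. The following is a minimal Python sketch (not part of the original Stata session) applied to the eleven osteosarcoma survival times; the function name `empirical_survivor` is an illustrative choice, not a library function.

```python
# Survival times (months) of the 11 osteosarcoma patients; none are censored.
times = [11, 13, 13, 13, 13, 13, 14, 14, 15, 15, 17]

def empirical_survivor(t, data):
    """Equation (1): proportion of the initial sample with t_i >= t."""
    return sum(1 for ti in data if ti >= t) / len(data)

# Evaluate S_hat at the right-hand endpoint of each interval on the slide,
# plus one time point beyond the last death at 17 months.
for t in [11, 13, 14, 15, 17, 18]:
    print(t, round(empirical_survivor(t, times), 3))
```

Evaluating at the right-hand endpoint of each interval reproduces the proportions 1, 0.909, 0.455, 0.273 and 0.091, and the estimate is 0 for any time beyond 17 months.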



Pulmonary metastasis: estimate S(t)

. use pulmonary_metastasis.dta
. stset time

     failure event:  (assumed to fail at time=time)
obs. time interval:  (0, time]
 exit on or before:  failure

------------------------------------------------------------------------------
         11  total observations
          0  exclusions
------------------------------------------------------------------------------
         11  observations remaining, representing
         11  failures in single-record/single-failure data
        151  total analysis time at risk and under observation
                                                at risk from t =         0
                                     earliest observed entry t =         0
                                          last observed exit t =        17

The stset command declares time as the survival-time variable. This variable must be non-negative and may have a censoring (failure) indicator associated with it; here none is specified, so every observation is assumed to be a failure.



Pulmonary metastasis: estimate S(t)
. sts graph

      failure _d:  1 (meaning all fail)
analysis time _t:  time

[Figure: Kaplan-Meier survival estimate; vertical axis 0.00 to 1.00, horizontal axis analysis time, 0 to 20]

$\hat S(t)$ is 1 from the time origin until the time of the first death (11 months).
$\hat S(t)$ is 0 after the last observed survival time (17 months).
$\hat S(t)$ is non-increasing in $t$.

The sts command provides a family of survivor-function summaries; graph is one of these.
Pulmonary metastasis: estimate S(t)
. sts list

      failure _d:  1 (meaning all fail)
analysis time _t:  time

           Beg.          Net            Survivor      Std.
  Time    Total   Fail   Lost           Function     Error     [95% Conf. Int.]
−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−
11 11 1 0 0.9091 0.0867 0.5081 0.9867
13 10 5 0 0.4545 0.1501 0.1666 0.7069
14 5 2 0 0.2727 0.1343 0.0652 0.5389
15 3 2 0 0.0909 0.0867 0.0054 0.3329
17 1 1 0 0.0000 . . .
−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−

Survival up to the first failure time is 100% ($\hat S(t) = 1.0$). Stata does not list this row, although other programs do by convention.
The first change in the estimated survivor function occurs at 11 months: $\hat S(11^{+}) = 0.9091$.
The second change occurs at 13 months: $\hat S(13^{+}) = 0.4545$.
...



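Pulmonary Metastasis: estimate S(t)
11 13 13 13 13 13 14 14 15 15 17

The other approach: we can estimate $S(t)$ using the conditional probability idea:

$$0 < t \le 11: \quad \hat S(t) = \hat P(T \ge 11) = \frac{11}{11} = 1$$

$11 < t \le 13$:
$$
\begin{aligned}
\hat S(t) = \hat P(T \ge t) = \hat P(T \ge 13) &= \hat P(T \ge 13,\ T \ge 11)\\
&= \hat P(T \ge 13 \mid T \ge 11)\,\hat P(T \ge 11) = \frac{10}{11}\cdot\frac{11}{11} = 0.909
\end{aligned}
$$

$13 < t \le 14$:
$$
\begin{aligned}
\hat S(t) = \hat P(T \ge 14) &= \hat P(T \ge 14,\ T \ge 13,\ T \ge 11)\\
&= \hat P(T \ge 14 \mid T \ge 13)\,\hat P(T \ge 13 \mid T \ge 11)\,\hat P(T \ge 11) = \frac{5}{10}\cdot\frac{10}{11}\cdot\frac{11}{11} = 0.455
\end{aligned}
$$
...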
This conditional probability idea allows for extension to the case where we have
right censoring.
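The chain of conditional probabilities can be checked with exact fractions. In this Python sketch (the helper names `count_at_least` and `chained` are illustrative), each factor is the proportion of those still at risk at one time point who remain at risk at the next, and without censoring the product telescopes back to the empirical proportion of equation (1).

```python
from fractions import Fraction

# Osteosarcoma survival times (months), all uncensored.
times = [11, 13, 13, 13, 13, 13, 14, 14, 15, 15, 17]
n = len(times)

def count_at_least(u):
    """Number of patients with survival time >= u."""
    return sum(1 for ti in times if ti >= u)

def chained(points):
    """P_hat(T >= points[-1]) computed as P_hat(T >= points[0]) times the
    conditional probabilities P_hat(T >= points[i] | T >= points[i-1]).
    `points` must be increasing and no later than the largest time (17)."""
    prob = Fraction(count_at_least(points[0]), n)
    for prev, cur in zip(points, points[1:]):
        prob *= Fraction(count_at_least(cur), count_at_least(prev))
    return prob

print(chained([11, 13]))       # (11/11) * (10/11)
print(chained([11, 13, 14]))   # (11/11) * (10/11) * (5/10)
```

The two calls print 10/11 and 5/11, matching 0.909 and 0.455 above. Each factor only involves individuals still under observation at the earlier time point, which is the property that lets censored individuals contribute up to their censoring time.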



What if there is censoring?

The empirical survivor function cannot be used to estimate the survivor function when there are censored observations.
The reason is that an individual whose survival time is censored before time t is known to have survived up to the censoring time, but whether they survived to t is unknown, so the empirical method cannot use the information they provide when computing the estimated survivor function at t.
The best-known non-parametric method that accounts for censoring is the Kaplan-Meier estimator.
It was introduced by Kaplan and Meier in 1958 (JASA); Paul Meier was in the Department of Statistics, University of Chicago, from the 1950s to the 1990s.
The Kaplan-Meier estimator is widely used today: the original paper has been cited over 57,000 times (Google Scholar) since its publication.



Kaplan-Meier estimator

Paul Meier, Statistician Who Revolutionized Medical Trials, Dies at 87
By DENNIS HEVESI, AUG. 12, 2011

Paul Meier, a leading medical statistician who had a major influence on how the
federal government assesses and makes decisions about new treatments that can
affect the lives of millions, died on Sunday at his home in Manhattan. He was 87.

The cause was complications of a stroke, his daughter Diane Meier said.

As early as the mid-1950s, Dr. Meier was one of the first and most vocal
proponents of what is called “randomization.”

Under the protocol, researchers randomly assign one group of patients to receive an experimental treatment and another to receive the standard treatment. In that way, the researchers try to avoid unintentionally skewing the results by choosing, for example, the healthier or younger patients to receive the new treatment.

If the number of subjects is large enough, the two groups will be the same in
every respect except the treatment they receive. Such randomized controlled trials
are considered the most rigorous way to conduct a study and the best way to gather

Kaplan-Meier estimate of S(t): notation

Consider $n$ individuals with observed survival times $t_1, t_2, \ldots, t_n$. Some of these observations may be right-censored, and there may also be more than one individual with the same observed survival time.
Suppose that there are $r$ distinct survival times (event occurred, not censored) among those $n$ individuals, where $r \le n$. Arrange these $r$ survival times in ascending order; the $j$th is denoted $t_{(j)}$, for $j = 1, 2, \ldots, r$, so the $r$ ordered survival times are $t_{(1)} < t_{(2)} < \cdots < t_{(r)}$.
For $t_{(j)}$, let
$n_j$ denote the total number at risk at time $t_{(j)}$ (the number of individuals who are known to be alive just before time $t_{(j)}$, including those who are about to fail at $t_{(j)}$);
$d_j$ denote the total number of deaths occurring at time $t_{(j)}$.



Kaplan-Meier estimate of S(t): notation

To compute the estimator, at each of the ordered survival times $t_{(1)} < t_{(2)} < \cdots < t_{(r)}$:
Record $n_j$ by counting all those whose failure (or censoring) time is equal to or greater than $t_{(j)}$.
Record the number of failures $d_j$ at each $t_{(j)}$.
From this information alone, the Kaplan-Meier estimate of $S(t)$ can be computed.



Example: Time to discontinuation of the use of an IUD

World Health Organization (WHO) data from clinical trials involving a number of different types of contraceptive (WHO, 1987): the data in Table 1 are the number of weeks from the commencement of use of a particular type of intrauterine device (IUD), known as the Multiload 250, until discontinuation because of menstrual bleeding problems. Discontinuation times that are censored are labeled with an asterisk.

Table 1: Time in weeks to discontinuation of the use of an IUD


10 13* 18* 19 23* 30 36 38* 54*
56* 59 75 93 97 104* 107 107* 107*



Time to discontinuation of the use of an IUD
(continued)

For analytic purposes, survival data are recorded using two variables:

A variable for the observed failure/last status time. For observations that are not censored, it is the failure time; for observations that are right censored, it is the censoring time (the actual failure time is unknown and exceeds the censoring time).
A second variable is the event indicator: 1 if the event is observed, 0 if it is not observed (censored).
Each individual has a data pair (time, status).



Time to discontinuation of the use of an IUD
(continued)
. use discontinuation_IUD.dta
. list

+−−−−−−−−−−−−−−−+
| time status |
|−−−−−−−−−−−−−−−|
1. | 10 1 |
2. | 13 0 |
3. | 18 0 |
4. | 19 1 |
5. | 23 0 |
|−−−−−−−−−−−−−−−|
6. | 30 1 |
7. | 36 1 |
8. | 38 0 |
9. | 54 0 |
10. | 56 0 |
|−−−−−−−−−−−−−−−|
11. | 59 1 |
12. | 75 1 |
13. | 93 1 |
14. | 97 1 |
15. | 104 0 |
|−−−−−−−−−−−−−−−|
16. | 107 1 |
17. | 107 0 |
18. | 107 0 |
+−−−−−−−−−−−−−−−+



Time to discontinuation of the use of an IUD
(continued)
[Figure: timeline of the 18 observations, marking each discontinuation (D) and censored time (C); the ordered discontinuation times are $t_{(1)}, \ldots, t_{(9)}$ = 10, 19, 30, 36, 59, 75, 93, 97, 107]

By convention, when censored survival times occur at the same time as one or
more failures, the censored survival time is taken to occur immediately after the
failure time.
t_(j)   n_j   d_j
10      18    1
19      15    1
30      13    1
36      12    1
59      8     1
75      7     1
93      6     1
97      5     1
107     3     1
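The $n_j$ and $d_j$ columns above can be reproduced from the (time, status) pairs in the Stata listing. The Python sketch below is illustrative (the name `risk_table` is our own). Counting every individual whose recorded time is at least $t_{(j)}$ as at risk, including censored times tied with $t_{(j)}$, implements the convention that a tied censored time occurs immediately after the failure.

```python
# IUD discontinuation data (weeks); status 1 = discontinued, 0 = censored.
data = [(10, 1), (13, 0), (18, 0), (19, 1), (23, 0), (30, 1), (36, 1),
        (38, 0), (54, 0), (56, 0), (59, 1), (75, 1), (93, 1), (97, 1),
        (104, 0), (107, 1), (107, 0), (107, 0)]

def risk_table(data):
    """Return [(t_j, n_j, d_j)] at each distinct event time.

    n_j counts everyone whose recorded time is >= t_j, so a censored time
    tied with an event time is treated as occurring just after it."""
    event_times = sorted({t for t, s in data if s == 1})
    table = []
    for tj in event_times:
        nj = sum(1 for t, _ in data if t >= tj)
        dj = sum(1 for t, s in data if t == tj and s == 1)
        table.append((tj, nj, dj))
    return table

for row in risk_table(data):
    print(row)
```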
Kaplan-Meier estimate of S(t)

Let’s apply the conditional probability idea.
For $0 < t \le t_{(1)}$, $\hat S(t) = 1$.
For $t_{(k-1)} < t \le t_{(k)}$,
$$\hat S(t) = \hat P(T \ge t_{(k)} \mid T \ge t_{(k-1)})\,\hat P(T \ge t_{(k-1)})$$
and so on, all the way back to $t_{(1)}$.
The Kaplan-Meier estimate of the survivor function is given by
$$\hat S(t) = \prod_{j=1}^{k}\left(1 - \frac{d_j}{n_j}\right) = \prod_{j=1}^{k}\frac{n_j - d_j}{n_j} \qquad (2)$$
for $t_{(k)} < t \le t_{(k+1)}$, $k = 1, 2, \ldots, r$, with $\hat S(t) = 1$ for $t \le t_{(1)}$, and where $t_{(r+1)} = \infty$.
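Equation (2) is a running product over the event times. The following minimal Python sketch applies it to the IUD data; `kaplan_meier` is a hypothetical helper name, and the value it reports against each $t_{(j)}$ is the estimate just after that time, which is how Table 2 and Stata's sts list present it.

```python
# IUD discontinuation data (weeks); status 1 = discontinued, 0 = censored.
data = [(10, 1), (13, 0), (18, 0), (19, 1), (23, 0), (30, 1), (36, 1),
        (38, 0), (54, 0), (56, 0), (59, 1), (75, 1), (93, 1), (97, 1),
        (104, 0), (107, 1), (107, 0), (107, 0)]

def kaplan_meier(data):
    """Product-limit estimate, equation (2).

    Returns (t_j, n_j, d_j, S_hat), where S_hat is the product of
    (n_j - d_j) / n_j over all event times up to and including t_j."""
    rows, surv = [], 1.0
    for tj in sorted({t for t, s in data if s == 1}):
        nj = sum(1 for t, _ in data if t >= tj)             # number at risk
        dj = sum(1 for t, s in data if t == tj and s == 1)  # events at t_j
        surv *= (nj - dj) / nj
        rows.append((tj, nj, dj, surv))
    return rows

for tj, nj, dj, s in kaplan_meier(data):
    print(f"{tj:4d} {nj:3d} {dj:2d} {s:.4f}")
```

Censored individuals never enter the product directly; they only reduce $n_j$ at later event times.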



Kaplan-Meier estimate of S(t)

Strictly speaking, if the largest observation is a censored survival time, $t^*$, say, $\hat S(t)$ is undefined for $t > t^*$.
On the other hand, if the largest observed survival time, $t_{(r)}$, is an uncensored observation, $n_r = d_r$, and so $\hat S(t)$ is 0 for $t > t_{(r)}$.
A plot of the Kaplan-Meier estimate of the survivor function is a step function, in which the estimated survival probabilities are constant between adjacent ordered survival times and decrease at each ordered survival time.



Time to discontinuation of the use of an IUD
(continued)

Table 2: Kaplan-Meier estimate of the survivor function for the IUD data.
Time interval   n_j   d_j   (n_j − d_j)/n_j   Ŝ(t)
0-              18    0     1.0000            1.0000
10-             18    1     0.9444            0.9444
19-             15    1     0.9333            0.8815
30-             13    1     0.9231            0.8137
36-             12    1     0.9167            0.7459
59-             8     1     0.8750            0.6526
75-             7     1     0.8571            0.5594
93-             6     1     0.8333            0.4662
97-             5     1     0.8000            0.3729
107             3     1     0.6667            0.2486

Note that since the largest discontinuation time of 107 weeks is censored, $\hat S(t)$ is not defined beyond $t = 107$.



Time to discontinuation of the use of an IUD
(continued)

. stset time, failure(status)

     failure event:  status != 0 & status < .
obs. time interval:  (0, time]
 exit on or before:  failure

------------------------------------------------------------------------------
         18  total observations
          0  exclusions
------------------------------------------------------------------------------
         18  observations remaining, representing
          9  failures in single-record/single-failure data
       1046  total analysis time at risk and under observation
                                                at risk from t =         0
                                     earliest observed entry t =         0
                                          last observed exit t =       107



Time to discontinuation of the use of an IUD
(continued)

. sts graph

      failure _d:  status
analysis time _t:  time

[Figure: Kaplan-Meier survival estimate; vertical axis 0.00 to 1.00, horizontal axis analysis time, 0 to 100]



Time to discontinuation of the use of an IUD
(continued)
. sts list

      failure _d:  status
analysis time _t:  time

           Beg.          Net            Survivor      Std.
  Time    Total   Fail   Lost           Function     Error     [95% Conf. Int.]
−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−
10 18 1 0 0.9444 0.0540 0.6664 0.9920
13 17 0 1 0.9444 0.0540 0.6664 0.9920
18 16 0 1 0.9444 0.0540 0.6664 0.9920
19 15 1 0 0.8815 0.0790 0.6019 0.9691
23 14 0 1 0.8815 0.0790 0.6019 0.9691
30 13 1 0 0.8137 0.0978 0.5241 0.9363
36 12 1 0 0.7459 0.1107 0.4536 0.8970
38 11 0 1 0.7459 0.1107 0.4536 0.8970
54 10 0 1 0.7459 0.1107 0.4536 0.8970
56 9 0 1 0.7459 0.1107 0.4536 0.8970
59 8 1 0 0.6526 0.1303 0.3438 0.8432
75 7 1 0 0.5594 0.1412 0.2564 0.7804
93 6 1 0 0.4662 0.1452 0.1830 0.7097
97 5 1 0 0.3729 0.1430 0.1209 0.6310
104 4 0 1 0.3729 0.1430 0.1209 0.6310
107 3 1 2 0.2486 0.1392 0.0468 0.5313
−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−



Standard error of the Kaplan-Meier estimate
The Kaplan-Meier estimate of the survivor function for any value of $t$ in the interval from $t_{(k)}$ to $t_{(k+1)}$ can be written as
$$\hat S(t) = \prod_{j=1}^{k}\frac{n_j - d_j}{n_j}$$

Variance estimate by Greenwood’s formula:
$$\operatorname{Var}\{\hat S(t)\} \approx [\hat S(t)]^2 \sum_{j=1}^{k}\frac{d_j}{n_j(n_j - d_j)} \qquad (3)$$

thus, the standard error is given by
$$\operatorname{se}\{\hat S(t)\} \approx \hat S(t)\left\{\sum_{j=1}^{k}\frac{d_j}{n_j(n_j - d_j)}\right\}^{1/2} \qquad (4)$$
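Greenwood's formula only needs a second running sum alongside the product in equation (2). This Python sketch (the name `km_greenwood` is illustrative) computes equations (3) and (4) for the IUD data; the standard errors it prints can be compared with the Std. Error column of the sts list output.

```python
import math

# IUD discontinuation data (weeks); status 1 = discontinued, 0 = censored.
data = [(10, 1), (13, 0), (18, 0), (19, 1), (23, 0), (30, 1), (36, 1),
        (38, 0), (54, 0), (56, 0), (59, 1), (75, 1), (93, 1), (97, 1),
        (104, 0), (107, 1), (107, 0), (107, 0)]

def km_greenwood(data):
    """Kaplan-Meier estimate with Greenwood standard error, eqs. (2)-(4).

    Returns (t_j, S_hat, se) evaluated just after each event time t_j."""
    rows, surv, gw_sum = [], 1.0, 0.0
    for tj in sorted({t for t, s in data if s == 1}):
        nj = sum(1 for t, _ in data if t >= tj)             # number at risk
        dj = sum(1 for t, s in data if t == tj and s == 1)  # events at t_j
        if nj == dj:
            # Everyone at risk fails: S_hat drops to 0 and the Greenwood
            # term d_j / (n_j (n_j - d_j)) is undefined, so stop here.
            rows.append((tj, 0.0, float("nan")))
            break
        surv *= (nj - dj) / nj
        gw_sum += dj / (nj * (nj - dj))                     # eq. (3) sum
        rows.append((tj, surv, surv * math.sqrt(gw_sum)))   # eq. (4)
    return rows

for tj, s, se in km_greenwood(data):
    print(f"{tj:4d}  S = {s:.4f}  se = {se:.4f}")
```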



Confidence interval for survivor function

1. $\left(\hat S(t) - z_{\alpha/2}\cdot \operatorname{se}\{\hat S(t)\},\ \hat S(t) + z_{\alpha/2}\cdot \operatorname{se}\{\hat S(t)\}\right)$
One difficulty is that when the estimated survivor function is close to 0 or 1, this method can lead to confidence limits for the survivor function that lie outside the interval (0, 1).
2. An alternative procedure is to transform $\hat S(t)$ to a value in the range $(-\infty, \infty)$ and obtain a CI for the transformed value. The resulting confidence limits are then back-transformed to give a CI for $S(t)$ itself.
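The difficulty with method 1 shows up in the IUD data. This Python sketch (the name `km_plain_ci` is illustrative, and the normal quantile comes from the standard-library `statistics.NormalDist`) computes the untransformed 95% limits and flags any that leave (0, 1).

```python
import math
from statistics import NormalDist

# IUD discontinuation data (weeks); status 1 = discontinued, 0 = censored.
data = [(10, 1), (13, 0), (18, 0), (19, 1), (23, 0), (30, 1), (36, 1),
        (38, 0), (54, 0), (56, 0), (59, 1), (75, 1), (93, 1), (97, 1),
        (104, 0), (107, 1), (107, 0), (107, 0)]

def km_plain_ci(data, level=0.95):
    """KM estimate with the untransformed interval S_hat +/- z * se (method 1).

    Assumes n_j > d_j at every event time, which holds for these data."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for 95%
    rows, surv, gw_sum = [], 1.0, 0.0
    for tj in sorted({t for t, s in data if s == 1}):
        nj = sum(1 for t, _ in data if t >= tj)
        dj = sum(1 for t, s in data if t == tj and s == 1)
        surv *= (nj - dj) / nj
        gw_sum += dj / (nj * (nj - dj))          # Greenwood sum, eq. (3)
        se = surv * math.sqrt(gw_sum)            # eq. (4)
        rows.append((tj, surv, surv - z * se, surv + z * se))
    return rows

for tj, s, lo, hi in km_plain_ci(data):
    flag = "  <- outside (0, 1)" if lo < 0 or hi > 1 else ""
    print(f"{tj:4d}  {s:.4f}  ({lo:.4f}, {hi:.4f}){flag}")
```

For these data the upper limit exceeds 1 at the first three event times (10, 19 and 30 weeks) and the lower limit falls below 0 at 107 weeks, exactly where $\hat S(t)$ is close to 1 or small relative to its standard error.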



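Confidence interval for survivor function

For example, use the complementary log-log transformation of $S(t)$, $\log[-\log S(t)]$.

The standard error of $\log[-\log \hat S(t)]$ is
$$\operatorname{SE}\{\log[-\log \hat S(t)]\} = \frac{\sqrt{\sum_{j:\,t_{(j)} \le t} \dfrac{d_j}{n_j(n_j - d_j)}}}{-\log \hat S(t)}$$

The confidence interval for $S(t)$ is
$$\left(\hat S(t)^{\exp[z_{\alpha/2}\cdot \operatorname{SE}\{\log[-\log \hat S(t)]\}]},\ \hat S(t)^{\exp[-z_{\alpha/2}\cdot \operatorname{SE}\{\log[-\log \hat S(t)]\}]}\right)$$
where the lower limit is listed first: since $0 < \hat S(t) < 1$, the larger exponent gives the smaller value.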
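The back-transformed interval always stays inside (0, 1). A minimal Python sketch of the log-log interval above for the IUD data follows; `km_loglog_ci` is an illustrative name, and it assumes $n_j > d_j$ at every event time so that $\hat S(t)$ stays strictly between 0 and 1.

```python
import math
from statistics import NormalDist

# IUD discontinuation data (weeks); status 1 = discontinued, 0 = censored.
data = [(10, 1), (13, 0), (18, 0), (19, 1), (23, 0), (30, 1), (36, 1),
        (38, 0), (54, 0), (56, 0), (59, 1), (75, 1), (93, 1), (97, 1),
        (104, 0), (107, 1), (107, 0), (107, 0)]

def km_loglog_ci(data, level=0.95):
    """KM estimate with a confidence interval built on log[-log S(t)].

    Assumes n_j > d_j at every event time (true for the IUD data)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    rows, surv, gw_sum = [], 1.0, 0.0
    for tj in sorted({t for t, s in data if s == 1}):
        nj = sum(1 for t, _ in data if t >= tj)
        dj = sum(1 for t, s in data if t == tj and s == 1)
        surv *= (nj - dj) / nj
        gw_sum += dj / (nj * (nj - dj))
        se_loglog = math.sqrt(gw_sum) / (-math.log(surv))
        # Since 0 < S_hat < 1, the larger exponent gives the lower limit.
        lower = surv ** math.exp(z * se_loglog)
        upper = surv ** math.exp(-z * se_loglog)
        rows.append((tj, surv, lower, upper))
    return rows

for tj, s, lo, hi in km_loglog_ci(data):
    print(f"{tj:4d}  {s:.4f}  ({lo:.4f}, {hi:.4f})")
```

The limits printed at 10 weeks, (0.6664, 0.9920), and at 107 weeks, (0.0468, 0.5313), agree with the 95% Conf. Int. columns of the sts list output, which suggests Stata uses this log-log interval by default.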



Time to discontinuation of the use of an IUD
(continued)

. sts graph, gwood

      failure _d:  status
analysis time _t:  time

[Figure: Kaplan-Meier survival estimate with 95% confidence band (legend: 95% CI, Survivor function); vertical axis 0 to 1, horizontal axis analysis time, 0 to 100]



Estimating the median and other percentiles of survival times

The median and other percentiles are frequently used as summary measures of the distribution and of the survival experience.
The median survival time $t_{.50}$ is the time such that half of all survival times are larger than $t_{.50}$ and half are smaller, i.e.
$$F(t_{.50}) = S(t_{.50}) = 0.5 \qquad (5)$$
The $p$th percentile of survival times, $t_{p/100}$, is the time such that a fraction $p/100$ of survival times are less than $t_{p/100}$ and the remaining fraction $1 - p/100$ are larger, i.e.
$$F(t_{p/100}) = p/100 \iff S(t_{p/100}) = 1 - p/100 \qquad (6)$$



Estimating the median and other percentiles of survival times

To deal with the discreteness in $\hat S(t)$, define
$$\hat t_{p/100} = \inf\{t \mid \hat S(t) \le 1 - p/100\} \qquad (7)$$
i.e., the earliest time $t$ at which $\hat S(t)$ falls to or below $1 - p/100$.

In the example of time to discontinuation of the use of an IUD, the smallest discontinuation time at which the estimated survivor function falls to or below 0.5 is 93, so $\hat t_{.50} = 93$; the smallest discontinuation time at which $\hat S(t)$ falls to or below $1 - 0.25 = 0.75$ is 36, so $\hat t_{.25} = 36$; and the smallest discontinuation time at which $\hat S(t)$ falls to or below $1 - 0.75 = 0.25$ is 107, so $\hat t_{.75} = 107$.
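Because $\hat S(t)$ only changes at event times, equation (7) reduces to a scan of the step values. In this Python sketch the helper names `km_steps` and `km_percentile` are illustrative; the function returns None when $\hat S(t)$ never falls low enough, in which case that percentile cannot be estimated from the data.

```python
# IUD discontinuation data (weeks); status 1 = discontinued, 0 = censored.
data = [(10, 1), (13, 0), (18, 0), (19, 1), (23, 0), (30, 1), (36, 1),
        (38, 0), (54, 0), (56, 0), (59, 1), (75, 1), (93, 1), (97, 1),
        (104, 0), (107, 1), (107, 0), (107, 0)]

def km_steps(data):
    """(t_j, S_hat just after t_j) for each distinct event time, eq. (2)."""
    steps, surv = [], 1.0
    for tj in sorted({t for t, s in data if s == 1}):
        nj = sum(1 for t, _ in data if t >= tj)
        dj = sum(1 for t, s in data if t == tj and s == 1)
        surv *= (nj - dj) / nj
        steps.append((tj, surv))
    return steps

def km_percentile(steps, p):
    """Eq. (7): earliest event time at which S_hat <= 1 - p/100.

    Returns None when S_hat never falls that low (not estimable)."""
    for tj, surv in steps:
        if surv <= 1 - p / 100:
            return tj
    return None

steps = km_steps(data)
for p in (25, 50, 75):
    print(p, km_percentile(steps, p))
```

The three printed percentiles, 36, 93 and 107 weeks, are the values reported by stsum.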



Estimating the median and other percentiles of survival times

The stsum command in Stata is useful for summarizing survival data.

. stsum

      failure _d:  status
analysis time _t:  time

         |               incidence       no. of    |------ Survival time -----|
         | time at risk     rate        subjects        25%       50%       75%
---------+---------------------------------------------------------------------
   total |         1046   .0086042          18          36        93       107



Kaplan-Meier estimate of the cumulative hazard function

Since $H(t) = -\log S(t)$, we can estimate $H(t)$ by
$$\hat H(t) = -\log \hat S(t) = -\sum_{j=1}^{k}\log\left(\frac{n_j - d_j}{n_j}\right) \qquad (8)$$
for $t$ in the interval from $t_{(k)}$ to $t_{(k+1)}$.

If the hazard function is assumed to be constant between successive death times, then the hazard function in the interval from $t_{(k)}$ to $t_{(k+1)}$ can be estimated by
$$\hat h(t) = \frac{d_k}{n_k \tau_k} \qquad (9)$$
where $\tau_k = t_{(k+1)} - t_{(k)}$.
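Equations (8) and (9) can both be read off the risk-set counts. The Python sketch below applies them to the IUD data (variable names are illustrative). It produces no hazard value after the last event time, since $t_{(r+1)} = \infty$ leaves $\tau_r$ unbounded.

```python
import math

# IUD discontinuation data (weeks); status 1 = discontinued, 0 = censored.
data = [(10, 1), (13, 0), (18, 0), (19, 1), (23, 0), (30, 1), (36, 1),
        (38, 0), (54, 0), (56, 0), (59, 1), (75, 1), (93, 1), (97, 1),
        (104, 0), (107, 1), (107, 0), (107, 0)]

# (t_(j), n_j, d_j) at each distinct event time.
event_times = sorted({t for t, s in data if s == 1})
risk = [(tj,
         sum(1 for t, _ in data if t >= tj),
         sum(1 for t, s in data if t == tj and s == 1))
        for tj in event_times]

# Eq. (8): H_hat = -sum_{j<=k} log((n_j - d_j) / n_j), just after t_(k).
cum_hazard, H = [], 0.0
for tj, nj, dj in risk:
    H -= math.log((nj - dj) / nj)
    cum_hazard.append((tj, H))

# Eq. (9): constant hazard d_k / (n_k * tau_k) between t_(k) and t_(k+1).
hazard = [(tk, dk / (nk * (t_next - tk)))
          for (tk, nk, dk), (t_next, _, _) in zip(risk, risk[1:])]

for tk, h in hazard:
    print(f"from {tk:3d}: h = {h:.5f}")
print("H_hat after last event:", round(cum_hazard[-1][1], 4))
```

The hazard estimate is largest between 93 and 97 weeks (1/24 per week), where an event occurs after a short gap with few individuals left at risk.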



Life-table Estimate of the Survivor Function

Actuarial Method

A method used by actuaries, demographers, and others.
It is motivated by the case where the survival data are grouped into intervals, in which case estimation of the survivor function is complicated by the fact that we do not know exactly when during each time interval an event occurs.
It can also be applied to ungrouped survival data by first grouping the survival times into intervals.



Life-table estimate of the survivor function: notation

The $j$th time interval is $[t_j, t_{j+1})$.
$c_j$: the number of censored survival times in the $j$th interval
$d_j$: the number of deaths (events) in the $j$th interval
$n_j$: the number of individuals who are alive, and therefore at risk of death, at the start of the $j$th interval



Life-table estimate of the survivor function: notation
(continued)

Table 3: Time in weeks to discontinuation of the use of an IUD


10 13* 18* 19 23* 30 36 38* 54*
56* 59 75 93 97 104* 107 107* 107*

Time interval   j   d_j   c_j   n_j
[0 − 10) 1 0 0 18
[10 − 20) 2 2 2 18
[20 − 30) 3 0 1 14
[30 − 40) 4 2 1 13
[40 − 50) 5 0 0 10
[50 − 60) 6 1 2 10
[60 − 70) 7 0 0 7
[70 − 80) 8 1 0 7
[80 − 90) 9 0 0 6
[90 − 100) 10 2 0 6
[100 − 110) 11 1 3 4
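Table 3 can be rebuilt from the raw (time, status) pairs. This is a sketch in Python; the function name `group_into_intervals` and its `width` and `upper` parameters are illustrative choices matching the 10-week intervals used here.

```python
# IUD discontinuation data (weeks); status 1 = discontinued, 0 = censored.
data = [(10, 1), (13, 0), (18, 0), (19, 1), (23, 0), (30, 1), (36, 1),
        (38, 0), (54, 0), (56, 0), (59, 1), (75, 1), (93, 1), (97, 1),
        (104, 0), (107, 1), (107, 0), (107, 0)]

def group_into_intervals(data, width=10, upper=110):
    """Tabulate (interval, d_j, c_j, n_j) for intervals [lo, lo + width).

    d_j: events in the interval; c_j: censored times in the interval;
    n_j: individuals still under observation at the start of the interval."""
    rows = []
    for lo in range(0, upper, width):
        hi = lo + width
        d = sum(1 for t, s in data if lo <= t < hi and s == 1)
        c = sum(1 for t, s in data if lo <= t < hi and s == 0)
        n = sum(1 for t, _ in data if t >= lo)
        rows.append(((lo, hi), d, c, n))
    return rows

for (lo, hi), d, c, n in group_into_intervals(data):
    print(f"[{lo}, {hi})  d={d}  c={c}  n={n}")
```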



Life-table estimate of the survivor function (continued)

We could apply the Kaplan-Meier formula directly to the numbers in Table 3, estimating $S(t)$ by
$$\hat S(t) = \prod_{j=1}^{k}\left(1 - \frac{d_j}{n_j}\right)$$
for $t$ in the $k$th interval, from $t_k$ to $t_{k+1}$.

However, this approach is unsatisfactory for grouped data: it treats the problem as if it were in discrete time, with events happening only at 10 weeks, 20 weeks, 30 weeks, etc. In fact, what we are trying to calculate here is the conditional probability of dying (experiencing the event) within the interval, given survival to the beginning of it.



Life-table estimate of the survivor function (continued)

What should we do with the censored individuals? We could assume that censoring occurs:
at the beginning of each interval: $n'_j = n_j - c_j$
at the end of each interval: $n'_j = n_j$
on average, midway through the interval, so that the average number of subjects at risk within the interval is $n'_j = n_j - c_j/2$
The last assumption yields the life-table (actuarial) estimator. It is appropriate if censoring occurs uniformly throughout the interval, which is reasonable to assume in the absence of evidence otherwise:
$$\hat S(t) = \prod_{j=1}^{k}\left(1 - \frac{d_j}{n_j - c_j/2}\right) \qquad (10)$$
where the product runs over intervals $j = 1, \ldots, k$; this is the estimated survivor function at the end of the $k$th interval, as tabulated in Table 4.
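Equation (10) differs from the naive grouped Kaplan-Meier product only in how many censored individuals are counted as at risk. The Python sketch below (the name `grouped_survivor` and its `censor_weight` parameter are illustrative) takes the counts from Table 3 and computes both versions side by side.

```python
# Grouped counts from Table 3: (lower limit in weeks, d_j, c_j, n_j).
table3 = [(0, 0, 0, 18), (10, 2, 2, 18), (20, 0, 1, 14), (30, 2, 1, 13),
          (40, 0, 0, 10), (50, 1, 2, 10), (60, 0, 0, 7), (70, 1, 0, 7),
          (80, 0, 0, 6), (90, 2, 0, 6), (100, 1, 3, 4)]

def grouped_survivor(rows, censor_weight):
    """Product over intervals of (1 - d_j / (n_j - censor_weight * c_j)).

    censor_weight = 0.5 gives the actuarial estimator, eq. (10);
    censor_weight = 0 is the Kaplan-Meier formula applied to grouped counts
    (censoring at the end of each interval); censor_weight = 1 corresponds
    to censoring at the beginning. Returns S_hat at the end of each interval."""
    out, surv = [], 1.0
    for lo, d, c, n in rows:
        n_adj = n - censor_weight * c      # effective number at risk, n'_j
        surv *= 1 - d / n_adj
        out.append((lo, surv))
    return out

actuarial = grouped_survivor(table3, 0.5)
naive_km = grouped_survivor(table3, 0.0)
for (lo, s_act), (_, s_km) in zip(actuarial, naive_km):
    print(f"[{lo}, {lo + 10})  actuarial = {s_act:.4f}  KM on grouped = {s_km:.4f}")
```

Counting all censored individuals as at risk for the whole interval inflates the denominator, so the naive grouped product is never below the actuarial estimate; the actuarial column reproduces Table 4.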



Time to discontinuation of the use of an IUD

Table 4: Life-table estimate of the survivor function for the data of Time to
discontinuation of the use of an IUD
Time interval   j   d_j   c_j   n_j   1 − d_j/(n_j − c_j/2)   Ŝ(t)
[0 − 10) 1 0 0 18 1 1
[10 − 20) 2 2 2 18 0.8824 0.8824
[20 − 30) 3 0 1 14 1 0.8824
[30 − 40) 4 2 1 13 0.8400 0.7412
[40 − 50) 5 0 0 10 1 0.7412
[50 − 60) 6 1 2 10 0.8889 0.6588
[60 − 70) 7 0 0 7 1 0.6588
[70 − 80) 8 1 0 7 0.8571 0.5647
[80 − 90) 9 0 0 6 1 0.5647
[90 − 100) 10 2 0 6 0.6667 0.3765
[100 − 110) 11 1 3 4 0.6000 0.2259



Time to discontinuation of the use of an IUD

. ltable time status, interval(10)

                Beg.                                Std.
   Interval    Total   Deaths   Lost   Survival    Error     [95% Conf. Int.]
−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−
10 20 18 2 2 0.8824 0.0781 0.6060 0.9692
20 30 14 0 1 0.8824 0.0781 0.6060 0.9692
30 40 13 2 1 0.7412 0.1126 0.4451 0.8951
50 60 10 1 2 0.6588 0.1267 0.3572 0.8444
70 80 7 1 0 0.5647 0.1392 0.2642 0.7824
90 100 6 2 0 0.3765 0.1429 0.1234 0.6337
100 110 4 1 3 0.2259 0.1448 0.0314 0.5276
−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−



Life-table estimates of the cumulative hazard function

Since $H(t) = -\log S(t)$, we can estimate $H(t)$ by
$$\hat H(t) = -\log \hat S(t) = -\sum_{j=1}^{k}\log\left(1 - \frac{d_j}{n'_j}\right) \qquad (11)$$
where $n'_j = n_j - c_j/2$, and $t$ is in the interval from $t_k$ to $t_{k+1}$.

The life-table estimate of the hazard function in the $k$th time interval (from $t_k$ to $t_{k+1}$) is given by
$$\hat h(t) = \frac{d_k}{(n'_k - d_k/2)\,\tau_k} \qquad (12)$$
where $\tau_k = t_{k+1} - t_k$.
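Equations (11) and (12) can be evaluated from the Table 3 counts with a fixed interval width $\tau_k = 10$ weeks. This is an illustrative Python sketch (variable names are our own); the hazard values it prints can be compared with the Hazard column of the ltable, hazard output.

```python
import math

# Grouped counts from Table 3: (lower limit in weeks, d_k, c_k, n_k).
table3 = [(0, 0, 0, 18), (10, 2, 2, 18), (20, 0, 1, 14), (30, 2, 1, 13),
          (40, 0, 0, 10), (50, 1, 2, 10), (60, 0, 0, 7), (70, 1, 0, 7),
          (80, 0, 0, 6), (90, 2, 0, 6), (100, 1, 3, 4)]
width = 10  # interval length tau_k, in weeks

cum_hazard, hazard, H = [], [], 0.0
for lo, d, c, n in table3:
    n_adj = n - c / 2                                   # n'_k = n_k - c_k / 2
    H -= math.log(1 - d / n_adj)                        # eq. (11), running sum
    cum_hazard.append((lo, H))
    hazard.append((lo, d / ((n_adj - d / 2) * width)))  # eq. (12)

for (lo, h), (_, Hk) in zip(hazard, cum_hazard):
    print(f"[{lo}, {lo + width})  h = {h:.4f}  H = {Hk:.4f}")
```

Subtracting $d_k/2$ in the denominator of equation (12) treats those who fail during the interval as contributing, on average, half an interval of time at risk.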



Time to discontinuation of the use of an IUD

Actuarial Estimate
. ltable time status, interval(10) graph

[Figure: actuarial estimate; vertical axis Proportion Surviving (0.2 to 1), horizontal axis time (20 to 120)]



Time to discontinuation of the use of an IUD
Actuarial Estimate

. ltable time status, interval(10) hazard

                Beg.      Cum.        Std.                  Std.
   Interval    Total    Failure      Error     Hazard      Error     [95% Conf. Int.]
−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−
10 20 18 0.1176 0.0781 0.0125 0.0088 0.0000 0.0298
20 30 14 0.1176 0.0781 0.0000 . . .
30 40 13 0.2588 0.1126 0.0174 0.0123 0.0000 0.0414
50 60 10 0.3412 0.1267 0.0118 0.0117 0.0000 0.0348
70 80 7 0.4353 0.1392 0.0154 0.0153 0.0000 0.0454
90 100 6 0.6235 0.1429 0.0400 0.0277 0.0000 0.0943
100 110 4 0.7741 0.1448 0.0500 0.0484 0.0000 0.1449
−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−



Summary

Estimating S(t)

The non-parametric KM estimator is the most commonly used and reported.
The actuarial estimator is still relevant; it is used in public health life tables.
See https://www.cdc.gov/nchs
Next: Inference for survival data
