Class 7

Logistic Regression & Survival Analysis
Analysis of binary outcome & time to event data

Larry Holmes, Jr
Joabyer Hossain
Stats Research, Lecture 7, November 13, 2008

Presentation Objectives
 At the end of this presentation, participants should be able to:
 Explain the rationale for logistic regression, conduct the analysis, and interpret the results
 Survival analysis
– Measure Time and Events
– Understand Truncation and Censoring
– Understand Survival and Hazard Functions
– Define Competing Risks
– Understand Models and Hypothesis Testing
 Log-rank test
 Kaplan-Meier survival curve & estimates
 Cox Proportional Hazards Model (semi-parametric model)

What is Logistic Regression?
– Logistic regression is often used because the relationship between the dependent variable (DV), which is discrete, and a predictor is non-linear
 Blood glucose level and diabetes mellitus
 Hypertension and LDL level
Logistic Regression
In logistic regression:
 Outcome variable is binary
 Purpose of the analysis is to assess the
effects of multiple explanatory variables,
which can be numeric and/or categorical, on
the outcome variable.
Requirements for Logistic Regression
The following need to be specified:
1) An outcome variable with two possible categorical outcomes (1=success; 0=failure).
2) A way to estimate the probability P of the outcome variable.
3) A way to link the outcome variable to the explanatory variables.
4) A way to estimate the coefficients of the regression equation, as well as their confidence intervals.
5) A way to test the goodness of fit of the regression model.
Measuring the Probability of Outcome
The probability of the outcome is measured by the odds of occurrence of an event.
If P is the probability of an event, then (1-P) is the probability of it not occurring.
Odds of success = $\frac{P}{1 - P}$
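As a quick check of the odds formula, here is a minimal Python sketch. The function names are illustrative and do not come from the lecture.

```python
def odds_from_probability(p):
    """Convert a probability P (0 <= P < 1) into odds P / (1 - P)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1.0 - p)


def probability_from_odds(odds):
    """Invert the formula: P = odds / (1 + odds)."""
    if odds < 0.0:
        raise ValueError("odds must be non-negative")
    return odds / (1.0 + odds)


# A probability of 0.8 corresponds to odds of about 4 (often read as "4 to 1").
example_odds = odds_from_probability(0.8)
print(round(example_odds, 3))                         # 4.0
print(round(probability_from_odds(example_odds), 3))  # 0.8
```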
The logistic function
$\hat{Y}_i = \dfrac{e^{u}}{1 + e^{u}}$
 Where Y-hat ($\hat{Y}_i$) is the estimated probability that the ith case is in a category and u is the regular linear regression equation:
$u = A + B_1 X_1 + B_2 X_2 + \cdots + B_K X_K$
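The logistic function can be evaluated directly once u is known. The sketch below is only an illustration: the intercept, coefficients, and predictor values are hypothetical, and the function names are not from the lecture.

```python
import math


def logistic(u):
    """Logistic function e^u / (1 + e^u), arranged to avoid overflow for large |u|."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    z = math.exp(u)
    return z / (1.0 + z)


def predicted_probability(intercept, coefficients, predictors):
    """Y-hat for one case: logistic of u = A + B1*X1 + ... + BK*XK."""
    u = intercept + sum(b * x for b, x in zip(coefficients, predictors))
    return logistic(u)


# Hypothetical values, for illustration only.
A = -3.0
B = [0.5, 1.2]
X = [2.0, 1.0]
print(round(predicted_probability(A, B, X), 2))  # u = -0.8, probability about 0.31
```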
Logistic function
For a response variable y with p(y=1) = P and p(y=0) = 1 - P
[Figure: S-shaped logistic curve; Y axis = probability of disease (0.0 to 1.0), X axis = x]
Logistic regression will allow for the estimation of an equation that fits a curve to the age/probability of CHD relationship
A regression method to deal with the case when the dependent variable y is binary (dichotomous)
 Change in probability is not constant (linear) with constant changes in X
 This means that the probability of a success (Y = 1) given the predictor variable (X) is a non-linear function, specifically a logistic function
 It is not obvious how the regression coefficients for X are related to changes in the dependent variable (Y) when the model is written this way
 Change in Y (in probability units) given X depends on the value of X. Look at the S-shaped function
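To see that equal steps in X do not give equal steps in probability, the sketch below evaluates a hypothetical model, logit(P) = -5 + 1·X, at three values of X. The coefficients are made up for illustration.

```python
import math


def logistic(u):
    return 1.0 / (1.0 + math.exp(-u))


# Hypothetical model: logit(P) = -5 + 1 * X
intercept, slope = -5.0, 1.0


def change_in_probability(x):
    """Change in P(Y = 1) when X goes from x to x + 1."""
    return logistic(intercept + slope * (x + 1)) - logistic(intercept + slope * x)


for x in (0, 4, 9):
    print(x, round(change_in_probability(x), 3))
# 0 0.011
# 4 0.231
# 9 0.011
```

The same one-unit step in X moves the probability by about 0.23 in the middle of the curve and by about 0.01 in either tail, which is the S shape described above.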
The Logistic Regression
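The joint effect of all explanatory variables put together on the odds is
$\text{Odds} = \frac{P}{1-P} = e^{\alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p}$
Taking the logarithms of both sides
$\log\left\{\frac{P}{1-P}\right\} = \alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p$
$\operatorname{logit}(P) = \alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p$
The coefficients α, β1, β2, …, βp are estimated by maximum likelihood: they are the values that make the observed outcomes most probable under the model (not by least squares, as in linear regression).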
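Logistic regression coefficients are conventionally estimated by maximum likelihood, usually with an iterative method such as Newton-Raphson. The sketch below shows that idea for a single predictor in plain Python. The data are hypothetical, and statistical packages use more careful implementations (convergence checks, many predictors).

```python
import math


def fit_logistic(x, y, iterations=25):
    """Fit logit(P) = a + b*x by Newton-Raphson on the log-likelihood."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        p = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]
        # Score vector: first derivatives of the log-likelihood.
        g_a = sum(yi - pi for yi, pi in zip(y, p))
        g_b = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))
        # Information matrix: negative second derivatives.
        w = [pi * (1.0 - pi) for pi in p]
        i_aa = sum(w)
        i_ab = sum(wi * xi for wi, xi in zip(w, x))
        i_bb = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = i_aa * i_bb - i_ab * i_ab
        # Newton step: (a, b) += inverse(information) * score.
        a += (i_bb * g_a - i_ab * g_b) / det
        b += (i_aa * g_b - i_ab * g_a) / det
    return a, b


def log_likelihood(a, b, x, y):
    total = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
        total += math.log(p) if yi == 1 else math.log(1.0 - p)
    return total


# Hypothetical data: x is a numeric predictor, y is the binary outcome.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 1, 0, 1, 1, 0, 1]
a_hat, b_hat = fit_logistic(x, y)
print(round(a_hat, 3), round(b_hat, 3))
# The fitted model has a higher log-likelihood than the model with a = b = 0.
print(log_likelihood(a_hat, b_hat, x, y) > log_likelihood(0.0, 0.0, x, y))  # True
```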
The Logistic Regression
$\operatorname{logit}(P) = \alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p$
α represents the baseline log odds of disease (the log odds when all X are 0)
β1 represents the change in the log odds of disease for a unit change in X1, with the other variables held fixed
β2 is the change in the log odds of disease for a unit change in X2
……. and so on.
What changes is the log odds. The odds themselves are multiplied by $e^{\beta}$
If β = 1.6, the odds are multiplied by $e^{1.6} = 4.95$
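The arithmetic for β = 1.6 can be checked in a few lines. The baseline odds of 0.25 used below are a hypothetical starting point chosen only to show the effect on the probability scale.

```python
import math

beta = 1.6
odds_multiplier = math.exp(beta)
print(round(odds_multiplier, 2))  # 4.95

# Hypothetical baseline: odds of 0.25 (P = 0.20) before the unit increase in X.
baseline_odds = 0.25
new_odds = baseline_odds * odds_multiplier
new_probability = new_odds / (1.0 + new_odds)
print(round(new_odds, 2), round(new_probability, 2))  # 1.24 0.55
```

The odds are multiplied by 4.95, but the probability rises from 0.20 to about 0.55, not by a factor of 4.95.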
Logistic Regression-Demo
 MS-Excel: No default functions
 SPSS: Analyze > Regression > Binary Logistic > Select
Dependent variable: > Select independent variable
(covariate)
Logistic Regression SPSS output
Regression vs. Survival Analysis
Technique             Predictor Variables                  Outcome Variable                                 Censoring permitted?
Linear Regression     Categorical or continuous            Normally distributed                             No
Logistic Regression   Categorical or continuous            Binary (except in polytomous log. regression)    No
Survival Analyses     Time and categorical or continuous   Binary                                           Yes
Regression vs. Survival Analysis
Technique             Mathematical model                           Yields
Linear Regression     Y = B1X + Bo (linear)                        Linear changes
Logistic Regression   Ln(P/(1-P)) = B1X + Bo (sigmoidal prob.)     Odds ratios
Survival Analyses     h(t) = ho(t)exp(B1X + Bo)                    Hazard rates
What is survival analysis?
 Model time to failure or time to event
– Unlike linear regression, survival analysis has a dichotomous (binary) outcome
– Unlike logistic regression, survival analysis analyzes the time to an event
 Why is that important?
 Able to account for censoring
 Can compare survival between two or more groups
 Assess relationship between covariates and survival time
Survival Analysis
 Survival analysis deals with making inference about EVENT RATES
 Rate at t = Rate among those at risk at t
 Deals with median survival (the time by which 50% have had the event)
 Not mean survival (need everyone to have an event) ... Why?
 Survival vs. time-to-event
 Outcome variable = event time
 Examples of events:
– Death, infection, MI, prostate cancer death, hospitalization
– Recurrence of cancer after treatment
Types of censoring
 Subject does not experience event of interest
 Incomplete follow-up
– Lost to follow-up
– Withdraws from study
– Dies (if death is not the event being studied)
 Left or right censored
Survival Function
 S(t) = P[ T > t ] = 1 – P[ T ≤ t ]
 Plot: Y axis = % alive, X axis = time
 Proportion of population still without the
event by time t
Survival Curve
[Figure: survival curve; Y axis = proportion alive (0.0 to 1.0), X axis = months since surgery (0 to 9)]
Hazard Function
 Also termed incidence rate, instantaneous risk, force of mortality
 λ(t)
 Event rate at t among those at risk for an event
 Key function
 Estimated in a straightforward way, even when observations are
– Censored
– Truncated
[Figure: Time to Cardiovascular Adverse Event in VIGOR Trial]
Hazard Function
 Event = death, scale = months since treatment (Tx)
 “λ(t) = 1% at t = 12 months”
 “At 1 year, patients are dying at a rate of 1% per month”
 “At 1 year the chance of dying in the following month is 1%”
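The two readings of λ(t) = 1% per month agree closely because the hazard is small. A minimal sketch, assuming the hazard stays constant over the following month (an assumption made here, not stated on the slide):

```python
import math

hazard_per_month = 0.01  # from the slide: 1% per month at t = 12 months

# Assumption: the hazard is constant over the next month, so the chance that a
# patient alive at 12 months dies before 13 months is 1 - exp(-hazard * 1 month).
p_die_next_month = 1.0 - math.exp(-hazard_per_month * 1.0)
print(round(p_die_next_month, 5))  # 0.00995
```

The result is just under 1%, which is why a small rate per month can be read as an approximate probability of dying in the next month.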
Relationship between survivor
function and hazard function
 Survivor function, S(t), defines the probability of surviving longer than time t
– this is what the Kaplan-Meier curves show.
– Hazard function is the negative derivative of the survivor function over time, divided by the survivor function: h(t) = −[dS(t)/dt] / S(t)
 instantaneous risk of event at time t (conditional failure rate)
 Survivor and hazard functions can be converted into each other
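The conversion between the two functions can be written out using the standard definition of the hazard as a conditional failure rate, for a continuous survival time T:

```latex
\begin{align*}
h(t) &= \lim_{\Delta t \to 0}
        \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}
      = \frac{-\,dS(t)/dt}{S(t)}
      = -\frac{d}{dt}\,\ln S(t), \\[4pt]
S(t) &= \exp\!\left(-\int_0^t h(u)\,du\right).
\end{align*}
```

For example, a constant hazard h(t) = λ gives S(t) = e^{−λt}.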
Use of survival analysis: clinical trial
 Accrual into the study over 2 years

 Data analysis at year 3
 Reasons for exiting a study
– Died
– Alive at study end
– Withdrawal for non-study related reasons
(LTFU)
– Died from other causes
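One way to turn such exit reasons into the (time, status) pairs that survival methods need is sketched below. The patient records, the reason labels, and the choice to treat only "died" as the event of interest are assumptions made for illustration.

```python
STUDY_END_YEAR = 3.0  # data analysis at year 3, as on the slide


def time_and_status(entry_year, exit_year, reason):
    """Return (follow-up time in years, status), status 1 = event, 0 = censored."""
    last_seen = min(exit_year, STUDY_END_YEAR)
    follow_up = last_seen - entry_year
    # Only a death from the cause under study, seen before the analysis date,
    # counts as an event; every other exit reason is treated as censoring.
    is_event = reason == "died" and exit_year <= STUDY_END_YEAR
    return follow_up, 1 if is_event else 0


# Hypothetical patients: (entry year, exit year, reason for exit)
patients = [
    (0.5, 2.0, "died"),
    (1.0, 3.0, "alive at study end"),
    (0.0, 1.5, "lost to follow-up"),
    (1.5, 2.5, "died from other causes"),
]
for entry, exit_year, reason in patients:
    print(reason, time_and_status(entry, exit_year, reason))
```

Because patients enter at different times during accrual, follow-up time is measured from each patient's own entry, not from the calendar start of the study.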
Kaplan-Meier
 One way to estimate survival
 Nice, simple, can compute by hand
 Can add stratification factors
 Cannot evaluate covariates the way the Cox model can
 No sensible interpretation for competing risks
Kaplan-Meier estimate
 Multiply together a series of conditional probabilities
Time ti   # at risk   # events   Ŝ(ti)
0         20          0          1.00
5         20          2          [1-(2/20)]*1.00 = 0.90
6         18          0          [1-(0/18)]*0.90 = 0.90
10        15          1          [1-(1/15)]*0.90 = 0.84
13        14          2          [1-(2/14)]*0.84 = 0.72
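The running product in the table can be reproduced in a few lines of Python. The numbers at risk and event counts are copied from the table; the function name is illustrative.

```python
# (time, number at risk, number of events), copied from the table.
rows = [(0, 20, 0), (5, 20, 2), (6, 18, 0), (10, 15, 1), (13, 14, 2)]


def kaplan_meier_from_counts(rows):
    """Multiply the conditional survival probabilities 1 - d/n in time order."""
    survival = 1.0
    estimates = []
    for time, at_risk, events in rows:
        survival *= 1.0 - events / at_risk
        estimates.append((time, survival))
    return estimates


for time, s in kaplan_meier_from_counts(rows):
    print(time, round(s, 2))
# 0 1.0
# 5 0.9
# 6 0.9
# 10 0.84
# 13 0.72
```

The number at risk drops between rows even when no event occurs (18 to 15 between times 6 and 10) because censored subjects leave the risk set without changing the estimate.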
Kaplan-Meier Curve
[Figure: Kaplan-Meier curve with 95% confidence band; Y axis = proportion surviving (0.6 to 1.0), X axis = survival time (0 to 20)]
Limit of Kaplan-Meier curves
 What happens when you have several covariates that you believe contribute to survival?
 Example
– Smoking, hyperlipidemia, diabetes, and hypertension contribute to time to myocardial infarction
 Can use stratified K-M curves for 2 or maybe 3 covariates
 Need another approach for many covariates: the multivariable Cox proportional hazards model is most common
– (think multivariable regression or logistic regression rather than a Student’s t-test or the odds ratio from a 2 x 2 table)
Multivariable method: Cox
proportional hazards
 Needed to assess effect of multiple
covariates on survival
 The Cox proportional hazards model is the
most commonly used multivariable
survival method
Cox proportional hazard model
 Works with hazard model
 Conveniently separates baseline hazard function from covariates
– Baseline hazard function over time
 h(t) = ho(t)exp(B1X+Bo)
– Covariates are time independent
– B1 is used to calculate the hazard ratio, exp(B1), which is similar to the relative risk
 Semi-parametric
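The "proportional" part of the model means that the ratio of hazards between two covariate values does not depend on time. The sketch below illustrates this with a made-up baseline hazard and made-up coefficient values; none of them come from the lecture.

```python
import math


def baseline_hazard(t):
    """Hypothetical baseline hazard ho(t); the Cox model leaves its shape unspecified."""
    return 0.01 + 0.002 * t


def hazard(t, x, b1=0.7, b0=0.0):
    """h(t) = ho(t) * exp(B1*X + Bo), with illustrative coefficient values."""
    return baseline_hazard(t) * math.exp(b1 * x + b0)


for t in (1, 6, 12):
    ratio = hazard(t, x=1) / hazard(t, x=0)
    print(t, round(ratio, 3))  # the same ratio at every t: exp(0.7), about 2.014
```

The baseline hazard and Bo cancel in the ratio, which is why the hazard ratio exp(B1) can be estimated without specifying ho(t).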
Cox Proportional Hazards Model
 Add covariates to the model
 A change in a prognostic factor produces a proportional change in the hazard (an additive change on the log-hazard scale)
 Can test the effect of the prognostic factor as in linear regression - H0: β=0
Limitations of Cox PH model
 Does not accommodate variables that change over time
– Most variables (e.g. gender, ethnicity, or congenital condition) are constant
 If necessary, one can program time-dependent variables
 When might you want this?
 Baseline hazard function, ho(t), is never specified
– You can estimate ho(t) accurately if you need to estimate S(t).
Summary
 Survival analysis quantifies time to a single,
dichotomous event
 Handles censored data well
 Survival and hazard can be mathematically converted to
each other
 Kaplan-Meier survival curves can be compared
statistically and graphically
 Cox proportional hazards models help distinguish
individual contributions of covariates on survival,
provided certain assumptions are met.
SPSS output of Survival functions
Survival Table
                       Cumulative Proportion Surviving at the Time
     Time      Status   Estimate   Std. Error   N of Cumulative Events   N of Remaining Cases
1    6.000     1        .800       .179         1                        4
2    14.000    1        .600       .219         2                        3
3    21.000    0        .          .            2                        2
4    44.000    1        .300       .239         3                        1
5    62.000    1        .000       .000         4                        0
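The estimates and standard errors in this survival table can be reproduced by hand. The sketch below assumes that status 1 marks an event and 0 a censored case, that there are no tied times, and that the standard errors follow Greenwood's formula, which reproduces the printed values here. The function name is illustrative.

```python
import math


def kaplan_meier(times, status):
    """Return (time, estimate, standard error) at each event time (no ties assumed)."""
    at_risk = len(times)
    survival = 1.0
    greenwood_sum = 0.0
    table = []
    for t, event in sorted(zip(times, status)):
        if event == 1:
            survival *= 1.0 - 1.0 / at_risk
            if at_risk > 1:  # the term is undefined when the last subject fails
                greenwood_sum += 1.0 / (at_risk * (at_risk - 1))
            table.append((t, survival, survival * math.sqrt(greenwood_sum)))
        at_risk -= 1  # events and censored cases both leave the risk set
    return table


# Times and status codes copied from the SPSS survival table.
times = [6.0, 14.0, 21.0, 44.0, 62.0]
status = [1, 1, 0, 1, 1]
for t, s, se in kaplan_meier(times, status):
    print(t, round(s, 3), round(se, 3))
# 6.0 0.8 0.179
# 14.0 0.6 0.219
# 44.0 0.3 0.239
# 62.0 0.0 0.0
```

The censored case at 21 produces no new estimate, but it reduces the number at risk from 3 to 2 before the event at 44.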
Means and Medians for Survival Time
             Estimate   Std. Error   95% CI Lower Bound   95% CI Upper Bound
Mean (a)     35.800     11.810       12.652               58.948
Median       44.000     23.875       .000                 90.794
a. Estimation is limited to the largest survival time if it is censored.
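The mean and median estimates follow from the Kaplan-Meier step function: the mean is the area under the curve and the median is the first time the curve reaches 0.5 or below. A minimal sketch using the estimates from the survival table (function names are illustrative):

```python
# Steps of the Kaplan-Meier curve: (time, estimated survival just after that time).
steps = [(0.0, 1.0), (6.0, 0.8), (14.0, 0.6), (44.0, 0.3), (62.0, 0.0)]


def mean_survival(steps):
    """Area under the step function up to the last observed time."""
    area = 0.0
    for (t0, s0), (t1, _) in zip(steps, steps[1:]):
        area += s0 * (t1 - t0)
    return area


def median_survival(steps):
    """First time at which the estimated survival is 0.5 or lower."""
    for t, s in steps:
        if s <= 0.5:
            return t
    return None  # the curve never reaches 0.5


print(round(mean_survival(steps), 1), median_survival(steps))  # 35.8 44.0
```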
SPSS output of KM plot
SPSS output of cumulative hazard
SPSS output of Cox Regression
Omnibus Tests of Model Coefficients (a, b)
-2 Log Likelihood: 6.732
                              Chi-square   df   Sig.
Overall (score)               .468         1    .494
Change From Previous Step     .646         1    .422
Change From Previous Block    .646         1    .422
a. Beginning Block Number 0, initial Log Likelihood function: -2 Log likelihood: 7.378
b. Beginning Block Number 1. Method = Enter
Variables in the Equation
       B        SE      Wald   df   Sig.   Exp(B)
psa    -1.393   2.305   .365   1    .546   .248
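Several numbers in this output are simple functions of the others, which helps when reading it. The sketch below recomputes them from the printed B, SE, and -2 log likelihood values. The helper `chi2_sf_1df` is a name introduced here; it relies on the identity that a chi-square variable with 1 degree of freedom is a squared standard normal.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


b, se = -1.393, 2.305                  # B and SE for psa, from the table
hazard_ratio = math.exp(b)             # Exp(B)
wald = (b / se) ** 2                   # Wald statistic
likelihood_ratio = 7.378 - 6.732       # drop in -2 log likelihood from block 0

print(round(hazard_ratio, 3))                                          # 0.248
print(round(wald, 3), round(chi2_sf_1df(wald), 3))                      # 0.365 0.546
print(round(likelihood_ratio, 3), round(chi2_sf_1df(likelihood_ratio), 3))  # 0.646 0.422
```

Exp(B) = 0.248 is the estimated hazard ratio for a one-unit increase in psa; with Sig. = .546 the data do not show that it differs from 1.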
