The LIFEREG Procedure


Chapter 36

The LIFEREG Procedure

Chapter Table of Contents


OVERVIEW . . . 1761

GETTING STARTED . . . 1763
   Modeling Right-Censored Failure Time Data . . . 1764

SYNTAX . . . 1767
   PROC LIFEREG Statement . . . 1768
   BY Statement . . . 1769
   CLASS Statement . . . 1769
   MODEL Statement . . . 1770
   OUTPUT Statement . . . 1775
   WEIGHT Statement . . . 1777

DETAILS . . . 1777
   Missing Values . . . 1777
   Main Effects . . . 1777
   Computational Method . . . 1778
   Model Specifications . . . 1778
   Supported Distributions . . . 1780
   Predicted Values . . . 1783
   OUTEST= Data Set . . . 1784
   Computational Resources . . . 1784
   Displayed Output . . . 1785
   ODS Table Names . . . 1786

EXAMPLES . . . 1786
   Example 36.1 Motorette Failure . . . 1786
   Example 36.2 Computing Predicted Values for a Tobit Model . . . 1789
   Example 36.3 Overcoming Convergence Problems by Specifying Initial Values . . . 1793

REFERENCES . . . 1796

SAS OnlineDoc: Version 8

Overview
The LIFEREG procedure fits parametric models to failure time data that can be right,
left, or interval censored. The models for the response variable consist of a linear
effect composed of the covariates and a random disturbance term. The distribution
of the random disturbance can be taken from a class of distributions that includes the
extreme value, normal, logistic, and, by using a log transformation, the exponential,
Weibull, lognormal, loglogistic, and gamma distributions.
The model assumed for the response $y$ is

$$ y = X\beta + \sigma\epsilon $$

where $y$ is a vector of response values, often the log of the failure times, $X$ is a matrix of covariates or independent variables (usually including an intercept term), $\beta$ is
a vector of unknown regression parameters, $\sigma$ is an unknown scale parameter, and $\epsilon$
is a vector of errors assumed to come from a known distribution (such as the standard
normal distribution). The distribution may depend on additional shape parameters.
These models are equivalent to accelerated failure time models when the log of the
response is the quantity being modeled. The effect of the covariates in an accelerated failure time model is to change the scale, and not the location, of a baseline
distribution of failure times.
The LIFEREG procedure estimates the parameters by maximum likelihood using a
Newton-Raphson algorithm. PROC LIFEREG estimates the standard errors of the
parameter estimates from the inverse of the observed information matrix.
The accelerated failure time model assumes that the effect of independent variables
on an event time distribution is multiplicative on the event time. Usually, the scale
function is $\exp(x'\beta)$, where $x$ is the vector of covariate values and $\beta$ is a vector
of unknown parameters. Thus, if $T_0$ is an event time sampled from the baseline
distribution corresponding to values of zero for the covariates, then the accelerated
failure time model specifies that, if the vector of covariates is $x$, the event time is
$T = \exp(x'\beta)T_0$. If $y = \log(T)$ and $y_0 = \log(T_0)$, then

$$ y = x'\beta + y_0 $$

This is a linear model with $y_0$ as the error term.
In terms of survival or exceedance probabilities, this model is

$$ \Pr(T > t \mid x) = \Pr(T_0 > \exp(-x'\beta)\,t) $$

The probability on the left-hand side of the equal sign is evaluated given the value $x$
for the covariates, and the right-hand side is computed using the baseline probability
distribution but at a scaled value of the argument. The right-hand side of the equation represents the value of the baseline Survival Distribution Function evaluated at
$\exp(-x'\beta)\,t$.
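The step from the multiplicative model to this probability statement can be written out explicitly. The following is a short worked derivation, assuming only that $T = \exp(x'\beta)T_0$ and that $\exp(x'\beta) > 0$; the symbol $S_0$ for the baseline survival function is introduced here for illustration and does not appear elsewhere in this chapter.

```latex
\begin{align*}
\Pr(T > t \mid x) &= \Pr\bigl(\exp(x'\beta)\,T_0 > t\bigr) \\
                  &= \Pr\bigl(T_0 > \exp(-x'\beta)\,t\bigr) \\
                  &= S_0\bigl(\exp(-x'\beta)\,t\bigr)
\end{align*}
```

Under this reading, a positive value of $x'\beta$ evaluates the baseline survival function at an earlier time point, so the probability of surviving beyond $t$ is at least as large as in the baseline group.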
Usually, an intercept parameter and a scale parameter are allowed in the model. In
terms of the original untransformed event times, the effects of the intercept term and
the scale term are to scale the event time and power the event time, respectively. That
is, if

$$ \log(T) = \mu + \sigma \log(T_0) $$

then

$$ T = \exp(\mu)\,T_0^{\sigma} $$
Although it is possible to fit these models to the original response variable using
the NOLOG option, it is more common to model the log of the response variable.
Because of this log transformation, zero values for the observed failure times are
not allowed unless the NOLOG option is specified. Similarly, small values for the
observed failure times lead to large negative values for the transformed response. The
NOLOG option should only be used if you want to fit a distribution appropriate for
the untransformed response, the extreme value instead of the Weibull, for example.
The parameter estimates for the normal distribution are sensitive to large negative
values, and care must be taken that the fitted model is not unduly influenced by them.
Likewise, values that are extremely large even after the log transformation have a
strong influence in fitting the extreme value (Weibull) and normal distributions. You
should examine the residuals and check the effects of removing observations with
large residuals or extreme values of covariates on the model parameters. The logistic
distribution gives robust parameter estimates in the sense that the estimates have a
bounded influence function.
The standard errors of the parameter estimates are computed from large sample normal approximations using the observed information matrix. In small samples, these
approximations may be poor. Refer to Lawless (1982) for additional discussion and
references. You can sometimes construct better confidence intervals by transforming
the parameters. For example, large sample theory is often more accurate for $\log(\sigma)$
than $\sigma$. Therefore, it may be more accurate to construct confidence intervals for
$\log(\sigma)$ and transform these into confidence intervals for $\sigma$. The parameter estimates
and their estimated covariance matrix are available in an output SAS data set and can
be used to construct additional tests or confidence intervals for the parameters. Alternatively, tests of parameters can be based on log-likelihood ratios. Refer to Cox and
Oakes (1984) for a discussion of the merits of some possible test methods including
score, Wald, and likelihood ratio tests. It is believed that likelihood ratio tests are
generally more reliable in small samples than tests based on the information matrix.
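The log-scale interval mentioned above can be sketched in a short DATA step. This is a minimal illustration, assuming a first-order (delta method) approximation in which the standard error of the log of the scale is the standard error of the scale divided by the scale itself. The estimate and standard error are the Scale values reported in Figure 36.3; the variable names are hypothetical.

```sas
/* Sketch: 95% confidence limits for the scale parameter,      */
/* built on the log scale and transformed back.                */
data _null_;
   sigma  = 0.21219;              /* Scale estimate, Figure 36.3       */
   se     = 0.03036;              /* its standard error, Figure 36.3   */
   z      = probit(0.975);        /* standard normal 97.5th percentile */
   se_log = se / sigma;           /* delta-method approximation (assumed) */
   lower  = exp(log(sigma) - z*se_log);
   upper  = exp(log(sigma) + z*se_log);
   put lower= upper=;             /* limits are written to the SAS log */
run;
```

Because the limits are exponentiated, the resulting interval is always positive and is not symmetric about the estimate.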

The log-likelihood function is computed using the log of the failure time as a response. This log likelihood differs from the log likelihood obtained using the failure
time as the response by an additive term of $\sum \log(t_i)$, where the sum is over the noncensored failure times. This term does not depend on the unknown parameters and
does not affect parameter or standard error estimates. However, many published values of log likelihoods use the failure time as the basic response variable and, hence,
differ by the additive term from the value computed by the LIFEREG procedure.
The classic Tobit model (Tobin 1958) also fits into this class of models but with data
usually censored on the left. The data considered by Tobin in his original paper came
from a survey of consumers where the response variable is the ratio of expenditures
on durable goods to the total disposable income. The two explanatory variables are
the age of the head of household and the ratio of liquid assets to total disposable
income. Because many observations in this data set have a value of zero for the
response variable, the model fit by Tobin is

$$ y = \max(x'\beta + \sigma\epsilon,\ 0) $$

which is a regression model with left censoring.
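A model of this kind can be expressed with the interval form of the MODEL statement, in which a missing lower endpoint marks a left-censored value. The following is a minimal sketch under stated assumptions: the data values are made up for illustration and are not Tobin's data, and the names tobit, durable, age, quant, and lower are hypothetical.

```sas
/* Made-up values for illustration only; not Tobin's data */
data tobit;
   input durable age quant;
   /* A zero response is treated as left censored at 0:        */
   /* lower is set to missing, so the upper endpoint (durable) */
   /* is used as the left-censoring value.                     */
   if durable = 0 then lower = .;
   else lower = durable;          /* positive response is observed exactly */
   datalines;
0.0 58 0.24
0.7 51 0.28
0.0 49 0.21
3.7 42 0.22
0.0 44 0.30
3.0 47 0.26
10.4 36 0.21
0.0 59 0.23
4.0 45 0.25
0.0 61 0.29
1.5 39 0.22
6.1 35 0.27
;

proc lifereg data=tobit;
   /* DIST=NORMAL models the untransformed response */
   model (lower, durable) = age quant / dist=normal;
run;
```

The normal distribution is requested so that no log transformation is applied, which would otherwise be undefined for the zero responses.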

Getting Started
The following examples demonstrate how you can use the LIFEREG procedure to fit
a parametric model to failure time data.
Suppose you have a response variable y that represents failure time, censor is a binary variable indicating censored values, and x1 and x2 are two linearly independent
variables. The following statements perform a typical accelerated failure time model
analysis. Note that no higher-order effects such as interactions are allowed in the
covariables list.
proc lifereg;
model y*censor(0) = x1 x2;
run;

PROC LIFEREG can operate on interval-censored data. The model syntax for specifying the censored interval is
proc lifereg;
model (begin, end) = x1 x2;
run;

You can also express the response with events/trials syntax, as illustrated in the following statements:
proc lifereg;
model r/n=x1 x2;
run;

The variable n represents the number of trials and the variable r represents the number
of events.

Modeling Right-Censored Failure Time Data


The following example demonstrates how you can use the LIFEREG procedure to fit
a model to right-censored failure time data.
Suppose you conduct a study of two headache pain relievers. You divide patients into
two groups, with each group receiving a different type of pain reliever. You record
the time taken (in minutes) for each patient to report headache relief. Because some
of the patients never report relief for the entire study, some of the observations are
censored.
The following DATA step creates the SAS data set headache:
data headache;
   input minutes group censor @@;
   datalines;
11 1 0   12 1 0   19 1 0   19 1 0
19 1 0   19 1 0   21 1 0   20 1 0
21 1 0   21 1 0   20 1 0   21 1 0
20 1 0   21 1 0   25 1 0   27 1 0
30 1 0   21 1 1   24 1 1   14 2 0
16 2 0   16 2 0   21 2 0   21 2 0
23 2 0   23 2 0   23 2 0   23 2 0
25 2 1   23 2 0   24 2 0   24 2 0
26 2 1   32 2 1   30 2 1   30 2 0
32 2 1   20 2 1
;

The data set headache contains the variable minutes, which represents the reported
time to headache relief, the variable group, the group to which the patient is assigned, and the variable censor, a binary variable indicating whether the observation
is censored. Valid values of the variable censor are 0 (no) and 1 (yes). The first five
records of the data set headache are shown below.

Obs    minutes    group    censor

  1       11         1        0
  2       12         1        0
  3       19         1        0
  4       19         1        0
  5       19         1        0

Figure 36.1.  Headache Data

The following statements invoke the LIFEREG procedure:


proc lifereg;
class group;
model minutes*censor(1)=group;
output out=new cdf=prob;
run;

The CLASS statement specifies the variable group as the classification variable. The
MODEL statement syntax indicates that the response variable minutes is censored
when the variable censor takes the value 1. The MODEL statement specifies the
variable group as the single explanatory variable. Because the MODEL statement
does not specify the DISTRIBUTION= option, the LIFEREG procedure fits the default type 1 extreme value distribution using log(minutes) as the response. This is
equivalent to fitting the Weibull distribution.
The OUTPUT statement creates the output data set new. In addition to the variables
in the original data set headache, the SAS data set new also contains the variable
prob. This new variable is created by the CDF= option to contain the estimates of
the cumulative distribution function evaluated at the observed response.
The results of this analysis are displayed in the following figures.
                 The LIFEREG Procedure

                Class Level Information

            Name      Levels    Values
            group          2    1 2

                   Model Information

         Data Set                    WORK.HEADACHE
         Dependent Variable           Log(minutes)
         Censoring Variable                 censor
         Censoring Value(s)                      1
         Number of Observations                 38
         Noncensored Values                     30
         Right Censored Values                   8
         Left Censored Values                    0
         Interval Censored Values                0
         Name of Distribution              WEIBULL
         Log Likelihood                -9.37930239

Figure 36.2.  Model Fitting Information from the LIFEREG Procedure

Figure 36.2 displays the class level information and model fitting information. There
are 30 noncensored observations and 8 right-censored observations. The log likelihood for the Weibull distribution is -9.3793. The log-likelihood value can be used to
compare the goodness of fit for different models.

                          The LIFEREG Procedure

                      Analysis of Parameter Estimates

                               Standard
Variable     DF    Estimate       Error   Chi-Square   Pr > ChiSq   Label

Intercept     1     3.30912     0.05885    3161.7000       <.0001   Intercept
group         1                               6.0540       0.0139
              1    -0.19330     0.07856       6.0540       0.0139   1
              0           0           0            .            .   2
Scale         1     0.21219     0.03036                             Extreme value scale

Figure 36.3.  Model Parameter Estimates from the LIFEREG Procedure

The table of parameter estimates is displayed in Figure 36.3. Both the intercept and
the slope parameter for the variable group are significantly different from 0 at the
0.05 level. Because the variable group has only one degree of freedom, parameter
estimates are given for only one level of the variable group (group=1). However, the
estimate for the intercept parameter provides a baseline for group=2. The resulting
model is

$$ \log(\text{minutes}) = \begin{cases} 3.30911843 - 0.1933025 & \text{for group=1} \\ 3.30911843 & \text{for group=2} \end{cases} $$

Note that the Weibull shape parameter for this model is the reciprocal of the extreme
value scale parameter estimate shown in Figure 36.3 ($1/0.21219 = 4.7128$).
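The conversion from the extreme value parameterization to Weibull quantities can be sketched in a DATA step. This is an illustration only: it uses the estimates from Figure 36.3, the variable names are hypothetical, and it assumes the standard relations in which the Weibull shape is the reciprocal of the extreme value scale and the Weibull scale is the exponential of the linear predictor.

```sas
/* Sketch: convert the Figure 36.3 estimates to Weibull quantities */
data _null_;
   intercept = 3.30912;                       /* Intercept estimate     */
   group1    = -0.19330;                      /* effect of group=1      */
   scale     = 0.21219;                       /* extreme value scale    */

   weibull_shape   = 1 / scale;               /* reciprocal of the scale */
   weibull_scale_2 = exp(intercept);          /* baseline level, group=2 */
   weibull_scale_1 = exp(intercept + group1); /* group=1                 */
   time_ratio      = exp(group1);             /* group=1 relative to group=2 */

   put weibull_shape= weibull_scale_1= weibull_scale_2= time_ratio=;
run;
```

The last quantity is the multiplicative effect on event time implied by the accelerated failure time interpretation: event times in group 1 are scaled by this factor relative to group 2.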
The following statements produce a graph of the cumulative distribution values versus
the variable minutes. The LEGEND1 statement defines the appearance of the legend
that displays on the plot. The two AXIS statements define the appearance of the plot
axes. The SYMBOL statements control the plotting symbol, color, and method of
smoothing.
legend1 frame cframe=ligr cborder=black
position=center value=(justify=center);
axis1 label=(angle=90 rotate=0 Estimated CDF) minor=none;
axis2 minor=none;
symbol1 c=white i=spline;
symbol2 c=yellow i=spline;
proc sort data=new;
by prob;
proc gplot data=new;
plot prob*minutes=group/ frame cframe=ligr
legend=legend1 vaxis=axis1 haxis=axis2;
run;

The SORT procedure sorts the data set new by the variable prob. Then the GPLOT
procedure plots the variable prob versus the variable minutes using the grouping
variable as the identification variable. The LEGEND=, VAXIS=, and HAXIS= options specify the previously defined legend and axis statements.
Figure 36.4 displays the estimated cumulative distribution function for each group.

Figure 36.4.  Plot of the Estimated Cumulative Distribution Function

Syntax
The following statements are available in PROC LIFEREG.

PROC LIFEREG < options > ;


MODEL response=independents < / options > ;
BY variables ;
CLASS variables ;
OUTPUT < OUT=SAS-data-set >
       keyword=name < ... keyword=name >
       < options > ;
WEIGHT variable ;
The PROC LIFEREG statement invokes the procedure. The MODEL statement is
required and specifies the variables used in the regression part of the model as well as
the distribution used for the error, or random, component of the model. Only main effects can be specified in the MODEL statements. Interaction terms involving CLASS
variables, allowed in the GLM procedure, are not available in PROC LIFEREG. Initial values can be specified in the MODEL statement. If no initial values are specified,
the starting estimates are obtained by ordinary least squares. The CLASS statement
determines which explanatory variables are treated as categorical. The WEIGHT
statement identifies a variable with values that are used to weight the observations.
Observations with zero or negative weights are not used to fit the model, although
predicted values can be computed for them. The OUTPUT statement creates an output data set containing predicted values and residuals.

PROC LIFEREG Statement


PROC LIFEREG < options > ;
The PROC LIFEREG statement invokes the procedure. You can specify the following
options in the PROC LIFEREG statement.
COVOUT

writes the estimated covariance matrix to the OUTEST= data set if convergence is
attained.
DATA=SAS-data-set

specifies the input SAS data set used by PROC LIFEREG. By default, the most recently created SAS data set is used.
NOPRINT

suppresses the display of the output. Note that this option temporarily disables the
Output Delivery System (ODS). For more information, see Chapter 15, Using the
Output Delivery System.
ORDER=DATA | FORMATTED | FREQ | INTERNAL

specifies the sorting order for the levels of the classification variables (specified in the
CLASS statement). This ordering determines which parameters in the model correspond to each level in the data. The following table illustrates how PROC LIFEREG
interprets values of the ORDER= option.
Value of ORDER=    Levels Sorted By

DATA               order of appearance in the input data set
FORMATTED          formatted value
FREQ               descending frequency count; levels with the
                   most observations come first in the order
INTERNAL           unformatted value

By default, ORDER=FORMATTED. For FORMATTED and INTERNAL, the sort
order is machine dependent. For more information on sorting order, refer to the
chapter titled The SORT Procedure in the SAS Procedures Guide.
OUTEST=SAS-data-set

specifies an output SAS data set containing the parameter estimates, the maximized
log likelihood and, if the COVOUT option is specified, the estimated covariance matrix. See the section OUTEST= Data Set on page 1784 for a detailed description of
the contents of the OUTEST= data set. This data set is not created if class variables
are used.
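Because the OUTEST= data set is not created when class variables are used, one workaround is to code a two-level classification variable as a numeric indicator before fitting. The following is a minimal sketch, assuming the headache data set from the Getting Started section; the names headache2, g1, and est are hypothetical.

```sas
/* g1 is a hypothetical 0/1 indicator that replaces the CLASS variable */
data headache2;
   set headache;
   g1 = (group = 1);              /* 1 for group 1, 0 for group 2 */
run;

/* No CLASS statement, so the OUTEST= data set can be created */
proc lifereg data=headache2 outest=est covout;
   model minutes*censor(1) = g1;
run;

/* Inspect the estimates and the covariance matrix rows */
proc print data=est;
run;
```

With this coding, the coefficient of g1 plays the same role as the group=1 estimate in a CLASS-based fit, since group=2 is the reference level in both cases.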

BY Statement
BY variables ;
You can specify a BY statement with PROC LIFEREG to obtain separate analyses on
observations in groups defined by the BY variables. When a BY statement appears,
the procedure expects the input data set to be sorted in order of the BY variables.
If your input data set is not sorted in ascending order, use one of the following alternatives:

- Sort the data using the SORT procedure with a similar BY statement.
- Specify the BY statement option NOTSORTED or DESCENDING in the BY
  statement for the LIFEREG procedure. The NOTSORTED option does not
  mean that the data are unsorted but rather that the data are arranged in groups
  (according to values of the BY variables) and that these groups are not necessarily in alphabetical or increasing numeric order.
- Create an index on the BY variables using the DATASETS procedure.

For more information on the BY statement, refer to the discussion in SAS Language
Reference: Concepts. For more information on the DATASETS procedure, refer to
the discussion in the SAS Procedures Guide.

CLASS Statement
CLASS variables ;
Variables that are classification variables rather than quantitative numeric variables
must be listed in the CLASS statement. For each explanatory variable listed in the
CLASS statement, indicator variables are generated for the levels assumed by the
CLASS variable. If you use a CLASS statement, you cannot output parameter estimates to the OUTEST= data set (you can output them to a data set via ODS). If the
CLASS statement is used, it must appear before any of the MODEL statements.

MODEL Statement
<label:> MODEL response<*censor(list)>=independents < / options > ;
<label:> MODEL (lower,upper)=independents < / options > ;
<label:> MODEL events/trials=independents < / options > ;

Multiple MODEL statements can be used with one invocation of the LIFEREG procedure. The optional label is used to label the model estimates in the output SAS data
set.
The first MODEL syntax allows for right censoring. The variable response is possibly
right censored. If the response variable can be right censored, then a second variable,
denoted censor, must appear after the response variable with a list of parenthesized
values, separated by commas or blanks, to indicate censoring. That is, if the censor
variable takes on a value given in the list, the response is a right-censored value;
otherwise, it is an observed value.
The second MODEL syntax specifies two variables, lower and upper, that contain
values of the endpoints of the censoring interval. If the two values are the same (and
not missing), it is assumed that there is no censoring and the actual response value is
observed. If the lower value is missing, then the upper value is used as a left-censored
value. If the upper value is missing, then the lower value is taken as a right-censored
value. If both values are present and the lower value is less than the upper value, it
is assumed that the values specify a censoring interval. If the lower value is greater
than the upper value or both values are missing, then the observation is not used in
the analysis although predicted values can still be obtained if none of the covariates
are missing. The following table summarizes the ways of specifying censoring.
lower          upper          Comparison       Interpretation

not missing    not missing    equal            no censoring
not missing    not missing    lower < upper    censoring interval
missing        not missing                     upper used as left-censoring value
not missing    missing                         lower used as right-censoring value
not missing    not missing    lower > upper    observation not used
missing        missing                         observation not used
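The coding rules in the table can be illustrated with a small data set. This is a sketch under stated assumptions: the values are made up, and the names intervals, lower, and upper are hypothetical. The rows cover, in order, an exact value, a censoring interval, a left-censored value, a right-censored value, and two observations that are not used, followed by a few additional usable rows.

```sas
/* Made-up endpoints illustrating each row of the censoring table */
data intervals;
   input lower upper;             /* a period denotes a missing endpoint */
   datalines;
5 5
3 7
. 4
6 .
8 2
. .
4 4
2 6
9 .
7 7
;

/* Intercept-only model fit to the interval-coded response */
proc lifereg data=intervals;
   model (lower, upper) = ;
run;
```

The observation with lower greater than upper and the observation with both endpoints missing should be excluded from the fit, consistent with the last two rows of the table.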

The third MODEL syntax specifies two variables that contain count data for a binary
response. The value of the first variable, events, is the number of successes. The
value of the second variable, trials, is the number of tries. The values of both events
and (trials-events) must be nonnegative, and trials must be positive for the response
to be valid. The values of the two variables do not need to be integers and are not
modified to be integers.
The variables following the equal sign are the covariates in the model. No higher
order effects, such as interactions, are allowed in the covariables list; only variable
names are allowed to appear in this list. However, a class variable can be used as a
main effect, and indicator variables are generated for the class levels. If you do not
specify any covariates following the equal sign, an intercept-only model is fit.
Examples of three valid MODEL statements are
a: model time*flag(1,3)=temp;
b: model (start, finish)=;
c: model r/n=dose;

Model statement a indicates that the response is contained in a variable named time
and that, if the variable flag takes on the values 1 or 3, the observation is right censored. The explanatory variable is temp, which could be a class variable. Model
statement b indicates that the response is known to be in the interval between the
values of the variables start and finish and that there are no covariates except for a
default intercept term. Model statement c indicates a binary response, with the variable r containing the number of responses and the variable n containing the number
of trials.
The following options can appear in the MODEL statement.

Task                                            Option

Model specification
   specify distribution type for failure time   DISTRIBUTION=
   request no log transformation of response    NOLOG
   initial estimate for intercept term          INTERCEPT=
   hold intercept term fixed                    NOINT
   initial estimates for regression parameters  INITIAL=
   initialize scale parameter                   SCALE=
   hold scale parameter fixed                   NOSCALE
   initialize first shape parameter             SHAPE1=
   hold first shape parameter fixed             NOSHAPE1

Model fitting
   set convergence criterion                    CONVERGE=
   set maximum iterations                       MAXITER=
   set tolerance for testing singularity        SINGULAR=

Output
   display estimated correlation matrix         CORRB
   display estimated covariance matrix          COVB
   display iteration history, final gradient,   ITPRINT
   and second derivative matrix

CONVERGE=value

sets the convergence criterion. Convergence is declared when the maximum change
in the parameter estimates between Newton-Raphson steps is less than the value specified. The change is a relative change if the parameter is greater than 0.01 in absolute
value; otherwise, it is an absolute change. By default, CONVERGE=0.001.
CONVG=number

sets the relative Hessian convergence criterion. The value of number must be between 0 and 1. After convergence is determined with the change in parameter criterion
specified with the CONVERGE= option, the quantity $tc = \frac{g' H^{-1} g}{|f|}$ is computed
and compared to number, where $g$ is the gradient vector, $H$ is the Hessian matrix
for the model parameters, and $f$ is the log-likelihood function. If $tc$ is greater than
number, a warning that the relative Hessian convergence criterion has been exceeded
is printed. This criterion detects the occasional case where the change in parameter
convergence criterion is satisfied, but a maximum in the log-likelihood function has
not been attained. By default, CONVG=1E-4.
CORRB

produces the estimated correlation matrix of the parameter estimates.


COVB

produces the estimated covariance matrix of the parameter estimates.


DISTRIBUTION=distribution-type
DIST=distribution-type
D=distribution-type

specifies the distribution type assumed for the failure time. By default, PROC
LIFEREG fits a type 1 extreme value distribution to the log of the response. This
is equivalent to fitting the Weibull distribution, since the scale parameter for the extreme value distribution is related to a Weibull shape parameter and the intercept
is related to the Weibull scale parameter in this case. When the NOLOG option is
specified, PROC LIFEREG models the untransformed response with a type 1 extreme value distribution as the default. See the section Supported Distributions on
page 1780 for descriptions of the distributions. The following are valid values for
distribution-type:
EXPONENTIAL   the exponential distribution, which is treated as a restricted
              Weibull distribution

GAMMA         a generalized gamma distribution (Lawless, 1982, p. 240). The
              two parameter gamma distribution is not available in PROC
              LIFEREG.

LLOGISTIC     a loglogistic distribution

LNORMAL       a lognormal distribution

LOGISTIC      a logistic distribution (equivalent to LLOGISTIC when the
              NOLOG option is specified)

NORMAL        a normal distribution (equivalent to LNORMAL when the
              NOLOG option is specified)

WEIBULL       a Weibull distribution. If NOLOG is specified, it fits a type 1
              extreme value distribution to the raw, untransformed data.

By default, PROC LIFEREG transforms the response with the natural logarithm
before fitting the specified model when you specify the GAMMA, LLOGISTIC,
LNORMAL, or WEIBULL option. You can suppress the log transformation with
the NOLOG option. The following table summarizes the resulting distributions when
the distribution options above are used in combination with the NOLOG option.
DISTRIBUTION=   NOLOG specified?   Resulting distribution

EXPONENTIAL     No                 Exponential
EXPONENTIAL     Yes                One parameter extreme value
GAMMA           No                 Generalized gamma
GAMMA           Yes                Generalized gamma with untransformed responses
LOGISTIC        No                 Logistic
LOGISTIC        Yes                Logistic (NOLOG has no effect)
LLOGISTIC       No                 Log-logistic
LLOGISTIC       Yes                Logistic
LNORMAL         No                 Lognormal
LNORMAL         Yes                Normal
NORMAL          No                 Normal
NORMAL          Yes                Normal (NOLOG has no effect)
WEIBULL         No                 Weibull
WEIBULL         Yes                Extreme value

INITIAL=values

sets initial values for the regression parameters. This option can be helpful in the case
of convergence difficulty. Specified values are used to initialize the regression coefficients for the covariates specified in the MODEL statement. The intercept parameter
is initialized with the INTERCEPT= option and is not included here. The values are
assigned to the variables in the MODEL statement in the same order in which they
are listed in the MODEL statement. Note that a class variable requires k - 1 values
when the class variable takes on k different levels. The order of the class levels is determined by the ORDER= option. If there is no intercept term, the first class variable
requires k initial values. If a BY statement is used, all class variables must take on
the same number of levels in each BY group or no meaningful initial values can be
specified. The INITIAL= option can be specified as follows.

Type of List                 Specification

list separated by blanks     initial=3 4 5
list separated by commas     initial=3,4,5
x to y                       initial=3 to 5
x to y by z                  initial=3 to 5 by 1
combination of methods       initial=1,3 to 5,9

By default, PROC LIFEREG computes initial estimates with ordinary least squares.
See the section Computational Method on page 1778 for details.
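The starting-value options can be combined in a single MODEL statement. The following is a minimal sketch, assuming the headache data set from the Getting Started section; the starting values are hypothetical and are simply chosen near the estimates in Figure 36.3. Because group has two levels and the model includes an intercept, INITIAL= supplies k - 1 = 1 value.

```sas
/* Sketch: supply starting values instead of the least-squares defaults */
proc lifereg data=headache;
   class group;
   model minutes*censor(1) = group
         / intercept=3.3          /* starting value for the intercept   */
           initial=-0.2           /* one value for the two-level group  */
           scale=0.2              /* starting value for the scale       */
           maxiter=100            /* allow more than the default 50     */
           itprint;               /* show the iteration history         */
run;
```

Requesting ITPRINT alongside the starting values makes it easier to see whether the iterations begin where intended and how quickly they settle.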
INTERCEPT=value

initializes the intercept term to value. By default, the intercept is initialized by an
ordinary least squares estimate.
ITPRINT

displays the iteration history, the final evaluation of the gradient, and the final evaluation of the negative of the second derivative matrix, that is, the negative of the
Hessian.
MAXITER=value

sets the maximum allowable number of iterations during the model estimation. By
default, MAXITER=50.
NOINT

holds the intercept term fixed. Because of the usual log transformation of the response, the intercept parameter is usually a scale parameter for the untransformed
response, or a location parameter for a transformed response.
NOLOG

requests that no log transformation of the response variable be performed. By default, PROC LIFEREG models the log of the response variable for the GAMMA,
LLOGISTIC, LNORMAL, and WEIBULL distribution options.
NOSCALE

holds the scale parameter fixed. Note that if the log transformation has been applied
to the response, the effect of the scale parameter is a power transformation of the
original response. If no SCALE= value is specified, the scale parameter is fixed at
the value 1.
NOSHAPE1

holds the first shape parameter, SHAPE1, fixed. If no SHAPE1= value is specified,
SHAPE1 is fixed at a value that depends on the DISTRIBUTION type.
SCALE=value

initializes the scale parameter to value. If the Weibull distribution is specified, this
scale parameter is the scale parameter of the type 1 extreme value distribution, not the
Weibull scale parameter. Note that, with a log transformation, the exponential model
is the same as a Weibull model with the scale parameter fixed at the value 1.
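The equivalence between the exponential model and a Weibull model with the scale held at 1 can be checked by fitting both in one invocation. This is a sketch, assuming the headache data set from the Getting Started section; the labels a and b are arbitrary.

```sas
/* Sketch: two ways of requesting the same restricted model */
proc lifereg data=headache;
   class group;
   /* a: exponential distribution requested directly */
   a: model minutes*censor(1) = group / dist=exponential;
   /* b: Weibull with the extreme value scale held fixed at 1 */
   b: model minutes*censor(1) = group / dist=weibull noscale scale=1;
run;
```

If the two specifications are equivalent as described, the regression estimates and the log likelihood from the two fits should agree.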
SHAPE1=value

initializes the first shape parameter to value. If the specified distribution does not
depend on this parameter, then this option has no effect. The only distribution that
depends on this shape parameter is the generalized gamma distribution. See the
Supported Distributions section on page 1780 for descriptions of the parameterizations of the distributions.
SINGULAR=value

sets the tolerance for testing singularity of the information matrix and the crossproducts matrix for the initial least-squares estimates. Roughly, the test requires that
a pivot be at least this number times the original diagonal value. By default,
SINGULAR=1E-12.

OUTPUT Statement
OUTPUT <OUT=SAS-data-set> keyword=name <...keyword=name> ;
The OUTPUT statement creates a new SAS data set containing statistics calculated
after fitting the model. At least one specification of the form keyword=name is required.
All variables in the original data set are included in the new data set, along with the
variables created as options to the OUTPUT statement. These new variables contain
fitted values and estimated quantiles. If you want to create a permanent SAS data set,
you must specify a two-level name (refer to SAS Language Reference: Concepts for
more information on permanent SAS data sets). Each OUTPUT statement applies to
the preceding MODEL statement. See Example 36.1 for illustrations of the OUTPUT
statement.
The following specifications can appear in the OUTPUT statement:
OUT=SAS-data-set specifies the new data set. By default, the procedure uses the
DATAn convention to name the new data set.
keyword=name

specifies the statistics to include in the output data set and gives
names to the new variables. Specify a keyword for each desired
statistic (see the following list of keywords), an equal sign, and
the variable to contain the statistic.

The keywords allowed and the statistics they represent are as follows:
CENSORED

specifies an indicator variable to signal censoring. The variable takes on the value 1
if the observation is censored; otherwise, it is 0.

CDF

specifies a variable to contain the estimates of the cumulative distribution function
evaluated at the observed response. See the Predicted Values section on page 1783 for
more information.

CONTROL

specifies a variable in the input data set to control the estimation of quantiles. See
Example 36.1 for an illustration. If the specified variable has the value of 1, estimates
for all the values listed in the QUANTILE= list are computed for that observation in the
input data set; otherwise, no estimates are computed. If no CONTROL= variable is
specified, all quantiles are estimated for all observations. If the response variable in
the MODEL statement is binomial, then this option has no effect.

PREDICTED | P

specifies a variable to contain the quantile estimates. If the response variable in the
corresponding MODEL statement is binomial, then this variable contains the estimated
probabilities, $1 - F(-x'b)$.
QUANTILES | QUANTILE | Q

gives a list of values for which quantiles are calculated. The values must be between 0
and 1, noninclusive. For each value, a corresponding quantile is estimated. This option
is not used if the response variable in the corresponding MODEL statement is binomial.
The QUANTILES= option can be specified as follows.

Type of List                  Specification
list separated by blanks      .2 .4 .6 .8
list separated by commas      .2,.4,.6,.8
x to y                        .2 to .8
x to y by z                   .2 to .8 by .1
combination of methods        .1,.2 to .8 by .2

By default, QUANTILES=0.5. When the response is not binomial, a numeric variable, _PROB_,
is added to the OUTPUT data set whenever the QUANTILES= option is specified. The variable
_PROB_ gives the probability value for the quantile estimates. These are the values taken
from the QUANTILES= list and are given as values between 0 and 1, not as values between
0 and 100.
STD_ERR | STD

specifies a variable to contain the estimates of the standard errors of the estimated
quantiles or $x'b$. If the response used in the MODEL statement is a binomial response,
then these are the standard errors of $x'b$. Otherwise, they are the standard errors of
the quantile estimates. These estimates can be used to compute confidence intervals for
the quantiles. However, if the model is fit to the log of the event time, better
confidence intervals can usually be computed by transforming the confidence intervals
for the log response. See Example 36.1 for such a transformation.

XBETA

specifies a variable to contain the computed value of $x'b$, where x is the covariate
vector and b is the vector of parameter estimates.

WEIGHT Statement
WEIGHT variable ;
If you want to use weights for each observation in the input data set, place the weights
in a variable in the data set and specify the name in a WEIGHT statement. The values
of the WEIGHT variable can be nonintegral and are not truncated. Observations with
nonpositive or missing values for the weight variable do not contribute to the fit of
the model. The WEIGHT variable multiplies the contribution to the log likelihood
for each observation.

Details
Missing Values
Any observation with missing values for the dependent variable is not used in the
model estimation unless it is one and only one of the values in an interval specification. Also, if one of the explanatory variables or the censoring variable is missing, the
observation is not used. For any observation to be used in the estimation of a model,
only the variables needed in that model have to be nonmissing. Predicted values are
computed for all observations with no missing explanatory variable values. If the
censoring variable is missing, the CENSORED= variable in the OUT= SAS data set
is also missing.

Main Effects
Unlike the GLM procedure, only main effect terms are allowed in the model specification. For numeric variables, this is a linear term equal to the value of the variable unless the variable appears in the CLASS statement. For variables listed in the
CLASS statement, PROC LIFEREG creates indicator variables (variables taking the
values zero or one) for every level of the variable except the last level. If there is
no intercept term, the first class variable has indicator variables created for all levels
including the last level. The levels are ordered according to the ORDER= option.
Estimates of a main effect depend upon other effects in the model and, therefore, are
adjusted for the presence of other effects in the model.
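The indicator coding described above can be illustrated with a small sketch. This is not PROC LIFEREG code: the function name `class_indicators` and the use of sorted order for the levels (standing in for whatever the ORDER= option selects) are assumptions made only for illustration.

```python
def class_indicators(values, include_last=False):
    """Build 0/1 indicator columns for a CLASS-style variable.

    Levels are taken in sorted order (an assumption standing in for ORDER=).
    The last level is dropped unless include_last is True, which mimics the
    no-intercept case for the first class variable.
    """
    levels = sorted(set(values))
    kept = levels if include_last else levels[:-1]
    rows = [[1 if v == level else 0 for level in kept] for v in values]
    return kept, rows


kept, rows = class_indicators(["b", "a", "c", "a"])
print(kept)  # ['a', 'b']
print(rows)  # [[0, 1], [1, 0], [0, 0], [1, 0]]
```

With an intercept in the model, the dropped last level acts as the reference level: an observation at that level has all of its indicators equal to zero.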

Computational Method
By default, the LIFEREG Procedure computes initial values for the parameters using
ordinary least squares (OLS) ignoring censoring. This might not be the best set of
starting values for a given set of data. For example, if there are extreme values in your
data, the OLS fit may be excessively influenced by the extreme observations, causing
an overflow or convergence problems. See Example 36.3 for one way to deal with
convergence problems.
You can specify the INITIAL= option in the MODEL statement to override these
starting values. You can also specify the INTERCEPT=, SCALE=, and SHAPE1= options
to set initial values of the intercept, scale, and shape parameters.
The rank of the design matrix X is estimated before the model is fit. Columns of X
that are judged linearly dependent on other columns have the corresponding parameters set to zero. The test for linear dependence is controlled by the SINGULAR=
option in the MODEL statement. Variables are included in the model in the order in
which they are listed in the MODEL statement with the nonclass variables included
in the model before any class variables.
The log-likelihood function is maximized by means of a ridge-stabilized Newton-Raphson
algorithm. The maximized value of the log-likelihood can take positive or
negative values, depending on the specified model and the values of the maximum
likelihood estimates of the model parameters.
A composite chi-square test statistic is computed for each class variable, testing
whether there is any effect from any of the levels of the variable. This statistic is
computed as a quadratic form in the appropriate parameter estimates using the corresponding submatrix of the asymptotic covariance matrix estimate. The asymptotic
covariance matrix is computed as the inverse of the observed information matrix.
Note that if the NOINT option is specified and class variables are used, the first class
variable contains a contribution from an intercept term.

Model Specifications
Suppose there are n observations from the model $y = X\beta + \sigma\epsilon$, where X is an
$n \times k$ matrix of covariate values (including the intercept), y is a vector of responses,
and $\epsilon$ is a vector of errors with survival distribution function S, cumulative
distribution function F, and probability density function f. That is,
$S(t) = \Pr(\epsilon_i > t)$, $F(t) = \Pr(\epsilon_i \le t)$, and $f(t) = dF(t)/dt$, where
$\epsilon_i$ is a component of the error vector. Then, if all the responses are observed,
the log likelihood, L, can be written as

$$L = \sum \log\left(\frac{f(w_i)}{\sigma}\right)$$

where $w_i = \frac{1}{\sigma}(y_i - x_i'\beta)$.

If some of the responses are left, right, or interval censored, the log likelihood can be
written as

$$L = \sum \log\left(\frac{f(w_i)}{\sigma}\right) + \sum \log\left(S(w_i)\right) + \sum \log\left(F(w_i)\right) + \sum \log\left(F(w_i) - F(v_i)\right)$$

with the first sum over uncensored observations, the second sum over right-censored
observations, the third sum over left-censored observations, the last sum over
interval-censored observations, and

$$v_i = \frac{1}{\sigma}(z_i - x_i'\beta)$$

where $z_i$ is the lower end of a censoring interval.
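The four kinds of contributions to the censored log likelihood can be sketched as follows. This is only an illustration under stated assumptions: it uses the standard extreme value baseline (the one used for Weibull fits), the names `log_likelihood`, `sev_logpdf`, and `sev_sf` are hypothetical, and the responses are assumed to be already on the log scale. The term for an uncensored observation includes the factor $1/\sigma$ because the density of $y_i$ is $f(w_i)/\sigma$.

```python
import math


def sev_logpdf(w):
    """log f(w) for the standard extreme value baseline, f(w) = exp(w) exp(-exp(w))."""
    return w - math.exp(w)


def sev_sf(w):
    """Survival function S(w) = exp(-exp(w)) of the same baseline."""
    return math.exp(-math.exp(w))


def log_likelihood(obs, beta, sigma):
    """Sum the log-likelihood contributions.

    obs is a list of (kind, y, z, x): kind is "exact", "right", "left", or
    "interval"; y is the (upper) log response; z is the lower end of the
    interval (ignored otherwise); x is the covariate list, intercept included.
    """
    total = 0.0
    for kind, y, z, x in obs:
        xb = sum(b * xi for b, xi in zip(beta, x))
        w = (y - xb) / sigma
        if kind == "exact":
            total += sev_logpdf(w) - math.log(sigma)   # log(f(w)/sigma)
        elif kind == "right":
            total += math.log(sev_sf(w))               # log S(w)
        elif kind == "left":
            total += math.log(1.0 - sev_sf(w))         # log F(w)
        elif kind == "interval":
            v = (z - xb) / sigma
            total += math.log((1.0 - sev_sf(w)) - (1.0 - sev_sf(v)))  # log(F(w) - F(v))
        else:
            raise ValueError(f"unknown observation kind: {kind}")
    return total


data = [("exact", 0.0, None, [1.0]), ("right", 0.0, None, [1.0])]
print(log_likelihood(data, beta=[0.0], sigma=1.0))  # -2.0
```

A maximizer such as the Newton-Raphson algorithm mentioned earlier would then search over `beta` and `sigma` for the values that make this sum largest.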
If the response is specified in the binomial format, events/trials, then the
log-likelihood function is

$$L = \sum \left[ r_i \log(P_i) + (n_i - r_i) \log(1 - P_i) \right]$$

where $r_i$ is the number of events and $n_i$ is the number of trials for the ith observation.
In this case, $P_i = 1 - F(-x_i'\beta)$. For the symmetric distributions, logistic and normal,
this is the same as $F(x_i'\beta)$. Additional information on censored and limited dependent
variable models can be found in Kalbfleisch and Prentice (1980) and Maddala
(1983).
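As a rough numerical sketch of the binomial log likelihood, the following uses the logistic distribution for F. The function names and the (events, trials, covariates) row layout are assumptions chosen for the illustration; they are not part of PROC LIFEREG.

```python
import math


def logistic_cdf(t):
    """Cumulative distribution function of the standard logistic distribution."""
    return 1.0 / (1.0 + math.exp(-t))


def binomial_loglik(rows, beta, cdf=logistic_cdf):
    """Sum r*log(P) + (n - r)*log(1 - P) with P = 1 - F(-x'beta).

    rows is a list of (events, trials, x), where x includes the intercept.
    """
    total = 0.0
    for events, trials, x in rows:
        xb = sum(b * xi for b, xi in zip(beta, x))
        p = 1.0 - cdf(-xb)
        total += events * math.log(p) + (trials - events) * math.log(1.0 - p)
    return total


# One event in two trials with x'beta = 0 gives P = 0.5 and L = 2*log(0.5).
print(binomial_loglik([(1, 2, [1.0])], beta=[0.0]))
```

Because the logistic distribution is symmetric about zero, `1 - logistic_cdf(-t)` and `logistic_cdf(t)` agree, which is the simplification noted in the text.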
The estimated covariance matrix of the parameter estimates is computed as the negative inverse of I, which is the information matrix of second derivatives of L with
respect to the parameters evaluated at the final parameter estimates. If I is not positive definite, a positive definite submatrix of I is inverted, and the remaining rows and
columns of the inverse are set to zero. If some of the parameters, such as the scale
and intercept, are restricted, the corresponding elements of the estimated covariance
matrix are set to zero. The standard error estimates for the parameter estimates are
taken as the square roots of the corresponding diagonal elements.
For restrictions placed on the intercept, scale, and shape parameters,
one-degree-of-freedom Lagrange multiplier test statistics are computed. These statistics
are computed as

$$\chi^2 = \frac{g^2}{V}$$

where g is the derivative of the log likelihood with respect to the restricted parameter
at the restricted maximum and

$$V = I_{11} - I_{12} I_{22}^{-1} I_{21}$$

where the 1 subscripts refer to the restricted parameter and the 2 subscripts refer
to the unrestricted parameters. The information matrix is evaluated at the restricted
maximum. These statistics are asymptotically distributed as chi-squares with one degree
of freedom under the null hypothesis that the restrictions are valid, provided that
some regularity conditions are satisfied. See Rao (1973, p. 418) for a more complete
discussion. It is possible for these statistics to be missing if the observed information
matrix is not positive definite. Higher degree-of-freedom tests for multiple restrictions
are not currently computed.
For example, when the scale parameter is held fixed at one, as in the exponential model,
a Lagrange multiplier test statistic is computed to test this constraint. Notice that this
test statistic is comparable to the Wald test statistic for testing that the scale is one.
The Wald statistic is the result of squaring the difference of the estimate of the scale
parameter from one and dividing this by the square of its estimated standard error.
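The Wald statistic described above is simple enough to compute by hand. The sketch below uses the scale estimate (0.36128) and standard error (0.07950) reported for the Weibull fit in Output 36.1.2; the function names are hypothetical, and the p-value uses the identity that a one-degree-of-freedom chi-square tail probability equals erfc(sqrt(x/2)).

```python
import math


def wald_scale_one(scale_estimate, std_error):
    """Wald chi-square for H0: scale = 1."""
    return ((scale_estimate - 1.0) / std_error) ** 2


def chi2_sf_1df(x):
    """Upper tail probability of a chi-square variate with one degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


stat = wald_scale_one(0.36128, 0.07950)
print(round(stat, 2))      # about 64.55
print(chi2_sf_1df(stat))   # far below 0.05
```

A statistic this large suggests that an exponential model (scale fixed at one) would fit these data poorly compared with the Weibull model.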

Supported Distributions
For each distribution, the baseline survival distribution function (S) and the probability
density function (f) are listed for the additive random disturbance. These distributions
apply when the log of the response is modeled (this is the default analysis). The
corresponding survival distribution function (G) and its density function (g) are given
for the untransformed baseline distribution. For example, for the WEIBULL distribution,
S(w) and f(w) are the baseline survival distribution function and the probability density
function for the extreme value distribution (the log of the response),
while G(t) and g(t) are the survival distribution function and probability density
function of a Weibull distribution (using the untransformed response).
The chosen baseline functions define the meaning of the intercept, scale, and shape
parameters. Only the gamma distribution has a free shape parameter in the following
parameterizations. Notice that some of the distributions do not have mean zero and
that $\sigma$ is not, in general, the standard deviation of the baseline distribution.
For the Weibull distribution, the accelerated failure time model is also a
proportional-hazards model. However, the parameterization for the covariates differs by
a multiple of the scale parameter from the
parameterization commonly used for the proportional hazards model.
The distributions supported in the LIFEREG procedure follow. In the output,
$\mu$ = Intercept and $\sigma$ = Scale.

Exponential

$$S(w) = \exp(-\exp(w - \mu))$$
$$f(w) = \exp(w - \mu)\exp(-\exp(w - \mu))$$
$$G(t) = \exp(-\alpha t)$$
$$g(t) = \alpha\exp(-\alpha t)$$

where $\exp(-\mu) = \alpha$.

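Generalized Gamma
(with $\mu = 0$, $\sigma = 1$)

$$S(w) = \begin{cases} \dfrac{\Gamma\left(\delta^{-2},\, \delta^{-2}\exp(\delta w)\right)}{\Gamma(\delta^{-2})} & \text{if } \delta > 0 \\[2ex] 1 - \dfrac{\Gamma\left(\delta^{-2},\, \delta^{-2}\exp(\delta w)\right)}{\Gamma(\delta^{-2})} & \text{if } \delta < 0 \end{cases}$$

$$f(w) = \frac{|\delta|}{\Gamma(\delta^{-2})} \left(\delta^{-2}\exp(\delta w)\right)^{\delta^{-2}} \exp\left(-\exp(\delta w)\,\delta^{-2}\right)$$

$$G(t) = \begin{cases} \dfrac{\Gamma\left(\delta^{-2},\, \delta^{-2} t^{\delta}\right)}{\Gamma(\delta^{-2})} & \text{if } \delta > 0 \\[2ex] 1 - \dfrac{\Gamma\left(\delta^{-2},\, \delta^{-2} t^{\delta}\right)}{\Gamma(\delta^{-2})} & \text{if } \delta < 0 \end{cases}$$

$$g(t) = \frac{|\delta|}{t\,\Gamma(\delta^{-2})} \left(\delta^{-2} t^{\delta}\right)^{\delta^{-2}} \exp\left(-t^{\delta}\,\delta^{-2}\right)$$

where $\Gamma(a)$ denotes the complete gamma function, $\Gamma(a, z)$ denotes the incomplete
gamma function, and $\delta$ is a free shape parameter. The $\delta$ parameter is referred
to as Shape by PROC LIFEREG. Refer to Lawless, 1982, p.240 and Klein and
Moeschberger, 1997, p.386 for a description of the generalized gamma distribution.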

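Loglogistic

$$S(w) = \left(1 + \exp\left(\frac{w - \mu}{\sigma}\right)\right)^{-1}$$
$$f(w) = \frac{\exp\left(\frac{w - \mu}{\sigma}\right)}{\sigma\left(1 + \exp\left(\frac{w - \mu}{\sigma}\right)\right)^{2}}$$
$$G(t) = \frac{1}{1 + \alpha t^{\gamma}}$$
$$g(t) = \frac{\alpha\gamma t^{\gamma - 1}}{\left(1 + \alpha t^{\gamma}\right)^{2}}$$

where $\gamma = 1/\sigma$ and $\alpha = \exp(-\mu/\sigma)$.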

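Lognormal

$$S(w) = 1 - \Phi\left(\frac{w - \mu}{\sigma}\right)$$
$$f(w) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{1}{2}\left(\frac{w - \mu}{\sigma}\right)^{2}\right)$$
$$G(t) = 1 - \Phi\left(\frac{\log(t) - \mu}{\sigma}\right)$$
$$g(t) = \frac{1}{\sqrt{2\pi}\,\sigma t} \exp\left(-\frac{1}{2}\left(\frac{\log(t) - \mu}{\sigma}\right)^{2}\right)$$

where $\Phi$ is the cumulative distribution function for the normal distribution.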

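Weibull

$$S(w) = \exp\left(-\exp\left(\frac{w - \mu}{\sigma}\right)\right)$$
$$f(w) = \frac{1}{\sigma}\exp\left(\frac{w - \mu}{\sigma}\right) \exp\left(-\exp\left(\frac{w - \mu}{\sigma}\right)\right)$$
$$G(t) = \exp\left(-\alpha t^{\gamma}\right)$$
$$g(t) = \gamma\alpha t^{\gamma - 1} \exp\left(-\alpha t^{\gamma}\right)$$

where $\sigma = 1/\gamma$ and $\alpha = \exp(-\mu/\sigma)$.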

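If your parameterization is different from the ones shown here, you can still use the
procedure to fit your model. For example, a common parameterization for the Weibull
distribution is

$$g(t; \tau, \beta) = \frac{\beta}{\tau}\left(\frac{t}{\tau}\right)^{\beta - 1} \exp\left(-\left(\frac{t}{\tau}\right)^{\beta}\right)$$
$$G(t; \tau, \beta) = \exp\left(-\left(\frac{t}{\tau}\right)^{\beta}\right)$$

so that $\tau = \exp(\mu)$ and $\beta = 1/\sigma$.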
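The conversion between parameterizations can be checked numerically. In this sketch, `mu` and `sigma` stand for the Intercept and Scale reported by the procedure, `alpha` and `gamma` are the quantities in the survival function exp(-alpha * t**gamma), and `tau` and `shape` are the scale and shape of the common form exp(-(t/tau)**shape). All function and variable names are hypothetical and chosen only for this illustration.

```python
import math


def weibull_parameters(mu, sigma):
    """Convert extreme value location/scale into two Weibull parameterizations."""
    gamma = 1.0 / sigma              # exponent on t
    alpha = math.exp(-mu / sigma)    # multiplier of t**gamma
    tau = math.exp(mu)               # scale of the common parameterization
    shape = 1.0 / sigma              # shape of the common parameterization
    return {"alpha": alpha, "gamma": gamma, "tau": tau, "shape": shape}


def survival_alpha_gamma(t, alpha, gamma):
    return math.exp(-alpha * t ** gamma)


def survival_tau_shape(t, tau, shape):
    return math.exp(-((t / tau) ** shape))


p = weibull_parameters(mu=2.0, sigma=0.5)
# Both forms describe the same distribution, so the survival values agree.
print(survival_alpha_gamma(5.0, p["alpha"], p["gamma"]))
print(survival_tau_shape(5.0, p["tau"], p["shape"]))
```

When covariates are present, `mu` would be replaced by the linear predictor for the observation of interest.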

Again note that the expected value of the baseline log response is, in general, not
zero and that the distributions are not symmetric in all cases. Thus, for a given set of
covariates, x, the expected value of the log response is not always $x'\beta$.
Some relations among the distributions are as follows:

- The gamma with Shape=1 is a Weibull distribution.
- The gamma with Shape=0 is a lognormal distribution.
- The Weibull with Scale=1 is an exponential distribution.

Predicted Values
For a given set of covariates, x (including the intercept term), the pth quantile of the
log response, $y_p$, is given by

$$y_p = x'\beta + \sigma w_p$$

where $w_p$ is the pth quantile of the baseline distribution. The estimated quantile is
computed by replacing the unknown parameters with their estimates, including any
shape parameters on which the baseline distribution might depend. The estimated
quantile of the original response is obtained by taking the exponential of the estimated
log quantile unless the NOLOG option is specified in the preceding MODEL
statement.
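The quantile formula can be reproduced outside the procedure. The sketch below uses the lognormal estimates reported in Output 36.1.3 (Intercept -10.47056, z coefficient 8.32208, Scale 0.60403) and the covariate value for temp=130 from Example 36.1. The function name is hypothetical, and the normal baseline quantile comes from Python's standard library rather than from SAS.

```python
import math
from statistics import NormalDist


def lognormal_quantile(xbeta, scale, p, nolog=False):
    """Estimated p-th quantile: x'b + scale * w_p, exponentiated unless nolog."""
    w_p = NormalDist().inv_cdf(p)      # p-th quantile of the normal baseline
    y_p = xbeta + scale * w_p
    return y_p if nolog else math.exp(y_p)


z = 1000 / (273.2 + 130)               # covariate used in Example 36.1
xbeta = -10.47056 + 8.32208 * z        # linear predictor from Output 36.1.3
for p in (0.1, 0.5, 0.9):
    print(p, lognormal_quantile(xbeta, 0.60403, p))
```

Because the printed estimates are rounded, the results agree with the PREDTIME values in Output 36.1.5 (about 12033, 26096, and 56592) only to roughly four significant digits.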
The standard errors of the quantile estimates are computed using the estimated covariance
matrix of the parameter estimates and a Taylor series expansion of the quantile
estimate. The standard error is computed as

$$\mathrm{STD}^2 = z'Vz$$

where V is the estimated covariance matrix of the parameter vector
$(\beta', \sigma, \delta')'$, and z is the vector

$$z = \begin{bmatrix} x \\ \hat{w}_p \\ \hat{\sigma}\,\dfrac{\partial w_p}{\partial \delta} \end{bmatrix}$$

where $\delta$ is the vector of the shape parameters. Unless the NOLOG option is specified,
this standard error estimate is converted into a standard error estimate for $\exp(y_p)$
as $\exp(\hat{y}_p)\,\mathrm{STD}$. It may be more desirable to compute confidence limits for the log
response and convert them back to the original response variable than to use the
standard error estimates for $\exp(y_p)$ directly. See Example 36.1 for a 90% confidence
interval of the response constructed by exponentiating a confidence interval for the
log response.
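The back-transformation of confidence limits can be sketched as follows, mirroring the DATA step arithmetic of Example 36.1. The function name is hypothetical; the inputs are the first PREDTIME and STD values from Output 36.1.5, and 1.64 is the normal critical value that the example uses for 90% limits.

```python
import math


def log_scale_limits(pred, std, zcrit=1.64):
    """Confidence limits built on the log scale and exponentiated.

    pred is the predicted quantile on the original scale and std is its
    standard error; std / pred approximates the standard error of log(pred).
    """
    ltime = math.log(pred)
    stde = std / pred
    return math.exp(ltime - zcrit * stde), math.exp(ltime + zcrit * stde)


lower, upper = log_scale_limits(12033.19, 5482.34)
print(lower, upper)  # close to 5700.09 and 25402.68 in Output 36.1.5
```

Limits built this way are always positive and are asymmetric about the estimate, which is usually more sensible for a failure time than the symmetric interval `pred ± zcrit * std`.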
The variable, CDF, is computed as

$$\mathrm{CDF}_i = F(w_i)$$

where the residual

$$w_i = \frac{y_i - x_i'b}{\hat{\sigma}}$$

and F is the baseline cumulative distribution function.

OUTEST= Data Set


The OUTEST= data set contains parameter estimates and the log likelihood for the
specified models. A set of observations is created for each MODEL statement specified. You can specify a label in the MODEL statement to distinguish between the
estimates for different MODEL statements. If the COVOUT option is specified, the
OUTEST= data set also contains the estimated covariance matrix of the parameter
estimates. Note that, if the LIFEREG procedure does not converge, the parameter
estimates are set to missing in the OUTEST= data set.
The OUTEST= data set is not created if there are any CLASS variables in any models.
If created, this data set contains all variables specified in the MODEL statement and
the BY statement. One observation consists of parameter values for the model with
the dependent variable having the value -1. If the COVOUT option is specified, there
are additional observations containing the rows of the estimated covariance matrix.
For these observations, the dependent variable contains the parameter estimate for the
corresponding row variable. The following variables are also added to the data set:

_MODEL_

a character variable of length 8 containing the label of the MODEL statement, if
present. Otherwise, the variable's value is blank.

_NAME_

a character variable of length 8 containing the name of the dependent variable for the
parameter estimates observations or the name of the row for the covariance matrix
estimates

_TYPE_

a character variable of length 8 containing the type of the observation, either PARMS
for parameter estimates or COV for covariance estimates

_DIST_

a character variable of length 8 containing the name of the distribution modeled

_LNLIKE_

a numeric variable containing the last computed value of the log likelihood

INTERCEPT

a numeric variable containing the intercept parameter estimates and covariances

_SCALE_

a numeric variable containing the scale parameter estimates and covariances

_SHAPE1_

a numeric variable containing the first shape parameter estimates and covariances if
the specified distribution has additional shape parameters

Any BY variables specified are also added to the OUTEST= data set.

Computational Resources
Let p be the number of parameters estimated in the model. The minimum working
space (in bytes) needed is

$$16p^2 + 100p$$

However, if sufficient space is available, the input data set is also kept in memory;
otherwise, the input data set is reread for each evaluation of the likelihood function
and its derivatives, with the resulting execution time of the procedure substantially
increased.
Let n be the number of observations used in the model estimation. Each evaluation
of the likelihood function and its first and second derivatives requires $O(np^2)$
multiplications and additions, n individual function evaluations for the log density or log
distribution function, and n evaluations of the first and second derivatives of the
function. The calculation of each updating step from the gradient and Hessian requires
$O(p^3)$ multiplications and additions. The $O(v)$ notation means that, for large values
of the argument, v, $O(v)$ is approximately a constant times v.

Displayed Output
For each model, PROC LIFEREG displays

- the name of the Data Set
- the name of the Dependent Variable
- the name of the Censoring Variable
- the Censoring Value(s) that indicate a censored observation
- the number of Noncensored and Censored Values
- the final estimate of the maximized log likelihood
- the iteration history and the Last Evaluation of the Gradient and Hessian if the
  ITPRINT option is specified (not shown)

For each explanatory variable in the model, the LIFEREG procedure displays

- the name of the Variable
- the degrees of freedom (DF) associated with the variable in the model
- the Estimate of the parameter
- the standard error (Std Err) estimate from the observed information matrix
- an approximate chi-square statistic for testing that the parameter is zero (the
  class variables also have an overall chi-square test statistic computed that
  precedes the individual level parameters)
- the probability of a larger chi-square value (Pr>Chi)
- the Label of the variable or, if the variable is a class level, the Value of the class
  variable

If there are constrained parameters in the model, such as the scale or intercept, then
PROC LIFEREG displays a Lagrange multiplier test for the constraint.

ODS Table Names


PROC LIFEREG assigns a name to each table it creates. You can use these names
to reference the table when using the Output Delivery System (ODS) to select tables
and create output data sets. These names are listed in the following table. For more
information on ODS, see Chapter 15, Using the Output Delivery System.
Table 36.1.  ODS Tables Produced in PROC LIFEREG

ODS Table Name       Description                             Statement   Option
ClassLevels          Class variable levels                   CLASS       default
ConvergenceStatus    Convergence status                      MODEL       default
CorrB                Parameter estimate correlation matrix   MODEL       CORRB
CovB                 Parameter estimate covariance matrix    MODEL       COVB
IterHistory          Iteration history                       MODEL       ITPRINT
LagrangeStatistics   Lagrange statistics                     MODEL       NOINT | NOSCALE
LastGrad             Last Evaluation of the Gradient         MODEL       ITPRINT
LastHess             Last Evaluation of the Hessian          MODEL       ITPRINT
ParameterEstimates   Parameter estimates                     MODEL       default
ModelInfo            Model information                       MODEL       default

 Depends on data.

Examples
Example 36.1. Motorette Failure
This example fits a Weibull model and a lognormal model to the example given in
Kalbfleisch and Prentice (1980, p. 5). An output data set called models is specified
to contain the parameter estimates. By default, the natural log of the variable time
is used by the procedure as the response. After this log transformation, the Weibull
model is fit using the extreme value baseline distribution, and the lognormal is fit
using the normal baseline distribution.
Since the extreme value and normal distributions do not contain any shape parameters,
the variable _SHAPE1_ is missing in the models data set. An additional output
data set, out, is requested that contains the predicted quantiles and their standard
errors for values of the covariate corresponding to temp=130 and temp=150. This is
done with the control variable, which is set to 1 for only two observations.
Using the standard error estimates obtained from the output data set, approximate
90% confidence limits for the predicted quantities are then created in a subsequent
DATA step for the log response. The logs of the predicted values are obtained because
the values of the P= variable in the OUT= data set are in the same units as the original
response variable, time. The standard errors of the quantiles of the log(time) are
approximated (using a Taylor series approximation) by the standard deviation of time
divided by the mean value of time. These confidence limits are then converted back
to the original scale by the exponential function. The following statements produce
Output 36.1.1 through Output 36.1.5.

title 'Motorette Failures With Operating Temperature as a Covariate';

data motors;
   input time censor temp @@;
   /* On the first pass, add two control observations (temp=130 and  */
   /* temp=150) with missing time; they are used only for prediction. */
   if _N_=1 then
      do;
         temp=130;
         time=.;
         control=1;
         z=1000/(273.2+temp);
         output;
         temp=150;
         time=.;
         control=1;
         z=1000/(273.2+temp);
         output;
      end;
   if temp>150;          /* keep only the data above 150 degrees */
   control=0;
   z=1000/(273.2+temp);
   output;
   datalines;
8064 0 150 8064 0 150 8064 0 150 8064 0 150 8064 0 150
8064 0 150 8064 0 150 8064 0 150 8064 0 150 8064 0 150
1764 1 170 2772 1 170 3444 1 170 3542 1 170 3780 1 170
4860 1 170 5196 1 170 5448 0 170 5448 0 170 5448 0 170
408 1 190 408 1 190 1344 1 190 1344 1 190 1440 1 190
1680 0 190 1680 0 190 1680 0 190 1680 0 190 1680 0 190
408 1 220 408 1 220 504 1 220 504 1 220 504 1 220
528 0 220 528 0 220 528 0 220 528 0 220 528 0 220
;

proc print data=motors;
run;

/* Model a is Weibull (the default); model b is lognormal.  */
/* The OUTPUT statement applies to the preceding model, b.  */
proc lifereg data=motors outest=models covout;
   a: model time*censor(0)=z;
   b: model time*censor(0)=z / dist=lnormal;
      output out=out quantiles=.1 .5 .9 std=std p=predtime
             control=control;
run;

proc print data=models;
   id _model_;
   title 'fitted models';
run;

/* Build 90% limits on the log scale, then exponentiate. */
data out1;
   set out;
   ltime=log(predtime);
   stde=std/predtime;
   upper=exp(ltime+1.64*stde);
   lower=exp(ltime-1.64*stde);
run;

proc print;
   id temp;
   title 'quantile estimates and confidence limits';
run;

Output 36.1.1.  Motorette Failure Data

Motorette Failures With Operating Temperature as a Covariate

Obs    time    censor    temp    control       z
  1       .      0       130        1       2.48016
  2       .      0       150        1       2.36295
  3    1764      1       170        0       2.25632
  4    2772      1       170        0       2.25632
  5    3444      1       170        0       2.25632
  6    3542      1       170        0       2.25632
  7    3780      1       170        0       2.25632
  8    4860      1       170        0       2.25632
  9    5196      1       170        0       2.25632
 10    5448      0       170        0       2.25632
 11    5448      0       170        0       2.25632
 12    5448      0       170        0       2.25632
 13     408      1       190        0       2.15889
 14     408      1       190        0       2.15889
 15    1344      1       190        0       2.15889
 16    1344      1       190        0       2.15889
 17    1440      1       190        0       2.15889
 18    1680      0       190        0       2.15889
 19    1680      0       190        0       2.15889
 20    1680      0       190        0       2.15889
 21    1680      0       190        0       2.15889
 22    1680      0       190        0       2.15889
 23     408      1       220        0       2.02758
 24     408      1       220        0       2.02758
 25     504      1       220        0       2.02758
 26     504      1       220        0       2.02758
 27     504      1       220        0       2.02758
 28     528      0       220        0       2.02758
 29     528      0       220        0       2.02758
 30     528      0       220        0       2.02758
 31     528      0       220        0       2.02758
 32     528      0       220        0       2.02758

Output 36.1.2.  Motorette Failure: Model A


The LIFEREG Procedure

Model Information

Data Set                     WORK.MOTORS
Dependent Variable           Log(time)
Censoring Variable           censor
Censoring Value(s)           0
Number of Observations       30
Noncensored Values           17
Right Censored Values        13
Left Censored Values         0
Interval Censored Values     0
Missing Values               2
Name of Distribution         WEIBULL
Log Likelihood               -22.95148315

Analysis of Parameter Estimates

                             Standard
Variable    DF    Estimate      Error   Chi-Square   Pr > ChiSq   Label
Intercept    1   -11.89122    1.96551      36.6019       <.0001   Intercept
z            1     9.03834    0.90599      99.5239       <.0001
Scale        1     0.36128    0.07950                             Extreme value scale

Output 36.1.3.  Motorette Failure: Model B


The LIFEREG Procedure

Model Information

Data Set                     WORK.MOTORS
Dependent Variable           Log(time)
Censoring Variable           censor
Censoring Value(s)           0
Number of Observations       30
Noncensored Values           17
Right Censored Values        13
Left Censored Values         0
Interval Censored Values     0
Missing Values               2
Name of Distribution         LNORMAL
Log Likelihood               -24.47381031

Analysis of Parameter Estimates

                             Standard
Variable    DF    Estimate      Error   Chi-Square   Pr > ChiSq   Label
Intercept    1   -10.47056    2.77192      14.2685       0.0002   Intercept
z            1     8.32208    1.28412      42.0001       <.0001
Scale        1     0.60403    0.11073                             Normal scale

Output 36.1.4.  Motorette Failure: Fitted Models


fitted models

_MODEL_   _NAME_      _TYPE_   _DIST_     _STATUS_       _LNLIKE_
A         time        PARMS    WEIBULL    0 Converged    -22.9515
A         Intercept   COV      WEIBULL    0 Converged    -22.9515
A         z           COV      WEIBULL    0 Converged    -22.9515
A         Scale       COV      WEIBULL    0 Converged    -22.9515
B         time        PARMS    LNORMAL    0 Converged    -24.4738
B         Intercept   COV      LNORMAL    0 Converged    -24.4738
B         z           COV      LNORMAL    0 Converged    -24.4738
B         Scale       COV      LNORMAL    0 Converged    -24.4738

_MODEL_   _NAME_          time    Intercept          z     _SCALE_   _SHAPE1_
A         time         -1.0000     -11.8912    9.03834     0.36128       .
A         Intercept   -11.8912       3.8632   -1.77878     0.03448       .
A         z             9.0383      -1.7788    0.82082    -0.01488       .
A         Scale         0.3613       0.0345   -0.01488     0.00632       .
B         time         -1.0000     -10.4706    8.32208     0.60403       .
B         Intercept   -10.4706       7.6835   -3.55566     0.03267       .
B         z             8.3221      -3.5557    1.64897    -0.01285       .
B         Scale         0.6040       0.0327   -0.01285     0.01226       .

Output 36.1.5.  Motorette Failure: Quantile Estimates and Confidence Limits


quantile estimates and confidence limits

temp  time  censor  control     z      _PROB_  PREDTIME       STD     ltime     stde       upper      lower
 130    .     0        1     2.48016    0.1    12033.19   5482.34    9.3954  0.45560    25402.68    5700.09
 130    .     0        1     2.48016    0.5    26095.68  11359.45   10.1695  0.43530    53285.36   12779.95
 130    .     0        1     2.48016    0.9    56592.19  26036.90   10.9436  0.46008   120349.65   26611.42
 150    .     0        1     2.36295    0.1     4536.88   1443.07    8.4200  0.31808     7643.71    2692.83
 150    .     0        1     2.36295    0.5     9838.86   2901.15    9.1941  0.29487    15957.38    6066.36
 150    .     0        1     2.36295    0.9    21336.97   7172.34    9.9682  0.33615    37029.72   12294.62

Example 36.2. Computing Predicted Values for a Tobit Model


The LIFEREG Procedure can be used to perform a Tobit analysis. The Tobit model,
described by Tobin (1958), is a regression model for left censored data assuming a
normally distributed error term. The model parameters are estimated by maximum
likelihood. PROC LIFEREG provides estimates of the parameters of the distribution
of the uncensored data. Refer to Greene (1993) and Maddala (1983) for a more
complete discussion of censored normal data and related distributions. This example
shows how you can use PROC LIFEREG and the data step to compute two of the
three types of predicted values discussed there.
Consider a continuous random variable Y, and a constant C. If you were to sample
from the distribution of Y but discard values less than (greater than) C, the distribution
of the remaining observations would be truncated on the left (right). If you were to
sample from the distribution of Y and report values less than (greater than) C as C,
the distribution of the sample would be left (right) censored.
The probability density function of the truncated random variable $Y'$ is given by

$$f_{Y'}(y) = \frac{f_Y(y)}{\Pr(Y > C)} \quad \text{for } y > C$$

where $f_Y(y)$ is the probability density function of Y. PROC LIFEREG cannot compute
the proper likelihood function to estimate parameters or predicted values for a
truncated distribution.
Suppose the model being fit is specified as follows:

$$Y_i^* = x_i'\beta + \epsilon_i$$

where $\epsilon_i$ is a normal error term with zero mean and standard deviation $\sigma$.
Define the censored random variable $Y_i$ as

$$Y_i = \begin{cases} 0 & \text{if } Y_i^* \le 0 \\ Y_i^* & \text{if } Y_i^* > 0 \end{cases}$$

This is the Tobit model for left-censored normal data. $Y_i^*$ is sometimes called the
latent variable. PROC LIFEREG estimates parameters of the distribution of $Y_i^*$ by
maximum likelihood.
You can use the LIFEREG procedure to compute predicted values based on the mean
functions of the latent and observed variables. The mean of the latent variable $Y_i^*$
is $x_i'\beta$ and you can compute values of the mean for different settings of $x_i$ by
specifying XBETA=variable-name in an OUTPUT statement. Estimates of $x_i'\beta$ for each
observation will be written to the OUT= data set. Predicted values of the observed
variable $Y_i$ can be computed based on the mean

$$E(Y_i) = \Phi\left(\frac{x_i'\beta}{\sigma}\right)\left(x_i'\beta + \sigma\lambda_i\right)$$

where

$$\lambda_i = \frac{\phi(x_i'\beta/\sigma)}{\Phi(x_i'\beta/\sigma)}$$

$\phi$ and $\Phi$ represent the normal probability density and cumulative distribution functions.
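The mean of the observed (censored) variable can be sketched numerically as follows. The function name is hypothetical, and the normal density and distribution functions come from Python's standard library; `xbeta` plays the role of the XBETA= value and `sigma` the role of the estimated scale.

```python
from statistics import NormalDist


def tobit_observed_mean(xbeta, sigma):
    """E(Y) for a normal response left censored at zero.

    Implements Phi(u) * (xbeta + sigma * lam) with u = xbeta / sigma and
    lam = phi(u) / Phi(u).
    """
    std_normal = NormalDist()
    u = xbeta / sigma
    lam = std_normal.pdf(u) / std_normal.cdf(u)
    return std_normal.cdf(u) * (xbeta + sigma * lam)


print(tobit_observed_mean(0.0, 1.0))    # equals phi(0), about 0.3989
print(tobit_observed_mean(10.0, 1.0))   # almost no censoring, close to 10
```

For linear predictors far below zero, `Phi(u)` underflows to zero and the ratio `lam` cannot be formed; the algebraically equivalent form `Phi(u) * xbeta + sigma * phi(u)` avoids the division in that case.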

The following table shows a subset of the Mroz (1987) data set. In this data, Hours is
the number of hours the wife worked outside the household in a given year, Yrs Ed is
the years of education, and Yrs Exp is the years of work experience. A Tobit model
will be fit to the hours worked with years of education and experience as covariates.

Hours   Yrs_Ed   Yrs_Exp
    0        8         9
    0        8        12
    0        9        10
    0       10        15
    0       11         4
    0       11         6
 1000       12         1
 1960       12        29
    0       13         3
 2100       13        36
 3686       14        11
 1920       14        38
    0       15        14
 1728       16         3
 1568       16        19
 1316       17         7
    0       17        15

If the wife was not employed (worked 0 hours), her hours worked will be left censored
at zero. In order to accommodate left censoring in PROC LIFEREG, you need two
variables to indicate censoring status of observations. You can think of these variables
as lower and upper endpoints of interval censoring. If there is no censoring, set both
variables to the observed value of Hours. To indicate left censoring, set the lower
endpoint to missing and the upper endpoint to the censored value, zero in this case.
The following statements create a SAS data set with the variables Hours, Yrs_Ed,
and Yrs_Exp from the data above. A new variable, Lower, is created such that
Lower=. if Hours=0 and Lower=Hours if Hours>0.
data subset;
   input Hours Yrs_Ed Yrs_Exp @@;
   /* Lower is missing for left-censored records (Hours = 0) */
   if Hours eq 0 then Lower = .;
   else Lower = Hours;
   datalines;
0 8 9 0 8 12 0 9 10 0 10 15 0 11 4 0 11 6
1000 12 1 1960 12 29 0 13 3 2100 13 36
3686 14 11 1920 14 38 0 15 14 1728 16 3
1568 16 19 1316 17 7 0 17 15
;
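Before fitting the model, it can be worth confirming that the censoring indicator was built as intended. The following DATA step is a minimal sketch that counts the records with a missing Lower value (treated as left censored) and the records with an observed value. The counter names n_left and n_exact are hypothetical and are not part of the original example.

```sas
data _null_;
   set subset end=last;
   /* sum statements: counters start at 0 and are retained across rows */
   if missing(Lower) then n_left + 1;   /* left censored at zero */
   else n_exact + 1;                    /* observed exactly      */
   if last then put n_left= n_exact=;   /* write the counts to the log */
run;
```

The counts written to the log can be compared with the Left Censored Values and Noncensored Values entries (9 and 8) in the Model Information table of Output 36.2.1.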

The following statements fit a normal regression model to the left-censored Hours
data using Yrs_Ed and Yrs_Exp as covariates. You will need the estimated standard
deviation of the normal distribution to compute the predicted values of the censored
distribution from the formulas above. The data set OUTEST contains the standard
deviation estimate in a variable named _SCALE_. You also need estimates of $x_i'\beta$.
These are contained in the data set OUT as the variable Xbeta.
proc lifereg data=subset outest=OUTEST(keep=_scale_);
model (lower, hours) = yrs_ed yrs_exp / d=normal;
output out=OUT xbeta=Xbeta;
run;

Output 36.2.1 shows the results of the model fit. These tables show parameter estimates for the uncensored, or latent variable, distribution.
Output 36.2.1. Parameter Estimates from PROC LIFEREG

The LIFEREG Procedure

Model Information

Data Set                    WORK.SUBSET
Dependent Variable          Lower
Dependent Variable          Hours
Number of Observations      17
Noncensored Values          8
Right Censored Values       0
Left Censored Values        9
Interval Censored Values    0
Name of Distribution        NORMAL
Log Likelihood              -74.9369977

Analysis of Parameter Estimates

                              Standard
Variable    DF    Estimate       Error   Chi-Square   Pr > ChiSq   Label
Intercept    1     -5598.6      2850.2       3.8583       0.0495   Intercept
Yrs_Ed       1   373.14771   191.88717       3.7815       0.0518
Yrs_Exp      1    63.33711    38.36317       2.7258       0.0987
Scale        1      1582.9   442.67318                             Normal scale

The following statements combine the two data sets created by PROC LIFEREG
to compute predicted values for the censored distribution. The OUTEST= data set
contains the estimate of the standard deviation from the uncensored distribution, and
the OUT= data set contains estimates of $x_i'\beta$.
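data predict;
   drop lambda _scale_ _prob_;
   set out;
   /* read the single OUTEST row once; _SCALE_ is retained for all rows */
   if _n_ eq 1 then set outest;
   /* lambda = normal density over normal CDF at Xbeta/_scale_ */
   lambda = pdf('NORMAL', Xbeta/_scale_)
          / cdf('NORMAL', Xbeta/_scale_);
   /* mean of the censored (observed) variable */
   Predict = cdf('NORMAL', Xbeta/_scale_)
           * (Xbeta + _scale_*lambda);
   label Xbeta   = 'MEAN OF UNCENSORED VARIABLE'
         Predict = 'MEAN OF CENSORED VARIABLE';
run;

proc print data=predict noobs label;
   var hours lower yrs: xbeta predict;
run;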
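As a hand check of the censored-mean formula, the same computation can be applied to a single observation. The following DATA step is a sketch that plugs in the rounded values reported in this example: the linear predictor for the first observation from Output 36.2.2 and the Scale estimate from Output 36.2.1. The variable name z is introduced here only for illustration.

```sas
data _null_;
   xbeta = -2043.42;   /* mean of uncensored variable, first row of Output 36.2.2 */
   scale = 1582.9;     /* Scale estimate from Output 36.2.1 */
   z = xbeta / scale;  /* standardized linear predictor */
   lambda  = pdf('NORMAL', z) / cdf('NORMAL', z);
   predict = cdf('NORMAL', z) * (xbeta + scale*lambda);
   put z= lambda= predict=;   /* write the results to the log */
run;
```

The value of predict written to the log should be close to the 73.46 shown for the first observation in Output 36.2.2; small differences can arise because the inputs used here are rounded.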


Output 36.2.2 shows the original variables, the predicted means of the uncensored
distribution, and the predicted means of the censored distribution.
Output 36.2.2. Predicted Means from PROC LIFEREG

                                        MEAN OF       MEAN OF
                                     UNCENSORED      CENSORED
Hours   Lower   Yrs_Ed   Yrs_Exp       VARIABLE      VARIABLE
    0       .        8         9       -2043.42         73.46
    0       .        8        12       -1853.41         94.23
    0       .        9        10       -1606.94        128.10
    0       .       10        15        -917.10        276.04
    0       .       11         4       -1240.67        195.76
    0       .       11         6       -1113.99        224.72
 1000    1000       12         1       -1057.53        238.63
 1960    1960       12        29         715.91       1052.94
    0       .       13         3        -557.71        391.42
 2100    2100       13        36        1532.42       1672.50
 3686    3686       14        11         322.14        805.58
 1920    1920       14        38        2032.24       2106.81
    0       .       15        14         885.30       1170.39
 1728    1728       16         3         561.74        951.69
 1568    1568       16        19        1575.13       1708.24
 1316    1316       17         7        1188.23       1395.61
    0       .       17        15        1694.93       1809.97

Example 36.3. Overcoming Convergence Problems by Specifying Initial Values

This example illustrates the use of parameter initial value specification to help overcome convergence difficulties.
The following statements create a data set and request a Weibull regression model be
fit to the data.
data raw;
   input censor x c1 @@;
   datalines;
0 16 0.00     0 17 0.00     0 18 0.00
0 17 0.04     0 18 0.04     0 18 0.04
0 23 0.40     0 22 0.40     0 22 0.40
0 33 4.00     0 34 4.00     0 35 4.00
1 54 40.00    1 54 40.00    1 54 40.00
1 54 400.00   1 54 400.00   1 54 400.00
;
run;

proc print;
run;

title 'OLS (default) initial values';
proc lifereg data=raw;
   model x*censor(1) = c1 / distribution = weibull itprint;
run;

Output 36.3.1 shows the data set contents.


Output 36.3.1. Contents of the Data Set

Obs   censor    x        c1
  1        0   16      0.00
  2        0   17      0.00
  3        0   18      0.00
  4        0   17      0.04
  5        0   18      0.04
  6        0   18      0.04
  7        0   23      0.40
  8        0   22      0.40
  9        0   22      0.40
 10        0   33      4.00
 11        0   34      4.00
 12        0   35      4.00
 13        1   54     40.00
 14        1   54     40.00
 15        1   54     40.00
 16        1   54    400.00
 17        1   54    400.00
 18        1   54    400.00

Convergence was not attained in 50 iterations for this model, as the messages to the
log indicate:
WARNING: Convergence not attained in 50 iterations.
WARNING: The procedure is continuing but the validity of the model
fit is questionable.

The first line (iter=0) of the iteration history table, in Output 36.3.2, shows the default
initial ordinary least squares (OLS) estimates of the parameters.
Output 36.3.2. Initial Least Squares

OLS (default) initial values

Iter   Ridge      Loglike      Intercept             c1          Scale
   0           -22.891088   3.2324769714   0.0020664542   0.3995754195

The log logistic distribution is more robust to large values of the response than the
Weibull, so one approach to improving the convergence performance is to fit a log
logistic distribution, and if this converges, use the resulting parameter estimates as
initial values in a subsequent fit of a model with the Weibull distribution.
The following statements fit a log logistic distribution to the data.
proc lifereg data=raw;
model x*censor(1) = c1 / distribution = llogistic;
run;

The algorithm converges, and the maximum likelihood estimates for the log logistic
distribution are shown in Output 36.3.3.

Output 36.3.3. Estimates from the Log Logistic Distribution

The LIFEREG Procedure

Model Information

Data Set                    WORK.RAW
Dependent Variable          Log(x)
Censoring Variable          censor
Censoring Value(s)          1
Number of Observations      18
Noncensored Values          12
Right Censored Values       6
Left Censored Values        0
Interval Censored Values    0
Name of Distribution        LLOGISTC
Log Likelihood              12.093136846

Analysis of Parameter Estimates

                            Standard
Variable    DF   Estimate      Error   Chi-Square   Pr > ChiSq   Label
Intercept    1    2.89828    0.03179    8309.4488       <.0001   Intercept
c1           1    0.15921    0.01327     143.8537       <.0001
Scale        1    0.04979    0.01218                             Logistic scale

The following statements re-fit the Weibull model using the maximum likelihood
estimates from the log logistic fit as initial values.
proc lifereg data=raw outest=outest;
model x*censor(1) = c1 / itprint distribution = weibull
intercept=2.898 initial=0.16 scale=0.05;
output out=out xbeta=xbeta;
run;

Examination of the resulting output in Output 36.3.4 shows that the convergence
problem has been solved by specifying different initial values.
Output 36.3.4. Final Estimates from the Weibull Distribution

The LIFEREG Procedure

Model Information

Data Set                    WORK.RAW
Dependent Variable          Log(x)
Censoring Variable          censor
Censoring Value(s)          1
Number of Observations      18
Noncensored Values          12
Right Censored Values       6
Left Censored Values        0
Interval Censored Values    0
Name of Distribution        WEIBULL
Log Likelihood              11.232023272

Algorithm converged.

Analysis of Parameter Estimates

                            Standard
Variable    DF   Estimate      Error   Chi-Square   Pr > ChiSq   Label
Intercept    1    2.96986    0.03264    8278.8602       <.0001   Intercept
c1           1    0.14346    0.01652      75.4316       <.0001
Scale        1    0.08437    0.01887                             Extreme value scale


References
Allison, P.D. (1995), Survival Analysis Using the SAS System: A Practical Guide,
Cary, NC: SAS Institute.
Cox, D.R. (1972), "Regression Models and Life Tables (with discussion)," Journal
of the Royal Statistical Society, Series B, 34, 187-220.
Cox, D.R. and Oakes, D. (1984), Analysis of Survival Data, London: Chapman and
Hall.
Elandt-Johnson, R.C. and Johnson, N.L. (1980), Survival Models and Data Analysis,
New York: John Wiley & Sons, Inc.
Green, W.H. (1993), Econometric Analysis, 2nd Edition, New York: Cambridge University Press.
Gross, A.J. and Clark, V.A. (1975), Survival Distributions: Reliability Applications
in the Biomedical Sciences, New York: John Wiley & Sons, Inc.
Kalbfleisch, J.D. and Prentice, R.L. (1980), The Statistical Analysis of Failure Time
Data, New York: John Wiley & Sons, Inc.
Klein, J.P. and Moeschberger, M.L. (1997), Survival Analysis: Techniques for Censored and Truncated Data, Berlin: Springer.
Lawless, J.E. (1982), Statistical Models and Methods for Lifetime Data, New York:
John Wiley & Sons, Inc.
Lee, E.T. (1980), Statistical Methods for Survival Data Analysis, Belmont, CA: Lifetime Learning Publications.
Maddala, G.S. (1983), Limited-Dependent and Qualitative Variables in Econometrics, New York: Cambridge University Press.
Mroz, T.A. (1987), "The Sensitivity of an Empirical Model of Married Women's Work
to Economic and Statistical Assumptions," Econometrica, 55, 765-799.
Rao, C.R. (1973), Linear Statistical Inference and Its Applications, New York: John
Wiley & Sons, Inc.
Tobin, J. (1958), "Estimation of Relationships for Limited Dependent Variables,"
Econometrica, 26, 24-36.


The correct bibliographic citation for this manual is as follows: SAS Institute Inc.,
SAS/STAT User's Guide, Version 8, Cary, NC: SAS Institute Inc., 1999.

SAS/STAT User's Guide, Version 8

Copyright © 1999 by SAS Institute Inc., Cary, NC, USA.
ISBN 1-58025-494-2
All rights reserved. Produced in the United States of America. No part of this publication
may be reproduced, stored in a retrieval system, or transmitted, in any form or by any
means, electronic, mechanical, photocopying, or otherwise, without the prior written
permission of the publisher, SAS Institute Inc.
U.S. Government Restricted Rights Notice. Use, duplication, or disclosure of the
software and related documentation by the U.S. government is subject to the Agreement
with SAS Institute and the restrictions set forth in FAR 52.227-19 Commercial Computer
Software-Restricted Rights (June 1987).
SAS Institute Inc., SAS Campus Drive, Cary, North Carolina 27513.
1st printing, October 1999
SAS and all other SAS Institute Inc. product or service names are registered trademarks
or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA
registration.
Other brand and product names are registered trademarks or trademarks of their
respective companies.
The Institute is a private company devoted to the support and further development of its
software and related services.
