8 Software Reliability Models With Environmental Factors: Et Al. (2001a) - The Information About The Background of
$$w_{ij} = \frac{r_{ij}}{\sum_{i=1}^{n} r_{ij}} \qquad (8.1)$$
where n is the number of factors on the jth survey. Therefore
$$\sum_{i=1}^{n} w_{ij} = 1 \quad \text{for all } j.$$
Different respondents may give different original rankings, and some of them may
give higher scores to all factors. As a result, the sum of all the scores from
f1 to f32 ranges from 117 to 200 across surveys. By normalizing the original ranking
scores using equation (8.1), the final weight for the ith factor can be written as
$$w_i = \frac{\sum_{j=1}^{l} w_{ij}}{l} \qquad (8.2)$$
where l is the number of surveys used in this method.
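The two normalization steps in equations (8.1) and (8.2) can be sketched in a few lines of Python. This is only an illustration: the function name `relative_weights` and the survey scores are hypothetical and are not data from the study.

```python
def relative_weights(scores):
    """Return the final weight w_i of each factor.

    scores[j][i] is the raw score r_ij given to factor i on survey j.
    """
    n_surveys = len(scores)
    n_factors = len(scores[0])
    # Equation (8.1): normalize each survey so that its weights sum to 1.
    normalized = [[r / sum(survey) for r in survey] for survey in scores]
    # Equation (8.2): average the normalized weights over the l surveys.
    return [sum(w[i] for w in normalized) / n_surveys for i in range(n_factors)]


# Hypothetical scores: 3 surveys, 4 factors (not data from the study).
surveys = [
    [7, 5, 3, 1],
    [6, 6, 4, 4],
    [5, 4, 4, 3],
]
weights = relative_weights(surveys)
print([round(w, 4) for w in weights])
```

Because every survey is rescaled to sum to 1 before averaging, a respondent who scores all factors generously does not dominate the final weights.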
Analysis of Variance Method
The analysis of variance (ANOVA) model is a versatile statistical tool for
studying the relationship between a dependent variable and one or more
independent variables. The response in the model is called the dependent variable
because its value depends on other variables, and the explanatory variables are
called independent variables because their values are not influenced by anything
else in the model.
A factor in ANOVA is an independent variable to be studied in the model. A
level of a factor is a particular form or value of that factor. The primary purpose
of ANOVA is to determine the effects of factors on the response and to single out
the most significant factors.
One-way ANOVA Model
$$r_{ij} = \mu_0 + \alpha_i + \varepsilon_{ij}$$
where
$r_{ij}$            is the original or observed response value
$\mu_0$             is a constant or intercept term in the model, the mean of all the cells
$\alpha_i$          is the main effect for a factor, say A, at the ith level
$\varepsilon_{ij}$  is the random error term, which follows a normal distribution with mean 0
262 System Software Reliability
Using one-way ANOVA, we can perform the hypothesis tests to classify all
factors into different groups according to the significance of their impact on the
software reliability analysis.
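As a rough illustration of the hypothesis test behind one-way ANOVA, the sketch below computes the F statistic (between-level mean square divided by within-level mean square) by hand. The function name and the scores are hypothetical; in practice a statistics package would also supply the p-value.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples, one per level."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation explained by the level means (the alpha_i effects).
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Residual variation within each level (the error term).
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical significance scores for three factors (levels), four surveys each.
scores_by_factor = [
    [6, 7, 6, 5],
    [5, 5, 6, 4],
    [3, 4, 2, 3],
]
f_stat, df_b, df_w = one_way_anova_f(scores_by_factor)
print(f"F({df_b}, {df_w}) = {f_stat:.3f}")
```

A large F value relative to the F distribution with (df_between, df_within) degrees of freedom suggests that the factor means are not all equal.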
Two-way ANOVA Model
$$r_{ijk} = \mu_0 + \alpha_i + \beta_j + (\alpha * \beta)_{ij} + \varepsilon_{ijk}$$
where
$r_{ijk}$               is the original or observed response value
$\mu_0$                 is a constant or intercept term in the model, the mean of all the cells
$\alpha_i$              is the main effect for a factor, say A, at the ith level
$\beta_j$               is the main effect for a factor, say B, at the jth level
$(\alpha * \beta)_{ij}$ is the interaction term of A and B at the ijth level
$\varepsilon_{ijk}$     is the random error term
One-way ANOVA can be used to rank the weights of the factors and select the
most important ones. Like the relative weight method, one-way ANOVA treats the
original rankings from all surveys equally. Therefore, these two methods may
introduce some bias into the analysis. Two-way ANOVA can be used to overcome this
weakness.
Two-way ANOVA can be used to remove survey bias and adjust the mean
ranking score of significance for each environmental factor. People with
different software development experience may not hold the same opinions on the
significance of the environmental factors. The background factors can be, for
example, the respondent's title and experience. These can serve as the factors of
the two-way ANOVA analysis, and each can have several levels. Interactions between
these background factors can also be tested. After these analyses, the information
can be used to adjust the mean ranking score for each environmental factor. Based
on this information, further analyses can be conducted on the treatment of the
survey data. The disadvantage of this model is its complexity. Model validations
such as normality and independence checks can be found in Zhang and Pham (2000a).
Based on the information obtained from the survey data, Zhang and Pham
(2000a) studied a number of hypotheses as follows:
Hypothesis 1: The significance of the impacts of the 32 factors on software
reliability assessment is of the same level. Intuitively, the impacts of the 32 factors
may not be the same. Some may have more significant impacts than others; then
the ranking of these factors in terms of their impacts on software reliability
assessment will be desirable.
Hypothesis 2: People playing different roles in software development have the
same opinion on the significance of the 32 factors. Managers, system engineers,
programmers, and testers may not have the same opinion on the significance of all
these factors. This hypothesis will find out whether their opinion can be considered
as the same.
Hypothesis 3: People developing software for different applications have the same
opinion on the importance of the 32 factors. Safety-critical, commercial and inside-
used systems are considered here.
8.3 Exploratory Analysis of Environmental Factors
Relative Weight Method
Table 8.4 shows the results by the relative weight method. The ten most important
environmental factors are classified as factors in the analysis phase (three factors),
coding (one factor), testing (four factors), and general (two factors). The column
Normalized priorities gives the contribution of each environmental factor. For
example, the program complexity factor contributes approximately 3.8% (its relative
weight = 0.03768). A higher priority value indicates a higher ranking. The
purpose of the findings in Table 8.4 is not to discard the environmental factors
in the lower ranking classes but to help software developers and managers
prioritize their tasks.
ANOVA
The ANOVA method was performed on each quantitative survey variable, including all
environmental factors, and the mean and variance of these factors were calculated.
The final ranking based on this information is listed in Table 8.5. The final
ranking of the environmental factors is largely consistent with the one obtained
from the relative weight method. For example, the top 10 factors remain nearly the
same, except that factor 8 (frequency of specification change) ranks two positions
down.
The ANOVA method also classifies the factors into several groups in terms of their
importance. The first five factors form the first group, which contains the most
important factors; the next five factors belong to the second group; and so on.
This finding can help software developers determine which groups of environmental
factors are most important, subject to the available resources.
Correlation Analysis
Correlation analysis was also performed on the survey information. Its purpose
is to find the correlations among the environmental factors and to determine
whether they are independent and, if not, which factors are related to each other.
Table 8.6 shows the correlation of the environmental factors obtained from a
correlation test of the factors.
For example, Factor 1 (program complexity) has a statistically significant
correlation with Factor 17 (development team size). For such correlated factors, we
may not want to include all of them in the software reliability models once one of
them has been considered. This is because by considering one of these related
factors we have already taken the contributions of the others into account.
Including the correlated ones will not add much information but will increase the
complexity of the model.
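A correlation test of this kind can be sketched as follows. The text does not say which correlation coefficient was used, so the Pearson coefficient and its t statistic are assumed here purely for illustration, and the two score vectors are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equally long score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for H0: no correlation, with n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1 - r * r))


# Hypothetical survey scores for two factors across eight respondents.
f1_scores = [7, 6, 5, 6, 4, 7, 5, 6]    # e.g. program complexity
f17_scores = [5, 5, 4, 5, 3, 6, 4, 4]   # e.g. development team size

r = pearson(f1_scores, f17_scores)
t = t_statistic(r, len(f1_scores))
print(f"r = {r:.4f}, t = {t:.3f}")
```

Comparing `t` against the t distribution with n - 2 degrees of freedom indicates whether the two factors can be treated as independent.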
Table 8.4. Ranking results based on the relative weight method

Rank  Factor  Factor name                                       Normalized priority
1     f1      Program complexity                                0.03768
2     f15     Programmer skills                                 0.03693
3     f25     Testing coverage                                  0.03675
4     f22     Testing effort                                    0.03650
5     f21     Testing environment                               0.03533
6     f8      Frequency of specification change                 0.03483
7     f24     Testing methodologies                             0.03433
8     f11     Requirements analysis                             0.03417
9     f6      Percentage of reused code                         0.03369
10    f12     Relationship of detailed design and requirement   0.03330
11    f5      Level of programming technologies                 0.03315
12    f27     Documentation                                     0.03281
13    f18     Program workload                                  0.03275
14    f26     Testing tools                                     0.03227
15    f16     Programmer organization                           0.03210
16    f19     Domain knowledge                                  0.03180
17    f3      Difficulty of programming                         0.03171
18    f10     Design methodologies                              0.03171
19    f20     Human nature (mistake and omission)               0.03169
20    f14     Development management                            0.03166
21    f23     Testing resource allocation                       0.03096
22    f4      Amount of programming effort                      0.03072
23    f2      Program categories                                0.03058
24    f13     Work standards                                    0.02985
25    f32     System software                                   0.02839
26    f9      Volume of program design documents                0.02750
27    f17     Development team size                             0.02738
28    f7      Programming language                              0.02711
29    f28     Processor                                         0.02414
30    f31     Telecommunication device                          0.02404
31    f30     Input/output device                               0.02291
32    f29     Storage device                                    0.02127
Table 8.5. Final ranking based on ANOVA method

SNK grouping  Mean    N   Factor no.  Factor name                                      Final grouping
A             6.0435  23  f1          Program complexity                               1
A             6.0000  23  f15         Programmer skills                                1
A             5.9565  23  f25         Testing coverage                                 1
A             5.9565  23  f22         Testing effort                                   1
A             5.7826  23  f21         Testing environment                              1
B A           5.6087  23  f24         Testing methodologies                            2
B A           5.6087  23  f11         Requirements analysis                            2
B A           5.6087  23  f8          Frequency of specification change                2
B A           5.5652  23  f6          Percentage of reused code                        2
B A           5.5000  22  f18         Program workload                                 2
B A C         5.4762  23  f12         Relationship of detailed design and requirement  3
B A C         5.4348  23  f5          Level of programming technologies                3
B A C         5.3913  23  f27         Documentation                                    3
B A C         5.3636  22  f26         Testing tools                                    3
B A C         5.2727  22  f19         Domain knowledge                                 3
B A C         5.2174  23  f3          Difficulty of programming                        3
B A C         5.1905  21  f23         Testing resource allocation                      3
B A C         5.1739  23  f10         Design methodologies                             3
B A C         5.1739  23  f16         Programmer organization                          3
B A C         5.1364  22  f14         Development management                           3
B A C         5.1304  23  f20         Human nature (mistake and omission)              3
B A C         5.0909  22  f4          Amount of programming effort                     3
B A C         5.0000  23  f2          Program categories                               3
B A C         5.0000  20  f13         Work standards                                   3
B D A C       4.7727  22  f32         System software                                  4
B D A C       4.4783  23  f7          Programming language                             4
B D A C       4.4348  23  f17         Development team size                            4
B D A C       4.4286  21  f9          Volume of program design documents               4
B D C         4.0909  22  f28         Processor                                        5
B D C         4.0455  22  f31         Telecommunication device                         5
D C           3.9091  22  f30         Input/output device                              6
D             3.5455  22  f29         Storage device                                   7
8.4 Further Exploratory Analysis
This section considers the top 11 factors in Table 8.4 and looks at ways to
combine related factors so that the dimension of the factor set can be reduced
without losing much information (Zhang et al. 2001a). Table 8.7 presents a list of
the top 11 ranking environmental factors.
A factor analysis method is used to explain the relationships among the EFs and
to derive a small number of linear combinations of EFs that retain as much of the
information in the original EFs as possible. Such a linear combination of EFs is
called a common factor, to avoid confusion with environmental factors, and it will
be used in place of the original EFs in the regression analysis later. Suppose the
environmental factors can be grouped by their correlation; then the factors that
belong to the same group are highly correlated among themselves but have
relatively small correlation with factors in a different group.
Factor Analysis
The factor analysis is based on those 11 EFs. The eigenvalues of the correlation
matrix, the proportion of variation represented, and the cumulative proportions of
variation are summarized in Table 8.8. The first four common factors (denoted by
$C_i$), $C_1$ through $C_4$, have eigenvalues greater than 1, and the proportion of
variance explained drops below 10% after $C_4$. Therefore, four common factors are
retained.
To characterize the common factors, we chose the EFs with factor loadings greater
than 0.6 in absolute value and listed them in Table 8.9. Factor loadings describe
the correlation between the common factors emerging from a factor analysis and the
original EFs used in the construction of the factors. The higher the factor loading
for a given EF, the more that EF contributes to the factor score. Note that the
first common factor, $C_1$, has loadings exceeding 0.6 for f5, f12, f21, and f22.
Since it is difficult to pin a specific label on the first common factor, we call
it the 'Overall' factor. In many cases, the first common factor represents an
overall measure of the information contained in all the variables. The first common
factor explains 32.58% (Table 8.8) of the total variation. f6, f24, and f25 have
large loadings on the second common factor, $C_2$. These EFs are related to the
index of testing efficiency. Therefore, the second common factor is called the
'Testing Efficiency' factor. Similarly, we call the third and fourth common factors
the 'Requirement and Specification' factor and the 'Program and Skill Level'
factor, respectively.
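The retention rule described above can be checked directly from the eigenvalues reported in Table 8.8. The short sketch below recomputes the percentage and cumulative percentage of variation and applies the "eigenvalue greater than 1" criterion; the variable names are illustrative only.

```python
# Eigenvalues of the correlation matrix as reported in Table 8.8 (C1..C11).
eigenvalues = [3.5836, 1.7940, 1.5753, 1.3039, 0.8316, 0.7020,
               0.4288, 0.3616, 0.2622, 0.1120, 0.0450]

# For a correlation matrix the eigenvalues sum to the number of variables (11).
total = sum(eigenvalues)
percent = [100 * ev / total for ev in eigenvalues]

cumulative = []
running = 0.0
for p in percent:
    running += p
    cumulative.append(running)

# Retain the common factors whose eigenvalue exceeds 1.
retained = [i + 1 for i, ev in enumerate(eigenvalues) if ev > 1.0]

for i, (ev, p, c) in enumerate(zip(eigenvalues, percent, cumulative), start=1):
    print(f"C{i:<2} {ev:7.4f} {p:6.2f} {c:7.2f}")
print("retained:", retained)
```

The four retained common factors together account for about 75% of the total variation.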
Table 8.6. Correlation of environmental factors
Factor
no.
Name of factor Correlated factors
f1
f15
f25
f22
f21
f8
f24
f11
f6
f12
f5
f27
f18
Program complexity
Programmer skills
Testing coverage
Testing effort
Testing environment
Frequency of specification
change
Testing methodologies
Requirements analysis
Percentage of reused
code
Relationship of detailed
design and requirement
Level of programming
technologies
Documentation
Program workload
f 17 Development team size
f 3 Difficulty of programming
f 18 Program workload
f 2 Program categories
f 24 Testing methodologies
f 5 Level of programming technologies
f 13 Work standards
f 14 Development management
f 21 Testing environment
f 5 Level of programming technologies
f 12 Relationship of detailed design and
requirement
f 13 Work standards
f 22 Testing effort
f 25 Testing coverage
f 26 Testing tools
f 12 Relationship of detailed design and
requirement
f 30 Input/output device
f 32 System software
f 7 Programming language
f 9 Volume of program design
documents
f 14 Development management
f 18 Program workload
f 23 Testing resource allocation
f 24 Testing methodologies
f 26 Testing tools
f 27 Documentation
f 11 Requirements analysis
f 21 Testing environment
f 21 Testing environment
f 22 Testing effort
f 6 Percentage of reused code
f 26 Testing tools
f 31 Telecommunication device
f 3 Difficulty of programming
f 4 Amount of programming effort
f 7 Programming language
f 13 Work standards
f 15 Programmer skills
f 23 Testing resource allocation
f 26 Testing tools
f 30 Input/output device
Table 8.6. (continued)
Factor
no.
Name of factor Correlated factors
f26
f16
f19
f3
f10
f20
f14
f23
f4
f2
f13
f32
f9
f17
f7
f28
f31
f30
f29
Testing tools
Programmer organization
Domain knowledge
Difficulty of programming
Design methodologies
Human nature (mistake and
omission)
Development management
Testing resource allocation
Amount of programming effort
Program categories
Work standards
System software
Volume of program design
documents
Development team size
Programming language
Processor
Telecommunication device
Input/output device
Storage device
f 6 Percentage of reused code
f 18 Program workload
f 27 Documentation
f 6 Percentage of reused code
f 15 Programmer skills
f 18 Program workload
f 2 Program categories
f 22 Testing effort
f 6 Percentage of reused code
f 24 Testing methodologies
f 6 Percentage of reused code
f 7 Programming language
f 18 Program workload
f 10 Design methodologies
f 25 Testing coverage
f 14 Development management
f 18 Program workload
f 21 Testing environment
f 22 Testing effort
f 11 Requirements analysis
f 28 Processor
f 30 Input/output device
f31 Telecommunication device
f 6 Percentage of reused code
f 1 Program complexity
f 18 Program workload
f 4 Amount of pro. effort
f 6 Percentage of reused code
f 18 Program workload
f 29 Storage device
f 31 Telecom. device
f 27 Documentation
f 28 Processor
f 32 System software
f 28 Processor
f 30 Input/output device
Table 8.7. Top 11 ranking EFs based on relative weight method

Rank  EF   Name
1     f1   Program complexity
2     f15  Programmer skills
3     f25  Testing coverage
4     f22  Testing effort
5     f21  Testing environment
6     f8   Frequency of specification change
7     f24  Testing methodologies
8     f11  Requirements analysis
9     f6   Percentage of reused code
10    f12  Relationship of detailed design and requirement
11    f5   Level of programming technologies
Table 8.8. Eigenvalues of the correlation matrix

Common factor  Eigenvalue  Percentage  Cumulative percentage
C1             3.5836      32.58       32.58
C2             1.7940      16.31       48.89
C3             1.5753      14.32       63.21
C4             1.3039      11.85       75.06
C5             0.8316      7.56        82.62
C6             0.7020      6.38        89.00
C7             0.4288      3.90        92.90
C8             0.3616      3.29        96.19
C9             0.2622      2.38        98.57
C10            0.1120      1.02        99.59
C11            0.0450      0.41        100.00
Regression Analysis
In this section, we examine the relationships between the four common factors and
the software reliability assessment improvement based on multiple linear
regression. The weighted linear combinations of EFs ($X_i$, $i = 1, \ldots, 4$) for
each index and the improvement of the accuracy of software reliability assessment
(IASRA) (Section B of the survey form in Zhang and Pham 2000a) are considered as
the independent variables and the dependent variable, respectively.
Let $k_{ij}$ be the jth loading on the ith index and $l_{ij}$ be the normalized
loading, defined as
$$l_{ij} = \frac{k_{ij}}{\sum_{j} k_{ij}}, \qquad i = 1, \ldots, 4$$
The linear combination of EFs that will be used as independent variables is
$$X_i = \sum_{j} l_{ij} \, EF_{ij}$$
where $EF_{ij}$ denotes the jth EF score in the ith common factor $C_i$. Here $X_1$
and $X_2$ represent the weighted linear combinations of the 'Overall' and 'Testing
Efficiency' index scores, respectively, and $X_3$ and $X_4$ represent the weighted
linear combinations of the 'Requirement and Specification' and 'Program and Skill
Level' index scores, respectively.
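A minimal sketch of this construction is given below, using the loadings listed in Table 8.9. The respondent's EF scores and the function names are hypothetical and serve only to show how an index score $X_i$ would be computed.

```python
# Loadings k_ij taken from Table 8.9, grouped by common factor.
loadings = {
    "C1": {"f21": 0.926, "f22": 0.803, "f5": 0.718, "f12": 0.696},
    "C2": {"f24": 0.844, "f25": 0.823, "f6": 0.714},
    "C3": {"f11": 0.826, "f8": 0.603},
    "C4": {"f15": 0.845, "f1": 0.605},
}


def normalized_loadings(k):
    """l_ij = k_ij / sum_j k_ij for one common factor."""
    total = sum(k.values())
    return {ef: value / total for ef, value in k.items()}


def index_score(k, ef_scores):
    """X_i = sum_j l_ij * EF_ij for one respondent."""
    l = normalized_loadings(k)
    return sum(l[ef] * ef_scores[ef] for ef in l)


# Hypothetical EF scores given by one respondent (not survey data).
respondent = {"f1": 6, "f5": 5, "f6": 6, "f8": 5, "f11": 6, "f12": 5,
              "f15": 7, "f21": 6, "f22": 6, "f24": 5, "f25": 6}

x = {name: index_score(k, respondent) for name, k in loadings.items()}
print({name: round(value, 3) for name, value in x.items()})
```

Since the normalized loadings of each common factor sum to 1, every $X_i$ stays on the same scale as the original EF scores.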
Table 8.9. Identification of the common factors

Common factor  Index                           EF   Name                                              Loading
C1             Overall                         f21  Testing environment                               0.926
                                               f22  Testing effort                                    0.803
                                               f5   Level of programming technologies                 0.718
                                               f12  Relationship of detailed design and requirements  0.696
C2             Testing efficiency              f24  Testing methodologies                             0.844
                                               f25  Testing coverage                                  0.823
                                               f6   Percentage of reused code                         0.714
C3             Requirements and specification  f11  Requirements analysis                             0.826
                                               f8   Frequency of specification change                 0.603
C4             Program and skill level         f15  Programmer skills                                 0.845
                                               f1   Program complexity                                0.605
There are a total of 15 different linear regression models: 4 simple linear
regression models with only 1 independent variable each, 6 linear regression
models with 2 independent variables, 4 linear regression models with 3 independent
variables, and 1 full linear regression model with all 4 independent variables.
Among the 15 linear regression models, six turned out to be significant: all of
the simple regression models (Models I, II, III, and IV) and two models (Models V
and VI) with 2 independent variables (Zhang et al. 2001a). The intercept is not
statistically significant in any of these regression models. The results are
summarized in Table 8.10.
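The count of 15 candidate models is simply the number of non-empty subsets of the four index scores. The following sketch enumerates them; the variable names are illustrative.

```python
from itertools import combinations

variables = ["X1", "X2", "X3", "X4"]

# Every non-empty subset of the four index scores defines one candidate model.
models = [subset
          for size in range(1, len(variables) + 1)
          for subset in combinations(variables, size)]

# Number of candidate models for each number of independent variables.
counts = {size: sum(1 for m in models if len(m) == size)
          for size in range(1, len(variables) + 1)}

print(len(models), counts)
```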
From Table 8.10, for example, regression model VI gives the following estimate of
IASRA:
$$\text{IASRA} = 3.31\,X_3 + 2.76\,X_4$$
This equation implies that the predicted score for the improvement of the accuracy
of software reliability assessment increases as the 'Requirement and Specification'
index and the 'Program and Skill Level' index increase.
Table 8.10. Parameter estimates based on the common factors

Model  No. of variables in model  Variables  Parameter estimate  P-value  R^2
I      1                          X1         1.29                0.0001   0.9433
II     1                          X2         2.20                0.0001   0.9332
III    1                          X3         6.29                0.0001   0.9368
IV     1                          X4         5.68                0.0001   0.9348
V      2                          X2         1.08                0.0571   0.9464
                                  X4         2.97                0.0436
VI     2                          X3         3.31                0.0369   0.9485
                                  X4         2.76                0.0522
Testing Between Phases Within the Development Process
The thirty-two EFs are now divided into five categories: 'General', 'Analysis and
Design', 'Coding', 'Testing', and 'Hardware systems'. In this analysis, we consider
'Analysis and Design', 'Coding', and 'Testing' as the software development process
phases (Zhang et al. 2001a). The null hypothesis is that these phases have the same
level of significance. The results of one-way ANOVA, including the
Student-Newman-Keuls (SNK) multiple comparison tests, are summarized in Table 8.11.
Table 8.11. SNK test result

SNK grouping  Phase                Mean
A             Testing              5.43
A             Coding               5.35
A             General              5.24
A             Analysis and design  5.03
The results show that these phases have slightly different mean values, ranging
from 5.43 for the Testing phase to 5.03 for the Analysis and design phase. The
same letter in the SNK grouping shown in the first column of Table 8.11 indicates
that the differences among the mean values are not statistically significant. Zhang
and Pham (2000a) studied the time allocation for each of the development phases and
found that requirement analysis, design, coding, and testing take about 25%, 18%,
36%, and 21% of the entire development time, respectively. However, this result
indicates that the development phases are considered equally important in terms of
their impact on software reliability assessment.
Identifying Significant Environmental Factors Within Each Phase
This section discusses the most important subsets of environmental factors
describing the relationship between software reliability assessment and
environmental factors for each phase. We consider IASRA as the dependent variable
and the 32 environmental factors as independent variables. The significant factors
and their parameter estimates, obtained by linear regression with backward
elimination, are presented in Table 8.12.
Table 8.12. Significant EFs for each phase

Phase                Variables  Name                                       Parameter estimate  P-value  R^2
General              f1         Program complexity                         9.17                0.0001   0.9697
                     f6         Percentage of reused modules               3.17                0.0907
Analysis and design  f8         Frequency of program specification change  3.37                0.0635   0.9801
                     f10        Design methodology                         4.90                0.0063
                     f13        Work standards                             6.42                0.0068
Coding               f17        Development team size                      8.88                0.0192   0.9551
                     f19        Domain knowledge                           6.46                0.0341
Testing              f21        Testing environment                        12.57               0.0001   0.9703
The hypothesis of a zero intercept was not rejected for any phase regression
model. Therefore, it may be appropriate to remove the constant term from the
model. Note that program complexity (f1) and percentage of reused modules (f6)
are significant for the General phase, with p-values of 0.0001 and 0.0907,
respectively. That means program complexity and percentage of reused modules
provide significant information for the prediction of software reliability
assessment in the 'General' phase. Similar interpretations apply to the other
development phases. More findings can be found in Zhang et al. (2001a) and Zhang
and Pham (2000a).
8.5 A Generalized Model with Environmental Factors
In this section, we discuss several newly developed software reliability models that
consider environmental factors by combining the proportional hazard model (Cox
1975) and existing software reliability models (Pham 2000a). Such factors include,
for example, the complexity metrics of the software, the development and
environmental conditions, the effect of mental stress and human nature, the level
of the test-team members, and the facility level during testing. The proportional
hazard model has been widely used in medical applications to estimate the survival
rate of patients.
Notation
$\tilde{z}$                      Vector of environmental factors
$\tilde{\beta}$                  Coefficient vector of environmental factors
$\Phi(\tilde{\beta}\tilde{z})$   Function of environmental factors
$\lambda_0(t)$                   Failure intensity rate function without environmental factors
$\lambda(t, \tilde{z})$          Failure intensity rate function with environmental factors
$m_0(t)$                         Baseline mean value function without environmental factors
$m(t, \tilde{z})$                Mean value function with environmental factors
$R_0(x \mid t)$                  Baseline reliability function without environmental factors
$R(x \mid t, \tilde{z})$         Reliability function with environmental factors
Chapter 6 discussed the fault intensity rate function $\lambda(t)$ and the mean
value function $m(t)$ based on the NHPP without environmental factors. In this
section, a fault intensity rate function that integrates environmental factors
based on a proportional hazard model is constructed using the following assumptions:
1. The fault intensity rate function consists of two components: the fault intensity
rate function without environmental factors, $\lambda_0(t)$, and the environmental
factor function, $\Phi(\tilde{\beta}\tilde{z})$.
2. The fault intensity rate function $\lambda_0(t)$ and the function of the
environmental factors are independent. The function $\lambda_0(t)$ is also called
the baseline intensity function.
Based on the proportional hazard model (PHM), let us consider the failure
intensity function of a software system as the product of an unspecified baseline
failure intensity $\lambda_0(t)$, a function that depends only on time, and an
environmental factor function $\Phi(\tilde{\beta}\tilde{z})$ incorporating the
effects of a number of environmental factors.
The fault intensity function with environmental factors, $\lambda(t, \tilde{z})$,
can be expressed as:
$$\lambda(t, \tilde{z}) = \lambda_0(t)\,\Phi(\tilde{\beta}\tilde{z}) \qquad (8.3)$$
The mean value function with environmental factors can then be obtained as
follows:
$$m(t, \tilde{z}) = \int_0^t \lambda_0(s)\,\Phi(\tilde{\beta}\tilde{z})\,ds = \Phi(\tilde{\beta}\tilde{z}) \int_0^t \lambda_0(s)\,ds = \Phi(\tilde{\beta}\tilde{z})\,m_0(t) \qquad (8.4)$$
The reliability function with environmental factors can be expressed as
follows:
$$R(x \mid t, \tilde{z}) = e^{-[m(t+x, \tilde{z}) - m(t, \tilde{z})]} = e^{-[\Phi(\tilde{\beta}\tilde{z})\,m_0(t+x) - \Phi(\tilde{\beta}\tilde{z})\,m_0(t)]} = \left[R_0(x \mid t)\right]^{\Phi(\tilde{\beta}\tilde{z})} \qquad (8.5)$$
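The relationships in equations (8.3) to (8.5) can be checked numerically. The sketch below assumes, purely for illustration, a Goel-Okumoto baseline $m_0(t) = a(1 - e^{-bt})$ with made-up parameter values and an exponential environmental factor function; none of these choices is prescribed by the text at this point.

```python
from math import exp

A_FAULTS, B_RATE = 100.0, 0.05   # hypothetical baseline parameters a and b


def m0(t):
    """Assumed baseline mean value function m0(t) = a(1 - exp(-b t))."""
    return A_FAULTS * (1.0 - exp(-B_RATE * t))


def lam0(t):
    """Baseline intensity, the derivative of m0."""
    return A_FAULTS * B_RATE * exp(-B_RATE * t)


def env_factor(beta, z):
    """Assumed environmental factor function Phi = exp(sum_j beta_j z_j)."""
    return exp(sum(b * x for b, x in zip(beta, z)))


def lam(t, beta, z):
    return lam0(t) * env_factor(beta, z)          # equation (8.3)


def m(t, beta, z):
    return env_factor(beta, z) * m0(t)            # equation (8.4)


def r0(x, t):
    return exp(-(m0(t + x) - m0(t)))              # baseline reliability


def r(x, t, beta, z):
    return exp(-(m(t + x, beta, z) - m(t, beta, z)))   # equation (8.5)


beta, z = [0.3, -0.2], [1.0, 2.0]   # hypothetical coefficients and factor values
direct = r(5.0, 50.0, beta, z)
via_baseline = r0(5.0, 50.0) ** env_factor(beta, z)
print(direct, via_baseline)
```

The two printed values agree, which is the content of the last equality in (8.5): the environmental factors act as an exponent on the baseline reliability.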
The basic assumption for the PHM is that the ratio of the failure intensity
functions of any two errors observed at any time t, associated with any
environmental factor sets $z_{1i}$ and $z_{2i}$, is constant with respect to time;
that is, the two intensities are proportional to each other. In other words,
$\lambda(t_i, z_{1i})$ is directly proportional to $\lambda(t_i, z_{2i})$.
Assuming an exponential form for the environmental factor function, the failure
intensity function of a software reliability model that considers environmental
factors can be written as
$$\lambda(t_i; z_i) = \lambda_0(t_i)\, e^{\sum_{j=1}^{m} \beta_j z_{ji}} \qquad (8.6)$$
where
$z_{ji}$   environmental factor j of the ith error
$\beta_j$  regression coefficient of the jth factor
$t_i$      failure time between the (i-1)th error and the ith error, i = 1, 2, ..., n
$z_i$      environmental factor of the ith error
m          number of environmental factors.
It is easy to see that $\lambda_0(t)$ is the baseline failure intensity function,
which represents the failure intensity when all environmental factor variables are
set to zero.
Let Z be a column vector consisting of the environmental factors and let B be a
row vector consisting of the corresponding regression parameters. Then the above
failure intensity model can be rewritten as
$$\lambda(t; Z) = \lambda_0(t)\, e^{BZ} \qquad (8.7)$$
Therefore, the reliability of the software system can be written in a general form
as follows:
$$R(t; Z) = e^{-\int_0^t \lambda_0(s)\, e^{BZ}\, ds} = \left[ e^{-\int_0^t \lambda_0(s)\, ds} \right]^{e^{BZ}} = \left[R_0(t)\right]^{e^{BZ}} \qquad (8.8)$$
where $R_0(t)$ is the time-dependent software reliability. The pdf of the software
system is given by
$$f(t; Z) = \lambda(t; Z)\, R(t; Z) = \lambda_0(t)\, e^{BZ} \left[R_0(t)\right]^{e^{BZ}}$$
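As a sanity check on equations (8.7) and (8.8) and on the pdf above, the sketch below assumes a constant baseline intensity (a deliberately simple, hypothetical choice) and verifies numerically that integrating the pdf up to time T gives 1 - R(T; Z). All parameter values are invented for illustration.

```python
from math import exp

LAMBDA0 = 0.02   # hypothetical constant baseline intensity


def linear_term(b, z):
    """BZ = sum_j beta_j z_j."""
    return sum(bj * zj for bj, zj in zip(b, z))


def intensity(t, b, z):
    # Equation (8.7); the baseline is constant here, so t is not used.
    return LAMBDA0 * exp(linear_term(b, z))


def baseline_reliability(t):
    return exp(-LAMBDA0 * t)


def reliability(t, b, z):
    # Equation (8.8): R(t; Z) = [R0(t)] ** exp(BZ).
    return baseline_reliability(t) ** exp(linear_term(b, z))


def pdf(t, b, z):
    return intensity(t, b, z) * reliability(t, b, z)


def integrate(func, lower, upper, steps=20000):
    """Composite trapezoidal rule."""
    h = (upper - lower) / steps
    total = 0.5 * (func(lower) + func(upper))
    for k in range(1, steps):
        total += func(lower + k * h)
    return total * h


B = [0.5, -0.3]   # hypothetical regression coefficients
Z = [1.0, 2.0]    # hypothetical environmental factor values
T = 40.0

cdf_numeric = integrate(lambda s: pdf(s, B, Z), 0.0, T)
cdf_closed = 1.0 - reliability(T, B, Z)
print(cdf_numeric, cdf_closed)
```

With a constant baseline the intensity ratio for two factor sets, `intensity(t, B, Z1) / intensity(t, B, Z2)`, does not depend on t, which is exactly the proportionality assumption of the PHM.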
The regression coefficient B can be estimated using either the MLE method or
the maximum partial likelihood approach, both of which are discussed later; the
latter requires neither a specific distributional assumption about the failure
data nor an estimate of the baseline failure intensity function. A direct
generalization of the model in equation (8.7) is to consider the environmental
factor variables $z_{ji}$ as functions of time. In this case, a generalized form
of the failure intensity function is given by
$$\lambda(t; Z) = \lambda_0(t)\, e^{\sum_{j=1}^{m} \beta_j z_{ji}(t)} \qquad (8.9)$$
8.6 Environmental Parameter Estimation
In this section, we discuss how to estimate the parameters of the environmental
factor model using two widely used methods: the MLE method and the partial
likelihood method. The advantage of the partial likelihood method is that it does
not require as much data as the typical maximum likelihood method. Therefore, the
data collection required by the regression method can be simplified, and
information on similar applications and settings of environmental factors that has
been stored in databases can be utilized.
Environmental Factors Estimation Using MLE
Assume that there are p unknown parameters in the baseline failure intensity
function $\lambda_0(t)$, say $\alpha_1, \alpha_2, \ldots, \alpha_p$, and that there
are m environmental factors with coefficients $\beta_1, \beta_2, \ldots, \beta_m$.
Let $A = (\alpha_1, \alpha_2, \ldots, \alpha_p)$ be the set of unknown baseline
parameters and $B = (\beta_1, \beta_2, \ldots, \beta_m)$ be the set of coefficients.
Then the likelihood function is given by
$$L(A, B) = \prod_{i=1}^{n} f(t_i; z_i) = \prod_{i=1}^{n} \lambda_0(t_i)\, e^{\sum_{j=1}^{m} \beta_j z_{ji}} \left[R_0(t_i)\right]^{e^{\sum_{j=1}^{m} \beta_j z_{ji}}} \qquad (8.10)$$
The log likelihood function is given by
$$\ln L(A, B) = \sum_{i=1}^{n} \ln[\lambda_0(t_i)] + \sum_{i=1}^{n} \sum_{j=1}^{m} \beta_j z_{ji} + \sum_{i=1}^{n} e^{\sum_{j=1}^{m} \beta_j z_{ji}} \ln[R_0(t_i)]$$
Taking the first partial derivatives of the log likelihood function with respect to
the (m + p) parameters, we obtain
$$\frac{\partial}{\partial \alpha_k}[\ln L(A, B)] = \sum_{i=1}^{n} \frac{\frac{\partial}{\partial \alpha_k}[\lambda_0(t_i)]}{\lambda_0(t_i)} + \sum_{i=1}^{n} e^{\sum_{j=1}^{m} \beta_j z_{ji}}\, \frac{\frac{\partial}{\partial \alpha_k}[R_0(t_i)]}{R_0(t_i)}$$
$$\frac{\partial}{\partial \beta_s}[\ln L(A, B)] = \sum_{i=1}^{n} z_{si} + \sum_{i=1}^{n} z_{si}\, e^{\sum_{j=1}^{m} \beta_j z_{ji}} \ln[R_0(t_i)]$$
where k = 1, 2, ..., p and s = 1, 2, ..., m.
Setting the previous equations equal to zero, we can obtain all the (m + p)
parameters by solving the following system of (m + p) equations simultaneously:
$$\sum_{i=1}^{n} \left[ \frac{\frac{\partial}{\partial \alpha_k}[\lambda_0(t_i)]}{\lambda_0(t_i)} + e^{\sum_{j=1}^{m} \beta_j z_{ji}}\, \frac{\frac{\partial}{\partial \alpha_k}[R_0(t_i)]}{R_0(t_i)} \right] = 0 \quad \text{for } k = 1, 2, \ldots, p$$
$$\sum_{i=1}^{n} z_{si} \left[ 1 + e^{\sum_{j=1}^{m} \beta_j z_{ji}} \ln[R_0(t_i)] \right] = 0 \quad \text{for } s = 1, 2, \ldots, m$$
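To make the score equations concrete, the sketch below specializes them to the simplest possible case: a constant baseline intensity $\lambda_0(t) = \alpha$ (so $R_0(t) = e^{-\alpha t}$, p = 1) and a single environmental factor (m = 1). This baseline, the data, and the function names are assumptions made only for illustration. The analytic derivatives are compared with finite differences of the log likelihood.

```python
from math import exp, log


def log_likelihood(alpha, beta, data):
    """ln L(A, B) for lambda0(t) = alpha, R0(t) = exp(-alpha t), one factor."""
    total = 0.0
    for t, z in data:
        # ln lambda0(t_i) + beta z_i + exp(beta z_i) * ln R0(t_i)
        total += log(alpha) + beta * z + exp(beta * z) * (-alpha * t)
    return total


def score_alpha(alpha, beta, data):
    # (d lambda0 / d alpha) / lambda0 = 1 / alpha and (d R0 / d alpha) / R0 = -t.
    return sum(1.0 / alpha - t * exp(beta * z) for t, z in data)


def score_beta(alpha, beta, data):
    # z_i * [1 + exp(beta z_i) * ln R0(t_i)], with ln R0(t_i) = -alpha t_i.
    return sum(z * (1.0 + exp(beta * z) * (-alpha * t)) for t, z in data)


# Hypothetical (failure time, factor value) pairs.
data = [(12.0, 0.0), (8.0, 1.0), (20.0, 0.0), (5.0, 1.0), (15.0, 1.0)]
alpha, beta, h = 0.05, 0.4, 1e-6

fd_alpha = (log_likelihood(alpha + h, beta, data)
            - log_likelihood(alpha - h, beta, data)) / (2 * h)
fd_beta = (log_likelihood(alpha, beta + h, data)
           - log_likelihood(alpha, beta - h, data)) / (2 * h)
print(score_alpha(alpha, beta, data), fd_alpha)
print(score_beta(alpha, beta, data), fd_beta)
```

In a real application the two score functions would be passed to a numerical root finder to obtain the estimates; here they are only checked against the log likelihood.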
Environmental Factors Estimation Using Maximum Partial Likelihood Approach
Following the idea of Cox's proportional hazard model, we can use the
maximum partial likelihood method to estimate the environmental factor parameters
without assuming any specific distribution for the failure data and without
estimating the baseline failure intensity function. The only basic assumption of
this model is that the ratio of the failure intensity functions of any two errors
observed at any time t, associated with any environmental factor sets $z_{1i}$ and
$z_{2i}$, is constant with respect to time; that is, they are proportional to each
other.
First we estimate the environmental factor parameters based on the partial
likelihood function. The partial likelihood function of this model is given by
$$L(B) = \prod_{i=1}^{n} \frac{e^{(\beta_1 z_{1i} + \beta_2 z_{2i} + \cdots + \beta_m z_{mi})}}{\sum_{k \in R_i} e^{(\beta_1 z_{1k} + \beta_2 z_{2k} + \cdots + \beta_m z_{mk})}} \qquad (8.11)$$
where $R_i$ is the risk set at $t_i$. Take the derivatives of the log partial
likelihood function with respect to $\beta_1, \beta_2, \ldots, \beta_m$ and set them
equal to zero. We can then obtain all of the estimated $\beta_s$ by solving these
equations simultaneously using numerical methods. After estimating the factor
parameters $\beta_1, \beta_2, \ldots, \beta_m$, the remaining task is to estimate
the unknown parameters of the baseline failure intensity function $\lambda_0(t)$.
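The sketch below illustrates the partial likelihood in equation (8.11) for a single environmental factor. It assumes the usual Cox definition of the risk set (all observations whose time is at least $t_i$), no tied times, and made-up data; the bisection search is just one simple numerical method among many.

```python
from math import exp, log


def risk_set(data, i):
    """Indices of observations still at risk at the ith failure time (assumed rule)."""
    t_i = data[i][0]
    return [k for k, (t_k, _) in enumerate(data) if t_k >= t_i]


def log_partial_likelihood(beta, data):
    total = 0.0
    for i, (_, z_i) in enumerate(data):
        denom = sum(exp(beta * data[k][1]) for k in risk_set(data, i))
        total += beta * z_i - log(denom)
    return total


def score(beta, data):
    """Derivative of the log partial likelihood with respect to beta."""
    total = 0.0
    for i, (_, z_i) in enumerate(data):
        members = risk_set(data, i)
        weights = [exp(beta * data[k][1]) for k in members]
        mean_z = sum(w * data[k][1] for w, k in zip(weights, members)) / sum(weights)
        total += z_i - mean_z
    return total


def maximize(data, lo=-5.0, hi=5.0, iterations=100):
    """Bisection on the score, which is decreasing in beta."""
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if score(mid, data) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical (failure time, factor value) pairs without ties.
data = [(2.0, 1.0), (3.0, 0.0), (5.0, 1.0), (7.0, 1.0),
        (8.0, 0.0), (11.0, 0.0), (13.0, 1.0), (17.0, 0.0)]
beta_hat = maximize(data)
print(beta_hat, log_partial_likelihood(beta_hat, data))
```

Note that the baseline intensity never appears in these functions, which is why the partial likelihood approach needs no assumption about its form.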
8.7 Enhanced Proportional Hazard Jelinski-Moranda (EPJM) Model
Recall that the Jelinski-Moranda (JM) model is one of the earliest models
developed for predicting software reliability (see Chapter 5). The failure intensity
of the software at the ith failure interval of this model is given by
$$\lambda(t_i) = \phi[N - (i-1)], \qquad i = 1, 2, \ldots, N$$
and the probability density function is given by
$$f(t_i) = \phi[N - (i-1)]\, e^{-\phi[N - (i-1)]\, t_i}$$
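A minimal sketch of the JM intensity and pdf follows, with N read as the initial number of faults and $\phi$ as the per-fault failure intensity, as in the standard JM model. The numerical values are hypothetical.

```python
from math import exp


def jm_intensity(i, n_faults, phi):
    """Failure intensity during the ith failure interval."""
    return phi * (n_faults - (i - 1))


def jm_pdf(t, i, n_faults, phi):
    """Density of the ith time between failures (exponential with the JM rate)."""
    rate = jm_intensity(i, n_faults, phi)
    return rate * exp(-rate * t)


N_FAULTS, PHI = 30, 0.01   # hypothetical parameter values

first = jm_intensity(1, N_FAULTS, PHI)
last = jm_intensity(N_FAULTS, N_FAULTS, PHI)
print(first, last, jm_pdf(5.0, 1, N_FAULTS, PHI))
```

The intensity drops by a constant amount $\phi$ after each fault is removed, so the expected time between failures, 1/rate, grows as testing proceeds.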
The enhanced proportional hazard JM model (Pham 2000a), called the EPJM
model, which is based on the proportional hazard model and the JM model, is
expressed as
$$\lambda(t_i; z_i) = \phi[N - (i-1)]\, e^{\sum_{j=1}^{m} \beta_j z_{ji}} \qquad (8.12)$$
and the pdf corresponding to $\lambda(t_i; z_i)$ is given by
$$f(t_i; z_i) = \phi[N - (i-1)]\, e^{\sum_{j=1}^{m} \beta_j z_{ji}}\, e^{-\phi[N - (i-1)]\, t_i\, e^{\sum_{j=1}^{m} \beta_j z_{ji}}}$$
Now we wish to estimate the parameters of the EPJM model using the two
methods discussed in Section 5, the maximum likelihood method and the
maximum partial likelihood method. There are (m + 2) unknown parameters in this
model.
The Maximum Likelihood Method
From equation (8.10), the likelihood function of the model is given by
$$L(\beta, N, \phi) = \prod_{i=1}^{n} f(t_i; z_i) = \prod_{i=1}^{n} \left(\phi[N - (i-1)]\, e^{\sum_{j=1}^{m} \beta_j z_{ji}}\right) \exp\left(-\phi[N - (i-1)]\, t_i\, e^{\sum_{j=1}^{m} \beta_j z_{ji}}\right)$$
The log likelihood function is given by
$$\ln L(\beta, N, \phi) = n \ln \phi + \sum_{i=1}^{n} \ln[N - (i-1)] + \sum_{i=1}^{n} \sum_{j=1}^{m} \beta_j z_{ji} - \phi \sum_{i=1}^{n} [N - (i-1)]\, t_i\, e^{\sum_{j=1}^{m} \beta_j z_{ji}}$$
Taking the first partial derivatives of the log likelihood function with respect
to the (m+2) parameters $\beta_1, \beta_2, \ldots, \beta_m$, $N$, and $\phi$, we obtain the following:
$$\frac{\partial \ln L}{\partial \phi} = \frac{n}{\phi} - \sum_{i=1}^{n} [N - (i-1)]\, t_i\, e^{\sum_{j=1}^{m} \beta_j z_{ji}}$$
$$\frac{\partial \ln L}{\partial N} = \sum_{i=1}^{n} \frac{1}{N - (i-1)} - \phi \sum_{i=1}^{n} t_i\, e^{\sum_{j=1}^{m} \beta_j z_{ji}}$$
and
$$\frac{\partial \ln L}{\partial \beta_j} = \sum_{i=1}^{n} z_{ji} - \phi \sum_{i=1}^{n} [N - (i-1)]\, t_i\, z_{ji}\, e^{\sum_{j=1}^{m} \beta_j z_{ji}}$$
Setting all of these equations equal to zero, we can obtain the estimated (m+2)
parameters by solving the following system of equations simultaneously using a
numerical method:
$$\sum_{i=1}^{n} [N - (i-1)]\, t_i\, e^{\sum_{j=1}^{m} \beta_j z_{ji}} = \frac{n}{\phi}$$
$$\sum_{i=1}^{n} \frac{1}{N - (i-1)} = \phi \sum_{i=1}^{n} t_i\, e^{\sum_{j=1}^{m} \beta_j z_{ji}} \qquad (8.13)$$
$$\phi \sum_{i=1}^{n} [N - (i-1)]\, t_i\, z_{ji}\, e^{\sum_{j=1}^{m} \beta_j z_{ji}} = \sum_{i=1}^{n} z_{ji} \qquad \text{for } j = 1, 2, \ldots, m$$
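Before handing the score equations to a numerical solver, it can help to confirm that the analytic derivatives agree with the log likelihood. The sketch below codes the EPJM log likelihood $n \ln\phi + \sum \ln[N-(i-1)] + \sum\sum \beta_j z_{ji} - \phi \sum [N-(i-1)] t_i e^{\sum_j \beta_j z_{ji}}$ and its three partial derivatives, then compares them with central finite differences. The function names, data, and parameter values are hypothetical and chosen only for illustration.

```python
import math

def linear_predictor(beta, zi):
    return sum(b * x for b, x in zip(beta, zi))

def epjm_loglik(phi, N, beta, t, z):
    """EPJM log likelihood; t[idx] is the time to failure idx + 1, z[idx] its covariates."""
    total = 0.0
    for idx, (ti, zi) in enumerate(zip(t, z)):
        remaining = N - idx  # N - (i - 1) with i = idx + 1
        eta = linear_predictor(beta, zi)
        total += math.log(phi) + math.log(remaining) + eta
        total -= phi * remaining * ti * math.exp(eta)
    return total

def epjm_score(phi, N, beta, t, z):
    """Partial derivatives with respect to phi, N, and each beta_j."""
    d_phi = len(t) / phi
    d_N = 0.0
    d_beta = [0.0] * len(beta)
    for idx, (ti, zi) in enumerate(zip(t, z)):
        remaining = N - idx
        e = math.exp(linear_predictor(beta, zi))
        d_phi -= remaining * ti * e
        d_N += 1.0 / remaining - phi * ti * e
        for j, x in enumerate(zi):
            d_beta[j] += x - phi * remaining * ti * x * e
    return d_phi, d_N, d_beta

# Hypothetical inter-failure times with one binary factor.
t = [10.0, 20.0, 8.0, 50.0, 80.0, 130.0]
z = [[0], [0], [1], [0], [0], [0]]
phi, N, beta = 0.002, 9.0, [0.5]

d_phi, d_N, d_beta = epjm_score(phi, N, beta, t, z)

# Central finite differences of the log likelihood.
fd_phi = (epjm_loglik(phi + 1e-8, N, beta, t, z) - epjm_loglik(phi - 1e-8, N, beta, t, z)) / 2e-8
fd_N = (epjm_loglik(phi, N + 1e-5, beta, t, z) - epjm_loglik(phi, N - 1e-5, beta, t, z)) / 2e-5
fd_beta = (epjm_loglik(phi, N, [beta[0] + 1e-6], t, z) - epjm_loglik(phi, N, [beta[0] - 1e-6], t, z)) / 2e-6
print(d_phi, fd_phi, d_N, fd_N, d_beta[0], fd_beta)
```

The MLEs are the parameter values at which all (m+2) components returned by `epjm_score` vanish at once.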
The Maximum Partial Likelihood Method
Assume that the baseline failure intensity has the form of the JM model. That
means that the basic assumption of this model (see Section 4) is satisfied and that
the ratio of the failure intensity functions of any two errors observed at any time t,
associated with any environmental factor sets $z_{1i}$ and $z_{2i}$, is constant with respect
to time and they are proportional to each other.
Having estimated the factor parameters $\beta_1, \beta_2, \ldots, \beta_m$, the remaining task is to
estimate the unknown parameters of the baseline failure intensity function. Note
that the failure intensity function model has the form
$$\lambda(t_i; z_i) = \phi[N - (i-1)]\, e^{\hat{\beta}_1 z_{1i} + \hat{\beta}_2 z_{2i} + \cdots + \hat{\beta}_m z_{mi}} = \phi[N - (i-1)]\, E_i$$
where
$$E_i = e^{\hat{\beta}_1 z_{1i} + \hat{\beta}_2 z_{2i} + \cdots + \hat{\beta}_m z_{mi}}$$
The pdf is given by
$$f(t_i; z_i) = \phi E_i [N - (i-1)]\, e^{-\phi E_i [N - (i-1)]\, t_i}$$
The likelihood function is given by
$$L(N, \phi) = \prod_{i=1}^{n} \phi E_i [N - (i-1)]\, e^{-\phi E_i [N - (i-1)]\, t_i}$$
By taking the log of the likelihood function and its derivatives with respect to $N$
and $\phi$, and setting them equal to zero, we obtain the following equations:
$$\frac{\partial \ln L}{\partial N} = \sum_{i=1}^{n} \frac{1}{N - (i-1)} - \phi \sum_{i=1}^{n} E_i t_i = 0$$
and
$$\frac{\partial \ln L}{\partial \phi} = \frac{n}{\phi} - \sum_{i=1}^{n} E_i [N - (i-1)]\, t_i = 0$$
The estimated $N$ and $\phi$ can be obtained as follows. First, the parameter $N$ can
be obtained by solving the following equation:
$$\left(\sum_{i=1}^{n} E_i [N - (i-1)]\, t_i\right) \left(\sum_{i=1}^{n} \frac{1}{N - (i-1)}\right) = n \sum_{i=1}^{n} E_i t_i \qquad (8.14)$$
After finding $N$, the parameter $\phi$ can easily be obtained and is given by
$$\phi = \frac{\sum_{i=1}^{n} \dfrac{1}{N - (i-1)}}{\sum_{i=1}^{n} E_i t_i} \qquad (8.15)$$
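The two-step procedure (find $N$ from the single equation in $N$, then compute $\phi$) can be sketched with a simple bisection. The example below treats the equation for $N$ as "left side minus right side equals zero" on $N > n - 1$ and assumes the data show enough reliability growth for a finite root to exist. The function name, the eight inter-failure times, and the covariate values are hypothetical; the coefficient 1.767109 is simply borrowed as a plausible magnitude.

```python
import math

def solve_epjm_baseline(t, E, iterations=200):
    """Solve for N by bisection, then compute phi from the ratio formula."""
    n = len(t)
    w = [e * x for e, x in zip(E, t)]  # E_i * t_i
    sw = sum(w)

    def g(N):
        s_inv = sum(1.0 / (N - idx) for idx in range(n))   # sum of 1 / [N - (i - 1)]
        s_wt = sum((N - idx) * w[idx] for idx in range(n))  # sum of E_i [N - (i - 1)] t_i
        return s_wt * s_inv - n * sw

    lo = n - 1 + 1e-9  # g(lo) is large and positive just above the pole at N = n - 1
    hi = float(n)
    while g(hi) > 0:   # widen the bracket until the sign changes
        hi *= 2
        if hi > 1e9:
            raise ValueError("no finite root found; the data may show no reliability growth")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    N_hat = 0.5 * (lo + hi)
    phi_hat = sum(1.0 / (N_hat - idx) for idx in range(n)) / sw
    return N_hat, phi_hat

# Hypothetical data: eight inter-failure times, one clustered failure (z = 1).
t = [10, 12, 9, 15, 3, 18, 20, 25]
z = [0, 0, 0, 0, 1, 0, 0, 0]
beta1 = 1.767109
E = [math.exp(beta1 * zi) for zi in z]

N_hat, phi_hat = solve_epjm_baseline(t, E)
print(N_hat, phi_hat)
```

Bisection is used only because it needs no derivative; any one-dimensional root finder would serve equally well.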
8.8 Applications
Almost all software reliability engineering models need one of two basic types of
input data: time-domain data and interval-domain data. It is possible to transform
between the two types of data domains. The time-domain approach is characterized
by recording the individual times at which failures occurred. The interval-domain
approach is characterized by counting the number of failures that occurred
over a given period.
Application 8.1: To illustrate the EPJM model, we use the software failure data
reported by Musa (1975) and also refer to data set #9 in Chapter 4. The data is
related to a real-time command and control system. There is, however, no record of
corresponding environmental factor measures in most, if not all, existing available
data. To demonstrate the use of this model, we generate a failure-cluster factor and
give its value which is logically realistic based on the failure data and consultation
with several local software firms by the author.
One of the assumptions of the J-M model is that the times between failures are
independent. In many real testing environments, however, failures occur in
clusters, i.e., the failure times within a cluster are relatively shorter than those
between clusters. Data set #9 shows that this is a reasonable description of that
particular application, which may indicate that the assumption of independent
failure times does not hold. We can enhance the J-M model with a failure-cluster
factor generated from the failure data.
We assume that if the present failure time, compared to the previous failure
time, is relatively short, then some correlation may exist between them. Let us
define a failure-cluster factor, such as
$$z_i = \begin{cases} 1 & \text{when } \dfrac{t_{i-1}}{t_i} \ge 7 \text{ or } \dfrac{t_{i-2}}{t_i} \ge 5 \\ 0 & \text{otherwise} \end{cases}$$
The data used in this model include both the failure time data and the explanatory
environmental factor data (see Table 8.13). The explanatory variable data is
dynamic, that is, it changes depending on the failure time. For example, in Table
8.13, the time between the fourth and fifth errors is 115 seconds; the time between
the fifth and sixth errors is 9 seconds. Therefore, $z_5$ is assigned to 0 and $z_6$ is equal
to 1.
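Generating the failure-cluster factor from a list of inter-failure times can be sketched as follows. The rule coded here is $z_i = 1$ when $t_{i-1}/t_i \ge 7$ or $t_{i-2}/t_i \ge 5$; the thresholds 7 and 5 come from the definition, while the direction of the comparison is an assumption that was checked against the rows of Table 8.13. The comparison is cross-multiplied so that a zero failure time does not cause a division by zero, and the function name is hypothetical.

```python
def failure_cluster_factor(times, ratio_prev=7, ratio_prev2=5):
    """Return z_i for each inter-failure time in `times`."""
    z = []
    for i, t in enumerate(times):
        # t_{i-1} / t_i >= ratio_prev, written without division
        near_prev = i >= 1 and times[i - 1] >= ratio_prev * t
        # t_{i-2} / t_i >= ratio_prev2, written without division
        near_prev2 = i >= 2 and times[i - 2] >= ratio_prev2 * t
        z.append(1 if (near_prev or near_prev2) else 0)
    return z

# First twelve failure times from Table 8.13.
first_times = [3, 30, 113, 81, 115, 9, 2, 91, 112, 15, 138, 50]
first_z = failure_cluster_factor(first_times)
print(first_z)
```

For the sixth error, 115 is at least 7 × 9, so $z_6 = 1$; for the fifth, neither ratio is reached, so $z_5 = 0$, matching the example in the text.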
For the J-M model, using the MLE, we obtain the estimates of the two
parameters, $N$ and $\phi$, as follows:
$$\hat{N} = 142, \qquad \hat{\phi} = 3.48893 \times 10^{-5}$$
Therefore, the current reliability of the software system is given by
$$R(t_{137}) = e^{-\hat{\phi}[\hat{N} - (137-1)]\, t_{137}}$$
Now, we want to predict the future failure behavior using only data collected in
the past after 136 errors have been found. For example, the reliability of the
software for the next 100 seconds after 136 errors are detected is given by
$$R(t_{137} = 100) = e^{-\hat{\phi}[\hat{N} - (137-1)]\, t_{137}} = e^{-(3.48893 \times 10^{-5})[142 - 136](100)} = 0.979284$$
Similarly, the reliability of the software for the next 1000 seconds is given by
$$R(t_{137} = 1000) = e^{-(0.0000348893)[142 - 136](1{,}000)} = 0.811123$$
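The two J-M reliability figures can be reproduced directly from the estimates N = 142 and a per-fault rate of 3.48893 × 10^-5, using R(t) = exp(-rate × (N - 136) × t) after 136 errors have been found. The function name below is hypothetical.

```python
import math

def jm_reliability(t, phi, N, failures_found):
    """Reliability over the next t seconds once `failures_found` errors are removed."""
    return math.exp(-phi * (N - failures_found) * t)

PHI_JM = 3.48893e-5
N_JM = 142

r_100 = jm_reliability(100, PHI_JM, N_JM, 136)
r_1000 = jm_reliability(1000, PHI_JM, N_JM, 136)
print(round(r_100, 6), round(r_1000, 6))
```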
Assume that we use the partial likelihood approach to estimate the environmental
factor parameter for the EPJM model. As there is only one factor in this example,
we can easily obtain the estimated parameter using the statistical software package
SAS:
$$\hat{\beta}_1 = 1.767109$$
with a significance level of 0.0001. Then the estimates of $N$ and $\phi$ are given as
follows:
$$\hat{N} = 141, \qquad \hat{\phi} = 3.28246 \times 10^{-5}$$
Table 8.13. Musa's failure time data with a generated covariate
Fault Time z Fault Time z
1 3 0 35 227 0
2 30 0 36 65 0
3 113 0 37 176 0
4 81 0 38 58 0
5 115 0 39 457 0
6 9 1 40 300 0
7 2 1 41 97 0
8 91 0 42 263 0
9 112 0 43 452 0
10 15 1 44 255 0
11 138 0 45 197 0
12 50 0 46 193 0
13 77 0 47 6 1
14 24 0 48 79 0
15 108 0 49 816 0
16 88 0 50 1351 0
17 670 0 51 148 1
18 120 0 52 21 1
19 26 1 53 233 0
20 114 0 54 134 0
21 325 0 55 357 0
22 55 0 56 193 0
23 242 0 57 236 0
24 68 0 58 31 1
25 422 0 59 369 0
26 180 0 60 748 0
27 10 1 61 0 1
28 1146 0 62 232 0
29 600 0 63 330 0
30 15 1 64 365 0
31 36 1 65 1222 0
32 4 1 66 543 0
33 0 1 67 10 1
34 8 0 68 16 1
69 529 0 103 108 0
70 379 0 104 0 1
71 44 1 105 3110 0
72 129 0 106 1247 0
73 810 0 107 943 0
74 290 0 108 700 0
75 300 0 109 875 0
76 529 0 110 245 0
77 281 0 111 729 0
78 160 0 112 1897 0
79 828 0 113 447 0
80 1011 0 114 386 0
81 445 0 115 446 0
82 296 0 116 122 0
83 1755 0 117 990 0
84 1064 0 118 948 0
85 1783 0 119 1082 0
86 860 0 120 22 1
87 983 0 121 75 1
88 707 0 122 482 0
89 33 1 123 5509 0
90 868 0 124 100 1
91 724 0 125 10 1
92 2323 0 126 1071 0
93 2930 0 127 371 0
94 1461 0 128 790 0
95 843 0 129 6150 0
96 12 1 130 3321 0
97 261 0 131 1045 1
98 1800 0 132 648 1
99 865 0 133 5485 0
100 1435 0 134 1160 0
101 30 1 135 1864 0
102 143 1 136 4116 0
Therefore,
$$E_i = e^{\hat{\beta}_1 z_i} = \begin{cases} 5.853905 & \text{for } z_i = 1 \\ 1 & \text{for } z_i = 0 \end{cases}$$
The current reliability of the software system is given by
$$R(t_{137}) = e^{-\hat{\phi} E_{137} [\hat{N} - (137-1)]\, t_{137}} = \begin{cases} e^{-9.6076 \times 10^{-4}\, t_{137}} & \text{for } z_{137} = 1 \\ e^{-1.64123 \times 10^{-4}\, t_{137}} & \text{for } z_{137} = 0 \end{cases}$$
Assuming that
$$P(Z = 1) = \frac{28}{136} = 0.20588, \qquad P(Z = 0) = \frac{108}{136} = 0.79412$$
the reliability of the software for the next 100 seconds is given by
$$R(t_{137} = 100) = \begin{cases} 0.90839 & \text{for } z = 1 \text{ with probability } 0.20588 \\ 0.98372 & \text{for } z = 0 \text{ with probability } 0.79412 \end{cases}$$
or, equivalently, that
$$R(t_{137} = 100) = 0.95375.$$
Similarly, the reliability of the software for the next 1000 seconds is given by
$$R(t_{137} = 1000) = \begin{cases} 0.3826 & \text{for } z = 1 \text{ with probability } 0.20588 \\ 0.84864 & \text{for } z = 0 \text{ with probability } 0.79412 \end{cases}$$
or
$$R(t_{137} = 1000) = 0.74021.$$
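The conditional EPJM reliabilities can be checked from the estimates N = 141, a per-fault rate of 3.28246 × 10^-5, and a factor coefficient of 1.767109. The sketch below reproduces the four conditional values. The text does not spell out how the single figures 0.95375 and 0.74021 are obtained; a plain probability-weighted average of the two conditional values, which is what the sketch computes, gives roughly 0.968 and 0.753 instead, so that averaging step should be read as an assumption rather than a reproduction of the quoted figures. Function names are hypothetical.

```python
import math

PHI_EPJM = 3.28246e-5
N_EPJM = 141
BETA1 = 1.767109
P_Z1 = 28 / 136  # share of failures in Table 8.13 with z = 1

def epjm_reliability(t, z, failures_found=136):
    """Reliability over the next t seconds given the cluster factor z of the next failure."""
    e_factor = math.exp(BETA1 * z)
    return math.exp(-PHI_EPJM * e_factor * (N_EPJM - failures_found) * t)

def weighted_reliability(t):
    """Assumed combination: probability-weighted average over z = 1 and z = 0."""
    return P_Z1 * epjm_reliability(t, 1) + (1 - P_Z1) * epjm_reliability(t, 0)

for horizon in (100, 1000):
    print(horizon,
          round(epjm_reliability(horizon, 1), 5),
          round(epjm_reliability(horizon, 0), 5),
          round(weighted_reliability(horizon), 5))
```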
In the next two applications, we use the Pham-Zhang NHPP model given in
equation (6.62) to illustrate the model with environmental factors. The
corresponding Pham-Zhang NHPP model baseline intensity function can be expressed as
follows:
$$\lambda_0(t) = \frac{1}{1 + \beta e^{-bt}} \left[(c + a)\, b e^{-bt} - \frac{ab}{b - \alpha}\left(b e^{-bt} - \alpha e^{-\alpha t}\right)\right] + \frac{\beta b e^{-bt}}{(1 + \beta e^{-bt})^2} \left[(c + a)(1 - e^{-bt}) - \frac{ab}{b - \alpha}\left(e^{-\alpha t} - e^{-bt}\right)\right] \qquad (8.16)$$
Any NHPP model can easily be integrated into a model with environmental
factors using equation (8.5).
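A numerical check of the baseline intensity is straightforward, since the intensity of an NHPP is the time derivative of its mean value function. The sketch below codes the Pham-Zhang mean value function $m(t) = \frac{1}{1+\beta e^{-bt}}\left[(c+a)(1-e^{-bt}) - \frac{ab}{b-\alpha}(e^{-\alpha t} - e^{-bt})\right]$, its derivative, and the proportional scaling by $e^{\beta_1 z_1}$. The parameter values are hypothetical and chosen only for illustration, and the function names are not from the text.

```python
import math

def pz_mean_value(t, a, b, c, alpha, beta):
    """Pham-Zhang mean value function m(t)."""
    ebt = math.exp(-b * t)
    eat = math.exp(-alpha * t)
    g = (c + a) * (1 - ebt) - a * b / (b - alpha) * (eat - ebt)
    return g / (1 + beta * ebt)

def pz_intensity(t, a, b, c, alpha, beta):
    """Baseline intensity, obtained as dm/dt with the quotient rule."""
    ebt = math.exp(-b * t)
    eat = math.exp(-alpha * t)
    h = 1 + beta * ebt
    g = (c + a) * (1 - ebt) - a * b / (b - alpha) * (eat - ebt)
    dg = (c + a) * b * ebt - a * b / (b - alpha) * (b * ebt - alpha * eat)
    return dg / h + beta * b * ebt * g / h ** 2

def intensity_with_factor(t, z1, beta1, **params):
    """Intensity with one environmental factor: baseline times exp(beta1 * z1)."""
    return pz_intensity(t, **params) * math.exp(beta1 * z1)

# Hypothetical parameter values.
params = dict(a=100.0, b=0.05, c=500.0, alpha=0.01, beta=5.0)

step = 1e-4
numeric = (pz_mean_value(30 + step, **params) - pz_mean_value(30 - step, **params)) / (2 * step)
print(pz_intensity(30, **params), numeric)
print(intensity_with_factor(30, 1, 0.0246, **params))
```

As $t$ grows, $m(t)$ approaches $c + a$, the total number of initial plus introduced faults, which gives a quick sanity check on any fitted parameter set.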
Application 8.2: The first set of software failure data was collected from testing a
program for a monitor and real-time control system (Tohma 1991) (also see data
set #8, Chapter 4). Table 8.14 records the software failures detected during a
111-day testing period. The only environmental factor available for this application is
the testing team size. Team size is one of the most useful measures in the software
development process since it has a close relationship with the testing effort, testing
efficiency and the development management issues.
From the correlation analysis of the 32 environmental factors, team size is the
only environmental factor correlated to the program complexity, which is the
number one significant factor according to our environmental factor study.
Intuitively, the more complex the software, the larger the development team.
Therefore, testing team size is an important factor to be incorporated into the
software reliability analysis.
Table 8.14 combines the information on testing team size with the software
failure data. It is interesting to note that there are two clusters where an increasing
number of faults is detected. Checking the testing team size, we find that the
team was enlarged for the periods associated with these two clusters
(day 11 to day 17 and day 36 to day 42). This indicates that testing team size is an
important factor to consider, at least for this data set.
Since the testing team size ranges from 1 to 8, we first categorize the factor of
team size into two levels. Let $z_1$ denote the factor of team size as follows:
$$z_1 = \begin{cases} 0 & \text{team size ranges from 1 to 4} \\ 1 & \text{team size ranges from 5 to 8} \end{cases}$$
After carefully examining the failure data, we find that after day 61, the
software turns stable and the failures occur with a much slower frequency.
Therefore, we use the first 61 data points for testing the goodness-of-fit and
estimating the parameters. Then we use the calibrated model to predict the
remaining 50 data points and compare the prediction to the 50 data points actually
observed (from day 62 to day 111) for examining the predictive power of software
reliability models.
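From equation (8.16), the intensity function with environmental factor is given by:
$$\lambda(t) = \lambda_0(t)\, e^{\beta_1 z_1} = \left\{ \frac{1}{1 + \beta e^{-bt}} \left[(c + a)\, b e^{-bt} - \frac{ab}{b - \alpha}\left(b e^{-bt} - \alpha e^{-\alpha t}\right)\right] + \frac{\beta b e^{-bt}}{(1 + \beta e^{-bt})^2} \left[(c + a)(1 - e^{-bt}) - \frac{ab}{b - \alpha}\left(e^{-\alpha t} - e^{-bt}\right)\right] \right\} e^{\beta_1 z_1} \qquad (8.17)$$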
First, the coefficient $\beta_1$ is estimated using the partial likelihood estimation method. The
partial likelihood method estimates the coefficients of covariates separately from
the parameters in the baseline intensity function. From equation (8.11), the
likelihood function of the partial likelihood method is given by
$$L(\beta) = \prod_{i} \left( \frac{\exp(\beta z_i)}{\left[\sum_{m \in R_i} \exp(\beta z_m)\right]^{d_i}} \right) \qquad (8.18)$$
where $d_i$ represents the number of tied failure times at $t_i$. The estimate of $\beta_1$ for our
example is $\hat{\beta}_1 = 0.0246$ with p-value 0.01, which indicates that this factor is
statistically significant. We then substitute
1
z (8.19)
The estimate of
1
' for this example is
1
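Table 8.15. (continued)
Model name: MVF $m(t)$ with $a(t)$ and $b(t)$; SSE; AIC

G-O model:
$m(t) = a(1 - e^{-bt})$; $a(t) = a$; $b(t) = b$
SSE = 1,052,528; AIC = 978.14

Delayed S-shaped:
$m(t) = a\left(1 - (1 + bt)e^{-bt}\right)$; $a(t) = a$; $b(t) = \dfrac{b^2 t}{1 + bt}$
SSE = 83,929.3; AIC = 983.90

Inflexion S-shaped:
$m(t) = \dfrac{a(1 - e^{-bt})}{1 + \beta e^{-bt}}$; $a(t) = a$; $b(t) = \dfrac{b}{1 + \beta e^{-bt}}$
SSE = 1,051,714.7; AIC = 980.14

Yamada exponential:
$m(t) = a\left(1 - e^{-r\alpha(1 - e^{-\beta t})}\right)$; $a(t) = a$; $b(t) = r\alpha\beta e^{-\beta t}$
SSE = 1,085,650.8; AIC = 979.88

Yamada Rayleigh:
$m(t) = a\left(1 - e^{-r\alpha(1 - e^{-\beta t^2/2})}\right)$; $a(t) = a$; $b(t) = r\alpha\beta t\, e^{-\beta t^2/2}$
SSE = 86,472.3; AIC = 967.92

Imperfect debugging (1):
$m(t) = \dfrac{ab}{\alpha + b}\left(e^{\alpha t} - e^{-bt}\right)$; $a(t) = a e^{\alpha t}$; $b(t) = b$
SSE = 791,941; AIC = 981.44

Imperfect debugging (2):
$m(t) = a\left[1 - e^{-bt}\right]\left[1 - \dfrac{\alpha}{b}\right] + \alpha a t$; $a(t) = a(1 + \alpha t)$; $b(t) = b$
SSE = 238,324; AIC = 984.62

PNZ model:
$m(t) = \dfrac{a}{1 + \beta e^{-bt}}\left[(1 - e^{-bt})\left(1 - \dfrac{\alpha}{b}\right) + \alpha t\right]$; $a(t) = a(1 + \alpha t)$; $b(t) = \dfrac{b}{1 + \beta e^{-bt}}$
SSE = 94,112.2; AIC = 965.37

PZ model:
$m(t) = \dfrac{1}{1 + \beta e^{-bt}}\left[(c + a)(1 - e^{-bt}) - \dfrac{ab}{b - \alpha}\left(e^{-\alpha t} - e^{-bt}\right)\right]$; $a(t) = c + a(1 - e^{-\alpha t})$; $b(t) = \dfrac{b}{1 + \beta e^{-bt}}$
SSE = 86,180.8; AIC = 960.68

Environmental factor model:
$m(t) = \dfrac{1}{1 + \beta e^{-bt}}\left[(c + a)(1 - e^{-bt}) - \dfrac{ab}{b - \alpha}\left(e^{-\alpha t} - e^{-bt}\right)\right] e^{\beta_1 z_1}$; $a(t) = c + a(1 - e^{-\alpha t})$; $b(t) = \dfrac{b}{1 + \beta e^{-bt}}$
SSE = 560.82; AIC = 890.68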
From Table 8.16, we can see that increases in faults are associated with
increases in code size. This indicates that change of code size is an important factor
to be considered. From Table 8.17 we can see that the environmental factor model
seems to provide the best predictive power according to the SSE and AIC values.
Other measures such as the testing effort and development cost are usually
estimated based on code size. Therefore, code size is an important factor to be
incorporated into the software reliability analysis.
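Let $z_c$ denote the factor of changed code size as follows:
$$z_c = \begin{cases} 0 & \text{changed code} \le 1{,}000 \text{ NLOC} \\ 1 & 1{,}000 \text{ NLOC} < \text{changed code} \le 5{,}000 \text{ NLOC} \\ 2 & 5{,}000 \text{ NLOC} < \text{changed code} \le 10{,}000 \text{ NLOC} \\ 3 & 10{,}000 \text{ NLOC} < \text{changed code} \end{cases} \qquad (8.20)$$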
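The four-level coding of changed code size can be written as a small lookup. The sketch below assumes the boundaries are 1,000, 5,000, and 10,000 NLOC with each upper bound inclusive; the function name is hypothetical, and how the amount of changed code is measured for each row of the data is not specified here.

```python
def changed_code_level(changed_nloc):
    """Map an amount of changed code (NLOC) to the factor level 0, 1, 2, or 3."""
    if changed_nloc <= 1000:
        return 0
    if changed_nloc <= 5000:
        return 1
    if changed_nloc <= 10000:
        return 2
    return 3

levels = [changed_code_level(x) for x in (500, 1000, 1001, 5000, 7500, 10000, 16012)]
print(levels)
```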
After carefully examining the failure data, we find that the failures occur with
a much slower frequency after 1013.9 staff days of testing. Therefore, we use data
up to the 1013.9 staff-days to fit the models and estimate the parameters, and use
the calibrated model to predict the remaining data and compare the predictive power of
software reliability models.
Similar to the analysis of Application 8.2, the estimate of $\beta_1$ for our example is
$\hat{\beta}_1 = 0.00567$ with p-value 0.048, which indicates that this factor is significant to
consider. The estimates of parameters in the baseline failure intensity function in
equation (8.17) are as follows:
$$\hat{a} = 101.0, \quad \hat{b} = 0.004, \quad \hat{\beta} = 8.9, \quad \hat{\alpha} = 0.0148, \quad \text{and} \quad \hat{c} = 803.5.$$
Table 8.17 lists the SSE and AIC values for the model comparison. From the
results it is seen that the number of initial faults is $c \approx 804$ and the number of
introduced faults is $a \approx 102$. Therefore, the total number of faults in the software is
about 906. By the end of the software testing, 870 faults were detected, which
implies that the number of residual faults is about 36.
Table 8.16. Software testing data for Application 8.3
Staff days  Faults  Code size  z  Staff days  Faults  Code size  z  Staff days  Faults  Code size  z
0 0 0 0 207.2 97 213093 2 424.9 321 272457 1
4.8 0 16012 3 211.9 98 219248 2 434.2 326 273741 1
6 0 16012 3 217 105 221355 1 442.7 339 275025 1
14.3 7 32027 3 223.5 113 223462 1 451.4 346 276556 1
22.8 7 48042 3 227 113 225568 1 456.1 347 278087 1
32.1 7 58854 3 234.1 122 227675 1 460.8 351 279618 1
41.4 7 69669 3 241.6 129 229784 1 466 356 281149 1
51.2 11 80483 3 250.7 141 233557 1 472.3 359 283592 1
60.6 12 91295 3 259.8 155 237330 1 476.4 362 286036 1
70 13 102110 3 268.3 166 241103 1 480.9 367 288480 1
79.9 15 112925 3 277.2 178 244879 1 486.8 374 290923 1
91.3 20 120367 2 285.5 186 247946 1 495.8 376 293367 1
97 21 127812 2 294.2 190 251016 1 505.7 380 295811 1
107.7 22 135257 2 298 190 254086 1 516 392 298254 1
119.1 28 142702 2 305.2 195 257155 1 526.2 399 300698 1
127.6 40 150147 2 312.3 201 260225 1 527.3 401 300698 1
135.1 44 152806 1 318.2 209 260705 0 535.8 405 303142 1
142.8 46 155464 1 328.9 224 261188 0 546.3 415 304063 0
148.9 48 158123 1 334.8 231 261669 0 556.1 425 305009 0
156.6 52 160781 1 342.7 243 262889 0 568.1 440 305956 0
163.9 52 167704 2 350.5 252 263629 0 577.2 457 306902 0
169.7 59 174626 2 356.3 259 264367 0 578.3 457 306902 0
170.1 59 174626 2 360.6 271 265107 0 587.2 467 307849 0
174.7 63 181548 2 365.7 277 265845 0 595.5 473 308795 0
179.6 68 188473 2 386.5 290 267325 1 605.6 480 309742 0
185.5 71 194626 2 396.5 300 268607 1 613.9 491 310688 0
194 88 200782 2 408 310 269891 1 621.6 496 311635 0
200.3 93 206937 2 417.3 312 271175 1 623.4 496 311635 0
636.3 502 311750 0 938.3 710 330435 0 1231.6 842 333481 0
649.7 517 311866 0 952 720 330263 0 1240.9 844 333695 0
663.9 527 312467 0 965 729 330091 0 1249.5 845 333909 0
675.1 540 313069 0 967.7 729 330091 0 1262.2 849 335920 1
677.4 543 313069 0 968.6 731 330091 0 1271.3 851 337932 1
677.9 544 313069 0 981.3 740 329919 0 1279.8 854 339943 1
688.4 553 313671 0 997 749 329747 0 1281 854 339943 1
698.1 561 314273 0 1013.9 759 330036 0 1287.4 855 341955 1
710.5 573 314783 0 1030.1 776 330326 0 1295.1 859 341967 0
720.9 581 315294 0 1044 781 330616 0 1304.8 860 341979 0
731.6 584 315805 0 1047 782 330616 0 1305.8 865 342073 0
732.7 585 315805 0 1059.7 783 330906 0 1313.3 867 342168 0
733.6 585 315805 0 1072.6 787 331196 0 1314.4 867 342168 0
746.7 586 316316 0 1085.7 793 331486 0 1320 867 342262 0
761 598 316827 0 1098.4 796 331577 0 1325.3 867 342357 0
776.5 612 318476 1 1112.4 797 331669 0 1330.6 870 342357 0
793.5 621 320125 1 1113.5 798 331669 0 1334.2 870 342358 0
807.2 636 321774 1 1141.1 798 331669 0 1336.7 870 342358 0
811.8 639 321774 1 1128 802 331760 0
812.5 639 321774 1 1139.1 805 331852 0
829 648 323423 1 1151.4 811 331944 0
844.4 658 325072 1 1163.2 823 332167 0
860.5 666 326179 1 1174.3 827 332391 0
876.7 674 327286 1 1184.6 832 332615 0
892 679 328393 1 1198.3 834 332939 0
895.5 686 328393 1 1210.3 836 333053 0
910.8 690 329500 1 1221.1 839 333267 0
925.1 701 330608 1 1230.5 842 333481 0
Table 8.17. Model comparison
Model name SSE AIC
G-O model 240,773 1473.5
Delayed S-shaped 4,322 1422.6
Inflexion S-shaped 246,702 1481.5
Yamada exponential 230,955 1471.1
Yamada Rayleigh 8,824 1441.3
Imperfect debugging (1) 290,449 1491.7
Imperfect debugging (2) 364,398 1496.1
PNZ model 17,753 1441.7
PZ model 8,947 1436.5
Environmental factor model 1,182 1411.0
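Ranking the models on the two criteria in Table 8.17 is mechanical, with lower values preferred for both SSE and AIC. The following short sketch copies the table into a dictionary and sorts it; the variable names are hypothetical.

```python
# (SSE, AIC) pairs copied from Table 8.17.
model_fit = {
    "G-O model": (240773, 1473.5),
    "Delayed S-shaped": (4322, 1422.6),
    "Inflexion S-shaped": (246702, 1481.5),
    "Yamada exponential": (230955, 1471.1),
    "Yamada Rayleigh": (8824, 1441.3),
    "Imperfect debugging (1)": (290449, 1491.7),
    "Imperfect debugging (2)": (364398, 1496.1),
    "PNZ model": (17753, 1441.7),
    "PZ model": (8947, 1436.5),
    "Environmental factor model": (1182, 1411.0),
}

by_sse = sorted(model_fit, key=lambda name: model_fit[name][0])
by_aic = sorted(model_fit, key=lambda name: model_fit[name][1])

print("Best by SSE:", by_sse[0])
print("Best by AIC:", by_aic[0])
```

Both orderings place the environmental factor model first and the delayed S-shaped model second.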
8.9 Further Reading
Some interesting research papers and books on this subject include, but are not limited to:
Zhang, X. and Pham, H., An analysis of factors affecting software reliability,
Journal of Systems and Software, 1999
Venkatesh, G. A. and Fischer, C. N., SPARE: A development environment for
program analysis algorithms, IEEE Trans. on Software Engineering, vol. 18, no. 4,
April 1992
Madhavji, N. H., Environment evolution: The Prism model of changes, IEEE
Trans. on Software Engineering, vol. 18, no. 5, May 1992
8.10 Problems
1. Using the real-time control system as in Table 4.12 (data set #8, Chapter 4),
calculate the MLE for unknown parameters of the EPJM model discussed in
Section 8.7.
2. Based on the first 60 days in Table 4.12 (data set #8, Chapter 4), calculate the
MLE for unknown parameters of the EPJM model.
3. Let us define a failure-cluster factor, such as
$$z_i = \begin{cases} 1 & \text{when } \dfrac{t_{i-1}}{t_i} \ge 10 \text{ or } \dfrac{t_{i-2}}{t_i} \ge 12 \\ 0 & \text{otherwise} \end{cases}$$
Using the software failure data set #9 in Chapter 4, obtain the entire data set with
the environmental factor variable $z_i$. Then estimate the two parameters, $N$ and $\phi$, of the
EPJM model.