GOODNESS-OF-FIT TECHNIQUES
edited by
Ralph B. D’Agostino
Michael A. Stephens
STATISTICS: Textbooks and Monographs
A series edited by
Department of Statistics
Southern Methodist University
Dallas, Texas
GOODNESS-OF-FIT TECHNIQUES

edited by

Ralph B. D'Agostino
Department of Mathematics
Boston University
Boston, Massachusetts

Michael A. Stephens
Department of Mathematics and Statistics
Simon Fraser University
Burnaby, British Columbia, Canada
Neither this book nor any part may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, microfilming, and recording, or by any information storage and retrieval system, without permission in writing from the publisher.

MARCEL DEKKER, INC.
270 Madison Avenue, New York, New York 10016
From the earliest days of statistics, statisticians have begun their analysis by proposing a distribution for their observations and then, perhaps with somewhat less enthusiasm, have checked on whether this distribution is true. Thus over the years a vast number of test procedures have appeared, and the study of these procedures has come to be known as goodness-of-fit. When several of the present authors met at the Annual Meeting of the American Statistical Association in Boston in 1976 and proposed writing a book on goodness-of-fit techniques, we certainly did not foresee the magnitude of the task ahead. Quite early on we asked Professor E. S. Pearson if he would join us. He declined and stated his view that the time was not yet ripe for a book on the subject. As we, nevertheless, have slowly written it, it has often appeared that his assessment was correct. As fast as we have tried to survey what we know, with every issue the journals produce new papers with new techniques and new information.

However, many colleagues have told us that the time is ready for a major summary of the literature, and for some sorting and sifting to take place. This we have tried to do. The emphasis of this book was determined by the writers to be mostly on the practical side. The intent is to give a survey of the leading methods of testing fit, to provide tables where necessary to make the tests available, to make (where possible) some assessment of the comparative merits of different test procedures, and finally to supply numerical examples to aid in understanding the techniques.

This applied emphasis has led to some difficult decisions. Many goodness-of-fit techniques are supported by elegant mathematics involving combinatorics, analysis, and geometric probability, mostly arising in the distribution theory, both small-sample and asymptotic, or in examining power and efficiency. Furthermore, there are many unsolved problems, especially in
Ralph B. D'Agostino
Michael A. Stephens
Acknowledgments
from Stephens (1974), and Chandra, Singpurwalla, and Stephens (1981) for Chapter 4.

We thank the editors of the Journal of the Royal Statistical Society and the Royal Statistical Society for permission to reprint tables from Stephens (1970), Pettitt (1977), Burrows (1979), and Stephens (1981) in Chapters 4 and 8. We thank the editor of Communications in Statistics and Marcel Dekker, Inc., for permission to reprint tables from Miller and Quesenberry (1979), and Solomon and Stephens (1983) in Chapter 8. Finally we thank the editors of the Journal of Statistical Computation and Simulation and Gordon and Breach Science Publishers, Inc., for permission to reprint a table from Quesenberry and Miller (1977).
Contents

Preface v
Acknowledgments ix
Contributors xviii

1. OVERVIEW
Ralph B. D'Agostino and Michael A. Stephens

2. GRAPHICAL ANALYSIS
Ralph B. D'Agostino
2.1 Introduction 7
2.2 Empirical Cumulative Distribution Function 8
2.3 General Concepts of Probability Plotting 24
2.4 Normal Probability Plotting 35
2.5 Lognormal Probability Plotting 47
2.6 Weibull Probability Plotting 54
2.7 Other Topics 57
2.8 Concluding Comment 59
References 59
3. TESTS OF CHI-SQUARED TYPE 63
David S. Moore
3.1 Introduction 63
3.2 Classical Chi-Squared Statistics 64
3.3 General Chi-Squared Statistics 75
3.4 Recommendations on Use of Chi-Squared Tests 91
References 93
4. TESTS BASED ON EDF STATISTICS 97
Michael A. Stephens
4.1 Introduction 97
4.2 Empirical Distribution Function Statistics 97
4.3 Goodness-of-Fit Tests Based on the EDF (EDF Tests) 102
4.4 EDF Tests for a Fully Specified Distribution (Case 0) 104
4.5 Comments on EDF Tests for Case 0 106
4.6 Power of EDF Statistics for Case 0 110
4.7 EDF Tests for Censored Data: Case 0 111
4.8 EDF Tests for the Normal Distribution with Unknown Parameters 122
4.9 EDF Tests for the Exponential Distribution 133
4.10 EDF Tests for the Extreme-Value Distribution 145
4.11 EDF Tests for the Weibull Distribution 149
4.12 EDF Tests for the Gamma Distribution 151
4.13 EDF Tests for the Logistic Distribution 156
4.14 EDF Tests for the Cauchy Distribution 160
4.15 EDF Tests for the von Mises Distribution 164
4.16 EDF Tests for Continuous Distributions: Miscellaneous Topics 166
4.17 EDF Tests for Discrete Distributions 171
4.18 Combinations of Tests 176
4.19 EDF Statistics as Indicators of Parent Populations 180
4.20 Tests Based on Normalized Spacings 180
References 185
APPENDIX 523
INDEX 551
There are several reasons for this. First, the distribution of sample data may throw light on the process that generated the data; if a suggested model for the process is correct, the sample data follow a specific distribution, which can be tested. Also, parameters of the distribution may be connected with important parameters in describing the basic model. Secondly, knowledge of the distribution of data allows for application of standard statistical testing and estimation procedures. For example, if the data follow a normal distribution, inferences concerning the means and variances can be made using t tests, analyses of variance, and F tests; similarly, if the residuals after fitting a regression model are normal, tests may be made on the model parameters. Estimation procedures such as the calculation of confidence intervals, tolerance intervals, and prediction intervals often depend strongly on the underlying distribution. Finally, when a distribution can be assumed, extreme tail percentiles, which are needed, for example, in environmental work, can be computed.
The fact that it is usually hoped to accept the null hypothesis and proceed with other analyses as if it were true sets goodness-of-fit testing apart from most statistical testing procedures. In many testing situations it is rejection of the null hypothesis which appears to prove a point. This might be so, for example, in a test for no treatment effects in a factorial analysis: rejection of H0 indicates one or more treatments to be better than others. Even when one would like to accept a null hypothesis, for example, in a test for no interaction in the above factorial analysis, the statistical test is usually clear and the only problem is with the level of significance. In a test of fit, where the alternative is very vague, the appropriate statistical test will often be by no means clear, and no general theory of Neyman-Pearson type appears applicable in these situations. Thus many different, sometimes elaborate, procedures have been generated to test the same null hypothesis, and the ideas and motivations behind these are diverse. Even when concepts such as statistical power of the procedures are considered, it rarely happens that one testing procedure emerges as superior.

It may happen that the alternative hypothesis has some specification, although it could be incomplete; for example, an alternative to the null hypothesis of normality may be that the random variable has positive skewness. When the alternative distribution contains some such specification, tests of fit should be designed to be sensitive to it. Even in these situations uniquely best tests are rarities.

In addition to formal hypothesis testing procedures, goodness-of-fit techniques also include less formal methods, in particular, graphical techniques. These have a long history in statistical analysis. Graphs are drawn so that adherence to or deviation from the hypothesized distribution results in certain features of the graph. For example, in the probability plot the ordered observations are plotted against functions of the ranks. In such plots a straight line indicates that the hypothesized distribution is a reasonable model for the data, and deviations from the straight line indicate inappropriateness of the model. The type of departure from the straight line may indicate the nature of the true distribution. Historically the straight line has been judged by eye, and it is only recently that more formal techniques have been given.
have not been made to assess their merits; for these and similar reasons, some subjects have been lightly treated, if at all.

In goodness-of-fit there are many areas with unsolved problems, or unanswered questions. Some of the subjects on which there will surely be much work in the future include tests for censored data, especially for randomly censored data, tests based on the empirical characteristic function, tests based on spacings, and tests for multivariate distributions, especially for multivariate normality. Many comparisons between techniques are still needed, and also the exploration of wider questions such as the relationship of formal goodness-of-fit testing (as, indeed, in other forms of testing) to modern, more informal, approaches to statistical analysis where distributional models are not so rigidly specified. We hope this book sets forth the major topics of its subject, and will act as a base from which these and many other questions can be explored.
In addition to this chapter the book consists of eleven other chapters. These are divided into three groups. The first consists of Chapters 2 to 7, containing general concepts applicable to testing for a variety of distributions.

Chapter 2 describes graphical procedures for evaluating goodness-of-fit. These are informal procedures based mainly on the probability plot, useful for exploring data and for supplementing the formal testing procedures of the other chapters.

Chapter 3 reviews chi-squared-type tests. The classical chi-squared goodness-of-fit tests are reviewed first, and then recent developments involving general quadratic forms and nonstandard chi-squared statistics are discussed.

Chapter 4 presents tests based on the empirical distribution function (EDF). These tests include the classical Kolmogorov-Smirnov test and other tests such as the Cramér-von Mises and Anderson-Darling tests. Consideration is given to simple and composite null hypotheses. The normal, exponential, extreme-value, Weibull, and gamma distributions, among other distributions, are given individual discussion.

Chapter 5 deals with tests based on regression and correlation. Some of these procedures can be viewed as arising from computing a correlation coefficient from a probability plot and testing if it differs significantly from unity. Also involved are tests based on comparisons of linear regression estimates of the scale parameter of the hypothesized distribution to the estimate coming from the sample standard deviation. The Shapiro-Wilk test for normality is one such test.

In Chapter 6 transformation techniques are reviewed. Here the data are first transformed to uniformity, and goodness-of-fit tests for uniformity are applied to these transformed data. These techniques can deal with simple and composite hypotheses.

Tests based on the third and fourth sample moments are presented in Chapter 7. These techniques were first developed to test for normality. In Chapter 7 they are extended to nonnormal distributions.

The second group of chapters consists of Chapters 8, 9, and 10. These deal with tests for three distributions, the uniform, the normal, and the exponential, which have played prominent roles in statistical methodology. Many tests for these distributions have been devised, often based on the methods of previous chapters, and they are brought together, for each distribution, in these three chapters.

Chapters 11 and 12 form the last group; they cover extra materials. The problem of analyzing censored data is of great importance, and Chapter 11 is devoted to this. Many of the previous chapters have sections on censored data. Chapter 11 collects these together, fills in some omissions, and gives examples; there is also a discussion on probability plotting of censored data.

The final Chapter 12 is on the analysis and detection of outliers. This material might be considered outside the direct scope of goodness-of-fit techniques; however, it is closely related to them, since they are often applied with this problem in mind, so we felt it would be useful to close the book with a chapter on outliers.
Graphical Analysis

2.1 INTRODUCTION

In the following we will point out the specific relations between some graphical procedures and those formal numerical tests that quantify the information revealed in the graphs.
2.2 EMPIRICAL CUMULATIVE DISTRIBUTION FUNCTION

2.2.1 Definition

The empirical cumulative distribution function (ecdf) is defined as

Fn(x) = #(Xi ≤ x)/n,   -∞ < x < ∞   (2.1)

where #(Xi ≤ x) is read, the number of Xi's less than or equal to x. The ecdf is also often called the edf, empirical distribution function. The plot of the ecdf is done on arithmetic graph paper, plotting i/n as ordinate against the ith ordered value of the sample as abscissa. Figure 2.1a is an ecdf plot of the data set NOR given in the appendix, which is a random sample of size 100 from the normal distribution with mean 100 and standard deviation 10.
The ecdf plot provides an exhaustive representation of the data. For all x values Fn(x) converges for large samples to F(x), the value of the underlying distribution's cdf at x. This convergence is actually strong convergence uniformly for all x (Rényi, 1970, p. 400).

The use of the ecdf plot does not depend upon any assumptions concerning the underlying parametric distribution, and it has some definite advantages over other statistical devices, viz.,

1. It is invariant under monotone transformations with regard to quantiles. However, its appearance may change.
2. Its complexity is independent of the number of observations.
FIGURE 2.1 Empirical distribution function of NOR data set. (a) Ecdf of full data set (n = 100). (b) Ecdf of first ten observations.
Two other technical points are worth mentioning here. First, as defined by formula (2.1) the ecdf is actually a step function with steps or jumps at the values of the variable that occur in the data. Figure 2.1a does not display the ecdf as a step function. Very often it is not displayed as such, especially when the sample size is large and the underlying variable is continuous, as is the case with the NOR data. Figure 2.1b displays the ecdf as a step function for the first ten observations of the NOR data set. The ordered values of these first ten observations along with their ecdf values are:
Ordered observations

Number (i)   Value    Fn(x) = i/n
 1            84.27    .1
 2            90.87    .2
 3            92.55    .3
 4            96.20    .4
 5            98.70    .5
 6            98.98    .6
 7           100.42    .7
 8           101.58    .8
 9           106.82    .9
10           113.75   1.0
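The ecdf values in the table can be reproduced with a few lines of code. The sketch below (ours, not from the book) simply applies definition (2.1) to the ten ordered NOR observations listed above.

```python
# Minimal sketch of definition (2.1): Fn(x) = #(Xi <= x) / n.
def ecdf(sample, x):
    """Proportion of sample values less than or equal to x."""
    return sum(v <= x for v in sample) / len(sample)

# The first ten ordered NOR observations, as listed in the table above.
nor10 = [84.27, 90.87, 92.55, 96.20, 98.70,
         98.98, 100.42, 101.58, 106.82, 113.75]

print(ecdf(nor10, 98.70))   # 0.5, the table entry for i = 5
print(ecdf(nor10, 84.27))   # 0.1, the table entry for i = 1
```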
Second, if the data set consists of grouped data and the variable is continuous, then the ecdf should be defined so that the steps occur at the true upper class limits. For example, if the frequency table is

Classes   Frequency   Upper class limit   Fn(x)
10-13     15          13.5                 .30
14-17     20          17.5                 .70
18-21     15          21.5                1.00
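For grouped data the step heights follow directly from the cumulative frequencies; a small sketch (assuming the 50 observations of the frequency table above):

```python
# Sketch: ecdf steps at the true upper class limits for the frequency
# table above (classes 10-13, 14-17, 18-21; frequencies 15, 20, 15).
limits = [13.5, 17.5, 21.5]   # true upper class limits of a continuous variable
freqs = [15, 20, 15]
n = sum(freqs)                # 50 observations in all

cum = 0
steps = []
for ub, f in zip(limits, freqs):
    cum += f
    steps.append((ub, cum / n))

print(steps)   # [(13.5, 0.3), (17.5, 0.7), (21.5, 1.0)]
```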
2.2.2 Investigation of Symmetry
f(x) = (1/θ) e^(-x/θ)

and

f(x) = [δ / (σ√(2π) √(1 + ((x - μ)/σ)²))] exp{-½ [γ + δ sinh⁻¹((x - μ)/σ)]²}

[Figure 2.2: density and cumulative curves for the normal (symmetric), negative exponential (positive skew), and Johnson SU(1,2) (negative skew) distributions]
Sample percentiles

P (and 100 - P)   NOR   EXP    SU(1,2)
10                 21    3.5    1.75
90                 25   15.0    1.55
20                 11    2.3     .50
80                  9    6.0     .85
25                 10    2.0     .15
75                  7    3.5     .55
40                  4     .7     .05
60                  3    1.1     .20
If the distribution has positive skewness, the portion of the ecdf for i/n values close to 1 (e.g., greater than .9) will usually be longer and flatter (almost parallel to the horizontal axis) than the rest of the ecdf. Similarly, if the distribution has negative skewness, the long flat portion will lie in the lower end of its ecdf (e.g., i/n values less than .1). The ecdfs from both the EXP and SU(1,2) data sets behave as expected.

Another, more sensitive and informative graph for studying asymmetry is a simple scatter diagram plotting the upper half of the ordered observations against the lower. That is, letting X(1), X(2), ..., X(n) represent the ordered observations, plot X(n) versus X(1), X(n-1) versus X(2), and in general, X(n+1-i) versus X(i) for i ≤ n/2. Figure 2.4 contains these plots for the NOR, EXP, and SU(1,2) data sets. A negative unit slope indicates symmetry, a negative slope exceeding unity in absolute value indicates positive skewness, and a negative slope less than unity in absolute value indicates negative skewness. Notice how well this technique identifies the behavior of the distribution with respect to symmetry. Note also that not all of the observations are plotted. They are not needed usually for a correct visual identification.
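As a numerical companion to the upper-versus-lower plot, the slope of the plotted pairs can be fitted by least squares; this sketch (ours, not from the book) uses the ten ordered NOR observations of Section 2.2.1 and should give a slope near -1 for such roughly symmetric data.

```python
# Sketch: least-squares slope of X(n+1-i) versus X(i), i <= n/2.
# Slope near -1 suggests symmetry; steeper than -1 in absolute value
# suggests positive skewness, flatter suggests negative skewness.
def symmetry_slope(ordered):
    n = len(ordered)
    xs = ordered[: n // 2]                   # lower half, X(i)
    ys = list(reversed(ordered))[: n // 2]   # matching upper half, X(n+1-i)
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

nor10 = [84.27, 90.87, 92.55, 96.20, 98.70,
         98.98, 100.42, 101.58, 106.82, 113.75]
print(symmetry_slope(nor10))   # close to -1 for this roughly symmetric sample
```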
Another useful plotting technique involves plotting the sums X(n+1-i) + X(i) against the differences X(n+1-i) - X(i), which would produce a horizontal configuration for a symmetric distribution (Wilk and Gnanadesikan, 1968). A plot of the (100 - P)th sample percentile versus the Pth sample percentile
for 0 < P < 50 is called a symmetry plot and is also useful (Chambers et al., 1983).

Formal numerical techniques for investigating and testing for symmetry are often based on the sample skewness statistic computed from the third sample moment. A full treatment of this procedure is given in Chapter 7.
2.2.3 Detection of Outliers

FIGURE 2.5 Plots of ecdfs from NOR data set illustrating effect of an outlier. (a) Ecdf of first ten observations. (b) Ecdf of first nine plus one outlier.

At times we may be dealing with samples that arise as mixtures of two or more distributions. For example, the author once was involved in a study dealing with taking measurements on parasite-transmitting snails obtained from field sampling. There was no nonstatistical way to separate the different generations (i.e., age groups) of snails in the sample. The parameters that were desired were related to age. The author was also involved in
another study dealing with oral glucose tolerance test data. In this study it was suggested that there might exist two subpopulations, normals and diabetics. The data set consisted mainly of normals. Again there was no simple nonstatistical way of removing the small "contaminating" subsample of diabetics. In both of these situations the graphical techniques of this chapter proved to be extremely useful.

Unless the component distributions of the mixture are very distinct (e.g., the difference between the means is much larger than the individual distributions' standard deviations), the ecdf of the combined sample may not supply much information to aid in determining if a mixture exists. Figure 2.6 illustrates the problem. It contains separate and combined densities of mixtures of normal distributions. If the component distributions are "close" as in (a) and (b) of Figure 2.6, the combined distribution may very well be unimodal.

Figure 2.7 further illustrates the problem. These are ecdfs from mixtures of two normal distributions. The main underlying distribution is the normal distribution with mean zero and standard deviation unity. However, the sampling was done in such a way that for each observation drawn there was a probability π that the observation would come from the normal distribution with mean 3 and standard deviation unity. The data set for (a) of Figure 2.7 had π = .1 (data set LCN(.10,3) of the appendix) and the set
for (b) of Figure 2.7 had π = .2 (data set LCN(.20,3)). The ecdfs in Figure 2.7 look very much like those that are produced by positively skewed distributions. In fact the population cdfs are positively skewed. The contamination "caused" the skewness.

If the component distributions are "well separated" as in (c) of Figure 2.6, the resulting mixture will be bimodal, and with sufficient data available, the ecdf will show the changes from concavity to convexity to concavity as does the cdf of (c) in Figure 2.6. In general, only under the condition of substantial separation of the components will the ecdf reveal bimodality.

There is an extensive literature on mixtures (see, for example, Johnson and Kotz, 1970, Section 7.2), and the usual procedure is to assume some functional form for the components or for the major distribution or distributions of the components. Specific parametric techniques are then employed
to establish if a mixture does exist and to estimate the parameters of the components (e.g., Bliss, 1967, Chapter 7). Given these assumptions about the functional form of the underlying distributions, graphical techniques such as the probability plotting techniques which will be discussed later in this chapter can be very useful in detecting the presence of mixtures, even in situations such as those in Figure 2.6a and 2.6b. These probability plotting techniques are the graphical techniques we recommend for use. Other graphical procedures are given in Harding (1949) and Taylor (1965).
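The contaminated-normal construction described above for the LCN data sets is easy to simulate; a sketch follows (the function name and seed are ours, not the book's).

```python
# Sketch of the LCN(pi, 3) sampling scheme: each draw is N(0, 1), except
# with probability pi it comes from N(3, 1).
import random

def lcn_sample(n, pi, shift=3.0, seed=1):
    rng = random.Random(seed)
    return [rng.gauss(shift if rng.random() < pi else 0.0, 1.0)
            for _ in range(n)]

data = lcn_sample(1000, 0.2)
# Contamination inflates the upper tail: for N(0,1) alone about 7% of
# values exceed 1.5, while this mixture should give a clearly larger share.
frac_high = sum(x > 1.5 for x in data) / len(data)
print(round(frac_high, 2))
```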
2.2.5 Assessing Tail Thickness

f(x) = (1/θ) e^(-x/θ),   θ > 0, x > 0   (2.2a)

F(x) = 1 - e^(-x/θ)   (2.2b)

1 - F(x) = e^(-x/θ)

and

f(x) = (k/θ)(x/θ)^(k-1) e^(-(x/θ)^k),   θ > 0, x > 0, k > 0   (2.3a)

F(x) = 1 - e^(-(x/θ)^k)   (2.3b)

f(x) = [1/(xσ√(2π))] exp{-(ln x - μ)²/(2σ²)}   (2.4)
Its cdf, F(x), does not have a closed form representation. Figure 2.8 contains plots of 1 - F(x) versus x on semi-log paper for a negative exponential, a Weibull with k > 1, and a lognormal distribution. Notice the negative exponential produces a straight line, the lognormal distribution curves upward, and the Weibull with k > 1 curves downward. A distribution that curves downward is termed "light tailed." A heavy tailed distribution has a probability density function whose upper tail approaches zero less rapidly than the
FIGURE 2.9 Plots of data sets on semi-log graph paper (for determining tail thickness). (a) EXP data set. (b) WE2 data set.
Fn(x) = (#(Xi ≤ x) - .5)/n = (i - .5)/n   (2.5)

and plot the data only for Fn(x) > .50. Note in (2.5) i = #(Xi ≤ x).
2.2.5.1 Example

Figure 2.9 contains the above described plots for the EXP and WE2 (Weibull with k = 2) data sets. Consider first the EXP data set plotted in Figure 2.9a. The dots represent the observed values of 1 - Fn(x) for Fn(x) > .50. These appear to lie roughly on a straight line. If the negative exponential distribution is an adequate model for these data, then a straight line for the theoretical exponential as in Figure 2.8 should fit the observed points. To obtain the theoretical line we need an estimate of θ. The parameter θ in (2.2) can be estimated by

θ̂ = -x / ln(1 - Fn(x))   (2.6)
model which accounts well for all the data points except possibly the last two. Note that in judging the goodness-of-fit of these points it is the horizontal distances from the points to the line that are important, not the vertical distances. The last data point, in particular, may appear to be further away from the line than might be expected. Such variability in the extreme observation is, however, often observed.
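Equation (2.6) follows from taking logarithms in (2.2b): since 1 - F(x) = e^(-x/θ), we have θ = -x/ln(1 - F(x)). A one-line sketch (the value x = .98 is taken from the WE2 illustration that follows, where the x with Fn(x) = .6321 gives θ̂ = .98):

```python
# Sketch of (2.6): theta-hat = -x / ln(1 - Fn(x)).
# At Fn(x) = .6321, ln(1 - Fn(x)) is essentially -1, so theta-hat ~ x.
import math

def theta_hat(x, fn_x):
    return -x / math.log(1.0 - fn_x)

print(round(theta_hat(0.98, 0.6321), 2))   # 0.98
```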
Consider next the WE2 data plotted in Figure 2.9b. Using the x for which Fn(x) = .6321 to obtain θ̂ in (2.6), we obtain θ̂ = .98. The line 1 - F(x) for the negative exponential with θ = .98 is drawn in Figure 2.9b. Notice how it lies above most of the data. Using Figure 2.8 as a guide, this suggests (correctly) that the data are from a distribution with a thinner upper tail than the negative exponential. Also on Figure 2.9b are plotted two other lines representing 1 - F(x) for the negative exponential of (2.2). These arose from solving (2.6) for θ̂ using Fn(x) = .90 and Fn(x) = .95. The estimators of θ are, respectively, .59 and .52. Again the inference is the same, viz., the negative exponential model of (2.2) is not appropriate and the distribution under consideration has a thinner tail than the negative exponential. Note, this inference is correct.
A further examination of the WE2 data plot in Figure 2.9b does reveal that the points do appear to lie on a straight line. The above analysis establishes that the data cannot be explained by a model such as (2.2). They can, however, be explained by a negative exponential model which incorporates a displacement value, viz.,

F(x) = 1 - e^(-(x - λ)/θ)   (2.7)

If we start with this model, then any two distinct x values (or two Fn(x) values) can be used to produce linear equations for λ and θ. The equations are

-ln(1 - Fn(x1)) θ + λ = x1
-ln(1 - Fn(x2)) θ + λ = x2   (2.8)

Using Fn(x1) = .50 and Fn(x2) = .90 in the WE2 data set produces θ̂ = .28, λ̂ = .73. The line of 1 - F(x) for model (2.7) with these parameters is also plotted in Figure 2.9b. This provides an excellent fit. So if we restrict our attention solely to the upper tail, the WE2 data can be well explained by a negative exponential of the form (2.7). Of course, the correct model is the Weibull of (2.3) with k = 2. Completion of the first part of the above analysis would have led correctly to the Weibull model.
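The two equations of (2.8) solve in closed form: θ = (x2 - x1)/(ln(1 - p1) - ln(1 - p2)) and λ = x1 + θ ln(1 - p1). A sketch follows; since the actual WE2 x values are not reproduced in the text, it is checked on synthetic points generated exactly from the fitted values θ = .28, λ = .73.

```python
# Sketch: fit the displaced exponential model (2.7),
# F(x) = 1 - exp(-(x - lam)/theta), from two ecdf points (x1, p1), (x2, p2).
import math

def fit_displaced_exponential(x1, p1, x2, p2):
    a1 = math.log(1.0 - p1)
    a2 = math.log(1.0 - p2)
    theta = (x2 - x1) / (a1 - a2)
    lam = x1 + theta * a1
    return theta, lam

# Synthetic check: generate the two x values exactly from theta=.28, lam=.73
# via x = lam - theta * ln(1 - F(x)), then recover the parameters.
x1 = 0.73 - 0.28 * math.log(1 - 0.50)
x2 = 0.73 - 0.28 * math.log(1 - 0.90)
theta, lam = fit_displaced_exponential(x1, 0.50, x2, 0.90)
print(round(theta, 2), round(lam, 2))   # 0.28 0.73
```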
2.2.5.2 Extensions

The above material can easily be modified to examine the lower tail of the distribution (viz., by plotting Fn(x) of (2.5) versus the observations on semi-log paper).

Often the normal distribution is used as the reference distribution in discussing tail thickness, and the standardized central fourth moment (the kurtosis measure) is used as the appropriate measure. For these problems, one is usually interested in fitting the complete distribution and not just the tail. We will discuss this tail thickness concept in Section 2.4 below. Also, Chapter 7 will discuss in detail the formal computational procedures associated with this concept.
The ecdf can be used also for assessing how well a particular statistical distribution fits the entire data set. The procedure starts by plotting on the same grid of a piece of graph paper the ecdf of the sample and the cdf of the hypothetical distribution. For example, Figure 2.10a contains the ecdf of the NOR data along with the cdf for the normal distribution with mean 100 and standard deviation 10 (i.e., the true underlying distribution). If values of the parameters of the hypothetical distribution are unspecified, these must be estimated for the data set under investigation by means of some procedure such as the method of moments or the method of maximum likelihood, and then the cdf of the hypothetical distribution using these estimates as parameter values is plotted. For an example, Figure 2.10b contains the plot of the ecdf of the first ten observations of the NOR data set along with the cdf of the normal distribution with mean and standard deviation equal to the sample mean and standard deviation, viz., x̄ = 98.41 and s = 8.28.
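The quoted estimates x̄ = 98.41 and s = 8.28 can be checked directly from the ten observations listed in Section 2.2.1:

```python
# Sketch: sample mean and standard deviation (n - 1 denominator) of the
# first ten NOR observations, matching the values quoted in the text.
import statistics

nor10 = [84.27, 90.87, 92.55, 96.20, 98.70,
         98.98, 100.42, 101.58, 106.82, 113.75]
xbar = statistics.mean(nor10)
s = statistics.stdev(nor10)
print(round(xbar, 2), round(s, 2))   # 98.41 8.28
```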
The next step in the informal graphical analysis involves comparing the two plots (ecdf and cdf) and deciding if they are "close." Usually this informal procedure is the first step in a more elaborate analysis which includes formal numerical techniques referred to as empirical cumulative distribution function techniques, or more simply empirical distribution function (EDF) techniques. Chapter 4 contains a detailed account of these techniques.

While the above described graphical procedure has merit, especially when used with the formal numerical EDF techniques, it is deficient as an informal technique in that there are more informative simple graphical techniques, namely those involving probability plotting, which are the subject matter of the remainder of this chapter.
2.3 GENERAL CONCEPTS OF PROBABILITY PLOTTING

2.3.1 Introduction

A major problem with the use of the ecdf plot in attempting to judge visually the correctness of a specific hypothesized distribution is due to the curvature of the ecdf and cdf plots. It is usually very hard to judge visually the closeness of the curved (or step function) ecdf plot to the curved cdf plot. If one is attempting to reach a decision based on visual inspection, it is probably easiest to judge if a set of points deviates from a straight line. A probability plot is a plot of the data that offers exactly the opportunity for such a judgment, for it will be a straight line plot, to within sampling error, if the hypothesized distribution is the true underlying distribution. The straight line results from transforming the vertical scale of the ecdf plot to a scale which will produce exactly a straight line if the hypothesized distribution is plotted on the graph.
The principle behind this transformation is simple and is as follows. Say the true underlying distribution depends on a location parameter μ and a scale parameter σ. (μ and σ need not be the mean and standard deviation, respectively.) The cdf of such a distribution can be written as

F(x) = G((x - μ)/σ)   (2.9)

where

z = (x - μ)/σ

is referred to as the standardized variable and G(·) is the cdf of the standardized random variable Z. The ecdf plot is based on plotting F(x) on x. For sample data F(x) is replaced by Fn(x), and the plotted values of x are the observed values of the random variable X. Now if the plot were one of z on x (or equivalently G⁻¹(F(x)) on x, where G⁻¹(·) is the inverse transformation which here transforms F(x) into the corresponding standardized value z), the resulting plot would be the straight line

z = G⁻¹(F(x)) = (x - μ)/σ   (2.10a)

or in terms of x on z

x = μ + zσ   (2.10b)
Z = G"^(F (X )) on X (2.11a)
(2.11b)
'l ■ “ ’‘fl)
fo r i= l. ....n (2 . 12)
o r m ore generally as
I-C
fo r 0 < C < I (2.13)
n - 2c + I
In (2.12) the C o f (2.13) is equal to 0.5. See Barnett (1975) and Chapter 11
fo r further discussion o f the selection o f c. In the following we w ill always
26 D'AGOSTINO
use the F_n(x) given by the p_i of (2.12). Given that F is the true cdf, the probability plot of (2.11) should be approximately a straight line. In fact there is strong convergence to a straight line for large samples.
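The construction in (2.11)-(2.13) can be sketched in a few lines of Python. This is our own illustration, not from the text; the function names are hypothetical, and the standard normal inverse cdf from the standard library stands in for a generic G^{-1}:

```python
from statistics import NormalDist

def plotting_positions(n, c=0.5):
    # p_i = (i - c) / (n - 2c + 1); the default c = 0.5 gives p_i = (i - 0.5)/n as in (2.12)
    return [(i - c) / (n - 2 * c + 1) for i in range(1, n + 1)]

def probability_plot_points(data, inv_cdf):
    # Pair each ordered observation x_(i) with z_i = G^{-1}(p_i), as in (2.11a)
    xs = sorted(data)
    return [(x, inv_cdf(p)) for x, p in zip(xs, plotting_positions(len(xs)))]

# Normal analysis: take G^{-1} to be the standard normal inverse cdf
points = probability_plot_points([102.3, 97.1, 110.4, 95.0, 101.8],
                                 NormalDist().inv_cdf)
```

Plotting the resulting (x, z) pairs on arithmetic paper gives the probability plot; a straight-line pattern supports the hypothesized distribution.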
Here μ and σ are the mean and standard deviation, respectively. The cdf of the standardized logistic distribution (i.e., of Z = (X - μ)/σ) is
FIGURE 2.11 Logistic probability plot of the LOG data set; the left vertical axis is labelled 100F_n(x), the right vertical axis z, and the horizontal axis x (from 50 to 150).
TABLE 2.1 Plotting values for the LOG data set (first fifteen ordered observations)

i     F_n(x) = p_i     z of (2.15)     Ordered observation
1     .005             -2.90           51.90
2     .015             -2.31           60.57
3     .025             -2.02           63.35
4     .035             -1.83           65.87
5     .045             -1.68           66.35
6     .055             -1.56           68.44
7     .065             -1.47           74.29
8     .075             -1.39           76.52
9     .085             -1.31           78.32
10    .095             -1.24           78.48
11    .105             -1.18           79.07
12    .115             -1.12           79.32
13    .125             -1.07           81.17
14    .135             -1.02           81.61
15    .145             -.98            82.45
In the notation of Section 2.3.1, F(x) = G((x - μ)/σ) = G(z), and inverting the standardized logistic cdf gives

z = (√3/π) ln ( F(x) / (1 - F(x)) )     (2.15)

The logistic probability plot is then a plot of the z of (2.15), with F(x) replaced by the ecdf F_n(x) defined by (2.12), on one axis (e.g., the vertical axis) versus x on the other (horizontal) axis. Here x represents the observed values in the sample. Figure 2.11 contains an appropriate graph set up for this problem.
Notice in Figure 2.11 there are two alternative ways of labeling the vertical axis. The first way, which is probably the most informative, is to label the axis in terms of F_n(x) (or 100F_n(x), which is the more conventional way). The second way is in terms of the values of the standardized variable z. In Figure 2.11 we have labelled the left vertical axis as 100F_n(x) and the right vertical axis as z. Notice the vertical axis is linear in z. It is not linear in F_n(x).

Figure 2.11 contains the data plotted on it. To make more explicit the actual points plotted on this graph we list in Table 2.1 the values of F_n(x) of (2.12) and z obtained from (2.15) for the first and last fifteen ordered observations. Also listed are the corresponding ordered observations.
Once the data are plotted the next step is to determine the goodness of fit of the data. For a probability plot this means determining if a straight line "fits the data well." This problem can be approached in a very formal manner, and Chapter 5 (Regression Techniques) discusses this approach in detail. For the purposes of this chapter it means drawing a straight line through the points and deciding in an informal manner if the fit is good.
2.3.3.1 First Procedure

The simplest procedure is to draw a line "by eye" through the points. One convenient way to do this is to locate a point on the plot corresponding to
around the 10th percentile (F_n(x) = .10) and another around the 90th percentile (F_n(x) = .90) and connect these two. Figure 2.12a contains such a line for the logistic probability plot of the complete LOG data set. (Notice this is the same plot as Figure 2.11. Here we have the straight line imposed on the graph.) This line fits the data extremely well, accommodating even the extreme points. There are two comments which need mentioning here. The first concerns the non-random pattern of the points about the line. The ordered observations are not independent and the type of pattern shown in Figure 2.12a is to be expected. Second, in judging deviations from the line remember it is the horizontal distances from the points to the line that are important.

After the "by eye" line is drawn it can be used to supply quick estimates of the parameters of the distribution. For example, with the LOG data of Figure 2.12a we can obtain estimates of the mean μ and standard deviation σ by recalling that z = 0 corresponds to the mean μ and z = 1 (86th percentile) corresponds to μ + σ. In Figure 2.12a we have lines extending from z = 0 and 1 to the straight line and down to the x axis. From these we estimate μ̂ = 99 and σ̂ = 17.
A more formal fit is obtained by estimating the line

x = μ + zσ     (2.16)

by unweighted least squares, which yields

σ̂ = Σ(z_i - z̄)x_i / Σ(z_i - z̄)z_i   and   μ̂ = x̄ - σ̂z̄     (2.17)

If Σz_i = 0, then

μ̂ = x̄   and   σ̂ = Σz_i x_i / Σz_i²     (2.18)
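The computations in (2.17) are simple enough to sketch directly; the following Python function (our own, hypothetical name) returns the unweighted least squares estimates:

```python
def ls_line_estimates(xs, zs):
    # Fit x = mu + z*sigma by unweighted least squares, eq. (2.17):
    #   sigma-hat = sum (z_i - zbar) x_i / sum (z_i - zbar) z_i
    #   mu-hat    = xbar - sigma-hat * zbar
    n = len(xs)
    xbar = sum(xs) / n
    zbar = sum(zs) / n
    sigma = (sum((z - zbar) * x for x, z in zip(xs, zs))
             / sum((z - zbar) * z for z in zs))
    mu = xbar - sigma * zbar
    return mu, sigma
```

When the plotted points lie exactly on a line x = μ + zσ, the function recovers μ and σ exactly; with real data it gives the intercept and slope of the least squares line through the probability plot.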
FIGURE 2.12 Logistic analysis of LOG data set. (a) Full data set (n = 100), line fit "by eye." (b) First ten observations, small sample analysis.
2.3.4 Small Samples

When the size of the sample is small (say 50 or less) the probability plots of z on x as given by (2.11) may display curvature in the tails even if the hypothesized distribution is correct. For these cases the usual recommendation is to use the expected values of the order statistics from the standardized distribution of the hypothesized distribution for the plotting positions of the vertical axis. These are used in place of the z of (2.11), which are the percentile points of the standardized distribution. The expected values are defined as follows. Say Z_(1) ≤ ··· ≤ Z_(n) represent the ordered observations for a sample of size n from a standardized distribution. Then the expected values are defined as E(Z_(i)) for i = 1, ..., n, where E represents the expected value operator.
E 2.3.4.1 Example
For the logistic distribution the expected values are readily available (Gupta and Shah, 1965, and Gupta, Qureshi and Shah, 1967). However, for this particular distribution there appears to be no reason to use them in plotting. Figure 2.12b contains a logistic probability plot (i.e., a logistic analysis) of the first ten unordered observations of the LOG data. The data, along with the expected values of the standardized logistic ordered observations, F_n(x) = p_i, and the z of (2.11) and (2.15), are as follows:
The differences between the expected values and the z's of (2.15) are not large enough to influence the plots. This is seen clearly in Figure 2.12b. Remember in judging the fit it is the horizontal distance from a point to the line that is important.
the uniform distribution defined on the interval 0 to 10. This analysis clearly indicates lack of fit. Figure 2.13c is a uniform analysis of the UNI data. That is, it is a probability plot investigating if the UNI data were drawn from a uniform distribution. In Chapter 6 techniques are discussed which involve transforming the data first to a uniform distribution. In that chapter the uniform probability plot plays a very important adjunct role in judging goodness-of-fit. For a uniform probability plot the standardized variable z is usually taken to be uniformly distributed on the unit interval.
In addition to the plotting, many interactive computer programs can also be used to obtain the estimates of μ and σ given by (2.17). These are just the intercept and slope estimates from a simple linear regression of x on z. However, the correlation coefficient from this simple linear regression must be viewed with care in attempting to judge goodness-of-fit. Because of the matching of the ordered observations with increasing z values both x and z are monotonically increasing, so the correlation coefficient will usually be large in magnitude regardless of how well the data fit a straight line. For example, the correlation coefficient for the data of Figure 2.13b (logistic analysis of the UNI data) is .947. The fit of these data to a straight line obviously leaves much to be desired.

In addition to the use of programs as described above to do probability plotting, many standard software packages (e.g., SAS) have specific routines for probability plotting. These should be used when available.
2.3.7 Summary Comments
In the examples above we have used arithmetic graph paper, placing z on the vertical axis and x on the horizontal axis. Of course, it is not incorrect to place x on the vertical axis and z on the horizontal axis. (In Chapter 11 probability plotting is done that way.) Nor is it essential to use arithmetic paper. Many probability plotting papers, which have the axes appropriately labelled, are available commercially. Logistic paper and many other probability papers are available from the Codex Book Company, 74 Broadway in Norwood, Massachusetts.
Once the points are plotted the major task is to judge if the plotted data form a straight line. If they do not, the task is then to decide what are the properties of the underlying distribution or data which cause this nonlinearity. We will now illustrate this probability plotting procedure with normal, lognormal and Weibull plotting. Table 2.2 contains the appropriate formulas for probability plotting for those and other familiar distributions.
2.4 NORMAL PROBABILITY PLOTTING

2.4.1 Probability Plotting

The normal distribution has density

f(x) = (1/(σ√(2π))) exp { -(x - μ)²/(2σ²) }     (2.21)

FIGURE 2.14 Normal probability paper.
i\ n 3 4 5 6 7 8 9 10
11 12 13 14 15 16 17 18
19 20 21 22 23 24 25 26
i\n 27 28 29 30 31 32 33 34
35 36 37 38 39 40 41 42
T A B L E 2.3 (continued)
i\n 43 44 45 46 47 48 49 50
Here

t = { -ln [ 4F_n(x)(1 - F_n(x)) ] }^(1/2)     (2.23)

and

sign (F_n(x) - .5) = +1 if F_n(x) - .5 > 0,  -1 if F_n(x) - .5 < 0     (2.24)

FIGURE 2.15 Normal probability plots. (a) NOR data set. (b) Dosimeter data set.
E 2.4.1.1 Examples

Figure 2.15 contains two normal probability plots. Figure 2.15a is a plot of the NOR data set already extensively discussed in Section 2.2. Figure 2.15b is a plot from a sample of 20 dosimeter readings of benzene (D'Agostino and Gillespie, 1978). A dosimeter is a portable device for measuring a person's exposure to various gases. The dosimeter data are in parts per million (ppm). The frequency distribution and plotting points z are:

x (ppm)    Frequency    z
.93        3            -1.47
.95        6            -.53
.97        3            .06
.98        1            .32
.99        1            .45
1.01       4            .85
1.05       1            1.41
1.07       1            1.87
Total      20
Notice in Figure 2.15b we plotted only 8 points since only eight different values appeared in the sample. The z values are averaged in the case of ties. The line drawn in Figure 2.15b is the line x̄ + zs, where x̄ = .98 and s = .04.

For grouped data (i.e., data grouped into frequency classes) only one value per class should be plotted. This plotted value should be the true upper limit of the class (see Section 2.2.1 for an illustration of true upper limits and Section 2.3.5 for more details).
2.4.2.1 Unimodal Distributions

Skewness:  √β₁ = E(X - μ)³ / { E(X - μ)² }^(3/2)     (2.25)

and

Kurtosis:  β₂ = E(X - μ)⁴ / { E(X - μ)² }²     (2.26)
FIGURE 2.17 Normal analysis for outliers and mixtures (or contamination). (a) NOR data (n = 10), detection of outlier.
corresponding to the observation 140 is clearly out of line with the rest of the data. In practice the techniques of Chapter 12 should now be used to confirm that this point is an outlier.

Figures 2.17b and 2.17c are normal probability plots of the contaminated normal data sets LCN(.10,3) and LCN(.20,3) whose ecdfs are given, respectively, in Figures 2.7a and 2.7b. The reader should note two important related points concerning both Figures 2.17b and 2.17c. First, both reveal the presence of two straight lines. This is seen, for example, in Figure 2.17b [LCN(.1,3) data set] where one straight line can be fit nicely through the data below the 80th percentile of the sample and a second straight line can be fit through the data from about the 92nd percentile up. The points
parameters of the components. The reader is referred to Bliss (1967) and Johnson and Kotz (1970) for further details.
Figure 2.18 provides guidelines to aid the user in interpreting normal probability plots. Notice in the drawings of Figure 2.18 the empirical cumulative distribution function and/or z scale is on the vertical axis. Some graphs have these on the horizontal axis. The resulting configurations will be different if this is done.
FIGURE 2.18 Interpretation guide for normal probability plots. Panels: symmetric, tails thinner than normal; symmetric, tails thicker than normal; skewed to the left (indication √β₁ < 0); skewed to the right (indication √β₁ > 0).
2.5 LOGNORMAL PROBABILITY PLOTTING

where the data x is the variable of the log axis and the z of (2.22) to (2.24) is the variable of the equal interval scale axis. In selecting graph paper with a log scale, the user should select one with enough cycles to accommodate the data. Graph papers with one to five cycles on the log axis are readily available. For samples of less than 50 the z of (2.22) to (2.24) should be replaced with the expected values of the standardized order statistics of Table 2.3.

Figure 2.20 contains lognormal probability plots. Figure 2.20a contains three plots of TSP (total suspended particulates) data from three air quality monitoring sites near Boston, Massachusetts. Notice this figure uses lognormal paper. Figure 2.20b is a plot on arithmetic graph paper of the CHEN data set given in the Appendix. These data are taken from Bliss (1967) and they are the lethal doses of the drug cinobufagin in 10 (mg/kg), as determined by titration to cardiac arrest in individual etherized cats. The doses, ln(doses), frequencies and plotting position z values are:
where μ̂ and σ̂ are the sample mean and standard deviation, respectively, of the logs of the data. For lognormal data the parameters of interest are usually the geometric mean and geometric standard deviation. For the model (2.27) these are, respectively,

e^μ and e^σ

Estimates of these based on the data in Figure 2.20b are exp(.7972) = 2.2193 and exp(.2790) = 1.3218. For data plotted directly on lognormal paper exp(μ) is estimated as the 50th percentile (i.e., the x value corresponding to z = 0) and exp(σ) is estimated as the ratio of the 84th percentile to the 50th percentile.
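The estimates of the geometric mean and geometric standard deviation are easily computed from the logs of the data; a minimal sketch (our own function, assuming positive data and using the sample standard deviation of the logs):

```python
import math

def geometric_summary(data):
    # mu-hat and sigma-hat are the sample mean and standard deviation of the logs;
    # the geometric mean and geometric standard deviation are exp(mu-hat), exp(sigma-hat)
    logs = [math.log(x) for x in data]
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / (n - 1))
    return math.exp(mu), math.exp(sigma)
```

For the CHEN data this computation corresponds to the estimates exp(.7972) and exp(.2790) quoted above.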
At times, when dealing with a set of data that appears to be lognormally distributed, there may be a subset of these observations that are all equal to zero. Before the data can be plotted these zeros must be "adjusted." First, it is possible that they represent a contamination and simply should be removed. Second, they may reflect a measurement limitation of the measurement instrument. In this case it may be justified to replace them with the "least detectable level" of the instrument. If this is not known then it may be possible to adjust the zeros by adding a small arbitrary constant to them or to all the data values before they are plotted. Careful consideration should be given before any of these suggestions are employed.
2.5.2 Three Parameter Lognormal

FIGURE 2.21 Lognormal plot for three parameter lognormal. • Original data; × Data - .1. Line is ln(Data - .1) = -1.7025 + 2.2781z; λ̂ = .1, μ̂ = -1.7025, σ̂ = 2.2781.
A plot of data from this distribution for a lognormal analysis will not produce a straight line unless the λ value or an estimate of it is subtracted from all the data. The data in Figure 2.21 illustrate the situation. These data come from Leidel, Busch and Lynch (1977) and represent readings of hydrogen fluoride. The dots represent the unadjusted data. These data and the plotting positions for z are given in the first four columns below:

Notice how the dots at the lower end bend in a concave manner while those at the upper end do appear to follow a straight line.
There are many ways to obtain estimators of λ for this type of data (Aitchison and Brown, 1957, and Johnson and Kotz, 1970, chapter 14). The author has found the following two simple informal procedures to be useful in the graphical stage of analysis. First, note in Figure 2.21 that the lower end dots do appear to be approaching asymptotically the log value of -2.3. The antilog of this asymptote (viz., exp(-2.3) = .10) can be used as an estimate of λ. Second, if we use X_P to represent the Pth sample quantile (0 < P < 100) then the following should be approximately true
(0 < P < 100) then the following should be approximately true
In (Xg^ - X) (2.29)
^ lO O -P ^ ”
X = (2.30)
^ lO O -P ^ " ^^0
95 5 50
X = (2.31)
X « , + X^ 2X.
95 5 50
As a value for the Pth sample quantile, the user can use either the ith order statistic where

i = [n(.01P)] + 1     (2.32)

(here [y] is the largest integer in y) or else obtain it directly from a graph. That is, draw a curve by hand through the data and use as the Pth quantile the x value on the horizontal axis corresponding to 100F_n(x) = P on the vertical axis. Applying (2.31) to the data of Figure 2.21 we again obtain .1 as a good approximation to λ. Figure 2.21 also contains a plot of the data minus this .1 value (i.e., ln(Data - .1)). This plot is given as x values on the graph. A straight line now does fit these values reasonably well, indicating the appropriateness of the lognormal distribution.
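Equations (2.31) and (2.32) translate directly into code; a sketch (function names are ours, not from the text):

```python
def sample_quantile(data, P):
    # i-th order statistic with i = [n(.01P)] + 1, eq. (2.32)
    xs = sorted(data)
    i = int(len(xs) * 0.01 * P) + 1
    return xs[min(i, len(xs)) - 1]

def lambda_estimate(x95, x5, x50):
    # eq. (2.31): lambda-hat = (X95*X5 - X50^2) / (X95 + X5 - 2*X50)
    return (x95 * x5 - x50 * x50) / (x95 + x5 - 2.0 * x50)
```

When the three quantiles come from an exact three-parameter lognormal distribution, (2.31) recovers λ exactly, since the logs of the shifted quantiles are symmetric about the shifted median.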
2.6 WEIBULL PROBABILITY PLOTTING

The Weibull distribution with shape parameter k and scale parameter θ has density

f(t) = (k/θ)(t/θ)^(k-1) exp { -(t/θ)^k },   t > 0     (2.33)

and cdf

F(t) = 1 - exp { -(t/θ)^k }     (2.34)
A Weibull probability plot is obtained by plotting

z = ln (-ln (1 - F_n(x)))  versus  x = ln t     (2.35)

This follows immediately from the cdf given in (2.34) and the general procedure for probability plotting described in Section 2.3.1. In particular, from (2.34) we obtain

1 - F(t) = exp { -(t/θ)^k }

ln (1 - F(t)) = -(t/θ)^k

ln (-ln (1 - F(t))) = k ln t - k ln θ

which is linear in ln t, where

x = ln t,  μ = ln θ  and  σ = 1/k     (2.36)

Note with Weibull plotting the log of the data (i.e., log t) is put on the horizontal axis rather than the data values directly. This is needed to obtain linearity in the plots. Now applying (2.11) or (2.19) we obtain (2.35). Further,
while small sample expected values of order statistics are available (Mann, 1968), Weibull probability plotting usually works well by simply employing (2.35) for all sample sizes. To illustrate Weibull plotting we present two plots. Figure 2.23b is a Weibull plot of the WE2 data set (i.e., 100 observations from the Weibull distribution with k = 2 and θ = 1). Figure 2.23c is a plot of eleven survival times in months of cancer patients who have had an adrenalectomy. This data set was obtained from a study by Dr. Richard Oberfield of the Lahey Clinic, Boston. The data are:
The straight line drawn in Figure 2.23b was drawn using the "by eye" technique described in Section 2.3.3.1. From the line we obtain

x = ln t = μ̂ + zσ̂ = -.05 + z(.47)

Recall the true parameter values are θ = 1 and k = 2. The informal "by eye" technique did well in this case.

Because the sample size is small for the data in Figure 2.23c we use the unweighted least squares estimates of (2.17) to obtain the estimates of μ and σ from this data (columns (3) and (5) of the data above were used for the x's and z's, respectively). The estimates are μ̂ = 3.40 and σ̂ = .55. From these we obtain θ̂ = exp(3.40) = 29.96 and k̂ = 1/.55 = 1.82.
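The whole Weibull analysis — the plotting positions of (2.12), the z of (2.35), the least squares fit of (2.17), and the back-transformation (2.36) — can be sketched as one function (a hypothetical helper, not from the text):

```python
import math

def weibull_estimates(times):
    # x_i = ln t_(i) and z_i = ln(-ln(1 - p_i)) with p_i = (i - 0.5)/n, eqs. (2.12) and (2.35);
    # fit x = mu + z*sigma by (2.17), then theta-hat = exp(mu-hat), k-hat = 1/sigma-hat by (2.36)
    ts = sorted(times)
    n = len(ts)
    xs = [math.log(t) for t in ts]
    zs = [math.log(-math.log(1.0 - (i - 0.5) / n)) for i in range(1, n + 1)]
    xbar, zbar = sum(xs) / n, sum(zs) / n
    sigma = (sum((z - zbar) * x for x, z in zip(xs, zs))
             / sum((z - zbar) * z for z in zs))
    mu = xbar - sigma * zbar
    return math.exp(mu), 1.0 / sigma  # (theta-hat, k-hat)
```

Because exact Weibull quantiles satisfy ln t = μ + σz exactly, the function reproduces θ and k without error on such data; with real samples it gives the least squares analogue of the estimates computed above.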
2.6.1.1 Zero Data

2.6.2 Three Parameter Weibull

The three parameter Weibull distribution has cdf

F(t) = 1 - exp { -((t - λ)/θ)^k },   t > λ     (2.37)

Similar to the three parameter lognormal distribution, the λ value must be subtracted from all the data before a Weibull plot will produce a straight line. In many cases a value close to the minimum t is adequate as an estimate of λ. Other more precise techniques can be employed (Johnson and Kotz, 1970).
1. Analysis of Residuals. Say the data are described by the model

Y = h(X, β) + η     (2.38)

where β is estimated by β̂, the fitted value is

Ŷ = h(X, β̂)     (2.39)

and

η̂ = Y - Ŷ     (2.40)

is the residual. While the residuals comprise a dependent sample, the graphical techniques described above can be used to analyze them. Chapter 12 on outliers discusses further the analysis of residuals.
2. Analysis of Censored Samples. There are no restrictions in the above which make it necessary for all the data to be available for plotting. The above techniques can be used on censored data. There are, however, special concerns which arise with censored data (e.g., see Nelson, 1972) that require careful and complete discussion. Chapter 11 is devoted solely to this problem.

3. Q-Q Plots and P-P Plots. Wilk and Gnanadesikan (1968) discuss in detail the quantile-quantile probability plots (Q-Q plots) and the percentage-percentage probability plots (P-P plots). A Q-Q plot is the plot of the quantiles (or, as we call them above, the percentiles) of one distribution on the quantiles of a second distribution. If one of these distributions is a hypothesized theoretical distribution a Q-Q plot is just a probability plot as developed in Section 2.3. A P-P plot is the plot of the percentages of one distribution on the percentages of a second. Wilk and Gnanadesikan (1968) state that while the P-P plots are limited in their usefulness, they are useful for detecting discrepancies in the middle of a distribution (i.e., about the median) and also may be useful for multivariate analysis.
4. Transformation to Normality. In some settings (e.g., analysis of variance) it is suggested first to transform to normality and then analyze the transformed data. Box and Cox (1964) suggest a power transformation for this. Their transformation is as follows:

y = (x^θ - 1)/θ   if θ ≠ 0
y = log x         if θ = 0     (2.41)

Here x refers to the original data and θ is the power exponent. Box and Cox develop a maximum likelihood estimator for θ. Once it is computed, normal probability plotting as developed in Section 2.4 can be applied directly to the transformed data y of (2.41). The techniques of Chapter 9 can be used to test formally the normality of the transformed data.
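The transformation itself is a one-liner to sketch (our own function; scipy users can note that `scipy.stats.boxcox`, where available, also returns a maximum likelihood estimate of θ):

```python
import math

def box_cox(x, theta):
    # eq. (2.41): y = (x**theta - 1)/theta for theta != 0, y = log x for theta = 0
    if theta == 0:
        return math.log(x)
    return (x ** theta - 1.0) / theta
```

As θ → 0 the power form approaches log x, so the transformation is continuous in θ.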
5. Probability Plotting for the Gamma Distribution. One important distribution that does not lend itself immediately to the probability plotting techniques described above is the gamma distribution. Even with the aid of transformations it cannot be put in the simple form of a distribution dependent upon a location and scale parameter. Wilk, Gnanadesikan and Huyett (1962) present a technique and accompanying tables to handle this situation. See Chapter 11 for further comments on this.

6. Multivariate Normality. There has been much attention paid to the problem of probability plotting for the multivariate normal distribution and a number of techniques have been suggested (Gnanadesikan, 1973). The author has found the following technique to be very informative. First transform the data to principal components and then do univariate normal probability plotting (Section 2.4) for each component. Each component can be considered an independent variable. If the original data set is from a multivariate normal distribution then each component should produce a straight line in the univariate plots. More will be said about multivariate normality in Chapter 9.
The aim of this chapter has been to present to the reader simple informal graphical techniques which can be used in conjunction with the formal techniques to be discussed in the following chapters. In performing an analysis we suggest that the reader should draw a graph, examine it and judge if other graphs are needed. As the formal numerical techniques are being applied, use the graphs to interpret them and to gain insight into the phenomenon under investigation.
REFERENCES

Chernoff, H. and Lieberman, G. (1954). Use of normal probability paper. Jour. Amer. Stat. Assoc. 49, 778-785.

Harding, J. P. (1949). The use of probability paper for the graphical analysis of polymodal frequency distributions. Jour. of Marine Biology Assoc., United Kingdom 28, 141-153.
3.1 IN TR O D U C TIO N
64 MOORE
Tests of the types considered here are also used in assessing the fit of
models fo r categorical data. The scope of this volume forbids venturing into
this closely related territory. Bishop, Fienberg, and Holland (1975) discuss
the methods of categorical data analysis most closely related to the contents
of this chapter.
To test the simple hypothesis that a random sample X₁, ..., Xₙ has the distribution function F(x), Pearson partitioned the range of the X_j into M cells, say E₁, ..., E_M. If N₁, ..., N_M are the observed numbers of X_j's in these cells, then N_i has the binomial distribution with parameters n and p_i, the probability assigned to the cell E_i by F, when the null hypothesis is true. Pearson reasoned that the differences N_i - np_i between observed and expected cell frequencies express lack of fit of the data to F, and he sought an appropriate function of these differences for use as a measure of fit.

Pearson's argument here was in three stages: (i) The quantities N_i - np_i have in large samples approximately a multivariate normal distribution, and this distribution is nonsingular if only M - 1 of the cells are considered. (ii) If Y = (Y₁, ..., Y_p)' has a nonsingular p-variate normal distribution N_p(μ, Σ), then the quadratic form (Y - μ)'Σ⁻¹(Y - μ) appearing in the exponent of the density function has the χ²(p) distribution as a function of Y. Here of course μ is the p-vector of means, and Σ is the p × p covariance matrix of Y. (iii) Computation shows that if Y = (N₁ - np₁, ..., N_{M-1} - np_{M-1})', this quadratic form is

X² = Σ_{i=1}^{M} (N_i - np_i)² / (np_i)     (3.1)
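Pearson's statistic is straightforward to compute; a minimal sketch (our own function, not from the text):

```python
def pearson_chi_squared(observed, probs):
    # X^2 = sum over cells of (N_i - n p_i)^2 / (n p_i)
    n = sum(observed)
    return sum((N - n * p) ** 2 / (n * p) for N, p in zip(observed, probs))
```

Applied to the 25 equiprobable-cell frequencies of the example in Section 3.2.7.1 (each p_i = 0.04, n = 100), this gives X² = 28.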
3.2.2 Composite Hypothesis

When F contains unknown parameters θ, the cell probabilities p_i(θ) are functions of θ and the Pearson statistic becomes

X²(θ) = Σ_{i=1}^{M} [N_i - np_i(θ)]² / (np_i(θ))

with θ replaced by an estimator θ̂_n.
Pearson did not think that estimating θ changes the large sample distribution of X², at least when θ̂_n is consistent. In this he was wrong. It was not until 1924 that Fisher showed that the limiting null distribution of X²(θ̂_n) is not χ²(M - 1), and that this distribution depends on the method of estimation used.
Fisher argued that the appropriate method of estimation is maximum likelihood estimation based on the cell frequencies N_i. This grouped data MLE is the solution of the equations

Σ_{i=1}^{M} ( N_i / p_i(θ) ) ∂p_i(θ)/∂θ_k = 0,   k = 1, ..., p     (3.2)

Equivalently, the grouped data MLE minimizes the log likelihood ratio statistic

G² = 2 Σ_{i=1}^{M} N_i log ( N_i / (np_i) )

A second possibility is the minimum chi-squared estimator, which minimizes X²(θ) and therefore solves

Σ_{i=1}^{M} ( N_i / p_i(θ) )² ∂p_i(θ)/∂θ_k = 0,   k = 1, ..., p     (3.3)
Neyman observed that minimizing instead the modified chi-squared statistic

Σ_{i=1}^{M} [N_i - np_i(θ)]² / N_i

leads to the simpler estimating equations

Σ_{i=1}^{M} ( p_i(θ) / N_i ) ∂p_i(θ)/∂θ_k = 0,   k = 1, ..., p     (3.4)
Since for the purposes of large sample theory under the null hypothesis this estimator is interchangeable with the previous two, call it also θ̂_n to minimize notation. Neyman's remark is important because equations (3.4) are more often solvable in closed form than are (3.3) and (3.2).
3.2.2.1 Example

Consider the family of densities

f(x|θ) = (1 + θx)/2,   -1 ≤ x ≤ 1

with Ω = (-1, 1). This family has been used as a model for the distribution of the cosine of the scattering angle in some beam-scattering experiments in physics. For cells E_i = (a_{i-1}, a_i] with

-1 = a₀ < a₁ < ··· < a_M = 1

we have

p_i(θ) = ∫_{a_{i-1}}^{a_i} f(x|θ) dx = (θ/4)(a_i² - a_{i-1}²) + (1/2)(a_i - a_{i-1})
TESTS OF CHI-SQUARED TYPE 67
It is easily seen that neither (3.2) nor (3.3) has a closed solution, while (3.4) has the solution

θ̂ = -2 [ Σ_{i=1}^{M} (a_i² - a_{i-1}²)(a_i - a_{i-1}) / N_i ] / [ Σ_{i=1}^{M} (a_i² - a_{i-1}²)² / N_i ]
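The closed form of (3.4) for this example can be checked directly; a sketch (hypothetical function name; `bounds` holds the cell boundaries a₀, ..., a_M and `counts` the observed N_i):

```python
def min_modified_chisq_theta(bounds, counts):
    # For f(x|theta) = (1 + theta*x)/2, write p_i(theta) = theta*c_i + d_i with
    #   c_i = (a_i^2 - a_{i-1}^2)/4  and  d_i = (a_i - a_{i-1})/2.
    # Equation (3.4) is then linear in theta and gives
    #   theta-hat = - sum(c_i d_i / N_i) / sum(c_i^2 / N_i)
    num = den = 0.0
    for (lo, hi), N in zip(zip(bounds, bounds[1:]), counts):
        c = (hi * hi - lo * lo) / 4.0
        d = (hi - lo) / 2.0
        num += c * d / N
        den += c * c / N
    return -num / den
```

With cell counts symmetric about zero the estimate is θ̂ = 0, as it should be, and in general the returned value satisfies the estimating equations (3.4) exactly.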
But even the minimum modified chi-squared estimator must often be obtained by numerical solution of its defining equations. If cells E_i = (a_{i-1}, a_i] are used in a chi-squared test of fit to the normal family,

p_i(μ, σ) = Φ( (a_i - μ)/σ ) - Φ( (a_{i-1} - μ)/σ )

It takes only a moment to see that none of the three versions of θ̂_n can be obtained algebraically, so that recourse to numerical solution is required. Most computer libraries contain efficient routines using (for example) Newton's method to accomplish the solution.
This circumstance calls to mind Fisher's warning that his "lose one degree of freedom for each parameter estimated" result is not true when estimators not asymptotically the same as θ̂_n are used. For example, in testing univariate normality we may not simply use the raw data MLE's

x̄ = (1/n) Σ_{j=1}^{n} X_j   and   σ̂² = (1/n) Σ_{j=1}^{n} (X_j - x̄)²

in the Pearson statistic. Chernoff and Lehmann (1954) studied the consequences of using the raw data MLE θ̂_n in the Pearson statistic. They found that X²(θ̂_n) has as its limiting distribution under F(·|θ) the distribution of a random variable of the form

χ²(M - p - 1) + Σ_{j=1}^{p} λ_j χ_j²(1)

where the χ_j²(1) are independent chi-squared variables and the λ_j lie between 0 and 1 and may depend on θ.
A related classical statistic is the Freeman-Tukey statistic

FT² = 4 Σ_{i=1}^{M} ( N_i^{1/2} - (np_i)^{1/2} )²
Cressie and Read (1984) have systematized the theory of classical chi-squared procedures by introducing a class of test statistics based on measures of divergence between discrete distributions on M points. If q = (q₁, ..., q_M) and p = (p₁, ..., p_M) are such probability distributions, the directed divergence of order λ of q from p is

I^λ(q : p) = [ 1 / (λ(λ + 1)) ] Σ_{i=1}^{M} q_i [ (q_i/p_i)^λ - 1 ]

The corresponding test statistics, with q the vector of observed cell proportions N_i/n and p = p(θ̂_n), are

R^λ(θ̂_n) = 2n I^λ(N/n : p(θ̂_n))
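A sketch of R^λ for λ ≠ 0, -1 (our own function; for λ = 1 it reduces algebraically to Pearson's X², and scipy users can compare with `scipy.stats.power_divergence`, which takes a `lambda_` argument):

```python
def power_divergence_stat(observed, probs, lam):
    # R^lam = 2n * I^lam(N/n : p), with
    #   I^lam(q : p) = (1/(lam*(lam+1))) * sum q_i * ((q_i/p_i)**lam - 1)
    # (lam = 0 and lam = -1 are defined by continuity and are not handled here)
    n = sum(observed)
    q = [N / n for N in observed]
    div = sum(qi * ((qi / pi) ** lam - 1.0)
              for qi, pi in zip(q, probs)) / (lam * (lam + 1.0))
    return 2.0 * n * div
```

Setting λ = 1 and expanding the sum shows 2n I¹(q : p) = Σ (N_i - np_i)²/(np_i), i.e., Pearson's X².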
3.2.4 Choosing Cells

An objection to the use of chi-squared tests has been the arbitrariness introduced by the necessity to choose cells. This choice is guided by two considerations: the power of the resulting test, and the desire to use the asymptotic distribution of the statistic as an approximation to the exact distribution for sample size n. These issues have been studied in detail for the case of a simple hypothesis, i.e., the case of testing fit to a completely specified distribution F. Recommendations can be made in this case which may reasonably be extended to the case of testing fit to a parametric family {F(·|θ)}.
Mann and Wald (1942) initiated the study of the choice of cells in the Pearson test of fit to a continuous distribution F. They recommended, first, that the cells be chosen to have equal probabilities under the hypothesized distribution F. The advantages of such a choice are: (1) The Pearson test is unbiased. (Mann and Wald proved only local unbiasedness, but Cohen and Sackrowitz (1975) establish unbiasedness of both X² and G². This is not true when the cells have unequal probabilities under F.) (2) The distance sup_x |F₁(x) - F(x)| to the nearest alternative F₁ indistinguishable from F by X² is maximized (Mann-Wald), and X² maximizes the determinant of the matrix of second partial derivatives of the power function among all locally unbiased tests of the same size (Cohen-Sackrowitz). (3) Empirical studies have shown that the χ² distribution is a more accurate approximation to the exact null distribution of X², G² and FT² when equiprobable cells are employed (see Section 3.2.5 for references).
Mann and Wald then made recommendations on the number M of equiprobable cells to be used. Their work rests on large-sample approximations and on a somewhat complex minimax criterion, so that it is at best a rough guide in practice. Mann and Wald found that for a sample of size n (large) and significance level α, one should use approximately

M = 4 { 2(n - 1)² / c² }^(1/5)     (3.7)

cells, where c is the upper α point of the standard normal distribution.
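The recipe (3.7) is easy to evaluate; a sketch using the standard library (our own function name):

```python
from statistics import NormalDist

def mann_wald_cells(n, alpha=0.05):
    # M = 4 * {2(n - 1)^2 / c^2}^(1/5), c the upper-alpha point of
    # the standard normal distribution, eq. (3.7)
    c = NormalDist().inv_cdf(1.0 - alpha)
    return round(4.0 * (2.0 * (n - 1) ** 2 / (c * c)) ** 0.2)
```

With n = 100 and α = 0.05 this gives M = 24, the value used in the example of Section 3.2.7.1.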
3.2.6 Choosing a Statistic

Since both hypotheses and alternatives of interest for an omnibus test of fit are very general, it is difficult to give comprehensive recommendations based on power for choosing among a class of such tests. Asymptotic results (for the simple H₀ case) are ambiguous. When M is held fixed as n increases, all R^λ are equivalent against local alternatives, and G² is favored against distant alternatives (Hoeffding, 1965). But if M increases with n, the limiting distributions of R^λ vary with λ under both hypothesis and local alternatives, and X² appears to be favored (Holst 1972, Morris 1975, Cressie and Read 1984).

In many practical situations, power considerations are secondary to the accuracy of the χ² approximation to the exact null distribution. In such cases, the Pearson X² is the statistic of choice. Some quite limited computations of exact power by Koehler and Larntz (1980) and Read (1984) shed some light on the dependence of power on the alternative hypothesis and on the choice of λ. Read suggests 1/3 < λ < 2/3 as a compromise with reasonable power against the alternatives he considers. Again X² fares better than its common competitors, G² and FT².
A different approach that may aid the choosing of a statistic is to examine the type of lack of fit measured by each statistic. The sample measure of the degree of lack of fit accompanying R^λ(θ̂_n) (which measures the significance of lack of fit) is R^λ(θ̂_n)/n. If G is the true distribution of the observations X_j, all common estimators θ̂_n converge under G to a θ₀ such that F(·|θ₀) is "closest" to G in some sense. When G is a member of the hypothesized family {F(·|θ) : θ in Ω}, this is just consistency of θ̂_n. When G is not in this family and θ̂_n is the minimum-R^λ estimator, θ₀ is the point such that p(θ₀) is closest to the vector π₀ of cell probabilities under G by the discrepancy measure I^λ(π₀ : p(θ)). Moreover, R^λ(θ̂_n)/n converges w.p. 1 to 2I^λ(π₀ : p(θ₀)). For example, X²(θ̂_n)/n converges to

2I¹(π₀ : p(θ₀)) = Σ_{i=1}^{M} (π_i - p_i(θ₀))² / p_i(θ₀)
will modify this recommendation. But X² will remain the statistic of choice
when the null hypothesis is simple or when minimum chi-squared estimation
is used.
3.2.7.1 Example
Since NOR purports to be data simulating a normal sample with μ = 100 and
σ = 10, let us first assess the simulation by testing fit to this specific
distribution. The Mann-Wald recipe (3.7) with α = 0.05 and n = 100 gives M = 24.
For computational convenience, we use M = 25 cells chosen to be equiprobable
under N(100,100). The cell boundaries are 100 + 10z_i, where z_i
is the 0.04i point from the standard normal table, i = 1, 2, ..., 24. For
example, the 0.04 point is −1.75, so the upper boundary of the leftmost cell
is 100 + (10)(−1.75) = 82.5. Table 3.1 shows the cells and their observed
frequencies. The expected frequencies are all (100)(0.04) = 4. When p_i = 1/M
for all i, we have

    X² = (M/n) Σ_{i=1}^{M} N_i² − n

So in this example,

    X² = (1/4) Σ_{i=1}^{25} (N_i − 4)² = 112/4 = 28
TABLE 3.1 Equiprobable Cells and Observed Frequencies for the NOR and
LOG Data Sets

              NOR                     LOG
  i    upper bound   N_i      upper bound   N_i
  1       82.5        3          81.2        3
  2       85.9        8          84.8        5
  3       88.3        5          87.3        5
  4       90.1        8          89.2        5
  5       91.6        4          90.7        6
  6       92.9        2          92.1        4
  7       94.2        1          93.5        3
  8       95.3        5          94.6        1
  9       96.4        6          95.8        4
 10       97.5        1          96.9        6
 11       98.5        3          98.0        3
 12       99.5        3          99.0        3
 13      100.5        4         100.1        2
 14      101.5        2         101.1        5
 15      102.5        2         102.2        2
 16      103.6        7         103.3        5
 17      104.7        7         104.5        9
 18      105.8        3         105.6        3
 19      107.1        1         107.0        1
 20      108.4        2         108.3        1
 21      109.9        4         109.9        5
 22      111.7        6         111.8        6
 23      114.1        6         114.3        6
 24      117.5        4         117.8        4
 25        ∞          3           ∞          3
from the data. The P-value falls between 0.460 (from χ²(22)) and 0.579
(from χ²(24)).
For comparison, the same procedure was applied to test the LOG data
set for normality. In this case, X̄ = 99.84 and σ̂ = 16.51, and the observed
chi-squared value using cell boundaries X̄ + z_iσ̂ is X² = 31.5. The
corresponding P-value lies between 0.086 (from χ²(22)) and 0.140 (from χ²(24)).
Thus this test has correctly concluded that NOR fits the normal family well,
while the fit of LOG is marginal. Since the logistic distributions are difficult
to distinguish from the normal family, this is a pleasing performance. In
contrast, the same procedure with M = 10 has X² = 9.4 for the LOG data,
so that the P-value lies between 0.225 (from χ²(7)) and 0.402 (from χ²(9)).
Using three cells gives X² = 0.98 and again fails to suggest that the LOG
data set is not normally distributed. Thus for these particular data, the
larger M suggested by (3.7) produces a more sensitive test.
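With equiprobable cells, the Pearson statistic is just a scaled sum of squared deviations of the counts from n/M. The sketch below is my own illustration; the frequencies are the NOR column of Table 3.1, and the computation reproduces X² = 28:

```python
# Observed frequencies for the NOR data, M = 25 equiprobable cells
# (the NOR column of Table 3.1).
nor_counts = [3, 8, 5, 8, 4, 2, 1, 5, 6, 1, 3, 3, 4,
              2, 2, 7, 7, 3, 1, 2, 4, 6, 6, 4, 3]

def pearson_equiprobable(counts):
    """X^2 = (M/n) * sum(N_i^2) - n, valid when all p_i = 1/M."""
    n = sum(counts)
    m = len(counts)
    return (m / n) * sum(c * c for c in counts) - n

x2 = pearson_equiprobable(nor_counts)
# Equivalent form used in the text: (1/4) * sum((N_i - 4)^2) = 112/4 = 28.
```

Both algebraic forms agree because (M/n)Σ(N_i − n/M)² expands to (M/n)ΣN_i² − n.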
3.2.7.2 Example
The same procedure can be applied to the EMEA data, but a glance shows
that these data as given are discrete and therefore not normal. Indeed, with
15 cells equiprobable under the N(X̄, σ̂) distribution for these data, X² = 554.
Since the data are grouped in classes centered at integers, a more intelligent
procedure is to use fixed cells of unit width centered at the integers, with
cell probabilities computed from N(X̄, σ̂). Of course, X̄ and σ̂ from the
grouped data are only approximate. Sheppard's correction for σ̂ improves
the approximation, and gives X̄ = 14.540 and σ̂ = 2.216. Calculating the
cell probabilities and computing the Pearson statistic, we obtain X² = 7.56.
The P-value lies between 0.819 (from χ²(12)) and 0.911 (from χ²(14)), so
that the EMEA data fit the normal family very well indeed. The applicability
of X² to grouped data such as these is an advantage of chi-squared methods.
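Sheppard's correction subtracts h²/12 from the grouped-data variance, where h is the class width (here h = 1). A minimal sketch of the calculation, using hypothetical grouped data rather than the EMEA counts, which are not reproduced in this section:

```python
import math

def sheppard_corrected_sd(midpoints, counts, width=1.0):
    """Mean and Sheppard-corrected standard deviation from grouped data.

    The raw grouped variance overstates the true variance by about
    width^2 / 12 when observations are rounded to class midpoints.
    """
    n = sum(counts)
    mean = sum(m * c for m, c in zip(midpoints, counts)) / n
    var = sum(c * (m - mean) ** 2 for m, c in zip(midpoints, counts)) / n
    return mean, math.sqrt(var - width ** 2 / 12.0)

# Hypothetical data grouped in classes of unit width centered at integers.
mean, sd = sheppard_corrected_sd([12, 13, 14, 15, 16, 17],
                                 [2, 5, 9, 8, 4, 2])
```

The corrected σ̂ is always slightly smaller than the raw grouped standard deviation, which is the direction of the adjustment applied to the EMEA data above.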
3.3.1 Data-Dependent Cells
    p_i(X̄, σ̂) = ∫_{X̄+z_{i−1}σ̂}^{X̄+z_iσ̂} (2πσ̂²)^{−1/2} exp{−(t − X̄)²/(2σ̂²)} dt

              = ∫_{z_{i−1}}^{z_i} (2π)^{−1/2} exp{−u²/2} du
Nonetheless, data-dependent cells move the cells to the data without
essentially changing the asymptotic distribution theory of the chi-squared statistic.
They should be routinely employed in practice, and this is done in most of
the examples in this chapter.
Some of the most useful recent work on chi-squared tests involves the study
of quadratic forms in the standardized cell frequencies other than the sum
of squares used by Pearson. Random cells are commonly recommended in
these statistics, for the reasons outlined in Section 3.3.1, and do not affect
the theory. A statement of the nature and behavior of these general statistics
of chi-squared type is necessarily somewhat complex. Practitioners may
find it helpful to study the examples computed in Section 3.3.3 and in Rao
and Robson (1974) before approaching the summary treatment below.
Random cells should properly be denoted by E_i(X_1, ..., X_n) in a precise notation,
but here the notation E_i for cells and N_i for cell frequencies will be
continued. The "cell probabilities" under F(·|θ) are

    p_i(θ) = ∫_{E_i} dF(x|θ),  i = 1, ..., M

and the components of the standardized cell frequency vector V_n(θ) are

    v_i(θ) = [N_i − np_i(θ)] / (np_i(θ))^{1/2},  i = 1, ..., M   (3.8)
important case of raw-data MLEs. They give the quadratic form in V_n(θ̂_n)
having the χ²(M − 1) limiting null distribution. The appropriate matrix is
Q(θ̂_n), where

    Q(θ) = I_M + B(θ)[J(θ) − B(θ)′B(θ)]⁻¹B(θ)′

and B(θ) is the M × p matrix with entries

    B(θ)_{ij} = p_i(θ)^{−1/2} ∂p_i(θ)/∂θ_j   (3.9)

The Rao-Robson statistic is then

    R_n = V_n′(θ̂_n) Q(θ̂_n) V_n(θ̂_n)   (3.10)

while the Dzhaparidze-Nikulin statistic is

    W_n = V_n′(θ̂_n)[I_M − B(B′B)⁻¹B′]V_n(θ̂_n)
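For a single unknown parameter, the matrix algebra in (3.9) and (3.10) collapses to scalars, giving R_n = X²(θ̂) + [Σᵢ (N_i/p_i) p_i′]² / (n[J − Σᵢ p_i′²/p_i]). The sketch below is my own illustration of that one-parameter case, with hypothetical inputs:

```python
def rao_robson_1param(counts, probs, dprobs, fisher_info):
    """One-parameter Rao-Robson statistic (scalar case of (3.10)).

    counts      -- observed cell frequencies N_i
    probs       -- fitted cell probabilities p_i(theta_hat)
    dprobs      -- derivatives dp_i/dtheta evaluated at theta_hat
    fisher_info -- Fisher information J(theta_hat) per observation
    """
    n = sum(counts)
    # Pearson component X^2(theta_hat).
    x2 = sum((nc - n * p) ** 2 / (n * p) for nc, p in zip(counts, probs))
    # Score-like term; sum(dp_i) = 0 when the cells exhaust the line.
    score = sum(nc * dp / p for nc, p, dp in zip(counts, probs, dprobs))
    d = fisher_info - sum(dp * dp / p for p, dp in zip(probs, dprobs))
    return x2 + score ** 2 / (n * d)

# Sanity check: if every N_i equals its expectation n * p_i, both the
# Pearson term and the correction term vanish, so R_n = 0.
```

The quantity d in the code is the scalar version of J(θ) − B′B; it must be positive for the statistic to be defined.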
3.3.3.1 Example
    f(x|θ) = θ⁻¹e^{−x/θ},  0 < x < ∞

For a single parameter, the Rao-Robson statistic (3.10) reduces to

    R_n = Σ_{i=1}^{M} (N_i − np_i)²/(np_i) + (nD)⁻¹ [Σ_{i=1}^{M} (N_i/p_i)(∂p_i/∂θ)]²

where

    D = J(θ) − Σ_{i=1}^{M} p_i(θ)⁻¹ (∂p_i(θ)/∂θ)²   (3.11)

all quantities being evaluated at θ̂_n. With M = 25 cells equiprobable under
F(·|θ̂_n), the upper cell boundaries are θ̂_n z_i, where

    z_i = −log(1 − i/25),  i = 1, ..., 24

Setting z_0 = 0 and z_25 = ∞, define

    v_i = z_{i−1}e^{−z_{i−1}} − z_ie^{−z_i}

so that ∂p_i/∂θ = v_i/θ̂_n at θ = θ̂_n. Since J(θ) = 1/θ² and p_i = 1/25,
(3.11) gives nD = (n/θ̂_n²)(1 − 25 Σ v_i²). Finally

    R_100 = (1/4) Σ_{i=1}^{25} (N_i − 4)² + ((25)²/100) (Σ_{i=1}^{25} N_iv_i)² / (1 − 25 Σ_{i=1}^{25} v_i²)

with

    1 − 25 Σ_{i=1}^{25} v_i² = 0.04255
For the WE2 data set, X̄ = 0.878. The resulting cell boundaries and cell
frequencies appear in Table 3.2, and

    R_100 = (1/4) Σ (N_i − 4)² + ((25)²/100)(−0.0519)²/0.04255
TABLE 3.2 The Rao-Robson Test for the Negative Exponential Family,
with 25 Equiprobable Cells

                        WE2              EXP
  i     z_i     v_i     z_iX̄    N_i     z_iX̄    N_i
For the EXP data the corresponding value is

    R_100 = (1/4) Σ (N_i − 4)² + ((25)²/100)(−0.1231)²/0.04255
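The cell quantities z_i and v_i in this example can be checked directly. The sketch below is my own reconstruction; it reproduces a constant close to the 0.04255 quoted above (treat the book's value as authoritative, since the reconstruction involves rounding):

```python
import math

M = 25
# z_0 = 0; z_i = -log(1 - i/25) for i = 1, ..., 24; z_25 is infinite.
z = [0.0] + [-math.log(1.0 - i / M) for i in range(1, M)]

def g(zi):
    """z * exp(-z); its limit as z -> infinity is 0."""
    return zi * math.exp(-zi)

# v_i = g(z_{i-1}) - g(z_i), with g(z_25) = 0 for the open last cell.
v = [g(z[i - 1]) - g(z[i]) for i in range(1, M)] + [g(z[M - 1])]

denom = 1.0 - M * sum(vi * vi for vi in v)
# sum(v) telescopes to g(0) - g(infinity) = 0, and denom is the
# constant appearing in the denominator of R_100 above.
```

The telescoping property Σv_i = 0 is what makes the centering term np_i drop out of the score sum in the correction term.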
3.3.3.2 Example

To test fit to the double exponential (Laplace) family, the raw-data MLEs are

    θ̂_1n = median(X_1, ..., X_n)

    θ̂_2n = n⁻¹ Σ_i |X_i − θ̂_1n|

With M = 2ν cells equiprobable under the fitted distribution, the cell
boundaries are θ̂_1n ± c_iθ̂_2n, where

    c_i = −log(1 − i/ν),  i = 0, 1, ..., ν

The derivatives of the cell probabilities are

    ∂p_i/∂θ_1 = −1/(Mθ_2),  i = 1, ..., ν
    ∂p_i/∂θ_1 = 1/(Mθ_2),   i = ν + 1, ..., M   (3.12)

and

    ∂p_i/∂θ_2 = (2θ_2)⁻¹ (c_{k−1}e^{−c_{k−1}} − c_ke^{−c_k}),
                i = ν + k and i = ν − k + 1,  k = 1, ..., ν

Then R_n may be computed from (3.10); the θ_1 component of the correction
term drops out by symmetry.
This computation was simplified by the fact that B′B is diagonal and the first
term of (3.9) is 0 by (3.12) and the definition of the median.

The BAEN data contain n = 33 observations, for which θ̂_1n = 10.13 and
θ̂_2n = 3.36. Table 3.3 contains c_i, the upper cell boundaries θ̂_1n + c_iθ̂_2n, and
the cell frequencies for these data. The statistic is, after some arithmetic,

    R_33 = (10/33) Σ_{i=1}^{10} (N_i − 3.3)² + (10/33)(−1.2828)²/((2)(0.1574))

The P-value from χ²(9) is 0.426. The Pearson statistic X²(θ̂_n) = 7.30 has
critical points falling between those of χ²(7) and χ²(8), taking advantage of the
fact that the grouped data MLE was used to estimate one of the two unknown
parameters. The corresponding bounds on the P-value are 0.398 and 0.505.
The double exponential model clearly fits the BAEN data very well. Even
TABLE 3.3 The Rao-Robson Test for the Double Exponential Family

 Cell      c_i      θ̂_1n + c_iθ̂_2n     N_i
  1      −1.609         4.722           4
  2      −0.916         7.051           7
  3      −0.511         8.414           3
  4      −0.223         9.380           2
  5       0            10.130           1
  6       0.223        10.880           3
  7       0.511        11.846           4
  8       0.916        13.209           3
  9       1.609        15.538           4
 10       ∞              ∞              2
3.3.3.3 Example

For the circular bivariate normal family, the raw-data MLEs of the location
parameters are μ̂_1 = X̄ and μ̂_2 = Ȳ. In constructing a test of fit to this
family, it is natural to use as cells annuli centered at (X̄, Ȳ) with successive
radii c_iσ̂. Thus

    p_i(θ) = ∫∫_{E_i} f(x, y|θ) dx dy

For cells equiprobable under the fitted distribution, the radii satisfy

    c_i = {−2 log(1 − i/M)}^{1/2},  i = 1, ..., M − 1

The derivatives of the cell probabilities with respect to the location
parameters vanish,

    ∂p_i/∂μ_1 = ∂p_i/∂μ_2 = 0

while

    ∂p_i/∂σ = σ⁻¹ (c_{i−1}²e^{−c_{i−1}²/2} − c_i²e^{−c_i²/2}) = v_i/σ

Hence B′B is diagonal, with only the entry corresponding to σ nonzero:

    B′B = σ⁻² diag(0, 0, M Σ_{i=1}^{M} v_i²)
The Fisher information matrix for the circular bivariate normal family is
also diagonal,

    J(θ) = σ⁻² diag(1, 1, 4)

and

    V_n′B = n^{−1/2} (0, 0, (M/σ) Σ_i N_iv_i)

so that the Rao-Robson statistic is

    R_n = X²(θ̂_n) + (M²/n) (Σ_{i=1}^{M} N_iv_i)² / (4 − M Σ_{i=1}^{M} v_i²)

where

    v_i = c_{i−1}²e^{−c_{i−1}²/2} − c_i²e^{−c_i²/2}
3.3.3.4 Example

The negative exponential family

    f(x|θ) = θ⁻¹e^{−x/θ},  0 < x < ∞,   Ω = {θ : 0 < θ < ∞}

is often assumed in life testing situations. Such studies may involve not a full
sample, but rather Type II censored data. That is, order statistics are
observed up to the sample α-quantile,

    X_(1) ≤ X_(2) ≤ ··· ≤ X_([nα])

where [nα] is the greatest integer in nα and 0 < α < 1. It is natural to make
TESTS OF CHI-SQUARED TYPE 87
the cells data-dependent. The grouped-data MLE of θ solves

    Σ_{i=1}^{M} N_i (ξ_{i−1}e^{−ξ_{i−1}/θ} − ξ_ie^{−ξ_i/θ}) / (e^{−ξ_{i−1}/θ} − e^{−ξ_i/θ}) = 0

and the Pearson statistic is

    Σ_{i=1}^{M} [N_i − np_i(θ̂)]² / (np_i(θ̂))

where

    p_i(θ) = e^{−ξ_{i−1}/θ} − e^{−ξ_i/θ}   (random)

The raw-data MLE from the censored sample is

    θ̂_n = [nα]⁻¹ ( Σ_{i=1}^{[nα]} X_(i) + (n − [nα]) X_([nα]) )
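The censored-sample MLE above has the familiar total-time-on-test form: the observed lifetimes plus the censored time contributed by the n − [nα] unfailed items, divided by the number of failures. A minimal sketch (my own illustration):

```python
def censored_exponential_mle(order_stats, n):
    """MLE of theta from the r smallest of n exponential observations.

    order_stats -- observed order statistics X_(1) <= ... <= X_(r)
    n           -- full sample size (n - r values are censored at X_(r))
    """
    r = len(order_stats)
    # Total time on test: observed lifetimes plus censored exposure.
    total_time = sum(order_stats) + (n - r) * order_stats[-1]
    return total_time / r

# With no censoring (r = n) this reduces to the sample mean.
```

For example, with order statistics (1, 2) observed out of n = 4, the estimate is (1 + 2 + 2·2)/2 = 3.5.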
By obtaining the limiting distribution of V_n(θ̂_n) and then finding the
appropriate quadratic form, a generalization of the Rao-Robson statistic to
censored samples can be obtained. This is done in Mihalko and Moore (1980).
The censored-data analog of the Rao-Robson statistic has the form

    R_n = X²(θ̂_n) + (nD)⁻¹ ( Σ_{i=1}^{M} N_iv_i )²

where N_i and p_i(θ̂_n) are as above, and v_i and D are the censored-sample
analogs of the quantities in (3.11).
with a P-value of 0.734 from χ²(8). These results are comparable to those
obtained for the same data in Section 3.3.3.1.
the individual observations too often. That is, positive dependence is
confounded with lack of fit. This is shown in considerable generality for both
chi-squared and EDF tests by Gleser and Moore (1983). If a model for the
dependence is assumed, it may be possible to compute the effect of dependence
or even to construct a valid chi-squared test using the distributional
results in Moore (1982). But in general, data should be checked for serial
dependence before testing fit, as the tests are sensitive to dependence as
well as to lack of fit.
3. Sequentially adjusted cells. By use of the conditional probability
integral transformation (see Chapter 6), O'Reilly and Quesenberry (1973)
obtain particular members of the following class of nonstandard chi-squared
tests. Rather than base cell frequencies on cells E_i (fixed) or E_i(X_1, ..., X_n)
(data-dependent) into which all of X_1, ..., X_n are classified, the cells used
to classify each successive X_j are functions E_ij of X_1, ..., X_j only. Thus
additional observations do not require reclassification of earlier observations,
as in the usual random cell case. No general theory of chi-squared
statistics based on such sequentially adjusted cells is known. O'Reilly and
Quesenberry obtain by their transformation approach specific functions E_ij
such that the cell frequencies are multinomially distributed and the Pearson
statistic has the χ²(M − 1) limiting null distribution. The transformation
approach requires the computation of the minimum variance unbiased
estimator of F(·|θ). Testing fit to an uncommon family thus requires the
practitioner to do a hard calculation. Moreover, any test using sequentially
adjusted cells has the disadvantage that the value of the statistic depends on
the order in which the observations were obtained. These are serious
barriers to use.
4. Easterling's approach. Easterling (1976) provides an interesting
approach to parameter estimation based on tests of fit. Roughly speaking,
he advocates replacing the usual confidence intervals for θ in F(·|θ), based
on the acceptance regions of a test of

    H₀: θ = θ₀  versus  H₁: θ ≠ θ₀

by regions based on the acceptance regions of the goodness-of-fit hypotheses

    H₀′: G(·) = F(·|θ₀)  versus  H₁′: G(·) ≠ F(·|θ₀)

The chi-squared version of this proposal takes as the confidence region those
θ₀ for which

    Σ_{i=1}^{M} [N_i − np_i(θ₀)]² / (np_i(θ₀)) ≤ c_α   (3.13)

But if θ̂_n is the minimum chi-squared estimator, (3.13) holds for some θ₀
if and only if the region is nonempty, that is, unless

    X²(θ̂_n) > c_α   (3.14)

When any F(x|θ) is true, X²(θ̂_n) has the χ²(M − p − 1) distribution, and the
probability of the event (3.14) can be explicitly computed. It is less than α,
but close to α when M is large. Thus Easterling's suggestion essentially
reduces to the use of standard tests of fit with parameters estimated by the
minimum distance method corresponding to the test statistic employed.
Moreover, his method by-passes a proper consideration of the distributional
effects of estimating unknown parameters.
Chi-squared tests are generally less powerful than EDF tests and special-
purpose tests of fit. It is difficult to assess the seriousness of this lack of
power from published sources. Comparative studies have generally used the
Pearson statistic rather than the more powerful Watson-Roy and Rao-Robson
statistics. Moreover, such studies have often dealt with problems of parameter
estimation in ways which tend to understate the power of general-purpose
tests such as chi-squared and Kolmogorov-Smirnov tests. This is true of the
study by Shapiro, Wilk and Chen (1968), for example. Reliable information
about the power of chi-squared tests for normality can be gained from Table
IV of Rao and Robson (1974) and from Tables 1 and 2 of Dahiya and Gurland
(1973). The former demonstrates strikingly the gain in power (always at
least 40% in the cases considered, and usually much greater) obtained by
abandoning the Pearson-Fisher statistic for more modern chi-squared
statistics. Nonetheless, chi-squared tests cannot in general match EDF and
special-purpose tests of fit in power.
This relative lack of power implies three theses on the practical use of
chi-squared techniques. First, chi-squared tests of fit must compete for use
primarily on the basis of flexibility and ease of use. Discrete and/or
multivariate data do not discomfit chi-squared methods, and the necessity to
estimate unknown parameters is more easily dealt with by chi-squared tests
than by other tests of fit.

Second, chi-squared statistics actually having a (limiting) chi-squared
null distribution have a much stronger claim to practical usefulness. Ease
of use requires the ability to obtain (1) the observed value of the test
statistic, and (2) critical points for the test statistic. The calculations required
for (1) in chi-squared statistics are at most iterative solutions of nonlinear
equations and evaluation of quadratic forms, perhaps with the matrix expressed
as the inverse of a given symmetric positive definite matrix. These are not
serious barriers to practical use, given the current availability of computer
library routines. Computation of critical points of an untabled distribution is
a much harder task for a user of statistical methods. Chi-squared and EDF
statistics both have as their limiting null distributions the distributions of
linear combinations of central chi-squared random variables. General
statistics of both classes require a separate table of critical points for each
hypothesized family. The effort needed is justified when the hypothesized
family is common, but should be expended on a test more powerful than
chi-squared tests. In less common cases, or when no more powerful test with
θ-free null distribution is available, there are several chi-squared tests
requiring only tables of the chi-squared distribution. These include the
Pearson-Fisher, Rao-Robson, and Dzhaparidze-Nikulin tests, and others
which can be constructed by Wald's method. Among the chi-squared statistics
proposed and studied to date, the Rao-Robson statistic R_n of (3.10) appears
to have generally superior power and is therefore the statistic of choice for
protection against general alternatives. Computation of R_n in the nonstandard
cases most appropriate for chi-squared tests of fit does require some
mathematical work. However, the Pearson statistic X²(θ̂_n) with raw-data
MLEs is the first and usually dominant component of R_n. If X²(θ̂_n) itself
lies in the upper tail of the χ²(M − 1) distribution, the fit can be rejected
without computing R_n.

The third thesis rests on the exposition and examples in this chapter.
Chi-squared tests are the most practical tests of fit in many situations. When
parameters must be estimated in non-location-scale families or in uncommon
distributions, when the data are discrete, multivariate, or even censored,
chi-squared tests remain easily applicable.
ACKNOWLEDGMENT
REFERENCES

Fisher, R. A. (1924). The conditions under which χ² measures the discrepancy
between observation and hypothesis. J. Roy. Statist. Soc. 87, 442-450.
4.1 INTRODUCTION

4.2 EMPIRICAL DISTRIBUTION FUNCTION STATISTICS
Suppose a given random sample of size n is X_1, ..., X_n, and let X_(1) < X_(2)
< ··· < X_(n) be the order statistics; suppose further that the distribution of
X is F(x). For the present and in most of this chapter we assume this
distribution to be continuous. The empirical distribution function (EDF) is F_n(x),
defined by

    F_n(x) = (number of observations ≤ x) / n,   −∞ < x < ∞
98 STEPHENS
In stepwise form,

    F_n(x) = 0,     x < X_(1)
    F_n(x) = i/n,   X_(i) ≤ x < X_(i+1),  i = 1, ..., n − 1   (4.1)
    F_n(x) = 1,     X_(n) ≤ x
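The step function (4.1) translates into a one-line computation. A minimal sketch (my own illustration):

```python
def edf(sample, x):
    """Empirical distribution function F_n(x): proportion of the
    sample less than or equal to x."""
    return sum(1 for xi in sample if xi <= x) / len(sample)

# For the sample {1, 2, 3, 4}:
#   F_n jumps by 1/4 at each observation, is 0 below 1 and 1 at and above 4.
```

Evaluating edf at a point between the i-th and (i+1)-th order statistics returns i/n, exactly as in (4.1).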
E 4.2.1 Example
Table 4.1 gives the weight X in grams of twenty 21-day-old leghorn chicks,
given by Bliss (1967) and taken from Appendix A. Figure 4.1 gives the EDF
of these sample data. It is clear that the weights have been rounded to the
nearest gram, so that strictly the parent population is discrete, but with
these large numbers this approximation will make negligible difference. The
F_n(x) suggests that F(x) will have the characteristic S-shape of many
distributions, including the normal distribution. A typical normal distribution,
with mean μ = 200 and variance σ² = 1225, has also been drawn in Figure 4.1;
this will be used in later work to illustrate tests that the sample of
weights comes from this distribution.
4.2.2 EDF Statistics

A statistic measuring the difference between F_n(x) and F(x) will be called an
EDF statistic; we shall concentrate on seven which have attracted most
attention. They are based on the vertical differences between F_n(x) and F(x), and
are conveniently divided into two classes, the supremum class and the
quadratic class.

The supremum statistics. The first two EDF statistics, D⁺ and D⁻, are,
respectively, the largest vertical difference when F_n(x) is greater than F(x),
and the largest vertical difference when F_n(x) is smaller than F(x); formally,
D⁺ = sup_x {F_n(x) − F(x)} and D⁻ = sup_x {F(x) − F_n(x)}. The most well-known
EDF statistic is D, introduced by Kolmogorov (1933):

    D = sup_x |F_n(x) − F(x)| = max(D⁺, D⁻)

A closely related statistic is Kuiper's

    V = D⁺ + D⁻

The quadratic statistics. The quadratic class is based on

    Q = n ∫ {F_n(x) − F(x)}² ψ(x) dF(x)

where ψ(x) is a suitable function which gives weights to the squared difference
{F_n(x) − F(x)}². When ψ(x) = 1 the statistic is the Cramér-von Mises
statistic, now usually called W², and when ψ(x) = [{F(x)}{1 − F(x)}]⁻¹ the
statistic is the Anderson-Darling (1954) statistic, called A². A modification
of W², also devised originally for the circle (see Section 4.5.3), is the Watson
(1961) statistic U², defined by

    U² = n ∫ {F_n(x) − F(x) − ∫ (F_n − F) dF}² dF(x)
From the basic definitions of the supremum statistics and the quadratic
statistics given above, suitable computing formulas must be found. This is done
by using the Probability Integral Transformation (PIT), Z = F(X); when F(x)
is the true distribution of X, the new random variable Z is uniformly
distributed between 0 and 1. Then Z has distribution function F*(z) = z, 0 ≤ z ≤ 1.
Suppose that a sample X_1, ..., X_n gives values Z_i = F(X_i), i = 1, ..., n,
and let F_n*(z) be the EDF of the values Z_i.

EDF statistics can now be calculated from a comparison of F_n*(z) with
the uniform distribution for Z. It is easily shown that, for values z and x
related by z = F(x), the corresponding vertical differences in the EDF
diagrams for X and for Z are equal; that is, F_n(x) − F(x) = F_n*(z) − z. With
Z_(1) < ··· < Z_(n) the ordered Z-values, the computing formulas are:

    D⁺ = max_{1≤i≤n} {i/n − Z_(i)},   D⁻ = max_{1≤i≤n} {Z_(i) − (i − 1)/n}
    D = max(D⁺, D⁻),   V = D⁺ + D⁻
    W² = Σ_{i=1}^{n} {Z_(i) − (2i − 1)/(2n)}² + 1/(12n)            (4.2)
    U² = W² − n(Z̄ − 0.5)²
    A² = −n − (1/n) Σ_{i=1}^{n} (2i − 1)[log Z_(i) + log{1 − Z_(n+1−i)}]

In these expressions, log x means log_e x, and the sums and maxima are
over 1 ≤ i ≤ n. All these formulas are very straightforward to calculate,
particularly with a modern computer or programmable desk calculator.
Note that statistic D⁻ can be easily miscalculated, using i/n instead of
(i − 1)/n.
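The computing formulas (4.2) can be coded directly. The sketch below is my own; it is checked against the circle example of Section E 4.5.3, where the values for the sample (0.3, 0.4, 0.6, 0.9) are quoted:

```python
import math

def edf_statistics(z):
    """Supremum and quadratic EDF statistics from PIT values z,
    using the computing formulas (4.2)."""
    z = sorted(z)
    n = len(z)
    # Note the (i - 1)/n in D-: using i/n here is a common error.
    d_plus = max((i + 1) / n - zi for i, zi in enumerate(z))
    d_minus = max(zi - i / n for i, zi in enumerate(z))
    d = max(d_plus, d_minus)
    v = d_plus + d_minus
    w2 = sum((zi - (2 * i + 1) / (2 * n)) ** 2
             for i, zi in enumerate(z)) + 1.0 / (12 * n)
    u2 = w2 - n * (sum(z) / n - 0.5) ** 2
    a2 = -n - sum((2 * i + 1) * (math.log(z[i]) + math.log(1 - z[n - 1 - i]))
                  for i in range(n)) / n
    return {"D+": d_plus, "D-": d_minus, "D": d, "V": v,
            "W2": w2, "U2": u2, "A2": a2}

# For z = (0.3, 0.4, 0.6, 0.9): D+ = 0.15, D- = 0.30, V = 0.45,
# W2 = 0.053, U2 = 0.043, A2 = 0.337, matching Example E 4.5.3.
```

The 0-based loop index i corresponds to the 1-based i of (4.2) through i + 1; this is the kind of off-by-one detail that produces the D⁻ miscalculation warned about above.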
4.3.1 General Comments

When the unknown components of θ are location or scale parameters, and if
these are estimated by appropriate methods, the distributions of EDF
statistics will not depend on the true values of the unknown parameters. Thus
percentage points for EDF tests for such distributions, for example, the normal,
exponential, extreme-value, and logistic distributions, depend only on the
family tested and on the sample size n. Nevertheless, the exact distributions
of EDF statistics are very difficult to find and, except for the exponential
distribution, Monte Carlo studies have been extensively used to find points for
finite n. Fortunately, for the quadratic statistics W², U², and A², asymptotic
theory is available; furthermore, the percentage points of these statistics for
TESTS BASED ON EDF STATISTICS 103
finite n converge rapidly to the asymptotic points. For the statistics D⁺, D⁻,
D, and V, there is no general asymptotic theory (except for Case 0), and
even asymptotic points must be estimated. This may be done by plotting, for
a fixed α, the Monte Carlo points for samples of size n against m = 1/n,
and then extrapolating to m = 0; alternatively, since the statistics are
functions of a process which is asymptotically Gaussian, points may be found by
simulating the Gaussian process. Serfling and Wood (1975) and Wood (1978a)
have obtained asymptotic points by this method. Both techniques are of
course subject to sampling variation and errors due to extrapolation; Chandra,
Singpurwalla, and Stephens (1981) have given some comparisons of the two
methods in obtaining points for tests for the extreme value distribution.
For the tests corresponding to many distributional families, Stephens
(1970, 1974b, 1977, 1979) has given modifications of the test statistics; if
the statistic is, say, T, the modification is a function of n and T which is
then referred to the asymptotic points of T or of T√n. Asymptotic theory
depends on using asymptotically efficient estimators for the estimates of
unknown components of θ; the asymptotic points given will then be valid for
any such estimators. Points for finite n will depend on which estimators are
used; usually these are maximum likelihood estimators, although an exception
is the Cauchy distribution below. In the test situations in following sections,
percentage points for finite n, using the estimators given, were found from
extensive Monte Carlo studies, often done by the author, although other
studies referenced have also been used. The modifications are then derived
from an examination of how these points, for α = 0.05 say, converge to the
asymptotic point. A feature of the modifications is that, at least in the
appropriate tail, they do not depend on n; thus when such modifications have been
found, the usual tables of percentage points, with entries for n and α, can
be reduced to one line for each test situation. The modifications hold only if
the estimators given are used. They have been calculated to be most accurate
at about α = 0.05, but usually give good results, for practical purposes,
for α less than about 0.2.
When unknown parameters are not location or scale parameters, for example
when the shape parameter of a Gamma or a Weibull distribution is unknown,
null distribution theory, even asymptotic, when the parameters are
estimated, will depend on the true values of these parameters. However, if this
dependence is very slight, a set of tables, to be used with the estimated
value of the shape parameter, can still be valuable (see, for example,
Section 4.12, concerning tests for the Gamma distribution). Other methods of
dealing with unknown parameters are discussed in Section 4.16.3.
There is now a vast literature on EDF statistics and tests, and only the
principal references related directly to the tests and tables are included
here. Surveys have been given by Sahler (1968) and by Neuhaus (1979); a
comprehensive review of the theory, and many references, is given by Durbin
(1973).
We next give tests and applications of EDF statistics for Case 0, and in
subsequent sections give tests for the major distributions.

The following procedure can now be set out for EDF tests for Case 0, that
is, for the null hypothesis that the sample comes from the completely
specified continuous distribution F(x).

(a) Put the X_i in ascending order, X_(1) ≤ X_(2) ≤ ··· ≤ X_(n).
(b) Calculate Z_(i) = F(X_(i)), i = 1, ..., n.
(c) Calculate the appropriate test statistic using (4.2).
(d) Modify the test statistic as in Table 4.2, using the modifications for the
    upper tail, and compare with the appropriate line of percentage points.
    If the statistic exceeds the value in the upper tail given at level α, H₀ is
    rejected at significance level α.
E 4 . 4 . 1 Example
Suppose the data in Table 4 . 1 are to be tested to come from a normal d istri
bution with mean /u = 200 and standard deviation (j = 35. This is the distri
bution drawn in Figure 4 .1 . The Probability Integral Transform ation gives
the values in column Z j of Table 4 . 1. These have then been used to draw the
EDF in Figure 4.2. The calculation of D+ and D ” is also illustrated in the
figure. Form ulas (4.2) give, t o S d . p . : D"*^ =0.044, D = 0 .1 7 1 , D = 0.171,
V = 0.216, for the supremum statistics, and = Ó.187, U^ = 0.051, and
A^ = 1.019 for the quadratic c la ss. From Table 4.2 the modified value D *
is found from D * = D( n/ii + 0 .1 2 + 0. l l / ’sTn) and the value is 0.790. R e fe r
ence to the percentage points on the same line as the modification (the asymp
totic percentage points of iW n ) shows D to be not significant at the 15% level.
The modified values o f the other statistics (using, fo r example, W * for m od-
lfled W ^) a re : D +* = 0.203, D " * = 0.790, V * = 1.011, V ,* = . 177, U * = .048,
A * = 1.019. These give levels o f significance a (o r p -le v e ls) w ell below the
25% point for all the statistics, so that the hypothesis Hq w ill not be rejected.
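The modification for D quoted above is reproduced below; this is my own sketch of the calculation in E 4.4.1, using the formula given in the text:

```python
import math

def modified_d(d, n):
    """Stephens' Case 0 modification: D* = D (sqrt(n) + 0.12 + 0.11/sqrt(n)).

    D* is referred directly to the asymptotic percentage points of D sqrt(n),
    so no separate table for each n is needed.
    """
    rn = math.sqrt(n)
    return d * (rn + 0.12 + 0.11 / rn)

# For the chick-weight data: D = 0.171 with n = 20 gives
# D* = 0.790 (to 3 d.p.), as in Example E 4.4.1.
```

The same multiplier applies to D⁺ and D⁻; applying it to D⁺ = 0.044 reproduces the D⁺* = 0.203 quoted above.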
It will be observed from the modifications in Table 4.2 that the percentage
points of W² and U² for finite n converge rapidly in the upper tail to the
asymptotic points, and even if the modifications were not included in the
table, the use of the asymptotic percentage points for n ≥ 20 would give
negligible error in α. Even more striking is the fact that, for n > 3, the
distribution of A² is accurately given by the asymptotic distribution. This
TABLE 4.2 Modifications and Percentage Points for EDF Statistics for
Testing a Completely Specified Distribution (Case 0; Section 4.4)

                                                         Significance level α
Statistic T   Modified form T*                   .25   .15   .10   .05   .025  .01   .005  .001
U²            (U² − 0.1/n + 0.1/n²)(1.0 + 0.8/n) 0.105 0.131 0.152 0.187 0.222 0.268 0.304 0.385

Adapted from Stephens (1970), with permission of the Royal Statistical Society.
FIGURE 4.2 EDF of Z_i.
The test as described above is a one-tail test, using only the upper tail of
the test statistics. This is because, in general, we should expect the difference
between F_n(x) and F(x), or between F_n*(z) and z, to be large when H₀
is not true. If a test statistic appears to be significant in the lower tail, it
suggests that the Z-sample is too regular to be a random uniform sample,
and perhaps the original X-data have been tampered with. Such Z-values are
called superuniform; tests for superuniformity can be made using the
modifications and lower tail points also given in Table 4.2. Superuniform
observations can arise in other ways also, particularly in connection with tests for
the exponential distribution, or for randomness of points in time. An
interesting data set which appears to be superuniform is the dates of the kings
and queens of England (Pearson, 1963). Further comments on superuniformity
are in Chapters 8 and 10.
Suppose a test statistic T takes the value t; the significance level, or p-value,
of the statistic will then be the value p = P(T > t). In some contexts the term
is also applied to the lower tail probability P(T < t), but here q, or q-level,
will be used for this quantity; thus q = 1 − p. It is useful (especially in
combining several independent tests, see Sections 4.18 and 8.15) to be able to
calculate the significance level all along the distribution of T, and not merely
in the tails. For A², Case 0, the table of q-values of the asymptotic
distribution given by Lewis (1961) is reproduced in Table 4.3. Since A² needs no
modification for sample size greater than three, Table 4.3 may be used to
give q-values for all n > 3. Tables to find p or q in the tails were given for
other EDF statistics by Stephens (1970).
E 4.5.2 Example

4.5.3 Observations on a Circle

E 4.5.3 Example
Consider a small sample of four values, which are to be tested for uniformity
around a circle of unit circumference. With North as origin, and positive
direction clockwise, suppose the X-values are 0.3, 0.4, 0.6, 0.9. If East
were regarded as origin, these values would change to 0.05, 0.15, 0.35,
and 0.65. When the EDFs are drawn for these two cases, the values of D⁺,
D⁻, and D are, respectively, 0.15, 0.30, 0.30 and 0.40, 0.05, 0.40, but in
both cases V is 0.45. The corresponding values of W², A², and U² are
TABLE 4.3 Distribution of A², Case 0: The Table Gives q = P(A² ≤ z)

Adapted from Lewis (1961), with permission of the author and of the Institute
of Mathematical Statistics.
0.053, 0.337, 0.043 for North as origin, and 0.203, 1.116, 0.043 for East;
W² and A² change in value but U², like V, remains constant.
When parameters are not known in F(x;θ), confidence sets may be provided
for them by the following device. Suppose θ is a vector of unknown
parameters, which need not be only location or scale parameters, and suppose
values are given to unknown components of θ to give vector θ₀; then F(x;θ₀)
is completely specified and EDF statistics can be calculated. Suppose T is
such a statistic. The confidence set for θ, derived from T, and with level
100(1 − α)%, includes all those values of θ₀ which make T not significant at
level α. The confidence set is sometimes called a consonance set or region.
Easterling (1976) and Littell and Rao (1978) have investigated the use of A²
and D for finding consonance sets for parameters. The technique affords an
interesting mixture of goodness-of-fit and parameter estimation methods.
Many other statistics have been proposed to measure the discrepancy between
F_n(x) and F(x;θ); they are often closely related to the seven statistics
discussed above, and have similar properties.

    between F_n*(z) and z at equal intervals along the z-axis between 0
    and 1; the statistic has a discrete distribution.
(c) Let δ_i = max{|Z_(i) − (i − 1)/n|, |Z_(i) − i/n|}; the Kolmogorov statistic
    D is then max_i δ_i. Finkelstein and Schafer (1971) have proposed the
    statistic S = Σ_i δ_i and have given a table of percentage points for n up to
    30.
(d) Hegazy and Green (1975) and Green and Hegazy (1976) have discussed
    several statistics calculated from slight modifications of the computing
    formulas in Section 4.2. Berk and Jones (1979) gave other statistics
    based on F_n(x) and similar to the Kolmogorov statistics. Hegazy and
    Green (1975) have demonstrated that their modified statistics can
    increase power against certain alternatives, and Berk and Jones showed
    certain optimal properties in the sense of Bahadur efficiency for their
    statistics.
(e) A set of statistics closely related to EDF statistics, although not derived
    from the EDF, is the C and K set, described in Section 8.8.
In Case 0, as we have seen, the final test is that a set of variables Z is uni
form ly distributed U (0 ,1), and a discussion of power properties of EDF
statistics is therefore deferred to Chapter 8, where tests for uniformity are
discussed in detail. However, for later comparisons in this chapter, we
sum m arize certain properties of EDF statistics in Case 0 situations.
(a) EDF statistics are usually much more powerful than the Pearson chi-square statistic; this might be explained by the fact that for the chi-square statistic the data must be grouped, with a resulting loss of information, especially for small samples.
(b) The most well-known EDF statistic is D, but it is often much less powerful than the quadratic statistics W² and A².
(c) Statistics D⁺ and D⁻ will be powerful in detecting whether or not the Z-set tends to be close to 0 or to 1, respectively; A², W², and D will detect either of these two alternatives, and U² and V are powerful in detecting a clustering of Z values at one point, or a division into two groups near 0 and 1. In terms of the original observations X, statistics D⁺, D⁻, A², W², and D will detect an error in mean in F(x;θ) as specified, and U² and V will detect an error in variance.
(d) A² often behaves similarly to W², but is on the whole more powerful for tests when F(x;θ) departs from the true distribution in the tails, especially when there appear to be too many outlying X-values for the F(x;θ) as specified. In goodness-of-fit work, departure in the tails is often important to detect, and A² is the recommended statistic.
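As a concrete point of reference for these comparisons, the seven statistics can be computed directly from the ordered Z values with the standard computing formulas (4.2). The sketch below is ours, not code from the text (the function name edf_statistics is our own):

```python
# Sketch: the seven Case 0 EDF statistics from ordered values
# Z_(1) <= ... <= Z_(n), assumed already transformed to U(0,1).
import math

def edf_statistics(z):
    z = sorted(z)
    n = len(z)
    d_plus  = max((i + 1) / n - zi for i, zi in enumerate(z))
    d_minus = max(zi - i / n for i, zi in enumerate(z))
    d = max(d_plus, d_minus)
    v = d_plus + d_minus                          # Kuiper's statistic
    w2 = sum((zi - (2 * i + 1) / (2 * n)) ** 2
             for i, zi in enumerate(z)) + 1 / (12 * n)
    zbar = sum(z) / n
    u2 = w2 - n * (zbar - 0.5) ** 2               # Watson's statistic
    a2 = -n - sum((2 * i + 1) * (math.log(z[i]) + math.log(1 - z[n - 1 - i]))
                  for i in range(n)) / n          # Anderson-Darling
    return d_plus, d_minus, d, v, w2, u2, a2
```

For evenly spread values such as 0.1, 0.3, ..., 0.9 all the statistics are small, as the power remarks above would lead one to expect.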
TESTS BASED ON EDF STATISTICS 111
4.7.1 Introduction

1D_{t,n} = sup_{0 ≤ z ≤ t} |F_n(z) − z|

with the supremum computed, in practice, as the maximum over 1 ≤ i ≤ r of |Z_(i) − (i − 1)/n| and |Z_(i) − i/n|, together with |t − r/n|.
TABLE 4.4 Upper Tail Asymptotic Percentage Points for √nD, W², and A², for Type 1 or Type 2 Censored Data from U(0,1) (Section 4.7)

Significance level α

Statistic √nD
p      .50     .25     .15     .10     .05     .025    .01     .005
.2    .4923   .6465   .7443   .8155   .9268  1.0282  1.1505  1.2361
.3    .5889   .7663   .8784   .9597  1.0868  1.2024  1.3419  1.4394
.4    .6627   .8544   .9746  1.0616  1.1975  1.3209  1.4696  1.5735
.5    .7204   .9196  1.0438  1.1334  1.2731  1.3997  1.5520  1.6583
.6    .7649   .9666  1.0914  1.1813  1.3211  1.4476  1.5996  1.7056
.7    .7975   .9976  1.1208  1.2094  1.3471  1.4717  1.6214  1.7258
.8    .8183  1.0142  1.1348  1.2216  1.3568  1.4794  1.6272  1.7306
.9    .8270  1.0190  1.1379  1.2238  1.3581  1.4802  1.6276  1.7308
1.0   .8276  1.0192  1.1379  1.2238  1.3581  1.4802  1.6276  1.7308

Statistic A²
p      .50     .25     .15     .10     .05     .025    .01     .005
.2    .135    .252    .333    .436    .588    .747    .962   1.129
.3    .204    .378    .528    .649    .872   1.106   1.425   1.731
.4    .275    .504    .700    .857   1.150   1.455   1.872   2.194
.5    .349    .630    .875   1.062   1.419   1.792   2.301     -
.6    .425    .756   1.028   1.260   1.676   2.112   2.707     -
.7    .504    .882   1.184   1.451   1.920   2.421   3.083     -
.8    .588   1.007   1.322   1.623   2.146   2.684   3.419     -
.9    .676   1.131   1.467   1.798   2.344   2.915   3.698     -
1.0   .779   1.248   1.610   1.933   2.492   3.070   3.880   4.500

Table for √nD adapted from Koziol and Byar (1975), with permission of the authors and of the American Statistical Association. Tables for W² and A² adapted from Pettitt and Stephens (1976), with permission of the authors and of the Biometrika Trustees.
2D_{r,n} = sup_{0 ≤ z ≤ Z_(r)} |F_n(z) − z| = max_{1 ≤ i ≤ r} max{ |Z_(i) − (i − 1)/n|, |Z_(i) − i/n| }   (4.4)
H₀: the censored sample X_(1) < X_(2) < ⋯ < X_(r) comes from the fully specified continuous distribution F(x).

For Type 1 censoring:
For Type 2 censoring:
lim_{n→∞} P{ √n (1D_{t,n}) ≤ y } = G_t(y) = Φ(yA_t) − Φ(yB_t) e^{−2y²},  y > 0

where A_t = (t − t²)^{−1/2} and B_t = (2t − 1)A_t. The tail area for the two-sided statistic is approximated well, for significance levels less than 0.20, by doubling the one-sided value. Thus, to obtain a p-value, the test statistic 1D_{t,n} or 2D_{r,n} is first adapted to obtain D* as described above, and the p-value, for a two-sided test, is then well approximated by p = 2{1 − G_t(D*)}. Examples of calculations of Kolmogorov-Smirnov statistics for censored data are given in Section 11.3.1.
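The limiting result above converts directly into a small p-value routine. This is a sketch based on the form of G_t(y) just quoted, with phi the standard normal distribution function:

```python
# Sketch: asymptotic two-sided p-value for the censored Kolmogorov
# statistic, p = 2*(1 - G_t(D*)), with G_t as quoted above.
import math

def phi(x):  # standard normal CDF
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def G_t(y, t):
    a = 1.0 / math.sqrt(t * (1.0 - t))      # A_t = (t - t^2)^(-1/2)
    b = (2.0 * t - 1.0) * a                 # B_t = (2t - 1) A_t
    return phi(y * a) - math.exp(-2.0 * y * y) * phi(y * b)

def two_sided_p(d_star, t):
    return 2.0 * (1.0 - G_t(d_star, t))
```

As t approaches 1 this reduces to the familiar complete-sample limit 2 exp(−2y²); for example, at y = 1.3581 (the α = .05 point of Table 4.4) the p-value returned is very close to 0.05.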
2W²_{r,n} = Σ_{i=1}^{r} { Z_(i) − (2i − 1)/(2n) }² + r/(12n²) + (n/3){ Z_(r) − r/n }³   (4.5)

with companion computing formulas for 2U²_{r,n} and 2A²_{r,n}; these are the complete-sample formulas (4.2) truncated at i = r, with correction terms involving Z_(r).
118 STEPHENS
TABLE 4.5 (continued)
Significance level α
4.7.3.2 Doubly-Censored Data

For doubly-censored data, statistics such as 2W²_{sr,n} are defined by analogy with 2W²_{r,n} and 2W²_{s,n}, using the order statistics Z_(s+1), …, Z_(r); similar definitions hold for Type 1 censoring and for U² and A².
4.7.4 Random Censoring

X_i = min(X_i⁰, T_i), and δ_i = 1 if X_i = X_i⁰,
                          δ_i = 0 if X_i = T_i.

Such data could occur when the X_i⁰ are lifetimes of patients who enter a study of a certain disease; then if the patient dies from the disease before the study ends, X_i⁰ is recorded, but if the patient is still alive at the end of the study, or withdraws, or dies of another cause, the time T_i for which he or she was observed is recorded. The distribution function F(x) of X is then given by 1 − F(x) = {1 − F⁰(x)}{1 − F_c(x)}. There has been much recent interest in testing fit in the presence of random censoring, or in estimating and giving confidence intervals for F⁰(x) or the related survival function S⁰(x) = 1 − F⁰(x).
The product-limit estimate of F(x) is

F_n(x) = 0,                                              x < X_(1)
F_n(x) = 1 − Π_{i: X_(i) ≤ x} {(n − i)/(n − i + 1)}^{δ_i},   X_(1) ≤ x < X_(n)
F_n(x) = 1,                                              x ≥ X_(n)

A Cramér-von Mises statistic W² may then be computed from this estimate, using the successive differences Z_(i) − Z_(i−1). Suppose F_t(t) is the distribution of the t_i, and let G_t(t) = ∫ F_t(s) ds.
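A minimal sketch of the product-limit estimate written out above (the function name km_cdf is ours; the pairs (x, δ) are as defined in Section 4.7.4):

```python
# Sketch: product-limit estimate F_n evaluated at the ordered points.
# Each pair is (x, delta): delta = 1 for a true lifetime, 0 for a
# censoring time.
def km_cdf(pairs):
    pairs = sorted(pairs)
    n = len(pairs)
    surv = 1.0
    out = []
    for i, (x, delta) in enumerate(pairs, start=1):
        if delta == 1:                 # a jump only at uncensored points
            surv *= (n - i) / (n - i + 1)
        out.append((x, 1.0 - surv))
    return out
```

With no censoring (all δ_i = 1) the product telescopes and F_n reduces to the usual EDF value i/n at X_(i).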
4.7.5 Renyi Statistics

R₁ = sup_{a ≤ F(x) ≤ b} |F_n(x) − F(x)| / F(x)

R₂ = sup_{a ≤ F(x) ≤ b} { F_n(x) − F(x) } / F(x)

R₃ = sup_{x in P} |F_n(x) − F(x)| / F(x)

where P is the interval 0 ≤ F_n(x) ≤ r/n, with r an integer in the range 1 ≤ r ≤ n. Birnbaum and Lientz (1969a,b) have given exact and asymptotic theory for some of these statistics for Case 0, and have produced tables of percentage points for R₁, R₂, and R₃; they also gave examples of the use of the statistics, particularly in giving confidence limits for F(x) over a restricted range. Niederhausen (1981) has given tables of points for variance-weighted Kolmogorov-Smirnov D, that is, R₃ above but with denominator [F(x){1 − F(x)}]^{1/2} instead of F(x) (see Section 4.5.6), and for the analogues of D⁺ and D⁻. Other statistics of Renyi type, or closely related, have been discussed by a number of authors but, despite the potential applications for censored data, they have not been much developed for practical use.
statistics specially adapted for censored data, and it is discussed more fully in Section 11.3.3.

We now turn to EDF tests for distributions with one or more parameters unknown, beginning with the normal distribution.
Significance level α
Adapted from Stephens (1974b), with permission of the American Statistical Association.
TABLE 4.7 Modifications and Percentage Points for a Test for Normality with μ and σ² Unknown (Section 4.8.1, Case 3)

Significance level α

Statistic  Modified statistic             .50   .25   .15   .10   .05   .025  .01   .005

Upper tail
D    D(√n − 0.01 + 0.85/√n)                -     -   .775  .819  .895  .955  1.035   -
V    V(√n + 0.05 + 0.82/√n)                -     -  1.320 1.386 1.489 1.585  1.693   -
W²   W²(1.0 + 0.5/n)                     .051  .074  .091  .104  .126  .148   .179  .201
U²   U²(1.0 + 0.5/n)                     .048  .070  .085  .096  .117  .136   .164  .183
A²   A²(1.0 + 0.75/n + 2.25/n²)          .341  .470  .561  .631  .752  .873  1.035 1.159

Lower tail
A²   A²(1.0 + 0.75/n + 2.25/n²)          .341  .249  .226  .188  .160  .139   .119

Adapted, with additions, from Table 54 of Pearson and Hartley (1972) and from Stephens (1974b), with permission of the Biometrika Trustees and of the American Statistical Association.
CO
w_i = (X_(i) − μ)/ŝ   (Case 2)
E 4.8.1 Example

H₀: the sample is from a normal distribution, but with mean and variance unknown.

The situation is therefore Case 3, and the appropriate estimates for μ and σ² are given by x̄ = 209.6 and s² = 939.25. The transformations give the values in column Z2 of Table 4.1, and Figure 4.3 shows their EDF. Equations (4.2) give for the test statistics the values: D⁺ = 0.089, D⁻ = 0.104, D = 0.104, V = 0.192, W² = 0.034, U² = 0.034, A² = 0.214. The modified values are D* = 0.483, V* = 0.906, W* = 0.035, U* = 0.034, A* = 0.223. It can be seen from Table 4.7, using Case 3 percentage points, that these are not nearly significant at the 15% level, so that at this level the sample would not be rejected as coming from a normal population.
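The modified values quoted in the example can be checked against the multipliers of Table 4.7. In this sketch n = 20 is an assumption on our part (the sample size of Table 4.1 is not restated in this passage), and small discrepancies in the last digit come from the rounding of the quoted statistics:

```python
# Sketch: Case 3 modifications of Table 4.7 applied to the statistics
# of Example E4.8.1; n = 20 is assumed for the Table 4.1 data.
import math

def modify_case3(n, D, V, W2, U2, A2):
    rn = math.sqrt(n)
    return {
        "D*": D * (rn - 0.01 + 0.85 / rn),
        "V*": V * (rn + 0.05 + 0.82 / rn),
        "W*": W2 * (1.0 + 0.5 / n),
        "U*": U2 * (1.0 + 0.5 / n),
        "A*": A2 * (1.0 + 0.75 / n + 2.25 / n ** 2),
    }

mods = modify_case3(20, D=0.104, V=0.192, W2=0.034, U2=0.034, A2=0.214)
```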
FIGURE 4.3 EDF of Z2.
Formulas for significance levels (normality, Case 3):

Statistic W²
z < z₁:       log q = −13.953 + 775.5z − 12542.61z²
z₁ < z < z₂:  log q = −5.903 + 179.546z − 1515.29z²
z₂ < z < z₃:  log p = 0.886 − 31.62z + 10.897z²
z > z₃:       log p = 1.111 − 34.242z + 12.832z²

Statistic U²
z < z₁:       log q = −13.642 + 766.31z − 12432.74z²
z₁ < z < z₂:  log q = −6.3328 + 214.57z − 2022.28z²
z₂ < z < z₃:  log p = 0.8510 − 32.006z − 3.45z²
z > z₃:       log p = 1.325 − 38.918z + 16.45z²

Statistic A²
z < z₁:       log q = −13.436 + 101.14z − 223.73z²
z₁ < z < z₂:  log q = −8.318 + 42.796z − 59.938z²
z₂ < z < z₃:  log p = 0.9177 − 4.279z − 1.38z²
z > z₃:       log p = 1.2937 − 5.709z + 0.0186z²

Suppose z is a modified value of W², U², or A² (see Table 4.7). For a given z, find the interval in which z lies. The formula gives the value of log q (q = lower tail significance level) or log p (p = upper tail significance level).
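A sketch of applying these formulas for the A² column. The interval breakpoints z₁ = 0.200, z₂ = 0.340, z₃ = 0.600 are our assumption for A²; they did not survive in the passage above:

```python
# Sketch: significance level for a modified A* value via the log q /
# log p formulas; breakpoints 0.200, 0.340, 0.600 are assumed.
import math

def a2_significance(z):
    """Return ('lower'/'upper', significance level) for modified A^2."""
    if z < 0.200:
        return "lower", math.exp(-13.436 + 101.14 * z - 223.73 * z * z)
    if z < 0.340:
        return "lower", math.exp(-8.318 + 42.796 * z - 59.938 * z * z)
    if z < 0.600:
        return "upper", math.exp(0.9177 - 4.279 * z - 1.38 * z * z)
    return "upper", math.exp(1.2937 - 5.709 * z + 0.0186 * z * z)
```

For the A* = 0.223 of Example E4.8.1 this gives a lower-tail level of about 0.17, nowhere near significance, in line with the conclusion of that example; for A* = 0.752 (the .05 point of Table 4.7) it returns an upper-tail level of about 0.05.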
E 4.8.2 Example
Green and Hegazy (1976) have shown that slight modifications of the basic EDF statistics can improve power in tests for normality against selected alternatives. Hegazy and Green (1975) have discussed tests based on values v_i = {(X_(i) − X̄)/s} − m_i, where m_i is the expected value of the i-th order statistic of a sample of size n from N(0,1).
for the most efficient estimates. For n > 10, which would be the situation most needed in practice, Gupta gives easily calculated coefficients based on the expected order statistics m_i.
TABLE 4.10 (continued)
Significance level α
Some asymptotic points taken from Pettitt (1976), with permission of the author and of the Biometrika Trustees.
Mukantseva (1977), Pierce and Kopecky (1978), and Loynes (1980) have studied the asymptotic behavior of the EDF of the residuals when a regression model has been fitted, with the intention of testing these residuals for normality. If the model, of any order, is correct, the residuals will be normal, with known mean equal to zero, but with unknown variance. At first sight this situation would appear to be Case 2 of Section 4.8.1 above, but this is not so, because the residuals are not independent. However, the above authors have shown that if EDF statistics are calculated from the residuals their asymptotic distributions are the same as for Case 3 above.

As an example, consider simple linear regression using the model y_i = β₀ + β₁x_i + ε_i, i = 1, …, n, with ε_i ~ N(0, σ²). Let β̂₀ and β̂₁ be the usual least squares estimators, and let σ̂² be the usual estimate of σ² obtained from the error sum of squares in the ANOVA table; if ŷ_i = β̂₀ + β̂₁x_i and ε̂_i = y_i − ŷ_i, then σ̂² = Σ_i ε̂_i²/(n − 2). The studentized residuals are (Pierce and Gray, 1982, with slight change in notation)

w_i = ε̂_i / [ σ̂ {1 − 1/n − (x_i − x̄)²/S_xx}^{1/2} ]

where S_xx = Σ_i (x_i − x̄)².
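The studentized residuals w_i can be sketched as follows (the data here are illustrative, not from the text):

```python
# Sketch: studentized residuals for simple linear regression,
# w_i = e_i / [sigma_hat * sqrt(1 - 1/n - (x_i - xbar)^2 / Sxx)].
import math

def studentized_residuals(x, y):
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s2 = sum(ei ** 2 for ei in e) / (n - 2)      # error mean square
    return [ei / math.sqrt(s2 * (1 - 1 / n - (xi - xbar) ** 2 / sxx))
            for xi, ei in zip(x, e)]

w = studentized_residuals([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 8.1, 9.9])
```

These w_i would then be tested for normality using the Case 3 procedure of Section 4.8.1.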
4.9.2 Tests for Case 1

The first method we shall describe for Case 1 uses a special property of the exponential distribution, as follows. Let Y_i = X_(i) − X_(1), i = 2, …, n; on H₀, the Y_i will be a random sample of size n − 1 from Exp(0, β) (see Section 10.3.1, Result 2) and, since β is known, a Case 0 test can be made using the n − 1 values of Y_i.

Alternatively, α may be estimated unbiasedly by α̂ = X_(1) − β/n; this estimate is derived from the maximum likelihood estimate X_(1), and has variance diminishing as 1/n². Then Z_(i) are found from Z_(i) = 1 − exp[−(X_(i) − α̂)/β], i = 1, …, n, and EDF statistics calculated from the Z_(i) by formulas (4.2) will have their Case 0 distributions asymptotically, so that the percentage points in Table 4.2 may be used for large samples. However, in contrast to the previous test procedure, the modifications given there will not apply, and since the two procedures are likely to have very similar power properties, the first procedure is more practical for relatively small samples.
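The first Case 1 procedure above can be sketched in a few lines (the data and the value of β are illustrative):

```python
# Sketch: Case 1 test for exponentiality via Y_i = X_(i) - X_(1).
# With beta known, the Y_i are transformed to Z values for a Case 0
# EDF test.
import math

def case1_z_values(x, beta):
    xs = sorted(x)
    y = [xi - xs[0] for xi in xs[1:]]       # n - 1 spacings from the minimum
    return [1.0 - math.exp(-yi / beta) for yi in y]

z = case1_z_values([5.2, 7.1, 6.0, 9.3, 12.8], beta=3.0)
```

The z values would then go into formulas (4.2) exactly as in a complete-sample Case 0 test.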
For this case (Case 2) suppose first that α is known to be zero. The maximum likelihood estimate of β is given by β̂ = X̄, where X̄ is the sample mean. The steps in testing H₀ are as follows:
Upper tail
Significance level α

Statistic T   Modified form T*     .25   .20   .15   .10   .05   .025  .01   .005  .0025
W²   W²(1.0 + 0.16/n)            .116  .130  .148  .175  .222  .271  .338  .390  .442
U²   U²(1.0 + 0.16/n)            .090  .099  .112  .129  .159  .189  .230  .261  .293
A²   A²(1.0 + 0.6/n)             .736  .816  .916 1.062 1.321 1.591 1.959 2.244 2.534

Lower tail
Significance level α
W²   Asymptotic percentage points: .0192 .0233 .0276 .0338 .039 .044 .048 .074

Adapted from Table 54 of Pearson and Hartley (1972) and from Stephens (1974b), with permission of the Biometrika Trustees and of the American Statistical Association.
TABLE 4.12 Formulas for Significance Levels, Tests for Exponentiality, Case 2: Origin Known, Scale Unknown (Section 4.9.3)

Statistic W²  (z₁ = 0.035, z₂ = 0.074, z₃ = 0.160)
z < z₁:       log q = −11.334 + 459.098z − 5652.1z²
z₁ < z < z₂:  log q = −5.779 + 132.89z − 866.58z²
z₂ < z < z₃:  log p = 0.586 − 17.87z + 7.417z²
z > z₃:       log p = 0.447 − 16.592z + 4.849z²

Statistic U²  (z₁ = 0.029, z₂ = 0.062, z₃ = 0.120)
z < z₁:       log q = −11.703 + 542.5z − 7574.59z²
z₁ < z < z₂:  log q = −6.3288 + 178.1z − 1399.49z²
z₂ < z < z₃:  log p = 0.8071 − 25.166z + 8.44z²
z > z₃:       log p = 0.7663 − 24.359z + 4.539z²

Statistic A²  (z₁ = 0.260, z₂ = 0.510, z₃ = 0.950)
z < z₁:       log q = −12.2204 + 67.459z − 110.3z²
z₁ < z < z₂:  log q = −6.1327 + 20.218z − 18.663z²
z₂ < z < z₃:  log p = 0.9209 − 3.353z + 0.300z²
z > z₃:       log p = 0.731 − 3.009z + 0.15z²

Suppose z is a modified value of W², U², or A² (see Table 4.11). For a given z, find the interval in which z lies. The formula gives the value of log q (q = lower tail significance level) or log p (p = upper tail significance level).
E 4.9.3 Example

Proschan (1963, Table 1) has given a number of samples of data, consisting of intervals between failures of air conditioning equipment in aircraft. We
TABLE 4.13 Original Values X in a Test for Exponentiality

X      z^a     z^b
12 0.113 0.094
21 0.189 0.159
26 0.229 0.193
27 0.237 0.200
29 0.252 0.213
29 0.252 0.213
48 0.381 0.327
57 0.434 0.375
59 0.446 0.385
70 0.503 0.439
74 0.522 0.457
153 0.783 0.717
326 0.962 0.932
386 0.979 0.959
502 0.993 0.984
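Column z^b of Table 4.13 can be reproduced directly (Case 2, with origin zero and β̂ = X̄); a sketch:

```python
# Sketch: Z_(i) = 1 - exp(-X_(i)/beta_hat) with beta_hat = X-bar,
# reproducing column z^b of Table 4.13.
import math

x = [12, 21, 26, 27, 29, 29, 48, 57, 59, 70, 74, 153, 326, 386, 502]
beta_hat = sum(x) / len(x)                 # 1819/15, about 121.27
z = [1.0 - math.exp(-xi / beta_hat) for xi in x]
```

The first and last values, 0.094 and 0.984, agree with the table; A², W², and U² would now be computed from these z by formulas (4.2) and modified as in Table 4.11.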
Significance level α
Statistic √nD
Statistic √nV
(d) Find the EDF statistics from (4.2), modify W², U², and A² using Table 4.14 and compare with the asymptotic percentage points given; for D⁺, D⁻, D, and V use Table 4.15 without modification.
TABLE 4.16 (continued)
Significance level α
Some asymptotic points taken from Pettitt (1977b), with permission of the author and of the Biometrika Trustees.
(c) Refer 2W²_{r,n} or 2A²_{r,n} to the percentage points given in Table 4.16.

Pettitt (1977b) gave asymptotic theory and points for this test; some of the points have been used in Table 4.16. Tables for 2D_{r,n} are given by Stephens (1986).
For a test with Type 1 censored data, the test statistic, say 1W²_{t,n}, can be found by setting p̂ = 1 − exp(−t/β̂), where t is the censoring value and β̂ is found as above, and then using the formulas of Section 4.7.3, with sample size r + 1. For large samples, an approximate test may be made by referring the statistic to Table 4.16, with entries p = p̂ and n, but for smaller samples, entering the table at an estimate of p instead of the true value can produce a considerable error in significance level; see the comments in Section 4.8.4 on tests for normality.
Another method of treating right-censored data is to use the N-transformation of Chapter 10 (see Section 10.5.6). This converts a right-censored exponential sample to a complete exponential sample, and the above tests of exponentiality for complete samples, or others given in Chapter 10, may then be used to test H₀.
Equation (4.8) is solved iteratively for β̂, and then (4.9) can be solved for α̂. In Case 1, β is known; then α̂ is given by (4.9) with β replacing β̂. In Case 2,
Significance level α
Statistic A²
Taken from Chandra, Singpurwalla, and Stephens (1981), with permission of the authors and of the American Statistical Association. The table for √nD, Case 2, has been corrected.
β̂ = { Σ_j Y_j − Σ_j Y_j exp(−Y_j/β̂) } / n

Table 4.17 is taken from Stephens (1977), and Table 4.18 from Chandra, Singpurwalla, and Stephens (1981).
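The likelihood equation just given is a fixed-point equation in β̂, and can be solved by bisection on f(β) = β − g(β), since g(β) tends to Ȳ as β tends to 0 and to 0 as β grows. A sketch with illustrative data:

```python
# Sketch: solving beta = {sum(Y) - sum(Y*exp(-Y/beta))}/n by bisection.
import math

def gumbel_scale(y, lo=1e-6, hi=None, tol=1e-10):
    n = len(y)
    hi = hi or max(y)                 # g(beta) < mean(y) <= max(y)
    g = lambda b: (sum(y) - sum(yi * math.exp(-yi / b) for yi in y)) / n
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid - g(mid) > 0:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

data = [0.5, 1.2, 2.3, 0.8, 1.9, 3.1]
beta = gumbel_scale(data)
```

The returned value satisfies the likelihood equation to within the bisection tolerance.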
Case 1 above is equivalent to a test for the exponential distribution on the transformed variable Y = exp(−X/β). This transformation in (4.6) gives, for Y, the distribution F(y) = 1 − exp(−δy), y > 0, with δ = exp(α/β). When β is known, the transformation can be made, and the Y values are then tested to come from the exponential distribution with origin zero and unknown scale parameter (Section 4.9.3). The test statistics for the exponential test will take the same values as those for the Case 1 test in the present section, except that D⁺ becomes D⁻ and vice versa.
f(x; α, β, m) = (m/β) {(x − α)/β}^{m−1} exp[ −{(x − α)/β}^m ],  x > α
The null hypothesis in this section is

We consider the case where α is known. Suppose its value is zero, so that H₀ becomes

For the test of H₀α, the tables for the extreme-value distribution tests may be used. Let Y = −log X in the distribution W(x; 0, β, m); the distribution for Y becomes

F(y) = exp[ −exp{ −(y − φ)/θ } ]

with θ = 1/m and φ = −log β. This distribution is the extreme-value distribution of Section 4.10, and a test of H₀α for X may be made by testing that Y has the extreme-value distribution, with one or both of θ and φ unknown. The test procedure therefore becomes:
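The Weibull-to-extreme-value reduction can be verified numerically; this sketch checks P(Y ≤ −log x) = 1 − F_X(x) at one point (the parameter values are illustrative):

```python
# Sketch: if X is Weibull(0, beta, m), then Y = -log X follows the
# extreme-value distribution with theta = 1/m and phi = -log(beta).
import math

def weibull_cdf(x, beta, m):
    return 1.0 - math.exp(-((x / beta) ** m))

def extreme_value_cdf(y, theta, phi):
    return math.exp(-math.exp(-(y - phi) / theta))

beta, m = 3.0, 2.0
theta, phi = 1.0 / m, -math.log(beta)
x0 = 2.0
# P(Y <= -log x0) = P(X >= x0) = 1 - F_X(x0)
lhs = extreme_value_cdf(-math.log(x0), theta, phi)
rhs = 1.0 - weibull_cdf(x0, beta, m)
```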
The location parameter α will be called the origin of the distribution; β and m are, respectively, scale and shape parameters.

If α is not zero, but has value α₀, say, the transformation X_i' = X_i − α₀, i = 1, …, n, is made to give a set X'; then the null hypothesis H₀ for set X reduces to H₀α for set X', and H₀α is tested using the set X'.
In considering H₀α we can distinguish three cases:

(a) Put the sample in ascending order X_(1) < ⋯ < X_(n).
(b) Let X̄ be the sample mean, and estimate β by β̂ = X̄/m; β̂ is the maximum likelihood estimator of β.
(c) Define

I(x; m, β) = { 1/(β^m Γ(m)) } ∫₀^x w^{m−1} exp(−w/β) dw
Significance level α

Statistic W²
m      .10    .05    .025   .01
1     .175   .222   .271   .338
2     .156   .195   .234   .288
3     .149   .185   .222   .271
4     .146   .180   .215   .262
5     .144   .177   .211   .257
6     .142   .175   .209   .254
8     .140   .173   .205   .250
10    .139   .171   .204   .247
12    .138   .170   .202   .245
15    .138   .169   .201   .244
20    .137   .169   .200   .243
∞     .135   .165   .196   .237
For m = 1, and for m ≥ 2, modified statistics are calculated; the modifications involve the factor (1.8n − 1), together with terms in 1/n² and 1/m².
The modified statistics are then referred to the upper tail percentage points given in Table 4.19 for the appropriate known value of m. These points are the asymptotic points for the various distributions; they were given by Pettitt and Stephens (1983).

The modifications given above are based on Monte Carlo studies for finite n, and have been designed to be as comprehensive as possible, covering all values of m and n; when the given percentage points are used at level α it is believed that the true level of significance will not differ by more than 0.5% for n ≥ 5.
(d) Calculate the EDF statistics from the Z_(i) using formulas (4.2).
(e) Reject H₀α if the value of the statistic used is greater than the value in Table 4.20 for the desired significance level α and for the appropriate m.
4.12.5.1 Comment
The points in Table 4.21 remain remarkably stable as m changes, especially for A², and accurate results can be expected when m̂ is used for m, except possibly for small values of m. Note that only asymptotic points are given; experience with W², U², and A² suggests these will be very good approximations to the points for finite n, even for quite small n. The points in Tables 4.20 and 4.21 are taken from Lockhart and Stephens (1985b), where the asymptotic theory is also developed. A somewhat different treatment was given much earlier in an unpublished report by Mickey, Mundle, Walker, and Glinski (1963). The various cases when the origin is not known are much more unlikely; furthermore, it is often difficult to estimate parameters efficiently. Tests for these cases have been given by Lockhart and Stephens (1985b). Tables for the Kolmogorov statistic D, for n = 4(1)10(5)30, have been given for Cases 1, 2, and 3 above (a different estimate of m is used in Case 3) by Schneider and Clickner (1976).
Statistic W²
Case 1: W* = (1.9nW² − 0.15)/(1.9n − 1.0):   .083  .119  .148  .177  .218  .249
Case 2: W* = (0.95nW² − 0.45)/(0.95n − 1.0): .184  .323  .438  .558  .721  .847
Case 3: W* = (nW² − 0.08)/(n − 1.0):         .060  .081  .098  .114  .136  .152

Statistic U²
Case 2: U* = (1.6nU² − 0.16)/(1.6n − 1.0):   .080  .116  .145  .174  .214  .246

Statistic A²
Case 2: A* = (0.6nA² − 1.8)/(0.6n − 1.0):   1.043 1.725 2.290 2.880 3.685 4.308

For Cases 1 and 3 use modifications and percentage points for Cases 1 and 3, respectively (see Section 4.13). Taken from Stephens (1979), with permission of the Biometrika Trustees.
TABLE 4.23
Significance level α
Case   n   0.10   0.05   0.025   0.01
Statistic D⁺√n
Statistic D√n
Statistic V√n
Taken from Stephens (1979), with permission of the Biometrika Trustees.
The parameters are estimated from the data by maximum likelihood. For Case 3, when both α and β are unknown, the equations for the estimates α̂, β̂ are

n⁻¹ Σ_j [ (X_j − α̂)/β̂ ] · [ 1 − exp{(X_j − α̂)/β̂} ] / [ 1 + exp{(X_j − α̂)/β̂} ] = −1   (4.12)
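With α̂ held fixed (here at the sample median, purely for illustration; the text's Case 3 solves for both parameters jointly), equation (4.12) can be solved for β̂ by bisection. Note that (1 − e^w)/(1 + e^w) = −tanh(w/2), which avoids overflow:

```python
# Sketch: solving (4.12) for beta_hat with alpha_hat fixed;
# h(beta) is the left side of (4.12) plus 1, so the root has h = 0.
import math

def logistic_scale(x, alpha, lo=1e-6, hi=None):
    n = len(x)
    hi = hi or (max(x) - min(x))
    def h(b):
        # (1 - e^w)/(1 + e^w) = -tanh(w/2)
        return sum(-((xi - alpha) / b) * math.tanh((xi - alpha) / (2 * b))
                   for xi in x) / n + 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if h(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

b_hat = logistic_scale([-1.1, 0.2, 0.4, 1.0, 2.3, -0.6], alpha=0.3)
```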
Significance level α
Case 1. Statistic W²
Case 1. Statistic A²
Case 2. Statistic W²
Case 2. Statistic A²
Case 3. Statistic W²
Case 3. Statistic A²
local maxima, and it may be difficult to find the true maximum. We therefore find estimates using sums of weighted order statistics. Chernoff, Gastwirth, and Johns (1967) have given the estimate α̂ = Σ_j a_j X_(j) with

a_j = (1/n) sin[4π{j/(n + 1) − 0.5}] / tan[π{j/(n + 1) − 0.5}]

and the companion estimate β̂ = Σ_j b_j X_(j) with

b_j = (8/n) tan[π{j/(n + 1) − 0.5}] / sec²[π{j/(n + 1) − 0.5}]

The points are taken from Stephens (1985), where the asymptotic theory, and tables for W², are given.
The von Mises distribution is used to describe unimodal data on the circumference of a circle. Suppose the circle has center O and radius 1, and let a radius OP be measured by the polar coordinate θ, from ON as origin. Let θ₀ be the coordinate of a radius OA, and let κ be a positive constant. The von Mises density is

f(θ; θ₀, κ) = exp{κ cos(θ − θ₀)} / {2π I₀(κ)},  0 ≤ θ < 2π

H₀: the random sample of θ-values comes from the von Mises distribution with density f(θ; θ₀, κ).
TABLE 4.27 Upper Tail Percentage Points for Tests of the von Mises Distribution (Section 4.15)

Significance level α

Case 1

Case 2
κ       
0.0    0.047  0.071  0.089  0.105  0.133  0.163  0.204  0.235
0.50   0.048  0.072  0.091  0.107  0.135  0.165  0.205  0.237
1.00   0.051  0.076  0.095  0.111  0.139  0.169  0.209  0.241
1.50   0.053  0.080  0.100  0.116  0.144  0.174  0.214  0.245
2.00   0.055  0.082  0.103  0.119  0.148  0.177  0.218  0.249
2.50   0.056  0.084  0.105  0.121  0.150  0.180  0.220  0.251
3.00   0.057  0.085  0.105  0.122  0.151  0.181  0.221  0.252
3.50   0.057  0.085  0.106  0.122  0.151  0.181  0.221  0.253
4.00   0.057  0.085  0.106  0.122  0.151  0.181  0.221  0.253
10.00  0.057  0.085  0.105  0.122  0.151  0.180  0.221  0.252
∞      0.057  0.085  0.105  0.122  0.151  0.180  0.221  0.252

Case 3

Taken from Lockhart and Stephens (1985c), with permission of the Biometrika Trustees.
R̄ = I₁(κ̂)/I₀(κ̂)   (4.15)

where I₁(κ) is the modified Bessel function of order 1. Tables for solving (4.15) are given in, for example, Biometrika Tables for Statisticians, Vol. 2 (Pearson and Hartley, 1972), and by Mardia (1972).

When OA is known, let X̄ be the component of the mean resultant on OA; then the estimate of κ is given by κ̂₁, obtained by replacing R̄ by X̄ in (4.15).
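Equation (4.15) can also be solved numerically rather than from tables, using the power series for the modified Bessel functions; a sketch:

```python
# Sketch: solve R-bar = I1(kappa)/I0(kappa) for kappa by bisection,
# with I_nu computed from its power series.
import math

def bessel_i(nu, x, terms=60):
    return sum((x / 2.0) ** (2 * k + nu) /
               (math.factorial(k) * math.factorial(k + nu))
               for k in range(terms))

def kappa_hat(rbar, lo=1e-9, hi=50.0):
    # I1/I0 increases from 0 toward 1 as kappa grows
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bessel_i(1, mid) / bessel_i(0, mid) > rbar:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```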
Since the distribution is on a circle, only U² or V are valid EDF statistics, of those we have been considering (see Section 4.5.3). Asymptotic null distributions can be found for U²; because κ is not a scale parameter, the distributions depend on κ. However, as for the gamma and Weibull distributions, useful tests are still available.

The steps in making a test of H₀ are then as follows:
In Section 4.6 some comments were made on the power of different EDF statistics for Case 0, using complete samples, where essentially the final test
Some other interesting methods have been proposed to deal with unknown parameters. When sufficient statistics are available for θ, Srinivasan (1970, 1971) has suggested using the Kolmogorov statistic D calculated from a comparison of F_n(x) with the estimate F̂(x; θ̂) obtained by applying the Rao-Blackwell theorem to F(x; θ), where θ̂ is, say, the maximum likelihood estimator of θ. The resulting tests are asymptotically equivalent to the tests given in previous sections using F(x; θ̂) itself (Moore, 1973) and can be expected to have similar properties for finite n. The method will usually lead to complicated calculations, and has been developed only for tests for normality (Srinivasan, 1970; see also Kotz, 1973) and for tests of exponentiality (see Section 10.8.1).
Another method of eliminating unknown parameters is called the half-sample method; this can be useful when unknown parameters are not location or scale. Unknown components in θ are estimated by asymptotically efficient methods (for example, maximum likelihood) using only half the given sample, randomly chosen. The estimates, together with any known components, give an estimate θ̂ of the vector θ. The transformation Z_i = F(X_i; θ̂), i = 1, …, n, is made, and EDF statistics are calculated from formulas (4.2), now using the whole sample. A remarkable result is that, asymptotically, the EDF statistics will have their Case 0 distributions (Section 4.4), although this will not be true for finite n. Stephens (1978) has examined the half-sample method applied to tests for normality and exponentiality, to compare with the techniques given in Sections 4.8 and 4.9. Several points can be made:

(a) The quadratic statistics W², U², and A², as in other situations, appear to converge fairly rapidly to their asymptotic distributions: this is probably the case for tests for other distributions also, so that for reasonably large (say n > 20) samples, the half-sample method could be used with the Case 0 asymptotic points.
(b) The half-sample technique is not invariant; different statisticians will obtain different values of the estimates, according to the different possible random half-samples chosen for estimation, and so will get different values of the test statistics.
(c) There is considerable loss in power when the half-sample method is used for tests of normality and exponentiality, compared with using EDF statistics with parameter estimates obtained from the whole sample, as described in Sections 4.8 and 4.9. The powers also tend to vary among the different statistics.
on the generalizations. The statistic has the property that it takes the same value if X_i is replaced by −X_i or if it is replaced by 1/X_i for all i. Hill and Rao (1977) gave tables of probabilities in the upper tail for n²T_W/4, for n from 10 to 24. Lockhart and McLaren (1985) have given asymptotic points for this test.

Use of the EDF to estimate the center of symmetry was discussed by Butler (1969) and by Rao, Schuster, and Littell (1975).
4.17.1 Introduction

4.17.2 The EDF for Discrete Data: Case 0
Suppose that for discrete data the possible outcomes are divided into k cells and the null hypothesis is

S = max_{1 ≤ j ≤ k} | Σ_{i=1}^{j} (O_i − E_i) |
F_n(x) is the cumulative histogram of the data. The grouped distribution function F_g(x) may be defined in the same way, by replacing O_j by E_j. Then the statistic S is equal to

S = n sup_x |F_n(x) − F_g(x)|

and there is an obvious parallel with the Kolmogorov statistic nD. Similarly, a statistic parallel to W² would be

W² = n⁻¹ Σ_{j=1}^{k} { Σ_{i=1}^{j} (O_i − E_i) }²
The value of the statistic S depends on the ordering of the cells, so that a different ordering will produce a different value for the same data. It is therefore recommended that S be used when there is a natural ordering of the categories.

Several authors have discussed the statistic S or the statistics S⁺ and S⁻ defined by

S⁺ = max_{1 ≤ j ≤ k} Σ_{i=1}^{j} (O_i − E_i)  and  S⁻ = max_{1 ≤ j ≤ k} { −Σ_{i=1}^{j} (O_i − E_i) }

which are analogous to nD⁺ and nD⁻, and we confine ourselves to tests for discrete data based on these three statistics.
Pettitt and Stephens (1977) have given exact probabilities for the distribution of S for equal cell probabilities. They also showed how the tables can be used as good approximations for probability distributions of S for unequal probabilities per cell, and also to deduce approximate probabilities for S⁺ or S⁻ (see also Conover, 1972). Table 4.28 is taken from Table 1 of Pettitt and Stephens (1977). The table gives values of P(S ≥ m), for values of m which give probabilities near the usual test levels. Thus a test of H₀ is made as follows:

(a) Record the observed number of observations O_i and the expected number E_i, for all i, i = 1, …, k.
(b) Calculate T_j = Σ_{i=1}^{j} (O_i − E_i), j = 1, …, k.

Statistic S gives a two-sided test and statistics S⁺ and S⁻ give one-sided tests.
E 4.17.2 Example

The data given in Table 4.29, used by Pettitt and Stephens (1977), are taken from Siegel (1956). Each of ten subjects was presented with five photographs of himself, varying in tone (grades 1-5), and was asked to choose the photograph he liked best. The hypothesis tested was that there was no overall preference for any tone, that is, each tone was equally likely to be chosen. The values of T_j = Σ_{i=1}^{j} (O_i − E_i) are given in the table. The values of S⁺
TABLE 4.28 Values of P(S ≥ m)

k = 3
n = 6    m: 4 3       .00274 .03567
n = 9    m: 5 4 3     .00193 .01656 .12361
n = 12   m: 6 5 4     .00109 .00771 .04994
n = 15   m: 6 5 4     .00361 .02089 .09181
n = 18   m: 6 5 4     .00902 .04005 .13579
n = 21   m: 7 6 5     .00402 .01760 .06308
n = 24   m: 7 6 5     .00792 .02897 .08824
n = 27   m: 7 6 5     .01325 .04245 .11433
n = 30   m: 8 7 6     .00609 .02015 .05757

k = 4
n = 8    m: 4 3       .01514 .10791
n = 12   m: 5 4       .01115 .05974
n = 16   m: 6 5 4     .00706 .03299 .12611
n = 20   m: 7 6 5     .00424 .01826 .06598
n = 24   m: 7 6 5     .01014 .03526 .10519
n = 28   m: 8 7 6     .00566 .01914 .05689

k = 5
n = 10   m: 5 4       .00477 .04162
n = 15   m: 6 5 4     .00584 .03202 .12322
n = 20   m: 7 6 5     .00496 .02203 .07617
n = 25   m: 8 6 5     .00368 .04717 .13083
n = 30   m: 8 7 6     .00946 .02930 .07924

k = 6
n = 12   m: 6 5 4     .00173 .01422 .08064
n = 18   m: 7 6 5     .00308 .01599 .06435
n = 24   m: 7 6 5     .01375 .04695 .13203
n = 30   m: 8 7 6     .01071 .13317 .08836

k = 7
n = 14   m: 6 5 4     .00511 .02996 .12856
n = 21   m: 7 6 5     .00807 .03242 .10550
n = 28   m: 8 7 6     .00853 .02828 .08047

k = 8
n = 16   m: 6 5       .01122 .05166
n = 24   m: 8 7 6     .00410 .01641 .05477

k = 9
n = 18   m: 5         .00406 .02043 .07840
n = 27   m: 8 7 6     .00833 .02831 .08210

k = 10
n = 20   m: 6 5       .00781 .03276 .10909
n = 30   m: 9 7 6     .00421 .04365 .11333

For given n and k, the table gives values of P(S ≥ m) beneath values of m. The probabilities given are exact for cells of equal probability. Half the tabulated probability is a good approximation to P(S⁺ ≥ m) = P(S⁻ ≥ m).

Taken from Pettitt and Stephens (1977), with permission of the American Statistical Association.
TABLE 4.29

Grade   O_i   T_j
1       0     -2
2       1     -3
3       0     -5
4       5     -2
5       4      0

The first column gives the five grades of tone of a photograph, and the data in Column 2 are the numbers out of 10 persons in an experiment who chose the different tone grades. (See Section 4.17.)
S⁺ and S⁻ are respectively 0 and 5, and the value of S is 5. From Table 4.28, for n = 10, k = 5, we have P(S ≥ 5) = 0.00477, so S is highly significant, with p-level less than .005, and H₀ will be rejected. The Pearson statistic

X² = Σᵢ (Oᵢ − Eᵢ)²/Eᵢ, the sum running over the k cells,

has the value 11. Using the usual χ² approximation with k − 1 = 4 degrees of freedom, P(X² ≥ 11) = 0.024, while by exact enumeration the probability is 0.04. The S statistic thus gives a much more extreme value than does X², and appears to be more sensitive in this instance. Pettitt and Stephens have investigated the power of S, especially against alternatives representing a trend in cell probability as the cell index i increases, and it appears that for such alternatives, S will often be more powerful than X².
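The calculation above is easy to carry out directly; the sketch below (function and variable names are ours, not from the text) reproduces S and Pearson's X² for the tone-grade counts, with equal expected frequencies E = 2.

```python
# Sketch: the discrete-EDF statistic S and Pearson's X^2 for one-way
# counts with fully specified cell probabilities (Case 0).

def s_and_pearson(observed, expected):
    """Return (S_plus, S_minus, S, X2) for the given cell counts."""
    cum = 0.0
    s_plus = s_minus = 0.0
    x2 = 0.0
    for o, e in zip(observed, expected):
        cum += o - e                  # running sum of (O - E)
        s_plus = max(s_plus, cum)     # largest positive cumulative deviation
        s_minus = max(s_minus, -cum)  # largest negative cumulative deviation
        x2 += (o - e) ** 2 / e
    return s_plus, s_minus, max(s_plus, s_minus), x2

# Tone-grade data: 10 persons, 5 equiprobable cells, so E = 2 per cell.
sp, sm, s, x2 = s_and_pearson([0, 1, 0, 5, 4], [2] * 5)
print(sp, sm, s, x2)   # 0.0 5.0 5.0 11.0
```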
Note that Case 0 tables for nD should not be used for S, despite the parallel between the two statistics. Noether (1963) suggested that use of the nD tables would give a conservative test; Pettitt and Stephens have given several examples to show this to be true, with the true α-value very different from the supposed value.

The test for S has been given above for Case 0, where the null hypothesis is completely specified. The analogue of S is not available for the various cases where probabilities for each cell must be estimated, for example, in a test for a Poisson or binomial distribution, where an unknown parameter must be estimated from the data.

Wood and Altavela (1978) have discussed asymptotic properties of Kolmogorov-Smirnov statistics D⁺, D⁻, and D when used with discrete distributions, and have shown how asymptotic percentage points may be simulated.
4.18.1 Introduction
Suppose k independent statistical tests are made. It may be that the p-levels are quite small, but not small enough to be significant. If the k tests are all tests of similar type (for example, all tests for normality of similar data, with small samples for each), the results may suggest, overall, that the data are non-normal, but the samples are too small to detect this. It then becomes desirable to combine the tests. The general problem of combining tests, even of different types, has been discussed by many authors; see, for example, Fisher (1967), Birnbaum (1954), and Volodin (1965). Fisher (1967) suggested an easy method of combination, based on the p-levels of the k separate test statistics. In effect, the p-levels are tested for uniformity. This method is discussed in Section 8.15 and has been used to combine various tests for
normality by Wilk and Shapiro (1968) and by Pettitt (1977a). Volodin (1965) has discussed tests for one distribution (the normal, exponential, Poisson, or Weibull) against specific alternatives which are close to the one tested.
E 4.18.2 Example
Suppose six Case 0 tests for normality are made, and the values of A² are 2.353, 1.526, 0.550, 0.252, 2.981, 2.309. The p-levels of the tests are, from Table 4.3: 0.06, 0.17, 0.70, 0.97, 0.03, 0.06. The average is Z̄ = 1.662, and reference to Table 4.30 shows Z̄ to be significant at the 5% level. Although only one component test is significant at the 5% level, the overall combination suggests a total picture of non-normality.
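As a numerical check, the mean of the six A² values can be computed directly; for comparison, the sketch below also applies Fisher's combination −2Σ log pⱼ of Section 8.15, which under H₀ is referred to χ² with 2k degrees of freedom (the value 21.026 in the comment is the standard 5% point of χ² on 12 degrees of freedom; this comparison is ours, not part of the example).

```python
import math

a2 = [2.353, 1.526, 0.550, 0.252, 2.981, 2.309]   # six A^2 values
p = [0.06, 0.17, 0.70, 0.97, 0.03, 0.06]          # their p-levels (Table 4.3)

zbar = sum(a2) / len(a2)                       # mean A^2, the combined statistic
fisher = -2 * sum(math.log(pj) for pj in p)    # Fisher: chi^2 on 2k = 12 df

print(round(zbar, 3))    # 1.662, as in the example
print(fisher > 21.026)   # True: exceeds the 5% point of chi^2 on 12 df
```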
Significance level α: .25, .10, .05, .025, .01, tabulated by the number of samples k (Tables 4.30 and 4.31).
The same technique can be applied to combine tests of fit when parameters are estimated. Here each value of A² will be modified, as described in previous sections, for the appropriate test, and then the mean of the modified values Aⱼ², j = 1, ..., k, will be taken as the overall test statistic. Upper tail percentage points of Z̄ for tests of normality, and for tests of exponentiality, are in Table 4.31. For k too large for the table, follow the same procedure as described in Section 4.18.2, for Table 4.30.
E 4.18.3 Example
Pierce (1978) has suggested the following method of combining tests based on k samples, for testing H₀: the sample comes from a distribution F(x; θ), with θ containing unknown location and/or scale parameters α and β. The true values of these parameters may be different for each test. For sample i, let α̂ᵢ and β̂ᵢ be the maximum likelihood estimates. Define standardized values Wᵣᵢ = (Xᵣᵢ − α̂ᵢ)/β̂ᵢ, r = 1, ..., nᵢ, where the Xᵣᵢ, r = 1, ..., nᵢ, are the observations in sample i. The proposal of Pierce is that the Wᵣᵢ for all the k samples should be pooled to form one large sample of size n = Σᵢ nᵢ.
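A minimal sketch of the pooling step, taking the normal distribution for illustration (so that the MLEs are the sample mean and the maximum-likelihood standard deviation); the function name is ours, and other F(x; θ) would use their own MLEs.

```python
def pierce_pool(samples):
    """Standardize each sample by its own location/scale MLEs, then pool.
    Illustrated for the normal case: alpha_i-hat = sample mean,
    beta_i-hat = ML standard deviation."""
    pooled = []
    for x in samples:
        n = len(x)
        a = sum(x) / n                                   # alpha_i-hat
        b = (sum((v - a) ** 2 for v in x) / n) ** 0.5    # beta_i-hat
        pooled.extend((v - a) / b for v in x)
    return pooled

# Two samples of sizes 4 and 3 pool into one sample of size 7.
w = pierce_pool([[4.1, 5.0, 6.2, 5.5], [10.0, 12.0, 9.5]])
print(len(w))   # 7
```

The pooled sample of size n = Σᵢ nᵢ is then tested as a single sample.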
In this section, tests are discussed based on the spacings of a sample. Each spacing is normalized by division by a constant, and a transformation is made to produce z-values between 0 and 1. We give the test based on the EDF statistic A²ₛ, the Anderson-Darling A² calculated from these z-values, and compare it with tests based on the median or the mean of the z-values. The technique affords an interesting method of testing by eliminating location and scale parameters, rather than directly estimating them, and is based on tests for the exponential distribution. The tests can be used for censored data. The general case will be treated of a sample which has been censored at both ends. Suppose t + 2 successive observations X(K), X(K+1), ..., X(K+t+1) are given, and the test is of the null hypothesis H₀ that they come from the hypothesized distribution. From the t + 1 spacings between consecutive observations, normalized spacings y₁, ..., y(t+1) are formed, and the transformation

z(i) = (y₁ + ⋯ + yᵢ)/(y₁ + ⋯ + y(t+1)), i = 1, ..., t,

is made; the z(i) will be ordered uniforms, that is, they will be distributed as an ordered sample of size t from the uniform distribution with limits 0 and 1.
A special case of normalized spacings is the set derived, as in Section 10.5.2, by the N-transformation applied to a complete sample of X-values from Exp(0, β); here an extra spacing is available between X(1) and the known lower endpoint α = 0 of the distribution of X. The spacings are E₁ = X(1), E₂ = X(2) − X(1), E₃ = X(3) − X(2), etc., and, for the Exp(0, 1) distribution, mᵢ − mᵢ₋₁ = 1/(n + 1 − i), with m₀ = 0; thus the normalized spacings are the values Xⱼ of Section 10.5.2. Tests that the original sample is exponential can then be based on statistics used for testing that the z(i) are ordered uniforms, for example, EDF statistics, Case 0 (Section 4.4), or any of the test statistics described in Chapter 8. Tests of this type, for exponentiality of the original X, are discussed in Chapter 10.
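For the exponential special case just described, the computation can be sketched as follows; since mᵢ − mᵢ₋₁ = 1/(n + 1 − i), the normalized spacings are (n + 1 − i)Eᵢ. The function name is ours.

```python
def exp_spacings_to_z(x):
    """Normalized spacings for a complete sample with known lower
    endpoint 0: y_i = (n + 1 - i) * (X_(i) - X_(i-1)), with X_(0) = 0.
    On H0 (exponentiality) the y_i behave as an i.i.d. exponential
    sample, and z_(i) = (y_1 + ... + y_i)/(y_1 + ... + y_n),
    i = 1, ..., n - 1, as ordered uniforms."""
    xs = sorted(x)
    n = len(xs)
    y, prev = [], 0.0
    for i, v in enumerate(xs, start=1):
        y.append((n + 1 - i) * (v - prev))
        prev = v
    total = sum(y)
    z, cum = [], 0.0
    for yi in y[:-1]:          # the last z would always equal 1
        cum += yi
        z.append(cum / total)
    return z

z = exp_spacings_to_z([0.2, 0.7, 1.1, 1.9, 3.0])
print(all(0 < v < 1 for v in z))   # True
```

The resulting z(i) can then be fed to any Case 0 test for uniformity, such as the Anderson-Darling statistic.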
These tests can be adapted to test that X came from a more general F(x; α, β) as follows. The spacings are calculated as described above, and, provided values of kᵢ = mᵢ − mᵢ₋₁ are known, normalized spacings yᵢ can be found; then the transformation to z(i) can be made. Then, subject to important conditions particularly affecting the extreme spacings, suitably separated normalized spacings from any continuous distribution are asymptotically independent and exponentially distributed with mean β (see Pyke, 1965, for rigorous and detailed results). The conditions on this result are sufficiently strong, however, that the transformed values z(i) must not be assumed to be distributed as uniform order statistics, even asymptotically, for the purpose of finding distributions of test statistics. However, asymptotic distribution theory of three statistics calculated from the z-values has been given by Lockhart, O'Reilly, and Stephens (1985, 1986).
Norm al Distribution
Logistic Distribution
Monte Carlo power studies by Mann, Scheuer, and Fertig showed that S has good power properties for the problem they were considering, where S was used with one tail. Other power studies for the extreme-value test have been given by Littell, McClave, and Offen (1979), Tiku and Singh (1981), and Lockhart, O'Reilly, and Stephens (1986). These show that A²ₛ and Z have high power, often better than S. For some alternatives, S will be biased; we return to this point below.
Tiku (1981) has investigated S*, equivalent to Z, in testing for normality. Lockhart, O'Reilly, and Stephens (1985) have made further comparisons, involving A²ₛ, S, Z, the A² (Case 3) test of Section 4.8, and the Shapiro-Wilk W test discussed in Chapter 5; here S must be used with two tails to cover reasonable alternatives. The most powerful tests for normality are given by W, A² (Case 3), and A²ₛ; these three give quite similar results.

Lockhart, O'Reilly, and Stephens have also shown that statistics S and Z can give non-consistent tests for some alternatives to the null; for example, this is the case in testing for normality. Statistic S may also be biased, as was observed by Tiku and Singh (1981), although for the problem discussed
by Mann, Scheuer, and Fertig (1973) this does not appear to be the case. It would seem that statistic A²ₛ, which is consistent, should be preferred to Z and S except perhaps for some situations in which the alternatives to the null are very carefully specified to avoid problems of bias and non-consistency. A²ₛ has good power in studies so far reported. It can be easily calculated without direct estimation of parameters, can be used for censored data, and is consistent. These properties suggest that A²ₛ, and possibly other EDF statistics found from normalized spacings, might prove useful in other test situations, rivaling the regular use of EDF statistics as described in the rest of this chapter.
REFERENCES

Draper, N. R. and Smith, H. (1966). Applied Regression Analysis. New York: Wiley.

Gillespie, M. J. and Fisher, L. (1979). Confidence band for the Kaplan-Meier survival curve estimate. Ann. Statist. 7, 920-924.

Horn, S. D. (1977). Goodness-of-fit tests for discrete data: a review and an application to a health impairment scale. Biometrics 33, 237-248.

Tiku, M. L. and Singh, M. (1981). Testing the two parameter Weibull distribution. Comm. Statist. A, 10, 907-918.

Van Soest, J. (1967). Some experimental results concerning tests of normality. Statistica Neerlandica 21, 91-97.
In previous chapters it has been shown how a random sample can be used in a graphical display (for example, on probability paper, or by drawing the EDF) which is then used to indicate whether the sample comes from a given distribution. The techniques make use of the sample values arranged in ascending order, that is, they use the order statistics. In this chapter we examine another graphical method, related to probability plots, in which the order statistics are plotted on the vertical axis of the graph, against Tᵢ, a suitable function of i, on the horizontal axis. A straight line is then fitted to the points, and tests are based on statistics associated with this line. This type of test will be called a regression test; when the test statistic used is the correlation coefficient between X and T, the test will be called a correlation test.
5.1.1 Notation
5.1.2 Definitions: Correlation Coefficient

S(T, T) = Σᵢ (Tᵢ − T̄)²
Suppose only a subset of the X(i) is available, because the data have been censored. Provided the ranks i of the known X(i) are known also, the corresponding Tᵢ can be paired with the X(i), and the correlation coefficient calculated as above, with the sums running only over the available values i. The calculation of R(X, T) is thus very easily adapted to all types of censored data.
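The adaptation to censoring is exactly as described; in the sketch below (names ours) the available order statistics and their rank-matched Tᵢ are passed as parallel lists, the missing ranks simply being omitted.

```python
def corr_R(x_avail, t_avail):
    """Correlation coefficient R(X, T) over the available pairs;
    x_avail[j] and t_avail[j] must correspond to the same rank i."""
    k = len(x_avail)
    xbar = sum(x_avail) / k
    tbar = sum(t_avail) / k
    sxt = sum((x - xbar) * (t - tbar) for x, t in zip(x_avail, t_avail))
    sxx = sum((x - xbar) ** 2 for x in x_avail)
    stt = sum((t - tbar) ** 2 for t in t_avail)
    return sxt / (sxx * stt) ** 0.5

# A perfectly linear plot gives R = 1 regardless of which ranks survive.
print(round(corr_R([2.0, 4.0, 10.0], [1.0, 2.0, 5.0]), 6))   # 1.0
```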
Regression tests arise most naturally when unknown parameters in the tested distribution F₀(x) are location and scale parameters. Suppose F₀(x) is F(w) with w = (x − α)/β, so that α is a location parameter and β a scale parameter,
TESTS BASED ON REGRESSION AND CORRELATION 197
X(i) = α + βW(i), i = 1, ..., n                     (5.1)

E(X(i)) = α + βmᵢ                                   (5.2)

X(i) = α + βTᵢ + εᵢ                                 (5.3)
5.3 MEASURES OF FIT
Three main approaches to testing how well the data fit (5.3) can be identified: (a) tests based on the correlation coefficient R(X, T); (b) tests based on residuals, through the error sum of squares; and (c) tests based on the ratio of two estimates of scale. In this chapter we discuss statistics based on methods (b) and (c) when they arise in connection with tests for the normal and exponential distributions.
Define

Z(X, T) = n{1 − R²(X, T)}                           (5.4)
Sarkadi (1975) showed the consistency of the test based on R(X, m) for testing normality, and more recently Gerlach (1979) has shown consistency for correlation tests based on R(X, m), or equivalently on Z = n{1 − R²(X, m)}, for a wide class of distributions including all the usual continuous distributions. This is to be expected, since for large n we expect a sample to become perfect in the sense of Section 5.4.1. We can expect the consistency property to extend to R(X, T) provided T approaches m sufficiently rapidly for large samples. We now give tests based on R²(X, m), for the uniform distribution, with unknown limits (next section) and with limits 0 and 1 (Section 5.6).
with a, b unknown. The ordered values X(i) will be plotted against mᵢ = i/(n + 1); mᵢ = E(W(i)), where W₁, W₂, ..., Wₙ is a random sample from U(0, 1), and m̄ = 0.5; for this distribution mᵢ = Hᵢ. The correlation test then takes the following steps:
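As a sketch of the complete-sample version of this test (names ours), with the statistic Z = n{1 − R²(X, m)} then referred to the appropriate table:

```python
def uniform_corr_test_stat(x):
    """Correlation test for uniformity with unknown limits: plot the
    ordered values against m_i = i/(n + 1) and compute
    Z = n * (1 - R^2(X, m))."""
    xs = sorted(x)
    n = len(xs)
    m = [i / (n + 1) for i in range(1, n + 1)]
    xbar, mbar = sum(xs) / n, 0.5
    sxm = sum((a - xbar) * (b - mbar) for a, b in zip(xs, m))
    sxx = sum((a - xbar) ** 2 for a in xs)
    smm = sum((b - mbar) ** 2 for b in m)
    r2 = sxm * sxm / (sxx * smm)
    return n * (1 - r2)

# Equally spaced values lie exactly on a line against m_i, so Z = 0.
print(round(uniform_corr_test_stat([1, 2, 3, 4, 5]), 10))   # 0.0
```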
For a Type 1 censored sample, the procedure above does not make use of the censoring values, say A for the lower value and B for the upper. Suppose, for simplicity, A is zero and B is 11. Then it is possible to have, say, five values out of ten, X(1) to X(5), which are 0.9, 2.1, 3.1, 3.9, 5.2; these could give a large correlation coefficient, but would be suspect as being a uniform sample because of the large gap between the maximum X(5) = 5.2 and B = 11. One way to make use of A and B is to include these in the sample, making it now a sample of size 7. On H₀, all 7 values will be uniform between
unknown limits a*, b*, and again the complete sample test in Section 5.5.1 may be used. The test as now made combines a test for uniformity of the X(i) with a test that they are spread over the range (A, B).
5.6 THE CORRELATION TEST FOR U(0,1)
In Sections 5.8 and 5.9 other regression tests for (5.5) are developed, based on residuals, or on the ratio of two estimates of scale.
mᵢ = Φ⁻¹{(i − 0.375)/(n + 0.25)}                    (5.8)
first suggested by Blom (1958) can also be used; Φ⁻¹(·) is the inverse of the standard normal cdf, and computer routines exist for this function. Other formulas given by Hastings (1955) are quoted in Abramowitz and Stegun (1965, p. 933); an algorithm has been given by Milton and Hotchkiss (1969), and simpler formulas, suitable for use on pocket calculators, have since been given by Page (1977) and by Hamaker (1978). Weisberg and Bingham (1975) have shown that use of Blom's formula and the Milton-Hotchkiss algorithm for Φ⁻¹(·) to approximate mᵢ makes negligible difference to the distribution of R²(X, m) = W', which in any case must be found by Monte Carlo methods. Shapiro and Francia (1972) originally introduced W' as a replacement for the Shapiro-Wilk statistic below, for use when n > 50, but the above articles suggest that W' = R²(X, m) will be useful.
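A sketch of W' computed with Blom's scores as in (5.8), using Python's statistics.NormalDist for Φ⁻¹ (the function name is ours):

```python
from statistics import NormalDist

def w_prime(x):
    """Shapiro-Francia type statistic W' = R^2(X, m), with Blom's scores
    m_i = Phi^{-1}((i - 0.375)/(n + 0.25))."""
    xs = sorted(x)
    n = len(xs)
    inv = NormalDist().inv_cdf
    m = [inv((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    xbar = sum(xs) / n
    # sum(m) is 0 by symmetry, so the m-mean may be taken as 0
    num = sum(mi * (xi - xbar) for mi, xi in zip(m, xs)) ** 2
    den = sum((xi - xbar) ** 2 for xi in xs) * sum(mi * mi for mi in m)
    return num / den

# A sample that is itself a set of normal scores correlates perfectly:
scores = [NormalDist().inv_cdf((i - 0.375) / 10.25) for i in range(1, 11)]
print(w_prime(scores) > 0.999)   # True
```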
TABLE 5.2 Upper Tail Percentage Points for Z = n{1 − R²(X, m)} for a Test for Normality with Complete or Type 2 Censored Data; p = censoring ratio r/n

Significance level α
Tables are not given for Type 1 right-censored data, since objections apply similar to those against EDF statistics. If the upper censoring value were t, and if p = Φ{(t − α)/β}, tables of Z could be given for selected p and n; however, they would have to be entered, in practice, at p̂ = Φ{(t − α̂)/β̂}, and this could cause an error in the apparent significance level of Z. For large samples, Z can be calculated from the available observations, and Table 5.2 can be used to give an approximate test.
E 5.7.3 Example

Table 5.3 contains 20 values of weights of chicks, taken from Bliss (1976), already used in Chapter 4 in tests of normality. When these are correlated with the values mᵢ below:

X(i): 156, 162, 168, 182, 186, 190, 190, 196, 202, 210, 214, 220, 226, 230, 230, 236, 236, 242, 246, 270

mᵢ (n = 20; values for i = 20, 19, ..., 11): 1.86748, 1.40760, 1.13095, 0.92098, 0.74538, 0.59030, 0.44833, 0.31493, 0.18696, 0.06200

m̂ᵢ from (5.8) (n = 20): 1.8682, 1.4034, 1.1281, 0.9191, 0.7441, 0.5895, 0.4478, 0.3146, 0.1868, 0.0619
Smith and Bain (1976) have proposed the correlation statistic R(X, K), where Kᵢ is a close approximation to mᵢ, given by Abramowitz and Stegun (1965, p. 933). Smith and Bain have given tables for use when R²(X, K) has been calculated from Type 2 censored data. Filliben (1975) investigated tests using Tᵢ = m̃ᵢ, the median of the distribution of W(i); m̃ᵢ is given by Φ⁻¹(ũᵢ), where ũᵢ is the median of the i-th order statistic of a uniform sample. Filliben has given an empirical approximation for ũᵢ which gives a formula for m̃ᵢ similar to that for mᵢ given in (5.8) above; thus R²(X, m̃) is close to R²(X, m) = W' and has similar power properties (Filliben, 1975). Filliben also gave tables of critical values of R²(X, m̃).
where

G = V⁻¹(1m' − m1')V⁻¹ / {(1'V⁻¹1)(m'V⁻¹m) − (1'V⁻¹m)²}
Finally we turn to the third method of testing the fit of the model (5.2), one which has been developed by Shapiro and Wilk for testing normality and exponentiality. The procedure used is to compare β̂², where β̂ is the generalized least squares estimate given in equation (5.9), with the estimate of β² given by the sample variance, and the test statistic is essentially the ratio of these two estimates. Tests of this type are closely related to those in the previous section. For the normal test, this statistic works very well, but in other test situations, for example, for the exponential test, the statistic is inconsistent; in practical terms there will be certain alternative distributions which will not be detected even with very large samples (see Section 5.12).
In the case of tests for normality, modifications of the first estimate of β² above have also been suggested, since the estimate is complicated to calculate. It is not, of course, necessary to use the particular estimates of scale given above, and a test can be developed using the ratio of any two convenient estimates of β². Some comments on the choice of these estimates are in Section 5.11 below.

In the next section we give tests for normality based on residuals and on ratios of scale estimates. Regression tests for exponentiality of all types are included with other tests in Section 5.11, and some general comments on the techniques are given in Section 5.12.
The model (5.2) can be extended to provide further tests of fit. Suppose the mᵢ are the expected values of order statistics from a N(0, 1) distribution; the model expressed by (5.2) is then correct only when the sample Xᵢ comes from a normal distribution. A wide class of alternative distributions can be specified by supposing that the order statistics in a sample of size n from a distribution F(x) have expectations which satisfy the model

E(X(i)) = α + βmᵢ + β₂w₂(mᵢ) + β₃w₃(mᵢ) + ⋯         (5.10)

where β₂, β₃, ... are constants and w₂(mᵢ), w₃(mᵢ), ... are functions of mᵢ. By different choices of these functions, the normal model for X is embedded in various classes of densities. For the appropriate class, for given wⱼ(·), the estimates β̂ⱼ of the constants βⱼ can be derived by generalized least squares. A test for normality can be based on these estimates by using them to test H₀: β₂ = β₃ = ⋯ = 0. Equivalently, tests may be based on the reduction in error sum of squares between a fit of model (5.2) and the more general model (5.10) above. LaBrecque (1973, 1977) has developed such tests, for wⱼ(m) a j-th order polynomial in m, chosen so that the covariance matrix of the estimates μ̂, σ̂, etc., should be diagonal, and has given the necessary tables of coefficients and significance points. Stephens (1976) has suggested use of Hermite polynomials for wⱼ(·) and has given asymptotic theory of the tests based on β̂ⱼ, for j ≥ 2.
5.10.2 Test for N(0, 1)

When the parameters in N(μ, σ²) are specified, the test based on residuals takes a simple form. Let X'(i) = (X(i) − μ)/σ; the null hypothesis then reduces to a test that the X'(i) are N(0, 1), and a natural statistic based on residuals is

M² = Σᵢ (X'(i) − mᵢ)²

μ̂ = X̄  and  σ̂ = m'V⁻¹X / (m'V⁻¹m)                (5.11)
W = b²/S²,  where b = Σᵢ aᵢX(i)                     (5.12)

then

W = Cσ̂²/S²                                         (5.14)

with C a constant depending only on n. The steps in making the W test for normality, that is, for testing H₀: the Xᵢ are a random sample from N(μ, σ²), with μ, σ unknown, are:
TABLE 5.4 Coefficients for the W Test for Normality

n: 11 12 13 14 15 16 17 18 19 20
1 .5601 .5475 .5359 .5251 .5150 .5056 .4968 .4886 .4808 .4734
2 .3315 .3325 .3325 .3318 .3306 .3290 .3273 .3253 .3232 .3211
3 .2260 .2347 .2412 .2460 .2495 .2521 .2540 .2553 .2561 .2565
4 .1429 .1586 .1707 .1802 .1878 .1939 .1988 .2027 .2059 .2085
5 .0695 .0922 .1099 .1240 .1353 .1447 .1524 .1587 .1641 .1686
6 .0000 .0303 .0539 .0727 .0880 .1005 .1109 .1197 .1271 .1334
7 — — .0000 .0240 .0433 .0593 .0725 .0837 .0932 .1013
8 — — — — .0000 .0196 .0359 .0496 .0612 .0711
9 — — — — — — .0000 .0163 .0303 .0422
10 — — — — — — — — .0000 .0140
21 22 23 24 25 26 27 28 29 30
1 .4643 .4590 .4542 .4493 .4450 .4407 .4366 .4328 .4291 .4254
2 .3185 .3156 .3126 .3098 .3069 .3043 .3018 .2992 .2968 .2944
3 .2578 .2571 .2563 .2554 .2543 .2533 .2522 .2510 .2499 .2487
4 .2119 .2131 .2139 .2145 .2148 .2151 .2152 .2151 .2150 .2148
5 .1736 .1764 .1787 .1807 .1822 .1836 .1848 .1857 .1864 .1870
6 .1399 .1443 .1480 .1512 .1539 .1563 .1584 .1601 .1616 .1630
7 .1092 .1150 .1201 .1245 .1283 .1316 .1346 .1372 .1395 .1415
8 .0804 .0878 .0941 .0997 .1046 .1089 .1128 .1162 .1192 .1219
9 .0530 .0618 .0696 .0764 .0823 .0876 .0923 .0965 .1002 .1036
10 .0263 .0368 .0459 .0539 .0610 .0672 .0728 .0778 .0822 .0862
11 .0000 .0122 .0228 .0321 .0403 .0476 .0540 .0598 .0650 .0697
12 — — .0000 .0107 .0200 .0284 .0358 .0424 .0483 .0537
13 — — — — .0000 .0094 .0178 .0253 .0320 .0381
14 — — — — — — .0000 .0084 .0159 .0227
15 — — — — — — — — .0000 .0076
(continued)
31 32 33 34 35 36 37 38 39 40
1 .4220 .4188 .4156 .4127 .4096 .4068 .4040 .4015 .3989 .3964
2 .2921 .2898 .2876 .2854 .2834 .2813 .2794 .2774 .2755 .2737
3 .2475 .2463 .2451 .2439 .2427 .2415 .2403 .2391 .2380 .2368
4 .2145 .2141 .2137 .2132 .2127 .2121 .2116 .2110 .2104 .2098
5 .1874 .1878 .1880 .1882 .1883 .1883 .1883 .1881 .1880 .1878
6 .1641 .1651 .1660 .1667 .1673 .1678 .1683 .1686 .1689 .1691
7 .1433 .1449 .1463 .1475 .1487 .1496 .1505 .1513 .1520 .1526
8 .1243 .1265 .1284 .1301 .1317 .1331 .1344 .1356 .1366 .1376
9 .1066 .1093 .1118 .1140 .1160 .1179 .1196 .1211 .1225 .1237
10 .0899 .0931 .0961 .0988 .1013 .1036 .1056 .1075 .1092 .1108
11 .0739 .0777 .0812 .0844 .0873 .0900 .0924 .0947 .0967 .0986
12 .0585 .0629 .0669 .0706 .0739 .0770 .0798 .0824 .0848 .0870
13 .0435 .0485 .0530 .0572 .0610 .0645 .0677 .0706 .0733 .0759
14 .0289 .0344 .0395 .0441 .0484 .0523 .0559 .0592 .0622 .0651
15 .0144 .0206 .0262 .0314 .0361 .0404 .0444 .0481 .0515 .0546
16 .0000 .0068 .0131 .0187 .0239 .0287 .0331 .0372 .0409 .0444
17 — — .0000 .0062 .0119 .0172 .0220 .0264 .0305 .0343
18 — — — — .0000 .0057 .0110 .0158 .0203 .0244
19 — — — — — — .0000 .0053 .0101 .0146
20 — — — — — — — — .0000 .0049
41 42 43 44 45 46 47 48 49 50
1 .3940 .3917 .3894 .3872 .3850 .3830 .3808 .3789 .3770 .3751
2 .2719 .2701 .2684 .2667 .2651 .2635 .2620 .2604 .2589 .2574
3 .2357 .2345 .2334 .2323 .2313 .2302 .2291 .2281 .2271 .2260
4 .2091 .2085 .2078 .2072 .2065 .2058 .2052 .2045 .2038 .2032
5 .1876 .1874 .1871 .1868 .1865 .1862 .1859 .1855 .1851 .1847
6 .1693 .1694 .1695 .1695 .1695 .1695 .1695 .1693 .1692 .1691
7 .1531 .1535 .1530 .1542 .1545 .1548 .1550 .1551 .1553 .1554
8 .1384 .1392 .1398 .1405 .1410 .1415 .1420 .1423 .1427 .1430
9 .1249 .1259 .1269 .1278 .1286 .1293 .1300 .1306 .1312 .1317
10 .1123 .1136 .1149 .1160 .1170 .1180 .1189 .1197 .1205 .1212
(continued)
41 42 43 44 45 46 47 48 49 50
11 .1004 .1020 .1035 .1049 .1062 .1073 .1085 .1095 .1105 .1113
12 .0891 .0909 .0927 .0943 .0959 .0972 .0986 .0998 .1010 .1020
13 .0782 .0804 .0824 .0842 .0860 .0876 .0892 .0906 .0919 .0932
14 .0677 .0701 .0724 .0745 .0765 .0783 .0801 .0817 .0832 .0846
15 .0575 .0602 .0628 .0651 .0673 .0694 .0713 .0731 .0748 .0764
16 .0476 .0506 .0534 .0560 .0584 .0607 .0628 .0648 .0667 .0685
17 .0379 .0411 .0442 .0471 .0497 .0522 .0546 .0568 .0588 .0608
18 .0283 .0318 .0352 .0383 .0412 .0439 .0465 .0489 .0511 .0532
19 .0188 .0227 .0263 .0296 .0328 .0357 .0385 .0411 .0436 .0459
20 .0094 .0136 .0175 .0211 .0245 .0277 .0307 .0335 .0361 .0386
21 .0000 .0045 .0087 .0126 .0163 .0197 .0229 .0259 .0288 .0314
22 — — .0000 .0042 .0081 .0118 .0153 .0185 .0215 .0244
23 — — — — .0000 .0039 .0076 .0111 .0143 .0174
24 — — — — — — .0000 .0037 .0071 .0104
25 — — — — — — — — .0000 .0035
Taken from Shapiro and Wilk (1965), with permission of the authors and of the Biometrika Trustees.
TABLE 5.5 (continued)

Significance level α

Taken from Shapiro and Wilk (1965), with permission of the authors and of the Biometrika Trustees.
E 5.10.3.3 Example

For the data of Table 5.3, Y in Step (a) above is (using coefficients from Table 5.4) Y = 0.4734(270 − 156) + 0.3211(246 − 162) + ⋯ = 131.95. S² is 17845, and W = 0.976. Reference to Table 5.5 shows W to be significant at approximately the 10% level, upper tail, indicating a very good normal fit.
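This example is easy to verify directly; the coefficients below are the n = 20 column of Table 5.4, and the variable names are ours.

```python
# Check of E 5.10.3.3: W for the 20 chick weights of Table 5.3.
a = [.4734, .3211, .2565, .2085, .1686, .1334, .1013, .0711, .0422, .0140]
x = sorted([156, 162, 168, 182, 186, 190, 190, 196, 202, 210,
            214, 220, 226, 230, 230, 236, 236, 242, 246, 270])
n = len(x)
# Y = sum of a_i times the symmetric differences X_(n+1-i) - X_(i)
y = sum(ai * (x[n - 1 - i] - x[i]) for i, ai in enumerate(a))
xbar = sum(x) / n
s2 = sum((v - xbar) ** 2 for v in x)
w = y * y / s2
print(round(y, 2), round(s2), round(w, 3))   # 131.95 17845 0.976
```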
A test similar to W, but for use with n > 50, was later suggested by Shapiro and Francia (1972). This is based on the observation of Gupta (1952) that the estimate σ̂ is almost the same if V⁻¹ is ignored in (5.11); the test statistic then given by Shapiro and Francia is
W' = (Σᵢ mᵢX(i))² / {S² Σᵢ mᵢ²}                     (5.15)
Shapiro and Wilk (1965) and Shapiro, Wilk, and Chen (1968) have given extensive Monte Carlo studies to compare W with other test statistics for normality. Their studies indicate that W is a powerful omnibus test, particularly when compared with the Pearson chi-square tests and against other older tests, for example, those based on skewness √b₁ and kurtosis b₂, or u = range/(standard deviation). Stephens (1974) also gave comparisons, particularly of W with EDF statistics, and pointed out that low power results for the latter, given in the papers quoted above, are based on non-comparable test situations. Nevertheless, over a wide range of alternative distributions, W gives slightly better power than EDF statistics A² and W², and considerably better than the Kolmogorov D, or the Pearson X² or Pearson-Fisher X² discussed in Chapter 3. For large samples, Stephens (1974) also compared W, W', and D for power over a wide range of alternatives. The power of W' is marginally less than that of W when W is available, and that of D is
smaller again. Thus the power drops as the statistic becomes easier to calculate. Dyer (1974) has shown that W' has good power properties. For large samples these studies are effectively showing the value of the correlation coefficients R²(X, m) or R²(X, H) as test statistics for normality. Of these statistics, it would appear to be best to use the Shapiro-Wilk W for small samples, and Z(X, m) for larger samples (n > 50), but further comparisons would be useful.
5.11.1 Introduction
Parameter α is the origin of the distribution, and β is the scale parameter. First suppose that both α and β are unknown.
As was described before, tests may also be based on the ESS of Section 5.4.1, divided by a quadratic form in the observations. When simple least squares is used, and when the divisor is S², regression on m yields n(ESS/S²) = Z(X, m) as test statistic, and regression on H yields Z(X, H). The divisor S² is an estimate of nβ², but for the exponential distribution it might be better to use the estimate nX̄², since X̄ is a sufficient estimator of β. The corresponding test statistic ESS/(nX̄²) does not appear to have been investigated.
In the test statistics R²(X, m) or R²(X, H) the fact that α is 0 is not used; however, the line E(X(i)) = βmᵢ can be fitted to the pairs {mᵢ, X(i)}, and a test statistic can be constructed using the ESS divided by a suitable estimate of β², similar to those in Section 5.11.3 above. If α and β were both known, a natural statistic on these lines would be M²ₑ = Σᵢ (X*(i) − mᵢ)², where X*(i) = (X(i) − α)/β; this is analogous to M² of Section 5.10.2. Lockhart and Stephens (1986) have studied statistics based on residuals.
α̂ = (nX(1) − X̄)/(n − 1)  and  β̂ = n(X̄ − X(1))/(n − 1)      (5.17)

and the comparison of β̂² with the sample variance leads to the Shapiro-Wilk (1972) statistic

W_E = n(X̄ − X(1))² / {(n − 1)S²}                    (5.18)
Thus the test for exponentiality with origin and scale unknown is as follows:
Shapiro and Wilk pointed out that, in general, W_E will give a two-tail test, since for alternative distributions W_E may take either low or high values. Shapiro and Wilk (1972) gave a table of percentage points for W_E based on Monte Carlo studies; Table 5.8 is adapted from their table. Currie (1980) has since calculated the points by numerical methods. Points for W_E can also be found from those of Greenwood's statistic based on uniform spacings (see Section 10.9.3.2). The test is discussed in Section 5.12.
It is often required to test for F(x) in (5.16) with α known; for the present, suppose α = 0. The estimate of β in the new model E(X(i)) = βmᵢ now becomes β̂ = X̄, the same as the maximum likelihood estimate, and the corresponding Shapiro-Wilk statistic would be based on X̄²/S²; Hahn and Shapiro (1967) have proposed WE₀ = S²/(nX̄)², and have given percentage points derived from Monte Carlo methods. Stephens (1978) later gave a test statistic, here called W̃_E, which, for sample size n, has the same distribution as W_E for sample size n + 1. Statistics WE₀ and W̃_E are in fact equivalent, and both are equivalent to Greenwood's statistic based on spacings (see Section 10.9.3.2). The test based on W̃_E will be given here, since no new tables are necessary for its use. Suppose, returning to the general situation, that the known origin has value α = α₀. The steps in the test are as follows:
(b) Calculate

X̄ = Σᵢ₌₁ⁿ Xᵢ / n  and  S² = Σᵢ₌₁ⁿ (Xᵢ − X̄)²
W̃_E can also be calculated by adding one extra value X(n+1) equal to α₀ to the given sample of X-values of size n, and then using all n + 1 values to calculate W_E from (5.18). This is a useful device if a computer program is already available for W_E. Stephens (1978) showed that, for most alternatives, use of W̃_E gives greater power than using W_E as though α were not known.
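The device can be sketched directly in terms of a W_E routine (assumed here to implement (5.18) as reconstructed above; names ours):

```python
def w_e(x):
    """W_E of (5.18): n * (xbar - x_(1))**2 / ((n - 1) * S2)."""
    n = len(x)
    xbar = sum(x) / n
    s2 = sum((v - xbar) ** 2 for v in x)
    return n * (xbar - min(x)) ** 2 / ((n - 1) * s2)

def w_e_known_origin(x, alpha0):
    """Stephens' device for known origin: append alpha_0 and use the
    W_E program on the n + 1 values."""
    return w_e(list(x) + [alpha0])

sample = [0.3, 0.9, 1.6, 2.8, 4.1]
print(0 < w_e_known_origin(sample, 0.0) < 1)   # True
```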
Shapiro and Wilk (1972) also discussed the situation when it is desired to test the exponential distributional form, and at the same time to test that
TABLE 5.8 Percentage Points for the W_E Test

Significance level α

n 0.005 0.01 0.025 0.05 0.10 0.50 0.10 0.05 0.025 0.01 0.005
3 .252 .254 .260 .270 .292 .571 .971 .993 .998 .9997 .9999
4 .124 .130 .143 .160 .189 .377 .751 .858 .924 .968 .984
5 .085 .091 .105 .119 .144 .288 .555 .668 .759 .860 .919
6 .061 .067 .080 .096 .117 .228 .429 .509 .584 .678 .750
7 .051 .059 .070 .081 .097 .187 .347 .416 .485 .571 .643
8 .045 .051 .061 .071 .085 .163 .293 .350 .403 .485 .543
9 .040 .044 .054 .063 .075 .142 .255 .301 .345 .402 .443
10 .037 .040 .049 .057 .068 .123 .218 .253 .288 .339 .370
12 .031 .036 .041 .049 .057 .101 .172 .202 .236 .272 .298
14 .027 .038 .036 .043 .050 .085 .142 .165 .186 .213 .232
16 .023 .028 .033 .037 .044 .073 .119 .136 .154 .177 .193
18 .021 .025 .029 .033 .039 .064 .102 .116 .131 .148 .167
20 .020 .023 .026 .030 .035 .057 .088 .100 .112 .129 .137
25 .017 .019 .022 .025 .029 .045 .067 .075 .084 .093 .100
30 .015 .016 .019 .021 .024 .036 .054 .059 .064 .072 .079
35 .013 .014 .017 .019 .021 .031 .044 .049 .054 .059 .064
40 .012 .013 .015 .016 .019 .027 .038 .041 .045 .050 .051
45 .011 .012 .013 .015 .017 .024 .033 .036 .039 .042 .045
50 .010 .011 .012 .014 .015 .021 .029 .032 .034 .036 .039
55 .009 .010 .012 .013 .014 .019 .026 .028 .030 .032 .034
60 .009 .010 .011 .012 .013 .018 .023 .025 .027 .029 .031
65 .008 .009 .010 .011 .012 .016 .022 .023 .025 .027 .028
70 .008 .008 .009 .010 .011 .015 .019 .021 .022 .024 .026
75 .007 .008 .009 .010 .011 .014 .018 .019 .021 .022 .023
80 .007 .008 .008 .009 .010 .013 .017 .018 .019 .021 .021
85 .007 .007 .008 .009 .009 .012 .016 .017 .017 .019 .019
90 .006 .007 .008 .008 .009 .012 .015 .016 .016 .018 .018
95 .006 .007 .007 .008 .008 .011 .014 .015 .015 .016 .017
100 .006 .006 .007 .007 .008 .010 .013 .014 .015 .015 .016

Adapted from Shapiro and Wilk (1972), with permission of the authors and of the American Statistical Association.
222 STEPHENS
De Wet and Venter (1973) have devised a statistic dependent on the ratio of
two asymptotically efficient estimators of β. It is straightforward to apply
their method to the exponential test with α = 0, for a complete sample; the
test statistic is V_E = nX̄/{Σ_i X_(i)/H_i}, where H_i = −log{1 − i/(n + 1)}. De Wet
and Venter have given asymptotic null distribution theory for V_E.
Spinelli and Stephens (1987) have reported results on power studies to compare
R²(X,m) and R²(X,H) in tests for exponentiality, with EDF statistics
W² and A², and with W_E; for these studies both α and β were assumed
unknown. On the whole, the correlation statistics R²(X,m) and R²(X,H)
were less effective than EDF statistics or than W_E, particularly for large
sample sizes. For tests for this distribution (in contrast to tests for normality)
we might also expect some difference between R²(X,m) and R²(X,H);
this emerges clearly from the studies, with R²(X,H) less powerful overall
than R²(X,m). Statistic W_E has good power over a wide range of alternatives,
although it will have lower power against alternatives with coefficient of
variation equal to 1 (see Section 5.12).
TESTS BASED ON REGRESSION AND CORRELATION 223
In Section 5.3 it was noted that Sarkadi (1975), for normality, and Gerlach
(1979) more generally, have proved the consistency of tests based on the
correlation coefficient R(X,m). The value of R(X,m) will, loosely speaking,
approach 1 as the fit gets better. The test based on R(X,m) is then a one-tail
test; the null hypothesis is rejected only for small values of R, or for large
values of Z = n(1 − R²(X,m)).
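The one-tail statistic Z = n(1 − R²(X,m)) is easy to compute once m is available; the sketch below does this for the exponential case, where m_i, the mean of the i-th standard exponential order statistic, is m_i = Σ_{k=1}^{i} 1/(n − k + 1) (a standard fact, stated here as background rather than taken from this section):

```python
def exp_order_means(n):
    # m_i = E[Z_(i)] for standard exponential order statistics:
    # m_i = sum_{k=1}^{i} 1/(n - k + 1)
    m, s = [], 0.0
    for i in range(1, n + 1):
        s += 1.0 / (n - i + 1)
        m.append(s)
    return m

def pearson_r(x, y):
    # ordinary product-moment correlation coefficient
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def z_statistic(sample):
    # Z = n * (1 - R^2(X, m)); large Z leads to rejection
    x = sorted(sample)
    m = exp_order_means(len(x))
    r = pearson_r(x, m)
    return len(x) * (1.0 - r * r)

# A "perfect" exponential sample with X_(i) = m_i gives R = 1, Z = 0.
print(z_statistic(exp_order_means(10)))
```

As the text notes, the null hypothesis is rejected only in one tail: for large Z, equivalently for small R.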
This consistency does not necessarily extend to correlation statistics
R(X,T) when T is not m. For example, tests based on the ratio of the generalized
least squares estimate of β to an estimate obtained from the sample
variance can be put in terms of correlation statistics, but will not generally
be consistent. For the Shapiro-Wilk test for normality in Section 5.10.3,
σ̂/S is equivalent to the correlation coefficient R(X,T) where T_i is the i-th
component of T = V⁻¹m. Then if T is proportional to m, or very nearly so,
the graph of X_(i) against T_i will be approximately a straight line; R(X,T) will
be a good measure of fit, and low values of R(X,T) will lead to rejection of
the normal model. For the normal distribution, T is nearly proportional
to m, since Vm ≈ m/2, implying V⁻¹m ≈ 2m (Stephens, 1975), with the
approximation becoming better for large n; then the Shapiro-Wilk W approaches
the Shapiro-Francia W′, and this is the same as the correlation statistic
R²(X,m), which is consistent.
However, the normal case is exceptional. Even for other symmetric
distributions V⁻¹m will not as a rule be proportional to m, even asymptotically,
and the situation is more complicated for non-symmetric distributions,
such as the exponential. For such distributions the test based on the ratio of
β̂² to S² is equivalent to the correlation R²(X,T) with the vector T′ =
1′V⁻¹(1m′ − m1′)V⁻¹, and for most distributions this vector will not even be
close to m. For example, for the exponential distribution, with a sample of
size n, T is proportional to a vector with one component equal to −(n − 1),
and the other n − 1 components all equal to 1. A plot of X_(i) against T_i, even
for a "perfect" exponential sample with X_(i) = m_i, would not be close to a
straight line, and the value of R(X,T) would not be close to 1. The statistic
R(X,T) then gives no indication of the fit in the sense that a large value indicates
a good fit and a small value a bad fit. In practical terms, this means
that for W_E, equivalent to R²(X,T), both tails are needed to make the test.
Also, the test statistic will not be consistent.
For the exponential distribution, the coefficient of variation C_V = σ/μ
is 1, and for large samples nW_E converges in probability to 1/C_V = 1; however,
there are other distributions for which nW_E would also converge to 1
(for example, the Beta distribution Beta(x; p,q) defined in Section 8.8.2,
with p < 1 and q = p(p + 1)/(1 − p)), and W_E will not detect these alternatives.
5.14.1 Version 1

5.14.2 Version 2
Smith and Bain (1976) have given tables of null critical values of 1 − R²(X,H)
for the exponential power distribution, F(x) = 1 − exp[1 − exp{(x − α)/β}],
(α < x < ∞, β > 0). These are for Type 2 right-censored data, with r/n = 0.5,
0.75, and 1, and for n = 8, 20, 40, 60, and 80.
6.1.1 Hypothesis Testing Problems
H_0: F = F_0,   H_1: F ≠ F_0   (6.1)

H_0: F ∈ ℱ_0,   H_1: F ∉ ℱ_0   (6.2)
236 QUESENBERRY
H_0: F ∈ ℱ_0,   H_1: F ∈ ℱ_1   (6.3)

X_11, X_12, ..., X_1n_1
X_21, X_22, ..., X_2n_2
. . . . . . . . . . . .
X_k1, X_k2, ..., X_kn_k   (6.4)

H_0: X_ij ~ F_i ∈ ℱ_0,   i = 1, ..., k;  j = 1, ..., n_i   (6.5)

H_1: Negation of H_0
In words, this null hypothesis is that all random variables (rv's) have
df's of the same functional form; however, the parameters may change from
sample to sample. The testing problem of (6.5) is an important generalization
of the classical single sample goodness-of-fit composite null hypothesis testing
problem of (6.2). The several samples null hypothesis can also be considered
against a corresponding several samples separate families hypothesis
testing problem as follows:
TRANSFORMATION METHODS IN GOODNESS-OF-FIT 237
H_1: X_ij ~ F_i ∈ ℱ_1,   i = 1, ..., k;  j = 1, ..., n_i   (6.6)

U_i = h_i(X_1, ..., X_n),   i = 1, ..., N   (6.7)
Condition (1) is very important because it assures that every size α test
based on (U_1, ..., U_N) is also a similar size α test for the same null hypothesis.
Condition (2) is a theoretically interesting property of transformations
of the structure of (6.7); however, it should be pointed out that this type of
characterization property of transformations has not played an important
role in the goodness-of-fit field. The apparent reason why this is so involves
the following considerations. The actual test statistic will itself be a real-valued
function of the values (U_1, ..., U_N) of (6.7), i.e., the test statistic is
obtained by composition of a real-valued function with the transformations
of (6.7). Unless the distribution of the test statistic is also a characterization
of the distribution Q in property (2), then property (2) for the transformations
of (6.7) is of little relevance. In other words, it matters little whether characterization
is lost in the first or second step of the transformations. In this
context, it should be observed that most of the goodness-of-fit statistics that
are important in applied statistics do not characterize a null class of distributions.
As a particular example, consider the chi-squared test statistic.
Although chi-squared test statistics do not characterize null hypothesis
classes of distributions, they have, of course, been and are of great importance
in applied statistics (see Chapter 3).
A number of transformations of the form of (6.7) have been given in the
literature for particular parametric families. David and Johnson (1948) considered
the probability integral transformation when parameters are replaced
by estimates. They showed that the transformed values are dependent, and
for many location-scale parameter families that the transformed random
variables have distributions that do not depend upon the values of the parameters.
A number of writers have given transformations for particular families
of the structure of (6.7) that satisfy condition (1) for transformations. Sarkadi
(1960, 1965) gives transformations for the three univariate normal families.
Durbin (1961) has proposed a transformation approach that eliminates the
nuisance parameters by introducing a further randomization step. Störmer
(1964) gives a method for transforming a sample from a N(μ,σ²) distribution
to a sample of n − 2 values from a N(0,1) distribution. A number of transformations
of the structure of (6.7) satisfying property (1) have been
considered in the literature for one and two parameter exponential classes.
Two of these transformations are considered by Seshadri, Csörgö, and
Stephens (1969), and one is shown to have property (2), also. Csörgö and
Seshadri (1970), Csörgö and Seshadri (1971), and Csörgö, Seshadri, and
6.2 PROBABILITY INTEGRAL TRANSFORMATIONS
Theorem 6.1

Theorem 6.2
U_1 = F(Y_1),  U_2 = F(Y_2 | Y_1),  ...,  U_n = F(Y_n | Y_1, ..., Y_{n−1})
Then
and
U_1 = F(Y_1) = 1 − exp(−Y_1)  and  U_2 = F(Y_2 | Y_1) = 1 − exp(Y_1 − Y_2)

for 0 < Y_1 < Y_2 < ∞ are i.i.d. U(0,1) rv's. This result is easily verified
directly. Applying the standard transformation of densities gives
The generality with which both of the preceding theorems hold should be
carefully noted. We shall apply these in particular for cases when F is
already explicitly a conditional distribution function.
The model assumptions we make in this chapter are more restrictive than
those in O'Reilly and Quesenberry (1973) (O-Q), but they are sufficiently general
to cover many important cases. We assume that the parametric class
of distribution functions corresponds to a continuous exponential class (cf.
Zacks, 1971, Section 2.5), and that T_n is a p-component vector that is a sufficient
and complete statistic for θ = (θ_1, ..., θ_p). Denote by F_n(x_1, ..., x_n)
the distribution function of (X_1, ..., X_n) given the statistic T_n. Then F_n(x_1),
F_n(x_2 | x_1), ..., F_n(x_n | x_1, ..., x_{n−1}) are the marginal and conditional
distribution functions.
Theorem 6.3
U_1 = F_n(x_1),  U_2 = F_n(x_2 | x_1),  ...,  U_{n−p} = F_n(x_{n−p} | x_1, ..., x_{n−p−1})   (6.8)

are i.i.d. U(0,1) rv's.
We note that the assumption that (X_1, ..., X_n) are i.i.d. is not necessary
to obtain Theorem 6.3. It is sufficient to require that (X_1, ..., X_n) have a
full-rank absolutely continuous distribution. We give results below in subsection
6.5.3 obtained by applying Theorem 6.3 to the order statistics of a
sample from an exponential distribution.
A sequence (T_n)_{n≥1} of statistics is said to be doubly transitive if each
Theorem 6.4
U_1, ..., U_{n−p}   (6.9)

are i.i.d. U(0,1) rv's.
Some important classes of distributions which are not covered by the transformation
theory of the preceding section are the truncation parameter families
considered by Quesenberry (1975). For these families we assume that
the parent density defined on an interval (a,b), finite or infinite, is of one of
the three forms:
242 QUESENBERRY
f(x; μ_1, μ_2, θ) = c(μ_1, μ_2, θ) h(x, θ),   a < μ_1 < x < μ_2 < b   (6.10)

f_1(x; μ, θ) = c_1(μ, θ) h_1(x, θ),   a < μ < x < b   (6.11)

f_2(x; μ, θ) = c_2(μ, θ) h_2(x, θ),   a < x < μ < b   (6.12)

Here a and b are known constants; μ, μ_1, and μ_2 are truncation parameters;
θ is a p-component parameter vector; and h(x, θ), h_1(x, θ), and h_2(x, θ) are
positive, continuous, and integrable functions over the intervals (μ_1, μ_2),
(μ, b), and (a, μ), respectively.
For X_1, ..., X_n a sample we now set out a particular transformation
to another set of values. Let r denote the antirank of X_(n), i.e., r must
satisfy X_r = X_(n); and put

W_1 = X_1, ..., W_{r−1} = X_{r−1}, W_r = X_{r+1}, ..., W_{n−1} = X_n

With m_1 and m_2 the antiranks of X_(1) and X_(n), put also

W_1 = X_1, ..., W_{m_1−1} = X_{m_1−1}, W_{m_1} = X_{m_1+1}, ..., W_{m_2−2} = X_{m_2−1}, ..., W_{n−2} = X_n

These values W_1, ..., W_{n−2} are called the sample with X_(1) and X_(n)
deleted.
For X_1, ..., X_n a sample from the density f of (6.10) and W_1, ...,
W_{n−2} the sample with X_(1) and X_(n) deleted, the next theorem is from
Quesenberry (1975).
Theorem 6.5
g(w, θ) = h(w, θ) I_{(x_(1), x_(n))}(w) / ∫_{x_(1)}^{x_(n)} h(w, θ) dw   (6.13)
Theorem 6.6
are i.i.d. U(0,1) rv's.
∫_{x_(1)}^{b} h_1(w, θ) dw
ℱ_j^n = {P^n : P^n = P × ··· × P, P ∈ ℱ_j},   j = 0, 1
Theorem 6.7
From the distributional result of Theorem 6.3 and Basu (1955, 1960),
the next result follows.
Theorem 6.8

Theorem 6.9
A most powerful similar test exists for testing H_0 of (6.2) against a simple
alternative F = F_1, and this test can be expressed as a function of (U_1, ...,
U_{n−p}) only.
The import of this result is that all information in the sample about the
class ℱ_0 of df's is also in the U-values U_1, ..., U_{n−p}. This result and
Theorem 6.8 show that the CPIT transformations may be regarded as a
technique whereby the information in the sample (X_1, ..., X_n) can be partitioned
into two vectors (U, T): the vector T contains all information
about the parameters (θ_1, ..., θ_p), the vector U contains all information
about the class ℱ_0, and T and U are independent. Thus the values (U_1, ...,
U_{n−p}) may be used to make inferences about the class of distributions (is
it normal? exponential?), the statistic(s) T may be used to make inferences
within the class about parameter values (estimate the mean?), and the independence
of U and T can be exploited to assess overall error rates.
From the nature of the transformations in (6.8) and (6.9) it is apparent that,
in general, the vector U of transformed values is not a symmetric function
of the X sample. That is, the vector U is not invariant under permutations
of the observations. For those cases when the transformations are not permutation
invariant, this property requires consideration on a number of
points.
One point of concern when the u's are not permutationally invariant is
that a goodness-of-fit test or other analysis may lead to a conclusion that
depends upon the presumably irrelevant indexing of the X's. If two randomly
selected orderings are used to compute U and then a particular goodness-of-fit
test (such as the Neyman smooth test discussed below) is computed on the
two U vectors, the statistics are identically distributed and dependent, but
nonidentical. If we consider these two test statistics, then the situation is
similar to that when more than one competing test statistic is computed for
the same testing problem. In particular, it is common practice today to compute
a number of the goodness-of-fit test statistics, discussed elsewhere in
this book, for each sample. A quantity of relevance here is the probability
that two tests will agree in their conclusions. Quesenberry and Dietz (1983)
considered this probability for Neyman smooth tests made on the U's from
random permutations of a sample. They gave empirical evidence that these
agreement probabilities are very high in many cases of interest and are
bounded below by the value 2/3 for all cases considered.
It is possible to obtain CPIT transforms that are invariant under permutations
of the original sample. To obtain these transforms recall that in
order to apply the approach of Rosenblatt it was necessary to assume only
that the rv's (X_1, ..., X_n) had a full-rank continuous distribution. Thus to
6.4 TESTING SIMPLE UNIFORMITY
The transformations of Section 6.2 can be used to construct similar α tests
for the testing problems of (6.2) and (6.3) by making size α tests of the
surrogate null hypothesis that the U-values are themselves independently
and identically distributed as uniform random variables on the interval (0,1),
i.e., are i.i.d. U(0,1) rv's. These transformed values of the U's should be
studied with care because they contain all test information in the sense of
Theorem 6.9. It should also be observed that when the null hypothesis
H_0: P ∈ ℱ_0 fails, the distribution of the U's may fail to be i.i.d. U(0,1) in
many ways. They may no longer be independent, nor identically distributed,
nor uniformly distributed. Moreover, if the model properties of independence
or identical distributions of the observations as assumed by the formal
goodness-of-fit and separate hypothesis testing problems of (6.2)-(6.6) are
violated, then this will also result in transformed values that are not, in
general, i.i.d. U(0,1) rv's. Thus we can use these transformed values to
study the validity of other model specifications, in addition to violations of an
assumed parametric density functional form. Now, the choice of tests to be
made will, of course, depend upon the type of violation of model assumptions
of concern. In some problems, we may not have reason to be concerned about
particular types of model violations, and we would like to perform an analysis
with good sensitivity against a wide range of alternatives. Such an analysis
z_j = Φ⁻¹(u_j),   j = 1, ..., N   (6.17)

for Φ a N(0,1) df, and Φ⁻¹ its inverse. By the inverse of Theorem 6.1, when
the u_j's are i.i.d. U(0,1) rv's, then the z_j's are i.i.d. N(0,1) rv's. Thus we
can also test (6.3) by testing the surrogate null hypothesis that the z_j's are
i.i.d. N(0,1) rv's. We shall consider further tests based on the z_j's below
when we consider particular parametric classes. We shall call the values z_j
normal uniform (NU) residuals. A principal reason for considering these NU
residuals is that the problem of testing normality is the most extensively
studied problem in the goodness-of-fit area (see Chapter 9 of this book), and
we will recommend tests of normality below. Hester and Quesenberry (1984)
have found that a test based on these NU residuals has attractive power properties
for testing for heteroscedasticity, i.e., for either increasing or decreasing
variance for ordered regression data.
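The NU residuals of (6.17) need only an inverse normal df; a minimal sketch using the Python standard library:

```python
from statistics import NormalDist

def nu_residuals(u):
    # z_j = Phi^{-1}(u_j): i.i.d. U(0,1) u's become i.i.d. N(0,1) z's
    phi_inv = NormalDist().inv_cdf
    return [phi_inv(uj) for uj in u]

u_values = [0.12, 0.37, 0.50, 0.68, 0.91]
z_values = nu_residuals(u_values)
print(z_values)
```

Any of the many normality tests of Chapter 9 can then be applied directly to the z's.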
Graphical methods are very useful for studying the uniformity of the values
U_1, ..., U_N on the unit interval. As a first step the data can be plotted on
the (0,1) interval; or, if N is large it will be more convenient to partition
the (0,1) interval into a number of subintervals of equal length and to construct
a histogram on these subintervals. The data pattern on the unit interval,
or the shape of the histogram constructed, conveys important information
about the shape of the parent density from which the data were drawn relative
to the shapes of the densities of the null hypothesis class. We next consider
this in more detail.
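The histogram construction just described can be sketched as a simple cell count, anticipating the M_i counts used later in Table 6.7 (here cells are taken as [i/10, (i+1)/10) rather than the right-closed intervals of the text, an immaterial difference away from cell boundaries):

```python
def cell_counts(u, cells=10):
    # count the u-values falling in each of `cells` equal subintervals of (0, 1)
    counts = [0] * cells
    for uj in u:
        i = min(int(uj * cells), cells - 1)
        counts[i] += 1
    return counts

u_demo = [0.03, 0.11, 0.12, 0.25, 0.48, 0.52, 0.55, 0.77, 0.91, 0.98]
print(cell_counts(u_demo))
```

A roughly flat set of counts is consistent with uniformity; systematic excess in the center or the tails carries the shape information discussed above.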
In order to study the significance of particular patterns of the u-values
on the unit interval, we first suppose that the u's were obtained from the
classical probability integral transformation of Theorem 6.1 by transforming
a sample X_1, ..., X_N using a continuous df F_0(x) with corresponding density
U² = (1/12N) + Σ_{i=1}^{N} {u_(i) − (2i − 1)/2N}² − N(ū − 0.5)²   (6.18)

has critical values that are approximately constant in N for N ≥ 10. Table
6.1 gives approximate significance points for U²_MOD that were obtained by
Q-M by Monte Carlo methods, and the Stephens approximation for N > 10.
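The statistic of (6.18) is straightforward to compute; the sketch below gives the unmodified U² only (the small-sample modification yielding U²_MOD, due to Stephens (1970), is not reproduced here):

```python
def watson_u2(u):
    # U^2 = 1/(12N) + sum_i {u_(i) - (2i-1)/(2N)}^2 - N*(ubar - 0.5)^2
    v = sorted(u)
    n = len(v)
    w2 = 1.0 / (12 * n) + sum(
        (ui - (2 * i - 1) / (2.0 * n)) ** 2
        for i, ui in enumerate(v, start=1))
    ubar = sum(v) / n
    return w2 - n * (ubar - 0.5) ** 2

# perfectly regular u-values attain the minimum value 1/(12N)
regular = [(2 * i - 1) / 20.0 for i in range(1, 11)]
print(watson_u2(regular))
```

Clustered or shifted u-values inflate the statistic, which is what the significance points of Table 6.1 calibrate.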
In a classic paper Neyman (1937) proposed a test statistic that is defined
as follows. Let π_r denote the rth degree Legendre polynomial, the first five
of which are, for 0 < u < 1,
250 QUESENBERRY
π_0(u) = 1,  π_1(u) = √3(2u − 1),  π_2(u) = √5(6u² − 6u + 1),
π_3(u) = √7(20u³ − 30u² + 12u − 1),  π_4(u) = 3(70u⁴ − 140u³ + 90u² − 20u + 1)

Then put

V_r = Σ_{j=1}^{N} π_r(u_j),   r = 1, ..., k

and

P²_k = (1/N) Σ_{r=1}^{k} V_r²
6.5.1 Introduction

6.5.2 Uniform Distributions
Densities:

f(x; μ_1, μ_2) = {1/(μ_2 − μ_1)} I_{(μ_1, μ_2)}(x)   (6.20)

u_j = ···,   j = 1, ..., n − 1   (6.21)

u_j = (X_(j+1) − X_(1))/(X_(n) − X_(1)),   j = 1, ..., n − 2   (6.23)
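The two-parameter uniform transformation (6.23) is a one-liner; this sketch applies it to the ordered sample (note that, computed this way, the resulting values are themselves ordered, behaving as the order statistics of n − 2 i.i.d. U(0,1) rv's):

```python
def uniform_cpit(x):
    # u_j = (X_(j+1) - X_(1)) / (X_(n) - X_(1)),  j = 1, ..., n - 2
    xs = sorted(x)
    lo, hi = xs[0], xs[-1]
    return [(xs[j] - lo) / (hi - lo) for j in range(1, len(xs) - 1)]

print(uniform_cpit([2.0, 5.0, 3.5, 9.0, 6.5]))
```

As remarked in the text for the exponential case, permuting the X's leaves the set of u-values from such order-statistic formulas unchanged.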
6.5.3 Exponential Distributions

Densities:
Note that the remarks above about the effect of permuting the X's on the u's
also apply to these u's, viz., if the X's are permuted we still get the
same set of values {u_1, ..., u_{n−1}} from (6.25). For cases II and III that
follow, we give the transforms obtained from the sample X_1, ..., X_n from
Theorem 6.4, as (6.26), and, also, transforms obtained by applying Theorem
6.3 directly to the order statistics (x_(1), ..., x_(n)), as (6.27); these
latter formulas were obtained by O'Reilly and Stephens (1984).
254 QUESENBERRY
(6.26)

u_j = 1 − {[1 − (n − j + 1)y_(j)/(y_(j) + ··· + y_(n))] / [1 − (n − j + 1)y_(1)/(y_(1) + ··· + y_(n))]},   j = 1, ..., n − 1   (6.27)
6.5.4 Pareto Distributions

Densities:

f(x; a, γ) = (aγ^a/x^{1+a}) I_{(γ,∞)}(x),   a > 0, γ > 0   (6.28)

u_j = 1 − (X_(1)/X_(j+1))^a   (6.29)

for j = 1, ..., n − 1.
6.5.5 Normal Distributions

Densities:

u_{j−1} = Φ{[(j − 1)/j]^{1/2}(X_j − X̄_{j−1})/σ}   (6.31)

for j = 2, ..., n.
s_j² = Σ_{i=1}^{j} (X_i − X̄_j)²,   j = 1, ..., n

and let G_ν denote a Student-t distribution function with ν degrees of freedom.
Then

s*_j = {[(j − 1)s²_{j−1} + νs²]/(ν + j − 1)}^{1/2}

and

u_{j−1} = G_{ν+j−1}{[(j − 1)/j]^{1/2}(X_j − X̄_{j−1})/s*_j},   j = 2, ..., n   (6.32′)
256 QUESENBERRY
X̄_j = (1/j) Σ_{i=1}^{j} X_i,   S_j² = (1/(j − 1)) Σ_{i=1}^{j} (X_i − X̄_j)²

Then

u_{j−2} = G_{j−2}{[(j − 1)/j]^{1/2}(X_j − X̄_{j−1})/S_{j−1}},   j = 3, ..., n   (6.33)

s*_j = {[(j − 2)S²_{j−1} + νs²]/(ν + j − 2)}^{1/2}

and

u_{j−2} = G_{ν+j−2}{[(j − 1)/j]^{1/2}(X_j − X̄_{j−1})/s*_j},   j = 3, ..., n   (6.33′)
SS_j = Σ_{i=1}^{j} (X_i − X̄_j)²

then

X̄_j = [(j − 1)X̄_{j−1} + x_j]/j,   SS_j = SS_{j−1} + [j/(j − 1)](x_j − X̄_j)²   (6.34)
Youngs and Cramer (1972) and Chan, Golub, and LeVeque (1983) discussed
these formulas; the latter authors were primarily concerned with
numerical accuracy.
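The updating formulas of (6.34) can be sketched and checked in a few lines; at every step the one-pass update agrees with direct recomputation from the full prefix of the data:

```python
def running_mean_ss(xs):
    # xbar_j = ((j - 1) * xbar_{j-1} + x_j) / j
    # SS_j   = SS_{j-1} + (j / (j - 1)) * (x_j - xbar_j)^2     -- (6.34)
    xbar, ss = xs[0], 0.0
    out = [(xbar, ss)]
    for j in range(2, len(xs) + 1):
        xj = xs[j - 1]
        xbar = ((j - 1) * xbar + xj) / j
        ss = ss + (j / (j - 1)) * (xj - xbar) ** 2
        out.append((xbar, ss))
    return out

data = [3.0, 1.0, 4.0, 1.5, 9.0]
print(running_mean_ss(data)[-1])  # mean and SS of the full sample
```

This is exactly what makes the transformations of (6.33) cheap to compute sequentially: each new observation updates X̄_j and SS_j in constant time.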
6.5.6 Lognormal Distributions

Densities:
y_j ~ N(x_j′β, σ²),   j = 1, ..., n   (6.36)

for the jth row x_j′ of X_n. The matrix X_j denotes the matrix consisting of the
first j rows of X_n. We shall give transformations here for two cases, viz.,
for σ² known and unknown.
Using this result with Theorem 6.4 gives the following results. Put

A_j = [y_j − x_j′(X_j′X_j)⁻¹X_j′y_(j)] / {σ_0[1 − x_j′(X_j′X_j)⁻¹x_j]^{1/2}}   (6.37)

and

u_{j−p} = Φ(A_j),   j = p + 1, ..., n   (6.38)
b_j = b_{j−1} + (X_j′X_j)⁻¹x_j(y_j − x_j′b_{j−1})   (6.40)

S_j² = S²_{j−1} + σ_0²A_j²   (6.41)
A_j = (y_j − x_j′b_{j−1}) / {σ_0[1 + x_j′(X′_{j−1}X_{j−1})⁻¹x_j]^{1/2}}   (6.37′)
Put w_j = σ_0 A_j; these w_j's are the quantities sometimes called
recursive residuals, and have been considered by a number of writers including
Hedayat and Robson (1970); Brown, Durbin, and Evans (1975); and,
recently, Galpin and Hawkins (1984). These writers have generally not
assumed that σ² is known. In the form (6.37′) the w_j's can be shown to
be i.i.d. N(0, σ²), which also follows easily from the CPIT results given
above.
B_j = (j − p − 1)^{1/2}(y_j − x_j′b_j) / {[1 − x_j′(X_j′X_j)⁻¹x_j]S_j² − (y_j − x_j′b_j)²}^{1/2}   (6.42)
and

u_{j−p−1} = G_{j−p−1}(B_j)   (6.43)

Using the updating formulas again, we express B_j in the alternative form

B_j = (j − p − 1)^{1/2}(y_j − x_j′b_{j−1}) / {[1 + x_j′(X′_{j−1}X_{j−1})⁻¹x_j]S²_{j−1}}^{1/2}   (6.42′)
Note that the quantity (y_j − x_j′b_{j−1}) in the numerator of (6.37′) and (6.42′)
is the residual of y_j from the least squares line fitted using the first j − 1
points. This is a normal rv with mean zero, and by examining (6.37′) and
(6.42′), we see that A_j and B_j are the standardized and Studentized forms of
this residual.
B_j = (j − p − 1)^{1/2}A_j/(Σ_{i=p+1}^{j−1} A_i²)^{1/2} = (j − p − 1)^{1/2}w_j/(Σ_{i=p+1}^{j−1} w_i²)^{1/2}   (6.44)
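The recursive residuals w_j of (6.37′) can be sketched for the simple linear regression case y = a + bx (so p = 2), taking σ_0 = 1; this is an illustration only, with the 2×2 normal-equations inverse written out explicitly:

```python
def ls_fit(xs, ys):
    # least-squares intercept a and slope b for y = a + b*x,
    # returning the normal-equation sums needed for the leverage term
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    d = n * sxx - sx * sx
    b = (n * sxy - sx * sy) / d
    a = (sy - b * sx) / n
    return a, b, sxx, sx, n, d

def recursive_residuals(xs, ys):
    # w_j = (y_j - x_j' b_{j-1}) / (1 + x_j'(X'X)^{-1} x_j)^{1/2},
    # with b_{j-1} fitted to the first j - 1 points; here p = 2.
    ws = []
    for j in range(3, len(xs) + 1):   # j = p + 1, ..., n
        a, b, sxx, sx, m, d = ls_fit(xs[:j - 1], ys[:j - 1])
        xj, yj = xs[j - 1], ys[j - 1]
        h = (sxx - 2.0 * xj * sx + m * xj * xj) / d  # x_j'(X'X)^{-1}x_j
        ws.append((yj - (a + b * xj)) / (1.0 + h) ** 0.5)
    return ws

# data lying exactly on a line give recursive residuals of zero
print(recursive_residuals([1.0, 2.0, 3.0, 4.0, 5.0],
                          [5.0, 7.0, 9.0, 11.0, 13.0]))
```

Under the model (6.36) with σ_0 = 1 these w_j's are i.i.d. N(0,1), which is what the transformations (6.38) exploit.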
n = n_1 + ··· + n_k

(6.46)
(Remark: Readers familiar with the QGB paper will note that the updating
formulas in (6.34) above have been used to rewrite the formula given for
A_ij in that paper in a more convenient form.)
and

u_ij = G{[(j − 1)/j]^{1/2}(x_ij − x̄_{i(j−1)})/s*}   (6.49)
6.6 NUMERICAL EXAMPLES

6.6.1 Introduction

6.6.2 Salinity Data
TABLE 6.3
Normal Analysis

U²_MOD = 0.013,   P²_4 = 1.631
Uniform Analysis
We have transformed each of the samples of Table 6.3 using the transformations
of (6.23) for a two-parameter uniform family. The 54 u-values obtained
are given in Table 6.5 and are graphed against the null hypothesis
expected values in Figure 6.2. This uniform probability plot is typical of the
pattern we obtain when normal data are transformed using the transformations
for a uniform family. This S-shaped pattern can be anticipated from
the discussion of data patterns in subsection 6.4.2, since both distributions
are symmetric and on the range of the sample the normal density has thinner
tails near the extremes, and is higher than the uniform density in the center.
The value of U²_MOD in Table 6.5 falls between the upper 10 and 5 percent
points, and the observed significance level of P²_4 is
From these statistics and from Figure 6.2 it is clear that the uniform
distributions do not fit the data well, and certainly not as well as normal
distributions. It should be borne in mind here that the uniform distribution
as an alternative to normality is one that is rather difficult to detect.
Exponential Analysis

We have transformed each of the samples of Table 6.3 using the transformations
of (6.27) for two-parameter exponential distributions; see Case III in
subsection 6.5.3. The 54 pooled and ranked values obtained from these transformations
are given in Table 6.6. These values are plotted in Figure 6.3
against the expected values. This probability plot is typical of the pattern
obtained when normal data are transformed using exponential family transformations
(recall the discussion in subsection 6.4.2).
This value of U²_MOD is highly significant since the upper 1 percent point
for U²_MOD is .267. Moreover, P(P²_4 > 28.260) = P(χ²(4) > 28.260) = 0.00001.
P²_4 = 6.610

U²_MOD = .747,   P²_4 = 28.260
From these values and Figure 6.3 it is clear that the two-parameter exponential
distributions fit these data very poorly.
6.6.3 Simulated Data
The appendix gives three samples of 100 observations each that have been
drawn from normal, exponential, and uniform distributions. These samples
are named NOR, NEX, and UNI, which we shall abbreviate further to N, E,
and U in this section. For each of these three samples we have computed 98
u-values using the appropriate transformations for two-parameter normal,
exponential, and uniform families. This gives nine sets of u-values. We have
partitioned the (0,1) interval into ten subintervals of equal lengths, and denote
by M_i the number of u's that fall in the subinterval ((i − 1)/10, i/10]
for i = 1, ..., 10 for each set of u-values. These values and those for U²_MOD,
P²_4, and the observed significance level of P²_4, called Sig., are given in Table
6.7. The label (N, E) on row 4 means the normal sample was subjected to
exponential transformations, for example.
The first three rows of Table 6.7 give results when the sample from a
distribution is subjected to the transformations for a class to which that distribution
belongs. The expected cell frequencies are 9.8, and none of the
test statistics is significant at any of the usual levels. The other six rows
all give level .05 significant tests for both U²_MOD and P²_4. All except (U, N)
have observed significance levels for both statistics that are very small,
indeed. A more detailed analysis could be carried out by plotting the 98
u-values for each case as was done in the last example in Figures 6.1, 6.2,
and 6.3.
M1  M2  M3  M4  M5  M6  M7  M8  M9  M10  U²_MOD  P²_4  Sig.
TABLE 6.8
Sample A Sample B
210                          196
190                          236
182  .060  .053  .111        246  .022  .147  .184
230  .067  .105  .191        187  .024  .216  .281
236  .077  .228  .356        193  .083  .224  .318
The appendix gives data from Bliss (1967) on the body weight in grams of
21-day-old white leghorn chicks at two dosage levels of vitamin D. We have
randomized each of the samples labeled series A and series B. Table 6.8
gives the data in the order in which it was analyzed here. Table 6.8 gives
also the transformed u-values when the samples are subjected to two-parameter
normal, uniform, and exponential analyses. The test statistics
U²_MOD and P²_4, and the observed significance level, Sig., of P²_4, have
been computed for each sample as well as for the pooled samples and are
given in Table 6.9. Graphs of the pooled u-values are given in Figures 6.4,
6.5, and 6.6 for the normal, uniform, and exponential classes, respectively.
U²_MOD = 0.117,   P²_4 = 12.16
The probability plot in Figure 6.6 as well as the test statistics for the
exponential case in Table 6.9 easily eliminate the exponential class from
consideration for fitting these data. Comparison of the graphs in Figures
6.4 and 6.5 suggests that the normal class fits better than the uniform class,
and this conclusion is supported by the values of U²_MOD and P²_4 computed for
the pooled u-values in Table 6.9. The statistic U²_MOD is significant at the
.05 level for the uniform class but is much too small for significance at even
the .10 level for the normal class. The probability plot of Figure 6.4 suggests
that the true distribution (assuming that the i.i.d. assumptions hold,
of course) has a density with slightly thicker tails than a normal density, but
it does look pretty symmetric. Figure 6.5 suggests that the true density has
thinner tails on its interval of support (where the density is positive) than the
tails of the uniform density on its interval of support, but again the density
appears to be symmetric.
6.6.5 Regression Data

In this section we give three examples to illustrate the use of the transformations
of (6.37) to test the regression model assumptions of (6.36). It should
be borne in mind that we are testing all of the assumptions of the normal
linear regression model.
Fisher's Data

U²_MOD = .063,   P²_4 = 2.389

U²_MOD = 0.225,   P²_4 = 9.464
Snedecor-Cochran Data

Snedecor and Cochran (1967, p. 140) give the initial weight x and the gain in
weight y of 15 female rats on a high protein diet, from the 24th to 84th day
of age, and suggest considering a normal linear regression model. The ranked
u-values for these data are given in Table 6.12 and are plotted against expected
values in Figure 6.9.
The number of observations here, 15, is rather small and gives only 12
transformed u-values for testing the model. With so little data only rather
large violations of the model will be detected. Recall that for this small
number of u's many tests are biased for some alternatives. The U²_MOD
statistic is unbiased for all known cases and is our preferred test statistic
here. The observed value of U²_MOD = .225 here falls between the upper 5
and 1 percent values, and P(P²_4 > 9.464) = .05. The pattern of points in
Figure 6.9 also raises doubts about the adequacy of fitting these data with a
normal simple linear regression model. We conclude, even with so few observations,
that the normal simple linear regression model assumptions
are unwarranted.
(Author’s note: This chapter was written in 1977 and revised in 1984.)
REFERENCES

Stephens, M. A. (1970). Use of the Kolmogorov-Smirnov, Cramér-von Mises
and related statistics without extensive tables. J. R. Statist. Soc. B 32,
115-122.

Störmer, H. (1964). Ein Test zum Erkennen von Normalverteilungen.
Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 2, 420-428.
7.1 INTRODUCTION

m = Σ_{j=1}^{n} X_j/n

m_i = Σ_{j=1}^{n} (X_j − m)^i/n,   i = 2, 3, 4   (7.1)

√b_1 = m_3/m_2^{3/2}

b_2 = m_4/m_2²   (7.2)
and it is readily seen that they are invariant under origin and scale changes.
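The statistics of (7.1) and (7.2) are immediate to compute; a short sketch, also verifying the origin- and scale-invariance just noted:

```python
def sqrt_b1_b2(x):
    # sqrt(b1) = m3 / m2^(3/2),  b2 = m4 / m2^2   -- see (7.1), (7.2)
    n = len(x)
    m = sum(x) / n
    m2 = sum((xi - m) ** 2 for xi in x) / n
    m3 = sum((xi - m) ** 3 for xi in x) / n
    m4 = sum((xi - m) ** 4 for xi in x) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2

x = [1.0, 2.0, 3.0, 4.0, 10.0]
print(sqrt_b1_b2(x))  # unchanged if x is shifted or rescaled
```

For a symmetric sample √b_1 vanishes, in line with the population values √β_1 = 0, β_2 = 3 quoted below for the normal distribution.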
The corresponding m easures for a specified density are denoted by \Tßi
and F o r a normal distribution = 0, /З2 = 3, and in random samples
from it there may be wide variations from these values, especially fo r sm all
sam ples (n < 25). M oreover, the sample may a rise from some nonnormal
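As a concrete illustration (a minimal Python sketch, not part of the original text), the sample measures of (7.1) and (7.2) can be computed directly from the central moments:

```python
import numpy as np

def moment_measures(x):
    """Return (sqrt_b1, b2) from the central moments of eqs. (7.1)-(7.2)."""
    x = np.asarray(x, dtype=float)
    m = x.mean()
    m2 = np.mean((x - m) ** 2)  # second central moment
    m3 = np.mean((x - m) ** 3)  # third central moment
    m4 = np.mean((x - m) ** 4)  # fourth central moment
    return m3 / m2 ** 1.5, m4 / m2 ** 2
```

For the symmetric sample X_i = i, i = ±1, ..., ±10 used in Example 7.2.3.1 below, this gives √b1 = 0 and b2 ≈ 1.71, matching the values quoted there.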
280 BOWMAN AND SHENTON
7.2 NORMAL DISTRIBUTION
Early work goes back to Karl Pearson (1902), who gave, in general sampling,
expressions for the dominant terms (n⁻¹ asymptotics) in the variances of b1
and b2, and also the correlation ρ(b1, b2). The idea was to use, for example,
√Var(b1) and √Var(b2) as error assessments, but it was far too early in
the development of the subject to consider questions of the validity of the
asymptotics or their uses.
A quarter of a century later an important development came from
E. S. Pearson (1930), who used the work of Fisher (1928) and Wishart (1930)
on k-statistics to develop a Taylor series expansion (in terms of the k-statistic
discrepancies k_i − κ_i, i = 2, 3, 4) for √b1 and b2. For example, defining

y = (n − 1) √{b1 / (6(n − 2))}

Pearson showed for the second and fourth central moments of y (in normal
sampling) that

μ2(y) = 1 − 6/n + 22/n² − 70/n³ + ···

and

MOMENT (√b1, b2) TECHNIQUES 281

μ4(y) = 3 − 1056/n² + 24132/n³ + ···
the odd moments being zero. He developed similar expressions for the 2nd,
3rd, and 4th central moments of b2, along with E(b2) = 3(n − 1)/(n + 1). To
damp out higher-order terms, Pearson used samples of n ≥ 50 for √b1,
n ≥ 100 for b2, so as to assess the lower and upper 1% and 5% points of the
distributions in normal sampling. Thirty or so years later (Pearson, 1965), he
gave a set of "accepted" percentage points; for a sample of 50 there is no
change in the third d.p. entries for √b1 at the 1% and 5% levels; for b2 and
n = 100 there is no change in the second d.p. entries; in all, quite a remarkable
achievement.
The next step forward came from Fisher (1930), who showed that in
normal sampling the standardized moments m_r/m_2^{r/2}, r ≥ 3, are distributed
independently of the second moment m_2 (Fisher used k-statistic notation).
Thus, for example, E(b1) = E(m_3²)/E(m_2³) follows from the independence of
m_3²/m_2³ and m_2; here E means the mathematical expectation operator. In
this manner, the exact moments of √b1 and b2 can be found. In fact, Fisher
derived the first six cumulants of √b1, Hsu and Lawley (1939) the fifth and
sixth moments of b2. Later, Geary and Worlledge (1947) gave the seventh
noncentral moment of b2. Some of the coefficients are quite large, that of one
power of n being a 13-digit integer multiplied by 25515
(the whole expression has had scant usage to date).
Knowing exact moments, it was a natural development to search for
approximating distributions, reaching out toward percentage points of the
distributions. Four-moment fits were studied by Pearson (1963), and at the
time he had the choice of the Pearson system, the Gram-Charlier series system
based on the normal, and the Johnson Su translation system (Johnson and
Kotz, 1970). For n ≥ 30, the Student-t density gave an acceptable approximation
for √b1, the criterion being the closeness of agreement between the
standardized sixth and eighth moments for the model and the true values.
Johnson's Su, although troublesome to fit, seemed to be equally acceptable
to Pearson (1963, p. 106). Recently, D'Agostino (1970) has shown that
Johnson's Su for n ≥ 8 gives a very acceptable and simple approximation;
in fact

Z = δ ln{Y/α + √(1 + (Y/α)²)}   (7.3)

is approximately a standard normal variate with zero mean and unit standard
deviation, where

δ = 1/√(ln W),   α = √{2/(W² − 1)}
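A sketch of the full transformation in Python. The expressions for Y, W, and the exact null moments of √b1 are the standard ones that accompany D'Agostino's approximation (the μ2 formula appears in this chapter's Appendix; the β2(√b1) formula is the standard exact result and is an assumption here, not quoted from this page):

```python
import math

def dagostino_z(sqrt_b1, n):
    """Approximate standard normal deviate X_s(sqrt(b1)) for sample skewness
    under normality, via Johnson's Su (after D'Agostino, 1970)."""
    # Standardize sqrt(b1) by its exact null standard deviation (see eq. (A2)).
    y = sqrt_b1 * math.sqrt((n + 1) * (n + 3) / (6.0 * (n - 2)))
    # Exact null kurtosis of sqrt(b1) (standard result, assumed here).
    beta2 = (3.0 * (n * n + 27 * n - 70) * (n + 1) * (n + 3)
             / ((n - 2) * (n + 5) * (n + 7) * (n + 9)))
    w2 = -1.0 + math.sqrt(2.0 * (beta2 - 1.0))          # W**2
    delta = 1.0 / math.sqrt(math.log(math.sqrt(w2)))    # delta = 1/sqrt(ln W)
    alpha = math.sqrt(2.0 / (w2 - 1.0))
    return delta * math.log(y / alpha + math.sqrt((y / alpha) ** 2 + 1.0))
```

As a check, for n = 20 the upper 5% point of √b1 quoted later in this section, 0.772, maps to Z ≈ 1.645, as it should.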
FIGURE 7.1 Contours for K²_s test; n = 20, 25, 30(5)65, 85, 100, 120, 150,
200, 250, 300, 500, 1000 (normal sampling).
7.2.1 Omnibus Tests
7 .2 .3 .1 Example
Sample values: X_i = i, i = ±1, ±2, . . . , ±m

(A) m = 10: n = 20, m_1 = 0, m_2 = 38.5, m_3 = 0, m_4 = 2533.3, √b1 = 0, b2 = 1.71
(B) m = 25: n = 50, m_1 = 0, m_2 = 221, m_3 = 0, m_4 = 86145.8, √b1 = 0, b2 = 1.76
Conclusion
7.2.3.2 Example

Sample values are the first 20 of each of the first four data sets in the
appendix.
S = significant.
NS = not significant.
Comment
It is interesting to see how the single tests using √b1 and b2 separately would
perform. Using the D'Agostino approximation for √b1 under normality, we
have for n = 20 that the .90 and .95 levels of √b1 are 0.589 and 0.772,
respectively. Thus EXP is significant, and the other three not significant. As
for b2, using D'Agostino and Pearson (1973, 1974), we find the approximate
levels (n = 20).
We now see that for NOR and UNI the interest is in values of b2 significantly
lower than 3, whereas for EXP and LOG the interest lies in values of b2
larger than 3; thus the tests become directional, a concept introduced by
Pearson. Clearly Pr(b2 < 2.13) is about .20, Pr(b2 < 1.54) < 0.01,
Pr(b2 > 3.40) = 0.15, and Pr(b2 > 4.15) = 0.05 approx. For the four cases,
the single test summaries are:

      NOR   UNI   EXP   LOG
√b1   NS    NS    S     NS
b2    NS    S     S     NS
7.2.3.3 Example

n = 20
m_1 = 209.6
m_2 = 892.24
m_3 = −1201.25
m_4 = 1758724
√b1 = −0.045
b2 = 2.21
Conclusion

Not significant at the 90% level for the three tests. Note that under normal
sampling, √b1 has σ = 0.473, β2 = 3.58, and b2 has μ1′ = 2.714, σ = 0.761,
√β1 = 1.738, and β2 = 8.54. Evidently, there is a very good chance that
−0.05 < √b1 < 0.05. For the kurtosis, using a 4-moment Pearson curve as an
approximant, Pr(2.18 < b2 < 3.05) = 0.5 approx., so the observed value is
quite acceptable.
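The null constants quoted in these conclusions can be reproduced from the exact moments of b2 in normal sampling. A sketch (Python); the variance formula below is the standard exact result, not printed on this page, and is included as an assumption that checks against the σ values quoted above:

```python
def b2_null_mean_sd(n):
    """Exact null mean and standard deviation of b2 in normal samples of size n.
    Mean is the E(b2) = 3(n-1)/(n+1) quoted in the text; the variance is the
    standard exact result (an assumption here)."""
    mean = 3.0 * (n - 1) / (n + 1)
    var = 24.0 * n * (n - 2) * (n - 3) / ((n + 1) ** 2 * (n + 3) * (n + 5))
    return mean, var ** 0.5
```

For n = 20 this gives (2.714, 0.761) and for n = 50 it gives (2.882, 0.598), agreeing with the constants used in Examples 7.2.3.3 and 7.2.3.4.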
7.2.3.4 Example

n = 50
m_1 = −0.024
m_2 = 0.1084
m_3 = −0.00118
m_4 = 0.0401
√b1 = −0.03
b2 = 3.42
Conclusion

Not significant for the single tests nor the omnibus. Note that under normality,
√b1 has σ = 0.326, β2 = 3.45, and b2 has μ1′ = 2.882, σ = 0.598, √β1 =
1.582, and β2 = 8.42. Clearly, the observed √b1 is not significant. A
4-moment Pearson curve for b2 gives Pr(b2 > 3.63) = 0.1 approx., so that
the observed value is quite acceptable.
7.2.3.5 Example

n = 30
m_1 = 3.4343
m_2 = 2.6857
√b1 = −0.05
b2 = 1.71
Conclusion

The omnibus test rejects at 90%. Also Pr(b2 < 1.73) = 0.025, so the kurtosis
test rejects.
7 .2 .4 Further Comments
7.3 N O N N O R M A L S A M P LIN G
The state of the art for this case is far less developed than for normal
sampling. Contours of acceptance under various hypotheses are shown below in
Figures 7.3, 7.4, 7.5, 7.7, and 7.8; for each of these the sample values
√b1, b2 are plotted and judged against the appropriate percent and sample
size. In particular, Figure 7.8 shows 90, 95, and 99 percent contours for
samples of 75 from a skew Type I distribution.
An overview of 90% contours (n = 200) for several Pearson populations
is shown in Figure 7.9. Note the significant changes in area enclosed for a
normal (√β1 = 0, β2 = 3), a uniform (√β1 = 0, β2 = 1.8), and a population close
to the Type III line (circled as 8). It is evident that discrimination between
populations using an omnibus test based on √b1 and b2 raises many unsolved
problems.
For a discussion of various aspects of this situation, the reader should
turn to 7.6. The remainder of this chapter deals with theoretical aspects of
the omnibus contours, including evaluation of the moments of √b1 and b2,
their correlation, and the construction of equivalent normal deviates (based
on Johnson's Su system) for √b1 and b2.
7.4.1 Series Developments

If

Δ_s = E(m − μ_1′)^s,   a_s = E(X − μ_1′)^s   (7.4)

then

Δ_2 = a_2/n,   Δ_3 = a_3/n²,
Δ_4 = 3a_2²/n² + (a_4 − 3a_2²)/n³   (7.5)
(b) Bivariate

Δ^{(k)}_{s,t} = Σ_{λ=0}^{s} Σ_{μ=0}^{t} (s choose λ)(t choose μ) A^{(k+λ+μ−s−t)}_{λ,μ} ν_{s−λ, t−μ}   (7.6a, 7.6b)

where ν_{r,s} = E(X − μ_X)^r (Y − μ_Y)^s.
Similarly, in sampling from Pearson Type I with √β1 = 1.4, β2 = 3.4, the
critical size is n = 100 for μ3(b2). Clearly, in both cases the sample sizes
are only just adequate to damp out the n⁻ˢ terms. Pearson (1930) used rather
similar damping factors; for example, for normal samples he gave similar
values when n = 100. In the case of σ it looks as if a smaller sample size
could have been used.
For our illustrations in 7.4.2 the safe sample sizes are:

(a) 8, 8, 10, 18
(b) 18, 28, 28, 61, 86
(c) 21, 43, 102
(d) 36, 57, 72, 179
of a few terms (8, 15, or perhaps 30), one must not expect to arrive at a
precise answer; rather, one looks for an optimum assessment.
For fuller accounts of the general Padé approach the reader is referred
to Baker (1965, 1970, 1975). An extensive bibliography on Padé approximation
and related matters is given by Brezinski (1977, 1978, 1980, 1981).
General comments and cautionary remarks on summing divergent series are
given by Van Dyke (1974), and problems of error analysis for convergent
and divergent series are discussed by Olver (1974).
An interesting account of the properties of continued fractions (a special
case of the Padé table) is given by Henrici (1976) who, among other things,
pays much attention to the rate of convergence; continued fraction developments
for rapidly divergent series fairly frequently converge, slowly at worst, a
remarkable property. A brief account of Padé methods with special reference
to statistical series is given in Bowman and Shenton (1984).
Discussion on divergent series for moments of statistics is to be found
in Shenton and Bowman (1977a), and Bowman and Shenton (1978, 1983a,
1983b, 1984). A summation algorithm due to Levin (1973) has turned out to
be successful (Bowman et al. 1978b, and Bowman and Shenton 1983c) with
series of alternating sign and moderately divergent (as for example with the
factorial series 1 − x·1! + x²·2! − x³·3! + ···). Cases considered include the
standard deviation from exponential, logistic, rectangular, and half-Gaussian
populations.
Padé algorithms have been used to find low order moments of √b1 and b2
required in the Su approximations.
7.5.1 Asymptotic Correlation

Early work goes back to Pearson (1905), who gave, for general sampling, the
first-order asymptotics for μ2(b1), μ2(b2), and the correlation
. , (Щ - m ß * - ^ ß z ß s + 6/3,/31 + 3 ß iß 2 - 6¾ + I Z ß l + 24ßi)
R(bi ,Ьз) ------------------------------------------ T------------------------
[Mz (bl ) Mz(^z)I
(7.11)
where
Note that, as far as first-order terms go, this is the same as the correlation
between √b1 and b2.
ρ(√b1, b2) = [E(b2 √b1) − E(b2) E(√b1)] / [μ2(√b1) μ2(b2)]^{1/2}   (7.12)

In normal sampling,

cov(b1, b2) = 216n(n − 2)(n − 3) / [(n + 1)²(n + 3)(n + 5)(n + 7)]   (7.13)

ρ(b1, b2) = 54n(n² − 9) / [(n − 2)(n + 5)(n + ···)]   (n > 3)   (7.14)
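A sketch of (7.13) in Python (only the covariance is coded; the correlation follows on dividing by the exact standard deviations of b1 and b2, which are not coded here):

```python
def cov_b1_b2(n):
    """Exact covariance of b1 and b2 in normal samples, eq. (7.13)."""
    return (216.0 * n * (n - 2) * (n - 3)
            / ((n + 1) ** 2 * (n + 3) * (n + 5) * (n + 7)))
```

For example, cov_b1_b2(20) ≈ 0.1931, and for large n the covariance behaves like 216/n², consistent with the O(n⁻²) asymptotics discussed above.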
Returning to Table 7.1, also note that sample sizes beyond 50 change
the correlation only slightly. More extensive tabulations (Bowman and Shenton,
1975a, Table 5) suggest that in sampling from Pearson Type I distributions
with samples n ≥ 50, the correlation between √b1 and b2 is 0.8 or more if
the skewness (√β1) of the sampled population exceeds around 0.7. This property
shows that in sampling from Pearson Type I distributions the dot diagram
of (√b1, b2) will consist of an elongated narrow band for √b1 > 0.7 or
so (doubtless, if β2 is also large and the Pearson curve considered is Type
III or Type IV, then the narrow band may broaden, but we have insufficient
data to be sure of this). The dot diagrams of the couplets (√b1, b2) in Figures
7.3, 7.4, and 7.5, below, support these properties (note that a limited
investigation of a similar grid of values (√β1, β2) for normal mixtures showed
no significant change in the pattern of behavior of the correlation coefficient).
TABLE 7.1 Covariance and Correlation between √b1 and b2
in Pearson Sampling
where

Z = γ + δ ln{y + (1 + y²)^{1/2}}

so that

y = sinh{(Z − γ)/δ}

and, with ω = exp(1/δ²) and Ω = γ/δ,

μ1′(y) = −ω^{1/2} sinh Ω,   μ2(y) = (ω − 1)(ω cosh 2Ω + 1)/2

Solutions are available for the neighboring SB system (Bowman and Shenton,
1980b); otherwise, see Johnson (1965) or Pearson and Hartley (1972) for
facilitating tables. Define t = (X − ξ)/λ, and determine ξ, λ by setting
μ1′(t) = μ1′(y) and μ2(t) = μ2(y). Thus

λ² = μ2(X)/μ2(y),   ξ = μ1′(X) − λ μ1′(y)
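These Su moment expressions are easy to check numerically; a minimal sketch (Python), with ω = exp(1/δ²) and Ω = γ/δ as above:

```python
import math

def su_moments(omega, Omega):
    """Mean and variance of y = sinh((Z - gamma)/delta) for standard normal Z,
    where omega = exp(1/delta**2) and Omega = gamma/delta."""
    mean = -math.sqrt(omega) * math.sinh(Omega)
    var = 0.5 * (omega - 1.0) * (omega * math.cosh(2.0 * Omega) + 1.0)
    return mean, var
```

With the fitted values w = 1.596005601 and Ω = −0.905955881 from the worked illustration in Appendix 2, this returns μ1′(y) ≈ 1.30764 and μ2(y) ≈ 1.79273, matching the feed-back values quoted there.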
A: K²_s = X²_s(√b1) + X²_s(b2)   (normal sampling)
B: K²_e = X²_s(√b1) + X²_e(b2)   (normal sampling)
D: χ²_R   (nonnormal sampling), where R = ρ(X_s(√b1), X_s(b2))
(In the sequel we shall also describe a bivariate model for the joint distribution
of √b1, b2 from which acceptance contours are constructed.)
Case C is the D'Agostino-Pearson test, a fairly obvious concept since
the components are squares of approximate normal variates, so that percentage
values can be used to test significance. They used extensive simulations
to assess the distribution of b2 (sample sizes n = 20(5)50, 100, 200)
from which Xe(b2), the equivalent normal deviate for b2, could be estimated
at various probability levels. However, the statistics Xs(√b1) and Xe(b2)
are not independent (D'Agostino and Pearson, 1974); and whereas this perhaps
has little effect on the test contours for large samples, it does have to
be taken into account for most applications.
The test statistics in A, B, and D are to be regarded as mappings, so
that probability levels are to be discovered by Monte Carlo simulations. A
is a quick and easy statistic to simulate if we want a rough test; all that we
require are the Johnson transforms for √b1 and b2 followed by simulating K²_s
(see Appendix). B is an improvement on A and C, since it uses the empirical
equivalent normal deviate Xe(b2). D is intended as an approximate testing
statistic but is still fairly complicated to construct. First of all, for a given
sample size, we must carry out simulations to assess the correlation between
Xs(√b1) and Xs(b2), since this correlation is intractable mathematically.
Next, we simulate K²_R to determine a set of percentage points, and finally,
we map the regions in the (√b1, b2) plane. For present purposes, the χ²_R
version is adequate.
In the next paragraph we set out some supporting material for the statistics
involved in A, B, and C. Readers who prefer to follow the main development
may move to 7.6.5.
Moments of √b1, b2, etc. under normality are given in Tables 7.2 and 7.3
(Bowman and Shenton, 1975b). Xs(√b1) has moments very close to the normal
even for n = 20. Xs(b2) is more discrepant but is satisfactory for n ≥ 50; the
improvement using Xe(b2) is evident, especially for the smaller samples.
The theoretical moments of √b1 and b2 are derived from the Taylor series
developments and show gratifying closeness to the simulation results. Those
for the χ²-type statistic (A and B) on the assumption of independence should
be μ = 2, σ = 2, √β1 = 2, β2 = 9. The discrepancies (Table 7.2) are marked
for n small and persist even at n = 100.
The upper percentage points in Table 7.3 of Xs(√b1), Xs(b2), and Xe(b2)
are close to the normal values. However, for the lower percentage points,
whereas the agreement for Xs(√b1) is satisfactory, that for Xs(b2) is quite
discrepant, especially for small samples; by contrast the percentage points
for Xe(b2) are in good agreement for all sample sizes studied. Thus K²_s will
give too much weight to large discrepancies especially for small samples.
TABLE 7.2 Moments of √b1, b2, and Related Variates in Normal Sampling Based on 50,000 Simulations*

100   √b1       0.000 (0.000)   0.238 (0.238)   −0.016 (0.000)   3.30 (3.28)
      Xs(√b1)   0.002           1.001           −0.015           3.01

*Parenthetic entries refer to theoretical values of the moment parameters. Most of the simulation for (√b1, b2)
and the statistics was carried out on an IBM System 360 Model 91 with occasional checks on a CDC 6400 system.
The basic uniform variates were generated by a multiplicative congruential method; recommended starting
values and multipliers are given in a Computer Technology Center Report, Union Carbide, Oak Ridge, Tennessee,
by J. G. Sullivan and B. Coveyou. Pseudo-random normal deviates were derived from the uniform variates, using
a rejection method attributed to von Neumann (Kahn, 1956, p. 39). This method sets up x_i = log(1/u_i) (i = 1, 2)
and accepts x_1 if (x_1 − 1)² < 2x_2, where u_1 and u_2 are identically and independently distributed uniform variates
on (0,1); a normal variate follows by giving x_1 an equal chance of being positive or negative.
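The footnote's rejection method is easy to reproduce; a sketch (Python, with a modern seeded generator standing in for the multiplicative congruential one used in the original simulations):

```python
import math
import random

def von_neumann_normal(rng):
    """One pseudo-random normal deviate by the rejection method of the
    footnote: x_i = log(1/u_i), accept x_1 if (x_1 - 1)**2 < 2*x_2,
    then attach a random sign."""
    while True:
        x1 = math.log(1.0 / rng.random())
        x2 = math.log(1.0 / rng.random())
        if (x1 - 1.0) ** 2 < 2.0 * x2:
            return x1 if rng.random() < 0.5 else -x1
```

With many draws the sample mean and variance settle near 0 and 1, as they should for a standard normal variate.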
TABLE 7.3 (continued)
Standard normal   −2.326   −1.645   −1.282   1.282   1.645   2.326

Sample size n   90%   95%   99%
The form of the joint distribution f(√b1, b2) and the test contours may be
gained from a dot diagram of 5,000 runs of samples of 20 (Figure 7.2).
Superimposed on the diagram are the 90, 95, and 99% contours based on K²_s,
K²_e, and χ² (ν = 2). The parabolic shape adumbrated in the dot diagram is
striking and (from unpublished graphs) becomes less sharp for larger
samples (this feature will be mentioned in the sequel). It is also to be noted
that there is evidence that the conditional distribution of √b1 for b2 small
is unimodal, whereas as b2 increases it becomes markedly bimodal. Contours
at 90% and 95% levels of acceptance for numerous sample sizes are
given in Figure 7.1. To use them, merely compute √b1, b2 and enter and
interpret with respect to the appropriate sample size contour. Slightly improved
contours are given below in Figures 7.6b and 7.6c, constructed from
an entirely different model.
FIGURE 7.2 Dot diagram and 90, 95, and 99 percent contours in normal
sampling for n = 20: K²_s (solid), K²_e (dashed), χ²₂ (dash-dot).

Omnibus contours D (see 7.6.3) and dot diagrams of 1,000 points as illustrations
have been constructed in the following cases (the assumption that χ²_R
is approximately a χ²-variate (ν = 2) was made for simplicity):
FIGURE 7.4 90, 95, and 99 percent contours (χ²_R) for the normal mixture
√β1 = 1.0, β2 = 2.6; n = 50.
FIGURE 7.5 90, 95, and 99 percent contours (χ²_R) for the normal mixture
f(·) = 0.9N(0,1) + 0.1N(3,1); n = 100.

The odd parameter points in the Type I case appear because they correspond
to a density f(x) = kx(1 − x)^m, chosen for ease of simulation.
Comments

7.7 A BIVARIATE MODEL

7.7.1 Genesis
where

(1) Z = γ + δ sinh⁻¹{(√b1 − ξ)/λ},

(3) ξ, λ, γ, and δ are the parameters associated with Johnson's Su for the
distribution of √b1,

so that the unknowns in the model (bearing in mind (3)) are now a, b, c,
and k. The simplest method to determine these (there are several) is to use
E(b2), var(b2), E(b2 √b1), and E(b2 b1); examples of the latter are given in
7.4. Define

ν_{r,s} = E{(√b1)^r b2^s}
+ (b/k)^Mjo + (a + bi'io + c v ^ ) / ] x }
P = >'oi - q^io -
к = (p - I + qt'io + (r - l ) v ^)/(Aioj - - rq aj -
where
D = P« - m|) + Zi'ioMso
These in the general case (nonnormal sampling) are found by the series
approach described in 7.3.
Note that since g(b2 | √b1) is the conditional distribution of b2, it follows,
since c > 0 (see 7.7.2 (2)) and r = 1 + c/k, that the conditional mean and
variance ultimately increase with |√b1|.
Again, since the moments entering the correlation between b1 and b2
have been used to determine the model parameters, it follows that the model
exactly reproduces ρ(√b1, b2) and ρ(b1, b2); it does not, however, respond
directly to the skewness and kurtosis of b2.
b = 0

Since c > 0 for the existence of the density, we must have n > 7, in which
case there is always a unique solution. It is also evident that as the sample
size increases a and c → n/6, whereas k → n/12, so that p and r approach 3.
Parameters of the model are shown in Table 7.6 and comparisons of the
conditional means and variances (theory versus simulation) in Table 7.7.
Comparison of the omnibus test contours K²_e (equivalent to K²_s for n ≥ 25)
and those for the bivariate model for n = 100 are given in Figure 7.6a. The
new contours are more responsive to the bimodality property noticed in the
arrays for b2 > 3. Theoretically, the √b1 arrays for given b2 from the
model have a density of the form

φ(x) = c e^(···) (d² − x²)^(···) / Γ(a + bx²)   (x² < d², a, b > 0; x = √b1)
n   a   c   k   p   r
TABLE 7.7 Conditional Means and Variances for the Model in Normal Sampling*

−0.02 < √b1 < 0.02   2.07 (2.11)   0.21 (0.21)   2695   2.67 (2.68)   0.23 (0.21)   5108
0.18 < √b1 < 0.22    2.11 (2.14)   0.21 (0.19)   2581   2.75 (2.78)   0.23 (0.22)   3747
0.38 < √b1 < 0.42    2.25 (2.30)   0.22 (0.21)   1999   2.99 (3.04)   0.25 (0.26)   1885
0.58 < √b1 < 0.62    2.48 (2.58)   0.22 (0.24)   1411   3.40 (3.48)   0.28 (0.29)   645
0.78 < √b1 < 0.82    2.81 (2.90)   0.23 (0.23)   822    3.96 (4.04)   0.32 (0.33)   188
0.98 < √b1 < 1.02    3.22 (3.41)   0.24 (0.24)   481    4.69 (4.46)   0.37 (0.16)   32

*Parenthetic entries refer to Monte Carlo simulations, the number of samples drawn in each range being c.
FIGURE 7.6a Normal sampling. Contours of 90, 95, and 99 percent content,
n = 100. (Remarks on the test samples are given in Sec. 7.8.)
Moments and Coefficients (normal mixture case):

μ1′ = 0.2037,  μ2 = 0.0503,  √β1 = 0.0806,  β2 = 3.1431
μ1 = 1.2517,  μ2 = 0.0178,  μ11 = 0.0232,  μ21 = 0.0063
δ = 5.590,  γ = −0.819,  λ = 1.220,  ξ = 0.021
a = 32.202,  b = 0.010,  c = 17.432,  k = 211.431
E(b2 | √b1) = 1.15 − 1.08 √b1 + ···
Illustrations of the parameters in a few cases for the bivariate model are
given in Table 7.8.
In Figures 7.4 and 7.7 there is a comparison of the χ²_R contours and
bivariate contours for a normal mixture population. There is little to choose
between the two. Again an example of the bivariate contours for another
normal mixture is given in Figure 7.8. In each figure there are 1000 simulated
couplets (√b1, b2), and there is good evidence of the agreement between
the contours and the trend and density of the dots.
7.8 EXPERIMENTAL SAMPLES

The editors have provided 17 random samples of 100 from specified populations
for discussion. The corresponding values of √b1, b2 are plotted in
Figure 7.6a. If we work at the 99% level, we should reject populations 2, 3,
5, 11, 12, 14, 15, and 17 immediately without further complicated calculations.
This brings out the striking simplicity of the omnibus test approach.
Moreover, if we had some prior knowledge of the population sampled,
we could (if it is specified by (√β1, β2)) construct omnibus test regions from
χ²_R or the bivariate model to assess significance. For example, sample 12
is drawn from a normal mixture with √β1 = 0.80, β2 = 4.02; the χ²_R contours
are shown in Figure 7.5. The sample values √b1 = 0.73, b2 = 3.49
are well inside the 90% contour.
REFERENCES

Baker, G. A., Jr. and Gammel, J. L. (1970). (Editors) The Padé Approximant
in Theoretical Physics. Academic Press, New York.
Fisher, R. A. (1930). The moments of the distribution for normal samples
of measures of departures from normality. Proc. Roy. Soc., Series A
130, 17-28.
Hsu, C. T. and Lawley, D. N. (1939). The derivation of the fifth and sixth
moments of b2 in samples from a normal population. Biometrika 31,
238-248.
APPENDIX 1

The main Pearson distribution types (Elderton and Johnson, 1969) are:

I    f(x) = k(1 + x/a1)^{m1} (1 − x/a2)^{m2},   −a1 < x < a2

IV   f(x) = k(1 + x²/a²)^{−m} exp{−b arctan(x/a)},   −∞ < x < ∞

Restrictions on the parameters are omitted, but in all cases the probability
integrals must exist. As for the structures, Type I can be bell-shaped, J-shaped,
or U-shaped, the range being finite. The other types have unlimited ranges
and generally, at most, one mode.
APPENDIX 2

Let Xs(√b1) and Xs(b2) be the normal deviates corresponding to the skewness
(√b1) and kurtosis (b2), respectively. Then K²_s is

K²_s = X²_s(√b1) + X²_s(b2)   (A1)

We shall need the moments of √b1 and b2.
The first four moments of √b1 are

μ1(√b1) = 0
μ2(√b1) = 6(n − 2)/{(n + 1)(n + 3)}
μ3(√b1) = 0
μ4(√b1) = 3μ2²(√b1) (n² + 27n − 70)(n + 1)(n + 3)/{(n − 2)(n + 5)(n + 7)(n + 9)}   (A2)

with, in general, √β1(T) = μ3(T)/(μ2(T))^{3/2}, and

w* = √[√(2β2(√b1) − 2) − 1]   (A5)
TABLE A1 Coefficients

∞   1.333848465690817   1.0
Sinh n*
=Л D (w *) - I - W * )
2w* /
(A6)
(1) Determine w.
(2) Determine Ω.
(3) Go to A2.2.6.

Compute

w1 = √[√(2β2 − 2) − 1]
u1 = 1 + {P + √Q}^{1/3} − {−P + √Q}^{1/3}
w2 = [−A + √(A² − 4B)]/2
w0 = (w1 + w2)/2
is close to the lognormal line (A5), the iterative process may abort (this
line is defined by w = 1 + β1/(w + 2)² and β2 = w⁴ + 2w³ + 3w² − 3). At this
stage check if there is, in any event, a Johnson Su curve to match (β1, β2).
There is no solution if

β1 ≥ β1*

where

w = √{√(2β2 − 2) − 1},   i.e.,   β2 = ½(w⁴ + 2w² + 3)

and β1* = (w − 1)(w + 2)² is the corresponding point on the lognormal line.
Thus, return this new value to (1) and continue the cycle to meet whatever
tolerance is deemed reasonable; for example, at the sth cycle, demand that
|w_s − w_{s−1}| be sufficiently small. One could also require a tolerance based
on the larger of |w_s − w_{s−1}| and |Ω_s − Ω_{s−1}|.
A2.2.4 Examples

Check a few cases using the final computed values of w and Ω in (7.16) to
determine √β1 and β2.
A2.2.5 Illustrations

For the statistic ln P, where P = (Arithmetic Mean)/(Geometric Mean), take

μ1′ = 1.2044
μ2 = 0.2299
√β1 = 2.1680
β2 = 14.4765
Then the iterates are

w0 = 1.469166402
w1 = 1.579959821
w2 = 1.597436626
w3 = 1.595876511
w4 = 1.596017234
w5 = 1.596004551
w6 = 1.596005695
w7 = 1.596005591
w8 = 1.596005602
w9 = 1.596005601
w10 = 1.596005601
w11 = 1.596005601

with Ω = −0.905955881; then, from μ1′ = 1.204400000 and μ2 = 0.229900000,

λ = 0.358105885
ξ = 0.736127663

Feed-back:

w = 1.596005601
Ω = −0.905955881
μ1′(y) = 1.307636524
μ2(y) = 1.792734830
√β1 = 2.167999999
β2 = 14.476500001
The density of a random variable X is p(x) = ½ exp(−|x|), −∞ < x < ∞.
Then Su is given by

β2 = 6.000000000
w0 = 1.315956418
w = 1.470468517
δ = 1.610431098
μ1′ = 0.000000000
μ2 = 2.000000000
λ = 1.855132999
Ω = 0.000000000
ξ = 0.000000000

Feed-back:

w = 1.470468517
Ω = 0.000000000
μ1′(y) = 0.000000000
μ2(y) = 0.581138830
√β1 = 0.000000000
β2 = 5.999999997
λ² = 2μ2(T) / {(w − 1)(w cosh 2Ω + 1)}

and

Z = γ + δ sinh⁻¹{(T − ξ)/λ}
(3) Construct K²_s = X²_s(√b1) + X²_s(b2) and compare with the levels F1, F2,
F3 for the given n.

W = √{√(2β2 − 2) − 1}
δ = 1/√(ln W)
α = √{2/(W² − 1)}
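For the symmetric case these closed forms are immediate to code; a minimal sketch (Python):

```python
import math

def su_symmetric(beta2):
    """Johnson Su constants for a symmetric population (beta1 = 0):
    W = sqrt(sqrt(2*beta2 - 2) - 1), delta = 1/sqrt(ln W),
    alpha = sqrt(2/(W**2 - 1))."""
    w = math.sqrt(math.sqrt(2.0 * beta2 - 2.0) - 1.0)
    delta = 1.0 / math.sqrt(math.log(w))
    alpha = math.sqrt(2.0 / (w * w - 1.0))
    return w, delta, alpha
```

For the double-exponential illustration above (β2 = 6) this reproduces w = 1.470468517 and δ = 1.610431098.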
Then
Tests for a uniform distribution with unknown limits are given in Section 8.16,
and tests for censored uniforms are summarized in Section 8.17. An enormous
literature exists on the properties of a uniform sample, its order statistics,
and its spacings, and no attempt will be made to reference this
entire literature. Two articles with extensive references are those by Barton
and Mallows (1965) and Pyke (1965). References given in subsequent sections
will mostly be concerned with a particular test or group of tests, and the
provision of tables.
332 STEPHENS
8.2 NOTATION

F(u) = u,   0 ≤ u ≤ 1   (8.2)

D_i = U_(i) − U_(i−1),   i = 1, . . . , n + 1   (8.3)

(with the conventions U_(0) = 0 and U_(n+1) = 1)
a) The Probability Integral Transformation, and the related half-sample
method, considered in Chapter 4;
b) The Conditional Probability Integral Transformation discussed in
Chapter 6;
c) The J and K transformations, discussed in Sections 10.5.4 and 10.5.5,
which take exponential random variables to uniform random variables.
TESTS FOR THE UNIFORM DISTRIBUTION 333
D*_r = (n + 2 − r)(D_(r) − D_(r−1)),   r = 1, . . . , n + 1   (8.4)

U′_(j) = Σ_{i=1}^{j} D*_i,   j = 1, . . . , n
When a test is made for uniformity, the alternative is often that the sample
comes from a distribution which gives spacings more irregular than those
from a uniform sample. The implication for testing is that for most test
statistics a one-tail test is used. There are, however, some occasions when
a sample should be tested as too regular to be uniform; such a sample will
be called superuniform. Stuart (1954) entertainingly refers to superuniform
observations as "too good to be true." One situation in which one wishes to
detect superuniformity arises when there is a suspicion that values unhelpful
to a certain hypothesis have been deliberately removed from the sample;
hopefully this situation will be rare. However, there are other cases where
superuniform observations can arise quite naturally, with important implications
in test situations. They can occur, for example, when transformations
are made (for example, the J transformation of Chapter 10; see
Section 10.6).
E 8.6 Example

The fact that W², U², and A² all have much the same significance levels is
unusual; it reflects the fact that the sample is highly non-uniform in several
features. The fact that D⁻ is much more significant than D⁺ shows that the
drift in the values is toward 1 rather than 0. This is also shown by the mean
Ū = .686.
Another example of a test for U(0,1) is given in Section 4.4. Here the
test is a Case 0 test for normality; because the distribution tested is completely
specified, the test reduces to a test that the values of Z_i given in
Table 4.1 are uniform. Also, in Chapter 10, tests for exponentiality are
Values U: .004, .304, .612, .748, .771, .806, .850, .885, .906, .977
Mean: Ū = .686
Values v_i: −.087, .122, .339, .384, .316, .261, .214, .158, .088, .068   (Section 8.8.1)
Statistics: C⁺ = .384; C⁻ = .087; C = .384; K = .471   (Section 8.8.1)
Spacings D_i: .004, .300, .308, .136, .023, .035, .044, .035, .021, .071, .023
Statistics: G(10) = .214; Q = 0.361
A measure of the displacement of each variable U_(i) from its expected value
m_i = i/(n + 1) is

v_i = U_(i) − m_i   (8.5)

and statistics for testing H0 can be based on the v_i. Some statistics which
have been suggested are

C⁺ = max_i v_i;   C⁻ = max_i (−v_i);
C = max(C⁺, C⁻);   K = C⁺ + C⁻;   (8.6)
T1 = Σ_i v_i² / n;   T2 = Σ_i |v_i| / n
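The statistics of (8.6) can be checked against the Table 8.1 data with a few lines; a minimal sketch (Python):

```python
def c_statistics(u):
    """C+, C-, C, K of eq. (8.6) from an ordered uniform sample u."""
    n = len(u)
    v = [u[i] - (i + 1) / (n + 1) for i in range(n)]  # v_i = U_(i) - i/(n+1)
    c_plus = max(v)
    c_minus = max(-x for x in v)
    return c_plus, c_minus, max(c_plus, c_minus), c_plus + c_minus
```

For the ordered sample of Table 8.1 this gives C⁺ = .384, C⁻ = .087, C = .384, K = .471, agreeing with the values quoted in the table.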
Stephens (1970) gave modified forms, similar to those for the EDF statistics
in Section 8.6, so that the C and K statistics may be modified and used with
only the asymptotic points; these points are the same as for the corresponding
EDF statistics. The modified forms and the asymptotic points are given in
Table 8.2. Johannes and Rasche (1980) have recently given more detailed
modifications for C⁺, C⁻, and C.
The C-statistics arise in a natural way when testing the periodogram in
time-series analysis; see, for example, Durbin (1969a, 1969b). Statistic T1
arises if U_(i) are plotted against m_i and a test is based on the sum of squares
of residuals (see Section 5.6).
Hegazy and Green (1975) have given moments and percentage points for
statistics T1 and T2 above, and also for the statistics T1′ and T2′, calculated
using the same formulas as for T1 and T2, but with v_i = U_(i) − (i − 1)/(n − 1);
they also gave some power studies for these statistics.
The statistics discussed in this section have much in common with EDF
statistics, and overall they have much the same power properties.
E 8.8.1 Example

The values v_i for the data set in Table 8.1 are included in the table together
with the statistics C⁺, C⁻, C, and K. When these are modified as in
Table 8.2, the significance levels of the test statistics are: C⁺: .01;
C⁻: ≈ .25; C: .02; K: ≈ .20.
Statistic T   Modified form T*                                    Significance level α
                                                                  0.15    0.10    0.05    0.025   0.01
Upper tail
C⁺            (C⁺ + 0.4/n)(√n + 0.2 + 0.68/√n)                    0.973   1.073   1.224   1.358   1.518
K             {K − 1/(n + 1)}{√(n + 1) + 0.155 + 0.24/√(n + 1)}   1.537   1.620   1.747   1.862   2.001
Lower tail
K             {K − 1/(n + 1)}{√(n + 1) + 0.41 − 0.26/√(n + 1)}    0.976   0.928   0.861   0.810   0.755

Adapted from Stephens (1970), with permission of the Royal Statistical
Society.
338 STEPHENS
On Ho, the order statistic U(r) has a beta distribution Beta(x; p, q) with
density

    f(x) = {Γ(p + q)/(Γ(p)Γ(q))} x^(p-1) (1 - x)^(q-1),   0 < x < 1

where Γ(p) is the gamma function; for statistic U(r) the parameters are
p = r and q = n - r + 1. Thus in a sample of size n = 2k + 1, the median
U(k+1) has density

    {n!/(k! k!)} x^k (1 - x)^k,   0 < x < 1
therefore be shared with Chapter 10. In this section we discuss only the
Greenwood statistic and some modifications. Greenwood's statistic is

    G(n) = Σ_{i=1}^{n+1} D_i²

Related statistics are

    G*(n) = Σ_i {D_i - 1/(n + 1)}²

and H_k(n) = Σ_i D_i^k; Kimball (1947, 1949) discussed distribution theory.
For a sample censored on the right at U(r), the corresponding statistic is

    G_r(n) = Σ_{i=1}^{r} D_i² + {1 - U(r)}²                              (8.7)
TABLE 8.3 Upper and Lower Percentage Points for nG(n) (Section 8.9.1)

Adapted from Burrows (1979) and from Stephens (1981), with permission of
the first author and of the Royal Statistical Society.
This reduces to G(n) above when r = n. Lurie, Hartley, and Stroud (1974)
investigated statistic S_r = (n + 2){(n + 1)G_r(n) - 1} and gave moments and
some null percentage points obtained by curve-fitting. For complete samples
S_n had previously been discussed by Hartley and Pfaffenberger (1972). The
statistic is clearly equivalent as a test statistic to G_r(n), and the moments
of S_r can be used to give moments of G_r(n). The moments of G(n) and of
G_r(n) have been used by the author to fit Pearson curves to the null distribution
to give percentage points for G_r(n). These percentage points are given
in Stephens (1986).
TESTS FOR THE UNIFORM DISTRIBUTION 341
TABLE 8.4 Upper and Lower Percentage Points for Q (entries by significance
level α not reproduced)
A related statistic, studied by Quesenberry and Miller (1977), is

    Q = Σ_{i=1}^{n+1} D_i² + Σ_{i=1}^{n} D_i D_{i+1}
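Both spacings statistics are direct to compute; the sketch below builds the n + 1 spacings D_i from an unordered uniform sample and evaluates G(n) and Q (function names are illustrative, not from the text):

```python
# Sketch: Greenwood's statistic G(n) and the statistic Q, computed from
# the spacings D_i of a sample u from (0, 1); u need not be sorted.

def spacings(u):
    pts = [0.0] + sorted(u) + [1.0]
    return [pts[i + 1] - pts[i] for i in range(len(pts) - 1)]   # n + 1 spacings

def greenwood(u):
    d = spacings(u)
    return sum(x * x for x in d)                                # G(n)

def q_statistic(u):
    d = spacings(u)
    return (sum(x * x for x in d)
            + sum(d[i] * d[i + 1] for i in range(len(d) - 1)))  # Q
```

For an evenly spaced sample each D_i = 1/(n + 1), so G(n) attains its minimum value 1/(n + 1); clustering inflates both statistics.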
E 8.9.2 Example
The values o f the spacings , fo r the data set U in T able 8 . 1 a re also given
in the table. From these are calculated Greenwood’s statistic G(IO) = 0.214,
and Q = 0.361. Reference to Table 8.3 shows G(IO) to be significant at about
the upper 10% level, and reference to Table 8.4 shows Q to be significant at
about the upper 6% level.
8.9.3 k-Spacings
There has recently been interest in k-spacings, defined by D_{ki} = U(ki) - U(ki-k);
these are the spacings between the observations, taken k at a time. This use
of spacings suppresses some of the information in the sample, but Hartley
and Pfaffenberger (1972) suggested that k-spacings might be useful in tests
for large samples. Del Pino (1979) discussed statistics of the form
W = Σ_i h(nD_{ki}) where h(·) is an appropriate function, summed over the range
of i for fixed k (for simplicity, suppose k divides n + 1 and let
f = (n + 1)/k; then the range of i is 1 ≤ i ≤ f). Del Pino showed that, by the
criteria of asymptotic relative efficiency, h(x) = x² gives an optimum statistic
W; let this be W_1. Del Pino also argued for the utility of such statistics
for large samples; see also Darling (1963) and Weiss (1957a, 1957b) for more
general considerations involving spacings.
Cressie (1976, 1977a, 1978, 1979) and Deken (1980) have considered
test statistics which are functions of m-th order gaps G_i^(m) = U(i+m) - U(i),
for m a fixed integer, and 0 ≤ i ≤ n + 1 - m; as before, U(0) = 0 and
U(n+1) = 1. For m = 1, G_i^(1) = D_{i+1}; for higher m, the G_i^(m) contain
overlapping sets of D_i, in contrast with k-spacings above. Deken defined G_i^(p+1)
as a p-stretch, and gave distribution theory for the maximum p-stretch.
Solomon and Stephens (1981) gave percentage points for n = 5 and 10 and
made a comparison with an approximation given by Deken. Cressie (1977a,
1978) has also studied the minimum p-stretch and the minimum gap; one
might suppose the minimum p-stretch to be useful in detecting a "bump"
in an otherwise uniform density, and this would be valuable in studying the
times of a series of events (see Chapter 10); however, Cressie (1978) found
the minimum gap of either type to be less powerful against a specific bump
alternative; this work also shows asymptotic normality and gives some Monte
Carlo power results.
A related statistic is

    H(m, n) = (1/n) Σ_{i=1}^{n} log{(n/(2m))(U(i+m) - U(i-m))}

where now U(r) = U(1) if r < 1 and U(r) = U(n) if r > n. There are clearly
close connections between the G_i^(m) and H(m, n). Dudewicz and van der Meulen
(1981) have proposed H(m, n) as a test statistic for uniformity, and have
given tables of percentage points, derived from Monte Carlo methods, for
n = 10, 20, 30, 40, 50, 100 and for various values of m from m = 1 to
m = M, with M becoming larger with n. They also show asymptotic normality
and S_n, and Holst (1979) gave the mean and variance of H, but tables for
finite n are not yet available for these statistics. Greenwood's statistic
itself converges only slowly to its asymptotic distribution, and this may be
the case also for these related statistics. McLaren and Stephens (1985) have
studied this convergence further.
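The statistic H(m, n) above can be sketched directly; the index clamping follows the convention U(r) = U(1) for r < 1 and U(r) = U(n) for r > n (the function name is illustrative, and distinct observations are assumed so that no gap is zero):

```python
import math

# Sketch: the statistic H(m, n) for a sample u from (0, 1) and window m.
# Assumes distinct observations (a zero gap would make the log undefined).

def entropy_statistic(u, m):
    us = sorted(u)
    n = len(us)

    def at(r):                      # clamp: U(r) = U(1) if r < 1, U(n) if r > n
        return us[min(max(r, 1), n) - 1]

    return sum(math.log((n / (2.0 * m)) * (at(i + m) - at(i - m)))
               for i in range(1, n + 1)) / n
```

For an evenly spaced sample the interior terms are exactly zero and only the clamped edge terms contribute (negatively), so H is slightly below zero.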
8.10 STATISTICS FOR SPECIAL ALTERNATIVES

    f(u) = k e^{ku}/(e^k - 1),   0 < u < 1                               (8.8)
TABLE 8.5 Significance Points for Ū (entries by significance level α not
reproduced)
Stephens (1966); if Z_α is the given point for level α, the corresponding upper
tail point, that is, for level 1 - α, is 1 - Z_α. For large n (n > 20) the distribution
of Ū is well-approximated by the normal distribution with mean 0.5
and variance 1/(12n). The distribution (8.8) occurs in connection with
points U obtained from a renewal process with a trend (see Section 10.9.1).

E 8.10.1 Example

The mean of the U-set in Table 8.1 is 0.686, so 1 - Ū = .314. Reference to
Table 8.5 gives a p-level equal to 0.02 (one-tail) or 0.04 (two-tail).
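The normal approximation for Ū is easily sketched; with the data of Example E 8.10.1 it reproduces the two-tail level of about 0.04, although n = 10 is below the n > 20 guideline, so this is only illustrative:

```python
import math

# Sketch: test based on Ubar, the mean of the U-set, using the normal
# approximation with mean 0.5 and variance 1/(12n).

def ubar_test(u):
    n = len(u)
    ubar = sum(u) / n
    z = (ubar - 0.5) / math.sqrt(1.0 / (12.0 * n))
    p_one = 0.5 * (1.0 - math.erf(abs(z) / math.sqrt(2.0)))  # one-tail level
    return z, p_one, 2.0 * p_one                             # and two-tail level
```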
which reduces to the uniform density when k = 0. This family of densities is
sometimes referred to as the Lehmann family. The likelihood ratio test
statistic for a test of k = 0 against k ≠ 0 is P/2 where

    P = -2 Σ_i log U_i
This density is symmetric, with a mode along the line OA, and is increasingly
clustered around OA as k becomes larger; when k = 0 the distribution is
uniform around the circle. I_0(k) refers to the Bessel function with imaginary
argument, of order zero. For a von Mises alternative, the null hypothesis
H_0 is equivalent to k = 0.

When the modal vector OA is not known, the likelihood ratio procedure gives
a test statistic which is the length R of the resultant, or vector sum, R, of
the vectors OP_i, i = 1, ..., n. In the more unlikely event that, on the
alternative, the modal vector OA is known, the component of R along OA, called X,
is the test statistic.
The distributions of R and X are very complicated for points on a circle;
they have been studied by Greenwood and Durand (1955) and Durand and
Greenwood (1957), who have given some percentage points. Stephens (1969a)
has given a table of upper tail percentage points for testing H_0, for both R
and X. For large samples, 2R²/n has the χ² distribution with 2 degrees of
freedom, and X has the normal distribution with mean 0 and variance n/2.
These statistics arise also in a totally different context, when EDF statistics
U² and A² are partitioned into components (Section 8.12).
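A sketch of the resultant-length computation for points on the circle, with the large-sample statistic 2R²/n referred to χ² with 2 degrees of freedom (the function name is illustrative):

```python
import math

# Sketch: length R of the resultant of n unit vectors at angles theta_i
# (radians; a U(0,1) set would be placed at angles 2*pi*u_i), and the
# large-sample statistic 2R^2/n.

def resultant_test(theta):
    n = len(theta)
    x = sum(math.cos(t) for t in theta)
    y = sum(math.sin(t) for t in theta)
    r = math.hypot(x, y)            # length R of the resultant
    return r, 2.0 * r * r / n       # 2R^2/n, approximately chi-squared (2 df)
```

Evenly spread directions give R near 0; tightly clustered directions give R near n, and a large value of 2R²/n then signals departure from uniformity.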
Further applications of the uniform distribution arise in studying randomness
of directions on a sphere, against various alternatives. Suppose the
sphere has center O, and radius 1, and let a typical point P on the sphere be
located by spherical polar coordinates (θ, φ). If P is uniformly distributed
on the surface of the sphere, cos θ is uniformly distributed between -1 and 1.
Again, the von Mises distribution, with density per unit area proportional to
exp(k cos ψ), is the most important for unimodal data; here ψ is the angle
between OP and the modal vector OA. Likelihood ratio tests for uniformity
(k = 0) against a von Mises distribution (k > 0) (also called the Fisher
distribution on the sphere) lead again to the length R of the resultant R of a sample
of n vectors, as test statistic; when the modal vector of the alternative is
known, the test statistic is the component X of R on this vector, as for the
circle. Stephens (1964) has given tables of percentage points for R and for X.
For large n, 3R²/n is approximately χ² distributed with 3 degrees of freedom,
and X is normal, with mean 0 and variance n/3.
Other alternatives to randomness have been proposed to describe natural
data, among them, densities for which the probability per unit area is
proportional to

    f_1(ψ) = e^{-k |cos ψ|}   or   e^{k sin ψ},   0 < ψ < π

or

    f_2(ψ) = e^{k cos² ψ}
    L_1 = Σ_i |V_i|/n ;   L_2 = Σ_i (1 - V_i)²/n ;   L_3 = Σ_i V_i/n

    Q = Σ_i (1 - U_i)²/n

and

    T = Σ_i U_i²/n

    S² = Σ_i (U_i - 0.5)²/n
which is a measure of the dispersion of the U_i, has the same distribution as
T/4. Significance points for Ū are in Table 8.5; points for Q and T have been
given by Stephens (1966), and the applications to tests for directions are
discussed further in that reference. When OA is not known for the distributions
above, the tests for uniformity become more complicated. The statistics Ū,
S², and T will appear again in the next section in connection with Neyman-Barton
tests, and with partitioning the Anderson-Darling statistic into
components.
Another statistic for the circle is

    A = (1/n) ∫_0^1 {N(x) - n/2}² dx

where N(x) is the number of observations falling in the semicircle (x, x + 1/2).
Computing formulas have been given by Watson (1967) and by Stephens (1969c).
Suppose the observations are U(1) ≤ U(2) ≤ ··· ≤ U(n), measured around the
circumference. Then

    A = n/4 - Z/n

where

    Z = Σ_{j=2}^{n} Σ_{i=1}^{j-1} a_{ij}

with

    a_{ij} = 2{U(j) - U(i)}        if U(j) - U(i) ≤ 1/2
    a_{ij} = 2{1 - U(j) + U(i)}    if U(j) - U(i) > 1/2
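The computing formula can be checked against the definition directly; the sketch below evaluates A both from the pairwise formula and by crude grid integration of {N(x) - n/2}² (function names are illustrative, and the integration is only approximate):

```python
# Sketch: the statistic A computed two ways. ajne_formula uses the
# pairwise computing formula A = n/4 - Z/n with a_ij = 2*min(t, 1-t),
# t = U(j) - U(i); ajne_numeric approximates the defining integral
# (1/n) * integral over (0,1) of {N(x) - n/2}^2 dx on a fine grid.

def ajne_formula(u):
    us = sorted(u)
    n = len(us)
    z = 0.0
    for j in range(1, n):
        for i in range(j):
            t = us[j] - us[i]
            z += 2.0 * min(t, 1.0 - t)       # a_ij
    return n / 4.0 - z / n

def ajne_numeric(u, steps=20000):
    n = len(u)
    total = 0.0
    for s in range(steps):
        x = (s + 0.5) / steps
        count = sum(1 for v in u if 0.0 < (v - x) % 1.0 < 0.5)   # N(x)
        total += (count - n / 2.0) ** 2
    return total / steps / n
```

For points spread evenly around the circle N(x) is constant at n/2, and both computations give A = 0.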
Omnibus tests are not designed for specific alternatives, but it is convenient
to mention several of these, especially for the circle, before leaving this
section. EDF statistics U² and V (Chapter 4; here U(i) replaces Z(i) in
Equations (4.2)) were designed for the circle because they do not depend on the
origin of U; Watson (1976) gave another statistic derived from the EDF and
Darling (1982, 1983) has recently given the asymptotic points. Ajne (1968)
also gave another statistic for the circle. A review of tests for uniformity on
the hypersphere is given by Prentice (1978); see also Beran (1968) and Giné
(1975). Such tests are sometimes derived in very general terms, and often
give the test statistics in this section when particular cases are taken.
    f(u) = c(θ) exp{Σ_{j=1}^{k} θ_j π_j(u)},   0 < u < 1,  k = 1, 2, ...   (8.9)

where the π_j(u) are the Legendre polynomials, θ is a vector of parameters with
components θ_1, ..., θ_k, and c(θ) is the normalizing constant. The Legendre
polynomials are orthonormal on the interval (0,1). By varying k, the density
may be made to approximate a given density, and it also varies smoothly
from the uniform distribution as the θ_j take increasingly large values. The
test for uniformity of U then reduces to testing the null hypothesis

    θ_j = 0   for all j
(a) Let

    v_j = Σ_{i=1}^{n} π_j(U_i)/√n                                       (8.10)

and

    N_k = Σ_{j=1}^{k} v_j²                                              (8.11)

The null hypothesis of uniformity will be rejected for large values of N_k; for
large n, on Ho, N_k is asymptotically distributed as χ² with k degrees of
freedom. The tests based on N_k are consistent and asymptotically unbiased.
Neyman showed that, asymptotically, the test has optimal local power against
alternatives of the form (8.9).
E 8.11 Example

For the data in Table 8.1, Neyman's statistic N3 = 6.437 and is significant
at about the 4% level. The individual components v_1² and v_2² (equivalent to Ū
and S²) have significance levels p = 0.04 and p = 0.12, respectively.
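The components are easy to compute: the first two orthonormal Legendre polynomials on (0,1) are π_1(u) = √3(2u - 1) and π_2(u) = √5(6u² - 6u + 1), giving v_j of (8.10) and N_k of (8.11). A sketch (only j = 1, 2 are coded; the function names are illustrative):

```python
import math

# Sketch: Neyman's smooth test statistic N_k from the components v_j of
# (8.10), using the orthonormal Legendre polynomials on (0, 1).

def legendre(j, u):
    if j == 1:
        return math.sqrt(3.0) * (2.0 * u - 1.0)
    if j == 2:
        return math.sqrt(5.0) * (6.0 * u * u - 6.0 * u + 1.0)
    raise ValueError("only j = 1, 2 implemented in this sketch")

def neyman(u, k=2):
    n = len(u)
    v = [sum(legendre(j, x) for x in u) / math.sqrt(n)   # (8.10)
         for j in range(1, k + 1)]
    return sum(vj * vj for vj in v), v                   # (8.11)
```

A sample symmetric about 0.5 makes v_1 vanish, which mirrors the link between v_1 and Ū noted in the example above.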
Percentage points for the Neyman statistics N2, N3, and N4, by significance
level α, are tabulated here (entries not reproduced); adapted from Miller and
Quesenberry (1979) and from Solomon and Stephens (1983), by courtesy of the
authors and of Marcel Dekker, Inc.
    W² = Σ_{j=1}^{∞} λ_j z_j²

where

    z_j = (2/n)^{1/2} Σ_{i=1}^{n} cos(jπU_i)

and where the λ_j are weights (Durbin and Knott, 1972; see also Schoenfeld, 1977).
Suppose V_i = jπU_i, i = 1, 2, ..., n. Starting at the point (1,0) in the usual
rectangular coordinates, V_i can be recorded on the circumference of the
unit circle, centered at the origin O, and with radius 1. Let P_i be the
point on the circle corresponding to V_i, and let R_j be the resultant (vector
sum) of the vectors OP_i, i = 1, ..., n. Component z_j is proportional to X_j,
the length of the projection of R_j on the x-axis. When the U_i are U(0,1), the
V_i will be uniform on (0, jπ) and the distributions of X_j are the same as those
discussed in connection with directions in Section 8.10.3. Stephens (1974a)
has shown that the components of U² are proportional to R_j, the length of R_j,
also discussed in Section 8.10.3. For A², the components are proportional
to the v_j in the Neyman-Barton test statistics; thus the sum of the first k
components of A² is related to Neyman's N_k in that they use the same
components v_j, j = 1, ..., k, defined in equation (8.10), but with different
weights. Against some alternatives one or two components of, say, W² or A²
may be more powerful, as a test statistic for uniformity of U_i, than the
entire statistic W² or A². Durbin and Knott (1972) have demonstrated this
for a test of normality N(0,1), in which the U_i are obtained by the Probability
Integral Transformation, against alternatives involving either a shift in mean
or a shift in variance. The first component alone, for example, is better
than W² in detecting the shift in mean. However, Stephens (1974a) has shown
that the first component is insensitive to an alternative where both mean and
variance have been changed; for such an alternative, at least the first two
components would be needed; this is roughly the same as using the first
component of U². By expanding an alternative density into a series, using
appropriate orthogonal functions, it should be possible to suggest which
departures are detected by which components, and then perhaps decide how
many to use to get best power, but this will be difficult in the usual situation
where the alternative distribution to the null is not clearly known. Similar
remarks apply to the other statistics partitioned into components.

Durbin, Knott, and Taylor (1975), Stephens (1976), and Schoenfeld (1980)
have discussed partitioning of EDF statistics into components for the case
where the tested distribution contains unknown parameters; here the distribution
theory of components is very complicated, and only asymptotic results
are known. Components can be useful in the theoretical examination of test
statistics, especially in calculating asymptotic power properties.
In the next section we discuss the power of the various test statistics in this
chapter. However, before this, some general observations can be made on
the appearance of the U-set and its effect on different test statistics. If the
U-set is truly uniform, it should be scattered more or less evenly along the
interval (0,1). If the alternative to uniformity makes the values tend toward 0,
there will be a high value for D+; if they tend toward 1, there will be a high
value for D-. In either case D will be large and perhaps significant. The
statistics W² and A² will also detect a shift of values toward 0 or 1. If the
U-set has been obtained from the Probability Integral Transformation
(Chapter 4), from a Case 0 test that the X are from a completely specified F(x), a
set of U-values tending toward 0 or toward 1 will suggest that the hypothesized
F(x) has an incorrect mean (it may of course also have other incorrect
parameters or be of incorrect shape). If the U-set tends to cluster at some point
in the interval, or to divide into two groups toward 0 and 1, the statistics V
and U² will be large and will tend to show significance. This indicates that
the variance of the hypothesized F(x) is too large or too small. The statistic
P = -2 Σ_i log U_i, like D+ and D-, also indicates which way the points have
moved; if they have moved closer to 0, P will be large, and if closer to 1,
P will be small. The value of P is very much more dependent on low values
of U_i than on high values, because log u, when u is nearly 1, is nearly 0,
while as u approaches 0, log u becomes very large and negative. We shall
see later that this has some importance in methods of combining tests for
several samples. Among the other statistics, clearly Ū or U(n/2) might have
some power against an error in mean, but not against an error in variance of
the tested distribution. The same applies to the first component v_1 or z_1 in the
decomposition of both the Neyman tests and W² or A², and in turn the statistic
in equation (8.12) will not be sensitive to an error in mean (Durbin and Knott,
1972).
A number of studies have been made on tests for uniformity, including those
by Stephens (1974b), Quesenberry and Miller (1977), Locke and Spurrier
(1978), and Miller and Quesenberry (1979).

In general it can be said that, among the EDF statistics, the quadratic
statistics appear to be more powerful than the supremum class; when the
discrepancy between the EDF F_n(u) and the theoretical distribution F(u) = u
is used all along the interval 0 < u < 1, it appears that better use is made of
the information in the sample than by using only the maximum discrepancy.
When the basic problem is to test an X-set for a distribution F(x), so that
the observations U_i have been obtained by the Probability Integral
Transformation, W² and A² (especially A²) will detect shifts in the mean of the
hypothesized distribution from the true mean, and U² is effective at detecting
shifts in variance. A² is also especially good at detecting irregularity in the
tails of the distribution. Among other tests for uniformity, statistics obtained
by the likelihood-ratio method are most powerful against their respective
families of alternatives, as would be expected. Against a decreasing or
increasing density, Ū alone is very efficient, and against distributional
alternatives in which the mean is near the uniform mean of 0.5, but the variance
is changed either because the distribution is unimodal and symmetric, or
U-shaped and symmetric, the quantity S² above is a powerful statistic. For
unimodal nonsymmetric distributions, these statistics lose some of their
efficiency. The effect of the good performance of these relatively simple
statistics means that the Neyman statistic N2 in Section 8.11, which combines
them both, is effective for a wide range of alternatives to uniformity
(Locke and Spurrier, 1978; Miller and Quesenberry, 1979). Although the two
components in N3 occur again in A², the presence of further components,
and the different weightings, sometimes make A² less effective than N3; in
a similar way, N4 can be less effective than N3, whenever adding further
components "dilutes" the power of the first two (Quesenberry and Miller,
1979). This is similar to the situation for EDF statistics (Section 8.12).
general use of Fisher's method, which is based on the p-levels of the component
tests.

To fix ideas, suppose k tests are to be made of null hypotheses H01,
H02, ..., H0k. Let H0 be the composite hypothesis that all H0i are true; if
any one is not true, H0 should be rejected. Let T_i be the test statistic used
for H0i, and suppose the test is an upper tail test. When the test is made,
let T_i take the value t_i, and suppose p_i is the significance level (often called
the p-level) of this value, that is, when H0i is true, Pr(T_i > t_i) = p_i. When
H0i is true, p_i is U(0,1), and when k independent tests are made of the k
null hypotheses above, we should obtain a random sample of k values of p_i
from U(0,1). Thus all k null hypotheses are tested simultaneously by testing
if the p_i appear to be such a uniform sample. This of course can be done by
any of the methods described in this chapter. Fisher (1967) suggested the
statistic P = -2 Σ_i log p_i, already discussed in Section 8.10.2. Effectively
the same idea had been put forward before by Karl Pearson, who suggested
using the product of the p_i. For a summary of early work on some of the
problems discussed in this section see Pearson (1938). Note that if q_i = 1 - p_i,
q_i could replace p_i in P, since clearly q_i is also U(0,1). Finally, let r_i be
the minimum of p_i and q_i, formally written r_i = min(p_i, q_i); it is easily proved
that, when p_i is U(0,1), r_i has the uniform distribution with limits 0 and 0.5,
so that P could be calculated using 2r_i, or 1 - 2r_i = |q_i - p_i|, instead of p_i.
Thus we have possible statistics

    P1 = -2 Σ_i log p_i ;   P2 = -2 Σ_i log q_i ;
    P3 = -2 Σ_i log 2r_i ;   P4 = -2 Σ_i log (1 - 2r_i)

each of which has the χ² distribution with 2k degrees of freedom when all the
H0i are true.
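These combination statistics are simple to compute, and for even degrees of freedom 2k the chi-squared survival function has a closed form, so no tables are needed. A sketch (function names are illustrative; P4 requires every r_i < 0.5):

```python
import math

# Sketch: the combination statistics built from p-levels p_i, with
# q_i = 1 - p_i and r_i = min(p_i, q_i); each is referred to chi-squared
# with 2k degrees of freedom, whose survival function for even df is
# exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!.

def chi2_sf_even(x, df):
    k = df // 2
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= (x / 2.0) / j
        total += term
    return math.exp(-x / 2.0) * total

def fisher_statistics(p):
    q = [1.0 - pi for pi in p]
    r = [min(pi, qi) for pi, qi in zip(p, q)]    # assumes all r_i < 0.5
    p1 = -2.0 * sum(math.log(v) for v in p)
    p2 = -2.0 * sum(math.log(v) for v in q)
    p3 = -2.0 * sum(math.log(2.0 * v) for v in r)
    p4 = -2.0 * sum(math.log(1.0 - 2.0 * v) for v in r)
    return p1, p2, p3, p4
```

With the p-values of Example E 8.15.1 below, this gives P1 of about 16.0, and chi2_sf_even then returns the upper-tail level of about 0.10.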
E 8.15.1 Example

Suppose five independent tests, for example, that five small samples are
each from the normal distribution with mean zero and variance one, give
p-values .15, .20, .28, .16, .25, so that each test is not significant at the
10% level. Then P1 = -2 Σ_i log p_i = 15.99; this value is exactly significant
at the 10% level for χ²(10). This follows the usual procedure as suggested by
Fisher. However, if we use the q-values .85, .80, .72, .84, .75 and calculate
P2 = -2 Σ_i log q_i we obtain P2 = 2.352. This is significant at the 1%
level in the lower tail of χ²(10); the sample gives greater significance using P2
than using P1.
E 8.15.2 Example

For this example we take Fisher's first illustration of his test (Fisher, 1967).
In this example three tests of fit yielded p-values of .145, .263, .087, and
P1 is 11.42. In the upper tail of χ²(6), this is significant at approximately the
7.5% level. The q-values are .855, .737, .913, and P2 is 1.105. This value
is significant at the 2.5% level in the lower tail of χ²(6). Again the value of P2
is more significant than the value of P1 in its appropriate tail.
    P5 = -2 Σ_i log u_i

with u_i = q_i for one-tail tests, and u_i = 1 - 2r_i for two-tail tests.
Alternatively P6 = -2 Σ_i log u_i could be used with u_i = p_i for one-tail tests and
u_i = 2r_i for two-tail tests. P5 and P6 will again have the χ²(2k) distribution
when all H0i are true.
E 8.15.3 Example

Suppose five independent tests are to be made that five samples from normal
distributions have means 0, against the alternative that the means are not 0.
Thus five t-tests will be used, with significance in either tail for each test.
Suppose the significance levels, measured all from the upper tail (for this we
use the temporary notation p_i*), are p* = .15, .04, .75, .92, .07; thus only
the second sample would be declared significant using a two-tail 10% level
for t_i. The corresponding values of r_i are .15, .04, .25, .08, .07 and these
give the value P3 = 16.44. Using 1 - 2r_i instead of 2r_i we have P4 = 2.92.
P3 is significant at the 10% level of χ²(10) (upper tail), and P4 at the 2% level
(lower tail).
It is possible to misuse Fisher's test, in the situation where some of the
component tests are two-tail tests, by using r_i when 2r_i should be used.
This is especially easily done when the results of two-tail tests are sometimes
reported, using expressions such as "the lower tail p-value equals
0.11," or "the upper tail p-value equals 0.35." The statistician might then
wrongly use log 0.11 and log 0.35 in the calculation of, say, P3, and obtain
false levels of significance for the test of the overall hypothesis H0. As an
example, consider Example E 8.15.3 above in which all tests are two-tail
tests, and the third and fourth tests, for example, could have been reported
as significant at the .25 and .08 levels in the lower tail. Then if the values
of r_i instead of 2r_i are used in calculating P3 (which is the same as P6 since
all tests are two-tail), we have P3 = 23.37. This is spuriously highly
significant; P3 is at the 1% level of χ²(10).
Until now, tests in this chapter have been tests for a U(0,1) distribution;
they can be used easily for a test for U(a,b), where a and b are known, by
making the transformation given in Section 8.2. If the limits of the
distribution are not known, other procedures are available.

Tests for censored data are described in several other chapters and examples
are given in Chapter 11. Tests specifically for censored uniforms are
REFERENCES

Barton, D. E. (1956). Neyman's psi-2 test of goodness-of-fit when the null
hypothesis is composite. Skand. Aktuartidskr., 39, 216-245.
9.1 INTRODUCTION
368 D’AGOSTINO
from which data were generated. These tests complement the informal
graphical techniques already discussed in Chapter 2 (see Sections 2.4 and
2.5).

This chapter will focus on tests applicable to complete samples. Techniques
based on incomplete or censored samples are discussed elsewhere:
in Chapter 11, Analysis of Data from Censored Samples, and also in
Chapters 3, 4, and 5. We start by discussing tests that assume a complete random
sample is available for analysis. These tests occupy the major portion of the
chapter and are the primary interest of the chapter. Tests applicable to
residuals and tests for multivariate normality will also be discussed.
9.2 COMPLETE RANDOM SAMPLES

9.2.1 Null Hypothesis

    f(x) = (1/(σ√(2π))) exp{-(x - μ)²/(2σ²)},
           -∞ < x < ∞,  -∞ < μ < ∞,  σ > 0                               (9.2)
In testing for departures from the normal distribution the null hypothesis of
(9.1), H0, is that the random variable X under consideration is distributed as
a normal variable, or in other words, X has a probability density function
given by (9.2). If, further, specific values of both the mean and standard
deviation, μ and σ, of (9.2) are specified by the null hypothesis (e.g., X is
normally distributed with μ = 500 and σ = 100), then the null hypothesis is a
simple hypothesis. This means the null hypothesis concerns itself with only
one particular distribution. If either μ or σ is not specified completely, then
the null hypothesis under consideration is a composite hypothesis. This
chapter deals mainly with the composite null hypothesis with both μ and σ
unknown. In most applications prior knowledge of μ or σ is not available. If
it is available, it usually is of no help, from a power point of view, in judging
goodness-of-fit (see, for example, Chapter 4 on EDF tests).
TESTS FOR THE NORMAL DISTRIBUTION 369
9.2.2 Alternative Hypothesis

    √β₁ = E(X - μ)³/σ³                                                  (9.3)

and

    β₂ = E(X - μ)⁴/σ⁴                                                   (9.4)
For the purposes of this chapter tests for normality can be grouped into five
categories: chi-square tests, empirical distribution function tests, moment
tests, regression tests, and miscellaneous tests. We now discuss these
groups.
cell. The latter expected values are computed assuming the data did arise
from a normal distribution. Of particular interest to testing for normality
are the articles of Chernoff and Lehmann (1954) and Watson (1957). In the
former article it is shown that the use of the sample mean and standard
deviation based on ungrouped data to obtain the expected values results in the
observed chi-square statistic being asymptotically distributed as

    χ²(k - 3) + α₁χ₁²(1) + α₂χ₂²(1)                                     (9.5)

In (9.5) χ²(ν) represents a chi-square variable with ν degrees of freedom; all
these chi-square variables are independent and 0 < α_j < 1. The often quoted
k - 3 degrees of freedom is incorrect. The Watson (1957) article describes
how switching appropriately to fixed cell probabilities can result in obtaining
explicit formulas for the α₁ and α₂ of (9.5). While the chi-square tests are
of historical interest and are continuously being modified, we agree with
Professor D. S. Moore, the author of Chapter 3, that they should not be
recommended for use in testing for departures from normality when the full
ungrouped sample of data is available. Other procedures to be discussed
below are more powerful. In the cases where the full sample is not available
(i.e., data are censored or truncated) or where the data are grouped into
classes (see Section 3.2.7, Example 2) these procedures are of use. We
refer the reader to Chapter 3 for further details. The remainder of this
chapter will not contain any further discussion of these tests.
Chapter 4 above discussed in detail the concept and applications of the tests
based on the empirical distribution function (EDF). Basically for the normal
distribution these tests involve measuring the discrepancy between the
cumulative distribution function

    Φ(x) = ∫_{-∞}^{x} (1/(σ√(2π))) exp{-½((t - μ)/σ)²} dt               (9.6)

and the empirical distribution function

    F_n(x) = #(X_i ≤ x)/n                                               (9.7)

of the sample. The μ and σ of (9.6) often are not specified and are replaced
by the sample mean X̄ and standard deviation S, where

    X̄ = ΣX/n   and   S = √{Σ(X - X̄)²/(n - 1)}                          (9.8)
Many tests have been developed for this situation. Some prominent ones are
the Kolmogorov (1933)-Smirnov (1939) test, the Kuiper V test (1960), Pyke's
C test (1959), Brunk's B test (1962), Durbin's D test (1961), the Cramér-von
Mises W² test (1928), Durbin's M test (1973), Watson's U² test (1961),
the Anderson-Darling A² test (1954), Fisher's π and π′ tests (1928), and the
Hartley-Pfaffenberger test (1972). See Chapter 4 for details on some of
these.

Some of the above tests have been modified to apply to the composite null
hypothesis of normality with μ and σ unknown (Stephens, 1974, and Green
and Hegazy, 1976). In Chapter 4, Section 4.2, formulas are given for the
Kolmogorov-Smirnov D test, the Kuiper V test, the Cramér-von Mises W²
test, the Watson U² test, and the Anderson-Darling A² test, and in Section
4.8 the application of these to the normal distribution is described in detail.
For the purposes of this chapter we now present in the notation of this
chapter the procedure for performing the Anderson-Darling A² test, which is the
EDF test we recommend for use.
(1) Arrange the sample in ascending order, X(1) ≤ X(2) ≤ ··· ≤ X(n), and
compute the standardized values

    Y(i) = (X(i) - X̄)/S   for i = 1, ..., n                             (9.9)

(2) Compute

    P_i = Φ(Y(i))                                                       (9.10)

Φ(y) in (9.10) represents the cdf of the standard normal distribution and P_i
is the cumulative probability corresponding to the standard score Y(i) of
(9.9). P_i can be found from standard normal tables as given in the Appendix
or by use of the following approximation due to Hastings (1955). For Y(i)
such that 0 ≤ Y(i) < ∞ define y = Y(i) and compute

    P_i = 1 - ½(1 + c₁y + c₂y² + c₃y³ + c₄y⁴)⁻⁴                         (9.11)

where

    c₁ = 0.196854,  c₃ = 0.000344
    c₂ = 0.115194,  c₄ = 0.019527

(3) Compute

    A² = -n - Σ_{i=1}^{n} (2i - 1){log P_i + log(1 - P_{n+1-i})}/n      (9.12)
    A* = A²(1.0 + 0.75/n + 2.25/n²)                                     (9.13)
(The worked table of X(i), Y(i), P_i, and 1 - P_{n+1-i} for the NOR data is
not reproduced here.) The computation gives

    A² = 101.8045/10 - 10 = .18045

    A* = A²(1 + .75/10 + 2.25/100) = .198044   (Accept Normality for NOR data)
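The full procedure (9.9)-(9.13) can be sketched as follows; the erf-based exact Φ is used for step (9.10), with the Hastings approximation (9.11) included for comparison (function names are illustrative):

```python
import math

# Sketch: the Anderson-Darling test of normality, steps (9.9)-(9.13),
# with mu and sigma estimated by the sample mean and S.

C1, C2, C3, C4 = 0.196854, 0.115194, 0.000344, 0.019527

def phi_exact(y):
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))

def phi_hastings(y):                 # (9.11), valid for y >= 0; symmetry otherwise
    if y < 0.0:
        return 1.0 - phi_hastings(-y)
    s = 1.0 + C1 * y + C2 * y ** 2 + C3 * y ** 3 + C4 * y ** 4
    return 1.0 - 0.5 * s ** -4

def anderson_darling(x):
    n = len(x)
    xbar = sum(x) / n
    s = math.sqrt(sum((v - xbar) ** 2 for v in x) / (n - 1))
    y = sorted((v - xbar) / s for v in x)                         # (9.9)
    p = [phi_exact(v) for v in y]                                 # (9.10)
    a2 = -n - sum((2 * i - 1) * (math.log(p[i - 1]) + math.log(1.0 - p[n - i]))
                  for i in range(1, n + 1)) / n                   # (9.12)
    return a2, a2 * (1.0 + 0.75 / n + 2.25 / n ** 2)              # (9.13)
```

The Hastings polynomial agrees with the exact cdf to within a few units in the fourth decimal place, which is ample for the tail comparisons made with A*.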
Durbin, Knott, and Taylor (1975) have employed a procedure from which it
is possible to express the test statistic of an EDF test as a weighted linear
function of independent chi-square variables each with one degree of freedom.
This permits computation of asymptotic significance points. These are similar
to those obtained by Monte Carlo procedures and presented by Stephens
in Chapter 4 of the present volume.

The reader is referred to Chapter 4 for further discussion of EDF tests.
Chapter 7, Section 7.2 discusses moment tests as they apply to the normal
distribution. The modern theory of tests for normality can be regarded as
having been initiated by Karl Pearson (1895), who recognized that deviations
from normality could be characterized by the standardized third and fourth
moments of a distribution. To be more explicit, as previously discussed in
Section 9.2.2, the normal distribution with density given by (9.2) has as its
standardized third and fourth moments, respectively,

    √β₁ = E(X - μ)³/σ³ = 0                                              (9.14)

and

    β₂ = E(X - μ)⁴/σ⁴ = 3                                               (9.15)

Pearson recognized that the sample moments

    √b₁ = m₃/m₂^{3/2}                                                   (9.16)

and

    b₂ = m₄/m₂²                                                         (9.17)

where

    m_k = Σ(X - X̄)^k/n,   k ≥ 1                                        (9.18)

and

    X̄ = ΣX/n                                                           (9.19)

could be used to judge departures from normality. He found the first
approximation (i.e., to n⁻¹) to the variances and covariances of √b₁ and b₂ for
samples drawn at random from any population, and assuming that √b₁ and b₂
were distributed jointly with bivariate normal probability, constructed equal
Taken from D'Agostino and Tietjen (1973) with permission of the Biometrika
Trustees.
significance levels α = 0.001, 0.005, 0.01, 0.025, 0.05, and 0.10. These points are
given here in Table 9.2. Mulholland (1977) gives good approximations for
n = 4 to 25.
9.3.3.1.2 Su approximation

(1) Compute √b₁ from (9.16) and

Y = √b₁ {(n + 1)(n + 3)/[6(n − 2)]}^(1/2)   (9.20)

(2) Compute

β₂(√b₁) = 3(n² + 27n − 70)(n + 1)(n + 3)/[(n − 2)(n + 5)(n + 7)(n + 9)]   (9.21)

W² = −1 + {2(β₂(√b₁) − 1)}^(1/2)   (9.22)

and

δ = 1/(log W)^(1/2)   (9.23)

(3) Compute

α = {2/(W² − 1)}^(1/2)   (9.24)

and

Z = δ log[Y/α + {(Y/α)² + 1}^(1/2)]   (9.25)

Z is referred to the standard normal distribution.
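The steps (9.20) to (9.25) can be sketched as follows; scipy's skewtest uses this same Su transformation, which the code takes as a cross-check (the data are illustrative, not the chapter's):

```python
import math
import numpy as np
from scipy.stats import skewtest

def sqrt_b1(x):
    """Sample skewness sqrt(b1) = m3/m2^(3/2) of (9.16)."""
    n = len(x)
    xbar = sum(x) / n
    m2 = sum((v - xbar) ** 2 for v in x) / n
    m3 = sum((v - xbar) ** 3 for v in x) / n
    return m3 / m2 ** 1.5

def skewness_z(x):
    """Su normalizing transform of sqrt(b1), steps (9.20)-(9.25)."""
    n = len(x)
    y = sqrt_b1(x) * math.sqrt((n + 1) * (n + 3) / (6.0 * (n - 2)))       # (9.20)
    beta2 = (3.0 * (n ** 2 + 27 * n - 70) * (n + 1) * (n + 3)
             / ((n - 2) * (n + 5) * (n + 7) * (n + 9)))                   # (9.21)
    w2 = -1.0 + math.sqrt(2.0 * (beta2 - 1.0))                            # (9.22)
    delta = 1.0 / math.sqrt(math.log(math.sqrt(w2)))                      # (9.23)
    alpha = math.sqrt(2.0 / (w2 - 1.0))                                   # (9.24)
    return delta * math.log(y / alpha + math.sqrt((y / alpha) ** 2 + 1))  # (9.25)

x = [148, 154, 158, 160, 161, 162, 166, 170, 182, 195]  # illustrative data
z = skewness_z(x)
z_scipy = skewtest(np.array(x)).statistic  # same transformation in scipy
```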
9.3.3.1.3 t approximation

t = √b₁/σ(√b₁)   (9.26)

where

TABLE 9.3 Probability Points of √b₁ for n > 36
(Su Approximation Points)
β₂(√b₁) is given by (9.21), and

σ(√b₁) = {6(n − 2)/[(n + 1)(n + 3)]}^(1/2)   (9.29)

The degrees of freedom for the t reference distribution are determined from β₂(√b₁) via (9.27) and (9.28).
TABLE 9.5 Probability Points of b₂

Sample size n and percentiles

Part 1 (n = 7 to 20)

n   1   2   2.5   5   10   20   80   90   95   97.5   98   99
7 1.25 1.30 1.34 1.41 1.53 1.70 2.78 3.20 3.55 3.85 3.93 4.23
8 1.31 1.37 1.40 1.46 1.58 1.75 2.84 3.31 3.70 4.09 4.20 4.53
9 1.35 1.42 1.45 1.53 1.63 1.80 2.98 3.43 3.86 4.28 4.41 4.82
10 1.39 1.45 1.49 1.56 1.68 1.85 3.01 3.53 3.95 4.40 4.55 5.00
12 1.46 1.52 1.56 1.64 1.76 1.93 3.06 3.55 4.05 4.56 4.73 5.20
15 1.55 1.61 1.64 1.72 1.84 2.01 3.13 3.62 4.13 4.66 4.85 5.30
20 1.64 1.71 1.73 1.83 1.95 2.12 3.20 3.68 4.18 4.68 4.87 5.38
Part 2 (n = 20 to 100)

n   0.5   1   2.5   5   10   15   20   80   85   90   95   97.5   99   99.5
20 1.58 1.64 1.73 1.83 1.95 2.04 2.12 3.20 3.40 3.68 4.18 4.68 5.38 5.91
25 1.66 1.72 1.82 1.92 2.03 2.12 2.20 3.24 3.43 3.69 4.15 4.63 5.29 5.81
30 1.73 1.79 1.89 1.98 2.10 2.19 2.26 3.26 3.44 3.69 4.12 4.57 5.20 5.69
35 1.78 1.84 1.94 2.03 2.15 2.24 2.31 3.28 3.45 3.68 4.09 4.51 5.12 5.58
40 1.83 1.89 1.99 2.07 2.19 2.28 2.35 3.29 3.45 3.66 4.06 4.46 5.04 5.48
45 1.87 1.93 2.03 2.11 2.23 2.31 2.38 3.29 3.44 3.65 4.02 4.41 4.96 5.38
50 1.91 1.96 2.06 2.15 2.26 2.34 2.41 3.29 3.44 3.63 4.00 4.36 4.88 5.28
55 1.94 2.00 2.09 2.18 2.29 2.37 2.44 3.29 3.43 3.62 3.97 4.32 4.81 5.19
60 1.97 2.03 2.12 2.21 2.32 2.39 2.46 3.29 3.43 3.60 3.94 4.28 4.75 5.11
65 2.00 2.05 2.15 2.23 2.34 2.41 2.48 3.28 3.42 3.59 3.91 4.24 4.69 5.03
70 2.02 2.07 2.17 2.25 2.36 2.43 2.50 3.28 3.41 3.58 3.89 4.20 4.64 4.97
75 2.05 2.10 2.19 2.27 2.38 2.45 2.51 3.28 3.41 3.57 3.87 4.17 4.59 4.90
80 2.07 2.12 2.21 2.29 2.39 2.46 2.53 3.27 3.40 3.56 3.85 4.14 4.54 4.84
85 2.08 2.14 2.22 2.31 2.41 2.48 2.54 3.27 3.39 3.55 3.83 4.11 4.50 4.79
90 2.10 2.16 2.24 2.32 2.43 2.49 2.55 3.27 3.39 3.54 3.81 4.08 4.46 4.74
95 2.11 2.17 2.26 2.34 2.44 2.50 2.56 3.27 3.38 3.53 3.80 4.05 4.43 4.70
100 2.13 2.19 2.27 2.35 2.45 2.52 2.57 3.26 3.37 3.52 3.78 4.03 4.39 4.66
Part 3 (n = 100 to 200)

100 2.13 2.19 2.27 2.35 2.45 2.52 2.57 3.26 3.37 3.52 3.78 4.03 4.39 4.66
110 2.15 2.22 2.30 2.37 2.47 2.53 2.59 3.26 3.37 3.51 3.75 3.99 4.32 4.58
120 2.18 2.24 2.32 2.39 2.49 2.55 2.61 3.25 3.35 3.49 3.72 3.95 4.26 4.52
130 2.20 2.26 2.34 2.41 2.51 2.57 2.63 3.25 3.34 3.47 3.70 3.92 4.21 4.46
140 2.22 2.28 2.36 2.43 2.52 2.58 2.64 3.25 3.33 3.46 3.67 3.89 4.17 4.41
150 2.24 2.30 2.37 2.45 2.54 2.60 2.65 3.24 3.33 3.45 3.65 3.86 4.13 4.36
160 2.26 2.32 2.39 2.46 2.55 2.61 2.66 3.24 3.32 3.44 3.63 3.83 4.09 4.31
170 2.28 2.33 2.40 2.48 2.56 2.62 2.67 3.23 3.32 3.43 3.62 3.81 4.06 4.27
180 2.29 2.35 2.41 2.49 2.57 2.63 2.68 3.23 3.31 3.42 3.60 3.79 4.03 4.23
190 2.31 2.36 2.43 2.50 2.58 2.64 2.69 3.22 3.30 3.41 3.58 3.77 4.00 4.19
200 2.32 2.37 2.44 2.51 2.59 2.65 2.70 3.22 3.30 3.40 3.57 3.75 3.98 4.16
Adapted from D'Agostino and Tietjen (1971) and D'Agostino and Pearson (1973), with permission of the Biometrika Trustees.
Finally, both a one-sided or a two-sided test can be used. If the direction of the skewness is anticipated (i.e., √β₁ > 0 or √β₁ < 0) a one-sided test should be used.

Anscombe and Glynn (1983) showed that the results of Table 9.5 and Figure 9.2 for n > 20 can be adequately approximated, when the first three moments of the distribution of b₂ have been determined, by fitting a linear function of the reciprocal of a variable and then using the Wilson-Hilferty transformation. Their approximation is computed as follows:
E(b ) = m j ú i (9.31)
n+ I
and
= 2 4 п (п -2 )(п -3 )
^ ^ /„ + 1ч2/„ + Qwj, + C (9.32)
(n + l)2 (n + 3 )(n + 5)
^ E(bz) (9.33)
N/var (Ьг)
^ 6(n^ - 5 n + 2 j
(9.34)
HiV z; (n + + 9) V n(n - 2)(n - 3)
(5) Compute
8 2
A = 6+
N/^l(bz) ФхОЬг)
J ^1(Ьг) } (9.35)
(6) Compute

Z = [(1 − 2/(9A)) − {(1 − 2/A)/(1 + x{2/(A − 4)}^(1/2))}^(1/3)] / {2/(9A)}^(1/2)   (9.36)
Z of (9.36) is approximately a standard normal variable with mean zero and variance unity.
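A sketch of (9.31) to (9.36) follows; scipy's kurtosistest uses the same Anscombe-Glynn approximation, which the code takes as a cross-check (the flat sample of 20 integers is illustrative):

```python
import math
import numpy as np
from scipy.stats import kurtosistest

def b2(x):
    """Sample kurtosis b2 = m4/m2^2 of (9.17)."""
    n = len(x)
    xbar = sum(x) / n
    m2 = sum((v - xbar) ** 2 for v in x) / n
    m4 = sum((v - xbar) ** 4 for v in x) / n
    return m4 / m2 ** 2

def kurtosis_z(x):
    """Anscombe-Glynn transform of b2, equations (9.31)-(9.36)."""
    n = len(x)
    e = 3.0 * (n - 1) / (n + 1)                                              # (9.31)
    var = 24.0 * n * (n - 2) * (n - 3) / ((n + 1) ** 2 * (n + 3) * (n + 5))  # (9.32)
    xs = (b2(x) - e) / math.sqrt(var)                                        # (9.33)
    sb1 = (6.0 * (n ** 2 - 5 * n + 2) / ((n + 7) * (n + 9))
           * math.sqrt(6.0 * (n + 3) * (n + 5) / (n * (n - 2) * (n - 3))))   # (9.34)
    a = 6.0 + (8.0 / sb1) * (2.0 / sb1 + math.sqrt(1.0 + 4.0 / sb1 ** 2))    # (9.35)
    t = (1.0 - 2.0 / a) / (1.0 + xs * math.sqrt(2.0 / (a - 4.0)))
    term2 = math.copysign(abs(t) ** (1.0 / 3.0), t)  # signed cube root
    return ((1.0 - 2.0 / (9.0 * a)) - term2) / math.sqrt(2.0 / (9.0 * a))    # (9.36)

x = list(range(1, 21))  # illustrative platykurtic sample, n = 20
z = kurtosis_z(x)
z_scipy = kurtosistest(np.array(x)).statistic  # same approximation in scipy
```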
The approximation given by (9.31) to (9.36) can be used to test directly null hypotheses concerning b₂ for two-sided or one-sided alternatives. For example, for testing at level of significance 0.05,

H₀: Normality
9.3.3.2.3 Normal approximation

The approximation

Z = {b₂ − E(b₂)}/{var(b₂)}^(1/2)   (9.37)

treated as a standard normal variable is valid only for extremely large n values (i.e., well over 1000). It should not be used.
Table 9.5 or Figure 9.2 can be used for 7 ≤ n ≤ 200. The Anscombe and Glynn or Bowman and Shenton approximations can be used for n > 20. Both require computations. The former requires fewer computations, giving an explicit solution.

Again, as with √b₁, when knowledge is available concerning the alternative (i.e., β₂ > 3 or β₂ < 3) a one-sided test should be used.
9.3.3.3.1 The R-test

The simplest omnibus test consists of performing the √b₁ test at level α₁ and the b₂ test at level α₂ and rejecting normality if either test leads to rejection. The overall level of significance α for these two tests combined is bounded by Bonferroni's inequality; with each test performed at level α*,

α = 2α* − (α*)²   (9.39)

(9.39) would hold exactly if √b₁ and b₂ were independent. They are uncorrelated but not independent, and use of (9.39) to determine the overall level of significance produces a conservative test. Tables of corrected values are given in Pearson et al. (1966) for n = 20, 50, and α = 0.05 and 0.10.
In order to use this test one can determine α* as

α* = 1 − (1 − α)^(1/2)   (9.40)

where α is the desired overall level. Note 2α* is the level of the individual tests.
The term R test was given to the above omnibus procedure because it can be viewed as employing rectangular coordinates for rejection of normality.

9.3.3.3.2 The D'Agostino-Pearson K² test

K² = X²(√b₁) + X²(b₂)   (9.41)

where X(√b₁) and X(b₂) are the approximately standard normal transforms of √b₁ and b₂ described above.
9.3.3.3.3 The Bowman-Shenton K²s test

In Chapter 7 Bowman and Shenton review their K²s test, which has the same format as (9.41) except that their approximation to X(b₂) involves use of Johnson Su curves (see also Bowman and Shenton (1975)). They also present in Figure 7.1 contours which allow for exact level of significance tests at α = 0.05 and 0.10 for the K²s test. These contours are for n = 25 to 1000.

An older omnibus statistic is

(√b₁)²/σ₁² + (b₂ − 3)²/σ₂²   (9.42)

where σ₁² = 6/n and σ₂² = 24/n. Asymptotically (9.42) would be distributed as a chi-square variable with two degrees of freedom if the null hypothesis of normality were true. Due to the slow convergence of b₂ to normality this test is not useful.
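K² of (9.41) can be assembled from the two component transforms; scipy.stats.normaltest computes exactly this sum of squares (the sample is illustrative):

```python
import numpy as np
from scipy.stats import kurtosistest, normaltest, skewtest

# Illustrative sample (n = 20), not taken from the chapter's data sets
x = np.array([2.1, 2.4, 2.9, 3.0, 3.1, 3.4, 3.4, 3.9, 4.2, 4.4,
              4.5, 4.8, 5.1, 5.6, 5.9, 6.3, 6.8, 7.4, 8.1, 9.7])

zs = skewtest(x).statistic      # X(sqrt(b1)): Su approximation, Section 9.3.3.1.2
zk = kurtosistest(x).statistic  # X(b2): Anscombe-Glynn approximation
k2 = zs ** 2 + zk ** 2          # the K^2 statistic of (9.41)

k2_scipy, pval = normaltest(x)  # scipy assembles the same two components
```

Under normality K² is referred to a chi-square distribution with two degrees of freedom.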
Cox and Hinkley (1974) suggested
Another possibility is
9.3.3.4 Related Tests

9.3.3.4.1 Geary's tests

w = Σ|X − X̄|/{nΣ(X − X̄)²}^(1/2)   (9.45)

and

a = (Σ|X − X̄|/n)/√m₂   (9.46)

(the two are algebraically identical forms). The transform

Z = √n(a − 0.7979)/0.2123   (9.47)

can be considered as a standard normal variable with mean zero and variance unity for n > 41. Recently Gastwirth and Owen (1977) discussed optimal features of a.
Geary (1947), in one of the most distinguished papers on tests of normality, considered tests of the form

a(c) = Σ|X − X̄|^c / (n m₂^(c/2))   (9.48)

Note a(1) = a of (9.46), and a(4) = b₂. Geary discussed optimal properties of b₂ in this framework. Table 9.4 contains numerical examples of Geary's a test.
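A minimal sketch of Geary's a of (9.46) and the transform (9.47); the five-point sample is only to exercise the arithmetic, since (9.47) itself is intended for n > 41:

```python
import math

def geary_a(x):
    """Geary's a of (9.46): mean absolute deviation divided by sqrt(m2)."""
    n = len(x)
    xbar = sum(x) / n
    mad = sum(abs(v - xbar) for v in x) / n
    m2 = sum((v - xbar) ** 2 for v in x) / n
    return mad / math.sqrt(m2)

def geary_z(x):
    """The transform (9.47), intended for n > 41."""
    n = len(x)
    return math.sqrt(n) * (geary_a(x) - 0.7979) / 0.2123

x = [1.0, 2.0, 3.0, 4.0, 5.0]  # tiny illustrative sample
a = geary_a(x)  # mad = 1.2 and m2 = 2, so a = 1.2/sqrt(2)
```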
u = (X_(n) − X_(1))/s   (9.49)

that is, the ratio of the sample range to the sample standard deviation. Probability points of u are given in Pearson and Hartley (1972). Table 9.4 contains numerical examples of this u test.
Clearly, the recent interest in tests of normality is due mainly to the exciting work of S. S. Shapiro and M. B. Wilk (1965). Their test for normality and the tests that have resulted as modifications and extensions of their test are called regression and correlation tests (see Chapter 5). These terms are used in that these tests can be viewed as originally arising from considering a linear model

X_(i) = μ + σE(Z_(i)) + εᵢ   (9.50)

In (9.50) the best linear unbiased estimate of σ from the Gauss-Markov theorem is

σ̂ = c′V⁻¹X/(c′V⁻¹c)   (9.51)

The Shapiro-Wilk statistic is

W = (Σ aᵢX_(i))²/[(n − 1)s²],  (n − 1)s² = Σ(X − X̄)²   (9.52)

where

a′ = c′V⁻¹/(c′V⁻¹V⁻¹c)^(1/2)   (9.53)
TABLE 9.4 Numerical Examples of Tests of Normality†

                                      NOR        EXP
Shapiro-Wilk
  Σ aᵢX_(i)                          24.627     13.880
  Σ(X − X̄)²                        616.554    240.498
  W = (Σ aᵢX_(i))²/Σ(X − X̄)²        0.984      0.801ᵃ
D'Agostino
  T = Σ[i − ½(n + 1)]X_(i)          217.53     123.095
  D = T/(n²√m₂)                      0.27703    0.25101ᵇ

†Data sets are the first ten observations of the NOR and EXP data sets of the Appendix.
ᵃFor the Shapiro-Wilk W test reject the null hypothesis at the 0.02 level of significance if W < 0.806. So reject for the negative exponential at the 0.02 level.
ᵇReject the null hypothesis at the 0.05 level of significance if the observed D < 0.2513, the lower tail critical value of D (see text). So reject at the 0.05 level.
The numerator of W is, up to a constant, the square of the best linear unbiased estimator of σ given that the population is normally distributed. W can also be viewed as the R² (square of the correlation coefficient) obtained from a normal probability plot (see Section 2.4), and thus the notion of a correlation test.

The aᵢ values for n = 3 to 50 were given by Shapiro and Wilk (1965) and are presented in this book in Table 5.4. Because W is similar to an R² value, large values (i.e., values close to one) indicate normality and values smaller than unity indicate nonnormality. Thus values in the lower tail of the null distribution of W are used for rejection. Table 5.5 gives the critical values of W for n = 3 to 50.
9.3.4.2 D'Agostino's D Test

The W test requires a different set of aᵢ weights for each sample size n. A modification was presented by D'Agostino (1971) which does not require any tables of weights. It is given as follows:

D = T/(n²√m₂),  T = Σ_{i=1}^{n} [i − ½(n + 1)]X_(i)   (9.54)

Equivalently, the denominator n²√m₂ can be written n^(3/2){Σ(X − X̄)²}^(1/2). An approximately standard normal variable is obtained using the asymptotic values

E(D) = 0.28209479,  σ(D) = 0.02998598/√n   (9.56)

namely

Y = √n(D − 0.28209479)/0.02998598   (9.57)
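A sketch of D of (9.54) and Y of (9.57); the check uses the integers 1 to 10 as an illustrative sample, not the chapter's data:

```python
import math

def dagostino_d(x):
    """D of (9.54) and the approximately standard normal Y of (9.57)."""
    xs = sorted(x)
    n = len(xs)
    xbar = sum(xs) / n
    m2 = sum((v - xbar) ** 2 for v in xs) / n
    t = sum((i - 0.5 * (n + 1)) * v for i, v in enumerate(xs, start=1))
    d = t / (n ** 2 * math.sqrt(m2))
    y = math.sqrt(n) * (d - 0.28209479) / 0.02998598
    return d, y

d, y = dagostino_d(range(1, 11))
```

Both small and large values of D (equivalently of Y) lead to rejection.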
Probability Points of Y = √n(D − 0.28209479)/0.02998598

      Percentiles
n     0.5    1    2.5    5    10    90    95    97.5    99    99.5

10 -4.66 -4.06 -3.25 -2.62 -1.99 0.149 0.235 0.299 0.356 0.385
12 -4.63 -4.02 -3.20 -2.58 -1.94 0.237 0.329 0.381 0.440 0.479
14 -4.57 -3.97 -3.16 -2.53 -1.90 0.308 0.399 0.460 0.515 0.555
16 -4.52 -3.92 -3.12 -2.50 -1.87 0.367 0.459 0.526 0.587 0.613
18 -4.47 -3.87 -3.08 -2.47 -1.85 0.417 0.515 0.574 0.636 0.667
20 -4.41 -3.83 -3.04 -2.44 -1.82 0.460 0.565 0.628 0.690 0.720
22 -4.36 -3.78 -3.01 -2.41 -1.81 0.497 0.609 0.677 0.744 0.775
24 -4.32 -3.75 -2.98 -2.39 -1.79 0.530 0.648 0.720 0.783 0.822
26 -4.27 -3.71 -2.96 -2.37 -1.77 0.559 0.682 0.760 0.827 0.867
28 -4.23 -3.68 -2.93 -2.35 -1.76 0.586 0.714 0.797 0.868 0.910
30 -4.19 -3.64 -2.91 -2.33 -1.75 0.610 0.743 0.830 0.906 0.941
32 -4.16 -3.61 -2.88 -2.32 -1.73 0.631 0.770 0.862 0.942 0.983
34 -4.12 -3.59 -2.86 -2.30 -1.72 0.651 0.794 0.891 0.975 1.02
36 -4.09 -3.56 -2.85 -2.29 -1.71 0.669 0.816 0.917 1.00 1.05
38 -4.06 -3.54 -2.83 -2.28 -1.70 0.686 0.837 0.941 1.03 1.08
40 -4.03 -3.51 -2.81 -2.26 -1.70 0.702 0.857 0.964 1.06 1.11
42 -4.00 -3.49 -2.80 -2.25 -1.69 0.716 0.875 0.986 1.09 1.14
44 -3.98 -3.47 -2.78 -2.24 -1.68 0.730 0.892 1.01 1.11 1.17
46 -3.95 -3.45 -2.77 -2.23 -1.67 0.742 0.908 1.02 1.13 1.19
48 -3.93 -3.43 -2.75 -2.22 -1.67 0.754 0.923 1.04 1.15 1.22
50 -3.91 -3.41 -2.74 -2.21 -1.66 0.765 0.937 1.06 1.18 1.24
60 -3.81 -3.34 -2.68 -2.17 -1.64 0.812 0.997 1.13 1.26 1.34
70 -3.73 -3.27 -2.64 -2.14 -1.61 0.849 1.05 1.19 1.33 1.42
80 -3.67 -3.22 -2.60 -2.11 -1.59 0.878 1.08 1.24 1.39 1.48
90 -3.61 -3.17 -2.57 -2.09 -1.58 0.902 1.12 1.28 1.44 1.54
100 -3.57 -3.14 -2.54 -2.07 -1.57 0.923 1.14 1.31 1.48 1.59
150 -3.409 -3.009 -2.452 -2.004 -1.520 0.990 1.233 1.423 1.623 1.746
200 -3.302 -2.922 -2.391 -1.960 -1.491 1.032 1.290 1.496 1.715 1.853
250 -3.227 -2.861 -2.348 -1.926 -1.471 1.060 1.328 1.545 1.779 1.927
300 -3.172 -2.816 -2.316 -1.906 -1.456 1.080 1.357 1.528 1.826 1.983
350 -3.129 -2.781 -2.291 -1.888 -1.444 1.096 1.379 1.610 1.863 2.026
400 -3.094 -2.753 -2.270 -1.873 -1.434 1.108 1.396 1.633 1.893 2.061
450 -3.064 -2.729 -2.253 -1.861 -1.426 1.119 1.411 1.652 1.918 2.090
500 -3.040 -2.709 -2.239 -1.850 -1.419 1.127 1.423 1.668 1.938 2.114
550 -3.019 -2.691 -2.226 -1.841 -1.413 1.135 1.434 1.682 1.957 2.136
600 -3.000 -2.676 -2.215 -1.833 -1.408 1.141 1.443 1.694 1.972 2.154
650 -2.984 -2.663 -2.206 -1.826 -1.403 1.147 1.451 1.704 1.986 2.171
700 -2.969 -2.651 -2.197 -1.820 -1.399 1.152 1.458 1.714 1.999 2.185
750 -2.956 -2.640 -2.189 -1.814 -1.395 1.157 1.465 1.722 2.010 2.199
800 -2.944 -2.630 -2.182 -1.809 -1.392 1.161 1.471 1.730 2.020 2.211
850 -2.933 -2.621 -2.176 -1.804 -1.389 1.165 1.476 1.737 2.029 2.221
900 -2.923 -2.613 -2.170 -1.800 -1.386 1.168 1.481 1.743 2.037 2.231
950 -2.914 -2.605 -2.164 -1.796 -1.383 1.171 1.485 1.749 2.045 2.241
1000 -2.906 -2.599 -2.159 -1.792 -1.381 1.174 1.489 1.754 2.052 2.249
1500 -2.845 -2.549 -2.123 -1.765 -1.363 1.194 1.519 1.793 2.103 2.309
2000 -2.807 -2.515 -2.101 -1.750 -1.353 1.207 1.536 1.815 2.132 2.342

Adapted from D'Agostino (1971) and (1972) with permission of the Biometrika Trustees.
Percentage points of D can also be obtained directly from a Cornish-Fisher expansion,

D_p = E(D) + σ(D)V_p   (9.58)

where

V_p = Z_p + γ₁(Z_p² − 1)/6 + γ₂(Z_p³ − 3Z_p)/24 − γ₁²(2Z_p³ − 5Z_p)/36   (9.59)

Here Z_p is the pth percentile of the standard normal distribution,

E(D) = 0.28209479   (9.60)

σ(D) = 0.02998598/√n   (9.61)

γ₁ = −8.5836542/√n   (9.62)

and

γ₂ = 114.732/n   (9.63)
Note that with the Cornish-Fisher expansion of (9.58) to (9.63) there is no need to transform to Y of (9.57). Finally, because the range of D is small, it should be calculated to five decimal places.
For the numerical examples of Table 9.4 (n = 10), using

D_p = 0.28209479 + (0.02998598/√n)Y_p

with the tabulated probability points of Y, one rejects at the two-sided 0.05 level if D < 0.2513 or D > 0.2849. The NOR data set does not lead to rejection at the 0.05 level. EXP does.
9.3.4.3 Shapiro-Francia's W′ Test

Shapiro and Francia (1972) addressed the problem of the aᵢ weights of the Shapiro-Wilk test by noting that for large samples the ordered observations may be treated as if they were independent. With this, the aᵢ weights of (9.53) can be replaced by

b′ = c′/(c′c)^(1/2)   (9.64)
9.3.4.4 Weisberg-Bingham's Test and Asymptotic Extensions

Weisberg and Bingham (1975) suggested approximating the cᵢ of (9.64) by

c̃ᵢ = Φ⁻¹[(i − 3/8)/(n + 1/4)]   (9.66)

An asymptotic extension employs

Z(X, c) = n(1 − W′)   (9.69)

and supplies a table of critical values for n < 1000 (see Section 5.7.3 and Table 5.2, where the statistic (9.69) is written as Z(X, m)). Royston (1982) gave an extension of the Shapiro-Wilk test similar to (9.69) for n < 2000. He presented a statistic of the form

(1 − W)^λ   (9.70)
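In practice the W test is available in standard software; a minimal sketch with scipy follows (the data are illustrative, and scipy's aᵢ weights follow Royston's approximation rather than Table 5.4):

```python
import numpy as np
from scipy.stats import shapiro

# Illustrative data, not the chapter's NOR set
x = np.array([148, 154, 158, 160, 161, 162, 166, 170, 182, 195], dtype=float)

w, pval = shapiro(x)
# Small values of W (lower tail) are evidence against normality;
# scipy supplies an approximate p-value in place of Table 5.5.
```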
9.3.5 Miscellaneous Tests

Andrews, Gnanadesikan, and Warner (1971, 1972) developed a gap test for normality. Gaps gᵢ are defined as

gᵢ = (X_(i+1) − X_(i))/(c_(i+1) − c_(i))   (9.72)

where c_(i) are the expected values of the ith order statistics from the standard normal distribution. If the null hypothesis of normality is true, the gᵢ of (9.72) are independent exponential variables. Specific types of deviations from normality reflect themselves in deviations from exponentiality of the gᵢ. Andrews et al. gave an omnibus test for normality that is distributed under the null hypothesis approximately as a chi-square variable with two degrees of freedom.
Dumonceaux, Antle, and Haas (1973) looked at the theory of likelihood ratio tests to develop tests of normality versus some specific alternative distributions (Cauchy, exponential, and double exponential). Critical values of the tests are given. For the double exponential the likelihood ratio test is similar to Geary's a of Section 9.3.3.4.1. Hogg (1972) presented the family of distributions of the form

f(x; θ) = exp(−|x|^θ)/[2Γ(θ⁻¹ + 1)]   (9.73)
9.3.5.4 Tiku's Tests

9.3.5.5 Spiegelhalter's Test

Spiegelhalter (1977) used the theory of most powerful location and scale invariant tests to develop tests of normality against the uniform and the double exponential distributions. He then suggested the sum of these two as a combined test statistic. A Bayesian argument was presented to justify the combination. Asymptotically the two components are equal to Geary's a test of (9.46) and the David, Hartley, and Pearson u test of (9.49). Critical points were given in the article for n < 100 for levels of significance 0.05 and 0.10.
9.3.5.6 Other Tests

Other tests of interest are the tests based on the independence of the sample mean and standard deviation (Lin and Mudholkar, 1980), the test based on the empirical characteristic function (Hall and Welsh, 1983), and the squeeze test (Burch and Parsons, 1976). The term squeeze was derived from the method used to perform the test, whereby data points plotted on the appropriate probability paper are squeezed between parallel rules.
9.4.1 Power Studies

There are a large number of tests for judging normality or departures from normality. There is no one test that is optimal for all possible deviations from normality. The procedure usually adopted to investigate the sensitivity of these tests is to perform power studies where the tests are applied to a wide range of nonnormal populations for a variety of sample sizes. A number of such studies have been undertaken. The major ones, in order of completeness and importance, are Pearson, D'Agostino, and Bowman (1977), Shapiro, Wilk, and Chen (1968), Saniga and Miles (1979), Stephens (1974), D'Agostino (1971), Filliben (1975), and D'Agostino and Rosman (1974). Other useful, but less major, studies are Dyer (1974), Prescott (1976), Prescott (1978), Tiku (1974), and Locke and Spurrier (1976).

Presentation of the results of the above power studies produces a number of difficulties. In order to be most informative we will first present results indexed by skewness (√β₁) and kurtosis (β₂) and then indexed by specific tests. The former comparisons are mainly from Pearson et al. (1977). The latter are summarized from all of the above articles.

When the omnibus tests are applied without directional knowledge of the alternative distribution, there is very little to choose between the powers of K² (Section 9.3.3.3.2) and the R test (Section 9.3.3.3.1) when applied to platykurtic populations. The Shapiro-Wilk W test is on the whole more powerful than these. The D'Agostino D test (Section 9.3.4.2) does not fit consistently in the comparison. In general there is usually some other test more powerful than it.

When the direction of β₂ is known (β₂ < 3), the lower tail b₂ test is more powerful than the K², R, W, and D tests.
1. The Shapiro-Wilk W test and the Shapiro-Francia extension are very sensitive omnibus tests. For many skewed populations they are clearly the most powerful. When √β₁ = 0 and β₂ > 3 a number of other tests are more powerful.

2. √b₁ and b₂ have excellent sensitivity over a wide range of alternative distributions which deviate from normality with respect to skewness and kurtosis, respectively. As n gets large the √b₁ test has no power for symmetric alternatives. When directional information is available (e.g., √β₁ > 0 or β₂ < 3) appropriate one-sided versions of these tests are very powerful. In most cases studied in the literature they are usually most powerful.

3. The D'Agostino-Pearson K² of Section 9.3.3.3.2 or, because of its equivalency to K², the Bowman-Shenton K²s of Section 9.3.3.3.3 are sensitive to a wide range of nonnormal populations. They can be considered omnibus tests. For skewed alternatives the Shapiro-Wilk W test is usually more powerful. Also, for symmetric alternatives with β₂ < 3 the W test is often more powerful. For symmetric alternatives with β₂ > 3, K² is often most powerful.

4. The R test of Section 9.3.3.3.1 is also an omnibus test. Its power usually does not exceed that of K².

5. The most powerful EDF test appears to be the Anderson-Darling A² (Section 9.3.2.2). It is at times presented as being similar in power to the Shapiro-Wilk W test. However, it has not been studied as extensively as either the moment tests or the regression tests. More power studies are required to compare it more fully to the W, K², R, and D'Agostino's D tests.

6. While D'Agostino's D test is an omnibus test, it has best power for distributions with β₂ > 3. Other tests are better than it for skewed alternatives.

7. Geary's a test (Section 9.3.3.4.1) has good power for symmetric alternatives with β₂ ≠ 3. However, b₂ is usually better. For skewed alternatives W is generally superior.

8. The Kolmogorov-Smirnov test has poor power in comparison to the many tests described in detail in this chapter.

9. The chi-square test is in general not a powerful test of normality.
9.4.2 The Effects of Ties and Grouping

Results of power studies are not the only means for judging or comparing the normality tests. In practice the data may often involve ties, either because available figures have been rounded for grouping purposes or because measurements cannot be carried out beyond a certain degree of accuracy. It may not be desirable to reject the null hypothesis just because the data contain these ties or are grouped. Use of the data as if the underlying population were normal may not present any problems if the true population is approximately normal and the resulting data contain ties. (Research is needed on this point.) Pearson et al. (1977) investigated the effect which ties and the grouping of data have on four tests of normality, √b₁, D, W, and W′.

For judging the effect what matters is the ratio, say ℓ, of the standard deviation of the distribution to the rounding interval, i.e., the interval between the nearest possible readings or observations left after rounding. Pearson et al. (1977) considered the effect of grouping on √b₁ and W for ℓ = 3, 5, and 10 and n = 20 and 50, and for D and W′ for ℓ = 3, 5, 8, and 10 and n = 100. The present author also considered √b₁ for n = 100, D for n = 20 and 50, and b₂ for n = 20, 50, and 100.

The effect of grouping on √b₁ and b₂ was not significant. That is, grouping did not produce differences between the actual and the declared or nominal level of significance. The effect on D was to make the test slightly conservative. That is, the actual level of significance was slightly smaller than the declared or nominal level. For the W test the effect was significant for ℓ = 3 and 5. Here the actual level of significance exceeded significantly the nominal level. For ℓ = 10, the effect was minimal. The statistic W′ was extremely unsatisfactory. Usually the actual level of significance exceeded the nominal level by substantial amounts (e.g., 30%), even for ℓ = 10.

The above results suggested that W′ or its derivatives as given in Sections 9.3.4.4 and 9.3.4.5 and Chapter 5 must be used with caution if there are multiple ties.

Pearson et al. (1977) did not consider the effects of ties on the EDF tests. Until the effects are investigated they should be used with caution on data containing ties.
1. Graphical analysis, as described in Chapter 2, should be performed first; that chapter shows how to employ the computer for a good graphical analysis. A detailed examination of the probability plot should be undertaken.

2. The omnibus tests, the Shapiro-Wilk W test and its extensions (Sections 9.3.4.1, 9.3.4.3, and 9.3.4.4), the D'Agostino-Pearson K² test or the Bowman-Shenton version K²s (Sections 9.3.3.3.2 and 9.3.3.3.3), and the Anderson-Darling EDF test A² (Section 9.3.2.2) appear to be the best omnibus tests available. The Shapiro-Wilk type tests are probably overall most powerful. However, due to the problem with ties (Section 9.4.2) and the fact that it gives as a by-product no numerical indication of the nonnormality, the test based jointly on the very informative √b₁ and b₂ statistics may be preferred by many. The √b₁ and b₂ statistics can be very useful for indicating the type of nonnormality. Also they can be useful for judging if nonnormality will affect any inferences to be made with the data (e.g., if a t test is to be applied to the data or a prediction is to be made).

3. The R test (Section 9.3.3.3.1) and D'Agostino's D test (Section 9.3.4.2) can be used as omnibus tests. They are convenient and easy to use. The tests of point 2 are probably more powerful.

4. If the direction of the alternative to normality is known (e.g., √β₁ > 0 or β₂ > 3), then the directional versions of the √b₁, b₂, and D'Agostino D tests should be used.

5. For testing for normality, the Kolmogorov-Smirnov test is only a historical curiosity. It should never be used. It has poor power in comparison to the above procedures.

6. For testing for normality, when a complete sample is available the chi-square test should not be used. It does not have good power when compared to the above tests.
Y = f(β, X) + ε   (9.74)

The residuals are

ε̂ᵢ = Yᵢ − Ŷᵢ   (9.75)

where

Ŷᵢ = f(β̂, Xᵢ)   (9.76)

9.6.1 Residuals from a Linear Regression

Y = β′X + ε   (9.77)

ε̂ = Y − Ŷ   (9.78)
Alexa Beiser (1985) in an unpublished work has considered first-order autoregressive models of the form

Y_t = ρY_{t−1} + ε_t   (9.79)

where the ε_t are independent normal variables. She has shown that √b₁ and b₂ computed on the residuals produce valid levels of significance for n > 50 and ρ < 0.9. In fact, this procedure appeared to be more appropriate than computing √b₁ and b₂ on the Y_t directly, incorporating adjustments for the dependencies of the observations.

In her work the ρ of (9.79) is estimated as

ρ̂ = Σ_{t=2}^{n} (Y_t − Ȳ)(Y_{t−1} − Ȳ) / Σ_{t=1}^{n} (Y_t − Ȳ)²   (9.80)

and the residuals are computed as ε̂_t = Y_t − ρ̂Y_{t−1}.
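The procedure can be sketched as follows (ρ = 0.5, n = 2000, and the seed are illustrative choices, not values from Beiser's work):

```python
import numpy as np
from scipy.stats import kurtosistest, skewtest

rng = np.random.default_rng(0)

# Simulate Y_t = rho * Y_{t-1} + eps_t
rho, n = 0.5, 2000
eps = rng.standard_normal(n)
y = np.empty(n)
y[0] = eps[0]
for t in range(1, n):
    y[t] = rho * y[t - 1] + eps[t]

# Estimate rho as in (9.80)
ybar = y.mean()
rho_hat = np.sum((y[1:] - ybar) * (y[:-1] - ybar)) / np.sum((y - ybar) ** 2)

# Residuals, then the usual univariate skewness and kurtosis tests on them
resid = y[1:] - rho_hat * y[:-1]
zs = skewtest(resid).statistic
zk = kurtosistest(resid).statistic
```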
9.7 MULTIVARIATE NORMALITY

9.7.2.1 Mardia's Tests

Mardia (1970) defined multivariate skewness and kurtosis measures as

b_{1,p} = (1/n²) Σᵢ Σⱼ rᵢⱼ³   (9.81)

and

b_{2,p} = (1/n) Σᵢ rᵢᵢ²   (9.82)

where

rᵢⱼ = (Xᵢ − X̄)′S⁻¹(Xⱼ − X̄)   (9.83)

Here X̄ is the mean vector and S the sample covariance matrix. Significance points obtained from simulations are given in Mardia (1970) and Mardia (1975). Mardia and Foster (1983) discussed omnibus multivariate tests based on b_{1,p} and b_{2,p}.
9.7.2.2 The Malkovich-Afifi Tests

Malkovich and Afifi (1973) proposed multivariate skewness and kurtosis tests using Roy's union-intersection principle. Multivariate skewness was given by them as

b₁(c) = [E{(c′X − E(c′X))³}]² / {var(c′X)}³   (9.84)

and multivariate kurtosis as

b₂(c) = E{(c′X − E(c′X))⁴} / {var(c′X)}²   (9.85)

maximized over directions c; the kurtosis test rejects if the sample version should fall outside the interval (k₂, k₃), where these k's produce an α level test.

Machado (1983) found the asymptotic distributions of the statistics in (9.86) and (9.87) for p = 2, 3, and 4. As with the univariate case, the statistics approach their asymptotic behavior very slowly. Machado used the Johnson Su approximation suggested by D'Agostino for √b₁ for (9.86) and the Anscombe and Glynn b₂ approximation for (9.87) to obtain null distributions
Zᵢ = S^(−1/2)(Xᵢ − X̄)   (9.88)

d_a = Σᵢ wᵢᵃ Zᵢ / ‖Σᵢ wᵢᵃ Zᵢ‖   (9.89)

Here d_a is a vector,

wᵢ = ‖Zᵢ‖   (9.90)

and a is a constant to be chosen.

For a = −1, d_a is a function only of the orientation of the Zᵢ's, while for a = 1, d_a becomes sensitive to the observations distant from the mean. More generally, for a > 0 the vector d_a will tend to point toward any clustering of observations far from the mean, while for a < 0 the vector d_a will point in the direction of any abnormal clustering near the center of gravity of the data.

Therefore the projections d_a′Zᵢ, for a given value of a, can be regarded as a univariate sample. Any univariate test of normality can now be employed. The value of a can be selected to be sensitive to certain types of nonnormality. Because of the data-dependence of the approach, the procedure can only be used as a guide. The formal significance levels do not apply.
Zᵢ = S^(−1/2)(Xᵢ − X̄)   (9.91)

9.7.3.2.1 Bivariate case

Order the n squared radii rᵢ² = Zᵢ′Zᵢ, giving r²_(1) < ··· < r²_(n), and plot these against the corresponding quantiles of the chi-square distribution with 2 degrees of freedom.
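A sketch of the plot for the bivariate case (the chi-square reference with 2 degrees of freedom is the large-sample distribution of the squared radii; the data, seed, and plotting positions (i − 1/2)/n are illustrative choices):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(2)
x = rng.standard_normal((200, 2))  # illustrative bivariate sample

z = x - x.mean(axis=0)
s = z.T @ z / len(x)
r2 = np.sort(np.einsum('ij,jk,ik->i', z, np.linalg.inv(s), z))  # ordered r^2_(i)

# Chi-square(2) quantiles at the plotting positions (i - 1/2)/n
n = len(r2)
q = chi2.ppf((np.arange(1, n + 1) - 0.5) / n, df=2)
# A roughly 45-degree plot of (q, r2) supports bivariate normality.
```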
9 .7 .3 .3 Other Procedures
Very little has been done by way of power studies for multivariate normality tests. Malkovich and Afifi (1973) have undertaken a small study. More is needed.
9.7.5 Recommendations
REFERENCES

This reference list contains two sets of references. The first set is a brief set of references in which the normal distribution is used as a mathematical model for real world data. The second set is the major set of references and it is on the statistical methods of tests of normality.
Geary, R. C. (1935). The ratio of the mean deviation to the standard deviation as a test of normality. Biometrika 27, 310-332.

Uthoff, V. (1973). The most powerful scale and location invariant test of the normal versus the double exponential. Ann. Statist. 1, 170-174.
The exponential distribution is probably the one most used in statistical work after the normal distribution. It has important connections with life testing, reliability theory, and the theory of stochastic processes, and is closely related to several other well-known distributions with statistical applications, for example, the gamma and the Weibull distributions.

The general form of the exponential distribution, written Exp(α, β), has density

f(x; α, β) = (1/β) exp{−(x − α)/β},   x > α
422 STEPHENS
The null hypotheses corresponding to these four cases will be called H₀₀, H₀₁, H₀₂, and H₀₃.

Case 0 can be easily handled: the Probability Integral Transform (PIT, Section 4.2.3) gives n values Zᵢ = F(Xᵢ; α, β) which, on H₀₀, are uniform U(0,1), and these can be tested by any of the methods of Section 4.4 or Chapter 8. An example of a Case 0 test for exponentiality is given in Section 4.9. The data are the 15 values of X, given in Table 10.1, and are times to failure of airconditioning equipment in aircraft, given by Proschan (1963). We shall use these data throughout the chapter to illustrate test procedures.
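For illustration, a Case 0 test can begin with the PIT as follows (α = 0, β = 100, and the simulated sample are illustrative values, not Proschan's data):

```python
import numpy as np

# Case 0: alpha and beta fully specified
rng = np.random.default_rng(3)
x = rng.exponential(scale=100.0, size=200)

alpha, beta = 0.0, 100.0
z = 1.0 - np.exp(-(x - alpha) / beta)  # Z_i = F(X_i; alpha, beta)
# On H_00 the Z_i are uniform U(0,1) and may be tested by Chapter 8 methods.
```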
Mathematical properties of the exponential distribution can be used to change Case 1 to Case 0, and to change Case 2 and Case 3 to the special test of Case 2 with α = 0. Most of the tests in the literature have been proposed for this situation, so we shall reserve the notation H₀ for the hypothesis that the sample comes from Exp(0, β), with β unknown. Tests for this hypothesis are those discussed throughout most of this chapter, although we return to Case 3 in Section 10.14.
A large number of test procedures have been given for H₀. One reason for this is that, again because of mathematical properties of the exponential distribution, it is possible to transform a sample of n X-values from Exp(0, β) in several useful ways; one transformation (N below) takes the sample to a new n-sample X′ which is also Exp(0, β), and another transformation (J) takes X to a set of n − 1 ordered uniforms U. Further, J can be applied to the X′ set, to give a set of n − 1 uniforms U′; we call the conversion of X to U′ the K-transformation on X. Thus tests of H₀ on X can become tests of H₀ on X′, or tests of uniformity on U or on U′. Furthermore, the different transformations have useful interpretations, depending on the original motivation for testing the X sample, and on the alternative distributions to Exp(0, β) that the X might have. Two of the most important applications of Exp(0, β) variables are to modelling time intervals between events, or to modelling lifetimes, or times to failure, in survival analysis and reliability theory. A particular application will tend to lead to a particular group of tests. In a general way, the J transformation, and various tests on U, will arise naturally in connection with a series of events, and the N and K transformations, with tests on X′ or on U′, will arise in tests on lifetimes. This is partly because the properties of X′ and U′ are influenced by whether or not the true distribution of X, if not exponential, has a decreasing or increasing failure rate.
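The N and J transformations described above can be sketched as:

```python
import numpy as np

def n_transform(x):
    """Normalized spacings X'_i = (n - i + 1)(X_(i) - X_(i-1)), X_(0) = 0.
    If X is a sample from Exp(0, beta), so is X'."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = len(xs)
    spacings = np.diff(np.concatenate(([0.0], xs)))
    return (n - np.arange(n)) * spacings  # multipliers n, n-1, ..., 1

def j_transform(x):
    """Partial-sum ratios U_(i) = (X_1 + ... + X_i)/(X_1 + ... + X_n),
    i = 1, ..., n - 1: a set of n - 1 ordered uniforms under H_0."""
    x = np.asarray(x, dtype=float)
    return np.cumsum(x)[:-1] / x.sum()

x = np.array([3.0, 1.0, 4.0, 1.5, 5.0])  # illustrative positive values
xp = n_transform(x)   # the total is preserved: sum(xp) == sum(x)
u = j_transform(x)    # four increasing values in (0, 1)
```

The K-transformation is then simply `j_transform(n_transform(x))`.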
TESTS FOR THE EXPONENTIAL DISTRIBUTION 423

The overall plan of this chapter is therefore as follows. After a section on notation, we show how other cases are reduced to Case 0 or to a test of H₀. The applications of the exponential distribution are discussed, and followed by the details of the N, J, and K transformations, and some of their properties. Then we turn to tests of H₀. With the potential applications in mind, these are roughly grouped into three groups, as follows: Group 1, those applied to X; Group 2, those applied to U; Group 3, those applied to X′ or to U′. The tests for the three groups occupy Sections 10.7 to 10.11.
The test statistics discussed are almost always presented in the context in
which they w ere first suggested, although, obviously, any test first suggested
fo r the X set could equally be applied to set X \ and vice v e rsa . There w ill
be some inevitable overlap with other chapters, particularly Chapter 8 , con
taining tests for uniformity. A few statistics a re repeated in Chapter 8 (they
differ slightly because in Chapter 8 it is natural to calculate the statistics
from n uniform s, but in this chapter they are found from m = n - 1 uniform s).
One group treated mainly in this chapter consists of tests based on spacings between uniforms; such spacings are exponentials, and these tests have arisen mostly in connection with tests of H0.
If a general random sample is given, with no details as to context, and a test of H0 is required, the question of making a transformation or not becomes one of obtaining the best power for a test procedure against a class of alternatives. Some studies on power of tests are reported in Section 10.13, for both omnibus tests and one-sided tests. One feature of these studies is that they reveal much similarity in power of many of the test procedures, when applied to a general random sample. The user will therefore often be guided by personal preference.
The value of a test statistic often gives information on the set from which it is calculated, and this in turn may sometimes be interpreted in terms of the X-sample. Thus, in modern statistics, it is common practice to calculate several statistics, and to use them to analyze features of the given data, rather than rigorously to apply significance tests. This approach is taken to illustrate the tests, applied to the data set in Table 10.1, in Section 10.12. In Section 10.14 we return to Case 3 tests.
10.2 NOTATION

The notation used in this chapter, apart from that already described, is listed in this section.

n = size of set X; m = n − 1.
In this section we show how Case 1 can be reduced to Case 0, and Cases 2 and 3 to a test of H0. These employ the following properties of a sample from Exp(α,β).

Y_i = X_(r+i) − X_(r), i = 1, ..., n − r. (10.2)
For this case, Result 2 above can be used to change the original X-sample to a Y-sample of size n − 1 which, on H01, will be Exp(0,β), with β known, and the test becomes a Case 0 test on the Y set.

For this case, when the known value of α is not zero, Result 1 can be used to produce a Y-sample which, on H02, will be Exp(0,β). Thus the Case 2 test with α known is reduced to a test of H0, applied to the Y-set.
This test, of H03: that a given set X is from Exp(α,β) with both parameters unknown, can be transformed by use of Results 2 or 3 above to a test that the Y-set or Z-set is Exp(0,β), that is, a test of H0 applied to set Y or to set Z. Result 2 will be used if a complete X-sample of size n is available; the Y-sample is then of size n − 1. Result 3 will be useful if the first r − 1 ordered observations of the X-set are not available, or if, for some reason, they are suspected to be outliers.

This use of Results 2 and 3 has been a generally accepted way to handle tests with unknown α, but it may not be the best way, and we return to tests of Case 3 in Section 10.14.
X_i = T_i − T_{i−1}, i = 1, ..., n (with T_0 = 0).

If the events at times T_j are from a Poisson process, the variables X_i will be independently distributed Exp(0,β), where β is a positive constant. A test that the process generating the events is Poisson can therefore be based on a test that the intervals X_i are Exp(0,β). The Poisson process is discussed in many textbooks; see, for example, Cox and Lewis (1966, Chapter 2). If the T_j are recorded on a horizontal time axis, the X_i are the spacings between the T_j.

T_r = X_1 + X_2 + ⋯ + X_r, r = 1, ..., n;

these times will be the times of failure for the overall equipment (here an aircraft).
E 1 0 .4 .2 .1 Example
f_X(x) = x^{m−1} exp(−x/β)/{β^m Γ(m)}, x > 0, (10.3)

where m and β are positive constants. When the constant m = 1, the distribution reduces to the exponential. The density is infinite at x = 0 when m < 1, and is zero at x = 0 when m > 1.
f_X(x) = (m/β)(x/β)^{m−1} exp{−(x/β)^m}, x > 0. (10.4)

For the gamma distribution, μ = mβ and σ² = mβ²; thus C_V = m^{−1/2}.
The failure rate, or hazard rate, of X is defined as

h(x) = f(x)/{1 − F(x)},

where f(x) and F(x) are the density and distribution functions of X, assumed continuous. If F(x) is a distribution of lifetimes, the quantity h(x) dx may be interpreted as the probability of failing in time dx at x, given that failure has not occurred up to x. For the exponential distribution Exp(0,β), h(x) = β, a constant; for the gamma and the Weibull distributions, the failure rate increases steadily with x for m > 1, and decreases for 0 < m < 1; for m = 1 they both reduce to the exponential distribution with constant failure rate. The abbreviations IFR and DFR are commonly used for increasing or decreasing failure rate. A distribution with IFR (often called an IFR distribution; a sample from such a distribution will be called an IFR sample) will have C_V < 1, and a DFR distribution will have C_V > 1. We can summarize results for the gamma and Weibull distributions in the following small table.

m < 1                C_V > 1   DFR
m = 1 (Exp(0,β))     C_V = 1   Constant failure rate = β
m > 1                C_V < 1   IFR
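Since C_V for a given shape m is free of the scale β, the classification in the table can be sketched directly; a minimal Python sketch for the gamma case (function names are illustrative, not from the text):

```python
def gamma_cv(m):
    """C_V for a gamma(m, beta) variable: mu = m*beta, sigma^2 = m*beta^2,
    so C_V = sigma/mu = m**-0.5, free of the scale beta."""
    return m ** -0.5

def failure_rate_class(cv, tol=1e-12):
    """C_V < 1 -> IFR, C_V > 1 -> DFR, C_V = 1 -> constant rate (exponential)."""
    if abs(cv - 1.0) < tol:
        return "constant"
    return "IFR" if cv < 1.0 else "DFR"
```

For example, gamma_cv(2) ≈ 0.707 (IFR), gamma_cv(1) = 1 (constant, the exponential case), and gamma_cv(0.5) ≈ 1.414 (DFR).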
TESTS FOR THE EXPONENTIAL DISTRIBUTION 429
10.5.1 Introduction
We now describe the three important transformations which are often made to the data set X under test for Exp(0,β). These are:
The N transformation:

a) Order the sample, and calculate the spacings E_i = X_(i) − X_(i−1), i = 1, ..., n, with X_(0) ≡ 0.

b) Calculate the normalized spacings X′_i = (n + 1 − i)E_i, i = 1, ..., n; on H0, the X′_i are again independently distributed Exp(0,β).

The total time on test statistic is

T′_r = X′_1 + X′_2 + ⋯ + X′_r, (10.6)

and, in particular,

T′_n = Σ_{i=1}^n X′_i = Σ_{i=1}^n X_i = T_n. (10.7)
The J transformation:

a) Calculate T_j = Σ_{i=1}^j X_i, j = 1, ..., n.

b) Define U_j = T_j/T_n, j = 1, ..., n − 1; on H0, the U_j are n − 1 ordered uniforms from U(0,1).
A sample of X-values, all positive, could first be transformed to a new set X′ by transformation N, and then to a set U′ by applying transformation J to X′. On H0, the set U′ will be n − 1 ordered uniforms. The combination of N and J is equivalent to the following transformation, which we call K:

a) Order the sample, and calculate E_i = X_(i) − X_(i−1), i = 1, ..., n, with X_(0) ≡ 0.

b) Calculate X′_i = (n + 1 − i)E_i, i = 1, ..., n.

c) Calculate T′_j = Σ_{i=1}^j X′_i, and U′_j = T′_j/T′_n, j = 1, ..., n − 1.

Result. The set U′ is a sample of n − 1 ordered uniforms from U(0,1). Note also that, from (10.7), T′_n = T_n. Following earlier notation we write U′ = KX.
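The three transformations can be sketched in code as follows (a minimal Python sketch; the function names are illustrative, not from the text):

```python
def n_transform(x):
    """N: normalized spacings X'_i = (n + 1 - i) * E_i,
    where E_i = X_(i) - X_(i-1) and X_(0) = 0."""
    xs = sorted(x)
    n = len(xs)
    prev = 0.0
    out = []
    for i, xi in enumerate(xs, start=1):
        out.append((n + 1 - i) * (xi - prev))
        prev = xi
    return out

def j_transform(x):
    """J: partial sums T_j = X_1 + ... + X_j scaled by T_n,
    giving n - 1 ordered uniforms on H0."""
    t, total, u = 0.0, sum(x), []
    for xi in x[:-1]:
        t += xi
        u.append(t / total)
    return u

def k_transform(x):
    """K: N followed by J; note T'_n = T_n, equation (10.7)."""
    return j_transform(n_transform(x))
```

For x = [3.0, 1.0, 2.0], n_transform gives [3.0, 2.0, 1.0], whose sum equals the sum of x, illustrating (10.7).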
E 10.5.5 Example
The N and K transformations can be used with a censored sample. When only the r smallest values of the original set X are available, the r values X′_1, X′_2, ..., X′_r of set X′ given by the N transformation can still be obtained; the J transformation can be applied to these to produce r − 1 ordered uniforms. This is equivalent to the following useful result. Suppose we calculate, using the notation of Section 10.5.5, values

U′_j = T′_j/T′_r, j = 1, ..., r − 1;

on H0 these will be r − 1 ordered uniforms, and tests of uniformity applied to them give tests of H0 for censored data.
The decision whether or not to test H0 directly on the X-set, or to use one of the transformations to X′ or to U or U′, is related to the three test situations: tests on a general random sample, with no particular context given, tests on intervals between events (where the indexing of the intervals might be important), and tests on lifetimes. We first observe an important difference between transformations J and K: the J transformation preserves the original indexing of the X_i, while the K transformation does not; that is to say, in making transformation K (or N) the X_i are first put in ascending order, and the original labelling is irrelevant. Most tests based directly on the X_i (Group 1 below) will also involve putting them in order and losing the original indexing.

In tests on a series of events, the original indexing will probably be important (for example, one wants to know if the intervals are getting longer or shorter as time passes), and then more information will be given by using J followed by tests on U.
On the other hand, when the labelling of the X_i is not important there are some disadvantages to J. Consider, for example, the lifetimes of equipment in the laboratory experiment described in Section 10.4.3. There is no significance (presumably) in the index i attached to lifetime X_i, and so different statisticians could label the X-set differently; when this is done the J transformation will produce different values U, and when tests for uniformity are applied to the U-set, different conclusions will be reached.

In contrast, use of the K transformation always gives the same set U′, and tests based on U′ will, for all statisticians, give the same results. The same holds true of tests based on the X′_i themselves, and of those tests on the X-set, such as EDF or regression tests, where the observations are first ordered. This invariance is usually considered a desirable property for a test procedure.
Another feature of J is that it can produce superuniform U, that is, a set U which is too evenly spaced to be considered uniform (see Sections 8.5 and 4.5.1). This will occur, sometimes, when the X-set comes from an IFR alternative. Many test procedures are not set up to detect superuniforms; for example, EDF tests or regression tests using the upper tail (as is customary) will not detect them. Several of the test statistics to follow will detect superuniforms, or EDF tests or regression tests may be used with the lower tail, but then power is lost when the tests are used with two tails against DFR alternatives.
The K transformation produces U′ values which will typically never be superuniform; in fact, they will drift toward 0 for DFR samples and toward 1 for IFR samples, so the pattern of the U′ set gives information about the alternative. Finally, K may be used on a censored sample. For these reasons K is recommended rather than J for application to a general random sample.
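The labelling point can be seen numerically; a small illustrative Python sketch (not from the text) in which K gives the same U′ set for two labellings of the same values, while J does not:

```python
def j(x):
    """J transformation: U_j = T_j / T_n, j = 1, ..., n - 1."""
    t, s, u = 0.0, sum(x), []
    for xi in x[:-1]:
        t += xi
        u.append(t / s)
    return u

def k(x):
    """K transformation: normalized spacings first (which sorts x), then J."""
    xs = sorted(x)
    n = len(xs)
    # (n + 1 - i) with 1-based i equals (n - i) with 0-based i
    xp = [(n - i) * (xs[i] - (xs[i - 1] if i else 0.0)) for i in range(n)]
    return j(xp)

a = [5.0, 1.0, 3.0, 2.0]
b = [2.0, 3.0, 1.0, 5.0]   # same lifetimes, relabelled
assert k(a) == k(b)        # K is invariant to the labelling
assert j(a) != j(b)        # J is not
```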
10.6.2 Tests on a Series of Events
It has been suggested above that in tests on events, the natural index of X_i will play an important role. Consider tests for the Poisson process, against the alternative of trend.

In the Poisson process, events occur at a constant rate as time passes; this leads to the intervals between events being independent and exponential. More precisely, let λ(t) dt be the probability of an event occurring in the interval dt at time t; for the Poisson process λ(t) is constant. An obvious alternative to the Poisson process is the model for which λ(t) increases or
decreases with t, that is, events occur more quickly or more slowly as time passes. There is then said to be a trend in the rate of occurrence. A possible model for trend, considered, for example, by Cox (1955), is to suppose λ(t) = c e^{kt}; if k is positive (negative) there is an increasing (decreasing) rate of events, and if k = 0 the rate is constant. Another model for trend (with a parameter a > 1) was considered by Bartholomew (1956), who found a sequential test for randomness against this model for trend.
If events occur more quickly, the intervals X_i between events will become shorter on average; thus the X_i, as naturally indexed in time, become smaller. If the J transformation is applied to the X_i as naturally indexed, the U_(i) values will tend toward 1. If events occur more slowly, the U_(i) tend toward zero. Thus trend will be detected by the statistics which detect movements of the U-values toward 0 or 1. Another alternative to a constant arrival rate for events is the possibility that they occur periodically; then the intervals between events are of fairly constant length. The coefficient of variation C_V of the X_i will be small, and the U_(i) from the J transformation will be superuniform. Statistics to detect periodicity of events must be well adapted to detect superuniformity of the U_(i). In contrast, if there is a wide disparity between the intervals X_i, when the longer intervals appear too long compared with the shorter intervals, the C_V of the X_i will be large.
It is possible that the intervals between events, even if exponentially distributed, are not independent. Lack of independence is usually difficult to detect, and the appropriate test statistic will depend on how the intervals are related. Here the indexing of X_i will again be important; as naturally indexed in time, the X_i might, for example, be tested for autocorrelation. An interesting example of events for which the J transform produces superuniform U_(i), probably because of lack of independence, is given by the events recorded by the dates T_j marking the reigns of kings and queens of England (see Pearson (1963)). For further remarks on the problems of correlated intervals see Lewis (1965).
It might be decided to base a test for uniformity of the U_i on the spacings D_i between the U_i. The spacing D_i is X_i/T_n, so that the remarks above concerning the indexing of the X_i will apply also to the spacings D_i. The sizes of the spacings, as naturally indexed by time, will be important both in tests for trend and for independence, whereas the variance of the spacings will be important in measuring periodicity or great disparity between the time intervals.
10.8 GROUP 1 TESTS

Here the set X will be tested directly for Exp(0,β), with no transformation N, J, or K. Tests available include the Pearson chi-square test and modern adaptations, discussed in Chapter 3, EDF tests and regression tests, discussed in Chapters 4 and 5, and tests based on sample moments. No further comments will be offered on chi-square tests, but the other tests will be reviewed below, in the special context of tests for exponentiality.
10.8.1 EDF Tests

δ_i = max{|Z_(i) − (i − 1)/n|, |Z_(i) − i/n|}, i = 1, ..., n,

and the statistic is S* = Σ_{i=1}^n δ_i. Finkelstein and Schafer have provided tables for S* based on Monte Carlo studies.
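A sketch of S* in Python (illustrative names; this assumes Z_(i) = 1 − exp(−X_(i)/β) is the probability integral transform for the fully specified Exp(0,β)):

```python
import math

def s_star(x, beta):
    """Finkelstein-Schafer S*: sum over i of
    delta_i = max(|Z_(i) - (i-1)/n|, |Z_(i) - i/n|),
    with Z_(i) = 1 - exp(-X_(i)/beta)."""
    z = sorted(1.0 - math.exp(-xi / beta) for xi in x)
    n = len(z)
    return sum(max(abs(zi - (i - 1) / n), abs(zi - i / n))
               for i, zi in enumerate(z, start=1))
```

For a single observation at the median (X = β log 2), Z_(1) = 0.5 and S* = 0.5.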
Most regression and correlation tests described in Chapter 5 are devised for Case 3, where both location and scale parameters are unknown. However, two tests are designed specifically for testing H0: these are the statistics R(X, m) and R(X, H) of Section 5.11.

Gurland and Dahiya (1970) and Dahiya and Gurland (1972) have discussed a general method of deriving a test statistic based on s sample moments. Suppose the r-th sample moment is m_r = Σ_{i=1}^n (X_i)^r/n; when the Dahiya-Gurland method is applied to the test for Exp(0,β) we have (Currie and Stephens, 1984)
Group 1 tests

R(X, m) = .958, Z = 15{1 − R²(X, m)} = 1.24 (p = .25)
R(X, H) = .950, Z = 15{1 − R²(X, H)} = 1.47 (p = .15)
Q1 = 0.977 (p = 0.32)
Q2 = 3.384 (p = 0.18)
Q3 = 5.081 (p = 0.16)
E 10.8.3 Example
Values of EDF statistics for the X-data of Table 10.1 are given in Table 10.2, Part 1. For these data X̄ = 121.2; thus, following Section 4.9, we calculate

Z_(1) = 1 − exp(−12/121.2) = 0.094,

and so on. The values of Z are given in Table 4.13, column 3. Then equations (4.2) give W² = 0.277 and the other values given in Table 10.2. Some p-levels of the statistics (used with upper tail only) are recorded. Several are significant at the 5% level, suggesting rejection of H0. D and S* (Section 10.8.1) are also given.
Values of regression statistics are given in Part 2 of Table 10.2. The values of R(X, m) and R(X, H) (Section 5.11) are, respectively, 0.958 and 0.950; then Z(X, m) in Section 5.11.2 is 15{1 − (0.958)²} = 1.24 and Z(X, H) is 15{1 − (0.950)²} = 1.47. These are, respectively, significant at p = 0.25 and p = 0.15 when referred to Tables 5.6 and 5.7. The value of the Shapiro-Wilk statistic W_E, which is designed for use when α is not known, is included for comparison. In Part 3 of Table 10.2 are given the values of the Dahiya and Gurland statistics Q1, Q2, and Q3. The p-values have been found from the Currie-Stephens tables.
As was stated before, a number of tests for a series of events are based on making transformation J and then testing that the n − 1 values U are U(0,1). Because these tests have been suggested in this connection they are reviewed here; there will necessarily be some overlap with Chapter 8.

Important Note: In Chapter 8 it is natural to assume that the U set has n values; when tests of Chapter 8 are applied to set U (or to set U′ after the K transformation) the value n must be replaced by m (= n − 1) in the formulas for test statistics, and in using the tables.
a) EDF tests. EDF tests (Case 0) can be used on the U-set, as described in Chapter 8. Statistics D⁺ and D⁻ are well adapted to detect a shift of U toward 0 or 1, that is, to detect trend in events (Section 10.6.2). W² and A² can also be expected to be effective for these alternatives. Notice, however, that as customarily used (employing only the upper tail of test statistics for significance) EDF statistics will not detect superuniform U; thus they will not detect periodicity in events (Section 10.6.2), or the occasions when the J transform can produce superuniformity (Section 10.6.2), unless test statistics are referred to the lower tail of the relevant null distribution.
When the model for trend in a series of events is λ(t) = ce^{kt}, as discussed in Section 10.6.2, the values U_(i), i = 1, ..., n − 1, instead of being ordered uniforms, will be an ordered sample from the density

f(u) = k e^{ku}/(e^k − 1), 0 < u < 1.

Group 2 tests

R(U, m) = 0.978; Z = 14{1 − R²(U, m)} = 0.61 (p > 0.50)
Kendall-Sherman K = 0.470

None of the above is significant at the 20% level upper tail, or the 15% level lower tail.
Cox (1955) suggested that the test for k = 0 (Poisson process) against k ≠ 0 should be based on Ū, which is the likelihood ratio statistic. Large values of Ū will indicate k > 0, that is, events are occurring more rapidly with increasing time and the U_(i) are tending to drift toward 1; similarly, low Ū indicates that events are happening less often, and the U_(i) are moving toward zero. Thus a one-tail test is used if the direction of trend is known, but in general a two-tail test is required. The median U will also detect movements of the U-values toward 0 or 1. Note that neither the mean Ū nor the median will detect superuniform observations, nor, in general, the case where there is excessive variation among the intervals between events.
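Cox's statistic is simply the mean of the J-transform uniforms; a sketch in Python (illustrative names; the normal approximation used for the standardized version is the standard one for a mean of m uniforms):

```python
def u_bar(times):
    """Mean of U_j = T_j / T_n, j = 1, ..., n - 1, from event times T_j."""
    tn = times[-1]
    m = len(times) - 1
    return sum(tj / tn for tj in times[:-1]) / m

def z_stat(times):
    """On H0, u_bar is approximately N(1/2, 1/(12 m)),
    so z is approximately standard normal."""
    m = len(times) - 1
    return (u_bar(times) - 0.5) * (12 * m) ** 0.5
```

For event times [4, 7, 9, 10] (intervals shortening, events speeding up), Ū = (0.4 + 0.7 + 0.9)/3 = 2/3 > 1/2, pointing to k > 0.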
E 10.9.2 Example
D_i = U_(i) − U_(i−1), i = 1, ..., n, where U_(0) ≡ 0 and U_(n) ≡ 1;

thus n − 1 uniforms give n spacings. The spacings are connected with the original observations X_i by D_i = X_i/T_n, i = 1, ..., n, where T_n = Σ_i X_i. Basic articles for work on spacings are by Pyke (1965, 1972). Many test statistics for exponentiality have been based on the values X_i, divided by T_n to eliminate the scale parameter β. These statistics are therefore calculated, in effect, from the values D_i, and the associated tests can be regarded as tests for uniformity of the set U, based on the spacings. Test statistics of this type are discussed both in this chapter and also in Chapter 8.
10.9.3.1 Greenwood's Statistic

Greenwood's statistic, calculated from the n spacings, is

G(n − 1) = Σ_{i=1}^n D_i².

A closely related statistic is

V = Σ_{i=1}^n (D_i − 1/n)²,

a statistic studied by Kimball (1947). It is easily shown that V = G(n − 1) − 1/n. Also, in terms of the original X_i, V can be written

V = Σ_{i=1}^n (X_i − X̄)²/T_n² = S²/(nX̄)²,

where S² = Σ_{i=1}^n (X_i − X̄)².
and

Q1 = n{nG(n − 1) − 2}²/4, (W_E0)^{−1} = n(n + 1){G(n − 1)} − n.

Tests using the upper tail of G(n − 1) or of (W_E0)^{−1} are equivalent to tests using the lower tail of W_E0, and vice versa. Thus, several statistics which have been derived from very different approaches all turn out to be equivalent to Greenwood's G(n − 1).
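The algebraic identities connecting V and G(n − 1) are easy to check numerically; an illustrative Python sketch (names not from the text):

```python
def spacings(x):
    """D_i = X_i / T_n."""
    t = sum(x)
    return [xi / t for xi in x]

def greenwood(d):
    """G(n - 1) = sum of D_i**2 over the n spacings."""
    return sum(di * di for di in d)

x = [2.0, 1.0, 4.0, 3.0]
n = len(x)
d = spacings(x)
g = greenwood(d)
v = sum((di - 1.0 / n) ** 2 for di in d)          # Kimball's V
xbar = sum(x) / n
s2 = sum((xi - xbar) ** 2 for xi in x)
assert abs(v - (g - 1.0 / n)) < 1e-12             # V = G(n-1) - 1/n
assert abs(v - s2 / (n * xbar) ** 2) < 1e-12      # V = S^2 / (n*xbar)^2
```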
A number of statistics have been devised which are directly related to Greenwood's statistic; most of these have been discussed in connection with testing for uniformity and are included in Chapter 8. The Quesenberry-Miller statistic Q (Section 8.9.2) might be useful in detecting autocorrelation in a series of events; so, also, might statistics based on high-order gaps (Section 8.9.4). We now continue with four tests based on spacings which have been developed specifically in connection with tests for exponentiality on X or X′. They are defined in terms of both D_i and the original X_i.
Moran's statistic is

M(n − 1) = −2 Σ_{i=1}^n log(nD_i)
         = −2 Σ_{i=1}^n log(X_i/X̄)
         = −2{Σ_{i=1}^n log X_i} + 2n log X̄.

The Kendall-Sherman statistic is

K(n − 1) = Σ_{i=1}^n |X_i − X̄| / (2nX̄).
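Both statistics are scale-free functions of the spacings; an illustrative Python sketch (names not from the text):

```python
import math

def moran(x):
    """M(n - 1) = -2 * sum log(n * D_i) = -2 * sum log(X_i / xbar)."""
    xbar = sum(x) / len(x)
    return -2.0 * sum(math.log(xi / xbar) for xi in x)

def kendall_sherman(x):
    """K(n - 1) = sum |X_i - xbar| / (2 * n * xbar),
    i.e. one half of sum |D_i - 1/n|."""
    n = len(x)
    xbar = sum(x) / n
    return sum(abs(xi - xbar) for xi in x) / (2.0 * n * xbar)
```

For x = [1, 2, 3, 2], K(n − 1) = (1 + 0 + 1 + 0)/16 = 0.125.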
10.9.3.6 EDF Tests for Spacings

F_D(x; n) = P(D_i ≤ x) = 1 − (1 − x)^{n−1}, 0 < x < 1.

This is a fully specified distribution, and it might be thought that EDF tests, Case 0 (Section 4.4), could be made. The Probability Integral Transformation would be Z_(i) = 1 − (1 − D_(i))^{n−1}, where the D_(i) are the ordered spacings, and EDF statistics could be calculated from the Z_(i). In particular, suppose D is the Kolmogorov statistic. Because the spacings are not independent (their sum is 1), the Z_(i) are not ordered uniforms, so Case 0 tables cannot be used. However, fortuitously, the Z_(i) are exactly those which arise in Srinivasan's test (Section 10.8.1), and D will be the same as the D of that section, and will be referred to the tables referenced there.
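A sketch of the transformation and of the Kolmogorov statistic computed from it (illustrative Python; Srinivasan's tables, not the Case 0 tables, would supply the p-value):

```python
def spacing_pit(d):
    """Z_(i) = 1 - (1 - D_(i))**(n - 1) for the n ordered spacings."""
    n = len(d)
    return [1.0 - (1.0 - di) ** (n - 1) for di in sorted(d)]

def kolmogorov(z):
    """D = max(D+, D-) computed from ordered values z."""
    n = len(z)
    dplus = max(i / n - zi for i, zi in enumerate(z, start=1))
    dminus = max(zi - (i - 1) / n for i, zi in enumerate(z, start=1))
    return max(dplus, dminus)
```

For the perfectly even spacings d = [0.5, 0.5], Z = [0.5, 0.5] and D = 0.5.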
Let p be a value between 0 and 1, and let r = [np], that is, the greatest integer less than or equal to np. The Lorenz curve statistic, derived from the n ordered spacings D_(i), is

L_n(p) = Σ_{i=1}^r D_(i).
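That is, L_n(p) is the total length of the r smallest spacings; a one-line sketch (illustrative Python):

```python
def lorenz_stat(d, p):
    """L_n(p): sum of the r smallest spacings, r = floor(n * p)."""
    r = int(len(d) * p)          # greatest integer <= n*p (p and spacings >= 0)
    return sum(sorted(d)[:r])
```

For d = [0.1, 0.4, 0.2, 0.3] and p = 0.5, r = 2 and L_n(0.5) = 0.1 + 0.2 = 0.3.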
E 1 0.9.3.7 Example
Part 2 of Table 10.3 shows the values of some of the above statistics, based on the 15 spacings D_i given in Table 10.1. For example, Moran's statistic is

M(14) = −2[log(15 × .041) + log(15 × .031) + ⋯ + log(15 × .179)] = 19.232;

C is then 1 + 16/90 = 1.178, so M/C = 16.329. This must be compared to χ²_14, to give a p-level = 0.30, approximately.
10.11.1 Introduction
EDF and regression tests, applied directly to set X′, have not been much emphasized, perhaps because the X′ would first be ordered, and the information given by the indexing of the X′ is then lost. Other tests on the X′ themselves have been proposed by Epstein (1960a, 1960b). These make use of Result 4 of Section 10.3.1, but now applied to X′: on H0, y_i = 2X′_i/β has the χ² distribution with 2 degrees of freedom.
E 10 . 11.2 Example
Values of EDF statistics and regression statistics, calculated from the X′ given in Table 10.1, are given in Table 10.4.
Group 3 tests
Regression statistics
Values of EDF statistics based on the U′ derived from the data set X in Table 10.1 are given in Table 10.5. The statistics are found by using the U′_(j) in equations (4.2), and p-values are found from Tables 4.2.1 and 4.2.2. For m > 15, (Ū′ − 0.5)(12m)^{1/2} will have approximately the standard normal distribution.
Lewis (1965) also suggested the statistic U′_(r) as test statistic, with r a suitable integer. The statistic given by Z_r = (n − r)U′_(r)/{r(1 − U′_(r))} has, on H0, the F_{p,q} distribution with p = 2r and q = 2(n − r) degrees of freedom. A commonly suggested statistic is the median U′_(r), with r = (n + 1)/2 or n/2. For IFR alternatives, U′_(r) can be expected to be large, so that Z_r is significant in the upper tail of F_{p,q}; for DFR alternatives Z_r will be significant in the lower tail. This statistic was again examined by Gnedenko, Belyayev, and Solovyev (1969), by Fercho and Ringer (1972), and by Tiku, Rai, and Mead (1974); the statistic T_E of Tiku, Rai, and Mead (1974, Section 4), designed for testing H0, is equivalent to Z_r.
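A sketch of Z_r (illustrative Python; u plays the role of the m = n − 1 uniforms from the K transformation):

```python
def lewis_z(u, r):
    """Z_r = (n - r) * U_(r) / (r * (1 - U_(r))); on H0 this is
    F-distributed with 2r and 2(n - r) df, where n = len(u) + 1."""
    n = len(u) + 1
    ur = sorted(u)[r - 1]        # r-th order statistic, 1-based
    return (n - r) * ur / (r * (1.0 - ur))
```

With u = [0.2, 0.5, 0.8] (so n = 4) and r = 2, U_(2) = 0.5 and Z_2 = 1.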
In Table 10.5 are given values of Ū′ and U′_(7) and U′_(8) for the U′ set derived from the data of Table 10.1; p-values are found from Table 8.5 and the beta(x; 7,8) and beta(x; 8,7) distributions (Sections 8.8.2 and 8.10.1).
Group 3 tests

D⁺ = 0.374 (p = 0.015)    D⁻ = 0.099 (p > 0.25)
R(U′, m) = 0.90, Z = 2.62 (p < 0.01)
The total time on test statistic was defined in Section 10.5.3 as

T′_r = Σ_{i=1}^r X′_i.

Suppose, for given k, we define

V_k = T′_k/T′_n, k = 1, ..., n − 1;

then the cumulative statistic is

V_n = Σ_{k=1}^{n−1} V_k = Σ_{i=1}^n (n − i)X′_i/T′_n = n − Σ_{i=1}^n iX′_i/T′_n,

or, in terms of the spacings D′_i = X′_i/T′_n,

V_n = Σ_{i=1}^n (n − i)D′_i = n − Σ_{i=1}^n iD′_i.
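These identities are easy to verify numerically; an illustrative Python sketch, with xp playing the role of the normalized spacings X′:

```python
def v_n(xp):
    """V_n = sum over k of T'_k / T'_n, k = 1, ..., n - 1."""
    tn = sum(xp)
    t, acc = 0.0, 0.0
    for v in xp[:-1]:
        t += v
        acc += t / tn
    return acc

xp = [4.0, 3.0, 2.0, 2.0]
n = len(xp)
tn = sum(xp)
dp = [v / tn for v in xp]                          # D'_i
v = v_n(xp)
assert abs(v - sum((n - i) * dp[i - 1] for i in range(1, n + 1))) < 1e-12
assert abs(v - (n - sum(i * dp[i - 1] for i in range(1, n + 1)))) < 1e-12
u = [sum(xp[:j]) / tn for j in range(1, n)]        # U'_j computed from xp
assert abs(v - (n - 1) * (sum(u) / (n - 1))) < 1e-12   # V_n = (n-1) * mean U'
```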
Another group of tests, proposed for the set X′ by Bickel and Doksum (1969), includes

S1* = Σ_{i=1}^n X′_i H_i/T′_n = Σ_{i=1}^n D′_i H_i, where H_i = −log(1 − i/(n + 1)),

and

S2* = Σ_{i=1}^n X′_i(−log H_i)/T′_n = −Σ_{i=1}^n D′_i log H_i.

Recall, in these formulas, that T′_n = T_n (Equation 10.7). Statistics S1* and S2* have a resemblance to the regression statistics of Chapter 5, but there is an important difference: the set X′_i are not ordered in the above formulas, and they would be for regression statistics.
Statistic S1* is asymptotically most powerful against Weibull alternatives for X, and statistics S2* and S3* against two further alternatives (the Makeham and linear failure rate alternatives) discussed by Bickel and Doksum. Other statistics may be derived using the properties of the X′_i. For example, on average, X′_i < X′_1 for i > 1 if the distribution is IFR, and a test could be based on this property.
Values of the Greenwood G(14), Moran M, and Quesenberry-Miller Q, calculated from the D′_i, are given in Table 10.5. These tend to have lower p-levels than the corresponding statistics based on the D_i in Table 10.3. The X′ set, and hence the D′ set, contains a zero, and the Moran statistic (also the Anderson-Darling A²) has been calculated using the correction suggested in Section 10.10; X′_(1) has been given the value 0.25, since the rounding interval is 1.
Several of the statistics given in Section 10.11.4 are equivalent to the statistic Ū′ discussed in Section 10.11.3. Since V_n = (n − 1)Ū′, V_n is the same as S of Section 10.11.3. Also, V_n and S3* are related by V_n = n + (n + 1)S3*. Thus V_n, S, and S3* are all equivalent to Ū′ as test statistics.
Another statistic equivalent to Ū′ is the Gini statistic G_n, discussed by Gail and Gastwirth (1975). G_n is related to the Lorenz curve discussed in Section 10.9.3.8 above, and, like the Lorenz curve, derives from concepts used in economics. The Gini index for a distribution is twice the area between the population Lorenz curve y = L(p) and the line y = p. The Gini statistic G_n derived from this index can be calculated in two ways. In terms of the original X_i, G_n is

G_n = Σ_{i=1}^{n−1} Σ_{j=i+1}^n |X_i − X_j| / {(n − 1)T_n},

and, in terms of the normalized spacings X′_i,

G_n = Σ_{i=1}^{n−1} iX′_{i+1} / {(n − 1)T′_n}.
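The two expressions agree, since Σ_{i<j}|X_i − X_j| = Σ_i iX′_{i+1}; a numerical check (illustrative Python, names not from the text):

```python
def gini_pairwise(x):
    """G_n from pairwise differences: sum_{i<j} |X_i - X_j| / ((n-1) T_n)."""
    n, t = len(x), sum(x)
    s = sum(abs(x[i] - x[j]) for i in range(n) for j in range(i + 1, n))
    return s / ((n - 1) * t)

def gini_spacings(x):
    """G_n from normalized spacings:
    sum_{i=1}^{n-1} i * X'_{i+1} / ((n-1) T'_n)."""
    xs = sorted(x)
    n = len(xs)
    e = [xs[0]] + [xs[i] - xs[i - 1] for i in range(1, n)]
    xp = [(n - i) * e[i] for i in range(n)]        # xp[i] holds X'_{i+1}
    t = sum(xp)                                    # T'_n = T_n
    return sum(i * xp[i] for i in range(1, n)) / ((n - 1) * t)

x = [2.0, 1.0, 4.0, 3.0]
assert abs(gini_pairwise(x) - gini_spacings(x)) < 1e-12
```

For x = [2, 1, 4, 3] both forms give 1/3; under H0, G_n has mean 1/2.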
The values of all the various statistics can now be used to give an overall assessment of the X-set in Table 10.1.

a) The direct tests on X in Group 1 (Table 10.2) point towards rejection of H0: that the X are Exp(0,β), with the large value of D⁺ indicating that there are too many small values of X compared with large values. This implies a DFR population for X (see next section and Table 10.6). However, transformation J, from X to U, gives little information, although Greenwood's statistic is quite large (0.167), implying high dispersion among X-values.
b) The values X are lifetimes, and the discussion in this chapter suggests that tests on X′ and on U′ will be informative. The high value of D⁺ for the EDF tests on U′ (Table 10.5) means that the U′ set tends toward zero, and this is confirmed by the low values of Ū′, U′_(7) and U′_(8). These low values arise because the normalized spacings X′_i are, on the whole, increasing with i, implying that the original X are from a DFR distribution (Section 10.6.3). Thus the Group 1 tests and the tests on U′ give supporting conclusions.
c) The indexing of set X′ gives information about the parent population for X. The Moran or Greenwood statistics found from the spacings D′_i, which are, in effect, symmetric functions of the X′_i (in which the indexing is lost), are not significant. Neither are direct EDF tests on X′, based on first ordering the X′-set, so that the indexing is again lost.

d) The lack of significance of the statistics where indexing is lost in set X′ indicates how the indexing can be important. Here, when the X′ are only regarded as a random sample, as in c) above, they are acceptably exponential, and one would then accept that the original X-set is exponential; however, when the indexing of the X′_i gives information, as it does in the tests on U′, the indications are that the X-set comes from a DFR distribution.
The author (Stephens, 1978, 1986) has conducted a large power study on the various test procedures for H0, using Monte Carlo samples of sizes n = 10 and n = 20 from a wide range of alternatives. Tables 10.6 and 10.7 give a selection of these results for n = 20. These permit some comparisons between tests, applied to a general random sample, when the special considerations of preserving indexes, etc., are not important.
TABLE 10.6 Power Studies for Tests of Exponentiality, Origin Known (n = 20)

[Tables 10.6 and 10.7 give power, in per cent, for the statistics of this chapter against IFR alternatives (including χ², U(0,1), and Weib(1.5)) and DFR alternatives (including χ²_1, Weib(0.8), lognormal(1), and Cauchy); Table 10.7 also marks the significant tail (L or U) of each one-sided statistic.]
Since many statistics have a significant tail for IFR alternatives and the opposite tail significant for DFR alternatives, such statistics can be used with one tail only if it is desired to guard against only one of these types of alternative. The tests must be used with care, since they will be biased against the other alternative family. Table 10.7 gives power comparisons when some statistics are used with one tail only; the size of the test is now 5%. For a fair comparison, statistics always used with one tail only, such as EDF or sample moment statistics, should now be compared for test size 5%. The significant tail of one-sided statistics is indicated in the table.
When the direction of the alternative is known, the Group 3 statistic Ū′ is again effective, and is now overall better than the statistics in Group 3 or Group 1. The Greenwood, Sherman, and Lorenz statistics compare with Ū′ but on occasion are less powerful; U_(n/2) is poor in terms of power. Moran's statistic is again best of all against gamma and Weibull alternatives, but drops behind Ū′ for other alternatives. Again, EDF statistics (Group 1) and EDF statistics (Group 3) show remarkably similar results. The power results reported form part of a larger study, and values are available (showing similar trends) for n = 10 and n = 50.
used only Group 1 EDF tests on set Y. From the various comparisons the following points emerge:

(1) EDF statistics on set Y are slightly less powerful than direct EDF statistics where α and β are both estimated, as in (b) above, except possibly for alternatives with a very high probability of small values (e.g., χ²_1 or Weibull (0.5)).
(2) The direct EDF statistics and (method (b) above) and the Shapiro-
W llk W e give sim ilar power results, but other correlation statistics
have low er power.
(3) The results of T lku, R a i, and Mead suggest that with r =
[(n + 1)/2] (equivalent to their statistic T e ), has better power than has
the median U(n/2) ^ tests of Hq when a was known to be zero, but is
still not as powerful overall as W^ o r in direct EDF tests. These two
are, therefore, the recommended statistics for omnibus tests, since there
is a problem of consistency with W e (see Section 5.12).
(4) The significant tall for W jj o r T g is the same as that for W g, and oppo
site to that for G. These statistics, and also D^ o r D** for direct EDF
statistics, can be used as one-tail tests on one-sided fam ilies of a lte r
natives (DFR o r !F R ), with a consequent increase in power.
(5) Splnelll and Stephens showed that when a. is known, it is best, on the
whole, to use this fact, and therefore to apply the tests given e a rlie r in
the chapter. Note that the opposite effect has been observed in connection
with EDF tests for normality (Section 4'. 16. 2 ).
From the plethora of tests and power results which have been given in this
chapter, it would be useful to extract a simple strategy for the practical
statistician to follow, but this is not easy. Perhaps we should summarize by
simply repeating four themes which have surfaced throughout the chapter:

(a) Test statistics should be regarded as giving information about the
data and their parent population, rather than as tools for formal testing
procedures.

(b) If the data are intervals between events, and if the times of these
events are known, the natural questions to ask will be more readily answered
by converting these times to the U-set via the J-transformation, and looking
at the configuration of the U.

(c) If the data are lifetimes, one must ask what alternative populations
are of interest. Information on IFR or DFR parent populations can be deduced
from the spacings between the X; this leads naturally to the N-transformation
(which gives the normalized spacings) or the K-transformation, with tests to
follow on the U'-set. This approach has been much advanced, especially as
the T_i which lead to the U' have the "total time on test" interpretation; however,
very similar information is given by direct EDF tests on the original X.
TESTS FOR THE EXPONENTIAL DISTRIBUTION 457
R EFE R E N C E S
Dahiya, R. C. and Gurland, J. (1972). Goodness of fit tests for the gamma
and exponential distributions. Technometrics 14, 791-801.

Epstein, B. (1960a). Tests for the validity of the assumption that the underlying
distribution of life is exponential. Part I. Technometrics 2, 83-101.

Epstein, B. (1960b). Tests for the validity of the assumption that the underlying
distribution of life is exponential. Part II. Technometrics 2, 167-183.

Sherman, B. (1957). Percentages of the ω_n statistic. Ann. Math. Statist. 28,
259-261.
11.2 PROBABILITY PLOTS
where Z_(i) is the ith order statistic from the standardized distribution, and
m_i is E{Z_(i)}. Similarly, for 0 < p_i < 1, the p_i quantile is μ + σ[G⁻¹(p_i)],
so the points plotted are

{G⁻¹(p_i), Y_(i)}   (11.2)

[Table 11.1 (excerpt). Abscissa G⁻¹(p_i) for some common families:
Uniform: p_i
Normal: Φ⁻¹(p_i)
Lognormal: Φ⁻¹(p_i), plotted against log(y)
Laplace: log(2p_i) for p_i ≤ 1/2; log[1/(2 − 2p_i)] for p_i > 1/2]
will have little effect on the appearance of the main body of the plot. There
may be some noticeable differences, however, for extreme order statistics
from long-tailed distributions. (The reader should note that in Chapter 2 the
p_i of (11.2) was symbolized by F_n(y), the empirical distribution function.)
Data for this example consist of the first 40 values from the NOR data set,
which were simulated from the normal distribution with μ = 100 and σ = 10.
A normal probability plot is shown in Figure 11.1. The normal distribution
provides a good fit to the data. Note that the intercept and slope of a straight
line drawn through the points provide estimates of the theoretical mean and
standard deviation. (The reader should compare Figure 11.1 to Figure 2.15,
where the full NOR data set is plotted with X and Y axes interchanged from
Figure 11.1.)

466 MICHAEL AND SCHUCANY

FIGURE 11.1 Normal probability plot of the first 40 observations from the
NOR data set.
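As a sketch of the computations behind such a plot (a hypothetical helper, not code from this chapter), the plotting positions p_i = (i − 0.5)/n, the standard normal quantiles Φ⁻¹(p_i), and a least-squares line whose intercept and slope estimate μ and σ can be computed as follows:

```python
from statistics import NormalDist

def normal_plot_coords(y):
    """Return (theoretical quantiles, ordered data) for a normal probability plot."""
    y = sorted(y)
    n = len(y)
    p = [(i - 0.5) / n for i in range(1, n + 1)]   # plotting positions
    q = [NormalDist().inv_cdf(pi) for pi in p]     # abscissa: standard normal quantiles
    return q, y

def fit_line(q, y):
    """Least-squares line y = a + b*q; a estimates mu, b estimates sigma."""
    n = len(q)
    qbar, ybar = sum(q) / n, sum(y) / n
    b = sum((qi - qbar) * (yi - ybar) for qi, yi in zip(q, y)) / \
        sum((qi - qbar) ** 2 for qi in q)
    return ybar - b * qbar, b

# Data lying exactly on a line with mu = 100, sigma = 10:
q, y = normal_plot_coords([100 + 10 * NormalDist().inv_cdf((i - 0.5) / 40)
                           for i in range(1, 41)])
mu_hat, sigma_hat = fit_line(q, y)
```

For data that fall exactly on a straight line, the fitted intercept and slope recover μ and σ exactly; for real data they give graphical estimates only.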
The method of the previous section can be applied directly in any situation
where the data consist of some known subset of order statistics from a random
sample. This is because the available Y_(i) are still sample quantiles
from the complete sample, and appropriate quantiles of G can be calculated
as before. Although only a portion of the observations from the hypothetical
complete sample can be plotted, the plotted positions of the uncensored points
are the same as when the complete sample is available. The only difference
ANALYSIS OF DATA FROM CENSORED SAMPLES 467
Data for this example consist o f the sm allest 20 values among the firs t 40
values listed in the NOR data set. A norm al probability plot is shown in
Figure 11.2. This plot is m erely an enlargement of the low er portion of the
plot shown in Figure 11.1.
The plotting procedure is the same for Type I as fo r Type 2 sin gly-
censored sam ples; however, with Type I censoring there is one additional
piece of information, namely the censoring time, that can be represented
graphically. Suppose we observe the r sm allest observations from a random
[FIGURE 11.2: normal probability plot of the 20 smallest of the first 40 NOR observations; abscissa: Theoretical Quantiles.]
sample of size n. For a location and scale family we plot the points (G⁻¹(p_i),
Y_(i)) for i = 1, 2, ..., r. Now suppose that the censoring is Type 1 and that
the observations are all those that are less than some predetermined value t;
thus Y_(r+1) must be greater than t. This additional information can be given
by plotting the point {G⁻¹(p_{r+1}), t} with a symbol such as an arrow pointing
up, thus indicating the range of possible values for Y_(r+1). Nelson (1973)
illustrates this technique.
F̂_n(y) = 1 − Π_{j ∈ S: Y_(j) ≤ y} (n − j)/(n − j + 1)
(This estimator is undefined for y > Y_(n) if Y_(n) is not a failure time.) In the
case of a complete sample the K-M estimator reduces to the familiar EDF
F_n(y) = (the number of Y_(j) ≤ y)/n, discussed in Chapter 4. The estimated
probability at the point Y_(i) provided by the Kaplan-Meier estimator is given
by
p_i(K-M) = 1 − Π_{j ∈ S, j ≤ i} (n − j)/(n − j + 1)   (11.3)
for i ∈ S. Herd (1960) and Johnson (1964) propose the similar quantile probabilities

p_i(H-J) = 1 − Π_{j ∈ S, j ≤ i} (n − j + 1)/(n − j + 2)   (11.4)
p_i(N) = 1 − Π_{j ∈ S, j ≤ i} exp[−1/(n − j + 1)]   (11.5)
p_i(c) = 1 − [(n − c + 1)/(n − 2c + 1)] Π_{j ∈ S, j ≤ i} (n − j − c + 1)/(n − j − c + 2)   (11.6)
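As an illustrative sketch (not code from the chapter), the quantile probabilities (11.3)–(11.6) can be computed from the ranks of the uncensored observations; for a complete sample p_i(K-M) reduces to i/n, and the ordering p_i(K-M) ≥ p_i(N) ≥ p_i(H-J) noted in the example holds term by term:

```python
import math

def quantile_probs(S, n, c=0.3175):
    """Quantile probabilities for a multiply right-censored sample.

    S is the sorted list of 1-based ranks of the uncensored observations
    among all n ordered values.  Returns dicts keyed by rank i in S."""
    km, hj, nel, pc = {}, {}, {}, {}
    prod_km = prod_hj = prod_nel = prod_c = 1.0
    for i in S:
        prod_km *= (n - i) / (n - i + 1)              # Kaplan-Meier (11.3)
        prod_hj *= (n - i + 1) / (n - i + 2)          # Herd-Johnson (11.4)
        prod_nel *= math.exp(-1.0 / (n - i + 1))      # Nelson (11.5)
        prod_c *= (n - i - c + 1) / (n - i - c + 2)   # formula (11.6)
        km[i] = 1.0 - prod_km
        hj[i] = 1.0 - prod_hj
        nel[i] = 1.0 - prod_nel
        pc[i] = 1.0 - (n - c + 1) / (n - 2 * c + 1) * prod_c
    return km, hj, nel, pc

# Complete sample of size 10: the K-M probabilities telescope to i/n.
km, hj, nel, _ = quantile_probs(list(range(1, 11)), 10)
```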
Data for this example consist of the 100 observations from the WE2 data set,
which were simulated from the Weibull distribution with σ = 1 and m = 2.
The data were censored as follows: observations among the first, second,
third, and fourth sets of 25 were recorded that were less than 1, 0.75, 0.50,
and 0.25, respectively. This type of progressive Type 1 censoring could have
occurred if four sets of 25 devices were placed on test at times 0, 0.25,
0.50, and 0.75, with the experiment terminating at time 1. The 100 values
(censored and failed) were ranked from smallest to largest. The 33 failure
times are listed in Table 11.2 along with four different choices of quantile
probabilities. Of the 67 censored devices, 23, 17, 17, and 10 devices had
censoring times of 0.25, 0.50, 0.75, and 1.00, respectively. One purpose
of this example is to show how close the agreement can be for different
choices of quantile probabilities. Note also the relationship p_i(K-M) > p_i(N)
> p_i(H-J). A Weibull probability plot for the data using p_i(c) with c = 0.3175
is shown in Figure 11.3.
The remarks made in Section 11.2.1 and Chapter 2 concerning the interpretation
of probability plots with complete samples hold also for the case of
multiple censoring. However, in the case of a multiply right censored
sample, the effect of censoring is to increase the variability on the right-hand
side of the plot.
[Table 11.3: device lifetimes (time, with failed component A or B indicated) and quantile probabilities for the device, component A, and component B.]
Data for this example consist of the lifetimes, measured in millions of operations,
of 40 mechanical devices. The devices were placed on test at different
times, and three were still working at the end of the experiment. The
data are presented in Table 11.3. Only two modes of failure were observed:
either component A failed or component B failed. These two components are
identical in construction, but they are subject to different stresses when the
device is operated. Thus their life distributions need not be identical. Quantile
probabilities are given in Table 11.3 for the device as a whole, component
A, and component B under the columns headed "device," "A," and "B,"
respectively. The data for the device are multiply censored since the 35th,
36th, and 40th ordered lifetimes are incomplete. In addition, observations
on component A are censored by failures of component B and vice versa.
This is an example of random censoring caused by competing modes of
failure.
Probability plots for the individual components were constructed using
several common life distributions. The lognormal distribution seemed to
offer the best fit. Lognormal probability plots are shown for components A
and B in Figure 11.4(A). The intercepts and slopes of the two lines suggested
by the plots appear to be different. This raises the possibility that, while
the life distributions of the two components may be of the same family, the
distribution parameters may be different. A distracting feature is the noticeable
gap near the center of the plots. The natural tendency is to expect too
much orderliness and to declare that something unusual has occurred. But
such anomalies frequently arise by chance and should not be taken too seriously.
The reader is referred to Hahn and Shapiro (1967), pages 264-265,
for an example of a plot in which the same unusual feature has arisen by
chance.
If the life distributions for components A and B are independent and lognormal,
then the life of the device is distributed as the minimum of two lognormal
random variables. For illustration we assume the equality of parameters.
The cdf of the device is then given by

F(y|μ, σ) = 1 − {1 − Φ[(log(y) − μ)/σ]}²

and the corresponding theoretical quantiles satisfy

log(y_p) = μ + σ Φ⁻¹[1 − (1 − p)^{1/2}]   (11.7)
FIGURE 11.4(B) Probability plots for the mechanical device where the
assumed distribution is the minimum of two i.i.d. lognormals.
Section 11.3. Note finally that the derivation of special theoretical quantiles
given in (11.7) would not have been necessary if we had modeled the lifetimes
of the components as exponential, extreme-value, or Weibull random variables.
This is because the minimum of any number of independent identically
distributed random variables from one of these families is also of the same
family.

Although the development in the last example is somewhat speculative
in nature, it does serve to illustrate the versatility and usefulness of probability
plotting, as well as its subjective and limited interpretability.
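The inversion behind these quantiles can be sketched in code (an illustrative helper, not code from the chapter): since the device cdf is F(y) = 1 − {1 − Φ(z)}² with z = (log y − μ)/σ, setting F(y) = p and solving gives z = Φ⁻¹(1 − √(1 − p)), so the p quantile is exp{μ + σΦ⁻¹(1 − √(1 − p))}:

```python
import math
from statistics import NormalDist

def min_lognormal_cdf(y, mu=0.0, sigma=1.0):
    """cdf of the minimum of two i.i.d. lognormal lifetimes."""
    z = (math.log(y) - mu) / sigma
    return 1.0 - (1.0 - NormalDist().cdf(z)) ** 2

def min_lognormal_quantile(p, mu=0.0, sigma=1.0):
    """Inverse cdf: the p quantile of the device lifetime."""
    z = NormalDist().inv_cdf(1.0 - math.sqrt(1.0 - p))
    return math.exp(mu + sigma * z)
```

A round trip F(F⁻¹(p)) = p confirms the inversion.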
There are other more complicated types of multiple censoring which can
arise in practice. A few of these will be discussed below. The thought to
p̂_i(c) = [(n − c + 1)/(n − 2c + 1)] Π_{j ∈ S, j ≥ i} (j − c)/(j − c + 1)   (11.8)
[FIGURE 11.5: probability plots for four families (including the gamma) commonly used to model lifetimes.]
intervals which are assigned probability, and the last is the largest value for
which F_n(y) is defined. The four probabilities used are 0.028, 0.099, 0.127,
and 0.222. The first three are the midpoints of the jumps and the fourth is
equal to F_n(340). Probability plots are shown in Figure 11.5 for four families
commonly used to model lifetimes. The lognormal, gamma (with origin 0
and shape near 1), and Weibull distributions all appear to fit the data well.
These results are not inconsistent since the gamma distribution described
(an exponential distribution with origin zero) is a member of the Weibull family,
and the lower portions of the Weibull and lognormal cdfs are very similar.
Any conclusions, however, are highly tentative because of the small
sample size and the severity of the censoring. If we use a jackknife technique
or the theory of m.l.e. (see Turnbull, 1976) to estimate the variances of
probabilities assigned to each of the four intervals, it then appears that none
of the models considered in Figure 11.5 can be soundly rejected.
Grouping is perhaps the most common form of censoring encountered in
practice. Each grouped observation is both right- and left-censored. Quantile
probabilities can be calculated using the formula for a complete sample. One
approach to constructing a probability plot is to represent each observation
with the endpoint (or midpoint) of the interval in which it falls. The resulting
plot will have a stairstep appearance with the number of steps equal to the
number of groups. One advantage of this approach is that the sample size is
evident. A simplification is to plot only one point per group.
operation at the same time. If a unit fails, it is instantly replaced with a new
unit. It is assumed that the lifetimes of the original and replacement units
are independent and identically distributed with cdf F. The exact times of
failures are known but not the identities of the failed units. Except for the
first failure, then, we cannot be sure of the ages of the failed units. We
thus observe a superposition of renewal processes. The failure times are
not censored here, but the identities and therefore the ages of the failed units
are censored in a sense.

Trindade and Haugh (1979) describe a method for the nonparametric estimation
of F in the above situation. The renewal function, M, is estimated
using a straightforward nonparametric method. The parent cdf is then estimated
by exploiting the relationship of F to M through the fundamental renewal
equation. For any particular set of points in time, the estimate of F
provides appropriate probabilities for determining corresponding theoretical
standard quantiles for purposes of probability plotting. Again we will emphasize
that a meaningful probability plot can always be constructed as long as
the parent cdf can be estimated using a nonparametric method.
Below are given the steps required in constructing a probability plot with
uncensored, singly-censored, multiply right-censored, and multiply left-censored
data. The user must provide a value of the constant c with 0 < c < 1.
The values c = 0.3175 and c = 0.5 are popular.

(1) Let Y_(1), Y_(2), ..., Y_(n) denote n ordered observations, some of
which may be censored, and let S be the set of subscripts corresponding to
the observations in the ordered list that are not censored.

(2) Determine quantile probabilities p_i for each i ∈ S using one of the
following formulas:
p_i = (i − c)/(n − 2c + 1)   for uncensored or singly-censored samples

p_i = 1 − [(n − c + 1)/(n − 2c + 1)] Π_{j ∈ S, j ≤ i} (n − j − c + 1)/(n − j − c + 2)   for multiply right-censored samples   (11.9)

p_i = [(n − c + 1)/(n − 2c + 1)] Π_{j ∈ S, j ≥ i} (j − c)/(j − c + 1)   for multiply left-censored samples
(3) Enter Table 11.1 and find the line corresponding to the hypothesized
family of distributions. Plot the entry under "abscissa" versus the entry
under "ordinate" for each i ∈ S.
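Step (2) for the multiply right-censored case can be sketched as follows (a hypothetical helper, assuming the standard Michael-Schucany form of the formula). With no censoring the product telescopes, recovering (i − c)/(n − 2c + 1); and taking c = 0 recovers the Herd-Johnson probabilities:

```python
def probs_right_censored(S, n, c):
    """Quantile probabilities for a multiply right-censored sample.

    S: sorted 1-based ranks of the uncensored observations among n ordered values."""
    p = {}
    prod = 1.0
    for i in S:
        prod *= (n - i - c + 1) / (n - i - c + 2)
        p[i] = 1.0 - (n - c + 1) / (n - 2 * c + 1) * prod
    return p

n, c = 20, 0.3175
# No censoring: the formula reduces to the complete-sample expression.
p = probs_right_censored(list(range(1, n + 1)), n, c)
```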
11.3.1 EDF Statistics
TABLE 11.4

i    t_(i)    U_(i)     Z_(i)
1    0.1     .00995    .03979
2    0.2     .01980    .07918
3    0.3     .02955    .11815
4    0.4     .03921    .15681
5    0.7     .06761    .27038
6    1.0     .09516    .38056
7    1.4     .13064    .52245
and 0.005, respectively. If the data are treated as Type 2 censored, the test
statistics become ₂W² = 0.057 and ₂A² = 0.863; referring to Table
4.5 with p = r/n = 0.35 gives approximate p-values of 0.25 and 0.08.
E 11.3.1.2 Uniform Example
shows that this value is not significant even at the 50% level. If the endpoints
s = 0 and t = 0.1975 are included, then T = 1.071, which is significant at
the .20 level approximately.

When all the values from U_(s) to U_(r) (s < r) are available, the test of
H₀: that these are a subset from an ordered uniform sample, can be changed
to a test for a complete sample. There are several ways to do this. The
simplest method is as follows. For Type 1 censoring suppose the lower censoring
value is A and the upper censoring value is B, and let R = B − A;
then under H₀ the values V_(i) = (U_(i) − A)/R, i = 1, ..., r − s + 1, will be a
complete ordered uniform sample on the unit interval and can be so tested.
For Type 2 censoring, under H₀ the values U_(s+1), ..., U_(r−1) will be
distributed as a complete ordered sample from the uniform distribution between
limits A = U_(s) and B = U_(r). The transformation V_(i) = (U_(s+i) −
U_(s))/R can be made for i = 1, ..., n*, where n* = r − 1 − s and R = B − A;
the V_(i), i = 1, ..., n*, can then be tested for uniformity between 0 and 1.
Consider, again, the data of Table 11.4. The upper (Type 1) censoring value
is t = 0.1975; thus we can first transform the U_(i) as V_(i) = U_(i)/0.1975 and
then test the V_(i) for uniformity between 0 and 1. The V_(i) are then 0.050,
0.100, 0.150, 0.199, 0.342, 0.482, and 0.661. The EDF statistics are
D⁺ = 0.375, D⁻ = 0.050, D = 0.375, V = 0.426, W² = 0.413, U² = 0.085, and
A² = 2.107. Reference to Case 0 tables (Table 4.2) gives p-values of 0.21
for D, 0.07 for W², and 0.08 for A².
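These statistics can be reproduced with a short routine (an illustrative sketch using the standard Case 0 formulas from Chapter 4, not code from this book):

```python
import math

def edf_statistics(u):
    """Case 0 EDF statistics for an ordered uniform(0,1) sample u."""
    u = sorted(u)
    n = len(u)
    d_plus = max((i + 1) / n - ui for i, ui in enumerate(u))
    d_minus = max(ui - i / n for i, ui in enumerate(u))
    d = max(d_plus, d_minus)                           # Kolmogorov D
    v = d_plus + d_minus                               # Kuiper V
    w2 = sum((ui - (2 * i + 1) / (2 * n)) ** 2
             for i, ui in enumerate(u)) + 1 / (12 * n)  # Cramer-von Mises
    u2 = w2 - n * (sum(u) / n - 0.5) ** 2              # Watson U^2
    a2 = -n - sum((2 * i + 1) * (math.log(u[i]) + math.log(1 - u[n - 1 - i]))
                  for i in range(n)) / n               # Anderson-Darling
    return d_plus, d_minus, d, v, w2, u2, a2

v_values = [0.050, 0.100, 0.150, 0.199, 0.342, 0.482, 0.661]
dp, dm, d, v, w2, u2, a2 = edf_statistics(v_values)
```

Run on the seven transformed values, this reproduces the figures quoted in the example to the stated rounding.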
Suppose censoring occurs in a uniform sample other than at the ends; for
example, U_(r) and U_(r+q) might be known, but the q − 1 observations in between
are not known. A spacing U_(r+q) − U_(r) is called a q-spacing. Now
suppose that S is a q-spacing covering unknown observations, and let its
length be d. Keeping in mind the exchangeability of uniform spacings (see
Chapter 8) we exchange S with the set of all spacings to the right of S. Under
H₀ the new sample U_(1), U_(2), ..., U_(r), U'_(r+1), ..., U'_(n−q+1), where
U'_(j) = U_(j+q−1) − d, j = r + 1, ..., n − q + 1, will be distributed as an
ordered uniform sample which is right-censored at U'_(n−q+1). The process
may be repeated if there is more than one such spacing. The method can be
used only if it is known how many values are missing in the spacings. Thus
a uniform (0,1) sample with known blocks of missing observations can be
transformed to behave like a right (or left) censored sample. Techniques
for this simpler kind of censoring can then be used.
Z_(i) = U_(i) h_{r,n}(U_(r))   (11.10)

where h_{r,n}(x) = {B*(x)}^{1/r}/x; then the Z_(i), i = 1, ..., r, are distributed
like a complete uniform (0,1) sample of size r. The proof is straightforward
by change of variable.
The computations for the transformation are easily performed on a
scientific calculator, since the beta cdf of U_(r) can be expressed as the binomial sum

B*(x) = Σ_{i=r}^{n} C(n, i) x^i (1 − x)^{n−i}
FIGURE 11.6 Examples of the transformation with r = 5, n = 9: (a) U_(5) = .500; (b) U_(5) = .103; (c) U_(5) = .897.
Consider again the values U_(i), i = 1, ..., 7, given in Table 11.4. Using the
transformation above, we first compute the scale factor h = h_{7,20}(U_(7)):

h = [B*(0.13064)]^{1/7} / 0.13064 = 3.9991
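This calculation can be sketched in code (an illustrative helper, assuming, as the example suggests, r = 7 and n = 20), using the binomial-sum form of B*:

```python
from math import comb

def beta_cdf_via_binomial(x, r, n):
    """P(U_(r) <= x) for the rth order statistic of n uniforms,
    written as the upper binomial tail."""
    return sum(comb(n, i) * x**i * (1 - x)**(n - i) for i in range(r, n + 1))

def z_transform(u, n):
    """Transformation (11.10): map right-censored uniforms U_(1..r)
    to a complete uniform(0,1) sample of size r."""
    r = len(u)
    h = beta_cdf_via_binomial(u[-1], r, n) ** (1 / r) / u[-1]
    return h, [ui * h for ui in u]

u = [0.00995, 0.01980, 0.02955, 0.03921, 0.06761, 0.09516, 0.13064]
h, z = z_transform(u, 20)
```

The resulting h and Z values agree with the scale factor 3.9991 and the Z_(i) column of Table 11.4.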
Comment
In this section the hypothesis of interest is that the sampled population has
an absolutely continuous cdf F(y|θ), where θ is a vector of unknown (nuisance)
parameters. Typically the censored data at hand must be a singly-censored
sample if published tables of critical points are to be used. For more complicated
types of censoring, such as multiple right censoring, little work has
been done. For a particular set of data, it may be possible to modify a
standard statistic and then estimate certain percentiles, or the observed
significance level, using simulation techniques. When the censoring is
Type 2, test statistics can often be constructed which have parameter-free
null distributions. When the censoring is Type 1, statistics with asymptotically
parameter-free distributions are a possibility.
The data consist of the smallest 20 values among the first 40 values
listed in the NOR data set. We wish to test that the underlying distribution is
normal. Gupta's estimates (Gupta, 1952) here are μ̂ = 98.233 and σ̂ = 9.444.
Relevant calculations are given in Table 11.5. The value of the Cramér-von
Mises statistic is found to be, using Section 4.7.3,

₂W²_{20,40} = 0.02512 + 20/[12(40)²] − (40/3)(0.5 − 0.53741)³ = 0.02686
TABLE 11.5

i    Y_(i)    (i − 0.5)/40    z_(i)    [z_(i) − (i − 0.5)/40]²
1 79.43 0.0125 0.02323 0.00012
2 83.53 0.0375 0.05974 0.00049
3 83.67 0.0625 0.06152 0.00000
4 84.27 0.0875 0.06962 0.00032
5 85.29 0.1125 0.08525 0.00074
6 87.83 0.1375 0.13531 0.00000
7 89.00 0.1625 0.16411 0.00000
8 89.90 0.1875 0.18878 0.00000
9 90.03 0.2125 0.19252 0.00040
10 90.87 0.2375 0.21778 0.00039
11 91.46 0.2625 0.23662 0.00067
12 92.02 0.2875 0.25529 0.00104
13 92.45 0.3125 0.27014 0.00179
14 92.55 0.3375 0.27364 0.00408
15 95.45 0.3625 0.38411 0.00047
16 96.13 0.3875 0.41188 0.00059
17 96.20 0.4125 0.41477 0.00001
18 98.70 0.4375 0.51972 0.00676
19 98.98 0.4625 0.53152 0.00476
20 99.12 0.4875 0.53741 0.00249
Sum = 0.02512
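The table and the statistic can be checked with a few lines (an illustrative sketch: z_(i) = Φ((Y_(i) − μ̂)/σ̂) with Gupta's estimates, and the Type 2 censored-sample form from Section 4.7.3, whose correction terms r/(12n²) and −(n/3)(r/n − z_(r))³ are assumed here):

```python
from statistics import NormalDist

y = [79.43, 83.53, 83.67, 84.27, 85.29, 87.83, 89.00, 89.90, 90.03, 90.87,
     91.46, 92.02, 92.45, 92.55, 95.45, 96.13, 96.20, 98.70, 98.98, 99.12]
mu_hat, sigma_hat = 98.233, 9.444
r, n = 20, 40

# z_(i) = Phi((Y_(i) - mu_hat) / sigma_hat)
z = [NormalDist().cdf((yi - mu_hat) / sigma_hat) for yi in y]

# Sum of squared deviations from the plotting positions (i - 0.5)/n
s = sum((zi - (i - 0.5) / n) ** 2 for i, zi in enumerate(z, start=1))

# Censored Cramer-von Mises statistic
w2 = s + r / (12 * n**2) - (n / 3) * (r / n - z[-1]) ** 3
```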
Consider again the sample correlation coefficient between the Y_(i) and a set
of constants K_i, denoted, as in Chapter 5, by R(Y,K). Because R(Y,K) is
invariant with respect to a linear transformation of the Y_(i), it follows that
its null distribution does not depend on location or scale parameters of the
distribution. This makes it a useful statistic for testing fit. Suppose F(y) is
the cdf of Y for location parameter 0 and scale parameter 1, and let F⁻¹(·)
be its inverse. Suitable sets of constants are then K_i = m_i, where m_i is the
expected value of the ith order statistic of a sample of size n from F(y), or
K_i = H_i = F⁻¹(i/(n + 1)). The statistic Z = n{1 − R²(Y,K)} has been discussed
in Chapter 5, and percentage points have been given for censored versions
of Z.
Chen (1984) presents a correlation statistic as an omnibus test for the
composite hypothesis of exponentiality in the presence of random censoring.
Asymptotic distributions are derived under a particular censorship model,
which is quite robust provided that less than 40% of the observations are
censored.
Consider again the smallest 20 values among the first 40 values in the
NOR data set. When testing for normality using R(Y,K), we obtain Z = 0.035,
which falls just below the .50 point. Thus on the basis of the statistic Z we
cannot reject the hypothesis of normality at the usual levels. Another example
of a correlation type statistic is given in the next section.
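A sketch of the computation of Z for a censored sample (illustrative only; here the constants are K_i = Φ⁻¹(i/(n + 1)) for the r smallest of n observations):

```python
from statistics import NormalDist

def corr_statistic_z(y, n):
    """Z = n{1 - R^2(Y, K)} for the r smallest of n ordered observations,
    with constants K_i = Phi^{-1}(i/(n+1))."""
    y = sorted(y)
    r = len(y)
    k = [NormalDist().inv_cdf(i / (n + 1)) for i in range(1, r + 1)]
    ybar, kbar = sum(y) / r, sum(k) / r
    sxy = sum((ki - kbar) * (yi - ybar) for ki, yi in zip(k, y))
    sxx = sum((ki - kbar) ** 2 for ki in k)
    syy = sum((yi - ybar) ** 2 for yi in y)
    return n * (1.0 - sxy * sxy / (sxx * syy))

# Perfectly linear data give R^2 = 1 and hence Z = 0.
k_exact = [NormalDist().inv_cdf(i / 41) for i in range(1, 21)]
z_stat = corr_statistic_z([100 + 10 * ki for ki in k_exact], 40)
```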
Spacings between ordered uniforms were defined in Section 10.9.3. Similarly,
spacings can be defined between order statistics Y_(i) of a sample from any
distribution. If the distribution has no lower limit, the first spacing will be
D₁ = Y_(2) − Y_(1), and so on. Similarly leaps ℓ_i can be defined by ℓ_i = D_i/E(D_i).
An important property of leaps is that, for continuous distributions, they
will (under regularity conditions) be asymptotically exponentially distributed
with mean 1. Then a test for a given distribution with unknown location and
scale parameters is reduced to a test similar to a test for exponentiality
of the ℓ_i. The test will not be exactly the same as a test for exponentiality
because the ℓ_i do not become an independent sample, even asymptotically.
We illustrate the technique with an example given by Mann, Scheuer, and
Fertig (1973), who created a test for the extreme-value distribution by using
leaps; the test is for right-censored data and can be adapted to a test for the
Weibull distribution. Both these features are illustrated in the example.
The following values are t_(i), i = 1, ..., 15, the first 15 order statistics of
a sample of 22 t-values; the null hypothesis is H₀: the t-sample comes from
a two-parameter Weibull distribution (Section 10.4.4), against a three-parameter
Weibull (with positive origin) as alternative. Values are: 15.5,
15.6, 16.5, 17.5, 19.5, 20.6, 22.8, 23.1, 23.5, 24.5, 26.5, 26.5, 32.7,
33.8, 33.9. The steps in making the test are as follows. First find X_(i) =
log t_(i), i = 1, ..., 15; H₀ then reduces to a test that the X_(i) are from an
extreme-value distribution (equation 4.7 or 5.22) with unknown location and
scale parameters. The denominator E(D_i) of ℓ_i depends on the unknown
Typically when testing for goodness of fit we assume only that the underlying
cdf is absolutely continuous. Occasionally we may wish to limit our choices
to, say, two families of distributions. In particular we may wish to test the
composite null hypothesis

H₁: F(y) = F₁(y|θ₁)

against the alternative

H₂: F(y) = F₂(y|θ₂)

With θ̂_i the maximum likelihood estimate under family i, form the likelihoods

L_i = Π_{j=1}^{n} f_i(Y_(j); θ̂_i)

the ratio of maximized likelihoods

RML = L₁/L₂

and the centered statistic

T = ln(RML) − E[ln(RML)]
shows that the uniformly most powerful Invariant (under linear transform a
tions) test is based upon the (Lehmann) ratio of integrals
LRI = I i /I2
where
Ij = / / fj(vyj^ + u, v y ^ + u )d v d u
-O O O
We once more consider the smallest 20 values among the first 40 values
listed in the NOR data set. We first exponentiated the data, and then proceeded
to test the lognormal against the Weibull family. An iterative procedure was
used to determine the value of RML. Entries in Table IX of Antle (1975) must
REFERENCES

Breslow, N. and Crowley, J. (1974). A large sample study of the life table
and product limit estimates under random censorship. Ann. Statist. 2,
437-453.

parameter vs. three parameter Weibull; confidence bounds for threshold.
Technometrics 17, 237-245.
12.1 INTRODUCTION
498 TIETJEN
1. Omit the outliers and treat the reduced sample as a "new" sample.
2. Omit the outliers and treat the reduced sample as a censored sample.
3. Winsorize the outliers, i.e., replace them with the value of the nearest
"good" observation. This at least preserves the direction of measurement.
4. Ask the experimenter to take additional observations to replace the
outliers.
5. Present one analysis including the outliers and another excluding them.
If the results are very different, view the conclusions cautiously.
E 12.2.1 Example
In this and the following examples, sample sizes were chosen partly for
convenience in computation; the tests may not be as powerful as one would
like. A set of eight mass spectrometer measurements were made on a single
sample of a particular isotope of uranium. The data, arranged in order, are
as follows: 199.31, 199.53, 200.19, 200.82, 201.92, 201.95, 202.18,
245.57. Experience has shown that outliers usually occur on the high side.
Assuming normality, can the largest observation be rejected as an outlier?
We calculate x̄ = 206.43, s = 15.85, and T_n = (245.57 − 206.43)/15.85 =
2.469. Since this is greater than the 5% critical value of 2.03 from Table
12.1, we reject the hypothesis (i.e., we decide that 245.57 is an outlier).
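The calculation can be sketched as follows (an illustrative routine for the single-outlier statistic T_n = (Y_(n) − x̄)/s, not code from the chapter):

```python
from statistics import mean, stdev

def grubbs_largest(x):
    """Single-outlier statistic for the largest observation:
    T_n = (max(x) - mean(x)) / s, with s the sample standard deviation."""
    return (max(x) - mean(x)) / stdev(x)

uranium = [199.31, 199.53, 200.19, 200.82, 201.92, 201.95, 202.18, 245.57]
t_n = grubbs_largest(uranium)   # about 2.47, above the 5% critical value 2.03
```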
n 5% 2.5% 1% 5% 1%
(continued)
n 5% 2.5% 1%
60 3.03 3.20 3.41
70 3.09 3.26 3.47
80 3.14 3.31 3.52
90 3.18 3.35 3.56
100 3.21 3.38 3.60
n 3 4 5 6 7 8 9 10 12
d.f. 1% points
120 2.25 2.48 2.62 2.73 2.82 2.89 2.95 3.00 3.08
∞ 2.22 2.43 2.57 2.68 2.76 2.83 2.88 2.93 3.01
(continued)
ANALYSIS AND DETECTION OF OUTLIERS 503
T A B L E 12.1 (continued)
n 3 4 5 6 7 8 9 10 12
d.f. 5% points
120 1.76 1.96 2.11 2.22 2.30 2.37 2.43 2.48 2.57
When there is a possibility of more than one outlier in the sample, complications
quickly arise. Grubbs (1950) derived exact critical values for the two
largest (or two smallest) outliers, but did not obtain critical values for the
largest and smallest observations. Tietjen and Moore (1972) extended Grubbs'
critical values, by simulation, for up to 10 outliers. The statistic used was
the ratio of the sum of squares in a sample which omits the outliers to the
sum of squares for the complete sample. This statistic was called L_k.

Tietjen and Moore obtained another statistic, E_k, which took the same form
as L_k but the numerator was based on omitting the k most extreme observations
from the mean (from either or both ends). The test was based on the
assumption that k was known and in practice determined by looking at the
data. Since one does not anticipate any outliers in a sample, k could not be
known in advance, and looking at the sample interfered with the α-level in
an unknown way. Furthermore, if k were determined automatically, E_k could
pick the wrong observations to test. (In a sample of size 12, let 10 values be
from a N(0,1), one be at 10 and one be at 100. Since the mean is close to 10,
the two most extreme observations are the smallest one and the one at 100.
Clearly the smallest observation is not an outlier.) The last problem is
remedied simply by picking out the single observation furthest from the
mean as outlier candidate #1, then omitting it from the sample and picking
the farthest from the new mean as outlier candidate #2, etc.

Yet another problem arises in trying to choose a value for k, the number
of observations to be tested as outliers. If there are two large observations
which are nearly equal, and if one uses a one-outlier test on the largest, the
test will usually fail to reject because the second outlier masks the presence
Yet another problem arises in trying to choose a value for k, the number
of observations to be tested as o u tliers. If there are two large observations
which a re nearly equal, and if one uses a one-outlier test on the largest, the
test w ill usually fail to reject because the second outlier masks the presence
of the first; the procedure cannot get started. Thus the repeated application
of a single outlier test can easily fail. Furthermore, if there is only one
large outlier and one uses a test for two outliers, the test is likely to reject
H₀ and claim that there are two outliers, a phenomenon known as swamping.

Rosner (1975) devised a procedure which successfully overcame masking
but is still subject to some swamping. Let I₀ be the full data set and I_{t+1} be
the set obtained by omitting from I_t the point most extreme from the mean
of I_t. Let k be an upper bound on the number of outliers in the sample.
Apply a one-outlier test in succession to I₀, I₁, ..., I_{k−1}, and let the last
significant result be for I_{m−1}. Decide that the m observations omitted from
I_m are outliers. The critical values have to hold simultaneously for the several
tests, hence are difficult to generate. It should be easier to estimate k_u
than k, but the amount of swamping will depend on how badly we estimate k_u.
Despite the swamping, we recommend Rosner's test for several outliers if
the α-level is important to maintain. We cannot state how effective L_k or E_k
might be because there is no objective way of deciding upon a value of k. The
best tables are given by Jain (1981) as Table 12.2.
E 12.3.1 Example
Twenty laboratories did an analysis on a single blood sample for lead content.
Assuming the data are normally distributed, are there any outliers? .000,
.015, .016, .022, .022, .023, .026, .027, .027, .028, .028, .031, .032,
.033, .035, .037, .038, .041, .056, .058.

Using Rosner's ESD procedure, we set 20% as an upper limit on the
number of outliers and check for up to 4 outliers. I₀ is the full set of data
(x̄ = .0298, s = .0131), I₁ the set with .000 omitted (x̄₁ = .0313, s₁ = .0114),
I₂ is the set with .000 and .058 omitted (x̄₂ = .0298, s₂ = .0097), I₃ is the
set with .000, .058, and .056 omitted (x̄₃ = .0283, s₃ = .0073), I₄ is the set
with .000, .058, .056, and .015 omitted. We calculate the T_n statistic for
each set, obtaining R₁ = 2.27 (for set I₀), R₂ = 2.34, R₃ = 2.70, R₄ = 1.82.
From Table 12.2 we obtain the 5% critical values for R₁, R₂, R₃, and R₄ of
2.95, 2.63, 2.49, and 2.39, respectively. R₃ is the only one significant;
hence we declare .000, .056, and .058 to be outliers.
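The ESD computations can be sketched as follows (an illustrative routine; R_{t+1} is the extreme studentized deviate computed from set I_t):

```python
from statistics import mean, stdev

def esd_statistics(x, k):
    """Extreme studentized deviates R_1..R_k: at each stage, studentize the
    observation farthest from the current mean, then remove it (Rosner)."""
    data = list(x)
    r_values = []
    for _ in range(k):
        m, s = mean(data), stdev(data)
        extreme = max(data, key=lambda v: abs(v - m))
        r_values.append(abs(extreme - m) / s)
        data.remove(extreme)
    return r_values

lead = [.000, .015, .016, .022, .022, .023, .026, .027, .027, .028,
        .028, .031, .032, .033, .035, .037, .038, .041, .056, .058]
r1, r2, r3, r4 = esd_statistics(lead, 4)
```

The successive removals pick out .000, .058, .056, and .015, reproducing the R values quoted above to two decimals.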
In the univariate case the residuals are correlated and have a common variance. In the regression case, the residuals are also correlated but each residual has its own variance which depends, to some extent, on the arrangement of the x-values. Let Y = Xb + e be the linear model in which Y is the (n x 1) vector of responses, X an (n x p) matrix of known constants, b a (p x 1) vector of unknown parameters, and e an (n x 1) vector of normally
508 TIETJEN
n\q  1  2  3  4  5  6  8  10  15  25
5 1.87
6 2.00 1.89
30 2.80 2.79 2.79 2.78 2.77 2.77 2.75 2.73 2.66 2.13
35 2.86 2.85 2.85 2.85 2.84 2.84 2.82 2.81 2.77 2.55
40 2.91 2.91 2.90 2.90 2.90 2.89 2.88 2.87 2.84 2.72
45 2.95 2.95 2.95 2.95 2.94 2.94 2.93 2.93 2.90 2.82
50 2.99 2.99 2.99 2.99 2.98 2.98 2.98 2.97 2.95 2.89
60 3.06 3.06 3.05 3.05 3.05 3.05 3.05 3.04 3.03 3.00
70 3.11 3.11 3.11 3.11 3.11 3.11 3.10 3.10 3.09 3.07
80 3.16 3.16 3.16 3.15 3.15 3.15 3.15 3.15 3.14 3.12
90 3.20 3.20 3.19 3.19 3.19 3.19 3.19 3.19 3.18 3.17
100 3.23 3.23 3.23 3.23 3.23 3.23 3.23 3.22 3.22 3.21
(continued)
ANALYSIS AND DETECTION OF OUTLIERS 509
n\q  1  2  3  4  5  6  8  10  15  25
5 1.92
6 2.07 1.93
30 2.96 2.96 2.95 2.94 2.93 2.93 2.90 2.88 2.79 2.17
35 3.03 3.02 3.02 3.01 3.00 3.00 2.98 2.97 2.91 2.64
40 3.08 3.08 3.07 3.07 3.06 3.06 3.05 3.03 3.00 2.84
45 3.13 3.12 3.12 3.12 3.11 3.11 3.10 3.09 3.06 2.96
50 3.17 3.16 3.16 3.16 3.15 3.15 3.14 3.14 3.11 3.04
60 3.23 3.23 3.23 3.23 3.22 3.22 3.22 3.21 3.20 3.15
70 3.29 3.29 3.28 3.28 3.28 3.28 3.27 3.27 3.26 3.23
80 3.33 3.33 3.33 3.33 3.33 3.33 3.32 3.32 3.31 3.29
90 3.37 3.37 3.37 3.37 3.37 3.37 3.36 3.36 3.36 3.34
100 3.41 3.41 3.40 3.40 3.40 3.40 3.40 3.40 3.39 3.38
(continued)
n\q  1  2  3  4  5  6  8  10  15  25
5 1.98
6 2.17 1.98
30 3.30 3.29 3.28 3.26 3.25 3.24 3.21 3.17 3.04 2.21
35 3.37 3.36 3.35 3.34 3.34 3.33 3.30 3.28 3.19 2.81
40 3.43 3.42 3.42 3.41 3.40 3.40 3.38 3.36 3.30 3.08
45 3.48 3.47 3.47 3.46 3.46 3.45 3.44 3.43 3.38 3.23
50 3.52 3.52 3.51 3.51 3.51 3.50 3.49 3.48 3.45 3.34
60 3.60 3.59 3.59 3.59 3.58 3.58 3.57 3.56 3.54 3.48
70 3.65 3.65 3.65 3.65 3.64 3.64 3.64 3.63 3.61 3.57
80 3.70 3.70 3.70 3.70 3.69 3.69 3.69 3.68 3.67 3.64
90 3.74 3.74 3.74 3.74 3.74 3.74 3.73 3.73 3.72 3.70
100 3.78 3.78 3.78 3.77 3.77 3.77 3.77 3.77 3.76 3.74
n = number of observations
q = number of independent variables (including count for intercept if fitted)
R C 6 8 10
a = 0.01
3 .66033*
4 .67484* .66511*
a = 0.05
3 .64810*
4 .64512* .62066*
a = 0.01
.50294*
.48778*
.46011* .42529
(continued)
R C 3 4 5 6 7 8 9 10
5 3 .46503*
4 .43209 .39515
6 3 .44276
4 .40755 .37024
7 3 .42260
4 .38649 .34951
8 3 .40468
4 .36836 .33200
9 3 .38877
4 .35261 .31697
10 3 .37461
4 .33878 .30391
(continued)
R C 4 5 6 7 8 9 10
a = 0.05
3 .47790*
3 .45465*
.42314 .38800
.42912
.39495 .35936
3 .40625
4 .37136 .33624
3 .38640
4 .35157 .31724
3 .36920
4 .33476 .30131
(continued)
R C 10
9 3 .35418 (1)
4 .32028 .28770
10 3 .34094 (1)
4 .30765 .27590
distributed errors with mean zero and variance sigma^2 I. If Y-hat is the vector of fitted values, e = Y - Y-hat is the vector of residuals. Letting V = (v_ij) = X(X'X)^(-1)X', s^2 = e'e/(n - p), and var(e_i) = sigma^2 (1 - v_ii). The ith studentized residual is r_i = e_i / sqrt(s^2 (1 - v_ii)), and the maximum of the absolute values of the r_i will be denoted by r_m. If r_m is greater than some critical value h_alpha, the observation which gave rise to r_m is declared to be an outlier. Much of the early work in this area is due to Srikantan (1961), but the best tables to date are those of Lund (1975), Table 12.3.

In cases where we "know" sigma^2 or have an independent estimate s_v of it, we may use this knowledge in calculating r_m, but should use the critical values given by Joshi (1972).
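For simple linear regression the diagonal of V reduces to the closed form v_ii = 1/n + (x_i - x-bar)^2 / sum_j (x_j - x-bar)^2, so the studentized residuals can be computed directly; a small sketch with invented data (not from the chapter):

```python
# Studentized residuals r_i = e_i / sqrt(s^2 (1 - v_ii)) for a toy
# simple linear regression; the data are invented for illustration.
x = [1, 2, 3, 4, 5]
y = [1, 2, 3, 4, 10]          # last point pulled away from the line

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
intercept = ybar - slope * xbar

e = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
s2 = sum(ei ** 2 for ei in e) / (n - 2)                 # p = 2 parameters
v = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]        # hat-matrix diagonal
r = [ei / (s2 * (1 - vi)) ** 0.5 for ei, vi in zip(e, v)]
r_m = max(abs(ri) for ri in r)
# Observations 4 and 5 here have the same |e_i| = 2 but different
# studentized residuals (|r_4| is about 1.31, |r_5| about 1.73),
# so r_m singles out observation 5.
```

This illustrates why the raw residuals alone can mislead: the denominator sqrt(s^2 (1 - v_ii)) differs from point to point.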
For a certain large class of designed experiments the residuals have a common variance. This class of experiments includes all balanced factorial arrangements, balanced incomplete blocks, and Latin squares. For these arrangements we fit the model and find the ith residual. The ith normalized residual is e_i / (sum_j e_j^2)^(1/2), and the maximum of the absolute values of these is called the maximum normed residual, MNR. Important work in this area was done by Stefansky (1972), but Galpin and Hawkins (1981) have a better and more extensive set of tables (Table 12.4). If the MNR exceeds the tabulated critical values, the observation which gave rise to it is judged to be an outlier. The statistic given by C. Daniel (1960) in his work on half-normal plots is equivalent to this test.
E 12.4.1 Example

Snedecor and Cochran (1967) give an example of the regression of the yield of corn (y) on inorganic phosphorus (x1) and organic phosphorus (x2) for 18 Iowa soils. The data are shown below:
The residuals are also shown, as well as the studentized residuals. The value of r_m is 3.18, which from Table 12.3 with q = 3 (the number of parameters) is significant at the .05 level; hence the corresponding observation is declared an outlier.
E 12.4.2 Example
35 32 40 37   33 34 39 38   +2 -2 +1 -1
29 29 36 34   29 30 35 34   +0 -1 +1 +0
25 29 20 30   23 24 29 28   +2 +5 -9 +2
19 25 35 25   23 24 29 28   -4 +1 +6 -3
22 20 29 29 22 23 28 27 +0 -3 +1 +2
The residual in the third row and third column seems large. For these data, Stefansky's maximum normed residual is 9/sqrt(202) = .6332. Consulting Table 12.4 we find a 5% critical value of .590 and a 1% critical value of .640; hence the observation in the third row and third column would be judged an outlier at the 5% level but not at the 1% level.
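The arithmetic is easy to reproduce from the residual table; a minimal sketch:

```python
# Maximum normed residual for the 5 x 4 residual table of E 12.4.2:
# MNR = max|e| / sqrt(sum of squared residuals).
residuals = [
    [ 2, -2,  1, -1],
    [ 0, -1,  1,  0],
    [ 2,  5, -9,  2],
    [-4,  1,  6, -3],
    [ 0, -3,  1,  2],
]
flat = [e for row in residuals for e in row]
ss = sum(e * e for e in flat)                 # 202
mnr = max(abs(e) for e in flat) / ss ** 0.5
# mnr = 9 / sqrt(202), about 0.6332: significant at the 5% level (.590)
# but not at the 1% level (.640).
```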
In the simple linear regression case, the observation with the largest residual may not be the one with the largest test statistic, r_m, so that judging residuals by eye begins to fail as a tool. In a two-way table, Gentleman and Wilk (1975) declare that "in the null case of no outliers, the residuals behave much like a normal sample. When one outlier is present, the direct statistical treatment of residuals provides a complete basis for data-analytic judgements, especially through judicious use of probability plots. When two outliers are present, however, the resulting residuals will often not have any noticeable statistical peculiarities." These authors devised a test statistic Q_k = e'e - e~'e~, where "e~'e~ is the sum of squares of revised residuals resulting from fitting the basic model to the data remaining after the omission of k data points" and e'e is the sum of squares of residuals obtained by fitting the model to all the data. Q_k is computed over the possible data partitions, and the largest of the Q_k is used to identify the "k most likely outlier subset." Such a procedure can be computationally awesome. Methods of reducing the labor have been devised, but they are still formidable, since one first chooses a maximum value for k (no easy task) and then proceeds.
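A brute-force version of the subset search is easy to state, if expensive. A toy sketch with invented simple-regression data (helper name is mine; for realistic n and k the number of subsets is what makes the procedure so costly):

```python
# Brute-force Gentleman-Wilk style search: Q_k = e'e - e~'e~, maximized
# over all k-subsets of omitted points. Toy data, simple regression.
from itertools import combinations

def fit_sse(points):
    """Least-squares line through (x, y) pairs; return residual sum of squares."""
    n = len(points)
    xbar = sum(x for x, _ in points) / n
    ybar = sum(y for _, y in points) / n
    sxx = sum((x - xbar) ** 2 for x, _ in points)
    slope = sum((x - xbar) * (y - ybar) for x, y in points) / sxx
    intercept = ybar - slope * xbar
    return sum((y - intercept - slope * x) ** 2 for x, y in points)

pts = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 10)]   # invented; last point discordant
full_sse = fit_sse(pts)
k = 1
best = max(combinations(range(len(pts)), k),
           key=lambda omit: full_sse - fit_sse(
               [p for i, p in enumerate(pts) if i not in omit]))
# best == (4,): omitting the fifth point leaves a perfect fit, so it
# gives the largest Q_1.
```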
E 12.6.1 Example

No.   y   x1  x2  x3
 1   42   80  27  89
2 37 80 27 88
3 37 75 25 90
4 28 62 24 87
5 18 62 22 87
6 18 62 23 87
7 19 62 24 93
8 20 62 24 93
9 15 58 23 87
10 14 58 18 80
11 14 58 18 89
12 13 58 17 88
13 11 58 18 82
14 12 58 19 93
15 8 50 18 89
16 7 50 18 86
17 8 50 19 72
18 8 50 19 79
19 9 50 20 80
20 15 56 20 82
21 15 70 20 91
Daniel and Wood (1971) did careful work on this problem. Their fit to the original data was E(y) = -39.9 + .72x1 + 1.30x2 - .15x3. After much consideration and plotting of the data, they discarded observations 1, 3, 4, and 21 as outliers, and refitted the equation, obtaining E(y) = -37.6 + .80x1 + .58x2 - .07x3. A robust regression yields E(y) = -37.2 + .82x1 + .52x2 - .07x3, and deletion of the four points does not alter the coefficients. The residuals from the four fits are shown below. The size of the residuals for points 1, 3, 4, and 21 indicates, somewhat subjectively, that they are outliers.
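As a check on the first quoted fit, the least-squares coefficients can be reproduced from the 21 observations listed above; this sketch solves the normal equations directly in pure Python:

```python
# Ordinary least squares for the data of E 12.6.1: y on x1, x2, x3.
data = [  # (y, x1, x2, x3), copied from the listing above
    (42, 80, 27, 89), (37, 80, 27, 88), (37, 75, 25, 90), (28, 62, 24, 87),
    (18, 62, 22, 87), (18, 62, 23, 87), (19, 62, 24, 93), (20, 62, 24, 93),
    (15, 58, 23, 87), (14, 58, 18, 80), (14, 58, 18, 89), (13, 58, 17, 88),
    (11, 58, 18, 82), (12, 58, 19, 93), (8, 50, 18, 89), (7, 50, 18, 86),
    (8, 50, 19, 72), (8, 50, 19, 79), (9, 50, 20, 80), (15, 56, 20, 82),
    (15, 70, 20, 91),
]
X = [[1, x1, x2, x3] for _, x1, x2, x3 in data]
y = [row[0] for row in data]

# Normal equations (X'X) b = X'y, solved by Gaussian elimination.
p = 4
A = [[sum(X[i][r] * X[i][c] for i in range(len(X))) for c in range(p)]
     for r in range(p)]
b = [sum(X[i][r] * y[i] for i in range(len(X))) for r in range(p)]
for col in range(p):                      # forward elimination with pivoting
    piv = max(range(col, p), key=lambda r: abs(A[r][col]))
    A[col], A[piv] = A[piv], A[col]
    b[col], b[piv] = b[piv], b[col]
    for r in range(col + 1, p):
        m = A[r][col] / A[col][col]
        for c in range(col, p):
            A[r][c] -= m * A[col][c]
        b[r] -= m * b[col]
beta = [0.0] * p
for r in range(p - 1, -1, -1):            # back substitution
    beta[r] = (b[r] - sum(A[r][c] * beta[c] for c in range(r + 1, p))) / A[r][r]

residuals = [yi - sum(bc * xc for bc, xc in zip(beta, xi))
             for yi, xi in zip(y, X)]
# beta is approximately [-39.9, 0.72, 1.30, -0.15], matching the first
# fit quoted above; the residuals for observations 1, 3, 4, and 21 are
# among the largest in absolute value.
```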
Residuals
12.7 M U L T IV A R IA T E O U TLIER S
Fox (1972) was apparently the first to take into account the correlations between successive observations in processing time series for outliers. He considered a type I outlier as one in which a gross error of observation or recording affects only a single observation. The type II outlier occurs when a single "innovation" is extreme and will affect that observation and the subsequent ones.

As a model for the type I outlier, Fox used a stationary pth order autoregressive process in which the qth observation has Delta added to it. The null hypothesis is H0: Delta = 0, and the alternative, H1: Delta != 0. Using an asymptotic expression for the elements w^(ij) of the inverse of the covariance matrix, Fox obtained a likelihood ratio criterion,
lambda_{q,n} = (y - Delta-hat)' W^(-1) (y - Delta-hat) / (y' W^(-1) y)    (12.8.1)
REFERENCES

Barnett, V. and Lewis, T. (1978). Outliers in Statistical Data. New York: Wiley.

Guttman, I. (1973). Premium and protection of several procedures for dealing with outliers when sample sizes are moderate to large. Technometrics 15, 385-404.

Rosner, B. (1977). Percentage points for the RST many outlier procedure. Technometrics 19, 307-312.
524 APPENDIX
TABLE 1
z     .00   .01   .02   .03   .04   .05   .06   .07   .08   .09
-3.0* .0013 .0013 .0013 .0012 .0012 .0011 .0011 .0011 .0010 .0010
-2.9  .0019 .0018 .0017 .0017 .0016 .0016 .0015 .0015 .0014 .0014
-2.8  .0026 .0025 .0024 .0023 .0023 .0022 .0021 .0020 .0020 .0019
-2.7  .0035 .0034 .0033 .0032 .0031 .0030 .0029 .0028 .0027 .0026
-2.6  .0047 .0045 .0044 .0043 .0041 .0040 .0039 .0038 .0037 .0036
-2.5  .0062 .0060 .0059 .0057 .0055 .0054 .0052 .0051 .0049 .0048
-2.4  .0082 .0080 .0078 .0075 .0073 .0071 .0069 .0068 .0066 .0064
-2.3  .0107 .0104 .0102 .0099 .0096 .0094 .0091 .0089 .0087 .0084
-2.2  .0139 .0136 .0132 .0129 .0125 .0122 .0119 .0116 .0113 .0110
-2.1  .0179 .0174 .0170 .0166 .0162 .0158 .0154 .0150 .0146 .0143
-2.0  .0228 .0222 .0217 .0212 .0207 .0202 .0197 .0192 .0188 .0183
-1.9  .0287 .0281 .0274 .0268 .0262 .0256 .0250 .0244 .0239 .0233
-1.8  .0359 .0351 .0344 .0336 .0329 .0322 .0314 .0307 .0301 .0294
-1.7  .0446 .0436 .0427 .0418 .0409 .0401 .0392 .0384 .0375 .0367
-1.6  .0548 .0537 .0526 .0516 .0505 .0495 .0485 .0475 .0465 .0455
-1.5  .0668 .0655 .0643 .0630 .0618 .0606 .0594 .0582 .0571 .0559
-1.4  .0808 .0793 .0778 .0764 .0749 .0735 .0721 .0708 .0694 .0681
-1.3  .0968 .0951 .0934 .0918 .0901 .0885 .0869 .0853 .0838 .0823
-1.2  .1151 .1131 .1112 .1093 .1075 .1056 .1038 .1020 .1003 .0985
-1.1  .1357 .1335 .1314 .1292 .1271 .1251 .1230 .1210 .1190 .1170
-1.0  .1587 .1562 .1539 .1515 .1492 .1469 .1446 .1423 .1401 .1379
-.9   .1841 .1814 .1788 .1762 .1736 .1711 .1685 .1660 .1635 .1611
-.8   .2119 .2090 .2061 .2033 .2005 .1977 .1949 .1922 .1894 .1867
-.7   .2420 .2389 .2358 .2327 .2296 .2266 .2236 .2206 .2177 .2148
-.6   .2743 .2709 .2676 .2643 .2611 .2578 .2546 .2514 .2483 .2451
-.5   .3085 .3050 .3015 .2981 .2946 .2912 .2877 .2843 .2810 .2776
-.4   .3446 .3409 .3372 .3336 .3300 .3264 .3228 .3192 .3156 .3121
-.3   .3821 .3783 .3745 .3707 .3669 .3632 .3594 .3557 .3520 .3483
-.2   .4207 .4168 .4129 .4090 .4052 .4013 .3974 .3936 .3897 .3859
-.1   .4602 .4562 .4522 .4483 .4443 .4404 .4364 .4325 .4286 .4247
-.0   .5000 .4960 .4920 .4880 .4840 .4801 .4761 .4721 .4681 .4641
TABLE 1 (continued)
z     .00   .01   .02   .03   .04   .05   .06   .07   .08   .09
.0 .5000 .5040 .5080 .5120 .5160 .5199 .5239 .5279 .5319 .5359
.1 .5398 .5438 .5478 .5517 .5557 .5596 .5636 .5675 .5714 .5753
.2 .5793 .5832 .5871 .5910 .5948 .5987 .6026 .6064 .6103 .6141
.3 .6179 .6217 .6255 .6293 .6331 .6368 .6406 .6443 .6480 .6517
.4 .6554 .6591 .6628 .6664 .6700 .6736 .6772 .6808 .6844 .6879
.5 .6915 .6950 .6985 .7019 .7054 .7088 .7123 .7157 .7190 .7224
.6 .7257 .7291 .7324 .7357 .7389 .7422 .7454 .7486 .7517 .7549
.7 .7580 .7611 .7642 .7673 .7704 .7734 .7764 .7794 .7823 .7852
.8 .7881 .7910 .7939 .7967 .7995 .8023 .8051 .8078 .8106 .8133
.9 .8159 .8186 .8212 .8238 .8264 .8289 .8315 .8340 .8365 .8389
1.0 .8413 .8438 .8461 .8485 .8508 .8531 .8554 .8577 .8599 .8621
1.1 .8643 .8665 .8686 .8708 .8729 .8749 .8770 .8790 .8810 .8830
1.2 .8849 .8869 .8888 .8907 .8925 .8944 .8962 .8980 .8997 .9015
1.3 .9032 .9049 .9066 .9082 .9099 .9115 .9131 .9147 .9162 .9177
1.4 .9192 .9207 .9222 .9236 .9251 .9265 .9279 .9292 .9306 .9319
1.5 .9332 .9345 .9357 .9370 .9382 .9394 .9406 .9418 .9429 .9441
1.6 .9452 .9463 .9474 .9484 .9495 .9505 .9515 .9525 .9535 .9545
1.7 .9554 .9564 .9573 .9582 .9591 .9599 .9608 .9616 .9625 .9633
1.8 .9641 .9649 .9656 .9664 .9671 .9678 .9686 .9693 .9699 .9706
1.9 .9713 .9719 .9726 .9732 .9738 .9744 .9750 .9756 .9761 .9767
2.0 .9772 .9778 .9783 .9788 .9793 .9798 .9803 .9808 .9812 .9817
2.1 .9821 .9826 .9830 .9834 .9838 .9842 .9846 .9850 .9854 .9857
2.2 .9861 .9864 .9868 .9871 .9875 .9878 .9881 .9884 .9887 .9890
2.3 .9893 .9896 .9898 .9901 .9904 .9906 .9909 .9911 .9913 .9916
2.4 .9918 .9920 .9922 .9925 .9927 .9929 .9931 .9932 .9934 .9936
2.5 .9938 .9940 .9941 .9943 .9945 .9946 .9948 .9949 .9951 .9952
2.6 .9953 .9955 .9956 .9957 .9959 .9960 .9961 .9962 .9963 .9964
2.7 .9965 .9966 .9967 .9968 .9969 .9970 .9971 .9972 .9973 .9974
2.8 .9974 .9975 .9976 .9977 .9977 .9978 .9979 .9979 .9980 .9981
2.9 .9981 .9982 .9982 .9983 .9984 .9984 .9985 .9985 .9986 .9986
3.0* .9987 .9987 .9987 .9988 .9988 .9989 .9989 .9989 .9990 .9990
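Table 1 can be regenerated to four decimals from the error function in the Python standard library; a quick sketch:

```python
# Standard normal cdf via math.erf; reproduces the entries of Table 1.
import math

def phi(z):
    """Cumulative standard normal distribution Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Spot checks against the table:
# phi(-2.0) is about .0228, phi(-0.48) about .3156, phi(1.96) about .9750.
```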
d.f. .995 .99 .975 .95 .90 .10 .05 .025 .01 .005
1 .00 .00 .00 .00 .02 2.71 3.84 5.02 6.63 7.88
2 .01 .02 .05 .10 .21 4.61 5.99 7.38 9.21 10.60
3 .07 .11 .22 .35 .58 6.25 7.81 9.35 11.34 12.84
4 .21 .30 .48 .71 1.06 7.78 9.49 11.14 13.28 14.86
5 .41 .55 .83 1.15 1.61 9.24 11.07 12.83 15.09 16.75
6 .68 .87 1.24 1.64 2.20 10.64 12.59 14.45 16.81 18.55
7 .99 1.24 1.69 2.17 2.83 12.02 14.07 16.01 18.48 20.28
8 1.34 1.65 2.18 2.73 3.49 13.36 15.51 17.54 20.09 21.96
9 1.73 2.09 2.70 3.33 4.17 14.68 16.92 19.02 21.67 23.59
10 2.16 2.56 3.25 3.94 4.87 15.99 18.31 20.48 23.21 25.19
11 2.60 3.05 3.82 4.57 5.58 17.28 19.68 21.92 24.72 26.76
12 3.07 3.57 4.40 5.23 6.30 18.55 21.03 23.34 26.22 28.30
13 3.57 4.11 5.01 5.89 7.04 19.81 22.36 24.74 27.69 29.82
14 4.07 4.66 5.63 6.57 7.79 21.06 23.68 26.12 29.14 31.32
15 4.60 5.23 6.26 7.26 8.55 22.31 25.00 27.49 30.58 32.80
16 5.14 5.81 6.91 7.96 9.31 23.54 26.30 28.85 32.00 34.27
17 5.70 6.41 7.56 8.67 10.09 24.77 27.59 30.19 33.41 35.72
18 6.26 7.01 8.23 9.39 10.86 25.99 28.87 31.53 34.81 37.16
19 6.84 7.63 8.91 10.12 11.65 27.20 30.14 32.85 36.19 38.58
20 7.43 8.26 9.59 10.85 12.44 28.41 31.41 34.17 37.57 40.00
21 8.03 8.90 10.28 11.59 13.24 29.62 32.67 35.48 38.93 41.40
22 8.64 9.54 10.98 12.34 14.04 30.81 33.92 36.78 40.29 42.80
23 9.26 10.20 11.69 13.09 14.85 32.01 35.17 38.08 41.64 44.18
24 9.89 10.86 12.40 13.85 15.66 33.20 36.42 39.36 42.98 45.56
25 10.52 11.52 13.12 14.61 16.47 34.38 37.65 40.65 44.31 46.93
26 11.16 12.20 13.84 15.38 17.29 35.56 38.89 41.92 45.64 48.29
27 11.81 12.88 14.57 16.15 18.11 36.74 40.11 43.19 46.96 49.65
28 12.46 13.56 15.31 16.93 18.94 37.92 41.34 44.46 48.28 50.99
29 13.12 14.26 16.05 17.71 19.77 39.09 42.56 45.72 49.59 52.34
30 13.79 14.95 16.76 18.49 20.60 40.26 43.77 46.98 50.89 53.67
50 27.99 29.71 32.36 34.76 37.69 63.17 67.50 71.42 76.15 79.49
100 67.33 70.06 74.22 77.93 82.36 118.5 124.3 129.6 135.8 140.2
500 422.3 429.4 439.9 449.1 459.9 540.9 553.1 563.9 576.5 585.2
1000 888.6 898.8 914.3 927.6 943.1 1058 1075 1090 1107 1119
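The chi-square percentage points above can be approximated closely (for moderate and large degrees of freedom) by the Wilson-Hilferty formula, chi2(df, alpha) roughly df(1 - 2/(9 df) + z_alpha sqrt(2/(9 df)))^3; a sketch with the needed normal deviates hard-coded:

```python
# Wilson-Hilferty approximation to upper-tail chi-square percentage points.
Z = {0.10: 1.2816, 0.05: 1.6449, 0.025: 1.9600, 0.01: 2.3263}

def chi2_upper(df, alpha):
    """Approximate chi-square value exceeded with probability alpha."""
    z = Z[alpha]
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * c ** 0.5) ** 3

# chi2_upper(10, 0.05) is about 18.3 (table: 18.31);
# chi2_upper(30, 0.05) about 43.8 (table: 43.77).
```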
There are 17 simulated data sets, each of 100 observations. Throughout the
book these have been used to illustrate and compare procedures.
EXP   Negative exponential, f(x) = (1/5)e^(-x/5) for x > 0   (sqrt(b1), b2) = (2, 9)
WE.5  Weibull, F(x) = 1 - e^(-x^.5)                          (6.62, 87.72)
WE2   Weibull, F(x) = 1 - e^(-x^2)                           (.63, 3.25)
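The (sqrt(b1), b2) pairs quoted for EXP, WE.5, and WE2 follow from the Weibull raw moments E[X^r] = Gamma(1 + r/k) for shape k and unit scale (the exponential is the k = 1 case, and scale cancels in both ratios); a sketch:

```python
# Skewness (sqrt(beta1)) and kurtosis (beta2) of the Weibull(k) family
# from raw moments m_r = Gamma(1 + r/k); scale drops out of both ratios.
from math import gamma

def weibull_shape_stats(k):
    m = [gamma(1 + r / k) for r in range(1, 5)]
    mu2 = m[1] - m[0] ** 2
    mu3 = m[2] - 3 * m[1] * m[0] + 2 * m[0] ** 3
    mu4 = m[3] - 4 * m[2] * m[0] + 6 * m[1] * m[0] ** 2 - 3 * m[0] ** 4
    return mu3 / mu2 ** 1.5, mu4 / mu2 ** 2

# weibull_shape_stats(1)   -> (2.0, 9.0)          EXP
# weibull_shape_stats(0.5) -> about (6.62, 87.72) WE.5
# weibull_shape_stats(2)   -> about (0.63, 3.25)  WE2
```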
[Figure: density curves of Johnson distributions with annotated parameter values; the legible annotations include gamma = 0, delta = 2 and gamma = 0, delta = 0.5 with beta1 = 0, beta2 = 1.63.]
The above distributions were the first ones generated. To increase the number of skewed distributions, the following normal mixtures were added:

pN(M, 1) + (1 - p)N(0, 1)

Data Set
p  M  b2  Name

The last three simulated data sets are from the Johnson bounded SB(gamma, delta) distribution, where SB(gamma, delta) is defined above. The three samples here are from:

Data Set
gamma  delta  Name
SB(.535, .5) Data Set: Johnson Bounded (.535, .5), sqrt(b1) = .65, b2 = 2.13
There are five data sets which are given here. These can be used to illustrate and compare various techniques. Some are used in the text.
Body weight in grams of white Leghorn chicks at 21 days; from two laboratories in a collaborative vitamin D assay (Bliss's data)

Series A          Series B
156  1            130  1
162  1            147  1
168  1            155  1
182  1            156  1
186  1            167  1
190  2            177  1
196  1            179  1
202  1            183  1
210  1            187  2
214  1            193  1
220  1            195  1
226  1            196  1
230  2            199  1
236  2            203  1
242  1            208  1
246  1            225  1
270  1            231  1
n = 20            232  1
                  236  1
                  246  1
                  n = 21
0            16
1            41
2            49
3            20
4            14
5             5
6             1
7             1
8 and over    0
Total       147
Height of plants in dms (class center)    Frequency
7 I
8 3
9 4
10 12
11 25
12 49
13 68
14 95
15 96
16 78
17 53
18 26
19 16
20 3
21            1
Total       530

x-bar = 14.5396
s^2 = sum f(x - x-bar)^2 / (N - 1) = 4.9936
s = sqrt(4.9103) = 2.2159, where 4.9103 = 4.9936 - 1/12 (Sheppard's correction for grouping with unit class width)
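The summary statistics can be reproduced from the frequency table; a minimal sketch (the Sheppard-corrected variance subtracts h^2/12 = 1/12 for the unit class width):

```python
# Grouped-data mean and variance for the plant-height frequency table.
heights = [7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21]
freqs   = [1, 3, 4, 12, 25, 49, 68, 95, 96, 78, 53, 26, 16, 3, 1]

n = sum(freqs)                                   # 530
mean = sum(f * x for x, f in zip(heights, freqs)) / n
ss = sum(f * (x - mean) ** 2 for x, f in zip(heights, freqs))
s2 = ss / (n - 1)                                # about 4.994
s2_sheppard = s2 - 1 / 12                        # grouping correction
s = s2_sheppard ** 0.5                           # about 2.216
```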
BAEN Data Set: Data Set for Testing for Double Exponential (Laplace)

Differences in flood stages for two stations on the Fox River in Wisconsin (Bain and Engelhardt's data)

1.96, 1.96, 3.60, 3.80, 4.79, 5.66, 5.76, 5.78, 6.27, 6.30, 6.78, 7.65,
7.84, 7.99, 8.51, 9.18, 10.13, 10.24, 10.25, 10.43, 11.45, 11.48, 11.75,
11.81, 12.34, 12.78, 13.06, 13.29, 13.98, 14.18, 14.40, 16.22, 17.06
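For the double exponential (Laplace) model the maximum likelihood estimates are the sample median and the mean absolute deviation about it, so fitting the flood-stage differences is one line each (the helper name is mine):

```python
# Laplace (double exponential) MLEs for the Fox River flood-stage data:
# location = sample median, scale = mean |x - median|.
flood = [1.96, 1.96, 3.60, 3.80, 4.79, 5.66, 5.76, 5.78, 6.27, 6.30, 6.78,
         7.65, 7.84, 7.99, 8.51, 9.18, 10.13, 10.24, 10.25, 10.43, 11.45,
         11.48, 11.75, 11.81, 12.34, 12.78, 13.06, 13.29, 13.98, 14.18,
         14.40, 16.22, 17.06]

def laplace_mle(xs):
    s = sorted(xs)
    loc = s[len(s) // 2]                       # odd n: middle order statistic
    scale = sum(abs(x - loc) for x in xs) / len(xs)
    return loc, scale

loc, scale = laplace_mle(flood)
# loc = 10.13, the sample median of the 33 differences.
```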
552 INDEX
Most powerful similar test, 245
Multivariate normality, tests of:
  directional normality, 411
  generalization of univariate tests, 409-411
  Machado, 410
  Malkovich and Afifi, 410
  Mardia, 409
  maximum curvature test, 412
  nearest distance test, 412
  radius and angle decomposition, 411
  solely multivariate procedures, 411-413
  univariate procedures, 409
Negatively skewed, 11
Neyman-Pearson theory, 2
Nonparametric estimator of the cdf, 468, 476
Normal mixture, 290, 304, 305, 314, 315, 316
Normality, multivariate (see multivariate normality, tests of)
Normality, tests of:
  Anderson-Darling, 372-374
  sqrt(b1), 376-381, 403-406
  Bowman and Shenton's omnibus test (see K^2 test under normality, tests of)
  b2, 388-390, 403-406
  chi-squared type, 370-371
  comparison of, 403-405
  correlation coefficient tests, 201-205
  D'Agostino and Pearson's omnibus, 283, 297, 390, 403-406
  D'Agostino's D, 212, 395-399, 403-406
  effects of ties on, 405
  edf, 122-133, 214, 371-374
  edf for simple null hypothesis, 372
  entropy, 344
  Filliben, 400
  fourth standardized moment, 388-390, 403-406
  Geary's a, 392
  test, 283, 297, 390, 403-406
  K^2 test, 282, 283, 286, 296, 300, 301, 309, 322, 328, 391, 403, 406
  LaBrecque, 400
  likelihood ratio, 400
  Locke and Spurrier's U statistic, 401
  moment, 375
  Neyman smooth test, 249, 261
  normalized spacings, using, 181
  omnibus tests based on moments, 390-391
  power studies, 214, 403-404
  R test, 390
  regression tests, 201-205, 393-401
  residuals, on, 406-408
    autoregressive, 408
    linear regression, 132, 407
  Royston, 400
  sample range, 392
  Shapiro-Francia W', 213, 223, 399, 403-406
  Shapiro-Wilk W, 3, 4, 206, 208, 211, 252, 393, 403-406
  Spiegelhalter, 402
  third standardized moment, 376-381, 403-406
  ties, effect of, 405
  Tiku, 402
  Watson U^2, 249, 261
  Weisberg-Bingham, 399
N transformation, 422, 424, 429, 431
NU residuals (normal uniform residuals), 247, 250, 251
Omnibus tests (see also normality, tests of), 251, 283, 285, 315, 390-391, 486
Omnibus test contours, 296-297, 302, 315
Ordinary least squares, 197, 198
Illustrated with tables and drawings, this volume is an ideal reference for mathematical and applied statisticians and biostatisticians; professionals in applied science fields, including psychologists, biometricians, physicians, and quality control and reliability engineers; advanced undergraduate- and graduate-level courses on goodness-of-fit techniques; and professional seminars and symposia on applied statistics, quality control, and reliability.