
Elementary Statistics Tables

Henry R. Neave
University of Nottingham

London and New York


Preface

Having published my Statistics tables in 1978, I must face the obvious question: why another book of statistics tables so soon afterwards? The answer derives from reactions to the first book from a sample of some 500 lecturers and teachers covering a wide range both of educational establishments and of departments within those establishments. Approximately half found Statistics tables suitable for their needs; however, the other half indicated that their courses covered rather fewer topics than are included in the Tables, and therefore that a less comprehensive collection would be adequate. Further, some North American advisers suggested that more 'on the spot' descriptions, directions and illustrative examples would make such a book far more attractive and useful. Elementary statistics tables has been produced with these comments very much in mind.
The coverage of topics is probably still wider than in most introductory Statistics courses. But useful techniques are often omitted from such courses because of the lack of good tables or charts in the textbook being used, and it is one of the aims of this book to enable instructors to broaden the range of statistical methods included in their syllabuses. Even if some of the methods are completely omitted from the course textbook, instructors and students will find that these pages contain brief but adequate explanations and illustrations.
In deciding the topics to be included, I was guided to an extent by draft proposals for the Technician Education Council (TEC) awards, and Elementary statistics tables essentially covers the areas included in this scheme for which tables and/or charts are necessary. The standard distributions are of course included, i.e. binomial, Poisson, normal, t, χ² and F. Both individual and cumulative probabilities are given for binomial and Poisson distributions, the cumulative Poisson probabilities being derived from a newly designed chart on which the curves are virtually straight: this should enhance ease of reading and accuracy. A selection of useful nonparametric techniques is included, and advocates of these excellent and easy-to-apply methods will notice the inclusion of considerably improved tables for the Kruskal-Wallis and Friedman tests, and a new table for a Kolmogorov-Smirnov general test for normality. The book also contains random-number tables, including random numbers from normal and exponential distributions (useful for simple simulation experiments), binomial coefficients, control chart constants, various tables and charts concerned with correlation and rank correlation, and charts giving confidence intervals for a binomial p. The book ends with four pages of familiar mathematical tables and a table of useful constants, and a glossary of symbols used in the book will be found inside the back cover.
Considerable care and thought has been given to the design and layout of the tables. Special care has been taken to simplify a matter which many students find confusing: which table entries to use for one-sided and two-sided tests and for confidence intervals. Several tables, such as the percentage points for the normal, t, χ² and F distributions, may be used for several purposes. Throughout this book, α1 and α2 are used to denote significance levels for one-sided (or 'one-tailed') and two-sided tests, respectively, and γ indicates confidence levels for confidence intervals. (Where occasion demands, we even go so far as to use separate variants of α1 to denote significance levels for right-hand and left-hand one-sided tests.) If a table can be used for all three purposes, all three cases are clearly indicated, with 5% and 1% critical values and 95% and 99% confidence levels being highlighted.
My thanks are due to many people who have contributed in various ways to the production of this book. I am especially grateful to Peter Worthington and Arthur Morley for their help and guidance throughout its development: Peter deserves special mention for his large contribution to the new tables for the Kruskal-Wallis and Friedman tests. Thanks also to Graham Littler and John Silk who very usefully reviewed some early proposals, and to Trevor Easingwood for discussions concerning the TEC proposals. At the time of writing, the proof-reading stage has not yet arrived; but thanks in advance to Tonie-Carol Brown who will be helping me with that unenviable task. Finally, I must express my gratitude to the staff of the Cripps Computing Centre at Nottingham University: all of the tables and charts have been newly computed for this publication, and the service which they have provided has been excellent.
Naturally, total responsibility for any errors is mine alone. It would be nice to think that there are none, but I would greatly appreciate it if anybody who sees anything that they know or suspect to be incorrect would communicate the facts to me immediately.

HENRY NEAVE
October 1979
First published 1981 by Unwin Hyman Ltd
Sixth impression 1989

Simultaneously published in the USA and Canada by Routledge, 29 West 35th Street, New York, NY 10001

Routledge is an imprint of the Taylor & Francis Group

This edition published in the Taylor & Francis e-Library, 2009.


© 1981 H. R. Neave

All rights reserved. No part of this book may be reprinted or reproduced or utilized in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers.

British Library Cataloguing in Publication Data


A catalogue record for this book is available from
the British Library

Library of Congress Cataloguing in Publication Data


A catalogue record for this book is available from
the Library of Congress

ISBN 0-203-13336-6 Master e-book ISBN

ISBN 0-203-17589-1 (Adobe ebook Reader Format)


ISBN 0-415-08458-X (Print Edition)
The binomial distribution: individual probabilities

If the probability is p that a certain event (often called a 'success') occurs in a trial of an experiment, the binomial distribution is concerned with the total number X of successes obtained in n independent trials of the experiment. Pages 4, 6, 8 and 10 give Prob (X=x) for all possible x and n up to 20, and 39 values of p. For values of p≤0.5 (along the top horizontal) refer to the x-values in the left-hand column; for values of p≥0.5 (along the bottom horizontal) refer to the x-values in the right-hand column.
The binomial distribution: cumulative probabilities

Pages 5, 7, 9 and 11 give cumulative probabilities for the same range of binomial distributions
as covered on pages 4, 6, 8 and 10. For values of (along the top horizontal) refer to the x
values in the left-hand column, the table entries giving Prob (X≥x); for values of (along
the bottom horizontal) refer to the x-values in the right-hand column, the table entries giving
Prob (X≤x) for these cases. Note that cumulative probabilities of the opposite type to those
given may be calculated by Prob (X≤x)=1−Prob (X≥x+1) and Prob (X≥x)=1−Prob (X≤x−1).
EXAMPLES: If ten dice are thrown, what is the probability of obtaining exactly two sixes? With n=10 and p=1/6, Prob (X=2) is found from the table to be 0.2907. If a treatment has a 90% success-rate, what is the probability that all of twelve treated patients recover? With n=12 and p=0.9, the table gives Prob (X=12)=0.2824.
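These table entries are easy to check numerically. A minimal sketch in Python, assuming scipy is available (scipy is not part of the book's text):

    from scipy.stats import binom

    print(binom.pmf(2, 10, 1/6))   # dice example: 0.2907
    print(binom.pmf(12, 12, 0.9))  # treatment example: 0.2824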
EXAMPLES: If ten dice are thrown, what is the probability of obtaining at most two sixes? Now, Prob (X≤2)=1−Prob (X≥3). With n=10 and p=1/6, the table gives Prob (X≥3) as 0.2248, so Prob (X≤2)=1−0.2248=0.7752. If a treatment has a 90% success-rate, what is the probability that no more than ten patients recover out of twelve who are treated? With n=12 and p=0.9, the table gives Prob (X≤10)=0.3410.
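The cumulative entries can be verified the same way; note that binom.sf(x, n, p) gives Prob (X>x), i.e. Prob (X≥x+1):

    from scipy.stats import binom

    print(binom.sf(2, 10, 1/6))      # Prob(X >= 3) = 0.2248
    print(1 - binom.sf(2, 10, 1/6))  # Prob(X <= 2) = 0.7752
    print(binom.cdf(10, 12, 0.9))    # Prob(X <= 10) = 0.3410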

The four charts on pages 12 and 13 are for use in binomial sampling experiments, both to find confidence intervals for p and to produce critical regions for the sample fraction f=X/n (see bottom of page 4 for notation) when testing a null hypothesis H0:p=p0. The charts produce (a) confidence intervals having γ=90%, 95%, 98% and 99% confidence levels; (b) one-sided critical regions (for alternative hypotheses H1 of the form p<p0 or p>p0) for tests with significance levels α1=5%, 2.5%, 1% and 0.5%; and (c) two-sided critical regions (for H1 of the form p≠p0) for tests with significance levels α2=10%, 5%, 2% and 1%. For confidence intervals, locate the sample fraction f on the horizontal axis, trace up to the two curves labelled with the appropriate sample size n, and read off the confidence limits on the vertical axis. For critical regions, locate the hypothesised value of p, p0, on the vertical axis, trace across to the two curves labelled with the sample size n and read off critical values f1 and/or f2 (with f1<f2) on the horizontal axis. The one-sided critical region for H1:p<p0 is f≤f1, or if H1 is p>p0 it is f≥f2. A two-sided critical region appropriate for H1:p≠p0 is comprised of both of these one-sided regions.


The 'curves' are in fact drawn as straight lines joining points corresponding to all n+1 possible values of f (this is seen most clearly for small n). Use of values of f1 and f2 which are in fact not realisable values of f results in conservative critical regions, i.e. actual α1 or α2 values which are less than the nominal values.
EXAMPLES: With eight successes out of twenty, i.e. n=20, X=8 and f=8/20=0.4, the γ=95% confidence interval for p is (0.19:0.64), using the second chart on page 12. Using the same chart, suppose we wish to test H0:p=0.6, again with n=20. We read off f1=0.36 and f2=0.83. So f≤0.36 (i.e. X≤7) is the critical region appropriate for H1:p<0.6, f≥0.83 (i.e. X≥17) is the critical region appropriate for H1:p>0.6, and these two regions combined constitute the α2=5% critical region appropriate for H1:p≠0.6; the separate one-sided significance levels apply to tests where H1 says that p is to the left or right respectively of p0. The true significance levels here are in all cases slightly less than the nominal figures of 2.5% or 5%.
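The interval read off the chart can be compared with the exact (Clopper-Pearson) limits, a standard construction based on the beta distribution; this is a sketch for comparison only, not necessarily the method by which the book's charts were drawn:

    from scipy.stats import beta

    n, x, alpha = 20, 8, 0.05
    lower = beta.ppf(alpha / 2, x, n - x + 1)      # ~0.191
    upper = beta.ppf(1 - alpha / 2, x + 1, n - x)  # ~0.640
    print(lower, upper)                            # compare (0.19 : 0.64)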
Charts giving confidence intervals for p and critical
values for the sample fraction

For description, see pages 10 and 11.


The Poisson distribution: individual probabilities

The main uses of the Poisson distribution are as an approximation to binomial distributions having large n and small p (for notation see page 4) and as a description of the occurrence of random events over time (or other continua). Individual probabilities are given on pages 14–16 for a wide range of values of the mean µ, and cumulative probabilities are obtained from the Poisson probability chart on page 17.
EXAMPLES: A production process is supposed to have a 1% rate of defectives. In a random sample of size eighty, what is the probability of there being (a) exactly two defectives, and (b) at least two defectives? The number X of defectives has a binomial distribution with n=80 and p=0.01; its mean µ is np=80×0.01=0.8. This distribution is well approximated by the Poisson distribution having the same mean, µ=0.8. So immediately we find (a) Prob (X=2)=0.1438. For (b) Prob (X≥2) we can use the chart on page 17 directly. However this probability can also be found by noting that Prob (X≥2)=1−Prob (X≤1)=1−{Prob (X=0)+Prob (X=1)}=1−{0.4493+0.3595}=0.1912, using the above table.
A binomial distribution with large n and a p-value close to 1 may also be dealt with by means of a Poisson approximation if the problem is re-expressed in terms of a small p-value. For example if a treatment has a 90% (p=0.9) success-rate, what is the probability that exactly 95 out of 100 treated patients recover? This is the same as asking what is the probability that exactly 5 patients out of 100 fail to recover when the failure-rate is 10% or 0.1. That is, we want Prob (X=5) in the binomial distribution with n=100 and p=0.1, which can be approximated by the Poisson distribution with mean µ=np=100×0.1=10.0. From page 15, this probability is found to be 0.0378.
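A numerical check of the approximation (scipy assumed, as before):

    from scipy.stats import binom, poisson

    print(binom.pmf(2, 80, 0.01))  # exact binomial: ~0.1443
    print(poisson.pmf(2, 0.8))     # Poisson approximation: 0.1438
    print(poisson.pmf(5, 10.0))    # treatment example: 0.0378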
The Poisson probability chart on page 17 gives cumulative probabilities of the form Prob
(X≥x) where X has a Poisson distribution with mean µ in the range 0.01≤µ≤100. To find
such a probability, locate the appropriate value of µ on the right-hand vertical axis, trace
back along the horizontal to the line or curve labelled with the desired value of x, and read
off the probability on the horizontal axis. The horizontal scale is designed to give most
accuracy in the tails of the distribution, i.e. where the probabilities are close to 0 or 1, and
the vertical scale has been devised to make the curves almost linear.
EXAMPLES: A production process is supposed to have a 1% rate of defectives. In a random sample of size eighty, what is the probability of there being at least two defectives? This question has already been answered on p. 14 using individual probabilities. Here we may read off the probability directly, following the above directions with µ=0.8 and x=2, giving Prob (X≥2)=0.19. Obviously, accuracy may be somewhat limited when using the chart.
Probabilities of events such as X≤2 can also easily be found, since Prob (X≤2)=1−Prob (X≥3); Prob (X≥3) is seen (still with µ=0.8) to be just less than 0.05, say 0.048, giving Prob (X≤2)=1−0.048=0.952.
As a final example, suppose the number X of serious road accidents per week in a certain region has a Poisson distribution with mean µ=2.0. What is the probability of there being no more than three accidents in a particular week? This again can be calculated using either individual probabilities or the chart. From page 14, the probabilities of 0, 1, 2 or 3 accidents are respectively 0.1353, 0.2707, 0.2707 and 0.1804, and adding these we have Prob (X≤3)=0.8571. Using the chart, since Prob (X≤3)=1−Prob (X≥4), we obtain Prob (X≤3)=1−0.14=0.86.
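All of these chart readings agree with direct computation:

    from scipy.stats import poisson

    print(poisson.sf(1, 0.8))   # Prob(X >= 2) = 0.1912
    print(poisson.sf(2, 0.8))   # Prob(X >= 3) = 0.0474, i.e. about 0.048
    print(poisson.cdf(3, 2.0))  # Prob(X <= 3) = 0.8571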
Poisson probability chart (cumulative probabilities)

For description, see page 16.


Probabilities and ordinates
in the normal distribution
The superscript in numbers such as 0.0⁸182 indicates a number of zeros, thus: 0.0⁸182=0.00000000182, and 0.0³483=0.000483.
Proportional parts have not been given in this region because they would not be of sufficient accuracy.
The left-hand column gives the ordinate φ(z) of the standard normal distribution (i.e. the normal distribution having mean 0 and standard deviation 1), z being listed in the second column. The rest of the table gives Φ(z)=Prob (Z≤z), where Z is a random variable having the standard normal distribution. Locate z, expressed to its first decimal place in the second column, and its second decimal place along the top or bottom horizontal: the corresponding table entry is Φ(z). Proportional parts are given for the third decimal place of z in part of the table. These proportional parts should be subtracted if z<0 and added if z>0.
EXAMPLES: Ф(−1.2)=Prob (Z≤−1.2)=0.1151; Ф(−1.23)=0.1093; Ф(−1.234)=0.1086.

The superscript in numbers such as 0.9⁸401 indicates a number of nines, thus: 0.9⁸401=0.99999999401, and 0.9³032=0.999032.
Proportional parts have not been given in this region because they would not be of suf-
ficient accuracy.
EXAMPLES: Φ(1.2)=Prob (Z≤1.2)=0.8849; Φ(1.23)=0.8907; Φ(1.234)=0.8914; Prob (Z≥2.3)=Prob (Z≤−2.3)=Φ(−2.3)=0.0107 (making use of the symmetry of the normal distribution); Prob (0.32≤Z≤1.43)=Φ(1.43)−Φ(0.32)=0.9236−0.6255=0.2981.
Other normal distributions may be dealt with by standardisation, i.e. by subtracting the mean and dividing by the standard deviation. For example if X has the normal distribution with mean 10.0 and standard deviation 2.0, Prob (X≤x)=Φ((x−10.0)/2.0).
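For reference, the same values from scipy (the argument 13 for the mean-10 distribution is merely illustrative, not from the book):

    from scipy.stats import norm

    print(norm.cdf(-1.234))                   # 0.1086
    print(norm.sf(2.3))                       # Prob(Z >= 2.3) = 0.0107
    print(norm.cdf(1.43) - norm.cdf(0.32))    # 0.2981
    print(norm.cdf(13, loc=10.0, scale=2.0))  # Prob(X <= 13) = Phi(1.5) = 0.9332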
Percentage points
of the normal distribution

The following notation is used in this and subsequent tables. q represents a quantile, i.e. q and the tabulated value z are related here by Prob (Z≤z)=q=Φ(z); e.g. Φ(1.9600)=q=0.975, where z=1.9600. α1 denotes significance levels for one-tailed or one-sided critical regions. Sometimes the values corresponding to critical regions in the left-hand and right-hand tails need to be tabulated separately; in other cases one may easily be obtained from the other. Here we have included only right-hand tail values, since left-hand tail values are obtained using the symmetry of the normal distribution. Thus if a 5% critical region in the right-hand tail is required, we find the entry corresponding to α1=5% and obtain Z≥1.6449. Had we required a 5% critical region in the left-hand tail it would have been Z≤−1.6449. α2 gives critical regions for two-sided tests; here |Z|≥1.9600 is the critical region for the two-sided test at the α2=5% significance level. Finally, γ indicates confidence levels for confidence intervals, so a 95% confidence interval here is derived from |Z|≤1.9600. For example with a large sample X1, X2,…, Xn we know that √n(x̄−µ)/s has approximately a standard normal distribution, where x̄=ΣX/n and the adjusted sample standard deviation s is given by s=√{Σ(X−x̄)²/(n−1)}. So a 95% confidence interval for µ is derived from |√n(x̄−µ)/s|≤1.9600, which is equivalent to x̄−1.9600s/√n ≤ µ ≤ x̄+1.9600s/√n.
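A sketch of the same percentage points and the large-sample interval; the sample here is randomly generated purely for illustration:

    import numpy as np
    from scipy.stats import norm

    print(norm.ppf(0.975))  # 1.9600 (alpha2 = 5%, gamma = 95%)
    print(norm.ppf(0.95))   # 1.6449 (alpha1 = 5%)

    rng = np.random.default_rng(0)
    x = rng.normal(10, 2, size=200)         # illustrative large sample
    xbar, s, n = x.mean(), x.std(ddof=1), len(x)
    half = 1.9600 * s / np.sqrt(n)
    print(xbar - half, xbar + half)         # 95% confidence interval for mu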
Percentage points of the
Student t distribution

The t distribution is mainly used for testing hypotheses and finding confidence intervals for means, given small samples from normal distributions. For a single sample, t=√n(x̄−µ)/s has the t distribution with v=n−1 degrees of freedom (see notation above). So, e.g. if n=10, giving v=9, the γ=95% confidence interval for µ is x̄−2.262s/√10 ≤ µ ≤ x̄+2.262s/√10. Given two samples of sizes n1 and n2, sample means x̄1 and x̄2, and adjusted sample standard deviations s1 and s2, t=(x̄1−x̄2−(µ1−µ2))/(s√(1/n1+1/n2)) has the t distribution with v=n1+n2−2 degrees of freedom, where s²={(n1−1)s1²+(n2−1)s2²}/(n1+n2−2). So if the population means are denoted µ1 and µ2, then to test H0:µ1=µ2 against H1:µ1>µ2 at the 5% level, given samples of sizes 6 and 10, the critical region is (x̄1−x̄2)/(s√(1/6+1/10))≥1.761, using v=6+10−2=14 and α1=5%. As with the normal distribution, symmetry shows that left-hand tail values are just the tabulated values prefixed with a minus sign.
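The points quoted above come straight from the t distribution:

    from scipy.stats import t

    print(t.ppf(0.975, df=9))   # 2.262: two-sided gamma = 95% value for v = 9
    print(t.ppf(0.95, df=14))   # 1.761: one-sided alpha1 = 5% value for v = 14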
Percentage points
of the chi-squared
(χ²) distribution

The χ² (chi-squared) distribution is used in testing hypotheses and forming confidence intervals for the standard deviation σ and the variance σ² of a normal population. Given a random sample of size n, χ²=(n−1)s²/σ² has the chi-squared distribution with v=n−1 degrees of freedom (s is defined on page 20). So if n=10, giving v=9, and the null hypothesis H0 is σ=5, 5% critical regions for testing against (a) H1:σ<5, (b) H1:σ>5 and (c) H1:σ≠5 are (a) 9s²/25≤3.325, (b) 9s²/25≥16.919 and (c) 9s²/25≤2.700 or 9s²/25≥19.023, using significance levels (a) α1=5% in the left-hand tail, (b) α1=5% in the right-hand tail and (c) α2=5% as appropriate. For example if s²=50.0, this would result in rejection of H0 in favour of H1 at the 5% significance level in case (b) only. A γ=95% confidence interval for σ with these data is derived from 2.700≤(n−1)s²/σ²≤19.023, i.e. 2.700≤450.0/σ²≤19.023, which gives 450/19.023≤σ²≤450/2.700 or, taking square roots, 4.864≤σ≤12.910.
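A numerical check of these critical values and the interval:

    import math
    from scipy.stats import chi2

    v = 9
    print(chi2.ppf(0.05, v), chi2.ppf(0.95, v))    # 3.325, 16.919
    print(chi2.ppf(0.025, v), chi2.ppf(0.975, v))  # 2.700, 19.023

    stat = 9 * 50.0 / 25.0                   # statistic for s^2 = 50, H0: sigma = 5
    print(stat > chi2.ppf(0.95, v))          # True: case (b) rejects H0
    print(math.sqrt(450 / 19.023), math.sqrt(450 / 2.700))  # (4.864, 12.910)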
The χ² distribution also gives critical values for the familiar χ² goodness-of-fit tests and tests for association in contingency tables (cross-tabulations). A classification scheme is given such that any observation must fall into precisely one class. The data then consist of frequency-counts and the statistic used is χ²=Σ(Ob.−Ex.)²/Ex., where the sum is over all the classes, Ob. denoting Observed frequencies and Ex. Expected frequencies, these being calculated from the appropriate null hypothesis H0. It is common to require that no expected frequencies be less than 5, and to regroup if necessary to achieve this. In goodness-of-fit tests, H0 directly or indirectly specifies the probabilities of a random observation falling in each class. It is sometimes necessary to estimate population parameters (e.g. the mean and/or the standard deviation) to do this. The expected frequencies are these probabilities multiplied by the sample size. The number of degrees of freedom v=(the number of classes−1−the number of population parameters which have to be estimated). With contingency tables, H0 is the hypothesis of no association between the classification schemes by rows and by columns, the expected frequency in any cell is (its row's subtotal)×(its column's subtotal)÷(total number of observations), and the number of degrees of freedom v is (number of rows−1)×(number of columns−1).
In all these cases, it is large values of χ² which are significant, so critical regions are of the form χ²≥tabulated value, using α1 significance levels.
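As an illustration of the contingency-table case, scipy's chi2_contingency applies exactly these expected-frequency and degrees-of-freedom rules (the 2×2 counts below are invented for demonstration):

    from scipy.stats import chi2_contingency

    observed = [[30, 20],
                [20, 30]]
    stat, pvalue, dof, expected = chi2_contingency(observed, correction=False)
    print(stat, dof)  # chi-squared = 4.0 with (2-1)x(2-1) = 1 degree of freedom
    print(expected)   # every cell 25.0 = row subtotal x column subtotal / N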
Percentage points
of the F distribution

Three of the main uses of the F distribution are (a) to compare two variances, (b) to give critical values in the wide range of analysis-of-variance tests and (c) to find critical values for the multiple correlation coefficient.

(a) Comparison of two variances

Given random samples of sizes n1 and n2 from two normal populations having standard deviations σ1 and σ2 respectively, and where s1 and s2 denote the adjusted sample standard deviations (see page 20), F=(s1²/σ1²)/(s2²/σ2²) has the F distribution with (v1, v2)=(n1−1, n2−1) degrees of freedom. In the tables the degrees of freedom are given along the top (v1) and down the left-hand side (v2). For economy of space, the tables only give values in the right-hand tail of the distribution. This gives rise to minor inconvenience in some applications, which will be seen in the following illustrations:
(i) One-sided test—H0:σ1=σ2, H1:σ1>σ2. The tabulated figures are directly appropriate. Thus if n1=5 and n2=8, giving v1=4 and v2=7, the α1=5% critical region is s1²/s2²≥4.120.
(ii) One-sided test—H0:σ1=σ2, H1:σ1<σ2. Here we would normally need values in the left-hand tail. However the tabulated values are appropriate if we use the statistic s2²/s1² and switch round the degrees of freedom. So if n1=5 and n2=8, the appropriate α1=5% critical region is s2²/s1²≥6.094 (using v1=7, v2=4).
(iii) Two-sided test—H0:σ1=σ2, H1:σ1≠σ2. Calculate either s1²/s2² or s2²/s1², whichever is the larger, switching round the degrees of freedom if s2²/s1² is chosen, and enter the tables using the α2 significance levels. So if n1=5 and n2=8, giving v1=4 and v2=7, then we reject H0 in favour of H1 at the α2=5% significance level if either s1²/s2²≥5.523 or s2²/s1²≥9.074.
(iv) Confidence interval for σ1/σ2 or σ1²/σ2². This is derived from an interval of the form f1≤(s1²/s2²)/(σ1²/σ2²)≤f2, where f2 is read directly from the tables, using the desired confidence level γ, and f1 is the reciprocal of the tabulated value found after switching the degrees of freedom. Thus if γ=95%, and n1=5, n2=8 giving v1=4, v2=7 again, then f2=5.523 and f1=1/9.074. So, e.g. if s1²/s2²=4.0 we have 1/9.074≤4.0/(σ1²/σ2²)≤5.523 which, after a little manipulation, gives 4.0/5.523≤σ1²/σ2²≤4.0×9.074, and taking square roots yields (0.851:6.025) as the γ=95% confidence interval for σ1/σ2.
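The quoted table entries can be reproduced directly:

    from scipy.stats import f

    v1, v2 = 4, 7
    print(f.ppf(0.95, v1, v2))       # 4.120: alpha1 = 5% point for case (i)
    print(f.ppf(0.975, v1, v2))      # 5.523: f2 for the alpha2 = 5% test and the CI
    print(1 / f.ppf(0.975, v2, v1))  # f1 = 1/9.074 after switching the d.f.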

(b) Analysis-of-variance (ANOVA) tests

The F statistics produced in the standard analysis-of-variance procedures are in the correct form for direct application of the tables, i.e. the critical regions are F≥tabulated value. Note that α1 (not α2) significance levels should be used. In the one-way classification analysis-of-variance, v1 is one less than the number of samples being compared; otherwise in experiments where more than one factor is involved, F statistics can be found to test the effect of each of the factors and v1 is then one less than the number of levels of the particular factor being examined. If an F statistic is being used to test for an interactive effect between two or more factors, v1 is the product of the numbers of degrees of freedom for the component factors. v2 is the number of degrees of freedom in the residual (or error, or within-sample) sum of squares, and is usually calculated as (total number of observations−1)−(total number of degrees of freedom attributable to individual factors and their interactions (if relevant)). If the experiment includes replication, and a replication effect is included in the underlying model, this also counts as a factor for these purposes.
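A small worked sketch of this degrees-of-freedom bookkeeping, for an invented two-factor experiment (3 and 4 levels, 2 replicates):

    levels_a, levels_b, reps = 3, 4, 2
    n_obs = levels_a * levels_b * reps          # 24 observations
    v_a, v_b = levels_a - 1, levels_b - 1       # 2 and 3
    v_ab = v_a * v_b                            # 6 (interaction)
    v_resid = (n_obs - 1) - (v_a + v_b + v_ab)  # 12 residual degrees of freedom
    print(v_a, v_b, v_ab, v_resid)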

(c) Testing a multiple correlation coefficient

In a multiple linear regression Ŷ=a0+a1X1+a2X2+…+akXk, where a0, a1, a2,…, ak are estimated by least squares, the multiple correlation coefficient R is a measure of the goodness-of-fit of the regression model. R can be calculated as √{Σ(Ŷ−Ȳ)²/Σ(Y−Ȳ)²}, where Y denotes the observed values and Ȳ their mean. R is also the linear correlation coefficient of Ŷ with Y. Assuming normality of residuals, R can be used to test if the regression model is useful. Calculate F=(n−k−1)R²/{k(1−R²)}, where n is the size of the sample from which R was computed, and the critical regions showing evidence that the model is indeed useful are of the form F≥tabulated value, using the F tables with v1=k, v2=n−k−1 and α1 significance levels.
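For example, with illustrative (invented) values n=20, k=3 and R=0.7:

    from scipy.stats import f

    n, k, R = 20, 3, 0.7
    F = (n - k - 1) * R**2 / (k * (1 - R**2))
    print(F)                          # ~5.12
    print(f.ppf(0.95, k, n - k - 1))  # alpha1 = 5% point ~3.24: model looks useful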
Critical values for the Kolmogorov-Smirnov
goodness-of-fit test (for completely
specified distributions)

Goodness-of-fit tests are designed to test a null hypothesis that some given data are a random sample from a specified probability distribution. The Kolmogorov-Smirnov tests are based on the maximum absolute difference Dn between the c.d.f. (cumulative distribution function) F0(x) of the hypothesised distribution and the c.d.f. of the sample (sometimes called the empirical c.d.f.) Fn(x). This sample c.d.f. is the step-function which starts at 0 and rises by 1/n at each observed value, where n is the sample size; i.e. Fn(x) is equal to the proportion of the sample values which are less than or equal to x.
Critical regions for rejecting H0 are of the form Dn≥tabulated value, and in most cases the general alternative hypothesis is appropriate, i.e. the α2 significance levels should be used. One-sided alternative hypotheses can be dealt with by only considering differences in one direction between the c.d.f.s. For example, suppose H1 says that the actual values being sampled are mainly less than those expected from F0(x). If this is the case Fn(x) will tend to rise earlier than F0(x), and so instead of Dn we should then use the statistic Dn⁺=max{Fn(x)−F0(x)}. In the opposite case, where H1 says that the values sampled are mainly greater than those expected from F0(x), we should use Dn⁻=max{F0(x)−Fn(x)}. Critical regions are Dn⁺ (or Dn⁻) ≥tabulated value, and in these one-sided tests the α1 significance levels should be used.
For illustration, let us test the null hypothesis H0 that the following ten observations (derived in fact from part of the top row of the table of random digits on page 42) are a random sample from the uniform distribution over (0:1), having c.d.f. F0(x)=0 for x<0, F0(x)=x for 0≤x≤1, and F0(x)=1 for x>1:

0.02484 0.88139 0.31788 0.35873 0.63259 0.99886 0.20644 0.41853 0.41915 0.02944
Sorting the data into ascending order, we have:

0.02484 0.02944 0.20644 0.31788 0.35873 0.41853 0.41915 0.63259 0.88139 0.99886

It is then easy to draw the sample c.d.f., F10(x), and from the diagram we find that the
maximum vertical distance between the two c.d.f.s, which occurs at x=0.41915, is
D10=0.7−0.41915=0.28085. But the critical region for rejection of H0 even at the α2=10%
significance level is D10≥0.3687, and so we have no reason here to doubt the null
hypothesis.
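scipy's kstest reproduces this calculation from the raw data:

    from scipy.stats import kstest

    data = [0.02484, 0.88139, 0.31788, 0.35873, 0.63259,
            0.99886, 0.20644, 0.41853, 0.41915, 0.02944]
    print(kstest(data, 'uniform').statistic)  # D10 = 0.28085 < 0.3687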
The Kolmogorov-Smirnov test may be used both when F0(x) is continuous and when it is discrete. In the continuous case the critical values are exact; in the discrete case they may be conservative (i.e. true α<nominal α).
A particularly useful application of the test is to test data for normality. In this case use may be made of the graph on page 27 of the c.d.f. of the standard normal distribution by first standardising the data, i.e. subtracting the mean and dividing by the standard deviation. The resulting sample c.d.f. may be drawn on page 27 and the Kolmogorov-Smirnov test performed as usual. For example to test the hypothesis that the following data come from the normal distribution with mean 5 and standard deviation 2, we transform each observation X into Z=(X−5)/2:

(original) X 8.74 4.08 8.31 7.80 6.39 7.21 7.05 5.94


(transformed) Z 1.87 −0.46 1.655 1.40 0.695 1.105 1.025 0.47

Then we sort the transformed data into ascending order and draw the sample c.d.f. on the
graph on page 27 (step-heights are 1/8 since the sample size n is 8 here). The maximum
vertical distance between the two c.d.f.s is seen to be about 0.556, and this shows strong
evidence that the data do not come from the hypothesised distribution, since the α2=1%
critical region is D8≥0.5418.
Perhaps it is more commonly necessary to test for normality without the mean and standard deviation being specified. To perform the test in these circumstances, first estimate the mean by x̄=ΣX/n and the standard deviation by s=√{Σ(X−x̄)²/(n−1)}. Standardise the data using these estimates, and then proceed as before except that the critical values on page 27 should be used. For the above eight observations, x̄=6.94 and s=1.484. The transformed data are now:
1.213 −1.927 0.923 0.579 −0.371 0.182 0.074 −0.674

The maximum difference now found between the c.d.f. of this sample and that of the stan-
dard normal distribution is D8=0.155, and this is certainly not significantly large, for even
at the α2=10% level the critical region is D8≥0.2652. We conclude therefore that although
there was strong evidence that the data do not come from the originally specified normal
distribution, they could quite easily have come from some other normal distribution. The
originator of this type of test was W. H. Lilliefors.
Critical values for larger sample sizes than covered in the tables are discussed on
page 35.
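The Lilliefors variant is available in statsmodels (an assumed dependency); it standardises with the estimated mean and standard deviation internally:

    from statsmodels.stats.diagnostic import lilliefors

    x = [8.74, 4.08, 8.31, 7.80, 6.39, 7.21, 7.05, 5.94]
    stat, pvalue = lilliefors(x, dist='norm')
    print(stat)  # ~0.155, matching D8 above: no evidence against normality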
Critical values for the Kolmogorov-Smirnov test
for normality

For description see page 26; for larger sample sizes, see page 35.

The c.d.f. of the standard normal distribution



Nonparametric tests
Pages 29–34 give critical values for six nonparametric tests. The sign test and the Wilcoxon signed-rank test are one-sample tests and can also be applied to matched-pairs data, the Mann-Whitney and Kolmogorov-Smirnov tests are two-sample tests, and the Kruskal-Wallis and Friedman tests are nonparametric alternatives to the standard one-way and two-way analyses-of-variance. Critical values for larger sample sizes than those included in these tables are covered on page 35.
The sign test (page 29). Suppose that the national average mark in an English examination
is 60%. (In nonparametric work, the average is usually taken to be the median rather than
the mean.) Test whether the following marks, obtained by twelve students from a particular
school, are consistent with this average.

70 65 75 58 56 60 80 75 71 69 58 75
+ + + − − 0 + + + + − +

We have printed + or − under each mark to indicate whether it is greater or less than the hypothesised 60. There is one mark of exactly 60 which is completely ignored for the purposes of the test, reducing the sample size n to 11. The sign test statistic S is the number of + signs or the number of − signs, whichever is smaller; here S=3. Critical regions are of the form S≤tabulated value. As the α2=10% critical region for n=11 is S≤2, we cannot reject the null hypothesis H0 that these marks are consistent with an average of 60%.
For a one-sided test, count either the number of + or − signs, whichever the alternative
hypothesis H1 suggests should be the smaller. For example if H1 says that the average mark
is less than 60%, S would be defined as the number of + signs since if H1 is true there will
generally be fewer marks exceeding 60%. Critical regions are of the same form as previ-
ously, but the α1 significance levels should be used.
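Under H0 the number of − signs is binomial with p=½, so the test can be checked directly:

    from scipy.stats import binom

    # S = 3 minus signs out of n = 11 (the mark of exactly 60 is dropped)
    p_two_sided = 2 * binom.cdf(3, 11, 0.5)
    print(p_two_sided)  # ~0.227 > 0.10, so H0 is not rejected, as above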
The Wilcoxon signed-rank test (page 29). This test is more powerful than the sign test as
it takes into account the sizes of the differences from the hypothesised average, rather than
just their signs. In the above example, first subtract 60 from each mark, and then rank the
resulting differences, irrespective of their signs. Again ignore the mark of exactly 60, and
also average the ranks of tied observations.

The Wilcoxon statistic T is the sum of the ranks of either the +ve or −ve differences, whichever is smaller. Here the −ve differences −2, −4 and −2 receive ranks 1.5, 3 and 1.5, so T=1.5+3+1.5=6. Critical regions are of the form T≤tabulated value, and the test thus shows evidence at better than the α2=2% significance level that these marks are inconsistent with the national average, since the 2% critical region for n=11 is T≤7.
For a one-sided test, let T be the sum of the ranks of either the +ve or the −ve differ-
ences, whichever the one-sided H1 suggests should be the smaller—it will be the same
choice as in the sign test—and use the α1 significance levels.
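scipy's wilcoxon reproduces T for these marks (the zero difference is dropped, as in the text):

    from scipy.stats import wilcoxon

    marks = [70, 65, 75, 58, 56, 60, 80, 75, 71, 69, 58, 75]
    d = [m - 60 for m in marks]
    print(wilcoxon(d, zero_method='wilcox').statistic)  # T = 6.0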
Matched-pairs data. Matched-pairs data arise in such examples as the following. One
member of each of eight pairs of identical twins is taught mathematics by programmed
learning, the other by a standard teaching method. Do the test results imply any difference
in the effectiveness of the two teaching methods?

Such data may be analysed by either of the above tests, comparing the twin-by-twin differences in the final row with a hypothesised average of 0. The reader may confirm that S=1 and that T is correspondingly small, so that the null hypothesis of no difference is rejected at the α2=10% level in the sign test and at near to the α2=2% level in Wilcoxon's test.
The Mann-Whitney U test (page 30). Six students from another school take the same
English examination as mentioned above. Their marks are: 53, 65, 63, 57, 68 and 56. We
want to check whether the two sets of students are of different average standards.
We order the two samples of marks together and indicate by A or B whether a mark
comes from the first or second school:

The observations are given ranks as shown, the ranks being averaged in the case of ties (unnecessary if a tie only involves members of one sample). Then either form the sum RA of the ranks of observations from sample A, and calculate UA=RA−½nA(nA+1), or the sum RB of the ranks of observations from sample B, and calculate UB=RB−½nB(nB+1), where nA and nB are the sizes of samples A and B. Finally obtain U as the smaller of UA or nAnB−UA, or equivalently the smaller of UB or nAnB−UB. Critical regions have the form U≤tabulated value. In the above example, RA=135 so that UA=135−78=57, or RB=36 and UB=36−21=15. In either case U is found to be 15, and this provides a little evidence for a difference between the two sets of students since the α2=10% critical region is U≤17 and the 5% region is U≤14. (In the table, sample sizes are denoted by n1 and n2 with n1≤n2.)
For a one-sided test, calculate whichever of UA and UB is more likely to be small if the
one-sided H1 is true, use this in place of U, and refer to the α1 significance levels.
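The same U emerges from scipy, whose statistic is UA for the first sample:

    from scipy.stats import mannwhitneyu

    a = [70, 65, 75, 58, 56, 60, 80, 75, 71, 69, 58, 75]
    b = [53, 65, 63, 57, 68, 56]
    u_a = mannwhitneyu(a, b, alternative='two-sided').statistic  # UA = 57
    print(min(u_a, len(a) * len(b) - u_a))                       # U = 15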
The Kolmogorov-Smirnov two-sample test (page 31). Whereas the Mann-Whitney test is designed specifically to detect differences in average, the Kolmogorov-Smirnov test is used when other types of difference may also be of interest. To calculate the test statistic D, draw the sample c.d.f.s (see page 26) for both sample A and sample B on the same graph; D is then the maximum vertical distance between these two c.d.f.s. To use the table on page 31, form D*=nAnBD, and critical regions are of the form D*≥tabulated value, using the α2 significance levels. A one-sided version of the test is also available, but is not often used since the alternative hypothesis is then essentially concerned not with general differences but a difference in average, for which the Mann-Whitney test is more powerful. Applied to the above example on the two sets of English results, D=7/12 and D*=12×6×7/12=42. This is not even significant at the α2=10% level, as that critical region is D*≥48. This supports the above remark that the Mann-Whitney test (which gave significance at better than the 10% level) is more powerful as a test for differences in average.
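And ks_2samp gives the same D from the raw marks:

    from scipy.stats import ks_2samp

    a = [70, 65, 75, 58, 56, 60, 80, 75, 71, 69, 58, 75]
    b = [53, 65, 63, 57, 68, 56]
    print(ks_2samp(a, b).statistic)  # D = 7/12 = 0.5833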
The Kruskal-Wallis test (pages 32–34). The Kruskal-Wallis test is also designed to detect differences in average, but now when we have three or more samples to compare. Again, as in the Mann-Whitney test, we rank all of the data together (averaging the ranks of tied observations) and form the sum of the ranks in each sample. The test statistic is

H = {12/(N(N+1))} Σ Ri²/ni − 3(N+1),

where k is the number of samples, n1, n2,…, nk are their sizes, N=Σni and R1, R2,…, Rk are the rank sums. Critical regions are of the form H≥tabulated value. Tables are given on page 32 for k=3 and N≤19, on page 33 for k=4 (N≤14), k=5 (N≤13) and k=6 (N≤13), and on page 34 for 3≤k≤6 and equal sample sizes n1=n2=…=nk=n for 2≤n≤25.
To illustrate the Kruskal-Wallis test, we show samples of mileages per gallon for three different engine designs. The value of H computed from these data is significant of a difference between average mileages at better than the 5% level, the α=5% critical region being H≥5.791. (In such cases where there is no meaningful one-sided version of the test, α2 is written as α with no subscript.)
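scipy's kruskal computes H (with a tie correction); the mileage samples below are invented for demonstration, since the book's data table is not reproduced here:

    from scipy.stats import kruskal

    design1 = [31.2, 29.8, 30.5, 32.0]
    design2 = [27.4, 28.1, 26.9, 27.8]
    design3 = [30.1, 31.5, 29.9, 30.8]
    H, p = kruskal(design1, design2, design3)
    print(H, p)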
Friedman's test (page 34). Friedman's test applies when the observations in three or more samples are related or 'blocked' (similarly as with matched-pairs data). If there are k samples and n blocks, the observations in each block are ranked from 1 to k, the rank sums R1, R2,…, Rk for each sample obtained, and Friedman's test statistic is then

M = {12/(nk(k+1))} Σ Ri² − 3n(k+1).

To illustrate the test, suppose that in a mileages survey we use cars of five different ages and obtain mileage figures for each design and age. The value of M computed from these data is strongly significant, since the α=1% critical region is M≥8.400.
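Similarly, friedmanchisquare computes M from one list per sample, each ordered by block (again, invented numbers for demonstration):

    from scipy.stats import friedmanchisquare

    s1 = [8.1, 7.9, 9.0, 8.5, 7.7]
    s2 = [7.2, 7.0, 8.1, 7.6, 6.9]
    s3 = [8.8, 8.2, 9.4, 9.0, 8.3]
    M, p = friedmanchisquare(s1, s2, s3)
    print(M, p)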
Note: All of the nonparametric tests described above have discrete-valued statistics, so
that the exact nominal α-levels are not usually obtainable. The tables give best conservative
critical regions, i.e. the largest regions with significance levels less than or equal to α.
Critical values for the sign test

For description, see page 28; for larger sample sizes, see page 35.
Critical values for the Wilcoxon signed-rank test

For description, see page 28; for larger sample sizes, see page 35.
Critical values for the Mann-Whitney U test

For description, see page 28; for larger sample sizes, see page 35.
Critical values for the Kolmogorov-Smirnov
two-sample test

For description, see page 28; for larger sample sizes, see page 35.
Critical values for the Kruskal-Wallis test
(small sample sizes)

For description, see page 28; for larger equal sample sizes, see page 34.
Critical values for the Kruskal-Wallis test
(equal sample sizes)

For description, see page 28.

Critical values for Friedman’s test


For description, see pages 28–9.



Critical values for nonparametric tests with large samples

For all the eight tests dealt with on pages 26–34 there are approximate methods for finding critical values when sample sizes exceed those covered in the tables.
Approximate critical values for the sign test, Wilcoxon signed-rank test and Mann-Whitney U test may be found from the table of percentage points of the standard normal distribution on page 20. Denote by z the appropriate percentage point of the standard normal distribution, e.g. 1.9600 for an α2=5% two-sided test or 1.6449 for an α1=5% one-sided test. Then calculate µ and σ from the table below. The required critical value is [µ−zσ−½], the square brackets denoting the integer part.
For example in the sign test with sample size n=144, µ=n/2=72 and σ=√n/2=6, so that the α2=5% critical value is [72−1.9600×6−½]=[59.74]=59, i.e. the α2=5% critical region is S≤59. The reader may verify similarly that (i) for the signed-rank test with n=144: µ=5220, σ=501.428, and the α2=5% critical region is T≤4236; and (ii) in the Mann-Whitney test with sample sizes 25 and 30: µ=375, σ=59.161, and the α2=5% critical region is U≤258.
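The rule, with the continuity term as reconstructed above, reproduces all three worked answers:

    import math

    def critical_value(mu, sigma, z):
        return math.floor(mu - z * sigma - 0.5)  # [mu - z*sigma - 1/2]

    n = 144
    print(critical_value(n / 2, math.sqrt(n) / 2, 1.9600))  # sign test: 59
    print(critical_value(n * (n + 1) / 4,
                         math.sqrt(n * (n + 1) * (2 * n + 1) / 24),
                         1.9600))                           # Wilcoxon: 4236
    n1, n2 = 25, 30
    print(critical_value(n1 * n2 / 2,
                         math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12),
                         1.9600))                           # Mann-Whitney: 258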
For the Kolmogorov-Smirnov goodness-of-fit test, approximate critical values are simply found by dividing the constants b in the following table by √n:

So with a sample of size n=144, the α2=5% critical value is 1.3581/√144=0.1132, i.e. the critical region is D144≥0.1132. The same constants b are used to obtain approximate critical regions for the Kolmogorov-Smirnov two-sample test. In this case b is multiplied by {1/n1+1/n2}^½ to give critical values for D (not D*). So with sample sizes 25 and 30, {1/n1+1/n2}^½={1/25+1/30}^½=0.2708 and the α2=5% critical region is D≥1.3581×0.2708=0.3678.
For the Kolmogorov-Smirnov test for normality (with unspecified mean and standard deviation), the critical values are found as in the goodness-of-fit test except that the second row of constants c is used instead of b. In this case the α2=5% critical region with n=144 is D144≥0.8993/√144=0.0749.
Finally, the Kruskal-Wallis and Friedman test statistics are, for large sample sizes, both distributed approximately as the χ² distribution with v=k−1 degrees of freedom. The appropriate χ² values have been inserted at the ends of the tables on pages 32–34; values from the χ² table (page 21) are appropriate.

Linear and rank correlation

When data consist of pairs (X, Y) of related measurements it is often important to study whether there is at least an approximate linear relationship between X and Y. The strength of such a relationship is measured by the linear correlation coefficient ρ (rho), which always lies between −1 and +1. ρ=0 indicates no linear relationship; ρ=+1 and ρ=−1 indicate exact linear relationships of +ve and −ve slopes respectively. More generally, values of ρ near 0 indicate little linear relationship, and values near +1 or −1 indicate strong linear relationships.
Tests, etc. concerning ρ are formulated using the sample linear correlation coefficient r=Σ(X−x̄)(Y−ȳ)/√{Σ(X−x̄)²Σ(Y−ȳ)²}, x̄ and ȳ being the sample mean values of X and Y.
The first table on page 36 is for testing the null hypothesis H0 that ρ=0. Critical regions are |r|≥tabulated value if H1 is the two-sided alternative hypothesis ρ≠0 (using significance levels α2) or r≥tabulated value or r≤−(tabulated value) if H1 is ρ>0 or ρ<0 respectively (using levels α1).
The following data show the market value (in units of £10000) of eight houses four years ago (X) and currently (Y).

Here r is found to be 0.8918. This is very strong evidence in favour of the one-sided H1:ρ>0, since the α1=½% critical region with sample size n=8 is r≥0.8343. Had left-hand tail critical values been required, they would have been given by the tabulated values prefixed with a minus sign.
The construction of confidence intervals for ρ and the testing of values of ρ other than ρ=0 may be accomplished using Fisher's z-transformation. For any value of r or ρ, this gives a 'z-value', z(r) or z(ρ), computed from z(r)=½loge{(1+r)/(1−r)}; and z(r) is known to have an approximate normal distribution with mean z(ρ) and standard deviation 1/√(n−3). A table giving z(r) is provided on page 36, and on page 37 there is a table for converting back from a z-value to its corresponding r-value or ρ-value. If r or ρ is −ve, attach a minus sign to the z-value, and vice versa.
So to find a γ=95% confidence interval for ρ with the above data, we first find the 95% confidence interval for z(ρ) as z(r)±1.9600/√(n−3) (the 1.9600 being the γ=95% value in the table of normal percentage points on page 20) where n=8 and z(r)=z(0.8918), which is about 1.4306 (interpolating between z(0.891)=1.4268 and z(0.892)=1.4316 on page 36). This interval works out to (0.554:2.307). These limits for the value of z(ρ) are then converted to ρ-values by the table on page 37, giving the confidence interval for ρ of (0.503:0.980). As a second example, if we wish to test H0:ρ=0.8 against H1:ρ>0.8 at the α1=5% significance level, the critical value for z(r) would be z(0.8)+1.6449/√(n−3)=1.0986+0.7356=1.834 (the 1.6449 again coming from page 20). The critical region z(r)≥1.834 then converts to r≥0.950 from page 37, and so we are unable to reject H0:ρ=0.8 in favour of H1:ρ>0.8 at this significance level.
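Both calculations reduce to a few lines, since z(r) is simply artanh r:

    import math

    r, n = 0.8918, 8
    z_r = math.atanh(r)                  # 1.4306
    half = 1.9600 / math.sqrt(n - 3)
    lo, hi = z_r - half, z_r + half      # (0.554, 2.307)
    print(math.tanh(lo), math.tanh(hi))  # (0.503, 0.980), the CI for rho

    z_crit = math.atanh(0.8) + 1.6449 / math.sqrt(n - 3)
    print(math.tanh(z_crit))             # critical r = 0.950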
An alternative and quicker method is to use the charts on pages 38–39. For confidence
intervals, locate the obtained value of r on the horizontal axis, trace along the vertical to
the points of intersection with the two curves labelled with the sample size n, and read off
the confidence limits on the vertical axis. For critical values, locate the hypothesised value
of ρ, say ρ0, on the vertical axis, trace along the horizontal to the points of intersection with
the two curves, and read off the critical values on the horizontal axis. If these two values
are r1 and r2, with r1<r2, then the one-sided critical regions with significance level α1 for
testing H0:ρ=ρ0 against H1:ρ<ρ0 or H1:ρ>ρ0 are r≤r1 and r≥r2 respectively, and the critical
region with significance level α2=2α1 for testing H0 against H1:ρ≠ρ0 is comprised of both of
these one-sided regions.
The reader may check the charts for the results found above using the z-transformation.
Accuracy may be rather limited, especially when r and ρ are close to +1 or −1; however the
z-transformation methods are not completely accurate either, especially for small n. Further
inaccuracies may occur for sample sizes not included on the charts, in which case the user
has to judge distances between the curves.
All of the above work depends on the assumption that (X, Y) has a bivariate normal
distribution. Tables for two nonparametric methods, which do not require such an assump-
tion, are given on page 40. These methods do not test specifically for linearity but for the
tendency of Y to increase (or decrease) as X increases.
To calculate Spearman's rank correlation coefficient, first rank the X-values and Y-values separately from 1 to n, calculate the difference in ranks for each (X, Y) pair, and sum the squares of these differences to obtain D². Spearman's coefficient rS is calculated as rS=1−6D²/(n³−n). With the above data, D² is obtained from the rank differences, and the resulting rS exceeds the α1=½% critical value for testing against the tendency for Y to increase with X, rS≥0.8810, so there is virtually conclusive proof that this tendency is present. The general forms of the critical regions are the same as for r above.
For Kendall's rank correlation coefficient, we compare each (X, Y) pair in turn with every other pair; if the pair with the smaller X-value also has the smaller Y-value, the pair is said to be concordant, but if it has the larger Y-value the pair is discordant. If NC and ND are the total numbers of concordant and discordant pairs, Kendall's coefficient τ is calculated as τ=(NC−ND)/{½n(n−1)}, where ½n(n−1) is in fact the total number of comparisons made. Any comparison in which the X-values and/or the Y-values are equal counts ½ to both NC and ND. Critical regions are of the same forms as with r and rS. In the above example the value of τ obtained from the ½×8×7=28 comparisons is again clearly significant of the tendency for Y to increase with X, since it exceeds the tabulated α1=½% critical value.
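Both coefficients are available in scipy; the pairs below are invented, since the book's house-value data are not reproduced here:

    from scipy.stats import spearmanr, kendalltau

    x = [3.0, 4.1, 5.2, 4.8, 6.0, 5.5, 7.1, 6.4]
    y = [4.2, 5.0, 6.1, 6.8, 7.0, 7.9, 8.1, 9.2]
    rs, _ = spearmanr(x, y)
    tau, _ = kendalltau(x, y)
    print(rs, tau)  # ~0.929 and ~0.786 for these illustrative pairs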
Critical regions for large n may be found using the facts that, under the null hypothesis, r, rS and τ have approximate normal distributions with zero means and standard deviations 1/√(n−1) for both r and rS, and {2(2n+5)/9n(n−1)}^½ for τ. For example the reader may check that with n=144 the approximate α2=5% critical regions are |r|≥0.1639, |rS|≥0.1639 and |τ|≥0.1102.
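These large-n figures are simple to reproduce:

    import math

    n, z = 144, 1.9600
    print(z / math.sqrt(n - 1))                                # 0.1639, for r and rS
    print(z * math.sqrt(2 * (2 * n + 5) / (9 * n * (n - 1))))  # 0.1102, for tau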
Critical values for
the sample linear
correlation coefficient r

For description, see page 35.

The Fisher z-transformation


For description, see page 35.


The inverse of the Fisher z-transformation

For description, see page 35.


Charts giving confidence intervals
for ρ and critical values for r

For description, see page 35.




Critical values for Spearman’s rank
correlation coefficient

For description, see page 35.

Critical values for Kendall's rank correlation coefficient

For description, see page 35.


Control chart constants
and conversion factors
for estimating σ

Control charts are designed to aid the regular periodic checking of production and other processes. The situation envisaged is that a quite small sample (the table caters for sample sizes n up to 20) is drawn and examined at regular intervals, and in particular the sample mean x̄ and the sample range R are recorded (the range is the largest value in the sample minus the smallest value). x̄ and R are then plotted on separate control charts to monitor respectively the process average and variability.
The general form of a control chart is illustrated in the diagram. There is a central line representing the expected (i.e. average) value of the quantity (x̄ or R) being plotted when the process is behaving normally (is in control). On either side of the central line are warning limits and action limits. These terms are virtually self-explanatory. The levels are such that if an observation falls outside the warning limits the user should be alerted to watch the subsequent behaviour of the process but should also realise that such observations are bound to occur by chance occasionally even when the process is in control. An observation may also fall outside the action limits when the process is in control, but the probability of this is very small and so a more positive alert would normally be signalled. Information can also be obtained by watching for possible trends and other such features on the charts.
The central line and warning and action limits may be derived from studying pilot samples taken when the process is presumed or known to be in control, or alternatively may be fixed by a priori considerations. If they are derived from pilot samples we shall assume that they are of the same size as those to be taken when the control scheme is in operation and that the mean x̄ and range R are calculated for each such sample. These quantities are then averaged over all the pilot samples to obtain x̿ and R̄. We may also calculate, instead of R, either the unadjusted or the adjusted sample standard deviations S or s (see below).
The charts are then drawn up as follows: the x̄-chart has central line at x̿, warning limits at x̿±W R̄ and action limits at x̿±A R̄; the R-chart has central line at R̄, warning limits at w1R̄ and w2R̄, and action limits at a1R̄ and a2R̄, the constants W, A, w1, w2, a1 and a2 being read from the table for the sample size n in use.
As an alternative to using pilot samples, specifications of the mean µ and/or the standard deviation σ of the process measurements may be used to define the 'in control' situation. If µ is given, use it in place of x̿ in drawing up the x̄-chart. If σ is given, the expected value of R is equal to d1σ, so here define R̄ as d1σ and then proceed as above. This application allows an exact interpretation to be made of the warning and action limits, for if the process measurements are normally distributed with mean µ and standard deviation σ the warning limits thus obtained correspond to quantiles q of 0.025 and 0.975 and the action limits to quantiles of 0.001 and 0.999. In other words, the limits can be regarded as critical values for testing the null hypothesis that the data are indeed from a normal distribution with mean µ and standard deviation σ, the warning limits corresponding to significance levels of α1=2.5% or α2=5% and the action limits to levels of α1=0.1% or α2=0.2%.
If pilot samples are used it may be that the variability of the process has been measured by recording the sample standard deviations rather than ranges. If the unadjusted sample standard deviation S=√{Σ(x−x̄)²/n} has been calculated for each pilot sample, average the values of S to obtain S̄, and then define R̄=d2S̄ and proceed as above. Or, if adjusted sample standard deviations s have been calculated, multiply their average s̄ by d3 to obtain R̄, and again proceed as above. It should be understood that in general these formulae for R̄ will not give exactly the same value as if R̄ were calculated directly from the pilot samples, but represent the expected value of R̄ given the information available.
For convenience we have also included in this table a column of constants c for forming unbiased estimators of the standard deviation σ from either the range of a single sample or the average range of more than one sample of the same size. Denoting by R̄ the range or average range, σ is estimated by cR̄. σ may also be estimated from S̄ or s̄ by cd2S̄ or cd3s̄ respectively.
EXAMPLES: If samples are of size n=10, and pilot samples have average value of the sample means x̿=15.00 and average range R̄=7.00, then the x̄-chart has central line at 15.00, warning limits at 15.00±0.2014×7.00, i.e. 13.59 and 16.41, and action limits at 15.00±0.3175×7.00, i.e. 12.78 and 17.22; the R-chart has central line at 7.00, warning limits at 0.5438×7.00=3.81 and 1.5545×7.00=10.88, and action limits at 0.3524×7.00=2.47 and 1.9410×7.00=13.59. The standard deviation σ may be estimated from the pilot samples as cR̄=0.3249×7.00=2.274.
Alternatively, if the unadjusted sample standard deviations S had been computed instead of ranges, and the average value of the S-values were S̄=2.00, we would define R̄=d2S̄=3.335×2.00=6.670. The reader may confirm that the x̄-chart would then have central line 15.00, warning limits 13.66 and 16.34, and action limits 12.88 and 17.12; and the R-chart would have central line 6.670, warning limits 3.63 and 10.37, and action limits 2.35 and 12.95. The standard deviation σ could be estimated as cR̄=0.3249×6.670=2.167.
Finally if the 'in control' situation is defined by a mean value µ=14.0 and standard deviation σ=2.5, we define R̄=d1σ=3.078×2.5=7.694, and then obtain an x̄-chart with central line 14.0, warning limits 12.45 and 15.55, and action limits 11.56 and 16.44; and the R-chart would have central line 7.694, warning limits 4.18 and 11.96, and action limits 2.71 and 14.93.
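The limit arithmetic for the first example, using the n=10 constants quoted in the worked figures above:

    xbarbar, rbar = 15.00, 7.00
    W, A = 0.2014, 0.3175                  # warning and action factors, n = 10
    w1, w2, a1, a2 = 0.5438, 1.5545, 0.3524, 1.9410

    print(xbarbar - W * rbar, xbarbar + W * rbar)  # 13.59, 16.41
    print(xbarbar - A * rbar, xbarbar + A * rbar)  # 12.78, 17.22
    print(w1 * rbar, w2 * rbar)                    # R-chart warning: 3.81, 10.88
    print(a1 * rbar, a2 * rbar)                    # R-chart action: 2.47, 13.59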
Random digits

Each digit in the table was generated by a process equally likely to give any one of the ten
digits 0, 1, 2,…, 9. The digits have been grouped purely for convenience of reading.
Random digits can be used to simulate random samples from any probability distribution. First note that random numbers U from the continuous uniform distribution on (0:1) can be formed approximately by placing a decimal point in front of groups of, say, five random digits (again for ease of reading), thus: 0.02484, 0.88139, 0.31788, etc. These numbers may in turn be transformed to random numbers X from any continuous distribution with c.d.f. F(x) by solving the equation U=F(X) for X in terms of U—this may be accomplished using a graph of F(x) as shown in the diagram. Random numbers from discrete distributions may be obtained by a similar graphical process or by finding the smallest X such that F(X)>U.
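For example, for the exponential distribution with mean 1, F(x)=1−e⁻ˣ, so U=F(X) solves to X=−ln(1−U); a sketch applying this to the uniform numbers formed above:

    import math

    for u in [0.02484, 0.88139, 0.31788]:
        print(-math.log(1 - u))  # exponential (mean 1) random numbers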
Random numbers from normal distributions

These random numbers are from the standard normal distribution, i.e. the normal distribu-
tion with mean 0 and standard deviation 1. They may be transformed to random numbers
from any other normal distribution with mean μ and standard deviation σ by multiply-
ing them by σ and adding μ. For example to obtain a sample from the normal distribu-
tion with mean μ=10 and standard deviation σ=2, double the numbers and add 10, thus:
2×(0.5117)+10=11.0234, 2×(−0.6501)+10=8.6998, 2×(−0.0240)+10=9.9520, etc.

Random numbers from exponential distributions

These are random numbers from the exponential distribution with mean 1. They may be
transformed to random numbers from any other exponential distribution with mean μ sim-
ply by multiplying them by μ. Thus a sample from the exponential distribution with mean
10 is 6.193, 18.350, 2.285,…, etc.
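Both scaling transformations in code (the standard normal and unit-mean exponential values are those quoted above):

    z = [0.5117, -0.6501, -0.0240]  # standard normal random numbers
    print([2 * v + 10 for v in z])  # 11.0234, 8.6998, 9.9520

    e = [0.6193, 1.8350, 0.2285]    # exponential random numbers, mean 1
    print([10 * v for v in e])      # 6.193, 18.350, 2.285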
Binomial coefficients

for n=1 to 36 and 52 (for playing-card problems)


The binomial coefficient gives the number of different groups of r objects which may be selected from a collection of n objects: e.g. there are six different pairs of letters which may be selected from the four letters A, B, C, D; they are (A, B), (A, C), (A, D), (B, C), (B, D) and (C, D). The order of selection is presumed immaterial, so (B, A) is regarded as the same as (A, B) etc. As a more substantial example, the number of different hands of five cards which may be dealt from a full pack of 52 cards is 2598960.
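Python's math.comb evaluates these directly:

    from math import comb

    print(comb(4, 2))   # 6 pairs from A, B, C, D
    print(comb(52, 5))  # 2598960 five-card hands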
Reciprocals, squares, square roots and their reciprocals, and factorials

Useful constants
The negative exponential function: e⁻ˣ
The exponential function: eˣ
Natural logarithms: logₑx or ln x
Common logarithms: log₁₀x
Glossary of symbols

Main page references
A  factor for action limits on x̄-chart  41
a1  lower action limit on R-chart is a1R̄  41
a2  upper action limit on R-chart is a2R̄  41
c  =σ/E{R}=1/d1; conversion factor for estimating σ from sample range  41
c.d.f. cumulative distribution function: Prob (X≤x)  
D  Kolmogorov-Smirnov two-sample test statistic  28, 35
D*  =nAnBD  28, 31
D²  sum of squares of rank differences  35, 40
Dn  test statistic for Kolmogorov-Smirnov goodness-of-fit test or test for normality  26–27
Dn⁺, Dn⁻  one-sided versions of Dn  26–27

d1  =E{R}/σ; conversion factor from σ to R̄  41
d2  =E{R}/E{S}; conversion factor from S̄ to R̄  41
d3  =E{R}/E{s}; conversion factor from s̄ to R̄  41
E{}  expected, i.e. long-term mean, value of
e  =2.71828; base of natural logarithms  18, 46
F  (Snedecor) F statistic, test or distribution  22–25
F0(x)  c.d.f. of (null) hypothesised probability distribution  26
Fn(x)  sample (empirical) c.d.f.; proportion of sample values which are ≤x  26
f  sample fraction; number of occurrences divided by sample size  10–13
f1  lower critical value for f, or confidence limit using F distribution  10–13, 22
f2  upper critical value for f, or confidence limit using F distribution  10–13, 22
H  Kruskal-Wallis test statistic  28, 32–35
H0  null hypothesis (usually of status quo or no difference)
H1  alternative hypothesis (what a test is designed to detect)

k  number of regression variables  22
k  number of samples  28, 32–35
logₑ, ln  logarithm to base e (natural logarithm), such that if logₑx=y then eʸ=x  45, 47
log₁₀  logarithm to base 10 (common logarithm), such that if log₁₀x=y then 10ʸ=x  45, 48
M  Friedman's test statistic  28, 34–35
max{}  maximum (largest) value of  26
N  total number of observations  28, 32–33
NC  number of concordant pairs, i.e. (X1, Y1), (X2, Y2) with (X1−X2)(Y1−Y2) +ve  35, 40
ND  number of discordant pairs, i.e. (X1, Y1), (X2, Y2) with (X1−X2)(Y1−Y2) −ve  35, 40
n, n1, n2, nA, nB, ni  sample sizes
n  common sample size of equal-size samples  28, 34
(n over r)  binomial coefficient; number of possible groups of r objects out of n  4, 44
p  binomial parameter; probability of event happening at any trial of experiment  4
p0  (null) hypothesised value of p  10–11
Prob()  probability of
q  quantile; the number x such that Prob (X≤x)=q  20
R  multiple correlation coefficient  22
R  sample range: maximum value minus minimum value  41
R̄  average range in pilot samples  41
RA, RB  rank sums of samples A and B  28
r  sample linear correlation coefficient  35–39
r1, r2  lower and upper critical values for r  35
rS  the Spearman rank correlation coefficient  35, 40
S  the sign test statistic  28–29, 35
S  unadjusted sample standard deviation: S=√{Σ(X−x̄)²/n}
S̄  average value of S in pilot samples  41

s  adjusted sample standard deviation, satisfying E{s²}=σ² (but not E{s}=σ); with single sample, s=√{Σ(X−x̄)²/(n−1)}
s̄  average value of s in pilot samples  41
s1², s2²  adjusted sample variances

T  the Wilcoxon signed-rank test statistic  28–29, 35
t  'Student' t statistic, test or distribution  20
U  random variable having uniform distribution on (0:1)  42
U  the Mann-Whitney test statistic  28, 30, 35
UA  =RA−½nA(nA+1)  28
UB  =RB−½nB(nB+1)  28
W  factor for warning limits on x̄-chart  41
w1  lower warning limit on R-chart is w1R̄  41
w2  upper warning limit on R-chart is w2R̄  41

X  random variable
x  value of X
x̄, x̄1, x̄2, x̄A, x̄B  sample means
x̿  average of sample means in pilot samples  41
(X, Y)  matched-pair or bivariate (two-variable) quantity  28, 35
Y  random variable
y  value of Y
Z  random variable having standard normal distribution  18–20
z  value of Z  18–20
z, z(r), z(ρ)  values obtained using Fisher's z-transformation  35–37
α  sometimes used in place of α2 if one-sided test non-existent  28
α1  significance level for one-sided test  20
α1 (left-tail)  significance level for left-hand tail one-sided test  20
α1 (right-tail)  significance level for right-hand tail one-sided test  20
α2  significance level for two-sided test  20
γ  confidence level for confidence intervals
µ, µ1, µ2  population means; means of probability distributions; µ=E{X}

v, v1, v2  degrees of freedom (indices for t, χ² and F distributions)  20–25
π  mathematical constant, =3.14159  18, 45
ρ  population linear correlation coefficient  35–39
ρ0  (null) hypothesised value of ρ  35
Σ  summation, e.g. ΣX=ΣXi=X1+X2+X3+…
σ, σ1, σ2  population standard deviations; standard deviations of probability distributions; σ=(E{(X−µ)²})^½
σ²  population variance; variance of probability distribution; σ²=E{(X−µ)²}  21
τ  the Kendall rank correlation coefficient  35, 40
Φ  c.d.f. of the standard normal distribution  18–20, 27
φ  ordinate of the standard normal curve  18–19
χ²  chi-squared statistic, test or distribution  21

<  is less than
≤  is less than or equal to
>  is greater than
≥  is greater than or equal to
≠  is not equal to
+ve  positive (>0)
−ve  negative (<0)
| |  modulus, absolute value, ignore minus sign if −ve
[ ]  integer part; [x] is the largest integer ≤x  35
!  factorial, e.g. 4!=4×3×2×1=24  44–45
∫  integral
