Chapter 10, Probability and Stats
10.1 Introduction
Axiom 1: P(S) = 1.
This says it is certain that something must happen: if we toss a coin, the event A = {heads, tails} must occur, so P(A) = 1.
$$P(A_1 \cup A_2 \cup \ldots \cup A_n) = P(A_1) + P(A_2) + \ldots + P(A_n). \tag{10.1}$$
This axiom is more subtle than the first two, and is known as the additivity property of probability. It says we can calculate probabilities of complicated events by adding up the probabilities of smaller events, provided the smaller events are disjoint and together make up the entire complicated event. When we say disjoint we mean the events do not intersect. This axiom can be extended to a countable sequence of disjoint events $A_1, A_2, \ldots$; the extension is needed in general but not at MATH1002 level.
Since $A \cup B = A \cup (\bar{A} \cap B)$ and the events $A$ and $\bar{A} \cap B$ are disjoint, additivity gives
$$P(A \cup (\bar{A} \cap B)) = P(A) + P(\bar{A} \cap B) = P(A) + P(B) - P(A \cap B),$$
where the last step uses the fact that $B$ is the disjoint union of $A \cap B$ and $\bar{A} \cap B$, so $P(\bar{A} \cap B) = P(B) - P(A \cap B)$. Hence
$$P(A \cup B) = P(A) + P(B) - P(A \cap B).$$
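The identity is easy to verify on a small, equally likely sample space. The following sketch (the fair-die events are our own illustration, not an example from the text) checks it by direct enumeration:

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}     # sample space: one roll of a fair die
A = {2, 4, 6}              # event: the roll is even
B = {4, 5, 6}              # event: the roll is at least 4

def P(E):
    """Probability of event E under equally likely outcomes."""
    return Fraction(len(E), len(S))

# P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
assert P(A | B) == P(A) + P(B) - P(A & B)
print(P(A | B))            # 2/3
```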
1. $A_1 \cup A_2 \cup \ldots \cup A_k = S$
2. $A_i \cap A_j = \emptyset$ for all $i \ne j$
$$P(A_j \mid B) = \frac{P(B \mid A_j)\,P(A_j)}{\sum_{i=1}^{k} P(B \mid A_i)\,P(A_i)}, \qquad j = 1, 2, \ldots, k.$$
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid \bar{A})\,P(\bar{A})}.$$
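Bayes' rule is simple to apply numerically. Here is a minimal sketch of the two-event form; the numbers (1% prevalence, a test that is positive 95% of the time given the condition and 10% of the time without it) are hypothetical values chosen only to exercise the formula:

```python
P_A = 0.01                # P(A): prior probability of the condition
P_B_given_A = 0.95        # P(B|A): probability of a positive test given A
P_B_given_notA = 0.10     # P(B|Ā): false-positive probability

# Two-event Bayes' rule, exactly as in the display above.
posterior = (P_B_given_A * P_A) / (
    P_B_given_A * P_A + P_B_given_notA * (1 - P_A)
)
print(posterior)          # ≈ 0.0876: even a positive test leaves P(A|B) small
```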
$$A(x) = \{s : X(s) = x\} \subseteq S.$$
$$p_X(x) \ge 0 \quad\text{and}\quad \sum_{\text{all } x} p_X(x) = 1.$$
$$F_X(t) = P(X \le t) = \sum_{x \le t} p_X(x).$$
1. $0 \le F_X(t) \le 1$
2. $F_X(t)$ is non-decreasing in $t$
3. $\lim_{t\to\infty} F_X(t) = 1$
4. $\lim_{t\to-\infty} F_X(t) = 0$
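As a concrete illustration of building $F_X$ from $p_X$, the sketch below (the fair-die p.m.f. is again our own example) evaluates the sum $\sum_{x \le t} p_X(x)$ directly and exhibits the properties above:

```python
# p.m.f. of a fair six-sided die: p_X(x) = 1/6 for x = 1, ..., 6.
p_X = {x: 1 / 6 for x in range(1, 7)}

def F_X(t):
    """c.d.f. of X at t: the sum of p_X(x) over all x <= t."""
    return sum(p for x, p in p_X.items() if x <= t)

print(F_X(0), F_X(3.5), F_X(6))   # 0, 0.5, ≈1.0
```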
$$E[X] = \sum_{x \in \text{range of } X} x\, p_X(x)$$
$$E[H(X)] = \sum_{x \in \text{range of } X} H(x)\, p_X(x)$$
1. $E[c] = c$,
2. $E[cX] = cE[X]$,
3. $E[X + Y] = E[X] + E[Y]$.
$$\mathrm{Var}(X) = E[(X - \mu_X)^2] = \sum_{x \in \text{range of } X} (x - \mu_X)^2\, p_X(x)$$
and the standard deviation is defined by $\sigma_X = \sqrt{\mathrm{Var}(X)}$.
Intuitively, $\sigma_X^2$ and $\sigma_X$ are measures of how spread out the distribution of X is, or how much it varies. As a measure of variability the variance is not so intuitive because it is measured in different units than the random variable, but it is convenient mathematically. Therefore the standard deviation $\sigma_X$ is also defined; it is the square root of the variance, providing a measure of variability in the same units as the random variable. The variance of a random variable satisfies the following properties:
1. $\mathrm{Var}(X) \ge 0$.
3. $\mathrm{Var}(X) = E[X^2] - \mu_X^2 = E[X^2] - E[X]^2$.
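The shortcut formula in property 3 is easy to check against the defining sum. A minimal sketch, reusing the illustrative fair-die p.m.f. from above:

```python
p_X = {x: 1 / 6 for x in range(1, 7)}   # fair die p.m.f.

mu  = sum(x * p for x, p in p_X.items())        # E[X] = 3.5
EX2 = sum(x**2 * p for x, p in p_X.items())     # E[X²]

var_definition = sum((x - mu)**2 * p for x, p in p_X.items())
var_shortcut   = EX2 - mu**2

print(var_definition, var_shortcut)             # both ≈ 2.9167
```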
$$p_Y(y) = p^y q^{1-y}, \qquad y = 0, 1.$$
$$\mu_Y = 1 \times p + 0 \times (1 - p) = p$$
(clear).
[Figure: two panels, each with horizontal axis x = 0, ..., 20; the left vertical axis is labelled y (0.0 to 1.0) and the right vertical axis p (0.00 to 0.20).]
Observe that the ratio is close to 3 for small p and close to 1 for large p, illustrating the general principle of reliability: redundancy improves system reliability when components are 'unreliable', but there is little advantage in having redundancy when the components are highly reliable.
Note that Theorem 10.7 and Theorem 10.8 hold also for continu-
ous random variables.
[Figure: two panels showing the exponential distribution for t from 0 to 5, with vertical axes labelled F (the c.d.f.) and f (the p.d.f.).]
$$f_T(x) = \begin{cases} \lambda e^{-\lambda x} & x > 0 \\ 0 & x \le 0 \end{cases}$$
$$F_T(t) = \begin{cases} 1 - e^{-\lambda t} & t > 0 \\ 0 & t \le 0 \end{cases}$$
$$E[X] = \frac{1}{\lambda} \quad\text{and}\quad \mathrm{Var}(X) = \frac{1}{\lambda^2}.$$
$$P(T \ge t) = 1 - F_T(t) = e^{-\lambda t}.$$
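The survival probability $P(T \ge t) = e^{-\lambda t}$ can be checked against simulation. A minimal sketch using only the standard library; $\lambda = 0.5$ and $t = 2$ are arbitrary illustrative values:

```python
import math
import random

lam, t = 0.5, 2.0

exact = math.exp(-lam * t)                 # P(T >= t) = e^{-λt}

# Crude Monte Carlo estimate of the same probability.
n = 100_000
estimate = sum(random.expovariate(lam) >= t for _ in range(n)) / n

print(exact, estimate)                     # both ≈ 0.3679
```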
$$f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x-\mu)^2/2\sigma^2}$$
[Figure: two panels of normal distribution curves plotted for x from −4 to 4.]
$$P(a \le X \le b) = P\left(Z \le \frac{b-\mu}{\sigma}\right) - P\left(Z \le \frac{a-\mu}{\sigma}\right) = F_Z\left(\frac{b-\mu}{\sigma}\right) - F_Z\left(\frac{a-\mu}{\sigma}\right).$$
Theorem 10.9 and Corollary 10.1 say that we can write probability statements about any normally distributed random variable in terms of the standard normal c.d.f. Section 10.4.3 at the end of this chapter contains approximations of $P(Z < z) = F_Z(z)$ for $z > 0$ for a standard normal random variable. Some equalities are useful to remember: $P(Z > z) = 1 - P(Z < z)$ and, for $z > 0$, $P(Z < -z) = P(Z > z) = 1 - P(Z < z)$.
In several problems in statistical inference the probability for a standard normal is given, e.g. $P(Z \le z) = 0.95$, and we are asked for the corresponding value of $z$, e.g. $z \approx 1.64$ in this case.
α = P ( Z > z α ),
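Both the forward look-up $F_Z(z)$ and the inverse look-up $z_\alpha$ are easy to check numerically. A minimal sketch using Python's standard library (the NormalDist class); nothing here goes beyond the definitions above:

```python
from statistics import NormalDist

Z = NormalDist()          # standard normal: mu = 0, sigma = 1

print(Z.cdf(1.64))        # P(Z < 1.64) ≈ 0.9495, matching the table in 10.4.3
print(Z.cdf(-1.64))       # P(Z < −1.64) = 1 − P(Z < 1.64) by symmetry
print(Z.inv_cdf(0.95))    # the z with P(Z ≤ z) = 0.95, i.e. z ≈ 1.6449
print(Z.inv_cdf(0.99))    # z_{0.01} ≈ 2.3263, the 2.33 used in the next example
```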
$P(X > a) = 0.01$. We know $P(X > a) = P(Z > (a - \mu)/\sigma)$. We also know from Section 10.4.3 that $P(Z > 2.33) \approx 0.01$, so
$$\frac{a - 75}{\sigma} = 2.33$$
and, with $\sigma = 0.1$, $a = 75.233$. Thus if 1% of screws get rejected then the smallest to be rejected would be approximately 75.233 mm.
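The same cutoff can be computed directly rather than read from the table. A minimal sketch, assuming (as in the reconstruction above) $\mu = 75$ and $\sigma = 0.1$:

```python
from statistics import NormalDist

mu, sigma = 75.0, 0.1     # assumed mean and standard deviation of screw length
a = mu + NormalDist().inv_cdf(0.99) * sigma   # cutoff leaving 1% in upper tail
print(a)                  # ≈ 75.2326, i.e. the 75.233 mm found above
```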
10.4.1 Estimation
A very basic concept in statistical inference is that of a random sample. By sample we mean only a fraction or portion of the whole – crash testing every car to inspect the effectiveness of airbags is not a good idea. By random we mean essentially that the portion taken is determined non-systematically. Heuristically, it is expected that a haphazardly selected random sample will be representative of the whole population because there is no selection bias. For the purposes of MATH1002 (and often in practice), we will assume every observation in a random sample is generated by the same probability distribution and each observation is made independently of the others. This motivates the following definition.
Also,
$$\bar{x}_n = (x_1 + x_2 + \ldots + x_n)/n$$
2.58 2.58 1.75 0.53 3.29 2.04 3.46 2.92 3.10 2.41
3.89 1.99 0.74 1.59 0.35 0.03 0.52 1.42 0.04 4.02
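For the record, the sample mean of these twenty observations can be computed directly (a minimal sketch):

```python
data = [2.58, 2.58, 1.75, 0.53, 3.29, 2.04, 3.46, 2.92, 3.10, 2.41,
        3.89, 1.99, 0.74, 1.59, 0.35, 0.03, 0.52, 1.42, 0.04, 4.02]

x_bar = sum(data) / len(data)   # x̄_n = (x_1 + ... + x_n)/n
print(x_bar)                    # 1.9625
```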
particularly since we know the distribution of $\bar{X}_n$ will shrink around the true mean, because $E[\bar{X}_n] = \mu$ and its variance decreases with $n$. Finally, in the case of a normal random sample we also have
$$\bar{X}_n \sim N(\mu_X, \sigma_X^2/n).$$
$$Z_n = \frac{\bar{X}_n - \mu_X}{\sigma_X/\sqrt{n}}, \qquad n = 1, 2, 3, \ldots$$
where
$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i.$$
Loosely, the CLT states that for any random sample, as n gets larger the distribution of the sample average $\bar{X}_n$, properly normalised, approaches a standard normal distribution. Even more loosely, the CLT states that, for large n, $\bar{X}_n \mathrel{\dot\sim} N(\mu_X, \sigma_X^2/n)$, where $\dot\sim$ denotes 'approximately distributed as'. The CLT is one of the most powerful results presented in probability theory. Obviously, knowing the distribution of the sample average is important because it provides a measure of how precise we believe our estimate of the model parameter to be. However, more powerfully, the CLT means that for any random variable we can always perform statistical inference on the mean or expected value. Figure 10.4 shows an approximate p.d.f. of an exponential random variable and of the standardised estimator $\bar{X}_n$ based on random samples of increasing size from this random variable. The figure shows that although the original density of each $X_i$ is exponential, the p.d.f. of the standardised estimator approaches the standard normal distribution as n increases.
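A sketch of the experiment behind a figure like Figure 10.4 (our own simulation, not the authors' code): draw exponential samples, standardise their means, and compare a tail probability with the standard normal value.

```python
import math
import random
from statistics import NormalDist

lam = 1.0                  # exponential rate, so mu_X = sigma_X = 1/lam = 1
reps = 2_000               # number of simulated values of Z_n per sample size

def z_n(n):
    """One draw of Z_n = sqrt(n) * (X̄_n − mu_X) / sigma_X."""
    xbar = sum(random.expovariate(lam) for _ in range(n)) / n
    return math.sqrt(n) * (xbar - 1 / lam) / (1 / lam)

for n in (2, 20, 2000):
    frac = sum(z_n(n) > 1.0 for _ in range(reps)) / reps
    print(n, frac)                      # approaches P(Z > 1) as n grows

print(1 - NormalDist().cdf(1.0))        # P(Z > 1) ≈ 0.1587
```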
By providing the p.d.f. of the estimator $\bar{X}_n$, Theorem 10.10 and Theorem 10.11 (in the case of large n) can be used to provide ranges of plausible values of the mean parameter – a confidence interval.
Definition 10.30. Suppose we have a random sample whose probability
model depends on a parameter θ. The two statistics, L1 and L2 , form a
100 × (1 − α)% confidence interval if
P ( L1 < θ < L2 ) = 1 − α
[Figure 10.4: the p.d.f. $f_X$ of an exponential random variable, together with the approximate p.d.f. of $Z_n$ when n = 2 (upper right panel), n = 20 (bottom left panel) and n = 2000 (bottom right panel). The thick black line shows the p.d.f. of the standard normal random variable.]
For an observed random sample, we substitute the value x̄n for X̄n
to calculate the confidence interval. It is important to note that for
this observed data set, we cannot say the probability that µ X lies in
this interval is 0.95. All we can say is that after repeated samples we
expect 95% of the confidence intervals constructed to contain the
true value of µ X . We will now do an example using the binomial
random variable.
Notice the confidence interval does not extend below 0.05 and we
should probably doubt the producer’s claim that no more than 5%
of their bullets misfire.
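The repeated-sampling interpretation above can itself be simulated. A minimal sketch for the interval $\bar{x}_n \pm 1.96\,\sigma/\sqrt{n}$ with known $\sigma$; the model (normal data with $\mu = 10$, $\sigma = 2$, $n = 25$) is an arbitrary illustration:

```python
import random
from statistics import NormalDist

mu, sigma, n = 10.0, 2.0, 25
z = NormalDist().inv_cdf(0.975)          # ≈ 1.96 for a 95% interval
half_width = z * sigma / n**0.5

trials = 10_000
covered = 0
for _ in range(trials):
    xbar = sum(random.gauss(mu, sigma) for _ in range(n)) / n
    covered += (xbar - half_width < mu < xbar + half_width)

print(covered / trials)   # ≈ 0.95: about 95% of intervals contain the true mu
```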
The discussion to this point in the present section has been our
motivation for testing hypotheses; now we formalise some ideas.
By a statistical hypothesis we mean a statement about the value of a
specified parameter in a probability model. For example, based on
the probability model described by
given, apart from that in the sample data, regarding the direction of the departure from the null hypothesis. In different circumstances other possible alternative hypotheses are the one-sided hypotheses $H_1: \mu > 1000$ or $H_1: \mu < 1000$. Assessment of the null hypothesis is then made using the observed value of a suitable test statistic constructed from the random sample. For our purposes this is just a standardised version of an estimator of the relevant parameter – in our case $Z = \sqrt{n}(\bar{X}_n - \mu_X)/\sigma_X$. Based on the observed test statistic, $z$, we determine the P-value, this being the probability of obtaining a value of the test statistic at least as extreme as that observed. In determining the P-value we use one or both tails of the distribution, depending on whether the alternative hypothesis is one-sided ($P(Z > z)$ or $P(Z < z)$) or two-sided ($P(|Z| > z)$), respectively.
H0 : p = 0.05
H1 : p > 0.05
$$\frac{\bar{X}_n - 0.05}{\sqrt{0.05(1-0.05)/1000}} \sim N(0,1)$$
because the sample size is large enough for the CLT to apply. The observed value of the test statistic is
$$\frac{0.07 - 0.05}{\sqrt{0.05(1-0.05)/1000}} = 2.9019.$$
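Restating this computation in code (a sketch; the one-sided P-value comes from the standard normal c.d.f. as described above):

```python
import math
from statistics import NormalDist

p0, p_hat, n = 0.05, 0.07, 1000          # H0 value, observed proportion, rounds

z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
p_value = 1 - NormalDist().cdf(z)        # one-sided: P(Z > z)

print(z, p_value)                        # z ≈ 2.9019, P-value ≈ 0.0019
```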
$H_0: \mu_X = 50$
$H_1: \mu_X \ne 50$
The test is two-sided because we do not have any extra information to specify which way we should test.
$$\frac{\bar{X}_n - 50}{\sqrt{16/25}} \sim N(0,1).$$
The observed value of the test statistic is
$$\frac{51.3 - 50}{4/5} = 1.625.$$
The P-value is $P(|Z| > 1.625) = 2(1 - F_Z(1.625)) \approx 0.104$, where $Z \sim N(0,1)$. See the figure on the right for an illustration of this P-value.
[Figure: the p.d.f. of $\bar{X}_n$ if the propellant is burning according to specification. The observed estimate is indicated with a black dot and does not lie far out in the tail of the p.d.f. The probability of lying further out in the tail in either direction is the shaded area.]
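The two-sided P-value can be verified directly (a minimal sketch using the numbers above):

```python
from statistics import NormalDist

z = (51.3 - 50) / (4 / 5)                # observed test statistic, = 1.625
p_value = 2 * (1 - NormalDist().cdf(z))  # two-sided: P(|Z| > 1.625)

print(z, p_value)                        # ≈ 0.104: weak evidence against H0
```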
1. To reject H0 does not mean that the null hypothesis is false, only that the data shows sufficient evidence to cast doubt on H0. To not reject H0 does not mean it is true, only that the data shows insufficient evidence against H0.
10.4.3 The standard normal c.d.f.: the table gives $F_Z(z) = P(Z \le z)$; the row gives $z$ to one decimal place and the column gives the second decimal place.
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852
0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133
0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389
1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830
1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319
1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441
1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545
1.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633
1.8 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706
1.9 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767
2.0 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817
2.1 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857
2.2 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890
2.3 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916
2.4 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936
2.5 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952
2.6 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964
2.7 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974
2.8 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981
2.9 0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986
3.0 0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990
3.1 0.9990 0.9991 0.9991 0.9991 0.9992 0.9992 0.9992 0.9992 0.9993 0.9993
3.2 0.9993 0.9993 0.9994 0.9994 0.9994 0.9994 0.9994 0.9995 0.9995 0.9995
3.3 0.9995 0.9995 0.9995 0.9996 0.9996 0.9996 0.9996 0.9996 0.9996 0.9997
3.4 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9998
3.5 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998