Part 8
Abdullah Al-Shiha
9.1 Introduction:
• Suppose we have a population with some unknown
parameter(s).
Example: Normal(μ,σ)
μ and σ are parameters.
• We need to draw conclusions (make inferences) about the
unknown parameters.
• We select samples, compute some statistics, and make
inferences about the unknown parameters based on the
sampling distributions of the statistics.
* Statistical Inference
(1) Estimation of the parameters (Chapter 9)
→ Point Estimation
→ Interval Estimation (Confidence Interval)
(2) Tests of hypotheses about the parameters (Chapter 10)
Point Estimation:
A point estimate of some population parameter θ is a single value θ̂ of a statistic Θ̂. For example, the value x̄ of the statistic X̄ computed from a sample of size n is a point estimate of the population mean μ.
Notation:
$Z_a$ is the Z-value leaving an area of $a$ to the right; i.e., $P(Z > Z_a) = a$, or equivalently, $P(Z < Z_a) = 1 - a$.
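When σ is known, a (1−α)100% confidence interval for μ is
$$\left(\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) \iff \bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \iff \bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
where $Z_{\alpha/2}$ is the Z-value leaving an area of α/2 to the right; i.e., $P(Z > Z_{\alpha/2}) = \alpha/2$, or equivalently, $P(Z < Z_{\alpha/2}) = 1 - \alpha/2$.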
Note:
We are (1−α)100% confident that $\mu \in \left(\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$.
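As a minimal sketch, the interval above can be computed in Python with the standard library's `statistics.NormalDist`, whose `inv_cdf(1 - α/2)` returns $Z_{\alpha/2}$, so no printed table is needed. The helper names `z_critical` and `z_interval` are hypothetical, chosen only for this illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_critical(alpha: float) -> float:
    """Return Z_{alpha/2}: the value with area alpha/2 to its right, i.e. P(Z < z) = 1 - alpha/2."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def z_interval(xbar: float, sigma: float, n: int, alpha: float) -> tuple[float, float]:
    """(1 - alpha)100% confidence interval for mu when sigma is known."""
    margin = z_critical(alpha) * sigma / sqrt(n)  # Z_{alpha/2} * sigma / sqrt(n)
    return xbar - margin, xbar + margin


print(round(z_critical(0.05), 2))  # 1.96
```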
Example 9.2:
The average zinc concentration recorded from a sample of zinc
measurements in 36 different locations is found to be 2.6
grams per milliliter. Find the 95% and 99% confidence intervals (C.I.)
for the mean zinc concentration in the river. Assume that the
population standard deviation is 0.3.
Solution:
μ = the mean zinc concentration in the river (unknown parameter).
Population: μ = ??, σ = 0.3
Sample: n = 36, $\bar{X} = 2.6$
$Z_{\alpha/2} = Z_{0.025} = 1.96$
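A 95% C.I. for μ is
$$\bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \iff \bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
$$\iff 2.6 - (1.96)\left(\frac{0.3}{\sqrt{36}}\right) < \mu < 2.6 + (1.96)\left(\frac{0.3}{\sqrt{36}}\right)$$
$$\iff 2.6 - 0.098 < \mu < 2.6 + 0.098$$
$$\iff 2.502 < \mu < 2.698$$
$$\iff \mu \in (2.502,\ 2.698)$$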
We are 95% confident that μ ∈( 2.502 , 2.698).
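The same calculation, sketched in Python for both confidence levels asked for in Example 9.2. The 99% case is not worked out in the notes; here it simply applies the same formula with $Z_{0.005}$. The quantiles come from `statistics.NormalDist` rather than a table, so a table-based answer may differ in the last digit.

```python
from math import sqrt
from statistics import NormalDist

xbar, sigma, n = 2.6, 0.3, 36  # data of Example 9.2

intervals = {}
for conf in (0.95, 0.99):
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)  # Z_{alpha/2}
    margin = z * sigma / sqrt(n)             # maximum error of estimation
    intervals[conf] = (xbar - margin, xbar + margin)
    print(f"{conf:.0%} C.I.: ({xbar - margin:.3f}, {xbar + margin:.3f})")

# 95% C.I.: (2.502, 2.698)
# 99% C.I.: (2.471, 2.729)
```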
Note:
The maximum error of estimation is $Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$, with (1−α)100% confidence.
Example:
In Example 9.2, we are 95% confident that the sample mean $\bar{X} = 2.6$ differs from the true mean μ by an amount less than
$$Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = (1.96)\left(\frac{0.3}{\sqrt{36}}\right) = 0.098.$$
Note:
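Let e be the maximum amount of the error, that is, $e = Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$; then
$$e = Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \iff \sqrt{n} = Z_{\alpha/2}\frac{\sigma}{e} \iff n = \left(Z_{\alpha/2}\frac{\sigma}{e}\right)^2$$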
Theorem 9.2:
If $\bar{X}$ is used as an estimate of μ, we can then be (1−α)100% confident that the error (in estimation) will not exceed a specified amount e when the sample size is
$$n = \left(Z_{\alpha/2}\frac{\sigma}{e}\right)^2.$$
Note:
1. All fractional values of $n = (Z_{\alpha/2}\,\sigma/e)^2$ are rounded up to the next whole number.
2. If σ is unknown, we could take a preliminary sample of size n ≥ 30 to provide an estimate of σ. Then, using
$$S = \sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2/(n-1)}$$
as an approximation for σ in Theorem 9.2, we could determine approximately how many observations are needed to provide the desired degree of accuracy.
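Theorem 9.2 and Note 1 reduce to a one-line calculation. This is a sketch assuming `statistics.NormalDist` for $Z_{\alpha/2}$; `required_sample_size` is a hypothetical name. With the numbers of Example 9.3 (σ = 0.3, e = 0.05, 95% confidence) it gives $(1.96 \times 0.3/0.05)^2 \approx 138.3$, rounded up to 139.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(sigma: float, e: float, alpha: float) -> int:
    """Smallest n with Z_{alpha/2} * sigma / sqrt(n) <= e (Theorem 9.2), rounded up (Note 1)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # Z_{alpha/2}
    return ceil((z * sigma / e) ** 2)


# Example 9.3: sigma = 0.3, e = 0.05, 95% confidence (alpha = 0.05)
print(required_sample_size(0.3, 0.05, 0.05))  # 139
```

If σ is unknown (Note 2), one option is to pass the standard deviation of a preliminary sample as `sigma`, e.g. from `statistics.stdev`, which uses the n − 1 divisor.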
Example 9.3:
How large a sample is required in Example 9.2 if we want to be
95% confident that our estimate of μ is off by less than 0.05?
Solution:
First, a point estimate for μ is $\bar{X} = \sum_{i=1}^{n} X_i/n = 10.0$.
$$\iff 10.0 - (2.447)\left(\frac{0.283}{\sqrt{7}}\right) < \mu < 10.0 + (2.447)\left(\frac{0.283}{\sqrt{7}}\right)$$
$$\iff 10.0 - 0.262 < \mu < 10.0 + 0.262$$
$$\iff 9.74 < \mu < 10.26$$
$$\iff \mu \in (9.74,\ 10.26)$$
We are 95% confident that μ ∈( 9.74 , 10.26).
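The arithmetic of this interval can be sketched as follows. The critical value 2.447 is the tabled $t_{0.025}$ for 6 degrees of freedom (n = 7); since Python's standard library has no t-distribution quantile function, this sketch takes the critical value as an argument. `t_interval` is a hypothetical helper name.

```python
from math import sqrt


def t_interval(xbar: float, s: float, n: int, t_crit: float) -> tuple[float, float]:
    """xbar -/+ t_crit * s / sqrt(n), where t_crit is the tabled t_{alpha/2} with n - 1 df."""
    margin = t_crit * s / sqrt(n)
    return xbar - margin, xbar + margin


lo, hi = t_interval(10.0, 0.283, 7, 2.447)
print(f"({lo:.2f}, {hi:.2f})")  # (9.74, 10.26)
```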
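or
$$(\bar{X}_1 - \bar{X}_2) \pm Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
or
$$\left((\bar{X}_1 - \bar{X}_2) - Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}},\ (\bar{X}_1 - \bar{X}_2) + Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right)$$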
mileage for engine A was 36 miles per gallon and the average
for engine B was 42 miles per gallon. Find a 96% confidence
interval for μB−μA, where μA and μB are population mean gas
mileage for engines A and B, respectively. Assume that the
population standard deviations are 6 and 8 for engines A and B,
respectively.
Solution:
Engine A: $n_A = 50$, $\bar{X}_A = 36$, $\sigma_A = 6$
Engine B: $n_B = 75$, $\bar{X}_B = 42$, $\sigma_B = 8$
A point estimate for $\mu_B - \mu_A$ is $\bar{X}_B - \bar{X}_A = 42 - 36 = 6$.
Confidence interval:
α = ??
96% = (1−α)100% ⇔ 0.96 = 1−α ⇔ α = 0.04 ⇔ α/2 = 0.02
$Z_{\alpha/2} = Z_{0.02} = 2.05$
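A 96% C.I. for μB−μA is
$$(\bar{X}_B - \bar{X}_A) - Z_{\alpha/2}\sqrt{\frac{\sigma_B^2}{n_B} + \frac{\sigma_A^2}{n_A}} < \mu_B - \mu_A < (\bar{X}_B - \bar{X}_A) + Z_{\alpha/2}\sqrt{\frac{\sigma_B^2}{n_B} + \frac{\sigma_A^2}{n_A}}$$
$$(\bar{X}_B - \bar{X}_A) \pm Z_{\alpha/2}\sqrt{\frac{\sigma_B^2}{n_B} + \frac{\sigma_A^2}{n_A}}$$
$$(42 - 36) \pm Z_{0.02}\sqrt{\frac{8^2}{75} + \frac{6^2}{50}}$$
$$6 \pm (2.05)\sqrt{\frac{64}{75} + \frac{36}{50}}$$
$$6 \pm 2.571$$
$$3.43 < \mu_B - \mu_A < 8.57$$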
We are 96% confident that μB−μA ∈(3.43, 8.57).
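A Python sketch of the engine calculation, assuming known σ's as in the example. It takes the tabled $Z_{0.02} = 2.05$ as an argument to reproduce the numbers above (`statistics.NormalDist().inv_cdf(0.98)` gives about 2.054, which would widen the interval slightly). `two_sample_z_interval` is a hypothetical name.

```python
from math import sqrt


def two_sample_z_interval(x1: float, x2: float, sigma1: float, sigma2: float,
                          n1: int, n2: int, z: float) -> tuple[float, float]:
    """(x1 - x2) -/+ z * sqrt(sigma1^2/n1 + sigma2^2/n2), for known population sigmas."""
    margin = z * sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    diff = x1 - x2
    return diff - margin, diff + margin


# Engine example: mu_B - mu_A, so engine B is passed first
lo, hi = two_sample_z_interval(42, 36, 8, 6, 75, 50, 2.05)
print(f"({lo:.2f}, {hi:.2f})")  # (3.43, 8.57)
```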
Solution:
Wire A: $n_A = 6$, $\bar{X}_A = 140.67$, $S_A^2 = 7.86690$
Wire B: $n_B = 6$, $\bar{X}_B = 138.50$, $S_B^2 = 7.10009$
A point estimate for $\mu_A - \mu_B$ is $\bar{X}_A - \bar{X}_B = 140.67 - 138.50 = 2.17$.
Confidence interval:
95% = (1−α)100% ⇔ 0.95 = 1−α ⇔ α = 0.05 ⇔ α/2 = 0.025
ν = df = n_A + n_B − 2 = 10
$t_{\alpha/2} = t_{0.025} = 2.228$
$$S_p^2 = \frac{(n_A - 1)S_A^2 + (n_B - 1)S_B^2}{n_A + n_B - 2} = \frac{(6-1)(7.86690) + (6-1)(7.10009)}{6 + 6 - 2} = 7.4835$$
$$S_p = \sqrt{S_p^2} = \sqrt{7.4835} = 2.7356$$
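A 95% C.I. for μA−μB is
$$(\bar{X}_A - \bar{X}_B) - t_{\alpha/2} S_p\sqrt{\frac{1}{n_A} + \frac{1}{n_B}} < \mu_A - \mu_B < (\bar{X}_A - \bar{X}_B) + t_{\alpha/2} S_p\sqrt{\frac{1}{n_A} + \frac{1}{n_B}}$$
or
$$(\bar{X}_A - \bar{X}_B) \pm t_{\alpha/2} S_p\sqrt{\frac{1}{n_A} + \frac{1}{n_B}}$$
$$(140.67 - 138.50) \pm (2.228)(2.7356)\sqrt{\frac{1}{6} + \frac{1}{6}}$$
$$2.17 \pm 3.51890$$
$$-1.35 < \mu_A - \mu_B < 5.69$$
We are 95% confident that μA−μB ∈ (−1.35, 5.69).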
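The pooled-variance steps above (first $S_p^2$, then the t interval with $n_A + n_B - 2$ degrees of freedom) can be sketched in Python as follows; the tabled $t_{0.025} = 2.228$ is passed in, and `pooled_t_interval` is a hypothetical helper.

```python
from math import sqrt


def pooled_t_interval(x1: float, x2: float, s1_sq: float, s2_sq: float,
                      n1: int, n2: int, t_crit: float) -> tuple[float, tuple[float, float]]:
    """Return S_p^2 and the pooled-t interval for mu1 - mu2 (equal but unknown variances)."""
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)  # pooled variance
    margin = t_crit * sqrt(sp_sq) * sqrt(1 / n1 + 1 / n2)
    diff = x1 - x2
    return sp_sq, (diff - margin, diff + margin)


sp_sq, (lo, hi) = pooled_t_interval(140.67, 138.50, 7.86690, 7.10009, 6, 6, 2.228)
print(f"Sp^2 = {sp_sq:.4f}, C.I. = ({lo:.2f}, {hi:.2f})")  # Sp^2 = 7.4835, C.I. = (-1.35, 5.69)
```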
Example 9.10:
In a random sample of n = 500 families owning television sets in the city of Hamilton, Canada, it was found that x = 340 subscribed to HBO. Find a 95% confidence interval for the actual proportion of families in this city who subscribe to HBO.
Solution:
p = proportion of families in this city who subscribe to HBO.
n = sample size = 500
X = no. of families in the sample who subscribe to HBO = 340
p̂ = proportion of families in the sample who subscribe to HBO:
$$\hat{p} = \frac{X}{n} = \frac{340}{500} = 0.68$$
$$\hat{q} = 1 - \hat{p} = 1 - 0.68 = 0.32$$
A point estimate for p is $\hat{p} = 0.68$.
Now,
95% = (1−α)100% ⇔ 0.95 = 1−α ⇔ α = 0.05 ⇔ α/2 = 0.025
$Z_{\alpha/2} = Z_{0.025} = 1.96$
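The notes stop after finding $Z_{0.025}$. Assuming the usual large-sample interval $\hat{p} \pm Z_{\alpha/2}\sqrt{\hat{p}\hat{q}/n}$, the remaining arithmetic can be sketched in Python; `proportion_interval` is a hypothetical name, and the printed interval is computed here rather than taken from the notes.

```python
from math import sqrt


def proportion_interval(x: int, n: int, z: float) -> tuple[float, float]:
    """Large-sample interval p_hat -/+ z * sqrt(p_hat * q_hat / n)."""
    p_hat = x / n
    q_hat = 1 - p_hat
    margin = z * sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin


lo, hi = proportion_interval(340, 500, 1.96)
print(f"({lo:.3f}, {hi:.3f})")  # (0.639, 0.721)
```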
E(X1)= n1 p1
Var(X1)=n1 p1 q1 (q1=1− p1)
• Let X2 = no. of elements of type A in the second sample.
X2~Binomial(n2, p2)
E(X2)= n2 p2
Var(X2)=n2 p2 q2 (q2=1− p2)
• $\hat{p}_1 = X_1/n_1$ = proportion of type-A elements in the first sample
• $\hat{p}_2 = X_2/n_2$ = proportion of type-A elements in the second sample
• The sampling distribution of pˆ 1 − pˆ 2 is used to make
inferences about p1− p2.
Result:
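(1) $E(\hat{p}_1 - \hat{p}_2) = p_1 - p_2$
(2) $\mathrm{Var}(\hat{p}_1 - \hat{p}_2) = \dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}$; $q_1 = 1 - p_1$, $q_2 = 1 - p_2$
(3) For large n1 and n2, we have
$$\hat{p}_1 - \hat{p}_2 \sim N\left(p_1 - p_2,\ \frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}\right) \quad \text{(approximately)}$$
$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}}} \sim N(0,1) \quad \text{(approximately)}$$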
or
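$$\left((\hat{p}_1 - \hat{p}_2) - Z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}},\ (\hat{p}_1 - \hat{p}_2) + Z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}\right)$$
or
$$(\hat{p}_1 - \hat{p}_2) - Z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} < p_1 - p_2 < (\hat{p}_1 - \hat{p}_2) + Z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$$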
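Results (1) and (2) can be checked numerically. This Monte Carlo sketch uses hypothetical values p1 = 0.3, n1 = 50, p2 = 0.2, n2 = 40 (not from the notes): it draws many pairs of independent binomial samples and compares the simulated mean and variance of $\hat{p}_1 - \hat{p}_2$ with $p_1 - p_2$ and $p_1q_1/n_1 + p_2q_2/n_2$.

```python
import random
from statistics import fmean, pvariance


def simulate_diffs(p1: float, n1: int, p2: float, n2: int, reps: int, seed: int = 1) -> list[float]:
    """Draw `reps` values of p1_hat - p2_hat from two independent binomial samples."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    diffs = []
    for _ in range(reps):
        x1 = sum(rng.random() < p1 for _ in range(n1))  # X1 ~ Binomial(n1, p1)
        x2 = sum(rng.random() < p2 for _ in range(n2))  # X2 ~ Binomial(n2, p2)
        diffs.append(x1 / n1 - x2 / n2)
    return diffs


# Hypothetical parameters, chosen only for illustration
p1, n1, p2, n2 = 0.3, 50, 0.2, 40
diffs = simulate_diffs(p1, n1, p2, n2, reps=20_000)

theory_mean = p1 - p2                                 # Result (1)
theory_var = p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2  # Result (2)
print(f"mean:     simulated {fmean(diffs):.4f}, theory {theory_mean:.4f}")
print(f"variance: simulated {pvariance(diffs):.5f}, theory {theory_var:.5f}")
```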
Example 9.13:
A certain change in a process for the manufacture of component parts is being considered. Samples are taken using both the existing and the new procedure to determine whether the new process results in an improvement. If 75 of 1500 items from the existing procedure were found to be defective and 80 of 2000 items from the new procedure were found to be defective, find a 90% confidence interval for the true difference in the fraction of defectives between the existing and the new process.
Solution:
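p1 = fraction (proportion) of defectives of the existing process
p2 = fraction (proportion) of defectives of the new process
p̂1 = sample fraction of defectives of the existing process = 75/1500 = 0.05
p̂2 = sample fraction of defectives of the new process = 80/2000 = 0.04
90% = (1−α)100% ⇔ 0.90 = 1−α ⇔ α = 0.10 ⇔ α/2 = 0.05, so $Z_{\alpha/2} = Z_{0.05} = 1.645$
$$(\hat{p}_1 - \hat{p}_2) \pm Z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$$
$$(\hat{p}_1 - \hat{p}_2) \pm Z_{0.05}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$$
$$0.01 \pm 1.645\sqrt{\frac{(0.05)(0.95)}{1500} + \frac{(0.04)(0.96)}{2000}}$$
$$0.01 \pm 0.01173$$
$$-0.0017 < p_1 - p_2 < 0.0217$$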
We are 90% confident that p1− p2 ∈(−0.0017, 0.0217).
Note:
Since the 90% confidence interval (−0.0017, 0.0217) contains 0, there is no reason to believe that the new procedure produced a significant decrease in the proportion of defectives over the existing method (p1 − p2 ≈ 0 ⇔ p1 ≈ p2).
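A Python sketch of Example 9.13 and the note above, assuming `statistics.NormalDist` for $Z_{0.05}$; `two_proportion_interval` is a hypothetical helper. The final check mirrors the note: the interval contains 0.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_interval(x1: int, n1: int, x2: int, n2: int,
                            alpha: float) -> tuple[float, float]:
    """(p1_hat - p2_hat) -/+ Z_{alpha/2} * sqrt(p1_hat q1_hat / n1 + p2_hat q2_hat / n2)."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - alpha / 2)  # Z_{alpha/2}
    margin = z * sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - margin, diff + margin


lo, hi = two_proportion_interval(75, 1500, 80, 2000, alpha=0.10)
print(f"({lo:.4f}, {hi:.4f})")  # (-0.0017, 0.0217)
print(lo < 0 < hi)              # True
```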