Stimation: Statistic
6 M235
ESTIMATION
Statistic: a numerical measure obtained
from the sample, for example, the sample
mean X̄ is a statistic
Parameter: a numerical measure obtained
from the population, for example, the
population mean μ is a parameter.
Estimation: the use of sample data (statistics) to
estimate population parameters.
Types of Estimation:
1. Point Estimation
2. Interval Estimation
Point Estimation
A point estimate is a single value of a statistic
used to estimate a population parameter.
[Figure: a population with mean μ and SD σ, and a sample drawn from it with sample SD s.]
The sample mean
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{\sum_{i=1}^{n} X_i}{n}$$
is an estimator for the population mean μ.
How good is the estimator X̄?
Look at its probability distribution.
Important properties of the distribution of the sample
mean X̄
1. Mean
2. Variance or Standard Deviation
3. Shape
Properties of Estimators
1. Unbiasedness
Example: A statistics class has six students,
whose ages are:
18, 18, 19, 20, 20, 21
The population mean is
$$\mu = \frac{18+18+19+20+20+21}{6} = \frac{58}{3} \approx 19.33$$
Select samples of size 2 (n = 2), drawn without replacement.
There are 15 possible samples.
Find the mean of every possible sample
where n = 2:
Sampling distribution for the sample mean X̄
x̄       18     18.5   19     19.5   20     20.5
P(x̄)    1/15   2/15   4/15   4/15   2/15   2/15
An estimator θ̂ is an unbiased estimator for θ if
$$E(\hat{\theta}) = \theta$$
If this property is not satisfied, θ̂ is a biased
estimator for θ.
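The class-ages example can be checked by brute force. The following is a minimal sketch, assuming the 15 samples are the unordered pairs drawn without replacement and that each pair is equally likely; the variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

ages = [18, 18, 19, 20, 20, 21]
mu = Fraction(sum(ages), len(ages))  # population mean, 58/3

# All C(6, 2) = 15 samples of size 2, drawn without replacement.
samples = list(combinations(ages, 2))
means = [Fraction(a + b, 2) for a, b in samples]

# Sampling distribution of the sample mean (each sample equally likely).
counts = Counter(means)
distribution = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

# Expected value of the sample mean under this distribution.
expected_mean = sum(m * p for m, p in distribution.items())

for m, p in distribution.items():
    print(f"xbar = {float(m):5.1f}   P = {p}")
print("E(xbar) =", expected_mean, "  mu =", mu)
```

Exact fractions are used so that the comparison between E(X̄) and μ does not depend on floating-point rounding.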
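Result:
Let X1, X2, …, Xn be a random sample from
a population with mean µ and variance σ².
Consider the sample mean
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$
Then E(X̄) = µ, so X̄ is an unbiased estimator for µ.
Consider also the sample variance
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$$
Then
$$E(S^2) = E\left[\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}\right] = \sigma^2$$
so S² is an unbiased estimator for σ².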
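But
$$E\left[\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n}\right] = \frac{n-1}{n}\,\sigma^2 \neq \sigma^2$$
Then
$$\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n}$$
is a biased estimator for σ².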
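The effect of dividing by n−1 rather than n can be verified exactly on a small case. This is a sketch under an assumption that differs from the earlier example: the observations are drawn independently with replacement from the six ages, so that they form a random sample with variance σ² equal to the population variance. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import product

ages = [18, 18, 19, 20, 20, 21]
N = len(ages)
mu = Fraction(sum(ages), N)
sigma2 = sum((a - mu) ** 2 for a in ages) / N  # population variance, 11/9


def expected_variance_estimates(n):
    """Average both variance estimators over every ordered sample of size n
    drawn with replacement (all samples equally likely)."""
    total_unbiased = Fraction(0)
    total_biased = Fraction(0)
    count = 0
    for sample in product(ages, repeat=n):
        xbar = Fraction(sum(sample), n)
        ss = sum((x - xbar) ** 2 for x in sample)  # sum of squared deviations
        total_unbiased += ss / (n - 1)
        total_biased += ss / n
        count += 1
    return total_unbiased / count, total_biased / count


e_s2, e_biased = expected_variance_estimates(3)
print("sigma^2          =", sigma2)
print("E[SS/(n-1)]      =", e_s2)
print("E[SS/n]          =", e_biased)
print("(n-1)/n * sigma^2 =", Fraction(2, 3) * sigma2)
```

With n = 3 the enumeration covers 216 samples, which is small enough to run instantly while still showing the factor (n−1)/n.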
2. Consistent Estimators
An unbiased estimator θ̂ is consistent if
θ̂ − θ → 0, as n → ∞.
To check the condition
θ̂ − θ → 0, as n → ∞, we need to
check the limiting behavior of Var(θ̂).
A consistent estimator is one for which the probability
that the estimate is close to the value of the population
parameter increases as the sample size increases.
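Result:
$$\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}$$
Proof.
$$\mathrm{Var}(\bar{X}) = \mathrm{Var}\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right]$$
$$\mathrm{Var}(\bar{X}) = \frac{1}{n^2}\sum_{i=1}^{n} \mathrm{Var}(X_i)$$
$$\mathrm{Var}(\bar{X}) = \frac{1}{n^2}\sum_{i=1}^{n} \sigma^2 = \frac{n\sigma^2}{n^2}$$
$$\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n} \to 0, \text{ as } n \to \infty$$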
The sample mean X̄ is an unbiased estimator
for μ and Var(X̄) → 0 as n → ∞, so X̄ is a
consistent estimator for µ.
Standard Deviation (S.D.) of the estimator X̄:
$$\mathrm{S.D.}(\bar{X}) = \frac{\sigma}{\sqrt{n}}$$
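As a numerical check of Var(X̄) = σ²/n, the sketch below enumerates every ordered sample for a few small n. It assumes independent draws with replacement from a hypothetical six-value population (the class ages used earlier), which is what makes the variance of the sum equal the sum of the variances.

```python
from fractions import Fraction
from itertools import product

ages = [18, 18, 19, 20, 20, 21]
mu = Fraction(sum(ages), len(ages))
sigma2 = sum((a - mu) ** 2 for a in ages) / len(ages)  # population variance


def variance_of_sample_mean(n):
    """Exact variance of the sample mean over all ordered samples of size n
    drawn with replacement (each sample equally likely)."""
    means = [Fraction(sum(s), n) for s in product(ages, repeat=n)]
    center = sum(means) / len(means)
    return sum((m - center) ** 2 for m in means) / len(means)


for n in (1, 2, 3, 4):
    print(f"n = {n}:  Var(xbar) = {variance_of_sample_mean(n)}   sigma^2/n = {sigma2 / n}")
```

The two printed columns agree for every n, and both shrink toward zero as n grows, which is the behavior the consistency argument relies on.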
To estimate σ²:
$$\hat{\sigma}^2 = S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$$
Estimate σ by S (the sample S.D.):
$$\hat{\sigma} = S = \sqrt{S^2}$$
Standard Error (S.E.) of the estimator X̄:
$$\mathrm{S.E.}(\bar{X}) = \frac{s}{\sqrt{n}}$$
The smaller the sampling variability (i.e.,
the S.E) is, the better an estimate will be.
As the sample size increases to infinity, the
sampling distribution concentrates around
the population mean.
Result: Let X1, X2, …, Xn be a random sample
from a normal population with mean μ and
variance σ², N(μ, σ²).
Example-4: We want to estimate the average
rainfall in March in Irbid city. Someone
collected the data for the past 5 years: 10, 20, 30,
40, 50 (mm).
(1) What statistic should we use to estimate the
long-term average rainfall (μ)? What is the point
estimate for μ?
Use the sample mean.
Point estimate:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{10+20+30+40+50}{5} = 30$$
(2) What statistic should we use to estimate σ²?
Find the point estimate for σ².
(Hint: $\sum_{i=1}^{n} x_i^2 = 5500$)
The sample variance
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n-1} = \frac{5500 - \frac{(150)^2}{5}}{5-1} = 250$$
(3) Find the standard error
$$\mathrm{S.E.}(\bar{X}) = \frac{s}{\sqrt{n}} = \frac{\sqrt{250}}{\sqrt{5}} \approx 7.07$$
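The three parts of the rainfall example can be reproduced in a few lines. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative, and the shortcut formula is included only to mirror the hand calculation.

```python
import math
import statistics

rainfall_mm = [10, 20, 30, 40, 50]  # March rainfall over the past 5 years
n = len(rainfall_mm)

# (1) Point estimate of the long-term mean: the sample mean.
x_bar = statistics.mean(rainfall_mm)

# (2) Point estimate of the variance: sample variance with n - 1 denominator.
s2 = statistics.variance(rainfall_mm)

# Same quantity via the shortcut formula that uses the sum of squares.
sum_x = sum(rainfall_mm)
sum_x2 = sum(x * x for x in rainfall_mm)
s2_shortcut = (sum_x2 - sum_x ** 2 / n) / (n - 1)

# (3) Standard error of the sample mean.
se = math.sqrt(s2 / n)

print("x_bar =", x_bar)
print("s^2   =", s2, "(shortcut:", s2_shortcut, ")")
print("S.E.  =", round(se, 2))
```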
Result: Let X1, …, Xn be a random sample from
a Poisson distribution Poisson(λ). Then
X̄ is the best estimator for λ.
Example-1: Suppose the number of vehicles
arriving at a signalized road intersection follows a
Poisson distribution. The following numbers of
vehicles were observed during six different
hours in a day: 350, 330, 370, 320, 280, 420.
Estimate the average rate of vehicles arriving at the
road intersection per hour.
$$\hat{\lambda} = \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{350+330+370+320+280+420}{6} = 345$$
So the estimated average arrival rate is 345 vehicles per hour.
Interval Estimation
Let X1, X2, …, Xn be a random sample and
θ be an unknown parameter. A
confidence interval (C.I.) for θ is an
interval (L, U) computed from the
sample such that it includes the true
parameter θ with a specified high
probability.
Confidence Interval for θ
Let L and U be functions of X1, X2, …, Xn. Then
(L, U) is a (1-α)100% C.I. (0 < α < 1) for θ if
P(L < θ < U) = 1-α
1-α is called the confidence coefficient,
where
L: Lower confidence limit.
U: Upper confidence limit.
Sampling from Normal Population
Let X1, X2, …, Xn be a random sample from a
normal population with mean μ and
variance σ², N(μ, σ²).
Result:
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$
Confidence Interval for the mean (μ)
when σ² is known:
Given the confidence coefficient 1-α, find
$Z_{\alpha/2}$ (called the critical value) from the z-table. Use
$$Z = \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \sim N(0, 1)$$
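We have
$$1-\alpha = P(-Z_{\alpha/2} < Z < Z_{\alpha/2}) = P\left(-Z_{\alpha/2} < \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} < Z_{\alpha/2}\right)$$
which implies
$$P\left(\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1-\alpha$$
Therefore, a (1-α)100% C.I. for μ is (L, U),
where
$$L = \bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}, \qquad U = \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
For simplicity, (L, U) can be written as
$$\bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$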
Margin of error = critical value × standard
deviation of the statistic.
In other words, the interval has the form
X̄ ± E
(point estimate ± margin of error),
where the margin of error of estimation E is
defined as
$$E = Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = Z_{\alpha/2} \times \mathrm{S.D.}(\bar{X})$$
Remarks:
Usually 90%, 95%, or 99% confidence
levels are used.
An increase in sample size leads to a
decreased interval width.
Higher confidence levels have wider
intervals than lower confidence levels.
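The remarks about interval width can be illustrated with a small helper for the known-σ interval X̄ ± Z(α/2)·σ/√n. This is a sketch only: the function names and the inputs (x̄ = 50, σ = 10) are hypothetical, chosen just to show how the width reacts to n and to the confidence level. The critical value comes from `statistics.NormalDist.inv_cdf` in the Python standard library.

```python
import math
from statistics import NormalDist


def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """(1 - alpha)100% C.I. for the mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical value Z_{alpha/2}
    margin = z * sigma / math.sqrt(n)        # margin of error E
    return x_bar - margin, x_bar + margin


def width(interval):
    lower, upper = interval
    return upper - lower


# Hypothetical inputs, for illustration only.
x_bar, sigma = 50.0, 10.0

for n in (25, 100, 400):
    ci = z_confidence_interval(x_bar, sigma, n, 0.95)
    print(f"95% C.I., n = {n:3d}: ({ci[0]:.2f}, {ci[1]:.2f})  width = {width(ci):.2f}")

for level in (0.90, 0.95, 0.99):
    ci = z_confidence_interval(x_bar, sigma, 25, level)
    print(f"{level:.0%} C.I., n = 25: ({ci[0]:.2f}, {ci[1]:.2f})  width = {width(ci):.2f}")
```

Quadrupling n halves the width, because the margin of error is proportional to 1/√n.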
Confidence Interval for the mean μ
(σ2 is unknown)
The sample is from a normal population
N(μ, σ²).
We need a new distribution: the t-distribution.
Similar in appearance to normal:
Bell-shaped, symmetric.
The mean of the t-distribution equals zero.
The t distribution has only one parameter v,
called the degrees of freedom (df).
Shape depends on the degrees of freedom v
(df).
Has a lower height and a wider spread than
the standard normal distribution.
As the degrees of freedom v (df)
become larger, the t distribution approaches
the standard normal distribution.
Example: Given the upper-tail probability P = 0.025 and the
degrees of freedom v = 7, find $t_{0.025,\,7}$.
From the table,
$$t_{0.025,\,7} = 2.36462$$
Theorem
Let X1, X2, …, Xn be a random sample
from a normal population with mean μ and
variance σ², N(μ, σ²).
Then
$$T = \frac{\bar{X}-\mu}{s/\sqrt{n}}$$
follows a t-distribution with n-1 degrees of
freedom ($t_{n-1}$).
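Let X1, X2, …, Xn be a random sample from a
normal population with mean μ and variance
σ², N(μ, σ²).
We have
$$1-\alpha = P(-t_{\alpha/2,\,n-1} < T < t_{\alpha/2,\,n-1}) = P\left(-t_{\alpha/2,\,n-1} < \frac{\bar{X}-\mu}{s/\sqrt{n}} < t_{\alpha/2,\,n-1}\right)$$
This implies
$$1-\alpha = P\left(\bar{X} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} < \mu < \bar{X} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}\right)$$
Therefore, a (1-α)100% C.I. for μ is (L, U),
where
$$L = \bar{X} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}, \qquad U = \bar{X} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$$
For simplicity, (L, U) can be written as
$$\bar{X} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$$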
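Example: A random sample of 9
observations was selected from a normal population.
The observed data give Σ xi = 609 and Σ xi² = 41371,
so x̄ = 609/9 ≈ 67.67.
The sample variance is
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n-1} = \frac{41371 - \frac{(609)^2}{9}}{9-1} = 20.25$$
so s = 4.5.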
Standard Error
$$\mathrm{S.E.}(\bar{X}) = \frac{s}{\sqrt{n}} = \frac{4.5}{\sqrt{9}} = 1.5$$
Therefore, a 95% C.I. for μ is
$$\bar{X} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$$
$$67.67 \pm t_{0.025,\,8}\frac{4.5}{\sqrt{9}}$$
$$67.67 \pm 2.306 \times \frac{4.5}{\sqrt{9}}$$
$$67.67 \pm 3.46$$
that is, (64.21, 71.13).
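The t-interval in this example can be recomputed from the summary values alone. The sketch below assumes only what the example states: n = 9, Σx = 609, Σx² = 41371, and the table value t(0.025, 8) = 2.306, which is hard-coded because the Python standard library has no t-distribution quantile function.

```python
import math

# Summary values taken from the example.
n = 9
sum_x = 609
sum_x2 = 41371
t_crit = 2.306  # t_{0.025, 8}, read from a t-table as in the example

x_bar = sum_x / n
s2 = (sum_x2 - sum_x ** 2 / n) / (n - 1)  # sample variance (shortcut formula)
s = math.sqrt(s2)
se = s / math.sqrt(n)                     # standard error of the mean

margin = t_crit * se
lower, upper = x_bar - margin, x_bar + margin

print(f"x_bar = {x_bar:.2f}, s^2 = {s2:.2f}, s = {s:.2f}, S.E. = {se:.2f}")
print(f"95% C.I. for mu: ({lower:.2f}, {upper:.2f})")
```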
Confidence Interval for the mean μ
(σ2 is unknown)
When the sample comes from a non-normal population whose
distribution is continuous, symmetric and
unimodal, and the sample size is large (n ≥ 30),
an approximate (1-α)100% C.I. for μ is
$$\bar{X} \pm Z_{\alpha/2}\frac{s}{\sqrt{n}}$$
Example: Results for a sample of n = 3534
participants in a Heart Study are shown below.
                          n       Sample Mean   Standard Deviation (s)
Systolic Blood Pressure   3,534   127.3         19.0
Because the sample is large, we can generate a
95% confidence interval for systolic blood
pressure using the following formula:
$$\bar{X} \pm Z_{\alpha/2}\frac{s}{\sqrt{n}}$$
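Plugging the Heart Study figures into the large-sample formula gives the interval directly. This is a sketch using the tabulated values (n = 3534, x̄ = 127.3, s = 19.0); the critical value is taken from `statistics.NormalDist` rather than a printed z-table, so it is 1.95996 rather than the rounded 1.96.

```python
import math
from statistics import NormalDist

# Values from the systolic blood pressure table.
n = 3534
x_bar = 127.3
s = 19.0
confidence = 0.95

alpha = 1 - confidence
z = NormalDist().inv_cdf(1 - alpha / 2)  # Z_{alpha/2}, about 1.96
margin = z * s / math.sqrt(n)

lower, upper = x_bar - margin, x_bar + margin
print(f"margin of error = {margin:.3f}")
print(f"95% C.I. for mean systolic blood pressure: ({lower:.2f}, {upper:.2f})")
```

Because n is so large, the margin of error is well under one unit even though s = 19.0.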
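Sample Size Determination, n?
$$E = Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
Then
$$\frac{\sigma}{\sqrt{n}} = \frac{E}{Z_{\alpha/2}}$$
$$\Rightarrow \sqrt{n} = \left(\frac{Z_{\alpha/2}}{E}\right)\sigma$$
$$\Rightarrow n = \frac{(Z_{\alpha/2})^2}{E^2}\,\sigma^2$$
Round up to the next higher integer.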
thus, our sample size estimate for the main
experiment may differ considerably from
what is actually required.
Example: IQ scores are known to vary normally
with a standard deviation of 15. How many
students should be sampled if we want to
estimate the population mean IQ at 99%
confidence with a margin of error equal to 2?
$$n = \frac{(Z_{\alpha/2})^2}{E^2}\,\sigma^2 = \frac{(2.576)^2}{(2)^2}\,(15)^2 = 373.26$$
Rounding up to the next higher integer gives n = 374.
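The sample-size formula n = (Z(α/2)·σ/E)² is easy to wrap in a function. The sketch below uses a hypothetical helper name, `required_sample_size`, and takes the critical value from `statistics.NormalDist`, so the unrounded result differs slightly from a hand calculation with the table value 2.576.

```python
import math
from statistics import NormalDist


def required_sample_size(sigma, margin_of_error, confidence):
    """Return (unrounded n, n rounded up) for estimating a mean with known sigma."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical value Z_{alpha/2}
    n_exact = (z * sigma / margin_of_error) ** 2
    return n_exact, math.ceil(n_exact)       # always round up


# IQ example: sigma = 15, E = 2, 99% confidence.
n_exact, n_required = required_sample_size(sigma=15, margin_of_error=2, confidence=0.99)
print(f"unrounded n = {n_exact:.2f}, required n = {n_required}")
```

Rounding up rather than to the nearest integer guarantees that the margin of error does not exceed E.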
Theorem: Let X1, X2, …, Xn be a random
sample from a normal population with
mean μ and variance σ², N(μ, σ²).
Then
$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$$
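To find a (1-α)100% C.I. for σ², write $\chi^2_{p,\,n-1}$ for the
chi-square value with upper-tail probability p, so that
$$P\left(\chi^2_{1-\alpha/2,\,n-1} < \frac{(n-1)S^2}{\sigma^2} < \chi^2_{\alpha/2,\,n-1}\right) = 1-\alpha$$
Solving for σ², a (1-α)100% C.I. for σ² is
$$\left(\frac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}},\ \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}}\right)$$
Which implies that a (1-α)100% C.I. for σ is
$$\left(\sqrt{\frac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}}},\ \sqrt{\frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}}}\right)$$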
Example: A random sample of size n = 9 is taken
from a normal distribution. The observed
sample data are 62, 63, 65, 61, 65, 64, 66, 67
and 63 inches in height. Construct a 95
percent confidence interval estimate for the
population variance.
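Solution: Given that
$$n = 9,\quad \sum_{i=1}^{9} x_i = 576,\quad \bar{x} = 64,\quad \sum_{i=1}^{9} x_i^2 = 36894,\quad \alpha = 0.05$$
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n-1} = \frac{36894 - \frac{(576)^2}{9}}{9-1} = 3.75$$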
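From the chi-square table with 8 degrees of
freedom,
$$\chi^2_{0.025,\,8} = 17.535, \qquad \chi^2_{0.975,\,8} = 2.180$$
Thus, the required interval estimate for σ² is
$$\left(\frac{8 \times 3.75}{17.535},\ \frac{8 \times 3.75}{2.180}\right) \approx (1.71,\ 13.76)$$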
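The variance interval for the height data can be checked numerically. This sketch makes one assumption that should be stated loudly: the two critical values, 17.535 and 2.180, are the standard chi-square table values for upper-tail probabilities 0.025 and 0.975 with 8 degrees of freedom, hard-coded here because the Python standard library has no chi-square quantile function.

```python
import math
import statistics

heights = [62, 63, 65, 61, 65, 64, 66, 67, 63]  # inches, from the example
n = len(heights)
s2 = statistics.variance(heights)  # sample variance, n - 1 denominator

# Assumed standard table values for 8 degrees of freedom.
chi2_upper = 17.535  # chi-square value with upper-tail probability 0.025
chi2_lower = 2.180   # chi-square value with upper-tail probability 0.975

# 95% C.I. for the variance, then for the standard deviation.
var_ci = ((n - 1) * s2 / chi2_upper, (n - 1) * s2 / chi2_lower)
sd_ci = (math.sqrt(var_ci[0]), math.sqrt(var_ci[1]))

print(f"s^2 = {s2}")
print(f"95% C.I. for sigma^2: ({var_ci[0]:.2f}, {var_ci[1]:.2f})")
print(f"95% C.I. for sigma:   ({sd_ci[0]:.2f}, {sd_ci[1]:.2f})")
```

Note that the interval is not symmetric around s² = 3.75, because the chi-square distribution is skewed.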
Chi-Square Table
Degrees of Probability, p
Freedom 0.99 0.95 0.05 0.01 0.001
1 0.000 0.004 3.84 6.64 10.83
2 0.020 0.103 5.99 9.21 13.82
3 0.115 0.352 7.82 11.35 16.27
4 0.297 0.711 9.49 13.28 18.47
5 0.554 1.145 11.07 15.09 20.52
6 0.872 1.635 12.59 16.81 22.46
7 1.239 2.167 14.07 18.48 24.32
8 1.646 2.733 15.51 20.09 26.13
9 2.088 3.325 16.92 21.67 27.88
10 2.558 3.940 18.31 23.21 29.59
11 3.05 4.58 19.68 24.73 31.26
12 3.57 5.23 21.03 26.22 32.91
13 4.11 5.89 22.36 27.69 34.53
14 4.66 6.57 23.69 29.14 36.12
15 5.23 7.26 25.00 30.58 37.70
16 5.81 7.96 26.30 32.00 39.25
17 6.41 8.67 27.59 33.41 40.79
18 7.02 9.39 28.87 34.81 42.31
19 7.63 10.12 30.14 36.19 43.82
20 8.26 10.85 31.41 37.57 45.32
21 8.90 11.59 32.67 38.93 46.80
22 9.54 12.34 33.92 40.29 48.27
23 10.20 13.09 35.17 41.64 49.73
24 10.86 13.85 36.42 42.98 51.18
25 11.52 14.61 37.65 44.31 52.62
26 12.20 15.38 38.89 45.64 54.05
27 12.88 16.15 40.11 46.96 55.48
28 13.57 16.93 41.34 48.28 56.89
29 14.26 17.71 42.56 49.59 58.30
30 14.95 18.49 43.77 50.89 59.70
Estimation of the
Population Proportion (p)
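For the sample proportion p̂,
$$\mathrm{S.E.}(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}$$
$$\hat{p} \approx N\left(p, \frac{p(1-p)}{n}\right) \text{ for large } n$$
$$\frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}} \approx Z \sim N(0, 1)$$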
This gives the interval
$$\hat{p} \pm Z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$$
Replacing p by the point estimate p̂,
the (1-α)100% C.I. for p is
$$\hat{p} \pm Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
Example: Suppose that a new drug is used to
treat patients with lung cancer. The
treatment was successful on 134 of the 245
patients it was administered to. Assume that
these patients are representative of the
population of individuals who have lung
cancer.
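The point estimate is p̂ = 134/245 ≈ 0.547, so the 95% C.I. for p is
$$\hat{p} \pm Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
$$0.547 \pm 1.96\sqrt{\frac{0.547(1-0.547)}{245}}$$
$$0.547 \pm 0.062$$
95% C.I. for p = (0.485, 0.609)
= (48.5%, 60.9%).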
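The proportion interval for the drug example can be reproduced as follows. This is a minimal sketch with a hypothetical helper name, `proportion_confidence_interval`; it uses the large-sample normal approximation from this section and takes the critical value from `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def proportion_confidence_interval(successes, n, confidence=0.95):
    """Large-sample (1 - alpha)100% C.I. for a population proportion."""
    p_hat = successes / n                        # point estimate of p
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)      # critical value Z_{alpha/2}
    se = math.sqrt(p_hat * (1 - p_hat) / n)      # estimated standard error
    margin = z * se
    return p_hat, p_hat - margin, p_hat + margin


# Drug example: 134 successes out of 245 patients.
p_hat, lower, upper = proportion_confidence_interval(134, 245, 0.95)
print(f"p_hat = {p_hat:.3f}")
print(f"95% C.I. for p: ({lower:.3f}, {upper:.3f})")
```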