Estimation: Statistic


LECTURE NOTES NO. 6 M235

ESTIMATION
• Statistic: a numerical measure obtained from the sample; for example, the sample mean X̄ is a statistic.
• Parameter: a numerical measure obtained from the population; for example, the population mean μ is a parameter.
Estimation: the use of sample data (statistics) to estimate population parameters.

Types of Estimation:
1. Point Estimation
2. Interval Estimation
Point Estimation
A point estimate is a single value of a statistic
used to estimate a population parameter.

[Figure: a sample is drawn from the population. Population: μ = mean, σ = SD. Sample: X̄ = sample mean, s = sample SD.]

For example, the sample mean,

$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

is an estimator for the population mean μ.

How good is the estimator X̄?
Look at its probability distribution!
Important properties of the distribution of the sample
mean X̄
1. Mean
2. Variance or Standard Deviation
3. Shape
Properties of Estimators
1. Unbiasedness
Example: A statistics class has six students, whose ages are:
18, 18, 19, 20, 20, 21
The population mean is

$$\mu = \frac{18+18+19+20+20+21}{6} = \frac{58}{3} \approx 19.33$$

Select samples of size 2 (n = 2) without replacement.
There are C(6, 2) = 15 possible samples.
Find the mean of every possible sample where n = 2.
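Sampling distribution for the sample mean X̄

x̄       18     18.5   19     19.5   20     20.5
P(x̄)    1/15   2/15   4/15   4/15   2/15   2/15

The mean (expected value) of the sample mean X̄:

$$E(\bar{X}) \equiv \mu_{\bar{X}} = 18\left(\tfrac{1}{15}\right) + 18.5\left(\tfrac{2}{15}\right) + 19\left(\tfrac{4}{15}\right) + 19.5\left(\tfrac{4}{15}\right) + 20\left(\tfrac{2}{15}\right) + 20.5\left(\tfrac{2}{15}\right) = \frac{290}{15} \approx 19.33$$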
The mean value of X̄ is the same as the population mean μ:

$$E(\bar{X}) = \mu$$

X̄ is called an unbiased estimator for μ.
An estimator is unbiased if the mean of the sampling distribution of the estimator is equal to the parameter. An estimator θ̂ is unbiased for a parameter θ if

$$E(\hat{\theta}) = \theta$$

If this property is not satisfied, θ̂ is a biased estimator for θ.
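The six-student example can be checked by brute force. The following is a minimal sketch, assuming samples of size 2 are drawn without replacement and that all 15 samples are equally likely; the variable names are illustrative only. It uses exact fractions so that the comparison with μ is not affected by rounding.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

ages = [18, 18, 19, 20, 20, 21]        # the population from the example
mu = Fraction(sum(ages), len(ages))    # population mean, 58/3

# All C(6, 2) = 15 samples of size 2, drawn without replacement.
sample_means = [Fraction(a + b, 2) for a, b in combinations(ages, 2)]

# Sampling distribution of the sample mean: every sample is equally likely.
counts = Counter(sample_means)
distribution = {m: Fraction(c, len(sample_means)) for m, c in sorted(counts.items())}

# Expected value of the sample mean under that distribution.
expected_mean = sum(m * p for m, p in distribution.items())

for m, p in distribution.items():
    print(float(m), p)
print("E(X-bar) =", expected_mean, "and mu =", mu)
```

The expected value of X̄ computed this way equals μ exactly, which is what unbiasedness means.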

Result:
Let X1, X2, …, Xn be a random sample from a population with mean µ and variance σ².
Consider the sample mean

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

and the sample variance

$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$$

Then

1. X̄ is an unbiased estimator for µ:

$$E(\bar{X}) = \mu$$
Proof:

$$\begin{aligned}
E(\bar{X}) &= E\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right] \\
&= \frac{1}{n}\sum_{i=1}^{n} E(X_i) \\
&= \frac{1}{n}\sum_{i=1}^{n} \mu \\
&= \frac{1}{n}\, n\mu = \mu
\end{aligned}$$
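2. S² is an unbiased estimator for σ²:

$$E(S^2) = E\left[\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}\right] = \sigma^2$$

It can be shown that

$$E\left[\sum_{i=1}^{n} (X_i - \bar{X})^2\right] = (n-1)\sigma^2$$

then

$$E(S^2) = \frac{1}{n-1}\, E\left[\sum_{i=1}^{n} (X_i - \bar{X})^2\right] = \frac{(n-1)\sigma^2}{n-1} = \sigma^2$$

But

$$E\left[\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n}\right] = \frac{n-1}{n}\,\sigma^2 \neq \sigma^2$$

Then

$$\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n}$$

is a biased estimator for σ².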
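The two expectations above can be verified exactly on a small population. This is only an illustrative sketch: it reuses the six ages from the earlier example as a hypothetical population and assumes samples of size n = 2 are drawn with replacement, so that the observations are independent with variance σ² (here σ² is the population variance with divisor N).

```python
from fractions import Fraction
from itertools import product

population = [18, 18, 19, 20, 20, 21]   # hypothetical population for the check
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N   # population variance

n = 2
# Every ordered sample of size n drawn with replacement is equally likely.
samples = list(product(population, repeat=n))

def sum_sq_dev(sample):
    """Sum of squared deviations of a sample from its own mean."""
    xbar = Fraction(sum(sample), len(sample))
    return sum((x - xbar) ** 2 for x in sample)

# Average of each estimator over all equally likely samples = its expectation.
mean_s2 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)
mean_div_n = sum(sum_sq_dev(s) / n for s in samples) / len(samples)

print("sigma^2          =", sigma2)
print("E[S^2]           =", mean_s2)      # divisor n - 1
print("E[sum/n]         =", mean_div_n)   # divisor n
```

Under these assumptions the divisor n−1 reproduces σ² exactly, while the divisor n gives (n−1)/n times σ².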

2. Consistent Estimators
An unbiased estimator θ̂ is consistent if

$$\hat{\theta} - \theta \to 0, \quad \text{as } n \to \infty$$

To check this condition, we need to check the limiting behavior of Var(θ̂): for an unbiased estimator it is enough that Var(θ̂) → 0 as n → ∞.
A consistent estimator is one for which the probability that the estimate is close to the value of the population parameter increases as the sample size increases.
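Result:

$$Var(\bar{X}) = \frac{\sigma^2}{n}$$

Proof. Since the X_i in a random sample are independent,

$$\begin{aligned}
Var(\bar{X}) &= Var\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right] \\
&= \frac{1}{n^2}\sum_{i=1}^{n} Var(X_i) \\
&= \frac{1}{n^2}\sum_{i=1}^{n} \sigma^2 = \frac{n\sigma^2}{n^2} \\
&= \frac{\sigma^2}{n} \to 0, \quad \text{as } n \to \infty
\end{aligned}$$

The sample mean X̄ is an unbiased estimator for μ and Var(X̄) → 0, so X̄ is a consistent estimator for µ.
Standard Deviation (S.D.) of the estimator

$$S.D.(\bar{X}) = \frac{\sigma}{\sqrt{n}}$$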
To estimate σ²:

$$\hat{\sigma}^2 = S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$$

Estimate σ by S (the sample S.D.):

$$\hat{\sigma} = S = \sqrt{S^2}$$

Standard Error (S.E.) of the estimator

$$S.E.(\bar{X}) = \frac{s}{\sqrt{n}}$$
The smaller the sampling variability (i.e.,
the S.E) is, the better an estimate will be.
As the sample size increases to infinity, the
sampling distribution concentrates around
the population mean.

Result: Let X1, X2, …, Xn be a random sample from a normal population with mean μ and variance σ², N(μ, σ²). Then
• X̄ is an unbiased estimator for µ.
• X̄ is a consistent estimator for µ.
• X̄ ~ N(μ, σ²/n).
• X̄ is the BEST estimator for µ: it has the smallest variance among the class of all unbiased estimators.
Example-4: We want to estimate the average rainfall in March in Irbid city. Someone collected the data of the past 5 years: 10, 20, 30, 40, 50 (mm).
(1) What statistic should we use to estimate the long-term average rainfall (μ)? What is the point estimate for μ?
Use the sample mean.
Point estimate:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{10+20+30+40+50}{5} = 30$$

(2) What statistic should we use to estimate σ²? Find the point estimate for σ².
(Hint: $\sum_{i=1}^{n} x_i^2 = 5500$)
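The sample variance

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n-1} = \frac{5500 - \frac{(150)^2}{5}}{5-1} = 250$$

(3) Find the standard error

$$S.E.(\bar{X}) = \frac{s}{\sqrt{n}} = \frac{\sqrt{250}}{\sqrt{5}} \approx 7.07$$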
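The three point estimates in the rainfall example can be reproduced with Python's standard `statistics` module. This is a small sketch rather than part of the original notes; the variable names are illustrative. Note that `statistics.variance` uses the n−1 divisor, matching the definition of S² above.

```python
import math
import statistics

rainfall_mm = [10, 20, 30, 40, 50]      # March rainfall for the past 5 years
n = len(rainfall_mm)

x_bar = statistics.mean(rainfall_mm)    # point estimate of mu
s2 = statistics.variance(rainfall_mm)   # sample variance, divisor n - 1
se = math.sqrt(s2 / n)                  # standard error of the sample mean

print("mean =", x_bar)
print("sample variance =", s2)
print("standard error =", round(se, 2))
```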

Result: Let X1, …, Xn be a random sample from a Poisson distribution, Poisson(λ). Then X̄ is the best estimator for λ.
Example-1: Suppose the number of vehicles arriving at a signalized road intersection follows a Poisson distribution. The following numbers of vehicles were observed during six different hours of a day: 350, 330, 370, 320, 280, 420.
Estimate the average rate of vehicles arriving at the road intersection per hour.

$$\hat{\lambda} = \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{350+330+370+320+280+420}{6} = \frac{2070}{6} = 345$$

The estimated average rate is therefore 345 vehicles per hour.
Interval Estimation
Let X1, X2, …, Xn be a random sample and θ be an unknown parameter. A confidence interval (C.I.) for θ is an interval (L, U) computed from the sample such that it includes the true parameter θ with a specified high probability.
Confidence Interval for θ
Let L and U be functions of X1, X2, …, Xn. Then (L, U) is a (1-α)100% C.I. (0<α<1) for θ if
P(L < θ < U) = 1-α
where 1-α is called the confidence coefficient and
L: Lower confidence limit.
U: Upper confidence limit.
Sampling from Normal Population
Let X1, X2, …, Xn be a random sample from a normal population with mean μ and variance σ², N(μ, σ²).

Result:

$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$

Confidence Interval for the mean (μ) when σ² is known:
Given the confidence coefficient 1-α, find $Z_{\alpha/2}$ (called the critical value) from the z-table, using the following diagram of a standard normal curve:

[Figure: standard normal curve with area 1−α between −Z_{α/2} and Z_{α/2}.]
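We have

$$1-\alpha = P(-Z_{\alpha/2} < Z < Z_{\alpha/2}) = P\left(-Z_{\alpha/2} < \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} < Z_{\alpha/2}\right)$$

which implies

$$P\left(\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1-\alpha$$

Therefore, a (1-α)100% C.I. for μ is (L, U), where

$$L = \bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}, \qquad U = \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

For simplicity, (L, U) can be written as

$$\bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

Margin of error = Critical value × Standard deviation of the statistic
In other words,

$$\bar{X} \pm E$$

(point estimator ± margin of error)
where the margin of error of estimation E is defined as

$$E = Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = Z_{\alpha/2} \times S.D.(\bar{X})$$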
Remarks:
• Usually 90%, 95%, or 99% confidence levels are used.
• An increase in sample size leads to a decreased interval width.
• Higher confidence levels have wider intervals than lower confidence levels.

In the following table, we present three commonly used confidence coefficients and their respective values of $Z_{\alpha/2}$:

Significance Level    Confidence Level    Critical z Value
1%                    99% = 0.99          2.575
5%                    95% = 0.95          1.96
10%                   90% = 0.90          1.645
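The critical values in the table can also be computed instead of read from a z-table. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `z_critical` is an illustrative choice, not something from the notes. The computed 99% value is about 2.576, which the table lists as 2.575.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Return Z_{alpha/2} for a two-sided interval with the given confidence level."""
    alpha = 1 - confidence
    # Z_{alpha/2} cuts off an upper-tail area of alpha/2 under the standard normal curve.
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")
```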

95% confidence interval for the mean of the population:

$$\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}}$$

99% confidence interval for the mean of the population:

$$\bar{x} \pm 2.58\frac{\sigma}{\sqrt{n}}$$
Confidence Interval for the mean μ
(σ² is unknown)
The sample is from a normal population N(μ, σ²).
We need a new distribution: the t-distribution.

• Similar in appearance to the normal distribution: bell-shaped and symmetric.
• The mean of the t-distribution equals zero.
• The t-distribution has only one parameter v, called the degrees of freedom (df).
• Its shape depends on the degrees of freedom v (df).
• It has a lower height and a wider spread than the standard normal distribution.
• As the degrees of freedom v (df) become larger, the t-distribution approaches the standard normal distribution.
Example: Given the upper-tail probability P = 0.025 and the degrees of freedom v = 7, find $t_{0.025,\,7}$.
From the table,

$$t_{0.025,\,7} = 2.36462$$

Theorem
Let X1, X2, …, Xn be a random sample from a normal population with mean μ and variance σ², N(μ, σ²). Then

$$t = \frac{\bar{X}-\mu}{s/\sqrt{n}}$$

follows a t-distribution with n-1 degrees of freedom ($t_{n-1}$).
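Let X1, X2, …, Xn be a random sample from a normal population with mean μ and variance σ², N(μ, σ²).

We have

$$1-\alpha = P(-t_{\alpha/2,\,n-1} < T < t_{\alpha/2,\,n-1}) = P\left(-t_{\alpha/2,\,n-1} < \frac{\bar{X}-\mu}{s/\sqrt{n}} < t_{\alpha/2,\,n-1}\right)$$

This implies

$$1-\alpha = P\left(\bar{X} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} < \mu < \bar{X} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}\right)$$

Therefore, a (1-α)100% C.I. for μ is (L, U), where

$$L = \bar{X} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}, \qquad U = \bar{X} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$$

For simplicity, (L, U) can be written as

$$\bar{X} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$$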
Example: A random sample of 9 observations was selected from a normal population, and the observed data are as follows:

60, 62, 65, 68, 68, 70, 71, 72, 73

Obtain a 95% confidence interval for the population mean μ.
Solution:
The sample mean

$$\bar{x} = \frac{60+62+65+68+68+70+71+72+73}{9} = \frac{609}{9} = 67.67$$

The sample variance

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n-1} = \frac{41371 - \frac{(609)^2}{9}}{9-1} = 20.25$$

Standard Error

$$SE(\bar{X}) = \frac{s}{\sqrt{n}} = \frac{4.5}{\sqrt{9}} = 1.5$$

Therefore, the 95% C.I. for μ is

$$\bar{X} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} = 67.67 \pm t_{0.025,\,8}\frac{4.5}{\sqrt{9}} = 67.67 \pm 2.306 \times \frac{4.5}{\sqrt{9}} = 67.67 \pm 3.46$$

(64.21, 71.13)
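The same interval can be reproduced step by step in code. This is a minimal sketch using only the standard library, which has no t-distribution quantile function, so the critical value 2.306 is taken directly from the worked example rather than computed.

```python
import math
import statistics

data = [60, 62, 65, 68, 68, 70, 71, 72, 73]
n = len(data)

x_bar = statistics.mean(data)     # sample mean, about 67.67
s = statistics.stdev(data)        # sample S.D. with divisor n - 1
se = s / math.sqrt(n)             # standard error of the mean

t_crit = 2.306                    # t_{0.025, 8}, read from the t-table in the example
margin = t_crit * se

lower, upper = x_bar - margin, x_bar + margin
print(f"95% C.I. for mu: ({lower:.2f}, {upper:.2f})")
```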

Confidence Interval for the mean μ
(σ² is unknown)
When the sample comes from a non-normal population whose distribution is continuous, symmetric and unimodal, and the sample size n is large (n ≥ 30), a (1-α)100% C.I. for μ is

$$\bar{X} \pm Z_{\alpha/2}\frac{s}{\sqrt{n}}$$
Example: Results of a sample of n = 3534 participants in a Heart Study are shown below.

                           n        Sample Mean    Standard Deviation (s)
Systolic Blood Pressure    3,534    127.3          19.0

Because the sample is large, we can generate a 95% confidence interval for systolic blood pressure using the following formula:

$$\bar{X} \pm Z_{\alpha/2}\frac{s}{\sqrt{n}} = 127.3 \pm 1.96 \times \frac{19.0}{\sqrt{3534}} \approx 127.3 \pm 0.6$$

• Therefore, the point estimate for the true mean systolic blood pressure in the population is 127.3.
• We are 95% confident that the true mean is between 126.7 and 127.9.
• The margin of error is very small (the confidence interval is narrow), because the sample size is large.
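As a quick check of the blood pressure interval, the following sketch plugs the summary statistics from the table into the large-sample formula. It assumes the critical value is taken from the standard normal distribution via `statistics.NormalDist`; the variable names are illustrative.

```python
import math
from statistics import NormalDist

n, x_bar, s = 3534, 127.3, 19.0       # summary statistics from the table

z = NormalDist().inv_cdf(0.975)       # Z_{alpha/2} for 95% confidence, about 1.96
margin = z * s / math.sqrt(n)         # margin of error

lower, upper = x_bar - margin, x_bar + margin
print(f"margin of error = {margin:.2f}")
print(f"95% C.I. = ({lower:.1f}, {upper:.1f})")
```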

Sample Size Determination, n?

For the purpose of planning, it is required to determine the sample size needed for estimating the population mean μ.

Assume that we need to estimate μ within E units of the true value. That is, the interval estimate is (X̄ − E, X̄ + E). It follows that

$$E = Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

Then

$$\frac{\sigma}{\sqrt{n}} = \frac{E}{Z_{\alpha/2}}$$

$$\Rightarrow \sqrt{n} = \left(\frac{Z_{\alpha/2}}{E}\right)\sigma$$

$$\Rightarrow n = \frac{(Z_{\alpha/2})^2}{E^2}\,\sigma^2$$

Round to the next higher integer.

Remark 1: The margin of error ↓ as n ↑

Remark 2: The above formula for n depends on σ², which is usually unknown. In that case:

1. Use the sample variance S² from previous studies.
2. Collect some data prior to the main experiment. Such a preliminary study is called a pilot study. (How many observations should be collected in the pilot study?)
3. Our estimate of σ² is subject to error; thus, our sample size estimate for the main experiment may differ considerably from what is actually required.
Example: IQ scores are known to vary normally with a standard deviation of 15. How many students should be sampled if we want to estimate the population mean IQ at 99% confidence with a margin of error equal to 2?

$$n = \frac{(Z_{\alpha/2})^2}{E^2}\,\sigma^2 = \frac{(2.576)^2}{(2)^2}\,(15)^2 = 373.26$$

Round up to be safe, and take a sample of 374 students.
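The sample size formula, including the round-up step, fits in a one-line helper. This is a sketch only; the function name `required_sample_size` and its parameter names are illustrative choices, and the critical value is passed in as read from a table.

```python
import math

def required_sample_size(z_crit, sigma, margin):
    """Smallest integer n with z_crit * sigma / sqrt(n) <= margin."""
    return math.ceil((z_crit * sigma / margin) ** 2)

# IQ example: 99% confidence (z = 2.576), sigma = 15, margin of error E = 2.
n_iq = required_sample_size(2.576, 15, 2)
print(n_iq)
```

Rounding up rather than to the nearest integer guarantees that the achieved margin of error is no larger than the target E.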
Interval Estimation for σ²

We need to introduce a new distribution.

Chi-Square (χ²) Distribution
• The χ² distribution is a continuous distribution.
• It ranges from 0 to infinity.
• It depends on the parameter v (v = 1, 2, 3, …), called the degrees of freedom.
• Chi-square distributions are positively skewed, with the degree of skew decreasing with increasing degrees of freedom.
• As the degrees of freedom increase, the chi-square distribution approaches a normal distribution.
• The figure below shows density functions for three chi-square distributions. Notice how the skew decreases as the degrees of freedom increase.

[Figure: density functions of three chi-square distributions with different degrees of freedom.]
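Theorem: Let X1, X2, …, Xn be a random sample from a normal population with mean μ and variance σ², N(μ, σ²). Then

$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$$

To find a (1-α)100% C.I. for σ², write $\chi^2_{p,\,n-1}$ for the value with upper-tail probability p, so that

$$1-\alpha = P\left(\chi^2_{1-\alpha/2,\,n-1} < \frac{(n-1)S^2}{\sigma^2} < \chi^2_{\alpha/2,\,n-1}\right)$$

Which implies

$$1-\alpha = P\left(\frac{(n-1)S^2}{\chi^2_{\alpha/2,\,n-1}} < \sigma^2 < \frac{(n-1)S^2}{\chi^2_{1-\alpha/2,\,n-1}}\right)$$

Then a (1-α)100% C.I. for σ² is

$$\left(\frac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}},\ \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}}\right)$$

and a (1-α)100% C.I. for σ is

$$\left(\sqrt{\frac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}}},\ \sqrt{\frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}}}\right)$$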
Example: A random sample of size n = 9 is drawn from a normal distribution. The observed sample data are 62, 63, 65, 61, 65, 64, 66, 67 and 63 inches in height. Construct a 95 percent confidence interval estimate for the population variance.
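Solution: Given that

$$n = 9, \quad \sum_{i=1}^{9} x_i = 576, \quad \bar{x} = 64, \quad \sum_{i=1}^{9} x_i^2 = 36894, \quad \alpha = 0.05$$

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n-1} = \frac{36894 - \frac{(576)^2}{9}}{9-1} = 3.75$$

From the chi-square table with 8 degrees of freedom, read the critical values $\chi^2_{0.025,\,8}$ and $\chi^2_{0.975,\,8}$.

The lower limit of the interval is

$$\frac{(n-1)s^2}{\chi^2_{0.025,\,8}} = \frac{8(3.75)}{\chi^2_{0.025,\,8}}$$

The upper limit of the interval is

$$\frac{(n-1)s^2}{\chi^2_{0.975,\,8}} = \frac{8(3.75)}{\chi^2_{0.975,\,8}}$$

Thus, the required interval estimate is

$$\left(\frac{30}{\chi^2_{0.025,\,8}},\ \frac{30}{\chi^2_{0.975,\,8}}\right)$$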
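The arithmetic of this example can be sketched in code. One assumption needs stating plainly: the critical values used below, about 17.535 for upper-tail probability 0.025 and about 2.180 for upper-tail probability 0.975 with 8 degrees of freedom, are standard chi-square table values supplied here for illustration. They do not appear in the abbreviated table in these notes, which has no 0.025 or 0.975 columns.

```python
import statistics

heights = [62, 63, 65, 61, 65, 64, 66, 67, 63]   # inches
n = len(heights)
s2 = statistics.variance(heights)                # sample variance, divisor n - 1

# Assumed standard table values for 8 degrees of freedom (upper-tail notation).
chi2_upper_0025 = 17.535   # chi-square value with upper-tail area 0.025
chi2_upper_0975 = 2.180    # chi-square value with upper-tail area 0.975

lower = (n - 1) * s2 / chi2_upper_0025
upper = (n - 1) * s2 / chi2_upper_0975

print("s^2 =", s2)
print(f"95% C.I. for sigma^2: ({lower:.2f}, {upper:.2f})")
```

The interval is not symmetric around s² because the chi-square distribution is skewed.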

Chi-Square Table

Degrees of Freedom (df) by upper-tail Probability, p
df 0.99 0.95 0.05 0.01 0.001
1 0.000 0.004 3.84 6.64 10.83
2 0.020 0.103 5.99 9.21 13.82
3 0.115 0.352 7.82 11.35 16.27
4 0.297 0.711 9.49 13.28 18.47
5 0.554 1.145 11.07 15.09 20.52
6 0.872 1.635 12.59 16.81 22.46
7 1.239 2.167 14.07 18.48 24.32
8 1.646 2.733 15.51 20.09 26.13
9 2.088 3.325 16.92 21.67 27.88
10 2.558 3.940 18.31 23.21 29.59
11 3.05 4.58 19.68 24.73 31.26
12 3.57 5.23 21.03 26.22 32.91
13 4.11 5.89 22.36 27.69 34.53
14 4.66 6.57 23.69 29.14 36.12
15 5.23 7.26 25.00 30.58 37.70
16 5.81 7.96 26.30 32.00 39.25
17 6.41 8.67 27.59 33.41 40.79
18 7.02 9.39 28.87 34.81 42.31
19 7.63 10.12 30.14 36.19 43.82
20 8.26 10.85 31.41 37.57 45.32
21 8.90 11.59 32.67 38.93 46.80
22 9.54 12.34 33.92 40.29 48.27
23 10.20 13.09 35.17 41.64 49.73

24 10.86 13.85 36.42 42.98 51.18
25 11.52 14.61 37.65 44.31 52.62
26 12.20 15.38 38.89 45.64 54.05
27 12.88 16.15 40.11 46.96 55.48
28 13.57 16.93 41.34 48.28 56.89
29 14.26 17.71 42.56 49.59 58.30
30 14.95 18.49 43.77 50.89 59.70

Estimation of the
Population Proportion (p)

Qualitative data with two categories


(denoted by success and failure)
Examples:
A person has a certain disease or not
An item is defective or good
Response by yes or no
Interested in estimating population
proportion of success!!!
Estimate the proportion p of defective
components in a large population of
components.
Let X1, X2, …, Xn be a random sample from Bernoulli(p) (n independent Bernoulli trials):

$$X_i = \begin{cases} 1 & \text{if success (with probability } p) \\ 0 & \text{if failure (with probability } 1-p) \end{cases}$$

X1 + X2 + … + Xn = total number of successes = X
Sample proportion of successes p̂:

$$\hat{p} = \frac{\text{observed number of successes } (X)}{\text{sample size } (n)} = \frac{\sum_{i=1}^{n} X_i}{n} = \bar{X}$$
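$$S.E.(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}$$

$$\hat{p} \approx N\left(p, \frac{p(1-p)}{n}\right) \quad \text{for large } n$$

$$\frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}} \approx Z \sim N(0,1)$$

$$1-\alpha = P(-Z_{\alpha/2} < Z < Z_{\alpha/2}) = P\left(-Z_{\alpha/2} < \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}} < Z_{\alpha/2}\right)$$

which implies a (1-α)100% C.I. for p:

$$\hat{p} \pm Z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$$

Replacing the unknown p by the point estimate p̂, the (1-α)100% C.I. for p is

$$\hat{p} \pm Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$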
Example: Suppose that a new drug is used to
treat patients with lung cancer. The
treatment was successful on 134 of the 245
patients it was administered to. Assume that
these patients are representative of the
population of individuals who have lung
cancer.

a) Calculate the sample proportion successfully treated.

$$\hat{p} = \frac{134}{245} = 0.547$$

b) Calculate the S.E.

$$S.E. = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \sqrt{\frac{0.547(1-0.547)}{245}} \approx 0.03$$

c) Determine a 95% C.I. for the proportion successfully treated.

$$\hat{p} \pm Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.547 \pm 1.96\sqrt{\frac{0.547(1-0.547)}{245}} = 0.547 \pm 0.062$$

95% C.I. for p = (0.485, 0.609)
= (48.5%, 60.9%).
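The drug example can be reproduced with a short script. This is a sketch under the same large-sample normal approximation used above; the variable names are illustrative, and the critical value is computed with `statistics.NormalDist` instead of being read from the z-table.

```python
import math
from statistics import NormalDist

successes, n = 134, 245

p_hat = successes / n                         # sample proportion
se = math.sqrt(p_hat * (1 - p_hat) / n)       # estimated standard error of p_hat

z = NormalDist().inv_cdf(0.975)               # Z_{alpha/2} for 95% confidence
margin = z * se

lower, upper = p_hat - margin, p_hat + margin
print(f"p_hat = {p_hat:.3f}, S.E. = {se:.3f}")
print(f"95% C.I. for p: ({lower:.3f}, {upper:.3f})")
```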
