SIMPLE RANDOM SAMPLING

A procedure for selecting a sample of size n out of a finite population of size N in which
each of the possible distinct samples has an equal chance of being selected is called random
sampling or simple random sampling.
We may have two distinct types of simple random sampling as follows:
i) Simple random sampling with replacement (srswr).
ii) Simple random sampling without replacement (srswor).

Simple random sampling with replacement (srswr)

In sampling with replacement, a unit is selected from the population of $N$ units, its content is noted, and the unit is returned to the population before the next draw is made; the process is repeated $n$ times to give a sample of $n$ units. In this method, at each draw each of the $N$ units of the population gets the same probability $1/N$ of being selected. Here the same unit of the population may occur more than once in the sample (the order in which the sample units are obtained is regarded). There are $N^n$ such ordered samples, and each has an equal probability $1/N^n$ of being selected.

Note: If the order in which the sample units are obtained is ignored (unordered samples), then the number of possible samples is
$$\binom{N}{n} + N\left(1 + \binom{N-1}{1} + \binom{N-1}{2} + \cdots + \binom{N-1}{n-2}\right) = \binom{N+n-1}{n}.$$

Simple random sampling without replacement (srswor)

Suppose the population consists of $N$ units. In simple random sampling without replacement, a unit is selected, its content is noted, and the unit is not returned to the population before the next draw is made. The process is repeated $n$ times to give a sample of $n$ units. In this method, at the $r$-th draw each of the remaining $N - r + 1$ units of the population gets the same probability $1/(N - r + 1)$ of being included in the sample, and no unit of the population can occur more than once in the sample (order is ignored). There are $\binom{N}{n}$ possible samples, and each such sample has an equal probability $1/\binom{N}{n}$ of being selected.
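The two selection schemes, and the counts of possible samples, can be sketched in a few lines of Python; the population values below are hypothetical, chosen only for illustration:

```python
import math
import random

random.seed(0)
population = [8, 3, 11, 4, 7]   # hypothetical Y-values, N = 5
n = 2

# srswr: every draw is made from all N units, so a unit may repeat
sample_wr = random.choices(population, k=n)

# srswor: a drawn unit is not returned, so all n units are distinct
sample_wor = random.sample(population, n)

# counts of possible samples
assert len(population) ** n == 25           # N^n ordered srswr samples
assert math.comb(len(population), n) == 10  # C(N, n) srswor samples
```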

Theory of simple random sampling with replacement

$N$, population size.
$n$, sample size.
$Y_i$, value of the $i$-th unit of the population.
$y_i$, value of the $i$-th unit of the sample.


$Y = \sum_{i=1}^{N} Y_i$, population total.
6 RU Khan

$\bar{Y} = \dfrac{1}{N} \sum_{i=1}^{N} Y_i$, population mean.

$\bar{y} = \dfrac{1}{n} \sum_{i=1}^{n} y_i$, sample mean.

$\sigma^2 = \dfrac{1}{N} \sum_{i=1}^{N} (Y_i - \bar{Y})^2 = \dfrac{1}{N} \sum_{i=1}^{N} Y_i^2 - \bar{Y}^2$, population variance.

$S^2 = \dfrac{1}{N-1} \sum_{i=1}^{N} (Y_i - \bar{Y})^2 = \dfrac{1}{N-1} \left( \sum_{i=1}^{N} Y_i^2 - N \bar{Y}^2 \right)$, population mean square.

$s^2 = \dfrac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2 = \dfrac{1}{n-1} \left( \sum_{i=1}^{n} y_i^2 - n \bar{y}^2 \right)$, sample mean square.

Theorem: In srswr, the sample mean $\bar{y}$ is an unbiased estimate of the population mean $\bar{Y}$, i.e. $E(\bar{y}) = \bar{Y}$, and its variance is $V(\bar{y}) = \dfrac{\sigma^2}{n} = \dfrac{N-1}{nN} S^2$.

Corollary: $\hat{Y} = N\bar{y}$ is an unbiased estimate of the population total $Y$, with variance
$V(\hat{Y}) = \dfrac{N^2 \sigma^2}{n} = \dfrac{N(N-1)}{n} S^2$.

Theorem: In srswr, the sample mean square $s^2$ is an unbiased estimate of the population variance $\sigma^2$, i.e. $E(s^2) = \sigma^2 = \dfrac{N-1}{N} S^2$.
Example: In a population with $N = 5$, the values of $Y_i$ are 8, 3, 11, 4 and 7.

a) Calculate the population mean $\bar{Y}$, variance $\sigma^2$ and mean square $S^2$.

b) Enumerate all possible samples of size 2 by the replacement method and verify that
i) the sample mean $\bar{y}$ is an unbiased estimate of the population mean $\bar{Y}$, i.e. $E(\bar{y}) = \bar{Y}$;
ii) $N\bar{y}$ is an unbiased estimate of the population total $Y$, i.e. $E(N\bar{y}) = Y$;
iii) $V(\bar{y}) = \dfrac{(N-1) S^2}{nN} = \dfrac{\sigma^2}{n}$; and
iv) $E(s^2) = \dfrac{N-1}{N} S^2 = \sigma^2$.
Solution:
a) We know that
$\bar{Y} = \dfrac{1}{N} \sum_{i=1}^{N} Y_i = 6.6$, $\sigma^2 = \dfrac{1}{N} \sum_{i=1}^{N} Y_i^2 - \bar{Y}^2 = 8.24$, and $S^2 = \dfrac{1}{N-1} \left( \sum_{i=1}^{N} Y_i^2 - N \bar{Y}^2 \right) = 10.3$.

b) Form a table for the calculations as below:

Samples  $\bar{y}_i$  $\bar{y}_i^2$  $N\bar{y}_i$  $s_i^2$  |  Samples  $\bar{y}_i$  $\bar{y}_i^2$  $N\bar{y}_i$  $s_i^2$
(8, 8) 8.0 64.00 40.0 0.0 (11, 4) 7.5 56.25 37.5 24.5
(8, 3) 5.5 30.25 27.5 12.5 (11, 7) 9.0 81.00 45.0 8.0
(8, 11) 9.5 90.25 47.5 4.5 (4, 8) 6.0 36.00 30.0 8.0
(8, 4) 6.0 36.00 30.0 8.0 (4, 3) 3.5 12.25 17.5 0.5
(8, 7) 7.5 56.25 37.5 0.5 (4, 11) 7.5 56.25 37.5 24.5
(3, 8) 5.5 30.25 27.5 12.5 (4, 4) 4.0 16.00 20.0 0.0
(3, 3) 3.0 9.00 15.0 0.0 (4, 7) 5.5 30.25 27.5 4.5
(3, 11) 7.0 49.00 35.0 32.0 (7, 8) 7.5 56.25 37.5 0.5
(3, 4) 3.5 12.25 17.5 0.5 (7, 3) 5.0 25.00 25.0 8.0
(3, 7) 5.0 25.00 25.0 8.0 (7, 11) 9.0 81.00 45.0 8.0
(11, 8) 9.5 90.25 47.5 4.5 (7, 4) 5.5 30.25 27.5 4.5
(11, 3) 7.0 49.00 35.0 32.0 (7, 7) 7.0 49.00 35.0 0.0
(11, 11) 11.0 121.00 55.0 0.0

i) $E(\bar{y}) = \dfrac{1}{n'} \sum_{i=1}^{n'} \bar{y}_i = \dfrac{1}{25} \times 165 = 6.6 = \bar{Y}$, where $n'$ is the number of possible samples.

ii) $E(N\bar{y}) = \dfrac{1}{n'} \sum_{i=1}^{n'} N\bar{y}_i = 33$, or $E(N\bar{y}) = N E(\bar{y}) = 33 = Y$.

iii) $V(\bar{y}) = \dfrac{1}{n'} \sum_{i=1}^{n'} \bar{y}_i^2 - \bar{Y}^2 = 4.12$.

Now, $\dfrac{(N-1) S^2}{nN} = 4.12$ and $\dfrac{\sigma^2}{n} = 4.12$; therefore
$V(\bar{y}) = \dfrac{(N-1) S^2}{nN} = \dfrac{\sigma^2}{n} = 4.12$.

iv) $E(s^2) = \dfrac{1}{n'} \sum_{i=1}^{n'} s_i^2 = \dfrac{1}{25} \times 206 = 8.24$  (1a)

and $\dfrac{(N-1) S^2}{N} = 8.24$.  (2a)

In view of equations (1a) and (2a), we get
$E(s^2) = \dfrac{(N-1) S^2}{N} = \sigma^2 = 8.24$.
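The whole of part (b) can be reproduced by brute-force enumeration of the $N^n = 25$ ordered samples; a minimal Python sketch using only the standard library:

```python
from itertools import product
from statistics import mean, variance

Y = [8, 3, 11, 4, 7]                  # the example population, N = 5
N, n = len(Y), 2
Ybar = mean(Y)                                    # 6.6
sigma2 = sum((v - Ybar) ** 2 for v in Y) / N      # 8.24
S2 = sum((v - Ybar) ** 2 for v in Y) / (N - 1)    # 10.3

samples = list(product(Y, repeat=n))  # all N^n = 25 ordered srswr samples
ybars = [mean(s) for s in samples]
s2s = [variance(s) for s in samples]  # sample mean squares s_i^2

E_ybar = mean(ybars)                                # = Ybar = 6.6
V_ybar = mean([(yb - Ybar) ** 2 for yb in ybars])   # = sigma2 / n = 4.12
E_s2 = mean(s2s)                                    # = sigma2 = 8.24
```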

Theory of simple random sampling without replacement

Theorem: In srswor, the sample mean $\bar{y}$ is an unbiased estimate of the population mean $\bar{Y}$, i.e. $E(\bar{y}) = \bar{Y}$, and its variance is $V(\bar{y}) = \left( \dfrac{N-n}{nN} \right) S^2$.

Corollary: $\hat{Y} = N\bar{y}$ is an unbiased estimate of the population total $Y$, with variance $V(\hat{Y}) = N^2 (1 - f) S^2 / n$, where $f = n/N$ is the sampling fraction.

Theorem: In srswor, the sample mean square $s^2$ is an unbiased estimate of the population mean square $S^2$, i.e. $E(s^2) = S^2$.
Example: A random sample of $n = 2$ households was drawn from a small colony of $N = 5$ households having monthly income (in rupees) as follows:
Households: 1 2 3 4 5
Income (in thousand rupees): 8 6.5 7.5 7 6

a) Calculate the population mean $\bar{Y}$, variance $\sigma^2$ and mean square $S^2$.

b) Enumerate all possible samples of size $n = 2$ by the without-replacement method and verify that
i) the sample mean $\bar{y}$ is an unbiased estimate of the population mean $\bar{Y}$, i.e. $E(\bar{y}) = \bar{Y}$;
ii) $N\bar{y}$ is an unbiased estimate of the population total $Y$, i.e. $E(N\bar{y}) = Y$;
iii) $V(\bar{y}) = \dfrac{(N-n) S^2}{nN}$; and
iv) $E(s^2) = S^2$.
Solution:
a) We know that
$\bar{Y} = \dfrac{1}{N} \sum_{i=1}^{N} Y_i = 7$, $\sigma^2 = \dfrac{1}{N} \sum_{i=1}^{N} Y_i^2 - \bar{Y}^2 = 0.5$, and $S^2 = \dfrac{1}{N-1} \left( \sum_{i=1}^{N} Y_i^2 - N \bar{Y}^2 \right) = 0.625$.

b) Form a table for the calculations as below:

Samples  $\bar{y}_i$  $\bar{y}_i^2$  $N\bar{y}_i$  $s_i^2$  |  Samples  $\bar{y}_i$  $\bar{y}_i^2$  $N\bar{y}_i$  $s_i^2$
(8, 6.5) 7.25 52.563 36.25 1.125 (8, 7.5) 7.75 60.063 38.75 0.125
(8, 7) 7.50 56.250 37.50 0.500 (8, 6) 7.00 49.000 35.00 2.000
(6.5, 7.5) 7.00 49.000 35.00 0.500 (6.5, 7) 6.75 45.563 33.75 0.125
(6.5, 6) 6.25 39.063 31.25 0.125 (7.5, 7) 7.25 52.563 36.25 0.125
(7.5, 6) 6.75 45.563 33.75 1.125 (7, 6) 6.50 42.250 32.50 0.500

i) $E(\bar{y}) = \dfrac{1}{n'} \sum_{i=1}^{n'} \bar{y}_i = 7 = \bar{Y}$, where $n'$ is the number of possible samples.

ii) $E(N\bar{y}) = \dfrac{1}{n'} \sum_{i=1}^{n'} N\bar{y}_i = 35$, or $E(N\bar{y}) = N E(\bar{y}) = 35 = Y$.

iii) $V(\bar{y}) = \dfrac{1}{n'} \sum_{i=1}^{n'} (\bar{y}_i - \bar{Y})^2 = \dfrac{1}{n'} \sum_{i=1}^{n'} \bar{y}_i^2 - \bar{Y}^2 = 0.1875$, and $\dfrac{(N-n) S^2}{nN} = 0.1875$.

Therefore,
$V(\bar{y}) = \dfrac{(N-n) S^2}{nN} = 0.1875$.

iv) $E(s^2) = \dfrac{1}{n'} \sum_{i=1}^{n'} s_i^2 = 0.625 = S^2$.
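As in the srswr case, the verification can be reproduced by enumerating all $\binom{5}{2} = 10$ without-replacement samples:

```python
from itertools import combinations
from statistics import mean, variance

Y = [8, 6.5, 7.5, 7, 6]               # household incomes, N = 5
N, n = len(Y), 2
Ybar = mean(Y)                                    # 7
S2 = sum((v - Ybar) ** 2 for v in Y) / (N - 1)    # 0.625

samples = list(combinations(Y, n))    # all C(5, 2) = 10 srswor samples
ybars = [mean(s) for s in samples]

E_ybar = mean(ybars)                                # = Ybar = 7
V_ybar = mean([(yb - Ybar) ** 2 for yb in ybars])   # = (N-n)S^2/(nN) = 0.1875
E_s2 = mean([variance(s) for s in samples])         # = S^2 = 0.625
```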
Property: $V(\bar{y})$ under srswor is less than $V(\bar{y})$ under srswr, since $\dfrac{N-n}{nN} < \dfrac{N-1}{nN}$ for $n > 1$.
Theorem: Let a srswor sample of size $n$ be drawn from a population of size $N$, and let $T = \sum_{i=1}^{n} \alpha_i y_i$ be a linear estimator of $\bar{Y}$, where the $\alpha_i$'s are coefficients attached to the sample values. Then,
i) $T$ is unbiased for $\bar{Y}$ if and only if $\sum_{i=1}^{n} \alpha_i = 1$.
ii) The sample mean $\bar{y}$ is the best linear unbiased estimate.
Proof:
i) $E(T) = E\left( \sum_{i=1}^{n} \alpha_i y_i \right) = \sum_{i=1}^{n} \alpha_i E(y_i) = \bar{Y} \sum_{i=1}^{n} \alpha_i = \bar{Y}$, iff $\sum_{i=1}^{n} \alpha_i = 1$.

ii) $V(T) = E[T - E(T)]^2 = E\left( \sum_{i=1}^{n} \alpha_i y_i - \bar{Y} \right)^2$, under $\sum_{i=1}^{n} \alpha_i = 1$,

$= E\left( \sum_{i=1}^{n} \alpha_i y_i \right)^2 - 2\bar{Y}\, E\left( \sum_{i=1}^{n} \alpha_i y_i \right) + \bar{Y}^2 = E\left( \sum_{i=1}^{n} \alpha_i y_i \right)^2 - \bar{Y}^2$.

Consider
$E\left( \sum_{i=1}^{n} \alpha_i y_i \right)^2 = \sum_{i=1}^{n} \alpha_i^2 E(y_i^2) + \sum_{i \ne j} \alpha_i \alpha_j E(y_i y_j)$.  (1)

Note that
$V(y_i) = E(y_i^2) - \bar{Y}^2$
$\Rightarrow E(y_i^2) = \dfrac{N-1}{N} S^2 + \bar{Y}^2$, since $V(y_i) = \dfrac{N-1}{N} S^2$ for each $i$.  (2)

Now
$E(y_i y_j) = \sum_{i \ne j}^{N} Y_i \Pr(i)\, Y_j \Pr(j \mid i) = \dfrac{1}{N(N-1)} \sum_{i \ne j}^{N} Y_i Y_j$.

Note that
$\left( \sum_{i=1}^{N} Y_i \right)^2 = \sum_{i=1}^{N} Y_i^2 + \sum_{i \ne j}^{N} Y_i Y_j = (N-1) S^2 + N \bar{Y}^2 + \sum_{i \ne j}^{N} Y_i Y_j$

$\Rightarrow \sum_{i \ne j}^{N} Y_i Y_j = N^2 \bar{Y}^2 - (N-1) S^2 - N \bar{Y}^2$.

Thus
$E(y_i y_j) = \dfrac{1}{N(N-1)} \left[ N^2 \bar{Y}^2 - (N-1) S^2 - N \bar{Y}^2 \right] = \bar{Y}^2 - \dfrac{S^2}{N}$.  (3)
In view of equations (2) and (3), equation (1) becomes

$E\left( \sum_{i=1}^{n} \alpha_i y_i \right)^2 = \sum_{i=1}^{n} \alpha_i^2 \left[ \dfrac{N-1}{N} S^2 + \bar{Y}^2 \right] + \sum_{i \ne j} \alpha_i \alpha_j \left[ \bar{Y}^2 - \dfrac{S^2}{N} \right]$

$= S^2 \sum_{i=1}^{n} \alpha_i^2 - \dfrac{S^2}{N} \sum_{i=1}^{n} \alpha_i^2 + \bar{Y}^2 \sum_{i=1}^{n} \alpha_i^2 + \left( 1 - \sum_{i=1}^{n} \alpha_i^2 \right) \left( \bar{Y}^2 - \dfrac{S^2}{N} \right)$, since $\sum_{i \ne j} \alpha_i \alpha_j = \left( \sum_{i=1}^{n} \alpha_i \right)^2 - \sum_{i=1}^{n} \alpha_i^2 = 1 - \sum_{i=1}^{n} \alpha_i^2$,

$= S^2 \sum_{i=1}^{n} \alpha_i^2 + \bar{Y}^2 - \dfrac{S^2}{N}$.

Therefore,
$V(T) = S^2 \sum_{i=1}^{n} \alpha_i^2 - \dfrac{S^2}{N}$.
Since $\sum_{i=1}^{n} \alpha_i^2 = \sum_{i=1}^{n} \left( \alpha_i - \dfrac{1}{n} \right)^2 + \dfrac{1}{n}$ under the condition $\sum_{i=1}^{n} \alpha_i = 1$, it follows that

$V(T) = S^2 \left[ \sum_{i=1}^{n} \left( \alpha_i - \dfrac{1}{n} \right)^2 + \left( \dfrac{1}{n} - \dfrac{1}{N} \right) \right]$.

We note that $V(T)$ is minimum if $\sum_{i=1}^{n} \left( \alpha_i - \dfrac{1}{n} \right)^2 = 0$, i.e. $\alpha_i = \dfrac{1}{n}$ for all $i = 1, 2, \ldots, n$, and then $T = \dfrac{1}{n} \sum_{i=1}^{n} y_i = \bar{y}$.
OR

To determine the $\alpha_i$ such that $V(T)$ is minimum, consider the function
$\phi = V(T) + \lambda \left( \sum_{i=1}^{n} \alpha_i - 1 \right)$, where $\lambda$ is some unknown constant.

Using the method of Lagrange multipliers, we choose the $\alpha_i$ and the constant $\lambda$ to minimize $\phi$. Differentiating $\phi$ with respect to $\alpha_i$ and equating to zero, we have

$\dfrac{\partial \phi}{\partial \alpha_i} = 2 S^2 \alpha_i + \lambda = 0$, or $\alpha_i = -\dfrac{\lambda}{2 S^2}$.  (4)

Summing both sides of (4) over $i$, we get

$\sum_{i=1}^{n} \alpha_i = 1 = -\dfrac{n \lambda}{2 S^2} \Rightarrow \lambda = -\dfrac{2 S^2}{n}$.  (5)

Thus, from equations (4) and (5), we have $\alpha_i = \dfrac{1}{n}$ for all $i = 1, 2, \ldots, n$, and $T = \dfrac{1}{n} \sum_{i=1}^{n} y_i = \bar{y}$.
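A quick numeric check of the conclusion: with $V(T) = S^2 \sum \alpha_i^2 - S^2/N$, the equal weights $\alpha_i = 1/n$ give the smallest variance among weight vectors summing to 1. The values of $N$, $n$ and $S^2$ below are arbitrary illustrative choices:

```python
N, n, S2 = 50, 5, 2.0   # arbitrary illustrative values

def var_T(alpha):
    """V(T) = S^2 * sum(alpha_i^2) - S^2 / N, valid when sum(alpha) = 1."""
    assert abs(sum(alpha) - 1) < 1e-9
    return S2 * sum(a * a for a in alpha) - S2 / N

equal = [1 / n] * n
trials = [[0.4, 0.3, 0.1, 0.1, 0.1],
          [0.5, 0.5, 0.0, 0.0, 0.0],
          [0.6, 0.2, 0.1, 0.05, 0.05]]
assert all(var_T(equal) <= var_T(a) for a in trials)

# the minimum agrees with V(ybar) = (1/n - 1/N) S^2 under srswor
assert abs(var_T(equal) - (1 / n - 1 / N) * S2) < 1e-12
```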

Simple random sampling applied to qualitative characteristics

Suppose a random sample of size $n$ is drawn from a population of size $N$, for which the proportion of individuals having a character $C$ (attribute) is $P$. Thus, in the population, $NP$ members possess the character $C$ and $NQ$ members possess the character not-$C$ (e.g. in sampling from a population of persons, we may have persons who are smokers and non-smokers, honest and dishonest, below the poverty line and above the poverty line, etc.). Let $a$ be the number of members in the sample having the character $C$; then the sample proportion is $p = \dfrac{a}{n}$.

To obtain the expectation and variance of the sample proportion, we first convert the attribute into a variable by the following procedure. We assign to the $i$-th member of the population the value $Y_i$, equal to 1 if this member possesses the character $C$ and equal to 0 otherwise. In this way, we get a variable $y$, for which

Population total $= \sum_{i=1}^{N} Y_i = NP = A$.

Population mean $= \dfrac{1}{N} \sum_{i=1}^{N} Y_i = \dfrac{NP}{N} = P$.

Population variance $= \dfrac{1}{N} \sum_{i=1}^{N} (Y_i - P)^2 = \dfrac{1}{N} \sum_{i=1}^{N} Y_i^2 - P^2 = \dfrac{NP}{N} - P^2 = PQ$.

Population mean square $= \dfrac{1}{N-1} \sum_{i=1}^{N} (Y_i - P)^2 = \dfrac{1}{N-1} \left( \sum_{i=1}^{N} Y_i^2 - NP^2 \right) = \dfrac{NP - NP^2}{N-1} = \dfrac{NPQ}{N-1}$.

Similarly, assign to the $i$-th member of the sample the value $y_i$, equal to 1 if this member possesses the character $C$ and 0 otherwise; then

Sample total $= \sum_{i=1}^{n} y_i = np = a$, and sample mean $= \dfrac{1}{n} \sum_{i=1}^{n} y_i = \dfrac{a}{n} = p$.

Sample mean square $= \dfrac{1}{n-1} \sum_{i=1}^{n} (y_i - p)^2 = \dfrac{1}{n-1} \left( \sum_{i=1}^{n} y_i^2 - np^2 \right) = \dfrac{1}{n-1} (np - np^2) = \dfrac{npq}{n-1}$.

Case I) Random sampling with replacement

On replacing $\bar{Y}$ by $P$, $Y$ by $NP$, $\bar{y}$ by $p = \dfrac{a}{n}$, $S^2$ by $\dfrac{NPQ}{N-1}$ and $\sigma^2$ by $PQ$ in the expressions obtained for the expectation and variance of the estimates of the population mean and population total, we find

i) $E(p) = E(\bar{y}) = \bar{Y} = P$. This shows that the sample proportion $p$ is an unbiased estimate of the population proportion $P$, and $V(p) = V(\bar{y}) = \dfrac{\sigma^2}{n} = \dfrac{PQ}{n}$.

ii) $E(\hat{A}) = E(Np) = N E(p) = NP = A$, which means that $Np = \hat{A}$ is an unbiased estimate of $NP = A$, and
$V(\hat{A}) = V(\hat{Y}) = N^2 V(\bar{y}) = \dfrac{N^2 \sigma^2}{n} = \dfrac{N^2 PQ}{n}$.

Theorem: $\hat{V}(p) = v(p) = \dfrac{pq}{n-1}$ is an unbiased estimate of $V(p) = \dfrac{PQ}{n}$.
Case II) Random sampling without replacement

Results are:
i) $E(p) = E(\bar{y}) = \bar{Y} = P$. This shows that the sample proportion $p$ is an unbiased estimate of the population proportion $P$, and
$V(p) = V(\bar{y}) = \dfrac{N-n}{nN} S^2 = \left( \dfrac{N-n}{nN} \right) \dfrac{NPQ}{N-1} = \left( \dfrac{N-n}{N-1} \right) \dfrac{PQ}{n}$.

ii) $E(\hat{A}) = E(Np) = N E(p) = NP = A$, which means that $Np$ is an unbiased estimate of $NP$, and
$V(\hat{A}) = V(\hat{Y}) = N^2 V(\bar{y}) = N^2 \left( \dfrac{N-n}{nN} \right) S^2 = N^2 \left( \dfrac{N-n}{N-1} \right) \dfrac{PQ}{n}$.

Theorem: $\hat{V}(p) = v(p) = \left( \dfrac{N-n}{n-1} \right) \dfrac{pq}{N}$ is an unbiased estimate of $V(p) = \left( \dfrac{N-n}{N-1} \right) \dfrac{PQ}{n}$.

Corollary: $\hat{V}(\hat{A}) = \hat{V}(Np) = N^2 \hat{V}(p) = N \left( \dfrac{N-n}{n-1} \right) pq$ is an unbiased estimate of $V(\hat{A}) = N^2 \left( \dfrac{N-n}{N-1} \right) \dfrac{PQ}{n}$.

Confidence interval (interval estimation)

After obtaining the estimate of an unknown parameter (which is rarely equal to the parameter), it becomes necessary to measure the reliability of the estimate and to construct confidence limits with a given degree of confidence. An estimate of a population parameter given by two numbers between which the parameter may be considered to lie is called an interval estimate, i.e. an interval estimate of a parameter $\theta$ is an interval of the form $L \le \theta \le U$, where $L$ and $U$ depend on the sampling distribution of $\hat{\theta}$.

For any specified probability $1 - \alpha$, $L$ and $U$ are chosen such that $\Pr(L \le \theta \le U) = 1 - \alpha$. An interval $L \le \theta \le U$, computed for a particular sample, is called a $(1-\alpha)100\%$ confidence interval; the quantity $1 - \alpha$ is called the confidence coefficient or the degree of confidence, and the end points $L$ and $U$ are called the lower and upper confidence limits. For instance, when $\alpha = 0.05$ the degree of confidence is 0.95 and we get a 95% confidence interval.

Limits in case of simple random sampling with replacement

1. Confidence limits for the population mean: It is usually assumed that the estimator $\bar{y}$ is normally distributed about the corresponding population value, i.e. $\bar{y} \sim N(\bar{Y}, \sigma^2/n)$. Since tables are available for the standard normal variable, we transform to standard normal as $Z = \dfrac{\bar{y} - \bar{Y}}{\sigma / \sqrt{n}} \sim N(0, 1)$.

By definition,
$\Pr(|Z| \le Z_{\alpha/2}) = 1 - \alpha$, or $\Pr(-Z_{\alpha/2} \le Z \le Z_{\alpha/2}) = 1 - \alpha$,
or $\Pr[\bar{y} - Z_{\alpha/2}\, SE(\bar{y}) \le \bar{Y} \le \bar{y} + Z_{\alpha/2}\, SE(\bar{y})] = 1 - \alpha$.

Thus, with probability $1 - \alpha$, the interval $\bar{y} \pm Z_{\alpha/2}\, \sigma/\sqrt{n}$ will include $\bar{Y}$.

2. Confidence limits for the population total: Along the same lines,
$\Pr[N\bar{y} - Z_{\alpha/2}\, SE(\hat{Y}) \le Y \le N\bar{y} + Z_{\alpha/2}\, SE(\hat{Y})] = 1 - \alpha$,
so with probability $1 - \alpha$ the interval $N\bar{y} \pm Z_{\alpha/2}\, N\sigma/\sqrt{n}$ will include $Y$.

Note: If the sample size is less than 30 and the population variance is unknown, Student's $t$ is used instead of the standard normal.

3. Confidence limits for the population proportion: As above,
$\Pr[p - Z_{\alpha/2}\, SE(p) \le P \le p + Z_{\alpha/2}\, SE(p)] = 1 - \alpha$,
so with probability $1 - \alpha$ the interval $p \pm Z_{\alpha/2} \sqrt{PQ/n}$ will include $P$.
Limits in case of simple random sampling without replacement

1. Confidence limits for the population mean: Here also the estimate based on the sample is taken to be normally distributed, i.e. $\bar{y} \sim N(\bar{Y}, (1-f) S^2/n)$, so that $Z = \dfrac{\bar{y} - \bar{Y}}{S \sqrt{(1-f)/n}} \sim N(0, 1)$. By definition,
$\Pr[\bar{y} - Z_{\alpha/2}\, SE(\bar{y}) \le \bar{Y} \le \bar{y} + Z_{\alpha/2}\, SE(\bar{y})] = 1 - \alpha$,
so with probability $1 - \alpha$ the interval $\bar{y} \pm Z_{\alpha/2}\, S\sqrt{(1-f)/n}$ will include $\bar{Y}$.

2. Confidence limits for the population total: As in srswr,
$\Pr[N\bar{y} - Z_{\alpha/2}\, SE(\hat{Y}) \le Y \le N\bar{y} + Z_{\alpha/2}\, SE(\hat{Y})] = 1 - \alpha$, so with probability $1 - \alpha$ the interval $N\bar{y} \pm Z_{\alpha/2}\, N S \sqrt{(1-f)/n}$ will include $Y$.

Note: If the sample size is less than 30 and the population variance is unknown, Student's $t$ is used instead of the standard normal.

3. Confidence limits for the population proportion: As in srswr,
$\Pr[p - Z_{\alpha/2}\, SE(p) \le P \le p + Z_{\alpha/2}\, SE(p)] = 1 - \alpha$, so with probability $1 - \alpha$ the interval $p \pm Z_{\alpha/2} \sqrt{\left( \dfrac{N-n}{N-1} \right) \dfrac{PQ}{n}}$ will include $P$.

Estimation of sample size

In planning a sample survey for estimating the population parameters, a preliminary question is how to determine the size of the sample to be drawn. It can be done in the following ways:

a) Specify the precision in terms of the margin of error: The margin of error which is permissible in the estimate is known as the permissible error. It is taken as the maximum difference between the estimate and the parametric value that can be tolerated. Suppose an error $d$ on either side of the parameter value $\bar{Y}$ can be tolerated in the estimate $\bar{y}$ based on the sample values. Thus the permissible error in the estimate $\bar{y}$ is specified by
$\bar{y} = \bar{Y} \pm d$, or $|\bar{y} - \bar{Y}| = d$.

Since $|\bar{y} - \bar{Y}|$ differs from sample to sample, this margin of error is specified in the form of a probability statement:
$\Pr[\,|\bar{y} - \bar{Y}| \ge d\,] = \alpha$, or $\Pr[\,|\bar{y} - \bar{Y}| \le d\,] = 1 - \alpha$,  (1)
where $\alpha$ is small and is the risk we are willing to bear if the actual difference is greater than $d$. This $\alpha$ is called the level of significance and $1 - \alpha$ is called the level of confidence or confidence coefficient.

As the population is normally distributed, the sample mean will also follow the normal distribution, i.e. $\bar{y} \sim N[\bar{Y}, V(\bar{y})]$, so $Z = \dfrac{\bar{y} - \bar{Y}}{\sqrt{V(\bar{y})}} \sim N(0, 1)$.

For the given value of $\alpha$ we can find a value $Z_{\alpha/2}$ of the standard normal variate from the standard normal table through the equation
$\Pr\left[ \dfrac{|\bar{y} - \bar{Y}|}{\sqrt{V(\bar{y})}} \ge Z_{\alpha/2} \right] = \alpha$, or $\Pr[\,|\bar{y} - \bar{Y}| \ge Z_{\alpha/2} \sqrt{V(\bar{y})}\,] = \alpha$.  (2)

Comparing equations (1) and (2), we get
$d = Z_{\alpha/2} \sqrt{V(\bar{y})}$, so that $d^2 = Z_{\alpha/2}^2 V(\bar{y}) = Z_{\alpha/2}^2 \left( \dfrac{1}{n} - \dfrac{1}{N} \right) S^2$

$\Rightarrow 1 = \dfrac{Z_{\alpha/2}^2 S^2}{d^2} \left( \dfrac{1}{n} - \dfrac{1}{N} \right) = n_0 \left( \dfrac{1}{n} - \dfrac{1}{N} \right)$, where $n_0 = \dfrac{Z_{\alpha/2}^2 S^2}{d^2}$,  (3)

or $1 = \dfrac{n_0}{n} - \dfrac{n_0}{N} \Rightarrow \dfrac{n_0}{n} = 1 + \dfrac{n_0}{N}$, or $n = \dfrac{n_0}{1 + n_0/N}$.  (4)

If $N$ is sufficiently large, then $n \cong n_0$, and for unknown $S^2$ some rough estimate of $S^2$ can be used in relations (3) and (4).
b) Specify the precision in terms of $V(\bar{y})$, i.e. we have to find the sample size $n$ such that $V(\bar{y}) = V$ (given). As in the case of the margin of error,
$d = Z_{\alpha/2} \sqrt{V(\bar{y})} \Rightarrow V(\bar{y}) = \dfrac{d^2}{Z_{\alpha/2}^2}$, and $n_0 = \dfrac{Z_{\alpha/2}^2 S^2}{d^2} = \dfrac{S^2}{V(\bar{y})}$.

Therefore $n_0 = S^2 / V$, and hence $n$ can be obtained from relation (4).



c) Specify the precision in terms of the coefficient of variation of $\bar{y}$:

Let $CV(\bar{y}) = e = \dfrac{\sqrt{V(\bar{y})}}{\bar{Y}} \Rightarrow \dfrac{V(\bar{y})}{\bar{Y}^2} = e^2$, or $V(\bar{y}) = e^2 \bar{Y}^2$.  (5)

Substituting equation (5) in relation (3), we get $n_0 = \dfrac{S^2}{e^2 \bar{Y}^2}$, and hence $n$ from (4).
Remarks
i) To get $n$ such that the margin of error in the estimate $\hat{Y} = N\bar{y}$ of the population total $Y$ is $d'$:
$|\hat{Y} - Y| = d'$, or $|N\bar{y} - N\bar{Y}| = d'$, or $N|d| = d'$, or $N^2 d^2 = d'^2$, or $d^2 = \dfrac{d'^2}{N^2}$.

Therefore,
$n_0 = \left( \dfrac{N Z_{\alpha/2} S}{d'} \right)^2$, and $n$ can be obtained from relation (4).

ii) To find $n$ for $\hat{Y} = N\bar{y}$ with precision specified as $V(\hat{Y}) = N^2 V(\bar{y}) = V'$:
$V(\bar{y}) = \dfrac{V'}{N^2}$, and $n_0 = \dfrac{N^2 S^2}{V'}$; then $n$ from (4).

Example: For a population of size $N = 430$ we know roughly that $\bar{Y} = 19$ and $S^2 = 85.6$. With srs, what should be the size of the sample to estimate $\bar{Y}$ with a margin of error of 10% of $\bar{Y}$, apart from a chance of 1 in 20?

Solution: The margin of error in the estimate $\bar{y}$ of $\bar{Y}$ is given, i.e.
$\bar{y} = \bar{Y} \pm 10\%$ of $\bar{Y}$, or $|\bar{y} - \bar{Y}| = 10\%$ of $\bar{Y} = \dfrac{19}{10} = 1.9$, so that

$\Pr[\,|\bar{y} - \bar{Y}| \ge 1.9\,] = \dfrac{1}{20} = 0.05$, and $n_0 = \dfrac{Z_{\alpha/2}^2 S^2}{d^2} = \dfrac{(1.96)^2 \times 85.6}{(1.9)^2} = 91.0917$.

Therefore,
$n = \dfrac{n_0}{1 + n_0/N} = 75.168 \cong 75$.
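The arithmetic of this example is easily reproduced:

```python
N, Ybar, S2 = 430, 19, 85.6
z = 1.96                      # alpha = 0.05, a "1 in 20" chance
d = 0.10 * Ybar               # margin of error: 10% of Ybar = 1.9

n0 = z ** 2 * S2 / d ** 2     # first approximation, relation (3)
n = n0 / (1 + n0 / N)         # fpc-corrected size, relation (4)
# n0 ~ 91.09, n ~ 75.17 -> take n = 75
```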
Example: Consider a population of 676 petition sheets. How large must the sample be if the total number of signatures is to be estimated with a margin of error of 1000, apart from a 1 in 20 chance? Assume the population mean square to be 229.

Solution: Let $Y$ be the number of signatures on all the sheets, and let $\hat{Y}$ be the estimate of $Y$. The margin of error is specified in the estimate $\hat{Y}$ of $Y$ as
$|\hat{Y} - Y| = 1000$, so that $\Pr[\,|\hat{Y} - Y| \ge 1000\,] = \dfrac{1}{20} = 0.05$.

We know that
$n = \dfrac{n_0}{1 + n_0/N}$, where $n_0 = \left( \dfrac{N Z_{\alpha/2}}{d'} \right)^2 S^2 = \left( \dfrac{676 \times 1.96}{1000} \right)^2 \times 229 = 402.014$,

and hence
$n = 252.09 \cong 252$.
Estimation of sample size for proportions

a) When precision is specified in terms of the margin of error: Suppose the size of the population is $N$ and the population proportion is $P$. Let a srs of size $n$ be taken, let $p$ be the corresponding sample proportion, and let $d$ be the margin of error in the estimate $p$ of $P$. The margin of error can be specified in the form of a probability statement as
$\Pr[\,|p - P| \ge d\,] = \alpha$, or $\Pr[\,|p - P| \le d\,] = 1 - \alpha$.  (1)

As the population is normally distributed, $p \sim N[P, V(p)]$, so $Z = \dfrac{p - P}{\sqrt{V(p)}} \sim N(0, 1)$. For the given value of $\alpha$ we can find a value $Z_{\alpha/2}$ of the standard normal variate from the standard normal table through the relation
$\Pr\left[ \dfrac{|p - P|}{\sqrt{V(p)}} \ge Z_{\alpha/2} \right] = \alpha$, or $\Pr[\,|p - P| \ge Z_{\alpha/2} \sqrt{V(p)}\,] = \alpha$.  (2)

Comparing equations (1) and (2), the relation which gives the value of $n$ with the required precision of the estimate $p$ of $P$ is
$d = Z_{\alpha/2} \sqrt{V(p)}$, or $d^2 = Z_{\alpha/2}^2 V(p) = Z_{\alpha/2}^2 \left( \dfrac{N-n}{N-1} \right) \dfrac{PQ}{n}$, as sampling is srswor.

$\Rightarrow 1 = \dfrac{Z_{\alpha/2}^2 PQ}{d^2} \left[ \dfrac{N-n}{n(N-1)} \right] = n_0 \dfrac{N-n}{n(N-1)}$, where $n_0 = \dfrac{Z_{\alpha/2}^2 PQ}{d^2} = \dfrac{PQ}{V(p)}$,  (3)

or $\dfrac{N-1}{n_0} = \dfrac{N-n}{n} = \dfrac{N}{n} - 1 \Rightarrow \dfrac{N}{n} = 1 + \dfrac{N-1}{n_0}$

or $n = \dfrac{N}{1 + \dfrac{N-1}{n_0}} = \dfrac{n_0 N}{n_0 + N - 1} = \dfrac{n_0}{1 + \dfrac{n_0 - 1}{N}}$.  (4)

If $N$ is sufficiently large, then $n \cong n_0$.

b) If the precision is specified in terms of $V(p)$, i.e. $V(p) = V$ (given): substituting $V(p) = V$ in relation (3), we get $n_0 = \dfrac{PQ}{V}$, and hence $n$ can be obtained from relation (4).

c) When precision is given in terms of the coefficient of variation of $p$:

Let $CV(p) = e = \dfrac{\sqrt{V(p)}}{P} \Rightarrow \dfrac{V(p)}{P^2} = e^2$, or $V(p) = e^2 P^2$.  (5)

Substituting equation (5) in relation (3), we get
$n_0 = \dfrac{PQ}{e^2 P^2} = \dfrac{Q}{e^2 P} = \dfrac{1}{e^2} \left( \dfrac{1}{P} - 1 \right)$, and hence $n$ is given by relation (4).
Remarks
i) To get $n$ if the margin of error in the estimate $\hat{A} = Np$ of the population total $A = NP$ is $d'$:
$|\hat{A} - A| = d'$, or $|Np - NP| = d'$, or $N|d| = d'$, or $N^2 d^2 = d'^2$, or $d^2 = \dfrac{d'^2}{N^2}$.

Thus,
$n_0 = \dfrac{N^2 Z_{\alpha/2}^2 PQ}{d'^2}$, and $n$ can be obtained from relation (4).

ii) To find $n$ for $\hat{A} = Np$ with precision specified as $V(\hat{A}) = N^2 V(p) = V'$, so that $V(p) = \dfrac{V'}{N^2}$: substituting this value in equation (3), we get $n_0 = \dfrac{N^2 PQ}{V'}$, and $n$ is given by relation (4).
Example: In a population of 4000 people who were called for casting their votes, 50% returned to the polls. Estimate the sample size needed to estimate this proportion so that the margin of error is 5%, with a 95% confidence coefficient.

Solution: The margin of error in the estimate $p$ of $P$ is given by
$|p - P| = 0.05$, so $\Pr[\,|p - P| \ge 0.05\,] = 0.05$.

We know that
$n_0 = \dfrac{Z_{\alpha/2}^2 PQ}{d^2} = \dfrac{(1.96)^2 \times 0.5 \times 0.5}{0.0025} = 384.16$

and hence
$n = \dfrac{n_0}{1 + (n_0/N)} = 350.498 \cong 351$.
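The computation, rounding up to be conservative:

```python
import math

N, P = 4000, 0.5
Q = 1 - P
z, d = 1.96, 0.05             # 95% confidence, 5% margin of error

n0 = z ** 2 * P * Q / d ** 2  # = 384.16
n = n0 / (1 + n0 / N)         # ~ 350.5 -> take n = 351
n_final = math.ceil(n)
```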
Exercise: In a study of the possible use of sampling to cut down the work in taking inventory in a stock room, a count is made of the value of the articles on each of 36 shelves in the room. The values to the nearest dollar are as follows:
29, 38, 42, 44, 45, 47, 51, 53, 53, 54, 56, 56, 56, 58, 58, 59, 60, 60, 60, 60, 61, 61, 61, 62, 64, 65, 65, 67, 67, 68, 69, 71, 74, 77, 82, 85.
The estimate of the total value made from a sample is to be correct within \$200, apart from a 1 in 20 chance. An advisor suggests that a simple random sample of 12 shelves will meet the requirements. Do you agree? $\sum Y_i = 2138$, and $\sum Y_i^2 = 131\,682$.

Solution: It is given that $\sum_i Y_i = 2138$, $\sum_i Y_i^2 = 131\,682$, and $N = 36$; then

$S^2 = \dfrac{1}{N-1} \left( \sum_i Y_i^2 - N \bar{Y}^2 \right) = \dfrac{1}{36-1} \left[ 131\,682 - 36 \left( \dfrac{2138}{36} \right)^2 \right] = 134.5$

and
$|\hat{Y} - Y| = 200$, so $\Pr[\,|\hat{Y} - Y| \ge 200\,] = \dfrac{1}{20} = 0.05$.

We know that
$n = \dfrac{n_0}{1 + n_0/N}$, where $n_0 = \left( \dfrac{N Z_{\alpha/2}}{d'} \right)^2 S^2 = \left( \dfrac{36 \times 1.96}{200} \right)^2 \times 134.5 = 16.74$,

and therefore
$n = 11.43 \cong 12$.

Hence a simple random sample of 12 shelves meets the requirement, and we agree with the advisor.
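The same computation in a few lines:

```python
sum_y, sum_y2, N = 2138, 131682, 36
Ybar = sum_y / N
S2 = (sum_y2 - N * Ybar ** 2) / (N - 1)   # ~ 134.5

z, d_total = 1.96, 200                     # 1 in 20 chance, $200 on the total
n0 = (N * z / d_total) ** 2 * S2           # ~ 16.74
n = n0 / (1 + n0 / N)                      # ~ 11.43 -> 12 shelves are enough
```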
Exercise: With certain populations, it is known that the observations $Y_i$ are all zero on a portion $QN$ of the $N$ units $(0 < Q < 1)$. Sometimes, with a varying expenditure of effort, these units can be found and listed, so that they need not be sampled. If $\sigma^2$ is the variance of $Y_i$ in the original population and $\sigma_0^2$ is the variance when all zeros are excluded, show that
$\sigma_0^2 = \dfrac{\sigma^2}{P} - \dfrac{Q}{P^2} \bar{Y}^2$, where $P = 1 - Q$ and $\bar{Y}$ is the mean value of $Y_i$ for the whole population.
Solution: Given $Y_1, Y_2, \ldots, Y_{NP}, Y_{NP+1}, \ldots, Y_N$ (the first $NP$ units nonzero, and the remaining $NQ$ units all zero). Thus $\bar{Y} = \dfrac{1}{N} \sum_{i=1}^{N} Y_i$ is the population mean, $\bar{Y}_{NP} = \dfrac{1}{NP} \sum_{i=1}^{NP} Y_i$, and $\bar{Y}_{NQ} = \dfrac{1}{NQ} \sum_{i=1}^{NQ} Y_i = 0$. Also $\sum_{i=1}^{N} Y_i = \sum_{i=1}^{NP} Y_i$ and $\sum_{i=1}^{N} Y_i^2 = \sum_{i=1}^{NP} Y_i^2$, so that $N\bar{Y} = NP\, \bar{Y}_{NP}$, or $\bar{Y}_{NP} = \dfrac{\bar{Y}}{P}$. By definition,

$\sigma^2 = \dfrac{1}{N} \sum_{i=1}^{N} (Y_i - \bar{Y})^2 = \dfrac{1}{N} \sum_{i=1}^{N} Y_i^2 - \bar{Y}^2$, or $N\sigma^2 = \sum_{i=1}^{N} Y_i^2 - N\bar{Y}^2$.

Similarly, $NP\sigma_0^2 = \sum_{i=1}^{NP} Y_i^2 - NP\, \bar{Y}_{NP}^2$.

Thus,
$N(\sigma^2 - P\sigma_0^2) = NP\, \bar{Y}_{NP}^2 - N\bar{Y}^2 = NP \dfrac{\bar{Y}^2}{P^2} - N\bar{Y}^2 = N \left( \dfrac{1}{P} - 1 \right) \bar{Y}^2 = N \left( \dfrac{Q}{P} \right) \bar{Y}^2$.

Therefore,
$P\sigma_0^2 = \sigma^2 - \dfrac{Q}{P} \bar{Y}^2$, or $\sigma_0^2 = \dfrac{\sigma^2}{P} - \dfrac{Q}{P^2} \bar{Y}^2$.

Exercise: From a random sample of $n$ units, a random sub-sample of $n_1$ units is drawn without replacement and added to the original sample. Show that the mean based on the $(n + n_1)$ units is an unbiased estimator of the population mean, and that the ratio of its variance to that of the mean of the original $n$ units is approximately
$\dfrac{1 + 3 n_1/n}{(1 + n_1/n)^2}$, assuming that the population size is large.
Solution: Let the sample means based on $n$, $n_1$, and $n + n_1$ elements be denoted by $\bar{y}_n$, $\bar{y}_{n_1}$, and $\bar{y}_{n+n_1}$ respectively, defined as $\bar{y}_n = \dfrac{1}{n} \sum_{i=1}^{n} y_i$, $\bar{y}_{n_1} = \dfrac{1}{n_1} \sum_{i=1}^{n_1} y_i$, and $\bar{y}_{n+n_1} = \dfrac{n \bar{y}_n + n_1 \bar{y}_{n_1}}{n + n_1}$. We have to show $E(\bar{y}_{n+n_1}) = \bar{Y}$; in this case the expectation is taken in two stages:
i) with the first sample fixed,
ii) over all first samples.

$E(\bar{y}_{n+n_1}) = \dfrac{1}{n+n_1} E(n \bar{y}_n + n_1 \bar{y}_{n_1}) = \dfrac{1}{n+n_1} E[n \bar{y}_n + n_1 E(\bar{y}_{n_1} \mid n)]$

$= \dfrac{1}{n+n_1} E(n \bar{y}_n + n_1 \bar{y}_n)$, since the $n_1$ units are a sub-sample of the sample of size $n$,

$= \dfrac{1}{n+n_1} (n \bar{Y} + n_1 \bar{Y}) = \bar{Y}$.

To obtain the variance,

$V(\bar{y}_{n+n_1}) = E(\bar{y}_{n+n_1} - \bar{Y})^2 = E\left( \dfrac{n \bar{y}_n + n_1 \bar{y}_{n_1}}{n+n_1} - \bar{Y} \right)^2$

$= \dfrac{1}{(n+n_1)^2} E[n \bar{y}_n + n_1 \bar{y}_{n_1} - (n+n_1) \bar{Y}]^2$

$= \dfrac{1}{(n+n_1)^2} E[n(\bar{y}_n - \bar{Y}) + n_1(\bar{y}_{n_1} - \bar{y}_n) + n_1(\bar{y}_n - \bar{Y})]^2$

$= \dfrac{1}{(n+n_1)^2} E[(n+n_1)(\bar{y}_n - \bar{Y}) + n_1(\bar{y}_{n_1} - \bar{y}_n)]^2$

$= \dfrac{1}{(n+n_1)^2} [(n+n_1)^2 E(\bar{y}_n - \bar{Y})^2 + n_1^2 E(\bar{y}_{n_1} - \bar{y}_n)^2]$,

since the cross term vanishes: given the first sample, $E(\bar{y}_{n_1} - \bar{y}_n \mid n) = 0$.

$= \dfrac{1}{(n+n_1)^2} [(n+n_1)^2 V(\bar{y}_n) + n_1^2 E\{E[(\bar{y}_{n_1} - \bar{y}_n)^2 \mid n]\}]$

$= \dfrac{1}{(n+n_1)^2} \left[ (n+n_1)^2 V(\bar{y}_n) + n_1^2 E\left\{ \left( \dfrac{1}{n_1} - \dfrac{1}{n} \right) s_n^2 \right\} \right]$

$= \dfrac{1}{(n+n_1)^2} \left[ (n+n_1)^2 V(\bar{y}_n) + n_1^2 \left( \dfrac{n - n_1}{n_1 n} \right) S^2 \right] = V(\bar{y}_n) + \dfrac{n_1 (n - n_1)}{n (n+n_1)^2} S^2$.

Therefore,
$\dfrac{V(\bar{y}_{n+n_1})}{V(\bar{y}_n)} = 1 + \dfrac{n_1 (n - n_1)}{n (n+n_1)^2} \dfrac{S^2}{V(\bar{y}_n)} \cong 1 + \dfrac{n_1 (n - n_1)}{n (n+n_1)^2} \dfrac{S^2}{S^2/n}$

$= \dfrac{(n+n_1)^2 + n_1 (n - n_1)}{(n+n_1)^2} = \dfrac{n^2 + n_1^2 + 2 n_1 n + n_1 n - n_1^2}{(n+n_1)^2} = \dfrac{n^2 + 3 n_1 n}{(n+n_1)^2} = \dfrac{1 + 3 n_1/n}{(1 + n_1/n)^2}$.
Exercise: A simple random sample of size $n = n_1 + n_2$ with mean $\bar{y}$ is drawn from a finite population, and a simple random sub-sample of size $n_1$ is drawn from it with mean $\bar{y}_1$. Show that
i) $V(\bar{y}_1 - \bar{y}_2) = S^2 [(1/n_1) + (1/n_2)]$, where $\bar{y}_2$ is the mean of the remaining $n_2$ units in the sample;
ii) $V(\bar{y}_1 - \bar{y}) = S^2 [(1/n_1) - (1/n)]$;
iii) $\mathrm{Cov}(\bar{y}, \bar{y}_1 - \bar{y}) = 0$.
Repeated sampling implies repetition of the drawing of both the sample and the sub-sample.

Solution:
i) In repeated sampling the given procedure is equivalent to drawing the sub-samples of sizes $n_1$ and $n_2$ independently; thus
$V(\bar{y}_1 - \bar{y}_2) = V(\bar{y}_1) + V(\bar{y}_2)$, since $\mathrm{Cov}(\bar{y}_1, \bar{y}_2) = 0$,
$= S^2 [(1/n_1) + (1/n_2)]$, ignoring the fpc.

ii) $\bar{y} = \dfrac{n_1 \bar{y}_1 + n_2 \bar{y}_2}{n_1 + n_2} \Rightarrow \bar{y}_1 - \bar{y} = \bar{y}_1 - \dfrac{n_1 \bar{y}_1 + n_2 \bar{y}_2}{n_1 + n_2} = \dfrac{n_1 \bar{y}_1 + n_2 \bar{y}_1 - n_1 \bar{y}_1 - n_2 \bar{y}_2}{n_1 + n_2} = \dfrac{n_2 (\bar{y}_1 - \bar{y}_2)}{n}$.

Therefore,
$V(\bar{y}_1 - \bar{y}) = V\left( \dfrac{n_2 (\bar{y}_1 - \bar{y}_2)}{n} \right) = \dfrac{n_2^2}{n^2} V(\bar{y}_1 - \bar{y}_2) = \dfrac{n_2^2}{n^2} \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right) S^2 = \dfrac{n_2^2}{n^2} \left( \dfrac{n_1 + n_2}{n_1 n_2} \right) S^2 = \dfrac{n_2}{n_1 n} S^2 = \dfrac{n - n_1}{n_1 n} S^2 = \left( \dfrac{1}{n_1} - \dfrac{1}{n} \right) S^2$.

iii) $\mathrm{Cov}(\bar{y}, \bar{y}_1 - \bar{y}) = E[\bar{y}(\bar{y}_1 - \bar{y})] - E(\bar{y}) E(\bar{y}_1 - \bar{y})$
$= E(\bar{y}\bar{y}_1 - \bar{y}^2) - \bar{Y} \times 0 = E(\bar{y}\bar{y}_1) - E(\bar{y}^2)$.  (1)

Consider
$E(\bar{y}\bar{y}_1) = E\left( \dfrac{n_1 \bar{y}_1 + n_2 \bar{y}_2}{n} \bar{y}_1 \right) = E\left( \dfrac{n_1}{n} \bar{y}_1^2 + \dfrac{n_2}{n} \bar{y}_1 \bar{y}_2 \right)$

$= \dfrac{n_1}{n} E(\bar{y}_1^2) + \dfrac{n_2}{n} E(\bar{y}_1) E(\bar{y}_2)$

$= \dfrac{n_1}{n} [V(\bar{y}_1) + \bar{Y}^2] + \dfrac{n_2}{n} \bar{Y}^2 = \dfrac{n_1}{n} \left[ \dfrac{S^2}{n_1} + \bar{Y}^2 \right] + \dfrac{n_2}{n} \bar{Y}^2$

$= \dfrac{S^2}{n} + \dfrac{n_1}{n} \bar{Y}^2 + \dfrac{n_2}{n} \bar{Y}^2 = \dfrac{S^2}{n} + \bar{Y}^2$.  (2)

Now
$V(\bar{y}) = E(\bar{y}^2) - \bar{Y}^2$, or $E(\bar{y}^2) = V(\bar{y}) + \bar{Y}^2 = \dfrac{S^2}{n} + \bar{Y}^2$.  (3)

In view of equations (1), (2), and (3), we get
$\mathrm{Cov}(\bar{y}, \bar{y}_1 - \bar{y}) = \left( \dfrac{S^2}{n} + \bar{Y}^2 \right) - \left( \dfrac{S^2}{n} + \bar{Y}^2 \right) = 0$.
   
Exercise: A population has three units $U_1$, $U_2$ and $U_3$ with variates $Y_1$, $Y_2$ and $Y_3$ respectively. It is required to estimate the population total $Y$ by selecting a sample of two units. Let the sampling and estimation procedures be as follows:

Sample $(s)$     $P(s)$    Estimator $t$     Estimator $t'$
$(U_1, U_2)$     $1/2$     $Y_1 + 2Y_2$      $Y_1 + 2Y_2 + Y_1^2$
$(U_1, U_3)$     $1/2$     $Y_1 + 2Y_3$      $Y_1 + 2Y_3 - Y_1^2$

Prove that both $t$ and $t'$ are unbiased for $Y$ and find their variances. Comment on the estimators.
Solution: By definition,
$E(t) = \sum_i t_i\, p(t_i) = \dfrac{1}{2} (Y_1 + 2Y_2 + Y_1 + 2Y_3) = Y$.
This shows that the estimator $t$ is unbiased for $Y$.

$E(t^2) = \dfrac{1}{2} [(Y_1 + 2Y_2)^2 + (Y_1 + 2Y_3)^2] = \dfrac{1}{2} (Y_1^2 + 4Y_2^2 + 4Y_1 Y_2 + Y_1^2 + 4Y_3^2 + 4Y_1 Y_3)$

$= Y_1^2 + 2Y_2^2 + 2Y_3^2 + 2Y_1 Y_2 + 2Y_1 Y_3$.

Therefore,

$V(t) = E(t^2) - [E(t)]^2 = Y_1^2 + 2Y_2^2 + 2Y_3^2 + 2Y_1 Y_2 + 2Y_1 Y_3 - (Y_1 + Y_2 + Y_3)^2$

$= Y_2^2 + Y_3^2 - 2Y_2 Y_3 = (Y_2 - Y_3)^2$.

Similarly,
$E(t') = \sum_i t_i'\, p(t_i') = \dfrac{1}{2} (Y_1 + 2Y_2 + Y_1^2 + Y_1 + 2Y_3 - Y_1^2) = Y$; hence $t'$ is unbiased for $Y$.

$E(t'^2) = \dfrac{1}{2} [(Y_1 + 2Y_2 + Y_1^2)^2 + (Y_1 + 2Y_3 - Y_1^2)^2]$

$= \dfrac{1}{2} (Y_1^4 + 2Y_1^3 + Y_1^2 + 4Y_1^2 Y_2 + 4Y_1 Y_2 + 4Y_2^2 + Y_1^4 - 2Y_1^3 + Y_1^2 - 4Y_1^2 Y_3 + 4Y_1 Y_3 + 4Y_3^2)$

$= Y_1^4 + Y_1^2 + 2Y_1^2 Y_2 + 2Y_1 Y_2 + 2Y_2^2 - 2Y_1^2 Y_3 + 2Y_1 Y_3 + 2Y_3^2$.

Therefore,

$V(t') = E(t'^2) - [E(t')]^2 = Y_1^4 + Y_1^2 + 2Y_1^2 Y_2 + 2Y_1 Y_2 + 2Y_2^2 - 2Y_1^2 Y_3 + 2Y_1 Y_3 + 2Y_3^2 - (Y_1 + Y_2 + Y_3)^2$

$= (Y_2 - Y_3)^2 + Y_1^2 (Y_1^2 + 2Y_2 - 2Y_3)$

$= V(t) + Y_1^2 (Y_1^2 + 2Y_2 - 2Y_3)$.

We conclude that both the linear estimator $t$ and the quadratic estimator $t'$ are unbiased; which estimator has the minimum variance depends on the variate values.
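Both results are easy to confirm numerically for any choice of variate values; the values below are hypothetical:

```python
# hypothetical variate values for the three units
Y1, Y2, Y3 = 2.0, 5.0, 3.0
Ytot = Y1 + Y2 + Y3

t_vals = [Y1 + 2 * Y2, Y1 + 2 * Y3]                       # estimator t
tp_vals = [Y1 + 2 * Y2 + Y1 ** 2, Y1 + 2 * Y3 - Y1 ** 2]  # estimator t'
probs = [0.5, 0.5]

E_t = sum(v * q for v, q in zip(t_vals, probs))
E_tp = sum(v * q for v, q in zip(tp_vals, probs))
V_t = sum(q * (v - E_t) ** 2 for v, q in zip(t_vals, probs))
V_tp = sum(q * (v - E_tp) ** 2 for v, q in zip(tp_vals, probs))

assert E_t == Ytot and E_tp == Ytot                   # both unbiased
assert V_t == (Y2 - Y3) ** 2                          # V(t) = (Y2 - Y3)^2
assert V_tp == V_t + Y1 ** 2 * (Y1 ** 2 + 2 * Y2 - 2 * Y3)
```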
