Simple Random Sampling
A procedure for selecting a sample of size n out of a finite population of size N in which
each of the possible distinct samples has an equal chance of being selected is called random
sampling or simple random sampling.
We may have two distinct types of simple random sampling as follows:
i) Simple random sampling with replacement (srswr).
ii) Simple random sampling without replacement (srswor).
Notation:

$\bar{Y} = \dfrac{1}{N}\sum_{i=1}^{N} Y_i$, the population mean.

$\bar{y} = \dfrac{1}{n}\sum_{i=1}^{n} y_i$, the sample mean.

$\sigma^2 = \dfrac{1}{N}\sum_{i=1}^{N}(Y_i - \bar{Y})^2 = \dfrac{1}{N}\sum_{i=1}^{N} Y_i^2 - \bar{Y}^2$, the population variance.

$S^2 = \dfrac{1}{N-1}\sum_{i=1}^{N}(Y_i - \bar{Y})^2 = \dfrac{1}{N-1}\left[\sum_{i=1}^{N} Y_i^2 - N\bar{Y}^2\right]$, the population mean square.

$s^2 = \dfrac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2 = \dfrac{1}{n-1}\left[\sum_{i=1}^{n} y_i^2 - n\bar{y}^2\right]$, the sample mean square.
Theorem: In srswr, the sample mean $\bar{y}$ is an unbiased estimate of the population mean $\bar{Y}$, i.e. $E(\bar{y}) = \bar{Y}$, and its variance is $V(\bar{y}) = \dfrac{\sigma^2}{n} = \dfrac{N-1}{nN} S^2$.
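As a quick numerical check of this theorem (an addition to the notes, not part of the original text), the Python sketch below enumerates all $N^n$ ordered srswr samples of size $n = 2$ from a small illustrative population (the five values 8, 6.5, 7.5, 7, 6 that appear in the srswor example further on) and verifies $E(\bar{y}) = \bar{Y}$, $V(\bar{y}) = \sigma^2/n = (N-1)S^2/(nN)$, and $E(s^2) = \sigma^2$.

    # Check of the srswr theorem by complete enumeration (illustrative population).
    from itertools import product
    from statistics import mean, pvariance, variance

    Y = [8, 6.5, 7.5, 7, 6]        # population values (illustrative)
    N, n = len(Y), 2

    Ybar = mean(Y)                 # population mean, 7.0
    sigma2 = pvariance(Y)          # population variance (divisor N), 0.5
    S2 = variance(Y)               # population mean square (divisor N-1), 0.625

    samples = list(product(Y, repeat=n))       # all N^n ordered srswr samples
    ybars = [mean(s) for s in samples]

    E_ybar = mean(ybars)                              # equals Ybar
    V_ybar = mean((yb - Ybar) ** 2 for yb in ybars)   # equals sigma2/n
    E_s2 = mean(variance(s) for s in samples)         # equals sigma2

    print(E_ybar, Ybar)                                  # 7.0  7.0
    print(V_ybar, sigma2 / n, (N - 1) * S2 / (n * N))    # 0.25 0.25 0.25
    print(E_s2, sigma2)                                  # 0.5  0.5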
iii) $V(\bar{y}) = \dfrac{(N-1)S^2}{nN} = \dfrac{\sigma^2}{n}$, and

iv) $E(s^2) = \dfrac{N-1}{N} S^2 = \sigma^2$.
Solution:
a) We know that
$\bar{Y} = \dfrac{1}{N}\sum_{i=1}^{N} Y_i = 6.6$, $\quad \sigma^2 = \dfrac{1}{N}\sum_{i=1}^{N} Y_i^2 - \bar{Y}^2 = 8.24$, $\quad$ and $\quad S^2 = \dfrac{1}{N-1}\left[\sum_{i=1}^{N} Y_i^2 - N\bar{Y}^2\right] = 10.3$.
i) $E(\bar{y}) = \dfrac{1}{n'}\sum_{i=1}^{n'} \bar{y}_i = \dfrac{1}{25}\times 165 = 6.6 = \bar{Y}$, where $n'$ is the number of possible samples.

ii) $E(N\bar{y}) = \dfrac{1}{n'}\sum_{i=1}^{n'} N\bar{y}_i = 33$, or $E(N\bar{y}) = N\,E(\bar{y}) = 33$.

iii) $V(\bar{y}) = \dfrac{1}{n'}\sum_{i=1}^{n'} \bar{y}_i^2 - \bar{Y}^2 = 4.12$.

Now,

$\dfrac{(N-1)S^2}{nN} = 4.12$ and $\dfrac{\sigma^2}{n} = 4.12$; therefore,

$V(\bar{y}) = \dfrac{(N-1)S^2}{nN} = \dfrac{\sigma^2}{n} = 4.12$.

iv) $E(s^2) = \dfrac{1}{n'}\sum_{i=1}^{n'} s_i^2 = \dfrac{1}{25}\times 206 = 8.24 \quad$ (1a)

and $\dfrac{(N-1)S^2}{N} = 8.24 \quad$ (2a)

In view of equations (1a) and (2a), we get

$E(s^2) = \dfrac{(N-1)S^2}{N} = \sigma^2 = 8.24$.
iii) $V(\bar{y}) = \dfrac{(N-n)S^2}{nN}$, and

iv) $E(s^2) = S^2$.
Solution:
a) We know that
$\bar{Y} = \dfrac{1}{N}\sum_{i=1}^{N} Y_i = 7$, $\quad \sigma^2 = \dfrac{1}{N}\sum_{i=1}^{N} Y_i^2 - \bar{Y}^2 = 0.5$, $\quad$ and $\quad S^2 = \dfrac{1}{N-1}\left[\sum_{i=1}^{N} Y_i^2 - N\bar{Y}^2\right] = 0.625$.
b) Form a table for calculation as below:
Sample       $\bar{y}_i$    $\bar{y}_i^2$    $N\bar{y}_i$    $s_i^2$
(8, 6.5)     7.25           52.563           36.25           1.125
(8, 7.5)     7.75           60.063           38.75           0.125
(8, 7)       7.50           56.250           37.50           0.500
(8, 6)       7.00           49.000           35.00           2.000
(6.5, 7.5)   7.00           49.000           35.00           0.500
(6.5, 7)     6.75           45.563           33.75           0.125
(6.5, 6)     6.25           39.063           31.25           0.125
(7.5, 7)     7.25           52.563           36.25           0.125
(7.5, 6)     6.75           45.563           33.75           1.125
(7, 6)       6.50           42.250           32.50           0.500
i) $E(\bar{y}) = \dfrac{1}{n'}\sum_{i=1}^{n'} \bar{y}_i = 7 = \bar{Y}$, where $n'$ is the number of possible samples.

ii) $E(N\bar{y}) = \dfrac{1}{n'}\sum_{i=1}^{n'} N\bar{y}_i = 35$, or $E(N\bar{y}) = N\,E(\bar{y}) = 35$.

iii) $V(\bar{y}) = \dfrac{1}{n'}\sum_{i=1}^{n'} (\bar{y}_i - \bar{Y})^2 = \dfrac{1}{n'}\sum_{i=1}^{n'} \bar{y}_i^2 - \bar{Y}^2 = 0.1875$, and $\dfrac{(N-n)S^2}{nN} = 0.1875$.

Therefore,

$V(\bar{y}) = \dfrac{(N-n)S^2}{nN} = 0.1875$.

iv) $E(s^2) = \dfrac{1}{n'}\sum_{i=1}^{n'} s_i^2 = 0.625 = S^2$.
Property: $V(\bar{y})$ under srswor is less than $V(\bar{y})$ under srswr.
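A minimal numerical illustration of this property (an addition to the notes, using the population of the preceding srswor example):

    # V(ybar) under srswor versus srswr for the population of the example above.
    from statistics import pvariance, variance

    Y = [8, 6.5, 7.5, 7, 6]
    N, n = len(Y), 2
    sigma2, S2 = pvariance(Y), variance(Y)

    V_srswr = sigma2 / n                  # = (N-1) S^2 / (nN) = 0.25
    V_srswor = (N - n) * S2 / (n * N)     # = 0.1875

    print(V_srswr, V_srswor, V_srswr > V_srswor)   # 0.25 0.1875 True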
Theorem: Let a srswor sample of size $n$ be drawn from a population of size $N$, and let $T = \sum_{i=1}^{n} \alpha_i y_i$ be a linear estimator of $\bar{Y}$, where the $\alpha_i$'s are coefficients attached to the sample values. Then,

i) $T$ is an unbiased estimator of $\bar{Y}$ if and only if $\sum_{i=1}^{n} \alpha_i = 1$.

ii) The sample mean $\bar{y}$ is the best linear unbiased estimator.

Proof:

i) $E(T) = E\left(\sum_{i=1}^{n} \alpha_i y_i\right) = \sum_{i=1}^{n} \alpha_i E(y_i) = \sum_{i=1}^{n} \alpha_i \bar{Y} = \bar{Y}$, if and only if $\sum_{i=1}^{n} \alpha_i = 1$.
ii) $V(T) = E[T - E(T)]^2 = E\left[\sum_{i=1}^{n} \alpha_i y_i - \bar{Y}\right]^2$, under $\sum_{i=1}^{n} \alpha_i = 1$

$= E\left[\left(\sum_{i=1}^{n} \alpha_i y_i\right)^2 - 2\bar{Y}\sum_{i=1}^{n} \alpha_i y_i + \bar{Y}^2\right] = E\left(\sum_{i=1}^{n} \alpha_i y_i\right)^2 - \bar{Y}^2$.
Consider

$E\left(\sum_{i=1}^{n} \alpha_i y_i\right)^2 = \sum_{i=1}^{n} \alpha_i^2 E(y_i^2) + \sum_{i \neq j}^{n} \alpha_i \alpha_j E(y_i y_j) \quad$ (1)

Note that

$V(y_i) = E(y_i^2) - \bar{Y}^2 \;\Rightarrow\; E(y_i^2) = \dfrac{N-1}{N} S^2 + \bar{Y}^2$, since $V(y_i) = \dfrac{N-1}{N} S^2$ for each $i$. $\quad$ (2)

Now

$E(y_i y_j) = \sum_{i \neq j}^{N} Y_i \Pr(i)\, Y_j \Pr(j \mid i) = \dfrac{1}{N}\,\dfrac{1}{N-1} \sum_{i \neq j}^{N} Y_i Y_j$.
Note that

$\left(\sum_{i=1}^{N} Y_i\right)^2 = \sum_{i=1}^{N} Y_i^2 + \sum_{i \neq j}^{N} Y_i Y_j = (N-1)S^2 + N\bar{Y}^2 + \sum_{i \neq j}^{N} Y_i Y_j$

$\Rightarrow \sum_{i \neq j}^{N} Y_i Y_j = N^2 \bar{Y}^2 - (N-1)S^2 - N\bar{Y}^2$.

Thus

$E(y_i y_j) = \dfrac{1}{N(N-1)}\left[N^2 \bar{Y}^2 - (N-1)S^2 - N\bar{Y}^2\right] = \bar{Y}^2 - \dfrac{S^2}{N}. \quad$ (3)
In view of equations (2) and (3), equation (1) becomes

$E\left(\sum_{i=1}^{n} \alpha_i y_i\right)^2 = \sum_{i=1}^{n} \alpha_i^2 \left[\dfrac{N-1}{N} S^2 + \bar{Y}^2\right] + \sum_{i \neq j}^{n} \alpha_i \alpha_j \left[\bar{Y}^2 - \dfrac{S^2}{N}\right]$

$= S^2 \sum_{i=1}^{n} \alpha_i^2 - \dfrac{S^2}{N} \sum_{i=1}^{n} \alpha_i^2 + \bar{Y}^2 \sum_{i=1}^{n} \alpha_i^2 + \left(1 - \sum_{i=1}^{n} \alpha_i^2\right)\left(\bar{Y}^2 - \dfrac{S^2}{N}\right)$, since $\sum_{i \neq j}^{n} \alpha_i \alpha_j = \left(\sum_{i=1}^{n} \alpha_i\right)^2 - \sum_{i=1}^{n} \alpha_i^2 = 1 - \sum_{i=1}^{n} \alpha_i^2$,

$= S^2 \sum_{i=1}^{n} \alpha_i^2 + \bar{Y}^2 - \dfrac{S^2}{N}$.

Therefore,

$V(T) = S^2 \sum_{i=1}^{n} \alpha_i^2 - \dfrac{S^2}{N}$.
Since $\sum_{i=1}^{n} \alpha_i^2 = \sum_{i=1}^{n} \left(\alpha_i - \dfrac{1}{n}\right)^2 + \dfrac{1}{n}$ under the condition $\sum_{i=1}^{n} \alpha_i = 1$, it follows that

$V(T) = S^2 \left[\sum_{i=1}^{n} \left(\alpha_i - \dfrac{1}{n}\right)^2 + \dfrac{1}{n} - \dfrac{1}{N}\right]$.

We note that $V(T)$ is minimum if $\sum_{i=1}^{n} \left(\alpha_i - \dfrac{1}{n}\right)^2 = 0$, i.e. if $\alpha_i = \dfrac{1}{n}$ for all $i = 1, 2, \ldots, n$, in which case $T = \dfrac{1}{n}\sum_{i=1}^{n} y_i = \bar{y}$.
OR

To determine the $\alpha_i$ such that $V(T)$ is minimum, consider the function

$\phi = V(T) + \lambda \left(\sum_{i=1}^{n} \alpha_i - 1\right)$, where $\lambda$ is some unknown constant.

Using the calculus method of Lagrange multipliers, we choose the $\alpha_i$ and the constant $\lambda$ to minimize $\phi$. Differentiating $\phi$ with respect to $\alpha_i$ and equating to zero, we have

$\dfrac{\partial \phi}{\partial \alpha_i} = 0 = 2 S^2 \alpha_i + \lambda$ or $\alpha_i = -\dfrac{\lambda}{2 S^2} \quad$ (4)

Summing equation (4) over $i$ and using $\sum_{i=1}^{n} \alpha_i = 1$ gives $\lambda = -\dfrac{2S^2}{n}$, hence $\alpha_i = \dfrac{1}{n}$ and again $T = \bar{y}$.
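The sketch below (illustrative, not part of the original notes) confirms numerically that, among weights with $\sum \alpha_i = 1$, the choice $\alpha_i = 1/n$ minimizes $V(T) = S^2 \sum \alpha_i^2 - S^2/N$; the values of $S^2$, $N$ and $n$ are taken from the srswor example above, for which the minimum is $S^2(1/n - 1/N) = 0.1875$, the variance of the sample mean.

    # V(T) = S^2 * sum(alpha_i^2) - S^2/N for weights summing to 1 (n = 2 here).
    import random

    S2, N, n = 0.625, 5, 2
    random.seed(1)

    def V_T(alpha):
        return S2 * sum(a * a for a in alpha) - S2 / N

    best = V_T([1.0 / n] * n)            # alpha_i = 1/n gives S^2 (1/n - 1/N) = 0.1875
    for _ in range(5):
        a1 = random.uniform(-1, 2)       # any other weights with a1 + a2 = 1
        print(round(V_T([a1, 1.0 - a1]), 4), ">=", best)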
Assign to the $i$-th member of the population the value $Y_i$, which is equal to 1 if this member possesses the character $C$ and is equal to 0 otherwise, so that $\sum_{i=1}^{N} Y_i = NP$, where $P$ is the population proportion. Then

Population mean $= \dfrac{1}{N}\sum_{i=1}^{N} Y_i = \dfrac{NP}{N} = P$.

Population variance $= \dfrac{1}{N}\sum_{i=1}^{N} (Y_i - P)^2 = \dfrac{1}{N}\sum_{i=1}^{N} Y_i^2 - P^2 = \dfrac{NP}{N} - P^2 = PQ$.

Mean square of the population $= \dfrac{1}{N-1}\sum_{i=1}^{N} (Y_i - P)^2 = \dfrac{1}{N-1}\left[\sum_{i=1}^{N} Y_i^2 - NP^2\right] = \dfrac{NP - NP^2}{N-1} = \dfrac{NPQ}{N-1}$.
Similarly, assign to the $i$-th member of the sample the value $y_i$, which is equal to 1 if this member possesses the character $C$ and is equal to 0 otherwise. Then

Sample total $= \sum_{i=1}^{n} y_i = np = a$, and Sample mean $= \dfrac{1}{n}\sum_{i=1}^{n} y_i = \dfrac{a}{n} = p$.

Mean square for the sample $= \dfrac{1}{n-1}\sum_{i=1}^{n} (y_i - p)^2 = \dfrac{1}{n-1}\left[\sum_{i=1}^{n} y_i^2 - np^2\right] = \dfrac{1}{n-1}(np - np^2) = \dfrac{npq}{n-1}$.
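A quick check of these 0/1 coding identities (an illustrative addition, assuming a hypothetical population of $N = 10$ units of which 4 possess the character $C$):

    # Population variance = PQ and population mean square = NPQ/(N-1) for 0/1 data.
    from statistics import mean, pvariance, variance

    Y = [1] * 4 + [0] * 6          # hypothetical 0/1 population
    N = len(Y)
    P = mean(Y)
    Q = 1 - P

    print(pvariance(Y), P * Q)                  # both 0.24 (up to floating point)
    print(variance(Y), N * P * Q / (N - 1))     # both 0.2666... (up to floating point)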
If the precision is specified as a margin of error $d$ in the estimate $\bar{y}$ of $\bar{Y}$, then

$d = Z_{\alpha/2}\sqrt{V(\bar{y})} \;\Rightarrow\; V(\bar{y}) = \dfrac{d^2}{Z_{\alpha/2}^2}$, and $n_0 = \dfrac{Z_{\alpha/2}^2 S^2}{d^2} = \dfrac{S^2}{V(\bar{y})}$;

if instead it is specified as a variance $V'$ of the estimator $N\bar{y}$ of the population total, then

$V(\bar{y}) = \dfrac{V'}{N^2}$, and $n_0 = \dfrac{N^2 S^2}{V'}$.

In either case $n$ then follows from relation (4), i.e. $n = \dfrac{n_0}{1 + \dfrac{n_0}{N}}$.
Example: For a population of size $N = 430$ we know roughly that $\bar{Y} = 19$ and $S^2 = 85.6$. With srs, what should the sample size be to estimate $\bar{Y}$ with a margin of error of 10% of $\bar{Y}$, apart from a chance of 1 in 20?

Solution: The margin of error in the estimate $\bar{y}$ of $\bar{Y}$ is given, i.e.

$\bar{y} = \bar{Y} \pm 10\%$ of $\bar{Y}$, or $|\bar{y} - \bar{Y}| = 10\%$ of $\bar{Y} = \dfrac{19}{10} = 1.9$, so that

$\Pr[\,|\bar{y} - \bar{Y}| \geq 1.9\,] = \dfrac{1}{20} = 0.05$, and $n_0 = \dfrac{Z_{\alpha/2}^2 S^2}{d^2} = \dfrac{(1.96)^2 \times 85.6}{(1.9)^2} = 91.0917$.

Therefore,

$n = \dfrac{n_0}{1 + \dfrac{n_0}{N}} = 75.168 \cong 75$.
Example: A population consists of 676 petition sheets. How large must the sample be if the total number of signatures is to be estimated with a margin of error of 1000, apart from a 1 in 20 chance? Assume the population mean square to be 229.

Solution: Let $Y$ be the total number of signatures on all the sheets and let $\hat{Y}$ be its estimate. The margin of error is specified in the estimate $\hat{Y}$ of $Y$ as

$|\hat{Y} - Y| = 1000$, so that $\Pr[\,|\hat{Y} - Y| \geq 1000\,] = \dfrac{1}{20} = 0.05$.

We know that

$n = \dfrac{n_0}{1 + \dfrac{n_0}{N}}$, here $n_0 = \left(\dfrac{N Z_{\alpha/2} S}{d'}\right)^2 = \left(\dfrac{676 \times 1.96 \times \sqrt{229}}{1000}\right)^2 = 402.0139$,

and hence

$n = 252.09 \cong 252$.
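The two sample-size calculations above can be reproduced with the short sketch below (an illustrative addition; $z = 1.96$ corresponds to the 1-in-20 chance used in the text):

    # Sample-size calculations for the two preceding examples.
    z = 1.96

    def n_with_fpc(n0, N):
        # n = n0 / (1 + n0/N), the correction for a finite population
        return n0 / (1 + n0 / N)

    # Estimating the mean: N = 430, S^2 = 85.6, d = 1.9
    n0 = z ** 2 * 85.6 / 1.9 ** 2
    print(n0, n_with_fpc(n0, 430))        # about 91.09 and 75.17, so n = 75

    # Estimating the total: N = 676, S^2 = 229, d' = 1000
    n0 = (676 * z) ** 2 * 229 / 1000 ** 2
    print(n0, n_with_fpc(n0, 676))        # about 402.01 and 252.1, so n = 252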
Estimation of sample size for proportion

a) When precision is specified in terms of the margin of error: Suppose the size of the population is $N$ and the population proportion is $P$. Let a srs of size $n$ be taken, let $p$ be the corresponding sample proportion, and let $d$ be the margin of error in the estimate $p$ of $P$. The margin of error can be specified in the form of a probability statement as

$\Pr[\,|p - P| \geq d\,] = \alpha$ or $\Pr[\,|p - P| \leq d\,] = 1 - \alpha \quad$ (1)

$\Pr\left[\dfrac{|p - P|}{\sqrt{V(p)}} \geq Z_{\alpha/2}\right] = \alpha$ or $\Pr[\,|p - P| \geq \sqrt{V(p)}\, Z_{\alpha/2}\,] = \alpha \quad$ (2)

Comparing equations (1) and (2), the relation which gives the value of $n$ with the required precision of the estimate $p$ of $P$ is

$d = Z_{\alpha/2}\sqrt{V(p)}$ or $d^2 = Z_{\alpha/2}^2 V(p) = Z_{\alpha/2}^2\,\dfrac{N-n}{N-1}\,\dfrac{PQ}{n}$, as sampling is srswor.

$\Rightarrow\; 1 = \dfrac{Z_{\alpha/2}^2 PQ}{d^2}\,\dfrac{N-n}{n(N-1)} = n_0\,\dfrac{N-n}{n(N-1)}$, where $n_0 = \dfrac{Z_{\alpha/2}^2 PQ}{d^2} = \dfrac{PQ}{V(p)} \quad$ (3)

or $\dfrac{N-1}{n_0} = \dfrac{N-n}{n} = \dfrac{N}{n} - 1 \;\Rightarrow\; \dfrac{N}{n} = 1 + \dfrac{N-1}{n_0}$

or $n = \dfrac{N}{1 + \dfrac{N-1}{n_0}} = \dfrac{N n_0}{n_0 + (N-1)} = \dfrac{n_0}{\dfrac{N-1}{N} + \dfrac{n_0}{N}} \cong \dfrac{n_0}{1 + \dfrac{n_0}{N}} \quad$ (4)

b) When precision is specified in terms of the variance: Substituting $V(p) = V$ in relation (3) we get $n_0 = \dfrac{PQ}{V}$, and hence $n$ can be obtained from relation (4).
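A minimal sketch of relations (3) and (4) for a proportion; the values $N = 2000$, $P = 0.5$, $d = 0.05$ and $z = 1.96$ below are purely illustrative and not taken from the notes:

    # Sample size for estimating a proportion with margin of error d.
    z, P, d, N = 1.96, 0.5, 0.05, 2000
    Q = 1 - P

    n0 = z ** 2 * P * Q / d ** 2       # relation (3)
    n = n0 / (1 + n0 / N)              # relation (4)
    print(n0, n)                       # about 384.2 and 322.3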
$S^2 = \dfrac{1}{N-1}\left[\sum_i Y_i^2 - \dfrac{\left(\sum_i Y_i\right)^2}{N}\right] = \dfrac{1}{36-1}\left[131682 - \dfrac{(2138)^2}{36}\right] = 134.5$

and

$|\hat{Y} - Y| \leq 200$, so that $\Pr[\,|\hat{Y} - Y| \geq 200\,] = \dfrac{1}{20} = 0.05$.

We know that

$n = \dfrac{n_0}{1 + \dfrac{n_0}{N}}$, here $n_0 = \left(\dfrac{N Z_{\alpha/2}}{d}\right)^2 S^2 = \left(\dfrac{36 \times 1.96}{200}\right)^2 \times 134.5 = 16.7409$,

and therefore

$n = 11.42765 \cong 12$.
Exercise: With certain populations it is known that the observations $Y_i$ are all zero on a portion $QN$ of the $N$ units $(0 < Q < 1)$. Sometimes, with varying expenditure of effort, these units can be found and listed, so that they need not be sampled. If $\sigma^2$ is the variance of $Y_i$ in the original population and $\sigma_0^2$ is the variance when all zeros are excluded, show that

$\sigma_0^2 = \dfrac{\sigma^2}{P} - \dfrac{Q}{P^2}\,\bar{Y}^2$, where $P = 1 - Q$, and $\bar{Y}$ is the mean value of $Y_i$ for the whole population.
Solution: Given $Y_1, Y_2, \ldots, Y_{NP}, Y_{NP+1}, \ldots, Y_N$ (the first $NP$ units are non-zero and the remaining $NQ$ units are all zero). Thus, $\bar{Y} = \dfrac{1}{N}\sum_{i=1}^{N} Y_i$ is the population mean, $\bar{Y}_{NP} = \dfrac{1}{NP}\sum_{i=1}^{NP} Y_i$, and $\bar{Y}_{NQ} = \dfrac{1}{NQ}\sum_{i=1}^{NQ} Y_i = 0$; also, $\sum_{i=1}^{N} Y_i = \sum_{i=1}^{NP} Y_i$ and $\sum_{i=1}^{N} Y_i^2 = \sum_{i=1}^{NP} Y_i^2$, so that $N\bar{Y} = NP\,\bar{Y}_{NP}$, or $\bar{Y}_{NP} = \dfrac{\bar{Y}}{P}$. By definition,

$\sigma^2 = \dfrac{1}{N}\sum_{i=1}^{N} (Y_i - \bar{Y})^2 = \dfrac{1}{N}\sum_{i=1}^{N} Y_i^2 - \bar{Y}^2$, or $N\sigma^2 = \sum_{i=1}^{N} Y_i^2 - N\bar{Y}^2$.

Similarly, $NP\sigma_0^2 = \sum_{i=1}^{NP} Y_i^2 - NP\,\bar{Y}_{NP}^2$.

Thus,

$N(\sigma^2 - P\sigma_0^2) = NP\,\bar{Y}_{NP}^2 - N\bar{Y}^2 = NP\,\dfrac{\bar{Y}^2}{P^2} - N\bar{Y}^2 = N\left(\dfrac{1}{P} - 1\right)\bar{Y}^2 = N\,\dfrac{Q}{P}\,\bar{Y}^2$.

Therefore,

$P\sigma_0^2 = \sigma^2 - \dfrac{Q}{P}\,\bar{Y}^2$ or $\sigma_0^2 = \dfrac{\sigma^2}{P} - \dfrac{Q}{P^2}\,\bar{Y}^2$.
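The identity can be checked numerically; the sketch below (an illustrative addition) uses a hypothetical population of $N = 5$ units, two of which are zero:

    # Check of sigma_0^2 = sigma^2/P - (Q/P^2) * Ybar^2 for a population with zeros.
    from statistics import mean, pvariance

    Y = [4, 6, 10, 0, 0]                   # hypothetical population
    nonzero = [y for y in Y if y != 0]
    N = len(Y)
    P = len(nonzero) / N
    Q = 1 - P

    Ybar = mean(Y)
    sigma2 = pvariance(Y)                  # variance of the whole population
    sigma0_2 = pvariance(nonzero)          # variance with the zeros excluded

    print(sigma0_2, sigma2 / P - (Q / P ** 2) * Ybar ** 2)   # both about 6.2222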
$V(\bar{y}_{n+n_1}) = \dfrac{1}{(n+n_1)^2}\left[(n+n_1)^2 V(\bar{y}_n) + n_1^2\, E\left\{\left(\dfrac{1}{n_1} - \dfrac{1}{n}\right) s_n^2\right\}\right]$

$= \dfrac{1}{(n+n_1)^2}\left[(n+n_1)^2 V(\bar{y}_n) + n_1^2\,\dfrac{n-n_1}{n_1 n}\, S^2\right] = V(\bar{y}_n) + \dfrac{n_1(n-n_1)}{n(n+n_1)^2}\, S^2$.

Therefore,

$\dfrac{V(\bar{y}_{n+n_1})}{V(\bar{y}_n)} = 1 + \dfrac{n_1(n-n_1)}{n(n+n_1)^2}\,\dfrac{S^2}{V(\bar{y}_n)} \cong 1 + \dfrac{n_1(n-n_1)}{n(n+n_1)^2}\,\dfrac{S^2}{S^2/n} = 1 + \dfrac{n_1(n-n_1)}{(n+n_1)^2}$.

ii) With $\bar{y}_1$ the mean of a subsample of $n_1$ units, $\bar{y}_2$ the mean of the remaining $n_2 = n - n_1$ units, and $\bar{y} = (n_1\bar{y}_1 + n_2\bar{y}_2)/n$,

$V(\bar{y}_1 - \bar{y}) = \dfrac{n_2^2}{n^2}\, V(\bar{y}_1 - \bar{y}_2) = \dfrac{n_2^2}{n^2}\,\dfrac{n_1 + n_2}{n_1 n_2}\, S^2 = \dfrac{n_2}{n_1 n}\, S^2 = \dfrac{n - n_1}{n_1 n}\, S^2 = \left(\dfrac{1}{n_1} - \dfrac{1}{n}\right) S^2$.
iii) $\operatorname{Cov}(\bar{y}, \bar{y}_1 - \bar{y}) = E[\bar{y}(\bar{y}_1 - \bar{y})] - E(\bar{y})E(\bar{y}_1 - \bar{y})$

$= E(\bar{y}\,\bar{y}_1 - \bar{y}^2) - \bar{Y}\times 0 = E(\bar{y}\,\bar{y}_1) - E(\bar{y}^2) \quad$ (1)

Consider

$E(\bar{y}\,\bar{y}_1) = E\left[\dfrac{n_1\bar{y}_1 + n_2\bar{y}_2}{n}\,\bar{y}_1\right] = E\left[\dfrac{n_1}{n}\bar{y}_1^2 + \dfrac{n_2}{n}\bar{y}_1\bar{y}_2\right]$

$= \dfrac{n_1}{n}E(\bar{y}_1^2) + \dfrac{n_2}{n}E(\bar{y}_1)E(\bar{y}_2)$

$= \dfrac{n_1}{n}\left[V(\bar{y}_1) + \bar{Y}^2\right] + \dfrac{n_2}{n}\bar{Y}^2 = \dfrac{n_1}{n}\left[\dfrac{S^2}{n_1} + \bar{Y}^2\right] + \dfrac{n_2}{n}\bar{Y}^2$

$= \dfrac{S^2}{n} + \dfrac{n_1}{n}\bar{Y}^2 + \dfrac{n_2}{n}\bar{Y}^2 = \dfrac{S^2}{n} + \bar{Y}^2 \quad$ (2)

Now

$V(\bar{y}) = E(\bar{y}^2) - \bar{Y}^2$ or $E(\bar{y}^2) = V(\bar{y}) + \bar{Y}^2 = \dfrac{S^2}{n} + \bar{Y}^2 \quad$ (3)

In view of equations (1), (2), and (3), we get

$\operatorname{Cov}(\bar{y}, \bar{y}_1 - \bar{y}) = \dfrac{S^2}{n} + \bar{Y}^2 - \left(\dfrac{S^2}{n} + \bar{Y}^2\right) = 0$.
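The result $\operatorname{Cov}(\bar{y}, \bar{y}_1 - \bar{y}) = 0$ can also be seen by simulation. The sketch below (an illustrative addition, assuming the setup in which $\bar{y}_1$ is the mean of a random subsample of $n_1$ of the $n$ sampled units) estimates the covariance by Monte Carlo; it comes out near zero, apart from simulation noise:

    # Monte Carlo check that Cov(ybar, ybar1 - ybar) is essentially zero.
    import random
    from statistics import mean

    random.seed(0)
    population = list(range(1, 21))      # illustrative population, N = 20
    n, n1, reps = 8, 3, 100000

    ybars, diffs = [], []
    for _ in range(reps):
        sample = random.sample(population, n)    # srswor sample of size n
        sub = random.sample(sample, n1)          # random subsample of size n1
        yb, yb1 = mean(sample), mean(sub)
        ybars.append(yb)
        diffs.append(yb1 - yb)

    cov = mean(a * b for a, b in zip(ybars, diffs)) - mean(ybars) * mean(diffs)
    var_ybar = mean(v * v for v in ybars) - mean(ybars) ** 2
    print(round(cov, 3), "versus V(ybar) =", round(var_ybar, 3))   # cov near 0, V(ybar) about 2.6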
Exercise: A population has three units $U_1$, $U_2$ and $U_3$ with variates $Y_1$, $Y_2$ and $Y_3$, respectively. It is required to estimate the population total $Y$ by selecting a sample of two units. Let the sampling and estimation procedures be as follows:

Sample ($s$)     $P(s)$    Estimator $t$       Estimator $t'$
$(U_1, U_2)$     $1/2$     $Y_1 + 2Y_2$        $Y_1 + 2Y_2 + Y_1^2$
$(U_1, U_3)$     $1/2$     $Y_1 + 2Y_3$        $Y_1 + 2Y_3 - Y_1^2$
Prove that both t and t ′ are unbiased for Y and find their variances. Comment on the
estimators.
Solution: By definition,

$E(t) = \sum_i t_i\, p(t_i) = \dfrac{1}{2}(Y_1 + 2Y_2 + Y_1 + 2Y_3) = Y_1 + Y_2 + Y_3 = Y$.

This shows that the estimator $t$ is unbiased for $Y$.

$E(t^2) = \dfrac{1}{2}\left[(Y_1 + 2Y_2)^2 + (Y_1 + 2Y_3)^2\right] = \dfrac{1}{2}\left(Y_1^2 + 4Y_2^2 + 4Y_1Y_2 + Y_1^2 + 4Y_3^2 + 4Y_1Y_3\right)$

$= Y_1^2 + 2Y_2^2 + 2Y_3^2 + 2Y_1Y_2 + 2Y_1Y_3$.

Therefore,

$V(t) = E(t^2) - [E(t)]^2 = Y_2^2 + Y_3^2 - 2Y_2Y_3 = (Y_2 - Y_3)^2$.
Similarly,

$E(t') = \sum_i t_i'\, p(t_i') = \dfrac{1}{2}(Y_1 + 2Y_2 + Y_1^2 + Y_1 + 2Y_3 - Y_1^2) = Y$; hence $t'$ is also unbiased for $Y$.

$E(t'^2) = \dfrac{1}{2}\left[(Y_1 + 2Y_2 + Y_1^2)^2 + (Y_1 + 2Y_3 - Y_1^2)^2\right]$

$= \dfrac{1}{2}\left(Y_1^4 + 2Y_1^3 + Y_1^2 + 4Y_1^2Y_2 + 4Y_1Y_2 + 4Y_2^2 + Y_1^4 - 2Y_1^3 + Y_1^2 - 4Y_1^2Y_3 + 4Y_1Y_3 + 4Y_3^2\right)$

$= Y_1^4 + Y_1^2 + 2Y_1^2(Y_2 - Y_3) + 2Y_1Y_2 + 2Y_1Y_3 + 2Y_2^2 + 2Y_3^2$.

Therefore,

$V(t') = E(t'^2) - [E(t')]^2 = Y_1^4 + 2Y_1^2(Y_2 - Y_3) + (Y_2 - Y_3)^2 = (Y_1^2 + Y_2 - Y_3)^2$.

We conclude that both the linear estimator $t$ and the quadratic estimator $t'$ are unbiased for $Y$; which of the two has the smaller variance depends on the variate values, since $V(t) = (Y_2 - Y_3)^2$ while $V(t') = (Y_1^2 + Y_2 - Y_3)^2$.
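A short numerical check of this exercise (an illustrative addition, with arbitrary variate values $Y_1 = 3$, $Y_2 = 5$, $Y_3 = 2$):

    # Both estimators are unbiased for Y; their variances match the expressions above.
    Y1, Y2, Y3 = 3.0, 5.0, 2.0
    Y = Y1 + Y2 + Y3

    t_vals  = [Y1 + 2 * Y2,           Y1 + 2 * Y3]             # each with probability 1/2
    tp_vals = [Y1 + 2 * Y2 + Y1 ** 2, Y1 + 2 * Y3 - Y1 ** 2]

    E_t,  E_tp  = sum(t_vals) / 2,  sum(tp_vals) / 2
    V_t  = sum(v * v for v in t_vals)  / 2 - E_t ** 2
    V_tp = sum(v * v for v in tp_vals) / 2 - E_tp ** 2

    print(E_t, E_tp, Y)                      # 10.0 10.0 10.0
    print(V_t,  (Y2 - Y3) ** 2)              # 9.0 9.0
    print(V_tp, (Y1 ** 2 + Y2 - Y3) ** 2)    # 144.0 144.0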