Sampling Distribution and Estimation
Prepared by:
B. S. Parajuli
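|                | Population                      | Sample                                              |
|----------------|---------------------------------|-----------------------------------------------------|
| Definition     | Collection of items under study | Part or portion of the population chosen for study  |
| Characteristic | Parameter                       | Statistic                                           |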
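| Population value (Y) | $Y - \bar{Y} = Y - 6$ | $(Y - \bar{Y})^2$ |
|----------------------|-----------------------|-------------------|
| 2                    | -4                    | 16                |
| 3                    | -3                    | 9                 |
| 6                    | 0                     | 0                 |
| 8                    | 2                     | 4                 |
| 11                   | 5                     | 25                |
| $\sum Y = 30$        | $\sum (Y - \bar{Y}) = 0$ | $\sum (Y - \bar{Y})^2 = 54$ |

Population mean $(\mu) = \bar{Y} = \frac{\sum Y}{N} = \frac{30}{5} = 6$
Population variance $(\sigma^2) = \frac{\sum (Y - \bar{Y})^2}{N} = \frac{54}{5} = 10.8$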
(b) Possible number of samples of size 2 which can be drawn from the population
without replacement $= {}^{N}C_{n} = {}^{5}C_{2} = 10$. Possible samples are: (2,3), (2,6), (2,8),
(2,11), (3,6), (3,8), (3,11), (6,8), (6,11), (8,11)
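Calculation of mean and variance of sampling distribution of sample means

| Sample number | Sample values (y) | Sample mean ($\bar{y}$) | $\bar{y} - \bar{\bar{y}} = \bar{y} - 6$ | $(\bar{y} - \bar{\bar{y}})^2$ |
|---------------|-------------------|-------------------------|------------------------------------------|-------------------------------|
| 1             | (2,3)             | 2.5                     | -3.5                                     | 12.25                         |
| 2             | (2,6)             | 4                       | -2                                       | 4                             |
| 3             | (2,8)             | 5                       | -1                                       | 1                             |
| 4             | (2,11)            | 6.5                     | 0.5                                      | 0.25                          |
| 5             | (3,6)             | 4.5                     | -1.5                                     | 2.25                          |
| 6             | (3,8)             | 5.5                     | -0.5                                     | 0.25                          |
| 7             | (3,11)            | 7                       | 1                                        | 1                             |
| 8             | (6,8)             | 7                       | 1                                        | 1                             |
| 9             | (6,11)            | 8.5                     | 2.5                                      | 6.25                          |
| 10            | (8,11)            | 9.5                     | 3.5                                      | 12.25                         |
| ${}^{N}C_{n} = 10$ |              | $\sum \bar{y} = 60$     | $\sum (\bar{y} - \bar{\bar{y}}) = 0$     | $\sum (\bar{y} - \bar{\bar{y}})^2 = 40.5$ |

Now, mean of sample means, $\bar{\bar{y}} = \frac{\sum \bar{y}}{{}^{N}C_{n}} = \frac{60}{10} = 6$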
Population mean $\mu = 6$
Therefore, the mean of sample means is equal to the population mean, i.e. $\bar{\bar{y}} = \mu = 6$.
$E(\bar{y}) = \mu$, i.e. the sample mean is an unbiased estimate of the population mean.
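(c) Variance of sample mean, $V(\bar{y}) = \sigma_{\bar{y}}^2 = \frac{\sum (\bar{y} - \bar{\bar{y}})^2}{{}^{N}C_{n}} = \frac{40.5}{10} = 4.05$
Variance with formula, $V(\bar{y}) = \frac{\sigma^2}{n} \cdot \frac{N - n}{N - 1} = \frac{10.8}{2} \cdot \frac{5 - 2}{5 - 1} = 4.05$
Hence verified.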
(d) The standard deviation of the sampling distribution of the sample mean is
$\sigma_{\bar{y}}$ = standard error of mean = S.E.$(\bar{y}) = \sqrt{Var(\bar{y})} = \sqrt{4.05} = 2.01$
Population standard deviation $(\sigma) = \sqrt{10.8} = 3.29$
Here $\sigma_{\bar{y}} < \sigma$, hence the standard deviation of the sampling distribution of the
sample mean is smaller than the population standard deviation.
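The whole of this example can be checked by brute force. The following is a minimal Python sketch (not part of the original slides) that enumerates every sample of size 2 drawn without replacement from the population 2, 3, 6, 8, 11 and compares the variance of the sample means with the finite population formula. The variable names are illustrative choices.

```python
from itertools import combinations
from statistics import mean, pvariance

population = [2, 3, 6, 8, 11]
N = len(population)   # population size
n = 2                 # sample size

mu = mean(population)            # population mean
sigma2 = pvariance(population)   # population variance (divisor N)

# Every possible sample of size n drawn without replacement
sample_means = [mean(s) for s in combinations(population, n)]

mean_of_means = mean(sample_means)
var_of_means = pvariance(sample_means)              # divisor = number of samples
var_by_formula = (sigma2 / n) * (N - n) / (N - 1)   # sigma^2/n * (N-n)/(N-1)

print(len(sample_means), mu, mean_of_means)
print(round(var_of_means, 4), round(var_by_formula, 4))
```

Both variance figures should agree with the 4.05 obtained in the table, and the mean of the sample means with the population mean 6.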
Example 2
A population consists of 4 numbers: 1, 2, 5 and 8. Consider all samples
of size 2 that can be drawn with replacement from this population.
(a) Find the mean and variance of the population.
(b) Show that the mean of sample means is equal to the population mean.
(c) Find the variance of the sampling distribution of means and also verify
with the formula $V(\bar{y}) = \frac{\sigma^2}{n}$.
(d) Also show that the standard deviation of the sampling distribution of
means is less than the population standard deviation.
Solution:
Population size (N) =4
Population: 1, 2, 5 & 8
Sample Size(n) =2
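(a) Calculation of population mean and population variance

| Population value (Y) | $Y - \bar{Y} = Y - 4$ | $(Y - \bar{Y})^2$ |
|----------------------|-----------------------|-------------------|
| 1                    |                       |                   |
| 2                    |                       |                   |
| 5                    |                       |                   |
| 8                    |                       |                   |
| $\sum Y =$           | $\sum (Y - \bar{Y}) = 0$ | $\sum (Y - \bar{Y})^2 = 30$ |

Population mean $(\mu) = \bar{Y} = \frac{\sum Y}{N} =$ …..
Population variance $(\sigma^2) = \frac{\sum (Y - \bar{Y})^2}{N} =$ ……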
(b) Possible number of samples of size 2 which can be drawn from
the population with replacement $= N^{n} = 4^{2} = 16$
Possible Samples are :
(1,1), (1,2), (1,5), (1,8), (2,1), (2,2), (2,5), (2,8),
(5,1), (5,2), (5,5), (5,8), (8,1), (8,2), (8,5), (8,8)
Calculation of mean and variance of sampling distribution of sample
means:
Same as above
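The remaining steps follow the same pattern as the previous example, and they can be checked numerically. The sketch below is an illustration only (it is not the slide's own working): it lists all 16 ordered samples drawn with replacement and compares the variance of the sample means with $\sigma^2 / n$.

```python
from itertools import product
from math import sqrt
from statistics import mean, pvariance

population = [1, 2, 5, 8]
n = 2   # sample size

mu = mean(population)            # population mean
sigma2 = pvariance(population)   # population variance (divisor N)

# All N**n ordered samples drawn with replacement
samples = list(product(population, repeat=n))
sample_means = [mean(s) for s in samples]

mean_of_means = mean(sample_means)
var_of_means = pvariance(sample_means)   # divisor = number of samples
var_by_formula = sigma2 / n              # no finite population correction

se = sqrt(var_of_means)   # standard error of the mean
sigma = sqrt(sigma2)      # population standard deviation

print(len(samples), mu, mean_of_means)
print(var_of_means, var_by_formula, se < sigma)
```

With replacement there is no $\frac{N-n}{N-1}$ factor, which is why the check uses $\sigma^2 / n$ alone.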
Conclusion
➢ The population values are uniformly distributed, while the sample means are
symmetrically distributed.
➢ The mean of the sample means is equal to the population mean.
➢ The sample mean is an unbiased estimator of the population mean.
➢ The variance of the sample means is not equal to the population variance.
➢ The spread of the sample means is smaller than the spread of the
population values.
➢ The shape of the sampling distribution of the sample means tends
to be bell-shaped and approximates the normal distribution, even
when the population is not normally distributed, provided that
the sample size is reasonably large.
The Central Limit Theorem
The central limit theorem states that, “When the size of the sample increases and
becomes sufficiently large, the sampling distribution of the mean ($\bar{X}$) will be
approximately normally distributed with mean $\mu$ and variance $\frac{\sigma^2}{n}$.”
If $X_i$ (i = 1, 2, …, n) are independent random variables such that $E(X_i) = \mu_i$
and $Var(X_i) = \sigma_i^2$, then it can be proved that, under certain very general conditions,
the random variable $S_n = X_1 + X_2 + \dots + X_n$ is asymptotically
normal with mean $\mu = \sum_{i=1}^{n} \mu_i$ and standard deviation $\sigma = \sqrt{\sum_{i=1}^{n} \sigma_i^2}$.
A mathematical formulation of the central limit theorem is that the
distribution of $\frac{\text{Statistic} - \text{Parameter}}{\text{Standard error}} = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$ approaches a normal
distribution with mean 0 (zero) and variance 1 (one) as $n \to \infty$.
Note: The central limit theorem allows us to sample from non-normally
distributed populations and obtain approximately the same results as
would be obtained if the populations were normally distributed,
provided that we take a large sample.
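A small simulation can make the theorem concrete. The sketch below is an illustration under assumed settings, none of which come from the slides: an exponential population with rate 1 (mean 1, standard deviation 1, strongly skewed), sample size 50 and 2000 repeated samples. It standardises each sample mean and counts how many fall within ±1.96.

```python
import random
from math import sqrt
from statistics import mean, pstdev

random.seed(0)   # fixed seed so that the run is repeatable

n = 50                  # sample size (assumed to be "large")
reps = 2000             # number of repeated samples (assumed)
mu, sigma = 1.0, 1.0    # mean and s.d. of an exponential population, rate 1

sample_means = [
    mean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(reps)
]

# Standardise: (statistic - parameter) / standard error
z = [(m - mu) / (sigma / sqrt(n)) for m in sample_means]
share_within = sum(abs(v) < 1.96 for v in z) / reps

print(round(mean(sample_means), 3))    # should be close to mu
print(round(pstdev(sample_means), 3))  # should be close to sigma / sqrt(n)
print(round(share_within, 3))          # should be close to 0.95
```

Although the population is far from normal, roughly 95% of the standardised means are expected to lie between -1.96 and 1.96, as they would for a standard normal variable.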
Standard Error
The standard deviation of the sampling distribution of a sample statistic is known
as the standard error, abbreviated as S.E.
It is a statistical term that measures the accuracy with which a sample
represents a population.
The standard error therefore measures chance (sampling) deviation, not an error or
mistake.
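| Statistic  | Standard error                       | Condition                                                                  |
|------------|--------------------------------------|----------------------------------------------------------------------------|
| Mean       | S.E.$(\bar{X}) = \frac{s}{\sqrt{n}}$ | When $\sigma$ is unknown and the population size is infinite               |
| Proportion | S.E.$(p) = \sqrt{\frac{p \cdot q}{n}}$ | For an infinite population when P and Q are unknown, where p + q = 1     |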
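[Figure: sampling distribution of the estimator centred on $\theta$, with the acceptance region of area (1 – α) between the lower limit $X_L$ and the upper limit $X_U$, and a tail of area α/2 on each side, over the range $-\infty$ to $+\infty$.]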
Confidence level: The probability that we associate with an interval estimate is
called the confidence level. This probability indicates how confident we are that
the interval estimate will include the population parameter. It is denoted by
$1 - \alpha$. Example: a 95% confidence level indicates that there is a 95% probability that
the estimated value will lie within the confidence limits and a 5% risk that
it will lie outside the confidence limits.
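| Confidence level (1 – α)           | 50%    | 90%   | 95%  | 96%  | 97%  | 98%  | 99%  | 99.73% |
|------------------------------------|--------|-------|------|------|------|------|------|--------|
| Level of significance / risk (α)   | 50%    | 10%   | 5%   | 4%   | 3%   | 2%   | 1%   | 0.27%  |
| Value of Z (two-tailed)            | 0.6745 | 1.645 | 1.96 | 2.05 | 2.17 | 2.33 | 2.58 | 3      |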
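The two-tailed Z values in the table can be reproduced from the standard normal distribution. The following is a minimal sketch using `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The helper name `z_two_tailed` is a hypothetical choice for this illustration.

```python
from statistics import NormalDist

def z_two_tailed(confidence):
    """Critical Z leaving alpha/2 in each tail for a given confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

levels = (0.50, 0.90, 0.95, 0.96, 0.97, 0.98, 0.99, 0.9973)
z_values = {level: z_two_tailed(level) for level in levels}

for level, z in z_values.items():
    print(f"{level:.2%}  Z = {z:.4f}")
```

The computed values agree with the table after rounding to the number of decimals shown there.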
Confidence Interval for Large Samples ( n > 30)
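Confidence interval estimate of population mean from large sample:
$C.I.(\mu) = \bar{X} \pm Z_{\alpha} \cdot S.E.(\bar{X})$
$= \bar{X} \pm Z_{\alpha} \cdot \frac{\sigma}{\sqrt{n}}$ [When $\sigma$ is unknown we use $\hat{\sigma} = s$ for large samples]
$= \bar{X} \pm Z_{\alpha} \cdot \frac{\sigma}{\sqrt{n}} \cdot \sqrt{\frac{N - n}{N - 1}}$ [In case of simple random sampling without replacement from a finite
population of size N]
Confidence interval estimate of population proportion from large sample:
$C.I.(P) = p \pm Z_{\alpha} \cdot \sqrt{\frac{P \cdot Q}{n}} \cdot \sqrt{\frac{N - n}{N - 1}}$ [In case of simple random sampling without replacement from a finite
population of size N]
$= p \pm Z_{\alpha} \cdot \sqrt{\frac{p \cdot q}{n}} \cdot \sqrt{\frac{N - n}{N - 1}}$ [It is used when P and Q are unknown but N is finite]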
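Example: A sample of 400 students taking the Entrance Exam for B.Sc. CSIT
revealed an average score of 56 and a sample standard deviation of 10.
Construct a 98% as well as a 99% confidence interval for the population
mean.
Solution:
With usual notation, n = 400, $\bar{X} = 56$, s = 10
For 98% confidence level: 1 – α = 98%, α = 2%, $Z_{\alpha} = 2.33$
Hence 98% confidence limits for the population mean are given by
$\bar{X} \pm Z_{\alpha} \cdot S.E.(\bar{X})$
$= \bar{X} \pm Z_{\alpha} \cdot \frac{s}{\sqrt{n}}$
$= 56 \pm 2.33 \times \frac{10}{\sqrt{400}}$
$= 56 \pm 1.165$
= (56 – 1.165, 56 + 1.165) = (54.835, 57.165)
Similarly, for 99% confidence level, $Z_{\alpha} = 2.58$, and the limits are
$56 \pm 2.58 \times \frac{10}{\sqrt{400}} = 56 \pm 1.29$ = (54.71, 57.29)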
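The arithmetic of this example can be reproduced in a few lines. The sketch below is illustrative only: `ci_mean_large` is a hypothetical helper name, and the Z values 2.33 and 2.58 are taken from the table of two-tailed values given earlier.

```python
from math import sqrt

def ci_mean_large(xbar, s, n, z):
    """Large-sample confidence limits for a mean: xbar ± z * s / sqrt(n)."""
    se = s / sqrt(n)   # standard error of the mean
    return xbar - z * se, xbar + z * se

ci98 = ci_mean_large(56, 10, 400, 2.33)   # 98% confidence level
ci99 = ci_mean_large(56, 10, 400, 2.58)   # 99% confidence level

print([round(v, 3) for v in ci98])
print([round(v, 3) for v in ci99])
```

The 99% interval is wider than the 98% interval, since a higher confidence level requires a larger Z for the same standard error.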
Example: From a population of 540, a sample of 60 individuals is taken.
From this sample the mean is found to be 6.2 and the standard deviation
1.368.
(i) Find the standard error of the mean.
(ii) Construct a 96% confidence interval for the mean.
Solution:
With usual notations, N = 540, n = 60, $\bar{X} = 6.2$, s = 1.368
(i) Standard error of mean, S.E.$(\bar{X}) = \frac{s}{\sqrt{n}} \cdot \sqrt{\frac{N - n}{N - 1}}$ [for large sample $\hat{\sigma} = s$]
$= \frac{1.368}{\sqrt{60}} \cdot \sqrt{\frac{540 - 60}{540 - 1}}$
= 0.17
(ii) For 96% confidence level, 1 – α = 96%, α = 4%
$Z_{\alpha} = 2.05$
Hence the 96% confidence interval for the mean is given by
$\bar{X} \pm Z_{\alpha} \cdot S.E.(\bar{X})$
= 6.2 ± 2.05 × 0.17
= 6.2 ± 0.3485
= (5.85, 6.55)
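As a check on the finite population correction, the standard error can be computed directly. The sketch below is illustrative, and the helper name `se_mean_finite` is hypothetical. It also shows the interval obtained when the standard error is not rounded to 0.17 first.

```python
from math import sqrt

def se_mean_finite(s, n, N):
    """Standard error of the mean with the finite population correction."""
    return (s / sqrt(n)) * sqrt((N - n) / (N - 1))

N, n, xbar, s = 540, 60, 6.2, 1.368
z = 2.05   # two-tailed Z for 96% confidence, from the table

se = se_mean_finite(s, n, N)
ci_rounded_se = (xbar - z * round(se, 2), xbar + z * round(se, 2))
ci_exact_se = (xbar - z * se, xbar + z * se)

print(round(se, 4))
print([round(v, 2) for v in ci_rounded_se])
print([round(v, 2) for v in ci_exact_se])
```

Carrying the unrounded standard error gives a marginally narrower interval than the one based on 0.17. The difference is only in the second decimal place.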
Example: In a laboratory experiment to test whether a material is in good
condition, a sample of 400 units was drawn. When they were tested, 80
were good. Find 95% confidence limits for the percentage of good units.
Solution:
Sample size (n) = 400, number of good units (x) = 80
Sample proportion $(p) = \frac{x}{n} = \frac{80}{400} = 0.2$, q = 1 – p = 1 – 0.2 = 0.8
For 95% confidence limits, (1 – α) = 95%, α = 5%, $Z_{\alpha} = 1.96$
95% confidence interval for population proportion (P) is given by
$p \pm Z_{\alpha} \cdot S.E.(p)$
$= p \pm 1.96 \times \sqrt{\frac{p \cdot q}{n}}$
$= 0.2 \pm 1.96 \times \sqrt{\frac{0.2 \times 0.8}{400}}$
= 0.2 ± 0.039 = (0.161, 0.239), i.e. 16.1% to 23.9%
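The same calculation can be written as a short function. This is a minimal sketch with a hypothetical helper name, `ci_proportion`, using the infinite population form $p \pm Z_{\alpha} \sqrt{pq/n}$ applied in the solution.

```python
from math import sqrt

def ci_proportion(x, n, z):
    """Large-sample confidence limits for a proportion (infinite population)."""
    p = x / n
    q = 1 - p
    se = sqrt(p * q / n)   # standard error of the sample proportion
    return p - z * se, p + z * se

lower, upper = ci_proportion(80, 400, 1.96)
print(round(lower, 3), round(upper, 3))
print(f"{lower:.1%} to {upper:.1%}")
```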
Example: A factory produces 5000 CDs daily. From a sample of 500
CDs, 2% were found to be of substandard quality. Estimate the percentage
of CDs that can reasonably be expected to be spoiled in the daily production
at the 95% confidence level.
Solution:
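The slides leave this solution blank. One possible way to work it, assuming the formula for sampling without replacement from a finite population (N = 5000) given earlier is the intended one, is sketched below. The variable names are illustrative.

```python
from math import sqrt

N = 5000    # daily production (population size)
n = 500     # sample size
p = 0.02    # sample proportion of substandard CDs
q = 1 - p
z = 1.96    # two-tailed Z for 95% confidence, from the table

# Assumption: finite population correction applies (sampling without replacement)
se = sqrt(p * q / n) * sqrt((N - n) / (N - 1))
lower, upper = p - z * se, p + z * se

print(f"S.E.(p) = {se:.5f}")
print(f"95% limits: {lower:.2%} to {upper:.2%}")
```

Under these assumptions the limits come to roughly 0.8% and 3.2% of the daily production.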
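Confidence Interval for Small Samples (n ≤ 30)
$S^2 = \frac{1}{n - 1} \sum (x - \bar{x})^2$
$S^2 = \frac{1}{n - 1} \left[ \sum x^2 - \frac{(\sum x)^2}{n} \right]$
$S^2 = \frac{1}{n - 1} \left[ \sum d^2 - \frac{(\sum d)^2}{n} \right]$, where d = x – A, $\bar{x} = A + \frac{\sum d}{n}$
t-table value: for d.f. = 24, t = 2.064 (two-tailed, α = 0.05)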
Example: A random sample of size 25 showed a mean of 172.50 cm with a
standard deviation of 15.40 cm. Determine 95% confidence interval for the
mean of the population.
Solution:
95% confidence interval for population mean:
n = 25, $\bar{X} = 172.50$, s = 15.40
1 – α = 0.95, α = 0.05, d.f. = n – 1 = 25 – 1 = 24
$t_{\alpha, n-1} = t_{0.05, 24} = 2.064$
95% C.I. for population mean ($\mu$) is
$= \bar{X} \pm t_{\alpha, n-1} \cdot \frac{s}{\sqrt{n - 1}}$
$= 172.50 \pm 2.064 \cdot \frac{15.40}{\sqrt{25 - 1}}$
= 172.50 ± 6.49
= (172.50 – 6.49, 172.50 + 6.49)
= (166.01, 178.99)
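This example divides by $\sqrt{n - 1}$, whereas the small-sample variance formulas above lead to a standard error of $S / \sqrt{n}$. The two agree if the quoted standard deviation 15.40 was computed with divisor n. That reading is an assumption here, since the slides do not say so. The sketch below checks the equivalence numerically and reproduces the interval.

```python
from math import sqrt

n, xbar = 25, 172.50
s_n = 15.40   # assumed: sample s.d. computed with divisor n
t = 2.064     # two-tailed 5% value for 24 d.f., as quoted in the text

se_a = s_n / sqrt(n - 1)        # form used in the worked example

S = s_n * sqrt(n / (n - 1))     # equivalent s.d. with divisor n - 1
se_b = S / sqrt(n)              # S / sqrt(n) form

margin = t * se_a
ci = (xbar - margin, xbar + margin)

print(round(se_a, 4), round(se_b, 4))
print(round(margin, 2), round(ci[0], 2), round(ci[1], 2))
```

Both forms give the same standard error, so the choice between them depends only on how the sample standard deviation was computed.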
Example: A machine produces metal rods used in an automobile suspension
system. A random sample of 6 rods is selected and the diameter of each is
measured. The measured data (in mm) are as follows. Assume
that the sample is drawn from a normally distributed population.
8.24 8.26 8.20 8.28 8.21 8.23
Find a 95% two-sided confidence interval for the mean rod diameter
and interpret the result with reference to the given problem.
Solution:
Calculate the sample s.d. (s) from the given data and use the formula
$\bar{X} \pm t_{\alpha, n-1} \cdot \frac{s}{\sqrt{n}}$, here d.f. = n – 1 = 6 – 1 = 5

| X                 | X²                   |
|-------------------|----------------------|
| 8.24              | 67.8976              |
| 8.26              | 68.2276              |
| 8.20              | 67.24                |
| 8.28              | 68.5584              |
| 8.21              | 67.4041              |
| 8.23              | 67.7329              |
| ∑X = 49.42        | ∑X² = 407.0606       |
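The slides stop at the column totals. The remaining steps can be sketched as follows, using the shortcut formula $S^2 = \frac{1}{n-1}\left[\sum x^2 - \frac{(\sum x)^2}{n}\right]$. The value t = 2.571 is the standard two-tailed 5% table value for 5 degrees of freedom. It is supplied here as an assumption because it does not appear in the slides.

```python
from math import sqrt

x = [8.24, 8.26, 8.20, 8.28, 8.21, 8.23]   # rod diameters in mm
n = len(x)

sum_x = sum(x)                    # column total of X
sum_x2 = sum(v * v for v in x)    # column total of X squared

xbar = sum_x / n
s2 = (sum_x2 - sum_x ** 2 / n) / (n - 1)   # shortcut formula for S^2
s = sqrt(s2)

t = 2.571                         # assumed t-table value: 5 d.f., two-tailed 5%
margin = t * s / sqrt(n)
lower, upper = xbar - margin, xbar + margin

print(round(sum_x, 2), round(sum_x2, 4))
print(round(xbar, 4), round(s, 4))
print(round(lower, 3), round(upper, 3))
```

Under these assumptions the mean rod diameter is estimated to lie between about 8.205 mm and 8.268 mm at the 95% confidence level.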
Determination of Sample Size
➢ Sample size is the number of units taken from the population for
the study.
➢ The larger the sample size, the smaller the chance error,
and vice versa.
➢ The sample represents the population completely when the sample size equals
the population size.
➢ A sample of any size can be taken, but it should always properly
represent the population.
Sample Size for Estimating a Population Mean
Sample size $(n) = \left( \frac{Z_{\alpha} \cdot \sigma}{e} \right)^2$, for infinite population
Where, n = sample size
$\sigma$ = population standard deviation
E/e/d = permissible error / allowable error, which is the
difference between the sample mean and population mean
$Z_{\alpha}$ = significant value or critical value of Z corresponding
to α level of significance
Sample size $(n) = \frac{(Z_{\alpha} \cdot \sigma)^2}{E^2 + \frac{(Z_{\alpha} \cdot \sigma)^2}{N}} = \frac{n_0}{1 + \frac{n_0}{N}}$, for finite population,
where $n_0 = \left( \frac{Z_{\alpha} \cdot \sigma}{E} \right)^2$ is the sample size for an infinite population
Note: In sample size determination, if the confidence level is
not given take 95%, and for almost certainty take $Z_{\alpha} = 3$.
Example: A manufacturing concern wants to estimate the average
amount of purchase of its product in a month by its customers. The
standard deviation is Rs. 10. Find the sample size if the maximum error is
not to exceed Rs. 3 with probability 0.99.
Solution:
Standard deviation $\sigma$ = Rs. 10, permissible error (e) = Rs. 3
Confidence level (1 – α) = 0.99 = 99%, α = 1%
Significant value ($Z_{\alpha}$) = 2.58
Sample size $(n) = \left( \frac{Z_{\alpha} \cdot \sigma}{e} \right)^2 = \frac{(2.58)^2 \cdot (10)^2}{3^2} = 73.96 \approx 74$
Hence the required sample size is 74.
Example: A health officer wishes to estimate the mean hemoglobin
level in a defined community. Preliminary information is that the mean is
about 150 mg/dl with a standard deviation of 30 mg/dl. If a sampling error
of up to 5 mg/dl in the estimate is to be tolerated, how many people should
be included in the study at the 95% confidence level? If the community to
be sampled has 1000 people, what should be the sample size?
Solution:
Mean hemoglobin level ($\bar{x}$) = 150 mg/dl, standard deviation ($\sigma$) = 30
mg/dl, allowable error (e) = 5 mg/dl,
Confidence level (1 – α) = 95%, population size (N) = 1000
Sample size $(n) = \left( \frac{Z_{\alpha} \cdot \sigma}{e} \right)^2 = \frac{(1.96)^2 \cdot (30)^2}{5^2} = 138.30 \approx 139$ (rounded up so that the error does not exceed 5 mg/dl).
Hence the minimum sample size is 139.
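Both sample size formulas can be wrapped in one small function. The sketch below is illustrative: `sample_size` is a hypothetical helper, and the result is rounded up with `ceil` because a fractional respondent is not possible and rounding down would let the error exceed the allowable limit. It also works the second part of the question, where the community has 1000 people, using the finite population formula.

```python
from math import ceil

def sample_size(z, sigma, e, N=None):
    """Sample size for estimating a mean; applies the finite population
    adjustment n0 / (1 + n0 / N) when a population size N is given."""
    n0 = (z * sigma / e) ** 2
    if N is None:
        return n0
    return n0 / (1 + n0 / N)

n_purchase = sample_size(2.58, 10, 3)             # purchase example, 99%
n_hb = sample_size(1.96, 30, 5)                   # hemoglobin, infinite population
n_hb_finite = sample_size(1.96, 30, 5, N=1000)    # hemoglobin, N = 1000

print(round(n_purchase, 2), ceil(n_purchase))
print(round(n_hb, 2), ceil(n_hb))
print(round(n_hb_finite, 2), ceil(n_hb_finite))
```

For the community of 1000 people the finite population formula gives about 121.5, that is, a sample of 122 people after rounding up.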
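If n is increased to 144, we have $n = \left( \frac{Z_{\alpha} \cdot \sigma}{e} \right)^2 = \left( \frac{Z_{\alpha} \cdot s}{e} \right)^2$
or, $144 = \left( \frac{Z_{\alpha} \cdot 4}{5} \right)^2$ or, $Z_{\alpha}^2 = 225$ ∴ $Z_{\alpha} = \pm 15$
Now, P(-15 < Z < 15) ≈ 1
and P(-15 < Z < 15) = 1 - α ∴ 1 ≈ 1 - α, so risk (α) ≈ 0
Hence, by increasing the sample size from 64 to 144, the risk will not be affected.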