Sampling Distribution and Estimation

Prepared by:
B. S. Parajuli
| Item | Population | Sample |
| --- | --- | --- |
| Definition | Collection of items under study | Part or portion of the population chosen for study |
| Characteristic | Parameter | Statistic |
| Size | N | n |
| Mean | μ | x̅ |
| Standard deviation (SD) | σ | s |
| Variance | σ² | s² |
| Correlation coefficient | ρ (rho) | r |
| Regression coefficient | β | b |
| Proportion | P | p |

Population proportion: $P = \dfrac{\text{no. of items in the population having some characteristic}}{\text{population size}} = \dfrac{a}{N}$
Sample proportion: $p = \dfrac{\text{no. of items in the sample having some characteristic}}{\text{sample size}} = \dfrac{x}{n}$
Some important Theorems on Simple random Sampling
 The sample mean is an unbiased estimate of the population mean:
$E(\bar{y}) = \bar{Y} = \mu$
 In simple random sampling without replacement (SRSWOR), the variance of the sample mean is given by
$V(\bar{y}) = \dfrac{\sigma^2}{n} \cdot \dfrac{N-n}{N-1}$, where N is the population size (finite population).
 In simple random sampling with replacement (SRSWR), the variance of the sample mean is given by
$V(\bar{y}) = \dfrac{\sigma^2}{n}$
Sampling Distribution

A population parameter is always a constant, whereas a sample statistic is a random variable. Every random variable must possess a probability distribution, and the probability distribution of a sample statistic is called its sampling distribution.
 Definition: “The distribution of all possible values that can be
assumed by some statistic, computed from samples of the same
size randomly drawn from the same population, is called the
sampling distribution of that statistic” (Daniel, 2010).
Construction of Sampling Distribution
 It may be constructed empirically when sampling from a discrete
finite population.
 To construct a sampling distribution we proceed as follows:
i. From a finite population of size N, randomly draw all possible
samples of size ‘n’.
ii. Compute the statistic of interest for each sample.
iii. List in one column the different distinct observed values of the
statistic and in another column list the corresponding
frequency of occurrence of each distinct observed value of the
statistic.
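The three steps above can be sketched in Python. This is only an illustration: the function name `sampling_distribution` and the four-value population are hypothetical, and samples are drawn without replacement.

```python
from collections import Counter
from itertools import combinations
from statistics import mean


def sampling_distribution(population, n, statistic=mean):
    """Frequency table of a statistic over all size-n samples (no replacement)."""
    # Steps i and ii: enumerate every possible sample and compute the statistic.
    values = [statistic(sample) for sample in combinations(population, n)]
    # Step iii: count how often each distinct value occurs.
    return Counter(values)


population = [1, 2, 3, 4]  # hypothetical population, N = 4
dist = sampling_distribution(population, 2)
for value in sorted(dist):
    print(value, dist[value])
```

Each printed row pairs a distinct value of the sample mean with its frequency, which is exactly the two-column listing described in step iii.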
Example 1
 A population consists of 5 numbers: 2, 3, 6, 8 and 11. Consider all samples of size 2 that can be drawn without replacement from this population.
(a) Find the mean and variance of the population.
(b) Show that the mean of the sample means is equal to the population mean.
(c) Find the variance of the sampling distribution of means and verify it with the formula $V(\bar{y}) = \dfrac{\sigma^2}{n} \cdot \dfrac{N-n}{N-1}$.
(d) Also show that the standard deviation of the sampling distribution of means is less than the population standard deviation.
Solution:
Population size (N) =5
Population: 2,3,6,8 and 11
Sample Size(n) =2
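(a) Calculation of population mean and population variance

| Population value (Y) | $Y - \bar{Y} = Y - 6$ | $(Y - \bar{Y})^2$ |
| --- | --- | --- |
| 2 | −4 | 16 |
| 3 | −3 | 9 |
| 6 | 0 | 0 |
| 8 | 2 | 4 |
| 11 | 5 | 25 |
| ∑Y = 30 | $\sum (Y - \bar{Y}) = 0$ | $\sum (Y - \bar{Y})^2 = 54$ |

Population mean: $\mu = \bar{Y} = \dfrac{\sum Y}{N} = \dfrac{30}{5} = 6$
Population variance: $\sigma^2 = \dfrac{\sum (Y - \bar{Y})^2}{N} = \dfrac{54}{5} = 10.8$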
(b) Number of possible samples of size 2 that can be drawn from the population without replacement = $^{N}C_{n} = {}^{5}C_{2} = 10$.
Possible samples are: (2,3), (2,6), (2,8), (2,11), (3,6), (3,8), (3,11), (6,8), (6,11), (8,11)
Calculation of mean and variance of sampling distribution of sample means

| Sample number | Sample values (y) | Sample mean ($\bar{y}$) | $\bar{y} - \bar{\bar{y}} = \bar{y} - 6$ | $(\bar{y} - \bar{\bar{y}})^2$ |
| --- | --- | --- | --- | --- |
| 1 | (2,3) | 2.5 | −3.5 | 12.25 |
| 2 | (2,6) | 4 | −2 | 4 |
| 3 | (2,8) | 5 | −1 | 1 |
| 4 | (2,11) | 6.5 | 0.5 | 0.25 |
| 5 | (3,6) | 4.5 | −1.5 | 2.25 |
| 6 | (3,8) | 5.5 | −0.5 | 0.25 |
| 7 | (3,11) | 7 | 1 | 1 |
| 8 | (6,8) | 7 | 1 | 1 |
| 9 | (6,11) | 8.5 | 2.5 | 6.25 |
| 10 | (8,11) | 9.5 | 3.5 | 12.25 |
| $^{N}C_{n} = 10$ | | $\sum \bar{y} = 60$ | $\sum (\bar{y} - \bar{\bar{y}}) = 0$ | $\sum (\bar{y} - \bar{\bar{y}})^2 = 40.5$ |

Now, mean of sample means: $\bar{\bar{y}} = \dfrac{\sum \bar{y}}{^{N}C_{n}} = \dfrac{60}{10} = 6$
Population mean μ = 6
Therefore, the mean of the sample means is equal to the population mean, i.e. $\bar{\bar{y}} = \mu = 6$.
$E(\bar{y}) = \mu$, i.e. the sample mean is an unbiased estimate of the population mean.
(c) Variance of the sample mean: $V(\bar{y}) = \sigma_{\bar{y}}^2 = \dfrac{\sum (\bar{y} - \bar{\bar{y}})^2}{^{N}C_{n}} = \dfrac{40.5}{10} = 4.05$
Variance with formula: $V(\bar{y}) = \dfrac{\sigma^2}{n} \cdot \dfrac{N-n}{N-1} = \dfrac{10.8}{2} \cdot \dfrac{5-2}{5-1} = 4.05$
Hence verified.
(d) The standard deviation of the sampling distribution of the sample mean is
$\sigma_{\bar{y}}$ = standard error of mean = $\text{S.E.}(\bar{y}) = \sqrt{V(\bar{y})} = \sqrt{4.05} = 2.01$
Population standard deviation: $\sigma = \sqrt{10.8} = 3.29$
Here $\sigma_{\bar{y}} < \sigma$; hence the standard deviation of the sampling distribution of the sample mean is smaller than the population standard deviation.
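As a cross-check of Example 1, the enumeration can be reproduced with the Python standard library. The variable names are illustrative; `pvariance` is the population variance (divisor N), which matches the definition used in the solution.

```python
from itertools import combinations
from math import isclose
from statistics import mean, pvariance

population = [2, 3, 6, 8, 11]
N, n = len(population), 2

mu = mean(population)            # population mean
sigma2 = pvariance(population)   # population variance, divisor N

# All 10 samples of size 2 drawn without replacement, and their means.
sample_means = [mean(s) for s in combinations(population, n)]

mean_of_means = mean(sample_means)
var_of_means = pvariance(sample_means)
var_formula = (sigma2 / n) * (N - n) / (N - 1)   # SRSWOR formula

print(mu, sigma2)                    # population mean and variance
print(mean_of_means, var_of_means)   # mean and variance of the sample means
print(isclose(var_of_means, var_formula))
```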
Example 2
 A population consists of 4 numbers: 1, 2, 5 and 8. Consider all samples of size 2 that can be drawn with replacement from this population.
(a) Find the mean and variance of the population.
(b) Show that the mean of the sample means is equal to the population mean.
(c) Find the variance of the sampling distribution of means and verify it with the formula $V(\bar{y}) = \dfrac{\sigma^2}{n}$.
(d) Also show that the standard deviation of the sampling distribution of means is less than the population standard deviation.
Solution:
Population size (N) =4
Population: 1, 2, 5 & 8
Sample Size(n) =2
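(a) Calculation of population mean and population variance

| Population value (Y) | $Y - \bar{Y} = Y - 4$ | $(Y - \bar{Y})^2$ |
| --- | --- | --- |
| 1 | −3 | 9 |
| 2 | −2 | 4 |
| 5 | 1 | 1 |
| 8 | 4 | 16 |
| ∑Y = 16 | $\sum (Y - \bar{Y}) = 0$ | $\sum (Y - \bar{Y})^2 = 30$ |

Population mean: $\mu = \bar{Y} = \dfrac{\sum Y}{N} = \dfrac{16}{4} = 4$
Population variance: $\sigma^2 = \dfrac{\sum (Y - \bar{Y})^2}{N} = \dfrac{30}{4} = 7.5$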
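(b) Number of possible samples of size 2 that can be drawn from the population with replacement = $N^n = 4^2 = 16$
Possible samples are:
(1,1), (1,2), (1,5), (1,8), (2,1), (2,2), (2,5), (2,8),
(5,1), (5,2), (5,5), (5,8), (8,1), (8,2), (8,5), (8,8)
Calculation of mean and variance of sampling distribution of sample means:

| Sample number | Sample values (y) | Sample mean ($\bar{y}$) | $\bar{y} - \bar{\bar{y}} = \bar{y} - 4$ | $(\bar{y} - \bar{\bar{y}})^2$ |
| --- | --- | --- | --- | --- |
| 1 | (1,1) | 1 | −3 | 9 |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 16 | (8,8) | 8 | 4 | 16 |
| $N^n = 16$ | | $\sum \bar{y} = 64$ | $\sum (\bar{y} - \bar{\bar{y}}) = 0$ | $\sum (\bar{y} - \bar{\bar{y}})^2 = 60$ |

The remaining steps, including parts (c) and (d), are the same as in Example 1.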
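The rows left blank in Example 2 can be generated by enumeration. The sketch below assumes that ordered samples are counted, as in the list of 16 samples above, and uses `itertools.product` to draw with replacement; the variable names are illustrative.

```python
from itertools import product
from math import isclose
from statistics import mean, pvariance

population = [1, 2, 5, 8]
n = 2

mu = mean(population)            # population mean
sigma2 = pvariance(population)   # population variance, divisor N

# All N**n ordered samples drawn with replacement, and their means.
samples = list(product(population, repeat=n))
sample_means = [mean(s) for s in samples]

mean_of_means = mean(sample_means)
var_of_means = pvariance(sample_means)

for number, (s, m) in enumerate(zip(samples, sample_means), start=1):
    print(number, s, m, m - mean_of_means, (m - mean_of_means) ** 2)

print(mean_of_means, var_of_means, sigma2 / n)
```

The last line compares the variance of the sample means with σ²/n, the SRSWR formula that part (c) asks to verify.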
Conclusion
 Population data are uniformly distributed and sample means are symmetrically distributed.
 The population mean and the mean of the sample means are equal.
 The sample mean is an unbiased estimator of the population mean.
 The variance of the sample means is not equal to the population variance.
 The spread of the sample means in the sampling distribution is smaller than the spread of the population values.
 The shape of the sampling distribution of the sample means tends to be bell-shaped and approximates the normal distribution, even when the population is not normally distributed, provided that the sample size is reasonably large.
The Central Limit Theorem
 The central limit theorem states that, “When the size of the sample increases and becomes sufficiently large, the sampling distribution of the mean ($\bar{X}$) will be approximately normally distributed with mean $\mu$ and variance $\dfrac{\sigma^2}{n}$.”
 If $X_i$ (i = 1, 2, …, n) are independent random variables such that $E(X_i) = \mu_i$ and $\mathrm{Var}(X_i) = \sigma_i^2$, then it can be proved that, under certain very general conditions, the random variable $S_n = X_1 + X_2 + \dots + X_n$ is asymptotically normal with mean $\mu = \sum_{i=1}^{n} \mu_i$ and S.D. $\sigma = \sqrt{\sum_{i=1}^{n} \sigma_i^2}$.
 A mathematical formulation of the central limit theorem is that the distribution of $\dfrac{\text{Statistic} - \text{Parameter}}{\text{Standard error}} = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}$ approaches a normal distribution with mean 0 (zero) and variance 1 (one) as $n \to \infty$.
 Note: The central limit theorem allows us to sample from non-normally distributed populations with a guarantee of approximately the same results as would be obtained if the populations were normally distributed, provided that we take a large sample.
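A small simulation can make the theorem concrete. The sketch below is an illustration only: the exponential population (mean 1, variance 1), the sample size 50, the number of trials and the helper name `simulate_sample_means` are all assumptions chosen for the demonstration, not part of the slides.

```python
import random
from statistics import mean, pvariance


def simulate_sample_means(draw, n, trials, seed=0):
    """Return the means of `trials` independent samples of size n."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [mean(draw(rng) for _ in range(n)) for _ in range(trials)]


# Skewed, non-normal population: exponential with rate 1 (mu = 1, sigma^2 = 1).
n = 50
means = simulate_sample_means(lambda rng: rng.expovariate(1.0), n=n, trials=2000)

# The centre should be close to mu = 1 and the spread close to sigma^2 / n = 0.02.
print(mean(means), pvariance(means))
```

A histogram of `means` would look roughly bell-shaped even though individual exponential draws are strongly right-skewed.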
Standard Error
 The standard deviation of sampling distribution of a sample statistic is known
as standard error and it is abbreviated as S. E.
 It is a statistical term that measures the accuracy with which a sample
represents a population.
 It means standard error measures chance deviation and not an error or
mistake.

Use of standard error:

 To work out the limits within which the population mean would lie.
 To determine whether the sample is drawn from a known population or not, when the mean is known.
 To test whether the difference between the means of two samples is real and statistically significant, or only apparent and due to chance.
 To calculate the size of sample.
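| Statistic | Standard Error (S.E.) | Conditions |
| --- | --- | --- |
| Mean | $\text{S.E.}(\bar{X}) = \dfrac{\sigma}{\sqrt{n}}$ | For infinite population (or if the sample is drawn with replacement) |
| Mean | $\text{S.E.}(\bar{X}) = \dfrac{\sigma}{\sqrt{n}} \cdot \sqrt{\dfrac{N-n}{N-1}}$ | For finite population |
| Mean | $\text{S.E.}(\bar{X}) = \dfrac{s}{\sqrt{n}}$ | When σ is unknown and population size is infinite |
| Mean | $\text{S.E.}(\bar{X}) = \dfrac{s}{\sqrt{n}} \cdot \sqrt{\dfrac{N-n}{N-1}}$ | For finite population (sample is drawn without replacement) |
| Proportion | $\text{S.E.}(p) = \sqrt{\dfrac{PQ}{n}}$ | For infinite population, where P + Q = 1 |
| Proportion | $\text{S.E.}(p) = \sqrt{\dfrac{pq}{n}}$ | For infinite population when P and Q are unknown, where p + q = 1 |
| Proportion | $\text{S.E.}(p) = \sqrt{\dfrac{PQ}{n}} \cdot \sqrt{\dfrac{N-n}{N-1}}$ | For finite population (N is given) |
| Proportion | $\text{S.E.}(p) = \sqrt{\dfrac{pq}{n}} \cdot \sqrt{\dfrac{N-n}{N-1}}$ | When P and Q are unknown but N is given |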
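| Statistic | Standard Error (S.E.) | Conditions |
| --- | --- | --- |
| Difference of means | $\text{S.E.}(\bar{X}_1 - \bar{X}_2) = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$ | If $\sigma_1$ and $\sigma_2$ are known |
| Difference of means | $\text{S.E.}(\bar{X}_1 - \bar{X}_2) = \sqrt{\sigma^2 \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$ | If the two samples are drawn from the same population ($\sigma_1 = \sigma_2 = \sigma$) |
| Difference of means | $\text{S.E.}(\bar{X}_1 - \bar{X}_2) = \sqrt{S^2 \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$ | If $\sigma_1$ and $\sigma_2$ are not known, use the combined variance $S^2 = \dfrac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2} = \dfrac{\sum (X_1 - \bar{X}_1)^2 + \sum (X_2 - \bar{X}_2)^2}{n_1 + n_2 - 2}$, where $s_1^2$ and $s_2^2$ are the sample variances computed with divisors $n_1$ and $n_2$ |
| Difference of proportions | $\text{S.E.}(p_1 - p_2) = \sqrt{\dfrac{P_1 Q_1}{n_1} + \dfrac{P_2 Q_2}{n_2}}$ | If the two population proportions are known |
| Difference of proportions | $\text{S.E.}(p_1 - p_2) = \sqrt{\hat{P}\hat{Q} \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$ | If the two population proportions are equal ($P_1 = P_2 = P$); combined proportion $\hat{P} = \dfrac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \dfrac{X_1 + X_2}{n_1 + n_2}$, $\hat{Q} = 1 - \hat{P}$ |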
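A few of the tabulated standard errors can be written as small helper functions. This is a sketch under stated assumptions: the function names (`fpc`, `se_mean`, `se_proportion`, `se_diff_means`) and the numbers in the usage lines are hypothetical, and the caller decides whether σ or s, and P or p, is passed in.

```python
from math import sqrt


def fpc(N, n):
    """Finite population correction factor sqrt((N - n) / (N - 1))."""
    return sqrt((N - n) / (N - 1))


def se_mean(sd, n, N=None):
    """S.E. of the mean; pass N for sampling without replacement from a finite population."""
    se = sd / sqrt(n)
    return se * fpc(N, n) if N is not None else se


def se_proportion(p, n, N=None):
    """S.E. of a proportion, with the optional finite population correction."""
    se = sqrt(p * (1 - p) / n)
    return se * fpc(N, n) if N is not None else se


def se_diff_means(sd1, n1, sd2, n2):
    """S.E. of the difference of two means when both standard deviations are known."""
    return sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)


# Hypothetical inputs, for illustration only.
print(se_mean(10, 400))             # infinite population
print(se_mean(10, 400, N=2000))     # finite population, smaller S.E.
print(se_proportion(0.2, 400))
print(se_diff_means(3, 36, 4, 64))
```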
A marketing manager in an organization needs to estimate the likely market share the company can achieve in the marketplace.

A quality assurance manager may be interested in estimating the proportion of defective units in the finished product before shipment to the customer.

A manager of the credit department needs to estimate the average collection period for collecting dues from the customers.
Estimation
 Estimation: The statistical technique of estimating unknown population parameters from the corresponding sample statistics is known as estimation.
 Estimator: A function (or algebraic expression) which uses sample information to estimate a population parameter is known as an estimator.
Suppose the sample mean (x̅) is used to estimate the population mean (μ). Here the population mean (μ) is the parameter to be estimated, x̅ = ΣX/n is an estimator, which is a function (or formula) of the sample values, and the numerical value of x̅ for a particular sample is an estimate of the population parameter (μ).
 Estimate: A specific numerical value of an estimator is called an estimate.
 Point Estimate: A point estimate is a single numerical value used to estimate the corresponding population parameter. For example: The number of subscribers of Namaste mobile in the next year will be 85,00,000, as estimated by the general manager of Nepal Telecom.
 Interval Estimate: An interval estimate consists of two numerical values defining a range of values that, with a specified degree of confidence, most likely includes the parameter being estimated. For example: The number of subscribers of Namaste mobile in the next year will be 80,00,000 to 90,00,000, as estimated by the general manager of Nepal Telecom.
Criteria of good estimator
 Unbiasedness: An estimator is said to be unbiased if the expected value of the sample statistic is equal to the population parameter. An estimator, say T, of the parameter ‘θ’ is said to be an unbiased estimator of ‘θ’ if E(T) = θ. For example: If the expected sample mean is equal to the population mean, i.e. $E(\bar{X}) = \mu$, then the sample mean is said to be an unbiased estimator of the population mean. Similarly, E(p) = P. Hence the sample mean and the sample proportion are unbiased estimators.
 Note: (a) If $E(\bar{X}) - \mu \neq 0$, the estimator is called biased.
(b) If $E(\bar{X}) - \mu > 0$, it is called positively biased.
(c) If $E(\bar{X}) - \mu < 0$, it is called negatively biased.
 Consistency: A statistic is considered to be a consistent estimator of the population parameter if, as the sample size increases, the sample value comes closer to the population parameter. Thus a consistent estimator is more reliable with a large sample. An estimator T calculated from a sample is said to be a consistent estimator of a parameter θ if $T \to \theta$ as $n \to \infty$. Example: The sample mean comes closer to the population mean as the sample size increases.
 Efficiency: Efficiency refers to the size of the standard error of the sample statistic. The estimator with the smaller variance is considered the more efficient estimator. Let t1 and t2 be two consistent estimators of the parameter ‘θ’ such that Var(t1) < Var(t2) for all n; then t1 is said to be more efficient than t2, and the efficiency is $E = \dfrac{\mathrm{Var}(t_1)}{\mathrm{Var}(t_2)}$. Example: The standard error of the sample mean is smaller than that of the sample median or mode, so the sample mean is an efficient estimator of the population mean.
 Sufficiency: An estimator is said to be a sufficient estimator if it uses all the information about the population parameter contained in the sample. For example: The sample mean ($\bar{X}$) is a sufficient estimator of the population mean (μ) because it uses all the information given in the sample, whereas the median and the mode use only part of that information (in grouped data, only a few classes enter their formulas). Therefore the mean is a more sufficient estimator than the median and the mode.
 Confidence interval: The interval within which the unknown value of the population parameter is expected to lie is known as the confidence interval. The limits within which the parameter value is estimated to lie are called confidence limits (or fiducial limits). The lower limit $X_L$ and the upper limit $X_U$ of the interval are the confidence limits.

[Figure: sampling distribution centred on θ, with the acceptance region of area (1 − α) between $X_L$ and $X_U$ and an area of α/2 in each tail.]
 Confidence level: The probability that we associate with an interval estimate is called the confidence level. This probability indicates how confident we are that the interval estimate will include the population parameter. It is denoted by 1 − α. Example: A 95% confidence level indicates that there is a 95% probability that the interval estimate includes the population parameter and a 5% risk that the parameter lies outside the confidence limits.

 Level of significance: The maximum probability of error (risk) that we are willing to tolerate in decision making based on sample evidence is called the level of significance. It is denoted by ‘α’ (alpha).

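| Confidence level (1 − α) | 50% | 90% | 95% | 96% | 97% | 98% | 99% | 99.73% |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Level of significance / risk (α) | 50% | 10% | 5% | 4% | 3% | 2% | 1% | 0.27% |
| Value of Z (two-tailed) | 0.6745 | 1.645 | 1.96 | 2.05 | 2.17 | 2.33 | 2.58 | 3 |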
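The two-tailed Z values in the table can be reproduced from the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `z_two_tailed` is illustrative.

```python
from statistics import NormalDist


def z_two_tailed(confidence):
    """Critical value z such that P(-z < Z < z) equals the confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.50, 0.90, 0.95, 0.96, 0.97, 0.98, 0.99, 0.9973):
    print(f"{level:.2%}  alpha = {1 - level:.2%}  z = {z_two_tailed(level):.4f}")
```

The tabulated values are these results rounded to two to four decimal places.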
Confidence Interval for Large Samples (n > 30)
Confidence interval estimate of the population mean from a large sample:
$\text{C.I.}(\mu) = \bar{X} \pm Z_\alpha \cdot \text{S.E.}(\bar{X})$
$= \bar{X} \pm Z_\alpha \cdot \dfrac{\sigma}{\sqrt{n}}$ [When σ is unknown, we use $\hat{\sigma} = s$ for large samples]
$= \bar{X} \pm Z_\alpha \cdot \dfrac{\sigma}{\sqrt{n}} \cdot \sqrt{\dfrac{N-n}{N-1}}$ [In case of simple random sampling without replacement from a finite population of size N]

Confidence interval estimate of the population proportion from a large sample:
$\text{C.I.}(P) = p \pm Z_\alpha \cdot \text{S.E.}(p)$
$= p \pm Z_\alpha \cdot \sqrt{\dfrac{PQ}{n}}$ [When P is unknown, we use $\hat{P} = p$ for large samples]
$= p \pm Z_\alpha \cdot \sqrt{\dfrac{PQ}{n}} \cdot \sqrt{\dfrac{N-n}{N-1}}$ [In case of simple random sampling without replacement from a finite population of size N]
$= p \pm Z_\alpha \cdot \sqrt{\dfrac{pq}{n}} \cdot \sqrt{\dfrac{N-n}{N-1}}$ [Used when P and Q are unknown but N is finite]
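Example: A sample of 400 students taking the entrance exam for B.Sc. CSIT revealed an average score of 56 and a sample standard deviation of 10. Construct 98% and 99% confidence intervals for the population mean.
Solution:
With usual notation, n = 400, $\bar{X} = 56$, s = 10
For 98% confidence level: 1 − α = 98%, α = 2%, $Z_\alpha = 2.33$
Hence the 98% confidence limits for the population mean are given by
$\bar{X} \pm Z_\alpha \cdot \text{S.E.}(\bar{X}) = \bar{X} \pm Z_\alpha \cdot \dfrac{s}{\sqrt{n}}$
$= 56 \pm 2.33 \times \dfrac{10}{\sqrt{400}}$
= 56 ± 1.165
= (56 − 1.165, 56 + 1.165) = (54.835, 57.165)
For 99% confidence level: 1 − α = 99%, α = 1%, $Z_\alpha = 2.58$
$56 \pm 2.58 \times \dfrac{10}{\sqrt{400}} = 56 \pm 1.29 = (54.71, 57.29)$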
Example: From a population of 540, a sample of 60 individuals is taken. From this sample the mean is found to be 6.2 and the standard deviation 1.368.
(i) Find the standard error of the mean.
(ii) Construct a 96% confidence interval for the mean.
Solution:
With usual notations, N = 540, n = 60, $\bar{X} = 6.2$, s = 1.368
(i) Standard error of the mean: $\text{S.E.}(\bar{X}) = \dfrac{s}{\sqrt{n}} \cdot \sqrt{\dfrac{N-n}{N-1}}$ [for a large sample, $\hat{\sigma} = s$]
$= \dfrac{1.368}{\sqrt{60}} \cdot \sqrt{\dfrac{540-60}{540-1}}$
= 0.17
(ii) For 96% confidence level, 1 − α = 96%, α = 4%, $Z_\alpha = 2.05$
Hence the 96% confidence interval for the mean is given by
$\bar{X} \pm Z_\alpha \cdot \text{S.E.}(\bar{X})$
= 6.2 ± 2.05 × 0.17
= 6.2 ± 0.3485
= (5.85, 6.55)
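The two worked examples above follow the same recipe, which can be sketched as one function. The name `mean_ci` and its parameter names are illustrative, and the Z values are taken from the table of critical values rather than computed.

```python
from math import sqrt


def mean_ci(xbar, s, n, z, N=None):
    """Large-sample interval xbar ± z * S.E.; pass N to apply the finite population correction."""
    se = s / sqrt(n)
    if N is not None:
        se *= sqrt((N - n) / (N - 1))
    margin = z * se
    return xbar - margin, xbar + margin


# Entrance exam example: n = 400, mean 56, s = 10, 98% level (Z = 2.33).
print(mean_ci(56, 10, 400, 2.33))

# Finite population example: N = 540, n = 60, mean 6.2, s = 1.368, 96% level (Z = 2.05).
print(mean_ci(6.2, 1.368, 60, 2.05, N=540))
```

The second interval comes out slightly narrower than (5.85, 6.55) because the hand calculation rounds the standard error up to 0.17 before multiplying by Z.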
Example: In a laboratory experiment to test whether a material is in good condition, a sample of 400 units was drawn. When they were tested, 80 were good. Find the 95% confidence limits for the percentage of good units.
Solution:
Sample size (n) = 400, number of good units (x) = 80
Sample proportion: $p = \dfrac{x}{n} = \dfrac{80}{400} = 0.2$, q = 1 − p = 1 − 0.2 = 0.8
For 95% confidence limits, (1 − α) = 95%, α = 5%, $Z_\alpha = 1.96$
The 95% confidence interval for the population proportion (P) is given by
$p \pm Z_\alpha \cdot \text{S.E.}(p) = p \pm 1.96 \times \sqrt{\dfrac{pq}{n}}$
$= 0.2 \pm 1.96 \times \sqrt{\dfrac{0.2 \times 0.8}{400}}$
= 0.2 ± 0.039 = (0.161, 0.239), i.e. between 16.1% and 23.9%
Example: A factory produces 5000 CDs daily. From a sample of 500 CDs, 2% were found to be of substandard quality. Estimate the percentage of CDs that can reasonably be expected to be spoiled in the daily production at the 95% confidence level.
Solution:
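The slides leave this solution blank. One way to work it, assuming the finite-population formula for a proportion given earlier (N = 5000, n = 500, p = 0.02, Z = 1.96), is sketched below; the function name `proportion_ci` is hypothetical, and the same function also reproduces the previous example.

```python
from math import sqrt


def proportion_ci(p, n, z, N=None):
    """Large-sample interval p ± z * sqrt(pq/n), with optional finite population correction."""
    se = sqrt(p * (1 - p) / n)
    if N is not None:
        se *= sqrt((N - n) / (N - 1))
    margin = z * se
    return p - margin, p + margin


# Previous example: 80 good units out of 400, 95% level.
print(proportion_ci(0.2, 400, 1.96))

# CD example: 2% substandard in a sample of 500 from a daily output of 5000.
low, high = proportion_ci(0.02, 500, 1.96, N=5000)
print(f"{low:.2%} to {high:.2%}")
```

Under these assumptions the interval for the CD example works out to roughly 0.8% to 3.2% substandard.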
Confidence Interval for Small Samples (n ≤ 30)

 Confidence intervals using ‘t’: The general procedure for constructing confidence intervals is:
 estimator ± (reliability coefficient) × (standard error of the estimate)
 The reliability coefficient is obtained from the table of the t-distribution rather than from the table of the standard normal distribution. When sampling is from a normal distribution whose standard deviation, σ, is unknown, the confidence interval is given by:
 $\text{C.I.}(\mu) = \bar{X} \pm t_{\alpha, n-1} \cdot \dfrac{s}{\sqrt{n}}$ [For unbiased estimator / when actual data is given]
 $\text{C.I.}(\mu) = \bar{X} \pm t_{\alpha, n-1} \cdot \dfrac{s}{\sqrt{n-1}}$ [For biased estimator / when actual data is not given]
 Degrees of freedom: The number of independent observations in a sample is called the degrees of freedom. It is defined as the difference between the total number of items and the total number of constraints. The t-distribution used here has (n − 1) degrees of freedom.
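Computation of S² for Numerical Problems:

 $S^2 = \dfrac{1}{n-1} \sum (x - \bar{x})^2$
 $S^2 = \dfrac{1}{n-1} \left[ \sum x^2 - \dfrac{(\sum x)^2}{n} \right]$
 $S^2 = \dfrac{1}{n-1} \left[ \sum d^2 - \dfrac{(\sum d)^2}{n} \right]$, where d = x − A and $\bar{x} = A + \dfrac{\sum d}{n}$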

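| Level of significance for one-tailed test | 0.10 | 0.05 | 0.025 | 0.01 | 0.005 | 0.0005 |
| --- | --- | --- | --- | --- | --- | --- |
| Level of significance for two-tailed test | 0.20 | 0.10 | 0.05 | 0.02 | 0.01 | 0.001 |
| d.f. = 1 | 3.078 | 6.314 | 12.706 | 31.821 | 63.657 | 636.619 |
| ⋮ | | | | | | |
| d.f. = 24 | | | 2.064 | | | |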
Example: A random sample of size 25 showed a mean of 172.50 cm with a standard deviation of 15.40 cm. Determine the 95% confidence interval for the mean of the population.
Solution:
95% confidence interval for population mean:
n = 25, $\bar{X} = 172.50$, s = 15.40
1 − α = 0.95, α = 0.05, d.f. = n − 1 = 25 − 1 = 24
$t_{\alpha, n-1} = t_{0.05, 24} = 2.064$
The 95% C.I. for the population mean (μ) is
$\bar{X} \pm t_{\alpha, n-1} \cdot \dfrac{s}{\sqrt{n-1}}$
$= 172.50 \pm 2.064 \times \dfrac{15.40}{\sqrt{25-1}}$
= 172.50 ± 6.49
= (172.50 − 6.49, 172.50 + 6.49)
= (166.01, 178.99)
 A machine produces metal rods used in an automobile suspension system. A random sample of 6 rods is selected and the diameter of each is measured. The measured diameters (in mm) are as follows; assume that the sample is drawn from a normally distributed population.
8.24 8.26 8.20 8.28 8.21 8.23

Find the 95% two-sided confidence interval on the mean rod diameter and interpret the result with reference to the given problem.
Solution:
Calculate the sample s.d. (s) from the given data and use the formula
$\bar{X} \pm t_{\alpha, n-1} \cdot \dfrac{s}{\sqrt{n}}$, here d.f. = n − 1 = 6 − 1 = 5

| X | X² |
| --- | --- |
| 8.24 | 67.8976 |
| 8.26 | 68.2276 |
| 8.20 | 67.24 |
| 8.28 | 68.5584 |
| 8.21 | 67.4041 |
| 8.23 | 67.7329 |
| ∑X = 49.42 | ∑X² = 407.0606 |
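The small-sample procedure can be sketched as below for both examples. Two things are assumptions here: the helper name `t_interval` with its `s_is_biased` flag, and the critical value 2.571 for 5 degrees of freedom at the two-tailed 0.05 level, which is read from a standard t table because the table excerpt in the slides only shows rows for 1 and 24 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev


def t_interval(xbar, s, n, t_crit, s_is_biased=False):
    """xbar ± t * s / sqrt(n), or s / sqrt(n - 1) when s was computed with divisor n."""
    divisor = n - 1 if s_is_biased else n
    margin = t_crit * s / sqrt(divisor)
    return xbar - margin, xbar + margin


# Height example: summary figures only, so the s / sqrt(n - 1) form is used.
print(t_interval(172.50, 15.40, 25, 2.064, s_is_biased=True))

# Rod diameters: raw data given, so s uses divisor n - 1 (statistics.stdev).
rods = [8.24, 8.26, 8.20, 8.28, 8.21, 8.23]
low, high = t_interval(mean(rods), stdev(rods), len(rods), 2.571)
print(round(low, 3), round(high, 3))
```

Under these assumptions, the rod interval would be read as: we can be 95% confident that the mean diameter of rods produced by the machine lies between the two printed limits.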
Determination of Sample Size

➢ Sample size is the number of units taken from the population for the study.
➢ The larger the sample size, the smaller the chance (sampling) error, and vice versa.
➢ The sample represents the population completely when the sample size equals the population size.
➢ A sample of any size can be taken, but it should always properly represent the population.
Sample Size for Estimating a Population Mean
Sample size $n = \left( \dfrac{Z_\alpha \cdot \sigma}{e} \right)^2$, for infinite population
Where, n = sample size
σ = population standard deviation
E/e/d = permissible error / allowable error, which is the difference between the sample mean and the population mean
$Z_\alpha$ = significant value or critical value of Z corresponding to the α level of significance
Sample size $n = \dfrac{(Z_\alpha \cdot \sigma)^2}{E^2 + \dfrac{(Z_\alpha \cdot \sigma)^2}{N}} = \dfrac{n_0}{1 + \dfrac{n_0}{N}}$, for finite population, where $n_0 = \left( \dfrac{Z_\alpha \cdot \sigma}{E} \right)^2$ is the sample size for an infinite population

Note: In sample size determination, if the confidence level is not given, take 95%; for almost certainty, take $Z_\alpha = 3$.
Example: A manufacturing concern wants to estimate the average amount of purchase of its product in a month by the customers. The standard deviation is Rs. 10. Find the sample size if the maximum error is not to exceed Rs. 3 with a probability of 0.99.
Solution:
Standard deviation σ = Rs. 10, Permissible error (e) = Rs. 3
Confidence level (1 − α) = 0.99 = 99%, α = 1%
Significant value ($Z_\alpha$) = 2.58
Sample size $n = \left( \dfrac{Z_\alpha \cdot \sigma}{e} \right)^2 = \dfrac{(2.58)^2 \cdot (10)^2}{3^2} = 73.96 \approx 74$
Hence the required sample size is 74.
Example: A health officer wishes to estimate the mean hemoglobin level in a defined community. Preliminary information is that the mean is about 150 mg/dl with a standard deviation of 30 mg/dl. If a sampling error of up to 5 mg/dl in the estimate is to be tolerated, how many people should be included in the study at the 95% confidence level? If the community to be sampled has 1000 people, what should be the sample size?
Solution:
Mean hemoglobin level (x̅) = 150 mg/dl, Standard deviation (σ) = 30 mg/dl, Allowable error (e) = 5 mg/dl,
Confidence level (1 − α) = 95%, so $Z_\alpha = 1.96$; Population size (N) = 1000
Sample size $n = \left( \dfrac{Z_\alpha \cdot \sigma}{e} \right)^2 = \dfrac{(1.96)^2 \cdot (30)^2}{5^2} = 138.30 \approx 138$
Hence the minimum sample size is 138.

When the population size is given, i.e. N = 1000, and writing $n_0 = 138$ for the sample size found above,
$n = \dfrac{n_0}{1 + \dfrac{n_0}{N}} = \dfrac{138}{1 + \dfrac{138}{1000}} = 121.27 \approx 121$
Hence the minimum required sample size is 121 when the population size is 1000.
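The sample-size formula for a mean, with and without the finite population adjustment, can be sketched as follows. The function name `sample_size_mean` is illustrative, and it returns the unrounded value so that the rounding rule stays visible: the slides round to the nearest whole number, whereas rounding up is the more conservative choice.

```python
def sample_size_mean(z, sigma, e, N=None):
    """n0 = (z * sigma / e)**2, adjusted to n0 / (1 + n0 / N) for a finite population."""
    n0 = (z * sigma / e) ** 2
    if N is None:
        return n0
    return n0 / (1 + n0 / N)


# Purchase example: sigma = 10, e = 3, 99% level (Z = 2.58).
print(sample_size_mean(2.58, 10, 3))

# Hemoglobin example: sigma = 30, e = 5, 95% level (Z = 1.96).
print(sample_size_mean(1.96, 30, 5))
print(sample_size_mean(1.96, 30, 5, N=1000))
```

The finite-population result here is about 121.5 rather than 121.27 because the function carries the unrounded 138.30 into the adjustment, while the hand calculation starts from the rounded 138.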
Sample Size for Estimating a Population Proportion
Sample size $n = \left( \dfrac{Z_\alpha}{e} \right)^2 \cdot PQ = \dfrac{Z_\alpha^2 \, PQ}{e^2}$, for infinite population size.
Where, E/d/e = permissible error / allowable error
$Z_\alpha$ = significant value or critical value of Z corresponding to the α level of significance
P = population proportion (if not given, we use $\hat{P} = p$)
p = sample proportion
P + Q = 1, p + q = 1
 $n = \dfrac{n_0}{1 + \dfrac{n_0}{N}}$, for finite population, where $n_0 = \left( \dfrac{Z_\alpha}{e} \right)^2 \cdot PQ$ and N = population size
If the prevalence rate (P or p) is unknown or no previous study is available, we can use P or p = 0.5 (50%).
Example: It is desired to estimate the proportion of children watching
television on Saturday morning in order to develop a promotional
strategy for electronics games. We want to be 95% confident that our
estimate will be within ±2% of the true population proportion.
(i) What sample size should we take if a previous survey showed that
40% of children watched television on Saturday mornings?
(ii) What would the sample size be for the same degree of confidence and the same maximum allowable error if no such previous survey had been taken?
Solution:
(i) Confidence level (1 − α) = 95%, α = 5%, $Z_\alpha = 1.96$
P = 0.40, Q = 1 − P = 1 − 0.40 = 0.60, Error (e) = 0.02
Now we know that
sample size $n = \dfrac{Z_\alpha^2 \, PQ}{e^2} = \dfrac{(1.96)^2 \times 0.4 \times 0.6}{(0.02)^2} = 2304.96 \approx 2305$
(ii) Since no previous study has been taken, we assume P = 0.5, Q = 1 − 0.5 = 0.5
Now,
sample size $n = \dfrac{Z_\alpha^2 \, PQ}{e^2} = \dfrac{(1.96)^2 \times 0.5 \times 0.5}{(0.02)^2} = 2401$
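Both parts of this example use the same formula, which can be sketched as a function. The name `sample_size_proportion` is hypothetical, and the default `p=0.5` mirrors the rule of using 50% when no previous study is available.

```python
def sample_size_proportion(z, e, p=0.5, N=None):
    """n0 = z**2 * p * q / e**2, adjusted to n0 / (1 + n0 / N) for a finite population."""
    n0 = z ** 2 * p * (1 - p) / e ** 2
    if N is None:
        return n0
    return n0 / (1 + n0 / N)


# (i) Previous survey available: P = 0.40, e = 0.02, Z = 1.96.
print(round(sample_size_proportion(1.96, 0.02, p=0.40)))

# (ii) No previous survey: P = 0.5 by default.
print(round(sample_size_proportion(1.96, 0.02)))
```

Because pq is largest at p = 0.5, the default gives the largest, and therefore safest, sample size for a given error and confidence level.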
Question:
If the population proportion of success is 0.65 and n= 100, what will be
the value of sampling error when acceptance region is 0.95?
 A study of 1000 computer engineers conducted by their professional organization reported that 300 stated that their firm’s greatest concern was to uplift the professional quality of work. In order to conduct a follow-up study to estimate the population proportion of computer engineers who share this concern to within ±0.01 with 99% confidence, how many computer engineers would need to be surveyed?
Solution:
From the previous survey, the proportion of computer engineers whose firms want to uplift the quality of work is $P = \dfrac{300}{1000}$, then Q = 1 − P.
Then use the formula
sample size $n = \dfrac{Z_\alpha^2 \, PQ}{e^2}$
Question: A sample of 64 students appearing in an examination yields an error of 5 with a standard deviation of 4. Find the risk. If the sample size is increased to 144, how will the risk be affected, the standard deviation and error remaining the same?
Solution:
Sample size (n) = 64, Error (e) = 5, sample s.d. (s) = 4, Risk (α) = ?
We have $n = \left( \dfrac{Z_\alpha \cdot \sigma}{e} \right)^2 = \left( \dfrac{Z_\alpha \cdot s}{e} \right)^2$
or, $64 = \left( \dfrac{Z_\alpha \cdot 4}{5} \right)^2$ or, $Z_\alpha^2 = 100$ ∴ $Z_\alpha = \pm 10$
Now, P(−10 < Z < 10) ≈ 1
and P(−10 < Z < 10) = 1 − α ∴ 1 = 1 − α, so risk (α) = 0

[Figure: standard normal curve with area (1 − α) between −10 and +10.]

If n is increased to 144, we have $n = \left( \dfrac{Z_\alpha \cdot \sigma}{e} \right)^2 = \left( \dfrac{Z_\alpha \cdot s}{e} \right)^2$
or, $144 = \left( \dfrac{Z_\alpha \cdot 4}{5} \right)^2$ or, $Z_\alpha^2 = 225$ ∴ $Z_\alpha = \pm 15$
Now, P(−15 < Z < 15) ≈ 1
and P(−15 < Z < 15) = 1 − α ∴ 1 = 1 − α, so risk (α) = 0
Hence, increasing the sample size from 64 to 144 does not affect the risk.
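The same calculation can be checked numerically by solving the sample-size formula for Z and converting it to a two-tailed risk. The function name `risk` is illustrative; the normal probabilities come from `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def risk(n, e, s):
    """Solve n = (z * s / e)**2 for z, then return (z, alpha) with alpha = P(|Z| > z)."""
    z = e * sqrt(n) / s
    alpha = 2 * (1 - NormalDist().cdf(z))
    return z, alpha


print(risk(64, 5, 4))    # z = 10
print(risk(144, 5, 4))   # z = 15
```

At z = 10 and z = 15 the tail probability is far below anything of practical size, which is why the risk is reported as 0 in both cases.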
THANKYOU
