Week 11: Sampling Distribution
Learning objectives:
Any quantity obtained from a sample for the purpose of estimating a population
parameter is called a sample statistic, or simply a statistic.
A population is the entire group that you want to draw conclusions about. It can be
defined in terms of geographical location, age, income, and many other
characteristics.
A sample is the specific group of individuals that you will collect data from.
Kinds of Sampling
A. Simple random sampling
Example
You want to select a simple random sample of 100 employees of TEP
Company. You assign a number to every employee in the company database
from 1 to 1000, and use a random number generator to select 100 numbers.
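The selection step in this example can be sketched in a few lines of Python. This is only an illustration: the ID list and the fixed seed are assumptions made for reproducibility, not part of the company example.

```python
import random

# Hypothetical employee numbers 1..1000, as assigned in the example.
employee_ids = list(range(1, 1001))

# A fixed seed is used only so the sketch gives the same result each run.
rng = random.Random(42)

# random.sample draws without replacement, so no employee is picked twice.
sample_ids = rng.sample(employee_ids, k=100)

print(len(sample_ids), len(set(sample_ids)))
```

Every employee number has the same chance of being drawn, which is what makes the sample a simple random sample.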
B. Stratified sampling - to use this sampling method, you divide the population into
subgroups (called strata) based on the relevant characteristic (e.g. gender, age range,
income bracket, job role).
Based on the overall proportions of the population, you calculate how many
people should be sampled from each subgroup. Then you use random or
systematic sampling to select a sample from each subgroup.
Example
The company has 800 female employees and 200 male employees. You want to
ensure that the sample reflects the gender balance of the company, so you sort
the population into two strata based on gender. Then you use random sampling
on each group, selecting 80 women and 20 men, which gives you a representative
sample of 100 people.
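The proportional allocation in this example can be sketched as follows. The function name `proportional_allocation` and the ID ranges used for the two strata are hypothetical, chosen only to make the sketch runnable.

```python
import random

def proportional_allocation(strata_sizes, sample_size):
    """Split sample_size across strata in proportion to each stratum's size."""
    total = sum(strata_sizes.values())
    # Rounding can make the parts add up to slightly more or less than
    # sample_size when the proportions are not exact.
    return {name: round(sample_size * size / total)
            for name, size in strata_sizes.items()}

# Hypothetical ID ranges standing in for the 800 women and 200 men.
strata = {"female": range(1, 801), "male": range(801, 1001)}
sizes = {name: len(ids) for name, ids in strata.items()}

allocation = proportional_allocation(sizes, 100)

# Simple random sampling inside each stratum.
rng = random.Random(1)
stratified_sample = {name: rng.sample(ids, k=allocation[name])
                     for name, ids in strata.items()}

print(allocation)  # {'female': 80, 'male': 20}
```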
C. Cluster sampling - also involves dividing the population into subgroups, but
each subgroup should have similar characteristics to the whole sample. Instead
of sampling individuals from each subgroup, you randomly select entire
subgroups.
If it is practically possible, you might include every individual from each sampled
cluster. If the clusters themselves are large, you can also sample individuals from
within each cluster using one of the techniques above.
This method is good for dealing with large and dispersed populations, but there is
more risk of error in the sample, as there could be substantial differences
between clusters. It’s difficult to guarantee that the sampled clusters are really
representative of the whole population.
Example
The company has offices in 10 cities across the country (all with roughly the
same number of employees in similar roles). You don’t have the capacity to
travel to every office to collect your data, so you use random sampling to select 3
offices – these are your clusters.
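A minimal sketch of the office selection, assuming the 10 offices are simply labelled `office_01` to `office_10` (hypothetical names):

```python
import random

# Hypothetical labels for the 10 offices in the example.
offices = [f"office_{i:02d}" for i in range(1, 11)]

# Fixed seed only for reproducibility of the sketch.
rng = random.Random(2024)

# Whole offices (clusters) are selected, not individual employees.
sampled_offices = rng.sample(offices, k=3)

print(sampled_offices)
```

Data would then be collected from every employee in the three selected offices, or from a further random sample within each of them.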
D. Systematic sampling
Example
All employees of the company are listed in alphabetical order. From the first 10
numbers, you randomly select a starting point: number 6. From number 6
onwards, every 10th person on the list is selected (6, 16, 26, 36, and so on), and
you end up with a sample of 100 people.
If you use this technique, it is important to make sure that there is no hidden
pattern in the list that might skew the sample. For example, if the HR database
groups employees by team, and team members are listed in order of seniority,
there is a risk that your interval might skip over people in junior roles, resulting in
a sample that is skewed towards senior employees.
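The every-10th-person rule can be sketched with list slicing. The function name `systematic_sample` is hypothetical, and the list of numbers 1 to 1000 stands in for the alphabetical employee list.

```python
def systematic_sample(population, interval, start):
    """Take every `interval`-th element, beginning at 1-based position `start`."""
    return population[start - 1::interval]

# Positions 1..1000 in the alphabetical list (stand-in for real names).
employees = list(range(1, 1001))

sample = systematic_sample(employees, interval=10, start=6)

print(sample[:4], len(sample))  # [6, 16, 26, 36] 100
```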
If random samples of n elements are drawn from a non-normal population with
finite mean $\mu$ and standard deviation $\sigma$, then when n is large, the sampling
distribution of the sample mean $\bar{X}$ is approximately normal, with mean and
standard deviation
$$\mu_{\bar{X}} = \mu \quad \text{and} \quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
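A small simulation can make this statement concrete. The sketch below assumes an exponential population with rate 1 (so its mean and standard deviation are both 1), which is clearly non-normal. The sample size of 50 and the 5,000 repetitions are arbitrary choices for illustration.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed for reproducibility

mu, sigma = 1.0, 1.0          # exponential(rate=1): mean 1, std dev 1
n, num_samples = 50, 5000     # assumed sample size and number of samples

# Draw many samples of size n and record each sample mean.
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
theoretical_sd = sigma / n ** 0.5

print(mean_of_means, sd_of_means, theoretical_sd)
```

The mean of the sample means should come out close to the population mean, and their standard deviation close to the population standard deviation divided by the square root of n.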
Sampling distribution of means
The expected value of the sample mean is the population mean. It is denoted by
$\mu_{\bar{X}}$ and given by
$$E(\bar{X}) = \mu_{\bar{X}} = \mu$$
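If the population is infinite or if the sampling is with replacement, then the variance of
the sampling distribution of means is denoted by $\sigma_{\bar{X}}^2$ and given by
$$E[(\bar{X} - \mu)^2] = \sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}$$
so that
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$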
If the population is of size N, if sampling is without replacement, and if the sample size
is n ≤ N, then
$$\sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}\left(\frac{N - n}{N - 1}\right)$$
Suppose that the population from which samples are taken has a probability
distribution with mean $\mu$ and variance $\sigma^2$ that is not necessarily a normal
distribution. Then the standardized variable associated with $\bar{X}$ is given by
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$
Examples
1. Assume that the weights of 2,000 students of TEP are normally distributed
with a mean of 45 kg and a standard deviation of 2 kg. If 100 samples consisting of
25 students each are obtained, what would be the expected mean and standard
deviation of the resulting sampling distribution of means if sampling were done
a) With replacement?
b) Without replacement?
Solution:
a. With replacement
$$\mu_{\bar{X}} = \mu = 45 \text{ kg} \quad \text{and} \quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{2}{\sqrt{25}} = 0.4 \text{ kg}$$
b. Without replacement
$$\mu_{\bar{X}} = \mu = 45 \text{ kg} \quad \text{and} \quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}} = \frac{2}{\sqrt{25}} \sqrt{\frac{2000 - 25}{2000 - 1}} \approx 0.398 \text{ kg}$$
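The two standard deviations in this example can be checked with a short sketch. The function names are hypothetical. The population size of 2,000 is taken from the problem statement.

```python
import math

def se_with_replacement(sigma, n):
    """Standard deviation of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

def se_without_replacement(sigma, n, N):
    """Same quantity with the finite population correction sqrt((N-n)/(N-1))."""
    return se_with_replacement(sigma, n) * math.sqrt((N - n) / (N - 1))

sigma, n, N = 2.0, 25, 2000  # values from Example 1

se_a = se_with_replacement(sigma, n)
se_b = se_without_replacement(sigma, n, N)

print(round(se_a, 3), round(se_b, 3))  # 0.4 0.398
```

Because the sample (25) is small relative to the population (2,000), the correction factor is close to 1 and the two answers barely differ.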
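2. In how many samples of example 1 would you expect to find the mean
a) Between 44.7 kg and 46.1 kg?
b) Less than 46 kg?
Solution:
a. 44.7 kg in standard units $= \frac{44.7 - 45}{0.4} = -0.75$
46.1 kg in standard units $= \frac{46.1 - 45}{0.4} = 2.75$
Required probability
P(-0.75<z<2.75)=0.9970-0.2266 = 0.7704
Expected number of samples = 100(0.7704), or about 77
b. 46 kg in standard units $= \frac{46 - 45}{0.4} = 2.5$
Required probability
P(z<2.5)=0.9938
Expected number of samples = 100(0.9938), or about 99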
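The conversion to standard units and the expected counts can be checked with Python's `statistics.NormalDist`. This sketch uses the with-replacement standard deviation of 0.4 kg from Example 1. The helper name `z_score` is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()            # standard normal: mean 0, std dev 1
mu, se, num_samples = 45.0, 0.4, 100  # values from Example 1

def z_score(x):
    """Convert a sample mean x (in kg) to standard units."""
    return (x - mu) / se

# a) P(44.7 < mean < 46.1)
p_between = std_normal.cdf(z_score(46.1)) - std_normal.cdf(z_score(44.7))
# b) P(mean < 46)
p_below = std_normal.cdf(z_score(46.0))

expected_between = num_samples * p_between
expected_below = num_samples * p_below

print(round(expected_between), round(expected_below))  # 77 99
```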
Examples
Solution:
$$\mu_P = p = 0.55 \quad \text{and} \quad \sigma_P = \sqrt{\frac{pq}{n}} = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{(0.55)(0.45)}{500}} = 0.0222$$
0.60 in standard units $= \frac{0.60 - 0.55}{0.0222} = 2.25$
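As a quick numerical check of the standard deviation of the sample proportion, here is a minimal sketch using the values p = 0.55 and n = 500 from the solution. The function name `proportion_standard_error` is hypothetical.

```python
import math

def proportion_standard_error(p, n):
    """Standard deviation of a sample proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

p, n = 0.55, 500  # values used in the solution

se = proportion_standard_error(p, n)
z = (0.60 - p) / se  # 0.60 expressed in standard units

print(round(se, 4), round(z, 2))  # 0.0222 2.25
```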
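2. Find the probability that in 200 tosses of a fair coin, between 40% and 60% will be
heads.
Solution:
40% of 200 is 80 and 60% of 200 is 120. The number of heads has mean
$np = 200(0.5) = 100$ and standard deviation $\sqrt{npq} = \sqrt{200(0.5)(0.5)} = 7.07$.
Treating the count as a continuous variable, we are asked to find the probability
that the number of heads lies between 79.5 and 120.5.
79.5 in standard units $= \frac{79.5 - 100}{7.07} = -2.90$
120.5 in standard units $= \frac{120.5 - 100}{7.07} = 2.90$
Required probability is
P(-2.90<z<2.90)=0.9981-0.0019 = 0.9962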
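The normal approximation used in this solution can be compared with the exact binomial probability. The sketch below is an illustration only. It uses the continuity-corrected limits 79.5 and 120.5 from the solution and sums the exact binomial terms for 80 to 120 heads.

```python
import math
from statistics import NormalDist

n, p = 200, 0.5                      # 200 tosses of a fair coin
mean = n * p                         # 100
sd = math.sqrt(n * p * (1 - p))      # about 7.07

# Normal approximation with continuity correction (79.5 to 120.5).
normal = NormalDist(mean, sd)
approx = normal.cdf(120.5) - normal.cdf(79.5)

# Exact binomial probability of 80..120 heads; each outcome has
# probability C(n, k) / 2**n for a fair coin.
exact = sum(math.comb(n, k) for k in range(80, 121)) / 2 ** n

print(round(approx, 4), round(exact, 4))
```

The two values should agree closely, which is why the normal curve is an acceptable shortcut for a sample this large.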
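Suppose that we are given two populations. Then the sampling distribution of the
difference of two means has mean and standard deviation given by
$$\mu_{\bar{X}_1 - \bar{X}_2} = \mu_{\bar{X}_1} - \mu_{\bar{X}_2} = \mu_1 - \mu_2$$
and
$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\sigma_{\bar{X}_1}^2 + \sigma_{\bar{X}_2}^2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$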
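The corresponding standardized variable is
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{X}_1 - \bar{X}_2}}$$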
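Similarly, for the difference of two sample proportions,
$$\mu_{P_1 - P_2} = \mu_{P_1} - \mu_{P_2} = p_1 - p_2$$
and
$$\sigma_{P_1 - P_2} = \sqrt{\sigma_{P_1}^2 + \sigma_{P_2}^2} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$$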
Examples
1. LED flashlights of brand A have a mean life of 1400 hours, with a standard deviation of 200
hours, while those of brand B have a mean life of 1200 hours, with a
standard deviation of 100 hours. If random samples of 125 flashlights of each
brand are tested, what is the probability that the brand A flashlights will have a mean
lifetime which is at least
a) 180 hours
b) 240 hours more than the brand B flashlights?
Solution:
$$\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2 = 1400 - 1200 = 200 \text{ hrs}$$
Then
$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{(200)^2}{125} + \frac{(100)^2}{125}} = 20 \text{ hrs}$$
a. 180 hours in standard units $= \frac{180 - 200}{20} = -1$
Required probability
P(z>-1)=P(z<1)=0.8413
b. 240 hours in standard units $= \frac{240 - 200}{20} = 2$
Required probability
P(z>2)=1-P(z<2)=1-0.9772 = 0.0228
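The flashlight example can be verified numerically with the sketch below. The variable names and the helper `prob_diff_at_least` are hypothetical. The means, standard deviations, and sample sizes are those given in the problem.

```python
import math
from statistics import NormalDist

mu_a, sigma_a, n_a = 1400, 200, 125   # brand A
mu_b, sigma_b, n_b = 1200, 100, 125   # brand B

# Mean and standard deviation of the difference of the two sample means.
mean_diff = mu_a - mu_b
se_diff = math.sqrt(sigma_a ** 2 / n_a + sigma_b ** 2 / n_b)

def prob_diff_at_least(d):
    """P(mean of A minus mean of B >= d) under the normal approximation."""
    z = (d - mean_diff) / se_diff
    return 1 - NormalDist().cdf(z)

p_180 = prob_diff_at_least(180)
p_240 = prob_diff_at_least(240)

print(se_diff, round(p_180, 4), round(p_240, 4))  # 20.0 0.8413 0.0228
```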