Week 11: Sampling Distribution


COURSE CODE : BM 212

COURSE DESCRIPTION : Business Statistics 


TARGET POPULATION : All 2nd year BSBA Students
COURSE FACILITATOR : Jerald M. Pedregosa

Learning objectives:

• Understand the concept of a sampling distribution.
• Identify and define different sampling methods.
• State the central limit theorem.

What Is a Sampling Distribution?

A sampling distribution is the probability distribution of a statistic obtained from a
large number of samples drawn from a specific population. It is the distribution of
frequencies of the different outcomes that could possibly occur for that statistic across
samples of the population.

In a sampling distribution, we can compute a mean, variance, standard deviation,
moments, etc. The standard deviation of a sampling distribution is called the standard
error.

Any quantity obtained from a sample for the purpose of estimating a population
parameter is called a sample statistic, or simply a statistic.
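The idea can be made concrete with a minimal sketch that lists every possible sample from a tiny population and looks at the distribution of the sample mean. The population values and the sample size below are hypothetical, chosen only so that all samples can be enumerated by hand if desired.

```python
from itertools import product
from statistics import fmean, pvariance

# Hypothetical toy population (an assumption for illustration only).
population = [2, 4, 6, 8]
n = 2  # sample size

# Every possible ordered sample of size n drawn with replacement.
samples = list(product(population, repeat=n))
sample_means = [fmean(s) for s in samples]

mu = fmean(population)          # population mean
sigma2 = pvariance(population)  # population variance

mean_of_means = fmean(sample_means)     # mean of the sampling distribution
var_of_means = pvariance(sample_means)  # variance of the sampling distribution
standard_error = var_of_means ** 0.5    # its standard deviation

print(mean_of_means, mu)
print(var_of_means, sigma2 / n)
print(standard_error)
```

In this toy case the mean of the sample means matches the population mean, and the variance of the sample means matches the population variance divided by the sample size.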

Types of Sampling Methods

• Probability sampling – involves random selection, allowing you to make
statistical inferences about the whole group.
• Non-probability sampling – involves non-random selection based on
convenience or other criteria, allowing you to easily collect initial data.

Population vs. Sample

• Population is the entire group that you want to draw conclusions about. It can be
defined in terms of geographical location, age, income, and many other
characteristics.
• Sample is the specific group of individuals that you will collect data from.

Kinds of Sampling

A. Simple random sampling – every member of the population has an equal
chance of being selected. To conduct this type of sampling, you can use tools
like random number generators or other techniques that are based entirely on
chance.

Example
You want to select a simple random sample of 100 employees of TEP
Company. You assign a number to every employee in the company database
from 1 to 1000, and use a random number generator to select 100 numbers.
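A minimal sketch of this example, assuming Python's standard random number generator stands in for the random number generator mentioned above. The fixed seed is an assumption made only so that the run is reproducible.

```python
import random

rng = random.Random(42)  # fixed seed (assumption) so the result can be reproduced

employee_ids = range(1, 1001)                 # employees numbered 1 to 1000
sample_ids = rng.sample(employee_ids, k=100)  # 100 distinct numbers, all equally likely

print(sorted(sample_ids)[:10])  # first few selected employee numbers
```

Because `sample` draws without repeats, no employee can be selected twice.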

B. Stratified random sampling – involves dividing the population into
subpopulations that may differ in important ways. It allows you to draw more
precise conclusions by ensuring that every subgroup is properly represented in
the sample.

To use this sampling method, you divide the population into subgroups (called
strata) based on the relevant characteristic (e.g. gender, age range, income
bracket, job role).

Based on the overall proportions of the population, you calculate how many
people should be sampled from each subgroup. Then you use random or
systematic sampling to select a sample from each subgroup.

Example
The company has 800 female employees and 200 male employees. You want to
ensure that the sample reflects the gender balance of the company, so you sort
the population into two strata based on gender. Then you use random sampling
on each group, selecting 80 women and 20 men, which gives you a representative
sample of 100 people.
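The proportional allocation in this example can be sketched as follows. The employee identifiers are hypothetical placeholders for the company database, and the seed is an assumption for reproducibility.

```python
import random

rng = random.Random(0)  # fixed seed (assumption)

# Hypothetical ID lists standing in for the two strata.
strata = {
    "female": [f"F{i}" for i in range(1, 801)],  # 800 female employees
    "male": [f"M{i}" for i in range(1, 201)],    # 200 male employees
}
total = sum(len(members) for members in strata.values())
sample_size = 100

stratified_sample = {}
for name, members in strata.items():
    # Proportional allocation: the stratum's share of the population times the sample size.
    k = round(sample_size * len(members) / total)
    stratified_sample[name] = rng.sample(members, k)

print({name: len(chosen) for name, chosen in stratified_sample.items()})
# {'female': 80, 'male': 20}
```

With other stratum sizes the rounded allocations may not add up exactly to the target sample size, so a real study would need a rule for handling the remainder.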

C. Cluster sampling – also involves dividing the population into subgroups, but
each subgroup should have similar characteristics to the whole population. Instead
of sampling individuals from each subgroup, you randomly select entire
subgroups.

If it is practically possible, you might include every individual from each sampled
cluster. If the clusters themselves are large, you can also sample individuals from
within each cluster using one of the techniques above.

This method is good for dealing with large and dispersed populations, but there is
more risk of error in the sample, as there could be substantial differences
between clusters. It’s difficult to guarantee that the sampled clusters are really
representative of the whole population.

Example
The company has offices in 10 cities across the country (all with roughly the
same number of employees in similar roles). You don’t have the capacity to
travel to every office to collect your data, so you use random sampling to select 3
offices – these are your clusters.
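A minimal sketch of this example, assuming single-stage cluster sampling in which everyone in a selected office is included. The office labels and the figure of 100 employees per office are hypothetical; the example only states that the offices are of roughly equal size.

```python
import random

rng = random.Random(1)  # fixed seed (assumption)

# Hypothetical office labels and staff lists: 10 offices, 100 employees each.
offices = [f"Office {i}" for i in range(1, 11)]
employees_by_office = {
    office: [f"{office} / employee {j}" for j in range(1, 101)]
    for office in offices
}

# Randomly select whole clusters rather than individuals.
selected_offices = rng.sample(offices, k=3)

# Single-stage cluster sample: include every employee of each selected office.
cluster_sample = [
    employee
    for office in selected_offices
    for employee in employees_by_office[office]
]

print(selected_offices, len(cluster_sample))
```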

D. Systematic random sampling – is similar to simple random sampling, but it is
usually slightly easier to conduct. Every member of the population is listed with a
number, but instead of randomly generating numbers, individuals are chosen at
regular intervals.

Example
All employees of the company are listed in alphabetical order. From the first 10
numbers, you randomly select a starting point: number 6. From number 6
onwards, every 10th person on the list is selected (6, 16, 26, 36, and so on), and
you end up with a sample of 100 people.

If you use this technique, it is important to make sure that there is no hidden
pattern in the list that might skew the sample. For example, if the HR database
groups employees by team, and team members are listed in order of seniority,
there is a risk that your interval might skip over people in junior roles, resulting in
a sample that is skewed towards senior employees.
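The selection rule in the example can be sketched as below, assuming the list holds 1000 employees (so that an interval of 10 yields 100 people). The helper name `systematic_sample` is hypothetical.

```python
def systematic_sample(items, interval, start):
    """Return every `interval`-th item, beginning at 1-based position `start`."""
    return items[start - 1::interval]

employees = list(range(1, 1001))  # positions 1..1000 in the alphabetical list
interval = len(employees) // 100  # 1000 / 100 = 10

start = 6  # starting point from the example; in practice drawn at random from 1..interval
sample = systematic_sample(employees, interval, start)

print(sample[:4], len(sample))  # [6, 16, 26, 36] 100
```

Only the starting point is random here, which is why a hidden periodic pattern in the list can bias the sample.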

Central Limit Theorem

If random samples of $n$ elements are drawn from a non-normal population with
finite mean $\mu$ and standard deviation $\sigma$, then when $n$ is large, the sampling
distribution of the sample mean $\bar{x}$ is approximately normally distributed, with
mean and standard deviation

$$\mu_{\bar{x}} = \mu \quad\text{and}\quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

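A small simulation can illustrate the theorem. This is a sketch under stated assumptions: the population is taken to be exponential with mean 1 and standard deviation 1 (clearly non-normal), and the sample size, number of samples, and seed are arbitrary choices.

```python
import random
from statistics import fmean, pstdev

rng = random.Random(2024)  # fixed seed (assumption)

# Assumed non-normal population: exponential with mean 1 and standard deviation 1.
mu, sigma = 1.0, 1.0
n = 50              # size of each sample
num_samples = 5000  # number of samples drawn

sample_means = [
    fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

se = sigma / n ** 0.5  # standard deviation predicted by the theorem

# Share of sample means within 1.96 standard errors of mu;
# this is about 0.95 for a normal distribution.
within = sum(abs(m - mu) <= 1.96 * se for m in sample_means) / num_samples

print(round(fmean(sample_means), 3), mu)
print(round(pstdev(sample_means), 3), round(se, 3))
print(round(within, 3))
```

The simulated mean and standard deviation of the sample means should land close to the predicted values, although any single run will differ slightly.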
Sampling distribution of means

The expected value of the sample mean is the population mean. The mean of the
sampling distribution of means, denoted by $\mu_{\bar{x}}$, is given by

$$E(\bar{X}) = \mu_{\bar{x}} = \mu$$

where $\mu$ is the mean of the population.

If the population is infinite or if the sampling is with replacement, then the variance of
the sampling distribution of means, denoted by $\sigma_{\bar{x}}^2$, is given by

$$E[(\bar{X} - \mu)^2] = \sigma_{\bar{x}}^2 = \frac{\sigma^2}{n}$$

where $\sigma^2$ is the variance of the population.

The standard deviation of the sampling distribution of the means is defined by

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

If the population is of size $N$, if sampling is without replacement, and if the sample size
is $n \le N$, then

$$\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n}\left(\frac{N - n}{N - 1}\right)$$
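The finite-population formula can be checked by brute force on a small case. The sketch below assumes a hypothetical population of six values and enumerates every sample of size 3 drawn without replacement.

```python
from itertools import combinations
from statistics import fmean, pvariance

# Hypothetical finite population (an assumption for illustration), N = 6.
population = [1, 3, 5, 7, 9, 11]
N, n = len(population), 3

# All equally likely samples of size n drawn without replacement.
sample_means = [fmean(s) for s in combinations(population, n)]

sigma2 = pvariance(population)                # population variance
observed = pvariance(sample_means)            # variance of the sample means
predicted = (sigma2 / n) * (N - n) / (N - 1)  # formula with the correction factor

print(observed, predicted)
```

The two printed values agree up to floating-point rounding, and both are smaller than the with-replacement value `sigma2 / n`.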

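Suppose that the population from which samples are taken has a probability
distribution with mean $\mu$ and variance $\sigma^2$ that is not necessarily a normal
distribution. Then the standardized variable associated with $\bar{X}$ is given by

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$

which is approximately standard normal when $n$ is large.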

Examples

1. Assume that the weights of 2,000 students of TEP are normally distributed
with a mean of 45 kg and a standard deviation of 2 kg. If 100 samples consisting of
25 students each are obtained, what would be the expected mean and standard
deviation of the resulting sampling distribution of means if sampling were done
a) With replacement?
b) Without replacement?

Solution:

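a. With replacement

$$\mu_{\bar{x}} = \mu = 45 \text{ kg} \quad\text{and}\quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{2}{\sqrt{25}} = 0.4 \text{ kg}$$

b. Without replacement

$$\mu_{\bar{x}} = \mu = 45 \text{ kg} \quad\text{and}\quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}} = \frac{2}{\sqrt{25}} \sqrt{\frac{2000 - 25}{2000 - 1}} \approx 0.398 \text{ kg}$$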
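The two standard errors can be reproduced with a short sketch. The function name `standard_error` is hypothetical, and the population size of 2,000 is taken from the problem statement.

```python
from math import sqrt

def standard_error(sigma, n, N=None):
    """Standard error of the mean; pass N to apply the finite-population correction."""
    se = sigma / sqrt(n)
    if N is not None:
        se *= sqrt((N - n) / (N - 1))
    return se

sigma, n = 2, 25
with_replacement = standard_error(sigma, n)
without_replacement = standard_error(sigma, n, N=2000)

print(round(with_replacement, 3))     # 0.4
print(round(without_replacement, 3))  # 0.398
```

The correction factor is very close to 1 here because the sample of 25 is a small fraction of the population, so the result is not sensitive to the exact population size.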

2. In how many samples of example 1 would you expect to find the mean between
a) 44.7 and 46.1?
b) Less than 46kg?

Solution:

a. 44.7 kg in standard units $= \frac{44.7 - 45}{0.4} = -0.75$

46.1 kg in standard units $= \frac{46.1 - 45}{0.4} = 2.75$

Proportion of samples with mean between 44.7 and 46.1

= ( area under the normal curve between z = -0.75 and z = 2.75)


= 0.2734 + 0.4970 = 0.7704

Then the expected number of samples = (100)(0.7704) or 77

b. 46 kg in standard units $= \frac{46 - 45}{0.4} = 2.5$

Proportion of samples with mean less than 46 kg

= ( area under the normal curve less than z = 2.5)

= 0.5+ 0.4938 = 0.9938

Then the expected number of samples = (100) (0.9938) or 99
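These areas can also be computed without a table. The sketch below assumes sampling with replacement (standard error 0.4 kg) and uses Python's `statistics.NormalDist` in place of the normal table.

```python
from statistics import NormalDist

# Sampling distribution of the mean: mean 45 kg, standard error 0.4 kg.
sampling_dist = NormalDist(mu=45, sigma=0.4)
num_samples = 100

p_between = sampling_dist.cdf(46.1) - sampling_dist.cdf(44.7)  # part (a)
p_below = sampling_dist.cdf(46)                                # part (b)

print(round(p_between, 4), round(num_samples * p_between))
print(round(p_below, 4), round(num_samples * p_below))
```

The computed proportions should agree with the table-based values to about four decimal places, giving 77 and 99 samples respectively.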

Sampling Distribution of Proportions

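Suppose that a population is infinite and binomially distributed, with p the probability
of success and q = 1 – p the probability of failure.

Then the sampling distribution of proportions has mean $\mu_p$ and standard deviation
$\sigma_p$ given by

$$\mu_p = p \quad\text{and}\quad \sigma_p = \sqrt{\frac{pq}{n}} = \sqrt{\frac{p(1 - p)}{n}}$$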

Examples

1. In a survey of 500 internet users, 60% of teenagers agreed that it is important in
social life. Suppose the proportion of teenagers in the population who agree is actually
equal to 0.55. What is the probability of observing a sample proportion greater than or
equal to 0.60?

Solution:

$\mu_p = p = 0.55$ and

$$\sigma_p = \sqrt{\frac{pq}{n}} = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{(0.55)(0.45)}{500}} = 0.0222$$

0.60 in standard units $= \frac{0.60 - 0.55}{0.0222} = 2.25$

The probability of observing a sample proportion greater than or equal to 0.60 is

$$P(p \ge 0.60) = P(z > 2.25) = 1 - P(z < 2.25) = 1 - 0.9878 = 0.0122$$
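The same calculation can be sketched in code, using `statistics.NormalDist` in place of the normal table. Because the code does not round the standard error or the z-value along the way, its final probability can differ from the table-based answer in the last decimal place.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.55, 500                  # population proportion and sample size
sigma_p = sqrt(p * (1 - p) / n)   # standard error of the sample proportion
z = (0.60 - p) / sigma_p          # 0.60 in standard units
prob = 1 - NormalDist().cdf(z)    # P(sample proportion >= 0.60)

print(round(sigma_p, 4), round(z, 2), round(prob, 4))
```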

2. Find the probability that, in 200 tosses of a fair coin, between 40% and 60% will be
heads.

Solution:

The expected number of heads is $\mu = np = 200(0.5) = 100$, and

$$\sigma = \sqrt{npq} = \sqrt{(200)(0.5)(0.5)} = 7.07$$

40% of 200 is 80 and 60% of 200 is 120. Treating the number of heads as a continuous
variable, we are asked to find the probability that the number of heads lies between
79.5 and 120.5.

79.5 in standard units $= \frac{79.5 - 100}{7.07} = -2.90$

120.5 in standard units $= \frac{120.5 - 100}{7.07} = 2.90$

Required probability is

= (area under the normal curve between z = -2.90 and z = 2.90)

= P(z < 2.90) – P(z < -2.90) = 0.9981 – 0.0019 = 0.9962
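As a check on the normal approximation, the sketch below compares it with the exact binomial probability of 80 to 120 heads. The exact sum relies on the coin being fair, so that each sequence of 200 tosses has probability 1/2^200.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 200, 0.5
mu = n * p
sigma = sqrt(n * p * (1 - p))

# Normal approximation with continuity correction: 79.5 to 120.5 heads.
normal = NormalDist(mu, sigma)
approx = normal.cdf(120.5) - normal.cdf(79.5)

# Exact binomial probability of 80 to 120 heads for a fair coin.
exact = sum(comb(n, k) for k in range(80, 121)) / 2 ** n

print(round(approx, 4), round(exact, 4))
```

The two values should be very close, which suggests the continuity-corrected normal approximation works well at this sample size.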

Sampling Distribution of Differences

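Suppose that we are given two populations. Then the sampling distribution of the
difference of two means is given by

$$\mu_{\bar{x}_1 - \bar{x}_2} = \mu_{\bar{x}_1} - \mu_{\bar{x}_2} = \mu_1 - \mu_2$$

and

$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\sigma_{\bar{x}_1}^2 + \sigma_{\bar{x}_2}^2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$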

The standardized variable is

$$Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{x}_1 - \bar{x}_2}}$$

The sampling distribution of the difference of two proportions is given by

$$\mu_{P_1 - P_2} = \mu_{P_1} - \mu_{P_2} = P_1 - P_2$$

and

$$\sigma_{P_1 - P_2} = \sqrt{\sigma_{P_1}^2 + \sigma_{P_2}^2} = \sqrt{\frac{P_1 q_1}{n_1} + \frac{P_2 q_2}{n_2}}$$

Examples
1. LED flashlights of brand A have a mean life of 1400 hours, with a standard deviation
of 200 hours, while those of brand B have a mean life of 1200 hours, with a
standard deviation of 100 hours. If random samples of 125 flashlights of each
brand are tested, what is the probability that the brand A flashlights will have a mean
lifetime which is at least
a) 180 hours
b) 240 hours
more than the brand B flashlights?

Solution:

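a. Let $\mu_{\bar{x}_1} = 1400$ be the mean life of brand A and $\mu_{\bar{x}_2} = 1200$ be
the mean life of brand B. Then

$$\mu_{\bar{x}_1} - \mu_{\bar{x}_2} = 1400 - 1200 = 200 \text{ hours}$$

$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{(200)^2}{125} + \frac{(100)^2}{125}} = 20 \text{ hours}$$

180 hours in standard units $= \frac{180 - 200}{20} = -1$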

Required probability

=(area under normal curve to the right of z = -1 )

P(z>-1) = 1-P(z<-1) = 1- 0.1587 = 0.8413

b. 240 hours in standard units $= \frac{240 - 200}{20} = 2$

Required probability

= (area under the normal curve to the right of z = 2 )

P(z>2)=1-P(z<2)=1-0.9772 = 0.0228
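The flashlight example can be reproduced with a short sketch that uses `statistics.NormalDist` in place of the normal table. The variable names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

mu_a, sigma_a, n_a = 1400, 200, 125  # brand A: mean, standard deviation, sample size
mu_b, sigma_b, n_b = 1200, 100, 125  # brand B

mu_diff = mu_a - mu_b                                    # mean of the difference
sigma_diff = sqrt(sigma_a**2 / n_a + sigma_b**2 / n_b)   # standard error of the difference
diff_dist = NormalDist(mu_diff, sigma_diff)

p_at_least_180 = 1 - diff_dist.cdf(180)  # part (a)
p_at_least_240 = 1 - diff_dist.cdf(240)  # part (b)

print(mu_diff, sigma_diff)
print(round(p_at_least_180, 4), round(p_at_least_240, 4))
```

The results match the table-based answers: about 0.8413 for part (a) and about 0.0228 for part (b).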

--------------------------------------------End of Week 11------------------------------------------------
