Sampling Distributions


5. Sampling distributions
5.1 Introduction
A Parameter is a number that describes some characteristic of a population.
A Statistic is a number that describes some characteristic of a sample.
A Population consists of all observations of concern.
A Sample is a subset of the population.

A statistic is used to estimate a population parameter. The value of a statistic varies across repeated
random samples. Examples of parameters and statistics include:

Quantity              Population                                             Sample
Mean                  $\mu$                                                  $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Variance              $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$    $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{n \sum_{i=1}^{n} x_i^2 - (\sum_{i=1}^{n} x_i)^2}{n(n-1)}$
Standard deviation    $\sigma = \sqrt{\sigma^2}$                             $s = \sqrt{s^2}$
Proportion            $p$                                                    $\hat{p}$

The probability distribution of a statistic is called a Sampling distribution.
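A statistic is said to be an unbiased estimator of a parameter if the mean of its sampling distribution
is equal to the parameter. For example, consider a random sample of size n taken from a $N(\mu, \sigma^2)$ population.
Then

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

$$E[\bar{x}] = E\left[\frac{x_1 + x_2 + \dots + x_n}{n}\right] = \frac{E[x_1] + E[x_2] + \dots + E[x_n]}{n} = \frac{n\mu}{n} = \mu$$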

Therefore the sample mean x̄ is an unbiased estimator of the population mean µ.

Exercise 5.1.1
Show that the sample variance $s^2$ is an unbiased estimator of the population variance $\sigma^2$.

5.2 Sampling distributions of the mean


The sampling distribution of the mean is obtained from the central limit theorem.

Central limit theorem:


If we take many random samples of size n from a population with any distribution (with finite mean $\mu$
and variance $\sigma^2$) and record the mean of each sample, then:

1. The distribution of the means will be approximately normal for large sample sizes.

2. The mean of the distribution of means will be equal to the population mean $\mu$.

3. The variance of the distribution of means will be equal to $\frac{\sigma^2}{n}$.

The normal approximation holds for large n; a common rule of thumb is n > 30. NOTE: If the original
population is normally distributed, then the sample mean is normally distributed for any sample size n.

Therefore, from the central limit theorem, for large n the sampling distribution of the sample mean is
approximately normal, i.e.

$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$
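
The three statements of the central limit theorem can be checked numerically. The sketch below is only an illustration: the exponential population with mean 2 (so $\mu = 2$, $\sigma^2 = 4$), the sample size n = 40, the number of repetitions, the seed, and the helper name `simulate_sample_means` are all assumptions chosen for the demonstration and do not come from these notes.

```python
import random
import statistics


def simulate_sample_means(draw, n, num_samples, seed=0):
    """Return num_samples sample means, each from n independent draws.

    draw is a function taking a random.Random instance and returning one
    observation from the (hypothetical) population.
    """
    rng = random.Random(seed)
    return [
        statistics.fmean(draw(rng) for _ in range(n))
        for _ in range(num_samples)
    ]


# Assumed skewed population: exponential with mean 2, so mu = 2, sigma^2 = 4.
mu, sigma2 = 2.0, 4.0
n = 40

means = simulate_sample_means(lambda rng: rng.expovariate(1 / mu), n, 20_000)

mean_of_means = statistics.fmean(means)
var_of_means = statistics.pvariance(means)

print(f"mean of sample means:     {mean_of_means:.3f} (theory: {mu})")
print(f"variance of sample means: {var_of_means:.4f} (theory: {sigma2 / n})")
```

Under these assumptions the two printed values should land close to $\mu$ and $\sigma^2/n$, even though the population itself is far from normal.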

Example 5.2.1
Given a population {2,6,8,10,10,12}

i Find the population mean $\mu$ and population variance $\sigma^2$.

ii List all 36 possible simple random samples of size n = 2 (assume you are sampling with
replacement to maintain independence). Find $\bar{x}$ for each sample.

iii Obtain the sampling distribution of x̄. Make a graph of the sampling distribution, obtain the
mean and variance of the sampling distribution and finally compare the mean and variance of
the sampling distribution with the population mean and variance obtained in part i.

Solution:
The population mean and variance are

$$\mu = \frac{48}{6} = 8$$

$$\sigma^2 = \frac{\sum (X_i - \mu)^2}{N} = \frac{(-6)^2 + (-2)^2 + (0)^2 + (2)^2 + (2)^2 + (4)^2}{6} = \frac{64}{6} = 10.6667$$

$$\sigma = 3.266$$

The distribution of X is given as

x           2     6     8     10    12
P(X = x)    1/6   1/6   1/6   2/6   1/6

[Figure: histogram of the distribution of X]

The 36 simple random samples are given as

2 6 8 10 10 12
2 2,2 2,6 2,8 2,10 2,10 2,12
6 6,2 6,6 6,8 6,10 6,10 6,12
8 8,2 8,6 8,8 8,10 8,10 8,12
10 10,2 10,6 10,8 10,10 10,10 10,12
10 10,2 10,6 10,8 10,10 10,10 10,12
12 12,2 12,6 12,8 12,10 12,10 12,12
and their means

2 6 8 10 10 12
2 2 4 5 6 6 7
6 4 6 7 8 8 9
8 5 7 8 9 9 10
10 6 8 9 10 10 11
10 6 8 9 10 10 11
12 7 9 10 11 11 12

The mean and variance of the 36 sample means are

$$\mu_{\bar{x}} = \frac{288}{36} = 8$$

$$\sigma_{\bar{x}}^2 = \frac{\sum (\bar{X}_i - \mu_{\bar{x}})^2}{36} = \frac{192}{36} = 5.3333$$

$$\sigma_{\bar{x}} = 2.3094$$

The distribution of the sample means is given as

x̄             2      4      5      6      7      8      9      10     11     12
P(X̄ = x̄)     1/36   2/36   2/36   5/36   4/36   5/36   6/36   6/36   4/36   1/36

[Figure: histogram of the sampling distribution of x̄]

From the above, $\mu_{\bar{x}} = \mu_x = 8$ and $\sigma_{\bar{x}}^2 = \frac{\sigma_x^2}{n} = \frac{10.6667}{2} = 5.3333$.
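
The enumeration in Example 5.2.1 can be reproduced exactly with a few lines of code. This is a minimal sketch rather than part of the original solution; it uses `fractions.Fraction` so that the comparison between the variance of the sample means and $\sigma^2/n$ involves no rounding.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [2, 6, 8, 10, 10, 12]
n = 2  # sample size, sampling with replacement

# Population mean and variance (divide by N, not N - 1).
mu = Fraction(sum(population), len(population))
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)

# All 6^2 = 36 ordered samples of size 2 and their means.
samples = list(product(population, repeat=n))
sample_means = [Fraction(sum(s), n) for s in samples]

# Mean and variance of the sampling distribution of the mean.
mu_xbar = sum(sample_means) / len(sample_means)
sigma2_xbar = sum((m - mu_xbar) ** 2 for m in sample_means) / len(sample_means)

# Number of samples (out of 36) that produce each value of the mean.
counts = Counter(sample_means)

print("mu =", mu, " sigma^2 =", sigma2)
print("mu_xbar =", mu_xbar, " sigma^2_xbar =", sigma2_xbar)
for value in sorted(counts):
    print(f"P(Xbar = {value}) = {counts[value]}/{len(samples)}")
```

The script confirms that the mean of the sample means equals the population mean and that their variance equals the population variance divided by n.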

Example 5.2.2
The weights of a population of workers have µ = 167 and σ = 27. A sample of 36 workers is chosen.
Approximate the probability that the sample mean of their weights lies between 163 and 170.

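Solution:
The sample mean $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$, therefore

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{\sqrt{n}(\bar{X} - \mu)}{\sigma}$$

Here $\sigma / \sqrt{n} = 27 / 6 = 4.5$. Hence

$$P(163 < \bar{X} < 170) = P\left(\frac{163 - 167}{4.5} < Z < \frac{170 - 167}{4.5}\right) = P(-0.89 < Z < 0.67) = 0.5619$$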
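
The same probability can be computed without a z-table. The sketch below uses `statistics.NormalDist` from the Python standard library and treats the sample mean as exactly normal, which is the approximation the example relies on.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 167, 27, 36

# Sampling distribution of the mean: N(mu, sigma^2 / n).
xbar_dist = NormalDist(mu=mu, sigma=sigma / sqrt(n))

# Standardised limits, for comparison with the table lookup.
z_low = (163 - mu) / xbar_dist.stdev
z_high = (170 - mu) / xbar_dist.stdev

prob = xbar_dist.cdf(170) - xbar_dist.cdf(163)

print(f"z limits: {z_low:.4f}, {z_high:.4f}")
print(f"P(163 < Xbar < 170) = {prob:.4f}")
```

The computed value is roughly 0.56. It differs slightly in the third decimal place from the table-based answer because the table lookup rounds the z limits to two decimals.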

Exercise 5.2
1. The amount of time that a drive-through bank teller spends on a customer is a random variable
with a mean µ = 3.2 minutes and a standard deviation σ = 1.6 minutes. If a random sample of
64 customers is observed, find the probability that their mean time at the teller’s counter is (a)
at most 2.7 minutes; (b) more than 3.5 minutes; (c) at least 3.2 minutes but less than 3.4 minutes.

2. The average life of a bread-making machine is 7 years, with a standard deviation of 1 year.
Assuming that the lives of these machines follow approximately a normal distribution, find
(a) the probability that the mean life of a random sample of 9 such machines falls between 6.4
and 7.2 years;
(b) the value of $\bar{x}$ to the right of which 15% of the means computed from random samples of
size 9 would fall.

3. A certain type of thread is manufactured with a mean tensile strength of 78.3 kilograms and a
standard deviation of 5.6 kilograms. How is the variance of the sample mean changed when the
sample size is
(a) increased from 64 to 196?
(b) decreased from 784 to 49?

5.3 Sampling distributions of the difference between two means


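If two independent samples of sizes $n_1$ and $n_2$ are drawn from two populations with means $\mu_1$ and $\mu_2$ and
variances $\sigma_1^2$ and $\sigma_2^2$, then the sampling distribution of the difference of sample means is approximately
normally distributed with mean

$$\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2$$

and variance

$$\sigma^2_{\bar{x}_1 - \bar{x}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$

Hence

$$\bar{x}_1 - \bar{x}_2 \sim N\left(\mu_1 - \mu_2, \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$$

The corresponding standard normal random variable is

$$Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \sim N(0, 1)$$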

Example 5.3.1
Two independent experiments are being run in which two different types of paints are compared.
Eighteen specimens are painted using type A and the drying time, in hours, is recorded on each. The
same is done with type B. The population standard deviations are both known to be 1.0. Assuming
that the mean drying time is equal for the two types of paint, find $P(\bar{X}_A - \bar{X}_B > 1.0)$ where $\bar{X}_A$ and $\bar{X}_B$
are average drying times for samples of size $n_1 = n_2 = 18$.

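Solution
Since the mean drying times are assumed equal, $\mu_A - \mu_B = 0$, and

$$\frac{\sigma_A^2}{n_1} + \frac{\sigma_B^2}{n_2} = \frac{1}{18} + \frac{1}{18} = \frac{1}{9}$$

so that $\bar{x}_A - \bar{x}_B \sim N\left(0, \frac{1}{9}\right)$. Hence

$$P(\bar{X}_A - \bar{X}_B > 1.0) = P\left(Z > \frac{1.0 - 0}{\sqrt{1/9}}\right) = P(Z > 3.0) = 0.0013$$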
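
The calculation for the difference of two sample means follows the same pattern for any inputs, so it can be wrapped in a small function. This is a sketch under the normal model of this section; the function name `prob_diff_exceeds` is hypothetical, and because only the difference $\mu_A - \mu_B$ matters, both means are set to 0 as placeholders.

```python
from math import sqrt
from statistics import NormalDist


def prob_diff_exceeds(threshold, mu1, mu2, sigma1, sigma2, n1, n2):
    """Return P(Xbar1 - Xbar2 > threshold) for independent samples.

    Assumes Xbar1 - Xbar2 ~ N(mu1 - mu2, sigma1^2/n1 + sigma2^2/n2).
    """
    diff = NormalDist(
        mu=mu1 - mu2,
        sigma=sqrt(sigma1**2 / n1 + sigma2**2 / n2),
    )
    return 1 - diff.cdf(threshold)


# Paint example: equal (unknown) means, both sigmas 1.0, 18 specimens each.
p = prob_diff_exceeds(1.0, mu1=0.0, mu2=0.0, sigma1=1.0, sigma2=1.0, n1=18, n2=18)
print(f"P(XbarA - XbarB > 1.0) = {p:.4f}")
```

A probability this small suggests that an observed difference of more than one hour would be hard to reconcile with the assumption of equal mean drying times.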

Exercise 5.3
1. Given the following data

Population 1 Population 2
µ1 = 6.5 µ2 = 6.0
σ1 = 0.9 σ2 = 0.8
n1 = 36 n2 = 49
Find $P(\bar{X}_1 - \bar{X}_2 \geq 1.0)$

5.4 Sampling distribution of the variance


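A random sample of size n is taken from a $N(\mu, \sigma^2)$ population. The sample variance is given as

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$$

To obtain the distribution of $s^2$ we first show that

$$\sum_{i=1}^{n} (x_i - \mu)^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 + n(\bar{x} - \mu)^2$$

To show this:

$$\begin{aligned}
\sum_{i=1}^{n} (x_i - \mu)^2 &= \sum_{i=1}^{n} (x_i - \bar{x} + \bar{x} - \mu)^2 \\
&= \sum_{i=1}^{n} [(x_i - \bar{x}) + (\bar{x} - \mu)]^2 \\
&= \sum_{i=1}^{n} (x_i - \bar{x})^2 + \sum_{i=1}^{n} (\bar{x} - \mu)^2 + 2(\bar{x} - \mu) \sum_{i=1}^{n} (x_i - \bar{x}) \\
&= \sum_{i=1}^{n} (x_i - \bar{x})^2 + n(\bar{x} - \mu)^2 + 2(\bar{x} - \mu)\left[\sum_{i=1}^{n} x_i - n\bar{x}\right] \\
&= \sum_{i=1}^{n} (x_i - \bar{x})^2 + n(\bar{x} - \mu)^2 + 2(\bar{x} - \mu)[n\bar{x} - n\bar{x}] \\
&= \sum_{i=1}^{n} (x_i - \bar{x})^2 + n(\bar{x} - \mu)^2
\end{aligned}$$

Dividing each term by $\sigma^2$:

$$\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{\sigma^2} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{\sigma^2} + \frac{n(\bar{x} - \mu)^2}{\sigma^2}$$

that is,

$$\sum_{i=1}^{n} Z_i^2 = \frac{(n-1)s^2}{\sigma^2} + Z^2$$

where $Z_i = \frac{x_i - \mu}{\sigma}$ and $Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$ are standard normal.

But $\sum_{i=1}^{n} Z_i^2 \sim \chi^2_{(n)}$ and $Z^2 \sim \chi^2_{(1)}$. Therefore

$$\chi^2_{(n)} = \frac{(n-1)s^2}{\sigma^2} + \chi^2_{(1)}$$

and, since $\bar{x}$ and $s^2$ are independent for a normal sample,

$$\frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{(n-1)}$$

Thus the sampling distribution of the sample variance depends on the population variance: the scaled
statistic $(n-1)s^2/\sigma^2$ has a chi-square distribution with n − 1 degrees of freedom.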
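
A quick simulation can make the chi-square result plausible. The sketch below is illustrative only: the values µ = 20, σ = 3, n = 7, the number of repetitions, the seed, and the helper name `scaled_sample_variances` are assumptions. It relies on the facts that a chi-square distribution with k degrees of freedom has mean k and variance 2k, so with n = 7 the simulated values of $(n-1)s^2/\sigma^2$ should have mean near 6 and variance near 12.

```python
import random
import statistics


def scaled_sample_variances(mu, sigma, n, num_samples, seed=0):
    """Simulate (n - 1) * s^2 / sigma^2 for repeated normal samples of size n."""
    rng = random.Random(seed)
    values = []
    for _ in range(num_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        s2 = statistics.variance(sample)  # sample variance, divides by n - 1
        values.append((n - 1) * s2 / sigma**2)
    return values


n = 7
values = scaled_sample_variances(mu=20.0, sigma=3.0, n=n, num_samples=20_000)

sim_mean = statistics.fmean(values)
sim_var = statistics.pvariance(values)

print(f"simulated mean:     {sim_mean:.3f} (chi-square mean: {n - 1})")
print(f"simulated variance: {sim_var:.3f} (chi-square variance: {2 * (n - 1)})")
```

Matching the first two moments does not prove the distributional claim, but it is a cheap sanity check on the derivation.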

Example 5.4.1
The time it takes a central processing unit to process a certain type of job is normally distributed with
mean 20 seconds and standard deviation 3 seconds. If a sample of 7 such jobs is observed, what is
the probability that the sample variance will exceed 12?

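Solution
We need to find $P(S^2 > 12)$ given that n = 7 and σ = 3. Since $\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{(n-1)}$,

$$P(S^2 > 12) = P\left(\frac{(n-1)S^2}{\sigma^2} > \frac{6 \times 12}{3^2}\right) = P(\chi^2_{(6)} > 8) \approx 0.2381$$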
5.5 Sampling distribution of the ratio of variances
Given that $S_1^2$ and $S_2^2$ are the sample variances of independent random samples of sizes $n_1$ and $n_2$ taken
from normal populations with variances $\sigma_1^2$ and $\sigma_2^2$ respectively, then

$$F = \frac{S_1^2 / \sigma_1^2}{S_2^2 / \sigma_2^2} = \frac{S_1^2 \sigma_2^2}{S_2^2 \sigma_1^2} \sim F_{n_1 - 1,\, n_2 - 1}$$

5.6 Sampling distribution of proportions


A sample proportion ($\hat{p}$) is a fraction of a sample and is left as a fraction or given as a percentage.
The sampling distribution of the sample proportion is related to the Binomial distribution, where

$$\hat{p} = \frac{\text{Number of successes in a sample}}{\text{Size of sample}} = \frac{X}{n}$$

Since $X \sim \text{Binomial}(n, p)$, then $\mu_x = np$ and $\sigma_x^2 = np(1 - p)$. Hence

$$\mu_{\hat{p}} = \frac{E(X)}{n} = \frac{np}{n} = p$$

$$\sigma_{\hat{p}}^2 = \frac{Var(X)}{n^2} = \frac{np(1 - p)}{n^2} = \frac{p(1 - p)}{n}$$

Therefore, for large n,

$$\hat{p} \sim N\left(p, \frac{p(1 - p)}{n}\right)$$
We use the normal approximation only when np ≥ 10 and n(1 − p) ≥ 10. This is known as the large counts
condition.

NOTE: When sampling without replacement from a finite population, the observations are not
independent. If the sample is a large fraction of the population, the formula $\sigma_{\hat{p}} = \sqrt{p(1 - p)/n}$
overstates the standard deviation of the sample proportion, and $\sigma_{\hat{p}}$ is instead calculated with the finite
population correction (FPC). The correction is ignored when the population size is large relative to the sample size.

Example 5.6.1
Suppose that 45 percent of the population favors a certain candidate in an upcoming election. If a
random sample of size 200 is chosen, find the expected value and standard deviation of the number
of members of the sample that favor the candidate.

Solution
Population proportion (p=0.45), sample size n=200 and number of members of the sample that favor
the candidate=X, hence

$$E(X) = np = 200 \times 0.45 = 90$$

$$\text{stddev}(X) = \sqrt{np(1 - p)} = \sqrt{90(0.55)} = 7.0356$$
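
The formulas for the count X and for the sample proportion differ only by a factor of n, which is easy to see side by side. The sketch below is a small illustration; the function names `binomial_count_summary` and `sample_proportion_summary` are hypothetical helpers, not part of any library.

```python
from math import sqrt


def binomial_count_summary(n, p):
    """Mean and standard deviation of the number of successes X ~ Binomial(n, p)."""
    return n * p, sqrt(n * p * (1 - p))


def sample_proportion_summary(n, p):
    """Mean and standard deviation of the sample proportion p_hat = X / n."""
    return p, sqrt(p * (1 - p) / n)


n, p = 200, 0.45

mean_x, sd_x = binomial_count_summary(n, p)
mean_phat, sd_phat = sample_proportion_summary(n, p)

print(f"X:     mean = {mean_x:.1f}, sd = {sd_x:.4f}")
print(f"p_hat: mean = {mean_phat:.2f}, sd = {sd_phat:.4f}")
```

Dividing the mean and standard deviation of X by n = 200 gives the corresponding values for the sample proportion.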

Exercise 5.6
1. A local college has 500 students and 54 of them are left-handed. You conduct a survey of 50
students and find that 6 of them are left-handed.
(a) What is the population proportion of left-handed students?
(b) What is the sample proportion of left-handed students?

2. A distribution of sample proportions is given with mean=0.6 and standard deviation=0.1. A
random sample of 32 is selected from this population.
(a) Compute the sample proportion for this sample.
(b) What is the probability of selecting another sample with a greater proportion than the one
you selected?

3. The following table gives the percentages of individuals, categorized by gender, that follow
certain negative health practices.

         Sleeps 6 Hours                   Rarely Eats    Is 20 Percent or
         or Less per Night    Smoker      Breakfast      More Overweight
Men      22.7                 28.4        45.4           29.6
Women    21.4                 22.8        42.0           25.6
i. Suppose a random sample of 300 men is chosen. Approximate the probability that
(a) at least 150 of them rarely eat breakfast;
(b) fewer than 100 of them smoke.
ii. Suppose a random sample of 300 women is chosen. Approximate the probability that
(a) at least 60 of them are overweight by 20 percent or more;
(b) fewer than 50 of them sleep 6 hours or less nightly.
