Lecture 8 - Statistics


Lecture 8

Random Sampling

• A population consists of the totality of the observations with which we are concerned.
• Let X1, X2, …, Xn be n independent random variables, each having the same probability distribution f(x). We then define X1, X2, …, Xn to be a random sample of size n from the population f(x) and write its joint probability distribution as
• f(x1, x2, …, xn) = f(x1)f(x2)⋯f(xn).
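Because the observations are independent, the joint probability of a whole sample is the product of the individual probabilities. A minimal Python sketch, assuming for illustration the discrete uniform population f(x) = 1/4 on {0, 1, 2, 3} that appears in a later example; the names `f` and `joint_probability` are hypothetical.

```python
from fractions import Fraction
from math import prod

def f(x):
    """Assumed example population: discrete uniform, f(x) = 1/4 for x = 0, 1, 2, 3."""
    return Fraction(1, 4) if x in (0, 1, 2, 3) else Fraction(0)

def joint_probability(sample):
    """Joint probability of an independent sample: f(x1) f(x2) ... f(xn)."""
    return prod(f(x) for x in sample)

print(joint_probability([0, 3, 1]))  # 1/64
print(joint_probability([0, 5]))     # 0, since f(5) = 0
```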
Sampling Theory
• A statistic is a random variable that depends only on the observed
random sample.
• The most common statistics for measuring the center of a set of data, arranged in order of magnitude, are the mean, the median, and the mode. The most important of these is the mean.
If X1, X2, …, Xn represent a random sample of size n, then the sample mean is defined by the statistic
$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
Example
Find the mean of the random sample whose observations are 20, 27, and 25.
Solution
x̄ = (20 + 27 + 25)/3 = 72/3 = 24.
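The definition of the sample mean translates directly into code. A minimal Python sketch (the function name `sample_mean` is our own, not from the lecture) applied to the observations of the example:

```python
def sample_mean(xs):
    """Sample mean: X̄ = (1/n) * (X1 + X2 + ... + Xn)."""
    if not xs:
        raise ValueError("sample must contain at least one observation")
    return sum(xs) / len(xs)

print(sample_mean([20, 27, 25]))  # 24.0
```

Python's standard library function `statistics.mean` computes the same quantity.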
The second most useful statistic for measuring the center of the set of data is the median.
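If X1, X2, …, Xn represent a random sample of size n, arranged in increasing order of magnitude, then the sample median is defined by the statistic
$\tilde{X} = \begin{cases} X_{(n+1)/2}, & n \text{ odd} \\ \tfrac{1}{2}\left(X_{n/2} + X_{n/2+1}\right), & n \text{ even} \end{cases}$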
Example
Find the median for the random sample
whose observations are 8, 3, 9, 5, 6, 8, and 5.
Solution
Arranging the observations in order of magnitude, 3, 5, 5, 6, 8, 8, 9, gives the median as 6.
Example
Find the median for the random sample whose observations are 10, 8, 4, and 7.
Solution
Arranging the observations in order of magnitude, 4, 7, 8, 10, we see that n is even, so the median is the arithmetic mean of the two middle values.
Therefore the median is (7 + 8)/2 = 7.5.
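The two cases of the median definition (n odd, n even) can be sketched in Python as follows; `sample_median` is a hypothetical helper name, and Python lists are indexed from 0, so the 1-based indices in the formula shift by one.

```python
def sample_median(xs):
    """Middle value of the ordered sample, or the mean of the two middle values when n is even."""
    ordered = sorted(xs)
    n = len(ordered)
    if n == 0:
        raise ValueError("sample must contain at least one observation")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                          # X_{(n+1)/2} in 1-based notation
    return (ordered[mid - 1] + ordered[mid]) / 2     # mean of X_{n/2} and X_{n/2+1}

print(sample_median([8, 3, 9, 5, 6, 8, 5]))  # 6
print(sample_median([10, 8, 4, 7]))          # 7.5
```

The standard library's `statistics.median` follows the same rule for even n.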
• The third and final statistic for measuring the center of a random
sample is the mode.
• If X1, X2, …, Xn, not necessarily different, represent a random sample of size n, then the sample mode M is the value of the sample that occurs most often, that is, with the greatest frequency. The mode may not exist, and when it does, it is not necessarily unique.
• Example
• The mode of the random sample whose observations are
2,4,4,5,6,6,6,7,7, and 8 is 6.
• Example
• The observations 3, 4, 4, 4, 4, 7, 7, 8, 8, 8, 8, and 9 have two modes, 4 and 8, since both occur with the greatest frequency. The distribution of the sample is said to be bimodal.
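Because the mode may be missing or non-unique, a sketch that returns a list of modes is convenient. A minimal Python version, assuming (our own convention, not the lecture's) that a sample in which every value occurs once has no mode; `sample_modes` is a hypothetical name.

```python
from collections import Counter

def sample_modes(xs):
    """Return the value(s) occurring with the greatest frequency, in increasing order.

    Assumption: if no value repeats, the mode is treated as nonexistent and [] is returned.
    """
    counts = Counter(xs)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []                      # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == top)

print(sample_modes([2, 4, 4, 5, 6, 6, 6, 7, 7, 8]))        # [6]
print(sample_modes([3, 4, 4, 4, 4, 7, 7, 8, 8, 8, 8, 9]))  # [4, 8] (bimodal)
print(sample_modes([1, 2, 3]))                             # []
```

The standard library's `statistics.multimode` is similar, but when all values are distinct it returns every value instead of an empty list.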
• The relative merits of the three measures (mean, median, mode) are as follows.
• (1) The mean is the most commonly used measure of central tendency in statistics.
• (2) The only real disadvantage of the mean is that it may be affected adversely by extreme values.
• (3) The median has the advantage of being easy to compute, and it is not influenced by extreme values.
• (4) The mode is the least used measure of the three. For a small set of data its value is almost useless, if in fact it exists at all. Its only advantage is that it requires no calculation.
Variability Measurement

• The most important statistics for measuring the variability of a random sample are the range and the variance. The simplest of these to compute is the range.
• The range of a random sample X1, X2, …, Xn, arranged in increasing order of magnitude, is defined by the statistic Xn − X1.
• Example
• The range of the set of observations 10, 12, 12, 18, 19, 22, and 24 is 24 − 10 = 14.
• The range is a poor measure of variability, particularly if the
size of the sample is large. It considers only extreme values
and tells nothing about the distribution of values in
between.
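The range needs only the two extreme observations, which is also why it says nothing about the values in between. A small Python sketch; `sample_range` is a hypothetical name, and the two comparison samples are made-up data chosen to have the same range but very different spread.

```python
def sample_range(xs):
    """Range of a sample: largest observation minus smallest (Xn − X1 once sorted)."""
    if not xs:
        raise ValueError("sample must contain at least one observation")
    return max(xs) - min(xs)

print(sample_range([10, 12, 12, 18, 19, 22, 24]))  # 14

# Hypothetical samples: same range, very different distribution between the extremes
clustered = [10, 17, 17, 17, 17, 24]
spread_out = [10, 10, 10, 24, 24, 24]
print(sample_range(clustered), sample_range(spread_out))  # 14 14
```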
To overcome the disadvantage of the range, the sample variance is used, which considers the position of each observation relative to the sample mean.
If X1, X2, …, Xn represent a random sample of size n, then the sample variance is defined by the statistic
$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$
If S² is the variance of a random sample of size n, we may write
$S^2 = \frac{n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}{n(n-1)}$
Example
Find the variance of the sample whose observations are 3, 4, 5, 6, 6, and 7.
Solution
∑xi² = 171 and ∑xi = 31 (sums over i = 1 to 6), with n = 6. Hence
S² = ((6)(171) − (31)²)/((6)(5)) = 65/30 = 13/6.
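The definitional formula and the computational formula for S² should always agree. A Python sketch using exact fractions so the comparison is not blurred by floating-point rounding; the two function names are our own.

```python
from fractions import Fraction

def variance_definition(xs):
    """S² = Σ (xi − x̄)² / (n − 1), computed exactly."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = Fraction(sum(xs), n)
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def variance_shortcut(xs):
    """S² = (n Σ xi² − (Σ xi)²) / (n (n − 1)), computed exactly."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    return Fraction(n * sum(x * x for x in xs) - sum(xs) ** 2, n * (n - 1))

data = [3, 4, 5, 6, 6, 7]
print(variance_definition(data), variance_shortcut(data))  # 13/6 13/6
```

The shortcut needs only the running totals Σxi and Σxi², which is why it is convenient for hand calculation.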

The probability distribution of a statistic is called a sampling distribution.


The standard deviation of the sampling distribution of a statistic
is called the standard error of the statistic.

The probability distribution of X̄ is called the sampling distribution of the mean, and the standard error of the mean is the standard deviation of the sampling distribution of X̄.
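For a very small sample the sampling distribution of X̄ can be written out exactly by listing every possible sample. A Python sketch assuming the discrete uniform population on {0, 1, 2, 3} and samples of size n = 2 drawn with replacement (both choices are ours, for illustration); it shows that X̄ has mean μ = 3/2 and variance σ²/n = (5/4)/2 = 5/8, so its standard error is σ/√n.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product
from math import sqrt

population = [0, 1, 2, 3]   # assumed population: discrete uniform, f(x) = 1/4
n = 2                       # assumed (small) sample size so every sample can be listed

# Every ordered sample drawn with replacement is equally likely
p_sample = Fraction(1, len(population) ** n)
sampling_dist = defaultdict(Fraction)          # value of x̄ -> probability
for sample in product(population, repeat=n):
    sampling_dist[Fraction(sum(sample), n)] += p_sample

mean_xbar = sum(v * p for v, p in sampling_dist.items())
var_xbar = sum((v - mean_xbar) ** 2 * p for v, p in sampling_dist.items())
standard_error = sqrt(var_xbar)                # standard deviation of the sampling distribution

for v in sorted(sampling_dist):
    print(v, sampling_dist[v])
print("mean:", mean_xbar, "variance:", var_xbar, "standard error:", round(standard_error, 4))
```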
Sampling Distributions of Means

If we are sampling from a population with unknown distribution, either finite or infinite, the sampling distribution of X̄ will still be approximately normal with mean μ and variance σ²/n, provided that the sample size is large.

If X̄ is the mean of a random sample of size n taken from a population with mean μ and variance σ², then the limiting form of the distribution of
$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
as n → ∞ is the standard normal distribution n(z; 0, 1).


• The normal approximation for the sample mean will generally be good if n ≥ 30, regardless of the shape of the population. If n < 30, the approximation is good only if the population is not too different from a normal population.
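A quick simulation makes the statement concrete. This Python sketch assumes the (clearly non-normal) discrete uniform population on {0, 1, 2, 3} and n = 36, draws 20,000 samples with a fixed seed, and compares the mean and variance of the simulated X̄ values with μ and σ²/n; the sample counts and seed are arbitrary choices.

```python
import random
import statistics

random.seed(1)                      # fixed seed so the run is reproducible

population = [0, 1, 2, 3]           # assumed population: discrete uniform, f(x) = 1/4
n, trials = 36, 20_000              # sample size and number of simulated samples

# Draw many samples with replacement and record each sample mean
means = [statistics.fmean(random.choices(population, k=n)) for _ in range(trials)]

mu = statistics.fmean(population)                   # population mean μ = 1.5
sigma2 = float(statistics.pvariance(population))    # population variance σ² = 1.25

sim_mean = statistics.fmean(means)
sim_var = statistics.pvariance(means)
print(f"mean of X̄: {sim_mean:.4f}  (theory μ = {mu})")
print(f"variance of X̄: {sim_var:.5f}  (theory σ²/n = {sigma2 / n:.5f})")
```

Plotting a histogram of `means` would show the bell shape, even though the population itself is flat.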
Example

• An electrical firm manufactures light bulbs that have a length of life that is approximately normally distributed, with mean 800 hours and standard deviation of 40 hours. Find the probability that a random sample of 16 bulbs will have an average life of less than 775 hours.
• Solution
• $\mu_{\bar{X}} = 800$, $\sigma_{\bar{X}} = 40/\sqrt{16} = 10$
• z = (775 − 800)/10 = −2.5
• P(X̄ < 775) = P(Z < −2.5) = 0.0062
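The table lookup can be reproduced with `statistics.NormalDist` from Python's standard library (Python 3.8 or later). A minimal sketch of the calculation above:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 800, 40, 16
se = sigma / sqrt(n)              # standard error of the mean: 10 hours
z = (775 - mu) / se               # standardized value: -2.5
p = NormalDist().cdf(z)           # P(Z < -2.5)

print(round(z, 2), round(p, 4))   # -2.5 0.0062
```

Equivalently, `NormalDist(mu, se).cdf(775)` skips the explicit standardization.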
• Example
• Given the discrete uniform population
• f(x) = 1/4, x = 0, 1, 2, 3
• = 0 elsewhere
• Find the probability that a random sample of size 36, selected with replacement, will yield a sample mean greater than 1.4 but less than 1.8 if the mean is measured to the nearest tenth.
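Solution
• μ = (0 + 1 + 2 + 3)/4 = 1.5
• σ² = [(0 − 1.5)² + (1 − 1.5)² + (2 − 1.5)² + (3 − 1.5)²]/4 = 5/4
• $\sigma^2_{\bar{X}} = \sigma^2/n = (5/4)/36 = 5/144$
• $\sigma_{\bar{X}} = \sqrt{5/144} = 0.186$
• Since the mean is measured to the nearest tenth, the admissible values 1.5, 1.6, and 1.7 correspond to x̄1 = 1.45 and x̄2 = 1.75
• z1 = (1.45 − 1.5)/0.186 = −0.269
• z2 = (1.75 − 1.5)/0.186 = 1.344
• P(1.4 < X̄ < 1.8) ≈ P(−0.269 < Z < 1.344)
• = P(Z < 1.344) − P(Z < −0.269) = 0.9105 − 0.3940 = 0.5165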
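A Python sketch of the same normal approximation, using `statistics.NormalDist` (Python 3.8+). It keeps the standard error unrounded, so its answer should agree with the hand calculation to about two decimal places; small differences come from rounding z and reading the table.

```python
from math import sqrt
from statistics import NormalDist, fmean, pvariance

population = [0, 1, 2, 3]                 # discrete uniform population from the example
n = 36
mu = fmean(population)                    # population mean μ = 1.5
var = float(pvariance(population))        # population variance σ² = 1.25
se = sqrt(var / n)                        # σ_X̄ = sqrt(5/144) ≈ 0.186

# Continuity correction: reported means 1.5, 1.6, 1.7 correspond to 1.45 < x̄ < 1.75
Z = NormalDist()
p = Z.cdf((1.75 - mu) / se) - Z.cdf((1.45 - mu) / se)
print(round(se, 4), round(p, 4))
```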
Sampling Distribution of the Difference of Means
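If independent samples of size n1 and n2 are drawn at random from two populations, discrete or continuous, with means μ1 and μ2 and variances σ1² and σ2², respectively, then the sampling distribution of the differences of means, X̄1 − X̄2, is approximately normally distributed with mean and variance given by
$\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2, \qquad \sigma^2_{\bar{X}_1 - \bar{X}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$
Hence
$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$
is approximately a standard normal variable.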


• Example
• The television picture tubes of manufacturer A have a mean lifetime of 6.5 years and a standard deviation of 0.9 year, while those of manufacturer B have a mean lifetime of 6.0 years and a standard deviation of 0.8 year. What is the probability that a random sample of 36 tubes from manufacturer A will have a mean lifetime that is at least a year more than the mean lifetime of a sample of 49 tubes from manufacturer B?
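Solution
$\mu_{\bar{X}_A - \bar{X}_B} = 6.5 - 6.0 = 0.5$ and $\sigma_{\bar{X}_A - \bar{X}_B} = \sqrt{0.81/36 + 0.64/49} \approx 0.189$
z = (1.0 − 0.5)/0.189 = 2.646
P(X̄A − X̄B ≥ 1.0) ≈ P(Z > 2.646) = 1 − P(Z < 2.646)
= 1 − 0.9959 = 0.0041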
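The difference-of-means calculation for the picture tubes, sketched in Python with `statistics.NormalDist`. The standard error is not rounded to 0.189 here, so the result may differ from the hand answer in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

mu_a, sd_a, n_a = 6.5, 0.9, 36    # manufacturer A
mu_b, sd_b, n_b = 6.0, 0.8, 49    # manufacturer B

mean_diff = mu_a - mu_b                          # mean of X̄A − X̄B: 0.5
se_diff = sqrt(sd_a**2 / n_a + sd_b**2 / n_b)    # standard error of X̄A − X̄B
z = (1.0 - mean_diff) / se_diff
p = 1 - NormalDist().cdf(z)                      # P(X̄A − X̄B ≥ 1.0)

print(round(se_diff, 4), round(z, 3), round(p, 4))
```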
Sampling Distribution of (n − 1)S²/σ²

• If S² is the variance of a random sample of size n taken from a normal population having variance σ², then the random variable
• χ² = (n − 1)S²/σ²
• has a chi-square distribution with ν = n − 1 degrees of freedom.
• The probability that a random sample produces a χ² value greater than some specified value is equal to the area under the curve to the right of this value.
• Example
• A manufacturer of car batteries guarantees that his batteries will last, on the average, 3 years with a standard deviation of 1 year. If five of these batteries have lifetimes of 1.9, 2.4, 3.0, 3.5, and 4.2 years, is the manufacturer still convinced that his batteries have a standard deviation of 1 year?
• Solution
• ∑xi² = 48.26, ∑xi = 15, and n = 5, so
• S² = ((5)(48.26) − (15)²)/((5)(4)) = 0.815
• Then χ² = (4)(0.815)/1 = 3.26, with ν = 4 degrees of freedom.
• From the tables, 95% of the χ² values with 4 degrees of freedom fall between 0.484 and 11.143, so the computed value is reasonable and the manufacturer has no reason to suspect that the standard deviation is other than 1 year.
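The battery calculation in Python. For an even number of degrees of freedom the chi-square tail has a closed form; for exactly ν = 4 it is P(χ² > x) = e^(−x/2)(1 + x/2), which this sketch uses (the helper name `chi2_upper_tail_4df` is ours, and it is valid only for ν = 4).

```python
from math import exp
from statistics import variance

def chi2_upper_tail_4df(x):
    """P(χ² > x) for exactly 4 degrees of freedom: e^(−x/2) * (1 + x/2)."""
    return exp(-x / 2) * (1 + x / 2)

lifetimes = [1.9, 2.4, 3.0, 3.5, 4.2]
sigma2 = 1.0                      # claimed population variance (σ = 1 year)
n = len(lifetimes)

s2 = variance(lifetimes)          # sample variance S², divisor n − 1
chi2 = (n - 1) * s2 / sigma2      # χ² = (n − 1)S²/σ² with ν = n − 1 = 4

lower, upper = 0.484, 11.143      # table values quoted in the example
print(round(s2, 3), round(chi2, 2), lower < chi2 < upper)
print(round(chi2_upper_tail_4df(chi2), 3))   # area to the right of the observed χ²
```

As a check, the closed form gives about 0.975 at 0.484 and about 0.025 at 11.143, matching the table bounds.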
t Distribution

• If the sample size is small (n < 30), the value of S² fluctuates considerably from sample to sample, and the distribution of the random variable (X̄ − μ)/(S/√n) is no longer a standard normal distribution. We are now dealing with the distribution of a statistic that we shall call T, where
• T = (X̄ − μ)/(S/√n)
• When the random sample is taken from a normal population, T has a t distribution with ν = n − 1 degrees of freedom.
• The distribution of T is similar to the distribution of Z in that both are symmetric about a mean of zero. Both distributions are bell-shaped, but the t distribution is more variable, owing to the fact that the T values depend on the fluctuations of two quantities, X̄ and S², whereas the Z values depend only on the changes of X̄ from sample to sample. The distribution of T differs from that of Z in that the variance of T depends on the sample size and is always greater than 1.
Example
• A manufacturer of light bulbs claims that his bulbs will burn on the average 500 hours. To maintain this average, he tests 25 bulbs each month. If the computed t value falls between −t0.05 and t0.05, he is satisfied with his claim. What conclusion should he draw from a sample that has mean x̄ = 518 hours and standard deviation s = 40 hours? Assume the distribution of burning times is approximately normal.
• Solution
• From Table V we find t0.05 = 1.711 for 24 degrees of freedom. Therefore, the manufacturer is satisfied with his claim if a sample of 25 bulbs yields a t value between −1.711 and 1.711. If μ = 500, then
• t = (518 − 500)/(40/√25) = 2.25,
• a value well above 1.711. The probability of obtaining a t value, with ν = 24, equal to or greater than 2.25 is approximately 0.02. If μ > 500, the value of t computed from the sample would be more reasonable. Hence the manufacturer is likely to conclude that his bulbs are a better product than he thought.
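Python's standard library has no t distribution, so this sketch integrates the Student t density numerically with Simpson's rule to estimate P(T ≥ 2.25) for ν = 24. The helpers `t_pdf` and `t_upper_tail` are our own names; a statistics package would normally be used instead.

```python
from math import gamma, pi, sqrt

def t_pdf(t, nu):
    """Density of Student's t distribution with nu degrees of freedom."""
    c = gamma((nu + 1) / 2) / (sqrt(nu * pi) * gamma(nu / 2))
    return c * (1 + t * t / nu) ** (-(nu + 1) / 2)

def t_upper_tail(t, nu, steps=2000):
    """P(T >= t) for t >= 0: Simpson's rule on [0, t], then symmetry about zero."""
    h = t / steps                                   # steps must be even for Simpson's rule
    total = t_pdf(0, nu) + t_pdf(t, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 - total * h / 3

xbar, mu0, s, n = 518, 500, 40, 25
t = (xbar - mu0) / (s / sqrt(n))   # observed t value: 2.25
nu = n - 1                         # 24 degrees of freedom
p = t_upper_tail(t, nu)
print(t, round(p, 3))
```

The same helper returns about 0.05 at t = 1.711, matching the tabled t0.05 for 24 degrees of freedom.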
F Distribution

• One of the most important distributions in applied statistics is the F distribution. The statistic F is defined to be the ratio of two independent chi-square random variables, each divided by its number of degrees of freedom. Hence
• F = (U/ν1)/(V/ν2)
• where U and V are independent random variables having chi-square distributions with ν1 and ν2 degrees of freedom, respectively.
• Let us define fα to be the particular value f of the random variable F above which we find an area equal to α.
• Writing fα(ν1, ν2) for fα with ν1 and ν2 degrees of freedom, we obtain
• f1−α(ν1, ν2) = 1/fα(ν2, ν1)
• For example, f0.95(6, 10) = 1/f0.05(10, 6) = 1/4.06 = 0.246.
• If S1² and S2² are the variances of independent random samples of size n1 and n2 taken from normal populations with variances σ1² and σ2², respectively, then
• F = (S1²/σ1²)/(S2²/σ2²) = (σ2²S1²)/(σ1²S2²)
• has an F distribution with ν1 = n1 − 1 and ν2 = n2 − 1 degrees of freedom.
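The reciprocal rule can be checked by simulating F directly from its definition. The sketch relies on two facts: Python's `random.gammavariate(alpha, beta)` draws a gamma variate with shape alpha and scale beta, and a chi-square variable with ν degrees of freedom is a gamma variable with shape ν/2 and scale 2. If f0.95(6, 10) ≈ 0.246, then about 95% of simulated F(6, 10) values should lie above it. The seed and number of draws are arbitrary choices.

```python
import random

random.seed(7)   # fixed seed so the run is reproducible

def chi2_variate(nu):
    """Chi-square variate with nu degrees of freedom: gamma(shape=nu/2, scale=2)."""
    return random.gammavariate(nu / 2, 2)

def f_variate(nu1, nu2):
    """F = (U/nu1)/(V/nu2) with U, V independent chi-square variables."""
    return (chi2_variate(nu1) / nu1) / (chi2_variate(nu2) / nu2)

trials = 100_000
draws = [f_variate(6, 10) for _ in range(trials)]

f_095 = 1 / 4.06   # f_0.95(6, 10) via the reciprocal rule and the table value f_0.05(10, 6) = 4.06
area_right = sum(d > f_095 for d in draws) / trials
print(round(f_095, 3), area_right)   # area_right should be close to 0.95
```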
