Chap5 Estimation Upload

CHAPTER 5
ESTIMATION AND
STATISTICAL INTERVALS
Outline of Chapter 5
5.1 Point Estimation
5.2 Large-Sample Confidence Intervals for a Population Mean
5.3 More Large-Sample Confidence Intervals
5.4 Small-Sample Intervals Based on a Normal Population Distribution
5.5 Intervals for µ1 − µ2 Based on Normal Population Distributions

→ Introduction
• The general objective of statistical inference is to use sample information
as a basis for drawing various types of conclusions.
• When a parameter is being estimated, the estimate can be either a single
number or a range of values.
➢ When the estimate is a single number, it is called a point estimate.
➢ When the estimate is a range of values, it is called an interval estimate.
Confidence intervals are used for interval estimates.
→ Point Estimation
• A point estimate of some parameter θ is a single number, calculated
from sample data, that can be regarded as an educated guess for the
value of θ.
• The symbol θ̂ is frequently used to denote either the estimator or the
resulting estimate.
➢ For example, we might decide that .350 is a point estimate for the
proportion π of all individuals who would try a particular product
again after using a free trial sample.
→ Properties of Estimators
→ Bias and unbiased estimators
• One desirable property that a good estimator should possess is that it
be unbiased.
• In terms of sampling distributions, an estimator is said to be unbiased
if the mean of its sampling distribution coincides with the parameter
that is being estimated.
➢ For instance, the sampling distribution of the statistic 𝑥̅ has a mean
value of µ𝑥̅, which equals the mean µ of the population from which
the samples are taken.
➢ Thus 𝑥̅ is an estimator of the parameter µ and, because µ𝑥̅ = µ,
𝑥̅ is also an unbiased estimator of µ.
In general, for any population parameter θ and any estimator θ̂ of that
parameter, the figure illustrates what it means for θ̂ to be unbiased or biased.
[Figure: Sampling distribution of an estimator θ̂]

DEFINITIONS
Denote a population parameter generically by the letter θ and denote any
estimator of this parameter by θ̂. Then θ̂ is an unbiased estimator if
µθ̂ = θ. Otherwise, θ̂ is said to be biased, and the quantity µθ̂ − θ is called
the bias of θ̂.
• Some of the most important statistics we have studied are unbiased
estimators of certain population parameters.
• For example, it can be shown that the sample mean 𝑥̅ is an unbiased
estimator of the population mean µ, the sample variance s² is an
unbiased estimator of the population variance σ², and the sample
proportion p is an unbiased estimator of the population proportion π.
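The unbiasedness claims for 𝑥̅ and s² can be checked exactly on a toy case. The sketch below assumes a hypothetical four-value population (not from the slides) sampled with replacement, lists every equally likely sample of size 2, and averages each statistic over all of them. It also shows that dividing by n instead of n − 1 gives a biased variance estimator.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population, sampled with replacement (illustrative values only).
population = [2, 4, 4, 10]
n = 2  # sample size


def mean(values):
    """Exact arithmetic mean as a Fraction."""
    return Fraction(sum(values), len(values))


def variance(values, divisor):
    """Sum of squared deviations from the mean, divided by `divisor`."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / divisor


mu = mean(population)                           # population mean
sigma2 = variance(population, len(population))  # population variance

# Every equally likely ordered sample of size n.
samples = list(product(population, repeat=n))

# Mean of each statistic's sampling distribution.
mean_of_xbar = mean([mean(s) for s in samples])
mean_of_s2 = mean([variance(s, n - 1) for s in samples])
mean_of_divisor_n = mean([variance(s, n) for s in samples])

print("mu =", mu, " mean of x-bar =", mean_of_xbar)
print("sigma^2 =", sigma2, " mean of s^2 =", mean_of_s2)
print("mean of the divisor-n variance =", mean_of_divisor_n)
```

The first two comparisons agree exactly, while the divisor-n version averages to (n − 1)/n times σ², which is why s² uses n − 1.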
→ Consistency
• A second desirable property that estimators often possess is
consistency. If θ̂ denotes an estimator of some population parameter
θ, then θ̂ is said to be consistent if the probability that it lies close to θ
increases to 1 as the sample size increases.
➢ Consistent estimators become more and more accurate as the sample
size increases. That is, as n increases, it becomes more and more
likely that such estimators will be very close to the parameter they are
intended to estimate.
• The most common method for showing that an unbiased estimator is
consistent is to show that its standard error decreases toward 0 as the
sample size increases.
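A small numerical sketch of consistency for 𝑥̅, assuming its sampling distribution is (approximately) normal with standard error σ/√n. The values σ = 2 and the closeness threshold c = 0.5 are hypothetical choices for illustration.

```python
from math import sqrt
from statistics import NormalDist


def standard_error(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / sqrt(n)


def prob_within(sigma, n, c):
    """P(|x-bar - mu| < c), assuming x-bar is normal with std sigma / sqrt(n)."""
    z = c / standard_error(sigma, n)
    return 2 * NormalDist().cdf(z) - 1


sigma, c = 2.0, 0.5  # hypothetical population std and closeness threshold
for n in (25, 100, 400):
    print(n, standard_error(sigma, n), round(prob_within(sigma, n, c), 4))
```

As n grows, the standard error shrinks and the probability that 𝑥̅ lies within c of µ climbs toward 1.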
→ Interval estimate or confidence interval (CI)
• A point estimate, because it is a single number, by itself provides no
information about the precision and reliability of estimation.
• Because of sampling variability, it is virtually never the case that 𝑥̅ = µ.
The point estimate says nothing about how close it might be to µ.
• An alternative is to calculate and report an entire interval of plausible
values: an interval estimate or confidence interval (CI).
• A confidence interval is always calculated by first selecting a confidence
level, which is a measure of the degree of reliability of the interval.
• The higher the confidence level, the more strongly we believe that the value
of the parameter being estimated lies within the interval.
→ A Confidence Interval for µ with Confidence Level 95%
• A confidence interval for a population or process mean µ is based on
the following properties of the sampling distribution of 𝑥̅:
• When n is large, the 𝑥̅ distribution is approximately normal (this is
the Central Limit Theorem).
• Standardizing 𝑥̅ gives a variable with approximately a standard normal
distribution (the z curve):
z = (𝑥̅ − µ) / (σ/√n)
• In practice, the standard deviation σ will almost never be known → replace
it with the sample standard deviation s:
z = (𝑥̅ − µ) / (s/√n)
• From Appendix Table I, the z curve captures a central area of .95 between
−1.96 and 1.96:
P(−1.96 < (𝑥̅ − µ)/(s/√n) < 1.96) ≈ .95
[Figure: Capturing a central curve area of .95]
Rearranging the inequalities to isolate µ gives
𝑥̅ − 1.96 s/√n < µ < 𝑥̅ + 1.96 s/√n
Substituting the values of n, 𝑥̅, and s from any particular sample into these
expressions gives a confidence interval for µ with a confidence level of
approximately 95%.
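A minimal sketch of the large-sample 95% interval 𝑥̅ ± 1.96 s/√n, computed from summary statistics. The function name and the input values are hypothetical and chosen only to make the arithmetic easy to follow. They are not the Example 5.3 data.

```python
from math import sqrt


def large_sample_ci_95(xbar, s, n):
    """Approximate 95% CI for mu: xbar +/- 1.96 * s / sqrt(n) (n assumed large)."""
    half_width = 1.96 * s / sqrt(n)
    return xbar - half_width, xbar + half_width


# Hypothetical summary statistics: n = 64 observations, mean 50, std 8.
lower, upper = large_sample_ci_95(xbar=50.0, s=8.0, n=64)
print(f"95% CI for mu: ({lower:.2f}, {upper:.2f})")
```

With these inputs the standard error is 8/√64 = 1, so the interval is simply 50 ± 1.96.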
→ A Confidence Interval for µ with Confidence Level 95% → Example 5.3
Given the accompanying sample observations on breakdown voltage (kV) of a
particular circuit under certain conditions: What is the CI for µ?
The boxplot of the data shows a high concentration in the middle half of
the data (narrow box width).
[Figure: Output from the JMP software’s Analyze/Distribution command]
→ A Confidence Interval for µ with Confidence Level 95% → Example 5.3
Solution:
→ Other Confidence Levels and a General Formula
• The confidence level of 95% was inherited from the probability .95 with which we began
the derivation of the interval. This probability in turn dictated the use of the z critical value 1.96
in the confidence interval formula.
➢ It follows that if we want a confidence level of 99%, we should identify the z critical value
that captures a central z curve area of .99.
[Figure: Finding the critical value for a 99% confidence level]
A large-sample confidence interval for a population or process mean µ is given by
the formula:
𝑥̅ ± (z critical value) · s/√n
• As a general rule, this interval is appropriate when the sample size exceeds 30.
• The three most commonly used confidence levels, 90%, 95%, and 99%, use
critical values of 1.645, 1.96, and 2.576, respectively.
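The z critical values can also be computed instead of read from a table. This sketch uses the standard normal distribution in Python's standard library. The helper name z_critical is an assumption made for illustration.

```python
from statistics import NormalDist


def z_critical(confidence_level):
    """z value capturing a central z-curve area equal to confidence_level."""
    tail_area = (1 - confidence_level) / 2  # area left in each tail
    return NormalDist().inv_cdf(1 - tail_area)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z critical value = {z_critical(level):.3f}")
```

For a 99% level, each tail keeps area .005, so the critical value is the .995 quantile of the z curve.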

Exercises:
1.
2. Random samples of size n are selected from a normal population whose standard
deviation 𝜎 is known to be 2.
a. Suppose you want 90% of the area under the sampling distribution of 𝑥̅ to lie within
±1 unit of a population mean 𝜇. Find the minimum sample size n that satisfies this
requirement.
b. Repeat the calculations in part (a) for areas of 80%, 95%, and 99%.
Solution Ex1:
Solution Ex2:
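One possible numerical check for Exercise 2, offered as a sketch rather than the official solution. Because the population is normal with known σ = 2, 𝑥̅ is normal with standard deviation σ/√n, so the requirement is z·σ/√n ≤ 1, that is, n ≥ (zσ/1)², where z captures the stated central area. The function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def min_sample_size(sigma, bound, area):
    """Smallest n for which a central `area` of the x-bar distribution
    lies within +/- bound of mu, given known sigma."""
    z = NormalDist().inv_cdf(1 - (1 - area) / 2)
    return ceil((z * sigma / bound) ** 2)


# Part (a): 90%; part (b): 80%, 95%, and 99%.
for area in (0.90, 0.80, 0.95, 0.99):
    print(f"{area:.0%}: n >= {min_sample_size(sigma=2, bound=1, area=area)}")
```

Rounding up is needed because n must be a whole number and rounding down would violate the requirement.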
→ Choosing the Sample Size
• The half-width 1.96 s/√n of the 95% CI is sometimes called the bound on
the error of estimation associated with a 95% confidence level; that is, with
95% confidence, the point estimate 𝑥̅ will be no farther than this from µ.
• Before obtaining data, an investigator may wish to determine a sample size
for which a particular value of the bound is achieved.
• More generally, suppose we wish to estimate µ to within an amount B (the
specified bound on the error of estimation) with 95% confidence. This
implies that B = 1.96 s/√n, from which:
n = (1.96 s / B)²
→ How can a value for s be chosen before the data are collected?
=> For a population distribution that is not too skewed: s ≈ (xmax − xmin)/4
→ Choosing the Sample Size: Example
• Example: Back to Example 5.3
Given the accompanying sample observations on breakdown voltage (kV)
of a particular circuit:
→ Suppose that the investigator believes that almost all values in the population
distribution are between 40 and 70. Then (70 − 40)/4 = 7.5 gives a reasonable value
for s.
→ Question: What is the appropriate sample size for estimating the true average
breakdown voltage to within 1 kV with a confidence level of 95%?
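The question can be answered with n = (1.96 s / B)², using the range-based guess s = 7.5 and B = 1 kV from the text. The sketch below rounds up, since a fractional sample size is not possible. The function name is an assumption.

```python
from math import ceil


def sample_size_for_bound(s, bound, z=1.96):
    """Sample size so that z * s / sqrt(n) is at most `bound` (rounded up)."""
    return ceil((z * s / bound) ** 2)


s_guess = (70 - 40) / 4  # range/4 rule from the text: 7.5
n_required = sample_size_for_bound(s_guess, bound=1.0)
print("s guess:", s_guess)
print("required sample size:", n_required)
```

Here (1.96 × 7.5 / 1)² = 216.09, so a sample of 217 observations is needed.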
→ One-Sided Confidence Intervals (Confidence Bounds)
→ A Large-Sample Confidence Interval for µ1 − µ2
• For a population: µ → mean value, σ → standard deviation (std), σ² → variance
• For a sample: 𝑥̅ → sample mean, s → sample std, s² → sample variance
→ Example 5.5
A study was carried out to compare population mean lifetimes (hr) for two different
brands of AA batteries. Here, µ1 and σ1 are the mean value and standard deviation for the
distribution of brand 1 lifetimes; µ2 and σ2 are the mean value and standard deviation
for the distribution of brand 2 lifetimes. Values of the summary quantities calculated
from the two resulting samples are as follows:
Question: What is the estimate of the difference µ1 − µ2?
→ The natural statistic for estimating µ1 is 𝑥̅1, and for estimating µ2 it is 𝑥̅2.
→ The difference µ1 − µ2 is estimated by 𝑥̅1 − 𝑥̅2.
→ The point estimate from the data is 4.15 − 4.53 = −0.38.
➢ That is, we estimate that, on average, brand 2 batteries last 0.38 hr longer than do
brand 1 batteries.
1. For any two random variables x and y, µ_(x−y) = µ_x − µ_y.
2. If x and y are two independent random variables, then σ²_(x−y) = σ²_x + σ²_y.
3. If x and y are independent random variables, each with a normal distribution, then
the difference x − y also has a normal distribution. If each variable is approximately
normal, then the distribution of the difference is also approximately normal.
→ Properties of the Sampling Distribution of 𝑥̅1 − 𝑥̅2
• Consider the result of standardizing 𝑥̅1 − 𝑥̅2 when both sample sizes are large.
• When n1 and n2 are both large, the standardized variable
z = [(𝑥̅1 − 𝑥̅2) − (µ1 − µ2)] / √(s1²/n1 + s2²/n2)
has approximately a standard normal distribution (the z curve).
• Using this variable in the same way that variables were used earlier to
obtain confidence intervals for µ and for π gives the following large-sample
confidence interval formula for estimating µ1 − µ2:
(𝑥̅1 − 𝑥̅2) ± (z critical value) · √(s1²/n1 + s2²/n2)
• This formula is valid irrespective of the shapes of the two underlying
distributions.
• The three most frequently used confidence levels of 95%, 99%, and 90%
are achieved by using the critical values 1.96, 2.576, and 1.645,
respectively.
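A short sketch of the large-sample interval for µ1 − µ2, (𝑥̅1 − 𝑥̅2) ± z·√(s1²/n1 + s2²/n2), computed from summary statistics. The function name and all numbers are hypothetical. They are not the battery or disk data, which are not reproduced in these slides.

```python
from math import sqrt


def two_sample_z_ci(xbar1, s1, n1, xbar2, s2, n2, z=1.96):
    """Large-sample CI for mu1 - mu2 (both n1 and n2 assumed large)."""
    estimate = xbar1 - xbar2
    std_error = sqrt(s1**2 / n1 + s2**2 / n2)
    return estimate - z * std_error, estimate + z * std_error


# Hypothetical summary statistics for two independent samples.
lower, upper = two_sample_z_ci(xbar1=10.0, s1=3.0, n1=36,
                               xbar2=8.0, s2=4.0, n2=64)
print(f"95% CI for mu1 - mu2: ({lower:.3f}, {upper:.3f})")
```

If the resulting interval excludes 0, the data suggest that the two population means differ at the chosen confidence level.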
→ Section 5.3 Exercises
• Given the following two samples of disks with 3/8-inch and 1/2-inch
diameters.
→ What is the interval estimate of µ1 − µ2 for the two populations with a confidence
level of 95%?
• The estimate of µ1 − µ2 with a confidence level of 95%:

→ t Distributions and the One-Sample t Confidence Interval
The large-sample interval for µ was obtained by using the fact that, for large n,
z = (𝑥̅ − µ) / (s/√n)
has approximately a standard normal distribution.
→ However, for small n this is no longer true.
→ For small n, we can use the t distribution.
PROPOSITION
Let x1, x2, . . . , xn be a random sample from a normal distribution. Then
the standardized variable
t = (𝑥̅ − µ) / (s/√n)
has a type of probability distribution called a t distribution with n − 1
degrees of freedom (df).
t Confidence Interval → Properties of Distributions
• The z distribution is a special case of the normal distribution with a mean
of 0 and standard deviation of 1, i.e., Z ~ N(0, 1).
• The t distribution is similar to the z distribution, but it depends on the
sample size and is used for small or moderate samples when the
population standard deviation is unknown.
➢ For large samples, the z and t distributions are very similar.
t Confidence Interval → Properties of Distributions
➢ How well a t distribution approximates the standard normal distribution
is determined by its degrees of freedom (df).
➢ The greater the sample size (n), the larger the degrees of freedom (n − 1)
and the better the t distribution approximates the standard normal distribution.
➢ The z curve is sometimes referred to as the t curve with df = ∞.
t Confidence Interval → One-Sample Confidence Intervals
Let 𝑥̅ and s be the sample mean and sample standard deviation of a random sample of size n from a
normal population distribution. Then a two-sided confidence interval for the population mean µ
has the form
𝑥̅ ± (t critical value) · s/√n
where the t critical value is based on n − 1 df.
t critical values for the most frequently used confidence levels, corresponding to particular
central t curve areas, are given in Appendix Table IV.
→ t Distributions and the One-Sample t Confidence Interval
→ One-Sample Confidence Intervals → Example 5.6
Consider the following observations
To simplify the calculation, we transform the data by setting yi = xi − 10,000.
[Figure: A Q-Q plot of the data]
→ Tolerance Intervals
Let k be a number between 0 and 100. A tolerance interval for capturing at
least k% of the x values in a normal population distribution with a confidence
level of 95% has the form
𝑥̅ ± (tolerance critical value) · s
Tolerance critical values for k = 90, 95, and 99 in combination with
various sample sizes are given in Appendix Table V.
→ Tolerance Intervals
1.
2. Given the following 16 mileages of a Porsche car:
a. What are the min and max values? Q1 and Q2? The mean 𝑥̅? The sample std s?
b. What is the interval estimate (CI) of the mean mileage with a 95% confidence level?
Solution:
5.5 Intervals for µ1 − µ2 Based on Normal Population Distributions
→ The Two-Sample t Interval
PROPOSITION
Consider two normal distributions with mean values µ1 and µ2, respectively. Suppose a
random sample of size n1 is selected from the first distribution, resulting in a sample
mean of 𝑥̅1 and a sample standard deviation of s1. A random sample of size n2 from the second
distribution, selected independently of that from the first one, yields sample mean 𝑥̅2
and sample standard deviation s2. Then the standardized variable
t = [(𝑥̅1 − 𝑥̅2) − (µ1 − µ2)] / √(s1²/n1 + s2²/n2)
has approximately a t distribution with df estimated from the sample by the following
formula:
df = [(se1)² + (se2)²]² / [(se1)⁴/(n1 − 1) + (se2)⁴/(n2 − 1)]
where se1 = s1/√n1 and se2 = s2/√n2 (Note: df should be rounded down to the nearest integer).

→ The Two-Sample t Interval
PROPOSITION
This implies that a confidence interval for µ1 − µ2 in this situation is
(𝑥̅1 − 𝑥̅2) ± (t critical value) · √(s1²/n1 + s2²/n2)
t critical values corresponding to the most frequently used confidence levels appear in
Appendix Table IV.
→ The Two-Sample t Interval → Example 5.7
Which way of dispensing champagne, the traditional vertical method or a tilted beerlike
pour, preserves more of the tiny gas bubbles that improve flavor and aroma? The
following data were reported in the article “On the Losses of Dissolved CO2 during
Champagne Serving”:
(standard deviation)
→ Question: Assuming the sampled distributions are normal, what are the confidence
intervals for the difference between the true average dissolved CO2 loss for the traditional
pour and that for the slanted pour at each of the two temperatures?
→ The Two-Sample t Interval → Example 5.7
→ A Confidence Interval from Paired Data
• Let µd denote the population mean difference, that is, the average of all
differences in the population. It can be shown that
µd = µ1 − µ2
where µ1 is the population mean value of all first numbers within pairs
and µ2 is defined similarly for all second numbers.
• The importance of this relationship is that if we can obtain a CI for
µd, it will also be a CI for µ1 − µ2.
• A CI for µd can be calculated from the differences for pairs in the
sample.
• In particular, if the population distribution of the differences can be
assumed to be normal, then a one-sample t interval based on the
sample differences is appropriate.
→ Example 5.8
Data are given on the modulus of elasticity obtained 1 minute after loading in a certain
configuration, together with the values of modulus of elasticity obtained 4 weeks after
loading for the same lumber specimens. The data are presented here.
[Figure: Normal quantile plot of the differences]
It is reasonable to assume that the population distribution of the differences is
approximately normal.
→ Example 5.8
The sample consists of 16 pairs, so a 99% confidence interval based on 15 df
requires the t critical value 2.947. With d̄ = 2635.6 and sd = 508.64, the interval is
2635.6 ± (2.947)(508.64/√16) = 2635.6 ± 374.7 = (2260.9, 3010.3)
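The paired interval in Example 5.8 is a one-sample t interval applied to the differences. The sketch below reproduces the arithmetic from the summary values stated in the example (16 pairs, mean difference 2635.6, standard deviation 508.64, t critical value 2.947 for 15 df). The function name is an assumption, and the critical value is passed in as a table lookup rather than computed.

```python
from math import sqrt


def t_interval(center, s, n, t_crit):
    """One-sample t interval: center +/- t_crit * s / sqrt(n)."""
    half_width = t_crit * s / sqrt(n)
    return center - half_width, center + half_width


# Summary values from Example 5.8.
lower, upper = t_interval(center=2635.6, s=508.64, n=16, t_crit=2.947)
print(f"99% CI for the mean difference: ({lower:.1f}, {upper:.1f})")
```

Because the whole interval lies well above 0, the data indicate that the mean modulus of elasticity after 1 minute exceeds the mean after 4 weeks, assuming the differences were taken as (1 minute) − (4 weeks).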

The firmness of a piece of fruit is an important indicator of fruit ripeness.
The Magness–Taylor firmness (N) was determined for one sample of 20
golden apples with a shelf life of zero days, resulting in a sample mean
of 8.74 and a sample standard deviation of .66, and another sample of 20
apples with a shelf life of 20 days, with a sample mean and sample
standard deviation of 4.96 and .39, respectively.
→ Calculate a confidence interval for the difference between true
average firmness for zero-day apples and true average firmness for 20-
day apples using a confidence level of 95%, and interpret the interval.
Solution:
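One way to work this exercise with the two-sample t interval, offered as a sketch rather than the official solution. It assumes both firmness distributions are normal, estimates df from the two squared standard errors (rounded down), and takes the t critical value 2.042 for 30 df from a standard t table. The function names are hypothetical.

```python
from math import floor, sqrt


def estimated_df(s1, n1, s2, n2):
    """df for the two-sample t interval, rounded down to an integer."""
    v1, v2 = s1**2 / n1, s2**2 / n2  # squared standard errors
    return floor((v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1)))


def two_sample_t_ci(xbar1, s1, n1, xbar2, s2, n2, t_crit):
    """CI for mu1 - mu2: (xbar1 - xbar2) +/- t_crit * sqrt(s1^2/n1 + s2^2/n2)."""
    estimate = xbar1 - xbar2
    std_error = sqrt(s1**2 / n1 + s2**2 / n2)
    return estimate - t_crit * std_error, estimate + t_crit * std_error


# Summary values from the exercise: zero-day apples, then 20-day apples.
df = estimated_df(0.66, 20, 0.39, 20)
t_crit = 2.042  # table value for 95% confidence with 30 df (assumed lookup)
lower, upper = two_sample_t_ci(8.74, 0.66, 20, 4.96, 0.39, 20, t_crit)
print("df =", df)
print(f"95% CI for mu1 - mu2: ({lower:.2f}, {upper:.2f})")
```

Under these assumptions, we can be about 95% confident that true average firmness for zero-day apples exceeds that for 20-day apples by between roughly 3.43 N and 4.13 N.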
