
Estimation and CI

The document discusses statistical inference, focusing on estimation and confidence intervals (CIs) as methods to generalize findings from a sample to a population. It explains key concepts such as population, parameter, sample, statistic, and the importance of sampling distributions and the Central Limit Theorem in constructing CIs. Additionally, it outlines how to calculate CIs, interpret confidence levels, and the factors affecting the accuracy and width of these intervals.

Uploaded by Sukanya Das
Copyright © All Rights Reserved

Estimation and CI

Dr. Adrija Bhattacharya


Statistical inference
• Statistical inference: generalizing from a
sample to a population with a calculated
degree of certainty
• Two forms of statistical inference
– Estimation
– Hypothesis testing
Inferential Statistics

• Research is about trying to make valid
inferences
• Inferential statistics: the part of statistics
that allows researchers to generalize their
findings beyond the data collected.
• Statistical inference: a procedure for
making inferences or generalizations about
a larger population from a sample of that
population
How Statistical Inference Works
Basic Terminology
• Population: any collection of entities that
have at least one characteristic in common
• Parameter: the numbers that describe
characteristics of scores in the population
(mean, variance, s.d., etc.)
Basic Terminology (cont’d)
• Sample: a part of the population
• Statistic: the numbers that describe
characteristics of scores in the sample (mean,
variance, s.d., correlation coefficient,
reliability coefficient, etc.)
Basic Statistical Symbols
Basic Terminology (cont’d)
• Estimate: a number computed by using the
data collected from a sample
• Estimator: formula used to compute an
estimate
The Process of Estimation
Types of Samples
• Probability
– Simple Random Samples
– Simple Stratified Samples
– Systematic Samples
– Cluster Samples
• Non-probability
– Purposive Samples
– Convenience Samples
– Quota Samples
– Snowball Samples
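A minimal Python sketch of how the four probability designs listed above could be drawn. It assumes a hypothetical population of 100 numbered units, equal-size strata and clusters, and equal allocation across strata; all of these are illustrative choices, not from the slides. The non-probability designs are not shown because they do not rely on a random mechanism.

```python
import random

# Hypothetical population: unit IDs 1..100 (not from the slides)
population = list(range(1, 101))
rng = random.Random(1)  # fixed seed so the sketch is reproducible

def simple_random_sample(units, n):
    # Every subset of n units is equally likely to be chosen
    return rng.sample(units, n)

def systematic_sample(units, n):
    # Take every k-th unit after a random start within the first k units
    k = len(units) // n
    start = rng.randrange(k)
    return units[start::k][:n]

def stratified_sample(strata, n_per_stratum):
    # Draw a separate simple random sample inside each stratum
    return {name: rng.sample(units, n_per_stratum) for name, units in strata.items()}

def cluster_sample(clusters, n_clusters):
    # Randomly pick whole clusters and keep every unit in the chosen clusters
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

srs = simple_random_sample(population, 10)
systematic = systematic_sample(population, 10)
strata = {"first half": population[:50], "second half": population[50:]}
stratified = stratified_sample(strata, 5)
clusters = {c: population[c * 10:(c + 1) * 10] for c in range(10)}  # 10 blocks of 10
cluster = cluster_sample(clusters, 2)
print(srs, systematic, stratified, cluster, sep="\n")
```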
Limits on Inferences and Warnings

• Response Rates
• Source of data
• Sample size and sample quality
• “Random”
Estimation

• Point Estimation
• Interval estimation
– Sampling Error
– Sampling Distribution
– Confidence Intervals
Interval Estimation

• Interval Estimation: an inferential statistical
procedure used to estimate population
parameters from sample data through the
building of confidence intervals
• Confidence Intervals: a range of values
computed from sample data by a procedure that has a known
probability (the confidence level) of capturing some population
parameter of interest
Parameters and estimates
• Parameter: a numerical characteristic of a population
• Statistic: a value calculated in a sample
• Estimate: a statistic that “guesstimates” a parameter
• Example: the sample mean “x-bar” (x̄) is the estimator of
the population mean µ

Parameters and estimates are related but are not
the same
Parameters and statistics

                    Parameters       Statistics
Source              Population       Sample
Notation            Greek (μ, σ)     Roman (x̄, s)
Random variable?    No               Yes
Calculated?         No               Yes
Sampling Error

• Samples rarely mirror the population exactly
• The sample statistics will almost always
contain sampling error
• Sampling error: the magnitude of the difference
between the sample statistic and the population
parameter
Sampling Distribution

• Sampling Distribution: a theoretical distribution
that shows the frequency of occurrence of values
of some statistic computed for all possible
samples of size N drawn from some population.

• Sampling Distribution of the Mean: a theoretical
distribution of the frequency of occurrence of
values of the mean computed for all possible
samples of size N from a population
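To make "all possible samples of size N" concrete, the sketch below enumerates every sample of size N = 2 (drawn without replacement) from a hypothetical five-value population, tabulates the sampling distribution of the mean, and computes each sample's sampling error (sample mean minus μ). The population values are invented for illustration.

```python
from collections import Counter
from itertools import combinations
from statistics import mean

# Hypothetical tiny population so every possible sample can be listed
population = [2, 4, 6, 8, 10]
N = 2
mu = mean(population)  # parameter: population mean (6)

# All possible samples of size N drawn without replacement
samples = list(combinations(population, N))
sample_means = [mean(s) for s in samples]

# Sampling distribution of the mean: how often each value of x-bar occurs
sampling_distribution = Counter(sample_means)

# Sampling error of each sample: statistic minus parameter
sampling_errors = [m - mu for m in sample_means]

for value in sorted(sampling_distribution):
    print(value, sampling_distribution[value])
```

The sample means scatter around μ, but their average equals μ exactly, which is why the sampling errors cancel out over all possible samples.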
Sampling Distribution of Mean
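Sampling Distribution of Means and
Standard Error of the Means
[Figure: sampling distribution of the sample means, centred at the population mean μ, with the horizontal axis marked at ±1, ±2 and ±3 standard errors of the mean (sem)]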
Central Limit Theorem

• The sampling distribution of means, for samples
of 30 or more:
– Is approximately normally distributed (regardless of the shape of the
population from which the samples were drawn)
– Has a mean equal to the population mean, “mu”,
regardless of the shape of the population or of the size of the
sample
– Has a standard deviation (the standard error of the
mean) equal to the population standard deviation
divided by the square root of the sample size
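A small simulation can illustrate these three properties. This sketch assumes an exponential population with mean 2 (strongly right-skewed, with standard deviation also 2), draws 5,000 samples of size 30, and compares the mean and standard deviation of the sample means with μ and σ/√n. The population, seed, and counts are illustrative assumptions.

```python
import math
import random
from statistics import mean, stdev

rng = random.Random(42)

# Assumed skewed population: exponential with mean 2 (its SD is also 2)
mu = sigma = 2.0
n = 30          # sample size
reps = 5_000    # number of repeated samples

sample_means = [mean([rng.expovariate(1 / mu) for _ in range(n)]) for _ in range(reps)]

se = sigma / math.sqrt(n)            # CLT standard error of the mean (about 0.365)
print(mean(sample_means))            # close to mu
print(stdev(sample_means), se)       # close to sigma / sqrt(n)

# Roughly normal: about 95% of the means fall within 1.96 standard errors of mu
within = sum(abs(m - mu) <= 1.96 * se for m in sample_means) / reps
print(within)
```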
Sampling Distribution of 1000 Sample Means

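[Figure: histogram of 1,000 sample means of IQ scores drawn from a population of 5,000 fourth graders; the centre of the horizontal axis is the average IQ of the 5,000 fourth graders, which is also the average of the 1,000 sample averages, and the axis is marked at ±1.5, ±3.0 and ±4.5 points from that average]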
Confidence Intervals
• A defined interval of values that includes the
statistic of interest, formed by adding and subtracting a
specific amount from the computed statistic
• The confidence level of a CI is the probability that the method
used to compute the interval from the sample data produces an
interval that includes the population parameter of interest
So… What is a Confidence Interval?
• Confidence Interval (CI) – interval containing the
“most believable” values for a parameter
– Confidence level – probability that this method produces
an interval that contains (covers) the parameter
– Confidence level is usually close to 1.00
(most commonly 0.95 or 95%, but depends on criticality
of the decision)
• Margin of Error – measures how accurate the point
estimate is likely to be
– Multiple of the standard error (e.g. 1.96 × standard error)
• Confidence Interval is constructed by taking a point
estimate and adding and subtracting the margin of
error (that is, critical z-score times the standard
error)
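[Figure: a confidence interval drawn from the lower confidence limit to the upper confidence limit, with the point estimate at its centre; the distance between the two limits is the width of the confidence interval]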
• That is, CI = point estimate ± (Critical Value)(Std Error)
– Where:
– Point Estimate is the sample statistic estimating the population parameter of
interest
– Critical Value is a table value based on the sampling distribution of the point
estimate and the desired confidence level
– Standard Error is the standard deviation of the sampling distribution of the point estimate
• How confident are we that the interval covers the unknown
population parameter?
– Some percentage (less than 100%)
– 95% confident (probably most common), 99%, 90%
– Desired level of confidence defines the “critical value” or z-score
• But that means we are NEVER sure…
Understanding Confidence Intervals
CI = point estimate ± (Critical Value)(Std Error)
A 95% confidence interval is formed under the knowledge that:
• 95% of all the possible intervals based on every
possible sample from the population would cover the
parameter, and the other 5% would miss
[Figure 21.4: Twenty-five samples from the same population give these 95% confidence intervals. In the long run, 95% of all such intervals cover the true population proportion, marked by the vertical line. Source: Statistics: Concepts and Controversies (8th Edition), by Moore and Notz, W.H. Freeman and Company, 2013, p. 495.]
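The long-run interpretation can be checked by simulation. This sketch assumes a normal population with known σ (μ = 50, σ = 10, n = 25, all hypothetical), builds 25 intervals as in Figure 21.4, then repeats the procedure 10,000 times and reports the fraction of intervals that cover μ.

```python
import math
import random
from statistics import NormalDist, mean

rng = random.Random(2013)

# Assumed setting: normal population with known sigma
mu, sigma, n = 50.0, 10.0, 25
z = NormalDist().inv_cdf(0.975)   # about 1.96 for a 95% interval
se = sigma / math.sqrt(n)

def one_interval():
    """Draw one sample and return its 95% interval (lower, upper)."""
    xbar = mean([rng.gauss(mu, sigma) for _ in range(n)])
    return xbar - z * se, xbar + z * se

# 25 intervals, as in the figure: each one either covers mu or it does not
first_25 = [one_interval() for _ in range(25)]
print(sum(lo <= mu <= hi for lo, hi in first_25), "of 25 cover mu")

# In the long run about 95% of the intervals cover mu
reps = 10_000
coverage = sum(lo <= mu <= hi for lo, hi in (one_interval() for _ in range(reps))) / reps
print(coverage)
```

Any single interval either contains μ or not; the 95% describes the method, not one interval.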
Confidence Level, (1 − α)
• Suppose confidence level = 95%
• Also written (1 − α) = 0.95 (so α = 0.05)
• A relative frequency interpretation:
– 95% of all the confidence intervals that can be
constructed will contain the unknown true parameter
• A specific interval either will contain or will not
contain the true parameter
– No probability involved in a specific interval

Confidence Level    Confidence Coefficient (1 − α)    Zα/2 value
80%                 0.80                              1.28
90%                 0.90                              1.645
95%                 0.95                              1.96
98%                 0.98                              2.33
99%                 0.99                              2.58
99.8%               0.998                             3.09
99.9%               0.999                             3.29
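The Zα/2 column can be computed rather than looked up. This sketch uses `statistics.NormalDist.inv_cdf` from the Python standard library (Python 3.8+); the helper name `z_critical` is illustrative. For 99.8% and 99.9% the exact quantiles round to 3.09 and 3.29.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value Z_{alpha/2} for a confidence level (1 - alpha)."""
    alpha = 1 - confidence
    # Leave alpha/2 in the upper tail, so take the (1 - alpha/2) quantile of N(0, 1)
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.80, 0.90, 0.95, 0.98, 0.99, 0.998, 0.999):
    print(f"{level:.1%}  {z_critical(level):.3f}")
```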
Central Limit Theorem: Proportions AND Means
RULE: If many samples or repetitions of the SAME SIZE are taken, the
frequency curve made from STATISTICS from the SAMPLES will be
approximately normally distributed

Categorical (2 outcomes): PROPORTIONS ($\hat{p}$'s)
• Assumptions:
1. Population with a fixed proportion
2. Random sample from the population
3. $np \ge 5$ and $n(1-p) \ge 5$ (“large” samples)
• MEAN of the sample proportions ($\hat{p}$'s) will be the population proportion (p):
$\mu_{\hat{p}} = p$
• STANDARD DEVIATION of the sample proportions ($\hat{p}$'s) will be:
$\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$

Quantitative (Measurement): MEANS ($\bar{X}$'s)
• Conditions/Assumptions:
1. If the population is bell-shaped (normal), a random sample of any size
2. If the population is not bell-shaped, a large random sample ($n \ge 30$)
• MEAN of the sample means ($\bar{X}$'s) will be the population mean (μ):
$\mu_{\bar{x}} = \mu$
• STANDARD DEVIATION of the sample means ($\bar{X}$'s) will be:
$\sigma_{\bar{x}} = \sigma/\sqrt{n}$
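A simulation sketch of the proportion half of the rule, assuming a hypothetical population proportion p = 0.3 and samples of size n = 50 (which satisfy np ≥ 5 and n(1 − p) ≥ 5). It checks that the sample proportions centre on p and that their spread is close to √(p(1 − p)/n).

```python
import math
import random
from statistics import mean, pstdev

rng = random.Random(7)

# Assumed population proportion and sample size (not from the slides)
p, n, reps = 0.3, 50, 20_000
assert n * p >= 5 and n * (1 - p) >= 5  # "large sample" condition

# Each p-hat is the fraction of "successes" in a random sample of size n
p_hats = [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

theory_sd = math.sqrt(p * (1 - p) / n)   # about 0.0648
print(mean(p_hats), p)                    # mean of the p-hats is close to p
print(pstdev(p_hats), theory_sd)          # their SD is close to sqrt(p(1-p)/n)
```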
Population Proportion CI
• CI for population proportion:
$$\hat{p} \pm z\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
– Margin of error: $z\sqrt{\hat{p}(1-\hat{p})/n}$
– Standard error (standard deviation of the sampling distribution): $\sqrt{\hat{p}(1-\hat{p})/n}$
• What are usual values of z?
Confidence Level    Error Probability    Z
0.90                0.10                 1.645
0.95                0.05                 1.96
0.99                0.01                 2.58

• Assumption: Sampling distribution of $\hat{p}$ is bell-shaped
– To ensure assumption is met, need $n\hat{p} \ge 5$ and $n(1-\hat{p}) \ge 5$
• What affects the margin of error?
• The level of confidence, which determines the value of z
• The standard error, which is a function of sample size
• How can we achieve a narrower confidence interval?
1. Decrease the level of confidence OR
2. Increase the sample size
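The two levers can be seen numerically. This sketch computes the margin of error z√(p̂(1 − p̂)/n) for a hypothetical p̂ = 0.4 at three confidence levels and three sample sizes; the function name `margin_of_error` is illustrative.

```python
import math
from statistics import NormalDist

def margin_of_error(p_hat, n, confidence):
    """z * sqrt(p_hat (1 - p_hat) / n) for a two-sided interval."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

p_hat = 0.4  # hypothetical sample proportion
for confidence in (0.90, 0.95, 0.99):
    row = [round(margin_of_error(p_hat, n, confidence), 4) for n in (100, 400, 1600)]
    print(f"{confidence:.0%}", row)
```

Because n sits under a square root, quadrupling the sample size only halves the margin of error, while moving from 90% to 99% confidence widens it.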
MLE
Contd.
Contd.
Example
Solution
Example
Solution
Example
Solution
Moment Estimator
How to Calculate
Example
Solution
Example
Confidence Intervals: The Basics

If you had to give one number to estimate an unknown population
parameter, what would it be? If you were estimating a population
mean µ, you would probably use x̄. If you were estimating a
population proportion p, you might use p̂. In both cases, you would be
providing a point estimate of the parameter of interest.

A point estimator is a statistic that provides an estimate of a
population parameter. The value of that statistic from a sample
is called a point estimate.

An ideal point estimator will have no bias and low variability. Since variability is
almost always present when calculating statistics from different samples, we must
extend our thinking about estimating parameters to include an acknowledgement
that repeated sampling could yield different results.
The Idea of a Confidence Interval
The big idea: The sampling distribution of x̄ tells us how close to µ the
sample mean x̄ is likely to be. All confidence intervals we construct will
have a form similar to this:
estimate ± margin of error

A C% confidence interval gives an interval of plausible values
for a parameter. The interval is calculated from the data and has
the form
point estimate ± margin of error

The difference between the point estimate and the true
parameter value will be less than the margin of error in C% of
all samples.
The confidence level C gives the overall success rate of the
method for calculating the confidence interval. That is, in C% of
all possible samples, the method would yield an interval that
captures the true parameter value.
Interpreting Confidence Levels and
Intervals
The confidence level is the overall capture rate if the method is used
many times. The sample mean will vary from sample to sample, but
when we use the method estimate ± margin of error to get an interval
based on each sample, C% of these intervals capture the unknown
population mean µ.
Interpreting Confidence Levels and
Intervals
Interpreting Confidence Intervals
To interpret a C% confidence interval for an unknown parameter, say,
“We are C% confident that the interval from _____ to _____ captures
the actual value of the [population parameter in context].”

Interpreting Confidence Levels
To say that we are 95% confident is shorthand for “If we take many
samples of the same size from this population, about 95% of them
will result in an interval that captures the actual parameter value.”
Interpreting Confidence Levels and
Intervals
The confidence level tells us how likely it is that the method we are
using will produce an interval that captures the population parameter if
we use it many times.

The confidence level does not tell us the chance that a particular
confidence interval captures the population parameter.

Instead, the confidence interval gives us a set of plausible values for
the parameter.

We interpret confidence levels and confidence intervals in much the
same way whether we are estimating a population mean, proportion, or
some other parameter.
Constructing Confidence Intervals
Why settle for 95% confidence when estimating a parameter? The price
we pay for greater confidence is a wider interval.

When we calculated a 95% confidence interval for the mystery mean µ,
we started with
estimate ± margin of error
Our estimate came from the sample statistic x̄.
Since the sampling distribution of x̄ is Normal,
about 95% of the values of x̄ will lie within 2
standard deviations (2σ_x̄) of the mystery mean µ.
That is, our interval could be written as:

$\bar{x} \pm 2\sigma_{\bar{x}} = 240.79 \pm 2 \times 5$
This leads to a more general formula for confidence intervals:
statistic ± (critical value) × (standard deviation of statistic)
Constructing Confidence Intervals
Calculating a Confidence Interval
The confidence interval for estimating a population parameter has
the form
statistic ± (critical value) × (standard deviation of statistic)
where the statistic we use is the point estimator for the parameter.

Properties of Confidence Intervals:
• The “margin of error” is the
(critical value) × (standard deviation of statistic)
• The user chooses the confidence level, and the margin of error follows
from this choice.
• The critical value depends on the confidence level and the sampling
distribution of the statistic.
• Greater confidence requires a larger critical value.
• The standard deviation of the statistic depends on the sample size n.
Using Confidence Intervals Wisely
Here are two important cautions to keep in mind when constructing and
interpreting confidence intervals.

• Our method of calculation assumes that the data come from an SRS
of size n from the population of interest.

• The margin of error in a confidence interval covers only chance
variation due to random sampling or random assignment.
Factors Affecting Confidence Intervals
Various Levels of Confidence

• When the population standard deviation is
known, use Z table values:
– For 95% CI: mean +/- 1.96 s.e. of mean
– For 99% CI: mean +/- 2.58 s.e. of mean
• When the population standard deviation is not
known, use the “Critical Value of t” table (values below are for 30 degrees of freedom)
– For 95% CI: mean +/- 2.04 s.e. of mean
– For 99% CI: mean +/- 2.75 s.e. of mean
95%Confidence Interval
95 times out of 100 the interval constructed
around the sample mean will capture
the population mean. 5 times out of 100 the
interval will not capture the population mean

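[Figure: sampling distribution centred at the population mean μ, with the central 95% region between −1.96 sem and +1.96 sem; the ±2.58 sem points are also marked]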
99%Confidence Interval
99 times out of 100 the interval constructed
around the sample mean will capture
the population mean. 1 time out of 100 the
interval will not capture the population mean

[Figure: sampling distribution centred at the population mean μ, with the central 99% region between −2.58 sem and +2.58 sem]
Effects of Sample Size
Process for Constructing Confidence
Intervals
• Compute the sample statistic (e.g. a mean)
• Compute the standard error of the mean
• Make a decision about level of confidence that is
desired (usually 95% or 99%)
• Find tabled value for 95% or 99% confidence
interval
• Multiply standard error of the mean by the tabled
value
• Form interval by adding and subtracting
calculated value to and from the mean
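The six steps above can be sketched as a single Python function. It assumes a large sample, so the tabled value is a z value from `statistics.NormalDist`; for a small sample with σ unknown, a t critical value with n − 1 degrees of freedom would replace it. The function name and the simulated data are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(data, confidence=0.95):
    """Large-sample z interval for a mean, following the six steps."""
    xbar = mean(data)                                    # 1. sample statistic
    se = stdev(data) / math.sqrt(len(data))              # 2. standard error of the mean
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # 3-4. level -> tabled z value
    margin = z * se                                      # 5. multiply
    return xbar - margin, xbar + margin                  # 6. add and subtract

# Hypothetical data: 100 simulated scores (not from the slides)
rng = random.Random(3)
scores = [rng.gauss(100, 15) for _ in range(100)]

ci95 = mean_confidence_interval(scores, 0.95)
ci99 = mean_confidence_interval(scores, 0.99)
print(ci95, ci99)   # the 99% interval is wider
```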
Types
• One-Sided Confidence Intervals vs. Two-Sided
Confidence Intervals
• The concept of one-sided and two-sided
confidence intervals is fairly straightforward.
• A two-sided confidence interval brackets the
population parameter of interest from above and
below.
• A one-sided confidence interval bounds the
population parameter of interest from only one side (either
above or below), which establishes an upper or
lower limit for the parameter.
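A sketch contrasting the two types for a normal-based interval, assuming a hypothetical estimate of 50 with standard error 2. The two-sided interval uses Zα/2, while a one-sided bound puts all of α in one tail and uses Zα (1.645 instead of 1.96 at 95%). The function names are illustrative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def two_sided_interval(estimate, std_error, confidence=0.95):
    # alpha is split between both tails: use Z_{alpha/2}
    z = STD_NORMAL.inv_cdf(1 - (1 - confidence) / 2)
    return estimate - z * std_error, estimate + z * std_error

def one_sided_lower_bound(estimate, std_error, confidence=0.95):
    # all of alpha sits in one tail: use Z_alpha; parameter bounded from below
    return estimate - STD_NORMAL.inv_cdf(confidence) * std_error

def one_sided_upper_bound(estimate, std_error, confidence=0.95):
    # parameter bounded from above only
    return estimate + STD_NORMAL.inv_cdf(confidence) * std_error

# Hypothetical estimate and standard error
print(two_sided_interval(50, 2))     # about (46.08, 53.92)
print(one_sided_lower_bound(50, 2))  # about 46.71
print(one_sided_upper_bound(50, 2))  # about 53.29
```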
• How to Calculate a Confidence Interval
• Let’s imagine a group of researchers that are interested in determining
whether or not the oranges grown on a particular farm are large enough
to be sold to a prospective grocery chain.
• Step #1: Find the sample size (n).
• The researchers randomly select 46 oranges from trees on the
farm. Therefore, n = 46.
• Step #2: Calculate the sample mean (x̄).
• The researchers then calculate a mean weight of 86 grams from their
sample. Therefore, x̄ = 86.
• Step #3: Calculate the standard deviation (s).
• It’s best to use the standard deviation of the entire population; however,
in many cases researchers will not have access to this information. If this
is the case, the researchers should use the standard deviation of the
sample that they have collected.
• For our example, let’s say that the researchers have resorted to
calculating the standard deviation from their sample. They receive a
standard deviation of 6.2 grams. Therefore, s = 6.2.
• Step #4: Decide the confidence level that will be used.
• 95 percent and 99 percent confidence intervals are the most
common choices in typical market research studies.
• In our example, let’s say the researchers have elected to use a
confidence level of 95 percent.
• Step #5: Find the Z value for the selected confidence level.
• The researchers would then use the following table to
determine their Z value:
Confidence Level    Z
80%                 1.282
85%                 1.440
90%                 1.645
95%                 1.960
99%                 2.576
99.5%               2.807
99.9%               3.291
• Since they have decided to use a 95 percent confidence level,
the researchers determine that Z = 1.960.
• Step #6: Calculate x̄ ± Z(s/√n).
• Next, the researchers would need to plug their known values into
the formula.
• Continuing with our example, this formula would appear as follows:
• 86 ± 1.960 (6.2/6.782), since √46 ≈ 6.782
• When calculated, this formula gives the researchers the result of 86
± 1.79 as their confidence interval.
• Step #7: Draw a conclusion.
• The researchers are 95 percent confident that the true mean weight of the
larger population of oranges lies
between 84.21 grams and 87.79 grams.
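The orange calculation can be reproduced in a few lines. Note that the example uses Z with the sample standard deviation; since σ is unknown, the later slides would use a t critical value instead (about 2.01 with 45 degrees of freedom), which gives a slightly wider interval. The sketch follows the example as written.

```python
import math
from statistics import NormalDist

# Values from the orange example
n, xbar, s = 46, 86.0, 6.2
z = NormalDist().inv_cdf(0.975)      # 1.960 for 95% confidence

std_error = s / math.sqrt(n)         # 6.2 / 6.782...
margin = z * std_error
lower, upper = xbar - margin, xbar + margin
print(f"{xbar:g} ± {margin:.2f} -> ({lower:.2f}, {upper:.2f})")
# prints: 86 ± 1.79 -> (84.21, 87.79)
```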
Example
DCOVA

• A sample of 11 circuits from a large normal
population has a mean resistance of 2.20
ohms. We know from past testing that the
population standard deviation is 0.35 ohms.

• Determine a 95% confidence interval for the
true mean resistance of the population.
Example (continued)

DCOVA

• Solution:
$$\bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 2.20 \pm 1.96\,(0.35/\sqrt{11}) = 2.20 \pm 0.2068$$
$$1.9932 \le \mu \le 2.4068$$
Interpretation
DCOVA

• We are 95% confident that the true mean
resistance is between 1.9932 and 2.4068
ohms
• Although the true mean may or may not be in
this interval, 95% of intervals formed in this
manner will contain the true mean
Example
A recent large survey of a random sample of Australian children
asked about weekly hours of internet use in three age groups.
The following table shows the mean and standard deviation of
the number of hours of internet use per week and the total
number of children surveyed for each age group. Calculate an
approximate 95% confidence interval for the mean number of
hours of internet use per week in each group.
Solution
Confidence Intervals
DCOVA

Confidence Intervals
– Population Mean
   – σ Known
   – σ Unknown
– Population Proportion
Confidence Interval for μ
(σ Unknown) DCOVA

• If the population standard deviation σ is
unknown, we can substitute the sample
standard deviation, S
• This introduces extra uncertainty, since S is
variable from sample to sample
• So we use the t distribution instead of the
normal distribution
Confidence Interval for μ (continued)
(σ Unknown)
DCOVA
• Assumptions
– Population standard deviation is unknown
– Population is normally distributed
– If population is not normal, use large sample (n > 30)
• Use Student’s t Distribution
• Confidence Interval Estimate:
$$\bar{X} \pm t_{\alpha/2}\frac{S}{\sqrt{n}}$$
(where $t_{\alpha/2}$ is the critical value of the t distribution with n − 1 degrees of freedom and an
area of α/2 in each tail)
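A sketch of the t interval in Python, assuming a hypothetical sample of 11 measurements. The standard library has no t quantile function, so the critical value t0.025 with 10 degrees of freedom (2.228) is hard-coded from a t table; with SciPy installed, `scipy.stats.t.ppf(0.975, df=10)` returns the same value.

```python
import math
from statistics import mean, stdev

# Hypothetical sample of n = 11 measurements (not from the slides)
data = [2.05, 2.31, 1.98, 2.44, 2.12, 2.27, 2.39, 1.91, 2.20, 2.35, 2.18]
n = len(data)

# t_{alpha/2} for 95% confidence with n - 1 = 10 degrees of freedom, from a t table
T_CRIT = 2.228

xbar = mean(data)
s = stdev(data)                       # sample standard deviation S (divisor n - 1)
margin = T_CRIT * s / math.sqrt(n)
lower, upper = xbar - margin, xbar + margin
print(f"95% CI for mu: ({lower:.4f}, {upper:.4f})")
```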
Student’s t Distribution
DCOVA

• The t is a family of distributions
• The tα/2 value depends on degrees of freedom
(d.f.)
– Number of observations that are free to vary after sample
mean has been calculated

d.f. = n - 1
Degrees of Freedom (df)
DCOVA

Idea: Number of observations that are free to vary
after sample mean has been calculated
Example: Suppose the mean of 3 numbers is 8.0
Let X1 = 7
Let X2 = 8
What is X3?
If the mean of these three values is 8.0,
then X3 must be 9
(i.e., X3 is not free to vary)

Here, n = 3, so degrees of freedom = n – 1 = 3 – 1 = 2
(2 values can be any numbers, but the third is not free to vary for a
given mean)
Example
DCOVA

• A random sample of 100 people shows that 25
are left-handed.
• Form a 95% confidence interval for the true
proportion of left-handers
Example (continued)

DCOVA

$$p \pm Z_{\alpha/2}\sqrt{p(1-p)/n} = 25/100 \pm 1.96\sqrt{0.25(0.75)/100} = 0.25 \pm 1.96(0.0433)$$
$$0.1651 \le \pi \le 0.3349$$
Interpretation
DCOVA

• We are 95% confident that the true percentage of
left-handers in the population is between
16.51% and 33.49%.

• Although the interval from 0.1651 to 0.3349 may
or may not contain the true proportion, 95% of
intervals formed from samples of size 100 in this
manner will contain the true proportion.
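The left-hander calculation can be reproduced with a short function (the name `proportion_ci` is illustrative). It applies the normal-approximation formula p̂ ± Z√(p̂(1 − p̂)/n) and refuses to run when the large-sample condition np̂ ≥ 5 and n(1 − p̂) ≥ 5 fails.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Normal-approximation CI: p_hat ± z * sqrt(p_hat (1 - p_hat) / n)."""
    p_hat = successes / n
    # Large-sample check: n*p_hat >= 5 and n*(1 - p_hat) >= 5
    if n * p_hat < 5 or n * (1 - p_hat) < 5:
        raise ValueError("sample too small for the normal approximation")
    std_error = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * std_error, p_hat + z * std_error

lower, upper = proportion_ci(25, 100)
print(f"({lower:.4f}, {upper:.4f})")  # prints (0.1651, 0.3349)
```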
Sampling Error
DCOVA

• The required sample size can be found to reach a
desired margin of error (e) with a specified level
of confidence (1 − α)

• The margin of error is also called sampling error
– the amount of imprecision in the estimate of the
population parameter
– the amount added and subtracted to the point
estimate to form the confidence interval
Determining Sample Size(continued)
DCOVA
• To determine the required sample size for the proportion, you
must know:
– The desired level of confidence (1 − α), which determines the critical
value, Zα/2
– The acceptable sampling error, e
– The true proportion of events of interest, π
• π can be estimated with a pilot sample if necessary (or
conservatively use 0.5 as an estimate of π)
Required Sample Size Example
DCOVA

How large a sample would be necessary to
estimate the true proportion of defectives in a
large population within ±3%, with 95%
confidence?
(Assume a pilot sample yields p = 0.12)
Required Sample Size Example
(continued)

Solution: DCOVA
For 95% confidence, use Zα/2 = 1.96
e = 0.03
p = 0.12, so use this to estimate π
$$n = \frac{Z_{\alpha/2}^2\,\pi(1-\pi)}{e^2} = \frac{(1.96)^2(0.12)(1-0.12)}{(0.03)^2} = 450.74$$
So use n = 451
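The sample-size formula is easy to script. This sketch (the function name is illustrative) rounds up with `math.ceil`, because rounding down would give a margin of error slightly larger than e; the second call shows the conservative choice π = 0.5 mentioned above.

```python
import math

def sample_size_proportion(z, e, pi):
    """n = z^2 * pi * (1 - pi) / e^2, rounded UP so the margin is at most e."""
    return math.ceil(z**2 * pi * (1 - pi) / e**2)

print(sample_size_proportion(1.96, 0.03, 0.12))  # 451 (pilot estimate of pi)
print(sample_size_proportion(1.96, 0.03, 0.5))   # 1068 (conservative pi = 0.5)
```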
Ethical Issues
• A confidence interval estimate (reflecting
sampling error) should always be included
when reporting a point estimate
• The level of confidence should always be
reported
• The sample size should be reported
• An interpretation of the confidence interval
estimate should also be provided
Bootstrapping Is A Method To Use When
Population Is Not Normal DCOVA

To estimate a population mean using bootstrapping, you would:
1. Select a random sample of size n without replacement from a population
of size N.
2. Resample the initial sample by selecting n values with replacement from
the initial sample.
3. Compute X̄ from this resample.
4. Repeat steps 2 & 3 m different times.
5. Construct the resampling distribution of X̄.
6. Construct an ordered array of the entire set of resampled X̄’s.
7. In this ordered array, find the value that cuts off the smallest α/2(100%)
and the value that cuts off the largest α/2(100%). These values provide
the lower and upper limits of the bootstrap confidence interval estimate
of μ.
Bootstrapping Requires The Use of
Software Such As Minitab or JMP DCOVA
• Typically a very large number (thousands) of
resamples are used.

• Software is needed to:
– Automate the resampling process
– Calculate the appropriate sample statistic
– Create the ordered array
– Find the lower and upper confidence limits
Bootstrapping Example -- Processing Time of
Life Insurance Applications DCOVA

Sample of 27 times taken without replacement from population

73 19 16 64 28 28 31 90 60 56 31 56 22 18 45 48 17 17 17 91 92 63 50 51 69 16 17

From the boxplot, we conclude that the population is
not normal, so a t confidence interval is
not appropriate.

Use bootstrapping to form a
confidence interval for μ.
Comparing the original sample to the first
resample with replacement DCOVA

Sample of 27 times taken without replacement from population

73 19 16 64 28 28 31 90 60 56 31 56 22 18 45 48 17 17 17 91 92 63 50 51 69 16 17

The initial bootstrap resample omits some values (18, 45, 48, 50, 63, and 91) that appear in
the initial sample above. Note that the value of 73 appears twice even though it appears
only once in the initial sample.

16 16 16 17 17 17 17 17 19 22 28 31 31 51 56 56 60 60 64 64 64 64 69 73 73 90 92
The Ordered Array of Sample Means for 100 Resamples
DCOVA

31.5926 33.9259 35.4074 36.5185 36.6296 36.9630 37.0370 37.0741
37.1481 37.3704 37.9259 38.1111 38.1481 38.2222 38.2963 38.7407
38.8148 38.8519 38.8889 39.0000 39.1852 39.3333 39.3704 39.6667
40.1481 40.5185 40.6296 40.9259 40.9630 41.2593 41.2963 41.7037
41.8889 42.0741 42.1111 42.1852 42.8519 43.0741 43.1852 43.3704
43.4444 43.7037 43.8148 43.8519 43.8519 43.9259 43.9630 44.1481
44.4074 44.5556 44.7778 45.0000 45.4444 45.5185 45.5556 45.6667
45.7407 45.8519 45.9630 45.9630 46.0000 46.1111 46.2963 46.2963
46.3333 46.3333 46.4815 46.6667 46.7407 46.9630 47.0741 47.2222
47.2963 47.3704 47.4815 47.4815 47.5556 47.6667 47.8519 48.5185
48.8889 49.0000 49.2222 49.4444 49.4815 49.4815 49.6296 49.6296
49.7407 50.2963 50.4074 50.5926 50.9259 51.4074 51.4815 51.5926
51.9259 52.3704 53.4074 54.3333

(Fifth smallest: 36.6296; fifth largest: 51.5926)

Finding a 90% Bootstrap CI for the Population Mean
DCOVA
• To find the 90% CI for 100 resamples we need
to find the 0.05(100) = 5th smallest and the 5th
largest values.

• From the table, the 5th smallest value is
36.6296 and the 5th largest value is 51.5926.

• The 90% bootstrap CI is (36.6296, 51.5926)
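The seven bootstrap steps can be sketched with the Python standard library, using the 27 processing times from the example. `random.choices` draws with replacement; the function name, seed, and the default of 10,000 resamples are assumptions, so the limits will not match the slide's 100 random resamples exactly.

```python
import random
from statistics import mean

# Processing times (sample of 27) from the life insurance example
times = [73, 19, 16, 64, 28, 28, 31, 90, 60, 56, 31, 56, 22, 18,
         45, 48, 17, 17, 17, 91, 92, 63, 50, 51, 69, 16, 17]

def bootstrap_percentile_ci(data, confidence=0.90, resamples=10_000, seed=0):
    rng = random.Random(seed)
    n = len(data)
    # Steps 2-4: resample n values WITH replacement and record each resample mean
    means = [mean(rng.choices(data, k=n)) for _ in range(resamples)]
    # Step 6: ordered array of the resampled means
    means.sort()
    # Step 7: cut off alpha/2 of the ordered means in each tail
    k = round((1 - confidence) / 2 * resamples)   # e.g. 5 when resamples = 100
    return means[k - 1], means[resamples - k]     # k-th smallest, k-th largest

print(bootstrap_percentile_ci(times, resamples=100))  # same recipe as the slide
print(bootstrap_percentile_ci(times))                 # 10,000 resamples
```

With 100 resamples the function picks the 5th smallest and 5th largest means, exactly as on the slide; more resamples make the limits less sensitive to the particular random draws.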
