Lecture 10


INTRODUCTION TO PROBABILITY

AND STATISTICS
FOURTEENTH EDITION

Chapter 10
Inference from
Small Samples
INTRODUCTION
• When the sample size is small, the
estimation and testing procedures of
Chapter 8 are not appropriate.
• There are equivalent small sample test
and estimation procedures for
✓ $\mu$, the mean of a normal population
✓ $\mu_1 - \mu_2$, the difference between two
population means
✓ $\sigma^2$, the variance of a normal population
✓ The ratio of two population variances.
THE SAMPLING DISTRIBUTION
OF THE SAMPLE MEAN
• When we take a sample from a normal
population, the sample mean $\bar{x}$ has a
normal distribution for any sample size n,
and
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
has a standard normal distribution.
• But if $\sigma$ is unknown, and we must use s to
estimate it, the resulting statistic
$$\frac{\bar{x} - \mu}{s/\sqrt{n}}$$
is not normal!
STUDENT’S T
DISTRIBUTION
• Fortunately, this statistic does have a
sampling distribution that is well known
to statisticians, called the Student’s t
distribution, with n - 1 degrees of
freedom.

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

• We can use this distribution to create
estimation and testing procedures for the
population mean $\mu$.
PROPERTIES OF STUDENT’S T
•Mound-shaped and
symmetric about 0.
•More variable than z,
with “heavier tails”

• Shape depends on the sample size n or
the degrees of freedom, n - 1.
• As n increases the shapes of the t and z
distributions become almost identical.
USING THE T-TABLE
• Table 4 gives the values of t that cut off certain
tail areas of the t distribution.
• Index df and the appropriate tail area $\alpha$ to
find $t_\alpha$, the value of t with area $\alpha$ to its right.

For a random sample of size
n = 10, find a value of t that
cuts off .025 in the right tail.
Row = df = n - 1 = 9
Column subscript = $\alpha$ = .025
$t_{.025}$ = 2.262
SMALL SAMPLE INFERENCE
FOR A POPULATION MEAN $\mu$
• The basic procedures are the same as
those used for large samples. For a
test of hypothesis:
Test $H_0: \mu = \mu_0$ versus $H_a$: one- or two-tailed
using the test statistic
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
using p-values or a rejection region based on
a t distribution with df = n - 1.
SMALL SAMPLE INFERENCE
FOR A POPULATION MEAN $\mu$

• For a 100(1 - $\alpha$)% confidence interval for the
population mean $\mu$:

$$\bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}$$
where $t_{\alpha/2}$ is the value of t that cuts off area $\alpha/2$
in the tail of a t distribution with df = n - 1.
EXAMPLE
A sprinkler system is designed so that the
average time for the sprinklers to activate
after being turned on is no more than 15
seconds. A test of 6 systems gave the following
times:
17, 31, 12, 17, 13, 25
Is the system working as specified? Test using
$\alpha$ = .05.
$H_0: \mu = 15$ (working as specified)
$H_a: \mu > 15$ (not working as specified)
EXAMPLE
Data: 17, 31, 12, 17, 13, 25
First, calculate the sample mean and standard deviation, using your
calculator or the formulas in Chapter 2.

$$\bar{x} = \frac{\sum x_i}{n} = \frac{115}{6} = 19.167$$
$$s = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}} = \sqrt{\frac{2477 - \frac{115^2}{6}}{5}} = 7.387$$
EXAMPLE
Data: 17, 31, 12, 17, 13, 25
Calculate the test statistic and find the rejection region for $\alpha$ = .05.

Test statistic:
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{19.167 - 15}{7.387/\sqrt{6}} = 1.38$$
Degrees of freedom: df = n - 1 = 6 - 1 = 5

Rejection region: Reject H0
if t > 2.015. If the test statistic
falls in the rejection region, its
p-value will be less than $\alpha$ =
.05.
CONCLUSION
Data: 17, 31, 12, 17, 13, 25
Compare the observed test statistic to the rejection region, and draw
conclusions.

Test statistic: t = 1.38
$H_0: \mu = 15$
$H_a: \mu > 15$
Rejection region: Reject H0 if t > 2.015.

Conclusion: For our example, t = 1.38 does not fall in
the rejection region and H0 is not rejected. There is
insufficient evidence to indicate that the average
activation time is greater than 15.
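
The sprinkler calculation can be checked with a short script. This is a minimal sketch using only Python's standard library; the variable names are illustrative, and the critical value 2.015 is copied from the rejection region rather than computed.

```python
import math
import statistics

# Activation times (seconds) from the sprinkler example
times = [17, 31, 12, 17, 13, 25]
mu0 = 15            # hypothesized mean under H0
t_critical = 2.015  # t_.05 with df = 5, copied from the slides (not computed)

n = len(times)
xbar = statistics.mean(times)    # sample mean
s = statistics.stdev(times)      # sample standard deviation (n - 1 divisor)
t_stat = (xbar - mu0) / (s / math.sqrt(n))
df = n - 1

reject = t_stat > t_critical     # right-tailed test, Ha: mu > 15
print(f"mean = {xbar:.3f}, s = {s:.3f}, t = {t_stat:.2f}, df = {df}")
print("Reject H0" if reject else "Do not reject H0")
```

Note that `statistics.stdev` divides by n - 1, which matches the formula for s used in the example; `statistics.pstdev` would divide by n and give a different value.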
APPROXIMATING THE
P-VALUE
• You can only approximate the p-value
for the test using Table 4.

Since the observed
value of t = 1.38 is
smaller than $t_{.10}$ =
1.476, the p-value is
greater than .10.
THE EXACT P-VALUE
• You can get the exact p-value
using some calculators or a computer.
p-value = .113 which
is greater than .10
as we approximated
using Table 4.

One-Sample T: Times
Test of mu = 15 vs > 15

Variable  N     Mean   StDev  SE Mean  95% Lower Bound     T      P
Times     6  19.1667  7.3869   3.0157          13.0899  1.38  0.113
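
If no statistics package is at hand, the right-tail area can be approximated directly from the t density. The sketch below is one possible approach, not the method the software output above uses: it integrates the density from 0 to t with Simpson's rule and relies on the symmetry of the t distribution about 0. The function names and the step count are illustrative choices.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail_area(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0.

    Simpson's rule (steps must be even) gives the area between 0 and t;
    by symmetry, the area to the right of 0 is 0.5.
    """
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# Observed t and df from the sprinkler example (right-tailed test)
p_value = upper_tail_area(1.38, 5)
print(f"p-value = {p_value:.3f}")
```

For a two-tailed test the same tail area would be doubled.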
TESTING THE DIFFERENCE
BETWEEN TWO MEANS

As in Chapter 9, independent random samples of size $n_1$ and $n_2$ are drawn
from populations 1 and 2 with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$.

Since the sample sizes are small, the two populations must be normal.

• To test:
• $H_0: \mu_1 - \mu_2 = D_0$ versus $H_a$: one of three alternatives
($\mu_1 - \mu_2 > D_0$, $< D_0$, or $\neq D_0$),
where $D_0$ is some hypothesized difference,
usually 0.
TESTING THE DIFFERENCE
BETWEEN TWO MEANS
• The test statistic used in Chapter 9
$$z \approx \frac{\bar{x}_1 - \bar{x}_2 - D_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
• does not have either a z or a t
distribution, and cannot be used for
small-sample inference.
• We need to make one more assumption,
that the population variances,
although unknown, are equal.
TESTING THE DIFFERENCE
BETWEEN TWO MEANS
• Instead of estimating each population
variance separately, we estimate the
common variance with
$$s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
• And the resulting test statistic,
$$t = \frac{\bar{x}_1 - \bar{x}_2 - D_0}{\sqrt{s^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
has a t distribution with $n_1 + n_2 - 2$ degrees of
freedom.
ESTIMATING THE DIFFERENCE
BETWEEN TWO MEANS
• You can also create a 100(1 - $\alpha$)%
confidence interval for $\mu_1 - \mu_2$.
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
with
$$s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
Remember the three assumptions:
1. Original populations normal
2. Samples random and independent
3. Equal population variances.

EXAMPLE
Two training procedures are compared by
measuring the time that it takes trainees to
assemble a device. A different group of trainees is
taught using each method. Is there a difference in
the two methods? Use $\alpha$ = .01.

Time to Assemble   Method 1   Method 2
Sample size        10         12
Sample mean        35         31
Sample Std Dev     4.9        4.5

$H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 \neq 0$

Test statistic:
$$t = \frac{\bar{x}_1 - \bar{x}_2 - 0}{\sqrt{s^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
EXAMPLE
• Solve this problem by approximating
the p-value using Table 4.

Time to Assemble   Method 1   Method 2
Sample size        10         12
Sample mean        35         31
Sample Std Dev     4.9        4.5

Calculate:
$$s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{9(4.9^2) + 11(4.5^2)}{20} = 21.942$$
Test statistic:
$$t = \frac{35 - 31}{\sqrt{21.942\left(\frac{1}{10} + \frac{1}{12}\right)}} = 1.99$$
EXAMPLE
p-value: $P(t > 1.99) + P(t < -1.99)$
$P(t > 1.99) = \frac{1}{2}(\text{p-value})$
df = $n_1 + n_2 - 2$ = 10 + 12 - 2 = 20
.025 < ½(p-value) < .05
.05 < p-value < .10

Since the p-value is
greater than $\alpha$ = .01,
H0 is not rejected.
There is insufficient
evidence to indicate a
difference in the
population means.
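
The pooled-variance calculation for the training example can be reproduced from the summary statistics alone. The following is a minimal sketch in standard-library Python; the function name `pooled_t` is hypothetical, and the three critical values for df = 20 are taken from a standard t table rather than computed.

```python
import math

def pooled_t(n1, mean1, sd1, n2, mean2, sd2, d0=0.0):
    """Pooled variance, t statistic, and df from summary statistics.

    Assumes normal populations with equal variances and independent samples.
    """
    df = n1 + n2 - 2
    s2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df   # pooled variance
    t = (mean1 - mean2 - d0) / math.sqrt(s2 * (1 / n1 + 1 / n2))
    return s2, t, df

s2, t_stat, df = pooled_t(10, 35, 4.9, 12, 31, 4.5)

# Standard t-table entries for df = 20 (assumed values, not computed here)
t_05, t_025, t_005 = 1.725, 2.086, 2.845

# Two-tailed test at alpha = .01 uses the cutoff with .005 in each tail
reject = abs(t_stat) > t_005
print(f"s^2 = {s2:.3f}, t = {t_stat:.2f}, df = {df}")
print("Reject H0" if reject else "Do not reject H0")
```

Because the observed t lies between the .05 and .025 table entries, one tail area is between .025 and .05, which is how the bound .05 < p-value < .10 arises.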
TESTING THE DIFFERENCE
BETWEEN TWO MEANS
• How can you tell if the equal variance
assumption is reasonable?
Rule of Thumb:
If the ratio $\frac{\text{larger } s^2}{\text{smaller } s^2} \le 3$,
the equal variance assumption is reasonable.
If the ratio $\frac{\text{larger } s^2}{\text{smaller } s^2} > 3$,
use an alternative test statistic.
TESTING THE DIFFERENCE
BETWEEN TWO MEANS
• If the population variances cannot be
assumed equal, the test statistic
$$t \approx \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}, \qquad df \approx \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$$
• has an approximate t distribution with
degrees of freedom given above. This is
most easily done by computer.
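
Since the unequal-variance formulas are tedious by hand, a small script helps. The sketch below uses standard-library Python with hypothetical function names. It is applied to the training-procedure summary statistics purely for illustration: their variance ratio is below 3, so the rule of thumb says the pooled test is adequate for those data.

```python
import math

def variance_ratio(sd1, sd2):
    """Larger sample variance divided by the smaller (rule-of-thumb check)."""
    larger, smaller = max(sd1, sd2), min(sd1, sd2)
    return larger ** 2 / smaller ** 2

def unequal_variance_t(n1, mean1, sd1, n2, mean2, sd2):
    """t statistic and approximate df when variances are not assumed equal."""
    v1 = sd1 ** 2 / n1
    v2 = sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

ratio = variance_ratio(4.9, 4.5)
t_stat, df = unequal_variance_t(10, 35, 4.9, 12, 31, 4.5)
print(f"variance ratio = {ratio:.2f}")
print(f"t = {t_stat:.3f}, approximate df = {df:.2f}")
```

The approximate df is generally not a whole number. For a table lookup, rounding it down is the conservative choice, because fewer degrees of freedom give a larger critical value.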
THE PAIRED-DIFFERENCE
TEST
• Sometimes the assumption of independent
samples is intentionally violated, resulting in
a matched-pairs or paired-difference test.
• By designing the experiment in this way, we
can eliminate unwanted variability in the
experiment by analyzing only the
differences,
$d_i = x_{1i} - x_{2i}$
• to see if there is a difference in the two
population means, $\mu_1 - \mu_2$.
EXAMPLE
Car 1 2 3 4 5
Type A 10.6 9.8 12.3 9.7 8.8
Type B 10.2 9.4 11.8 9.1 8.3

• One Type A and one Type B tire are randomly
assigned to each of the rear wheels of five cars.
Compare the average tire wear for types A and B
using a test of hypothesis.
$H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 \neq 0$
• But the samples are not
independent. The pairs of
responses are linked
because measurements are
taken on the same car.
THE PAIRED-DIFFERENCE TEST

To test $H_0: \mu_1 - \mu_2 = 0$ we test $H_0: \mu_d = 0$
using the test statistic
$$t = \frac{\bar{d} - 0}{s_d/\sqrt{n}}$$
where n = number of pairs, and $\bar{d}$ and $s_d$ are the
mean and standard deviation of the differences, $d_i$.
Use the p-value or a rejection region based on
a t distribution with df = n - 1.
EXAMPLE
Car 1 2 3 4 5
Type A 10.6 9.8 12.3 9.7 8.8
Type B 10.2 9.4 11.8 9.1 8.3
Difference .4 .4 .5 .6 .5

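$H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 \neq 0$

Calculate:
$$\bar{d} = \frac{\sum d_i}{n} = .48$$
$$s_d = \sqrt{\frac{\sum d_i^2 - \frac{(\sum d_i)^2}{n}}{n - 1}} = .0837$$
Test statistic:
$$t = \frac{\bar{d} - 0}{s_d/\sqrt{n}} = \frac{.48 - 0}{.0837/\sqrt{5}} = 12.8$$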
EXAMPLE
Car 1 2 3 4 5
Type A 10.6 9.8 12.3 9.7 8.8
Type B 10.2 9.4 11.8 9.1 8.3
Difference .4 .4 .5 .6 .5

Rejection region: Reject
H0 if t > 2.776 or t < -2.776.
Conclusion: Since t = 12.8,
H0 is rejected. There is a
difference in the average
tire wear for the two types
of tires.
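
The paired-difference computation for the tire data can be verified with a few lines of code. This is a minimal sketch in standard-library Python with illustrative variable names; the critical value 2.776 is copied from the rejection region and not computed.

```python
import math
import statistics

# Tire wear for the five cars in the example
type_a = [10.6, 9.8, 12.3, 9.7, 8.8]
type_b = [10.2, 9.4, 11.8, 9.1, 8.3]

# Analyze only the within-car differences d_i = x_1i - x_2i
diffs = [a - b for a, b in zip(type_a, type_b)]
n = len(diffs)                       # number of pairs
d_bar = statistics.mean(diffs)       # mean difference
s_d = statistics.stdev(diffs)        # standard deviation of the differences
t_stat = (d_bar - 0) / (s_d / math.sqrt(n))

t_critical = 2.776                   # t_.025 with df = n - 1 = 4, from the slides
reject = abs(t_stat) > t_critical    # two-tailed test
print(f"d_bar = {d_bar:.2f}, s_d = {s_d:.4f}, t = {t_stat:.1f}")
print("Reject H0" if reject else "Do not reject H0")
```

The key design point is that the two columns are never treated as independent samples: only the single list of differences enters the test.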
SOME NOTES
• You can construct a 100(1 - $\alpha$)% confidence
interval for a paired experiment using
$$\bar{d} \pm t_{\alpha/2} \frac{s_d}{\sqrt{n}}$$
• Once you have designed the experiment by
pairing, you MUST analyze it as a paired
experiment. If the experiment is not
designed as a paired experiment in
advance, do not use this procedure.
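
As a worked illustration, the paired confidence interval can be evaluated for the tire-wear data from the earlier example ($\bar{d} = .48$, $s_d = .0837$, n = 5), using $t_{.025} = 2.776$ with df = 4. The 95% level is an assumption made here for illustration; the slides do not state a confidence level for this example.

$$\bar{d} \pm t_{.025} \frac{s_d}{\sqrt{n}} = .48 \pm 2.776 \cdot \frac{.0837}{\sqrt{5}} = .48 \pm .104$$

That is roughly .376 to .584. The interval excludes 0, which agrees with rejecting H0 in the two-tailed test at $\alpha$ = .05.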
INFERENCE CONCERNING
A POPULATION VARIANCE

• Sometimes the primary parameter of
interest is not the population mean $\mu$ but
rather the population variance $\sigma^2$. We
choose a random sample of size n from a
normal distribution.
• The sample variance $s^2$ can be used in its
standardized form:
$$\chi^2 = \frac{(n - 1)s^2}{\sigma^2}$$
• which has a Chi-Square distribution with n - 1
degrees of freedom.
INFERENCE CONCERNING
A POPULATION VARIANCE
• Table 5 gives both upper and lower critical
values of the chi-square statistic for a given df.

For example, the
value of chi-square
that cuts off .05 in the
upper tail of the
distribution with df =
5 is $\chi^2$ = 11.07.
INFERENCE CONCERNING
A POPULATION VARIANCE
To test $H_0: \sigma^2 = \sigma_0^2$ versus $H_a$: one- or two-tailed
we use the test statistic
$$\chi^2 = \frac{(n - 1)s^2}{\sigma_0^2}$$
with a rejection region based on
a chi-square distribution with df = n - 1.

Confidence interval:
$$\frac{(n - 1)s^2}{\chi^2_{\alpha/2}} < \sigma^2 < \frac{(n - 1)s^2}{\chi^2_{(1 - \alpha/2)}}$$
EXAMPLE
• A cement manufacturer claims that his cement
has a compressive strength with a standard
deviation of 10 kg/cm² or less. A sample of n =
10 measurements produced a mean and
standard deviation of 312 and 13.96,
respectively.
A test of hypothesis:
$H_0: \sigma^2 = 100$, that is, $\sigma = 10$ (claim is correct)
$H_a: \sigma^2 > 100$, that is, $\sigma > 10$ (claim is wrong)
uses the test statistic:
$$\chi^2 = \frac{(n - 1)s^2}{10^2} = \frac{9(13.96^2)}{100} = 17.5$$
EXAMPLE
• Do these data produce sufficient evidence
to reject the manufacturer’s claim? Use $\alpha$ =
.05.
Rejection region:
Reject H0 if $\chi^2 > 16.919$
($\alpha$ = .05).
Conclusion: Since $\chi^2$ =
17.5, H0 is rejected. The
standard deviation of
the cement strengths is
more than 10.
APPROXIMATING THE
P-VALUE

p-value: $P(\chi^2 > 17.5)$ with df = n - 1 = 9

.025 < p-value < .05

Since the p-value is
less than $\alpha$ = .05, H0 is
rejected. There is
sufficient evidence to
reject the
manufacturer’s claim.
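
The cement test can be checked numerically as follows. This is a minimal sketch with illustrative variable names; the .05 critical value comes from the rejection region above, while the .025 value for df = 9 is taken from a standard chi-square table and does not appear in the slides.

```python
# Summary statistics from the cement example
n = 10
s = 13.96          # sample standard deviation
sigma0 = 10.0      # claimed standard deviation under H0

df = n - 1
chi_sq = (n - 1) * s ** 2 / sigma0 ** 2   # test statistic

chi_crit_05 = 16.919    # upper .05 point, df = 9 (from the slides)
chi_crit_025 = 19.023   # upper .025 point, df = 9 (assumed table value)

reject = chi_sq > chi_crit_05             # right-tailed test at alpha = .05
p_between_025_and_05 = chi_crit_05 < chi_sq < chi_crit_025

print(f"chi-square = {chi_sq:.2f}, df = {df}")
print("Reject H0" if reject else "Do not reject H0")
print("p-value between .025 and .05:", p_between_025_and_05)
```

Note that the denominator is the hypothesized variance, so the claimed standard deviation of 10 enters the statistic as 10 squared.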
INFERENCE CONCERNING
TWO POPULATION VARIANCES
• We can make inferences about two
population variances in the form of a ratio.
We choose two independent random samples
of size $n_1$ and $n_2$ from normal distributions.
• If the two population variances are equal, the
statistic
$$F = \frac{s_1^2}{s_2^2}$$
• has an F distribution with $df_1 = n_1 - 1$ and
$df_2 = n_2 - 1$ degrees of freedom.
INFERENCE CONCERNING
TWO POPULATION VARIANCES
• Table 6 gives only upper critical values of
the F statistic for a given pair of $df_1$ and
$df_2$.
For example, the
value of F that cuts
off .05 in the upper
tail of the distribution
with $df_1$ = 5 and $df_2$ = 8
is F = 3.69.
INFERENCE CONCERNING
TWO POPULATION
VARIANCES
To test $H_0: \sigma_1^2 = \sigma_2^2$ versus $H_a$: one- or two-tailed
we use the test statistic
$$F = \frac{s_1^2}{s_2^2}$$
where $s_1^2$ is the larger of the two sample variances,
with a rejection region based on an F distribution with
$df_1 = n_1 - 1$ and $df_2 = n_2 - 1$.

Confidence interval:
$$\frac{s_1^2}{s_2^2} \cdot \frac{1}{F_{df_1, df_2}} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{s_1^2}{s_2^2} \, F_{df_2, df_1}$$
EXAMPLE
• An experimenter has performed a lab
experiment using two groups of rats. He
wants to test $H_0: \mu_1 = \mu_2$, but first he wants
to make sure that the population variances
are equal.

                 Standard (2)   Experimental (1)
Sample size      10             11
Sample mean      13.64          12.42
Sample Std Dev   2.3            5.8

Preliminary test:
$H_0: \sigma_1^2 = \sigma_2^2$ versus $H_a: \sigma_1^2 \neq \sigma_2^2$
EXAMPLE
                 Standard (2)   Experimental (1)
Sample size      10             11
Sample Std Dev   2.3            5.8

$H_0: \sigma_1^2 = \sigma_2^2$
$H_a: \sigma_1^2 \neq \sigma_2^2$

Test statistic:
$$F = \frac{s_1^2}{s_2^2} = \frac{5.8^2}{2.3^2} = 6.36$$
We designate the sample with the larger
standard deviation as sample 1, to force the
test statistic into the upper tail of the F
distribution.
EXAMPLE
$H_0: \sigma_1^2 = \sigma_2^2$
$H_a: \sigma_1^2 \neq \sigma_2^2$
Test statistic:
$$F = \frac{s_1^2}{s_2^2} = \frac{5.8^2}{2.3^2} = 6.36$$

The rejection region is two-tailed, with $\alpha$ = .05, but we
only need to find the upper critical value, which has $\alpha$/2
= .025 to its right.
From Table 6, with $df_1$ = 10 and $df_2$ = 9, we reject H0 if F >
3.96.
CONCLUSION: Reject H0. There is sufficient evidence
to indicate that the variances are unequal. Do not rely
on the assumption of equal variances for your t test!
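
The preliminary F test can be sketched in a few lines. The helper name `f_statistic` is hypothetical; it places the larger sample variance in the numerator, as the example does, and the critical value 3.96 is copied from the Table 6 lookup above rather than computed.

```python
def f_statistic(sd_a, n_a, sd_b, n_b):
    """Return (F, df1, df2) with the larger sample variance in the numerator."""
    if sd_a >= sd_b:
        return sd_a ** 2 / sd_b ** 2, n_a - 1, n_b - 1
    return sd_b ** 2 / sd_a ** 2, n_b - 1, n_a - 1

# Standard group: s = 2.3, n = 10; experimental group: s = 5.8, n = 11
f_stat, df1, df2 = f_statistic(2.3, 10, 5.8, 11)

f_critical = 3.96   # F_.025 with df1 = 10, df2 = 9, from the slides
reject = f_stat > f_critical   # upper tail of a two-tailed test, alpha = .05

print(f"F = {f_stat:.2f}, df1 = {df1}, df2 = {df2}")
print("Reject H0: variances differ" if reject else "Do not reject H0")
```

Swapping the arguments gives the same result, since the function always orders the variances before forming the ratio.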
KEY CONCEPTS
I. Experimental Designs for Small Samples
1. Single random sample: The sampled
population must be normal.
2. Two independent random samples: Both
sampled populations must be normal.
a. Populations have a common variance $\sigma^2$.
b. Populations have different variances
3. Paired-difference or matched-pairs design:
The samples are not independent.
KEY CONCEPTS
II. Statistical Tests of Significance
1. Based on the t, F, and $\chi^2$ distributions
2. Use the same procedure as in Chapter 9
3. Rejection region (critical values and significance
levels): based on the t, F, and $\chi^2$ distributions with
the appropriate degrees of freedom
4. Tests of population parameters: a single mean,
the difference between two means, a single variance,
and the ratio of two variances
III. Small Sample Test Statistics
To test one of the population parameters when the
sample sizes are small, use the following test
statistics:
KEY CONCEPTS
