Unit 5: Test of Significance/Hypothesis Testing (Topics 20, 22, 23)
In this unit we will add to what we have learned about statistical inference by studying tests of
significance. These tests will assess the degree to which sample data provide evidence against a
particular conjecture or hypothesis about the value of the population mean. We will study the
formal structure of the tests, starting with the hypotheses and ending with a conclusion. Just as a
confidence interval allows us to infer something about a population mean, so does a significance
test.
Estimating Elapsed Time (from Introduction to Statistical Investigations by Tintle et al.)
Does it ever seem like time drags on or time flies by? Perception, including that of time, is one of the
things that psychologists study. Students in a statistics class collected data on 48 other students’
perception of time. They told their subjects that they would be listening to some music and then after it
was over, they would be asked some questions. They played 10 seconds of the Jackson 5’s song “ABC”.
Afterward, they simply asked the subjects how long they thought the song clip lasted. They wanted to see
whether students could accurately estimate the length of this short song segment. Below is a frequency
table for the data:
Time (sec) 5 6 7 8 10 12 13 15 20 21 22 30
Frequency 1 1 3 6 11 3 3 10 4 1 1 4
Let’s explore this study using a formal, step-by-step process called a test of significance. We will outline
the six main steps in such a test throughout this activity and you will use these same steps for the next two
topics as well. Since we will be working with samples of quantitative data, we will be conducting t-tests.
a. Write out a definition of the parameter of interest in the elapsed time study and indicate what symbol
you will use to represent it. The parameter of interest is the mean time, in seconds, that students
estimate elapsed while the clip of “ABC” was played. Use μ to represent it.
b. In this elapsed time study, the null hypothesis is that the mean time elapsed is 10 seconds. Restate this
using the symbol from part a and the appropriate hypothesized value, instead of words: H0: μ = 10
c. In the elapsed time study, the students think the other students’ estimates will differ from the actual
time since the perception of time is inaccurate. Restate this conjecture (with symbols and a number) as an
alternative hypothesis. Ha: μ ≠ 10
Note that the symbol and hypothesized value don’t change between the null and the alternative
hypothesis; you are just selecting between less than, greater than, or not equal to based on the research
conjecture.
d. What conditions need to be met for the Central Limit Theorem to apply?
We assume the sample was random, we know it is large (n = 48 > 30), and we assume the observations
are independent (n < 10% of the overall student population).
Note that you will often check these conditions assuming the null hypothesis is true. If random sampling
isn’t mentioned, it can usually be assumed because it is part of a well-designed experiment. If the sample
is not large enough (n < 30), look at displays of the distribution and/or a normal probability plot to make
sure the data appear to be normally distributed. If not, proceed with caution through the rest of the test. If
independence isn’t clear (if you don’t know how large the population is), you can usually assume a
sample is independent, but you should note you are assuming this when you state the conditions.
At this point, it is very good practice to then draw a well-labeled sketch of the sampling distribution of the
sample mean using the mean of 10 seconds and the standard deviation of the sampling distribution of
$\frac{s}{\sqrt{n}} = \frac{6.500}{\sqrt{48}} = 0.938$.
e. Your calculator can find the test statistic, or t-statistic since this is a t-test, but you can find it as well
by calculating how many standard deviations of the sampling distribution the sample mean is away from
the hypothesized population mean (very similar to the z-score calculation):
$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{13.708 - 10}{6.500/\sqrt{48}} = 3.952$
f. Since our alternative hypothesis is “not equal”, we will need the probability in both tails. This is a
two-sided or two-tailed test. Since our t-statistic is positive, we will find the probability above it in the t-
distribution with the correct degrees of freedom (47 in this case) and then we will multiply that
probability by 2 to calculate the total probability in both tails. You can do this using a t-table or using
your calculator (if you use your calculator with the correct alternative hypothesis, it will multiply by 2 for
you).
p-value = 0.000258 (your calculator might display something like 2.586E-4, which is scientific notation
for $2.586 \times 10^{-4}$; look carefully since the p-value is a probability and should be a number from 0 to 1)
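The calculations in parts d through f can be reproduced from the frequency table. The following is a minimal sketch using only the Python standard library; the helper `t_upper_tail` is a hypothetical name introduced here, and it approximates the tail area of the t-distribution by numerically integrating the density (Simpson's rule) rather than calling a calculator or a statistics package.

```python
import math

# Frequency table from the elapsed-time study: estimate in seconds -> count
freq = {5: 1, 6: 1, 7: 3, 8: 6, 10: 11, 12: 3,
        13: 3, 15: 10, 20: 4, 21: 1, 22: 1, 30: 4}
mu_0 = 10  # hypothesized mean under H0

n = sum(freq.values())
mean = sum(x * c for x, c in freq.items()) / n
s = math.sqrt(sum(c * (x - mean) ** 2 for x, c in freq.items()) / (n - 1))
se = s / math.sqrt(n)          # standard deviation of the sampling distribution
t_stat = (mean - mu_0) / se    # test statistic
df = n - 1                     # degrees of freedom


def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > |t|) for a t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3  # half the area minus the area from 0 to |t|


# Two-sided test: double the area beyond the observed t-statistic
p_two_sided = 2 * t_upper_tail(t_stat, df)

print(f"n = {n}, mean = {mean:.3f}, s = {s:.3f}, SE = {se:.3f}")
print(f"t = {t_stat:.3f}, df = {df}, two-sided p-value = {p_two_sided:.6f}")
```

The printed values should agree, up to rounding, with the sample mean, standard deviation, t-statistic, and p-value used in this activity.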
Note that the smaller the p-value, the stronger the evidence against the null hypothesis and in favor of the
alternative hypothesis. Typical evaluations are
• A p-value above .10 constitutes little or no evidence against the null hypothesis.
• A p-value below .10 but above .05 constitutes moderately strong evidence against the null
hypothesis.
• A p-value below .05 but above .01 constitutes reasonably strong evidence against the null
hypothesis.
• A p-value below .01 constitutes very strong evidence against the null hypothesis.
In some studies, the researcher decides in advance how small the p-value must be to provide convincing
evidence against the null hypothesis. This cutoff value is called a significance level, denoted by α (alpha).
Common values are α = .10, α = .05, and α = .01. A smaller significance level indicates a stricter
standard for deciding if the null hypothesis can be rejected. If the researcher specifies a level of
significance in advance, you would then say you reject or fail to reject at that particular level. Another
common expression is to say that the data are statistically significant if they are unlikely to have occurred by
chance or sampling variability alone (assuming that the null hypothesis is true).
g. Does the p-value for the Elapsed Time study lead to rejecting or failing to reject the null hypothesis at the
.01 level?
We reject the null hypothesis at the .01 level because our p-value of 0.000258 is less than 0.01.
h. Does this study provide convincing evidence that the mean estimate for time elapsed was different
than 10 seconds? Explain in context. Yes. If the true mean estimate were 10 seconds, it would be
surprising to obtain a sample mean of 13.708 seconds, as we first saw in the sketch of the sampling
distribution. A second piece of evidence is the large t-statistic, over 3, showing that our sample mean
is unlikely to occur in a random sample if the true mean estimate were 10 seconds. The probability
of obtaining a t-statistic at least this extreme by random chance alone is about 0.000258. We have
strong evidence that the mean time students perceive to have elapsed while a 10-second clip of the
song “ABC” is played differs from 10 seconds.
Practice Exercise:
Use the 6-step process outlined above to complete a test of significance for the following situation (from
AMSCO’s AP Statistics): An association of college bookstores reported that the average amount of
money spent by students on textbooks was $325.16 with a standard deviation of $76.42. A random
sample of 75 students at the local campus of the state university indicated that the average bill for
textbooks for the semester in question was $312.34. Do these data provide significant evidence at the 0.05
significance level (corresponding to a 95% confidence level) that the actual mean bill is less than the
$325.16 that was reported? Show all steps.
1. The parameter is μ, the mean bill for textbooks for college students.
2. H0: μ = 325.16
Ha: μ < 325.16
3. Because the sample size is large (75 > 30) and randomly selected, and the sample values are
independent of each other, we can apply the Central Limit Theorem.
4. $t = \frac{312.34 - 325.16}{76.42/\sqrt{75}} = -1.4528$
5. p-value = .07313
6. This is not significant at the .05 level. We fail to reject the null hypothesis because our p-
value of 0.07313 is greater than .05. There is insufficient evidence to show that the average
bill for college textbooks is less than $325.16.
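Steps 4 and 5 of this exercise can be checked with a few lines of Python. This is a sketch under one stated assumption: the reported $76.42 is treated as a known population standard deviation, so the tail area is taken from the standard normal distribution via `statistics.NormalDist`. That choice reproduces the p-value of .07313 given above; a t-distribution with 74 degrees of freedom would give a slightly larger tail area and the same conclusion at the .05 level.

```python
import math
from statistics import NormalDist

x_bar = 312.34   # sample mean bill
mu_0 = 325.16    # hypothesized mean under H0
sigma = 76.42    # reported standard deviation (assumed known here)
n = 75

se = sigma / math.sqrt(n)
stat = (x_bar - mu_0) / se

# Ha: mu < 325.16, so the p-value is the area in the lower tail
p_lower = NormalDist().cdf(stat)

alpha = 0.05
decision = "reject H0" if p_lower < alpha else "fail to reject H0"
print(f"test statistic = {stat:.4f}, p-value = {p_lower:.5f}, {decision}")
```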
Walk the Line Experiment
You’ll need someone to help you with this data collection. We want to test how much eyesight helps you
keep walking in a straight line. We assume without any eye covering, you could walk a straight line
without any issues. We will have you blindfolded, either inside or outside depending on your own
situation, and try to walk down a straight line for 10 ft. Hopefully there is already a line on the floor or
the sidewalk you use. At the end of the 10 ft, measure how far off the line (to either side) you are in
inches.
What is the parameter we wish to study?
The mean number of inches away from a line after walking blindfolded for 10 ft,
represented by μ.
Materials: blindfold, helper to make sure you don’t run into anything, ruler
Procedure:
1. Find a spot, inside or outside, with 10 ft of a straight line.
2. Have someone blindfold you and place you at the start of the 10 foot segment.
3. Walk 10 ft ahead. Have someone stop you at the end of 10 ft.
4. Measure how far off the line you are at the end of the 10 ft in inches. Count either side as
positive.
Record the class data: (sample data given below; results based on sample)
student dist off line (in) student dist off line (in) student dist off line (in)
1 1.5 11 1.5 21 1.9
2 15 12 15.5 22 1.8
x̄ = 3.675, s_x = 4.751
In order to complete a t-test, verify the technical conditions:
We are not sure whether participants were randomly sampled from any
population in particular (maybe the TJ Class of 2025?), but they are
probably not any better or worse at walking in a straight line
blindfolded than most students their age. If the population is the
freshman class, our sample is less than 10% of it, so we can treat the
observations as independent. The sample is smaller than 30, so we must
check normality. The normal probability plot (distance in inches on the
horizontal axis, z-scores on the vertical axis) is mostly a straight
line with 3 points clearly off to the side. We will proceed with
caution through the rest of the test.
We have one sample, but we don’t know the population standard deviation, so we will run a one-sample
t-test. Since we are using the sample standard deviation, the test statistic will be a t-statistic
and can be found using $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$. The calculator also finds the test statistic for us.
Test statistic: t = 3.789
Calculate the p-value and find a confidence interval using a 95% confidence level:
The p-value = 0.0004 < 0.05, so we have evidence to reject the null hypothesis. The confidence
interval gives us further evidence: we are 95% confident that students end up, on average, 1.669 to
5.681 inches off a straight line when they try to walk 10 ft blindfolded.
Now complete Activity 20-1 starting on page 422 and Activity 20-2 starting on page 426 in your
textbook.
Topic 20 Summaries
b. When should the alternative hypothesis be formulated? Before collecting the sample data,
based on the research question
c. What is the denominator of the test statistic? The standard error of the sample mean.
d. When you calculate the p-value for a two-sided alternative, what area is included? The total
area in both tails of the t-distribution beyond the value of the test statistic.
e. What does a small p-value indicate? You are unlikely to obtain such extreme sample data if
the null hypothesis is true, which provides evidence against the null hypothesis and in favor
of the alternative hypothesis.
2. From the Watch-Out on page 430, when should you be cautious about generalizing to a larger
population? When a sample is not chosen randomly.
Complete Activity 22-1 starting on page 472 in your textbook.
Personality Quiz
Students will take a personality-style quiz using the link: https://forms.gle/6UafB96FAD3njkga7. After
answering the quiz questions, students will calculate a score. The data collection will include the score
and whether the student was born before July 1 or not. The question is whether there is a difference in
score on the personality-style quiz based on time of year you were born.
Take the personality-style quiz using the link above. It is an FCPS Google link. Report your score to the
data collection spreadsheet and then use data from there to complete this problem.
(Sample data given. Results based on sample data.)
Scores for Students born before July 1 (points from quiz):
46 30 31 18 17 20 35 41 21 16
32 56 43 34 41 24 62 38 20 50
52 20 33 57 64 63 43 58 16 22
n_E = 30, x̄_E = 36.4333, s_E = 16.211
n_L = 30, x̄_L = 39.633, s_L = 11.877
Which distribution will you use? Why? Are there any assumptions to be made that you haven't already
stated? Explain.
Since we don’t have the population standard deviation for either group, we will use the
t-distribution. Since a sample size might be smaller than 30, checking that the samples are
approximately normal using a normal probability plot would be a good idea. Either way, we will
proceed with the test (possibly with caution if the samples are not normal or large enough!).
4. Test Statistic: We will run a two-sample t-test since the “treatment” is time of year born. The
observational units are the students.
t = -0.872
5. p-value = 0.387
d.f. (if applicable) = 53.169
95% C.I. = (-10.56, 4.159)
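The test statistic and the fractional degrees of freedom reported above can be reproduced from the summary statistics alone. The sketch below assumes the unpooled (Welch) form of the two-sample t-test, which is what a non-integer value such as 53.169 indicates; the variable names are illustrative only.

```python
import math

# Summary statistics from the sample data (E and L label the two birth-date groups)
n_e, mean_e, s_e = 30, 36.4333, 16.211
n_l, mean_l, s_l = 30, 39.633, 11.877

# Each group's contribution to the variance of the difference in sample means
v_e = s_e ** 2 / n_e
v_l = s_l ** 2 / n_l

se = math.sqrt(v_e + v_l)            # standard error of the difference
t_stat = (mean_e - mean_l) / se      # two-sample t-statistic

# Welch-Satterthwaite approximation for the degrees of freedom
df_welch = (v_e + v_l) ** 2 / (v_e ** 2 / (n_e - 1) + v_l ** 2 / (n_l - 1))

# Conservative convention: one less than the smaller sample size
df_conservative = min(n_e, n_l) - 1

print(f"t = {t_stat:.3f}, Welch df = {df_welch:.3f}, conservative df = {df_conservative}")
```

The conservative value is never larger than the Welch value, which is why it leads to a slightly larger critical value and a slightly wider interval.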
Topic 22 Summaries
both are symmetric with similar shapes, then the sample size may be smaller than 30, but if the
populations are skewed, we prefer sample sizes be
b. What does it mean that the degrees of freedom convention is a conservative approximation?
The degrees of freedom is on the low side, so the critical value will be slightly greater than it
needs to be; thus, the interval will be slightly wider and, therefore, will succeed in capturing
µ1 - µ2 slightly more often than the confidence level indicates.
2. From page 479, all else being the same, a test result becomes more statistically significant as
3. From the Watch-Out on page 479, is failing to reject a null hypothesis the same as accepting it? No.
c. What should you always relate your conclusion to? The context of the study.
Complete Activity 23-1 starting on page 498 in your textbook.
Complete Activity 23-3 on page 504 in your textbook.
Heart Rate Experiment: Is there a difference between your resting and active heart rate?
For most people, their resting heart rate is lower than their active heart rate. Heart rate is measured in
beats per minute and can be calculated by finding your pulse in your wrist, counting the beats in 10
seconds, and multiplying by 6 to get beats per minute.
In a matched pairs experiment, we can test that idea. We will test the same person’s heart rate at rest and
after 1 minute of jumping jacks.
1. Parameter:
µr is the mean resting heart rate in beats per minute (bpm), µa is the mean active heart rate in beats
per minute (bpm), and µd is the mean difference in heart rate in beats per minute (bpm), active
minus resting (µa − µr), for the population of RS 1 students participating in this experiment.
2. Hypotheses: H0: µd = 0
Ha: µd > 0
Materials: timer
Procedure:
1. After sitting (at rest) for at least one minute, find your resting heart rate. The pulse in your wrist
is usually easiest to find. Using two fingers (not your thumb), gently press on your wrist until you feel
your pulse. Count the number of beats in 10 seconds. Multiply that by 6 to get beats per minute.
3. Technical Conditions:
Random sampling can be assumed as students were
basically randomly placed in classes by a computer to start
the school year. Since sample size may not be 30 and since
the sample size is close to 10% of the population, samples
should be checked to see that they are approximately
normal using a normal probability plot. A normal
probability plot with the differences in bpm on the
horizontal axis and the corresponding z-scores on the
vertical axis is shown:
6. Test decision (Significance Level: α = 0.05): With p ≈ 0 < 0.05 we have evidence to reject the
null hypothesis. There is strong evidence that freshmen in RS1 have a higher active heart rate than
resting heart rate measured in beats per minute. The 95% confidence interval gives further
evidence; we are 95% confident that, on average, the active heart rate of freshmen in RS1 is
52.905 to 69.495 beats per minute higher than their resting heart rate.
Topic 23 Summaries
b. What can help you decide whether data are collected with a paired design? Ask whether
there is a link between each observation in one group with a specific observation in the
other group.
c. In which type of design does mixing up the order of values in one group create a problem?
Paired design.
d. If a study has a different number of observations between two groups, what type of design
can it not be? Paired design.
Summary
Hypothesis Testing – t-Test: Procedure to test if there is any statistical significance to your data.
Step 2: State competing claims concerning the parameter of interest. Write the null and alternative
hypothesis in words and symbols.
Step 4: Calculate test statistic, some measure of difference between values (z-score, t-score, etc.). Calculate
degrees of freedom.
Step 6: Draw a conclusion by deciding whether to reject or fail to reject the null hypothesis. The
p-value from Step 5 is the probability, assuming that the null hypothesis is true, of
obtaining the observed sample statistic or one more extreme. If this probability is smaller
than the level of significance, α, you should reject the null hypothesis. If this probability
is larger than the level of significance, α, you should fail to reject the null hypothesis.
Your conclusion is two parts. The first part (a) is including the p-value to state whether you
reject or fail to reject the null hypothesis. The second part (b) is stating in context of the
problem what you are rejecting or failing to reject.
A level of significance, α, is the maximum probability of wrongly rejecting a true null hypothesis
that you are willing to allow in the hypothesis testing procedure.
Unit 5 In-Class Review
1. Perform a complete hypothesis test: A state university is concerned that there is a difference in the
writing abilities of their male and female students. To test this assertion, the university took a random
sample of 60 of their first-year students and recorded their genders and SAT Writing scores. The data
appears below.
Use an appropriate t-test to compare these data sets and their population means.
1. μM is the mean SAT Writing score for males from the state university; μF is the mean SAT Writing
score for females from the state university
2. H0: μM = μF
Ha: μM ≠ μF
3. Random sample is stated. We can assume there are more than 600 students at a state university and the
samples are male and female, so the samples are independent and independent of each other. The size of
sample of males is 35 which is large enough that we do not need to check normality. The size of the
female sample is 25 so we will construct a normal probability plot to check normality. The female SAT
scores will be the x-values with each z-score as the y-value:
The data are close to forming a line, especially
for the majority of the values, so we will proceed
as if the sample is normal.
5. p-value = 0.035
df = 57.699 or 57.670
95% confidence interval: (2.425, 65.689)
2. A study on children’s television viewing was conducted by Stanford researchers (Robinson, 1999). At
the beginning of the study, parents of third- and fourth-grade students at two public elementary schools in
San Jose were asked to report how many hours of television the child watched in a typical week. The 198
responses had a mean of 15.41 hours and a standard deviation of 14.16 hours.
Conduct a test of whether or not these sample data provide evidence at the .05 level for concluding that
third- and fourth-grade children watch an average of more than two hours of television per day. Include
all the components of a significance test, and explain what each component reveals. Start by identifying
the observational units, variable, sample, and population.
3. Police trainees were seated in a darkened room facing a projector screen. Ten different license plates
were projected on the screen, one at a time, for 5 seconds each, separated by 15-second intervals.
After the last 15-second interval, the lights were turned on and the police trainees were asked to write
down as many of the 10 license plate numbers as possible, in any order at all.
A random sample of 15 trainees who took this test were then given a week-long memory training course.
They were then retested. The results are shown in the table (A is after training, B is before).
Test, at the 5% level of significance, that the memory course improves the ability of the trainees to
correctly identify license plates.
A (after)   6 8 6 7 9 8 9 6 7 5 9 8 6 8 6
B (before)  6 5 6 5 7 5 4 6 7 8 4 5 4 6 7
1. μA is the mean number of license plates remembered after the week-long memory course; μB
is the mean number of license plates remembered before the week-long memory course; μD is the
difference in the means, after minus before
2. H0: μD = 0
Ha: μD > 0
3. Apparently, all police trainees were given the original test, but then a random sample of 15 were
given the week-long memory course, so there is a random sample. We can assume there are more
than 150 police trainees, so we have an independent sample. Since the sample size, 15, is less than
30, we will construct a normal probability plot. The difference in number of license plates
memorized will be the x-values and the z-score for each will be the y-value:
6. The p-value = 0.008 < 0.05 so we reject the null hypothesis. We have strong evidence that the
week-long memory course increases the mean number of license plates police trainees are able to
memorize. Since the 95% confidence interval doesn’t contain 0, we are 95% confident that the mean
increase in number of license plates memorized after the memory course is between 0.315 and 2.751.
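The matched-pairs calculation for the trainee data can be sketched in Python using only the standard library. The helpers `t_upper_tail` and `t_critical` are hypothetical names introduced for this illustration: the first approximates a t-distribution tail area by numerically integrating the density, and the second finds the critical value by bisection. A calculator's built-in t-test and t-interval functions would normally do this work.

```python
import math
from statistics import mean, stdev

after = [6, 8, 6, 7, 9, 8, 9, 6, 7, 5, 9, 8, 6, 8, 6]
before = [6, 5, 6, 5, 7, 5, 4, 6, 7, 8, 4, 5, 4, 6, 7]

# Matched pairs: reduce the two lists to one sample of differences (after minus before)
diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)
d_bar = mean(diffs)
se = stdev(diffs) / math.sqrt(n)
t_stat = d_bar / se           # H0: mu_D = 0
df = n - 1


def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 using Simpson's rule on the t density."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3


def t_critical(tail_area, df):
    """Find t* with P(T > t*) = tail_area by bisection."""
    lo, hi = 0.0, 20.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > tail_area:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


p_one_sided = t_upper_tail(t_stat, df)   # Ha: mu_D > 0, so use the upper tail
t_star = t_critical(0.025, df)           # critical value for a 95% interval
ci = (d_bar - t_star * se, d_bar + t_star * se)

print(f"mean difference = {d_bar:.3f}, t = {t_stat:.3f}, df = {df}")
print(f"one-sided p-value = {p_one_sided:.4f}")
print(f"95% CI for the mean difference: ({ci[0]:.3f}, {ci[1]:.3f})")
```

Because the data are paired, the two columns are never analyzed separately; everything is computed from the single list of differences.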
Extra Practice: Unit 5 Review
1. Which type of sampling must be used to select the samples used for constructing confidence
intervals and performing hypothesis tests?
2. The null hypothesis is a claim about a:
a) parameter, where the claim is assumed to be false until it is declared true
b) parameter, where the claim is assumed to be true until it is declared false
c) statistic, where the claim is assumed to be false until it is declared true
d) statistic, where the claim is assumed to be true until it is declared false
3. If we want to calculate a confidence interval or perform a hypothesis test for a population mean,
when will we use the t-distribution rather than the z-distribution in the formulas and procedures?
4. The mean federal income tax paid last year by a random sample of 19 persons selected from a city
was $4275 with a standard deviation of $766. If we want to use this information to test at a 5%
significance level that the mean income tax of all persons in this city is more than $4000, we
a) could construct a Z-interval
b) could construct a T-interval
c) could perform a Z-test
d) could perform a T-test
5. In a hypothesis test, if we REJECT the null hypothesis at a 5% significance level, then it must be
that
a) P-value > 0.05
b) P-value < 0.05
c) P-value = 0.05
d) P-value > 0.025
6. A two-tailed hypothesis test using the normal distribution reveals that the area under the sampling
distribution curve of the mean and located in the tail to the right of the sample mean equals 0.028.
Consequently, the p-value for this test equals:
7. We want to know if there is a difference in pay among females and males at a large corporation. We
draw two random samples, one from the population of female employees and one from the
population of male employees at this corporation. The two samples are
a) independent
b) dependent
c) matched samples
d) paired samples
8. Drug A was given to 132 patients and Drug B was given to 127 patients in a Phase 3 clinical trial for
efficacy. Each drug claims to reduce patients' diastolic blood pressure. Blood pressure readings were
taken before and after administration of the drug to test parameters μA and μB . What type of T-test is
appropriate and how many degrees of freedom would you use in the following T-tests, using the
textbook's conservative approximations?
a) H0 : The average diastolic of patients given Drug A was 85, before administration.
b) H0 : The average change in diastolic of patients given Drug A was -5.
c) H0 : The average diastolic of patients given Drug A was the same as patients given Drug B,
before administration.
d) H0 : The average change in diastolic of patients given Drug B was -5.
e) H0 : Drugs A and B are equally effective because the average change in diastolic for the two
groups is identical.
9. A soft-drink manufacturer claims that its 12-ounce cans do not contain, on average, more than 30
calories. A random sample of 64 cans of this soft drink, which were checked for calories, contained
a mean of 32 calories with a standard deviation of 3 calories. Does the sample information support
the alternative hypothesis that the manufacturer's claim is false? Use a significance level of 5%.
10. Listed below are temperatures (in °F) of subjects measured at 8:00 am and then again at 12:00 am.
a) Construct a 95% confidence interval estimate of the difference between the 8:00 am
temperatures and the 12:00 am temperatures.
b) Test at 5% significance level the claim that the body temperature is the same at both times.
c) Explain the relationship between your answers to the above 2 parts.
8:00 AM   97.0 96.2 97.6 96.4 97.8 99.9
12:00 AM  98.0 98.6 98.8 98.0 98.6 97.6
11. John read that farmers in Japan routinely subject plants to stress before transplanting from the
greenhouse to the field. Methods of stress induction included pulling on the plants and hitting them with
straw rakes. John decided to investigate this phenomenon by growing two groups of bean plants
(10/group) in a greenhouse for 15 days during which time the plants in one group were pulled on three
times daily at 8:00 in the morning and at 4:00 in the afternoon. The plants were then transplanted to a
field. John hypothesized that stressed plants would exhibit significantly larger mean plant heights after
transplanting than the non-stressed plants (control). Use α = .05 and complete a hypothesis test showing
all work.
Plant heights (in cm.) after 30 days were:
Stressed Plants: 55, 65, 50, 57, 59, 73, 57, 54, 62, 68
Non-stressed Plants: 48, 65, 59, 57, 51, 63, 65, 58, 44, 50
Car Stop-and-Go Highway
1 1500 941
2 870 456
3 1120 893
4 1250 1060
5 3460 3107
6 1110 1339
7 1120 1346
8 880 644
Unit 5 Review Key
1. Random sampling of independent items; 2. B;
3. Use t when you don't know σ, the population standard deviation.
4. B or D; 5. B; 6. p = 2(0.028) = 0.056
7. A. If the corporation is sufficiently large, it is safe to assume the samples are independent.
8. a. 1-Sample T-test with 131 degrees of freedom, b. 1-Sample Matched Pairs T-test with 131 degrees
of freedom, c. 2-Sample T-test with 126 degrees of freedom, d. 1-Sample Matched Pairs T-test with
126 degrees of freedom, e. 2-Sample T-test with 126 degrees of freedom
9. The test rejects H0 in favor of Ha: μ > 30 with a t-statistic of 5.3 and a p-value less than $10^{-6}$.
10. a) A 95% confidence interval is (−0.9094,2.476) using the TI-84.
b) We fail to reject H0 at the 5% level because p = 0.2876
c) We are 95% confident that the difference in means is between -0.9094 and 2.476. Since 0 is
within this confidence interval, we cannot reject H0 at the 5% level. The 95% central probability
associated with the confidence interval is the complement of the 5% alpha-region which would
allow us to reject H0 .
11. Two-sample t-test for the stressed and non-stressed plants:
1. μS is the mean height in cm after 30 days for the stressed plants; μN is the mean height in cm
after 30 days for the non-stressed plants
2. H0: μS = μN
Ha: μS > μN
3. Simple random sampling isn’t stated, but we can assume that the 20 plants used were
randomly sampled from a larger population. We also know that the two samples are
independent of each other and that there are far more than 200 plants, so the samples are
independent. Since the sample size, 10, is less than 30, we will check normality by
constructing a normal probability plot for each sample. The sample heights in cm are on the
x-axis and the z-score for each height is the y-value (the red squares are the stressed plants
and the blue crosses are the non-stressed plants):
The red squares from the stressed
plants appear to form a line so we
can assume the sample is normal.
The blue crosses from the non-
stressed plants are not quite as
linear, but we will proceed with
caution.
4. Since the two groups of plants were part of different treatment groups, either being stressed
before moving or non-stressed, we will run a two-sample t-test.
Test statistic: t = 1.240
5. p-value = 0.115, df = 17.944 or 17.945
95% confidence interval: (-2.777, 10.777)
6. With p = 0.115 > 0.05, we fail to reject the null hypothesis. We do not have sufficient
evidence to say that the mean height in cm of the stressed plants is greater than the mean
height of the non-stressed plants. The 95% confidence interval includes the value 0, which
is consistent with this conclusion: a mean difference of zero between the two groups
remains plausible.
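The test statistic, degrees of freedom, and one-sided p-value for the plant data can be reproduced from the raw heights. This is a sketch that assumes the unpooled (Welch) two-sample t-test, which matches the fractional degrees of freedom reported in step 5; `t_upper_tail` is a hypothetical helper that approximates the tail area by numerical integration instead of using a calculator.

```python
import math
from statistics import mean, variance

stressed = [55, 65, 50, 57, 59, 73, 57, 54, 62, 68]
control = [48, 65, 59, 57, 51, 63, 65, 58, 44, 50]   # non-stressed plants

# Variance of each sample mean
v_s = variance(stressed) / len(stressed)
v_n = variance(control) / len(control)

se = math.sqrt(v_s + v_n)
t_stat = (mean(stressed) - mean(control)) / se

# Welch-Satterthwaite degrees of freedom
df = (v_s + v_n) ** 2 / (v_s ** 2 / (len(stressed) - 1) + v_n ** 2 / (len(control) - 1))


def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 using Simpson's rule on the t density."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3


# Ha: mu_S > mu_N, so the p-value is the upper-tail area
p_one_sided = t_upper_tail(t_stat, df)

alpha = 0.05
decision = "reject H0" if p_one_sided < alpha else "fail to reject H0"
print(f"t = {t_stat:.3f}, df = {df:.3f}, p-value = {p_one_sided:.3f}, {decision}")
```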
12. Matched pairs t-test because emissions values were taken from each car, once when driven on the
highway and once when driven off the highway.
1. μN is the mean emissions value (no units given) for non-highway driving; μH is the
mean emissions value for the highway driving; μD is the difference of the means, non-
highway minus highway
2. H0: μD = 0
Ha: μD > 0
Note: if you set up the difference the opposite way (highway minus non-highway), you would
choose the opposite alternative hypothesis.
3. Simple random sampling isn’t stated, but we can assume that the 8 cars were randomly
sampled from a larger population. We also know that there are far more than 80 cars, so
the sample is independent. Since the sample size, 8, is less than 30, we will check
normality by constructing a normal probability plot for the sample that is the difference in
emissions for each car. The sample difference in emissions for each car is on the x-axis
and the z-score for each car is the y-value:
From the normal probability plot
we are not convinced that the
sample data are normal since the
points do not look approximately
linear, so we will proceed with
caution.
Glossary
Alternative hypothesis – a statement of what researchers suspect or hope to be true about the
parameter. It will take one of these three forms:
• Ha: parameter < hypothesized value
• Ha: parameter > hypothesized value
• Ha: parameter ≠ hypothesized value
The specific form (direction) of the alternative is determined by the research question, before
the sample data are collected.
Comparing two means – common inference procedure used when the response variable is
quantitative; procedure attempts to distinguish between an observed difference due to sampling
variability and one too large to have occurred by chance.
Matched-pairs Experiment – Experiment that incorporates blocking, where the block size is 2. The
pairs may arise naturally, and they may not be independent.
Null hypothesis – A statement about the parameter of interest. Typically a statement of no effect or
no difference, the null states the parameter of interest is equal to a specific value:
H0: parameter = hypothesized value
One-tailed test – significance test conducted when the alternative hypothesis is one-sided. For
example, Ha: μ > μ0 or Ha: μ < μ0.
Practical significance – When large samples are available, even tiny deviations from the null
hypothesis will be statistically significant. But a tiny deviation may not have practical importance,
so use your common sense and look at the size of an observed difference. Ask yourself whether the
observed difference is important.
p-value – The probability, assuming the null hypothesis to be true, of obtaining a test statistic at
least as extreme as the one actually observed. Extreme means in the direction of the alternative
hypothesis.
Robust – Describes a procedure that tends to give reasonable results even for small sample sizes as
long as the population is not severely skewed and does not have extreme outliers
Significance Level – The cutoff p-value that the researcher decides in advance in order to provide
convincing evidence against the null hypothesis.
Technical conditions for t-test – The t-test requires a simple random sample from a population of
interest. The t-test also requires either a large sample size or a normally distributed population.
You can generally regard a sample of at least 30 as large enough for the procedure to be valid. If
the sample size is less than 30, examine visual displays of the sample data to see whether they
appear to follow a normal distribution.
Test decision – A comment evaluating the strength of evidence against the null hypothesis. Where a
test decision needs to be made:
If the p-value is small, reject the null hypothesis.
If the p-value is not small, fail to reject the null hypothesis.
The decision should respond to the research question, stating that you either have evidence for the
alternative hypothesis (in context) or you do not. In other words, restate your final conclusions in
the language of the research question.
Test of Significance – A significance test is a formal procedure for comparing observed data with a
claim (hypothesis) whose truth we want to assess. The claim is a statement about a parameter, like
the population mean μ. We express the results of a significance test in terms of a probability that
measures how well the data and the claim agree.
Test statistic – This is a measure of the discrepancy between our observed statistic and the
hypothesized value of the parameter. If the discrepancy is large, we have evidence against the null
hypothesis.
Two-tailed test – When we look for results at least as extreme as the sample result in both
directions. When the alternative hypothesis is two-sided (not equal to), we find the p-value by
computing
2 ∙ P(Z > |z|)
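As a small illustration of the formula above, the two-sided p-value for a z-statistic can be computed with `statistics.NormalDist` from the Python standard library. The function name `two_sided_p` is made up for this sketch. The second part mirrors review question 6, where a right-tail area of 0.028 doubles to a two-sided p-value of 0.056.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1


def two_sided_p(z):
    """Two-sided p-value 2 * P(Z > |z|) for an observed z-statistic."""
    return 2 * (1 - standard_normal.cdf(abs(z)))


# The sign of z does not matter for a two-sided test
print(round(two_sided_p(1.96), 4), round(two_sided_p(-1.96), 4))

# Review question 6: the z-value that leaves 0.028 in the right tail
z_obs = standard_normal.inv_cdf(1 - 0.028)
print(round(two_sided_p(z_obs), 3))
```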