Inference For Two Population Means: Case Study
density plots;
histograms;
box-and-whisker plots;
dot plots.
As this example data set is small, a dot plot is best because there is
no compelling reason to summarize the data.
Points should be jittered so equal values are not directly on top of one
another.
Two Population Means Graphics 8 / 65
[Figure: Dot plot of the data, showing the number of successful broods (0 to 6) for the Different and Same groups.]
Comments on the Graphics
There is a lot of overlap between the samples.
It would be difficult to place an individual in one group or the other
on the basis of the number of successful broods.
But the centers of the distributions appear to be a bit different, with
larger values for the Different group on average.
Components of a Randomization Test
Randomization Tests
1. State hypotheses;
2. Select and calculate a test statistic;
3. Use simulation to find the null distribution of the test statistic;
4. Compare the value of the actual test statistic to its null distribution
to compute a p-value;
5. Summarize the results in the context of the problem.
State Hypotheses
Hypotheses are statements about populations;
Here we are assuming that the pseudoscorpions in the sample may be
treated as if they were randomly sampled from the population of
these pseudoscorpions in the wild;
In words, the hypotheses are:
$H_0$: There would be no difference in the mean number of
successful broods between the two experimental conditions
among all female pseudoscorpions in the population.
$H_A$: The experimental condition with different partners
produces a larger mean number of successful broods
than the experimental condition with the same partner
mating twice.
State Hypotheses (cont.)
In symbols, let $\mu_1$ and $\mu_2$ represent the mean number of
successful broods in the population for the Same and Different
groups, respectively. The hypotheses are:
$H_0: \mu_1 = \mu_2$
$H_A: \mu_1 < \mu_2$
One could also test the alternative hypotheses $H_A: \mu_1 \neq \mu_2$ or
$H_A: \mu_1 > \mu_2$ if appropriate for the setting.
Select a Test Statistic
The difference in sample means is the natural test statistic for a
hypothesis that compares population means.
As we are determining the null distribution by simulation, there is no
need to standardize the test statistic so it can be compared to some
well-known benchmark distribution (such as standard normal,
chi-square, or t).
For the observed data, $\bar{x}_1 = 44/20 = 2.2$ and $\bar{x}_2 = 58/16 = 3.625$,
and the difference is $\bar{x}_1 - \bar{x}_2 = -1.425$.
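The following minimal Python sketch checks the arithmetic. It recomputes the two sample means and their difference from the group totals quoted above: 44 broods over 20 females for Same, and 58 over 16 for Different. The variable names are illustrative only.

```python
# Group totals and sizes for the pseudoscorpion data, as quoted in the text.
same_total, same_n = 44, 20
diff_total, diff_n = 58, 16

xbar1 = same_total / same_n   # sample mean for the Same group
xbar2 = diff_total / diff_n   # sample mean for the Different group
observed = xbar1 - xbar2      # test statistic xbar1 - xbar2

print(xbar1, xbar2, round(observed, 3))  # 2.2 3.625 -1.425
```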
Compute the Null Distribution
Conceptually, we take a random sample of 20 of the 36 observations without
replacement, compute its mean and the mean of the 16 remaining
values, and take the difference (mean of the 20 minus mean of the 16).
Repeat this process very many times and see how many differences are
$-1.425$ or smaller.
The proportion of such values estimates the p-value.
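The resampling procedure can be sketched in Python as follows. This is a minimal illustration, not the code used to produce the results in these notes, and it rests on a few assumptions:

- The function name `randomization_p_value` is hypothetical.
- The function computes the lower-tail p-value, which matches $H_A: \mu_1 < \mu_2$.
- The individual brood counts are not listed in this section, so the usage line runs on a tiny made-up data set rather than the study data.

```python
import random

def randomization_p_value(group1, group2, reps=100_000, seed=None):
    """Monte Carlo lower-tail p-value for H_A: mu1 < mu2.

    Each repetition shuffles the pooled values, treats the first len(group1)
    as a simulated group 1 and the rest as group 2, and counts how often the
    simulated difference in means is at most the observed difference.
    """
    rng = random.Random(seed)
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    # The overall total is fixed, so xbar1 - xbar2 increases with the group-1 sum.
    # Comparing group-1 sums is therefore equivalent, and keeps integer ties exact.
    observed_sum1 = sum(group1)
    count = 0
    for _ in range(reps):
        rng.shuffle(pooled)
        if sum(pooled[:n1]) <= observed_sum1:
            count += 1
    return count / reps

# Hypothetical toy data (NOT the pseudoscorpion counts): 3 values per group.
p_toy = randomization_p_value([0, 0, 0], [5, 5, 5], reps=20_000, seed=1)
print(p_toy)
```

For the toy data the exact p-value is 1/20 = 0.05. Only one of the 20 equally likely ways to choose which three values form group 1 puts all three zeros there, so the printed estimate should be close to 0.05. Shuffling the pooled list and splitting it is equivalent to drawing a random sample of $n_1$ without replacement, as described above.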
Graph of Null Distribution
[Figure: Density of the simulated null distribution; x-axis "Difference in Sample Means" (about -2 to 3), y-axis "Density".]
P-value
The graph shows that the observed difference is fairly
unusual relative to the null distribution.
The p-value is the proportion of all possible randomizations with
a difference at least as extreme as the observed one (here, $-1.425$ or smaller).
The simulation estimates it by the proportion among the sampled randomizations.
For the 100,000 randomly selected randomizations, the estimated p-value is 0.0093.
A different simulation would give a slightly different estimate, but not by
much.
The jaggedness of the preceding graph is caused by the many ties in the
randomization distribution of the difference between sample means. The brood
counts are small integers, so many different randomizations produce exactly the same difference.
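One way to make "not by much" concrete is the Monte Carlo standard error of the estimate. This is a standard binomial argument added here, not taken from the notes. Treating each of the $N$ sampled randomizations as an independent trial gives $\mathrm{SE}(\hat{p}) = \sqrt{\hat{p}(1 - \hat{p})/N}$. A minimal Python sketch with the values reported above:

```python
import math

p_hat = 0.0093    # estimated p-value reported above
reps = 100_000    # number of sampled randomizations

# Binomial standard error of a proportion estimated from `reps` independent draws.
mc_se = math.sqrt(p_hat * (1 - p_hat) / reps)
print(round(mc_se, 5))  # 0.0003
```

A standard error of about 0.0003 suggests that a repeat simulation of the same size would typically estimate the p-value to within about 0.0006 (two standard errors) of its true value.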
Conclusions
The p-value is fairly small.
In the context of the problem, we can summarize the result as follows.
There is strong evidence that female pseudoscorpions have
fewer successful broods on average when they mate with only one
partner than when they mate with two partners under the
given experimental conditions (two independent sample
randomization test, p = 0.009, $n_1 = 20$, $n_2 = 16$). This
result is consistent with the evolutionary explanation that
the behavior of having multiple partners as seen in nature
may overcome the possibility of genetic incompatibility
among some partners.
Summary
Randomization tests are useful for comparing population means (or
other population characteristics).
The method computes the distribution of the test statistic over
re-randomizations under the null hypothesis that the random assignment of
treatments is independent of the response variable of interest.
As simulation determines the null distribution, there is no need to
scale the test statistic so that it is comparable to a standard
benchmark null distribution.
Randomization tests are only practical with a computer.
The null distribution is approximately symmetric and
bell-shaped, which suggests that an approximation with a standard
distribution may be accurate.
Later in these notes we will reexamine this data using a t-test.
Two Different Designs
There are two standard designs for comparing two treatment groups:
1. In a paired design, there is a single sample of pairs of observations, and
each treatment is applied to each sampled unit.
Example
In a sample of people, scores from vision tests are recorded separately for each
eye. There is interest in comparing scores between right and left eyes.
2. In a two-independent-sample design, there are two separate samples.
All elements of one sample get one treatment, and all elements of the
other sample get the other treatment.
Example
In a dairy cattle study with two different feed supplements, cattle are randomly
separated into two groups, and each group is given one of the two dietary
supplements. Average daily milk yield is compared between the groups.
Butterfat Study
This study and the accompanying data are from a former Statistics 571
student.
Case Study
The butterfat content in milk is an important factor in determining its economic
value and in how it is processed to form dairy products such as cheese,
ice cream, and butter. In an experiment, a company is interested in comparing
the performances of two different labs that measure the butterfat
content of milk. Two separate samples were collected from each of 107 loads of
milk, and one sample from each load was sent to each of the two labs. Butterfat
content changes based on the identity of the cows, the time of milking, the
time since the last milking, and other factors. The percentage butterfat
can therefore be expected to vary from load to load. It should be consistent for
samples taken from the same load, however, because each load is properly agitated before
sampling to promote mixing throughout the load.
How should this data be examined to compare the performances of the labs?
Horned Lizards
Case Study
The horned lizard Phrynosoma mcalli has horns it uses for protection. Researchers
tested the hypothesis that longer horns are more protective than
shorter horns. A predator of these lizards is the loggerhead shrike, a bird
that impales the lizards on thorns or barbed wire. Researchers compared the
horn lengths of 30 skewered lizards with those of 154 living horned lizards.
The average horn length of the skewered lizards was 21.99 mm, and the average
horn length of the living lizards was 24.28 mm. Is this evidence that longer horns
are more protective?
Graphs
Density plots, histograms, dot plots, and box-and-whisker plots are all
useful for graphical comparisons between samples.
For paired samples, graphs of the within-pair differences are also useful.
It is very important to graph data and look for patterns or outliers
before carrying out statistical inferences.
Butterfat Study
The butterfat percentage data is from a paired data design.
There is a single sample of size n = 107.
Each sample unit (a load) is measured twice (the two butterfat
measurements from the two labs).
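Data from a paired design are analyzed by:
A P% confidence interval for $\mu_D = \mu_1 - \mu_2$ has the form
$$\bar{D} - t^*\frac{s}{\sqrt{n}} < \mu_D < \bar{D} + t^*\frac{s}{\sqrt{n}}$$
where $t^*$ is chosen so that the area between $-t^*$ and $t^*$ under
a t-density with $n - 1$ degrees of freedom is P/100, $s$ is the sample
standard deviation of the differences, and $n$ is the sample size.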
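The interval above translates directly into code. This is a minimal Python sketch, assuming the differences have already been computed for each pair. The function name `paired_ci` is hypothetical, and the caller supplies $t^*$, for example from a t table or `qt(0.975, n - 1)` in R for a 95% interval. The usage line uses made-up differences, not the butterfat data.

```python
import math
from statistics import mean, stdev

def paired_ci(diffs, t_crit):
    """CI for mu_D: D-bar +/- t* s / sqrt(n), where s is the SD of the differences."""
    n = len(diffs)
    d_bar = mean(diffs)
    s = stdev(diffs)                  # sample SD with divisor n - 1
    margin = t_crit * s / math.sqrt(n)
    return d_bar - margin, d_bar + margin

# Made-up differences (not the butterfat data); t* = 2.776 for 95% with 4 df.
lo, hi = paired_ci([1, 2, 3, 4, 5], t_crit=2.776)
print(round(lo, 3), round(hi, 3))  # 1.037 4.963
```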
Chalkboard example
Example
Here are summary statistics:
$H_0: \mu_1 - \mu_2 = 0$
$H_A: \mu_1 - \mu_2 \neq 0$
Paired t test
If the differences $D_1, D_2, \ldots, D_n$ are normally distributed and the null hypothesis $\mu_D = d_0$ holds, then
$$T = \frac{\bar{D} - d_0}{s/\sqrt{n}} \sim t(n - 1)$$
where $d_0$ (usually 0) is the mean difference in the null hypothesis, $s$ is the
sample standard deviation of the differences, and $n$ is the sample size.
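Here is a minimal Python sketch of the paired t statistic. The function name `paired_t` is hypothetical, and the differences are made up only to illustrate the arithmetic.

```python
import math
from statistics import mean, stdev

def paired_t(diffs, d0=0.0):
    """Return the paired t statistic (D-bar - d0) / (s / sqrt(n)) and its df."""
    n = len(diffs)
    se = stdev(diffs) / math.sqrt(n)   # estimated SE of the mean difference
    t = (mean(diffs) - d0) / se
    return t, n - 1

t_stat, df = paired_t([1, 2, 3, 4, 5])
print(round(t_stat, 4), df)  # 4.2426 4
```

Here $\bar{D} = 3$ and $s/\sqrt{n} = \sqrt{2.5}/\sqrt{5} = \sqrt{0.5}$, so $T = 3/\sqrt{0.5} \approx 4.2426$ on 4 degrees of freedom.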
Chalkboard example
Example
Here are summary statistics:
$$\frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t(n_1 + n_2 - 2)$$
where
$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{(n_1 - 1) + (n_2 - 1)}}$$
is the pooled sample standard deviation, and $s_1$ and $s_2$ are the respective
single-sample standard deviations.
Note that $s_p^2$, the pooled variance, is a weighted average of the
sample variances, weighted by their degrees of freedom.
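The pooled standard deviation and the corresponding t statistic can be sketched in Python as below. The function names and the summary statistics in the usage line are hypothetical. They are chosen so that the weighting by degrees of freedom is easy to check by hand: $(2 \cdot 1^2 + 4 \cdot 3^2)/6 = 38/6$.

```python
import math

def pooled_sd(s1, n1, s2, n2):
    """Pooled SD s_p: square root of the df-weighted average of s1^2 and s2^2."""
    df1, df2 = n1 - 1, n2 - 1
    return math.sqrt((df1 * s1**2 + df2 * s2**2) / (df1 + df2))

def pooled_t(xbar, ybar, s1, n1, s2, n2, delta0=0.0):
    """Pooled two-sample t statistic for H0: mu1 - mu2 = delta0, with its df."""
    sp = pooled_sd(s1, n1, s2, n2)
    t = ((xbar - ybar) - delta0) / (sp * math.sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

# Hypothetical summary statistics, chosen only to illustrate the arithmetic.
print(round(pooled_sd(1.0, 3, 3.0, 5), 4))  # 2.5166, i.e. sqrt(38 / 6)
```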
Derivation
Using linearity properties of expectation and variances of independent
random variables, it is straightforward to show the following. Suppose $X_1, \ldots, X_{n_1}$ and
$Y_1, \ldots, Y_{n_2}$ are independent samples with $E(X_i) = \mu_1$, $\mathrm{Var}(X_i) = \sigma_1^2$,
$E(Y_i) = \mu_2$, and $\mathrm{Var}(Y_i) = \sigma_2^2$. Then
$$E(\bar{X} - \bar{Y}) = \mu_1 - \mu_2$$
$$\mathrm{Var}(\bar{X} - \bar{Y}) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$
Furthermore, if both distributions are normal, the distribution of the
difference in sample means is also normal.
Under the additional assumption that $\sigma_1 = \sigma_2 = \sigma$,
$$\mathrm{Var}(\bar{X} - \bar{Y}) = \sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)$$
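The variance formula can also be checked by simulation. This minimal Python sketch assumes normal populations with arbitrary illustrative parameters: $\mu_1 = 10$, $\sigma_1 = 2$, $n_1 = 5$ and $\mu_2 = 7$, $\sigma_2 = 3$, $n_2 = 8$. For these values the formulas give $E(\bar{X} - \bar{Y}) = 3$ and $\mathrm{Var}(\bar{X} - \bar{Y}) = 4/5 + 9/8 = 1.925$. The function name is hypothetical.

```python
import random
from statistics import mean, variance

def simulate_diff_in_means(mu1, sigma1, n1, mu2, sigma2, n2, reps=20_000, seed=0):
    """Draw `reps` pairs of independent normal samples; return the xbar - ybar values."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        xbar = sum(rng.gauss(mu1, sigma1) for _ in range(n1)) / n1
        ybar = sum(rng.gauss(mu2, sigma2) for _ in range(n2)) / n2
        diffs.append(xbar - ybar)
    return diffs

diffs = simulate_diff_in_means(10, 2, 5, 7, 3, 8)
theory_var = 2**2 / 5 + 3**2 / 8   # sigma1^2/n1 + sigma2^2/n2
print(round(mean(diffs), 2), round(variance(diffs), 2), theory_var)
```

The simulated mean and variance should land close to 3 and 1.925. Different seeds give slightly different values.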
Standard Error
The standard error for the difference in sample means is
$$\mathrm{SE}(\bar{X} - \bar{Y}) = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
Under the assumption that $\sigma_1 = \sigma_2$, this is estimated as
$$\widehat{\mathrm{SE}}(\bar{X} - \bar{Y}) = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
Without this assumption, the estimate is
$$\widehat{\mathrm{SE}}(\bar{X} - \bar{Y}) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
Welch's t-test uses this unpooled $\widehat{\mathrm{SE}}$ for the T statistic. In this case the distribution
is only approximate, and the degrees of freedom are approximated with
a messier formula (see page 304 in the textbook for details).
Welch's approach is the default in the R function t.test().
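Below is a minimal Python sketch of both standard error estimates, plus a degrees-of-freedom approximation. The notes defer the df formula to the textbook. The version here is the standard Welch-Satterthwaite approximation, which R's `t.test()` uses by default; treat its match with the textbook's presentation as an assumption. Function names and inputs are hypothetical.

```python
import math

def pooled_se(s1, n1, s2, n2):
    """SE estimate assuming sigma1 = sigma2: s_p * sqrt(1/n1 + 1/n2)."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return math.sqrt(sp2 * (1 / n1 + 1 / n2))

def welch_se_df(s1, n1, s2, n2):
    """Unpooled SE sqrt(s1^2/n1 + s2^2/n2) and the Welch-Satterthwaite df."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return se, df

# Hypothetical summary statistics with unequal SDs and unequal sample sizes.
print(pooled_se(1.0, 30, 4.0, 10))
print(welch_se_df(1.0, 30, 4.0, 10))
```

With equal sample sizes, the pooled and unpooled standard errors coincide. If the SDs are also equal, the Welch df equals $n_1 + n_2 - 2$. The two approaches differ most when both the sample sizes and the SDs are unequal.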
Chalkboard example
Example
Here are summary statistics for the lizard data:
The sample sizes are 154 for the living lizards and 30 for the
unfortunate skewered ones.
The sample means are 24.28 mm for the living lizards and 21.99 mm for the
killed ones.
Find a 95% confidence interval for the difference in mean horn length
for the two groups.
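Here is a sketch of how the interval could be computed. It rests on these assumptions:

- The sample standard deviations are not given in this section, so the values below are hypothetical placeholders, and the printed interval is not the answer for the lizard data.
- It uses Welch's approach, the `t.test()` default mentioned above.
- It uses a small hand-rolled t quantile (Simpson's rule plus bisection) so that it needs only the Python standard library. In practice, `qt()` in R or `scipy.stats.t.ppf` would supply $t^*$.
- All function names are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x) via Simpson's rule on [0, |x|] plus symmetry about 0 (steps is even)."""
    b = abs(x)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_quantile(p, df):
    """t* with P(T <= t*) = p, found by bisection; intended for 0.5 < p < 0.99."""
    lo, hi = 0.0, 60.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def welch_ci(mean1, sd1, n1, mean2, sd2, n2, level=0.95):
    """Welch confidence interval for mu1 - mu2 from summary statistics."""
    v1, v2 = sd1**2 / n1, sd2**2 / n2
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    t_star = t_quantile(1 - (1 - level) / 2, df)
    estimate = mean1 - mean2
    return estimate - t_star * se, estimate + t_star * se

# Means and sample sizes from the example; the SDs are HYPOTHETICAL placeholders.
SD_LIVING, SD_SKEWERED = 3.0, 3.0
lo, hi = welch_ci(24.28, SD_LIVING, 154, 21.99, SD_SKEWERED, 30)
print(f"({lo:.2f}, {hi:.2f})")
```

Whatever the standard deviations, the interval is centered at the point estimate $24.28 - 21.99 = 2.29$ mm (living minus skewered).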