T-Test For Difference of Means
Previously, we considered inferences about a single sample mean. More often, however, we are interested in relationships and want to compare two or more sample means between or across groups. This allows us to explore questions like: is the average share of GNP per capita allocated by governments to military expenditures significantly greater in the richest countries than in the poorest? Or, is the crime rate significantly greater in the largest cities than in the smallest?
These questions require that we calculate two means and compare them to see if one is greater than the other, and by how much. To do this we return to a theorem derived from the central limit theorem:
That is to say:

$$\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2$$

$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
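These two results can be illustrated with a small Monte Carlo sketch using only the standard library. The population parameters and sample sizes here are arbitrary choices for illustration, not values from this section:

```python
import random
import statistics

# Hypothetical populations: X1 ~ N(10, sd = 3), X2 ~ N(8, sd = 4)
mu1, sd1, n1 = 10.0, 3.0, 40
mu2, sd2, n2 = 8.0, 4.0, 60

random.seed(42)
diffs = []
for _ in range(5000):
    x1 = [random.gauss(mu1, sd1) for _ in range(n1)]
    x2 = [random.gauss(mu2, sd2) for _ in range(n2)]
    diffs.append(statistics.fmean(x1) - statistics.fmean(x2))

mean_diff = statistics.fmean(diffs)   # approaches mu1 - mu2 = 2
se_diff = statistics.stdev(diffs)     # approaches the theoretical standard error
theoretical_se = (sd1**2 / n1 + sd2**2 / n2) ** 0.5

print(mean_diff, se_diff, theoretical_se)
```

With enough replications, the simulated mean and standard deviation of the differences converge on the two theoretical quantities above.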
While a random selection procedure ensures that each case sampled is independent of the others within a sample, we also require that the two samples be independent of each other: the selection procedure for one sample must not influence the selection of the other.
So how do we proceed? There are two models that can be employed. The first assumes that the two population variances are equal (σ₁² = σ₂²). The second assumes that they are unequal (σ₁² ≠ σ₂²).
Model A: (σ₁² = σ₂²)
A t-score is calculated in general as follows:

$$t = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$$
In practical terms:

$$t_{(\bar{X}_1 - \bar{X}_2)} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sigma_1^2}{N_1} + \dfrac{\sigma_2^2}{N_2}}}$$
The denominator, assuming equal variances, can be rewritten as:

$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{N_1} + \frac{\sigma_2^2}{N_2}} = \sigma\sqrt{\frac{1}{N_1} + \frac{1}{N_2}} = \sigma\sqrt{\frac{N_1 + N_2}{N_1 N_2}}$$
Putting the whole thing together, the estimated standard error is:

$$\hat{\sigma}_{(\bar{X}_1 - \bar{X}_2)} = \sqrt{\frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}}\,\sqrt{\frac{n_1 + n_2}{n_1 n_2}}$$
And finally, the whole t-score formula is equal to:

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}}\,\sqrt{\dfrac{n_1 + n_2}{n_1 n_2}}}$$
An Example
Since we do not know the population variances, we have to use a t-test. We are assuming equal
population variances, but this can be tested as we shall see later. For the time being, we will just
assume this to be the case. We will use a significance level of 0.05 and a one-tailed test. Let's say we
were given the following proposition:
P1: The higher the educational achievement of a man's father, the higher his
educational level.
Using the 1984 GSS data we can test this. First, we can take father's education and dichotomize it into a high group and a low group. Using the sample statistics that appear in the computations below, the two groups are:

Group 1 (high father's education): n = 90, mean = 15.24, s² = 6.57
Group 2 (low father's education): n = 359, mean = 12.57, s² = 9.82
The dependent variable is the respondent's (i.e., the son's) years of education. From this we can derive two hypotheses, the null and the alternative:
$$H_0: \mu_1 - \mu_2 = 0$$
$$H_1: \mu_1 - \mu_2 > 0$$
The critical value of t with df = 90 + 359 − 2 = 447 for alpha equal to 0.05 is 1.645. The computed t (about 7.46 using the pooled formula with the sample values above) far exceeds this, so clearly we can reject the hypothesis of no difference.
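The Model A computation can be sketched in Python using only the standard library. The sample statistics are those from the father's-education example in this section; the function name is my own:

```python
import math

def pooled_t(mean1, mean2, var1, var2, n1, n2):
    """t for a difference of means assuming equal population variances,
    using the pooled estimate with n*s^2 in the numerator (Model A)."""
    pooled = (n1 * var1 + n2 * var2) / (n1 + n2 - 2)
    se = math.sqrt(pooled) * math.sqrt((n1 + n2) / (n1 * n2))
    return (mean1 - mean2) / se

# Values from the father's-education example
t = pooled_t(15.24, 12.57, 6.57, 9.82, 90, 359)
df = 90 + 359 - 2  # 447
print(round(t, 2), df)  # t is about 7.46, well beyond the critical 1.645
```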
Model B: (σ₁² ≠ σ₂²)
If we cannot assume that the two populations have equal standard deviations, we have to modify our procedure slightly. In this case, we cannot pool the variances of the two samples; each must be estimated separately.
The formula for the standard error is:

$$\hat{\sigma}_{(\bar{X}_1 - \bar{X}_2)} = \sqrt{\frac{s_1^2}{n_1 - 1} + \frac{s_2^2}{n_2 - 1}}$$
Using the numbers from the example above, we get:

$$t = \frac{15.24 - 12.57}{\sqrt{\dfrac{6.57}{90 - 1} + \dfrac{9.82}{359 - 1}}} = \frac{2.67}{0.3181987} = 8.39$$
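The same hand calculation can be checked with a short Python sketch (standard library only; the function name is my own):

```python
import math

def unequal_var_t(mean1, mean2, var1, var2, n1, n2):
    """t for a difference of means when the variances cannot be pooled;
    the standard error uses s^2/(n - 1) for each sample separately (Model B)."""
    se = math.sqrt(var1 / (n1 - 1) + var2 / (n2 - 1))
    return (mean1 - mean2) / se

# Same sample values as the worked example
t = unequal_var_t(15.24, 12.57, 6.57, 9.82, 90, 359)
print(round(t, 2))  # about 8.39, matching the hand calculation
```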
The df for the unequal variance model is equal to:

$$df = \frac{\left(\dfrac{s_1^2}{N_1 - 1} + \dfrac{s_2^2}{N_2 - 1}\right)^2}{\dfrac{\left(s_1^2/(N_1 - 1)\right)^2}{N_1 + 1} + \dfrac{\left(s_2^2/(N_2 - 1)\right)^2}{N_2 + 1}} - 2 = 163.42$$
The critical value of t with df equal to 163.42 for alpha equal to 0.05 is 1.645. Clearly, we can reject the
hypothesis of no difference.
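The df formula can also be verified with a few lines of Python (standard library only; the function name is my own), using the same sample values:

```python
def unequal_var_df(var1, var2, n1, n2):
    """Degrees of freedom for the unequal-variance model, as given above."""
    a = var1 / (n1 - 1)
    b = var2 / (n2 - 1)
    return (a + b) ** 2 / (a ** 2 / (n1 + 1) + b ** 2 / (n2 + 1)) - 2

df = unequal_var_df(6.57, 9.82, 90, 359)
print(round(df, 2))  # about 163.42
```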
Testing for Equality of Variances
So, now that we know that there are two ways to calculate the difference of means test, one for equal
variances, one for unequal variances, how do we determine which one to use? To determine whether
the variances are equal or not, we will take the ratio of one sample variance to the other. It turns out
that this ratio has a sampling distribution with known characteristics, and is called the F distribution.
As this ratio departs from unity, we assume, with a known probability of error, that the variances are
unequal. That is to say, we reject the null hypothesis of equal variances.
How do you use the F table? Two parameters are necessary for the F distribution, ν₁ and ν₂ (pronounced "nu"). These are equal to the degrees of freedom associated with each sample, where df = n − 1.
F tables are normally very condensed since there are so many possible values. Furthermore, most
tables only present information for a one-tailed test.
Suppose that we have two samples, one with n = 13 and the other with n = 20, associated respectively with s₁² and s₂². The respective degrees of freedom are 12 and 19. Assuming α = 0.05, we look across the top of the F table to the column labeled 12, then down the rows to the one labeled 19. The critical value that the F ratio must exceed to reject the null hypothesis is 2.31, the intersection of the appropriate row and column.
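Because printed F tables are so condensed, it can be handy to compute the critical value directly. A sketch using SciPy, assuming it is installed (`scipy.stats.f.ppf` is the inverse CDF of the F distribution):

```python
from scipy.stats import f

# Critical F for a one-tailed test at alpha = 0.05 with df (12, 19)
crit = f.ppf(0.95, dfn=12, dfd=19)
print(round(crit, 2))  # about 2.31, matching the table lookup
```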
So, how does all this work? Take the ratio of the larger variance to the smaller variance:

$$F_{\nu_1;\nu_2} = \frac{s_1^2}{s_2^2}$$
And the null and alternative hypotheses are:

$$H_0: \sigma_1^2 = \sigma_2^2$$
$$H_1: \sigma_1^2 > \sigma_2^2$$
In the earlier example we saw that group two had a variance equal to 9.82 and group one had a variance
of 6.57 with 358 and 89 df, respectively. The closest we can get to these df in the F-table for α=0.05 is
infinity and 60. I choose 60 as a more conservative test than 120. The critical value, then, is 1.39.
The F ratio is equal to: F = 9.82 / 6.57 = 1.49, which exceeds the critical value of 1.39.
So, we would reject the null hypothesis and be forced to use the formula for unequal variances (i.e. we
cannot pool the variances).
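The whole decision procedure for this example can be sketched in a few lines of Python (standard library only; variable names are my own):

```python
var1 = 6.57   # group one (n = 90)
var2 = 9.82   # group two (n = 359)

# Ratio of the larger variance to the smaller
f_ratio = max(var1, var2) / min(var1, var2)
critical = 1.39  # from the F table for df (60, infinity) at alpha = 0.05

equal_variances = f_ratio <= critical
print(round(f_ratio, 2), equal_variances)  # 1.49 False -> use Model B
```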