Basic Analysis of Variance and The General Linear Model: Psy 420 Andrew Ainsworth
Summary values for the data without error (every case in $a_1$ equals 7 and every case in $a_2$ equals 3):
$\sum Y_{a_1} = 35$, $\sum Y_{a_2} = 15$; $\sum Y^2_{a_1} = 245$, $\sum Y^2_{a_2} = 45$; $\bar{Y}_{a_1} = 7$, $\bar{Y}_{a_2} = 3$
GLM
Changes produced by the treatment represent
deviations around the GM
$\sum_j n_j(\bar{Y}_j - GM)^2 = n\,[(7-5)^2 + (3-5)^2] = 5(2)^2 + 5(-2)^2 \;\text{ or }\; 5[(2)^2 + (-2)^2] = 40$
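As a quick cross-check of that arithmetic, here is a minimal Python sketch (the variable names are my own, not from the slides) that reproduces the 40 from the group means of 7 and 3, the grand mean of 5, and n = 5 cases per group:

```python
# Between-groups deviation sum of squares: sum of n * (group mean - GM)^2
group_means = [7, 3]
grand_mean = 5
n = 5  # cases per group

ss_between = sum(n * (m - grand_mean) ** 2 for m in group_means)
print(ss_between)  # 40
```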
GLM
Now if we add in some random variation (error)
a1:                         a2:
Case   Score                Case   Score
s1     5 + 2 + 2 = 9        s6     5 - 2 + 0 = 3
s2     5 + 2 + 0 = 7        s7     5 - 2 - 2 = 1
s3     5 + 2 - 1 = 6        s8     5 - 2 + 0 = 3
s4     5 + 2 + 0 = 7        s9     5 - 2 + 1 = 4
s5     5 + 2 - 1 = 6        s10    5 - 2 + 1 = 4
Sum    35                   Sum    15
(Each score = GM + treatment effect + error.)
$\sum Y_{a_1} = 35$, $\sum Y_{a_2} = 15$, $\sum Y = 50$; $\sum Y^2_{a_1} = 251$, $\sum Y^2_{a_2} = 51$, $\sum Y^2 = 302$; $\bar{Y}_{a_1} = 7$, $\bar{Y}_{a_2} = 3$, $\bar{Y} = GM = 5$
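The summary values above can be reproduced with a small Python sketch, assuming the ten scores from the table (variable names are illustrative only):

```python
# Recompute the group and total sums, sums of squares, and means
a1 = [9, 7, 6, 7, 6]
a2 = [3, 1, 3, 4, 4]
both = a1 + a2

print(sum(a1), sum(a2), sum(both))               # 35 15 50
print(sum(y ** 2 for y in a1),
      sum(y ** 2 for y in a2),
      sum(y ** 2 for y in both))                 # 251 51 302
print(sum(a1) / 5, sum(a2) / 5, sum(both) / 10)  # 7.0 3.0 5.0
```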
GLM
Now if we calculate the variance for each group:
$s^2 = \frac{\sum Y^2 - \frac{(\sum Y)^2}{n}}{n - 1}$

$s^2_{a_1} = \frac{251 - \frac{35^2}{5}}{5 - 1} = \frac{6}{4} = 1.5 \qquad s^2_{a_2} = \frac{51 - \frac{15^2}{5}}{5 - 1} = \frac{6}{4} = 1.5$

The average variance in this case is also going to be 1.5: (1.5 + 1.5)/2 = 1.5.
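A short sketch of the same per-group variance computation, again assuming the scores above:

```python
# Per-group sample variance from raw scores: (sum of Y^2 - (sum of Y)^2 / n) / (n - 1)
def group_variance(scores):
    n = len(scores)
    ss = sum(y ** 2 for y in scores) - sum(scores) ** 2 / n
    return ss / (n - 1)

a1 = [9, 7, 6, 7, 6]
a2 = [3, 1, 3, 4, 4]
print(group_variance(a1), group_variance(a2))          # 1.5 1.5
print((group_variance(a1) + group_variance(a2)) / 2)   # 1.5
```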
GLM
We can also calculate the total variability in
the data regardless of treatment group
The average variability of the two groups is
smaller than the total variability.
$s^2_Y = \frac{\sum Y^2 - \frac{(\sum Y)^2}{N}}{N - 1} = \frac{302 - \frac{50^2}{10}}{10 - 1} = \frac{52}{9} = 5.78$

With $\sum Y^2 = 302$ and $\bar{Y} = GM = 5$, the total sum of squares is 52; the treatment part is $n\sum_j(\bar{Y}_j - GM)^2 = 5(8) = 40$ and the error part is 12, so 52 = 40 + 12.
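The partition 52 = 40 + 12 can be checked directly from deviations, as in this rough sketch:

```python
# SS_total splits into SS_A (group means around GM) and SS_S/A (scores around group means)
a1 = [9, 7, 6, 7, 6]
a2 = [3, 1, 3, 4, 4]
both = a1 + a2
gm = sum(both) / len(both)

ss_total = sum((y - gm) ** 2 for y in both)
ss_a = sum(len(g) * (sum(g) / len(g) - gm) ** 2 for g in (a1, a2))
ss_sa = sum((y - sum(g) / len(g)) ** 2 for g in (a1, a2) for y in g)
print(ss_total, ss_a, ss_sa)  # 52.0 40.0 12.0
```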
Analysis - deviation approach
degrees of freedom
DF_total = N - 1 = 10 - 1 = 9
DF_A = a - 1 = 2 - 1 = 1
DF_S/A = a(S - 1) = a(n - 1) = an - a = N - a = 2(5) - 2 = 8
Analysis - deviation approach
Variance or Mean square
MS_total = 52/9 = 5.78
MS_A = 40/1 = 40
MS_S/A = 12/8 = 1.5
Test statistic:
F = MS_A / MS_S/A = 40/1.5 = 26.67
The critical value is looked up with df_A, df_S/A, and alpha. The test is always non-directional.
Analysis - deviation approach
ANOVA summary table
Source    SS    df    MS     F
A         40     1    40     26.67
S/A       12     8    1.5
Total     52     9
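If SciPy is available, the same F-ratio can be obtained with its one-way ANOVA routine as an informal check (scipy.stats.f_oneway returns the F statistic and its p-value):

```python
from scipy import stats

a1 = [9, 7, 6, 7, 6]
a2 = [3, 1, 3, 4, 4]
f, p = stats.f_oneway(a1, a2)
print(round(f, 2), p)  # F is about 26.67; p is its associated probability
```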
Analysis - computational approach
Equations
Under each part of the equations, you divide by the
number of scores it took to get the number in the
numerator
$SS_Y = SS_T = \sum Y^2 - \frac{(\sum Y)^2}{N} = \sum Y^2 - \frac{T^2}{an}$

$SS_A = \frac{\sum_j (\sum Y_{a_j})^2}{n} - \frac{T^2}{an}$

$SS_{S/A} = \sum Y^2 - \frac{\sum_j (\sum Y_{a_j})^2}{n}$
Analysis - computational approach
Analysis of sample problem
$SS_T = 302 - \frac{50^2}{10} = 52$

$SS_A = \frac{35^2 + 15^2}{5} - \frac{50^2}{10} = 40$

$SS_{S/A} = 302 - \frac{35^2 + 15^2}{5} = 12$
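A sketch of the computational (raw-score) formulas applied to the same data, with T for the grand total and the group sums taken from the table (variable names are mine):

```python
a1 = [9, 7, 6, 7, 6]
a2 = [3, 1, 3, 4, 4]
groups = [a1, a2]

n = len(a1)                                      # cases per group
an = len(groups) * n                             # total N
T = sum(sum(g) for g in groups)                  # grand total = 50
sum_y2 = sum(y ** 2 for g in groups for y in g)  # sum of squared scores = 302

ss_t = sum_y2 - T ** 2 / an
ss_a = sum(sum(g) ** 2 for g in groups) / n - T ** 2 / an
ss_sa = sum_y2 - sum(sum(g) ** 2 for g in groups) / n
print(ss_t, ss_a, ss_sa)  # 52.0 40.0 12.0
```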
Analysis - regression approach
Levels of A   Cases   Y     X     YX
a1            S1      9     1      9
              S2      7     1      7
              S3      6     1      6
              S4      7     1      7
              S5      6     1      6
a2            S6      3    -1     -3
              S7      1    -1     -1
              S8      3    -1     -3
              S9      4    -1     -4
              S10     4    -1     -4
Sum                  50     0     20
Squares summed      302    10
N                    10
Mean                  5
Analysis - regression approach
Y = a + bX + e
e = Y - Y'
Analysis - regression approach
Sums of squares
$SS(Y) = \sum Y^2 - \frac{(\sum Y)^2}{N} = 302 - \frac{(50)^2}{10} = 52$

$SS(X) = \sum X^2 - \frac{(\sum X)^2}{N} = 10 - \frac{(0)^2}{10} = 10$

$SP(YX) = \sum YX - \frac{(\sum Y)(\sum X)}{N} = 20 - \frac{(50)(0)}{10} = 20$
Analysis - regression approach
$SS_{total} = SS(Y) = 52$

$SS_{regression} = \frac{[SP(YX)]^2}{SS(X)} = \frac{(20)^2}{10} = 40$

$SS_{residual} = SS_{total} - SS_{regression} = 52 - 40 = 12$
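These regression-approach quantities can be reproduced from the ±1 coding in the table above; a minimal sketch (variable names are illustrative):

```python
Y = [9, 7, 6, 7, 6, 3, 1, 3, 4, 4]
X = [1, 1, 1, 1, 1, -1, -1, -1, -1, -1]
N = len(Y)

ss_y = sum(y ** 2 for y in Y) - sum(Y) ** 2 / N                 # 52
ss_x = sum(x ** 2 for x in X) - sum(X) ** 2 / N                 # 10
sp_yx = sum(y * x for y, x in zip(Y, X)) - sum(Y) * sum(X) / N  # 20

ss_reg = sp_yx ** 2 / ss_x   # 40
ss_resid = ss_y - ss_reg     # 12
print(ss_y, ss_x, sp_yx, ss_reg, ss_resid)
```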
Slope:
$b_{YX} = \frac{SP(YX)}{SS(X)} = \frac{\sum YX - \frac{(\sum Y)(\sum X)}{N}}{\sum X^2 - \frac{(\sum X)^2}{N}} = \frac{20}{10} = 2$

Intercept:
$a = \bar{Y} - b\bar{X} = 5 - 2(0) = 5$
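The same slope and intercept fall out of an ordinary least-squares fit; here is a hypothetical cross-check with NumPy (np.polyfit returns the slope first, then the intercept, for a degree-1 fit):

```python
import numpy as np

Y = np.array([9, 7, 6, 7, 6, 3, 1, 3, 4, 4])
X = np.array([1, 1, 1, 1, 1, -1, -1, -1, -1, -1])

b, a = np.polyfit(X, Y, deg=1)  # slope, intercept
print(b, a)                     # 2.0 5.0
```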
Analysis - regression approach
$Y' = a + bX$
For $a_1$: $Y' = 5 + 2(1) = 7$
For $a_2$: $Y' = 5 + 2(-1) = 3$
Analysis - regression approach
Degrees of freedom
df(reg.) = # of predictors = 1
df(total) = number of cases - 1 = 10 - 1 = 9
df(resid.) = df(total) - df(reg.) = 9 - 1 = 8
Statistical Inference
and the F-test
Any type of measurement will include a
certain amount of random variability.
In the F-test this random variability is seen in two places: random variation of each person around their group mean, and of each group mean around the grand mean.
The effect of the IV is seen as adding further variation of the group means around the grand mean, so that the F-test is really:

$F = \frac{BG}{WG} = \frac{\text{effect} + \text{error}}{\text{error}}$
Statistical Inference
and the F-test
If there is no effect of the IV, then the equation breaks down to just:

$F = \frac{BG}{WG} = \frac{\text{error}}{\text{error}} \approx 1$

which means that any differences between the groups are due to chance alone.
Statistical Inference
and the F-test
The F-distribution is based on the idea that between-groups variation due to the effect causes the F-ratio to be larger than 1.
Like the t-distribution, there is not a
single F-distribution, but a family of
distributions. The F distribution is
determined by both the degrees of
freedom due to the effect and the
degrees of freedom due to the error.
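For example, the critical value for the df_A = 1, df_S/A = 8 problem above can be looked up with SciPy's F distribution (a sketch, assuming alpha = .05):

```python
from scipy import stats

alpha = 0.05
f_crit = stats.f.ppf(1 - alpha, dfn=1, dfd=8)
print(round(f_crit, 2))  # roughly 5.32; the observed F = 26.67 exceeds it
```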
Statistical Inference
and the F-test
Assumptions of the analysis
Robust - a robust test is one that is said to be fairly accurate even if the assumptions of the analysis are not met. ANOVA is said to be a fairly robust analysis. With that said...
Assumptions of the analysis
Normality of the sampling distribution of
means
This assumes that the sampling distribution of each level of the IV is relatively normal.
The assumption is about the sampling distribution, not the scores themselves.
This assumption is said to be met when there are relatively equal sample sizes in each cell and the degrees of freedom for error are 20 or more.
Assumptions of the analysis
Normality of the sampling distribution
of means
If the degrees of freedom for error are small, then:
The individual distributions should be checked for skewness and kurtosis (see chapter 2) and for the presence of outliers.
If the data do not meet the distributional assumption, then transformations will need to be done.
Assumptions of the analysis
Independence of errors - the size of the error for one case is not related to the size of the error in another case.
This is violated if a subject is used more than once (the repeated-measures case) and is still analyzed with a between-subjects ANOVA.
This is also violated if subjects are run in groups, especially if the groups are pre-existing.
This can also be the case if similar people exist within a randomized experiment (e.g., age groups); it can be controlled by using this variable as a blocking variable.
Assumptions of the analysis
Homogeneity of Variance - since we assume that each sample comes from the same population and is only affected (or not) by the IV, we assume that each group has roughly the same variance.
Each sample variance should reflect the population variance, so they should be equal to each other.
Since we use each sample variance to estimate an average within-cell variance, they need to be roughly equal.
Assumptions of the analysis
Homogeneity of Variance
An easy test to assess this assumption is:
$F_{max} = \frac{s^2_{largest}}{s^2_{smallest}}$

If $F_{max} \le 10$, then the variances are roughly homogeneous.
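A quick sketch of this Fmax check for the example data (both group variances are 1.5, so the ratio is 1):

```python
a1 = [9, 7, 6, 7, 6]
a2 = [3, 1, 3, 4, 4]

def sample_var(scores):
    m = sum(scores) / len(scores)
    return sum((y - m) ** 2 for y in scores) / (len(scores) - 1)

variances = [sample_var(a1), sample_var(a2)]
f_max = max(variances) / min(variances)
print(f_max)  # 1.0, well under the rule-of-thumb cutoff of 10
```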
Assumptions of the analysis
Absence of outliers
Outliers - a data point that doesn't really belong with the others:
Either conceptually, e.g., you wanted to study only women and you have data from a man,
Or statistically, a data point does not cluster with other data points and has undue influence on the distribution.
This relates back to normality.
Assumptions of the analysis
Absence of outliers