Factor Analysis
You perform a factor analysis to see whether these three factors really exist. If so, you will be able to create three separate scales by summing the items on each dimension.
Factor analysis is based on a correlation table. If there are k items in the study (e.g. k questions in the above example), then the correlation table has k × k entries of the form rij, where each rij is the correlation coefficient between item i and item j. The main diagonal consists of entries with value 1.
Closely related to factor analysis is principal component analysis, which creates a
picture of the relationships between the variables useful in identifying common factors.
Factor analysis is based on various concepts from Linear Algebra, in particular
eigenvalues, eigenvectors, orthogonal matrices and the spectral theorem. We review these
concepts first before explaining how principal component analysis and factor analysis
work.
Real Statistics Functions: The Real Statistics Resource Pack provides the following
supplemental functions, where R1 is a k × k range in Excel.
eVALUES(R1): Produces a 1 × k array containing the eigenvalues of the matrix in range R1. These eigenvalues are listed in decreasing absolute value order.
eVECTORS(R1): Produces a row with the eigenvalues as for eVALUES(R1). Below each eigenvalue is a unit eigenvector corresponding to that eigenvalue. Thus the output of eVECTORS(R1) is a (k+1) × k array.
Since the calculation of these functions uses iterative techniques, you can optionally specify the number of iterations used via eVALUES(R1, iter) and eVECTORS(R1, iter). If the iter parameter is omitted, it defaults to 100 iterations.
The eigenvectors produced by eVECTORS(R1) are all orthogonal, as described in
Definition 8 of Matrix Operations. See Figure 5 of Principal Component Analysis for an
example of the output from the eVECTORS function.
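For readers who want to check this output outside of Excel, the following numpy sketch mimics the layout of eVECTORS (eigenvalues in the first row, sorted by decreasing absolute value, with the corresponding unit eigenvectors below). It only illustrates the layout, not the iterative algorithm used by the Real Statistics Resource Pack, and the function name evectors is ours.

```python
import numpy as np

def evectors(R1):
    """Eigenvalues sorted by decreasing absolute value, stacked above their
    unit eigenvectors, mimicking the (k+1) x k layout of eVECTORS(R1)."""
    vals, vecs = np.linalg.eigh(R1)           # R1 is assumed to be symmetric
    order = np.argsort(-np.abs(vals))         # decreasing absolute value
    vals, vecs = vals[order], vecs[:, order]  # eigh returns unit eigenvectors
    return np.vstack([vals, vecs])            # (k+1) x k array

# Small symmetric example
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
print(evectors(A))
```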
Observation: Every square k × k matrix has at most k (real) eigenvalues
(see Eigenvalues and Eigenvectors). If A is symmetric then it has k eigenvalues, although
these don’t need to be distinct (see Symmetric Matrices). It turns out that the eigenvalues
for covariance and correlation matrices are always non-negative (see Positive Definite
Matrices).
Theorem 1 (Spectral Decomposition Theorem): If A is a symmetric n × n matrix, then A has a spectral decomposition A = CDC^T, where C is an n × n matrix whose columns are unit eigenvectors C1, …, Cn corresponding to the eigenvalues λ1, …, λn of A, and D is the n × n diagonal matrix whose main diagonal consists of λ1, …, λn.
Observation: This is Theorem 1 of Spectral Decomposition. We will use this theorem to carry out principal component analysis and factor analysis. In fact, the key form of the theorem that we will use is that A can be expressed as

A = λ1C1C1^T + ⋯ + λnCnCn^T
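As a quick numerical check of this outer-product form (a sketch we add for illustration; the matrix A below is arbitrary):

```python
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])          # an arbitrary symmetric matrix
lam, C = np.linalg.eigh(A)               # unit eigenvectors in the columns of C

# Rebuild A as lambda_1*C1*C1^T + ... + lambda_n*Cn*Cn^T
A_rebuilt = sum(lam[i] * np.outer(C[:, i], C[:, i]) for i in range(len(lam)))
print(np.allclose(A, A_rebuilt))         # True
```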
Principal Component Analysis
Principal component analysis is a statistical technique that is used to analyze the
interrelationships among a large number of variables and to explain these variables in
terms of a smaller number of variables, called principal components, with a minimum
loss of information.
Definition 1: Let X = [xi] be any k × 1 random vector. We now define a k × 1 vector Y = [yi], where for each i the ith principal component of X is

yi = βi1x1 + βi2x2 + ⋯ + βikxk

for some regression coefficients βij. Since each yi is a linear combination of the xj, Y is a random vector.
Now define the k × k coefficient matrix β. For reasons that will become apparent shortly, we view the coefficients of the ith principal component as a k × 1 column vector βi = [βij], and take βi to be the ith column of β (so the βi^T appear as the rows of β^T). Thus

yi = βi^T X        Y = β^T X
Observation: Let Σ = [σij] be the k × k population covariance matrix for X. Then the covariance matrix for Y is given by

ΣY = β^T Σ β

i.e. the population variances and covariances of the yi are given by

var(yi) = βi^T Σ βi        cov(yi, yj) = βi^T Σ βj
Observation: Our objective is to choose values for the regression coefficients βij so as to
maximize var(yi) subject to the constraint that cov(yi, yj) = 0 for all i ≠ j. We find such
coefficients βij using the Spectral Decomposition Theorem (Theorem 1 of Linear Algebra
Background). Since the covariance matrix is symmetric, by Theorem 1 of Symmetric
Matrices, it follows that
Σ = βDβ^T

where β is a k × k matrix whose columns are unit eigenvectors β1, …, βk corresponding to the eigenvalues λ1, …, λk of Σ and D is the k × k diagonal matrix whose main diagonal consists of λ1, …, λk. Alternatively, the spectral theorem can be expressed as

Σ = λ1β1β1^T + ⋯ + λkβkβk^T
Proof: By definition of the covariance matrix, the main diagonal of Σ contains the values var(x1), …, var(xk), and so trace(Σ) = var(x1) + ⋯ + var(xk). But by Property 1 of Eigenvalues and Eigenvectors, trace(Σ) = λ1 + ⋯ + λk.
Observation: Thus the total variance for X can be expressed as trace(Σ) = λ1 + ⋯ + λk, but by Property 1, this is also the total variance for Y.
Thus the portion of the total variance (of X or Y) explained by the ith principal component yi is λi / (λ1 + ⋯ + λk). Assuming that λ1 ≥ … ≥ λk, the portion of the total variance explained by the first m principal components is therefore (λ1 + ⋯ + λm) / (λ1 + ⋯ + λk).
Our goal is to find a reduced number of principal components that can explain most of the total variance, i.e. we seek a value of m that is as low as possible but such that the ratio (λ1 + ⋯ + λm) / (λ1 + ⋯ + λk) is close to 1.
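The following sketch shows this calculation in numpy; the eigenvalues used are arbitrary illustrative values (not the ones from the example below), and the 90% target is just one possible choice.

```python
import numpy as np

def explained_variance(eigenvalues, target=0.9):
    """Portion of the total variance explained by each principal component and
    the smallest m whose cumulative portion reaches the target."""
    lam = np.sort(np.asarray(eigenvalues))[::-1]   # lambda_1 >= ... >= lambda_k
    portion = lam / lam.sum()                      # lambda_i / (lambda_1 + ... + lambda_k)
    cumulative = np.cumsum(portion)
    m = int(np.searchsorted(cumulative, target) + 1)
    return portion, cumulative, m

# Arbitrary eigenvalues for illustration
portion, cumulative, m = explained_variance([2.9, 1.4, 1.2, 1.0, 0.9, 0.6, 0.5, 0.3, 0.2])
print(m, round(cumulative[m - 1], 3))              # 7 components explain 94.4% here
```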
Observation: Since the population covariance matrix Σ is unknown, we will use the sample covariance matrix S as an estimate and proceed as above using S in place of Σ. Recall that S = [sij] is given by the formula

sij = (1/(n−1)) Σh (xih − x̄i)(xjh − x̄j)

where we now consider X = [xij] to be a k × n matrix such that for each i, {xij: 1 ≤ j ≤ n} is a random sample for random variable xi. Since the sample covariance matrix is symmetric, there is a similar spectral decomposition

S = BDB^T = λ1B1B1^T + ⋯ + λkBkBk^T

where the columns of B are unit eigenvectors B1, …, Bk corresponding to the eigenvalues λ1, …, λk of S and D is the diagonal matrix of these eigenvalues.
The sample covariance matrix S is shown in Figure 3 and can be calculated directly as
=MMULT(TRANSPOSE(B4:J123-B126:J126),B4:J123-B126:J126)/(COUNT(B4:B123)-1)
Here B4:J123 is the range containing all the evaluation scores and B126:J126 is the range containing the means for each criterion. Alternatively, we can simply use the Real Statistics formula COV(B4:J123) to produce the same result.
In practice, we usually prefer to standardize the sample scores, which gives the nine criteria equal weight. This is equivalent to using the correlation matrix. Let R = [rij] where rij is the correlation between xi and xj, i.e.

rij = sij / (si sj)

where si and sj are the sample standard deviations of xi and xj.
The sample correlation matrix R is shown in Figure 4 and can be calculated directly as
=MMULT(TRANSPOSE((B4:J123-B126:J126)/B127:J127),(B4:J123-B126:J126)/B127:J127)/(COUNT(B4:B123)-1)
Here B127:J127 is the range containing the standard deviations for each criterion.
Alternatively we can simply use the Real Statistics function CORR(B4:J123) to produce
the same result.
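The two Excel formulas above amount to the standard matrix computations sketched below in numpy; the data matrix here is simulated (one row per observation, one column per criterion), since the worksheet values from the example are not reproduced in this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 9))            # 120 observations of 9 criteria (simulated)
n = X.shape[0]

Xc = X - X.mean(axis=0)                  # subtract the column means (B126:J126)
S = Xc.T @ Xc / (n - 1)                  # sample covariance matrix

Xs = Xc / X.std(axis=0, ddof=1)          # also divide by the std deviations (B127:J127)
R = Xs.T @ Xs / (n - 1)                  # sample correlation matrix

print(np.allclose(S, np.cov(X, rowvar=False)))
print(np.allclose(R, np.corrcoef(X, rowvar=False)))
```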
Note that all the values on the main diagonal are 1, as we would expect since the variances have been standardized. We next calculate the eigenvalues and eigenvectors of the correlation matrix using the Real Statistics formula eVECTORS(M4:U12), as described in Linear Algebra Background. The result appears in range M18:U27 of Figure 5.
Figure 5 – Eigenvalues and eigenvectors of the correlation matrix
The first row in Figure 5 contains the eigenvalues for the correlation matrix in Figure 4.
Below each eigenvalue is a corresponding unit eigenvector. E.g. the largest eigenvalue
is λ1= 2.880437. Corresponding to this eigenvalue is the 9 × 1 column
eigenvector B1 whose elements are 0.108673, -0.41156, etc.
As we described above, the coefficients of the eigenvectors serve as the regression coefficients of the 9 principal components. For example, the first principal component can be expressed by

y1 = B1^T X

i.e.

y1 = 0.108673x1 − 0.41156x2 + ⋯
Thus for any set of scores (for the xj) you can calculate each of the corresponding
principal components. Keep in mind that you need to standardize the values of
the xj first since this is how the correlation matrix was obtained. For the first sample
(row 4 of Figure 1), we can calculate the nine principal components using the matrix
equation Y = B^T X′ as shown in Figure 6.
Here B (range AI61:AQ69) is the set of eigenvectors from Figure 5, X (range AS61:AS69)
is simply the transpose of row 4 from Figure 1, X′ (range AU61:AU69) standardizes the
scores in X (e.g. cell AU61 contains the formula =STANDARDIZE(AS61, B126, B127),
referring to Figure 2) and Y (range AW61:AW69) is calculated by the formula
=MMULT(TRANSPOSE(AI61:AQ69),AU61:AU69). Thus the principal components
values corresponding to the first sample are 0.782502 (PC1), -1.9758 (PC2), etc.
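Putting the pieces together, here is a numpy sketch of the same computation, Y = B^T X′, for the first observation of a simulated data set (it illustrates the steps, not the example's actual numbers).

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(120, 9))                         # simulated evaluation scores

R = np.corrcoef(X, rowvar=False)                      # correlation matrix
lam, B = np.linalg.eigh(R)
order = np.argsort(lam)[::-1]
lam, B = lam[order], B[:, order]                      # columns of B are unit eigenvectors

x = X[0]                                              # first sample
x_std = (x - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardize the scores first
Y = B.T @ x_std                                       # the nine principal component scores
print(Y)
```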
As observed previously, the total variance for the nine random variables is 9 (since the
variance was standardized to 1 in the correlation matrix), which is, as expected, equal to
the sum of the nine eigenvalues listed in Figure 5. In fact, in Figure 7 we list the
eigenvalues in decreasing order and show the percentage of the total variance accounted
for by that eigenvalue.
Using Excel’s charting capability, we can plot the values in column N of Figure 7 to obtain
a graphical representation, called a scree plot.
The highlighting in Figure 9 (loading factors whose absolute value is greater than .4) is done by highlighting the range R32:U40, selecting Home > Styles|Conditional Formatting, choosing Highlight Cell Rules > Greater Than and inserting the value .4, and then selecting Home > Styles|Conditional Formatting, choosing Highlight Cell Rules > Less Than and inserting the value -.4.
Note that Entertainment, Communications, Charisma and Passion are highly correlated
with PC1, Motivation and Caring are highly correlated with PC3 and Expertise is highly
correlated with PC4. Also Expectation is highly positively correlated with PC2 while
Friendly is negatively correlated with PC2.
Ideally we would like to see that each variable is highly correlated with only one principal component. As we can see from Figure 9, this is the case in our example. Usually this is not the case, however, and we will show what to do about this in Basic Concepts of Factor Analysis, when we discuss rotation.
In our analysis we retain 4 of the 9 principal factors. As noted previously, in the factor analysis model each of the original variables can be expressed in terms of the factors as

xi = βi0 + βi1y1 + ⋯ + βimym + εi

where the εi are the components which are not explained by the linear relationship. We further assume that the mean of each εi is 0 and that the factors yj are independent with mean 0 and variance 1. We can consider the above equations to be a series of regression equations.
The coefficient βij is called the loading of the ith variable on the jth factor. The coefficient εi is called the specific factor for the ith variable. Let β = [βij] be the k × m matrix of loading factors and let ε = [εi] be the k × 1 column vector of specific factors.
Define the communality of variable xi to be φi = βi1² + ⋯ + βim², and let ϕi = var(εi) (the specific variance) and σi² = var(xi).
Observation: Since μi = E[xi] = E[βi0 + βi1y1 + ⋯ + βimym + εi] = βi0 + 0 + 0 = βi0, it follows that the intercept term βi0 = μi, and so the regression equations can be expressed as

xi = μi + βi1y1 + ⋯ + βimym + εi

or equivalently

X = μ + βY + ε

It also follows that Σ = ββ^T + Φ, where Φ is the k × k diagonal matrix with ϕi in the ith position on the diagonal.
Observation: Let λ1 ≥ … ≥ λk be the eigenvalues of Σ with corresponding unit eigenvectors γ1, …, γk, where each eigenvector γj = [γij] is a k × 1 column vector. Now define the k × k matrix β = [βij] such that βij = γij √λj for all 1 ≤ i, j ≤ k.
As observed in Linear Algebra Background, all the eigenvalues of Σ are non-negative, and so the βij are well defined (see Property 8 of Positive Definite Matrices). By Theorem 1 of Linear Algebra Background (Spectral Decomposition Theorem), it follows that

Σ = λ1γ1γ1^T + ⋯ + λkγkγk^T = ββ^T

In practice, we estimate Σ by the sample covariance (or correlation) matrix S and obtain the corresponding decomposition

S = LL^T

where λ1 ≥ … ≥ λk are the eigenvalues of S (a slight abuse of notation since these are not the same as the eigenvalues of Σ) with corresponding unit eigenvectors C1, …, Ck and L = [bij] is the k × k matrix such that bij = cij √λj.
As we saw previously, when only m factors are retained the factor model implies that S ≈ LL^T + Φ, or equivalently, for the individual variances and covariances,

var(xi) ≈ bi1² + ⋯ + bim² + ϕi        cov(xi, xj) ≈ bi1bj1 + ⋯ + bimbjm (i ≠ j)

and so the communality φi = bi1² + ⋯ + bim² is the portion of the variance of xi explained by the common factors. Similarly, the specific variance ϕi is the portion that is not explained by the common factors.
Factor Extraction
A number of methods are available to determine the factor loadings used for factor
analysis. We will start by explaining the principal component method. Another commonly
used method, the principal axis method, is presented in Principal Axis Method of Factor
Extraction.
Using the concepts that are described in Basic Concepts of Factor Analysis, we show how to carry out factor analysis via the following example.
Example 1: Carry out the factor analysis for evaluating great teachers based on the data
in Example 1 of Principal Component Analysis.
As we saw in Example 1 of Principal Component Analysis, nine criteria are measured. Our
objective is to find a set of fewer than nine factors which reasonably captures what is a
great teacher. In fact we hope to find substantially fewer than nine factors that do the job.
Figure 1 shows the correlation matrix for this data (repeated from Figure 4 of Principal
Component Analysis).
Figure 2 shows the table of eigenvalues and eigenvectors for the correlation matrix (repeated from Figure 5 of Principal Component Analysis) using the supplemental function eVECTORS(B6:J14).
Using the formula bij = cij √λj, where C1, …, Ck are the eigenvectors (range B19:J27 in Figure 2) corresponding to the eigenvalues λ1 ≥ ⋯ ≥ λk (range B18:J18 in Figure 2), we calculate the loading factors for the nine common factors (see Figure 3).
Figure 3 – Loading factors (full model)
For example, the loading factor of the Passion variable on Factor 1 (cell B38) is given by the formula =B26*SQRT(B$18). Figure 3 also contains the communalities (range K31:K39). The communality of each variable represents the portion of that variable's variance captured by the model. For variable xi this is bi1² + ⋯ + bi9². E.g., the communality of the Passion variable (cell K38) is calculated via the formula =SUMSQ(B38:J38). Since we are using the full model (where all nine common factors are present) and the variance of each variable is 1 (remember we standardized the data), it is not surprising that column K contains all ones.
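The same two steps (loadings bij = cij √λj and communalities as row sums of squared loadings) look like this in a numpy sketch with simulated data; for the full nine-factor model the communalities come out to 1, just as in column K.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(120, 9))        # simulated evaluation scores
R = np.corrcoef(X, rowvar=False)

lam, C = np.linalg.eigh(R)
order = np.argsort(lam)[::-1]
lam, C = lam[order], C[:, order]     # eigenvalues and unit eigenvectors

L = C * np.sqrt(lam)                 # loading factors: b_ij = c_ij * sqrt(lambda_j)
communalities = (L ** 2).sum(axis=1) # row sums of squared loadings
print(np.round(communalities, 6))    # all 1's for the full nine-factor model
```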
In fact, if we had used the eigenvalues and eigenvectors as calculated in Figure 2, we would have seen communalities that are close to 1, but not exactly 1. To get the communalities to come out to exactly 1, we reran the eigenvector function as eVECTORS(B6:J14, 200), using 200 iterations (instead of the default of 100) to get a more accurate picture of the eigenvalues and eigenvectors.
Determining the Number of Factors
As mentioned previously, one of the main objectives of factor analysis is to reduce the number of parameters. The number of parameters in the original model is equal to the number of unique elements in the covariance matrix; given symmetry, there are k(k+1)/2 such elements. The factor analysis model requires k(m+1) elements, i.e. the number of parameters in L (namely km) plus the number of specific variances in the model X = μ + LY + ε (namely k).
Thus, we desire a value for m such that k(m+1) ≤ k(k+1)/2, i.e. m ≤ (k–1)/2. For Example
1 of Factor Extraction, we are looking for m ≤ (k–1)/2 = (9–1)/2 = 4. Our preference is to
use fewer than 4 factors if possible.
In general, the factors which have a high eigenvalue should be retained, while those with
a low eigenvalue should be eliminated, but what is high and what is low? The general
approach (Kaiser) is to retain factors with eigenvalue ≥ 1 and eliminate factors with
eigenvalue < 1. This may be appropriate for smaller models, but it may be too restrictive
for models with lots of variables.
Another approach is to create a scree plot (Cattell), i.e. a graph of the eigenvalues (y-axis) of all the factors (x-axis), where the factors are listed in decreasing order of their eigenvalues (as we did in principal component analysis). The heuristic is to retain all the factors above (i.e. to the left of) the inflection point (the point where the curve starts to level off) and eliminate any factor below (i.e. to the right of) it. Since the curve isn't necessarily smooth, there can be multiple inflection points, and so the actual cutoff point can be subjective.
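As a small illustration of the Kaiser criterion (a scree plot is simply these sorted eigenvalues plotted against the factor number), here is a sketch with arbitrary eigenvalues:

```python
import numpy as np

def retained_by_kaiser(eigenvalues):
    """Number of factors retained by the Kaiser criterion (eigenvalue >= 1)."""
    lam = np.sort(np.asarray(eigenvalues))[::-1]
    return int((lam >= 1).sum())

# Arbitrary eigenvalues for illustration; a scree plot is just lam plotted
# against the factor number 1, 2, ..., k
lam = [2.9, 1.6, 1.3, 1.1, 0.8, 0.5, 0.4, 0.3, 0.1]
print(retained_by_kaiser(lam))   # 4 factors have eigenvalue >= 1
```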
The scree plot for Example 1 of Factor Analysis Example is shown in Figure 1. The plot
seems to have two inflection points: one at eigenvalue 2 and the other at eigenvalue 5. For
our purposes we choose to keep the factors corresponding to eigenvalues to the left of
eigenvalue 5, i.e. the 4 largest eigenvalues. These four eigenvalues account for 72.3% of
the variance.
In addition we recalculate the communalities for each of the variables (in column F). We
can think of a communality as something like R2 from regression analysis. In fact, if we
perform regression analysis on the four factors, the value of R2 would be 6.60747, which
represents the total variance (out of 9) captured by the model (i.e. 72.3%). The
communalities for each of the variables range from 50.2% for Passion to 92.2% for
Expertise. Note that 72.3% of total variance is the same percentage that we saw in Figure
1, found by dividing the sum of the eigenvalues for the highest four factors by the total
variance.
In general we would like to see that the communalities for each variable are at least .5.
Variables with communalities less than .5 should be considered for removal and the
analysis rerun.
Since the variance of each variable is 1, the specific variance is simply 1 minus the communality, i.e. ϕi = 1 − φi, as summarized in Figure 3. The communalities are the portions of variance captured by the model and the specific variances are the variances of the error terms.
Figure 3 – Communalities and specific variances
As we did in Figure 9 of Principal Component Analysis, we highlight all the loading factors
whose absolute value is greater than .4 (see Figure 2). We see that Entertainment,
Communications, Motivation, Charisma and Passion are highly correlated with Factor 1,
Motivation and Caring are highly correlated with Factor 3 and Expertise is highly
correlated with Factor 4. Also Expectation is highly positively correlated with Factor 2
while Friendly is negatively correlated with Factor 2.
Ideally we would like to see that each variable is highly correlated with only one factor. As
we can see from Figure 2, this is the case in our example, except that Motivation is
correlated with both Factor 1 and 3. We will attempt to clarify the analysis by means of a
rotation, as in Rotation.
Rotation
Let U be any m × m orthogonal matrix, so that by definition U^TU = I.
Let L′ = LUT and Y′= UY. Then L′ is a (k × m) × (m × m) = k × m matrix and Y′ is a
(m × m) × (m × 1) = m × 1 column vector. Also
X = μ + LY+ ε = μ + LUTUY + ε = μ + L′Y′ + ε
E[Y′] = E[UY] = U E[Y] = U0 = 0
var(Y′) = var(UY) = U var(Y) UT = UIUT = UUT = I
cov(Y′, ε) = cov(UY, ε) = U cov(Y, ε) = U0 = 0
This shows that if L and Y satisfy the model, then so do L′ and Y′. Since there are an
infinite number of orthogonal matrices U, there are an infinite number of alternative
models.
A rotation of the original axes is determined by an orthogonal matrix U with det U = 1 (Property 6 of Orthogonal Vectors and Matrices). Thus, replacing L and Y by L′ and Y′ is equivalent to rotating the axes. This won't change the overall variance explained by the model (i.e. the communalities), but it will change how that variance is distributed among the factors.
We seek an m × m rotation matrix U = [uij] such that the rows represent the existing factors and the columns represent the new factors. The most popular rotation approach is called Varimax, which maximizes the differences between the loading factors while maintaining orthogonal axes. Varimax attempts to maximize the value of V where

V = Σj [(Σi bij⁴)/k − ((Σi bij²)/k)²]

i.e. the sum over the factors of the variances of the squared loadings.
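For readers who want to see the mechanics, the sketch below implements the classical raw (un-normalized) varimax algorithm in numpy using the usual SVD-based update; it is an illustration of the criterion above, not the Real Statistics VARIMAX implementation, and the 9 × 4 loading matrix is random.

```python
import numpy as np

def varimax(L, max_iter=100, tol=1e-6):
    """Rotate the k x m loading matrix L by an orthogonal matrix chosen to
    (approximately) maximize the raw varimax criterion; returns L @ U."""
    k, m = L.shape
    U = np.eye(m)
    d_old = 0.0
    for _ in range(max_iter):
        B = L @ U
        # SVD-based update for the varimax objective
        G = L.T @ (B ** 3 - B @ np.diag((B ** 2).sum(axis=0)) / k)
        u, s, vt = np.linalg.svd(G)
        U = u @ vt
        d = s.sum()
        if d_old != 0 and d / d_old < 1 + tol:
            break
        d_old = d
    return L @ U

rng = np.random.default_rng(3)
rotated = varimax(rng.normal(size=(9, 4)))   # arbitrary 9 x 4 loading matrix
print(np.round(rotated, 3))
```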
Regression Method
If we look at Definition 1 of Basic Concepts of Factor Analysis, we recall that the factor analysis model is based on the equations

X = μ + LY + ε
We find the values of the factors using the method of least squares employed in multiple regression (see Least Square Method of Multiple Regression). In particular, our goal is to find the value of Y which minimizes ||E|| based on the values in the sample for the explicit variables X.
The least squares solution (Property 1 of Least Square Method of Multiple Regression) is

Y = (L^TL)^-1 L^T (X − μ)
Note that since this regression doesn't have a constant term, we don't need to add a column of 1's to L as we did in Property 1 of Least Square Method of Multiple Regression.
Now L^TL = D where D is the diagonal matrix whose main diagonal consists of the eigenvalues λ1, ⋯, λm of S. Thus (L^TL)^-1 is the diagonal matrix whose main diagonal consists of 1/λ1, ⋯, 1/λm.
We define the factor score matrix to be the m × k matrix F = (L^TL)^-1 L^T = [fij] where

fij = cji / √λi

and where C1, …, Cm are the orthonormal eigenvectors corresponding to the eigenvalues λ1, …, λm.
Recall that L = [bij] is the k × m matrix such that bij = cij √λj. Since bij/λj = cij/√λj, it follows that the factor scores for the sample X satisfy

Y = F(X − μ)
For example, the factor score matrix and factor scores for the first sample (see Figure 1 or
6 of Principal Component Analysis) for Example 1 of Factor Extraction is shown in Figure
1.
Figure 1 – Factor score matrix using least squares method
Here the factor score matrix (range BV6:BY14) is calculated by the formula
=B19:E27/SQRT(B18:E18) (referring to cells in Figure 2 of Factor Extraction), the sample
scores X (range CA6:CA14) is as in Figure 1 or 6 of Principal Component Analysis, X′
(CC6:CC14) consists of the values in X less the means of each of the variables and is
calculated by the formula =CA6:CA14-TRANSPOSE(B128:J128) (referring to Figure 2
of Principal Component Analysis). Finally, the factor scores Y corresponding to the scores
in X (range CE6:CE9) is calculated by the formula
=MMULT(TRANSPOSE(BV6:BY14),CC6:CC14)
Actually since we reversed the sign of the loadings for factor 1, we need to reverse the sign
for the factor scores for factor 1 (i.e. column BV). This results in a change of sign for factor
1 (i.e. CE6). The result is shown in Figure 2.
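The regression-method computation Y = (L^TL)^-1 L^T (X − μ) can be sketched in numpy as follows; the data are simulated and m = 4 factors are retained, so the numbers are only illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(120, 9))
R = np.corrcoef(X, rowvar=False)

lam, C = np.linalg.eigh(R)
order = np.argsort(lam)[::-1]
lam, C = lam[order][:4], C[:, order][:, :4]    # retain m = 4 factors
L = C * np.sqrt(lam)                           # k x m loading matrix

# Regression (least squares) factor score matrix: F = (L^T L)^-1 L^T
F = np.linalg.inv(L.T @ L) @ L.T
print(np.allclose(F, (C / np.sqrt(lam)).T))    # entries are c_ij / sqrt(lambda_j)

x_centered = X[0] - X.mean(axis=0)             # first sample less the variable means
Y = F @ x_centered                             # factor scores for the first sample
print(Y)
```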
Bartlett's Method
In Bartlett's method we try to minimize the weighted sum of squared errors

(X − μ − LY)^T V^-1 (X − μ − LY)

where V is the diagonal matrix whose main diagonal consists of the specific variances. This produces factor scores satisfying

Y = (L^T V^-1 L)^-1 L^T V^-1 (X − μ)

For Bartlett's method we therefore define the factor score matrix to be the m × k matrix

F = (L^T V^-1 L)^-1 L^T V^-1
For Example 1 of Factor Extraction the factor score matrix and calculation for the first
sample using Bartlett’s method is shown in Figure 4.
Figure 4 – Factor scores using Bartlett’s method
Here L^TV^-1L (range CN18:CQ21) is calculated by the array formula
=MMULT(TRANSPOSE(B44:E52),MMULT(MINVERSE(DIAGONAL(Q44:Q52)),B44:E52))
and the factor score matrix is calculated by the array formula
=TRANSPOSE(MMULT(MINVERSE(CN18:CQ21),MMULT(TRANSPOSE(B44:E52),MINVERSE(DIAGONAL(Q44:Q52)))))
The rest of the figure is calculated as in Figure 2. In a similar fashion we can calculate the
factor scores for the entire sample (see Figure 2 of Principal Component Analysis). The
result for the first 10 sample items is shown in Figure 5 (note that we are now showing
the X as row vectors instead of column vectors as was employed in Figure 4).
Figure 5 – Factor scores using Bartlett’s method
Here the factor scores for the entire sample are given in range CZ19:DC38 and are calculated by the formula =MMULT(B4:J123-B126:J126,CN26:CQ34), referring to cells in Figure 1 of Principal Component Analysis and Figure 4.
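A corresponding sketch of Bartlett's weighted least squares factor scores, using simulated data and the specific variances 1 − communality on the diagonal of V:

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(120, 9))
R = np.corrcoef(X, rowvar=False)

lam, C = np.linalg.eigh(R)
order = np.argsort(lam)[::-1]
lam, C = lam[order][:4], C[:, order][:, :4]   # retain m = 4 factors
L = C * np.sqrt(lam)                          # k x m loading matrix

V = np.diag(1 - (L ** 2).sum(axis=1))         # specific variances on the diagonal
Vinv = np.linalg.inv(V)

# Bartlett factor score matrix: (L^T V^-1 L)^-1 L^T V^-1
F_bartlett = np.linalg.inv(L.T @ Vinv @ L) @ L.T @ Vinv

x_centered = X[0] - X.mean(axis=0)            # first sample less the variable means
print(F_bartlett @ x_centered)                # factor scores for the first sample
```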
Anderson-Rubin’s Method
In this method the factor scores are not correlated with one another. The method produces factor scores satisfying

Y = (L^T V^-1 R V^-1 L)^-1/2 L^T V^-1 (X − μ)
To calculate the factor score matrix for Example 1 of Factor Extraction using Anderson-Rubin's method, we first find the matrices shown in Figure 6. In particular, the inverse square root of the matrix in range DF17:DI20 is calculated by the array formula
=MINVERSE(MSQRT(DF17:DI20))
The factor score matrix and the calculation for the first sample using Anderson-Rubin's method are shown in Figure 7. Here the factor score matrix is calculated by the array formula
=TRANSPOSE(MMULT(DR12:DU15,DR5:DZ8))
The factor scores for the first 10 sample items are shown in Figure 8 (note that as before we are now showing the X as row vectors instead of column vectors as was employed in Figure 7).
Figure 8 – Factor scores using Anderson-Rubin’s method
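An analogous sketch for Anderson-Rubin scores: the inverse matrix square root is computed from an eigendecomposition (playing the role of MINVERSE(MSQRT(...))), the data are simulated, and the scores are computed from standardized values so that their sample correlations come out as the identity matrix.

```python
import numpy as np

def inv_sqrt(M):
    """Inverse square root of a symmetric positive definite matrix."""
    vals, vecs = np.linalg.eigh(M)
    return vecs @ np.diag(1 / np.sqrt(vals)) @ vecs.T

rng = np.random.default_rng(6)
X = rng.normal(size=(120, 9))
Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)    # standardized scores
R = np.corrcoef(X, rowvar=False)

lam, C = np.linalg.eigh(R)
order = np.argsort(lam)[::-1]
lam, C = lam[order][:4], C[:, order][:, :4]
L = C * np.sqrt(lam)                                 # k x m loadings
Vinv = np.linalg.inv(np.diag(1 - (L ** 2).sum(axis=1)))

# Anderson-Rubin factor score matrix: (L^T V^-1 R V^-1 L)^-1/2 L^T V^-1
F_ar = inv_sqrt(L.T @ Vinv @ R @ Vinv @ L) @ L.T @ Vinv

Y = Xs @ F_ar.T                                      # scores for all 120 observations
print(np.round(np.corrcoef(Y, rowvar=False), 3))     # identity: scores are uncorrelated
```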
The factor scores (using any of the methods described above) can now be used as the data
for subsequent analyses. In some sense they provide similar information as that given in
the original sample (Figure 1 of Principal Component Analysis), but with a reduced
number of variables (as was our original intention).
Note that exploratory factor analysis does not require that the data be multivariate
normally distributed, but many of the analyses that will be done using the reduced factors
(and factor scores) will require multivariate normality.
Validity of Correlation Matrix and Sample Size
Factor analysis doesn’t make sense when there is either too much or too little correlation
between the variables. When reducing the number of dimensions we are leveraging the
inter-correlations. E.g. if we believe that three variables are correlated to some hidden
factor, then these three variables will be correlated to each other. You can test the
significance of the correlations, but with such a large sample size, even small correlations
will be significant, and so a rule of thumb is to consider eliminating any variable which
has many correlations less than 0.3.
We can calculate the Reproduced Correlation Matrix LL^T, which is the correlation matrix predicted by the reduced loading factor matrix.
Our expectation is that cov(ei, ej) ≈ cov(εi, εj) = 0 for all i ≠ j. If too many of these
covariances are large (say > .05) then this would be an indication that our model is not as
good as we would like.
The error matrix, i.e. R – LLT, for Example 1 of Factor Extraction is calculated by the array
formula,
=B6:J14-MMULT(B44:E52,TRANSPOSE(B44:E52))
Note too that if overall the variables don’t correlate, signifying that the variables are
independent of one another (and so there aren’t related clusters which will correlate with
a hidden factor), then the correlation matrix would be approximately an identity matrix.
We can test whether a population correlation matrix is approximately an identity matrix (Bartlett's test) using Box's test.
For Example 1 of Factor Extraction, we get the results shown in Figure 3.
We first fill in the range L5:M6. Here cell L5 points to the upper left corner of the correlation matrix (i.e. cell B6 of Figure 1 of Factor Extraction) and cell L6 points to a 9 × 9 identity matrix. The value 120 in cells M5 and M6 refers to the sample size. We next highlight the 5 × 1 range M8:M12, enter the array formula =BOX(L5:M6) and then press Ctrl-Shift-Enter.
Since p-value < α = .001, we conclude there is a significant difference between the
correlation matrix and the identity matrix.
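For reference, Bartlett's test of sphericity can also be computed directly from the correlation matrix using the standard chi-square approximation, as in the sketch below (simulated data; the BOX supplemental function is the route used in the text).

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(R, n):
    """Bartlett's test that the population correlation matrix is an identity matrix."""
    k = R.shape[0]
    stat = -(n - 1 - (2 * k + 5) / 6) * np.log(np.linalg.det(R))
    df = k * (k - 1) / 2
    return stat, df, chi2.sf(stat, df)      # chi-square statistic, df, p-value

rng = np.random.default_rng(7)
X = rng.normal(size=(120, 9))               # simulated scores
stat, df, p = bartlett_sphericity(np.corrcoef(X, rowvar=False), n=120)
print(stat, df, p)
```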
Of course, even if Bartlett’s test shows that the correlation matrix isn’t approximately an
identity matrix, especially with a large number of variables and a large sample, it is
possible for there to be some variables that don’t correlate very well with other variables.
We can use the Partial Correlation Matrix and the Kaiser-Meyer-Olkin (KMO) measure of sample adequacy (MSA) for this purpose, described as follows.
It is not desirable to have two variables which share variance with each other but not with
other variables. As described in Multiple Correlation this can be measured by the partial
correlation between these two variables. To calculate the partial correlation matrix for
Example 1 of Factor Extraction, first we find the inverse of the correlation matrix, as
shown in Figure 4.
Range B6:J14 is a copy of the correlation matrix from Figure 1 of Factor Extraction (onto
a different worksheet). Range B20:J28 is the inverse, as calculated by
=MINVERSE(B6:J14). We have also shown the square root of the diagonal of this matrix
in range L20:L28 as calculated by =SQRT(DIAG(B20:J28)), using the DIAG
supplemental array function. The partial correlation matrix is now shown in range
B33:J41 of Figure 5.
Figure 5 – Partial correlation matrix
The partial correlation between variables xi and xj where i ≠ j, holding all the other variables constant, is given by the formula

rij·Z = −pij / √(pii pjj)

where Z = the list of variables x1, …, xk excluding xi and xj, and the inverse of the correlation matrix is R^-1 = [pij]. Thus the partial correlation matrix shown in Figure 5 can be calculated using the array formula
=-B20:J28/MMULT(L20:L28,TRANSPOSE(L20:L28))
Since this formula results in a matrix whose main diagonal consists of minus ones, we use
the slightly modified form to keep the main diagonal all ones:
=-B20:J28/MMULT(L20:L28,TRANSPOSE(L20:L28))+2*IDENTITY()
The KMO measure of sample adequacy for the jth variable is given by

KMOj = Σi≠j rij² / (Σi≠j rij² + Σi≠j uij²)

where the correlation matrix is R = [rij] and the partial correlation matrix is U = [uij]. The overall KMO measure of sample adequacy is given by the same formula, with the sums taken over all combinations of i and j with i ≠ j.
KMO takes values between 0 and 1. A value near 0 indicates that the sum of the partial correlations is large compared to the sum of the correlations, meaning that the correlations are widespread and not clustering among a few variables, which indicates a problem for factor analysis. On the contrary, a value near 1 indicates a good fit for factor analysis.
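Both the partial correlation matrix and the overall KMO measure can be sketched in a few lines of numpy, following the formulas above (simulated data, so the resulting value is not the example's):

```python
import numpy as np

def kmo(R):
    """Overall KMO measure of sample adequacy from a correlation matrix R."""
    Rinv = np.linalg.inv(R)
    d = np.sqrt(np.diag(Rinv))
    U = -Rinv / np.outer(d, d)          # partial correlations: -p_ij / sqrt(p_ii p_jj)
    np.fill_diagonal(U, 1.0)
    off = ~np.eye(R.shape[0], dtype=bool)
    r2 = (R[off] ** 2).sum()            # sum of squared correlations, i != j
    u2 = (U[off] ** 2).sum()            # sum of squared partial correlations, i != j
    return r2 / (r2 + u2)

rng = np.random.default_rng(8)
X = rng.normal(size=(120, 9))
print(kmo(np.corrcoef(X, rowvar=False)))
```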
In addition to the KMO measures of sample adequacy, various guidelines have been
proposed to determine how big a sample is required to perform exploratory factor
analysis. Some have proposed that the sample size should be at least 10 times the number
of variables and some even recommend 20 times. For Example 1 of Factor Extraction, a
sample size of 120 observations for 9 variables yields a 13:1 ratio. A better indicator of
sample size is summarized in the following table:
In the principal axis factoring method, we make an initial estimate of the common variance in which the communalities are less than 1. This initial estimate assumes that the communality of each variable is equal to the squared multiple correlation coefficient of that variable with respect to the other variables. The principal axis factoring method is implemented by replacing the main diagonal of the correlation matrix (which consists of all ones) by these initial estimates of the communalities. The principal component method is now applied to this revised version of the correlation matrix, as described above. Here M4:U12 is the correlation matrix (see Figure 3 of Factor Analysis Example). The process is then iterated: for each iteration p we show how to compute the revised communalities Cp+1 in the next example.
Example 1: Repeat the factor analysis on the data in Example 1 of Factor
Extraction using the principal axis factoring method.
We first calculate the correlation matrix and then the initial communalities as described above.
We next substitute the initial communalities in the main diagonal of the correlation
matrix and calculate the factor matrix as we did in the principal component method of
extraction. This is shown in Figure 2.
Figure 2 – Iteration #1
The revised correlation matrix R1 in range Y6:AG14 is equal to the original correlation matrix with the entries in the main diagonal replaced by the communalities calculated in the previous step (i.e. C0 in this case). We can calculate this revised correlation matrix via the array formula
=M4:U12-IDENTITY()+DIAGONAL(V33:V41)
where M4:U12 is the original correlation matrix R0 (Figure 3 of Factor Analysis Example) and V33:V41 are the communalities C0 (from Figure 1).
The eigenvalues and eigenvectors in range Y18:AG28 are calculated by =eVECTORS(Y6:AG14). The Factor Matrix in range Y33:AG41 is calculated as in Principal Component extraction, except where the corresponding eigenvalues are not positive. While this is not possible for Principal Component extraction, it is possible for Principal Axis extraction. When an eigenvalue is non-positive (as is the case with the final 5 eigenvalues in Figure 2) the corresponding loading factors are set to zero. For example, the formula for calculating the first entry in the Factor Matrix (cell Y33) is
=IF(Y$19>0,Y20*SQRT(Y$19),0)
The revised communalities C1 are again the sums of the squared loading factors in each row of the Factor Matrix. Convergence is measured by the sum of the squared differences between the new and previous communalities, calculated via the formula
=SUMXMY2(AH33:AH41,V33:V41)
(referring to Figures 1 and 2).
It turns out that after 19 iterations the convergence goal of .00001 is reached, with a difference between the communalities C18 and C19 of 8.81E-06. The values of the communalities after the 19th iteration are given in range IP33:IP41 of Figure 3.
Figure 3 – Iteration #19
The Real Statistics Resource Pack provides the ExtractCommunalities supplemental array function, which automates the process of finding the converged values of the communalities, thus avoiding the tedious calculations described above.
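The iteration itself is straightforward to sketch in numpy: replace the diagonal of R with the current communalities, extract m factors, recompute the communalities, and repeat until the sum of squared changes falls below the convergence goal. This is our own illustration (with simulated data and the squared-multiple-correlation starting values described above), not the ExtractCommunalities implementation.

```python
import numpy as np

def principal_axis(R, m, max_iter=200, tol=1e-5):
    """Iterated principal axis factoring: replace the diagonal of R with the
    current communality estimates and repeat until they converge."""
    # Initial communalities: squared multiple correlations, 1 - 1/diag(R^-1)
    comm = 1 - 1 / np.diag(np.linalg.inv(R))
    for _ in range(max_iter):
        Rr = R.copy()
        np.fill_diagonal(Rr, comm)                 # revised correlation matrix
        lam, C = np.linalg.eigh(Rr)
        order = np.argsort(lam)[::-1]
        lam, C = lam[order][:m], C[:, order][:, :m]
        lam = np.clip(lam, 0, None)                # zero out non-positive eigenvalues
        L = C * np.sqrt(lam)                       # loading factors
        new_comm = (L ** 2).sum(axis=1)            # row sums of squared loadings
        converged = np.sum((new_comm - comm) ** 2) < tol   # cf. SUMXMY2
        comm = new_comm
        if converged:
            break
    return L, comm

rng = np.random.default_rng(9)
X = rng.normal(size=(120, 9))
L, comm = principal_axis(np.corrcoef(X, rowvar=False), m=4)
print(np.round(comm, 4))
```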
If you click on the Help button the following dialog box will appear.
In order to display the rotated factor matrix shown in range B114:E122, the VARIMAX
supplemental array function is used. This function is provided in the Real Statistics
Resource Pack.
VARIMAX(R1, iter, prec) = the result of rotating the loading factor matrix defined by range R1 using the Varimax algorithm, where iter is the maximum number of iterations (default 100) and prec is the value that is considered to be sufficiently close to zero (default 0.00001).
In Figure 7, range B114:E122 contains the formula =VARIMAX(M100:P108).
Figure 8 – Factor Analysis PCA Extraction – part 6
Figure 9 – Factor Analysis PCA Extraction – part 7
Figure 10 – Factor Analysis PCA Extraction – part 8
As described in Principal Axis Extraction, the Real Statistics software next calculates the
initial communalities and revised communalities (using the ExtractCommunalities
supplemental function) as described in Figure 11.
Figure 11 – Factor Analysis PAF Extraction – part 3
From this point on, the data analysis tool calculates its results exactly as in Principal Component extraction, except that the revised correlation matrix (range M96:U104 in Figure 11) is used as the correlation matrix.
Figure 12 – Factor Analysis PAF Extraction – part 4