Hotelling 1936


RELATIONS BETWEEN TWO SETS OF VARIATES*

BY HAROLD HOTELLING, Columbia University.


CONTENTS.
SECT. PAGE
1. The Correlation of Vectors. The Most Predictable Criterion and the Tetrad Difference 321
2. Theorems on Determinants and Matrices 325

Downloaded from http://biomet.oxfordjournals.org/ at University of Oklahoma on January 17, 2015


3. Canonical Variates and Canonical Correlations. Applications to Algebra and Geometry 326
4. Vector Correlation and Alienation Coefficients 332
5. Standard Errors 336
6. Examples, and an Iterative Method of Solution 342
7. The Vector Correlation as a Product of Correlations or of Cosines 349
8. An Exact Sampling Distribution of q 352
9. Moments of q. The Distribution for Large Samples 354
10. The Distribution for Small Samples. Form of the Frequency Curve 359
11. Tests for Complete Independence 362
12. Alternants of a Plane and of a Sample 365
13. The Bivariate Distribution for Complete Independence (s = t = 2, n = 4) 369
14. Theorem on Circularly Distributed Variates 371
15. Generalization of Section 13 for Samples of Any Size 372
16. Further Problems 375

1. The Correlation of Vectors. The Most Predictable Criterion and the Tetrad
Difference. Concepts of correlation and regression may be applied not only to
ordinary one-dimensional variates but also to variates of two or more dimensions.
Marksmen side by side firing simultaneous shots at targets, so that the deviations
are in part due to independent individual errors and in part to common causes
such as wind, provide a familiar introduction to the theory of correlation; but only
the correlation of the horizontal components is ordinarily discussed, whereas the
complex consisting of horizontal and vertical deviations may be even more interest-
ing. The wind at two places may be compared, using both components of the
velocity in each place. A fluctuating vector is thus matched at each moment with
another fluctuating vector. The study of individual differences in mental and
physical traits calls for a detailed study of the relations between sets of correlated
variates. For example the scores on a number of mental tests may be compared
with physical measurements on the same persons. The questions then arise of
determining the number and nature of the independent relations of mind and body
shown by these data to exist, and of extracting from the multiplicity of correlations
in the system suitable characterizations of these independent relations. As another
* Presented before the American Mathematical Society and the Institute of Mathematical Statistics at Ann Arbor, September 12, 1935.
Biometrika XXVIII
example, the inheritance of intelligence in rats might be studied by applying not
one but n different mental tests to N mothers and to a daughter of each. Then
n² correlation coefficients could be determined, taking each of the mother-
daughter pairs as one of the N cases. From these it would be possible to obtain
a clearer knowledge as to just what components of mental ability are inherited
than could be obtained from any single test.
Much attention has been given to the effects of the crops of various agricultural
commodities on their respective prices, with a view to obtaining demand curves.
The standard errors associated with such attempts, when calculated, have usually
been found quite excessive. One reason for this unfortunate outcome has been the
large portion of the variance of each commodity price attributable to crops of other
commodities. Thus the consumption of wheat may be related as much to the prices
of potatoes, rye, and barley as to that of wheat. The like is true of supply functions.
It therefore seems appropriate that studies of demand and supply should be made
by groups rather than by single commodities*. This is all the more important in
view of the discovery that demand and supply curves provide altogether inadequate
foundation for the discussion of related commodities, the ignoring of the effects of
which has led to the acceptance as part of classical theory of results which are wrong
not only quantitatively but qualitatively. It is logically as well as empirically
necessary to replace the classical one-commodity type of analysis, relating for
example to the incidence of taxation, utility, and consumers' surplus, by a simul-
taneous treatment of a multiplicity of commodities f.
The relations between two sets of variates with which we shall be concerned
are those that remain invariant under internal linear transformations of each set
separately. Such invariants are not affected by rotations of axes in the study of
wind or of hits on a target, or by replacing mental test scores by an equal number
of independently weighted sums of them for comparison with physical measurements.
If measurements such as height to shoulder and difference in height of shoulder
and top of head are replaced by shoulder height and stature, the invariant relations
with other sets of variates will not be affected. In economics there are important
linear transformations corresponding for example to the mixing of different grades
of wheat in varying proportions‡. Both prices and quantities are then transformed
linearly.
In this case, besides the invariants to be discussed in this paper, there will be
others resulting from the fact that the transformation of quantities is not independent
of that of the prices, but is contragredient to it. (Cf. Section 16 below.)
* The only published study known to the writer of groups of commodities for which standard errors were calculated is the paper of Henry Schultz, "Interrelations of Demand," in Journal of Political Economy, Vol. XLI, pp. 468–512, August, 1933. Some at least of the coefficients obtained are significant.
† Harold Hotelling, "Edgeworth's Taxation Paradox and the Nature of Demand and Supply Functions" in Journal of Political Economy, Vol. XL, pp. 577–616, October, 1932, and "Demand Functions with Limited Budgets" in Econometrica, Vol. III, pp. 66–78, January, 1935.
‡ Harold Hotelling, "Spaces of Statistics and their Metrization" in Science, Vol. LXVII, pp. 149–150, February 10, 1928.

Sets of variates, which may also be regarded as many-dimensional variates, or


as vectors possessed of frequency distributions, have been investigated from several
different standpoints. The work of Gauss on least squares and that of Bravais, Galton,
Pearson, Yule and others on multivariate distributions and multiple correlation are
early examples. In 'The Generalization of Student's Ratio*," the writer has given
a method of testing in a manner invariant under linear transformations, and with
full statistical efficiency, the significance of sets of means, of regression coefficients,
and of differences of means or regression coefficients. A procedure generalizing the
analysis of variance to vectors has been applied to the study of the internal structure
of cells by means of Brownian movements, for which the vectors representing
displacements in consecutive fifteen-second intervals were compared with their
resultants to demonstrate the presence of invisible obstructions restricting the
movement†. Finally, S. S. Wilks has published important work on relations among two or more sets of variates which are invariant under internal linear transformations‡. Denoting by A, B and D respectively the determinants of sample correlations within a set of s variates, within a set of t variates, and in the set consisting of all these s + t variates, the distribution of the statistic

z = D/(AB)  (1.1),

was determined exactly by Wilks under the hypothesis that the distribution is
normal, with no population correlation between any variate in one set and any in
the other. Wilks also found distributions of analogous functions of three or more
sets, and of other related statistics.
The statistic (1.1) is invariant under internal linear transformations of either
set, as will be proved in Section 4. Another example of such a statistic is provided
by the maximum multiple correlation with either set of a linear function of the
other set, which has been the subject of a brief study§. This problem of finding,
not only a best predictor among the linear functions of one set, but at the same
time the function of the other set which it predicts most accurately, will be solved
in Section 3 in a more symmetrical manner. When the influence of these two
linear functions is eliminated by partial correlation, the process may be repeated
with the residuals. In this way we may obtain a sequence of pairs of variates, and
of correlations between them, which in the aggregate will fully characterize the
invariant relations between the sets, in so far as these can be represented by
correlation coefficients. They will be called canonical variates and canonical
correlations. Every invariant under general linear internal transformations, such
for example as z, will be seen to be a function of the canonical correlations.
* Annals of Mathematical Statistics, Vol. II, pp. 360–378, August, 1931.
† L. G. M. Baas-Becking, Henrietta van de Sande Bakhuyzen, and Harold Hotelling, "The Physical State of Protoplasm" in Verhandelingen der Koninklijke Akademie van Wetenschappen te Amsterdam, Second Section, Vol. V (1928).
‡ "Certain Generalizations in the Analysis of Variance" in Biometrika, Vol. XXIV, pp. 471–494, November, 1932.
§ Harold Hotelling, "The Most Predictable Criterion" in Journal of Educational Psychology, Vol. XXVI, pp. 139–142, February, 1935.
Observations of the values taken in N cases by the components of two vectors
constitute two matrices, each of N columns. If each vector has s components, then
each matrix has s rows. In this case we may consider the correlation coefficient between the s-rowed determinants in one matrix and the corresponding determinants in the other. Since a linear transformation of the variates in either set
effects a linear transformation of the rows of the matrix of observations, which
merely multiplies all these determinants by the same constant, it is evident that
the correlation coefficient thus calculated is invariant in absolute value. We shall
call it the vector correlation or vector correlation coefficient, and denote it by q.
When s = 2, if we call the variates of one set x₁, x₂, and those of the other x₃, x₄, and r_ij the correlation of x_i with x_j, then it is easy to deduce with the help of the
theorems stated in Section 2 below that
q = (r₁₃r₂₄ − r₁₄r₂₃) / √{(1 − r₁₂²)(1 − r₃₄²)}  (1.2).
For larger values of s, q will have as its numerator the determinant of correlations of
variates in one set with variates in the other, and as its denominator the geometric
mean of the two determinants of internal correlations. A generalization of q for
sets with unequal numbers of components will be given in Section 4.
Corresponding to the correlation coefficient r between two simple variates,
T. L. Kelley has defined the alienation coefficient as √(1 − r²). The square of the correlation coefficient between x and y is the fraction of the variance of y attributable to x, while the square of the alienation coefficient is the fraction independent of x. If we adopt this apportionment of variance as a basis of generalization, we shall be consistent in calling z the vector alienation coefficient.
If there exists a linear function of one set equal to a linear function of the other—if for example x₁ is identically equal to x₃—the expression (1.2) for q reduces to a partial correlation coefficient. If one set consists of a single variate and the
other of two or more, the vector correlation coincides with the multiple correlation.
If each set contains only one variate, q is the simple correlation between the two.
Thus the vector correlation coefficient provides a generalization of several familiar
concepts.
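The determinant characterization just described can be checked numerically. The following sketch (Python with numpy; the simulated four-variate sample and all numerical values are illustrative assumptions, not data from the paper) computes q for s = 2 both from formula (1.2) and from the ratio of determinants described in the text:

```python
import numpy as np

# Sketch: numerical check of the vector correlation q for s = t = 2.
# Simulated data; the covariance structure below is an arbitrary choice.
rng = np.random.default_rng(0)
N = 100000
L = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.3, 1.0, 0.0, 0.0],
              [0.5, 0.2, 1.0, 0.0],
              [0.1, 0.4, 0.2, 1.0]])
X = rng.standard_normal((N, 4)) @ L.T      # correlated 4-variate sample
r = np.corrcoef(X, rowvar=False)           # 4x4 correlation matrix

# formula (1.2): the numerator is the tetrad difference
q = (r[0, 2] * r[1, 3] - r[0, 3] * r[1, 2]) / np.sqrt(
    (1 - r[0, 1] ** 2) * (1 - r[2, 3] ** 2))

# equivalent form: determinant of between-set correlations over the
# geometric mean of the two determinants of internal correlations
num = np.linalg.det(r[:2, 2:])
den = np.sqrt(np.linalg.det(r[:2, :2]) * np.linalg.det(r[2:, 2:]))
assert abs(q - num / den) < 1e-12
```

The agreement of the two expressions is an algebraic identity for s = 2, not a property of the particular sample.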
The numerator of (1.2), known as the tetrad difference or tetrad, has been of much concern to psychologists. The vanishing in the population of all the tetrads among a set of tests is a necessary condition for the theory, propounded by Spearman, that scores on the tests are made up of a component common in varying degrees to all of them, and of independent components specific to each. The vanishing of some but not all of the tetrads is a condition for certain variants of the situation*. The
sampling errors of the tetrad have therefore received much attention. In dealing
with them it has been thought necessary to ignore at least three types of error:
(1) The standard error formulae used are only asymptotically valid for very
large samples, with no means of determining how large a sample is necessary.
* Truman L. Kelley, Crossroads in the Mind of Man, Stanford University Press, 1928. This book, in addition to relevant test data and discussion, contains references to the extensive literature, a standard error formula for the tetrad, and other mathematical developments.

(2) The assumption is made implicitly that the distribution of the tetrad is normal, though this cannot possibly be the case, since the range is finite*.
(3) Since the standard error formulae involve unknown population values, these
are in practice replaced by sample values. No limit is known for the errors com-
mitted in this way.
Now it is evident that to test whether the population value of the tetrad is
zero—the only value of interest—is the same thing as to test the vanishing of any multiple of the tetrad by a finite non-vanishing quantity. Wishart† considered the
tetrad of covariances, which is simply the product of the tetrad of correlations by
the four standard deviations. For this function he found exact values of the mean
and standard error, thus eliminating the first source of error mentioned above.
The exact distribution of q found in Section 8 below may be used to test the
vanishing of the tetrad, eliminating the first and second sources of error. Un-
fortunately even this distribution involves a parameter of the population, one of the
canonical correlations, which must usually be estimated from the sample, introducing
again an error of the third type. However there may be cases in which this one
parameter will be known from theory or from a larger sample.
Now it will be shown that q is the product of the canonical correlations. Hence
at least one of these correlations is zero if the tetrad is. Thus still another test of
the same hypothesis may be made in this way. Now we shall obtain for a canonical correlation vanishing in the population the extremely simple standard error formula 1/√n, involving no unknown parameter. Thus this test evades errors of the third kind,
but is subject to those of the first two, although the second is somewhat mitigated
by an ultimate approach to normality, since the canonical correlations satisfy the
criterion for approach to normality derived by Doob in the article cited. Further
research is needed to find an exact test involving no unknown parameter. The
question of whether this is possible raises a very fundamental problem in the
theory of statistical inference. We shall, however, find exact distributions appro-
priate for testing a variety of hypotheses.
2. Theorems on Determinants and Matrices. We shall have frequent occasion
to refer to the following well-known theorem, the proofs of which parallel those of
the multiplication theorem for determinants, and which might advantageously be
used in expounding many parts of the theory of statistics:
Given two arrays, each composed of m rows and n columns (m ≤ n). The deter-
minant formed by multiplying the rows of one array by those of the other equals the
* The first proof that the distribution of the tetrad approaches normality for large samples was given by J. L. Doob in an article, "The Limiting Distributions of Certain Statistics," in the Annals of Mathematical Statistics, Vol. VI, pp. 160–169 (September, 1935). The proof is applicable only if the population value of z is different from unity, i.e. if the sets x₁, x₂ and x₃, x₄ are not completely independent. If they are completely independent, the limiting distribution is of the form ½ce^(−c|t|)dt, as Doob showed. What the distribution of the tetrad is for any finite number of cases no one knows.
† "Sampling Errors in the Theory of Two Factors" in British Journal of Psychology, Vol. XIX, pp. 180–187 (1928).
sum of the products of the m-rowed determinants in the first array by the corresponding
m-rowed determinants in the second.
When the two arrays are identical, we have the corollary that the symmetrical
determinant of the products of rows by rows of an array of m rows and n columns (m ≤ n) equals the sum of the squares of the m-rowed determinants in the array, and
is therefore not negative.
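In modern terminology this multiplication theorem is the Cauchy–Binet formula. A minimal numerical check (Python with numpy; the 2 × 3 arrays are arbitrary illustrative choices, not taken from the paper):

```python
import numpy as np
from itertools import combinations

# Sketch: verify the multiplication theorem (Cauchy-Binet) for two
# arrays of m = 2 rows and n = 3 columns.
P = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
Q = np.array([[7.0, 8.0, 9.0],
              [1.0, 0.0, 2.0]])

# determinant formed by multiplying rows of one array by rows of the other
lhs = np.linalg.det(P @ Q.T)
# sum of products of corresponding 2-rowed minors
rhs = sum(np.linalg.det(P[:, cols]) * np.linalg.det(Q[:, cols])
          for cols in combinations(range(3), 2))
assert abs(lhs - rhs) < 1e-9

# corollary: with Q = P the determinant is a sum of squares, hence >= 0
assert np.linalg.det(P @ P.T) >= 0
```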
3. Canonical Variates and Canonical Correlations. Applications to Algebra and
Geometry. If x₁, x₂, … are variates having zero expectations and finite covariances, we denote these covariances by

σ_αβ = E x_α x_β,
where E stands for the mathematical expectation of the quantity following. If new variates x₁′, x₂′, … are introduced as linear functions of the old, such that

x_κ′ = Σ_λ c_κλ x_λ,

then the covariances of the new variates are expressed in terms of those of the old by the equations

σ_κλ′ = Σ_μ Σ_ν c_κμ c_λν σ_μν  (3.1),

obtained by substituting the equations above directly in the definition

σ_κλ′ = E x_κ′ x_λ′,

and taking the expectation term by term.


Now (3.1) gives also the formula for the transformation of the coefficients of a quadratic form ΣΣ σ_αβ x_α x_β when the variables are subjected to a linear transformation. Hence the problem of standardizing the covariances among a set of variates by linear transformations is algebraically equivalent to the canonical reduction of a quadratic form. The transformation of a quadratic form into a sum of squares corresponds to replacing a set of variates by uncorrelated components. It is to be observed that the fundamental nature of covariances implies that ΣΣ σ_αβ x_α x_β is a positive definite quadratic form, and that only real transformations are relevant to statistical problems.
Considering two sets of variates x₁, …, x_s and x_{s+1}, …, x_{s+t}, we shall denote the covariances, in the sense of expectations of products, by σ_αβ, σ_αi and σ_ij, using Greek subscripts for the indices 1, 2, …, s and Latin subscripts for s + 1, …, s + t. Determination of invariant relations between the two sets by means of the correlations or covariances among the s + t variates is associated with the algebraic problem, which appears to be new, of determining the invariants of the system consisting of two positive definite quadratic forms

Σ_α Σ_β σ_αβ x_α x_β,  Σ_i Σ_j σ_ij x_i x_j,

in two separate sets of variables, and of a bilinear form

Σ_α Σ_i σ_αi x_α x_i,

in both sets, under real linear non-singular transformations of the two sets separately.
Sample covariances are also transformed by the formula (3.1). The ensuing
analysis might therefore equally well be carried out for a sample instead of for
the population. Correlations might be used instead of covariances, either for the
sample or for the population, by introducing appropriate factors, or by assuming
the standard deviations to be unity.
We shall assume that there is no fixed linear relation among the variates, so
that the determinant of their covariances or correlations is not zero. This implies
that there is no fixed linear relation among any subset of them; consequently
every principal minor of the determinant of s + t rows is different from zero.
If we consider a function u of the variates in the first set and a function v of those in the second, such that

u = Σ_α a_α x_α,  v = Σ_i b_i x_i,

the conditions

E u² = Σ_α Σ_β σ_αβ a_α a_β = 1,  E v² = Σ_i Σ_j σ_ij b_i b_j = 1  (3.2)

are equivalent to requiring the standard deviations of u and v to be unity. The correlation of u with v is then

E uv = Σ_α Σ_i σ_αi a_α b_i  (3.3).

If u and v are chosen so that this correlation is a maximum, the coefficients a_α and b_i will satisfy the equations obtained by differentiating

Σ_α Σ_i σ_αi a_α b_i − ½λ (Σ_α Σ_β σ_αβ a_α a_β − 1) − ½μ (Σ_i Σ_j σ_ij b_i b_j − 1),

namely

Σ_i σ_αi b_i − λ Σ_β σ_αβ a_β = 0  (3.4),

Σ_α σ_αi a_α − μ Σ_j σ_ij b_j = 0  (3.5).

Here λ and μ are Lagrange multipliers. Their interpretation will be evident upon multiplying (3.4) by a_α and summing with respect to α, then multiplying (3.5) by b_i and summing with respect to i. With (3.2) and (3.3), this process gives

λ = μ = R.
The s + t homogeneous linear equations (3.4) and (3.5) in the s + t unknowns a_α and b_i will determine variates u and v making R a maximum, a minimum, or otherwise stationary, if their determinant vanishes. Since λ = μ, this condition is

| −λσ_αβ   σ_αi  |
| σ_iβ     −λσ_ij | = 0  (3.6),

in which the first block-row and block-column correspond to the s Greek indices and the second to the t Latin indices.
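In matrix notation, the nonnegative roots of equation (3.6) may be computed as singular values of the whitened between-set covariance matrix. The sketch below (Python with numpy; the simulated data and the dimensions s = 3, t = 4 are illustrative assumptions, not an example from the paper) verifies that each such root annihilates the determinant:

```python
import numpy as np

# Sketch: the nonnegative roots of (3.6) are the singular values of
# Sxx^(-1/2) Sxy Syy^(-1/2), where Sxx, Syy are within-set and Sxy
# between-set covariances.  Simulated data, s = 3, t = 4.
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 3))
Y = X @ rng.standard_normal((3, 4)) * 0.5 + rng.standard_normal((500, 4))
S = np.cov(np.hstack([X, Y]), rowvar=False)
Sxx, Syy, Sxy = S[:3, :3], S[3:, 3:], S[:3, 3:]

def inv_sqrt(M):
    # symmetric inverse square root via eigendecomposition
    w, V = np.linalg.eigh(M)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

rho = np.linalg.svd(inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy), compute_uv=False)

# each root rho makes the (s+t)-rowed determinant in (3.6) vanish
M = np.block([[-rho[0] * Sxx, Sxy], [Sxy.T, -rho[0] * Syy]])
assert abs(np.linalg.det(M)) < 1e-6
```

The equivalence rests on a congruence transformation of the determinant in (3.6) by diag(Sxx^(-1/2), Syy^(-1/2)), which does not alter its roots.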
This symmetrical determinant is the discriminant of a quadratic form φ − λψ, where

φ = 2 Σ_α Σ_i σ_αi x_α x_i,  ψ = Σ_α Σ_β σ_αβ x_α x_β + Σ_i Σ_j σ_ij x_i x_j.

Here ψ is positive definite because it is the sum of two positive definite quadratic forms. Consequently* all the roots of (3.6) are real. Moreover the elementary divisors are all of the first degree†. This means that the matrix of the determinant in (3.6) is reducible, by transformations which do not affect either its rank or its linear factors, to a matrix having zeros everywhere except in the principal diagonal, while the elements in this diagonal are polynomials

E₁(λ), E₂(λ), …, E_{s+t}(λ),
none of which contains any linear factor of the form λ − ρ raised to a degree higher than the first. Therefore, if a simple root of (3.6) is substituted for λ, the rank is s + t − 1; but substitution of a root of multiplicity m for λ makes the rank s + t − m. Consequently if a simple root is substituted for λ and μ in (3.4) and (3.5) these equations will determine values of a₁, a₂, …, a_s, b_{s+1}, …, b_{s+t} uniquely except for constant factors whose absolute values are determinate from (3.2). Not all these quantities are zero; from this fact, and the form of (3.4) and (3.5), it is evident that at least one a_α and at least one b_i differ from zero, provided the value put for λ is not zero. The variates u and v will then be fully determinate except that they may be replaced by the pair −u, −v. But for a root of multiplicity m there will be m linearly independent solutions instead of one in a complete set of solutions. From these may be obtained m different pairs of variates u and v.
The coefficient of the highest power of λ in (3.6) is the product of two principal minors, both of which differ from zero because the variates have been assumed algebraically independent. The equation is therefore of degree s + t. We assume as a mere matter of notation, if s ≠ t, that s < t. Then of the s + t roots at least t − s vanish; for the coefficients of λ^(t−s−1) and lower powers of λ are sums of principal minors of 2s + 1 or more rows, in which λ is replaced by zero, and every such minor vanishes, as can be seen by a Laplace expansion. Also, the sign of λ may be changed in (3.6) without changing the equation, for this may be accomplished by multiplying each of the first s rows and last t columns by −1. Therefore the negative of every root is also a root. The s + t − (t − s) = 2s roots that do not necessarily vanish consist therefore of s positive or zero roots ρ₁, ρ₂, …, ρ_s, and of the negatives of these roots. These s roots which are positive or zero we shall call the canonical correlations between the sets of variates; the corresponding linear functions u, v whose coefficients satisfy (3.2), (3.4) and (3.5) we call canonical variates‡. It is clear that every canonical correlation is the correlation coefficient between a pair of canonical variates. Hence no canonical correlation can exceed unity. The greatest canonical correlation is the maximum multiple correlation
* Maxime Bôcher, Introduction to Higher Algebra, New York, 1931, p. 170, Theorem 1.
† Bôcher, p. 306, Theorem 4; p. 267, Theorem 2; p. 271, Definition 3.
‡ The word "canonical" is used in the algebraic theory of invariants with a meaning consistent with that of this paper.

with either set of a disposable linear function of the other set. If u, v are canonical variates corresponding to ρ_γ, then the pair u, −v (or −u, v) is associated with the root −ρ_γ.
If a pair of canonical variates corresponding to a root ρ_γ is

u_γ = Σ_α a_αγ x_α,  v_γ = Σ_i b_iγ x_i  (3.7),

the coefficients must satisfy (3.4) and (3.5), so that

Σ_i σ_αi b_iγ = ρ_γ Σ_β σ_αβ a_βγ  (3.8),

Σ_α σ_αi a_αγ = ρ_γ Σ_j σ_ij b_jγ  (3.9).
Also let

u_δ = Σ_α a_αδ x_α,  v_δ = Σ_i b_iδ x_i  (3.10)

be canonical variates associated with a canonical correlation ρ_δ. Among the four variates (3.7) and (3.10) there are six correlations. Apart from ρ_γ and ρ_δ these are obviously

E u_γ u_δ = Σ_α Σ_β σ_αβ a_αγ a_βδ,  E v_γ v_δ = Σ_i Σ_j σ_ij b_iγ b_jδ,
E u_γ v_δ = Σ_α Σ_i σ_αi a_αγ b_iδ,  E u_δ v_γ = Σ_α Σ_i σ_αi a_αδ b_iγ  (3.11).

We shall prove that the last four are all zero. Multiply (3.8) by a_αδ and sum with respect to α. The result, with the help of (3.11), may be written

E u_δ v_γ = ρ_γ E u_γ u_δ  (3.12).

Multiplying (3.9) by b_iδ and summing with respect to i, we get

E u_γ v_δ = ρ_γ E v_γ v_δ  (3.13).

Interchanging γ and δ in this and then using (3.12), we obtain

ρ_γ E u_γ u_δ = ρ_δ E v_γ v_δ  (3.14).

Again interchanging γ and δ, we have

ρ_δ E u_γ u_δ = ρ_γ E v_γ v_δ.

If ρ_γ² ≠ ρ_δ², the last two equations show that E u_γ u_δ = E v_γ v_δ = 0. Hence, by (3.12) and (3.13), E u_δ v_γ and E u_γ v_δ vanish. Thus all the correlations among canonical variates are zero except those between the canonical variates associated with the same canonical correlation.
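This orthogonality can be observed in a sample. The following sketch (Python with numpy; simulated data, and the SVD-based construction is a modern computational route rather than the paper's own procedure) computes sample canonical variates and checks that their correlation matrix has the stated form:

```python
import numpy as np

# Sketch: sample canonical variates obtained from the SVD of the
# whitened between-set covariance.  Pairs belonging to different roots
# should be uncorrelated, as proved in the text.
rng = np.random.default_rng(4)
Z = rng.standard_normal((2000, 5))
X = Z[:, :2] @ rng.standard_normal((2, 2))   # first set, s = 2
Y = Z @ rng.standard_normal((5, 3))          # second set, t = 3
Qx, _ = np.linalg.qr(X - X.mean(0))
Qy, _ = np.linalg.qr(Y - Y.mean(0))
U_, s, Vt = np.linalg.svd(Qx.T @ Qy)
u = Qx @ U_              # canonical variates of the first set (columns)
v = Qy @ Vt.T[:, :2]     # first two canonical variates of the second set

# within each set the variates are uncorrelated with unit sums of squares;
# between sets the only nonzero correlations are the canonical ones
assert np.allclose(u.T @ u, np.eye(2), atol=1e-8)
assert np.allclose(u.T @ v, np.diag(s[:2]), atol=1e-8)
```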
If ρ_δ is a root of multiplicity m, it is possible by well-known processes to obtain m solutions of the linear equations such that, if

u_γ = Σ_α a_αγ x_α,  v_γ = Σ_i b_iγ x_i,  u_δ = Σ_α a_αδ x_α,  v_δ = Σ_i b_iδ x_i

are any two of these solutions, they will satisfy the orthogonality condition

Σ_α a_αγ a_αδ + Σ_i b_iγ b_iδ = 0  (3.15).
There is no loss of generality in supposing that each of the original variates was uncorrelated with the others in the same set and had unit variance. In this case (3.15) is equivalent to

E u_γ u_δ + E v_γ v_δ = 0,

where u_γ, v_γ, u_δ, v_δ are given by (3.7) and (3.10). For this case of equal roots we have also from (3.14),

ρ_δ (E u_γ u_δ − E v_γ v_δ) = 0.

If ρ_δ ≠ 0, the last two equations show that E u_γ u_δ = E v_γ v_δ = 0, and then from (3.12) and (3.13) we have that E u_δ v_γ = E u_γ v_δ = 0. These correlations also vanish if ρ_δ = 0, for then the right-hand members of (3.8) and (3.9) vanish, leaving two distinct sets of equations in disjunct sets of unknowns. The solutions may therefore be chosen so that the two sums in (3.15) vanish separately.
A double zero root determines uniquely, if s = t, a pair of canonical variates. If s < t, such a root determines a canonical variate for the less numerous set, and leaves t − s degrees of freedom for the choice of the other.
The reduction of our sets of variates to canonical form may be completed by the choice of new variates v_{s+1}, v_{s+2}, …, v_t as linear functions of the second and more numerous set (unless the numbers in the two sets are equal), uncorrelated with each other and with the canonical variates v_γ previously determined, and having unit variance. This may be done in infinitely many ways, as is well known. These variates will also be uncorrelated with the canonical variates u_γ. Indeed, if

v_k = Σ_j b_jk x_j

is one of them, its correlation with u_γ is, by (3.7) and (3.9),

E u_γ v_k = Σ_α Σ_j σ_αj a_αγ b_jk = ρ_γ Σ_i Σ_j σ_ij b_iγ b_jk = ρ_γ E v_γ v_k,

which vanishes because v_k was defined to be uncorrelated with v_γ.
The normal form of two sets of variates under internal linear transformations is thus found to consist of linear functions u₁, u₂, …, u_s of one set, and v₁, v₂, …, v_t of the other, such that all the correlations among these linear functions are zero, except that the correlation of u_γ with v_γ is a positive number ρ_γ (γ = 1, 2, …, s). Therefore the only invariants of the system under internal linear transformations are ρ₁, ρ₂, …, ρ_s, and functions of these quantities.
The solution of the algebraic problem mentioned at the beginning of this
section, by steps exactly parallel to those just taken with the statistical problem,
is the following:
The positive definite quadratic forms Σ_α Σ_β σ_αβ x_α x_β and Σ_i Σ_j σ_ij x_i x_j, and the bilinear form Σ_α Σ_i σ_αi x_α x_i with real coefficients, where the Greek subscripts are summed from 1 to s and the Latin subscripts from s + 1 to s + t, and s ≤ t, may be reduced by a real linear transformation of x₁, …, x_s and a real linear transformation of x_{s+1}, …, x_{s+t} simultaneously to the respective forms x₁² + … + x_s², x_{s+1}² + … + x_{s+t}², and ρ₁x₁x_{s+1} + ρ₂x₂x_{s+2} + … + ρ_s x_s x_{2s}. A fundamental system of invariants under such transformations consists of ρ₁, …, ρ_s.
This algebraic theorem holds also if the quadratic forms are not restricted to be positive definite, provided (3.6) has no multiple roots and the forms are non-singular.
The normalization process we have defined may also be carried out for a sample, yielding canonical correlations r₁, r₂, …, r_s, which may be regarded as estimates of ρ₁, ρ₂, …, ρ_s, and associated canonical variates. With sampling problems raised in this way we shall largely be concerned in the remainder of this paper.

A further application is to geometry. In a space of N dimensions a sample


of N values of a variate may be represented by a point whose coordinates are the
observed values. The sample correlation between two variates is the cosine of
the angle between lines drawn from the origin to the representing points, with the
proviso, since deviations from means are used in the expression for a correlation,
that the sum of all the coordinates of each point be zero. A sample of s + t variates determines a flat space of s and one of t dimensions, intersecting at the origin, and containing the points representing the two sets of variates. In typical cases these two flat spaces do not intersect except at this one point. A complete set of metrical invariants of a pair of flat spaces is easily seen from the foregoing analysis to consist of s angles whose cosines are r₁, …, r_s. Indeed, like all correlations, they are invariant under rotations of the N-space about the origin, and they do not depend on the particular points used to define the two flat spaces. Each of these invariants is the angle between a line in one flat space and a line in the other. One of the invariants is the minimum angle of this kind, and the others are in a sense stationary. The condition that the two flat spaces intersect in a line is that one of the invariant quantities r₁, …, r_s be unity. They intersect in a plane if two of these quantities equal unity. For two planes through a point in space of four or more dimensions, there will be two invariants r₁, r₂, of which one is the cosine of the minimum angle. If r₁ = r₂, the planes are isocline. Every line in each plane then makes the minimum angle with some line in the other. If r₁ = r₂ = 0, the planes are completely perpendicular; every line in one plane is then perpendicular to every line in the other. If one of these invariants is zero and the other is not, the planes are semi-perpendicular; every line in each plane is perpendicular to a certain line in the other.
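In current numerical practice these invariant angles are computed from orthonormal bases of the two flat spaces. A sketch (Python with numpy; the 2-flat and 3-flat in 10-space are illustrative assumptions) also checks invariance under a rotation of the ambient space:

```python
import numpy as np

# Sketch: the invariant angles between two flat spaces through the
# origin have cosines equal to the singular values of Q1.T @ Q2,
# where Q1, Q2 are orthonormal bases of the spaces.
rng = np.random.default_rng(2)
A = rng.standard_normal((10, 2))   # spans a 2-flat in 10-space
B = rng.standard_normal((10, 3))   # spans a 3-flat
Q1, _ = np.linalg.qr(A)
Q2, _ = np.linalg.qr(B)
cosines = np.linalg.svd(Q1.T @ Q2, compute_uv=False)  # s = 2 values

# cosines lie in [0, 1]; a cosine of 1 would mean the flats share a line
assert np.all(cosines >= 0) and np.all(cosines <= 1 + 1e-12)

# invariance under a rotation of the ambient space
R, _ = np.linalg.qr(rng.standard_normal((10, 10)))
Q1r, _ = np.linalg.qr(R @ A)
Q2r, _ = np.linalg.qr(R @ B)
rotated = np.linalg.svd(Q1r.T @ Q2r, compute_uv=False)
assert np.allclose(np.sort(cosines), np.sort(rotated))
```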

The determinant of the correlations among canonical variates is

| 1    0   …  0     ρ₁   0   …  0    …  0 |
| 0    1   …  0     0    ρ₂  …  0    …  0 |
| …    …   …  …     …    …   …  …    …  … |
| 0    0   …  1     0    0   …  ρ_s  …  0 |
| ρ₁   0   …  0     1    0   …  0    …  0 |
| 0    ρ₂  …  0     0    1   …  0    …  0 |
| …    …   …  …     …    …   …  …    …  … |
| 0    0   …  ρ_s   0    0   …  1    …  0 |
| 0    0   …  0     0    0   …  0    …  1 |

= (1 − ρ₁²)(1 − ρ₂²) ⋯ (1 − ρ_s²)  (3.16).


The rank of the matrix of correlations between the two sets is invariant under non-singular linear transformations of either set. Transformation to canonical variates reduces this matrix to

    | ρ_1  0    ...  0    0  ...  0 |
    | 0    ρ_2  ...  0    0  ...  0 |
    | .............................. |
    | 0    0    ...  ρ_s  0  ...  0 |

The rank is therefore the number of canonical correlations that do not vanish.
This is the number of independent components common to the two sets. In the
parlance of mental testing, the number of " common factors " of two sets of tests
(e.g. mental and physical, or mathematical and linguistic tests) is the number
of non-vanishing canonical correlations.
4. Vector Correlation and Alienation Coefficients. In terms of the covariances among the variates in the two sets x_1, ..., x_s and x_{s+1}, ..., x_{s+t}, we define the following determinants, maintaining the convention that Greek subscripts take values from 1 to s, and Latin subscripts take values from s + 1 to s + t. It will be assumed throughout that s ≤ t. A is the determinant of the covariances among the variates in the first set, arranged in order: that is, the element in the αth row and βth column of A is σ_αβ. B is the determinant of the covariances among variates in the second set, likewise ordered. D is the determinant of s + t rows containing in order all the covariances among all the variates of both sets. C is obtained from D by replacing the covariances among the variates of the first set, including their variances, by zeros. Symbolically,

    A = | σ_αβ |,   B = | σ_ij |,   D = | σ_αβ  σ_αj |,   C = | 0     σ_αj |
                                        | σ_iβ  σ_ij |        | σ_iβ  σ_ij |.

Suppose now that new variates x_1', ..., x_s' are defined in terms of the old variates in the first set by the s equations

    x_α' = Σ_β c_αβ x_β.

The new covariances are then expressed in terms of the old by (3·1). The determinant of these new covariances, which we shall denote by A', may by (3·1) and the multiplication theorem of determinants be expressed as the product of three determinants, of which two equal the determinant c = | c_αβ | of the coefficients of the transformation, while the third is A. If the variates of the second set are subjected to a transformation of determinant d, the determinants of covariances among the new variates analogous to those defined above are readily seen in this way to equal

    A' = c²A,   B' = d²B,   C' = c²d²C,   D' = c²d²D            (4·1).

Thus A, B, C, D are relative invariants under internal transformations of the two sets of variates. The ratios

    Q² = (−1)^s C / (AB),   Z = D / (AB)            (4·2)

we shall call respectively the squares of the vector correlation coefficient or vector correlation, and of the vector alienation coefficient. It is evident that both are absolute invariants under internal transformations of the two sets, since their values computed from transformed variates have numerators and denominators multiplied by the same factor c²d², in accordance with (4·1).
The notation just used is appropriate to a population, but the same definitions and reasoning may be applied to a sample. We denote by q² and z the same functions of the sample covariances that Q² and Z, respectively, have been defined to be of the population covariances.
A particularly simple linear transformation consists of dividing each variate
by its standard deviation. The covariances among the new variates are then the
same as their correlations, which are also the correlations among the old variates.
Hence, in the definitions of the vector correlation and alienation coefficients, the
covariances may be replaced by the correlations. For example, if s = t = 2, the
squared vector correlation in a sample may be written
0 0 ria
0 0 rM rH
r« 7"Sf 1 ru
r« r« r« 1
q' = .(4-3).
1 1
ru l 1
The vector correlation coefficient will always be taken as the positive square root of q² or of Q² (which are seen below to be positive) when s < t, and usually also when s = t. However, if in accordance with (4·3) we write

    q = (r_13 r_24 − r_14 r_23) / √{(1 − r_12²)(1 − r_34²)}            (4·4),

it is evident that q may be positive for some samples of a particular set of variates, and negative for other samples. It may sometimes be advantageous, as in testing whether two samples arose from the same population, to retain the sign of q for each sample, since this provides evidence in addition to that given by the absolute value of q. But unless otherwise stated we shall always regard q as the positive root of q². Likewise, Q, √z and √Z will denote the positive roots unless otherwise specifically indicated in each case. A transformation of either set will reverse the sign of the algebraic expression (4·4) if the determinant of the transformation is negative. This will be true of a simple interchange of two variates; for example, x_1' = x_2, x_2' = x_1 has the determinant −1. On the other hand, the sign is conserved if the determinant of the transformation is positive. Such considerations apply whenever s = t.
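As a numerical illustration (a sketch assuming numpy; the correlation values are invented for the example, not taken from the paper's data), the determinant ratio (4·3) agrees with the closed expression (4·4):

```python
import numpy as np

# Hypothetical 4x4 correlation matrix for two sets {x1, x2} and {x3, x4}
R = np.array([
    [1.00, 0.50, 0.30, 0.10],
    [0.50, 1.00, 0.20, 0.40],
    [0.30, 0.20, 1.00, 0.60],
    [0.10, 0.40, 0.60, 1.00],
])

A = np.linalg.det(R[:2, :2])          # determinant of correlations within the first set
B = np.linalg.det(R[2:, 2:])          # within the second set
C_mat = R.copy()
C_mat[:2, :2] = 0.0                   # zero the first-set block, as in the definition of C
C = np.linalg.det(C_mat)

q_sq = C / (A * B)                    # (4.3); the sign factor (-1)^s is +1 here since s = 2

# (4.4): the tetrad-like numerator over sqrt((1 - r12^2)(1 - r34^2))
r13, r14, r23, r24 = R[0, 2], R[0, 3], R[1, 2], R[1, 3]
q = (r13 * r24 - r14 * r23) / np.sqrt((1 - R[0, 1] ** 2) * (1 - R[2, 3] ** 2))

print(q_sq, q ** 2)                   # the two values agree
```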
Since the vector correlation and alienation coefficients are invariants, they may be computed on the assumption that the variates are canonical. In this case A = B = 1, and D is given by (3·16). To obtain C we replace the first s 1's in the principal diagonal of (3·16) by 0's. It then follows that

    C = (−1)^s ρ_1² ρ_2² ... ρ_s².

This confirms that the value of Q² given in (4·2) is positive. In this way the vector correlation and alienation coefficients are expressible in terms of the canonical correlations by the equations

    Q = ± ρ_1 ρ_2 ... ρ_s,   Z = (1 − ρ_1²)(1 − ρ_2²) ... (1 − ρ_s²)            (4·5),

    q = ± r_1 r_2 ... r_s,   z = (1 − r_1²)(1 − r_2²) ... (1 − r_s²)            (4·6).
From these results it is obvious that both the vector correlation and vector
alienation coefficients are confined to values not exceeding unity. Also Z and z
are necessarily positive, except that they vanish if, and only if, all the variates in
one set are linear functions of those in the other.
Since the denominator of (4·4) is obviously less than unity, and since we have just shown that q ≤ 1, the tetrad must be still less. This simple proof that the tetrad is between −1 and +1 shows the falsity of the idea that the range of the tetrad is from −2 to +2, which has gained some currency. An equivalent proof in vector notation was communicated to the writer by E. B. Wilson.
The only case in which Z can attain its maximum value unity is that in which all the canonical correlations vanish. In this case no variate in either set is correlated with any variate in the other, so that the two sets are completely independent, at least if the distribution is normal. Moreover, Q = 0. On the other hand, the only case in which Q can be unity is that in which all the canonical correlations are unity. In this event, Z = 0; also, the variates in the first set are linear functions of those in the second. Thus either z, 1 − q, or 1 − q² might be used as an index of independence, while we might use q, q² or 1 − z as a measure of relationship between the two sets. The work of Wilks alluded to in Section 1 provides an exact distribution of z on the hypothesis of complete independence, a distribution which may thus be used to test this hypothesis.

If we regard the elements of A, B and C as sample covariances, we have in case s = t a simple interpretation of q. Consider the two matrices of observations on the two sets of variates in N individuals, in which each row corresponds to a variate and each column to an individual observed. From Section 2 it is evident that the square of the sum of the products of corresponding s-rowed determinants in the two matrices is (−1)^s N^{2s} C; also that the sums of squares of the s-rowed determinants in the two matrices are N^s A and N^s B. Therefore q is simply the product-moment correlation coefficient between corresponding s-rowed determinants.
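This interpretation is easy to verify numerically. The sketch below (assuming numpy; random data for illustration) compares q computed from the covariance determinants with the product-moment correlation, about zero, of corresponding 2-rowed minors; their equality is the Cauchy–Binet expansion used in the text.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
N, s = 8, 2
X = rng.normal(size=(s, N))            # rows = variates, columns = individuals
Y = rng.normal(size=(s, N))
X = X - X.mean(axis=1, keepdims=True)  # deviations from sample means
Y = Y - Y.mean(axis=1, keepdims=True)

# q from the covariance determinants (s = t, so the sign is retained)
q_det = np.linalg.det(X @ Y.T) / np.sqrt(
    np.linalg.det(X @ X.T) * np.linalg.det(Y @ Y.T))

# q as the product-moment correlation (about zero) of corresponding 2-rowed minors
dx = np.array([np.linalg.det(X[:, c]) for c in combinations(range(N), s)])
dy = np.array([np.linalg.det(Y[:, c]) for c in combinations(range(N), s)])
q_minors = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

print(np.isclose(q_det, q_minors))     # True
```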
The generalized variance of a set of variates may be defined as the determinant of their ordered covariances, such as A or B. Let ξ_1, ξ_2, ..., ξ_s be estimates respectively of x_1, x_2, ..., x_s obtained from x_{s+1}, ..., x_{s+t} by least squares, and let the regression equations be

    ξ_α = Σ_i β_αi x_i            (4·7).

The appropriateness of Q as a generalization of the correlation coefficient, and of √Z as a generalization of the alienation coefficient, will be apparent from the following theorem:

The ratio of the generalized variance of ξ_1, ..., ξ_s to that of x_1, ..., x_s is Q². The ratio of the generalized variance of x_1 − ξ_1, x_2 − ξ_2, ..., x_s − ξ_s to that of x_1, ..., x_s is Z.
This theorem is expressed in terms of the population, but an exactly parallel one holds for a sample.
Proof: If x_1, ..., x_s be subjected to a linear transformation of determinant c, and if ξ_1, ..., ξ_s be subjected to the same transformation (i.e. a transformation with the same coefficients), then x_1 − ξ_1, ..., x_s − ξ_s will also be subjected to this transformation. The generalized variances of all three of these sets of variates will be multiplied by the same constant c², just as in (4·1) we found that A' = c²A. Ratios among the three determinants will therefore be absolutely invariant; consequently our theorem is true if it is true when the original variates are canonical. Suppose, then, that this is the case. Since each canonical variate is correlated with only one of the other set, the regression equations (4·7) reduce simply to

    ξ_α = ρ_α x_{s+α}            (α = 1, ..., s).

Since the variance of x_{s+α} is unity, that of ξ_α is ρ_α²; that of the deviation x_α − ξ_α is 1 − ρ_α². Since the canonical variates x_1, ..., x_s are mutually uncorrelated, the same is true of the ξ_α, and also of the x_α − ξ_α. The generalized variance of the canonical variates is unity; that of the ξ_α is the product of the elements in its principal diagonal, namely ρ_1²ρ_2² ... ρ_s²; and the generalized variance of the x_α − ξ_α is (1 − ρ_1²) ... (1 − ρ_s²). In view of (4·5), this proves the theorem.
A further property of the vector correlation is obvious from the final paragraph of Section 3:

A necessary and sufficient condition that the number of components in an uncorrelated set of components common to two sets of variates be less than the number of variates in either set is that the vector correlation be zero.
When s = 2 the canonical correlations not only determine the vector correlation and alienation coefficients but are determined by them. If as usual we take q positive, (4·6) becomes q = r_1 r_2, z = (1 − r_1²)(1 − r_2²), whence

    r_1² + r_2² = 1 − z + q²,   r_1² r_2² = q²            (4·8).

Solving, and denoting the greater canonical correlation by r_1, we have

    r_1 = ½ [√{(1 + q)² − z} + √{(1 − q)² − z}],
    r_2 = ½ [√{(1 + q)² − z} − √{(1 − q)² − z}]            (4·9).
Since the canonical correlations are real, (r_1 − r_2)² is positive; therefore

    z ≤ (1 − q)²            (4·10).

In like manner, the vector correlation and alienation coefficients in the population are subject not only to the inequalities 0 ≤ Q² ≤ 1, 0 ≤ Z ≤ 1, but also, when s = 2, to

    Z ≤ (1 − Q)².

These inequalities become equalities when the roots are equal.


The fundamental equation (3·6), regarded as an equation in λ², has as roots the squares of the canonical correlations. Hence, by (4·8), it reduces to the form

    λ⁴ − (1 − z + q²) λ² + q² = 0            (4·11),

where s = 2.
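A short numerical check of (4·8)–(4·11) (assuming Python's math module; the values of r_1, r_2 are illustrative, not from the paper):

```python
import math

# Illustrative canonical correlations
r1, r2 = 0.8, 0.3
q = r1 * r2                                   # (4.6), taking q positive
z = (1 - r1 ** 2) * (1 - r2 ** 2)

# (4.9): recover the canonical correlations from q and z alone
a = math.sqrt((1 + q) ** 2 - z)
b = math.sqrt((1 - q) ** 2 - z)
r1_rec, r2_rec = (a + b) / 2, (a - b) / 2     # 0.8 and 0.3, up to rounding

# (4.11): the same values as roots of L^2 - (1 - z + q^2) L + q^2 = 0 in L = lambda^2
m = 1 - z + q ** 2
disc = math.sqrt(m ** 2 - 4 * q ** 2)
roots = (math.sqrt((m + disc) / 2), math.sqrt((m - disc) / 2))
print(r1_rec, r2_rec, roots)
```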
5. Standard Errors. The canonical correlations and the coefficients of the canonical variates are defined in Section 3 in such a way that they are continuous functions of the covariances, with continuous derivatives of all orders, except for certain sets of values corresponding to multiple or zero roots, within the domain of variation for which the covariances are the coefficients of a positive definite quadratic form. This is true for the canonical reduction of a sample as well as for that of a population. The probability that a random sample from a continuous distribution will yield multiple roots is zero; and sample covariances must always be the coefficients of a positive definite form.
We shall in this section derive asymptotic standard errors, variances and covariances for the canonical correlations on the assumption that those in the population are unequal, and that the population has the multiple normal distribution. From these we shall derive standard errors for the vector correlation and alienation coefficients q and z. The deviations of sample from population values in these as in most cases have variances of order n⁻¹, and distributions approaching normality of form as n increases*.
Let x_1, ..., x_p be a normally distributed set of variates of zero means and covariances

    σ_ij = E x_i x_j            (5·1).

For a sample of N in which x_if is the value of x_i observed in the fth individual, the sample covariance of x_i and x_j is

    s_ij = Σ_{f=1}^{N} (x_if − x̄_i)(x_jf − x̄_j) / (N − 1)            (5·2),

where x̄_i and x̄_j are the sample means. To simplify the later work, we introduce the pseudo-observations x_if', defined in terms of the observations by the equations

    x_if' = Σ_g c_fg x_ig            (5·3),

* For a proof of approach to normality for a general class of statistics including those with which we deal, cf. Doob, op. cit.

where the quantities c_fg, independent of i and therefore the same for all the variates x_i, are the coefficients of an orthogonal transformation, such that

    c_Ng = 1/√N            (g = 1, ..., N)            (5·4).

Since the transformation is orthogonal we must have

    Σ_h c_fh c_gh = δ_fg            (5·5),

where δ_fg is the Kronecker delta, equal to unity if f = g, but to zero if f ≠ g. The coefficients c_fg may be chosen in an infinite variety of ways consistently with these requirements, but will be held fixed throughout the discussion. Since linear functions of normally distributed variates are normally distributed, the pseudo-observations are normally distributed. Their population means are, from (5·3),

    E x_if' = Σ_g c_fg E x_ig = 0,

since the original variates were assumed to have zero means. Also, since the expectation of the product of independent variates is zero, and since the different individuals in a sample are assumed independent, so that, by (5·1),

    E x_ih x_jk = δ_hk σ_ij            (5·6),

we have, from (5·3), (5·6) and (5·5),

    E x_if' x_jg' = Σ_h Σ_k c_fh c_gk E x_ih x_jk = σ_ij Σ_h c_fh c_gh = δ_fg σ_ij            (5·7).

From (5·4) and (5·3) it is clear that

    x_iN' = √N x̄_i            (5·8).

The equations (5·3) may, on account of their orthogonality, be solved in the form

    x_if = Σ_g c_gf x_ig'.

Therefore, by (5·5),

    Σ_f x_if x_jf = Σ_g x_ig' x_jg'.

Substituting this result and (5·8) in (5·2), we find that the final term of the sum cancels out. Introducing therefore the symbol S for summation from 1 to N − 1 with respect to the second subscript, and putting also

    n = N − 1            (5·9),

we have the compact result

    s_ij = (1/n) S x_if' x_jf'            (5·10).
Since the pseudo-observations are normally distributed with the covariances (5·7) and zero means, they have exactly the same distribution as the observations in a random sample of n from the original population. The equivalence of the mean product (5·10) with the sample covariance (5·2) establishes the important principle that the distribution of covariances in a sample of n + 1 is exactly the same as the distribution of mean products in a sample of n, if the parent population is normally distributed about zero means. Use of this principle will considerably simplify the discussions of sampling.
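The pseudo-observation construction can be checked numerically. The sketch below (assuming numpy) uses a Helmert matrix as one concrete choice of the orthogonal coefficients c_fg whose last row is (1/√N, ..., 1/√N), as required by (5·4), and verifies the identity (5·10):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 6
x = rng.normal(size=N)                 # observations on one variate
y = rng.normal(size=N)                 # observations on another

# Helmert matrix: row f has f+1 entries 1, then -(f+1), then zeros, normalized;
# the last row is constant 1/sqrt(N).
C = np.zeros((N, N))
for f in range(N - 1):
    C[f, :f + 1] = 1.0
    C[f, f + 1] = -(f + 1)
    C[f] /= np.sqrt((f + 1) * (f + 2))
C[N - 1] = 1.0 / np.sqrt(N)

xp, yp = C @ x, C @ y                  # pseudo-observations (5.3)
n = N - 1                              # degrees of freedom (5.9)

# (5.2): sample covariance; (5.10): mean product of the first n pseudo-observations
s_xy = ((x - x.mean()) * (y - y.mean())).sum() / n
print(np.isclose(s_xy, (xp[:n] * yp[:n]).sum() / n))   # True
```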
An important extension of this consideration lies in the use of deviations, not merely from sample means, but from regression equations based on other variates. In such cases the number of degrees of freedom n to be used is the difference between the sample number and the number of constants in each of the regression equations, which number must be the same for all the deviations. The estimate of covariance of deviations in the ith and jth variates to be used is then the sum of the products of corresponding deviations, divided by n. This may also be regarded as the mean product of the values of x_i and x_j in n pseudo-observations, as above, without elimination of the means or of the extraneous variates. The sampling distributions with which we shall be concerned will all be expressed in terms of the number of degrees of freedom n, rather than in terms of the number of observations N. This will permit immediately of the extension, which is equivalent to replacing all the correlations, in terms of which our statistics may be defined, by partial correlations representing the elimination of a particular set of variates, the same in all cases.
A variance is of course the covariance of a variate with itself, so that this whole discussion of covariances is equally applicable to variances.

The characteristic function of a multiple normal distribution with zero means is well known to be

    φ(t_1, ..., t_p) = exp(−½ Σ Σ σ_jk t_j t_k).

The moments of the distribution are the derivatives of the characteristic function, evaluated for t_1 = t_2 = ... = 0. From the fourth derivative with respect to t_i, t_j, t_k and t_m, it is easy to show in this way that

    E x_i x_j x_k x_m = σ_ij σ_km + σ_ik σ_jm + σ_im σ_jk            (5·11).

From (5·10) we have

    E s_ij s_km = (1/n²) S S E x_if' x_jf' x_kg' x_mg'            (5·12).

Now if f ≠ g,

    E x_if' x_jf' x_kg' x_mg' = (E x_if' x_jf')(E x_kg' x_mg') = σ_ij σ_km            (5·13),

since the expectation of the product of independent quantities is the product of their expectations. Of the n² terms in the double sum in (5·12), n² − n are equal to (5·13). The remaining n terms are those for which f = g, and each of them equals (5·11). Hence

    E s_ij s_km = σ_ij σ_km + (1/n)(σ_ik σ_jm + σ_im σ_jk).

Inasmuch as E s_ij = σ_ij,

we have, if we put dσ_ij = s_ij − σ_ij for the deviation of sample from population value, that the sampling covariance of two covariances is

    E dσ_ij dσ_km = E s_ij s_km − σ_ij σ_km,

whence

    E dσ_ij dσ_km = (1/n)(σ_ik σ_jm + σ_im σ_jk)            (5·14).

This is a fundamental formula from which may be derived directly a number of more familiar special cases. For example, to obtain the variance of a variance, merely put i = j = k = m, which gives

    E (dσ_ii)² = 2σ_ii²/n.


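Formula (5·14) can also be checked by simulation. A minimal sketch (assuming numpy; the covariance matrix and sample sizes are illustrative), using mean products of zero-mean observations, which by the pseudo-observation principle have the distribution of covariances from a sample of n + 1:

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 20, 100_000
cov = np.array([[1.0, 0.6],
                [0.6, 2.0]])
L = np.linalg.cholesky(cov)

# reps independent samples of n from a zero-mean normal with the given covariance
X = rng.normal(size=(reps, n, 2)) @ L.T
S = np.einsum('rfi,rfj->rij', X, X) / n          # mean-product "covariances"
d = S - cov                                      # the deviations d(sigma_ij)

# (5.14) with i = j = 1, k = 1, m = 2: E d11 d12 = 2 sigma_11 sigma_12 / n
emp = (d[:, 0, 0] * d[:, 0, 1]).mean()
theory = 2 * cov[0, 0] * cov[0, 1] / n
print(emp, theory)                               # close to each other
```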
Returning from these general considerations to the problem of canonical correlations, we recall from (3·2) and (3·3) that for any particular canonical correlation ρ_1,

    ρ_1 = Σ Σ σ_αi a_α b_i            (5·15),

where α and β are summed from 1 to s, and i and j from s + 1 to s + t. Any particular set of sampling errors dσ_AB in the covariances determines a corresponding set of sampling errors in the a_α and b_i and in ρ_1, for these quantities are definite analytic functions of the covariances except when ρ_1 is a multiple or zero root of (3·6), cases which we now exclude from consideration. In terms of the derivatives of these functions we define

    dρ_1 = Σ Σ (∂ρ_1/∂σ_AB) dσ_AB            (5·16),

where dσ_AB = s_AB − σ_AB, and the summations are over all values of A and B from 1 to s + t. Then differentiating (5·15) we have

    Σ Σ (2σ_αβ a_α da_β + a_α a_β dσ_αβ) = 0,   Σ Σ (2σ_ij b_i db_j + b_i b_j dσ_ij) = 0,
    dρ_1 = Σ Σ (σ_αi a_α db_i + σ_αi b_i da_α + a_α b_i dσ_αi)            (5·17).
Let us now suppose that the variates are in the population canonical. This assumption does not entail any loss of generality as regards ρ_1, since ρ_1 is an invariant under transformations of the variates of either set. Since a_α is the coefficient of x_α in the expression for one of the canonical variates, which we take to be x_1, we have in the population a_1 = 1, a_2 = a_3 = ... = a_s = 0. In the same way,

    b_{s+1} = 1,   b_{s+2} = ... = b_{s+t} = 0.

Also, since the covariances among canonical variates are the elements of the determinant in (3·16), we have

    σ_αβ = δ_αβ,   σ_ij = δ_ij,   σ_{α,s+β} = ρ_α δ_αβ            (5·18),

the Kronecker deltas being equal to unity if the two subscripts are equal, and otherwise vanishing. When these special values of the a's, b's and σ's are substituted in (5·17) most of the terms drop out, leaving the simple equations

    2 da_1 + dσ_11 = 0,   2 db_{s+1} + dσ_{s+1,s+1} = 0,
    dρ_1 = ρ_1 db_{s+1} + ρ_1 da_1 + dσ_{1,s+1}            (5·19).
Substituting from the first two in the third of these equations, we get

    dρ_1 = dσ_{1,s+1} − ½ρ_1 (dσ_11 + dσ_{s+1,s+1})            (5·20).

For any other simple root ρ_2 we have in the same way

    dρ_2 = dσ_{2,s+2} − ½ρ_2 (dσ_22 + dσ_{s+2,s+2})            (5·21).

Squaring (5·20), taking the expectation, using the fundamental formula (5·14), and finally substituting the canonical values (5·18), we have

    n E (dρ_1)² = (1 − ρ_1²)²            (5·22).

Treating the product of (5·20) and (5·21) in the same way we obtain

    E dρ_1 dρ_2 = 0            (5·23).
A sample canonical correlation r_1 may be expanded about ρ_1 in a Taylor series of the form

    r_1 = ρ_1 + Σ Σ (∂ρ_1/∂σ_AB) dσ_AB + ½ Σ Σ Σ Σ (∂²ρ_1/∂σ_AB ∂σ_CD) dσ_AB dσ_CD + ...            (5·24),

or, by (5·16),

    r_1 − ρ_1 = dρ_1 + ...            (5·25).
The expectation of the product of any number of the sampling deviations dσ_AB is a fixed function of the σ's divided by a power of n whose exponent increases with the number of the quantities dσ_AB in the product. Since E dσ_AB = 0, we have from (5·24) and (5·14) that E(r_1 − ρ_1) is of order n⁻¹. Hence squaring (5·25) and using (5·22), we find that the sampling variance of r_1 is given by (1 − ρ_1²)²/n, apart from terms of higher order in n⁻¹. If by the standard error of r_1 we understand the leading term in the asymptotic expansion of the square root of the variance, we have for this standard error

    σ_{r_1} = (1 − ρ_1²)/√n            (5·26).

It is remarkable that this standard error of a canonical correlation is of exactly the same form as that of a product-moment correlation coefficient calculated directly from data, at least so far as the leading term is concerned.
The covariance of two statistics or their correlation would ordinarily be of order n⁻¹; but from (5·23) it appears that the covariance of r_1 and r_2 is of order n⁻² at least. All these results hold as between any pair of simple non-vanishing roots. To summarize:

Let ρ_1, ..., ρ_p be any set of simple non-vanishing roots of (3·6). For sufficiently large samples these will be approximated by certain of the canonical correlations r_1, r_2, ..., r_p of the samples in such a way that, when r_γ − ρ_γ is divided by the standard error

    σ_{r_γ} = (1 − ρ_γ²)/√n            (γ = 1, 2, ..., p)            (5·27),

the resulting variates have a distribution which, as n increases, approaches the normal distribution of p independent variates of zero means and unit standard deviations. For small samples there will be ambiguities as to which root of the determinantal equation for the sample is to be regarded as approximating a particular canonical correlation of the population. As n increases, the sample roots will separately cluster more and more definitely about individual population roots.

If a canonical correlation ρ_γ is zero, and if s = t, the foregoing result is applicable with the qualification that sample values r_γ approximating ρ_γ must not all be taken positive, but must be assigned positive and negative values with equal probabilities. Alternatively, if we insist on taking all the sample canonical correlations as positive, the distribution will be that of absolute values of a normally distributed variate.
To prove this, suppose that the determinantal equation has zero as a double root. For sample covariances sufficiently near those in the population, there will be a root r close to zero, which will be very near the value of λ obtained by dropping from the equation all but the term in λ² and that independent of λ. The latter is for s = t a perfect square, and the former does not vanish, since the zero root is only a double one. Hence r is the ratio of a polynomial in the s_AB's to a non-vanishing regular function in the neighbourhood. This means that the differential method applicable to non-vanishing roots is also valid here, and that, since the derivatives are continuous, (5·27) holds even when ρ_γ = 0.
Since a tetrad difference is proportional to a vector correlation, which is the product of the canonical correlations, the question whether the tetrad differs significantly from zero is equivalent to the question whether a canonical correlation is significantly different from zero. This may be tested by means of the standard error (5·27), which reduces in this case to 1/√n. Since this is independent of unknown parameters, we have here a method of meeting the third of the difficulties mentioned in Section 1 in connection with testing the significance of the tetrad.
For s = 2, a zero root is of multiplicity t at least. From the final result in § 9 below it may be deduced that if zero is a root of multiplicity exactly t, if r is the corresponding sample canonical correlation, and if s = 2, then nr² has the χ² distribution with t − 1 degrees of freedom. This provides a means of testing the significance of a sample canonical correlation in all cases in which s = 2.
We shall conclude this section by deriving standard error formulae for the vector correlation and vector alienation coefficients, assuming the canonical correlations in the population all distinct. Differentiating (4·5) and supposing all canonical correlations positive, we have

    dQ = Q Σ_γ dρ_γ/ρ_γ,   dZ = −2Z Σ_γ ρ_γ dρ_γ/(1 − ρ_γ²).
Taking the expectations of the squares and products of these expressions and using (5·22) and (5·23), we obtain for the variances and covariance, apart from terms of higher order in n⁻¹,

    E (dQ)² = (Q²/n) Σ_γ (1 − ρ_γ²)²/ρ_γ²,   E (dZ)² = (4Z²/n)(ρ_1² + ... + ρ_s²),
    E dQ dZ = −(2QZ/n) Σ_γ (1 − ρ_γ²)            (5·28).

For the case s = 2 these formulae reduce with the help of (4·5) to

    E (dQ)² = [(1 + Q²)(1 − Z + Q²) − 4Q²]/n,
    E (dZ)² = 4Z²(1 − Z + Q²)/n,
    E dQ dZ = −2QZ(1 + Z − Q²)/n.
6. Examples, and an Iterative Method of Solution. The correlations obtained by Truman L. Kelley* among tests in (1) reading speed, (2) reading power, (3) arithmetic speed, and (4) arithmetic power are given by the elements of the following determinant, in which the rows and columns are arranged in the order given:

        | 1·0000   ·6328   ·2412   ·0586 |
    D = |  ·6328  1·0000  −·0553   ·0655 |  = ·4129.
        |  ·2412  −·0553  1·0000   ·4248 |
        |  ·0586   ·0655   ·4248  1·0000 |

These correlations were obtained from a sample of 140 seventh-grade school children. Let us inquire into the relations of arithmetical with reading abilities indicated by these tests.
The two-rowed minors of D in the upper left, lower right, and upper right corners are respectively

    A = ·5996,   B = ·8195,   √C = ·01904.

Hence, by (4·2),

    q² = ·0007377,   q = ·027161,   z = ·84036            (6·1).

By means of (4·9) or (4·11) these values give for the canonical correlations

    r_1 = ·3945,   r_2 = ·0690.

In this case n = N − 1 = 139, and the standard error (5·27) reduces, for the hypothesis of a zero canonical correlation in the population, to 1/√139 = ·0848. It is plain, therefore, that r_2 is not significant, so that we do not have any evidence here of more than one common component of reading and arithmetical abilities.
Whether we have convincing evidence of any common component is another question. It is tempting to compare the value of r_1 also with the standard error ·0848 for the purpose of answering this question, which would give a decidedly significant value. This however is not a sensitive procedure for testing the hypothesis that there is no common factor; for this hypothesis of complete independence would mean that both canonical correlations would in the population be zero; they would therefore be a quadruple root of the fundamental equation, to which the standard error is not applicable. Other tests for complete independence will be considered in Section 11; these have a sound basis, and one of them (discovered by Wilks) gives approximately ·0001 as the probability of a value of z as small as or smaller than the value found above. We conclude that reading and arithmetic involve one common mental factor but, so far as these data show, only one.

* Op. cit., p. 100. These are the raw correlations, not corrected for attenuation.
Linear functions a_1 x_1 + a_2 x_2 and b_3 x_3 + b_4 x_4 having maximum correlation with each other may be used either to predict arithmetical from reading ability or vice versa. The coefficients will satisfy (3·4) and (3·5); when in these equations we substitute r_1 = ·3945 for λ and μ, and the given correlations for the covariances, and divide by −λ = −·3945, we have

    a_1 + ·6328 a_2 − ·6114 b_3 − ·1485 b_4 = 0,
    ·6328 a_1 + a_2 + ·1402 b_3 − ·1660 b_4 = 0,
    −·6114 a_1 + ·1402 a_2 + b_3 + ·4248 b_4 = 0,
    −·1485 a_1 − ·1660 a_2 + ·4248 b_3 + b_4 = 0.

The fourth equation must be dependent on the preceding three, so we ignore it except for a final checking. Replacing b_4 by unity we may solve the first three equations, which are symmetrical, by the usual least-square method. Thus we write the coefficients, without repetition, in the form

    1·0000   ·6328  −·6114  −·1485   ·8729
            1·0000   ·1402  −·1660  1·6070
                    1·0000   ·4248   ·9536

the last column consisting of the sums of the elements written or understood in the respective rows. The various divisions, multiplications and subtractions involved in solving the equations are applied to the elements in the rows, including those in the check column, which at every stage gives the sum of the elements written or understood in a row. In the array above, the coefficients of each equation begin in the first row and proceed downward to the diagonal, then across to the right, and this scheme is followed with the reduced set of equations obtained by eliminating an unknown, which is done in such a way as to preserve symmetry. This process yields finally the ratios

    a_1 : a_2 : b_3 : b_4 = −2·7772 : 2·2655 : −2·4404 : 1.

Therefore the linear functions of reading and arithmetical scores that predict each other most accurately are proportional to −2·7772 x_1 + 2·2655 x_2 and −2·4404 x_3 + x_4, respectively. It is for these weighted sums that the maximum correlation ·3945 is attained.
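As a check on this example (a sketch assuming numpy, using an eigenvalue computation rather than the elimination scheme of the text): substituting (6·4) in (6·3), as is done later in this section, shows that λ² and the a's satisfy an eigenvalue problem for the matrix formed from the blocks of D.

```python
import numpy as np

# Kelley's correlations: (1) reading speed, (2) reading power,
# (3) arithmetic speed, (4) arithmetic power
R = np.array([
    [1.0000, 0.6328, 0.2412, 0.0586],
    [0.6328, 1.0000, -0.0553, 0.0655],
    [0.2412, -0.0553, 1.0000, 0.4248],
    [0.0586, 0.0655, 0.4248, 1.0000],
])
A, B, M = R[:2, :2], R[2:, 2:], R[:2, 2:]

# lambda^2 a = A^{-1} M B^{-1} M' a; the eigenvalues are the squared
# canonical correlations
K = np.linalg.solve(A, M) @ np.linalg.solve(B, M.T)
r = np.sqrt(np.sort(np.linalg.eigvals(K).real)[::-1])
print(r)   # approximately [0.3945, 0.0689]
```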
From the same individuals, Kelley obtained the correlations in the following table, in which the first two rows correspond to the arithmetic speed and power tests cited above, while the others are respectively memory for words, memory for meaningful symbols, and memory for meaningless symbols:

    1·0000   ·4248   ·0420   ·0215   ·0573
     ·4248  1·0000   ·1487   ·2489   ·2843
     ·0420   ·1487  1·0000   ·6693   ·4662
     ·0215   ·2489   ·6693  1·0000   ·6915
     ·0573   ·2843   ·4662   ·6915  1·0000

From this we find q² = ·0003209, q = ·01792, z = ·902466, whence

    r_1 = ·3073,   r_2 = ·0583.

Since in this case s ≠ t, we cannot say as before that the standard error of r_2 when ρ = 0 is n^(−1/2) = ·0848. But, putting χ² = nr_2² = ·472, with two degrees of freedom, we find P = ·79, so that r_2 is far from significant. However r_1 is decidedly significant.

In view of the tests in Section 11, we conclude in this case also that there is evidence of one common component but not of two.
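This second example can be checked in the same way (a sketch assuming numpy; for two degrees of freedom the upper-tail χ² probability is exactly exp(−x/2), which avoids any need for special functions):

```python
import math
import numpy as np

R = np.array([
    [1.0000, 0.4248, 0.0420, 0.0215, 0.0573],
    [0.4248, 1.0000, 0.1487, 0.2489, 0.2843],
    [0.0420, 0.1487, 1.0000, 0.6693, 0.4662],
    [0.0215, 0.2489, 0.6693, 1.0000, 0.6915],
    [0.0573, 0.2843, 0.4662, 0.6915, 1.0000],
])
A, B, M = R[:2, :2], R[2:, 2:], R[:2, 2:]
K = np.linalg.solve(A, M) @ np.linalg.solve(B, M.T)
r = np.sqrt(np.sort(np.linalg.eigvals(K).real)[::-1])
print(r)        # approximately [0.3073, 0.0583]

# chi-square test for the smaller root: n r2^2 with t - 1 = 2 degrees of freedom
n = 139
x = n * r[1] ** 2
P = math.exp(-x / 2)    # exact upper-tail probability for 2 d.f.
print(x, P)             # approximately 0.47 and 0.79
```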
If each of the two sets contains more than two variates, the two invariants q and z do not suffice to determine the coefficients of the various powers of λ in the determinantal equation, so that its roots can no longer be calculated in the foregoing manner. The coefficients in the equation will involve other rational invariants in addition to q and z, but we shall not be concerned with these, and it is desirable to have a procedure that does not require their calculation, or the explicit determination and solution of the equation. It is also desirable to avoid the explicit solution of the sets of linear equations (3·4) and (3·5) when the variates are numerous, since the labour of the direct procedure then becomes excessive. These computational difficulties are analogous to those in the determination of the principal axes of a quadric in n-space, or of the principal components of a set of statistical variates, problems for which an iterative procedure has been found useful, and has been proved to converge to the correct values in all cases*. We shall now show how a process partly iterative in character may be applied to determine canonical variates and canonical correlations between two sets.
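Such a process amounts to power iteration: repeatedly multiplying a trial vector by the matrix obtained from solving the two sets of linear equations. A minimal sketch (assuming numpy; the covariances are constructed for illustration from canonical variates with ρ_1 = 0.8, ρ_2 = 0.3, the first set then being transformed by x_1' = x_1 + x_2 so that the matrices are not already diagonal):

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 1.0]])            # covariances within the first set
B = np.eye(2)                                     # within the second set
S12 = np.array([[0.8, 0.3], [0.0, 0.3]])          # between the sets

# As in the substitution below, lambda^2 a = K a with K = A^{-1} S12 B^{-1} S12'
K = np.linalg.solve(A, S12) @ np.linalg.solve(B, S12.T)

a = np.array([1.0, 1.0])                          # arbitrarily chosen starting values
for _ in range(50):
    a = K @ a                                     # one round of substitution
    a /= np.linalg.norm(a)                        # divide by a convenient constant

lam2 = a @ K @ a                                  # factor of proportionality lambda^2
print(np.sqrt(lam2))                              # approximately 0.8, the largest
                                                  # canonical correlation
```

Canonical correlations are invariant under the transformation of the first set, so the iteration recovers ρ_1 = 0.8 despite the changed covariances.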

If in the s equations (3.4) we regard λa₁, λa₂, ..., λa_s as the unknowns, we may solve for them in terms of the b's by the methods appropriate for solving normal equations. Indeed, the matrix of the coefficients of the unknowns is symmetrical; and in the solving process it is only necessary to carry along, instead of a single column of right-hand members, t columns, from which the coefficients of b_{s+1}, ..., b_{s+t} in the expressions for a₁, ..., a_s are to be determined. The entries initially placed

* Harold Hotelling, "Analysis of a Complex of Statistical Variables into Principal Components," in Journal of Educational Psychology, Vol. XXIV, pp. 417–441 and 498–520 (September and October, 1933), Section 4.
HAROLD HOTELLING 346

in these columns are of course the covariances between the two sets. Let the solution of these equations consist of the s expressions

λa_α = Σ_i g_{αi} b_i    (α = 1, 2, ..., s)    (6.3).

In exactly the same way the t equations (3.5), with μ replaced by λ, may be solved for λb_{s+1}, ..., λb_{s+t} in the form

λb_i = Σ_α h_{iα} a_α    (i = s+1, ..., s+t)    (6.4).

If we substitute from (6.4) in (6.3) and set

k_{αβ} = Σ_i g_{αi} h_{iβ}    (6.5),

we have

λ²a_α = Σ_β k_{αβ} a_β    (6.6).
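The reduction (6.3)–(6.6) can be sketched in code. This is a minimal illustration, not Hotelling's paper-and-pencil scheme with check columns: solve the two sets of normal equations for the matrices G and H and multiply them to get K. The function names are ours.

```python
def gauss_solve(a, b):
    """Solve a·X = b (a square, b a matrix of right-hand columns) by
    Gauss-Jordan elimination -- the 'methods appropriate for solving
    normal equations'; no pivoting, adequate for correlation matrices."""
    n = len(a)
    m = [list(ra) + list(rb) for ra, rb in zip(a, b)]
    for p in range(n):
        piv = m[p][p]
        m[p] = [v / piv for v in m[p]]
        for r in range(n):
            if r != p:
                f = m[r][p]
                m[r] = [v - f * w for v, w in zip(m[r], m[p])]
    return [row[n:] for row in m]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def canonical_k(s11, s12, s22):
    """G solves (6.3), H solves (6.4), and K = G·H carries the k's of
    (6.5); by (6.6) its dominant eigenvalue is the squared largest
    canonical correlation."""
    s21 = [list(col) for col in zip(*s12)]
    g = gauss_solve(s11, s12)
    h = gauss_solve(s22, s21)
    return matmul(g, h)

# the correlation blocks of the worked example later in this section
S11 = [[1.0, .7, .1], [.7, 1.0, .1], [.1, .1, 1.0]]
S12 = [[.5, .4, .2], [.4, .3, .5], [.2, .2, .4]]
S22 = [[1.0, .8, .6], [.8, 1.0, .7], [.6, .7, 1.0]]
K = canonical_k(S11, S12, S22)
```

Running this on the example blocks reproduces, to the three decimals printed there, the matrix K of the text.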

Now if an arbitrarily chosen set of numbers be substituted for a₁, ..., a_s in the right-hand members of (6.6), the sums obtained will be proportional to the numbers substituted only if they are proportional to the true values of a₁, ..., a_s. If, as will usually be the case, the proportionality does not hold, the sums obtained, multiplied or divided by any convenient constant, may be used as second approximations to solutions a₁, ..., a_s of the equations. Substitution of these second approximations in the right-hand members of (6.6) gives third approximations which may be treated in the same way; and so on. Repetition of this process gives repeated sets of trial values, whose ratios will be seen below to approach as limits those among the true values of a₁, ..., a_s. The factor of proportionality λ² in (6.6) becomes r₁², the square of the largest canonical correlation. When the quantities a₁′, ..., a_s′ eventually determined as sufficiently nearly proportional to a₁, ..., a_s are substituted in the right-hand members of (6.4), there result quantities b_{s+1}′, ..., b_{s+t}′ proportional to b_{s+1}, ..., b_{s+t}, apart from errors which may be made arbitrarily small by continuation of the iterative process. The factor of proportionality to be applied in order to obtain linear functions with unit variance is the same for the a's and the b's; from (3.2), (3.4), and (3.5) it may readily be shown that if from the quantities obtained we calculate the normalizing factor m of (6.7), then the true coefficients of the first pair of canonical variates are ma₁′, ..., ma_s′, mb_{s+1}′, ..., mb_{s+t}′.
In the iterative process, if a₁, ..., a_s represent trial values at any stage, those at the next stage will be proportional to

a_α′ = Σ_β k_{αβ} a_β    (6.8).

Another application of the process gives

a_α″ = Σ_β k_{αβ} a_β′,

whence, substituting, we have

a_α″ = Σ_γ k*_{αγ} a_γ,

provided we put

k*_{αγ} = Σ_β k_{αβ} k_{βγ}.
The last equation is equivalent to the statement that the matrix K² of the coefficients k*_{αβ} is the square of the matrix K of the k_{αβ}. It follows therefore that one application of the iterative process by means of the squared matrix is exactly equivalent to two successive applications with the original matrix. This means that if at the beginning we square the matrix only half the number of steps will subsequently be required for a given degree of accuracy.
The number of steps required may again be cut in half if we square K², for with the resulting matrix K⁴ one iteration is exactly equivalent to four with the original matrix. Squaring again we obtain K⁸, with which one iteration is equivalent to eight, and so on. This method of accelerating convergence is also applicable to the calculation of principal components*. It embodies the root-squaring principle of solving algebraic equations in a form specially suited to determinantal equations.
After each iteration it is advisable to divide all the trial values obtained by a particular one of them, say the first, so as to make successive values comparable. The value obtained for a₁′, if this is the one used to divide the rest at each step, will approach r₁² if the matrix K is used in the iteration, but will approach r₁⁴ if K² is used, r₁⁸ if K⁴ is used, and so forth. When stationary values are reached, they may well be subjected once to iteration by means of K itself, both in order to determine r₁² without extracting a root of high order, and as a check on the matrix-squaring operations.
If our covariances are derived from a sample from a continuous multivariate distribution, it is infinitely improbable that the equation in ω,

| k_{αβ} − ω δ_{αβ} | = 0,

has multiple roots. If we assume that the roots ω₁, ω₂, ..., ω_s are all simple, and regard a₁, ..., a_s as the homogeneous coordinates of a point in s−1 dimensions which is moved by the collineation (6.8) into a point (a₁′, ..., a_s′), we know† that there exists in this space a transformed system of coordinates such that the collineation is represented in terms of them by

a_α′ = ω_α a_α.

Another iteration yields a point whose transformed homogeneous coordinates are proportional to

ω₁²a₁, ω₂²a₂, ..., ω_s²a_s.

Continuation of this process means, if ω₁ is the root of greatest absolute value, that
* Another method of accelerated iterative calculation of principal components is given by T. L. Kelley in Essential Traits of Mental Life, Cambridge, Mass., 1935. A method similar to that given above is applied to principal components by the author in Psychometrika, Vol. I, No. 1 (1936).
† Bôcher, p. 298.

the ratio of the first transformed coordinate to any of the others increases in geometric progression. Consequently the moving point approaches as a limit the invariant point corresponding to this greatest root. Therefore the ratios of the trial values of a₁, ..., a_s will approach those among the coefficients in the expression for the canonical variate corresponding to the greatest canonical correlation. Thus the iterative process is seen to converge, just as in the determination of principal components.
After the greatest canonical correlation and the corresponding canonical variates
are determined, it is possible to construct a new matrix of covariances of deviations
from these canonical variates. When the iterative process is applied to this new
matrix, the second largest canonical correlation and the corresponding canonical



variates are obtained. This procedure may be carried as far as desired to obtain additional canonical correlations and variates, as in the method of principal components; but the later stages of the process will yield results which will usually be of diminishing importance. The modification of the matrix is somewhat more complicated than in the case of principal components, and we shall omit further discussion of this extension.
The process of obtaining iteratively the greatest canonical correlation, the most predictable criterion, and the best predicter may be illustrated if we imagine that, with three variates in each set, we have obtained from a sample the matrix of correlations

1.0   .7   .1   .5   .4   .2
 .7  1.0   .1   .4   .3   .5
 .1   .1  1.0   .2   .2   .4
 .5   .4   .2  1.0   .8   .6
 .4   .3   .2   .8  1.0   .7
 .2   .5   .4   .6   .7  1.0

From the first three rows we obtain the set of normal equations indicated by

1.0   .7   .1   .5   .4   .2   2.9
      1.0   .1   .4   .3   .5   3.0
            1.0   .2   .2   .4   2.0

Here the second and third rows are understood to be filled out with unwritten terms in such a way as to make the matrix consisting of the first three columns symmetric. The entries in the last column are the sums of those written or understood in the respective rows preceding them. By linear operations on the rows, equivalent to solving the equations, they are reduced to

1   .423   .362   −.316   1.470
1   .089   .031    .685   1.804
1   .149   .161    .362   1.671
This is the numerical equivalent of (6.3). Hence g_{αi} is the element in the αth row and ith column of the matrix

      .423   .362  −.316
G =   .089   .031   .685
      .149   .161   .362
From the last three columns of the given matrix of correlations we obtain likewise the normal equations indicated by

1.0   .8   .6   .5   .4   .2   3.5
      1.0   .7   .4   .3   .2   3.4
            1.0   .2   .5   .4   3.4

The three columns before the check column appear in the same order as in the lower left corner of the matrix of correlations. The solutions of these equations, corresponding to (6.4), are the elements of the matrix

       .522   .385   .054
H =    .121  −.385  −.199
      −.198   .539   .507
in which h_{iβ} is the element in the ith row and βth column. Upon multiplying the rows of G by the columns of H we find that k_{αβ}, defined by (6.5), is in the αth row and βth column of

       .327  −.147  −.209
K =   −.085   .392   .346
       .026   .190   .160
The check columns are used to verify the calculation to this point, and may be used also at the next stage, which is to compute, by multiplying the rows of K by its columns,

        .114  −.145  −.153
K² =   −.052   .232   .209
       −.003   .101   .086

and in the same way,

        .021  −.066  −.061
K⁴ =   −.019   .082   .074
       −.006   .033   .029
The iteration process may now be begun with the trial values 1, 1, 1; when this set (which may be regarded as a vector) is multiplied by the rows of K⁴ there results simply the set of sums of rows, namely

−.106   .137   .056.

Dividing all three of these by the first we have

1.0   −1.3   −.5.

Multiplying this vector by the rows of K⁴ gives

.137   −.163   −.063,

which upon division by the first becomes

1.00   −1.19   −.46.

Multiplication of this vector by the rows of K⁴ and division by the first resulting element gives

1.00   −1.18   −.46,
which upon another repetition of the process recurs exactly to the two decimal places. We therefore return to the matrix K with these trial values; multiplying them by the rows of K, dividing by the first, and then repeating the process once, we have the values

.5968   −.707   −.272,

which, divided by the first, become

a₁′ = 1,   a₂′ = −1.187,   a₃′ = −.456;

these are stationary under further iterations, and are correct to three decimal places. The last divisor, .5968, is the square of the greatest canonical correlation, also correct to three places; hence r₁ = √.5968 = .773. Substitution of a₁′, a₂′, a₃′ in the right members of (6.4), which comes to the same thing as multiplication by the rows of H, yields

b₄′ = .040,   b₅′ = .669,   b₆′ = −1.069.

Then from (6.7) we have m = 1.016; when this is multiplied by the values of a′ and b′ just found, there result the coefficients in the expressions for the leading canonical variates, namely

u₁ = 1.016x₁ − 1.206x₂ − .463x₃,
v₁ = .041x₄ + .680x₅ − 1.086x₆,

which have unit variances and the correlation .773.
7. The Vector Correlation as a Product of Correlations or of Cosines. We shall in this section define certain linear functions of the variates in each set, forming two sequences, of which the product of the correlations between corresponding members is the vector correlation q. This result will be used in Section 8 to obtain an exact sampling distribution of q. The resolution, though valid with respect to the population, needs for our purposes to be made with reference to a sample. We shall use the pseudo-observations defined in Section 5, but shall write the sample covariance (5.10) in the form

S x_L x_M / n    (7.1),

where S stands for summation from 1 to n, the number of degrees of freedom, and where L and M stand for an arbitrary pair, equal or unequal, of the subscripts 1, 2, ..., s+t.
The sequences of variates which we shall consider may be defined as follows. First, let x₁′ = x₁. Then let x_α′ (α = 2, 3, ..., s) be the difference between x_α and a least-square estimate of x_α in terms of x₁, ..., x_{α−1}, all divided by such a constant that the variance of x_α′ is unity. To define the other sequence, let x_{s+1}′ be a linear function of x_{s+1}, ..., x_{s+t} having maximum correlation with x₁′; and let x_{s+β}′ (β = 2, ..., s) be a linear function uncorrelated with x_{s+1}′, ..., x_{s+β−1}′, and having maximum correlation with x_β′. All these are to have unit variances. For a sample we may set aside as infinitely improbable the possibility that any of these new variates should be indeterminate. Putting R_β for the correlation of x_β′ with x_{s+β}′, we shall find that

q = R₁R₂ ... R_s    (7.2).
The process will be more perspicuous in geometrical than in algebraic language because of the simplicity of the geometry associated with samples from normal distributions, and the remainder of this section will be in geometrical terms. In the space of n dimensions in which the pseudo-observations of a variate are the coordinates of a point, there is for each variate a spherically symmetrical distribution of probability density centred at the origin. Let X_L denote the point whose coordinates are the pseudo-observations x_{L1}, x_{L2}, ..., x_{Ln} on the variate x_L (L = 1, 2, ..., s+t). Let P_t be the flat space of t dimensions containing the origin O and the points X_{s+1}, ..., X_{s+t} determined by the second set of variates. Perpendicular to OX₁ will be a flat space of n−1 dimensions, whose intersection with P_t will in general be of t−1 dimensions. Denote this intersection by P_{t−1}. Let P_{t−2} be the flat space of t−2 dimensions contained in P_{t−1} and perpendicular to OX₂; and so forth.
Let X₁′ be the point on OX₁ at unit distance from the origin. We further define points X₂′, ..., X_s′, all at unit distance from the origin, such that OX₁′, OX₂′, ..., OX_s′ are mutually perpendicular, and such that OX₂′ is coplanar with OX₁ and OX₂; OX₃′ is in the same flat 3-space with OX₁, OX₂ and OX₃; and so forth. The coordinates x_{α1}′, ..., x_{αn}′ of these s points will thus satisfy

S x_α′ x_β′ = δ_{αβ}    (7.3),

where δ_{αβ} is the Kronecker delta, equal to unity if α = β but otherwise zero, and
where S stands for summation from 1 to n.
Let us also rotate the n axes so that the first t of them lie in P_t; and let an internal transformation be performed upon the t variates of the second set such that, for this particular sample, the coordinates representing them become

1  0  0  ...  0  ...  0
0  1  0  ...  0  ...  0
0  0  1  ...  0  ...  0    (7.4).
.  .  .       .       .
0  0  0  ...  1  ...  0
None of these transformations affects the value of q, and we have, adapting the definition (4.2) to this case, by replacing the covariances by the functions (7.1) of x₁₁′, ..., x_{sn}′ and of the elements of (7.4), and then multiplying each row of each determinant by n,

q² = C′² / (A′B′),

where now A′ = B′ = 1, while C′ is the determinant formed from the functions (7.1) of the coordinates x_{α1}′, ..., x_{αn}′ and of the rows of (7.4)    (7.5).
Now letting Σ stand for summation from 1 to t, we introduce determinants

D_β = | Σ x_α′ x_γ′ |    (α, γ = 1, 2, ..., β)    (7.6).

Upon expanding (7.5) with respect to the first s rows and columns we find, with the help of Section 2,

q² = D_s    (7.7).
Now any line perpendicular to OX₁ and OX₂ is perpendicular to all the lines in the plane of these two, in particular to OX₂′. Hence P_{t−2}, which consists of lines perpendicular to OX₁ and OX₂, is perpendicular to OX₂′. In like manner, P_{t−3} is perpendicular to OX₁′, OX₂′ and OX₃′; and in general P_{t−β} is perpendicular to OX₁′, OX₂′, ..., OX_β′.
Since P_{t−β} lies entirely within P_t, the coordinates of any point U in P_{t−β} will be linearly dependent on the rows of (7.4), and so of the form

u₁, u₂, ..., u_t, 0, 0, ..., 0    (7.8).

The orthogonality of OU to OX₁′, ..., OX_β′ means that

Σ u x_α′ = 0    (α = 1, 2, ..., β)    (7.9).
Now let θ_{β+1} denote the angle that OX_{β+1}′ makes with P_{t−β}; that is, θ_{β+1} is the minimum angle of OX_{β+1}′ with a line OU such that the coordinates of U are of the form (7.8) and satisfy (7.9). Without loss of generality we may also take U at unit distance from the origin, so that

Σu² = 1    (7.10).

Since S x_{β+1}′² = 1 by (7.3), we then have

cos θ_{β+1} = Σ u x_{β+1}′    (7.11).

To determine the minimum angle we therefore differentiate with respect to u₁, ..., u_t the expression

Σ u x_{β+1}′ − λ₁ Σ u x₁′ − ... − λ_β Σ u x_β′ − ½γ Σu²,
where γ, λ₁, ..., λ_β are Lagrange multipliers. This gives

λ₁x_{1h}′ + ... + λ_βx_{βh}′ = x_{β+1,h}′ − γu_h    (h = 1, ..., t)    (7.12).

Multiply (7.12) by u_h and sum with respect to h. The left member disappears by (7.9), and from (7.10) and (7.11) we have

γ = cos θ_{β+1}    (7.13).

Upon multiplying (7.12) by x_{αh}′, summing with respect to h, and using (7.9), we have, for α = 1, 2, ..., β,

λ₁ Σ x₁′x_α′ + ... + λ_β Σ x_β′x_α′ = Σ x_α′x_{β+1}′    (7.14).



Eliminating λ₁, ..., λ_β from the β+1 equations (7.14) and (7.12) we have

| Σx₁′x₁′   ...   Σx₁′x_β′    Σx₁′x_{β+1}′      |
| ......................................................... |
| Σx_β′x₁′  ...   Σx_β′x_β′   Σx_β′x_{β+1}′     |  =  0    (7.15).
| x_{1h}′   ...   x_{βh}′     x_{β+1,h}′ − γu_h |

Multiply the last row of this determinant by x_{β+1,h}′ and sum with respect to h from 1 to t. The last element, with the help of (7.11) and (7.13), reduces to

Σx_{β+1}′² − γ²,

and so, from (7.6), we have

γ² = D_{β+1}/D_β    (7.16).

Hence, from (7.13),

cos θ_{β+1} = √(D_{β+1}/D_β)    (β = 1, 2, ..., s−1)    (7.17).

The cosine of the angle which OX₁ makes with P_t is

cos θ₁ = √(Σx₁′²) = √D₁    (7.18).

Multiplying together all the equations (7.17) and (7.18) and recalling (7.7), we obtain

q = cos θ₁ cos θ₂ ... cos θ_s    (7.19).

It is obvious that the correlations R_β defined at the beginning of this section have the property that

R_β = cos θ_β,

so that (7.19) is equivalent to (7.2).

8. An Exact Sampling Distribution of q. We shall now deduce the exact distribution of q in samples from a multivariate normal population in which the vector correlation is zero, for the case in which one of the sets consists of exactly two variates. From (4.5) it follows in this case that at least one of the two canonical correlations is zero. If the numbers of variates in both sets are 2, we have essentially the case of the tetrad difference; the distribution will then be symmetrical, since the population value is assumed to be zero. Let ρ₂ = 0, and for brevity put v for ρ₁², which will be a parameter of the distribution.

The angle θ₁ defined in Section 7 between the line OX₁ determined by the sample values of the first variate and the flat space P_t determined by those of the second set has the property that R₁ = cos θ₁ is the multiple correlation of x₁ with the second set of variates. The population value of this multiple correlation is ρ₁. We assume all the variates subject to random sampling. In this case R₁ will have the "A" distribution discovered by R. A. Fisher*. In our notation, with samples of n+1 from which the means have been eliminated, or in samples of n+k from which k degrees of freedom have been removed by least-squares elimination of other variates, the distribution of R₁ is

[Γ(n/2) / {Γ(t/2)Γ((n−t)/2)}] (1−v)^{n/2} (R₁²)^{(t−2)/2} (1−R₁²)^{(n−t−2)/2} F(n/2, n/2, t/2, vR₁²) d(R₁²)    (8.1),
with F denoting the hypergeometric function.
The points X₂ of Section 7 corresponding to an infinity of samples form a globular cluster having spherical symmetry with centre at the origin, in the flat space of n−1 dimensions perpendicular to OX₁. In this flat space is the space P_{t−1}, which makes with OX₂′ the angle θ₂. Hence R₂ = cos θ₂ has the distribution of a multiple correlation coefficient in samples from an uncorrelated normal population, with t−1 "independent" variates. We replace n in (8.1) by n−1, t by t−1, R₁ by R₂, v by zero, and have

[Γ((n−1)/2) / {Γ((t−1)/2)Γ((n−t)/2)}] (R₂²)^{(t−3)/2} (1−R₂²)^{(n−t−2)/2} d(R₂²)    (8.2).
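Under complete independence (v = 0) both factors reduce to Beta laws — by our reading, (8.1) to a Beta(t/2, (n−t)/2) density for R₁², the standard null distribution of a squared multiple correlation, and (8.2) to Beta((t−1)/2, (n−t)/2) for R₂² — so the null distribution of q = R₁R₂ is easy to simulate. A sketch; the moment checked is E[q²] = (t/n)·((t−1)/(n−1)):

```python
import random

def null_q(n, t, rng):
    """One draw of q = R1·R2 when v = 0: R1² is Beta(t/2, (n−t)/2) by the
    null case of (8.1); independently, R2² is Beta((t−1)/2, (n−t)/2)
    by (8.2)."""
    r1_sq = rng.betavariate(t / 2, (n - t) / 2)
    r2_sq = rng.betavariate((t - 1) / 2, (n - t) / 2)
    return (r1_sq * r2_sq) ** 0.5

rng = random.Random(1936)
n, t = 20, 3
mean_q_sq = sum(null_q(n, t, rng) ** 2 for _ in range(200000)) / 200000
```

Since the mean of a Beta(a, b) variate is a/(a+b), the Monte Carlo mean of q² should settle near (t/n)·((t−1)/(n−1)).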

From (7.2) we have q = R₁R₂. Hence put R₂² = q²/R₁², d(R₂²) = d(q²)/R₁² in (8.2), multiply by (8.1), and integrate with respect to R₁² from q² to 1. This gives the distribution of q in the form

[Γ(n/2)Γ((n−1)/2) / {Γ(t/2)Γ((t−1)/2)Γ²((n−t)/2)}] (1−v)^{n/2} (q²)^{(t−3)/2} d(q²)
  × ∫_{q²}^{1} [(1−R²)(R²−q²)]^{(n−t−2)/2} (R²)^{(−n+t+1)/2} F(n/2, n/2, t/2, vR²) d(R²)    (8.3),

where the subscript is dropped from the variable of integration.

* "The General Sampling Distribution of the Multiple Correlation Coefficient," in Proceedings of the Royal Society, Vol. CXXI A (1928), p. 660.
Biometrika XXVIII
Making the substitution R² = 1 − (1−q²)x, so that (1−R²)(R²−q²) = (1−q²)²x(1−x), and changing the variable of integration to x, we have therefore

[Γ(n/2)Γ((n−1)/2) / {Γ(t/2)Γ((t−1)/2)Γ²((n−t)/2)}] (1−v)^{n/2} (q²)^{(t−3)/2} (1−q²)^{n−t−1} d(q²)
  × ∫₀¹ [x(1−x)]^{(n−t−2)/2} [1−(1−q²)x]^{(−n+t+1)/2} F(n/2, n/2, t/2, v{1−(1−q²)x}) dx    (8.4).



If it is supposed that the values of the variates in the second set are fixed, instead of varying normally from sample to sample, Fisher's distribution "B" of the multiple correlation coefficient should be used instead of his "A" distribution. This gives for q a new distribution, which for s = 2 may be written in terms of a confluent hypergeometric function    (8.5).
However the conditions of sampling under which q is likely to be used are such that (8.4) appears to be the more important form, and we shall give no further consideration to (8.5).
By extension of the reasoning above the distribution of q may be found for larger values of s, provided all but one of the parameters ρ₁, ..., ρ_s vanish. Thus for s = 3 we have, from (7.2),

q = R₁R₂R₃ = q′R₃,

where q′ has the distribution (8.3), i.e. (8.4), while R₃ has the distribution obtained from (8.2) by replacing R₂ by R₃, n by n−1, and t by t−1. Combining this with (8.4) in the same manner that (8.2) was combined with (8.1) to produce (8.3), the new distribution of q is obtained. This process may be repeated to obtain the distribution for values of s as great as desired; but it must of course be remembered that s ≤ t.
It is tempting to try to obtain the general distribution of q, without our assumption that all but one of the quantities ρ₁, ..., ρ_s are zero, by treating these as population values of the multiple correlations whose distributions are used successively in finding the distribution of q as above. However this suggested procedure appears to be incorrect. If ρ₂ ≠ 0, the globular cluster formed by the projected X₂ points will have a centre which is not on P_{t−1}, where it should be if the multiple correlation distribution were to be applied.
9. Moments of q. The Distribution for Large Samples. We shall derive the even moments of the distribution of Section 8, assuming that v does not take either of the extreme values 0 and 1. The latter is the case in which q becomes a partial correlation coefficient, the theory of which is well understood. The case v = 0, corresponding to complete independence, is a very simple one, concerning which all information desired may be obtained from Section 11. The moments will be obtained by processes involving repeated interchanges of order of the processes of integration, differentiation, and summation of series. It will be observed that the uniform convergence and continuity required to justify these interchanges exist, provided v is definitely between 0 and 1 without taking either of these values.
The odd moments of q about zero, which require no consideration unless s = t, vanish in this case when only one of the canonical correlations is different from zero, since the distribution of q must then be symmetrical. Let μ₂ₖ be the 2kth moment of q, which is also the kth moment of q², about zero. To determine its value, multiply (8.3) by q^{2k} and integrate with respect to q² from 0 to 1. In the double integral thus obtained, a reversal of the order of integration means that q² will vary from 0 to R², and then R² will vary from 0 to 1. The first integration in this new order may be effected at once, since upon putting q² = R²z, d(q²) = R²dz, we have

∫₀^{R²} (q²)^{k+(t−3)/2} (R²−q²)^{(n−t−2)/2} d(q²) = (R²)^{k+(n−3)/2} Γ(k+(t−1)/2)Γ((n−t)/2) / Γ(k+(n−1)/2).
Therefore

μ₂ₖ = [Γ(n/2)Γ((n−1)/2)Γ(k+(t−1)/2) / {Γ(t/2)Γ((t−1)/2)Γ((n−t)/2)Γ(k+(n−1)/2)}] (1−v)^{n/2}
  × ∫₀¹ (R²)^{k+(t−2)/2} (1−R²)^{(n−t−2)/2} F(n/2, n/2, t/2, vR²) d(R²).

Expanding the hypergeometric series and integrating term by term, since

∫₀¹ (R²)^{k+r+(t−2)/2} (1−R²)^{(n−t−2)/2} d(R²) = Γ(k+r+t/2)Γ((n−t)/2) / Γ(k+r+n/2),

we obtain a series for μ₂ₖ in powers of v    (9.1).

In this we make the substitution (9.2), which replaces one of the r-dependent ratios of Gamma functions by a Beta-function integral in a new variable x; upon reversing the order of integration and summation we have μ₂ₖ expressed as (1−v)^{n/2} times a single integral in x of the resulting power series in vx.

Now make the further substitution

[Γ(n/2+k+r) / Γ(n/2+r)] x^{n/2+r−1} = d^k/dx^k x^{n/2+k+r−1}    (9.3),

which gives, upon interchanging the order of summation and differentiation, a kth derivative under the integral sign. The sum is now a binomial expansion and

μ₂ₖ = C_k (1−v)^{n/2} ∫₀¹ (1−x)^{(n−t)/2−1} d^k/dx^k [x^{n/2+k−1}(1−vx)^{−n/2}] dx    (9.4),

where C_k denotes the constant factor collected from the preceding steps.

From this form, by k successive integrations by parts, it is easy in any particular case to calculate μ₂ₖ. Thus for k = 1 we obtain an explicit expression for μ₂    (9.5).
Euler's transformation (10.1) of the hypergeometric function reduces this to a form in which the series can easily be calculated to any required accuracy. For large samples the convergence is extremely rapid. A series of powers of n⁻¹ may also be obtained by expanding each term of the hypergeometric series    (9.6).
This form brings out the manner in which, for large samples, μ₂ varies with v.
The moments may alternatively be found by a slightly different method, giving a general result in which an integral does not appear explicitly. In the identity (9.3) let x be replaced by v, and make both this substitution and (9.2) in (9.1). The resulting integral can be evaluated in terms of Gamma functions, and there results an expression (9.7) for μ₂ₖ as a kth derivative with respect to v. This may be made even more explicit by performing the differentiation with the help of Leibnitz' theorem; and Euler's transformation may be applied to each of the hypergeometric functions to give a rapidly convergent series. In this way we obtain a finite sum (9.8), whose term in r carries the binomial coefficient k!/{r!(k−r)!} and the factor

(1−v)^r F(k, k, n/2+2k−r, v).
The expressions obtained from this by substituting particular values of k are different in form from those obtained directly from (9.4), but are reducible to them with the help of the Gauss relations between "neighbouring" hypergeometric functions.
From (9.7) it is easy to see that μ₀ = 1, as it should be; this checks a long chain of deductions.
The asymptotic value of μ₂ₖ for large values of n will now be investigated. In the expression for the kth derivative obtained from (9.4) by Leibnitz' theorem, the term of highest order in n is that in which the factor (1−vx)^{−n/2} is differentiated k times; evaluating the resulting integral for large n, we obtain the asymptotic form of the moments    (9.9).
Hence, for s = t = 2, the distribution of q approaches the normal form, with variance proportional to 1/n, as is seen either from (9.9) or from (9.4), and mean value zero.
For t > 2, the distribution of q does not behave in this way. As in the case of multiple correlation, it is then confined to positive values. An approximation to the distribution is however suggested by the foregoing asymptotic values of the moments. These are in fact the moments of the χ² distribution with t−1 degrees of freedom, if we put q equal to a suitable multiple of χ². The approximate distributions thus obtained may tentatively be used for testing the significance of q in large samples when v has a value not too close to zero. For small samples and small values of v, the methods of the next section are appropriate.

10. The Distribution for Small Samples*. Form of the Frequency Curve. The distribution of Section 8 may in certain cases be expressed in elementary forms. If n − t is even, the Euler transformation

F(a, b, c, x) = (1−x)^{c−a−b} F(c−a, c−b, c, x)    (10.1)

may be applied to the hypergeometric function in the distribution to give a terminating series, and the integration can then be carried out for each term, the integrand in (8.3) being a rational function of R, or involving (if t is odd) a single quadratic surd.
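The transformation (10.1) is easy to check numerically. In this throwaway sketch `hyp2f1` is our partial-sum evaluator, not a library routine; the parameter values chosen are exactly those of the example that follows, for which the right-hand side terminates.

```python
def hyp2f1(a, b, c, x, terms=80):
    """Partial sum of the Gauss series F(a, b, c, x); enough terms for |x| < 1."""
    total, coeff = 0.0, 1.0
    for r in range(terms):
        total += coeff
        coeff *= (a + r) * (b + r) / ((c + r) * (r + 1)) * x
    return total

# Euler's transformation (10.1): F(a, b, c, x) = (1-x)^(c-a-b) F(c-a, c-b, c, x)
a, b, c, x = 2.0, 2.0, 1.0, 0.3
lhs = hyp2f1(a, b, c, x)
rhs = (1 - x) ** (c - a - b) * hyp2f1(c - a, c - b, c, x)
```

Here c − a = c − b = −1, so the transformed series is the two-term polynomial 1 + x, and F(2, 2, 1, x) = (1−x)⁻³(1+x).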
Consider for example the simple case t = 2, n = 4. Since t = s, it is more convenient to work with the distribution of q than with that of q². We halve the numerical coefficient because negative and positive sample values are distinguishable. The distribution (8.3) becomes, for positive values of q,

(1−v)² dq ∫_q¹ F(2, 2, 1, vR²) dR.

Using (10.1), expanding F(−1, −1, 1, vR²), and putting √v = ρ, this becomes

(1−ρ²)² dq ∫_q¹ (1−ρ²R²)^{−3} (1+ρ²R²) dR,

or, carrying out the integration,

[(1−ρ²)²/(8ρ)] { log[(1+ρ)(1−ρq) / {(1−ρ)(1+ρq)}] + 2ρ(3−ρ²)/(1−ρ²)² − 2ρq(3−ρ²q²)/(1−ρ²q²)² } dq.
Other special cases of the distribution function may be obtained by integrating the distributions obtained by R. A. Fisher for the multiple correlation coefficient for even values of n†.
A more systematic development, not depending on the oddness or evenness of n or t, and valuable when n and v are not too great, is obtained by expanding the hypergeometric series in powers of v and carrying out the integration term by term. Applying this procedure to (8.4) we obtain integrals of the form

∫₀¹ [x(1−x)]^{(n−t−2)/2} [1−(1−q²)x]^{(−n+t+1)/2 + k} dx    (k = 0, 1, 2, ...).
But this is itself a hypergeometric integral, and may be evaluated in terms of the hypergeometric function; for brevity we put

m = n − t    (10.2).


* The smallest samples to which the text is applicable are those for which n = s + t. For smaller samples, the matrix of pseudo-observations has more rows than columns; consequently there is a linear relation among the rows, i.e. among the sample variates, whose number is thus in effect reduced, so that a simpler theory is adequate. Thus, if s = t = 2 and n = 3, q reduces to a partial correlation coefficient, whose distribution is known.
† Op. cit., p. 661.
Introducing the functions v_k defined by (10.3) — each a hypergeometric function of argument 1 − q², with parameters depending on k and m, multiplied by a power of (1+q)/2 — the distribution (8.4) may thus be written, for q > 0, as a series in the v_k    (10.4).
A factor ½ must be applied to this expression in the case t = 2 if we then distinguish negative from positive values of q.
The first of the "relationes inter functiones contiguas" of Gauss* is

(c − a)F(a−1, b, c, x) + (2a − c + (b − a)x)F(a, b, c, x) + a(x − 1)F(a+1, b, c, x) = 0.

Putting for a, b, c, x the values occurring in (10.3), this shows that (10.3) satisfies a linear difference equation (10.5), which may be used as a recurrence relation for computing the successive v_k's as soon as we have determined two values whose indices differ by unity.
From the identity of Gauss (ibid., p. 227)

F(α, β, α+β+½, x) = F(2α, 2β, α+β+½, ½{1 − √(1−x)})

we find at once the value (10.6) of the hypergeometric function appearing in (10.3). In the series expansion of this last function, numerator and denominator factors cancel in each term in such a way as to leave a binomial expansion. In this way we have, from (10.3) and (10.6),

v₀ = 1    (10.7).

In (10.3) put k = −1, and apply (10.1). The result reduces, with the help of (10.6), to

v₋₁ = q⁻¹.

We have thus obtained two consecutive values of v_k, from which the rest are successively determined by means of (10.5), a relation which may also be written so as to give v_{k+1} in terms of (1 + q²)v_k − 2q²v_{k−1}    (10.8).

* Werke, Vol. III, p. 180.



We thus find in turn the succeeding values v₁, v₂, .... It is easy to show by means of the recurrence relation that the limit of v_k as m increases is q^k, and that the remaining terms of v_k constitute a polynomial in q having the factor (1 − q)².
For a test of significance of q it is necessary to integrate the distribution from an arbitrary value to unity. For this purpose it is convenient to put p = 1 − q and to expand the functions v_k in powers of p.
The distribution (104) may, for the leading case t = 2, be written

The series is uniformly convergent and may be integrated term by term, thus providing a test of significance for the tetrad difference. It is not, however, very convenient for computation unless $n$ and $\nu$ are small. For large values of $n$, the method of the preceding section may be used: the standard error often gives a satisfactory test of significance, even when used with the crude inequality of Tchebycheff, which takes no account of the nature of the particular distribution.
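The Tchebycheff inequality mentioned here is trivial to apply in code. The sketch below is an illustration of mine, not part of the paper: it computes the bound $P(|X-\mu|\ge k\sigma)\le 1/k^2$ for a statistic whose standard error is known.

```python
def tchebycheff_bound(deviation, std_error):
    """Upper bound, by Tchebycheff's inequality, on the probability that a
    statistic deviates from its mean by at least `deviation`, given its
    standard error.  No assumption on the form of the distribution."""
    if deviation <= 0:
        raise ValueError("deviation must be positive")
    k = deviation / std_error
    return min(1.0, 1.0 / k**2)

# A deviation of three standard errors has probability at most 1/9,
# whatever the distribution.
print(tchebycheff_bound(3, 1))  # → 0.1111111111111111
```

The bound is crude, as the text notes, but needs no knowledge of the particular distribution.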
Light is thrown on the form of the frequency curves by the expansions we have just obtained. The case $t = 2$ stands out as of a special character, different from the rest; this will be true in general where $s = t$. This special character is related to the fact that positive and negative sample values of $q$ are distinguishable only if $s = t$. In other cases, just as in that of the multiple correlation (i.e. that of $q$ when $s = 1$), the values must be taken as positive, and $q^2$ is in some respects a more natural variate to use.
The $v_k$, and therefore the convergent series in (10.4) and (10.9), and also the derivatives of the $v_k$ and of the series, take definite finite values both for $q = 0$ and for $q = 1$. From (10.4) it is therefore evident that the frequency curve for $q$ has, for $q = 1$, contact with the axis of order $m - 2$. For $q = 0$ the ordinate of the curve is zero for $t \ge 3$, but has a finite value for $t = 2$.
The derivative with respect to $q$ of the integral in (8.3) has, if $\nu < 1$, a finite negative value for $q = 0$ as well as for every positive value of $q$. The ordinate of the distribution curve for $t = 2$ will therefore have these properties. This curve must be symmetrical about $q = 0$. Hence it is not flat-topped, but has a corner above the origin.
But if $\nu = 1$ the distribution of $q$ for $s = t = 2$ does not have such a discontinuity in the middle. For in this case linear functions of the variates in the two sets exist which are perfectly correlated with each other, and are thus for our purposes identical. Taking these as $x_1$ and $x_3$, (1.2) shows that $q$ is in every sample the partial correlation of the remaining two variates. Hence when $\nu = 1$ the distribution becomes identically that of the partial correlation coefficient. According to R. A. Fisher's work*, this is the same as the distribution of the simple correlation coefficient, with the sample number reduced by unity, a distribution having continuous derivatives of all orders throughout its range.
11. Tests for Complete Independence. If $s = 2$ and both canonical correlations vanish, the normal distribution of the population implies complete independence between the two sets. No linear function of the first set is correlated with any of the second. In this case $\nu = 0$, and the distribution of $q$ reduces, as is at once evident from the form (10.4), together with (10.2) and (10.7), to the extremely simple form,
$$\frac{\Gamma(n-1)}{\Gamma(t-1)\,\Gamma(n-t)}\, q^{t-2}(1-q)^{n-t-1}\, dq \tag{11.1},$$
for positive values of $q$. Thus $q$ has in this case the same distribution as the square of the multiple correlation coefficient in samples of $n\ (= N-1)$ from an uncorrelated normal population, with $t - 1$ variates.
The question whether complete independence exists between two sets of variates for which we have sample correlations may be investigated by computing $q$ and determining from (11.1) whether the probability of so great a value of $q$ is negligible. This requires the integral of (11.1), which is easy to compute for any moderate value of $t$. For large values of $t$ it may be obtained from the Tables of the Incomplete Beta Function†. For $t = 2$ the probability of a greater value of $|q|$ if complete independence really exists is simply
$$P = (1 - |q|)^{n-2} \tag{11.2},$$
where $N$ is the number in the sample. In this way a very simple test for complete independence may be applied.
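The integrals required for moderate and large $t$ are incomplete Beta integrals; rather than consulting the cited tables, they may be computed directly. The sketch below is mine, not the paper's: it implements the regularized incomplete Beta function $I_x(a,b)$ by the standard continued-fraction expansion, which is what such tables tabulate.

```python
import math

def reg_inc_beta(a, b, x):
    """Regularized incomplete Beta function I_x(a, b), by the standard
    continued-fraction expansion (modified Lentz evaluation)."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    if x == 0.0 or x == 1.0:
        return x
    # Symmetry I_x(a, b) = 1 - I_{1-x}(b, a) keeps the fraction convergent.
    if x > (a + 1.0) / (a + b + 2.0):
        return 1.0 - reg_inc_beta(b, a, 1.0 - x)
    ln_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                + a * math.log(x) + b * math.log(1.0 - x))
    front = math.exp(ln_front) / a
    f, c, d = 1.0, 1.0, 0.0
    for i in range(200):
        m = i // 2
        if i == 0:
            num = 1.0                                # first convergent
        elif i % 2 == 0:
            num = m * (b - m) * x / ((a + 2*m - 1) * (a + 2*m))
        else:
            num = -(a + m) * (a + b + m) * x / ((a + 2*m) * (a + 2*m + 1))
        d = 1.0 / (1.0 + num * d if abs(1.0 + num * d) > 1e-30 else 1e-30)
        c = 1.0 + num / (c if abs(c) > 1e-30 else 1e-30)
        f *= c * d
        if abs(1.0 - c * d) < 1e-12:
            break
    return front * (f - 1.0)

# I_x(1, b) = 1 - (1-x)^b, a case easy to check by hand.
print(round(reg_inc_beta(1.0, 5.0, 0.2), 6))  # → 0.67232
```

With this function the tail probability of any Beta-type density of the kind appearing in this section is one subtraction away.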
But this is not by any means the only possible test of complete independence between two sets. Indeed, the distribution of the vector alienation coefficient (Section 4),
$$z = \frac{D}{D_1 D_2},$$
has been found by Wilks under this same hypothesis of complete independence and normality‡. This distribution, which was obtained by means of its moments, reduces for the case $s = 2$ which we are now studying to

* "The Distribution of the Partial Correlation Coefficient," Metron, Vol. III (1924), pp. 329–332.
† Biometrika Office, 1934.  ‡ Wilks, op. cit.

The range of possible values of $z$ is from 0 to 1, the latter corresponding to complete independence between the two sets of variates, just as does $q = 0$. $q$ and $z$ are not functionally related; for a continuum of values of either can be found which is consistent with any value of the other. The field of the joint distribution of the two is easily delimited by reference to the canonical correlations $r_1$ and $r_2$. Indeed, if we always take $q \ge 0$, we have from (4.10) that the field of variation of a point of coordinates $q, z$ is in the quadrant in which both are positive, and is bounded by the parabola
$$z = (1 - q)^2 \tag{11.4},$$
shown in Figure 1. The best agreement with the hypothesis of complete independence is shown by a sample for which $z = 1$ and $q = 0$, and which therefore corresponds to the point in the upper corner of the figure.

Fig. 1.

If we represent a sample by a point in a plane in which $r_1$ and $r_2$ are rectangular coordinates, and take $r_1$ as the greater, then the field of variation is the right triangle for which $0 \le r_2 \le r_1 \le 1$. The point corresponding to best agreement with the hypothesis of complete independence is in this case the origin. The curves $q = \text{constant}$ and $z = \text{constant}$, shown in Figure 2, are respectively hyperbolic arcs, and quartic curves which in the neighbourhood of the origin approximate circles. Their equations are
$$r_1 r_2 = q, \qquad (1 - r_1^2)(1 - r_2^2) = z \tag{11.5}.$$
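The coordinates $q$ and $z$ of a sample are immediate from its canonical correlations. The following sketch is an illustration of mine, not part of the paper: it evaluates (11.5) over the triangle of possible $(r_1, r_2)$ and confirms numerically that every point lies under the parabola $z = (1-q)^2$ of Figure 1, with equality exactly when $r_1 = r_2$.

```python
def q_and_z(r1, r2):
    """Vector correlation q and vector alienation coefficient z of a
    sample with canonical correlations r1 >= r2 (eq. 11.5)."""
    q = r1 * r2
    z = (1.0 - r1**2) * (1.0 - r2**2)
    return q, z

# Sweep the triangle 0 <= r2 <= r1 <= 1 on a coarse grid.
for i in range(11):
    for j in range(i + 1):
        r1, r2 = i / 10.0, j / 10.0
        q, z = q_and_z(r1, r2)
        # (1-q)^2 - z = (r1 - r2)^2 >= 0: the parabola bounds the field.
        assert z <= (1.0 - q)**2 + 1e-12
```

The identity $(1-q)^2 - z = (r_1 - r_2)^2$ makes the bound, and the condition for equality, obvious.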
To test complete independence by means of $z$, we need the integral of (11.3) from zero to the observed value. For $t = 2$ this is
$$P' = z^{\frac{n-3}{2}}\bigl\{n - 2 - (n-3)\sqrt{z}\bigr\} \tag{11.6}.$$
Like the integral of the distribution of $q$, that of (11.3) is easily found numerically from the Tables of the Incomplete Beta Function.
The existence of two different, though exact, tests of the same hypothesis makes us ask in what circumstances each should be used. No general answer appears to be possible to this question; but if we make sufficiently special assumptions about the nature of the deviations from our hypothesis that are likely to occur, or test for deviations of a sufficiently special character, a unique solution will exist. In order that $z$ differ from unity, it is seen from (11.5) to be sufficient for either of the canonical correlations to differ from zero. But in order that $q$ differ from zero, it is necessary that both correlations differ from zero. This suggests that the $z$ test will be the more sensitive to deviations from complete independence resulting from the existence of only a single component common to the two sets of variates; but



if the correlations of one set with the other result from two independent common components operating to an approximately equal extent, the deviation from independence will be revealed by $q$ more clearly than by $z$. This conclusion is confirmed by a comparison of (11.2) with (11.6), putting for $q$ and $z$ their values from (11.5). If $r_2 = r_1$, then
$$P = (1 - r_1^2)^{n-2}, \qquad P' = (1 - r_1^2)^{n-3}\bigl\{(n-3)\,r_1^2 + 1\bigr\},$$
so that $P < P'$, and $q$ provides the more sensitive test. If on the other hand $r_2 = 0$, $P = 1$, so that $q$ provides no evidence whatever of deviation from independence, though for a large enough sample $P'$ becomes arbitrarily small, supplying evidence to any desired extent, if $r_1$ has any constant value other than zero.
Let us apply both tests for complete independence to Kelley's correlations cited in Section 6. For the correlations of arithmetical with reading abilities the values (6.1) of $q$ and $z$ were obtained, with $s = t = 2$. From (11.2) the test for complete independence based on $q$ gives $P = .023$, a probability so small that we may conclude that the two kinds of ability really have something in common.

The same conclusion is given even greater definiteness by the $z$ test (11.6), from which we have $P' = .0001$.
The comparison of arithmetical with memory tests in Section 6 was for the values $s = 2$, $t = 3$. In this case we find from $q$ that $P = 86 \times 10^{-10}$, while the test for complete independence by means of $z$ gives a value of $P'$ smaller by several further orders of magnitude. Thus $z$ gives a more sensitive test, and a more conclusive demonstration, in both these cases than does $q$. The underlying reason for this is the considerable inequality between the two canonical correlations in each case.
One practical consideration in favour of the $q$ test is that $q$ is somewhat easier to calculate than $z$. The chief ground for distinction between them is however their sensitiveness to different types of deviations from complete independence.



Wilks in a later paper* derived the $z$ test for complete independence from the likelihood criterion of Egon S. Pearson and J. Neyman. The considerations of this section therefore are relevant to an understanding of this criterion.
It is clear that a full understanding of the relations between two independent pairs of variates necessitates a knowledge of the bivariate distribution surfaces of $r_1$ and $r_2$, and of $q$ and $z$. These we shall proceed to investigate.
12. Alternants of a Plane and of a Sample. The common method of specifying the orientation of a plane by the direction cosines of its normal is unsatisfactory in a space of more than three dimensions, since then a plane has an infinity of normals at each point. Instead we shall use determinants which may be regarded as of the form known as alternants. If in a space of $n$ dimensions a flat space of $k$ dimensions is determined by the origin and $k$ other points, we shall call the $k$-rowed determinants in the matrix of the rectangular coordinates of these $k$ points the alternants of the $k$-space. The alternants are $C^n_k$ in number, and are connected by numerous quadratic relations. The number of degrees of freedom of the $k$-space through a fixed point is $k(n-k)$. The alternants depend only on the $k$-space, and not on the particular points used to determine it, except for multiplication of all the alternants by a constant; for to replace the $k$ points by $k$ others in the same $k$-space is to replace their matrix of coordinates by a new one whose rows are linear functions of the old; and this merely multiplies all the $k$-rowed determinants by a constant.
Taking the case $k = 2$, a plane containing the points
$$x_1, x_2, \ldots, x_n; \qquad y_1, y_2, \ldots, y_n$$
is specified by the alternants
$$p_{ij} = x_i y_j - x_j y_i \tag{12.1},$$
which are analogous to the Plücker coordinates of a line. Indeed, the planes through a point in $n$-space are in one-to-one correspondence with the lines in which they meet an $(n-1)$-space not containing the point.
* "On the Independence of k Sets of Normally Distributed Statistical Variables," Econometrica, Vol. III (1935), pp. 309–326.
The relations connecting these alternants, apart from the obvious relations
$$p_{ji} = -p_{ij} \tag{12.2},$$
are obtainable from the fact that, on account of identical rows,
$$\begin{vmatrix} x_i & x_j & x_k & x_m \\ y_i & y_j & y_k & y_m \\ x_i & x_j & x_k & x_m \\ y_i & y_j & y_k & y_m \end{vmatrix} = 0.$$
Applying a Laplace expansion to the first two rows of this determinant we obtain
$$p_{ij}\,p_{km} - p_{ik}\,p_{jm} + p_{im}\,p_{jk} = 0 \tag{12.3}.$$



There is one of these relations for each combination of subscripts, but not all are independent. To obtain a set of independent relations which shall imply all the rest, we first observe that not all the $p_{ij}$ can be zero if a definite plane is to be specified, for in that case the $x$- and $y$-points would be collinear with the origin. Let the notation be arranged so that $p_{12} \ne 0$. Then in terms of
$$p_{12}, p_{13}, \ldots, p_{1n}; \qquad p_{23}, \ldots, p_{2n} \tag{12.4}$$
any other alternant $p_{ij}$ is determined from (12.3), by putting $k = 1$, $m = 2$, so that
$$p_{ij} = \frac{p_{1i}\,p_{2j} - p_{1j}\,p_{2i}}{p_{12}} \tag{12.5}.$$
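A numerical check of these identities is straightforward. The sketch below is mine, not the paper's: it forms the alternants (12.1) of a plane through the origin in 4-space and verifies both the quadratic relation (12.3) and the reduction (12.5) of a general alternant to those of the first two rows (indices start at 0 in the code).

```python
def alternants(x, y):
    """All alternants p[i][j] = x_i*y_j - x_j*y_i of the plane through
    the origin and the points x and y (eq. 12.1)."""
    n = len(x)
    return [[x[i]*y[j] - x[j]*y[i] for j in range(n)] for i in range(n)]

x = [1.0, 2.0, 0.5, -1.0]
y = [0.0, 1.0, 3.0, 2.0]
p = alternants(x, y)

# Quadratic relation (12.3): p_ij p_km - p_ik p_jm + p_im p_jk = 0.
assert abs(p[0][1]*p[2][3] - p[0][2]*p[1][3] + p[0][3]*p[1][2]) < 1e-12

# Reduction (12.5): every alternant in terms of the first two rows of p.
for a in range(2, 4):
    for b in range(2, 4):
        assert abs(p[a][b] - (p[0][a]*p[1][b] - p[0][b]*p[1][a]) / p[0][1]) < 1e-12
```

Any two points not collinear with the origin give the same checks, since a change of spanning points multiplies every alternant by the same determinant.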
The relations (12.5) constitute a complete set of independent relations of the form (12.3). For if in the left member of (12.3) we substitute for all the alternants the expressions obtained from (12.5) by putting the several combinations of subscripts in place of $i$ and $j$, the resulting equation is satisfied identically. Furthermore, any set of quantities $p_{ij}$ ($i, j = 1, \ldots, n$) satisfying (12.2) and (12.5), and not all zero, determines uniquely a plane through the origin. For, supposing that $p_{12} \ne 0$, we may obtain the coordinates of two points not collinear with the origin in the following manner. Let $x_2 = y_1 = 0$, and let $x_1 = 1$. Putting $i = 1$, $j = 2$ in (12.1) we then have $y_2 = p_{12}$. Then putting first $j = 1$ and then $j = 2$, we obtain $x_i = -\dfrac{p_{2i}}{p_{12}}$ and $y_i = p_{1i}$. The points whose coordinates are thus determined cannot be collinear with the origin, for if they were the alternants would all be zero.
Since the alternants of a plane have so far been determined only to within a multiplicative constant, we may determine them uniquely if we add the condition
$$\Sigma\, p_{ij}^2 = 1 \tag{12.6}.$$
Here we use the sign $\Sigma$ to mean summation over the $\dfrac{n(n-1)}{2}$ alternants for which the first subscript is less than the second. This condition on the quantities (12.4) shows that the number of independent alternants is $2n - 4$, which is the number of degrees of freedom of the plane.

We define the alternants of a set of observations of two variates on $n$ individuals (or rather with $n$ degrees of freedom after elimination of the mean and possibly other variables and an appropriate orthogonal transformation of the observations) as the determinants $p_{ij}$ in the matrix of observations, or of pseudo-observations, multiplied by a constant. It will frequently be convenient to choose this constant so that (12.6) is satisfied. This definition breaks down in the case where the determinants are all zero, but if the observations are a sample from a continuous distribution this is infinitely improbable, and we shall disregard this case.
It is clear that all relations between two pairs of variates that are invariant under internal linear transformations of the pairs, and are based on a sample, must be expressible in terms of the alternants of the two pairs; for such relations must correspond to relations between planes through the origin in $n$-space, independent of the particular points used to define the planes. The relations depending on correlations must also be invariant under rotations of the $n$-space about the origin, since the correlations are cosines of angles at the origin, which are invariant under rotations. We shall therefore suppose for simplicity that the axes have been rotated in such a way that the first two of them lie in the plane of the observations on one pair of variates, and contain the observation points for this pair. We shall suppose further that all the observation points are at unit distance from the origin, so that the sum of the squares of the observations on each variate is unity. None of these assumptions reduces the generality of our results. The matrix of observations now takes the form
$$\begin{pmatrix} x_1 & x_2 & x_3 & x_4 & \ldots & x_n \\ y_1 & y_2 & y_3 & y_4 & \ldots & y_n \\ 1 & 0 & 0 & 0 & \ldots & 0 \\ 0 & 1 & 0 & 0 & \ldots & 0 \end{pmatrix} \tag{12.7}.$$
The determinant $D$ of the correlations among the four variates is the determinant of sums of squares and products, since each sum of squares is unity. Hence, by Section 2, $D$ is the sum of the squares of the four-rowed determinants in the matrix (12.7). But all these determinants are zero except those containing the first and second columns. The determinant consisting of the first, second, $i$th and $j$th columns equals $x_i y_j - x_j y_i$. Defining this as $p_{ij}$, we obtain the result
$$D = \Sigma'\, p_{ij}^2 \tag{12.8},$$
where $\Sigma'$ denotes summation from 3 to $n$ with respect to $i$, and from $i + 1$ to $n$ with respect to $j$. It is further evident from (12.7) that the determinants of correlations within the sets are
$$1 - r_{12}^2 \quad\text{and}\quad 1 \tag{12.9}.$$
The equation (3.6) for the canonical correlations of the sample is
$$\begin{vmatrix} \lambda & \lambda r_{12} & r_{13} & r_{14} \\ \lambda r_{12} & \lambda & r_{23} & r_{24} \\ r_{13} & r_{23} & \lambda & \lambda r_{34} \\ r_{14} & r_{24} & \lambda r_{34} & \lambda \end{vmatrix} = 0 \tag{12.10}.$$
The coefficient of $\lambda^4$ in this equation is $D_1 D_2 = \Sigma\, p_{ij}^2$. The term independent of $\lambda$ is the square of the sum of the products of the determinants in the first two rows of (12.7) by the corresponding determinants in the last two rows. The latter determinants, however, are all zero but the first; hence the constant term in the equation is $p_{12}^2$. The coefficient of $\lambda^2$ may be obtained by putting $\lambda = 1$ in the left member of (12.10) and subtracting the coefficient of $\lambda^4$ and the constant term. This coefficient is therefore equal to
$$-S\,(p_{1i}^2 + p_{2i}^2),$$
where $S$ denotes summation with respect to $i$ from 1 to $n$. We recall in this connection that (12.2) shows that $p_{ii} = 0$. The equation may thus be written in terms of alternants
$$\lambda^4\,\Sigma\, p_{ij}^2 - \lambda^2\, S\,(p_{1i}^2 + p_{2i}^2) + p_{12}^2 = 0 \tag{12.11}.$$
If we regard this as a quadratic equation in $\lambda^2$, the roots are $r_1^2$ and $r_2^2$. Hence, from (11.5) and the expressions for the coefficients in terms of the roots,
$$q = \pm\frac{p_{12}}{\sqrt{\Sigma\, p_{ij}^2}}, \qquad z = \frac{\Sigma'\, p_{ij}^2}{\Sigma\, p_{ij}^2} \tag{12.12}.$$
If we adopt the further convention (12.6), which by (12.9) is seen to be equivalent to assuming that the first pair of variates has been reduced by an internal transformation so that the correlation is zero, (12.11) and (12.12) simplify to
$$\lambda^4 - \lambda^2\, S\,(p_{1i}^2 + p_{2i}^2) + p_{12}^2 = 0 \tag{12.13},$$
$$q = \pm p_{12}, \qquad z = \Sigma'\, p_{ij}^2 \tag{12.14}.$$
If we do not specialise one of our planes with reference to the coordinates, but take its alternants as $q_{ij}$, while those of the other are $p_{ij}$, it is easy to see in the foregoing manner, or from Section 4, that the vector correlation is
$$q = \frac{\Sigma\, p_{ij}\, q_{ij}}{\sqrt{(\Sigma\, p_{ij}^2)(\Sigma\, q_{ij}^2)}} \tag{12.15}.$$
This has the form of the ordinary formula for a correlation coefficient, or of the cosine of an angle. From the latter fact comes the following conclusion, which is of the utmost importance for our purposes.
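As a check on this cosine interpretation (my own illustration, not part of the paper), the sketch below evaluates (12.15) for two planes in 4-space and verifies that the result is unchanged in absolute value when one plane is re-specified by two other points of itself, since that multiplies all its alternants by a common constant.

```python
import math

def alternant_vector(x, y):
    """Alternants p_ij = x_i*y_j - x_j*y_i, i < j, as a flat vector."""
    n = len(x)
    return [x[i]*y[j] - x[j]*y[i] for i in range(n) for j in range(i + 1, n)]

def vector_correlation(x1, y1, x2, y2):
    """Vector correlation q of the planes (x1, y1) and (x2, y2), eq. 12.15:
    the cosine of the angle between the two alternant vectors."""
    p = alternant_vector(x1, y1)
    r = alternant_vector(x2, y2)
    dot = sum(a*b for a, b in zip(p, r))
    return dot / math.sqrt(sum(a*a for a in p) * sum(b*b for b in r))

x1, y1 = [1, 0, 2, 1], [0, 1, 1, -1]
x2, y2 = [1, 1, 0, 0], [0, 2, 1, 1]
q = vector_correlation(x1, y1, x2, y2)

# Replace (x1, y1) by two independent linear combinations of themselves:
# every alternant of the plane is scaled by the same determinant, so |q|
# is invariant under internal transformations.
u = [2*a + b for a, b in zip(x1, y1)]
v = [a - 3*b for a, b in zip(x1, y1)]
assert abs(abs(vector_correlation(u, v, x2, y2)) - abs(q)) < 1e-12
```

Being a cosine, $q$ always lies between $-1$ and $1$, which is the metrical fact exploited in the next paragraph.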
Let us take the alternants $p_{ij}$ for which $i < j$ as Cartesian coordinates in a space of $\dfrac{n(n-1)}{2}$ dimensions, and denote by $V$ the subspace in which the equations (12.5) and (12.6) hold. Then $V$ is a curved space of $2n - 4$ dimensions, in which all the equations (12.3) hold, since they follow from (12.5). The points of $V$ are in one-to-one correspondence with the planes through the origin in $n$-space. A property of this correspondence which we shall use is that it is metrical, in the sense that any rotation of the $n$-space about the origin engenders a transformation of $V$ which is also a rotation. This fact follows from (12.15), which shows that $q$ is the cosine of the angle at the origin between lines extending to points of $V$ representing two planes. Under a rigid rotation of the $n$-space, the correlations defining $q$ are all

invariant, so that $q$ is invariant. Hence the points of $V$ representing the rotated pair of planes are exactly as far apart as the points representing the planes in their original position, since all these points are, by (12.6), equidistant from the origin. Thus the transformation of $V$ satisfies the definition of a rotation.
13. The Bivariate Distribution for Complete Independence ($s = t = 2$, $n = 4$). If there is complete independence of one pair of variates from another, $\rho_1 = \rho_2 = 0$. We may without loss of generality regard the internal correlations also as zero. The planes corresponding to a sample from a normal distribution are then determined by lines drawn through the origin in $n$-space at random, in the sense that the probability of a line meeting any region on a surrounding sphere is proportional to the generalized area of the region. The chance selection of a plane in this way is equivalent to the selection of a point in $V$ in such a way that the element of probability is proportional to the volume element. For, since any plane through the origin in $n$-space can be rotated into any other, any point in $V$ can be rotated into any other, and will carry with it in this rotation its probability density, which must therefore be uniform over the whole of $V$. Thus all problems of finding distributions of statistics calculated from the pairs of variates in such a way as to be invariant under internal transformations reduce to purely geometrical problems of finding the $(2n-4)$-dimensional volumes of the corresponding regions in $V$.
The distribution of $q$ and $z$, or of $r_1$ and $r_2$, will be deduced with the help of methods of parametric representation resembling those previously applied by the author to other statistical problems*. First we take the case $n = 4$. In the six-dimensional space in which the alternants are Cartesian coordinates, $V$ is then a curved four-dimensional space having the equations
$$p_{12}\,p_{34} - p_{13}\,p_{24} + p_{14}\,p_{23} = 0, \qquad p_{12}^2 + p_{13}^2 + p_{14}^2 + p_{23}^2 + p_{24}^2 + p_{34}^2 = 1 \tag{13.1}.$$
It follows that $V$ may be defined in terms of four parameters $\alpha, \beta, \gamma, \delta$ by means of the equations
$$\begin{aligned}
p_{13} &= \tfrac12(\sin\alpha\sin\beta + \sin\gamma\sin\delta)\\
p_{24} &= -\tfrac12(\sin\alpha\sin\beta - \sin\gamma\sin\delta)\\
p_{14} &= \tfrac12(\cos\alpha\sin\beta + \cos\gamma\sin\delta)\\
p_{23} &= \tfrac12(\cos\alpha\sin\beta - \cos\gamma\sin\delta)\\
p_{12} &= \tfrac12(\cos\beta + \cos\delta)\\
p_{34} &= \tfrac12(\cos\beta - \cos\delta)
\end{aligned} \tag{13.2}$$
since these equations satisfy both the equations (13.1). All points of $V$ are included when we allow $\alpha$ and $\gamma$ to vary from 0 to $2\pi$, and $\beta$ and $\delta$ from 0 to $\pi$. The element of volume in $V$ is of course
$$\sqrt{g}\; d\alpha\, d\beta\, d\gamma\, d\delta,$$
* "The Distribution of Correlation Ratios Calculated from Random Data," Proceedings of the National Academy of Sciences, Vol. XI (1925), pp. 657–662; "The Generalization of Student's Ratio," Annals of Mathematical Statistics, Vol. II (1931), pp. 360–378; "The Physical State of Protoplasm," loc. cit.
where $g$ is the sum of the squares of the four-rowed determinants in the matrix of partial derivatives of the $p_{ij}$ with respect to $\alpha, \beta, \gamma$ and $\delta$. These derivatives are the halves of the elements of
$$\begin{pmatrix}
\cos\alpha\sin\beta & -\cos\alpha\sin\beta & -\sin\alpha\sin\beta & -\sin\alpha\sin\beta & 0 & 0\\
\sin\alpha\cos\beta & -\sin\alpha\cos\beta & \cos\alpha\cos\beta & \cos\alpha\cos\beta & -\sin\beta & -\sin\beta\\
\cos\gamma\sin\delta & \cos\gamma\sin\delta & -\sin\gamma\sin\delta & \sin\gamma\sin\delta & 0 & 0\\
\sin\gamma\cos\delta & \sin\gamma\cos\delta & \cos\gamma\cos\delta & -\cos\gamma\cos\delta & -\sin\delta & \sin\delta
\end{pmatrix}.$$
The sum of products of corresponding elements in each pair of rows of this matrix is zero; the sums of squares are respectively $2\sin^2\beta$, 2, $2\sin^2\delta$, and 2, each of which sums must be divided by 4. Thus in accordance with Section 2 we have
$$g = \tfrac{1}{16}\sin^2\beta\,\sin^2\delta,$$
so that the volume element in $V$ is
$$\tfrac14\sin\beta\,\sin\delta\; d\alpha\, d\beta\, d\gamma\, d\delta, \quad\text{or}\quad \tfrac14\, d(\cos\beta)\, d(\cos\delta)\, d\alpha\, d\gamma.$$
From this it follows, if we put
$$\xi = \cos\beta, \qquad \xi' = \cos\delta \tag{13.3},$$
that $\xi$ and $\xi'$ are independently distributed with uniform density between $-1$ and 1. If we take $q$ to have the sign of $p_{12}$, and $\sqrt{z}$ to have that of $p_{34}$, we have, from (12.14), (13.2) and (13.3),
$$q = \tfrac12(\xi + \xi'), \qquad \sqrt{z} = \tfrac12(\xi - \xi'),$$
whence $\xi = q + \sqrt{z}$, $\xi' = q - \sqrt{z}$.
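The parametrization can be checked mechanically. The sketch below is mine, not the paper's; the sign conventions attached to the six alternants are one consistent choice (the labelling of the middle four varies with conventions). It draws parameter values at random, forms the alternants, and verifies both equations (13.1) together with the relation $q + \sqrt{z} = \cos\beta$.

```python
import math, random

def plane_alternants(a, b, g, d):
    """The six alternants of the parametrization (13.2) of V for n = 4,
    keyed by subscript pair; one consistent sign convention (assumed)."""
    return {
        (1, 2):  0.5 * (math.cos(b) + math.cos(d)),
        (1, 3):  0.5 * (math.sin(a)*math.sin(b) + math.sin(g)*math.sin(d)),
        (1, 4):  0.5 * (math.cos(a)*math.sin(b) + math.cos(g)*math.sin(d)),
        (2, 3):  0.5 * (math.cos(a)*math.sin(b) - math.cos(g)*math.sin(d)),
        (2, 4): -0.5 * (math.sin(a)*math.sin(b) - math.sin(g)*math.sin(d)),
        (3, 4):  0.5 * (math.cos(b) - math.cos(d)),
    }

random.seed(1)
for _ in range(100):
    a, g = random.uniform(0, 2*math.pi), random.uniform(0, 2*math.pi)
    b, d = random.uniform(0, math.pi), random.uniform(0, math.pi)
    p = plane_alternants(a, b, g, d)
    # Both equations (13.1): the Plücker relation and the unit sum of squares.
    assert abs(p[1,2]*p[3,4] - p[1,3]*p[2,4] + p[1,4]*p[2,3]) < 1e-12
    assert abs(sum(v*v for v in p.values()) - 1.0) < 1e-12
    # With q = p12 and sqrt(z) = p34, q + sqrt(z) = cos(beta) as in (13.3).
    assert abs((p[1,2] + p[3,4]) - math.cos(b)) < 1e-12
```

The identities hold for every parameter value, confirming that (13.2) really does map the parameter ranges onto $V$.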


These relations show, because they are linear, and since points of coordinates $(\xi, \xi')$ are uniformly distributed in the square $\xi = \pm 1$, $\xi' = \pm 1$, that points of coordinates $(q, \sqrt{z})$ are uniformly distributed in the square bounded by the four lines $q \pm \sqrt{z} = \pm 1$. If we restrict $q$ and $\sqrt{z}$ to positive values, their distribution will be
$$2\, dq\, d\sqrt{z},$$
within the triangle bounded by the coordinate axes and the line $q + \sqrt{z} = 1$. The distribution of $q$ and $z$ is therefore
$$z^{-\frac12}\, dz\, dq \tag{13.4},$$
subject to the limitations that both are positive, and that
$$z \le (1 - q)^2 \tag{13.5}.$$
If we integrate (13.4) with respect to $q$ or $z$ we obtain Wilks' distribution (11.3) of $z$, or the distribution (11.1) of $q$, respectively, for the case $s = t = 2$, $n = 4$.
We may regard $(\alpha, \beta)$ and $(\gamma, \delta)$ as the spherical coordinates of two points on a sphere in 3-space. The equations (13.2) thus establish a correspondence having metrical properties between planes through a point in 4-space and pairs of points

on an ordinary sphere. In this representation $q$ appears as the mean distance of the two points from a fixed plane, while $\sqrt{z}$ is half the difference of the distances from this plane. From the theorem of Archimedes that the area of a zone depends only on the distance between the bounding planes and the radius of the sphere (which in this case is unity), it is therefore evident that when $z$ is fixed the distribution of $q$ is of uniform density, confirming (13.4).
Let us call $r$ the cosine of the angle between the two lines determining our variable plane in 4-space. The distribution of $r$, which is the sample correlation between two really uncorrelated variates, is readily seen geometrically, or by putting $n = 4$ in the general distribution of such sample correlations, to be
$$\frac{2}{\pi}(1 - r^2)^{\frac12}\, dr.$$

Moreover, this distribution is independent of that of $q$ and $z$, since $r$ depends only on the angle within the plane, and $q$ and $z$ on the plane itself. Consequently the joint distribution of the three is, for $n = 4$,
$$\frac{2}{\pi}(1 - r^2)^{\frac12}\, z^{-\frac12}\, dr\, dz\, dq \tag{13.6}.$$

This result and the following theorem will be used in Section 15 in extending the distribution to a general value of $n$.
14. Theorem on Circularly Distributed Variates. The sum or difference of two variates distributed independently and with uniform density over a particular range is known to have a distribution represented by an isosceles triangle whose base has double the breadth of the original range. If however each value of the sum or difference is reduced with the original range as modulus, that is, replaced by the remainder after dividing by the range, the resulting distribution is exactly the original one, with uniform density over the same range. This is a special case of the following rather remarkable
THEOREM: If any number of variates are distributed independently and with uniform density from 0 to $a$, then any linear function of these variates with integral coefficients, when reduced modulo $a$, is likewise distributed with uniform density from 0 to $a$. Any number of such functions, if algebraically independent, are also independent in the probability sense.
The truth of this theorem becomes evident when we regard each set of values of the variates as a point in a space having the metrical properties of a hypercube of as many dimensions as there are variates, but with a topological nature determined by making each pair of diametrically opposite faces of the hypercube correspond to a single region of the space. The space is thus a closed manifold generalizing a torus in its topology, but not contained in a euclidean space, because of its metrical nature. For two variates this representing space would be approximated by a torus obtained by revolving a very small circle about a very distant line in its plane. Another representation in this case would be by means of the
squares of side $a$ into which a plane is divided by two sets of parallel lines, all points occupying a particular position within their respective squares being regarded as identical. If we call the variates, or coordinates, $x_1, x_2, \ldots, x_m$, the linear functions
$$y_i = \sum_{j=1}^{m} a_{ij}\, x_j \qquad (i = 1, 2, \ldots, k) \tag{14.1},$$
in which the coefficients $a_{ij}$ are positive or negative integers or zero (but are not all zero for any value of $i$), are constants over loci which, on the representation on a plane or flat space of $m$ dimensions, are parallel lines, or hyperplanes of $m - 1$ dimensions. In the space itself, in which only one point corresponds to each set of values of the variates other than 0 and $a$, the loci (14.1) are closed curves, or closed hypersurfaces, because the coefficients are integers. It is obvious that the volume in this space contained between $y_i = b$ and $y_i = c$ must, on account of the homogeneity, be proportional to $b - c$.
If the $k$ linear functions (14.1) are linearly independent, the loci obtained by giving each $y_i$ a succession of constant values differing consecutively by $\dfrac{a}{p}$, where $p$ is an integer, will divide the space into congruent parallelepipeds. If $k - 1$ of the $y_i$ are constrained to lie in certain of these intervals, the representing point is merely constrained to lie in a certain layer. Since all such layers must be congruent, the distribution of the $k$th of the $y_i$, reduced modulo $a$, is not affected by this constraint. Hence all the variates thus reduced are independent.
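The theorem can be illustrated exactly on a discrete grid (an example of mine, not the paper's): if $x$ and $y$ each range uniformly over $p$ equally spaced values in $[0, a)$, then an integer combination such as $2x + 3y$, reduced modulo $a$, takes each grid value equally often.

```python
from collections import Counter

# Discrete analogue of the theorem: x and y uniform on {0, a/p, ..., (p-1)a/p}.
p = 60          # grid points per period
a = 360         # the modulus (the "range" of the theorem)
step = a // p

counts = Counter()
for x in range(0, a, step):
    for y in range(0, a, step):
        counts[(2*x + 3*y) % a] += 1

# Every residue on the grid occurs equally often: the reduced linear
# function is again uniformly distributed over the same range.
assert set(counts) == set(range(0, a, step))
assert len(set(counts.values())) == 1
print(len(counts), counts[0])  # → 60 60
```

The map $(x, y) \mapsto 2x + 3y \bmod a$ is a homomorphism on the grid, so its fibres are equal in size, which is the discrete counterpart of the congruent layers in the proof above.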

15. Generalization of Section 13 for Samples of Any Size. No direct extension to a larger number of dimensions of the method of using alternants in Section 13 appears to lead in any simple fashion to the generalization of the distribution there found. This generalization will however be obtained with the help of hyperspherical coordinates in the space of the observations.
On account of the spherical symmetry of the density distributions in $n$-space in the absence of true correlation, our distributions will not be affected if we assume the two points of coordinates $(x_1, \ldots, x_n)$ and $(y_1, \ldots, y_n)$ to be taken independently at random on a unit sphere about the origin in $n$ dimensions, in such a way that the element of probability for each is proportional to the element of $(n-1)$-dimensional area on this hypersphere. If we define the hypersphere parametrically by the
equations
$$\begin{aligned}
x_1 &= \sin\theta_1\sin\theta_2\\
x_2 &= \cos\theta_1\sin\theta_2\\
x_3 &= \cos\theta_2\cos\theta_3\\
x_4 &= \cos\theta_2\sin\theta_3\cos\theta_4\\
&\;\;\vdots\\
x_{n-1} &= \cos\theta_2\sin\theta_3\cdots\sin\theta_{n-2}\cos\theta_{n-1}\\
x_n &= \cos\theta_2\sin\theta_3\cdots\sin\theta_{n-2}\sin\theta_{n-1}
\end{aligned} \tag{15.1}$$

which satisfy $\Sigma x^2 = 1$ identically, then the element of $(n-1)$-dimensional area for the $x$-point may be written
$$\sqrt{g}\; d\theta_1\, d\theta_2 \cdots d\theta_{n-1},$$
where $g$ is a determinant of $n - 1$ rows, in which the element in the $i$th row and $j$th column is
$$g_{ij} = \sum_{k=1}^{n} \frac{\partial x_k}{\partial \theta_i}\,\frac{\partial x_k}{\partial \theta_j} \tag{15.2}.$$
All these quantities are readily seen from (15.1) to vanish except those for which $i = j$. The successive diagonal elements of $g$ are
$$\sin^2\theta_2,\quad 1,\quad \cos^2\theta_2,\quad \cos^2\theta_2\sin^2\theta_3,\ \ldots.$$
Hence the element of generalized area may be written
$$\sin\theta_2\,\cos^{n-3}\theta_2\,\sin^{n-4}\theta_3\,\sin^{n-5}\theta_4 \cdots \sin\theta_{n-2}\; d\theta_1\, d\theta_2 \cdots d\theta_{n-1} \tag{15.3}.$$
In the same way, if we put
$$y_1 = \sin\phi_1\sin\phi_2, \quad y_2 = \cos\phi_1\sin\phi_2, \quad y_3 = \cos\phi_2\cos\phi_3, \quad \ldots, \quad y_n = \cos\phi_2\sin\phi_3\sin\phi_4\cdots\sin\phi_{n-1} \tag{15.4},$$
the element of probability for the $y$-point is proportional to
$$\sin\phi_2\,\cos^{n-3}\phi_2\,\sin^{n-4}\phi_3 \cdots \sin\phi_{n-2}\; d\phi_1\, d\phi_2 \cdots d\phi_{n-1} \tag{15.5}.$$
The distribution of the parameters defining the two points is obtained by multiplying (15.3) by (15.5). It is evident that all the quantities $\theta_1, \ldots, \theta_{n-1}, \phi_1, \ldots, \phi_{n-1}$ are independent in the probability sense, since the distribution function is a product of functions each involving only one of these parameters.
We now introduce quantities $u_3, u_4, \ldots, u_n$, $v_3, \ldots, v_n$ defined by the equations
$$x_i = u_i\cos\theta_2, \qquad y_i = v_i\cos\phi_2 \qquad (i = 3, 4, \ldots, n) \tag{15.6}.$$
The $u_i$, by (15.1), are functions only of $\theta_3, \theta_4, \ldots, \theta_{n-1}$, and the $v_i$, by (15.4), are functions of $\phi_3, \ldots, \phi_{n-1}$. The $u_i$ and $v_i$ may be regarded as Cartesian coordinates of two points on a sphere in space of $n - 2$ dimensions, these points being taken independently of each other and of the values of $\theta_1, \theta_2, \phi_1$ and $\phi_2$, with the element of probability proportional to the element of $(n-3)$-dimensional area. If we denote the angle between these points by $\Lambda$ ($0 \le \Lambda \le \pi$), it is evident that the distribution of $\Lambda$ is proportional to
$$\sin^{n-4}\Lambda\; d\Lambda \tag{15.7},$$
and is independent of $\theta_1, \theta_2, \phi_1$ and $\phi_2$.
Let $r$ be the cosine of the angle subtended at the origin in the $n$-dimensional space by the $x$- and $y$-points. Since
$$u_3 v_3 + u_4 v_4 + \cdots + u_n v_n = \cos\Lambda \tag{15.8},$$
it is evident from (15.1), (15.4), (15.6) and (15.8) that
$$r = \Sigma xy = \cos(\theta_1 - \phi_1)\sin\theta_2\sin\phi_2 + \cos\Lambda\,\cos\theta_2\cos\phi_2 \tag{15.9}.$$
The sample values of the variates of the second pair may in the absence of correlation in the population be represented by an arbitrary pair of fixed lines; we shall represent them by the first two coordinate axes. We then have from (12·12) and (12·1), with the help of Section 2,
\[
q^2 = \frac{(x_1y_2 - x_2y_1)^2}{1 - (\Sigma xy)^2},\qquad
z = \frac{\Sigma' x^2\,\Sigma' y^2 - (\Sigma' xy)^2}{1 - (\Sigma xy)^2},
\]
where $\Sigma$ denotes summation from 1 to $n$, and $\Sigma'$ from 3 to $n$. These expressions become, upon substitution from (15·1), (15·4), (15·6), (15·8) and (15·9),
\[
q^2 = \frac{\sin^2(\theta_1-\phi_1)\sin^2\theta_2\sin^2\phi_2}{1-r^2},\qquad
z = \frac{\sin^2\Delta\cos^2\theta_2\cos^2\phi_2}{1-r^2}\qquad(15\cdot10).
\]
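The passage from the $\Sigma$-expressions to (15·10) can likewise be checked numerically. This sketch is an editorial addition (Python assumed, with the reconstructed coordinate pattern of (15·1)/(15·4) written out explicitly for $n = 6$):

```python
import math

def coords(a1, a2, a3, a4, a5):
    # Unit vector in R^6 from polar angles, pattern of (15.1)/(15.4).
    c2 = math.cos(a2)
    return [math.cos(a1) * math.sin(a2),
            math.sin(a1) * math.sin(a2),
            c2 * math.cos(a3),
            c2 * math.sin(a3) * math.cos(a4),
            c2 * math.sin(a3) * math.sin(a4) * math.cos(a5),
            c2 * math.sin(a3) * math.sin(a4) * math.sin(a5)]

th = (0.7, 1.1, 0.5, 1.3, 0.9)
ph = (2.1, 0.8, 1.9, 0.4, 1.5)
x, y = coords(*th), coords(*ph)

r = sum(a * b for a, b in zip(x, y))       # Sigma xy
Sx2 = sum(v * v for v in x[2:])            # Sigma' x^2
Sy2 = sum(v * v for v in y[2:])            # Sigma' y^2
Sxy = sum(a * b for a, b in zip(x[2:], y[2:]))   # Sigma' xy

# the Sigma-expressions for q^2 and z
q2 = (x[0] * y[1] - x[1] * y[0]) ** 2 / (1 - r * r)
z = (Sx2 * Sy2 - Sxy ** 2) / (1 - r * r)

# the trigonometric forms (15.10), with cos(Delta) recovered via (15.8)
cosD = Sxy / (math.cos(th[1]) * math.cos(ph[1]))
q2_trig = (math.sin(th[0] - ph[0]) ** 2
           * math.sin(th[1]) ** 2 * math.sin(ph[1]) ** 2 / (1 - r * r))
z_trig = ((1 - cosD ** 2) * math.cos(th[1]) ** 2 * math.cos(ph[1]) ** 2
          / (1 - r * r))
assert abs(q2 - q2_trig) < 1e-12
assert abs(z - z_trig) < 1e-12
```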
To simplify the notation we shall replace $\theta_2$ and $\phi_2$ simply by $\theta$ and $\phi$ respectively. We shall also put
\[
\omega = \theta_1 - \phi_1.
\]
Since only the sines and cosines of $\omega$ will enter into our discussion, we may regard $\omega$ as reduced modulo $2\pi$. Now $\theta_1$ and $\phi_1$ vary independently and, as is seen from (15·3) and (15·5), with uniform density from 0 to $2\pi$. Hence their difference $\omega$ must, by the theorem of the last section, have a distribution of uniform density from 0 to $2\pi$. Moreover, since $\theta_1$ and $\phi_1$ have been seen to be independent of $\theta$, $\phi$ and $\Delta$, it follows that $\omega$ is likewise independent of them; indeed, $\omega$, $\theta$, $\phi$ and $\Delta$ constitute a completely independent set. The distributions of $\theta$ and $\phi$ are determined by integrating (15·3) and (15·5) between constant limits with respect to all the variates appearing in them except $\theta_2$ and $\phi_2$ respectively. Combining
with (15·7) the result of this integration and the uniformity of the distribution of $\omega$, we have that the element of probability is of the form
\[
k_n \sin\theta\cos^{n-3}\theta\,\sin\phi\cos^{n-3}\phi\,\sin^{n-4}\Delta\;d\theta\,d\phi\,d\Delta\,d\omega\qquad(15\cdot11),
\]
where $k_n$ depends only on $n$. The limits for $\theta$ and $\phi$ are 0 and $\pi/2$; for $\Delta$ they are 0 and $\pi$; for $\omega$ they are 0 and $2\pi$.
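The theorem of Section 14 invoked here — that the difference of two independent uniform circular variates, reduced modulo $2\pi$, is again uniform — is easy to confirm by simulation (an editorial sketch, Python assumed):

```python
import math
import random

rng = random.Random(7)
N = 200_000
# omega = theta_1 - phi_1 reduced modulo 2*pi, with both angles
# uniform and independent on (0, 2*pi)
omegas = [(rng.uniform(0, 2 * math.pi) - rng.uniform(0, 2 * math.pi))
          % (2 * math.pi) for _ in range(N)]
frac_below_pi = sum(w < math.pi for w in omegas) / N
mean = sum(omegas) / N
assert abs(frac_below_pi - 0.5) < 0.01   # uniform => CDF(pi) = 1/2
assert abs(mean - math.pi) < 0.02        # uniform => mean = pi
```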
In the new notation, (15·9) and (15·10) become
\[
r = \cos\omega\sin\theta\sin\phi + \cos\Delta\cos\theta\cos\phi\qquad(15\cdot12),
\]
\[
q^2 = \frac{\sin^2\omega\sin^2\theta\sin^2\phi}{1-r^2},\qquad
z = \frac{\sin^2\Delta\cos^2\theta\cos^2\phi}{1-r^2}\qquad(15\cdot13).
\]
We next consider a transformation to the variates $q$, $z$, $r$ and $\omega$. Without troubling to compute the Jacobian $J$ of this transformation, we need observe only that it is independent of $n$, since the functional relations (15·12) and (15·13) do not involve $n$. Substituting in (15·11) from the second of the equations (15·13) we find that the distribution is of the form
\[
k_n\,\psi\,z^{\frac{n-3}{2}}(1-r^2)^{\frac{n-3}{2}}\,dq\,dz\,dr\,d\omega,
\]
where $\psi$ does not involve $n$. Upon integrating this with respect to $\omega$ between certain limits depending on $q$, $z$ and $r$, but not on $n$, we have the distribution
\[
k_n\,\Psi\,z^{\frac{n-3}{2}}(1-r^2)^{\frac{n-3}{2}}\,dq\,dz\,dr,
\]
which for $n = 4$ must reduce to (13·6). Comparing with (13·6) we have $k_4\Psi = \dfrac{2}{\pi z}$.
Inasmuch as the distribution of $r$ is known to be
\[
\frac{\Gamma\!\left(\frac{n}{2}\right)}{\sqrt{\pi}\,\Gamma\!\left(\frac{n-1}{2}\right)}\,(1-r^2)^{\frac{n-3}{2}}\,dr,
\]
and to be independent of that of $q$ and $z$, we have for the distribution of the latter two
\[
k_n'\,z^{\frac{n-5}{2}}\,dz\,dq,
\]
where $k_n'$ depends only on $n$. Since the integral over the entire range of variation defined by the inequalities
\[
0 \le q \le 1,\qquad 0 \le z \le (1-q)^2
\]
must be unity, the constant $k_n'$ is readily found. The distribution is
\[
\tfrac{1}{2}(n-2)(n-3)\,z^{\frac{n-5}{2}}\,dz\,dq\qquad(15\cdot14).
\]
Its form shows that, in the plane of Fig. 1, the loci of uniform density are
horizontal lines.
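That the constant in (15·14) is $\tfrac{1}{2}(n-2)(n-3)$ can be confirmed by integrating $z^{(n-5)/2}$ numerically over the reconstructed region $0 \le q \le 1$, $0 \le z \le (1-q)^2$. The following is an editorial sketch (Python assumed):

```python
def total_mass(n, m=800):
    # Midpoint rule for the density (1/2)(n-2)(n-3) z^{(n-5)/2}
    # over the region 0 <= q <= 1, 0 <= z <= (1-q)^2.
    total = 0.0
    dq = 1.0 / m
    for i in range(m):
        q = (i + 0.5) * dq
        zmax = (1.0 - q) ** 2
        dz = zmax / m
        for j in range(m):
            z = (j + 0.5) * dz
            total += 0.5 * (n - 2) * (n - 3) * z ** ((n - 5) / 2) * dz * dq
    return total

# total probability should be 1 for every sample size n > 4
assert abs(total_mass(5) - 1.0) < 1e-3
assert abs(total_mass(6) - 1.0) < 1e-2
```

Analytically, the inner integral gives $(n-2)(1-q)^{n-3}\,dq$, whose integral from 0 to 1 is unity, in agreement with the numerical result.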
The distribution of the canonical correlations is determined by (15·14), together with (11·5), which latter gives
\[
dq\,dz = 2(r_1^2 - r_2^2)\,dr_1\,dr_2.
\]
Thus the distribution of the correlations in case of complete independence is
\[
(n-2)(n-3)(r_1^2 - r_2^2)(1-r_1^2)^{\frac{n-5}{2}}(1-r_2^2)^{\frac{n-5}{2}}\,dr_1\,dr_2\qquad(15\cdot15).
\]
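The step from (15·14) to (15·15) rests on the substitution $q = r_1 r_2$, $z = (1-r_1^2)(1-r_2^2)$ from (11·5), whose Jacobian is $2(r_1^2 - r_2^2)$. Both the Jacobian and the normalization of (15·15) over $0 \le r_2 \le r_1 \le 1$ can be checked numerically (an editorial sketch, Python assumed):

```python
def jacobian(r1, r2, eps=1e-5):
    # Central-difference Jacobian of (q, z) = (r1*r2, (1-r1^2)(1-r2^2)).
    q = lambda a, b: a * b
    z = lambda a, b: (1.0 - a * a) * (1.0 - b * b)
    dq1 = (q(r1 + eps, r2) - q(r1 - eps, r2)) / (2 * eps)
    dq2 = (q(r1, r2 + eps) - q(r1, r2 - eps)) / (2 * eps)
    dz1 = (z(r1 + eps, r2) - z(r1 - eps, r2)) / (2 * eps)
    dz2 = (z(r1, r2 + eps) - z(r1, r2 - eps)) / (2 * eps)
    return dq1 * dz2 - dq2 * dz1

# expected Jacobian 2(r1^2 - r2^2)
assert abs(jacobian(0.8, 0.3) - 2 * (0.8 ** 2 - 0.3 ** 2)) < 1e-6

def mass(n, m=800):
    # Midpoint rule for the density (15.15) over 0 <= r2 <= r1 <= 1.
    total, h = 0.0, 1.0 / m
    for i in range(m):
        r1 = (i + 0.5) * h
        for j in range(i):            # midpoints with r2 < r1
            r2 = (j + 0.5) * h
            total += ((n - 2) * (n - 3) * (r1 ** 2 - r2 ** 2)
                      * (1 - r1 ** 2) ** ((n - 5) / 2)
                      * (1 - r2 ** 2) ** ((n - 5) / 2)) * h * h
    return total

assert abs(mass(5) - 1.0) < 5e-3
assert abs(mass(6) - 1.0) < 1e-2
```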
16. Further Problems. The foregoing treatment of sampling distributions is
obviously incomplete. It would be desirable to have exact distributions, both of
sample canonical correlations and of various functions of them, for cases in which
the canonical correlations in the population have arbitrary values. The coefficients
obtained for the canonical variates have sampling distributions which remain to be
determined. Furthermore, various possible comparisons among different samples
remain to be investigated; for example, there is the problem of testing the signifi-
cance of the difference between vector correlations obtained from different samples.
A generalization of the problem of relations between two sets of variates invariant
under internal linear transformations is that of invariants under such transformations
of three or more sets of variates. A beginning of this theory has been made by
Wilks in the work previously alluded to; the $z$ we have used is only a special case
of a statistic of his, which in general is defined with reference to any number of sets
of variates as a fraction, whose numerator is the determinant of the correlations
among all the variates, and whose denominator is the product of the determinants
of correlations within sets. It is obvious also that the invariants we have discussed,
taken between every two of the sets, are invariants of such a system. An additional
set of invariants will be the roots of the equation in $\lambda$ resembling (3·6), obtained from the determinant of all the correlations or covariances by multiplying those between variates in the same set by $-\lambda$. It is easy to prove with the help of the theory of $\lambda$-matrices that the roots and coefficients of this equation are actually invariants.

A generalization of our work in a different direction would consider invariants,
not under all linear internal transformations, but under a restricted class of these
transformations. For example, a study of the relations of the prices to the quantities
of several commodities might well consider transformations of commodities, such for
example as the mixing of different grades of wheat, or the combination of raw
materials and labour into finished products. If from quantities $q_1, q_2, \ldots$ of the old commodities there are formed quantities $q_1', q_2', \ldots$ of the new, we may, at least approximately, write
\[
q_i' = \textstyle\sum_j c_{ij}\,q_j\qquad(16\cdot1).
\]
If all the costs and profits of the mixing or manufacturing operation are regarded
as prices of constituents, the value of one set of commodities will equal that of the
other, so that
\[
\textstyle\sum_j p_j\,q_j = \sum_j p_j'\,q_j'\qquad(16\cdot2),
\]
where the $p_j$ are the prices of the original commodities and the $p_j'$ are those of the products. If we regard (16·1) as a linear transformation of the quantities, there will be a corresponding linear transformation of the prices, whose coefficients may be determined in terms of the $c_{ij}$ by substituting (16·1) and
\[
p_j' = \textstyle\sum_k d_{jk}\,p_k
\]
in (16·2) and then equating coefficients of like terms. This process shows that
\[
\textstyle\sum_i c_{ij}\,d_{ik} = \delta_{jk},\quad = 1 \ \text{if}\ j = k,\quad = 0 \ \text{if}\ j \neq k.
\]
These equations fully determine the $d_{jk}$ as functions of the $c_{ij}$. The relation is such that the transformation of prices is contragredient to that of quantities.
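To illustrate the contragredient relation numerically (an editorial sketch, not from the paper; Python is assumed, and the two-grade mixing matrix and the price and quantity figures are invented): if quantities transform by a matrix $C$ as in (16·1), the price transformation is $D = (C^{\mathsf T})^{-1}$, and the total value (16·2) is preserved.

```python
def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

# Hypothetical 2x2 mixing: two grades of wheat combined into two blends.
C = [[0.7, 0.3],
     [0.2, 0.8]]                        # q'_i = sum_j c_ij q_j, as in (16.1)
det = C[0][0] * C[1][1] - C[0][1] * C[1][0]
Cinv = [[C[1][1] / det, -C[0][1] / det],
        [-C[1][0] / det, C[0][0] / det]]
D = [[Cinv[j][i] for j in range(2)] for i in range(2)]   # D = (C^{-1})^T

q = [10.0, 4.0]                         # quantities of the two grades
p = [3.0, 5.0]                          # their prices
qp = matvec(C, q)                       # quantities of the blends
pp = matvec(D, p)                       # contragredient prices of the blends

value_old = sum(a * b for a, b in zip(p, q))
value_new = sum(a * b for a, b in zip(pp, qp))
assert abs(value_old - value_new) < 1e-9      # (16.2): values agree

# the defining relation  sum_i c_ij d_ik = delta_jk
for jj in range(2):
    for kk in range(2):
        s = sum(C[i][jj] * D[i][kk] for i in range(2))
        assert abs(s - (1.0 if jj == kk else 0.0)) < 1e-9
```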
An important class of relations between prices and quantities of a group of
commodities would be the class of relations invariant under mixings of the kind
described above. The canonical correlations and their functions, which are the main
subject of this paper, are such invariants. But on account of the restriction that
linear transformations of one set of variates shall be contragredient to those of the
other, there will be additional invariants for this case, which remain to be investigated.
Other important problems are connected with the case of equal canonical
correlations, which had to be excluded in deriving the approximate standard errors
in Section 5. If two or more canonical correlations in the population are equal, it
appears that the distribution of the corresponding sample values does not approach
the multivariate normal form. This case is of much practical importance, owing to
the practice of devising tests designed to measure the same character with equal
accuracy. The psychologists' use of "reliability coefficients," and of "correlations corrected for attenuation" has been recognized as unsatisfactory. One symptom of
trouble is that the formula for correlations corrected for attenuation sometimes
gives values greater than unity. A satisfactory treatment of this difficulty should
be possible with the help of the distribution function, when found, of sample

canonical correlations, or of the vector correlation, when in the population the roots
of the determinantal equation are equal.
