
STAT7005 Multivariate Methods

1 Overview and Matrix Orientation

1.1 Multivariate Extension of Univariate Problems

Typical examples of univariate problems that have a direct extension to the multivariate context:

• Testing the population mean being equal to a hypothesized value (one-sample t test)
• Testing the equality of two population means (two independent samples t test)
• Testing the equality of several population means (one-way and multi-way ANOVA)
• Testing the population variance being equal to a hypothesized value (chi-square test)
• Testing the equality of two population variances (F test)
• Regression of Y on X1, X2, . . . , Xm (or linear modelling when some of the Xi are not continuous)

Examples of Multivariate Extension


Suppose eight men (n = 8) each received a certain drug. The changes in blood sugar (population mean µ1), diastolic blood pressure (µ2) and systolic blood pressure (µ3) are recorded.

Man   Blood sugar   Diastolic BP   Systolic BP
  1        30            −8             −1
  2        90             7              6
  3       −10            −2              4
  4       −10             0              2
  5        30            −2              5
  6        60             0              3
  7         0            −2             −1
  8        40             1              2

1. Testing change before and after

Consider the univariate t-test at α level for each of the three hypotheses.

No drug effect on change of blood sugar ⇔ No change in blood sugar
⇔ H0: µ1 = 0 vs H1: µ1 ≠ 0
No drug effect on change of diastolic BP ⇔ No change in diastolic BP
⇔ H0: µ2 = 0 vs H1: µ2 ≠ 0
No drug effect on change of systolic BP ⇔ No change in systolic BP
⇔ H0: µ3 = 0 vs H1: µ3 ≠ 0

Suppose we want to test the joint hypothesis (n = 8, p = 3).

No drug effect ⇔ No change in blood sugar and both BPs
⇔ H0: µ1 = 0, µ2 = 0, µ3 = 0 vs. H1: µj ≠ 0 for at least one j

$$\Leftrightarrow\quad H_0: \begin{pmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} \quad \text{vs.}\quad H_1: \text{at least one element not zero}$$

It is tempting to combine the three t-tests into a joint procedure that rejects the joint statement of the three hypotheses if at least one of the univariate tests ends up with a rejection. But then the Type I error (probability of wrong rejection when H0 is correct) of this single procedure is generally larger than α. One approach, treating all three hypotheses as equally important, is to adjust the significance level of each test to α/3 instead. This approach guarantees that the Type I error of the joint procedure is no more than α, by the Bonferroni inequality of probability. The problem is that if the actual Type I error of a procedure is much smaller than α, the power of the test (probability of correct rejection when H0 is incorrect) is also much smaller than what one would expect of a procedure with Type I error exactly α. For this reason, it may be desirable to have a single multivariate test of the three hypotheses jointly at an exact significance level α.
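A standard choice of such a joint test is Hotelling's one-sample T² (developed later in these notes' sequel material). As a preview, here is a minimal sketch, assuming NumPy and SciPy are available, that contrasts the three Bonferroni-adjusted t tests with a joint T² test of H0: µ = 0 on the data above; the calibration (n − p)/(p(n − 1)) · T² ~ F(p, n − p) under H0 is the standard one-sample result.

```python
# Sketch: Bonferroni-adjusted univariate t tests vs a joint Hotelling T^2
# test of H0: mu = 0 on the 8 x 3 drug data above.
import numpy as np
from scipy import stats

X = np.array([[ 30, -8, -1],
              [ 90,  7,  6],
              [-10, -2,  4],
              [-10,  0,  2],
              [ 30, -2,  5],
              [ 60,  0,  3],
              [  0, -2, -1],
              [ 40,  1,  2]], dtype=float)
n, p = X.shape
alpha = 0.05

# Three univariate t tests, each run at level alpha/3 (Bonferroni).
t, pval = stats.ttest_1samp(X, popmean=0.0)
print("univariate p-values:", pval, "reject at alpha/3:", pval < alpha / p)

# Joint test: T^2 = n xbar' S^{-1} xbar, with
# (n - p) / (p (n - 1)) * T^2 ~ F(p, n - p) under H0.
xbar = X.mean(axis=0)
S = np.cov(X, rowvar=False)                 # unbiased sample covariance
T2 = n * xbar @ np.linalg.solve(S, xbar)
F = (n - p) / (p * (n - 1)) * T2
print("joint p-value:", stats.f.sf(F, p, n - p))
```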

2. Two-sample test for means

Suppose data are also available for ten women and we wish to study the difference between genders. Then one may consider each measurement at a time and perform three univariate tests accordingly:

Drug effect on change in blood sugar is the same on men and women ⇔ $H_0: \mu_1^{(1)} = \mu_1^{(2)}$
Drug effect on change in diastolic BP is the same on men and women ⇔ $H_0: \mu_2^{(1)} = \mu_2^{(2)}$
Drug effect on change in systolic BP is the same on men and women ⇔ $H_0: \mu_3^{(1)} = \mu_3^{(2)}$

If a single multivariate test is desired for the same reason as explained in item
1 above, we would look for a multivariate test (n1 = 8, n2 = 10, p = 3) for the
hypothesis in vector form:

Change in all three measures is the same on men and women

$$\Leftrightarrow\quad H_0: \begin{pmatrix} \mu_1^{(1)} \\ \mu_2^{(1)} \\ \mu_3^{(1)} \end{pmatrix} = \begin{pmatrix} \mu_1^{(2)} \\ \mu_2^{(2)} \\ \mu_3^{(2)} \end{pmatrix}$$


In a similar way, ANOVA is extended to MANOVA (short for multivariate ANOVA) if we are comparing more than two independent samples, or if we are dealing with a factorial design where the tests are based on multivariate observations.
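The two-sample analogue is the two-sample Hotelling T² based on the pooled covariance matrix. The women's measurements are not given in these notes, so the sketch below simulates placeholder data for them (n2 = 10) purely for illustration; only the formulas are the point here.

```python
# Hedged sketch of the two-sample Hotelling T^2 test. X2 is SIMULATED
# stand-in data for the ten women, since their values are not in the notes.
import numpy as np
from scipy import stats

X1 = np.array([[ 30, -8, -1], [ 90, 7, 6], [-10, -2, 4], [-10, 0, 2],
               [ 30, -2,  5], [ 60, 0, 3], [  0, -2, -1], [ 40, 1, 2]], float)
rng = np.random.default_rng(7005)
X2 = rng.multivariate_normal([20, 0, 2], np.cov(X1, rowvar=False), size=10)

n1, n2, p = len(X1), len(X2), X1.shape[1]
d = X1.mean(axis=0) - X2.mean(axis=0)
Sp = ((n1 - 1) * np.cov(X1, rowvar=False) +
      (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)   # pooled covariance
T2 = (n1 * n2) / (n1 + n2) * d @ np.linalg.solve(Sp, d)
F = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * T2
print("joint p-value:", stats.f.sf(F, p, n1 + n2 - p - 1))
```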

3. Regression

Let

Y1 = change in blood sugar

Y2 = change in diastolic BP

Y3 = change in systolic BP

X = age of subject

For change in blood sugar alone, the model for linear regression is (n = 8, m = 1):
Y1 = α + βX + ε
Direct multivariate extension covering all three measures will be (n = 8, p = 3, m = 1):

Y1 = α1 + β1X + ε1
Y2 = α2 + β2X + ε2
Y3 = α3 + β3X + ε3

We shall see later that the point estimates of regression parameters and their standard errors from three separate regressions for Yi, i = 1, 2, 3, are the same as those from the single multivariate regression. But the latter approach can do more: testing a hypothesis, or estimating a quantity, that involves regression parameters from different univariate equations. The multivariate approach can also provide simultaneous confidence intervals or confidence regions concerning a mix of parameters from several univariate regressions.
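A quick numerical check of the first claim: the ages are not given in the notes, so the age values below are hypothetical; the point is only that column-by-column least squares and one multivariate least-squares fit produce identical point estimates.

```python
# Sketch with HYPOTHETICAL ages: separate univariate OLS fits and one
# multivariate OLS fit give the same coefficient estimates.
import numpy as np

Y = np.array([[ 30, -8, -1], [ 90, 7, 6], [-10, -2, 4], [-10, 0, 2],
              [ 30, -2,  5], [ 60, 0, 3], [  0, -2, -1], [ 40, 1, 2]], float)
age = np.array([45., 52., 38., 41., 60., 55., 47., 50.])   # hypothetical ages
X = np.column_stack([np.ones_like(age), age])              # intercept + age

# One multivariate fit: B is 2 x 3, column j holds (alpha_j, beta_j)'.
B_multi, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Three separate univariate fits give the same columns.
B_sep = np.column_stack([np.linalg.lstsq(X, Y[:, j], rcond=None)[0]
                         for j in range(3)])
print(np.allclose(B_multi, B_sep))   # True
```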

4. Linear model

Consider now the situation with an additional record of 10 women, and define a dummy variable indicating gender: Z = 1 for a man and Z = 0 for a woman.

The univariate linear model for change in blood sugar on age without interaction effects with gender, i.e. where the regression lines for both genders have the same slope, is (n = 18, m = 2):

Y1 = α + βX + γZ + ε.

Multivariate extension covering all three measures will then take the form (n = 18, p = 3, m = 2):

Y1 = α1 + β1X + γ1Z + ε1
Y2 = α2 + β2X + γ2Z + ε2
Y3 = α3 + β3X + γ3Z + ε3
Here, the difference between the multivariate approach and the separate univariate approach is the same as for the regression situation discussed in item 3 above.

1.2 New Multivariate Problems

There are multivariate problems that do not arise as direct extensions of univariate problems. The following is a partial list.

• Dimension reduction for graphical presentation

1-dim. − histogram, stem-and-leaf plot
2-dim. − 2-D plot
3-dim. − 3-D plot by rotation
> 3-dim. − need dimension reduction methods

• Testing special structures of a mean vector or several mean vectors

• Testing special structures of a covariance matrix or several covariance matrices

• Correlation Analysis (Canonical Correlation Analysis, CCA)

• Principal Component Analysis (PCA)

• Factor Analysis (FA)

• Discrimination and Classification (also called Pattern Recognition or Supervised Learning)

• Clustering (also called Unsupervised Learning)

• Correspondence Analysis

1.3 Population Quantities

Notation
A, B, S, R · · · matrix (bold-face capital letters)
a, b, s, r · · · vector (bold-face lower-case letters)
a, b, s, r · · · scalar (light-face lower-case letters)

For simplicity, we use x = (x1, . . . , xp)′ to represent both a random vector and an observation vector wherever the context is clear.


1.3.1 Mean Vector


Define the mean vector of the random vector x as

$$E(x) = \begin{pmatrix} E(x_1) \\ \vdots \\ E(x_p) \end{pmatrix} = \begin{pmatrix} \mu_1 \\ \vdots \\ \mu_p \end{pmatrix} = \mu$$

Properties
1. E(a′x) = a′µ.
2. E(Ax) = Aµ.

1.3.2 Covariance Matrix (short for Variance-Covariance Matrix)

Define the covariance matrix of the random vector x as

$$\mathrm{Var}(x) \;\text{or}\; \Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp} \end{pmatrix}$$

where σij = Cov(xi, xj) = E[(xi − E(xi))(xj − E(xj))].

Note that
1) σii = Cov(xi, xi) = Var(xi) for i = 1, 2, . . . , p.
2) σij = Cov(xi, xj) = Cov(xj, xi) = σji ⇒ Σ is symmetric.

In matrix form,

Var(x) = E[(x − E(x))(x − E(x))′] = E(xx′) − E(x)E(x)′.

If E(x) = µ,
Var(x) = E(xx′) − µµ′.
Properties
1. Var(a′x + b) = Var(a′x) = a′Σa ≥ 0, where equality holds if and only if a′x = c, a constant. This property implies that Σ is always non-negative definite. It also implies that Σ is positive definite, and hence Σ⁻¹ exists, unless x1, . . . , xp are linearly related, in which case we say that x is a degenerate random vector (i.e. its effective dimension is less than p; in other words, its joint distribution is concentrated in a subspace of lower dimension).

2. Var(Ax + b) = AΣA′.
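Property 2 is easy to check by simulation; the following sketch (not part of the notes, with an arbitrary Σ, A and b) compares the empirical covariance of draws of Ax + b with AΣA′.

```python
# Monte Carlo check of Var(Ax + b) = A Sigma A'.
import numpy as np

rng = np.random.default_rng(0)
Sigma = np.array([[4.0, 1.0, 0.5], [1.0, 2.0, 0.3], [0.5, 0.3, 1.0]])
mu = np.array([1.0, -2.0, 0.0])
A = np.array([[1.0, 0.0, 2.0], [0.0, 3.0, -1.0]])
b = np.array([5.0, -7.0])

x = rng.multivariate_normal(mu, Sigma, size=200_000)   # rows are draws of x
y = x @ A.T + b                                        # y = A x + b, row-wise
print(np.round(np.cov(y, rowvar=False), 2))            # empirical Var(Ax + b)
print(A @ Sigma @ A.T)                                 # theoretical A Sigma A'
```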


1.3.3 Covariance Matrix of Two Vectors

The covariance matrix of two random vectors is defined as

Cov(x(p×1), y(q×1)) = E[(x − E(x))(y − E(y))′], a p × q matrix.

If E(x) = µx and E(y) = µy,

Cov(x, y) = E[(x − µx)(y − µy)′].

Note the asymmetry in the arguments:

Cov(y, x) = E[(y − µy)(x − µx)′] = [Cov(x, y)]′ ≠ Cov(x, y)

Properties
1. Var(x) = Cov(x, x).

2. Suppose x1, x2 and y are p × 1 vectors. Then Cov(x1 + x2, y) = Cov(x1, y) + Cov(x2, y).

3. Suppose x and y are p × 1 vectors. Then Var(x + y) = Var(x) + Var(y) + Cov(y, x) + Cov(x, y).

4. Cov(Ax, By) = A Cov(x, y) B′.

5. If x and y are independent, Cov(x, y) = 0. However, the converse is not always true.

1.3.4 Population Correlation Matrix

By definition, the correlation matrix is

$$\rho = \begin{pmatrix} 1 & \rho_{12} & \cdots & \rho_{1p} \\ \rho_{21} & 1 & \cdots & \rho_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{p1} & \rho_{p2} & \cdots & 1 \end{pmatrix}$$

where $\rho_{ij} = \mathrm{Corr}(x_i, x_j) = \mathrm{Cov}(x_i, x_j)/\sqrt{\mathrm{Var}(x_i)\,\mathrm{Var}(x_j)} = \sigma_{ij}/\sqrt{\sigma_{ii}\sigma_{jj}}$.

In matrix operation form,

$$\rho = \begin{pmatrix} \tfrac{1}{\sqrt{\sigma_{11}}} & 0 & \cdots & 0 \\ 0 & \tfrac{1}{\sqrt{\sigma_{22}}} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \tfrac{1}{\sqrt{\sigma_{pp}}} \end{pmatrix} \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp} \end{pmatrix} \begin{pmatrix} \tfrac{1}{\sqrt{\sigma_{11}}} & 0 & \cdots & 0 \\ 0 & \tfrac{1}{\sqrt{\sigma_{22}}} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \tfrac{1}{\sqrt{\sigma_{pp}}} \end{pmatrix} = D^{-1}\Sigma D^{-1},$$

where

$$D = \begin{pmatrix} \sqrt{\sigma_{11}} & 0 & \cdots & 0 \\ 0 & \sqrt{\sigma_{22}} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sqrt{\sigma_{pp}} \end{pmatrix} = (\mathrm{diag}(\Sigma))^{1/2}.$$
Properties
1. −1 ≤ ρij ≤ 1. ρii = 1 for i = 1, · · · , p. ρij = ρji. ρij = 0 if and only if σij = 0.

2. Each ρij does not change under relocation or positive rescaling of xi and xj.

1.4 Sample Statistics

Suppose x1, x2, · · · , xn are independent and identically distributed (i.i.d.) p × 1 random vectors, where xi = (xi1, · · · , xip)′, i = 1, · · · , n. We usually arrange the data in an n × p matrix, putting each multivariate observation as a row vector instead of a column vector, as follows.

1.4.1 Data Matrix

$$X = (x_{ij})_{n\times p} = \begin{pmatrix} x_1' \\ \vdots \\ x_i' \\ \vdots \\ x_n' \end{pmatrix} = \begin{pmatrix} x_{11} & \cdots & x_{1j} & \cdots & x_{1p} \\ \vdots & & \vdots & & \vdots \\ x_{i1} & \cdots & x_{ij} & \cdots & x_{ip} \\ \vdots & & \vdots & & \vdots \\ x_{n1} & \cdots & x_{nj} & \cdots & x_{np} \end{pmatrix}$$

Column j corresponds to variable Xj and row i to observation i; column j has sample mean x̄j and sample variance s²j.


1.4.2 Summary Statistics


1. Sample mean vector

We collate the sample means of all variables into a column vector

$$\bar{x}_{p\times 1} = \begin{pmatrix} \bar{x}_1 \\ \vdots \\ \bar{x}_p \end{pmatrix} = \frac{1}{n} X' 1.$$

2. Sample covariance matrix

It can be shown that the following sample covariance matrix is an unbiased estimate of the population covariance matrix:

$$S_{p\times p} = \begin{pmatrix} s_{11} & s_{12} & \cdots & s_{1p} \\ s_{21} & s_{22} & \cdots & s_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ s_{p1} & s_{p2} & \cdots & s_{pp} \end{pmatrix}$$

where

$$s_{kj} = \frac{1}{n-1}\sum_{i=1}^{n}(x_{ik} - \bar{x}_k)(x_{ij} - \bar{x}_j), \quad \text{the sample covariance between } x_k \text{ and } x_j,$$

and sjj = s²j, the sample variance of xj. In matrix form,

$$S = \frac{1}{n-1}\,[X'X - n\,\bar{x}\bar{x}'] = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})' = \frac{1}{n-1}W,$$

where

$$W = X'X - n\,\bar{x}\bar{x}' = \sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})'$$

is called the corrected sums of squares and products matrix (CSSP matrix for short).

3. Sample correlation matrix

From the sample covariance matrix, we can obtain the sample correlation matrix as an estimate of the population correlation matrix:

$$R_{p\times p} = \begin{pmatrix} 1 & r_{12} & \cdots & r_{1p} \\ r_{21} & 1 & \cdots & r_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ r_{p1} & r_{p2} & \cdots & 1 \end{pmatrix}$$

where $r_{kj} = s_{kj}/\sqrt{s_{kk}s_{jj}}$, the sample correlation coefficient between xk and xj.

In matrix form,

$$R = D^{-1} S D^{-1},$$

where

$$D = \begin{pmatrix} \sqrt{s_{11}} & 0 & \cdots & 0 \\ 0 & \sqrt{s_{22}} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sqrt{s_{pp}} \end{pmatrix}.$$
Note that, from the viewpoint of matrix operations, the transform from a covariance matrix to a correlation matrix is just a particular application of standardizing a positive definite matrix into a standard form in which the diagonal elements all equal one. The original positive definite matrix can be recovered from the diagonal elements and the standardized matrix, as the sketch below illustrates.
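Here is a NumPy sketch of these sample statistics on the 8 × 3 data matrix of Section 1.1: it computes x̄, W, S and R via the matrix formulas above, and recovers S from R and the diagonal of S.

```python
# Sample mean vector, CSSP matrix, covariance and correlation matrices.
import numpy as np

X = np.array([[ 30, -8, -1], [ 90, 7, 6], [-10, -2, 4], [-10, 0, 2],
              [ 30, -2,  5], [ 60, 0, 3], [  0, -2, -1], [ 40, 1, 2]], float)
n, p = X.shape

xbar = X.T @ np.ones(n) / n                     # xbar = (1/n) X'1
W = X.T @ X - n * np.outer(xbar, xbar)          # CSSP matrix
S = W / (n - 1)                                 # unbiased sample covariance
print(np.allclose(S, np.cov(X, rowvar=False)))  # True: agrees with NumPy

d = np.sqrt(np.diag(S))                         # diagonal of D
R = S / np.outer(d, d)                          # R = D^{-1} S D^{-1}
print(np.allclose(np.outer(d, d) * R, S))       # True: S recovered as D R D
```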


Appendix 1 Some Results on Matrices

Special Matrices

Unit vector: $1 = \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix}$

Identity matrix: $I = \begin{pmatrix} 1 & & 0 \\ & \ddots & \\ 0 & & 1 \end{pmatrix}$

Unit matrix: $J = \begin{pmatrix} 1 & \cdots & 1 \\ \vdots & \ddots & \vdots \\ 1 & \cdots & 1 \end{pmatrix} = 11'$
Matrix Operations
Let A = (aij), B = (bij), a = (ai) and b = (bi).

$$A + B = (a_{ij} + b_{ij}), \qquad A - B = (a_{ij} - b_{ij}), \qquad cA = (c\,a_{ij}),$$
$$a'b = \sum_i a_i b_i, \qquad AB = \Big(\sum_k a_{ik} b_{kj}\Big), \qquad A' = (a_{ji}),$$
$$\mathrm{diag}(A) = (a_{ij}\,\delta_{ij}) \;\text{ where } \delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j, \end{cases} \qquad \mathrm{tr}(A) = \sum_i a_{ii}.$$

Transpose

$$(A')' = A, \qquad (A + B)' = A' + B', \qquad (AB)' = B'A', \qquad \begin{pmatrix} A & B \\ C & D \end{pmatrix}' = \begin{pmatrix} A' & C' \\ B' & D' \end{pmatrix}$$
Determinant: Det(A) or |A|

1. |AB| = |A| |B|.

2. $\begin{vmatrix} A & C \\ 0 & B \end{vmatrix} = |A|\,|B|$.

3. $|I_p + AB| = |I_q + BA|$, where A is p × q and B is q × p.
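Property 3 is easy to check numerically; a small sketch with random rectangular matrices:

```python
# Check of |I_p + AB| = |I_q + BA| for random A (p x q) and B (q x p).
import numpy as np

rng = np.random.default_rng(1)
p, q = 5, 3
A = rng.standard_normal((p, q))
B = rng.standard_normal((q, p))
lhs = np.linalg.det(np.eye(p) + A @ B)
rhs = np.linalg.det(np.eye(q) + B @ A)
print(np.isclose(lhs, rhs))   # True
```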


Inverse: A⁻¹

1. A⁻¹A = AA⁻¹ = I.

2. (A′)⁻¹ = (A⁻¹)′.

3. (AB)⁻¹ = B⁻¹A⁻¹.

4. |A⁻¹| = |A|⁻¹.

5. Given $A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}$, where A11 and A22 are square matrices, let $A_{11\cdot 2} = A_{11} - A_{12}A_{22}^{-1}A_{21}$ and $A_{22\cdot 1} = A_{22} - A_{21}A_{11}^{-1}A_{12}$.

(a) If |A11| ≠ 0,
$$A^{-1} = \begin{pmatrix} A_{11}^{-1} + A_{11}^{-1}A_{12}A_{22\cdot 1}^{-1}A_{21}A_{11}^{-1} & -A_{11}^{-1}A_{12}A_{22\cdot 1}^{-1} \\ -A_{22\cdot 1}^{-1}A_{21}A_{11}^{-1} & A_{22\cdot 1}^{-1} \end{pmatrix}, \qquad |A| = |A_{11}|\,|A_{22\cdot 1}|.$$

(b) If |A22| ≠ 0,
$$A^{-1} = \begin{pmatrix} A_{11\cdot 2}^{-1} & -A_{11\cdot 2}^{-1}A_{12}A_{22}^{-1} \\ -A_{22}^{-1}A_{21}A_{11\cdot 2}^{-1} & A_{22}^{-1} + A_{22}^{-1}A_{21}A_{11\cdot 2}^{-1}A_{12}A_{22}^{-1} \end{pmatrix}, \qquad |A| = |A_{22}|\,|A_{11\cdot 2}|.$$

(c) If |A11| ≠ 0 and |A22| ≠ 0,
$$A^{-1} = \begin{pmatrix} A_{11\cdot 2}^{-1} & -A_{11}^{-1}A_{12}A_{22\cdot 1}^{-1} \\ -A_{22}^{-1}A_{21}A_{11\cdot 2}^{-1} & A_{22\cdot 1}^{-1} \end{pmatrix}, \qquad |A| = |A_{11}|\,|A_{22\cdot 1}| = |A_{22}|\,|A_{11\cdot 2}|.$$

6. $(A + CBD)^{-1} = A^{-1} - A^{-1}CB(B + BDA^{-1}CB)^{-1}BDA^{-1}$.

7. $(A + cd')^{-1} = A^{-1} - \dfrac{A^{-1}cd'A^{-1}}{1 + d'A^{-1}c}$.

8. $|A + cd'| = |A|\,(1 + d'A^{-1}c)$.
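Results 7 and 8 (often called the Sherman-Morrison rank-one update and the matrix determinant lemma) can be verified numerically; the sketch below uses random, well-conditioned inputs.

```python
# Check of the rank-one inverse update and determinant identity.
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4)) + 4 * np.eye(4)   # well-conditioned A
c, d = rng.standard_normal(4), rng.standard_normal(4)

Ainv = np.linalg.inv(A)
update = Ainv - (Ainv @ np.outer(c, d) @ Ainv) / (1 + d @ Ainv @ c)
print(np.allclose(update, np.linalg.inv(A + np.outer(c, d))))   # result 7
print(np.isclose(np.linalg.det(A + np.outer(c, d)),
                 np.linalg.det(A) * (1 + d @ Ainv @ c)))        # result 8
```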

Trace and Eigenvalues

Definition: Let A be an n × n square matrix and let x be an n × 1 nonzero vector such that Ax = λx. Then λ is called an eigenvalue of A and x is called an eigenvector corresponding to the eigenvalue λ. The eigenvalues are the solutions of

$$|A - \lambda I| = 0.$$

1. tr(A + B) = tr(A) + tr(B)

2. tr(AB) = tr(BA)

3. tr(αA) = α tr(A)

4. For an n × n matrix A with eigenvalues λ1, λ2, . . . , λn, we have the following:

(a) $\mathrm{tr}(A) = \sum_{i=1}^{n} \lambda_i$
(b) $|A| = \prod_{i=1}^{n} \lambda_i$
(c) $|I_n \pm A| = \prod_{i=1}^{n} (1 \pm \lambda_i)$

5. The nonzero eigenvalues of AB are the same as those of BA.
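The identities 4(a)-(c) and property 5 can be sanity-checked with random matrices; a small sketch:

```python
# Check of the trace/determinant eigenvalue identities and property 5.
import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2                     # symmetric, so eigenvalues are real
lam = np.linalg.eigvalsh(A)

print(np.isclose(np.trace(A), lam.sum()))                       # 4(a)
print(np.isclose(np.linalg.det(A), lam.prod()))                 # 4(b)
print(np.isclose(np.linalg.det(np.eye(4) + A), (1 + lam).prod()))  # 4(c)

# Property 5: AB (4 x 4, rank <= 2) has two ~zero eigenvalues plus those of BA.
A2 = rng.standard_normal((4, 2))
B2 = rng.standard_normal((2, 4))
print(np.sort_complex(np.linalg.eigvals(A2 @ B2)))
print(np.sort_complex(np.linalg.eigvals(B2 @ A2)))
```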


Positive Semi-Definite and Positive Definite Matrices
Definition: Let A be an n × n matrix. Then A is

symmetric if A′ = A;
orthogonal if A⁻¹ = A′;
positive semi-definite (p.s.d.) if A is symmetric and ℓ′Aℓ ≥ 0 for all ℓ;
positive definite (p.d.) if A is symmetric and ℓ′Aℓ > 0 for all ℓ ≠ 0.

1. A symmetric matrix is p.s.d. (p.d.) if and only if its eigenvalues are all nonnegative (all positive).

2. B′B is p.s.d. for any square matrix B.

3. If A is p.d., A⁻¹ is p.d.

4. (a) A is p.s.d. of rank r if and only if there exists a square matrix R of rank r such that A = RR′.
(b) A is p.d. if and only if there exists a non-singular matrix R such that A = RR′.
Factorization of Matrices
In the following, suppose A and B are n × n square matrices.

1. Spectral decomposition

If A is symmetric, there exists an orthogonal matrix P = (p1 p2 · · · pn) such that

$$P'AP = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}$$

where λ1, λ2, . . . , λn are eigenvalues of A and p1, p2, . . . , pn are their corresponding eigenvectors.


2. If A is p.s.d. (p.d.), there exists a p.s.d. (p.d.) matrix B such that A = B². Sometimes, people write B as A^{1/2} notationally.

3. Cholesky decomposition

If A is p.s.d. (p.d.), there exists a (unique) upper triangular matrix U with nonnegative (positive) diagonal elements, such that

A = U′U.

4. Canonical decomposition

If A is symmetric and B is symmetric p.d., there exists a non-singular matrix P such that

P′AP = Λ and P′BP = I_n,

where Λ = diag(λ1, . . . , λn) and the λi are the eigenvalues of B⁻¹A (or AB⁻¹).
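A NumPy/SciPy sketch of decompositions 1-4 follows. Note that np.linalg.cholesky returns the lower factor L with A = LL′, so the upper factor of the notes is U = L′; the canonical decomposition comes from SciPy's generalized symmetric eigensolver, which normalizes eigenvectors so that P′BP = I.

```python
# Spectral, square-root, Cholesky and canonical decompositions.
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(5)
M, N = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
A = M @ M.T + 3 * np.eye(3)                  # symmetric p.d.
B = N @ N.T + 3 * np.eye(3)                  # symmetric p.d.

# 1. Spectral decomposition: P' A P = diag(lambda), P orthogonal.
lam, P = np.linalg.eigh(A)
print(np.allclose(P.T @ A @ P, np.diag(lam)))

# 2. Symmetric square root: A^(1/2) = P diag(sqrt(lambda)) P'.
A_half = P @ np.diag(np.sqrt(lam)) @ P.T
print(np.allclose(A_half @ A_half, A))

# 3. Cholesky: U upper triangular with A = U'U.
U = np.linalg.cholesky(A).T
print(np.allclose(U.T @ U, A))

# 4. Canonical decomposition: P' A P = Lambda and P' B P = I.
lam2, Pc = eigh(A, B)                        # generalized problem A x = lambda B x
print(np.allclose(Pc.T @ B @ Pc, np.eye(3)),
      np.allclose(Pc.T @ A @ Pc, np.diag(lam2)))
```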
Idempotent Matrices
Definition: A matrix A is idempotent if A² = A.

1. If A is idempotent,
(a) I − A is idempotent;
(b) P′AP is idempotent if P is orthogonal.

2. If A is symmetric and idempotent, rank(A) = tr(A).

3. If A is an n × n symmetric matrix, A is idempotent of rank r if and only if A has r eigenvalues equal to 1 and n − r equal to 0.

4. If A is symmetric idempotent of rank n, A = I_n.
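A familiar statistical instance of a symmetric idempotent matrix is the hat matrix H = X(X′X)⁻¹X′ of a regression design; a sketch checking properties 2 and 3 on it:

```python
# The hat matrix is symmetric idempotent with rank = trace = rank(X).
import numpy as np

rng = np.random.default_rng(6)
X = np.column_stack([np.ones(8), rng.standard_normal((8, 2))])  # n = 8, 3 columns
H = X @ np.linalg.solve(X.T @ X, X.T)

print(np.allclose(H @ H, H))                 # idempotent
print(np.isclose(np.trace(H), 3.0))          # tr(H) = rank(H) = 3
print(np.round(np.linalg.eigvalsh(H), 8))    # eigenvalues are 0s and 1s
```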


Inequalities
1. Cauchy-Schwarz inequality:

$$(a'b)^2 \le (a'a)(b'b).$$

2. If D is positive definite, then for all a,

$$\frac{(a'\ell)^2}{a'Da} \le \ell' D^{-1} \ell.$$

The equality holds when a ∝ D⁻¹ℓ.

Proof sketch: given that D is positive definite, write D = CC′, so that D⁻¹ = (C′)⁻¹C⁻¹. Let b = C′a and e = C⁻¹ℓ. Then the result follows from the Cauchy-Schwarz inequality (b′e)² ≤ (b′b)(e′e).


3. Let A be an n × n symmetric matrix and let D be any n × n p.d. matrix. Then, for all a,

$$\lambda_{\min}(D^{-1}A) \;\le\; \frac{a'Aa}{a'Da} \;\le\; \lambda_{\max}(D^{-1}A),$$

where λmin and λmax denote the smallest and largest eigenvalues of D⁻¹A. The equality on either side holds when a is proportional to the corresponding eigenvector.

4. If A and B are p.d.,

$$\max_{a,b} \frac{(a'Db)^2}{a'Aa \cdot b'Bb} = \theta,$$

where θ is the largest eigenvalue of A⁻¹DB⁻¹D′ (or of B⁻¹D′A⁻¹D). The maximum occurs when a is proportional to an eigenvector of A⁻¹DB⁻¹D′ corresponding to θ, and b is proportional to an eigenvector of B⁻¹D′A⁻¹D corresponding to θ.

5. Consider the function

$$f(\Sigma) = \log|\Sigma| + \mathrm{tr}(\Sigma^{-1}A).$$

If A and Σ are p.d., f(Σ) is minimized uniquely at Σ = A.
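This result underlies maximum likelihood estimation of a covariance matrix. A crude numerical illustration (a sketch, not a proof): random p.d. candidates never achieve a smaller value of f than Σ = A.

```python
# Random p.d. candidates Sigma never beat Sigma = A for f(Sigma).
import numpy as np

def f(Sigma, A):
    return np.linalg.slogdet(Sigma)[1] + np.trace(np.linalg.solve(Sigma, A))

rng = np.random.default_rng(8)
R = rng.standard_normal((3, 3))
A = R @ R.T + np.eye(3)                       # positive definite target

base = f(A, A)                                # = log|A| + p
trials = []
for _ in range(1000):
    Q = rng.standard_normal((3, 3))
    Sigma = Q @ Q.T + 0.1 * np.eye(3)         # random p.d. candidate
    trials.append(f(Sigma, A))
print(base <= min(trials))                    # True
```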


Vector and Matrix Differentiation
Definition:

$$\frac{\partial y}{\partial x} = \begin{pmatrix} \frac{\partial y}{\partial x_1} \\ \vdots \\ \frac{\partial y}{\partial x_n} \end{pmatrix}, \qquad \frac{\partial y}{\partial X} = \begin{pmatrix} \frac{\partial y}{\partial x_{11}} & \cdots & \frac{\partial y}{\partial x_{1n}} \\ \vdots & & \vdots \\ \frac{\partial y}{\partial x_{n1}} & \cdots & \frac{\partial y}{\partial x_{nn}} \end{pmatrix}$$

1. $\dfrac{\partial\, a'x}{\partial x} = a$.

2. $\dfrac{\partial\, x'Ax}{\partial x} = 2Ax$ if A is symmetric.

3. $\dfrac{\partial\, \mathrm{tr}(X)}{\partial X} = I$.

4. $\dfrac{\partial\, \mathrm{tr}(AX)}{\partial X} = \begin{cases} A' & \text{if all elements of } X \text{ are distinct} \\ A + A' - \mathrm{diag}(A) & \text{if } X \text{ is symmetric.} \end{cases}$

5. $\dfrac{\partial\, |X|}{\partial X} = \begin{cases} |X|\,(X^{-1})' & \text{if all elements of } X \text{ are distinct} \\ |X|\,(2X^{-1} - \mathrm{diag}(X^{-1})) & \text{if } X \text{ is symmetric.} \end{cases}$
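Rules 1 and 2 can be sanity-checked with central finite differences; a short sketch:

```python
# Finite-difference check of d(a'x)/dx = a and d(x'Ax)/dx = 2Ax (A symmetric).
import numpy as np

rng = np.random.default_rng(9)
n = 4
a, x = rng.standard_normal(n), rng.standard_normal(n)
M = rng.standard_normal((n, n))
A = (M + M.T) / 2                                  # symmetric

def num_grad(f, x, h=1e-6):
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)     # central difference
    return g

print(np.allclose(num_grad(lambda v: a @ v, x), a))              # rule 1
print(np.allclose(num_grad(lambda v: v @ A @ v, x), 2 * A @ x))  # rule 2
```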
