
STAT7005 Multivariate Methods

1 Overview and Matrix Orientation

1.1 Multivariate Extension of Univariate Problems

Typical examples of univariate problems that have a direct extension to the multivariate context:

• Testing the population mean being equal to a hypothesized value (one-sample t test)
• Testing the equality of two population means (two independent samples t test)
• Testing the equality of several population means (one-way and multi-way ANOVA)
• Testing the population variance being equal to a hypothesized value (chi-square test)
• Testing the equality of two population variances (F test)
• Regression of Y on X1, X2, . . . , Xm (or linear modelling when some of the Xi are not continuous)

Examples of Multivariate Extension


Suppose eight men (n = 8) each received a certain drug. The changes in blood sugar (population mean µ1), diastolic blood pressure (µ2) and systolic blood pressure (µ3) are recorded.

Man   Blood sugar   Diastolic BP   Systolic BP
  1        30            −8             −1
  2        90             7              6
  3       −10            −2              4
  4       −10             0              2
  5        30            −2              5
  6        60             0              3
  7         0            −2             −1
  8        40             1              2

1. Testing change before and after

Consider the univariate t-test at α level for each of the three hypotheses.

No drug effect on change of blood sugar ⇔ No change in blood sugar
⇔ H0: µ1 = 0 vs H1: µ1 ≠ 0
No drug effect on change of diastolic BP ⇔ No change in diastolic BP
⇔ H0: µ2 = 0 vs H1: µ2 ≠ 0
No drug effect on change of systolic BP ⇔ No change in systolic BP
⇔ H0: µ3 = 0 vs H1: µ3 ≠ 0

Suppose we want to test the joint hypothesis (n = 8, p = 3).

No drug effect ⇔ No change in blood sugar and both BPs
⇔ H0: µ1 = 0, µ2 = 0, µ3 = 0 vs. H1: µj ≠ 0 for at least one j

$$\Leftrightarrow\quad H_0: \begin{pmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} \quad \text{vs.}\quad H_1: \text{at least one element not zero}$$

It is tempting to combine the three t-tests into a joint procedure that rejects the joint statement of the three hypotheses if at least one of the univariate tests ends up with a rejection. But then the Type I error (probability of wrong rejection when H0 is correct) of this single procedure is generally larger than α. One approach, treating all three hypotheses as equally important, is to adjust the significance level of each test to α/3 instead. This approach guarantees that the Type I error of the joint procedure is no more than α, by the Bonferroni inequality of probability. The problem is that if the actual Type I error of a procedure is much smaller than α, the power of the test (probability of correct rejection when H0 is incorrect) is also much smaller than what one would expect of a procedure with Type I error exactly α. For this reason, it may be desirable to have a single multivariate test of the three hypotheses jointly at an exact significance level α.
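A standard choice of such a joint test is Hotelling's one-sample T² (developed later in these notes' sequel material). As a preview, here is a minimal sketch, assuming NumPy and SciPy are available, that contrasts the three Bonferroni-adjusted t tests with a joint T² test of H0: µ = 0 on the data above; the calibration (n − p)/(p(n − 1)) · T² ~ F(p, n − p) under H0 is the standard one-sample result.

```python
# Sketch: Bonferroni-adjusted univariate t tests vs a joint Hotelling T^2
# test of H0: mu = 0 on the 8 x 3 drug data above.
import numpy as np
from scipy import stats

X = np.array([[ 30, -8, -1],
              [ 90,  7,  6],
              [-10, -2,  4],
              [-10,  0,  2],
              [ 30, -2,  5],
              [ 60,  0,  3],
              [  0, -2, -1],
              [ 40,  1,  2]], dtype=float)
n, p = X.shape
alpha = 0.05

# Three univariate t tests, each run at level alpha/3 (Bonferroni).
t, pval = stats.ttest_1samp(X, popmean=0.0)
print("univariate p-values:", pval, "reject at alpha/3:", pval < alpha / p)

# Joint test: T^2 = n xbar' S^{-1} xbar, with
# (n - p) / (p (n - 1)) * T^2 ~ F(p, n - p) under H0.
xbar = X.mean(axis=0)
S = np.cov(X, rowvar=False)                 # unbiased sample covariance
T2 = n * xbar @ np.linalg.solve(S, xbar)
F = (n - p) / (p * (n - 1)) * T2
print("joint p-value:", stats.f.sf(F, p, n - p))
```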

2. Two-sample test for means

Suppose data are also available for ten women and we wish to study the difference between genders. Then one may consider each measurement at a time and perform three univariate tests accordingly:

Drug effect on change in blood sugar is the same on men and women ⇔ $H_0: \mu_1^{(1)} = \mu_1^{(2)}$
Drug effect on change in diastolic BP is the same on men and women ⇔ $H_0: \mu_2^{(1)} = \mu_2^{(2)}$
Drug effect on change in systolic BP is the same on men and women ⇔ $H_0: \mu_3^{(1)} = \mu_3^{(2)}$

If a single multivariate test is desired for the same reason as explained in item
1 above, we would look for a multivariate test (n1 = 8, n2 = 10, p = 3) for the
hypothesis in vector form:

Change in all three measures is the same on men and women

$$\Leftrightarrow\quad H_0: \begin{pmatrix} \mu_1^{(1)} \\ \mu_2^{(1)} \\ \mu_3^{(1)} \end{pmatrix} = \begin{pmatrix} \mu_1^{(2)} \\ \mu_2^{(2)} \\ \mu_3^{(2)} \end{pmatrix}$$


In a similar way, ANOVA is extended to MANOVA (short for multivariate ANOVA) if we are comparing more than two independent samples, or if we are dealing with a factorial design where the tests are based on multivariate observations.
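The two-sample analogue is the two-sample Hotelling T² based on the pooled covariance matrix. The women's measurements are not given in these notes, so the sketch below simulates placeholder data for them (n2 = 10) purely for illustration; only the formulas are the point here.

```python
# Hedged sketch of the two-sample Hotelling T^2 test. X2 is SIMULATED
# stand-in data for the ten women, since their values are not in the notes.
import numpy as np
from scipy import stats

X1 = np.array([[ 30, -8, -1], [ 90, 7, 6], [-10, -2, 4], [-10, 0, 2],
               [ 30, -2,  5], [ 60, 0, 3], [  0, -2, -1], [ 40, 1, 2]], float)
rng = np.random.default_rng(7005)
X2 = rng.multivariate_normal([20, 0, 2], np.cov(X1, rowvar=False), size=10)

n1, n2, p = len(X1), len(X2), X1.shape[1]
d = X1.mean(axis=0) - X2.mean(axis=0)
Sp = ((n1 - 1) * np.cov(X1, rowvar=False) +
      (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)   # pooled covariance
T2 = (n1 * n2) / (n1 + n2) * d @ np.linalg.solve(Sp, d)
F = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * T2
print("joint p-value:", stats.f.sf(F, p, n1 + n2 - p - 1))
```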

3. Regression

Let

Y1 = change in blood sugar

Y2 = change in diastolic BP

Y3 = change in systolic BP

X = age of subject

For change in blood sugar alone, the model for linear regression is (n = 8, m = 1):
Y1 = α + βX + ε
Direct multivariate extension covering all three measures will be (n = 8, p = 3, m = 1):

Y1 = α1 + β1X + ε1
Y2 = α2 + β2X + ε2
Y3 = α3 + β3X + ε3

We shall see later that the point estimates of regression parameters and their standard errors from three separate regressions for Yi, i = 1, 2, 3, are the same as those from the single multivariate regression. But the latter approach can do more: testing a hypothesis, or estimating a quantity, that involves regression parameters from different univariate equations. The multivariate approach can also provide simultaneous confidence intervals or confidence regions concerning a mix of parameters from several univariate regressions.
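A quick numerical check of the first claim: the ages are not given in the notes, so the age values below are hypothetical; the point is only that column-by-column least squares and one multivariate least-squares fit produce identical point estimates.

```python
# Sketch with HYPOTHETICAL ages: separate univariate OLS fits and one
# multivariate OLS fit give the same coefficient estimates.
import numpy as np

Y = np.array([[ 30, -8, -1], [ 90, 7, 6], [-10, -2, 4], [-10, 0, 2],
              [ 30, -2,  5], [ 60, 0, 3], [  0, -2, -1], [ 40, 1, 2]], float)
age = np.array([45., 52., 38., 41., 60., 55., 47., 50.])   # hypothetical ages
X = np.column_stack([np.ones_like(age), age])              # intercept + age

# One multivariate fit: B is 2 x 3, column j holds (alpha_j, beta_j)'.
B_multi, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Three separate univariate fits give the same columns.
B_sep = np.column_stack([np.linalg.lstsq(X, Y[:, j], rcond=None)[0]
                         for j in range(3)])
print(np.allclose(B_multi, B_sep))   # True
```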

4. Linear model

Consider now the situation with an additional record of 10 women, and define a dummy variable indicating gender: Z = 1 for a man and Z = 0 for a woman.

The univariate linear model for change in blood sugar on age without interaction effects with gender, i.e. where the regression lines for both genders have the same slope, is (n = 18, m = 2):

Y1 = α + βX + γZ + ε.

Multivariate extension covering all three measures will then take the form (n = 18, p = 3, m = 2):

Y1 = α1 + β1X + γ1Z + ε1
Y2 = α2 + β2X + γ2Z + ε2
Y3 = α3 + β3X + γ3Z + ε3
Here, the difference between the multivariate approach and the separate univariate approach is the same as for the regression situation discussed in item 3 above.

1.2 New Multivariate Problems

There are multivariate problems that do not arise as direct extensions of univariate problems. The following is a partial list.

• Dimension reduction for graphical presentation

1-dim. − histogram, stem-and-leaf plot
2-dim. − 2-D plot
3-dim. − 3-D plot by rotation
> 3-dim. − need dimension reduction methods

• Testing special structures of a mean vector or several mean vectors

• Testing special structures of a covariance matrix or several covariance matrices

• Correlation Analysis (Canonical Correlation Analysis, CCA)

• Principal Component Analysis (PCA)

• Factor Analysis (FA)

• Discrimination and Classification (also called Pattern Recognition or Supervised Learning)

• Clustering (also called Unsupervised Learning)

• Correspondence Analysis

1.3 Population Quantities

Notation
A, B, S, R · · · matrix (bold-face capital letters)
a, b, s, r · · · vector (bold-face lower-case letters)
a, b, s, r · · · scalar (light-face lower-case letters)

For simplicity, we use x = (x1, . . . , xp)′ to represent both a random vector and an observation vector wherever the context is clear.


1.3.1 Mean Vector


Define the mean vector of the random vector x as

$$E(x) = \begin{pmatrix} E(x_1) \\ \vdots \\ E(x_p) \end{pmatrix} = \begin{pmatrix} \mu_1 \\ \vdots \\ \mu_p \end{pmatrix} = \mu$$

Properties
1. E(a′x) = a′µ.
2. E(Ax) = Aµ.

1.3.2 Covariance Matrix (short for Variance-Covariance Matrix)

Define the covariance matrix of the random vector x as

$$\mathrm{Var}(x) \;\text{or}\; \Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp} \end{pmatrix}$$

where σij = Cov(xi, xj) = E[(xi − E(xi))(xj − E(xj))].

Note that
1) σii = Cov(xi, xi) = Var(xi) for i = 1, 2, . . . , p.
2) σij = Cov(xi, xj) = Cov(xj, xi) = σji ⇒ Σ is symmetric.

In matrix form,

Var(x) = E[(x − E(x))(x − E(x))′] = E(xx′) − E(x)E(x)′.

If E(x) = µ,
Var(x) = E(xx′) − µµ′.
Properties
1. Var(a′x + b) = Var(a′x) = a′Σa ≥ 0, where equality holds if and only if a′x = c, a constant. This property implies that Σ is always non-negative definite. It also implies that Σ is positive definite, and hence Σ⁻¹ exists, unless x1, . . . , xp are linearly related, in which case we say that x is a degenerate random vector (i.e. its effective dimension is less than p; in other words, its joint distribution is concentrated in a subspace of lower dimension).

2. Var(Ax + b) = AΣA′.
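Property 2 is easy to check by simulation; the following sketch (not part of the notes, with an arbitrary Σ, A and b) compares the empirical covariance of draws of Ax + b with AΣA′.

```python
# Monte Carlo check of Var(Ax + b) = A Sigma A'.
import numpy as np

rng = np.random.default_rng(0)
Sigma = np.array([[4.0, 1.0, 0.5], [1.0, 2.0, 0.3], [0.5, 0.3, 1.0]])
mu = np.array([1.0, -2.0, 0.0])
A = np.array([[1.0, 0.0, 2.0], [0.0, 3.0, -1.0]])
b = np.array([5.0, -7.0])

x = rng.multivariate_normal(mu, Sigma, size=200_000)   # rows are draws of x
y = x @ A.T + b                                        # y = A x + b, row-wise
print(np.round(np.cov(y, rowvar=False), 2))            # empirical Var(Ax + b)
print(A @ Sigma @ A.T)                                 # theoretical A Sigma A'
```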


1.3.3 Covariance Matrix of Two Vectors

The covariance matrix of two random vectors is defined as

Cov(x(p×1), y(q×1)) = E[(x − E(x))(y − E(y))′], a p × q matrix.

If E(x) = µx and E(y) = µy,

Cov(x, y) = E[(x − µx)(y − µy)′].

Note the asymmetry in the arguments:

Cov(y, x) = E[(y − µy)(x − µx)′] = [Cov(x, y)]′ ≠ Cov(x, y)

Properties
1. Var(x) = Cov(x, x).

2. Suppose x1, x2 and y are p × 1 vectors. Then Cov(x1 + x2, y) = Cov(x1, y) + Cov(x2, y).

3. Suppose x and y are p × 1 vectors. Then Var(x + y) = Var(x) + Var(y) + Cov(y, x) + Cov(x, y).

4. Cov(Ax, By) = A Cov(x, y) B′.

5. If x and y are independent, Cov(x, y) = 0. However, the converse is not always true.

1.3.4 Population Correlation Matrix

By definition, the correlation matrix is

$$\rho = \begin{pmatrix} 1 & \rho_{12} & \cdots & \rho_{1p} \\ \rho_{21} & 1 & \cdots & \rho_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{p1} & \rho_{p2} & \cdots & 1 \end{pmatrix}$$

where $\rho_{ij} = \mathrm{Corr}(x_i, x_j) = \mathrm{Cov}(x_i, x_j)/\sqrt{\mathrm{Var}(x_i)\,\mathrm{Var}(x_j)} = \sigma_{ij}/\sqrt{\sigma_{ii}\sigma_{jj}}$.

In matrix operation form,

$$\rho = \begin{pmatrix} \tfrac{1}{\sqrt{\sigma_{11}}} & 0 & \cdots & 0 \\ 0 & \tfrac{1}{\sqrt{\sigma_{22}}} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \tfrac{1}{\sqrt{\sigma_{pp}}} \end{pmatrix} \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp} \end{pmatrix} \begin{pmatrix} \tfrac{1}{\sqrt{\sigma_{11}}} & 0 & \cdots & 0 \\ 0 & \tfrac{1}{\sqrt{\sigma_{22}}} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \tfrac{1}{\sqrt{\sigma_{pp}}} \end{pmatrix} = D^{-1}\Sigma D^{-1},$$

where

$$D = \begin{pmatrix} \sqrt{\sigma_{11}} & 0 & \cdots & 0 \\ 0 & \sqrt{\sigma_{22}} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sqrt{\sigma_{pp}} \end{pmatrix} = (\mathrm{diag}(\Sigma))^{1/2}.$$
Properties
1. −1 ≤ ρij ≤ 1. ρii = 1 for i = 1, · · · , p. ρij = ρji. ρij = 0 if and only if σij = 0.

2. Each ρij does not change under relocation or positive rescaling of xi and xj.

1.4 Sample Statistics

Suppose x1, x2, · · · , xn are independent and identically distributed (i.i.d.) p × 1 random vectors, where xi = (xi1, · · · , xip)′, i = 1, · · · , n. We usually arrange the data in an n × p matrix, putting each multivariate observation as a row vector instead of a column vector, as follows.

1.4.1 Data Matrix

$$X = (x_{ij})_{n\times p} = \begin{pmatrix} x_1' \\ \vdots \\ x_i' \\ \vdots \\ x_n' \end{pmatrix} = \begin{pmatrix} x_{11} & \cdots & x_{1j} & \cdots & x_{1p} \\ \vdots & & \vdots & & \vdots \\ x_{i1} & \cdots & x_{ij} & \cdots & x_{ip} \\ \vdots & & \vdots & & \vdots \\ x_{n1} & \cdots & x_{nj} & \cdots & x_{np} \end{pmatrix}$$

Column j corresponds to variable Xj and row i to observation i; column j has sample mean x̄j and sample variance s²j.


1.4.2 Summary Statistics


1. Sample mean vector

We collate the sample means of all variables into a column vector

$$\bar{x}_{p\times 1} = \begin{pmatrix} \bar{x}_1 \\ \vdots \\ \bar{x}_p \end{pmatrix} = \frac{1}{n} X' 1.$$

2. Sample covariance matrix

It can be shown that the following sample covariance matrix is an unbiased estimate of the population covariance matrix:

$$S_{p\times p} = \begin{pmatrix} s_{11} & s_{12} & \cdots & s_{1p} \\ s_{21} & s_{22} & \cdots & s_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ s_{p1} & s_{p2} & \cdots & s_{pp} \end{pmatrix}$$

where

$$s_{kj} = \frac{1}{n-1}\sum_{i=1}^{n}(x_{ik} - \bar{x}_k)(x_{ij} - \bar{x}_j), \quad \text{the sample covariance between } x_k \text{ and } x_j,$$

and sjj = s²j, the sample variance of xj. In matrix form,

$$S = \frac{1}{n-1}\,[X'X - n\,\bar{x}\bar{x}'] = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})' = \frac{1}{n-1}W,$$

where

$$W = X'X - n\,\bar{x}\bar{x}' = \sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})'$$

is called the corrected sums of squares and products matrix (CSSP matrix for short).

3. Sample correlation matrix

From the sample covariance matrix, we can obtain the sample correlation matrix as an estimate of the population correlation matrix:

$$R_{p\times p} = \begin{pmatrix} 1 & r_{12} & \cdots & r_{1p} \\ r_{21} & 1 & \cdots & r_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ r_{p1} & r_{p2} & \cdots & 1 \end{pmatrix}$$

where $r_{kj} = s_{kj}/\sqrt{s_{kk}s_{jj}}$, the sample correlation coefficient between xk and xj.

In matrix form,

$$R = D^{-1} S D^{-1},$$

where

$$D = \begin{pmatrix} \sqrt{s_{11}} & 0 & \cdots & 0 \\ 0 & \sqrt{s_{22}} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sqrt{s_{pp}} \end{pmatrix}.$$
Note that, from the viewpoint of matrix operations, the transform from a covariance matrix to a correlation matrix is just a particular application of standardizing a positive definite matrix into a standard form in which the diagonal elements all equal one. The original positive definite matrix can be recovered from the diagonal elements and the standardized matrix, as the sketch below illustrates.
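Here is a NumPy sketch of these sample statistics on the 8 × 3 data matrix of Section 1.1: it computes x̄, W, S and R via the matrix formulas above, and recovers S from R and the diagonal of S.

```python
# Sample mean vector, CSSP matrix, covariance and correlation matrices.
import numpy as np

X = np.array([[ 30, -8, -1], [ 90, 7, 6], [-10, -2, 4], [-10, 0, 2],
              [ 30, -2,  5], [ 60, 0, 3], [  0, -2, -1], [ 40, 1, 2]], float)
n, p = X.shape

xbar = X.T @ np.ones(n) / n                     # xbar = (1/n) X'1
W = X.T @ X - n * np.outer(xbar, xbar)          # CSSP matrix
S = W / (n - 1)                                 # unbiased sample covariance
print(np.allclose(S, np.cov(X, rowvar=False)))  # True: agrees with NumPy

d = np.sqrt(np.diag(S))                         # diagonal of D
R = S / np.outer(d, d)                          # R = D^{-1} S D^{-1}
print(np.allclose(np.outer(d, d) * R, S))       # True: S recovered as D R D
```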


Appendix 1 Some Results on Matrices

Special Matrices

Unit vector: $1 = \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix}$

Identity matrix: $I = \begin{pmatrix} 1 & & 0 \\ & \ddots & \\ 0 & & 1 \end{pmatrix}$

Unit matrix: $J = \begin{pmatrix} 1 & \cdots & 1 \\ \vdots & \ddots & \vdots \\ 1 & \cdots & 1 \end{pmatrix} = 11'$
Matrix Operations
Let A = (aij), B = (bij), a = (ai) and b = (bi).

$$A + B = (a_{ij} + b_{ij}), \qquad A - B = (a_{ij} - b_{ij}), \qquad cA = (c\,a_{ij}),$$
$$a'b = \sum_i a_i b_i, \qquad AB = \Big(\sum_k a_{ik} b_{kj}\Big), \qquad A' = (a_{ji}),$$
$$\mathrm{diag}(A) = (a_{ij}\,\delta_{ij}) \;\text{ where } \delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j, \end{cases} \qquad \mathrm{tr}(A) = \sum_i a_{ii}.$$

Transpose

$$(A')' = A, \qquad (A + B)' = A' + B', \qquad (AB)' = B'A', \qquad \begin{pmatrix} A & B \\ C & D \end{pmatrix}' = \begin{pmatrix} A' & C' \\ B' & D' \end{pmatrix}$$
Determinant: Det(A) or |A|

1. |AB| = |A| |B|.

2. $\begin{vmatrix} A & C \\ 0 & B \end{vmatrix} = |A|\,|B|$.

3. $|I_p + AB| = |I_q + BA|$, where A is p × q and B is q × p.
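Property 3 is easy to check numerically; a small sketch with random rectangular matrices:

```python
# Check of |I_p + AB| = |I_q + BA| for random A (p x q) and B (q x p).
import numpy as np

rng = np.random.default_rng(1)
p, q = 5, 3
A = rng.standard_normal((p, q))
B = rng.standard_normal((q, p))
lhs = np.linalg.det(np.eye(p) + A @ B)
rhs = np.linalg.det(np.eye(q) + B @ A)
print(np.isclose(lhs, rhs))   # True
```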


Inverse: A⁻¹

1. A⁻¹A = AA⁻¹ = I.

2. (A′)⁻¹ = (A⁻¹)′.

3. (AB)⁻¹ = B⁻¹A⁻¹.

4. |A⁻¹| = |A|⁻¹.

5. Given $A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}$, where A11 and A22 are square matrices, let $A_{11\cdot 2} = A_{11} - A_{12}A_{22}^{-1}A_{21}$ and $A_{22\cdot 1} = A_{22} - A_{21}A_{11}^{-1}A_{12}$.

(a) If |A11| ≠ 0,
$$A^{-1} = \begin{pmatrix} A_{11}^{-1} + A_{11}^{-1}A_{12}A_{22\cdot 1}^{-1}A_{21}A_{11}^{-1} & -A_{11}^{-1}A_{12}A_{22\cdot 1}^{-1} \\ -A_{22\cdot 1}^{-1}A_{21}A_{11}^{-1} & A_{22\cdot 1}^{-1} \end{pmatrix}, \qquad |A| = |A_{11}|\,|A_{22\cdot 1}|.$$

(b) If |A22| ≠ 0,
$$A^{-1} = \begin{pmatrix} A_{11\cdot 2}^{-1} & -A_{11\cdot 2}^{-1}A_{12}A_{22}^{-1} \\ -A_{22}^{-1}A_{21}A_{11\cdot 2}^{-1} & A_{22}^{-1} + A_{22}^{-1}A_{21}A_{11\cdot 2}^{-1}A_{12}A_{22}^{-1} \end{pmatrix}, \qquad |A| = |A_{22}|\,|A_{11\cdot 2}|.$$

(c) If |A11| ≠ 0 and |A22| ≠ 0,
$$A^{-1} = \begin{pmatrix} A_{11\cdot 2}^{-1} & -A_{11}^{-1}A_{12}A_{22\cdot 1}^{-1} \\ -A_{22}^{-1}A_{21}A_{11\cdot 2}^{-1} & A_{22\cdot 1}^{-1} \end{pmatrix}, \qquad |A| = |A_{11}|\,|A_{22\cdot 1}| = |A_{22}|\,|A_{11\cdot 2}|.$$

6. $(A + CBD)^{-1} = A^{-1} - A^{-1}CB(B + BDA^{-1}CB)^{-1}BDA^{-1}$.

7. $(A + cd')^{-1} = A^{-1} - \dfrac{A^{-1}cd'A^{-1}}{1 + d'A^{-1}c}$.

8. $|A + cd'| = |A|\,(1 + d'A^{-1}c)$.
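Results 7 and 8 (often called the Sherman-Morrison rank-one update and the matrix determinant lemma) can be verified numerically; the sketch below uses random, well-conditioned inputs.

```python
# Check of the rank-one inverse update and determinant identity.
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4)) + 4 * np.eye(4)   # well-conditioned A
c, d = rng.standard_normal(4), rng.standard_normal(4)

Ainv = np.linalg.inv(A)
update = Ainv - (Ainv @ np.outer(c, d) @ Ainv) / (1 + d @ Ainv @ c)
print(np.allclose(update, np.linalg.inv(A + np.outer(c, d))))   # result 7
print(np.isclose(np.linalg.det(A + np.outer(c, d)),
                 np.linalg.det(A) * (1 + d @ Ainv @ c)))        # result 8
```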

Trace and Eigenvalues

Definition: Let A be an n × n square matrix and let x be an n × 1 nonzero vector such that Ax = λx. Then λ is called an eigenvalue of A and x is called an eigenvector corresponding to the eigenvalue λ. The eigenvalues are the solutions of

$$|A - \lambda I| = 0.$$

1. tr(A + B) = tr(A) + tr(B)

2. tr(AB) = tr(BA)

3. tr(αA) = α tr(A)

4. For an n × n matrix A with eigenvalues λ1, λ2, . . . , λn, we have the following:

(a) $\mathrm{tr}(A) = \sum_{i=1}^{n} \lambda_i$
(b) $|A| = \prod_{i=1}^{n} \lambda_i$
(c) $|I_n \pm A| = \prod_{i=1}^{n} (1 \pm \lambda_i)$

5. The nonzero eigenvalues of AB are the same as those of BA.
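The identities 4(a)-(c) and property 5 can be sanity-checked with random matrices; a small sketch:

```python
# Check of the trace/determinant eigenvalue identities and property 5.
import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2                     # symmetric, so eigenvalues are real
lam = np.linalg.eigvalsh(A)

print(np.isclose(np.trace(A), lam.sum()))                       # 4(a)
print(np.isclose(np.linalg.det(A), lam.prod()))                 # 4(b)
print(np.isclose(np.linalg.det(np.eye(4) + A), (1 + lam).prod()))  # 4(c)

# Property 5: AB (4 x 4, rank <= 2) has two ~zero eigenvalues plus those of BA.
A2 = rng.standard_normal((4, 2))
B2 = rng.standard_normal((2, 4))
print(np.sort_complex(np.linalg.eigvals(A2 @ B2)))
print(np.sort_complex(np.linalg.eigvals(B2 @ A2)))
```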


Positive Semi-Definite and Positive Definite Matrices
Definition: Let A be an n × n matrix. Then A is

symmetric if A′ = A;
orthogonal if A⁻¹ = A′;
positive semi-definite (p.s.d.) if A is symmetric and ℓ′Aℓ ≥ 0 for all ℓ;
positive definite (p.d.) if A is symmetric and ℓ′Aℓ > 0 for all ℓ ≠ 0.

1. A symmetric matrix is p.s.d. (p.d.) if and only if its eigenvalues are all nonnegative (all positive).

2. B′B is p.s.d. for any square matrix B.

3. If A is p.d., A⁻¹ is p.d.

4. (a) A is p.s.d. of rank r if and only if there exists a square matrix R of rank r such that A = RR′.
(b) A is p.d. if and only if there exists a non-singular matrix R such that A = RR′.
Factorization of Matrices
In the following, suppose A and B are n × n square matrices.

1. Spectral decomposition

If A is symmetric, there exists an orthogonal matrix P = (p1 p2 · · · pn) such that

$$P'AP = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}$$

where λ1, λ2, . . . , λn are eigenvalues of A and p1, p2, . . . , pn are their corresponding eigenvectors.


2. If A is p.s.d. (p.d.), there exists a p.s.d. (p.d.) matrix B such that A = B². Sometimes, people write B as A^{1/2} notationally.

3. Cholesky decomposition

If A is p.s.d. (p.d.), there exists a (unique) upper triangular matrix U with nonnegative (positive) diagonal elements, such that

A = U′U.

4. Canonical decomposition

If A is symmetric and B is symmetric p.d., there exists a non-singular matrix P such that

P′AP = Λ and P′BP = I_n,

where Λ = diag(λ1, . . . , λn) and the λi are the eigenvalues of B⁻¹A (or AB⁻¹).
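A NumPy/SciPy sketch of decompositions 1-4 follows. Note that np.linalg.cholesky returns the lower factor L with A = LL′, so the upper factor of the notes is U = L′; the canonical decomposition comes from SciPy's generalized symmetric eigensolver, which normalizes eigenvectors so that P′BP = I.

```python
# Spectral, square-root, Cholesky and canonical decompositions.
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(5)
M, N = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
A = M @ M.T + 3 * np.eye(3)                  # symmetric p.d.
B = N @ N.T + 3 * np.eye(3)                  # symmetric p.d.

# 1. Spectral decomposition: P' A P = diag(lambda), P orthogonal.
lam, P = np.linalg.eigh(A)
print(np.allclose(P.T @ A @ P, np.diag(lam)))

# 2. Symmetric square root: A^(1/2) = P diag(sqrt(lambda)) P'.
A_half = P @ np.diag(np.sqrt(lam)) @ P.T
print(np.allclose(A_half @ A_half, A))

# 3. Cholesky: U upper triangular with A = U'U.
U = np.linalg.cholesky(A).T
print(np.allclose(U.T @ U, A))

# 4. Canonical decomposition: P' A P = Lambda and P' B P = I.
lam2, Pc = eigh(A, B)                        # generalized problem A x = lambda B x
print(np.allclose(Pc.T @ B @ Pc, np.eye(3)),
      np.allclose(Pc.T @ A @ Pc, np.diag(lam2)))
```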
Idempotent Matrices
Definition: A matrix A is idempotent if A² = A.

1. If A is idempotent,
(a) I − A is idempotent;
(b) P′AP is idempotent if P is orthogonal.

2. If A is symmetric and idempotent, rank(A) = tr(A).

3. If A is an n × n symmetric matrix, A is idempotent of rank r if and only if A has r eigenvalues equal to 1 and n − r equal to 0.

4. If A is symmetric idempotent of rank n, A = I_n.
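A familiar statistical instance of a symmetric idempotent matrix is the hat matrix H = X(X′X)⁻¹X′ of a regression design; a sketch checking properties 2 and 3 on it:

```python
# The hat matrix is symmetric idempotent with rank = trace = rank(X).
import numpy as np

rng = np.random.default_rng(6)
X = np.column_stack([np.ones(8), rng.standard_normal((8, 2))])  # n = 8, 3 columns
H = X @ np.linalg.solve(X.T @ X, X.T)

print(np.allclose(H @ H, H))                 # idempotent
print(np.isclose(np.trace(H), 3.0))          # tr(H) = rank(H) = 3
print(np.round(np.linalg.eigvalsh(H), 8))    # eigenvalues are 0s and 1s
```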


Inequalities
1. Cauchy-Schwarz inequality:

$$(a'b)^2 \le (a'a)(b'b).$$

2. If D is positive definite, then for all a,

$$\frac{(a'\ell)^2}{a'Da} \le \ell' D^{-1} \ell.$$

The equality holds when a ∝ D⁻¹ℓ.

Proof sketch: given that D is positive definite, write D = CC′, so that D⁻¹ = (C′)⁻¹C⁻¹. Let b = C′a and e = C⁻¹ℓ. Then the result follows from the Cauchy-Schwarz inequality (b′e)² ≤ (b′b)(e′e).


3. Let A be an n × n symmetric matrix and let D be any n × n p.d. matrix. Then, for all a,

$$\lambda_{\min}(D^{-1}A) \;\le\; \frac{a'Aa}{a'Da} \;\le\; \lambda_{\max}(D^{-1}A),$$

where λmin and λmax denote the smallest and largest eigenvalues of D⁻¹A. The equality on either side holds when a is proportional to the corresponding eigenvector.

4. If A and B are p.d.,

$$\max_{a,b} \frac{(a'Db)^2}{a'Aa \cdot b'Bb} = \theta,$$

where θ is the largest eigenvalue of A⁻¹DB⁻¹D′ (or of B⁻¹D′A⁻¹D). The maximum occurs when a is proportional to an eigenvector of A⁻¹DB⁻¹D′ corresponding to θ, and b is proportional to an eigenvector of B⁻¹D′A⁻¹D corresponding to θ.

5. Consider the function

$$f(\Sigma) = \log|\Sigma| + \mathrm{tr}(\Sigma^{-1}A).$$

If A and Σ are p.d., f(Σ) is minimized uniquely at Σ = A.
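This result underlies maximum likelihood estimation of a covariance matrix. A crude numerical illustration (a sketch, not a proof): random p.d. candidates never achieve a smaller value of f than Σ = A.

```python
# Random p.d. candidates Sigma never beat Sigma = A for f(Sigma).
import numpy as np

def f(Sigma, A):
    return np.linalg.slogdet(Sigma)[1] + np.trace(np.linalg.solve(Sigma, A))

rng = np.random.default_rng(8)
R = rng.standard_normal((3, 3))
A = R @ R.T + np.eye(3)                       # positive definite target

base = f(A, A)                                # = log|A| + p
trials = []
for _ in range(1000):
    Q = rng.standard_normal((3, 3))
    Sigma = Q @ Q.T + 0.1 * np.eye(3)         # random p.d. candidate
    trials.append(f(Sigma, A))
print(base <= min(trials))                    # True
```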


Vector and Matrix Differentiation
Definition:

$$\frac{\partial y}{\partial x} = \begin{pmatrix} \frac{\partial y}{\partial x_1} \\ \vdots \\ \frac{\partial y}{\partial x_n} \end{pmatrix}, \qquad \frac{\partial y}{\partial X} = \begin{pmatrix} \frac{\partial y}{\partial x_{11}} & \cdots & \frac{\partial y}{\partial x_{1n}} \\ \vdots & & \vdots \\ \frac{\partial y}{\partial x_{n1}} & \cdots & \frac{\partial y}{\partial x_{nn}} \end{pmatrix}$$

1. $\dfrac{\partial\, a'x}{\partial x} = a$.

2. $\dfrac{\partial\, x'Ax}{\partial x} = 2Ax$ if A is symmetric.

3. $\dfrac{\partial\, \mathrm{tr}(X)}{\partial X} = I$.

4. $\dfrac{\partial\, \mathrm{tr}(AX)}{\partial X} = \begin{cases} A' & \text{if all elements of } X \text{ are distinct} \\ A + A' - \mathrm{diag}(A) & \text{if } X \text{ is symmetric.} \end{cases}$

5. $\dfrac{\partial\, |X|}{\partial X} = \begin{cases} |X|\,(X^{-1})' & \text{if all elements of } X \text{ are distinct} \\ |X|\,(2X^{-1} - \mathrm{diag}(X^{-1})) & \text{if } X \text{ is symmetric.} \end{cases}$
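Rules 1 and 2 can be sanity-checked with central finite differences; a short sketch:

```python
# Finite-difference check of d(a'x)/dx = a and d(x'Ax)/dx = 2Ax (A symmetric).
import numpy as np

rng = np.random.default_rng(9)
n = 4
a, x = rng.standard_normal(n), rng.standard_normal(n)
M = rng.standard_normal((n, n))
A = (M + M.T) / 2                                  # symmetric

def num_grad(f, x, h=1e-6):
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)     # central difference
    return g

print(np.allclose(num_grad(lambda v: a @ v, x), a))              # rule 1
print(np.allclose(num_grad(lambda v: v @ A @ v, x), 2 * A @ x))  # rule 2
```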
