Chap 1
Problems
Consider the univariate t-test at α level for each of the three hypotheses.
HKU STAT7005 1
STAT7005 Multivariate Methods
1. Overview and Matrix Orientation
⇔ H0 : µ3 = 0 vs H1 : µ3 ≠ 0
It is tempting to combine the three t-tests into a joint procedure that rejects the
joint statement of the three hypotheses if at least one of the univariate tests ends
up with a rejection. But then the Type I error (probability of wrong rejection
when H0 is correct) of this single procedure is generally larger than α. One
approach, treating all three hypotheses as equally important, is to adjust the
significance level of each test to α/3 instead. By the Bonferroni inequality of
probability, this approach guarantees that the Type I error of the joint procedure
is no more than α. The problem is that if the actual Type I error of a procedure
is much smaller than α, its power (probability of correct rejection when H0 is
incorrect) is also much smaller than what one would expect of a procedure with
Type I error exactly α. For this reason, it may be desirable to have a single
multivariate test of the three hypotheses jointly at an exact significance level α.
Suppose data are also available for ten women and we wish to study the
difference between genders. Then, one may consider each measurement at
a time and perform three univariate tests accordingly:
Drug effect on change in blood sugar is the same for men and women
⇔ H0 : µ1^(1) = µ1^(2)
Drug effect on change in diastolic BP is the same for men and women
⇔ H0 : µ2^(1) = µ2^(2)
Drug effect on change in systolic BP is the same for men and women
⇔ H0 : µ3^(1) = µ3^(2)
If a single multivariate test is desired, for the same reason as explained in item
1 above, we would look for a multivariate test (n1 = 8, n2 = 10, p = 3) of the
hypothesis in vector form:
H0 : µ^(1) = µ^(2) vs H1 : µ^(1) ≠ µ^(2)
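A minimal numerical sketch of such a joint test is the classical two-sample Hotelling's T² statistic, treated formally later in the course; the data below are made up purely for illustration.

```python
import numpy as np

# Two-sample Hotelling's T^2 for H0: mu^(1) = mu^(2), with
# n1 = 8 men, n2 = 10 women, p = 3 measurements (synthetic data).
rng = np.random.default_rng(0)
men = rng.normal(size=(8, 3))
women = rng.normal(size=(10, 3))

n1, n2, p = len(men), len(women), men.shape[1]
d = men.mean(axis=0) - women.mean(axis=0)        # difference of mean vectors
# Pooled sample covariance matrix
S_pool = ((n1 - 1) * np.cov(men, rowvar=False)
          + (n2 - 1) * np.cov(women, rowvar=False)) / (n1 + n2 - 2)
T2 = (n1 * n2) / (n1 + n2) * d @ np.linalg.solve(S_pool, d)
# Under H0 and normality, T2 (n1+n2-p-1) / ((n1+n2-2) p) ~ F(p, n1+n2-p-1).
print(T2)
```

A single cutoff on T² controls the Type I error at exactly α for the three hypotheses jointly, which is the point of the discussion above.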
3. Regression
Let
Y1 = change in blood sugar
Y2 = change in diastolic BP
Y3 = change in systolic BP
X = age of subject
For change in blood sugar alone, the model for linear regression is (n = 8, m =
1):
Y1 = α + βX + ε
Direct multivariate extension covering all three measures will be (n = 8, p =
3, m = 1):
Y1 = α1 + β1 X + ε1
Y2 = α2 + β2 X + ε2
Y3 = α3 + β3 X + ε3
We shall see later that the point estimates of the regression parameters and their
standard errors from three separate regressions for Yi , i = 1, 2, 3, are the same
as those from the single multivariate regression. But the latter approach can do
more: testing a hypothesis, or estimating a quantity, that involves regression
parameters from different univariate equations. The multivariate approach can
also provide simultaneous confidence intervals or confidence regions concerning
a mix of parameters from several univariate regressions.
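The agreement of the point estimates can be verified directly: both fits solve the same normal equations (X′X)B = X′Y, one response column at a time. A sketch with made-up data:

```python
import numpy as np

# Three separate least-squares fits of Y1, Y2, Y3 on X give the same
# coefficients as one multivariate fit (synthetic data, n = 8).
rng = np.random.default_rng(1)
n = 8
age = rng.uniform(30, 60, size=n)
Y = rng.normal(size=(n, 3))                      # columns: Y1, Y2, Y3
X = np.column_stack([np.ones(n), age])           # design matrix [1, X]

B_joint = np.linalg.lstsq(X, Y, rcond=None)[0]   # 2 x 3: one column per response
B_sep = np.column_stack([np.linalg.lstsq(X, Y[:, j], rcond=None)[0]
                         for j in range(3)])
print(np.allclose(B_joint, B_sep))               # True
```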
4. Linear model
Consider now the situation with the additional records of the ten women, and define
a dummy variable indicating gender: Z = 1 for a man and Z = 0 for a woman.
The univariate linear model for change in blood sugar on age without
interaction effects with gender, i.e. with regression lines of the same slope for
both genders, is (n = 18, m = 2):
Y1 = α + βX + γZ + ε.
Multivariate extension covering all three measures will then take the form
(n = 18, p = 3, m = 2):
Y1 = α1 + β1 X + γ1 Z + ε1
Y2 = α2 + β2 X + γ2 Z + ε2
Y3 = α3 + β3 X + γ3 Z + ε3
Here, the difference between the multivariate approach and the separate
univariate approach is the same as for the regression situation discussed in
item 3 above.
There are multivariate problems that cannot be viewed as direct extensions of
univariate problems. The following is a partial list.
• Correspondence Analysis
Notation
A, B, S, R, · · · matrices (bold uppercase)
a, b, s, r, · · · vectors (bold lowercase)
a, b, s, r, · · · scalars (lightface lowercase)
For simplicity, we use x = (x1, · · · , xp)′ to represent both a random vector and an
Properties
1. E(a′x) = a′µ.
2. E(Ax) = Aµ.
Note that
1) σii = Cov(xi , xi ) = Var(xi ) for i = 1, 2, . . . , p.
2) σij = Cov(xi , xj ) = Cov(xj , xi ) = σji ⇒ Σ is symmetric.
In matrix form, if E(x) = µ, then
Var(x) = E(xx′) − µµ′.
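The identity can be checked numerically for any explicit discrete distribution; the support points and probabilities below are illustrative assumptions.

```python
import numpy as np

# Check Var(x) = E(xx') - mu mu' for a toy bivariate discrete
# distribution on three support points.
pts = np.array([[1.0, 2.0], [3.0, 0.0], [0.0, 1.0]])
prob = np.array([0.2, 0.5, 0.3])

mu = prob @ pts                                            # E(x)
Exx = sum(p * np.outer(x, x) for p, x in zip(prob, pts))   # E(xx')
V1 = Exx - np.outer(mu, mu)
# Direct definition: E[(x - mu)(x - mu)']
V2 = sum(p * np.outer(x - mu, x - mu) for p, x in zip(prob, pts))
print(np.allclose(V1, V2))                                 # True
```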
Properties
1. Var(a′x + b) = Var(a′x) = a′Σa ≥ 0, where equality holds if and only
if a′x is a constant. This property implies that Σ is always nonnegative
definite. It also implies that Σ is positive definite, and hence Σ^{-1} exists, unless
x1 , . . . , xp are linearly related, in which case we say that x is a degenerate
random vector (i.e. its effective dimension is less than p; in other words, its
joint distribution is concentrated in a subspace of lower dimension).
2. Var(Ax + b) = AΣA′.
Properties
1. V ar(x) = Cov(x, x).
2. Suppose x1 , x2 and y are p×1 vectors. Then, Cov(x1 +x2 , y) = Cov(x1 , y)+
Cov(x2 , y).
The population correlation matrix is ρ = (ρij), where ρij = σij / √(σii σjj). In matrix form,
ρ = D^{-1} Σ D^{-1},
where
D = diag(√σ11, √σ22, · · · , √σpp) = (diag(Σ))^{1/2}.
Properties
1. −1 ≤ ρij ≤ 1. ρii = 1 for i = 1, · · · , p. ρij = ρji . ρij = 0 if and only if σij = 0.
Suppose x1 , x2 , · · · , xn are independent and identically distributed (i.i.d.) p × 1
random vectors, where xi = (xi1 , · · · , xip )′, i = 1, · · · , n. We usually arrange the
data in an n × p matrix, putting each multivariate observation as a row vector instead
of a column vector as follows:

              X1  · · ·  Xj  · · ·  Xp
     x1′  [  x11  · · ·  x1j  · · ·  x1p  ]   observation 1
      ⋮   [   ⋮           ⋮           ⋮   ]
X =  xi′  [  xi1  · · ·  xij  · · ·  xip  ]   observation i     = (xij)n×p
      ⋮   [   ⋮           ⋮           ⋮   ]
     xn′  [  xn1  · · ·  xnj  · · ·  xnp  ]   observation n

The column averages x̄1 , · · · , x̄j , · · · , x̄p are the sample means.
S = (1/(n−1)) [X′X − n x̄ x̄′]
  = (1/(n−1)) Σ_{i=1}^n (xi − x̄)(xi − x̄)′
  = (1/(n−1)) W,
where
W = X′X − n x̄ x̄′ = Σ_{i=1}^n (xi − x̄)(xi − x̄)′
is called the corrected sums of squares and products matrix, CSSP
matrix for short.
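Both expressions for W, and the resulting S, can be verified numerically; the data below are made up for illustration.

```python
import numpy as np

# The CSSP identity W = X'X - n xbar xbar' = sum_i (x_i - xbar)(x_i - xbar)',
# and S = W/(n-1), checked against np.cov (which also divides by n-1).
rng = np.random.default_rng(2)
n, p = 18, 3
X = rng.normal(size=(n, p))
xbar = X.mean(axis=0)

W1 = X.T @ X - n * np.outer(xbar, xbar)
W2 = sum(np.outer(x - xbar, x - xbar) for x in X)
S = W1 / (n - 1)
print(np.allclose(W1, W2), np.allclose(S, np.cov(X, rowvar=False)))
```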
From the sample covariance matrix, we can obtain the sample correlation
matrix as an estimate of the population correlation matrix
        [  1    r12  · · ·  r1p ]
        [ r21    1   · · ·  r2p ]
Rp×p =  [  ⋮     ⋮    ⋱      ⋮  ]
        [ rp1   rp2  · · ·   1  ]
where rkj = skj / √(skk sjj) is the sample correlation coefficient between xk and xj.
In matrix form,
R = D^{-1} S D^{-1},
where
D = diag(√s11, √s22, · · · , √spp).
Note that from the viewpoint of matrix operations, the transform from a
covariance matrix to a correlation matrix is just a particular application of
standardizing a positive definite matrix into a standard form in which the
diagonal elements all equal one. The original positive definite matrix can be
recovered by means of its diagonal elements and the standardized matrix.
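Both directions of this standardization can be checked numerically with made-up data:

```python
import numpy as np

# Standardize S to R = D^{-1} S D^{-1}, D = diag(sqrt(s11), ..., sqrt(spp)),
# then recover S from R and the diagonal elements of S.
rng = np.random.default_rng(3)
X = rng.normal(size=(20, 3))
S = np.cov(X, rowvar=False)

D_inv = np.diag(1 / np.sqrt(np.diag(S)))
R = D_inv @ S @ D_inv
print(np.allclose(R, np.corrcoef(X, rowvar=False)))   # True

D = np.diag(np.sqrt(np.diag(S)))
print(np.allclose(D @ R @ D, S))                      # True: S recovered
```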
Special Matrices
Unit vector: 1 = (1, 1, · · · , 1)′, the vector with every element equal to 1.
Identity matrix: I, the square matrix with 1's on the diagonal and 0's elsewhere.
Unit matrix: J = 11′, the square matrix with every element equal to 1.
Matrix Operations
Let A = (aij ), B = (bij ), a = (ai ) and b = (bi ).
A + B = (aij + bij)
A − B = (aij − bij)
cA = (c aij)
a′b = Σ_i ai bi
AB = (Σ_k aik bkj)
A′ = (aji)
diag(A) = (aij δij), where δij = 1 if i = j and δij = 0 if i ≠ j
tr(A) = Σ_i aii
Transpose
(A′)′ = A
(A + B)′ = A′ + B′
(AB)′ = B′A′
[ A  B ]′   [ A′  C′ ]
[ C  D ]  = [ B′  D′ ]
Determinant: Det(A) or |A|
1. |AB| = |A| |B|.
2. | A  C |
   | 0  B | = |A| |B|.
3. |I_p + AB| = |I_q + BA|, where A is p × q and B is q × p.
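Identities 1 and 3 are easy to check numerically with random matrices:

```python
import numpy as np

# Check |AB| = |A||B| for square A, B, and the Sylvester-type identity
# |I_p + AB| = |I_q + BA| for rectangular A (p x q), B (q x p).
rng = np.random.default_rng(4)
p, q = 4, 2
A = rng.normal(size=(p, q))
B = rng.normal(size=(q, p))
lhs = np.linalg.det(np.eye(p) + A @ B)
rhs = np.linalg.det(np.eye(q) + B @ A)
print(np.isclose(lhs, rhs))                              # True

M = rng.normal(size=(3, 3))
N = rng.normal(size=(3, 3))
print(np.isclose(np.linalg.det(M @ N),
                 np.linalg.det(M) * np.linalg.det(N)))   # True
```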
Inverse: A−1
1. A−1 A = AA−1 = I .
3. (AB)−1 = B −1 A−1 .
4. |A^{-1}| = |A|^{-1}.
5. Given A = [ A11  A12 ; A21  A22 ], where A11 and A22 are square matrices, let
A11·2 = A11 − A12 A22^{-1} A21  and  A22·1 = A22 − A21 A11^{-1} A12.
(a) If |A11| ≠ 0,
A^{-1} = [ A11^{-1} + A11^{-1} A12 A22·1^{-1} A21 A11^{-1}    −A11^{-1} A12 A22·1^{-1} ;
           −A22·1^{-1} A21 A11^{-1}                           A22·1^{-1} ]
|A| = |A11| |A22·1| .
(b) If |A22| ≠ 0,
A^{-1} = [ A11·2^{-1}                    −A11·2^{-1} A12 A22^{-1} ;
           −A22^{-1} A21 A11·2^{-1}      A22^{-1} + A22^{-1} A21 A11·2^{-1} A12 A22^{-1} ]
|A| = |A22| |A11·2| .
(c) If |A11| ≠ 0 and |A22| ≠ 0,
A^{-1} = [ A11·2^{-1}                    −A11^{-1} A12 A22·1^{-1} ;
           −A22^{-1} A21 A11·2^{-1}      A22·1^{-1} ]
|A| = |A11| |A22·1| = |A22| |A11·2| .
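Formula (a) and the determinant factorization can be verified numerically on a random well-conditioned matrix:

```python
import numpy as np

# Check block-inverse formula (a): with A22.1 = A22 - A21 A11^{-1} A12,
# the assembled block matrix equals inv(A), and |A| = |A11| |A22.1|.
rng = np.random.default_rng(5)
A = rng.normal(size=(5, 5)) + 5 * np.eye(5)    # well-conditioned, invertible
A11, A12 = A[:3, :3], A[:3, 3:]
A21, A22 = A[3:, :3], A[3:, 3:]

A11_inv = np.linalg.inv(A11)
A22_1 = A22 - A21 @ A11_inv @ A12              # Schur complement of A11
A22_1_inv = np.linalg.inv(A22_1)

top_left = A11_inv + A11_inv @ A12 @ A22_1_inv @ A21 @ A11_inv
top_right = -A11_inv @ A12 @ A22_1_inv
bottom_left = -A22_1_inv @ A21 @ A11_inv
block_inv = np.block([[top_left, top_right], [bottom_left, A22_1_inv]])

print(np.allclose(block_inv, np.linalg.inv(A)))                  # True
print(np.isclose(np.linalg.det(A),
                 np.linalg.det(A11) * np.linalg.det(A22_1)))     # True
```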
The eigenvalues λ of a square matrix A are the roots of the characteristic equation |A − λI| = 0.
1. tr(A + B) = tr(A)+tr(B)
2. tr(AB) = tr(BA)
3. tr(αA) = αtr(A)
4. For an n × n matrix A with eigenvalues λ1 , λ2 , . . . , λn , we have the following:
(a) tr(A) = Σ_{i=1}^n λi
(b) |A| = Π_{i=1}^n λi
(c) |I_n ± A| = Π_{i=1}^n (1 ± λi)
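These three eigenvalue identities can be confirmed numerically (eigenvalues of a real matrix may be complex, but they occur in conjugate pairs, so the sums and products are real up to rounding):

```python
import numpy as np

# Check tr(A) = sum(lambda_i), |A| = prod(lambda_i),
# and |I + A| = prod(1 + lambda_i) for a random 4 x 4 matrix.
rng = np.random.default_rng(6)
A = rng.normal(size=(4, 4))
lam = np.linalg.eigvals(A)        # possibly complex; sums/products are real

print(np.isclose(np.trace(A), lam.sum().real))                          # True
print(np.isclose(np.linalg.det(A), lam.prod().real))                    # True
print(np.isclose(np.linalg.det(np.eye(4) + A), (1 + lam).prod().real))  # True
```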
A square matrix A is called:
symmetric if A′ = A;
orthogonal if A^{-1} = A′;
positive semi-definite (p.s.d.) if A is symmetric and ℓ′Aℓ ≥ 0 for all ℓ;
positive definite (p.d.) if A is symmetric and ℓ′Aℓ > 0 for all ℓ ≠ 0.
1. A symmetric matrix is p.s.d. (p.d.) if and only if its eigenvalues are all
nonnegative (all positive).
1. Spectral decomposition: a symmetric matrix A can be written as A = ΓΛΓ′,
where Γ is orthogonal and Λ = diag(λ1 , . . . , λp ) contains the eigenvalues of A.
2. If A is p.s.d. (p.d.), there exists a p.s.d. (p.d.) matrix B such that A = B².
Sometimes, people write B as A^{1/2} notationally.
3. Cholesky decomposition: if A is p.d., it can be written as
A = U′U,
where U is upper triangular.
4. Canonical decomposition
If A is idempotent (A² = A), then:
(a) I − A is idempotent.
(b) P′AP is idempotent if P is orthogonal.
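The square root of item 2 and the Cholesky factor of item 3 can be computed and checked directly; the matrix below is made positive definite by construction:

```python
import numpy as np

# A symmetric square root B with A = B^2, via the spectral decomposition,
# and the Cholesky factor U with A = U'U, for a p.d. matrix A.
rng = np.random.default_rng(7)
M = rng.normal(size=(3, 3))
A = M @ M.T + 3 * np.eye(3)                # positive definite by construction

lam, G = np.linalg.eigh(A)                 # A = G diag(lam) G'
B = G @ np.diag(np.sqrt(lam)) @ G.T        # the p.d. square root A^{1/2}
print(np.allclose(B @ B, A))               # True

L = np.linalg.cholesky(A)                  # NumPy returns lower-triangular L
U = L.T                                    # so A = L L' = U'U with U upper
print(np.allclose(U.T @ U, A))             # True
```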
Given that D is positive definite, D = CC′ and D^{-1} = (C′)^{-1} C^{-1}. Let b =
C′a and e = C^{-1}ℓ. Then, by the Cauchy–Schwarz inequality (b′e)² ≤ (b′b)(e′e),
the result follows.
3. Let A be an n × n symmetric matrix and let D be any n × n p.d. matrix.
Then, for all a ≠ 0,
(smallest eigenvalue of D^{-1}A) ≤ a′Aa / a′Da ≤ (largest eigenvalue of D^{-1}A).
Equality on either side holds when a is proportional to the corresponding
eigenvector.
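This bound on the generalized Rayleigh quotient can be checked empirically on random vectors:

```python
import numpy as np

# For symmetric A and p.d. D, the ratio a'Aa / a'Da lies between the
# smallest and largest eigenvalues of D^{-1}A (checked on random a).
rng = np.random.default_rng(8)
n = 4
A = rng.normal(size=(n, n)); A = (A + A.T) / 2            # symmetric
M = rng.normal(size=(n, n)); D = M @ M.T + n * np.eye(n)  # positive definite

lam = np.linalg.eigvals(np.linalg.inv(D) @ A).real        # real for this pair
lo, hi = lam.min(), lam.max()
for _ in range(100):
    a = rng.normal(size=n)
    r = (a @ A @ a) / (a @ D @ a)
    assert lo - 1e-10 <= r <= hi + 1e-10
print("bounds hold")
```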
For a scalar function y of a vector x = (x1 , · · · , xn )′, define
∂y/∂x = (∂y/∂x1 , · · · , ∂y/∂xn )′;
for a scalar function y of an n × n matrix X = (xij ), define ∂y/∂X as the
n × n matrix whose (i, j)-th element is ∂y/∂xij .
1. ∂(a′x)/∂x = a.
2. ∂(x′Ax)/∂x = 2Ax if A is symmetric.
3. ∂tr(X)/∂X = I.
4. ∂tr(AX)/∂X = A′ if all elements of X are distinct;
   = A + A′ − diag(A) if X is symmetric.
5. ∂|X|/∂X = |X| (X^{-1})′ if all elements of X are distinct;
   = |X| (2X^{-1} − diag(X^{-1})) if X is symmetric.
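Property 2 can be verified against a finite-difference gradient:

```python
import numpy as np

# Finite-difference check of property 2: for symmetric A,
# the gradient of f(x) = x'Ax is 2Ax.
rng = np.random.default_rng(9)
n = 3
A = rng.normal(size=(n, n)); A = (A + A.T) / 2        # symmetric
x = rng.normal(size=n)

f = lambda v: v @ A @ v
h = 1e-6
grad_fd = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h)
                    for e in np.eye(n)])
print(np.allclose(grad_fd, 2 * A @ x, atol=1e-6))     # True
```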