INTRODUCTION TO MACHINE LEARNING
3RD EDITION
ETHEM ALPAYDIN
© The MIT Press, 2014
alpaydin@boun.edu.tr
http://www.cmpe.boun.edu.tr/~ethem/i2ml3e
CHAPTER 6: DIMENSIONALITY REDUCTION
Why Reduce Dimensionality?
- Reduces time complexity: less computation
- Reduces space complexity: fewer parameters
- Saves the cost of observing unnecessary features
- Simpler models are more robust on small datasets
- Easier to interpret and explain
- Easier to visualize data when plotted in 2 or 3 dimensions
Subset Selection

[Figure: forward selection on the Iris data; the "Chosen" feature subset]

Iris data: Add one more feature to F4

[Figure: the "Chosen" pair of features on the Iris data]
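A minimal sketch of such a forward-selection loop, not taken from the slides: the k-NN scorer, the cross-validation, and the stopping rule are illustrative choices.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Greedy forward selection: start from the empty set and repeatedly add
# the single feature that most improves cross-validated accuracy.
X, y = load_iris(return_X_y=True)
chosen, remaining, best_score = [], list(range(X.shape[1])), -np.inf
while remaining:
    scores = {j: cross_val_score(KNeighborsClassifier(),
                                 X[:, chosen + [j]], y, cv=5).mean()
              for j in remaining}
    j_best = max(scores, key=scores.get)
    if scores[j_best] <= best_score:      # stop when no feature helps
        break
    chosen.append(j_best)
    remaining.remove(j_best)
    best_score = scores[j_best]
print("Chosen features:", chosen)
```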
Principal Components Analysis
$$z = W^T (x - m)$$

where the columns of $W$ are the eigenvectors of the sample covariance matrix $\Sigma$ and $m$ is the sample mean. The transformation centers the data at the origin and rotates the axes.
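A minimal numpy sketch of this projection, assuming X holds one observation per row; the function and variable names are illustrative.

```python
import numpy as np

def pca(X, k):
    """Project the rows of X onto the top-k principal components."""
    m = X.mean(axis=0)                     # sample mean m
    S = np.cov(X, rowvar=False)            # sample covariance Sigma
    eigvals, eigvecs = np.linalg.eigh(S)   # eigh: eigenvalues in ascending order
    W = eigvecs[:, np.argsort(eigvals)[::-1][:k]]  # top-k eigenvectors as columns
    return (X - m) @ W, eigvals            # z^t = W^T (x^t - m), row-wise
```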
How to choose k?
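A standard answer is the proportion of variance explained: choose the smallest $k$ for which $(\lambda_1 + \cdots + \lambda_k)/(\lambda_1 + \cdots + \lambda_d)$ exceeds a threshold. A sketch continuing the pca example above; the 0.9 default is a common convention, not a fixed rule.

```python
import numpy as np

def choose_k(eigvals, threshold=0.9):
    """Smallest k whose leading eigenvalues explain `threshold` of the variance."""
    lam = np.sort(eigvals)[::-1]          # eigenvalues, descending
    pov = np.cumsum(lam) / lam.sum()      # (lam_1+...+lam_k) / (lam_1+...+lam_d)
    return int(np.searchsorted(pov, threshold)) + 1
```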
Factor Analysis

Find a small number of factors $z$ that, when combined, generate $x$:

$$x_i - \mu_i = v_{i1} z_1 + v_{i2} z_2 + \cdots + v_{ik} z_k + \varepsilon_i$$

where the $z_j$ are latent factors with zero mean and unit variance, the $\varepsilon_i$ are zero-mean noise terms, and the $v_{ij}$ are the factor loadings.
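A brief sketch using scikit-learn's FactorAnalysis on made-up data generated from two latent factors; the data and dimensions are illustrative.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Toy data: observed x generated from 2 latent factors plus noise.
rng = np.random.default_rng(0)
Z = rng.normal(size=(300, 2))                  # latent factors z
V = rng.normal(size=(2, 6))                    # factor loadings v_ij
X = Z @ V + 0.1 * rng.normal(size=(300, 6))    # x = Vz + noise (zero mean)

fa = FactorAnalysis(n_components=2)
Z_hat = fa.fit_transform(X)                    # estimated factor scores
loadings = fa.components_                      # estimated loadings, shape (2, 6)
```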
Multidimensional Scaling

Given the pairwise distances between points, place the points in a low-dimensional space so that these distances are preserved. With the mapping $z = g(x \mid \theta)$, minimize the stress:

$$E(\theta \mid X) = \sum_{r,s} \frac{\left(\|z^r - z^s\| - \|x^r - x^s\|\right)^2}{\|x^r - x^s\|^2} = \sum_{r,s} \frac{\left(\|g(x^r \mid \theta) - g(x^s \mid \theta)\| - \|x^r - x^s\|\right)^2}{\|x^r - x^s\|^2}$$
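A short sketch using scikit-learn's MDS, which is handed only the distance matrix; note that it minimizes an unnormalized stress rather than exactly the normalized stress above, and the data are made up.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import MDS

# Toy data: MDS sees only the matrix of pairwise distances.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))                 # 100 points in 10-D
D = squareform(pdist(X))                       # (100, 100) distance matrix

mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
Z = mds.fit_transform(D)                       # 2-D coordinates minimizing stress
```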
Map of Europe by MDS
[Figure: map of European cities reconstructed by MDS from their pairwise distances]
Linear Discriminant Analysis

Find a low-dimensional space such that when x is projected onto it, the classes are well-separated.

Find w that maximizes

$$J(w) = \frac{(m_1 - m_2)^2}{s_1^2 + s_2^2}$$

where, with $r^t = 1$ if $x^t$ belongs to class 1 and $r^t = 0$ otherwise,

$$m_1 = \frac{\sum_t w^T x^t r^t}{\sum_t r^t}, \qquad s_1^2 = \sum_t (w^T x^t - m_1)^2 r^t$$

and $m_2$, $s_2^2$ are defined analogously for class 2.
Between-class scatter:

$$(m_1 - m_2)^2 = (w^T \mathbf{m}_1 - w^T \mathbf{m}_2)^2 = w^T (\mathbf{m}_1 - \mathbf{m}_2)(\mathbf{m}_1 - \mathbf{m}_2)^T w = w^T S_B w$$

where $S_B = (\mathbf{m}_1 - \mathbf{m}_2)(\mathbf{m}_1 - \mathbf{m}_2)^T$ and $\mathbf{m}_i$ is the mean vector of class $i$.

Within-class scatter:

$$s_1^2 = \sum_t (w^T x^t - m_1)^2 r^t = \sum_t w^T (x^t - \mathbf{m}_1)(x^t - \mathbf{m}_1)^T w \, r^t = w^T S_1 w$$

where $S_1 = \sum_t r^t (x^t - \mathbf{m}_1)(x^t - \mathbf{m}_1)^T$; similarly $s_2^2 = w^T S_2 w$, so $s_1^2 + s_2^2 = w^T S_W w$ with $S_W = S_1 + S_2$.
Fisher's linear discriminant is the w maximizing $J(w) = \dfrac{w^T S_B w}{w^T S_W w}$, namely $w = c \, S_W^{-1} (\mathbf{m}_1 - \mathbf{m}_2)$ for some constant $c$.

Parametric solution:

$$w = \Sigma^{-1} (\mu_1 - \mu_2)$$

when $p(x \mid C_i) \sim \mathcal{N}(\mu_i, \Sigma)$.
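A numpy sketch of the two-class solution $w \propto S_W^{-1}(\mathbf{m}_1 - \mathbf{m}_2)$; the function name and the unit-norm scaling are illustrative choices.

```python
import numpy as np

def fisher_lda(X1, X2):
    """Fisher's linear discriminant direction for two classes.
    X1, X2: (N_i, d) samples of each class; returns the direction w."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # within-class scatter S_W = S_1 + S_2
    S1 = (X1 - m1).T @ (X1 - m1)
    S2 = (X2 - m2).T @ (X2 - m2)
    SW = S1 + S2
    # w = c * S_W^{-1} (m_1 - m_2); solve instead of forming the inverse
    w = np.linalg.solve(SW, m1 - m2)
    return w / np.linalg.norm(w)          # the scale of w is arbitrary
```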
K>2 Classes
Within-class scatter:

$$S_i = \sum_t r_i^t (x^t - \mathbf{m}_i)(x^t - \mathbf{m}_i)^T, \qquad S_W = \sum_{i=1}^K S_i$$

Between-class scatter:

$$S_B = \sum_{i=1}^K N_i (\mathbf{m}_i - \mathbf{m})(\mathbf{m}_i - \mathbf{m})^T, \qquad \mathbf{m} = \frac{1}{K} \sum_{i=1}^K \mathbf{m}_i$$

Find W that maximizes

$$J(W) = \frac{|W^T S_B W|}{|W^T S_W W|}$$

The solution is given by the largest eigenvectors of $S_W^{-1} S_B$; $S_B$ has maximum rank $K - 1$.
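A sketch of the multi-class case solved as the generalized eigenproblem $S_B w = \lambda S_W w$, assuming $S_W$ is nonsingular; the names are illustrative.

```python
import numpy as np
from scipy.linalg import eigh

def lda_projection(X, y, k):
    """Project X onto the top-k discriminant directions (k <= K-1)."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    m = means.mean(axis=0)                 # m = (1/K) sum_i m_i
    d = X.shape[1]
    SW = np.zeros((d, d))
    SB = np.zeros((d, d))
    for c, mc in zip(classes, means):
        Xc = X[y == c]
        SW += (Xc - mc).T @ (Xc - mc)      # each S_i summed into S_W
        diff = (mc - m)[:, None]
        SB += len(Xc) * (diff @ diff.T)    # N_i (m_i - m)(m_i - m)^T
    # generalized eigenproblem S_B w = lambda S_W w; eigh returns ascending
    lam, V = eigh(SB, SW)
    W = V[:, np.argsort(lam)[::-1][:k]]    # top-k eigenvectors of S_W^{-1} S_B
    return X @ W
```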
PCA vs LDA
Canonical Correlation Analysis
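CCA finds paired projection directions for two views x and y of the same samples such that the correlation between the projected pairs is maximized. A minimal sketch with scikit-learn's CCA on made-up two-view data; the dimensions and noise level are illustrative.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

# Toy two-view data sharing a 2-D latent structure.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(200, 5))
Y = latent @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(200, 4))

cca = CCA(n_components=2)
Xc, Yc = cca.fit_transform(X, Y)   # paired projections of the two views
# corresponding columns of Xc and Yc are maximally correlated
```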
Isomap on Optdigits

[Figure: two-dimensional Isomap embedding of the Optdigits digits; samples of each digit (0-9) cluster together]
Locally Linear Embedding
Given the reconstruction weights $W_{rs}$, found in the first step by expressing each $x^r$ as a linear combination of its neighbors, the second step finds the new coordinates $z^r$ that minimize

$$E(z \mid W) = \sum_r \left\| z^r - \sum_s W_{rs} z^s \right\|^2$$
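A short sketch using scikit-learn's LocallyLinearEmbedding, which runs both steps internally; the curve data and the n_neighbors value are illustrative.

```python
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding

# Toy data: a noisy 3-D curve to be unrolled into 2-D.
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0, 3 * np.pi, 500))
X = np.column_stack([np.cos(t), np.sin(t), 0.1 * rng.normal(size=500)])

lle = LocallyLinearEmbedding(n_neighbors=10, n_components=2)
Z = lle.fit_transform(X)   # step 1 finds weights W, step 2 solves for z
```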
LLE on Optdigits
[Figure: two-dimensional LLE embedding of the Optdigits digits; similar digits map to nearby points]