
Lecture Slides for

INTRODUCTION
TO
MACHINE
LEARNING
3RD EDITION
ETHEM ALPAYDIN
© The MIT Press, 2014

alpaydin@boun.edu.tr
http://www.cmpe.boun.edu.tr/~ethem/i2ml3e
CHAPTER 6:

DIMENSIONALITY
REDUCTION
Why Reduce Dimensionality?

- Reduces time complexity: less computation
- Reduces space complexity: fewer parameters
- Saves the cost of observing the feature
- Simpler models are more robust on small datasets
- More interpretable; simpler explanation
- Data visualization (structure, groups, outliers, etc.) if plotted in 2 or 3 dimensions
Feature Selection vs Extraction

- Feature selection: Choose the k < d important features, ignoring the remaining d - k (subset selection algorithms)
- Feature extraction: Project the original x_i, i = 1,...,d dimensions to new k < d dimensions z_j, j = 1,...,k
Subset Selection

- There are 2^d possible subsets of d features
- Forward search: add the best feature at each step
  - The set of features F is initially Ø.
  - At each iteration, find the best new feature:
    j = argmin_i E(F ∪ x_i)
  - Add x_j to F if E(F ∪ x_j) < E(F)
  - A hill-climbing, O(d^2) algorithm (see the sketch below)
- Backward search: start with all features and remove one at a time, if possible
- Floating search: add k, remove l
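To make forward search concrete, here is a minimal Python sketch. The error function E is hypothetical: in practice it would train a model on the candidate feature subset and return its validation error.

```python
def forward_search(d, E):
    """Greedy forward selection over d features, using a caller-supplied
    (hypothetical) error function E that maps a feature subset to a
    validation error."""
    F = set()
    best_err = E(F)  # error with no features (e.g., a baseline predictor)
    while len(F) < d:
        # find the feature whose addition decreases the error most
        err, j = min((E(F | {i}), i) for i in range(d) if i not in F)
        if err >= best_err:  # no candidate improves E: stop (hill climbing)
            break
        F.add(j)
        best_err = err
    return F
```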
Iris data: Single feature

[Figure: error for each single feature on the Iris data; the chosen feature is marked]

Iris data: Add one more feature to F4

[Figure: error after adding one more feature to F4; the chosen feature is marked]
Principal Components Analysis

- Find a low-dimensional space such that when x is projected there, information loss is minimized.
- The projection of x on the direction of w is: z = w^T x
- Find w such that Var(z) is maximized:
  Var(z) = Var(w^T x) = E[(w^T x - w^T μ)^2]
         = E[(w^T x - w^T μ)(w^T x - w^T μ)]
         = E[w^T (x - μ)(x - μ)^T w]
         = w^T E[(x - μ)(x - μ)^T] w = w^T ∑ w
  where Var(x) = E[(x - μ)(x - μ)^T] = ∑
- Maximize Var(z) subject to ||w|| = 1:
  max_{w_1} w_1^T ∑ w_1 - α(w_1^T w_1 - 1)
  Setting the gradient to zero gives ∑ w_1 = α w_1; that is, w_1 is an eigenvector of ∑. Choose the one with the largest eigenvalue for Var(z) to be maximum.
- Second principal component: maximize Var(z_2) subject to ||w_2|| = 1 and w_2 orthogonal to w_1:
  max_{w_2} w_2^T ∑ w_2 - α(w_2^T w_2 - 1) - β(w_2^T w_1 - 0)
  This gives ∑ w_2 = α w_2; that is, w_2 is another eigenvector of ∑, and so on.
What PCA does

z = W^T(x - m)
where the columns of W are the eigenvectors of ∑ and m is the sample mean.
PCA centers the data at the origin and rotates the axes.
How to choose k?

- Proportion of Variance (PoV) explained:
  PoV = (λ_1 + λ_2 + ... + λ_k) / (λ_1 + λ_2 + ... + λ_k + ... + λ_d)
  where the λ_i are sorted in descending order
- Typically, stop at PoV > 0.9
- Scree graph plots PoV vs k; stop at the "elbow"
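As a concrete illustration, here is a small numpy sketch (on synthetic data) of the whole recipe: eigendecomposition of the sample covariance, choosing k by PoV > 0.9, and projecting with z = W^T(x - m).

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 5))  # toy correlated data

m = X.mean(axis=0)
S = np.cov(X, rowvar=False)             # d x d sample covariance
eigvals, eigvecs = np.linalg.eigh(S)    # eigh returns ascending order
order = np.argsort(eigvals)[::-1]       # sort descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

pov = np.cumsum(eigvals) / eigvals.sum()
k = int(np.searchsorted(pov, 0.9)) + 1  # smallest k with PoV > 0.9

W = eigvecs[:, :k]                      # top-k eigenvectors as columns
Z = (X - m) @ W                         # z = W^T (x - m), one row per instance
```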
Feature Embedding

- When X is the N x d data matrix,
  - X^T X is the d x d matrix (covariance of features, if mean-centered)
  - X X^T is the N x N matrix (pairwise similarities of instances)
- PCA uses the eigenvectors of X^T X, which are d-dimensional and can be used for projection
- Feature embedding uses the eigenvectors of X X^T, which are N-dimensional and give directly the coordinates after projection (see the sketch below)
- Sometimes we can define pairwise similarities (or distances) between instances; then we can use feature embedding without needing to represent instances as vectors.
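A short numpy sketch of this relationship on synthetic, mean-centered data: the eigenvectors of the N x N matrix X X^T, scaled by the square roots of their eigenvalues, directly give the same coordinates (up to sign) as projecting onto the eigenvectors of X^T X.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
X = X - X.mean(axis=0)                  # mean-center

lam, U = np.linalg.eigh(X @ X.T)        # N x N pairwise-similarity matrix
order = np.argsort(lam)[::-1]
lam, U = lam[order], U[:, order]

k = 2
Z = U[:, :k] * np.sqrt(np.maximum(lam[:k], 0))  # coordinates, no projection step
```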
Factor Analysis

- Find a small number of factors z which, when combined, generate x:
  x_i - μ_i = v_i1 z_1 + v_i2 z_2 + ... + v_ik z_k + ε_i
  where z_j, j = 1,...,k are the latent factors with
  E[z_j] = 0, Var(z_j) = 1, Cov(z_i, z_j) = 0 for i ≠ j,
  ε_i are the noise sources with
  Var(ε_i) = ψ_i, Cov(ε_i, ε_j) = 0 for i ≠ j, Cov(ε_i, z_j) = 0,
  and v_ij are the factor loadings.
PCA vs FA

- PCA goes from x to z:  z = W^T(x - μ)
- FA goes from z to x:   x - μ = Vz + ε
Factor Analysis

- In FA, factors z_j are stretched, rotated and translated to generate x.
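For illustration, here is a brief sketch using scikit-learn's FactorAnalysis (one implementation choice, not the slides' own code) on synthetic data generated exactly as x = Vz + ε:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(2)
Z_true = rng.normal(size=(200, 2))                  # latent factors z
V = rng.normal(size=(2, 6))                         # factor loadings
X = Z_true @ V + 0.1 * rng.normal(size=(200, 6))    # x = Vz + noise

fa = FactorAnalysis(n_components=2)
Z = fa.fit_transform(X)          # estimated factor scores
V_hat = fa.components_           # estimated loadings (k x d)
psi = fa.noise_variance_         # estimated noise variances ψ_i
```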
Singular Value Decomposition and Matrix Factorization

- Singular value decomposition: X = VAW^T
  - V is N x N and contains the eigenvectors of X X^T
  - W is d x d and contains the eigenvectors of X^T X
  - A is N x d and contains the singular values on its first k diagonal entries
- X = v_1 a_1 w_1^T + ... + v_k a_k w_k^T, where k is the rank of X
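A quick numpy check of the decomposition and of the rank-one expansion, on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(8, 5))

U, s, Vt = np.linalg.svd(X, full_matrices=False)  # X = U diag(s) Vt
k = 2
X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]       # best rank-k approximation

# the same thing written as an explicit sum of rank-one terms
X_sum = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(k))
assert np.allclose(X_k, X_sum)
```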
Matrix Factorization

- Matrix factorization: X = FG
  where F is N x k and G is k x d
- Example: latent semantic indexing

Multidimensional Scaling

- Given the pairwise distances between N points,
  d_ij, i,j = 1,...,N
  place the points on a low-dimensional map such that the distances are preserved (by feature embedding).
- z = g(x | θ): find θ that minimizes the Sammon stress
  E(θ | X) = Σ_{r,s} ( ||z^r - z^s|| - ||x^r - x^s|| )^2 / ||x^r - x^s||^2
           = Σ_{r,s} ( ||g(x^r | θ) - g(x^s | θ)|| - ||x^r - x^s|| )^2 / ||x^r - x^s||^2
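As an illustration, scikit-learn's MDS can embed a precomputed distance matrix; note that it minimizes a plain (unnormalized) stress rather than the normalized Sammon stress written above.

```python
import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(4)
X = rng.normal(size=(30, 10))
# N x N matrix of pairwise Euclidean distances
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
Z = mds.fit_transform(D)   # 2-D coordinates that approximately preserve D
```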
Map of Europe by MDS

[Figure: cities of Europe placed by MDS from their pairwise distances; map from CIA - The World Factbook: http://www.cia.gov/]
Linear Discriminant Analysis

- Find a low-dimensional space such that when x is projected, classes are well-separated.
- Find w that maximizes
  J(w) = (m_1 - m_2)^2 / (s_1^2 + s_2^2)
  where m_i is the projected mean and s_i^2 the scatter of the samples of class i:
  m_1 = Σ_t w^T x^t r^t / Σ_t r^t
  s_1^2 = Σ_t (w^T x^t - m_1)^2 r^t
  with r^t = 1 if x^t is in class 1 and 0 otherwise (m_2 and s_2^2 are defined analogously).

- Between-class scatter:
  (m_1 - m_2)^2 = (w^T m_1 - w^T m_2)^2
                = w^T (m_1 - m_2)(m_1 - m_2)^T w
                = w^T S_B w,  where S_B = (m_1 - m_2)(m_1 - m_2)^T
  (on the right, m_i denotes the class-mean vector whose projection is the scalar m_i = w^T m_i)
- Within-class scatter:
  s_1^2 = Σ_t (w^T x^t - m_1)^2 r^t
        = Σ_t w^T (x^t - m_1)(x^t - m_1)^T w r^t = w^T S_1 w
  where S_1 = Σ_t (x^t - m_1)(x^t - m_1)^T r^t
  s_1^2 + s_2^2 = w^T S_W w,  where S_W = S_1 + S_2
Fisher's Linear Discriminant

- Find w that maximizes
  J(w) = w^T S_B w / w^T S_W w = (w^T (m_1 - m_2))^2 / w^T S_W w
- LDA solution:
  w = c · S_W^{-1} (m_1 - m_2)
- Parametric solution:
  w = ∑^{-1} (μ_1 - μ_2)
  when p(x | C_i) ~ N(μ_i, ∑)
K > 2 Classes

- Within-class scatter:
  S_W = Σ_{i=1}^K S_i,  where S_i = Σ_t r_i^t (x^t - m_i)(x^t - m_i)^T
- Between-class scatter:
  S_B = Σ_{i=1}^K N_i (m_i - m)(m_i - m)^T,  where m = (1/K) Σ_{i=1}^K m_i
- Find W that maximizes
  J(W) = |W^T S_B W| / |W^T S_W W|
- Solution: the largest eigenvectors of S_W^{-1} S_B (S_B has maximum rank K - 1)
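The multi-class case can be sketched as the generalized eigenproblem S_B w = λ S_W w, which scipy solves directly; shown below on the Iris data.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.datasets import load_iris

def lda_directions(X, y):
    """Columns of W: the top K-1 eigenvectors of S_W^{-1} S_B."""
    classes = np.unique(y)
    d = X.shape[1]
    means = {c: X[y == c].mean(axis=0) for c in classes}
    m = np.mean(list(means.values()), axis=0)   # mean of the class means
    SW, SB = np.zeros((d, d)), np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        SW += (Xc - means[c]).T @ (Xc - means[c])
        diff = (means[c] - m)[:, None]
        SB += len(Xc) * (diff @ diff.T)
    lam, W = eigh(SB, SW)                       # S_B w = lam * S_W w
    order = np.argsort(lam)[::-1]
    return W[:, order[: len(classes) - 1]]

X, y = load_iris(return_X_y=True)
Z = X @ lda_directions(X, y)                    # at most K-1 = 2 dimensions
```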
PCA vs LDA

[Figure: the same data projected by PCA and by LDA]
Canonical Correlation Analysis

- X = {x^t, y^t}_t : two sets of variables, x and y
- We want to find two projections w and v such that when x is projected along w and y is projected along v, the correlation is maximized:
  ρ = Corr(w^T x, v^T y) = Cov(w^T x, v^T y) / ( √Var(w^T x) √Var(v^T y) )
CCA

- x and y may be two different views or modalities, e.g., an image and its word tags; CCA does a joint mapping.
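For illustration, a sketch using scikit-learn's CCA on two synthetic "views" that share a common latent signal:

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(6)
shared = rng.normal(size=(100, 2))                   # common latent signal
X = shared @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(100, 5))
Y = shared @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(100, 4))

cca = CCA(n_components=2)
Zx, Zy = cca.fit_transform(X, Y)   # projections w^T x and v^T y
# corresponding columns of Zx and Zy are maximally correlated
```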
Isomap

- Geodesic distance is the distance along the manifold that the data lies on, as opposed to the Euclidean distance in the input space.
Isomap

- Instances r and s are connected in the graph if ||x^r - x^s|| < ε, or if x^s is one of the k nearest neighbors of x^r; the edge length is ||x^r - x^s||.
- For two nodes r and s not directly connected, the distance is the length of the shortest path between them.
- Once the N x N distance matrix is thus formed, use MDS to find a lower-dimensional mapping (see the sketch below).
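A short sketch using scikit-learn's Isomap on its bundled digits data, a small stand-in for the Optdigits figure below:

```python
from sklearn.datasets import load_digits
from sklearn.manifold import Isomap

X, y = load_digits(return_X_y=True)
# k-nearest-neighbor graph, shortest-path distances, then MDS internally
Z = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
# plotting Z colored by y shows the per-digit clusters seen in the figure
```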
Optdigits after Isomap (with neighborhood graph).

[Figure: 2-D Isomap embedding of Optdigits; digits of the same class cluster together. Matlab source from http://web.mit.edu/cocosci/isomap/isomap.html]
Locally Linear Embedding

1. Given x^r, find its neighbors x^s_(r)
2. Find the weights W_rs that minimize the reconstruction error
   E(W | X) = Σ_r || x^r - Σ_s W_rs x^s_(r) ||^2
3. Find the new coordinates z^r that minimize
   E(z | W) = Σ_r || z^r - Σ_s W_rs z^s_(r) ||^2
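The two optimization steps above are what scikit-learn's LocallyLinearEmbedding implements; a minimal sketch, again on the bundled digits data:

```python
from sklearn.datasets import load_digits
from sklearn.manifold import LocallyLinearEmbedding

X, y = load_digits(return_X_y=True)
# steps 1-2: fit reconstruction weights W; step 3: solve for coordinates z
Z = LocallyLinearEmbedding(n_neighbors=10, n_components=2).fit_transform(X)
```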
LLE on Optdigits

[Figure: 2-D LLE embedding of Optdigits. Matlab source from http://www.cs.toronto.edu/~roweis/lle/code.html]

Laplacian Eigenmaps

- Let r and s be two instances and B_rs their similarity; we want to find z^r and z^s that minimize
  Σ_{r,s} ||z^r - z^s||^2 B_rs
  so that similar instances are placed nearby.
- B_rs can be defined in terms of similarity in the original space: 0 if x^r and x^s are too far apart, and, e.g., exp(-||x^r - x^s||^2 / (2σ^2)) otherwise.
- This defines a graph Laplacian, and feature embedding returns the z^r.
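A sketch using scikit-learn's SpectralEmbedding, which implements Laplacian eigenmaps with a nearest-neighbor affinity graph (shown here on Iris, matching the figure below):

```python
from sklearn.datasets import load_iris
from sklearn.manifold import SpectralEmbedding

X, y = load_iris(return_X_y=True)
# the similarity B_rs comes from a k-nearest-neighbor affinity graph
Z = SpectralEmbedding(n_components=2, affinity="nearest_neighbors",
                      n_neighbors=10).fit_transform(X)
```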
Laplacian Eigenmaps on Iris

[Figure: 2-D Laplacian-eigenmap embedding of the Iris data]

See also: spectral clustering (Chapter 7).