
Dimension Reduction Techniques

DS 862: Machine Learning for Business Analysts

Rex Cheung

San Francisco State University

Jan 27, 2021



Outline:

Dimension Reduction Introduction
PCA review
Different types of PCA
Multidimensional Scaling
Locally Linear Embedding
Isometric Feature Mapping
Others



Dimension Reduction

Main purpose: reduce the dimension of the data from p dimensions to q dimensions, where q < p
Curse of dimensionality:
The notion of closeness is different in high-dimensional data
High-dimensional settings require a lot more data
Different types of dimension reduction techniques:
Raw feature selection
Projection based
Manifold learning



Review of PCA

Aims to find a lower-dimensional representation of the data that maximizes the variance
Finds linear combinations of the original features
These linear combinations are called the principal components (or principal component scores)
In general, for a data set with n observations and p predictors, you can create at most min(n − 1, p) principal components
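A minimal scikit-learn sketch of PCA, assuming a hypothetical data matrix X with n = 200 observations and p = 10 features (the data and parameter choices are illustrative, not from the slides):

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))           # hypothetical data: n = 200, p = 10

pca = PCA(n_components=3)                # keep the first 3 principal components
scores = pca.fit_transform(X)            # principal component scores, shape (200, 3)
print(pca.explained_variance_ratio_)     # fraction of variance captured by each component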



Variants of PCA

The original PCA works well, but it can be computationally intensive when the data set is large
Variants of PCA that give good approximations of the full PCA algorithm:
Sparse PCA
Randomized PCA
Incremental PCA



Variants of PCA Cont.

Sparse PCA
Instead of a full PCA, sparse PCA encourages sparsity in the principal component loadings (the weights).
The optimization is similar to the full PCA procedure, with the addition of a sparsity constraint on the loadings (think of LASSO):

max_v  v^T Σ v
subject to  ||v||_2 = 1,  ||v||_0 ≤ k

Applications: financial portfolios, gene analyses, statistical testing, etc.
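A minimal sketch with scikit-learn's SparsePCA, reusing the hypothetical X from the PCA sketch above; note that this implementation encourages sparsity through an L1 (LASSO-style) penalty controlled by alpha rather than the L0 constraint shown above:

from sklearn.decomposition import SparsePCA

# X is the hypothetical (200, 10) matrix from the earlier PCA sketch
spca = SparsePCA(n_components=3, alpha=1.0, random_state=0)   # larger alpha -> sparser loadings
scores = spca.fit_transform(X)
print(spca.components_)                                       # many loadings are driven exactly to zero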



Variants of PCA Cont.

Randomized PCA
Instead of computing the exact PC loadings, it uses a randomized approximation to estimate the first k PC loadings.
It borrows randomized algorithms that quickly estimate the singular value decomposition (SVD) of a matrix, which is the main algorithm behind computing the principal components.
Incremental PCA
Useful when the data set is too large to fit in computer memory.
Instead of computing PCA on the entire data set, it performs PCA on smaller chunks of the data and updates the solution sequentially.
Also useful for streaming data.
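A minimal sketch of both variants in scikit-learn, again assuming the hypothetical X from the PCA sketch:

from sklearn.decomposition import PCA, IncrementalPCA

# Randomized PCA: approximate the first k loadings via randomized SVD
rpca = PCA(n_components=3, svd_solver="randomized", random_state=0)
scores_r = rpca.fit_transform(X)

# Incremental PCA: fit on mini-batches small enough for memory
ipca = IncrementalPCA(n_components=3, batch_size=50)
scores_i = ipca.fit_transform(X)          # for streaming data, call ipca.partial_fit(chunk) per chunk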



Variants of PCA Cont.

Kernel PCA
Useful for preserving clusters after projection.
Statistical learning theory tells us that mapping data to a higher dimension can sometimes make a nonlinearly separable problem linearly separable.
Idea of Kernel PCA: map the current data to an even higher dimension, then perform PCA.
The high-dimensional mapping increases computation time; the kernel trick helps avoid computing the mapping explicitly.
Kernel: a function that measures the similarity between two vectors (formally, an inner product in the mapped feature space).
For more mathematical details of Kernel PCA, refer to https://arxiv.org/pdf/1207.3538.pdf.
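A minimal sketch with scikit-learn's KernelPCA, again on the hypothetical X; the RBF kernel and gamma value are illustrative choices, not tuned:

from sklearn.decomposition import KernelPCA

kpca = KernelPCA(n_components=2, kernel="rbf", gamma=0.1)   # gamma is a tuning parameter
scores_k = kpca.fit_transform(X)                            # nonlinear 2-D representation of X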



Figure: The two groups are not linearly separable in 2D. Using the mapping (x1, x2) → (x1, x2, x1^2 + x2^2), they become linearly separable in 3D. Example borrowed from http://www.cs.haifa.ac.il/~rita/uml_course/lectures/KPCA.pdf



Variants of PCA Cont.

Common kernels to use:

Gaussian / Radial Basis Function: K(x_i, x_j) = exp(−γ ||x_i − x_j||_2^2)
Polynomial: K(x_i, x_j) = (1 + x_i^T x_j)^p
Sigmoid: K(x_i, x_j) = tanh(γ x_i^T x_j + δ)

x_i and x_j are original observations; everything else is a tuning parameter.
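A small NumPy sketch of these kernel functions; the default parameter values are placeholders, not recommendations:

import numpy as np

def rbf_kernel(xi, xj, gamma=0.1):
    # Gaussian / RBF: exp(-gamma * ||xi - xj||_2^2)
    return np.exp(-gamma * np.sum((xi - xj) ** 2))

def polynomial_kernel(xi, xj, degree=3):
    # Polynomial: (1 + xi^T xj)^degree
    return (1.0 + xi @ xj) ** degree

def sigmoid_kernel(xi, xj, gamma=0.1, delta=0.0):
    # Sigmoid: tanh(gamma * xi^T xj + delta)
    return np.tanh(gamma * (xi @ xj) + delta)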



Multidimensional Scaling (MDS)

PCA is considered a projection method: it attempts to project high-dimensional data onto a lower dimension.
Projection methods usually don't preserve any underlying structure.
Manifold embedding is similar to a projection method, but it also preserves some structure simultaneously.



Figure: The famous Swiss Roll example. Image taken from
https://www.math.pku.edu.cn/teachers/yaoy/2011.fudan/mani.pdf



Multidimensional Scaling

Reduces dimensionality while trying to preserve the distances between the instances
Basic steps to perform classical MDS:
1. Calculate the distances d_ij between observations x_i and x_j
2. Find z_1, ..., z_n ∈ R^q that minimize

Σ_{i ≠ j} (d_ij − ||z_i − z_j||)^2

This objective is called the stress function.
There are variants of MDS, such as using a different stress function.
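A minimal sketch with scikit-learn's MDS, which minimizes a stress of this form with the SMACOF algorithm; the Swiss Roll data from the earlier figure is used purely as an illustrative input:

from sklearn.datasets import make_swiss_roll
from sklearn.manifold import MDS
from sklearn.metrics import pairwise_distances

X, _ = make_swiss_roll(n_samples=300, random_state=0)      # 3-D Swiss Roll points

D = pairwise_distances(X)                                  # step 1: pairwise distances d_ij
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
Z = mds.fit_transform(D)                                   # step 2: 2-D points z_i minimizing the stress
print(mds.stress_)                                         # final value of the stress function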



Multidimensional Scaling

Remarks:
MDS can only be used for feature exploration.
It cannot be used to transform new observations (unlike PCA)



Locally Linear Embedding (LLE)

An improved version of MDS.
Preserves local properties rather than global distances.
Pseudocode:
1. For each observation x_i, find its k closest points (the neighborhood N(i)).
2. Find the weights that reconstruct x_i from its k nearest neighbors, i.e.

min_{w_{i,k}}  ||x_i − Σ_{k ∈ N(i)} w_{i,k} x_k||^2

3. In the lower dimension, find new points z_i that minimize

Σ_{i=1}^{N} ||z_i − Σ_{k ∈ N(i)} w_{i,k} z_k||^2

The idea is to preserve the local structure through the weights.
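A minimal sketch with scikit-learn's LocallyLinearEmbedding on the Swiss Roll data; n_neighbors = 10 is an illustrative choice, not a recommendation:

from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, _ = make_swiss_roll(n_samples=1000, random_state=0)

lle = LocallyLinearEmbedding(n_neighbors=10, n_components=2, random_state=0)
Z = lle.fit_transform(X)                  # 2-D embedding that preserves local neighborhoods
print(lle.reconstruction_error_)          # reconstruction error of the embedding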



Isometric Feature Mapping (ISOMAP)

Considers geodesic distance rather than Euclidean distance
High-level pseudocode:
1. Find the k nearest neighbors for each observation.
2. Construct the neighborhood graph.
3. Compute the shortest path (the geodesic distance) between each pair of observations. This gives the d_ij.
4. Apply MDS on the d_ij.
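A minimal sketch with scikit-learn's Isomap, again on the Swiss Roll data with an illustrative n_neighbors = 10:

from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

X, _ = make_swiss_roll(n_samples=1000, random_state=0)

iso = Isomap(n_neighbors=10, n_components=2)
Z = iso.fit_transform(X)            # embedding that preserves geodesic distances
print(iso.dist_matrix_.shape)       # geodesic distance matrix d_ij fed into the MDS step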



Other Dimension Reduction Methods

There are many more methods out there:

Factor analysis
Independent Component Analysis
t-SNE
Self-Organizing Map
Autoencoders
etc.



Remarks

Most dimension reduction techniques are unsupervised methods.
All methods described work on numerical data only.
There is no single best method; it depends on the application.
There is no single best 'evaluation' either; it also depends on the application.
For categorical data, one can use Multiple Correspondence Analysis (MCA).



References

Hands-on ML: Chapter 8
ESL: Chapters 14.5, 14.7 - 14.9
tSNE vs PCA: link

