Week 1
Rex Cheung
Sparse PCA
Instead of a full PCA, sparse PCA encourages sparsity in the principal
component loadings (the weights).
The optimization is similar to the full PCA procedure, with the addition
of a sparsity constraint on the loadings (think of LASSO).
max_v vᵀΣv
subject to ||v||_2 = 1
           ||v||_0 ≤ k
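A minimal sketch using scikit-learn's SparsePCA on made-up toy data. Note that sklearn solves an L1-penalized (LASSO-style) relaxation rather than the hard ||v||_0 ≤ k constraint above; the data shape and alpha value here are arbitrary illustration choices.

import numpy as np
from sklearn.decomposition import SparsePCA

rng = np.random.RandomState(0)
X = rng.randn(100, 10)              # toy data: 100 samples, 10 features

# alpha is the L1 penalty strength: larger alpha -> sparser loadings
spca = SparsePCA(n_components=3, alpha=1.0, random_state=0)
X_reduced = spca.fit_transform(X)

print(spca.components_.shape)           # (3, 10) loading vectors
print(np.mean(spca.components_ == 0.0)) # fraction of exactly-zero loadings

Unlike full PCA, many entries of components_ are exactly zero, which makes each component easier to interpret.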
Randomized PCA
Instead of computing the exact PC loadings, this uses a randomized
approximation to estimate the first k PC loadings.
Borrows randomized algorithms to quickly estimate the singular value
decomposition (SVD) of a matrix, which is the main algorithm for
computing the PCs.
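A short sketch, assuming scikit-learn: PCA accepts a randomized SVD solver that approximates the first k components much faster than a full SVD on large matrices. The data shape and component count below are arbitrary.

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.randn(2000, 500)            # toy data: 2000 samples, 500 features

# svd_solver='randomized' approximates the SVD instead of computing it exactly
pca = PCA(n_components=10, svd_solver='randomized', random_state=0)
X_reduced = pca.fit_transform(X)    # approximate first 10 PCs
print(pca.explained_variance_ratio_)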
Incremental PCA
Useful when the data set is too large to fit in memory.
Instead of computing PCA on the entire data set, it performs PCA
on smaller chunks of data and updates the components sequentially.
Also useful for streaming data.
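A minimal sketch with scikit-learn's IncrementalPCA; the loop below stands in for reading batches from disk or a stream, and the chunk sizes are made up.

import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.RandomState(0)
ipca = IncrementalPCA(n_components=5)

# Feed the data in chunks; each partial_fit call updates the components
# sequentially, so the full data set never has to sit in memory at once.
for _ in range(20):
    X_chunk = rng.randn(100, 50)    # pretend this was read from disk / a stream
    ipca.partial_fit(X_chunk)

Z = ipca.transform(rng.randn(10, 50))   # project new observations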
Kernel PCA
Useful for preserving clusters after projection.
According to some statistical theory, mapping data to a
higher dimension can sometimes make a nonlinearly separable problem
become linearly separable.
Idea of Kernel PCA: map current data to an even higher dimension,
then perform PCA.
High-dimensional mapping will increase computation time. We can
use the kernel trick to help solve this problem.
Kernel: a function that computes the similarity (an inner product in the
mapped feature space) between two vectors, without ever computing the
high-dimensional mapping explicitly.
For more mathematical details of Kernel PCA, refer to
https://arxiv.org/pdf/1207.3538.pdf.
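A hedged sketch of the idea, assuming scikit-learn and a toy two-circles data set: in the original 2-D space the two classes are not linearly separable, but after an RBF Kernel PCA projection they (nearly) are. The gamma value is an arbitrary choice for illustration.

import numpy as np
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

# Two concentric circles: not linearly separable in the original space
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

# The RBF kernel implicitly maps the data to a high-dimensional space;
# the kernel trick means this mapping is never computed explicitly.
kpca = KernelPCA(n_components=2, kernel='rbf', gamma=10)
X_kpca = kpca.fit_transform(X)      # classes become (nearly) separable here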
Remarks:
MDS can only be used for feature exploration.
It cannot be used to transform new observations (unlike PCA).
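A quick way to see this remark in code, assuming scikit-learn's manifold module: MDS exposes fit_transform but no transform method, whereas a fitted PCA can project unseen rows.

import numpy as np
from sklearn.manifold import MDS
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.randn(50, 5)

mds = MDS(n_components=2, random_state=0)
Z = mds.fit_transform(X)            # embeds the training set only
print(hasattr(mds, 'transform'))    # False: no way to map new observations

pca = PCA(n_components=2).fit(X)
Z_new = pca.transform(rng.randn(3, 5))   # PCA projects unseen data fine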
3. In the lower dimension, find new points z_i that minimize the expression

   ∑_{i=1}^{N} || z_i − ∑_{k=1}^{N} w_{i,k} z_k ||^2
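This minimization looks like the embedding step of Locally Linear Embedding (LLE), where the w_{i,k} are reconstruction weights fit in an earlier step. A minimal sketch with scikit-learn, which solves both the weight fitting and this minimization internally; the swiss-roll data and neighbor count are illustrative choices.

import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, _ = make_swiss_roll(n_samples=1000, random_state=0)

# n_neighbors controls how many z_k enter each reconstruction sum
lle = LocallyLinearEmbedding(n_neighbors=10, n_components=2, random_state=0)
Z = lle.fit_transform(X)            # the z_i minimizing the cost above
print(lle.reconstruction_error_)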