SVD Note
Frank Dellaert
May 2008
3 Principal Components Analysis
An important application is principal components analysis (PCA). To motivate this, consider a normally
distributed, $r$-dimensional vector $x \sim N(0, I_r)$ with zero mean and unit covariance matrix. In other words,
each component of the vector $x$ is drawn independently from a 1-dimensional Gaussian with zero mean and
unit variance, i.e., white noise. If we transform the vector $x$ to an $m$-dimensional output space using the
following linear transformation
$$y = \mu + W x$$
with $W$ of size $m \times r$, then the $m$-dimensional vectors $y$ will be distributed as $y \sim N(\mu, W W^T)$.
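The generative model above can be checked empirically. The following is a minimal sketch, assuming NumPy and arbitrary illustrative values for the sizes, the mean, and $W$ (none of these come from the note): it draws many white vectors $x$, maps them through $y = \mu + W x$, and compares the empirical mean and covariance of $y$ with $\mu$ and $W W^T$.

```python
import numpy as np

rng = np.random.default_rng(0)    # fixed seed, chosen only for reproducibility
m, r, n = 5, 2, 200_000           # illustrative sizes (assumptions)

mu = rng.normal(size=(m, 1))      # arbitrary mean vector (assumption)
W = rng.normal(size=(m, r))       # arbitrary m x r transformation (assumption)

X = rng.standard_normal((r, n))   # each column is a white vector x ~ N(0, I_r)
Y = mu + W @ X                    # each column is a sample y = mu + W x

# Empirical mean and covariance of the generated samples
mu_hat = Y.mean(axis=1, keepdims=True)
C_hat = (Y - mu_hat) @ (Y - mu_hat).T / n

# Relative deviation of the empirical covariance from W W^T
target = W @ W.T
cov_rel_err = np.linalg.norm(C_hat - target) / np.linalg.norm(target)
mean_err = np.abs(mu_hat - mu).max()
print(cov_rel_err, mean_err)
```

Both deviations shrink as the number of samples grows, which is what the distribution $N(\mu, W W^T)$ predicts.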
Given an $m \times n$ data matrix $Y$ of $n$ data samples $y$, PCA uses eigen-analysis to recover the (scaled)
principal components $W$. After subtracting the sample mean $\mu$ from every sample $y$ to form the centered matrix $A$, the
eigen-decomposition of the sample covariance matrix $A A^T$ (written here without the $1/n$ normalization) is obtained by (3):
$$A A^T = U S^2 U^T = (U S)(U S)^T = W W^T$$
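As a numerical illustration of this identity, here is a minimal sketch assuming NumPy and a random stand-in data matrix (the data and sizes are assumptions, not from the note). It centers the data, takes the thin SVD of $A$, forms $W = US$, and compares $A A^T$ with $W W^T$.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 50                           # illustrative sizes (assumptions)
Y = rng.normal(size=(m, n))            # stand-in data matrix, one sample per column

mu = Y.mean(axis=1, keepdims=True)     # sample mean
A = Y - mu                             # centered data matrix

# Thin SVD: A = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(A, full_matrices=False)
W = U * s                              # W = U S (column i of U scaled by s[i])

# Largest entrywise difference between A A^T and W W^T
scatter_err = np.abs(A @ A.T - W @ W.T).max()
print(scatter_err)
```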
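Hence, since $W^T W = S^2$, the data can be whitened by
$$x = S^{-1} U^T (y - \mu) = (W^T W)^{-1} W^T (y - \mu)$$
Just as a sanity check, with $X = S^{-1} U^T A$ the resulting covariance of $x$ is indeed the identity:
$$X X^T = S^{-1} U^T (A A^T) U S^{-1} = S^{-1} U^T (U S^2 U^T) U S^{-1} = I_r$$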
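The whitening step can also be checked numerically. The sketch below assumes NumPy and random stand-in data, and whitens with $S^{-1} U^T$, the pseudo-inverse of $W = US$. It also evaluates $W^T W$, which equals $S^2$ rather than $I_r$, to show why the singular values have to be divided out.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 4, 50                           # illustrative sizes (assumptions)
Y = rng.normal(size=(m, n))            # stand-in data, one sample per column
mu = Y.mean(axis=1, keepdims=True)
A = Y - mu                             # centered data

U, s, Vt = np.linalg.svd(A, full_matrices=False)
W = U * s                              # scaled principal components W = U S

W_pinv = (U / s).T                     # S^{-1} U^T, the pseudo-inverse of W
X = W_pinv @ A                         # whitened data, one column per sample

# X X^T should be the identity; W^T W should be diag(s^2)
whiten_err = np.abs(X @ X.T - np.eye(len(s))).max()
gram_err = np.abs(W.T @ W - np.diag(s**2)).max()
print(whiten_err, gram_err)
```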
Two more facts regarding PCA are worth noting:
1. If the dimensionality $m$ of the data matrix $Y$ is very large, in particular $m \gg n$, it is more efficient to use the
eigen-decomposition (2) of the smaller $n \times n$ matrix $A^T A$ and obtain the principal components as $W = AV$.
2. The $m \times r$ matrix $W$ contains the scaled principal components. The normalized principal
components are the columns of $U$, and the lengths of the columns of $W$ are the singular values $\sigma_i$.
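Both facts can be illustrated together. The following is a minimal sketch, assuming NumPy, random stand-in data with $m \gg n$, and a relative tolerance for discarding numerically zero eigenvalues (all of these are assumptions). It eigen-decomposes the small matrix $A^T A$, forms $W = AV$, and checks that the column lengths of $W$ are the singular values and that the normalized columns are orthonormal.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 1000, 10                          # high-dimensional data, few samples (assumption)
Y = rng.normal(size=(m, n))              # stand-in data, one sample per column
A = Y - Y.mean(axis=1, keepdims=True)    # centered data

# Eigen-decomposition of the small n x n matrix A^T A = V S^2 V^T
evals, V = np.linalg.eigh(A.T @ A)       # eigh returns eigenvalues in ascending order
order = np.argsort(evals)[::-1]          # reorder to descending
evals, V = evals[order], V[:, order]

# Centering removes one degree of freedom, so one eigenvalue is numerically zero;
# the relative tolerance used to drop it is an assumption of this sketch.
keep = evals > 1e-10 * evals[0]
evals, V = evals[keep], V[:, keep]

W = A @ V                                # scaled principal components, m x r
sigma = np.sqrt(evals)                   # singular values of A
U = W / sigma                            # normalized principal components

col_len_err = np.abs(np.linalg.norm(W, axis=0) - sigma).max()
orth_err = np.abs(U.T @ U - np.eye(U.shape[1])).max()
scatter_err = np.abs(A @ A.T - W @ W.T).max()
print(W.shape, col_len_err, orth_err, scatter_err)
```

Only the final check builds the $m \times m$ matrix $A A^T$; the decomposition itself works on an $n \times n$ matrix, which is the point of the shortcut.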
Finally, it is interesting that to sample from the density $y \sim N(\mu, W W^T)$ one can proceed in two ways:
1. Sample from $x \sim N(0, I_r)$, and form $y = \mu + W x$, i.e., generate a white latent vector $x$ and use the
principal components $W$ of the data $Y$ to generate an $m$-dimensional vector.
2. Sample from $c \sim N(0, I_n)$, and form $y = \mu + A c$, i.e., simply form a linear combination of the original
(centered) data points, with coefficients $\{c_j\}_{j=1}^{n}$ drawn from zero-mean, unit-variance Gaussians.
If the data is very high-dimensional, i.e., $m \gg n$, a similar algorithm can be devised by sampling rows.
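The two sampling schemes can be compared side by side. This is a minimal sketch, assuming NumPy, random stand-in data, and a hypothetical helper `empirical_cov` introduced only for this illustration. Both schemes should produce samples whose covariance approaches $W W^T = A A^T$.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 3, 20                              # illustrative sizes (assumptions)
Y = rng.normal(size=(m, n))               # stand-in data, one sample per column
mu = Y.mean(axis=1, keepdims=True)
A = Y - mu                                # centered data

U, s, Vt = np.linalg.svd(A, full_matrices=False)
W = U * s                                 # scaled principal components W = U S
r = len(s)

N = 100_000                               # number of samples to draw (assumption)
Y1 = mu + W @ rng.standard_normal((r, N))   # scheme 1: y = mu + W x, x ~ N(0, I_r)
Y2 = mu + A @ rng.standard_normal((n, N))   # scheme 2: y = mu + A c, c ~ N(0, I_n)

def empirical_cov(samples):
    """Hypothetical helper: covariance of column samples, normalized by their count."""
    d = samples - samples.mean(axis=1, keepdims=True)
    return d @ d.T / samples.shape[1]

target = A @ A.T                          # equals W W^T
rel_err_1 = np.linalg.norm(empirical_cov(Y1) - target) / np.linalg.norm(target)
rel_err_2 = np.linalg.norm(empirical_cov(Y2) - target) / np.linalg.norm(target)
print(rel_err_1, rel_err_2)
```

Scheme 2 needs no decomposition at all, at the cost of drawing $n$ coefficients per sample instead of $r$.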