
Singular Value and Eigenvalue Decompositions

Frank Dellaert

May 2008

1 The Singular Value Decomposition


The singular value decomposition (SVD) factorizes a linear operator A : R^n → R^m into three simpler linear operators:
1. Projection z = V^T x into an r-dimensional space, where r is the rank of A
2. Element-wise multiplication with the r singular values σ_i, i.e., z' = Sz
3. Transformation y = Uz' to the m-dimensional output space
Combining these steps, A can be re-written as
A = USV^T    (1)
with U an m × r orthonormal matrix spanning A's column space im(A), S an r × r diagonal matrix of singular values, and V an n × r orthonormal matrix spanning A's row space im(A^T).
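As a quick numerical illustration (a hedged sketch, not part of the original note; the NumPy routines, dimensions, and random seed are arbitrary choices), the reduced rank-r SVD verifies both the three-step interpretation and the factorization (1):

```python
import numpy as np

# A random rank-deficient operator A : R^n -> R^m (m = 5, n = 4, rank r = 2)
rng = np.random.default_rng(0)
m, n, r = 5, 4, 2
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))

# Reduced SVD: keep only the r non-zero singular values
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U, S, V = U[:, :r], np.diag(s[:r]), Vt[:r, :].T

# Apply the three steps to an arbitrary input x
x = rng.standard_normal(n)
z = V.T @ x         # 1. project into the r-dimensional row space
z_prime = S @ z     # 2. scale element-wise by the singular values
y = U @ z_prime     # 3. map into the m-dimensional output space

assert np.allclose(y, A @ x)        # the three steps reproduce A x
assert np.allclose(A, U @ S @ V.T)  # equation (1)
```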

2 The Eigenvalue Decomposition


The eigenvalue decomposition applies to mappings from R^n to itself, i.e., a linear operator A : R^n → R^n described by a square matrix. An eigenvector e of A is a vector that is mapped to a scaled version of itself, i.e., Ae = λe, where λ is the corresponding eigenvalue. For a matrix A of rank r, we can group the r non-zero eigenvalues in an r × r diagonal matrix Λ and their eigenvectors in an n × r matrix E, and we have
AE = EΛ
Furthermore, if A is full rank (r = n) then A can be factorized as
A = EΛE^{-1}
which is a diagonalization similar to the SVD (1). In fact, if and only if A is symmetric¹ and positive definite (abbreviated SPD), we have that the SVD and the eigen-decomposition coincide:
A = USU^T = EΛE^{-1}
with U = E and S = Λ.
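A small sketch of this coincidence (illustrative only; the matrix is constructed to be SPD, and the reordering and sign handling are details of the numerical routines, not of the note):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)        # symmetric positive definite by construction

# SVD: A = U S V^T
U, s, Vt = np.linalg.svd(A)

# Symmetric eigen-decomposition: A = E Lambda E^T (eigh returns ascending order)
lam, E = np.linalg.eigh(A)
lam, E = lam[::-1], E[:, ::-1]     # reorder to descending, matching the SVD

assert np.allclose(s, lam)                 # S = Lambda
assert np.allclose(np.abs(U), np.abs(E))   # U = E up to column signs
assert np.allclose(U, Vt.T)                # U = V for a symmetric PD matrix
```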
Given a non-square matrix A = USV^T, two matrices and their factorizations are of special interest:
A^T A = V S^2 V^T    (2)
A A^T = U S^2 U^T    (3)
Thus, for these matrices the SVD of the original matrix A can be used to compute their SVD. And since these matrices are by definition SPD, this is also their eigen-decomposition, with eigenvalues Λ = S^2.
¹ If we allow complex matrices, A must be Hermitian, i.e., A's conjugate transpose must satisfy A* = A.
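Equations (2) and (3) are easy to check numerically; the following sketch (NumPy, arbitrary sizes, not from the note) verifies both, as well as Λ = S^2:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 6, 3
A = rng.standard_normal((m, n))     # full column rank r = n = 3 almost surely

U, s, Vt = np.linalg.svd(A, full_matrices=False)
S2 = np.diag(s**2)
V = Vt.T

assert np.allclose(A.T @ A, V @ S2 @ V.T)   # equation (2)
assert np.allclose(A @ A.T, U @ S2 @ U.T)   # equation (3)

# The eigenvalues of A^T A are the squared singular values, Lambda = S^2
lam = np.linalg.eigvalsh(A.T @ A)[::-1]     # descending order
assert np.allclose(lam, s**2)
```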

3 Principal Components Analysis
An important application is principal components analysis (PCA). To motivate this, consider a normally distributed, r-dimensional vector x ∼ N(0, I_r) with zero mean and unit covariance matrix. In other words, each component of the vector x is drawn independently from a 1-dimensional Gaussian with zero mean and unit variance, i.e., white noise. If we transform the vector x to an m-dimensional output space using the linear transformation
y = µ + Wx
with W of size m × r, then the m-dimensional vectors y will be distributed as y ∼ N(µ, WW^T).
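This distributional claim is easy to check empirically. The following sketch (arbitrary W, µ, and sample size; purely illustrative) confirms that the sample covariance of y approaches WW^T:

```python
import numpy as np

rng = np.random.default_rng(3)
m, r, N = 5, 2, 200_000
W = rng.standard_normal((m, r))
mu = rng.standard_normal(m)

# Draw N white latent vectors x ~ N(0, I_r) and push them through y = mu + W x
X = rng.standard_normal((r, N))
Y = mu[:, None] + W @ X

# The sample covariance of y approaches W W^T as N grows
assert np.allclose(np.cov(Y), W @ W.T, atol=0.1)
```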
Given an m × n data matrix Y of n data samples y, PCA uses eigen-analysis to recover the (scaled) principal components W. After subtracting the sample mean µ from all vectors y to form the centered matrix A, the eigen-decomposition of the sample covariance matrix AA^T is obtained by (3):
AA^T = US^2U^T = (US)(US)^T = WW^T
Hence, the data can be whitened by
x = W^+(y − µ)
where W^+ = S^{-1}U^T is the pseudo-inverse of W = US. Just as a sanity check, the covariance of the resulting whitened samples X = W^+A is indeed unity:
XX^T = W^+(AA^T)(W^+)^T = W^+(WW^T)(W^+)^T = I_r
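A compact sketch of this recipe (illustrative only; the rank-r SVD of A is used instead of explicitly forming AA^T, and the 1/n normalization of the covariance is ignored here, as in the text):

```python
import numpy as np

rng = np.random.default_rng(4)
m, r, n = 5, 2, 100
W_true = rng.standard_normal((m, r))
mu_true = rng.standard_normal(m)
Y = mu_true[:, None] + W_true @ rng.standard_normal((r, n))   # m x n data matrix

# Center the data and take the rank-r SVD of A
mu = Y.mean(axis=1)
A = Y - mu[:, None]
U, s, _ = np.linalg.svd(A, full_matrices=False)
U, S = U[:, :r], np.diag(s[:r])

W = U @ S                              # scaled principal components, A A^T = W W^T
W_pinv = np.diag(1.0 / s[:r]) @ U.T    # pseudo-inverse W^+ = S^{-1} U^T

# Whiten: X = W^+ A has identity "covariance" X X^T (no 1/n factor, as in the text)
X = W_pinv @ A
assert np.allclose(X @ X.T, np.eye(r))
```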
Two more facts regarding PCA are worth noting:
1. If the dimensionality m of the data matrix Y is very large, it is more efficient to use the eigen-decomposition (2) of the smaller n × n matrix A^T A and obtain the scaled principal components as W = AV (see the sketch after this list).
2. The m × r matrix W contains the scaled principal components. Clearly, the normalized principal components are the columns of U, and their lengths are the singular values σ_i.
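A hedged sketch of fact 1 (the sizes and the NumPy routines are my own choices, not from the note): for m much larger than n it is cheaper to eigen-decompose the n × n matrix A^T A than the m × m matrix AA^T, and W = AV then recovers the same scaled components as the direct SVD of A:

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 10_000, 50                  # many dimensions, few samples
Y = rng.standard_normal((m, n))
A = Y - Y.mean(axis=1, keepdims=True)

# Eigen-decomposition of the small n x n matrix A^T A  (equation (2))
_, V = np.linalg.eigh(A.T @ A)
V = V[:, ::-1]                     # eigh sorts ascending; reorder to descending

W = A @ V                          # scaled principal components W = A V = U S

# Cross-check the leading components against the direct (more expensive) SVD of A
U, s, _ = np.linalg.svd(A, full_matrices=False)
assert np.allclose(np.abs(W[:, :5]), np.abs(U[:, :5] * s[:5]))
```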
Finally, it is interesting that to sample from the density y ∼ N(µ, WW^T) one can proceed in two ways (both are compared in the sketch after this list):
1. Sample from x ∼ N(0, I_r), and form y = µ + Wx, i.e., generate a white latent vector x and use the principal components W of the data Y to generate an m-dimensional vector.
2. Sample from c ∼ N(0, I_n), and form y = µ + Ac, i.e., simply form a linear combination of the original (centered) data points, with coefficients {c_j}_{j=1}^n drawn from zero-mean, unit-variance Gaussians.
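A hedged comparison of the two routes (toy data and sizes are arbitrary; the data is scaled only to keep the covariance entries of order one):

```python
import numpy as np

rng = np.random.default_rng(6)
m, r, n, N = 4, 2, 30, 200_000
Y = rng.standard_normal((m, r)) @ rng.standard_normal((r, n)) / np.sqrt(n)  # rank-r data
mu = Y.mean(axis=1)
A = Y - mu[:, None]

U, s, _ = np.linalg.svd(A, full_matrices=False)
W = U[:, :r] * s[:r]                  # scaled principal components, W W^T = A A^T

# Route 1: white r-dimensional latent vectors pushed through W
Y1 = mu[:, None] + W @ rng.standard_normal((r, N))
# Route 2: random linear combinations of the n centered data points
Y2 = mu[:, None] + A @ rng.standard_normal((n, N))

# Both sample covariances approach W W^T = A A^T
target = W @ W.T
assert np.allclose(np.cov(Y1), target, atol=0.1)
assert np.allclose(np.cov(Y2), target, atol=0.1)
```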

4 A Simple Monte Carlo PCA Algorithm


If the data lives in a small subspace of rank r, then simply drawing s ≥ r samples Y_s from the data and performing PCA will, with high probability, recover a set of scaled principal components W_s = U_s S_s (of size m × r) that span the subspace. Then, forming the r × n matrix X_s of latent variables
X_s = W_s^+ A
for all centered samples A will not quite whiten the data. However, by doing a second decomposition on the covariance of X_s,
X_s X_s^T = W_2 W_2^T
the latent variables are whitened. Hence, if we form W = W_s W_2 and X = W^+ A = W_2^+ X_s, we have
XX^T = W_2^+ X_s X_s^T (W_2^+)^T = W_2^+ W_2 W_2^T (W_2^+)^T = I_r
If the data is very high-dimensional, i.e., m ≫ n, a similar algorithm can be devised by sampling rows instead of columns.
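A minimal NumPy sketch of this Monte Carlo PCA algorithm, under the stated assumptions (data of exact rank r, s ≥ r sampled columns); the function and variable names are illustrative, not from the note:

```python
import numpy as np

def monte_carlo_pca(Y, r, s, rng):
    """Recover scaled principal components of rank-r data by sampling s >= r columns."""
    m, n = Y.shape
    mu = Y.mean(axis=1, keepdims=True)
    A = Y - mu                                # centered data, m x n

    # PCA on a random subset of s samples
    idx = rng.choice(n, size=s, replace=False)
    Us, ss, _ = np.linalg.svd(A[:, idx], full_matrices=False)
    Ws = Us[:, :r] * ss[:r]                   # W_s = U_s S_s spans the subspace

    # Latent variables for all samples; not yet white
    Xs = (Us[:, :r] / ss[:r]).T @ A           # X_s = W_s^+ A

    # Second decomposition whitens the latent variables
    U2, s2, _ = np.linalg.svd(Xs, full_matrices=False)
    W2 = U2 * s2                              # X_s X_s^T = W_2 W_2^T
    W = Ws @ W2                               # final scaled principal components
    X = (U2 / s2).T @ Xs                      # X = W_2^+ X_s = W^+ A
    return W, X, mu

# Toy data living exactly in an r-dimensional subspace
rng = np.random.default_rng(7)
m, n, r, s = 50, 200, 3, 10
Y = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))

W, X, mu = monte_carlo_pca(Y, r, s, rng)
assert np.allclose(X @ X.T, np.eye(r))        # the latent variables are whitened
assert np.allclose(W @ X + mu, Y)             # W X + mu reconstructs the rank-r data
```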
