
Singular Value Decomposition

(Com S 477/577 Notes)

Yan-Bin Jia
Sep 6, 2012

Introduction

Now comes a highlight of linear algebra. Any m × n matrix A can be factored as

    A = U \Sigma V^T,

where U is an m × m orthogonal matrix (that is, UU^T = U^T U = I) whose columns are the eigenvectors of AA^T, V is an n × n orthogonal matrix whose columns are the eigenvectors of A^T A, and Σ is an m × n diagonal matrix of the form

    \Sigma = \begin{pmatrix} \sigma_1 & & & \\ & \ddots & & \\ & & \sigma_r & \\ & & & 0 \end{pmatrix}

with σ_1 ≥ σ_2 ≥ ⋯ ≥ σ_r > 0 and r = rank(A). In the above, σ_1, …, σ_r are the square roots of the nonzero
eigenvalues of A^T A. They are called the singular values of A.
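
As a quick numerical check of this factorization, here is a minimal numpy sketch (not part of the original notes; the 3 × 2 matrix A below is an arbitrary example, and numpy returns the singular values as a vector rather than as the full m × n matrix Σ):

```python
import numpy as np

# An arbitrary 3 x 2 example matrix (m = 3, n = 2).
A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [1.0, 1.0]])

# Full SVD: U is m x m, Vt holds V^T (n x n), s holds the singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Rebuild the m x n matrix Sigma from the vector of singular values.
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)

print(np.allclose(A, U @ Sigma @ Vt))  # True: A = U Sigma V^T up to rounding
print(s)                               # the singular values, largest first
```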
Our basic goal is to solve the system Ax = b for all matrices A and vectors b. A second goal
is to solve the system using a numerically stable algorithm. A third goal is to solve the system in
a reasonably efficient manner. For instance, we do not want to compute A^{-1} using determinants.
Three situations arise regarding the basic goal:
(a) If A is square and invertible, we want to have the solution x = A^{-1} b.
(b) If A is underconstrained, we want the entire set of solutions.
(c) If A is overconstrained, we could simply give up. But this case arises a lot in practice, so
instead we will ask for the least-squares solution. In other words, we want the x̂ which
minimizes the error ‖Ax̂ − b‖. Geometrically, Ax̂ is the point in the column space of A
closest to b.

Gaussian elimination is reasonably efficient, but it is not numerically very stable. In particular, elimination does not deal with nearly singular matrices. The method is not designed for
overconstrained systems. Even for underconstrained systems, the method requires extra work.
The poor numerical character of elimination can be seen in a couple of ways. First, the elimination
process assumes a non-singular matrix. But singularity, and rank in general, is a slippery concept.
After all, the matrix A contains continuous, possibly noisy, entries. Yet rank is a discrete integer.
Strictly speaking, both sets of vectors below are linearly independent:



    \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix},\;
    \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix},\;
    \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}
    \qquad\text{and}\qquad
    \begin{pmatrix} 1.00 \\ 1.00 \\ 1.01 \end{pmatrix},\;
    \begin{pmatrix} 1.00 \\ 1.01 \\ 1.00 \end{pmatrix},\;
    \begin{pmatrix} 1.01 \\ 1.00 \\ 1.00 \end{pmatrix}.

Yet, the first set seems genuinely independent, while the second set seems almost dependent.
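
The distinction can be quantified with singular values; a minimal numpy sketch (not from the notes) that treats each set as the columns of a 3 × 3 matrix:

```python
import numpy as np

# Each set of vectors becomes the columns of a 3 x 3 matrix.
M1 = np.array([[0.0, 0.0, 1.0],
               [0.0, 1.0, 0.0],
               [1.0, 0.0, 0.0]])
M2 = np.array([[1.00, 1.00, 1.01],
               [1.00, 1.01, 1.00],
               [1.01, 1.00, 1.00]])

# Singular values only (no U or V needed).
print(np.linalg.svd(M1, compute_uv=False))  # all equal to 1: genuinely independent columns
print(np.linalg.svd(M2, compute_uv=False))  # about [3.01, 0.01, 0.01]: nearly dependent columns
```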
Second, based on elimination, one solves the system Ax = b by solving (via backsubstitution)
Ly = b and Ux = D^{-1} y. If A is nearly singular, then D will contain near-zero entries on its
diagonal, and thus D^{-1} will contain large numbers. That is OK since in principle one needs the
large numbers to obtain a true solution. The problem is if A contains noisy entries. Then the large
numbers may be pure noise that dominates the true information. Furthermore, since L and U can
be fairly arbitrary, they may distort or magnify that noise across other variables.
Singular value decomposition is a powerful technique for dealing with sets of equations or
matrices that are either singular or else numerically very close to singular. In many cases where
Gaussian elimination and LU decomposition fail to give satisfactory results, SVD will not only
diagnose the problem but also give you a useful numerical answer. It is also the method of choice
for solving most linear least-squares problems.

SVD Close-up

Recall that it is sometimes possible to decompose an n × n matrix A as

    A = S \Lambda S^{-1},

where Λ is a diagonal matrix with the eigenvalues of A on the diagonal and S contains the eigenvectors of A.
Why is the above decomposition appealing? The answer lies in the change of coordinates
y = S^{-1} x. Instead of working with the system Ax = b, we can work with the system Λy = c,
where c = S^{-1} b. Since Λ is diagonal, we are left with the trivial system

    \lambda_i y_i = c_i, \qquad i = 1, \ldots, n,

where the λ_i's are the eigenvalues of A. If this system has a solution, then another change of coordinates
gives us x, that is, x = Sy. (There is no reason to believe that computing S^{-1} b is any easier than
computing A^{-1} b. However, in the ideal case the eigenvectors of A are orthogonal; this is true, for
instance, if AA^T = A^T A. In that case the columns of S are orthogonal, and so we can take S to be
orthogonal. But then S^{-1} = S^T, and the problem of solving Ax = b becomes very simple.)
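
A minimal numpy sketch of this change of coordinates, assuming a diagonalizable A (the 2 × 2 matrix and right-hand side below are arbitrary examples):

```python
import numpy as np

# A diagonalizable (here symmetric) example matrix and right-hand side.
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])

# A = S Lambda S^{-1}: columns of S are eigenvectors, lam holds the eigenvalues.
lam, S = np.linalg.eig(A)

c = np.linalg.solve(S, b)  # c = S^{-1} b
y = c / lam                # solve the trivial diagonal system lambda_i y_i = c_i
x = S @ y                  # change coordinates back: x = S y

print(np.allclose(A @ x, b))  # True
```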
Unfortunately, the decomposition A = SΛS^{-1} is not always possible. The condition for its
existence is that A is n × n with n linearly independent eigenvectors. Even worse, what do we do
if A is not square?

The answer is to work with A^T A and AA^T, both of which are symmetric (and have n and m
orthogonal eigenvectors, respectively). So we have the following decompositions:

    A^T A = V D V^T,
    A A^T = U D' U^T,

where V is an n × n orthogonal matrix consisting of the eigenvectors of A^T A, D an n × n diagonal
matrix with the eigenvalues of A^T A on the diagonal, U an m × m orthogonal matrix consisting of
the eigenvectors of AA^T, and D' an m × m diagonal matrix with the eigenvalues of AA^T on the
diagonal. It turns out that D and D' have the same non-zero diagonal entries, except that the order
might be different.
Recall that

    A = U \Sigma V^T
      = \underbrace{\begin{pmatrix} u_1 & \cdots & u_r & u_{r+1} & \cdots & u_m \end{pmatrix}}_{m \times m}
        \underbrace{\begin{pmatrix} \sigma_1 & & & \\ & \ddots & & \\ & & \sigma_r & \\ & & & 0 \end{pmatrix}}_{m \times n}
        \underbrace{\begin{pmatrix} v_1^T \\ \vdots \\ v_r^T \\ v_{r+1}^T \\ \vdots \\ v_n^T \end{pmatrix}}_{n \times n},    (1)

where u_1, …, u_r span col(A), v_1, …, v_r span row(A) = col(A^T), and v_{r+1}, …, v_n span null(A).

There are several facts about SVD; each can be checked numerically, as in the sketch after this list:

(a) rank(A) = rank(Σ) = r.
(b) The column space of A is spanned by the first r columns of U.
(c) The null space of A is spanned by the last n − r columns of V.
(d) The row space of A is spanned by the first r columns of V.
(e) The null space of A^T is spanned by the last m − r columns of U.
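
A small numpy sketch of these facts (the 3 × 4 matrix A and the tolerance 1e-12 are arbitrary choices for illustration):

```python
import numpy as np

# A 3 x 4 example of rank 2: the third row is the sum of the first two.
A = np.array([[1.0, 0.0, 2.0, 1.0],
              [0.0, 1.0, 1.0, 1.0],
              [1.0, 1.0, 3.0, 2.0]])

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-12))    # (a) rank(A) = rank(Sigma) = r

col_basis  = U[:, :r]         # (b) first r columns of U span col(A)
null_basis = Vt[r:, :].T      # (c) last n - r columns of V span null(A)
row_basis  = Vt[:r, :].T      # (d) first r columns of V span row(A)
left_null  = U[:, r:]         # (e) last m - r columns of U span null(A^T)

print(r)                               # 2
print(np.allclose(A @ null_basis, 0))  # True: A annihilates the null-space basis
print(np.allclose(left_null.T @ A, 0)) # True: u^T A = 0 for u in null(A^T)
```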
We can think of U and V as rotations and reflections, and Σ as a stretching matrix. The next
figure illustrates the sequence of transformations under A on the unit vectors v_1 and v_2 (and all
other vectors on the unit circle) in the case m = n = 2. Note that V = (v_1 v_2). When multiplied
by V^T on the left, the two vectors undergo a rotation and become the unit vectors i = (1, 0)^T and
j = (0, 1)^T. Then the matrix Σ stretches these two resulting vectors to σ_1 i and σ_2 j, respectively.
In the last step, the vectors undergo a final rotation due to U and become σ_1 u_1 and σ_2 u_2.

[Figure: v_1 and v_2 on the unit circle are mapped by V^T to i and j, stretched by Σ to σ_1 i and σ_2 j, and finally rotated by U to σ_1 u_1 and σ_2 u_2.]

From (1) we also see that

    A = \sigma_1 u_1 v_1^T + \cdots + \sigma_r u_r v_r^T.

We can swap σ_i with σ_j as long as we swap u_i with u_j and v_i with v_j at the same time. If σ_i = σ_j,
then u_i and u_j can be swapped as long as v_i and v_j are also swapped. SVD is unique up to the
permutations of (u_i, σ_i, v_i), or of (u_i, v_i) among those with equal σ_i's. It is also unique up to the
signs of u_i and v_i, which have to change simultaneously.
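
This outer-product expansion is easy to verify numerically; a small sketch with an arbitrary example matrix:

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [2.0, 4.0, 1.0],
              [0.0, 1.0, 1.0]])

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-12))   # numerical rank

# Sum of the rank-one terms sigma_i * u_i * v_i^T for i = 1, ..., r.
A_rebuilt = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(r))
print(np.allclose(A, A_rebuilt))  # True
```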
We also have

    A^T A = V \Sigma^T U^T U \Sigma V^T
          = V \begin{pmatrix} \sigma_1^2 & & & \\ & \ddots & & \\ & & \sigma_r^2 & \\ & & & 0 \end{pmatrix} V^T.

Hence σ_1^2, …, σ_r^2 (and 0 if r < n) are the eigenvalues of A^T A, which is positive definite if rank(A) = n,
and v_1, …, v_n are its eigenvectors.
Similarly, we have

    A A^T = U \begin{pmatrix} \sigma_1^2 & & & \\ & \ddots & & \\ & & \sigma_r^2 & \\ & & & 0 \end{pmatrix} U^T.

Therefore σ_1^2, …, σ_r^2 (and 0 if r < m) are also eigenvalues of AA^T, and u_1, …, u_m its eigenvectors.
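
A small numerical check of these two relations (the 2 × 3 matrix A is an arbitrary example; the eigenvalues are sorted because numpy does not order them to match the singular values):

```python
import numpy as np

A = np.array([[2.0, 0.0, 1.0],
              [1.0, 1.0, 0.0]])           # an arbitrary 2 x 3 example with r = 2

s = np.linalg.svd(A, compute_uv=False)    # the singular values of A

eig_AtA = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]  # eigenvalues of A^T A (three of them)
eig_AAt = np.sort(np.linalg.eigvalsh(A @ A.T))[::-1]  # eigenvalues of A A^T (two of them)

# sigma_i^2 are the nonzero eigenvalues of both products; A^T A has one
# extra (near-)zero eigenvalue because r = 2 < n = 3.
print(np.allclose(eig_AtA[:2], s**2))  # True
print(np.allclose(eig_AAt, s**2))      # True
```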
Example 1. Find the singular value decomposition of

    A = \begin{pmatrix} 2 & 2 \\ -1 & 1 \end{pmatrix}.

The eigenvalues of

    A^T A = \begin{pmatrix} 5 & 3 \\ 3 & 5 \end{pmatrix}

are 2 and 8, corresponding to the unit eigenvectors

    v_1 = \begin{pmatrix} -1/\sqrt{2} \\ 1/\sqrt{2} \end{pmatrix} \quad\text{and}\quad v_2 = \begin{pmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{pmatrix},

respectively. Hence σ_1 = √2 and σ_2 = √8 = 2√2. We have

    A v_1 = \sigma_1 u_1 v_1^T v_1 = \sigma_1 u_1 = \begin{pmatrix} 0 \\ \sqrt{2} \end{pmatrix}, \quad\text{so}\quad u_1 = \begin{pmatrix} 0 \\ 1 \end{pmatrix};
    A v_2 = \sigma_2 u_2 v_2^T v_2 = \sigma_2 u_2 = \begin{pmatrix} 2\sqrt{2} \\ 0 \end{pmatrix}, \quad\text{so}\quad u_2 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}.

The SVD of A is therefore

    A = U \Sigma V^T = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} \sqrt{2} & 0 \\ 0 & 2\sqrt{2} \end{pmatrix} \begin{pmatrix} -1/\sqrt{2} & 1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{pmatrix}.

Note that we can also compute the eigenvectors u_1 and u_2 directly from AA^T.
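
The example can be verified with numpy. Note that numpy orders the singular values from largest to smallest, so it reports 2√2 before √2, and its U and V may differ from those above by the permutation and sign ambiguities discussed earlier:

```python
import numpy as np

A = np.array([[2.0, 2.0],
              [-1.0, 1.0]])

U, s, Vt = np.linalg.svd(A)
print(s)                                    # [2.828..., 1.414...] = [2*sqrt(2), sqrt(2)]
print(np.allclose(U @ np.diag(s) @ Vt, A))  # True
```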

More Discussion

That rank(A) = rank(Σ) tells us that we can determine the rank of A by counting the non-zero
entries in Σ. In fact, we can do better. Recall that one of our complaints about Gaussian elimination
was that it did not handle noise or nearly singular matrices well. SVD remedies this situation.
For example, suppose that an n × n matrix A is nearly singular. Indeed, perhaps A should be
singular, but due to noisy data, it is not quite singular. This will show up in Σ, for instance, when
all of the n diagonal entries of Σ are non-zero but some of them are almost zero.
More generally, an m × n matrix A may appear to have rank r, yet when we look at Σ we
may find that some of the singular values are very close to zero. If there are l such values, then
the true rank of A is probably r − l, and we would do well to modify Σ. Specifically, we should
replace the l nearly zero singular values with zero.
Geometrically, the effect of this replacement is to reduce the column space of A and increase
its null space. The point is that the column space is warped along directions for which σ_i ≈ 0. In
effect, solutions to Ax = b get pulled off to infinity (since 1/σ_i → ∞) along vectors that are almost
in the null space. So, it is better to ignore the ith coordinate by zeroing σ_i.
Example 2. The matrix

    A = \begin{pmatrix} 1.01 & 1.00 & 1.00 \\ 1.00 & 1.01 & 1.00 \\ 1.00 & 1.00 & 1.00 \end{pmatrix}

yields, approximately,

    \Sigma = \begin{pmatrix} 3.01 & 0 & 0 \\ 0 & 0.01 & 0 \\ 0 & 0 & 0.003 \end{pmatrix}.

Since 0.01 and 0.003 are both significantly smaller than 3.01, we could treat them as zero and take the rank of A to be one.
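
The same diagnosis can be automated by comparing each singular value against the largest one; a minimal sketch (the 1% relative tolerance is a judgment call, chosen here to match the discussion):

```python
import numpy as np

A = np.array([[1.01, 1.00, 1.00],
              [1.00, 1.01, 1.00],
              [1.00, 1.00, 1.00]])

s = np.linalg.svd(A, compute_uv=False)

# Treat any singular value below 1% of the largest one as zero.
tol = 1e-2 * s[0]
numerical_rank = int(np.sum(s > tol))

print(s)               # one dominant value near 3.01; the others are tiny
print(numerical_rank)  # 1
```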

The SVD Solution

For the moment, let us suppose A is an n × n square matrix. The following picture sketches the
way in which SVD solves the system Ax = b.

[Figure: the set of solutions to Ax = b̂ (an affine set parallel to null(A)), the column space col(A), and the SVD solution to both Ax = b and Ax = b̂.]

The system Ax = b̂ has an affine set of solutions, given by x_0 + null(A), where x_0 is any
solution. It is easy to describe the null space null(A) given the SVD, since it is just
the span of the last n − r columns of V. Also note that Ax = b has no solution since b is not in
the column space of A.
SVD solves Ax = b̂ by determining the solution x̂ which is closest to the origin, i.e., which has the
minimum norm. It solves Ax = b by projecting b onto the column space of A, obtaining b̂, then
solving Ax = b̂. In other words, SVD obtains the least-squares solution.
So how do we compute a solution to Ax = b for all the cases? Given the diagonal matrix Σ in
the SVD of A, denote by Σ^+ the diagonal matrix whose diagonal entries are of the form

    (\Sigma^+)_{ii} = \begin{cases} 1/\sigma_i & \text{if } 1 \le i \le r; \\ 0 & \text{if } r + 1 \le i \le n. \end{cases}

Thus the product matrix

    \Sigma^+ \Sigma = \begin{pmatrix} 1 & & & & \\ & \ddots & & & \\ & & 1 & & \\ & & & 0 & \\ & & & & \ddots \end{pmatrix}

has r 1s on its diagonal. If Σ is invertible, then Σ^+ = Σ^{-1}.


Example 3. If

    \Sigma = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 0 \end{pmatrix},

then

    \Sigma^+ = \begin{pmatrix} 1/2 & 0 & 0 \\ 0 & 1/5 & 0 \\ 0 & 0 & 0 \end{pmatrix}.

To solve Ax = b, we first compute the SVD A = UΣV^T, then compute

    x̂ = V \Sigma^+ U^T b.

(a) If A is invertible, then x̂ is the unique solution to Ax = b.
(b) If A is singular and b is in the range of A, then x̂ is the solution closest to the origin. And
V Σ^+ U^T is a pseudo-inverse of A. The set of solutions is x̂ + null(A), where null(A) is spanned
by the last n − r columns of V.
(c) If A is singular and b is not in the range of A, then x̂ is the least-squares solution.
So far we have assumed that A is square. How do we solve Ax = b for a general m × n matrix
A? We still make use of the n × m pseudo-inverse matrix

    A^+ = V \Sigma^+ U^T.

The SVD solution is x̂ = A^+ b.
The pseudoinverse A^+ has the following properties:

    A^+ u_i = \begin{cases} \frac{1}{\sigma_i} v_i & \text{if } i \le r, \\ 0 & \text{if } i > r; \end{cases}
    \qquad
    (A^+)^T v_i = \begin{cases} \frac{1}{\sigma_i} u_i & \text{if } i \le r, \\ 0 & \text{if } i > r. \end{cases}

Namely, the vectors u_1, …, u_r in the column space of A go back to the row space. The other
vectors u_{r+1}, …, u_m are in the null space of A^T, and A^+ sends them to zero. When we know what
happens to each basis vector u_i, we know A^+, because v_i and σ_i will be determined.
Example 4. The pseudoinverse of A = \begin{pmatrix} 2 & 2 \\ -1 & 1 \end{pmatrix} from Example 1 is A^+ = A^{-1}, because A is invertible.
And we have

    A^+ = A^{-1} = V \Sigma^+ U^T = \begin{pmatrix} 1/4 & -1/2 \\ 1/4 & 1/2 \end{pmatrix}.

In general there will not be any zero singular values. However, if A has column degeneracies
there may be near-zero singular values. It may be useful to zero these out, to remove noise. (The
implication is that the overdetermined set is not quite as overdetermined as it seemed, that is, the
null space is not trivial.)
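
A sketch of this zeroing step, assuming numpy (the relative tolerance is again a judgment call; singular values below it are dropped rather than inverted):

```python
import numpy as np

def svd_solve(A, b, rel_tol=1e-10):
    """Minimum-norm least-squares solve of A x = b, zeroing tiny singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_inv = np.zeros_like(s)
    keep = s > rel_tol * s[0]          # which singular values are significant
    s_inv[keep] = 1.0 / s[keep]        # invert only those; the rest stay zero
    return Vt.T @ (s_inv * (U.T @ b))

# Nearly singular example: the second row is almost twice the first.
A = np.array([[1.0, 2.0],
              [2.0, 4.0 + 1e-13]])
b = np.array([1.0, 2.1])

print(svd_solve(A, b))        # small, sensible minimum-norm least-squares solution
print(np.linalg.solve(A, b))  # direct solve: huge entries dominated by the 1e-13 perturbation
```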

SVD Algorithm

Here we only take a brief look at how the SVD algorithm actually works. For more details we
refer to [4]. It is useful to establish a contrast with Gaussian elimination. Recall that Gaussian
elimination reduces a matrix A by a series of row operations that zero out portions of columns of A.
Row operations imply pre-multiplying the matrix A. They are all collected together in the matrix
L^{-1}, where A = LDU.
In contrast, SVD zeros out portions of both rows and columns. Thus, whereas Gaussian elimination only reduces A using pre-multiplication, SVD uses both pre- and post-multiplication. As a
result, SVD can at each stage rely on orthogonal matrices to perform its reductions on A. By using
orthogonal matrices, SVD reduces the risk of magnifying noise and errors. The pre-multiplication
matrices are gathered together in the matrix U^T, while the post-multiplication matrices are gathered together in the matrix V.
There are two phases to the SVD decomposition algorithm:
(i) SVD reduces A to bidiagonal form using a series of orthogonal transformations. This phase
is deterministic and has a running time that depends only on the size of the matrix A.
(ii) SVD removes the superdiagonal elements from the bidiagonal matrix using orthogonal transformations. This phase is iterative but converges quickly.
Let us take a slightly closer look at the first phase. Step 1 in this phase creates two orthogonal
matrices U1 and V1 such that

    U_1 A = \begin{pmatrix} a'_{11} & a'_{12} & \cdots & a'_{1n} \\ 0 & & & \\ \vdots & & B & \\ 0 & & & \end{pmatrix},
    \qquad
    U_1 A V_1 = \begin{pmatrix} a'_{11} & a''_{12} & 0 & \cdots & 0 \\ 0 & & & & \\ \vdots & & B' & & \\ 0 & & & & \end{pmatrix}.
If A is m × n, then B is (m − 1) × (n − 1). The next step of this phase recursively works
on B', and so forth, until orthogonal matrices U_1, …, U_{n−1}, V_1, …, V_{n−2} are produced such that
U_{n−1} ⋯ U_1 A V_1 ⋯ V_{n−2} is bidiagonal (assume m ≥ n).
In both phases, the orthogonal transformations are constructed from Householder matrices.
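
To make step 1 of the first phase concrete, here is a minimal numpy sketch (the helper householder is ours, numerical safeguards are omitted, and this is not the optimized form used in production SVD codes):

```python
import numpy as np

def householder(x):
    """Symmetric orthogonal H with H @ x = (±||x||, 0, ..., 0); x must be nonzero."""
    v = x.astype(float).copy()
    sign = 1.0 if x[0] >= 0 else -1.0
    v[0] += sign * np.linalg.norm(x)        # v = x + sign(x_1) ||x|| e_1
    return np.eye(len(x)) - 2.0 * np.outer(v, v) / (v @ v)

A = np.array([[4.0, 3.0, 2.0, 1.0],
              [2.0, 1.0, 0.0, 1.0],
              [1.0, 2.0, 3.0, 1.0]])
m, n = A.shape

# U1 (pre-multiplication): zero out the first column of A below the (1,1) entry.
U1 = householder(A[:, 0])
B = U1 @ A

# V1 (post-multiplication): zero out the first row beyond the (1,2) entry.  The
# leading 1 in the block structure leaves the first column untouched; since the
# Householder block is symmetric, it acts the same way from the right.
V1 = np.eye(n)
V1[1:, 1:] = householder(B[0, 1:])
C = B @ V1

np.set_printoptions(precision=3, suppress=True)
print(C)  # column 1 is zero below (1,1); row 1 is zero beyond (1,2)
```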

References
[1] M. Erdmann. Lecture notes for 16-811 Mathematical Fundamentals for Robotics. The Robotics
Institute, Carnegie Mellon University, 1998.
[2] G. Strang. Introduction to Linear Algebra. Wellesley-Cambridge Press, 1993.
[3] W. H. Press, et al. Numerical Recipes in C++: The Art of Scientific Computing. Cambridge
University Press, 2nd edition, 2002.

[4] G. E. Forsythe, M. A. Malcolm, and C. B. Moler. Computer Methods for Mathematical Computations. Prentice-Hall, 1977.
