Singular Value Decomposition
Yan-Bin Jia
Sep 6, 2012
Introduction
Any real m × n matrix A can be decomposed as
\[
A = U\Sigma V^T,
\]
where U is an m × m orthogonal matrix,¹ V is an n × n orthogonal matrix, and Σ is an m × n diagonal matrix of the form
\[
\Sigma = \begin{pmatrix} \sigma_1 & & & \\ & \ddots & & \\ & & \sigma_r & \\ & & & 0 \end{pmatrix}
\]
with $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0$ and $r = \mathrm{rank}(A)$. In the above, $\sigma_1, \ldots, \sigma_r$ are the square roots of the
eigenvalues of $A^TA$. They are called the singular values of A.
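For a quick numerical check of this factorization (not part of the notes' development; the 3 × 2 matrix below is arbitrary), numpy's `np.linalg.svd` returns exactly these factors:

```python
import numpy as np

# An arbitrary 3x2 matrix, used only to illustrate the factorization.
A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [1.0, 1.0]])

U, s, Vt = np.linalg.svd(A)              # A = U Sigma V^T, s holds sigma_1 >= sigma_2
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)

print(np.allclose(U @ Sigma @ Vt, A))    # True: the factorization reproduces A
print(np.allclose(s**2,
                  np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]))
# True: the singular values are the square roots of the eigenvalues of A^T A
```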
Our basic goal is to solve the system Ax = b for all matrices A and vectors b. A second goal
is to solve the system using a numerically stable algorithm. A third goal is to solve the system in
a reasonably efficient manner. For instance, we do not want to compute $A^{-1}$ using determinants.
Three situations arise regarding the basic goal:
(a) If A is square and invertible, we want to have the solution $x = A^{-1}b$.
(b) If A is underconstrained, we want the entire set of solutions.
(c) If A is overconstrained, we could simply give up. But this case arises a lot in practice, so
instead we will ask for the least-squares solution. In other words, we want that $\hat{x}$ which
minimizes the error $\|A\hat{x} - b\|$. Geometrically, $A\hat{x}$ is the point in the column space of A
closest to b. (A small numerical sketch of this case follows the list.)
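To make case (c) concrete, here is a minimal numpy sketch (the data are made up). `np.linalg.lstsq` is itself SVD-based and returns the minimizing $\hat{x}$:

```python
import numpy as np

# Overconstrained system: 4 equations, 2 unknowns (made-up data).
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
b = np.array([1.1, 1.9, 3.2, 3.9])

x_hat, residual, rank, sing_vals = np.linalg.lstsq(A, b, rcond=None)
print(x_hat)                          # the least-squares solution
print(np.linalg.norm(A @ x_hat - b))  # the minimized error ||A x_hat - b||
```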
¹That is, $UU^T = U^TU = I$.
Gaussian elimination is reasonably efficient, but it is not numerically very stable. In particular, elimination does not deal well with nearly singular matrices. The method is not designed for
overconstrained systems. Even for underconstrained systems, the method requires extra work.
The poor numerical character of elimination can be seen in a couple ways. First, the elimination
process assumes a non-singular matrix. But singularity, and rank in general, is a slippery concept.
After all, the matrix A contains continuous, possibly noisy, entries. Yet, rank is a discrete integer.
Strictly speaking, each of the two sets below consists of linearly independent vectors:
\[
\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix},\;
\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix},\;
\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}
\qquad\text{and}\qquad
\begin{pmatrix} 1.00 \\ 1.00 \\ 1.01 \end{pmatrix},\;
\begin{pmatrix} 1.00 \\ 1.01 \\ 1.00 \end{pmatrix},\;
\begin{pmatrix} 1.01 \\ 1.00 \\ 1.00 \end{pmatrix}.
\]
Yet, the first set seems genuinely independent, while the second set seems almost dependent.
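To see numerically just how slippery this is (a small illustrative check, using the vectors above as matrix columns): both matrices have rank 3, but the second is nearly singular.

```python
import numpy as np

# Columns of E1 are the first set of vectors, columns of E2 the second set.
E1 = np.eye(3)
E2 = np.array([[1.00, 1.00, 1.01],
               [1.00, 1.01, 1.00],
               [1.01, 1.00, 1.00]])

print(np.linalg.matrix_rank(E1), np.linalg.matrix_rank(E2))  # 3 and 3
print(np.linalg.det(E1), np.linalg.det(E2))    # 1.0 versus roughly -3e-4
print(np.linalg.cond(E1), np.linalg.cond(E2))  # 1.0 versus roughly 300
```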
Second, based on elimination, one solves the system Ax = b by solving (via backsubstitution)
$Ly = b$ and $Ux = D^{-1}y$. If A is nearly singular, then D will contain near-zero entries on its
diagonal, and thus $D^{-1}$ will contain large numbers. That is OK since in principle one needs the
large numbers to obtain a true solution. The problem is if A contains noisy entries. Then the large
numbers may be pure noise that dominates the true information. Furthermore, since L and U can
be fairly arbitrary, they may distort or magnify that noise across other variables.
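The following small experiment (illustrative only; the matrix, noise level, and tolerance are all made up) shows the contrast: a plain LU-based solve of a nearly singular, slightly noisy system produces huge noise-dominated values, while an SVD-based solve that treats the tiny singular values as zero stays well-behaved.

```python
import numpy as np

rng = np.random.default_rng(1)

# A nearly singular symmetric 3x3 matrix (its two smallest singular values
# are on the order of 1e-10).
A = np.array([[1.0, 1.0,         1.0        ],
              [1.0, 1.0 + 1e-10, 1.0        ],
              [1.0, 1.0,         1.0 + 1e-10]])
x_true = np.array([1.0, 1.0, 1.0])
b = A @ x_true + 1e-6 * rng.standard_normal(3)   # slightly noisy right-hand side

print(np.linalg.solve(A, b))   # LU-based: huge, noise-dominated entries

# SVD-based solve, zeroing singular values below a chosen tolerance.
U, s, Vt = np.linalg.svd(A)
s_inv = np.where(s > 1e-8 * s[0], 1.0 / s, 0.0)
x_svd = Vt.T @ (s_inv * (U.T @ b))
print(x_svd)                   # modest entries, approximately (1, 1, 1)
```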
Singular value decomposition is a powerful technique for dealing with sets of equations or
matrices that are either singular or else numerically very close to singular. In many cases where
Gaussian elimination and LU decomposition fail to give satisfactory results, SVD will not only
diagnose the problem but also give you a useful numerical answer. It is also the method of choice
for solving most linear least-squares problems.
SVD Close-up
To see why a factorization helps with solving, suppose first that A is diagonalizable, say $A = S\Lambda S^{-1}$ with $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$. A change of coordinates $y = S^{-1}x$ and $c = S^{-1}b$ turns the system Ax = b into the diagonal system
\[
\lambda_i y_i = c_i, \qquad i = 1, \ldots, n,
\]
where the $\lambda_i$'s are the eigenvalues of A. If this system has a solution, then another change of coordinates
gives us x, that is, $x = Sy$.²

Unfortunately, the decomposition $A = S\Lambda S^{-1}$ is not always possible. The condition for its
existence is that A is $n \times n$ with n linearly independent eigenvectors. Even worse, what do we do
if A is not square?
²There is no reason to believe that computing $S^{-1}b$ is any easier than computing $A^{-1}b$. However, in the ideal
case, the eigenvectors of A are orthogonal. This is true, for instance, if $AA^T = A^TA$. In that case the columns of
S are orthogonal and so we can take S to be orthogonal. But then $S^{-1} = S^T$, and the problem of solving Ax = b
becomes very simple.
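Before moving on, here is a tiny sketch of that ideal case (the matrix is arbitrary): when A is symmetric, `np.linalg.eigh` returns an orthogonal S, and the diagonalized system is solved componentwise.

```python
import numpy as np

# A symmetric (hence orthogonally diagonalizable) matrix, chosen arbitrarily.
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])

lam, S = np.linalg.eigh(A)     # A = S diag(lam) S^T with S orthogonal
y = (S.T @ b) / lam            # solve lambda_i * y_i = (S^T b)_i componentwise
x = S @ y                      # change coordinates back: x = S y

print(np.allclose(A @ x, b))   # True
```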
The answer is to work with $A^TA$ and $AA^T$, both of which are symmetric (and have n and m
orthogonal eigenvectors, respectively). So we have the following decompositions:
\[
A^TA = VDV^T, \qquad AA^T = UD'U^T.
\]
Writing $U = (u_1 \;\cdots\; u_m)$ and $V = (v_1 \;\cdots\; v_n)$ for these orthogonal eigenvector matrices, the SVD of A takes the form
\[
A \;=\; U\Sigma V^T \;=\;
\underbrace{\begin{pmatrix} u_1 & \cdots & u_r & u_{r+1} & \cdots & u_m \end{pmatrix}}_{m \times m}\,
\underbrace{\begin{pmatrix} \sigma_1 & & & \\ & \ddots & & \\ & & \sigma_r & \\ & & & 0 \end{pmatrix}}_{m \times n}\,
\underbrace{\begin{pmatrix} v_1^T \\ \vdots \\ v_r^T \\ v_{r+1}^T \\ \vdots \\ v_n^T \end{pmatrix}}_{n \times n},
\tag{1}
\]
where $u_1, \ldots, u_r$ span the column space col(A), the rows $v_1^T, \ldots, v_r^T$ span the row space row(A) = col($A^T$), and $v_{r+1}, \ldots, v_n$ span the null space null(A).

[Figure: A maps the orthonormal vectors $v_i$ of the row space to the orthogonal vectors $\sigma_i u_i$ of the column space; in particular $Av_1 = \sigma_1 u_1$ and $Av_2 = \sigma_2 u_2$.]
We can swap $\sigma_i$ with $\sigma_j$ as long as we swap $u_i$ with $u_j$ and $v_i$ with $v_j$ at the same time. If $\sigma_i = \sigma_j$,
then $u_i$ and $u_j$ can be swapped as long as $v_i$ and $v_j$ are also swapped. SVD is unique up to the
permutations of $(u_i, \sigma_i, v_i)$ or of $(u_i, v_i)$ among those with equal $\sigma_i$'s. It is also unique up to the
signs of $u_i$ and $v_i$, which have to change simultaneously.
We also have
\[
A^TA = V\Sigma^T U^T U \Sigma V^T
= V \begin{pmatrix} \sigma_1^2 & & & \\ & \ddots & & \\ & & \sigma_r^2 & \\ & & & 0 \end{pmatrix} V^T,
\qquad
AA^T = U \begin{pmatrix} \sigma_1^2 & & & \\ & \ddots & & \\ & & \sigma_r^2 & \\ & & & 0 \end{pmatrix} U^T.
\]
Therefore $\sigma_1^2, \ldots, \sigma_r^2$ (and 0 if $r < m$) are also eigenvalues of $AA^T$, and $u_1, \ldots, u_m$ its eigenvectors.
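This relationship is easy to confirm numerically (a throwaway check on an arbitrary random matrix):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 3))          # arbitrary 4x3 matrix, so r = 3 < m = 4

s = np.linalg.svd(A, compute_uv=False)   # singular values, in descending order
eig_AtA = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]   # eigenvalues of A^T A
eig_AAt = np.sort(np.linalg.eigvalsh(A @ A.T))[::-1]   # eigenvalues of A A^T

print(np.allclose(s**2, eig_AtA))               # True
print(np.allclose(s**2, eig_AAt[:3]))           # True
print(np.isclose(eig_AAt[3], 0.0, atol=1e-10))  # True: the extra eigenvalue is 0
```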
Example 1. Find the singular value decomposition of
\[
A = \begin{pmatrix} 2 & 2 \\ -1 & 1 \end{pmatrix}.
\]
The eigenvalues of
\[
A^TA = \begin{pmatrix} 5 & 3 \\ 3 & 5 \end{pmatrix}
\]
are 2 and 8, corresponding to unit eigenvectors
\[
v_1 = \begin{pmatrix} \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} \end{pmatrix}
\qquad\text{and}\qquad
v_2 = \begin{pmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{pmatrix},
\]
respectively. Hence $\sigma_1 = \sqrt{2}$ and $\sigma_2 = \sqrt{8} = 2\sqrt{2}$. We have
\[
Av_1 = \sigma_1 u_1 v_1^T v_1 = \sigma_1 u_1 = \begin{pmatrix} 0 \\ -\sqrt{2} \end{pmatrix},
\qquad\text{so}\qquad
u_1 = \begin{pmatrix} 0 \\ -1 \end{pmatrix},
\]
\[
Av_2 = \sigma_2 u_2 v_2^T v_2 = \sigma_2 u_2 = \begin{pmatrix} 2\sqrt{2} \\ 0 \end{pmatrix},
\qquad\text{so}\qquad
u_2 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}.
\]
Therefore
\[
A = U\Sigma V^T
= \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}
\begin{pmatrix} \sqrt{2} & 0 \\ 0 & 2\sqrt{2} \end{pmatrix}
\begin{pmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{pmatrix}.
\]
Note that we can also compute the eigenvectors $u_1$ and $u_2$ directly from $AA^T$.
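The hand computation can be checked against numpy (which lists the singular values in decreasing order, so the columns come out in the opposite order from the example, and the signs of the singular vectors may differ):

```python
import numpy as np

A = np.array([[ 2.0, 2.0],
              [-1.0, 1.0]])

U, s, Vt = np.linalg.svd(A)
print(s)                                    # [2.828..., 1.414...] = [2*sqrt(2), sqrt(2)]
print(np.allclose(U @ np.diag(s) @ Vt, A))  # True

# The factors computed by hand in Example 1 (with sigma_1 = sqrt(2) listed first).
U_ex  = np.array([[0.0, 1.0], [-1.0, 0.0]])
S_ex  = np.diag([np.sqrt(2.0), 2.0 * np.sqrt(2.0)])
Vt_ex = np.array([[1.0, -1.0], [1.0, 1.0]]) / np.sqrt(2.0)
print(np.allclose(U_ex @ S_ex @ Vt_ex, A))  # True
```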
More Discussion
That rank(A) = rank(Σ) tells us that we can determine the rank of A by counting the non-zero
entries in Σ. In fact, we can do better. Recall that one of our complaints about Gaussian elimination
was that it did not handle noise or nearly singular matrices well. SVD remedies this situation.
For example, suppose that an $n \times n$ matrix A is nearly singular. Indeed, perhaps A should be
singular, but due to noisy data, it is not quite singular. This will show up in Σ, for instance, when
all of the n diagonal entries in Σ are non-zero and some of the diagonal entries are almost zero.
More generally, an $m \times n$ matrix A may appear to have rank r, yet when we look at Σ we
may find that some of the singular values are very close to zero. If there are l such values, then
the true rank of A is probably $r - l$, and we would do well to modify Σ. Specifically, we should
replace the l nearly zero singular values with zero.
Geometrically, the effect of this replacement is to reduce the column space of A and increase
its null space. The point is that the column space is warped along directions for which $\sigma_i \approx 0$. In
effect, solutions to Ax = b get pulled off to infinity (since $1/\sigma_i \to \infty$) along vectors that are almost
in the null space. So, it is better to ignore the ith coordinate by zeroing $\sigma_i$.
Example 2. The matrix
\[
A = \begin{pmatrix} 1.00 & 1.00 & 1.01 \\ 1.00 & 1.01 & 1.00 \\ 1.01 & 1.00 & 1.00 \end{pmatrix},
\]
whose columns are the three nearly dependent vectors from before, yields
\[
\Sigma = \begin{pmatrix} 3.01 & 0 & 0 \\ 0 & 0.01 & 0 \\ 0 & 0 & 0.01 \end{pmatrix}.
\]
Since 0.01 is significantly smaller than 3.01, we could treat it as zero and the rank of A as one.
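In code, "treating 0.01 as zero" is just a choice of tolerance when counting singular values; zeroing the small values also produces a nearby rank-one matrix. A small sketch (the tolerance of 10% of the largest singular value is an arbitrary choice):

```python
import numpy as np

A = np.array([[1.00, 1.00, 1.01],
              [1.00, 1.01, 1.00],
              [1.01, 1.00, 1.00]])

U, s, Vt = np.linalg.svd(A)
print(np.round(s, 4))                 # approximately [3.01, 0.01, 0.01]

tol = 0.1 * s[0]                      # tolerance: a modeling choice, not a rule
numerical_rank = int(np.sum(s > tol))
print(numerical_rank)                 # 1

# Zero out the small singular values to obtain the nearby rank-1 matrix.
s_trunc = np.where(s > tol, s, 0.0)
A_rank1 = U @ np.diag(s_trunc) @ Vt
print(np.round(A_rank1, 3))           # every entry close to 1.003
```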
For the moment, let us suppose A is an $n \times n$ square matrix. The following picture sketches the
way in which SVD solves the system Ax = b.

[Figure: b lies outside col(A); its projection $\tilde{b}$ onto col(A) gives the system $Ax = \tilde{b}$, whose set of solutions is an affine set parallel to null(A); the SVD solution to both Ax = b and $Ax = \tilde{b}$ is the member of that set closest to the origin.]

The system $Ax = \tilde{b}$ has an affine set of solutions, given by $x_0 + \mathrm{null}(A)$, where $x_0$ is any
solution. It is easy to describe the null space null(A) given the SVD decomposition, since it is just
the span of the last $n - r$ columns of V. Also note that Ax = b has no solution since b is not in
the column space of A.
SVD solves $Ax = \tilde{b}$ by determining that x which is the closest to the origin, i.e., which has the
minimum norm. It solves Ax = b by projecting b onto the column space of A, obtaining $\tilde{b}$, then
solving $Ax = \tilde{b}$. In other words, SVD obtains the least-squares solution.
So how do we compute a solution to Ax = b for all the cases? Given the diagonal matrix Σ in
the SVD of A, denote by $\Sigma^+$ the diagonal matrix whose diagonal entries are of the form
\[
(\Sigma^+)_{ii} =
\begin{cases}
\frac{1}{\sigma_i} & \text{if } 1 \le i \le r, \\[2pt]
0 & \text{if } r + 1 \le i \le n.
\end{cases}
\]
Thus the product matrix
\[
\Sigma^+\Sigma = \begin{pmatrix} 1 & & & \\ & \ddots & & \\ & & 1 & \\ & & & 0 \end{pmatrix},
\qquad\text{that is,}\qquad
(\Sigma^+\Sigma)_{ii} =
\begin{cases}
1 & \text{if } i \le r, \\
0 & \text{if } i > r.
\end{cases}
\]
Example 3. If
\[
\Sigma = \begin{pmatrix} 2 & 0 \\ 0 & 5 \\ 0 & 0 \end{pmatrix},
\qquad\text{then}\qquad
\Sigma^+ = \begin{pmatrix} \frac{1}{2} & 0 & 0 \\ 0 & \frac{1}{5} & 0 \end{pmatrix}.
\]
Given the SVD $A = U\Sigma V^T$, the pseudoinverse of A is defined as $A^+ = V\Sigma^+U^T$. Its action on the basis vectors $u_i$ is
\[
A^+u_i =
\begin{cases}
\frac{1}{\sigma_i} v_i & \text{if } i \le r, \\
0 & \text{if } i > r.
\end{cases}
\]
Namely, the vectors $u_1, \ldots, u_r$ in the column space of A go back to the row space. The other
vectors $u_{r+1}, \ldots, u_m$ are in the null space of $A^T$, and $A^+$ sends them to zero. When we know what
happens to each basis vector $u_i$, we know $A^+$ because the $v_i$ and $\sigma_i$ will be determined.
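A quick sketch of forming $A^+$ from the SVD factors and comparing against numpy's built-in pseudoinverse (the rectangular test matrix is arbitrary):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [0.0, 1.0]])            # arbitrary 3x2 matrix of rank 2

U, s, Vt = np.linalg.svd(A)
# Build Sigma^+ (n x m): reciprocals of the nonzero singular values on the diagonal.
tol = 1e-12 * s[0]
s_plus = np.where(s > tol, 1.0 / s, 0.0)
Sigma_plus = np.zeros((A.shape[1], A.shape[0]))
Sigma_plus[:len(s), :len(s)] = np.diag(s_plus)

A_plus = Vt.T @ Sigma_plus @ U.T      # A^+ = V Sigma^+ U^T
print(np.allclose(A_plus, np.linalg.pinv(A)))   # True
```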
Example 4. The pseudoinverse of
\[
A = \begin{pmatrix} 2 & 2 \\ -1 & 1 \end{pmatrix}
\]
is $A^+ = A^{-1}$, since A is invertible. And we have
\[
A^+ = V\Sigma^+U^T = \begin{pmatrix} \frac{1}{4} & -\frac{1}{2} \\[2pt] \frac{1}{4} & \frac{1}{2} \end{pmatrix}.
\]
In general there will not be any zero singular values. However, if A has column degeneracies
there may be near-zero singular values. It may be useful to zero these out, to remove noise. (The
implication is that the overdetermined set is not quite as overdetermined as it seemed, that is, the
null space is not trivial.)
SVD Algorithm
Here we only take a brief look at how the SVD algorithm actually works. For more details we
refer to [4]. It is useful to establish a contrast with Gaussian elimination. Recall that Gaussian
elimination reduces a matrix A by a series of row operations that zero out portions of columns of A.
Row operations imply pre-multiplying the matrix A. They are all collected together in the matrix
$L^{-1}$, where $A = LDU$.
In contrast, SVD zeros out portions of both rows and columns. Thus, whereas Gaussian elimination only reduces A using pre-multiplication, SVD uses both pre- and post-multiplication. As a
result, SVD can at each stage rely on orthogonal matrices to perform its reductions on A. By using
orthogonal matrices, SVD reduces the risk of magnifying noise and errors. The pre-multiplication
matrices are gathered together in the matrix $U^T$, while the post-multiplication matrices are gathered together in the matrix V.
There are two phases to the SVD decomposition algorithm:
(i) SVD reduces A to bidiagonal form using a series of orthogonal transformations. This phase
is deterministic and has a running time that depends only on the size of the matrix A.
(ii) SVD removes the superdiagonal elements from the bidiagonal matrix using orthogonal transformations. This phase is iterative but converges quickly.
Let us take a slightly closer look at the first phase. Step 1 in this phase creates two orthogonal
matrices $U_1$ and $V_1$ such that
\[
U_1A = \begin{pmatrix}
a_{11} & * & \cdots & * \\
0 & & & \\
\vdots & & B' & \\
0 & & &
\end{pmatrix}
\qquad\text{and}\qquad
U_1AV_1 = \begin{pmatrix}
a_{11} & a_{12} & 0 & \cdots & 0 \\
0 & & & & \\
\vdots & & B & & \\
0 & & & &
\end{pmatrix},
\]
where B is the remaining $(m-1) \times (n-1)$ block; the same step is then applied to B, and so on, until the matrix is bidiagonal.
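One standard way to realize phase (i) is with Householder reflections (the Golub–Kahan bidiagonalization). Below is a minimal numpy sketch; the function names, the random test matrix, and the overall structure are illustrative rather than taken from the notes, and phase (ii) (iteratively chasing away the superdiagonal) is omitted.

```python
import numpy as np

def householder(x):
    """Return a unit vector v such that (I - 2 v v^T) x is a multiple of e_1."""
    v = np.array(x, dtype=float)
    norm_x = np.linalg.norm(v)
    if norm_x == 0.0:
        return None                              # nothing to eliminate
    v[0] += norm_x if v[0] >= 0 else -norm_x     # sign chosen to avoid cancellation
    return v / np.linalg.norm(v)

def bidiagonalize(A):
    """Phase (i) sketch: reduce A (m x n, m >= n) to upper bidiagonal B with
    orthogonal U, V such that A = U @ B @ V.T."""
    B = np.array(A, dtype=float)
    m, n = B.shape
    U = np.eye(m)
    V = np.eye(n)
    for k in range(n):
        # Left reflection: zero out column k below the diagonal.
        v = householder(B[k:, k])
        if v is not None:
            B[k:, :] -= 2.0 * np.outer(v, v @ B[k:, :])
            U[:, k:] -= 2.0 * np.outer(U[:, k:] @ v, v)
        # Right reflection: zero out row k beyond the superdiagonal.
        if k < n - 2:
            v = householder(B[k, k+1:])
            if v is not None:
                B[:, k+1:] -= 2.0 * np.outer(B[:, k+1:] @ v, v)
                V[:, k+1:] -= 2.0 * np.outer(V[:, k+1:] @ v, v)
    return U, B, V

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
U1, B, V1 = bidiagonalize(A)
print(np.round(B, 10))                        # upper bidiagonal
print(np.allclose(U1 @ B @ V1.T, A))          # True: orthogonal reductions preserve A
print(np.allclose(np.linalg.svd(B, compute_uv=False),
                  np.linalg.svd(A, compute_uv=False)))   # True: same singular values
```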
References
[1] M. Erdmann. Lecture notes for 16-811 Mathematical Fundamentals for Robotics. The Robotics
Institute, Carnegie Mellon University, 1998.
[2] G. Strang. Introduction to Linear Algebra. Wellesley-Cambridge Press, 1993.
[3] W. H. Press, et al. Numerical Recipes in C++: The Art of Scientific Computing. Cambridge
University Press, 2nd edition, 2002.
[4] G. E. Forsythe, M. A. Malcolm, and C. B. Moler. Computer Methods for Mathematical Computations. Prentice-Hall, 1977.