Chap03 Fall15 Final
Outline
Orthogonality, continued
Most Important Picture
m = 200;                                  % number of sample points (assumed value)
t = 3*(rand(m,1)-0.5);                    % random points in [-1.5, 1.5]
b = t.^3 - t; b = b + 0.2*rand(m,1);      %% Expect: x =~ [ 0 -1 0 1 ]
A = [t.^0 t t.^2 t.^3];                   % cubic model: columns 1, t, t^2, t^3
plot(t,b,'ro'), pause
x = (A'*A) \ (A'*b)                       % solve the normal equations
plot(t,b,'ro',t,A*x,'bo',t,b-A*x,'kx'), pause   % data, fit, and residual
plot(t,A*x,'bo'), pause
tt = linspace(min(t),max(t),200)'; s = [tt.^0 tt tt.^2 tt.^3]*x;  % smooth curve for plotting
plot(t,b,'ro',tt,s,'b-')
title('Least Squares Model Fitting to Cubic')
xlabel('Independent Variable, t')
ylabel('Dependent Variable b_i and y(t)')
Note on the text examples
❑ The notation in the examples is a bit different from the rest of the
derivation… so be sure to pay attention.
Data Fitting
Polynomial fitting
f(t, x) = x1 + x2 t + x3 t^2 + · · · + xn t^(n-1)
Exponential fitting (nonlinear in the parameters x):
f(t, x) = x1 e^(x2 t) + · · · + x_(n-1) e^(xn t)
Example, continued
For data
t   -1.0   -0.5    0.0    0.5    1.0
y    1.0    0.5    0.0    0.5    2.0

overdetermined 5 × 3 linear system is

         [ 1  -1.0   1.0  ]            [ 1.0 ]
         [ 1  -0.5   0.25 ]  [ x1 ]    [ 0.5 ]
    Ax = [ 1   0.0   0.0  ]  [ x2 ] ≅  [ 0.0 ] = b
         [ 1   0.5   0.25 ]  [ x3 ]    [ 0.5 ]
         [ 1   1.0   1.0  ]            [ 2.0 ]
Solution, which we will see later how to compute, is
x = [ 0.086   0.40   1.4 ]^T
so approximating polynomial is
p(t) = 0.086 + 0.4 t + 1.4 t^2
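A quick numerical check of this example, as a minimal MATLAB sketch (variable names are illustrative; backslash solves the least squares problem directly):

t = [-1.0 -0.5 0.0 0.5 1.0]';
y = [ 1.0  0.5 0.0 0.5 2.0]';
A = [ones(size(t)) t t.^2];     % 5 x 3 Vandermonde-type matrix
x = A \ y                       % least squares solution
% x =~ [0.086; 0.40; 1.4], i.e. p(t) = 0.086 + 0.4 t + 1.4 t^2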
Example, continued
Resulting curve and original data points are shown in graph
Normal Equations
Setting the gradient of ‖b − Ax‖₂² with respect to x to zero gives

    2 A^T A x − 2 A^T b = 0

which reduces to the n × n system of normal equations

    A^T A x = A^T b
Orthogonality
Vectors v1 and v2 are orthogonal if their inner product is zero, v1^T v2 = 0

Space spanned by columns of m × n matrix A, span(A) = {Ax : x ∈ R^n}, is of dimension at most n

If m > n, b generally does not lie in span(A), so there is no exact solution to Ax = b

Vector y = Ax in span(A) closest to b in 2-norm occurs when residual r = b − Ax is orthogonal to span(A),

    0 = A^T r = A^T (b − Ax)

    A^T A x = A^T b
Orthogonality, continued
Orthogonal Projectors
Matrix P is orthogonal projector if it is idempotent (P² = P) and symmetric (P^T = P)

Orthogonal projector onto orthogonal complement span(P)^⊥ is given by P_⊥ = I − P

For any vector v,

    v = (P + (I − P)) v = P v + P_⊥ v

For least squares, the orthogonal projector onto span(A) is

    P = A (A^T A)^(-1) A^T

so that

    b = P b + P_⊥ b = Ax + (b − Ax) = y + r
Seek the scalar α such that αa1 ∈ span{a1} is closest to b:

    y = α a1
    r = b − α a1

[Figure: 1D Projection — vectors b, a1, and the projection α a1]
• We see that y points in the direction of a1 and has a magnitude that scales with b (but not with a1).
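A minimal MATLAB sketch of this 1D projection (the particular vectors are arbitrary illustrations):

a1 = [2; 1];  b = [1; 2];
alpha = (a1'*b) / (a1'*a1);   % projection coefficient
y = alpha*a1;                 % component of b in span{a1}
r = b - y;                    % residual
a1'*r                         % =~ 0: r is orthogonal to a1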
Projection in Higher Dimensions
    x^T A^T b = x^T c = x1 c1 + x2 c2 + · · · + xn cn

    x^T A^T A x = x^T H x = ∑_{j=1}^{n} ∑_{k=1}^{n} x_k H_kj x_j
• y has units and length that scale with b, but it lies in the range of A.
• It is the projection of b onto R(A).
Pseudoinverse of A (defined when rank(A) = n):

    A⁺ = (A^T A)^(-1) A^T

Angle θ between b and its projection y = Ax:

    cos(θ) = ‖y‖₂ / ‖b‖₂ = ‖Ax‖₂ / ‖b‖₂

Sensitivity to perturbation Δb in b:

    ‖Δx‖₂ / ‖x‖₂  ≤  cond(A) · (1/cos(θ)) · ‖Δb‖₂ / ‖b‖₂

Sensitivity to perturbation E in A:

    ‖Δx‖₂ / ‖x‖₂  ≲  ( [cond(A)]² tan(θ) + cond(A) ) · ‖E‖₂ / ‖A‖₂
A^T A = L L^T
Example, continued
Cholesky factorization of symmetric positive definite matrix A^T A gives

    A^T A = [ 5.0    0.0    2.5   ]
            [ 0.0    2.5    0.0   ]
            [ 2.5    0.0    2.125 ]

          = [ 2.236   0       0     ] [ 2.236   0       1.118 ]
            [ 0       1.581   0     ] [ 0       1.581   0     ]  = L L^T
            [ 1.118   0       0.935 ] [ 0       0       0.935 ]
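Continuing the same example, a sketch of the normal-equations route in MATLAB; note that chol returns the upper triangular factor, so A^T A = R^T R with L = R^T:

t = [-1.0 -0.5 0.0 0.5 1.0]'; y = [1.0 0.5 0.0 0.5 2.0]';
A = [ones(size(t)) t t.^2];
M = A'*A;                 % 3 x 3, symmetric positive definite
R = chol(M);              % upper triangular, M = R'*R, i.e. L = R'
x = R \ (R' \ (A'*y))     % forward solve with L = R', back solve with R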
    y = A x ≈ b,        r := b − A x ⊥ R(A)

    ‖r‖₂ = ‖b − y‖₂ ≤ ‖b − v‖₂    ∀ v ∈ R(A).
    A^T A x = A^T b = [ a1^T b ]
                      [ a2^T b ]
                      [   ⋮    ]
                      [ an^T b ]

    A^T A = [ a1^T a1   a1^T a2   · · ·   a1^T an ]
            [ a2^T a1   a2^T a2   · · ·   a2^T an ]
            [    ⋮         ⋮                 ⋮    ]
            [ an^T a1   an^T a2   · · ·   an^T an ]
Orthogonal Bases
In this case,

    A^T A = I,

the n × n identity matrix.
QR Factorization
    Reduced QR:  Q1 R = A          Full QR:  Q [ R ] = A
                                               [ O ]

• Note that

    A = Q [ R ] = [ Q1  Q2 ] [ R ] = Q1 R.
          [ O ]              [ O ]
span{ a1 } = span{ q1 }
span{ a1 , a2 } = span{ q1 , q2 }
span{ a1 , a2 , a3 } = span{ q1 , q2 , q3 }
span{ a1 , a2 , . . . , an } = span{ q1 , q2 , . . . , qn }
QR Factorization: Gram-Schmidt
span{ a1 } = span{ q1 }
span{ a1 , a2 } = span{ q1 , q2 }
span{ a1 , a2 , a3 } = span{ q1 , q2 , q3 }
span{ a1 , a2 , . . . , an } = span{ q1 , q2 , . . . , qn }
a1 = q1 r11
a2 = q1 r12 + q2 r22
i.e., A = QR
(For now, we drop the distinction between Q and Q1 , and focus only on
the reduced QR problem.)
Gram-Schmidt Orthogonalization
for j = 2, . . . , n

    v_j = a_j − P_{j−1} a_j = (I − P_{j−1}) a_j = P_{⊥,j−1} a_j

    q_j = v_j / ‖v_j‖ = P_{⊥,j−1} a_j / ‖P_{⊥,j−1} a_j‖

end
    P_2 a_3 = Q_2 Q_2^T a_3
            = (q_1^T a_3 / q_1^T q_1) q_1 + (q_2^T a_3 / q_2^T q_2) q_2
            = q_1 q_1^T a_3 + q_2 q_2^T a_3
for j = 2, . . . , n

    v_j = a_j − P_{j−1} a_j = (I − P_{j−1}) a_j = P_{⊥,j−1} a_j

    q_j = v_j / ‖v_j‖ = P_{⊥,j−1} a_j / ‖P_{⊥,j−1} a_j‖

end
Classical Gram-Schmidt (CGS) projection step:

    v_j = a_j
    for k = 1, . . . , j − 1,
        v_j = v_j − q_k ( q_k^T a_j )
    end

Modified Gram-Schmidt (MGS) projection step:

    v_j = a_j
    for k = 1, . . . , j − 1,
        v_j = v_j − q_k ( q_k^T v_j )
    end
Mathematical Difference Between CGS and MGS
Gram-Schmidt Orthogonalization
Given vectors a1 and a2 , we seek orthonormal vectors q1
and q2 having same span
This can be accomplished by subtracting from second
vector its projection onto first vector and normalizing both
resulting vectors, as shown in diagram
Gram-Schmidt Orthogonalization
Process can be extended to any number of vectors
a1 , . . . , ak , orthogonalizing each successive vector against
all preceding ones, giving classical Gram-Schmidt
procedure
for k = 1 to n
    qk = ak
    for j = 1 to k − 1
        rjk = qj^T ak
        qk = qk − rjk qj
    end
    rkk = ‖qk‖₂
    qk = qk / rkk
end
Resulting qk and rjk form reduced QR factorization of A
Modified Gram-Schmidt
for k = 1 to n
    rkk = ‖ak‖₂
    qk = ak / rkk
    for j = k + 1 to n
        rkj = qk^T aj
        aj = aj − rkj qk
    end
end
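The classical and modified procedures are mathematically equivalent but behave differently in finite precision. A hedged MATLAB sketch comparing them on an ill-conditioned test matrix; the sizes and the gallery('randsvd') generator are just one convenient choice, and any ill-conditioned A shows the same effect:

m = 50; n = 20;
A = gallery('randsvd', [m n], 1e8);        % test matrix with condition number ~1e8
Qc = zeros(m,n); Qm = zeros(m,n);
for j = 1:n
    vc = A(:,j); vm = A(:,j);
    for k = 1:j-1
        vc = vc - Qc(:,k)*(Qc(:,k)'*A(:,j));   % CGS: project out the original a_j
        vm = vm - Qm(:,k)*(Qm(:,k)'*vm);       % MGS: project out the running v_j
    end
    Qc(:,j) = vc/norm(vc); Qm(:,j) = vm/norm(vm);
end
norm(Qc'*Qc - eye(n))   % loss of orthogonality: noticeably larger for CGS
norm(Qm'*Qm - eye(n))   % typically much smaller for MGS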
Orthogonal Transformations
We seek alternative method that avoids numerical
difficulties of normal equations
We need numerically robust transformation that produces
easier problem without changing solution
What kind of transformation leaves least squares solution
unchanged?
Square matrix Q is orthogonal if Q^T Q = I
Multiplication of vector by orthogonal matrix preserves
Euclidean norm
Residual is
    ‖r‖₂² = ‖b1 − R x‖₂² + ‖b2‖₂²

We have no control over second term, ‖b2‖₂², but first term becomes zero if x satisfies n × n triangular system

    R x = b1
QR Factorization
Given m × n matrix A, with m > n, we seek m × m orthogonal matrix Q such that

    A = Q [ R ]
          [ O ]

where R is n × n and upper triangular

Linear least squares problem Ax ≅ b is then transformed into triangular least squares problem

    Q^T A x = [ R ] x  ≅  [ c1 ]  = Q^T b
              [ O ]       [ c2 ]

which has same solution, since

    ‖r‖₂² = ‖b − A x‖₂² = ‖b − Q [ R ; O ] x‖₂² = ‖Q^T b − [ R ; O ] x‖₂²
Orthogonal Bases
If we partition m × m orthogonal matrix Q = [ Q1  Q2 ], where Q1 is m × n, then

    A = Q [ R ] = [ Q1  Q2 ] [ R ] = Q1 R
          [ O ]              [ O ]

    Q1^T A x = R x = c1 = Q1^T b

• Start with Ax ≈ b

    Q [ R ] x ≈ b
      [ O ]

    Q^T Q [ R ] x = [ R ] x ≈ Q^T b = [ Q1  Q2 ]^T b = [ c1 ]
          [ O ]     [ O ]                              [ c2 ]
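In MATLAB this corresponds to the economy-size QR factorization; a minimal sketch for the same 5 × 3 example:

t = [-1.0 -0.5 0.0 0.5 1.0]'; b = [1.0 0.5 0.0 0.5 2.0]';
A = [ones(size(t)) t t.^2];
[Q1,R] = qr(A,0);        % reduced QR: Q1 is 5 x 3, R is 3 x 3
c1 = Q1'*b;
x = R \ c1               % solves R x = c1; same x as the normal equations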
Computing QR Factorization
Householder Transformations
Householder transformation has form
    H = I − 2 (v v^T) / (v^T v)
for nonzero vector v
H is orthogonal and symmetric: H = H^T = H^(-1)
    v^T a = a^T a − α a1 ,      v^T v = a^T a − 2 α a1 + α²

    H a = a − 2 (a^T a − α a1) / (a^T a − 2 α a1 + α²) · (a − α e1)

        = a − 2 (‖a‖² ± ‖a‖ a1) / (2‖a‖² ± 2‖a‖ a1) · (a − α e1)

        = a − (a − α e1) = α e1 .

Choose   α = −sign(a1) ‖a‖ = −(a1/|a1|) ‖a‖.
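A small sketch of forming and applying one Householder reflection to a vector (the numbers are illustrative; the sign choice for α avoids cancellation):

a = [2; 1; 2];
alpha = -sign(a(1))*norm(a);    % alpha = -sign(a1)*||a|| = -3
v = a - alpha*eye(3,1);         % Householder vector v = a - alpha*e1
Ha = a - 2*(v'*a)/(v'*v)*v      % = alpha*e1 = [-3; 0; 0]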
Householder QR Factorization
To compute QR factorization of A, use Householder
transformations to annihilate subdiagonal entries of each
successive column
Each Householder transformation is applied to entire
matrix, but does not affect prior columns, so zeros are
preserved
In applying Householder transformation H to arbitrary
vector u,
    H u = ( I − 2 (v v^T)/(v^T v) ) u = u − 2 ( (v^T u)/(v^T v) ) v
Example, continued
Applying resulting Householder transformation H1 yields
transformed matrix and right-hand side
    H1 A = [ -2.236    0      -1.118 ]           [ -1.789 ]
           [   0     -0.191   -0.405 ]           [ -0.362 ]
           [   0      0.309   -0.655 ]   H1 b =  [ -0.862 ]
           [   0      0.809   -0.405 ]           [ -0.362 ]
           [   0      1.309    0.345 ]           [  1.138 ]
Example, continued
Applying resulting Householder transformation H2 yields
    H2 H1 A = [ -2.236   0      -1.118 ]              [ -1.789 ]
              [   0      1.581    0    ]              [  0.632 ]
              [   0      0      -0.725 ]   H2 H1 b =  [ -1.035 ]
              [   0      0      -0.589 ]              [ -0.816 ]
              [   0      0       0.047 ]              [  0.404 ]
Example, continued
    H1 A = [ x  x  x ]           H1 b → b(1) = [ x ]
           [    x  x ]                         [ x ]
           [    x  x ]                         [ x ]
           [    x  x ]                         [ x ]

    H2 H1 A = [ x  x  x ]        H2 b(1) → b(2) = [ x ]
              [    x  x ]                         [ x ]
              [       x ]                         [ x ]
              [       x ]                         [ x ]

    H3 H2 H1 A = [ x  x  x ]     H3 b(2) → b(3) = [ c1 ]
                 [    x  x ]                      [ c2 ]
                 [       x ]
                 [         ]

where blanks denote zeros introduced by the successive Householder transformations.
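Putting the pieces together, a compact MATLAB sketch of Householder QR applied to the 5 × 3 example, transforming A and b together (an illustrative sketch without column pivoting, not a production implementation):

t = [-1.0 -0.5 0.0 0.5 1.0]'; b = [1.0 0.5 0.0 0.5 2.0]';
A = [ones(size(t)) t t.^2];  [m,n] = size(A);
for k = 1:n
    a = A(k:m,k);
    s = sign(a(1)); if s == 0, s = 1; end
    alpha = -s*norm(a);               % sign choice avoids cancellation
    v = a - alpha*eye(m-k+1,1);       % Householder vector v = a - alpha*e1
    beta = v'*v;
    if beta > 0
        A(k:m,:) = A(k:m,:) - (2/beta)*v*(v'*A(k:m,:));   % apply H_k to A
        b(k:m)   = b(k:m)   - (2/beta)*v*(v'*b(k:m));     % apply H_k to b
    end
end
% now A(1:n,1:n) is R and b(1:n) is c1
x = A(1:n,1:n) \ b(1:n)               % expect x =~ [0.086; 0.40; 1.4]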
Givens Rotations
Givens rotations introduce zeros one at a time
Given vector [ a1  a2 ]^T, choose scalars c and s so that

    [  c   s ] [ a1 ]   [ α ]
    [ -s   c ] [ a2 ] = [ 0 ]

with c² + s² = 1, or equivalently, α = √(a1² + a2²)

Finally, c² + s² = 1, or α = √(a1² + a2²), implies

    c = a1 / √(a1² + a2²)      and      s = a2 / √(a1² + a2²)
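A minimal sketch of one Givens rotation (a careful implementation would guard against overflow when forming √(a1² + a2²)):

a = [4; 3];
c = a(1)/norm(a);  s = a(2)/norm(a);
G = [c s; -s c];       % plane rotation
G*a                    % = [5; 0], i.e. [alpha; 0] with alpha = norm(a)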
Givens QR Factorization
More generally, to annihilate selected component of vector
in n dimensions, rotate target component with another
component
    [ 1   0   0   0   0 ] [ a1 ]   [ a1 ]
    [ 0   c   0   s   0 ] [ a2 ]   [ α  ]
    [ 0   0   1   0   0 ] [ a3 ] = [ a3 ]
    [ 0  -s   0   c   0 ] [ a4 ]   [ 0  ]
    [ 0   0   0   0   1 ] [ a5 ]   [ a5 ]
By systematically annihilating successive entries, we can
reduce matrix to upper triangular form using sequence of
Givens rotations
Each rotation is orthogonal, so their product is orthogonal,
producing QR factorization
Rank Deficiency
Singular Value Decomposition: A = U Σ V^T

• A = U Σ V^T is m × n.
• U is m × m, orthogonal.
• Σ is m × n, diagonal, with σ_i ≥ 0.
• V is n × n, orthogonal.
Example: SVD
SVD of

    A = [  1   2   3 ]
        [  4   5   6 ]
        [  7   8   9 ]
        [ 10  11  12 ]

is given by U Σ V^T =

    [ .141   .825  -.420  -.351 ] [ 25.5   0     0 ]
    [ .344   .426   .298   .782 ] [  0    1.29   0 ] [  .504   .574   .644 ]
    [ .547   .0278  .664  -.509 ] [  0     0     0 ] [ -.761  -.057   .646 ]
    [ .750  -.371  -.542   .0790] [  0     0     0 ] [  .408  -.816   .408 ]
Applications of SVD
Euclidean condition number of matrix A:  cond(A) = σ_max / σ_min
SVD for Linear Least Squares Problem:  A = U Σ V^T

    A x ≈ b
    U Σ V^T x ≈ b
    U^T U Σ V^T x ≈ U^T b
    Σ V^T x ≈ U^T b

This is the analogue of the triangular system obtained from QR,

    [ R̃ ] x = [ c1 ] ,        R̃ x = c1 ,
    [ O ]     [ c2 ]

and the least squares solution is

    x = ∑_{j=1}^{n} (1/σ_j) v_j (c1)_j = ∑_{j=1}^{n} (1/σ_j) v_j (u_j^T b)

Keeping only the k nonzero (or non-negligible) singular values,

    x = ∑_{j=1}^{k} (1/σ_j) v_j (u_j^T b)
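A sketch of the SVD route for the same 5 × 3 example (svd(A,0) gives the economy-size factorization assumed here; all three singular values are nonzero):

t = [-1.0 -0.5 0.0 0.5 1.0]'; b = [1.0 0.5 0.0 0.5 2.0]';
A = [ones(size(t)) t t.^2];
[U,S,V] = svd(A,0);              % economy-size SVD: U is 5 x 3
sigma = diag(S);
x = V * ((U'*b) ./ sigma)        % x = sum_j v_j (u_j'*b)/sigma_j, expect x =~ [0.086; 0.40; 1.4]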
Pseudoinverse
Orthogonal Bases
A can be written as the outer-product expansion A = σ1 E1 + σ2 E2 + · · · + σn En, with Ei = ui vi^T
Ei has rank 1 and can be stored using only m + n storage
locations
Product Ei x can be computed using only m + n
multiplications
Condensed approximation to A is obtained by omitting
from summation terms corresponding to small singular
values
Approximation using k largest singular values is closest
matrix of rank k to A
Approximation is useful in image processing, data
compression, information retrieval, cryptography, etc.
Low Rank Approximation to A = U Σ V^T
• Because of the diagonal form of Σ, we have

    A = U Σ V^T = ∑_{j=1}^{n} σ_j u_j v_j^T

❑ Full image storage cost scales as O(mn)
❑ Compressed image storage scales as O(km) + O(kn), with k < m or n.
If σ1 ≥ σ2 ≥ · · · ≥ σn,

    x ≈ ∑_{j=1}^{k} (1/σ_j) v_j (u_j^T b)
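A sketch of rank-k approximation and its storage saving (the sizes m, n, k and the random test matrix are illustrative):

m = 200; n = 100; k = 10;
A = rand(m,n);
[U,S,V] = svd(A,0);
Ak = U(:,1:k)*S(1:k,1:k)*V(:,1:k)';     % best rank-k approximation in the 2-norm
storage_full = m*n                       % O(mn)
storage_k    = k*(m+n)                   % O(km + kn)
norm(A - Ak) - S(k+1,k+1)                % =~ 0: approximation error equals sigma_{k+1}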
Comparison of Methods
Forming normal equations matrix A^T A requires about n²m/2 multiplications, and solving resulting symmetric linear system requires about n³/6 multiplications

Solving least squares problem using Householder QR factorization requires about mn² − n³/3 multiplications

If m ≈ n, both methods require about same amount of work

If m ≫ n, Householder QR requires about twice as much work as normal equations

Cost of SVD is proportional to mn² + n³, with proportionality constant ranging from 4 to 10, depending on algorithm used