
Least Squares Data Fitting

Existence, Uniqueness, and Conditioning


Solving Linear Least Squares Problems

Outline

1 Least Squares Data Fitting

2 Existence, Uniqueness, and Conditioning

3 Solving Linear Least Squares Problems

Michael T. Heath, Scientific Computing

Method of Least Squares

Measurement errors are inevitable in observational and experimental sciences

Errors can be smoothed out by averaging over many cases, i.e., taking more measurements than are strictly necessary to determine parameters of system

Resulting system is overdetermined, so usually there is no exact solution

In effect, higher dimensional data are projected into lower dimensional space to suppress irrelevant detail

Such projection is most conveniently accomplished by method of least squares


Linear Least Squares

For linear problems, we obtain overdetermined linear system Ax = b, with m × n matrix A, m > n

System is better written Ax ≈ b, since equality is usually not exactly satisfiable when m > n

Least squares solution x minimizes squared Euclidean norm of residual vector r = b − Ax,

    min_x ||r||_2^2 = min_x ||b − Ax||_2^2



Least Squares Idea

This system is overdetermined.

There are more equations than unknowns.


Least Squares Idea

❑ With m > n, we have:


❑ Lots of data ( b )
❑ A few parameters ( x )

Orthogonality, continued
Most Important Picture

Geometric relationships among b, r, and span(A) are shown in diagram



❑ The vector y is the orthogonal projection of b onto span(A).

❑ The projection results in minimization of || r ||_2 , which, as we shall see, is equivalent to having r := b − Ax ⊥ span(A)
Example

❑ Suppose we have observational data, { bi } at some independent times { ti } (red circles).

❑ The ti s do not need to be sorted and can in fact be repeated.

❑ We wish to fit a smooth model (blue curve) to the data so we can compactly describe (and perhaps integrate or differentiate) the functional relationship between b(t) and t.
Example
Matlab Example
% Linear Least Squares Demo

degree=3; m=20; n=degree+1;

t=3*(rand(m,1)-0.5);
b = t.^3 - t; b=b+0.2*rand(m,1); %% Expect: x =~ [ 0 -1 0 1 ]

plot(t,b,'ro'), pause

%%% DEFINE a_ij = phi_j(t_i)

A=zeros(m,n); for j=1:n; A(:,j) = t.^(j-1); end;

A0=A; b0=b; % Save A & b.

%%%% SOLVE LEAST SQUARES PROBLEM via Normal Equations &&&&

x = (A'*A) \ A'*b

plot(t,b0,'ro',t,A0*x,'bo',t,1*(b0-A0*x),'kx'), pause
plot(t,A0*x,'bo'), pause

%% CONSTRUCT SMOOTH APPROXIMATION

tt=(0:100)'/100; tt=min(t) + (max(t)-min(t))*tt;


S=zeros(101,n); for k=1:n; S(:,k) = tt.^(k-1); end;
s=S*x;

plot(t,b0,'ro',tt,s,'b-')
title('Least Squares Model Fitting to Cubic')
xlabel('Independent Variable, t')
ylabel('Dependent Variable b_i and y(t)')
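As a quick check on the demo (a sketch added here, not part of the original script), the same problem can be solved with Matlab's backslash operator, which uses a QR factorization for rectangular systems; the two solutions should agree to about the accuracy permitted by cond(A):

x_ne = (A0'*A0) \ (A0'*b0);   % normal equations solution, as in the demo
x_qr = A0 \ b0;               % backslash: QR-based least squares solve
disp(norm(x_ne - x_qr))       % expect a small difference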
Note on the text examples

❑ Note, the text uses similar examples.

❑ The notation in the examples is a bit different from the rest of the
derivation… so be sure to pay attention.

Data Fitting

Given m data points (ti , yi ), find n-vector x of parameters that gives “best fit” to model function f (t, x),

    min_x  sum_{i=1}^{m} ( yi − f (ti , x) )^2

Problem is linear if function f is linear in components of x,

    f (t, x) = x1 φ1(t) + x2 φ2(t) + · · · + xn φn(t)

where functions φj depend only on t

Problem can be written in matrix form as Ax ≈ b, with aij = φj(ti) and bi = yi


Data Fitting

Polynomial fitting

    f (t, x) = x1 + x2 t + x3 t^2 + · · · + xn t^(n−1)

is linear, since polynomial is linear in coefficients, though nonlinear in independent variable t

Fitting sum of exponentials

    f (t, x) = x1 e^(x2 t) + · · · + x(n−1) e^(xn t)

is example of nonlinear problem

For now, we will consider only linear least squares problems


Example: Data Fitting

Fitting quadratic polynomial to five data points gives linear least squares problem

         [ 1  t1  t1^2 ]          [ y1 ]
         [ 1  t2  t2^2 ] [ x1 ]   [ y2 ]
    Ax = [ 1  t3  t3^2 ] [ x2 ] ≈ [ y3 ] = b
         [ 1  t4  t4^2 ] [ x3 ]   [ y4 ]
         [ 1  t5  t5^2 ]          [ y5 ]

Matrix whose columns (or rows) are successive powers of independent variable is called Vandermonde matrix


Example, continued
For data

    t   −1.0   −0.5    0.0    0.5    1.0
    y    1.0    0.5    0.0    0.5    2.0

overdetermined 5 × 3 linear system is

         [ 1  −1.0  1.0  ]          [ 1.0 ]
         [ 1  −0.5  0.25 ] [ x1 ]   [ 0.5 ]
    Ax = [ 1   0.0  0.0  ] [ x2 ] ≈ [ 0.0 ] = b
         [ 1   0.5  0.25 ] [ x3 ]   [ 0.5 ]
         [ 1   1.0  1.0  ]          [ 2.0 ]

Solution, which we will see later how to compute, is

    x = [ 0.086  0.40  1.4 ]^T

so approximating polynomial is

    p(t) = 0.086 + 0.4 t + 1.4 t^2

Example, continued
Resulting curve and original data points are shown in graph

< interactive example >


Existence and Uniqueness

Linear least squares problem Ax ≈ b always has solution

Solution is unique if, and only if, columns of A are linearly independent, i.e., rank(A) = n, where A is m × n

If rank(A) < n, then A is rank-deficient, and solution of linear least squares problem is not unique

For now, we assume A has full column rank n


Normal Equations

To minimize squared Euclidean norm of residual vector

    ||r||_2^2 = r^T r = (b − Ax)^T (b − Ax)
              = b^T b − 2 x^T A^T b + x^T A^T A x

take derivative with respect to x and set it to 0,

    2 A^T A x − 2 A^T b = 0

which reduces to n × n linear system of normal equations

    A^T A x = A^T b


Orthogonality
Vectors v1 and v2 are orthogonal if their inner product is zero, v1^T v2 = 0

Space spanned by columns of m × n matrix A, span(A) = {Ax : x ∈ R^n}, is of dimension at most n

If m > n, b generally does not lie in span(A), so there is no exact solution to Ax = b

Vector y = Ax in span(A) closest to b in 2-norm occurs when residual r = b − Ax is orthogonal to span(A),

    0 = A^T r = A^T (b − Ax)

again giving system of normal equations

    A^T A x = A^T b

Orthogonality, continued

Geometric relationships among b, r, and span(A) are shown in diagram


Orthogonal Projectors
Matrix P is orthogonal projector if it is idempotent (P^2 = P) and symmetric (P^T = P)

Orthogonal projector onto orthogonal complement span(P)^⊥ is given by P_⊥ = I − P

For any vector v,

    v = (P + (I − P)) v = P v + P_⊥ v

For least squares problem Ax ≈ b, if rank(A) = n, then

    P = A (A^T A)^(-1) A^T

is orthogonal projector onto span(A), and

    b = P b + P_⊥ b = Ax + (b − Ax) = y + r
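These properties are easy to confirm numerically; the sketch below (a random test matrix, not from the slides) builds the projector and checks idempotence, symmetry, and the orthogonality of the residual component:

m = 8; n = 3;
A = randn(m,n);                 % random matrix, full column rank with probability 1
P = A*((A'*A)\A');              % orthogonal projector onto span(A)
Pperp = eye(m) - P;             % projector onto the orthogonal complement
b = randn(m,1);
disp(norm(P*P - P))             % idempotent: P^2 = P (up to roundoff)
disp(norm(P - P'))              % symmetric: P^T = P
disp(norm(A'*(Pperp*b)))        % residual component is orthogonal to span(A)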



1D Projection

• Consider the 1D subspace of R^2 spanned by a1: α a1 ∈ span{a1}.

• The projection of a point b ∈ R^2 onto span{a1} is the point on the line y = α a1 that is closest to b.

• To find the projection, we look for the value α that minimizes ||r|| = ||α a1 − b|| in the 2-norm. (Other norms are also possible.)

[Figure: b, a1, the projection y = α a1, and the residual r = b − α a1]
1D Projection

• Minimizing the square of the residual with respect to α, we have

    d/dα ||r||^2 = d/dα (b − α a1)^T (b − α a1)
                 = d/dα [ b^T b + α^2 a1^T a1 − 2 α a1^T b ]
                 = 2 α a1^T a1 − 2 a1^T b = 0

• For this to be a minimum, we require the last expression to be zero, which implies

    α = (a1^T b) / (a1^T a1),   ⟹   y = α a1 = [ (a1^T b) / (a1^T a1) ] a1.

• We see that y points in the direction of a1 and has magnitude that scales as b (but not with a1).
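A minimal Matlab sketch of the 1D formula (the example vectors are assumed, not from the slides); it computes α and checks that the residual is orthogonal to a1:

a1 = [3; 1];  b = [2; 2];
alpha = (a1'*b) / (a1'*a1);     % optimal coefficient
y = alpha*a1;                   % projection of b onto span{a1}
r = b - y;                      % residual
disp(r'*a1)                     % should be zero up to roundoff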
Projection in Higher Dimensions

• Here, we have basis coefficients xi written as x = [x1 . . . xn]^T.

• As before, we minimize the square of the norm of the residual

    ||r||^2 = ||Ax − b||^2
            = (Ax − b)^T (Ax − b)
            = b^T b − b^T Ax − (Ax)^T b + x^T A^T A x
            = b^T b + x^T A^T A x − 2 x^T A^T b.

• As in the 1D case, we require stationarity with respect to all coefficients,

    d/dxi ||r||^2 = 0

• The first term is constant.
• The second and third are more complex.
Projection in Higher Dimensions

• Define c = A^T b and H = A^T A such that

    x^T A^T b = x^T c = x1 c1 + x2 c2 + . . . + xn cn ,

    x^T A^T A x = x^T H x = sum_{j=1}^{n} sum_{k=1}^{n} xk Hkj xj

• Differentiating with respect to xi,

    d/dxi ( x^T A^T b ) = ci = (A^T b)_i ,   and

    d/dxi ( x^T H x ) = sum_{j=1}^{n} Hij xj + sum_{k=1}^{n} xk Hki
                      = 2 sum_{j=1}^{n} Hij xj = 2 (Hx)_i .
Projection in Higher Dimensions

• From the preceding pages, the minimum is realized when

    0 = d/dxi [ x^T A^T A x − 2 x^T A^T b ] = 2 ( A^T A x − A^T b )_i ,   i = 1, . . . , n

• Or, in matrix form:

    x = ( A^T A )^(-1) A^T b.

• As in the 1D case, our projection is

    y = A x = A ( A^T A )^(-1) A^T b.

• y has units and length that scale with b, but it lies in the range of A.
• It is the projection of b onto R(A).

Pseudoinverse and Condition Number


Nonsquare m ⇥ n matrix A has no inverse in usual sense
If rank(A) = n, pseudoinverse is defined by

A+ = (AT A) 1 AT

and condition number by

cond(A) = kAk2 · kA+ k2

By convention, cond(A) = 1 if rank(A) < n


Just as condition number of square matrix measures
closeness to singularity, condition number of rectangular
matrix measures closeness to rank deficiency
Least squares solution of Ax ⇠ = b is given by x = A+ b
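A small illustration using Matlab's built-in pinv and cond (a sketch, not part of the slides), checking both definitions on the Vandermonde example from earlier:

A = [1 -1.0 1.0; 1 -0.5 0.25; 1 0.0 0.0; 1 0.5 0.25; 1 1.0 1.0];
Aplus = (A'*A)\A';              % pseudoinverse via (A'A)^(-1) A' (full column rank case)
disp(norm(Aplus - pinv(A)))     % agrees with Matlab's pinv
disp(norm(A,2)*norm(Aplus,2))   % ||A||_2 * ||A^+||_2
disp(cond(A))                   % same value from cond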

Sensitivity and Conditioning

Sensitivity of least squares solution to Ax ≈ b depends on b as well as A

Define angle θ between b and y = Ax by

    cos(θ) = ||y||_2 / ||b||_2 = ||Ax||_2 / ||b||_2

Bound on perturbation Δx in solution x due to perturbation Δb in b is given by

    ||Δx||_2 / ||x||_2  ≤  cond(A) · (1/cos(θ)) · ||Δb||_2 / ||b||_2


Sensitivity and Conditioning, continued

Similarly, for perturbation E in matrix A,

    ||Δx||_2 / ||x||_2  ≲  ( [cond(A)]^2 tan(θ) + cond(A) ) · ||E||_2 / ||A||_2

Condition number of least squares solution is about cond(A) if residual is small, but can be squared or arbitrarily worse for large residual
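The perturbation bound for Δb is easy to test numerically; the following is a hedged experiment (random test problem, not from the slides) comparing the observed relative change with the bound:

A = randn(20,4); b = randn(20,1);
x = A\b;
theta = acos(norm(A*x)/norm(b));          % angle between b and y = Ax
db = 1e-8*randn(20,1);                    % small perturbation of b
dx = A\(b+db) - x;
lhs = norm(dx)/norm(x);
rhs = cond(A)*(1/cos(theta))*norm(db)/norm(b);
disp([lhs rhs])                           % lhs should not exceed rhs (to first order)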


Normal Equations Method

If m × n matrix A has rank n, then symmetric n × n matrix A^T A is positive definite, so its Cholesky factorization

    A^T A = L L^T

can be used to obtain solution x to system of normal equations

    A^T A x = A^T b

which has same solution as linear least squares problem Ax ≈ b

Normal equations method involves transformations

    rectangular → square → triangular
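A minimal Matlab sketch of this method (the function name is a hypothetical helper, not from the slides); note that Matlab's chol returns an upper triangular factor R with A'A = R'R, so R' and R play the roles of L and L^T:

function x = lsq_normal_eqns(A, b)
  C = A'*A;                 % n x n symmetric positive definite if rank(A) = n
  d = A'*b;
  R = chol(C);              % A'A = R'*R, R upper triangular
  z = R' \ d;               % forward substitution
  x = R  \ z;               % back substitution
end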


Example: Normal Equations Method


For polynomial data-fitting example given previously, normal equations method gives

    A^T A = [  1     1     1     1    1   ] [ 1  −1.0  1.0  ]     [ 5.0  0.0  2.5   ]
            [ −1.0  −0.5   0.0   0.5  1.0 ] [ 1  −0.5  0.25 ]  =  [ 0.0  2.5  0.0   ]
            [  1.0   0.25  0.0   0.25 1.0 ] [ 1   0.0  0.0  ]     [ 2.5  0.0  2.125 ]
                                            [ 1   0.5  0.25 ]
                                            [ 1   1.0  1.0  ]

    A^T b = [  1     1     1     1    1   ] [ 1.0 ]     [ 4.0  ]
            [ −1.0  −0.5   0.0   0.5  1.0 ] [ 0.5 ]  =  [ 1.0  ]
            [  1.0   0.25  0.0   0.25 1.0 ] [ 0.0 ]     [ 3.25 ]
                                            [ 0.5 ]
                                            [ 2.0 ]

Example, continued
Cholesky factorization of symmetric positive definite matrix A^T A gives

    A^T A = [ 5.0  0.0  2.5   ]
            [ 0.0  2.5  0.0   ]
            [ 2.5  0.0  2.125 ]

          = [ 2.236  0      0     ] [ 2.236  0      1.118 ]
            [ 0      1.581  0     ] [ 0      1.581  0     ]  =  L L^T
            [ 1.118  0      0.935 ] [ 0      0      0.935 ]

Solving lower triangular system L z = A^T b by forward-substitution gives z = [ 1.789  0.632  1.336 ]^T

Solving upper triangular system L^T x = z by back-substitution gives x = [ 0.086  0.400  1.429 ]^T

Shortcomings of Normal Equations


Information can be lost in forming A^T A and A^T b

For example, take

    A = [ 1  1 ]
        [ ε  0 ]
        [ 0  ε ]

where ε is positive number smaller than sqrt(ε_mach)

Then in floating-point arithmetic

    A^T A = [ 1 + ε^2   1       ]  =  [ 1  1 ]
            [ 1         1 + ε^2 ]     [ 1  1 ]

which is singular

Sensitivity of solution is also worsened, since cond(A^T A) = [cond(A)]^2
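This loss of information is easy to reproduce in Matlab (a sketch assuming IEEE double precision, eps ≈ 2.2e-16):

e = 1e-9;                     % positive, smaller than sqrt(eps) ≈ 1.5e-8
A = [1 1; e 0; 0 e];
disp(A'*A)                    % the 1 + e^2 entries round to 1, so A'*A is exactly singular
disp(rank(A'*A))              % 1, even though rank(A) = 2
disp(cond(A)^2)               % ≈ 2e18: what cond(A'*A) would be in exact arithmetic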
Projection, QR Factorization, Gram-Schmidt

• Recall our linear least squares problem:

    y = A x ≈ b,

which is equivalent to minimization / orthogonal projection:

    r := b − A x  ⊥  R(A)
    ||r||_2 = ||b − y||_2  ≤  ||b − v||_2   for all v ∈ R(A).

• This problem has solutions

    x = ( A^T A )^(-1) A^T b
    y = A ( A^T A )^(-1) A^T b = P b,

where P := A ( A^T A )^(-1) A^T is the orthogonal projector onto R(A).
Observations

    A^T A x = A^T b = [ a1^T b ]
                      [ a2^T b ]
                      [   ...  ]
                      [ an^T b ]

    A^T A = [ a1^T a1   a1^T a2   · · ·   a1^T an ]
            [ a2^T a1   a2^T a2   · · ·   a2^T an ]
            [   ...                  ...          ]
            [ an^T a1   an^T a2   · · ·   an^T an ]
Orthogonal Bases

• If the columns of A were orthogonal, such that ai^T aj = 0 for i ≠ j, then A^T A is a diagonal matrix,

    A^T A = [ a1^T a1                            ]
            [           a2^T a2                  ]
            [                     ...            ]
            [                          an^T an   ]

and the system is easily solved,

    x = ( A^T A )^(-1) A^T b = [ (a1^T a1)^(-1)                  ] [ a1^T b ]
                               [         (a2^T a2)^(-1)          ] [ a2^T b ]
                               [                  ...            ] [  ...   ]
                               [                 (an^T an)^(-1)  ] [ an^T b ]

• In this case, we can write the projection in closed form:

    y = sum_{j=1}^{n} xj aj = sum_{j=1}^{n} ( aj^T b / aj^T aj ) aj .     (1)

• For orthogonal bases, (1) is the projection of b onto span{a1 , a2 , . . . , an }.


Orthonormal Bases

• If the columns are orthogonal and normalized such that ||aj|| = 1, we then have aj^T aj = 1, or more generally

    ai^T aj = δij ,   with δij := { 1, i = j ;  0, i ≠ j }   the Kronecker delta

• In this case, A^T A = I and the orthogonal projection is given by

    y = A A^T b = sum_{j=1}^{n} aj ( aj^T b ).

Example: Suppose our model fit is based on sine functions, sampled uniformly on [0, π]:

    φj(ti) = sin( j ti ),   ti = π i / m,   i = 1, . . . , m.

In this case,

    A = ( φ1(ti)  φ2(ti)  · · ·  φn(ti) ),

    A^T A = (m/2) I.
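This orthogonality is easy to verify numerically; the sketch below (illustrative values of m and n, not from the slides) builds the sine basis and checks that A'*A is (m/2) I up to roundoff:

m = 100; n = 5;
t = pi*(1:m)'/m;                    % t_i = pi*i/m, i = 1..m
A = zeros(m,n);
for j = 1:n
  A(:,j) = sin(j*t);                % phi_j(t_i) = sin(j t_i)
end
disp(norm(A'*A - (m/2)*eye(n)))     % should be at roundoff level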
QR Factorization

• Generally, we don’t a priori have orthonormal bases.

• We can construct them, however. The process is referred to as QR factorization.

• We seek factors Q and R such that QR = A with Q orthogonal (or, unitary, in the complex case).

• There are two cases of interest: reduced QR, Q1 R = A, and full QR, Q [ R ; O ] = A.

• Note that

    A = Q [ R ]  =  [ Q1  Q2 ] [ R ]  =  Q1 R.
          [ O ]                [ O ]

• The columns of Q1 form an orthonormal basis for R(A).

• The columns of Q2 form an orthonormal basis for R(A)^⊥.
QR Factorization: Gram-Schmidt

• We’ll look at three approaches to QR:


– Gram-Schmidt Orthogonalization,
– Householder Transformations, and
– Givens Rotations
• We start with Gram-Schmidt - which is most intuitive.
• We are interested in generating orthogonal subspaces that match the
nested column spaces of A,

span{ a1 } = span{ q1 }

span{ a1 , a2 } = span{ q1 , q2 }

span{ a1 , a2 , a3 } = span{ q1 , q2 , q3 }

span{ a1 , a2 , . . . , an } = span{ q1 , q2 , . . . , qn }
QR Factorization: Gram-Schmidt

• It’s clear that the conditions

span{ a1 } = span{ q1 }

span{ a1 , a2 } = span{ q1 , q2 }

span{ a1 , a2 , a3 } = span{ q1 , q2 , q3 }

span{ a1 , a2 , . . . , an } = span{ q1 , q2 , . . . , qn }

are equivalent to the equations

a1 = q1 r11

a2 = q1 r12 + q2 r22

a3 = q1 r13 + q2 r23 + q3 r33


    . . .

an = q1 r1n + q2 r2n + · · · + qn rnn

i.e., A = QR

(For now, we drop the distinction between Q and Q1 , and focus only on
the reduced QR problem.)
Gram-Schmidt Orthogonalization

• The preceding relationship suggests the first algorithm.

    Let Q_{j-1} := [ q1 q2 . . . q_{j-1} ],  P_{j-1} := Q_{j-1} Q_{j-1}^T,  P_{⊥,j-1} := I − P_{j-1}.

    for j = 2, . . . , n
        vj = aj − P_{j-1} aj = ( I − P_{j-1} ) aj = P_{⊥,j-1} aj
        qj = vj / ||vj|| = P_{⊥,j-1} aj / ||P_{⊥,j-1} aj||
    end

• This is Gram-Schmidt orthogonalization.

• Each new vector qj starts with aj and subtracts off the projection onto R(Q_{j-1}), followed by normalization.
Classical Gram-Schmidt Orthogonalization
[Figure: a3 decomposed into its projection P2 a3 onto R(Q2) = span{q1, q2} and the orthogonal component r33 q3]

    P2 a3 = Q2 Q2^T a3
          = ( q1^T a3 / q1^T q1 ) q1 + ( q2^T a3 / q2^T q2 ) q2
          = q1 ( q1^T a3 ) + q2 ( q2^T a3 )

In general, if Qk is an orthogonal matrix, then

    Pk = Qk Qk^T  is an orthogonal projector onto R(Qk)
Pk = Qk QTk is an orthogonal projector onto R(Qk )
Gram-Schmidt: Classical vs. Modified

• We take a closer look at the projection step, vj = aj − P_{j-1} aj .

• The classical (unstable) GS projection is executed as

    vj = aj
    for k = 1, . . . , j−1
        vj = vj − qk ( qk^T aj )
    end

• The modified GS projection is executed as

    vj = aj
    for k = 1, . . . , j−1
        vj = vj − qk ( qk^T vj )
    end
Mathematical Di↵erence Between CGS and MGS

• Let P̃_{⊥,j} := I − qj qj^T and P̃_j := qj qj^T. (P̃_{⊥,j} is an m × m matrix of what rank?)

• The CGS projection step amounts to

    vj = ( P̃_{⊥,j-1} P̃_{⊥,j-2} · · · P̃_{⊥,1} ) aj
       = ( I − P̃_1 − P̃_2 − · · · − P̃_{j-1} ) aj
       = aj − P̃_1 aj − P̃_2 aj − · · · − P̃_{j-1} aj
       = aj − sum_{k=1}^{j-1} P̃_k aj .

• The MGS projection step is equivalent to

    vj = P̃_{⊥,j-1} ( P̃_{⊥,j-2} ( · · · ( P̃_{⊥,1} aj ) · · · ) )
       = ( I − P̃_{j-1} ) ( I − P̃_{j-2} ) · · · ( I − P̃_1 ) aj
       = prod_{k=1}^{j-1} ( I − P̃_k ) aj
Mathematical Difference Between CGS and MGS

• Lack of associativity in floating point arithmetic drives the difference between CGS and MGS.

• Conceptually, MGS projects the residual, rk := aj − P_{k-1} aj .

• As we shall see, neither CGS nor MGS is as robust as Householder transformations.

• Both, however, can be cleaned up with a second pass through the orthogonalization process. (Just set A = Q and repeat, once.)
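The difference is easy to see numerically. The sketch below (an illustrative experiment, not from the slides) orthogonalizes the columns of a badly conditioned matrix with both projection variants and compares the loss of orthogonality ||Q^T Q − I||:

m = 50; n = 10;
[U,RU] = qr(randn(m)); [V,RV] = qr(randn(n));
A = U(:,1:n)*diag(logspace(0,-10,n))*V';     % badly conditioned test matrix, cond(A) ~ 1e10
Qc = zeros(m,n); Qm = zeros(m,n);
for j = 1:n
  vc = A(:,j);                               % classical GS: project the original column
  for k = 1:j-1, vc = vc - Qc(:,k)*(Qc(:,k)'*A(:,j)); end
  Qc(:,j) = vc/norm(vc);
  vm = A(:,j);                               % modified GS: project the running remainder
  for k = 1:j-1, vm = vm - Qm(:,k)*(Qm(:,k)'*vm); end
  Qm(:,j) = vm/norm(vm);
end
disp([norm(Qc'*Qc - eye(n)), norm(Qm'*Qm - eye(n))])   % CGS loses far more orthogonality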

Gram-Schmidt Orthogonalization
Given vectors a1 and a2 , we seek orthonormal vectors q1
and q2 having same span
This can be accomplished by subtracting from second
vector its projection onto first vector and normalizing both
resulting vectors, as shown in diagram

< interactive example >



Gram-Schmidt Orthogonalization
Process can be extended to any number of vectors a1 , . . . , ak , orthogonalizing each successive vector against all preceding ones, giving classical Gram-Schmidt procedure

    for k = 1 to n
        qk = ak
        for j = 1 to k − 1
            rjk = qj^T ak
            qk = qk − rjk qj
        end
        rkk = ||qk||_2
        qk = qk / rkk
    end

Resulting qk and rjk form reduced QR factorization of A

Modified Gram-Schmidt

Classical Gram-Schmidt procedure often suffers loss of orthogonality in finite-precision

Also, separate storage is required for A, Q, and R, since original ak are needed in inner loop, so qk cannot overwrite columns of A

Both deficiencies are improved by modified Gram-Schmidt procedure, with each vector orthogonalized in turn against all subsequent vectors, so qk can overwrite ak


Modified Gram-Schmidt QR Factorization

Modified Gram-Schmidt algorithm

    for k = 1 to n
        rkk = ||ak||_2
        qk = ak / rkk
        for j = k + 1 to n
            rkj = qk^T aj
            aj = aj − rkj qk
        end
    end

< interactive example >


Matlab Demo: house.m



Classical & Modified GS: Notes
Householder Transformations: Notes

Orthogonal Transformations
We seek alternative method that avoids numerical difficulties of normal equations

We need numerically robust transformation that produces easier problem without changing solution

What kind of transformation leaves least squares solution unchanged?

Square matrix Q is orthogonal if Q^T Q = I

Multiplication of vector by orthogonal matrix preserves Euclidean norm

    ||Qv||_2^2 = (Qv)^T Qv = v^T Q^T Q v = v^T v = ||v||_2^2

Thus, multiplying both sides of least squares problem by orthogonal matrix does not change its solution

Triangular Least Squares Problems

As with square linear systems, suitable target in simplifying least squares problems is triangular form

Upper triangular overdetermined (m > n) least squares problem has form

    [ R ] x  ≈  [ b1 ]
    [ O ]       [ b2 ]

where R is n × n upper triangular and b is partitioned similarly

Residual is

    ||r||_2^2 = ||b1 − R x||_2^2 + ||b2||_2^2


Triangular Least Squares Problems, continued

We have no control over second term, ||b2||_2^2, but first term becomes zero if x satisfies n × n triangular system

    R x = b1

which can be solved by back-substitution

Resulting x is least squares solution, and minimum sum of squares is

    ||r||_2^2 = ||b2||_2^2

So our strategy is to transform general least squares problem to triangular form using orthogonal transformation so that least squares solution is preserved


QR Factorization
Given m × n matrix A, with m > n, we seek m × m orthogonal matrix Q such that

    A = Q [ R ]
          [ O ]

where R is n × n and upper triangular

Linear least squares problem Ax ≈ b is then transformed into triangular least squares problem

    Q^T A x = [ R ] x  ≈  [ c1 ]  =  Q^T b
              [ O ]       [ c2 ]

which has same solution, since

    ||r||_2^2 = ||b − A x||_2^2 = ||b − Q [ R ; O ] x||_2^2 = ||Q^T b − [ R ; O ] x||_2^2

Orthogonal Bases
If we partition m × m orthogonal matrix Q = [Q1 Q2], where Q1 is m × n, then

    A = Q [ R ]  =  [ Q1  Q2 ] [ R ]  =  Q1 R
          [ O ]                [ O ]

is called reduced QR factorization of A

Columns of Q1 are orthonormal basis for span(A), and columns of Q2 are orthonormal basis for span(A)^⊥

Q1 Q1^T is orthogonal projector onto span(A)

Solution to least squares problem Ax ≈ b is given by solution to square system

    Q1^T A x = R x = c1 = Q1^T b



QR for Solving Least Squares

• Start with A x ≈ b

    Q [ R ] x  ≈  b
      [ O ]

    Q^T Q [ R ] x  =  [ R ] x  ≈  Q^T b  =  [ Q1  Q2 ]^T b  =  [ c1 ]
          [ O ]       [ O ]                                    [ c2 ]

• Define the residual, r := b − y = b − A x

    ||r|| = ||b − A x||
          = ||Q^T ( b − A x )||
          = || [ c1 ; c2 ] − [ R x ; 0 ] ||
          = || [ c1 − R x ; c2 ] ||

    ||r||^2 = ||c1 − R x||^2 + ||c2||^2

• Norm of residual is minimized when R x = c1 = Q1^T b, and takes on value ||r|| = ||c2||.
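In Matlab this is a few lines with the built-in economy-size qr (a sketch on an arbitrary random test problem, not from the slides):

A = randn(20,4); b = randn(20,1);
[Q1,R] = qr(A,0);          % reduced QR: Q1 is 20x4, R is 4x4 upper triangular
c1 = Q1'*b;
x  = R \ c1;               % back substitution solves Rx = c1
disp(norm(b - A*x))        % = ||c2||, the minimum residual norm
disp(norm(x - A\b))        % agrees with backslash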

Computing QR Factorization

To compute QR factorization of m × n matrix A, with m > n, we annihilate subdiagonal entries of successive columns of A, eventually reaching upper triangular form

Similar to LU factorization by Gaussian elimination, but use orthogonal transformations instead of elementary elimination matrices

Possible methods include
    Householder transformations
    Givens rotations
    Gram-Schmidt orthogonalization

Method 2: Householder Transformations

Householder Transformations
Householder transformation has form

    H = I − 2 (v v^T)/(v^T v)

for nonzero vector v

H is orthogonal and symmetric: H = H^T = H^(-1)

Given vector a, we want to choose v so that

    H a = [ α  0  · · ·  0 ]^T = α [ 1  0  · · ·  0 ]^T = α e1

Substituting into formula for H, we can take

    v = a − α e1

and α = ±||a||_2 , with sign chosen to avoid cancellation
Householder Reflection
Householder Derivation
    H a = a − 2 (v^T a)/(v^T v) v = [ α  0  · · ·  0 ]^T

    v = a − α e1        Choose α to get desired cancellation.

    v^T a = a^T a − α a1 ,    v^T v = a^T a − 2 α a1 + α^2

    H a = a − 2 (a^T a − α a1) / (a^T a − 2 α a1 + α^2) (a − α e1)
        = a − 2 (||a||^2 ± ||a|| a1) / (2 ||a||^2 ± 2 ||a|| a1) (a − α e1)
        = a − (a − α e1)  =  α e1 .

    Choose  α = −sign(a1) ||a|| = −(a1/|a1|) ||a|| .

Example: Householder Transformation


If a = [ 2  1  2 ]^T, then we take

    v = a − α e1 = [ 2  1  2 ]^T − α [ 1  0  0 ]^T

where α = ±||a||_2 = ±3

Since a1 is positive, we choose negative sign for α to avoid cancellation, so

    v = [ 2  1  2 ]^T − [ −3  0  0 ]^T = [ 5  1  2 ]^T

To confirm that transformation works,

    H a = a − 2 (v^T a)/(v^T v) v = [ 2  1  2 ]^T − (2 · 15 / 30) [ 5  1  2 ]^T = [ −3  0  0 ]^T
< interactive example >

Householder QR Factorization
To compute QR factorization of A, use Householder transformations to annihilate subdiagonal entries of each successive column

Each Householder transformation is applied to entire matrix, but does not affect prior columns, so zeros are preserved

In applying Householder transformation H to arbitrary vector u,

    H u = ( I − 2 (v v^T)/(v^T v) ) u = u − 2 (v^T u)/(v^T v) v

which is much cheaper than general matrix-vector multiplication and requires only vector v, not full matrix H
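A two-line Matlab sketch of this cheap update (the helper name is hypothetical, not from the slides); if u is a matrix, the same expression applies H to every column at once:

function u = apply_householder(v, u)
  % u <- H u with H = I - 2 v v'/(v'v); never forms H explicitly
  u = u - v * ( (2/(v'*v)) * (v'*u) );
end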

Householder QR Factorization, continued

Process just described produces factorization

    Hn · · · H1 A = [ R ]
                    [ O ]

where R is n × n and upper triangular

If Q = H1 · · · Hn , then A = Q [ R ]
                                [ O ]

To preserve solution of linear least squares problem, right-hand side b is transformed by same sequence of Householder transformations

Then solve triangular least squares problem  [ R ] x ≈ Q^T b
                                             [ O ]

Householder QR Factorization, continued

For solving linear least squares problem, product Q of Householder transformations need not be formed explicitly

R can be stored in upper triangle of array initially containing A

Householder vectors v can be stored in (now zero) lower triangular portion of A (almost)

Householder transformations most easily applied in this form anyway


Example: Householder QR Factorization


For polynomial data-fitting example given previously, with
2 3 2 3
1 1.0 1.0 1.0
61 0.5 0.257 60.57
6 7 6 7
6
A = 61 0.0 0.0 7 , b = 6
7 0.0 7
6 7
41 0.5 0.25 5 40.55
1 1.0 1.0 2.0

Householder vector v1 for annihilating subdiagonal entries


of first column of2A3is 2 3 2 3
1 2.236 3.236
6 17 6 0 7 6 1 7
6 7 6 7 6 7
v1 = 6 7 6 7 6
6 17 6 0 7 = 6 1 7
7
4 15 4 0 5 4 1 5
1 0 1

Example, continued
Applying resulting Householder transformation H1 yields transformed matrix and right-hand side

    H1 A = [ −2.236   0      −1.118 ]           [ −1.789 ]
           [  0      −0.191  −0.405 ]           [ −0.362 ]
           [  0       0.309  −0.655 ] ,  H1 b = [ −0.862 ]
           [  0       0.809  −0.405 ]           [ −0.362 ]
           [  0       1.309   0.345 ]           [  1.138 ]

Householder vector v2 for annihilating subdiagonal entries of second column of H1 A is

         [  0     ]   [ 1.581 ]   [  0     ]
         [ −0.191 ]   [ 0     ]   [ −1.772 ]
    v2 = [  0.309 ] − [ 0     ] = [  0.309 ]
         [  0.809 ]   [ 0     ]   [  0.809 ]
         [  1.309 ]   [ 0     ]   [  1.309 ]

Example, continued
Applying resulting Householder transformation H2 yields

    H2 H1 A = [ −2.236   0      −1.118 ]              [ −1.789 ]
              [  0       1.581   0     ]              [  0.632 ]
              [  0       0      −0.725 ] , H2 H1 b =  [ −1.035 ]
              [  0       0      −0.589 ]              [ −0.816 ]
              [  0       0       0.047 ]              [  0.404 ]

Householder vector v3 for annihilating subdiagonal entries of third column of H2 H1 A is

         [  0     ]   [ 0     ]   [  0     ]
         [  0     ]   [ 0     ]   [  0     ]
    v3 = [ −0.725 ] − [ 0.935 ] = [ −1.660 ]
         [ −0.589 ]   [ 0     ]   [ −0.589 ]
         [  0.047 ]   [ 0     ]   [  0.047 ]

Example, continued

Applying resulting Householder transformation H3 yields

    H3 H2 H1 A = [ −2.236   0      −1.118 ]                 [ −1.789 ]
                 [  0       1.581   0     ]                 [  0.632 ]
                 [  0       0       0.935 ] , H3 H2 H1 b =  [  1.336 ]
                 [  0       0       0     ]                 [  0.026 ]
                 [  0       0       0     ]                 [  0.337 ]

Now solve upper triangular system R x = c1 by back-substitution to obtain x = [ 0.086  0.400  1.429 ]^T
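The whole procedure of this example can be scripted compactly; the following is a hedged sketch (not Heath's reference code) that applies the Householder steps to [A, b] and reproduces x ≈ [0.086, 0.400, 1.429]^T:

A = [1 -1.0 1.0; 1 -0.5 0.25; 1 0.0 0.0; 1 0.5 0.25; 1 1.0 1.0];
b = [1.0; 0.5; 0.0; 0.5; 2.0];
[m,n] = size(A);
for k = 1:n                              % Householder QR, updating A and b in place
  a = A(k:m,k);
  s = sign(a(1)); if s == 0, s = 1; end
  alpha = -s*norm(a);                    % sign chosen to avoid cancellation
  v = a;  v(1) = v(1) - alpha;           % v = a - alpha*e1
  beta = 2/(v'*v);
  A(k:m,k:n) = A(k:m,k:n) - v*(beta*(v'*A(k:m,k:n)));
  b(k:m)     = b(k:m)     - v*(beta*(v'*b(k:m)));
end
x = triu(A(1:n,1:n)) \ b(1:n)            % back substitution on R x = c1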

< interactive example >



kth Householder Transformation (Reflection)
Householder Transformations

            [ x  x  x ]                      [ x ]
    H1 A  = [ 0  x  x ] ,    H1 b → b(1)  =  [ x ]
            [ 0  x  x ]                      [ x ]
            [ 0  x  x ]                      [ x ]

               [ x  x  x ]                        [ x ]
    H2 H1 A =  [ 0  x  x ] ,   H2 b(1) → b(2)  =  [ x ]
               [ 0  0  x ]                        [ x ]
               [ 0  0  x ]                        [ x ]

                  [ x  x  x ]
    H3 H2 H1 A =  [ 0  x  x ] ,   H3 b(2) → b(3)  =  [ c1 ]
                  [ 0  0  x ]                        [ c2 ]
                  [ 0  0  0 ]

Questions: How does H3 H2 H1 relate to Q or Q1 ?

What is Q in this case?


Note: Householder Procedure
Method 3: Givens Rotations

Givens Rotations
Givens rotations introduce zeros one at a time

Given vector [ a1  a2 ]^T, choose scalars c and s so that

    [  c  s ] [ a1 ]   [ α ]
    [ −s  c ] [ a2 ] = [ 0 ]

with c^2 + s^2 = 1, or equivalently, α = sqrt( a1^2 + a2^2 )

Previous equation can be rewritten

    [ a1   a2 ] [ c ]   [ α ]
    [ a2  −a1 ] [ s ] = [ 0 ]

Gaussian elimination yields triangular system

    [ a1   a2              ] [ c ]   [ α         ]
    [ 0   −a1 − a2^2 / a1  ] [ s ] = [ −α a2/a1  ]


Givens Rotations, continued

Back-substitution then gives

    s = α a2 / ( a1^2 + a2^2 )   and   c = α a1 / ( a1^2 + a2^2 )

Finally, c^2 + s^2 = 1, or α = sqrt( a1^2 + a2^2 ), implies

    c = a1 / sqrt( a1^2 + a2^2 )   and   s = a2 / sqrt( a1^2 + a2^2 )

2 x 2 Rotation Matrices

Example: Givens Rotation


Let a = [ 4  3 ]^T

To annihilate second entry we compute cosine and sine

    c = a1 / sqrt( a1^2 + a2^2 ) = 4/5 = 0.8   and   s = a2 / sqrt( a1^2 + a2^2 ) = 3/5 = 0.6

Rotation is then given by

    G = [  c  s ]  =  [  0.8  0.6 ]
        [ −s  c ]     [ −0.6  0.8 ]

To confirm that rotation works,

    G a = [  0.8  0.6 ] [ 4 ]  =  [ 5 ]
          [ −0.6  0.8 ] [ 3 ]     [ 0 ]
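The same computation as a tiny Matlab sketch (hypot avoids overflow when squaring the entries; the vector is the example above):

a = [4; 3];
r = hypot(a(1), a(2));             % sqrt(a1^2 + a2^2) without overflow
c = a(1)/r;  s = a(2)/r;           % c = 0.8, s = 0.6
G = [c s; -s c];
disp(G*a)                          % [5; 0]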


Givens QR Factorization
More generally, to annihilate selected component of vector in n dimensions, rotate target component with another component

    [ 1  0  0   0  0 ] [ a1 ]   [ a1 ]
    [ 0  c  0   s  0 ] [ a2 ]   [ α  ]
    [ 0  0  1   0  0 ] [ a3 ] = [ a3 ]
    [ 0 −s  0   c  0 ] [ a4 ]   [ 0  ]
    [ 0  0  0   0  1 ] [ a5 ]   [ a5 ]

By systematically annihilating successive entries, we can reduce matrix to upper triangular form using sequence of Givens rotations

Each rotation is orthogonal, so their product is orthogonal, producing QR factorization

Givens QR Factorization

Straightforward implementation of Givens method requires about 50% more work than Householder method, and also requires more storage, since each rotation requires two numbers, c and s, to define it

These disadvantages can be overcome, but requires more complicated implementation

Givens can be advantageous for computing QR factorization when many entries of matrix are already zero, since those annihilations can then be skipped

< interactive example >



Givens QR

❑ A particularly attractive use of Givens QR is when A is upper Hessenberg – A is upper triangular with one additional nonzero diagonal below the main one: Aij = 0 if i > j+1

❑ In this case, we require Givens row operations applied only n times, instead of O(n^2) times.

❑ Work for Givens is thus O(n^2), vs. O(n^3) for Householder.

❑ Upper Hessenberg matrices arise when computing eigenvalues.


Successive Givens Rotations

As with Householder transformations, we apply successive Givens rotations, G1, G2, etc., each annihilating one subdiagonal entry of A while updating the right-hand side:

    G1 A           (one subdiagonal zero introduced),      G1 b → b(1)
    G2 G1 A        (second subdiagonal entry zeroed),       G2 b(1) → b(2)
    G3 G2 G1 A     (third subdiagonal entry zeroed),        G3 b(2) → b(3)
    . . .

• How many Givens rotations (total) are required for the m × n case?
• How does . . . G3 G2 G1 relate to Q or Q1 ?
• What is Q in this case?

Rank Deficiency

If rank(A) < n, then QR factorization still exists, but yields singular upper triangular factor R, and multiple vectors x give minimum residual norm

Common practice selects minimum residual solution x having smallest norm

Can be computed by QR factorization with column pivoting or by singular value decomposition (SVD)

Rank of matrix is often not clear cut in practice, so relative tolerance is used to determine rank


Example: Near Rank Deficiency


Consider 3 × 2 matrix

    A = [ 0.641  0.242 ]
        [ 0.321  0.121 ]
        [ 0.962  0.363 ]

Computing QR factorization,

    R = [ 1.1997  0.4527 ]
        [ 0       0.0002 ]

R is extremely close to singular (exactly singular to 3-digit accuracy of problem statement)

If R is used to solve linear least squares problem, result is highly sensitive to perturbations in right-hand side

For practical purposes, rank(A) = 1 rather than 2, because columns are nearly linearly dependent

QR with Column Pivoting

Instead of processing columns in natural order, select for reduction at each stage column of remaining unreduced submatrix having maximum Euclidean norm

If rank(A) = k < n, then after k steps, norms of remaining unreduced columns will be zero (or “negligible” in finite-precision arithmetic) below row k

Yields orthogonal factorization of form

    Q^T A P = [ R  S ]
              [ O  O ]

where R is k × k, upper triangular, and nonsingular, and permutation matrix P performs column interchanges

Note: Classical vs. Modified GS, inner loop (left: classical; right: modified)

    rjk = qj^T ak                    rjk = qj^T q̃k
    q̃k = q̃k − qj rjk                 q̃k = q̃k − qj rjk
    end                              end
    qk = q̃k / ||q̃k||                  qk = q̃k / ||q̃k||

Modified GS computes the projection onto qj using the remainder, ak − Q_{j-1} ak , rather than simply projecting ak onto qj. At each step, you are working with a smaller correction.

Essentially the same effect is realized by running classical GS twice, first on A, then on Q. On the second pass, the corrections are very small and hence less sensitive to round-off.

Classical GS is attractive for parallel computing. Why?

QR with Column Pivoting, continued

Basic solution to least squares problem Ax ≈ b can now be computed by solving triangular system R z = c1 , where c1 contains first k components of Q^T b, and then taking

    x = P [ z ]
          [ 0 ]

Minimum-norm solution can be computed, if desired, at expense of additional processing to annihilate S

rank(A) is usually unknown, so rank is determined by monitoring norms of remaining unreduced columns and terminating factorization when maximum value falls below chosen tolerance

< interactive example >
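Matlab's three-output qr performs column pivoting; the sketch below uses it to estimate rank and compute a basic solution for the nearly rank-deficient example above (the right-hand side and the tolerance are assumed choices for illustration):

A = [0.641 0.242; 0.321 0.121; 0.962 0.363];   % nearly rank-deficient example from above
b = [1; 1; 1];                                 % assumed right-hand side
[Q,R,P] = qr(A);                               % QR with column pivoting: A*P = Q*R
tol = 1e-3*abs(R(1,1));                        % relative tolerance matching the 3-digit data
k = nnz(abs(diag(R)) > tol);                   % numerical rank estimate (here k = 1)
z = R(1:k,1:k) \ (Q(:,1:k)'*b);                % solve leading triangular system
x = P*[z; zeros(size(A,2)-k,1)]                % basic solution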


Singular Value Decomposition

Singular value decomposition (SVD) of m × n matrix A has form

    A = U Σ V^T

where U is m × m orthogonal matrix, V is n × n orthogonal matrix, and Σ is m × n diagonal matrix, with

    σij = 0 for i ≠ j,   and   σii = σi ≥ 0

Diagonal entries σi, called singular values of A, are usually ordered so that σ1 ≥ σ2 ≥ · · · ≥ σn

Columns ui of U and vi of V are called left and right singular vectors



SVD of Rectangular Matrix A

[Figure: block diagram of A = U Σ V^T]

• A = U Σ V^T is m × n.
• U is m × m, orthogonal.
• Σ is m × n, diagonal, σi ≥ 0.
• V is n × n, orthogonal.

Example: SVD
SVD of

    A = [  1   2   3 ]
        [  4   5   6 ]
        [  7   8   9 ]
        [ 10  11  12 ]

is given by A = U Σ V^T with

    U = [ .141   .825   −.420  −.351 ]      Σ = [ 25.5  0     0 ]      V^T = [  .504   .574  .644 ]
        [ .344   .426    .298   .782 ]          [ 0     1.29  0 ]            [ −.761  −.057  .646 ]
        [ .547   .0278   .664  −.509 ]          [ 0     0     0 ]            [  .408  −.816  .408 ]
        [ .750  −.371   −.542   .079 ]          [ 0     0     0 ]

In square matrix case, U Σ V^T is closely related to the eigendecomposition X Λ X^(-1)


< interactive example >


Applications of SVD

Minimum norm solution to Ax ≈ b is given by

    x = sum over σi ≠ 0 of ( ui^T b / σi ) vi

For ill-conditioned or rank deficient problems, “small” singular values can be omitted from summation to stabilize solution

Euclidean matrix norm:  ||A||_2 = σ_max

Euclidean condition number of matrix:  cond(A) = σ_max / σ_min

Rank of matrix: number of nonzero singular values

SVD for Linear Least Squares Problem: A = U Σ V^T

    A x ≈ b
    U Σ V^T x ≈ b
    U^T U Σ V^T x ≈ U^T b
    Σ V^T x ≈ U^T b

    [ R̃ ] V^T x  ≈  [ c1 ]
    [ O ]           [ c2 ]

    R̃ ( V^T x ) = c1

    x = sum_{j=1}^{n} vj (1/σj) (c1)_j = sum_{j=1}^{n} (1/σj) vj ( uj^T b )
SVD for Linear Least Squares Problem: A = U Σ V^T

• SVD can also handle the rank deficient case.

• If there are only k singular values σj > ε, then take only the first k contributions:

    x = sum_{j=1}^{k} (1/σj) vj ( uj^T b )
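A Matlab sketch of the truncated-SVD solution (the test matrix and the truncation threshold are assumed choices, not fixed rules):

A = randn(20,5)*diag([1 1 1 1e-12 1e-13]);    % assumed test matrix, nearly rank 3
b = randn(20,1);
[U,S,V] = svd(A,'econ');
sigma = diag(S);
tol = 1e-8;                                   % assumed threshold for "small" singular values
k = nnz(sigma > tol);                         % effective rank (here k = 3)
x = V(:,1:k) * ((U(:,1:k)'*b) ./ sigma(1:k))  % x = sum_{j<=k} (u_j'*b/sigma_j) v_j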

Pseudoinverse

Define pseudoinverse of scalar σ to be 1/σ if σ ≠ 0, zero otherwise

Define pseudoinverse of (possibly rectangular) diagonal matrix by transposing and taking scalar pseudoinverse of each entry

Then pseudoinverse of general real m × n matrix A is given by

    A^+ = V Σ^+ U^T

Pseudoinverse always exists whether or not matrix is square or has full rank

If A is square and nonsingular, then A^+ = A^(-1)

In all cases, minimum-norm solution to Ax ≈ b is given by

    x = A^+ b

Orthogonal Bases

SVD of matrix, A = U Σ V^T, provides orthogonal bases for subspaces relevant to A

Columns of U corresponding to nonzero singular values form orthonormal basis for span(A)

Remaining columns of U form orthonormal basis for orthogonal complement span(A)^⊥

Columns of V corresponding to zero singular values form orthonormal basis for null space of A

Remaining columns of V form orthonormal basis for orthogonal complement of null space of A


Lower-Rank Matrix Approximation


Another way to write SVD is

    A = U Σ V^T = σ1 E1 + σ2 E2 + · · · + σn En

with Ei = ui vi^T

Ei has rank 1 and can be stored using only m + n storage locations

Product Ei x can be computed using only m + n multiplications

Condensed approximation to A is obtained by omitting from summation terms corresponding to small singular values

Approximation using k largest singular values is closest matrix of rank k to A

Approximation is useful in image processing, data compression, information retrieval, cryptography, etc.

< interactive example >
Low Rank Approximation to A = U Σ V^T

• Because of the diagonal form of Σ, we have

    A = U Σ V^T = sum_{j=1}^{n} uj σj vj^T

• A rank k approximation to A is given by

    A ≈ Ak := sum_{j=1}^{k} uj σj vj^T

• Ak is the best approximation to A in the Frobenius norm,

    ||M||_F := sqrt( m11^2 + m21^2 + · · · + mmn^2 )
SVD for Image Compression

❑ If we view an image as an m x n matrix, we can use the SVD to generate a low-rank compressed version.

❑ Full image storage cost scales as O(mn)

❑ Compressed image storage scales as O(km) + O(kn), with k < m or n.
Image Compression

[Figure: rank k = 1 approximation of the example image]
Image Compression

[Figure: rank k = 1, 2, 3 approximations (m = 536, n = 432)]

Matlab code
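A sketch of the Matlab code referred to above (the image file name is a placeholder; the target rank k is an assumed choice):

X = double(imread('my_image.png'));      % placeholder filename; any image will do
if ndims(X) == 3, X = mean(X,3); end     % reduce RGB to a single grayscale channel
[U,S,V] = svd(X,'econ');
k = 20;                                  % assumed target rank
Xk = U(:,1:k)*S(1:k,1:k)*V(:,1:k)';      % best rank-k approximation in the Frobenius norm
imagesc([X Xk]); colormap(gray); axis image off
title(sprintf('original vs. rank-%d approximation',k))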
Image Compression

Compressed image storage scales as O(km) + O(kn), with k < m or n.

[Figure: rank k = 1, 2, 3 and k = 10, 20, 50 approximations (m = 536, n = 462)]


Low-Rank Approximations to Solutions of Ax = b

If σ1 ≥ σ2 ≥ · · · ≥ σn ,

    x ≈ sum_{j=1}^{k} σj^+ vj ( uj^T b )

❑ Other functions, aside from the inverse of the matrix, can also be approximated in this way, at relatively low cost, once the SVD is known.

Comparison of Methods
Forming normal equations matrix A^T A requires about n^2 m/2 multiplications, and solving resulting symmetric linear system requires about n^3/6 multiplications

Solving least squares problem using Householder QR factorization requires about m n^2 − n^3/3 multiplications

If m ≈ n, both methods require about same amount of work

If m ≫ n, Householder QR requires about twice as much work as normal equations

Cost of SVD is proportional to m n^2 + n^3, with proportionality constant ranging from 4 to 10, depending on algorithm used

Comparison of Methods, continued

Normal equations method produces solution whose relative error is proportional to [cond(A)]^2

Required Cholesky factorization can be expected to break down if cond(A) ≈ 1/sqrt(ε_mach) or worse

Householder method produces solution whose relative error is proportional to

    cond(A) + ||r||_2 [cond(A)]^2

which is best possible, since this is inherent sensitivity of solution to least squares problem

Householder method can be expected to break down (in back-substitution phase) only if cond(A) ≈ 1/ε_mach or worse


Comparison of Methods, continued

Householder is more accurate and more broadly applicable than normal equations

These advantages may not be worth additional cost, however, when problem is sufficiently well conditioned that normal equations provide sufficient accuracy

For rank-deficient or nearly rank-deficient problems, Householder with column pivoting can produce useful solution when normal equations method fails outright

SVD is even more robust and reliable than Householder, but substantially more expensive
