
MA 106 : Linear Algebra

Lecture 11

J. K. Verma
Department of Mathematics
Indian Institute of Technology Bombay

Least squares approximation

1. Suppose we have a large number of data points $(x_i, y_i)$, $i = 1, 2, \ldots, n$, collected from some experiment.
2. Sometimes we believe that these points lie on a straight line.
3. So a linear function $y(x) = s + tx$ may satisfy
   $$y(x_i) = y_i, \quad i = 1, \ldots, n.$$
4. Due to uncertainty in the data and experimental error, in practice the points will deviate somewhat from a straight line, so it is usually impossible to find a linear $y(x)$ that passes through all of them.
5. So we seek a line that fits the data well, in the sense that the errors are made as small as possible.

Least squares approximation

1. A natural question that arises now is: how do we define the error? Consider the following system of linear equations, in the variables $s$ and $t$, with known coefficients $x_i, y_i$, $i = 1, \ldots, n$:
   $$\begin{aligned}
   s + x_1 t &= y_1 \\
   s + x_2 t &= y_2 \\
   &\;\;\vdots \\
   s + x_n t &= y_n
   \end{aligned}$$
2. Note that typically $n$ would be much greater than 2. If we can find $s$ and $t$ to satisfy all these equations, then we have solved our problem.
3. However, for the reasons mentioned above, this is not always possible.

Least squares approximation

1. For given $s$ and $t$, the error in the $i$th equation is $|y_i - s - x_i t|$.
2. There are several ways of combining the errors in the individual equations to get a measure of the total error.
3. The following are three examples (see the short sketch after this list):
   $$\sqrt{\sum_{i=1}^{n} (y_i - s - x_i t)^2}, \qquad \sum_{i=1}^{n} |y_i - s - x_i t|, \qquad \max_{1 \le i \le n} |y_i - s - x_i t|.$$
4. Both analytically and computationally, a nice theory exists for the first of these choices, and this is what we shall study.
5. The problem of finding $s, t$ so as to minimize $\sqrt{\sum_{i=1}^{n} (y_i - s - x_i t)^2}$ is called a least squares problem.
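
To make the three measures concrete, here is a minimal NumPy sketch (not part of the lecture; the data and candidate coefficients are assumptions borrowed from the worked example a few slides ahead) that evaluates each of them for a given line $y = s + tx$:

```python
# Evaluate the three error measures for a candidate line y = s + t*x.
# Data and coefficients are assumptions taken from the later worked example.
import numpy as np

x = np.array([-1.0, 1.0, 2.0])      # x_i
y = np.array([ 1.0, 1.0, 3.0])      # y_i
s, t = 9/7, 4/7                     # candidate intercept and slope

r = y - (s + t * x)                 # residuals y_i - s - x_i t

l2_error  = np.sqrt(np.sum(r**2))   # square root of the sum of squares
l1_error  = np.sum(np.abs(r))       # sum of absolute errors
max_error = np.max(np.abs(r))       # largest individual error

print(l2_error, l1_error, max_error)
```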

Least squares approximation

1. Suppose that
   $$A = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}, \quad
     b = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad
     x = \begin{pmatrix} s \\ t \end{pmatrix}, \quad \text{so} \quad
     Ax = \begin{pmatrix} s + t x_1 \\ s + t x_2 \\ \vdots \\ s + t x_n \end{pmatrix}.$$

2. The least squares problem is to find an $x$ such that $\|b - Ax\|$ is minimized, i.e., to find an $x$ such that $Ax$ is the best approximation to $b$ in the column space $C(A)$ of $A$.
3. This is precisely the problem of finding $x$ such that $b - Ax \in C(A)^\perp$.
4. Note that $b - Ax \in C(A)^\perp \iff A^t(b - Ax) = 0 \iff A^t A x = A^t b$. These are the normal equations for the least squares problem.
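
As a sketch of how this recipe looks in code (assuming NumPy; the helper name fit_line is ours, not the lecture's), one can build $A$ and solve the normal equations directly:

```python
# Build A for the line-fitting problem and solve the normal equations A^t A x = A^t b.
import numpy as np

def fit_line(x_data, y_data):
    """Least squares fit of y = s + t*x; returns (s, t)."""
    A = np.column_stack([np.ones_like(x_data), x_data])  # columns: 1, x_i
    b = y_data
    s, t = np.linalg.solve(A.T @ A, A.T @ b)              # normal equations
    return s, t

print(fit_line(np.array([-1.0, 1.0, 2.0]), np.array([1.0, 1.0, 3.0])))
```

For large or ill-conditioned problems one would normally call np.linalg.lstsq or use a QR factorization (as in the last slide) rather than forming $A^t A$ explicitly.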

Least squares approximation
1. Example. Find $s, t$ such that the straight line $y = s + tx$ best fits the following data in the least squares sense:
   $$y = 1 \text{ at } x = -1, \quad y = 1 \text{ at } x = 1, \quad y = 3 \text{ at } x = 2.$$
2. Project $b = \begin{pmatrix} 1 \\ 1 \\ 3 \end{pmatrix}$ onto the column space of $A = \begin{pmatrix} 1 & -1 \\ 1 & 1 \\ 1 & 2 \end{pmatrix}$.
3. Now $A^t A = \begin{pmatrix} 3 & 2 \\ 2 & 6 \end{pmatrix}$ and $A^t b = \begin{pmatrix} 5 \\ 6 \end{pmatrix}$. The normal equations are
   $$\begin{pmatrix} 3 & 2 \\ 2 & 6 \end{pmatrix} \begin{pmatrix} s \\ t \end{pmatrix} = \begin{pmatrix} 5 \\ 6 \end{pmatrix}.$$
4. The solution is $s = 9/7$, $t = 4/7$.
5. Therefore the best line which fits the data is $y = \frac{9}{7} + \frac{4}{7}x$.
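
The same answer can be checked numerically; a sketch using NumPy's built-in least squares routine rather than the hand computation above:

```python
# Check the worked example: least squares line through (-1,1), (1,1), (2,3).
import numpy as np

A = np.array([[1.0, -1.0],
              [1.0,  1.0],
              [1.0,  2.0]])
b = np.array([1.0, 1.0, 3.0])

sol, residuals, rank, sing_vals = np.linalg.lstsq(A, b, rcond=None)
print(sol)   # approximately [1.2857, 0.5714], i.e. [9/7, 4/7]
```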
Least squares approximation

1. We can also try to fit an $m$th degree polynomial
   $$y(x) = s_0 + s_1 x + s_2 x^2 + \cdots + s_m x^m$$
   to the data points $(x_i, y_i)$, $i = 1, \ldots, n$, so as to minimize the error in the least squares sense.
2. In this case $s_0, s_1, \ldots, s_m$ are the variables and we have
   $$A = \begin{pmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^m \\ 1 & x_2 & x_2^2 & \cdots & x_2^m \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^m \end{pmatrix}, \quad
     b = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad
     x = \begin{pmatrix} s_0 \\ s_1 \\ \vdots \\ s_m \end{pmatrix}.$$
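
A sketch of the corresponding computation (the helper fit_polynomial and the sample data are our assumptions, not from the slides):

```python
# Least squares fit of an m-th degree polynomial via the matrix A above.
import numpy as np

def fit_polynomial(x, y, m):
    """Return the coefficients s_0, ..., s_m of the least squares fit."""
    A = np.vander(x, N=m + 1, increasing=True)   # columns 1, x, x^2, ..., x^m
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coeffs

x = np.array([-1.0, 0.0, 1.0, 2.0, 3.0])
y = np.array([ 2.1, 0.9, 1.2, 3.0, 7.2])
print(fit_polynomial(x, y, m=2))   # best-fit parabola coefficients
```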

Fitting data points by a parabola

1. The path of a projectile is a parabola. By observing it at a few points, we can describe the parabolic path which best fits the data.
2. This is done by least squares approximation, which was first described by C. F. Gauss, who used it to find the trajectory of Ceres, the largest asteroid in the Solar system.
3. Problem. Fit heights $b_1, b_2, \ldots, b_m$ at the times $t_1, t_2, \ldots, t_m$ by a parabola $c + dt + et^2$.
4. Solution. If each observation were exact, the equations would be
   $$\begin{aligned}
   c + d t_1 + e t_1^2 &= b_1 \\
   c + d t_2 + e t_2^2 &= b_2 \\
   &\;\;\vdots \\
   c + d t_m + e t_m^2 &= b_m
   \end{aligned}$$

Fitting by a parabola

1. The best parabola $c + dt + et^2$ which fits the $m$ data points is found by solving the normal equations $A^t A x = A^t b$, where $x = (c, d, e)^t$ and
   $$A = \begin{pmatrix} 1 & t_1 & t_1^2 \\ 1 & t_2 & t_2^2 \\ \vdots & \vdots & \vdots \\ 1 & t_m & t_m^2 \end{pmatrix}.$$
2. Since $t_1, t_2, \ldots, t_m$ are distinct, if $m \ge 3$ then the rank of $A$ is three: by the Vandermonde determinant, the determinant of the submatrix formed by the first three rows (and all the columns) is nonzero.
3. Hence $\operatorname{rank} A^t A = \operatorname{rank} A = 3$. This means $A^t A$ is invertible. Therefore the only solution of $A^t A x = A^t b$ is given by
   $$x = (A^t A)^{-1} A^t b.$$
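
A sketch of this formula in NumPy (the observation times and heights below are made up for illustration):

```python
# Fit heights by a parabola c + d*t + e*t^2 using x = (A^t A)^{-1} A^t b.
import numpy as np

t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])          # observation times t_i (assumed)
b = np.array([0.1, 4.8, 8.9, 11.2, 11.9])        # observed heights b_i (assumed)

A = np.column_stack([np.ones_like(t), t, t**2])  # columns 1, t_i, t_i^2

x = np.linalg.inv(A.T @ A) @ (A.T @ b)           # (A^t A)^{-1} A^t b
c, d, e = x
print(c, d, e)
```

Since $A^t A$ is only $3 \times 3$ here, the explicit inverse is harmless; for larger problems a QR factorization (discussed below) is numerically preferable.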

Review of orthogonal matrices
1. A real $n \times n$ matrix $Q$ is called orthogonal if $Q^t Q = I$.
2. A $2 \times 2$ orthogonal matrix has two possible forms:
   $$A = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \quad \text{or} \quad B = \begin{pmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{pmatrix}.$$
3. $T_A$ represents the rotation of $\mathbb{R}^2$ by $\theta$ radians in the anticlockwise direction.
4. The matrix $B$ represents a reflection with respect to the line $y = \tan(\theta/2)\,x$.
5. Definition. A hyperplane in $\mathbb{R}^n$ is a subspace of dimension $n - 1$.
6. A linear map $T : \mathbb{R}^n \to \mathbb{R}^n$ is called a reflection with respect to a hyperplane $H$ if $Tu = -u$ whenever $u \perp H$ and $Tu = u$ for all $u \in H$.
7. Definition. Let $u$ be a unit vector in $\mathbb{R}^n$. The Householder matrix of $u$, for reflection with respect to $L(u)^\perp$, is $H = I - 2uu^t$. Hence $Hu = u - 2u(u^t u) = -u$. If $w \perp u$ then $Hw = w - 2uu^t w = w$.
8. So $H$ induces the reflection in the hyperplane perpendicular to the line $L(u)$.
9. Exercise. Show that $H$ is symmetric and orthogonal.
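
A small numerical check of these properties (a sketch with a random unit vector in $\mathbb{R}^4$; NumPy assumed):

```python
# Check the Householder matrix H = I - 2 u u^t for a random unit vector u in R^4.
import numpy as np

rng = np.random.default_rng(0)
u = rng.standard_normal(4)
u = u / np.linalg.norm(u)                 # make u a unit vector

H = np.eye(4) - 2.0 * np.outer(u, u)      # Householder matrix I - 2 u u^t

print(np.allclose(H @ u, -u))             # H u = -u
print(np.allclose(H, H.T))                # H is symmetric
print(np.allclose(H.T @ H, np.eye(4)))    # H is orthogonal: H^t H = I

w = rng.standard_normal(4)
w = w - (w @ u) * u                       # make w perpendicular to u
print(np.allclose(H @ w, w))              # H fixes vectors orthogonal to u
```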
The QR decomposition of a matrix

1. Problem. Suppose $A = [u_1\, u_2\, \ldots\, u_n]$ is an $m \times n$ matrix whose column vectors $u_1, u_2, \ldots, u_n$ are linearly independent. Let $Q$ be the matrix whose column vectors $q_1, q_2, \ldots, q_n$ are obtained from $u_1, u_2, \ldots, u_n$ by applying the Gram-Schmidt orthonormalization process. How are $A$ and $Q$ related?
2. Let $C(A)$ denote the column space of $A$. Then $\{q_1, q_2, \ldots, q_n\}$ is an orthonormal basis of $C(A)$. Hence
   $$\begin{aligned}
   u_1 &= \langle q_1, u_1\rangle q_1 + \langle q_2, u_1\rangle q_2 + \cdots + \langle q_n, u_1\rangle q_n \\
   u_2 &= \langle q_1, u_2\rangle q_1 + \langle q_2, u_2\rangle q_2 + \cdots + \langle q_n, u_2\rangle q_n \\
   &\;\;\vdots \\
   u_n &= \langle q_1, u_n\rangle q_1 + \langle q_2, u_n\rangle q_2 + \cdots + \langle q_n, u_n\rangle q_n
   \end{aligned}$$
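
As a quick numerical illustration (a sketch; the columns below are borrowed from the earlier line-fit example), the expansion $u_j = \sum_i \langle q_i, u_j\rangle q_i$ can be verified with NumPy:

```python
# Verify that each u_j equals the sum of <q_i, u_j> q_i when q_1, ..., q_n
# is an orthonormal basis of the column space.
import numpy as np

U = np.array([[1.0, -1.0],
              [1.0,  1.0],
              [1.0,  2.0]])            # columns u_1, u_2

Q, _ = np.linalg.qr(U)                 # orthonormal basis q_1, q_2 of C(U)

coeffs = Q.T @ U                       # entry (i, j) is <q_i, u_j>
reconstructed = Q @ coeffs             # sum_i <q_i, u_j> q_i, column by column
print(np.allclose(reconstructed, U))   # True
```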

The QR decomposition of a matrix

1. These equations can be written in matrix form as
   $$[u_1\, u_2\, \ldots\, u_n] = [q_1\, q_2\, \ldots\, q_n]
     \begin{pmatrix}
     \langle q_1, u_1\rangle & \langle q_1, u_2\rangle & \cdots & \langle q_1, u_n\rangle \\
     \langle q_2, u_1\rangle & \langle q_2, u_2\rangle & \cdots & \langle q_2, u_n\rangle \\
     \vdots & \vdots & & \vdots \\
     \langle q_n, u_1\rangle & \langle q_n, u_2\rangle & \cdots & \langle q_n, u_n\rangle
     \end{pmatrix}.$$
2. By the Gram-Schmidt construction, $q_i \perp u_j$ for all $j \le i - 1$.
3. Hence the matrix $R = (\langle q_i, u_j\rangle)$ is upper triangular.
4. Moreover, none of the diagonal entries are zero. Hence $R$ is an invertible upper triangular matrix.
5. This gives us $A = QR$ where $Q = [q_1\, q_2\, \ldots\, q_n]$.
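
The construction can be sketched directly in code (classical Gram-Schmidt; the function name is ours, not the lecture's), and the factorization it produces can be checked:

```python
# Classical Gram-Schmidt: A = Q R with orthonormal columns Q and upper triangular R.
import numpy as np

def gram_schmidt_qr(A):
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = A[:, j].copy()
        for i in range(j):
            R[i, j] = Q[:, i] @ A[:, j]     # R_{ij} = <q_i, u_j>
            v -= R[i, j] * Q[:, i]
        R[j, j] = np.linalg.norm(v)
        Q[:, j] = v / R[j, j]
    return Q, R

A = np.array([[1.0, -1.0],
              [1.0,  1.0],
              [1.0,  2.0]])
Q, R = gram_schmidt_qr(A)
print(np.allclose(Q @ R, A))              # A = QR
print(np.allclose(Q.T @ Q, np.eye(2)))    # orthonormal columns
print(np.allclose(R, np.triu(R)))         # R is upper triangular
```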

The QR decomposition and the normal equations

1. If $A$ is a square matrix, then the column vectors of $Q$ form an orthonormal basis of $\mathbb{R}^n$ and $QQ^t = I$.
2. Matrices which satisfy this condition are called orthogonal matrices.
3. The normal equations for the least squares approximation are $A^t A x = A^t b$. These correspond to $Ax = b$. Let us use the QR decomposition of $A$ to solve the normal equations.
4. Theorem. Let $A$ be an $m \times n$ matrix with linearly independent column vectors. Let $A = QR$ be its QR decomposition. Then for every $b \in \mathbb{R}^m$ the system $Ax = b$ has a unique least squares solution given by $x = R^{-1} Q^t b$.
5. Proof. Substitute $A = QR$ in the normal equations $A^t A x = A^t b$ and use $Q^t Q = I$ and the invertibility of $R^t$:
   $$(QR)^t QRx = (QR)^t b \implies R^t (Q^t Q) R x = R^t Q^t b \implies Rx = Q^t b \implies x = R^{-1} Q^t b.$$
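
A sketch of the theorem in NumPy (data borrowed from the earlier line-fit example); in practice one solves the triangular system $Rx = Q^t b$ rather than forming $R^{-1}$:

```python
# Solve the least squares problem via its QR decomposition: x = R^{-1} Q^t b.
import numpy as np

A = np.array([[1.0, -1.0],
              [1.0,  1.0],
              [1.0,  2.0]])
b = np.array([1.0, 1.0, 3.0])

Q, R = np.linalg.qr(A)           # reduced QR: Q is 3x2 with orthonormal columns, R is 2x2 upper triangular
x = np.linalg.solve(R, Q.T @ b)  # solves R x = Q^t b
print(x)                         # approximately [9/7, 4/7], matching the normal equations
```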

