
Least Squares and Data Fitting

Warm-up: Suppose we want to find a quadratic polynomial f (x) = ax² + bx + c such that f (−1) = 1, f (0) = 0,
f (1) = 2, and f (2) = 5. Express this problem as a linear system.

Solution. We want
1 = f (−1) = a(−1)² + b(−1) + c = a − b + c
0 = f (0)  = a(0)² + b(0) + c  = c
2 = f (1)  = a(1)² + b(1) + c  = a + b + c
5 = f (2)  = a(2)² + b(2) + c  = 4a + 2b + c
That is, we are trying to solve the linear system
a − b + c = 1
        c = 0
a + b + c = 2
4a + 2b + c = 5,
which can be rewritten as the matrix equation

[ 1  -1  1 ] [ a ]   [ 1 ]
[ 0   0  1 ] [ b ] = [ 0 ]
[ 1   1  1 ] [ c ]   [ 2 ]
[ 4   2  1 ]         [ 5 ]
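As a numerical cross-check (a NumPy sketch, not part of the worksheet): the four conditions give an overdetermined 4 × 3 system, which turns out to be inconsistent, so the best one can do is a least squares fit, the theme of the rest of the worksheet.

```python
import numpy as np

# Coefficient matrix and right-hand side of the warm-up system
A = np.array([[1, -1, 1],
              [0,  0, 1],
              [1,  1, 1],
              [4,  2, 1]], dtype=float)
b = np.array([1, 0, 2, 5], dtype=float)

# Four equations in three unknowns: lstsq minimizes ||A x - b||
coeffs, residuals, rank, _ = np.linalg.lstsq(A, b, rcond=None)
# coeffs ≈ (1.0, 0.4, 0.3), i.e. the least squares quadratic is
# f(x) = x^2 + 0.4 x + 0.3

# The residual is nonzero, confirming the system is inconsistent
assert residuals[0] > 0
```

(By hand, the least squares solution works out to a = 1, b = 2/5, c = 3/10.)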
 
RECALL: Let

    [ 1  0 ]
M = [ 2  1 ].
    [ 4  3 ]

1. How do you find a basis for (im M )⊥ ?

2. What can you say about (im M )⊥ in general?

Solution. For any matrix A, (im A)⊥ = ker(AT ); in particular, (im M )⊥ = ker(M T ).

1. The general situation we’re looking at is that we have a system A~x = ~b which may be inconsistent.
So, rather than finding solutions of A~x = ~b, we’d like to find vectors ~x ∗ which make A~x ∗ as close as
possible to ~b. (These are called least squares solutions of A~x = ~b.)

(a) We have a name for all possible vectors of the form A~x ∗ : what?

Solution. This is exactly the image of A.

(b) In terms of your answer to (a), how could we find the vector A~x ∗ closest to ~b? (A picture may
help.)

Solution. We are looking for the vector in im A closest to ~b, and this is exactly projim A ~b.

[Figure: the vector ~b, the subspace im A, and the projection projim A ~b]

(c) How could you use your answer to (b) to find the least squares solutions to the system in the warm-up?
(Don't do the calculations; just come up with a strategy.)

Solution. We know how to calculate projim A ~b: we first find an orthonormal basis of im A, and
then we can use #2(c) on the worksheet “Orthogonal Projections and Orthonormal Bases” to
calculate the projection. Once we know projim A ~b, we can solve the system A~x ∗ = projim A ~b
using Gauss-Jordan. However, this is a lot of work!

(d) There is actually a much more efficient way to compute least squares solutions of A~x = ~b! The
key is to think about ~b − A~x ∗ ; if ~x ∗ is a least squares solution of A~x = ~b, what must be true about
~b − A~x ∗ ? (Is the converse true?)

Solution. The key to projecting onto a subspace is “dropping a perpendicular” to the subspace;
in our picture, that perpendicular is ~b − A~x ∗ :

[Figure: the perpendicular ~b − A~x ∗ dropped from ~b onto im A, meeting it at A~x ∗ ]

More precisely, A~x ∗ = projim A ~b ⇐⇒ ~b − A~x ∗ ∈ (im A)⊥ .

(e) How can we use (d) to come up with a more efficient way of finding least squares solutions of
A~x = ~b?
Solution. So far, we’ve found:

~x ∗ is a least squares solution of A~x = ~b ⇐⇒ ~b − A~x ∗ ∈ (im A)⊥

But we’ve seen that (im A)⊥ = ker(AT ), so we can rewrite this as

⇐⇒ ~b − A~x ∗ ∈ ker(AT )

Using the definition of the kernel, we can rewrite this:

⇐⇒ AT (~b − A~x ∗ ) = ~0

And now we can just do some algebra to make the equation we have look nicer:

⇐⇒ AT ~b − AT A~x ∗ = ~0
⇐⇒ AT A~x ∗ = AT ~b
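This chain of equivalences is easy to check numerically. A small NumPy sketch (illustrative, using a randomly generated overdetermined system; not part of the worksheet):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))   # tall matrix: 6 equations, 3 unknowns
b = rng.standard_normal(6)

# Least squares solution via the normal equation  A^T A x = A^T b
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Reference: NumPy's built-in least squares solver
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x_normal, x_lstsq)

# The residual b - A x* lies in ker(A^T) = (im A)^⊥
residual = b - A @ x_normal
assert np.allclose(A.T @ residual, 0)
```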

2. Suppose that we want to fit a line to the data points (−1, 3), (0, 1), and (1, 1).

(a) Do you expect the slope of the line to be positive, negative, or zero?

Solution. Looking at the data points, we see that the best-fit line should have negative slope.

(b) Find the best-fit line.

Solution. The given data points are on the line y = mx + c ⇐⇒ the following system is satisfied:
−m + c = 3
c=1
m+c=1
We can rewrite this as

[ -1  1 ] [ m ]   [ 3 ]
[  0  1 ] [ c ] = [ 1 ]
[  1  1 ]         [ 1 ]
Thus, if

    [ -1  1 ]           [ 3 ]
A = [  0  1 ]  and ~b = [ 1 ],
    [  1  1 ]           [ 1 ]
we are looking for solutions of the system A~x = ~b. This system is inconsistent, but the least
squares solutions are the solutions of the system AT A~x = AT ~b, or

[ 2  0 ] [ m ]   [ -2 ]
[ 0  3 ] [ c ] = [  5 ] .
This system has a unique solution, which is

[ m ]   [ -1  ]
[ c ] = [ 5/3 ] .

Thus, the line that best fits the data points is y = −x + 5/3.

[Figure: the best-fit line y = −x + 5/3 plotted with the data points (−1, 3), (0, 1), (1, 1)]
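The computation above is easy to reproduce numerically; here is a minimal NumPy sketch (not part of the worksheet):

```python
import numpy as np

# Data points (-1, 3), (0, 1), (1, 1); model y = m*x + c
A = np.array([[-1, 1],
              [ 0, 1],
              [ 1, 1]], dtype=float)
b = np.array([3, 1, 1], dtype=float)

# Solve the normal equation  A^T A x = A^T b
m, c = np.linalg.solve(A.T @ A, A.T @ b)
# m = -1, c = 5/3, i.e. the line y = -x + 5/3
```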

3. (T/F) Suppose AT A is invertible; then A(AT A)−1 AT = A(A−1 A−T )AT = (AA−1 )(A−T AT ) = I.

Solution. False: the chain of equalities assumes A−1 exists, but A need not be invertible (indeed, A is usually not even square). However, ker A is trivial; see problem 6.

What can one say about the matrix A(AT A)−1 AT (assuming AT A is invertible)?

Solution. The matrix A(AT A)−1 AT is the projection matrix onto im(A). If the columns of A are
orthonormal, this simplifies to P = AAT . The reason: if the columns of A are orthonormal, then
AT A = I. (In class, I said A−1 = AT , which is only true if A is a square matrix, and often that's
not the case. So, my mistake, sorry!) But all we need is that AT A = I, because then
A(AT A)−1 AT = A I−1 AT = AAT .
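The projection-matrix claim can be sanity-checked numerically. The sketch below (NumPy, not part of the worksheet) verifies that P = A(AT A)−1 AT is symmetric, idempotent, and sends ~b to A~x ∗ :

```python
import numpy as np

# Reuse the matrix from problem 2, whose columns are independent
A = np.array([[-1, 1],
              [ 0, 1],
              [ 1, 1]], dtype=float)

# P = A (A^T A)^{-1} A^T projects onto im(A)
P = A @ np.linalg.inv(A.T @ A) @ A.T

# Projection matrices are symmetric and idempotent
assert np.allclose(P, P.T)
assert np.allclose(P @ P, P)

# P b equals A x*, where x* is the least squares solution for b
b = np.array([3, 1, 1], dtype=float)
x_star, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(P @ b, A @ x_star)
```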

4. The following table describes the percent of classes that Harvard students attend:

Year (y)         Percent of Classes Attended (p)
1 (Freshman)     100
2 (Sophomore)     90
3 (Junior)        60
4 (Senior)        10

We suspect that p(y) looks like ky n for some constants k and n, and we would like to find k and n.

(a) Do you expect k and n to be positive or negative? What number should k be close to?
Solution. Since p(1) will be approximated by k · 1n = k, we expect k to be close to 100 (in
particular, k should be positive). We expect n to be negative since p(y) decreases as y increases.

(b) How might we express this problem as a linear system?


Solution. We have the equations

k · 1n = 100
k · 2n = 90
k · 3n = 60
k · 4n = 10

The problem is that these are not linear equations. However, if we take the natural log of all of
these equations and write c = ln k, then the equations become

c + n ln 1 = ln 100
c + n ln 2 = ln 90
c + n ln 3 = ln 60
c + n ln 4 = ln 10

We can rewrite this system as a matrix equation

[ 1  ln 1 ] [ c ]   [ ln 100 ]
[ 1  ln 2 ] [ n ] = [ ln 90  ]
[ 1  ln 3 ]         [ ln 60  ]
[ 1  ln 4 ]         [ ln 10  ]

Thus, if

    [ 1  ln 1 ]           [ ln 100 ]
A = [ 1  ln 2 ]  and ~b = [ ln 90  ],
    [ 1  ln 3 ]           [ ln 60  ]
    [ 1  ln 4 ]           [ ln 10  ]

then the least squares solutions are the solutions of the linear system AT A~x = AT ~b. Using
Mathematica, we see that this system has a unique solution,

[ c ]   [  4.98 ]
[ n ] ≈ [ -1.39 ] .

Since k = e^c , k ≈ 145.478. So, p(y) ≈ 145 y^(−1.39) . The graph of p looks like this:

[Figure: graph of p(y) ≈ 145 y^(−1.39) for 0 ≤ y ≤ 4]
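The log-linear fit can be reproduced without Mathematica; a NumPy sketch (not part of the worksheet):

```python
import numpy as np

# Attendance data: year y vs. percent p
y = np.array([1, 2, 3, 4], dtype=float)
p = np.array([100, 90, 60, 10], dtype=float)

# Model p = k * y**n; taking logs gives  ln p = c + n ln y  with c = ln k
A = np.column_stack([np.ones_like(y), np.log(y)])
b = np.log(p)

# Solve the normal equation  A^T A x = A^T b
c, n = np.linalg.solve(A.T @ A, A.T @ b)
k = np.exp(c)
# c ≈ 4.98, n ≈ -1.39, so k ≈ 145 and p(y) ≈ 145 * y**(-1.39)
```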

5. Let A be an n × m matrix. For which vectors ~b ∈ Rn is it true that the least squares solutions of A~x = ~b
form a subspace of Rm ?

Solution. The least squares solutions of A~x = ~b are the solutions of A~x = projim A ~b. In order for
this set of solutions to be a subspace, the set must contain ~0. That is, ~x = ~0 must be a solution of
A~x = projim A ~b, which means projim A ~b must be ~0. Therefore, ~b must be in (im A)⊥ . So, we conclude
that, if the least squares solutions of A~x = ~b form a subspace of Rm , then ~b must be in (im A)⊥ .

Conversely, if ~b is indeed in (im A)⊥ , then projim A ~b = ~0, so the least squares solutions of A~x = ~b are
exactly the solutions of A~x = ~0. That is, the set of least squares solutions of A~x = ~b is simply ker A,
which is definitely a subspace of Rm .
So, we conclude that the least squares solutions of A~x = ~b form a subspace of Rm ⇐⇒ ~b ∈ (im A)⊥ .

6. (a) True or false: If AT A is invertible, then ker A = {~0}. Explain.


Solution. Let’s see what we can say about ker A. If ~x ∈ ker A, this means (by definition of
kernel) that A~x = ~0. Multiplying both sides of this equation by AT on the left, AT A~x = ~0. But
since AT A is invertible, we can multiply again on the left by (AT A)−1 to get simply ~x = ~0.

So, we’ve shown that the only vector in ker A is ~0, and the statement is true .

(b) True or false: If ker A = {~0}, then AT A is invertible.


Solution. The matrix AT A is always square, so it is invertible if and only if its kernel is {~0}.
(There are lots of other ways to characterize invertible matrices, but since we are told something
about ker A, it seems to make sense to think about ker(AT A).) So, let’s look at ker(AT A).

If ~x ∈ ker(AT A), that means (by definition!) that AT A~x = ~0. In other words, A~x ∈ ker(AT ). But
we’ve shown before that ker(AT ) = (im A)⊥ , so this is the same as saying that A~x ∈ (im A)⊥ .
Now, A~x is in im A (by definition of the image), so A~x is in both im A and (im A)⊥ . That is, A~x
is in im A and orthogonal to im A; the only way this can happen is if A~x = ~0. But the statement
A~x = ~0 is saying that ~x ∈ ker A, and we were told that the only vector in ker A was ~0. So, ~x = ~0.

To summarize, if ~x ∈ ker(AT A), then ~x = ~0; that is, the only vector in ker(AT A) is ~0, so AT A is
indeed invertible, and the statement is true .
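Both directions of problem 6 can be illustrated numerically. In the sketch below (NumPy, not part of the worksheet), the first matrix has independent columns (trivial kernel), the second has dependent columns:

```python
import numpy as np

# Independent columns -> ker A = {0} -> A^T A invertible (problem 6b)
A = np.array([[1, 0],
              [2, 1],
              [4, 3]], dtype=float)
assert np.linalg.matrix_rank(A) == 2           # ker A is trivial
assert abs(np.linalg.det(A.T @ A)) > 1e-9      # A^T A is invertible

# Dependent columns -> nontrivial kernel -> A^T A singular (problem 6a)
B = np.array([[1, 2],
              [2, 4],
              [3, 6]], dtype=float)            # second column = 2 * first
assert np.linalg.matrix_rank(B) == 1           # ker B is nontrivial
assert abs(np.linalg.det(B.T @ B)) < 1e-9      # B^T B is not invertible
```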

1. You should understand what we mean by a least squares solution of a linear system A~x = ~b.

2. You should understand why the normal equation AT A~x = AT ~b gives us least squares solutions of
A~x = ~b.

3. You should be able to solve data fitting problems using least squares.

Are you ready? Let’s check!

7. Decide whether each of the following statements is true or false. If the statement is true, explain why
briefly; if the statement is false, give a counterexample.

(a) The least squares solutions of A~x = ~b are exactly the solutions of A~x = projim A ~b

(b) If ~x∗ is a least squares solution of A~x = ~b, then ||~b||² = ||A~x∗ ||² + ||~b − A~x∗ ||²

(c) Every linear system has a unique least squares solution.

(d) Even if the system A~x = ~b is inconsistent, the system AT A~x = AT ~b is consistent.

(e) For any matrix A, (ker A)⊥ = im(AT ).

Hint: Exactly one of the statements is false.

Solution. (c) is false: AT A might not be invertible, and then the least squares solution is not unique. For example, if A is the zero matrix, every ~x is a least squares solution of A~x = ~b.
