Least Squares and Data Fitting
Warm-up: Suppose we want to find a quadratic polynomial f(x) = ax^2 + bx + c such that f(−1) = 1, f(0) = 0,
f(1) = 2, and f(2) = 5. Express this problem as a linear system.
Solution. We want
1 = f(−1) = a(−1)^2 + b(−1) + c = a − b + c
0 = f(0)  = a(0)^2 + b(0) + c   = c
2 = f(1)  = a(1)^2 + b(1) + c   = a + b + c
5 = f(2)  = a(2)^2 + b(2) + c   = 4a + 2b + c
That is, we are trying to solve the linear system
 a −  b + c = 1
          c = 0
 a +  b + c = 2
4a + 2b + c = 5
which can be rewritten as the matrix equation
[ 1  −1  1 ] [a]   [ 1 ]
[ 0   0  1 ] [b] = [ 0 ].
[ 1   1  1 ] [c]   [ 2 ]
[ 4   2  1 ]       [ 5 ]
RECALL: Let
        [ 1  0 ]
    M = [ 2  1 ].
        [ 4  3 ]
1. The general situation we’re looking at is that we have a system A~x = ~b which may be inconsistent.
So, rather than finding solutions of A~x = ~b, we’d like to find vectors ~x ∗ which make A~x ∗ as close as
possible to ~b. (These are called least squares solutions of A~x = ~b.)
(a) We have a name for all possible vectors of the form A~x ∗ : what?
(b) In terms of your answer to (a), how could we find the vector A~x∗ closest to ~b? (A picture may help.)
Solution. We are looking for the vector in im A closest to ~b, and this is exactly proj_im A ~b.
[Figure: ~b above the plane im A, with proj_im A ~b its closest point in the plane.]
(c) How could you use your answer to (b) to find the least squares solutions to the system in the warm-up?
(Don’t do the calculations; just come up with a strategy.)
Solution. We know how to calculate proj_im A ~b: we first find an orthonormal basis of im A, and
then we can use #2(c) on the worksheet “Orthogonal Projections and Orthonormal Bases” to
calculate the projection. Once we know proj_im A ~b, we can solve the system A~x∗ = proj_im A ~b
using Gauss-Jordan elimination. However, this is a lot of work!
(d) There is actually a much more efficient way to compute least squares solutions of A~x = ~b! The
key is to think about ~b − A~x ∗ ; if ~x ∗ is a least squares solution of A~x = ~b, what must be true about
~b − A~x ∗ ? (Is the converse true?)
Solution. The key to projecting onto a subspace is “dropping a perpendicular” to the subspace;
in our picture, that perpendicular is ~b − A~x∗. So, if ~x∗ is a least squares solution of A~x = ~b,
then ~b − A~x∗ must be orthogonal to im A; that is, ~b − A~x∗ ∈ (im A)⊥. The converse is also true:
if ~b − A~x∗ ∈ (im A)⊥, then A~x∗ is exactly proj_im A ~b, so ~x∗ is a least squares solution.
[Figure: ~b above the plane im A, with A~x∗ its foot in the plane and ~b − A~x∗ the perpendicular.]
(e) How can we use (d) to come up with a more efficient way of finding least squares solutions of
A~x = ~b?
Solution. So far, we’ve found:
~x∗ is a least squares solution of A~x = ~b ⇐⇒ ~b − A~x∗ ∈ (im A)⊥
But we’ve seen that (im A)⊥ = ker(A^T), so we can rewrite this as
⇐⇒ ~b − A~x∗ ∈ ker(A^T)
⇐⇒ A^T(~b − A~x∗) = ~0
And now we can just do some algebra to make the equation we have look nicer:
⇐⇒ A^T ~b − A^T A ~x∗ = ~0
⇐⇒ A^T A ~x∗ = A^T ~b
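The derivation above says a least squares solution can be found by solving A^T A ~x∗ = A^T ~b directly. As a minimal sketch in plain Python (the helper names `transpose`, `matmul`, `solve` are our own, not from the worksheet), here is that recipe applied to the warm-up system, which turns out to be inconsistent:

```python
def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))] for i in range(len(M))]

def solve(M, v):
    """Solve the square system M x = v by Gauss-Jordan elimination."""
    n = len(M)
    aug = [row[:] + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[r][n] for r in range(n)]

# Warm-up system: each row is (x^2, x, 1) at x = -1, 0, 1, 2
A = [[1, -1, 1], [0, 0, 1], [1, 1, 1], [4, 2, 1]]
b = [1, 0, 2, 5]

At = transpose(A)
AtA = matmul(At, A)                                   # A^T A
Atb = [sum(At[i][j] * b[j] for j in range(len(b))) for i in range(len(At))]
a, b_, c = solve(AtA, Atb)                            # least squares (a, b, c)
print(round(a, 4), round(b_, 4), round(c, 4))         # → 1.0 0.4 0.3
```

The residual ~b − A~x∗ here is (0.1, −0.3, 0.3, −0.1), which is indeed orthogonal to every column of A, as part (d) predicts.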
2. Suppose that we want to fit a line to the data points (−1, 3), (0, 1), and (1, 1).
(a) Do you expect the slope of the line to be positive, negative, or zero?
Solution. Looking at the data points, we see that the best-fit line should have negative slope.
Solution. The given data points are on the line y = mx + c ⇐⇒ the following system is satisfied:
−m + c = 3
c=1
m+c=1
We can rewrite this as
[ −1  1 ] [m]   [ 3 ]
[  0  1 ] [c] = [ 1 ].
[  1  1 ]       [ 1 ]
Thus, if
    [ −1  1 ]            [ 3 ]
A = [  0  1 ]  and  ~b = [ 1 ],
    [  1  1 ]            [ 1 ]
we are looking for solutions of the system A~x = ~b. This system is inconsistent, but the least
squares solutions are the solutions of the system A^T A ~x = A^T ~b, or
[ 2  0 ] [m]   [ −2 ]
[ 0  3 ] [c] = [  5 ].
This system has a unique solution, which is
[m]   [ −1  ]
[c] = [ 5/3 ].
[Figure: the three data points and the best-fit line y = −x + 5/3.]
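As a quick sanity check on these numbers, here is the same fit in plain Python, using the closed-form solution of the 2×2 normal equations (the variable names `Sxx`, `Sxy`, etc. are our own):

```python
pts = [(-1, 3), (0, 1), (1, 1)]        # the data points from the problem

# Normal equations A^T A (m, c)^T = A^T b for the line y = m x + c,
# where row i of A is (x_i, 1): written out, they involve these sums.
n = len(pts)
Sx = sum(x for x, _ in pts)
Sy = sum(y for _, y in pts)
Sxx = sum(x * x for x, _ in pts)
Sxy = sum(x * y for x, y in pts)

det = Sxx * n - Sx * Sx                # determinant of A^T A
m = (Sxy * n - Sx * Sy) / det          # Cramer's rule on the 2x2 system
c = (Sxx * Sy - Sx * Sxy) / det
print(m, c)                            # → -1.0 1.6666666666666667
```

This confirms the slope is negative, as predicted in part (a).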
3. (T/F) Suppose A^T A is invertible; then A(A^T A)^{−1} A^T = A(A^{−1} A^{−T}) A^T = (A A^{−1})(A^{−T} A) = I.
Solution. False: A need not be invertible (indeed, it need not even be square), so the steps involving
A^{−1} are not valid. (We can say, however, that ker A is trivial, since A^T A is invertible.)
What can one say about the matrix A(A^T A)^{−1} A^T (supposing A^T A is invertible)?
Solution. The matrix A(A^T A)^{−1} A^T is the projection matrix onto im(A). If the columns of A are
orthonormal, this simplifies to P = A A^T. The reason is:
If the columns of A are orthonormal, then A^T A = I. (In class, I said A^{−1} = A^T, which is only true
if A is a square matrix, and often that’s not the case. So, my mistake, sorry!) But all we need is
that A^T A = I, because then A(A^T A)^{−1} A^T = A I^{−1} A^T = A A^T.
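To make this concrete, here is a small plain-Python check (our own helper names) that P = A(A^T A)^{−1} A^T behaves like a projection, using the matrix A from problem 2, whose columns happen to be orthogonal but not orthonormal:

```python
def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))] for i in range(len(M))]

A = [[-1, 1], [0, 1], [1, 1]]
AtA_inv = [[1/2, 0], [0, 1/3]]      # (A^T A)^{-1}, since A^T A = diag(2, 3) here
P = matmul(matmul(A, AtA_inv), transpose(A))

P2 = matmul(P, P)                   # a projection matrix satisfies P^2 = P
col = [[-1], [0], [1]]              # first column of A, a vector in im A
Pcol = matmul(P, col)               # P should leave vectors in im A untouched
```

P here also equals its own transpose, as orthogonal projection matrices always do.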
4. The following table describes the percent of classes that Harvard students attend:
   Year y:        1     2     3     4
   Percent p(y): 100    90    60    10
We suspect that p(y) looks like ky n for some constants k and n, and we would like to find k and n.
(a) Do you expect k and n to be positive or negative? What number should k be close to?
Solution. Since p(1) will be approximated by k · 1n = k, we expect k to be close to 100 (in
particular, k should be positive). We expect n to be negative since p(y) decreases as y increases.
We want:
k · 1^n = 100
k · 2^n = 90
k · 3^n = 60
k · 4^n = 10
The problem is that these are not linear equations. However, if we take the natural log of all of
these equations and write c = ln k, then the equations become
c + n ln 1 = ln 100
c + n ln 2 = ln 90
c + n ln 3 = ln 60
c + n ln 4 = ln 10
Thus, if
    [ 1  ln 1 ]            [ ln 100 ]
A = [ 1  ln 2 ]  and  ~b = [ ln 90  ],
    [ 1  ln 3 ]            [ ln 60  ]
    [ 1  ln 4 ]            [ ln 10  ]
then the least squares solutions are the solutions of the linear system A^T A ~x = A^T ~b. Using
Mathematica, we see that this system has a unique solution,
[c]   [  4.98 ]
[n] ≈ [ −1.39 ].
Since k = e^c, k ≈ 145.478. So, p(y) ≈ 145 y^(−1.39). The graph of p looks like this:
[Figure: graph of p(y) ≈ 145 y^(−1.39) for 0 < y ≤ 4.]
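The same fit can be reproduced with the standard library only; this sketch (our own variable names) takes logs, solves the 2×2 normal equations in closed form, and recovers the constants above up to rounding:

```python
from math import log, exp

data = [(1, 100), (2, 90), (3, 60), (4, 10)]   # (year, percent attended)

# After taking logs, we fit z = c + n*x with x = ln y and z = ln p
xs = [log(y) for y, _ in data]
zs = [log(p) for _, p in data]
N = len(data)
Sx, Sz = sum(xs), sum(zs)
Sxx = sum(x * x for x in xs)
Sxz = sum(x * z for x, z in zip(xs, zs))

# Closed-form least squares solution of the 2x2 normal equations
n = (N * Sxz - Sx * Sz) / (N * Sxx - Sx * Sx)
c = (Sz - n * Sx) / N
k = exp(c)
print(round(c, 2), round(n, 2), round(k, 1))   # → 4.98 -1.39 145.5
```

This matches the Mathematica result quoted above.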
5. Let A be an n × m matrix. For which vectors ~b ∈ R^n is it true that the least squares solutions of A~x = ~b
form a subspace of R^m?
Solution. The least squares solutions of A~x = ~b are the solutions of A~x = proj_im A ~b. In order for
this set of solutions to be a subspace, the set must contain ~0. That is, ~x = ~0 must be a solution of
A~x = proj_im A ~b, which means proj_im A ~b must be ~0. Therefore, ~b must be in (im A)⊥. So, we conclude
that, if the least squares solutions of A~x = ~b form a subspace of R^m, then ~b must be in (im A)⊥.
Conversely, if ~b is indeed in (im A)⊥, then proj_im A ~b = ~0, so the least squares solutions of A~x = ~b are
exactly the solutions of A~x = ~0. That is, the set of least squares solutions of A~x = ~b is simply ker A,
which is definitely a subspace of R^m.
So, we conclude that the least squares solutions of A~x = ~b form a subspace of R^m ⇐⇒ ~b ∈ (im A)⊥.
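To illustrate this numerically, take the matrix A from problem 2 and a vector ~b that we chose to lie in (im A)⊥ (our own example); then A^T ~b = ~0, so the normal equations become A^T A ~x = ~0 and the least squares solutions are exactly ker A:

```python
A = [[-1, 1], [0, 1], [1, 1]]
b = [1, -2, 1]   # chosen orthogonal to both columns of A, so b is in (im A)^⊥

# A^T b: if this is the zero vector, the normal equations read A^T A x = 0,
# so the least squares solutions form ker(A^T A) = ker A, a subspace.
Atb = [sum(A[i][j] * b[i] for i in range(3)) for j in range(2)]
print(Atb)  # → [0, 0]
```

Here ker A = {~0}, so ~x∗ = ~0 is the unique least squares solution, and the solution set is indeed a subspace.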
So, we’ve shown that the only vector in ker A is ~0, and the statement is true .
If ~x ∈ ker(A^T A), that means (by definition!) that A^T A ~x = ~0. In other words, A~x ∈ ker(A^T). But
we’ve shown before that ker(A^T) = (im A)⊥, so this is the same as saying that A~x ∈ (im A)⊥.
Now, A~x is in im A (by definition of the image), so A~x is in both im A and (im A)⊥. That is, A~x
is in im A and orthogonal to im A; the only way this can happen is if A~x = ~0. But the statement
A~x = ~0 is saying that ~x ∈ ker A, and we were told that the only vector in ker A was ~0. So, ~x = ~0.
To summarize, if ~x ∈ ker(A^T A), then ~x = ~0; that is, the only vector in ker(A^T A) is ~0, so A^T A is
indeed invertible, and the statement is true.
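A quick numeric illustration of this connection, using two small matrices of our own choosing: independent columns give det(A^T A) ≠ 0, while dependent columns (a nontrivial kernel) force det(A^T A) = 0.

```python
def ata_det(A):
    """det(A^T A) for a matrix A with exactly two columns."""
    s11 = sum(row[0] * row[0] for row in A)
    s12 = sum(row[0] * row[1] for row in A)
    s22 = sum(row[1] * row[1] for row in A)
    return s11 * s22 - s12 * s12

A_indep = [[1, 0], [0, 1], [1, 1]]   # independent columns: ker A = {0}
A_dep   = [[1, 2], [2, 4], [3, 6]]   # second column = 2 * first: ker A nontrivial
print(ata_det(A_indep), ata_det(A_dep))  # → 3 0
```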
1. You should understand what we mean by a least squares solution of a linear system A~x = ~b
2. You should understand why the normal equation A^T A ~x = A^T ~b gives us least squares solutions of
A~x = ~b.
3. You should be able to solve data fitting problems using least squares.
7. Decide whether each of the following statements is true or false. If the statement is true, explain why
briefly; if the statement is false, give a counterexample.
(a) The least squares solutions of A~x = ~b are exactly the solutions of A~x = proj_im A ~b.
(b) If ~x∗ is a least squares solution of A~x = ~b, then ||~b||^2 = ||A~x∗||^2 + ||~b − A~x∗||^2.
(d) Even if the system A~x = ~b is inconsistent, the system AT A~x = AT ~b is consistent.
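Statement (b) can be checked directly on the data from problem 2, using the least squares solution ~x∗ = (−1, 5/3) found there (plain Python, our own names):

```python
A = [[-1, 1], [0, 1], [1, 1]]
b = [3, 1, 1]
x_star = [-1, 5/3]                 # least squares solution from problem 2

Ax = [sum(A[i][j] * x_star[j] for j in range(2)) for i in range(3)]
resid = [b[i] - Ax[i] for i in range(3)]

def norm2(v):
    return sum(t * t for t in v)

# Pythagoras: ||b||^2 = ||A x*||^2 + ||b - A x*||^2, since A x* ⊥ (b - A x*)
print(norm2(b), norm2(Ax) + norm2(resid))   # both ≈ 11
```

Statement (d) holds here too: the original system was inconsistent, yet the normal equations had the unique solution above.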