
MATH 115 Lecture Notes

University of Waterloo
August 28, 2023
Lecture 1

Complex Numbers – Standard Form


Recall the number systems you know:
Natural Numbers: N = {1, 2, 3, . . .}
Integers: Z = {. . . , −3, −2, −1, 0, 1, 2, 3, . . .}
Rational Numbers: Q = { a/b | a, b ∈ Z, b ≠ 0 }
Real Numbers: R, the set or collection of all rational and irrational numbers
Note that every natural number is an integer, every integer is a rational number (with
denominator equal to 1) and that every rational number is a real number. Consider the
following five equations:
x+3=5 (1)
x+4=3 (2)
2x = 1 (3)
x2 = 2 (4)
x2 = −2 (5)
Equation (1) has solution x = 2, and thus can be solved using natural numbers. Equation (2)
does not have a solution in the natural numbers, but it does have a solution in the integers,
namely x = −1. Equation (3) does not have a solution in the integers, but it does have a
rational solution of x = 1/2. Equation (4) does not have a rational solution, but it does have a
real solution: x = √2. Finally, since the square of any real number is greater than or equal
to zero, Equation (5) does not have a real solution. In order to solve this last equation, we
will need a “larger” set of numbers.

We introduce a bit of notation here. When we write x ∈ R, we mean that the variable x is
a real number. As another example, by p, q ∈ Z, we mean that both p and q are integers.
By x ∉ N, we mean that x is not a natural number.

Definition 1.1: Complex Number in Standard Form


A complex number in standard form is an expression of the form x + yj where x, y ∈ R
and j satisfies j^2 = −1. The set of all complex numbers is denoted by

C = {x + yj | x, y ∈ R}.

Note that mathematicians (and most other humans including the authors of the text) use
i rather than j, however engineers use j since i is often used in the modelling of electric
networks.

Example 1.1
• 3 = 3 + 0j ∈ C

• 4j = 0 + 4j ∈ C

• 3 + 4j ∈ C

• sin(π/7) + π^π j ∈ C

In fact, every x ∈ R can be expressed as x = x + 0j ∈ C, so every real number is a complex
number. However, not every complex number is real, for example, 3 + 4j ∉ R.

We introduce a little bit more notation here. We just mentioned that every real number is a
complex number. We denote this by R ⊆ C and say that R is a subset of C. We also showed
that not every complex number is a real number, which we denote by C ⊈ R and say that
C is not a subset of R. From the definitions of natural numbers, integers, rational numbers,
real numbers and complex numbers, we have

N ⊆ Z ⊆ Q ⊆ R ⊆ C.

Definition 1.2: Real and Imaginary Parts


Let z = x + yj ∈ C with x, y ∈ R. We call x the real part of z and we call y the
imaginary part of z:

x = Re(z) (sometimes written as R(z))


y = Im(z) (sometimes written as I(z))

If y = 0, then z = x is purely real (we normally just say “real”). If x = 0, then z = yj
is purely imaginary.

Example 1.2

• Re(3 − 4j) = 3

• Im(3 − 4j) = −4

It is important to note that Im(3 − 4j) ≠ −4j. By definition, for any z ∈ C we have
Re(z), Im(z) ∈ R, that is, both the real and imaginary parts of a complex number are real
numbers.

Having defined complex numbers, we now look at how the basic algebraic operations of
addition, subtraction, multiplication and division are defined.

Definition 1.3: Equality of Complex Numbers
Two complex numbers z = x + yj and w = u + vj with x, y, u, v ∈ R are equal if and
only if x = u and y = v, that is, if and only if Re(z) = Re(w) and Im(z) = Im(w).

Simply put, two complex numbers are equal if they have the same real parts and the same
imaginary parts.
Definition 1.4: Operations on Complex Numbers
Let z = x + yj and w = u + vj be two complex numbers in standard form. We define
addition, subtraction and multiplication as

z + w = (x + yj) + (u + vj) = (x + u) + (y + v)j


z − w = (x + yj) − (u + vj) = (x − u) + (y − v)j
zw = (x + yj)(u + vj) = (xu − yv) + (xv + yu)j

To add two complex numbers, we simply add the real parts and add the imaginary parts.
Subtraction is done similarly. With our definition of multiplication, we can verify that
j^2 = −1:

j^2 = (j)(j) = (0 + 1j)(0 + 1j) = (0(0) − 1(1)) + (0(1) + 1(0))j = −1 + 0j = −1
There is no need to memorize the formula for multiplication of complex numbers. Using the
fact that j^2 = −1, we can simply do a binomial expansion:

(x + yj)(u + vj) = xu + xvj + yuj + yvj^2
                 = xu + xvj + yuj − yv
                 = (xu − yv) + (xv + yu)j

Example 1.3
Let z = 3 − 2j and w = −2 + j. Compute z + w, z − w and zw. Express your answers
in standard form.
Solution. We have

z + w = (3 − 2j) + (−2 + j) = (3 + (−2)) + (−2 + 1)j = 1 − j


z − w = (3 − 2j) − (−2 + j) = (3 − (−2)) + (−2 − 1)j = 5 − 3j
zw = (3 − 2j)(−2 + j) = −6 + 3j + 4j − 2j^2 = −6 + 3j + 4j + 2 = −4 + 7j

We see that addition, subtraction and multiplication are similar to that of real numbers, just
a little more complicated. We now look at division of complex numbers.

Example 1.4
Let z = x + yj be a nonzero complex number. Show that

1/z = x/(x^2 + y^2) − (y/(x^2 + y^2)) j.

Solution. Since z is nonzero, x and y cannot both be zero. It follows that x − yj ≠ 0.
We have

1/z = 1/(x + yj) = (1/(x + yj)) · ((x − yj)/(x − yj)) = (x − yj)/(x^2 − xyj + yxj − y^2 j^2) = (x − yj)/(x^2 + y^2) = x/(x^2 + y^2) − (y/(x^2 + y^2)) j.

Since x and y are not both zero, x^2 + y^2 > 0, which guarantees that 1/z is defined.

Notice that when we divide by a nonzero complex number x + yj, we multiply both the
numerator and denominator by x − yj. This is because (x + yj)(x − yj) = x^2 + y^2 ∈ R,
which allows us to put the quotient into standard form. We can now divide any complex
number by any nonzero complex number.

Exercise 1.1
With z = 3 − 2j and w = −2 + j, compute z/w in standard form.

Solution. We have

z/w = (3 − 2j)/(−2 + j) = ((3 − 2j)/(−2 + j)) · ((−2 − j)/(−2 − j)) = (−6 − 3j + 4j + 2j^2)/(4 + 2j − 2j − j^2) = (−8 + j)/(4 + 1) = −8/5 + (1/5)j.

Exercise 1.2
Express

((1 − 2j) − (3 + 4j))/(5 − 6j)

in standard form.

Solution. We carry out our operations as we would with real numbers.

((1 − 2j) − (3 + 4j))/(5 − 6j) = (−2 − 6j)/(5 − 6j)
                               = ((−2 − 6j)/(5 − 6j)) · ((5 + 6j)/(5 + 6j))
                               = (−10 − 12j − 30j − 36j^2)/(25 + 36)
                               = (26 − 42j)/61
                               = 26/61 − (42/61)j
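Although it is not part of MATH 115, these by-hand calculations are easy to check numerically. The sketch below uses Python’s built-in complex type, which happens to write the imaginary unit with the same j that engineers use; it is offered only as an optional aid.

# Quick check of Example 1.3 and Exercises 1.1 and 1.2 (Python)
z = 3 - 2j
w = -2 + 1j

print(z + w, z - w, z * w)               # (1-1j) (5-3j) (-4+7j)
print(z / w)                             # (-1.6+0.2j), i.e. -8/5 + (1/5)j
print(((1 - 2j) - (3 + 4j)) / (5 - 6j))  # approximately 26/61 - (42/61)j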

Note that for z ∈ C, we have z^1 = z, and for any integer k ≥ 2, z^k = z(z^(k−1)). For z ≠ 0
(here, 0 = 0 + 0j), z^0 = 1. As usual, 0^0 is undefined. For any z ∈ C with z ≠ 0, we have
z^(−k) = 1/z^k for any positive integer k. In particular, z^(−1) = 1/z for z ≠ 0.

We now summarize the rules of arithmetic in C. Notice that we’ve encountered some of
these rules already.

Theorem 1.1: Properties of Arithmetic in C


Let u, v, z ∈ C. Then

(1) (u + v) + z = u + (v + z) addition is associative

(2) u + v = v + u addition is commutative

(3) z + 0 = z 0 is the additive identity

(4) z + (−z) = 0 −z is the additive inverse of z

(5) (uv)z = u(vz) multiplication is associative

(6) uv = vu multiplication is commutative

(7) z(1) = z 1 is the multiplicative identity

(8) for z ≠ 0, z^(−1) z = 1    z^(−1) is the multiplicative inverse of z ≠ 0

(9) z(u + v) = zu + zv distributive law

Lecture 2

Example 2.1

Find all z ∈ C satisfying z^2 = −7 + 24j.


Solution. Let z = a + bj with a, b ∈ R. Then

z^2 = (a + bj)^2 = a^2 − b^2 + 2abj = −7 + 24j.

Equating real and imaginary parts gives

a^2 − b^2 = −7 (6)
2ab = 24 (7)

From (7), we have that a, b ≠ 0, so b = 24/(2a) = 12/a. Substituting b = 12/a into (6) gives

a^2 − (12/a)^2 = −7
a^2 − 144/a^2 = −7
a^4 + 7a^2 − 144 = 0
(a^2 + 16)(a^2 − 9) = 0
(a^2 + 16)(a + 3)(a − 3) = 0.

Since a ∈ R, a^2 + 16 > 0, so we conclude that a + 3 = 0 or a − 3 = 0, which gives a = 3
or a = −3. Since b = 12/a, b = 12/3 = 4 or b = 12/(−3) = −4. Thus z = 3 + 4j or z = −3 − 4j.

Exercise 2.1
Find all z ∈ C satisfying z^2 = −2.

Solution. Let z = a + bj with a, b ∈ R. Then

z^2 = (a + bj)^2 = a^2 − b^2 + 2abj = −2.

Equating real and imaginary parts gives

a^2 − b^2 = −2 (8)
2ab = 0 (9)

From (9) we see that a = 0 or b = 0. If a = 0 then (8) reduces to −b^2 = −2, that is, b^2 = 2.
Hence b = √2 or b = −√2. In this case, z = √2 j or z = −√2 j. On the other hand, if b = 0
then a^2 = −2, which has no solutions since a ∈ R implies that a^2 ≥ 0. Thus z = √2 j and
z = −√2 j are the only solutions.
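For readers who want a numerical check of such square-root computations, Python’s cmath module can be used; cmath.sqrt returns one of the two square roots and the other is its negative. This is only a sanity check, not a solution method for the course.

import cmath

# One square root of -7 + 24j; the other is its negative (Example 2.1).
print(cmath.sqrt(-7 + 24j))        # (3+4j), up to rounding
print((3 + 4j)**2, (-3 - 4j)**2)   # (-7+24j) (-7+24j)

# A square root of -2 (Exercise 2.1); the other is its negative.
print(cmath.sqrt(-2))              # approximately 1.4142135623730951j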

Conjugate and Modulus


We introduce the conjugate and modulus of a complex number and state their basic prop-
erties.
Definition 2.1: Complex Conjugate
The complex conjugate of z = x + yj with x, y ∈ R is z̄ = x − yj.

Example 2.2

• the conjugate of 1 + 3j is 1 − 3j

• the conjugate of √2 j is −√2 j

• the conjugate of −4 is −4

Theorem 2.1: Properties of Conjugates


Let z, w ∈ C with z = x + yj where x, y ∈ R. Then

(1) the conjugate of z̄ is z

(2) z + z̄ = 2x = 2Re(z)

(3) z − z̄ = 2yj = 2j Im(z)

(4) z ∈ R ⇐⇒ z̄ = z

(5) z is purely imaginary ⇐⇒ z̄ = −z

(6) the conjugate of z + w is z̄ + w̄

(7) the conjugate of zw is z̄ w̄

(8) the conjugate of z/w is z̄/w̄, provided w ≠ 0

(9) the conjugate of z^k is (z̄)^k for k ∈ Z (k > 0 if z = 0)

(10) z z̄ = x^2 + y^2

Note that “⇐⇒” means “if and only if”.

Proof. Let z, w ∈ C with z = x + yj and w = u + vj where x, y, u, v ∈ R.

(1) The conjugate of z̄ is the conjugate of x − yj = x + (−y)j, which is x − (−y)j = x + yj = z.

(2) z + z̄ = (x + yj) + (x − yj) = 2x = 2Re(z).

(3) z − z̄ = (x + yj) − (x − yj) = 2yj = 2j Im(z).

(4) z̄ = z ⇐⇒ x − yj = x + yj ⇐⇒ 2yj = 0 ⇐⇒ y = 0 ⇐⇒ Im(z) = 0 ⇐⇒ z ∈ R.

(5) z̄ = −z ⇐⇒ 2z = z − z̄ = 2j Im(z) by (3) ⇐⇒ z = j Im(z) ⇐⇒ z is purely imaginary.

(6) The conjugate of z + w = (x + u) + (y + v)j is (x + u) − (y + v)j = (x − yj) + (u − vj) = z̄ + w̄.

(7) We have zw = (x + yj)(u + vj) = (xu − yv) + (xv + yu)j, so the conjugate of zw is
(xu − yv) − (xv + yu)j. On the other hand,

z̄ w̄ = (x − yj)(u − vj) = (xu − yv) + (−xv − yu)j = (xu − yv) − (xv + yu)j,

from which it follows that the conjugate of zw equals z̄ w̄.

(8) For w ≠ 0,

1/w = 1/(u + vj) = u/(u^2 + v^2) − (v/(u^2 + v^2))j,

so the conjugate of 1/w is u/(u^2 + v^2) + (v/(u^2 + v^2))j. Also,

1/w̄ = 1/(u − vj) = u/(u^2 + v^2) + (v/(u^2 + v^2))j,

so the conjugate of 1/w equals 1/w̄. Now, using (7), the conjugate of z/w = z(1/w) is
z̄ times the conjugate of 1/w, which is z̄(1/w̄) = z̄/w̄.

(9) This requires a proof technique called induction which we do not cover in MATH 115.

(10) z z̄ = (x + yj)(x − yj) = x^2 + xyj − xyj − y^2 j^2 = x^2 + y^2.

Definition 2.2: Modulus

The modulus of z = x + yj with x, y ∈ R is the nonnegative real number |z| = √(x^2 + y^2).

Example 2.3
• |1 + j| = √(1^2 + 1^2) = √2

• |3j| = √(0^2 + 3^2) = √9 = 3

• |−4| = √((−4)^2 + 0^2) = √16 = 4

For x ∈ R, we know that since R ⊆ C, x ∈ C. Thus the modulus of x is given by

|x| = |x + 0j| = √(x^2 + 0^2) = √(x^2) = |x|,

where the left-hand |x| is the modulus and the right-hand |x| is the absolute value.

We see that for real numbers x, the modulus of x is the absolute value of x. Thus the
modulus is the extension of the absolute value to the complex numbers which is why we
have chosen the same notation for the modulus as the absolute value. We will see shortly
that the modulus of a complex number can be interpreted as the size or magnitude of that
complex number, just like the absolute value of a real number can be interpreted as the size
or magnitude of that real number.

Note that for real numbers x, y, we always have x ≤ y or y ≤ x, that is, we can always directly
compare two real numbers. The same is not true for complex numbers. For example, we
cannot say 1 + j ≤ 3j nor can we say 3j ≤ 1 + j. However, the modulus does give us a way
to indirectly compare complex numbers: we can say that |1 + j| = √2 < 3 = |3j|.

Theorem 2.2: Properties of Modulus


Let z, w ∈ C. Then

(1) |z| = 0 ⇐⇒ z = 0

(2) |z̄| = |z|

(3) z z̄ = |z|^2

(4) |zw| = |z||w|

(5) |z/w| = |z|/|w| provided w ≠ 0

(6) |z + w| ≤ |z| + |w|, which is known as the Triangle Inequality

Proof. Let z, w ∈ C.

(1) Assume first that z = 0. Then |z| = √(0^2 + 0^2) = 0. Assume now that z = x + yj is
such that |z| = 0. Then √(x^2 + y^2) = 0 and so x^2 + y^2 = 0. It follows that x = y = 0
and so z = 0.

(2) |z̄| = |x − yj| = |x + (−y)j| = √(x^2 + (−y)^2) = √(x^2 + y^2) = |z|.

(3) Using Theorem 2.1(10), we have z z̄ = x^2 + y^2 = |z|^2.

(4) We have

|zw|^2 = (zw)(the conjugate of zw)    by (3)
       = z w z̄ w̄                     by Theorem 2.1(7)
       = (z z̄)(w w̄)
       = |z|^2 |w|^2                  by (3)
       = (|z||w|)^2

Thus |zw|^2 = (|z||w|)^2. Since the modulus of a complex number is never negative, we
can take square roots of both sides to obtain |zw| = |z||w|.

(5) Let w ≠ 0. Using (4) and Theorem 2.1(8), we have

|z/w|^2 = (z/w)(the conjugate of z/w) = (z/w)(z̄/w̄) = (z z̄)/(w w̄) = |z|^2/|w|^2.

Since the modulus of a complex number is never negative, we can take square roots of
both sides to obtain |z/w| = |z|/|w|.

(6) Left as an exercise.

Note that for a complex number z ≠ 0, the modulus and the complex conjugate give us a
nice way to write z^(−1):

z^(−1) = 1/z = z̄/(z z̄) = z̄/|z|^2.
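The conjugate, the modulus and the formula z^(−1) = z̄/|z|^2 are easy to experiment with in Python, where z.conjugate() gives z̄ and abs(z) gives |z|. A small optional check:

z = 3 - 4j

print(z.conjugate())                       # (3+4j)
print(abs(z))                              # 5.0, since |3 - 4j| = sqrt(9 + 16)
print(z * z.conjugate())                   # (25+0j), which equals |z|^2
print(1 / z, z.conjugate() / abs(z)**2)    # both equal (0.12+0.16j)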

Geometry
Visually, we interpret the set of real numbers as a line. Given that R ⊆ C and that there
are complex numbers that are not real, the set of complex numbers should be “bigger” than
a line. In fact, the set of complex numbers is a plane, much like the xy–plane as shown in
Figure 2.1. We “identify” the complex number x + yj ∈ C with the point (x, y) ∈ R2 . In
this sense, the complex plane is simply a “relabelling” of the xy–plane. The x–axis in the
xy–plane corresponds to the real axis in the complex plane which contains the real numbers.
The y–axis of the xy–plane corresponds to the imaginary axis in the complex plane which
contains the purely imaginary numbers. Note we will often label the real axis as “Re” and
the imaginary axis as “Im”.

(a) The xy−plane, known as R2 . (b) The complex plane C.

Figure 2.1: The xy−plane and the complex plane.

We also have a geometric interpretation of the complex conjugate and the modulus as well
which is shown in Figure 2.2.

Figure 2.2: Visually interpreting the complex conjugate and the modulus of a
complex number.

For z ∈ C, we see that z̄ is a reflection of z in the real axis and that |z| is the distance
between 0 and z. Also note that any complex number w lying on the green circle in Figure
2.2 satisfies |w| = |z|. If w is inside the green circle, then |w| < |z|, and if w is outside the
green circle, then |w| > |z|.

We also gain a geometric interpretation of addition:

Figure 2.3: Visually interpreting complex addition.

We see that the complex numbers 0, z, w and z+w form a parallelogram with the line segment
between 0 and z + w as one of the diagonals. Finally, we look at the triangle determined by
0, z and z + w.

Figure 2.4: Visualizing the Triangle Inequality.

Since the length of any one side of a triangle cannot exceed the sum of the other two sides

(or else the triangle wouldn’t “close”), we must have

|z + w| ≤ |z| + |w|.

Note that Figure 2.4 does not constitute a proof of the Triangle Inequality.

Lecture 3

Complex Numbers – Polar Form


We now look at another way that we can represent complex numbers that will help us gain
a geometric understanding of complex multiplication. Consider a nonzero complex number
z = x + yj in standard form. Let r = |z| > 0 and let θ denote the angle the line segment
from 0 to z makes with the positive real axis, measured counterclockwise. We refer to r > 0
as the radius of z and θ as an argument of z.

Figure 3.1: A complex number with its radius and an argument.


Given z = x + yj ≠ 0 in standard form, we compute r = |z| = √(x^2 + y^2) > 0 and we compute
θ using

cos θ = x/r and sin θ = y/r.
It follows that
x = r cos θ and y = r sin θ
from which we obtain

z = x + yj = (r cos θ) + (r sin θ)j = r(cos θ + j sin θ).



Note that |cos θ + j sin θ| = √(cos^2 θ + sin^2 θ) = 1, and as a result, we may understand an
argument of a complex number z as giving us a point on a circle of radius 1 to move towards
(measured counterclockwise from the positive real axis), while r > 0 tells us how far
to move in that direction to reach z. This is illustrated in Figure 3.2.

Figure 3.2: Using r and θ to locate a complex number. Here, r > 1.

So far, we have considered complex numbers z ≠ 0. For z = 0, it is clear that r = 0, with
any θ ∈ R serving as an argument for z = 0 since 0 = 0(cos θ + j sin θ) for any θ ∈ R.

Definition 3.1: Polar Form


The polar form of a complex number z is given by

z = r(cos θ + j sin θ)

where r = |z| and θ is an argument of z.

We typically write cos θ + j sin θ rather than cos θ + (sin θ)j to avoid the extra brackets. For
standard form, we still write x + yj and not x + jy. Note that unlike standard form, z does
not have a unique polar form. Recall that for any k ∈ Z,

cos θ = cos(θ + 2kπ) and sin θ = sin(θ + 2kπ)

so

r(cos θ + j sin θ) = r(cos(θ + 2kπ) + j sin(θ + 2kπ))

for any k ∈ Z.

Example 3.1
Write the following complex numbers in polar form.

(1) 1 + √3 j

(2) 7 + 7j

Solution.

(1) We have r = |1 + √3 j| = √(1^2 + (√3)^2) = √(1 + 3) = √4 = 2. Thus, factoring
r = 2 out of 1 + √3 j gives

1 + √3 j = 2(1/2 + (√3/2) j).

As this is of the form r(cos θ + j sin θ), we have that cos θ = 1/2 and sin θ = √3/2.
We thus take θ = π/3 so

1 + √3 j = 2(cos(π/3) + j sin(π/3)).

(2) Since r = |7 + 7j| = √(7^2 + 7^2) = √(2(49)) = 7√2, we have that

7 + 7j = 7√2 (7/(7√2) + (7/(7√2)) j) = 7√2 (1/√2 + (1/√2) j)

so cos θ = 1/√2 = √2/2 and sin θ = √2/2. Thus we take θ = π/4 to obtain

7 + 7j = 7√2 (cos(π/4) + j sin(π/4)).

Note that we can add 2π to either of our above arguments to obtain

1 + √3 j = 2(cos(7π/3) + j sin(7π/3))
7 + 7j = 7√2 (cos(9π/4) + j sin(9π/4))

which verifies that the polar form of a complex number is not unique. Normally, we choose
our arguments θ such that 0 ≤ θ < 2π or −π < θ ≤ π to avoid this problem.

Converting from standard form to polar form is a bit computational, however the next ex-
ample shows it is quite easy to convert from polar form back to standard form.

Example 3.2
Write 3(cos(5π/6) + j sin(5π/6)) in standard form.

Solution. We have

3(cos(5π/6) + j sin(5π/6)) = 3(−√3/2 + (1/2)j) = −3√3/2 + (3/2)j.
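Both directions of the conversion can be checked numerically with Python’s cmath module: cmath.polar returns the pair (r, θ) and cmath.rect(r, θ) returns r(cos θ + j sin θ). A small sketch verifying Examples 3.1 and 3.2, offered only as an optional aid:

import cmath, math

print(cmath.polar(1 + math.sqrt(3) * 1j))   # (2.0, 1.047...), and pi/3 = 1.047...
print(cmath.polar(7 + 7j))                  # (9.899..., 0.785...), i.e. (7*sqrt(2), pi/4)
print(cmath.rect(3, 5 * math.pi / 6))       # approx (-2.598+1.5j) = -3*sqrt(3)/2 + (3/2)j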

As mentioned, polar form is useful for complex multiplication. To see how, we begin by
recalling the angle sum formulas

cos(θ1 + θ2 ) = cos θ1 cos θ2 − sin θ1 sin θ2


sin(θ1 + θ2 ) = sin θ1 cos θ2 + cos θ1 sin θ2

If
z1 = r1 (cos θ1 + j sin θ1 ) and z2 = r2 (cos θ2 + j sin θ2 )
are two complex numbers in polar form, then
 
z1 z2 = (r1(cos θ1 + j sin θ1))(r2(cos θ2 + j sin θ2))
      = r1 r2 (cos θ1 + j sin θ1)(cos θ2 + j sin θ2)
      = r1 r2 ((cos θ1 cos θ2 − sin θ1 sin θ2) + j(sin θ1 cos θ2 + cos θ1 sin θ2))
      = r1 r2 (cos(θ1 + θ2) + j sin(θ1 + θ2))

Thus

z1 z2 = r1 r2 (cos(θ1 + θ2) + j sin(θ1 + θ2)).
This now allows us to understand polar multiplication geometrically. Given a complex
number z = r(cos θ + j sin θ), multiplying by z can be viewed as a counterclockwise rotation
by θ about the number 0 in the complex plane, and a scaling by a factor of r. This is
illustrated in Figure 3.3. Note that a counterclockwise rotation by θ is a clockwise rotation
by −θ. Thus, if θ = −π/4 for example, then multiplication by z can be viewed as a clockwise
rotation by π/4 (plus a scaling by a factor of r).
Recall that multiplying complex numbers in standard form requires a binomial expansion
which can be tedious and error prone by hand. Although it is also tedious to convert a
complex number in standard form to polar form, multiplying complex numbers in polar
form is quite simple. We simply multiply the two moduli together, which is just multiplica-
tion of real numbers, and add the arguments together, which is just addition of real numbers.

Figure 3.3: Multiplication of complex numbers in polar form. Note that in this
image, |z1 |, |z2 | > 1 and θ1 , θ2 > 0.

Example 3.3

Let z1 = 2(cos(π/3) + j sin(π/3)) and z2 = 7√2 (cos(π/4) + j sin(π/4)). Express z1 z2 in polar form.

Solution. We have

z1 z2 = 2(7√2)(cos(π/3 + π/4) + j sin(π/3 + π/4)) = 14√2 (cos(7π/12) + j sin(7π/12)).

Exercise 3.1
Let z1 = r1 (cos θ1 + j sin θ1 ) and z2 = r2 (cos θ2 + j sin θ2 ) be two complex numbers in
polar form with z2 ≠ 0 (from which it follows that r2 ≠ 0). Show that

z1/z2 = (r1/r2)(cos(θ1 − θ2) + j sin(θ1 − θ2)).

Solution. Recall that

cos(θ1 − θ2) = cos θ1 cos θ2 + sin θ1 sin θ2
sin(θ1 − θ2) = sin θ1 cos θ2 − cos θ1 sin θ2

We have

z1/z2 = (r1(cos θ1 + j sin θ1))/(r2(cos θ2 + j sin θ2))
      = (r1/r2) · ((cos θ1 + j sin θ1)/(cos θ2 + j sin θ2)) · ((cos θ2 − j sin θ2)/(cos θ2 − j sin θ2))
      = (r1/r2) · (((cos θ1 cos θ2 + sin θ1 sin θ2) + j(sin θ1 cos θ2 − cos θ1 sin θ2))/(cos^2 θ2 + sin^2 θ2))
      = (r1/r2)(cos(θ1 − θ2) + j sin(θ1 − θ2)).

Powers of Complex Numbers


We now look at computing z^n for any integer n. Let z = r(cos θ + j sin θ) be a complex
number in polar form. Then

z^2 = (r(cos θ + j sin θ))(r(cos θ + j sin θ))
    = r^2 (cos(θ + θ) + j sin(θ + θ))
    = r^2 (cos(2θ) + j sin(2θ)).

A similar computation shows that

z^3 = z^2 z = r^3 (cos(3θ) + j sin(3θ)).

Continuing with this process, it appears that for any positive integer n,

z^n = r^n (cos(nθ) + j sin(nθ)).

Exercise 3.2
For z = r(cos θ + j sin θ) ≠ 0, show that

z^(−1) = 1/z = (1/r)(cos(−θ) + j sin(−θ)).

Solution. Note that |1| = 1 and θ = 0 is an argument for 1. Using the result of Exercise 3.1,
we have

z^(−1) = 1/z = (1(cos 0 + j sin 0))/(r(cos θ + j sin θ)) = (1/r)(cos(0 − θ) + j sin(0 − θ)) = (1/r)(cos(−θ) + j sin(−θ)).

Exercise 3.2 shows that z^n = r^n (cos(nθ) + j sin(nθ)) holds for n = −1 as well. We have the
following important result.

Theorem 3.1: de Moivre’s Theorem
If z = r(cos θ + j sin θ) ≠ 0, then

z^n = r^n (cos(nθ) + j sin(nθ))

for any n ∈ Z.

Since de Moivre’s Theorem is stated for n ∈ Z, we have to allow for n ≤ 0 and thus the
restriction that z ≠ 0. It is easy to verify that de Moivre’s Theorem holds for z = 0 provided
n ≥ 1 since z^n = 0 in this case. The proof of de Moivre’s Theorem again requires induction
so is not included here.
Example 3.4

Compute (2 + 2j)^7 using de Moivre’s Theorem and express your answer in standard
form.

Solution. We have r = |2 + 2j| = √(4 + 4) = √(2(4)) = 2√2 and so

2 + 2j = 2√2 (2/(2√2) + (2/(2√2)) j) = 2√2 (√2/2 + (√2/2) j)

from which we find θ = π/4. Thus

2 + 2j = 2√2 (cos(π/4) + j sin(π/4)).

Then

(2 + 2j)^7 = (2√2 (cos(π/4) + j sin(π/4)))^7
           = (2√2)^7 (cos(7π/4) + j sin(7π/4))    by de Moivre’s Theorem
           = 1024√2 (√2/2 − (√2/2) j)
           = 1024 − 1024j
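De Moivre’s Theorem is also easy to check numerically. The sketch below recomputes Example 3.4 both directly and through the polar form (an optional check only, subject to the usual floating-point rounding):

import cmath

z = 2 + 2j
r, theta = cmath.polar(z)                # r = 2*sqrt(2), theta = pi/4

direct = z**7
via_polar = cmath.rect(r**7, 7 * theta)  # r^7 (cos(7θ) + j sin(7θ))

print(direct)      # (1024-1024j)
print(via_polar)   # approximately (1024-1024j)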

Exercise 3.3
Compute (1/2 + (√3/2) j)^602 and express your answer in standard form.

Solution. Since r = |1/2 + (√3/2) j| = √(1/4 + 3/4) = 1, we see that

1/2 + (√3/2) j = cos(π/3) + j sin(π/3).

Thus

(1/2 + (√3/2) j)^602 = (cos(π/3) + j sin(π/3))^602
                     = cos(602π/3) + j sin(602π/3)    by de Moivre’s Theorem
                     = cos(2π/3) + j sin(2π/3)
                     = −1/2 + (√3/2) j
It is hopefully apparent that trigonometry will play a role here, so we include the unit circle
in the complex plane. Note that in MATH 115, we use radians to measure angles as opposed
to degrees.

Figure 3.4: The unit circle in the complex plane.

Lecture 4

Complex nth Roots


Let n be a positive integer and z ∈ C. We have seen that we can use polar form to compute
z^n rather easily, but suppose instead that we are asked to find all w ∈ C such that w^n = z.
We refer to such a w as an nth root of z. Example 2.1 and Exercise 2.1 illustrate one way of
solving this problem when n = 2, but for larger n, this method becomes much more difficult.
Again, polar form will help us. Of course, if z = 0, then w^n = 0 implies that w = 0, so we
assume z ≠ 0.

Let z = r(cos θ + j sin θ) and let w = R(cos φ + j sin φ). From w^n = z we have

(R(cos φ + j sin φ))^n = r(cos θ + j sin θ).

Using de Moivre’s Theorem, we obtain

R^n (cos(nφ) + j sin(nφ)) = r(cos θ + j sin θ).

From this we find that

R^n = r and nφ = θ + 2kπ

for some k ∈ Z. To understand this, notice that since w^n = z, it must be the case that w^n
and z have the same modulus and so R^n = r, and that any argument of w^n must be equal
to an argument of z plus some integer multiple of 2π. Solving for R and φ gives

R = r^(1/n) and φ = (θ + 2kπ)/n

for some k ∈ Z. Here, r^(1/n) is the nth root of the real number r and is evaluated in the
normal way. Thus, for any k ∈ Z, let

w_k = r^(1/n) (cos((θ + 2kπ)/n) + j sin((θ + 2kπ)/n)).

Then

w_k^n = (r^(1/n) (cos((θ + 2kπ)/n) + j sin((θ + 2kπ)/n)))^n
      = (r^(1/n))^n (cos(n(θ + 2kπ)/n) + j sin(n(θ + 2kπ)/n))    by de Moivre’s Theorem
      = r(cos(θ + 2kπ) + j sin(θ + 2kπ))
      = r(cos θ + j sin θ)
      = z.

Hence w_k^n = z for any integer k. It is tempting to think that there will be infinitely many
solutions to w^n = z, but in fact we obtain exactly n solutions.

Theorem 4.1
Let z = r(cos θ + j sin θ) be nonzero, and let n be a positive integer. Then the n
distinct nth roots of z are given by
    
w_k = r^(1/n) (cos((θ + 2kπ)/n) + j sin((θ + 2kπ)/n))

for k = 0, 1, . . . , n − 1.

A few examples will show why we only need to consider k = 0, 1, . . . , n − 1.

Example 4.1

Find the 3rd roots of 1, that is, find all w ∈ C such that w^3 = 1.
Solution. Here, z = 1 and n = 3. In polar form, 1 = 1(cos 0 + j sin 0) so the 3rd roots
of 1 are given by

w_k = 1^(1/3) (cos((0 + 2kπ)/3) + j sin((0 + 2kπ)/3)),  k = 0, 1, 2
    = cos(2kπ/3) + j sin(2kπ/3),  k = 0, 1, 2

Thus

w_0 = cos 0 + j sin 0 = 1
w_1 = cos(2π/3) + j sin(2π/3) = −1/2 + (√3/2) j
w_2 = cos(4π/3) + j sin(4π/3) = −1/2 − (√3/2) j

Thus, the 3rd roots of 1 are given by 1, −1/2 + (√3/2) j and −1/2 − (√3/2) j. This means that

1^3 = (−1/2 + (√3/2) j)^3 = (−1/2 − (√3/2) j)^3 = 1.

If we try to compute w_(−1) and w_3, we find

w_(−1) = cos(−2π/3) + j sin(−2π/3) = −1/2 − (√3/2) j = w_2
w_3 = cos(2π) + j sin(2π) = 1 = w_0

We see that as we increase (resp. decrease) k by 1, we rotate counterclockwise (resp. clock-
wise) by an angle of 2π/3, and thus after doing so three times, we are back where we started.
The 3rd roots of 1 are plotted in Figure 4.1.

Figure 4.1: The 3rd roots of 1.

Example 4.2
Find all 4th roots of −256 in standard form and plot them in the complex plane.
Solution. Here, z = −256 and n = 4. We have that −256 = 256(cos π + j sin π) so the
4th roots are given by

w_k = 256^(1/4) (cos((π + 2kπ)/4) + j sin((π + 2kπ)/4)),  k = 0, 1, 2, 3
    = 4(cos((π + 2kπ)/4) + j sin((π + 2kπ)/4)),  k = 0, 1, 2, 3.

Thus

w_0 = 4(cos(π/4) + j sin(π/4)) = 4(√2/2 + (√2/2) j) = 2√2 + 2√2 j
w_1 = 4(cos(3π/4) + j sin(3π/4)) = 4(−√2/2 + (√2/2) j) = −2√2 + 2√2 j
w_2 = 4(cos(5π/4) + j sin(5π/4)) = 4(−√2/2 − (√2/2) j) = −2√2 − 2√2 j
w_3 = 4(cos(7π/4) + j sin(7π/4)) = 4(√2/2 − (√2/2) j) = 2√2 − 2√2 j

which we plot in the complex plane. Notice again that the roots are evenly spaced out
on a circle of radius 4.

Figure 4.2: The 4th roots of −256.
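The root formula in Theorem 4.1 translates directly into a short computation. The sketch below lists the 4th roots of −256 numerically and confirms that each one raised to the 4th power returns −256 up to rounding; it is only an optional check.

import cmath, math

z = -256
n = 4
r, theta = cmath.polar(z)     # r = 256, theta = pi

roots = [cmath.rect(r**(1/n), (theta + 2*k*math.pi)/n) for k in range(n)]
for w in roots:
    print(w, w**4)            # each w is ±2√2 ± 2√2 j, and w**4 is approximately -256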

Exercise 4.1

Find the 3rd roots of 4 − 4√3 j. Express your answers in polar form.

Solution. Since |4 − 4√3 j| = 4|1 − √3 j| = 4√(1 + 3) = 4(2) = 8, we have

4 − 4√3 j = 8(4/8 − (4√3/8) j) = 8(1/2 − (√3/2) j) = 8(cos(5π/3) + j sin(5π/3)).

Thus, the 3rd roots are given by

w_k = 8^(1/3) (cos((5π/3 + 2kπ)/3) + j sin((5π/3 + 2kπ)/3)),  k = 0, 1, 2
    = 2(cos((5π + 6kπ)/9) + j sin((5π + 6kπ)/9)),  k = 0, 1, 2

so

w_0 = 2(cos(5π/9) + j sin(5π/9))
w_1 = 2(cos(11π/9) + j sin(11π/9))
w_2 = 2(cos(17π/9) + j sin(17π/9))

Note that complex numbers with arguments such as 5π/9, 11π/9 and 17π/9 are difficult to
write in standard form without a calculator.

The Complex Exponential


In this section, we introduce the notation ejθ and briefly look at how it relates to polar form.

Definition 4.1: Euler’s Formula


Let θ ∈ R. The expression e^(jθ) is defined as e^(jθ) = cos θ + j sin θ.

The expression e^(jθ) is actually an exponential function, but we do not formally define how the
exponential function behaves when the variable is complex in MATH 115. With the tools
gained in MATH 118 or MATH 119, you will be able to verify that Euler’s Formula indeed
holds. Given that the left hand side of Euler’s Formula looks like an exponential function
while the right hand side is a trigonometric function, it is quite remarkable that the two
quantities are equal.

If z = r(cos θ + j sin θ) is the polar form of z ∈ C, then z = re^(jθ) is the complex exponential
form of z. As the sine and cosine functions are 2π−periodic, the complex exponential form
is not unique:

re^(jθ) = r(cos θ + j sin θ) = r(cos(θ + 2kπ) + j sin(θ + 2kπ)) = re^(j(θ + 2kπ)) for any k ∈ Z.

Taking r = 1, we see that e^(jθ) = e^(j(θ + 2kπ)) for any k ∈ Z, so e^(jθ) is 2π−periodic, that is, it
oscillates like trigonometric functions!

Let z1 = r1 e^(jθ1) and z2 = r2 e^(jθ2). Then

z1 z2 = (r1 e^(jθ1))(r2 e^(jθ2))
      = r1(cos θ1 + j sin θ1) r2(cos θ2 + j sin θ2)
      = r1 r2 (cos(θ1 + θ2) + j sin(θ1 + θ2))
      = r1 r2 e^(j(θ1 + θ2))

In particular, taking r1 = r2 = 1 gives

e^(jθ1) e^(jθ2) = e^(j(θ1 + θ2))

which is consistent with the rules of multiplication of exponential functions!

Let z = re^(jθ) and let n ∈ Z. Then using de Moivre’s Theorem we have

(re^(jθ))^n = (r(cos θ + j sin θ))^n = r^n (cos(nθ) + j sin(nθ)) = r^n e^(j(nθ)).

Taking r = 1 gives

(e^(jθ))^n = e^(j(nθ))

which is again consistent with the rules of powers of exponential functions!

Now consider e^(jπ) = cos π + j sin π = −1. It follows that

e^(jπ) + 1 = 0

which is known as Euler’s Identity and is often regarded as the most beautiful equation in
mathematics because it combines some of the most important quantities mathematicians use
into one tidy little equation:

e − irrational number appearing all over mathematics, particularly in differential equations


π − irrational number important for trigonometry
j − most famous nonreal complex number
1 − the multiplicative identity
0 − the additive identity

Unless specifically asked otherwise, you may use complex exponential form instead of polar
form.
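Euler’s Formula and Euler’s Identity can both be checked numerically with Python’s cmath.exp, since exp(1j*θ) evaluates e^(jθ). A small sketch, again only as an aid:

import cmath, math

theta = 5 * math.pi / 6
print(cmath.exp(1j * theta))                       # approx (-0.866+0.5j)
print(complex(math.cos(theta), math.sin(theta)))   # the same value: cos θ + j sin θ

print(cmath.exp(1j * math.pi) + 1)                 # approximately 0 (about 1.22e-16j)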

Example 4.3
Find the 6th roots of −64 in standard form.
Solution. Since −64 = 64(cos π + j sin π) = 64e^(jπ), the 6th roots are given by

w_k = 64^(1/6) e^(j(π + 2kπ)/6) = 2e^(j(π + 2kπ)/6),  k = 0, 1, 2, 3, 4, 5.

Thus,

w_0 = 2e^(jπ/6) = 2(√3/2 + (1/2)j) = √3 + j
w_1 = 2e^(jπ/2) = 2(0 + j) = 2j
w_2 = 2e^(j5π/6) = 2(−√3/2 + (1/2)j) = −√3 + j
w_3 = 2e^(j7π/6) = 2(−√3/2 − (1/2)j) = −√3 − j
w_4 = 2e^(j3π/2) = 2(0 − j) = −2j
w_5 = 2e^(j11π/6) = 2(√3/2 − (1/2)j) = √3 − j

In a course on complex analysis, one often begins by studying well-known functions from
calculus while allowing the variable to be complex. Given z ∈ C, one considers functions
such as e^z, sin z, cos z, tan z, ln z and √z. As our work above suggests, these functions
behave quite differently when the variable is allowed to be complex. As an example of how
different the behaviour is, it can be shown that there exist infinitely many z ∈ C such that
sin z = w where w is any given complex number (say, w = 7). However, this is a topic for
another course.

Lecture 5

Complex Polynomials
Definition 5.1: Polynomial

A polynomial p(x) of degree n is given by the expression

a_n x^n + a_(n−1) x^(n−1) + · · · + a_1 x + a_0

with a_n ≠ 0. We call x the variable and a_0, a_1, . . . , a_n the coefficients. We often denote
a polynomial by p and write p(x) = a_n x^n + a_(n−1) x^(n−1) + · · · + a_1 x + a_0. A number c is
a root (or a zero) of p(x) if p(c) = 0, or equivalently, if x − c is a factor of p(x).

In high school, you likely studied polynomials in the case a0 , a1 , . . . , an ∈ R. Here we will
briefly introduce polynomials where a0 , a1 , . . . , an ∈ C.

Definition 5.2: Real and Complex Polynomials

Let p(x) = a_n x^n + a_(n−1) x^(n−1) + · · · + a_1 x + a_0. If a_0, a_1, . . . , a_n ∈ R, then we call p(x)
a real polynomial, and if a_0, a_1, . . . , a_n ∈ C, then we call p(x) a complex polynomial.
Since R ⊆ C, every real polynomial is also a complex polynomial.

For a polynomial with non-real coefficients, we often use the variable z in place of x.

Example 5.1

The polynomial p(z) = jz^3 − (1 − j)z^2 + 3z + (4 − j) is a complex polynomial of degree
3 with coefficients

a_3 = j, a_2 = −(1 − j), a_1 = 3 and a_0 = 4 − j.

We define the basic operations of complex polynomials. Since every real polynomial is a
complex polynomial, these results also hold for real polynomials and should be familiar from
high school.

Definition 5.3: Equality, Addition and Scalar Multiplication of Polynomials

Let p(z) and q(z) be two complex polynomials with

p(z) = a_n z^n + a_(n−1) z^(n−1) + · · · + a_1 z + a_0
q(z) = b_n z^n + b_(n−1) z^(n−1) + · · · + b_1 z + b_0

for some a_0, . . . , a_n, b_0, . . . , b_n ∈ C. We say that

• p(z) and q(z) are equal if a_i = b_i for i = 0, 1, . . . , n,

and we define

• (p + q)(z) = p(z) + q(z) = (a_n + b_n)z^n + (a_(n−1) + b_(n−1))z^(n−1) + · · · + (a_1 + b_1)z + (a_0 + b_0)

• (kp)(z) = kp(z) = ka_n z^n + ka_(n−1) z^(n−1) + · · · + ka_1 z + ka_0 for any k ∈ C.

Definition 5.3 makes no mention of the degree of p(z) and q(z), that is, we don’t assume
that a_n ≠ 0 or that b_n ≠ 0. Of course, in order for p(z) and q(z) to be equal, they must
have the same degree. However we can add polynomials of different degrees. For example,
if p(z) = z^2 + 1 and q(z) = z^3 then we can write these as p(z) = 0z^3 + 1z^2 + 1 and
q(z) = 1z^3 + 0z^2 + 0 to get that (p + q)(z) = z^3 + z^2 + 1.

A real polynomial need not have a real root: consider p(x) = x^2 + 1 as an example. How-
ever p(x) is also a complex polynomial with two complex roots x = j and x = −j since
p(±j) = (±j)^2 + 1 = −1 + 1 = 0. The Fundamental Theorem of Algebra states that every
non-constant complex polynomial will have at least one complex root. The proof of this
Theorem requires a bit more knowledge of polynomials and is thus omitted.

Theorem 5.1: Fundamental Theorem of Algebra

Let p(z) be a complex polynomial of degree at least 1. Then p(z) has at least 1 complex
root.

The next Corollary follows from the Fundamental Theorem of Algebra and says that we can
expect the number of roots of a complex polynomial, counting multiplicities, to be equal to
the degree of the polynomial provided this degree is at least 1. By counting multiplicities,
we mean that we count repeated roots of a polynomial according to how many times they
appear as a root. For example, the third-degree complex polynomial p(z) = (z − j)^2 (z − 1)
has two distinct roots, j and 1, but note that j appears as a double root, so we say that p(z)
has three roots counting multiplicities.

Corollary 5.1

Let p(z) be a complex polynomial of degree n ≥ 1. Then p(z) has exactly n complex
roots, counting multiplicities.

We noted above that the real polynomial p(x) = x^2 + 1 has complex roots ±j. That
these two roots are complex conjugates of one another is not a coincidence.

Theorem 5.2: Conjugate Root Theorem

Let p(x) = a_n x^n + a_(n−1) x^(n−1) + · · · + a_1 x + a_0 be a real polynomial. If w ∈ C is a root
of p(x), then so too is w̄.

Proof. Let p(x) = a_n x^n + a_(n−1) x^(n−1) + · · · + a_1 x + a_0 be a real polynomial and suppose w ∈ C
is a root of p(x). Then p(w) = 0, that is,

a_n w^n + a_(n−1) w^(n−1) + · · · + a_1 w + a_0 = 0.

Taking complex conjugates of both sides, using the properties of conjugates from Theorem 2.1
(the conjugate of a sum is the sum of the conjugates, the conjugate of a product is the product
of the conjugates, and the conjugate of a power is the power of the conjugate) and the fact
that 0, a_0, a_1, . . . , a_n ∈ R are equal to their own conjugates, we obtain

a_n w̄^n + a_(n−1) w̄^(n−1) + · · · + a_1 w̄ + a_0 = 0.

Thus p(w̄) = 0 so w̄ is a root of p(x).

Example 5.2

Let p(x) = x^3 + 16x. If p(x) = 0, then 0 = x^3 + 16x = x(x^2 + 16). Thus x = 0 or
x^2 + 16 = 0. For x^2 + 16 = 0, we can use the quadratic formula:

x = (−0 ± √(0^2 − 4(1)(16)))/(2(1))
  = ±√(−64)/2
  = ±8j/2        since (±8j)^2 = −64
  = ±4j.

Thus the roots of p(x) are 0, 4j and −4j. Note that given any of these roots, the
complex conjugate of that root is also a root of p(x).


It’s worth mentioning the symbol √(−64) from the previous example, which we haven’t ac-
tually defined (and which we won’t define in MATH 115). It is reasonable (and correct)
to assume that any value w we assign to √(−64) should satisfy w^2 = −64. We know from
Theorem 4.1 that there are exactly two solutions to w^2 = −64, which are 8j and −8j. We
have not developed enough theory yet to decide which of 8j and −8j should be called the
square root of −64, but notice that 8j and −8j are negatives of one another so it won’t
actually matter which one we substitute for √(−64).

Note that we require p(x) to be a real polynomial for Theorem 5.2 to hold. The complex
polynomial
p(z) = z^2 + (2 + 3j)z − (5 − j)
has roots 1 − j and −3 − 2j, neither of which is a complex conjugate of the other.

Exercise 5.1
Let z ∈ C and consider the polynomial p(z) = 3z^3 − az^2 − bz + 6b where a, b ∈ R. It
is known that 2 + 2j is a root of p(z). Find a and b as well as the other roots of p(z).
Remember that if w is a root of p(z), then z − w is a factor of p(z).

Solution. Since p(z) has real coefficients and 2 + 2j is a root of p(z), we have that 2 − 2j is
also a root of p(z) by Theorem 5.2. Since p(z) has degree 3, there is a third root of p(z) by
Theorem 5.1. Let w ∈ C be this third root. Then

3z^3 − az^2 − bz + 6b = 3(z − (2 + 2j))(z − (2 − 2j))(z − w)
                      = 3(z^2 − (2 − 2j)z − (2 + 2j)z + 8)(z − w)
                      = 3(z^2 − 4z + 8)(z − w)
                      = 3(z^3 − wz^2 − 4z^2 + 4wz + 8z − 8w)
                      = 3(z^3 − (w + 4)z^2 + (4w + 8)z − 8w)
                      = 3z^3 − (3w + 12)z^2 + (12w + 24)z − 24w

Equating the z coefficients and the constant terms, we see that

−b = 12w + 24 (10)
6b = −24w (11)

From (11), we see that b = −4w and substituting this into (10) gives 4w = 12w + 24.
Simplifying gives 8w = −24, so w = −3. From b = −4w, we now have b = 12. Finally,
equating the z^2 coefficients gives −a = −(3w + 12), so

a = 3w + 12 = 3(−3) + 12 = −9 + 12 = 3.

Thus a = 3, b = 12 and the other two roots are 2 − 2j and −3.
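For polynomials with numerical coefficients, the roots can be checked with numpy.roots, which takes the list of coefficients from the highest power down. Here is a sketch verifying Exercise 5.1 with a = 3 and b = 12 (assuming numpy is available; this is a check, not a solution method for the course):

import numpy as np

# p(z) = 3z^3 - 3z^2 - 12z + 72, i.e. a = 3 and b = 12
print(np.roots([3, -3, -12, 72]))   # approximately [-3, 2+2j, 2-2j] (order may vary)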

A Very Brief Introduction to Proofs
Linear algebra is often the first time students encounter proofs. The goal of this section is
to give a brief introduction to proofs - what to assume, what to prove, and how much detail
should be given. Consider the following:

Let z1, z2 ∈ C. Prove that if |z1| = |z2| = 1, then 1/z1 + 1/z2 = z̄1 + z̄2.

We are told that z1, z2 are arbitrary complex numbers. The statement

if |z1| = |z2| = 1, then 1/z1 + 1/z2 = z̄1 + z̄2

is what we have to prove. Statements of the form “if . . . then . . .” are called implications.
The part following the word “if” is called the hypothesis, while the part following the word
“then” is called the conclusion. To prove an implication (that is, to prove the implication
is always true), we assume that the hypothesis is true, and then show that the conclusion must
be true. For our example, we may assume that z1, z2 are any complex numbers such that
|z1| = |z2| = 1, and under these assumptions, we must show that

1/z1 + 1/z2 = z̄1 + z̄2.

Example 5.3
Let z1, z2 ∈ C. Prove that if |z1| = |z2| = 1, then 1/z1 + 1/z2 = z̄1 + z̄2.
Proof. Let z1, z2 ∈ C and assume that |z1| = |z2| = 1. Then z1, z2 ≠ 0 and it follows
that z̄1, z̄2 ≠ 0. We have

1/z1 + 1/z2 = z̄1/(z1 z̄1) + z̄2/(z2 z̄2)
            = z̄1/|z1|^2 + z̄2/|z2|^2
            = z̄1 + z̄2        since |z1| = |z2| = 1.

In the above proof, we began by stating our assumptions: z1, z2 ∈ C and |z1| = |z2| = 1. We
then deduced that since |z1| = |z2| = 1, z1, z2 ≠ 0, from which it followed that z̄1, z̄2 ≠ 0.
This justified why we could multiply 1/z1 by z̄1/z̄1 and multiply 1/z2 by z̄2/z̄2. From there, we used
properties of conjugates to finish showing the conclusion held. Note that in the proof, we
stated where we used the hypothesis |z1| = |z2| = 1.

We include here a couple of incorrect proofs of the statement in Example 5.3.

Incorrect Proof 1. Let z1 = z2 = 1. Then |z1| = |z2| = 1 and the hypothesis holds. Now

1/z1 + 1/z2 = 1/1 + 1/1 = 1 + 1 = 2

and

z̄1 + z̄2 = 1̄ + 1̄ = 1 + 1 = 2

which shows that

1/z1 + 1/z2 = z̄1 + z̄2.

The issue here is that specific values of z1 and z2 have been chosen when the proof required
us to assume that z1, z2 were any complex numbers satisfying |z1| = |z2| = 1. The above
proof is correct in the specific case of z1 = z2 = 1, but ignores infinitely many other cases,
such as z1 = j and z2 = −j for example.
Incorrect Proof 2. Let z1, z2 ∈ C and assume that |z1| = |z2| = 1. Then z1, z2 ≠ 0 and it
follows that z̄1, z̄2 ≠ 0. We have

1/z1 + 1/z2 = z̄1 + z̄2
z̄1/(z1 z̄1) + z̄2/(z2 z̄2) = z̄1 + z̄2
z̄1/|z1|^2 + z̄2/|z2|^2 = z̄1 + z̄2
z̄1 + z̄2 = z̄1 + z̄2        since |z1| = |z2| = 1
z̄1 + z̄2 = z̄1 + z̄2.

The issue here is a bit more subtle. The proof begins by correctly listing the assumptions
and then deriving that z̄1, z̄2 ≠ 0 without picking specific values for z1, z2 as in the previous
incorrect proof. The error here arises in the next line: 1/z1 + 1/z2 = z̄1 + z̄2. The moment this
equality appears, it is being assumed as true. However this is exactly what we are asked to
prove, so we can never assume it is true during the proof. Note that in the correct proof
stated in Example 5.3, we started with one side of the equality 1/z1 + 1/z2 = z̄1 + z̄2 and then de-
rived that it was equal to the other side, and at no point did we assume that they were equal.

Example 5.4

Let z ∈ C. Show that |Re(z)| + |Im(z)| ≤ √2 |z|.
Proof. Let z ∈ C. Then z = x + yj with x, y ∈ R and x = Re(z) and y = Im(z). Since

(√2 |z|)^2 = 2|z|^2 = 2(x^2 + y^2) = 2x^2 + 2y^2 = 2|x|^2 + 2|y|^2

and

(|Re(z)| + |Im(z)|)^2 = (|x| + |y|)^2 = |x|^2 + 2|x||y| + |y|^2,

we have

(√2 |z|)^2 − (|Re(z)| + |Im(z)|)^2 = 2|x|^2 + 2|y|^2 − |x|^2 − 2|x||y| − |y|^2
                                   = |x|^2 − 2|x||y| + |y|^2
                                   = (|x| − |y|)^2
                                   = (|Re(z)| − |Im(z)|)^2
                                   ≥ 0

Thus (√2 |z|)^2 − (|Re(z)| + |Im(z)|)^2 ≥ 0, that is, (|Re(z)| + |Im(z)|)^2 ≤ (√2 |z|)^2.

Since both |Re(z)| + |Im(z)| and √2 |z| are nonnegative real numbers, we take square
roots and conclude that |Re(z)| + |Im(z)| ≤ √2 |z|.

The proof presented in Example 5.4 can be challenging to create, and even difficult to
understand if you’re just reading it. It relies on a few results about absolute values and
square roots:

(1) For a, b ∈ R, if 0 ≤ a ≤ b then √a ≤ √b.

(2) For c ∈ R, √(c^2) = |c|; however, if c ≥ 0, then √(c^2) = c since |c| = c in this case.

To help understand the proof, let

a = |Re(z)| + |Im(z)|
b = √2 |z|

and note that a and b are both real numbers with a, b ≥ 0. We are thus being asked to show
that a ≤ b. In the proof, we squared both a and b, getting expressions in terms of x and y.
We then showed that b^2 − a^2 ≥ 0, from which it followed that a^2 ≤ b^2. Since a^2, b^2 ≥ 0, we
have 0 ≤ a^2 ≤ b^2. Using (1), we can conclude that √(a^2) ≤ √(b^2) and using (2), we have that
a ≤ b as required.

Once the logic of the proof is understood, it won’t look so intimidating! As an additional note,
it was not necessary to introduce x and y – we could have worked with z = Re(z) + Im(z)j.
The choice to switch to x and y was simply to make the proof a bit more readable.

Lecture 6

Vector Algebra
We begin with the Cartesian Plane. We choose an origin O and two perpendicular axes
called the x1 −axis and the x2 −axis.1 A point P in this plane is represented by the ordered
pair (p1 , p2 ). We think of p1 as a measure of how far to the right (if p1 > 0) or how far to the
left (if p1 < 0) P is from the x2 −axis and we think of p2 as a measure of how far above (if
p2 > 0) or how far below (if p2 < 0) the x1 −axis P is. It is often convenient to associate to
each point a vector which we view geometrically as an “arrow”, or a directed line segment.
Thus, given a point P (p1 , p2 ) in our Cartesian plane, we associate to it the vector ~p = [p1, p2].
This is illustrated in Figure 6.1.

Figure 6.1: The point P (p1 , p2 ) in the Cartesian Plane and the vector ~p = [p1, p2].

Of course, this idea extends to three-space where we have the x1 −, x2 − and x3 −axes as
demonstrated in Figure 6.2.

Figure 6.2: A vector in three-space. Note the labelling of the axes.

1
You might be more familiar with the names x−axis and y−axis. However, this naming scheme will lead
to us running out of letters as we consider more axes, and hence we will call them the x1 −axis and the
x2 −axis.
Definition 6.1: Rn
We define

Rn = { [x1, . . . , xn] | x1, . . . , xn ∈ R }

to be the set (or collection) of all vectors with n components, each of which is a real
number. A vector in R3 is illustrated in Figure 6.2.

Thus, for example,

R2 = { [x1, x2] | x1, x2 ∈ R }   and   R3 = { [x1, x2, x3] | x1, x2, x3 ∈ R }.

Definition 6.2: Zero Vector


The zero vector in Rn is denoted by ~0_Rn = [0, . . . , 0], that is, the vector whose n entries are
all zero.

For example,

~0_R2 = [0, 0],   ~0_R3 = [0, 0, 0],   ~0_R4 = [0, 0, 0, 0],   and so on.

We often simply denote the zero vector in Rn as ~0 whenever this doesn’t cause confusion.
However, if we are considering, say, R2 and R3 at the same time, then we may prefer to write
~0_R2 and ~0_R3 to respectively denote the zero vectors of R2 and R3 , since it may not be clear
which zero vector we are referring to when we write ~0.

Definition 6.3: Equality of Vectors


Two vectors ~x = [x1, . . . , xn] and ~y = [y1, . . . , yn] in Rn are equal if x1 = y1 , x2 = y2 , . . . , xn = yn ,
that is, if their corresponding entries are equal, and we write ~x = ~y in this case.
Otherwise, we write ~x ≠ ~y .

It is important to note that if ~x ∈ Rn and ~y ∈ Rm with n ≠ m, then ~x and ~y can never be
equal. For example, [1, 2] ≠ [1, 2, 0] as one vector belongs to R2 and the other belongs to R3 .

Definition 6.4: Vector Addition
Let ~x = [x1, . . . , xn] and ~y = [y1, . . . , yn] be two vectors in Rn . We define vector addition as

~x + ~y = [x1 + y1, . . . , xn + yn] ∈ Rn ,

that is, we add vectors by adding the corresponding entries.

Example 6.1

• [1, 2] + [−1, 3] = [0, 5]

• [1, 2, 3] + [2, 3, −2] = [3, 5, 1]

• [1, 1, 1] + [1, 2] is not defined, because one vector is in R3 and the other is in R2 .

We have a nice geometric interpretation of vector addition that is illustrated in Figure 6.3.
We see that two vectors determine a parallelogram with their sum appearing as a diagonal
of this parallelogram.2

Figure 6.3: Geometrically interpreting vector addition. The figure on the left is in
R2 with vector components labelled on the corresponding axes and the figure on the
right is vector addition viewed for vectors in Rn with the x1 −, x2 −, . . . , xn −axes
removed.

2
If the one of the two vectors being added is a scalar multiple of the other, then our parallelogram is
simply a line segment or a “degenerate” parallelogram.
Definition 6.5: Scalar Multiplication
Let ~x = [x1, . . . , xn] ∈ Rn and let c ∈ R. We define scalar multiplication as

c~x = [cx1, . . . , cxn] ∈ Rn

that is, we multiply each entry of ~x by c.

Example 6.2

• 2[1, 6, −4, 8] = [2, 12, −8, 16]

• 0[−1, −1, 2] = [0, 0, 0] = ~0

We often refer to c ∈ R as a scalar, and call c~x a scalar multiple of ~x. Figure 6.4 helps us
understand geometrically what scalar multiplication of a nonzero vector ~x ∈ R2 looks like.
The picture is similar for ~x ∈ Rn .

Figure 6.4: Geometrically interpreting scalar multiplication in R2 .

Using the definitions of addition and scalar multiplication, we can define subtraction for
~x, ~y ∈ Rn :
~x − ~y = ~x + (−1)~y .

Definition 6.6: Parallel Vectors
Two nonzero vectors in Rn are parallel if they are scalar multiples of one another.

Example 6.3
The vectors

~x = [2, −5]   and   ~y = [−4, 10]

are parallel since ~y = −2~x, or equivalently, ~x = −(1/2)~y . The vectors

~u = [−2, −3, −4]   and   ~v = [−2, −1, −13]

are not parallel for ~u = c~v would imply that −2 = −2c, −3 = −c and −4 = −13c
which implies that c = 1, 3, 4/13 simultaneously, which is impossible.

Thus far, we have associated vectors in Rn with points. Recall that given a point P (p1 , . . . , pn ),
we associate with it the vector

~p = [p1, . . . , pn] ∈ Rn

and view ~p as a directed line segment from the origin to P . Before we continue, we briefly
mention that vectors may also be thought of as directed segments between arbitrary points.
For example, given two points A and B in the x1 x2 −plane, we denote the directed line seg-
ment from A to B by AB. In this sense, the vector ~p from the origin O to the point P can
be denoted as ~p = OP . This is illustrated in Figure 6.5.

Notice that Figure 6.5 is in R2 , but that we can view directed segments between vectors in
Rn in a similar way. We realize that there is something special about directed segments from
the origin to a point P . In particular, given a point P , the entries in the vector ~p = OP
are simply the coordinates of the point P (refer to Figures 6.1 and 6.2). Thus we refer to the
vector ~p = OP as the position vector of P and we say that ~p is in standard position.
Note that in Figure 6.5, only the vector ~p is in standard position.
Figure 6.5: Vectors between points in R2 (the picture in Rn is similar).

Finding a vector from a point A to a point B in Rn is also not difficult. For two points
A(a1 , a2 ) and B(b1 , b2 ) we have that

AB = [b1 − a1, b2 − a2] = [b1, b2] − [a1, a2] = OB − OA

which is illustrated in Figure 6.6.

Figure 6.6: Finding the components of AB ∈ R2 .

This generalizes naturally to Rn where for A(a1 , . . . , an ) and B(b1 , . . . , bn ) we have

AB = [b1 − a1, . . . , bn − an] = [b1, . . . , bn] − [a1, . . . , an] = OB − OA.
Example 6.4

Find the vector from A(1, 1, 1) to B(2, 3, 4).

Solution. The vector from A to B is the vector AB. We have

AB = OB − OA = [2, 3, 4] − [1, 1, 1] = [1, 2, 3].

Now in Rn , given three points A, B and C, we have that

AC = OC − OA = (OB − OA) + (OC − OB) = AB + BC

which is illustrated in Figure 6.7.

Figure 6.7: AB + BC = AC.

Finally, putting everything together, we see that given two points A and B, their correspond-
ing position vectors OA and OB determine a parallelogram, and that the sum and difference
of these vectors determine the diagonals of this parallelogram. This is displayed in Figure
6.8, where the image on the right is obtained from the one on the left by setting ~u = OB and
~v = OA. Note that by orienting vectors this way, OB − OA = ~u − ~v is not in standard position.

Figure 6.8: The parallelogram determined by two vectors. The diagonals of the
parallelogram are represented by the sum and difference of the two vectors.

Having equipped the set Rn with vector addition and scalar multiplication, we state here a
theorem that lists the properties these operations obey.

Theorem 6.1
Let ~w, ~x, ~y ∈ Rn and let c, d ∈ R. We have

V1. ~x + ~y ∈ Rn Rn is closed under addition

V2. ~x + ~y = ~y + ~x addition is commutative

V3. (~x + ~y ) + ~w = ~x + (~y + ~w)    addition is associative

V4. There exists a vector ~0 ∈ Rn such that ~v + ~0 = ~v for every ~v ∈ Rn zero vector

V5. For each ~x ∈ Rn there exists a (−~x) ∈ Rn such that ~x + (−~x) = ~0 additive inverse

V6. c~x ∈ Rn Rn is closed under scalar multiplication

V7. c(d~x) = (cd)~x scalar multiplication is associative

V8. (c + d)~x = c~x + d~x distributive law

V9. c(~x + ~y ) = c~x + c~y distributive law

V10. 1~x = ~x scalar multiplicative identity

Note that the zero vector of Rn is ~0 = ~0Rn and the additive inverse of ~x ∈ Rn is −~x = (−1)~x.
Many of the properties listed in Theorem 6.1 may seem obvious and it might not be clear
as to why they are stated as a theorem. One of the reasons is that everything we do in this
course will follow from these ten properties, so it is important to list them all here. Also, as
we proceed through the course, we will see that vectors in Rn are not the only mathematical
objects that are subject to these properties, and it is quite useful and powerful to understand
what other classes of objects behave the same as vectors in Rn .

We now define a concept that is central to linear algebra which will appear frequently
throughout the course.

Definition 6.7: Linear Combination


Let ~x1 , ~x2 , . . . , ~xk ∈ Rn and c1 , c2 , . . . , ck ∈ R for some positive integer k. We call the
vector
c1~x1 + c2~x2 + · · · + ck ~xk
a linear combination of the vectors ~x1 , ~x2 , . . . , ~xk .

It follows from properties V1 and V6 of Theorem 6.1 that if we have ~x1 , . . . , ~xk ∈ Rn and

c1 , . . . , ck ∈ R, then the linear combination c1~x1 + c2~x2 + · · · + ck ~xk ∈ Rn . Thus every linear
combination of ~x1 , . . . , ~xk will again be a vector in Rn and we say that Rn is closed under
linear combinations.

Example 6.5

In R3 , let

~e1 = [1, 0, 0],   ~e2 = [0, 1, 0],   and   ~e3 = [0, 0, 1].

(a) Express [1, −2, 3] as a linear combination of ~e1 , ~e2 , ~e3 .

(b) Express ~x = [x1, x2, x3] ∈ R3 as a linear combination of ~e1 , ~e2 , ~e3 .

Solution.

(a) For c1 , c2 , c3 ∈ R, consider

[1, −2, 3] = c1 [1, 0, 0] + c2 [0, 1, 0] + c3 [0, 0, 1] = [c1, c2, c3].

Equating entries gives c1 = 1, c2 = −2 and c3 = 3, so [1, −2, 3] = 1~e1 − 2~e2 + 3~e3 .

(b) Using the exact same method as in the first part, we have ~x = x1~e1 + x2~e2 + x3~e3 .
This means that every ~x ∈ R3 can be expressed as a linear combination of ~e1 , ~e2
and ~e3 .
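Vector addition, scalar multiplication and linear combinations behave exactly like the componentwise arithmetic of numpy arrays, which gives a quick way to check small examples such as Example 6.5 (assuming numpy is available; this is an optional check, not course material):

import numpy as np

e1, e2, e3 = np.array([1, 0, 0]), np.array([0, 1, 0]), np.array([0, 0, 1])

print(1*e1 - 2*e2 + 3*e3)                     # [ 1 -2  3]
print(np.array([1, 2]) + np.array([-1, 3]))   # [0 5], as in Example 6.1
print(2 * np.array([1, 6, -4, 8]))            # [ 2 12 -8 16], as in Example 6.2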

Lecture 7

The Norm of a Vector


Having introduced vectors in Rn , the algebraic operations of addition and scalar multiplica-
tion along with their geometric interpretations, we now define the norm of a vector.

Definition 7.1: Norm


The norm of ~x = [x1, . . . , xn] ∈ Rn is the nonnegative real number

||~x|| = √(x1^2 + · · · + xn^2).

We interpret the norm of a vector in Rn as the length or magnitude of the vector. Figure
7.1 shows this for a vector in R2 .

Figure 7.1: A vector ~x ∈ R2 and its norm, interpreted as length.

Example 7.1
• If ~x = [1, 2] ∈ R2 , then ||~x|| = √(1^2 + 2^2) = √5

• If ~x = [1, 1, 1, 1] ∈ R4 , then ||~x|| = √(1^2 + 1^2 + 1^2 + 1^2) = √4 = 2

Example 7.2

Find the distance from A(1, −1, 2) to B(3, 2, 1).

Solution. Since

AB = OB − OA = [3, 2, 1] − [1, −1, 2] = [2, 3, −1],

the distance from A to B is

||AB|| = √(2^2 + 3^2 + (−1)^2) = √(4 + 9 + 1) = √14.

The next theorem states some useful properties the norm obeys. We will employ these prop-
erties when we derive new results that rely on norms.

Theorem 7.1: Properties of Norms


Let ~x, ~y ∈ Rn and c ∈ R. Then

(1) ||~x|| ≥ 0 with equality if and only if ~x = ~0

(2) ||c~x|| = |c| ||~x||

(3) ||~x + ~y || ≤ ||~x|| + ||~y ||, which we call the Triangle Inequality.

Property (3) is known as the Triangle Inequality and has a very nice geometric interpretation.
Namely, that in the triangle determined by vectors ~x, ~y and ~x +~y (see Figure 7.2), the length
of any one side of the triangle cannot exceed the sum of the lengths of the remaining two
sides.

Figure 7.2: Interpreting the Triangle Inequality.

Definition 7.2: Unit Vector


A vector ~x ∈ Rn is a unit vector if ||~x|| = 1.

Example 7.3

• ~x = [1, 0] ∈ R2 is a unit vector since ||~x|| = √(1^2 + 0^2) = 1

• ~x = −(1/√3)[1, 1, 1] ∈ R3 is a unit vector since ||~x|| = |−1/√3| √(1^2 + 1^2 + 1^2) = (1/√3)√3 = 1

Given a nonzero vector ~x ∈ Rn , the vector

~y = (1/||~x||) ~x

is a unit vector in the direction of ~x. To see this, note that since ~x ≠ ~0, we have ||~x|| > 0 by
Theorem 7.1(1) and it follows that 1/||~x|| > 0. Thus ~y is a positive scalar multiple of ~x so ~y
is in the same direction as ~x. Now

||~y || = ||(1/||~x||) ~x|| = (1/||~x||) ||~x|| = 1

so ~y is a unit vector in the direction of ~x.

Example 7.4
Find a unit vector in the direction of ~x = [4, 5, 6].

Solution. Since ||~x|| = √(4^2 + 5^2 + 6^2) = √(16 + 25 + 36) = √77, we have

~y = (1/√77)[4, 5, 6] = [4/√77, 5/√77, 6/√77]

is the desired vector.
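The norm and the unit-vector construction can be checked with numpy.linalg.norm (assuming numpy is available); the following sketch reproduces Examples 7.2 and 7.4 numerically, as an optional aid:

import numpy as np

AB = np.array([3, 2, 1]) - np.array([1, -1, 2])
print(np.linalg.norm(AB))        # 3.7416... = sqrt(14)

x = np.array([4, 5, 6])
y = x / np.linalg.norm(x)        # unit vector in the direction of x
print(y, np.linalg.norm(y))      # [0.455... 0.569... 0.683...] and a norm of (approximately) 1.0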

The Dot Product


We now define the dot product of two vectors in Rn . We will see how this product is related
to the norm, and use it to compute the angles between nonzero vectors.

Definition 7.3: Dot Product


Let ~x = [x1, . . . , xn] and ~y = [y1, . . . , yn] be vectors in Rn . The dot product of ~x and ~y is the
real number

~x · ~y = x1 y1 + · · · + xn yn .

The dot product is sometimes referred to as the scalar product or the standard inner product.
The term scalar product comes from the fact that given two vectors in Rn , their dot product
returns a real number, which we call a scalar.

Example 7.5

[1, 1, 2] · [−3, −4, 5] = 1(−3) + 1(−4) + 2(5) = −3 − 4 + 10 = 3.

The next theorem states some useful properties of the dot product. Note that property (4)
shows how the norm and dot product are related.

Theorem 7.2: Properties of Dot Products


Let ~w, ~x, ~y ∈ Rn and c ∈ R.

(1) ~x · ~y ∈ R

(2) ~x · ~y = ~y · ~x

(3) ~x · ~0 = 0

(4) ~x · ~x = ||~x||^2

(5) (c~x) · ~y = c(~x · ~y ) = ~x · (c~y )

(6) ~w · (~x ± ~y ) = ~w · ~x ± ~w · ~y

Proof. We prove (2), (4) and (5). Let c ∈ R and let ~x = [x1, . . . , xn] and ~y = [y1, . . . , yn] be
vectors in Rn . For (2) we have

~x · ~y = x1 y1 + · · · + xn yn = y1 x1 + · · · + yn xn = ~y · ~x.

Now to prove (4), we have

~x · ~x = x1 x1 + · · · + xn xn = x1^2 + · · · + xn^2 = ||~x||^2.

For (5),

(c~x) · ~y = (cx1 )y1 + · · · + (cxn )yn = c(x1 y1 + · · · + xn yn ) = c(~x · ~y ).

That ~x · (c~y ) = c(~x · ~y ) is shown similarly.
We now look at how norms and dot products lead to a nice geometric interpretation about
angles between vectors. Given two vectors ~x, ~y ∈ R2 , they determine an angle θ as shown
in Figure 7.3. We restrict θ to 0 ≤ θ ≤ π to avoid multiple values for θ and to avoid reflex
angles.

Figure 7.3: Every two nonzero vectors in R2 determine an acute, obtuse or orthog-
onal angle.

Theorem 7.3
For two nonzero vectors ~x, ~y ∈ R2 determining an angle θ,

~x · ~y = ||~x|| ||~y || cos θ

Proof. Consider the triangle determined by the vectors ~x, ~y and ~x − ~y .

From the Cosine Law, we have

k~x − ~y k2 = k~xk2 + k~y k2 − 2k~xkk~y k cos θ. (12)

Using Theorem 7.2, we obtain

k~x − ~y k² = (~x − ~y ) · (~x − ~y )
           = (~x − ~y ) · ~x − (~x − ~y ) · ~y
           = ~x · ~x − ~y · ~x − ~x · ~y + ~y · ~y
           = k~xk² − 2(~x · ~y ) + k~y k².

Thus (12) becomes

k~xk2 − 2(~x · ~y ) + k~y k2 = k~xk2 + k~y k2 − 2k~xkk~y k cos θ

and subtracting k~xk² + k~y k² from both sides and then multiplying both sides by −1/2 gives
~x · ~y = k~xkk~y k cos θ as required.
Theorem 7.3 gives a relationship between the angle θ determined by two nonzero vectors
~x, ~y ∈ R2 and their dot product. This relationship motivates us to define the angle deter-
mined by two vectors in Rn .

Definition 7.4: Angle Determined by Two Vectors in Rn

Let ~x, ~y ∈ Rn be two nonzero vectors. The angle θ they determine (with 0 ≤ θ ≤ π)
is such that
~x · ~y = k~xkk~y k cos θ.

The equation in Definition 7.4 can be rearranged as

cos θ = (~x · ~y ) / ( k~xkk~y k )    (13)

which will allow us to explicitly solve for θ (again, we are assuming that ~x and ~y are nonzero). Note that for Equation (13) (and thus Definition 7.4) to make any sense, we require that

|~x · ~y | / ( k~xkk~y k ) ≤ 1, or equivalently, |~x · ~y | ≤ k~xkk~y k,

since | cos θ | ≤ 1. This is exactly the Cauchy-Schwarz Inequality, which we state here with-
out proof.

Theorem 7.4: Cauchy-Schwarz Inequality


For any two vectors ~x, ~y ∈ Rn , we have

|~x · ~y | ≤ k~xkk~y k.

Now we can solve for θ:

θ = cos⁻¹( (~x · ~y ) / ( k~xkk~y k ) ).
Example 7.6

Compute the angle determined by the vectors ~x = [ 2, 1, −1 ] and ~y = [ 1, −1, −2 ].

Solution. We have that

cos θ = (~x · ~y ) / ( k~xkk~y k ) = ( 2(1) + 1(−1) − 1(−2) ) / ( √(4 + 1 + 1) √(1 + 1 + 4) ) = 3 / ( √6 √6 ) = 1/2

so

θ = cos⁻¹(1/2) = π/3.
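If one wants to double-check such an angle computation numerically, a short NumPy sketch (illustrative only; the numerical output is an approximation of the exact value π/3) looks like this.

    import numpy as np

    x = np.array([2.0, 1.0, -1.0])
    y = np.array([1.0, -1.0, -2.0])

    cos_theta = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    theta = np.arccos(cos_theta)

    print(cos_theta)          # 0.5
    print(theta, np.pi / 3)   # both approximately 1.0471975...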

For nonzero vectors ~x, ~y ∈ Rn determining an angle θ, we are often not interested in the spe-
cific value of θ, but rather in the approximate size of θ. That is, we are often only concerned
if ~x and ~y determine an acute angle, an obtuse angle, or if the vectors are orthogonal (refer
to Figure 7.3). Recalling that
cos θ > 0 for 0 ≤ θ < π/2
cos θ = 0 for θ = π/2
cos θ < 0 for π/2 < θ ≤ π

we see from Equation (13) that the sign of cos θ is determined by the sign of ~x · ~y since k~xkk~y k > 0. Thus

~x · ~y > 0 ⇐⇒ 0 ≤ θ < π/2 ⇐⇒ ~x and ~y determine an acute angle
~x · ~y = 0 ⇐⇒ θ = π/2 ⇐⇒ ~x and ~y are orthogonal
~x · ~y < 0 ⇐⇒ π/2 < θ ≤ π ⇐⇒ ~x and ~y determine an obtuse angle
Example 7.7
For " # " #
1 6
~x = and ~y = ,
2 −2
we compute
~x · ~y = 1(6) + 2(−2) = 2 > 0
and so ~x and ~y determine an acute angle.

Note that to find the exact angle determined by ~x and ~y in the previous example we compute

cos θ = (~x · ~y ) / ( k~xkk~y k ) = 2 / ( √(1 + 4) √(36 + 4) ) = 2 / ( √5 √40 ) = 2 / √200 = 2 / (10√2) = 1 / (5√2)

so

θ = cos⁻¹( 1 / (5√2) )

which is our exact answer for θ. Note that as a decimal number rounded to the nearest millionth, we have θ ≈ 1.428899, but this is an approximation rather than the exact value. In MATH 115, it is normally expected that students give exact answers unless otherwise stated.

We have defined the norm for any vector in Rn and the dot product for any two vectors in
Rn . However, our work with angles determined by vectors has required that our vectors be
nonzero thus far. Now since ~x · ~0 = 0 for every ~x ∈ Rn , we define the zero vector to be
orthogonal to every vector in Rn and make the following definition.

Definition 7.5: Orthogonal


Two vectors ~x, ~y ∈ Rn are said to be orthogonal if ~x · ~y = 0.

Although the zero vector of Rn is orthogonal to all vectors in Rn , we don’t explicitly compute
the angle ~0 makes with another vector ~x ∈ Rn since
cos θ = (~x · ~y ) / ( k~xkk~y k )

is not defined if either of ~x or ~y is the zero vector. Thus, we interpret ~x and ~y being orthogonal to mean that their dot product is zero, and if both ~x and ~y are nonzero, then they determine an angle of π/2.

Lecture 8

Complex Vectors
We briefly extend our work in Rn to vectors whose entries are complex numbers.

Definition 8.1: Cn

We define

Cn = { [ z1, . . . , zn ] | z1, . . . , zn ∈ C }

to be the set of all vectors with n components, each of which is a complex number.

Since R ⊆ C, it follows that Rn ⊆ Cn .

Definition 8.2: Zero Vector

The zero vector in Cn is denoted by ~0_Cn = [ 0, . . . , 0 ], or just ~0 when there is no confusion.

Note that the zero vector for Cn is the same as the zero vector for Rn. We also have similar definitions for equality, vector addition and scalar multiplication as we did for Rn.

Definition 8.3: Equality, Vector Addition and Scalar Multiplication

Let ~z = [ z1, . . . , zn ] and ~w = [ w1, . . . , wn ] be two vectors in Cn. We say that ~z and ~w are equal if z1 = w1, . . . , zn = wn, and we write ~z = ~w in this case. Otherwise we write ~z ≠ ~w. We define vector addition by

~z + ~w = [ z1 + w1, . . . , zn + wn ],

and for α ∈ C, we define scalar multiplication by

α~z = [ αz1, . . . , αzn ].

Definition 8.4: Norm, Complex Inner Product and Dot Product

Let ~z = [ z1, . . . , zn ] and ~w = [ w1, . . . , wn ] be two vectors in Cn. The norm of ~z is

k~z k = √( \overline{z1} z1 + · · · + \overline{zn} zn ).

The complex inner product of ~z and ~w is

h~z, ~w i = \overline{z1} w1 + · · · + \overline{zn} wn,

and the dot product of ~z and ~w is

~z · ~w = z1 w1 + · · · + zn wn.

The definitions of the norm and the complex inner product might seem surprising. For ~x = [ x1, . . . , xn ] ∈ Rn, we recall

k~xk = √( x1² + · · · + xn² ) = √( |x1|² + · · · + |xn|² )

which is a nonnegative real number that we interpret as the length or magnitude of ~x. It then follows from Theorem 7.2(4) that k~xk = √( ~x · ~x ). The definition of the norm for a complex vector is analogous to this: for ~z = [ z1, . . . , zn ] ∈ Cn,

k~z k = √( \overline{z1} z1 + · · · + \overline{zn} zn ) = √( |z1|² + · · · + |zn|² )

which is again a nonnegative real number (since |zi| is a nonnegative real number for i = 1, . . . , n) which we will interpret as the length or magnitude of ~z.

Example 8.1

Let ~z = [ 1 + j, 2 − j ]. Compute ~z · ~z, h~z, ~z i and k~z k.

Solution. We have

~z · ~z = (1 + j)² + (2 − j)² = (1 + 2j − 1) + (4 − 4j − 1) = 3 − 2j

h~z, ~z i = \overline{(1 + j)}(1 + j) + \overline{(2 − j)}(2 − j) = (1 − j)(1 + j) + (2 + j)(2 − j) = (1 + 1) + (4 + 1) = 7

k~z k = √( \overline{(1 + j)}(1 + j) + \overline{(2 − j)}(2 − j) ) = √7

Example 8.2

Let ~z = [ 2 − 2j, 1 + j ] and ~w = [ 2 + j, 3 ]. Compute h~z, ~w i and h~w, ~z i.

Solution. We have

h~z, ~w i = \overline{(2 − 2j)}(2 + j) + \overline{(1 + j)}(3) = (2 + 2j)(2 + j) + (1 − j)(3) = (2 + 6j) + (3 − 3j) = 5 + 3j

and

h~w, ~z i = \overline{(2 + j)}(2 − 2j) + \overline{(3)}(1 + j) = (2 − j)(2 − 2j) + (3)(1 + j) = (2 − 6j) + (3 + 3j) = 5 − 3j
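For those who want to experiment numerically, NumPy's vdot conjugates its first argument, which matches the complex inner product defined above. This is an illustrative sketch only; note that Python writes the imaginary unit as 1j rather than j.

    import numpy as np

    z = np.array([2 - 2j, 1 + 1j])
    w = np.array([2 + 1j, 3 + 0j])

    print(np.vdot(z, w))       # (5+3j), matching <z, w> in Example 8.2
    print(np.vdot(w, z))       # (5-3j), the conjugate of <z, w>
    print(np.linalg.norm(z))   # sqrt(|2-2j|^2 + |1+j|^2) = sqrt(10)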

From Examples 8.1 and 8.2, we see that for ~z ∈ Cn, ~z · ~z need not be a real number. It also appears that k~z k² = h~z, ~z i and that although h~z, ~w i ≠ h~w, ~z i in general for ~w, ~z ∈ Cn, we have h~z, ~w i = \overline{ h~w, ~z i } instead. This is confirmed in the following theorem.

Theorem 8.1: Properties of Norms and Complex Inner Products

Let ~v, ~w, ~z ∈ Cn and α ∈ C. Then

(1) h~z, ~z i ≥ 0 with equality if and only if ~z = ~0

(2) k~z k² = h~z, ~z i

(3) h~z, ~w i = \overline{ h~w, ~z i }

(4) h~v + ~w, ~z i = h~v , ~z i + h~w, ~z i and h~z, ~v + ~w i = h~z, ~v i + h~z, ~w i

(5) hα~z, ~w i = \overline{α} h~z, ~w i and h~z, α~w i = α h~z, ~w i

(6) |h~z, ~w i| ≤ k~z kk~wk (Cauchy–Schwarz Inequality)

(7) k~z + ~wk ≤ k~z k + k~wk (Triangle Inequality)

Proof. We prove (5). Let ~z = [ z1, . . . , zn ] and ~w = [ w1, . . . , wn ] be vectors in Cn, and let α ∈ C. Then

hα~z, ~w i = \overline{(αz1)} w1 + · · · + \overline{(αzn)} wn
          = \overline{α} \overline{z1} w1 + · · · + \overline{α} \overline{zn} wn
          = \overline{α} ( \overline{z1} w1 + · · · + \overline{zn} wn )
          = \overline{α} h~z, ~w i

and

h~z, α~w i = \overline{z1} (αw1) + · · · + \overline{zn} (αwn)
          = α \overline{z1} w1 + · · · + α \overline{zn} wn
          = α ( \overline{z1} w1 + · · · + \overline{zn} wn )
          = α h~z, ~w i

We end this section with one final definition.

Definition 8.5: Complex Conjugate

Let ~z = [ z1, . . . , zn ] ∈ Cn. The complex conjugate of ~z is

\overline{~z} = [ \overline{z1}, . . . , \overline{zn} ].

From this, we can say that for ~z = [ z1, . . . , zn ] and ~w = [ w1, . . . , wn ] in Cn,

h~z, ~w i = \overline{z1} w1 + · · · + \overline{zn} wn = \overline{~z} · ~w

which allows us to view the complex inner product of ~z and ~w as the dot product of \overline{~z} and ~w.

The Cross Product in R3


We now define a product of two vectors that is only valid in R3. (This is not entirely true: there is a cross product in R7 as well, but it is beyond the scope of this course.) Whereas the dot product of two vectors in R3 is a real number, the cross product of two vectors in R3 is a vector in R3.

Definition 8.6: Cross Product in R3

Let ~x = [ x1, x2, x3 ] and ~y = [ y1, y2, y3 ] be two vectors in R3. The cross product of ~x and ~y is

~x × ~y = [ x2 y3 − y2 x3, −(x1 y3 − y1 x3), x1 y2 − y1 x2 ].

Note that the cross product is sometimes called the vector product because the result is also
a vector.


Example 8.3

Let ~x = [ 1, 6, 3 ] and ~y = [ −1, 3, 2 ]. Then

~x × ~y = [ 6(2) − 3(3), −( 1(2) − (−1)(3) ), 1(3) − (−1)(6) ] = [ 3, −5, 9 ]

Using the results of Example 8.3, we compute

~x · (~x × ~y ) = 1(3) + 6(−5) + 3(9) = 3 − 30 + 27 = 0


~y · (~x × ~y ) = −1(3) + 3(−5) + 2(9) = −3 − 15 + 18 = 0

from which we see that ~x × ~y is orthogonal to both ~x and ~y .
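As an aside for those who like to verify by computer, NumPy's cross and dot reproduce Example 8.3 and the orthogonality check above; this is just an illustrative sketch.

    import numpy as np

    x = np.array([1.0, 6.0, 3.0])
    y = np.array([-1.0, 3.0, 2.0])

    n = np.cross(x, y)
    print(n)              # [ 3. -5.  9.]
    print(np.dot(x, n))   # 0.0, so x is orthogonal to x cross y
    print(np.dot(y, n))   # 0.0, so y is orthogonal to x cross y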

The formula for ~x × ~y is quite tedious to remember. Here we give a simpler way. For a, b, c, d ∈ R, define

| a b ; c d | = ad − bc

(a 2 × 2 array between vertical bars, with the semicolon separating its two rows). Then, writing the entries of ~x = [ x1, x2, x3 ] and ~y = [ y1, y2, y3 ] side by side,

~x × ~y = [ | x2 y2 ; x3 y3 |, −| x1 y1 ; x3 y3 |, | x1 y1 ; x2 y2 | ]

where the first entry is obtained by removing x1 and y1, the second by removing x2 and y2 (don't forget the “−” sign), and the third by removing x3 and y3. Expanding each 2 × 2 array recovers

~x × ~y = [ x2 y3 − y2 x3, −(x1 y3 − y1 x3), x1 y2 − y1 x2 ].

It's a good idea to try this “trick” using the above example.

Lecture 9

Theorem 9.1: Properties of Cross Products

Let ~x, ~y , ~w ∈ R3 and c ∈ R. Then

(1) ~x × ~y ∈ R3

(2) ~x × ~y is orthogonal to both ~x and ~y

(3) ~x × ~0 = ~0 = ~0 × ~x

(4) ~x × ~x = ~0

(5) ~x × ~y = −(~y × ~x )

(6) (c~x ) × ~y = c(~x × ~y ) = ~x × (c~y )

(7) ~w × (~x ± ~y ) = ( ~w × ~x ) ± ( ~w × ~y )

(8) (~x ± ~y ) × ~w = (~x × ~w ) ± (~y × ~w )

Proof. We prove (5). Let ~x = [ x1, x2, x3 ] and ~y = [ y1, y2, y3 ] be two vectors in R3. Then

~x × ~y = [ x2 y3 − y2 x3, −(x1 y3 − y1 x3), x1 y2 − y1 x2 ]
        = [ −(y2 x3 − x2 y3), y1 x3 − x1 y3, −(y1 x2 − x1 y2) ]
        = − [ y2 x3 − x2 y3, −(y1 x3 − x1 y3), y1 x2 − x1 y2 ]
        = −(~y × ~x ).

Example 9.1

Let ~x, ~y ∈ R3 be parallel vectors. Compute ~x × ~y .


Solution. Let ~x, ~y ∈ R3 be parallel vectors. Then ~y = c~x for some c ∈ R. Using
Theorem 9.1(4),(6) we have

~x × ~y = ~x × (c~x ) = c(~x × ~x) = c(~0) = ~0.

Exercise 9.1
~ ∈ R3 such that
Show that the cross product is not associative. That is, find ~x, ~y , w

(~x × ~y ) × w
~ 6= ~x × (~y × w
~ ).

Solution. Consider ~x = [ 1, 1, 0 ], ~y = [ 0, 1, 0 ], and ~w = [ 0, 0, 1 ]. Then

(~x × ~y ) × ~w = ( [ 1, 1, 0 ] × [ 0, 1, 0 ] ) × [ 0, 0, 1 ] = [ 0, 0, 1 ] × [ 0, 0, 1 ] = [ 0, 0, 0 ]

and

~x × (~y × ~w ) = [ 1, 1, 0 ] × ( [ 0, 1, 0 ] × [ 0, 0, 1 ] ) = [ 1, 1, 0 ] × [ 1, 0, 0 ] = [ 0, 0, −1 ]

so we see that (~x × ~y ) × ~w ≠ ~x × (~y × ~w ). Thus, the cross product is not associative.
Since the cross product is not associative, the expression ~x × ~y × w ~ is undefined. We must
always include brackets to indicate in which order we should evaluate the cross products as
changing the order will change the result. Also note that the cross product is not commu-
tative as ~x × ~y 6= ~y × ~x. However, since ~x × ~y = −(~y × ~x), we say that the cross product
is anti-commutative, that is, changing the order of ~x and ~y in the cross product changes the
result by a factor of −1.

Example 9.2

Find a nonzero vector orthogonal to both ~x = [ 1, 2, 3 ] and ~y = [ 1, −1, −1 ]. Moreover, show that this vector is orthogonal to any linear combination of ~x and ~y .

Solution. Using Theorem 9.1(2), we have that

~n = ~x × ~y = [ 1, 2, 3 ] × [ 1, −1, −1 ] = [ 1, 4, −3 ]

is orthogonal to both ~x and ~y . Now for any s, t ∈ R,

~n · (s~x + t~y ) = s(~n · ~x) + t(~n · ~y ) = s(0) + t(0) = 0

so ~n = ~x × ~y is orthogonal to any linear combination of ~x and ~y .

Example 9.2 demonstrates one of the main uses of the cross product in R3. Given two nonparallel vectors ~x, ~y ∈ R3, it is quite useful to find a nonzero vector that is orthogonal to both ~x and ~y (and hence to any linear combination of them). Also, we note here that once the cross product of ~x, ~y ∈ R3 is computed, we can check that our work is correct by verifying that (~x × ~y ) · ~x = 0 and that (~x × ~y ) · ~y = 0.

The cross product can also be used to compute the area of a parallelogram determined by
two vectors in R3 . To see how, we will need the following result which is stated without proof.

Theorem 9.2: Lagrange Identity

Let ~x, ~y ∈ R3 . Then k~x × ~y k2 = k~x k2 k~y k2 − (~x · ~y )2 .

Let ~x, ~y ∈ R3 be nonzero vectors. Then by Theorem 7.3,

~x · ~y = k~x kk~y k cos θ

where 0 ≤ θ ≤ π. Substituting this into the Lagrange Identity gives

k~x × ~y k2 = k~x k2 k~y k2 − (k~x kk~y k cos θ)2


= k~x k2 k~y k2 − k~x k2 k~y k2 cos2 θ
= k~x k2 k~y k2 (1 − cos2 θ)
= k~x k2 k~y k2 sin2 θ.

Since sin θ ≥ 0 for 0 ≤ θ ≤ π, we may take square roots to obtain

k~x × ~y k = k~x kk~y k sin θ.

Next, consider the parallelogram determined by the nonzero vectors ~x and ~y .

Figure 9.1: The parallelogram determined by ~x and ~y .

Denoting the base by b and the height by h, we see that b = k~x k and that h satisfies sin θ = h / k~y k, which gives h = k~y k sin θ. Denoting the area of the parallelogram by A, we have

A = bh = k~x kk~y k sin θ = k~x × ~y k.

We see that the norm of the cross product of two nonzero vectors ~x, ~y ∈ R3 gives the area
of the parallelogram that ~x and ~y determine. Our derivation has been for nonzero vectors ~x

and ~y , and we implicitly assumed that ~x and ~y were not parallel in the above diagram. Note
that if ~x and ~y are parallel, then the parallelogram they determine is simply a line segment
(a degenerate parallelogram) and thus the area is zero. Moreover, if any of ~x and ~y are zero,
then the area of the resulting parallelogram is again zero. Note that in these two cases we
have ~x × ~y = ~0, so our formula A = k~x × ~y k holds for any ~x, ~y ∈ R3 .

Example 9.3

Let ~x = [ 1, 1, 1 ] and ~y = [ 1, 2, −3 ]. Find the area of the parallelogram determined by ~x and ~y .

Solution. Since

~x × ~y = [ 1, 1, 1 ] × [ 1, 2, −3 ] = [ −5, 4, 1 ],

the area of the parallelogram is

A = k~x × ~y k = √(25 + 16 + 1) = √42.
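A quick numerical check of Example 9.3 (an illustrative NumPy sketch, assuming nothing beyond the formulas above):

    import numpy as np

    x = np.array([1.0, 1.0, 1.0])
    y = np.array([1.0, 2.0, -3.0])

    area = np.linalg.norm(np.cross(x, y))
    print(area, np.sqrt(42))   # both approximately 6.4807...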

The Vector Equation of a Line


In R2 , we define lines by equations such as

x2 = mx1 + b or ax1 + bx2 = c

where m, a, b, c are constants. How do we describe lines in Rn (for example, in R3 )? It might


be tempting to think the above equations are equations of lines in Rn as well, but this is not
the case. Consider the graph of the equation x2 = x1 in R2 . This graph consists of all points
(x1 , x2 ) such that x2 = x1 , which yields a line (see Figure 9.2).

Figure 9.2: The graph of x2 = x1 is a line in R2 .

If we consider the equation x2 = x1 in R3 , then we are considering all points (x1 , x2 , x3 ) with
the property that x2 = x1 . Notice that there is no restriction on x3 , so we can take x3 to
be any real number. It follows that the equation x2 = x1 represents a plane in R3 and not a
line (see Figure 9.3).

Figure 9.3: The graph of x2 = x1 is a plane in R3 . The red line indicates the
intersection of this plane with the x1 x2 −plane.

Note that we require two things to describe a line:

(1) A point P on the line,

(2) A vector d~ in the direction of the line (called a direction vector for the line).

Definition 9.1: Vector Equation of a Line

A line in Rn through a point P with direction d~, where d~ ∈ Rn is nonzero, is given by the vector equation

~x = [ x1, . . . , xn ] = \overrightarrow{OP} + t d~,   t ∈ R.

We can see from Figure 9.4 how the line through P with direction d~ is “drawn out” by the vector ~x = \overrightarrow{OP} + t d~ as t ∈ R varies between −∞ and ∞.

Figure 9.4: The line through P with direction d~ and the vector \overrightarrow{OP} + t d~ with some additional points plotted for a few values of t ∈ R.

We can also think of the equation ~x = \overrightarrow{OP} + t d~ as first moving us from the origin to the point P , and then moving from P as far as we like in the direction given by d~. This is shown in Figure 9.5.

Figure 9.5: An equivalent way to understand the vector equation ~x = \overrightarrow{OP} + t d~.

Example 9.4

Find the vector equation of the line through the points A(1, 1, −1) and B(4, 0, −3).

Solution. We first find a direction vector for the line. Since the line passes through the points A and B, we take the direction vector to be the vector from A to B. That is,

d~ = \overrightarrow{AB} = \overrightarrow{OB} − \overrightarrow{OA} = [ 4, 0, −3 ] − [ 1, 1, −1 ] = [ 3, −1, −2 ].

Hence, using the point A, we have a vector equation for our line:

~x = \overrightarrow{OA} + t \overrightarrow{AB} = [ 1, 1, −1 ] + t [ 3, −1, −2 ],   t ∈ R.

Note that the vector equation for a line is not unique. In fact, in Example 9.4, we could have used the vector \overrightarrow{BA} as our direction vector, and we could have used B as the point on our line to obtain

~x = \overrightarrow{OB} + t \overrightarrow{BA} = [ 4, 0, −3 ] + t [ −3, 1, 2 ],   t ∈ R.

Indeed, we can use any known point on the line and any nonzero scalar multiple of the
direction vector for the line when constructing the vector equation. Thus, there are infinitely
many vector equations for a line (see Figure 9.6).

Figure 9.6: Two different vector equations for the same line.

Finally, given one of the vector equations for the line in Example 9.4, we have

~x = [ x1, x2, x3 ] = [ 1, 1, −1 ] + t [ 3, −1, −2 ] = [ 1, 1, −1 ] + [ 3t, −t, −2t ] = [ 1 + 3t, 1 − t, −1 − 2t ]
from which it follows that

x1 = 1 + 3t
x2 = 1 − t, t∈R
x3 = −1 − 2t

which we call the parametric equations of the line. For each choice of t ∈ R, these equations
give the x1 −, x2 − and x3 −coordinates of a point on the line. Note that since the vector
equation for a line is not unique, neither are the parametric equations for a line.
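To make the role of the parameter concrete, here is a small illustrative sketch (NumPy, not required for the course) that generates points on the line of Example 9.4 for a few values of t.

    import numpy as np

    p = np.array([1.0, 1.0, -1.0])    # the point A on the line
    d = np.array([3.0, -1.0, -2.0])   # the direction vector AB

    for t in [-1.0, 0.0, 1.0, 2.0]:
        print(t, p + t * d)           # each output is a point (x1, x2, x3) on the line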

Lecture 10

The Vector Equation of a Plane


We extend the vector equation for a line in Rn to a vector equation for a plane in Rn .

Definition 10.1: Vector Equation of a Plane

The vector equation for a plane in Rn through a point P is given by

~x = [ x1, . . . , xn ] = \overrightarrow{OP} + s~u + t~v ,   s, t ∈ R

where ~u, ~v ∈ Rn are nonzero nonparallel vectors.

We may think of this vector equation as taking us from the origin to the point P on the
plane, and then adding any linear combination of ~u and ~v to reach any point on the plane.
It is important to note that the parameters s and t are chosen independently of one another,
that is, the choice of one parameter does not determine the choice of the other. See Figure
10.1.

Figure 10.1: Using vectors to describe a plane in Rn

Example 10.1

Find a vector equation for the plane containing the points A(1, 1, 1), B(1, 2, 3) and C(−1, 1, 2).

Solution. We compute

\overrightarrow{AB} = \overrightarrow{OB} − \overrightarrow{OA} = [ 1, 2, 3 ] − [ 1, 1, 1 ] = [ 0, 1, 2 ]

\overrightarrow{AC} = \overrightarrow{OC} − \overrightarrow{OA} = [ −1, 1, 2 ] − [ 1, 1, 1 ] = [ −2, 0, 1 ]

and note that \overrightarrow{AB} and \overrightarrow{AC} are nonzero and nonparallel. A vector equation is thus

~x = [ x1, x2, x3 ] = \overrightarrow{OA} + s \overrightarrow{AB} + t \overrightarrow{AC} = [ 1, 1, 1 ] + s [ 0, 1, 2 ] + t [ −2, 0, 1 ],   s, t ∈ R.

Considering our vector equation from Example 10.1, we see that by setting either of s, t ∈ R to be zero and letting the other parameter be arbitrary, we obtain vector equations for two lines, each of which lies in the given plane:

~x = \overrightarrow{OA} + s \overrightarrow{AB} = [ 1, 1, 1 ] + s [ 0, 1, 2 ],  s ∈ R   and   ~x = \overrightarrow{OA} + t \overrightarrow{AC} = [ 1, 1, 1 ] + t [ −2, 0, 1 ],  t ∈ R.

This is illustrated in Figure 10.2.

We also note that evaluating the right hand side of the vector equation derived in Example 10.1 gives

~x = [ x1, x2, x3 ] = [ 1, 1, 1 ] + s [ 0, 1, 2 ] + t [ −2, 0, 1 ] = [ 1 − 2t, 1 + s, 1 + 2s + t ]

from which we derive the parametric equations of the plane:

x1 = 1 − 2t
x2 = 1 + s            s, t ∈ R.
x3 = 1 + 2s + t

Figure 10.2: The plane through the points A, B and C with vector equation ~x = \overrightarrow{OA} + s \overrightarrow{AB} + t \overrightarrow{AC}, s, t ∈ R.

Finally, we note that as with lines, our vector equation for the plane in Example 10.1 is not unique, as we could have chosen

~x = \overrightarrow{OB} + s \overrightarrow{BC} + t \overrightarrow{AB},   s, t ∈ R

as the vector equation instead (it is easy to verify that \overrightarrow{BC} and \overrightarrow{AB} are nonzero and nonparallel).

Example 10.2

Find a vector equation of the plane containing the point P (1, −1, −2) and the line with vector equation

~x = [ 1, 3, −1 ] + r [ 1, 1, 4 ],   r ∈ R.

Solution. We construct two vectors lying in the plane. For one, we can take the direction vector of the given line, and for the other, we can take a vector from a known point on the given line to the point P . Thus we let

~u = [ 1, 1, 4 ]   and   ~v = [ 1, −1, −2 ] − [ 1, 3, −1 ] = [ 0, −4, −1 ].

Then, since ~u and ~v are nonzero and nonparallel, a vector equation for the plane is

~x = \overrightarrow{OP} + s~u + t~v = [ 1, −1, −2 ] + s [ 1, 1, 4 ] + t [ 0, −4, −1 ],   s, t ∈ R.

We note that for the vector equation for a plane, we do require ~u and ~v to be nonparallel. If ~u and ~v are parallel, say ~u = c~v for some c ∈ R, then the vector equation we derive is

~x = \overrightarrow{OP} + s~u + t~v = \overrightarrow{OP} + s(c~v ) + t~v = \overrightarrow{OP} + (sc + t)~v ,

which is the vector equation for a line through P , not a plane.

The Scalar Equation of a Plane in R3


Given a plane in R3 and any point P on this plane, there is a unique line through that point
that is perpendicular to the plane. Let ~n be a direction vector for this line. Then for any Q
on the plane, ~n is orthogonal to \overrightarrow{PQ}.

Figure 10.3: A line that is perpendicular to a plane.

Definition 10.2: Normal Vector for a Plane


A nonzero vector ~n ∈ R3 is a normal vector for a plane if for any two points P and Q on the plane, ~n is orthogonal to \overrightarrow{PQ}.

We note that given a plane in R3 , a normal vector for that plane is not unique as any nonzero
scalar multiple of that vector will also be a normal vector for that plane.

Now consider a plane in R3 with a normal vector

~n = [ n1, n2, n3 ],

and suppose P (a, b, c) is a given point on this plane. For any point Q(x1 , x2 , x3 ), Q lies on the plane if and only if

0 = ~n · \overrightarrow{PQ} = ~n · ( \overrightarrow{OQ} − \overrightarrow{OP} ) = [ n1, n2, n3 ] · [ x1 − a, x2 − b, x3 − c ] = n1 (x1 − a) + n2 (x2 − b) + n3 (x3 − c).

Definition 10.3: Scalar Equation of a Plane

The scalar equation of a plane in R3 with normal vector ~n = [ n1, n2, n3 ] containing the point P (a, b, c) is given by

n1 x1 + n2 x2 + n3 x3 = n1 a + n2 b + n3 c.

Example 10.3

Find a scalar equation of the plane containing the points A(3, 1, 2), B(1, 2, 3) and
C(−2, 1, 3).
Solution. We have three points lying on the plane, so we only need to find a normal
vector for the plane.

We compute

\overrightarrow{AB} = \overrightarrow{OB} − \overrightarrow{OA} = [ 1, 2, 3 ] − [ 3, 1, 2 ] = [ −2, 1, 1 ]

\overrightarrow{AC} = \overrightarrow{OC} − \overrightarrow{OA} = [ −2, 1, 3 ] − [ 3, 1, 2 ] = [ −5, 0, 1 ]

and notice that \overrightarrow{AB} and \overrightarrow{AC} are nonzero nonparallel vectors in R3. We compute

~n = \overrightarrow{AB} × \overrightarrow{AC} = [ −2, 1, 1 ] × [ −5, 0, 1 ] = [ 1, −3, 5 ]

and recall that the nonzero vector ~n is orthogonal to both \overrightarrow{AB} and \overrightarrow{AC}. It follows from Example 9.2 that ~n is orthogonal to the entire plane and is thus a normal vector for the plane. Hence, using the point A(3, 1, 2), our scalar equation is

1(x1 − 3) − 3(x2 − 1) + 5(x3 − 2) = 0

which evaluates to

x1 − 3x2 + 5x3 = 10.

We make a few remarks about the preceding example here.


• Using the point B or C rather than A to compute the scalar equation would lead to
the same scalar equation as is easily verified.
• As the normal vector for the above plane is not unique, neither is the scalar equation.
In fact, 2~n is also a normal vector for the plane, and using it instead of ~n would lead to
the scalar equation 2x1 − 6x2 + 10x3 = 20, which is just the scalar equation we found
multiplied by a factor of 2.
• From our work above, we see that we can actually compute a vector equation for the plane:

~x = \overrightarrow{OA} + s \overrightarrow{AB} + t \overrightarrow{AC} = [ 3, 1, 2 ] + s [ −2, 1, 1 ] + t [ −5, 0, 1 ],   s, t ∈ R

for example. In fact, given a vector equation ~x = \overrightarrow{OP} + s~u + t~v for a plane in R3 containing a point P , we can find a normal vector by computing ~n = ~u × ~v .

• Note that in the scalar equation x1 − 3x2 + 5x3 = 10, the coefficients on the variables x1 , x2 and x3 are exactly the entries in the normal vector we found (see Definition 10.3). Thus, if we are given a scalar equation of a different plane, say 3x1 − 2x2 + 5x3 = 72, we can deduce immediately that ~n = [ 3, −2, 5 ] is a normal vector for that plane.
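As an illustration of the workflow in Example 10.3, here is a NumPy sketch for checking such work by computer, using the same points A, B and C (illustrative only):

    import numpy as np

    A = np.array([3.0, 1.0, 2.0])
    B = np.array([1.0, 2.0, 3.0])
    C = np.array([-2.0, 1.0, 3.0])

    n = np.cross(B - A, C - A)   # normal vector AB x AC
    d = np.dot(n, A)             # right-hand side n1*a + n2*b + n3*c

    print(n, d)                          # [ 1. -3.  5.]  10.0, i.e. x1 - 3x2 + 5x3 = 10
    print(np.dot(n, B), np.dot(n, C))    # both 10.0, so B and C also satisfy the equation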

Given a plane in R3, when is it better to use a vector equation and when is it better to use a scalar equation? Consider a plane with scalar equation 4x1 − x2 − x3 = 2 and vector equation

~x = [ 1, 1, 1 ] + s [ 1, 2, 2 ] + t [ 1, 1, 3 ],   s, t ∈ R.

Suppose you are asked if the point (2, 6, 0) lies on this plane. Using the scalar equation 4x1 − x2 − x3 = 2, we see that 4(2) − 1(6) − 1(0) = 2 satisfies this equation so we can easily conclude that (2, 6, 0) lies on the plane. However, if we use the vector equation, we must determine if there exist s, t ∈ R such that

[ 1, 1, 1 ] + s [ 1, 2, 2 ] + t [ 1, 1, 3 ] = [ 2, 6, 0 ]

which leads to the system of equations

s + t = 1
2s + t = 5
2s + 3t = −1

With a little work, we can find that the solution to this system is s = 4 and t = −3 (we will look at a more efficient technique to solve systems of equations shortly), which
again guarantees that (2, 6, 0) lies on the plane. It should be clear that using a scalar equa-
tion is preferable here. On the other hand, if you are asked to generate a point that lies
on the plane, then using the vector equation, we may select any two values for s and t (say
s = 0 and t = 0) to conclude that the point (1, 1, 1) lies on the plane. It is not too difficult
to find a point lying on the plane using the scalar equation either - this will likely be done
by choosing two of x1 , x2 , x3 and then solving for the last, but this does involve a little bit
more math. Thus, the scalar equation is preferable when verifying if a given point lies on a
plane, and the vector equation is preferable when asked to generate points that lie on the
plane.

We have discussed parallel vectors previously, and we can use this definition to define
parallel lines and planes.

Definition 10.4: Parallel Lines and Parallel Planes
Two lines in Rn are parallel if their direction vectors are parallel. Two planes in R3
are parallel if their normal vectors are parallel.

Hyperplanes
The scalar equation for a plane in R3 introduced in the last section can be generalized to Rn. In fact, you've likely already seen this in R2: for example, the equation 2x1 + 3x2 = 4 determines a line in R2. Since the points P (2, 0) and Q(5, −2) satisfy this equation, they lie on the line and thus

\overrightarrow{PQ} = \overrightarrow{OQ} − \overrightarrow{OP} = [ 5, −2 ] − [ 2, 0 ] = [ 3, −2 ]

is a direction vector for this line (recall that any nonzero scalar multiple of \overrightarrow{PQ} can also serve as a direction vector for this line). Note also that if we form a vector ~n using the coefficients on x1 and x2 above, we obtain ~n = [ 2, 3 ] and that

~n · \overrightarrow{PQ} = [ 2, 3 ] · [ 3, −2 ] = 0.

We call ~n a normal vector of the line in R2 (again, any nonzero scalar multiple of ~n will also serve as a normal vector for this line). We now state the definition of a hyperplane in Rn.

Definition 10.5: Scalar Equation of a Hyperplane

A hyperplane in Rn is defined by the scalar equation a1 x1 + · · · + an xn = d where a1 , . . . , an , d ∈ R. The vector ~n = [ a1, . . . , an ] is called a normal vector for the hyperplane. Given the point P (p1 , . . . , pn ) lying on the hyperplane, d = a1 p1 + · · · + an pn .

Taking n = 2 in Definition 10.5, we see that in R2 , hyperplanes are lines, and taking n = 3
gives that in R3 , hyperplanes are the planes we discussed above.

Exercise 10.1
Find the vector equation and scalar equation for the line in R2 passing through
P (1, −3) and Q(2, 4). Plot the line along with a normal vector.

Solution. We compute \overrightarrow{PQ}, a direction vector for the line:

\overrightarrow{PQ} = \overrightarrow{OQ} − \overrightarrow{OP} = [ 2, 4 ] − [ 1, −3 ] = [ 1, 7 ].

Thus a vector equation for the line is

~x = \overrightarrow{OP} + t \overrightarrow{PQ} = [ 1, −3 ] + t [ 1, 7 ],   t ∈ R.

We see that ~n = [ 7, −1 ] is orthogonal to \overrightarrow{PQ}, so our scalar equation is of the form 7x1 − x2 = d. Using the point P , we see that d = 7(1) − 1(−3) = 10, so our scalar equation is 7x1 − x2 = 10.

Note that given a vector equation of a line in R2

~x = \overrightarrow{OP} + t d~,   t ∈ R,

with d~ = [ d1, d2 ] ≠ ~0, we can simply take ~n = [ d2, −d1 ] when finding the scalar equation.

Lecture 11

Projections
Given two vectors ~u, ~v ∈ Rn with ~v 6= ~0, we can write ~u = ~u1 + ~u2 where ~u1 is a scalar
multiple of ~v and ~u2 is orthogonal to ~v . In physics, this is often done when one wishes to
resolve a force into its vertical and horizontal components.

Figure 11.1: Decomposing ~u ∈ Rn as ~u = ~u1 + ~u2 where ~u1 is parallel to ~v and ~u2
is orthogonal to ~v .

This is not a new idea. In R2 , we have seen that we can write a vector ~u as a linear combination of ~e1 = [ 1, 0 ] and ~e2 = [ 0, 1 ] in a natural way. Figure 11.2 shows that we are actually writing a vector ~u ∈ R2 as the sum ~u1 + ~u2 where ~u1 is parallel to ~v = ~e1 and ~u2 is orthogonal to ~v = ~e1 .

Figure 11.2: Writing a vector ~u = [ x1, x2 ] ∈ R2 as a linear combination of ~e1 and ~e2 .

Now for ~u, ~v ∈ Rn with ~v 6= ~0,

~u = ~u1 + ~u2 =⇒ ~u2 = ~u − ~u1


~u2 orthogonal to ~v =⇒ ~u2 · ~v = 0
~u1 a scalar multiple of ~v =⇒ ~u1 = t~v for some t ∈ R
so if we can find t, then we can find ~u1 and then find ~u2 . To find t, we have

0 = ~u2 · ~v = (~u − ~u1 ) · ~v = ~u · ~v − ~u1 · ~v = ~u · ~v − (t~v ) · ~v .

Hence
0 = ~u · ~v − t(~v · ~v ) = ~u · ~v − tk~v k2
and since ~v ≠ ~0,

t = (~u · ~v ) / k~v k².

Definition 11.1: Projection and Perpendicular

Let ~u, ~v ∈ Rn with ~v ≠ ~0. The projection of ~u onto ~v is

proj ~v ~u = ( (~u · ~v ) / k~v k² ) ~v

and the projection of ~u perpendicular to ~v (or the perpendicular of ~u onto ~v ) is

perp ~v ~u = ~u − proj ~v ~u.

Note that from our above work, ~u1 = proj ~v ~u and ~u2 = perp ~v ~u.

Figure 11.3: Visualizing projections and perpendiculars based on the angle deter-
mined by ~u, ~v ∈ Rn .

Example 11.1

Let ~u = [ 1, 2, 3 ] and ~v = [ −1, 1, 2 ]. Then

proj ~v ~u = ( (~u · ~v ) / k~v k² ) ~v = ( (−1 + 2 + 6) / (1 + 1 + 4) ) [ −1, 1, 2 ] = (7/6) [ −1, 1, 2 ] = [ −7/6, 7/6, 7/3 ]

and

perp ~v ~u = ~u − proj ~v ~u = [ 1, 2, 3 ] − [ −7/6, 7/6, 7/3 ] = [ 13/6, 5/6, 2/3 ].

In the previous example, note that

• proj ~v ~u = (7/6) ~v , which is a scalar multiple of ~v ,

• (perp ~v ~u) · ~v = −13/6 + 5/6 + 4/3 = −8/6 + 8/6 = 0, so perp ~v ~u is orthogonal to ~v ,

• proj ~v ~u + perp ~v ~u = ~u.
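A small NumPy sketch (illustrative only) that reproduces Example 11.1 and checks the three observations above:

    import numpy as np

    u = np.array([1.0, 2.0, 3.0])
    v = np.array([-1.0, 1.0, 2.0])

    proj = (np.dot(u, v) / np.dot(v, v)) * v   # projection of u onto v
    perp = u - proj                            # perpendicular of u onto v

    print(proj)              # [-7/6, 7/6, 7/3] as decimals
    print(perp)              # [13/6, 5/6, 2/3] as decimals
    print(np.dot(perp, v))   # 0.0 (up to rounding), so perp is orthogonal to v
    print(proj + perp - u)   # [0. 0. 0.]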

Exercise 11.1

For ~u, ~v ∈ Rn with ~v ≠ ~0, prove that proj ~v ~u and perp ~v ~u are orthogonal. Hint: Definition 11.1, Theorem 7.1 (Properties of Norms), and Theorem 7.2 (Properties of Dot Products) will be helpful here.

Proof. For ~u, ~v ∈ Rn with ~v ≠ ~0, we have

proj ~v ~u · perp ~v ~u = proj ~v ~u · (~u − proj ~v ~u)
                       = (proj ~v ~u) · ~u − proj ~v ~u · proj ~v ~u
                       = ( (~u · ~v ) / k~v k² ) (~v · ~u) − ( (~u · ~v ) / k~v k² )² (~v · ~v )
                       = (~u · ~v )² / k~v k² − ( (~u · ~v )² / k~v k⁴ ) k~v k²
                       = (~u · ~v )² / k~v k² − (~u · ~v )² / k~v k²
                       = 0

and thus proj ~v ~u and perp ~v ~u are orthogonal.
The Shortest Distance from a Point to a Line
Given a point P , and the vector equation of a line, we are interested in finding the shortest
distance from P to the line, and also the point Q on the line that is closest to P .

Example 11.2

Find the shortest distance from the point P (1, 2, 3) to the line L which passes through the point P0 (2, −1, 2) with direction vector d~ = [ 1, 1, −1 ]. Also, find the point Q on L that is closest to P .

Solution. The following illustration can help us visualize the problem. Note that the line L and the point P were plotted arbitrarily, so it is not meant to be accurate. It does however, give us a way to think about the problem geometrically and inform us as to what computations we should do.

We construct the vector from the point P0 lying on the line to the point P , which gives

\overrightarrow{P0 P} = \overrightarrow{OP} − \overrightarrow{OP0} = [ 1, 2, 3 ] − [ 2, −1, 2 ] = [ −1, 3, 1 ].

Projecting the vector \overrightarrow{P0 P} onto the direction vector d~ of the line leads to

proj d~ \overrightarrow{P0 P} = ( ( \overrightarrow{P0 P} · d~ ) / kd~k² ) d~ = ( (−1 + 3 − 1) / (1 + 1 + 1) ) [ 1, 1, −1 ] = (1/3) [ 1, 1, −1 ] = [ 1/3, 1/3, −1/3 ]

and it follows that

perp d~ \overrightarrow{P0 P} = \overrightarrow{P0 P} − proj d~ \overrightarrow{P0 P} = [ −1, 3, 1 ] − [ 1/3, 1/3, −1/3 ] = [ −4/3, 8/3, 4/3 ].

The shortest distance from P to L is thus given by

kperp d~ \overrightarrow{P0 P}k = (1/3) √(16 + 64 + 16) = (1/3) √( 16(1 + 4 + 1) ) = (4/3) √6.

We have two ways to find the point Q since

\overrightarrow{OQ} = \overrightarrow{OP0} + proj d~ \overrightarrow{P0 P} = [ 2, −1, 2 ] + [ 1/3, 1/3, −1/3 ] = [ 7/3, −2/3, 5/3 ]

and

\overrightarrow{OQ} = \overrightarrow{OP} − perp d~ \overrightarrow{P0 P} = [ 1, 2, 3 ] − [ −4/3, 8/3, 4/3 ] = [ 7/3, −2/3, 5/3 ].

In either case, Q(7/3, −2/3, 5/3) is the point on L closest to P .
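For checking such computations by computer, here is a short NumPy sketch (illustrative only, using the data of Example 11.2):

    import numpy as np

    P  = np.array([1.0, 2.0, 3.0])
    P0 = np.array([2.0, -1.0, 2.0])
    d  = np.array([1.0, 1.0, -1.0])

    w = P - P0
    proj = (np.dot(w, d) / np.dot(d, d)) * d
    perp = w - proj

    print(np.linalg.norm(perp))   # 3.265... = (4/3)*sqrt(6)
    print(P0 + proj)              # [2.333..., -0.666..., 1.666...] = Q(7/3, -2/3, 5/3)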

We see now that our illustration in Example 11.2 was indeed inaccurate. It seems to suggest that proj d~ \overrightarrow{P0 P} is approximately (5/2) d~, but our computations show that proj d~ \overrightarrow{P0 P} = (1/3) d~. This is okay though, as the illustration was meant only as a guide to inform us as to what computations to perform.

The Shortest Distance from a Point to a Plane


In R3 , given a point P and the scalar equation of a plane, we want to find the shortest distance from P to the plane, and also the point Q on the plane closest to P .

Example 11.3

Find the shortest distance from the point P (1, 2, 3) to the plane T with equation x1 + x2 − 3x3 = −2. Also, find the point Q on T that is closest to P .

Solution. The accompanying illustration can help us visualize the problem. As in Example 11.2, this picture is not meant to be accurate as the point and the plane have been plotted arbitrarily, but rather to inform us on what computations we should perform.

We see that P0 (−2, 0, 0) lies on T since −2 + 0 − 3(0) = −2. We also have that

~n = [ 1, 1, −3 ]

is a normal vector for T . Now

\overrightarrow{P0 P} = \overrightarrow{OP} − \overrightarrow{OP0} = [ 1, 2, 3 ] − [ −2, 0, 0 ] = [ 3, 2, 3 ]

and

proj ~n \overrightarrow{P0 P} = ( ( \overrightarrow{P0 P} · ~n ) / k~nk² ) ~n = ( (3 + 2 − 9) / (1 + 1 + 9) ) [ 1, 1, −3 ] = −(4/11) [ 1, 1, −3 ].

The shortest distance from P to T is

kproj ~n \overrightarrow{P0 P}k = | −4/11 | √(1 + 1 + 9) = 4√11 / 11.

To find Q we have

\overrightarrow{OQ} = \overrightarrow{OP} − proj ~n \overrightarrow{P0 P} = [ 1, 2, 3 ] + (4/11) [ 1, 1, −3 ] = [ 15/11, 26/11, 21/11 ]

so Q(15/11, 26/11, 21/11) is the point on T closest to P .
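Again purely as an illustrative check (a NumPy sketch with the data of Example 11.3):

    import numpy as np

    P  = np.array([1.0, 2.0, 3.0])
    P0 = np.array([-2.0, 0.0, 0.0])    # a point satisfying x1 + x2 - 3x3 = -2
    n  = np.array([1.0, 1.0, -3.0])    # normal vector of the plane

    w = P - P0
    proj = (np.dot(w, n) / np.dot(n, n)) * n

    print(np.linalg.norm(proj))        # 1.206... = 4*sqrt(11)/11
    print(P - proj)                    # [1.3636..., 2.3636..., 1.9090...] = Q(15/11, 26/11, 21/11)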
Volumes of Parallelepipeds in R3
Consider three nonzero vectors ~w, ~x, ~y ∈ R3 such that no one vector is a linear combination of the other two (that is, ~w, ~x, ~y are nonzero and nonparallel and no one of them lies on the plane determined by the other two; as we will see shortly, these conditions are equivalent to the set { ~w, ~x, ~y } being linearly independent). These three vectors determine a parallelepiped, which is the three dimensional analogue of a parallelogram.

Figure 11.4: A parallelepiped determined by the vectors ~w, ~x and ~y .

The volume of the parallelepiped is the product of its height with the area of its base. We know that the area of the base is given by k~x × ~y k (which is nonzero since ~x and ~y are nonzero and nonparallel), and we can find the height by computing the length of the projection of ~w onto ~x × ~y . Thus, the volume V of the parallelepiped is given by

V = k~x × ~y k kproj ~x×~y ~w k
  = k~x × ~y k k ( ( ~w · (~x × ~y ) ) / k~x × ~y k² ) (~x × ~y ) k
  = k~x × ~y k ( | ~w · (~x × ~y )| / k~x × ~y k² ) k~x × ~y k
  = | ~w · (~x × ~y )|

Example 11.4

Let ~w = [ 1, 1, 1 ], ~x = [ 1, 1, 2 ], and ~y = [ 1, 2, −3 ]. Find the volume of the parallelepiped they determine.
Solution. We compute

~w · (~x × ~y ) = [ 1, 1, 1 ] · ( [ 1, 1, 2 ] × [ 1, 2, −3 ] ) = [ 1, 1, 1 ] · [ −7, 5, 1 ] = −7 + 5 + 1 = −1

so the volume of the parallelepiped determined by ~w, ~x and ~y is

V = | ~w · (~x × ~y )| = | − 1| = 1.
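A quick numerical confirmation of Example 11.4 (an illustrative NumPy sketch of the scalar triple product):

    import numpy as np

    w = np.array([1.0, 1.0, 1.0])
    x = np.array([1.0, 1.0, 2.0])
    y = np.array([1.0, 2.0, -3.0])

    volume = abs(np.dot(w, np.cross(x, y)))
    print(volume)   # 1.0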

We make a couple of remarks here:

• In our derivation of the formula for the volume of the parallelepiped determined by the vectors ~w, ~x and ~y , there was nothing special about labelling the vectors the way that we did. We only needed to call one of the vectors ~w, one of them ~x and one of them ~y . Also, we could have chosen any of the six faces of the parallelepiped to be the base. Thus, we also have

V = | ~w · (~y × ~x)| = |~x · ( ~w × ~y )| = |~x · (~y × ~w )| = |~y · (~x × ~w )| = |~y · ( ~w × ~x)|.

• Our derivation also required that no one of the vectors ~w, ~x and ~y was a linear combination of the others (so no one of the three vectors lay in the plane through the origin determined by the other two). Suppose one of the vectors is a linear combination of the others, say ~w is a linear combination of ~x and ~y . Then ~w = s~x + t~y for some s, t ∈ R (from which we see ~w lies in the plane through the origin determined by ~x and ~y ). Geometrically, the resulting parallelepiped determined by ~w, ~x and ~y is “flat”, and thus the volume should be zero. Since ~w lies in the plane determined by ~x and ~y , we have that ~w is orthogonal to ~x × ~y and so ~w · (~x × ~y ) = 0, so our derived formula does indeed return the correct volume. A similar result occurs if ~x or ~y is a linear combination of the other two vectors. Thus, our formula V = | ~w · (~x × ~y )| holds for any three vectors ~w, ~x, ~y ∈ R3 .
Lecture 12

Systems of Linear Equations


Consider the following problem of finding all points that lie on three given planes in R3 .

Exercise 12.1
Find all points that lie on all three planes with scalar equations 2x1 + x2 + 9x3 = 31,
x2 + 2x3 = 8 and x1 + 3x3 = 10.

Solution. We are looking for points (x1 , x2 , x3 ) that simultaneously satisfy the three equa-
tions
2x1 + x2 + 9x3 = 31
x2 + 2x3 = 8
x1 + 3x3 = 10
From the second equation, we see that x2 = 8 − 2x3 and from the third equation, we have
that x1 = 10 − 3x3 . Substituting both of these into the first equation gives

31 = 2x1 + x2 + 9x3 = 2(10 − 3x3 ) + 8 − 2x3 + 9x3 = 20 − 6x3 + 8 − 2x3 + 9x3 = 28 + x3

so that x3 = 3. From x2 = 8 − 2x3 , we find that x2 = 2, and from x1 = 10 − 3x3 we have


x1 = 1. Thus, the only point that lies on all three planes is (x1 , x2 , x3 ) = (1, 2, 3).
The method of Elimination or Substitution can be used to solve this problem, but we will look for a more systematic method to solve such problems that extends to handling more equations and more variables than in Exercise 12.1.

Definition 12.1: System of Linear Equations


A linear equation in n variables is an equation of the form

a1 x 1 + a2 x 2 + · · · + an x n = b

where x1 , . . . , xn ∈ R are the variables or unknowns, a1 , . . . , an ∈ R are coefficients


and b ∈ R is the constant term. A system of linear equations (also called a linear
system of equations) is a collection of finitely many linear equations.

Example 12.1
The system
3x1 + 2x2 − x3 + 3x4 = 3
2x1 + x3 − 2x4 = −1
is a system of two linear equations in four variables. We see that each equation is the
scalar equation of a hyperplane in R4 . More generally, a system of m linear equations
in n variables is written as

a11 x1 + a12 x2 + · · · + a1n xn = b1


a21 x1 + a22 x2 + · · · + a2n xn = b2
.. .. .. .. ..
. . . . .
am1 x1 + am2 x2 + · · · + amn xn = bm

The number aij is the coefficient of xj in the ith equation and bi is the constant term
in the ith equation. Each of the m equations is a scalar equation of a hyperplane in
Rn .

Definition 12.2: Solution Set of a System of Linear Equations


A vector ~s = [ s1, . . . , sn ] is a solution to a system of m equations in n variables if all m equations are satisfied when we set xj = sj for j = 1, . . . , n. The set of all solutions to a system of equations is called the solution set.

We may view the solution set of a system of m equations in n variables as the intersection
of m hyperplanes in Rn determined by the system.

Example 12.2
Solving the system of two linear equations in two variables

a11 x1 + a12 x2 = b1
a21 x1 + a22 x2 = b2

can be viewed as finding the points of intersection of the two lines with scalar equations
a11 x1 + a12 x2 = b1 and a21 x1 + a22 x2 = b2 (where we are assuming that a11 , a12 are not
both zero and that a21 , a22 are not both zero). The following figure shows the possible
number of solutions we may obtain.

We see that a system of two equations in two variables can have no solutions, exactly
one solution or infinitely many solutions. Figure 12.1 shows a similar situation when we
consider a system of three equations in three variables, which we may view geometrically as
intersecting three planes in R3 . Indeed we will see that for any linear system of m equations
in n variables, we will obtain either no solutions, exactly one solution, or infinitely many
solutions.

Figure 12.1: Number of solutions resulting from intersecting three planes. Note
that there are other ways to arrange these planes to obtain the given number of
solutions.

Definition 12.3: Consistent and Inconsistent


We call a linear system of equations consistent if it has at least one solution. Otherwise,
we call the linear system inconsistent.

Example 12.3
Solve the linear system
x1 + 3x2 = −1
x 1 + x2 = 3
Solution. To begin, we will eliminate x1 in the second equation by subtracting the first
equation from the second:
!
x1 + 3x2 = −1 Subtract the first x1 + 3x2 = −1
−→ −→
x1 + x2 = 3 equation from the second −2x2 = 4

Next, we multiply the second equation by a factor of − 12 :


!
x1 + 3x2 = −1 Multiply second x1 + 3x2 = −1
−→ 1
−→
−2x2 = 4 equation by − 2 x2 = −2

Finally we eliminate x2 from the first equation by subtracting the second equation
from the first equation three times:
!
x1 + 3x2 = −1 Subtract 3 times the second x1 = 5
−→ −→
x2 = −2 equation from the first x2 = −2

From here, we conclude that the given system is consistent and


" # " #
x1 = 5 x1 5
or = or (x1 , x2 ) = (5, −2)
x2 = −2 x2 −2

which we refer to as the parametric form of the solution, the vector form of the solution
and the point form of the solution respectively.

Notice that when we write a system of equations, we always list the variables in order and
that when we solve a system of equations, we are ultimately concerned with the coefficients
and constant terms. Thus, we can write the above systems of equations and the subsequent
operations we used to solve the system more compactly:
" # " # " # " #
1 3 −1 −→ 1 3 −1 −→ 1 3 −1 R1 −3R2 1 0 5
1 1 3 R2 −R1 0 −2 4 − 12 R2 0 1 −2 −→ 0 1 −2
so " # " #
x1 5
=
x2 −2

as above. We call

[ 1 3 ]
[ 1 1 ]

the coefficient matrix of the linear system (a matrix will be formally defined in Lecture 18; for now, we view matrices as rectangular arrays of numbers used to represent systems of linear equations), which is often denoted by A. The vector

[ −1 ]
[  3 ]

is the constant matrix (or constant vector) of the linear system and will be denoted by ~b. Finally

[ 1 3 | −1 ]
[ 1 1 |  3 ]

is the augmented matrix of the linear system, and will be denoted by [ A | ~b ]. This is generalized in the following definition.

Definition 12.4: Coefficient and Augmented Matrices, Constant Vector


For the system of linear equations
a11 x1 + a12 x2 + · · · + a1n xn = b1
a21 x1 + a22 x2 + · · · + a2n xn = b2
.. .. .. .. .. ,
. . . . .
am1 x1 + am2 x2 + · · · + amn xn = bm
the coefficient matrix is

A = [ a11 a12 · · · a1n ]
    [ a21 a22 · · · a2n ]
    [  ·    ·   ·    ·  ]
    [ am1 am2 · · · amn ],

the constant vector is ~b = [ b1, . . . , bm ] and the augmented matrix is

[ A | ~b ] = [ a11 a12 · · · a1n | b1 ]
             [ a21 a22 · · · a2n | b2 ]
             [  ·    ·   ·    ·  | ·  ]
             [ am1 am2 · · · amn | bm ].


From the discussion immediately following Example 12.3, we see that by taking the augmented matrix of a linear system of equations, we can “reduce” it to an augmented matrix of a simpler system from which we can “read off” the solution. Notice that by writing things in this way, we are simply suppressing the variables (since we know x1 is always the first variable and x2 is always the second variable), and treating the equations as rows of the augmented matrix. Thus, the operation R2 − R1 written to the right of the second row of an augmented matrix means that we are subtracting the first row from the second to obtain a new second row which would appear in the next augmented matrix. The following definition summarizes the operations we are allowed to perform on an augmented matrix.

Definition 12.5: Elementary Row Operations

We are allowed to perform the following Elementary Row Operations (EROs) to the
augmented matrix of a linear system of equations:

• Swap two rows.

• Add a scalar multiple of one row to another.

• Multiply any row by a nonzero scalar.

We say that two systems are equivalent if they have the same solution set. A system derived
from a given system by performing elementary row operations on its augmented matrix will
be equivalent to the given system. Thus elementary row operations allow us to reduce a
complicated system to one that is easier to solve. In the previous example, since
" # " #
1 3 −1 1 0 5
−→
1 1 3 0 1 −2

the systems they represent, namely

x1 + 3x2 = −1 x1 = 5
and ,
x1 + x2 = 3 x2 = −2
must have the same solution set. Clearly, the second system is easier to solve as we can
simply read off the solution.

Example 12.4
Solve the linear system of equations

2x1 + x2 + 9x3 = 31
x2 + 2x3 = 8
x1 + 3x3 = 10

Solution. To solve this system, we perform elementary row operations to the aug-
mented matrix:
     
2 1 9 31 −→ 1 0 3 10 −→ 1 0 3 10 −→
 0 1 2 8  R1 ↔R3  0 1 2 8   0 1 2 8 
     

1 0 3 10 2 1 9 31 R3 −2R1 0 1 3 11 R3 −R2
   
1 0 3 10 R1 −3R3 1 0 0 1
 0 1 2 8  R2 −2R3  0 1 0 2 
   

0 0 1 3 −→ 0 0 1 3

We thus have
   
x1 = 1 x1 1
x2 = 2 or  x2  =  2  or (x1 , x2 , x3 ) = (1, 2, 3)
   

x3 = 3 x3 3

as our solution.
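For comparison, the unique solution of this system can be checked numerically. The sketch below is illustrative only (in MATH 115 you are of course expected to row reduce by hand).

    import numpy as np

    A = np.array([[2.0, 1.0, 9.0],
                  [0.0, 1.0, 2.0],
                  [1.0, 0.0, 3.0]])
    b = np.array([31.0, 8.0, 10.0])

    print(np.linalg.solve(A, b))   # [1. 2. 3.], matching Example 12.4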

Lecture 13

Solving Systems of Linear Equations


In the last lecture, it perhaps wasn’t clear which elementary row operations one should
perform on an augmented matrix in order to solve a linear system of equations. Note that
in the two examples done last lecture, we computed
" # " #
1 3 −1 1 0 5
−→
1 1 3 0 1 −2

and
   
2 1 9 31 1 0 0 1
 0 1 2 8  −→  0 1 0 2 
   

1 0 3 10 0 0 1 3

In both cases, we chose our elementary row operations in order to get to the augmented
matrices on the right, and this is the “form” that we are looking for.

Definition 13.1: Row Echelon Form and Reduced Row Echelon Form
• The first nonzero entry in each row of a matrix is called a leading entry (or a
pivot).

• A matrix is in Row Echelon Form (REF) if

(1) All rows whose entries are all zero appear below all rows that contain
nonzero entries,
(2) Each leading entry is to the right of the leading entries above it.

• A matrix is in Reduced Row Echelon Form (RREF) if it is in REF and

(3) Each leading entry is a 1 called a leading one,


(4) Each leading one is the only nonzero entry in its column.

Note that by definition, if a matrix is in RREF, then it is in REF.

When row reducing the augmented matrix of a linear system of equations, we aim first to
reduce the augmented matrix to REF. Once we have reached an REF form, we continue
using elementary row operations until we reach RREF where we can simply read off the
solution.

Recalling Example 12.4, we rewrite the steps taken to row reduce the augmented matrix of the system and circle the leading entries:
     
2 1 9 31 −→ 1 0 3 10 −→ 1 0 3 10 −→
 0 1 2 8  R1 ↔R3  0 1 2 8   0 1 2 8 
     

1 0 3 10 2 1 9 31 R3 −2R1 0 1 3 11 R3 −R2
   
1 0 3 10 R1 −3R3 1 0 0 1
 0 1 2 8  R2 −2R3  0 1 0 2 
   

0 0 1 3 −→ 0 0 1 3
(the second-last matrix above is in REF, and the last matrix is in REF and RREF)
We point out here that any matrix has many REFs, but the RREF is always unique for any
matrix.

Example 13.1
Solve the linear system of equations
3x1 + x2 = 10
2x1 + x2 + x3 = 6
−3x1 + 4x2 + 15x3 = −20
Solution. We use elementary row operations to carry the augmented matrix of the
system to RREF.
     
3 1 0 10 R1 −R2 1 0 −1 4 −→ 1 0 −1 4 −→
 2 1 1 6  −→  2 1 1 6  R2 −2R1  0 1 3 −2 
     

−3 4 15 −20 −3 4 15 −20 R3 +3R1 0 4 12 −8 R3 −4R2


 
1 0 −1 4
 0 1 3 −2 
 

0 0 0 0
If we write out the resulting system, we have
x1 − x3 = 4
x2 + 3x3 = −2
0 = 0
The last equation is clearly always true, and from the first two equations, we can solve
for x1 and x2 respectively to obtain
x1 = 4 + x3
x2 = −2 − 3x3

We see that there is no restriction on x3 , so we let x3 = t ∈ R. Thus our solution is
     
x1 = 4 + t
x2 = −2 − 3t            t ∈ R      or      [ x1, x2, x3 ] = [ 4, −2, 0 ] + t [ 1, −3, 1 ],   t ∈ R.
x3 = t
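To see the RREF and the pivot (leading) columns by computer, one illustrative sketch uses SymPy, which performs exact rational row reduction (again, this is not required for the course):

    from sympy import Matrix

    # augmented matrix of the system in Example 13.1
    aug = Matrix([[3, 1, 0, 10],
                  [2, 1, 1, 6],
                  [-3, 4, 15, -20]])

    rref_matrix, pivot_cols = aug.rref()
    print(rref_matrix)   # Matrix([[1, 0, -1, 4], [0, 1, 3, -2], [0, 0, 0, 0]])
    print(pivot_cols)    # (0, 1): the first two columns have leading ones, so x3 is free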

Geometrically, we view solving the above system of equations as finding those points in R3
that lie on the three planes 3x1 + x2 = 10, 2x1 + x2 + x3 = 6 and −3x1 + 4x2 + 15x3 = −20.
Notice that the solution we obtained
     
x1 4 1
 x2  =  −2  + t  −3  , t ∈ R
     

x3 0 1

is the vector equation of a line in R3 . Hence we see that the three planes intersect in a line,
and we have found a vector equation for that line. See Figure 13.1.

Figure 13.1: The intersection of the three planes in R3 is a line. Note that the
planes may not be arranged exactly as shown.

That our solution was a line in R3 was a direct consequence of the fact that there were no
restrictions on the variable x3 and that as a result, our solutions for x1 and x2 depended on
x3 . This motivates the following definition.

Definition 13.2: Leading Variable and Free Variable

Consider a consistent system of equations with augmented matrix [ A | ~b ], and let


[ R | ~c ] be any REF of [ A | ~b ]. If the jth column of R has a leading entry in it, then
the variable xj is called a leading variable. If the jth column of R does not have a
leading entry, then xj is called a free variable.

In our last example,


     
3 1 0 10 R1 −R2 1 0 −1 4 −→ 1 0 −1 4 −→
 2 1 1 6  −→  2 1 1 6   0 1 3 −2 
     
R2 −2R1

-3 4 15 −20 -3 4 15 −20 R3 +3R1 0 4 12 −8 R3 −4R2


 
1 0 −1 4
 0 1 3 −2 
 

0 0 0 0
(this last matrix is in REF and RREF)
With

R = [ 1 0 −1 ]
    [ 0 1  3 ]
    [ 0 0  0 ]

being an REF of the coefficient matrix of the linear system of equations, we see that R has leading entries (leading ones, in fact) in the first and second columns only. Thus Definition 13.2 states that x1 and x2 are leading variables while x3 is a free variable.

When solving a consistent system, if there are free variables, then each free variable is as-
signed a different parameter, and then the leading variables are solved for in terms of the
parameters. The existence of a free variable guarantees that there will be infinitely many
solutions to the linear system of equations.

Example 13.2
Solve the linear system of equations

x1 + 6x2 − x4 = −1
.
x3 + 2x4 = 7

Solution. The augmented matrix for this system of linear equations


" #
1 6 0 −1 −1
0 0 1 2 7

is already in RREF. The leading entries are in the first and third columns, so x1 and
x3 are leading variables while x2 and x4 are free variables. We will assign x2 and x4

different parameters. We have

x1 = −1 − 6s + t
x2 = s
x3 = 7 − 2t            s, t ∈ R
x4 = t

or as a vector equation

[ x1, x2, x3, x4 ] = [ −1, 0, 7, 0 ] + s [ −6, 1, 0, 0 ] + t [ 1, 0, −2, 1 ],   s, t ∈ R

which we recognize as the vector equation of a plane in R4 .

Example 13.3
Solve the linear system of equations

2x1 + 12x2 − 8x3 = −4


2x1 + 13x2 − 6x3 = −5
−2x1 − 14x2 + 4x3 = 7

Solution. We have
     
2 12 −8 −4 −→ 2 12 −8 −4 −→ 2 12 −8 −4
 2 13 −6 −5  R2 −R1  0 1 2 −1   0 1 2 −1 
     

−2 −14 4 7 R3 +R1 0 −2 −4 3 R3 +2R2 0 0 0 1

The resulting system is

2x1 + 12x2 − 8x3 = −4


x2 + 2x3 = −1
0 = 1

Clearly, the last equation can never be satisfied for any x1 , x2 , x3 ∈ R. Hence our
system is inconsistent, that is, it has no solution.

Geometrically, we see that the three planes 2x1 + 12x2 − 8x3 = −4, 2x1 + 13x2 − 6x3 = −5
and −2x1 − 14x2 + 4x3 = 7 of Example 13.3 have no point in common. Notice that no two

of these planes are parallel so the planes are arranged similarly to what is depicted in Figure
13.2.

Figure 13.2: Three nonparallel planes that have no common point of intersection.

Keeping track of our leading entries in Example 13.3, we see


     
2 12 −8 −4 −→ 2 12 −8 −4 −→ 2 12 −8 −4
 2 13 −6 −5  R2 −R1  0 1 2 −1   0 1 2 −1 
     

-2 −14 4 7 R3 +R1 0 -2 −4 3 R3 +2R2 0 0 0 1


(this last matrix is in REF but not RREF)

If row reducing an augmented matrix reveals a row of the form

[ 0 · · · 0 | c ]

with c ≠ 0, then the system is inconsistent. Thus, there is no need to continue row operations in this case. Note that in a row of the form [ 0 · · · 0 | c ] with c ≠ 0, the entry c is a leading entry. Thus, a leading entry appearing in the last column of an augmented matrix indicates that the system of linear equations is inconsistent.

All of our work for systems of linear equations can easily be generalized to the complex case.

Example 13.4
Solve the linear system of equations

jz1 − z2 − z3 + (−1 + j)z4 = −1


− (1 + j)z3 − 2jz4 = −1 − 3j
2jz1 − 2z2 − z3 − (1 − 3j)z4 = j

Solution. Our method to solve this system is no different than in the real case. We
take the augmented matrix of the system and use elementary row operations to carry
it to RREF. Note that our elementary row operations now involve multiplying a row
by a complex number and adding a complex multiple of one row to another, in addition
to swapping two distinct rows.
   
j −1 −1 −1 + j −1 −→ j −1 −1 −1 + j −1 −jR1

 0 0 −1 − j −2j −1 − 3j  0 0 −1 − j −2j −1 − 3j  R2 ↔R3


   

2j −2 −1 −1 + 3j j R3 −2R1 0 0 1 1+j 2+j −→
   
1 j j 1+j j −→ 1 j j 1+j j R1 −jR2

 0 0 1 1+j 2+j  0 0 1 1+j 2 + j  −→


   

0 0 −1 − j −2j −1 − 3j R3 +(1+j)R2 0 0 0 0 0
 
1 j 0 2 1−j
 0 0 1 1+j 2+j 
 

0 0 0 0 0

We see that the system is consistent and that z1 and z3 are leading variables while z2 and z4 are free variables. Thus

z1 = (1 − j) − js − 2t
z2 = s
z3 = (2 + j) − (1 + j)t            s, t ∈ C
z4 = t

or

[ z1, z2, z3, z4 ] = [ 1 − j, 0, 2 + j, 0 ] + s [ −j, 1, 0, 0 ] + t [ −2, 0, −1 − j, 1 ],   s, t ∈ C

Note that when we are dealing with a complex system of linear equations, our parameters
should be complex numbers rather than just real numbers.

Lecture 14

Rank
After solving numerous systems of equations, we are beginning to see the importance of lead-
ing entries in an REF of the augmented matrix of the system. This motivates the following
definition.

Definition 14.1: Rank


The rank of a matrix A, denoted by rank (A), is the number of leading entries in any
REF of A.

Note that although we don’t prove it here, given a matrix and any two of its REFs, the
number of leading entries in both of these REFs will be the same. This means that our
definition of rank actually makes sense.

Example 14.1
Consider the following three matrices A, B and C along with one of their REFs. Note
that A and B are being viewed as augmented matrices for a linear system of equations,
while C is being viewed as a coefficient matrix.
   
2 1 9 31 1 0 3 10
A =  0 1 2 8  −→  0 1 2 8 
   

1 0 3 10 0 0 1 3
" # " #
2 0 1 3 4 1 1 4 −13 −5
B= −→
5 1 6 −7 3 0 -2 −7 29 14
" # " #
1 2 3 1 2 3
C= −→
2 4 6 0 0 0

We see that rank (A) = 3, rank (B) = 2 and rank (C) = 1.

Note that the requirement that a matrix be in REF before counting leading entries is im-
portant. The matrix " #
1 2 3
C=
2 4 6
has two leading entries, but rank (C) = 1.
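Numerically, ranks can be checked with NumPy. This is an illustrative sketch; note that numpy.linalg.matrix_rank uses a numerical criterion rather than row reduction, but it agrees with the definition here for these small examples.

    import numpy as np

    A = np.array([[2.0, 1.0, 9.0, 31.0],
                  [0.0, 1.0, 2.0, 8.0],
                  [1.0, 0.0, 3.0, 10.0]])
    C = np.array([[1.0, 2.0, 3.0],
                  [2.0, 4.0, 6.0]])

    print(np.linalg.matrix_rank(A))   # 3
    print(np.linalg.matrix_rank(C))   # 1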

Note that if a matrix A has m rows and n columns, then rank (A) ≤ min{m, n}, the mini-
mum of m and n. This follows from the definition of leading entries and REF: there can be
at most one leading entry in each row and each column.

The next theorem is useful to analyze systems of equations and will be used later in the
course.

Theorem 14.1: System-Rank Theorem

Let [ A | ~b ] be the augmented matrix of a system of m linear equations in n variables.

(1) The system is consistent if and only if rank (A) = rank [ A | ~b ]




(2) If the system is consistent, then the number of parameters in the general solution
is the number of variables minus the rank of A:

# of parameters = n − rank (A).

(3) The system is consistent for all ~b ∈ Rm if and only if rank (A) = m.

We don’t prove the System-Rank Theorem here. However, we will look at some of the sys-
tems we have encountered thus far and show that they each satisfy all three parts of the
System-Rank Theorem.

Example 14.2
From Example 12.4, the system of m = 3 linear equations in n = 3 variables
2x1 + x2 + 9x3 = 31
x2 + 2x3 = 8
x1 + 3x3 = 10
has augmented matrix
   
            [ 2 1 9 | 31 ]        [ 1 0 0 | 1 ]
[ A | ~b ] = [ 0 1 2 | 8  ]  −→    [ 0 1 0 | 2 ]
            [ 1 0 3 | 10 ]        [ 0 0 1 | 3 ]

and solution

[ x1 x2 x3 ]T = [ 1 2 3 ]T .
From the System-Rank Theorem we see that

(1) rank (A) = 3 = rank [ A | ~b ] so the system is consistent.


(2) # of parameters = n − rank (A) = 3 − 3 = 0 so there are no parameters in the


solution (unique solution).

(3) rank (A) = 3 = m so the system will be consistent for any ~b ∈ R3 , that is, the
system

2x1 + x2 + 9x3 = b1
x2 + 2x3 = b2
x1 + 3x3 = b3
will be consistent (with a unique solution) for any choice of b1 , b2 , b3 ∈ R.
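As an aside, the ranks appearing in the System-Rank Theorem are easy to check numerically. The following Python sketch (assuming the numpy package is available; it is not part of the course material) recomputes them for the system of Example 14.2.

import numpy as np

A = np.array([[2, 1, 9],
              [0, 1, 2],
              [1, 0, 3]], dtype=float)
b = np.array([31.0, 8.0, 10.0])

rank_A = np.linalg.matrix_rank(A)
rank_Ab = np.linalg.matrix_rank(np.column_stack([A, b]))

print("rank(A) =", rank_A, " rank([A|b]) =", rank_Ab)   # both equal 3
print("consistent:", rank_A == rank_Ab)
print("# of parameters:", A.shape[1] - rank_A)           # n - rank(A) = 0, a unique solution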

Example 14.3
From Example 13.1, the system of m = 3 linear equations in n = 3 variables

3x1 + x2 = 10
2x1 + x2 + x3 = 6
−3x1 + 4x2 + 15x3 = −20

has augmented matrix


   
            [ 3  1  0  | 10  ]        [ 1 0 −1 | 4  ]
[ A | ~b ] = [ 2  1  1  | 6   ]  −→    [ 0 1  3 | −2 ]
            [ −3 4  15 | −20 ]        [ 0 0  0 | 0  ]

and solution

[ x1 x2 x3 ]T = [ 4 −2 0 ]T + t [ 1 −3 1 ]T ,   t ∈ R.
From the System-Rank Theorem, we have

(1) rank (A) = 2 = rank [ A | ~b ] so the system is consistent.




(2) # of parameters = n − rank (A) = 3 − 2 = 1 so there is 1 parameter in the


solution (infinitely many solutions).

(3) rank (A) = 2 6= 3 = m, so the system will not be consistent for every ~b ∈ R3 ,
that is, the system

3x1 + x2 = b1
2x1 + x2 + x 3 = b2
−3x1 + 4x2 + 15x3 = b3
will be inconsistent for some choice of b1 , b2 , b3 ∈ R.

Example 14.4
From Example 13.2, the system of m = 2 equations in n = 4 variables

x1 + 6x2 − x4 = −1
x3 + 2x4 = 7

has augmented matrix that is already in RREF

[ A | ~b ] = [ 1 6 0 −1 | −1 ]
            [ 0 0 1  2 | 7  ]

and solution

[ x1 x2 x3 x4 ]T = [ −1 0 7 0 ]T + s [ −6 1 0 0 ]T + t [ 1 0 −2 1 ]T ,   s, t ∈ R

From the System-Rank Theorem,

(1) rank (A) = 2 = rank [ A | ~b ] so the system is consistent.

(2) # of parameters = n − rank (A) = 4 − 2 = 2 so there are 2 parameters in the


solution (infinitely many solutions).

(3) rank (A) = 2 = m, so the system will be consistent for every ~b ∈ R2 , that is, the
system

x1 + 6x2 − x 4 = b1
x3 + 2x4 = b2
will be consistent (with infinitely many solutions) for any choice of b1 , b2 ∈ R.

Example 14.5
From Example 13.3, the system of m = 3 linear equations in n = 3 variables

2x1 + 12x2 − 8x3 = −4


2x1 + 13x2 − 6x3 = −5
−2x1 − 14x2 + 4x3 = 7

has augmented matrix


   
            [ 2   12  −8 | −4 ]        [ 2 12 −8 | −4 ]
[ A | ~b ] = [ 2   13  −6 | −5 ]  −→    [ 0  1  2 | −1 ]
            [ −2 −14   4 | 7  ]        [ 0  0  0 | 1  ]

and is inconsistent. From the System-Rank Theorem, we see

(1) rank (A) = 2 < 3 = rank [ A | ~b ], so the system is inconsistent.

(2) As the system is inconsistent, part (2) of the System-Rank Theorem does not apply here.

(3) rank (A) = 2 < 3 = m so the system will not be consistent for every ~b ∈ R3 .
    Indeed, as our work shows, the system is clearly not consistent for ~b = [ −4 −5 7 ]T .

In our last example, it is tempting to think that the system [ A | ~b ] will be inconsistent for
every ~b ∈ R3 , however, this is not the case. If we take ~b = ~0, then our system becomes

2x1 + 12x2 − 8x3 = 0


2x1 + 13x2 − 6x3 = 0
−2x1 − 14x2 + 4x3 = 0

It isn’t difficult to see that x1 = x2 = x3 = 0 is a solution, so that this system is indeed


consistent. Of course, we could ask for which ~b ∈ R3 is this system consistent.

Example 14.6
Find an equation that b1 , b2 , b3 ∈ R must satisfy so that the system

2x1 + 12x2 − 8x3 = b1


2x1 + 13x2 − 6x3 = b2
−2x1 − 14x2 + 4x3 = b3

is consistent.

Solution. We carry the augmented matrix of this system to REF.
     
[ 2   12  −8 | b1 ]
[ 2   13  −6 | b2 ]
[ −2 −14   4 | b3 ]

R2 − R1 and R3 + R1:

[ 2  12 −8 | b1      ]
[ 0   1  2 | b2 − b1 ]
[ 0  −2 −4 | b3 + b1 ]

R3 + 2R2:

[ 2  12 −8 | b1             ]
[ 0   1  2 | b2 − b1        ]
[ 0   0  0 | −b1 + 2b2 + b3 ]

We see rank (A) = 2 so we require rank [ A | ~b ] = 2 for consistency. Thus, we have




−b1 + 2b2 + b3 = 0.

Note that if −b1 + 2b2 + b3 6= 0, then the above system is inconsistent.

It’s possible that a linear system of equations may have coefficients which are defined in terms
of parameters (which we assume to be real numbers). Different values of these parameters
will lead to different systems of equations. We can use the System-Rank Theorem to determine which
values of the parameters will lead to systems with no solutions, one solution, and infinitely
many solutions.

Example 14.7
For which values of the parameters k, ` ∈ R does the system

2x1 + 6x2 = 5
4x1 + (k + 15)x2 = ` + 8

have no solutions? A unique solution? Infinitely many solutions?


Solution. Let

A = [ 2    6     ]      and      ~b = [ 5     ] .
    [ 4  k + 15  ]                    [ ` + 8 ]

We carry [ A | ~b ] to REF.

[ 2    6    | 5     ]
[ 4  k + 15 | ` + 8 ]

R2 − 2R1:

[ 2    6   | 5     ]
[ 0  k + 3 | ` − 2 ]

If k + 3 ≠ 0, that is if k ≠ −3, then rank (A) = 2 = rank [ A | ~b ] so the system
is consistent with 2 − rank (A) = 2 − 2 = 0 parameters. Hence we obtain a unique
solution. If k + 3 = 0, that is if k = −3, then

[ 2    6   | 5     ]                  [ 2  6 | 5     ]
[ 0  k + 3 | ` − 2 ]   simplifies as  [ 0  0 | ` − 2 ] .

If ` − 2 ≠ 0, that is if ` ≠ 2, then rank (A) = 1 < 2 = rank [ A | ~b ] so the system
is inconsistent and thus has no solutions. If ` − 2 = 0, that is if ` = 2, then
rank (A) = 1 = rank [ A | ~b ] so the system is consistent with 2 − rank (A) = 2 − 1 = 1
parameter. Hence we have infinitely many solutions.

In summary,

Unique Solution:             k ≠ −3
No Solutions:                k = −3 and ` ≠ 2
Infinitely Many Solutions:   k = −3 and ` = 2
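The three cases can also be checked numerically. The Python sketch below (assuming numpy is available; it is only an illustration) classifies the system for a few arbitrarily chosen sample values of k and `, one from each case.

import numpy as np

def classify(k, l):
    """Classify 2x1 + 6x2 = 5, 4x1 + (k+15)x2 = l + 8 using the System-Rank Theorem."""
    A = np.array([[2.0, 6.0], [4.0, k + 15.0]])
    b = np.array([5.0, l + 8.0])
    rA = np.linalg.matrix_rank(A)
    rAb = np.linalg.matrix_rank(np.column_stack([A, b]))
    if rA < rAb:
        return "no solutions"
    return "unique solution" if rA == A.shape[1] else "infinitely many solutions"

print(classify(k=1, l=0))    # k != -3: unique solution
print(classify(k=-3, l=7))   # k = -3, l != 2: no solutions
print(classify(k=-3, l=2))   # k = -3, l = 2: infinitely many solutions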

Definition 14.2: Underdetermined Linear System of Equations


A linear system of m equations in n variables is underdetermined if n > m, that is, if
it has more variables than equations.

Example 14.8
The linear system of equations

x1 + x 2 − x3 + x4 − x5 = 1
x1 − x2 − 3x3 + 2x4 + 2x5 = 7

is underdetermined.

Theorem 14.2
A consistent underdetermined linear system of equations has infinitely many solutions.

Proof. Consider a consistent underdetermined linear system of m equations in n variables


with augmented matrix [ A | ~b ]. Since rank (A) ≤ min{m, n} = m, the system will have
n − rank (A) ≥ n − m > 0 parameters by the System-Rank Theorem, and so will have
infinitely many solutions.

Definition 14.3: Overdetermined Linear System of Equations


A linear system of m equations in n variables is overdetermined if n < m, that is, if it
has more equations than variables.

Example 14.9
The linear system of equations

−2x1 + x2 = 2
x1 − 3x2 = 4
3x1 + 2x2 = 7

is overdetermined.

Note that overdetermined systems are often inconsistent. Indeed, the system in the previous
example is inconsistent. To see why this is, consider for example, three lines in R2 (so a
system of three equations in two variables like the one in the previous example). When
chosen arbitrarily, it is generally unlikely that all three lines would intersect in a common
point and hence we would generally expect no solutions.

Lecture 15

Homogeneous Systems of Linear Equations


We now discuss a particular type of linear system of equations that have some very nice
properties.

Definition 15.1: Homogeneous Linear System of Equations


A homogeneous linear equation is a linear equation where the constant term is zero. A
system of homogeneous linear equations is a collection of finitely many homogeneous
equations.

A homogeneous system of m linear equations in n variables is written as


a11 x1 + a12 x2 + · · · + a1n xn = 0
a21 x1 + a22 x2 + · · · + a2n xn = 0
.. .. .. .. ..
. . . . .
am1 x1 + am2 x2 + · · · + amn xn = 0
As this is still a linear system of equations, we use our usual techniques to solve such systems.
However, notice that x1 = x2 = · · · = xn = 0 satisfies each equation in the homogeneous
system, and thus ~0 ∈ Rn is a solution to this system, called the trivial solution. As every
homogeneous system has a trivial solution, we see immediately that homogeneous linear
systems of equations are always consistent.

Example 15.1
Solve the homogeneous linear system

x1 + x2 + x3 = 0
.
3x2 − x3 = 0

Solution. We have

[ 1 1  1 | 0 ]
[ 0 3 −1 | 0 ]

(1/3)R2, then R1 − R2:

[ 1 0  4/3 | 0 ]
[ 0 1 −1/3 | 0 ]

so

x1 = −(4/3)t
x2 = (1/3)t        t ∈ R,     or     [ x1 x2 x3 ]T = t [ −4/3  1/3  1 ]T ,   t ∈ R.
x3 = t

We make a few remarks about Example 15.1:
• Note that taking t = 0 gives the trivial solution, which is just one of infinitely many
solutions for the system. This should not be surprising since our system is under-
determined and consistent (consistency follows from the system being homogeneous).
Indeed, the solution set is actually a line through the origin.
• We can simplify our solution a little bit by eliminating fractions:

  [ x1 x2 x3 ]T = t [ −4/3  1/3  1 ]T = (t/3) [ −4  1  3 ]T = s [ −4  1  3 ]T ,   s ∈ R

  where s = t/3. Hence we can let the parameter “absorb” the factor of 1/3. This is not
necessary, but is useful if one wishes to eliminate fractions.
• When row reducing the augmented matrix of a homogeneous system of linear equa-
  tions, notice that the last column always contains zeros regardless of the row operations
  performed. Thus, it is common to row reduce only the coefficient matrix:

  [ 1 1  1 ]   −→   [ 1 0  4/3 ]
  [ 0 3 −1 ]        [ 0 1 −1/3 ] .

Definition 15.2: Associated Homogeneous System of Linear Equations

Given a non-homogeneous linear system of equations with augmented matrix [ A | ~b ] (so


~b 6= ~0), the homogeneous system with augmented matrix [ A | ~0 ] is called the associated
homogeneous system.

The solution to the associated homogeneous system tells us a lot about the solution of the
original non-homogeneous system. If we solve the system

x1 + x2 + x3 = 1
    3x2 − x3 = 3                                                     (14)

we have

[ 1 1  1 | 1 ]        [ 1 0  4/3 | 0 ]
[ 0 3 −1 | 3 ]  −→    [ 0 1 −1/3 | 1 ]

so

x1 = −(4/3)t
x2 = 1 + (1/3)t     t ∈ R,    or    [ x1 x2 x3 ]T = [ 0 1 0 ]T + t [ −4/3  1/3  1 ]T ,   t ∈ R.
x3 = t

Recall that the solution to the associated homogeneous system from Example 15.1 is

[ x1 x2 x3 ]T = t [ −4/3  1/3  1 ]T ,   t ∈ R

so we view the homogeneous solution from Example 15.1 as a line, say L0 , through the origin,
and the solution from (14) as a line, say L1 , through P (0, 1, 0) parallel to L0 . We refer to
[ 0 1 0 ]T as a particular solution to (14) and note that in general, the solution to a consistent
non-homogeneous system of linear equations is a particular solution to that system plus the
general solution to the associated homogeneous system of linear equations.

Solution to the system of equations:

[ x1 x2 x3 ]T = [ 0 1 0 ]T + t [ −4/3  1/3  1 ]T ,   t ∈ R
                (particular     (associated homogeneous
                 solution)       solution)

Solution to the associated homogeneous system of equations:

[ x1 x2 x3 ]T = t [ −4/3  1/3  1 ]T ,   t ∈ R.
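This decomposition is easy to verify numerically. The following Python sketch (assuming numpy is available; it is only an illustration) checks that every vector of the form “particular solution plus a homogeneous solution” satisfies system (14).

import numpy as np

# System (14): x1 + x2 + x3 = 1, 3x2 - x3 = 3
A = np.array([[1.0, 1.0, 1.0],
              [0.0, 3.0, -1.0]])
b = np.array([1.0, 3.0])

x_p = np.array([0.0, 1.0, 0.0])              # particular solution
v = np.array([-4.0 / 3.0, 1.0 / 3.0, 1.0])   # direction of the homogeneous solution

for t in [-2.0, 0.0, 1.0, 5.5]:
    x = x_p + t * v
    assert np.allclose(A @ x, b)             # x_p + t*v solves (14)
    assert np.allclose(A @ (t * v), 0)       # t*v solves the associated homogeneous system
print("particular + homogeneous structure verified")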

Example 15.2
Consider the system of linear equations

x1 + 6x2 − x4 = −1
x3 + 2x4 = 7

We know from Example 13.2 that the solution is


       
[ x1 x2 x3 x4 ]T = [ −1 0 7 0 ]T + s [ −6 1 0 0 ]T + t [ 1 0 −2 1 ]T ,   s, t ∈ R,

which is a plane through (−1, 0, 7, 0) in R4 since the vectors [ −6 1 0 0 ]T and
[ 1 0 −2 1 ]T are nonzero and not parallel. Thus the solution to the associated
homogeneous system

x1 + 6x2 − x4 = 0
     x3 + 2x4 = 0

is

[ x1 x2 x3 x4 ]T = s [ −6 1 0 0 ]T + t [ 1 0 −2 1 ]T ,   s, t ∈ R,

which we recognize as a plane through the origin in R4 .

Another nice property of homogeneous systems of linear equations is that given two solu-
tions, say ~x1 and ~x2 , any linear combination of them is also a solution to the system.

Example 15.3
Consider a homogeneous system of m equations in n unknowns. Suppose ~y = [ y1 · · · yn ]T
and ~z = [ z1 · · · zn ]T are solutions to this system. Show that c1 ~y + c2~z is also a solution to
this system for any c1 , c2 ∈ R.
Proof. Since ~y and ~z satisfy the homogeneous system of linear equations, they satisfy
any arbitrary equation of the system, say a1 x1 + · · · + an xn = 0. Thus we have that

a1 y1 + · · · + an yn = 0 = a1 z1 + · · · + an zn .

We verify that

c1 ~y + c2~z = [ c1 y1 + c2 z1  · · ·  c1 yn + c2 zn ]T

satisfies this arbitrary equation as well. We have

a1 (c1 y1 + c2 z1 ) + · · · + an (c1 yn + c2 zn ) = c1 (a1 y1 + · · · + an yn ) + c2 (a1 z1 + · · · + an zn )
                                                = c1 (0) + c2 (0)
                                                = 0.

Hence c1 ~y + c2~z is also a solution to the homogeneous system of linear equations.
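A quick numerical illustration of this fact, using the associated homogeneous system from Example 15.2 (numpy assumed available; the vectors below are the two directions found there):

import numpy as np

# Homogeneous system: x1 + 6x2 - x4 = 0, x3 + 2x4 = 0
A = np.array([[1.0, 6.0, 0.0, -1.0],
              [0.0, 0.0, 1.0, 2.0]])

y = np.array([-6.0, 1.0, 0.0, 0.0])   # the s-direction solution
z = np.array([1.0, 0.0, -2.0, 1.0])   # the t-direction solution

c1, c2 = 2.5, -7.0
assert np.allclose(A @ y, 0) and np.allclose(A @ z, 0)
assert np.allclose(A @ (c1 * y + c2 * z), 0)   # any linear combination is again a solution
print("c1*y + c2*z solves the homogeneous system")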

Some Comments on Combining Elementary Row Operations


Having performed many elementary row operations by this point, it’s a good idea to review
some rules about combining elementary row operations, that is, performing multiple ele-
mentary row operations in the same step. Many of the previous examples contain instances

where systems are solved by performing multiple row operations to the augmented matrix
in the same step. For example,
   
[ 1  0 −1 | 4   ]
[ 2  1  1 | 6   ]
[ −3 4 15 | −20 ]

R2 − 2R1 and R3 + 3R1:

[ 1 0 −1 | 4  ]
[ 0 1  3 | −2 ]
[ 0 4 12 | −8 ]

Here we are simply using one row to modify the other rows. This is completely acceptable
(and encouraged) since we only have to write out matrices twice as opposed to three times.
We must be careful however, as not all elementary row operations can be combined. Consider
the following linear system of equations.

x1 + x2 = 1
.
x1 − x2 = −1

If we perform the following operations

[ 1  1 | 1  ]  R1 − R2   [ 0  2 | 2  ]   −→      [ 0 2 | 2 ]  (1/2)R1   [ 0 1 | 1 ]
[ 1 −1 | −1 ]  R2 − R1   [ 0 −2 | −2 ]  R2 + R1  [ 0 0 | 0 ]    −→      [ 0 0 | 0 ]

then we find that

~x = [ 0 1 ]T + t [ 1 0 ]T ,   t ∈ R

appears to be the solution. However, this is incorrect since the system has the unique solution
~x = [ 0 1 ]T . The error occurs in the first set of row operations. Here both the first and second
rows are used to modify the other. If we perform R1 − R2 to R1 , then we have now changed
the first row. If we then go on to perform R2 − R1 to R2 , then we should use the updated
R1 and not the original R1 . Thus we should separate our first step above into two steps:

[ 1  1 | 1  ]  R1 − R2   [ 0  2 | 2  ]   −→      [ 0  2 | 2  ]
[ 1 −1 | −1 ]    −→      [ 1 −1 | −1 ]  R2 − R1  [ 1 −3 | −3 ]  −→  · · · .

Clearly, this is not the best choice of row operations to solve the system! However the goal
of this example is not to find a solution, but rather illustrate that we should not modify a
given row in one step while at the same time using it to modify another row.

Another thing to avoid is modifying a row multiple times in the same step. This itself is
not mathematically wrong, but is generally shunned as it often leads students to arithmetic
errors. For example, while

[ 2  1 3 ]                     [ 2 1 3 ]
[ 6  2 4 ]   R3 + 3R1 − 4R2    [ 6 2 4 ]
[ 18 5 7 ]        −→           [ 0 0 0 ]

is mathematically correct, it is not immediately obvious that such a row operation would be
useful, and it forces the student to do more “mental math” which often leads to mistakes.
A better option would be

[ 2  1 3 ]  R2 − 3R1   [ 2  1   3 ]    −→      [ 2  1  3 ]
[ 6  2 4 ]  R3 − 9R1   [ 0 −1  −5 ]  R3 − 4R2  [ 0 −1 −5 ]
[ 18 5 7 ]    −→       [ 0 −4 −20 ]            [ 0  0  0 ]

which is more natural and has simpler computations.

To summarize, students are encouraged to combine row operations as it leads to less writing
and shorter solutions. However, keep in mind that on any given step, one must not modify a
given row while using that row to modify another row, and that one should avoid modifying
a row more than once in the same step.

Lecture 16

Application: Balancing Chemical Reactions


A very simple chemical reaction often learned in high school is the combination of hydrogen
molecules (H2 ) and oxygen molecules (O2 ) to produce water (H2 O). Symbolically, we write
H2 + O2 −→ H2 O
The process by which molecules combine to form new molecules is called a chemical reaction.
Note that each hydrogen molecule is composed of two hydrogen atoms, each oxygen molecule
is composed of two oxygen atoms, and that each water molecule is composed of two hydrogen
atoms and one oxygen atom. Our goal is to balance this chemical reaction, that is, compute
how many hydrogen molecules and how many oxygen molecules are needed so that there are
the same number of atoms of each type both before and after the chemical reaction takes
place. By inspection, we find that
2H2 + O2 −→ 2H2 O
That is, two hydrogen molecules and one oxygen molecule combine to create two water
molecules. Before this chemical reaction takes place, there are four hydrogen atoms and
two oxygen atoms. After the reaction, there are again four hydrogen atoms and two oxygen
atoms. Thus we have balanced the chemical reaction.

Balancing chemical reactions by inspection becomes increasingly difficult as more complex


molecules are introduced. For example, the chemical reaction photosynthesis is a process
where plants combine carbon dioxide (CO2 ) and water (H2 O) to produce glucose (C6 H12 O6 )
and oxygen (O2 ):
CO2 + H2 O −→ C6 H12 O6 + O2
Although this could be solved by inspection, we look at another method. Let x1 denote
the number of CO2 molecules, x2 the number of H2 O molecules, x3 the number of C6 H12 O6
molecules and x4 the number of O2 molecules. Then we have
x1 CO2 + x2 H2 O −→ x3 C6 H12 O6 + x4 O2
Equating the number of atoms of each type before and after the reaction gives the equations
C: x1 = 6x3
O : 2x1 + x2 = 6x3 + 2x4
H: 2x2 = 12x3
Moving all variables to the left in each equation gives the homogeneous system
x1 − 6x3 = 0
2x1 + x2 − 6x3 − 2x4 = 0
2x2 − 12x3 = 0

Row reducing the augmented matrix of this system to RREF gives
     
[ 1 0  −6  0 | 0 ]
[ 2 1  −6 −2 | 0 ]
[ 0 2 −12  0 | 0 ]

R2 − 2R1 and (1/2)R3:

[ 1 0 −6  0 | 0 ]
[ 0 1  6 −2 | 0 ]
[ 0 1 −6  0 | 0 ]

R3 − R2, then −(1/2)R3:

[ 1 0 −6  0 | 0 ]
[ 0 1  6 −2 | 0 ]
[ 0 0  6 −1 | 0 ]

R1 + R3 and R2 − R3, then (1/6)R3:

[ 1 0 0 −1   | 0 ]
[ 0 1 0 −1   | 0 ]
[ 0 0 1 −1/6 | 0 ]

We see that for t ∈ R,

x1 = t, x2 = t, x3 = t/6 and x4 = t

There are infinitely many solutions to the homogeneous system. However, since we cannot
have a negative number of molecules nor a fractional number of molecules, we require that
x1 , x2 , x3 and x4 be nonnegative integers. This implies that t should be an integer multiple
of 6. Moreover, we wish to have the simplest (or “smallest”) solution, so we will take t = 6.
This gives x1 = x2 = x4 = 6 and x3 = 1. Thus,

6CO2 + 6H2 O −→ C6 H12 O6 + 6O2

balances the chemical reaction.
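The same balancing can be done by computing a nullspace. Below is a short Python sketch using the sympy package (assumed available; the variable names are ours, not part of the text).

from sympy import Matrix

# Photosynthesis: x1 CO2 + x2 H2O -> x3 C6H12O6 + x4 O2
# Rows are the atom balances for C, O, H in the variables x1, x2, x3, x4
M = Matrix([[1, 0, -6,  0],    # C: x1 = 6 x3
            [2, 1, -6, -2],    # O: 2 x1 + x2 = 6 x3 + 2 x4
            [0, 2, -12, 0]])   # H: 2 x2 = 12 x3

null = M.nullspace()[0]        # the solution space is one-dimensional
coeffs = null / min(abs(c) for c in null if c != 0)  # scale so the smallest entry is 1
print(coeffs.T)                # expect [6, 6, 1, 6], i.e. 6 CO2 + 6 H2O -> C6H12O6 + 6 O2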

Exercise 16.1
The fermentation of sugar is a chemical reaction given by the following equation:

C6 H12 O6 −→ CO2 + C2 H6 O

where C6 H12 O6 is glucose, CO2 is carbon dioxide and C2 H6 O is ethanol. Balance this
chemical reaction.

Note that ethanol is also denoted by C2 H5 OH and CH3 CH2 OH

Solution. Let x1 denote the number of C6 H12 O6 molecules, x2 the number of CO2 molecules
and x3 the number of C2 H5 OH molecules. We obtain

x1 C6 H12 O6 −→ x2 CO2 + x3 C2 H6 O

Equating the number of atoms of each type before and after the reaction gives the equations

C : 6x1 = x2 + 2x3
O : 6x1 = 2x2 + x3
H : 12x1 = 6x3

which leads to the homogeneous system of equations

6x1 − x2 − 2x3 = 0
6x1 − 2x2 − x3 = 0
12x1 − 6x3 = 0

Carrying the augmented matrix of this system to RREF gives


     
[ 6  −1 −2 | 0 ]
[ 6  −2 −1 | 0 ]
[ 12  0 −6 | 0 ]

R2 − R1 and R3 − 2R1:

[ 6 −1 −2 | 0 ]
[ 0 −1  1 | 0 ]
[ 0  2 −2 | 0 ]

R1 − R2, R3 + 2R2, then (1/6)R1 and −R2:

[ 1 0 −1/2 | 0 ]
[ 0 1 −1   | 0 ]
[ 0 0  0   | 0 ]

Thus, for t ∈ R,
x1 = t/2, x2 = t and x3 = t
Taking t = 2 gives the smallest nonnegative integer solution, and we conclude that

C6 H12 O6 −→ 2CO2 + 2C2 H6 O.

Application: Linear Models


Example 16.1

An industrial city has four heavy industries (denoted by A1 , A2 , A3 , A4 ) each of which


burns coal to manufacture its products. By law, no industry can burn more than 45
units of coal per day. Each industry produces the pollutants Pb (lead), SO2 (sulfur
dioxide), and NO2 (nitrogen dioxide) at daily rates per unit of coal burned and these
are released into the atmosphere. The rates are shown in the following table.

Industry A1 A2 A3 A4
Pb 1 0 1 7
SO2 2 1 2 9
NO2 0 2 2 0

The CAAG (Clean Air Action Group) has just leaked a government report that claims
that on one day last year, 250 units of Pb, 550 units of SO2 and 400 units of NO2 were
measured in the atmosphere. An inspector reported that A3 did not break the law on
that day. Which industry (or industries) broke the law on that day?

Solution. Let ai denote the number of units of coal burned by Industry Ai , for i =
1, 2, 3, 4. Using the above table, we account for each of the pollutants on that day.

Pb : a1 + a3 + 7a4 = 250
SO2 : 2a1 + a2 + 2a3 + 9a4 = 550
NO2 : 2a2 + 2a3 = 400

Carrying the augmented matrix of the above system to RREF, we have


   
[ 1 0 1 7 | 250 ]
[ 2 1 2 9 | 550 ]
[ 0 2 2 0 | 400 ]

R2 − 2R1:

[ 1 0 1  7 | 250 ]
[ 0 1 0 −5 | 50  ]
[ 0 2 2  0 | 400 ]

R3 − 2R2, then (1/2)R3:

[ 1 0 1  7 | 250 ]
[ 0 1 0 −5 | 50  ]
[ 0 0 1  5 | 150 ]

R1 − R3:

[ 1 0 0  2 | 100 ]
[ 0 1 0 −5 | 50  ]
[ 0 0 1  5 | 150 ]

From this, we find that

a1 = 100 − 2t, a2 = 50 + 5t, a3 = 150 − 5t, a4 = t

where t ∈ R. Now we look for conditions on t. We know A3 did not break that law,
so 0 ≤ a3 ≤ 45, that is,

0 ≤ 150 − 5t ≤ 45
−150 ≤ −5t ≤ −105
30 ≥ t ≥ 21

It immediately follows that A4 did not break the law either, since a4 = t and 21 ≤ t ≤ 30 ≤ 45.
Looking at A2 , we have
21 ≤ t ≤ 30
105 ≤ 5t ≤ 150
155 ≤ 50 + 5t ≤ 200
155 ≤ a2 ≤ 200
so A2 broke the law. Finally, for A1 , we find

21 ≤ t ≤ 30
−42 ≥ −2t ≥ −60
58 ≥ 100 − 2t ≥ 40
58 ≥ a1 ≥ 40

so it is possible that A1 broke the law, but we cannot be sure without more information.
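A short Python sketch (numpy assumed available; only an illustration) that scans the allowed range 21 ≤ t ≤ 30 and reports the possible coal consumption of each industry, reproducing the conclusions above.

import numpy as np

# General solution found above: a1 = 100 - 2t, a2 = 50 + 5t, a3 = 150 - 5t, a4 = t
ts = np.linspace(21, 30, 91)
a1, a2, a3, a4 = 100 - 2 * ts, 50 + 5 * ts, 150 - 5 * ts, ts

for name, a in [("A1", a1), ("A2", a2), ("A3", a3), ("A4", a4)]:
    print(f"{name}: between {a.min():.0f} and {a.max():.0f} units of coal;",
          "definitely over 45" if a.min() > 45 else "possibly within the limit")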

Exercise 16.2
An engineering company has three divisions (Design, Production, Testing) with a
combined annual budget of $1.5 million. Production has an annual budget equal to
the combined annual budgets of Design and Testing. Testing requires a budget of at
least $80 000. What is the Production budget and the maximum possible budget for
the Design division?

Solution. Let x1 denote the annual Design budget, x2 the annual Production budget, and
x3 the annual Testing budget. It follows that x1 + x2 + x3 = 1 500 000. Since the annual
Production budget is equal to the combined Design and Testing budgets, we have x2 =
x1 + x3 . This gives the system of equations

x1 + x2 + x3 = 1 500 000
x1 − x2 + x3 = 0

Row reducing the above system gives

[ 1  1 1 | 1 500 000 ]
[ 1 −1 1 | 0         ]

R2 − R1, then −(1/2)R2:

[ 1 1 1 | 1 500 000 ]
[ 0 1 0 | 750 000   ]

R1 − R2:

[ 1 0 1 | 750 000 ]
[ 0 1 0 | 750 000 ]

This gives
x1 = 750 000 − t, x2 = 750 000, x3 = t
where t ∈ R. We know that the Testing budget requires at least $80 000 and can re-
ceive no more than $750 000 (since Testing shares a budget of $750 000 with Design). Thus
80 000 ≤ t ≤ 750 000. It follows that

−750 000 ≤ −t ≤ −80 000


0 ≤ 750 000 − t ≤ 670 000
0 ≤ x1 ≤ 670 000

Hence the Production budget is $750 000 and the maximum Design budget is $670 000.

Lecture 17

Application: Network Flow


Definition 17.1
A network consists of a system of junctions or nodes that are connected by directed
line segments.

These networks are used to model real world problems such as traffic flow, fluid flow, or any
such system where a flow is observed. We observe here the central rule that must be obeyed
by these systems.

Junction Rule: At each of the junctions (or nodes) in the network, the flow into that
junction must equal the flow out of that junction.

Definition 17.2: Equilibrium


A network where every node obeys the Junction Rule is said to be in equilibrium.

Example 17.1
The diagram below gives an example of a network with four nodes, A, B, C and D,
and eight directed line segments.

Compute all possible values of f1 , f2 , f3 and f4 so that the system is in equilibrium.

Solution. Using the Junction Rule at each node, we construct the following table:

Flow In Flow Out


A: 40 = f1 + f4
B: f1 + f2 = 50
C: 60 = f2 + f3
D: f3 + f4 = 50

Rearranging each of the above four linear equations leads to the following system:

f1 + f4 = 40
f1 + f2 = 50
f2 + f3 = 60
f3 + f4 = 50

Row reducing the augmented matrix to RREF, we have


     
[ 1 0 0 1 | 40 ]
[ 1 1 0 0 | 50 ]
[ 0 1 1 0 | 60 ]
[ 0 0 1 1 | 50 ]

R2 − R1:

[ 1 0 0  1 | 40 ]
[ 0 1 0 −1 | 10 ]
[ 0 1 1  0 | 60 ]
[ 0 0 1  1 | 50 ]

R3 − R2, then R4 − R3:

[ 1 0 0  1 | 40 ]
[ 0 1 0 −1 | 10 ]
[ 0 0 1  1 | 50 ]
[ 0 0 0  0 | 0  ]

We find that

f1 = 40 − t, f2 = 10 + t, f3 = 50 − t and f4 = t

where t ∈ R.

We see that there are infinitely many values for f1 , f2 , f3 and f4 so that the system is in
equilibrium. Note that a negative solution for one of the variables means that the flow is
in the opposite direction than the one indicated in the diagram. Depending on what the
network is representing, we may have additional constraints on our variables. For example,
if the network represents water flowing through irrigation canals and the water cannot flow
in the opposite direction of the arrows, we would additionally require that each of f1 , f2 , f3

and f4 be nonnegative. In this case,
f1 ≥0 =⇒ 40 − t ≥ 0 =⇒ t ≤ 40
f2 ≥0 =⇒ 10 + t ≥ 0 =⇒ t ≥ −10
f3 ≥0 =⇒ 50 − t ≥ 0 =⇒ t ≤ 50
f4 ≥0 =⇒ t≥0
Here, we see that 0 ≤ t ≤ 40 with t ∈ R and that there are still infinitely many solutions.
As another example, if the flows in the above network represent the number of automobiles
moving between the junctions along one-way streets, then we require f1 , f2 , f3 and f4 to be
integers in addition to being nonnegative. In our example, this would make t = 0, 1, 2, . . . , 40,
giving us just 41 possible solutions.
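For the one-way street interpretation, the 41 integer solutions can be enumerated directly. A Python sketch (numpy assumed available):

import numpy as np

# General solution found above: f1 = 40 - t, f2 = 10 + t, f3 = 50 - t, f4 = t
solutions = []
for t in range(0, 101):
    f = np.array([40 - t, 10 + t, 50 - t, t])
    if np.all(f >= 0):                 # all flows must be nonnegative integers
        solutions.append(f)

print(len(solutions), "integer equilibrium flows")   # expect 41 (t = 0, 1, ..., 40)
print("first:", solutions[0], " last:", solutions[-1])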

When using linear algebra to model real world problems, we must be able to interpret our
solutions in terms of the problem it is modelling. This includes incorporating any real world
restrictions imposed by the system we are modelling.

Exercise 17.1
Consider four train stations labelled A, B, C and D. In the figure below, the directed
line segments represent train tracks to and from stations, and the numbers represent
the number of trains travelling on that track per day. Assume the tracks are one-way,
so trains may not travel in the other direction.

(a) Find all values of f1 , . . . , f5 so that the system is in equilibrium.

(b) Suppose the tracks from A to C and from D to A are closed due to maintenance.
Is it still possible for the system to be in equilibrium?

Solution.
(a) We construct a table using the Junction Rule:
Flow In Flow Out
A: 15 + f4 = 10 + f1 + f5
B: 20 + f1 = 10 + f2
C: 15 + f2 + f5 = 25 + f3
D: 5 + f3 = 10 + f4
Rearranging gives the linear system of equations
f1 − f4 + f5 = 5
f1 − f2 = −10
f2 − f3 + f5 = 10
f3 − f4 = 5
which we carry to RREF
   
[ 1  0  0 −1 1 | 5   ]
[ 1 −1  0  0 0 | −10 ]
[ 0  1 −1  0 1 | 10  ]
[ 0  0  1 −1 0 | 5   ]

−R2, −R3 and −R4:

[ 1   0  0 −1  1 | 5   ]
[ −1  1  0  0  0 | 10  ]
[ 0  −1  1  0 −1 | −10 ]
[ 0   0 −1  1  0 | −5  ]

R2 + R1, then R3 + R2, then R4 + R3:

[ 1 0 0 −1 1 | 5  ]
[ 0 1 0 −1 1 | 15 ]
[ 0 0 1 −1 0 | 5  ]
[ 0 0 0  0 0 | 0  ]
giving
f1 = 5 + s − t, f2 = 15 + s − t, f3 = 5 + s, f4 = s and f5 = t
for integers s, t (as we cannot have fractional trains). Moreover, as trains cannot go
the other way, we immediately have
f1 ≥0 =⇒ 5 + s − t ≥ 0 =⇒ s − t ≥ −5
f2 ≥0 =⇒ 15 + s − t ≥ 0 =⇒ s − t ≥ −15
f3 ≥0 =⇒ 5+s≥0 =⇒ s ≥ −5
f4 ≥0 =⇒ s≥0
f5 ≥0 =⇒ t≥0

so we have s, t ≥ 0 and s − t ≥ −5.

(b) Assume the tracks from A to C and from D to A are closed. This forces f4 = f5 = 0.
From our previous solution, we have that s = t = 0. Since s − t = 0 ≥ −5, this is a
valid solution. We have

f1 = 5, f2 = 15, f3 = 5, f4 = 0 and f5 = 0

Notice here we have a unique solution.

Application: Electrical Networks


Consider the following electrical network shown in Figure 17.1:

Figure 17.1: An electrical network

It consists of voltage sources, resistors and wires. A voltage source (often a battery) provides
an electromotive force V measured in volts. This electromotive force moves electrons through
the network along a wire at a rate we refer to as current I measured in amperes (or amps).
The resistors (lightbulbs for example) are measured in ohms Ω, and serve to retard the
current by slowing the flow of electrons. The intersection point between three or more wires
is called a node. The nodes break the wires up into short paths between two nodes. Every
such path can have a different current, and the arrow on each path is called a reference
direction. Pictured here is a voltage source (left) and a resistor (right) between two nodes.
One remark about voltage sources. If a current passes through a battery supplying V volts
from the “−” to the “+”, then there is a voltage increase of V volts. If the current passes

through the same battery from the “+” to the “−”, then there is a voltage drop (decrease)
of V volts.

Our aim is to compute the currents I1 , I2 and I3 in Figure 17.1. The following laws will be
useful.

Ohm’s Law The potential difference V across a resistor is given by V = IR, where I is the
current and R is the resistance.

Note that the reference direction is important when using Ohm’s Law. A current I travelling
across a resistor of 10Ω in the reference direction will result in a voltage drop of 10I while
the same current travelling across the same resistor against the reference direction will result
in a voltage gain of 10I.

Kirchoff ’s Laws

1. Conservation of Energy: Around any closed voltage loop in the network, the algebraic
sum of voltage drops and voltage increases caused by resistors and voltage sources is
zero.

2. Conservation of Charge: At each node, the total inflow of current equals the total
outflow of current.

Kirchoff’s Laws will be used to derive a system of equations that we can solve in order to find
the currents. The Conservation of Energy requires using Ohm’s Law. Returning to Figure
17.1, we can now solve for I1 , I2 and I3 . Notice that there is an upper loop and a lower loop.
We may choose any orientation we like for either loop. Given the reference directions, we
will use a clockwise orientation for the upper loop and a counterclockwise orientation for the
lower loop. We will compute the voltage increases and drops as we move around both loops.
Conservation of Energy says the voltage drops must equal the voltage gains around each loop.

For the upper loop, we can start at node A. Moving clockwise, we first have a voltage gain
of 5 from the battery, then a voltage drop of 5I1 at the 5Ω resistor and a 10I2 voltage drop
at the 10Ω resistor. Thus

5I1 + 10I2 = 5 (15)

For the lower loop, we can again start at node A. Moving counterclockwise, we have a
voltage drop of 5I3 followed by a voltage increase of 10 and finally a voltage drop of 10I2 .
We have

10I2 + 5I3 = 10 (16)

Now, applying the Conservation of Charge to node A gives I1 + I3 = I2 so we obtain

I1 − I2 + I3 = 0 (17)

Note that at node B we obtain the same equation, so including it would be redundant.
Combining equations (15), (16) and (17) gives the system of equations

I1 − I2 + I3 = 0
5I1 + 10I2 = 5
10I2 + 5I3 = 10

Carrying the augmented matrix of this system to RREF,


     
[ 1 −1 1 | 0  ]
[ 5 10 0 | 5  ]
[ 0 10 5 | 10 ]

R2 − 5R1, then (1/5)R2 and (1/5)R3:

[ 1 −1  1 | 0 ]
[ 0  3 −1 | 1 ]
[ 0  2  1 | 2 ]

R2 − R3:

[ 1 −1  1 | 0  ]
[ 0  1 −2 | −1 ]
[ 0  2  1 | 2  ]

R1 + R2 and R3 − 2R2, then (1/5)R3:

[ 1 0 −1 | −1  ]
[ 0 1 −2 | −1  ]
[ 0 0  1 | 4/5 ]

R1 + R3 and R2 + 2R3:

[ 1 0 0 | −1/5 ]
[ 0 1 0 | 3/5  ]
[ 0 0 1 | 4/5  ]

we see that I1 = −1/5 amps, I2 = 3/5 amps and I3 = 4/5 amps. Notice that I1 is nega-
tive. This simply means that our reference direction for I1 in Figure 17.1 is incorrect and
the current flows in the opposite direction there. Note that the reference directions may be
assigned arbitrarily.

Note that there is actually a third loop in Figure 17.1: the loop that travels along the outside
of the network. If we start at node A and travel clockwise around this loop, we first have
a voltage increase of 5, then a voltage drop of 5I1 , then another voltage drop of 10 (as we
pass through the 10V battery from “+” to “−”) and finally a voltage increase of 5I3 (as we
pass through the 5Ω resistor in the opposite reference direction for I3 ). As voltage increases
equal voltage drops, we have 5 + 5I3 = 5I1 + 10, or 5I1 − 5I3 = −5. However, this is just
Equation (16) subtracted from Equation (15). Including this equation in our above system
of equations would only result in an extra row of zeros when we carried the resulting system

of equations to RREF. This will be true in general, and shows that when computing current
in an electrical network, we only need to consider the “smallest” loops.

Another note is that we chose to orient the upper loop in the clockwise direction and the
lower loop in the counterclockwise direction. This was totally arbitrary (but made sense
given the reference directions). We could have changed either of the directions. Of course,
as we saw in the previous paragraph, we have to consider which way our orientation will
cause the current to flow through a battery, and how to handle resistors if our orientation
has us moving in the opposite direction of a reference direction.

One last thing to notice here is that since I1 is negative, the current is actually flowing back-
wards through the 5V battery. This can happen in a poorly designed electrical network: the
10V battery is too strong and actually forces the current to travel through the 5V battery in
the wrong direction. Too much current being forced through a battery in the wrong direction
will lead to a fire.

Exercise 17.2
Using Kirchoff’s Laws, derive the system of equations that you would need to solve in
order to find the currents in the following electrical network.

If you are feeling brave, you may solve the system.

Solution. We begin by using the Conservation of Energy on each of the three smallest closed
loops. Going clockwise around the left loop starting at A, we see a voltage drop of 20I2 , a
voltage gain of 10 and then a drop of 20I1 . This gives

20I1 + 20I2 = 10 or 2I1 + 2I2 = 1

Traversing the middle loop clockwise starting at A, we have a voltage drop of 20I3 followed
by a gain of 20I2 (note the we pass the resistor between A and C in the opposite direction

of I2 ). We obtain
20I2 = 20I3 or I2 − I3 = 0
Moving clockwise around the right loop starting at B, we observe a voltage gain of 20,
followed by a drop of 20I5 and then a gain of 20I3 leading to

20I5 = 20 + 20I3 or I3 − I5 = −1

Next, we apply the Conservation of Charge to the nodes A, B, C and D (in that order) to
obtain the equations

I1 − I2 − I4 =0
I3 − I4 + I5 =0
I1 − I2 − I6 =0
I3 + I5 − I6 =0

Finally, we have constructed the system of equations

2I1 + 2I2 = 1
I2 − I3 = 0
I3 − I5 = −1
I1 − I2 − I4 = 0
I3 − I4 + I5 = 0
I1 − I2 − I6 = 0
I3 + I5 − I6 = 0

The following row reduction shows why solving large systems by hand is not feasible: the
augmented matrix

[ 2  2  0  0  0  0 | 1  ]
[ 0  1 −1  0  0  0 | 0  ]
[ 0  0  1  0 −1  0 | −1 ]
[ 1 −1  0 −1  0  0 | 0  ]
[ 0  0  1 −1  1  0 | 0  ]
[ 1 −1  0  0  0 −1 | 0  ]
[ 0  0  1  0  1 −1 | 0  ]

requires a long sequence of elementary row operations before it is carried to the RREF

[ 1 0 0 0 0 0 | 5/8  ]
[ 0 1 0 0 0 0 | −1/8 ]
[ 0 0 1 0 0 0 | −1/8 ]
[ 0 0 0 1 0 0 | 3/4  ]
[ 0 0 0 0 1 0 | 7/8  ]
[ 0 0 0 0 0 1 | 3/4  ]
[ 0 0 0 0 0 0 | 0    ]

Finally, we see

I1 = 5/8 amps,   I2 = −1/8 amps,   I3 = −1/8 amps,
I4 = 3/4 amps,   I5 = 7/8 amps,    I6 = 3/4 amps.
In particular, the reference arrows for I2 and I3 are pointing in the wrong direction. For
those brave souls who solved this system by hand, we salute you.
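This is exactly the kind of computation a computer handles easily. The Python sketch below (numpy assumed available; only an illustration) solves the 7-equation system by least squares and reproduces the currents found above.

import numpy as np

# The 7 equations in the 6 unknown currents I1, ..., I6 derived above
A = np.array([[2,  2,  0,  0,  0,  0],
              [0,  1, -1,  0,  0,  0],
              [0,  0,  1,  0, -1,  0],
              [1, -1,  0, -1,  0,  0],
              [0,  0,  1, -1,  1,  0],
              [1, -1,  0,  0,  0, -1],
              [0,  0,  1,  0,  1, -1]], dtype=float)
b = np.array([1, 0, -1, 0, 0, 0, 0], dtype=float)

# The matrix is not square, so use least squares; the fit is exact since the system is consistent
I, residual, rank, _ = np.linalg.lstsq(A, b, rcond=None)
print(np.round(I, 4))          # expect [ 0.625 -0.125 -0.125  0.75   0.875  0.75 ]
print("residual:", residual)   # essentially zero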

Lecture 18

Matrix Algebra
We first encountered matrices when we solved systems of equations, where we performed
elementary row operations to the augmented matrix or the coefficient matrix of the system.
We now treat matrices as algebraic objects, beginning with the definition of matrix addition
and scalar multiplication. Under these operations, we will see that matrices behave much
like vectors in Rn .

Definition 18.1: Matrix


An m × n matrix A is a rectangular array with m rows and n columns. The entry in
the ith row and jth column will be denoted by either aij or (A)ij , that is
 
      [ a11  a12  · · ·  a1j  · · ·  a1n ]
      [ a21  a22  · · ·  a2j  · · ·  a2n ]
      [  :    :           :           :  ]
A =   [ ai1  ai2  · · ·  aij  · · ·  ain ]
      [  :    :           :           :  ]
      [ am1  am2  · · ·  amj  · · ·  amn ]

which we sometimes abbreviate as A = [aij ] when the size of the matrix is known.
Two m × n matrices A = [aij ] and B = [bij ] are equal if aij = bij for all i = 1, . . . , m
and j = 1, . . . , n, and we write A = B. The set of all m × n matrices with real entries
is denoted by Mm×n (R). For a matrix A ∈ Mm×n (R), we say that A has size m × n
and call aij the (i, j)−entry of A. If m = n, we say that A is a square matrix.

Example 18.1
Let
      [ 1 2 ]                 [ 0   0    ]
A =   [ 6 4 ]     and   B =   [ 0  sin π ] .
      [ 3 1 ]

Then A is a 3 × 2 matrix and B is a 2 × 2 square matrix.

Definition 18.2: Zero Matrix
The m × n matrix with all zero entries is called a zero matrix, denoted by 0m×n , or
simply by 0 if the size is clear.

Note that the matrix B in Example 18.1 is the 2 × 2 zero matrix. We now define addition
and scalar multiplication of matrices.
Definition 18.3: Matrix Addition and Scalar Multiplication

For A, B ∈ Mm×n (R) we define matrix addition as

(A + B)ij = (A)ij + (B)ij

and for c ∈ R, scalar multiplication is defined by

(cA)ij = c(A)ij

As with vectors in Rn , we define A − B = A + (−1)B.

Example 18.2
Find a, b, c ∈ R such that
[ a b c ] − 2 [ c a b ] = [ −3 3 6 ] .

Solution. Since

[ a b c ] − 2 [ c a b ] = [ a − 2c   b − 2a   c − 2b ]

we require

a − 2c = −3
−2a + b = 3
−2b + c = 6

[ 1   0 −2 | −3 ]
[ −2  1  0 | 3  ]
[ 0  −2  1 | 6  ]

R2 + 2R1, then R3 + 2R2, then −(1/7)R3:

[ 1 0 −2 | −3 ]
[ 0 1 −4 | −3 ]
[ 0 0  1 | 0  ]

R1 + 2R3 and R2 + 4R3:

[ 1 0 0 | −3 ]
[ 0 1 0 | −3 ]
[ 0 0 1 | 0  ]

so a = b = −3 and c = 0.

It follows from our definition of scalar multiplication that for A ∈ Mm×n (R) and any c ∈ R,

0A = 0m×n    and    c 0m×n = 0m×n .

The next example shows that if cA = 0m×n , then either c = 0 or A = 0m×n .

Example 18.3

Let c ∈ R and A ∈ Mm×n (R) be such that cA = 0m×n . Prove that either c = 0 or
A = 0m×n .
Proof. Since cA = 0m×n , we have that

caij = 0 for every i = 1, . . . , m and j = 1, . . . , n.                    (18)

If c = 0, then the result holds, so we assume c ≠ 0. But then from (18), we see that
aij = 0 for every i = 1, . . . , m and j = 1, . . . , n, that is, A = 0m×n .

The next theorem is very similar to Theorem 6.1, and shows that under our operations of
addition and scalar multiplication, matrices behave much like vectors in Rn .

Theorem 18.1
Let A, B, C ∈ Mm×n (R) and let c, d ∈ R. We have
V1. A + B ∈ Mm×n (R) Mm×n (R) is closed under addition

V2. A + B = B + A addition is commutative

V3. (A + B) + C = A + (B + C) addition is associative

V4. There exists a matrix 0 ∈ Mm×n (R) such that A + 0 = A for every
A ∈ Mm×n (R) zero matrix

V5. For each A ∈ Mm×n (R) there exists a (−A) ∈ Mm×n (R) such that A + (−A) = 0
additive inverse

V6. cA ∈ Mm×n (R) Mm×n (R) is closed under scalar multiplication

V7. c(dA) = (cd)A scalar multiplication is associative

V8. (c + d)A = cA + dA distributive law

V9. c(A + B) = cA + cB distributive law

V10. 1A = A scalar multiplicative identity

Of course, the zero matrix in Mm×n (R) is 0 = 0m×n and for A ∈ Mm×n (R), the additive
inverse of A is (−A) = (−1)A. We now introduce another important operation we can

perform on matrices.

Definition 18.4: Transpose of a Matrix

Let A ∈ Mm×n (R). The transpose of A, denoted by AT , is the n × m matrix satisfying


(AT )ij = (A)ji .

Example 18.4
Let
      [ 1 ]                                         [ 4  2 ]
A =   [ 2 ] ,     B = [ 1 4 8 ]     and     C =     [ −1 3 ] .
      [ 3 ]

Then
                                [ 1 ]                     [ 4 −1 ]
AT = [ 1 2 3 ] ,     B T =      [ 4 ]     and     C T =   [ 2  3 ] .
                                [ 8 ]

Theorem 18.2: Properties of Transpose

Let A, B ∈ Mm×n (R) and c ∈ R. Then

(1) AT ∈ Mn×m (R)


(2) (AT )T = A

(3) (A + B)T = AT + B T

(4) (cA)T = cAT

Example 18.5
Solve for A if
     (         [ 1  2 ] )T    [ 2  3 ]
     ( 2AT − 3 [ −1 1 ] )   = [ −1 2 ] .

Solution. Using Theorem 18.2, we have

(2AT )T − ( 3 [ 1  2 ] )T  =  [ 2  3 ]              by (3)
              [ −1 1 ]        [ −1 2 ]

2(AT )T − 3 [ 1  2 ]T  =  [ 2  3 ]                  by (4)
            [ −1 1 ]      [ −1 2 ]

2A − 3 [ 1 −1 ]  =  [ 2  3 ]                        by (2)
       [ 2  1 ]     [ −1 2 ]

2A  =  [ 2  3 ] + [ 3 −3 ]  =  [ 5 0 ]
       [ −1 2 ]   [ 6  3 ]     [ 5 5 ]

A  =  [ 5/2  0  ]
      [ 5/2 5/2 ] .

Definition 18.5: Symmetric Matrix

A matrix A is symmetric if AT = A.

Note that if A ∈ Mm×n (R), then AT ∈ Mn×m (R) so AT = A implies n = m. Thus a sym-
metric matrix must be a square matrix.

Example 18.6
Let
      [ 1 6 ]                 [ 1 −2 3 ]
A =   [ 6 9 ]     and   B =   [ −2 4 5 ] .
                              [ 3  6 7 ]
Then
       [ 1 6 ]                       [ 1 −2 3 ]
AT =   [ 6 9 ] = A    and    B T =   [ −2 4 6 ] ≠ B
                                     [ 3  5 7 ]

so A is symmetric while B is not symmetric.

Example 18.7

Prove that if A, B ∈ Mn×n (R) are symmetric, then sA + tB is symmetric for any
s, t ∈ R.
Proof. Since A and B are symmetric, we have that AT = A and B T = B. We must

show that (sA + tB)T = sA + tB. We have

(sA + tB)T = (sA)T + (tB)T = sAT + tB T = sA + tB

so sA + tB is symmetric.
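The transpose properties and this symmetry argument are easy to spot-check numerically. A small Python sketch (numpy assumed available), with randomly generated symmetric matrices:

import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-5, 5, size=(3, 3)).astype(float)
B = rng.integers(-5, 5, size=(3, 3)).astype(float)
A = A + A.T   # a simple way to manufacture symmetric matrices
B = B + B.T

s, t = 2.0, -3.5
C = s * A + t * B
print(np.allclose(C, C.T))                   # sA + tB is symmetric
print(np.allclose((A + B).T, A.T + B.T))     # Theorem 18.2 (3)
print(np.allclose((s * A).T, s * A.T))       # Theorem 18.2 (4)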

Lecture 19

The Matrix–Vector Product


Thus far, given a system of linear equations, we have worked with the augmented matrix in
order to solve the system and to verify various properties of systems of equations. We now
show that a linear system of equations can be viewed as a matrix–vector equation. We will
see that in addition to giving us a compact way to express a system of equations, this new
notation will make it easier to verify properties of systems of equations.

To begin, consider the linear system of equations

x1 + 3x2 − 2x3 = −7
−x1 − 4x2 + 3x3 = 8

Let

A =   [ 1   3 −2 ] ,    ~x = [ x1 x2 x3 ]T    and    ~b = [ −7 8 ]T ,
      [ −1 −4  3 ]

and let

~a1 = [ 1 −1 ]T ,    ~a2 = [ 3 −4 ]T    and    ~a3 = [ −2 3 ]T

be the columns of A so that A = [ ~a1 ~a2 ~a3 ]. Now our above system is consistent if and
only if we can find x1 , x2 , x3 ∈ R so that

~b = [ −7 8 ]T = [ x1 + 3x2 − 2x3   −x1 − 4x2 + 3x3 ]T = x1 [ 1 −1 ]T + x2 [ 3 −4 ]T + x3 [ −2 3 ]T
   = x1~a1 + x2~a2 + x3~a3

that is, the system is consistent if and only if ~b can be expressed as a linear combination of the
columns of A. In this case, any solution to the system is given by the coefficients that express
~b as a linear combination of the columns of A. This leads us to make the following definition:

Definition 19.1: Matrix–Vector Product

Let A = [ ~a1 · · · ~an ] ∈ Mm×n (R) (it follows that ~a1 , . . . , ~an ∈ Rm ) and ~x =
[ x1 · · · xn ]T ∈ Rn . Then the vector A~x is defined by

A~x = x1~a1 + · · · + xn~an ∈ Rm .

Using this definition, we can rewrite our above system as

[ 1   3 −2 ] [ x1 x2 x3 ]T = [ −7 8 ]T
[ −1 −4  3 ]

or more simply as

A~x = ~b.
Example 19.1
       
[ 1  5 ]                 [ 1  ]     [ 5 ]   [ 9 ]
[ −1 2 ] [ −1 2 ]T = (−1)[ −1 ] + 2 [ 2 ] = [ 5 ]
[ −2 1 ]                 [ −2 ]     [ 1 ]   [ 4 ]

from which we see that ~x = [ −1 2 ]T is a solution to the linear system

  x1 + 5x2 = 9
 −x1 + 2x2 = 5
−2x1 +  x2 = 4

Notice in the previous example that the entries in the solution ~x to the system A~x = ~b are
the coefficients that express ~b as a linear combination of the columns of the coefficient matrix
A.

Theorem 19.1

(1) Every linear system of equations can be expressed as A~x = ~b for some matrix A
and some vector ~b,

(2) The system A~x = ~b is consistent if and only if ~b can be expressed as a linear
combination of the columns of A,

(3) If ~a1 , . . . , ~an are the columns of A ∈ Mm×n (R) and ~x = [ x1 · · · xn ]T , then ~x
satisfies A~x = ~b if and only if x1~a1 + · · · + xn~an = ~b.

It’s important to keep the sizes of our matrices and vectors in mind:

A~x = ~b,    where A is m × n, ~x ∈ Rn and ~b ∈ Rm .

For example, the matrix–vector product

       [ 1 2 ]
A~x =  [ 3 4 ] [ 1 2 −1 ]T
       [ 1 4 ]

is not defined since A has two columns but ~x ∉ R2 .

Example 19.2

[ 1 1 ] [ 1 −1 ]T  = 1 [ 1 ] − 1 [ 1 ] = [ 0 ]
[ 2 2 ]                [ 2 ]     [ 2 ]   [ 0 ]

This shows that for A ∈ Mm×n (R) and ~x ∈ Rn with A ≠ 0m×n and ~x ≠ ~0 we are
not guaranteed that A~x is nonzero. Additionally, this shows that ~x = [ 1 −1 ]T is a
solution to the homogeneous system of equations

 x1 +  x2 = 0
2x1 + 2x2 = 0.

The next theorem shows that the matrix–vector Product behaves well with respect to matrix
addition, vector addition and scalar multiplication.

Theorem 19.2: Properties of the Matrix–Vector Product

Let A, B ∈ Mm×n (R), ~x, ~y ∈ Rn and c ∈ R. Then

(1) A(~x + ~y ) = A~x + A~y

(2) A(c~x) = c(A~x) = (cA)~x

(3) (A + B)~x = A~x + B~x

Proof. We prove (1). Let A = [ ~a1 · · · ~an ] where ~a1 , . . . , ~an ∈ Rm , ~x = [ x1 · · · xn ]T


and ~y = [ y1 · · · yn ]T . Then
 
A(~x + ~y ) = [ ~a1 · · · ~an ] [ x1 + y1  · · ·  xn + yn ]T
           = (x1 + y1 )~a1 + · · · + (xn + yn )~an
           = x1~a1 + · · · + xn~an + y1~a1 + · · · + yn~an
           = A~x + A~y

We can use the matrix–vector product to verify results about linear systems of equations in
a more compact way:

Example 19.3

Consider the homogeneous system of equations A~x = ~0 where A ∈ Mm×n (R) and
~x ∈ Rn . Assume ~x1 , . . . , ~xk ∈ Rn are solutions to this system and let c1 , . . . , ck ∈ R.
Show that c1~x1 + · · · + ck ~xk is also a solution to A~x = ~0.
Proof. Since ~x1 , . . . , ~xk are solutions to A~x = ~0, we have that A~x1 = · · · = A~xk = ~0.
Using Theorem 19.2, we have

A(c1~x1 + · · · + ck ~xk ) = A(c1~x1 ) + · · · + A(ck ~xk ) by (1)


= c1 A~x1 + · · · + ck A~xk by (2)
= c1~0 + · · · + ck~0
= ~0.

Thus c1~x1 + · · · + ck ~xk is a solution to A~x = ~0.

We now show that dot products can be useful to compute a matrix–vector product. Consider
   
      [ 1 −1 6 ]
A =   [ 0  2 1 ]    and    ~x = [ 1 1 2 ]T
      [ 4 −3 2 ]

so that

A~x = 1 [ 1 0 4 ]T + 1 [ −1 2 −3 ]T + 2 [ 6 1 2 ]T
    = [ 1(1) + 1(−1) + 2(6)   1(0) + 1(2) + 2(1)   1(4) + 1(−3) + 2(2) ]T = [ 12 4 5 ]T ,

where each entry looks like a dot product. If we define

~r1 = [ 1 −1 6 ]T ,    ~r2 = [ 0 2 1 ]T    and    ~r3 = [ 4 −3 2 ]T

then

A~x = [ ~r1 · ~x   ~r2 · ~x   ~r3 · ~x ]T .

In general, given A ∈ Mm×n (R), there are vectors ~r1 , . . . , ~rm ∈ Rn so that the rows of A are
~r1T , . . . , ~rmT , and for any ~x ∈ Rn ,

A~x = [ ~r1T ~x   · · ·   ~rmT ~x ]T = [ ~r1 · ~x   · · ·   ~rm · ~x ]T ,

from which we see that the ith entry of A~x is the dot product ~ri · ~x where ~riT is the ith row of A.

Exercise 19.1

Let A = [ 1 1  2 −1 ]   and let ~x = [ 1 2 1 0 ]T . Compute A~x
        [ 2 1 −3  2 ]

(a) as a linear combination of the columns of A,

(b) using dot products.

Solution.

(a) Using linear combinations, we have

A~x = 1 [ 1 2 ]T + 2 [ 1 1 ]T + 1 [ 2 −3 ]T + 0 [ −1 2 ]T
    = [ 1 2 ]T + [ 2 2 ]T + [ 2 −3 ]T + [ 0 0 ]T = [ 5 1 ]T .

(b) Using dot products, we have

A~x = [ 1(1) + 1(2) + 2(1) − 1(0)   2(1) + 1(2) − 3(1) + 2(0) ]T = [ 5 1 ]T .
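Both computations are easy to mirror in Python (numpy assumed available); the sketch below computes A~x as a combination of columns, by row dot products, and with the built-in product, and all three agree.

import numpy as np

A = np.array([[1, 1, 2, -1],
              [2, 1, -3, 2]], dtype=float)
x = np.array([1, 2, 1, 0], dtype=float)

# (a) as a linear combination of the columns of A
col_comb = sum(x[j] * A[:, j] for j in range(A.shape[1]))

# (b) entry i is the dot product of row i of A with x
dots = np.array([A[i, :] @ x for i in range(A.shape[0])])

print(col_comb, dots, A @ x)   # all three give [5. 1.]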

Definition 19.2
The n × n identity matrix, denoted by In (or In×n or just I if the size is clear) is the
square matrix of size n × n with (In )ii = 1 for i = 1, 2, . . . , n (these entries make up
what we call the main diagonal of the matrix) and zeros elsewhere.

For example,

       [ 1 0 ]          [ 1 0 0 ]          [ 1 0 0 0 ]
I2 =   [ 0 1 ] ,  I3 =  [ 0 1 0 ] ,  I4 =  [ 0 1 0 0 ] .
                        [ 0 0 1 ]          [ 0 0 1 0 ]
                                           [ 0 0 0 1 ]

Note that I2 = [ ~e1 ~e2 ], I3 = [ ~e1 ~e2 ~e3 ] and I4 = [ ~e1 ~e2 ~e3 ~e4 ].

Example 19.4
Show In~x = ~x for every ~x ∈ Rn .
Solution. Let ~x = [ x1 · · · xn ]T ∈ Rn . Then In~x = x1~e1 + · · · + xn~en = ~x.

Note that In~x = ~x for every ~x ∈ Rn is exactly why we call In the identity matrix. It is
also why we require In to be a square matrix. If I were an m × n matrix with m ≠ n and
~x ∈ Rn , then I~x ∈ Rm with Rm ≠ Rn , so I~x could never be equal to ~x.

Example 19.5
Let
      [ 1 0 ]          [ 3 −1 ]
A =   [ 2 3 ] ,   B =  [ 2  3 ]   and   ~x = [ 1 2 ]T .

Then
        [ 1 0 ]            [ 1 ]   [ 3 −1 ]
A~x =   [ 2 3 ] [ 1 2 ]T = [ 8 ] = [ 2  3 ] [ 1 2 ]T = B~x.

We see that A~x = B~x with ~x ≠ ~0, and yet A ≠ B.

Example 19.5 might seem strange when compared to real numbers. For a, b, x ∈ R with
x 6= 0, we know that if ax = bx, then a = b. As Example 19.5 shows, this result does
not hold for matrices: A~x = B~x for a given nonzero vector ~x is not sufficient to guarantee
A = B. Example 19.5 is actually equivalent to Example 19.2 since A~x = B~x is equivalent
to (A − B)~x = ~0, and we have seen in Example 19.2 that we can have A − B and ~x nonzero
despite their product being zero.

Theorem 19.3: Matrices Equal Theorem

Let A, B ∈ Mm×n (R). If A~x = B~x for every ~x ∈ Rn , then A = B.

Proof. Let A, B ∈ Mm×n (R) with A = [ ~a1 · · · ~an ] and B = [ ~b1 · · · ~bn ]. Since A~x =
B~x for every ~x ∈ Rn , we have that A~ei = B~ei for i = 1, . . . , n. Since

A~ei = ~ai and B~ei = ~bi

we have that ~ai = ~bi for i = 1, . . . , n. Hence A = B.

Lecture 20

Matrix Multiplication
We now extend the matrix–vector product to matrix multiplication.

Definition 20.1

If A ∈ Mm×n (R) and B = [ ~b1 · · · ~bk ] ∈ Mn×k (R), then the matrix product AB is
the m × k matrix
AB = [ A~b1 · · · A~bk ].

Example 20.1
Let
      [ 1  2  3 ]               [ 1  2 ]
A =   [ −1 −1 1 ]   and   B =   [ 1 −1 ]
                                [ 2  2 ]
so

~b1 = [ 1 1 2 ]T    and    ~b2 = [ 2 −1 2 ]T .

Then

A~b1 = [ 1  2  3 ] [ 1 1 2 ]T = [ 9 ]    and    A~b2 = [ 1  2  3 ] [ 2 −1 2 ]T = [ 6 ]
       [ −1 −1 1 ]              [ 0 ]                  [ −1 −1 1 ]               [ 1 ]

so
                      [ 9 6 ]
AB = [ A~b1 A~b2 ] =  [ 0 1 ] .

In the above example, we saw that

(A2×3 )(B3×2 ) = (AB)2×2

In general, for the product AB to be defined, the number of columns of A must equal
the number of rows of B. If this is the case, then A ∈ Mm×n (R) and B ∈ Mn×k (R) and
AB ∈ Mm×k (R).

The above method to multiply matrices can be quite tedious. As with the matrix–vector
product, we can simplify the task using dot products. For
      [ ~r1T ]
A =   [  :   ] ∈ Mm×n (R)    and    B = [ ~b1 · · · ~bk ] ∈ Mn×k (R)
      [ ~rmT ]

we see that ~ri ∈ Rn for i = 1, . . . , m and ~bj ∈ Rn for j = 1, . . . , k so the dot products ~ri · ~bj
are defined. We thus obtain

        [ ~r1T ]                     [ ~r1T~b1  · · ·  ~r1T~bk ]   [ ~r1 · ~b1  · · ·  ~r1 · ~bk ]
AB =    [  :   ] [ ~b1 · · · ~bk ] = [   :               :     ] = [    :                 :      ]
        [ ~rmT ]                     [ ~rmT~b1  · · ·  ~rmT~bk ]   [ ~rm · ~b1  · · ·  ~rm · ~bk ]

Thus, the (i, j)−entry of AB is ~ri · ~bj .

Example 20.2
Let
      [ 1 2 ]               [ 1  1 3 ]
A =   [ 3 4 ]   and   B =   [ 4 −2 1 ] .
Then

AB =   [ 1(1) + 2(4)   1(1) + 2(−2)   1(3) + 2(1) ]  =  [ 9  −3  5  ]
       [ 3(1) + 4(4)   3(1) + 4(−2)   3(3) + 4(1) ]     [ 19 −5  13 ]

Also note that A ∈ M2×2 (R) and B ∈ M2×3 (R) so AB ∈ M2×3 (R). However, the
number of columns of B is not equal to the number of rows of A, so the product BA
is not defined.

Example 20.3
Let
      [ 1 1 ]               [ 1  2 ]
A =   [ 1 1 ]   and   B =   [ 1 −1 ] .
Then

AB =   [ 1 1 ] [ 1  2 ]  =  [ 2 1 ]
       [ 1 1 ] [ 1 −1 ]     [ 2 1 ]

BA =   [ 1  2 ] [ 1 1 ]  =  [ 3 3 ]
       [ 1 −1 ] [ 1 1 ]     [ 0 0 ]

from which we see that AB ≠ BA despite the products AB and BA both being defined
and having the same size.

Examples 20.2 and 20.3 show us that matrix multiplication is not commutative. That is,
given two matrices A and B such that AB is defined, the product BA may not be defined,
and even if it is, BA may not be equal to AB (in fact, BA need not have the same size as
AB: consider A ∈ M2×3 (R) and B ∈ M3×2 (R)).
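A quick numerical illustration of non-commutativity, using the matrices of Example 20.3 (numpy assumed available):

import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0]])
B = np.array([[1.0, 2.0],
              [1.0, -1.0]])

print(A @ B)                          # [[2. 1.], [2. 1.]]
print(B @ A)                          # [[3. 3.], [0. 0.]]
print(np.array_equal(A @ B, B @ A))   # False: matrix multiplication is not commutative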

Example 20.4
Let
      [ 1 2 ]               [ 1  1 ]
A =   [ 3 4 ]   and   B =   [ −1 2 ] .
Then

(AB)T = ( [ 1 2 ] [ 1  1 ] )T  =  [ −1 5  ]T  =  [ −1 −1 ]
        ( [ 3 4 ] [ −1 2 ] )      [ −1 11 ]      [ 5  11 ]

but

AT B T =  [ 1 3 ] [ 1 −1 ]  =  [ 4 5 ]
          [ 2 4 ] [ 1  2 ]     [ 6 6 ]

and in fact

B T AT =  [ 1 −1 ] [ 1 3 ]  =  [ −1 −1 ]  =  (AB)T .
          [ 1  2 ] [ 2 4 ]     [ 5  11 ]

Theorem 20.1: Properties of Matrix Multiplication


Let c ∈ R and A, B, C be matrices of appropriate sizes.
(1) IA = A I is an identity matrix

(2) AI = A I is an identity matrix

(3) A(BC) = (AB)C Matrix multiplication is associative

(4) A(B + C) = AB + AC Left distributive law

(5) (B + C)A = BA + CA Right distributive law

(6) (cA)B = c(AB) = A(cB)

(7) (AB)T = B T AT

Note that since we defined matrix products in terms of the matrix vector product, we have
that (3) holds for the matrix vector product also: A(B~x) = (AB)~x where ~x has the same
number of entries as B has columns. We also note that (7) can be generalized as

(A1 A2 · · · Ak )T = ATk · · · AT2 AT1

where A1 , . . . , Ak are matrices of appropriate sizes. In fact, taking A1 = · · · = Ak = A for


some square matrix A, we obtain
T k
Ak = AT
for any positive integer k.7

Exercise 20.1
Simplify A(3B − C) + (A − 2B)C + 2B(C + 2A).

Solution. We have

A(3B − C) + (A − 2B)C + 2B(C + 2A) = 3AB − AC + AC − 2BC + 2BC + 4BA


= 3AB + 4BA

Make careful note of the following points regarding Exercise 20.1 – we must keep the order
of our matrices correct when doing matrix algebra:
• A(3B − C) = 3AB − AC, that is, when distributing, A must remain on the left,

• (A − 2B)C = AC − 2BC, that is, when distributing, C must remain on the right,

• 3AB + 4BA 6= 7AB since we cannot assume AB = BA.

Exercise 20.2
If A, B, C ∈ Mn×n (R) and C commutes (with respect to multiplication) with both A
and B, then prove that C commutes with AB.

Proof. Since C commutes with both A and B, we have that AC = CA and BC = CB. Thus

(AB)C = A(BC) = A(CB) = (AC)B = (CA)B = C(AB)

and so C commutes with AB.



Lecture 21

Application: Adjacency Matrices for Directed Graphs


Definition 21.1: Directed Graph

A directed graph (or digraph) is a set of vertices and a set of directed edges between
some of the pairs of vertices. We may move from one vertex in the directed graph to
another vertex if there is a directed edge pointing in the direction we wish to move.

Consider the directed graph below:

This graph has four vertices, V1 , V2 , V3 and V4 . A directed edge between two vertices Vi and
Vj is simply the arrow pointing from Vi to Vj . As seen in the figure, we may have a directed
edge from a vertex to the same vertex (see V1 ), an edge may be directed in both directions
(see V2 and V3 ) and there may be more than one directed edge from one vertex to another
(see V3 and V4 ).

One question we may ask is in how many distinct ways can we get from V1 to V4 travelling
along exactly 3 directed edges, that is, how many distinct 3−edged paths are there from V1
to V4 ? A little counting reveals that there are 6 distinct such paths:
V1 −→ V1 −→ V3 −→ V4   (upper edge from V3 to V4 )
V1 −→ V1 −→ V3 −→ V4   (lower edge from V3 to V4 )
V1 −→ V1 −→ V2 −→ V4
V1 −→ V2 −→ V3 −→ V4   (upper edge from V3 to V4 )
V1 −→ V2 −→ V3 −→ V4   (lower edge from V3 to V4 )
V1 −→ V3 −→ V2 −→ V4

Note that each time we move from V3 to V4 , we specify which directed edge we are taking
since there is more than one. We could alternatively label each directed edge as we have the
vertices. However, we are more concerned with counting the number of paths and not with
actually listing them all out.

Counting may seem easy, but what if we were asked to find all distinct 20−edged paths
from V1 to V4 ? After months of counting, you would find 2 584 875 distinct paths. Clearly,
counting the paths one-by-one is not the best method.

Consider the 4 × 4 matrix A whose (i, j)−entry is the number of directed edges from Vi to
Vj . Then
 
      [ 1 1 1 0 ]
      [ 0 0 1 1 ]
A =   [ 0 1 0 2 ]
      [ 1 0 0 0 ]

We compute

       [ 1 2 2 3 ]                [ 4 3 3 6 ]
       [ 1 1 0 2 ]                [ 3 1 2 1 ]
A2 =   [ 2 0 1 1 ]    and   A3 =  [ 3 3 2 2 ]
       [ 1 1 1 0 ]                [ 1 2 2 3 ]

and note that the (1, 4)−entry of A3 is 6 which is the number of distinct 3−edged paths
from V1 to V4 . In fact, the (i, j)−entry of A3 gives the number of distinct 3−edged paths
from Vi to Vj for any i and j with 1 ≤ i, j ≤ 4.
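Matrix powers of the adjacency matrix are easy to compute in Python (numpy assumed available); the sketch below recovers the path counts quoted in this section.

import numpy as np

# Adjacency matrix of the directed graph above
A = np.array([[1, 1, 1, 0],
              [0, 0, 1, 1],
              [0, 1, 0, 2],
              [1, 0, 0, 0]])

A3 = np.linalg.matrix_power(A, 3)
print(A3[0, 3])                              # the 6 distinct 3-edged paths from V1 to V4
print(np.linalg.matrix_power(A, 20)[0, 3])   # the 2 584 875 twenty-edged paths mentioned above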

Definition 21.2: Adjacency Matrix


Consider a directed graph with n vertices V1 , V2 , . . . , Vn . The adjacency matrix of the
directed graph is the n × n matrix A whose (i, j)−entry is the number of directed
edges from Vi to Vj .

Theorem 21.1
Consider a directed graph with n vertices V1 , V2 , . . . , Vn and let A be the adjacency
matrix for this graph. For any positive integer k, the number of distinct k−edged
paths from Vi to Vj is given by the (i, j)−entry of A^k .

Proof. ⁸ The result is true for k = 1 since the (i, j)−entry of A^1 = A is by definition the
number of distinct 1−edged paths from Vi to Vj . Assume now that the result is true for
some positive integer k. Denote the (i, j)−entry of A^k by a_ij^(k) so that the number of distinct
k−edged paths from Vi to Vj is a_ij^(k). Consider the (i, j)−entry of A^(k+1), denoted by a_ij^(k+1).
We have

    a_ij^(k+1) = Σ_{ℓ=1}^{n} a_iℓ^(k) a_ℓj = a_i1^(k) a_1j + a_i2^(k) a_2j + · · · + a_in^(k) a_nj

Note that every (k + 1)−edged path from Vi to Vj is of the form

    Vi −→ · · · −→ Vℓ −→ Vj                                            (19)

for some ℓ = 1, 2, . . . , n. By assumption, there are a_iℓ^(k) distinct k−edged paths from Vi
to Vℓ . Since there are a_ℓj distinct 1−edged paths from Vℓ to Vj , there are a_iℓ^(k) a_ℓj distinct
(k + 1)−edged paths of the form given by (19). Thus, the total number of (k + 1)−edged
paths from Vi to Vj is given by

    a_i1^(k) a_1j + a_i2^(k) a_2j + · · · + a_in^(k) a_nj

which is a_ij^(k+1), the (i, j)−entry of A^(k+1).

⁸ The proof technique used here is called induction. Although you will not be asked to give a proof by
induction, we include this proof here as it illustrates why A^k gives the number of k−edged paths between
the vertices of a directed graph.
The next example is quite challenging. You should be able to set up the adjacency matrix
and answer part (a). Parts (b) and (c) are fun, but optional.

Exercise 21.1
An airline company offers flights between the cities of Toronto, Beijing, Paris and
Sydney. You can fly between these cities as you like, except that there is no flight
from Beijing to Sydney, and there is no flight between Toronto and Sydney in either
direction.

(a) If you depart from Toronto, how many distinct sequences of flights can you take
if you plan to arrive in Beijing after no more than 5 flights? (You may arrive at
Beijing in less than 5 flights and then leave, provided you end up back in Beijing
no later than the 5th flight).

(b) Suppose you wish to depart from Sydney and arrive in Beijing after exactly 5
flights. In how many ways can this be done so that your second flight takes you
to Toronto?

(c) Suppose you wish to depart from Sydney and arrive in Beijing after exactly 5
flights. In how many ways can this be done so that you visit Toronto at least
once? (Hint: In how many ways can you depart from Sydney and arrive in
Beijing after exactly 5 flights without visiting Toronto?)

Solution. We denote the four cities as vertices, and place a directed arrow between two cities
if we can fly between the two cities in that direction. We label Toronto as V1 , Beijing as V2 ,
Paris as V3 and Sydney as V4 . We obtain the following directed graph: We construct the

adjacency matrix A as

        [ 0 1 1 0 ]
    A = [ 1 0 1 0 ]
        [ 1 1 0 1 ]
        [ 0 1 1 0 ]

As we can take at most 5 flights, we will compute A^2 , A^3 , A^4 and A^5 :

          [ 2 1 1 1 ]          [ 2 4 4 1 ]
    A^2 = [ 1 2 1 1 ] ,  A^3 = [ 3 3 4 1 ] ,
          [ 1 2 3 0 ]          [ 5 4 3 3 ]
          [ 2 1 1 1 ]          [ 2 4 4 1 ]

          [ 8  7  7 4 ]          [ 14 19 19  7 ]
    A^4 = [ 7  8  7 4 ] ,  A^5 = [ 15 18 19  7 ]
          [ 7 11 12 3 ]          [ 23 22 21 12 ]
          [ 8  7  7 4 ]          [ 14 19 19  7 ]

(a) Since the (1, 2)−entry of A^k gives the number of distinct ways to fly from Toronto to
    Beijing using k flights, we simply add the (1, 2)−entries of these five matrices. We
    have 1 + 1 + 4 + 7 + 19 = 32. Thus, there are 32 distinct ways to fly from Toronto to
    Beijing using no more than 5 flights.
(b) Here, we need to fly from Sydney to Beijing using exactly 5 flights. The (4, 2)−entry of
    A^5 tells us that there are 19 ways to do this. However, we must pass through Toronto
    after the second flight. This restriction implies our final answer should be no greater
    than 19. We will compute the number of ways to fly from Sydney to Toronto in two
    flights, and then the number of ways to fly from Toronto to Beijing in three flights,
    and finally multiply our results together to get the final answer. Thus we compute

        a_41^(2) · a_12^(3) = 2 · 4 = 8

    There are 8 ways to fly from Sydney to Beijing in 5 flights, stopping in Toronto after
    the second flight.
(c) Here it is tempting to count the number of flights from Sydney that pass through
    Toronto after the first flight, then the number of flights that pass through Toronto
    after the second flight, third flight and fourth flight, then add the results, that is, to
    compute

        a_41^(1) · a_12^(4) + a_41^(2) · a_12^(3) + a_41^(3) · a_12^(2) + a_41^(4) · a_12^(1) = 0 · 7 + 2 · 4 + 2 · 1 + 8 · 1 = 18

    and conclude that there are 18 such flights. However, the sequence of flights

        Sydney −→ Paris −→ Toronto −→ Beijing −→ Toronto −→ Beijing

    is “double-counted” as it passes through Toronto twice. Thus there should be less than
    18 such flights. To avoid this double-counting, we will instead count the number of
    ways to fly from Sydney to Beijing without visiting Toronto, and we will accomplish
    this by removing Toronto from our directed graph:

    This leads to a new adjacency matrix

            [ 0 0 0 0 ]
        B = [ 0 0 1 0 ]
            [ 0 1 0 1 ]
            [ 0 1 1 0 ]

    It’s left as an exercise to see that

              [ 0 0 0 0 ]
        B^5 = [ 0 3 4 1 ]
              [ 0 5 4 4 ]
              [ 0 5 5 3 ]

    The (4, 2)−entry of B^5 shows that there are 5 distinct ways to fly from Sydney to Beijing
in 5 flights without stopping in Toronto. Since the (4, 2)-entry of A5 shows there are
19 ways to fly from Sydney to Beijing in 5 flights, there must be 19 − 5 = 14 distinct
ways to fly from Sydney to Beijing in 5 flights while visiting Toronto at least once.
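If you want to check the arithmetic in this exercise, the following Python/NumPy sketch reproduces parts (a), (b) and (c); the vertex numbering (0 = Toronto, 1 = Beijing, 2 = Paris, 3 = Sydney) is a choice made for the code only:

    import numpy as np

    A = np.array([[0, 1, 1, 0],
                  [1, 0, 1, 0],
                  [1, 1, 0, 1],
                  [0, 1, 1, 0]])

    powers = [np.linalg.matrix_power(A, k) for k in range(1, 6)]

    # (a) Toronto -> Beijing in at most 5 flights.
    print(sum(P[0, 1] for P in powers))                          # 32

    # (b) Sydney -> Toronto in 2 flights, then Toronto -> Beijing in 3 flights.
    print(powers[1][3, 0] * powers[2][0, 1])                     # 2 * 4 = 8

    # (c) Remove Toronto (vertex 0) and count 5-flight routes that avoid it.
    B = A.copy()
    B[0, :] = 0
    B[:, 0] = 0
    print(powers[4][3, 1] - np.linalg.matrix_power(B, 5)[3, 1])  # 19 - 5 = 14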

Application: Markov Chains


In the zombie apocalypse, a person exists in exactly one of two states: human or zombie.
If a person is a human on any given day, then that person will become a zombie on the
next day with probability 1/2, and if a person is a zombie on any given day, then with
a simple application of the moderately successful miracle-cure ZombieGoneTM , they will
become a human on the next day with probability 1/4. We are interested in computing
the probability that a person is a human or a zombie k days after the onset of the zombie
apocalypse. Given the above data, we can construct the following table:

                     From:
                  Human    Zombie
    To:  Human     1/2      1/4
         Zombie    1/2      3/4

For k a nonnegative integer, let hk be the probability that a person is a human on day k and
let zk be the probability that a person is a zombie on day k. Since 1/2 of the humans on
day k will remain human the next day, and 1/4 of the zombies on day k will become human
the next day, we have that
    hk+1 = (1/2) hk + (1/4) zk

Similarly, we find

    zk+1 = (1/2) hk + (3/4) zk

giving us a system of equations. In matrix notation,

    [ hk+1 ]   [ 1/2  1/4 ] [ hk ]
    [ zk+1 ] = [ 1/2  3/4 ] [ zk ]

Now let
          [ hk ]               [ 1/2  1/4 ]
    ~sk = [ zk ]    and    P = [ 1/2  3/4 ]
to obtain

    ~sk+1 = P~sk                                                        (20)

for every nonnegative integer k. Since a person must be either a human or a zombie (but not
both), we have that hk + zk = 1 for each k. Notice how we can find the matrix P directly
from the table we constructed above, and that it follows from how we constructed the table
that the entries of P are all nonnegative and the entries in each column of P sum to 1.

Definition 21.3: Probability Vector, Stochastic Matrix, Markov Chain


A vector ~s ∈ Rn is called a probability vector if the entries in the vector are nonnegative
and sum to 1. A square matrix is called stochastic if its columns are probability
vectors. Given a stochastic matrix P , a Markov Chain is a sequence of probability
vectors ~s0 , ~s1 , ~s2 , . . . where
~sk+1 = P~sk
for every nonnegative integer k. In a Markov Chain, the probability vectors ~sk are
called state vectors.

Now suppose that for k = 0 (the moment the zombie apocalypse begins - referred to by
survivors as “Z–Day”), everyone is still human. Thus, a person is a human with probability
1 and a person is a zombie with probability 0. This gives
" #
1
~s0 =
0

We can now compute


" #" # " #
1/2 1/4 1 1/2
~s1 = P~s0 = =
1/2 3/4 0 1/2

showing that one day after the start of the zombie apocalypse, 1/2 of the population are
humans while the other 1/2 of the population are now zombies. Now
" #" # " # " #
1/2 1/4 1/2 3/8 0.37500
~s2 = P~s1 = = =
1/2 3/4 1/2 5/8 0.62500
" #" # " # " #
1/2 1/4 3/8 11/32 0.34375
~s3 = P~s2 = = =
1/2 3/4 5/8 21/32 0.65625

Continuing to use the formula ~sk+1 = P~sk , we obtain

    ~s4 = [ 43/128 ] ≈ [ 0.33594 ] ,   ~s5 = [ 171/512 ] ≈ [ 0.33398 ] ,
          [ 85/128 ]   [ 0.66406 ]           [ 341/512 ]   [ 0.66602 ]

    ~s6 = [  683/2048 ] ≈ [ 0.33350 ]
          [ 1365/2048 ]   [ 0.66650 ]

and after some work, we find that

    ~s10 = [ 174 763/524 288 ] ≈ [ 0.33333 ]
           [ 349 525/524 288 ]   [ 0.66666 ]

It appears that the sequence ~s0 , ~s1 , ~s2 , . . . is converging⁹ to

    ~s = [ 1/3 ]
         [ 2/3 ]

In fact,

    P~s = [ 1/2  1/4 ] [ 1/3 ] = [ 1/3 ] = ~s
          [ 1/2  3/4 ] [ 2/3 ]   [ 2/3 ]

⁹ By converging, we mean that each component of ~sk is converging to the corresponding component of ~s
as k tends to infinity.
Thus, if the system reaches state ~s (or starts in this state), then the probabilities that a
given person is a human or a zombie no longer change over time.

Definition 21.4: Steady-State Vector


If P is a stochastic matrix, then a state vector ~s is called a steady-state vector for P
if P~s = ~s. It can be shown that every stochastic matrix has a steady-state vector.

To algebraically determine any steady-state vectors in our example above, we start with
P~s = ~s. Then

P~s − ~s = ~0
P~s − I~s = ~0
(P − I)~s = ~0

so that we have a homogeneous system. Note the introduction of the identity matrix I above.
It might be tempting to go from P~s − ~s = ~0 to (P − 1)~s = ~0, but since P is a matrix and 1
is a number, P − 1 is not defined. Computing the coefficient matrix P − I and row reducing
gives

    [ −1/2   1/4 ]   −→     [ −1/2  1/4 ]  −2R1   [ 1  −1/2 ]
    [  1/2  −1/4 ]  R2+R1   [   0    0  ]   −→    [ 0    0  ]

We find that for t ∈ R,

    ~s = [ (1/2)t ]
         [    t   ]

Recalling that ~s is a state vector, we additionally require that (1/2)t + t = 1, that is, t = 2/3.
This gives

    ~s = [ 1/3 ]
         [ 2/3 ]
Now, what happens if we change our initial state vector ~s0 ? If we let ~s0 = [ h0  z0 ]^T and
recall that h0 + z0 = 1, we obtain that z0 = 1 − h0 , so ~s0 = [ h0  1 − h0 ]^T . It is a good
exercise to show that by repeatedly using Equation (20), we obtain

    ~sk = [  h0/4^k + 1/3 − 1/(3 · 4^k) ]
          [ −h0/4^k + 2/3 + 1/(3 · 4^k) ]

Since both h0/4^k and 1/(3 · 4^k) tend to zero as k tends to infinity, we see that for any initial
state vector ~s0 , the sequence ~s0 , ~s1 , ~s2 , . . . tends to

    ~s = [ 1/3 ]
         [ 2/3 ]

This means that as more and more time passes after the zombie apocalypse begins, the
probability that a person is a human on a given day will get closer and closer to 1/3 while the
probability that a person is a zombie on a given day will tend to 2/3, and this long-
term behaviour does not depend on the initial state vector ~s0 . For example, given enough
time after the start of the zombie apocalypse, a city of 100 000 people would have 33 333
humans and 66 667 zombies each day. Note that once this steady-state is achieved, humans
are still turning into zombies and zombies are still reverting back to humans each day, but
the number of humans turning into zombies is equal to the number of zombies turning
into humans. It’s worth noting that once the steady-state is achieved, there would “math-
ematically” be 33 333.3̄ humans and 66 666.6̄ zombies. We have rounded our final answers
due to the real-world constraint that we cannot have fractional humans.
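A quick way to watch this convergence is to iterate ~sk+1 = P~sk by computer. A minimal Python/NumPy sketch, using the stochastic matrix P and initial state vector from our example:

    import numpy as np

    P = np.array([[0.5, 0.25],
                  [0.5, 0.75]])
    s = np.array([1.0, 0.0])       # everyone is human on day 0

    for k in range(1, 11):
        s = P @ s                  # s_{k+1} = P s_k
        print(k, s)                # approaches [1/3, 2/3]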

Note that in our above zombie example, our stochastic matrix P had a unique steady-state
and that the Markov Chain converged to this steady-state regardless of the initial state vec-
tor ~s0 . The next two examples show that this is not always the case.

Example 21.1
The n × n identity matrix I is a stochastic matrix, and for any state vector ~s ∈ Rn ,
I~s = ~s. This shows that every state vector ~s ∈ Rn is a steady-state vector for I. Thus
we do not have a unique steady-state vector.

Example 21.2
For our second example, consider the stochastic matrix

    Q = [ 0  1 ]
        [ 1  0 ]

Then for any state vector ~s = [ s1  s2 ]^T ,

    Q~s = [ 0  1 ] [ s1 ] = [ s2 ]
          [ 1  0 ] [ s2 ]   [ s1 ]

In order for ~s to be a steady-state vector, we require Q~s = ~s, and so we have that
s1 = s2 = 1/2. Thus we have a unique steady-state vector (which we also could have
found by solving the homogeneous system (Q − I)~s = ~0 as above). However, if we take
the initial state vector ~s0 = [ 1  0 ]^T , we find

    ~s1 = Q~s0 = [ 0  1 ] [ 1 ] = [ 0 ]
                 [ 1  0 ] [ 0 ]   [ 1 ]

    ~s2 = Q~s1 = [ 0  1 ] [ 0 ] = [ 1 ] = ~s0
                 [ 1  0 ] [ 1 ]   [ 0 ]

We see that

    ~sk = [ 1  0 ]^T if k is even,    and    ~sk = [ 0  1 ]^T if k is odd,

so that the Markov Chain doesn’t converge to the steady-state with this initial state.
In fact, the Markov Chain converges to the steady-state only when ~s0 = [ 1/2  1/2 ]^T .

Clearly, the stochastic matrix P from the zombie apocalypse example is special in the sense
that P has a unique steady-state vector and any Markov Chain will converge to this steady-
state regardless of the initial state vector chosen. This is because the matrix P is regular.

Definition 21.5: Regular Stochastic Matrix


An n × n stochastic matrix P is called regular if for some positive integer k, the matrix
P k has all positive entries.

Since a stochastic matrix has all entries between 0 and 1 inclusive, a stochastic matrix P
fails to be regular when P^k contains a zero entry for every positive integer k. Clearly,

    P = P^1 = [ 1/2  1/4 ]
              [ 1/2  3/4 ]

is regular as all entries are positive. The n × n identity matrix is not regular since for any
positive integer k, I^k = I contains zero entries. The matrix

    Q = [ 0  1 ]
        [ 1  0 ]

is not regular since for any positive integer k,

    Q^k = [ 1  0 ] if k is even,    and    Q^k = [ 0  1 ] if k is odd,
          [ 0  1 ]                               [ 1  0 ]

always contains a zero entry.

Theorem 21.2
Let P be a regular n × n stochastic matrix. Then P has a unique steady-state vector
~s and for any initial state vector ~s0 ∈ Rn , the resulting Markov Chain converges to
the steady-state vector ~s.

To solve a Markov Chain problem:

1. Read and understand the problem,

2. Determine the stochastic matrix P and verify that P is regular,

3. Determine the initial state vector ~s0 if required,

4. Solve the homogeneous system (P − I)~s = ~0,

5. Choose values for any parameters resulting from solving the above system so that ~s is
a probability vector,

6. Conclude by Theorem 21.2 that ~s is the steady-state vector,

7. Interpret the entries of ~s in terms of the original problem as needed.
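As a computational aside (not a replacement for the hand method above), one way to carry out steps 4 and 5 numerically is to solve (P − I)~s = ~0 together with the condition that the entries of ~s sum to 1. A sketch in Python/NumPy, using least squares for the combined system:

    import numpy as np

    P = np.array([[0.5, 0.25],
                  [0.5, 0.75]])
    n = P.shape[0]

    # Stack (P - I)s = 0 with the extra equation s_1 + ... + s_n = 1.
    M = np.vstack([P - np.eye(n), np.ones(n)])
    b = np.append(np.zeros(n), 1.0)
    s, *_ = np.linalg.lstsq(M, b, rcond=None)
    print(s)                          # approximately [0.3333..., 0.6666...]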

A brief word about notation. Our examples here dealt mostly with two states - for example,
human and zombie. In general, we will have many states in our Markov Chain. This means
that our stochastic matrix P will be an n × n matrix and our state vectors ~sk ∈ Rn for
k = 0, 1, 2, . . .. Oftentimes, the following notation is used for state vectors:
    ~sk = [ s1^(k) ]
          [   ⋮    ]
          [ sn^(k) ]

for k = 0, 1, 2, . . . and

    ~s = [ s1 ]
         [ ⋮  ]
         [ sn ]

for our steady-state vector.

Lecture 22

Complex Matrices
We denote the set of m×n matrices with complex entries by Mm×n (C). The rules of addition,
scalar multiplication, matrix-vector product, matrix multiplication and transpose derived for
real matrices also hold for complex matrices.

Example 22.1
Let
    A = [   j     2 − j  ]   and   B = [  1     j   ] .
        [ 4 + j   1 − 2j ]             [ 2j   1 − j ]
Then

    AB = [   j     2 − j  ] [  1     j   ] = [ 2 + 5j    −3j   ]
         [ 4 + j   1 − 2j ] [ 2j   1 − j ]   [ 8 + 3j   −2 + j ]

    BA = [  1     j   ] [   j     2 − j  ] = [ −1 + 5j     4   ]
         [ 2j   1 − j ] [ 4 + j   1 − 2j ]   [ 3 − 3j    1 + j ]

from which we see that AB 6= BA, so multiplication of complex matrices also doesn’t
commute.
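Incidentally, Python shares the engineers’ convention of writing the imaginary unit as j, so the computation above is easy to reproduce with NumPy; the following is only a numerical check of Example 22.1:

    import numpy as np

    A = np.array([[1j, 2 - 1j],
                  [4 + 1j, 1 - 2j]])
    B = np.array([[1, 1j],
                  [2j, 1 - 1j]])

    print(A @ B)    # [[2+5j, -3j], [8+3j, -2+1j]]
    print(B @ A)    # [[-1+5j, 4], [3-3j, 1+1j]]  -- not equal to A @ B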

Note that if

         [ ~r1^T ]
    A  = [   ⋮   ] ∈ Mm×n (C)    and    B = [ ~b1  · · ·  ~bk ] ∈ Mn×k (C)
         [ ~rm^T ]

then ~ri ∈ Cn for i = 1, . . . , m and ~bj ∈ Cn for j = 1, . . . , k so the dot product ~ri · ~bj is defined.
Then we have

          [ ~r1^T ]                       [ ~r1 · ~b1   · · ·   ~r1 · ~bk ]
    AB =  [   ⋮   ] [ ~b1  · · ·  ~bk ] = [     ⋮                   ⋮    ] ∈ Mm×k (C).
          [ ~rm^T ]                       [ ~rm · ~b1   · · ·   ~rm · ~bk ]

Thus, the (i, j)−entry of AB is ~ri · ~bj . It’s important to note that we use the dot product
here, and not the complex inner product.

Recall that taking the complex conjugate of a vector ~z ∈ Cn was done by conjugating each
entry of the vector. We see taking the conjugate of a matrix is no different.

Definition 22.1: Conjugate of a Matrix

Let A = [aij ] ∈ Mm×n (C). Then the conjugate of A is the matrix A̅ = [ a̅ij ] obtained by
conjugating each entry of A.

The following theorem is stated without proof, but it is a good exercise to verify it.

Theorem 22.1
If A ∈ Mm×n (C) and ~z ∈ Cn , then the conjugate of the vector A~z equals A̅ ~z̅ , that is, the
conjugate of A times the conjugate of ~z .

The definition of symmetry carries over to complex matrices as well – if A ∈ Mn×n (C), then
A is symmetric if AT = A.

Example 22.2
Let
    A = [   j     1 + j ]   and   B = [   3     2 − j ] .
        [ 1 + j     3   ]             [ 2 + j     6   ]
Then

    A^T = [   j     1 + j ] = A
          [ 1 + j     3   ]

    B^T = [   3     2 + j ] 6= B
          [ 2 − j     6   ]

so A is symmetric and B is not.

Matrix Inverses¹⁰
We have seen that like real numbers, we can multiply matrices. For real numbers, we know
that 1 is the multiplicative identity since 1(x) = x = x(1) for any x ∈ R. We also know that
if x, y ∈ R are such that xy = 1 = yx, then x and y are multiplicative inverses of each other,
and we say that they are both invertible. We have recently seen that for an n × n matrix A,
IA = A = AI where I is the n × n identity matrix which shows that I is the multiplicative
identity for Mn×n (R). It is then natural to ask whether, for a given matrix A, there exists
a matrix B so that AB = I = BA. If so, the requirement that AB = BA imposes the
condition that A and B be square matrices.

¹⁰ When we say inverse here, we mean multiplicative inverse. Given any matrix A ∈ Mm×n (R), the additive
inverse of A is −A, which is easy to compute and not very interesting to study.
Definition 22.2: Inverse Matrix
Let A ∈ Mn×n (R). If there exists a B ∈ Mn×n (R) such that

AB = I = BA

then A is invertible and B is an inverse of A (and B is invertible with A an inverse of


B).

Example 22.3
Let
    A = [  2  −1 ]   and   B = [ 1  1 ] .
        [ −1   1 ]             [ 1  2 ]
Then

    AB = [  2  −1 ] [ 1  1 ] = [ 1  0 ]   and   BA = [ 1  1 ] [  2  −1 ] = [ 1  0 ]
         [ −1   1 ] [ 1  2 ]   [ 0  1 ]              [ 1  2 ] [ −1   1 ]   [ 0  1 ]

so A is invertible and B is an inverse of A.

Example 22.4
Let
    A = [ 1  2 ] .
        [ 0  0 ]
Then for any b1 , b2 , b3 , b4 ∈ R,

    [ 1  2 ] [ b1  b2 ] = [ b1 + 2b3   b2 + 2b4 ] 6= [ 1  0 ]
    [ 0  0 ] [ b3  b4 ]   [    0           0    ]    [ 0  1 ]

so A is not invertible.

Notice that in the previous example, A is a nonzero matrix that fails to be invertible. This
might be surprising since for a real number x, we know that x being invertible is equivalent
to x being nonzero. Clearly this is not the case for n × n matrices.

By the above definition, to show that B ∈ Mn×n (R) is an inverse of A ∈ Mn×n (R), we must
check that both AB = I and BA = I. The next theorem shows that if AB = I, then it
follows that BA = I (or equivalently, if BA = I then it follows that AB = I) so that we
need only verify one of AB = I and BA = I to conclude that B is an inverse of A.

Theorem 22.2
Let A, B ∈ Mn×n (R) be such that AB = I. Then BA = I. Moreover, rank (A) =
rank (B) = n.

Proof. Let A, B ∈ Mn×n (R) be such that AB = I. We first show that rank (B) = n. Let
~x ∈ Rn be such that B~x = ~0. Since AB = I,

~x = I~x = (AB)~x = A(B~x) = A~0 = ~0

so ~x = ~0 is the only solution to the homogeneous system B~x = ~0. Thus, rank (B) = n by
the System–Rank Theorem(2).

We next show that BA = I. Let ~y ∈ Rn . Since rank (B) = n and B has n rows, the
System–Rank Theorem(3) guarantees that we will find ~x ∈ Rn such that ~y = B~x. Then

(BA)~y = (BA)B~x = B(AB)~x = BI~x = B~x = ~y = I~y

so (BA)~y = I~y for every ~y ∈ Rn . Thus BA = I by the Matrices Equal Theorem.

Finally, since BA = I, it follows that rank (A) = n by the first part of our proof with the
roles of A and B interchanged.
We have now proven that if A ∈ Mn×n (R) is invertible, then rank (A) = n. It follows that
the reduced row echelon form of A is I. We now prove that if A is invertible, then the inverse
of A is unique.

Theorem 22.3
Let A ∈ Mn×n (R) be invertible. If B, C ∈ Mn×n (R) are both inverses of A, then
B = C.

Proof. Assume for A, B, C ∈ Mn×n (R) that both B and C are inverses of A. Then BA = I
and AC = I. We have

B = BI = B(AC) = (BA)C = IC = C.

Hence, if A is invertible, the inverse of A is unique, and we denote this inverse by A−1 .

Theorem 22.4: Properties of Matrix Inverses

Let A, B ∈ Mn×n (R) be invertible and let c ∈ R with c 6= 0. Then

(1) (cA)^{−1} = (1/c) A^{−1}

(2) (AB)^{−1} = B^{−1} A^{−1}

(3) (A^k)^{−1} = (A^{−1})^k for k a positive integer

(4) (A^T)^{−1} = (A^{−1})^T

(5) (A^{−1})^{−1} = A

Proof. We prove (2) and (4) only. For (2), since

    (AB)(B^{−1} A^{−1}) = A(BB^{−1})A^{−1} = AIA^{−1} = AA^{−1} = I

we have that (AB)^{−1} = B^{−1} A^{−1} and for (4), since

    A^T (A^{−1})^T = (A^{−1} A)^T = I^T = I

we see that (A^T)^{−1} = (A^{−1})^T .

Note that (2) from Theorem 22.4 generalizes for more than two matrices. For invertible
matrices A1 , A2 , . . . , Ak ∈ Mn×n (R) we have that A1 A2 · · · Ak is invertible and

    (A1 A2 · · · Ak)^{−1} = Ak^{−1} · · · A2^{−1} A1^{−1} .

In particular, if A1 = A2 = · · · = Ak = A is invertible, then

    (A^k)^{−1} = (A^{−1})^k

for any positive integer k.

Lecture 23

Matrix Inversion Algorithm


Having shown many properties of matrix inverses, we have yet to actually compute the in-
verse of an invertible matrix. We know that for a real number x, x is invertible if and only
if x 6= 0, and in this case x^{−1} = 1/x. Things aren’t quite so easy with matrices.¹¹ We derive
an algorithm here that will tell us if a matrix is invertible, and compute the inverse should
the matrix be invertible. Our construction is for 3 × 3 matrices, but generalizes naturally
for n × n matrices.

Consider A ∈ M3×3 (R). If A is invertible, then there exists an X = [ ~x1 ~x2 ~x3 ] ∈ M3×3 (R)
such that

AX = I
A[ ~x1 ~x2 ~x3 ] = [ ~e1 ~e2 ~e3 ]
[ A~x1 A~x2 A~x3 ] = [ ~e1 ~e2 ~e3 ]

Thus
A~x1 = ~e1 , A~x2 = ~e2 and A~x3 = ~e3 .
We have three systems of equations with the same coefficient matrix, so we construct an
augmented matrix
[ A | ~e1 ~e2 ~e3 ] = [ A | I ]
We must consider two cases when solving this system. First, if the reduced row echelon form
of A is I, then
[ A | I ] −→ [ I | ~b1 ~b2 ~b3 ]

where B = [ ~b1 ~b2 ~b3 ] ∈ M3×3 (R) is the matrix that I reduces to under the same elemen-
tary row operations that carry A to I. From this, we see that ~b1 is the solution to A~x1 = ~e1 ,
~b2 is the solution to A~x2 = ~e2 and ~b3 is the solution to A~x3 = ~e3 , that is,

~x1 = ~b1 , ~x2 = ~b2 and ~x3 = ~b3

Hence
AX = AB = A[ ~b1 ~b2 ~b3 ] = [ A~b1 A~b2 A~b3 ] = [ ~e1 ~e2 ~e3 ] = I
so A−1 = B. Second, if the reduced row echelon form of A is not I, then rank (A) < 3 and
A cannot be invertible since if A were invertible, we would have rank (A) = 3 by Theorem
22.2.

¹¹ Don’t even think about writing A^{−1} = 1/A. This is wrong as 1/A is not even defined.

Thus for A ∈ Mn×n (R), to see if A is invertible (and to compute A−1 if A is invertible), carry
the augmented matrix [ A | I ] to reduced row echelon form. If the reduced row echelon form
of [ A | I ] is [ I | B ] for some B ∈ Mn×n (R), then B = A−1 , but if the reduced row echelon
form of A is not I, then A is not invertible. This is known as the Matrix Inversion Algorithm.
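The Matrix Inversion Algorithm translates directly into code. The sketch below is a minimal Python/NumPy implementation (the partial pivoting and the tolerance for deciding when a pivot is zero are choices made for the code, not part of the algorithm as stated above):

    import numpy as np

    def inverse_by_row_reduction(A, tol=1e-12):
        """Row reduce [A | I]; return the inverse of A, or None if A is not invertible."""
        A = np.array(A, dtype=float)
        n = A.shape[0]
        M = np.hstack([A, np.eye(n)])          # the augmented matrix [A | I]
        for col in range(n):
            pivot = np.argmax(np.abs(M[col:, col])) + col
            if abs(M[pivot, col]) < tol:       # rank(A) < n, so A is not invertible
                return None
            M[[col, pivot]] = M[[pivot, col]]  # swap rows
            M[col] /= M[col, col]              # make the pivot equal to 1
            for r in range(n):                 # clear the rest of the column
                if r != col:
                    M[r] -= M[r, col] * M[col]
        return M[:, n:]                        # [I | A^(-1)]  ->  right block

    print(inverse_by_row_reduction([[2, 3], [4, 5]]))   # [[-2.5, 1.5], [2, -1]]
    print(inverse_by_row_reduction([[1, 2], [2, 4]]))   # None (not invertible)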

Example 23.1
Let
    A = [ 2  3 ] .
        [ 4  5 ]
Find A^{−1} if it exists.
Solution. We have

    [ 2  3 | 1  0 ]   −→     [ 2   3 |  1  0 ]  R1+3R2   [ 2   0 | −5  3 ]  (1/2)R1
    [ 4  5 | 0  1 ]  R2−2R1  [ 0  −1 | −2  1 ]    −→     [ 0  −1 | −2  1 ]   −R2

    [ 1  0 | −5/2  3/2 ]
    [ 0  1 |   2   −1  ]

So A is invertible (since the reduced row echelon form of A is I) and

    A^{−1} = [ −5/2  3/2 ] .
             [   2   −1  ]

Example 23.2
Let
    A = [ 1  2 ] .
        [ 2  4 ]
Find A^{−1} if it exists.
Solution. We have

    [ 1  2 | 1  0 ]   −→     [ 1  2 |  1  0 ]
    [ 2  4 | 0  1 ]  R2−2R1  [ 0  0 | −2  1 ]

We see that the reduced row echelon form of A is

    [ 1  2 ]  6=  [ 1  0 ]
    [ 0  0 ]      [ 0  1 ]

so A is not invertible (note that rank (A) = 1 < 2).

Exercise 23.1
Let
        [ 1  0  −1 ]
    A = [ 1  1  −2 ] .
        [ 1  2  −2 ]
Find A^{−1} if it exists.

Solution. We have

    [ 1  0  −1 | 1  0  0 ]   −→     [ 1  0  −1 |  1  0  0 ]   −→
    [ 1  1  −2 | 0  1  0 ]  R2−R1   [ 0  1  −1 | −1  1  0 ]
    [ 1  2  −2 | 0  0  1 ]  R3−R1   [ 0  2  −1 | −1  0  1 ]  R3−2R2

    [ 1  0  −1 |  1   0  0 ]  R1+R3   [ 1  0  0 | 2  −2  1 ]
    [ 0  1  −1 | −1   1  0 ]  R2+R3   [ 0  1  0 | 0  −1  1 ]
    [ 0  0   1 |  1  −2  1 ]   −→     [ 0  0  1 | 1  −2  1 ]

and we conclude that A is invertible and

             [ 2  −2  1 ]
    A^{−1} = [ 0  −1  1 ] .
             [ 1  −2  1 ]

Note that if you find A to be invertible and you compute A−1 , then you can check your work
by ensuring that AA−1 = I.

Properties of Matrix Inverses


Theorem 23.1: Cancellation Laws
Let A ∈ Mn×n (R) be invertible

(1) For all B, C ∈ Mn×k (R), if AB = AC, then B = C left cancellation

(2) For all B, C ∈ Mk×n (R), if BA = CA, then B = C right cancellation

Proof. We prove (1). We have

    AB = AC
    A^{−1} (AB) = A^{−1} (AC)
    (A^{−1} A)B = (A^{−1} A)C
    IB = IC
    B = C

Note that our two cancellation laws require that A be invertible. Indeed

    [ 0  0 ] [ 1  2  3 ] = [ 0  0  0 ] = [ 0  0 ] [ 7  8  9 ]
    [ 0  1 ] [ 4  5  6 ]   [ 4  5  6 ]   [ 0  1 ] [ 4  5  6 ]

but
    [ 1  2  3 ]  6=  [ 7  8  9 ] .
    [ 4  5  6 ]      [ 4  5  6 ]

Notice that the matrix [ 0  0 ; 0  1 ] has rank 1 < 2, so it is not invertible.

Example 23.3

If A, B, C ∈ Mn×n (R) are such that A is invertible and AB = CA, does B = C?


Solution. The answer is no. To see this, consider

    A = [ 1  1 ] ,   B = [ 1  1 ]   and   C = [ 2  0 ] .
        [ 0  1 ]         [ 1  1 ]             [ 1  0 ]

Then

    AB = [ 1  1 ] [ 1  1 ] = [ 2  2 ]
         [ 0  1 ] [ 1  1 ]   [ 1  1 ]

    CA = [ 2  0 ] [ 1  1 ] = [ 2  2 ]
         [ 1  0 ] [ 0  1 ]   [ 1  1 ]

So AB = CA but B 6= C.

The previous example shows that we do not have mixed cancellation. This is a direct result
of matrix multiplication not being commutative. From AB = CA with A invertible, we can
obtain B = A^{−1} CA, and since B 6= C, we have C 6= A^{−1} CA. Note that we cannot cancel
A and A^{−1} here.

Example 23.4

For A, B ∈ Mn×n (R) with A, B and A + B invertible, do we have that (A + B)^{−1} = A^{−1} + B^{−1} ?

Solution. The answer is no. Let A = B = I. Then A + B = 2I and

    (A + B)^{−1} = (2I)^{−1} = (1/2) I^{−1} = (1/2) I

but

    A^{−1} + B^{−1} = I^{−1} + I^{−1} = I + I = 2I.

As (1/2)I 6= 2I, (A + B)^{−1} 6= A^{−1} + B^{−1} .

The following theorem summarizes many of the results we have seen thus far in the course,
and shows the importance of matrix invertibility. This theorem is central to all of linear
algebra and actually contains many more parts, some of which we will encounter later. Note
that we have already proven several of these equivalences.

Theorem 23.2: Invertible Matrix Theorem


Let A ∈ Mn×n (R). The following are equivalent.

(1) A is invertible

(2) rank (A) = n

(3) The reduced row echelon form of A is I

(4) For all ~b ∈ Rn , the system A~x = ~b is consistent and has a unique solution

(5) AT is invertible

In particular, for A invertible, the system A~x = ~b has a unique solution. We can solve for ~x
using our matrix algebra:

A~x = ~b
A−1 A~x = A−1~b
I~x = A−1~b
~x = A−1~b

Example 23.5

Consider the system of equations A~x = ~b with

    A = [ 2  3 ]   and   ~b = [  4 ]
        [ 4  5 ]              [ −1 ]

Then A is invertible (see Example 23.1) and

    ~x = A^{−1}~b = [ −5/2  3/2 ] [  4 ] = [ −23/2 ]
                   [   2   −1  ] [ −1 ]   [   9   ]

Of course we could have solved the above system A~x = ~b by row reducing the augmented
matrix [ A | ~b ] −→ [ I | ~x ], where ~x = [ −23/2  9 ]^T . Note that to find A^{−1} we row reduced
[ A | I ] −→ [ I | A^{−1} ] and that the elementary row operations used in both cases are the same.
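As a quick numerical check of Example 23.5 (in practice one would usually call a linear solver rather than form A^{−1} explicitly):

    import numpy as np

    A = np.array([[2., 3.], [4., 5.]])
    b = np.array([4., -1.])

    x = np.linalg.inv(A) @ b       # x = A^(-1) b
    print(x)                       # [-11.5   9. ]  i.e. [-23/2, 9]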

Lecture 24

Spanning Sets
Recall that linear combinations were introduced in Definition 6.7. We encountered them
again when we discussed lines and planes, and again when we introduced the matrix-vector
product. Here, we will study them in more detail. Given a set of k vectors in Rn , we will
be interested in the set12 of all possible linear combinations of these vectors. This motivates
the following definition.

Definition 24.1: Span

Let B = {~v1 , . . . , ~vk } be a set of vectors in Rn . The span of B is the set

Span B = {c1~v1 + · · · + ck~vk | c1 , . . . , ck ∈ R}.

We say that the set Span B is spanned by B and that B is a spanning set for Span B.

It is important to note that there are two sets here: B and Span B. The set B = {~v1 , . . . , ~vk }
is simply a set containing the k vectors ~v1 , . . . , ~vk in Rn , whereas Span B is the set of all
linear combinations of these k vectors. To show that a vector ~x ∈ Rn belongs to Span B, we
must show that we can express ~x as a linear combination of ~v1 , . . . , ~vk . As an example, note
that for i = 1, . . . , k,

~vi = 0~v1 + · · · + 0~vi−1 + 1~vi + 0~vi+1 + · · · + 0~vk

from which we see that vi ∈ Span B for i = 1, . . . , k. This shows that B ⊆ Span B.

Example 24.1
Determine if
    [ 2 3 ]^T ∈ Span { [ 4 5 ]^T , [ 3 3 ]^T } .
If so, express [ 2 3 ]^T as a linear combination of [ 4 5 ]^T and [ 3 3 ]^T .
Solution. Let c1 , c2 ∈ R and consider

    [ 2 ] = c1 [ 4 ] + c2 [ 3 ] = [ 4c1 + 3c2 ] .
    [ 3 ]      [ 5 ]      [ 3 ]   [ 5c1 + 3c2 ]

¹² Please see Appendix A for a brief introduction to sets.
We obtain the system of equations

    4c1 + 3c2 = 2
    5c1 + 3c2 = 3 .

Carrying the augmented matrix of this system to reduced row echelon form gives

    [ 4  3 | 2 ]  R1↔R2  [ 5  3 | 3 ]  R1−R2  [ 1  0 | 1 ]   −→
    [ 5  3 | 3 ]   −→    [ 4  3 | 2 ]   −→    [ 4  3 | 2 ]  R2−4R1

    [ 1  0 |  1 ]    −→     [ 1  0 |   1  ]
    [ 0  3 | −2 ]  (1/3)R2  [ 0  1 | −2/3 ] .

As the system is consistent, we conclude that

    [ 2 3 ]^T ∈ Span { [ 4 5 ]^T , [ 3 3 ]^T } .

From the above reduced row echelon form, we see that c1 = 1 and c2 = −2/3. Thus

    [ 2 ] = 1 [ 4 ] − (2/3) [ 3 ] .
    [ 3 ]     [ 5 ]         [ 3 ]

Example 24.2
Determine if
    [ 1 2 3 ]^T ∈ Span { [ 1 0 1 ]^T , [ 1 1 0 ]^T } .
If so, express [ 1 2 3 ]^T as a linear combination of [ 1 0 1 ]^T and [ 1 1 0 ]^T .

Solution. Let c1 , c2 ∈ R and consider

    [ 1 ]      [ 1 ]      [ 1 ]   [ c1 + c2 ]
    [ 2 ] = c1 [ 0 ] + c2 [ 1 ] = [    c2   ] .
    [ 3 ]      [ 1 ]      [ 0 ]   [    c1   ]

We obtain the system of equations

    c1 + c2 = 1
         c2 = 2
    c1      = 3 .

Solving this system, we have

    [ 1  1 | 1 ]   −→    [ 1   1 | 1 ]   −→    [ 1  1 | 1 ]
    [ 0  1 | 2 ]         [ 0   1 | 2 ]         [ 0  1 | 2 ]
    [ 1  0 | 3 ]  R3−R1  [ 0  −1 | 2 ]  R3+R2  [ 0  0 | 4 ]

which shows the system is inconsistent. Here we see that [ 1 2 3 ]^T cannot be expressed as
a linear combination of [ 1 0 1 ]^T and [ 1 1 0 ]^T and so we conclude that

    [ 1 2 3 ]^T ∈/ Span { [ 1 0 1 ]^T , [ 1 1 0 ]^T } .
 

Exercise 24.1
Determine if
    [ 4 7 3 ]^T ∈ Span { [ 1 3 1 ]^T , [ 2 1 1 ]^T , [ 3 4 2 ]^T } .
If so, express [ 4 7 3 ]^T as a linear combination of [ 1 3 1 ]^T , [ 2 1 1 ]^T and [ 3 4 2 ]^T .

Solution. Let c1 , c2 , c3 ∈ R and consider

    [ 4 ]      [ 1 ]      [ 2 ]      [ 3 ]   [ c1 + 2c2 + 3c3 ]
    [ 7 ] = c1 [ 3 ] + c2 [ 1 ] + c3 [ 4 ] = [ 3c1 + c2 + 4c3 ] .
    [ 3 ]      [ 1 ]      [ 1 ]      [ 2 ]   [ c1 + c2 + 2c3  ]
We obtain the system of equations

c1 + 2c2 + 3c3 = 4
3c1 + c2 + 4c3 = 7 .
c1 + c2 + 2c3 = 3

Solving this system gives

    [ 1  2  3 | 4 ]   −→     [ 1   2   3 |  4 ]     −→      [ 1   2   3 |  4 ]  R1−2R2
    [ 3  1  4 | 7 ]  R2−3R1  [ 0  −5  −5 | −5 ]  (−1/5)R2   [ 0   1   1 |  1 ]    −→
    [ 1  1  2 | 3 ]  R3−R1   [ 0  −1  −1 | −1 ]             [ 0  −1  −1 | −1 ]  R3+R2

    [ 1  0  1 | 2 ]
    [ 0  1  1 | 1 ]
    [ 0  0  0 | 0 ]

As the system is consistent, we conclude that

    [ 4 7 3 ]^T ∈ Span { [ 1 3 1 ]^T , [ 2 1 1 ]^T , [ 3 4 2 ]^T } .

The solution to the system is c1 = 2 − t, c2 = 1 − t and c3 = t where t ∈ R. Taking t = 0
gives

    [ 4 ]     [ 1 ]     [ 2 ]     [ 3 ]     [ 1 ]     [ 2 ]
    [ 7 ] = 2 [ 3 ] + 1 [ 1 ] + 0 [ 4 ] = 2 [ 3 ] + 1 [ 1 ] .
    [ 3 ]     [ 1 ]     [ 1 ]     [ 2 ]     [ 1 ]     [ 1 ]

The existence of a parameter in our solution means that there are infinitely many ways to
express [ 4 7 3 ]^T as a linear combination of [ 1 3 1 ]^T , [ 2 1 1 ]^T and [ 3 4 2 ]^T . For instance, we can
take t = 100 to obtain

    [ 4 ]       [ 1 ]      [ 2 ]       [ 3 ]
    [ 7 ] = −98 [ 3 ] − 99 [ 1 ] + 100 [ 4 ] .
    [ 3 ]       [ 1 ]      [ 1 ]       [ 2 ]

From Examples 24.1, 24.2 and from Exercise 24.1, we notice that we need to solve a sys-
tem of linear equations when checking if a vector ~x is in the span of a set of vectors
{~v1 , . . . , ~vk }. Recalling the definition of span and the matrix-vector product, we have that
~x ∈ Span {~v1 , . . . , ~vk } if and only if there exist c1 , . . . , ck so that

                                                    [ c1 ]
    ~x = c1~v1 + · · · + ck~vk = [ ~v1  · · ·  ~vk ] [  ⋮ ] .
                                                    [ ck ]

This verifies the following theorem, which is simply a restatement of Theorem 19.1(2) using
our new definition of span.

Theorem 24.1
Let B = {~v1 , . . . , ~vk } ⊆ Rn , ~x ∈ Rn , ~c ∈ Rk and let A = [ ~v1  · · ·  ~vk ] ∈ Mn×k (R).
Then ~x ∈ Span B if and only if the system A~c = ~x is consistent.

By Theorem 24.1, to check if ~x ∈ Span {~v1 , . . . , ~vk }, we need only verify that the system
A~c = ~x is consistent, which amounts to carrying the system to row echelon form and ap-
plying the System Rank Theorem(1). However, if we wish to explicitly write ~x as a linear
combination of ~v1 , . . . , ~vk , then we must solve the system for ~c.
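Theorem 24.1 also gives a convenient machine check: the system A~c = ~x is consistent exactly when the ranks of A and of the augmented matrix [ A | ~x ] agree. A small Python/NumPy sketch (the helper in_span is our own, not standard library code):

    import numpy as np

    def in_span(vectors, x):
        """Check whether x lies in Span{vectors} by comparing rank(A) with rank([A | x])."""
        A = np.column_stack(vectors)
        aug = np.column_stack([A, x])
        return np.linalg.matrix_rank(A) == np.linalg.matrix_rank(aug)

    # Examples 24.1 and 24.2:
    print(in_span([[4, 5], [3, 3]], [2, 3]))            # True
    print(in_span([[1, 0, 1], [1, 1, 0]], [1, 2, 3]))   # False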

Given a set of vectors {~v1 , . . . , ~vk } ⊆ Rn , we now try to gain a geometric understanding of
Span {~v1 , . . . , ~vk }.

Example 24.3
Describe the subset
    S = Span { [ 1 2 3 ]^T }
of R3 geometrically.
Solution. By definition,
    S = { s [ 1 2 3 ]^T | s ∈ R }.
Thus, ~x ∈ S if and only if ~x = s [ 1 2 3 ]^T for some s ∈ R. The equation

    ~x = s [ 1 2 3 ]^T ,   s ∈ R

is called a vector equation for S. But we see that this is simply a vector equation for a
line in R3 through the origin. Hence, S is a line in R3 through the origin with direction
vector [ 1 2 3 ]^T .

Exercise 24.2
Describe the subset
    S = Span { [ 1 0 1 ]^T , [ 1 1 0 ]^T }
of R3 geometrically.

Solution. By definition,
    S = { s [ 1 0 1 ]^T + t [ 1 1 0 ]^T | s, t ∈ R }
so a vector equation for S is

    ~x = s [ 1 0 1 ]^T + t [ 1 1 0 ]^T ,   s, t ∈ R.

Since the vectors [ 1 0 1 ]^T and [ 1 1 0 ]^T are not scalar multiples of one another, we see that S is a
plane in R3 through the origin¹³.

Example 24.4
Let
    S = Span { [ 1 0 0 ]^T , [ 1 1 0 ]^T , [ 1 2 1 ]^T } .

Show that S = R3 .
Solution. We show S = R3 by showing that S ⊆ R3 and that R3 ⊆ S. To see that
S ⊆ R3 , note that [ 1 0 0 ]^T , [ 1 1 0 ]^T , [ 1 2 1 ]^T ∈ R3 and that S contains all linear combinations of
these three vectors. Since R3 is closed under linear combinations (see V1 and V6 from
Theorem 6.1), every vector in S must be a vector in R3 , so S ⊆ R3 . Now to show
R3 ⊆ S, let ~x = [ x1 x2 x3 ]^T ∈ R3 . We need to show that ~x can be expressed as a linear
combination of [ 1 0 0 ]^T , [ 1 1 0 ]^T and [ 1 2 1 ]^T . By Theorem 24.1, this amounts to showing that

¹³ The set S is from Example 24.2. In light of what we have observed here, Example 24.2 shows us that
the point P (1, 2, 3) does not lie on the plane S.
the system with augmented matrix

    [ 1  1  1 | x1 ]
    [ 0  1  2 | x2 ]
    [ 0  0  1 | x3 ]

is consistent. Note that this matrix is already in row echelon form. Since the coefficient
matrix has three rows and has rank 3, the System-Rank Theorem (Theorem 14.1(3))
guarantees that the system is consistent for any ~x ∈ R3 . Hence any ~x can be expressed
as a linear combination of [ 1 0 0 ]^T , [ 1 1 0 ]^T and [ 1 2 1 ]^T , so R3 ⊆ S. We conclude that S = R3 .

Note that we did not solve the system of equations derived in Example 24.4 since we were
only concerned with whether or not the system was consistent. We can of course solve the
system knowing that it will be consistent. In fact, since there are no free variables, we will
obtain a unique solution by the System-Rank Theorem (Theorem 14.1(2)):

    [ 1  1  1 | x1 ]  R1−R3   [ 1  1  0 | x1 − x3  ]  R1−R2   [ 1  0  0 | x1 − x2 + x3 ]
    [ 0  1  2 | x2 ]  R2−2R3  [ 0  1  0 | x2 − 2x3 ]    −→    [ 0  1  0 | x2 − 2x3     ] .
    [ 0  0  1 | x3 ]   −→     [ 0  0  1 | x3       ]          [ 0  0  1 | x3           ]

Thus we obtain the unique solution

    [ x1 ]                   [ 1 ]               [ 1 ]      [ 1 ]
    [ x2 ] = (x1 − x2 + x3 ) [ 0 ] + (x2 − 2x3 ) [ 1 ] + x3 [ 2 ] .
    [ x3 ]                   [ 0 ]               [ 0 ]      [ 1 ]

Although a bit more work, solving the system shows us how to explicitly construct any vector
~x = [ x1 x2 x3 ]^T as a linear combination of [ 1 0 0 ]^T , [ 1 1 0 ]^T and [ 1 2 1 ]^T .

The examples given thus far may seem to indicate (at least in R3 ) that the span of one vector
gives a line through the origin, the span of two vectors gives a plane through the origin, and
the span of three vectors gives all of R3 . However, consider the following example.

Example 24.5
Let
    S1 = Span { [ 1 0 0 ]^T , [ 0 1 0 ]^T , [ 1 1 0 ]^T }   and   S2 = Span { [ 1 0 0 ]^T , [ 0 1 0 ]^T } .

Show S1 = S2 .

Solution. We first show that S1 ⊆ S2 . Let ~x ∈ S1 . Then for some c1 , c2 , c3 ∈ R,

    ~x = c1 [ 1 0 0 ]^T + c2 [ 0 1 0 ]^T + c3 [ 1 1 0 ]^T .

However, we observe that [ 1 1 0 ]^T = [ 1 0 0 ]^T + [ 0 1 0 ]^T so

    ~x = c1 [ 1 0 0 ]^T + c2 [ 0 1 0 ]^T + c3 ( [ 1 0 0 ]^T + [ 0 1 0 ]^T )
       = (c1 + c3 ) [ 1 0 0 ]^T + (c2 + c3 ) [ 0 1 0 ]^T ∈ Span { [ 1 0 0 ]^T , [ 0 1 0 ]^T } = S2 .

Thus S1 ⊆ S2 . Now let ~y ∈ S2 . Then for some d1 , d2 ∈ R,

    ~y = d1 [ 1 0 0 ]^T + d2 [ 0 1 0 ]^T
       = d1 [ 1 0 0 ]^T + d2 [ 0 1 0 ]^T + 0 [ 1 1 0 ]^T ∈ Span { [ 1 0 0 ]^T , [ 0 1 0 ]^T , [ 1 1 0 ]^T } = S1 .

Thus S2 ⊆ S1 . Hence S1 = S2 .

If asked to describe
    S1 = Span { [ 1 0 0 ]^T , [ 0 1 0 ]^T , [ 1 1 0 ]^T }
geometrically, we see from Example 24.5 that we cannot simply say S1 is R3 because although

    ~x = c1 [ 1 0 0 ]^T + c2 [ 0 1 0 ]^T + c3 [ 1 1 0 ]^T ,   c1 , c2 , c3 ∈ R

is a vector equation for S1 , so too is

    ~x = c1 [ 1 0 0 ]^T + c2 [ 0 1 0 ]^T ,   c1 , c2 ∈ R.                            (21)

Note that since neither of [ 1 0 0 ]^T and [ 0 1 0 ]^T is a scalar multiple of the other, we see that S1
is a plane through the origin in R3 . The vector equation for S1 given in (21) is called a
simplified vector equation for S1 .

Example 24.5 showed that since one of the vectors in the given spanning set for S1 of Exam-
ple 24.5 was a linear combination of the other vectors in that spanning set, we could remove
that vector from the spanning set and the resulting smaller set would still span S1 . It was
important to do this as it allowed us to understand that S1 was geometrically a plane in R3
through the origin. The following Theorem generalizes what we have just seen.

Theorem 24.2
Let ~v1 , . . . , ~vk ∈ Rn . One of these vectors, say ~vi , can be expressed as a linear combi-
nation of ~v1 , . . . , ~vi−1 , ~vi+1 , . . . , ~vk if and only if

Span {~v1 , . . . , ~vk } = Span {~v1 , . . . , ~vi−1 , ~vi+1 , . . . , ~vk }.

We make a comment here before giving the proof. The theorem we need to prove is a double
implication as evidenced by the words if and only if.14 Thus we must prove two implications:
1. If ~vi can be expressed as a linear combination of ~v1 , . . . , ~vi−1 , ~vi+1 , . . . , ~vk , then
Span {~v1 , . . . , ~vk } = Span {~v1 , . . . , ~vi−1 , ~vi+1 , . . . , ~vk }
2. If Span {~v1 , . . . , ~vk } = Span {~v1 , . . . , ~vi−1 , ~vi+1 , . . . , ~vk }, then ~vi can be expressed as a
linear combination of ~v1 , . . . , ~vi−1 , ~vi+1 , . . . , ~vk .
The result of this theorem is that the two statements
“ ~vi can be expressed as a linear combination of ~v1 , . . . , ~vi−1 , ~vi+1 , . . . , ~vk ”
and
“ Span {~v1 , . . . , ~vk } = Span {~v1 , . . . , ~vi−1 , ~vi+1 , . . . , ~vk }”
are equivalent, that is, they are both true or they are both false. The proof that follows
is often not completely understood after just the first reading - it takes a bit of time to
understand, so don’t be discouraged if you need to read it a few times before it fully makes
sense.
¹⁴ We sometimes write ⇐⇒ to mean “if and only if”. To prove a statement of the form A ⇐⇒ B, we must
prove the two implications A =⇒ B and B =⇒ A, and so we call A ⇐⇒ B a double implication.

Proof. Without loss of generality¹⁵, we assume i = k. To simplify the writing of the proof,
we let

S = Span {~v1 , . . . , ~vk−1 , ~vk }


T = Span {~v1 , . . . , ~vk−1 }.

To prove the first implication, assume that ~vk can be expressed as a linear combination of
~v1 , . . . , ~vk−1 . Then there exist c1 , . . . , ck−1 ∈ R such that

~vk = c1~v1 + · · · + ck−1~vk−1 . (22)

We must show that S = T . Let ~x ∈ S. Then there exist d1 , . . . , dk−1 , dk ∈ R such that

~x = d1~v1 + · · · + dk−1~vk−1 + dk~vk

and we make the substitution for ~vk using Equation (22) to obtain

    ~x = d1~v1 + · · · + dk−1~vk−1 + dk (c1~v1 + · · · + ck−1~vk−1 )
       = (d1 + dk c1 )~v1 + · · · + (dk−1 + dk ck−1 )~vk−1

from which we see that ~x can be expressed as a linear combination of ~v1 , . . . , ~vk−1 and it
follows that ~x ∈ T . Hence S ⊆ T . Now let ~y ∈ T . Then there exist a1 , . . . , ak−1 ∈ R such
that

~y = a1~v1 + · · · + ak−1~vk−1
= a1~v1 + · · · + ak−1~vk−1 + 0~vk

and we have that ~y can be expressed as a linear combination of ~v1 , . . . , ~vk from which it
follows that ~y ∈ S. We have that T ⊆ S and combined with S ⊆ T we conclude that S = T .

To prove the second implication, we now assume that S = T and we must show that
~vk can be expressed as a linear combination of ~v1 , . . . , ~vk−1 . Since vk ∈ S (recall that
~vk = 0~v1 + · · · + 0~vk−1 + 1~vk ) and S = T , we have ~vk ∈ T . Thus, there exist b1 , . . . , bk−1 ∈ R
such that ~vk = b1~v1 + · · · + bk−1~vk−1 as required.

¹⁵ What we mean here is that if i 6= k, then we may “rename” the vectors ~v1 , . . . , ~vk so that ~vk is the vector
that can be expressed as a linear combination of the first k − 1 vectors. Thus we just assume i = k. Note
that for i = k, Span {~v1 , . . . , ~vi−1 , ~vi+1 , . . . , ~vk } is written as Span {~v1 , . . . , ~vk−1 }.

Example 24.6
Consider
    S = Span { [ 1 0 ]^T , [ 5 0 ]^T , [ 2 4 ]^T , [ 0 1 ]^T } .

Since [ 5 0 ]^T = 5 [ 1 0 ]^T = 5 [ 1 0 ]^T + 0 [ 2 4 ]^T + 0 [ 0 1 ]^T , Theorem 24.2 gives

    S = Span { [ 1 0 ]^T , [ 2 4 ]^T , [ 0 1 ]^T }

and since [ 2 4 ]^T = 2 [ 1 0 ]^T + 4 [ 0 1 ]^T , it again follows from Theorem 24.2 that

    S = Span { [ 1 0 ]^T , [ 0 1 ]^T } .

Finally, since [ 1 0 ]^T and [ 0 1 ]^T are not scalar multiples of one another, we cannot remove
either of them from the spanning set without changing the span. A vector equation
for S is
    ~x = c1 [ 1 0 ]^T + c2 [ 0 1 ]^T ,   c1 , c2 ∈ R.
Combining the vectors on the right gives
    ~x = [ c1  c2 ]^T
and it is clear that S = R2 .

Regarding the last example, the vectors that were chosen to be removed from the spanning
set depended on us noticing that some were linear combinations of others. Of course, we
could have noticed that [ 1 0 ]^T = (1/2) [ 2 4 ]^T − 2 [ 0 1 ]^T and concluded that

    S = Span { [ 5 0 ]^T , [ 2 4 ]^T , [ 0 1 ]^T }

and then continued from there. Indeed, any of

    S = Span { [ 1 0 ]^T , [ 2 4 ]^T } = Span { [ 5 0 ]^T , [ 2 4 ]^T } = Span { [ 5 0 ]^T , [ 0 1 ]^T } = Span { [ 2 4 ]^T , [ 0 1 ]^T }

are also correct descriptions of S where the spanning sets cannot be further reduced.
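The pruning we did by inspection can also be mechanized: keep a vector only if it enlarges the span of the vectors kept so far (by Theorem 24.2, discarding the others does not change the span). A Python/NumPy sketch (the function name reduced_spanning_set is our own):

    import numpy as np

    def reduced_spanning_set(vectors):
        """Greedily keep only those vectors that enlarge the span of the ones kept so far."""
        kept = []
        for v in vectors:
            candidate = kept + [v]
            # The kept vectors are independent, so their span grows exactly
            # when the rank of the candidate list exceeds len(kept).
            if np.linalg.matrix_rank(np.column_stack(candidate)) > len(kept):
                kept = candidate
        return kept

    S = [[1, 0], [5, 0], [2, 4], [0, 1]]
    print(reduced_spanning_set(S))   # [[1, 0], [2, 4]] -- same span (here all of R^2)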

Lecture 25

Linear Dependence and Linear Independence


In the previous lecture, we saw that given a spanning set for a set S, if one of the vectors in
the spanning set was a linear combination of the others, then we could remove this vector
and the set of remaining vectors would still span S. Thus far, our method has been to
simply observe whether or not one vector in a given spanning set is a linear combination of
the others. However, suppose we are given
       

 1 −2 1 −6  

 2   1   6   −6  
S = Span  , , ,  .
       

  −3   4   8   3  

 

7 8 2 7
It’s likely not obvious that
       
−6 1 −2 1
 −6   2   1   6 
= − + 2 −
       
     
 3   −3   4   8 
7 7 8 2
and that we can thus remove the last vector from the spanning set for S. Now imagine being
given 500 vectors in R1000 and trying to decide if any one of them is a linear combination of
the other 499 vectors. Inspection clearly won’t help here, so we need a better way to spot
these “dependencies” among a set of vectors should they exist. We make a definition here,
and will see soon how it can help us spot such dependencies.

Definition 25.1: Linear Dependence and Independence

Let B = {~v1 , . . . , ~vk } be a set of vectors in Rn . We say that B is linearly dependent if


there exist c1 , . . . , ck ∈ R, not all zero so that

c1~v1 + · · · + ck~vk = ~0.

We say that B is linearly independent if the only solution to

c1~v1 + · · · + ck~vk = ~0

is c1 = · · · = ck = 0, which we call the trivial solution.

It is important to understand that by “c1 , . . . , ck are not all zero”, we mean that at least one
of c1 , . . . , ck is nonzero.

Example 25.1
Determine whether the set

    A = { [ 2 3 ]^T , [ −1 2 ]^T }

is linearly dependent or linearly independent.

Solution. Let c1 , c2 ∈ R and consider

    c1 [ 2 3 ]^T + c2 [ −1 2 ]^T = [ 0 0 ]^T .

Equating entries gives the homogeneous system of linear equations

    2c1 − c2 = 0
    3c1 + 2c2 = 0

Reducing the coefficient matrix to row echelon form, we have

    [ 2  −1 ]  R1↔R2  [ 3   2 ]  R1−R2  [ 1   3 ]   −→     [ 1   3 ]
    [ 3   2 ]   −→    [ 2  −1 ]   −→    [ 2  −1 ]  R2−2R1  [ 0  −7 ] .

We see that there are no free variables, so we get a unique solution. Since the system
is homogeneous, the unique solution must be c1 = c2 = 0, and hence A is linearly
independent.

Example 25.2
Determine whether the set

    B = { [ 1 0 −1 ]^T , [ 2 1 0 ]^T , [ 1 1 1 ]^T }

is linearly dependent or linearly independent.

Solution. Let c1 , c2 , c3 ∈ R and consider

    c1 [ 1 0 −1 ]^T + c2 [ 2 1 0 ]^T + c3 [ 1 1 1 ]^T = [ 0 0 0 ]^T .                 (23)

We obtain

    c1 + 2c2 + c3 = 0
          c2 + c3 = 0
    −c1      + c3 = 0

Carrying the coefficient matrix to row echelon form gives

    [  1  2  1 ]   −→    [ 1  2  1 ]   −→     [ 1  2  1 ]
    [  0  1  1 ]         [ 0  1  1 ]          [ 0  1  1 ]
    [ −1  0  1 ]  R3+R1  [ 0  2  2 ]  R3−2R2  [ 0  0  0 ]

from which we see that the third variable is free. We will thus obtain nontrivial solu-
tions, that is, solutions where c1 , c2 , c3 are not all zero. Hence B is linearly dependent.

Note that we did not solve the system of equations derived in Example 25.2 since we were
only concerned with the number of solutions. Given that the set B is linearly dependent,
it is useful to solve the system as the solution will allow us to decide which vectors in B
can be expressed as linear combinations of the other vectors in B. Continuing with our
computations above, we have

    [  1  2  1 ]        [ 1  2  1 ]  R1−2R2  [ 1  0  −1 ]
    [  0  1  1 ]   −→   [ 0  1  1 ]    −→    [ 0  1   1 ]
    [ −1  0  1 ]        [ 0  0  0 ]          [ 0  0   0 ]

from which we see that c3 = t, c2 = −t and c1 = t for any t ∈ R. Substituting these solutions
into Equation (23) gives

    t [ 1 0 −1 ]^T − t [ 2 1 0 ]^T + t [ 1 1 1 ]^T = [ 0 0 0 ]^T .

Choosing any t 6= 0, say t = 1, gives

    [ 1 0 −1 ]^T − [ 2 1 0 ]^T + [ 1 1 1 ]^T = [ 0 0 0 ]^T .                          (24)

We may rearrange this as

    [ 1 0 −1 ]^T = [ 2 1 0 ]^T − [ 1 1 1 ]^T

which allows us to use Theorem 24.2 to conclude that

    Span B = Span { [ 1 0 −1 ]^T , [ 2 1 0 ]^T , [ 1 1 1 ]^T } = Span { [ 2 1 0 ]^T , [ 1 1 1 ]^T } .

In this case we could solve for any vector on the left hand side of Equation (24) in terms of
the other two to alternatively arrive at

    [ 2 1 0 ]^T = [ 1 0 −1 ]^T + [ 1 1 1 ]^T   =⇒   Span { [ 1 0 −1 ]^T , [ 2 1 0 ]^T , [ 1 1 1 ]^T } = Span { [ 1 0 −1 ]^T , [ 1 1 1 ]^T }

or

    [ 1 1 1 ]^T = [ 2 1 0 ]^T − [ 1 0 −1 ]^T   =⇒   Span { [ 1 0 −1 ]^T , [ 2 1 0 ]^T , [ 1 1 1 ]^T } = Span { [ 1 0 −1 ]^T , [ 2 1 0 ]^T } .

Exercise 25.1
Show that the set

    C = { [ 1 0 −1 ]^T , [ 1 1 1 ]^T }

is linearly independent.

Solution. For c1 , c2 ∈ R, consider

    c1 [ 1 0 −1 ]^T + c2 [ 1 1 1 ]^T = [ 0 0 0 ]^T .

We obtain the system of equations

    c1 + c2 = 0
         c2 = 0
    −c1 + c2 = 0 .

Carrying the coefficient matrix to row echelon form gives

    [  1  1 ]   −→    [ 1  1 ]   −→     [ 1  1 ]
    [  0  1 ]         [ 0  1 ]          [ 0  1 ]
    [ −1  1 ]  R3+R1  [ 0  2 ]  R3−2R2  [ 0  0 ]
from which we see that there are no free variables and hence a unique (trivial) solution.
Thus C is linearly independent.
From Examples 25.1 and 25.2, we see the appearance of homogeneous systems. When check-
ing for linear independence of a set {~v1 , . . . , ~vk } ⊆ Rn , we consider the vector equation

    ~0 = c1~v1 + · · · + ck~vk = [ ~v1  · · ·  ~vk ] [ c1 · · · ck ]^T

which we see leads to a matrix-vector equation of a homogeneous system of equations. We
are thus interested in whether or not we have a unique solution (no free variables) or if we
have infinitely many solutions (at least one free variable). The following theorem follows
from the System-Rank Theorem (Theorem 14.1(2)).

Theorem 25.1
Let B = {~v1 , . . . , ~vk } be a set of k vectors in Rn and let A = [ ~v1  · · ·  ~vk ]. Then
B is linearly independent if and only if rank (A) = k.
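Theorem 25.1 is also easy to apply by machine, since the rank can be computed numerically. A short Python/NumPy check of Examples 25.1 and 25.2, and of Exercise 25.2 below (the helper function is our own):

    import numpy as np

    def is_linearly_independent(vectors):
        """Theorem 25.1: the vectors are independent iff the rank equals their number."""
        A = np.column_stack(vectors)
        return np.linalg.matrix_rank(A) == len(vectors)

    print(is_linearly_independent([[2, 3], [-1, 2]]))                   # True  (Example 25.1)
    print(is_linearly_independent([[1, 0, -1], [2, 1, 0], [1, 1, 1]]))  # False (Example 25.2)
    print(is_linearly_independent([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))   # False (Exercise 25.2)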

Exercise 25.2
Determine whether the set

    B = { [ 1 2 3 ]^T , [ 4 5 6 ]^T , [ 7 8 9 ]^T }

is linearly dependent or linearly independent.

Solution. By Theorem 25.1, we will find the rank of the matrix A whose columns are the
vectors in B. Since there are three vectors in B, B will be linearly independent if and only
if rank (A) = 3. Otherwise, B will be linearly dependent. We have

        [ 1  4  7 ]   −→     [ 1   4    7 ]   −→     [ 1   4   7 ]
    A = [ 2  5  8 ]  R2−2R1  [ 0  −3   −6 ]          [ 0  −3  −6 ] .
        [ 3  6  9 ]  R3−3R1  [ 0  −6  −12 ]  R3−2R2  [ 0   0   0 ]


We thus see that rank (A) = 2 < 3, so B is linearly dependent.

Following Example 25.2, we saw that for a linearly dependent set B, we could express at
least one of the vectors in B as a linear combination of the other vectors in B. The following
theorem shows that this is always the case.

Theorem 25.2
A set of vectors {~v1 , . . . , ~vk } in Rn is linearly dependent if and only if

~vi ∈ Span {~v1 , . . . , ~vi−1 , ~vi+1 , . . . , ~vk }

for some i = 1, . . . , k.

Proof. Assume first that the set {~v1 , . . . , ~vk } in Rn is linearly dependent. Then there exist
c1 , . . . , ck ∈ R, not all zero, such that

c1~v1 + · · · + ci−1~vi−1 + ci~vi + ci+1~vi+1 + · · · + ck~vk = ~0.

Without loss of generality, assume that ci 6= 0. Then we may isolate for ~vi on one side of the
equation:
c1 ci−1 ci+1 ck
~vi = − ~v1 − · · · − ~vi−1 − ~vi+1 − · · · − ~vk
ci ci ci ci
which shows that ~vi ∈ Span {~v1 , . . . , ~vi−1 , ~vi+1 , . . . , ~vk }. To prove the other implication, we
assume that ~vi ∈ Span {~v1 , . . . , ~vi−1 , ~vi+1 , . . . , ~vk } for some i = 1, . . . , k. Then there exist
d1 , . . . , di−1 , di+1 , . . . , dk ∈ R such that

~vi = d1~v1 + · · · + di−1~vi−1 + di+1~vi+1 + · · · + dk~vk

and rearranging gives

d1~v1 + · · · + di−1~vi−1 − 1~vi + di+1~vi+1 + · · · + dk~vk = ~0

which shows that {~v1 , . . . , ~vk } is linearly dependent.


We conclude with a few more examples involving linear dependence and linear independence.
The first shows that if a set of k vectors contains the zero vector, then the set is linearly
dependent.

Example 25.3

Consider the set {~v1 , . . . , ~vk−1 , ~0} of vectors in Rn . Then

0~v1 + · · · + 0~vk−1 + (1)~0 = ~0

which shows that {~v1 , . . . , ~vk−1 , ~0} is linearly dependent.

Example 25.4

Let ~v1 , ~v2 , ~v3 ∈ Rn be such that {~v1 , ~v2 , ~v3 } is linearly independent. Prove that the set

{~v1 , ~v1 + ~v2 , ~v1 + ~v2 + ~v3 }

is linearly independent.
Proof. We must prove that the set {~v1 , ~v1 + ~v2 , ~v1 + ~v2 + ~v3 } is linearly independent.
To do so, we consider the vector equation

c1~v1 + c2 (~v1 + ~v2 ) + c3 (~v1 + ~v2 + ~v3 ) = ~0, c1 , c2 , c3 ∈ R.

Rearranging this equation gives

(c1 + c2 + c3 )~v1 + (c2 + c3 )~v2 + c3~v3 = ~0.

Since {~v1 , ~v2 , ~v3 } is linearly independent, we must have that

c1 + c2 + c3 = 0
c2 + c3 = 0
c3 = 0

We see that c3 = 0 and it follows that c2 = 0 and then that c1 = 0. Hence we have
only the trivial solution, so our set {~v1 , ~v1 + ~v2 , ~v1 + ~v2 + ~v3 } is linearly independent.

Example 25.5

Let {~v1 , . . . , ~vk } be a linearly independent set of vectors in Rn . Prove that


{~v1 , . . . , ~vk−1 } is linearly independent.
Proof. It is given that {~v1 , . . . , ~vk } is linearly independent. Suppose for a contradiction
that {~v1 , . . . , ~vk−1 } is linearly dependent. Then there exist c1 , . . . , ck−1 , not all zero,
such that
c1~v1 + · · · + ck−1~vk−1 = ~0.
But then adding 0~vk to both sides gives

c1~v1 + · · · + ck−1~vk−1 + 0~vk = ~0

which shows that {~v1 , . . . , ~vk } is linearly dependent, since not all of c1 , . . . , ck−1 are
zero. But this is a contradiction since we were given that {~v1 , . . . , ~vk } is linearly
independent. Hence, our supposition that {~v1 , . . . , ~vk−1 } is linearly dependent was
incorrect. This leaves only that {~v1 , . . . , ~vk−1 } is linearly independent, as required.

In the solution of Example 25.5, we used a proof technique known as Proof by Contradiction.
When using proof by contradiction, you are essentially proving a statement is true by proving
that it cannot be false. We are told that the set A = {~v1 , . . . , ~vk } is linearly independent
and asked to show that, under this assumption, the set B = {~v1 , . . . , ~vk−1 } is also linearly
independent. The set B must be either linearly independent or linearly dependent, but not
both. So instead of proving that B was linearly independent directly, we suppose that B
is linearly dependent. From that supposition, we argue until we arrive at A being linearly
dependent, which is impossible since we are given that A is linearly independent as part of
our hypothesis. A being linearly dependent is thus a “contradiction” and since it followed
from our supposition that B is linearly dependent, the supposition that B is linearly depen-
dent is incorrect. Since B is not linearly dependent, it must be linearly independent (which
is what we were asked to prove).

It follows from the last example that every nonempty subset of a linearly independent set
is also linearly independent. Of course, we should consider the empty set, ∅, since it is a
subset of every set. As the empty set contains no vectors, we cannot exhibit vectors from the
empty set that form a linearly dependent set. Thus, the empty set is (vacuously) linearly
independent. Thus, we can now say that given any linearly independent set B, every subset
of B is linearly independent as well.

Lecture 26

Subspaces of Rn
We have seen that linear combinations have played an important role thus far. We now look
at subsets of Rn that are closed under linear combinations, that is, subsets of Rn that are
closed under vector addition and scalar multiplication.

Definition 26.1: Subspace of Rn


A subset S of Rn is called a subspace of Rn if for every ~w, ~x, ~y ∈ S and c, d ∈ R we
have

S1 ~x + ~y ∈ S S is closed under addition

S2 ~x + ~y = ~y + ~x addition is commutative

S3 (~x + ~y ) + ~w = ~x + (~y + ~w)    addition is associative

S4 There exists a vector ~0 ∈ S such that ~v + ~0 = ~v for every ~v ∈ S zero vector

S5 For each ~x ∈ S there exists a (−~x) ∈ S such that ~x + (−~x) = ~0 additive inverse

S6 c~x ∈ S S is closed under scalar multiplication

S7 c(d~x) = (cd)~x scalar multiplication is associative

S8 (c + d)~x = c~x + d~x distributive law

S9 c(~x + ~y ) = c~x + c~y distributive law

S10 1~x = ~x scalar multiplicative identity

Given a subset S of Rn , it would appear that in order to show that S is a subspace of Rn , we


would need to verify all ten properties given in the above definition. However, we can use
the fact that since S is a subset of Rn , every vector in S is a vector in Rn and so properties
S2, S3, S7, S8, S9 and S10 are simply properties V2, V3, V7, V8, V9 and V10 of Theorem
6.1 and so must hold.

Thus to show that a subset S of Rn is a subspace of Rn , we need only verify properties
S1, S4, S5 and S6 as these depend on S (and not on Rn as in Theorem 6.1). However, our
work is reduced even further since once S1, S4 and S6 hold, we may conclude that for any ~x ∈ S,
(−~x) = (−1)~x ∈ S, that is, S5 holds. This leads to the following theorem.

Theorem 26.1: Subspace Test
A subset S is a subspace of Rn if

(1) ~0Rn ∈ S, S contains the zero vector of Rn

(2) if ~x, ~y ∈ S, then ~x + ~y ∈ S, S is closed under vector addition

(3) if ~x ∈ S and c ∈ R, then c~x ∈ S. S is closed under scalar multiplication

We normally write ~0 instead of ~0Rn as it is clear we are talking about the zero vector of
Rn . From this definition, we see that {~0} is a subspace of Rn , called the trivial subspace,
and Rn is a subspace of Rn . It follows from Theorem 26.1 that if S is a subspace of Rn and
~v1 , . . . , ~vk ∈ S, then c1~v1 + · · · + ck~vk ∈ S for any c1 , . . . , ck ∈ R.

Example 26.1
The set
    S = { [ 1 1 ]^T , [ 1 2 ]^T }
is not a subspace of R2 since ~0 ∈/ S.

Example 26.1 demonstrates that it’s easy to show a subset of Rn is not a subspace of Rn if
~0 ∈/ S. We also note that since [ 1 1 ]^T + [ 1 2 ]^T = [ 2 3 ]^T ∈/ S, S is not closed under vector addition,
and since 2 [ 1 1 ]^T = [ 2 2 ]^T ∈/ S, S is not closed under scalar multiplication.

Example 26.2
Show that   

 x 1 

3
S =  x2  ∈ R x1 − x2 + 2x3 = 0
 
 
x3
 

is a subspace of R3 .
Solution. Since 0 − 0 + 2(0) = 0, ~0 ∈ S. Now assume ~y = [y1, y2, y3]^T and ~z = [z1, z2, z3]^T are
vectors in S. Then y1 − y2 + 2y3 = 0 and z1 − z2 + 2z3 = 0. We must show that
~y + ~z = [y1 + z1, y2 + z2, y3 + z3]^T ∈ S by showing that (y1 + z1) − (y2 + z2) + 2(y3 + z3) = 0. We have

    (y1 + z1) − (y2 + z2) + 2(y3 + z3) = (y1 − y2 + 2y3) + (z1 − z2 + 2z3) = 0 + 0 = 0

so ~y + ~z ∈ S and S is closed under vector addition. Finally, for c ∈ R, we must show
that c~y = [cy1, cy2, cy3]^T ∈ S by showing that (cy1) − (cy2) + 2(cy3) = 0. We have

    (cy1) − (cy2) + 2(cy3) = c(y1 − y2 + 2y3) = c(0) = 0

so c~y ∈ S and S is closed under scalar multiplication. Thus, S is a subspace of R3 by
the Subspace Test.
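
As an aside, the closure conditions above can also be checked numerically for particular
vectors. The following sketch (an illustration only, not part of the notes; it assumes Python
with NumPy is available, and the vectors y and z below are arbitrary choices satisfying the
defining equation) tests S1 and S6 for this subspace. Of course, checking finitely many
vectors does not prove closure; the algebraic argument above is still needed.

    import numpy as np

    # A vector x lies in S exactly when it satisfies x1 - x2 + 2*x3 = 0, i.e. a @ x == 0.
    a = np.array([1.0, -1.0, 2.0])

    # Two vectors chosen to satisfy the defining equation, so both lie in S.
    y = np.array([1.0, 1.0, 0.0])     # 1 - 1 + 2*0 = 0
    z = np.array([-2.0, 0.0, 1.0])    # -2 - 0 + 2*1 = 0

    # Closure under addition and scalar multiplication, checked numerically.
    print(np.isclose(a @ (y + z), 0.0))      # True: y + z lies in S
    print(np.isclose(a @ (5.0 * y), 0.0))    # True: 5y lies in S
    print(np.isclose(a @ np.zeros(3), 0.0))  # True: the zero vector lies in S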

Example 26.3
Show that
    S = { c [1, 3]^T | c ∈ R }
is a subspace of R2 .
Solution. Taking c = 0 gives [0, 0]^T ∈ S. Now let ~x, ~y ∈ S. Then there exist c1 , c2 ∈ R
such that
    ~x = c1 [1, 3]^T   and   ~y = c2 [1, 3]^T.
Then we have that
    ~x + ~y = c1 [1, 3]^T + c2 [1, 3]^T = (c1 + c2) [1, 3]^T
so ~x + ~y ∈ S, and for any c ∈ R,
    c~x = c(c1 [1, 3]^T) = (cc1) [1, 3]^T
so c~x ∈ S. Hence S is a subspace of R2 by the Subspace Test.

Exercise 26.1
Show that
    S = { [x1, x2, x3]^T ∈ R3 | x1 + x2 = 0 and x2 − x3 = 0 }
is a subspace of R3 .
Solution. Since 0 + 0 = 0 and 0 − 0 = 0, ~0 ∈ S. Now let ~x = [x1, x2, x3]^T and ~y = [y1, y2, y3]^T be two
vectors in S. Then x1 + x2 = 0 and x2 − x3 = 0 and also y1 + y2 = 0 and y2 − y3 = 0. For
~x + ~y = [x1 + y1, x2 + y2, x3 + y3]^T, we have

(x1 + y1 ) + (x2 + y2 ) = (x1 + x2 ) + (y1 + y2 ) = 0 + 0 = 0

and

(x2 + y2 ) − (x3 + y3 ) = (x2 − x3 ) + (y2 − y3 ) = 0 + 0 = 0


so ~x + ~y ∈ S. For c~x = [cx1, cx2, cx3]^T with c ∈ R, we have

cx1 + cx2 = c(x1 + x2 ) = c(0) = 0

and

cx2 − cx3 = c(x2 − x3 ) = c(0) = 0

so c~x ∈ S. Hence S is a subspace of R3 by the Subspace Test.


The next theorem shows that given a set {~v1 , . . . , ~vk } of vectors in Rn , the span of that set
will always be a subspace of Rn .

Theorem 26.2
Let ~v1 , . . . , ~vk ∈ Rn . Then S = Span {~v1 , . . . , ~vk } is a subspace of Rn .

Proof. Clearly we have ~0 = 0~v1 + · · · + 0~vk ∈ S. Now let ~x, ~y ∈ S. Then there exist
c1 , . . . , ck , d1 , . . . , dk ∈ R such that

~x = c1~v1 + · · · + ck~vk and ~y = d1~v1 + · · · + dk~vk .

Then

~x + ~y = c1~v1 + · · · + ck~vk + d1~v1 + · · · + dk~vk = (c1 + d1 )~v1 + · · · + (ck + dk )~vk

and so ~x + ~y ∈ S as it is a linear combination of ~v1 , . . . , ~vk . For any c ∈ R,

c~x = c(c1~v1 + · · · + ck~vk ) = (cc1 )~v1 + · · · + (cck )~vk

from which we see that c~x ∈ S as it is also a linear combination of ~v1 , . . . , ~vk . Thus, S is a
subspace of Rn .
Theorem 26.2 shows that we can always generate a subspace by taking the span of a finite
set of vectors. In fact, every subspace S of Rn can be expressed as S = Span {~v1 , . . . , ~vk }
for some ~v1 , . . . , ~vk ∈ S where k is a positive integer, however we omit the proof of this last
statement. Thus it is exactly the subspaces of Rn that have spanning sets.

Lecture 27

Bases of Subspaces16
Definition 27.1: Basis
Let S be a subspace of Rn and let B = {~v1 , . . . , ~vk } ⊆ S. We say that B is a basis for
S if

(1) B is linearly independent,

(2) S = Span B.

If S = {~0}, then we define B = ∅ to be the basis for S.

In short, a basis for a subspace S of Rn is a linearly independent spanning set for S. We will
focus first on the case S = Rn .

Example 27.1
Show that
    B = { [1, 2]^T , [2, 5]^T }
is a basis for R2 .
Solution. We first show that B is linearly independent. Consider the matrix

    A = [ 1 2 ]
        [ 2 5 ].

Carrying A to row echelon form, we have

    [ 1 2 ]    −→      [ 1 2 ]
    [ 2 5 ]  R2 − 2R1  [ 0 1 ]

from which we see that rank (A) = 2. By Theorem 25.1, B is linearly independent.
Moreover, since A has 2 rows and rank (A) = 2, the system A~c = ~x is consistent for
every ~x ∈ R2 by the System-Rank Theorem(3), so ~x ∈ Span B for every ~x ∈ R2 by
Theorem 24.1. This shows that R2 ⊆ Span B. Since [1, 2]^T , [2, 5]^T ∈ R2 , and R2 is closed
under linear combinations, we have that Span B ⊆ R2 . Hence Span B = R2 and so B
is a basis for R2 .

16
“Bases” is the plural of “basis”.

Note that given a set B = {~v1 , . . . , ~vk } ⊆ Rn , Span B ⊆ Rn since Rn is closed under linear
combinations. Thus, if Rn ⊆ Span B, we may immediately conclude that Span B = Rn .

Example 27.2
Show that
    B = { [1, 0, 0]^T , [0, 1, 0]^T , [0, 0, 1]^T }
is a basis for R3 .
Solution. Consider the matrix

    A = [ 1 0 0 ]
        [ 0 1 0 ]
        [ 0 0 1 ].

It is clear that rank (A) = 3 so B is linearly independent by Theorem 25.1. Since


A has 3 rows and rank (A) = 3, the system A~c = ~x is consistent for any ~x ∈ R3 by
the System-Rank Theorem(3), so ~x ∈ Span B for every ~x ∈ R3 by Theorem 24.1 and
R3 ⊆ Span B. Thus Span B = R3 and B is a basis for R3 .

The basis in Example 27.2 is known as the standard basis for R3 . We similarly have a standard
basis for Rn .

Definition 27.2: Standard Basis for Rn


Let ~e1 , . . . , ~en ∈ Rn be the columns of the n × n identity matrix I. The set {~e1 , . . . , ~en }
is a basis for Rn , called the standard basis for Rn .

In R2 the standard basis is

    {~e1 , ~e2 } = { [1, 0]^T , [0, 1]^T }

and in R3 the standard basis is

    {~e1 , ~e2 , ~e3 } = { [1, 0, 0]^T , [0, 1, 0]^T , [0, 0, 1]^T }.

It should now be clear how to write out the standard basis for R4 , R5 and so on. Note
that we have seen the standard basis for R3 before in Example 6.5. As with Example 6.5, it

is easy to write any vector in Rn as a linear combination of the standard basis vectors for Rn .

Both bases we have seen for R2 contained two vectors. This is true in general: every basis
for Rn will contain n vectors.

Theorem 27.1
Let ~v1 , . . . , ~vk ∈ Rn . If B = {~v1 , . . . , ~vk } is a basis for Rn , then k = n.

Proof. Let A = [ ~v1 · · · ~vk ] ∈ Mn×k (R). Since B is a basis for Rn , B is linearly independent
and Span B = Rn . Since B is linearly independent, rank (A) = k by Theorem 25.1, so we
have k ≤ n. Since Span B = Rn , we have that for any ~x ∈ Rn , the system A~c = ~x is
consistent by Theorem 24.1. It follows from the System-Rank Theorem(3) (Theorem 14.1)
that rank (A) = n, so n ≤ k. Since k ≤ n and n ≤ k, we have k = n.
Thus if a set B = {~v1 , . . . , ~vk } ⊆ Rn with k 6= n, then B cannot be a basis for Rn . However,
k = n does not guarantee that B is a basis for Rn . For example, the set

    B = { [1, 1]^T , [2, 2]^T } ⊆ R2

contains 2 vectors, but is not a basis for R2 since it is linearly dependent.

Theorem 27.2
Let B = {~v1 , . . . , ~vn } ⊆ Rn . Then B is a basis for Rn if and only if [ ~v1 · · · ~vn ] has
rank n.

Proof. Let B = {~v1 , . . . , ~vn } ⊆ Rn and let A = [ ~v1 · · · ~vn ] ∈ Mn×n (R). If B is a basis for
Rn , then B is linearly independent, so rank (A) = n by Theorem 25.1. On the other hand,
if rank (A) = n, then it follows from Theorem 25.1 that B is linearly independent. Also we
see that the system A~c = ~x is consistent for every ~x ∈ Rn by the System-Rank Theorem(3),
and it then follows from Theorem 24.1 that Rn ⊆ Span B. Thus Span B = Rn and B is a
basis for Rn .

Example 27.3

Which of the following are bases for R3 ?

(a) B1 = { [0, 1, 1]^T , [1, 0, 1]^T , [1, 1, 0]^T }.

(b) B2 = { [1, 1, 3]^T , [2, 1, 2]^T , [5, 3, 7]^T }.

Solution. We use Theorem 27.2.


(a) Since

    A1 = [ 0 1 1 ]  R1↔R2  [ 1 0 1 ]   −→    [ 1 0  1 ]   −→    [ 1 0  1 ]
         [ 1 0 1 ]   −→    [ 0 1 1 ]  R3−R1  [ 0 1  1 ]  R3−R2  [ 0 1  1 ]
         [ 1 1 0 ]         [ 1 1 0 ]         [ 0 1 −1 ]         [ 0 0 −2 ],

we see that rank (A1 ) = 3, so B1 is a basis for R3 .

(b) Since

    A2 = [ 1 2 5 ]  R2−R1   [ 1  2  5 ]          [ 1  2  5 ]
         [ 1 1 3 ]  R3−3R1  [ 0 −1 −2 ]  R3−4R2  [ 0 −1 −2 ]
         [ 3 2 7 ]   −→     [ 0 −4 −8 ]   −→     [ 0  0  0 ],

we see that rank (A2 ) = 2 < 3, so B2 is not a basis for R3 .
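
These rank computations are easy to confirm with software. The following sketch (an
illustration only, assuming Python with SymPy is available) computes the ranks of A1 and
A2 exactly, which by Theorem 27.2 settles whether B1 and B2 are bases for R3.

    from sympy import Matrix

    # Columns are the candidate basis vectors from Example 27.3.
    A1 = Matrix([[0, 1, 1],
                 [1, 0, 1],
                 [1, 1, 0]])
    A2 = Matrix([[1, 2, 5],
                 [1, 1, 3],
                 [3, 2, 7]])

    # A set of n vectors in R^n is a basis exactly when the rank is n (Theorem 27.2).
    print(A1.rank())  # 3, so B1 is a basis for R^3
    print(A2.rank())  # 2, so B2 is not a basis for R^3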

We now turn our attention to finding bases for subspaces S of Rn other than Rn itself.

Example 27.4
Find a basis for the subspace
    S = { [x1, x2, x3]^T ∈ R3 | x1 − x2 + 2x3 = 0 }
of R3 .
Solution. Let ~x = [x1, x2, x3]^T ∈ S. Then x1 − x2 + 2x3 = 0, so x1 = x2 − 2x3 . We have

    ~x = [x1, x2, x3]^T = [x2 − 2x3, x2, x3]^T = x2 [1, 1, 0]^T + x3 [−2, 0, 1]^T.

Letting
    B = { [1, 1, 0]^T , [−2, 0, 1]^T },
we see that S ⊆ Span B. Thus S = Span B. Since neither vector in B is a scalar


multiple of the other, B is linearly independent and thus a basis for S.
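
Since S is exactly the null space of the 1 × 3 matrix [ 1 −1 2 ], a computer algebra system
will produce the same basis. A small sketch (an illustration only, assuming SymPy is
available):

    from sympy import Matrix

    # The subspace of Example 27.4 is the null space of the 1x3 matrix [1 -1 2].
    A = Matrix([[1, -1, 2]])

    # nullspace() returns a list of column vectors forming a basis for Null(A);
    # here it gives the vectors [1, 1, 0]^T and [-2, 0, 1]^T found above.
    for v in A.nullspace():
        print(v.T)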

When finding a spanning set for a subspace S of Rn , we choose an arbitrary ~x ∈ S and try
to “decompose” ~x as a linear combination of some ~v1 , . . . , ~vk ∈ S. This then shows that
S ⊆ Span {~v1 , . . . , ~vk }. Technically, we should also show that Span {~v1 , . . . , ~vk } ⊆ S, but this
is trivial as S is a subspace and thus contains all linear combinations of ~v1 , . . . , ~vk . Thus for
a subspace S of Rn , S ⊆ Span {~v1 , . . . , ~vk } implies that S = Span {~v1 , . . . , ~vk }, and we don’t
normally show (or even mention) that Span {~v1 , . . . , ~vk } ⊆ S.

Example 27.5
Consider the subspace
    S = { [a − b, b − c, c − a]^T | a, b, c ∈ R }
of R3 . Find a basis for S.

Solution. Let ~x ∈ S. Then for some a, b, c ∈ R,

    ~x = [a − b, b − c, c − a]^T = a [1, 0, −1]^T + b [−1, 1, 0]^T + c [0, −1, 1]^T.

Thus
    S = Span { [1, 0, −1]^T , [−1, 1, 0]^T , [0, −1, 1]^T }.
Now since
    [0, −1, 1]^T = − [1, 0, −1]^T − [−1, 1, 0]^T,
we have from Theorem 24.2 that
    S = Span { [1, 0, −1]^T , [−1, 1, 0]^T }
so
    B = { [1, 0, −1]^T , [−1, 1, 0]^T }

is a spanning set for S. Moreover, since neither vector in B is a scalar multiple of the
other, B is linearly independent and hence a basis for S.

The subspace S from Example 27.5 is a plane through the origin and a vector equation for
S is
    ~x = s [1, 0, −1]^T + t [−1, 1, 0]^T ,   s, t ∈ R.

Exercise 27.1
Find a basis for the subspace
    S = { [x1, x2, x3]^T | x1 + x2 = 0 and x2 − x3 = 0 }
of R3 .

Solution. Let ~x = [x1, x2, x3]^T ∈ S. Then x1 + x2 = 0 and x2 − x3 = 0 and thus x1 = −x2 and
x3 = x2 . It follows that

    ~x = [x1, x2, x3]^T = [−x2, x2, x2]^T = x2 [−1, 1, 1]^T.

Thus S ⊆ Span { [−1, 1, 1]^T }. Now since [−1, 1, 1]^T ∈ S and since S is closed under linear
combinations, we have that Span { [−1, 1, 1]^T } ⊆ S and so Span { [−1, 1, 1]^T } = S. Hence the set

    B = { [−1, 1, 1]^T }
is a spanning set for S. Since B consists of a single nonzero vector, B is linearly independent
and is hence a basis for S.
Given a subspace S of Rn with B a basis for S, the next theorem shows the importance of
having both Span B = S and B linearly independent.

Theorem 27.3
If B = {~v1 , . . . , ~vk } is a basis for a subspace S ⊆ Rn , then every ~x ∈ S can be expressed
as a linear combination of ~v1 , . . . , ~vk in a unique way.

Proof. Since B is a basis for S, S = Span B and so every ~x ∈ S can be expressed as a linear
combination of the vectors in B. Thus we only need to show that this expression is unique.
Suppose for some ~x ∈ S we have

~x = c1~v1 + · · · + ck~vk and ~x = d1~v1 + · · · + dk~vk

for some c1 , d1 , . . . , ck , dk ∈ R. Then

c1~v1 + · · · + ck~vk = d1~v1 + · · · + dk~vk

and rearranging gives


(c1 − d1 )~v1 + · · · + (ck − dk )~vk = ~0.
Since B is linearly independent, we have that c1 − d1 = · · · = ck − dk = 0, that is, ci = di
for i = 1, . . . , k, which shows any ~x ∈ S can be expressed uniquely as a linear combination
of the vectors in B.

Lecture 28

Dimension of a Subspace
Intuitively, we have an understanding of what dimension is. We understand that R2 is two-
dimensional and that R3 is three-dimensional. This section gives a precise definition of what
the dimension of a subspace is. This notion of dimension can be extended to sets that are
not subspaces of Rn , but we will not pursue that idea here.

Theorem 28.1
Let B = {~v1 , . . . , ~vk } be a basis for a subspace S of Rn . If C = {~w1 , . . . , ~w` } is a set in
S with ` > k, then C is linearly dependent.

Proof. We prove Theorem 28.1 in the case k = 2 and ` = 3, the proof of the general result
being similar. Thus we have that B = {~v1 , ~v2 } is a basis for S and C = {~w1 , ~w2 , ~w3 } is a set
of three vectors in S. Since B is a basis for S, Theorem 27.3 gives that there are unique
a1 , a2 , b1 , b2 , c1 , c2 ∈ R so that

    ~w1 = a1~v1 + a2~v2 ,   ~w2 = b1~v1 + b2~v2   and   ~w3 = c1~v1 + c2~v2 .

Now for t1 , t2 , t3 ∈ R, consider

    ~0 = t1 ~w1 + t2 ~w2 + t3 ~w3
       = t1 (a1~v1 + a2~v2 ) + t2 (b1~v1 + b2~v2 ) + t3 (c1~v1 + c2~v2 )
       = (a1 t1 + b1 t2 + c1 t3 )~v1 + (a2 t1 + b2 t2 + c2 t3 )~v2 .

Since B = {~v1 , ~v2 } is linearly independent, we have

    a1 t1 + b1 t2 + c1 t3 = 0
    a2 t1 + b2 t2 + c2 t3 = 0.

This is an underdetermined homogeneous system, so it is consistent with nontrivial solutions,
and it follows that C = {~w1 , ~w2 , ~w3 } is linearly dependent.

It follows from Theorem 28.1 that if B = {~v1 , . . . , ~vk } is a basis for a subspace S of Rn and
C = {~w1 , . . . , ~w` } is a linearly independent subset of S, then ` ≤ k. We now state our main
result.

Theorem 28.2
If B = {~v1 , . . . , ~vk } and C = {~w1 , . . . , ~w` } are both bases for a subspace S of Rn , then
k = `.

Proof. Since B is a basis for S and C is linearly independent, we have that ` ≤ k. Since C
is a basis for S and B is linearly independent, k ≤ `. Hence k = `.
Hence, given a subspace S of Rn , there may be many bases for S, but they will all contain
the same number of vectors. This motivates the following definition.

Definition 28.1: Dimension


If B = {~v1 , . . . , ~vk } is a basis for a subspace S of Rn , then we say the dimension of S
is k, and we write dim(S) = k. If S = {~0}, then dim(S) = 0 since ∅ is a basis for S.

Example 28.1

Since the standard basis for Rn is {~e1 , . . . , ~en }, we see that dim(Rn ) = n.

Example 28.2
We saw in Example 27.5 that the subspace
    S = { [a − b, b − c, c − a]^T | a, b, c ∈ R }
of R3 had basis
    B = { [1, 0, −1]^T , [−1, 1, 0]^T },

so dim(S) = 2.

Theorem 28.3
If S is a k−dimensional subspace of Rn with k > 0, then
(1) A set of more than k vectors in S is linearly dependent,

(2) A set of fewer than k vectors in S cannot span S,

(3) A set of exactly k vectors in S spans S if and only if it is linearly independent.

Example 28.3
Let S be a subspace of R3 with dim(S) = 2 and suppose that ~v1 = [1, 1, −2]^T and
~v2 = [1, 2, −3]^T belong to S. Find a basis for S.
Solution. Since ~v1 and ~v2 are nonzero and nonparallel, we have that {~v1 , ~v2 } is a linearly
independent set of two vectors in S. Since dim(S) = 2, we have that S = Span {~v1 , ~v2 }
by Theorem 28.3(3). Thus {~v1 , ~v2 } is a basis for S.

Note that we must know dim(S) before we use Theorem 28.3. In the previous example, we
could not have used the linear independence of {~v1 , ~v2 } to conclude that S = Span {~v1 , ~v2 }
if we weren’t told the dimension of S.

Three Fundamental Subspaces Associated with a Matrix


In this section, we define the nullspace, column space and row space of a matrix A ∈
Mm×n (R).

Definition 28.2: Nullspace

Let A ∈ Mm×n (R). The nullspace of A (sometimes called the kernel of A) is the subset
of Rn defined by
Null (A) = {~x ∈ Rn | A~x = ~0}.

Definition 28.3: Column Space

Let A = [ ~a1 · · · ~an ] ∈ Mm×n (R). The column space of A is the subset of Rm
defined by
Col (A) = {A~x | ~x ∈ Rn } = Span {~a1 , . . . , ~an }.

Definition 28.4: Row Space

Let A ∈ Mm×n (R) have rows ~r1T , . . . , ~rmT . The row space of A is the subset of Rn defined by

    Row (A) = {AT ~x | ~x ∈ Rm } = Span {~r1 , . . . , ~rm }.

Theorem 28.4
Let A ∈ Mm×n (R). Then Null (A) and Row (A) are subspaces of Rn and Col (A) is a
subspace of Rm .

Proof. By definition, Null (A) is a subset of Rn . We use the Subspace Test to show Null (A)
is a subspace of Rn . Since A~0Rn = ~0Rm , ~0Rn ∈ Null (A). For ~y , ~z ∈ Null (A), we have that
A~y = ~0Rm = A~z. Then

A(~y + ~z) = A~y + A~z = ~0Rm + ~0Rm = ~0Rm

so ~y + ~z ∈ Null (A). For c ∈ R,

    A(c~y ) = cA~y = c(~0Rm ) = ~0Rm

so c~y ∈ Null (A). Thus Null (A) is a subspace of Rn . Since Col (A) ⊆ Rm and Col (A) =
Span {~a1 , . . . , ~an } where ~a1 , . . . , ~an are the columns of A, Col (A) is a subspace of Rm by The-
orem 26.2. Finally, since Row (A) ⊆ Rn and Row (A) = Span {~r1 , . . . , ~rm } where ~r1T , . . . , ~rm T

are the rows of A, Row (A) is a subspace of Rn by Theorem 26.2.


Given a matrix A ∈ Mm×n (R), we now turn our attention to finding bases for these sub-
spaces. The following two theorems will be useful.

Theorem 28.5
Let A ∈ Mm×n (R). If B ∈ Mm×n (R) is obtained from A by a series of elementary row
operations, then Row (B) = Row (A).

Sketch of the Proof. Let A ∈ Mm×n (R) with rows ~r1T , . . . , ~rmT . It is sufficient to show that
Row (A) is unchanged by each of the three elementary row operations. Let 1 ≤ i, j ≤ m
with i 6= j. If we swap the ith row and jth row of A, then the row space of the resulting
matrix will be spanned by

{~r1 , . . . , ~ri−1 , ~rj , ~ri+1 , . . . , ~rj−1 , ~ri , ~rj+1 , . . . , ~rm }

(we’ve shown the case for i < j, the case j < i being similar) and it’s not difficult to see that

Span {~r1 , . . . , ~rm } = Span {~r1 , . . . , ~ri−1 , ~rj , ~ri+1 , . . . , ~rj−1 , ~ri , ~rj+1 , . . . , ~rm }. (25)

If we add k times the ith row of A to the jth row of A, then the resulting matrix will have
a row space spanned by
{~r1 , . . . , ~rj−1 , ~rj + k~ri , ~rj+1 , . . . , ~rm }
and it’s not difficult to show that

Span {~r1 , . . . , ~rm } = Span {~r1 , . . . , ~rj−1 , ~rj + k~ri , ~rj+1 , . . . , ~rm }. (26)

Finally, if we multiply the ith row of A by a nonzero scalar k ∈ R, then the row space of the
resulting matrix will be spanned by

{~r1 , . . . , ~ri−1 , k~ri , ~ri+1 , . . . , ~rm }

and it is again not difficult to show that

Span {~r1 , . . . , ~rm } = Span {~r1 , . . . , ~ri−1 , k~ri , ~ri+1 , . . . , ~rm }. (27)

Together, equations (25), (26) and (27) show that if B is obtained from A by a series of
elementary row operations, then Row (B) = Row (A).

Theorem 28.6

Let A = [ ~a1 · · · ~an ] ∈ Mm×n (R) and suppose B = [ ~b1 · · · ~bn ] ∈ Mm×n (R) is
obtained from A by a series of elementary row operations. Then for any c1 , . . . , cn ∈ R,
c1~a1 + · · · + cn~an = ~0 if and only if c1~b1 + · · · + cn~bn = ~0.

Proof. Since B can be obtained from A by a series of elementary row operations, the aug-
mented matrix [ A | ~0 ] reduces to [ B | ~0 ]. Thus ~c = [c1 , . . . , cn ]^T is a solution to the system
A~x = ~0 if and only if ~c is a solution to B~x = ~0. Hence c1~a1 + · · · + cn~an = ~0 if and only if
c1~b1 + · · · + cn~bn = ~0.
Theorem 28.6 states that any dependencies among the columns of a matrix A are preserved
by elementary row operations. For example, if the first column of A is the sum of the second
and third columns of A, then any matrix B obtained from A by elementary row operations
will have its first column be the sum of the second and third columns as well.

Lecture 29

Example 29.1
Let
    A = [ 1 1  5 1 ]
        [ 1 2  7 2 ]
        [ 2 3 12 3 ].
Find a basis for Null (A), Col (A) and Row (A), and state the dimensions of each of
these subspaces.
Solution. Carrying A to RREF gives

    [ 1 1  5 1 ]   −→     [ 1 1 5 1 ]  R1−R2  [ 1 0 3 0 ]
    [ 1 2  7 2 ]  R2−R1   [ 0 1 2 1 ]   −→    [ 0 1 2 1 ]
    [ 2 3 12 3 ]  R3−2R1  [ 0 1 2 1 ]  R3−R2  [ 0 0 0 0 ]

The solution to the homogeneous system A~x = ~0 is

    [x1, x2, x3, x4]^T = s [−3, −2, 1, 0]^T + t [0, −1, 0, 1]^T ,   s, t ∈ R

so
    B1 = { [−3, −2, 1, 0]^T , [0, −1, 0, 1]^T }
is a spanning set for Null (A). Since each vector in B1 has a 1 where the other has a
0, B1 is linearly independent and is thus a basis for Null (A), so dim(Null (A)) = 2.
To find a basis for Col (A), notice that only the first two rows of the RREF of A
contain leading entries, and that the columns without leading entries can be expressed
as linear combinations of the columns with leading entries:

    [3, 2, 0]^T = 3 [1, 0, 0]^T + 2 [0, 1, 0]^T   and   [0, 1, 0]^T = 0 [1, 0, 0]^T + 1 [0, 1, 0]^T .

Theorem 28.6 shows that the same dependencies exist among the corresponding
columns of A. Indeed we can now see that

    [5, 7, 12]^T = 3 [1, 1, 2]^T + 2 [1, 2, 3]^T   and   [1, 2, 3]^T = 0 [1, 1, 2]^T + 1 [1, 2, 3]^T .

Thus, by Theorem 24.2, Col (A) = Span B2 where

    B2 = { [1, 1, 2]^T , [1, 2, 3]^T }.

Since the first two columns of the RREF of A form a linearly independent set, Theorem
28.6 shows that the first two columns of A form a linearly independent set, so B2 is
linearly independent and thus a basis for Col (A) so dim(Col (A)) = 2. To find a basis
for Row (A), Theorem 28.5 tells us that the rows of the reduced row echelon form of
A span Row (A), so

    Row (A) = Span { [1, 0, 3, 0]^T , [0, 1, 2, 1]^T , [0, 0, 0, 0]^T }.

Since each of the nonzero vectors in our spanning set for Row (A) has a 1 where the
others have a zero, the nonzero vectors in our spanning set are linearly independent
and still span Row (A). Hence

    B3 = { [1, 0, 3, 0]^T , [0, 1, 2, 1]^T }

is a basis for Row (A) and dim(Row (A)) = 2.
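
The computations of Example 29.1 can be reproduced with a computer algebra system.
The sketch below (an illustration only, assuming Python with SymPy is available) finds
bases for the three fundamental subspaces of A; SymPy's nullspace, columnspace and
rowspace methods follow essentially the same recipes used in the example.

    from sympy import Matrix

    A = Matrix([[1, 1,  5, 1],
                [1, 2,  7, 2],
                [2, 3, 12, 3]])

    # Basis for Null(A): built from the free variables, as in the example.
    print(A.nullspace())    # the vectors (-3, -2, 1, 0)^T and (0, -1, 0, 1)^T

    # Basis for Col(A): the pivot columns of A itself.
    print(A.columnspace())  # the first two columns of A

    # Basis for Row(A): the nonzero rows of an echelon form of A.
    print(A.rowspace())

    # rref() returns the reduced row echelon form together with the pivot column indices.
    R, pivots = A.rref()
    print(R, pivots)        # pivots == (0, 1)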

In Example 29.1, we used the solution to the homogeneous system to quickly generate a
spanning set for Null (A). We also saw that since each vector in the spanning set contained
a 1 where the others contained a 0, no vector was a linear combination of the others, so our
spanning set for Null (A) was linearly independent, and thus a basis for Null (A). This will

always be the case. For example, consider the system

    x1 + x2 + x3 + 4x5 = 0
              x4 + 2x5 = 0

with coefficient matrix

    [ 1 1 1 0 4 ]
    [ 0 0 0 1 2 ]
which is already in reduced row echelon form. We see that x2 , x3 and x5 are free variables.
The solution is given by

    [x1, x2, x3, x4, x5]^T = t1 [−1, 1, 0, 0, 0]^T + t2 [−1, 0, 1, 0, 0]^T + t3 [−4, 0, 0, −2, 1]^T

and see that

    B = { [−1, 1, 0, 0, 0]^T , [−1, 0, 1, 0, 0]^T , [−4, 0, 0, −2, 1]^T }
is a spanning set for Null (A). Since each vector has a 1 where the others have a 0, no
vector in B is a linear combination of the others, so Theorem 25.2 gives that B is linearly
independent and thus a basis for Null (A). Note that if A~x = ~0 has only the trivial solution,
then Null (A) = {~0} so ∅ is a basis for Null (A).

The methods for finding bases for the Column Space and Row Space are also similar to
Example 29.1. For A ∈ Mm×n (R), we find a basis for Col (A) by finding the columns in
any row echelon form17 of A that have leading entries - the corresponding columns of A will
form a basis for Col (A). We find a basis for Row (A) by taking the nonzero rows in any row
echelon form of A. We do not need to take the corresponding rows of A when finding a basis
for Row (A)18 .
17
Row echelon form is indeed sufficient here, but if we’re also looking for a basis for Null (A), then it’s
common to carry A to reduced row echelon form.
18
It is not wrong to take the corresponding rows of A, but it's an easy way to make an error, particularly
if a row swap was made when carrying A to a row echelon form.

Example 29.2

Find a basis for Null (A), Col (A) and Row (A) where

    A = [  1  2 1 3  4 ]
        [  3  6 2 6  9 ]
        [ −2 −4 1 1 −1 ]
and state their dimensions.
Solution. We carry A to RREF:

    [  1  2 1 3  4 ]   −→     [ 1 2  1  3  4 ]  R1+R2   [ 1 2  0  0  1 ]
    [  3  6 2 6  9 ]  R2−3R1  [ 0 0 −1 −3 −3 ]   −→     [ 0 0 −1 −3 −3 ]
    [ −2 −4 1 1 −1 ]  R3+2R1  [ 0 0  3  7  7 ]  R3+3R2  [ 0 0  0 −2 −2 ]

      −→      [ 1 2 0 0 1 ]  R2−3R3  [ 1 2 0 0 1 ]
      −R2     [ 0 0 1 3 3 ]   −→     [ 0 0 1 0 0 ]
    −(1/2)R3  [ 0 0 0 1 1 ]          [ 0 0 0 1 1 ]
We have

    [x1, x2, x3, x4, x5]^T = s [−2, 1, 0, 0, 0]^T + t [−1, 0, 0, −1, 1]^T ,   s, t ∈ R

so
    B1 = { [−2, 1, 0, 0, 0]^T , [−1, 0, 0, −1, 1]^T }

is a basis for Null (A) showing that dim(Null (A)) = 2. As the first, third and fourth
columns of the RREF of A have leading entries,

    B2 = { [1, 3, −2]^T , [1, 2, 1]^T , [3, 6, 1]^T }
is a basis for Col (A) and dim(Col (A)) = 3. Finally, the nonzero rows of the reduced
row echelon form of A give

    B3 = { [1, 2, 0, 0, 1]^T , [0, 0, 1, 0, 0]^T , [0, 0, 0, 1, 1]^T }
as a basis for Row (A) and dim(Row (A)) = 3.

We state here a theorem concerning the dimensions of these subspaces.

Theorem 29.1
Let A ∈ Mm×n (R). Then

1. dim(Null (A)) = n − rank (A)

2. dim(Col (A)) = rank (A)

3. dim(Row (A)) = rank (A).
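
Theorem 29.1 is easy to check numerically on any particular matrix. A small sketch (an
illustration only, assuming SymPy is available), using the matrix from Example 29.2:

    from sympy import Matrix

    # The matrix A from Example 29.2; it has n = 5 columns.
    A = Matrix([[ 1,  2, 1, 3,  4],
                [ 3,  6, 2, 6,  9],
                [-2, -4, 1, 1, -1]])

    r = A.rank()
    print(r)                                 # 3 = dim(Col(A)) = dim(Row(A))
    print(len(A.nullspace()))                # 2 = dim(Null(A))
    print(len(A.nullspace()) + r == A.cols)  # True: dim(Null(A)) = n - rank(A)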

Our method for finding a basis for the column space of a matrix can easily be applied to
finding a basis for any subspace given a spanning set for that subspace.

Example 29.3
Let S = Span C where

    C = { [1, −1, 1]^T , [1, 2, −3]^T , [1, 5, −7]^T , [3, 6, −9]^T }.
Find a basis B for S with B ⊆ C.


Solution. We have

    [  1  1  1  3 ]   −→    [ 1  1  1   3 ]      −→       [ 1 1 1 3 ]
    [ −1  2  5  6 ]  R2+R1  [ 0  3  6   9 ]  R3+(4/3)R2   [ 0 3 6 9 ]
    [  1 −3 −7 −9 ]  R3−R1  [ 0 −4 −8 −12 ]               [ 0 0 0 0 ]

As only the first two columns of a row echelon form of our matrix contain leading
entries, the first two vectors in C comprise B, that is,

    B = { [1, −1, 1]^T , [1, 2, −3]^T }

is a basis for S.
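
This method of shrinking a spanning set to a basis is easy to automate: place the vectors
as columns, row reduce, and keep the vectors corresponding to pivot columns. A sketch
reproducing Example 29.3 (an illustration only, assuming SymPy is available):

    from sympy import Matrix

    # The vectors of C placed as the columns of a matrix, as in Example 29.3.
    M = Matrix([[ 1,  1,  1,  3],
                [-1,  2,  5,  6],
                [ 1, -3, -7, -9]])

    R, pivots = M.rref()
    print(pivots)                     # (0, 1): the first two vectors of C form a basis for S
    basis = [M.col(j) for j in pivots]
    print([v.T for v in basis])       # the vectors (1, -1, 1)^T and (1, 2, -3)^T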

Theorem 29.2: Invertible Matrix Theorem Revisited
Let A ∈ Mn×n (R). The following are equivalent.

(1) A is invertible

(2) rank (A) = n

(3) The reduced row echelon form of A is I

(4) For all ~b ∈ Rn , the system A~x = ~b is consistent and has a unique solution

(5) AT is invertible

(6) Null (A) = {~0}

(7) The columns of A form a linearly independent set

(8) The columns of A span Rn

(9) Col (A) = Rn

(10) Null (AT ) = {~0}

(11) The rows of A form a linearly independent set

(12) The rows of A span Rn

(13) Row (A) = Rn

Lecture 30

Vector Spaces
Back in Lecture 6, we defined the operations of vector addition and scalar multiplication
for vectors in Rn . Theorem 6.1 then gave ten properties that vectors in Rn obey under our
definitions of vector addition and scalar multiplication. The notion of a vector space is to
consider a set V of objects with an operation of addition and scalar multiplication defined
upon them such that a similar set of properties as those stated in Theorem 6.1 also hold.
As an example, in Lecture 18 we defined addition and scalar multiplication for matrices in
Mm×n (R), and Theorem 18.1 showed that the same ten properties held for matrices under
these two operations.

Definition 30.1: Vector Space


A set V with an operation of addition, denoted ~x + ~y , and an operation of scalar
multiplication, denoted c~x, c ∈ R is called a vector space over R if for every ~v , ~x, ~y ∈ V
and for every c, d ∈ R

V1: ~x + ~y ∈ V V is closed under addition

V2: ~x + ~y = ~y + ~x addition is commutative

V3: (~x + ~y ) + ~v = ~x + (~y + ~v ) addition is associative

V4: There exists a vector ~0 ∈ V so that ~x + ~0 = ~x for every ~x ∈ V zero vector

V5: For every ~x ∈ V there exists a (−~x) ∈ V so that ~x + (−~x) = ~0 additive inverse

V6: c~x ∈ V V is closed under scalar multiplication

V7: c(d~x) = (cd)~x scalar multiplication is associative

V8: (c + d)~x = c~x + d~x distributive law

V9: c(~x + ~y ) = c~x + c~y distributive law

V10: 1~x = ~x scalar multiplicative identity

We call the elements of V vectors.

Note that in the above definition, “over R” means that our scalars are real numbers. Later,
we will briefly mention vector spaces over C. Until then, all vector spaces are over R and we
will simply say “vector space”. As usual, properties V1 and V6 combine to give that a vector
space V is closed under linear combinations. Note also that the textbook uses a boldface x

to denote a vector ~x in an arbitrary vector space V.

Example 30.1

Rn and Mm×n (R) are vector spaces under the operations of vector addition and scalar
multiplication.

Example 30.2
Let a, b ∈ R with a < b. With the standard addition and scalar multiplication,

• The set F(a, b) of all functions f : (a, b) → R is a vector space

• The set C(a, b) of all continuous functions f : (a, b) → R is a vector space

• The set C 1 (a, b) of all differentiable functions f : (a, b) → R is a vector space

Example 30.3
The set of discontinuous functions f : R → R with the standard addition and scalar
multiplication is not a vector space. To see this, consider
    f1 (x) = 1 if x ≥ 0 and 0 if x < 0,     f2 (x) = 0 if x ≥ 0 and 1 if x < 0.

Both are discontinuous, but their sum

(f1 + f2 )(x) = f1 (x) + f2 (x) = 1

for every x ∈ R and is thus continuous. Hence, V1 fails: the set of discontinuous
functions is not closed under addition and is thus not a vector space.

The following theorem illustrates the importance of the ten vector space axioms given in
Definition 30.1. They are the basic “truths” of a vector space from which all other state-
ments regarding vector spaces can be derived.

Theorem 30.1
If V is a vector space, then for every ~x ∈ V,

(1) 0~x = ~0

(2) −~x = (−1)~x.

Proof.

(1) For ~x ∈ V, we have

    0~x = 0~x + ~0                    by V4
        = 0~x + (~x + (−~x))          by V5
        = 0~x + (1~x + (−~x))         by V10
        = (0~x + 1~x) + (−~x)         by V3
        = (0 + 1)~x + (−~x)           by V8
        = 1~x + (−~x)                 since 0 + 1 = 1
        = ~x + (−~x)                  by V10
        = ~0                          by V5.

(2) For ~x ∈ V, we have

    (−1)~x = (−1)~x + ~0                  by V4
           = (−1)~x + (~x + (−~x))        by V5
           = (−1)~x + (1~x + (−~x))       by V10
           = ((−1)~x + 1~x) + (−~x)       by V3
           = ((−1) + 1)~x + (−~x)         by V8
           = 0~x + (−~x)                  since (−1) + 1 = 0
           = ~0 + (−~x)                   by part (1) above
           = (−~x) + ~0                   by V2
           = −~x                          by V4.

Our work involving spanning sets, linear independence and linear dependence, bases and
subspaces all carry over naturally to vector spaces. We restate those definitions here for an
arbitrary vector space V.

Definition 30.2: Span

Let B = {~v1 , . . . , ~vk } be a set of vectors in a vector space V. The span of B is

Span B = {c1~v1 + · · · + ck~vk | c1 , . . . , ck ∈ R}.

The set Span B is spanned by B, and B is a spanning set for Span B.

Definition 30.3: Linear Dependence and Independence

Let B = {~v1 , . . . , ~vk } be a set of vectors in a vector space V. We say that B is linearly
dependent if there exist c1 , . . . , ck ∈ R, not all zero so that
~0 = c1~v1 + · · · + ck~vk .

We say that B is linearly independent if the only solution to


~0 = c1~v1 + · · · + ck~vk

is c1 = · · · = ck = 0.

Definition 30.4: Subspace


A subset S is called a subspace of a vector space V if S is itself a vector space under
the same operations of vector addition and scalar multiplication as V.

Theorem 30.2: Subspace Test


A subset S of a vector space V is a subspace of V if

(1) ~0V ∈ S, S contains the zero vector of V

(2) if ~x, ~y ∈ S, then ~x + ~y ∈ S, S is closed under vector addition

(3) if ~x ∈ S and c ∈ R, then c~x ∈ S. S is closed under scalar multiplication

In particular, {~0} is a subspace of V, called the trivial subspace, and V is a subspace of V.

Definition 30.5: Basis


Let S be a subspace of V, and let B = {~v1 , . . . , ~vk } be a set of vectors in S. Then B is
a basis for S if B is linearly independent and S = Span B. If S = {~0}, then we define
B = ∅ to be a basis for S.

Definition 30.5 carries with it the assumption that a vector space V (and any subspace S
of V) is spanned by finitely many vectors. This is not generally true, as none of the vector
spaces presented in Example 30.2 can be spanned by a set with finitely many vectors. This
type of vector space will not be studied in this course (beyond Example 30.2). As with Rn ,
the vector spaces we study below will always be spanned by finitely many vectors, and the
theory and results we are summarizing here are for such vector spaces.

Definition 30.6: Dimension
If B = {~v1 , . . . , ~vk } is a basis for a subspace S of V, then we say the dimension of S is
k, and we write dim(S) = k. If S = {~0}, then dim(S) = 0 since ∅ is a basis for S.

Theorem 30.3
If S is a k−dimensional subspace of V with k > 0, then

(1) A set of more than k vectors in S is linearly dependent,

(2) A set of fewer than k vectors in S cannot span S,

(3) A set of exactly k vectors in S spans S if and only if it is linearly independent.

Having reviewed the important definitions, we begin to look at some examples involving
them. We begin with the vector space Mm×n (R).

Example 30.4
Consider the set

    B = { [ 1 0 ]   [ 0 1 ]   [ 0 0 ]   [ 0 0 ] }
        { [ 0 0 ] , [ 0 0 ] , [ 1 0 ] , [ 0 1 ] }.

Show that B is a basis for the vector space M2×2 (R).

Solution. We first show that B is linearly independent. For c1 , c2 , c3 , c4 ∈ R, consider

    c1 [ 1 0 ] + c2 [ 0 1 ] + c3 [ 0 0 ] + c4 [ 0 0 ] = [ 0 0 ]
       [ 0 0 ]      [ 0 0 ]      [ 1 0 ]      [ 0 1 ]   [ 0 0 ].

This gives

    [ c1 c2 ] = [ 0 0 ]
    [ c3 c4 ]   [ 0 0 ]

and so clearly c1 = c2 = c3 = c4 = 0 and thus B is linearly independent. We next
check that Span B = M2×2 (R) (note that Span B ⊆ M2×2 (R) since M2×2 (R) is closed
under linear combinations). For any [ a b ; c d ] ∈ M2×2 (R),

    [ a b ] = a [ 1 0 ] + b [ 0 1 ] + c [ 0 0 ] + d [ 0 0 ]
    [ c d ]     [ 0 0 ]     [ 0 0 ]     [ 1 0 ]     [ 0 1 ]

so Span B = M2×2 (R). Thus B is a basis for M2×2 (R).

Notice that in Example 30.4, it was very easy to show that the basis was linearly independent,
and it was equally easy to write an arbitrary matrix as a linear combination of the matrices
in B. This is reminiscent of the standard basis for Rn . Indeed, the basis for M2×2 (R) given
in Example 30.4 is very similar to the standard basis for R4 :

    { [1, 0, 0, 0]^T , [0, 1, 0, 0]^T , [0, 0, 1, 0]^T , [0, 0, 0, 1]^T }.

Thus, the following definition is quite natural.

Definition 30.7: Standard Basis for M2×2 (R)

The set

    B = { [ 1 0 ]   [ 0 1 ]   [ 0 0 ]   [ 0 0 ] }
        { [ 0 0 ] , [ 0 0 ] , [ 1 0 ] , [ 0 1 ] }

is called the standard basis for M2×2 (R).

It follows from Definition 30.7 that dim(M2×2 (R)) = 4. We construct the standard basis for
Mm×n (R) in a similar way, so dim(Mm×n (R)) = mn.

Example 30.5
Let
    B = { [ 1 1 ]   [ 1 1 ]   [ 0 1 ]   [ 1 0 ] }
        { [ 0 1 ] , [ 1 0 ] , [ 1 1 ] , [ 1 1 ] }.
Show B is a basis for M2×2 (R) and express A = [ 1 2 ; 3 4 ] as a linear combination of the
vectors (matrices) in B.
Solution. For c1 , c2 , c3 , c4 ∈ R, consider

    [ 1 2 ] = c1 [ 1 1 ] + c2 [ 1 1 ] + c3 [ 0 1 ] + c4 [ 1 0 ]
    [ 3 4 ]      [ 0 1 ]      [ 1 0 ]      [ 1 1 ]      [ 1 1 ].

Equating corresponding entries gives the system

    c1 + c2      + c4 = 1
    c1 + c2 + c3      = 2
         c2 + c3 + c4 = 3
    c1      + c3 + c4 = 4

which we carry to reduced row echelon form:

    [ 1  1 0  1 | 1 ]   −→    [ 1  1 0  1 | 1 ]  R1−R3  [ 1 0 −1  0 | −2 ]
    [ 1  1 1  0 | 2 ]  R2−R1  [ 0  0 1 −1 | 1 ]   −→    [ 0 0  1 −1 |  1 ]
    [ 0  1 1  1 | 3 ]         [ 0  1 1  1 | 3 ]         [ 0 1  1  1 |  3 ]
    [ 1  0 1  1 | 4 ]  R4−R1  [ 0 −1 1  0 | 3 ]  R4+R3  [ 0 0  2  1 |  6 ]

    R1+R2   [ 1 0 0 −1 | −1 ]    −→      [ 1 0 0 −1 |  −1 ]  R1+R4   [ 1 0 0 0 |  1/3 ]
     −→     [ 0 0 1 −1 |  1 ]  R2↔R3     [ 0 1 0  2 |   2 ]  R2−2R4  [ 0 1 0 0 | −2/3 ]
    R3−R2   [ 0 1 0  2 |  2 ]            [ 0 0 1 −1 |   1 ]  R3+R4   [ 0 0 1 0 |  7/3 ]
    R4−2R2  [ 0 0 0  3 |  4 ]  (1/3)R4   [ 0 0 0  1 | 4/3 ]   −→     [ 0 0 0 1 |  4/3 ]

so c1 = 1/3, c2 = −2/3, c3 = 7/3, c4 = 4/3 and

    [ 1 2 ] = 1/3 [ 1 1 ] − 2/3 [ 1 1 ] + 7/3 [ 0 1 ] + 4/3 [ 1 0 ]
    [ 3 4 ]       [ 0 1 ]       [ 1 0 ]       [ 1 1 ]       [ 1 1 ].

Also, since the coefficient matrix reduces to I, the corresponding homogeneous system
derived from

    c1 [ 1 1 ] + c2 [ 1 1 ] + c3 [ 0 1 ] + c4 [ 1 0 ] = [ 0 0 ]
       [ 0 1 ]      [ 1 0 ]      [ 1 1 ]      [ 1 1 ]   [ 0 0 ]

has only the trivial solution, so B is linearly independent. Since B has 4 vectors and
dim(M2×2 (R)) = 4, Span B = M2×2 (R) so B is a basis for M2×2 (R).
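
The system in Example 30.5 can also be set up by flattening each 2 × 2 matrix into a vector
in R4, so that finding the coefficients becomes an ordinary 4 × 4 linear system. The sketch
below (an illustration only, assuming Python with NumPy is available) recovers the same
coefficients c1 , . . . , c4 numerically.

    import numpy as np

    # The four basis matrices from Example 30.5 and the target matrix A.
    B = [np.array([[1, 1], [0, 1]]),
         np.array([[1, 1], [1, 0]]),
         np.array([[0, 1], [1, 1]]),
         np.array([[1, 0], [1, 1]])]
    A = np.array([[1, 2], [3, 4]])

    # Each flattened basis matrix becomes a column; the coordinates solve M c = vec(A).
    M = np.column_stack([b.flatten() for b in B])
    c = np.linalg.solve(M, A.flatten())
    print(c)  # approximately [ 0.3333 -0.6667  2.3333  1.3333], i.e. 1/3, -2/3, 7/3, 4/3

    # Sanity check: the linear combination really reproduces A.
    print(np.allclose(sum(ci * b for ci, b in zip(c, B)), A))  # True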

Example 30.6

Let B ∈ Mn×k (R) be fixed and let

S = {A ∈ Mm×n (R) | AB = 0m×k }.

Show S is a subspace of Mm×n (R).


Solution. Since 0m×n B = 0m×k , we have that 0m×n ∈ S. Let A1 , A2 ∈ S. Then
A1 B = 0m×k = A2 B. Then

(A1 + A2 )B = A1 B + A2 B = 0m×k + 0m×k = 0m×k

and so A1 + A2 ∈ S. For any c ∈ R,

(cA1 )B = c(A1 B) = c 0m×k = 0m×k

so cA1 ∈ S. Thus S is a subspace of Mm×n (R) by the Subspace Test.

Exercise 30.1
Determine whether S = {A ∈ M2×2 (R) | A2 = A} is a subspace of M2×2 (R).

Solution. S is not a subspace of M2×2 (R). To see this, note that I ∈ S since I 2 = I. However,
(2I)2 = 4I 6= 2I, so 2I ∈/ S. Thus S is not closed under scalar multiplication (I ∈ S, but
2I ∈/ S, that is, property V6 fails).

Example 30.7

Consider the subspace S = {A ∈ M2×2 (R) | AT = A} of M2×2 (R). Find a basis for S
and state the dimension of S.

Solution. Let A = [ a1 a2 ; a3 a4 ] ∈ S. Then AT = A so

    [ a1 a3 ] = [ a1 a2 ]
    [ a2 a4 ]   [ a3 a4 ]

from which we see a3 = a2 . Thus

    A = [ a1 a2 ] = a1 [ 1 0 ] + a2 [ 0 1 ] + a4 [ 0 0 ]
        [ a2 a4 ]      [ 0 0 ]      [ 1 0 ]      [ 0 1 ]

so
    B = { [ 1 0 ]   [ 0 1 ]   [ 0 0 ] }
        { [ 0 0 ] , [ 1 0 ] , [ 0 1 ] }
is a spanning set for S. Since each vector in B contains a nonzero entry where others
contain a zero entry, B is linearly independent and thus a basis for S. It follows that
dim(S) = 3.

Exercise 30.2
Consider S = {A ∈ M2×2 (R) | AT = −A} ⊆ M2×2 (R). Show S is a subspace of
M2×2 (R) and find a basis for S. State the dimension of S.

Solution. Since 02×2 = −02×2 , 02×2 ∈ S. Now let A1 , A2 ∈ S. Then AT1 = −A1 and
AT2 = −A2 . We have

(A1 + A2 )T = AT1 + AT2 = −A1 − A2 = −(A1 + A2 )

which shows that A1 + A2 ∈ S. For any c ∈ R,

(cA1 )T = c(AT1 ) = c(−A1 ) = −cA1

showing that cA1 ∈ S. Hence S is a subspace of M2×2 (R) by the Subspace Test. We now
find a basis for S. Let A = [ a1 a2 ; a3 a4 ] ∈ S. Then AT = −A so

    [ −a1 −a3 ] = [ a1 a2 ]
    [ −a2 −a4 ]   [ a3 a4 ]

from which we see a1 = −a1 , a3 = −a2 and a4 = −a4 . It follows that a1 = a4 = 0 and

    A = [  0  a2 ] = a2 [  0 1 ]
        [ −a2  0 ]      [ −1 0 ]

so
    B = { [  0 1 ] }
        { [ −1 0 ] }
is a spanning set for S. Since B contains exactly one nonzero vector, B is linearly independent
and thus a basis for S. It follows that dim(S) = 1.

Lecture 31
We now examine real polynomials.

Definition 31.1: Pn (R)

For each nonnegative integer n, the set

Pn (R) = {a0 + a1 x + · · · + an xn | a0 , a1 , . . . , an ∈ R}

denotes the set of all real polynomials of degree at most n. We denote the zero
polynomial by 0 = 0 + 0x + · · · + 0xn ∈ Pn (R).

Note that

    P0 (R) = {a0 | a0 ∈ R} = R,                        constant polynomials
    P1 (R) = {a0 + a1 x | a0 , a1 ∈ R},                constant and first degree polynomials
    P2 (R) = {a0 + a1 x + a2 x2 | a0 , a1 , a2 ∈ R},   constant, first and second degree polynomials

and so on. In particular, P0 (R) ⊆ P1 (R) ⊆ P2 (R) ⊆ · · · .

The operations of addition and scalar multiplication for polynomials were defined in the
complex case in Definition 5.3. We include them again here for Pn (R).

Definition 31.2: Equality, Addition and Scalar Multiplication

Let p(x), q(x) ∈ Pn (R) with

p(x) = a0 + a1 x + · · · + an xn
q(x) = b0 + b1 x + · · · + bn xn

for some a0 , . . . , an , b0 , . . . , bn ∈ R. We say that p and q are equal if and only if ai = bi


for i = 0, 1, . . . , n and write p = q. Otherwise, we write p 6= q. We define addition and
scalar multiplication by

• (p + q)(x) = p(x) + q(x) = (a0 + b0 ) + (a1 + b1 )x + · · · + (an + bn )xn

• (kp)(x) = kp(x) = ka0 + ka1 x + · · · + kan xn for any k ∈ R

Theorem 31.1
With the operations of addition and scalar multiplication, Pn (R) is a vector space for
each nonnegative integer n.

Example 31.1

Consider the set B = {1, x, . . . , xn } ⊆ Pn (R). Show that B is a basis for the vector
space Pn (R).
Solution. For c0 , c1 , . . . , cn ∈ R, consider

c0 (1) + c1 x + · · · + cn xn = 0 = 0 + 0x + · · · + 0xn .

We have that c0 = c1 = · · · = cn = 0 so B is linearly independent. Also, for any


polynomial p(x) = a0 + a1 x + · · · + an xn ∈ Pn (R), p(x) is trivially a linear combination
of the elements in B:

p(x) = a0 (1) + a1 (x) + · · · + an (xn )

so Span B = Pn (R). Thus B is a basis for Pn (R).

Definition 31.3: Standard Basis for Pn (R)

The set B = {1, x, . . . , xn } is called the standard basis for Pn (R).

From Definition 31.3, we see that dim(Pn (R)) = n + 1.

Example 31.2

Let B = {1, 1 + x, 1 + x + x2 } ⊆ P2 (R). Show B is a basis for P2 (R).


Solution. For c1 , c2 , c3 ∈ R, consider

c1 (1) + c2 (1 + x) + c3 (1 + x + x2 ) = 0

Rearranging gives

(c1 + c2 + c3 ) + (c2 + c3 )x + c3 x2 = 0 + 0x + 0x2

Thus
c1 + c2 + c3 = 0
c2 + c3 = 0
c3 = 0
and we see that c3 = 0 which implies that c2 = 0 which in turn gives c1 = 0. Thus
B is linearly independent. Since B has 3 elements and dim(P2 (R)) = 3, we see that
Span B = P2 (R) and so B is a basis for P2 (R).

Example 31.3

Let B = {1 + x, 1 − x, 1, 2x, x + x2 } ⊆ P2 (R). Find a basis B 0 for Span B with B 0 ⊆ B.


Find the dimension of Span B.
Solution. For c1 , . . . , c5 ∈ R, consider

c1 (1 + x) + c2 (1 − x) + c3 (1) + c4 (2x) + c5 (x + x2 ) = 0.

Rearranging gives

(c1 + c2 + c3 ) + (c1 − c2 + 2c4 + c5 )x + c5 x2 = 0 + 0x + 0x2

from which we obtain

c1 + c2 + c3 = 0
c1 − c2 + 2c4 + c5 = 0
c5 = 0

We see immediately that this homogeneous system is underdetermined and thus has
nontrivial solutions. This allows us to conclude that B is a linearly dependent set.
Carrying the coefficient matrix of our system to reduced row echelon form gives

    [ 1  1 1 0 0 ]   −→    [ 1  1  1 0 0 ]     −→      [ 1 1  1   0   0  ]  R1−R2
    [ 1 −1 0 2 1 ]  R2−R1  [ 0 −2 −1 2 1 ]  −(1/2)R2   [ 0 1 1/2 −1 −1/2 ]   −→
    [ 0  0 0 0 1 ]         [ 0  0  0 0 1 ]             [ 0 0  0   0   1  ]

    [ 1 0 1/2  1  1/2 ]  R1−(1/2)R3  [ 1 0 1/2  1 0 ]
    [ 0 1 1/2 −1 −1/2 ]  R2+(1/2)R3  [ 0 1 1/2 −1 0 ]
    [ 0 0  0   0   1  ]     −→       [ 0 0  0   0 1 ]

From any of the above row echelon forms, we can see that there are leading entries in
the first, second and fifth columns. Thus we can tell that 1 and 2x can be expressed
as linear combinations of 1 + x and 1 − x and from the reduced row echelon form, we
see easily that
    1 = (1/2)(1 + x) + (1/2)(1 − x)
    2x = 1(1 + x) − 1(1 − x)

From any of the above row echelon forms, we conclude that

B 0 = {1 + x, 1 − x, x + x2 }

is a linearly independent subset of B with Span B 0 = Span B. Thus B 0 is a basis for


Span B with B 0 ⊆ B, and dim(Span B) = 3.
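
Working with coordinate vectors relative to the standard basis {1, x, x2 } turns Example 31.3
into the matrix computation carried out above. A sketch (an illustration only, assuming
SymPy is available):

    from sympy import Matrix

    # Coordinate vectors of 1+x, 1-x, 1, 2x, x+x^2 relative to {1, x, x^2},
    # placed as columns; this is the coefficient matrix from Example 31.3.
    M = Matrix([[1,  1, 1, 0, 0],
                [1, -1, 0, 2, 1],
                [0,  0, 0, 0, 1]])

    R, pivots = M.rref()
    print(pivots)   # (0, 1, 4): keep 1+x, 1-x and x+x^2, so dim(Span B) = 3
    print(R)        # the reduced row echelon form computed above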

Example 31.4

Let S = {p(x) ∈ P2 (R) | p(1) = 0}. Show S is a subspace of P2 (R), find a basis for S
and state the dimension of S.
Solution. Since 0 + 0(1) + 0(1)2 = 0, the zero polynomial belongs to S. Now for
p(x), q(x) ∈ S, p(1) = 0 = q(1) so

(p + q)(1) = p(1) + q(1) = 0 + 0 = 0

which shows (p + q)(x) ∈ S. For any c ∈ R,

(cp)(1) = cp(1) = c(0) = 0

so (cp)(x) ∈ S. Hence S is a subspace of P2 (R) by the Subspace Test. To find a basis


for S, we first find a spanning set for S. Let p(x) ∈ S. Then p(1) = 0 so x − 1 is a
factor of p(x). Since p(x) ∈ P2 (R), there are a, b ∈ R so that

p(x) = (x − 1)(ax + b) = ax2 + bx − ax − b = a(x2 − x) + b(x − 1)

so S = Span {x2 − x, x − 1}. Since neither x2 − x nor x − 1 is a scalar multiple of the


other, {x2 − x, x − 1} is linearly independent and thus a basis for S. Thus, dim(S) = 2.

Exercise 31.1
Consider the set

S = {a0 + a1 x | a0 , a1 ∈ R and a1 6= 0} ⊆ P1 (R).

Determine if S is a subspace of P1 (R).

Solution. S is not a subspace of P1 (R) simply because 0 = 0 + 0x ∈/ S.

From Exercise 31.1, we additionally point out that S is closed under neither addition nor scalar
multiplication. To see this, notice that 1 + 2x, 1 − 2x ∈ S but their sum (1 + 2x) + (1 − 2x) =
2 ∈/ S and 0(1 + 2x) = 0 ∈/ S.

Exercise 31.2
Consider S = {a + bx + (a − b)x2 | a, b ∈ R} ⊆ P2 (R). Show that S is a subspace of P2 (R)
and find a basis for S. State the dimension of S.

Solution. First note that the zero polynomial satisfies 0 = 0+0x+(0−0)x2 and thus belongs
to S. Let p(x), q(x) ∈ S. Then there are a1 , b1 , a2 , b2 ∈ R such that p(x) = a1 +b1 x+(a1 −b1 )x2
and q(x) = a2 + b2 x + (a2 − b2 )x2 . Then

(p + q)(x) = p(x) + q(x) = (a1 + b1 x + (a1 − b1 )x2 ) + (a2 + b2 x + (a2 − b2 )x2 )


= (a1 + a2 ) + (b1 + b2 )x + ((a1 + a2 ) − (b1 + b2 ))x2 ,


which shows that (p + q)(x) ∈ S. For any c ∈ R,

(cp)(x) = cp(x) = c(a1 + b1 x + (a1 − b1 )x2 ) = ca1 + cb1 x + (ca1 − cb1 )x2 ,

which gives that (cp)(x) ∈ S. Hence S is a subspace of P2 (R) by the Subspace Test. To find
a basis for S, consider p(x) ∈ S. Then there are a, b ∈ R so that

p(x) = a + bx + (a − b)x2 = a + bx + ax2 − bx2 = a(1 + x2 ) + b(x − x2 ).

It follows that B = {1 + x2 , x − x2 } is a spanning set for S, and since no element of B


is a scalar multiple of the other, B is linearly independent. Hence B is a basis for S and
dim(S) = 2.

Vector Spaces over C

• Cn is a vector space over19 C. For ~z ∈ Cn ,

      ~z = [z1 , . . . , zn ]^T = z1~e1 + · · · + zn~en .
We call {~e1 , . . . , ~en } (where ~ei is the ith column of the n × n identity matrix) the
standard basis for Cn , so dim(Cn ) = n.

• Mm×n (C) is a vector space over C. The standard basis for Mm×n (C) is the same as for
Mm×n (R), so dim(Mm×n (C)) = mn.

• Pn (C) (the set of polynomials of degree at most n) is a vector space over C. The
standard basis is {1, z, . . . , z n } (where z ∈ C), so dim(Pn (C)) = n + 1.

The notions of subspace, span, linear independence, basis and dimension are handled the
same way as for real vector spaces.

19
The expression “over C” means our scalars are complex numbers.

Lecture 32

Linear Transformations
Recall that a function is a rule that assigns to every element in one set (called the domain
of the function) a unique element in another set (called the codomain 20 of the function).
Given sets U and V we write f : U → V to indicate that f is a function with domain U
and codomain V , and it is understood that to each element u ∈ U , the function f assigns
a unique element v ∈ V . We say that f maps u to v and that v is the image of u under f .
We typically write v = f (u). See Figure 32.1.

[Figure 32.1: An example of a function (on the left) and something that fails to be a function
(on the right). Panel (a): a function with domain U and codomain V. Panel (b): this fails to
be a function from U to V for two reasons: it doesn't assign an image in V to all points in U,
and it assigns to one point in U more than one image in V.]

In calculus, one studies functions f : R → R, for example f (x) = x2 or f (x) = sin(x). We


will consider functions f : Rn → Rm . In fact, for A ∈ Mm×n (R) and ~x ∈ Rn , we have seen
how to compute the matrix-vector product A~x, and we know that A~x ∈ Rm . This motivates
the following definition.

Definition 32.1: Matrix Transformation


For A ∈ Mm×n (R), the function fA : Rn → Rm defined by fA (~x) = A~x for every
~x ∈ Rn is called the matrix transformation corresponding to A. We call Rn the domain
of fA and Rm the codomain of fA . We say that fA maps ~x to A~x and say that A~x is
the image of ~x under fA .

20
The codomain of a function is often confused with the range of a function. We will define the range of
a function shortly.

We make a few notes here:
• It is not uncommon to say matrix mapping instead of matrix transformation. We may
use the words transformation and mapping interchangeably.
• The subscript A in fA is merely to indicate that the function depends on the matrix
A. If we change the matrix A, we change the function fA .
• For A ∈ Mm×n (R), we have that fA : Rn → Rm . This is a result of how we defined the
matrix-vector product.

Example 32.1
Let
    A = [ 1  2 3 ]
        [ 1 −1 1 ].
Then A ∈ M2×3 (R) and so fA : R3 → R2 . We can compute

    fA (1, 1, 4) = A [1, 1, 4]^T = (15, 4),

and more generally,

    fA (x1 , x2 , x3 ) = A [x1 , x2 , x3 ]^T = (x1 + 2x2 + 3x3 , x1 − x2 + x3 ).

Since for A ∈ Mm×n (R), the function fA sends vectors in Rn to vectors in Rm , we should be
writing

    fA ([x1 , . . . , xn ]^T) = A [x1 , . . . , xn ]^T = [y1 , . . . , ym ]^T ,

but as functions are often viewed as sending points to points, we will prefer the notation

    f (x1 , . . . , xn ) = (y1 , . . . , ym )   or   f (x1 , . . . , xn ) = [y1 , . . . , ym ]^T .

However, we must still write A [x1 , . . . , xn ]^T and not A(x1 , . . . , xn ) due to our rules for the
matrix-vector product.

Theorem 32.1: Properties of Matrix Transformations

Let A ∈ Mm×n (R) and let fA be the matrix transformation corresponding to A. For
every ~x, ~y ∈ Rn and for every c ∈ R,

(1) fA (~x + ~y ) = fA (~x) + fA (~y )

(2) fA (c~x) = cfA (~x)

Proof. We use the properties of the matrix–vector product. We have

fA (~x + ~y ) = A(~x + ~y ) = A~x + A~y = fA (~x) + fA (~y )

and
fA (c~x) = A(c~x) = cA~x = cfA (~x).
Thus matrix transformations preserve vector sums and scalar multiplication. Combin-
ing these two results shows that matrix transformations preserve linear combinations: for
~x1 , . . . , ~xk ∈ Rn and c1 , . . . , ck ∈ R,

fA (c1~x1 + · · · + ck ~xk ) = c1 fA (~x1 ) + · · · + ck fA (~xk ).

Functions which preserve linear combinations are called linear transformations or linear map-
pings.

Definition 32.2: Linear Transformation


A function L : Rn → Rm is called a linear transformation (or a linear mapping) if for
every ~x, ~y ∈ Rn and for every s, t ∈ R we have

L(s~x + t~y ) = sL(~x) + tL(~y ).

For m = n, a linear transformation L : Rn → Rn is often called a linear operator on


Rn .

It follows immediately from Theorem 32.1 that every matrix transformation is a linear trans-
formation.

By taking s = t = 0 in the definition of a linear transformation, we find that

L(~0Rn ) = ~0Rm ,

that is, a linear transformation always sends the zero vector of the domain to the zero vector
of the codomain. By taking s = −1 and t = 0, we see that

L(−~x) = −L(~x)

so linear transformations preserve additive inverses as well.

Linear transformations are important throughout mathematics – in fact, we have seen them
in calculus.21 For differentiable functions f, g : R → R, and s, t ∈ R we have
    d/dx (sf (x) + tg(x)) = s d/dx f (x) + t d/dx g(x).

Example 32.2

Show that L : R2 → R2 defined by

L(x1 , x2 ) = (x1 − x2 , 2x1 + x2 )

is a linear transformation.
Solution. Let ~x, ~y ∈ R2 and s, t ∈ R. With ~x = [x1 , x2 ]^T and ~y = [y1 , y2 ]^T , we have

L(s~x + t~y ) = L(sx1 + ty1 , sx2 + ty2 )



= ((sx1 + ty1 ) − (sx2 + ty2 ), 2(sx1 + ty1 ) + (sx2 + ty2 ))
= (sx1 − sx2 , 2sx1 + sx2 ) + (ty1 − ty2 , 2ty1 + ty2 )
= s(x1 − x2 , 2x1 + x2 ) + t(y1 − y2 , 2y1 + y2 )
= sL(~x) + tL(~y ).

As L(s~x + t~y ) = sL(~x) + tL(~y ), we see that L is a linear transformation.

Note that we could have also noticed that for any ~x ∈ R2 ,

    L(~x) = [ x1 − x2  ] = [ 1 −1 ] [ x1 ]
            [ 2x1 + x2 ]   [ 2  1 ] [ x2 ]

which shows that L is a matrix transformation and hence a linear transformation.

To show that a transformation L : Rn −→ Rm is not a linear transformation, it is sufficient


to exhibit s, t ∈ R and ~x, ~y ∈ Rn such that L(s~x + t~y ) 6= sL(~x) + tL(~y ) as is done in the next
example.

Example 32.3

Show that L : R3 → R2 defined by L(x1 , x2 , x3 ) = (x1 + x2 + x3 , x3^2 + 3) is not linear.

21
It is important to always remember that linear algebra is far better than calculus.

Solution. Consider ~x = [1, 0, 0]^T and ~y = [0, 1, 0]^T . Then

L(~x + ~y ) = L(1, 1, 0) = (2, 3)

but
L(~x) + L(~y ) = L(1, 0, 0) + L(0, 1, 0) = (1, 3) + (1, 3) = (2, 6)
which shows that L is not linear (here we have taken s = t = 1).

Exercise 32.1
Show that L : R2 → R defined by L(~x) = k~xk is not linear.

Solution. To show that L is not linear, we must exhibit two vectors ~x, ~y ∈ R2 and two
scalars s, t ∈ R such that L(s~x + t~y ) 6= sL(~x) + tL(~y ). We know that the norm does not
generally preserve sums, so we will take s = t = 1 and choose two nonzero nonparallel vectors
~x, ~y ∈ R2 . Consider
    ~x = [1, 0]^T   and   ~y = [0, 1]^T .
Then
    L(~x + ~y ) = L(1, 1) = k[1, 1]^T k = √2
and
    L(~x) + L(~y ) = L(1, 0) + L(0, 1) = k[1, 0]^T k + k[0, 1]^T k = 1 + 1 = 2.
As we have found vectors ~x, ~y ∈ R2 such that L(~x + ~y ) 6= L(~x) + L(~y ), we conclude that L
is not linear.

Recall that a linear transformation always maps the zero vector of the domain to the
zero vector of the codomain. Thus in Example 32.3, we could have quickly noticed that
L(0, 0, 0) = (0, 3) 6= (0, 0) and concluded immediately that L was not linear. Note however,
that a function sending the zero vector of the domain to the zero vector of the codomain
does not guarantee that the function is linear – see Exercise 32.1.

Example 32.4

Let L : R2 → R4 be a linear transformation such that

L(1, 2) = (1, 2, 3, 4) and L(2, 3) = (1, 4, 0, −1)

Then
L(3, 5) = L(1, 2) + L(2, 3) = (1, 2, 3, 4) + (1, 4, 0, −1) = (2, 6, 3, 3).

In general, for a linear transformation L : Rn → Rm , if we are given L(~x1 ), . . . , L(~xk ) for


~x1 , . . . , ~xk ∈ Rn , then we can compute L(~x) for any ~x ∈ Span {~x1 , . . . , ~xk } since L pre-
serves linear combinations. In particular, if {~v1 , . . . , ~vn } is a basis for Rn and we know
L(~v1 ), . . . , L(~vn ), then we can compute L(~v ) for any ~v ∈ Rn which is an extremely powerful
property!

Indeed, from Example 32.4, the set

    { [1, 2]^T , [2, 3]^T }

is a basis for R2 . Thus, for any [x1 , x2 ]^T ∈ R2 we have

    [ 1 2 | x1 ]   −→     [ 1  2 | x1       ]  −R2   [ 1 2 | x1       ]  R1−2R2  [ 1 0 | −3x1 + 2x2 ]
    [ 2 3 | x2 ]  R2−2R1  [ 0 −1 | x2 − 2x1 ]   −→   [ 0 1 | 2x1 − x2 ]   −→     [ 0 1 | 2x1 − x2   ]

and so
    [x1 , x2 ]^T = (−3x1 + 2x2 ) [1, 2]^T + (2x1 − x2 ) [2, 3]^T .
It follows that
It follows that

L(x1 , x2 ) = (−3x1 + 2x2 )L(1, 2) + (2x1 − x2 )L(2, 3)


= (−3x1 + 2x2 )(1, 2, 3, 4) + (2x1 − x2 )(1, 4, 0, −1)
= (−x1 + x2 , 2x1 , −9x1 + 6x2 , −14x1 + 9x2 ).

Thus, by knowing just L(1, 2) and L(2, 3) we can compute L(~x) for any ~x ∈ R2 . Also note
that

    L(x1 , x2 ) = [ −x1 + x2    ]   [  −1 1 ]
                  [ 2x1         ] = [   2 0 ] [ x1 ]
                  [ −9x1 + 6x2  ]   [  −9 6 ] [ x2 ]
                  [ −14x1 + 9x2 ]   [ −14 9 ]
−14x1 + 9x2 −14 9
which shows that L is a matrix transformation.

Recall that Theorem 32.1 guarantees that every matrix transformation from Rn to Rm is a
linear transformation. We also noticed that the linear transformations from Examples 32.2

and 32.4 were matrix transformations, so it is natural to ask if every linear transformation
from Rn to Rm is a matrix transformation. The following theorem shows the answer is yes.

Theorem 32.2
If L : Rn → Rm is a linear transformation, then L is a matrix transformation with
corresponding matrix

[ L ] = [ L(~e1 ) · · · L(~en ) ] ∈ Mm×n (R),

that is, L(~x) = [ L ]~x for every ~x ∈ Rn .

Proof. Let ~x = [ x1 · · · xn ]T ∈ Rn . Then ~x = x1~e1 + · · · + xn~en . We have

L(~x) = L(x1~e1 + · · · + xn~en )


      = x1 L(~e1 ) + · · · + xn L(~en )              since L is linear
      = [ L(~e1 ) · · · L(~en ) ] [x1 , . . . , xn ]^T
= [ L ]~x.

Given a linear transformation L : Rn → Rm , we refer to [ L ] ∈ Mm×n (R) as the standard


matrix of L. Theorems 32.1 and 32.2 combine to give that L is linear if and only if it is a
matrix transformation.
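
Theorem 32.2 gives a direct recipe for the standard matrix: apply L to each standard basis
vector and use the images as columns. The sketch below (an illustration only, assuming
Python with NumPy is available) does this for the linear transformation of Example 32.2.

    import numpy as np

    # The linear transformation from Example 32.2: L(x1, x2) = (x1 - x2, 2*x1 + x2).
    def L(x):
        x1, x2 = x
        return np.array([x1 - x2, 2 * x1 + x2])

    # Standard matrix [L] = [ L(e1) L(e2) ] as in Theorem 32.2.
    n = 2
    std_matrix = np.column_stack([L(np.eye(n)[:, j]) for j in range(n)])
    print(std_matrix)   # [[ 1. -1.]
                        #  [ 2.  1.]]

    # L agrees with multiplication by its standard matrix.
    x = np.array([3.0, -5.0])
    print(np.allclose(L(x), std_matrix @ x))  # True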

Example 32.5

Let ~d ∈ R2 be nonzero and define L : R2 → R2 by L(~x) = proj_~d ~x for every ~x ∈ R2 .
Show L is linear, and then find the standard matrix of L with ~d = [−1, 3]^T .

Solution. We first show L is linear. Let ~x, ~y ∈ R2 and s, t ∈ R. We have

    L(s~x + t~y ) = proj_~d (s~x + t~y )
                 = ((s~x + t~y ) · ~d / k~dk^2) ~d
                 = s (~x · ~d / k~dk^2) ~d + t (~y · ~d / k~dk^2) ~d     by properties of the dot product
                 = s proj_~d ~x + t proj_~d ~y
                 = sL(~x) + tL(~y )

so L is linear. Now with ~d = [−1, 3]^T ,

    L(~e1 ) = proj_~d ~e1 = (~e1 · ~d / k~dk^2) ~d = (−1/10) [−1, 3]^T = [1/10, −3/10]^T
    L(~e2 ) = proj_~d ~e2 = (~e2 · ~d / k~dk^2) ~d = (3/10) [−1, 3]^T = [−3/10, 9/10]^T

so
    [ L ] = [ L(~e1 ) L(~e2 ) ] = [  1/10 −3/10 ]
                                 [ −3/10  9/10 ].

Note that if we take ~x = [1, 2]^T for example, we can compute the projection of ~x onto ~d = [−1, 3]^T
as
    L(~x) = proj_~d ~x = [  1/10 −3/10 ] [ 1 ] = [ −1/2 ]
                        [ −3/10  9/10 ] [ 2 ]   [  3/2 ],

that is, we can compute projections using matrix multiplication.
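
The standard matrix of proj_~d can be built column-by-column exactly as in Example 32.5.
The sketch below (an illustration only, assuming NumPy is available) also checks it against
the closed form (1/k~dk^2) ~d ~d^T, which follows from the definition of the projection.

    import numpy as np

    d = np.array([-1.0, 3.0])   # direction vector from Example 32.5

    def proj(x, d):
        # proj_d(x) = (x . d / ||d||^2) d
        return (x @ d) / (d @ d) * d

    # Columns of the standard matrix are the projections of e1 and e2.
    P = np.column_stack([proj(e, d) for e in np.eye(2)])
    print(P)                                         # [[ 0.1 -0.3]
                                                     #  [-0.3  0.9]]
    print(np.allclose(P, np.outer(d, d) / (d @ d)))  # True: same as (d d^T)/(d . d)
    print(P @ np.array([1.0, 2.0]))                  # [-0.5  1.5], as in Example 32.5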

Lecture 33

Example 33.1

Let L : R2 → R2 be defined by L(~x) = 2proj_~d ~x − ~x where ~d ∈ R2 is a nonzero vector.
Figure 33.1 shows that L represents a reflection in the line through the origin with
direction vector ~d. Show that L is linear, and then find the standard matrix of L with
~d = [1, 1]^T .

[Figure 33.1: Reflecting ~x in a line through the origin with direction vector ~d. Note
that 2proj_~d ~x − ~x = ~x − 2perp_~d ~x.]

Solution. We first show that L is linear using the fact that proj d~ ~x is linear. For
~x, ~y ∈ R2 and s, t ∈ R we have

L(s~x + t~y ) = 2proj d~ (s~x + t~y ) − (s~x + t~y )


= 2(s proj d~ ~x + t proj d~ ~y ) − s~x − t~y
= s(2proj d~ ~x − ~x) + t(2proj d~ ~y − ~y )
= sL(~x) + tL(~y )

so L is linear. Now with ~d = [1, 1]^T ,

    L(~e1 ) = 2proj_~d ~e1 − ~e1 = 2(1/2) [1, 1]^T − [1, 0]^T = [0, 1]^T
    L(~e2 ) = 2proj_~d ~e2 − ~e2 = 2(1/2) [1, 1]^T − [0, 1]^T = [1, 0]^T

and so
    [ L ] = [ L(~e1 ) L(~e2 ) ] = [ 0 1 ]
                                 [ 1 0 ].

Note that in Example 33.1, for any ~x = [x1 , x2 ]^T ∈ R2 ,

    L(~x) = [ 0 1 ] [ x1 ] = [ x2 ]
            [ 1 0 ] [ x2 ]   [ x1 ]

from which we see that reflecting a vector in the line with direction vector ~d = [1, 1]^T simply
swaps the coordinates of that vector.
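
Since L(~x) = 2proj_~d ~x − ~x, the standard matrix of this reflection is 2[ proj_~d ] − I. A short
sketch (an illustration only, assuming NumPy is available):

    import numpy as np

    d = np.array([1.0, 1.0])          # direction vector of the line from Example 33.1
    P = np.outer(d, d) / (d @ d)      # standard matrix of proj_d
    R = 2 * P - np.eye(2)             # standard matrix of the reflection
    print(R)                          # [[0. 1.]
                                      #  [1. 0.]]

    # Reflecting in this line swaps the coordinates, as observed above.
    print(R @ np.array([3.0, -7.0]))  # [-7.  3.]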

Exercise 33.1
Let L : R3 → R3 be defined by L(~x) = ~x − 2proj ~n ~x where ~n ∈ R3 is a nonzero vector.
Figure 33.2 shows that L represents a reflection in the plane through the origin with
normal vector ~n. Show that L is linear, and find the standard matrix of L if the plane
has scalar equation x1 − x2 + 2x3 = 0.

[Figure 33.2: Reflecting ~x in a plane through the origin with normal vector ~n.]

Solution. We first show that L is linear using the fact that projections are linear. For
~x, ~y ∈ R3 , and s, t ∈ R,
L(s~x + t~y ) = (s~x + t~y ) − 2proj ~n (s~x + t~y )
= s~x + t~y − 2(s proj ~n ~x + t proj ~n ~y )
= s(~x − 2proj ~n ~x) + t(~y − 2proj ~n ~y )
= sL(~x) + tL(~y )
and so L is linear. Now for the plane x1 − x2 + 2x3 = 0, we have that ~n = [1, −1, 2]^T . We compute

    L(~e1 ) = ~e1 − 2proj ~n ~e1 = ~e1 − 2(~e1 · ~n / k~nk^2) ~n = [1, 0, 0]^T − 2(1/6) [1, −1, 2]^T = [2/3, 1/3, −2/3]^T
    L(~e2 ) = ~e2 − 2proj ~n ~e2 = ~e2 − 2(~e2 · ~n / k~nk^2) ~n = [0, 1, 0]^T − 2(−1/6) [1, −1, 2]^T = [1/3, 2/3, 2/3]^T
    L(~e3 ) = ~e3 − 2proj ~n ~e3 = ~e3 − 2(~e3 · ~n / k~nk^2) ~n = [0, 0, 1]^T − 2(2/6) [1, −1, 2]^T = [−2/3, 2/3, −1/3]^T

Hence the standard matrix of L is


 
2/3 1/3 −2/3
[ L ] = [ L(~e1 ) L(~e2 ) L(~e3 ) ] =  1/3 2/3 2/3  .
 

−2/3 2/3 −1/3
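As a numerical cross-check (an aside, not part of the exercise), the reflection ~x − 2proj ~n ~x has standard matrix I − 2 ~n ~nT / k~nk^2, which can be evaluated directly. A minimal Python/NumPy sketch, assuming NumPy is available:

import numpy as np

n = np.array([1.0, -1.0, 2.0])                        # normal vector of the plane x1 - x2 + 2x3 = 0
L = np.eye(3) - 2.0 * np.outer(n, n) / np.dot(n, n)   # standard matrix of x |-> x - 2 proj_n(x)
print(np.round(L, 4))                                 # matches [ 2/3 1/3 -2/3 ; 1/3 2/3 2/3 ; -2/3 2/3 -1/3 ]

x = np.array([1.0, 1.0, 0.0])                         # a vector in the plane (1 - 1 + 0 = 0)
print(L @ x)                                          # [1. 1. 0.]: vectors in the plane are fixed by the reflection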

In Example 33.1 and Exercise 33.1, we required the objects we were reflecting in (a line and
a plane) to pass through the origin. If the line or plane does not contain the origin, then the
transformation does not send the zero vector to the zero vector, and hence is not linear.

Oftentimes we use more descriptive names for our linear transformations:

proj d~ : Rn → Rn is the projection onto d~ 6= ~0 in Rn
perp d~ : Rn → Rn is the perpendicular onto d~ 6= ~0 in Rn
refl ~n : Rn → Rn is the reflection in a hyperplane through the origin with normal vector ~n 6= ~0 in Rn
We are seeing that linear transformations (or equivalently, matrix transformations) give us a
way to geometrically understand the matrix–vector product. We have seen that projections
and reflections are both linear transformations, and we now look at some additional linear
transformations that are common in many fields, such as computer graphics.

We first consider rotations. Let Rθ : R2 → R2 be a counterclockwise rotation about the


origin by an angle of θ. To see that Rθ is linear, we use basic trigonometry to write ~x ∈ R2 as

~x = [ r cos φ   r sin φ ]T
where r ∈ R satisfies r = k~xk ≥ 0 and φ ∈ R is the angle ~x makes with the positive x1 −axis
measured counterclockwise (if ~x = ~0, then r = 0 and we may take φ to be any real number).
See Figure 33.3.

Since Rθ (~x) is obtained from rotating ~x counterclockwise about the origin, it is clear that
kRθ (~x)k = r and that Rθ (~x) makes an angle of θ + φ with the positive x1 −axis (this is
illustrated in Figure 33.3). Thus using the angle-sum formulas for sine and cosine, we have

Rθ (~x) = [ r cos(φ + θ)   r sin(φ + θ) ]T
       = [ r(cos φ cos θ − sin φ sin θ)   r(sin φ cos θ + cos φ sin θ) ]T
       = [ cos θ(r cos φ) − sin θ(r sin φ)   sin θ(r cos φ) + cos θ(r sin φ) ]T
       = [ cos θ  −sin θ ; sin θ  cos θ ] [ r cos φ   r sin φ ]T
       = [ cos θ  −sin θ ; sin θ  cos θ ] ~x

and we see that Rθ is a matrix transformation and thus a linear transformation. We also see that

[ Rθ ] = [ cos θ  −sin θ ; sin θ  cos θ ].

Figure 33.3: Rotating a vector in R2 .

Example 33.2

Find the vector that results from rotating ~x = [ 1 2 ]T counterclockwise about the origin
by an angle of π/6.
Solution. We have

Rπ/6 (~x) = [ Rπ/6 ]~x = [ cos(π/6)  −sin(π/6) ; sin(π/6)  cos(π/6) ] [ 1 2 ]T
          = [ √3/2  −1/2 ; 1/2  √3/2 ] [ 1 2 ]T = (1/2) [ √3 − 2   1 + 2√3 ]T.

Note that a clockwise rotation about the origin by an angle of θ is simply a counterclockwise
rotation about the origin by an angle of −θ. Thus a clockwise rotation by θ is given by the
linear transformation

[ R−θ ] = [ cos(−θ)  −sin(−θ) ; sin(−θ)  cos(−θ) ] = [ cos θ  sin θ ; −sin θ  cos θ ]
where we have used the fact that cos θ is an even function and sin θ is an odd function, that
is,
cos(−θ) = cos θ and sin(−θ) = − sin θ.
We briefly mention that we can generalize these results for rotations about a coordinate axis
in R3 . Consider the matrices

A = [ 1 0 0 ; 0 cos θ −sin θ ; 0 sin θ cos θ ],
B = [ cos θ 0 sin θ ; 0 1 0 ; −sin θ 0 cos θ ],
C = [ cos θ −sin θ 0 ; sin θ cos θ 0 ; 0 0 1 ].

Then

L1 : R3 → R3 defined by L1 (~x) = A~x is a counterclockwise rotation about the x1 − axis,


L2 : R3 → R3 defined by L2 (~x) = B~x is a counterclockwise rotation about the x2 − axis,
L3 : R3 → R3 defined by L3 (~x) = C~x is a counterclockwise rotation about the x3 − axis.

In fact, we can rotate about any line through the origin in R3 , but finding the standard
matrix of such a transformation is beyond the scope of this course.
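The 2 × 2 rotation matrix is easy to experiment with numerically. The following Python/NumPy sketch (an illustration, not part of the course material) reproduces Example 33.2 and checks that rotating by θ and then by −θ returns the identity:

import numpy as np

def rotation(theta):
    # standard matrix [ cos θ −sin θ ; sin θ cos θ ] of a counterclockwise rotation of R^2
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

x = np.array([1.0, 2.0])
print(rotation(np.pi / 6) @ x)                               # ~[-0.134  2.232] = (1/2)[sqrt(3) - 2, 1 + 2 sqrt(3)]
print(np.round(rotation(np.pi / 6) @ rotation(-np.pi / 6)))  # the identity matrix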

We next look at stretches and compressions. For t a positive real number, let

A = [ t 0 ; 0 1 ]

and define L : R2 → R2 by L(~x) = A~x for every ~x ∈ R2 . Then L is a matrix transformation
and hence a linear transformation. For ~x = [ x1 x2 ]T,

L(~x) = [ t 0 ; 0 1 ] [ x1 x2 ]T = [ tx1 x2 ]T.

If t > 1, then we say that L is a stretch in the x1 −direction by a factor of t, and if 0 < t < 1,
we say that L is a compression in the x1 −direction. A stretch in the x1 −direction is illus-
trated in Figure 33.4.

Note the requirement that t > 0. If t = 0, then L is actually a projection onto the x2 −axis,
and if t < 0, then L is a reflection in the x2 −axis followed by a stretch or compression by a
factor of −t > 0. A stretch or compression in the x2 −direction is defined in a similar way.

22
For the matrix B, notice that the negative sign is on the “other” instance of sin θ. The reason for
this is if one “stares” down the positive x2 −axis, then they see the x1 x3 −plane, however, the orientation is
backwards – the positive x1 −axis is to the left of the positive x3 −axis. Thus the roles of “clockwise” and
“counterclockwise” are reversed in this instance.

Figure 33.4: A stretch in the x1 −direction by a factor of t > 1.

We next consider dilations and contractions. For t ∈ R with t > 0, let

B = [ t 0 ; 0 t ]

and define L(~x) = B~x for every ~x ∈ R2 . Then L is a matrix transformation and thus a linear
transformation. For ~x = [ x1 x2 ]T,

L(~x) = [ t 0 ; 0 t ] [ x1 x2 ]T = [ tx1 tx2 ]T = t~x.

We see that L(~x) is simply a scalar multiple of ~x. We call L a dilation if t > 1 and we call
L a contraction if 0 < t < 1. If t = 1, then B is the identity matrix and L(~x) = ~x. Figure
33.5 illustrates a dilation.

Figure 33.5: A dilation by a factor of t > 1.

Finally, we consider shears. For s ∈ R, let

C = [ 1 s ; 0 1 ]

and define L : R2 → R2 by L(~x) = C~x for every ~x ∈ R2 . Then L is a matrix transformation
and hence a linear transformation. For ~x = [ x1 x2 ]T,

L(~x) = [ 1 s ; 0 1 ] [ x1 x2 ]T = [ x1 + sx2   x2 ]T

and we see that L is a shear in the x1 −direction by a factor of s (also referred to as a


horizontal shear by a factor of s). Figure 33.6 illustrates a shear in the x1 −direction for
s > 0.

Figure 33.6: A shear in the x1 −direction by a factor of s > 0.

Note that a shear in the x2 −direction (or a vertical shear) by a factor of s ∈ R has standard
matrix [ 1 0 ; s 1 ].
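The stretch, dilation and shear matrices above are simple enough to test numerically as well. This is illustrative Python/NumPy, with t and s chosen arbitrarily:

import numpy as np

t, s = 2.0, 0.5
stretch_x1 = np.array([[t, 0.0], [0.0, 1.0]])   # stretch in the x1-direction by a factor of t
dilation   = t * np.eye(2)                      # dilation by a factor of t
shear_x1   = np.array([[1.0, s], [0.0, 1.0]])   # shear in the x1-direction by a factor of s

x = np.array([1.0, 2.0])
print(stretch_x1 @ x)    # [2. 2.]  only the x1-coordinate is scaled
print(dilation @ x)      # [2. 4.]  every coordinate is scaled, i.e. t*x
print(shear_x1 @ x)      # [2. 2.]  x1 becomes x1 + s*x2 = 1 + 0.5(2) = 2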

Lecture 34

Operations on Linear Transformations


We now study linear transformations more algebraically. Given the relationship between
linear transformations and matrices, it shouldn’t be too much of a surprise that linear trans-
formations behave very much like matrices under the operations of addition and scalar mul-
tiplication.

Definition 34.1: Equality of Linear Transformations

Let L, M : Rn → Rm be (linear) transformations. If L(~x) = M (~x) for every ~x ∈ Rn ,


then we say L and M are equal and we write L = M . If for some ~x ∈ Rn we have that
L(~x) 6= M (~x), then L and M are not equal and we write L 6= M .

Note that if L, M : Rn → Rm are linear transformations, then


L = M ⇐⇒ L(~x) = M (~x) for every ~x ∈ Rn
⇐⇒ [ L ]~x = [ M ]~x for every ~x ∈ Rn
⇐⇒ [ L ] = [ M ] by the Matrices Equal Theorem.

Definition 34.2: Operations on Linear Transformations

Let L, M : Rn → Rm be (linear) transformations and let c ∈ R. We define


(L + M ) : Rn → Rm by
(L + M )(~x) = L(~x) + M (~x)
for every ~x ∈ Rn , and we define (cL) : Rn → Rm by

(cL)(~x) = cL(~x)

for every ~x ∈ Rn .

Example 34.1

Let L, M : R3 → R2 be linear transformations such that

L(x1 , x2 , x3 ) = (2x1 + x2 , x1 − x2 + x3 )
M (x1 , x2 , x3 ) = (x3 , x1 + 2x2 + 3x3 ).

Calculate L + M and −2L, that is, find expressions for (L + M )(x1 , x2 , x3 ) and
(−2L)(x1 , x2 , x3 ).

Solution. For ~x = [ x1 x2 x3 ]T ∈ R3 we have

(L + M )(~x) = L(~x) + M (~x)


= (2x1 + x2 , x1 − x2 + x3 ) + (x3 , x1 + 2x2 + 3x3 )
= (2x1 + x2 + x3 , 2x1 + x2 + 4x3 )

and

(−2L)(~x) = −2(2x1 + x2 , x1 − x2 + x3 ) = (−4x1 − 2x2 , −2x1 + 2x2 − 2x3 ).

It is not difficult to show that the transformations L + M and −2L derived in Example 34.1
are both linear. Computing the standard matrices for L and M gives

[ L ] = [ 2 1 0 ; 1 −1 1 ]   and   [ M ] = [ 0 0 1 ; 1 2 3 ]

and computing the standard matrices for L + M and −2L shows us that

[ L + M ] = [ 2 1 1 ; 2 1 4 ] = [ L ] + [ M ]   and   [ −2L ] = [ −4 −2 0 ; −2 2 −2 ] = −2[ L ].
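One quick way to convince yourself of these identities, before the general proof below, is to test them numerically on the matrices from Example 34.1. A small Python/NumPy check, offered only as an illustration:

import numpy as np

L = np.array([[2.0, 1.0, 0.0],
              [1.0, -1.0, 1.0]])    # [ L ] from Example 34.1
M = np.array([[0.0, 0.0, 1.0],
              [1.0, 2.0, 3.0]])     # [ M ]

x = np.array([1.0, 2.0, 3.0])       # an arbitrary test vector
print((L + M) @ x, L @ x + M @ x)   # both [ 7. 16.], consistent with [ L + M ] = [ L ] + [ M ]
print((-2 * L) @ x, -2 * (L @ x))   # both [-8. -4.], consistent with [ -2L ] = -2[ L ]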

Theorem 34.1
Let L, M : Rn → Rm be linear transformations and c ∈ R. Then
L + M : Rn → Rm and cL : Rn → Rm are linear transformations. In addition,

[ L + M ] = [ L ] + [ M ] and [ cL ] = c[ L ].

Proof. We prove the result for cL. For any ~x, ~y ∈ Rn and s, t ∈ R, we have
(cL)(s~x + t~y ) = cL(s~x + t~y )               by definition of cL
                = c( sL(~x) + tL(~y ) )         since L is linear
                = csL(~x) + ctL(~y )
                = s(cL)(~x) + t(cL)(~y )        by definition of cL
and we see that cL is a linear transformation. Now for any ~x ∈ Rn
[cL]~x = (cL)(~x) by definition of the standard matrix of cL
= cL(~x) by definition of cL
= c[ L ]~x by the definition of the standard matrix of L
from which we see that [ cL ] = c[ L ] by the Matrices Equal Theorem (Theorem 19.3).

As with vectors in Rn , matrices in Mm×n (R) and polynomials in Pn (R), the set of linear
transformations from Rn to Rm form a vector space under the operations of addition and
scalar multiplication.

Theorem 34.2
Let L, M, N : Rn → Rm be linear transformations and let c, d ∈ R. We have
V1. L + M : Rn → Rm is a linear transformation closure under addition

V2. L + M = M + L addition is commutative

V3. (L + M ) + N = L + (M + N ) addition is associative

V4. There exists a linear transformation 0 : Rn → Rm such that L + 0 = L for every


linear transformation L : Rn → Rm zero transformation

V5. For each linear transformation L : Rn → Rm there exists a linear transformation


(−L) : Rn → Rm such that L + (−L) = 0 additive inverse

V6. cL : Rn → Rm is a linear transformation closure under scalar multiplication

V7. c(dL) = (cd)L scalar multiplication is associative

V8. (c + d)L = cL + dL distributive law

V9. c(L + M ) = cL + cM distributive law

V10. 1L = L scalar multiplicative identity

The zero transformation 0 : Rn → Rm is such that 0(~x) = ~0Rm for every ~x ∈ Rn . It's not
difficult to verify that this transformation is linear, and that its standard matrix is given
by [ 0 ] = 0m×n . As usual, for two linear transformations L, M : Rn → Rm , we define
L − M = L + (−M ).

Aside from adding and scaling linear transformations, we can also compose them. We will
see that composition of linear transformations is closely tied to matrix multiplication.

Definition 34.3: Composition of Linear Transformations

Let L : Rn → Rm and M : Rm → Rp be (linear) transformations. The composition


M ◦ L : Rn → Rp is defined by

(M ◦ L)(~x) = M (L(~x))

for every ~x ∈ Rn .

The composition of two transformations is illustrated in Figure 34.1. It is important to note
that in order for M ◦ L to be defined, the domain of M must contain the codomain of L.

Figure 34.1: Composing two transformations

Example 34.2

Let L : R3 → R2 and M : R2 → R2 be linear transformations defined by L(x1 , x2 , x3 ) =


(x1 + x2 , x2 + x3 ) and M (x1 , x2 ) = (x1 − 3x2 , 2x1 ). Calculate M ◦ L.
Solution. We have

(M ◦ L)(x1 , x2 , x3 ) = M ( L(x1 , x2 , x3 ) )
= M (x1 + x2 , x2 + x3 )
= ( (x1 + x2 ) − 3(x2 + x3 ), 2(x1 + x2 ) )
= (x1 − 2x2 − 3x3 , 2x1 + 2x2 ).

Notice that M ◦ L is also a linear transformation with domain R3 and codomain R2 . In fact,
computing the standard matrices for L and M gives

[ L ] = [ 1 1 0 ; 0 1 1 ]   and   [ M ] = [ 1 −3 ; 2 0 ]

and computing their product gives

[ M ][ L ] = [ 1 −3 ; 2 0 ] [ 1 1 0 ; 0 1 1 ] = [ 1 −2 −3 ; 2 2 0 ]

which is the standard matrix for M ◦ L.
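The same observation can be tested numerically: composing the transformations and multiplying their standard matrices give identical results. A brief Python/NumPy sketch (illustrative only):

import numpy as np

L = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])     # [ L ] from Example 34.2
M = np.array([[1.0, -3.0],
              [2.0,  0.0]])         # [ M ]

print(M @ L)                        # [[ 1. -2. -3.]
                                    #  [ 2.  2.  0.]]  the standard matrix of M o L found above

x = np.array([1.0, 2.0, 3.0])
print(M @ (L @ x), (M @ L) @ x)     # both [-12.  6.]: applying L then M equals applying [ M ][ L ]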

Theorem 34.3
Let L : Rn → Rm and M : Rm → Rp be linear transformations. Then M ◦L : Rn → Rp
is a linear transformation and

[ M ◦ L ] = [ M ][ L ].

Proof. We first show that M ◦ L is linear. Let ~x, ~y ∈ Rn and s, t ∈ R. Then



(M ◦ L)(s~x + t~y ) = M ( L(s~x + t~y ) )              by definition of composition
                    = M ( sL(~x) + tL(~y ) )           since L is linear
                    = sM ( L(~x) ) + tM ( L(~y ) )     since M is linear
                    = s(M ◦ L)(~x) + t(M ◦ L)(~y )     by definition of composition

and we see that M ◦ L is linear. Now for any ~x ∈ Rn ,

[ M ◦ L ]~x = (M ◦ L)(~x)        by definition of the standard matrix of M ◦ L
            = M ( L(~x) )        by definition of composition
            = M ([ L ]~x)        by definition of the standard matrix of L
            = [ M ]([ L ]~x)     by definition of the standard matrix of M
            = ([ M ][ L ])~x

from which we see that [ M ◦ L ] = [ M ][ L ] by the Matrices Equal Theorem.

Example 34.3

Let L : R2 → R2 be a counterclockwise rotation about the origin by an angle of π/4


and let M : R2 → R2 be a projection onto the x1 −axis. Find the standard matrices
for M ◦ L and L ◦ M .
Solution. Since L and M are linear, we have

[ L ] = [ cos π/4  −sin π/4 ; sin π/4  cos π/4 ] = [ √2/2  −√2/2 ; √2/2  √2/2 ]
[ M ] = [ proj ~e1 ~e1   proj ~e1 ~e2 ] = [ 1 0 ; 0 0 ]

and thus

[ M ◦ L ] = [ M ][ L ] = [ 1 0 ; 0 0 ] [ √2/2  −√2/2 ; √2/2  √2/2 ] = [ √2/2  −√2/2 ; 0  0 ]
[ L ◦ M ] = [ L ][ M ] = [ √2/2  −√2/2 ; √2/2  √2/2 ] [ 1 0 ; 0 0 ] = [ √2/2  0 ; √2/2  0 ].

We notice in the previous example that although M ◦ L and L ◦ M are both defined,
[ M ◦ L ] 6= [ L ◦ M ] from which we conclude that M ◦ L and L ◦ M are not the same
linear transformation, that is, L and M do not commute. This shouldn't be surprising for
two reasons: first, the composition of linear transformations corresponds to multiplication
of matrices, and multiplication of matrices is not commutative; and second, you have seen
in your calculus courses that composition of functions does not commute in general. For example, if
f (x) = √x and g(x) = sin(x), then

f (g(x)) = √(sin x) 6= sin(√x) = g(f (x)).

Example 34.4

Let L, M : R2 → R2 be linear transformations defined by L(x1 , x2 ) = (2x1 +x2 , x1 +x2 )


and M (x1 , x2 ) = (x1 − x2 , −x1 + 2x2 ). Find [ M ◦ L ] and [ L ◦ M ].
Solution. Since L and M are linear, we have

[ L ] = [ L(~e1 ) L(~e2 ) ] = [ 2 1 ; 1 1 ]
[ M ] = [ M (~e1 ) M (~e2 ) ] = [ 1 −1 ; −1 2 ]

and thus

[ M ◦ L ] = [ M ][ L ] = [ 1 −1 ; −1 2 ] [ 2 1 ; 1 1 ] = [ 1 0 ; 0 1 ]
[ L ◦ M ] = [ L ][ M ] = [ 2 1 ; 1 1 ] [ 1 −1 ; −1 2 ] = [ 1 0 ; 0 1 ].

We see that [ M ◦ L ] = I = [ L ◦ M ], so M ◦ L = L ◦ M .

Example 34.4 shows that linear transformations do sometimes commute. Also, note that the
standard matrices [ L ] and [ M ] are inverses of each other. As we will see, this will imply
that L and M are inverses of one another.

Lecture 35

Inverse Linear Transformations


We have studied invertible matrices, and have seen that the inverse is only defined for square
matrices. We now study invertible linear transformations, which will only be defined for lin-
ear operators on Rn .23

Definition 35.1: Identity Operator

The linear operator Id on Rn defined by Id(~x) = ~x for every ~x ∈ Rn is called the


identity operator or identity transformation.

Computing the standard matrix for Id : Rn → Rn gives

[ Id ] = [ Id(~e1 ) · · · Id(~en ) ] = [ ~e1 · · · ~en ] = I.

Definition 35.2: Invertible Linear Operator


If L is a linear operator on Rn and there exists another linear operator M on Rn such
that M ◦ L = Id = L ◦ M , then we say that L is invertible and call M the inverse of
L, and write L−1 = M .

From Example 34.4, we see that L−1 = M (and equivalently that M −1 = L). We also see
that [ L ]−1 = [ M ] (and equivalently that [ M ]−1 = [ L ]).

Theorem 35.1
If L, M : Rn → Rn are linear operators, then M is the inverse of L if and only if [ M ]
is the inverse of [ L ].

Proof. We have

M is the inverse of L ⇐⇒ M ◦ L = Id = L ◦ M
⇐⇒ [ M ◦ L ] = [ Id ] = [ L ◦ M ]
⇐⇒ [ M ][ L ] = I = [ L ][ M ]
⇐⇒ [ M ] is the inverse of [ L ].
23
Recall, a linear operator on Rn is a linear transformation L : Rn → Rn , that is, a linear transformation
whose codomain is equal to its domain.

It follows from Theorem 35.1 that if L is an invertible linear operator on Rn , then

[ L−1 ] = [ L ]−1 .

Geometrically, given an invertible linear operator L on Rn , we can view


L−1 : Rn → Rn as “undoing” what L does.

Example 35.1

Recall that Rθ : R2 → R2 denotes a counterclockwise rotation about the origin through


an angle of θ. Describe the inverse transformation of Rθ and find its standard matrix.
Solution. The inverse of a counterclockwise rotation by an angle of θ is a counterclock-
wise rotation by an angle of −θ (that is, a clockwise rotation by an angle of θ). Thus,
the inverse transformation of Rθ is Rθ−1 = R−θ . As we have seen following Example 33.2,

[ R−θ ] = [ cos(−θ)  −sin(−θ) ; sin(−θ)  cos(−θ) ] = [ cos θ  sin θ ; −sin θ  cos θ ].

Note that we have just shown that [ Rθ ]−1 = [ R−θ ], that is,

[ cos θ  −sin θ ; sin θ  cos θ ]−1 = [ cos θ  sin θ ; −sin θ  cos θ ].

We could have used the Inversion Algorithm to compute [ Rθ ]−1 :

[ cos θ  −sin θ | 1  0 ; sin θ  cos θ | 0  1 ]   −→   [ 1  0 | cos θ  sin θ ; 0  1 | −sin θ  cos θ ]

but this is quite tedious. Indeed, understanding what multiplication by a square matrix does
geometrically can give us a fast way to decide if the matrix is invertible, and if so, what the
inverse of that matrix is.

Exercise 35.1
Let L : R2 → R2 be a linear transformation defined by L(x1 , x2 ) = (2x1 +5x2 , x1 +3x2 ).
Find L−1 , that is, find an expression for L−1 (x1 , x2 ).

Solution. We have

[ L ] = [ L(~e1 ) L(~e2 ) ] = [ 2 5 ; 1 3 ].

Applying the Matrix Inversion Algorithm gives

[ 2 5 | 1 0 ; 1 3 | 0 1 ]  −(R1 ↔ R2)→  [ 1 3 | 0 1 ; 2 5 | 1 0 ]  −(R2 − 2R1)→  [ 1 3 | 0 1 ; 0 −1 | 1 −2 ]
  −(−R2)→  [ 1 3 | 0 1 ; 0 1 | −1 2 ]  −(R1 − 3R2)→  [ 1 0 | 3 −5 ; 0 1 | −1 2 ].

Thus

[ 3 −5 ; −1 2 ] [ x1 x2 ]T = [ 3x1 − 5x2   −x1 + 2x2 ]T,

that is,
L−1 (x1 , x2 ) = (3x1 − 5x2 , −x1 + 2x2 ).
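Numerically, the same inverse can be obtained by inverting the standard matrix, since [ L−1 ] = [ L ]−1. A short Python/NumPy check, given as an aside to the by-hand Inversion Algorithm above:

import numpy as np

L = np.array([[2.0, 5.0],
              [1.0, 3.0]])          # [ L ] from Exercise 35.1
L_inv = np.linalg.inv(L)
print(L_inv)                        # [[ 3. -5.]
                                    #  [-1.  2.]]  so L^{-1}(x1, x2) = (3x1 - 5x2, -x1 + 2x2)
print(np.round(L @ L_inv))          # the identity matrix, confirming the computation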

The Kernel and the Range of a Linear Transformation


Definition 35.3: The Kernel of a Linear Transformation
Let L : Rn → Rm be a (linear) transformation. The kernel of L is

Ker (L) = {~x ∈ Rn | L(~x) = ~0}.

Note that Ker (L) ⊆ Rn . The kernel of L can also be called the nullspace of L, denoted by
Null (L).

Example 35.2

Let L : R2 → R2 be a linear transformation defined by

L(x1 , x2 ) = (x1 − x2 , −3x1 + 3x2 ).

Determine which of ~x1 = [ 00 ], ~x2 = [ 11 ] and ~x3 = [ 32 ] belong to Ker (L).


Solution. We compute

L(~x1 ) = L(0, 0) = (0 − 0, −3(0) + 3(0)) = (0, 0)


L(~x2 ) = L(1, 1) = (1 − 1, −3(1) + 3(1)) = (0, 0)
L(~x3 ) = L(3, 2) = (3 − 2, −3(3) + 3(2)) = (1, −3)

from which we deduce that ~x1 , ~x2 ∈ Ker (L) and ~x3 ∈/ Ker (L).

Definition 35.4: The Range of a Linear Transformation

Let L : Rn → Rm be a (linear) transformation. The range of L is

Range (L) = {L(~x) | ~x ∈ Rn }

Note that Range (L) ⊆ Rm .

Example 35.3

Let L : R2 → R3 be a linear transformation defined by

L(x1 , x2 ) = (x1 + x2 , 2x1 + x2 , 3x2 ).


Determine which of ~y1 = [ 2 3 3 ]T and ~y2 = [ 1 1 2 ]T belong to Range (L).

Solution. To see if ~y1 ∈ Range (L), we try to find ~x = [ xx12 ] ∈ R2 such that L(~x) = ~y1 .
Thus we need
L(x1 , x2 ) = (x1 + x2 , 2x1 + x2 , 3x2 ) = (2, 3, 3).
This leads to a system of equations

x1 + x2 = 2
2x1 + x2 = 3
3x2 = 3

Carrying the augmented matrix of this system to reduced row echelon form gives

[ 1 1 | 2 ; 2 1 | 3 ; 0 3 | 3 ]  −(R2 − 2R1)→  [ 1 1 | 2 ; 0 −1 | −1 ; 0 3 | 3 ]  −(R1 + R2, R3 + 3R2)→  [ 1 0 | 1 ; 0 −1 | −1 ; 0 0 | 0 ]  −(−R2)→  [ 1 0 | 1 ; 0 1 | 1 ; 0 0 | 0 ]

from which we see that x1 = x2 = 1 and so L(1, 1) = (2, 3, 3). Thus ~y1 ∈ Range (L).
For ~y2 , we seek ~x = [ xx12 ] ∈ R2 such that L(x1 , x2 ) = (1, 1, 2). A similar computation
leads to a system of equations with augmented matrix
   
[ 1 1 | 1 ; 2 1 | 1 ; 0 3 | 2 ]  −→  [ 1 1 | 1 ; 0 −1 | −1 ; 0 0 | −1 ].

As this system is inconsistent, there is no ~x = [ x1 x2 ]T ∈ R2 such that L(x1 , x2 ) = (1, 1, 2)
and so ~y2 ∈/ Range (L).

Note that in Example 35.3, to see if a vector ~y ∈ Range (L), we are ultimately checking if
the linear system of equations [ L ]~x = ~y is consistent, that is, if ~y ∈ Col ([ L ]). Recalling
that for a linear transformation L : Rn → Rm , L(~x) = [ L ]~x for every ~x ∈ Rn , the following
theorem should not be too surprising.

Theorem 35.2
Let L : Rn → Rm be a linear transformation with standard matrix [ L ]. Then

(1) Ker (L) = Null ([ L ]), and

(2) Range (L) = Col ([ L ]).

In particular, Ker (L) is a subspace of Rn and Range (L) is a subspace of Rm .

Proof.

(1) Since
~x ∈ Ker (L) ⇐⇒ L(~x) = ~0 ⇐⇒ [ L ]~x = ~0 ⇐⇒ ~x ∈ Null ([ L ]),
we have that Ker (L) = Null ([ L ]) and thus Ker (L) is a subspace of Rn .

(2) Since

~y ∈ Range (L) ⇐⇒ ~y = L(~x) for some ~x ∈ Rn


⇐⇒ ~y = [ L ]~x for some ~x ∈ Rn
⇐⇒ ~y ∈ Col ([ L ]),

we see that Range (L) = Col ([ L ]) and thus Range (L) is a subspace of Rm .

Exercise 35.2
Let L : Rn → Rm be a linear transformation. Without referring to Theorem 35.2, use
the Subspace Test to prove that

(1) Ker (L) is a subspace of Rn , and

(2) Range (L) is a subspace of Rm .

Proof.

(1) Since L is linear, L(~0Rn ) = ~0Rm so ~0Rn ∈ Ker (L). For ~x, ~y ∈ Ker (L), we have that
L(~x) = ~0 = L(~y ). Then, since L is linear

L(~x + ~y ) = L(~x) + L(~y ) = ~0 + ~0 = ~0

so ~x + ~y ∈ Ker (L) and Ker (L) is closed under vector addition. For c ∈ R, we again
use the linearity of L to obtain

L(c~x) = cL(~x) = c ~0 = ~0

showing that c~x ∈ Ker (L) so that Ker (L) is closed under scalar multiplication. Hence,
Ker (L) is a subspace of Rn .
(2) Since L is linear, L(~0Rn ) = ~0Rm so ~0Rm ∈ Range (L). For ~x, ~y ∈ Range (L), there exist
~u, ~v ∈ Rn such that ~x = L(~u) and ~y = L(~v ). Then since L is linear,

L(~u + ~v ) = L(~u) + L(~v ) = ~x + ~y

and so ~x + ~y ∈ Range (L). For c ∈ R, we use the linearity of L to obtain

L(c~u) = cL(~u) = c~x

and so c~x ∈ Range (L). Thus Range (L) is a subspace of Rm .

Example 35.4

Let L : R3 → R3 be a projection onto the line through the origin with direction vector
d~ = [ 1 1 1 ]T. Find a basis for Ker (L) and Range (L).

Solution. As L is linear, the standard matrix of L is

[ L ] = [ L(~e1 ) L(~e2 ) L(~e3 ) ] = [ proj d~ ~e1   proj d~ ~e2   proj d~ ~e3 ] = [ 1/3 1/3 1/3 ; 1/3 1/3 1/3 ; 1/3 1/3 1/3 ]


To find a basis for Ker (L), we solve the homogeneous system of equations given by
[ L ]~x = ~0. Carrying [ L ] to reduced row echelon form gives

[ 1/3 1/3 1/3 ; 1/3 1/3 1/3 ; 1/3 1/3 1/3 ]  −(R2 − R1, R3 − R1)→  [ 1/3 1/3 1/3 ; 0 0 0 ; 0 0 0 ]  −(3R1)→  [ 1 1 1 ; 0 0 0 ; 0 0 0 ]

and we see that

~x = s [ −1 1 0 ]T + t [ −1 0 1 ]T,   s, t ∈ R

so

{ [ −1 1 0 ]T , [ −1 0 1 ]T }
is a basis for Ker (L). From our work above, we see that the reduced row echelon form
of [ L ] has a leading one in the first column only, and so a basis for Range (L) is

{ [ 1/3 1/3 1/3 ]T }.

In Example 35.4, note that geometrically, Ker (L) is a plane through the origin (a two-
dimensional subspace) in R3 , and that Range (L) is a line through the origin (a one-dimensional
subspace) in R3 with direction vector d~ = [ 1 1 1 ]T. Figure 35.1 gives a more general geometric
interpretation of the kernel and the range of a linear transformation from Rn to Rm .

Figure 35.1: Visualizing the kernel and the range of a linear transformation .

Exercise 35.3
Find a basis for Ker (L) and Range (L) where L is the linear transformation defined
by
L(x1 , x2 , x3 ) = (x1 + x2 , x1 + x2 + x3 ).

Solution. We have

[ L ] = [ L(~e1 ) L(~e2 ) L(~e3 ) ] = [ 1 1 0 ; 1 1 1 ].

Carrying [ L ] to reduced row echelon form gives

[ 1 1 0 ; 1 1 1 ]  −(R2 − R1)→  [ 1 1 0 ; 0 0 1 ]

from which we see the solution to L(~x) = [ L ]~x = ~0 is

~x = t [ −1 1 0 ]T,   t ∈ R

and so

{ [ −1 1 0 ]T }

is a basis for Ker (L). As the reduced row echelon form of [ L ] has leading ones in the first
and last columns, a basis for Range (L) is

{ [ 1 1 ]T , [ 0 1 ]T }.
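For larger matrices it is convenient to compute such bases numerically. One common numerical convention uses the singular value decomposition, as in the Python/NumPy sketch below; this is an illustration of Theorem 35.2 rather than the by-hand row reduction used in these notes, and the basis vectors it returns are scaled differently.

import numpy as np

L = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 1.0]])        # [ L ] from Exercise 35.3

U, s, Vt = np.linalg.svd(L)            # full singular value decomposition
r = int(np.sum(s > 1e-10))             # numerical rank of [ L ]

kernel_basis = Vt[r:].T                # columns span Ker(L) = Null([ L ])
range_basis = U[:, :r]                 # columns span Range(L) = Col([ L ])
print(np.round(kernel_basis, 4))       # one column, proportional to (-1, 1, 0)
print(np.round(range_basis, 4))        # two columns, so Range(L) is all of R^2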

Lecture 36

Determinants, Adjugates and Matrix Inverses


Previously, we used the Matrix Inversion Algorithm to both decide if an n × n matrix A was
invertible and to compute A−1 if A was invertible. Now we study a number associated to an
n × n matrix A, called the determinant, and we will see how the determinant is related to
the invertibility. We begin with a 2 × 2 matrix.

Definition 36.1: The Determinant and the Adjugate: the 2 × 2 case

Let A = [ a b ; c d ] ∈ M2×2 (R). The determinant of A is

det A = | a b ; c d | = ad − bc

and the adjugate of A is

adj A = [ d −b ; −c a ].

Example 36.1

Consider A = [ 1 2 ; 3 4 ]. Then

det A = 1(4) − 2(3) = 4 − 6 = −2

and

adj A = [ 4 −2 ; −3 1 ].

Also,

A(adj A) = [ 1 2 ; 3 4 ] [ 4 −2 ; −3 1 ] = [ −2 0 ; 0 −2 ] = −2 [ 1 0 ; 0 1 ] = (det A)I
(adj A)A = [ 4 −2 ; −3 1 ] [ 1 2 ; 3 4 ] = [ −2 0 ; 0 −2 ] = −2 [ 1 0 ; 0 1 ] = (det A)I

From this we see

[ 1 2 ; 3 4 ] ( −(1/2) [ 4 −2 ; −3 1 ] ) = [ 1 0 ; 0 1 ]

so

A−1 = [ −2 1 ; 3/2 −1/2 ].

Theorem 36.1
Let A ∈ M2×2 (R). Then

A(adj A) = (det A)I = (adj A)A.

Moreover, A is invertible if and only if det A 6= 0 and in this case

A−1 = (1/det A) adj A.

Proof. Let A = [ a b ; c d ] ∈ M2×2 (R). Then det A = ad − bc and adj A = [ d −b ; −c a ]. Now

A(adj A) = [ a b ; c d ] [ d −b ; −c a ] = [ ad − bc  0 ; 0  ad − bc ] = (det A)I
(adj A)A = [ d −b ; −c a ] [ a b ; c d ] = [ ad − bc  0 ; 0  ad − bc ] = (det A)I

Assume then that det A 6= 0. From

A(adj A) = (det A)I = (adj A)A

we obtain

A ( (1/det A) adj A ) = I = ( (1/det A) adj A ) A

so

A−1 = (1/det A) adj A.
Thus det A 6= 0 implies that A is invertible and gives our formula for A−1 . We now show that if A is
invertible, then det A 6= 0. Assume for a contradiction that det A = 0. Since A is invertible,
A 6= 0 so at least one of a, b, c, d is not zero. Since

A(adj A) = (det A)I = 0I = 0,

we have

A [ d  −c ]T = [ 0 0 ]T   and   A [ −b  a ]T = [ 0 0 ]T.

Since not all of a, b, c, d are zero, we have that either

[ d  −c ]T 6= [ 0 0 ]T   or   [ −b  a ]T 6= [ 0 0 ]T

from which we see that the homogeneous system A~x = ~0 has a nontrivial solution, so A is
not invertible by the Invertible Matrix Theorem. This is a contradiction, so our assumption
that det A = 0 was incorrect, and we must have det A 6= 0.
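The 2 × 2 case is easy to test numerically. The sketch below (Python/NumPy, illustrative) reuses the matrix of Example 36.1:

import numpy as np

def adj2(A):
    # adjugate of a 2x2 matrix [ a b ; c d ] is [ d -b ; -c a ]
    a, b, c, d = A[0, 0], A[0, 1], A[1, 0], A[1, 1]
    return np.array([[d, -b], [-c, a]])

A = np.array([[1.0, 2.0], [3.0, 4.0]])       # the matrix from Example 36.1
print(np.linalg.det(A))                      # -2.0 (up to rounding)
print(A @ adj2(A))                           # [[-2. 0.], [0. -2.]] = (det A) I
print(adj2(A) / np.linalg.det(A))            # [[-2. 1.], [1.5 -0.5]] = A^{-1}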
We now consider how to compute the determinant of an n × n matrix. We will see that the
definition of the determinant of an n×n matrix is recursive - to compute such a determinant,
we will compute n determinants of size (n − 1) × (n − 1). This can be quite tedious by hand,
so we will also begin to explore how elementary row (and column) operations can greatly
reduce our work.

Definition 36.2: Cofactors of an n × n Matrix


Let A ∈ Mn×n (R) and let A(i, j) be the (n − 1) × (n − 1) matrix obtained from A by
deleting the ith row and jth column of A. The (i, j)-cofactor of A, denoted by Cij , is

Cij = (−1)i+j det A(i, j).

Example 36.2

Let A = [ 1 −2 3 ; 1 0 4 ; 4 1 1 ]. Then the (3, 2)-cofactor of A is

C32 = (−1)^(3+2) det A(3, 2) = (−1)^5 | 1 3 ; 1 4 | = (−1)(4 − 3) = −1,

and the (2, 2)-cofactor of A is

C22 = (−1)^(2+2) det A(2, 2) = (−1)^4 | 1 3 ; 4 1 | = 1(1 − 12) = −11.

Definition 36.3: The Determinant: the n × n case


Let A = [aij ] ∈ Mn×n (R). For any i = 1, . . . , n, we define the determinant of A as
det A = ai1 Ci1 + ai2 Ci2 + · · · + ain Cin
which we refer to as a cofactor expansion of A along the ith row of A. Equivalently,
for any j = 1, . . . , n,
det A = a1j C1j + a2j C2j + · · · + anj Cnj

which we refer to as a cofactor expansion of A along the jth column of A.

Note that we can do a cofactor expansion along any row or column we choose. This is illus-
trated in the next example.

Example 36.3

Compute det A where A = [ 1 2 −3 ; 4 −5 6 ; −7 8 9 ].

Solution. Doing a cofactor expansion along the first row (keeping track of the checkerboard pattern of signs (−1)^(i+j)) gives

det A = 1 | −5 6 ; 8 9 | − 2 | 4 6 ; −7 9 | − 3 | 4 −5 ; −7 8 |
      = 1(−45 − 48) − 2(36 + 42) − 3(32 − 35)
      = 1(−93) − 2(78) − 3(−3)
      = −93 − 156 + 9
      = −240

Alternatively, a cofactor expansion along the second column leads to

det A = −2 | 4 6 ; −7 9 | − 5 | 1 −3 ; −7 9 | − 8 | 1 −3 ; 4 6 |
      = −2(36 + 42) − 5(9 − 21) − 8(6 + 12)
      = −2(78) − 5(−12) − 8(−18)
      = −240 (as before)
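The recursive definition of the determinant translates directly into code. The following Python function (a teaching sketch, far slower than library routines for large matrices) expands along the first row exactly as in the example above:

import numpy as np

def det_cofactor(A):
    # determinant by cofactor expansion along the first row
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)   # delete row 1 and column j+1
        total += (-1) ** j * A[0, j] * det_cofactor(minor)       # sign (-1)^(1 + (j+1)) = (-1)^j
    return total

A = [[1, 2, -3], [4, -5, 6], [-7, 8, 9]]
print(det_cofactor(A), np.linalg.det(A))      # both -240.0 (the library value up to rounding)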

Exercise 36.1

Find det B if B = [ 1 0 −2 ; 0 3 4 ; 5 6 −7 ].

Solution. Expanding along the first column,

det B = 1(−1)^(1+1) | 3 4 ; 6 −7 | + 0(−1)^(2+1) | 0 −2 ; 6 −7 | + 5(−1)^(3+1) | 0 −2 ; 3 4 |
      = 1(−21 − 24) + 0 + 5(0 + 6)
      = −45 + 30
      = −15

The next example shows that performing a cofactor expansion along a row or column that
consists mainly of zeros can greatly reduce the work in computing the determinant.

Example 36.4

Find det A if A = [ 1 2 −1 3 ; 1 2 0 4 ; 0 0 0 3 ; −1 1 2 1 ].

Solution. Performing a cofactor expansion along the third row, we have

det A = | 1 2 −1 3 ; 1 2 0 4 ; 0 0 0 3 ; −1 1 2 1 | = −3 | 1 2 −1 ; 1 2 0 ; −1 1 2 |

To evaluate the determinant of the 3 × 3 matrix, we can do a cofactor expansion along
the third column. This gives

det A = −3 ( −1 | 1 2 ; −1 1 | + 2 | 1 2 ; 1 2 | )
      = −3(−1(1 + 2) + 2(2 − 2))
      = −3(−3 + 0)
      = 9

Lecture 37

Definition 37.1: The Cofactor Matrix and the Adjugate: the n × n case

Let A = [aij ] ∈ Mn×n (R)

1. Cij = (−1)i+j det A(i, j) is the (i, j)-cofactor of A,

2. The cofactor matrix of A is

cof A = [Cij ] ∈ Mn×n (R),

3. The adjugate of A is
adj A = [Cij ]T ∈ Mn×n (R).

Example 37.1

Find adj A if A = [ 1 2 3 ; 1 1 2 ; 3 4 5 ].

Solution. The cofactors of A are

C11 = | 1 2 ; 4 5 | = −3,    C12 = −| 1 2 ; 3 5 | = 1,     C13 = | 1 1 ; 3 4 | = 1,
C21 = −| 2 3 ; 4 5 | = 2,    C22 = | 1 3 ; 3 5 | = −4,     C23 = −| 1 2 ; 3 4 | = 2,
C31 = | 2 3 ; 1 2 | = 1,     C32 = −| 1 3 ; 1 2 | = 1,     C33 = | 1 2 ; 1 1 | = −1,

so

adj A = (cof A)T = [ −3 1 1 ; 2 −4 2 ; 1 1 −1 ]T = [ −3 2 1 ; 1 −4 1 ; 1 2 −1 ].

Note that in the previous example, we computed all of the cofactors of A. Thus we can
easily compute the determinant of A by doing a cofactor expansion along, say, the first row:

det A = a11 C11 + a12 C12 + a13 C13 = 1(−3) + 2(1) + 3(1) = 2.

Also observe that

A(adj A) = [ 1 2 3 ; 1 1 2 ; 3 4 5 ] [ −3 2 1 ; 1 −4 1 ; 1 2 −1 ] = [ 2 0 0 ; 0 2 0 ; 0 0 2 ] = 2I = (det A)I
(adj A)A = [ −3 2 1 ; 1 −4 1 ; 1 2 −1 ] [ 1 2 3 ; 1 1 2 ; 3 4 5 ] = [ 2 0 0 ; 0 2 0 ; 0 0 2 ] = 2I = (det A)I

More generally, let

A = [ a11 a12 a13 ; a21 a22 a23 ; a31 a32 a33 ].

Then

adj A = [ C11 C12 C13 ; C21 C22 C23 ; C31 C32 C33 ]T = [ C11 C21 C31 ; C12 C22 C32 ; C13 C23 C33 ]

so

A(adj A) = [ a11 C11 + a12 C12 + a13 C13    a11 C21 + a12 C22 + a13 C23    a11 C31 + a12 C32 + a13 C33 ;
             a21 C11 + a22 C12 + a23 C13    a21 C21 + a22 C22 + a23 C23    a21 C31 + a22 C32 + a23 C33 ;
             a31 C11 + a32 C12 + a33 C13    a31 C21 + a32 C22 + a33 C23    a31 C31 + a32 C32 + a33 C33 ]

and

(adj A)A = [ a11 C11 + a21 C21 + a31 C31    a12 C11 + a22 C21 + a32 C31    a13 C11 + a23 C21 + a33 C31 ;
             a11 C12 + a21 C22 + a31 C32    a12 C12 + a22 C22 + a32 C32    a13 C12 + a23 C22 + a33 C32 ;
             a11 C13 + a21 C23 + a31 C33    a12 C13 + a22 C23 + a32 C33    a13 C13 + a23 C23 + a33 C33 ]

The (1, 1)−, (2, 2)− and (3, 3)− entries of A(adj A) are respectively the cofactor expansions
along the first, second and third rows of A, and thus are each equal to det A. The (1, 1)−,
(2, 2)− and (3, 3)− entries of (adj A)A are respectively the cofactor expansions along the first,
second and third columns of A, and are thus each equal to det A. The entries of A(adj A)
and (adj A)A that are not on the main diagonal look like cofactor expansions, but they are
not (they are sometimes called false determinants). These always evaluate to zero.

The following theorem generalizes Theorem 36.1 for n × n matrices. The proof is omitted.

Theorem 37.1
Let A ∈ Mn×n (R). Then

A(adj A) = (det A)I = (adj A)A.

Moreover, A is invertible if and only if det A 6= 0. In this case,

A−1 = (1/det A) adj A.

Example 37.2

Find det A, adj A and A−1 if A = [ 1 1 2 ; 1 1 4 ; 1 2 4 ].

Solution. Using a cofactor expansion along the first row, we obtain

det A = 1 | 1 4 ; 2 4 | − 1 | 1 4 ; 1 4 | + 2 | 1 1 ; 1 2 |
      = 1(4 − 8) − 1(4 − 4) + 2(2 − 1)
      = −4 + 2
      = −2

Then the cofactors of A are

C11 = | 1 4 ; 2 4 | = −4,    C12 = −| 1 4 ; 1 4 | = 0,     C13 = | 1 1 ; 1 2 | = 1,
C21 = −| 1 2 ; 2 4 | = 0,    C22 = | 1 2 ; 1 4 | = 2,      C23 = −| 1 1 ; 1 2 | = −1,
C31 = | 1 2 ; 1 4 | = 2,     C32 = −| 1 2 ; 1 4 | = −2,    C33 = | 1 1 ; 1 1 | = 0,

so

adj A = (cof A)T = [ −4 0 1 ; 0 2 −1 ; 2 −2 0 ]T = [ −4 0 2 ; 0 2 −2 ; 1 −1 0 ]

and

A−1 = (1/det A) adj A = −(1/2) [ −4 0 2 ; 0 2 −2 ; 1 −1 0 ] = [ 2 0 −1 ; 0 −1 1 ; −1/2 1/2 0 ].
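Since every cofactor is itself a small determinant, the adjugate can also be built in a few lines of code. The sketch below (Python/NumPy, purely illustrative; it leans on np.linalg.det for the minors) reproduces Example 37.2 and checks Theorem 37.1:

import numpy as np

def adjugate(A):
    # transpose of the cofactor matrix
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    cof = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            cof[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return cof.T

A = np.array([[1.0, 1.0, 2.0], [1.0, 1.0, 4.0], [1.0, 2.0, 4.0]])   # from Example 37.2
print(np.round(adjugate(A)))                          # [[-4. 0. 2.], [0. 2. -2.], [1. -1. 0.]]
print(np.round(A @ adjugate(A)))                      # (det A) I = -2 I
print(np.round(adjugate(A) / np.linalg.det(A), 4))    # A^{-1}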

Elementary Row/Column Operations


After computing several determinants, we see that having a row or column consisting of
mostly zeros greatly simplifies our work (see Example 36.4). We now investigate how a
determinant changes after a matrix has elementary row operations (or elementary column
operations24,25 performed on it). Our goal is to use these operations to introduce rows and/or
columns with many zero entries.

24
Elementary column operations are the same as elementary row operations, but performed on the columns.
One may think of performing an elementary column operation on A as performing an elementary row
operation on AT .
25
When solving a linear system of equations by carrying the augmented matrix to reduced row echelon
form, you must perform elementary row operations, and not elementary column operations.

Example 37.3
Consider

A = [ 1 2 ; 1 4 ],   B = [ 2 1 ; 4 1 ],   C = [ 1 3 ; 1 5 ]   and   D = [ 2 2 ; 2 4 ]

with determinants

det A = 2,   det B = −2,   det C = 2   and   det D = 4.

Notice that B, C and D can each be derived from A by exactly one elementary column
operation.

A = [ 1 2 ; 1 4 ]  −(C1 ↔ C2)→  [ 2 1 ; 4 1 ] = B,   and det B = − det A
A = [ 1 2 ; 1 4 ]  −(C1 + C2 → C2)→  [ 1 3 ; 1 5 ] = C,   and det C = det A
A = [ 1 2 ; 1 4 ]  −(2C1 → C1)→  [ 2 2 ; 2 4 ] = D,   and det D = 2 det A

It appears that the determinant changes predictably under these elementary column
operations (the same holds for elementary row operations).

Theorem 37.2
Let A ∈ Mn×n (R).

(1) If A has a row (or column) of zeros, then det A = 0.

(2) If B is obtained from A by swapping two distinct rows (or two distinct columns),
then det B = − det A.

(3) If B is obtained from A by adding a multiple of one row to another row (or a
multiple of one column to another column) then det B = det A.

(4) If two distinct rows of A (or two distinct columns of A) are equal, then det A = 0.

(5) If B is obtained from A by multiplying a row (or a column) by c ∈ R, then


det B = c det A.

Note: do not perform elementary row operations and elementary column operations at the
same time. In particular, do not add a multiple of a row to a column, or swap a row with a
column. If you need to do both types of operations, do the row operations in one step and

the column operations in another.
We now use elementary row and column operations to simplify the taking of determinants.

Example 37.4

Find det A if A = [ 1 2 3 ; 4 5 6 ; 7 8 10 ].

Solution. Rather than immediately doing a cofactor expansion, we will perform ele-
mentary row operations to A to introduce two zeros in the first column, and then do
a cofactor expansion along that column.

det A = | 1 2 3 ; 4 5 6 ; 7 8 10 | = | 1 2 3 ; 0 −3 −6 ; 0 −6 −11 |   (R2 − 4R1, R3 − 7R1)
      = 1 | −3 −6 ; −6 −11 |

Of course, we could now evaluate the 2 × 2 determinant, but to include another example,
we will instead multiply the first column by a factor of −1/3 and then evaluate the
simplified determinant.

det A = | −3 −6 ; −6 −11 | = (−3) | 1 −6 ; 2 −11 |   (−(1/3)C1 → C1)
      = (−3)(−11 + 12) = −3.

A couple of things to note here. First, we are using “=” rather than “−→” when we perform
our elementary operations on A. This is because we are really working with determinants,
and provided we are making the necessary adjustments mentioned in Theorem 37.2, we will
maintain equality. Secondly, when we performed the operation −(1/3)C1 → C1 , a factor of −3
appeared rather than a factor of −1/3. To see why this is, consider

C = [ −3 −6 ; −6 −11 ]   and   B = [ 1 −6 ; 2 −11 ].

Since

C = [ −3 −6 ; −6 −11 ]  −(−(1/3)C1 → C1)→  [ 1 −6 ; 2 −11 ] = B

we see that B is obtained from C by multiplying the first column of C by −1/3. Thus by
Theorem 37.2

det B = −(1/3) det C

and so

det C = −3 det B

which is why we have

| −3 −6 ; −6 −11 | = (−3) | 1 −6 ; 2 −11 |.
We normally view this type of row or column operation as “factoring out” of that row or
column, and we omit writing this type of operation as we reduce.

Example 37.5

Let A = [ 1 a a^2 ; 1 b b^2 ; 1 c c^2 ]. Show that det A = (b − a)(c − a)(c − b).

Solution. We again introduce two zeros into the first column by performing elementary
row operations on A, and then do a cofactor expansion along that column.

det A = | 1 a a^2 ; 1 b b^2 ; 1 c c^2 | = | 1  a  a^2 ; 0  b − a  b^2 − a^2 ; 0  c − a  c^2 − a^2 |   (R2 − R1, R3 − R1)
      = 1 | (b − a)   (b − a)(b + a) ; (c − a)   (c − a)(c + a) |
      = (b − a)(c − a) | 1   b + a ; 1   c + a |
      = (b − a)(c − a)(c + a − b − a)
      = (b − a)(c − a)(c − b)

Again, notice that

| (b − a)   (b − a)(b + a) ; (c − a)   (c − a)(c + a) | = (b − a)(c − a) | 1   b + a ; 1   c + a |        (28)

results from removing a factor of b − a from the first row of the determinant on the left, and
removing a factor of c − a from the second row. These correspond to the row operations
(1/(b − a)) R1 → R1 and (1/(c − a)) R2 → R2 . It is natural to ask what happens if a = b or a = c since it
would appear that we are dividing by zero in these cases. However, if a = b or a = c, we see
that both sides of (28) evaluate to zero, so that we still have equality.

Exercise 37.1

Consider A = [ x x 1 ; x 1 x ; 1 x x ]. For what values of x ∈ R does A fail to be invertible?

Solution. A fails to be invertible exactly when det A = 0. Thus we solve

0 = det A = | x x 1 ; x 1 x ; 1 x x | = | 0   x − x^2   1 − x^2 ; 0   1 − x^2   x − x^2 ; 1   x   x |   (R1 − xR3, R2 − xR3)
          = 1 | x(1 − x)   (1 + x)(1 − x) ; (1 + x)(1 − x)   x(1 − x) |
          = (1 − x)^2 | x   1 + x ; 1 + x   x |
          = (1 − x)^2 (x^2 − (1 + x)^2) = (1 − x)^2 (x^2 − 1 − 2x − x^2)
          = −(1 − x)^2 (1 + 2x)

so A is not invertible exactly when −(1 − x)^2 (1 + 2x) = 0, that is, when x = 1 or x = −1/2.

Example 37.6

Compute det A if A = [ 1 0 0 0 ; 2 3 0 0 ; 4 5 6 0 ; 7 8 9 10 ].

Solution.

det A = 1 | 3 0 0 ; 5 6 0 ; 8 9 10 | = 1(3) | 6 0 ; 9 10 | = 1(3)(6)(10) = 180.

Note that in the previous example, det A is just the product of the entries on the main
diagonal (see footnote 26).

Definition 37.2: Upper and Lower Triangular Matrices

Let A ∈ Mm×n (R). A is called upper triangular if every entry below the main diagonal
is zero. A is called lower triangular if every entry above the main diagonal is zero.

Example 37.7
The matrices

[ 4 −7 ; 0 3 ; 0 0 ],   [ 1 2 3 ; 0 4 10 ; 0 0 −2 ]   and   [ 0 0 ; 0 0 ]

are upper triangular, and the matrices

[ 3 0 ; 2 −4 ],   [ 0 0 0 ; 1 2 0 ; −1 3 4 ]   and   [ 0 0 ; 0 0 ]

are lower triangular.

26 Recall that for A = [aij ] ∈ Mm×n (R), the main diagonal of A consists of the entries a11 , a22 , . . . , akk
with k being the minimum of m and n.

Theorem 37.3
If A = [aij ] ∈ Mn×n (R) is a triangular matrix (upper or lower triangular), then
det A = a11 a22 · · · ann ,   the product of the entries on the main diagonal.

Lecture 38

Properties of Determinants
In this section, we examine the algebraic properties of the determinant. We will see that the
determinant behaves well with scalar multiplication and matrix multiplication, but not with
matrix addition. Our first result shows how the determinant behaves with respect to scalar
multiplication.

Theorem 38.1
If A ∈ Mn×n (R) and k ∈ R, then

det(kA) = k n det A.

Proof. If k = 0, then det(kA) = det(0n×n ) = 0 and k^n det A = 0(det A) = 0 so the result
holds. If k 6= 0, then we perform (1/k)Ri → Ri on each of the n rows of kA, which gives the
result by Theorem 37.2.

Example 38.1

Find (det A)(det B) and det(AB) where A = [ 1 2 ; 3 4 ] and B = [ 1 1 ; −1 2 ].

Solution. We have

(det A)(det B) = (4 − 6)(2 − (−1)) = −2(3) = −6,

and

det(AB) = | −1 5 ; −1 11 | = −11 − (−5) = −6.

Theorem 38.2
If A, B ∈ Mn×n (R), then det(AB) = (det A)(det B).

Note that Theorem 38.2 says that for n×n matrices, the determinant distributes over matrix
multiplication. Since multiplication of real numbers is commutative, we have
det(AB) = (det A)(det B) = (det B)(det A) = det(BA)
for any A, B ∈ Mn×n (R). This means that even though A and B do not commute in general,
we are guaranteed that det(AB) = det(BA).
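A quick numerical experiment makes this concrete: for randomly chosen matrices, det(AB), (det A)(det B) and det(BA) all agree even though AB and BA generally do not. Illustrative Python/NumPy:

import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-3, 4, size=(3, 3)).astype(float)   # a random 3x3 integer matrix
B = rng.integers(-3, 4, size=(3, 3)).astype(float)

print(np.allclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B)))   # True
print(np.allclose(np.linalg.det(A @ B), np.linalg.det(B @ A)))                  # True
print(np.allclose(A @ B, B @ A))                                                # almost always False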

Note that Theorem 38.2 extends to more than two matrices. For A1 , A2 , . . . , Ak ∈ Mn×n (R),

det(A1 A2 · · · Ak ) = (det A1 )(det A2 ) · · · (det Ak ).

In particular, if A1 = A2 = · · · = Ak = A for any positive integer k, then we obtain

det(Ak ) = (det A)k .

We now use Theorem 38.2 to compute the determinant of the inverse of a matrix.

Theorem 38.3
Let A ∈ Mn×n (R) be invertible. Then
det(A−1 ) = 1/det A.

Proof. For an invertible matrix A we have,

(det A)(det(A−1 )) = det(AA−1 ) = det I = 1

by Theorem 38.2, and since A invertible implies det A 6= 0, we obtain

det(A−1 ) = 1/det A.
For an invertible matrix A, we define A^(−k) = (A^(−1))^k for any positive integer k and we define
A^0 = I. Thus

det(A^(−k)) = det( (A^(−1))^k ) = ( det(A^(−1)) )^k = ( (det A)^(−1) )^k = (det A)^(−k)

and

det(A^0) = det I = 1 = 1^0 = (det A)^0 .

It follows that

det(A^k) = (det A)^k

for any integer k, where k ≤ 0 requires that A be invertible.

Recalling Theorem 22.4, we have that if A1 , A2 , . . . , Ak ∈ Mn×n (R) are invertible, then the
product A1 A2 · · · Ak is invertible and

(A1 A2 · · · Ak )^(−1) = Ak^(−1) · · · A2^(−1) A1^(−1) .

The next example shows that if a product of n × n matrices is invertible, then each matrix in
the product is invertible.

Example 38.2

Let A1 , A2 , . . . , Ak ∈ Mn×n (R) be such that A1 A2 · · · Ak is invertible. Then

0 6= det(A1 A2 · · · Ak ) = (det A1 )(det A2 ) · · · (det Ak )

and so for i = 1, 2, . . . , k, we have that det Ai 6= 0 and thus Ai is invertible for


i = 1, . . . , k.

Theorem 38.4
Let A ∈ Mn×n (R). Then det(AT ) = det(A).

Example 38.3

If det(A) = 3, det(B) = −2 and det(C) = 4 for A, B, C ∈ Mn×n (R), find


det(A2 B T C −1 B 2 (A−1 )2 ).
Solution. We have

det(A2 B T C −1 B 2 (A−1 )2 ) = det(A2 ) det(B T ) det(C −1 ) det(B 2 ) det((A−1 )2 )
= (det A)^2 (det B) (1/det C) (det B)^2 (1/(det A)^2)
= (det B)^3 / det C
= (−2)^3 / 4 = −8/4 = −2.

As mentioned earlier, the determinant does not behave well with matrix addition.

Example 38.4

Let A = [ 1 0 ; 0 0 ] and B = [ 0 0 ; 0 1 ]. Then

det A + det B = 0 + 0 = 0

but

det(A + B) = det I = 1,

so for A, B ∈ Mn×n (R), det(A + B) 6= det A + det B in general, that is, the determinant
does not distribute over matrix addition.

Application: Polynomial Interpolation


During experiments, data is often observed, measured and recorded in the form (x, y) where
x is the independent (or control) variable and y is the dependent (or responding) variable.
Given a set of data points (x1 , y1 ), (x2 , y2 ), . . . , (xn , yn ), we seek a polynomial p(x) such that
yi = p(xi ) for each i = 1, . . . , n. We can then use the polynomial to approximate y values
corresponding to other x values. Since two distinct points determine a line and three distinct
points determine a quadratic (provided they don’t all lie on a line), given n data points we
seek a polynomial of degree n − 1.

Example 38.5

Find a cubic polynomial p(x) whose graph passes through each of the points (−2, −5),
(−1, 4), (1, 4) and (3, 60).
Solution. Let p(x) = a0 + a1 x + a2 x2 + a3 x3 for a0 , a1 , a2 , a3 ∈ R. For each data point
(xi , yi ), we evaluate the equation p(xi ) = yi .

(−2, −5) : a0 − 2a1 + 4a2 − 8a3 = −5


(−1, 4) : a0 − a1 + a2 − a3 = 4
(1, 4) : a0 + a1 + a2 + a3 = 4
(3, 60) : a0 + 3a1 + 9a2 + 27a3 = 60

Converting to matrix notation, we obtain

[ 1 −2 4 −8 ; 1 −1 1 −1 ; 1 1 1 1 ; 1 3 9 27 ] [ a0 a1 a2 a3 ]T = [ −5 4 4 60 ]T

Solving the system gives a0 = 3, a1 = −2, a2 = 1 and a3 = 2, that is, p(x) =
3 − 2x + x^2 + 2x^3 .

More generally, given n data points (x1 , y1 ), (x2 , y2 ), . . . , (xn , yn ), we construct a polynomial
p(x) = a0 + a1 x + a2 x2 + · · · + an−1 xn−1 such that p(xi ) = yi for each i = 1, . . . , n. This gives
the system of equations

a0 + a1 x1 + a2 x1^2 + · · · + an−1 x1^(n−1) = y1
a0 + a1 x2 + a2 x2^2 + · · · + an−1 x2^(n−1) = y2
        ...
a0 + a1 xn + a2 xn^2 + · · · + an−1 xn^(n−1) = yn

whose matrix equation is

[ 1  x1  x1^2  · · ·  x1^(n−1) ; 1  x2  x2^2  · · ·  x2^(n−1) ; . . . ; 1  xn  xn^2  · · ·  xn^(n−1) ] [ a0  a1  · · ·  an−1 ]T = [ y1  y2  · · ·  yn ]T     (29)

For x1 , x2 , . . . , xn ∈ R and n ≥ 2, the n × n matrix

A = [ 1  x1  x1^2  · · ·  x1^(n−1) ; 1  x2  x2^2  · · ·  x2^(n−1) ; . . . ; 1  xn  xn^2  · · ·  xn^(n−1) ]
is called a Vandermonde matrix. For n = 2, the Vandermonde matrix is

A = [ 1 x1 ; 1 x2 ]

with det A = x2 − x1 . For n = 3, we have

A = [ 1 x1 x1^2 ; 1 x2 x2^2 ; 1 x3 x3^2 ]

with det A = (x3 − x1 )(x3 − x2 )(x2 − x1 ) (see Example 37.5). For the n × n Vandermonde
matrix A, we have

det A = ∏_{1≤i<j≤n} (xj − xi ),

that is, det A is the product of the terms (xj − xi ) where j > i and i, j both lie between
1 and n inclusively. It follows that the n × n Vandermonde matrix is invertible if and only
if x1 , x2 , . . . , xn are all distinct and that in this case, Equation (29) has a unique solution.
This shows the following:

Theorem 38.5
For the n data points (x1 , y1 ), (x2 , y2 ), . . . , (xn , yn ) where x1 , x2 , . . . , xn are all distinct,
there exists a unique polynomial of degree n − 1
p(x) = a0 + a1 x + a2 x2 + · · · + an−1 xn−1
satisfying p(xi ) = yi for each i = 1, 2, . . . , n.

Example 38.6
A car manufacturing company uses a wind tunnel to test the force due to air resistance
experienced by the car windshield. The following data was collected:

Air velocity (m/s) 20 33 45


Force on windshield (N) 200 310 420

Construct a quadratic polynomial to model this data, and use it to predict the force
due to air resistance from a wind speed of 40m/s.
Solution. Let p(x) = a0 + a1 x + a2 x2 where a0 , a1 , a2 ∈ R. Using our data points
(20, 200), (33, 310) and (45, 420) we obtain the system of equations in matrix notation

[ 1 20 400 ; 1 33 1089 ; 1 45 2025 ] [ a0 a1 a2 ]T = [ 200 310 420 ]T

The determinant of the coefficient matrix is (45 − 20)(45 − 33)(33 − 20) = 25 · 12 · 13 =
3900 and, computing the nine cofactors as in Example 37.1, the adjugate is

[ 17820  −22500  8580 ; −936  1625  −689 ; 12  −25  13 ]
so

[ a0 a1 a2 ]T = (1/3900) [ 17820 −22500 8580 ; −936 1625 −689 ; 12 −25 13 ] [ 200 310 420 ]T = [ 642/13   209/30   11/390 ]T

Thus

p(x) = 642/13 + (209/30) x + (11/390) x^2

When x = 40, we have

p(40) = 642/13 + (209/30)(40) + (11/390)(40)^2 = 14554/39 ≈ 373.18

When the air velocity is 40 m/s, the windshield experiences approximately 373.18 N of
force.

We solved the system in Example 38.6 by using determinants and adjugates, but it is plain
to see that the computations are messy. Carrying the augmented matrix to reduced row
echelon form is not much better:

[ 1 20 400 | 200 ; 1 33 1089 | 310 ; 1 45 2025 | 420 ]
  −(R2 − R1, R3 − R1)→      [ 1 20 400 | 200 ; 0 13 689 | 110 ; 0 25 1625 | 220 ]
  −((1/13)R2)→              [ 1 20 400 | 200 ; 0 1 53 | 110/13 ; 0 25 1625 | 220 ]
  −(R1 − 20R2, R3 − 25R2)→  [ 1 0 −660 | 400/13 ; 0 1 53 | 110/13 ; 0 0 300 | 110/13 ]
  −((1/300)R3)→             [ 1 0 −660 | 400/13 ; 0 1 53 | 110/13 ; 0 0 1 | 11/390 ]
  −(R1 + 660R3, R2 − 53R3)→ [ 1 0 0 | 642/13 ; 0 1 0 | 209/30 ; 0 0 1 | 11/390 ].
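In practice one would hand this computation to a linear-algebra library. The sketch below (Python/NumPy, illustrative only) builds the Vandermonde matrix for the data of Example 38.6 and solves the system directly:

import numpy as np

xs = np.array([20.0, 33.0, 45.0])                 # air velocities
ys = np.array([200.0, 310.0, 420.0])              # measured forces

V = np.vander(xs, 3, increasing=True)             # rows [1, x_i, x_i^2]: the Vandermonde matrix
a = np.linalg.solve(V, ys)                        # coefficients a0, a1, a2
print(a)                                          # approx [49.3846, 6.9667, 0.0282] = [642/13, 209/30, 11/390]
print(a @ np.array([1.0, 40.0, 40.0 ** 2]))       # approx 373.18, the predicted force at 40 m/s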

Lecture 39

Eigenvalues and Eigenvectors


For A ∈ Mm×n (R), ~x ∈ Rn and ~b ∈ Rm , we have seen that the equation A~x = ~b has
been central to our study of linear algebra. We now focus on a particular instance of this
equation where ~b is a scalar multiple of ~x. This of course requires ~b ∈ Rn and it follows that
A ∈ Mn×n (R). Thus, given A ∈ Mn×n (R), we seek a nonzero vector ~x and a scalar λ such
that
A~x = λ~x.
The values of λ and ~x will depend on A. We insist ~x 6= ~0 as otherwise the above equation
becomes A~0 = λ~0 which trivially holds for any scalar λ.

Definition 39.1: Eigenvalues and Eigenvectors

For A ∈ Mn×n (R), a scalar λ is an eigenvalue of A if A~x = λ~x for some nonzero
vector ~x. The vector ~x is then called an eigenvector of A corresponding to λ.

Example 39.1

If A = [ −3/5 4/5 ; 4/5 3/5 ] and ~x = [ 1 2 ]T, then

A~x = [ −3/5 4/5 ; 4/5 3/5 ] [ 1 2 ]T = [ 1 2 ]T = 1~x,

and so λ = 1 is an eigenvalue of A and ~x = [ 1 2 ]T is a corresponding eigenvector.

Recalling that every A ∈ Mn×n (R) is the standard matrix of a linear operator on Rn , we can
gain some geometric intuition about the eigenvalues and eigenvectors of a matrix.

Example 39.2

Let the linear operator L on R2 be a reflection in the x2 −axis with standard matrix

A = [ L ] = [ −1 0 ; 0 1 ].

Thinking geometrically, we see that the reflection of ~e1 in the x2 −axis is −~e1 , that is,
A~e1 = −~e1 = (−1)~e1 so λ = −1 is an eigenvalue of A with corresponding eigenvector ~e1 .
Similarly, we see A~e2 = ~e2 = 1~e2 , so λ = 1 is an eigenvalue of A with corresponding

eigenvector ~e2 . In fact, any nonzero vector lying on the x1 −axis is an eigenvector
corresponding to λ = −1 and any nonzero vector lying on the x2 −axis is an eigenvector
corresponding to λ = 1.

How do we find eigenvalues and eigenvectors for A ∈ Mn×n (R)? For a nonzero vector ~x and
scalar λ, we have that λ is an eigenvalue of A with corresponding eigenvector ~x if and only
if
A~x = λ~x ⇐⇒ A~x − λ~x = ~0 ⇐⇒ A~x − λI~x = ~0 ⇐⇒ (A − λI)~x = ~0.
Thus we will consider the homogeneous system (A − λI)~x = ~0. Since ~x 6= ~0, we require non-
trivial solutions to this system, and since A − λI is an n × n matrix, the Invertible Matrix
Theorem gives that A − λI cannot be invertible, and so det(A − λI) = 0. This verifies the
following theorem.

Theorem 39.1
Let A ∈ Mn×n (R). A number λ is an eigenvalue of A if and only if λ satisfies the
equation
det(A − λI) = 0.
If λ is an eigenvalue of A, then all nonzero solutions of the homogeneous system of
equations
(A − λI)~x = ~0
are the eigenvectors of A corresponding to λ.

Theorem 39.1 indicates that to find the eigenvalues and corresponding eigenvectors of an
n × n matrix A, we first find all scalars λ so that det(A − λI) = 0 which will be our eigen-
values. Then for each eigenvalue λ of A, we find the nullspace of A − λI by solving the
homogeneous system (A − λI)~x = ~0. The nonzero vectors of Null (A − λI) will be the set of
eigenvectors of A corresponding to λ. We make the following definition.

Definition 39.2: Characteristic Polynomial

Let A ∈ Mn×n (R). The characteristic polynomial of A is

CA (λ) = det(A − λI).

We note that λ is an eigenvalue of A if and only if CA (λ) = 0. As we will see, CA (λ) is


indeed a polynomial. Since A ∈ Mn×n (R), CA (λ) will have real coefficients, but may have
non–real roots.

Example 39.3

Find the eigenvalues and all corresponding eigenvectors for A = [ 1 2 ; −1 4 ].

Solution. We first compute the characteristic polynomial of A.

CA (λ) = det(A − λI) = | 1 − λ   2 ; −1   4 − λ | = (1 − λ)(4 − λ) − 2(−1)
       = 4 − 5λ + λ^2 + 2 = λ^2 − 5λ + 6 = (λ − 2)(λ − 3).

Now λ is an eigenvalue of A if and only if CA (λ) = 0, that is, if and only if (λ − 2)(λ − 3) =
0. Thus λ1 = 2 and λ2 = 3 are the eigenvalues of A. To find the eigenvectors of A
corresponding to λ1 = 2, we solve the homogeneous system (A − 2I)~x = ~0.

A − 2I = [ −1 2 ; −1 2 ]  −(R2 − R1, −R1)→  [ 1 −2 ; 0 0 ]

so

~x = [ 2t   t ]T = t [ 2 1 ]T,   t ∈ R.

Thus the eigenvectors of A corresponding to λ1 = 2 are

t [ 2 1 ]T,   t ∈ R, t 6= 0.

To find the eigenvectors of A corresponding to λ2 = 3, we solve the homogeneous
system (A − 3I)~x = ~0.

A − 3I = [ −2 2 ; −1 1 ]  −(R2 − (1/2)R1, −(1/2)R1)→  [ 1 −1 ; 0 0 ]

so

~x = [ s   s ]T = s [ 1 1 ]T,   s ∈ R

and thus the eigenvectors of A corresponding to λ2 = 3 are

s [ 1 1 ]T,   s ∈ R, s 6= 0.

In Example 39.3, we see that the eigenvectors of a matrix A ∈ Mn×n (R) associated to a
given eigenvalue λ are the nonzero vectors of the subspace Null (A − λI) of Rn .
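Eigenvalue computations are also available in standard libraries, which is convenient for checking hand calculations such as Example 39.3. A minimal Python/NumPy sketch (note that the library scales eigenvectors to unit length and may list the eigenvalues in a different order):

import numpy as np

A = np.array([[1.0, 2.0], [-1.0, 4.0]])       # the matrix from Example 39.3
eigvals, eigvecs = np.linalg.eig(A)
print(eigvals)                                # 2.0 and 3.0 (possibly in a different order)
print(eigvecs)                                # columns are corresponding eigenvectors, scaled to unit length

for lam, v in zip(eigvals, eigvecs.T):
    print(np.allclose(A @ v, lam * v))        # True for each eigenpair: A v = lambda v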

Lecture 40

Definition 40.1: Eigenspace

Let λ be an eigenvalue of A ∈ Mn×n (R). The set containing all of the eigenvectors
of A corresponding to λ together with the zero vector is called the eigenspace of A
corresponding to λ, and is denoted by Eλ (A).

We note that the set of all eigenvectors of A corresponding to λ together with the zero
vector is simply the nullspace of A − λI. Thus Eλ (A) = Null (A − λI). This leads to the
following result.

Theorem 40.1
Let λ be an eigenvalue of A ∈ Mn×n (R). If λ ∈ R, then Eλ (A) is a subspace of Rn , and
if λ 6∈ R, then Eλ (A) is a subspace of Cn .

Thus we seek a basis for each eigenspace Eλ (A) of A. Once we have a basis for Eλ (A),
we can construct all eigenvectors of A corresponding to λ by taking all non-zero linear
combinations of these basis vectors. Note that when working with eigenspaces, we will only
consider A ∈ Mn×n (R) such that all of the eigenvalues of A are real.

From Example 39.3, the eigenvalues of A were λ1 = 2 and λ2 = 3. Hence

{ [ 2 1 ]T }   and   { [ 1 1 ]T }

are bases for Eλ1 (A) and Eλ2 (A) respectively. Note that each eigenspace is a line through
the origin in R2 .

Note that we can verify our work is correct by ensuring that our basis vectors for each
eigenspace satisfy the equation A~x = λ~x for the corresponding eigenvalue λ:

A [ 2 1 ]T = [ 1 2 ; −1 4 ] [ 2 1 ]T = [ 4 2 ]T = 2 [ 2 1 ]T
A [ 1 1 ]T = [ 1 2 ; −1 4 ] [ 1 1 ]T = [ 3 3 ]T = 3 [ 1 1 ]T.

Example 40.1

Find the eigenvalues and a basis for each eigenspace of A where A = [ 0 1 1 ; 1 0 1 ; 1 1 0 ].

Solution. We begin by computing the characteristic polynomial of A, using elementary
row operations to aid in our computations.

CA (λ) = det(A − λI) = | −λ 1 1 ; 1 −λ 1 ; 1 1 −λ | = | 0   1 − λ^2   1 + λ ; 1   −λ   1 ; 0   1 + λ   −λ − 1 |   (R1 + λR2, R3 − R2)

and performing a cofactor expansion along the first column and factoring entries as
needed leads to

CA (λ) = (−1) | (1 + λ)(1 − λ)   1 + λ ; 1 + λ   −(1 + λ) | = (−1)(1 + λ)^2 | 1 − λ   1 ; 1   −1 |
       = (−1)(λ + 1)^2 ((1 − λ)(−1) − 1) = (−1)(λ + 1)^2 (λ − 2).

Hence the eigenvalues of A are λ1 = −1 and λ2 = 2. For λ1 = −1, we solve
(A + I)~x = ~0.

A + I = [ 1 1 1 ; 1 1 1 ; 1 1 1 ]  −(R2 − R1, R3 − R1)→  [ 1 1 1 ; 0 0 0 ; 0 0 0 ]

so

~x = [ −s − t   s   t ]T = s [ −1 1 0 ]T + t [ −1 0 1 ]T,   s, t ∈ R.

Hence a basis for Eλ1 (A) is

B1 = { [ −1 1 0 ]T , [ −1 0 1 ]T }.

For λ2 = 2, we solve (A − 2I)~x = ~0.

A − 2I = [ −2 1 1 ; 1 −2 1 ; 1 1 −2 ]  −(R1 + 2R2, R3 − R2)→  [ 0 −3 3 ; 1 −2 1 ; 0 3 −3 ]
  −(R3 + R1, −(1/3)R1)→  [ 0 1 −1 ; 1 −2 1 ; 0 0 0 ]  −(R2 + 2R1, R1 ↔ R2)→  [ 1 0 −1 ; 0 1 −1 ; 0 0 0 ]

so

~x = [ t   t   t ]T = t [ 1 1 1 ]T,   t ∈ R.

Hence a basis for Eλ2 (A) is

B2 = { [ 1 1 1 ]T }.
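A numerical check of the eigenspaces in Example 40.1 can be done by computing Null(A − λI) for each eigenvalue; the number of basis vectors returned is the dimension of that eigenspace. Illustrative Python/NumPy, again using the SVD to extract a nullspace basis:

import numpy as np

A = np.array([[0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0]])                    # the matrix from Example 40.1

for lam in (-1.0, 2.0):
    M = A - lam * np.eye(3)
    U, s, Vt = np.linalg.svd(M)
    basis = Vt[int(np.sum(s > 1e-10)):].T          # columns span E_lambda(A) = Null(A - lambda I)
    print(lam, basis.shape[1])                     # -1.0 2   and   2.0 1  (dimensions of the eigenspaces)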
 

Note that in Example 39.3, A was a 2 × 2 matrix and CA (λ) was of degree 2, and in Ex-
ample 40.1, A was a 3 × 3 matrix and CA (λ) was of degree 3. This is true in general: for
A ∈ Mn×n (R), CA (λ) will be of degree n. We state the following theorem, but omit the proof.

Theorem 40.2
Let A ∈ Mn×n (R). Then CA (λ) is a real polynomial of degree n.

Recalling that a polynomial cannot have more distinct roots (or zeros) than the degree of
the polynomial, we have that an n × n matrix cannot have more than n distinct eigenvalues.
Indeed, in Example 40.1, the 3 × 3 matrix A had only two distinct eigenvalues since the
eigenvalue λ1 = −1 appeared as a root of the characteristic polynomial twice. This
motivates the following definition.

Definition 40.2: Algebraic Multiplicity

Let A ∈ Mn×n (R) with eigenvalue λ. The algebraic multiplicity of λ, denoted by aλ ,


is the number of times λ appears as a root of CA (λ).

We can find the algebraic multiplicities of the eigenvalues of a matrix from the factorization
of its characteristic polynomial.

Example 40.2
In Example 40.1, we found λ1 = −1 and λ2 = 2 were the only two eigenvalues of A
since CA (λ) = (−1)(λ + 1)2 (λ − 2). The exponent of “2” on the λ + 1 term means that

λ1 = −1 has algebraic multiplicity 2 while the exponent of “1” on the λ − 2 means
that λ2 = 2 has algebraic multiplicity 1. Thus

aλ1 = 2 and aλ2 = 1.

We will also be concerned with the dimension of the eigenspaces of a matrix. This leads to
a second multiplicity attached to each eigenvalue.

Definition 40.3: Geometric Multiplicity

Let A ∈ Mn×n (R) with eigenvalue λ. The geometric multiplicity of λ, denoted by gλ ,


is the dimension of the eigenspace Eλ (A).

Example 40.3
Again from Example 40.1, we found λ1 = −1 and λ2 = 2 were the only two eigenvalues
of A. We saw that dim(Eλ1 (A)) = 2 and dim(Eλ2 (A)) = 1. Thus

gλ1 = 2 and gλ2 = 1.

The next theorem states a relationship between the algebraic and geometric multiplicities of
an eigenvalue. The proof is omitted as it is beyond the scope of this course.

Theorem 40.3
For any A ∈ Mn×n (R) and any eigenvalue λ of A,

1 ≤ gλ ≤ aλ ≤ n.

Example 40.4

Find the eigenvalues of A and a basis for each eigenspace where A = [ 1 0 ; 5 1 ].

Solution. We have

CA (λ) = det(A − λI) = | 1 − λ   0 ; 5   1 − λ | = (1 − λ)^2

which shows that λ1 = 1 is the only eigenvalue of A and aλ1 = 2. We solve (A − I)~x = ~0.

A − I = [ 0 0 ; 5 0 ]  −((1/5)R2, R1 ↔ R2)→  [ 1 0 ; 0 0 ]

so

~x = [ 0   t ]T = t [ 0 1 ]T,   t ∈ R.

Thus

{ [ 0 1 ]T }

is a basis for Eλ1 (A), and we see gλ1 = 1 < 2 = aλ1 .

We see from this example that the geometric multiplicity of an eigenvalue can be less than
its algebraic multiplicity. We also notice that for a square upper or lower triangular matrix,
the eigenvalues of A are the entries on the main diagonal of A.
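As a numerical illustration (again assuming Python with NumPy, which is outside these notes),
the matrix from Example 40.4 shows the gap between the two multiplicities: the eigenvalue 1 is
reported twice, but A − I has rank 1, so its eigenspace is only 1-dimensional.

    import numpy as np

    A = np.array([[1.0, 0.0],
                  [5.0, 1.0]])

    vals, vecs = np.linalg.eig(A)
    print(vals)   # [1. 1.]  -- algebraic multiplicity 2
    print(vecs)   # both columns are (up to rounding) multiples of [0, 1]^T

    # dim E_1(A) = 2 - rank(A - I) = 2 - 1 = 1, so g = 1 < 2 = a.
    print(np.linalg.matrix_rank(A - np.eye(2)))   # 1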

Given A ∈ Mn×n (R) we have seen that CA (λ) is a real polynomial of degree n. However, we
have seen before that a real polynomial can have non-real roots, and it thus follows that a
real matrix can have non-real eigenvalues.

Example 40.5

Let A = [ 1 −1 ; 1 1 ]. Find the eigenvalues of A, and for each eigenvalue, find one
corresponding eigenvector.

Solution. We have

   CA (λ) = det(A − λI) = det[ 1−λ  −1 ; 1  1−λ ] = (1 − λ)² + 1 = λ² − 2λ + 2.

Turning to the quadratic formula, we find

   λ = ( −(−2) ± √((−2)² − 4(1)(2)) ) / (2(1)) = (2 ± √−4)/2 = (2 ± 2√−1)/2 = 1 ± j.

We have used the fact that only j and −j square to give −1, and in this case, it won't
matter which one we pick for √−1. Thus we take λ1 = 1 − j and λ2 = 1 + j as the
eigenvalues of A. To find an eigenvector of A corresponding to λ1 = 1 − j, we solve
(A − (1 − j)I)~x = ~0.

   A − (1 − j)I = [ j  −1 ; 1  j ]  →  [ 1  j ; 1  j ]   (−jR1)  →  [ 1  j ; 0  0 ]   (R2 − R1)

so
   ~x = [ −jt ; t ] = t [ −j ; 1 ],   t ∈ C

and we have that [ −j ; 1 ] is an eigenvector of A corresponding to λ1 = 1 − j. For
λ2 = 1 + j, we solve (A − (1 + j)I)~x = ~0.

   A − (1 + j)I = [ −j  −1 ; 1  −j ]  →  [ 1  −j ; 1  −j ]   (jR1)  →  [ 1  −j ; 0  0 ]   (R2 − R1)

so
   ~x = [ jt ; t ] = t [ j ; 1 ],   t ∈ C

and [ j ; 1 ] is an eigenvector of A corresponding to λ2 = 1 + j.

Again, we can check our work:

   [ 1 −1 ; 1 1 ] [ −j ; 1 ] = [ −1 − j ; 1 − j ] = (1 − j) [ −j ; 1 ]
   [ 1 −1 ; 1 1 ] [ j ; 1 ]  = [ −1 + j ; 1 + j ] = (1 + j) [ j ; 1 ]
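The conjugate pair of eigenvalues can also be observed numerically (an illustrative sketch
assuming Python with NumPy); conveniently, Python writes the imaginary unit as j, just as
engineers do.

    import numpy as np

    A = np.array([[1.0, -1.0],
                  [1.0,  1.0]])

    vals, vecs = np.linalg.eig(A)
    print(vals)                           # [1.+1.j 1.-1.j], the pair 1 + j and 1 - j

    lam, x = vals[0], vecs[:, 0]          # check one eigenpair: A x = lambda x
    print(np.allclose(A @ x, lam * x))    # True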

Recall from Theorem 5.2 that if a real polynomial has a non-real root z, then z̄ is also a root
of the polynomial. Thus it follows that if a real n × n matrix A has a non-real eigenvalue λ,
then λ̄ is also an eigenvalue of A, which is exactly what we observed in the previous example.
Moreover, we observed that if ~z is an eigenvector of a real n × n matrix A corresponding to
a non-real eigenvalue λ, then its conjugate ~z̄ is an eigenvector of A corresponding to the
non-real eigenvalue λ̄.

Exercise 40.1
Let A ∈ Mn×n (R) have a non-real eigenvalue λ with a corresponding eigenvector ~z.
Show that λ̄ is also an eigenvalue of A with a corresponding eigenvector ~z̄.

Solution. Since λ is an eigenvalue of A ∈ Mn×n (R) with corresponding eigenvector ~z, we
have A~z = λ~z. Taking the conjugate of both sides gives

   A~z = λ~z          (conjugate both sides)
   Ā ~z̄ = λ̄ ~z̄
   A ~z̄ = λ̄ ~z̄

since A ∈ Mn×n (R) implies Ā = A. We see that λ̄ is an eigenvalue of A with corresponding
eigenvector ~z̄.

Lecture 41

Diagonalization
Note that for our following discussions about diagonalization, it is assumed that our matrices
are square matrices with real entries, and that all eigenvalues (and thus all eigenvectors) of
our matrices are real. Our work does generalize naturally to real matrices with complex
eigenvalues, and even to complex matrices with complex eigenvalues, but we do not pursue
this here.

Definition 41.1: Diagonal Matrix


An n × n matrix D such that dij = 0 for all i 6= j is called a diagonal matrix and is
denoted by D = diag(d11 , . . . , dnn ).

Example 41.1
The matrices

   [ 1 0 ; 0 1 ],   [ 0 0 ; 0 0 ],   [ 1 0 0 ; 0 2 0 ; 0 0 3 ]
are diagonal matrices. Note that diagonal matrices are both upper and lower triangular
matrices.

Lemma 41.1
If D = diag(d11 , . . . , dnn ) and E = diag(e11 , . . . , enn ) then it follows

1) D + E = diag(d11 + e11 , . . . , dnn + enn )

2) DE = diag(d11 e11 , . . . , dnn enn ) = diag(e11 d11 , . . . , enn dnn ) = ED

In particular, for any positive integer k,

   D^k = diag(d11^k , . . . , dnn^k ).

In fact, this holds for any integer k provided none of d11 , . . . , dnn are zero, that is, if
D is invertible.

Definition 41.2: Diagonalizable Matrix
An n × n matrix A is diagonalizable if there exists an n × n invertible matrix P and an
n × n diagonal matrix D so that P −1 AP = D. In this case, we say that P diagonalizes
A to D.

It is important to note that P −1 AP = D does not imply that A = D in general. This is


because matrix multiplication does not commute, so we cannot cancel P and P −1 in the
expression P −1 AP = D.

We now consider how to determine if a square matrix A is diagonalizable, and if so, how
to find the invertible matrix P that diagonalizes A. Suppose A is an n × n matrix whose
distinct eigenvalues are λ1 , . . . , λk with algebraic multiplicities aλ1 , . . . , aλk . Since CA (λ) is a
polynomial of degree n (recall Theorem 40.2), it has exactly n roots (counting complex roots
and repeated roots). Thus
aλ1 + · · · + aλk = n.
From Theorem 40.3, we have that 1 ≤ gλ ≤ aλ ≤ n for any eigenvalue λ of A. It follows that

k ≤ gλ1 + · · · + gλk ≤ n.

In fact, gλ1 + · · · + gλk = n if and only if gλi = aλi for each i = 1, . . . , k.

Lemma 41.2
Let A be an n × n matrix and let λ1 , . . . , λk be distinct eigenvalues of A. If Bi is a
basis for the eigenspace Eλi (A) for i = 1, . . . , k, then B = B1 ∪ B2 ∪ · · · ∪ Bk is linearly
independent.

Lemma 41.2 simply states that if we have bases for eigenspaces corresponding to the distinct
eigenvalues of an n × n matrix A and we construct a set B that contains all of those bases
vectors, then the set B will be linearly independent. Since the number of vectors in the basis
of each eigenspace Eλi (A) is gλi , there are k ≤ gλ1 + · · · + gλk ≤ n vectors in B. If there are
in fact n vectors in B, then B is a basis for Rn consisting of eigenvectors of A.

As a reminder, we are still under the assumption that all matrices are real, and that all
eigenvalues of these matrices are also real. The following theorem gives us a condition under
which such matrices are diagonalizable.

Theorem 41.1: Diagonalization Theorem

A matrix A ∈ Mn×n (R) with every eigenvalue being real is diagonalizable if and only
if there exists a basis for Rn consisting of eigenvectors of A.

Proof. We first assume that A is diagonalizable. Then there exists an invertible matrix
P = [ ~x1 · · · ~xn ] and a diagonal matrix D = diag(λ1 , . . . , λn ) such that P −1 AP = D,
that is, such that AP = P D. Thus

A[ ~x1 · · · ~xn ] = P [ λ1~e1 · · · λn~en ]


[ A~x1 · · · A~xn ] = [ λ1 P~e1 · · · λn P~en ]
[ A~x1 · · · A~xn ] = [ λ1~x1 · · · λn~xn ].

We see that A~xi = λi~xi for i = 1, . . . , n, and since P = [ ~x1 · · · ~xn ] is invertible, it follows
from the Invertible Matrix Theorem that the set {~x1 , . . . , ~xn } is a basis for Rn so that ~xi 6= ~0
for i = 1, . . . , n. Thus {~x1 , . . . , ~xn } is a basis for Rn consisting of eigenvectors of A.

We now assume that there is a basis {~x1 , . . . , ~xn } of Rn consisting of eigenvectors of A. Then
for each i = 1, . . . n, A~xi = λi~xi for some eigenvalue λi of A. It follows from the Invertible
Matrix Theorem that P = [ ~x1 · · · ~xn ] is invertible and thus

P −1 AP = P −1 [ A~x1 · · · A~xn ]
= P −1 [ λ1~x1 · · · λn~xn ]
= P −1 [ λ1 P~e1 · · · λn P~en ]
= P −1 P [ λ1~e1 · · · λn~en ]
= diag(λ1 , . . . , λn )

which shows that A is diagonalizable.


The proof of the Diagonalization Theorem is a constructive proof, that is, given a diagonal-
izable matrix A, it tells us exactly how to construct the invertible matrix P and the diagonal
matrix D so that P −1 AP = D. Given that A is diagonalizable, the jth column of P will
contain the jth vector from the basis of eigenvectors, and the jth column of the diagonal
matrix D will contain the corresponding eigenvalue in the (j, j)−entry.

The following are consequences of the Diagonalization Theorem.

Corollary 41.1
An n × n matrix A is diagonalizable if and only if aλ = gλ for every eigenvalue λ of A.

Corollary 41.2
If an n × n matrix A has n distinct eigenvalues, then A is diagonalizable.

Example 41.2
Diagonalize the matrix

   A = [ 1 2 ; −1 4 ].

Solution. From Example 39.3, the eigenvalues of A are

   λ1 = 2 with algebraic multiplicity aλ1 = 1
   λ2 = 3 with algebraic multiplicity aλ2 = 1

and

   { [ 2 ; 1 ] } is a basis for Eλ1 (A), so λ1 = 2 has geometric multiplicity gλ1 = 1
   { [ 1 ; 1 ] } is a basis for Eλ2 (A), so λ2 = 3 has geometric multiplicity gλ2 = 1.

We see that aλ1 = gλ1 and aλ2 = gλ2 and so A is diagonalizable by Corollary 41.1. We take

   P = [ 2 1 ; 1 1 ]

and have that P diagonalizes A, that is,

   P −1 AP = diag(2, 3) = [ 2 0 ; 0 3 ] = D.

We can (and should) check our work:

   P −1 = (1/det(P )) adj(P ) = [ 1 −1 ; −1 2 ]

so

   P −1 AP = [ 1 −1 ; −1 2 ] [ 1 2 ; −1 4 ] [ 2 1 ; 1 1 ] = [ 2 −2 ; −3 6 ] [ 2 1 ; 1 1 ] = [ 2 0 ; 0 3 ] = D.

Note that P and D are not unique. We could have chosen P = [ 1 2 ; 1 1 ] which would have
diagonalized A to D = [ 3 0 ; 0 2 ]. Moreover, we can use the vectors from any bases for the
eigenspaces of A, not just the ones we found in Example 39.3.

In Example 41.2, we didn’t need to know the geometric multiplicities of the eigenvalues to
determine that A is diagonalizable. Since A is a 2 × 2 matrix that has 2 distinct eigenvalues,
A is diagonalizable by Corollary 41.2.

Example 41.3
Diagonalize the matrix A = [ 0 1 1 ; 1 0 1 ; 1 1 0 ].

Solution. From Example 40.1 the eigenvalues of A are

   λ1 = −1 with algebraic multiplicity aλ1 = 2
   λ2 = 2 with algebraic multiplicity aλ2 = 1

and

   { [ −1 ; 1 ; 0 ], [ −1 ; 0 ; 1 ] } is a basis for Eλ1 (A), so λ1 = −1 has geometric multiplicity gλ1 = 2
   { [ 1 ; 1 ; 1 ] } is a basis for Eλ2 (A), so λ2 = 2 has geometric multiplicity gλ2 = 1.

Since aλ1 = gλ1 and aλ2 = gλ2 , we see that A is diagonalizable so we take

   P = [ −1 −1 1 ; 1 0 1 ; 0 1 1 ]

from which it follows that

   P −1 AP = diag(−1, −1, 2) = [ −1 0 0 ; 0 −1 0 ; 0 0 2 ] = D.

Again, it’s a good idea to check P −1 AP = D even though it’s a bit more work to compute
P −1 for a 3 × 3 matrix.

Example 41.4
Recall from Example 40.4 that

   A = [ 1 0 ; 5 1 ]

has eigenvalue λ1 = 1 with aλ1 = 2. However, a basis for Eλ1 (A) is

   { [ 0 ; 1 ] }

so gλ1 = 1 ≠ 2 = aλ1 . Hence A is not diagonalizable. This means that we cannot find
two linearly independent eigenvectors of A to form an invertible 2 × 2 matrix P .

Lecture 42

Powers of Matrices
A useful application of diagonalizing is computing high powers of a matrix. Suppose A is
an n × n diagonalizable matrix. Then P −1 AP = D for some n × n invertible P and n × n
diagonal matrix D. Rearranging gives A = P DP −1 and
A2 = P DP −1 P DP −1 = P DIDP −1 = P D2 P −1
Similarly, A3 = P D3 P −1 and more generally, Ak = P Dk P −1 for any positive integer k.
Although computing a high power of an arbitrary matrix is nearly impossible by inspection,
Lemma 41.1 states that to compute a positive power of a diagonal matrix, one need only
raise each of the diagonal entries to that power.

Example 42.1

Find a formula for A^k for any positive integer k where A = [ 1 2 ; −1 4 ].

Solution. From Example 41.2, A is diagonalizable with

   P = [ 2 1 ; 1 1 ]   and   D = [ 2 0 ; 0 3 ].

Thus

   A^k = P D^k P −1
       = [ 2 1 ; 1 1 ] [ 2^k 0 ; 0 3^k ] [ 1 −1 ; −1 2 ]
       = [ 2^(k+1)  3^k ; 2^k  3^k ] [ 1 −1 ; −1 2 ]
       = [ 2^(k+1) − 3^k   (2)3^k − 2^(k+1) ; 2^k − 3^k   (2)3^k − 2^k ].

Note that we can verify our work is reasonable by taking k = 1 and ensuring we get A:

   A^1 = [ 2² − 3   (2)3 − 2² ; 2 − 3   (2)3 − 2 ] = [ 1 2 ; −1 4 ] = A.

Note also that we can now easily compute, say, A^10 :

   A^10 = [ −57001  116050 ; −58025  117074 ].
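A numerical spot-check of this formula (Python with NumPy, illustrative only) compares the
closed form against direct repeated multiplication.

    import numpy as np

    A = np.array([[ 1, 2],
                  [-1, 4]])

    def A_power(k):
        # Closed form derived above, built from the eigenvalues 2 and 3.
        return np.array([[2**(k + 1) - 3**k, 2 * 3**k - 2**(k + 1)],
                         [2**k - 3**k,       2 * 3**k - 2**k]])

    print(A_power(10))                      # [[-57001 116050] [-58025 117074]]
    print(np.linalg.matrix_power(A, 10))    # the same matrix, computed directly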

Exercise 42.1
Find a formula for A^k for A = [ 3 −4 ; −2 1 ].

Solution.

   CA (λ) = det[ 3−λ  −4 ; −2  1−λ ] = (3 − λ)(1 − λ) − 8 = λ² − 4λ + 3 − 8 = λ² − 4λ − 5 = (λ − 5)(λ + 1)

so λ1 = −1 and λ2 = 5 are the eigenvalues of A. We see aλ1 = 1 = aλ2 , that is, the 2 × 2
matrix A has two distinct eigenvalues, so we are guaranteed that A is diagonalizable by
Corollary 41.2. For λ1 = −1,

   A + I = [ 4 −4 ; −2 2 ]  →  [ 4 −4 ; 0 0 ]   (R2 + (1/2)R1)  →  [ 1 −1 ; 0 0 ]   ((1/4)R1)

so
   { [ 1 ; 1 ] }
is a basis for Eλ1 (A). For λ2 = 5,

   A − 5I = [ −2 −4 ; −2 −4 ]  →  [ −2 −4 ; 0 0 ]   (R2 − R1)  →  [ 1 2 ; 0 0 ]   (−(1/2)R1)

so
   { [ −2 ; 1 ] }
is a basis for Eλ2 (A). Now, let

   P = [ 1 −2 ; 1 1 ]   from which it follows that   D = [ −1 0 ; 0 5 ].

Then
   P −1 = (1/3) [ 1 2 ; −1 1 ]
and

   A^k = P D^k P −1 = [ 1 −2 ; 1 1 ] [ (−1)^k 0 ; 0 5^k ] (1/3) [ 1 2 ; −1 1 ]
       = (1/3) [ (−1)^k  (−2)5^k ; (−1)^k  5^k ] [ 1 2 ; −1 1 ]
       = (1/3) [ (−1)^k + (2)5^k   2(−1)^k − (2)5^k ; (−1)^k − 5^k   2(−1)^k + 5^k ].

Eigenvalues, the Determinant and the Trace
We can use the eigenvalues of an n × n matrix A to compute the determinant and trace
of A. Suppose A has exactly k distinct eigenvalues λ1 , . . . , λk with algebraic multiplicities
aλ1 , . . . , aλk . Then aλ1 + · · · + aλk = n and the characteristic polynomial of A is of the form

   CA (λ) = det(A − λI) = (−1)^n (λ − λ1)^aλ1 · · · (λ − λk)^aλk .

Taking λ = 0 gives

   det(A − 0I) = det A = (−1)^n (−λ1)^aλ1 · · · (−λk)^aλk
               = (−1)^n (−1)^(aλ1 + ··· + aλk) λ1^aλ1 · · · λk^aλk
               = (−1)^n (−1)^n λ1^aλ1 · · · λk^aλk
               = λ1^aλ1 · · · λk^aλk .

Thus, det A is the product of the eigenvalues of A where each eigenvalue λ of A appears in
the product aλ times. In particular, we note that A is invertible if and only if 0 is not an
eigenvalue of A.

Definition 42.1: Trace


Let A ∈ Mn×n (R). The trace of A, denoted by tr A, is the sum of the entries on the
main diagonal. That is,

   tr A = Σ_{i=1}^{n} (A)ii = (A)11 + · · · + (A)nn .

Recall that (A)ij denotes the (i, j)−entry of A.

It can be shown that

   tr A = λ1 aλ1 + · · · + λk aλk = Σ_{i=1}^{k} λi aλi ,

that is, the trace of A is the sum of the eigenvalues of A where each eigenvalue λ of A appears
in the sum aλ times.

Example 42.2
For

   A = [ 0 1 1 ; 1 0 1 ; 1 1 0 ],

λ1 = −1 and λ2 = 2 with aλ1 = 2 and aλ2 = 1. Thus

   det A = (−1)² 2¹ = (−1)(−1)(2) = 2
   tr A = (−1)(2) + 2(1) = (−1) + (−1) + 2 = 0.
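Both identities are easy to confirm numerically; the sketch below (Python with NumPy, not part
of the notes) checks that the determinant and trace of A agree with the product and sum of its
eigenvalues, counted with algebraic multiplicity.

    import numpy as np

    A = np.array([[0.0, 1.0, 1.0],
                  [1.0, 0.0, 1.0],
                  [1.0, 1.0, 0.0]])

    vals = np.linalg.eigvals(A)                 # approximately [ 2. -1. -1.]
    print(np.prod(vals), np.linalg.det(A))      # both about 2
    print(np.sum(vals), np.trace(A))            # both about 0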

Lecture 43

Orthogonal Sets and Bases


Definition 43.1: Orthogonal Set

A set {~v1 , . . . , ~vk } ⊆ Rn is an orthogonal set if ~vi · ~vj = 0 for i 6= j.

Example 43.1

The standard basis {~e1 , . . . , ~en } for Rn is an orthogonal set. The set

   { [ 1 ; 1 ; 1 ; 0 ], [ 1 ; −2 ; 1 ; 0 ], [ 0 ; 0 ; 0 ; 0 ] }

is an orthogonal set in R4 .

Note that an orthogonal set may contain the zero vector, and that any set containing the
zero vector is linearly dependent. However, if we insist that our orthogonal set contain only
nonzero vectors, then we obtain a linearly independent set.

Theorem 43.1
If {~v1 , . . . , ~vk } ⊆ Rn is an orthogonal set of nonzero vectors, then {~v1 , . . . , ~vk } is linearly
independent.

Proof. For c1 , . . . , ck ∈ R, consider c1~v1 + · · · + ck~vk = ~0. For each i = 1, . . . , k,


~vi · (c1~v1 + · · · + ck~vk ) = ~vi · ~0.
Expanding the dot product on the left and evaluating the one on the right gives
c1 (~vi · ~v1 ) + · · · + ci−1 (~vi · ~vi−1 ) + ci (~vi · ~vi ) + ci+1 (~vi · ~vi+1 ) + · · · + ck (~vi · ~vk ) = 0.
Since {~v1 , . . . , ~vk } is an orthogonal set, we have ~vi · ~vj = 0 for i 6= j. We thus obtain
ci (~vi · ~vi ) = 0,
that is,
ci k~vi k2 = 0.
Since ~vi ≠ ~0, we have k~vi k ≠ 0 and we must have ci = 0. Since i was arbitrary, it follows that
c1 = · · · = ck = 0 and we have that {~v1 , . . . , ~vk } is linearly independent.

Definition 43.2: Orthogonal Basis
If an orthogonal set B is a basis for a subspace S of Rn , then B is an orthogonal basis
for S.

If B = {~v1 , . . . , ~vk } is an orthogonal basis of a subspace S of Rn and ~x ∈ S, then, since B is


a basis for S, there exist c1 , . . . , ck ∈ R such that ~x = c1~v1 + · · · + ck~vk . For any i = 1, . . . , k
it follows from B being an orthogonal set that

~vi · ~x = ~vi · (c1~v1 + · · · + ck~vk )


= ci k~vi k2

and using the fact that ~vi ≠ ~0, we find

   ci = (~x · ~vi ) / k~vi k² .

Thus,

   ~x = ( (~x · ~v1 ) / k~v1 k² ) ~v1 + · · · + ( (~x · ~vk ) / k~vk k² ) ~vk .

Hence, we can compute the coefficients that are used to express ~x as a linear combination of
the vectors in B directly, that is, without solving a system of equations. Also note that we
can solve for the coefficients independently of one another.

Example 43.2

Let ~x = [ −1 ; 1 ] and

   B = { [ 1 ; 3 ], [ 6 ; −2 ] }

be an orthogonal basis for R2 . Write ~x as a linear combination of the vectors in B.

Solution. For c1 , c2 ∈ R, consider

   ~x = c1 [ 1 ; 3 ] + c2 [ 6 ; −2 ].

Then

   c1 = ( [ −1 ; 1 ] · [ 1 ; 3 ] ) / k[ 1 ; 3 ]k² = (−1 + 3)/(1 + 9) = 2/10 = 1/5
   c2 = ( [ −1 ; 1 ] · [ 6 ; −2 ] ) / k[ 6 ; −2 ]k² = (−6 − 2)/(36 + 4) = −8/40 = −1/5

and so

   ~x = (1/5) [ 1 ; 3 ] − (1/5) [ 6 ; −2 ].
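Because the coefficients come from dot products alone, no system of equations has to be solved;
a minimal numerical version of the computation above (Python with NumPy, illustrative only) is:

    import numpy as np

    x  = np.array([-1.0, 1.0])
    v1 = np.array([ 1.0, 3.0])
    v2 = np.array([ 6.0, -2.0])

    c1 = np.dot(x, v1) / np.dot(v1, v1)        #  0.2 = 1/5
    c2 = np.dot(x, v2) / np.dot(v2, v2)        # -0.2 = -1/5
    print(np.allclose(c1 * v1 + c2 * v2, x))   # True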

Orthonormal Sets and Bases


Definition 43.3: Orthonormal Set, Orthonormal Basis

An orthogonal set {~v1 , . . . , ~vk } ⊆ Rn is called an orthonormal set if


k~vi k = 1 for i = 1, . . . , k. If an orthonormal set B is a basis for subspace S of
Rn , then B is an orthonormal basis for S.

Example 43.3

The standard basis {~e1 , . . . , ~en } for Rn is an orthonormal set (and an orthonormal
basis for Rn ). The set

   { [ 1/√3 ; 1/√3 ; 1/√3 ], [ 1/√6 ; −2/√6 ; 1/√6 ], [ −1/√2 ; 0 ; 1/√2 ] }

is also an orthonormal set (and also an orthonormal basis for R3 ).

Note that the condition k~vi k = 1 excludes the zero vector from any orthonormal set. It
follows that any orthonormal set is an orthogonal set of nonzero vectors27 , and as such, must
be linearly independent by Theorem 43.1.

Given an orthogonal basis B = {~v1 , . . . , ~vk } for a subspace S of Rn , we can obtain an


orthonormal basis

   C = {~w1 , . . . , ~wk }

for S by letting

   ~wi = (1 / k~vi k) ~vi

for i = 1, . . . , k.

27
Note that not every orthogonal set of nonzero vectors is an orthonormal set.

If B = {~v1 , . . . , ~vk } is an orthonormal basis for a subspace S of Rn , then B is an orthogonal
basis for S and so for any ~x ∈ S we have
   ~x = ( (~x · ~v1 ) / k~v1 k² ) ~v1 + · · · + ( (~x · ~vk ) / k~vk k² ) ~vk

and since k~vi k = 1 for i = 1, . . . , k, we have

~x = (~x · ~v1 )~v1 + · · · + (~x · ~vk )~vk .

We see that for an orthonormal basis B = {~v1 , . . . , ~vk } of a subspace S of Rn , evaluating


the k dot products ~x · ~v1 , . . . , ~x · ~vk is all that is required to compute the coefficients used to
express ~x ∈ S as a linear combination of ~v1 , . . . , ~vk .

Exercise 43.1
From before,

   B = { [ 1 ; 3 ], [ 6 ; −2 ] }

is an orthogonal basis for R2 . Obtain an orthonormal basis C for R2 from B and
express ~x = [ −1 ; 1 ] as a linear combination of the vectors in C.

Solution. Since

   k[ 1 ; 3 ]k = √10   and   k[ 6 ; −2 ]k = √40 = 2√10

we have that

   C = { [ 1/√10 ; 3/√10 ], [ 3/√10 ; −1/√10 ] }

is an orthonormal basis for R2 . Since

   [ −1 ; 1 ] · [ 1/√10 ; 3/√10 ] = 2/√10   and   [ −1 ; 1 ] · [ 3/√10 ; −1/√10 ] = −4/√10

we obtain

   ~x = (2/√10) [ 1/√10 ; 3/√10 ] − (4/√10) [ 3/√10 ; −1/√10 ].

Orthogonal Matrices

Let B = {~v1 , . . . , ~vn } be an orthonormal basis for Rn . Then

   ~vi · ~vj = 1 if i = j   and   ~vi · ~vj = 0 if i ≠ j

where 1 ≤ i, j ≤ n. If we define P ∈ Mn×n (R) by P = [ ~v1 · · · ~vn ], then

   P T P = [ ~v1 T ; · · · ; ~vn T ] [ ~v1 · · · ~vn ] = [ ~v1 · ~v1  · · ·  ~v1 · ~vn ; · · · ; ~vn · ~v1  · · ·  ~vn · ~vn ] = In

so P T = P −1 .

Definition 43.4: Orthogonal Matrix

Let P ∈ Mn×n (R). P is called an orthogonal matrix if P T P = I.

It follows that every orthogonal matrix P is invertible with P −1 = P T .

Theorem 43.2
Let P ∈ Mn×n (R). The following are equivalent.

(1) P is an orthogonal matrix

(2) the columns of P form an orthonormal basis for Rn

(3) the rows of P form an orthonormal basis for Rn .

It is important to understand that the columns (or rows) of an n × n orthogonal matrix form
an orthonormal basis for Rn (and not simply an orthogonal basis).

Example 43.4

• The n × n identity matrix I is an orthogonal matrix since I −1 = I = I T .

• [ Rθ ] = [ cos θ  − sin θ ; sin θ  cos θ ] is an orthogonal matrix since

   [ Rθ ]−1 = [ cos θ  sin θ ; − sin θ  cos θ ] = [ Rθ ]T .

• The matrix

   P = [ 1/√3  1/√6  −1/√2 ; 1/√3  −2/√6  0 ; 1/√3  1/√6  1/√2 ]

  is an orthogonal matrix since its columns form an orthonormal basis for R3 by
  Example 43.3.
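The rotation matrix gives a convenient numerical check of Definition 43.4. The sketch below
(Python with NumPy, shown only as an illustration) verifies P T P = I and that the inverse of P
is its transpose.

    import numpy as np

    theta = 0.7                                     # any angle works
    P = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])

    print(np.allclose(P.T @ P, np.eye(2)))          # True: P is orthogonal
    print(np.allclose(np.linalg.inv(P), P.T))       # True: P^{-1} = P^T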

Exercise 43.2
Show that if P ∈ Mn×n (R) is an orthogonal matrix, then P −1 is an orthogonal matrix.

Solution. Since P is an orthogonal matrix, P T P = I, so P −1 = P T . We need to show that
(P −1 )T P −1 = I. We have

   (P −1 )T P −1 = (P T )T P −1 = P P −1 = I

so P −1 is also an orthogonal matrix.

Lecture 44

Gram-Schmidt Procedure
Given a basis {~v1 , . . . , ~vk } for a subspace S of Rn , we wish to find an orthogonal basis for
S. We can then, if needed, normalize the vectors in our orthogonal basis to obtain an or-
thonormal basis for S. We will see that projections will be useful here. We will begin with
the following Lemma which will be useful as we proceed.

Lemma 44.1
Let ~v1 , . . . , ~vk ∈ Rn . For any t1 , . . . , tk−1 ∈ R,

Span {~v1 , . . . , ~vk−1 , ~vk } = Span {~v1 , . . . , ~vk−1 , ~vk + t1~v1 + · · · + tk−1~vk−1 }.

Proof. Since ~vk + t1~v1 + · · · + tk−1~vk−1 ∈ Span {~v1 , . . . , ~vk−1 , ~vk } for any t1 , . . . , tk−1 ∈ R, it
follows from Theorem 24.2 that

Span {~v1 , . . . , ~vk−1 , ~vk } = Span {~v1 , . . . , ~vk−1 , ~vk , ~vk + t1~v1 + · · · + tk−1~vk−1 },

and since

~vk = −t1~v1 − · · · − tk−1~vk−1 + (~vk + t1~v1 + · · · + tk−1~vk−1 )


∈ Span {~v1 , . . . , ~vk−1 , ~vk + t1~v1 + · · · + tk−1~vk−1 }

it again follows from Theorem 24.2 that

Span {~v1 , . . . , ~vk−1 , ~vk , ~vk +t1~v1 +· · ·+tk−1~vk−1 } = Span {~v1 , . . . , ~vk−1 , ~vk +t1~v1 +· · ·+tk−1~vk−1 }.

Thus
Span {~v1 , . . . , ~vk−1 , ~vk } = Span {~v1 , . . . , ~vk−1 , ~vk + t1~v1 + · · · + tk−1~vk−1 }.

Example 44.1

Let B = {~v1 , ~v2 } with ~v1 = [ 1 ; 2 ] and ~v2 = [ 1 ; 3 ] be a basis for R2 . Since ~v1 · ~v2 = 7 ≠ 0,
B is not an orthogonal basis for R2 . Let

   ~w1 = ~v1 = [ 1 ; 2 ],
   ~w2 = ~v2 − proj ~w1 ~v2 = ~v2 − ( (~v2 · ~w1 ) / k~w1 k² ) ~w1 = [ 1 ; 3 ] − (7/5) [ 1 ; 2 ] = [ −2/5 ; 1/5 ].

Then {~w1 , ~w2 } is an orthogonal basis for R2 since it contains two nonzero nonparallel
vectors and ~w1 · ~w2 = 0. We may then normalize ~w1 and ~w2 :

   ~u1 = (1/k~w1 k) ~w1 = (1/√5) [ 1 ; 2 ] = [ 1/√5 ; 2/√5 ]
   ~u2 = (1/k~w2 k) ~w2 = √5 [ −2/5 ; 1/5 ] = [ −2/√5 ; 1/√5 ]

so {~u1 , ~u2 } is an orthonormal basis for R2 .

In Example 44.1, we showed that {~w1 , ~w2 } is a basis for R2 by justifying that it is a linearly
independent set of two vectors in R2 . Note that this actually follows from Lemma 44.1 with
k = 2: since ~w1 = ~v1 , we have that Span {~v1 , ~v2 } = Span {~w1 , ~v2 }, and since

   ~w2 = ~v2 − proj ~w1 ~v2 = ~v2 − ( (~v2 · ~w1 ) / k~w1 k² ) ~w1 = ~v2 − ( (~v2 · ~w1 ) / k~w1 k² ) ~v1   (since ~w1 = ~v1 )
       = ~v2 + t~v1   with   t = −(~v2 · ~w1 ) / k~w1 k² ,

we have that Span {~w1 , ~v2 } = Span {~w1 , ~w2 }. We also verified that {~w1 , ~w2 } was an orthog-
onal set by showing that ~w1 · ~w2 = 0; however, note that

   ~w2 = ~v2 − proj ~w1 ~v2 = perp ~w1 ~v2 = perp ~v1 ~v2   (since ~w1 = ~v1 )

so ~w1 = ~v1 is clearly orthogonal to ~w2 = perp ~v1 ~v2 .

We now extend this procedure to k vectors in Rn .

Theorem 44.1: Gram-Schmidt Procedure


Let {~v1 , . . . , ~vk } be a basis for a subspace S of Rn . Define

   ~w1 = ~v1
   ~w2 = ~v2 − proj ~w1 ~v2
   ~w3 = ~v3 − proj ~w1 ~v3 − proj ~w2 ~v3
   ...
   ~wk = ~vk − proj ~w1 ~vk − proj ~w2 ~vk − · · · − proj ~wk−1 ~vk

Then {~w1 , . . . , ~wk } is an orthogonal basis for S and for each j = 1, . . . , k,

   Span {~v1 , . . . , ~vj } = Span {~w1 , . . . , ~wj }.

Exercise 44.1
With k = 3 in Theorem 44.1, show that {~w1 , ~w2 , ~w3 } is an orthogonal set. You may
use the fact that ~w1 , ~w2 , ~w3 are nonzero without justification.

Solution. Since ~w1 , ~w2 , ~w3 ≠ ~0, we have that k~w1 k, k~w2 k, k~w3 k ≠ 0. Then

   ~w1 · ~w2 = ~w1 · (~v2 − proj ~w1 ~v2 )
            = ~w1 · ( ~v2 − ( (~v2 · ~w1 ) / k~w1 k² ) ~w1 )
            = ~w1 · ~v2 − ( (~v2 · ~w1 ) / k~w1 k² ) (~w1 · ~w1 )
            = ~w1 · ~v2 − ~v2 · ~w1        (since ~w1 · ~w1 = k~w1 k²)
            = 0.

Next,

   ~w1 · ~w3 = ~w1 · (~v3 − proj ~w1 ~v3 − proj ~w2 ~v3 )
            = ~w1 · ( ~v3 − ( (~v3 · ~w1 ) / k~w1 k² ) ~w1 − ( (~v3 · ~w2 ) / k~w2 k² ) ~w2 )
            = ~w1 · ~v3 − ( (~v3 · ~w1 ) / k~w1 k² ) (~w1 · ~w1 ) − ( (~v3 · ~w2 ) / k~w2 k² ) (~w1 · ~w2 )
            = ~w1 · ~v3 − ~v3 · ~w1        (since ~w1 · ~w1 = k~w1 k² and ~w1 · ~w2 = 0)
            = 0.

Finally,

   ~w2 · ~w3 = ~w2 · (~v3 − proj ~w1 ~v3 − proj ~w2 ~v3 )
            = ~w2 · ( ~v3 − ( (~v3 · ~w1 ) / k~w1 k² ) ~w1 − ( (~v3 · ~w2 ) / k~w2 k² ) ~w2 )
            = ~w2 · ~v3 − ( (~v3 · ~w1 ) / k~w1 k² ) (~w2 · ~w1 ) − ( (~v3 · ~w2 ) / k~w2 k² ) (~w2 · ~w2 )
            = ~w2 · ~v3 − ~v3 · ~w2        (since ~w2 · ~w1 = 0 and ~w2 · ~w2 = k~w2 k²)
            = 0.

Thus {~w1 , ~w2 , ~w3 } is an orthogonal set.

Example 44.2

Let {~v1 , ~v2 , ~v3 } be a basis for a subspace S of R5 with

   ~v1 = [ 1 ; 1 ; 0 ; 1 ; 1 ],   ~v2 = [ −1 ; 2 ; 1 ; 0 ; 1 ]   and   ~v3 = [ 0 ; 1 ; 1 ; 1 ; 2 ].

Find an orthonormal basis for S.

Solution. We set ~w1 = ~v1 . Then we compute

   ~v2 − proj ~w1 ~v2 = ~v2 − ( (~v2 · ~w1 ) / k~w1 k² ) ~w1
                     = [ −1 ; 2 ; 1 ; 0 ; 1 ] − (2/4) [ 1 ; 1 ; 0 ; 1 ; 1 ] = [ −3/2 ; 3/2 ; 1 ; −1/2 ; 1/2 ].

We can eliminate fractions here by multiplying our result by 2 to obtain

   ~w2 = [ −3 ; 3 ; 2 ; −1 ; 1 ].

Then we compute

   ~v3 − proj ~w1 ~v3 − proj ~w2 ~v3 = ~v3 − ( (~v3 · ~w1 ) / k~w1 k² ) ~w1 − ( (~v3 · ~w2 ) / k~w2 k² ) ~w2
      = [ 0 ; 1 ; 1 ; 1 ; 2 ] − (4/4) [ 1 ; 1 ; 0 ; 1 ; 1 ] − (6/24) [ −3 ; 3 ; 2 ; −1 ; 1 ]
      = [ −1/4 ; −3/4 ; 1/2 ; 1/4 ; 3/4 ]

and to eliminate fractions, we can scale our result by a factor of 4 to obtain

   ~w3 = [ −1 ; −3 ; 2 ; 1 ; 3 ].

Thus {~w1 , ~w2 , ~w3 } is an orthogonal basis for S. We now normalize each of these vectors.
We take

   ~u1 = (1/k~w1 k) ~w1 = (1/2) [ 1 ; 1 ; 0 ; 1 ; 1 ]
   ~u2 = (1/k~w2 k) ~w2 = (1/(2√6)) [ −3 ; 3 ; 2 ; −1 ; 1 ]
   ~u3 = (1/k~w3 k) ~w3 = (1/(2√6)) [ −1 ; −3 ; 2 ; 1 ; 3 ]

which gives {~u1 , ~u2 , ~u3 } as our orthonormal basis.

Notice in Example 44.2, we multiplied some of the ~wi 's by nonzero scalars to eliminate frac-
tions. In doing so, we preserve orthogonality and are left with numbers that are easier to
work with in subsequent computations.

Exercise 44.2
Let {~v1 , ~v2 , ~v3 } be a basis for a subspace S of R4 with

   ~v1 = [ 1 ; 1 ; 0 ; 1 ],   ~v2 = [ 0 ; −1 ; 1 ; 1 ]   and   ~v3 = [ 3 ; 0 ; 1 ; 1 ].

Find an orthogonal basis for S.

Solution. Let ~w1 = ~v1 . Then

   ~v2 − proj ~w1 ~v2 = ~v2 − ( (~v2 · ~w1 ) / k~w1 k² ) ~w1 = [ 0 ; −1 ; 1 ; 1 ] − (0/3) [ 1 ; 1 ; 0 ; 1 ] = [ 0 ; −1 ; 1 ; 1 ]

so we take ~w2 = ~v2 (note that ~v1 and ~v2 were orthogonal already). Finally we compute

   ~v3 − proj ~w1 ~v3 − proj ~w2 ~v3 = ~v3 − ( (~v3 · ~w1 ) / k~w1 k² ) ~w1 − ( (~v3 · ~w2 ) / k~w2 k² ) ~w2
      = [ 3 ; 0 ; 1 ; 1 ] − (4/3) [ 1 ; 1 ; 0 ; 1 ] − (2/3) [ 0 ; −1 ; 1 ; 1 ]
      = [ 5/3 ; −2/3 ; 1/3 ; −1 ]

so we take

   ~w3 = [ 5/3 ; −2/3 ; 1/3 ; −1 ].

Thus {~w1 , ~w2 , ~w3 } is an orthogonal basis for S.

Example 44.3

Find an orthogonal basis for S = Span {~v1 , ~v2 , ~v3 } where

   ~v1 = [ 1 ; 1 ; 2 ],   ~v2 = [ 0 ; −1 ; −1 ]   and   ~v3 = [ 1 ; 0 ; 1 ].

Solution. We take ~w1 = ~v1 . Then we compute

   ~v2 − proj ~w1 ~v2 = ~v2 − ( (~v2 · ~w1 ) / k~w1 k² ) ~w1 = [ 0 ; −1 ; −1 ] − (−3/6) [ 1 ; 1 ; 2 ] = [ 1/2 ; −1/2 ; 0 ]

so we will take

   ~w2 = [ 1 ; −1 ; 0 ].

Then we compute

   ~v3 − proj ~w1 ~v3 − proj ~w2 ~v3 = ~v3 − ( (~v3 · ~w1 ) / k~w1 k² ) ~w1 − ( (~v3 · ~w2 ) / k~w2 k² ) ~w2
      = [ 1 ; 0 ; 1 ] − (3/6) [ 1 ; 1 ; 2 ] − (1/2) [ 1 ; −1 ; 0 ]
      = [ 0 ; 0 ; 0 ].

Rearranging the last equation, we find that ~v3 = (1/2)~w1 + (1/2)~w2 . Thus, ~v3 ∈
Span {~w1 , ~w2 } = Span {~v1 , ~v2 }. It follows from Theorem 24.2 that

   S = Span {~v1 , ~v2 , ~v3 } = Span {~v1 , ~v2 } = Span {~w1 , ~w2 }.

Thus, {~w1 , ~w2 } is an orthogonal basis for S.

Example 44.3 shows that if a spanning set for a subspace S is linearly dependent, we can
still apply the Gram-Schmidt Procedure to this spanning set to obtain an orthogonal basis
for S. If at some point during the Gram-Schmidt Procedure we find that ~wi = ~0, it simply
means that ~vi ∈ Span {~w1 , . . . , ~wi−1 } = Span {~v1 , . . . , ~vi−1 }. We then simply discard ~vi from
our original spanning set and then continue with the Gram-Schmidt Procedure.
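The whole procedure, including the rule of discarding ~vi whenever ~wi = ~0, fits in a few lines
of code. The following sketch (Python with NumPy, not part of the notes) returns an orthogonal
basis for the span of the input vectors; the tolerance used to decide when a vector counts as
zero is an implementation choice.

    import numpy as np

    def gram_schmidt(vectors, tol=1e-12):
        """Return an orthogonal basis for Span(vectors) via the Gram-Schmidt Procedure."""
        basis = []
        for v in vectors:
            w = np.array(v, dtype=float)
            for b in basis:
                w = w - (np.dot(w, b) / np.dot(b, b)) * b   # remove the component along b
            if np.linalg.norm(w) > tol:                     # discard w if it is (numerically) zero
                basis.append(w)
        return basis

    # The spanning set from Example 44.3: the third vector is dependent,
    # so only two orthogonal vectors are returned.
    for w in gram_schmidt([[1, 1, 2], [0, -1, -1], [1, 0, 1]]):
        print(w)   # [1. 1. 2.]  and  [ 0.5 -0.5  0. ]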

Lecture 45

Orthogonal Diagonalization
We now return to diagonalizing n × n real matrices. Our goal, given A ∈ Mn×n (R), is to de-
termine if there exists an orthogonal matrix P and a diagonal matrix D so that P T AP = D.
The following definition will be useful.

Definition 45.1: Orthogonal Subspaces


Let S1 and S2 be two subspaces of Rn . We say S1 and S2 are orthogonal if ~x · ~y = 0
for every ~x ∈ S1 and every ~y ∈ S2 .

Exercise 45.1
Let S1 and S2 be two subspaces of Rn , with {~v1 , . . . , ~vk } a basis for S1 and {~w1 , . . . , ~w` }
a basis for S2 . Then S1 and S2 are orthogonal if and only if ~vi · ~wj = 0 for every
i = 1, . . . , k and j = 1, . . . , `.

Thus, when checking if two subspaces of Rn are orthogonal, it is sufficient to check


that each basis vector for one subspace is orthogonal to every basis vector of the other
subspace.

Proof. Assume first that S1 and S2 are orthogonal. Then it follows from Definition 45.1 that
~vi · w
~ j = 0 for every i = 1, . . . , k and j = 1, . . . , `. Now assume that ~vi · w ~ j = 0 for every
i = 1, . . . , k and j = 1, . . . , `. Let ~x ∈ S1 and ~y ∈ S2 . Then there are c1 , . . . , ck , d1 , . . . , d` ∈ R
such that ~x = c1~v1 + · · · + ck~vk and ~y = d1 w ~ 1 + · · · + d` w
~ ` . Then

~x · ~y = (c1~v1 + · · · + ck~vk ) · (d1 w~ 1 + · · · + d` w~ `)


= c1~v1 · (d1 w
~ 1 + · · · + d` w~ ` ) + · · · + ck~vk · (d1 w
~ 1 + · · · + d` w~ `)
 
= c1 d1 (~v1 · w
~ 1 ) + · · · + d` (~v1 · w
~ ` ) + · · · + ck d1 (~vk · w ~ 1 ) + · · · + d` (~vk · w
~ `)
 
= c1 d1 (0) + · · · + d` (0) + · · · + ck d1 (0) + · · · + d` (0)
= c1 (0) + · · · + ck (0)
=0

so S1 and S2 are orthogonal.


Recall the matrix

   A = [ 0 1 1 ; 1 0 1 ; 1 1 0 ].

In Example 40.1 we found that the eigenvalues of A are

   λ1 = −1 with algebraic multiplicity aλ1 = 2
   λ2 = 2 with algebraic multiplicity aλ2 = 1

and that the bases for the corresponding eigenspaces are

   { [ −1 ; 1 ; 0 ], [ −1 ; 0 ; 1 ] } for Eλ1 (A), so λ1 = −1 has geometric multiplicity gλ1 = 2
   { [ 1 ; 1 ; 1 ] } for Eλ2 (A), so λ2 = 2 has geometric multiplicity gλ2 = 1.

We found that A is diagonalizable in Example 41.3 with

   P = [ −1 −1 1 ; 1 0 1 ; 0 1 1 ]   and   D = [ −1 0 0 ; 0 −1 0 ; 0 0 2 ].

We let

   ~v1 = [ −1 ; 1 ; 0 ],   ~v2 = [ −1 ; 0 ; 1 ]   and   ~v3 = [ 1 ; 1 ; 1 ]

and compute

   ~v1 · ~v2 = 1,   ~v1 · ~v3 = 0   and   ~v2 · ~v3 = 0.
Thus ~v1 and ~v2 (the basis vectors we found for Eλ1 (A)) are both orthogonal to ~v3 (the basis
vector we found for Eλ2 (A)) and thus Eλ1 (A) is orthogonal to Eλ2 (A). However, ~v1 is not
orthogonal to ~v2 , so we now apply the Gram-Schmidt Procedure to {~v1 , ~v2 } to obtain an
orthogonal basis {~w1 , ~w2 } for Eλ1 (A). We take ~w1 = ~v1 and compute

   ~v2 − proj ~w1 ~v2 = ~v2 − ( (~v2 · ~w1 ) / k~w1 k² ) ~w1 = [ −1 ; 0 ; 1 ] − (1/2) [ −1 ; 1 ; 0 ] = [ −1/2 ; −1/2 ; 1 ].

We take

   ~w2 = 2 [ −1/2 ; −1/2 ; 1 ] = [ −1 ; −1 ; 2 ]

so that {~w1 , ~w2 } is an orthogonal basis for Eλ1 (A). Note that now

   ~w1 · ~w2 = 0,   ~w1 · ~v3 = 0   and   ~w2 · ~v3 = 0,

so that {~w1 , ~w2 , ~v3 } is an orthogonal basis for R3 consisting of eigenvectors of A. We can
then normalize the vectors ~w1 , ~w2 and ~v3 to obtain

   ~u1 = (1/k~w1 k) ~w1 = (1/√2) [ −1 ; 1 ; 0 ] = [ −1/√2 ; 1/√2 ; 0 ]
   ~u2 = (1/k~w2 k) ~w2 = (1/√6) [ −1 ; −1 ; 2 ] = [ −1/√6 ; −1/√6 ; 2/√6 ]
   ~u3 = (1/k~v3 k) ~v3 = (1/√3) [ 1 ; 1 ; 1 ] = [ 1/√3 ; 1/√3 ; 1/√3 ].

We have that {~u1 , ~u2 } is an orthonormal basis for Eλ1 (A), {~u3 } is an orthonormal basis for
Eλ2 (A) and {~u1 , ~u2 , ~u3 } is an orthonormal basis for R3 consisting of eigenvectors of A. We
set

   P = [ −1/√2  −1/√6  1/√3 ; 1/√2  −1/√6  1/√3 ; 0  2/√6  1/√3 ]   and   D = [ −1 0 0 ; 0 −1 0 ; 0 0 2 ]

and we again have P −1 AP = D (recall that the diagonalizing matrix P is not unique).
However, P is an orthogonal matrix, so P −1 = P T . Thus

   P T AP = D,

so we say that A is orthogonally diagonalizable and that P orthogonally diagonalizes A to D.

Definition 45.2: Orthogonally Diagonalizable Matrix

An A ∈ Mn×n (R) is orthogonally diagonalizable if there exists an n × n orthogonal


matrix P and an n × n diagonal matrix D so that P T AP = D. In this case, we say
that P orthogonally diagonalizes A to D.

It’s important to remember that a matrix needs to be diagonalizable before it can be or-
thogonally diagonalizable. However, the next example shows that not every diagonalizable
matrix is orthogonally diagonalizable.

Example 45.1
From Example 41.2, the matrix

   A = [ 1 2 ; −1 4 ]

has eigenvalues λ1 = 2 and λ2 = 3 with

   { [ 2 ; 1 ] } a basis for Eλ1 (A)   and   { [ 1 ; 1 ] } a basis for Eλ2 (A).

Since the basis vectors from the two eigenspaces are not orthogonal, the eigenspaces
are not orthogonal. As a result, we cannot find an orthonormal basis for R2 consisting
of eigenvectors of A and thus A is not orthogonally diagonalizable.
Note that the matrix [ 0 1 1 ; 1 0 1 ; 1 1 0 ] is symmetric while the matrix [ 1 2 ; −1 4 ] is not
symmetric. As we will derive below, symmetry is needed for a matrix A ∈ Mn×n (R) to be orthogonally
diagonalizable. We begin with the following Theorem, the proof of which is omitted.

Theorem 45.1
Let A ∈ Mn×n (R) be a symmetric matrix. Then every eigenvalue of A is real.

It follows from Theorem 45.1 that eigenvectors of a symmetric matrix A ∈ Mn×n (R) are
vectors in Rn and thus the eigenspaces of a symmetric matrix A ∈ Mn×n (R) are subspaces of
Rn .

The following Lemma uses the fact that for ~a, ~b ∈ Rn , ~a · ~b = ~a T~b.

Lemma 45.1
Let A ∈ Mn×n (R). Then A is symmetric if and only if ~x · (A~y ) = (A~x) · ~y for all
~x, ~y ∈ Rn .

Proof. Assume A ∈ Mn×n (R) is symmetric so AT = A. Then for any ~x, ~y ∈ Rn ,

~x · (A~y ) = ~x T A~y = ~x T AT ~y = (A~x)T ~y = (A~x) · ~y .

Assume now that ~x · (A~y ) = (A~x) · ~y for all ~x, ~y ∈ Rn . We must show that A is symmetric.
Since
~x T A~y = ~x · (A~y ) = (A~x) · ~y = (A~x)T ~y

holds for every ~y ∈ Rn , we see that ~x T A = (A~x)T by the Matrices Equal Theorem. From
this, we have that for every ~x ∈ Rn
~x T A = (A~x)T
(~x T A)T = A~x (transposing both sides)
AT ~x = A~x
so that AT = A by the Matrices Equal Theorem, so A is symmetric.
The next theorem confirms that the eigenspaces of a symmetric matrix are orthogonal sub-
spaces.

Theorem 45.2
Let A ∈ Mn×n (R) be a symmetric matrix. If λ1 , λ2 are distinct eigenvalues of A, then
Eλ1 (A) and Eλ2 (A) are orthogonal subspaces of Rn .

Proof. Let ~x1 ∈ Eλ1 (A) and ~x2 ∈ Eλ2 (A). Then A~x1 = λ1~x1 and A~x2 = λ2~x2 . We must
show that ~x1 · ~x2 = 0. This is trivial if either ~x1 = ~0 or ~x2 = ~0, so we assume ~x1 , ~x2 6= ~0.
Then
λ1 (~x1 · ~x2 ) = (λ1~x1 ) · ~x2
= (A~x1 ) · ~x2
= ~x1 · (A~x2 ) by Lemma 45.1 since A is symmetric
= ~x1 · (λ2~x2 )
= λ2 (~x1 · ~x2 ).
Since λ1 (~x1 · ~x2 ) = λ2 (~x1 · ~x2 ), we have that (λ1 − λ2 )(~x1 · ~x2 ) = 0. However, since λ1 and λ2
are distinct, λ1 − λ2 6= 0 so ~x1 · ~x2 = 0. Thus Eλ1 (A) and Eλ2 (A) are orthogonal subspaces
of Rn

Theorem 45.3
Let A ∈ Mn×n (R). Then A is symmetric if and only if A is orthogonally diagonalizable.

Proof. We prove that if A ∈ Mn×n (R) is orthogonally diagonalizable, then A is symmetric.


If A is orthogonally diagonalizable, then there is an n × n orthogonal matrix P and an
n × n diagonal matrix D so that P T AP = D. It follows that A = P DP T and

   AT = (P DP T )T = (P T )T DT P T = P DP T = A
where we have used the fact that D being diagonal implies that DT = D. Hence AT = A
so A is symmetric. The proof that A being symmetric implying that A is orthogonally
diagonalizable is beyond the scope of this course and is thus omitted.

Lecture 46

Example 46.1
 
Orthogonally diagonalize A = [ 5 −4 −2 ; −4 5 −2 ; −2 −2 8 ].

Solution. We first find the characteristic polynomial.

   CA (λ) = det[ 5−λ  −4  −2 ; −4  5−λ  −2 ; −2  −2  8−λ ]
          = det[ 9−λ  −9+λ  0 ; −4  5−λ  −2 ; −2  −2  8−λ ]     (R1 − R2)
          = det[ 9−λ  0  0 ; −4  1−λ  −2 ; −2  −4  8−λ ]        (C2 + C1 → C2)
          = (9 − λ) det[ 1−λ  −2 ; −4  8−λ ]
          = −(λ − 9)((1 − λ)(8 − λ) − 8)
          = −(λ − 9)(8 − 9λ + λ² − 8)
          = −(λ − 9)(λ² − 9λ)
          = −λ(λ − 9)²
so λ1 = 0 with aλ1 = 1 and λ2 = 9 with aλ2 = 2. For each eigenvalue, we now find a
basis for the corresponding eigenspace. For λ1 = 0, we solve (A − 0I)~x = ~0, that is,
we solve A~x = ~0.
     
   A = [ 5 −4 −2 ; −4 5 −2 ; −2 −2 8 ]
     →  [ 1 1 −4 ; −4 5 −2 ; −2 −2 8 ]    (R1 + R2)
     →  [ 1 1 −4 ; 0 9 −18 ; 0 0 0 ]      (R2 + 4R1, R3 + 2R1)
     →  [ 1 1 −4 ; 0 1 −2 ; 0 0 0 ]       ((1/9)R2)
     →  [ 1 0 −2 ; 0 1 −2 ; 0 0 0 ]       (R1 − R2)

so
   ~x = [ 2t ; 2t ; t ] = t [ 2 ; 2 ; 1 ],   t ∈ R.

Hence a basis for Eλ1 (A) is

   B1 = { [ 2 ; 2 ; 1 ] }
and gλ1 = 1 = aλ1 . For λ2 = 9, we solve (A − 9I)~x = ~0.
     
   A − 9I = [ −4 −4 −2 ; −4 −4 −2 ; −2 −2 −1 ]
     →  [ 1 1 1/2 ; 1 1 1/2 ; 1 1 1/2 ]    (−(1/4)R1, −(1/4)R2, −(1/2)R3)
     →  [ 1 1 1/2 ; 0 0 0 ; 0 0 0 ]        (R2 − R1, R3 − R1)

so
   ~x = [ −s − t/2 ; s ; t ] = s [ −1 ; 1 ; 0 ] + t [ −1/2 ; 0 ; 1 ],   s, t ∈ R.

Hence a basis for Eλ2 (A) is

   B2 = { [ −1 ; 1 ; 0 ], [ −1 ; 0 ; 2 ] }

and gλ2 = 2 = aλ2 . Note that the second vector in the basis for Eλ2 (A) was scaled by
a factor of 2 to eliminate fractions. We let

   ~v1 = [ 2 ; 2 ; 1 ],   ~v2 = [ −1 ; 1 ; 0 ]   and   ~v3 = [ −1 ; 0 ; 2 ].

Now is a good time to verify that A~v1 = ~0, A~v2 = 9~v2 and A~v3 = 9~v3 . We will also
note here that A is diagonalizable since aλ1 = gλ1 and aλ2 = gλ2 (alternatively, A
symmetric implies that A is orthogonally diagonalizable which in turn implies that A
is diagonalizable). It’s also a good time to verify that ~v1 is orthogonal to both ~v2 and
~v3 , that is, that the two eigenspaces are orthogonal. We now find an orthogonal basis
for Eλ2 (A) by applying the Gram-Schmidt Procedure to {~v2 , ~v3 }. Let w ~ 2 = ~v2 and
compute
     
   ~v3 − proj ~w2 ~v3 = ~v3 − ( (~v3 · ~w2 ) / k~w2 k² ) ~w2 = [ −1 ; 0 ; 2 ] − (1/2) [ −1 ; 1 ; 0 ] = [ −1/2 ; −1/2 ; 2 ]

and let

   ~w3 = −2 [ −1/2 ; −1/2 ; 2 ] = [ 1 ; 1 ; −4 ].
Thus {~v1 } is an orthogonal basis for Eλ1 (A) and {~w2 , ~w3 } is an orthogonal basis for
Eλ2 (A). Moreover, since the eigenspaces are orthogonal, {~v1 , ~w2 , ~w3 } is an orthogonal
basis for R3 (since it is a linearly independent set of 3 vectors in R3 ). We then normalize
~v1 , ~w2 and ~w3 to obtain an orthonormal basis for R3 . We have
   
   ~u1 = (1/k~v1 k) ~v1 = (1/3) [ 2 ; 2 ; 1 ] = [ 2/3 ; 2/3 ; 1/3 ]
   ~u2 = (1/k~w2 k) ~w2 = (1/√2) [ −1 ; 1 ; 0 ] = [ −1/√2 ; 1/√2 ; 0 ]
   ~u3 = (1/k~w3 k) ~w3 = (1/(3√2)) [ 1 ; 1 ; −4 ] = [ 1/(3√2) ; 1/(3√2) ; −4/(3√2) ].

Finally, we see that P T AP = D where

   P = [ 2/3  −1/√2  1/(3√2) ; 2/3  1/√2  1/(3√2) ; 1/3  0  −4/(3√2) ]   and   D = [ 0 0 0 ; 0 9 0 ; 0 0 9 ].
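The same orthogonal diagonalization can be produced numerically. NumPy's eigh routine is
intended for symmetric matrices and returns real eigenvalues together with an orthonormal set
of eigenvectors (a sketch for illustration only; the eigenvectors it returns need not be the
particular ones chosen above).

    import numpy as np

    A = np.array([[ 5.0, -4.0, -2.0],
                  [-4.0,  5.0, -2.0],
                  [-2.0, -2.0,  8.0]])

    vals, P = np.linalg.eigh(A)                # eigh is for symmetric (Hermitian) matrices
    print(vals)                                # approximately [0. 9. 9.]
    print(np.allclose(P.T @ P, np.eye(3)))     # True: P is orthogonal
    print(np.round(P.T @ A @ P, 10))           # diag(0, 9, 9), up to rounding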

THE END

28
A 4−dimensional cube, often called a tesseract or a hypercube. The same hypercube is depicted on the
cover of these notes, but is viewed from a different angle.

Appendix A

Introduction to Set Theory


Sets will play an important role in linear algebra, so we need to be able to understand the
basic results concerning them. We begin with the definition of a set. Note that this definition
is far from the formal definition, and can lead to contradictions if we are not careful. For
our purposes here however, this definition will be sufficient.

Definition A.1: Set


A set is a collection of objects. We call the objects elements of the set

Example A.1

• S = {1, 2, 3} is a set with three elements, namely 1, 2 and 3,

• T = {♥, f (x), {1, 2}, 3},

• ∅ = { }, the set with no elements, which is called the empty set.

We see that one way to describe a set is to list the elements of the set between curly
braces “{” and “}”. The set T shows that a set can have elements other than numbers:
the elements can be functions, other sets, or other symbols. The empty set has no
elements in it, and we normally prefer using ∅ over { } in this case.

Given a set S, we write x ∈ S if x is an element of S, and x ∉ S if x is not an element of S.

Example A.2

For T = {♥, f (x), {1, 2}, 3}, we have

♥ ∈ T, f (x) ∈ T, {1, 2} ∈ T and 3 ∈ T

but
1∈
/T and 2 ∈
/ T.

Example A.3
Here are a few more sets that we know:

• N = {1, 2, 3, . . .},

• Z = {. . . , −3, −2, −1, 0, 1, 2, 3, . . .},

• Q = { a/b | a, b ∈ Z, b ≠ 0 },

• R is the set of all numbers that are either rational or irrational,

• C = {a + bj | a, b ∈ R},

• Rn = { [ x1 ; . . . ; xn ] | x1 , . . . , xn ∈ R }.

Note that each of these sets contains infinitely many elements. The sets N and Z are
defined by listing their elements (or rather, listing enough elements so that you “get
the idea”), the set R is defined using words, and the sets Q, C and Rn are defined
using set builder notation where an arbitrary element is described. For example, the
set
   Q = { a/b | a, b ∈ Z, b ≠ 0 }
is understood to mean “Q is the set of all fractions of the form a/b where a and b are
integers and b is nonzero”.

Example A.4

Let S = { [ x1 ; x2 ; x3 ] ∈ R3 | 2x1 − x2 + x3 = 4 }. Is [ 1 ; 2 ; 3 ] ∈ S?

Solution. Since 2(1) − 2 + 3 = 3 ≠ 4, we have that [ 1 ; 2 ; 3 ] ∉ S.

We now define two ways that we can combine given sets to create new sets.

Definition A.2: Union and Intersection


Let S, T be sets. The union of S and T is the set
S ∪ T = {x | x ∈ S or x ∈ T }
and the intersection of S and T is the set
S ∩ T = {x | x ∈ S and x ∈ T }.

We can visualize the union and intersection of two sets using Venn Diagrams. Although
Venn Diagrams can help us visualize sets, they should never be used as part of a proof of
any statement regarding sets.

[Figure A.1: Venn Diagrams. (a) A Venn Diagram depicting the union of two sets S and T .
(b) A Venn Diagram depicting the intersection of two sets S and T .]

Example A.5

If S = {1, 2, 3, 4} and T = {−1, 2, 4, 6, 7}, then

S ∪ T = {−1, 1, 2, 3, 4, 6, 7}
S ∩ T = {2, 4}
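As a small aside, finite unions and intersections can be computed directly in Python (shown
only for illustration; the printed element order may differ):

    S = {1, 2, 3, 4}
    T = {-1, 2, 4, 6, 7}

    print(S | T)   # union: {1, 2, 3, 4, 6, 7, -1}
    print(S & T)   # intersection: {2, 4}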

Definition A.3: Subset


Let S, T be sets. We say that S is a subset of T (and we write S ⊆ T ) if for every
x ∈ S we have that x ∈ T . If S is not a subset of T , then we write S 6⊆ T .

Figure A.2: A Venn diagram showing an instance when S ⊆ T on the left, and an
instance when S 6⊆ T (and also T 6⊆ S) on the right.

Example A.6

Let S = {1, 2, 4} and T = {1, 2, 3, 4}. Then S ⊆ T since every element of S is an


element of T , but T 6⊆ S since 3 ∈ T , but 3 ∉ S.

Note that it’s important to distinguish between an element of a set and a subset of a set.
For example,
1 ∈ {1, 2, 3} but 1 6⊆ {1, 2, 3}
and
{1} ∉ {1, 2, 3}   but   {1} ⊆ {1, 2, 3}.
More interestingly,

{1, 2} ∈ {1, 2, {1, 2}} and {1, 2} ⊆ {1, 2, {1, 2}}

which shows that an element of a set may also be a subset of a set. This last example can
cause students to stumble, so the following may help:

{1, 2} ∈ {1, 2, {1, 2}} and {1, 2} ⊆ {1, 2, {1, 2}}.

Finally we mention that for any set S, we have that ∅ ⊆ S. This generally seems quite
strange at first. However if ∅ 6⊆ S, then there must be some element x ∈ ∅ such that x ∈
/ S.
But the empty set contains no elements, so we can never show that ∅ is not a subset of S.
Thus we are forced to conclude that ∅ ⊆ S.28

Definition A.4: Equality


Let S, T be sets. We say that S = T if S ⊆ T and T ⊆ S.

Example A.7
Let

   S = { c1 [ 1 ; 2 ] + c2 [ 1 ; 1 ] + c3 [ 2 ; 3 ] | c1 , c2 , c3 ∈ R }
   T = { d1 [ 1 ; 2 ] + d2 [ 1 ; 1 ] | d1 , d2 ∈ R }.

Show that S = T .

Before we give the solution, we note that S is the set of all linear combinations of the
vectors [ 1 ; 2 ], [ 1 ; 1 ] and [ 2 ; 3 ] while T is the set of all linear combinations of just
[ 1 ; 2 ] and [ 1 ; 1 ]. However, we notice that

   [ 2 ; 3 ] = [ 1 ; 2 ] + [ 1 ; 1 ].                                        (30)

28
The statement ∅ ⊆ S is called vacuously true, that is, it is a true statement simply because we cannot
show that it is false.

Solution. We show that S = T by showing that S ⊆ T and that T ⊆ S. To show that
S ⊆ T , we choose an arbitrary ~x ∈ S and show that ~x ∈ T . So, let ~x ∈ S. Then there
exist c1 , c2 , c3 ∈ R such that

   ~x = c1 [ 1 ; 2 ] + c2 [ 1 ; 1 ] + c3 [ 2 ; 3 ]
      = c1 [ 1 ; 2 ] + c2 [ 1 ; 1 ] + c3 ( [ 1 ; 2 ] + [ 1 ; 1 ] )      by (30)
      = (c1 + c3 ) [ 1 ; 2 ] + (c2 + c3 ) [ 1 ; 1 ]

from which it follows that ~x ∈ T since ~x can be expressed as a linear combination of
[ 1 ; 2 ] and [ 1 ; 1 ]. This shows that S ⊆ T . We now show that T ⊆ S by showing that if
~y ∈ T then ~y ∈ S. Let ~y ∈ T . Then there exist d1 , d2 ∈ R such that

   ~y = d1 [ 1 ; 2 ] + d2 [ 1 ; 1 ]
      = d1 [ 1 ; 2 ] + d2 [ 1 ; 1 ] + 0 [ 2 ; 3 ]

from which it follows that ~y ∈ S since ~y can be expressed as a linear combination of
[ 1 ; 2 ], [ 1 ; 1 ] and [ 2 ; 3 ]. Thus T ⊆ S. Since S ⊆ T and T ⊆ S, we conclude that S = T .

Appendix B

Determinants and Area


Let ~u = [ u1 ; u2 ] and ~v = [ v1 ; v2 ] be vectors in R2 . Recall that [ u1 ; u2 ] ≠ [ u1 ; u2 ; 0 ] and
[ v1 ; v2 ] ≠ [ v1 ; v2 ; 0 ], and that the parallelogram determined by [ u1 ; u2 ] and [ v1 ; v2 ] is a
subset of R2 while the parallelogram determined by [ u1 ; u2 ; 0 ] and [ v1 ; v2 ; 0 ] is a subset of
R3 . However, these two parallelograms do have the same area. See Figure B.1.

[Figure B.1: A parallelogram determined by ~u, ~v ∈ R2 on the left, and its “realization” lying
in the x1 x2 −plane of R3 on the right.]

Thus the area, A, of the parallelogram that ~u, ~v ∈ R2 determine can be computed²⁹ as

   A = k [ u1 ; u2 ; 0 ] × [ v1 ; v2 ; 0 ] k = k [ 0 ; 0 ; u1 v2 − v1 u2 ] k = √( (u1 v2 − v1 u2 )² )
     = |u1 v2 − v1 u2 | = | det[ u1  v1 ; u2  v2 ] | = | det[ ~u ~v ] |.

29
We need to be careful here: we explicitly write “det” when indicating a determinant since we are using
“| · · · |” to indicate absolute value and not the determinant. Mathematics often uses the same notation to
mean different things in different settings, and so we must be careful in cases such as this when such notation
could be interpreted in several ways.

Example B.1

The area of the parallelogram determined by the vectors ~u = [ 1 ; 2 ] and ~v = [ 3 ; 4 ] is

   A = | det[ 1  3 ; 2  4 ] | = |4 − 6| = | − 2| = 2.
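A quick numerical check of the determinant formula for this area (Python with NumPy,
illustrative only):

    import numpy as np

    u = np.array([1.0, 2.0])
    v = np.array([3.0, 4.0])

    area = abs(np.linalg.det(np.column_stack((u, v))))
    print(area)   # 2.0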

Now, consider ~u, ~v ∈ R2 and a linear transformation L : R2 → R2 with standard matrix [ L ].
Using Theorem 38.2, the area of the parallelogram determined by L(~u) and L(~v ) is

   A = | det[ L(~u)  L(~v ) ] |
     = | det[ [ L ]~u  [ L ]~v ] |
     = | det( [ L ][ ~u ~v ] ) |
     = | det[ L ] | | det[ ~u ~v ] |.

Example B.2

Let ~u, ~v ∈ R2 determine a parallelogram with area equal to 4. Let L : R2 → R2 be a
linear transformation with standard matrix

   [ L ] = [ 1 2 ; −1 1 ].

Then the area, A, of the parallelogram determined by L(~u) and L(~v ) is

   A = | det[ L(~u)  L(~v ) ] | = | det[ L ] | | det[ ~u ~v ] | = |1 − (−2)| (4) = 3(4) = 12.

Example B.2 is illustrated in Figure B.2.

[Figure B.2: The parallelogram determined by ~u and ~v on the left and its image under the
linear transformation L on the right.]
Although we have focused on parallelograms, our work generalizes to any shape in R2 . For
example, consider a circle of radius r = 1 centred at the origin in R2 . The area of this circle
is Acircle = πr2 = π(1)2 = π. If we consider a stretch in the x1 −direction by a factor of 2,
then we are considering the linear transformation L : R2 → R2 with standard matrix

   [ L ] = [ 2 0 ; 0 1 ].

The image of our circle under L is called an ellipse, and this ellipse has area

   Aellipse = | det[ L ] | Acircle = |2|π = 2π.

Figure B.3 depicts our circle and the resulting ellipse, and shows that our result for the area
of the ellipse is consistent with the actual formula for the area of an ellipse.

Figure B.3: A circle of radius 1 centred at the origin on the left, and its image
under the linear transformation L on the right.

Determinants and Volume


Let ~u = [ u1 ; u2 ; u3 ], ~v = [ v1 ; v2 ; v3 ] and ~w = [ w1 ; w2 ; w3 ] be three vectors in R3 . From
before, we know the volume, V , of the parallelepiped they determine is given by

   V = |~u · (~v × ~w)|.

Working with the components of ~u, ~v and ~w gives

   V = |~u · (~v × ~w)|
     = | [ u1 ; u2 ; u3 ] · ( [ v1 ; v2 ; v3 ] × [ w1 ; w2 ; w3 ] ) |
     = | [ u1 ; u2 ; u3 ] · [ v2 w3 − w2 v3 ; −(v1 w3 − w1 v3 ) ; v1 w2 − w1 v2 ] |
     = | det[ u1  v1  w1 ; u2  v2  w2 ; u3  v3  w3 ] |
     = | det[ ~u ~v ~w ] |.

From this, it follows that for three vectors ~u, ~v , ~w ∈ R3 and any linear transformation
L : R3 → R3 , the volume of the parallelepiped determined by L(~u), L(~v ) and L(~w) is

   V = | det[ L(~u)  L(~v )  L(~w) ] | = | det[ L ] | | det[ ~u ~v ~w ] |

with the derivation being the same as the two dimensional case.
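The agreement between the scalar triple product and the 3 × 3 determinant is easy to verify
numerically; the sketch below (Python with NumPy, illustrative only) does so for three
arbitrarily chosen vectors.

    import numpy as np

    u = np.array([1.0, 2.0, 0.0])
    v = np.array([0.0, 1.0, 1.0])
    w = np.array([2.0, 0.0, 1.0])

    vol_triple = abs(np.dot(u, np.cross(v, w)))                  # |u . (v x w)|
    vol_det    = abs(np.linalg.det(np.column_stack((u, v, w))))  # |det[u v w]|
    print(vol_triple, vol_det)   # both 5.0 for these vectors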

As in R2 , our work generalizes to any shape in R3 . For example, consider a sphere of radius
r = 1 centred at the origin in R3 . The volume of this sphere is Vsphere = (4/3)πr³ = (4/3)π(1)³ = (4/3)π.
If we consider a stretch in the x2 −direction by a factor of 2 and a stretch in the x3 −direction
by a factor of 3, then we have the linear transformation L : R3 → R3 with standard matrix

   [ L ] = [ 1 0 0 ; 0 2 0 ; 0 0 3 ].

The image of our sphere under L is an ellipsoid, and this ellipsoid has volume

   Vellipsoid = | det[ L ] | Vsphere = |6| (4/3)π = 8π.
Figure B.4 illustrates our sphere and the resulting ellipsoid, and shows that our result for the
volume of the ellipsoid is consistent with the actual formula for the volume of an ellipsoid.

[Figure B.4: A sphere of radius 1 centred at the origin on the left, and its image under the
linear transformation L on the right.]
