
Linear Algebra 1

Department of Mathematics and Computer Science


Technische Universiteit Eindhoven

2018-2019

Lecture notes for Linear Algebra 1 (2WF20)


Contents

1 Complex numbers
  1.1 Arithmetic with complex numbers
  1.2 The exponential function, sine and cosine
  1.3 Complex polynomials
  1.4 Geometry with complex numbers
  1.5 Notes
  1.6 Exercises
      1.6.1 Exercises from old exams

2 Vectors in two and three dimensions
  2.1 Vectors in dimensions two and three
  2.2 Vector descriptions of lines and planes
  2.3 Bases, coordinates, and equations
  2.4 Distances, Angles and the Inner Product
  2.5 The cross product
  2.6 Vectors and geometry
  2.7 Notes
  2.8 Exercises
      2.8.1 Exercises from old exams

3 Matrices and systems of linear equations
  3.1 Matrices
  3.2 Row reduction
  3.3 Systems of linear equations
  3.4 Notes
  3.5 Exercises
      3.5.1 Exercises from old exams

4 Vector spaces
  4.1 Vector spaces and linear subspaces
  4.2 Spans, linearly (in)dependent systems
  4.3 Coordinates
  4.4 Notes
  4.5 Exercises
      4.5.1 Exercises from old exams

5 Rank and inverse of a matrix, determinants
  5.1 Rank and inverse of a matrix
  5.2 Determinants
  5.3 Notes
  5.4 Exercises
      5.4.1 Exercises from old exams

6 Inner product spaces
  6.1 Inner product, length and angle
  6.2 Orthogonal complements and orthonormal bases
  6.3 The QR-decomposition
  6.4 Notes
  6.5 Exercises
      6.5.1 Exercises from old exams

A Prerequisites
  A.1 Sets
  A.2 Maps
  A.3 Some trigonometric relations
  A.4 The Greek alphabet

B Answers to most of the exercises


Preface

These are the lecture notes for the course Linear Algebra 1 (2WF20). Though
this translation follows a previous Dutch version quite closely, I have taken
the opportunity to include various improvements.
If you notice any mistakes, please let me know.

Hans Sterk
Summer 2017

Chapter 1

Complex numbers

1.1 Arithmetic with complex numbers


1.1.1 We can view the real numbers as points on the ‘real line’. In a similar way, we
can view the extended number system to be discussed, the complex numbers,
as points in the plane. We discuss the operations of addition and multiplication
on these new numbers, and their properties, which justify the name ‘numbers’.
In particular, we discuss in this section

• the notion of a complex number;

• the addition/subtraction and multiplication/division of complex numbers;

• the description of a complex number using its absolute value and argument;

• the complex conjugate of a complex number;

• various arithmetical rules;

• the geometric interpretation of complex numbers and their usage in plane geometry.

1.1.2 Start with the usual coordinate system in the plane. Now call the horizontal
axis the real axis and the vertical axis the imaginary axis. Every point in the
plane is determined by its coordinates, say a and b, which are real numbers.
We will call the point (a, b) a complex number, and usually denote it by a+bi.


The points (a, 0) will simply be denoted by a, and the points (0, b) on the
second axis by bi. In particular, i denotes the point (0, 1). So (1, 2) becomes
1 + 2i, and (0, 3) becomes 3i. We often denote a complex number by z or w.
The set of complex numbers is denoted by C.

[Figure: the complex plane, with the real axis (horizontal), the imaginary axis (vertical), and the point z = a + bi with coordinates a and b.]

1.1.3 Addition: the sum of two complex numbers


The sum of two complex numbers z1 = a1 + b1 i and z2 = a2 + b2 i is defined
as follows:
z1 + z2 := (a1 + a2 ) + (b1 + b2 )i.
For example, (1 + i) + (−2 + 4i) = −1 + 5i. The addition of complex
numbers therefore corresponds to coordinatewise addition of points in the
plane. Geometrically, it corresponds to vector addition.
The addition of complex numbers satisfies various properties, which we
know from the real numbers. For instance, the addition is commutative: for
all complex numbers z and w the equality z + w = w + z holds. To prove
this, write z = a + bi and w = c + di (with a, b, c, d real) and apply the
definition first to z + w to get (a + c) + (b + d)i and then to w + z to get
(c + a) + (d + b)i. Since a + c = c + a and b + d = d + b (properties of the
real numbers!) we conclude that z + w = w + z.
In a similar way we can show that the addition is associative: for all
complex numbers z1 , z2 , z3 we have (z1 + z2 ) + z3 = z1 + (z2 + z3 ). This is
useful in the sense that if we encounter a sum z1 + z2 + z3 of three complex
numbers we can deal with this sum by adding z1 + z2 and z3 , or by adding
z1 and z2 + z3 .

1.1.4 The product of two complex numbers


The multiplication of complex numbers is defined in a surprising way, but
the definition is justified by its incredible usefulness, as we will see later.
We define the product of two complex numbers z1 = a1 + b1 i and z2 =
a2 + b2 i as follows:

z1 z2 := (a1 a2 − b1 b2 ) + (a1 b2 + a2 b1 ) i.

In particular, i2 = −1. This definition of the multiplication looks pretty


complicated, but is in fact easy to memorize and use in the following way.
Just expand (a1 + b1 i) (a2 + b2 i), using the rules you are used to from the
real numbers, and then use one additional property, namely that i2 = −1.
For instance,

(1 + i) (−2 + 4i) = −2 + 4i + (−2)i + i(4i) = −2 + 2i − 4 = −6 + 2i.

Since we have defined the multiplication by using the usual rules for deal-
ing with expressions containing symbols, it is not that surprising that the
complex numbers share the usual arithmetical properties with the real num-
bers. The verification is fairly straightforward (cf. the properties of addi-
tion), but quite tedious. Here are the most often used properties: com-
mutativity, i.e., zw = wz for all complex numbers z and w; associativity,
i.e., (z1 z2 )z3 = z1 (z2 z3 ) for all complex numbers z1 , z2 , z3 ; distributivity:
z1 (z2 + z3 ) = z1 z2 + z1 z3 for all complex numbers z1 , z2 , z3 .
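
The following small Python check is not part of the original notes; it simply compares the definition of the product above with Python's built-in complex arithmetic on the worked example.

    def product(a1, b1, a2, b2):
        # (a1 + b1*i)(a2 + b2*i) according to the definition above
        return (a1*a2 - b1*b2, a1*b2 + a2*b1)

    print(product(1, 1, -2, 4))      # (-6, 2), i.e. -6 + 2i
    print((1 + 1j) * (-2 + 4j))      # (-6+2j): Python's own arithmetic agrees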

1.1.5 Absolute value and argument


Multiplication of complex numbers has a geometric interpretation in terms
of scaling and rotations. To see this, we introduce the notions of absolute
value and argument of a complex number, which are closely related to po-
lar coordinates of a point in the plane. See 1.1.9 and 1.1.12. These polar
coordinates of a point are at the basis of the following definition:

• the absolute value is the distance to the origin;

• the argument is the angle the directed segment from the origin to the
complex number makes with the positive real axis. The argument is
only defined for nonzero complex numbers.

The absolute value of the complex number z = a + bi (with a and b real!) is denoted by |z|, and equals √(a^2 + b^2). For instance, |1 + 2i| = √5. The argument is determined up to a multiple of 2π. If we choose the argument of z in the interval (−π, π], then we call this value the principal value of z. The argument of z ≠ 0 is denoted by arg(z). So arg(1 + i) = π/4.

[Figure: a complex number z, its absolute value |z| (the distance to the origin) and its argument arg(z) (the angle with the positive real axis).]

If a complex number z has absolute value |z| and argument ϕ, then the
cartesian coordinates of the corresponding point in the plane are |z| cos ϕ
and |z| sin ϕ, respectively, so that

z = |z| cos ϕ + i|z| sin ϕ, or


z = |z|(cos ϕ + i sin ϕ).

1.1.6 Example. A few examples:

• The complex number i has absolute value √(0^2 + 1^2) = 1 and argument π/2.

• The complex number cos t + i sin t (with t real) corresponds to the point (cos t, sin t) on the unit circle. It has absolute value √(cos^2 t + sin^2 t) = 1 and argument t. Likewise, the complex number r(cos t + i sin t) (with r and t real and r > 0) has absolute value r and argument t.

• The complex number −1 + i has absolute value √((−1)^2 + 1^2) = √2; the principal value of the argument is 3π/4. The complex number −1 + i can also be written in the following way: √2 (cos(3π/4) + i sin(3π/4)). The advantage of this representation is that we see immediately what the absolute value and the argument of −1 + i are.

1.1.7 Real and imaginary part of a complex number


If z = a + bi with a, b ∈ R, then a is called the real part of z, denoted
by Re(z), and b is called the imaginary part of z, denoted by Im(z). For
instance, Im(2 + 3i) = 3.
Re and Im can be viewed as maps from C to R: you put a complex number
in, and a real number comes out. Note that the imaginary part of a complex
number is real! The following properties can be easily verified using the

definitions. For all z1 , z2 we have:


Re(z1 + z2 ) = Re(z1 ) + Re(z2 ),
Im(z1 + z2 ) = Im(z1 ) + Im(z2 ),
Re(z1 z2 ) = Re(z1 )Re(z2 ) − Im(z1 )Im(z2 ),
Im(z1 z2 ) = Re(z1 )Im(z2 ) + Im(z1 )Re(z2 ).
It is useful to know the first two by heart.
From the absolute value and the argument of a complex number z 6= 0 it
is straightforward to find its real and imaginary parts:
Re(z) = |z| cos(arg(z)),
Im(z) = |z| sin(arg(z)).
Conversely, the absolute value can be deduced from the real and imaginary parts:

|z| = √(Re(z)^2 + Im(z)^2).
The (principal value of the) argument of the complex number z is often thought to equal

arg(z) = arctan(Im(z)/Re(z)),

but this is not true in general (check for z = −1 − i). It is true if Re(z) > 0. There is a formula (not important for this course) valid for all z which are not on the negative real axis:

arg(z) = 2 arctan(Im(z)/(|z| + Re(z))).
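
As a numerical illustration (this Python snippet is an addition to these notes, not part of them), one can check that the one-argument arctangent gives the wrong argument for z = −1 − i, while the two-argument arctangent and the formula above give the principal value:

    import cmath, math

    z = -1 - 1j
    print(math.atan(z.imag / z.real))                  # pi/4: not the argument of z
    print(math.atan2(z.imag, z.real))                  # -3*pi/4: the principal value
    print(cmath.phase(z))                              # -3*pi/4 as well
    print(2 * math.atan(z.imag / (abs(z) + z.real)))   # -3*pi/4: the formula above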
1.1.8 Triangle inequality
The absolute value has the following property. For all z and w:

|z + w| ≤ |z| + |w|.

This property is called the triangle inequality. See exercise 6 for a proof.

1.1.9 Absolute value and argument: computational rules


The absolute value and argument satisfy the following important properties.
For all z, w (non-zero in the case of the argument):

|zw| = |z| · |w|


arg(zw) = arg(z) + arg(w) (mod 2π)

(Here, (mod 2π) stands for ‘modulo/up to multiples of 2π’.) These properties
are based on properties of the sine and cosine. To prove them, take two complex numbers z1 and z2 with polar coordinates (|z1|, ϕ1) and (|z2|, ϕ2), respectively, so that

z1 = |z1| cos ϕ1 + i |z1| sin ϕ1,
z2 = |z2| cos ϕ2 + i |z2| sin ϕ2.

Then

z1 z2 = |z1| |z2| ((cos ϕ1 cos ϕ2 − sin ϕ1 sin ϕ2) + i(cos ϕ1 sin ϕ2 + cos ϕ2 sin ϕ1))
      = |z1| |z2| (cos(ϕ1 + ϕ2) + i sin(ϕ1 + ϕ2)).

Since the absolute value of the complex number r(cos t + i sin t) (with r, t real and r > 0) is r and its argument is t (up to multiples of 2π) we find:

|z1 z2| = |z1| |z2|,
arg(z1 z2) = arg(z1) + arg(z2) (mod 2π).        (1.1)

By repeated application of these rules we find for n complex numbers z1, z2, . . . , zn:

|z1 z2 · · · zn| = |z1| |z2| · · · |zn|,
arg(z1 z2 · · · zn) = arg(z1) + arg(z2) + · · · + arg(zn) (mod 2π).        (1.2)

If all z1, z2, . . . , zn are equal to z, then we find:

|z^n| = |z|^n,
arg(z^n) = n arg(z) (mod 2π).        (1.3)

1.1.10 The quotient of two complex numbers


For every complex number z there exists an ‘inverse’, ‘1/z’. To describe it,
note that 1/z should satisfy

z (1/z) = 1,

and so, using formula (1.1), it should satisfy

|z| |1/z| = |1| = 1 ,


arg(z) + arg(1/z) = arg(1) = 0.

Consequently, we can define 1/z in terms of polar coordinates by

|1/z| = 1/|z|,
arg(1/z) = − arg(z).

The quotient z/w of the complex numbers z and w (with w ≠ 0) is defined as the product

z (1/w).

In terms of polar coordinates (again using formula (1.1)) we obtain for the quotient:

|z/w| = |z|/|w|,
arg(z/w) = arg(z) − arg(w) (mod 2π).        (1.4)

The quotient satisfies the usual rules, like

(z1/z2) · (w1/w2) = (z1 w1)/(z2 w2).

The real and imaginary part of 1/z can be obtained as follows. Suppose z = a + bi with a, b ∈ R and not both equal to 0. Then

1/z = 1/(a + bi) = (a − bi)/((a + bi)(a − bi)) = (a − bi)/(a^2 + b^2).

Here, we use that (a + bi)(a − bi) = a^2 + b^2 (verify!). For example,

1/(3 + 4i) = (1/(3 + 4i)) · ((3 − 4i)/(3 − 4i)) = (3 − 4i)/(3^2 + 4^2) = (3 − 4i)/25,

and

(1 + i)/(2 + 3i) = ((1 + i)/(2 + 3i)) · ((2 − 3i)/(2 − 3i)) = (5 − i)/13.

1.1.11 The complex conjugate


If z = a+bi with a, b ∈ R is a complex number, then a−bi is called the complex
conjugate of z, denoted by z̄. Geometrically, conjugation is a reflection in
the real axis. The following properties are easy to verify, geometrically, or

by using the definition.

Re(z) = Re(z̄),
Im(z) = − Im(z̄),
Re(z) = (1/2)(z + z̄),
Im(z) = (1/(2i))(z − z̄),
|z| = |z̄|,
arg(z) = − arg(z̄),
the conjugate of z1 + z2 equals z̄1 + z̄2,
the conjugate of z1 z2 equals z̄1 · z̄2,
z + z̄ = 2 Re(z),
z z̄ = |z|^2.

Note that the last formula implies

1/z = z̄ / |z|^2,

in accordance with 1.1.10. So 1/z is obtained by first reflecting in the real axis and then scaling by a factor 1/|z|^2.
By way of example, we provide the details of the proof of the identity that the conjugate of z1 + z2 equals z̄1 + z̄2. Write z1 = a1 + b1 i and z2 = a2 + b2 i (with ai and bi real for i = 1, 2). Then z1 + z2 = a1 + b1 i + a2 + b2 i = (a1 + a2) + (b1 + b2)i, so that its conjugate equals

(a1 + a2) − (b1 + b2)i.

Now (a1 + a2) − (b1 + b2)i = (a1 − b1 i) + (a2 − b2 i) = z̄1 + z̄2 and we are done.

1.1.12 Geometric interpretation


Let z = a + bi and w = c + di be two complex numbers. The absolute value |z − w| is the distance between z and w, since |z − w| = √((c − a)^2 + (d − b)^2), by the Pythagorean theorem applied to (a, b) and (c, d). The following examples show the usefulness of this interpretation.

• The solutions to the equation |z − i| = 5 are precisely the complex


numbers at distance 5 from i. So they form a circle with center i and
radius 5. Of course, we can rewrite the equation in terms of coordinates
to get the well-known equation of a circle: if z = x + iy, then

|z − i| = 5 ⇔ |z − i|2 = 25 ⇔ |x + (y − 1)i|2 = 25 ⇔ x2 + (y − 1)2 = 25.



• The equation |z − z1 | = |z − z2 | describes the points with equal distance


to z1 and z2 . This is precisely the perpendicular bisector of the segment
with endpoint z1 and z2 .

Complex multiplication with a fixed complex number also has a geometric


interpretation, which follows from 1.1.9. Let’s illustrate this for multiplica-
tion by i. If z 6= 0, then |zi| = |z| · |i| = |z|, and arg(zi) = arg(z) + arg(i) =
arg(z)+π/2. In other words, zi can be obtained from z by rotating z (around
the origin) through an angle π/2. Similarly, multiplication by 1 + i comes down to rotating z through an angle π/4 and scaling by a factor √2.

1.1.13 In computations with complex numbers we can use the representation in


terms of their real and imaginary parts (‘a + bi’, cartesian coordinates) or in
terms of their absolute values and arguments (‘|z|’ and ‘arg(z)’, or r(cos t +
i sin t)). Here is a rough indication of when to use which representation.

• If the computation mainly involves additions, then try using cartesian


coordinates first.

• If the computation mainly involves multiplications (and powers), and


no additions, then try polar coordinates first.

Also note the following: two complex numbers are equal if and only if their
real parts and their imaginary parts are equal. Two nonzero complex num-
bers are equal if and only if their absolute values are equal and their argu-
ments are equal up to multiples of 2π.

1.1.14 Example. We solve the equation z 2 = 2i in two different ways.

• (First solution) Here we write z = x + iy (with x and y real) and


substitute
x2 + 2ixy − y 2 = 2i.
Looking at the real and imaginary parts produces the two (real) equa-
tions x2 − y 2 = 0 and 2xy = 2. From the first equation we deduce for
the real x and y that x = y or x = −y. Substituting in the second
equation then yields x2 = 1 (in case x = y), and x2 = −1 (in case
x = −y). The equation x2 = −1 has no real solutions, and the equa-
tion x^2 = 1 leads to x = 1 or x = −1. So we find the two solutions
1 + i and −1 − i of z^2 = 2i.

• (Second solution) First observe that any solution has to be nonzero.


Using the absolute value and argument we then get:

z^2 = 2i ⇔ |z^2| = 2 and arg(z^2) = π/2 + 2kπ (k integral)
      ⇔ |z|^2 = 2 and 2 arg(z) = π/2 + 2kπ (k integral)
      ⇔ |z| = √2 and arg(z) = π/4 + kπ (k integral).

This leads to the following two solutions:

√2 (cos(π/4) + i sin(π/4)) and √2 (cos(5π/4) + i sin(5π/4))

(we only use the values k = 0 and k = 1, since for k = 2 we find the same solution as for k = 0, for k = 3 we find the same solution as for k = 1, etc.). Verify that these two numbers are 1 + i and −1 − i, respectively.
Note that the second approach is to be preferred if the exponent is bigger, for instance, z^6 = −1 (try the first approach and you'll quickly see why).
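
A short numerical version of the second solution method (added here for illustration; it is not part of the original notes):

    import cmath, math

    a = 2j                                   # right-hand side of z^2 = 2i
    r, phi = abs(a), cmath.phase(a)          # |a| = 2 and arg(a) = pi/2
    roots = [cmath.rect(math.sqrt(r), phi/2 + k*math.pi) for k in (0, 1)]
    print(roots)                             # approximately 1+1j and -1-1j
    print([z*z for z in roots])              # both approximately 2j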

1.1.15 Example. We prove that zw = 0 implies z = 0 or w = 0 in two different


ways.
• First proof. Suppose that z 6= 0, then we need to show that w = 0.
From zw = 0 we infer |zw| = 0 so that |z| · |w| = 0. So now we are in
the situation where the product of the two real numbers |z| and |w| is
0. Since |z| 6= 0 we conclude that |w| = 0 (using the properties of real
numbers). But then w = 0.

• Second proof. Suppose that z 6= 0. Now multiply zw = 0 on both sides


with 1/z:
(1/z) · (zw) = (1/z) · 0 (= 0).

Rewrite the left-hand side using associativity: (1/z) · (zw) = ((1/z) · z) w = w.
So w = 0.
A third approach would be to write z = x + iy and w = u + iv (with x, y, u, v
real) and analyse (x + iy)(u + iv) = 0. This approach is computationally
more involved. The previous two proofs show the usefulness of the arithmetic
rules for complex numbers.

1.2 The exponential function, sine and cosine


So far we have discussed addition, subtraction, multiplication and division
of complex numbers. In this section we turn to the complex exponential
function and the complex sine and cosine functions.

1.2.1 Definition. (Complex exponential function) For every complex number z we define the complex number e^z by giving its absolute value and argument:

|e^z| = e^(Re(z)),
arg(e^z) = Im(z).

1.2.2 Note the use of the real exponential function in this definition. The definition
of the complex exponential function agrees with the real exponential function
for real numbers z, since for a real number z = x+i·0, our new definition 1.2.1
yields |ez | = ex and arg(ez ) = Im(z) = 0. So ez equals the real exponential
ez .
Note furthermore that ez 6= 0 for all complex z because |ez | = eRe(z) 6= 0
(the real exponential function has no zeros).

1.2.3 Example. The complex number eπi has absolute value eRe(πi) = e0 = 1 and
argument Im(πi) = π, so that eπi = −1. The number e1+πi/2 has absolute
value eRe(1+πi/2) = e1 and argument Im(1 + πi/2) = π/2, so that e1+πi/2 = e i.

1.2.4 Example. To solve the equation ez = 1 + i we compare the absolute values


and arguments of the left-hand and right-hand sides. Writing z = x + iy this
gives us the following two equations:
e^x = √2 and y = π/4 + 2kπ with k integral.

The (infinite) solution set is therefore:

{ (1/2) log(2) + i(π/4 + 2kπ) | k ∈ Z },
where log denotes the natural logarithm.
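
One solution from this set can be checked numerically (a small Python verification added to the notes, not part of them):

    import cmath, math

    z = 0.5 * math.log(2) + 1j * (math.pi/4 + 2*math.pi)   # the solution with k = 1
    print(cmath.exp(z))                                     # approximately 1+1j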

1.2.5 Theorem. ez1 ez2 = ez1 +z2 for all complex numbers z1 and z2 .

Proof. We prove this equality by showing that both sides have the same
absolute value and the same argument. Here is the computation for the
absolute value (note that we use arithmetical rules for the absolute value,
and for the real exponential function):

|ez1 ez2 | = |ez1 | |ez2 | = eRe(z1 ) eRe(z2 ) = eRe(z1 )+Re(z2 ) ,

|ez1 +z2 | = eRe(z1 +z2 ) = eRe(z1 )+Re(z2 ) .


So ez1 ez2 and ez1 +z2 have the same absolute value. Similarly,

arg(ez1 ez2 ) = arg(ez1 ) + arg(ez2 ) = Im(z1 ) + Im(z2 ) ,

arg(ez1 +z2 ) = Im(z1 + z2 ) = Im(z1 ) + Im(z2 ) .


So ez1 ez2 and ez1 +z2 have the same argument. This concludes the proof. 

1.2.6 Theorem. (ez )n exists for every integral exponent n. It satisfies (for every
complex number z and integral n) the following property:

(ez )n = enz .

Proof. The existence statement follows from the fact that ez 6= 0. So we


turn to the proof of the equality (ez )n = enz . For positive integral n this
property is a consequence of Theorem 1.2.5 (and for the mathematicians:
use mathematical induction). For n = 0 both sides equal 1.
Next we turn to the case n = −1. Because of 1.1.10 the absolute value of 1/e^z equals

|(e^z)^(−1)| = |e^z|^(−1) = (e^(Re(z)))^(−1) = e^(−Re(z)) = e^(Re(−z)).

Analogously, we find for the argument:

arg((e^z)^(−1)) = − arg(e^z) = − Im(z) = Im(−z).

So (e^z)^(−1) = e^(−z). Applying Theorem 1.2.5, the equality (e^z)^n = e^(nz) follows for all negative integral n. □

1.2.7 Corollary. e2πin = 1 and eπin = (− 1)n for every integral n.


Both properties follow directly from the definition of ez and Theorem 1.2.6.

1.2.8 Theorem. The function e^z is periodic and has period 2πi.


Proof. ez+2πi = ez e2πi = ez · 1 = ez . 

1.2.9 The formulas in Corollary 1.2.7 are a special case of a general property. Let
ϕ be a real number. Then eiϕ is a complex number with absolute value 1
and argument ϕ, so:
Property. For every real number ϕ we have
eiϕ = cos ϕ + i sin ϕ .
This relation connects the complex exponential function and the (real) sine
and cosine. Moreover, it provides another short way of representing a com-
plex number:
z = |z| e^(i arg z).

For instance, 1 + i = √2 e^(πi/4).

1.2.10 From 1.2.9 the following relation for real ϕ follows:


e−iϕ = cos ϕ − i sin ϕ .
From eiϕ = cos ϕ + i sin ϕ and e−iϕ = cos ϕ − i sin ϕ we deduce that for every
real ϕ the following relations hold:

cos ϕ = (1/2)(e^(iϕ) + e^(−iϕ)),
sin ϕ = (1/(2i))(e^(iϕ) − e^(−iϕ)).

Based on this we define for an arbitrary complex number z the (complex) sine and cosine as follows.

1.2.11 Definition. (Complex sine and cosine)

cos z = (1/2)(e^(iz) + e^(−iz)),
sin z = (1/(2i))(e^(iz) − e^(−iz)).

It follows from 1.2.9 that these definitions agree with the real sine and
i.e., if you take z to be a real number in the new definition, then sin z and
cos z are simply the usual real sine and cosine of z, respectively.

1.2.12 Theorem. Sine and cosine are periodic and have period 2π. Also,

sin2 (z) + cos2 (z) = 1 ∀z ∈ C.

Proof.

sin(z + 2π) = (1/(2i)) (e^(i(z+2π)) − e^(−i(z+2π))) = (1/(2i)) (e^(iz) e^(2πi) − e^(−iz) e^(−2πi))
            = (1/(2i)) (e^(iz) − e^(−iz)) = sin z,

since e^(±2πi) has absolute value 1 and argument ±2π, and therefore equals 1.
The proof that the cosine is periodic with period 2π is similar.
The relation sin^2(z) + cos^2(z) = 1 is proved by substituting the defining expressions for sin(z) and cos(z):

((1/2)(e^(iz) + e^(−iz)))^2 + ((1/(2i))(e^(iz) − e^(−iz)))^2 = (1/4)(e^(2iz) + 2 + e^(−2iz)) − (1/4)(e^(2iz) − 2 + e^(−2iz)) = 1.


1.2.13 Example. Solve the equation

cos(z) = 2 .

It is clear that there are no real solutions. But there turn out to be complex
solutions. First rewrite the equation in terms of the exponential function:

(1/2)(e^(iz) + e^(−iz)) = 2.

Now set w = e^(iz), then w ≠ 0 because of 1.2.2, and we find:

w + 1/w = 4,
w^2 − 4w + 1 = 0,
(w − 2)^2 = 3,
w = 2 ± √3.

So we arrive at:

|e^(iz)| = e^(−Im(z)) = |w| = 2 ± √3, so Im(z) = − log(2 ± √3),
arg(e^(iz)) = Re(z) = arg(w) = 0 (mod 2π), so Re(z) = k · 2π, k ∈ Z.

Therefore all solutions are

z = − i log(2 + √3) + k · 2π, k ∈ Z,
z = − i log(2 − √3) + k · 2π, k ∈ Z.

Note that the absolute value of the complex cosine is not bounded by 1 like
for the real cosine.
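
Both families of solutions can be verified directly with the complex cosine (a Python check added for illustration; it is not part of the notes):

    import cmath, math

    z1 = -1j * math.log(2 + math.sqrt(3))   # k = 0 solution of the first family
    z2 = -1j * math.log(2 - math.sqrt(3))   # k = 0 solution of the second family
    print(cmath.cos(z1), cmath.cos(z2))     # both approximately (2+0j)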

1.2.14 Theorem. For every complex number z,

the conjugate of e^z equals e^(z̄),
the conjugate of sin(z) equals sin(z̄),
the conjugate of cos(z) equals cos(z̄).

Proof. The conjugate of e^z has absolute value |e^z| = e^(Re(z)) = e^(Re(z̄)) and argument − arg(e^z) = − Im(z) = Im(z̄), so it equals e^(z̄).
For the sine, take the conjugate of sin(z) = (1/(2i))(e^(iz) − e^(−iz)). Conjugation turns the factor 1/(2i) into −1/(2i) and, by the first part, turns e^(±iz) into e^(∓iz̄) (the conjugate of ±iz is ∓iz̄). Hence the conjugate of sin(z) equals

−(1/(2i)) (e^(−iz̄) − e^(iz̄)) = (1/(2i)) (e^(iz̄) − e^(−iz̄)) = sin(z̄).

The formula for cos(z) can be proved in a similar way. □

1.2.15 The formula ez ew = ez+w for the complex exponential function is useful in
deriving trigonometric formulas. For instance, start with e2it = eit eit (for
real t) and rewrite this relation as follows:

cos(2t) + i sin(2t) = (cos t + i sin t)(cos t + i sin t)



Since (cos t + i sin t)(cos t + i sin t) = cos2 t − sin2 t + 2i sin t cos t we find upon
comparing real and imaginary parts:
cos(2t) = cos^2 t − sin^2 t and sin(2t) = 2 sin t cos t.
This formula (and many similar ones) also turn out to hold for complex
values of t; this is easily verified by applying the definition of the complex
sine and cosine.
In a similar way the formula eia eib = ei(a+b) leads to
cos(a + b) = cos a cos b − sin a sin b;
sin(a + b) = sin a cos b + cos a sin b.

1.3 Complex polynomials


1.3.1 Solving equations is very important in mathematics. Complex numbers en-
able us to solve more equations than with just the real numbers. They also
enable us to see the connections between various types of equations. In this
section we will discuss various types of polynomial equations.

1.3.2 Complex polynomials


An expression of the form
an z n + an−1 z n−1 + · · · + a1 z + a0
in which a0 , . . . , an are complex numbers, is called a complex polynomial in
z. If an 6= 0 then n is the degree of the polynomial. The numbers a0 , . . . , an
are called the coefficients of the polynomial. If they are all real, then the
polynomial is called real.
Let p(z) be a polynomial. If p(α) = 0, then α is called a zero or root
of the polynomial. It is also called a solution of the polynomial equation
p(z) = 0.
The following formulas (see (1.2) and (1.3)) will play an important role
in this section:
|z1 z2 · · · zn | = |z1 | |z2 | · · · |zn |,
arg(z1 z2 · · · zn ) = arg(z1 ) + arg(z2 ) + · · · + arg(zn ) (mod 2π)
and, in particular, if z1 = z2 = ... = zn = z:
|z n | = |z|n ,
arg(z n ) = n arg(z) (mod 2π).

1.3.3 Example. Here is how we use these formulas to solve the equation

z3 = i .

(If you rewrite it like z 3 − i = 0 you see that it comes from a polynomial
equation.) We solve this equation by comparing the absolute values and
arguments of both sides of the equation. First note that any solution is
nonzero. Now turn to the absolute values:

|z 3 | = |z|3 = |i| = 1 , so |z| = 1 .

And for the arguments we find (here we use that z 6= 0)


arg(z^3) = 3 arg(z) = arg(i) = π/2 + k · 2π, k ∈ Z,

so that

arg(z) = π/6 + k · 2π/3, k ∈ Z.

Up to multiples of 2π we find three distinct values for arg(z), for k = 0, k = 1, k = 2. There are therefore precisely three solutions of the equation:

z = cos(π/6) + i sin(π/6), z = cos(5π/6) + i sin(5π/6), z = cos(9π/6) + i sin(9π/6).

Depending on what the solutions are needed for, other forms may be more practical, like e^(iπ/6), e^(5iπ/6), e^(3iπ/2), or: (1/2)√3 + (1/2)i, −(1/2)√3 + (1/2)i, −i.
1.3.4 The equation of the previous example is a special case of the equation

z n = a,

in which n is a nonnegative integer and a is a complex number with a 6= 0.


This type of equation lends itself very well for the approach using absolute
values and arguments. From z n = a we obtain
|z^n| = |z|^n = |a|, so |z| = |a|^(1/n),

arg(z^n) = n arg(z) = arg(a) + k · 2π, k ∈ Z,

arg(z) = (1/n) arg(a) + k · (2π/n), k ∈ Z.

Up to multiples of 2π we find, for k = 0, ..., n − 1, exactly n distinct values


for arg z.
So the equation z^n = a has n distinct solutions, which are all located at regular angular intervals on the circle with center 0 and radius |a|^(1/n), i.e. the circle with equation |z| = |a|^(1/n). In the figure, the four solutions of z^4 = 1 are drawn.

[Figure: the four solutions 1, i, −1 and −i of z^4 = 1 on the unit circle.]

The equation z n = 0 has ‘n coinciding’ solutions, or a solution z = 0 with


multiplicity n. See 1.3.6.
The equation

z^2 = a, a ≠ 0

is also a special case. It has two solutions, both with absolute value √|a| and with arguments (1/2) arg(a) and (1/2) arg(a) + π, respectively.
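
The recipe for z^n = a translates directly into a short routine. The sketch below is an illustration added to these notes (the function name nth_roots is ours, not the notes'); it returns the n solutions in the polar form |z| e^(i arg z):

    import cmath

    def nth_roots(a, n):
        # the n distinct solutions of z**n = a, for a != 0, following the derivation above
        r, phi = abs(a), cmath.phase(a)
        return [r**(1/n) * cmath.exp(1j * (phi + 2*cmath.pi*k) / n) for k in range(n)]

    print(nth_roots(1j, 3))   # the three solutions of z^3 = i from Example 1.3.3
    print(nth_roots(1, 4))    # approximately 1, i, -1, -i: the solutions of z^4 = 1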
Next, we investigate the zeros of a complex polynomial p(z) of degree n.

1.3.5 Theorem. If α is a root of the complex polynomial p(z) of degree n > 0,


i.e., p(α) = 0, then z − α is a factor of p(z), i.e., there exists a complex
polynomial q(z) of degree n − 1 such that

p(z) = (z − α) q(z) .

Proof. For every complex number α (regardless of whether it is a zero of p(z) or not), we can divide p(z) by z − α (e.g., by using the technique of long division). We then get a quotient q(z) and a remainder r (a constant):

p(z) = (z − α)q(z) + r .

If α is a zero, so that p(α) = 0, then we obtain

0 = p(α) = (α − α)q(α) + r

and so r = 0. 

1.3.6 If p(α) = 0, then p(z) can be written as (z − α)q(z) for some polynomial
q(z). If q(α) = 0, then q(z) also contains a factor z − α, and so we have (for
some polynomial s(z))

p(z) = (z − α)^2 s(z), etc.

Definition. α is called a zero of multiplicity m of the complex polynomial


p(z) if there exists a polynomial t(z) with t(α) 6= 0 such that

p(z) = (z − α)m t(z) .

The multiplicity of a zero α is the number of factors z − α of p(z).

1.3.7 If p(z) is a polynomial of degree n, then the preceding discussion implies that the total number of zeros, counted with multiplicities, is at most n. In fact, we have the following even stronger property.

1.3.8 Theorem. (Fundamental theorem of algebra) Every complex polyno-


mial of degree n has exactly n zeros if every zero is counted with its multi-
plicity.
The proof is beyond the scope of this course (and requires advanced tech-
niques). Note that we did prove the theorem for the special class of polyno-
mials of the form z n − a. Also note that the theorem doesn’t hold over the
real numbers: x2 + 1 is a real polynomial of degree 2 without any zeros in R.

1.3.9 Polynomials of degree 1


For polynomials of degree 1 the zero is easily found as follows

az + b = 0 , a 6= 0

implies z = − b/a.

1.3.10 Polynomials of degree 2


Next we turn to polynomials of degree 2:

az 2 + bz + c , a 6= 0 .

Rewrite the polynomial as follows:

az^2 + bz + c = a(z^2 + (b/a) z) + c = a(z + b/(2a))^2 + (4ac − b^2)/(4a).

Now let w = z + b/(2a); then we find the (quadratic) equation

w^2 = (b^2 − 4ac)/(4a^2),
which is of the type we discussed before. It has two solutions for w (unless
b2 −4ac = 0), see 1.3.4, from which the two solutions for z follow immediately.
The technique we used in rewriting the quadratic equation is called com-
pleting the square. Note that we do not use the abc formula, since we haven’t
defined square roots.

1.3.11 Example. Consider the equation

z 2 + (2 + 4i)z + i = 0 .

Completing the square yields

(z + (1 + 2i))2 = −3 + 3i ,

so that (see 1.3.4.):



z + 1 + 2i = 18^(1/4) (cos(3π/8) + i sin(3π/8)) or
z + 1 + 2i = 18^(1/4) (cos(11π/8) + i sin(11π/8)).

Finally,

z = −1 + 18^(1/4) cos(3π/8) + i(−2 + 18^(1/4) sin(3π/8)) or
z = −1 − 18^(1/4) cos(3π/8) + i(−2 − 18^(1/4) sin(3π/8)).
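
The notes avoid the abc formula because complex square roots have not been defined symbolically; numerically, however, the two solutions can be checked as follows (an added sketch, not part of the notes; cmath.sqrt returns one of the two square roots of b^2 − 4ac):

    import cmath

    a, b, c = 1, 2 + 4j, 1j
    d = cmath.sqrt(b*b - 4*a*c)              # one square root of -12 + 12i
    roots = [(-b + d) / (2*a), (-b - d) / (2*a)]
    print(roots)
    print([z*z + b*z + c for z in roots])    # both values are approximately 0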
1.3.12 Degree 3 and higher
For polynomials of degree 3 and 4 there exist (complicated) ways to produce
the solutions in an algorithmic manner (there are formulas like the abc for-
mula). For polynomials of degree 5 and higher such algorithms and formulas
do not exist. In those cases we have to use numerical methods to approximate
the zeros. Of course, in some specific cases one may be able to find (some
of) the exact solutions, for instance for equations of the form z n − a = 0.

The following theorem deals with polynomials whose coefficients are all real.
In that case any non-real solution automatically produces a ‘twin’ solution.

1.3.13 Theorem. Let


p(z) = an z n + an−1 z n−1 + · · · + a1 z + a0 ,
where an , an−1 , ..., a1 , a0 are all real. If the complex number α satisfies p(α) =
0, then p(ᾱ) = 0.
Proof. Since α is a zero, we have

a_n α^n + a_(n−1) α^(n−1) + · · · + a_1 α + a_0 = 0.

Taking the complex conjugate of both sides yields

the conjugate of (a_n α^n + a_(n−1) α^(n−1) + · · · + a_1 α + a_0) = 0̄ = 0.

Applying the rules for complex conjugation from 1.1.11 (and the fact that ā_k = a_k for k = 0, . . . , n) then produces

a_n ᾱ^n + a_(n−1) ᾱ^(n−1) + · · · + a_1 ᾱ + a_0 = 0,

so p(ᾱ) = 0. □

1.3.14 Corollary. Every nonzero polynomial with real coefficients can be factored
as a product of real polynomials of degree 1 and 2.
Proof. Let p(z) be a polynomial with real coefficients. If α is a real zero of
p(z), then p(z) can be written as
p(z) = (z − α)q(z) ,
where q(z) is also a polynomial with real coefficients. If α is a non-real zero
of p(z), then ᾱ is also a zero and
p(z) = (z − α)(z − ᾱ) r(z)
= (z 2 − (α + ᾱ)z + αᾱ) r(z)
= (z 2 − 2Re(α)z + |α|2 ) r(z) .
The first factor has real coefficients, so r(z) has real coefficients. Since the
degrees of q(z) and r(z) are less than the degree of p(z), we can repeat this
construction until we get to the point where the degrees of the quotients are
0. 

1.3.15 Example. Suppose we wish to factor the polynomial


p(z) = z 5 − 6z 4 + 25z 3 − z 2 + 6z − 25.
into real factors of degrees 1 and 2. Suppose also that we know that 3 − 4i is
a zero of the polynomial. Because the polynomial has real coefficients, 3 + 4i
is also a zero, and so the polynomial has a factor
(z − 3 + 4i)(z − 3 − 4i) = z 2 − 6z + 25.
Long division then gives
p(z) = (z 2 − 6z + 25)(z 3 − 1) .
The last factor has a zero z = 1 and therefore contains a factor z − 1:
p(z) = (z 2 − 6z + 25)(z − 1)(z 2 + z + 1) .
The third factor, z 2 + z + 1, has no real factors of degree 1. (Since both zeros
of this polynomial are non-real.)
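
The factorization can be checked numerically at a few arbitrary test points (a small verification added here; it is not part of the original notes):

    def p(z):
        return z**5 - 6*z**4 + 25*z**3 - z**2 + 6*z - 25

    def factored(z):
        return (z*z - 6*z + 25) * (z - 1) * (z*z + z + 1)

    for z in (2 + 1j, -3, 0.5j):            # arbitrary test points
        print(p(z) - factored(z))           # approximately 0 at every test point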

1.4 Geometry with complex numbers


1.4.1 Complex numbers have proven their usefulness in many branches of mathe-
matics (and other disciplines as well). In this section we explore the relation
with plane geometry, and show how complex numbers add to the techniques
for addressing geometric problems. In the next chapter we will see a similar
role for vectors.
In the sequel we identify the plane from classical geometry with the com-
plex plane. Points are then described by complex numbers. Of course, we
have the freedom to put the origin at a suitable place in a given problem. We
will assume some basic facts from classical geometry at certain points in our
discussion. The book [5] elaborates more extensively on the use of complex
numbers in planar geometry.

1.4.2 Lines and segments in the (complex) plane


If z ≠ 0 is a complex number, then the numbers tz with t real describe the line through 0 and z. The segment with endpoints 0 and z is described by taking t in the interval [0, 1]. We sometimes denote this segment by [0, z]. The midpoint of the segment [0, z] is the complex number (1/2)z.

For two distinct complex numbers z and w the complex numbers of the form z + t(w − z) with t real run through the points of the line through z and w. The segment with endpoints z and w (whose points correspond to parameter values t in the interval [0, 1]) is denoted by [z, w]. For t = 1/2 we find the midpoint of the segment [z, w]:

w + (1/2)(z − w) or (1/2)(z + w).

The length of the segment [z, w] is equal to the distance between z and w, i.e., |w − z|. The complex number w − z not only encodes the information on the length of the segment [z, w], but also on the segment's direction via its argument. For example, the segment [2 + i, 3 + 2i] has length |1 + i| = √2, and it makes an angle of π/4 radians with the positive real axis since arg(1 + i) = π/4.
The lines through z1 and z2 (with z1 ≠ z2) and through w1 and w2 (with w1 ≠ w2), respectively, are parallel if and only if w2 − w1 is a real multiple of z2 − z1, or, equivalently, the quotient (w2 − w1)/(z2 − z1) is real.
1.4.3 Example. This example illustrates the use of complex numbers in handling
segments in a triangle. Suppose △ABC is a triangle in the plane. Let D
be the midpoint of AC and let E be the midpoint of BC. We will show,
using complex numbers, that DE is parallel with AB and that the length of
segment AB is twice the length of segment DE.
To show this, let A, B, C correspond to the complex numbers z1 , z2 , z3 ,
respectively (it turns out to be irrelevant where the origin is). Then D and
E correspond to
(1/2)(z1 + z3) and (1/2)(z2 + z3),

respectively. Since

(1/2)(z2 + z3) − (1/2)(z1 + z3) = (1/2)(z2 − z1)
we conclude that DE and AB are parallel and that segment DE is half as
long as segment AB.

1.4.4 Translations
Let u be a complex number. The map T : C → C given by T (z) = z + u

is a translation over u. Translations ‘preserve shapes’, so, for example, they


transform straight lines into straight lines.

1.4.5 Rotations and circles


If w is a complex number with absolute value 1 and argument α radians, the
map R : C → C defined by R(z) = zw is a rotation through α radians. To
see this, note that
|R(z)| = |zw| = |z| · |w| = |z|,
so that R(z) and z are at the same distance from the origin (they are both on
the circle with center 0 and radius |z|), and

arg(R(z)) = arg(zw) = arg(z) + arg(w) = arg(z) + α (mod 2π),

so that the argument of R(z) is α radians more than that of z. Another way
of saying this is: if z is on the circle C with equation |z| = r, then R(z) is
also on C.
Similarly, if the absolute value of w 6= 0 differs from 1, then multiplication
by w defines a transformation of the plane in which each complex number is
rotated through arg(w) radians and is scaled by a factor |w|.
A circle with center z0 and radius r consists of all complex numbers z
satisfying
|z − z0 | = r.
Depending on the situation, alternative descriptions may be useful. Here are
a few equivalent descriptions.

• |z − z0 |2 = r2 or (z − z0 )(z̄ − z¯0 ) = r2 , where in the last equation we


have used the fact that |w|2 = ww̄ for every complex number w.

• If we write z0 = x0 + iy0 and z = x + iy (with x0 , y0 , x, y real), then


|z − z0 |2 = r2 expands into the well-known ‘real’ equation of a circle:
(x − x0 )2 + (y − y0 )2 = r2 .

• An explicit way of describing all points on the circle with equation


|z − z0 | = r is as follows: z − z0 has absolute value r and is therefore
of the form r eit (or r(cos t + i sin t)), where t is the argument (modulo
2π) of z − z0 . So we find that z can be described as z0 + reit . This is an
example of a parametric equation of the circle |z − z0 | = r. (Of course,
there are many, z0 + re2it , z0 − reit are two more examples.)

For example, suppose you are asked to show that w̄ = 1/w for every complex number w on the circle C : |z| = 1; then you could proceed as follows. Let w be an arbitrary complex number on C, then w can be written as e^(it) for some real t. Then w̄ = e^(−it) by Theorem 1.2.14, and 1/w = 1/e^(it) = e^(−it) by Theorem 1.2.6, and so we are done. (An alternative approach is to write w
in the form cos t + i sin t, etc.)
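
The effect of multiplying by a number of absolute value 1 can also be seen numerically (a short Python illustration added to the notes, not part of them):

    import cmath, math

    w = cmath.exp(1j * math.pi / 2)   # |w| = 1, arg(w) = pi/2: rotation through pi/2
    z = 3 + 1j
    print(w * z)                      # approximately -1 + 3j: z rotated about the origin
    print(abs(w * z), abs(z))         # equal: the rotation keeps z on the circle |z| = |z|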

1.4.6 Example. If the complex numbers z and w have the same absolute value
(6= 0) and if the angle between (the segments connecting 0 with) z and w
is equal to α, then the argument of the quotient z/w is α or −α so that
z/w = eiα of z/w = e−iα . Another way of phrasing this is to say that
z = weiα or z = we−iα .
If, for instance, in △αβγ (so a triangle with vertices α, β, γ) γ − α =
e±πi/3 (β − α), then this can be read as: the segment [α, γ] is obtained from
the segment [α, β] by a rotation through ±π/3 radians. In particular, these
two segments have the same length:
|γ − α| = |e±πi/3 (β − α)| = |e±πi/3 | · |(β − α)| = |β − α|.
And of course, the triangle is then equilateral by the congruence criterion
SAS (side-angle-side). Here is a verification that |γ − β| = |β − α|(= |γ − α|)
using complex numbers. First rewrite γ − β as follows:
γ − β = γ − α + α − β = e^(±πi/3)(β − α) + α − β = (e^(±πi/3) − 1)(β − α).

Now note that e^(±πi/3) − 1 = 1/2 ± (1/2)i√3 − 1 = −1/2 ± (1/2)i√3, whose absolute value is 1. So

|γ − β| = |(e^(±πi/3) − 1)(β − α)| = |e^(±πi/3) − 1| · |β − α| = |β − α|.
Note that this example really comes down to the fact that the complex num-
bers 0, eπi/3 and eπi/3 − 1 are the vertices of an equilateral triangle.

1.4.7 Example. If △z1 z2 z3 is a triangle, then for every complex number w 6= 0 the
triangles △z1 z2 z3 and △(wz1 )(wz2 )(wz3 ) (multiply each vertex zi by w) are
similar. There are various ways to see this. One way is to compare lengths
of corresponding sides (using the rules for absolute values):
|wz2 − wz1| / |z2 − z1| = (|w| · |z2 − z1|) / |z2 − z1| = |w|,

and

|wz3 − wz1| / |z3 − z1| = (|w| · |z3 − z1|) / |z3 − z1| = |w|,
and, similarly, |wz3 − wz2 | = |w| · |z3 − z2 |. So the triangles are similar by
the sss criterion (side-side-side).
Of course, you can also compare the angles of triangle △(wz1 )(wz2 )(wz3 )
with those of triangle △z1 z2 z3 , e.g., arg(wz1/wz2) = arg(z1/z2). So both triangles
have the same angles and are therefore similar.

1.4.8 Example. Let △ABC be a triangle. Let BCDE and ACF G be two squares
erected externally on the sides BC and AC, respectively, as in the illustra-
tion. Let H be the midpoint of DF . Prove that HC and AB are perpen-
dicular.

[Figure 1.1: Triangle △ABC with the two squares BCDE and ACF G erected externally on BC and AC, and the midpoint H of DF .]

The idea in the following proof is to connect perpendicularity with

multiplication by i.
Put the origin in C (the point C seems central to the configuration, so
looks like a reasonable choice to make the computations easier) and denote
vertex A by the complex number z and vertex B by w. Then vertex D
corresponds to iw (rotate B around C through 90◦ ), and vertex F corresponds to −iz (rotate vertex A through −90◦ ). The midpoint of segment DF is then (1/2)(iw − iz). Since

(1/2)(iw − iz) = (1/2) · i · (w − z)
and since w − z corresponds to segment AB, we find that HC is indeed
perpendicular to AB (and has half its length).

1.4.9 The nine-point circle I


Let △ABC or, in complex terms, △αβγ be a triangle, where the origin is

chosen in the center of the circumcircle. Suppose that |α| = |β| = |γ| = 1. The points (1/2)(β + γ), (1/2)(α + γ) and (1/2)(α + β) are the midpoints D, E, F of the three sides BC, AC and AB, respectively. It follows from classical geometry that the segments connecting the origin with each of these midpoints are perpendicular to the corresponding sides of the triangle. The point (1/3)(α + β + γ) is the centroid Z of triangle △αβγ.

[Figure 1.2: The circumcircle of △ABC with center O, and the circumcircle of △DEF with center N . The centroid Z, the orthocenter H and the altitudes are also shown.]

The point h = α + β + γ is also special: h − γ = α + β = 2 · (1/2)(α + β), so h − γ and AB are perpendicular and h − γ is twice as long as OF . So the point h is on the altitude from C. Similarly, h is on the altitudes through B and A, respectively. So the three altitudes are concurrent. Their common point H (or h in terms of complex numbers) is called the orthocenter of triangle △ABC. (By the way: in every triangle the three altitudes are concurrent; the assumptions we have made are not restrictive; do you see why?)
The point (1/2)(α + β + γ) or h/2 is also special. To see this, consider the

distances between this point and the three midpoints of the sides of △ABC:

|(1/2)(α + β + γ) − (1/2)(β + γ)| = |(1/2) α| = 1/2.

The distances to the midpoints E and F are also equal to 1/2, and so the point h/2 is the center N of the circumcircle of triangle △DEF (see Figure 1.2).

1.4.10 The nine-point circle II


[Figure 1.3: The circumcircle of △DEF with center N also passes through the midpoints of segments AH, BH and CH.]

The circle through D, E, and F turns out to pass through the midpoints of

the three segments connecting h and the three vertices A, B and C, respectively. The distance of h/2 to these midpoints is equal to 1/2:

|(1/2) h − (1/2)(h + α)| = |−(1/2) α| = 1/2,
etc. By now we have: the midpoints of the sides of △ABC and the midpoints of the segments HA, HB and HC lie on the same circle.
This circle turns out to also pass through the three feet of the altitudes of △ABC (as Figure 1.3 suggests). For this reason the circle is called the nine-point circle of △ABC. The proof is discussed in exercise 24.

1.4.11 Other transformations


Translations, rotations and reflections (and their compositions) have the spe-
cial property that they are bijections of the plane, i.e., they have inverses.
There also exist transformations that are not bijections (and still useful), like
the transformation sending z to z 2 . We will not discuss them in this course.

1.5 Notes
More worked examples can be found in [7] and [8] (see the bibliography at the
end of the lecture notes). The role of complex numbers in geometry is extensively
discussed in [5].
The construction of complex numbers is an example of the construction of an arithmetical system. Another example is the system Z/nZ of integers modulo n.
Such constructions are discussed in the various algebra courses.
Complex numbers have a centuries long history. A ‘formal’ definition in terms
of pairs of real numbers was given by Sir William Rowan Hamilton (1805–1865), see [1],
p. 524. He defined the addition on such pairs by (a, b) + (c, d) = (a + c, b + d),
and the multiplication by (a, b) · (c, d) = (ac − bd, ad + bc). By agreeing to write
a instead of (a, b) (for real a) and i for (0, 1), we arrive at the usual notation
a + bi. Hamilton’s approach to define complex numbers in terms of the familiar
real numbers contributed to the demystification of complex numbers. Hamilton
generalized his construction to an arithmetical system with elements of the form
a+bi+cj+dk (with a, b, c, d ∈ R), where i2 = −1, j 2 = −1, k 2 = −1, ij = k = −ji,
jk = i = −kj, ki = j = −ik. This is the famous arithmetical system of the
quaternions.
Complex numbers are useful for linear algebra since they enable us to solve polynomial equations related to linear transformations, as will be discussed in Linear Algebra 2. Polynomials are discussed in more detail in the algebra courses.
They play an important role in many branches of mathematics, ranging from
numerical mathematics to cryptology.
The Fundamental Theorem of Algebra has a long history in itself. It took
many decades in the 18th and 19th century and the efforts of mathematicians like
d’Alembert, Argand, Gauss to produce a rigorous proof (many candidate proofs
contained a subtle gap which could only be filled after the development of rigorous
analysis and topology), see [1]. A proof that uses complex integration is discussed in Complex Analysis. The fact that there do not exist explicit formulas for solving polynomial equations of degrees 5 and higher requires a substantial amount of algebra.
The analysis of functions f : C → C, i.e., limits, continuity, differentiation,
integration, is also discussed in the course on complex analysis. Complex analysis
is extensively used not only in mathematics, but also in electrical engineering and
in mathematical physics.

1.6 Exercises
§1

1 Write each of the following complex numbers in the form a + bi with a and b real:

a. (2 + 3i)(1 − i),
b. (−1/2 + (1/2)i√3)(−1/2 − (1/2)i√3),
c. 1/(4 − 3i),
d. (7 + i)/(1 + 2i),
e. (9 − 3i)/(1 + 3i),
f. z/(z + 1)^2, with z = (1/2)√2 + (1/2)√2 i.

2 Write each of the following complex numbers in the form r(cos ϕ + i sin ϕ), with r > 0 and −π ≤ ϕ ≤ π, and draw these numbers in the complex plane:

a. −3,
b. 2i,
c. 1 + i,
d. √3 + i,
e. 5 + 12i,
f. 4 − 4i.

3 Draw an arbitrary complex number z (and not on the real axis) in the com-
plex plane.

a. Draw, without any computations (!), the following complex numbers


(make sure the relative position to z is clearly indicated):

z + 2, −2z, 1/z, z − 2i, iz, z̄, −iz.

b. Similar question for: z(cos(π/2) − i sin(π/2)), 3z(cos(7π/6) + i sin(7π/6))


and z(cos(2π/3) + i sin(2π/3)).

4 In the complex plane, draw the complex numbers z ∈ C that satisfy both

|z + 1 − i|^2 ≤ 2 and π/2 ≤ arg z ≤ 3π/4.
5 Determine all complex numbers z that satisfy

a. |z − i| = |z + 3i|, d. Re(z 2 + 1) = 0 and |z| = 2,


b. |z − 3i| = |4 + 2i − z|, e. arg(z/z) = 3
.

c. Re(z 2 ) = Im(z 2 ),

6 Prove the triangle inequality |z1 + z2 | ≤ |z1 | + |z2 | in the following steps.

a. Prove the inequality in the case z1 = 1. Write z2 in the form z2 =


r(cos t + i sin t) and analyse |1 + r cos t + ir sin t|2 .

b. Prove the inequality |z1 + z2 | ≤ |z1 | + |z2 | in the case z1 6= 0 by dividing


both sides by |z1 |.

Show furthermore that for all complex numbers z1 , z2 , . . . , zn the following


inequality holds:

|z1 + z2 + · · · + zn | ≤ |z1 | + |z2 | + · · · + |zn |.

§2

7 Draw each of the following complex numbers in the plane and write them in
the form a + bi (with a, b real):
a. 2eπi/2 , d. e5πi/3 ,

b. 3e2πi/3 , e. e(−πi/3)+3 ,

c. 2eπi/4 , f. e−5πi/6+2kπi , k ∈ Z.

8 Solve each of the following equations:


a. e^z = 1 + i,
b. e^z = 1 + √3 i,
c. e^(Re(z)) = 5,
d. e^(|z|) = 1,
e. e^(−z^2) = −i,
f. e^(2iz) = (1 + i)/(1 − i).

9 Use the definitions of the complex cosine and sine to show each of the fol-
lowing statements.

a. sin 2z = 2 sin z cos z for all z ∈ C.



b. cos 2z = cos2 z − sin2 z for all z ∈ C.

10 Solve each of the following equations:


a. (1/2)(e^(iz) + e^(−iz)) = 0,

b. sin(2z) = 4.

§3

11 Solve each of the following equations and draw the solutions in the complex
plane.
a. z 6 = 1, e. (z + 2 − i)6 = 27i,

b. z 3 = 8, f. z 2 = z,

c. z 4 = 16i, g. z 3 = −z.

d. (z + i)4 = −1,

12 Solve each of the following equations and draw the solutions in the complex
plane.

a. z 2 + z + 1 = 0,

b. z 2 − 2iz + 8 = 0,

c. z 2 − (4 + 2i)z + 3 + 4i = 0,

d. z 2 (i + z 2 ) = −6.

13 a. The equation z 3 + (2 − 3i)z 2 + (−2 − 6i)z − 4 = 0 has a solution z = i.


Determine the other solutions.

b. The equation z 4 + 4z 3 + 3z 2 − 14z + 26 = 0 has a solution z = 1 + i.


Determine the other solutions.

c. Suppose 5 and 1 + 2i are zeros of a degree 3 polynomial with real coefficients.
Determine such a polynomial.

d. Suppose i and 2 − 3i are zeros of a degree 4 polynomial with real coeffi-


cients. Determine such a polynomial.

14 Factor in real factors of lowest possible degrees:


a. z 4 − 3z 2 − 4,

b. z 3 + 3z 2 + 4z + 2,

c. z 4 + z 3 + 2z 2 + z + 1.

15 a. Compute (1 + i)11 .

b. Suppose the complex number z satisfies


z^4 = 8√3 + 8i and π/2 ≤ arg(z) ≤ π.
Determine the exact values of |z 23 | and arg(z 23 ) (the argument taken in
the interval [0, 2π]).

16 Prove that for all positive integers n De Moivre’s formula holds for real ϕ:

(cos ϕ + i sin ϕ)n = cos nϕ + i sin nϕ.

(After the French mathematician A. de Moivre (1667–1754)). Use it to ex-


press cos 3ϕ and sin 4ϕ in terms of cos ϕ and sin ϕ.

17 Let p(z) = az 2 + bz + c be a complex polynomial with a 6= 0. Prove the


following statement using the steps outlined below: The polynomial p(z) has
a zero with multiplicity 2 if and only if b2 − 4ac = 0.
• First part: If p has a zero with multiplicity 2, then b2 − 4ac = 0.
So suppose that λ is such a zero of p(z). Now show:

1) p(z) = az 2 + bz + c = a(z − λ)2 .


2) Express b and c in terms of λ and verify that b2 − 4ac = 0.

• Second part: If b2 − 4ac = 0, then p(z) has a zero with multiplicity 2.


Complete the square in p(z) = az 2 + bz + c and use b2 − 4ac = 0.
§4

18 Prove each of the following statements.


a. The complex number z is real if and only if z = z̄.

b. The complex number z is purely imaginary (i.e., real part equal to 0) if and only if z + z̄ = 0.

c. The (segments connecting 0 with the) complex numbers z and w (both ≠ 0) are parallel if and only if z̄w = zw̄. [Hint: analyse the quotient z/w.]

d. The (segments connecting 0 with the) complex numbers z and w (both ≠ 0) are perpendicular if and only if z̄w + zw̄ = 0. [Hint: what can you say of the quotient z/w if z and w are perpendicular?]

19 (Reflections) In terms of complex numbers complex conjugation describes


a reflection in the real axis: z = x + iy is mapped to x − iy, i.e. to z̄.

a. Show that a reflection in the imaginary axis maps z to −z̄.

b. A reflection of z in a line through the origin making an angle of α radians


with the positive real axis can be described as follows: first rotate z around
the origin through −α radians, then reflect the result in the real axis, and,
finally, rotate through α radians. Describe the resulting complex number
in terms of z.

c. The angle between the lines ℓ and m through the origin is α radians. We
first reflect z in ℓ and then the result in m. Show that this composition of
these two reflections is a rotation through 2α radians around the origin.
[Hint: assume the angle between ℓ and the positive real axis is β radians,
and the angle between m and the real axis is β + α radians.]

20 Let △ABC be an equilateral triangle, whose vertices A, B, C correspond to


the complex numbers α, β, γ, respectively.
a. In this item we put the origin in A. Show that the vertices can be repre-
sented in the following way: 0, z, exp(πi/3) z.

b. Let ρ = eπi/3 and let ω = ρ2 . Verify that ρ3 = −1, and 1 − ρ + ρ2 = 0,


and 1 + ω + ω 2 = 0.

c. From this item onwards, the origin is not necessarily located in one of the
vertices. Prove that γ − α = ρ(β − α) or γ − α = ρ̄(β − α).

d. Prove that α + ωβ + ω 2 γ = 0 or α + ω 2 β + ωγ = 0 if △ABC is equilateral.

e. Prove that △ABC is equilateral if α + ωβ + ω 2 γ = 0 or α + ω 2 β + ωγ = 0.



21 Let ℓ be the line through the two distinct complex numbers v and w. Then
ℓ consists of all complex numbers of the form v + t(w − v) with t real.
a. Prove: if, for a complex number z with z ≠ v and z ≠ w, the quotient (z − w)/(z − v) is a real number, say t, then z is on the line ℓ.

b. Prove: if z, distinct from v and w, is on ℓ, then the quotient (z − w)/(z − v) is a real number.

c. Prove that v, w, z are collinear (lie on one line) if and only if

(z − w)(z̄ − v̄) = (z̄ − w̄)(z − v).

22 Suppose ABCD and AB ′ C ′ D′ are two squares in the plane that a) have
vertex A in common, b) have the same orientation of the vertices, and c) lie
outside one another. Let P be the intersection of the diagonals AC and BD;
let Q be the intersection of the diagonals AC ′ and B ′ D′ ; let R be the midpoint
of the segment BD′ , and let S be the midpoint of the diagonal B ′ D. Prove
that P QRS is a square by first showing that segment P S transforms into P R
by a rotation through 90◦ . (Do not denote complex numbers corresponding
to P , etc., by P , etc.; use for instance corresponding small letters.)

23 Reflecting in the line through u and v


Let ℓ be the line in the complex plane through the points u and v. In this
exercise we determine the mirror image of z when we reflect z in ℓ.
a. Suppose u = 0. Show that we can write z in the form z = r exp(it) · v for some
real r and t. What is the mirror image of z in this case?

b. Back to the general case: show that z can be written as u + r exp(it) · (v − u).
Use this to show that the mirror image of z is equal to

u + (z̄ − ū) · (v − u)/(v̄ − ū).

c. Show that this expression simplifies to

u + v − uv z̄

if |u| = |v| = 1. [Hint: use that 1/u = ū.]



24 The nine-point circle


The nine-point circle, (see 1.4.9 and 1.4.10) also passes through the feet of
the three altitudes. In this exercise we discuss a proof of this fact.

a. The altitude from A intersects the circumcircle of △ABC in A′ . Show
that the corresponding complex number α′ satisfies

(α − α′)/(β − γ) + (ᾱ − ᾱ′)/(β̄ − γ̄) = 0.

[Hint: since α − α′ and β − γ are perpendicular, the quotient (α − α′)/(β − γ) is
purely imaginary.]

b. Now use ᾱ = 1/α, β̄ = 1/β, etc., to deduce that

α′ = −βγ/α.
[Note: an alternative approach would be to compute the mirror image of
h = α + β + γ in the line AB with the formula from exercise 23c) and to
verify that this mirror image is on the circumcircle of △ABC.]

c. Show that the segments BH and BA′ have the same length.

d. Conclude that the foot P of the altitude from A is

½ (h − βγ/α).

e. Show that the distance between h/2 and P equals ½. Conclude that the
nine-point circle passes through the three feet P , Q, R of the altitudes.

1.6.1 Exercises from old exams


25 Determine all complex numbers z that satisfy

(z̄ · z)/(1 − z)² = 1.

26 Solve the following equation in C:

exp(2iz) = (1 + i)/(1 − i).
27 a. Sketch the set of z ∈ C that satisfy

| arg(z)| = π/4,

and the set of z ∈ C that satisfy

|z + 2i| = |z − 3|.

b. Determine all z ∈ C satisfying

| arg(z)| = π/4 and |z + 2i| = |z − 3|.
28 The complex number 1 + 2i is a zero of the polynomial z⁴ − 2z³ + 9z² − 8z + 20.
Find the factorization in factors of lowest possible degrees and determine the
remaining zeros.

29 Let p(z) be a complex polynomial. Prove that p(z) is a real polynomial (i.e.,
all its coefficients are real) if and only if p(z̄) equals the complex conjugate of p(z) for all z ∈ C.

30 Solve in C:
z³ = i z.

31 Suppose the squares ABCD and A′ B ′ C ′ D′ have the same orientation (so
going from A to B to C to D and going from A′ to B ′ to C ′ to D′ are both
clockwise or both counterclockwise). Prove that the midpoints of the segments
AA′ , BB ′ , CC ′ , and DD′ are the vertices of a square.
Chapter 2

Vectors in two and three


dimensions

2.1 Vectors in dimensions two and three


2.1.1 The mathematical concept of a vector was originally introduced to represent
quantities that have both magnitude and direction. Velocity and force are
two examples of such quantities in physics. In this chapter we will consider
vectors in the plane and in space. The notions we encounter here will be
generalized to abstract vector spaces in later chapters.
The crucial point with vectors is that they can be added and multiplied by
numbers. This opens the way to applying algebra in the plane and in space.
In this chapter this arithmetic of vectors is discussed, the use of vectors in
describing lines and planes and their relative positions (including distances,
angles, and the cross product). Also, the use of vectors in geometric problems
is briefly discussed. Like complex numbers, vectors provide a tool for dealing
with geometry.

2.1.2 Vectors
A vector corresponds with an arrow in the plane or in space, and is de-
termined by its direction and its magnitude (length). Therefore, an arrow,
translated parallel to itself to any point in space but with the same direction
and magnitude, is considered to represent the same vector. Such translated
arrows are called equivalent, i.e., they represent the same vector.¹

¹ Don't confuse this with a force vector (or any other vector-valued quantity) applied
to a physical point in space! Although the force has, as a vector, many mathematical
representations, there is physically only one force with one point of application, and we
cannot translate the force to another point without changing the physics.


Figure 2.1: On the left, representations of the same vector are drawn: direc-
tion and length of each arrow are the same, but the heads and tails differ. On
the right, different vectors are drawn with the same starting point, namely a
chosen origin O in the plane.

In the sequel, we will usually choose an origin in the plane or space,


and assume our vectors start there. Sometimes we will deviate from this
convention. This will hardly lead to any confusion.
In these notes, vectors will be denoted by an underlined letter as follows:

v.

In the literature other notations are used, such as v, ~v or v̄.

2.1.3 The zero vector


There is one special vector which has no direction, and whose length equals
0. This vector is called the zero vector and is denoted by 0.

2.1.4 Scalar multiplication


Let v be a vector, and λ a real number. Then λv denotes the scalar product
of λ and v, and is defined as the vector that points in the same (if λ > 0)
or opposite (if λ < 0) direction as v, and whose length equals |λ| times the
length of v. For λ = 0 we define λv = 0, the zero vector. We also call
λv a scalar multiple or multiple of v. The real number λ is called a scalar.
Scalars are often denoted by Greek letters, but this is only a tradition and
not necessary. Sometimes a multiplication symbol is used for clarity as in
3 · v.

Furthermore, we write v for 1 v, −v for (−1)v, −3v for (−3)v, etc. The
vector −v is called the opposite of v.


Figure 2.2: Scalar multiplication.

For any vector v and scalars λ and µ, we have:

• 0 · v = 0;

• λ(µv) = (λµ)v.
In words: if the vector v is first multiplied by µ and the resulting vector
is multiplied by λ, then the result equals the scalar product of the scalar
λµ and the vector v.

2.1.5 The addition of vectors


If u and v are two vectors starting from the same point (by translating this
can always be arranged), then their sum u + v is by definition the vector
which starts from that same point and ends at the point equal to the fourth
point of the parallelogram spanned by u and v. The sum u en v can also be
obtained by joining the starting point (or tail) of v to the endpoint (or head)
of u, or vice versa. If the vectors have the same or opposite directions, then
only this second construction works. Note furthermore that u + 0 = u for
every vector u.
Here are the arithmetic rules and some remarks regarding this addition
(no proofs):

• Associativity of the addition:

(u + v) + w = u + (v + w)


Figure 2.3: Addition of vectors: on the left the construction using a paral-
lelogram, on the right the head-to-tail construction, joining the tail of the
second vector to the head of the first one.

for all vectors u, v, w. Note that the addition is only defined for
two vectors, and not for three or more. So if you want to add three
vectors, you will have to split the problem in various additions of two
vectors. For instance, you could add the first and second vector, and
then add the result to the third vector, so this corresponds to (u+v)+w.
Associativity tells you that it doesn’t matter in which way you split the
problem, the answer is always the same. That’s the (justified) reason
we often leave out the brackets (we sometimes put in brackets to clarify
calculations for the reader). So we often simply write v 1 + v 2 + v 3 + v 4
for the addition of four vectors instead of, say,

v 1 + ((v 2 + v 3 ) + v 4 ).

• Commutativity of the addition:

v+w =w+v

for all vectors v and w. This is obvious from the parallelogram con-
struction. It implies that you can change whenever needed the order
of the vectors in additions. For instance, u + v + w = w + u + v. Here
is how this specific equality follows from commutativity:

u + v + w = u + w + v = w + u + v.

From now on, you don’t have to supply such proofs any time you use
commutativity, unless a proof is explicitly asked for.
• Instead of v + −w we usually write v − w (subtraction of vectors).

Here are the arithmetic rules that involve both addition and scalar multipli-
cation.
• Distributivity of the scalar multiplication over addition:
λ(v + w) = λv + λw
for all vectors v, w and for all scalars λ.
• Distributivity of the scalar addition over the scalar multiplication:
(λ + µ)v = λv + µv
for all scalars λ, µ and all vectors v.
The sum of any vector v and its opposite −v always yields the zero vector:
v − v = 0.
2.1.6 Linear combinations
If v 1 , v 2 , . . . , v n are n vectors and λ1 , λ2 , . . . , λn are n real numbers (scalars),
then the vector
λ1 v 1 + λ2 v 2 + · · · + λn v n
is called a linear combination of the vectors v 1 , v 2 , . . . , v n . Linear combina-
tions are the vectors we can build out of a given set of vectors using addition
and scalar multiplication.
So 2u − 3v + 2w is a linear combination of u, v, w.

2.1.7 Examples. The following examples show that computations with vectors
involving addition and scalar multiplication only are fairly easy. Note that
we cannot multiply two (or more) vectors (but see §2.5).
a) 3v − w + 2v + 3w = 5v + 2w. Here are the detailed steps, using the
various arithmetic rules. By commutativity
3v − w + 2v + 3w = 3v + 2v − w + 3w.
Next, distributivity and the fact that −w = (−1)w imply
3v + 2v − w + 3w = (3 + 2)v + (−1 + 3)w = 5v + 2w.
Note that because of associativity we didn’t place brackets. Otherwise,
the first step of the computation would have looked like:
(3v+(−w+2v))+3w = (3v+(2v−w))+3w = ((3v+2v)−w)+3w = . . .

b) The opposite of λ v is −λv. Here, λ is an arbitrary scalar. A proof


could run as follows: λv + (−λ)v = (λ − λ)v = 0 · v = 0.
c) Do you see why v + ½(w − v) = ½(v + w)? This equality provides two
ways to consider the midpoint of a segment. Do you see which ones?
d) By using the various arithmetic rules, you find:
(−u + 2v + 3w) + (2u − v + w) = u + v + 4w.

2.2 Vector descriptions of lines and planes


2.2.1 By choosing an origin O, every point P corresponds uniquely with the vec-
tor p that starts in O and ends in P . On the other hand, every vector p
determines uniquely the point P given by its head, if we let p start in O. In
this way we have a correspondence between points and vectors. Sometimes,
when there is no risk of misinterpretation, we will make no explicit distinction
between a point and the corresponding vector. From here on we will assume
that an origin O is defined. This point corresponds with the zero vector 0.

2.2.2 Lines
The scalar multiples x = λv of a vector v, with v ≠ 0, run through all points
(vectors) of a straight line ℓ through the origin.


Figure 2.4: A parametric representation of a straight line with supporting


vector a and direction vector v. Each vector on the line can be obtained by
adding an appropriate scalar multiple of v to a.

If a is a second vector, then for any λ,


x = a + λv

is a point on a straight line m through a and parallel with ℓ. We call


ℓ : x = λv and m : x = a + λv
parametric representations (or vector representations) of the lines ℓ and m,
respectively. The vector v is a so-called direction vector . The vector a is a
so-called supporting vector (or position vector) of the line m (we may call 0
a supporting vector of the line ℓ). The scalar λ is called a parameter. Of
course, instead of λ any other letter can be used.
Summarizing: for a parameter or vector representation of a straight line
we need a vector (with its endpoint) on the line and a direction vector. Note
that both vectors are not unique. Every non-zero scalar multiple µv of the
direction vector v is also a possible direction vector. Any vector b (with
endpoint) on the line m can be used as supporting vector.

2.2.3 Example. (Supporting and direction vectors of a line are not unique)
The vector p + v is on the line ℓ with parametric description x = p + λv:
just take λ = 1. This vector p + v may serve as a supporting vector of ℓ,
since the vectors p + v + µv run through the same vectors for varying µ as
the vectors p + λv (for varying λ). This follows easily from the equalities
p + v + µv = p + (1 + µ)v and p + λv = p + v + (λ − 1)v. In fact, every vector
on ℓ may serve as a supporting vector.
Similarly, 2v, −3v, π v are direction vectors of ℓ. For instance, the vectors
p + µ(2v) run through the vectors of ℓ for varying µ.

2.2.4 Planes
Planes in space can also be represented in terms of a vector or parametric
representation. For this we need one vector whose endpoint is in the plane (a
supporting vector) and two direction vectors which are not scalar multiples of
each other. Since we use two direction vectors, we also need two parameters.
The plane U through the origin and with direction vectors u and v has
the following parametric description:
U : x = λu + µv.
The plane V with supporting vector a and direction vectors u and v has the
following parametric representation:
V : x = a + λu + µv.
Just as with lines, neither the supporting vectors nor the direction vectors
are uniquely determined.


Figure 2.5: On the left a plane through the origin. On the right a plane
through a with direction vectors u and v.

2.2.5 Example. (Supporting and direction vectors of planes are not unique)
The plane V with parametric equation V : x = a + λu + µv can, for instance,
also be described in the following way:

x = a + ρ(u + v) + σ(u − v).

To see this we have to verify that every vector of the form a + λu + µv can
also be written in the form a + ρ(u + v) + σ(u − v), and vice versa. The
following two equalities show this:
a + λu + µv = a + ½(λ + µ)(u + v) + ½(λ − µ)(u − v)
a + ρ(u + v) + σ(u − v) = a + (ρ + σ)u + (ρ − σ)v.
In fact, one can prove in a similar way that any two linear combinations of u
and v that are not multiples of one another, may serve as direction vectors.
For example, the pair 2u + 3v, 2u − 5v is such a couple.
As with lines, any vector on V can serve as supporting vector of V . For
example, a + 3u + 5v is such a vector.

2.3 Bases, coordinates, and equations


2.3.1 To be able to do concrete computations, it is useful to describe vectors using
numbers (in their role as coordinates). For this purpose we need the notions
basis and coordinates.

2.3.2 Basis

• The plane
In the plane we need two vectors which are not multiples of each other,

Figure 2.6: Using the basis e1 , e2 any vector in the plane can be described
with the use of two coordinates.

for instance e1 and e2 . Every vector v in the plane can be expressed in


a unique way as a linear combination of the vectors e1 , e2 ,

v = v1 e 1 + v2 e 2 ,

for some scalars v1 and v2 , uniquely determined by v.

• 3-dimensional space
In space we choose three vectors e1 , e2 , e3 that are not coplanar (i.e.,
whose endpoints do not lie in one plane with the origin). Then any
vector x can be written as a linear combination of these three vectors:

v = v1 e 1 + v2 e 2 + v3 e 3 ,

for unique scalars v1 , v2 and v3 (v1 e1 is a kind of ‘projection’ of v onto


the line x = λe1 , so that v1 is determined, etc.).
The vectors e1 , e2 , e3 are said to form a basis of space and the
numbers v1 , v2 , v3 are called the coordinates of the vector v with respect
to this basis. If the vectors e1 , e2 , e3 are mutually perpendicular and
have length 1, then the basis is called an orthonormal basis.

2.3.3 The vector spaces R2 and R3


Via coordinates every vector v in space corresponds to a unique triple of
coordinates, v1 , v2 , v3 , say. The triple is usually denoted as an element of

R3 . Such a triple is called a coordinate vector . The addition and scalar


multiplication translate as follows into coordinates:

v + w ↔ (v1 + w1 , v2 + w2 , v3 + w3 )
λv ↔ (λv1 , λv2 , λv3 )

(where w corresponds to (w1 , w2 , w3 )). With the choice of a basis, we have


coordinatized space with the set R3 . Note that the coordinates depend on
the specific basis (and position of the origin). In this coordinate space we can
add coordinate vectors and multiply them by scalars. In Chapter 4 we will
see that space and its coordinate space are special cases of a vector space.
In a similar way, R2 provides coordinates for vectors in the plane. For us,
the main role of the coordinate plane R2 and the coordinate space R3 is to be able
to translate vector problems into problems with numbers. The zero vector
in the plane and in space, respectively, correspond to (0, 0) and (0, 0, 0),
respectively.
In practice we often ‘identify’ the coordinate space (plane) with the space
(plane) of vectors itself. We then speak of the line in R2 or R3 , vectors in R2
or R3 , the plane R2 , the space R3 , a parametric equation of a line in R2 , etc.
We will write a = (a1 , a2 , a3 ) if (a1 , a2 , a3 ) is the coordinate vector of a, even
though this is strictly speaking not correct.

2.3.4 Describing lines in the plane with coordinates


Suppose ℓ : x = a + λv is a (vector parametric equation of a) line in the
plane, and suppose e1 , e2 is a basis of the plane. Let (a1 , a2 ) and (v1 , v2 ) be
the coordinate vectors of the supporting vector a and the direction vector v,
then, in terms of coordinates, we find the following parametric equation for
ℓ:
ℓ : (x1 , x2 ) = (a1 , a2 ) + λ(v1 , v2 ).
Sometimes it is useful to use column notation:

    ( x1 )   ( a1 )       ( v1 )
    ( x2 ) = ( a2 )  + λ  ( v2 ) .

Still another way is to write x1 = a1 + λv1 and x2 = a2 + λv2 .


By eliminating λ from these two relations, we obtain an equation for ℓ.
Multiply both sides of x1 = a1 + λv1 by v2 and both sides of x2 = a2 + λv2
by v1 and subtract:
v2 x1 − v1 x2 = v2 a1 − v1 a2 .

This yields a linear equation in the variables x1 and x2 . Lines do not have
unique equations. For instance, the equations x1 + 2x2 = 3 and 2x1 + 4x2 = 6
obviously describe the same line. In fact, multiplying both sides of an equation
by the same nonzero scalar doesn’t change the solution set.
Note that a vector parametric equation ℓ : x = a + λv of a line gives an
explicit description of the vectors on the line: every value of λ produces a
vector (or coordinate vector) on the line.
An equation describes the vectors on the line implicitly: you only know
which relation the coordinates of a vector need to satisfy in order to be the
coordinates of a vector on the line.
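The elimination just described is easy to automate. The following Python sketch (the
helper name line_equation is ours, not part of the notes) returns the coefficients of the
equation v2 x1 − v1 x2 = v2 a1 − v1 a2 from a supporting vector a and a direction vector v:

    def line_equation(a, v):
        # Coefficients (c1, c2, d) of the equation c1*x1 + c2*x2 = d
        # for the line x = a + lambda*v in the plane.
        a1, a2 = a
        v1, v2 = v
        return (v2, -v1, v2 * a1 - v1 * a2)

    # The line of example 2.3.7 b): through (1, 2) with direction (3, -1).
    print(line_equation((1, 2), (3, -1)))   # (-1, -3, -7), i.e. x1 + 3*x2 = 7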

2.3.5 Describing lines in space with coordinates


A parametric equation ℓ : x = a + λv with a = (a1 , a2 , a3 ) and v = (v1 , v2 , v3 )
is
ℓ : (x1 , x2 , x3 ) = (a1 , a2 , a3 ) + λ(v1 , v2 , v3 )
or, in column notation:

    ( x1 )   ( a1 )       ( v1 )
    ( x2 ) = ( a2 )  + λ  ( v2 ) .
    ( x3 )   ( a3 )       ( v3 )

A line in space can also be described using two linear equations, because a
line can be seen as the intersection of two planes and every plane can be
described by a linear equation (extensive details on this follow in Chapter
4). For instance, the system x1 + x2 + x3 = 1, 2x1 − x3 = 0 describes the line
x = (0, 1, 0) + λ(1, −3, 2) (by substitution you can verify that every vector
satisfies both equations). A way to find this parametric equation from the
two linear equations is to choose x1 as parameter, call it λ, and then deduce
that x3 = 2λ and x2 = 1 − x1 − x3 = 1 − 3λ. The computational techniques
behind this will be discussed in Chapter 3.
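As a sanity check, one can verify numerically that the parametric line found above
satisfies both equations for arbitrary parameter values. A small Python sketch (the
sampling is our own illustration):

    import random

    for _ in range(5):
        lam = random.uniform(-10, 10)
        # point (0, 1, 0) + lam * (1, -3, 2)
        x1, x2, x3 = 0 + lam * 1, 1 + lam * (-3), 0 + lam * 2
        assert abs(x1 + x2 + x3 - 1) < 1e-9    # first equation:  x1 + x2 + x3 = 1
        assert abs(2 * x1 - x3) < 1e-9         # second equation: 2*x1 - x3 = 0
    print("every sampled point lies on both planes")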

2.3.6 Describing planes in space in terms of coordinates


A parametric description V : x = a + λu + µv of a plane can be written out
in coordinates in various ways.

• Parametric description in ‘row notation’:

(x1 , x2 , x3 ) = (a1 , a2 , a3 ) + λ(u1 , u2 , u3 ) + µ(v1 , v2 , v3 ).



• Parametric description in ‘column notation’:

    ( x1 )   ( a1 )       ( u1 )       ( v1 )
    ( x2 ) = ( a2 )  + λ  ( u2 )  + µ  ( v2 ) .
    ( x3 )   ( a3 )       ( u3 )       ( v3 )

• Or simply each coordinate separately:

x1 = a1 + λu1 + µv1
x2 = a2 + λu2 + µv2
x3 = a3 + λu3 + µv3 .

Upon eliminating the parameters λ and µ an equation of the plane appears,


a linear equation in x1 , x2 , x3 ,

d 1 x1 + d 2 x2 + d 3 x3 = d 4 ,

for certain d1 , d2 , d3 , d4 . At least one of the coefficients d1 , d2 , d3 should be


nonzero.

2.3.7 Examples. a) x = (1, 2) + λ(3, −1) and x = (1, 2) + µ(−6, 2) describe


the same line. Why?

b) To determine an equation of the line ℓ : x = (1, 2) + λ(3, −1), we start


with x1 = 1 + 3λ and x2 = 2 − λ. Now multiply the 2nd equation by 3
and add the result to the 1st equation:

x1 + 3x2 = 1 + 3λ + 3(2 − λ) = 7.

So an equation of the line ℓ is x1 + 3x2 = 7.

c) Suppose 2x1 −x2 +3x3 = 4 is the equation of the plane V . To determine


a parametric description we proceed as follows. If you assign any value,
say λ, to x2 and any value, say µ, to x3 , then x1 is determined: x1 =
2 + λ/2 − 3µ/2. So

x1 = 2 + λ/2 − 3µ/2, x2 = λ, x3 = µ.

In vector notation:

(x1 , x2 , x3 ) = (2+λ/2−3µ/2, λ, µ) = (2, 0, 0)+λ(1/2, 1, 0)+µ(−3/2, 0, 1).



Then a vector parametric description is


V : x = (2, 0, 0) + λ(1/2, 1, 0) + µ(−3/2, 0, 1).
To avoid fractions, you could also take

V : x = (2, 0, 0) + ρ(1, 2, 0) + σ(−3, 0, 2).

Do you see why?

d) To find an equation of the plane V with vector parametric equation


x = (2, 0, 0) + λ(1, 1, 0) + µ(0, 2, 1), we eliminate λ and µ from the
three expressions x1 = 2 + λ, x2 = λ + 2µ and x3 = µ, for instance as
follows (more systematic methods will be discussed in later chapters):

– Since x3 = µ we can replace µ in x2 = λ + 2µ by x3 : x2 = λ + 2x3 .


– Now subtract x2 = λ + 2x3 from x1 = 2 + λ: x1 − x2 = 2 − 2x3 .
So an equation is
x1 − x2 + 2x3 = 2.

2.4 Distances, Angles and the Inner Product


2.4.1 Computations involving the length of a vector, the distance and angle be-
tween two vectors (the definitions are given below) become easier with the
notion of inner product. This section is devoted to a discussion of these
notions.
We start with the plane or with the 3-dimensional space and introduce
an origin. The length of a vector x is the distance between head and tail of
(any representative arrow of) the vector. The length of x is denoted by ‖x‖.
The distance between the two vectors u and v is by definition the length of
the difference u − v (or v − u), so ‖u − v‖.

2.4.2 The inner product


The inner product of two vectors u and v, both ≠ 0, is defined as

‖u‖ · ‖v‖ · cos ϕ,

where ϕ is the angle between the vectors u and v (note the role of the cosine:
the sign of the angle doesn’t matter). If one (or both) of the vectors is the

zero vector, then the inner product is, by definition, 0. We denote the inner
product by
(u, v).
Here is an example. Suppose the vectors u, v have length 4 and the angle

Figure 2.7: If the angle between the vectors u and v is at most π/2, then
their inner product equals the product of the length of u and the length of the
projection of v on the line x = λu.

between them is 60◦ (or π/3 radians), then


(u, v) = 4 · 4 · cos 60◦ = 4 · 4 · ½ = 8.
If the angle is 120◦ , then the inner product changes into −8. In particular,
the inner product can be negative. In the literature other notations occur,
like u • v (the inner product is sometimes called the dot product). Here are
some remarks and properties (we do not always treat the case of zero vectors
separately).
• If v = u, both ≠ 0, then the angle is 0, so the cosine is 1 and we obtain
(u, u) = ‖u‖². Or: ‖u‖ = √(u, u).
This relation also holds in case u = 0. Note that (u, u) ≥ 0 for every
vector u, and that (u, u) = 0 occurs precisely if u = 0.
• If the inner product of two nonzero vectors is known as well as their
lengths, then (the cosine of) the angle between the vectors can be
computed:
cos ϕ = (u, v) / (‖u‖ · ‖v‖).

Once we work with coordinates this is often useful.

• Symmetry of the inner product:


(u, v) = (v, u) for all vectors u and v. This follows immediately from
the definition, since

‖u‖ · ‖v‖ · cos ϕ = ‖v‖ · ‖u‖ · cos ϕ.

(The angle between the vectors is the same in both cases.)

• Behaviour with respect to scalar multiplication:


λ(u, v) = (λu, v) = (u, λv) for all vectors u, v and for every scalar λ.
Fill in the details yourself (distinguish the cases λ > 0, λ < 0 and
λ = 0).

• Behaviour with respect to vector addition:

(u + v, w) = (u, w) + (v, w),


(u, v + w) = (u, v) + (u, w)

for all vectors u, v and w.

• Orthogonality:
If two non-zero vectors have inner product 0, then they are perpen-
dicular (the angle between them is ±90◦ or ±π/2) since the cosine of
the angle between them is 0. Conversely, if two non-zero vectors are
perpendicular, then their inner product is 0. Now the zero vector has
inner product 0 with any vector, and we agree to say that the zero
vector is perpendicular to any vector. This is a convenient convention
since then we have: The inner product of two vectors is 0 if and only
if the two vectors are perpendicular.

2.4.3 Examples. Although lengths and angles are maybe what we are really in-
terested in, the inner product is so useful because of the arithmetic rules it
satisfies. For instance, ‖u + v‖ usually differs from ‖u‖ + ‖v‖, but
(u + v, u + v) is easy to expand using the rules. Often it is therefore useful
to translate problems involving lengths and angles into problems with in-
ner products. Here are some examples demonstrating the use of the inner
product’s properties.

a) Suppose that (u, v) = 2. Using the arithmetic rules the inner product
(3 u, −4 v) is computed as follows:

(3 u, −4 v) = 3(u, −4 v) = 3 · −4 (u, v) = −12 · 2 = −24.

b) If ‖u‖ = 2, ‖v‖ = 3 and (u, v) = 1, then, using the arithmetic rules


again, you can determine, for instance, (u + v, u − 2v). Here is the first
step of the computation:

(u + v, u − 2v) = (u, u − 2v) + (v, u − 2v).

Next, we turn to the first term on the right-hand side, (u, u − 2v):

(u, u − 2v) = (u, u) + (u, −2v) = (u, u) − 2(u, v).

Now (u, u) = ‖u‖² = 4 and (u, v) = 1, so we find (u, u − 2v) = 4 − 2 = 2.


In a similar way we deal with the term (v, u − 2v):

(v, u − 2v) = (v, u) + (v, −2v) = (v, u) − 2(v, v) = 1 − 2 · 9 = −17.

So we find (u + v, u − 2v) = 2 − 17 = −15.

2.4.4 The inner product and coordinates


Let e1 , e2 be a basis of the plane consisting of two perpendicular vectors of
length 1 (an orthonormal basis). In terms of coordinates we find an easy to
memorize expression (in the case the basis is not orthonormal the expressions
become complicated, we will not discuss that situation). Now suppose v =
v1 e1 + v2 e2 and w = w1 e1 + w2 e2 are two vectors in the plane, then we
find, using the properties of the inner product and using that (e1 , e1 ) = 1,
(e1 , e2 ) = 0, (e2 , e1 ) = 0, (e2 , e2 ) = 1:
(v, w) = (v1 e1 + v2 e2 , w1 e1 + w2 e2 )
= v1 w1 (e1 , e1 ) + v1 w2 (e1 , e2 ) + v2 w1 (e2 , e1 ) + v2 w2 (e2 , e2 )
= v1 w 1 + v2 w 2 .
So:
(v, w) = v1 w1 + v2 w2 .
In particular, we obtain an easy (and well-known) expression for the length
of a vector v = v1 e1 + v2 e2 :
‖v‖ = √(v1² + v2²).

In terms of coordinates, the distance between u = (u1 , u2 ) and v = (v1 , v2 )


equals

‖u − v‖ = √((u1 − v1)² + (u2 − v2)²).

The cosine of the angle ϕ between the vectors (both ≠ 0) v = v1 e1 + v2 e2
and w = w1 e1 + w2 e2 is equal to

cos ϕ = (v, w) / (‖v‖ · ‖w‖) = (v1 w1 + v2 w2) / (√(v1² + v2²) · √(w1² + w2²)).
In a similar way, using an orthonormal basis e1 , e2 , e3 in space (so vectors
of length 1 and mutually perpendicular), we have the following expression in
coordinates for the inner product of the vectors v = v1 e1 + v2 e2 + v3 e3 and
w = w1 e1 + w2 e2 + w3 e3 :

v1 w 1 + v2 w 2 + v3 w 3 .

The coordinate expression for the length of the vector v is


‖v‖ = √(v1² + v2² + v3²).

The distance between u and v equals


‖u − v‖ = √((u1 − v1)² + (u2 − v2)² + (u3 − v3)²).

Finally, the cosine of the angle between the vectors v and w (both ≠ 0) equals

cos ϕ = (v, w) / (‖v‖ · ‖w‖) = (v1 w1 + v2 w2 + v3 w3) / (√(v1² + v2² + v3²) · √(w1² + w2² + w3²)).
2.4.5 R2 , R3 and the standard inner product
Motivated by the previous discussion, we introduce the so-called standard
inner product in R2 and R3 , viewed as vector spaces themselves (more on
this in later chapters). A vector in R2 is a pair of real numbers like (a1 , a2 ).
The standard inner product of two vectors a = (a1 , a2 ) and b = (b1 , b2 ) in R2
is defined as
(a, b) := a1 b1 + a2 b2 .
Similarly, the standard inner product of two vectors a = (a1 , a2 , a3 ) and b =
(b1 , b2 , b3 ) in R3 is defined as

(a, b) := a1 b1 + a2 b2 + a3 b3 .

2.4.6 Example. The angle ϕ between the vectors u = (1, 0) and v = (1, 1) in R2
can be determined as follows.
cos ϕ = (u, v) / (‖u‖ · ‖v‖) = (1 · 1 + 0 · 1) / (√(1² + 0²) · √(1² + 1²)) = 1/√2 = ½√2.

So the angle is π/4 (or 45◦ ).
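The coordinate formulas of 2.4.4–2.4.5 translate directly into code. A Python sketch
(the helper names inner, length and angle are ours) that reproduces the example above:

    import math

    def inner(u, v):
        # Standard inner product of two coordinate vectors of equal dimension.
        return sum(ui * vi for ui, vi in zip(u, v))

    def length(u):
        return math.sqrt(inner(u, u))

    def angle(u, v):
        # Angle between two nonzero vectors, in radians.
        return math.acos(inner(u, v) / (length(u) * length(v)))

    print(math.degrees(angle((1, 0), (1, 1))))   # approximately 45 degrees (pi/4)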

2.4.7 Normal vectors


If u = (u1 , u2 , u3 ) and v = (v1 , v2 , v3 ) are two vectors in the plane V with
equation 2x1 − x2 + 3x3 = 6, then 2u1 − u2 + 3u3 = 6 and 2v1 − v2 + 3v3 = 6.
Subtracting yields

2(u1 − v1 ) − (u2 − v2 ) + 3(u3 − v3 ) = 0.

We can rephrase this equality as an inner product:

((2, −1, 3), (u1 − v1 , u2 − v2 , u3 − v3 )) = 0.

This means that the difference u − v is perpendicular to (2, −1, 3). In par-
ticular, (2, −1, 3) is a vector which is perpendicular to all direction vectors
of the plane. We call (2, −1, 3) a normal vector of the plane.
In general, if a1 x1 + a2 x2 + a3 x3 = d is an equation of the plane V , then
we can rewrite it in the form of an inner product:

(a, x) = d,

where a = (a1 , a2 , a3 ) and x = (x1 , x2 , x3 ). If u and v are two vectors in the


plane, then (a, u) = d and (a, v) = d. Subtracting yields (a, u) − (a, v) = 0,
so that, using the properties of the inner product:

(a, u − v) = 0.

In other words, u − v is perpendicular to a. In particular, direction vectors


of the plane V are all perpendicular to a. The vector a is called a normal
vector of the plane.
The situation for lines in the plane is similar. If a1 x1 + a2 x2 = d is an
equation of a line, then (a1 , a2 ) is a normal vector of the line. This vector is
perpendicular to every direction vector of the line.

2.4.8 Pythagoras
If u and v are perpendicular vectors, then we find for the square of the length
of the sum vector u + v:

‖u + v‖² = (u + v, u + v)
         = (u, u) + 2(u, v) + (v, v)
         = (u, u) + (v, v)
         = ‖u‖² + ‖v‖².

Similarly, ‖u − v‖² = ‖u‖² + ‖v‖². This is a vector form of the



Figure 2.8: If u and v are perpendicular, then ‖u + v‖² = ‖u‖² + ‖v‖²
and ‖u − v‖² = ‖u‖² + ‖v‖². The figure illustrates the relation with the
Pythagorean theorem.

Pythagorean theorem: the triangle with vertices the endpoints of 0, u, v is a
right triangle whose two legs have lengths ‖u‖ and ‖v‖, respectively. The
hypotenuse has length ‖u − v‖.
Give a similar geometric interpretation of the equality ‖u + v‖² = ‖u‖² + ‖v‖².

2.4.9 Example. We determine the distance between (the endpoint of) p = (1, 2)
and the line ℓ : x = (8, 1) + λ(3, −4). To this end we first determine a vector
q on ℓ such that p − q is perpendicular to ℓ, i.e., perpendicular to the direction vector (3, −4). To find
q, we solve:

((1, 2) − (8, 1) − λ(3, −4), (3, −4)) = 0 or (−7) · 3 − 9λ + 1 · (−4) − 16λ = 0.

This leads to λ = −1. So q = (5, 5). The distance between p and q is


√((5 − 1)² + (5 − 2)²) = 5. This is also the distance between p and the line ℓ:


Figure 2.9: To compute the distance between p and the line ℓ, we determine
a vector q on ℓ such that p − q is perpendicular to ℓ. If r is an arbitrary
vector on ℓ, then the right-hand figure illustrates that the distance between p
and r is greater than (or equal to) the distance between p and q because of
the Pythagorean theorem.

for every vector on ℓ, its distance to p turns out to be at least as big. Here
is why. If r is a vector on ℓ, then we should compare ‖p − r‖ and ‖p − q‖.
Since p − q is perpendicular to q − r (why?), we can apply the Pythagorean
theorem to the triangle with vertices p, q, r. In vector language: we apply
Pythagoras to the vectors u = p − q, v = q − r and their sum u + v = p − r.
We obtain:
‖p − r‖² = ‖p − q‖² + ‖q − r‖².
Evidently, ‖p − r‖ ≥ ‖p − q‖ (with equality if and only if q = r). So ‖p − q‖
is the distance between p and ℓ.
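The computation in this example can be packaged as a small routine. A Python sketch
(the helper names are ours) that finds the foot of the perpendicular and the distance:

    import math

    def inner(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    def dist_point_line(p, a, v):
        # Distance from p to the line x = a + lambda*v, via the foot of the perpendicular.
        lam = inner([pi - ai for pi, ai in zip(p, a)], v) / inner(v, v)
        q = [ai + lam * vi for ai, vi in zip(a, v)]      # foot of the perpendicular
        return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

    print(dist_point_line((1, 2), (8, 1), (3, -4)))      # 5.0, as found above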

2.5 The cross product


2.5.1 The definition of the cross product
The inner product of two vectors is a real number. There is also a construc-
tion that assigns to two vectors in space a new vector with special and useful
properties. We discuss this construction on the level of coordinates, so in R3 .
The cross product v×w of the vectors v = (v1 , v2 , v3 ) and w = (w1 , w2 , w3 )
is by definition the vector
(v2 w3 − v3 w2 , v3 w1 − v1 w3 , v1 w2 − v2 w1 ).
This looks pretty complicated, but turns out to always produce a vector
perpendicular to both v = (v1 , v2 , v3 ) and w = (w1 , w2 , w3 ). In a sense it

provides a ‘universal’ answer to the question: what is a vector perpendicular


to two given vectors in R3 ? In R2 , the analogue would be the much easier
question: what is a vector perpendicular to a given vector (a, b)? In this case
a useful answer is easy to see: (−b, a).
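The definition transcribes directly into code. A Python sketch (the function name cross
is ours), together with a check of the perpendicularity property listed below:

    def cross(v, w):
        # Cross product of two vectors in R^3, straight from the definition.
        v1, v2, v3 = v
        w1, w2, w3 = w
        return (v2 * w3 - v3 * w2, v3 * w1 - v1 * w3, v1 * w2 - v2 * w1)

    def inner(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    v, w = (1, 2, 1), (3, 1, 0)
    n = cross(v, w)
    print(n)                           # (-1, 3, -5)
    print(inner(n, v), inner(n, w))    # 0 0: perpendicular to both v and w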
The cross product has more properties, like the following: its length
equals ‖v‖ · ‖w‖ · sin ϕ, where ϕ is the angle (0 ≤ ϕ ≤ π) between
v and w. This length equals the surface area of the parallelogram spanned
by v and w.
Here is a list with properties of the cross product. They can all be verified
by expanding the relevant expressions, although d) requires some thought,
see below. There are more properties but they are beyond the scope of these
lecture notes. For all v, w, etc., we have:
a) v × v = 0.
b) The cross product of v and w is perpendicular to both v and w, i.e.,
the corresponding inner products are 0:
(v × w, v) = 0 and (v × w, w) = 0.
This property can be used to determine, for instance, a normal vector
to a plane given its direction vectors.
c) The cross product is antisymmetric:
v × w = −(w × v).


Figure 2.10: The length of the cross product of v and w is the area of the
parallelogram spanned by v and w.

d) The length of the cross product equals


‖v × w‖ = ‖v‖ · ‖w‖ · sin ϕ,

where ϕ is the angle between v and w. This is precisely the surface area
of the parallelogram spanned by v and w (‘base times height’, where
the base has length ‖v‖ and the height equals ‖w‖ · sin ϕ).

e) The connection with the vector addition is as follows:

u × (v + w) = u × v + u × w en (v + w) × u = v × u + w × u.

f) The connection with scalar multiplication is as follows:

λ(v × w) = (λv) × w = v × (λw).

The properties b) and d) almost determine the cross product, but not quite:
the cross product could still point in two different directions perpendicular
to the plane spanned by v and w. Which direction to choose is based on
the right hand rule if you put your right hand along v in such a way that
your fingers curl from v to w (so either your litte finger or your index finger
touches v), then your thumb points in the direction of v × w.

2.5.2 On the proof of property d)


Property d) on the length of the cross product is best approached by some
subtle manoeuvring to deal with the factor sin ϕ. Here are the various steps.
Firstly, instead of proving that k v × w k=k v k · k w k · sin ϕ, we show that
k v × w k2 equals
k v k2 · k w k2 · sin2 ϕ.
Secondly, replace sin2 ϕ by 1−cos2 ϕ and use that (v, w) =k v k · k w k · cos ϕ
to replace k v k2 · k w k2 · sin2 ϕ by

k v k2 · k w k2 ·(1 − cos2 ϕ) =k v k2 · k w k2 −(v, w)2 .

So now we are reduced to showing that k v × w k2 equals k v k2 · k w k2


−(v, w)2 , i.e., (v2 w3 − v3 w2 )2 + (v3 w1 − v1 w3 )2 + (v1 w2 − v2 w1 )2 equals

(v12 + v22 + v32 )(w12 + w22 + w32 ) − (v1 w1 + v2 w2 + v3 w3 )2 .

This verification is straightforward and left to the reader.
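The identity can at least be spot-checked numerically before (or after) doing the algebra.
A Python sketch with random vectors (entirely our own illustration):

    import random

    def cross(v, w):
        v1, v2, v3 = v
        w1, w2, w3 = w
        return (v2 * w3 - v3 * w2, v3 * w1 - v1 * w3, v1 * w2 - v2 * w1)

    def inner(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    for _ in range(1000):
        v = [random.uniform(-5, 5) for _ in range(3)]
        w = [random.uniform(-5, 5) for _ in range(3)]
        lhs = inner(cross(v, w), cross(v, w))                  # ||v x w||^2
        rhs = inner(v, v) * inner(w, w) - inner(v, w) ** 2     # ||v||^2 ||w||^2 - (v,w)^2
        assert abs(lhs - rhs) < 1e-8
    print("the identity holds for all sampled v and w")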

2.5.3 The volume of a parallelepiped


The volume of the parallelepiped P ‘spanned’ by the vectors a, b, c can be


Figure 2.11: The volume of the parallelepiped equals the absolute value of
(a × b, c).

expressed using the inner product and cross product. To obtain this expres-
sion, note that the volume is the product of the area of the parallelogram
spanned by a and b and the height. The area of the parallelogram is ‖a × b‖,
as we saw before. Since a × b is perpendicular to the parallelogram, the height
equals the (length of the) projection of c on a × b, i.e., the absolute value of
‖c‖ · cos ϕ, where ϕ is the angle between c and a × b. So, the volume of the
parallelepiped is

‖a × b‖ · ‖c‖ · |cos ϕ| = |(a × b, c)|.

In conclusion, the volume of the parallelepiped spanned by a, b, c is

|(a × b, c)|.

2.5.4 Examples. a) A normal vector of the plane V with parametric description
x = (1, 2, 3) + λ(1, 2, 1) + µ(3, 1, 0) is

(1, 2, 1) × (3, 1, 0) = (−1, 3, −5).

An equation of the plane is therefore −x1 + 3x2 − 5x3 = d for some d.
Substituting (1, 2, 3), we find that d = −10. An equation is therefore
−x1 + 3x2 − 5x3 = −10.

b) The surface area of the triangle with vertices (0, 0, 0), (1, 2, 1) and
(2, −1, 3) equals

½ ‖(1, 2, 1) × (2, −1, 3)‖ = ½ ‖(7, −1, −5)‖ = ½ √75.
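Both computations are easy to reproduce with the cross-product helper from the
previous sketches (again a Python illustration of our own):

    import math

    def cross(v, w):
        v1, v2, v3 = v
        w1, w2, w3 = w
        return (v2 * w3 - v3 * w2, v3 * w1 - v1 * w3, v1 * w2 - v2 * w1)

    def inner(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    # a) normal vector and equation of x = (1,2,3) + lambda*(1,2,1) + mu*(3,1,0)
    n = cross((1, 2, 1), (3, 1, 0))
    print(n, inner(n, (1, 2, 3)))          # (-1, 3, -5) -10, so -x1 + 3x2 - 5x3 = -10

    # b) area of the triangle with vertices (0,0,0), (1,2,1), (2,-1,3)
    c = cross((1, 2, 1), (2, -1, 3))
    print(0.5 * math.sqrt(inner(c, c)))    # 0.5*sqrt(75), approximately 4.33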

2.6 Vectors and geometry


2.6.1 Vectors can be a useful tool in addressing geometric problems. Below we
describe how to translate a geometric situation into the language of vectors
and how to solve (some) geometric problems using our vector techniques.
Not all geometric problems lend themselves to such an approach, but vector
techniques, just like complex numbers, do provide additional tools for dealing
with geometric problems.

2.6.2 The medians in a triangle


A well-known theorem in geometry states that the three medians in a triangle
are concurrent, i.e., have a point in common, the centroid. The medians
(segments or lines as the situation requires) join the vertices of a triangle to
the midpoints of the opposite sides.


Figure 2.12: The three medians in △ABC are concurrent. The vector de-
scription of the midpoints of the sides is given.

For triangle △ABC we denote the vectors corresponding to the vertices as
follows: a, b, c. The midpoint of side BC corresponds to the vector ½(b + c).
A parametric description of the median through A is then

x = a + λ (½(b + c) − a).

The other two medians have the following descriptions:

x = b + µ (½(a + c) − b),
x = c + ρ (½(a + b) − c).

The question whether the three medians have a point in common comes down
to the question whether the parameters λ, µ and ρ can be chosen in such a
way that the three parametric descriptions describe the same vector. The
answer is ‘yes.’ Indeed, for λ, µ, ρ all equal to 2/3 we obtain the common
vector ⅓(a + b + c). This gives a vector description of the centroid of a triangle.
Note that it looks like an ‘average’ of the three vectors corresponding to the
vertices.
Since we need the parameter value 2/3, the vector approach also shows
that the medians, now viewed as segments, divide one another in the ratio
2 : 1.
Note that the common value for λ, µ, ρ can also be computed. Try finding
that value by rewriting the vector equation

a + λ (½(b + c) − a) = b + µ (½(a + c) − b)

in the form (2−2λ−µ)a+(λ−2+2µ)b+(λ−µ)c = 0. Note that this equation


does not imply that the coefficients of a, b, c are all 0 (why?). Fortunately,
we don’t need that, we just need a solution. This is a subtle point. If you
don’t trust it, put the origin in A, say, and the computations (and subtleties)
simplify.
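A numerical illustration (a Python sketch with an arbitrarily chosen triangle; all names
are ours): the parameter value 2/3 on each median indeed produces the same point
⅓(a + b + c).

    def midpoint(p, q):
        return tuple((pi + qi) / 2 for pi, qi in zip(p, q))

    def on_median(vertex, opposite, t):
        # The point vertex + t*(opposite - vertex) on the median.
        return tuple(vi + t * (oi - vi) for vi, oi in zip(vertex, opposite))

    a, b, c = (1.0, 2.0), (4.0, -1.0), (0.0, 5.0)
    centroid = tuple((ai + bi + ci) / 3 for ai, bi, ci in zip(a, b, c))

    for vertex, opposite in ((a, midpoint(b, c)), (b, midpoint(a, c)), (c, midpoint(a, b))):
        point = on_median(vertex, opposite, 2 / 3)
        assert all(abs(pi - gi) < 1e-12 for pi, gi in zip(point, centroid))
    print("all three medians pass through", centroid)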

2.6.3 A parallelogram in a quadrangle


A second example concerns an arbitrary quadrangle ABCD in the plane in
which no two points coincide. Let E, F , G, H, respectively, be the midpoints
of (the segments) AB, BC, CD, AD, respectively. The figure suggests the
theorem: quadrangle EF GH is a parallelogram (regardless of the position of
the points A, B, C, D)!
Again, we use vectors. We have to show that EF and HG are parallel
and equal in length. In vector language this comes down to showing that
e − f = ±(h − g), where we indicate vectors corresponding to the points


Figure 2.13: The midpoints of the sides of quadrangle ABCD form a paral-
lelogram.

in the obvious way. First we express the vectors e, f , g, h in terms of the


vectors a, b, c, d:
e = ½(a + b), f = ½(b + c), g = ½(c + d), h = ½(a + d).
Then we analyze e − f and h − g:

e − f = ½(a + b) − ½(b + c) = ½(a − c)

and

h − g = ½(a + d) − ½(c + d) = ½(a − c).
This finishes the proof.

2.6.4 The altitudes in a triangle are concurrent


The altitudes of a triangle are the segments (or lines, when convenient) con-
necting a vertex of the triangle with the (unique) point on the opposite side
(extended if necessary) so that the segment (or line) is perpendicular to this
opposite side.
We use vectors and the inner product to show that the three altitudes in
a triangle are concurrent, i.e., pass through a common point, the so-called
orthocenter of the triangle.


Figure 2.14: The altitudes in △ABC are concurrent. The altitude from B is
dashed.

So, let △ABC be a triangle. Suppose the altitudes from A and C meet
in P . The vector corresponding to P is denoted by p. The fact that AP is
perpendicular to BC and CP is perpendicular to AB translates as follows:
p − a ⊥ b − c or (p − a, b − c) = 0,
p − c ⊥ a − b or (p − c, a − b) = 0.        (2.1)
In order to prove that P is on the altitude from B, we will show that p − b
and a − c are perpendicular. First we use the bilinearity of the inner product
to rewrite the expressions in (2.1):
(p, b) + (a, c) = (p, c) + (a, b)
(p, a) + (c, b) = (p, b) + (c, a)
Adding (the left-hand sides and right-hand sides, respectively, of) these equa-
tions yields
(p, a) + (c, b) = (p, c) + (a, b),
which can be rewritten as
(p − b, a − c) = 0.
So we are done.
Note that we haven’t used the freedom to choose an origin. Choosing
a convenient origin might simplify the computations. In our case, a clever
choice would be to put the origin in P . Please check yourself in what way
the computation then simplifies.
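The argument can also be checked numerically for a concrete triangle: solve the two
conditions (2.1) for p and verify the third perpendicularity. A Python sketch (the
choice of triangle and the helper names are our own):

    def inner(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    def sub(u, v):
        return tuple(ui - vi for ui, vi in zip(u, v))

    a, b, c = (0.0, 0.0), (4.0, 1.0), (1.0, 3.0)

    # (p - a, b - c) = 0 and (p - c, a - b) = 0 as two linear equations in p = (x, y).
    (d1x, d1y), (d2x, d2y) = sub(b, c), sub(a, b)
    r1, r2 = inner(a, sub(b, c)), inner(c, sub(a, b))
    det = d1x * d2y - d1y * d2x                      # nonzero for a genuine triangle
    p = ((r1 * d2y - r2 * d1y) / det, (d1x * r2 - d2x * r1) / det)

    print(p, inner(sub(p, b), sub(a, c)))            # the last number is (close to) 0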

2.6.5 The nine-point circle revisited


Vectors can also be used to analyse the nine-point circle of a triangle ABC,
i.e., the circumcircle of the midpoints of the sides of the triangle, discussed
in the previous chapter using complex numbers.

Figure 2.15: The nine points D, E, F, K, L, M, P, Q, R on the


circumcircle of triangle DEF with center N . The orthocenter H and the
center O of the circumcircle of △ABC are also drawn.

Choose the origin O in the circumcenter of △ABC. Let a, b, etc., denote


the vectors corresponding to the vertices A, B, etc. Then ‖a‖ = ‖b‖ = ‖c‖.
Denote by D, E, F the midpoints of the sides BC, AC, AB, respectively.
Then d = ½(b + c), e = ½(a + c), and f = ½(a + b).
Let N be the point corresponding to the vector n = ½(a + b + c). This is
the center of the circle through D, E and F , since the distance between N
and each of these points is ½‖a‖:

‖d − n‖ = ‖½(b + c) − ½(a + b + c)‖ = ½‖a‖,
‖e − n‖ = ‖½(a + c) − ½(a + b + c)‖ = ½‖b‖,
‖f − n‖ = ‖½(a + b) − ½(a + b + c)‖ = ½‖c‖.

The orthocenter H of △ABC turns out to correspond to the vector h =


a + b + c. To prove this, we show that h is on the three altitudes. For

instance, h − a ⊥ c − b follows from

(h − a, c − b) = (b + c, c − b) = 0.

And similarly for the two other altitudes.


Let K be the midpoint of segment AH. Then k = ½(a + a + b + c). The
distance between K and N equals

‖k − n‖ = ‖½(a + a + b + c) − ½(a + b + c)‖ = ½‖a‖.
So K is also on the circumcircle of triangle DEF . Similar computations
show that the midpoints L of BH and M of CH are on this circle.
That the circle also passes through the feet of the three altitudes is left
as an exercise.

2.6.6 In later chapters we will be able to handle rotations and reflections using
vectors.

2.7 Notes
This chapter serves as a quick and slightly informal introduction to ‘concrete’
vectors in the plane and in space. In Chapter 4 the general notion of a vector
space will be discussed. The notions and techniques discussed in this chapter (and
their extensions presented in the following chapters) are of direct use in many
branches of mathematics (algebra, analysis, statistics, optimization) and other
disciplines like physics.

2.8 Exercises
§1

1 Given arbitrary vectors u and v, draw the vectors

a. 2u + 3v,

b. u − v.

2 Use the computational rules for vectors to verify that

v 1 + ((v 2 + v 3 ) + v 4 ) = (v 2 + v 1 ) + (v 4 + v 3 ).

§2

3 Let u and v be two (distinct) vectors.

a. Why is
x = u + λ(v − u)
a vector parametric equation of the line through (the endpoints of) u
and v?

b. Which of the following vector parametric equations describes the same


line?

x = (1 − λ)u + λv, x = v + µ(u − v), x = 2v − u + ρ(u − v).

c. Is −2u + 3v on the line?

4 Suppose u, v, w are distinct vectors in space.

a. Show that
x = u + λ(v − u) + µ(w − u)
is a vector parametric equation of the plane through u, v and w (where
we assume that none of the three vectors is on the line through the
remaining two).

b. Which of the following is also a parametric equation of this plane?

x = (1 − λ − µ)u + λv + µw,
x = v + λ(v − u) + µ(w − u),
x = u + λ(w − v) + µ(w − u).

5 The line ℓ has parametric equation x = u + λ(v − u).

a. For which values of λ is x between u and v?

b. For which value of λ is x the midpoint of the segment connecting the


endpoints of u and v?

c. If x divides the segment connecting the endpoints of u and v in the


ratio 2 : 1, then what is the value of λ?

§3

6 Determine a parametric equation for each of the lines in a) and b) and for
each of the planes in c) and d).

a. The line passing through (2, 1, 5) and (5, −1, 4).

b. The line passing through (1, 2) and (2, 4).

c. The plane passing through (1, 2, 2), (0, 1, 1) and (1, 3, 2).

d. The plane containing the line x = (−2, 1, 3) + λ(1, 2, −1) and the point
(4, 0, 3).

7 Determine whether (3, 4, 0) is on the line with parametric description x =


(1, 2, 1) + λ(2, 2, −1). Are x = (3, 4, 0) + λ(2, 2, −1) and x = (1, 2, 1) +
µ(−2, −2, 1) vector parametric equations of the same line?

8 Determine an equation for each of the following lines.

a. x = (1, 3) + λ(2, −1).

b. x = (2, 2) + λ(1, −1).

c. x = (3, 4) + λ(0, 2).

9 Determine a parametric equation for each of the following lines in R2 .

a. 2x1 + 3x2 = 3.

b. 3x1 − 4x2 + 7 = 0.

c. 2x2 = 5.

10 Determine an equation for each of the following planes.

a. x = (2, 0, 1) + λ(1, 0, 2) + µ(1, −1, 0).

b. x = (1, 1, 1) + λ(1, 1, 0) + µ(0, 1, 1).



c. x = λ(4, 1, 1) + µ(0, 1, −1).

11 Determine a parametric equation for each of the following planes.


a. x1 + x2 − 3x3 = 5.

b. 2x1 + 3x2 + 5x3 = 0.

c. x2 = 5.
§4

12 Draw a vector u of length 2 in the plane. Draw all vectors in the plane having
inner product 1 with u.

13 Use the properties of the inner product to prove the following:


a. (λu, µv) = λµ(u, v) for all vectors u, v and all scalars λ and µ.

b. (u + v, u − v) = ‖u‖² − ‖v‖² for all vectors u and v.

14 a. Compute the length of the vector (−2, 2, 1).

b. Compute the distance between the vectors (1, −1, 1) and (1, −4, 5).

c. Compute the angle between the vectors (1, 1, 2) and (1, 1, −1).

d. Determine the number a so that the vector (1, −2, a) is perpendicular


to the vector (3, 1, −1).

15 In each of the following cases determine an equation of the line passing


through the given point and which is perpendicular to the given line. Deter-
mine in each case the distance between the given point and line.
a. P = (3, 2) and ℓ : x = (2, 1) + λ(1, −1).

b. P = (1, 2) and ℓ : 3x1 − 4x2 = 20.

16 The plane V has the following equation: 2x − y + 2z = 18.


a. Determine the distance between (0, 0, 0) and V .

b. The plane W : 2x − y + 2z = 24 is parallel to V . Compute the distance


between V and W .

§5

17 Use the cross product to determine a normal vector and an equation of each
of the following planes.

a. x = (1, 2, 2) + λ(1, −1, 0) + µ(0, 1, 1).

b. x = (2, 1, 0) + λ(1, 2, 0) + µ(0, 2, 3).

18 Compute using the cross product:

a. The surface area of the triangle with vertices (1, 1, 0), (2, 1, 1), (1, 3, 3).

b. The surface area of the triangle with vertices (2, 0), (5, 1), (1, 4).

c. The volume of the parallelepiped spanned by (1, 1, 1), (2, 2, 3), (1, 0, 1).

§6

19 In 2.6.2 parametric vector descriptions were given of the three medians in a


triangle. The values of the parameters corresponding to the centroid were
not computed there, but simply given. In this exercise we take a look at the
way these values can be computed.

a) Show that intersecting the medians through A and B, respectively,


leads to the equation

(2 − 2λ − µ)a + (λ − 2 + 2µ)b + (λ − µ)c = 0.

b) Equation a) is satisfied if all coefficients are equal to 0. What does this


mean for λ and µ? Show that ρ can be obtained in a similar way.

c) Does equation a) imply that all coefficients must be 0?

d) Take a look at c) under the assumption that the origin is not in the
plane of the triangle.

20 Three non-collinear points determine a triangle. Similarly, four non-coplanar


points in space determine a (not necessarily regular) tetrahedron.

a) Define, analogously to the notion of a median in a triangle, the notion


of a median in a tetrahedron ABCD.

b) Show that the four medians in a tetrahedron are concurrent, i.e., pass
through one point (the centroid), and describe the centroid in terms of
vectors.

21 The perpendicular bisector of a segment is the line through the midpoint


of the segment which is perpendicular to the segment. In this exercise we
are going to prove that the three perpendicular bisectors of a triangle are
concurrent, i.e., pass through a common point.


Figure 2.16: The perpendicular bisectors of a triangle pass through a common


point.

Let P , Q, R be the midpoints of the sides BC, AC, AB of triangle


△ABC.

a) Let S be the intersection point of the perpendicular bisectors of AC


and BC. Which inner products with the vectors s − p and s − q must
be 0?

b) Use the properties of the inner product to prove that s − r is perpen-


dicular to a − b. Conclusion?

c) Another property of a perpendicular bisector of a segment is that every


point on it has equal distances to the endpoints of the segment. Use
vector parametric equations of the perpendicular bisector of AC to
show that each point on this bisector has equal distances to A and C.

22 (The nine-point circle) In this exercise we show that the feet of the alti-
tudes of △ABC are also on the nine-point circle.

a) Use vectors to show that EF LM is a rectangle (show that LF and


EM are parallel and have equal length, that EF and M L are parallel
and have equal length, and that LF ⊥ EF ). Conclude that M F is a
diameter of the nine-point circle.

b) Why is M R ⊥ RF ? Conclude that R is on the nine-point circle (use


Thales or show that ‖r − n‖ = ½‖a‖ using (r − f , m − r) = 0).

2.8.1 Exercises from old exams


23 Let ℓ: x = (3, 1, 2) + λ(1, 3, −2), p = (6, 10, −4), and let V be the plane with
equation x + y + z = 0.

a) Show that p is on ℓ and determine the intersection of ℓ and V .

b) The perpendicular projection of the line ℓ on V , i.e., the collection of


perpendicular projections of the vectors on ℓ) is a line. Determine a
vector parametric equation of this line.

24 In △ABC (the points A, B, C are non-collinear) P is the midpoint of the


segment BC and R is the point on the line AB such that A is the midpoint
of the segment BR. Use vectors to determine the point of intersection Q of
the lines P R and AC, and show that AQ : QC = 1 : 2.

25 Suppose ℓ is the line in R3 given by ℓ: x = (3, 0, 3) + λ(1, −2, 2), and V is


the plane given by the equation 2x + 2y + z = 0.

a) Show that ℓ and V do not intersect.

b) Determine the distance between ℓ and V .

c) Suppose the plane W contains ℓ and suppose the vector (1, 0, 7) is a


direction vector of W . Determine a vector parametric equation of the
line of intersection of V and W .

26 Let ℓ : x = λa and m : x = µb be two distinct lines in the plane. Suppose


the vectors a and b both have length 1. Let p ≠ 0 be a vector such that its
angle with a equals its angle with b. Determine, for each real scalar α, the
perpendicular projections of α p on ℓ and m, respectively. Use this to prove
that the distance between α p and ℓ equals the distance between α · p and m.

27 Let V be the plane with equation 2x + y + 3z = 0 and let ℓ be the line with
parametric description x = (4, 0, 2) + λ(1, 1, −1).

a) Show that ℓ and V do not intersect.

b) Determine the perpendicular projection of ℓ onto V .

28 Let ABC be a triangle in the plane (so A, B, C are not collinear). Suppose
P is a point on the line AB such that A is the midpoint of the segment P B.
Let R be the point on the segment BC such that BR : RC = 2 : 1. Choose
a convenient origin and denote the vectors corresponding to points in the
usual way: c corresponds to C, etc. Use vectors to determine the point of
intersection Q of the lines P R and AC. Also determine the ratio AQ : QC.
Chapter 3

Matrices and systems of linear


equations

3.1 Matrices
3.1.1 Matrices are rectangular arrays of numbers (or, more generally, elements from
some arithmetical structure, like polynomials) which turn out to be useful in
many places. In this chapter we discuss the arithmetic of matrices and the
role of matrices in solving systems of linear equations.
This first section deals with

• the notion of a matrix,

• the addition and (scalar) multiplication of matrices,

• special matrices such as the zero matrix, the identity matrix, the trans-
pose of a matrix,

• the inverse of a matrix.

3.1.2 What is a matrix?


A matrix is a rectangular array of numbers or elements from some arithmeti-
cal structure, like

    A = ( 1  0  4  −2 )
        ( 0  2  0   1 ) .
In this example the matrix consists of 2 rows and 4 columns. If the matrix
consists of n rows and m columns, then we say that the matrix is an n by m


or n × m matrix. So our example is a 2 × 4 matrix. We often denote matrices


by capitals.
The numbers in a matrix are called the entries, elements or coefficients of
the matrix. The elements of a matrix A are usually indicated by subindices
in the following way: A = (aij ) or A = (Aij ), where aij or Aij denotes the
element in the ith row and the j th column. If we denote the elements in the
example by aij , then
a12 = a21 = a23 = 0 , a13 = 4 .
Two matrices A = (aij ) and B = (bij ) are equal, A = B, if they have the
same dimensions, i.e., A and B have the same number of rows and the same
number of columns, and if all corresponding elements are equal, i.e., aij = bij
for all admissible i and j.
The set of all n × m-matrices is denoted by Mn,m . In case it is relevant
to know which set the coefficients of a matrix belong to, then this is denoted
as follows: Mn,m (R), Mn,m (C), etc.
Note that, apart from numbers, variables and polynomials occasionally
occur as entries in our matrices.
3.1.3 Matrix arithmetic: the addition of matrices
Matrices of the same dimension can be added in an obvious way (multipli-
cation is discussed below).
Let A = (aij ) and B = (bij ) be two n × m-matrices. Define, for all
1 ≤ i ≤ n and 1 ≤ j ≤ m,
cij = aij + bij .
The n × m-matrix C with entries cij is called the sum of the matrices A and
B. We write C = A + B. If the dimensions of two matrices A and B are not
the same, the sum is not defined.
Here are two examples:
     
\begin{pmatrix} 1 & -1 & 3 \\ 2 & 0 & 4 \end{pmatrix} + \begin{pmatrix} -1 & -1 & 1 \\ 4 & 2 & -2 \end{pmatrix} = \begin{pmatrix} 0 & -2 & 4 \\ 6 & 2 & 2 \end{pmatrix}.

The sum of a 2 × 3 matrix and a 2 × 2 matrix doesn't exist because the dimensions don't match.
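For readers who want to experiment, here is a small computational sketch (using the Python library NumPy, which is not part of these notes) mirroring the example above: entries are added position by position, and the dimensions must agree.

import numpy as np

A = np.array([[1, -1, 3],
              [2,  0, 4]])
B = np.array([[-1, -1,  1],
              [ 4,  2, -2]])
print(A + B)              # [[0 -2 4], [6 2 2]], as computed above

C = np.array([[1, 2],
              [3, 4]])
# A + C raises a ValueError: a 2 x 3 and a 2 x 2 matrix cannot be added.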
3.1.4 Property. Let A = (aij ), B = (bij ), C = (cij ) be three n × m-matrices. It
is straightforward to check the following two properties:
A+B =B+A (commutativity),
(A + B) + C = A + (B + C) (associativity).

For instance, the second property can be proved as follows. First note that
A + B, B + C, (A + B) + C, A + (B + C) are all n × m matrices. Next note
that the element in position ij of the matrix (A + B) + C is ((A + B) + C)ij =
(A + B)ij + cij = (aij + bij ) + cij , and that the element in position ij of the
matrix A + (B + C) equals aij + (bij + cij ) (similar computation); of course,
these numbers are equal (here we use the associativity of the real or complex
numbers).
Due to the associativity we can just speak of A+B +C without specifying
which addition is carried out first, etc., since it doesn’t matter for the result.
Likewise we don’t necessarily need brackets in expressions like A+B +C +D,
since all ways of obtaining this sum, for instance as (A + B) + (C + D) or as
A + (B + (C + D)), lead to the same result.

3.1.5 Matrix arithmetic: scalar multiplication


There is also a useful way of multiplying matrices by numbers (scalars), called
scalar multiplication. Let λ be a number and let A be an n × m-matrix. The
matrix λA is obtained by multiplying every element of A by λ. For example,
   
2 \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix} = \begin{pmatrix} 2 & 4 & 6 \\ 8 & 10 & 12 \end{pmatrix}.

In general terms, by definition the element (λA)ij in position ij of λA equals


λ · aij , where A = (aij ).
We have the following properties for scalar multiplication (for all scalars
λ, µ and all matrices A and B of the same dimension):

1A= A,
(λ + µ)A = λA + µA,
λ(A + B) = λA + λB,
λ(µA) = (λµ)A.

The verifications are easy exercises. For instance, the last property is proved
by comparing the elements in position ij of both sides (for all i, j):

(λ(µA))ij = λ · (µA)ij = λ(µ · aij ) = (λµ) · aij = ((λµ)A)ij .

3.1.6 Matrix arithmetic: multiplication


Multiplying matrices is more complicated and, at least in the beginning, not
so intuitive.

To begin with, we only define the product AB of two matrices A and


B if the rows of A have the same length as the columns of B. So if A is
an m × n-matrix, then B has to be an n × p-matrix for some p. The product
matrix C = AB is defined by

cij = ai1 b1j + ai2 b2j + . . . + ain bnj i = 1, . . . , m; j = 1, . . . , p.

So the matrix C is an m × p-matrix.
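To make the definition concrete, here is a short sketch (in Python, not part of the notes) that computes cij = ai1 b1j + · · · + ain bnj directly from matrices given as lists of rows.

def mat_mul(A, B):
    # A is m x n, B is n x p, both given as lists of rows
    m, n, p = len(A), len(B), len(B[0])
    assert all(len(row) == n for row in A), "row length of A must equal number of rows of B"
    C = [[0] * p for _ in range(m)]
    for i in range(m):
        for j in range(p):
            C[i][j] = sum(A[i][k] * B[k][j] for k in range(n))
    return C

A = [[1, 2, -1],
     [0, 1,  1]]
B = [[ 1, -1],
     [-2,  0],
     [ 3,  0]]
print(mat_mul(A, B))      # [[-6, -1], [1, 0]], the first product in the examples below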

3.1.7 Examples. Here are some examples that can be verified using the definition
of the product of matrices.
 
• \begin{pmatrix} 1 & 2 & -1 \\ 0 & 1 & 1 \end{pmatrix} \begin{pmatrix} 1 & -1 \\ -2 & 0 \\ 3 & 0 \end{pmatrix} = \begin{pmatrix} -6 & -1 \\ 1 & 0 \end{pmatrix},

• \begin{pmatrix} 1 & -1 \\ -2 & 0 \\ 3 & 0 \end{pmatrix} \begin{pmatrix} 1 & 2 & -1 \\ 0 & 1 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 1 & -2 \\ -2 & -4 & 2 \\ 3 & 6 & -3 \end{pmatrix}.

In particular, we observe that AB and BA are not necessarily the same. We


say that matrix multiplication is not commutative. In the example AB and
BA even have distinct sizes.

• Even if A and B are square matrices of the same size, i.e., both are
n × n matrices for some n, then AB and BA may still differ:
     
\begin{pmatrix} 1 & -3 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} -1 & 2 \\ 1 & 5 \end{pmatrix} = \begin{pmatrix} -4 & -13 \\ 1 & 26 \end{pmatrix},

\begin{pmatrix} -1 & 2 \\ 1 & 5 \end{pmatrix} \begin{pmatrix} 1 & -3 \\ 3 & 4 \end{pmatrix} = \begin{pmatrix} 5 & 11 \\ 16 & 17 \end{pmatrix}.

• If AB exists, then BA need not exist:


     
\begin{pmatrix} 1 & 2 \\ 3 & 1 \end{pmatrix} \begin{pmatrix} -1 \\ 2 \end{pmatrix} = \begin{pmatrix} 3 \\ -1 \end{pmatrix},

whereas \begin{pmatrix} -1 \\ 2 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 3 & 1 \end{pmatrix} does not exist!

3.1.8 Property. For matrices of the correct dimensions, various arithmetic rules,
similar to those for ordinary real or complex numbers, hold. Here are the
most important ones.

A(B + C) = AB + AC and (E + F )G = EG + F G ,
(λA)B = λ(AB) ,
λ(µA) = (λµ)A,
(AB)C = A(BC) .

These rules follow from the definitions, but especially the third one requires
some effort. When we deal with linear transformations, we will discuss an
easy proof.
As a consequence of these rules we can, for example, simply write λAB
instead of (λA)B or λ(AB). Similarly, we can write ABC instead of (AB)C
or A(BC). Of course, putting in brackets may sometimes be useful to clarify
a computation.

3.1.9 The zero matrix


The n × m matrix all of whose entries are 0 is called the n × m-zero matrix .
We usually denote this matrix by O. For every n × m matrix A we have
A + O = O + A = A. Note: for every size n × m there is exactly one zero
matrix with that size. To avoid confusion we sometimes write On,m for the
n × m zero matrix.

3.1.10 The opposite matrix


The matrix B = (−1)A satisfies A + B = O and is called the (additive)
opposite of A. Instead of (−1)A we usually write −A.

3.1.11 The identity matrix


The n × n-matrix

I = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & 1 \end{pmatrix}

satisfies the following property: IA = AI = A for every n × n-matrix A.
This is easily verified using the definition of matrix multiplication (and it is
a good exercise to first try the case n = 2). The matrix I is called the n × n-
identity matrix. It plays a similar role in matrix arithmetic as the number 1
in multiplications of numbers. If necessary, we emphasize the dimensions by
writing I_n.

3.1.12 The (multiplicative) inverse of a matrix


If, given an n × n matrix A, there exists an n × n matrix B with AB = BA = I,
then B is called the inverse of A. We usually denote this inverse by A−1 .
This is justified by the fact that such a B is unique: for if B ′ also satisfies
AB ′ = B ′ A = I, then B = BI = B(AB ′ ) = (BA)B ′ = IB ′ = B ′ . Also note
that if B is the inverse of A, then A is of course the inverse of B.
We really have to impose two conditions, AB = I and BA = I, since
matrix multiplication is not commutative, and so it is not obvious that once
a n × n matrix B satisfies AB = I, it also satisfies BA = I.
Fortunately, using linear transformations (Linear Algebra 2), it is fairly
easy to show for a n × n matrix B that AB = I implies BA = I (and
conversely, if BA = I then AB = I). So, if AB = I (or BA = I), then B
is an inverse of A. Without using linear transformations, the proof is more
complicated, which is why we skip it here.
The square zero matrix O has no inverse since OB = BO = O for every
matrix B (of the same size). But there also exist non-zero matrices without
an inverse.

3.1.13 Example. The following two matrices are each other's inverse:

\begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}, \qquad \begin{pmatrix} 1 & -1 \\ -1 & 2 \end{pmatrix}.

Next consider

A = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}.

If

B = \begin{pmatrix} x & u \\ y & v \end{pmatrix}

is the inverse of A, then

\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} x & u \\ y & v \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},

so

x + y = 1, \quad x + y = 0, \quad u + v = 0, \quad u + v = 1.

It is clear that there are no solutions for x, y, u, v.

3.1.14 Let A and B be n × n matrices and suppose that A−1 and B −1 exist. Then

(A B)−1 = B −1 A−1 ,

since

(AB)(B −1 A−1 ) = A(B B −1 )A−1 = A(I A−1 ) = A A−1 = I

and, similarly, (B −1 A−1 )(AB) = I.
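A quick numerical check of this rule (a sketch using NumPy, not part of the notes; the two matrices are chosen only for illustration):

import numpy as np

A = np.array([[2., 1.], [1., 1.]])
B = np.array([[1., 2.], [3., 4.]])
lhs = np.linalg.inv(A @ B)
rhs = np.linalg.inv(B) @ np.linalg.inv(A)
print(np.allclose(lhs, rhs))   # True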

3.1.15 The transpose of a matrix


If A = (aij ) is an n × m matrix, then its transpose A^T is the m × n matrix
whose i-th row equals the i-th column of A (for i = 1, . . . , m), so the i, j-th
entry of A^T equals aji . The j-th column is then automatically equal to the
j-th row of A. In other words, you get the matrix A^T from the matrix A by
interchanging the roles of rows and columns.

3.1.16 Examples.
 
A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}, \qquad A^T = \begin{pmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{pmatrix}.

A = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}, \qquad A^T = \begin{pmatrix} 1 & 2 & 3 \end{pmatrix}.
In short, transposing is ‘taking the mirror image with respect to the so-
called main diagonal’ (the main diagonal consists of the elements with indices
11, 22, . . .).

3.1.17 Property. It follows directly from the definition that the following rules
hold (supposing in each case that the operations can be carried out):

(A + B)^T = A^T + B^T ,
(λA)^T = λA^T ,
(AB)^T = B^T A^T ,
(A^T)^T = A.
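The third rule can again be checked numerically; here is a small sketch (NumPy, not part of the notes):

import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])         # 2 x 3
B = np.array([[1, 0], [2, -1], [0, 3]])      # 3 x 2
print(np.array_equal((A @ B).T, B.T @ A.T))  # True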

3.2 Row reduction


Row reduction refers to a useful type of algorithm on matrices, that enables
us, as one of its main applications, to solve systems of linear equations in
a systematic and efficient way. In this section we discuss the details about
row reduction, and in the next section we discuss its application to solving
systems of linear equations. More applications will follow in later chapters.

3.2.1 Row reduction


The main ingredient of row reduction consists of the following three so-called
elementary row operations that can be applied to (the rows of) a given matrix:

• Interchange the order of the rows (in particular, interchange two rows).

• Multiply every entry in a row by a nonzero constant.

• Replace a row by the sum of this row and a scalar multiple of another
row.

These row operations are inspired by the process of solving systems of lin-
ear equations, in which interchanging equations, multiplying equations by a
scalar, and adding a multiple of an equation to another, are used to simplify
and solve the equations. The relation between row operations and solving
systems of linear equations is discussed in the next section.
First we discuss an example of how to use these elementary operations to
change the given matrix into a special form with ‘many zeros’.

3.2.2 Example. Consider the matrix


 
A = \begin{pmatrix} 1 & 2 & -4 & 8 \\ -1 & -2 & 6 & -4 \\ 1 & 4 & -2 & 0 \end{pmatrix}.

We use the first row to get as many zeros as possible in the first column.
Therefore we add the first row to the second, and subtract it from the third
row (we work from top to bottom). We find:
 
\begin{pmatrix} 1 & 2 & -4 & 8 \\ 0 & 0 & 2 & 4 \\ 0 & 2 & 2 & -8 \end{pmatrix}.

Next, we try to achieve the same in the second column without ruining the
first column. So we don’t use the first row, but instead interchange the second
and the third row:

\begin{pmatrix} 1 & 2 & -4 & 8 \\ 0 & 2 & 2 & -8 \\ 0 & 0 & 2 & 4 \end{pmatrix}.
Then we divide the second row by 2:
 
\begin{pmatrix} 1 & 2 & -4 & 8 \\ 0 & 1 & 1 & -4 \\ 0 & 0 & 2 & 4 \end{pmatrix}.

Now we can use the second row to produce zeros in the second column. So
we subtract the second row twice from the first (note that this doesn’t affect
the first column!):

\begin{pmatrix} 1 & 0 & -6 & 16 \\ 0 & 1 & 1 & -4 \\ 0 & 0 & 2 & 4 \end{pmatrix}.
In the next step, we use the third row. We first divide it by 2,
 
\begin{pmatrix} 1 & 0 & -6 & 16 \\ 0 & 1 & 1 & -4 \\ 0 & 0 & 1 & 2 \end{pmatrix}

and then add the new third row 6 times to the first row, and subtract it from
the second row. Note that this doesn’t alter the first two columns.
 
\begin{pmatrix} 1 & 0 & 0 & 28 \\ 0 & 1 & 0 & -6 \\ 0 & 0 & 1 & 2 \end{pmatrix}.

We can’t go any further since that would affect the first three columns. The
matrix obtained is called the (row ) reduced echelon form of A.
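As an aside (not part of the notes): the computer algebra package SymPy can carry out this reduction for us, which is convenient for checking hand computations.

from sympy import Matrix

A = Matrix([[ 1,  2, -4,  8],
            [-1, -2,  6, -4],
            [ 1,  4, -2,  0]])
R, pivots = A.rref()
print(R)         # Matrix([[1, 0, 0, 28], [0, 1, 0, -6], [0, 0, 1, 2]])
print(pivots)    # (0, 1, 2)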

3.2.3 Row reduction: description of the steps


We now turn to the general description of the row reduction procedure in
the form of an ‘algorithm.’ Do compare this description with the example
just given.
The first step consists of the following.

• Let n1 be the index of the first column (from the left) that contains a
non-zero element.

• If necessary interchange two rows so that the first element of the n1 -th
column is non-zero.

• Divide each element of the first row by the first element of the n1 -th
column so that we obtain a situation with a1n1 = 1.

• Use the first row to produce zeros in all other entries of the n1 -th
column.

Now suppose we have carried out m steps of this kind. In the resulting matrix
the first m rows have been used and the last column we have dealt with is
the nm -th. Then we do the following:

• Let nm+1 be the index of the first column that contains a non-zero
element in one of the spots with index at least m + 1.

• If necessary interchange the m + 1-th row with one of the next rows so
that the m + 1-th element of the nm+1 -th column is non-zero.

• Divide the m + 1-th row by this element so that am+1,nm+1 = 1.

• Use the m+1-th row to produce zeros in the other entries of the nm+1 -th
column.

This process stops if all rows have been used or if we are left with rows
consisting of zeros only.
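The description above can be turned almost literally into a program. The following sketch (Python, not part of the notes; the tolerance 1e-12 is an arbitrary choice to cope with rounding) computes the row reduced echelon form of a matrix given as a list of rows of numbers.

def rref(matrix):
    A = [row[:] for row in matrix]              # work on a copy
    n_rows, n_cols = len(A), len(A[0])
    pivot_row = 0
    for col in range(n_cols):
        # find a row at position pivot_row or below with a nonzero entry in this column
        pivot = next((r for r in range(pivot_row, n_rows) if abs(A[r][col]) > 1e-12), None)
        if pivot is None:
            continue                             # no usable entry in this column
        A[pivot_row], A[pivot] = A[pivot], A[pivot_row]    # interchange two rows
        lead = A[pivot_row][col]
        A[pivot_row] = [x / lead for x in A[pivot_row]]    # make the leading entry 1
        for r in range(n_rows):                  # produce zeros in the other entries
            if r != pivot_row and abs(A[r][col]) > 1e-12:
                factor = A[r][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[pivot_row])]
        pivot_row += 1
        if pivot_row == n_rows:
            break
    return A

print(rref([[1, 2, -4, 8], [-1, -2, 6, -4], [1, 4, -2, 0]]))
# [[1.0, 0.0, 0.0, 28.0], [0.0, 1.0, 0.0, -6.0], [0.0, 0.0, 1.0, 2.0]]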
The result of these row reduction steps is a matrix in so-called row reduced
echelon form or simply reduced echelon form. It looks as follows in the first
case:
 
\begin{pmatrix}
0 & \cdots & 0 & 1 & * & \cdots & * & 0 & * & \cdots & * & 0 & * & \cdots & 0 & * & \cdots & * \\
0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 1 & * & \cdots & * & 0 & * & \cdots & 0 & * & \cdots & * \\
0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 1 & * & \cdots & 0 & * & \cdots & * \\
\vdots & & & & & & & & & & & & & & \vdots & & & \vdots \\
& & & & & & & & & & & & & & 0 & * & \cdots & * \\
0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 1 & * & \cdots & *
\end{pmatrix}

and as follows in the second case:


 
\begin{pmatrix}
0 & \cdots & 0 & 1 & * & \cdots & * & 0 & * & \cdots & * & 0 & * & \cdots & 0 & * & \cdots & * \\
0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 1 & * & \cdots & * & 0 & * & \cdots & 0 & * & \cdots & * \\
0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 1 & * & \cdots & 0 & * & \cdots & * \\
\vdots & & & & & & & & & & & & & & \vdots & & & \vdots \\
& & & & & & & & & & & & & & 0 & * & \cdots & * \\
0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 1 & * & \cdots & * \\
0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & & & & & & & & & & & & & & \vdots & \vdots & & \vdots \\
0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & \cdots & 0
\end{pmatrix}

The entries with a ∗ can be any numbers.

3.2.4 The shape of a row reduced matrix


A matrix in row reduced form has the following properties:

• Every row starts with (possibly zero) zeros. Its first nonzero entry (if
there is any) is 1 (its leading entry). The column containing this 1 has
zeros in all other entries.

• Every non-zero row starts with more zeros than the row directly above
it. In particular, if there are any ‘zero rows’ (rows consisting of zeros
only), they are all below the non-zero rows.

The matrix  
1 −1 −2
0 1 3
is not in row reduced form, because the second row doesn’t satisfy the first
condition: there is a −1 above the 1 in the second column.

3.3 Systems of linear equations


3.3.1 In this section we explain the connection between row reducing matrices and
solving systems of linear equations. The resulting procedure for solving such
systems is also called Gaussian elimination after C.F. Gauss (1777-1855).

3.3.2 Systems of linear equations


The equation
3x1 − 4x2 + 5x3 = 7
is called a linear equation in the variables x1 , x2 , x3 . The numbers 3, −4, 5,
7 are called the coefficients of this equation; the number 7 is also called the
right-hand side of the equation. If we are dealing with several such equations,
then we speak of a system of linear equations.
In such a system of linear equations
a11 x1 + a12 x2 + . . . + a1m xm = b1
a21 x1 + a22 x2 + . . . + a2m xm = b2
        ⋮                                                 (3.1)
an1 x1 + an2 x2 + . . . + anm xm = bn
the matrix

A = \begin{pmatrix} a11 & \cdots & a1m \\ \vdots & & \vdots \\ an1 & \cdots & anm \end{pmatrix}
is called the coefficient matrix and the row b = (b1 , . . . , bn ) (or column
(b1 , . . . , bn )⊤ ) is called the right-hand side. If bi = 0 for all i then the system
is called homogeneous, and otherwise it is called inhomogeneous. Note that
we can write the system (3.1) as follows in matrix form:
   
A \begin{pmatrix} x1 \\ \vdots \\ xm \end{pmatrix} = \begin{pmatrix} b1 \\ \vdots \\ bn \end{pmatrix} .

A sequence of numbers (p1 , . . . , pm ) is called a solution of (3.1) if

A \begin{pmatrix} p1 \\ \vdots \\ pm \end{pmatrix} = \begin{pmatrix} b1 \\ \vdots \\ bn \end{pmatrix} .

We can represent the system (3.1) by the matrix


 
\begin{pmatrix} a11 & \cdots & a1m & b1 \\ \vdots & & \vdots & \vdots \\ an1 & \cdots & anm & bn \end{pmatrix}

containing all the aij and bk . This matrix is often denoted as (A|b) and
is called the extended coefficient matrix of the system. The vertical bar is
sometimes used to distinguish between the two types of coefficients.

3.3.3 Examples. If the matrix (A|b) is in row reduced form, then it is easy to
describe the solutions of the system. Here are some examples.

 
• \begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & 3 \end{pmatrix}, so x1 = 2, x2 = 3.

 
• \begin{pmatrix} 1 & 0 & 5 & 0 \\ 0 & 1 & -2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}. The last equation has the form 0x1 + 0x2 + 0x3 = 1,
or 0 = 1. This equation has no solutions. We call the system inconsistent.
 
• \begin{pmatrix} 1 & 0 & 5 & 2 \\ 0 & 1 & -2 & 3 \\ 0 & 0 & 0 & 0 \end{pmatrix}. Every triple (p1 , p2 , p3 ) satisfies the last equation,
so that we can just as well leave out this equation. What remains is:

x1 +5x3 = 2 ,
x2 −2x3 = 3 .

We now assign x3 (a variable from a column we can’t do anything with)


an arbitrary value, say λ. Then we find

x1 = 2 − 5λ,
x2 = 3 + 2λ,

so that
(x1 , x2 , x3 ) = (2, 3, 0) + λ(− 5, 2, 1).

3.3.4 Solutions of systems in row reduced form


Here is the general procedure for writing down the solutions of a system.

Suppose that (A|b) is in row reduced form.


 
\begin{pmatrix}
0 & \cdots & 0 & 1 & * & \cdots & * & 0 & * & \cdots & * & 0 & * & \cdots & 0 & * & \cdots & * \\
0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 1 & * & \cdots & * & 0 & * & \cdots & 0 & * & \cdots & * \\
0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 1 & * & \cdots & 0 & * & \cdots & * \\
\vdots & & & & & & & & & & & & & & \vdots & & & \vdots \\
& & & & & & & & & & & & & & 0 & * & \cdots & * \\
0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 1 & * & \cdots & * \\
0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & & & & & & & & & & & & & & \vdots & \vdots & & \vdots \\
0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & \cdots & 0
\end{pmatrix}

• If a row (0, . . . , 0, 1) occurs, then the corresponding equation, and there-
fore the system of equations, has no solutions; the system is inconsistent.

• Assign parameters to every variable corresponding to a column with


an ∗ (when the system is in row reduced echelon form). Then solve
the remaining variables (corresponding to columns with a 1 in the row
reduced echelon form). We then find:

– Exactly one solution if in the row reduced echelon form no column
with an ∗ occurs,
– infinitely many solutions otherwise.

We return to this below.

3.3.5 Row operations in terms of the system of equations


Next we turn to the situation of a system of linear equations (3.1) where
(A|b) is not in row reduced echelon form. The result we will deduce below is
that the solutions of a system do not change when we apply row operations
to the corresponding matrix: interchanging rows (equations), multiplying a
row (equation) by a non-zero scalar, adding a multiple of a row (equation)
to one of the other rows (equations).
Consider the equations

v : a1 x1 + a2 x2 + · · · + am xm = b and w : c1 x1 + c2 x2 + · · · + cm xm = d.

We define the sum of these equations by



v + w : (a1 + c1 )x1 + · · · + (am + cm )xm = b + d


and, for an arbitrary (real or complex) scalar α, the scalar product by

αv : αa1 x1 + · · · + αam xm = αb .

Note that if we represent the equations v and w by the rows

v : (a1 , a2 , . . . , am , b),
w : (c1 , c2 , . . . , cm , d),

the equation v + w is represented by the sum of the rows

v + w : (a1 + c1 , a2 + c2 , . . . am + cm , b + d),

and the equation αv is represented by the scalar product

αv : (αa1 , αa2 , . . . , αam , αb) .

3.3.6 Applying row operations doesn’t change the solutions of the system
For the technique of applying row operations to work, it is essential that in
each step the solution set remains the same. We show this by proving that
each type of row operation doesn’t change the solution set.

• Changing the order of the equations.


Since we are not changing the individual equations, it is immediately
clear that the solutions do not change when we change the order of the
equations.

• Multiplication of an equation by a non-zero factor.


Let (p1 , . . . , pm ) be a solution of the equation

v : a1 x 1 + · · · + am x m = b .

This means that


a1 p 1 + · · · + am p m = b .
Now let α be a number different from 0. Multiplying the left-hand side and
right-hand side by α produces the equality

αa1 p1 + · · · + αam pm = αb ,

so that (p1 , . . . , pm ) is a solution of the equation αv. So every solution


of the equation v is also a solution of the equation αv. For the same
reason, every solution of the equation αv is a solution of \frac{1}{α}(αv) = v,
because α ≠ 0. So the equations v and αv have the same solutions.
In conclusion, if we multiply one of the equations by a non-zero scalar,
the solutions of the equation do not change. And then the solution set
of the system does not change.

• Replacing the equations v and w by v and v + βw.


Let (p1 , . . . , pm ) be a solution of the equations

v : a1 x1 + · · · + am xm = b and w : c1 x1 + · · · + cm xm = d.

This means that

a1 p1 + · · · + am pm = b and c1 p1 + · · · + cm pm = d.

From these two equalities we infer that for any scalar β

(c1 + βa1 )p1 + · · · + (cm + βam )pm = d + βb .

So every solution of the equations v and w is also a solution of the


equations v and w+βv. For the same reason we have that every solution
of the equations v and w + βv is also a solution of the equations v and
(w + βv) − βv, i.e., a solution of the equations v and w.
The system of equations v and w therefore has the same solutions as
the equations v and w + βv. So the solution set of a system doesn’t
change if we add a scalar multiple of one of the equations to one of the
other equations.

Since the process of row reducing consists of applying (at the level of matri-
ces) such operations consecutively, we conclude that applying row operations
doesn’t change the solution set of a system of equations.

3.3.7 The procedure for solving a system of linear equations


From the above discussion we arrive at the following procedure for solving a
system of linear equations:

• Start with the extended coefficient matrix (A|b) corresponding to the


system,

• Use row operations to determine the row reduced echelon form,

• Then determine the solutions.

3.3.8 Example. To solve the system


x1 + 2x2 + 3x3 = 2 ,
x2 + 2x3 = 1 ,
3x1 + x2 + x3 = 3

we first represent it in matrix form:


 
1 2 3 2
 0 1 2 1  .
3 1 1 3

Applying row operations yields


 
1 0 0 1
 0 1 0 −1  .
0 0 1 1

So the system has precisely one solution:

(x1 , x2 , x3 ) = (1, − 1, 1) .

3.3.9 Example. The system


x1 + 2x2 + 3x3 − x4 = 1 ,
2x1 + 3x2 − 2x3 + 3x4 = 1 ,
4x1 + 7x2 + 4x3 + x4 = 3

has the following matrix representation:


 
1 2 3 −1 1
 2 3 −2 3 1  .
4 7 4 1 3

Applying row operations produces the following row reduced echelon form
 
\begin{pmatrix} 1 & 0 & -13 & 9 & -1 \\ 0 & 1 & 8 & -5 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}.

We now assign the values λ and µ to the variables x3 and x4 , respectively:


x3 = λ and x4 = µ. Then

x1 = − 1 − 9µ + 13λ ,

x2 = 1 + 5µ − 8λ ,
so that

(x1 , x2 , x3 , x4 ) = (− 1, 1, 0, 0) + λ( 13, − 8, 1, 0) + µ(− 9, 5, 0, 1) .

The advantage of this latter way of describing the solutions is that it shows
that the solution set is a ‘plane in 4-dimensional space.’ We’ll return to this
in the chapter on vector spaces.
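As an illustration (a sketch, not part of the notes), SymPy reproduces both the row reduced echelon form and the parametric solution of this example; here x3 and x4 play the role of the parameters λ and µ.

from sympy import Matrix, symbols, linsolve

x1, x2, x3, x4 = symbols('x1 x2 x3 x4')
A = Matrix([[1, 2, 3, -1],
            [2, 3, -2, 3],
            [4, 7, 4, 1]])
b = Matrix([1, 1, 3])
print(Matrix.hstack(A, b).rref()[0])
# Matrix([[1, 0, -13, 9, -1], [0, 1, 8, -5, 1], [0, 0, 0, 0, 0]])
print(linsolve((A, b), x1, x2, x3, x4))
# e.g. {(13*x3 - 9*x4 - 1, -8*x3 + 5*x4 + 1, x3, x4)}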

3.3.10 Remark. a) One can prove that the row reduced echelon form of a matrix
is unique: in whatever way you apply the row operations, you’ll always
end up with the same row reduced echelon form. A proof can be found
in Thomas Yuster, The reduced row echelon form of a matrix is unique:
A simple proof , Mathematics Magazine, vol. 57, No. 2 (1984).

b) Of course, in practical situations it is not always necessary to find the


row reduced echelon form of a system of equations in order to find the
solutions. Often, the solutions can be read off some steps before you
reach this form.

3.4 Notes
James Joseph Sylvester (1814–1897) introduced the term matrix for a rectangular
array of numbers. In the Philosophical Magazine (1851) he wrote: “I have in
previous papers defined a “Matrix” as a rectangular array of terms, out of which
different systems of determinants may be engendered, as from the womb of a
common parent”. Determinants will be discussed in Chapter 5.
Matrices turn out to be a useful way of storing and handling data. In this
chapter, we have used them to store and manipulate the coefficients of systems
of linear equations. We will come across various other usages of matrices in the
following chapters (by the way, they are also used in many other mathematics
courses). The importance of matrices is in the arithmetic operations like addition
and multiplication that allow for efficient handling of data.

In a way linear equations are the simplest kind of equations in mathematics.


They belong to the few classes of equations for which, in theory, a complete solution
procedure exists, Gaussian elimination, the details of which have been discussed in
this chapter. This procedure is not the end of the story regarding linear equations,
since, for instance, equations whose coefficients vary very much in size, or systems
with a very large number of equations cause different problems. Such systems
often occur in practice and the mathematics to deal with them is discussed in the
course Numerical Linear Algebra.
Linear equations are a relatively simple form of polynomial equations. The
latter type is much more difficult to handle. Algorithms to determine exact solu-
tions of systems of polynomial equations are discussed in the course Algorithms in
Algebra and Number Theory.
Many problems in linear algebra (and also outside linear algebra) are closely
related to solving systems of linear equations. Knowing the techniques to solve
them is essential for a fruitful study of the remaining chapters, but is also useful
for other courses.

3.5 Exercises
§1

1 Determine AB, BA, A(B − 2C), AD, CC ⊤ , C ⊤ C, DD⊤ , D⊤ D, where


     
A = \begin{pmatrix} 1 & -1 & 2 \\ 0 & 3 & 4 \end{pmatrix}, \quad B = \begin{pmatrix} 4 & -1 \\ 0 & -2 \\ -3 & 3 \end{pmatrix}, \quad C = \begin{pmatrix} 2 & 2 \\ 1 & -1 \\ 1 & -3 \end{pmatrix}, \quad D = \begin{pmatrix} 2 \\ -1 \\ 3 \end{pmatrix}.

2 Determine A + B, (A − B)C, A⊤ B, AA⊤ , A⊤ C ⊤ , where


 
A = \begin{pmatrix} 1+i & i & -i \\ 1 & -1 & 1 \end{pmatrix}, \quad B = \begin{pmatrix} 1-i & -i & i \\ 1 & -1 & 1 \end{pmatrix}, \quad C = \begin{pmatrix} 1-i & 1 \\ i & 1 \\ i & 1 \end{pmatrix}.

3 The 2 × 3 matrices A = (akl ) and B = (bkl ) are given by akl = k + l, bkl = k − l.


Determine A + B, A − 2B, A⊤ B, AB ⊤ .

4 The 3 × 2 matrices A = (akl ) and B = (bkl ) are given by akl = k + li, bkl = k − li.
Determine A + B, A − B, A⊤ B, AB ⊤ .

5 a. Compute the inverse of the matrix


 
\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}.

b. Suppose A is invertible with inverse A−1 . Determine the inverse of each of the
following matrices: λA (λ ≠ 0), A2 , A⊤ , A−1 .

6 Let V be the set of matrices of the form


 
\begin{pmatrix} a & -b \\ b & a \end{pmatrix} \qquad with a, b real.

a. Show that A + B ∈ V and AB ∈ V for all A ∈ V and B ∈ V .


b. Compare, for real numbers a, b, c, d, the sum and the product of the complex
numbers a + bi, c + di with the sum and the product of the matrices
   
\begin{pmatrix} a & -b \\ b & a \end{pmatrix}, \qquad \begin{pmatrix} c & -d \\ d & c \end{pmatrix}.
Conclude that the matrices in V ‘behave’ similarly with respect to addition
and multiplication as the complex numbers.

c. Prove that
\begin{pmatrix} \cos\varphi & -\sin\varphi \\ \sin\varphi & \cos\varphi \end{pmatrix}^{\!n} = \begin{pmatrix} \cos n\varphi & -\sin n\varphi \\ \sin n\varphi & \cos n\varphi \end{pmatrix}
for all positive integers n.

§2

7 Use row reduction to transform the following matrices into row reduced echelon
form.

a.  
1 2 −3 −11
 2 5 −5 −11  ,
−1 −1 7 43

b.  
0 0 1 1 3
 1 2 2 2 8 .
1 2 3 3 11

8 The operations used in row reduction can also be brought about by multiplying
with suitable matrices. This connection is discussed in this exercise.

a. In the 3 × 3 identity matrix interchange the 2nd and 3rd row. Let E be the
resulting matrix. Now compute the product
 
a11 a12 a13 a14
E  a21 a22 a23 a24  .
a31 a32 a33 a34

Find by analogy the matrix you need to multiply with (from the left or from
the right?) to accomplish swapping the i-th and j-th rows of an m × n matrix.

b. In the 3 × 3 identity matrix multiply the 2nd row by 7. Let F be the resulting
matrix. Now compute the product
 
a11 a12 a13 a14
F  a21 a22 a23 a24  .
a31 a32 a33 a34

Find by analogy the matrix you need to multiply with (from the left or from
the right?) to accomplish multiplication of the i-th row of an m × n matrix by
λ.

c. In the 3 × 3 identity matrix add 5 times the 3-rd row to the first row and call
the resulting matrix G. Compute the product
 
a11 a12 a13 a14
G  a21 a22 a23 a24  .
a31 a32 a33 a34

Find by analogy the matrix you need to multiply with (from the left or from
the right?) so that in a m × n matrix λ times the i-th row is added to the j-th
row.

§3

9 Solve each of the following systems of linear equations.

a.
x1 +2x2 +3x3 −x4 = 0,
2x1 +3x2 −x3 +3x4 = 0,
4x1 +6x2 +x3 +2x4 = 0;

b.
3x1 +x2 +2x3 −x4 = 0,
2x1 −x2 +x3 +x4 = 0,
5x1 +5x2 +4x3 −5x4 = 0,
2x1 +9x2 +3x3 −9x4 = 0;

c.
x1 −x2 +x3 +2x4 = 2,
2x1 −3x2 +4x3 −x4 = 3,
x1 −x3 +7x4 = 3.

10 Solve each of the following systems of linear equations.

a.
x2 +2x3 = 1,
x1 +2x2 +3x3 = 2,
3x1 +x2 +x3 = 3;

b.
x1 +x2 +2x3 +3x4 −2x5 = 1,
2x1 +4x2 −8x5 = 3,
−2x2 +4x3 +6x4 +4x5 = 0;

c.
x1 +2x2 = 0,
x1 +4x2 −2x3 = 4,
2x1 +4x3 = −8,
3x1 +6x3 = −12,
−2x1 −8x2 +4x3 = −8;

d.
x1 +x2 −2x3 = 0,
2x1 +x2 −3x3 = 0,
4x1 −2x2 −2x3 = 0,
6x1 −x2 −5x3 = 0,
7x1 −3x2 −4x3 = 1.

11 Solve each of the following systems of linear equations.


a.
z1 +iz2 +z3 = 1,
z2 + (i + 1)z3 = 0,
−iz1 +z2 = 0;

b.
(1 − λ)z1 −2z2 = 0,
5z1 +(3 − λ)z2 = 0,
for λ = 2 + 3i, and for λ = 2 − 3i;

c.
λz1 −z2 = 0,
λz2 +z3 = 0,
z1 +λz3 = 0,
for λ = 1, λ = e^{2πi/3} , and for λ = e^{−2πi/3} .
12 Let a = -\frac{1}{2} + \frac{1}{2} i \sqrt{3}. Show that a^2 + a + 1 = 0 and solve the following system of
linear equations.
z1 −z2 +z3 = 0,
z1 +az2 +a2 z3 = 1,
−z1 −a2 z2 −az3 = 1.

13 Determine for each value of λ the solution(s) of the following system of linear
equations.
λx1 +x2 +x3 = 2,
x1 +λx2 +x3 = 3.

3.5.1 Exercises from old exams


14 Solve for each value of λ the following system of linear equations.

x1 −2x3 = λ + 4,
−2x1 +λx2 +7x3 = −14,
−x1 +λx2 +6x3 = λ − 12.

15 Solve for each value of λ the following system of linear equations.

x1 +2x2 +3x3 −6x4 = −1,


2x1 +x2 +9x4 = 2 + λ,
−3x1 −3x2 −3x3 −3x4 = 1 + λ.
Chapter 4

Vector spaces

4.1 Vector spaces and linear subspaces


4.1.1 When mathematicians notice mathematical objects with similar properties in var-
ious places of mathematics, they try to develop a common generalization. The
concept of a vector space is such a notion. Vector spaces play an important role in
many branches of mathematics (and other sciences). In this section we introduce

• the arithmetical rules (‘axioms’) of a vector space,

• linear subspaces,

• (parametric representations of) lines and planes in vector spaces.

4.1.2 The notion of a vector space


The notion of a vector is usually associated with an arrow in the plane or space
(starting in the origin) as we discussed in Chapter 2. In that setting we noticed
that addition and scalar multiplication satisfy various properties. Here are eight
important ones. For all vectors p, q, r and all scalars (numbers) λ, µ we have:

1. p + q = q + p,

2. (p + q) + r = p + (q + r),

3. there is a zero vector 0 with the property p + 0 = p for every p

4. every vector p has an opposite −p such that p + −p = 0 (we also write


p − p = 0)

5. 1 p = p,


6. (λµ)p = λ(µp),

7. (λ + µ)p = λp + µp,

8. λ(p + q) = λp + λq.

Now matrix addition and scalar multiplication of, say, m × n matrices satisfy
similar properties. The similarities observed in the setting of vectors in the plane,
of matrices, and of other examples, have led to the idea of introducing an abstract
notion of which vectors in the plane or space, and matrices are examples. This is
the notion of a vector space in which the starting point is any set together with two
operations on the elements of this set, called ‘addition’ and ‘scalar multiplication’,
in which the above eight ‘axioms’ hold. The elements of the set are then called
vectors. A vector space is also sometimes called a linear space. In these lecture
notes we denote vectors by underlined symbols1 , like v. The scalars can be real or
complex numbers. In the first case we are dealing with a real vector space, in the
second case with a complex vector space. There do exist vector spaces over other
sets of scalars but they are beyond the scope of this course.
From the eight rules described above we can derive some more (obvious looking)
arithmetical rules that hold for vectors in an abstract setting (note that in the
abstract setting we only know so far that our set satisfies the eight axioms; any
other rule, even if it looks trivial, requires a proof). For instance, for every scalar
λ the equality λ 0 = 0 holds, and for every vector a we have 0 a = 0 (see exercise
27).
Some more rules (that we will not discuss and prove in detail here; but see
exercise 27) and remarks:

• The zero vector 0 is unique (in a given vector space), the opposite of a vector
is unique.

• Strictly speaking, a sum of, say, three vectors v 1 , v 2 , v 3 (or more) is not
defined; only the sum of two vectors is. To deal with three vectors, just take
(v 1 + v 2 ) + v 3 (why is this sum defined?). Another option is to define the
sum as v 1 + (v 2 + v 3 ), and the associativity guarantees that the two given
options give the same answer. This is the reason that we usually just write
v 1 +v 2 +v 3 and only care about brackets if they are of help in a computation
or proof. For more than three vectors something similar can be shown, so
that a sum of n vectors v 1 + · · · + v n is meaningful. For instance, a way of
defining the sum of four vectors v 1 , . . . , v 4 is as follows: (v 1 + v 2 ) + (v 3 + v 4 ).
But, ((v 1 + v 2 ) + v 3 ) + v 4 could also be the definition, and, again by an
[Footnote 1: In the literature you’ll come across various other notations: ~v , v̄, v.]

associativity argument (do you see how?), the two ‘definitions’ produce the
same vector.

Finally, even though vectors in the plane or in space are just two examples of
vector spaces, they are important in shaping our intuition. These examples are
often a good guide, even when working in a totally different vector space.

4.1.3 Example. The first example is the ‘space of arrows’ in the plane or in space. We
fix a point O, the origin. For every point P let p be the arrow from O to P . Our
vector space to be consists of all such arrows; we denote it by E 2 (the plane) or
E 3 (space).
The operations ‘addition’ and ‘scalar multiplication’ are defined as suggested
in the figure. Using geometry the eight axioms of a vector space can be checked,
but we will not discuss the details of this verification. The vector spaces E 2 and
E 3 are examples of real vector spaces.

[Figure: two arrows a and b based at the origin 0, their sum a + b, and a scalar multiple λa.]
4.1.4 Example. Let n ≥ 1 be an integer and let Rn = {(a1 , . . . , an ) | a1 , . . . , an ∈ R}.
For any two n-tuples of real numbers a = (a1 , . . . , an ) and b = (b1 , . . . , bn ) from
Rn , and any scalar α we define the sum and the scalar product as follows:

a + b = (a1 + b1 , . . . , an + bn ),
α a = (αa1 , . . . , αan ) (α real).

One can easily verify that Rn with these two operations satisfies the axioms of a (real)
vector space. By way of example, we’ll check the first one. The first
axiom requires that a + b = b + a for all a and b. Now

a + b = (a1 + b1 , . . . , an + bn )

and
b + a = (b1 + a1 , . . . , bn + an ),
where a = (a1 , . . . , an ) and b = (b1 , . . . , bn ). Since ai + bi = bi + ai , i = 1, . . . , n
(this is a property of the real numbers), we conclude that indeed a + b = b + a.

Note that the zero vector is (0, 0, . . . , 0), and the opposite of the vector (a1 , . . . , an )
is (−a1 , . . . , −an ).
We remark that, strictly speaking, E 2 is not the same space as R2 and E 3 is
not R3 , since an arrow is not an array of numbers. There is a close connection,
and we will come back to that.
In a similar way we can turn the set Cn of n-tuples of complex numbers into
a complex vector space.

4.1.5 Example. The set Mn,m of n × m-matrices with matrix addition and the usual
scalar multiplication is a vector space with zero vector the n × m zero matrix. The
opposite of a matrix A is the matrix −A. Depending on which numbers we use in
the matrix and as scalars, we obtain a real or complex vector space. Sometimes
the notations Mn,m (R) and Mn,m (C) are used to denote these two types.

4.1.6 Example. Let p and q be two polynomials of degree at most n:

p = an xn + an−1 xn−1 + · · · + a1 x + a0 ,
q = bn xn + bn−1 xn−1 + · · · + b1 x + b0 .

Now define the sum and the scalar product as follows:

p + q = (an + bn )xn + (an−1 + bn−1 )xn−1 + · · · + (a1 + b1 )x + (a0 + b0 ),


λp = λan xn + λan−1 xn−1 + · · · + λa1 x + λa0 .

This addition and scalar multiplication satisfy the eight axioms. The zero vector
is the zero polynomial (all coefficients equal to 0), and the opposite of an xn +
an−1 xn−1 + · · · + a1 x + a0 is of course the polynomial −an xn − an−1 xn−1 − · · · −
a1 x − a0 . If we only allow polynomials with real coefficients and if we use real
scalars in the scalar multiplication, then the vector space is real. If we admit
complex coefficients and complex scalars, the vector space is complex.

4.1.7 Example. Consider the set of all functions from a non-empty set X to the real
numbers. Addition and scalar multiplication can be defined as follows:

(f + g)(x) = f (x) + g(x),


(αf )(x) = α f (x).

This set of functions then becomes a real vector space (the zero vector is the ‘zero function’
which sends every x ∈ X to 0; the opposite −f of a function f is the function
(−1)f ).
Of course, in a similar way a complex vector space can be constructed.

4.1.8 Next, we discuss subsets of a vector space V that are themselves vector spaces
(with the two operations ‘inherited’ from V ). A typical example is the subset
{(x, y) ∈ R2 | y = 0} of the vector space R2 . It is easy to verify that this subset with
the addition (u, 0)+(x, 0) = (u+x, 0) and the scalar multiplication λ(u, 0) = (λu, 0)
(simply add and multiply them as vectors in R2 ) is itself a vector space (with zero
vector (0, 0) and opposite (−u, 0) of (u, 0)).
Suppose W is a non-empty subset of the vector space V . For any two vectors
in V that actually lie in W , there is a sum vector in V because we know how to
add vectors in V . But there is no guarantee that this sum vector is itself in W .
A similar remark holds for scalar multiples of vectors from W : such multiples lie
in V but not necessarily in W . If such sums and scalar multiples always lie in
W , then W turns out to be a vector space itself. We call such a subset a linear
subspace of V .

4.1.9 Definition. (Linear subspace) A non-empty subset W of a vector space V is


called a linear subspace of V if for all p, q ∈ W and for all scalars λ we have

p + q ∈ W,
λp ∈ W.

Equivalently: for all p, q ∈ W and for all scalars λ, µ

λp + µq ∈ W.

To verify that such a W is indeed a vector space itself, we need to check the eight
axioms. This turns out to be easy. For instance, to check the first axiom, we need
to verify that v + w = w + v for every v, w ∈ W . But we already know that
v + w = w + v for every v, w ∈ V , and so the equality certainly holds if v, w belong
to a subset of V ! Most axioms hold for similar reasons.
As for the zero vector: V ’s zero vector turns out to lie in W . To see this, take
any p in W (here we use the fact that W is non-empty!) and take the scalar 0.
Then 0 · p = 0 is in W by the above requirements for a linear subspace.
By using the equality −w = (−1)w one easily shows in a similar way that the
opposite of w ∈ W is itself in W .
So linear subspaces are vector spaces themselves. Conversely, if a subset of a
vector space V is a vector space itself (with the addition and scalar multiplica-
tion from V ), then the subset obviously satisfies the above conditions for a linear
subspace.
Caution: note that subspaces are required to be non-empty.

4.1.10 Here is a useful observation that sometimes helps in deciding that a subset is not
a linear subspace.

If W is a linear subspace of the vector space V , then W contains the zero


vector 0 of V as we just saw. An equivalent formulation is: if the subset W of the
vector space V does not contain V ’s zero vector, then W is not a linear subspace.
For instance, the subset W of R2 defined by W = {(x, y) ∈ R2 | y = 1} is not
a linear subspace since it doesn’t contain (0, 0).

4.1.11 Example. The subset V : 3x − 2y + z = 6 in R3 is not a linear subspace of R3 ,


since the zero vector of R3 is not in V .
Here is a different way to arrive at the same conclusion. Take two cleverly
chosen vectors in V , say a = (2, 0, 0) and b = (0, −3, 0). Then a ∈ V and b ∈ V
but a + b = (2, −3, 0) ∉ V , as is easily verified by substituting the coordinates in
the equation. So V (with the addition and scalar multiplication from R3 ) is not
closed with respect to the addition.
The set (plane) U with equation 3x − 2y + z = 0 is a linear subspace of R3 . To
verify this we first note that U is nonempty because (0, 0, 0) ∈ U . Next we turn
to the condition that the sum of any two vectors from U should be in U . So let
a = (a1 , a2 , a3 ) ∈ U and b = (b1 , b2 , b3 ) ∈ U . Then

3a1 − 2a2 + a3 = 0,
3b1 − 2b2 + b3 = 0.

Adding yields:
3(a1 + b1 ) − 2(a2 + b2 ) + (a3 + b3 ) = 0,
so a + b = (a1 + b1 , a2 + b2 , a3 + b3 ) ∈ U .
In a similar way one can verify that α a ∈ U for every α ∈ R.
Note that in order to prove that a subset is a linear subspace it is not enough to
show that 0 belongs to that subset. For instance, the subset W = {(x, y)|y = x2 }
of R2 contains (0, 0), but W is not a linear subspace because (1, 1) is in W but
2 · (1, 1) is not.

4.1.12 Example. In the vector space V of all real polynomials of degree at most 3, the
subset W = { p(x) ∈ V | p(1) = 0 }, i.e., the set of polynomials having a zero at 1,
is a linear subspace. This subset contains for example the polynomial p(x) = x2 −1.
Here is the proof that W is indeed a linear subspace.

• W 6= ∅ because the zero polynomial is in W ;

• if p, q ∈ W and λ, µ ∈ R, then λp + µq ∈ W because

(λp + µq)(1) = λ · p(1) + µ · q(1) = λ · 0 + µ · 0 = 0.



4.1.13 Example. The solutions of a homogeneous system of linear equations


   
x1 0
 ..   .. 
A .  =  . ,
xn 0
where A is an m × n matrix, form a linear subspace of the vector space Mn,1 of
n × 1 matrices (over R or C): first of all, the solution set is non-empty because
the n × 1 zero matrix O is a solution; secondly, if X and Y are solutions and λ
and µ are scalars, then (using the rules for matrix multiplication) A(λX + µY ) =
λAX + µAY = O + O = O, so that λX + µY is also a solution.
The solution set of an inhomogeneous system AX = B, where B is a nonzero
m × 1 matrix, is never a linear subspace because the n × 1 zero matrix is not a
solution.
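A small numerical illustration of the homogeneous case (a sketch, not part of the notes; the matrix and the two solutions are taken from the first two equations of Example 3.3.9 with the right-hand sides set to 0):

import numpy as np

A = np.array([[1., 2., 3., -1.],
              [2., 3., -2., 3.]])
X = np.array([13., -8., 1., 0.])      # a solution of AX = 0
Y = np.array([-9., 5., 0., 1.])       # another solution of AX = 0
print(np.allclose(A @ X, 0), np.allclose(A @ Y, 0))  # True True
print(np.allclose(A @ (2.0 * X - 3.5 * Y), 0))       # True: combinations stay solutions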
In the next section we discuss so-called (linear) spans, an important class of
linear subspaces. Here is another, more abstract example involving a chain of
linear subspaces.

4.1.14 Example. Consider the vector space V of all functions on R, with sum f + g and
scalar product αf defined by
(f + g)(x) = f (x) + g(x) for all x ∈ R,
(αf )(x) = α f (x) for all x ∈ R.
Now polynomials (more precisely, polynomial functions) form a nonempty subset
of V . The sum of two such functions and the scalar product of such a function
are again polynomial functions. So the set P of all polynomials forms a linear
subspace of V .
Here is a further refinement of this statement. The sum of two polynomials of
degree at most n and the scalar product of a polynomial of degree at most n are
again polynomials of degree at most n. So for every nonnegative integer n the set
Pn of all polynomials of degree at most n is a linear subspace of V . So we have
the following chain of linear subspaces:
P0 ⊂ P1 ⊂ P2 ⊂ · · · ⊂ Pn ⊂ · · · ⊂ P ⊂ V.
Note that no two of these subspaces are equal.
Next, we define the notions line and plane in the general setting of vector spaces.

4.1.15 If p and v ≠ 0 are two vectors in E 3 (or E 2 ), then geometrically it is clear that
the endpoints of the vectors
x = p + λv, λ∈R (4.1)

are on the line through the endpoint of p and parallel with v. The formula (4.1) is
called a parametric equation of this line. Since the expression p + λv is built from
a scalar product of a vector and a sum of vectors, we can, by analogy, state the
following definition in any vector space.

4.1.16 Definition. (Line) Let p and v be two vectors in a vector space and suppose v ≠ 0.
Then the set of vectors of the form

x = p + λv, λ ∈ R or C,

is called a line in the vector space. The vector p is called a position vector of the
line and the vector v a direction vector. We call the description x = p + λv a
parametric equation or parametric representation of the line.

4.1.17 Example. All solutions of the differential equation

y ′ + 2y = 2x

are
y = (x − \frac{1}{2}) + c e^{−2x} .
The solution set of this differential equation is therefore a line in the space of all
functions on R. Its position vector is the function x − \frac{1}{2} and its direction vector is
the function e^{−2x} .

4.1.18 Similarly, suppose p, v, w are three vectors in E 3 such that v ≠ 0, w ≠ 0, and such that v
and w are not multiples of one another. (In the next section, we will formulate this
as: v and w are linearly independent.) Geometrically it is clear that the endpoints
of the vectors
x = p + λv + µw, λ, µ ∈ R
describe a plane passing through the endpoint of p and parallel to v and w. This
motivates the following generalization.

4.1.19 Definition. (Plane) Let p, v, w be three vectors in a vector space and suppose
v ≠ 0, w ≠ 0, and v and w are not multiples of one another. The set of vectors

x = p + λv + µw, λ, µ ∈ R (or C) (4.2)

is called a plane in the vector space with position vector p and direction vectors
v and w. The description (4.2) is called a parametric equation (or parametric
representation) of the plane.

4.1.20 Example. The real solutions of the differential equation

y ′′ + y = x

are
y = x + c1 cos x + c2 sin x with c1 , c2 ∈ R.
So the solution set is a plane in the vector space of all functions on R, with position
vector the function x, and with direction vectors the functions sin x and cos x.

4.1.21 In R3 , consider the set

V = {(x, y, z) | 2x + 3y − z = 4}.

Take x = λ, y = µ, then z = 2λ + 3µ − 4 so that

V : x = (0, 0, −4) + λ(1, 0, 2) + µ(0, 1, 3).

This is a parametric description of the plane V in R3 where we have used the


vector (0, 0, −4) as position vector and the vectors (1, 0, 2) and (0, 1, 3) as direction
vectors. (If, for example, you take x = λ and z = µ you will find a different
parametric description, which is just as good.)
The equation 2x + 3y − z = 4 is an equation of the plane (4x + 6y = 8 + 2z
is another one). By solving the equation we have produced a vector parametric
equation (or parametric equation(s)) of the plane.
Conversely, starting from a vector parametric equation of, say, a plane in R3 ,
we can derive an equation of the plane. We use an example to demonstrate one of
the ways to find such an equation. Consider the plane

W : x = (1, 0, 1) + λ(1, 1, −1) + µ(2, −1, −1),

i.e.,
x = 1 +λ +2µ,
y = λ −µ,
z = 1 −λ −µ.
From the last two equations we solve for λ and µ and find
λ = \frac{1}{2} y − \frac{1}{2} z + \frac{1}{2} \quad and \quad µ = −\frac{1}{2} y − \frac{1}{2} z + \frac{1}{2} .
Substituting in the first of the three equations yields

2x + y + 3z = 5,

an equation of the plane W .


A systematic way to find the answer is to rephrase the parametric equations
as a system of linear equations in λ and µ with the following extended coefficient
matrix:  
1 2 x−1
 1 −1 y .
−1 −1 z − 1
Row reducing produces:
 
1 0 −x − 2z + 3
 0 1 x+z−2 .
0 0 2x + y + 3z − 5

In the last row we see an equation appearing: 2x + y + 3z − 5 = 0.
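A quick symbolic check of this computation (a sketch with SymPy, not part of the notes): substituting the parametric description into 2x + y + 3z gives the constant 5, whatever λ and µ are.

from sympy import symbols, expand

lam, mu = symbols('lambda mu')
x = 1 + lam + 2*mu
y = lam - mu
z = 1 - lam - mu
print(expand(2*x + y + 3*z))   # 5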

4.1.22 A linear equation in R3 describes a plane in R3 . A linear equation in R4 does not describe
a plane; if you turn the equation into a parametric description, three parameters
appear instead of the expected two in the case of a plane. A plane in R4 can be
described in terms of two linear equations. We illustrate this with an example.
(The notion that is really behind all this is that of dimension, to which we will
turn our attention later on in this chapter. Then all details should become fully
clear.)
In any vector space, the plane through p, q and r (assumed not to be on a
single line) can be described with the following vector parametric equation:

x = p + λ(q − p) + µ(r − p).

We find p for λ = µ = 0, q for λ = 1, µ = 0, and r for λ = 0, µ = 1.


Next we turn to the plane W in R4 through (0, 1, 0, 1), (−1, 0, 2, 1), (1, 1, 0, 0).
A vector parametric description of the plane is

(x, y, z, u) = (0, 1, 0, 1) + λ(−1, −1, 2, 0) + µ(1, 0, 0, −1),

or
x = −λ +µ,
y = 1 −λ ,
z = 2λ ,
u = 1 −µ.
Using the last two equations we express λ and µ in terms of z and u and use the
results in the first two equations. We find
2x + z + 2u = 2,
2y + z = 2.

Every point of W is a solution of this system and conversely (for the converse,
solve the system of two linear equations).

4.1.23 Every vector on the line


ℓ : x = p + λv
can be used as position vector of the line. To show this, let’s look at the line

m : x = (p + αv) + λv,

where α is an arbitrary scalar (which we assume fixed in the discussion!), and


show that ℓ and m are actually the same line.
First, by rewriting (p + αv) + λv as p + (α + λ)v we see that every vector on
m is also on ℓ. By rewriting p + λv as (p + αv) + (λ − α)v we conclude that every
vector on ℓ is also on m. (Note how placing brackets and the arithmetic rules for
vectors play a role in the proof.)
In terms of sets we have shown that

{p + λv | λ ∈ R} = {(p + αv) + λv | λ ∈ R} .

(For complex vector spaces we should of course replace R by C.)

Figure 4.1: Every vector of a line may serve as position vector.

This remark implies that the line ℓ is a linear subspace if and only if 0 ∈ ℓ.
Here are the details for the ‘if’ part. If 0 ∈ ℓ, then we can use 0 as a position vector
of the line and describe the line by the scalar multiples λ v of v. Since the sum
λv + µv can be written as (λ + µ)v, this sum is again a multiple of v and therefore
on ℓ. Of course, since µ(λv) = (µλ)v, we see that scalar multiples of vectors on ℓ
are themselves on ℓ.
Similar remarks hold for planes: every vector on a plane can serve as position
vector of the plane, and a plane is a linear subspace if and only if the zero vector
is on the plane.

In a similar way as above one can show that any nonzero multiple of v can
serve as direction vector of the line ℓ : x = p + λv. Planes can also have many
pairs of direction vectors (no details here).

4.2 Spans, linearly (in)dependent systems


4.2.1 The concept of a vector space generalizes many particular situations, such as vec-
tors in the plane, polynomials, matrices. What we gain by having this abstract
notion is that whatever we prove there automatically holds in every concrete ex-
ample of a vector space (the price we pay is that we have to get used to working
with the abstract concept).
In this section we will illustrate this by developing the notion of dimension of a
vector space (‘the size of a vector space’). Along the way we will need various other
notions and results. To develop these notions we usually let concrete examples be
the source of inspiration.
In this section we concentrate on

• linear combinations of vectors,

• systems and spans of vectors,

• linearly dependent sets and linearly independent sets of vectors,

• bases and dimension.

4.2.2 Definition. (Linear combination) Let a1 , . . . , an be vectors in a vector space


V . The vector x is called a linear combination of a1 , . . . , an if there exist scalars
λ1 , . . . , λn such that
x = λ1 a 1 + λ2 a 2 + · · · + λn a n .
In this situation we also say that the vector x depends (or is linearly dependent)
on the vectors a1 , . . . , an .

4.2.3 Example. A linear combination of the vectors (1, 1, −1) and (2, 0, 1) in R3 is, for
example, the vector (−1, 3, −5) = 3 (1, 1, −1) − 2 (2, 0, 1).

4.2.4 Definition. (Span of vectors) Let a1 , . . . , an be vectors in a vector space. The


set of all linear combinations of a1 , . . . , an is called the span of the vectors a1 , . . . , an
and is denoted as < a1 , . . . , an >.

4.2.5 Theorem. Spans are linear subspaces, i.e., if a1 , . . . , an are vectors in the vector
space V , then < a1 , . . . , an > is a linear subspace of V .

Proof. Of course, the span is non-empty (it contains the zero vector).
Now let p and q be vectors in < a1 , . . . , an > and suppose

p = p1 a 1 + · · · + pn a n and q = q1 a1 + · · · + qn an .

Then
p + q = (p1 + q1 )a1 + · · · + (pn + qn )an ∈< a1 , . . . , an > .
Also, for every scalar λ:

λp = λp1 a1 + · · · + λpn an ∈< a1 , . . . , an > .

So sums and scalar multiples of vectors from the span belong to the span, which
finishes the proof. 

4.2.6 Example. The span < (2, 1, 0), (1, 0, 1) > is precisely the plane with equation
x − 2y − z = 0: with y = λ and z = µ we get (x, y, z) = (2λ + µ, λ, µ) =
λ(2, 1, 0) + µ(1, 0, 1), and, by definition, these vectors run through the span <
(2, 1, 0), (1, 0, 1) >.

4.2.7 Example. In a vector space V consider a line passing through the origin:

l : x = λv.

This line equals the span < v >, so it is a linear subspace as we saw before in
4.1.23.
Similarly, the plane
V : x = λv + µw
passing through the origin equals the span < v, w >.

4.2.8 Example. In R3 consider the vectors a = (1, 1, −2), b = (−1, 1, 0), c = (0, 1, −1)
and let V =< a, b, c >. We see immediately that 2c−a = b. Now take an arbitrary
x ∈ V . Then x can be written as

x = x1 a + x2 b + x3 c,

for some scalars x1 , x2 , x3 , so that

x = x1 a + x2 (2c − a) + x3 c
= (x1 − x2 )a + (2x2 + x3 )c.

So every vector from V is a linear combination of a and c, and therefore V is


contained in the span < a, c >.

Conversely, every linear combination of a and c is of course a linear combination


of a, b, c (add 0 b to such a linear combination of a and c), so that < a, c > is
contained in V . We conclude: V =< a, c >.
Verify yourself that a = 2c − b and hence V =< b, c >; also c = \frac{1}{2} a + \frac{1}{2} b from
which we conclude that V =< a, b >.
Also check that V is the plane x + y + z = 0.

4.2.9 Manipulating spans


In the previous example we considered a span of a set of vectors that could also
be spanned by a smaller set of vectors. Such a smaller set is likely to be of more
practical value than the bigger one (as we will see later on). Our next goal is
therefore to discuss operations with which we can reduce the number of vectors
spanning a given linear subspace in a systematic way, and even find ‘minimal
spanning sets.’ You will notice that these operations are very similar to row
reduction operations.

4.2.10 Theorem. Let a1 , . . . , an be vectors in a vector space V . The span < a1 , . . . , an >
doesn’t change if we
1. change the order of the vectors,
2. multiply one of the vectors by a scalar ≠ 0, i.e., replace, say, ai by λai with
λ 6= 0,
3. add a scalar multiple of one of the vectors to one of the other vectors, i.e.,
replace, say, ai by ai + αaj with j 6= i.
The span also doesn’t change if we
4. insert the zero vector, for instance, < a1 , . . . , an >=< a1 , . . . , an , 0 >, or
leave out the zero vector (if of course the zero vector was one of the ai ),
5. insert a linear combination λ1 a1 + · · · + λn an of a1 , . . . , an ,
6. leave out ai if this vector is a linear combination of the other aj .
Proof. The proof that changing the order (1), and inserting or leaving out the zero
vector (4) doesn’t affect the span is almost trivial, so we leave that to the reader.
To prove 2) we first observe that the equality
λ1 a1 + λ2 a2 + · · · + λk (αak ) + · · · + λn an = λ1 a1 + λ2 a2 + · · · + (λk α)ak + · · · + λn an
shows that every linear combination of a1 , . . . , αak , . . . , an (only ak is multiplied
by the scalar α) is a linear combination of a1 , . . . , ak , . . . , an . Likewise,
λ1 a1 + λ2 a2 + · · · + λk ak + · · · + λn an = λ1 a1 + λ2 a2 + · · · + \frac{λk}{α}(αak ) + · · · + λn an

shows that, for α ≠ 0, every linear combination of a1 , . . . , ak , . . . , an is a linear


combination of a1 , a2 , . . . , αak , . . . , an . This finishes the proof of 2).
We now show that the span doesn’t change if we add a multiple of one of
the vectors to one of the other vectors. By possibly changing the order of the
vectors we may consider the situation where we add αa2 to a1 . Since every linear
combination of a1 + αa2 , a2 , . . . , an is a linear combination of a1 , . . . , an we have

< a1 + αa2 , a2 , . . . , an >⊂< a1 , . . . , an > .

But then

< (a1 + αa2 ) − αa2 , a2 , . . . , an > ⊂ < a1 + αa2 , a2 , . . . , an >,
i.e., < a1 , a2 , . . . , an > ⊂ < a1 + αa2 , a2 , . . . , an >, so the two spans are equal.

Part 5) follows from the previous ones as follows: if b = λ1 a1 + · · · + λn an , then

< a1 , . . . , an > =< a1 , . . . , an , 0 >


=< a1 , . . . , an , 0 + λ1 a1 >
..
.
=< a1 , . . . , an , λ1 a1 + · · · + λn an >
=< a1 , . . . , an , b > .

Here we have used properties 4) and 3).


We leave the proof of the last item of the theorem to the reader. 

4.2.11 Example. By repeatedly applying the above rules, we see that (regardless of the
vector space we are working in)

< a + 2b, a − b, a + b > =< a + 2b, a − b, a + b + (a − b) >


=< a + 2b, a − b, 2a >
=< a + 2b − (1/2) · 2a, a − b − (1/2) · 2a, 2a >
=< 2b, −b, 2a >=< b, −b, a >
=< b, −b + b, a >=< b, 0, a >
=< a, b > .
In particular we observe that we only need two vectors. Matrices can be used to
perform the above manipulations in a systematic way. Here are the details. As in
the case of systems of linear equations we restrict to writing down the coefficients
of a and b in the various vectors in the various stages of the rewriting process.
Collect the coefficients of the three vectors we start with in rows and apply row
reduction:
   
row reduce the matrix

  [ 1  2 ]
  [ 1 −1 ]
  [ 1  1 ]

to its normal form

  [ 1 0 ]
  [ 0 1 ]
  [ 0 0 ] .

We interpret this computation as < a + 2b, a − b, a + b >=< a, b >. This matrix


approach provides both a shorthand notation and a systematic way of finding the
answer.
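For readers who want to experiment, the row reduction above can be reproduced with a computer algebra system; the following minimal sketch assumes Python with the sympy library (our own choice of tool, not part of the course material).

from sympy import Matrix

# Coefficients of a and b in the vectors a + 2b, a - b, a + b, collected as rows.
coeffs = Matrix([[1, 2],
                 [1, -1],
                 [1, 1]])
reduced, pivots = coeffs.rref()   # row reduced echelon form and pivot columns
print(reduced)                    # Matrix([[1, 0], [0, 1], [0, 0]]), i.e. the span is < a, b >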
Here is yet another way of handling spans.

4.2.12 Theorem. (Exchange theorem) If V =< a1 , . . . , an > and b = λ1 a1 + . . . +


λi ai + . . . + λn an with λi ≠ 0, then

V =< a1 , . . . , an >=< a1 , . . . ai−1 , b, ai+1 , . . . , an > .

4.2.13 This theorem states that, if a vector b ∈< a1 , . . . , an > can be written as a linear
combination of a1 , . . . , an , where the coefficient of ai is nonzero, then we can replace
the vector ai by b without altering the span of the vectors.
Proof of 4.2.12. Consider V =< a1 , . . . , ai , . . . , an >. Now first multiply ai by λi
(≠ 0) and then add to it the vector λ1 a1 + · · · + λi−1 ai−1 + λi+1 ai+1 + · · · + λn an . These
steps leave the span the same, so that V =< a1 , . . . , ai−1 , b, ai+1 , . . . , an >. 

4.2.14 In 4.2.11 we have seen an example of a space spanned by three vectors, but which
can also be spanned by two vectors. We now discuss how to find such ‘minimal’
systems of vectors spanning a given space. Apart from the theorems 4.2.10 and
4.2.12, the notion of a linear (in)dependent system of vectors plays a central role.

4.2.15 Definition. (Linearly (in)dependent set of vectors) A set or system of vec-


tors a1 , . . . , an is called linearly dependent if at least one of the vectors is a linear
combination of the others. The vectors are called linearly independent if none of
the vectors is a linear combination of the others. We often say in such situations
that the vectors a1 , . . . , an are linearly (in)dependent, or that the set {a1 , . . . , an }
is linearly independent.

4.2.16 A more practical way to decide if a set of vectors is linearly (in)dependent is based
on the following equivalent formulation.

4.2.17 Definition. (Linearly (in)dependent vectors: practical version) The sys-


tem of vectors a1 , . . . , an is linearly independent if and only if the only solution of
the equation
λ 1 a 1 + λ 2 a 2 + · · · + λ n an = 0 (4.3)
in λ1 , λ2 , . . . , λn is: λ1 = 0, λ2 = 0, . . . , λn = 0.
A non–trivial relation between the vectors a1 , . . . , an is an equality of the form

λ1 a1 + λ2 a2 + · · · + λn an = 0,

where at least one of the coefficients λ1 , λ2 , . . . , λn is non-zero.


An equivalent formulation for a linearly dependent system of vectors is: the
vectors a1 , . . . , an are linearly dependent if and only if there exists a non-trivial
relation between the vectors a1 , . . . , an .

Proof. We restrict ourselves to the proof of the first equivalence, and leave the
second one to the reader.
First we deal with the implication ⇒). If the equation (4.3) has a solution
with, say, λi ≠ 0, then

ai = (−λ1 /λi ) a1 + · · · + (−λi−1 /λi ) ai−1 + (−λi+1 /λi ) ai+1 + · · · + (−λn /λi ) an ,

so that ai is a linear combination of the other vectors, contradicting the assump-


tion. Conversely, suppose the equation (4.3) only has the zero solution. If, for
example, ai is a linear combination of the vectors a1 , . . . , ai−1 , ai+1 , . . . , an , say,

ai = α1 a1 + · · · + αi−1 ai−1 + αi+1 ai+1 + · · · + αn an ,

then we obtain the following non-trivial relation

α1 a1 + · · · + αi−1 ai−1 − 1 · ai + αi+1 ai+1 + · · · + αn an = 0,

contradicting the assumption. 

4.2.18 Examples. The following examples illustrate definition 4.2.17.

• The vectors
e1 = (1, 0, 0, . . . , 0),
e2 = (0, 1, 0, . . . , 0),
..
.
en = (0, 0, 0, . . . , 1),
in Rn are linearly independent, since the equation

λ1 e1 + · · · + λn en = (0, . . . , 0),

i.e., (λ1 , . . . , λn ) = (0, . . . , 0), has λ1 = · · · = λn = 0 as its only solution.

• The vectors (1, 2, 2) and (0, 1, −1) in R3 are linearly independent; here is the
proof. If a(1, 2, 2) + b(0, 1, −1) = (0, 0, 0), then we rewrite this as (a, 2a +
b, 2a − b) = (0, 0, 0), and easily conclude a = b = 0.

• The vectors (1, 0, 1), (0, 1, 1), (1, 1, 0), (2, 2, 2) in R3 are linearly dependent
(a computational check follows after this list of examples). To see this, consider the equation

a(1, 0, 1) + b(0, 1, 1) + c(1, 1, 0) + d(2, 2, 2) = (0, 0, 0)

in a, b, c, d. This is a system of linear equations in a, b, c, d, whose solutions


are (a, b, c, d) = λ(1, 1, 1, −1). So the vectors satisfy a non-trivial relation,
e.g., for λ = 1:

(1, 0, 1) + (0, 1, 1) + (1, 1, 0) − (2, 2, 2) = (0, 0, 0).

So the vectors are not linearly independent.

• The functions sin and cos in the space of real functions R → R are linearly
independent. Suppose

a sin + b cos = 0 (= the zero function),

then, since this is an equality of functions, we find that for every real
number t the relation a sin(t) + b cos(t) = 0 holds. Now we choose a few
‘smart’ values for t to deduce that a and b are 0: for t = 0 we get b cos(0) = 0
so that b = 0, and for t = π/2 we get a sin(π/2) = 0 so that a = 0.
Of course, in general there may exist dependences between functions. For
instance, the formula sin(2t) = 2 sin(t) cos(t) tells us that the functions
t ↦ sin(2t) and t ↦ sin(t) cos(t) are not linearly independent.

• In example 4.2.8 a non-trivial relation between the vectors a, b and c ex-


ists, namely a + b − 2c = 0. In particular, these vectors are not linearly
independent.
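As a computational check of the third example above (the four vectors in R3), one can compute the nullspace of the matrix having these vectors as columns; this sketch again assumes Python with sympy (our own choice of tool, not prescribed by the notes).

from sympy import Matrix

# The vectors (1, 0, 1), (0, 1, 1), (1, 1, 0), (2, 2, 2) as columns of a matrix.
A = Matrix([[1, 0, 1, 2],
            [0, 1, 1, 2],
            [1, 1, 0, 2]])
# Every vector in the nullspace gives coefficients (a, b, c, d) of a non-trivial relation.
print(A.nullspace())   # [Matrix([[-1], [-1], [-1], [1]])], a multiple of (1, 1, 1, -1)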

4.2.19 Finding an independent set from a spanning set of vectors


Every set {a1 , . . . , an } in a vector space V which does not consist of zero vectors
only can be ‘reduced’ to a linearly independent set with the same span in the
following way. Let U =< a1 , . . . , an >. If a1 , . . . an is linearly independent, then
we are done. If not, then at least one of the vectors is a linear combination of the
others. This vector can be left out without changing the span. We are then left
with a set of n − 1 vectors that still spans U . If this set is linearly independent, we
are ready. If not, we repeat the procedure we just described. After at most n − 1
steps we find a linearly independent set spanning U .
Next we turn to a special property of linearly independent sets spanning a given
vector space: if two (finite) linearly independent sets span the same space, they
contain the same number of vectors. That number is what we call the dimension
of the space. Here are the preparations for this result.

4.2.20 Theorem. Suppose V = < a1 , . . . , an > and suppose b1 , . . . , bm is a linearly


independent set of vectors in V . Then m ≤ n.

Proof. The vector b1 is a linear combination of a1 , . . . , an . Since b1 is not the zero


vector, at least one of the coefficients of the ai is not zero. So, Theorem 4.2.12 enables
us to exchange the vectors b1 and one of the ai . Without loss of generality we may
assume that we exchange b1 and a1 (if necessary relabel the ai ). So

V = < b1 , a2 , . . . , an > .

Now the vector b2 is a linear combination of the vectors on the right-hand side.
Again, at least one of the coefficients of a2 , . . . , an must be ≠ 0 (otherwise, b2 would
be a multiple of b1 ). So we can exchange b2 and one of the vectors a2 , . . . , an , again
by Theorem 4.2.12. Possibly after relabeling, we may assume that we exchange b2
and a2 . So:
V = < b1 , b2 , a3 , . . . , an > .

Continue in the same way. By Theorem 4.2.12 every bi can be exchanged, so that
m ≤ n. 

4.2.21 Theorem. If the vector space V is the span of each of the systems of independent
vectors a1 , . . . , an and b1 , . . . , bm , then m = n.

Proof. Apply the previous theorem to the system of independent vectors b1 , . . . , bm


in < a1 , . . . , an > to conclude that m ≤ n. Then apply the theorem to the system
of independent vectors a1 , . . . , an in < b1 , . . . , bm > to conclude n ≤ m . So m = n.


4.2.22 Definition. (Basis and dimension) A linearly independent set spanning a vec-
tor space V is called a basis of V . The number of elements in the basis is called
the dimension of V and is denoted by dim(V ).

4.2.23 If there isn’t a finite basis of V (and V does not consist of 0 only), then we say
dim(V ) = ∞.
The case V = {0} is a bit special. The space V contains only one vector, 0,
but this vector is not linearly independent since 3 · 0 = 0 is a non-trivial relation (do you see why?). We
usually say that the empty set ∅ is a basis and that the dimension of V is 0.

4.2.24 Examples. Here are some vector spaces and their dimensions.

• Geometrically it is clear that dim(E 2 ) = 2 and dim(E 3 ) = 3.



• In Rn the set containing the vectors

e1 = (1, 0, 0, . . . , 0),
e2 = (0, 1, 0, . . . , 0),
..
.
en = (0, 0, 0, . . . , 1),

is a linearly independent set spanning Rn : we already established that these


vectors are linearly independent, so we only need to check that every vector
(x1 , x2 , . . . , xn ) can be written as a linear combination of these vectors, but
that is clear from the expression x1 e1 +· · ·+xn en . So dim(Rn ) = n. The basis
e1 , . . . , en is usually called the standard basis of Rn . Similarly, dim(Cn ) = n.
We also use the term standard basis in this case.

• Let Pn be the set of all (real or complex) polynomials in x of degree at most n.


Then Pn = < 1, x, x2 , . . . , xn >. The polynomials 1, x, x2 , . . . , xn are linearly
independent as we now prove. Suppose α0 + α1 x + α2 x2 + · · · + αn xn = 0 for
all x and suppose not all coefficients αj are equal to 0. Then the polynomial
on the left-hand side would have at most n zeros, while it actually vanishes for
every x. This contradiction shows that all αj have to be 0. So {1, x, . . . , xn }
is a basis of Pn and dim(Pn ) = n + 1.

4.2.25 Here are some consequences of the definitions. If V is a vector space with dim(V ) =
n < ∞, then every basis of V consists of exactly n vectors. We use this to prove
the following statements about the m vectors b1 , . . . , bm in V .

1. If m < n, then < b1 , . . . , bm >⊂⊂ V (the notation ⊂⊂ indicates that the


left-hand side is contained in but not equal to the right-hand side). Here is
why. If < b1 , . . . , bm > were equal to V , then using our technique to reduce
the set {b1 , . . . , bm } to a basis would produce a basis of V with at most m
elements, contradicting the assumption on the dimension being n.

2. If m > n, then the set {b1 , . . . , bm } is not linearly independent because of


Theorem 4.2.20.

3. If m = n, then the vectors b1 , . . . , bm are a basis of V if and only if the


vectors are linearly independent. To prove this, first assume the set is not
linearly independent. Then < b1 , . . . , bm > can be spanned by fewer than m
vectors and then item 1) implies < b1 , . . . , bm > ⊂⊂ V . So our set can’t
be a basis. Conversely, if b1 , . . . , bn is linearly independent and its span is
not equal to V , then there is a vector a ∈ V with a ∉ < b1 , . . . , bn >, so
that {b1 , . . . , bn , a} is linearly independent (see Theorem 4.2.29 below) and

contains more than n vectors, contradicting the fact that dim(V ) = n. So


< b1 , . . . , bn >= V and b1 , . . . , bn is a basis of V .

4.2.26 If V is a vector space with dim(V ) = ∞, then there is an infinite sequence of vectors
a1 , a2 , . . . with
an+1 ∉ < a1 , . . . , an >
for every n. To see this, choose a1 ≠ 0 in V . If V = < a1 >, then dim(V ) = 1, which is impossible. So
there must be an a2 ∈ V with a2 ∉ < a1 >. If V = < a1 , a2 >, then dim(V ) = 2, which is again impossible, so
< a1 , a2 > ⊂⊂ V . Now choose a3 ∈ V , a3 ∉ < a1 , a2 >, etc.
The infinite sequence a1 , a2 , . . . that we find in this way has the desired prop-
erty. Moreover, for every n the set {a1 , . . . , an } is linearly independent. This
follows from Theorem 4.2.29 below. Here we see an important distinction between
finite dimensional and infinite dimensional vector spaces: in an infinite dimen-
sional vector space there exist arbitrarily large linearly independent sets, whereas
in finite dimensional vector spaces the number of vectors in a linearly independent
set is at most the dimension of the vector space.

4.2.27 Finding bases


Back to the problem of finding ‘economical’ sets spanning a vector space. The pre-
vious discussions show that this comes down to finding bases of vector spaces. Here
are two obvious ways of finding bases (sometimes ad hoc methods are quicker).
• As we saw before, if a vector space is given as a span of vectors, then by
eliminating ‘dependencies’ we find a basis. In the case of vectors in Rn or
Cn , our row reduction operations from Chapter 3 are useful. We come back
to this in Theorem 4.3.9 below. By using coordinates these techniques can
also be used in other cases (see Theorem 4.3.7).
• If we do not have a set of vectors spanning a given vector space V , we make
one ourselves in the following way. Start with a vector a1 ≠ 0 in V (if
possible; otherwise, we are already done). If < a1 > ≠ V , then choose a vector
a2 ∉ < a1 >. The vectors a1 , a2 are linearly independent by Theorem 4.2.29
below. If < a1 , a2 > ≠ V , then choose a3 ∉ < a1 , a2 >. The vectors a1 , a2 , a3 are
linearly independent by Theorem 4.2.29. Etc. We illustrate this technique
in the following example.

4.2.28 Example. We complete a1 = (1, −1, 0) to a basis of R3 . Choose a second vector


outside < (1, −1, 0) >, e.g., a2 = (0, 0, 1). Since dim(R3 ) = 3 we look for a
third vector outside the span < (1, −1, 0), (0, 0, 1) >, e.g., a3 = (1, 1, 0) (of course
we have to check that a3 is a valid choice). By Theorem 4.2.29 below we can
now conclude that {a1 , a2 , a3 } is a basis (you may also prove it directly from the
definition if you prefer).
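A quick way to verify that a3 is indeed a valid choice is to check that the matrix with a1 , a2 , a3 as rows has rank 3; a minimal sketch assuming Python with sympy (our own choice of tool, not part of the notes):

from sympy import Matrix

# The three chosen vectors as rows of a matrix.
B = Matrix([[1, -1, 0],
            [0, 0, 1],
            [1, 1, 0]])
print(B.rank())   # 3, so the vectors are linearly independent and form a basis of R^3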

4.2.29 Theorem. If the set of vectors {a1 , . . . , an } in the vector space V satisfies

a1 ≠ 0, a2 ∉ < a1 >, a3 ∉ < a1 , a2 >, . . . , an ∉ < a1 , . . . , an−1 >,

then the vectors are linearly independent.

Proof. To show linear independence, we consider the equation

λ 1 a 1 + λ 2 a 2 + · · · + λ n an = 0

in λ1 , . . . , λn . If λn ≠ 0, then

an = (−λ1 /λn ) a1 + · · · + (−λn−1 /λn ) an−1 ,

contradicting the fact that an ∉ < a1 , . . . , an−1 >. So λn = 0. In a similar way we


derive that λn−1 = 0, . . . , λ2 = 0. Finally, from λ1 a1 = 0 and a1 6= 0 we conclude
that λ1 = 0 (see exercise 27).

4.3 Coordinates
4.3.1 Coordinates
Bases are ‘minimal’ systems of vectors spanning a vector space. They have another
special property which will enable us to use coordinates. If a1 , . . . , an span V , then
every x ∈ V can be written in the form

x = x1 a1 + · · · + xn an . (4.4)

The coefficients need not be unique. For example, consider the space V from
example 4.2.8; for the vector b we have

b = 0a + 1b + 0c
= − 1a + 0b + 2c.

However, if a1 , . . . , an is a basis, then the coefficients in (4.4) are unique. Here is


why. If x = y1 a1 + · · · + yn an , then by subtracting we deduce

0 = (x1 − y1 )a1 + · · · + (xn − yn )an ,

so that x1 = y1 , . . . , xn = yn because the system is linearly independent. We can


therefore represent every vector x in V in a unique way with such coefficients. We
define coordinates and coordinate vectors as follows.

4.3.2 Definition. (Coordinates) Let a1 , . . . , an be a basis of the vector space V . If

x = x1 a1 + · · · + xn an

then the coefficients x1 , . . . , xn are called the coordinates of the vector x with
respect to this basis. The vector (x1 , . . . , xn ) is called the coordinate vector of x
and is itself a vector in Rn or Cn .
Note: coordinates depend on the basis used!

4.3.3 Example. Let V be the vector space of polynomials of degree at most 2. Consider
the polynomials
p0 : p0 (x) = 1,
p1 : p1 (x) = x,
p2 : p2 (x) = x2 .
Let p be an arbitrary polynomial in this space, say, ax2 + bx + c. Then p =
ap2 + bp1 + cp0 , so that
V =< p0 , p1 , p2 > .
The polynomials p0 , p1 , p2 are linearly independent: suppose α0 p0 +α1 p1 +α2 p2 = 0
(the zero polynomial). This means

α0 + α1 x + α2 x2 = 0 for all x.

If (α0 , α1 , α2 ) ≠ (0, 0, 0), then the left-hand side polynomial would have at most
two zeros, which is not the case (since the polynomial is also equal to the zero
polynomial). So (α0 , α1 , α2 ) = (0, 0, 0) and p0 , p1 , p2 are linearly independent.
Therefore the polynomials p0 , p1 , p2 form a basis of V and (c, b, a) is the coordinate
vector of p with respect to this basis.

4.3.4 Example. The vectors (1, 1) and (1, −1) form a basis of R2 . The linear inde-
pendence is easily derived from the fact that a(1, 1) + b(1, −1) = (0, 0) implies
a = b = 0. From 4.2.25 we then derive that these two independent vectors are a
basis of the space.
The coordinate vector of (5, 3) with respect to this basis can be found by
looking for a c and d such that c(1, 1) + d(1, −1) = (5, 3). Solving leads to c = 4
and d = 1. The coordinate vector of (5, 3) w.r.t. the new basis is then (4, 1).
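Finding the coordinates amounts to solving a small linear system; here is a sketch of that computation (assuming Python with sympy, our own illustration and not part of the notes):

from sympy import Matrix

# Columns of B are the basis vectors (1, 1) and (1, -1).
B = Matrix([[1, 1],
            [1, -1]])
# Solve B * (c, d)^T = (5, 3)^T for the coordinates c and d.
print(B.LUsolve(Matrix([5, 3])))   # Matrix([[4], [1]]), i.e. c = 4 and d = 1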

4.3.5 Coordinates of sums and scalar multiples


Let a1 , . . . , an be a basis of the vector space V . If x ∈ V has coordinate vector
(x1 , . . . , xn ) w.r.t. this basis and y ∈ V has coordinate vector (y1 , . . . , yn ), then we
easily verify that the coordinate vectors of x + y and αx are (x1 + y1 , . . . , xn + yn )
and (αx1 , . . . , αxn ), respectively. E.g., for the scalar multiple αx this follows from
the fact that α(x1 a1 + · · · + xn an ) = (αx1 )a1 + · · · + (αxn )an .
So addition and scalar multiplication in V correspond nicely to the usual ad-
dition and scalar multiplication in Rn (or Cn ).

4.3.6 By coordinatising, questions on vector spaces can be reduced to questions on Rn


or Cn , where we have our machinery of techniques at our disposal. (Of course,
finally we translate the answers found in the coordinate world back to the original
setting.)
The vector spaces Rn and Cn model every real and complex vector space of
dimension n. Mathematicians say that every real n-dimensional vector space is
isomorphic to Rn , and every complex n-dimensional vector space is isomorphic to
Cn .
Often we say that R2 is the plane, but that is, strictly speaking, wrong. Upon
choosing a basis (a1 , a2 ) in E 2 every vector x can be written in the form x =
x1 a1 + x2 a2 and computations with vectors can be ‘translated’ into computations
with elements in R2 . The choice of basis is of importance here. Since we are so
used to the use of coordinate vectors in the plane, it is tempting to say that these
two spaces are the same.
The following theorem shows that linear independence can be checked at the
level of coordinates.

4.3.7 Theorem. Let α be a basis of the n–dimensional vector space V and let {a1 , . . . , am }
be a set of vectors in V . Then:

• {a1 , . . . , am } is linearly independent if and only if the corresponding set of


coordinate vectors (in Rn or Cn ) w.r.t. α is linearly independent.

• {a1 , . . . , am } is a basis if and only if the corresponding set of coordinate


vectors (in Rn or Cn ) w.r.t. α is a basis.

Proof. We only prove the first item, since the second item is a direct consequence
of it. Suppose b1 , . . . , bm are the coordinate vectors of a1 , . . . , am . The coordinate
vector of λ1 a1 +· · ·+λm am is equal to λ1 b1 +· · ·+λm bm . So if λ1 a1 +· · ·+λm am = 0,
then we also have λ1 b1 +· · ·+λm bm = (0, . . . , 0) and conversely, since the coordinate
vector of the zero vector is (0, . . . , 0). A non-trivial relation between a1 , . . . , am
translates into a non-trivial relation between the coordinate vectors b1 , . . . , bm . In
other words: a1 , . . . , am is linearly dependent if and only if b1 , . . . , bm is linearly
dependent.

4.3.8 Relation with systems of linear equations


The above considerations also lead to a new view on systems of linear equations.

Let
a11 x1 + a12 x2 + · · · + a1m xm = b1
  ⋮
an1 x1 + an2 x2 + · · · + anm xm = bn
be such a system. Let k 1 , . . . , k m be the columns of the coefficient matrix and let
b = (b1 , . . . , bn )T . Then we can write the system as

x1 k 1 + x2 k 2 + · · · + xm k m = b.

So we try to write b as a linear combination of the columns of the matrix. The


system has at least one solution if and only if b is contained in the span of the
columns, and precisely one solution if and only if the columns k 1 , . . . , k m are
moreover linearly independent. In that case, the solution produces the coordinates
of the vector b w.r.t. the basis {k 1 , . . . , k m }.

4.3.9 Theorem. The nonzero rows of a matrix in row reduced echelon form are linearly
independent.
Proof. A matrix in row reduced form has the following shape (see 3.2.3):

  ( 0 . . . 0 1 ∗ . . . ∗ 0 ∗ . . . ∗ 0 ∗ . . . 0 ∗ . . . ∗ )
  ( 0 . . . 0 0 0 . . . 0 1 ∗ . . . ∗ 0 ∗ . . . 0 ∗ . . . ∗ )
  ( 0 . . . 0 0 0 . . . 0 0 0 . . . 0 1 ∗ . . . 0 ∗ . . . ∗ )
  (                        . . .                            )
  ( 0 . . . 0 0 0 . . . 0 0 0 . . . 0 0 0 . . . 1 ∗ . . . ∗ )
  ( 0 . . . 0 0 0 . . . 0 0 0 . . . 0 0 0 . . . 0 0 . . . 0 )
  (                        . . .                            )
  ( 0 . . . 0 0 0 . . . 0 0 0 . . . 0 0 0 . . . 0 0 . . . 0 )

Suppose the first p rows r1 , . . . , rp are ≠ 0 and suppose

α1 r1 + α2 r2 + · · · + αp rp = 0.

Now consider the columns containing only zeros except for one 1. Then we find,
respectively,
α1 = 0, α2 = 0, . . . , αp = 0.


4.3.10 The considerations above fully explain the techniques announced in 4.2.27. From
Theorems 4.2.10 and 4.3.9 we find how to construct a basis for a given span in Rn

or Cn : Consider the spanning vectors as rows of a matrix, take the row reduced
echelon form of this matrix and take the nonzero rows. In an ‘abstract’ vector
space we can use these techniques if we use coordinates.

4.3.11 Example. Consider the following vectors in R4 :


a = (1, 0, 2, 0), b = (1, 1, 2, 1), c = (2, −1, 4, −1) and d = (1, 3, 2, 3).
Suppose V =< a, b, c, d >. Now form the matrix

  [ 1  0 2  0 ]
  [ 1  1 2  1 ]
  [ 2 −1 4 −1 ]
  [ 1  3 2  3 ]

Row reducing doesn’t change the span of the rows. The row reduced form is:

  [ 1 0 2 0 ]
  [ 0 1 0 1 ]
  [ 0 0 0 0 ]
  [ 0 0 0 0 ]

The nonzero rows (1, 0, 2, 0) and (0, 1, 0, 1) produce a basis of V .
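The same computation can be carried out in a computer algebra system; a sketch assuming Python with sympy (our own choice of tool):

from sympy import Matrix

# The spanning vectors a, b, c, d as rows; row reduction does not change their span.
M = Matrix([[1, 0, 2, 0],
            [1, 1, 2, 1],
            [2, -1, 4, -1],
            [1, 3, 2, 3]])
reduced, pivots = M.rref()
print(reduced)   # nonzero rows (1, 0, 2, 0) and (0, 1, 0, 1), plus two zero rows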

4.3.12 Example. Consider the following polynomials in the vector space of polynomials
of degree at most 2
p1 : p1 (x) = x2 + 2x − 3,
p2 : p2 (x) = x2 − 2x + 1,
p3 : p3 (x) = x2 − 5x + 4.
and consider their span V = < p1 , p2 , p3 >. Choose as basis: (1, x, x2 ). With
respect to this basis, the coordinate vectors of the three polynomials are
p1 : (−3, 2, 1),
p2 : (1, −2, 1),
p3 : (4, −5, 1).
Next we use Theorems 4.3.7 and 4.3.9 to find a basis for V . Collect the coordinate
vectors as rows in a matrix and row reduce:
   
  [ −3  2 1 ]     [ 1 0 −1 ]
  [  1 −2 1 ]  ∼  [ 0 1 −1 ] .
  [  4 −5 1 ]     [ 0 0  0 ]
So the first two rows yield the basis (1, 0, −1), (0, 1, −1) of the span of the coor-
dinate vectors of p1 , p2 , p3 . So the polynomials 1 − x2 and x − x2 are a basis of
V.
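The row reduction of the coordinate vectors can be checked in the same way (again a sketch assuming Python with sympy, our own choice of tool):

from sympy import Matrix

# Coordinate vectors of p1, p2, p3 with respect to the basis (1, x, x^2), as rows.
M = Matrix([[-3, 2, 1],
            [1, -2, 1],
            [4, -5, 1]])
reduced, _ = M.rref()
print(reduced)   # Matrix([[1, 0, -1], [0, 1, -1], [0, 0, 0]])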

4.4 Notes
The notion of a vector space (or linear space) is the central concept in linear
algebra; it is the (or a) formalized version of our intuition of space. Its strength
lies in the fact that vector spaces can be used in many different situations. One of
the first to describe vector spaces using axioms was Giuseppe Peano (1858–1932).
In this course we only touch upon the precise role of the axioms. Probably you do
not even notice that we use rules such as 0 + 0 = 0 and 0 · a = 0, which we didn’t
prove (but see the exercises).
Vector spaces are used in many different situations, also outside mathematics.
For instance, to describe notions like speed, acceleration, force and impulse in
mechanics, and fields in electromagnetism. In signal theory (to handle visual or
audio signals) and in quantum mechanics vector spaces of functions are important.
They tend to be infinite dimensional.
As for geometry, although vector spaces are used to model ‘flat’ objects like
lines and planes, vector spaces are also of help in describing tangent spaces to
curved objects.
Finally, instead of using real or complex numbers, also other systems of scalars
are possible, and most results still hold! In coding theory and cryptology, such
number systems, like the integers modulo 2, are used (i.e., the numbers 0 and 1
with the rules 0 + 0 = 0, 0 + 1 = 1 + 0 = 1, 1 + 1 = 0, 1 · 0 = 0 · 1 = 0 and 1 · 1 = 1).

4.5 Exercises
§1

1 In each of the following cases decide if the subsets of R3 are linear subspaces of
R3 :
W1 = {(x1 , x2 , x3 ) ∈ R3 | x2 ∈ Q},
W2 = {(x1 , x2 , x3 ) ∈ R3 | x1 + 2x2 + 3x3 = 0},
W3 = {(x1 , x2 , x3 ) ∈ R3 | x1 + x2 = 1},
W4 = {(x1 , x2 , x3 ) ∈ R3 | x1 ≥ 0}.

2 In each of the following cases decide if the subsets of C3 are linear subspaces of
C3 :
W1 = {(z1 , z2 , z3 ) ∈ C3 | z1 + iz2 + (1 + i)z3 = 0},
W2 = {(z1 , z2 , z3 ) ∈ C3 | z1 + iz2 + (1 + i)z 3 = 0},
W3 = {(z1 , z2 , z3 ) ∈ C3 | Re(z1 ) + Im(z2 ) = 0},
W4 = {(z1 , z2 , z3 ) ∈ C3 | z 1 + iz 2 = 0}.

3 Check whether the following subsets of the vector space of 2 × 3-matrices with real
entries are linear subspaces:

a. the matrices of the form  


  [ a11  0   0 ]
  [ a21 a22  0 ] ,

b. the matrices  
  [ a11 a12 a13 ]
  [ a21 a22 a23 ] , where a11 + a22 = 0.

4 Consider the vector space V of all functions defined on R. Check whether the
following subsets of V are linear subspaces:

a. the functions f with f (1) = 0,

b. the functions f with f (1) = 2,

c. the functions f with f (0) = f (1),

d. the functions f with limx→∞ f (x) = 0,

e. the functions f with f (0) = 1 + f (1),

f. the functions f with f (x) = f (−x) for all x,

g. the functions f with f (x) ≥ 0 for all x ∈ R,



h. the functions f with f a real polynomial of degree ≥ 3.

5 Determine a parametric representation of each of the following planes in R3 :

a. x + 4y − 5z = 7,

b. 2x − 4y + z = 0,

c. 2x + 4y + 4z = 7.

6 Determine an equation of each of the following planes in R3 :

a. V : x = (3, 0, 1) + λ(1, 1, 2) + µ(0, 0, 1),

b. V : x = (1, 1, 1) + λ(2, 3, 1) + µ(1, 0, 1),

c. V : x = (1, 1, 3) + λ(0, 1, 1) + µ(−1, 0, 1).

7 Determine a parametric representation of each of the following planes in R4 :

a. V = {(x, y, z, u) | x + 2y − 3z − u = 2, 2x + y + 6z + u = 7},

b. V = {(x, y, z, u) | y + 2z − 2u = 1, 3y + 6z = 9}.

8 Consider the following planes in R4 which are given in parametric representation.


Describe each of them by a system of two linear equations:

a. U : x = (1, 0, 0, 0) + λ(1, 1, 1, 0) + µ(1, 2, 2, 2),

b. V : x = (−1, 1, 3, 5) + λ(2, −1, 1, 1) + µ(−1, −1, 1, −2).

9 Let l be the line in R3 through (3, 2, 1) and (−3, 5, 4) and let V be the plane with
equation 3x − y + 2z = 4. Determine the intersection of l and V .

10 Let U and W be linear subspaces of the vector space V .

a. Prove that the intersection U ∩ W is also a linear subspace of V .

b. Prove that the subset { u + w | u ∈ U, w ∈ W } is a linear subspace of V .


(This subspace is often denoted by U + W .)

c. Show by example that the union U ∪ W is not necessarily a linear subspace of


V.

§2, 3

11 Show that:

a. (1, 3) and (3, 9) form a linearly dependent set of vectors in R2 ,

b. (1, 2) and (2, −1) form a linearly independent set of vectors in R2 ,

c. (0, 1, 2), (1, 2, 3), (1, 1, 1) form a linearly dependent set of vectors in R3 ;

d. (−1, 5, 5, 3), (−1, 2, 1, 1), (1, 1, 3, 1) form a linearly dependent set of vectors in
R4 ,

e. in R3 , the vector (1, 2, 1) is not a linear combination of (1, 3, 2) and (1, 1, 1),

f. in R3 , the vector (1, 1, 1) is a linear combination of (3, −1, 4), (1, −3, 2) and
(2, 6, 1),

g. in R3 , the vector (1, 0, 0) is not a linear combination of (3, −1, 4), (1, −3, 2)
and (2, 6, 1).

12 Which of the following systems are linearly independent:

a. e2t , sin t, cos t in the space of all functions on R,

b. eit , sin t, cos t in the space of all functions on R,

c. the polynomials z 2 + 1, z 3 + z and z + i in the vector space of polynomials.

d. 2, t, sin t, cos t in the space of all functions on R,

e. 2, t, sin2 t, cos2 t in the space of all functions on R,

13 Determine a basis of each of the following spans:

a. < (1, 2), (2, 3) > in R2 ,

b. < (1, 1, 1), (1, 2, 1), (1, 0, 1) > in R3 ,

c. < (1, 0, 0, 1), (2, 1, 1, 3), (0, 0, 1, 0) > in R4 ,

d. < (3, −1, 4, 7), (1, −3, 2, 5), (2, 6, 1, −2) > in R4 ,

e. < (3, −1, 4, 7), (1, −3, 2, 5), (5, 3, 2, −1) > in R4 ,

f. < (3, −1, 4, 7), (1, −3, 2, 5), (2, 6, 1, −2), (0, 4, −1, 4) > in R4 .

14 Check if the vectors (2, −2, 7, 5), (i, 1 + i, i, 1), (2 + 3i, −3 + 2i, 1, −2 + 2i) belong
to the span < (3, −2, 3, 1), (2, 1, −2, −1), (1, 1, 2, 3) > in C4 .

15 In the vector space of polynomials the following polynomials are given: f (x) =
x + 1, g(x) = (x + 1)2 . Check if the polynomials x2 + 3x + 1, x2 − 1, 3x2 − 4x − 7
belong to the span < f, g >.

16 Let a and b be two vectors in a vector space. Prove that < a, b >=< a − b, a + b >.

17 Suppose the vectors a, b, c are linearly independent. Determine whether the fol-
lowing systems are linearly dependent:
a. a + b, a + b − c, 2a + b + c;

b. a + b + c, a + 2b, c − b;

c. a + 2b, a + c, c.

18 Let a1 , . . . , ak be a basis of the linear subspace U of the vector space V . Let


ak+1 , ak+2 , . . . , an be vectors in V . Show that a1 , a2 , . . . , an is a basis of V if and
only if the following holds:

ak+1 ∉ U, ak+2 ∉ < a1 , a2 , . . . , ak+1 >, . . . , an ∉ < a1 , a2 , . . . , an−1 > .

19 Determine a parametric representation of the plane containing the point (0, 1, 1)


and the line x = (1, −1, 0) + λ(2, 1, 1). Also determine an equation of this plane.

20 In R3 the subset V and the line l are described as follows

V : x = (1, 1, 1) + λ(1, a, a2 ) + µ(1, a, 4), l : x = σ(1, 3, 3).

a. For which values of a is V a plane?

b. For which values of a do l and V not intersect (which means that l and V are
parallel)?

21 a. Determine all vectors in R3 which belong to both of these spans:

U1 =< (−4, 1, 3), (−2, 3, 1) > and U2 =< (−1, 5, 4), (3, −1, 2) >,

i.e., determine the intersection U1 ∩ U2 .

b. Determine all vectors in R4 which belong to both of the following spans:

V1 =< (4, 3, 2, 1), (1, 0, 0, 0) > and V2 =< (2, 1, 0, 0), (3, 2, 1, 0) >,

i.e., determine V1 ∩ V2 .

22 Determine a basis and the dimension of the following subspaces of C3 :

a. < (3, −1, 4), (1, −3, 2), (5, 3, 2) >,

b. < (i, 1 + 2i, 1 + i), (i, 1 + 3i, 2 + 2i), (2 + 2i, 5 + i, 2) >,

c. < (2i, 3i, i), (1 + i, 2 + i, i), (−1 + 2i, −1 + 3i, −1 + i) >.

23 Determine a basis and the dimension of each of the following spans (in the space
of functions from R to R):

a. < e2t , t2 , t >,

b. < 2, t, sin2 t, cos2 t >,

c. < e2t , e−t , cosh t, sinh t >,

d. < 2x3 + x2 − x + 5, x3 + 2x2 + 10, −2x2 + x, 2x3 − 8x2 + x − 10 >.

24 Determine the dimension of each of the following subspaces:

a. the subspace of P5 of the polynomials which are zero at 2,

b. the space M3,3 of 3 × 3–matrices,

c. the subspace of M3,3 consisting of the matrices A with A = A⊤ ,

d. the subspace of M3,3 consisting of the matrices A for which A + A⊤ = 0,

e. the subspace of M2,3 consisting of the matrices A, for which


 
      [ 1 ]
  A · [ 2 ]  = 0,
      [ 3 ]

f. the subspace of M2,2 consisting of the matrices A, for which


   
     [ 1 1 ]   [ 1 1 ]
  A  [ 1 2 ] = [ 1 2 ]  A.

25 In the vector space V the linearly independent set a, b, c is given. Determine the
dimension of each of the following spans:

a. < a − b, a + b + c, −2a − c >,

b. < a − b, a + b, a + b + c >,

26 Determine the coordinates of each of the following vectors with respect to the given
bases:
a. (2, 3) with respect to (1, 0), (1, 1),
b. (1, 2, 3) with respect to (1, 1, 1), (1, 0, 1), (0, 0, 1),
c. x2 with respect to 1 − x, 1 − x2 , x + x2 ,
d. cos 2t with respect to 4, sin2 t.

27 In this problem we will prove some further properties of vectors. We need the
eight axioms mentioned in 4.1.2.
a. If a + 0 = a + b, then b = 0. Prove this by adding the opposite −a of a to
both sides.
b. For all scalars λ we have λ · 0 = 0. Indicate which axioms are used in the following
derivation. We have: λ · 0 = λ(0 + 0) = λ · 0 + λ · 0. But we also have: λ · 0 =
λ · 0 + 0. So λ · 0 + λ · 0 = λ · 0 + 0, so that part a. implies λ · 0 = 0.
c. This item focuses on the equality 0 a = 0 (for every a). Finish the following
proof: 0 a = (0 + 0)a = 0 a + 0 a.
d. The zero vector 0 is unique, i.e., if 0′ also satisfies a + 0′ = a for all a, then
0 = 0′ . Prove this by considering the expression 0 + 0′ .
e. The opposite −a of a given vector a is unique. Suppose b is also an opposite
of a. Then finish the following chain of equalities to provide the proof:
−a = −a + 0 = −a + (a + b).

28 Prove: a system of vectors a1 , . . . , an in a vector space V is a basis of V if and


only if every vector from V can be written as a linear combination of the vectors
a1 , . . . , an in exactly one way.

29 V is a finite dimensional vector space; U and W are subspaces of V .


a. Assume that each vector in V can be written as a sum of a vector from U
and a vector from W . (Then we write V = U + W .) Show that dim(V ) ≤
dim(U ) + dim(W ).
b. If V = U + W and U ∩ W = {0}, then every vector from V can be written
in exactly one way as a sum of a vector from U and a vector from W (i.e. if
x = u + w = u′ + w′ with u, u′ ∈ U and w, w′ ∈ W , then u = u′ and w = w′ ).
Derive this from the relation u − u′ = w′ − w. Also show that in this situation
dim(V ) = dim(U ) + dim(W ).

4.5.1 Exercises from old exams


30 a. If a, b, c is a linearly independent system of vectors in a vector space V , then
prove that a + b, a − b, a − 2b + c is a linearly independent system.

b. In the real vector space P2 of polynomials of degree at most 2, the following


polynomials are given:

p1 (x) = x2 + 2x − 3, p2 (x) = x2 + cx + 1, p3 (x) = x2 − 5x + 4.

For which value(s) of c is p1 (x), p2 (x), p3 (x) a basis of P2 ?

31 C ∞ (R) is the vector space of infinitely differentiable functions f : R → R.

a. Determine a basis of the subspace

U = < 1 + 2x, 4x − e3x , 2 + e3x >

of C ∞ (R). What is the dimension of U ?

b. Determine the intersection U ∩ W , where W = < 1, ex , e3x >.

32 a. Let U = { (x, y, z, u) | x+y +z = 0, y +z +u = 0 } be a plane in R4 . Determine


a vector parametric representation of U .

b. Determine the intersection of the plane U from part a) and the plane

V : x = (2, 2, 2, 1) + λ(1, 3, 2, 0) + µ(1, 2, 3, 0).


Chapter 5

Rank and inverse of a matrix, determinants

5.1 Rank and inverse of a matrix


5.1.1 In this section we first concentrate on the linear subspaces spanned by the rows
and columns of a matrix, respectively. Then we study the relation with systems
of linear equations and with the inverse of a matrix.

5.1.2 Definition. (Row and column space) Let A be a matrix with n rows and m
columns. Then every row has m entries so that these rows can be seen as vectors
in Rm or Cm ; the subspace spanned by the rows is called the row space of the
matrix. Similarly, every column is an element of Rn or Cn ; the space spanned by
the columns is called the column space of the matrix.

5.1.3 We agree to consider sequences of n numbers, written in column form, as elements
of Rn or Cn , respectively, and, when convenient, to consider elements of Rn or Cn
as columns. Of course, we try to avoid any confusion in
doing so. For instance, we write: the system Ax = b with x ∈ Rn , where x is then
seen as a column vector.

5.1.4 Example. Let

      [ 1 −1 3 7 ]
  A = [ 2  1 1 5 ] .
Then the row space is

< (1, −1, 3, 7), (2, 1, 1, 5) > ⊆ R4


and the column space is

< (1, 2), (−1, 1), (3, 1), (7, 5) > ⊆ R2 .

5.1.5 The row and column spaces of a matrix seem to be quite unrelated, since they are
usually subspaces of different vector spaces. Yet their dimensions are the same!
To show this we first connect the matrix product to linear combinations of the
columns (or rows). The following example shows how a matrix product can be
rewritten as a linear combination of the columns of the 3 × 2–matrix.
 
  3  
1 3 −1   3 · 1 + 2 · 3 + 6 · (−1)
2 =
2 −2 5 3 · 2 + 2 · (−2) + 6 · 5
6      
1 3 −1
=3 +2 +6 .
2 −2 5

In general, the matrix product


  
  [ a11 a12 · · · a1m ] [ x1 ]
  [  ⋮            ⋮   ] [ ⋮  ]
  [ an1 an2 · · · anm ] [ xm ]

can be rewritten as

       [ a11 ]        [ a12 ]               [ a1m ]
  x1   [  ⋮  ]  + x2  [  ⋮  ]  + · · · + xm [  ⋮  ]  ,
       [ an1 ]        [ an2 ]               [ anm ]

i.e., as a linear combination of the columns k 1 , . . . , k m of the matrix A. In a similar


way, the product
 
                   [ a11 a12 · · · a1m ]
  ( y1 · · · yn )  [  ⋮            ⋮   ]
                   [ an1 an2 · · · anm ]

can be viewed as a linear combination y1 r1 + · · · + yn rn of the rows r1 , . . . , rn of


the matrix. A first conclusion from these considerations is the following.

5.1.6 Theorem. The system of linear equations (in matrix form) Ax = b has a solution
if and only if b belongs to the column space of A.
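The observation that Ax is a linear combination of the columns of A is easy to check numerically; the sketch below reuses the 2 × 3 example from above (assuming Python with sympy, our own choice of tool, not part of the notes):

from sympy import Matrix

A = Matrix([[1, 3, -1],
            [2, -2, 5]])
x = Matrix([3, 2, 6])
# A*x equals 3*(first column) + 2*(second column) + 6*(third column) of A.
combo = 3*A.col(0) + 2*A.col(1) + 6*A.col(2)
print(A*x == combo, (A*x).T)   # True Matrix([[3, 32]])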

5.1.7 Two linear combinations of the columns can be described by using an m × 2 matrix
instead of an m × 1 matrix. Likewise for more than two linear combinations. For
example, in the following matrix product the two columns of the 2×2–matrix from
the right-hand side are linear combinations of the three columns of the 2×3-matrix
from the left-hand side:
 
  [ 1  3 −1 ] [ 3  1 ]   [  3 −13 ]
  [ 2 −2  5 ] [ 2 −4 ] = [ 32  20 ] .
              [ 6  2 ]

Now suppose that the columns of A span a subspace of dimension k. Let c1 , . . . , ck


be a basis of this space. Arrange these vectors as columns in an n × k–matrix C.
Each of the m columns of A is a linear combination of these columns of C; these
linear combinations can be assembled in the matrix product

CX = A,

in which X is a k × m–matrix.
Now concentrate on the rows in this equality: every row of A is a linear com-
bination of the rows of X. Since the number of rows of X equals k, the row space
has dimension at most k. So

dim(rowspace) ≤ dim(columnspace).

Note that this inequality applies to any matrix. In particular, if we apply it to AT
and note that the dimension of the row space (column space) of AT equals the
dimension of the column space (row space) of A, we find:

dim(columnspace) ≤ dim(rowspace).

Summarizing:

5.1.8 Theorem. For every matrix A the dimension of the row space equals the dimen-
sion of the column space.

5.1.9 Definition. (Rank) The rank of a matrix is by definition the dimension of its
row or column space. Notation: rank(A).

5.1.10 Determining the rank of a matrix is straightforward: row reduce till you reach the
row reduced echelon form and count the number of nonzero rows.
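As a small illustration of this recipe (our own example, not from the notes), a computer algebra system can do the counting for us; a sketch assuming Python with sympy:

from sympy import Matrix

A = Matrix([[1, 2, 3],
            [2, 4, 6],
            [1, 1, 1]])
reduced, pivots = A.rref()
# The number of pivots equals the number of nonzero rows in the echelon form.
print(len(pivots), A.rank())   # 2 2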
In the remainder of this section we concentrate on n × n–matrices.

5.1.11 Theorem. For a n × n–matrix A the following statements are equivalent:



1. the rank of A is n;

2. the rows of A are linearly independent;

3. the columns of A are linearly independent;

4. the row reduced echelon form of A is the identity matrix.

Proof. We first show that 1), 2) and 3) are equivalent, and then that 1) and 4) are
equivalent.
If the rank of A is n, then the n rows and the columns span an n-dimensional
space, hence must be linearly independent. So 1) implies 2) and 3). Conversely, if
the n rows (or columns) are linearly independent, then the rows (columns) span a
n-dimensional space and so the rank must be n. So 2) implies 1), and 3) implies
1). Hence 1), 2), 3) are equivalent.
Since A and its row reduced echelon form have the same rank, we see that 4)
implies 1). Now suppose A has rank n and consider the last row of the row reduced
echelon form. It can’t be the zero row, because then the row space is spanned by
n − 1 rows and its dimension would be at most n − 1. So there must be a 1 in the
last row. Since the last row starts with more zeros than the n − 1-th row, and this
n − 1-th row starts with more zeros than the n − 2-th row, etc, this 1 must be in
position n, n. In a similar way, you show that, for k = 1, . . . , n − 1, the k-th row
is the k-th row of the identity matrix (or use induction). So 1) implies 4) and we
are done. 

5.1.12 Corollary. A system Ax = b of n linear equations in n variables has exactly one


solution if and only if the rank of the coefficient matrix equals n.

Proof. If the rank of the coefficient matrix is n, then the columns are a basis of
Rn (or Cn ). Every vector in Rn (or Cn ) can then be written in a unique way
as a linear combination of the columns, i.e., the system Ax = b has exactly one
solution.
If the rank of A is less than n, then the columns of A are not linearly indepen-
dent. This means that the system has infinitely many solutions, or no solutions
at all (this last case means that b is not in the column space of A). So if the system
has exactly one solution, the rank of A must be n. 

5.1.13 Theorem. If the m × n–matrix A has rank k, then the solution space of the
homogeneous system of linear equations Ax = 0 in n variables has dimension
n − k. In other words: the dimension of the solution space equals the number of
variables minus the number of ‘independent’ conditions.

Proof. Since neither the rank of A nor the solution space of Ax = 0 changes if we replace
A by its row reduced echelon form, we can restrict our attention to the case that
A is already in row reduced echelon form. The rank of A is then equal to the
number of nonzero rows (see Theorem 4.3.9). The first place (from the left) in row
j containing a 1 corresponds to variable xij , say. So the k variables xi1 , . . . , xik
cannot be assigned a value arbitrarily, while the remaining n − k variables, say
xj1 , . . . , xjn−k in spots j1 , . . . , jn−k can be assigned arbitrary values, λj1 , . . . , λjn−k ,
say. The solutions can then be written as λj1 aj1 + · · · + λjn−k ajn−k . The vector
ajl has a 1 in position jl , while the remaining vectors have a zero in that position.
This implies that these n − k vectors are linearly independent (why?). So the
dimension of the solution space is n − k. 
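The statement dim(solution space) = n − rank(A) can be checked on a small example (a sketch assuming Python with sympy; the matrix is our own choice, not from the notes):

from sympy import Matrix

# A 2 x 4 matrix of rank 2, so the solution space of Ax = 0 should have dimension 4 - 2 = 2.
A = Matrix([[1, 2, 0, 1],
            [0, 0, 1, 1]])
print(A.rank(), len(A.nullspace()))   # 2 2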

5.1.14 Inverse of a matrix


Finally, we use the previous results to discuss a procedure to find the inverse of
an n × n–matrix, if this inverse exists. The main result is the following theorem;
the proof provides a way to compute inverses.

5.1.15 Theorem. The inverse of an n × n–matrix A exists if and only if rank(A) = n.


Proof. Let A be an n × n–matrix of rank n. Then the columns k 1 , . . . , k n of A
are linearly independent and are a basis of Rn (or Cn ). We are looking for an
n × n–matrix X with elements xij so that
 
  1 0 ··· 0
x11 x12 · · · x1n
 .. .. ..  =  0 1 · · · 0  .
 
A . . .   .. . .
 . . . .. 

xn1 xn2 · · · xnn
0 ··· 0 1

The elements of the i-th column (x1i , . . . , xni ) of X should satisfy x1i k 1 + · · · +
xni k n = ei . Since the columns of A are a basis of Rn (or Cn ), such a column exists
(and is in fact unique). So there is a unique inverse.
Conversely, if an inverse matrix X exists, then we conclude from

x1i k 1 + · · · + xni k n = ei for all i

that e1 , . . . , en are in the column space of A. The n columns of A therefore span the
n-dimensional space Rn (or Cn ), hence must be linearly independent (otherwise
this n-dimensional space could be spanned with less columns contradicting the fact
that the dimension is n). So the rank of A is n by Theorem 5.1.11. 

5.1.16 Computing inverses


The proof just given provides a way to compute inverses. To find the i-th column

of the inverse, you need to solve the system of linear equations with extended
matrix (A|ei ). Since the resulting n systems all have the same coefficient matrix
A, these systems can be solved simultaneously! To actually do this, consider the
matrix (A|e1 , . . . , en ) = (A|I). Row reduce and read off the solutions on the right
of the vertical bar: (A|I) is being reduced to (I|A−1 ) if the inverse exists. Whether
this inverse exists can be concluded during the process: if the rank of A turns out
to be less than n, then row reducing of (A|I) produces a matrix in which the last
row is
(0, . . . , 0| ∗ . . . ∗),
where the last n ∗-s cannot all be 0 (the rows of I are linearly independent so must
remain nonzero in the row reducing process). So the system has no solutions.
In Linear Algebra 2 we will be able to prove in a simple way that if a matrix
B satisfies A B = I, then automatically B A = I. The matrix computed according
to the above procedure is therefore indeed the inverse of A.

5.1.17 Example. To determine the inverse of the matrix


 
      [ 4 0 1 ]
  A = [ 2 2 0 ]
      [ 3 1 1 ]

we consider (A|I), i.e.,

  [ 4 0 1 | 1 0 0 ]
  [ 2 2 0 | 0 1 0 ]
  [ 3 1 1 | 0 0 1 ] ,
and row reduce:
  [ 1 0 0 |  1/2  1/4 −1/2 ]
  [ 0 1 0 | −1/2  1/4  1/2 ]
  [ 0 0 1 |  −1   −1    2  ] .
The first column of the inverse matrix is then the first column on the right of the
vertical bar, etc. So we find:
 1 1
− 21
  
2 4 2 1 −2
1
A−1 =  − 12 14 1 
2 =  −2 1 2 .
4
−1 −1 2 −4 −4 8
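The computation in this example can be double-checked as follows (a sketch assuming Python with sympy; not part of the original notes):

from sympy import Matrix, eye

A = Matrix([[4, 0, 1],
            [2, 2, 0],
            [3, 1, 1]])
A_inv = A.inv()
print(A_inv)                 # Matrix([[1/2, 1/4, -1/2], [-1/2, 1/4, 1/2], [-1, -1, 2]])
print(A * A_inv == eye(3))   # True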

5.2 Determinants
5.2.1 In the previous section we have seen that an n × n-matrix A of rank n has some
pleasant properties: its inverse exists and the system of linear equations (A|b) has

exactly one solution. Determining the rank of a matrix can be done using row
reduction and a simple count of nonzero rows. A second technique, which we will
discuss, is to determine the so-called determinant of an n × n-matrix. This is a
number which is nonzero if and only if the rank of the matrix equals n. Computing
determinants is fairly easy (at least if n is relatively small), but the theory behind
it is more complicated.
The plan for this section is as follows.
• First we discuss the case of 2 × 2–determinants in detail, because almost all
aspects of determinants can be illustrated in this case.
• Then we turn to the definition of a general n × n–determinant.
• Next we concentrate on various ways to compute determinants, like expand-
ing with respect to a row or column and the role of row reducing.
• Finally, we discuss the connection with systems of linear equations (Cramer’s
rule) and with inverse matrices.

5.2.2 Definition. (2 × 2–determinant) The 2 × 2–determinant of a 2 × 2–matrix


 
      [ a b ]
  A = [ c d ]

is the number ad − bc. Notations: det(A) and

  | a b |
  | c d | .

5.2.3 It’s easy to show that det(A) ≠ 0 ⇔ rank(A) = 2. If, for instance, the second row
is a multiple of the first, say (c, d) = λ(a, b), then it easily follows that det(A) =
a(λb) − b(λa) = 0. Similarly, if the first row is a multiple of the second row.
Conversely, if det(A) = 0 and a ≠ 0, then from ad = bc we get d = (bc/a) so that
(c, d) = (c/a)(a, b). Finish the proof yourself by considering the case a = 0.
The number det(A) also plays a role in solving the system of equations Ax = p
in two variables. If A1 is the matrix obtained from A by replacing the first column
by p, and A2 the matrix obtained from A by replacing the second column of A by
p, then the unique solution of the system

ax1 + bx2 = p1
cx1 + dx2 = p2
is

  x1 = det(A1 )/det(A) = (p1 d − b p2 )/(ad − bc),     x2 = det(A2 )/det(A) = (a p2 − p1 c)/(ad − bc),

provided det(A) ≠ 0. You can check that this is really the solution, but we will
come across a nice proof in 5.2.22. There is a similar formula for the inverse of a
2 × 2–matrix of rank 2.
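The claimed solution formulas can be verified symbolically; here is a minimal sketch (assuming Python with sympy, our own choice of tool, not part of the notes):

from sympy import symbols, Matrix, simplify

a, b, c, d, p1, p2 = symbols('a b c d p1 p2')
A = Matrix([[a, b], [c, d]])
x1 = (p1*d - b*p2) / (a*d - b*c)
x2 = (a*p2 - p1*c) / (a*d - b*c)
# Substituting the claimed solution into A x - p gives the zero vector (when ad - bc != 0).
print(simplify(A * Matrix([x1, x2]) - Matrix([p1, p2])))   # Matrix([[0], [0]])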
The determinant det(A) also plays a role in area computations. (We
haven’t defined areas exactly, so this discussion is only by way of illustration.)
Figure 5.1 provides a ‘proof by pictures’ of the fact that the area of the
parallelogram spanned by the vectors (a, b) and (c, d) is equal to ad − bc = det(A).
This number is called the ‘oriented area’ since it can be negative; the ‘true’ area
is obtained by taking the absolute value.

[Figure 5.1: The determinant and the area of a parallelogram.]

The expression ad − bc is not linear, but

is what is called bilinear (as will be discussed later). Most properties of the 2 × 2–
determinant recur when we discuss n × n–determinants. Consider the determinant
as a function of the two rows (or columns, but for the moment we focus on rows)
of the matrix: det(a1 , a2 ). Then the properties that we mean are mainly (for all
choices of vectors, scalars):
1. bilinearity:

det(λ1 b1 + λ2 b2 , a2 ) = λ1 det(b1 , a2 ) + λ2 det(b2 , a2 ),


det(a1 , λ1 b1 + λ2 b2 ) = λ1 det(a1 , b1 ) + λ2 det(a1 , b2 ),

(sometimes we say: linear in both entries);

2. antisymmetry: det(a1 , a2 ) = − det(a2 , a1 ) (interchanging two vectors in-


troduces a minus sign); if the two vectors are equal, then we conclude:
det(a, a) = 0 since this number is equal to its negative;

3. normalization: det(e1 , e2 ) = 1.
These properties are easy to verify. For instance, the second property follows from:

a11 a12 a21 a22


= a11 a22 − a12 a21 = −(a21 a12 − a22 a11 ) = − .
a21 a22 a11 a12
5.2 Determinants 141

It seems a bit pompous to describe an easy expression like ad − bc with these abstract
looking properties, but in higher dimensions the description with the properties is
extremely useful as opposed to explicit formulas for n × n determinants.
The determinant is unique in the sense that any function D of pairs of vectors
having the three properties mentioned above must be the determinant function.
To show this, we first note that D(a, a) = 0 (use antisymmetry). Next, using
bilinearity we find:

D((a, b), (c, d)) = D(ae1 + be2 , ce1 + de2 )


= aD(e1 , ce1 + de2 ) + bD(e2 , ce1 + de2 )
= acD(e1 , e1 ) + adD(e1 , e2 ) + bcD(e2 , e1 ) + bdD(e2 , e2 )
= ad D(e1 , e2 ) + bc D(e2 , e1 )
= (ad − bc)D(e1 , e2 ) = ad − bc.

In higher dimensions a similar computation (which we will skip) shows that the
n × n determinant is unique.
Now we turn to the n × n determinant and start with a definition of a determinant
function of n vectors in Rn or Cn .

5.2.4 Definition. (Determinant function) A determinant function on Rn or Cn is a


function D that assigns to every n-tuple of vectors a1 , . . . , an a real (or complex)
number and has the properties:

1. Multilinearity:

  D(a1 , . . . , ai−1 , β1 b1 + · · · + βm bm , ai+1 , . . . , an )
      = β1 D(a1 , . . . , ai−1 , b1 , ai+1 , . . . , an ) + · · · + βm D(a1 , . . . , ai−1 , bm , ai+1 , . . . , an ),

and similar expressions for D(β1 b1 + · · · + βm bm , a2 , . . . , an ) up to D(a1 , . . . , an−1 , β1 b1 + · · · + βm bm ).
We sometimes say that D is linear in every entry.

2. Antisymmetry: by interchanging two vectors the determinant function


changes sign.

3. Normalization: D(e1 , e2 , . . . , en ) = 1.

5.2.5 Note that the definition doesn’t guarantee that determinant functions exist for all
n (for n = 2 we have seen that there is one). We will show that there is, for every

n, precisely one determinant function, and we will discuss ways to compute such
determinants. We will usually call the unique determinant function simply the
determinant.

5.2.6 Suppose D is a determinant function, then D satisfies the following properties.

• D(. . . , a, . . . , a, . . .) = 0, i.e., if two of the vectors a1 , . . . , an are equal then


the determinant is 0: on the one hand, by interchanging two vectors the
determinant changes sign because of antisymmetry, on the other hand the
determinant stays the same since the two vectors are the same. But if a
number equals minus that number, it has to be 0.

• D(a1 , . . . , an ) = 0 if the vectors a1 , . . . , an are not linearly independent.


Suppose for instance that

a 1 = α2 a 2 + · · · + αn a n .

Then
  D(a1 , a2 , . . . , an ) = D(α2 a2 + · · · + αn an , a2 , . . . , an )
                        = α2 D(a2 , a2 , . . . , an ) + · · · + αn D(an , a2 , . . . , an ) = 0

because of the previous item.

5.2.7 Using these conditions we can write out what the n × n determinant should be:
just like for 2 × 2 matrices, write every row as a linear combination of the standard
basis vectors and use the multilinearity to expand the determinant as a sum of
many determinants with standard basis vectors as rows (in some order). Here is
how to do that. Consider
 
      [ a11 a12 . . . a1n ]
  A = [ a21 a22 . . . a2n ]
      [  ⋮            ⋮  ]
      [ an1 an2 . . . ann ]

with rows a1 , . . . , an . Then write for every i


  ai = ai1 e1 + ai2 e2 + · · · + ain en

and so
  D(a1 , . . . , an ) = D( ∑_{j1 =1}^{n} a1j1 ej1 , . . . , ∑_{jn =1}^{n} anjn ejn )
                     = ∑_{j1 =1}^{n} · · · ∑_{jn =1}^{n} a1j1 · · · anjn D(ej1 , . . . , ejn ) .

This is a sum with many terms if n gets big: there are n summation indices each
one of which assumes n values, so the number of terms is nn . For n = 8 this
already amounts to 16,777,216 terms.
In reality there are fewer terms, since if two of the indices are equal, then we are
dealing with a determinant with two equal rows and such a determinant is 0. So,
if D exists, then
  D(a1 , . . . , an ) = ∑_{all ji distinct} a1j1 · · · anjn D(ej1 , . . . , ejn ) .

Now which indices (j1 , . . . , jn ) occur in this sum? From the fact that all numbers
j1 , . . . , jn should be distinct and lie between 1 and n we conclude that in
(j1 , . . . , jn ) every number between 1 and n occurs precisely once. Such a sequence
is called a permutation of the numbers 1, . . . , n. For example, the permutations of
1, 2, 3 are
(1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), (3, 2, 1).

These permutations can be listed as follows: choose an element from {1, 2, 3}. This
can be done in three ways. Then there are two left to choose from in the next
step. After that, there is only one choice left for the third element. So there are
3 × 2 × 1 = 3! = 6 permutations of the numbers 1, 2, 3.
In general, there are n! = n · (n − 1) · · · 2 · 1 (‘n factorial’) permutations for the
numbers 1, 2, . . . , n.
The number of terms is therefore n!, which is substantially less than nn , but
still large. For instance, 8! = 40,320. Unfortunately, a further a priori reduction
is not possible (though we will see that there are ways to compute determinants
avoiding writing down all these terms). Since any sequence j1 , . . . , jn consists of
all numbers 1, . . . , n, by repeatedly interchanging two elements in the sequence in
a clever way (so-called transpositions), we can attain the sequence 1, . . . , n. Every
step in which we interchange two elements introduces a factor −1 , so that,
  D(a1 , . . . , an ) = ∑_{(j1 ,...,jn )} ± a1j1 · · · anjn ,          (5.1)

where the sum runs through all n! permutations of (1, . . . , n) and where a term is
preceded by +1 if the corresponding permutation can be changed into (1, . . . , n)
by an even number of ‘transpositions’, and by −1 otherwise.
The formula shows that for any n there is at most one determinant function
(defined by the formula (5.1) just given). Conversely, one can prove that (5.1)
satisfies the requirements from Definition 5.2.4 (the proof comes down to proving
that the parity of the number of transpositions, i.e., whether you need an even
or odd number, involved in changing a permutation into 1, . . . , n does not depend
on the choice of transpositions used). We will skip this proof, since it belongs to
the domain of algebra. Given that determinants exist, we now concentrate on the
cases n = 2 and n = 3. The discussions for n > 3 are similar.

5.2.8 2 × 2 determinant
Take n = 2 and consider the matrix with rows a1 = (a, b) and a2 = (c, d).
There are only two permutations in this case: (1, 2) and (2, 1). The first one
comes with a plus sign in (5.1) and the second is one step of interchanging away
from (1, 2) and so comes with a minus sign. So:

  | a b |
  | c d |  =  ad − bc ,

in agreement with definition 5.2.2.

5.2.9 3 × 3 determinant
Take n = 3 and consider
a11 a12 a13
a21 a22 a23 .
a31 a32 a33
There are 6 permutations of (1, 2, 3); the ones that come with a plus sign are
(1, 2, 3), (2, 3, 1), (3, 1, 2), and the ones with a minus sign are (1, 3, 2), (2, 1, 3),
(3, 2, 1). We find:

  | a11 a12 a13 |
  | a21 a22 a23 |  =  a11 a22 a33 + a12 a23 a31 + a13 a21 a32 − a11 a23 a32 − a12 a21 a33 − a13 a22 a31 .
  | a31 a32 a33 |

This expression is also known as Sarrus’ rule. This rule is easy to remember. Just
put copies of the first two columns on the right-hand side of the matrix,

a11 a12 a13 a11 a12


a21 a22 a23 a21 a22
a31 a32 a33 a31 a32

and then take the sum of the products of the elements on each of the diagonals
(from upper left to lower right) and subtract the products of the ‘anti diagonals’
(from upper right to lower left).
By inspection of the terms we see that every term contains exactly one of the factors
a11 , a12 , a13 , and that this factor occurs linearly, reflecting the fact that the determinant is indeed linear in the first
vector. It takes some more work to verify from the formula that interchanging a1
and a2 , or a2 and a3 , or a3 and a1 , only changes the sign of the determinant (and
we will not write out the details). Finally, if a11 = a22 = a33 = 1 and aij = 0
for i ≠ j, then D(e1 , e2 , e3 ) = 1. These considerations show that the determinant
exists for n = 3.

5.2.10 In general one can prove that expression (5.1) indeed defines a determinant function
for every n. The proof is quite involved and we will not provide details in this
course. The subtle point is the sign: one needs that the parity of the number of
steps you need in rewriting a given permutation doesn’t depend on the way you
actually carry out these steps.
In practice, the expression (5.1) is almost never useful. To actually compute
determinants there are much better ways than using this formula as we will see
below. Here are a number of results on determinants and some words on the
proofs.

5.2.11 Theorem. For every n × n-matrix A:

det(A) = det(A⊤ ) .

Sketch of proof. Note that (5.1) implies that every term in det(A) is a product of n
elements of the matrix in such a way that every row and every column contribute
to this product. The same holds for all terms of det(A⊤ ) so that det(A) and
det(A⊤ ) are sums of the same terms except maybe for the signs. That the signs
are the same is less trivial and is part of the theory of permutations. 

5.2.12 Theorem. All n × n matrices A and B satisfy

det(A B) = det(A) det(B).

Sketch of proof. We relate this property to the construction of a special multi-
linear function D that turns out to be the determinant up to a factor. To define
D(x1 , . . . , xn ) we collect the vectors as rows in a matrix X and define

D(x1 , . . . , xn ) = det(XB).

Then it is not difficult to show (but it takes some writing) that D is multilinear and
anti-symmetric. Since D(e1 , . . . , en ) = det(B) (and not necessarily 1), we conclude

that D must be det(B) times the determinant. In other words, D = det(B) det.
In particular, if we use the rows a1 , . . . , an of our matrix A, we get

det(AB) = D(a1 , . . . , an ) = det(B) det(A).

5.2.13 Practical rules for computing determinants


To actually compute a determinant more practical approaches are available. We
will concentrate on two of them: one involves row (or column) operations, the
other one concerns the so-called expansion of a determinant with respect to a
row or column in order to reduce the computation of an n × n determinant to the
computation of n determinants of size (n−1)×(n−1). In practice, the combination
of these two techniques is quite efficient.

5.2.14 Expansion with respect to a row or column


For a matrix A we denote by Aij the matrix obtained from A by deleting the i-th
row and the j-th column. We call Aij a submatrix of A. For example, if

$$A = \begin{pmatrix} -1 & 0 & 2 \\ 2 & 1 & 3 \\ 1 & 3 & -1 \end{pmatrix},$$

then

$$A_{12} = \begin{pmatrix} 2 & 3 \\ 1 & -1 \end{pmatrix} \quad \text{and} \quad A_{33} = \begin{pmatrix} -1 & 0 \\ 2 & 1 \end{pmatrix}.$$
Consider the n × n-matrix

$$A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ a_{21} & & \vdots \\ \vdots & \ddots & \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}.$$

Every term of det(A) contains factors from the first column. First consider those
terms containing the factor a11 . Then (5.1) shows that all other factors of such
a term do not come from the first row or column. These other factors therefore
come from the submatrix A11 . Next we consider the terms containing the factor
a21 . The other factors in such a term must come from A21 for similar reasons, etc.
A detailed inspection yields:

det(A) = a11 det(A11 ) − a21 det(A21 ) + · · · + (−1)^{n+1} an1 det(An1 ) .



This formula is known as expansion across the first column.

There are similar formulas for the expansion across any row or column. Such
expansions reduce the computation of a ‘big’ determinant to the computation of
smaller determinants.

5.2.15 Theorem. (Expansion across a row or column) A determinant can be computed by expansion across a row or column:

• Expansion across the i-th row:

$$\det(A) = \sum_{j=1}^{n} (-1)^{i+j}\, a_{ij}\, \det(A_{ij}).$$

• Expansion across the j-th column:

$$\det(A) = \sum_{i=1}^{n} (-1)^{i+j}\, a_{ij}\, \det(A_{ij}).$$
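These formulas translate directly into a recursive computation. The sketch below (not from the original notes, and exponentially slow, so purely an illustration) expands across the first column as in 5.2.14 and checks the answer against NumPy.

```python
# Minimal sketch of expansion across the first column (Theorem 5.2.15).
import numpy as np


def minor(A, i, j):
    """The submatrix A_ij: delete row i and column j."""
    return np.delete(np.delete(A, i, axis=0), j, axis=1)


def det_by_expansion(A):
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    # expansion across the first column; (-1)**i is the sign (-1)^(i+j) for j = 1
    return sum((-1) ** i * A[i, 0] * det_by_expansion(minor(A, i, 0))
               for i in range(n))


A = np.array([[-1, 0, 2],
              [2, 1, 3],
              [1, 3, -1]])          # the matrix used in 5.2.14
print(det_by_expansion(A))          # 20
print(round(np.linalg.det(A)))      # 20
```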

5.2.16 Example. Such an expansion is especially useful if a row or column contains many
zeros. The expansion across such a row or column then reduces the computation
to only a few smaller determinants. Here is an example concerning a so-called
(upper or lower) triangular matrix: a matrix whose entries above or below the
main diagonal are 0’s. Repeatedly expanding across the first column yields:
$$\det(A) = \begin{vmatrix} a_{11} & * & \cdots & \cdots & * \\ 0 & a_{22} & & & \vdots \\ \vdots & & \ddots & & \vdots \\ \vdots & & & a_{n-1,n-1} & * \\ 0 & \cdots & \cdots & 0 & a_{nn} \end{vmatrix}
= a_{11} \begin{vmatrix} a_{22} & * & \cdots & * \\ 0 & \ddots & & \vdots \\ \vdots & & a_{n-1,n-1} & * \\ 0 & \cdots & 0 & a_{nn} \end{vmatrix}$$

$$= a_{11} a_{22} \begin{vmatrix} a_{33} & \cdots & * \\ & \ddots & \vdots \\ 0 & & a_{nn} \end{vmatrix}
= \cdots = a_{11} a_{22} \cdots \begin{vmatrix} a_{n-1,n-1} & * \\ 0 & a_{nn} \end{vmatrix}
= a_{11} a_{22} \cdots a_{n-1,n-1}\, a_{nn}.$$


Of course, this result can also be obtained directly from (5.1); suppose all entries
below the main diagonal are 0, then if a term contains a factor from above the
diagonal, then there must also be a factor from below the diagonal which is 0. Such
a term is therefore 0. The only term that remains is the product of the entries on
the main diagonal.

5.2.17 Row operations and determinants


Row and column operations can be used to transform a matrix into one with
‘many’ 0’s so that expansion across a suitable row or column is efficient. Row and
column operations do influence the value of a determinant, so some bookkeeping
is required. Let A be a square matrix n × n. We first deal with row operations.
• The antisymmetry of the determinant function implies: interchanging two
rows changes the determinant by a minus sign. So if A changes into B by
interchanging two rows of A, then det(B) = − det(A).
• The multilinearity implies: if one row of A is multiplied by λ to produce
matrix B, then det(B) = λ det(A), or:
det(a1 , . . . , ai−1 , λai , ai+1 , . . . , an ) = λ det(a1 , . . . , ai , . . . , an ).

• The multilinearity also implies: if a multiple of one row of A is added to
another row of A, then the resulting matrix B has the same determinant as
A, i.e., det(B) = det(A).

The proofs are straightforward. By way of example, we prove the third property.
Suppose we add λaj to ai , i ≠ j. Then linearity implies:

det(. . . , ai + λaj , . . . , aj , . . .)

= det(. . . , ai , . . . , aj , . . .) + λ det(. . . , aj , . . . , aj , . . .).

The first determinant on the right-hand side is det(A), while the second one is 0
since two of the rows are equal, see 5.2.6.
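These three rules suggest a practical way to compute determinants: reduce the matrix with row operations while keeping track of swaps and pivots. Below is a minimal sketch (not part of the original notes; the pivot tolerance is an arbitrary choice for the illustration).

```python
# Sketch: determinant via row operations (5.2.17).  Swapping rows flips the sign,
# adding a multiple of one row to another changes nothing, and at the end the
# determinant is the product of the pivots.
import numpy as np


def det_by_elimination(M):
    A = M.astype(float)
    n = A.shape[0]
    det = 1.0
    for k in range(n):
        pivot = next((r for r in range(k, n) if abs(A[r, k]) > 1e-12), None)
        if pivot is None:
            return 0.0                      # no pivot in this column: det = 0
        if pivot != k:
            A[[k, pivot]] = A[[pivot, k]]   # row swap: sign change
            det = -det
        det *= A[k, k]
        for r in range(k + 1, n):
            A[r] -= (A[r, k] / A[k, k]) * A[k]
    return det


A = np.array([[0, 1, 2, -1],
              [2, 5, -7, 3],
              [0, 3, 6, 2],
              [-2, -5, 4, -2]])
print(det_by_elimination(A))   # -30.0, the matrix of Example 5.2.19 below
```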

5.2.18 Column operations and determinants As det(A) = det(A⊤ ) similar properties


hold if we replace rows by columns in 5.2.17. In particular, if we are only interested
in the value of the determinant of A, then we can apply row operations and column
operations in any order we like. However, if we use both types of operations, we
do lose information on the row and column space of the matrix (but for the value
of the determinant this is not relevant).

5.2.19 Example.

$$\det(A) = \begin{vmatrix} 0 & 1 & 2 & -1 \\ 2 & 5 & -7 & 3 \\ 0 & 3 & 6 & 2 \\ -2 & -5 & 4 & -2 \end{vmatrix}
= \begin{vmatrix} 0 & 1 & 2 & -1 \\ 2 & 5 & -7 & 3 \\ 0 & 3 & 6 & 2 \\ 0 & 0 & -3 & 1 \end{vmatrix}
= -2 \begin{vmatrix} 1 & 2 & -1 \\ 3 & 6 & 2 \\ 0 & -3 & 1 \end{vmatrix}
= -2 \begin{vmatrix} 1 & 2 & -1 \\ 0 & 0 & 5 \\ 0 & -3 & 1 \end{vmatrix}
= 10 \begin{vmatrix} 1 & 2 \\ 0 & -3 \end{vmatrix} = -30.$$

Here is a description of the steps taken. First we add the 2-nd row to the 4-th.
Then we expand across the first column. Then we subtract the 1-st row 3 times
from the 2-nd. Then we expand the 3×3 determinant across the 2-nd row. Finally,
the 2 × 2 determinant is computed using 5.2.8.

5.2.20 Example.

$$\det(A) = \begin{vmatrix} 2 & 0 & 0 & 8 \\ 1 & -7 & -5 & 0 \\ 3 & 8 & 6 & 0 \\ 0 & 7 & 5 & 4 \end{vmatrix}
= 2 \begin{vmatrix} 1 & 0 & 0 & 4 \\ 1 & -7 & -5 & 0 \\ 3 & 8 & 6 & 0 \\ 0 & 7 & 5 & 4 \end{vmatrix}
= 8 \begin{vmatrix} 1 & 0 & 0 & 1 \\ 1 & -7 & -5 & 0 \\ 3 & 8 & 6 & 0 \\ 0 & 7 & 5 & 1 \end{vmatrix}$$

$$= 8 \begin{vmatrix} 1 & 0 & 0 & 1 \\ 1 & -7 & -5 & 0 \\ 3 & 8 & 6 & 0 \\ -1 & 7 & 5 & 0 \end{vmatrix}
= -8 \begin{vmatrix} 1 & -7 & -5 \\ 3 & 8 & 6 \\ -1 & 7 & 5 \end{vmatrix} = 0.$$

In the first two steps we ‘extract’ a factor 2 from the 1-st row and then a factor
4 from the last column. Next we subtract the 1-st row from the 4-th row, and then
we expand across the last column. The result is a 3 × 3 determinant in which the
1-st and 3-rd row are multiples of one another, so that this determinant is 0.
The following theorem is related to Theorems 5.1.11 and 5.1.15, and to Corollary
5.1.12:

5.2.21 Theorem. Let A be an n × n-matrix and let b be a vector in Rn or Cn .

1. rank (A) = n ⇔ det(A) ≠ 0.

2. A is invertible if and only if det(A) ≠ 0.

3. The system of linear equations Ax = b has exactly one solution if and only
if det(A) ≠ 0.

Proof.

1. From 5.2.17 we first conclude that row and column operations may change
the value of the determinant, but not the property of ‘being 0’: if det(A) ≠ 0,
then any row or column operation produces a matrix whose determinant
is ≠ 0; and if det(A) = 0 then any row or column operation produces a
matrix whose determinant is 0. If rank (A) = n, then A can be transformed
into the identity matrix I whose determinant is 1, so that, by the previous
considerations, det(A) 6= 0; if rank (A) < n, then one of the rows must be a
linear combination of the other rows and therefore det(A) = 0 by 5.2.6.

2. Follows from (1) and Theorem 5.1.15.

3. Follows from (1) and Corollary 5.1.12.

5.2.22 Cramer’s rule


There is a remarkable role for determinants in solving systems of n linear equations
in n variables. Let A be the coefficient matrix of such a system Ax = b, or
explicitly:
$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \ldots + a_{1n}x_n &= b_1 \\ a_{21}x_1 + a_{22}x_2 + \ldots + a_{2n}x_n &= b_2 \\ &\;\;\vdots \\ a_{n1}x_1 + a_{n2}x_2 + \ldots + a_{nn}x_n &= b_n. \end{aligned} \qquad (5.2)$$
This system of linear equations has exactly one solution if rank (A) = n, so if
det(A) ≠ 0. The remarkable thing is that using determinants we can write down
an explicit expression for the solution. Here is how such an expression is derived.
Let k 1 , . . . , k n be the columns of A, and let x1 , . . . , xn be the (coordinates of
the) solution vector. Then

b = x1 k 1 + x2 k 2 + . . . + xn k n .

Now replace the j-th column of A by b and denote the resulting matrix by Aj (b).
Then
$$\det(A_j(b)) = \det\Big(k_1, \ldots, k_{j-1}, \sum_{i=1}^{n} x_i k_i, k_{j+1}, \ldots, k_n\Big)
= \sum_{i=1}^{n} x_i \det(k_1, \ldots, k_{j-1}, k_i, k_{j+1}, \ldots, k_n)
= x_j \det(k_1, \ldots, k_n) = x_j \det(A),$$

because in the sum all determinants with i ≠ j are 0 since they contain two equal
vectors. The solution of the system is therefore:

$$x_j = \frac{\det(A_j(b))}{\det(A)}, \qquad j = 1, \ldots, n.$$

This is called Cramer’s rule. This rule is of importance in theoretical considerations on determinants, but has limited practical value. Already for n ≥ 3, the rule
is quite impractical for actually solving a system.

For n = 2, Cramer’s rule applied to the system

ax + by = c
dx + ey = f

leads to

$$x = \frac{\begin{vmatrix} c & b \\ f & e \end{vmatrix}}{\begin{vmatrix} a & b \\ d & e \end{vmatrix}} = \frac{ce - bf}{ae - bd}, \qquad
y = \frac{\begin{vmatrix} a & c \\ d & f \end{vmatrix}}{\begin{vmatrix} a & b \\ d & e \end{vmatrix}} = \frac{af - cd}{ae - bd},$$

whenever ae − bd ≠ 0.
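For completeness, here is a small code sketch of Cramer's rule (not part of the original notes); the 2 × 2 system is made up for the illustration, and the rule is, as noted above, not an efficient method for larger systems.

```python
# Sketch of Cramer's rule for a small system Ax = b.
import numpy as np


def cramer(A, b):
    d = np.linalg.det(A)
    if abs(d) < 1e-12:
        raise ValueError("det(A) = 0: Cramer's rule does not apply")
    x = np.empty(len(b))
    for j in range(len(b)):
        Aj = A.copy()
        Aj[:, j] = b                 # replace the j-th column by b
        x[j] = np.linalg.det(Aj) / d
    return x


A = np.array([[2.0, 3.0],
              [4.0, -1.0]])
b = np.array([8.0, 2.0])
print(cramer(A, b))                  # [1. 2.]
print(np.linalg.solve(A, b))         # the same solution
```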

5.2.23 Cramer’s rule can also be used to derive an explicit formula for the inverse of an
invertible square matrix. Again, for n ≥ 3 this formula is mainly of theoretical
importance. Using row operations to find the inverse is much more efficient in
practice.
To derive this formula in the case of a 2 × 2–matrix, we have to solve two
systems of linear equations:

$$\begin{aligned} ax_{11} + bx_{21} &= 1 \\ cx_{11} + dx_{21} &= 0 \end{aligned} \qquad \text{and} \qquad \begin{aligned} ax_{12} + bx_{22} &= 0 \\ cx_{12} + dx_{22} &= 1. \end{aligned}$$

Cramer’s rule then produces the following result:

$$A^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}.$$

5.2.24 The 2 × 2 determinant can be interpreted as a surface area as we saw in the begin-
ning of the chapter. Similarly, n × n determinants have geometric interpretations:
they ‘measure’ volumes of parallelepipeds spanned by the rows (or columns) of the
matrix in Rn .

5.3 Notes
Determinants were popular among 19th century mathematicians. The name deter-
minant was coined by the French mathematician Augustin-Louis Cauchy (1789–
1857). All sorts of determinantal identities were derived. Cramer’s rule goes back
to Gabriel Cramer (1704–1752), even though Cramer himself did not give a proof
of the rule. Nowadays, the importance of determinants in various branches of
mathematics (and fields where mathematics is applied) is very clear. In analysis (see Analysis 2 and 3)
determinants show up in the substitution rule for multiple integrals. For instance,
when introducing new variables in a double integral a 2 × 2 determinant appears
in the transformed integral. When using polar coordinates x = r cos φ, y = r sin φ
this looks as follows:
$$\iint f(x, y)\, dx\, dy = \iint f(r\cos\varphi, r\sin\varphi) \begin{vmatrix} \cos\varphi & \sin\varphi \\ -r\sin\varphi & r\cos\varphi \end{vmatrix}\, dr\, d\varphi = \iint f(r\cos\varphi, r\sin\varphi)\, r\, dr\, d\varphi.$$

The proof that determinant functions exist in all dimensions requires knowledge
of permutations beyond the scope of these lecture notes. An elegant construction
of a determinant function is the following: let φ1 , . . . , φn be the dual basis of a
basis of Rn (dual bases occur in Linear Algebra 2) and define
$$\det(a_1, \ldots, a_n) = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma)\, \varphi_{\sigma(1)}(a_1) \cdots \varphi_{\sigma(n)}(a_n),$$

where Sn is the set of all n! permutations of {1, . . . , n}. Permutations are discussed
more extensively in the courses on algebra. They are also useful in describing
symmetries like the 48 symmetries of the cube. The proof that there are no general
formulas for finding the roots of polynomials of degrees ≥ 5 also uses permutations.
Determinant functions are special cases of multilinear functions, which play
an important role in the theory of line, surface and volume integrals (theorems of
Gauss, Green, Stokes) and in differential geometry.

5.4 Exercises
§1

1 Determine a basis for the row space and a basis for the column space for each of
the following matrices:

a. $\begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 \end{pmatrix}$,   b. $\begin{pmatrix} 1 & 1 & 0 & 1 \\ -1 & 2 & 1 & 1 \\ -1 & 8 & 3 & 5 \end{pmatrix}$,   c. $\begin{pmatrix} 1 & i & 1+i \\ 1+i & 1 & 2+i \\ 2+i & -1 & 1+i \end{pmatrix}$.

2 Determine the rank of each of the following matrices:

a. $\begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}$,   b. $\begin{pmatrix} 1 & -2 & 1 \\ -1 & 3 & -2 \\ -1 & 1 & 0 \end{pmatrix}$,   c. $\begin{pmatrix} -4 & 1 & 0 & 1 \\ -2 & 0 & 2 & 1 \\ 0 & 2 & -3 & 0 \\ -7 & 2 & 0 & 2 \end{pmatrix}$,   d. $\begin{pmatrix} 1 & 2 & 3 \\ 2 & 4 & 5 \\ 3 & 5 & 6 \end{pmatrix}$.

3 Consider the matrices in the previous exercise. Determine the inverse of each of
the matrices whenever the inverse exists. Check your answers by using AA−1 =
A−1 A = I.

4 Determine the inverse of each of the following matrices:

a. $\begin{pmatrix} i & 1 \\ 1 & i \end{pmatrix}$,   b. $\begin{pmatrix} 1 & 0 & 1 \\ 1 & 0 & -1 \\ 0 & i & 0 \end{pmatrix}$.

5 a. Let A be an n × m–matrix and let B be an n × 1–matrix. Prove that there


exists an m × 1–matrix X with AX = B if and only if B belongs to the column
space of A.

b. (Notation as in a.) What is the relation between rank(A) and rank(A|B) if


the system AX = B has a solution?

c. Let A and B be two matrices such that the product AB exists. Show that the
column space of AB is contained in the column space of A, and that the row
space of AB is contained in the row space of B.

6 a. For each λ ∈ C determine the rank of the matrix

$$\begin{pmatrix} \lambda & 1 \\ 1 & \lambda^3 \end{pmatrix}.$$

b. For each λ ∈ R determine the rank of the matrix

$$\begin{pmatrix} 2 & 3 & 2+\lambda \\ 1+\lambda & 4-\lambda & 3 \end{pmatrix}.$$

§2

7 Determine the following determinants:

a. $\begin{vmatrix} i & 1 \\ -1 & i \end{vmatrix}$,   b. $\begin{vmatrix} 0 & 0 & \pi \\ 0 & e & 11 \\ i & \sqrt{2} & 2 \end{vmatrix}$,   c. $\begin{vmatrix} 7 & -3 & 8 \\ 9 & 11 & 5 \\ 8 & 8 & 3 \end{vmatrix}$,   d. $\begin{vmatrix} 4 & 4 & 8 \\ 0 & 1 & 2 \\ 0 & 0 & 2 \end{vmatrix}$,

e. $\begin{vmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \end{vmatrix}$,   f. $\begin{vmatrix} 3 & -4 & 1 \\ -2 & 3 & -1 \\ 2 & 1 & 1 \end{vmatrix}$,   g. $\begin{vmatrix} 7 & 9 & 4 \\ 6 & 5 & 4 \\ 2 & 5 & 3 \end{vmatrix}$,   h. $\begin{vmatrix} 0 & 0 & 2 \\ 0 & 2 & 6 \\ 1 & 2 & 2 \end{vmatrix}$.

8 Determine the following determinants:

a. $\begin{vmatrix} 4 & 6 & 2 & 5 \\ 6 & 9 & 3 & 7 \\ 5 & 8 & 6 & 2 \\ 7 & 10 & 8 & 3 \end{vmatrix}$,   b. $\begin{vmatrix} 11 & 3 & -4 & 5 \\ -13 & -4 & 5 & -6 \\ 18 & 5 & -5 & 7 \\ 8 & 2 & -2 & 3 \end{vmatrix}$,   c. $\begin{vmatrix} 1 & 0 & 0 & 1 \\ 0 & \pi & 0 & 1 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 \end{vmatrix}$,

d. $\begin{vmatrix} 1 & 1 & 0 & 0 \\ 1 & 2 & 1 & 0 \\ 1 & 3 & 3 & 1 \\ 1 & 4 & 6 & 4 \end{vmatrix}$,   e. $\begin{vmatrix} 1 & 3 & -1 & 2 \\ -1 & 1 & 0 & -1 \\ 1 & 2 & 1 & -2 \\ 0 & 3 & 4 & 1 \end{vmatrix}$,   f. $\begin{vmatrix} 1 & 2 & 3 & -3 & 6 \\ 3 & 6 & 9 & 2 & 7 \\ 5 & 3 & 2 & 3 & -1 \\ 7 & 2 & 5 & 4 & 1 \\ 2 & -1 & 3 & 1 & 1 \end{vmatrix}$.
9 Let $A = \begin{pmatrix} 1 \\ 4 \\ 2 \end{pmatrix}$ and $B = \begin{pmatrix} 2 & -1 & 2 \end{pmatrix}$. Calculate det(AB) and det(BA).

10 Consider the n × n-matrix A given by akl = k + l. Calculate det(A).

11 Check if the matrix

$$A = \begin{pmatrix} 1 & 1 & i \\ 1 & i & 1 \\ i & 1 & 1 \end{pmatrix}$$

has an inverse.

12 a. Show that the system of linear equations

3x1 −4x2 +x3 = 4,


−2x1 +3x2 −x3 = −3,
x1 +x2 +2x3 = 3.

has exactly one solution.

b. Show that the system of equations

x1 +x2 +x3 = 2,
2x1 −x2 −3x3 = −1,
3x2 +5x3 = 5.

has more than one solution.

c. Show that the system of linear equations

x1 +ix2 +2x3 = i,
(1 + i)x1 +x2 +x3 = 1,
(−1 + 2i)x1 +(−1 + i)x2 +3ix3 = 0.

has no solutions.

13 a. Use Cramer’s rule to solve for y in the system of linear equations

2x +3y −2z = −1,


3x +y +5z = 11,
x +4y −3z = −2.

b. Use Cramer’s rule to solve for x in the system of linear equations

3x +4y +2z = 6,
4x +6y +3z = 6,
2x +3y +z = 1.

14 For which α, β, γ are the vectors (α, β, γ), (α, 2β, 2γ), (2α, 2β, γ) in R3 linearly
dependent?

15 a. Show that for any n × n-matrix A and any scalar α the relation det(αA) =
α^n det(A) holds.

b. Suppose the square matrix A satisfies A−1 = A⊤ . What conclusion can you
draw on det(A)?

c. Of the n × n-matrix A it is given that A⊤ = −A. What follows from this for
det(A)?

d. If A is an n × n-matrix and S is an n × n-matrix for which S −1 exists, then


det(S −1 AS) = det(A). Prove this.

16 a. Give an example of a square matrix A, different from the zero matrix, for which
A2 equals the zero matrix.

b. Suppose A is a square matrix with A2 = 0 (zero matrix). Determine det(A).

c. Give an example of a square matrix A, different from the zero matrix, for
which A2 = A.

d. What are the possibilities for det(A) if A2 = A?

17 a. Show that

$$\begin{vmatrix} 1 & 1 & 1 \\ a & b & c \\ a^2 & b^2 & c^2 \end{vmatrix} = (b - a)(c - a)(c - b).$$

b. Can you generalize this to n × n–determinants? (These so-called Vandermonde determinants are named after Alexandre–Théophile Vandermonde (1735–1796).)

5.4.1 Exercises from old exams


18 For which value(s) of a does the matrix

$$A = \begin{pmatrix} 1 & 1 & a \\ 0 & a & 1 \\ 1 & 1 & 3 \end{pmatrix}$$

have an inverse? Determine the inverse of A for these value(s).

19 Determine all the values of a for which the vectors (a2 + 1, a, 1), (2, 1, a), (1, 0, 1)
in R3 are linearly dependent.

20 a. For each λ ∈ R the following matrix is given:

$$A_\lambda = \begin{pmatrix} 1 & \lambda & 0 \\ \lambda & 1 & \lambda^2 - 1 \\ 0 & 2 & -1 \end{pmatrix}.$$

Determine the inverse of Aλ for each value of λ for which Aλ is invertible.

b. For which value(s) of λ does the system

x1 + λx2 = λ2
λx1 + x2 +(λ − 1)x3 = λ2
2

2x2 −x3 = 0

have no solutions?
Chapter 6

Inner product spaces

6.1 Inner product, length and angle


6.1.1 In Chapter 4 we defined the notion of a vector space. The elements of such a vector
space are called vectors and ‘behave’ as far as addition and scalar multiplication
is concerned as the vectors in the plane or in space.
Now vectors in the plane and in space have two more characteristics: such
a vector has a length and two such (nonzero) vectors determine an angle. In
this chapter we will generalize these notions to general vector spaces. It may not
come as a surprise that actually the notion of inner product will be central in our
discussion.
In this section we

• introduce inner products and inner product spaces;

• define the notions length, distance and angle (in particular being perpendic-
ular) in real vector spaces;

• discuss the Cauchy-Schwarz inequality (needed for instance to introduce the


notion of angle)

• discuss the Pythagorean theorem.

6.1.2 To give an idea as to where the notion of an inner product comes from, we take a look at
the plane.



In a triangle in which two sides correspond to the vectors a and b, the third side
has length ‖a − b‖. If ϕ is the angle between the vectors a and b, then the cosine
rule tells us that

$$\|a - b\|^2 = \|a\|^2 + \|b\|^2 - 2\,\|a\|\,\|b\| \cos\varphi.$$

In this equality the product ‖a‖ · ‖b‖ cos ϕ on the right-hand side is of


importance to us; it is the inner product of the vectors a and b in the plane. By
choosing a = b we recover the square of the length of a. The angle between two
(nonzero) vectors is also present in this expression. Now in abstract vector spaces
we can’t start with such an explicit formula containing undefined factors. The way
out is to generalize properties of the inner product: symmetry and (bi)linearity as
discussed in Chapter 2.

6.1.3 Definition. (Inner product) Let V be a real vector space. An inner product¹
on V is a function which assigns to any two vectors a, b from V a real number
denoted by (a, b) in such a way that

1. (a, b) is linear in both entries:

(λv + µw, b) = λ(v, b) + µ(w, b)


(a, λv + µw) = λ(a, v) + µ(a, w)

for all scalars and all vectors.

2. (a, b) = (b, a) for all a, b ∈ V . (So we need only impose linearity in the
first entry in the previous item, because symmetry will then automatically
imply linearity in the second entry.)

3. (a, a) ≥ 0 for all a ∈ V , and (a, a) = 0 implies a = 0.

A real vector space with an inner product is often called a real inner product space.
¹ An inner product is sometimes called a ‘dot product’ with notation a · b.

6.1.4 Definition. Let V be a complex vector space. An inner product on V is a function


that assigns to every two vectors a, b in V a complex number, denoted by (a, b),
in such a way that
1. (a, b) is linear in a,

2. $(a, b) = \overline{(b, a)}$ for all a, b ∈ V ,

3. (a, a) ≥ 0 for all a ∈ V , and (a, a) = 0 implies a = 0.


A complex vector space with an inner product is often called a (complex) inner
product space.

6.1.5 An inner product on a real vector space takes real values, an inner product on a
complex vector space assumes complex values. Although there are many similar-
ities between real and complex inner products one significant difference concerns
the second entry: a complex inner product is not linear in the second entry.

$$\Big(a, \sum_{i=1}^{n} \beta_i b_i\Big) = \overline{\Big(\sum_{i=1}^{n} \beta_i b_i, a\Big)} = \overline{\sum_{i=1}^{n} \beta_i (b_i, a)} = \sum_{i=1}^{n} \overline{\beta_i}\,\overline{(b_i, a)} = \sum_{i=1}^{n} \overline{\beta_i}\,(a, b_i).$$

6.1.6 Example. (Standard inner product) There are many ways to define an inner
product on Rn or Cn , but there is one which deserves a special name, the so-called
standard inner product: if a = (a1 , a2 , . . . , an ) and b = (b1 , b2 , . . . , bn ) then

$$(a, b) = a_1\overline{b_1} + a_2\overline{b_2} + \cdots + a_n\overline{b_n};$$

in Rn this definition reduces to

$$(a, b) = a_1 b_1 + a_2 b_2 + \cdots + a_n b_n.$$

Verify that these are really inner products using 6.1.3 and 6.1.4. The standard
inner product generalizes the usual inner product in R2 , but it turns out to play a
special role among inner products as we will see later.

6.1.7 Example. Let V be the set of continuous real valued functions on the interval
[a, b] (with b > a). Then pointwise addition and scalar multiplication turn V into
a vector space. For f, g ∈ V we define
$$(f, g) = \int_a^b f(x)\, g(x)\, dx.$$

It’s easy to verify that this defines an inner product on V , except maybe the second
part of property 3 (the remaining verifications are left to the reader). To prove
the second part of property 3 we proceed as follows. Take any f ∈ V and suppose
f(α) ≠ 0 for some α ∈ [a, b]. Then continuity of f implies that there is an interval
around α so that |f(x)| > ½|f(α)| > 0 for all x in that interval (if you are familiar
with the ε-δ definition of continuity: use ε = ½|f(α)|). Let δ be the length of the
interval. Then:

$$(f, f) = \int_a^b |f(x)|^2\, dx \geq \tfrac{1}{4}\,\delta\,|f(\alpha)|^2 > 0.$$

So if (f, f ) = 0, then f (x) must equal 0 for all x ∈ [a, b].

6.1.8 Consider a real or complex inner product space V . Since (a, a) is a nonnegative
real number for every vector a, the square root √(a, a) is a real number. We use
this to define the notion of length.

6.1.9 Definition. (Length and distance) In an inner product space the length or
norm ‖a‖ of a vector a is defined as

$$\|a\| = \sqrt{(a, a)}.$$

The distance between the vectors a and b is by definition the length of the vector
a − b, i.e., ‖a − b‖.

6.1.10 Examples. Using the standard inner product, the length of (1, 1, 1, 1) ∈ R4 equals

$$\sqrt{1^2 + 1^2 + 1^2 + 1^2} = 2.$$

The distance between (2, 1, 3, 4) and (5, 1, 7, 4) is

$$\sqrt{(2-5)^2 + (1-1)^2 + (3-7)^2 + (4-4)^2} = \sqrt{9 + 16} = 5.$$

In the inner product space V from example 6.1.7 (where we take a = 0 and b = 1)
the length of the function/vector x ↦ x² is

$$\sqrt{\int_0^1 x^2 \cdot x^2\, dx} = \tfrac{1}{5}\sqrt{5}.$$
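These computations are easy to check numerically. The sketch below (illustrative only, not part of the original notes; the Riemann-sum approximation of the integral is an ad-hoc choice) reproduces the three values above.

```python
# Numerical check of Example 6.1.10: the standard inner product on R^4, and a
# simple Riemann-sum approximation of the integral inner product of Example 6.1.7.
import numpy as np

a = np.array([1, 1, 1, 1])
print(np.sqrt(a @ a))                 # length of (1, 1, 1, 1): 2.0

p = np.array([2, 1, 3, 4])
q = np.array([5, 1, 7, 4])
print(np.linalg.norm(p - q))          # distance: 5.0

x = np.linspace(0.0, 1.0, 100_001)
dx = x[1] - x[0]
f = x**2
print(np.sqrt(np.sum(f * f) * dx))    # ||x^2|| is approx (1/5)*sqrt(5) = 0.4472...
```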

6.1.11 The introduction of the angle between vectors requires some thought plus a famous
theorem, the Cauchy-Schwarz inequality. This inequality is also of importance in
other branches of mathematics.

6.1.12 Theorem. (The Cauchy-Schwarz inequality) For all a, b in the inner product
space V the following inequality holds:

$$|(a, b)| \leq \|a\|\, \|b\|.$$

Proof. First we deal with the case that a = 0. If a = 0, then the linearity of the
inner product implies: (a, b) = (0a, b) = 0(a, b) = 0 for all b, so in particular
(a, a) = 0 and ‖a‖ = 0. In this case the inequality is even an equality.
From now on we assume a ≠ 0. Choose b ∈ V and let ϕ be the argument of (a, b).
Define a∗ = e−iϕ a. Then

(a∗ , b) = e−iϕ (a, b) ∈ R .

Take λ ∈ R and consider

f (λ) = (λa∗ + b, λa∗ + b) .

Then f(λ) ≥ 0 for every λ ∈ R because of definitions 6.1.3 and 6.1.4. Since f(λ)
is a quadratic function in λ,

$$f(\lambda) = (\lambda a^* + b, \lambda a^* + b) = \lambda^2 (a^*, a^*) + \lambda(a^*, b) + \lambda(b, a^*) + (b, b) = \|a^*\|^2 \lambda^2 + 2\lambda(a^*, b) + \|b\|^2 \quad (\text{since } (a^*, b) \in \mathbb{R}),$$

its discriminant must be less than or equal to 0, i.e.,

$$4(a^*, b)^2 - 4\|a^*\|^2 \|b\|^2 \leq 0, \qquad (a^*, b)^2 \leq \|a^*\|^2 \|b\|^2, \qquad |(a^*, b)| \leq \|a^*\|\, \|b\|.$$

This implies

$$|(a, b)| = |e^{-i\varphi}(a, b)| = |(a^*, b)| \leq \|a^*\|\, \|b\| = \|e^{-i\varphi} a\|\, \|b\| = \|a\|\, \|b\|.$$

6.1.13 The Cauchy–Schwarz inequality is also known as the Cauchy–Schwarz–Bunyakovsky


inequality. As for the proof, it is a good exercise to redo the proof in the real case
(it simplifies considerably).

6.1.14 Examples. Consider the inner product spaces from example 6.1.6. The Cauchy–
Schwarz inequality then implies the following formula in Cn :
$$\Big| \sum_{i=1}^{n} a_i \overline{b_i} \Big|^2 \leq \Big( \sum_{i=1}^{n} |a_i|^2 \Big) \Big( \sum_{i=1}^{n} |b_i|^2 \Big)$$

for every pair (a1 , . . . , an ), (b1 , . . . , bn ) in Cn . For example, for a = (1, 1, 2) and
for all (x, y, z) in R3 we get:

(x + y + 2z)2 ≤ (12 + 12 + 22 )(x2 + y 2 + z 2 ) = 6(x2 + y 2 + z 2 ).

In the vector space V from example 6.1.7 the Cauchy–Schwarz inequality leads to
the following inequality for every pair of continuous functions f and g on [a, b]:
$$\Big( \int_a^b f(x)\, g(x)\, dx \Big)^2 \leq \int_a^b |f(x)|^2\, dx \int_a^b |g(x)|^2\, dx.$$

Inequalities like this one are used to give estimates on, e.g., integrals. For example,
for x ↦ e^{x²} and x ↦ e^{−x²}, considered on the interval [0, 1], we have:

$$1 = \Big( \int_0^1 e^{x^2} e^{-x^2}\, dx \Big)^2 \leq \int_0^1 e^{2x^2}\, dx \int_0^1 e^{-2x^2}\, dx.$$

6.1.15 Theorem. Let V be an inner product space. Then for all vectors a and b in V
and all scalars λ we have:

1. ‖a‖ ≥ 0, and ‖a‖ = 0 if and only if a = 0.

2. The triangle inequality:

‖a + b‖ ≤ ‖a‖ + ‖b‖,

a generalization of the famous inequality in the plane or in space:

[Figure: the vectors a, b and a + b in the plane, with side lengths ‖a‖, ‖b‖ and ‖a + b‖, illustrating the triangle inequality.]
3. ‖λa‖ = |λ| ‖a‖.

Proof. The first part follows directly from the third condition on inner products.
The third part follows from the linearity of the inner product and (a, b) = (b, a):

(λa, λa) = λ(a, λa) = λλ̄(a, a) = |λ|2 (a, a),



so
‖λa‖ = |λ| ‖a‖ .
For property 2, the triangle inequality, we need the Cauchy–Schwarz inequality:

$$\|a + b\|^2 = (a + b, a + b) = (a, a) + (b, b) + (a, b) + (b, a) \leq \|a\|^2 + \|b\|^2 + 2|(a, b)| \leq \|a\|^2 + \|b\|^2 + 2\|a\|\,\|b\| = (\|a\| + \|b\|)^2.$$

Upon taking square roots on both sides, we find the desired inequality. 

6.1.16 The triangle inequality provides an upper bound on the length of the sum of two
vectors. A lower bound can also be extracted from this inequality as follows.
Replace a by a − b in the triangle inequality (we can apply the inequality to any
two vectors!). We then find (for all a and b):

$$\|(a - b) + b\| \leq \|a - b\| + \|b\|, \qquad \text{so} \qquad \|a - b\| \geq \|a\| - \|b\|.$$

This inequality is also valid if we now replace b by −b:

$$\|a + b\| \geq \|a\| - \|{-b}\| = \|a\| - \|b\|$$

(using the third property in Theorem 6.1.15). Interchanging a and b produces

$$\|b + a\| \geq \|b\| - \|a\|,$$

so that finally (for all a and b):

$$\|a + b\| \geq \big|\, \|a\| - \|b\|\, \big|.$$

6.1.17 Normed vector space


A vector space with the notion of length satisfying the properties in the theorem is
called a normed vector space. They are of importance in functional analysis where
spaces of functions play a special role. They are beyond the scope of this course.

6.1.18 Angle
The Cauchy–Schwarz inequality enables us to define the notion of angle between
two (nonzero) vectors in a real vector space. Here are the details. Let a ≠ 0, b ≠ 0.
Then the Cauchy–Schwarz inequality implies

$$-1 \leq \frac{(a, b)}{\|a\|\, \|b\|} \leq 1.$$

So there is a real number ϕ satisfying


$$\frac{(a, b)}{\|a\|\, \|b\|} = \cos\varphi$$

or

$$(a, b) = \|a\|\, \|b\| \cos\varphi. \qquad (6.1)$$
We call this number (usually taken in the interval (−π, π]) the angle between the
two vectors. In practice, you usually first compute the cosine of the angle via (6.1).
The notion of angle in a complex vector space is more complicated. It is beyond
the scope of this course. We do define perpendicularity.
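In R^n with the standard inner product, formula (6.1) turns into a small computation. Here is a sketch (not from the original notes; the clipping step only guards against rounding errors pushing the quotient just outside [−1, 1]).

```python
# Sketch: the angle between two nonzero vectors in R^n via (6.1),
# cos(phi) = (a, b) / (||a|| ||b||), with the standard inner product.
import numpy as np


def angle(a, b):
    cos_phi = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.arccos(np.clip(cos_phi, -1.0, 1.0))


a = np.array([1.0, 0.0, 0.0])
b = np.array([1.0, 1.0, 0.0])
print(np.degrees(angle(a, b)))   # 45.0 degrees
```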

6.1.19 Definition. (Perpendicular vectors) Let V be an inner product space. The


vectors a ∈ V and b ∈ V are perpendicular , denoted by a ⊥ b, if (a, b) = 0.
(In this definition a and/or b are allowed to be the zero vector; also note that
(a, b) = 0 ⇔ (b, a) = 0.)

6.1.20 Example. In Rn or Cn (with the standard inner product) the vectors ei and ej
are perpendicular for every i and j with i 6= j. Moreover, every ei has length 1.

6.1.21 Example. Let V be the inner product space of continuous functions on [0, 2π]
(see example 6.1.7) and consider for all n ∈ Z the function
e_n = e^{inx}.

If n ≠ m, then e_n ⊥ e_m , because

$$(e_n, e_m) = \int_0^{2\pi} e^{inx}\, \overline{e^{imx}}\, dx = \int_0^{2\pi} e^{i(n-m)x}\, dx = \frac{1}{i(n-m)}\left( e^{i(n-m)2\pi} - e^0 \right) = 0.$$

Also,

$$\|e_n\|^2 = (e_n, e_n) = \int_0^{2\pi} e^{inx}\, \overline{e^{inx}}\, dx = \int_0^{2\pi} \big| e^{inx} \big|^2\, dx = \int_0^{2\pi} dx = 2\pi,$$

and therefore

$$\|e_n\| = \sqrt{2\pi} \quad \text{for all } n \in \mathbb{Z}.$$
This inner product space of functions plays an important role in Fourier analysis,
a field with applications in for instance signal analysis.

6.1.22 Theorem. (Pythagorean theorem) Let V be an inner product space.


1. If the vectors a and b are perpendicular, then

‖a + b‖² = ‖a‖² + ‖b‖² .

2. If the vectors a1 , . . . , ak are mutually perpendicular, i.e., (ai , aj ) = 0 if i ≠ j,


then
‖a1 + · · · + ak‖² = ‖a1‖² + · · · + ‖ak‖² .

Proof. To prove the first part, we use the properties of the inner product to expand
‖a + b‖² (and we use (a, b) = (b, a) = 0):

$$\|a + b\|^2 = (a + b, a + b) = (a, a) + (a, b) + (b, a) + (b, b) = (a, a) + (b, b) = \|a\|^2 + \|b\|^2.$$
The second part follows from the first one plus an induction argument. We leave
this to the reader. 

6.2 Orthogonal complements and orthonormal bases
6.2.1 In terms of the standard inner product on R3 the equation 2x1 + 3x2 − x3 = 0 can
be rewritten as
((2, 3, −1), (x1 , x2 , x3 )) = 0.
So the solutions of the equation are precisely the vectors x ∈ R3 which are per-
pendicular to (2, 3, −1). There is a similar interpretation of the solutions of a
homogeneous system of linear equations Ax = 0. If A = (aij ) is an m × n–matrix,
then the i-th element of the matrix product Ax, i.e., ai1 x1 + · · · + ain xn , equals the
standard inner product (ai , x) of the i-th row of A and the vector x. The solutions
of the system are therefore the vectors which are perpendicular to all rows of A.
There are more situations where perpendicularity is useful as we will see. In
this section we discuss
• The set of vectors perpendicular to a given subspace (orthogonal comple-
ment),

• sets of vectors of length 1 which are mutually perpendicular (orthonormal


sets),

• the role of orthonormal sets of vectors in computing orthogonal projections


and in working with coordinates,

• the Gram-Schmidt procedure to transform a given set of vectors into an


orthonormal set.

6.2.2 Orthogonal complement


Let V be a real or complex inner product space and let W be a linear subspace of
V . The set of vectors which are perpendicular to all vectors of W is denoted by
W ⊥ . More precisely:

6.2.3 Definition. Let W be a linear subspace of an inner product space V . The or-
thogonal complement of W is the set

W ⊥ = {x ∈ V | (x, w) = 0 for all w ∈ W } .

6.2.4 Here are some properties of the orthogonal complement W ⊥ :

• W ⊥ is a linear subspace of V . The orthogonal complement W ⊥ is non-empty


since 0 ∈ W ⊥ . If x ∈ W ⊥ and y ∈ W ⊥ , then for all w ∈ W (x + y, w) =
(x, w) + (y, w) = 0, so that x + y ∈ W ⊥ , and (αx, w) = α(x, w) = 0, so that
αx ∈ W ⊥ .

• W ∩ W ⊥ = {0}, i.e., a linear subspace W and its orthogonal complement


W ⊥ only have the zero vector in common. Suppose x ∈ W ∩ W ⊥ . Since
x ∈ W ⊥ we have (x, w) = 0 for all w ∈ W . Now apply this to w = x ∈ W .
Then (x, x) = 0 and we get x = 0.

To compute the orthogonal complement of a span < a1 , . . . , an > it suffices to


find the vectors perpendicular to each of the vectors a1 , . . . , an . We formulate this
result as a theorem.

6.2.5 Theorem. If W =< a1 , . . . , an >, then

W ⊥ = {x ∈ V | (ai , x) = 0 for i = 1, . . . , n} .

Proof. If x ∈ W ⊥ , then x is perpendicular to all vectors in W , in particular,


x is perpendicular to a1 , . . . , an . This means that (ai , x) = 0 for i = 1, . . . , n.
Conversely, let x be a vector with the property that (ai , x) = 0 for i = 1, . . . , n.
An arbitrary vector w ∈ W can be written as a linear combination $w = \sum_{i=1}^{n} \alpha_i a_i$.
Then $(w, x) = \big(\sum_{i=1}^{n} \alpha_i a_i, x\big) = \sum_{i=1}^{n} \alpha_i (a_i, x) = 0$ by linearity of the inner product.
So x is perpendicular to every vector in W . Consequently, x ∈ W ⊥ . 

6.2.6 Example. In R3 we determine all vectors which are perpendicular to (1, 2, − 1),
i.e., the orthogonal complement of the subspace l =< (1, 2, − 1) >. A vector
x = (x, y, z) is in this complement if and only if ((1, 2, − 1), x) = 0, or x+2y−z = 0.
So l⊥ is the plane V : x + 2y − z = 0.
Next we determine the orthogonal complement of the plane V . We first deter-
mine a parametric representation of V . Let z = λ and y = µ, then x = λ − 2µ
and
V = < (1, 0, 1), (− 2, 1, 0) > .
V ⊥ consists precisely of all vectors (x, y, z) satisfying

((1, 0, 1), (x, y, z)) = x +z = 0,


((− 2, 1, 0), (x, y, z)) = − 2x +y = 0.

Take x = λ, then y = 2λ and z = −λ, so

V ⊥ =< (1, 2, − 1) >= l ,

which is maybe not so surprising (it is actually an example of a general statement


on orthogonal complements).
In a similar way we find that the orthogonal complement of the line l =<
(a, b, c) > in R3 is the plane V : ax + by + cz = 0, and that, conversely, V ⊥ =<
(a, b, c) >.

6.2.7 Orthonormal sets of vectors


Sets of length 1 vectors which are mutually orthogonal are of special importance.
Such sets of vectors are called orthonormal sets. An example is the standard
basis e1 , . . . , en of Rn or Cn . Orthonormal sets are useful in computing orthogonal
projections, distances, and coordinates.

6.2.8 Definition. Let V be an inner product space. The vectors e1 , . . . , en in V form


an orthonormal set if for 1 ≤ i, j ≤ n

$$(e_i, e_j) = \begin{cases} 0 & \text{if } i \neq j, \\ 1 & \text{if } i = j. \end{cases}$$

If moreover e1 , . . . , en is a basis of V , then the set is called an orthonormal basis


of V .

6.2.9 Example. The set of vectors


1 1
√ (1, −1, 0), √ (1, 1, 0), (0, 0, 1)
2 2

in R3 is an orthonormal set. It is not difficult to see that this set is linearly


independent (see also the following theorem) and therefore an orthonormal basis
of R3 .

6.2.10 Theorem. Let V be an inner product space.


1. Orthonormal sets of vectors in V are linearly independent.
2. If a1 , . . . , an is an orthonormal basis of V , then the coordinates of x with
respect to this basis are (x, a1 ), . . . , (x, an ), respectively. In the case of a real
vector space we have
‖x‖² = (x, a1 )² + · · · + (x, an )² ,
i.e., the length of x equals the length of the coordinate vector of x with
respect to the standard inner product. (There is a similar equality in the
complex setting, we won’t go into that.)
Proof. 1) To prove linear independence, we study the equation
λ1 a 1 + · · · + λn a n = 0 (6.2)
in λ1 , . . . , λn . Now take the inner product on both sides of (6.2) with the vector
aj :
(λ1 a1 + · · · + λn an , aj ) = (0, aj ) = 0.
Since a1 , . . . , an is an orthonormal set, the left-hand side simplifies to λj . Conse-
quently, λj = 0.
2) To prove the second statement, we first note that since a1 , . . . , an is a basis
of V there are scalars µ1 , . . . , µn with x = µ1 a1 + · · · + µn an . To determine the µj ,
we again take the inner product of both sides with aj :
(x, aj ) = (µ1 a1 + · · · + µn an , aj ).
Expanding the right-hand side yields (x, aj ) = µj . The equality involving the
length is a direct consequence of the Pythagorean theorem 6.1.22.

6.2.11 Example. The set 6.2.9 is an orthonormal set and therefore linearly independent
by Theorem 6.2.10. So the three vectors are a basis. The coordinates of a = (2, 2, 2)
with respect to this basis are
$$\big(a, \tfrac{1}{\sqrt{2}}(1, -1, 0)\big) = 0, \qquad \big(a, \tfrac{1}{\sqrt{2}}(1, 1, 0)\big) = 2\sqrt{2}, \qquad (a, (0, 0, 1)) = 2.$$
So:
$$(2, 2, 2) = 2\sqrt{2} \cdot \tfrac{1}{\sqrt{2}}(1, 1, 0) + 2\,(0, 0, 1).$$

6.2.12 Orthogonal projection


Orthonormal sets are useful in determining orthogonal projections on subspaces
(and such projections are useful since they ‘solve’ shortest distance problems). Let
W be a linear subspace of the inner product space V . The orthogonal projection
of x ∈ V on the subspace W is the vector y in W such that x − y is perpendicular
to the subspace W , i.e., belongs to W ⊥ . (The vector y turns out to be unique as
we will see below.)

Figure 6.1: Orthogonal projection.

Suppose that a1 , . . . , ak is an orthonormal basis of the subspace W . Then the


vector y can be written as λ1 a1 + · · · + λk ak for some scalars λ1 , . . . , λk . The
condition that x − y is orthogonal to W is equivalent to

(x − (λ1 a1 + · · · + λk ak ), a1 ) = 0
..
.
(x − (λ1 a1 + · · · + λk ak ), ak ) = 0.
Expanding these inner products yields

λ1 = (x, a1 ), . . . , λk = (x, ak ).

In particular, there is precisely one such vector y. We denote the projection of


x by PW (x). We state this result in the following theorem. We also show that
PW (x) is the vector in W with minimal distance to x.

6.2.13 Theorem. Let a1 , . . . , ak be an orthonormal basis of the linear subspace W of the
inner product space V . If PW (x) is the orthogonal projection of x on W , then:
1. x − PW (x) is perpendicular to every vector from W .

2. The orthogonal projection PW (x) of x ∈ V on W =< a1 , . . . , ak > equals


PW (x) = (x, a1 )a1 + · · · + (x, ak )ak .

3. ‖x − PW (x)‖ = min_{z∈W} ‖x − z‖, i.e., the orthogonal projection is the
unique vector in W with minimal distance to x.

4. ‖PW (x)‖ ≤ ‖x‖ with equality occurring if and only if x = PW (x).


Proof. The first item is just the definition; the second part has been proved
above. So we turn to the third item. Take any vector z in W . We will compare

Figure 6.2: Orthogonal projection and shortest distance.

‖x − PW (x)‖ and ‖x − z‖. To this end, write x − z = (x − PW (x)) + (PW (x) − z).
The component PW (x) − z is in W and is therefore orthogonal to x − PW (x). The
Pythagorean theorem 6.1.22 now implies

$$\|x - z\|^2 = \|x - P_W(x)\|^2 + \|P_W(x) - z\|^2.$$

Since lengths are non-negative we find

$$\|x - P_W(x)\| \leq \|x - z\|$$

with equality if and only if ‖PW (x) − z‖ = 0, i.e., if and only if PW (x) = z.


The last statement in the theorem follows in a similar way from the Pythagorean
theorem 6.1.22: since x − PW (x) and PW (x) are perpendicular (note that the second vector is in W ), we have

$$\|x\|^2 = \|x - P_W(x)\|^2 + \|P_W(x)\|^2.$$

So ‖PW (x)‖ ≤ ‖x‖ with equality precisely if x = PW (x).

6.2.14 Example. The orthogonal projection of the vector (1, 0, 1) ∈ R3 onto the line
l =< (1, 2, 1) > can be computed as follows. First we divide (1, 2, 1) by its length
√6 to get a vector on the line with length 1:

$$l = \big< \tfrac{1}{\sqrt{6}}(1, 2, 1) \big>.$$

Next we apply Theorem 6.2.13 to find

$$\big( (1, 0, 1), \tfrac{1}{\sqrt{6}}(1, 2, 1) \big) \cdot \tfrac{1}{\sqrt{6}}(1, 2, 1) = \tfrac{1}{3}(1, 2, 1).$$

In general, the orthogonal projection of x onto the line < a >, where a has length
1, equals (x, a)a.
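Theorem 6.2.13 gives an immediate recipe for computing projections once an orthonormal basis of W is available. A minimal sketch (not from the original notes; the helper assumes, without checking, that the given vectors really are orthonormal):

```python
# Sketch: orthogonal projection onto the span of an ORTHONORMAL set e_1, ..., e_k,
# following Theorem 6.2.13: P_W(x) = (x, e_1) e_1 + ... + (x, e_k) e_k.
import numpy as np


def project(x, *onb):
    return sum((x @ e) * e for e in onb)


# Example 6.2.14: projection of (1, 0, 1) onto the line spanned by (1, 2, 1)
e = np.array([1.0, 2.0, 1.0]) / np.sqrt(6)
x = np.array([1.0, 0.0, 1.0])
print(project(x, e))             # [1/3, 2/3, 1/3]
```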

6.2.15 Constructing orthonormal bases


Coordinates with respect to orthonormal bases can be easily found by taking ap-
propriate inner products. Our next task is to construct orthonormal sets and
bases.

6.2.16 Theorem. Let a1 , . . . , an be linearly independent vectors in an inner product


space V . There exists a constructive process to transform this set of vectors into
an orthonormal set e1 , . . . , en in such a way that

< a1 , . . . , ai > = < e1 , . . . , ei > for i = 1, . . . , n .

In particular, every finite dimensional inner product space has an orthonormal


basis.

6.2.17 Proof: the Gram-Schmidt process


We prove this theorem by providing an algorithm that actually produces the re-
quired orthonormal basis e1 , . . . , en . This algorithm can be carried out by a com-
puter or, in simple cases, by hand. It is called the Gram-Schmidt process.

• Step 1 First we replace a1 by a vector of length 1 in < a1 >:

$$e_1 = \frac{a_1}{\|a_1\|}.$$

Then e1 is an orthonormal set and we have < a1 > = < e1 >.

• Step i: Suppose we have found the orthonormal set e1 , . . . , ei already. In


particular:
< a1 , . . . , ai >= < e1 , . . . , ei > .
Then we construct ei+1 as follows. We first compute the orthogonal pro-
jection of ai+1 onto < e1 , . . . , ei >. This projection is (ai+1 , e1 )e1 + · · · +
(ai+1 , ei )ei by Theorem 6.2.13. By definition of orthogonal projection, the
difference
$$a^*_{i+1} = a_{i+1} - \big[ (a_{i+1}, e_1)e_1 + \cdots + (a_{i+1}, e_i)e_i \big]$$

is perpendicular to < e1 , . . . , ei >, and, in particular, perpendicular to each
of the vectors e1 , . . . , ei . Now a∗i+1 is not the zero vector because otherwise
ai+1 would be a linear combination of e1 , . . . , ei , and therefore of a1 , . . . , ai ,
contradicting the fact that the aj are linearly independent. Construct a
vector of length 1 spanning the same line as a∗i+1 by dividing by its length ‖a∗i+1‖:

$$e_{i+1} := \frac{a^*_{i+1}}{\|a^*_{i+1}\|} = \frac{a_{i+1} - \sum_{k=1}^{i} (a_{i+1}, e_k)\, e_k}{\big\| a_{i+1} - \sum_{k=1}^{i} (a_{i+1}, e_k)\, e_k \big\|}.$$

Since ei+1 ∈< e1 , . . . , ei >⊥ , the vectors e1 , . . . , ei , ei+1 form an orthonormal


set. Moreover,

< a1 , . . . , ai , ai+1 >=< e1 , . . . , ei , ai+1 >=< e1 , . . . , ei , ei+1 >,

where the first equality follows from < e1 , . . . , ei > = < a1 , . . . , ai >, and the
second equality follows from the way we have constructed ei+1 : we have
added a linear combination of e1 , . . . , ei to ai+1 and multiplied by a non-
zero scalar. Both operations do not change the span.
Step i is then used to construct e2 from e1 (i = 1), then e3 from e1 and e2 (i = 2), etc.
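Below is a compact code sketch of the process for vectors in R^m with the standard inner product (not part of the original notes; it assumes the input vectors are linearly independent and performs no rank checks). It reproduces the orthonormal basis of the plane treated in Example 6.2.18 below.

```python
# Minimal sketch of the Gram-Schmidt process of 6.2.17.
import numpy as np


def gram_schmidt(vectors):
    onb = []
    for a in vectors:
        # subtract the orthogonal projection onto the span of the e's found so far
        a_star = a - sum((a @ e) * e for e in onb)
        onb.append(a_star / np.linalg.norm(a_star))
    return onb


# the plane V = <(2, 0, 1), (-1, 1, 0)> of Example 6.2.18 below
for e in gram_schmidt([np.array([2.0, 0.0, 1.0]), np.array([-1.0, 1.0, 0.0])]):
    print(e)    # (1/sqrt 5)(2, 0, 1) and (1/sqrt 30)(-1, 5, 2)
```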

6.2.18 Example. Consider the plane

V : x + y − 2z = 0 .

in R3 . Take z = λ, y = µ, then x = 2λ − µ and it easily follows that V = <
(2, 0, 1), (−1, 1, 0) >. We construct an orthonormal basis of V as follows. First we
let

$$e_1 = \frac{(2, 0, 1)}{\|(2, 0, 1)\|} = \tfrac{1}{\sqrt{5}}(2, 0, 1).$$

Then we let

$$a^*_2 = a_2 - P_{<e_1>}\, a_2 = (-1, 1, 0) - \big( (-1, 1, 0), (\tfrac{2}{\sqrt{5}}, 0, \tfrac{1}{\sqrt{5}}) \big)\, (\tfrac{2}{\sqrt{5}}, 0, \tfrac{1}{\sqrt{5}}) = \tfrac{1}{5}(-1, 5, 2),$$

so that

$$e_2 = \frac{a^*_2}{\|a^*_2\|} = \tfrac{1}{\sqrt{30}}(-1, 5, 2).$$

An orthonormal basis is therefore { (1/√5)(2, 0, 1), (1/√30)(−1, 5, 2) }. Note that if you
start with other vectors spanning V , or with the same vectors but in a different

order, you usually obtain a different orthonormal basis. For example, if you start
with the vectors (−1, 1, 0), (2, 0, 1), i.e., just the order has changed, then you find
the orthonormal basis { (1/√2)(−1, 1, 0), (1/√3)(1, 1, 1) } of V .

6.2.19 Example. We determine an orthonormal basis for the span of the vectors a =
(1, 1, 1, 1), b = (1, − 1, 2, 0), c = (5, 0, 1, − 4) in R4 .
In the first step we obtain:

$$e_1 = \frac{a}{\|a\|} = \tfrac{1}{2}(1, 1, 1, 1).$$

Next, P<e1>(b) = (b, e1)e1 = ½(1, 1, 1, 1), so b − P<e1>(b) = ½(1, −3, 3, −1). Therefore

$$e_2 = \frac{(1, -3, 3, -1)}{\|(1, -3, 3, -1)\|} = \tfrac{1}{2\sqrt{5}}(1, -3, 3, -1).$$

In the next step, P<e1,e2>(c) = (c, e1)e1 + (c, e2)e2 = ½(1, 1, 1, 1) + (3/5)(1, −3, 3, −1)
= (1/10)(11, −13, 23, −1). Then c − P<e1,e2>(c) = (13/10)(3, 1, −1, −3) and

$$e_3 = \frac{(3, 1, -1, -3)}{\|(3, 1, -1, -3)\|} = \tfrac{1}{2\sqrt{5}}(3, 1, -1, -3).$$

The set {e1 , e2 , e3 } is the required orthonormal basis.

6.2.20 Orthonormal sets and coordinates


Now that we know that every finite-dimensional inner product space V has an
orthonormal basis and that we can find such bases from given ones using the
Gram-Schmidt process, we turn to coordinates with respect to orthonormal bases.
Let {e1 , . . . , en } be an orthonormal basis of V and let
$$x = \sum_{i=1}^{n} x_i e_i \in V, \qquad y = \sum_{j=1}^{n} y_j e_j \in V$$

be two arbitrary vectors in V . Then

$$(x, y) = \Big( \sum_{i=1}^{n} x_i e_i, \sum_{j=1}^{n} y_j e_j \Big) = \sum_{i=1}^{n} \sum_{j=1}^{n} x_i \overline{y_j}\, (e_i, e_j) = \sum_{i=1}^{n} x_i \overline{y_i}.$$

This means that if we use coordinates with respect to the orthonormal basis, the
inner product of x and y equals the ‘ordinary’ inner product (in Rn or Cn depending
on whether we work in a real or complex vector space) of their coordinate vectors.

So, if we work with coordinates with respect to an orthonormal basis, compu-


tations in an n-dimensional real or complex inner product space can be ‘translated’
into computations in Rn or Cn with the standard inner product.
We next turn to an interesting relation between a linear subspace and its
orthogonal complement. Let W be a k–dimensional linear subspace of an n–
dimensional inner product space V . Choose any basis a1 , . . . , ak of W and sup-
plement it with vectors ak+1 , . . . , an to a basis a1 , . . . , an of V . Apply the Gram-
Schmidt process to a1 , . . . , an to find an orthonormal basis e1 , . . . , en of V such
that e1 , . . . , ek is an orthonormal basis of W . By Theorem 6.2.10 every vector
x ∈ V can now be written as
$$x = \sum_{i=1}^{k} (x, e_i)\, e_i + \sum_{i=k+1}^{n} (x, e_i)\, e_i = P_W(x) + \sum_{i=k+1}^{n} (x, e_i)\, e_i.$$

Then Theorem 6.2.13 implies that

W ⊥ =< ek+1 , . . . , en >

(or: if x ∈ W ⊥ then Pw (x) = 0 and x ∈< ek+1 , . . . , en >; if x ∈< ek+1 , . . . , en >,
then, by orthonormality, (x, ei ) = 0 for i = 1, . . . , k, hence x ∈ W ⊥ ).
So the orthogonal complement W ⊥ of W has dimension n − k and the set
{ek+1 , . . . , en } is an orthonormal basis of W ⊥ . In conclusion:

6.2.21 Theorem. Let W be a k-dimensional linear subspace of an n-dimensional inner


product space V . Then there exists an orthonormal basis {e1 , . . . , en } of V such
that W =< e1 , . . . , ek > and W ⊥ =< ek+1 , . . . , en >. In particular,

dim V = dim W + dim W ⊥ . (6.3)

6.2.22 Homogeneous systems and orthogonal complements


We have come across the relation dim V = dim W + dim W ⊥ before, but in a
different guise. Let a1 , . . . , am be the rows of an m × n–matrix A. The matrix
product Ax of A with a (column) vector x ∈ Rn can be viewed as a column of
inner products of each of the vectors ai with x:

$$\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} (a_1, x) \\ \vdots \\ (a_m, x) \end{pmatrix}.$$

So the set of solutions of the system Ax = 0 is exactly the orthogonal complement


of < a1 , . . . , am >. If the rank of A is k, then we find, using Theorem 6.2.21, that
the dimension of the solution space is n − k, in agreement with Theorem 5.1.13.

6.2.23 Example. In R4 consider the linear subspace V given by


x +2z − u = 0,
x+y = 0.
Solving these equations show that
V = < (1, − 1, 0, 1), (− 2, 2, 1, 0) > .
The vectors (1, 0, 2, − 1) and (1, 1, 0, 0) (produced from the coefficients of the equa-
tions) are perpendicular to V . Since dim(V ) = 2 we have dim(V ⊥ ) = 2(= 4 − 2);
as the two vectors (1, 0, 2, − 1), (1, 1, 0, 0) are linearly independent we conclude
V ⊥ =< (1, 0, 2, − 1), (1, 1, 0, 0) > .
Now apply Gram-Schmidt to the two spanning vectors of V and the two spanning
vectors of V ⊥ . Then we find
$$\tfrac{1}{\sqrt{3}}(1, -1, 0, 1), \quad \tfrac{1}{\sqrt{33}}(-2, 2, 3, 4), \quad \tfrac{1}{\sqrt{6}}(1, 0, 2, -1), \quad \tfrac{1}{\sqrt{66}}(5, 6, -2, 1),$$
which form an orthonormal basis of R4 in such a way that the first two vectors
span V and the last two vectors span V ⊥ .
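The observation of 6.2.22 also gives a practical way to compute such bases: the singular value decomposition yields orthonormal bases of the row space of A and of its null space at once. The sketch below (not part of the original notes) applies this to the two equations of this example; the bases it returns span the same V and V⊥ as above, although the individual vectors generally differ from the ones found by hand.

```python
# Sketch: the solution space of Ax = 0 is the orthogonal complement of the row
# space of A (6.2.22).  The SVD gives orthonormal bases of both subspaces.
import numpy as np

A = np.array([[1.0, 0.0, 2.0, -1.0],   # x + 2z - u = 0
              [1.0, 1.0, 0.0,  0.0]])  # x + y     = 0
U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-12))             # rank of A (here 2)
V_basis = Vt[r:]       # orthonormal basis of the solution space V (null space of A)
Vperp_basis = Vt[:r]   # orthonormal basis of the row space of A, i.e. of V-perp
print(V_basis)
print(Vperp_basis)
print(np.round(A @ V_basis.T, 12))     # zero matrix: each basis vector solves Ax = 0
```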

6.2.24 Projection onto W ⊥


If W is a linear subspace of the finite-dimensional inner product space V and
PW (x) is the orthogonal projection of x onto W , then it seems plausible that the
vector x − PW (x), which is perpendicular to W , is the orthogonal projection of
x onto W ⊥ . To show that this is really the case we must check that x − (x −
PW (x)) = PW (x) is perpendicular to all vectors from W ⊥ , i.e., that it belongs to
(W ⊥ )⊥ . As PW (x) ∈ W it suffices to show that W = (W ⊥ )⊥ . Now it’s evident
that every vector from W is perpendicular to every vector from W ⊥ (from the
definition of W ⊥ !), so that W ⊆ (W ⊥ )⊥ . The dimension formula (6.3) implies
that dim(W ⊥ )⊥ = dim V − dim W ⊥ = dim W . Hence W = (W ⊥ )⊥ .
This observation on projection onto W ⊥ is useful in computations, since one
of the projections may be easier to compute (directly) than the other.

6.2.25 Example. We determine the orthogonal projection of (1, 2, 1) ∈ R3 onto the plane
W : x + y + z = 0. The orthogonal complement W ⊥ is the line < (1, 1, 1) > and
the projection of (1, 2, 1) onto this line is
$$\big( (1, 2, 1), \tfrac{1}{\sqrt{3}}(1, 1, 1) \big)\, \tfrac{1}{\sqrt{3}}(1, 1, 1) = \tfrac{4}{3}(1, 1, 1).$$

The orthogonal projection of (1, 2, 1) onto W is therefore

$$(1, 2, 1) - \tfrac{4}{3}(1, 1, 1) = \tfrac{1}{3}(-1, 2, -1).$$

6.3 The QR-decomposition


6.3.1 The relation between a linearly independent set of vectors in Rm and the resulting
orthonormal set, derived using the Gram-Schmidt process, can also be expressed in
terms of matrices. This leads to a way to express an m × n–matrix whose columns
are linearly independent as a product of two matrices with special properties.

6.3.2 The Gram-Schmidt process in terms of matrices


Let a1 , . . . , an ∈ Rm be n linearly independent vectors. Collect them as columns
in an m × n-matrix A. Applying Gram-Schmidt to these vectors produces an
orthononormal set u1 , . . . , un ∈ Rm . Now every vector ak is a linear combination
of the first k vectors u1 , . . . , uk from the orthonormal set:

ak = r1k u1 + · · · + rkk uk + 0 · uk+1 + · · · + 0 · un

with rkk ≠ 0 (why?). By replacing uk by −uk if necessary, we can arrange that


every rkk > 0. Now collect the vectors u1 , . . . , un as columns in the m×n-matrix Q
and the numbers rij in the n × n-matrix R (column k of R contains the coefficients
that we used for ak ). Then we get the following equality of matrices:

A = QR,

in which R is an upper triangular matrix with positive entries on the diagonal.


This decomposition is usually called the QR-decomposition of the matrix A.
The matrix R can be found in various ways in concrete problems. One of these
ways is the following, which uses the orthonormality of the set {u1 , . . . , un }. The
equalities
u⊤k · uℓ = (uk , uℓ ) = δkℓ

can be rewritten as
Q⊤ Q = I n .
But then
Q⊤ A = Q⊤ QR = In R = R.
Another way is via careful bookkeeping when carrying out the Gram-Schmidt
process.

6.3.3 Example. Applying the Gram-Schmidt process to the linearly independent vec-
tors (2, 0, 1), (−1, 1, 0) yields the orthonormal vectors

$$\tfrac{1}{\sqrt{5}}(2, 0, 1), \qquad \tfrac{1}{\sqrt{30}}(-1, 5, 2).$$

The corresponding QR-decomposition is therefore

$$\begin{pmatrix} 2 & -1 \\ 0 & 1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} \tfrac{2}{\sqrt{5}} & \tfrac{-1}{\sqrt{30}} \\ 0 & \tfrac{5}{\sqrt{30}} \\ \tfrac{1}{\sqrt{5}} & \tfrac{2}{\sqrt{30}} \end{pmatrix} \cdot \begin{pmatrix} \sqrt{5} & -\tfrac{2}{\sqrt{5}} \\ 0 & \tfrac{6}{\sqrt{30}} \end{pmatrix}.$$

Here, R has been found, for instance, via R = Q⊤ A.
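This decomposition can also be reproduced numerically. The sketch below (not part of the original notes) carries out the Gram-Schmidt recipe of 6.3.2 on the columns of A, forms R = Q⊤A, and compares with NumPy's built-in QR routine, which may return Q and R with some signs flipped; both are valid decompositions.

```python
# Sketch: the QR-decomposition of Example 6.3.3 via Gram-Schmidt and via NumPy.
import numpy as np

A = np.array([[2.0, -1.0],
              [0.0,  1.0],
              [1.0,  0.0]])

# Gram-Schmidt on the columns of A
q1 = A[:, 0] / np.linalg.norm(A[:, 0])
v = A[:, 1] - (A[:, 1] @ q1) * q1
q2 = v / np.linalg.norm(v)
Q = np.column_stack([q1, q2])
R = Q.T @ A                      # upper triangular: [[sqrt 5, -2/sqrt 5], [0, 6/sqrt 30]]
print(np.allclose(Q @ R, A))     # True
print(np.linalg.qr(A))           # library version for comparison
```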



6.4 Notes
The Cauchy–Schwarz inequality is named after A.–L. Cauchy (1789–1857) and
H.A. Schwarz (1843–1921). Cauchy described the inequality in terms of sequences
of numbers, whereas Schwarz worked in function spaces. The inequality is some-
times also named after V.Y. Bunyakovsky (1804–1889), who came up with it in-
dependently, also in the setting of function spaces.
Inner product spaces of functions will be further discussed in more advanced
analysis courses. Applications of such inner product spaces can be found in, for
instance, signal analysis and in quantum mechanics.
A variation of the inner product, in which non-zero vectors need not have a
positive length, occurs in relativity theory. Implicitly, a bit of this can be seen in
the classification of quadratic forms in Linear Algebra 2.
The Pythagorean theorem has its roots, of course, in geometry, but has sur-
prising and useful consequences in cleverly chosen function spaces. An example is
the inequality

$$\sum_{n=1}^{\infty} \frac{1}{n^2} \leq \frac{\pi^2}{6},$$

which can be derived using an inner product space as in example 6.1.7. (By the
way, the inequality turns out to be an equality, a famous result due to Euler.)
The Gram-Schmidt process is named after the Dane J.P. Gram (1850–1916)
and the German E. Schmidt (1876–1959) and was introduced in the setting of
function spaces.
Orthogonal projections can be used to derive the method of least squares. This
method is an essential tool in, e.g., dealing with measurements. The notions length,
angle, orthogonality will reoccur in Linear Algebra 2 in the study of orthogonal
maps, like reflections and rotations.
The data of an inner product can be neatly stored in a so-called Gram–matrix.
If {a1 , . . . , an } is a basis of the inner product space V , then the entry in position
i, j of this n × n–matrix is the inner product (ai , aj ).

6.5 Exercises
§1

1 Which of the following expressions define an inner product in R2 (where a =


(a1 , a2 ), b = (b1 , b2 ))?
a. (a, b) = a1 b1 + 2a2 b2 ,

b. (a, b) = a1 b1 − a2 b2 ,

c. $(a, b) = \begin{pmatrix} a_1 & a_2 \end{pmatrix} \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}$,

d. $(a, b) = \begin{pmatrix} a_1 & a_2 \end{pmatrix} \begin{pmatrix} 4 & 2 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}$.

2 Use the Cauchy-Schwarz inequality to prove that for all real numbers a, b, c the
following inequality holds:
$$|a + 2b + 2c| \leq 3\sqrt{a^2 + b^2 + c^2}.$$

3 Compute the distance between the given points in R3 :


a. (2, 1, 3) and (−1, 2, 4);

b. (−1, 1, −3) and (−3, −2, 1).

4 Determine the angle between the given pairs of vectors:


a. (2, −1, −3) and (1, 3, 2) in R3 ;

b. (1, 2, 3) and (−2, 3, 1) in R3 ;

c. (2, 3, 0, 2, 1) and (0, 2, 2, 2, 2) in R5 ;

d. (1, 4, 4, 1, 1, 1) and (0, −1, 3, 2, 1, 1) in R6 .

5 In the real inner product space V the (mutually distinct) vectors a, b, c, d satisfy
a − b ⊥ c − d and a − c ⊥ b − d. Prove that a − d ⊥ b − c.

6 a. Prove the following equality for every pair of vectors a and b in Rn :

ka + bk2 + ka − bk2 = 2kak2 + 2kbk2 .

Also give a geometrical interpretation.



b. Prove the following inequality for every pair of vectors a and b in Rn :

4|(a, b)| ≤ (a + b, a + b) + (a − b, a − b).

7 In the real inner product space V the vectors a, b satisfy kak = kbk. Prove that
a + b and a − b are orthogonal. Give a geometrical interpretation.

8 Determine all vectors u in the inner product space V such that (u, x) = 0 for all
x ∈ V . [Hint: take x = u.]
§2

9 Determine the orthogonal complement of each of the following subspaces of R3 :

a. < (1, 1, 2), (0, 1, 0) >.

b. < (1, 1, 2), (0, 1, 0), (4, 5, 8) >.

c. < (3, 1, 0), (2, 1, −4), (5, 1, 4) >.

d. {(x1 , x2 , x3 ) | 2x1 − x2 = 5x3 }.

10 Consider the following vectors in R4

a = (4, 7, −6, 1), b = (2, 6, 2, −2), c = (−2, −1, 8, −3), d = (−2, −6, −2, 2).

Determine < a, b, c, d >⊥ .

11 a. Let W =< (1, 0, 1, 2), (1, 1, 0, 1) > be a subspace of R4 .

i) Determine a basis of W ⊥ .
ii) Determine v ∈ W and w ∈ W ⊥ such that v + w = (2, 2, 3, 2).

b. Let W =< (1, 1, −1, 0), (2, 1, −1, −1), (1, −2, 0, 3) > be a subspace of R4 .

i) Determine a basis of W ⊥ .
ii) Determine v ∈ W and w ∈ W ⊥ such that v + w = (3, 6, −1, 3).

12 In R4 , let a = (1, 2, −2, 0), b = (2, 1, 0, 4), c = (5, 7, 3, 2). Determine the orthogonal
projection of c on < a, b >⊥ .

13 In R5 let U =< (1, 0, 1, 0, 1), (1, 1, 1, 1, 1) >.

a. Give a basis of U ⊥ .

b. Decompose the vector (3, 2, 2, 0, 1) in a component in U and a component in


U ⊥ , i.e., find u ∈ U and v ∈ U ⊥ such that (3, 2, 2, 0, 1) = u + v.

14 a. In R3 let V = {(x, y, z) | x + y + z = 0}. Determine orthonormal bases of V


and V ⊥ . Determine the orthogonal projection of (1, 2, 1) onto V .

b. Let l = {(x, y, z) | x + y = 0, x + y + 2z = 0} be a line in R3 . Determine


orthonormal bases of l and of l⊥ .

c. Let
W =< (1, 0, 0, −1), (1, 1, −1, 0), (0, 1, 0, −2) >

be a subspace of R4 . Determine an orthonormal basis of W and one of W ⊥ .

15 Let W =< (1, 2, 2, 4), (3, 2, 2, 1), (1, −2, −8, −4) > be a subspace of R4.

a. Determine an orthonormal basis of W .

b. Extend this orthonormal basis to one of R4. What are the coordinate vectors
of (1, 1, 1, 1) and (4, 4, 4, 5), respectively, with respect to this basis?

16 Let a1 = (1, 1, 1, 1), a2 = (3, 3, −1, −1), a3 = (7, 9, 3, 5) be vectors in R4. Consider
the subspaces W1 =< a1 >, W2 =< a1 , a2 >, W3 =< a1 , a2 , a3 >.

a. Determine orthonormal bases of W1 , W2 , W3 , respectively.

b. Determine the orthogonal projection of (1, 2, 1, 2) onto W2 .

c. Extend the basis of W3 to an orthonormal basis of R4. What are the coordi-
nates of the vector (1, 2, 1, 2) with respect to this basis?

17 Let
A1 =< (1, 1, 1, 2, 1) >⊥ and A2 =< (2, 2, 3, 6, 2) >⊥
be subspaces in R5. Determine an orthonormal basis of A1 ∩ A2 .

§3

18 Determine the QR-decomposition of the matrix with columns (1, 1, 1, 1), (3, 3, −1, −1),
(7, 9, 3, 5) (see exercise 16).
6.5.1 Exercises from old exams

19 In R4 with the standard inner product let U =< (1, 1, 1, 1), (1, 0, 2, −1) >.

a. Determine an orthonormal basis of U .

b. Determine the orthogonal projection of (−1, 4, 4, −1) onto U .

c. What is the distance of a = (−1, 4, 4, −1) to U , i.e. the minimum of the
distances of a to vectors from U ?

20 In R4 let A1 : (a1 , x) = 0 and A2 : (a2 , x) = 0, where a1 = (1, 1, 1, 1) and
a2 = (1, −2, 2, 4).

a. Determine the angle between the vectors a1 and a2 .

b. Determine a parametric representation for A1 ∩ A2 , i.e. the collection of vectors
that belong to both A1 and A2 .

21 Let a, b, c be vectors in the real inner product space V such that

(a, a) = (b, b) = (c, c) = 1, (a, b) = 1/2, (a, c) = 0, (b, c) = 0.

a. Show that {a, b, c} is a linearly independent set of vectors.

b. Determine the orthogonal projection of a + b + c onto < a >.

c. Determine the orthogonal projection of a + b + c onto < a, b >.
Appendix A

Prerequisites

A.1 Sets
Sets consist of elements. The way to denote that a is an element of the set A (or
belongs to A) is as follows:
a ∈ A.
By definition, two sets A and B are the same, A = B, if they contain exactly the
same elements. We usually describe sets in one of the following ways:
• Enumeration of a set’s elements between curly brackets. For exam-
ple:
{1, 2, 3, 5}, {1, 2, 3, . . .}, {1, 2, 3, 5, 3}, {2, 3, √(x^2 − 1)}.
The dots in the second example indicate that the reader is expected to
recognize the pattern and knows that 4, 5, etc., also belong to the set. The
first and third sets are equal: the order in which the elements are listed and
repetitions of elements are unimportant.
In mathematics (a, 2, π) denotes an ordered list in which the order of the
elements and repetitions do matter. So (1, 2, 3), (1, 2, 2, 3), (1, 3, 2, 2) are
all distinct lists. We use such lists in this course mainly in the setting of
coordinates.
• Description of a set using defining properties. Examples:
{x | x is an even integer}, {y | y is real and y < 0}.
We also write
{x ∈ Z | x even}, {y ∈ R | y < 0},
so that it is immediately clear in which set we are working.


Here is a list with often used notations regarding sets.

∅                                   the empty set
a ∉ A                               a is not an element of A
A ⊂ B (also: B ⊃ A)                 A is a subset of B
  (or A ⊆ B)                        (or: A is contained in B),
                                    i.e., for every a ∈ A we have a ∈ B
A ⊄ B                               A is not a subset of B
A ∩ B := {x | x ∈ A and x ∈ B}      the intersection of A and B
A ∪ B := {x | x ∈ A or x ∈ B}       the union of A and B
A − B := {x | x ∈ A and x ∉ B}      the (set) difference of A and B
  (or: A \ B)
A × B := {(a, b) | a ∈ A, b ∈ B}    the (cartesian) product of A and B
A1 × A2 × · · · × An :=             the product of A1 , A2 , . . . , An
  {(a1 , a2 , . . . , an ) | a1 ∈ A1 , a2 ∈ A2 , . . . , an ∈ An }
A^n := {(a1 , a2 , . . . , an ) | a1 , . . . , an ∈ A}      special case
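For example (with small sets chosen here only as an illustration): if A = {1, 2, 3} and
B = {2, 4}, then A ∩ B = {2}, A ∪ B = {1, 2, 3, 4}, A − B = {1, 3}, and
A × B = {(1, 2), (1, 4), (2, 2), (2, 4), (3, 2), (3, 4)}.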
To prove that two sets A and B are equal, an often used strategy is to prove the
following two statements separately: A ⊂ B and B ⊂ A.
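For example, to verify that {x ∈ Z | x is even} = {2k | k ∈ Z}, one proves both
inclusions: if x is even, then x = 2k for some k ∈ Z, so x belongs to the set on the
right; conversely, every number of the form 2k with k ∈ Z is even, so it belongs to
the set on the left.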

A.2 Maps
If A and B are sets, then a map f from A to B is a rule that assigns to every element
a of A an element f (a) of B, called the image of a under f . Notation: f : A → B.
The set A is called the domain of the map, the set B is called the codomain. If
B is a set of numbers, the term function is often used instead of the term map.
In the setting of vector spaces the term transformation is often used. Two maps
are the same if they have the same domain, codomain, and if they assign to every
element of the domain the same image. The set of all images f (a) is called the
range of the map f . In set notation: {f (a) | a ∈ A} or {b ∈ B | ∃a ∈ A[b = f (a)]}.

Some more notions and notations regarding maps:



f : A → B                           map with domain A and codomain B
                                    (other letters are also allowed!)
f (a)                               the image of a
f : a ↦ b                           f assigns b to a,
                                    or: a is mapped to b, or: f maps a to b
f (D) := {f (d) | d ∈ D}            the image of D, where D is a subset of A
f (A)                               special case: the image of f
f −1 (E) := {a ∈ A | f (a) ∈ E}     the preimage of E (E a subset of B)
  (or: f ← (E))
f −1 (b)                            instead of f −1 ({b}); keeps the notation simple
f : A → B injective                 for every a, a′ ∈ A: f (a) = f (a′ ) ⇒ a = a′ ; or:
  (f is an injective map)           for every a, a′ ∈ A: if a ≠ a′ then f (a) ≠ f (a′ )
f : A → B surjective                for every b ∈ B there is an a ∈ A with f (a) = b,
  (f is a surjective map)           i.e., f (A) = B
f : A → B bijective                 f is injective and surjective
  (f is a bijection)                (so: for every b ∈ B there is exactly one a ∈ A
                                    with f (a) = b)
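As an illustration (the map is chosen here only as an example), consider f : R → R
given by f (x) = x^2. Its image is f (R) = {y ∈ R | y ≥ 0}, the preimage of E = {4}
is f −1 (E) = {−2, 2}, and f is neither injective (f (−2) = f (2)) nor surjective (there
is no x ∈ R with f (x) = −1).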

If f : A → B is a bijection, then for every b ∈ B there is exactly one a ∈ A with


f (a) = b. Therefore, we can define a map from B to A by: b 7→ a if f (a) = b. This
map (which only exists if f is a bijection) is called the inverse of f and is denoted
by f −1 . (Be careful: this symbol is used for different notions.)

A.3 Some trigonometric relations


Here are some trigonometric relations (x and y are arbitrary real numbers):
• cos^2 (x) + sin^2 (x) = 1;
• sin(x + 2π) = sin(x) and cos(x + 2π) = cos(x);
• sin(π − x) = sin(x) and cos(π − x) = − cos(x);
• sin(π + x) = − sin(x) and cos(π + x) = − cos(x);
• sin(π/2 − x) = cos(x) and cos(π/2 − x) = sin(x);
• sin(2x) = 2 sin(x) cos(x) and cos(2x) = cos^2 (x) − sin^2 (x);
• sin(x + y) = sin(x) cos(y) + cos(x) sin(y) and cos(x + y) = cos(x) cos(y) −
sin(x) sin(y).
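For example, the double-angle formulas in this list follow from the addition formulas
by taking y = x: sin(2x) = sin(x + x) = 2 sin(x) cos(x) and cos(2x) = cos(x + x) =
cos^2 (x) − sin^2 (x); combining the latter with cos^2 (x) + sin^2 (x) = 1 also gives
cos(2x) = 2 cos^2 (x) − 1 = 1 − 2 sin^2 (x).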

A.4 The Greek alphabet


In mathematics we often use Greek symbols. Below is the Greek alphabet, where
we have added a * to symbols that occur frequently in Linear algebra A and B.
name lower case upper case
alpha α* A
beta β* B
gamma γ* Γ
delta δ* ∆
epsilon ε E
zeta ζ Z
eta η H
theta θ or ϑ Θ
iota ι I
kappa κ K
lambda λ* Λ
mu µ* M
nu ν N
xi ξ Ξ
omicron o O
pi π Π
rho ρ* R
sigma σ* Σ
tau τ * T
upsilon υ Υ
phi φ or ϕ * Φ
chi χ X
psi ψ* Ψ
omega ω* Ω
Appendix B

Answers to most of the


exercises

Chapter 1: Complex numbers

1. a) 5 + i                  d) 9/5 − (13/5)i
   b) 1                      e) −3i
   c) 4/25 + (3/25)i         f) 1 − (1/2)√2
2. a) r = 3, ϕ = π           d) r = 2, ϕ = π/6
   b) r = √2, ϕ = π/2        e) r = 13, ϕ = arctan(12/5) = 2 arctan(2/3)
   c) r = √2, ϕ = π/4        f) r = 4√2, ϕ = −π/4
5. a) Im(z) = −1, the perpendicular bisector of i and −3i
   b) the perpendicular bisector of 3i and 4 + 2i
   c) all z satisfying arg(z) = π/8 + kπ or arg(z) = 5π/8 + kπ, k = 0, −1
   d) z1 = (1/2)√2 + (1/2)i√6, z2 = −(1/2)√2 + (1/2)i√6,
      z3 = (1/2)√2 − (1/2)i√6, z4 = −(1/2)√2 − (1/2)i√6
   e) all z with arg(z) = (1/3)π, |z| ≠ 0, and all z with arg(z) = (3/4)π, |z| ≠ 0
7. a) 2i                     c) 1 + i                     e) e^3 (1/2 − (1/2)i√3)
   b) −3/2 + (3/2)i√3        d) 1/2 − (1/2)i√3            f) −(1/2)√3 − (1/2)i

8. a) z = (1/2) ln 2 + (π/4 + 2kπ)i, k ∈ Z
   b) z = ln 2 + (π/3 + 2kπ)i, k ∈ Z
   c) Re(z) = ln 5, Im(z) arbitrary
   d) z = 0
   e) z = ±(1/2)√((4k + 1)π) (1 + i), k = 0, 1, · · ·
      z = ±(1/2)√((4k − 1)π) (1 − i), k = 1, 2, · · ·
   f) z = π/4 + kπ, k ∈ Z
10. a) z = π/2 + kπ, k ∈ Z
    b) z = x + iy with x = π/4 + kπ, y = −(1/2) ln(4 ± √15), k ∈ Z
11. a) z1 = 1, z2 = (1/2)(1 + i√3), z3 = (1/2)(−1 + i√3),
       z4 = −1, z5 = (1/2)(−1 − i√3), z6 = (1/2)(1 − i√3)
    b) z1 = 2; z2,3 = −(1 ± i√3)
    c) z1 = 2(cos(π/8) + i sin(π/8)), z2 = 2(− sin(π/8) + i cos(π/8)),
       z3 = 2(− cos(π/8) − i sin(π/8)), z4 = 2(sin(π/8) − i cos(π/8))
    d) z1 = (1/2)√2 + ((1/2)√2 − 1)i, z2 = −(1/2)√2 + ((1/2)√2 − 1)i,
       z3 = (1/2)√2 − ((1/2)√2 + 1)i, z4 = −(1/2)√2 − ((1/2)√2 + 1)i
    e) zk = i − 2 + 3(cos(π/12 + kπ/3) + i sin(π/12 + kπ/3)),
       k = 0, 1, · · · , 5
    f) z1 = 0; z2 = 1; z3,4 = −(1/2)(1 ± √3 i)
    g) z = 0 and z = e^{iϕ} with ϕ = (1/4 + (1/2)k)π, k ∈ Z (i.e. ±(1/2)√2 ± i(1/2)√2)
12. a) z = −1/2 ± (1/2)i√3
    b) z1 = −2i and z2 = 4i
    c) z1,2 = 2 + i
    d) z1,2 = ±(1/2)√6 (1 − i); z3,4 = ±(1 + i)
13. a) z2 = 2i; z3 = −2
    b) z2 = 1 − i; z3,4 = −3 ± 2i
    c) z^3 − 7z^2 + 15z − 25
    d) z^4 − 4z^3 + 14z^2 − 4z + 13
14. a) (z − 2)(z + 2)(z^2 + 1)
    b) (z + 1)(z^2 + 2z + 2)
    c) (z^2 + 1)(z^2 + z + 1)
15. a) −32 + 32i
    b) |z^23| = 2^23, arg(z^23) = (11/24)π
16. cos^3 ϕ − 3 sin^2 ϕ cos ϕ
    4 cos^3 ϕ sin ϕ − 4 cos ϕ sin^3 ϕ
18. a. If z = x + iy, then z̄ = x − iy. If z is real, then both z = x and z̄ = x.
    Conversely, if x + iy = x − iy, then y = 0, so z is real.
    b. Proof is similar to that of a)
    c. z and w are parallel if and only if z/w is real. Then use a).
19. a. z = x + iy is mapped to −x + iy, which is −z̄.
    b. z ↦ e^{2iα} · z̄
20. a. Without loss of generality assume the vertices are 0, z and ρz with |ρ| = 1.
    Then |z| = |z − ρz| = |1 − ρ| · |z| so that |1 − ρ| = 1. From |ρ| = |1 − ρ| = 1
    you obtain via ρ = a + bi that ρ = 1/2 ± (1/2)√3 i.
21. a. From (z − w)/(z − v) = t with t real (and ≠ 1) you get z − w = t(z − v), so that
    (1 − t)z = w − tv. Then z = (1/(1 − t))w − (t/(1 − t))v, so z = uw + (1 − u)v =
    v + u(w − v) with u = 1/(1 − t) real.
23. a. re^{−it} · v

25. (1/2) + bi, b ∈ R

26. (π/4) + kπ with k ∈ Z

27. b) (5/2) − i(5/2), (1/2) + i(1/2).

28. Factorization: (z^2 − 2z + 5)(z^2 + 4); zeros: 1 ± 2i, ±2i
30. Solutions are:

0, cos(π/8) + i sin(π/8), cos(5π/8) + i sin(5π/8),
cos(9π/8) + i sin(9π/8), cos(13π/8) + i sin(13π/8).

(Solutions can also be given in exponential notation, of course.)
31. Put vertex A of square ABCD in the origin, then the vertices of the square
can be described as follows: 0, w, (1 + i)w, iw (check!). Now A′ B ′ C ′ D′ is
also of this form apart from a translation, so: u, z + u, z(1 + i) + u, zi + u.
The midpoints (times 2 for computational convenience) are u, w + z + u,
(1 + i)(w + z) + u, i(w + z) + u. Apart from a translation over u we get 0,
w + z, (1 + i)(w + z), i(w + z).

Chapter 2: Vector geometry in dimensions 2 and 3

3. b) All
c) The vector is on the line: take λ = 3.

4. b) all

5. a) 0≤λ≤1
b) λ = 1/2
c) λ = 2/3

6. a) x = (2, 1, 5) + λ(3, −2, −1)


b) x = λ(1, 2)
c) x = (1, 2, 2) + λ(1, 1, 1) + µ(0, 1, 0)
d) x = (−2, 1, 3) + λ(1, 2, −1) + µ(6, −1, 0)
7. Yes; yes.

8. a) x + 2y = 7
b) x+y =4
c) x=3

9. a) x = (0, 1) + λ(3, −2)


b) x = (−1, 1) + λ(4, 3)
c) x = (0, 5/2) + λ(1, 0)

10. a) 2x + 2y − z = 3
b) x−y+z =1
c) x − 2y − 2z = 0

11. a) x = (5, 0, 0) + λ(1, −1, 0) + µ(3, 0, 1)


b) x = λ(3, −2, 0) + µ(0, 5, −3)
c) x = (0, 5, 0) + λ(1, 0, 0) + µ(0, 0, 1)

14. a) 3
b) 5
c) π/2 radians
d) a=1

15. a) x1 − x2 = 1; 2
b) 4x1 + 3x2 = 10; 5

16. a) 6
b) 2

17. a) normal vector (−1, −1, 1); −x1 − x2 + x3 = −1


b) normal vector (6, −3, 2); 6x1 − 3x2 + 2x3 = 9

18. a) 17/2
b) 13/2
c) 2

20. b) A median is the line through a vertex of ABCD
    and the midpoint of the opposite side.

23. a) intersection point (0, −8, 8)


b) x = (0, −8, 8) + λ(1, 7, −8)
24. q = (1/3) · c where the origin is in A.

25. b) 3
c) x = (2, 0, 4) + σ(1, −2, 2)

26. b) x = (2, −1, −1) + ρ(1, 1, −1)

Chapter 3: Linear equations

1. AB = (−2, 7; −12, 6);   BA = (4, −7, 4; 0, −6, −8; −3, 12, 6);
   A(B − 2C) = (−8, 13; −26, 36);   AD = (9; 9);
   CC^T = (8, 0, −4; 0, 2, 4; −4, 4, 10);   C^T C = (6, 0; 0, 14);
   DD^T = (4, −2, 6; −2, 1, −3; 6, −3, 9);   D^T D = 14
2. A + B = (2, 0, 0; 2, −2, −2);   (A − B)C = (2 + 2i, 2i; 0, 0);
   A^T B = (3, −i, i; i, 2, −2; −i, −2, 2);   AA^T = (−2 + 2i, 1 − i; 1 − i, 3);
   A^T C = (3, i, i; i, −2, −2; −i, 2, 2)
3. A + B = (2, 2, 2; 4, 4, 4);   A − 2B = (2, 5, 8; 1, 4, 7);
   A^T B = (3, −2, −7; 4, −3, −10; 5, −4, −13);   AB^T = (−11, −2; −14, −2)
4. A + B = (2, 2; 4, 4; 6, 6);   A − B = (2i, 4i; 2i, 4i; 2i, 4i);
   A^T B = (17, 20 − 6i; 20 + 6i, 26);
   AB^T = (7, 9 + 3i, 11 + 6i; 9 − 3i, 13, 17 + 3i; 11 − 6i, 17 − 3i, 23)
5. a) (1/2) (1, 1; −1, 1)
   b) (1/λ)A^{−1}; (A^{−1})^2; (A^{−1})^T; A
7. a) (1, 0, 0, 2; 0, 1, 0, 4; 0, 0, 1, 7)      b) (1, 2, 0, 0, 2; 0, 0, 1, 1, 3; 0, 0, 0, 0, 0)
8. Multiply all from the left by the corresponding elementary m × m matrix:
   a) the identity matrix with rows i and j interchanged;
   b) the identity matrix with the (i, i) entry replaced by λ;
   c) the identity matrix with an additional entry λ in position (i, j).
9. a) x = λ(17, −13, 4, 3)
b) x = λ(3, 1, −5, 0) + µ(0, 1, 0, 1)
c) x = (3, 1, 0, 0) + λ(1, 2, 1, 0) + µ(7, 5, 0, −1)

10. a) x = (1, −1, 1)


b) no solutions
c) x = (0, 0, −2) + λ(−2, 1, 1)
d) no solutions

11. a) z = (−1 + i, −1 − i, 1)
    b) z = µ(2, −1 − 3i) and z = µ(2, −1 + 3i)
    c) z1 = µ(1, 1, −1); z2 = µ(1, e^{(2/3)πi}, e^{(1/3)πi}); z3 = µ(1, e^{−(2/3)πi}, e^{−(1/3)πi})

12. z = (1/(2a + 1)) (2, 3, 1) = −(1/3)i√3 (2, 3, 1)

13. λ = 1: inconsistent; λ ≠ 1: (1/(1 − λ), 0, 3 − 1/(1 − λ)) + µ(1, 1, −λ − 1)

14. For λ ≠ 0: (x1 , x2 , x3 ) = (λ, 2, −2); for λ = 0: x1 = 0, x3 = −2, x2 = µ
    (arbitrary)

15. λ ≠ −1: inconsistent system; for λ = −1: (1, −1, 0, 0) + α(1, −2, 1, 0) +
    β(−8, 7, 0, 1)

Chapter 4: Vector spaces

1. No, yes, no, no

2. Yes, no, no, yes

3. Yes, yes

4. a), c), d), f): yes; b), e), g), h): no

5. a) x = (7, 0, 0) + λ(−4, 1, 0) + µ(5, 0, 1)
   b) x = λ(1, 0, −2) + µ(0, 1, 4)
   c) x = (7/2, 0, 0) + λ(−2, 1, 0) + µ(−2, 0, 1)

6. a) x−y =3
b) 3x − y − 3z = −1
c) x−y+z =3

7. a) x = (0, 0, 3, −11) + λ(1, 0, −1, 4) + µ(0, 1, −1, 5)


b) x = (0, 3, 0, 1) + λ(1, 0, 0, 0) + µ(0, 2, −1, 0)

8. a) 2x − 2y + u = 2; y = z
b) y + z = 4; −x − y + u = 5

9. x = (1, 3, 2)

12. a), c), d): independent; b), e): dependent

13. a) {(1, 0), (0, 1)}


b) {(0, 1, 0), (1, 0, 1)}
c) {(1, 0, 0, 1), (0, 1, 0, 1), (0, 0, 1, 0)}
d) {(0, 4, −1, −4), (1, 5, 0, −3)}
e) {(3, −1, 4, 7), (1, −3, 2, 5), (5, 3, 2, −1)}
f) {(1, 5, 0, 0), (0, 4, −1, 0), (0, 0, 0, 1)}

14. Yes, no, yes



15. No, yes, yes

17. b): dependent; a), c): independent

19. x = (1, −1, 0) + λ(2, 1, 1) + µ(−1, 2, 1); x + 3y − 5z = −2

20. a) a ≠ ±2
    b) a = 3

21. a) < (−2, 3, 1) >


b) <0>

22. a) basis {(1, 0, 0), (0, 1, 0), (0, 0, 1)}, dim W1 = 3
    b) basis {(i, 1 + i, 0), (0, i, 1 + i)}, dim W2 = 2
    c) basis {(0, 1, −1), (1, 2, 0)}, dim W3 = 2

23. a) basis {e^{2t}, t^2, t}, dim = 3
    b) basis {t, sin^2 t, cos^2 t}, dim = 3
    c) basis {e^{2t}, e^{−t}, e^t}, dim = 3
    d) basis {2x^3 + x^2 − x + 5, x^3 + 2x^2 + 10, −2x^2 + x}, dim = 3

24. a) dim = 5
b) dim = 9
c) dim = 6
d) dim = 3
e) dim = 4
f) dim = 2

25. a) dim = 2
b) dim = 3

26. a) (−1, 3)
    b) (2, −1, 2)
    c) (1/2, −1/2, 1/2)
    d) (1/4, −2)

30. b) c ≠ −2

31. a) 2; b) < 2 + e^{3x} >

32. a) λ(0, −1, 1, 0) + µ(1, −1, 0, 1); b) (1, −1, 0, 1) + µ(0, −1, 1, 0)

Chapter 5: Rank and Inverse, Determinants



1. a) row space: < (1, 1, 1, 1), (0, 1, 2, 3) >


column space: < (1, 1), (0, 1) >
b) row space: < (1, 1, 0, 1), (0, 3, 1, 2) >
column space: < (1, −1, −1), (0, 1, 3) >
c) row space: < (0, 1, 1), (1, 0, 1) >
column space: < (1, 1 + i, 2 + i), (0, 2 − i, −2i) >

2. a) rank = 2; b) rank = 2; c) rank = 4; d) rank = 3


 
3. (2, −1; −1, 1);   (−2, 0, 0, 1; 9, 3, 2, −6; 6, 2, 1, −4; −16, −3, −2, 10);
   (−2, 3, −1; 0, −3, 2; 1, 1, −1);   (1, −3, 2; −3, 3, −1; 2, −1, 0)
4. a) (1/2) (−i, 1; 1, −i)
   b) (1/2) (1, 1, 0; 0, 0, −2i; 1, −1, 0)
5. b) rank(A) = rank(A|B)

6. a) λ^4 = 1 (λ = ±1 ∨ λ = ±i): rank = 1, otherwise: rank = 2
   b) λ = 1: rank = 1, λ ≠ 1: rank = 2
7. a) 0; b) −πei; c) −216; d) 8; e) 1; f) 4; g) −45; h) −4

8. a) −8; b) 0; c) π − 1; d) 1; e) 73; f) 484

9. 0; 2

10. |A| = 2 for n = 1, |A| = −1 for n = 2, |A| = 0 for n ≥ 3
11. yes

12. a) det(A) ≠ 0
    b) det(A) = 0 and (2, −1, 5) ∈ column space
    c) det(A) = 0 and (i, 1, 0) ∉ column space
13. a) y = 1; b) x = −6

14. α=0∨β =0∨γ =0



15. b) det(A) = ±1
c) det(A) = 0 for n odd
 
16. a) (0, 1; 0, 0)
    b) 0
    c) I
    d) det(A) = 0 or det(A) = 1
18. For a ≠ 0 and a ≠ 3:
    (1/(a^2 − 3a)) (−3a + 1, −a + 3, a^2 − 1; −1, a − 3, 1; a, 0, −a)
19. a = 0 and a = 1
20. a) for λ ≠ 1, −1:
    (1/(1 − λ^2)) (1 − 2λ^2, λ, λ^3 − λ; λ, −1, 1 − λ^2; 2λ, −2, 1 − λ^2)
    b) λ = −1
Chapter 6: Inner product spaces
1. a) yes
b) no
c) yes
d) no

3. a) √11
   b) √29

4. a) (2/3)π
   b) (1/3)π
   c) (1/4)π
   d) (1/3)π
8. u=0
9. a) < (2, 0, −1) >
b) < (2, 0, −1) >
c) < (0, 0, 0) >
d) < (2, −1, −5) >

10. < (5, −2, 1, 0), (−2, 1, 0, 1) >

11. a) i) < (1, −1, −1, 0), (0, −1, −2, 1) >
ii) v = (2, 1, 1, 3), w = (0, 1, 2, −1)
b) i) < (1, 2, 3, 1) >
ii) v = (2, 4, −4, 2), w = (1, 2, 3, 1)

12. (2, 4, 5, −2)

13. a) < (1, 0, 0, 0, −1), (0, 1, 0, −1, 0), (0, 0, 1, 0, −1) >
b) (2, 1, 2, 1, 2) and (1, 1, 0, −1, −1)
14. a) V ⊥ = < (1/3)√3 (1, 1, 1) >;  V = < (1/2)√2 (1, 0, −1), (1/6)√6 (1, −2, 1) >
    b) l⊥ = < (1/2)√2 (1, 1, 0), (0, 0, 1) >;  l = < (1/2)√2 (1, −1, 0) >
    c) W = < (1/2)√2 (1, 0, 0, −1), (1/10)√10 (1, 2, −2, 1), (1/3)√3 (1, −1, 0, 1) >,
       W ⊥ = < (1/15)√15 (1, 2, 3, 1) >
15. a) W = < (1/5)(1, 2, 2, 4), (1/15)(12, 4, 4, −7), (1/15)(6, 2, −13, 4) >
    b) complete with (1/15)(6, −13, 2, 4);
       (4, 4, 4, 5) → (8, 3, 0, 0)
16. a) W1 = < b1 >, b1 = (1/2)(1, 1, 1, 1)
       W2 = < b1 , b2 >, b2 = (1/2)(1, 1, −1, −1)
       W3 = < b1 , b2 , b3 >, b3 = (1/2)(1, −1, 1, −1)
    b) 3 b1 = (3/2, 3/2, 3/2, 3/2)
17. < (1/2)√2 (1, 0, 0, 0, −1), (1/5)√5 (0, 0, 2, −1, 0), (1/6)√6 (1, −2, 0, 0, 1) >

18. (1, 3, 7; 1, 3, 9; 1, −1, 3; 1, −1, 5)
    = (1/2) (1, 1, −1; 1, 1, 1; 1, −1, −1; 1, −1, 1) · (2, 2, 12; 0, 4, 4; 0, 0, 2)
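(In this factorization the columns of the matrix preceded by the scalar 1/2 are the
orthonormal basis vectors found in exercise 16, with the sign of the third one
reversed so that the diagonal entries of the triangular factor are positive; the
entries of the triangular factor are the inner products of the columns of the
original matrix with these orthonormal columns.)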
19. a) (1/2)(1, 1, 1, 1), (1/√20)(1, −1, 3, −3); b) (2, 1, 3, 0); c) 2√5
20. a) π/3; b) x = λ(−4, 1, 3, 0) + µ(2, 0, −3, 1)

21. b) (3/2)a; c) a + b
Bibliography

[1] Carl B. Boyer. A history of mathematics. John Wiley & Sons, Inc., New York,
(1989, 2nd edition with Uta C. Merzbach)

[2] Bruce Cooperstein. Elementary Linear Algebra: Methods, Procedures and Al-
gorithms.
Can be obtained through ‘lulu’. As the title suggests, this book con-
tains descriptions of computational techniques (and not so much the the-
ory of linear algebra) and examples of the use of these techniques. See
http://linearalgebramethods.com for more information.

[3] Jan van de Craats. Vectoren en matrices. Epsilon Uitgaven, Utrecht (2000).
This book overlaps quite a bit with the course material. But of course, the
author uses his own approach.

[4] David C. Lay. Linear algebra and its applications. Pearson/Addison Wesley,
Boston, etc. (2006, 3rd ed. update)
A pleasant book to read. Contains many exercises.

[5] Liang-shin Hahn. Complex numbers & Geometry. The Mathematical Associa-
tion of America (1994)
This fine little book discusses the role of complex numbers in plane geometry.

[6] Paul Halmos. Finite–dimensional vector spaces. Van Nostrand (1958)


A classic with emphasis on the abstract theory of vector spaces.

[7] Murray R. Spiegel. Theory and problems of complex variables. Schaum’s Out-
line Series, McGraw-Hill (1974)

[8] Hans Sterk. Wat? Nog meer getallen. Syllabus complexe getallen ten behoeve
van Wiskunde D. See http://www.win.tue.nl/wiskunded


[9] Gilbert Strang. Linear algebra and its applications. Harcourt Brace etc., San
Diego (1988, 3rd edition)
Clearly written text on linear algebra in which matrices are central. In contrast
to Halmos’s book, no emphasis on abstract theory.
Index

altitude, 64 decomposition
angle, 165 QR-decomposition, 178
dependent on, 110
basis, 47, 117 determinant, 139, 142
expansion across a column, 147
centroid, 27, 62
expansion across a row, 147
circle
expansion across the first column,
parametric equation, 24
147
circumcircle, 27
determinant function, 141
coördinaten, 121
dimension, 116, 117
column space, 133
distance, 51, 162
completing the square, 20
division
complex cosine, 13
long, 18
complex exponential, 11
complex number, 1 elementary row operations, 82
absolute value, 3
addition, 2 Fundamental theorem of algebra, 19
argument, 3
Gaussian elimination, 85
principal value, 3
Gram-Schmidt process, 173
complex conjugate , 7
divide by, 7 imaginary axis, 1
imaginary part, 4 inner product, 51, 160, 161
multiplication, 2 inner product space, 160, 161
real part, 4
complex polynomial, 16 length, 51, 162
coefficients, 16 line, 106
degree, 16 direction vector, 45, 106
complex sine, 13 parametric equation, 106
coordinate vector, 121 parametric representation, 45, 106
coordinates, 47 position vector, 106
Cramer’s rule, 151 supporting vector, 45
cross product, 58 vector representation, 45
right hand rule, 60 linear combination, 43, 110


linear space, 100 parametric representation, 45


linear subspace, 103 permutation, 143
perpendicular, 166
matrix, 75 plane, 106
addition, 76 direction vectors, 106
coeffcients, 76 parametric equation, 106
column space, 133 position vector, 106
columns of, 75 polar coordinates, 3
elementary row operations, 82 polynomial equation, 16
elements, 76 projection
entries, 76 orthogonal, 171
indentity matrix, 79 Pythagoras
inverse, 80 Theorem of, 57
opposite, 79
product, 78 real axis, 1
rank, 135 row reduced echelon form, 84
reduced echelon form, 83 row space, 133
row reduced echelon form, 84
row reduction, 82 Sarrus’ rule, 144
row space, 133 scalaire vermenigvuldiging, 40
rows of, 75 scalar, 40
scalar multiplication, 77 span, 110
square, 78 standard basis, 118
submatrix, 146 standard inner product, 55, 161
transpose, 81 submatrix, 146
zero matrix, 79 system of linear equations, 86
median, 62 coefficient matrix, 86
multiplicity, 19 extended coefficient matrix, 87
homogeneous, 86
negenpuntscirkel, 26, 28 inconsistent, 87
non–trivial relation, 114 inhomogeneous, 86
norm, 162 right-hand side, 86
normal vector, 56 solution, 86

orthocenter, 27 triangle
orthogonal complement, 168 altitude, 64
orthonormal basis, 47, 54, 169 centroid, 62
orthonormal set, 169 median, 62
orthonormal set of vectors, 169 triangle inequality, 5, 164

parametric equation(s), 107 vector, 39, 100



coordinate vector, 48
coordinates, 47
opposite, 41
scalar multiple, 40
scalar product, 40
zero vector, 40
vector parametric equation, 107
vector representation, 45
vector space, 100
axioms, 100
basis, 117
complex, 100
dimension, 117
normed, 165
real, 100
standard basis, 118
vector addition, 41
vectors
linearly dependent, 114
linearly independent, 114
vector space
dimension, 116

zero vector, 40
