Linear Algebra 1 (2WF20): Lecture Notes
2018-2019
Contents

1 Complex numbers
  1.1 Arithmetic with complex numbers
  1.2 The exponential function, sine and cosine
  1.3 Complex polynomials
  1.4 Geometry with complex numbers
  1.5 Notes
  1.6 Exercises
      1.6.1 Exercises from old exams

4 Vector spaces
  4.1 Vector spaces and linear subspaces
  4.2 Spans, linearly (in)dependent systems
  4.3 Coordinates
  4.4 Notes
  4.5 Exercises
      4.5.1 Exercises from old exams

A Prerequisites
  A.1 Sets
  A.2 Maps
  A.3 Some trigonometric relations
  A.4 The Greek alphabet
These are the lecture notes for the course Linear Algebra 1 (2WF20). Though
this translation follows a previous Dutch version quite closely, I have taken
the opportunity to include various improvements.
If you notice any mistakes, please let me know.
Hans Sterk
Summer 2017
Chapter 1
Complex numbers
• the description of a complex number using its absolute value and argument;
1.1.2 Start with the usual coordinate system in the plane. Now call the horizontal
axis the real axis and the vertical axis the imaginary axis. Every point in the
plane is determined by its coordinates, say a and b, which are real numbers.
We will call the point (a, b) a complex number, and usually denote it by a+bi.
The points (a, 0) will simply be denoted by a, and the points (0, b) on the
second axis by bi. In particular, i denotes the point (0, 1). So (1, 2) becomes
1 + 2i, and (0, 3) becomes 3i. We often denote a complex number by z or w.
The set of complex numbers is denoted by C.
[Figure: the complex plane, with the real axis horizontal, the imaginary axis vertical, and the point z = a + bi at coordinates (a, b).]
For z1 = a1 + b1 i and z2 = a2 + b2 i the product is defined by

z1 z2 := (a1 a2 − b1 b2) + (a1 b2 + a2 b1) i.
Since we have defined the multiplication by using the usual rules for deal-
ing with expressions containing symbols, it is not that surprising that the
complex numbers share the usual arithmetical properties with the real num-
bers. The verification is fairly straightforward (cf. the properties of addi-
tion), but quite tedious. Here are the most often used properties: com-
mutativity, i.e., zw = wz for all complex numbers z and w; associativity,
i.e., (z1 z2 )z3 = z1 (z2 z3 ) for all complex numbers z1 , z2 , z3 ; distributivity:
z1 (z2 + z3 ) = z1 z2 + z1 z3 for all complex numbers z1 , z2 , z3 .
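The following snippet is a quick numerical sanity check of these rules (it is not part of the original notes): it implements the product formula given above and compares it with Python's built-in complex arithmetic on a few sample values.

    def mult(z1, z2):
        # the product as defined above: (a1*a2 - b1*b2) + (a1*b2 + a2*b1) i
        a1, b1, a2, b2 = z1.real, z1.imag, z2.real, z2.imag
        return complex(a1 * a2 - b1 * b2, a1 * b2 + a2 * b1)

    z, w, u = 2 + 3j, 1 - 1j, -4 + 0.5j
    assert mult(z, w) == z * w              # the definition matches Python's product
    assert z * w == w * z                   # commutativity
    assert (z * w) * u == z * (w * u)       # associativity
    assert z * (w + u) == z * w + z * u     # distributivity
    print("all checks passed")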
• the argument is the angle the directed segment from the origin to the
complex number makes with the positive real axis. The argument is
only defined for nonzero complex numbers.
[Figure: a complex number z in the plane, with its absolute value |z| (the distance to the origin) and its argument arg(z) (the angle with the positive real axis) indicated.]
If a complex number z has absolute value |z| and argument ϕ, then the
cartesian coordinates of the corresponding point in the plane are |z| cos ϕ
and |z| sin ϕ, respectively, so that

z = |z| cos ϕ + i |z| sin ϕ.    (1.1)

For the absolute value of a sum we have

|z + w| ≤ |z| + |w|.

This property is called the triangle inequality. See exercise 6 for a proof.
(Here, (mod 2π) stands for ‘modulo/up to multiples of 2π’.) These properties
are based on properties of the sine and cosine. To prove them, take two
complex numbers, z1 and z2, with polar coordinates (|z1|, ϕ1) and (|z2|, ϕ2),
respectively, so that
z1 = |z1 | cos ϕ1 + i |z1 | sin ϕ1 ,
z2 = |z2 | cos ϕ2 + i |z2 | sin ϕ2 .
Then
z1 z2 = |z1| |z2| ( (cos ϕ1 cos ϕ2 − sin ϕ1 sin ϕ2) + i (cos ϕ1 sin ϕ2 + cos ϕ2 sin ϕ1) )
      = |z1| |z2| ( cos(ϕ1 + ϕ2) + i sin(ϕ1 + ϕ2) ).
Since the absolute value of the complex number r(cos t + i sin t) (with r, t real
and r > 0) is r and its argument is t (up to multiples of 2π) we find:
z · (1/z) = 1,
|1/z| = 1/|z|,
arg(1/z) = − arg(z).
In terms of polar coordinates (again using formula (1.1)) we obtain for the
quotient:
|z/w| = |z|/|w|,
arg(z/w) = arg(z) − arg(w) (mod 2π).    (1.4)
(z1/z2) · (w1/w2) = (z1 w1)/(z2 w2).
The real and imaginary part of 1/z can be obtained as follows. Suppose
z = a + bi with a, b ∈ R and not both equal to 0. Then
1/z = 1/(a + bi) = (a − bi)/((a + bi)(a − bi)) = (a − bi)/(a² + b²).
1/(3 + 4i) = (1/(3 + 4i)) · ((3 − 4i)/(3 − 4i)) = (3 − 4i)/(3² + 4²) = (3 − 4i)/25,
and
(1 + i)/(2 + 3i) = ((1 + i)/(2 + 3i)) · ((2 − 3i)/(2 − 3i)) = (5 − i)/13.
Re(z) = Re(z̄),
Im(z) = − Im(z̄),
Re(z) = (z + z̄)/2,
Im(z) = (z − z̄)/(2i),
|z| = |z̄|,
arg(z) = − arg(z̄),
z1 + z2 = z1 + z2 ,
z1 z2 = z1 · z2 ,
z + z̄ = 2 Re(z),
z z̄ = |z|2 .
Also note the following: two complex numbers are equal if and only if their
real parts and their imaginary parts are equal. Two nonzero complex num-
bers are equal if and only if their absolute values are equal and their argu-
ments are equal up to multiples of 2π.
(we only use the values k = 0 and k = 1, since for k = 2 we find the
same solution as for k = 0, for k = 3 we find the same solution as
for k = 1, etc.). Verify that these two numbers are 1 + i and −1 − i,
respectively.
Note that the second approach is to be preferred if the exponent is
bigger, for instance, z⁶ = −1 (try the first approach and you'll quickly
see why).
|e^z| = e^{Re(z)},
arg(e^z) = Im(z).
1.2.2 Note the use of the real exponential function in this definition. The definition
of the complex exponential function agrees with the real exponential function
for real numbers z, since for a real number z = x + i·0, our new definition 1.2.1
yields |e^z| = e^x and arg(e^z) = Im(z) = 0. So e^z equals the real exponential e^x.
Note furthermore that e^z ≠ 0 for all complex z because |e^z| = e^{Re(z)} ≠ 0
(the real exponential function has no zeros).
1.2.3 Example. The complex number eπi has absolute value eRe(πi) = e0 = 1 and
argument Im(πi) = π, so that eπi = −1. The number e1+πi/2 has absolute
value eRe(1+πi/2) = e1 and argument Im(1 + πi/2) = π/2, so that e1+πi/2 = e i.
The solutions of the equation e^z = 1 + i form the set

{ (1/2) log(2) + i(π/4 + 2kπ) | k ∈ Z },

where log denotes the natural logarithm.
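A quick check (not part of the notes) that each element of this set is indeed mapped to 1 + i by the exponential function:

    import cmath, math

    # each z_k = (1/2) log 2 + i(pi/4 + 2k pi) should satisfy e^{z_k} = 1 + i
    for k in range(-2, 3):
        z = 0.5 * math.log(2) + 1j * (math.pi / 4 + 2 * k * math.pi)
        assert abs(cmath.exp(z) - (1 + 1j)) < 1e-9
    print("all checked values map to 1 + i")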
1.2.5 Theorem. ez1 ez2 = ez1 +z2 for all complex numbers z1 and z2 .
Proof. We prove this equality by showing that both sides have the same
absolute value and the same argument. Here is the computation for the
absolute value (note that we use arithmetical rules for the absolute value,
and for the real exponential function):

|e^{z1} e^{z2}| = |e^{z1}| · |e^{z2}| = e^{Re(z1)} · e^{Re(z2)} = e^{Re(z1)+Re(z2)} = e^{Re(z1+z2)} = |e^{z1+z2}|.
1.2.6 Theorem. (ez )n exists for every integral exponent n. It satisfies (for every
complex number z and integral n) the following property:
(ez )n = enz .
So (ez )−1 = e−z . Applying Theorem 1.2.5 the equality (ez )n = enz follows for
all integral negative n.
1.2.9 The formulas in Corollary 1.2.7 are a special case of a general property. Let
ϕ be a real number. Then eiϕ is a complex number with absolute value 1
and argument ϕ, so:
Property. For every real number ϕ we have
eiϕ = cos ϕ + i sin ϕ .
This relation connects the complex exponential function and the (real) sine
and cosine. Moreover, it provides another short way of representing a com-
plex number:
z = |z| ei arg z .
For instance, 1 + i = √2 e^{πi/4}.
it follows from 1.2.9 that these definitions agree with the real sine and cosine,
i.e., if you take z to be a real number in the new definition, then sin z and
cos z are simply the usual real sine and cosine of z, respectively.
1.2.12 Theorem. Sine and cosine are periodic and have period 2π. Also, sin²(z) + cos²(z) = 1 for all complex numbers z.
Proof.
sin(z + 2π) = (1/(2i)) (e^{i(z+2π)} − e^{−i(z+2π)}) = (1/(2i)) (e^{iz} e^{2πi} − e^{−iz} e^{−2πi})
            = (1/(2i)) (e^{iz} − e^{−iz}) = sin z,
since e±2πi has absolute value 1 and argument ± 2π, and therefore equals 1.
The proof that the cosine is periodic with period 2π is similar.
The relation sin2 (z) + cos2 (z) = 1 is proved by substituting the defining
expressions for sin(z) and cos(z):
( (1/2)(e^{iz} + e^{−iz}) )² + ( (1/(2i))(e^{iz} − e^{−iz}) )² = (1/4)(e^{2iz} + 2 + e^{−2iz}) − (1/4)(e^{2iz} − 2 + e^{−2iz}) = 1.
Next, consider the equation

cos(z) = 2.
It is clear that there are no real solutions. But there turn out to be complex
solutions. First rewrite the equation in terms of the exponential function:
(1/2)(e^{iz} + e^{−iz}) = 2.

Now set w = e^{iz}, then w ≠ 0 because of 1.2.2, and we find:

w + 1/w = 4,
w² − 4w + 1 = 0,
(w − 2)² = 3,
w = 2 ± √3.
So we arrive at:

|e^{iz}| = e^{−Im(z)} = |w| = 2 ± √3, so Im(z) = − log(2 ± √3).
Note that the absolute value of the complex cosine is not bounded by 1 like
for the real cosine.
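A numerical check (not part of the notes): since w = 2 ± √3 is real and positive, arg(w) = 0, so the real part of z is a multiple of 2π; taking the multiple 0 gives, for example, z = −i log(2 + √3).

    import cmath, math

    z = -1j * math.log(2 + math.sqrt(3))
    assert abs(cmath.cos(z) - 2) < 1e-12
    # the other branch, w = 2 - sqrt(3), gives log(2 - sqrt(3)) = -log(2 + sqrt(3)),
    # i.e. the conjugate solution
    z2 = -1j * math.log(2 - math.sqrt(3))
    assert abs(cmath.cos(z2) - 2) < 1e-12
    print(z, z2)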
1.2.14 Theorem.

\overline{e^z} = e^{z̄},
\overline{sin(z)} = sin(z̄),
\overline{cos(z)} = cos(z̄).

Proof.

|\overline{e^z}| = |e^z| = e^{Re(z)} = e^{Re(z̄)},
arg(\overline{e^z}) = − arg(e^z) = − Im(z) = Im(z̄),

so \overline{e^z} = e^{z̄}.

\overline{sin(z)} = \overline{(1/(2i))(e^{iz} − e^{−iz})} = −(1/(2i)) (\overline{e^{iz}} − \overline{e^{−iz}})
                  = −(1/(2i)) (e^{−iz̄} − e^{iz̄}) = (1/(2i)) (e^{iz̄} − e^{−iz̄})
                  = sin(z̄).
1.2.15 The formula ez ew = ez+w for the complex exponential function is useful in
deriving trigonometric formulas. For instance, start with e2it = eit eit (for
real t) and rewrite this relation as follows:

cos(2t) + i sin(2t) = (cos t + i sin t)(cos t + i sin t).

Since (cos t + i sin t)(cos t + i sin t) = cos² t − sin² t + 2i sin t cos t we find upon
comparing real and imaginary parts:

cos(2t) = cos² t − sin² t and sin(2t) = 2 sin t cos t.
This formula (and many similar ones) also turn out to hold for complex
values of t; this is easily verified by applying the definition of the complex
sine and cosine.
In a similar way the formula eia eib = ei(a+b) leads to
cos(a + b) = cos a cos b − sin a sin b;
sin(a + b) = sin a cos b + cos a sin b.
1.3.3 Example. Here is how we use these formulas to solve the equation
z3 = i .
(If you rewrite it like z 3 − i = 0 you see that it comes from a polynomial
equation.) We solve this equation by comparing the absolute values and
arguments of both sides of the equation. First note that any solution is
nonzero. Now turn to the absolute values: |z|³ = |z³| = |i| = 1, so |z| = 1.
Comparing the arguments gives 3 arg(z) = π/2 + 2kπ, i.e., arg(z) = π/6 + 2kπ/3
(k ∈ Z), which yields the three solutions e^{πi/6}, e^{5πi/6} and e^{3πi/2} = −i.
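These three solutions can be checked numerically (not part of the notes):

    import cmath, math

    roots = [cmath.exp(1j * (math.pi / 6 + 2 * k * math.pi / 3)) for k in range(3)]
    for z in roots:
        assert abs(z ** 3 - 1j) < 1e-12
    print(roots)   # approximately 0.866+0.5j, -0.866+0.5j, -1j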
p(z) = (z − α) q(z) .
p(z) = (z − α)q(z) + r .
0 = p(α) = (α − α)q(α) + r
and so r = 0.
1.3.6 If p(α) = 0, then p(z) can be written as (z − α)q(z) for some polynomial
q(z). If q(α) = 0, then q(z) also contains a factor z − α, and so we have (for
some polynomial s(z))
az + b = 0, a ≠ 0,
implies z = −b/a.
az² + bz + c, a ≠ 0.

az² + bz + c = a(z² + (b/a) z) + c = a(z + b/(2a))² + (4ac − b²)/(4a).
Now let w = z + b/(2a), then we find the (quadratic) equation

w² = (b² − 4ac)/(4a²),
which is of the type we discussed before. It has two solutions for w (unless
b2 −4ac = 0), see 1.3.4, from which the two solutions for z follow immediately.
The technique we used in rewriting the quadratic equation is called com-
pleting the square. Note that we do not use the abc-formula (the quadratic
formula), since we haven't defined square roots.
z² + (2 + 4i)z + i = 0.

Completing the square turns this into

(z + (1 + 2i))² = −3 + 3i,
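A quick numerical check of this example (not part of the notes): cmath.sqrt is used only as a shortcut to get one of the two complex numbers whose square is −3 + 3i (the notes deliberately avoid square-root notation), and both resulting values of z satisfy the original equation.

    import cmath

    s = cmath.sqrt(-3 + 3j)          # one of the two numbers with s^2 = -3 + 3i
    for z in (-(1 + 2j) + s, -(1 + 2j) - s):
        assert abs(z ** 2 + (2 + 4j) * z + 1j) < 1e-12
        print(z)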
The following theorem deals with polynomials whose coefficients are all real.
In that case any non-real solution automatically produces a ‘twin’ solution.
1.3.14 Corollary. Every nonzero polynomial with real coefficients can be factored
as a product of real polynomials of degree 1 and 2.
Proof. Let p(z) be a polynomial with real coefficients. If α is a real zero of
p(z), then p(z) can be written as
p(z) = (z − α)q(z) ,
where q(z) is also a polynomial with real coefficients. If α is a non-real zero
of p(z), then ᾱ is also a zero and
p(z) = (z − α)(z − ᾱ) r(z)
= (z 2 − (α + ᾱ)z + αᾱ) r(z)
= (z 2 − 2Re(α)z + |α|2 ) r(z) .
The first factor has real coefficients, so r(z) has real coefficients. Since the
degrees of q(z) and r(z) are less than the degree of p(z), we can repeat this
construction until we get to the point where the degrees of the quotients are
0.
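The following sketch is not part of the notes (the function name real_factors and the use of numpy.roots are my own choices); it mirrors the construction in the proof by pairing each non-real zero with its conjugate and emitting the corresponding real quadratic factor z² − 2 Re(α) z + |α|², and a real linear factor for each real zero.

    import numpy as np

    def real_factors(coeffs, tol=1e-9):
        """Split a real polynomial (coefficients, highest degree first) into real
        linear factors z - a and real quadratic factors z^2 - 2 Re(a) z + |a|^2."""
        roots = list(np.roots(coeffs))
        factors = []
        while roots:
            a = roots.pop()
            if abs(a.imag) < tol:
                factors.append((1.0, -a.real))                      # linear factor z - a
            else:
                # remove the conjugate partner of a and emit a real quadratic factor
                roots.remove(min(roots, key=lambda r: abs(r - a.conjugate())))
                factors.append((1.0, -2 * a.real, abs(a) ** 2))
        return factors

    # z^3 + 3z^2 + 4z + 2 = (z + 1)(z^2 + 2z + 2)
    print(real_factors([1, 3, 4, 2]))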
For two distinct complex numbers z and w the complex numbers of the
form z + t(w − z) with t real run through the points of the line through z
and w. The segment with endpoints z and w (whose points correspond to
parameter values t in the interval [0, 1]) is denoted by [z, w]. For t = 1/2 we
find the midpoint of the segment [z, w]:

w + (1/2)(z − w)  or  (1/2)(z + w).
The length of the segment [z, w] is equal to the distance between z and w,
i.e., |w − z|. The complex number w − z not only encodes the information
on the length of the segment [z, w], but also on the segment’s direction via
its argument. For example, the segment [2 + i, 3 + 2i] has length |1 + i| = √2,
and it makes an angle of π/4 radians with the positive real axis since
arg(1 + i) = π/4.
The lines through z1 and z2 (with z1 ≠ z2) and through w1 and w2 (with
w1 ≠ w2), respectively, are parallel if and only if w2 − w1 is a real multiple
of z2 − z1, or, equivalently, the quotient (w2 − w1)/(z2 − z1) is real.
1.4.3 Example. This example illustrates the use of complex numbers in handling
segments in a triangle. Suppose △ABC is a triangle in the plane. Let D
be the midpoint of AC and let E be the midpoint of BC. We will show,
using complex numbers, that DE is parallel with AB and that the length of
segment AB is twice the length of segment DE.
To show this, let A, B, C correspond to the complex numbers z1 , z2 , z3 ,
respectively (it turns out to be irrelevant where the origin is). Then D and
E correspond to
(1/2)(z1 + z3) and (1/2)(z2 + z3),

respectively. Since

(1/2)(z2 + z3) − (1/2)(z1 + z3) = (1/2)(z2 − z1)
we conclude that DE and AB are parallel and that segment DE is half as
long as segment AB.
1.4.4 Translations
Let u be a complex number. The map T : C → C given by T (z) = z + u
so that the argument of R(z) is α radians more than that of z. Another way
of saying this is: if z is on the circle C with equation |z| = r, then R(z) is
also on C.
Similarly, if the absolute value of w ≠ 0 differs from 1, then multiplication
by w defines a transformation of the plane in which each complex number is
rotated through arg(w) radians and is scaled by a factor |w|.
A circle with center z0 and radius r consists of all complex numbers z
satisfying
|z − z0 | = r.
Depending on the situation, alternative descriptions may be useful. Here are
a few equivalent descriptions.
For example, suppose you are asked to show that w̄ = 1/w for every complex
number w on the circle C : |z| = 1, then you could proceed as follows. Let
w be an arbitrary complex number on C, then w can be written as e^{it} for
some real t. Then w̄ = \overline{e^{it}} = e^{−it} by Theorem 1.2.14, and
1/w = 1/e^{it} = e^{−it} by Theorem 1.2.6, and so we are done. (An alternative
approach is to write w in the form cos t + i sin t, etc.)
1.4.6 Example. If the complex numbers z and w have the same absolute value
(≠ 0) and if the angle between (the segments connecting 0 with) z and w
is equal to α, then the argument of the quotient z/w is α or −α so that
z/w = e^{iα} or z/w = e^{−iα}. Another way of phrasing this is to say that
z = w e^{iα} or z = w e^{−iα}.
If, for instance, in △αβγ (so a triangle with vertices α, β, γ) γ − α =
e±πi/3 (β − α), then this can be read as: the segment [α, γ] is obtained from
the segment [α, β] by a rotation through ±π/3 radians. In particular, these
two segments have the same length:
|γ − α| = |e±πi/3 (β − α)| = |e±πi/3 | · |(β − α)| = |β − α|.
And of course, the triangle is then equilateral by the congruence criterion
SAS (side-angle-side). Here is a verification that |γ − β| = |β − α|(= |γ − α|)
using complex numbers. First rewrite γ − β as follows:
γ − β = γ − α + α − β = e±πi/3 (β − α) + α − β = (e±πi/3 − 1)(β − α).
Now note that e^{±πi/3} − 1 = 1/2 ± (1/2) i√3 − 1 = −1/2 ± (1/2) i√3, whose
absolute value is 1. So
|γ − α| = |(e±πi/3 − 1)(β − α)| = |e±πi/3 − 1| · |β − α| = |β − α|.
Note that this example really comes down to the fact that the complex num-
bers 0, eπi/3 and eπi/3 − 1 are the vertices of an equilateral triangle.
1.4.7 Example. If △z1 z2 z3 is a triangle, then for every complex number w ≠ 0 the
triangles △z1 z2 z3 and △(wz1 )(wz2 )(wz3 ) (multiply each vertex zi by w) are
similar. There are various ways to see this. One way is to compare lengths
of corresponding sides (using the rules for absolute values):
|wz2 − wz1| / |z2 − z1| = (|w| · |z2 − z1|) / |z2 − z1| = |w|,
and
|wz3 − wz1| / |z3 − z1| = (|w| · |z3 − z1|) / |z3 − z1| = |w|,
and, similarly, |wz3 − wz2 | = |w| · |z3 − z2 |. So the triangles are similar by
the sss criterion (side-side-side).
Of course, you can also compare the angles of triangle △(wz1)(wz2)(wz3)
with those of triangle △z1 z2 z3, e.g., arg(wz1/wz2) = arg(z1/z2). So both triangles
have the same angles and are therefore similar.
1.4.8 Example. Let △ABC be a triangle. Let BCDE and ACF G be two squares
erected externally on the sides BC and AC, respectively, as in the illustra-
tion. Let H be the midpoint of DF . Prove that HC and AB are perpen-
dicular. The idea in the following proof is to connect perpendicularity with
multiplication by i.

[Figure: triangle ABC with the squares BCDE and ACFG erected externally on the sides BC and AC; H is the midpoint of DF.]
Put the origin in C (the point C seems central to the configuration, so
looks like a reasonable choice to make the computations easier) and denote
vertex A by the complex number z and vertex B by w. Then vertex D
corresponds to iw (rotate B around C through 90°), and vertex F corresponds to
−iz (rotate vertex A through −90°). The midpoint of segment DF is then
(1/2)(iw − iz). Since

(1/2)(iw − iz) = (1/2) · i · (w − z)
and since w − z corresponds to segment AB, we find that HC is indeed
perpendicular to AB (and has half its length).
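A quick numerical check of this argument (not part of the notes), for one arbitrarily chosen triangle with C at the origin: the quotient of H − C by B − A should be purely imaginary.

    # one arbitrarily chosen triangle, with C at the origin: A = z, B = w
    z, w = 3 + 1j, 1 + 4j
    d, f = 1j * w, -1j * z          # D = iw and F = -iz, as in the proof
    h = (d + f) / 2                 # H, the midpoint of DF
    q = h / (w - z)                 # compare H - C with B - A
    assert abs(q.real) < 1e-12      # purely imaginary: HC is perpendicular to AB
    print(q)                        # approximately 0.5j: a quarter turn combined with the factor 1/2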
chosen in the center of the circumcircle. Suppose that |α| = |β| = |γ| = 1.
The points (1/2)(β + γ), (1/2)(α + γ) and (1/2)(α + β) are the midpoints D,
E, F of the three sides BC, AC and AB, respectively. It follows from
classical geometry that the segments connecting the origin with each of these
midpoints are perpendicular to the corresponding sides of the triangle. The
point (1/3)(α + β + γ) is the centroid Z of triangle △αβγ.
Figure 1.2: The circumcircle of △ABC with center O, and the circumcircle of
△DEF with center N . The centroid Z, the orthocenter H and the altitudes
are also shown.
The point h = α + β + γ is also special: h − γ = α + β = 2 · (1/2)(α + β), so
h − γ and AB are perpendicular and h − γ is twice as long as OF. So the point
h is on the altitude from C. Similarly, h is on the altitudes through B and
A, respectively. So the three altitudes are concurrent. Their common point
H (or h in terms of complex numbers) is called the orthocenter of triangle
△ABC. (By the way: in every triangle the three altitudes are concurrent;
the assumptions we have made are not restrictive; do you see why?)
The point (1/2)(α + β + γ) or h/2 is also special. To see this, consider the
distances between this point and the three midpoints of the sides of △ABC:

|(1/2)(α + β + γ) − (1/2)(β + γ)| = |(1/2)α| = 1/2.

The distances to the midpoints E and F are likewise equal to 1/2, and so the
point h/2 is the center N of the circumcircle of triangle △DEF (see figure 1.3).
Figure 1.3: The circumcircle of △DEF with center N also passes through
the midpoints of the segments AH, BH and CH.
the three segments connecting h and the three vertices A, B and C, respectively.
The distance of h/2 to each of these midpoints is equal to 1/2:

|(1/2)h − (1/2)(h + α)| = |−(1/2)α| = 1/2,
etc. By now we have: the midpoints of the sides of △ABC and the midpoints
of the segments HA, HB and HC lie on the same circle.
This circle turns out to also pass through the three feet of the altitudes
of △ABC (as figure 1.3 suggests). For this reason the circle is called the
nine-point circle of △ABC. The proof is discussed in exercise 24.
1.5 Notes
More worked examples can be found in [7] and [8] (see the bibliography at the
end of the lecture notes). The role of complex numbers in geometry is extensively
discussed in [5].
The construction of complex numbers is an example of the construction of an
arithmetical system. Another example is the system Z/nZ of integers modulo n.
Such constructions are discussed in the various algebra courses.
Complex numbers have a centuries long history. A ‘formal’ definition in terms
of pairs of real numbers was given by Sir William Hamilton (1805–1865), see [1],
p. 524. He defined the addition on such pairs by (a, b) + (c, d) = (a + c, b + d),
and the multiplication by (a, b) · (c, d) = (ac − bd, ad + bc). By agreeing to write
a instead of (a, b) (for real a) and i for (0, 1), we arrive at the usual notation
a + bi. Hamilton’s approach to define complex numbers in terms of the familiar
real numbers contributed to the demystification of complex numbers. Hamilton
generalized his construction to an arithmetical system with elements of the form
a+bi+cj+dk (with a, b, c, d ∈ R), where i2 = −1, j 2 = −1, k 2 = −1, ij = k = −ji,
jk = i = −kj, ki = j = −ik. This is the famous arithmetical system of the
quaternions.
Complex numbers are useful for linear algebra since they enable us to solve
polynomial equations related to linear transformations, as will be discussed in
Linear Algebra 2. Polynomials are discussed in more detail in the algebra courses.
They play an important role in many branches of mathematics, ranging from
numerical mathematics to cryptology.
The Fundamental Theorem of Algebra has a long history in itself. It took
many decades in the 18th and 19th century and the efforts of mathematicians like
d’Alembert, Argand, Gauss to produce a rigorous proof (many candidate proofs
contained a subtle gap which could only be filled after the development of rigorous
analysis and topology), see [1]. A proof that uses complex integration is discussed
in Complex Analysis. The fact that there do not exist explicit formulas for solving
polynomial equations of degrees 5 and higher requires a substantial amount of
algebra.
The analysis of functions f : C → C, i.e., limits, continuity, differentiation,
integration, is also discussed in the course on complex analysis. Complex analysis
is extensively used not only in mathematics, but also in electrical engineering and
in mathematical physics.
1.6 Exercises
§1
1 Write each of the following complex numbers in the form a + bi with a and b real:
a. (2 + 3i)(1 − i),
b. (−1/2 + (1/2) i√3)(−1/2 − (1/2) i√3),
c. 1/(4 − 3i),
d. (7 + i)/(1 + 2i),
e. (9 − 3i)/(1 + 3i),
f. z/(z + 1)², with z = (1/2)√2 + (1/2)√2 i.
2 Write each of the following complex numbers in the form r(cos ϕ + i sin ϕ),
with r > 0 and −π ≤ ϕ ≤ π, and draw these numbers in the complex plane:
a. −3,
b. 2i,
c. 1 + i,
d. √3 + i,
e. 5 + 12i,
f. 4 − 4i.
3 Draw an arbitrary complex number z (not on the real axis) in the complex
plane, together with the numbers

z + 2, −2z, 1/z, z − 2i, iz, z̄, −iz.
4 In the complex plane, draw the complex numbers z ∈ C that satisfy both

|z + 1 − i|² ≤ 2  and  π/2 ≤ arg z ≤ 3π/4.
5 Determine all complex numbers z that satisfy
a. |z − i| = |z + 3i|,
b. |z − 3i| = |4 + 2i − z|,
c. Re(z²) = Im(z²),
d. Re(z² + 1) = 0 and |z| = √2,
e. arg(z/z̄) = 2π/3.
6 Prove the triangle inequality |z1 + z2 | ≤ |z1 | + |z2 | in the following steps.
§2
7 Draw each of the following complex numbers in the plane and write them in
the form a + bi (with a, b real):
a. 2eπi/2 , d. e5πi/3 ,
b. 3e2πi/3 , e. e(−πi/3)+3 ,
√
c. 2eπi/4 , f. e−5πi/6+2kπi , k ∈ Z.
c. e^{Re(z)} = 5,
f. e^{2iz} = (1 + i)/(1 − i).
9 Use the definitions of the complex cosine and sine to show each of the fol-
lowing statements.
b. sin(2z) = 4.
§3
11 Solve each of the following equations and draw the solutions in the complex
plane.
a. z⁶ = 1,
b. z³ = 8,
c. z⁴ = 16i,
d. (z + i)⁴ = −1,
e. (z + 2 − i)⁶ = 27i,
f. z² = z,
g. z³ = −z.
12 Solve each of the following equations and draw the solutions in the complex
plane.
a. z² + z + 1 = 0,
b. z² − 2iz + 8 = 0,
c. z² − (4 + 2i)z + 3 + 4i = 0,
d. z²(i + z²) = −6.
c. Suppose 5 and 1 + 2i are zeros of a degree 3 polynomial with real coefficients.
Determine such a polynomial.
b. z³ + 3z² + 4z + 2,
c. z⁴ + z³ + 2z² + z + 1.
15 a. Compute (1 + i)^11.
16 Prove that for all positive integers n De Moivre's formula holds for real ϕ:

(cos ϕ + i sin ϕ)^n = cos(nϕ) + i sin(nϕ).
c. The angle between the lines ℓ and m through the origin is α radians. We
first reflect z in ℓ and then the result in m. Show that this composition of
these two reflections is a rotation through 2α radians around the origin.
[Hint: assume the angle between ℓ and the positive real axis is β radians,
and the angle between m and the real axis is β + α radians.]
c. From this item onwards, the origin is not necessarily located in one of the
vertices. Prove that γ − α = ρ(β − α) or γ − α = ρ̄(β − α).
21 Let ℓ be the line through the two distinct complex numbers v and w. Then
ℓ consists of all complex numbers of the form v + t(w − v) with t real.
a. Prove: if, for a complex number z with z ≠ v and z ≠ w, the quotient
(z − w)/(z − v) is a real number, say t, then z is on the line ℓ.
b. Prove: if z, distinct from v and w, is on ℓ, then the quotient (z − w)/(z − v)
is a real number.
22 Suppose ABCD and AB ′ C ′ D′ are two squares in the plane that a) have
vertex A in common, b) have the same orientation of the vertices, and c) lie
outside one another. Let P be the intersection of the diagonals AC and BD;
let Q be the intersection of the diagonals AC ′ and B ′ D′ ; let R be the midpoint
of the segment BD′, and let S be the midpoint of the diagonal B′D. Prove
that P QRS is a square by first showing that segment P S transforms into P R
by a rotation through 90◦ . (Do not denote complex numbers corresponding
to P , etc., by P , etc.; use for instance corresponding small letters.)
b. Back to the general case: show that z can be written as u + reit · (v − u).
Use this to show that the mirror image of z is equal to
u + ((v − u)/(v̄ − ū)) (z̄ − ū).
u + v − uvz̄
(α − α′)/(β − γ) + (ᾱ − ᾱ′)/(β̄ − γ̄) = 0.

[Hint: since α − α′ and β − γ are perpendicular, the quotient (α − α′)/(β − γ) is
purely imaginary.]
α′ = −βγ/α.
[Note: an alternative approach would be to compute the mirror image of
h = α + β + γ in the line AB with the formula from exercise 23c) and to
verify that this mirror image is on the circumcircle of △ABC.]
c. Show that the segments BH and BA′ have the same length.
e. Show that the distance between h/2 and P equals 1/2. Conclude that the
nine-point circle passes through the three feet P, Q, R of the altitudes.
(z̄ · z)/(1 − z)² = 1.
|z + 2i| = |z − 3|.
29 Let p(z) be a complex polynomial. Prove that p(z) is a real polynomial (i.e.,
all its coefficients are real) if and only if \overline{p(z)} = p(z̄) for all z ∈ C.
30 Solve in C:

z³ = i z.
31 Suppose the squares ABCD and A′B′C′D′ have the same orientation (so
going from A to B to C to D and going from A′ to B′ to C′ to D′ are both
clockwise or both counterclockwise). Prove that the midpoints of the segments
AA′, BB′, CC′, and DD′ are the vertices of a square.
Chapter 2

Vectors in two and three dimensions
2.1.2 Vectors
A vector corresponds with an arrow in the plane or in space, and is de-
termined by its direction and its magnitude (length). Therefore, an arrow,
translated parallel to itself to any point in space but with the same direction
and magnitude, is considered to represent the same vector. Such translated
arrows are called equivalent, i.e., they represent the same vector.¹

¹Don't confuse this with a force vector (or any other vector-valued quantity) applied
to a physical point in space! Although the force has, as a vector, many mathematical
Figure 2.1: On the left, representations of the same vector are drawn: direc-
tion and length of each arrow are the same, but the heads and tails differ. On
the right, different vectors are drawn with the same starting point, namely a
chosen origin O in the plane.
v.
Furthermore, we write v for 1 v, −v for (−1)v, −3v for (−3)v, etc. The
vector −v is called the opposite of v.
[Figure: the vectors u, 2u and −u.]
• 0 · v = 0;
• λ(µv) = (λµ)v.
In words: if the vector v is first multiplied by µ and the resulting vector
is multiplied by λ, then the result equals the scalar product of the scalar
λµ and the vector v.
(u + v) + w = u + (v + w)
Figure 2.3: Addition of vectors: on the left the construction using a paral-
lelogram, on the right the head-to-tail construction, joining the tail of the
second vector to the head of the first one.
for all vectors u, v, w. Note that the addition is only defined for
two vectors, and not for three or more. So if you want to add three
vectors, you will have to split the problem in various additions of two
vectors. For instance, you could add the first and second vector, and
then add the result to the third vector, so this corresponds to (u+v)+w.
Associativity tells you that it doesn’t matter in which way you split the
problem, the answer is always the same. That’s the (justified) reason
we often leave out the brackets (we sometimes put in brackets to clarify
calculations for the reader). So we often simply write v 1 + v 2 + v 3 + v 4
for the addition of four vectors instead of, say,
v 1 + ((v 2 + v 3 ) + v 4 ).
v+w =w+v
for all vectors v and w. This is obvious from the parallelogram con-
struction. It implies that you can change whenever needed the order
of the vectors in additions. For instance, u + v + w = w + u + v. Here
is how this specific equality follows from commutativity:
u + v + w = u + w + v = w + u + v.
From now on, you don’t have to supply such proofs any time you use
commutativity, unless a proof is explicitly asked for.
• Instead of v + −w we usually write v − w (subtraction of vectors).
Here are the arithmetic rules that involve both addition and scalar multipli-
cation.
• Distributivity of the scalar multiplication over addition:
λ(v + w) = λv + λw
for all vectors v, w and for all scalars λ.
• Distributivity of the scalar addition over the scalar multiplication:
(λ + µ)v = λv + µv
for all scalars λ, µ and all vectors v.
The sum of any vector v and its opposite −v always yields the zero vector:
v − v = 0.
2.1.6 Linear combinations
If v 1 , v 2 , . . . , v n are n vectors and λ1 , λ2 , . . . , λn are n real numbers (scalars),
then the vector
λ1 v 1 + λ2 v 2 + · · · + λn v n
is called a linear combination of the vectors v 1 , v 2 , . . . , v n . Linear combina-
tions are the vectors we can build out of a given set of vectors using addition
and scalar multiplication.
So 2u − 3v + 2w is a linear combination of u, v, w.
2.1.7 Examples. The following examples show that computations with vectors
involving addition and scalar multiplication only are fairly easy. Note that
we cannot multiply two (or more) vectors (but see §2.5).
a) 3v − w + 2v + 3w = 5v + 2w. Here are the detailed steps, using the
various arithmetic rules. By commutativity
3v − w + 2v + 3w = 3v + 2v − w + 3w.
Next, distributivity and the fact that −w = (−1)w imply
3v + 2v − w + 3w = (3 + 2)v + (−1 + 3)w = 5v + 2w.
Note that because of associativity we didn’t place brackets. Otherwise,
the first step of the computation would have looked like:
(3v+(−w+2v))+3w = (3v+(2v−w))+3w = ((3v+2v)−w)+3w = . . .
2.2.2 Lines
The scalar multiples x = λv of a vector v, with v ≠ 0, run through all points
(vectors) of a straight line ℓ through the origin.
[Figure: a line through (the endpoint of) a with direction vector v; its points are a + λv.]
2.2.3 Example. (Supporting and direction vectors of a line are not unique)
The vector p + v is on the line ℓ with parametric description x = p + λv:
just take λ = 1. This vector p + v may serve as a supporting vector of ℓ,
since the vectors p + v + µv run through the same vectors for varying µ as
the vectors p + λv (for varying λ). This follows easily from the equalities
p + v + µv = p + (1 + µ)v and p + λv = p + v + (λ − 1)v. In fact, every vector
on ℓ may serve as a supporting vector.
Similarly, 2v, −3v, π v are direction vectors of ℓ. For instance, the vectors
p + µ(2v) run through the vectors of ℓ for varying µ.
2.2.4 Planes
Planes in space can also be represented in terms of a vector or parametric
representation. For this we need one vector whose endpoint is in the plane (a
supporting vector) and two direction vectors which are not scalar multiples of
each other. Since we use two direction vectors, we also need two parameters.
The plane U through the origin and with direction vectors u and v has
the following parametric description:
U : x = λu + µv.
The plane V with supporting vector a and direction vectors u and v has the
following parametric representation:
V : x = a + λu + µv.
Just as with lines, neither the supporting vectors nor the direction vectors
are uniquely determined.
Figure 2.5: On the left a plane through the origin. On the right a plane
through a with direction vectors u and v.
2.2.5 Example. (Supporting and direction vectors of planes are not unique)
The plane V with parametric equation V : x = a + λu + µv can, for instance,
also be described in the following way:

V : x = a + ρ(u + v) + σ(u − v).

To see this we have to verify that every vector of the form a + λu + µv can
also be written in the form a + ρ(u + v) + σ(u − v), and vice versa. The
following two equalities show this:

a + λu + µv = a + (1/2)(λ + µ)(u + v) + (1/2)(λ − µ)(u − v),
a + ρ(u + v) + σ(u − v) = a + (ρ + σ)u + (ρ − σ)v.
In fact, one can prove in a similar way that any two linear combinations of u
and v that are not multiples of one another, may serve as direction vectors.
For example, the pair 2u + 3v, 2u − 5v is such a couple.
As with lines, any vector on V can serve as supporting vector of V . For
example, a + 3u + 5v is such a vector.
2.3.2 Basis
• The plane
In the plane we need two vectors which are not multiples of each other,
Figure 2.6: Using the basis e1 , e2 any vector in the plane can be described
with the use of two coordinates.
v = v1 e 1 + v2 e 2 ,
• 3-dimensional space
In space we choose three vectors e1 , e2 , e3 that are not coplanar (i.e.,
whose endpoints do not lie in one plane with the origin). Then any
vector v can be written as a linear combination of these three vectors:
v = v1 e 1 + v2 e 2 + v3 e 3 ,
v + w ↔ (v1 + w1 , v2 + w2 , v3 + w3 )
λv ↔ (λv1 , λv2 , λv3 )
This yields a linear equation in the variables x1 and x2 . Lines do not have
unique equations. For instance, the equations x1 + 2x2 = 3 and 2x1 + 4x2 = 6
obviously describe the same line. In fact, multiplying both sides of an equation
by the same nonzero scalar doesn’t change the solution set.
Note that a vector parametric equation ℓ : x = a + λv of a line gives an
explicit description of the vectors on the line: every value of λ produces a
vector (or coordinate vector) on the line.
An equation describes the vectors on the line implicitly: you only know
which relation the coordinates of a vector need to satisfy in order to be the
coordinates of a vector on the line.
A line in space can also be described using two linear equations, because a
line can be seen as the intersection of two planes and every plane can be
described by a linear equation (extensive details on this follow in Chapter
4). For instance, the system x1 + x2 + x3 = 1, 2x1 − x3 = 0 describes the line
x = (0, 1, 0) + λ(1, −3, 2) (by substitution you can verify that every vector
satisfies both equations). A way to find this parametric equation from the
two linear equations is to choose x1 as parameter, call it λ, and then deduce
that x3 = 2λ and x2 = 1 − x1 − x3 = 1 − 3λ. The computational techniques
behind this will be discussed in Chapter 3.
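As a quick check (not part of the notes) that the parametric description found above indeed solves both equations:

    # every point of x = (0, 1, 0) + lam*(1, -3, 2) satisfies both equations
    for lam in (-2.0, 0.0, 0.5, 3.0):
        x1, x2, x3 = lam, 1 - 3 * lam, 2 * lam
        assert abs(x1 + x2 + x3 - 1) < 1e-12
        assert abs(2 * x1 - x3) < 1e-12
    print("the parametric line lies in both planes")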
x1 = a1 + λu1 + µv1
x2 = a2 + λu2 + µv2
x3 = a3 + λu3 + µv3 .
d 1 x1 + d 2 x2 + d 3 x3 = d 4 ,
x1 + 3x2 = 1 + 3λ + 3(2 − λ) = 7.
x1 = 2 + λ/2 − 3µ/2, x2 = λ, x3 = µ.
In vector notation:

(x1, x2, x3) = (2, 0, 0) + λ(1/2, 1, 0) + µ(−3/2, 0, 1).
‖u‖ · ‖v‖ · cos ϕ,
where ϕ is the angle between the vectors u and v (note the role of the cosine:
the sign of the angle doesn’t matter). If one (or both) of the vectors is the
zero vector, then the inner product is, by definition, 0. We denote the inner
product by
(u, v).
Here is an example. Suppose the vectors u, v have length 4 and the angle
Figure 2.7: If the angle between the vectors u and v is at most π/2, then
their inner product equals the product of the length of u and the length of the
projection of v on the line x = λu.
‖u‖ · ‖v‖ · cos ϕ = ‖v‖ · ‖u‖ · cos ϕ.
• Orthogonality:
If two non-zero vectors have inner product 0, then they are perpen-
dicular (the angle between them is ±90◦ or ±π/2) since the cosine of
the angle between them is 0. Conversely, if two non-zero vectors are
perpendicular, then their inner product is 0. Now the zero vector has
inner product 0 with any vector, and we agree to say that the zero
vector is perpendicular to any vector. This is a convenient convention
since then we have: The inner product of two vectors is 0 if and only
if the two vectors are perpendicular.
2.4.3 Examples. Although lengths and angles are maybe what we are really in-
terested in, the inner product is so useful because of the arithmetic rules it
satisfies. For instance, ‖u + v‖ usually differs from ‖u‖ + ‖v‖, but
(u + v, u + v) is easy to expand using the rules. Often it is therefore useful
to translate problems involving lengths and angles into problems with in-
ner products. Here are some examples demonstrating the use of the inner
product’s properties.
a) Suppose that (u, v) = 2. Using the arithmetic rules the inner product
(3u, −4v) is computed as follows: (3u, −4v) = 3 · (−4) · (u, v) = −12 · 2 = −24.
Next, we turn to the first term on the right-hand side, (u, u − 2v):
v1 w 1 + v2 w 2 + v3 w 3 .
Finally, the cosine of the angle between the vectors v and w (both ≠ 0) equals

cos ϕ = (v, w) / (‖v‖ · ‖w‖) = (v1 w1 + v2 w2 + v3 w3) / (√(v1² + v2² + v3²) · √(w1² + w2² + w3²)).
2.4.5 R2 , R3 and the standard inner product
Motivated by the previous discussion, we introduce the so-called standard
inner product in R2 and R3 , viewed as vector spaces themselves (more on
this in later chapters). A vector in R2 is a pair of real numbers like (a1 , a2 ).
The standard inner product of two vectors a = (a1 , a2 ) and b = (b1 , b2 ) in R2
is defined as
(a, b) := a1 b1 + a2 b2 .
Similarly, the standard inner product of two vectors a = (a1 , a2 , a3 ) and b =
(b1 , b2 , b3 ) in R3 is defined as
(a, b) := a1 b1 + a2 b2 + a3 b3 .
2.4.6 Example. The angle ϕ between the vectors u = (1, 0) and v = (1, 1) in R2
can be determined as follows.
cos ϕ = (u, v) / (‖u‖ · ‖v‖) = (1·1 + 0·1) / (√(1² + 0²) · √(1² + 1²)) = 1/√2 = (1/2)√2,

so ϕ = π/4.
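The same computation can be done numerically; the helper below is not part of the notes (the name angle is mine) and simply implements the formula for the cosine of the angle.

    import math

    def angle(v, w):
        """Angle between two vectors via cos(phi) = (v, w) / (||v|| ||w||)."""
        dot = sum(a * b for a, b in zip(v, w))
        norms = math.sqrt(sum(a * a for a in v)) * math.sqrt(sum(b * b for b in w))
        return math.acos(dot / norms)

    print(angle((1, 0), (1, 1)), math.pi / 4)   # both are 0.7853...
    print(angle((1, 1, 2), (1, 1, -1)))         # pi/2: these two vectors are perpendicular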
This means that the difference u − v is perpendicular to (2, −1, 3). In par-
ticular, (2, −1, 3) is a vector which is perpendicular to all direction vectors
of the plane. We call (2, −1, 3) a normal vector of the plane.
In general, if a1 x1 + a2 x2 + a3 x3 = d is an equation of the plane V , then
we can rewrite it in the form of an inner product:
(a, x) = d,
(a, u − v) = 0.
2.4.8 Pythagoras
If u and v are perpendicular vectors, then we find for the square of the length
of the sum vector u + v:
k u + v k2 = (u + v, u + v)
= (u, u) + 2(u, v) + (v, v)
= (u, u) + (v, v)
=k u k2 + k v k2 .
2.4.9 Example. We determine the distance between (the endpoint of) p = (1, 2)
and the line ℓ : x = (8, 1) + λ(3, −4). To this end we first determine a vector
q on ℓ such that p − q is perpendicular to ℓ, i.e., perpendicular to the direction
vector (3, −4). To find q, we solve (p − ((8, 1) + λ(3, −4)), (3, −4)) = 0; this
gives −25 − 25λ = 0, so λ = −1 and q = (5, 5).
Figure 2.9: To compute the distance between p and the line ℓ, we determine
a vector q on ℓ such that p − q is perpendicular to ℓ. If r is an arbitrary
vector on ℓ, then the right-hand figure illustrates that the distance between p
and r is greater than (or equal to) the distance between p and q because of
the Pythagorean theorem.
for every vector on ℓ, its distance to p turns out to be at least as big. Here
is why. If r is a vector on ℓ, then we should compare ‖p − r‖ and ‖p − q‖.
Since p − q is perpendicular to q − r (why?), we can apply the Pythagorean
theorem to the triangle with vertices p, q, r. In vector language: we apply
Pythagoras to the vectors u = p − q, v = q − r and their sum u + v = p − r.
We obtain:

‖p − r‖² = ‖p − q‖² + ‖q − r‖².

Evidently, ‖p − r‖ ≥ ‖p − q‖ (with equality if and only if q = r). So ‖p − q‖
is the distance between p and ℓ.
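Here is the same computation done numerically with numpy (not part of the notes):

    import numpy as np

    p = np.array([1.0, 2.0])
    a, v = np.array([8.0, 1.0]), np.array([3.0, -4.0])   # supporting and direction vector of the line
    lam = np.dot(p - a, v) / np.dot(v, v)                # makes p - q perpendicular to v
    q = a + lam * v
    assert abs(np.dot(p - q, v)) < 1e-12
    print(lam, q, np.linalg.norm(p - q))                 # -1.0, [5. 5.], 5.0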
Figure 2.10: The length of the cross product of v and w is the area of the
parallelogram spanned by v and w.
u × (v + w) = u × v + u × w and (v + w) × u = v × u + w × u.
The properties b) and d) almost determine the cross product, but not quite:
the cross product could still point in two different directions perpendicular
to the plane spanned by v and w. Which direction to choose is based on
the right-hand rule: if you put your right hand along v in such a way that
your fingers curl from v to w (so either your little finger or your index finger
touches v), then your thumb points in the direction of v × w.
Figure 2.11: The volume of the parallelepiped equals the absolute value of
(a × b, c).
expressed using the inner product and cross product. To obtain this expres-
sion, note that the volume is the product of the area of the parallelogram
spanned by a and b and the height. The area of the parallelogram is ‖a × b‖
as we saw before. Since a×b is perpendicular to the parallelogram, the height
equals the (length of the) projection of c on a × b, i.e., the absolute value of
‖c‖ · cos ϕ, where ϕ is the angle between c and a × b. So, the volume of the
parallelepiped is
|(a × b, c)|.
An equation of the plane is therefore −x1 + 3x2 − 5x3 = d for some d.
Substituting (1, 2, 3), we find that d = −10. An equation is therefore
−x1 + 3x2 − 5x3 = −10.

(1/2) ‖(1, 2, 1) × (2, −1, 3)‖ = (1/2) ‖(7, −1, −5)‖ = (1/2)√75.
Figure 2.12: The three medians in △ABC are concurrent. The vector de-
scription of the midpoints of the sides is given.
The question whether the three medians have a point in common comes down
to the question whether the parameters λ, µ and ρ can be chosen in such a
way that the three parametric descriptions describe the same vector. The
answer is ‘yes.’ Indeed, for λ, µ, ρ all equal to 2/3 we obtain the common
vector (1/3)(a + b + c). This gives a vector description of the centroid of a triangle.
Note that it looks like an ‘average’ of the three vectors corresponding to the
vertices.
Since we need the parameter value 2/3, the vector approach also shows
that the medians, now viewed as segments, divide one another in the ratio
2 : 1.
Note that the common value for λ, µ, ρ can also be computed. Try finding
that value by rewriting the vector equation
a + λ( (1/2)(b + c) − a ) = b + µ( (1/2)(a + c) − b )
Figure 2.13: The midpoints of the sides of quadrangle ABCD form a paral-
lelogram.
e − f = (1/2)(a + b) − (1/2)(b + c) = (1/2)(a − c)

and

h − g = (1/2)(a + d) − (1/2)(c + d) = (1/2)(a − c).
This finishes the proof.
Figure 2.14: The altitudes in △ABC are concurrent. The altitude from B is
dashed.
So, let △ABC be a triangle. Suppose the altitudes from A and C meet
in P . The vector corresponding to P is denoted by p. The fact that AP is
perpendicular to BC and CP is perpendicular to AB translates as follows:
p − a ⊥ b − c or (p − a, b − c) = 0,
(2.1)
p − c ⊥ a − b or (p − c, a − b) = 0.
In order to prove that P is on the altitude from B, we will show that p − b
and a − c are perpendicular. First we use the bilinearity of the inner product
to rewrite the expressions in (2.1):
(p, b) + (a, c) = (p, c) + (a, b)
(p, a) + (c, b) = (p, b) + (c, a)
Adding (the left-hand sides and right-hand sides, respectively, of) these equa-
tions yields
(p, a) + (c, b) = (p, c) + (a, b),
which can be rewritten as
(p − b, a − c) = 0.
So we are done.
Note that we haven’t used the freedom to choose an origin. Choosing
a convenient origin might simplify the computations. In our case, a clever
choice would be to put the origin in P . Please check yourself in what way
the computation then simplifies.
‖d − n‖ = ‖(1/2)(b + c) − (1/2)(a + b + c)‖ = (1/2)‖a‖,
‖e − n‖ = ‖(1/2)(a + c) − (1/2)(a + b + c)‖ = (1/2)‖b‖,
‖f − n‖ = ‖(1/2)(a + b) − (1/2)(a + b + c)‖ = (1/2)‖c‖.
(h − a, c − b) = (b + c, c − b) = 0.
‖k − n‖ = ‖(1/2)(a + a + b + c) − (1/2)(a + b + c)‖ = (1/2)‖a‖.
So K is also on the circumcircle of triangle DEF . Similar computations
show that the midpoints L of BH and M of CH are on this circle.
That the circle also passes through the feet of the three altitudes is left
as an exercise.
2.6.6 In later chapters we will be able to handle rotations and reflections using
vectors.
2.7 Notes
This chapter serves as a quick and slightly informal introduction to ‘concrete’
vectors in the plane and in space. In Chapter 4 the general notion of a vector
space will be discussed. The notions and techniques discussed in this chapter (and
their extensions presented in the following chapters) are of direct use in many
branches of mathematics (algebra, analysis, statistics, optimization) and other
disciplines like physics.
2.8 Exercises
§1
a. 2u + 3v,
b. u − v.
v 1 + ((v 2 + v 3 ) + v 4 ) = (v 2 + v 1 ) + (v 4 + v 3 ).
§2
a. Why is
x = u + λ(v − u)
a vector parametric equation of the line through (the endpoints of) u
and v?
a. Show that
x = u + λ(v − u) + µ(w − u)
is a vector parametric equation of the plane through u, v and w (where
we assume that none of the three vectors is on the line through the
remaining two).
x = (1 − λ − µ)u + λv + µw,
x = v + λ(v − u) + µ(w − u),
x = u + λ(w − v) + µ(w − u).
§3
6 Determine a parametric equation for each of the lines in a) and b) and for
each of the planes in c) and d).
c. The plane passing through (1, 2, 2), (0, 1, 1) and (1, 3, 2).
d. The plane containing the line x = (−2, 1, 3) + λ(1, 2, −1) and the point
(4, 0, 3).
a. 2x1 + 3x2 = 3.
b. 3x1 − 4x2 + 7 = 0.
c. 2x2 = 5.
c. x2 = 5.
§4
12 Draw a vector u of length 2 in the plane. Draw all vectors in the plane having
inner product 1 with u.
b. Compute the distance between the vectors (1, −1, 1) and (1, −4, 5).
c. Compute the angle between the vectors (1, 1, 2) and (1, 1, −1).
§5
17 Use the cross product to determine a normal vector and an equation of each
of the following planes.
a. The surface area of the triangle with vertices (1, 1, 0), (2, 1, 1), (1, 3, 3).
b. The surface area of the triangle with vertices (2, 0), (5, 1), (1, 4).
c. The volume of the parallelepiped spanned by (1, 1, 1), (2, 2, 3), (1, 0, 1).
§6
d) Take a look at c) under the assumption that the origin is not in the
plane of the triangle.
b) Show that the four medians in a tetrahedron are concurrent, i.e., pass
through one point (the centroid), and describe the centroid in terms of
vectors.
22 (The nine-point circle) In this exercise we show that the feet of the alti-
tudes of △ABC are also on the nine-point circle.
27 Let V be the plane with equation 2x + y + 3z = 0 and let ℓ be the line with
parametric description x = (4, 0, 2) + λ(1, 1, −1).
28 Let ABC be a triangle in the plane (so A, B, C are not collinear). Suppose
P is a point on the line AB such that A is the midpoint of the segment P B.
Let R be the point on the segment BC such that BR : RC = 2 : 1. Choose
a convenient origin and denote the vectors corresponding to points in the
usual way: c corresponds to C, etc. Use vectors to determine the point of
intersection Q of the lines P R and AC. Also determine the ratio AQ : QC.
Chapter 3

Matrices and systems of linear equations
3.1 Matrices
3.1.1 Matrices are rectangular arrays of numbers (or, more generally, elements from
some arithmetical structure, like polynomials) which turn out to be useful in
many places. In this chapter we discuss the arithmetic of matrices and the
role of matrices in solving systems of linear equations.
This first section deals with
• special matrices such as the zero matrix, the identity matrix, the trans-
pose of a matrix,
For instance, the second property can be proved as follows. First note that
A + B, B + C, (A + B) + C, A + (B + C) are all n × m matrices. Next note
that the element in position ij of the matrix (A + B) + C is ((A + B) + C)ij =
(A + B)ij + cij = (aij + bij ) + cij , and that the element in position ij of the
matrix A + (B + C) equals aij + (bij + cij ) (similar computation); of course,
these numbers are equal (here we use the associativity of the real or complex
numbers).
Due to the associativity we can just speak of A+B +C without specifying
which addition is carried out first, etc., since it doesn’t matter for the result.
Likewise we don’t necessarily need brackets in expressions like A+B +C +D,
since all ways of obtaining this sum, for instance as (A + B) + (C + D) or as
A + (B + (C + D)), lead to the same result.
1A= A,
(λ + µ)A = λA + µA,
λ(A + B) = λA + λB,
λ(µA) = (λµ)A.
The verifications are easy exercises. For instance, the last property is proved
by comparing the elements in position ij of both sides (for all i, j):
(λ(µA))ij = λ (µA)ij = λ (µ aij) = (λµ) aij = ((λµ)A)ij.
3.1.7 Examples. Here are some examples that can be verified using the definition
of the product of matrices.
  •  [ 1  2  -1 ]   [  1  -1 ]     [ -6  -1 ]
     [ 0  1   1 ] · [ -2   0 ]  =  [  1   0 ] ,
                    [  3   0 ]

  •  [  1  -1 ]                    [  1   1  -2 ]
     [ -2   0 ] · [ 1  2  -1 ]  =  [ -2  -4   2 ] .
     [  3   0 ]   [ 0  1   1 ]     [  3   6  -3 ]
• Even if A and B are square matrices of the same size, i.e., both are
n × n matrices for some n, then AB and BA may still differ:
     [ 1  -3 ]   [ -1   2 ]     [ -4  -13 ]
     [ 3   4 ] · [  1   5 ]  =  [  1   26 ] ,

     [ -1   2 ]   [ 1  -3 ]     [  5   11 ]
     [  1   5 ] · [ 3   4 ]  =  [ 16   17 ] .
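A quick numpy check of this example (not part of the notes):

    import numpy as np

    A = np.array([[1, -3], [3, 4]])
    B = np.array([[-1, 2], [1, 5]])
    print(A @ B)   # [[ -4 -13] [  1  26]]
    print(B @ A)   # [[  5  11] [ 16  17]]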
3.1.8 Property. For matrices of the correct dimensions, various arithmetic rules,
similar to those for ordinary real or complex numbers, hold. Here are the
most important ones.
A(B + C) = AB + AC and (E + F )G = EG + F G ,
(λA)B = λ(AB) ,
λ(µA) = (λµ)A,
(AB)C = A(BC) .
These rules follow from the definitions, but especially the third one requires
some effort. When we deal with linear transformations, we will discuss an
easy proof.
As a consequence of these rules we can, for example, simply write λAB
instead of (λA)B or λ(AB). Similarly, we can write ABC instead of (AB)C
or A(BC). Of course, putting in brackets may sometimes be useful to clarify
a computation.
3.1.13 Example. The following two matrices are each other's inverse:

     [ 2  1 ]        [  1  -1 ]
     [ 1  1 ]  and   [ -1   2 ] .

Next consider

     A = [ 1  1 ]
         [ 1  1 ] .

If

     B = [ x  u ]
         [ y  v ]

is the inverse of A, then

     [ 1  1 ] [ x  u ]     [ 1  0 ]
     [ 1  1 ] [ y  v ]  =  [ 0  1 ] ,

so

     x + y = 1,  x + y = 0,  u + v = 0,  u + v = 1.

It is clear that there are no solutions for x, y, u, v.
3.1.14 Let A and B be n × n matrices and suppose that A⁻¹ and B⁻¹ exist. Then

(A B)⁻¹ = B⁻¹ A⁻¹,

since (AB)(B⁻¹A⁻¹) = A(BB⁻¹)A⁻¹ = A A⁻¹ = I, and similarly (B⁻¹A⁻¹)(AB) = I.
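A numerical illustration of this rule (not part of the notes), for two arbitrarily chosen invertible matrices:

    import numpy as np

    A = np.array([[2.0, 1.0], [1.0, 1.0]])
    B = np.array([[1.0, 2.0], [3.0, 4.0]])
    lhs = np.linalg.inv(A @ B)
    rhs = np.linalg.inv(B) @ np.linalg.inv(A)
    assert np.allclose(lhs, rhs)
    print(lhs)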
3.1.16 Examples.

     A = [ 1  2  3 ]        A^T = [ 1  4 ]
         [ 4  5  6 ] ,            [ 2  5 ]
                                  [ 3  6 ] .

     A = [ 1 ]
         [ 2 ] ,            A^T = [ 1  2  3 ] .
         [ 3 ]
In short, transposing is ‘taking the mirror image with respect to the so-
called main diagonal’ (the main diagonal consists of the elements with indices
11, 22, . . .).
3.1.17 Property. It follows directly from the definition that the following rules
hold (supposing in each case that the operations can be carried out):
(A + B)T = AT + B T ,
(λA)T = λAT ,
(A B)T = B T AT ,
(AT )T = A.
• Interchange the order of the rows (in particular, interchange two rows).
• Multiply a row by a nonzero scalar.
• Replace a row by the sum of this row and a scalar multiple of another
row.
These row operations are inspired by the process of solving systems of lin-
ear equations, in which interchanging equations, multiplying equations by a
scalar, and adding a multiple of an equation to another, are used to simplify
and solve the equations. The relation between row operations and solving
systems of linear equations is discussed in the next section.
First we discuss an example of how to use these elementary operations to
change the given matrix into a special form with ‘many zeros’.
We use the first row to get as many zeros as possible in the first column.
Therefore we add the first row to the second, and subtract it from the third
row (we work from top to bottom). We find:
1 2 −4 8
0 0 2 4 .
0 2 2 −8
Next, we try to achieve the same in the second column without ruining the
first column. So we don’t use the first row, but instead interchange the second
and the third row:
1 2 −4 8
0 2 2 −8 .
0 0 2 4
Then we divide the second row by 2:
1 2 −4 8
0 1 1 −4 .
0 0 2 4
Now we can use the second row to produce zeros in the second column. So
we subtract the second row twice from the first (note that this doesn’t affect
the first column!):
1 0 − 6 16
0 1 1 −4 .
0 0 2 4
In the next step, we use the third row. We first divide it by 2,
1 0 − 6 16
0 1 1 −4
0 0 1 2
and then add the new third row 6 times to the first row, and subtract it from
the second row. Note that this doesn’t alter the first two columns.
1 0 0 28
0 1 0 −6 .
0 0 1 2
We can’t go any further since that would affect the first three columns. The
matrix obtained is called the (row ) reduced echelon form of A.
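The procedure can be written out as a short program. The sketch below is not part of the notes: the starting matrix A is reconstructed from the row operations and intermediate results shown above (treat it as an assumption), and the pivot is chosen as the largest entry in the column, a common numerical variant of the "first nonzero entry" rule described below.

    import numpy as np

    def rref(M):
        """Bring M into row reduced echelon form with elementary row operations."""
        A = M.astype(float)
        rows, cols = A.shape
        r = 0
        for c in range(cols):
            if r == rows:
                break
            pivot = r + np.argmax(np.abs(A[r:, c]))   # row with the largest entry in column c
            if abs(A[pivot, c]) < 1e-12:
                continue                              # no pivot in this column
            A[[r, pivot]] = A[[pivot, r]]             # interchange two rows
            A[r] /= A[r, c]                           # make the leading entry 1
            for i in range(rows):
                if i != r:
                    A[i] -= A[i, c] * A[r]            # produce zeros in the rest of the column
            r += 1
        return A

    # starting matrix reconstructed from the steps above (an assumption)
    A = np.array([[1, 2, -4, 8], [-1, -2, 6, -4], [1, 4, -2, 0]])
    print(rref(A))   # [[ 1.  0.  0. 28.] [ 0.  1.  0. -6.] [ 0.  0.  1.  2.]]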
• Let n1 be the index of the first column (from the left) that contains a
non-zero element.
• If necessary interchange two rows so that the first element of the n1 -th
column is non-zero.
• Divide each element of the first row by the first element of the n1 -th
column so that we obtain a situation with a1n1 = 1.
• Use the first row to produce zeros in all other entries of the n1 -th
column.
Now suppose we have carried out m steps of this kind. In the resulting matrix
the first m rows have been used and the last column we have dealt with is
the nm -th. Then we do the following:
• Let nm+1 be the index of the first column that contains a non-zero
element in one of the spots with index at least m + 1.
• If necessary interchange the m + 1-th row with one of the next rows so
that the m + 1-th element of the nm+1 -th column is non-zero.
• Use the m+1-th row to produce zeros in the other entries of the nm+1 -th
column.
This process stops if all rows have been used or if we are left with rows
consisting of zeros only.
The result of these row reduction steps is a matrix in so-called row reduced
echelon form or simply reduced echelon form. It looks as follows in the first
case:
0 ... 0 1 ∗ ... ∗ 0 ∗ ... ∗ 0 ∗ ... 0 ∗ ... ∗
0 ... 0 0 0 ... 0 1 ∗ ... ∗ 0 ∗ ... 0 ∗ ... ∗
0 ... 0 0 0 ... 0 0 0 ... 0 1 ∗ ... 0 ∗ ... ∗
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
0 ... ... ... ... ... ... ... ... ... ... ... ... ... 0 ∗ ... ∗
0 ... 0 0 0 ... 0 0 0 ... 0 0 0 ... 1 ∗ ... ∗
• Every row starts with (possibly zero) zeros. Its first nonzero entry (if
there is any) is 1 (its leading entry). The column containing this 1 has
zeros in all other entries.
• Every non-zero row starts with more zeros than the row directly above
it. In particular, if there are any ‘zero rows’ (rows consisting of zeros
only), they are all below the non-zero rows.
The matrix
1 −1 −2
0 1 3
is not in row reduced form, because the second row doesn’t satisfy the first
condition: there is a −1 above the 1 in the second column.
containing all the aij and bk . This matrix is often denoted as (A|b) and
is called the extended coefficient matrix of the system. The vertical bar is
sometimes used to distinguish between the two types of coefficients.
3.3.3 Examples. If the matrix (A|b) is in row reduced form, then it is easy to
describe the solutions of the system. Here are some examples.
• ( 1  0 | 2 )
  ( 0  1 | 3 ),   so   x1 = 2,  x2 = 3.
• ( 1  0   5 | 0 )
  ( 0  1  -2 | 0 )
  ( 0  0   0 | 1 ).
  The last equation has the form 0x1 + 0x2 + 0x3 = 1, or 0 = 1. This equation
  has no solutions. We call the system inconsistent.
• ( 1  0   5 | 2 )
  ( 0  1  -2 | 3 )
  ( 0  0   0 | 0 ).
  Every triple (p1, p2, p3) satisfies the last equation, so that we can just as
  well leave out this equation. What remains is:

  x1 + 5x3 = 2,
  x2 − 2x3 = 3.
x1 = 2 − 5λ,
x2 = 3 + 2λ,
so that
(x1 , x2 , x3 ) = (2, 3, 0) + λ(− 5, 2, 1).
v : a1 x1 + a2 x2 + · · · + am xm = b and w : c1 x1 + c2 x2 + · · · + cm xm = d.
αv : αa1 x1 + · · · + αam xm = αb .
v : (a1 , a2 , . . . , am , b),
w : (c1 , c2 , . . . , cm , d),
v + w : (a1 + c1 , a2 + c2 , . . . am + cm , b + d),
3.3.6 Applying row operations doesn’t change the solutions of the system
For the technique of applying row operations to work, it is essential that in
each step the solution set remains the same. We show this by proving that
each type of row operation doesn’t change the solution set.
v : a1 x 1 + · · · + am x m = b .
αa1 p1 + · · · + αam pm = αb ,
v : a1 x1 + · · · + am xm = b and w : c1 x1 + · · · + cm xm = d.
a1 p1 + · · · + am pm = b and c1 p1 + · · · + cm pm = d.
Since the process of row reducing consists of applying (at the level of matri-
ces) such operations consecutively, we conclude that applying row operations
doesn’t change the solution set of a system of equations.
(x1 , x2 , x3 ) = (1, − 1, 1) .
Applying row operations produces the following row reduced echelon form
1 0 −13  9 | −1
0 1   8 −5 |  1  .
0 0   0  0 |  0
With x3 = λ and x4 = µ chosen arbitrarily, we find
x1 = −1 − 9µ + 13λ ,
x2 = 1 + 5µ − 8λ ,
so that
(x1 , x2 , x3 , x4 ) = (−1, 1, 0, 0) + λ(13, −8, 1, 0) + µ(−9, 5, 0, 1).
The advantage of this latter way of describing the solutions is that it shows
that the solution set is a ‘plane in 4-dimensional space.’ We’ll return to this
in the chapter on vector spaces.
3.3.10 Remark. a) One can prove that the row reduced echelon form of a matrix
is unique: in whatever way you apply the row operations, you’ll always
end up with the same row reduced echelon form. A proof can be found
in Thomas Yuster, The reduced row echelon form of a matrix is unique:
A simple proof, Mathematics Magazine, Vol. 57, No. 2 (1984).
3.4 Notes
James Joseph Sylvester (1814–1897) introduced the term matrix for a rectangular
array of numbers. In the Philosophical Magazine (1851) he wrote: “I have in
previous papers defined a “Matrix” as a rectangular array of terms, out of which
different systems of determinants may be engendered, as from the womb of a
common parent”. Determinants will be discussed in Chapter 5.
Matrices turn out to be a useful way of storing and handling data. In this
chapter, we have used them to store and manipulate the coefficients of systems
of linear equations. We will come across various other usages of matrices in the
following chapters (by the way, they are also used in many other mathematics
courses). The importance of matrices lies in the arithmetic operations like addition
and multiplication that allow for efficient handling of data.
3.5 Exercises
§1
4 The 3 × 2 matrices A = (akl ) and B = (bkl ) are given by akl = k + li, bkl = k − li.
Determine A + B, A − B, A⊤ B, AB ⊤ .
b. Suppose A is invertible with inverse A−1 . Determine the inverse of each of the
following matrices: λA (λ ≠ 0), A2 , A⊤ , A−1 .
c. Prove that, for all positive integers n, the n-th power of the matrix with rows
(cos ϕ, − sin ϕ) and (sin ϕ, cos ϕ) equals the matrix with rows (cos nϕ, − sin nϕ)
and (sin nϕ, cos nϕ).
§2
7 Use row reduction to transform the following matrices into row reduced echelon
form.
a.
1 2 −3 −11
2 5 −5 −11 ,
−1 −1 7 43
b.
0 0 1 1 3
1 2 2 2 8 .
1 2 3 3 11
8 The operations used in row reduction can also be brought about by multiplying
with suitable matrices. This connection is discussed in this exercise.
a. In the 3 × 3 identity matrix interchange the 2nd and 3rd row. Let E be the
resulting matrix. Now compute the product
      a11 a12 a13 a14
E ·   a21 a22 a23 a24  .
      a31 a32 a33 a34
Find by analogy the matrix you need to multiply with (from the left or from
the right?) to accomplish swapping the i-th and j-th rows of an m × n matrix.
b. In the 3 × 3 identity matrix multiply the 2nd row by 7. Let F be the resulting
matrix. Now compute the product
      a11 a12 a13 a14
F ·   a21 a22 a23 a24  .
      a31 a32 a33 a34
Find by analogy the matrix you need to multiply with (from the left or from
the right?) to accomplish multiplication of the i-th row of an m × n matrix by
λ.
c. In the 3 × 3 identity matrix add 5 times the 3-rd row to the first row and call
the resulting matrix G. Compute the product
      a11 a12 a13 a14
G ·   a21 a22 a23 a24  .
      a31 a32 a33 a34
Find by analogy the matrix you need to multiply with (from the left or from
the right?) so that in an m × n matrix λ times the i-th row is added to the j-th
row.
§3
9 Solve the following systems of linear equations.
a.
x1 +2x2 +3x3 −x4 = 0,
2x1 +3x2 −x3 +3x4 = 0,
4x1 +6x2 +x3 +2x4 = 0;
b.
3x1 +x2 +2x3 −x4 = 0,
2x1 −x2 +x3 +x4 = 0,
5x1 +5x2 +4x3 −5x4 = 0,
2x1 +9x2 +3x3 −9x4 = 0;
c.
x1 −x2 +x3 +2x4 = 2,
2x1 −3x2 +4x3 −x4 = 3,
x1 −x3 +7x4 = 3.
10 Solve the following systems of linear equations.
a.
x2 +2x3 = 1,
x1 +2x2 +3x3 = 2,
3x1 +x2 +x3 = 3;
b.
x1 +x2 +2x3 +3x4 −2x5 = 1,
2x1 +4x2 −8x5 = 3,
−2x2 +4x3 +6x4 +4x5 = 0;
c.
x1 +2x2 = 0,
x1 +4x2 −2x3 = 4,
2x1 +4x3 = −8,
3x1 +6x3 = −12,
−2x1 −8x2 +4x3 = −8;
d.
x1 +x2 −2x3 = 0,
2x1 +x2 −3x3 = 0,
4x1 −2x2 −2x3 = 0,
6x1 −x2 −5x3 = 0,
7x1 −3x2 −4x3 = 1.
b.
(1 − λ)z1 −2z2 = 0,
5z1 +(3 − λ)z2 = 0,
for λ = 2 + 3i, and for λ = 2 − 3i;
c.
λz1 −z2 = 0,
λz2 +z3 = 0,
z1 +λz3 = 0,
for λ = 1, λ = e2πi/3 , and for λ = e−2πi/3 .
12 Let a = −1/2 + (1/2) i√3. Show that a2 + a + 1 = 0 and solve the following system of
linear equations.
z1 −z2 +z3 = 0,
z1 +az2 +a2 z3 = 1,
−z1 −a2 z2 −az3 = 1.
13 Determine for each value of λ the solution(s) of the following system of linear
equations.
λx1 +x2 +x3 = 2,
x1 +λx2 +x3 = 3.
14 Determine for each value of λ the solution(s) of the following system of linear
equations.
x1 −2x3 = λ + 4,
−2x1 +λx2 +7x3 = −14,
−x1 +λx2 +6x3 = λ − 12.
Chapter 4
Vector spaces
• linear subspaces,
1. p + q = q + p,
2. (p + q) + r = p + (q + r),
3. there is a vector 0 (the zero vector) such that p + 0 = p for every vector p,
4. for every vector p there is a vector −p (the opposite of p) such that p + (−p) = 0,
5. 1 p = p,
6. (λµ)p = λ(µp),
7. (λ + µ)p = λp + µp,
8. λ(p + q) = λp + λq.
Now matrix addition and scalar multiplication of, say, m × n matrices satisfy
similar properties. The similarities observed in the setting of vectors in the plane,
of matrices, and of other examples, have led to the idea of introducing an abstract
notion of which vectors in the plane or space, and matrices are examples. This is
the notion of a vector space in which the starting point is any set together with two
operations on the elements of this set, called ‘addition’ and ‘scalar multiplication’,
in which the above eight ‘axioms’ hold. The elements of the set are then called
vectors. A vector space is also sometimes called a linear space. In these lecture
notes we denote vectors by underlined symbols, like v (in the literature you will also
come across other notations, such as ~v , v̄ and boldface v). The scalars can be real or
complex numbers. In the first case we are dealing with a real vector space, in the
second case with a complex vector space. There do exist vector spaces over other
sets of scalars but they are beyond the scope of this course.
From the eight rules described above we can derive some more (obvious looking)
arithmetical rules that hold for vectors in an abstract setting (note that in the
abstract setting we only know so far that our set satisfies the eight axioms; any
other rule, even if it looks trivial, requires a proof). For instance, for every scalar
λ the equality λ 0 = 0 holds, and for every vector a we have 0 a = 0 (see exercise
27).
Some more rules (that we will not discuss and prove in detail here; but see
exercise 27) and remarks:
• The zero vector 0 is unique (in a given vector space), the opposite of a vector
is unique.
• Strictly speaking, a sum of, say, three vectors v 1 , v 2 , v 3 (or more) is not
defined; only the sum of two vectors is. To deal with three vectors, just take
(v 1 + v 2 ) + v 3 (why is this sum defined?). Another option is to define the
sum as v 1 + (v 2 + v 3 ), and associativity guarantees that the two given
options give the same answer. This is the reason that we usually just write
v 1 +v 2 +v 3 and only care about brackets if they are of help in a computation
or proof. For more than three vectors something similar can be shown, so
that a sum of n vectors v 1 + · · · + v n is meaningful. For instance, a way of
defining the sum of four vectors v 1 , . . . , v 4 is as follows: (v 1 + v 2 ) + (v 3 + v 4 ).
But, ((v 1 + v 2 ) + v 3 ) + v 4 could also be the definition, and, again by an
associativity argument (do you see how?), the two ‘definitions’ produce the
same vector.
Finally, even though vectors in the plane or in space are just two examples of
vector spaces, they are important in shaping our intuition. These examples are
often a good guide, even when working in a totally different vector space.
4.1.3 Example. The first example is the ‘space of arrows’ in the plane or in space. We
fix a point O, the origin. For every point P let p be the arrow from O to P . Our
vector space to be consists of all such arrows; we denote it by E 2 (the plane) or
E 3 (space).
The operations ‘addition’ and ‘scalar multiplication’ are defined as suggested
in the figure. Using geometry the eight axioms of a vector space can be checked,
but we will not discuss the details of this verification. The vector spaces E 2 and
E 3 are examples of real vector spaces.
[Figure: the arrows a and b starting at the origin 0, their sum a + b, and a scalar multiple λa.]
4.1.4 Example. Let n ≥ 1 be an integer and let Rn = {(a1 , . . . , an ) | a1 , . . . , an ∈ R}.
For any two n-tuples of real numbers a = (a1 , . . . , an ) and b = (b1 , . . . , bn ) from
Rn , and any scalar α we define the sum and the scalar product as follows:
a + b = (a1 + b1 , . . . , an + bn ),
α a = (αa1 , . . . , αan ) (α real).
One can easily verify that Rn with these two operations satisfies the axioms of a (real)
vector space. By way of example, we’ll check the first one. The first
axiom requires that a + b = b + a for all a and b. Now
a + b = (a1 + b1 , . . . , an + bn )
and
b + a = (b1 + a1 , . . . , bn + an ),
where a = (a1 , . . . , an ) and b = (b1 , . . . , bn ). Since ai + bi = bi + ai , i = 1, . . . , n
(this is a property of the real numbers), we conclude that indeed a + b = b + a.
Note that the zero vector is (0, 0, . . . , 0), and the opposite of the vector (a1 , . . . , an )
is (−a1 , . . . , −an ).
We remark that, strictly speaking, E 2 is not the same space as R2 and E 3 is
not R3 , since an arrow is not an array of numbers. There is a close connection,
and we will come back to that.
In a similar way we can turn the set Cn of n-tuples of complex numbers into
a complex vector space.
4.1.5 Example. The set Mn,m of n × m-matrices with matrix addition and the usual
scalar multiplication is a vector space with zero vector the n × m zero matrix. The
opposite of a matrix A is the matrix −A. Depending on which numbers we use in
the matrix and as scalars, we obtain a real or complex vector space. Sometimes
the notations Mn,m (R) and Mn,m (C) are used to denote these two types.
4.1.6 Example. Consider the set of all polynomials of degree at most n, say
p = an xn + an−1 xn−1 + · · · + a1 x + a0 ,
q = bn xn + bn−1 xn−1 + · · · + b1 x + b0 .
The sum p + q is defined by adding corresponding coefficients, and the scalar multiple
αp by multiplying every coefficient of p by α.
This addition and scalar multiplication satisfy the eight axioms. The zero vector
is the zero polynomial (all coefficients equal to 0), and the opposite of an xn +
an−1 xn−1 + · · · + a1 x + a0 is of course the polynomial −an xn − an−1 xn−1 − · · · −
a1 x − a0 . If we only allow polynomials with real coefficients and if we use real
scalars in the scalar multiplication, then the vector space is real. If we admit
complex coefficients and complex scalars, the vector space is complex.
4.1.7 Example. Consider the set of all functions from a non-empty set X to the real
numbers. Addition and scalar multiplication can be defined as follows:
(f + g)(x) = f (x) + g(x) for all x ∈ X,
(αf )(x) = α f (x) for all x ∈ X.
Then this set of functions becomes a real vector space (the zero vector is the ‘zero function’
which sends every x ∈ X to 0; the opposite −f of a function f is the function
(−1)f ).
Of course, in a similar way a complex vector space can be constructed.
4.1.8 Next, we discuss subsets of a vector space V that are themselves vector spaces
(with the two operations ‘inherited’ from V ). A typical example is the subset
{(x, y) ∈ R2 | y = 0} of the vector space R2 . It is easy to verify that this subset with
the addition (u, 0)+(x, 0) = (u+x, 0) and the scalar multiplication λ(u, 0) = (λu, 0)
(simply add and multiply them as vectors in R2 ) is itself a vector space (with zero
vector (0, 0) and opposite (−u, 0) of (u, 0)).
Suppose W is a non-empty subset of the vector space V . For any two vectors
in V that actually lie in W , there is a sum vector in V because we know how to
add vectors in V . But there is no guarantee that this sum vector is itself in W .
A similar remark holds for scalar multiples of vectors from W : such multiples lie
in V but not necessarily in W . If such sums and scalar multiples always lie in
W , then W turns out to be a vector space itself. We call such a subset a linear
subspace of V .
4.1.9 Definition. (Linear subspace) A non-empty subset W of a vector space V is
called a linear subspace of V if for all p, q ∈ W and for every scalar λ:
p + q ∈ W,
λp ∈ W.
Equivalently, W is a linear subspace if and only if for all p, q ∈ W and all scalars λ, µ:
λp + µq ∈ W.
To verify that such a W is indeed a vector space itself, we need to check the eight
axioms. This turns out to be easy. For instance, to check the first axiom, we need
to verify that v + w = w + v for every v, w ∈ W . But we already know that
v + w = w + v for every v, w ∈ V , and so the equality certainly holds if v, w belong
to a subset of V ! Most axioms hold for similar reasons.
As for the zero vector: V ’s zero vector turns out to lie in W . To see this, take
any p in W (here we use the fact that W is non-empty!) and take the scalar 0.
Then 0 · p = 0 is in W by the above requirements for a linear subspace.
By using the equality −w = (−1)w one easily shows in a similar way that the
opposite of w ∈ W is itself in W .
So linear subspaces are vector spaces themselves. Conversely, if a subset of a
vector space V is a vector space itself (with the addition and scalar multiplica-
tion from V ), then the subset obviously satisfies the above conditions for a linear
subspace.
Caution: note that subspaces are required to be non-empty.
4.1.10 Here is a useful observation that sometimes helps in deciding that a subset is not
a linear subspace.
If W is a linear subspace of V , then W contains the zero vector of V (see 4.1.9). So a
subset of V that does not contain the zero vector cannot be a linear subspace.
4.1.11 Example. The subset U = {(a1 , a2 , a3 ) ∈ R3 | 3a1 − 2a2 + a3 = 0} of R3 is a linear
subspace. Here is the verification for sums: suppose a = (a1 , a2 , a3 ) and b = (b1 , b2 , b3 )
belong to U , i.e.,
3a1 − 2a2 + a3 = 0,
3b1 − 2b2 + b3 = 0.
Adding yields:
3(a1 + b1 ) − 2(a2 + b2 ) + (a3 + b3 ) = 0,
so a + b = (a1 + b1 , a2 + b2 , a3 + b3 ) ∈ U .
In a similar way one can verify that α a ∈ U for every α ∈ R.
Note that in order to prove that a subset is a linear subspace it is not enough to
show that 0 belongs to that subset. For instance, the subset W = {(x, y)|y = x2 }
of R2 contains (0, 0), but W is not a linear subspace because (1, 1) is in W but
2 · (1, 1) is not.
4.1.12 Example. In the vector space V of all real polynomials of degree at most 3, the
subset W = { p(x) ∈ V | p(1) = 0 }, i.e., the set of polynomials having a zero at 1,
is a linear subspace. This subset contains for example the polynomial p(x) = x2 −1.
Here is the proof that W is indeed a linear subspace: W is non-empty, and if p and q
belong to W , i.e., p(1) = 0 and q(1) = 0, then (p + q)(1) = p(1) + q(1) = 0 and
(λp)(1) = λ p(1) = 0 for every scalar λ, so p + q and λp belong to W as well.
4.1.14 Example. Consider the vector space V of all functions on R, with sum f + g and
scalar product αf defined by
(f + g)(x) = f (x) + g(x) for all x ∈ R,
(αf )(x) = α f (x) for all x ∈ R.
Now polynomials (more precisely, polynomial functions) form a nonempty subset
of V . The sum of two such functions and the scalar product of such a function
are again polynomial functions. So the set P of all polynomials forms a linear
subspace of V .
Here is a further refinement of this statement. The sum of two polynomials of
degree at most n and the scalar product of a polynomial of degree at most n are
again polynomials of degree at most n. So for every nonnegative integer n the set
Pn of all polynomials of degree at most n is a linear subspace of V . So we have
the following chain of linear subspaces:
P0 ⊂ P1 ⊂ P2 ⊂ · · · ⊂ Pn ⊂ · · · ⊂ P ⊂ V.
Note that no two of these subspaces are equal.
Next, we define the notions line and plane in the general setting of vector spaces.
4.1.15 If p and v ≠ 0 are two vectors in E 3 (or E 2 ), then geometrically it is clear that
the endpoints of the vectors
x = p + λv, λ∈R (4.1)
are on the line through the endpoint of p and parallel with v. The formula (4.1) is
called a parametric equation of this line. Since the expression p + λv is built from
a scalar product of a vector and a sum of vectors, we can, by analogy, state the
following definition in any vector space.
4.1.16 Definition. (Line) Let p and v be two vectors in a vector space and suppose v ≠ 0.
Then the set of vectors of the form
x = p + λv, λ ∈ R or C,
is called a line in the vector space. The vector p is called a position vector of the
line and the vector v a direction vector. We call the description x = p + λv a
parametric equation or parametric representation of the line.
4.1.17 Example. The solutions of the differential equation
y ′ + 2y = 2x
are
y = (x − 1/2) + c e−2x ,   with c ∈ R.
The solution set of this differential equation is therefore a line in the space of all
functions on R. Its position vector is the function x − 1/2 and its direction vector is
the function e−2x .
4.1.18 Similarly, suppose p, v, w are three vectors in E 3 such that v ≠ 0, w ≠ 0, and such that
v and w are not multiples of one another (in the next section, we will formulate this
as: v and w are linearly independent). Geometrically it is clear that the endpoints
of the vectors
x = p + λv + µw, λ, µ ∈ R
describe a plane passing through the endpoint of p and parallel to v and w. This
motivates the following generalization.
4.1.19 Definition. (Plane) Let p, v, w be three vectors in a vector space and suppose
v ≠ 0, w ≠ 0, and v and w are not multiples of one another. The set of vectors of the form
x = p + λv + µw,   λ, µ ∈ R or C, (4.2)
is called a plane in the vector space with position vector p and direction vectors
v and w. The description (4.2) is called a parametric equation (or parametric
representation) of the plane.
4.1.20 Example. The solutions of the differential equation
y ′′ + y = x
are
y = x + c1 cos x + c2 sin x with c1 , c2 ∈ R.
So the solution set is a plane in the vector space of all functions on R, with position
vector the function x, and with direction vectors the functions sin x and cos x.
V = {(x, y, z) | 2x + 3y − z = 4}.
i.e.,
x = 1 +λ +2µ,
y = λ −µ,
z = 1 −λ −µ.
From the last two equations we solve for λ and µ and find
λ = (1/2) y − (1/2) z + 1/2   and   µ = −(1/2) y − (1/2) z + 1/2 .
Substituting in the first of the three equations yields
2x + y + 3z = 5,
or
x = −λ +µ,
y = 1 −λ ,
z = 2λ ,
u = 1 −µ.
Using the last two equations we express λ and µ in terms of z and u and use the
results in the first two equations. We find
2x + z + 2u = 2,
2y + z = 2.
Every point of W is a solution of this system and conversely (for the converse,
solve the system of two linear equations).
4.1.23 Remark. Every vector on the line ℓ : x = p + λv can serve as a position vector of
that line: for any scalar α the vector p + αv lies on ℓ, and the line
m : x = (p + αv) + λv
coincides with ℓ, since
{p + λv | λ ∈ R} = {(p + αv) + λv | λ ∈ R} .
This remark implies that the line ℓ is a linear subspace if and only if 0 ∈ ℓ.
Here are the details for the ‘if’ part. If 0 ∈ ℓ, then we can use 0 as a position vector
of the line and describe the line by the scalar multiples λ v of v. Since the sum
λv + µv can be written as (λ + µ)v, this sum is again a multiple of v and therefore
on ℓ. Of course, since µ(λv) = (µλ)v, we see that scalar multiples of vectors on ℓ
are themselves on ℓ.
Similar remarks hold for planes: every vector on a plane can serve as position
vector of the plane, and a plane is a linear subspace if and only if the zero vector
is on the plane.
In a similar way as above one can show that any nonzero multiple of v can
serve as direction vector of the line ℓ : x = p + λv. Planes can also have many
pairs of direction vectors (no details here).
4.2 Spans, linearly (in)dependent systems
4.2.1 Definition. (Linear combination, span) A linear combination of the vectors
a1 , . . . , an in a vector space V is a vector of the form λ1 a1 + · · · + λn an , where
λ1 , . . . , λn are scalars. The set of all linear combinations of a1 , . . . , an is called the
span of a1 , . . . , an and is denoted by < a1 , . . . , an >.
4.2.3 Example. A linear combination of the vectors (1, 1, −1) and (2, 0, 1) in R3 is, for
example, the vector (−1, 3, −5) = 3 (1, 1, −1) − 2 (2, 0, 1).
4.2.5 Theorem. Spans are linear subspaces, i.e., if a1 , . . . , an are vectors in the vector
space V , then < a1 , . . . , an > is a linear subspace of V .
Proof. Of course, the span is non-empty (it contains the zero vector).
Now let p and q be vectors in < a1 , . . . , an > and suppose
p = p1 a 1 + · · · + pn a n and q = q1 a1 + · · · + qn an .
Then
p + q = (p1 + q1 )a1 + · · · + (pn + qn )an ∈< a1 , . . . , an > .
Also, for every scalar λ:
λp = (λp1 )a1 + · · · + (λpn )an ∈< a1 , . . . , an > .
So sums and scalar multiples of vectors from the span belong to the span, which
finishes the proof.
4.2.6 Example. The span < (2, 1, 0), (1, 0, 1) > is precisely the plane with equation
x − 2y − z = 0: with y = λ and z = µ we get (x, y, z) = (2λ + µ, λ, µ) =
λ(2, 1, 0) + µ(1, 0, 1), and, by definition, these vectors run through the span <
(2, 1, 0), (1, 0, 1) >.
4.2.7 Example. In a vector space V consider a line passing through the origin:
l : x = λv.
This line equals the span < v >, so it is a linear subspace as we saw before in
4.1.23.
Similarly, the plane
V : x = λv + µw
passing through the origin equals the span < v, w >.
4.2.8 Example. In R3 consider the vectors a = (1, 1, −2), b = (−1, 1, 0), c = (0, 1, −1)
and let V =< a, b, c >. We see immediately that 2c−a = b. Now take an arbitrary
x ∈ V . Then x can be written as
x = x1 a + x2 b + x3 c ,
and substituting b = 2c − a gives
x = x1 a + x2 (2c − a) + x3 c
= (x1 − x2 )a + (2x2 + x3 )c .
So every vector of V is already a linear combination of a and c, i.e., V =< a, c >.
4.2.10 Theorem. Let a1 , . . . , an be vectors in a vector space V . The span < a1 , . . . , an >
doesn’t change if we
1. change the order of the vectors,
2. multiply one of the vectors by a scalar ≠ 0, i.e., replace, say, ai by λai with
λ ≠ 0,
3. add a scalar multiple of one of the vectors to one of the other vectors, i.e.,
replace, say, ai by ai + αaj with j ≠ i.
The span also doesn’t change if we
4. insert the zero vector, for instance, < a1 , . . . , an >=< a1 , . . . , an , 0 >, or
leave out the zero vector (if of course the zero vector was one of the ai ),
5. insert a linear combination λ1 a1 + · · · + λn an of a1 , . . . , an ,
6. leave out ai if this vector is a linear combination of the other aj .
Proof. The proof that changing the order (1), and inserting or leaving out the zero
vector (4) doesn’t affect the span is almost trivial, so we leave that to the reader.
To prove 2) we first observe that the equality
λ1 a1 + λ2 a2 + · · · + λk (αak ) + · · · + λn an = λ1 a1 + λ2 a2 + · · · + (λk α)ak + · · · + λn an
shows that every linear combination of a1 , . . . , αak , . . . , an (only ak is multiplied
by the scalar α) is a linear combination of a1 , . . . , ak , . . . , an . Likewise,
λ1 a1 + λ2 a2 + · · · + λk ak + · · · + λn an = λ1 a1 + λ2 a2 + · · · + (λk /α)(αak ) + · · · + λn an ,
so every linear combination of a1 , . . . , ak , . . . , an is a linear combination of
a1 , . . . , αak , . . . , an . But then the two spans coincide, which proves 2).
4.2.11 Example. By repeatedly applying the above rules, we see that (regardless of the
vector space we are working in)
4.2.12 Theorem. Suppose a1 , . . . , an are vectors in a vector space and suppose
b = λ1 a1 + · · · + λn an with λi ≠ 0. Then
< a1 , . . . , ai , . . . , an > = < a1 , . . . , ai−1 , b, ai+1 , . . . , an > .
4.2.13 This theorem states that, if a vector b ∈< a1 , . . . , an > can be written as a linear
combination of a1 , . . . , an , where the coefficient of ai is nonzero, then we can replace
the vector ai by b without altering the span of the vectors.
Proof of 4.2.12. Consider V =< a1 , . . . , ai , . . . , an >. Now first multiply ai by λi
(≠ 0) and then add to it the vectors λ1 a1 , . . . , λi−1 ai−1 , λi+1 ai+1 , . . . , λn an . These
steps leave the span the same, so that V =< a1 , . . . , ai−1 , b, ai+1 , . . . , an >.
4.2.14 In 4.2.11 we have seen an example of a space spanned by three vectors, but which
can also be spanned by two vectors. We now discuss how to find such ‘minimal’
systems of vectors spanning a given space. Apart from the theorems 4.2.10 and
4.2.12, the notion of a linearly (in)dependent system of vectors plays a central role.
4.2.15 Definition. (Linearly (in)dependent) The vectors a1 , . . . , an in a vector space
are called linearly dependent if (at least) one of them is a linear combination of the
others; otherwise they are called linearly independent.
4.2.16 A more practical way to decide if a set of vectors is linearly (in)dependent is based
on the following equivalent formulation.
4.2.17 Theorem. The equation
λ1 a1 + λ2 a2 + · · · + λn an = 0, (4.3)
in the unknowns λ1 , . . . , λn has a solution in which not all λi are 0 if and only if the
vectors a1 , . . . , an are linearly dependent. Equivalently, the only solution of (4.3) is
λ1 = · · · = λn = 0 if and only if a1 , . . . , an are linearly independent.
Proof. We restrict ourselves to the proof of the first equivalence, and leave the
second one to the reader.
First we deal with the implication ⇒). If the equation (4.3) has a solution
with, say, λi ≠ 0, then dividing the relation by λi shows that ai is a linear
combination of the other vectors, so a1 , . . . , an are linearly dependent. Conversely,
if one of the vectors, say ai , is a linear combination of the others, then moving ai
to the other side produces a solution of (4.3) with λi = −1 ≠ 0.
• The vectors
e1 = (1, 0, 0, . . . , 0),
e2 = (0, 1, 0, . . . , 0),
..
.
en = (0, 0, 0, . . . , 1),
in Rn are linearly independent, since the equation
λ1 e1 + · · · + λn en = (0, . . . , 0)
can be rewritten as (λ1 , . . . , λn ) = (0, . . . , 0), so that all λi must be 0.
• The vectors (1, 2, 2) and (0, 1, −1) in R3 are linearly independent; here is the
proof. If a(1, 2, 2) + b(0, 1, −1) = (0, 0, 0), then we rewrite this as (a, 2a +
b, 2a − b) = (0, 0, 0), and easily conclude a = b = 0.
• The vectors (1, 0, 1), (0, 1, 1), (1, 1, 0), (2, 2, 2) in R3 are not linearly inde-
pendent: the equation λ1 (1, 0, 1) + λ2 (0, 1, 1) + λ3 (1, 1, 0) + λ4 (2, 2, 2) = (0, 0, 0)
has, for instance, the non-trivial solution λ1 = λ2 = λ3 = 1, λ4 = −1, since
(1, 0, 1) + (0, 1, 1) + (1, 1, 0) = (2, 2, 2).
• The functions sin and cos in the space of real functions R → R are linearly
independent. Suppose
a sin + b cos = 0 (the zero function);
then, since this is an equality of functions, we find that for every real
number t the relation a sin(t) + b cos(t) = 0 holds. Now we choose a few
‘smart’ values for t to deduce that a and b are 0: for t = 0 we get b cos(0) = 0
so that b = 0, and for t = π/2 we get a sin(π/2) = 0 so that a = 0.
Of course, in general there may exist dependences between functions. For
instance, the formula sin(2t) = 2 sin(t) cos(t) tells us that the functions
t 7→ sin(2t) and t 7→ sin(t) cos(t) are not linearly independent.
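For vectors in Rn , linear (in)dependence can also be checked mechanically: place the vectors as rows of a matrix and compare the rank of that matrix with the number of vectors. A small sketch, assuming SymPy, applied to the vectors (1, 0, 1), (0, 1, 1), (1, 1, 0), (2, 2, 2) from the list above:

from sympy import Matrix

M = Matrix([[1, 0, 1],
            [0, 1, 1],
            [1, 1, 0],
            [2, 2, 2]])
print(M.rank())          # 3 < 4, so the four vectors are linearly dependent

# A concrete relation: the nullspace of the matrix whose columns are the vectors
# consists of all (lambda1, ..., lambda4) with lambda1*v1 + ... + lambda4*v4 = 0.
print(M.T.nullspace())   # spanned by (-1, -1, -1, 1), i.e. v1 + v2 + v3 = v4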
Suppose now that V =< a1 , . . . , an > and that b1 , . . . , bm are linearly independent
vectors in V ; we claim that m ≤ n. The vector b1 is a linear combination of
a1 , . . . , an , and at least one of the coefficients must be ≠ 0 (otherwise b1 = 0,
contradicting the independence). By Theorem 4.2.12 we can then exchange b1 and
one of the vectors a1 , . . . , an ; possibly after relabeling, we may assume this vector
is a1 , so that
V = < b1 , a2 , . . . , an > .
Now the vector b2 is a linear combination of the vectors on the right-hand side.
Again, at least one of the coefficients of the a2 , . . . , an must be ≠ 0 (otherwise, b2 would
be a multiple of b1 ). So we can exchange b2 and one of the vectors a2 , . . . , an , again
by Theorem 4.2.12. Possibly after relabeling, we may assume that we exchange b2
and a2 . So:
V = < b1 , b2 , a3 , . . . , an > .
Continue in the same way. By Theorem 4.2.12 every bi can be exchanged, so that
m ≤ n.
4.2.21 Theorem. If the vector space V is the span of each of the systems of independent
vectors a1 , . . . , an and b1 , . . . , bm , then m = n.
4.2.22 Definition. (Basis and dimension) A linearly independent set spanning a vec-
tor space V is called a basis of V . The number of elements in the basis is called
the dimension of V and is denoted by dim(V ).
4.2.23 If there isn’t a finite basis of V (and V does not consist of 0 only), then we say
dim(V ) = ∞.
The case V = {0} is a bit special. The space V contains only one vector, 0,
but this vector is not linearly independent since 3 · 0 = 0 (do you see why?). We
usually say that the empty set ∅ is a basis and that the dimension of V is 0.
4.2.24 Examples. Here are some vector spaces and their dimensions.
• The space Rn has dimension n: the vectors
e1 = (1, 0, 0, . . . , 0),
e2 = (0, 1, 0, . . . , 0),
..
.
en = (0, 0, 0, . . . , 1)
are linearly independent and span Rn , so they form a basis of Rn (the standard basis).
4.2.25 Here are some consequences of the definitions. If V is a vector space with dim(V ) =
n < ∞, then every basis of V consists of exactly n vectors. We use this to prove
the following statements about the m vectors b1 , . . . , bm in V .
• If b1 , . . . , bm are linearly independent, then m ≤ n; if moreover m = n, then
b1 , . . . , bm form a basis of V .
• If b1 , . . . , bm span V , then m ≥ n; if moreover m = n, then b1 , . . . , bm form a
basis of V .
4.2.26 If V is a vector space with dim(V ) = ∞, then there is an infinite sequence of vectors
a1 , a2 , . . . with
an+1 ∉ < a1 , . . . , an >
for every n. To see this, choose a1 ≠ 0 in V . If V =< a1 >, then dim(V ) = 1, which is
impossible, so there must be a vector a2 ∈ V with a2 ∉< a1 >. If V =< a1 , a2 >, then
dim(V ) = 2, which is again impossible, so < a1 , a2 > is strictly contained in V . Now
choose a3 ∈ V with a3 ∉< a1 , a2 >, etc.
The infinite sequence a1 , a2 , . . . that we find in this way has the desired prop-
erty. Moreover, for every n the set {a1 , . . . , an } is linearly independent. This
follows from Theorem 4.2.29 below. Here we see an important distinction between
finite dimensional and infinite dimensional vector spaces: in an infinite dimen-
sional vector space there exist arbitrarily large linearly independent sets, whereas
in finite dimensional vector spaces the number of vectors in a linearly independent
set is at most the dimension of the vector space.
4.2.29 Theorem. If the vectors a1 , . . . , an−1 in the vector space V are linearly independent
and an ∉< a1 , . . . , an−1 >, then a1 , . . . , an are linearly independent.
Proof. Consider a relation
λ1 a1 + λ2 a2 + · · · + λn an = 0
in λ1 , . . . , λn . If λn ≠ 0, then
an = (−λ1 /λn ) a1 + · · · + (−λn−1 /λn ) an−1 ,
so that an ∈< a1 , . . . , an−1 >, contradicting the assumption. Hence λn = 0, and the
independence of a1 , . . . , an−1 then forces λ1 = · · · = λn−1 = 0 as well.
4.3 Coordinates
4.3.1 Coordinates
Bases are ‘minimal’ systems of vectors spanning a vector space. They have another
special property which will enable us to use coordinates. If a1 , . . . , an span V , then
every x ∈ V can be written in the form
x = x1 a1 + · · · + xn an . (4.4)
The coefficients need not be unique. For example, consider the space V from
example 4.2.8; for the vector b we have
b = 0a + 1b + 0c
= − 1a + 0b + 2c.
4.3.2 Definition. (Coordinates) Let a1 , . . . , an be a basis of the vector space V . If the
vector x ∈ V is written as
x = x1 a1 + · · · + xn an ,
then the coefficients x1 , . . . , xn are called the coordinates of the vector x with
respect to this basis. The vector (x1 , . . . , xn ) is called the coordinate vector of x
and is itself a vector in Rn or Cn .
Note: coordinates depend on the basis used!
4.3.3 Example. Let V be the vector space of polynomials of degree at most 2. Consider
the polynomials
p0 : p0 (x) = 1,
p1 : p1 (x) = x,
p2 : p2 (x) = x2 .
Let p be an arbitrary polynomial in this space, say, ax2 + bx + c. Then p =
ap2 + bp1 + cp0 , so that
V =< p0 , p1 , p2 > .
The polynomials p0 , p1 , p2 are linearly independent: suppose α0 p0 +α1 p1 +α2 p2 = 0
(the zero polynomial). This means
α0 + α1 x + α2 x2 = 0 for all x.
If (α0 , α1 , α2 ) ≠ (0, 0, 0), then the left-hand side polynomial would have at most
two zeros, which is not the case (since the polynomial is also equal to the zero
polynomial). So (α0 , α1 , α2 ) = (0, 0, 0) and p0 , p1 , p2 are linearly independent.
Therefore the polynomials p0 , p1 , p2 form a basis of V and (c, b, a) is the coordinate
vector of p with respect to this basis.
4.3.4 Example. The vectors (1, 1) and (1, −1) form a basis of R2 . The linear inde-
pendency is easily derived from the fact that a(1, 1) + b(1, −1) = (0, 0) implies
a = b = 0. From 4.2.25 we then derive that these two independent vectors are a
basis of the space.
The coordinate vector of (5, 3) with respect to this basis can be found by
looking for a c and d such that c(1, 1) + d(1, −1) = (5, 3). Solving leads to c = 4
and d = 1. The coordinate vector of (5, 3) w.r.t. the new basis is then (4, 1).
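Finding coordinates thus comes down to solving a system of linear equations whose coefficient matrix has the basis vectors as columns. A sketch of the computation in this example, assuming SymPy (the variable names are our own choices):

from sympy import Matrix, symbols, linsolve

c, d = symbols('c d')
basis_as_columns = Matrix([[1, 1],
                           [1, -1]])       # columns are the basis vectors (1,1) and (1,-1)
print(linsolve((basis_as_columns, Matrix([5, 3])), c, d))   # {(4, 1)}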
If the vectors x and y have coordinate vectors (x1 , . . . , xn ) and (y1 , . . . , yn ) with
respect to a given basis a1 , . . . , an , then x + y and αx have coordinate vectors
(x1 + y1 , . . . , xn + yn ) and (αx1 , . . . , αxn ), respectively. E.g., for the scalar multiple αx
this follows from the fact that α(x1 a1 + · · · + xn an ) = (αx1 )a1 + · · · + (αxn )an .
So addition and scalar multiplication in V correspond nicely to the usual ad-
dition and scalar multiplication in Rn (or Cn ).
4.3.7 Theorem. Let α be a basis of the n–dimensional vector space V and let {a1 , . . . , am }
be a set of vectors in V . Then:
• a1 , . . . , am are linearly independent if and only if their coordinate vectors with
respect to α are linearly independent;
• a1 , . . . , am form a basis of V if and only if their coordinate vectors form a basis
of Rn (or Cn ).
Proof. We only prove the first item, since the second item is a direct consequence
of it. Suppose b1 , . . . , bm are the coordinate vectors of a1 , . . . , am . The coordinate
vector of λ1 a1 +· · ·+λm am is equal to λ1 b1 +· · ·+λm bm . So if λ1 a1 +· · ·+λm am = 0,
then we also have λ1 b1 +· · ·+λm bm = (0, . . . , 0) and conversely, since the coordinate
vector of the zero vector is (0, . . . , 0). A non-trivial relation between a1 , . . . , am
translates into a non-trivial relation between the coordinate vectors b1 , . . . , bm . In
other words: a1 , . . . , am is linearly dependent if and only if b1 , . . . , bm is linearly
dependent.
Let
a11 x1 + a12 x2 + · · · + a1m xm = b1
.. ..
. .
an1 x1 + an2 x2 + · · · + anm xm = bn
be such a system. Let k 1 , . . . , k m be the columns of the coefficient matrix and let
b = (b1 , . . . , bn )T . Then we can write the system as
x1 k 1 + x2 k 2 + · · · + xm k m = b.
4.3.9 Theorem. The nonzero rows of a matrix in row reduced echelon form are linearly
independent.
Proof. A matrix in row reduced form has the shape displayed in 3.2.3. Let r1 , . . . , rp
denote its nonzero rows and suppose
α1 r1 + α2 r2 + · · · + αp rp = 0.
Now consider the columns containing only zeros except for one 1. Then we find,
respectively,
α1 = 0, α2 = 0, . . . , αp = 0.
4.3.10 The considerations above fully explain the techniques announced in 4.2.27. From
Theorems 4.2.10 and 4.3.9 we find how to construct a basis for a given span in Rn
or Cn : Consider the spanning vectors as rows of a matrix, take the row reduced
echelon form of this matrix and take the nonzero rows. In an ‘abstract’ vector
space we can use these techniques if we use coordinates.
1 3 2 3
Row reducing doesn’t change the span of the rows. The row reduced form is:
1 0 2 0
0 1 0 1
0 0 0 0 .
0 0 0 0
The nonzero rows (1, 0, 2, 0) and (0, 1, 0, 1) produce a basis of V .
4.3.12 Example. Consider the following polynomials in the vector space of polynomials
of degree at most 2
p1 : p1 (x) = x2 + 2x − 3,
p2 : p2 (x) = x2 − 2x + 1,
p3 : p3 (x) = x2 − 5x + 4.
and consider their span V = < p1 , p2 , p3 >. Choose as basis: (1, x, x2 ). With
respect to this basis, the coordinate vectors of the three polynomials are
p1 : (−3, 2, 1),
p2 : (1, −2, 1),
p3 : (4, −5, 1).
Next we use Theorems 4.3.7 and 4.3.9 to find a basis for V . Collect the coordinate
vectors as rows in a matrix and row reduce:
−3 2 1 1 0 −1
1 −2 1 ∼ 0 1 −1 .
4 −5 1 0 0 0
So the first two rows yield the basis (1, 0, −1), (0, 1, −1) of the span of the coor-
dinate vectors of p1 , p2 , p3 . So the polynomials 1 − x2 and x − x2 are a basis of
V.
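The same row reduction can of course be left to a machine. A sketch of the computation in this example, assuming SymPy:

from sympy import Matrix

M = Matrix([[-3, 2, 1],
            [1, -2, 1],
            [4, -5, 1]])       # the coordinate vectors of p1, p2, p3 as rows
R, pivots = M.rref()
print(R)                       # rows (1, 0, -1), (0, 1, -1), (0, 0, 0)
print(len(pivots))             # 2, so dim V = 2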
4.4 Notes
The notion of a vector space (or linear space) is the central concept in linear
algebra; it is the (or a) formalized version of our intuition of space. Its strength
lies in the fact that vector spaces can be used in many different situations. One of
the first to describe vector spaces using axioms was Giuseppe Peano (1858–1932).
In this course we only touch upon the precise role of the axioms. Probably you do
not even notice that we use rules such as 0 + 0 = 0 and 0 · a = 0, which we didn’t
prove (but see the exercises).
Vector spaces are used in many different situations, also outside mathematics.
For instance, to describe notions like speed, acceleration, force and impulse in
mechanics, and fields in electromagnetism. In signal theory (to handle visual or
audio signals) and in quantum mechanics vector spaces of functions are important.
They tend to be infinite dimensional.
As for geometry, although vector spaces are used to model ‘flat’ objects like
lines and planes, vector spaces are also of help in describing tangent spaces to
curved objects.
Finally, instead of using real or complex numbers, also other systems of scalars
are possible, and most results still hold! In coding theory and cryptology, such
number systems, like the integers modulo 2, are used (i.e., the numbers 0 and 1
with the rules 0 + 0 = 0, 0 + 1 = 1 + 0 = 1, 1 + 1 = 0, 1 · 0 = 0 · 1 = 0 and 1 · 1 = 1).
4.5 Exercises
§1
1 In each of the following cases decide if the subsets of R3 are linear subspaces of
R3 :
W1 = {(x1 , x2 , x3 ) ∈ R3 | x2 ∈ Q},
W2 = {(x1 , x2 , x3 ) ∈ R3 | x1 + 2x2 + 3x3 = 0},
W3 = {(x1 , x2 , x3 ) ∈ R3 | x1 + x2 = 1},
W4 = {(x1 , x2 , x3 ) ∈ R3 | x1 ≥ 0}.
2 In each of the following cases decide if the subsets of C3 are linear subspaces of
C3 :
W1 = {(z1 , z2 , z3 ) ∈ C3 | z1 + iz2 + (1 + i)z3 = 0},
W2 = {(z1 , z2 , z3 ) ∈ C3 | z1 + iz2 + (1 + i)z 3 = 0},
W3 = {(z1 , z2 , z3 ) ∈ C3 | Re(z1 ) + Im(z2 ) = 0},
W4 = {(z1 , z2 , z3 ) ∈ C3 | z 1 + iz 2 = 0}.
3 Check whether the following subsets of the vector space of 2 × 3-matrices with real
entries are linear subspaces:
b. the matrices
a11 a12 a13
a21 a22 a23 ,   where a11 + a22 = 0.
4 Consider the vector space V of all functions defined on R. Check whether the
following subsets of V are linear subspaces:
a. x + 4y − 5z = 7,
b. 2x − 4y + z = 0,
c. 2x + 4y + 4z = 7.
a. V = {(x, y, z, u) | x + 2y − 3z − u = 2, 2x + y + 6z + u = 7},
b. V = {(x, y, z, u) | y + 2z − 2u = 1, 3y + 6z = 9}.
9 Let l be the line in R3 through (3, 2, 1) and (−3, 5, 4) and let V be the plane with
equation 3x − y + 2z = 4. Determine the intersection of l and V .
§2, 3
11 Show that:
c. (0, 1, 2), (1, 2, 3), (1, 1, 1) form a linearly dependent set of vectors in R3 ;
d. (−1, 5, 5, 3), (−1, 2, 1, 1), (1, 1, 3, 1) form a linearly dependent set of vectors in
R4 ,
e. in R3 , the vector (1, 2, 1) is not a linear combination of (1, 3, 2) and (1, 1, 1),
f. in R3 , the vector (1, 1, 1) is a linear combination of (3, −1, 4), (1, −3, 2) and
(2, 6, 1),
g. in R3 , the vector (1, 0, 0) is not a linear combination of (3, −1, 4), (1, −3, 2)
and (2, 6, 1).
d. < (3, −1, 4, 7), (1, −3, 2, 5), (2, 6, 1, −2) > in R4 ,
e. < (3, −1, 4, 7), (1, −3, 2, 5), (5, 3, 2, −1) > in R4 ,
f. < (3, −1, 4, 7), (1, −3, 2, 5), (2, 6, 1, −2), (0, 4, −1, 4) > in R4 .
14 Check if the vectors (2, −2, 7, 5), (i, 1 + i, i, 1), (2 + 3i, −3 + 2i, 1, −2 + 2i) belong
to the span < (3, −2, 3, 1), (2, 1, −2, −1), (1, 1, 2, 3) > in C4 .
15 In the vector space of polynomials the following polynomials are given: f (x) =
x + 1, g(x) = (x + 1)2 . Check if the polynomials x2 + 3x + 1, x2 − 1, 3x2 − 4x − 7
belong to the span < f, g >.
16 Let a and b be two vectors in a vector space. Prove that < a, b >=< a − b, a + b >.
17 Suppose the vectors a, b, c are linearly independent. Determine whether the fol-
lowing systems are linearly dependent:
a. a + b, a + b − c, 2a + b + c;
b. a + b + c, a + 2b, c − b;
c. a + 2b, a + c, c.
b. For which values of a do l and V not intersect (which means that l and V are
parallel)?
U1 =< (−4, 1, 3), (−2, 3, 1) > and U2 =< (−1, 5, 4), (3, −1, 2) >,
V1 =< (4, 3, 2, 1), (1, 0, 0, 0) > and V2 =< (2, 1, 0, 0), (3, 2, 1, 0) >,
i.e., determine V1 ∩ V2 .
23 Determine a basis and the dimension of each of the following spans (in the space
of functions from R to R):
25 In the vector space V the linearly independent set a, b, c is given. Determine the
dimension of each of the following spans:
b. < a − b, a + b, a + b + c >,
26 Determine the coordinates of each of the following vectors with respect to the given
bases:
a. (2, 3) with respect to (1, 0), (1, 1),
b. (1, 2, 3) with respect to (1, 1, 1), (1, 0, 1), (0, 0, 1),
c. x2 with respect to 1 − x, 1 − x2 , x + x2 ,
d. cos 2t with respect to 4, sin2 t.
27 In this problem we will prove some further properties of vectors. We need the
eight axioms mentioned in 4.1.2.
a. If a + 0 = a + b, then b = 0. Prove this by adding the opposite −a of a to
both sides.
b. For all scalars λ we have λ 0 = 0. Indicate which axioms are used in the following
derivation. We have: λ 0 = λ(0 + 0) = λ 0 + λ0. But we also have: λ 0 =
λ 0 + 0. So λ 0 + λ0 = λ 0 + 0, so that part a. implies λ 0 = 0.
c. This item focuses on the equality 0 a = 0 (for every a). Finish the following
proof: 0 a = (0 + 0)a = 0 a + 0 a.
d. The zero vector 0 is unique, i.e., if 0′ also satisfies a + 0′ = a for all a, then
0 = 0′ . Prove this by considering the expression 0 + 0′ .
e. The opposite −a of a given vector a is unique. Suppose b is also an opposite
of a. Then finish the following chain of equalities to provide the proof:
−a = −a + 0 = −a + (a + b).
b. Determine the intersection of the plane U from part a) and the plane
Chapter 5
Rank and inverse of a matrix, determinants
5.1 Rank and inverse of a matrix
5.1.2 Definition. (Row and column space) Let A be a matrix with n rows and m
columns. Then every row has m entries so that these rows can be seen as vectors
in Rm or Cm ; the subspace spanned by the rows is called the row space of the
matrix. Similarly, every column is an element of Rn or Cn ; the space spanned by
the columns is called the column space of the matrix.
5.1.3 We agree to consider length n sequences of numbers written in column form as
elements of Rn or Cn and, conversely, to view elements of Rn or Cn as columns
when convenient. Of course, we try to avoid any confusion in
doing so. For instance, we write: the system Ax = b with x ∈ Rn , where x is then
seen as a column vector.
5.1.5 The row and column spaces of a matrix seem to be quite unrelated, since they are
usually subspaces of different vector spaces. Yet their dimensions are the same!
To show this we first connect the matrix product to linear combinations of the
columns (or rows). The following example shows how a matrix product can be
rewritten as a linear combination of the columns of the 2 × 3–matrix:
  1  3 −1     3        3 · 1 + 2 · 3 + 6 · (−1)
              2    =
  2 −2  5     6        3 · 2 + 2 · (−2) + 6 · 5

          1         3          −1
   =  3       + 2       + 6        .
          2        −2           5
In general, the product Ax of an n × m–matrix A = (aij ) and a column vector
x = (x1 , . . . , xm )T can be rewritten as
x1 (a11 , . . . , an1 )T + x2 (a12 , . . . , an2 )T + · · · + xm (a1m , . . . , anm )T ,
a linear combination of the columns of A with coefficients x1 , . . . , xm .
5.1.6 Theorem. The system of linear equations (in matrix form) Ax = b has a solution
if and only if b belongs to the column space of A.
5.1.7 Two linear combinations of the columns can be described by using an m × 2 matrix
instead of an m × 1 matrix. Likewise for more than two linear combinations. For
example, in the following matrix product the two columns of the 2×2–matrix from
the right-hand side are linear combinations of the three columns of the 2×3-matrix
from the left-hand side:
  1  3 −1     3   1         3  −13
              2  −4    =
  2 −2  5     6   2        32   20  .
Now let c1 , . . . , ck be a basis of the column space of A and let C be the matrix
with columns c1 , . . . , ck . Since every column of A is a linear combination of
c1 , . . . , ck , there is, by the observation above, a matrix X with
CX = A,
in which X is a k × m–matrix.
Now concentrate on the rows in this equality: every row of A is a linear com-
bination of the rows of X. Since the number of rows of X equals k, the row space
has dimension at most k. So
dim(rowspace) ≤ dim(columnspace).
Applying the same reasoning to the transpose AT (interchanging the roles of rows
and columns) yields
dim(columnspace) ≤ dim(rowspace).
Summarizing:
5.1.8 Theorem. For every matrix A the dimension of the row space equals the dimen-
sion of the column space.
5.1.9 Definition. (Rank) The rank of a matrix is by definition the dimension of its
row or column space. Notation: rank(A).
5.1.10 Determining the rank of a matrix is straightforward: row reduce till you reach the
row reduced echelon form and count the number of nonzero rows.
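A sketch of this computation, assuming SymPy (the matrix is an arbitrary example, not one from the text): the rank equals the number of nonzero rows of the row reduced echelon form, which is also the number of pivot columns.

from sympy import Matrix

A = Matrix([[1, 2, 3],
            [2, 4, 6],
            [1, 1, 1]])
R, pivots = A.rref()
print(R)                        # rows (1, 0, -1), (0, 1, 2), (0, 0, 0)
print(len(pivots), A.rank())    # 2 2 : both give the rank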
In the remainder of this section we concentrate on n × n–matrices.
5.1.11 Theorem. For an n × n–matrix A the following statements are equivalent:
1. the rank of A is n;
2. the n rows of A are linearly independent;
3. the n columns of A are linearly independent;
4. the row reduced echelon form of A is the identity matrix I.
Proof. We first show that 1), 2) and 3) are equivalent, and then that 1) and 4) are
equivalent.
If the rank of A is n, then the n rows and the columns span an n-dimensional
space, hence must be linearly independent. So 1) implies 2) and 3). Conversely, if
the n rows (or columns) are linearly independent, then the rows (columns) span a
n-dimensional space and so the rank must be n. So 2) implies 1), and 3) implies
1). Hence 1), 2), 3) are equivalent.
Since A and its row reduced echelon form have the same rank, we see that 4)
implies 1). Now suppose A has rank n and consider the last row of the row reduced
echelon form. It can’t be the zero row, because then the row space is spanned by
n − 1 rows and its dimension would be at most n − 1. So there must be a 1 in the
last row. Since the last row starts with more zeros than the n − 1-th row, and this
n − 1-th row starts with more zeros than the n − 2-th row, etc, this 1 must be in
position n, n. In a similar way, you show that, for k = 1, . . . , n − 1, the k-th row
is the k-th row of the identity matrix (or use induction). So 1) implies 4) and we
are done.
5.1.12 Corollary. Let A be an n × n–matrix. The system of linear equations Ax = b
has exactly one solution if and only if the rank of A equals n.
Proof. If the rank of the coefficient matrix is n, then the columns are a basis of
Rn (or Cn ). Every vector in Rn (or Cn ) can then be written in a unique way
as a linear combination of the columns, i.e., the system Ax = b has exactly one
solution.
If the rank of A is less than n, then the columns of A are not linearly indepen-
dent. This means that the system has infinitely many solutions, or no solutions
at all (this last case means that b is not in the column space of A). So if the system
has exactly one solution, the rank of A must be n.
5.1.13 Theorem. If the m × n–matrix A has rank k, then the solution space of the
homogeneous system of linear equations Ax = 0 in n variables has dimension
n − k. In other words: the dimension of the solution space equals the number of
variables minus the number of ‘independent’ conditions.
Proof. Since neither the rank of A nor the solution space of Ax = 0 changes if we replace
A by its row reduced echelon form, we can restrict our attention to the case that
A is already in row reduced echelon form. The rank of A is then equal to the
number of nonzero rows (see Theorem 4.3.9). The first place (from the left) in row
j containing a 1 corresponds to variable xij , say. So the k variables xi1 , . . . , xik
cannot be assigned a value arbitrarily, while the remaining n − k variables, say
xj1 , . . . , xjn−k in spots j1 , . . . , jn−k can be assigned arbitrary values, λj1 , . . . , λjn−k ,
say. The solutions can then be written as λj1 aj1 + · · · + λjn−k ajn−k . The vector
ajl has a 1 in position jl , while the remaining vectors have a zero in that position.
This implies that these n − k vectors are linearly independent (why?). So the
dimension of the solution space is n − k.
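The dimension count in this theorem can be checked mechanically as well: the solution space of Ax = 0 is the nullspace of A, and its dimension equals the number of variables minus the rank. A sketch assuming SymPy (the matrix is an arbitrary example):

from sympy import Matrix

A = Matrix([[1, 2, 3, 4],
            [2, 4, 6, 8],
            [1, 1, 1, 1]])
basis = A.nullspace()              # a basis of the solution space of Ax = 0
print(A.rank(), len(basis))        # 2 2 : dimension of the solution space = 4 - 2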
5.1.15 Theorem. The n × n–matrix A has an inverse if and only if the rank of A equals n.
In that case the inverse is unique.
Proof. Suppose first that the rank of A is n, and let k 1 , . . . , k n be the columns of A.
We look for a matrix X with AX = I, i.e., such that the columns of AX are e1 , . . . , en .
The elements of the i-th column (x1i , . . . , xni ) of X should satisfy x1i k 1 + · · · +
xni k n = ei . Since the columns of A are a basis of Rn (or Cn ), such a column exists
(and is in fact unique). So there is a unique inverse.
Conversely, if an inverse matrix X exists, then we conclude from the equality AX = I
that e1 , . . . , en are in the column space of A. The n columns of A therefore span the
n-dimensional space Rn (or Cn ), hence must be linearly independent (otherwise
this n-dimensional space could be spanned with fewer columns, contradicting the fact
that the dimension is n). So the rank of A is n by Theorem 5.1.11.
In practice the inverse can be computed as follows. To find the i-th column
of the inverse, you need to solve the system of linear equations with extended
matrix (A|ei ). Since the resulting n systems all have the same coefficient matrix
A, these systems can be solved simultaneously! To actually do this, consider the
matrix (A|e1 , . . . , en ) = (A|I). Row reduce and read off the solutions on the right
of the vertical bar: (A|I) is being reduced to (I|A−1 ) if the inverse exists. Whether
this inverse exists can be concluded during the process: if the rank of A turns out
to be less than n, then row reducing of (A|I) produces a matrix in which the last
row is
(0, . . . , 0| ∗ . . . ∗),
where the last n ∗-s cannot all be 0 (the rows of I are linearly independent so must
remain nonzero in the row reducing process). So the system has no solutions.
In Linear Algebra 2 we will be able to prove in a simple way that if a matrix
B satisfies A B = I, then automatically B A = I. The matrix computed according
to the above procedure is therefore indeed the inverse of A.
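The reduction (A|I) → (I|A−1 ) is easy to carry out by machine. A sketch assuming SymPy (the 2 × 2 matrix is an arbitrary invertible example):

from sympy import Matrix, eye

A = Matrix([[2, 1],
            [1, 1]])
n = A.rows
aug = A.row_join(eye(n))       # the extended matrix (A|I)
R, _ = aug.rref()
A_inv = R[:, n:]               # the block to the right of the bar after reduction
print(A_inv)                   # rows (1, -1), (-1, 2)
print(A * A_inv == eye(n))     # True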
5.2 Determinants
5.2.1 In the previous section we have seen that an n × n-matrix A of rank n has some
pleasant properties: its inverse exists and the system of linear equations (A|b) has
exactly one solution. Determining the rank of a matrix can be done using row
reduction and a simple count of nonzero rows. A second technique, which we will
discuss in this section, is to compute the so-called determinant of an n × n-matrix. This is a
number which is nonzero if and only if the rank of the matrix equals n. Computing
determinants is fairly easy (at least if n is relatively small), but the theory behind
it is more complicated.
The plan for this section is as follows.
• First we discuss the case of 2 × 2–determinants in detail, because almost all
aspects of determinants can be illustrated in this case.
• Then we turn to the definition of a general n × n–determinant.
• Next we concentrate on various ways to compute determinants, like expand-
ing with respect to a row or column and the role of row reducing.
• Finally, we discuss the connection with systems of linear equations (Cramer’s
rule) and with inverse matrices.
5.2.2 For a 2 × 2–matrix A with rows (a, b) and (c, d) the determinant is defined as
det(A) = ad − bc .
5.2.3 It’s easy to show that det(A) ≠ 0 ⇔ rank(A) = 2. If, for instance, the second row
is a multiple of the first, say (c, d) = λ(a, b), then it easily follows that det(A) =
a(λb) − b(λa) = 0. Similarly, if the first row is a multiple of the second row.
Conversely, if det(A) = 0 and a ≠ 0, then from ad = bc we get d = (bc/a) so that
(c, d) = (c/a)(a, b). Finish the proof yourself by considering the case a = 0.
The number det(A) also plays a role in solving the system of equations Ax = p
in two variables. If A1 is the matrix obtained from A by replacing the first column
by p, and A2 the matrix obtained from A by replacing the second column of A by
p, then the unique solution of the system
ax1 + bx2 = p1
cx1 + dx2 = p2
is
x1 = det(A1 )/det(A) = (p1 d − bp2 )/(ad − bc) ,    x2 = det(A2 )/det(A) = (ap2 − p1 c)/(ad − bc) ,
provided det(A) 6= 0. You can check that this is really the solution, but we will
come across a nice proof in 5.2.22. There is a similar formula for the inverse of a
2 × 2–matrix of rank 2.
The determinant det(A) also plays a role in surface area computations. (We
haven’t defined surface areas exactly, so this discussion is only by way of illus-
tration.) Figure 5.1 provides a ‘proof by pictures’ of the fact that the area of a
parallelogram spanned by the vectors (a, b) and (c, d) is equal to ad − bc = det(A).
This number is called the ‘oriented area’ since it can be negative. The ‘true’ area
is obtained by taking the absolute value. The expression ad − bc is not linear, but
is what is called bilinear (as will be discussed later).
[Figure 5.1: the parallelogram spanned by (a, b) and (c, d), together with its shearing
into a rectangle with vertices (a, 0) and (0, d − bc/a); both have area ad − bc.]
Most properties of the 2 × 2–
determinant recur when we discuss n × n–determinants. Consider the determinant
as a function of the two rows (or columns, but for the moment we focus on rows)
of the matrix: det(a1 , a2 ). Then the properties that we mean are mainly (for all
choices of vectors, scalars):
1. bilinearity: det(λa1 + µb, a2 ) = λ det(a1 , a2 ) + µ det(b, a2 ), and similarly in the
second entry;
2. antisymmetry: det(a2 , a1 ) = − det(a1 , a2 );
3. normalization: det(e1 , e2 ) = 1.
These properties are easy to verify. For instance, the second property follows from:
det((c, d), (a, b)) = cb − da = −(ad − bc) = − det((a, b), (c, d)).
It seems a bit pompous to describe an easy expression like ad−bc with these abstract
looking properties, but in higher dimensions the description with the properties is
extremely useful as opposed to explicit formulas for n × n determinants.
The determinant is unique in the sense that any function D of pairs of vectors
having the three properties mentioned above must be the determinant function.
To show this, we first note that D(a, a) = 0 (use antisymmetry). Next, using
bilinearity we find:
D((a, b), (c, d)) = D(ae1 + be2 , ce1 + de2 )
= ac D(e1 , e1 ) + ad D(e1 , e2 ) + bc D(e2 , e1 ) + bd D(e2 , e2 ) = ad − bc .
In higher dimensions a similar computation (which we will skip) shows that the
n × n determinant is unique.
Now we turn to the n × n determinant and start with a definition of a determinant
function of n vectors in Rn or Cn .
5.2.4 Definition. (Determinant function) A determinant function D assigns to n
vectors a1 , . . . , an in Rn (or Cn ) a number D(a1 , . . . , an ) in such a way that:
1. Multilinearity:
D(a1 , . . . , ai−1 , β1 b1 + · · · + βm bm , ai+1 , . . . , an )
= β1 D(a1 , . . . , ai−1 , b1 , ai+1 , . . . , an ) + · · · + βm D(a1 , . . . , ai−1 , bm , ai+1 , . . . , an ),
and similarly when such a sum appears in the first entry or in the last entry.
We sometimes say that D is linear in every entry.
2. Antisymmetry: interchanging two of the entries changes the sign, i.e.,
D(. . . , ai , . . . , aj , . . .) = −D(. . . , aj , . . . , ai , . . .).
3. Normalization: D(e1 , e2 , . . . , en ) = 1.
5.2.5 Note that the definition doesn’t guarantee that determinant functions exist for all
n (for n = 2 we have seen that there is one). We will show that there is, for every
n, precisely one determinant function, and we will discuss ways to compute such
determinants. We will usually call the unique determinant function simply the
determinant.
5.2.6 From the antisymmetry it follows that D(a1 , . . . , an ) = 0 as soon as two of the
vectors are equal (interchanging them changes the sign but not the value). As a
consequence, D(a1 , . . . , an ) = 0 whenever a1 , . . . , an are linearly dependent. Indeed,
suppose for instance that
a1 = α2 a2 + · · · + αn an .
Then
D(a1 , a2 , . . . , an ) = D(α2 a2 + · · · + αn an , a2 , . . . , an )
= α2 D(a2 , a2 , . . . , an ) + · · · + αn D(an , a2 , . . . , an ) = 0,
since every determinant in the last sum contains two equal vectors.
5.2.7 Using these conditions we can write out what the n × n determinant should be:
just like for 2 × 2 matrices, write every row as a linear combination of the standard
basis vectors and use the multilinearity to expand the determinant as a sum of
many determinants with standard basis vectors as rows (in some order). Here is
how to do that. Consider the matrix
      a11 a12 . . . a1n
      a21 a22 . . . a2n
A =    ..  ..        ..
       .   .         .
      an1 an2 . . . ann
with rows a1 , . . . , an . Every row can be written as ai = ai1 e1 + ai2 e2 + · · · + ain en ,
and so
D(a1 , . . . , an ) = D( Σj1 a1j1 ej1 , . . . , Σjn anjn ejn )
= Σj1 · · · Σjn a1j1 · · · anjn D(ej1 , . . . , ejn ) ,
where each of the indices j1 , . . . , jn runs from 1 to n.
This is a sum with many terms if n gets big: there are n summation indices, each
of which assumes n values, so the number of terms is n^n . For n = 8 this
already amounts to 16,777,216 terms.
In reality there are fewer terms: if two of the indices are equal, then we are
dealing with a determinant with two equal rows and such a determinant is 0. So,
if D exists, then
D(a1 , . . . , an ) = Σ a1j1 . . . anjn D(ej1 , . . . , ejn ) ,
where the sum is over all choices of indices j1 , . . . , jn that are all distinct.
Now which indices (j1 , . . . , jn ) occur in this sum? From the fact that all numbers
j1 , . . . , jn should be distinct and lie between 1 and n we conclude that in
(j1 , . . . , jn ) every number between 1 and n occurs precisely once. Such a sequence
is called a permutation of the numbers 1, . . . , n. For example, the permutations of
1, 2, 3 are
(1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), (3, 2, 1).
These permutations can be listed as follows: choose an element from {1, 2, 3}. This
can be done in three ways. Then there are two left to choose from in the next
step. After that, there is only one choice left for the third element. So there are
3 × 2 × 1 = 3! = 6 permutations of the numbers 1, 2, 3.
In general, there are n! = n · (n − 1) · · · 2 · 1 (‘n factorial’) permutations for the
numbers 1, 2, . . . , n.
The number of terms is therefore n!, which is substantially less than nn , but
still large. For instance, 8! = 40,320. Unfortunately, a further a priori reduction
is not possible (though we will see that there are ways to compute determinants
avoiding writing down all these terms). Since any sequence j1 , . . . , jn consists of
all numbers 1, . . . , n, by repeatedly interchanging two elements in the sequence in
a clever way (so-called transpositions), we can attain the sequence 1, . . . , n. Every
step in which we interchange two elements introduces a factor −1 , so that,
D(a1 , . . . , an ) = Σj1 ,...,jn ± a1j1 . . . anjn , (5.1)
where the sum runs through all n! permutations of (1, . . . , n) and where a term is
preceded by +1 if the corresponding permutation can be changed into (1, . . . , n)
by an even number of ‘transpositions’, and by −1 otherwise.
The formula shows that for any n there is at most one determinant function
(defined by the formula (5.1) just given). Conversely, one can prove that (5.1)
satisfies the requirements from Definition 5.2.4 (the proof comes down to proving
that the parity of the number of transpositions, i.e., whether you need an even
or odd number, involved in changing a permutation into 1, . . . , n does not depend
on the choice of transpositions used). We will skip this proof, since it belongs to
the domain of algebra. Given that determinants exist, we now concentrate on the
cases n = 2 and n = 3. The discussions for n > 3 are similar.
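For small n the formula (5.1) can also be evaluated directly. In the plain-Python sketch below the sign of a permutation is determined by counting inversions, i.e. pairs that are out of order; this count has the same parity as the number of transpositions mentioned above. The function names are our own choices.

from itertools import permutations
from math import prod

def sign(p):
    # parity of the permutation = parity of the number of inversions
    inversions = sum(1 for i in range(len(p))
                       for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1

def det(A):
    n = len(A)
    return sum(sign(p) * prod(A[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

print(det([[1, 2], [3, 4]]))                    # 1*4 - 2*3 = -2
print(det([[2, 0, 0], [0, 3, 0], [0, 0, 4]]))   # 24: only the identity permutation contributes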
5.2.8 2 × 2 determinant
Take n = 2 and consider the matrix with rows a1 = (a, b) and a2 = (c, d).
There are only two permutations in this case: (1, 2) and (2, 1). The first one
comes with a plus sign in (5.1) and the second is one step of interchanging away
from (1, 2) and so comes with a minus sign. So:
| a b |
| c d |  = ad − bc .
5.2.9 3 × 3 determinant
Take n = 3 and consider
a11 a12 a13
a21 a22 a23 .
a31 a32 a33
There are 6 permutations of (1, 2, 3); the ones that come with a plus sign are
(1, 2, 3), (2, 3, 1), (3, 1, 2), and the ones with a minus sign are (1, 3, 2), (2, 1, 3),
(3, 2, 1). We find:
a11 a22 a33 + a12 a23 a31 + a13 a21 a32 − a11 a23 a32 − a12 a21 a33 − a13 a22 a31 .
This expression is also known as Sarrus’ rule. This rule is easy to remember. Just
put copies of the first two columns on the right-hand side of the matrix,
and then take the sum of the products of the elements on each of the diagonals
(from upper left to lower right) and subtract the products of the ‘anti diagonals’
(from upper right to lower left).
By inspection of the terms we see that each of a11 , a12 , a13 occurs linearly
in every term reflecting the fact that the determinant is really linear in the first
vector. It takes some more work to verify from the formula that interchanging a1
and a2 , or a2 and a3 , or a3 and a1 , only changes the sign of the determinant (and
we will not write out the details). Finally, if a11 = a22 = a33 = 1 and aij = 0
for i 6= j, then D(e1 , e2 , e3 ) = 1. These considerations show that the determinant
exists for n = 3.
5.2.10 In general one can prove that expression (5.1) indeed defines a determinant function
for every n. The proof is quite involved and we will not provide details in this
course. The subtle point is the sign: one needs that the parity of the number of
steps you need in rewriting a given permutation doesn’t depend on the way you
actually carry out these steps.
In practice, the expression (5.1) is almost never useful. To actually compute
determinants there are much better ways than using this formula as we will see
below. Here are a number of results on determinants and some words on the
proofs.
First, for every square matrix A we have
det(A) = det(AT ) .
Sketch of proof. Note that (5.1) implies that every term in det(A) is a product of n
elements of the matrix in such a way that every row and every column contribute
to this product. The same holds for all terms of det(AT ) so that det(A) and
det(AT ) are sums of the same terms except maybe for the signs. That the signs
are the same is less trivial and is part of the theory of permutations.
Next, for all n × n–matrices A and B we have det(AB) = det(A) det(B). Sketch of
proof: fix B and define, for any n vectors x1 , . . . , xn (taken as the rows of a matrix X),
D(x1 , . . . , xn ) = det(XB).
Then it is not difficult to show (but it takes some writing) that D is multilinear and
anti-symmetric. Since D(e1 , . . . , en ) = det(B) (and not necessarily 1), we conclude
that D must be det(B) times the determinant. In other words, D = det(B) det.
In particular, if we use the rows a1 , . . . , an of our matrix A, we get
det(AB) = D(a1 , . . . , an ) = det(B) det(A) .
For a square matrix A we denote by Aij the matrix obtained from A by deleting
the i-th row and the j-th column. For example,
A12 =   2  3      and   A33 =   −1  0
        1 −1                     2  1  .
Consider the n × n–matrix
      a11 a12 . . . a1n
      a21 a22 . . . a2n
A =    ..  ..        ..
       .   .         .
      an1 an2 . . . ann .
Every term of det(A) contains factors from the first column. First consider those
terms containing the factor a11 . Then (5.1) shows that all other factors of such
a term do not come from the first row or column. These other factors therefore
come from the submatrix A11 . Next we consider the terms containing the factor
a21 . The other factors in such a term must come from A21 for similar reasons, etc.
A detailed inspection yields the expansion across the first column:
det(A) = a11 det(A11 ) − a21 det(A21 ) + · · · + (−1)n+1 an1 det(An1 ).
There are similar formulas for the expansion across any row or column. Such
expansions reduce the computation of a ‘big’ determinant to the computation of
smaller determinants. Expanding across the i-th row gives
det(A) = Σj=1,...,n (−1)i+j aij det(Aij ) ,
and expanding across the j-th column gives
det(A) = Σi=1,...,n (−1)i+j aij det(Aij ) .
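The expansion across the first row translates directly into a recursive procedure. A plain-Python sketch (fine for small matrices, but much slower than row reduction for large ones; the names minor and det are our own choices):

def minor(A, i, j):
    # The matrix A with row i and column j deleted (A_ij in the notation above).
    return [row[:j] + row[j+1:] for r, row in enumerate(A) if r != i]

def det(A):
    n = len(A)
    if n == 1:
        return A[0][0]
    # expansion across the first row; (-1)**j gives the alternating sign pattern
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(n))

print(det([[1, 2, 3],
           [4, 5, 6],
           [7, 8, 10]]))   # -3, as Sarrus' rule also gives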
5.2.16 Example. Such an expansion is especially useful if a row or column contains many
zeros. The expansion across such a row or column then reduces the computation
to only a few smaller determinants. Here is an example concerning a so-called
(upper or lower) triangular matrix : a matrix whose entries above or below the
main diagonal are 0’s. Repeatedly expanding across the first column yields, for an
upper triangular matrix,
           a11  ∗  . . . . . .   ∗
            0  a22               ..
det(A) =    ..      ..            .      =  a11 a22 · · · ann :
            .         .           ∗
            0  . . .  0         ann
each expansion step picks up the next diagonal entry, so the determinant of a
triangular matrix is the product of its diagonal entries.
The proofs are straightforward. By way of example, we prove the third property.
Suppose we add λaj to ai , i ≠ j. Then linearity implies:
\[
\det(\ldots, a_i + \lambda a_j , \ldots, a_j , \ldots)
= \det(\ldots, a_i , \ldots, a_j , \ldots) + \lambda \det(\ldots, a_j , \ldots, a_j , \ldots) .
\]
The first determinant on the right-hand side is det(A), while the second one is 0
since two of the rows are equal, see 5.2.6.
5.2.19 Example.
\[
\det(A) =
\begin{vmatrix}
0 & 1 & 2 & -1 \\
2 & 5 & -7 & 3 \\
0 & 3 & 6 & 2 \\
-2 & -5 & 4 & -2
\end{vmatrix}
=
\begin{vmatrix}
0 & 1 & 2 & -1 \\
2 & 5 & -7 & 3 \\
0 & 3 & 6 & 2 \\
0 & 0 & -3 & 1
\end{vmatrix}
= -2
\begin{vmatrix}
1 & 2 & -1 \\
3 & 6 & 2 \\
0 & -3 & 1
\end{vmatrix}
= -2
\begin{vmatrix}
1 & 2 & -1 \\
0 & 0 & 5 \\
0 & -3 & 1
\end{vmatrix}
= 10
\begin{vmatrix}
1 & 2 \\
0 & -3
\end{vmatrix}
= -30 .
\]
Here is a description of the steps taken. First we add the 2-nd row to the 4-th.
Then we expand across the first column. Then we subtract the 1-st row 3 times
from the 2-nd. Then we expand the 3 × 3 determinant across the 2-nd row. Finally,
the 2 × 2 determinant is computed using 5.2.8.
5.2.20 Example.
\[
\det(A) =
\begin{vmatrix}
2 & 0 & 0 & 8 \\
1 & -7 & -5 & 0 \\
3 & 8 & 6 & 0 \\
0 & 7 & 5 & 4
\end{vmatrix}
= 2
\begin{vmatrix}
1 & 0 & 0 & 4 \\
1 & -7 & -5 & 0 \\
3 & 8 & 6 & 0 \\
0 & 7 & 5 & 4
\end{vmatrix}
= 8
\begin{vmatrix}
1 & 0 & 0 & 1 \\
1 & -7 & -5 & 0 \\
3 & 8 & 6 & 0 \\
0 & 7 & 5 & 1
\end{vmatrix}
= 8
\begin{vmatrix}
1 & 0 & 0 & 1 \\
1 & -7 & -5 & 0 \\
3 & 8 & 6 & 0 \\
-1 & 7 & 5 & 0
\end{vmatrix}
= -8
\begin{vmatrix}
1 & -7 & -5 \\
3 & 8 & 6 \\
-1 & 7 & 5
\end{vmatrix}
= 0 .
\]
In the first two steps we ‘extract’ a factor 2 from the 1-st row and then a factor
4 from the last column. Then we subtract the 1-st row from the 4-th and expand
across the last column. The result is a 3 × 3 determinant in which the 1-st and
3-rd row are multiples of one another, so that this determinant is 0.
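If you want to check such computations numerically, a library routine will do; the following lines (an aside using NumPy, which is not part of these notes) reproduce the values −30 and 0 found in the two examples above:

import numpy as np

A1 = np.array([[0, 1, 2, -1], [2, 5, -7, 3], [0, 3, 6, 2], [-2, -5, 4, -2]], dtype=float)
A2 = np.array([[2, 0, 0, 8], [1, -7, -5, 0], [3, 8, 6, 0], [0, 7, 5, 4]], dtype=float)
print(np.linalg.det(A1))   # approximately -30
print(np.linalg.det(A2))   # approximately 0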
The following theorem is related to Theorems 5.1.11 and 5.1.15, and to Corollary
5.1.12:

Theorem. Let A be an n × n-matrix. Then:

1. rank(A) = n if and only if det(A) ≠ 0.

2. The matrix A is invertible if and only if det(A) ≠ 0.

3. The system of linear equations Ax = b has exactly one solution if and only
if det(A) ≠ 0.
Proof.
1. From 5.2.17 we first conclude that row and column operations may change
the value of the determinant, but not the property of ‘being 0’: if det(A) ≠ 0,
then any row or column operation produces a matrix whose determinant
is ≠ 0; and if det(A) = 0 then any row or column operation produces a
matrix whose determinant is 0. If rank(A) = n, then A can be transformed
into the identity matrix I whose determinant is 1, so that, by the previous
considerations, det(A) ≠ 0; if rank(A) < n, then one of the rows must be a
linear combination of the other rows and therefore det(A) = 0 by 5.2.6.
b = x1 k1 + x2 k2 + . . . + xn kn .
Now replace the j-th column of A by b and denote the resulting matrix by Aj (b).
Then
\[
\begin{aligned}
\det(A_j(b)) &= \det\Bigl(k_1 , \ldots, k_{j-1} , \sum_{i=1}^{n} x_i k_i , k_{j+1} , \ldots, k_n\Bigr) \\
&= \sum_{i=1}^{n} x_i \det(k_1 , \ldots, k_{j-1} , k_i , k_{j+1} , \ldots, k_n) \\
&= x_j \det(k_1 , \ldots, k_n) = x_j \det(A),
\end{aligned}
\]
because in the sum all determinants with i ≠ j are 0 since they contain two equal
vectors. The solution of the system is therefore:
\[
x_j = \frac{\det(A_j(b))}{\det(A)} , \qquad j = 1, \ldots, n .
\]
ax + by = c
dx + ey = f
leads to
\[
x = \frac{\begin{vmatrix} c & b \\ f & e \end{vmatrix}}{\begin{vmatrix} a & b \\ d & e \end{vmatrix}} = \frac{ce - bf}{ae - bd} ,
\qquad
y = \frac{\begin{vmatrix} a & c \\ d & f \end{vmatrix}}{\begin{vmatrix} a & b \\ d & e \end{vmatrix}} = \frac{af - cd}{ae - bd} ,
\]
whenever ae − bd ≠ 0.
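Cramer’s rule is straightforward to implement for small systems. Here is a minimal sketch (the function name cramer is ours; it assumes det(A) ≠ 0 and uses NumPy for the determinants):

import numpy as np

def cramer(A, b):
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    d = np.linalg.det(A)
    x = np.empty(len(b))
    for j in range(len(b)):
        Aj = A.copy()
        Aj[:, j] = b               # replace the j-th column of A by b
        x[j] = np.linalg.det(Aj) / d
    return x

print(cramer([[1, 2], [3, 4]], [5, 6]))   # [-4.   4.5]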
5.2.23 Cramer’s rule can also be used to derive an explicit formula for the inverse of an
invertible square matrix. Again, for n ≥ 3 this formula is mainly of theoretical
importance. Using row operations to find the inverse is much more efficient in
practice.
To derive this formula in the case of a 2 × 2–matrix, we have to solve two
systems of linear equations: Ax = e1 and Ax = e2 , whose solutions are the two
columns of A−1 . Applying Cramer’s rule to both systems yields, for
\[
A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \quad\text{with } ad - bc \neq 0, \qquad
A^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix} .
\]
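The same idea works for any n: the (j, i) entry of the inverse is (−1)^{i+j} det(Aij)/det(A). The sketch below (illustrative only, and far less efficient than row reduction) builds the inverse from these cofactors:

import numpy as np

def inverse_via_cofactors(A):
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    C = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)   # cofactor of a_ij
    return C.T / np.linalg.det(A)                              # adjugate divided by det(A)

print(inverse_via_cofactors([[1.0, 2.0], [3.0, 4.0]]))   # [[-2.  1.], [1.5 -0.5]]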
5.2.24 The 2 × 2 determinant can be interpreted as a surface area as we saw in the begin-
ning of the chapter. Similarly, n × n determinants have geometric interpretations:
they ‘measure’ volumes of parallelepipeds spanned by the rows (or columns) of the
matrix in Rn .
5.3 Notes
Determinants were popular among 19th century mathematicians. The name deter-
minant was coined by the French mathematician Augustin-Louis Cauchy (1789–
1857). All sorts of determinantal identities were derived. Cramer’s rule goes back
to Gabriel Cramer (1704–1752), even though Cramer himself did not give a proof
of the rule. Nowadays, the importance of determinants in various branches of
mathematics (and fields where mathematics is applied) is very clear. In analysis (see Analysis 2 and 3)
determinants show up in the substitution rule for multiple integrals. For instance,
when introducing new variables in a double integral a 2 × 2 determinant appears
in the transformed integral. When using polar coordinates x = r cos φ, y = r sin φ
this looks as follows:
\[
\iint f(x, y)\, dx\, dy
= \iint f(r\cos\varphi, r\sin\varphi)\,
\begin{vmatrix} \cos\varphi & \sin\varphi \\ -r\sin\varphi & r\cos\varphi \end{vmatrix}\, dr\, d\varphi
= \iint f(r\cos\varphi, r\sin\varphi)\, r\, dr\, d\varphi .
\]
The proof that determinant functions exist in all dimensions requires knowledge
of permutations beyond the scope of these lecture notes. An elegant construction
of a determinant function is the following: let φ1 , . . . , φn be the dual basis of a
basis of Rn (dual bases occur in Linear Algebra 2) and define
\[
\det(a_1 , \ldots, a_n) = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma)\, \varphi_{\sigma(1)}(a_1) \cdots \varphi_{\sigma(n)}(a_n),
\]
where Sn is the set of all n! permutations of {1, . . . , n}. Permutations are discussed
more extensively in the courses on algebra. They are also useful in describing
symmetries like the 48 symmetries of the cube. The proof that there are no general
formulas for finding the roots of polynomials of degrees ≥ 5 (a topic of the Algebra
course) also uses permutations.
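For very small n one can even evaluate the permutation formula directly. A small illustrative sketch (the sign is computed by counting inversions; with n! terms this is feasible only for tiny matrices):

from itertools import permutations
from math import prod

def sign(p):
    # sign of the permutation p of 0, ..., n-1: (-1) to the number of inversions
    inv = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inv % 2 else 1

def det_leibniz(a):
    n = len(a)
    return sum(sign(p) * prod(a[i][p[i]] for i in range(n)) for p in permutations(range(n)))

print(det_leibniz([[1, 2], [3, 4]]))   # -2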
Determinant functions are special cases of multilinear functions, which play
an important role in the theory of line, surface and volume integrals (theorems of
Gauss, Green, Stokes) and in differential geometry.
5.4 Exercises
§1
1 Determine a basis for the row space and a basis for the column space for each of
the following matrices:
a. \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 \end{pmatrix} ,
\qquad
b. \begin{pmatrix} 1 & 1 & 0 & 1 \\ -1 & 2 & 1 & 1 \\ -1 & 8 & 3 & 5 \end{pmatrix} ,
\qquad
c. \begin{pmatrix} 1 & i & 1+i \\ 1+i & 1 & 2+i \\ 2+i & -1 & 1+i \end{pmatrix} .
3 Consider the matrices in the previous exercise. Determine the inverse of each of
the matrices whenever the inverse exists. Check your answers by using AA−1 =
A−1 A = I.
c. Let A and B be two matrices such that the product AB exists. Show that the
column space of AB is contained in the column space of A, and that the row
space of AB is contained in the row space of B.
§2
x1 +x2 +x3 = 2,
2x1 −x2 −3x3 = −1,
3x2 +5x3 = 5.
x1 +ix2 +2x3 = i,
(1 + i)x1 +x2 +x3 = 1,
(−1 + 2i)x1 +(−1 + i)x2 +3ix3 = 0.
has no solutions.
3x +4y +2z = 6,
4x +6y +3z = 6,
2x +3y +z = 1.
14 For which α, β, γ are the vectors (α, β, γ), (α, 2β, 2γ), (2α, 2β, γ) in R3 linearly
dependent?
15 a. Show that for any n × n-matrix A and any scalar α the relation det(αA) =
α^n det(A) holds.
b. Suppose the square matrix A satisfies A−1 = A⊤ . What conclusion can you
draw about det(A)?
c. Suppose the n × n-matrix A satisfies A⊤ = −A. What follows from this for
det(A)?
16 a. Give an example of a square matrix A, different from the zero matrix, for which
A2 equals the zero matrix.
c. Give an example of a square matrix A, different from the zero matrix, for
which A2 = A.
17 a. Show that
\[
\begin{vmatrix} 1 & 1 & 1 \\ a & b & c \\ a^2 & b^2 & c^2 \end{vmatrix} = (b - a)(c - a)(c - b) .
\]
19 Determine all the values of a for which the vectors (a2 + 1, a, 1), (2, 1, a), (1, 0, 1)
in R3 are linearly dependent.
x1 + λx2 = λ² ,
λx1 + x2 + (λ − 1)x3 = λ² ,
2x2 − x3 = 0
have no solutions?
Chapter 6
• define the notions length, distance and angle (in particular being perpendic-
ular) in real vector spaces;
6.1.2 To give an idea as to where the notion of an inner product comes from, we take a
look at the plane.
In a triangle in which two sides correspond to the vectors a and b, the third side
has length k a − b k. If ϕ is the angle between the vectors a and b, then the cosine
rule tells us that
‖a − b‖² = ‖a‖² + ‖b‖² − 2 ‖a‖ · ‖b‖ cos ϕ.
6.1.3 Definition. (Inner product) Let V be a real vector space. An inner product 1
on V is a function which assigns to any two vectors a, b from V a real number
denoted by (a, b) in such a way that
1. (λa + µb, c) = λ(a, c) + µ(b, c) for all a, b, c ∈ V and all scalars λ, µ
(linearity in the first entry).

2. (a, b) = (b, a) for all a, b ∈ V . (So we need only impose linearity in the
first entry in the previous item, because symmetry will then automatically
imply linearity in the second entry.)

3. (a, a) ≥ 0 for all a ∈ V , and (a, a) = 0 if and only if a = 0.

A real vector space with an inner product is often called a real inner product space.
1
An inner product is sometimes called a ‘dot product’ with notation a · b.
6.1.5 An inner product on a real vector space takes real values, an inner product on a
complex vector space assumes complex values. Although there are many similar-
ities between real and complex inner products one significant difference concerns
the second entry: a complex inner product is not linear in the second entry. Instead,
\[
\Bigl( a, \sum_{i=1}^{n} \beta_i b_i \Bigr)
= \overline{\Bigl( \sum_{i=1}^{n} \beta_i b_i , a \Bigr)}
= \overline{\sum_{i=1}^{n} \beta_i (b_i , a)}
= \sum_{i=1}^{n} \overline{\beta_i}\, \overline{(b_i , a)}
= \sum_{i=1}^{n} \overline{\beta_i}\, (a, b_i) .
\]
6.1.6 Example. (Standard inner product) There are many ways to define an inner
product on Rn or Cn , but there is one which deserves a special name, the so-called
standard inner product: if a = (a1 , a2 , . . . , an ) and b = (b1 , b2 , . . . , bn ), then on Rn
\[
(a, b) = a_1 b_1 + a_2 b_2 + \cdots + a_n b_n ,
\]
and on Cn
\[
(a, b) = a_1 \overline{b_1} + a_2 \overline{b_2} + \cdots + a_n \overline{b_n} .
\]
Verify that these are really inner products using 6.1.3 and 6.1.4. The standard
inner product generalizes the usual inner product in R2 , but it turns out to play a
special role among inner products as we will see later.
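In code the real and the complex standard inner product can be captured at once by conjugating the second argument (for real numbers conjugation does nothing). A small illustrative sketch:

def std_inner(a, b):
    # (a, b) = a1*conj(b1) + ... + an*conj(bn); for real entries this is the usual dot product
    return sum(x * y.conjugate() for x, y in zip(a, b))

print(std_inner([1, 1, 2], [1, 0, 3]))   # 7
print(std_inner([1j, 1], [1j, 1]))       # (2+0j): (a, a) is real and nonnegative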
6.1.7 Example. Let V be the set of continuous real valued functions on the interval
[a, b] (with b > a). Then pointwise addition and scalar multiplication turn V into
a vector space. For f, g ∈ V we define
\[
(f, g) = \int_a^b f(x)\, g(x)\, dx .
\]
It’s easy to verify that this defines an inner product on V , except maybe the second
part of property 3 (the remaining verifications are left to the reader). To prove
the second part of property 3 we proceed as follows. Take any f ∈ V and suppose
f (α) ≠ 0 for some α ∈ [a, b]. Then continuity of f implies that there is an interval
around α so that |f (x)| > ½|f (α)| > 0 for all x in that interval (if you are familiar
with the ε-δ definition of continuity: use ε = ½|f (α)|). Let δ be the length of the
interval. Then:
\[
(f, f) = \int_a^b |f(x)|^2\, dx \ \geq\ \tfrac14\, \delta\, |f(\alpha)|^2 \ >\ 0 .
\]
6.1.8 Consider a real or complex inner product space V . Since (a, a) is a nonnegative
real number for every vector a, the square root √(a, a) is a real number. We use
this to define the notion of length.
6.1.9 Definition. (Length and distance) In an inner product space the length or
norm k a k of a vector a is defined as
\[
\| a \| = \sqrt{(a, a)} .
\]
The distance between the vectors a and b is by definition the length of the vector
a − b, i.e., k a − b k.
6.1.10 Examples. Using the standard inner product, the length of (1, 1, 1, 1) ∈ R4 equals
\[
\sqrt{1^2 + 1^2 + 1^2 + 1^2} = 2 .
\]
In the inner product space V from example 6.1.7 (where we take a = 0 and b = 1)
the length of the function/vector x 7→ x2 is
\[
\sqrt{\int_0^1 x^2 \cdot x^2\, dx} = \tfrac{1}{5}\sqrt{5} .
\]
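The integral in example 6.1.7 can also be approximated numerically; the following sketch (a simple midpoint rule, nothing more sophisticated) reproduces the value √5/5 ≈ 0.447 for the length of x ↦ x²:

import numpy as np

def fn_inner(f, g, a=0.0, b=1.0, n=100000):
    # midpoint-rule approximation of the integral of f(x) g(x) over [a, b]
    x = a + (b - a) * (np.arange(n) + 0.5) / n
    return (b - a) * np.mean(f(x) * g(x))

f = lambda x: x**2
print(np.sqrt(fn_inner(f, f)))   # about 0.4472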
6.1.11 The introduction of the angle between vectors requires some thought plus a famous
theorem, the Cauchy-Schwarz inequality. This inequality is also of importance in
other branches of mathematics.
6.1.12 Theorem. (The Cauchy-Schwarz inequality) For all a, b in the inner product
space V the following inequality holds:
|(a, b)| ≤ k a k k b k .
Proof. First we deal with the case that a = 0. If a = 0, then the linearity of the
inner product implies: (a, b) = (0a, b) = 0(a, b) = 0 for all b, so in particular
(a, a) = 0 and ‖a‖ = 0. In this case the inequality is even an equality.
From now on we assume a ≠ 0. Choose b ∈ V and let ϕ be the argument of (a, b).
Define a∗ = e−iϕ a. Then (a∗ , b) = e−iϕ (a, b) = |(a, b)| is a real number, and
‖a∗ ‖ = ‖a‖. Now consider, for λ ∈ R, the function
\[
f(\lambda) = (\lambda a^* + b, \lambda a^* + b) .
\]
Then f (λ) ≥ 0 for every λ ∈ R because of definitions 6.1.3 and 6.1.4. Since f (λ)
is a quadratic function in λ,
\[
f(\lambda) = (\lambda a^* + b, \lambda a^* + b)
= \lambda^2 (a^* , a^*) + \lambda (a^* , b) + \lambda (b, a^*) + (b, b)
= \| a^* \|^2 \lambda^2 + 2\lambda (a^* , b) + \| b \|^2 \qquad (\text{since } (a^* , b) \in \mathbb{R}),
\]
its discriminant must be less than or equal to 0, i.e.,
\[
4 (a^* , b)^2 - 4 \|a^*\|^2 \|b\|^2 \leq 0 , \qquad
(a^* , b)^2 \leq \|a^*\|^2 \|b\|^2 , \qquad
|(a^* , b)| \leq \|a^*\| \, \|b\| .
\]
This implies |(a, b)| = |(a∗ , b)| ≤ ‖a∗ ‖ ‖b‖ = ‖a‖ ‖b‖, which is the desired inequality.
6.1.14 Examples. Consider the inner product spaces from example 6.1.6. The Cauchy–
Schwarz inequality then implies the following formula in Cn :
\[
\Bigl| \sum_{i=1}^{n} a_i \overline{b_i} \Bigr|^2 \leq \Bigl( \sum_{i=1}^{n} |a_i|^2 \Bigr) \Bigl( \sum_{i=1}^{n} |b_i|^2 \Bigr)
\]
for every pair (a1 , . . . , an ), (b1 , . . . , bn ) in Cn . For example, for a = (1, 1, 2) and
for all (x, y, z) in R3 we get:
\[
(x + y + 2z)^2 \leq 6\, (x^2 + y^2 + z^2) .
\]
In the vector space V from example 6.1.7 the Cauchy–Schwarz inequality leads to
the following inequality for every pair of continuous functions f and g on [a, b]:
\[
\Bigl( \int_a^b f(x)\, g(x)\, dx \Bigr)^2 \leq \Bigl( \int_a^b |f(x)|^2\, dx \Bigr) \Bigl( \int_a^b |g(x)|^2\, dx \Bigr) .
\]
Inequalities like this one are used to give estimates on, e.g., integrals. For example,
for x ↦ e^{x²} and x ↦ e^{−x²}, considered on the interval [0, 1], we have:
\[
1 = \Bigl( \int_0^1 e^{x^2} e^{-x^2}\, dx \Bigr)^2
\leq \Bigl( \int_0^1 e^{2x^2}\, dx \Bigr) \Bigl( \int_0^1 e^{-2x^2}\, dx \Bigr) .
\]
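A quick numerical sanity check of this last inequality (again with a simple midpoint rule; an aside, not part of the notes proper):

import numpy as np

n = 200000
x = (np.arange(n) + 0.5) / n                 # midpoints of [0, 1]
f, g = np.exp(x**2), np.exp(-x**2)
integral = lambda values: values.mean()      # midpoint rule on [0, 1]
lhs = integral(f * g) ** 2                   # equals 1, since f*g is identically 1
rhs = integral(f**2) * integral(g**2)
print(lhs, rhs, lhs <= rhs)                  # 1.0, about 1.41, True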
6.1.15 Theorem. Let V be an inner product space. Then for all vectors a and b in V
and all scalars λ we have:
1. ‖a‖ ≥ 0, and ‖a‖ = 0 if and only if a = 0;

2. ‖a + b‖ ≤ ‖a‖ + ‖b‖ (the triangle inequality);

3. ‖λa‖ = |λ| ‖a‖.

(Figure: the triangle with vertices 0, a and a + b, with sides of length ‖a‖, ‖b‖ and
‖a + b‖, illustrating the triangle inequality.)
Proof. The first part follows directly from the third condition on inner products.
The third part follows from the linearity of the inner product and (a, b) = (b, a):
\[
\| \lambda a \|^2 = (\lambda a, \lambda a) = |\lambda|^2 (a, a) = |\lambda|^2 \| a \|^2 ,
\]
so
\[
\| \lambda a \| = |\lambda| \, \| a \| .
\]
For property 2, the triangle inequality, we need the Cauchy–Schwarz inequality:
\[
\| a + b \|^2 = (a + b, a + b) = \| a \|^2 + (a, b) + (b, a) + \| b \|^2
\leq \| a \|^2 + 2 |(a, b)| + \| b \|^2
\leq \| a \|^2 + 2 \| a \| \| b \| + \| b \|^2 = ( \| a \| + \| b \| )^2 .
\]
Upon taking square roots on both sides, we find the desired inequality.
6.1.16 The triangle inequality provides an upper bound on the length of the sum of two
vectors. A lower bound can also be extracted from this inequality as follows.
Replace a by a − b in the triangle inequality (we can apply the inequality to any
two vectors!). We then find (for all a and b):
\[
\| (a - b) + b \| \leq \| a - b \| + \| b \| ,
\qquad\text{so}\qquad
\| a - b \| \geq \| a \| - \| b \| .
\]
This inequality is also valid if we now replace b by −b:
\[
\| a + b \| \geq \| a \| - \| -b \| = \| a \| - \| b \| ,
\]
and interchanging the roles of a and b gives
\[
\| b + a \| \geq \| b \| - \| a \| ,
\qquad\text{hence}\qquad
\| a + b \| \geq \bigl|\, \| a \| - \| b \| \,\bigr| .
\]
6.1.18 Angle
The Cauchy–Schwarz inequality enables us to define the notion of angle between
two (nonzero) vectors in a real vector space. Here are the details. Let a ≠ 0, b ≠ 0.
Then the Cauchy–Schwarz inequality implies
\[
-1 \leq \frac{(a, b)}{\|a\|\,\|b\|} \leq 1 .
\]
The angle ϕ between a and b is then defined by cos ϕ = (a, b)/(‖a‖ ‖b‖) with
0 ≤ ϕ ≤ π. Two vectors a and b are called perpendicular or orthogonal (notation:
a ⊥ b) if (a, b) = 0.
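Computing an angle is then a one-liner around the arccosine; a small sketch for Rn with the standard inner product (the clamping guards against rounding errors pushing the quotient just outside [−1, 1]):

import math

def angle(a, b):
    ip = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return math.acos(max(-1.0, min(1.0, ip / (na * nb))))

print(angle((1, 0), (1, 1)))   # pi/4, about 0.7854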
6.1.20 Example. In Rn or Cn (with the standard inner product) the vectors ei and ej
are perpendicular for every i and j with i 6= j. Moreover, every ei has length 1.
6.1.21 Example. Let V be the inner product space of continuous functions on [0, 2π]
(see example 6.1.7) and consider for all n ∈ Z the function
en = einx .
If n ≠ m, then en ⊥ em , because
\[
(e_n, e_m) = \int_0^{2\pi} e^{inx}\, \overline{e^{imx}}\, dx = \int_0^{2\pi} e^{i(n-m)x}\, dx
= \frac{1}{i(n-m)} \Bigl( e^{i(n-m)2\pi} - e^{0} \Bigr) = 0 .
\]
Also,
\[
\| e_n \|^2 = (e_n, e_n) = \int_0^{2\pi} e^{inx}\, \overline{e^{inx}}\, dx
= \int_0^{2\pi} \bigl| e^{inx} \bigr|^2\, dx = \int_0^{2\pi} dx = 2\pi ,
\]
and therefore
\[
\| e_n \| = \sqrt{2\pi} \quad \text{for all } n \in \mathbb{Z} .
\]
This inner product space of functions plays an important role in Fourier analysis,
a field with applications in for instance signal analysis.
k a + b k2 =k a k2 + k b k2 .
Proof. To prove the first part, we use the properties of the inner product to expand
k a + b k2 (and we use (a, b) = (b, a) = 0):
6.2.3 Definition. Let W be a linear subspace of an inner product space V . The or-
thogonal complement of W is the set
W ⊥ = {x ∈ V | (x, w) = 0 for all w ∈ W } .
If W = < a1 , . . . , an >, then x ∈ W ⊥ precisely when (ai , x) = 0 for i = 1, . . . , n.
6.2.6 Example. In R3 we determine all vectors which are perpendicular to (1, 2, − 1),
i.e., the orthogonal complement of the subspace l =< (1, 2, − 1) >. A vector
x = (x, y, z) is in this complement if and only if ((1, 2, − 1), x) = 0, or x+2y−z = 0.
So l⊥ is the plane V : x + 2y − z = 0.
Next we determine the orthogonal complement of the plane V . We first deter-
mine a parametric representation of V . Let z = λ and y = µ, then x = λ − 2µ
and
V = < (1, 0, 1), (− 2, 1, 0) > .
V ⊥ consists precisely of all vectors (x, y, z) satisfying ((1, 0, 1), (x, y, z)) = 0 and
((−2, 1, 0), (x, y, z)) = 0, i.e., x + z = 0 and −2x + y = 0. So V ⊥ = < (1, 2, −1) >,
the line we started with.
6.2.11 Example. The set 6.2.9 is an orthonormal set and therefore linearly independent
by Theorem 6.2.10. So the three vectors are a basis. The coordinates of a = (2, 2, 2)
with respect to this basis are
\[
\Bigl( a, \tfrac{1}{\sqrt{2}}(1, -1, 0) \Bigr) = 0, \qquad
\Bigl( a, \tfrac{1}{\sqrt{2}}(1, 1, 0) \Bigr) = 2\sqrt{2}, \qquad
(a, (0, 0, 1)) = 2 .
\]
So:
\[
(2, 2, 2) = 2\sqrt{2}\cdot \tfrac{1}{\sqrt{2}}(1, 1, 0) + 2\, (0, 0, 1) .
\]
\[
(x - (\lambda_1 a_1 + \cdots + \lambda_k a_k), a_1) = 0,
\quad \ldots, \quad
(x - (\lambda_1 a_1 + \cdots + \lambda_k a_k), a_k) = 0 .
\]
Expanding these inner products yields
λ1 = (x, a1 ), . . . , λk = (x, ak ).
\[
\| x - z \|^2 = \| x - P_W(x) \|^2 + \| P_W(x) - z \|^2 ,
\qquad\text{so}\qquad
\| x - P_W(x) \| \leq \| x - z \| ,
\]
and
\[
\| x \|^2 = \| x - P_W(x) \|^2 + \| P_W(x) \|^2 .
\]
6.2.14 Example. The orthogonal projection of the vector (1, 0, 1) ∈ R3 onto the line
l = < (1, 2, 1) > can be computed as follows. First we divide (1, 2, 1) by its length
√6 to get a vector on the line with length 1:
\[
l = \Bigl< \tfrac{1}{\sqrt{6}} (1, 2, 1) \Bigr> .
\]
The orthogonal projection is then
\[
\Bigl( (1, 0, 1), \tfrac{1}{\sqrt{6}}(1, 2, 1) \Bigr)\, \tfrac{1}{\sqrt{6}}(1, 2, 1)
= \tfrac{2}{6}(1, 2, 1) = \bigl( \tfrac13 , \tfrac23 , \tfrac13 \bigr) .
\]
In general, the orthogonal projection of x onto the line < a >, where a has length
1, equals (x, a)a.
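In coordinates this projection is easy to program; the sketch below also allows a spanning vector a that has not been normalised, by dividing by (a, a):

import numpy as np

def project_onto_line(x, a):
    # orthogonal projection of x onto the line < a >
    a = np.asarray(a, dtype=float)
    x = np.asarray(x, dtype=float)
    return (x @ a) / (a @ a) * a

print(project_onto_line([1, 0, 1], [1, 2, 1]))   # [0.333... 0.666... 0.333...], i.e. (1/3)(1, 2, 1)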
where the first equality follows from < e1 , . . . , ei > = < a1 , . . . , ai >, and the
second equality follows from the way we have constructed ei+1 : we have
added a linear combination of e1 , . . . , ei to ai+1 and multiplied by a non-
zero scalar. Neither operation changes the span.
Step 2 is then used to construct e2 from e1 , then e3 from e1 and e2 , etc.
V : x + y − 2z = 0 .
Starting from the spanning vectors a1 = (2, 0, 1) and a2 = (−1, 1, 0) of V , the first
step gives e1 = a1 /‖a1 ‖ = (1/√5)(2, 0, 1), and the second step gives
a2∗ = a2 − (a2 , e1 ) e1 = (−1, 1, 0) + (2/5)(2, 0, 1) = (1/5)(−1, 5, 2), so that
\[
e_2 = \frac{a_2^*}{\| a_2^* \|} = \frac{1}{\sqrt{30}} (-1, 5, 2) .
\]
An orthonormal basis is therefore { (1/√5)(2, 0, 1), (1/√30)(−1, 5, 2) }. Note that if you
start with other vectors spanning V , or with the same vectors but in a different
order, you usually obtain a different orthonormal basis. For example, if you start
with the vectors (−1, 1, 0), (2, 0, 1), i.e., just the order has changed, then you find
the orthonormal basis { (1/√2)(−1, 1, 0), (1/√3)(1, 1, 1) } of V .
6.2.19 Example. We determine an orthonormal basis for the span of the vectors a =
(1, 1, 1, 1), b = (1, −1, 2, 0), c = (5, 0, 1, −4) in R4 .
In the first step we obtain:
\[
e_1 = \frac{a}{\|a\|} = \tfrac12 (1, 1, 1, 1) .
\]
Next, P<e1>(b) = (b, e1 ) e1 = (1/2)(1, 1, 1, 1), so b − P<e1>(b) = (1/2)(1, −3, 3, −1). There-
fore
\[
e_2 = \frac{(1, -3, 3, -1)}{\| (1, -3, 3, -1) \|} = \frac{1}{2\sqrt{5}} (1, -3, 3, -1) .
\]
In the next step, P<e1,e2>(c) = (c, e1 ) e1 + (c, e2 ) e2 = (1/2)(1, 1, 1, 1) + (3/5)(1, −3, 3, −1)
= (1/10)(11, −13, 23, −1). Then c − P<e1,e2>(c) = (13/10)(3, 1, −1, −3) and
\[
e_3 = \frac{(3, 1, -1, -3)}{\| (3, 1, -1, -3) \|} = \frac{1}{2\sqrt{5}} (3, 1, -1, -3) .
\]
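The whole Gram–Schmidt process fits in a few lines; the sketch below (assuming the input vectors are linearly independent, so that no zero vector ever has to be normalised) reproduces e1, e2, e3 from this example:

import numpy as np

def gram_schmidt(vectors):
    basis = []
    for v in vectors:
        w = np.asarray(v, dtype=float)
        for e in basis:
            w = w - (w @ e) * e            # subtract the projection onto < e >
        basis.append(w / np.linalg.norm(w))
    return basis

for e in gram_schmidt([(1, 1, 1, 1), (1, -1, 2, 0), (5, 0, 1, -4)]):
    print(np.round(e, 4))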
This means that if we use coordinates with respect to the orthonormal basis, the
inner product of x and y equals the ‘ordinary’ inner product (in Rn or Cn depending
on whether we work in a real or complex vector space) of their coordinate vectors.
(or: if x ∈ W ⊥ then PW (x) = 0 and x ∈ < ek+1 , . . . , en >; if x ∈ < ek+1 , . . . , en >,
then, by orthonormality, (x, ei ) = 0 for i = 1, . . . , k, hence x ∈ W ⊥ ).
So the orthogonal complement W ⊥ of W has dimension n − k and the set
{ek+1 , . . . , en } is an orthonormal basis of W ⊥ . In conclusion: dim W + dim W ⊥ = dim V .
6.2.25 Example. We determine the orthogonal projection of (1, 2, 1) ∈ R3 onto the plane
W : x + y + z = 0. The orthogonal complement W ⊥ is the line < (1, 1, 1) > and
the projection of (1, 2, 1) onto this line is
\[
\Bigl( (1, 2, 1), \tfrac{1}{\sqrt{3}} (1, 1, 1) \Bigr)\, \tfrac{1}{\sqrt{3}} (1, 1, 1) = \tfrac43 (1, 1, 1) .
\]
The orthogonal projection of (1, 2, 1) onto W is therefore
\[
(1, 2, 1) - \tfrac43 (1, 1, 1) = \tfrac13 (-1, 2, -1) .
\]
A = QR. The fact that the columns of Q form an orthonormal set can be rewritten as
\[
Q^{\top} Q = I_n .
\]
But then
\[
Q^{\top} A = Q^{\top} Q R = I_n R = R .
\]
Another way is via careful bookkeeping when carrying out the Gram-Schmidt
process.
6.3.3 Example. Applying the Gram-Schmidt process to the linearly independent vec-
tors (2, 0, 1), (−1, 1, 0) yields the orthonormal vectors
\[
\tfrac{1}{\sqrt{5}} (2, 0, 1), \qquad \tfrac{1}{\sqrt{30}} (-1, 5, 2) .
\]
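Library routines compute QR-decompositions as well (usually via Householder reflections rather than Gram–Schmidt, so the columns of Q may differ by a sign). As an aside:

import numpy as np

A = np.array([[2.0, -1.0],
              [0.0,  1.0],
              [1.0,  0.0]])        # columns (2, 0, 1) and (-1, 1, 0)
Q, R = np.linalg.qr(A)
print(Q)                            # columns are +/- (1/sqrt(5))(2, 0, 1) and +/- (1/sqrt(30))(-1, 5, 2)
print(R)                            # upper triangular
print(np.allclose(Q @ R, A))        # True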
6.4 Notes
The Cauchy–Schwarz inequality is named after A.–L. Cauchy (1789–1857) and
H.A. Schwarz (1843–1921). Cauchy described the inequality in terms of sequences
of numbers, whereas Schwarz worked in function spaces. The inequality is some-
times also named after V.Y. Bunyakovsky (1804–1889), who came up with it in-
dependently, also in the setting of function spaces.
Inner product spaces of functions will be further discussed in more advanced
analysis courses. Applications of such inner product spaces can be found in, for
instance, signal analysis and in quantum mechanics.
A variation of the inner product, in which non-zero vectors need not have a
positive length, occurs in relativity theory. Implicitly, a bit of this can be seen in
the classification of quadratic forms in Linear Algebra 2.
The Pythagorean theorem has its roots, of course, in geometry, but has sur-
prising and useful consequences in cleverly chosen function spaces. An example is
the inequality
\[
\sum_{n=1}^{\infty} \frac{1}{n^2} \leq \frac{\pi^2}{6} ,
\]
which can be derived using an inner product space as in example 6.1.7. (By the
way, the inequality turns out to be an equality, a famous result due to Euler.)
The Gram-Schmidt process is named after the Dane J.P. Gram (1850–1916)
and the German E. Schmidt (1876–1959) and was introduced in the setting of
function spaces.
Orthogonal projections can be used to derive the method of least squares. This
method is an essential tool in, e.g., dealing with measurements. The notions length,
angle, orthogonality will reoccur in Linear Algebra 2 in the study of orthogonal
maps, like reflections and rotations.
The data of an inner product can be neatly stored in a so-called Gram–matrix.
If {a1 , . . . , an } is a basis of the inner product space V , then the entry in position
i, j of this n × n–matrix is the inner product (ai , aj ).
6.5 Exercises
§1
b. (a, b) = a1 b1 − a2 b2 ,

c. (a, b) = \begin{pmatrix} a_1 & a_2 \end{pmatrix} \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} ,

d. (a, b) = \begin{pmatrix} a_1 & a_2 \end{pmatrix} \begin{pmatrix} 4 & 2 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} .
2 Use the Cauchy-Schwarz inequality to prove that for all real numbers a, b, c the
following inequality holds:
\[
|a + 2b + 2c| \leq 3 \sqrt{a^2 + b^2 + c^2} .
\]
b. (1, 2, 3) and (−2, 3, 1) in R3 ;
c. (2, 3, 0, 2, 1) and (0, 2, 2, 2, 2) in R5 ;
5 In the real inner product space V the (mutually distinct) vectors a, b, c, d satisfy
a − b ⊥ c − d and a − c ⊥ b − d. Prove that a − d ⊥ b − c.
7 In the real inner product space V the vectors a, b satisfy kak = kbk. Prove that
a + b and a − b are orthogonal. Give a geometrical interpretation.
8 Determine all vectors u in the inner product space V such that (u, x) = 0 for all
x ∈ V . [Hint: take x = u.]
§2
a = (4, 7, −6, 1), b = (2, 6, 2, −2), c = (−2, −1, 8, −3), d = (−2, −6, −2, 2).
i) Determine a basis of W ⊥ .
ii) Determine v ∈ W and w ∈ W ⊥ such that v + w = (2, 2, 3, 2).
b. Let W =< (1, 1, −1, 0), (2, 1, −1, −1), (1, −2, 0, 3) > be a subspace of R4 .
i) Determine a basis of W ⊥ .
ii) Determine v ∈ W and w ∈ W ⊥ such that v + w = (3, 6, −1, 3).
12 In R4 , let a = (1, 2, −2, 0), b = (2, 1, 0, 4), c = (5, 7, 3, 2). Determine the orthogonal
projection of c on < a, b >⊥ .
a. Give a basis of U ⊥ .
c. Let
W =< (1, 0, 0, −1), (1, 1, −1, 0), (0, 1, 0, −2) >
15 Let W =< (1, 2, 2, 4), (3, 2, 2, 1), (1, −2, −8, −4) > be a subspace of R4 .
b. Extend this orthonormal basis to one of R4 . What are the coordinate vectors
of (1, 1, 1, 1) and (4, 4, 4, 5), respectively, with respect to this basis?
17 Let
A1 =< (1, 1, 1, 2, 1) >⊥ and A2 =< (2, 2, 3, 6, 2) >⊥
§3
18 Determine the QR-decomposition of the matrix with columns (1, 1, 1, 1), (3, 3, −1, −1),
(7, 9, 3, 5) (see exercise 16).
Prerequisites
A.1 Sets
Sets consist of elements. The way to denote that a is an element of the set A (or
belongs to A) is as follows:
a ∈ A.
By definition, two sets A and B are the same, A = B, if they contain exactly the
same elements. We usually describe sets in one of the following ways:
• Enumeration of a set’s elements between curly brackets. For example:
{1, 2, 3, 5}, {1, 2, 3, . . .}, {1, 2, 3, 5, 3}, {2, 3, √(x² − 1)}.
The dots in the second example indicate that the reader is expected to
recognize the pattern and knows that 4, 5, etc., also belong to the set. The
first and third sets are equal: the order in which the elements are listed and
repetitions of elements are unimportant.
In mathematics (a, 2, π) denotes an ordered list in which the order of the
elements and repetitions do matter. So (1, 2, 3), (1, 2, 2, 3), (1, 3, 2, 2) are
all distinct lists. We use such lists in this course mainly in the setting of
coordinates.
• Description of a set using defining properties. Examples:
{x | x is an even integer}, {y | y is real and y < 0}.
We also write
{x ∈ Z | x even}, {y ∈ R | y < 0},
so that it is immediately clear in which set we are working.
To prove that two sets A and B are equal an often used strategy is to prove the
following statements separately: A ⊂ B and B ⊂ A.
A.2 Maps
If A and B are sets, then a map f from A to B is a rule that assigns to every element
a of A an element f (a) of B, called the image of a under f . Notation: f : A → B.
The set A is called the domain of the map, the set B is called the codomain. If
B is a set of numbers, the term function is often used instead of the term map.
In the setting of vector spaces the term transformation is often used. Two maps
are the same if they have the same domain, codomain, and if they assign to every
element of the domain the same image. The set of all images f (a) is called the
range of the map f . In set notation: {f (a) | a ∈ A} or {b ∈ B | ∃a ∈ A[b = f (a)]}.
Answers to most of the exercises

Chapter 1: Complex numbers
1. a) 5 + i    b) 1    c) 4/25 + 3/25 i    d) 5 − 13/5 i    e) −3i    f) 1 − √2/2
2. a) r = 3, ϕ = π    b) r = √2, ϕ = π/2    c) r = 2, ϕ = π/4    d) r = 2, ϕ = π/6
   e) r = 13, ϕ = arctan(12/5) = 2 arctan(2/3)    f) r = 4√2, ϕ = −π/4
7. a) 2i    b) −3/2 + (3/2)i√3    c) 1 + i    d) 1/2 − (1/2)i√3    e) e³(1/2 − (1/2)i√3)    f) −(1/2)√3 − (1/2)i
8. a) z = ½ ln 2 + (π/4 + 2kπ)i, k ∈ Z
   b) z = ln 2 + (π/3 + 2kπ)i, k ∈ Z
   c) Re(z) = ln 5, Im(z) arbitrary
   d) z = 0
   e) z = ±½√((4k + 1)π) (1 + i), k = 0, 1, · · ·  and  z = ±½√((4k − 1)π) (1 − i), k = 1, 2, · · ·
   f) z = π/4 + kπ, k ∈ Z
10. a) z = π/2 + kπ, k ∈ Z
    b) z = x + iy with x = π/4 + kπ, y = −½ ln(4 ± √15), k ∈ Z
11. a) z1 = 1, z2 = ½(1 + i√3), z3 = ½(−1 + i√3), z4 = −1, z5 = ½(−1 − i√3), z6 = ½(1 − i√3)
    b) z1 = 2, z2,3 = −(1 ± i√3)
    c) z1 = 2(cos π/8 + i sin π/8), z2 = 2(− sin π/8 + i cos π/8), z3 = 2(− cos π/8 − i sin π/8), z4 = 2(sin π/8 − i cos π/8)
    d) z1 = ½√2 + (½√2 − 1)i, z2 = −½√2 + (½√2 − 1)i, z3 = ½√2 − (½√2 + 1)i, z4 = −½√2 − (½√2 + 1)i
    e) zk = i − 2 + 3(cos(π/12 + kπ/3) + i sin(π/12 + kπ/3)), k = 0, 1, · · · , 5
    f) z1 = 0; z2 = 1; z3,4 = −½(1 ± √3 i)
    g) z = 0 and z = e^{iϕ} with ϕ = (¼ + ½k)π, k ∈ Z (i.e. ±√2/2 ± i√2/2)
12. a) z = −½ ± ½i√3
    b) z1 = −2i and z2 = 4i
    c) z1,2 = 2 + i
    d) z1,2 = ±½√6 (1 − i); z3,4 = ±(1 + i)
13. a) z2 = 2i; z3 = −2
b) z2 = 1 − i; z3,4 = −3 ± 2i
c) z 3 − 7z 2 + 15z − 25
d) z 4 − 4z 3 + 14z 2 − 4z + 13
20. a. Without loss of generality assume the vertices are 0, z and ρz with |ρ| = 1.
    Then |z| = |z − ρz| = |1 − ρ| · |z| so that |1 − ρ| = 1. From |ρ| = |1 − ρ| = 1
    you obtain via ρ = a + bi that ρ = ½ ± (√3/2) i.
21. a. From (z − w)/(z − v) = t with t real (and ≠ 1) you get z − w = t(z − v) so that
    (1 − t)z = w − tv. Then z = (1/(1 − t)) w − (t/(1 − t)) v, so z = uw + (1 − u)v =
    v + u(w − v) with u = 1/(1 − t) real.
23. a. r e^{−it} · v
31. Put vertex A of square ABCD in the origin, then the vertices of the square
can be described as follows: 0, w, (1 + i)w, iw (check!). Now A′ B ′ C ′ D′ is
also of this form apart from a translation, so: u, z + u, z(1 + i) + u, zi + u.
The midpoints (times 2 for computational convenience) are u, w + z + u,
(1 + i)(w + z) + u, i(w + z) + u. Apart from a translation over u we get 0,
w + z, (1 + i)(w + z), i(w + z).
3. b) All
c) The vector is on the line: take λ = 3.
4. b) all
5. a) 0≤λ≤1
b) λ = 1/2
c) λ = 2/3
8. a) x + 2y = 7
b) x+y =4
c) x=3
10. a) 2x + 2y − z = 3
b) x−y+z =1
c) x − 2y − 2z = 0
14. a) 3
b) 5
c) π/2 radians
d) a=1
15. a) x1 − x2 = 1; √2
    b) 4x1 + 3x2 = 10; 5
16. a) 6
b) 2
25. b) 3
c) x = (2, 0, 4) + σ(1, −2, 2)
i j m m j m
1.. 1.. 1..
. . .
a) i 0.. 1 b) i λ. . c) i λ. . 1
. .. ..
j 1 0.. .. . ..
. .
m 1 m 1 m 1
9. a) x = λ(17, −13, 4, 3)
b) x = λ(3, 1, −5, 0) + µ(0, 1, 0, 1)
c) x = (3, 1, 0, 0) + λ(1, 2, 1, 0) + µ(7, 5, 0, −1)
11. a) z = (−1 + i, −1 − i, 1)
b) z = µ(2, −1 − 3i) and z = µ(2, −1 + 3i)
2 1 2 1
c) z 1 = µ(1, 1, −1); z 2 = µ(1, e 3 πi , e 3 πi ); z 3 = µ(1, e− 3 πi e− 3 πi );
12. z = (1/(2a + 1)) (2, 3, 1) = −(1/3) i√3 (2, 3, 1)
13. λ = 1: inconsistent; λ ≠ 1: (1/(1 − λ), 0, 3 − 1/(1 − λ)) + µ(1, 1, −λ − 1)
14. For λ 6= 0: (x1 , x2 , x3 ) = (λ, 2, −2); for λ = 0: x1 = 0, x3 = −2, x2 = µ
(arbitrary)
15. λ ≠ −1: inconsistent system; for λ = −1: (1, −1, 0, 0) + α(1, −2, 1, 0) +
    β(−8, 7, 0, 1)
3. Yes, yes
6. a) x−y =3
b) 3x − y − 3z = −1
c) x−y+z =3
8. a) 2x − 2y + u = 2; y = z
b) y + z = 4; −x − y + u = 5
9. x = (1, 3, 2)
20. a) a 6= ±2
b) a=3
24. a) dim = 5
b) dim = 9
c) dim = 6
d) dim = 3
e) dim = 4
f) dim = 2
25. a) dim = 2
b) dim = 3
26. a) (−1, 3)
b) (2, −1, 2)
c) (1/2, −1/2, 1/2)
d) (1/4, −2)
30. b) c 6= −2
31. a) 2; b) < 2 + e^{3x} >
32. a) λ(0, −1, 1, 0) + µ(1, −1, 0, 1); b) (1, −1, 0, 1) + µ(0, −1, 1, 0)
5. b) rank(A) = rank(A|B)
9. 0; 2
10. |A| = 2 for n = 1, |A| = −1 for n = 2, |A| = 0 for n ≥ 3
11. yes
12. a) det(A) ≠ 0
    b) det(A) = 0 and (2, −1, 5) ∈ column space
    c) det(A) = 0 and (i, 1, 0) ∉ column space
13. a) y = 1; b) x = −6
15. b) det(A) = ±1
c) det(A) = 0 for n odd
16. a) \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}
b) 0
c) I
d) det(A) = 0 or det(A) = 1
18. For a ≠ 0 and a ≠ 3:
\[
\frac{1}{a^2 - 3a}
\begin{pmatrix}
-3a + 1 & -a + 3 & a^2 - 1 \\
-1 & a - 3 & 1 \\
a & 0 & -a
\end{pmatrix}
\]
19. a = 0 and a = 1
20. a) for λ ≠ 1, −1:
\[
\frac{1}{1 - \lambda^2}
\begin{pmatrix}
1 - 2\lambda^2 & \lambda & \lambda^3 - \lambda \\
\lambda & -1 & 1 - \lambda^2 \\
2\lambda & -2 & 1 - \lambda^2
\end{pmatrix}
\]
b) λ = −1
Chapter 6: Inner product spaces
1. a) yes
b) no
c) yes
d) no
3. a) √11
   b) √29

4. a) (2/3)π
   b) (1/3)π
   c) (1/4)π
   d) (1/3)π
8. u=0
9. a) < (2, 0, −1) >
b) < (2, 0, −1) >
c) < (0, 0, 0) >
d) < (2, −1, −5) >
11. a) i) < (1, −1, −1, 0), (0, −1, −2, 1) >
ii) v = (2, 1, 1, 3), w = (0, 1, 2, −1)
b) i) < (1, 2, 3, 1) >
ii) v = (2, 4, −4, 2), w = (1, 2, 3, 1)
13. a) < (1, 0, 0, 0, −1), (0, 1, 0, −1, 0), (0, 0, 1, 0, −1) >
b) (2, 1, 2, 1, 2) and (1, 1, 0, −1, −1)
14. a) V⊥ = < (1/3)√3 (1, 1, 1) >;  V = < (1/2)√2 (1, 0, −1), (1/6)√6 (1, −2, 1) >
    b) l⊥ = < (1/2)√2 (1, 1, 0), (0, 0, 1) >;  l = < (1/2)√2 (1, −1, 0) >
    c) W = < (1/2)√2 (1, 0, 0, −1), (1/10)√10 (1, 2, −2, 1), (1/3)√3 (1, −1, 0, 1) >,
       W⊥ = < (1/15)√15 (1, 2, 3, 1) >
18.
\[
\begin{pmatrix} 1 & 3 & 7 \\ 1 & 3 & 9 \\ 1 & -1 & 3 \\ 1 & -1 & 5 \end{pmatrix}
= \frac12
\begin{pmatrix} 1 & 1 & -1 \\ 1 & 1 & 1 \\ 1 & -1 & -1 \\ 1 & -1 & 1 \end{pmatrix}
\begin{pmatrix} 2 & 2 & 12 \\ 0 & 4 & 4 \\ 0 & 0 & 2 \end{pmatrix}
\]
19. a) (1/2)(1, 1, 1, 1), (1/√20)(1, −1, 3, −3); b) (2, 1, 3, 0); c) 2√5
21. b) (3/2)a; c) a + b
Bibliography
[1] Carl B. Boyer. A history of mathematics. John Wiley & Sons, Inc., New York,
(1989, 2nd edition with Uta C. Merzbach)
[2] Bruce Cooperstein. Elementary Linear Algebra: Methods, Procedures and Al-
gorithms.
Can be obtained through ‘lulu’. As the title suggests, this book con-
tains descriptions of computational techniques (and not so much the the-
ory of linear algebra) and examples of the use of these techniques. See
http://linearalgebramethods.com for more information.
[3] Jan van de Craats. Vectoren en matrices. Epsilon Uitgaven, Utrecht (2000).
This book overlaps quite a bit with the course material. But of course, the
author uses his own approach.
[4] David C. Lay. Linear algebra and its applications. Pearson/Addison Wesley,
Boston, etc. (2006, 3rd ed. update)
A pleasant book to read. Contains many exercises.
[5] Liang-shin Hahn. Complex numbers & Geometry. The Mathematical Associa-
tion of America (1994)
This fine little book discusses the role of complex numbers in plane geometry.
[7] Murray R. Spiegel. Theory and problems of complex variables. Schaum’s Out-
line Series, McGraw-Hill (1974)
[8] Hans Sterk. Wat? Nog meer getallen. Syllabus complexe getallen ten behoeve
van Wiskunde D. See http://www.win.tue.nl/wiskunded
[9] Gilbert Strang. Linear algebra and its applications. Harcourt Brace etc., San
Diego (1988, 3rd edition)
Clearly written text on linear algebra in which matrices are central. In contrast
to Halmos’s book, no emphasis on abstract theory.
Index
altitude, 64 decomposition
angle, 165 QR-decomposition, 178
dependent on, 110
basis, 47, 117 determinant, 139, 142
expansion across a column, 147
centroid, 27, 62
expansion across a row, 147
circle
expansion across the first column,
parametric equation, 24
147
circumcircle, 27
determinant function, 141
coordinates, 121
dimension, 116, 117
column space, 133
distance, 51, 162
completing the square, 20
division
complex cosine, 13
long, 18
complex exponential, 11
complex number, 1 elementary row operations, 82
absolute value, 3
addition, 2 Fundamental theorem of algebra, 19
argument, 3
Gaussian elimination, 85
principal value, 3
Gram-Schmidt process, 173
complex conjugate , 7
divide by, 7 imaginary axis, 1
imaginary part, 4 inner product, 51, 160, 161
multiplication, 2 inner product space, 160, 161
real part, 4
complex polynomial, 16 length, 51, 162
coefficients, 16 line, 106
degree, 16 direction vector, 45, 106
complex sine, 13 parametric equation, 106
coordinate vector, 121 parametric representation, 45, 106
coordinates, 47 position vector, 106
Cramer’s rule, 151 supporting vector, 45
cross product, 58 vector representation, 45
right hand rule, 60 linear combination, 43, 110
orthocenter, 27 triangle
orthogonal complement, 168 altitude, 64
orthonormal basis, 47, 54, 169 centroid, 62
orthonormal set, 169 median, 62
orthonormal set of vectors, 169 triangle inequality, 5, 164
coordinate vector, 48
coordinates, 47
opposite, 41
scalar multiple, 40
scalar product, 40
zero vector, 40
vector parametric equation, 107
vector representation, 45
vector space, 100
axioms, 100
basis, 117
complex, 100
dimension, 117
normed, 165
real, 100
standard basis, 118
vector addition, 41
vectors
linearly dependent, 114
linearly independent, 114
vectors space
dimension, 116
zero vector, 40