
Lecture Notes in Linear Algebra 1 (80134)

Raz Kupferman
Institute of Mathematics
The Hebrew University

February 13, 2021


Contents

1 Fields
   1.1 Motivation
   1.2 Definition of a field
   1.3 Examples
   1.4 Solvability of linear equations
   1.5 Equality as an equivalence relation
   1.6 Extended associativity and commutativity

2 Linear Systems of Equations
   2.1 One equation in multiple unknowns
   2.2 Systems of equations
   2.3 Equivalent systems of equations
   2.4 Matrix notation
        2.4.1 Definitions
        2.4.2 Elementary row-operations and row-equivalence
        2.4.3 Row-reduced echelon matrices
        2.4.4 The Gauss-Jordan algorithm
   2.5 Operations with matrices
        2.5.1 Addition of matrices
        2.5.2 Multiplication by a scalar
        2.5.3 Products of matrices
        2.5.4 Algebraic properties of matrix multiplication
        2.5.5 Matrix multiplication and block patterns
        2.5.6 Invertible matrices
        2.5.7 Elementary matrices
        2.5.8 Elementary matrices and invertibility
   2.6 The structure of the set of solutions
        2.6.1 The homogeneous case
        2.6.2 The inhomogeneous case
   2.7 The geometry of solutions
        2.7.1 Affine spaces
        2.7.2 The affine space An (F)
        2.7.3 Lines in affine spaces
        2.7.4 Planes in affine spaces

3 Vector Spaces
   3.1 Definitions and examples
   3.2 Basic properties
   3.3 Subspaces
        3.3.1 Definitions and examples
        3.3.2 The subspace generated by a set
        3.3.3 The linear span of a set of vectors
        3.3.4 The row space of a matrix
        3.3.5 The column space of a matrix
        3.3.6 The sum of linear subspaces
   3.4 Bases and dimensions
        3.4.1 Linear dependence
        3.4.2 Bases
        3.4.3 The dimension of a vector space
        3.4.4 The rank of a matrix
   3.5 Coordinates
        3.5.1 Motivation
        3.5.2 Ordered bases and coordinates
        3.5.3 Transitions between bases

4 Linear Forms
   4.1 Definition and examples
   4.2 Properties of linear forms
   4.3 The dual space
   4.4 Dual bases
   4.5 Null space and annihilator
        4.5.1 The annihilator of a set of vectors
        4.5.2 The null space of a set of linear forms
        4.5.3 Linear systems and linear forms

5 Linear Transformations
   5.1 Definition and examples
   5.2 Properties of linear transformations
   5.3 The space HomF (V, W )
   5.4 Projections and reflections
   5.5 Kernel and image
   5.6 Linear transformations and subspaces
   5.7 Nullity and rank
   5.8 Composition of linear transformations
   5.9 Rotations of the plane
   5.10 The dimension of HomF (V, W )
   5.11 Isomorphisms
   5.12 Matrix representation
   5.13 Algebra of transformations and matrix algebra
   5.14 Change of basis

6 Volume Forms and Determinants
   6.1 Motivation
   6.2 Volume forms
   6.3 Volume forms and elementary matrices
   6.4 Multilinearity and alternation
   6.5 Determinants
   6.6 Calculating determinants
   6.7 Determinants and transposition
   6.8 Cramer’s formula
   6.9 The determinant of a linear transformation
Chapter 1

Fields

1.1 Motivation
Recall the time when you learned to solve a linear equation in one unknown,
say,
3X + 6 = 18. (1.1)
An equation “asks a question”: which number x yields an equality between
both sides of the equation when substituted for the unknown X. What did
you do? As a first step, you determined that if “something” plus 6 equals
18, then that “something” had to equal 12, namely, that every solution
of (1.1) is also a solution of the equation

3X = 12.

Stated differently, you used the fact that since the two sides of an equation are
by definition equal, then the equation will remain true if you subtract 6 from
both sides. As a second step, you determined that if 3 times “something”
equals 12, then, by solving an unknown-factor problem, that “something” has
to equal 4, finally leading to the solution

x = 4.

In fact, 4 is the unique solution to (1.1).


Note that there are quite a few underlying assumptions in this way of solving
an equation. First, there is a notion of the two sides of an equation being

in some sense “the same”. Second, this notion of equality justifies the fact
that if the same operation is applied on both sides, then the results of this
operation preserve the sameness of the two sides. Third, we assume the
existence of the operations of addition and multiplication, and their inverses,
subtraction and division. We used the fact that if “unknown + number = number”
then “unknown” can be determined uniquely, and similarly for “unknown ×
number = number” (unless the first number is zero).
But even before that, what are numbers? In the above example, we make do
with the natural numbers,

N = {1, 2, 3, . . . },

with which we are acquainted since early childhood.


Does every linear equation with coefficients in N have a solution in N? No.
Consider the equation
X + 6 = 6.
It does not have solutions in N. If we want this equation to be solvable, we
must add to the natural numbers a new element, which we call zero, forming
now a set of numbers N ∪ {0}.
Does every linear equation with coefficients in N ∪ {0} have a solution in
N ∪ {0}? No. Consider the equation

X + 8 = 6.

It does not have solutions in N ∪ {0}. If we want this equation to be solvable,


we must introduce the negative integers, which together with the natural
numbers and zero form the set of integers (המספרים השלמים),

Z = {⋯, −2, −1, 0, 1, 2, . . . }.

(The letter “Z” stands for the German word Zahl, which means number.)
Does every equation with coefficients in Z have a solution in Z? No. The
equations
4X = 3 and 4X = (−3)
do not have solutions in Z. Requiring these equations to be solvable requires
the introduction of the rational numbers (המספרים הרציונליים), which are
denoted by Q (for quotients).

The set of rational numbers gives us already the ability to solve any equation
of the form
aX + c = b,
where a, b, c ∈ Q as long as a ≠ 0 (we will discuss the case of a = 0 later). Thus,
the rational numbers are “complete” in the sense that any linear equation
with coefficients in that set has a solution within that set.
The rational numbers are however not “complete” in other respects. More
than two millennia ago, it was discovered that the quadratic equation X 2 = 2
does not have a solution within the set of rational numbers, leading eventually
to the definition of the set of real numbers (המספרים הממשיים), which we
denote by R. The set of real numbers extends the set of rational numbers in
a sense described in your Calculus class. And yet, even with this extension,
there still exist “simple” equations that are not solvable, such as

X 2 = (−1).

This observation has eventually led to the further extension of the set of
real numbers into the set of complex numbers (המספרים המרוכבים), which
we denote by C. The complex numbers are defined by introducing a new
“number” ı, satisfying ı2 = (−1), and then considering all combinations a + b ı,
with a, b ∈ R.
It should be noted that in the context of linear equations, denoting either Q,
R or C by the generic notation F, every equation of the form

aX + c = b,

where a, b, c ∈ F has a unique solution in F, provided that a ≠ 0.

1.2 Definition of a field


This course starts with the problem of solving systems of linear equations; as
we progress to higher levels of mathematics, we tend to abstract out concepts
that were formerly used without a formal definition. At the end of the day,
we want to do mathematics in a way that is independent of meaning. Thus,
we ask ourselves what is it that we want “numbers” to satisfy in order to be
able to solve linear equations featuring those numbers as coefficients. The
answer is partly given above: whatever those numbers are, we want to be able

to perform on them all the operations we would do with “school numbers”


to solve linear systems of equations.
This brings us to defining an algebraic structure called a field (שדה):
A field F is a set containing at least two different elements, which we call
zero and one, and denote by 0F and 1F . The set is endowed with
two binary operations (פעולות דו־מקומיות), which we call addition (חיבור)
and multiplication (כפל).
A binary operation on a set can be viewed as a “machine” taking for input
two elements in the set (in a prescribed order!), and returning for output
an element in that set, such that the output is uniquely determined by the
input. In the case where a ∈ F and b ∈ F are inputs for the addition operation,
we denote the output by a + b. The statement that the output be determined
by the input can be formalized into stating that to every a, b ∈ F there
corresponds a unique c ∈ F, such that c = a + b.¹ Likewise, if a ∈ F and b ∈ F
are inputs for the multiplication operation, we denote the output by a ⋅ b; to
every a, b ∈ F there corresponds a unique c ∈ F, such that c = a ⋅ b.
For F to be a field, more structure has to be incorporated: addition and
multiplication have to satisfy nine properties, called the axioms of field
(אקסיומות השדה). Before stating the axioms, we should note that both addi-
tion and multiplication only act, by definition, on pairs of elements. Thus,
there is no meaning at this point to adding or multiplying three or more el-
ements. A binary operation can be extended to an operation on any (finite)
number of elements in a recursive way. Let a, b, c ∈ F. Their sum can be de-
fined by taking the sum of a + b and adding it to c. This repeated application
of the binary operation of addition is denoted using parentheses,
(a + b) + c.
This is however not the only alternative: we could have also added a to the
sum of b and c, the result of this compound action being denoted by
a + (b + c).

With that, we spell out the first four axioms, which are pertinent to addition:
¹ Throughout this text we will use the standard notation of set theory: if A is a set,
then a ∈ A means that a is an element in A, or that a belongs to A. For two sets A and
B, the relation A ⊆ B means that A is a subset of B, implying that every element in A
is also an element in B; note that this relation holds also if A = B. In fact A = B means
that both A ⊆ B and B ⊆ A.

1. Addition is associative (קיבוצי): for all a, b, c ∈ F,

(a + b) + c = a + (b + c). (A1)

2. Addition is commutative (חילופי): for all a, b ∈ F,

a + b = b + a. (A2)

3. Zero is neutral to addition: for all a ∈ F,

a + 0F = a. (A3)

4. Every element a ∈ F has an additive inverse (איבר נגדי), which we


denote by (−a) ∈ F, satisfying

a + (−a) = 0F . (A4)

The next four axioms are analogous (with one big difference!) and pertinent
to multiplication:

5. Multiplication is associative: for all a, b, c ∈ F,

(a ⋅ b) ⋅ c = a ⋅ (b ⋅ c). (M1)

6. Multiplication is commutative: for all a, b ∈ F,

a ⋅ b = b ⋅ a. (M2)

7. One is neutral to multiplication: for all a ∈ F,

a ⋅ 1F = a. (M3)

8. Every non-zero (!!!) element 0F ≠ a ∈ F has a multiplicative inverse
(איבר הפכי), which we denote by a−1 ∈ F, satisfying

a ⋅ a−1 = 1F . (M4)

Finally, the ninth axiom links between addition and multiplication:



9. Multiplication is distributive (פילוגי) over addition: for all a, b, c ∈ F,

a ⋅ (b + c) = a ⋅ b + a ⋅ c. (D)

Comments:

(a) Elements of a field are called scalars (סקלרים) (rather than numbers).
(b) When no ambiguity occurs, we may denote the product of two elements
by ab rather than by a ⋅ b.
(c) We denoted the elements zero and one by 0F and 1F to emphasize that
they may differ from the numbers zero and one. Nevertheless, when no
confusion arises, we may revert to the more standard notation 0 and 1.
(d) A priori, a scalar may be its own additive and/or multiplicative inverse.
In fact, 0F is always its own additive inverse and 1F is always its own
multiplicative inverse. We will shortly see an example in which 1F is
also its own additive inverse.
(e) Subtraction (חיסור) is defined as the addition of the additive inverse,

a − b = a + (−b),

whereas division (חילוק) (by a nonzero divisor) is defined as the
multiplication by the multiplicative inverse,

a ÷ b = ab−1 .
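
For readers who like to experiment, these definitions can be modeled directly in code. The following Python sketch (ours, not part of the original notes) implements arithmetic in the field of integers modulo a prime p; the class name Fp and the default prime 7 are our own choices. Subtraction and division are defined exactly as in comment (e), through the additive and multiplicative inverses.

```python
# A sketch of a finite field: integers modulo a prime p (assumption: p prime,
# otherwise axiom (M4) fails).  Illustrates the field operations and comment (e).
class Fp:
    def __init__(self, value, p=7):
        self.p = p
        self.value = value % p

    def __add__(self, other):
        return Fp(self.value + other.value, self.p)

    def __mul__(self, other):
        return Fp(self.value * other.value, self.p)

    def neg(self):                 # additive inverse (-a), axiom (A4)
        return Fp(-self.value, self.p)

    def inv(self):                 # multiplicative inverse a^(-1), axiom (M4)
        assert self.value != 0, "0_F has no multiplicative inverse"
        return Fp(pow(self.value, self.p - 2, self.p), self.p)  # Fermat's little theorem

    def __sub__(self, other):      # a - b = a + (-b)
        return self + other.neg()

    def __truediv__(self, other):  # a / b = a * b^(-1)
        return self * other.inv()

    def __repr__(self):
        return f"{self.value} (mod {self.p})"

a, b = Fp(3), Fp(5)
print(a + b, a * b, a - b, a / b)  # 1 (mod 7) 1 (mod 7) 5 (mod 7) 2 (mod 7)
```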

Exercises

(easy) 1.1 S is a set. S claims to be a field. List all the properties you
should check in order to verify whether S’s claim is correct.

(easy) 1.2 Draw an “addition machine”, which is a box having two input
ports (labeled Input 1 and Input 2) and one output port. Combine two such
machines to generate the output (a + b) + c. Combine two such machines to
generate the output a + (b + c).

(easy) 1.3 Let F be a field. Prove that for every a ∈ F,

0F − a = (−a).

Likewise, prove that for every F ∋ a ≠ 0F ,

1F ÷ a = a−1 .

Solution 1.3: By the definitions of subtraction and division, and by the neutrality
properties of 0F and 1F ,

0F − a = 0F + (−a) = (−a) and 1F ÷ a = 1F ⋅ a−1 = a−1 .

1.3 Examples
Example: We are already acquainted with three fields, Q, R and C. Since

Q ⊂ R ⊂ C,

this may give the impression that all the fields in the world form a hierarchy
of inclusions. This is not the case, as the next example shows. ▲▲▲

Example: A field is fully determined by its elements, and its tables of


addition and multiplication. The smallest possible field is one consisting of
just two elements, zero and one, along with the addition and multiplication
tables:

+ | 0 1        ⋅ | 0 1
0 | 0 1        0 | 0 0
1 | 1 0        1 | 0 1

It takes some explicit verification to check that this is indeed a field (do you
recognize it?). This field is commonly denoted by F2 . That addition and
multiplication are commutative is apparent by the symmetry of the tables.
The neutrality of zero and one is also apparent. For associativity and dis-
tributivity we actually have to examine all the cases. Finally, 0 is its own
additive inverse and 1 is both its own additive and multiplicative inverses.
▲▲▲

Exercises

(intermediate) 1.4 Consider a set consisting of three elements {0, 1, 2}


along with two binary operations defined by

+ | 0 1 2        ⋅ | 0 1 2
0 | 0 1 2        0 | 0 0 0
1 | 1 2 0        1 | 0 1 2
2 | 2 0 1        2 | 0 2 1

How many verifications need to be done to determine whether it is a field


(without taking shortcuts)? Verify that this is indeed a field (and you may
take shortcuts). This field is commonly denoted by F3 .

(harder) 1.5 Construct a field having four elements. Hint: construct first
the multiplication table. Then, construct addition tables and show that only
one of them is consistent with all axioms.

Solution 1.5: The solution is

+ | 0 1 a b        ⋅ | 0 1 a b
0 | 0 1 a b        0 | 0 0 0 0
1 | 1 0 b a        1 | 0 1 a b
a | a b 0 1        a | 0 a b 1
b | b a 1 0        b | 0 b 1 a
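
Checking all nine axioms for these tables by hand is tedious but entirely mechanical, so it is natural to let a machine do it. Below is a brute-force sketch (ours, not part of the exercise) that encodes the two tables above as index arrays and verifies every axiom exhaustively; the three-scalar axioms (A1), (M1) and (D) alone range over 4³ = 64 triples each.

```python
from itertools import product

# Tables of the four-element field above, with 0, 1, a, b encoded as 0, 1, 2, 3.
ADD = [[0, 1, 2, 3], [1, 0, 3, 2], [2, 3, 0, 1], [3, 2, 1, 0]]
MUL = [[0, 0, 0, 0], [0, 1, 2, 3], [0, 2, 3, 1], [0, 3, 1, 2]]
E = range(4)

# (A2), (M2): commutativity -- visible as the symmetry of the tables.
assert all(ADD[x][y] == ADD[y][x] and MUL[x][y] == MUL[y][x] for x, y in product(E, E))
# (A3), (M3): neutral elements 0 and 1.
assert all(ADD[x][0] == x and MUL[x][1] == x for x in E)
# (A4): every element has an additive inverse.
assert all(any(ADD[x][y] == 0 for y in E) for x in E)
# (M4): every non-zero element has a multiplicative inverse.
assert all(any(MUL[x][y] == 1 for y in E) for x in E if x != 0)
# (A1), (M1), (D): all 4^3 triples, for each of the three axioms.
assert all(ADD[ADD[x][y]][z] == ADD[x][ADD[y][z]]
           and MUL[MUL[x][y]][z] == MUL[x][MUL[y][z]]
           and MUL[x][ADD[y][z]] == ADD[MUL[x][y]][MUL[x][z]]
           for x, y, z in product(E, E, E))
print("all nine field axioms hold")
```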

(easy) 1.6 Consider the following set

S = {(1, a) ∶ a ∈ R} ,

along with two binary operations,

(1, a) ⊕ (1, b) = (1, a + b) and (1, a) ⊙ (1, b) = (1, ab) ,

where the addition and the multiplication on the right-hand sides are the
standard addition and multiplication in R.

(a) Does S have an element neutral to ⊕?



(b) Does S have an element neutral to ⊙?


(c) Is S with ⊕ and ⊙ a field?

Solution 1.6: Intuitive answer: just ignore the 1. Indeed, S is closed with respect to
both operations; (1, 0) is neutral to addition, (1, 1) is neutral to multiplication, etc. So
yes, S is a field.

(intermediate) 1.7 Consider the following set

T = {(a, b) ∶ a, b ∈ R} ,

along with two binary operations,


(a, b) ⊕ (c, d) = (a + c, b + d) and (a, b) ⊙ (c, d) = (ac, bd) ,

where the addition and the multiplication on the right-hand sides are the
standard addition and multiplication in R.

(a) Does T have an element neutral to ⊕?


(b) Does T have an element neutral to ⊙?
(c) Is T with ⊕ and ⊙ a field?

Solution 1.7: (0, 0) is neutral to ⊕ and (1, 1) is neutral to ⊙. However, this is not a
field. For example, the non-zero element (1, 0) doesn’t have a multiplicative inverse.

1.4 Solvability of linear equations


We next show that every linear equation in one unknown with parameters in
a field has a unique solution within that field:

Theorem 1.1 Let F be a field and let a, b, c ∈ F with a ≠ 0F . Then, the linear
equation
aX + c = b
has a solution and this solution is unique.

Proof : There are two claims to be proved: first, that there exists an x ∈ F
such that
ax + c = b,
and second, that if x, y ∈ F both satisfy
ax + c = b and ay + c = b,
then x = y.
For existence, x = a−1 (b + (−c)) is a solution, as

a (a−1 (b + (−c))) + c = (aa−1 )(b + (−c)) + c      (M1)
                       = 1F ⋅ (b + (−c)) + c        (M4)
                       = (b + (−c)) + c             (M3)
                       = b + ((−c) + c)             (A1)
                       = b + 0F                     (A2), (A4)
                       = b.                         (A3)
(Be sure to understand the justification of each passage.)
To prove uniqueness, suppose that
ax + c = b and ay + c = b.
Since both left-hand sides are equal to b, they are equal to each other, i.e.,
ax + c = ay + c.
We now proceed with the following deductions:

(ax + c) + (−c) = (ay + c) + (−c)
ax + (c + (−c)) = ay + (c + (−c))
        ax + 0F = ay + 0F
             ax = ay
      a−1 (ax) = a−1 (ay)
      (a−1 a)x = (a−1 a)y
        1F ⋅ x = 1F ⋅ y
              x = y.

(Be sure you understand why we had to assume that a ≠ 0F both for the
existence and the uniqueness.) ∎
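
Since the existence proof is constructive, the solution x = a−1 (b + (−c)) can be computed in any field in which inverses are computable. A small sketch (ours), re-reading the opening equation 3X + 6 = 18 over the finite field F7 :

```python
# Solve aX + c = b over F_p (p prime, a != 0): by Theorem 1.1 the unique
# solution is x = a^(-1) (b + (-c)).
def solve_linear(a, c, b, p):
    assert a % p != 0, "uniqueness requires a != 0_F"
    a_inv = pow(a, p - 2, p)      # multiplicative inverse via Fermat's little theorem
    return (a_inv * (b - c)) % p

x = solve_linear(3, 6, 18, 7)     # 3X + 6 = 18, read modulo 7
print(x, (3 * x + 6) % 7)         # 4 4  -- and 18 mod 7 is indeed 4
```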
The above proposition has a number of implications pertinent to any field:

Corollary 1.2 (Uniqueness of zero) If there exist b, x ∈ F such that

x + b = b,

then x = 0F .

Proof : Consider the linear equation

X + b = b.

Since x = 0F is a solution of this equation, it follows from the uniqueness


property that x + b = b implies that x = 0F . ∎

Corollary 1.3 (Uniqueness of the additive inverse) If there exist


b, x ∈ F such that
x + b = 0F ,
then x = (−b) (in other words, the additive inverse is unique).

Proof : Consider the linear equation

X + b = 0F .

Since x = (−b) is a solution of this equation, it follows from the uniqueness


property that x + b = 0F implies that x = (−b). ∎

Exercises

(easy) 1.8 Prove that if there exist a, x ∈ F, a ≠ 0, such that

x ⋅ a = a,

then x = 1F .

Solution 1.8: Consider the equation


a X + 0F = a.
x = 1F is a solution, but by Theorem 1.1, the solution is unique, i.e., x = 1F is the only
scalar satisfying ax = a.

(easy) 1.9 Prove that if there exist a, x ∈ F such that


x ⋅ a = 1F ,
then x = a−1 (in other words, the multiplicative inverse is unique).
Solution 1.9: Note first that a ≠ 0F ; otherwise x ⋅ a = 0F (see Exercise 1.10 below),
contradicting x ⋅ a = 1F . Consider the equation
a X + 0F = 1F .
x = a−1 is a solution, but by Theorem 1.1, the solution is unique, i.e., x = a−1 is the only
scalar satisfying ax = 1F .

(harder) 1.10 Let F be a field. Prove that for every a ∈ F,


a ⋅ 0F = 0F .
Hint: consider the equation X + a ⋅ 0F = a ⋅ 0F and show that x = 0F and x = a ⋅ 0F
are both solutions, hence must be equal.
Solution 1.10: Using the properties of a field,
a ⋅ 0F = a ⋅ (0F + 0F ) = a ⋅ 0F + a ⋅ 0F .
Consider now the equation,
X + a ⋅ 0F = a ⋅ 0F .
Since both a ⋅ 0F and 0F are solutions, it follows from Theorem 1.1 that a ⋅ 0F = 0F .

(harder) 1.11 Let F be a field. Prove that for every a, b ∈ F,


ab = 0F if and only if a = 0F or b = 0F .
Comment: the word or has a different meaning in mathematics than in our
daily language. The “mathematical” or is inclusive: in this case, either
a = 0F , or b = 0F or both a = b = 0F . Hint: there are two separate claims to
prove; formulate each claim separately.

Solution 1.11: We already know that if either a = 0F or b = 0F , then ab = 0F . It remains
to prove the other direction. Suppose that ab = 0F . If a ≠ 0F , consider the equation

aX + 0F = 0F .

Since x = 0F and x = b are both solutions, it follows from Theorem 1.1 that b = 0F . Similarly,
if b ≠ 0F , it follows that a = 0F .

(intermediate) 1.12 Let F be a field. Prove that for every a, b, c, d ∈ F

(a) −(−a) = a.
(b) (a−1 )−1 = a.
(c) (−1)a = (−a).
(d) (−0) = 0.
(e) a ≠ 0 if and only if (−a) ≠ 0.
(f) a = b if and only if a − b = 0.
(g) −(a + b) = −a − b.
(h) −(a − b) = b − a.
(i) (−a)b = a(−b) = −(ab).
(j) (−a)(−b) = ab.
(k) a ⋅ a = 1 if and only if a = 1 or a = −1.
(l) a ⋅ a = b ⋅ b if and only if a = b or a = −b.
(m) If a, b ≠ 0 then (ab)−1 = a−1 b−1 .
(n) If a ≠ 0 then 0/a = 0.
(o) a/1 = a.
(p) If b, d ≠ 0 then a/b = c/d if and only if ad = bc.
(q) If b, d ≠ 0 then (b/d)−1 = d/b.
(r) If b, d ≠ 0 then (a/b)(c/d) = (ac)/(bd).
(s) If b, d ≠ 0 then a/b + c/d = (ad + bc)/(bd).

Solution 1.12: Most items are based on the same idea. Take for example Item (i):
consider the equation
X + ab = 0F .
On the one hand x = −(ab) is a solution. On other hand, substituting x = (−a)b,

(−a)b + ab = ((−a) + a)b = 0F ⋅ b = 0F ,

i.e., x = (−a)b is also a solution, and by uniqueness (−a)b = −(ab). Item (l) follows by
noting that
a ⋅ a − b ⋅ b = (a + b)(a − b),
hence a ⋅ a − b ⋅ b = 0F if and only if either a + b = 0F or a − b = 0F .

1.5 Equality as an equivalence relation


One of the hidden assumptions throughout this section concerns the properties of
the equality sign, and its consistency with the operations of addition and
multiplication. Equality is an instance of an equivalence relation (יחס
שקילות). By that we mean the following:

(a) Every element in a set is equal to itself, i.e., for every a ∈ F,

a = a.

(This property of being equivalent to oneself is called reflexivity.)


(b) Equality is symmetric: for all a, b ∈ F,

a=b implies b = a.

(c) Equality is transitive: for every a, b, c ∈ F,

a=b and b=c imply a = c.

You will encounter many equivalence relations throughout your studies, in-
cluding in this course.
Moreover, we assume that addition and multiplication are consistent with
this notion of equivalence, namely, for all a, b, c ∈ F,

a=b implies a + c = b + c,

and
a=b implies a ⋅ c = b ⋅ c.
This assumption is the basis for the practice of adding the same term to both
sides of an equation.

Exercises
(easy) 1.13 Show that
a = b and c = d imply a + c = b + d,
and
a = b and c = d imply a ⋅ c = b ⋅ d.
Solution 1.13: It is an immediate consequence of consistency with respect to addition
and multiplication, along with the transitivity of the equality, for example,
a + c = b + c = b + d.

1.6 Extended associativity and commutativity
Associativity for finite sums The associativity of addition (and similarly
of multiplication) asserts that for every three scalars a, b, c ∈ F,
(a + b) + c = a + (b + c).
What about the addition of four scalars? Without switching the order of the
addends, we have the following alternative ways of adding up four addends
a, b, c, d ∈ F,
((a + b) + c) + d    (a + (b + c)) + d    (a + b) + (c + d)
a + ((b + c) + d)    a + (b + (c + d))                          (1.2)
The associativity of addition generalizes to any number of addends. If there
are n addends, then n−2 pairs of parentheses are needed in order to prescribe
the order of summation. The generalized law of associativity (which follows
from the associativity for 3 addends) asserts that addends may be grouped
in any order, always yielding the same sum.

The summation sign Let a1 , . . . , an ∈ F, where n may be any natural


number. We may denote their sum by

a1 + a2 + ⋅ ⋅ ⋅ + an .

While this notation may be self-explanatory, there may be cases where the
use of an ellipsis (three dots) is ambiguous. The more formal way of writing
this sum is

∑_{i=1}^{n} a_i    or    ∑_{1≤i≤n} a_i ,

which we read as “the sum of all ai ’s where i ranges from one to n”. Formally,
this sum is defined inductively (הגדרה אינדוקטיבית) as follows:

∑_{i=1}^{1} a_i = a_1 ,

and for all n > 1,

∑_{i=1}^{n} a_i = (∑_{i=1}^{n−1} a_i) + a_n .

Note that such a definition is meaningful even if the operation is neither
associative nor commutative.

Example: Let’s follow the inductive definition for

x = ∑_{i=1}^{4} i(i + 1).

Unfolding the recursion we obtain

∑_{i=1}^{4} i(i + 1) = ∑_{i=1}^{3} i(i + 1) + 4 ⋅ 5
                    = (∑_{i=1}^{2} i(i + 1) + 3 ⋅ 4) + 4 ⋅ 5
                    = ((∑_{i=1}^{1} i(i + 1) + 2 ⋅ 3) + 3 ⋅ 4) + 4 ⋅ 5
                    = ((1 ⋅ 2 + 2 ⋅ 3) + 3 ⋅ 4) + 4 ⋅ 5.
▲▲▲
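
The inductive definition is precisely a left-to-right accumulation loop; a sketch (ours):

```python
# sum_{i=1}^{1} a_i = a_1, and sum_{i=1}^{n} a_i = (sum_{i=1}^{n-1} a_i) + a_n.
# The terms are grouped strictly from the left, so this is meaningful even for
# an operation that is neither associative nor commutative.
def sigma(terms):
    total = terms[0]
    for a in terms[1:]:
        total = total + a          # ((...(a_1 + a_2) + ...) + a_n)
    return total

print(sigma([i * (i + 1) for i in range(1, 5)]))   # 1*2 + 2*3 + 3*4 + 4*5 = 40
```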

Commutativity for finite sums Commutativity is inherently a binary


property. As such it can only be generalized to multiple addends (or factors)
when combined with associativity. The generalized law of commutativity and
associativity can be formalized as follows: let a1 , . . . , an ∈ F be a collection
of scalars. Let σ be a permutation (תמורה): σ is a function taking for
input an index in {1, . . . , n} and returning an index in that same set, such that
every index is mapped to a distinct index. That is, aσ(1) , aσ(2) , . . . , aσ(n) is
a reordering of a1 , . . . , an ∈ F. The generalized law of commutativity asserts
that

∑_{i=1}^{n} a_i = ∑_{i=1}^{n} a_{σ(i)} .

Example: Let n = 5 and let σ(1) = 3, σ(2) = 1, σ(3) = 4, σ(4) = 2 and
σ(5) = 5. Then,

∑_{i=1}^{5} a_{σ(i)} = a_3 + a_1 + a_4 + a_2 + a_5 .

▲▲▲
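
A quick numerical check of the generalized law, using the permutation σ of the example (sketch, ours; the sample values are arbitrary):

```python
a = [10, 20, 30, 40, 50]
sigma = [2, 0, 3, 1, 4]             # sigma(1)=3, ..., sigma(5)=5, shifted to 0-based
reordered = [a[sigma[i]] for i in range(5)]
print(reordered)                    # [30, 10, 40, 20, 50], i.e., a3, a1, a4, a2, a5
assert sum(reordered) == sum(a)     # the two sums agree
```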
Let a1 , a2 , . . . , an ∈ F and b1 , b2 , . . . , bn ∈ F. It can be shown inductively on n
that

∑_{i=1}^{n} a_i + ∑_{i=1}^{n} b_i = ∑_{i=1}^{n} (a_i + b_i).

Likewise, for c ∈ F,

c (∑_{i=1}^{n} a_i) = ∑_{i=1}^{n} (c a_i).

n-tuples of field elements We consider the set of all ordered n-tuples of
elements of a field, i.e., elements of the form

(a1 , . . . , an ),

where ai ∈ F for all i = 1, . . . , n. We denote this set by

Fn = {(a1 , . . . , an ) ∶ ai ∈ F, i = 1, . . . , n} .

More generally, let S be a set, then

S n = {(s1 , . . . , sn ) ∶ si ∈ S, i = 1, . . . , n}.

For reasons that will become apparent later in this course, we will sometimes
write n-tuples of scalars as columns delimited by square brackets; we denote
this set by

Fncol = {[x1 ⋮ xn ] ∶ xi ∈ F, i = 1, . . . , n} ,

where [x1 ⋮ xn ] denotes the scalars x1 , . . . , xn stacked vertically into a column.

At other times, the scalars will be arranged in a row delimited by square
brackets, and we denote this set by
Fnrow = {[a1 . . . an ] ∶ ai ∈ F, i = 1, . . . , n} .
At times, when writing columns is calligraphically annoying, we will write

[x1 . . . xn ]T = [x1 ⋮ xn ].
The reasons for this apparent nonsense (who cares about the form of parentheses,
and why write scalars in columns?) will be clarified later on.

Exercises

(easy) 1.14 Let S be a set. Describe the sets (S 2 )3 and (S 3 )2 .


Solution 1.14:
(S 2 )3 = {((a, b), (c, d), (e, f )) ∶ a, b, c, d, e, f ∈ S}
(S 3 )2 = {((a, b, c), (d, e, f )) ∶ a, b, c, d, e, f ∈ S}.

(easy) 1.15 Prove that all five ways of adding four addends in (1.2) yield
the same sum.
Solution 1.15: The identities follow from associativity for 3 addends, for example,
((a + b) + c) + d = (a + b) + (c + d),
where we take (a + b) as one of the addends. Now, taking (c + d) as one addend,
(a + b) + (c + d) = a + (b + (c + d)),
and so on.

(intermediate) 1.16 Prove using an inductive argument that

∑_{i=1}^{n} a_i + ∑_{i=1}^{n} b_i = ∑_{i=1}^{n} (a_i + b_i).

Solution 1.16: For n = 1 the identity is a1 + b1 = a1 + b1 , hence holds. Suppose that the
identity holds for n = k. Then, by definition,

∑_{i=1}^{k+1} a_i + ∑_{i=1}^{k+1} b_i = ∑_{i=1}^{k} a_i + a_{k+1} + ∑_{i=1}^{k} b_i + b_{k+1}
                                    = ∑_{i=1}^{k} a_i + ∑_{i=1}^{k} b_i + a_{k+1} + b_{k+1}
                                    = ∑_{i=1}^{k} (a_i + b_i) + (a_{k+1} + b_{k+1})
                                    = ∑_{i=1}^{k+1} (a_i + b_i),

where in the passage to the third line we used the inductive assumption.

(intermediate) 1.17 Prove that for every 1 < k < n,

∑_{i=1}^{n} a_i = ∑_{i=1}^{k} a_i + ∑_{i=k+1}^{n} a_i .

Hint: use an inductive argument on k.

(intermediate) 1.18 Calculate the following sums:

(a) ∑_{k=3}^{20} (k ⋅ k − (k − 1) ⋅ (k − 1)).

(b) ∑_{n=1}^{99} 1/(n(n + 1)).

Hint: you’re not supposed to carry out tedious calculations.


Solution 1.18: In the first case,

∑_{k=3}^{20} (k ⋅ k − (k − 1) ⋅ (k − 1)) = (3² − 2²) + (4² − 3²) + ⋅ ⋅ ⋅ + (20² − 19²) = 20² − 2².

Such a sum is called a “telescopic sum”. In the second case, we note that

1/(n(n + 1)) = 1/n − 1/(n + 1),

hence we once again have a “telescopic sum”,

∑_{n=1}^{99} 1/(n(n + 1)) = 1/1 − 1/100.
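
Both telescopic identities are easy to confirm numerically (a sketch, ours; Fraction keeps the second sum exact):

```python
from fractions import Fraction

# (a) telescopes to 20^2 - 2^2 = 396.
assert sum(k * k - (k - 1) * (k - 1) for k in range(3, 21)) == 20**2 - 2**2
# (b) telescopes to 1/1 - 1/100 = 99/100.
assert sum(Fraction(1, n * (n + 1)) for n in range(1, 100)) == Fraction(99, 100)
print("both telescopic sums check out")
```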

(intermediate) 1.19 Unfold and evaluate the following sum,

S = ∑_{i=1}^{3} (∑_{j=1}^{i} (i + 2j)).

Solution 1.19:

S = ∑_{j=1}^{1} (1 + 2j) + ∑_{j=1}^{2} (2 + 2j) + ∑_{j=1}^{3} (3 + 2j)
  = (1 + 2) + (2 + 2) + (2 + 4) + (3 + 2) + (3 + 4) + (3 + 6).

(harder) 1.20 Let

{aij ∈ F ∶ 1 ≤ i ≤ n, 1 ≤ j ≤ m}

be a set of mn scalars. Show that

∑_{i=1}^{n} (∑_{j=1}^{m} a_{ij}) = ∑_{j=1}^{m} (∑_{i=1}^{n} a_{ij}).

(This equality is an instance of Fubini’s theorem, which you will encounter
later in your studies in different contexts.)
Solution 1.20: We need here a double induction. For m = n = 1, the identity is
a11 = a11 , which holds trivially. Let’s see that we can increase n by one (by symmetry, it
will imply that we can also increase m by one). So assume that the identity holds for m
and n. Then,

∑_{i=1}^{n+1} (∑_{j=1}^{m} a_{ij}) = ∑_{i=1}^{n} (∑_{j=1}^{m} a_{ij}) + ∑_{j=1}^{m} a_{n+1,j}
                                  = ∑_{j=1}^{m} (∑_{i=1}^{n} a_{ij}) + ∑_{j=1}^{m} a_{n+1,j}
                                  = ∑_{j=1}^{m} (∑_{i=1}^{n} a_{ij} + a_{n+1,j})
                                  = ∑_{j=1}^{m} (∑_{i=1}^{n+1} a_{ij}),

where in the passage to the second line we used the inductive assumption, and in the
passage to the third line we merged the two sums using the identity of Exercise 1.16.

(harder) 1.21 Let

{aij ∈ F ∶ 1 ≤ i ≤ n, 1 ≤ j ≤ n}

be a set of n² scalars. Show that

∑_{i=1}^{n} (∑_{j=1}^{i} a_{ij}) = ∑_{j=1}^{n} (∑_{i=j}^{n} a_{ij}).

Solution 1.21: Since the summation sign is defined inductively, we have to use induction.
For n = 1, the identity is a11 = a11 , which holds trivially. Suppose this holds for n;
then

∑_{i=1}^{n+1} (∑_{j=1}^{i} a_{ij}) = ∑_{i=1}^{n} (∑_{j=1}^{i} a_{ij}) + ∑_{j=1}^{n+1} a_{n+1,j}
                                  = ∑_{j=1}^{n} (∑_{i=j}^{n} a_{ij}) + ∑_{j=1}^{n+1} a_{n+1,j}
                                  = ∑_{j=1}^{n} (∑_{i=j}^{n} a_{ij}) + ∑_{j=1}^{n} a_{n+1,j} + a_{n+1,n+1}
                                  = ∑_{j=1}^{n} (∑_{i=j}^{n} a_{ij} + a_{n+1,j}) + a_{n+1,n+1}
                                  = ∑_{j=1}^{n} (∑_{i=j}^{n+1} a_{ij}) + a_{n+1,n+1}.

Noting that the last term can be written as

a_{n+1,n+1} = ∑_{i=n+1}^{n+1} a_{i,n+1},

we finally obtain

∑_{j=1}^{n+1} (∑_{i=j}^{n+1} a_{ij}).

(harder) 1.22 True or false? For every n ∈ N and sequences a1 , . . . , an ,
b1 , . . . , bn ,

(∑_{i=1}^{n} a_i) (∑_{i=1}^{n} b_i) = ∑_{i=1}^{n} a_i b_i .

Solution 1.22: False. Take F = R, n = 2, a1 = a2 = b1 = b2 = 1. Then,

(∑_{i=1}^{2} a_i) (∑_{i=1}^{2} b_i) = (1 + 1)(1 + 1) = 4,

whereas

∑_{i=1}^{2} a_i b_i = 1 ⋅ 1 + 1 ⋅ 1 = 2.
Chapter 2

Linear Systems of Equations

2.1 One equation in multiple unknowns


We start by considering one linear equation in n unknowns:

Definition 2.1 A linear equation in n unknowns X 1 , . . . , X n with coeffi-


cients a1 , . . . , an , b ∈ F is an equation of the form

a1 X 1 + a2 X 2 + ⋅ ⋅ ⋅ + an X n = b. (2.1)

The scalar ai is called the coefficient (מקדם) of the i-th unknown. We write
the coefficients of the X i ’s in the form

[a1 , a2 , . . . , an ] ∈ Fnrow .

We also refer to the extended list of coefficients, which includes the right-
hand side
[a1 , a2 , . . . , an , b] ∈ Fn+1row .

Example: Consider the following equation in two unknowns,

2(X 1 + X 2 − 6) = 3X 2 + 4(8 − X 1 ).

This is a linear equation in two unknowns albeit not of the form (2.1). By
algebraic manipulations (based on the axioms of field) we can rewrite it as

6X 1 − X 2 = 44,

which in the above notation corresponds to the extended list of coefficients


[a1 , a2 , b] = [6, −1, 44]. ▲▲▲

Definition 2.2 A solution to (2.1) is an n-tuple of field elements,

[x1 , . . . , xn ]T ∈ Fncol ,

such that
a1 x1 + ⋅ ⋅ ⋅ + an xn = b. (2.2)
The set of all solutions (which could be an empty set) is a subset of Fncol ,

S[a1 ,...,an ∣b] = {[x1 , . . . , xn ]T ∈ Fncol ∶ ∑_{i=1}^{n} ai xi = b} .

In words, the set of solutions of the linear equation defined by the coefficients
[a1 , . . . , an , b] is the set of all [x1 , . . . , xn ]T ∈ Fncol satisfying (2.2).

Generally, an equation may have one solution, many solutions, or no solution


at all. What do we mean then by solving an equation? We mean that we
obtain a “constructive recipe” for generating all of its solutions.

Example: Consider the linear equation in two unknowns,

X 1 + X 2 = 1. (2.3)

We are looking for pairs of scalars [x1 , x2 ]T ∈ F2col satisfying this equation.
We may see right away that

[1, 0]T and [0, 1]T

are both solutions to (2.3), but do there exist more solutions? Take any t ∈ F
and substitute it for X 2 . Then, we are left with the equation

X 1 + t = 1,

which is solvable, and this solution is unique, x1 = 1 − t. Thus, for every
choice of t ∈ F, the pair [1 − t, t]T is a solution to (2.3). Namely,

{[1 − t, t]T ∶ t ∈ F} ⊆ S[1,1∣1] .
In words, for every t ∈ F, the pair [1 − t, t]T ∈ F2col is a solution to the linear
equation with two unknowns and coefficients a1 = 1, a2 = 1 and b = 1.
In fact, this inclusion between sets turns out to be an equality, as every
solution to (2.3) must be of the form [1 − t, t]T . Note how we broke the
symmetry between the two unknowns: we treated the second unknown as a
“free” parameter, which may assume any value, whereas the value of the first
unknown was “dependent” on the choice of the second unknown. Note also
that the choice of the second unknown as “free” is arbitrary; we could have
done it the other way around. ▲▲▲
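
Over a finite field, the free/dependent recipe actually enumerates the entire (finite) solution set. A sketch (ours), over F5 for variety:

```python
# All solutions of X^1 + X^2 = 1 over F_5: the free variable x2 = t ranges over
# the field, and the dependent variable x1 = 1 - t is then forced.
p = 5
solutions = [((1 - t) % p, t) for t in range(p)]
print(solutions)                    # [(1, 0), (0, 1), (4, 2), (3, 3), (2, 4)]
assert all((x1 + x2) % p == 1 for x1, x2 in solutions)
```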

Definition 2.3 A linear equation of the form (2.1) is called homogeneous
(הומוגנית) if b = 0. It is called consistent (עקבית) if its set of solutions is
not empty.

Example: Suppose that all the coefficients ai are zero, namely

0F ⋅ X 1 + ⋅ ⋅ ⋅ + 0F ⋅ X n = b.

If b ≠ 0F then no [x1 , . . . , xn ]T ∈ Fncol satisfies the equation, i.e., the equation
is not consistent, namely,

S[0,...,0∣b] = ∅.

If on the other hand b = 0F , then every n-tuple is a solution, i.e., the set of
all solutions is Fncol . In other words,

{[x1 , . . . , xn ]T ∈ Fncol ∶ ∑_{i=1}^{n} 0F ⋅ xi = 0F } = Fncol .
▲▲▲
We now show how an equation can be modified without changing its set of
solutions. Take an equation of the form (2.1), and let F ∋ c ≠ 0F . Consider
the equation
(ca1 )X 1 + ⋅ ⋅ ⋅ + (can )X n = cb, (2.4)
obtained by multiplying all the coefficients in (2.1) by c.

Proposition 2.4 Every solution of (2.1) is a solution of (2.4), and vice-


versa, every solution of (2.4) is a solution of (2.1). That is, both equations
have the same set of solutions,

S[a1 ,...,an ∣b] = S[ca1 ,...,can ∣cb] .

Proof : If [x1 , . . . , xn ]T is a solution of (2.1), then by definition

a1 x1 + ⋅ ⋅ ⋅ + an xn = b.

Multiplying both sides by c, using the distributive law and the associativity
of products,

cb = c (a1 x1 + ⋅ ⋅ ⋅ + an xn )
= c(a1 x1 ) + ⋅ ⋅ ⋅ + c(an xn )
= (ca1 )x1 + ⋅ ⋅ ⋅ + (can )xn ,

i.e., [x1 , . . . , xn ]T is also a solution of (2.4). The reverse implication follows
by multiplying (2.4) by c−1 . ∎
Proposition 2.4 implies that we have a means of changing an equation with-
out changing its set of solutions. This is the idea behind the procedure of
simplifying equations. Suppose that there exists at least one ai different from
zero. Let k ∈ {1, . . . , n} be the smallest index for which ai ≠ 0, i.e., ak ≠ 0 and
ai = 0 for all i < k. That is, we can write the equation as

ak X k + ak+1 X k+1 + ⋅ ⋅ ⋅ + an X n = b.

(We call X k the leading variable (המשתנה המוביל) of the equation.) Multiplying
this equation by ak−1 we obtain an equation having the same set of
solutions, whose first non-zero coefficient is one,

X k + (ak+1 /ak )X k+1 + ⋅ ⋅ ⋅ + (an /ak )X n = b/ak .

We say that this equation is in standard form (הצגה מתוקנת). By Theorem
1.1, no matter which values we substitute for X 1 , . . . , X k−1 and X k+1 , . . . , X n ,

there exists a unique value of X k for which this equation holds. That is, the
set of solutions can be written as

S[a1 ,...,an ∣b] = {[t1 , . . . , tk−1 , b/ak − ∑_{i=k+1}^{n} (ai /ak ) ti , tk+1 , . . . , tn ]T ∶ ti ∈ F for i ≠ k} .
This is what we mean by a solution which is constructive, or explicit (מפורש).
The full set of solutions can be generated by selecting all possible values for
(t1 , . . . , tk−1 , tk+1 , . . . , tn ). In this representation we say that the variables X i
for i ≠ k are free variables (משתנים חופשיים) (because we can generate
all solutions by selecting their values “freely”) whereas X k is a dependent
variable (משתנה קשור) (because once the free variables have been assigned,
the value of X k depends on those assigned values).
We may formulate the following corollary:

Corollary 2.5 Every linear equation in n unknowns having at least one


non-zero coefficient ai is consistent, and its set of solutions can be represented
by means of n − 1 free variables.
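
The representation above is fully algorithmic. The sketch below (ours, over Q using Python’s Fraction type) locates the leading variable of a single equation and assembles one solution from an assignment of the n − 1 free variables:

```python
from fractions import Fraction

def solve_one_equation(coeffs, b, free_values):
    """coeffs = [a1, ..., an], not all zero; free_values assigns the n - 1
    free variables (all X^i with i != k, in their natural order)."""
    k = next(i for i, a in enumerate(coeffs) if a != 0)   # leading variable
    ak = Fraction(coeffs[k])
    x = list(free_values[:k]) + [None] + list(free_values[k:])
    # dependent variable: x^k = b/ak - sum_{i>k} (ai/ak) * t^i
    x[k] = Fraction(b) / ak - sum(Fraction(a) / ak * t
                                  for a, t in zip(coeffs[k + 1:], x[k + 1:]))
    return x

# 3X^1 - 4X^2 = 7 with free variable X^2 = 1 gives x^1 = 7/3 + 4/3 = 11/3.
print(solve_one_equation([3, -4], 7, [Fraction(1)]))
```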

Exercises

(easy) 2.1 Write the set of solutions to the linear equation in two unknowns
over R,
3X 1 − 4X 2 = 7.

Solution 2.1: The set of solutions is

S[3,−4∣7] = {[7/3 + (4/3)t, t]T ∶ t ∈ R} .

(easy) 2.2 Write the equation over R

0 X 1 + 0 X 2 − 4X 3 + 0 X 4 + 7X 5 = 3

in standard form and write its set of solutions in explicit form.

Solution 2.2: The standard form is

X 3 − (7/4)X 5 = −3/4.

Its set of solutions is

S[0,0,−4,0,7∣3] = {[a, b, −3/4 + (7/4)d, c, d]T ∶ a, b, c, d ∈ R} .

(intermediate) 2.3 Find the set of solutions to the equation

X1 + X2 + X3 = 1

over the field F2 . Solve the same equation over the field F3 .

Solution 2.3: Over any field, the set of solutions is

S = {[1 − s − t, s, t]T ∶ s, t ∈ F} .

If the field is F2 , then there are four solutions,

S = {[1, 0, 0]T , [0, 1, 0]T , [0, 0, 1]T , [1, 1, 1]T } .

If the field is F3 , then there are nine solutions,

S = {[1, 0, 0]T , [0, 1, 0]T , [2, 2, 0]T , [0, 0, 1]T , [2, 1, 1]T , [1, 2, 1]T , [2, 0, 2]T , [1, 1, 2]T , [0, 2, 2]T } .

(intermediate) 2.4 Find the set of solutions to the equation

a1 X 1 + a2 X 2 + a3 X 3 = b

over the field R for the following sets of coefficients:

(a) [a1 , a2 , a3 ∣b] = [1, 1, 2∣1].


(b) [a1 , a2 , a3 ∣b] = [0, 1, 6∣3].
(c) [a1 , a2 , a3 ∣b] = [0, 3, 6∣3].

Solution 2.4: The solutions are

S[1,1,2∣1] = {[1 − s − 2t, s, t]T ∶ s, t ∈ R} ,
S[0,1,6∣3] = {[s, 3 − 6t, t]T ∶ s, t ∈ R} ,
S[0,3,6∣3] = {[s, 1 − 2t, t]T ∶ s, t ∈ R} .

(intermediate) 2.5 Suppose that [x1 , . . . , xn ]T is a solution to both equations,

a1 X 1 + ⋅ ⋅ ⋅ + an X n = b and c1 X 1 + ⋅ ⋅ ⋅ + cn X n = d.

Prove that for every α ∈ F it is also a solution to the equation

(α a1 + c1 )X 1 + ⋅ ⋅ ⋅ + (α an + cn )X n = α b + d.

Solution 2.5: This follows from the axioms of field and summation rules. We need to
prove that if

∑_{i=1}^{n} ai xi = b and ∑_{i=1}^{n} ci xi = d,

then

∑_{i=1}^{n} (α ai + ci ) xi = α b + d.

This identity was basically proved inductively in the previous section.



2.2 Systems of equations


We introduce next the notion of a system of m linear equations in n
unknowns (מערכת משוואות ליניאריות). For this we need m sets of coefficients
ai and b. Rather than using new symbols for each equation, we denote the
coefficients for the i-th equation by an upper index i. That is, we consider a
system of m equations of the form

a11 X 1 + a12 X 2 + . . . + a1n X n = b1
a21 X 1 + a22 X 2 + . . . + a2n X n = b2
⋮
am1 X 1 + am2 X 2 + . . . + amn X n = bm ,     (2.5)

where
aij ∈ F, i = 1, . . . , m, j = 1, . . . , n,
is the coefficient (מקדם) of the j-th variable in the i-th equation, and

bi ∈ F, i = 1, . . . , m,

is the right-hand side of the i-th equation. Please note: the upper indexes in
aij and in bi enumerate the equation, whereas the lower index in aij enumerates
the variable.
A solution (פתרון) to the system is any n-tuple [x1 , x2 , . . . , xn ]T ∈ Fncol , such
that
a11 x1 + a12 x2 + . . . + a1n xn = b1
a21 x1 + a22 x2 + . . . + a2n xn = b2
⋮
am1 x1 + am2 x2 + . . . + amn xn = bm .     (2.6)

Given a system of equations (2.5) (which is uniquely determined by n, m and


the scalars aij and bi ), we would like to find the set of all of its solutions,

S = {[x1 , . . . , xn ]T ∈ Fncol ∶ ∑_{j=1}^{n} aij xj = bi for all i = 1, . . . , m} .

As in the case of a single equation, solving the system of equations means
obtaining a constructive way of generating all of its solutions.
Like for a single equation:

Definition 2.6 A system of linear equations (2.5) is called homogeneous
(מערכת משוואות הומוגנית) if the right-hand side is zero, namely, bi = 0 for all
i = 1, . . . , m. It is called consistent (עקבית) if its set of solutions is not
empty.

Example: Consider the inhomogeneous system of two equations in two un-


knowns over R,
X 1 +X 2 = 0
X 1 +X 2 = 1.
Each equation separately has a solution; however, this system is not consistent,
for if [x1 , x2 ]T were a solution, it would imply that
0 = x1 + x2 = 1,
which violates the axioms of field. ▲▲▲

Example: Consider the inhomogeneous system of m = 2 linear equations in


n = 4 unknowns over R,
X 1 + 2X 2 − X 4 = 1
X 3 + 4X 4 = 3.
This is a quite special form of a system as we will immediately see. First, it
is not very difficult to “guess” a solution
[1, 0, 3, 0]T ∈ F4col .
In fact, we may observe that the variable X 1 only appears in the first equa-
tion, whereas the variable X 3 only appears in the second equation. As a
result, suppose that we substitute s ∈ F for X 2 and t ∈ F for X 4 . Then, we
obtain two decoupled linear equations for X 1 and X 3 , whose solutions are
x1 = 1 − 2s + t
x3 = 3 − 4t.
As we did in the previous section, we may treat X 2 and X 4 as free variables,
so that the set of solutions is generated by all possible choices of those
variables, yielding,

S = {[1 − 2s + t, s, 3 − 4t, t]T ∈ F4col ∶ s, t ∈ F} .

If, for example, F is a finite field, then we can enumerate the set of solutions,
which is a finite set. ▲▲▲
Not every system of equations is as “transparent” as in the above example.
What do we do when the system is more complicated? We transform it into
a “transparent” one having the same set of solutions, and we then solve the
easier one.

Example: Consider the inhomogeneous system of m = 2 linear equations in


n = 4 unknowns over R,

X 1 +2X 2 +X 3 +3X 4 = 4
3X 1 +6X 2 +2X 3 +5X 4 = 9.

This system is not “transparent” as the previous one. In secondary school


you learned how to solve such equations by eliminating variables (חילוץ
משתנים). Take the first equation and multiply it by 3,

3X 1 +6X 2 +3X 3 +9X 4 = 12.

We proved (Proposition 2.4) that this does not alter its set of solutions. Take
now this equation and subtract it from the second equation in the original
system, yielding
−X 3 −4X 4 = −3.

Then, add this equation to the first equation in the original system, yielding

X 1 +2X 2 −X 4 = 1.

Finally, multiply the penultimate equation by (−1) yielding

X 3 +4X 4 = 3.

Look at the last two equations. This is the system of the previous example:
the “transparent” system, whose solution we’ve already found. As we will
prove in the next section, the solutions of both sets of equations are the same.
▲▲▲
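
The same elimination can be phrased as operations on the rows of the array of coefficients, anticipating the matrix notation of the next sections. A sketch (ours), with exact rational arithmetic:

```python
from fractions import Fraction

# Augmented coefficients [a1 a2 a3 a4 | b] of the two equations above.
rows = [[Fraction(v) for v in r] for r in [[1, 2, 1, 3, 4],
                                           [3, 6, 2, 5, 9]]]

# Subtract 3 times the first equation from the second ...
rows[1] = [x - 3 * y for x, y in zip(rows[1], rows[0])]   # [0, 0, -1, -4 | -3]
# ... add the result to the first equation ...
rows[0] = [x + y for x, y in zip(rows[0], rows[1])]       # [1, 2, 0, -1 | 1]
# ... and multiply the new second equation by -1.
rows[1] = [-x for x in rows[1]]                           # [0, 0, 1, 4 | 3]
print(rows)   # the "transparent" system of the previous example
```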

2.3 Equivalent systems of equations


Our goal now is to formalize the process we have just carried out in a specific
example. Given a linear system (2.5) of m equations in n unknowns, we may
form a new equation, which is a linear combination (צירוף ליניארי) of the
m equations, by multiplying each equation by a scalar ci , i = 1, . . . , m, and
adding up the resulting m equations. Multiplying the i-th equation by ci and
summing over the m equations yields the equation
∑_{i=1}^{m} ci (∑_{j=1}^{n} aij X j ) = ∑_{i=1}^{m} ci bi ,

which we can rearrange by interchanging the order of summation (see Exercise
1.20) into

∑_{j=1}^{n} (∑_{i=1}^{m} ci aij ) X j = ∑_{i=1}^{m} ci bi ,

in which the sum in parentheses is the coefficient of X j and ∑_{i=1}^{m} ci bi is
the right-hand side.

In more explicit notation, if for every i = 1, . . . , m,

ai1 X 1 + ⋅ ⋅ ⋅ + ain X n = bi ,

then multiplying this equation by ci and summing over all i = 1, . . . , m,

c1 (a11 X 1 + ⋅ ⋅ ⋅ + a1n X n ) + ⋅ ⋅ ⋅ + cm (am1 X 1 + ⋅ ⋅ ⋅ + amn X n ) = c1 b1 + ⋅ ⋅ ⋅ + cm bm ,

which we further reorganize as

(c1 a11 + ⋅ ⋅ ⋅ + cm am1 )X 1 + ⋅ ⋅ ⋅ + (c1 a1n + ⋅ ⋅ ⋅ + cm amn )X n = c1 b1 + ⋅ ⋅ ⋅ + cm bm .     (2.7)

Note that we applied here both the extended associativity and commutativity
of addition and the distributive law. We conclude that a linear combination
of linear equations is again a linear equation.

Proposition 2.7 (משפט הירושה) Every solution [x1 , . . . , xn ]T ∈ Fncol of
(2.5) is also a solution of (2.7).

Proof : Let [x1 , . . . , xn ]T be a solution to (2.5), i.e.,

ai1 x1 + ⋯ + ain xn = bi for all i = 1, . . . , m.

Multiplying the i-th equation by ci , summing over i and applying the distributive
law, we recover the desired result after exchanging the order of
summation. Note that we used here the consistency of equality and addition:
if s1 = t1 , s2 = t2 , up to sm = tm , then

s1 + s2 + ⋅ ⋅ ⋅ + sm = t1 + t2 + ⋅ ⋅ ⋅ + tm . ∎
Note, however, that the reverse is not necessarily true. Not every solution to
(2.7) is necessarily a solution of (2.5) (“information may have been lost”).
More generally, consider a linear system of k equations in n unknowns,
g11 X 1 + g21 X 2 + . . . + gn1 X n = z 1
g12 X 1 + g22 X 2 + . . . + gn2 X n = z 2
⋮
g1k X 1 + g2k X 2 + . . . + gnk X n = z k .     (2.8)

If each of the k equations in (2.8) is a linear combination of the m equations


in (2.5), then, by Proposition 2.7, every solution of (2.5) is also a solution to
(2.8) (but not necessarily the other way around).
This observation brings us to the following definition:

Definition 2.8 Two linear systems of equations are called equivalent (שקולות)
if every equation in one system is a linear combination of the equations in
the other system.

Example: Back to our first example, the systems


2X 1 − X 2 + X 3 = 0
X 1 + 3X 2 + 4X 3 = 0

and

X 2 + X 3 = 0
X 1 + X 3 = 0
are equivalent. The first equation in the second system is obtained by a linear
combination of the first system with coefficients [−1/7, 2/7] and the second

equation in the second system is obtained by a linear combination of the


first system with coefficients [3/7, 1/7]. Conversely, the first equation in the
first system is obtained by a linear combination of the second system with
coefficients [−1, 2], and the second equation in the first system is obtained
by a linear combination of the second system with coefficients [3, 1]. ▲ ▲ ▲
The importance of equivalent systems stems from the following fact:

Proposition 2.9 Equivalent systems have the same set of solutions.

Proof : By Proposition 2.7, every solution of a linear system is also a solution


of an equation obtained by a linear combination of that system. Since each
equation in one system is a linear combination of the equation in the other
system, every solution of System A is a solution of System B, and conversely,
every solution of System B is a solution of System A. ∎
This notion of two systems being equivalent has a very important property:

Lemma 2.10 If a linear system of equations B is obtained by linear combi-


nations of a linear system of equations A, and a linear system of equations
C is obtained by linear combinations of a linear system of equations B, then
System C is obtained by linear combinations of the equations in System A.

Proof : Suppose that System A has m equations, System B has k equations
and System C has p equations, all in n unknowns. If System B is obtained
by linear combinations of System A, then the ℓ-th equation in System B is
obtained by taking linear combinations of the m equations in System A, with
coefficients cℓ1 , . . . , cℓm ; namely, the ℓ-th equation of System B is of the form

∑_{s=1}^{m} cℓs ∑_{j=1}^{n} asj X j = ∑_{s=1}^{m} cℓs bs .

Likewise, if System C is obtained by linear combinations of System B, then
the i-th equation in System C is obtained by taking linear combinations of
the k equations in System B, with coefficients di1 , . . . , dik ; namely, the i-th
equation of System C is of the form

∑_{ℓ=1}^{k} diℓ ∑_{s=1}^{m} cℓs ∑_{j=1}^{n} asj X j = ∑_{ℓ=1}^{k} diℓ ∑_{s=1}^{m} cℓs bs .

Reorganizing this equation as

∑_{s=1}^{m} (∑_{ℓ=1}^{k} diℓ cℓs ) ∑_{j=1}^{n} asj X j = ∑_{s=1}^{m} (∑_{ℓ=1}^{k} diℓ cℓs ) bs ,

in which the same scalars eis = ∑_{ℓ=1}^{k} diℓ cℓs appear on both sides, proves that
System C is obtained by linear combinations of System A. ∎

Corollary 2.11 If a linear system of equations B is equivalent to a linear


system of equations A, and a linear system of equations C is equivalent to a
linear system of equations B, then System C is equivalent to System A.

Proof : Apply the previous lemma both ways. ∎


These observations are key to the solution of linear systems of equations.
What we actually do is to replace the original system by equivalent systems
through a chain of transformations which ensure that we always remain with
a system that is equivalent to the original one, so that the set of solutions
never changes. The key is to end up with a system of equations which is
“transparent”.

Comment: Note that in all summations, the index we sum upon always
appears once as an upper index and once as a lower index. If you come
across a summation in which this is not the case, look for an error.
We end this section by observing that while we have a well-defined notion
of equivalence between systems of equations, we don’t yet have a means for
verifying whether two systems of equations are equivalent, nor a systematic
way of generating equivalent systems to a given system.

Exercises

(easy) 2.6 Consider the following linear system of two equations in three
unknowns over R,
2X 1 +X 2 +X 3 = 2
X 1 +2X 2 −X 3 = −1.

(a) Is this a homogeneous system?


(b) Is [1, 0, 0]T a solution?
(c) Write an equation which is a linear combination of this system with
coefficients [2, −3].
(d) Is X 1 + X 2 = 1/3 a linear combination of this system? If it is, what are
the coefficients?
(e) Is 2X 1 + X 2 + X 3 = 1 a linear combination of this system? If it is, what
are the coefficients?

Solution 2.6:
(a) No.
(b) No. It is not a solution of the second equation.
(c) X 1 − 4X 2 + 5X 3 = 7.
(d) Yes. The coefficients are [1/3, 1/3].
(e) No. You can’t find a, b such that 2a + b = 2, a + 2b = 1, a − b = 1 and 2a − b = 1.

(easy) 2.7 Write the set of solutions of the linear system over R in the
unknowns (X, Y ):
X +Y = 5
2X −Y = 3.

Solution 2.7: There is only one solution,

S = {[8/3, 7/3]T } .

(easy) 2.8 Write the set of solutions of the linear system over R in the
unknowns (X, Y, Z):
X +Y −Z = −1
X −Y −Z = −1.

Solution 2.8: The set of solutions is

S = {[s − 1, 0, s]T ∶ s ∈ R} .

(intermediate) 2.9 Show that the following two homogeneous systems of
equations are equivalent,

X 1 − X 2 = 0        3X 1 + X 2 = 0
2X 1 + X 2 = 0  and   X 1 + X 2 = 0.

Solution 2.9: The second system is obtained from the first by taking the coefficients
to be [1/3, 4/3] and [−1/3, 2/3], respectively. The first system is obtained from the second
by taking the coefficients to be [1, −2] and [1/2, 1/2], respectively.

(intermediate) 2.10 Show that the following two homogeneous systems of
equations over R are equivalent,

−X 1 + X 2 + 4X 3 = 0
 X 1 + 3X 2 + 8X 3 = 0          and          X 1 − X 3 = 0
(1/2)X 1 + X 2 + (5/2)X 3 = 0                X 2 + 3X 3 = 0.

Solution 2.10: We have to show that every equation in one system is a linear combina-
tion of the equations in the other system. Let’s show for example that the first equation
on the left is a linear combination of the two equations on the right. We need to show
that there exists a, b ∈ R such that
a ⋅ 1 + b ⋅ 0 = −1 a⋅0+b⋅1=1 and a ⋅ (−1) + b ⋅ 3 = 4.
Indeed, a = (−1) and b = 1 is a solution, i.e., the first equation on the left is obtained by
subtracting the first equation on the right from the second.

(intermediate) 2.11 Consider the following two homogeneous systems of
equations over R,

X 1 − X 2 = 0         X 1 + 2X 2 = 0
2X 1 + X 2 = 0  and  −2X 1 − 4X 2 = 0.

Are they equivalent? If they are, write each system as a linear combination
of the other.

Solution 2.11: These systems are not equivalent. To prove it, it suffices to show that
one equation in one of the systems is not a linear combination of the equations in the
other system. We will show that the first equation on the left is not a linear combination
of the two equations on the right. If it were, there would exist a, b ∈ R such that

a ⋅ 1 + b ⋅ (−2) = 1 and a ⋅ 2 + b ⋅ (−4) = −1.

You may check that this system is inconsistent.

(harder) 2.12 We showed that if two systems of equations are equivalent,


then they have the same sets of solutions. What about the converse? Show
that if two homogeneous systems of linear equations in two unknowns have
the same solutions, then they are equivalent.

(harder) 2.13 Does there exist a linear system of m equations in n un-


knowns having a unique solution when

(a) m = 4 and n = 3.
(b) m = 3 and n = 4.

If the answer is positive provide an example; if it is negative explain why.

Solution 2.13: The answer to the first item is positive. Take for example the system
of equations

X1 = 1 X2 = 1 X3 = 1 and X 1 + X 2 = 2.

The answer to the second item is negative. Note that a system of three equations in
four unknowns may be inconsistent, in which case there are no solutions at all. If, however,
it is consistent, the analysis in the subsequent sections will show that the solution is not
unique.
40 Chapter 2

2.4 Matrix notation


2.4.1 Definitions
An important practice in mathematics is the adoption of convenient nota-
tions. In the present case, since a linear system of equations (hence also its
solutions) is fully determined by the coefficients a^i_j and the b^i, there is no
need to carry around also the variables X^j. We organize the coefficients a^i_j
in a rectangular array

A = \begin{bmatrix} a^1_1 & a^1_2 & \cdots & a^1_n \\ a^2_1 & a^2_2 & \cdots & a^2_n \\ \vdots & \vdots & & \vdots \\ a^m_1 & a^m_2 & \cdots & a^m_n \end{bmatrix},

which we call the m×n matrix of coefficients (!M‫)מטריצת המקדמי‬. The entry
at the i-th row and the j-th column is the coefficient of the j-th unknown in
the i-th equation. Likewise, we organize the b^i’s as an m × 1 matrix

b = \begin{bmatrix} b^1 \\ b^2 \\ \vdots \\ b^m \end{bmatrix},

which is an element of F^m_{col}. If we further organize the unknowns as an n × 1
matrix,

X = \begin{bmatrix} X^1 \\ X^2 \\ \vdots \\ X^n \end{bmatrix},
then we may symbolically represent the system of equations as AX = b. At
this stage this is just a symbolic notation, but it will acquire a meaning
shortly.

Comments:

(a) Note that in a^i_j, the upper index i designates the row and the lower
index j designates the column. It will sometimes be convenient to write
the (i, j)-th element of a matrix A also by (A)^i_j.
(b) Formally, an m × n matrix A is a function from the set {1, . . . , m} ×
{1, . . . , n} to the field F. For every pair of indexes (i, j) it returns the
field element which we denote by a^i_j.
(c) We denote the set of m × n matrices with values in the field F by
M_{m×n}(F).
(d) M_{1×n}(F) coincides with F^n_{row}, whereas M_{m×1}(F) coincides with F^m_{col}.

We denote the i-th row of the matrix A by

Row_i(A) = [a^i_1  a^i_2  . . .  a^i_n].

Likewise, we denote the j-th column of A by

Col_j(A) = \begin{bmatrix} a^1_j \\ a^2_j \\ \vdots \\ a^m_j \end{bmatrix}.

In fact, we may present the matrix A either as a column of m rows, each of
size n, or as a row of n columns, each of size m,

A = \begin{bmatrix} Row_1(A) \\ Row_2(A) \\ \vdots \\ Row_m(A) \end{bmatrix} = \begin{bmatrix} Col_1(A) & Col_2(A) & \cdots & Col_n(A) \end{bmatrix}.

We may also write the coefficients and the right-hand side of the equations
as a unified m × (n + 1) matrix,

[A|b] = \begin{bmatrix} a^1_1 & a^1_2 & \cdots & a^1_n & b^1 \\ a^2_1 & a^2_2 & \cdots & a^2_n & b^2 \\ \vdots & \vdots & & \vdots & \vdots \\ a^m_1 & a^m_2 & \cdots & a^m_n & b^m \end{bmatrix}.

It is called the augmented matrix (!‫ )המטריצה המורחבת‬of the system AX =
b. Finally, we denote the set of solutions to AX = b by S_{[A|b]}.
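As a concrete illustration, here is a minimal sketch in Python (ours, not part of the text) of how a system may be stored as the pair (A, b) and how the augmented matrix [A|b] is assembled; the data are those of Exercises 2.14–2.15 below:

    # A sketch: a linear system is fully determined by its coefficients and
    # its right-hand side.  An m-by-n matrix is stored as a list of m rows.
    A = [[0, 0, 1, 4],
         [2, 4, 2, 6],
         [3, 6, 2, 5]]        # matrix of coefficients (m = 3, n = 4)
    b = [3, 7, 8]             # right-hand side, an m-by-1 matrix

    def augmented(A, b):
        """Assemble the m-by-(n+1) augmented matrix [A|b]."""
        return [row + [bi] for row, bi in zip(A, b)]

    print(augmented(A, b))    # the augmented matrix of Exercise 2.15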

Exercises

(easy) 2.14 Consider the matrix

A = \begin{bmatrix} 0 & 0 & 1 & 4 \\ 2 & 4 & 2 & 6 \\ 3 & 6 & 2 & 5 \end{bmatrix}.

What are a^2_3, Row_2(A) and Col_3(A)?

Solution 2.14: a^2_3 = 2,

Row_2(A) = [2  4  2  6]    and    Col_3(A) = \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix}.

(easy) 2.15 Write the system of equations represented by the augmented ma-
trix

\begin{bmatrix} 0 & 0 & 1 & 4 & 3 \\ 2 & 4 & 2 & 6 & 7 \\ 3 & 6 & 2 & 5 & 8 \end{bmatrix}.

Solution 2.15:

X^3 + 4X^4 = 3
2X^1 + 4X^2 + 2X^3 + 6X^4 = 7
3X^1 + 6X^2 + 2X^3 + 5X^4 = 8.

2.4.2 Elementary row-operations and row-equivalence


Next, we consider operations on matrices that correspond to forming lin-
ear combinations of equations. We define the following elementary row-
operations (!‫)פעולות שורה יסודיות‬:

1. Multiplication of the k-th row by a non-zero scalar F ∋ c ≠ 0.


2. Replacement of the r-th row with row r plus c times row s, where c ∈ F.

These operations are in fact functions taking an element in M_{m×n}(F) and
returning an element in M_{m×n}(F).
Formally, if A is a matrix, and e is the operation (the function) taking a
matrix and returning a matrix having all rows the same, except that the r-th
row has been multiplied by F ∋ c ≠ 0, then for every pair of indexes i, j,

(e(A))^i_j = \begin{cases} c\, a^i_j & i = r \\ a^i_j & i \neq r, \end{cases}

i.e.,

e : \begin{bmatrix} a^1_1 & \cdots & a^1_n \\ \vdots & & \vdots \\ a^r_1 & \cdots & a^r_n \\ \vdots & & \vdots \\ a^m_1 & \cdots & a^m_n \end{bmatrix} \mapsto \begin{bmatrix} a^1_1 & \cdots & a^1_n \\ \vdots & & \vdots \\ c\,a^r_1 & \cdots & c\,a^r_n \\ \vdots & & \vdots \\ a^m_1 & \cdots & a^m_n \end{bmatrix}.

If e is the operation taking a matrix and returning a matrix having all rows
the same, except for the r-th row being the sum of the r-th row and c times
the s-th row of A, then for every pair of indexes i, j,

(e(A))^i_j = \begin{cases} a^i_j + c\, a^s_j & i = r \\ a^i_j & i \neq r, \end{cases}

i.e.,

e : \begin{bmatrix} a^1_1 & \cdots & a^1_n \\ \vdots & & \vdots \\ a^r_1 & \cdots & a^r_n \\ \vdots & & \vdots \\ a^m_1 & \cdots & a^m_n \end{bmatrix} \mapsto \begin{bmatrix} a^1_1 & \cdots & a^1_n \\ \vdots & & \vdots \\ a^r_1 + c\,a^s_1 & \cdots & a^r_n + c\,a^s_n \\ \vdots & & \vdots \\ a^m_1 & \cdots & a^m_n \end{bmatrix}.
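Viewed as functions from M_{m×n}(F) to itself, the two elementary row-operations admit a direct transcription into code; the following sketch (ours, with rows indexed from 0 rather than from 1) will be reused below:

    def scale_row(A, r, c):
        """The operation e multiplying the r-th row by a non-zero scalar c."""
        assert c != 0, "multiplication of a row by zero is not allowed"
        return [[c * x for x in row] if i == r else row[:]
                for i, row in enumerate(A)]

    def add_row(A, r, s, c):
        """The operation e replacing row r with (row r) + c * (row s)."""
        return [[x + c * y for x, y in zip(A[r], A[s])] if i == r else row[:]
                for i, row in enumerate(A)]

    # Each operation is undone by an inverse operation (see Lemma 2.12 below).
    M = [[1, 2, 3], [4, 5, 6]]
    assert scale_row(scale_row(M, 0, -4), 0, -1/4) == M
    assert add_row(add_row(M, 1, 0, 2), 1, 0, -2) == M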

Lemma 2.12 Every elementary row-operation e has an inverse operation
e^{-1}, which is also an elementary row-operation, such that for every matrix
A,
e^{-1}(e(A)) = A.

Proof : The operation of multiplying the r-th row by c ≠ 0 can be reversed


by multiplying that same row by 1/c (which is why we required c ≠ 0), which

is also an elementary row-operation. The operation of replacing the r-th row


by the sum of row r and c times row s can be reversed by the elementary
row-operation of replacing the r-th row by the sum of row r and (−c) times
row s. n

Definition 2.13 An m × n matrix A is row-equivalent (!‫)שקולה לפי שורה‬


to an m × n matrix B if it can be obtained from B by a finite sequence of
elementary row-operations. That is, if there exists a sequence e1 , e2 , . . . , es of
elementary row-operations, such that

A = es (es−1 (. . . e1 (B))).

Example: Since

\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \xrightarrow{r_2 \leftarrow r_2 + 2r_1} \begin{bmatrix} 1 & 2 & 3 \\ 6 & 9 & 12 \end{bmatrix} \xrightarrow{r_1 \leftarrow -4 r_1} \begin{bmatrix} -4 & -8 & -12 \\ 6 & 9 & 12 \end{bmatrix},

it follows by definition that all three matrices are row-equivalent. ▲▲▲

Proposition 2.14 Row-equivalence is an equivalence relation (‫יחס‬


!‫)שקילות‬: that is, (i) every matrix is row-equivalent to itself, (ii) if A is row-
equivalent to B then B is row-equivalent to A, and (iii) if A is row-equivalent
to B and B is row-equivalent to C, then A is row-equivalent to C. (In par-
ticular, we have a well-defined notion of two matrices being row-equivalent to
each other.)

Proof : Every matrix A is row-equivalent to itself, for example, because if e is the
elementary row-operation of multiplying the first row by 1, then

elementary row-operation of multiplying the first row by 1, then

A = e(A).

If A is row-equivalent to B, then by definition, there exists a sequence of


elementary row-operations e1 , e2 , . . . , ek , such that

A = ek (ek−1 (⋯e2 (e1 (B)))).

Since every e_j has an inverse e_j^{-1}, it follows that

e_k^{-1}(A) = e_k^{-1}(e_k(e_{k-1}(\cdots e_2(e_1(B))))) = e_{k-1}(\cdots e_2(e_1(B))),

and proceeding inductively,

B = e_1^{-1}(e_2^{-1}(\cdots e_{k-1}^{-1}(e_k^{-1}(A)) \cdots )),

proving that B is row-equivalent to A. Finally, if B is also row-equivalent to


C, then by definition, there exists a sequence of elementary row-operations
f1 , f2 , . . . , fs , such that
B = fs (fs−1 (⋯f2 (f1 (C)))).
Hence,
A = ek (ek−1 (⋯e2 (e1 (fs (fs−1 (⋯f2 (f1 (C)))))))),
proving that A is row-equivalent to C. n
Note that we have two notions of equivalence: equivalence between systems
of equations and row-equivalence between matrices. We will shortly claim
that these two notions are related, namely, if two matrices are row-equivalent,
then they represent equivalent systems of equations.

Proposition 2.15 If A and B are row-equivalent m × n matrices, then the


homogeneous linear systems AX = 0 and BX = 0 have the same solutions,

S_{[A|0]} = S_{[B|0]}.

Proof : By definition, there exists a sequence of elementary row-operations


e1 , e2 , . . . , ek , such that
A = ek (ek−1 (⋯e2 (e1 (B)))).
It suffices to show that if e is any elementary row-operation, then the
homogeneous linear systems e(A)X = 0 and AX = 0 have the same set of solutions.
Let e be an elementary row-operation. Since every row in e(A) is a linear
combination of the rows in A, then every solution of AX = 0 is also a solution
of e(A)X = 0 (see Proposition 2.9). Conversely, since every row in A is a
linear combination of the rows in e(A) (since A = e−1 (e(A))), then every
solution of e(A)X = 0 is also a solution of AX = 0. n
In fact, an analogous statement holds for inhomogeneous systems by consid-
ering the extended matrices:

Proposition 2.16 If [A∣c] and [B∣d] are row-equivalent m × (n + 1) matri-


ces, then the linear systems AX = c and BX = d have the same solutions,

S_{[A|c]} = S_{[B|d]}.

Exercises

(easy) 2.16 Let e be an elementary row-operation. Show that for every


matrix A, every row of e(A) is a linear combination of the rows in A.

Solution 2.16: This is so by the very definition of elementary row operations; when an
elementary row operation e acts on the matrix A, all its rows but one remain unchanged,
whereas one row of e(A) is a linear combination of either one or two rows of A.

(easy) 2.17 Explain explicitly why in the proof of Proposition 2.15 it suffices
to show that the solutions of a homogeneous linear system do not change
under a single elementary row-operation.

Solution 2.17: Just apply an inductive argument on the number of elementary row-
operations.

(easy) 2.18 Can a matrix A ∈ M2×4 and a matrix B ∈ M4×3 be row-equivalent?


If yes, give an example and if not explain why.

Solution 2.18: No. Row-equivalent matrices are of the same size, since elementary
row-operations do not change the dimensions of a matrix.

(easy) 2.19 Consider the following row-operations:

e_1 : multiplying the first row by −2.
e_2 : exchanging the first and the second rows.
e_3 : adding to the third row 3 times the first row.

(a) What are the inverse operations e_1^{-1}, e_2^{-1} and e_3^{-1}?

(b) Perform the three operations sequentially on the matrix

\begin{bmatrix} 2 & 1 & -1 & 3 \\ 1 & -2 & 0 & 1 \\ 0 & 0 & 2 & 1 \end{bmatrix} ∈ M_{3×4}(F).

Solution 2.19: For the first part,

e_1^{-1} : multiplying the first row by −1/2.
e_2^{-1} : exchanging the first and the second rows.
e_3^{-1} : adding to the third row −3 times the first row.

For the second part,

e_3(e_2(e_1(\begin{bmatrix} 2 & 1 & -1 & 3 \\ 1 & -2 & 0 & 1 \\ 0 & 0 & 2 & 1 \end{bmatrix}))) = e_3(e_2(\begin{bmatrix} -4 & -2 & 2 & -6 \\ 1 & -2 & 0 & 1 \\ 0 & 0 & 2 & 1 \end{bmatrix}))
= e_3(\begin{bmatrix} 1 & -2 & 0 & 1 \\ -4 & -2 & 2 & -6 \\ 0 & 0 & 2 & 1 \end{bmatrix}) = \begin{bmatrix} 1 & -2 & 0 & 1 \\ -4 & -2 & 2 & -6 \\ 3 & -6 & 2 & 4 \end{bmatrix}.

(intermediate) 2.20 Let e1 and e2 be two elementary row-operations. Is


it always the case that

e1 (e2 (A)) = e2 (e1 (A)) ?

If yes, explain why. If not, give an example.

Solution 2.20: No. Let, for example,

e_1 : multiplying the first row by −1,
e_2 : adding the second row to the first row.

Then

e_2(e_1(\begin{bmatrix} 1 \\ 3 \end{bmatrix})) = e_2(\begin{bmatrix} -1 \\ 3 \end{bmatrix}) = \begin{bmatrix} 2 \\ 3 \end{bmatrix},

whereas

e_1(e_2(\begin{bmatrix} 1 \\ 3 \end{bmatrix})) = e_1(\begin{bmatrix} 4 \\ 3 \end{bmatrix}) = \begin{bmatrix} -4 \\ 3 \end{bmatrix}.

(intermediate) 2.21 Let

A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} ∈ M_{2×2}(F).

(a) Show that if ad − bc = 0 then A is row-equivalent to a matrix having a
row with all entries zero. Hint: separate the cases c = 0 and c ≠ 0.
(b) Show that if ad − bc ≠ 0 then A is row-equivalent to the matrix

\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.

Solution 2.21: (a) Suppose that c = 0. Then ad = 0, i.e., either a = 0 or d = 0. If
d = 0, then we are done. Otherwise, if a = 0 and d ≠ 0, then

\begin{bmatrix} 0 & b \\ 0 & d \end{bmatrix}    is row-equivalent to    \begin{bmatrix} 0 & 0 \\ 0 & d \end{bmatrix}.

If however c ≠ 0, then b = ad/c. Then,

\begin{bmatrix} a & ad/c \\ c & d \end{bmatrix}    is row-equivalent to    \begin{bmatrix} 0 & 0 \\ c & d \end{bmatrix}.

(b) Suppose that c = 0. Then both a and d are non-zero, and

\begin{bmatrix} a & b \\ 0 & d \end{bmatrix} \xrightarrow{r_1 \leftarrow r_1/a} \begin{bmatrix} 1 & b/a \\ 0 & d \end{bmatrix} \xrightarrow{r_2 \leftarrow r_2/d} \begin{bmatrix} 1 & b/a \\ 0 & 1 \end{bmatrix} \xrightarrow{r_1 \leftarrow r_1 - (b/a) r_2} \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.

You may now complete the case c ≠ 0.

(intermediate) 2.22 Show that two matrices in which two rows have been
interchanged are row-equivalent.

Solution 2.22: The interchange of rows r and s can be decomposed into elementary
row-operations as follows: (i) add row r to row s, (ii) subtract row s from row r, (iii) add
row r to row s, and (iv) multiply row r by (−1).

2.4.3 Row-reduced echelon matrices


We next show how a sequence of elementary row-operations can be used to
“simplify” a matrix, in the sense that the set of solutions of the associated
linear system can be obtained easily.

Example: By performing a sequence of eight elementary row-operations,
the matrix

A = \begin{bmatrix} 2 & -1 & 3 & 2 \\ 1 & 4 & 0 & -1 \\ 2 & 6 & -1 & 5 \end{bmatrix}

can be brought to the form

B = \begin{bmatrix} 1 & 0 & 0 & 17/3 \\ 0 & 1 & 0 & -5/3 \\ 0 & 0 & 1 & -11/3 \end{bmatrix}.
What did we gain? The homogeneous linear system BX = 0, whose solutions
coincide by Proposition 2.15 with those of AX = 0, takes the form

X^1 + (17/3) X^4 = 0
X^2 − (5/3) X^4 = 0
X^3 − (11/3) X^4 = 0.

We can let X^4 assume any value, say s, and then

x = [−(17/3)s, (5/3)s, (11/3)s, s]^T

is a solution. In fact, there are no other solutions, as any other solution
would fail to satisfy the equation BX = 0. That is,

S_{[A|0]} = S_{[B|0]} = \left\{ \begin{bmatrix} -(17/3)s \\ (5/3)s \\ (11/3)s \\ s \end{bmatrix} \in F^4_{col} : s \in F \right\}.
▲▲▲

Comment: It is sometimes notationally convenient to write a blank instead
of zero in a matrix. Thus, the above matrix B is written as

B = \begin{bmatrix} 1 & & & 17/3 \\ & 1 & & -5/3 \\ & & 1 & -11/3 \end{bmatrix}.

Example: Consider the non-homogeneous linear system represented by the
augmented matrix

A = \begin{bmatrix} 1 & -2 & 8 & 5 & 2 \\ 2 & 3 & 1 & 4 & 1 \\ 4 & -1 & 17 & 14 & 3 \end{bmatrix}.

By performing a sequence of elementary row-operations, we obtain the aug-
mented matrix

B = \begin{bmatrix} 1 & -2 & 8 & 5 & 2 \\ & 7 & -15 & -6 & -3 \\ & & & & -2 \end{bmatrix}.

This system is not consistent because the third equation has all coefficients
of the X^i’s zero but the right-hand side is not zero. ▲▲▲
In both examples, we manipulated the system through the matrix of coeffi-
cients until reaching an equivalent system which is explicit, from which we
could determine the solution (in the first example) or determine that there
are no solutions (in the second example). This brings us to the following
definition:

Definition 2.17 An m × n matrix A is said to be a row-reduced echelon
matrix (!‫ )מטריצה בצורת מדרגות מצומצמת‬if

(a) There exists a number r ≤ m, such that the rows r + 1, . . . , m are iden-
tically zero (if r = m then there are no rows that are identically zero).
(b) For each i = 1, . . . , r (i.e., for each non-zero row), let a^i_{k_i} be the first
non-zero entry; then a^i_{k_i} = 1 and k_1 < k_2 < ⋅⋅⋅ < k_r (the k_i’s are the
columns of the leading coefficients in the non-zero rows).
(c) For each i, a^i_{k_i} is the only non-zero element in the k_i-th column.
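The three conditions are mechanical enough to be checked by a short program; here is one possible sketch (ours, with 0-based column indices):

    def is_rref(A):
        """Check the three conditions of Definition 2.17 (0-based indices)."""
        pivots = []                  # columns k_1 < k_2 < ... of the leading 1's
        seen_zero_row = False
        for row in A:
            nonzero = [j for j, x in enumerate(row) if x != 0]
            if not nonzero:
                seen_zero_row = True
                continue
            if seen_zero_row:        # (a) zero rows must come last
                return False
            k = nonzero[0]
            if row[k] != 1:          # (b) the leading entry equals 1 ...
                return False
            if pivots and k <= pivots[-1]:   # ... and k_1 < k_2 < ...
                return False
            pivots.append(k)
        # (c) a pivot column contains no other non-zero entry
        return all(sum(1 for row in A if row[k] != 0) == 1 for k in pivots)

    print(is_rref([[1, 0, 0], [0, 1, 0], [0, 0, 1]]))   # True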

Example: A zero matrix (!M‫ )מטריצת אפסי‬is a matrix having all entries
zero; if A ∈ Mm×n is a zero matrix, we write A = 0, or A = 0m×n . A zero
matrix is an example of a row-reduced echelon matrix (with r = 0). ▲ ▲ ▲

Example: The matrix B in the first example is a row-reduced echelon matrix
(with r = m = 3). The matrix

\begin{bmatrix} 1 & -3 & & & 17/3 \\ & & 1 & & -5/3 \\ & & & 1 & -11/3 \\ & & & & \\ & & & & \end{bmatrix}    (2.9)

(blank entries are zeros) also satisfies all three conditions with m = 5, r = 3 and k_1 = 1, k_2 = 3 and
k_3 = 4. ▲▲▲

Example: A very important row-reduced echelon matrix is the identity
matrix (!‫ )מטריצת הזהות‬I, which is a square matrix (i.e., m = n) of the form

I^i_j = \delta^i_j = \begin{cases} 1 & i = j \\ 0 & i \neq j, \end{cases}

i.e.,

I = \begin{bmatrix} 1 & & & \\ & 1 & & \\ & & \ddots & \\ & & & 1 \end{bmatrix}.

The symbol \delta^i_j is called Kronecker’s delta. ▲▲▲
Row-reduced echelon matrices are useful, because we can read off the solution
to the associated linear system right away. For every i = 1, . . . , r, we call the
variable X^{k_i} a dependent variable; a variable X^j which is not a dependent
variable, i.e., j ∉ {k_1, . . . , k_r}, is called a free variable. The general solution
of a linear system AX = b, where A is a row-reduced echelon matrix, is
constructed as follows: assign arbitrary values to the free variables, and
then express the dependent variables in terms of the free variables: that is,
for each i = 1, . . . , r,

x^{k_i} = b^i − \sum_{j \notin \{k_1, \dots, k_r\}} a^i_j x^j.

Note that the summation is only over the indexes of the free variables.

Example: In the matrix (2.9), X^1, X^3 and X^4 are dependent variables,
whereas X^2 and X^5 are free variables. Setting X^2 = s and X^5 = t, the
general solution of AX = 0 is

x = [3s − (17/3)t, s, (5/3)t, (11/3)t, t]^T.

▲▲▲
When is a non-homogeneous system consistent? Let AX = b with A being
a row-reduced echelon matrix. There are two possibilities: if A has a row
with all its entries zero, and the corresponding row of b is non-zero, then the
system does not have any solution. Otherwise, the zero rows can be ignored,
and the system is consistent.
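In code, reading off the solution of a system [R|d], with R a row-reduced echelon matrix, amounts to exactly the case distinction just described; a sketch (ours, 0-based indices):

    def read_solution(R, d):
        """For a row-reduced echelon matrix R: return None if some zero row
        faces a non-zero right-hand side; otherwise return the pivot columns,
        the free columns, and for each pivot k_i the data (d^i, coefficients
        of the free variables) appearing in x^{k_i} = d^i - sum_j R^i_j x^j."""
        n = len(R[0])
        pivots = []
        for i, row in enumerate(R):
            nonzero = [j for j, x in enumerate(row) if x != 0]
            if not nonzero:
                if d[i] != 0:
                    return None           # the equation 0 = d^i is unsolvable
            else:
                pivots.append(nonzero[0])
        free = [j for j in range(n) if j not in pivots]
        rules = [(k, d[i], {j: R[i][j] for j in free})
                 for i, k in enumerate(pivots)]
        return pivots, free, rules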

Exercises

(easy) 2.23 Construct three matrices, each of which fails to satisfy exactly
one condition in the definition of a row-reduced echelon matrix.

(easy) 2.24 Let A be a row-reduced echelon matrix having r non-zero rows.


Explain why it must hold that r ≤ m and r ≤ n.
Solution 2.24: Since k_1 < k_2 < ⋅⋅⋅ < k_r, it follows that k_i ≥ i for all i = 1, . . . , r. Hence,
r ≤ k_r ≤ n.
That r ≤ m (the number of non-zero rows is not larger than the total number of rows) is
obvious.

(easy) 2.25 Characterize all 1×n and all m×1 row-reduced echelon matrices.
Solution 2.25: Apart from the zero matrix, there is only one m × 1 row-reduced echelon
matrix,

\begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}.

On the other hand, there are more types of 1 × n row-reduced echelon matrices: besides
the zero row, these are exactly the matrices of the form

[0  ⋯  0  1  ∗  ⋯  ∗],

where the leading 1 may stand in any column and the asterisks represent arbitrary scalars.



(easy) 2.26 Explain why the n × n identity matrix is the unique n × n row-
reduced echelon matrix having no zero row.

Solution 2.26: Since the matrix has no zero rows, r = n. Now,


n ≥ kr ≥ r = n,

from which we conclude that kr = r. The condition that

k1 < k2 < ⋯ < kr ,

implies that k_i = i for all i = 1, . . . , n. The only such matrix is the identity matrix.

(easy) 2.27 Characterize all the 2 × 2 row-reduced echelon matrices.

Solution 2.27: The only 2 × 2 row-reduced echelon matrices are

\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix},  \begin{bmatrix} 1 & * \\ 0 & 0 \end{bmatrix},  \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix},  \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.

2.4.4 The Gauss-Jordan algorithm


The key theorem for a systematic solution of linear systems is the following:

Theorem 2.18 Every m × n matrix A is row-equivalent to a row-reduced


echelon matrix.

Proof : The proof follows a procedure called the Gauss-Jordan algorithm.
If all the entries of A are zero, then A is already a row-reduced echelon matrix
(hence it is row-equivalent to one). Otherwise, if needed, take any row whose
first non-zero entry occurs in the leftmost possible column, and bring it to be
the first row (this is an operation preserving row-equivalence, see Exercise 2.22);
in the new matrix, k_1 is the column of the first non-zero entry of the first row.
Divide the first row by a^1_{k_1} such that after this change a^1_{k_1} = 1. Then,
subtract from the i-th row, i ≠ 1, a^i_{k_1} times the first row. These are elementary
row-operations which eliminate all entries in the k_1-st column except in the first row.
Next, ignore the first row and bring to the second row a remaining row whose first
non-zero entry occurs in the leftmost possible column. Denote by k_2 the column of
the first non-zero entry of the second row; by construction, k_2 > k_1. Divide the
second row by a^2_{k_2} such that after this change a^2_{k_2} = 1. Then, subtract from
the i-th row, i ≠ 2, a^i_{k_2} times the second row. These are elementary row-operations
which eliminate all entries in the k_2-nd column except in the second row. Note
also that this did not destroy the fact that up to the k_2-th column, the only
non-zero entries are a^1_{k_1} = 1 and possibly a^1_j for k_1 < j < k_2.
We proceed this way, until reaching the m-th row, or until the remaining
rows are identically zero. n
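The proof is constructive, and translates almost line by line into a program; the following sketch (ours) uses exact fractions to avoid rounding:

    from fractions import Fraction

    def gauss_jordan(A):
        """Return a row-reduced echelon matrix row-equivalent to A."""
        R = [[Fraction(x) for x in row] for row in A]
        m, n = len(R), len(R[0])
        i = 0                                  # row where the next pivot goes
        for k in range(n):                     # scan the columns left to right
            p = next((r for r in range(i, m) if R[r][k] != 0), None)
            if p is None:
                continue                       # no pivot in this column
            R[i], R[p] = R[p], R[i]            # bring the row up (Exercise 2.22)
            piv = R[i][k]
            R[i] = [x / piv for x in R[i]]     # make the leading entry 1
            for r in range(m):                 # clear the rest of column k
                if r != i and R[r][k] != 0:
                    c = R[r][k]
                    R[r] = [x - c * y for x, y in zip(R[r], R[i])]
            i += 1
        return R

    # The example below: the matrix reduces to [[1,0,0,1],[0,1,0,2],[0,0,1,3]].
    print(gauss_jordan([[2, 1, 2, 10], [1, 2, 1, 8], [3, 1, -1, 2]]))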

Example: Apply the Gauss-Jordan algorithm on

A = \begin{bmatrix} 2 & 1 & 2 & 10 \\ 1 & 2 & 1 & 8 \\ 3 & 1 & -1 & 2 \end{bmatrix}.

We follow the procedure,

\begin{bmatrix} 2 & 1 & 2 & 10 \\ 1 & 2 & 1 & 8 \\ 3 & 1 & -1 & 2 \end{bmatrix} \xrightarrow{r_1 \leftarrow r_1/2} \begin{bmatrix} 1 & 1/2 & 1 & 5 \\ 1 & 2 & 1 & 8 \\ 3 & 1 & -1 & 2 \end{bmatrix} \xrightarrow{r_2 \leftarrow r_2 - r_1} \begin{bmatrix} 1 & 1/2 & 1 & 5 \\ 0 & 3/2 & 0 & 3 \\ 3 & 1 & -1 & 2 \end{bmatrix}

\xrightarrow{r_3 \leftarrow r_3 - 3r_1} \begin{bmatrix} 1 & 1/2 & 1 & 5 \\ 0 & 3/2 & 0 & 3 \\ 0 & -1/2 & -4 & -13 \end{bmatrix} \xrightarrow{r_2 \leftarrow 2r_2/3} \begin{bmatrix} 1 & 1/2 & 1 & 5 \\ 0 & 1 & 0 & 2 \\ 0 & -1/2 & -4 & -13 \end{bmatrix}

\xrightarrow{r_1 \leftarrow r_1 - r_2/2} \begin{bmatrix} 1 & 0 & 1 & 4 \\ 0 & 1 & 0 & 2 \\ 0 & -1/2 & -4 & -13 \end{bmatrix} \xrightarrow{r_3 \leftarrow r_3 + r_2/2} \begin{bmatrix} 1 & 0 & 1 & 4 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & -4 & -12 \end{bmatrix}

\xrightarrow{r_3 \leftarrow -r_3/4} \begin{bmatrix} 1 & 0 & 1 & 4 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 1 & 3 \end{bmatrix} \xrightarrow{r_1 \leftarrow r_1 - r_3} \begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 1 & 3 \end{bmatrix}.

▲▲▲

Example: With A as in the previous example, solve the linear system

AX = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}

using the augmented matrix of this system. ▲▲▲

Solution of linear systems using the Gauss-Jordan algorithm Let


[A∣b] be the augmented matrix representing a linear system of m equations
in n unknowns. We now have a systematic way of determining whether it
is consistent, and if it is, finding its set of solutions. Let R ∈ Mm×n (F) be
a row-reduced echelon matrix which is row-equivalent to A and let [R∣d] be
the augmented matrix obtained by applying on [A∣b] the elementary row-
operations bringing A into R. That is, the system represented by [R∣d] has
the same set of solutions as the system represented by [A∣b].
Let r ≤ m be the number of non-zero rows in R and let k1 < k2 < ⋅ ⋅ ⋅ < kr be
the columns of the leading non-zero elements in each of these rows. Thus,
the variables

X^{k_1}, X^{k_2}, . . . , X^{k_r}

are the dependent variables, whereas the rest, which we denote by

X^{\ell_1}, X^{\ell_2}, . . . , X^{\ell_{n-r}},

are the free variables.


By the structure of the row-reduced echelon matrix, the first r equations read

X^{k_1} + \sum_{j=1}^{n-r} r^1_{\ell_j} X^{\ell_j} = d^1
⋮
X^{k_r} + \sum_{j=1}^{n-r} r^r_{\ell_j} X^{\ell_j} = d^r,

whereas the next m − r equations read

0 = d^{r+1}
⋮
0 = d^m.

Evidently, if d^{r+1}, . . . , d^m are not all zero, then the system is not consistent.
If however,

d^{r+1} = d^{r+2} = ⋅⋅⋅ = d^m = 0,

then we can replace the free variables X^{\ell_1}, . . . , X^{\ell_{n-r}} by any sequence of
scalars t^1, . . . , t^{n-r}, obtaining a solvable equation for each of the dependent
variables X^{k_1}, . . . , X^{k_r} (a linear equation in one unknown!), whose solution
is

x^{k_i} = d^i − \sum_{j=1}^{n-r} r^i_{\ell_j} t^j.

We should have perhaps noted long ago that any homogeneous linear system
has at least one solution, [0, 0, . . . , 0]^T, which we simply denote by 0 ∈ F^n_{col}.
This solution is called the trivial solution (!‫ הטריוויאלי‬N‫)הפתרו‬. For any
matrix that has free variables, there also exist non-trivial solutions to the
homogeneous problem (as the free variables may assume any value). In particular,

Proposition 2.19 If A is an m × n matrix with m < n (i.e., fewer equa-
tions than unknowns), then the homogeneous system AX = 0 has non-trivial
solutions.

Proof : Reduce A to a row-reduced echelon matrix. Then, there are at most
m non-zero rows, hence at most m dependent variables, and therefore at least
n − m ≥ 1 free variables. n
The question of whether there exist non-trivial solutions is central to linear
algebra. Naively, we would expect solutions to be unique when the number
of equations is equal to the number of unknowns, i.e., when m = n. This is
not sufficient. The following theorem characterizes the square matrices for
which the trivial solution is the only solution:

Theorem 2.20 Let A be an n × n matrix. Then, the homogeneous system


AX = 0 has only a trivial solution if and only if A is row-equivalent to the
n × n identity matrix.

Proof : There are two directions to prove. Assume first that A is row-
equivalent to the identity matrix. Since row-equivalent matrices have the
same associated solutions, the solutions to AX = 0 coincide with the solu-
tions of IX = 0, i.e.,

X^1 = 0,  X^2 = 0,  ⋯,  X^n = 0,

and those only include the trivial solution.



Conversely, suppose that x = 0 is the only solution to AX = 0. Let R denote


a row-reduced echelon matrix which is row-equivalent to A. Then, RX = 0
doesn’t have non-trivial solutions, which means that all of its n rows are non-
zero. This is only possible if k1 = 1, k2 = 2, etc, and the only row-reduced
echelon matrix satisfying these conditions is the identity matrix. n

Exercises

(intermediate) 2.28 Suppose that A is a square matrix which is row-


equivalent to the identity matrix. Show that the inhomogeneous system
AX = b is consistent and has a unique solution.
Solution 2.28: Since A can be reduced to the identity matrix, the augmented matrix
can be reduced to a matrix of the form

\begin{bmatrix} 1 & & & & d^1 \\ & 1 & & & d^2 \\ & & \ddots & & \vdots \\ & & & 1 & d^m \end{bmatrix}.

This row-equivalent system is explicitly solvable and has a unique solution, x^i = d^i.

(intermediate) 2.29 Let F = Q. Find all the solutions to the homogeneous
linear system

(1/3)X^1 + 2X^2 − 6X^3 = 0
−4X^1 + 5X^3 = 0
−3X^1 + 6X^2 − 13X^3 = 0
−(7/3)X^1 + 2X^2 − (8/3)X^3 = 0

by first writing it in matrix form, and then transforming the matrix of coef-
ficients into a row-reduced echelon matrix.
Solution 2.29: Writing the matrix of coefficients and reducing it, we obtain

\begin{bmatrix} 1/3 & 2 & -6 \\ -4 & 0 & 5 \\ -3 & 6 & -13 \\ -7/3 & 2 & -8/3 \end{bmatrix} → \begin{bmatrix} 1 & 0 & -5/4 \\ 0 & 1 & -67/24 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}.

Thus, X^3 is a free variable, and the set of solutions is

{ [(5/4)a, (67/24)a, a]^T : a ∈ Q }.

(intermediate) 2.30 What are all the solutions (if any) of the system

X^1 − X^2 + 2X^3 = 1
2X^1 + 2X^3 = 1
X^1 − 3X^2 + 4X^3 = 2.

Use the augmented matrix representation to solve this system.


Solution 2.30: We start with the augmented matrix
⎡ 1 −1 2 ⎤
1
⎢ ⎥
⎢ ⎥
⎢ 2 0 2 1
⎥.
⎢ ⎥
⎢ 1 −3 4 2

⎣ ⎦
Applying the Gauss-Jordan elimination algorithm we obtain
⎡ 1 1 ⎤
⎢ 0 1 2 ⎥
⎢ ⎥
⎢ 0 1 −1 − 21 ⎥ .
⎢ ⎥
⎢ 0 0 0 0 ⎥⎦

The system is solvable with X 3 a free variable. The set of solutions is
{(− 12 + a, 21 − a, a) ∶ a ∈ F}.

(intermediate) 2.31 Show using the Gauss-Jordan algorithm that the non-
homogeneous system

X^1 − 2X^2 + X^3 + 2X^4 = 1
X^1 + X^2 − X^3 + X^4 = 2
X^1 + 7X^2 − 5X^3 − X^4 = 3

has no solutions.

Solution 2.31: We start with the augmented matrix

\begin{bmatrix} 1 & -2 & 1 & 2 & 1 \\ 1 & 1 & -1 & 1 & 2 \\ 1 & 7 & -5 & -1 & 3 \end{bmatrix}.

Applying the Gauss-Jordan elimination algorithm we obtain

\begin{bmatrix} 1 & 0 & -1/3 & 4/3 & 5/3 \\ 0 & 1 & -2/3 & -1/3 & 1/3 \\ 0 & 0 & 0 & 0 & -1 \end{bmatrix}.
The last equation does not have a solution.

(intermediate) 2.32 Let

A = \begin{bmatrix} 3 & -1 & 2 \\ 2 & 1 & 1 \\ 1 & -3 & 0 \end{bmatrix}.

For which b ∈ F^3_{col} does the system AX = b have a solution?

Solution 2.32: Write the augmented matrix

\begin{bmatrix} 3 & -1 & 2 & b^1 \\ 2 & 1 & 1 & b^2 \\ 1 & -3 & 0 & b^3 \end{bmatrix}.

Reducing it, we find that all three rows of the reduced matrix of coefficients are non-zero
(i.e., A is row-equivalent to the identity matrix). Hence, this system is solvable for all b’s.

(intermediate) 2.33 Let

A = \begin{bmatrix} 3 & -6 & 2 & -1 \\ -2 & 4 & 1 & 3 \\ 0 & 0 & 1 & 1 \\ 1 & -2 & 1 & 0 \end{bmatrix}.

For which b ∈ F^4_{col} does the system AX = b have a solution?

Solution 2.33: Performing the Gauss-Jordan algorithm, the reduced augmented matrix
is

\begin{bmatrix} 1 & -2 & 2/3 & -1/3 & b^1/3 \\ 0 & 0 & 1 & 1 & 3b^2/7 + 2b^1/7 \\ 0 & 0 & 0 & 0 & b^3 - 3b^2/7 - 2b^1/7 \\ 0 & 0 & 0 & 0 & b^4 - b^2/7 - 3b^1/7 \end{bmatrix}.

For the system to be solvable,

b^3 = 3b^2/7 + 2b^1/7    and    b^4 = b^2/7 + 3b^1/7.

(harder) 2.34 Let A and B be two 2 × 3 row-reduced echelon matrices.


Suppose that the homogeneous systems AX = 0 and BX = 0 have the same
set of solutions. Prove that A = B.

2.5 Operations with matrices


2.5.1 Addition of matrices
Given two matrices A, B ∈ Mm×n (F) we define their sum

S =A+B

to be a matrix S ∈ M_{m×n}(F), whose entries s^i_j are given by

s^i_j = a^i_j + b^i_j.

Note that the “+” sign in both relations has a totally different meaning: the
first is addition in M_{m×n}(F), whereas the second is addition in F. Another
way to write the definition of the addition of matrices (of the same size!) is

(A + B)^i_j = a^i_j + b^i_j.

Example:

\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} + \begin{bmatrix} 7 & 8 & 9 \\ 10 & 11 & 12 \end{bmatrix} = \begin{bmatrix} 8 & 10 & 12 \\ 14 & 16 & 18 \end{bmatrix}.

▲▲▲
If we denote by 0m×n (or just 0 in short) the m × n-matrix whose entries are
all zero, then
A+0=A
for every A ∈ M_{m×n}(F). Likewise, given A ∈ M_{m×n}(F), we denote by (−A)
the m × n matrix given by

(−A)^i_j = −a^i_j.

For every A ∈ M_{m×n}(F),

A + (−A) = 0.

It is easy to see that matrix addition is associative, namely, for every A, B, C ∈
M_{m×n}(F) we have

(A + B) + C = A + (B + C),

and commutative, namely,

A + B = B + A.

Note that the addition of matrices satisfies the four axioms of addition in a
field. This doesn’t make Mm×n (F) into a field!
We could define in a similar way products of matrices of the same size. We
could. But we won’t do so. We will rather have a different definition for
products of matrices, not necessarily of the same size, which will relate to
linear combinations of systems of equations.
You may ask yourself what is the purpose of adding up matrices, and whether
it relates to the solution of linear systems of equations. The meaning of
matrix addition will be clarified later in this course, in the context of linear
transformations.

Exercises

(easy) 2.35 Show that matrix addition is both associative and commuta-
tive.

Solution 2.35: We have

(A + B)^i_j = a^i_j + b^i_j = b^i_j + a^i_j = (B + A)^i_j,

and

((A + B) + C)^i_j = (A + B)^i_j + c^i_j = a^i_j + b^i_j + c^i_j = a^i_j + (B + C)^i_j = (A + (B + C))^i_j.

2.5.2 Multiplication by a scalar


For a matrix A ∈ M_{m×n}(F) and a scalar c ∈ F we define their product, cA ∈
M_{m×n}(F), whose entries are defined by

(cA)^i_j = c a^i_j.

That is, the scalar c multiplies every entry of A to yield the matrix c A.
We could think of the elements of F as “acting” on elements in Mm×n (F)
resulting in an element in Mm×n (F).

Example:

4 ⋅ \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} = \begin{bmatrix} 4 & 8 & 12 \\ 16 & 20 & 24 \end{bmatrix}.

▲▲▲
It is easy to see that multiplication by a scalar satisfies

1F ⋅ A = A

for every A ∈ Mm×n (F), and

c(dA) = (cd)A

for every c, d ∈ F and A ∈ Mm×n (F); this is a kind-of associativity up to the


fact that the product between scalars differs from the product between a
scalar and a matrix. Also, for every A ∈ Mm×n (F),

0F ⋅ A = 0m×n

and for every c ∈ F,


c 0m×n = 0m×n .
Finally, for every A ∈ Mm×n (F),

(−1F )A = (−A).

The product of a scalar and a matrix satisfies also distributive properties:


on the one hand, for every A, B ∈ Mm×n (F) and c ∈ F,

c (A + B) = c A + c B. (2.10)

On the other hand, for every A ∈ Mm×n (F) and c, d ∈ F,

(c + d) A = c A + d A. (2.11)
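Both operations act entrywise, so their implementation is immediate; a sketch (ours), with a numerical check of the distributive laws (2.10) and (2.11) on the example from Section 2.5.1:

    def mat_add(A, B):
        """(A + B)^i_j = a^i_j + b^i_j; A and B must have the same size."""
        return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

    def scal_mul(c, A):
        """(cA)^i_j = c a^i_j."""
        return [[c * x for x in row] for row in A]

    A = [[1, 2, 3], [4, 5, 6]]
    B = [[7, 8, 9], [10, 11, 12]]
    assert mat_add(A, B) == [[8, 10, 12], [14, 16, 18]]
    assert scal_mul(4, mat_add(A, B)) == mat_add(scal_mul(4, A), scal_mul(4, B))   # (2.10)
    assert scal_mul(2 + 3, A) == mat_add(scal_mul(2, A), scal_mul(3, A))           # (2.11)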

Exercises

(easy) 2.36 Show that the multiplication of a matrix by a scalar satisfies
1_F ⋅ A = A for every A ∈ M_{m×n}(F), and c(dA) = (cd)A for every c, d ∈ F and
A ∈ M_{m×n}(F).

Solution 2.36: Take the second claim: by definition,

(c(dA))^i_j = c (dA)^i_j = c d a^i_j = ((cd)A)^i_j.

Since this holds for every i, j, we obtain that c(dA) = (cd)A.

(easy) 2.37 Show that for every A ∈ Mm×n (F),

(−1F )A = (−A),

and more generally that for every A ∈ Mm×n (F) and c ∈ F

(−c) A = −(c A).

Solution 2.37: We have

((−c) A)^i_j = (−c) a^i_j = −(c a^i_j) = −(cA)^i_j = (−(cA))^i_j.

Since this holds for every i, j, we obtain that (−c)A = −(cA).

(easy) 2.38 Prove the two distributive properties (2.10), (2.11) of the prod-
uct of a scalar and a matrix.

Solution 2.38: Take the first identity,

(c(A + B))^i_j = c (A + B)^i_j = c(a^i_j + b^i_j) = c a^i_j + c b^i_j = (cA)^i_j + (cB)^i_j = (cA + cB)^i_j.

Since this holds for every i, j, we obtain that c(A + B) = cA + cB.

2.5.3 Products of matrices


We started this chapter by considering linear systems in which each equation
is a linear combination of the equations of another system, before focusing
on the particular case of elementary row-operations. We now return to the
procedure of forming linear combinations of equations in a more systematic
way, leaning upon our new notational system of matrices.
Let A ∈ Mm×n (F) be a matrix representing a system of m equations in n
unknowns. Suppose that we want to create from it a system of p equations

in the same n unknowns by taking linear combinations of the equations of the
first system. Think of the i-th equation in the new system. It is formed by
multiplying the first equation by a scalar b^i_1, the second equation by a scalar
b^i_2, up to the m-th equation by a scalar b^i_m, and adding up the m equations.
What is the coefficient of X^1 in the new equation? It is

b^i_1 a^1_1 + b^i_2 a^2_1 + ⋅⋅⋅ + b^i_m a^m_1 = \sum_{k=1}^{m} b^i_k a^k_1.

Note that the index i remains fixed—it represents the index of the equation
in the new system—and so does the index 1—which represents the variable
whose coefficient we calculate.
Likewise, the coefficient of X^2 in the new i-th equation is

b^i_1 a^1_2 + b^i_2 a^2_2 + ⋅⋅⋅ + b^i_m a^m_2 = \sum_{k=1}^{m} b^i_k a^k_2,

and more generally, the coefficient of X^j in the new i-th equation is

b^i_1 a^1_j + b^i_2 a^2_j + ⋅⋅⋅ + b^i_m a^m_j = \sum_{k=1}^{m} b^i_k a^k_j.

Thus, to form p equations by linear combinations of m equations we need
p × m scalars

{ b^i_j : i = 1, . . . , p,  j = 1, . . . , m },

such that the coefficient of the j-th variable in the new i-th equation is given
by

\sum_{k=1}^{m} b^i_k a^k_j.

This operation of forming linear combinations of equations can be represented
using matrices.

Definition 2.21 Let B ∈ M_{p×m}(F) and let A ∈ M_{m×n}(F). Their product
(!‫ )מכפלה של מטריצות‬BA is a p × n matrix whose (i, j)-th entry is given by

(BA)^i_j = \sum_{k=1}^{m} b^i_k a^k_j = b^i_1 a^1_j + ⋅⋅⋅ + b^i_m a^m_j.

Note: for the product BA to be defined, the number of columns in B has to


be equal to the number of rows in A.
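Definition 2.21 is a recipe for a triple loop; a possible sketch (ours):

    def mat_mul(B, A):
        """(BA)^i_j = sum over k of b^i_k a^k_j; needs cols(B) = rows(A)."""
        p, m, n = len(B), len(A), len(A[0])
        assert len(B[0]) == m, "columns of B must match rows of A"
        return [[sum(B[i][k] * A[k][j] for k in range(m)) for j in range(n)]
                for i in range(p)]

    # Forming linear combinations of equations (the example below):
    print(mat_mul([[1, 0], [-3, 1]], [[5, -1, 2], [15, 4, 8]]))
    # prints [[5, -1, 2], [0, 7, 2]]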

Example: Consider a linear system of 2 equations in 3 unknowns represented
by the matrix

A = \begin{bmatrix} 5 & -1 & 2 \\ 15 & 4 & 8 \end{bmatrix}.

We form a new system of 2 equations in 3 unknowns by multiplying it by the
matrix

B = \begin{bmatrix} 1 & 0 \\ -3 & 1 \end{bmatrix}.

The first equation in the new system is obtained by multiplying the first
equation in the original system by 1 and the second equation by zero and
adding the two—in other words, the first equation remains the same. The
second equation in the new system is obtained by multiplying the first equa-
tion in the original system by (−3) and the second equation by 1 and adding
the two. The corresponding matrix product is

\begin{bmatrix} 1 & 0 \\ -3 & 1 \end{bmatrix} \begin{bmatrix} 5 & -1 & 2 \\ 15 & 4 & 8 \end{bmatrix} = \begin{bmatrix} 5 & -1 & 2 \\ 0 & 7 & 2 \end{bmatrix}.

▲▲▲
Note that when we wrote the unknowns as an n × 1 matrix, and the right-hand
side of the equation as an m × 1 matrix,

X = \begin{bmatrix} X^1 \\ X^2 \\ \vdots \\ X^n \end{bmatrix}    and    b = \begin{bmatrix} b^1 \\ b^2 \\ \vdots \\ b^m \end{bmatrix},

the equation AX = b can be interpreted in terms of matrix multiplication:
the product of an m × n matrix and an n × 1 matrix is an m × 1 matrix.

Example: Let A be an m × n matrix and let I_m (or in short I) be the m × m
identity matrix, which we recall is given by

(I_m)^i_j = \delta^i_j = \begin{cases} 1 & i = j \\ 0 & i \neq j. \end{cases}

Then,

(I_m A)^i_j = \sum_{k=1}^{m} \delta^i_k a^k_j = a^i_j,

namely I_m A = A for every A. Likewise,

(A I_n)^i_j = \sum_{k=1}^{n} a^i_k \delta^k_j = a^i_j,

namely, A I_n = A. ▲▲▲

Example: Let’s see what happens when we multiply a matrix by a matrix
which has all entries zero except for one entry, which is 1. For example,
suppose that the (2, 3) entry equals one,

\begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} a^1_1 & a^1_2 & a^1_3 & a^1_4 \\ a^2_1 & a^2_2 & a^2_3 & a^2_4 \\ a^3_1 & a^3_2 & a^3_3 & a^3_4 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 & 0 \\ a^3_1 & a^3_2 & a^3_3 & a^3_4 \\ 0 & 0 & 0 & 0 \end{bmatrix}.

Thus, the third row was copied into the second row of the product.
For the other way around,

\begin{bmatrix} a^1_1 & a^1_2 & a^1_3 & a^1_4 \\ a^2_1 & a^2_2 & a^2_3 & a^2_4 \\ a^3_1 & a^3_2 & a^3_3 & a^3_4 \end{bmatrix} \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 & a^1_2 \\ 0 & 0 & a^2_2 \\ 0 & 0 & a^3_2 \end{bmatrix},

i.e., the second column was copied into the third column of the product.
▲▲▲
Here is another way to define the product of two matrices. Let B ∈ M_{1×m}(F) =
F^m_{row} and A ∈ M_{m×1}(F) = F^m_{col}. We define their product by

BA = \begin{bmatrix} b_1 & b_2 & \cdots & b_m \end{bmatrix} \begin{bmatrix} a^1 \\ a^2 \\ \vdots \\ a^m \end{bmatrix} = \sum_{j=1}^{m} b_j a^j.

Then, for B ∈ M_{p×m}(F) and A ∈ M_{m×n}(F), their product BA ∈ M_{p×n}(F) is
defined by

(BA)^i_j = Row_i(B) ⋅ Col_j(A).

The following relations are useful to remember,

Row_i(AB) = Row_i(A) ⋅ B
Col_j(AB) = A ⋅ Col_j(B).      (2.12)
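The relations (2.12) can be confirmed numerically; a sketch (ours), reusing the mat_mul function sketched above:

    def row(A, i):
        """Row_i(A) as a 1-by-n matrix (0-based index)."""
        return [A[i][:]]

    def col(A, j):
        """Col_j(A) as an m-by-1 matrix."""
        return [[r[j]] for r in A]

    A = [[1, 2], [3, 4]]
    B = [[2, 1], [0, 3]]
    assert mat_mul(row(A, 0), B) == row(mat_mul(A, B), 0)   # Row_i(AB) = Row_i(A) B
    assert mat_mul(A, col(B, 1)) == col(mat_mul(A, B), 1)   # Col_j(AB) = A Col_j(B)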

Exercises

(easy) 2.39 Let

A = \begin{bmatrix} 2 & -1 & 1 \\ 1 & 2 & 1 \end{bmatrix},    B = \begin{bmatrix} 3 \\ 1 \\ -1 \end{bmatrix}    and    C = \begin{bmatrix} 1 & -1 \end{bmatrix}.

Calculate ABC and CAB (advice: before starting to calculate, determine
the size of the matrices in each case).

Solution 2.39:

AB = \begin{bmatrix} 2 & -1 & 1 \\ 1 & 2 & 1 \end{bmatrix} \begin{bmatrix} 3 \\ 1 \\ -1 \end{bmatrix} = \begin{bmatrix} 4 \\ 4 \end{bmatrix},

hence

ABC = \begin{bmatrix} 4 \\ 4 \end{bmatrix} \begin{bmatrix} 1 & -1 \end{bmatrix} = \begin{bmatrix} 4 & -4 \\ 4 & -4 \end{bmatrix}    and    CAB = \begin{bmatrix} 1 & -1 \end{bmatrix} \begin{bmatrix} 4 \\ 4 \end{bmatrix} = [0].

(easy) 2.40 Let

A = \begin{bmatrix} 1 & & \\ & & 1 \\ & 1 & \end{bmatrix},    B = \begin{bmatrix} & & 1 \\ 1 & & \\ & 1 & \end{bmatrix}    and    C = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}.

Calculate AB, BA, ABC and CBA.

Solution 2.40:

AB = \begin{bmatrix} & & 1 \\ & 1 & \\ 1 & & \end{bmatrix},    BA = \begin{bmatrix} & 1 & \\ 1 & & \\ & & 1 \end{bmatrix},    ABC = \begin{bmatrix} g & h & i \\ d & e & f \\ a & b & c \end{bmatrix},    CBA = \begin{bmatrix} b & a & c \\ e & d & f \\ h & g & i \end{bmatrix}.

(intermediate) 2.41 Let

A = \begin{bmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}.

Calculate A^2, A^3, A^4, A^5 and A^6.

Solution 2.41: An explicit calculation yields

A^2 = \begin{bmatrix} 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix},  A^3 = \begin{bmatrix} 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix},  A^4 = \begin{bmatrix} 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix},

and A^5 = A^6 = 0_{5×5}; all powers of A from the fifth on vanish.

(intermediate) 2.42 Find a non-zero matrix A ∈ M_{2×2}(F) satisfying A^2 =
0_{2×2}.

Solution 2.42: Based on the previous exercise,

A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}

is a solution.

(intermediate) 2.43 Let

A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}.

Calculate A^{2020}.

Solution 2.43: Computing the first few powers, you may convince yourself that

A^n = \begin{bmatrix} 1 & n \\ 0 & 1 \end{bmatrix}.

This clearly holds for n = 1. Assume this holds for n − 1. Then,

A^n = A A^{n-1} = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & n-1 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & n \\ 0 & 1 \end{bmatrix}.

In particular, A^{2020} = \begin{bmatrix} 1 & 2020 \\ 0 & 1 \end{bmatrix}.

(intermediate) 2.44 Let A ∈ M_{m×n}(F) and B ∈ M_{k×m}(F). Which of the
following statements is true? If it is, prove it; otherwise provide a counter-
example.

(a) If the first row of A is zero, then the first row of BA is zero.

(b) If the first column of A is zero, then the first column of BA is zero.
(c) If the first two rows of B are zero, then the first two rows of BA are
zero.
(d) If the first two columns of B are zero, then the first two columns of BA
are zero.
(e) If the i-th and the j-th rows of A are equal then the i-th and the j-th
rows of BA are equal.
(f) If the i-th and the j-th columns of A are equal then the i-th and the
j-th columns of BA are equal.
(g) If the i-th and the j-th rows of B are equal then the i-th and the j-th
rows of BA are equal.
(h) If the i-th and the j-th columns of B are equal then the i-th and the
j-th columns of BA are equal.

Solution 2.44: It all hinges on the fact that

Row_i(BA) = Row_i(B) A    and    Col_j(BA) = B Col_j(A).

Hence, the answers are: (a) false, (b) true, (c) true, (d) false, (e) false, (f) true,
(g) true, (h) false. As a counter-example for the first item,

\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} 0 & 0 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}.

(harder) 2.45 Prove or disprove the following statements:

(a) If A, B ∈ M_{n×n}(F) satisfy AB = B and B ≠ 0, then A = I_n.
(b) There exists a matrix A ∈ M_{2×2}(F) satisfying A^2 = −I_2.

Solution 2.45: The first statement is false; for example,

\begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}.

The second statement is true,

\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}.

2.5.4 Algebraic properties of matrix multiplication


Since we have introduced a new operation—a product of matrices—there are
natural questions to raise: (i) is it commutative? (ii) is it associative? (iii)
does this product have a unit element? (iv) is it distributive? (v) how does
it relate to multiplication by a scalar?
For commutativity, for AB and BA to be defined, it must be that if A ∈
M_{m×n}(F), then B ∈ M_{n×m}(F). Take for example, m = n = 2, and consider the
matrices

A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}    and    B = \begin{bmatrix} 2 & 1 \\ 0 & 3 \end{bmatrix}.

Then,

AB = \begin{bmatrix} 2 & 7 \\ 6 & 15 \end{bmatrix}    and    BA = \begin{bmatrix} 5 & 8 \\ 9 & 12 \end{bmatrix},
i.e., matrix multiplication is not commutative.
For associativity, we first note that if A is an m × n matrix, B is an n × p
matrix and C is a p×q matrix, then both (AB)C and A(BC) are well-defined
m × q matrices.

Proposition 2.22 Matrix multiplication is associative: for all A ∈


Mm×n (F), B ∈ Mn×p (F) and C ∈ Mp×q (F),

(AB)C = A(BC).

Proof : Just follow the definition, using the associative properties of both
addition and multiplication in F.

((AB)C)^i_j = \sum_{k=1}^{p} (AB)^i_k c^k_j = \sum_{k=1}^{p} \left( \sum_{s=1}^{n} a^i_s b^s_k \right) c^k_j = \sum_{k=1}^{p} \sum_{s=1}^{n} a^i_s b^s_k c^k_j,

and

(A(BC))^i_j = \sum_{s=1}^{n} a^i_s (BC)^s_j = \sum_{s=1}^{n} a^i_s \left( \sum_{k=1}^{p} b^s_k c^k_j \right) = \sum_{s=1}^{n} \sum_{k=1}^{p} a^i_s b^s_k c^k_j.

Since the order of summation can be interchanged (commutativity of addi-
tion), both expressions are equal. n

Comment: Since (AB)C = A(BC), we may write products ABC unam-
biguously. The same holds for the product of four or more matrices (as long
as they are of compatible size).

Comment: if A is a square matrix, then AA is well-defined. By associativity,
AAA, AAAA, etc., are well-defined, hence we may write A^k, k ∈ N, unambiguously.
Regarding unit elements, we saw that an m × n matrix has a unit element Im
for left-multiplication and a unit element In for right-multiplication.
Next,

Proposition 2.23 Matrix multiplication and matrix addition are distribu-


tive. If A and B are m × n matrices and C and D are n × p matrices, then

(A + B)C = AC + BC and A(C + D) = AC + AD.

Proof : This is left as an exercise. n


Finally, we also have the following form of associativity:

Proposition 2.24 Let A ∈ Mm×n (F) and B ∈ Mn×p (F). Let λ ∈ F. Then,

λ(AB) = (λA)B.

Proof : This is left as an exercise. n

Exercises

(intermediate) 2.46 Prove Propositions 2.23 and 2.24.


Solution 2.46: We prove the first case. By definition,

((A + B)C)^i_j = \sum_{k=1}^{n} (A + B)^i_k c^k_j = \sum_{k=1}^{n} (a^i_k + b^i_k) c^k_j = \sum_{k=1}^{n} a^i_k c^k_j + \sum_{k=1}^{n} b^i_k c^k_j = (AC)^i_j + (BC)^i_j = (AC + BC)^i_j.

2.5.5 Matrix multiplication and block patterns

Consider as an example a product of a 3 × 2 matrix A and a 2 × 5 matrix B.
We may look at this product entry by entry: the entry (AB)^2_4, say, is determined
by multiplying the second row of A and the fourth column of B.
Think now of A and B as matrices which are internally divided into sub-
matrices: partitioning B into column blocks E, F and A into row blocks C, D,
the product AB splits into blocks according to the pattern

            E        F
    C     C⋅E      C⋅F
    D     D⋅E      D⋅F

Here, we partition the rows of A into two groups so that we represent the
matrix A ∈ M_{3×2}(F) as

A = \begin{bmatrix} C \\ D \end{bmatrix},

where C ∈ M_{2×2}(F) and D ∈ M_{1×2}(F). Likewise, we partition the columns of
B into two groups so that we represent the matrix B ∈ M_{2×5}(F) as

B = [E  F],

where E ∈ M_{2×3}(F) and F ∈ M_{2×2}(F). Then, the product AB ∈ M_{3×5}(F) can
be represented as a block matrix

AB = \begin{bmatrix} CE & CF \\ DE & DF \end{bmatrix},

with CE ∈ M_{2×3}(F), CF ∈ M_{2×2}(F), DE ∈ M_{1×3}(F) and DF ∈ M_{1×2}(F).
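The block pattern is easy to confirm on a small 3 × 2 times 2 × 5 example; a sketch (ours, again reusing mat_mul), where the rows of A are split into C, D and the columns of B into E, F:

    A = [[1, 2], [3, 4], [5, 6]]               # 3 x 2
    B = [[1, 0, 1, 0, 1], [0, 1, 0, 1, 0]]     # 2 x 5
    C, D = A[:2], A[2:]                        # C: 2 x 2,  D: 1 x 2
    E = [r[:3] for r in B]                     # E: 2 x 3
    F = [r[3:] for r in B]                     # F: 2 x 2

    top = [r1 + r2 for r1, r2 in zip(mat_mul(C, E), mat_mul(C, F))]
    bot = [r1 + r2 for r1, r2 in zip(mat_mul(D, E), mat_mul(D, F))]
    assert top + bot == mat_mul(A, B)          # AB assembles block by block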

2.5.6 Invertible matrices


In this section we consider the algebra of n×n matrices with values in F, which
we denote by Mn (F) (as short-hand notation for Mn×n (F)). Such matrices
are called square matrices (!‫ ;)מטריצות ריבועיות‬they have the property that
their product yields once again a matrix of the same type. Note that in the
context of linear systems, square matrices represent systems of equations in
which the number of equations equals the number of variables.

Definition 2.25 A matrix A ∈ Mn (F) is called invertible (!‫ )הפיכה‬if there


exists a matrix B ∈ Mn (F) such that
BA = AB = In .
The matrix B is called an inverse (!‫ )הפכית‬of the matrix A. The set of n × n
invertible matrices is denoted by GLn (F).

Comments:

(a) By definition, if A is invertible and B is an inverse of A, then B is


invertible and A is an inverse of B.
(b) At this stage, we are referring to an inverse rather than the inverse
because we don’t (yet) know whether there exists a unique inverse.

Example: The matrix In is invertible as


In In = In ,
i.e., In is its own inverse. ▲▲▲

Comment: If a matrix A ∈ M_n(F) has a row whose entries are all zero or a
column whose entries are all zero, then it is not invertible. Why? Suppose
that the i-th row of A is zero. Then, for every matrix B ∈ M_n(F),

(AB)^i_i = Row_i(A) ⋅ Col_i(B) = 0_F ≠ 1_F = (I_n)^i_i,

i.e., AB cannot be equal to I_n. Similarly, if the i-th column of A is zero, then
for every matrix B ∈ M_n(F),

(BA)^i_i = Row_i(B) ⋅ Col_i(A) = 0_F ≠ 1_F = (I_n)^i_i,

and BA cannot be equal to I_n.


In fact, there are many more matrices that are not invertible:

Proposition 2.26 Let A ∈ M_n(F). If there exists a non-zero matrix C ∈
M_n(F) such that AC = 0, then A is not invertible.

Proof : Suppose by contradiction that A is invertible. That is, there exists
a matrix B ∈ M_n(F) such that BA = I_n. Using the associativity of matrix
multiplication,
multiplication,

C = In C = (BA)C = B(AC) = B ⋅ 0n×n = 0n×n ,

which is a contradiction, because we assumed that C was not a zero matrix.


n

Example: In the case of 2 × 2 matrices we can find “by hand” a complete
characterization of all the invertible matrices. Let

A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} ≠ 0_{2×2}.

A direct calculation shows that

\begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix} = \begin{bmatrix} d & -b \\ -c & a \end{bmatrix} \begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} ad - bc & \\ & ad - bc \end{bmatrix} = (ad − bc) I.

There are now two possibilities: if ad − bc = 0_F then

\begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix} = 0_{2×2},

and by Proposition 2.26, A is not invertible. If, however, ad − bc ≠ 0_F, then
A is invertible with

\frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}

being an inverse. ▲▲▲

Comment: The scalar ad − bc is known as the determinant (!‫ )דטרמיננטה‬of


the matrix A, denoted
det A = ad − bc.
We will study determinants of general square matrices later on in this course.
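The 2 × 2 computation above amounts to a three-line function; a sketch (ours, assuming integer or rational entries so that 1/(ad − bc) is exact):

    from fractions import Fraction

    def inverse_2x2(A):
        """Return the inverse of a 2x2 matrix, or None if ad - bc = 0."""
        (a, b), (c, d) = A
        det = a * d - b * c
        if det == 0:
            return None                    # not invertible, by Proposition 2.26
        f = Fraction(1, det)
        return [[ f * d, -f * b],
                [-f * c,  f * a]]

    print(inverse_2x2([[1, 3], [2, 6]]))   # None (the matrix of Exercise 2.48)
    print(inverse_2x2([[1, 2], [3, 4]]))   # [[-2, 1], [3/2, -1/2]]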

Lemma 2.27 Let A, L, R ∈ Mn (F) be such that

LA = In and AR = In .

(The matrix L is called a left-inverse (!‫ )הפכית שמאלית‬of A and the matrix
R is called a right-inverse (!‫ )הפכית ימנית‬of A.) Then

L = R,

and A is invertible.

Proof : Using the associativity of matrix multiplication,

L = LIn = L(AR) = (LA)R = In R = R,

i.e., L = R. By definition L(= R) is an inverse of A. n

Corollary 2.28 If A ∈ Mn (F) is invertible, then its inverse is unique.



Proof : Suppose that L, R ∈ M_n(F) are both inverses of A. By definition,

LA = I_n    and    AR = I_n.

By Lemma 2.27, L = R, proving the uniqueness of the inverse of A. n


Since an invertible matrix A has a unique inverse, we can introduced a no-
tation for its inverse: A−1 .
We now further characterize the set GLn (F) of invertible n × n matrices with
entries in F.

Proposition 2.29 Let A, B ∈ Mn (F). Then,

(a) If A is invertible, so is A−1 and (A−1 )−1 = A


(b) If A and B are both invertible, then so is AB and

(AB)−1 = B −1 A−1 .

(Note the inversion in the order of B −1 and A−1 relative to A and B,


and recall that matrix multiplication is not commutative.)

Proof : By definition of the inverse,

AA−1 = A−1 A = In ,

which proves, by definition, that A−1 is invertible and A is its inverse. For
the second statement, using the associativity of matrix multiplication,

(B −1 A−1 )(AB) = B −1 (A−1 A)B = B −1 In B = B −1 B = In


(AB)(B −1 A−1 ) = A(BB −1 )A−1 = AIn A−1 = AA−1 = In ,

proving that B −1 A−1 is the inverse of AB. n

Corollary 2.30 Any (finite) product of invertible matrices is invertible.

Proof : This is left as an exercise. n



Comment: We have just seen that the set GLn (F) satisfies the following
properties:

(a) It is not empty.


(b) It is endowed with a product that takes two elements of GLn (F) and
returns an element in GLn (F).
(c) It has a unit element In satisfying AIn = In A = A for every A ∈ GLn (F).
(d) Every A ∈ GLn (F) has a B ∈ GLn (F) satisfying AB = BA = In .

Such a structure is called a group (!‫ ;)חבורה‬it is the main subject of a second-
year course in algebra. Note that a group, unlike a field, is endowed with
only one algebraic operation, which does not need to be commutative. The
notation GLn stands for the general linear group.

Exercises

(easy) 2.47 Let A ∈ M_n(F). Show that if there exists a non-zero matrix
C ∈ M_n(F) such that CA = 0, then A is not invertible.
Solution 2.47: Suppose that A is invertible, then
C = C(AA−1 ) = (CA)A−1 = 0 ⋅ A−1 = 0,

which is a contradiction.

(easy) 2.48 Is the matrix

\begin{bmatrix} 1 & 3 \\ 2 & 6 \end{bmatrix}

invertible? If it is, what is its inverse?

Solution 2.48: No. Its determinant is zero.

(easy) 2.49 Is the matrix

\begin{bmatrix} 1 & & \\ & 3 & \\ & & 6 \end{bmatrix}

invertible? If it is, what is its inverse?

Solution 2.49: Yes. Its inverse is

\begin{bmatrix} 1 & & \\ & 1/3 & \\ & & 1/6 \end{bmatrix}.

(intermediate) 2.50 Prove that for every k ∈ N, a product of k invertible


matrices is invertible.

Solution 2.50: We have seen that the product of two invertible matrices is invertible.
To show that a product of k invertible matrices is invertible we proceed inductively.

(intermediate) 2.51 Prove or disprove the following statements:

(a) If A, B, C ∈ Mn (F) satisfy that A is invertible and AB = AC, then


B = C.
(b) If A, B ∈ GLn (F), then A + B ∈ GLn (F).
(c) If A, B ∈ Mn (F) are not invertible, then A + B is not invertible.
(d) If A ∈ GLn (F), then A3 ∈ GLn (F).

Solution 2.51:

(a) Yes. Multiply both sides on the left by A−1 .


(b) No. Take A = I_2 and B = −I_2.
(c) No. Take

A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}    and    B = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}.

(d) Yes. Any product of invertible matrices is invertible.



2.5.7 Elementary matrices


Next, we relate matrix multiplication to elementary row-operations. When
we start with an m × n matrix, an elementary row-operation yields a new
matrix of the same size, with each row being a linear combination of the
rows of the original matrix. In other words, an elementary row-operation
can be represented by a left-multiplication by an m × m matrix.

Definition 2.31 An m × m matrix E is called an elementary matrix


(!‫ )מטריצה יסודית‬if there exists an elementary row-operation e such that for
every matrix A (having m rows) EA = e(A).

Example: Let m = 2. Then, the elementary matrices obtained by multiply-
ing the first and second rows by 0 ≠ c ∈ F are

\begin{bmatrix} c & \\ & 1 \end{bmatrix}    and    \begin{bmatrix} 1 & \\ & c \end{bmatrix}.

The elementary matrices corresponding to adding s times the first row to the
second row, and s times the second row to the first row, are

\begin{bmatrix} 1 & \\ s & 1 \end{bmatrix}    and    \begin{bmatrix} 1 & s \\ & 1 \end{bmatrix}.

▲▲▲
More generally, we denote by D_k(a) the elementary matrix multiplying the
k-th row by a ≠ 0. It is easy to verify that since

(e(A))^i_j = \begin{cases} a\, A^i_j & i = k \\ A^i_j & i \neq k, \end{cases}

it follows that

(D_k(a))^i_j = \begin{cases} 1 & i = j \neq k \\ a & i = j = k \\ 0 & \text{otherwise}. \end{cases}


Example: The elementary matrix corresponding to multiplying the second
row by c ≠ 0 is

\begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & c & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{bmatrix}.

▲▲▲
We denote by T_k^ℓ(c) the elementary matrix adding c times the ℓ-th row to
the k-th row. It is easy to verify that since

(e(A))^i_j = \begin{cases} A^i_j & i \neq k \\ A^k_j + c\, A^ℓ_j & i = k, \end{cases}

it follows that

(T_k^ℓ(c))^i_j = \begin{cases} 1 & i = j \\ c & i = k,\ j = ℓ \\ 0 & \text{otherwise}. \end{cases}
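The formulas for D_k(a) and T_k^ℓ(c) translate directly into constructors; a sketch (ours, 0-based indices), checked against the row-operation functions and the mat_mul function sketched earlier:

    def identity(m):
        return [[1 if i == j else 0 for j in range(m)] for i in range(m)]

    def D(m, k, a):
        """D_k(a): the identity with entry (k, k) replaced by a (a != 0)."""
        E = identity(m)
        E[k][k] = a
        return E

    def T(m, k, l, c):
        """T_k^l(c): the identity with entry (k, l) set to c (k != l)."""
        E = identity(m)
        E[k][l] = c
        return E

    A = [[2, 1, -1, 3], [1, -2, 0, 1], [0, 0, 2, 1]]
    assert mat_mul(D(3, 0, -2), A) == scale_row(A, 0, -2)    # EA = e(A)
    assert mat_mul(T(3, 2, 0, 3), A) == add_row(A, 2, 0, 3)  # EA = e(A)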

Proposition 2.32 An elementary matrix is invertible and its inverse is


again an elementary matrix; hence a product of elementary matrices is in-
vertible.

Proof : We show first that

(D_k(a))^{-1} = D_k(a^{-1}).

Indeed,

(D_k(a) ⋅ D_k(a^{-1}))^i_j = \sum_{p=1}^{m} (D_k(a))^i_p (D_k(a^{-1}))^p_j = \begin{cases} 0 & i \neq j \\ 1 \cdot 1 & i = j \neq k \\ a \cdot a^{-1} & i = j = k, \end{cases}

i.e., the product is the identity matrix.

In a similar way, we show that


(Tk` (a))−1 = Tk` (−a).
Finally, since any product of invertible matrices is invertible, it follows that
any product of elementary matrices is invertible (see Exercise 2.50). n

Corollary 2.33 Two matrices A, B ∈ Mm×n (F) are row-equivalent if and


only if there exists an m × m matrix P , which is a product of elementary
matrices (hence in GLm (F)), such that

B = P A.

Proof : By definition, A and B are row-equivalent if and only if there exists


a sequence of elementary row-operations e1 , . . . , ek , such that

B = ek (ek−1 ⋯(e1 (A))).

We have just seen that every elementary row-operation is realized by a left-
multiplication by an elementary matrix. That is, there exists a sequence
E_1, . . . , E_k of elementary matrices such that

B = E_k(E_{k-1} . . . (E_1 A)).

Since matrix multiplication is associative, we obtain the desired result with
P = E_k E_{k-1} . . . E_1. Conversely, if B = PA with P a product of elementary
matrices, then B is obtained from A by the corresponding sequence of
elementary row-operations, hence A and B are row-equivalent. n

Corollary 2.34 To every matrix A ∈ Mm×n (F) there exists a matrix P ∈


GLm (F), which is a product of elementary matrices, such that

R = PA

is a row-reduced echelon matrix.

Proof : By the Gauss-Jordan algorithm, A is row-equivalent to a row-reduced


echelon matrix R. By Corollary 2.33, there exists a matrix P ∈ GLm (F),
which is a product of elementary matrices, such that R = P A. n

Exercises
(easy) 2.52 Show by a direct calculation that the elementary 2 × 2 matrix

T_1^2(c) = \begin{bmatrix} 1 & c \\ 0 & 1 \end{bmatrix}

is invertible.

Solution 2.52: By a direct calculation,

\begin{bmatrix} 1 & c \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & -c \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.

(easy) 2.53 Write down explicitly the elementary matrix corresponding to
the elementary row-operation of adding c times row 4 to row 2 for m = 5.

Solution 2.53:

T_2^4(c) = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & c & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}.

(intermediate) 2.54 For each of the following matrices, determine whether
it is a product of elementary matrices; if it is, find its inverse:

A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix},  B = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix},  C = \begin{bmatrix} 0 & 1 & 2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},  D = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 2 & 1 \end{bmatrix},  E = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix}.

Solution 2.54: We solve the first:

\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}.

Its inverse is therefore

\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix}.

2.5.8 Elementary matrices and invertibility


We will now use the last results to state conditions under which a square
matrix is invertible.

Theorem 2.35 Let A ∈ Mm (F). Then the following statements are equiva-
lent:

(a) A is invertible.
(b) A is row-equivalent to Im .
(c) A is a product of elementary matrices.

Comment: When we say that three (or more) statements are equivalent, it
means that if one of them is true, then all of them are true, and equivalently,
if one of them is false, then all are false. To prove it, it suffices to prove that
the first statement implies the second, that the second implies the third, and
so on, and finally that the last implies the first.

Proof : Statement (b) implies statement (c) as if A is row-equivalent to Im ,


then there exist elementary matrices E1 , . . . , Es such that

A = Es Es−1 . . . E1 Im = Es Es−1 . . . E1 .

Statement (c) implies statement (a) by Proposition 2.32. Thus, it only re-
mains to prove that an invertible matrix is row-equivalent to Im .
Let R be a row-reduced echelon matrix which is row-equivalent to A. Since
R and A are row-equivalent, there exists a matrix P , which is a product of
elementary matrices, such that R = P A, hence R is invertible. It follows that
R does not have a row that is zero, but a square row-reduced echelon matrix
which has no zero rows can only be the identity matrix. n
In fact, the inverse of an invertible matrix can be calculated as follows:

Proposition 2.36 Let A ∈ GLm (F) and let P be a product of elementary


matrices such that P A = Im . Then,

P Im = A−1 .

That is, the sequence of operations reducing A to Im is the same sequence


bringing Im to A−1 .

Proof : We have
(P Im )A = P (Im A) = P A = Im ,
which, by the uniqueness of the matrix inverse proves that P Im = A−1 . n
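Proposition 2.36 yields an algorithm for computing inverses: reduce the augmented block [A | I_m] and, if the left block becomes I_m, read A^{-1} off the right block. A sketch (ours), reusing the gauss_jordan function sketched above:

    def inverse(A):
        """Invert A by row-reducing [A | I]; return None if A is singular."""
        m = len(A)
        I = [[1 if i == j else 0 for j in range(m)] for i in range(m)]
        R = gauss_jordan([row + I[i] for i, row in enumerate(A)])
        left = [row[:m] for row in R]
        if left != I:                  # A is not row-equivalent to I_m,
            return None                # hence not invertible (Theorem 2.35)
        return [row[m:] for row in R]

    print(inverse([[1, 2], [3, 4]]))   # [[-2, 1], [3/2, -1/2]] (as Fractions)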
We finally relate the property of being invertible to the existence of solutions
to linear systems of equations:

Theorem 2.37 Let A ∈ Mm (F). The following statements are equivalent:

(a) A is invertible.
(b) The homogeneous system AX = 0 only has the trivial solution.
(c) For every m × 1 matrix b, the system AX = b is consistent and its
solution is unique.

Proof : Suppose that Statement (a) holds, i.e., A is invertible. On the one
hand, x = A−1 b is a solution; on the other hand, if Ax = b, then

x = Ix = A−1 Ax = A−1 b,

i.e., a solution exists and it is unique (because we actually determined what it


must be), so that Statement (c) holds. Statement (b) is a particular example
of Statement (c), so that (c) implies (b). It remains to prove that Statement
(b) implies Statement (a).
Suppose, by contradiction, that Statement (b) holds and that A is not in-
vertible. Let R be the row-reduced echelon matrix which is row-equivalent
to A. Since A is not invertible, R is not the identity matrix, hence it has at

least one row identically zero. It follows that it has at least one free variable,
contradicting the fact that AX = 0 has a unique solution (since its solutions
are the same as the solutions of RX = 0). n
With that we finally have:

Corollary 2.38 A square matrix having either a left- or a right-inverse is


invertible. That is, let A ∈ Mn (F). If there exists an L ∈ Mn (F) such that
LA = In , or if there exists an R ∈ Mn (F) such that AR = In , then A is
invertible.

Proof : Suppose for example that A has a left-inverse L. Let x be a solution


to AX = 0, then
x = In x = LAx = L(Ax) = 0,
i.e., 0 is the unique solution to AX = 0 implying by Theorem 2.37 that A is
invertible. On the other hand, if R is a right-inverse of A, then A is a left-
inverse for R, hence R is invertible, and AR = In implies that A is invertible
as well. n

Corollary 2.39 A product of square matrices is invertible if and only if


every matrix in this product is invertible.

Proof : We will show it for a product of two matrices; the general case can
be shown inductively. We already know that a product of invertible matrices
is invertible—we will now show that if AB is invertible then both A and B
are invertible. Let C be the inverse of AB, then

(AB)C = A(BC) = I,

i.e., A has a right-inverse, and by Corollary 2.38 it is invertible. Likewise,

C(AB) = (CA)B = I,

i.e., B has a left-inverse, and by Corollary 2.38 it is invertible. n


As a final note, if A ∈ GLn (F), then the linear system (whether homogeneous
or not)
AX = b
has a unique solution,
x = A−1 b.
In this section we obtained an algorithm (using the Gauss-Jordan procedure)
for calculating A−1 . Note, however, that this chapter has a much wider scope,
encompassing linear systems of arbitrary m, n, whether consistent or not.
Exercises
(easy) 2.55 Let A be an m × m matrix. Prove that if A is invertible and
AB = 0 for some m × m matrix B, then B = 0.
Solution 2.55: Multiply both sides of AB = 0 by A−1 .
(intermediate) 2.56 For each of the two matrices


⎡2 5 −1⎤ ⎡1 −1 2 ⎤
⎢ ⎥ ⎢ ⎥
⎢ ⎥ ⎢ ⎥
⎢4 −1 2 ⎥ and ⎢3 2 4 ⎥
⎢ ⎥ ⎢ ⎥
⎢6 4 1 ⎥ ⎢0 1 −2⎥
⎣ ⎦ ⎣ ⎦
find using elementary row-operations whether they are invertible and find
their inverses if they are (this is quite tedious, but one has to do it at some
point...).
Solution 2.56: Recall that a square matrix is invertible if and only if it is row-equivalent to the
identity matrix, which is its row-reduced echelon form. Moreover, the inverse is obtained
by performing the same sequence of operations on the identity matrix. The first matrix
turns out not to be invertible.
(intermediate) 2.57 Is the matrix
⎡1 2 3 4⎤
⎢0 2 3 4⎥
⎢0 0 3 4⎥
⎣0 0 0 4⎦
invertible? If it is, what is its inverse?
Solution 2.57: The inverse is
⎡1  −1    0     0 ⎤
⎢0  1/2  −1/2   0 ⎥
⎢0   0   1/3  −1/3⎥
⎣0   0    0   1/4 ⎦ .
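As a quick sanity check, one can multiply the claimed inverse by the matrix and compare with the identity; the following sketch (Python with exact rational arithmetic, our own illustrative choice) does exactly that.

```python
from fractions import Fraction as F

A = [[1, 2, 3, 4], [0, 2, 3, 4], [0, 0, 3, 4], [0, 0, 0, 4]]
B = [[F(1), F(-1), F(0), F(0)],
     [F(0), F(1, 2), F(-1, 2), F(0)],
     [F(0), F(0), F(1, 3), F(-1, 3)],
     [F(0), F(0), F(0), F(1, 4)]]

# Form the product BA entry by entry and compare with the identity matrix.
BA = [[sum(B[i][k] * A[k][j] for k in range(4)) for j in range(4)] for i in range(4)]
assert BA == [[F(int(i == j)) for j in range(4)] for i in range(4)]
print("BA = I, so B is indeed the inverse of A")
```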
(intermediate) 2.58 Let A be a 2 × 1 matrix and let B be a 1 × 2 matrix.
Show that the 2 × 2 matrix AB is not invertible.
Solution 2.58: Suppose that B = [a b]. Then, for
X = [b, −a]T ,
we have BX = 0, hence (AB)X = 0. If a = b = 0 then AB = 0 and it is not invertible.


Otherwise, ABX = 0 has a non-trivial solution, which implies that AB is not invertible.
(intermediate) 2.59 The following matrices are over R. Determine whether
they are invertible, and if they are, find their inverses:
⎡2 1 2⎤ ⎡1 1 1⎤ ⎡1 1 1⎤
⎢ ⎥ ⎢ ⎥ ⎢ ⎥
⎢ ⎥ ⎢ ⎥ ⎢ ⎥
U1 = ⎢4 0 3⎥ U2 = ⎢1 1 1⎥ U3 = ⎢0 1 0⎥
⎢ ⎥ ⎢ ⎥ ⎢ ⎥
⎢0 3 5⎥ ⎢0 1 1⎥ ⎢1 0 1⎥
⎣ ⎦ ⎣ ⎦ ⎣ ⎦
⎡1 1 1⎤ ⎡2 3 4⎤ ⎡1 3 4⎤
⎢ ⎥ ⎢ ⎥ ⎢ ⎥
⎢ ⎥ ⎢ ⎥ ⎢ ⎥
U4 = ⎢0 1 0⎥ U5 = ⎢0 1 2⎥ U6 = ⎢2 4 0⎥
⎢ ⎥ ⎢ ⎥ ⎢ ⎥
⎢0 1 1⎥ ⎢0 0 3⎥ ⎢3 1 1⎥
⎣ ⎦ ⎣ ⎦ ⎣ ⎦
(intermediate) 2.60 Consider the matrix
    ⎡ 1    0    3 ⎤
A = ⎢3/2  1/2   3 ⎥ .
    ⎣ −5   4   −9 ⎦
If possible, express it as a product of elementary matrices. If not, explain
why.
(intermediate) 2.61 Let A ∈ Mn (F) satisfy the equation
A2 − A + I = 0.
Show that A is invertible and express its inverse in terms of A.
Solution 2.61: Since A(I − A) = I, it follows that A (and I − A) are invertible and
A−1 = I − A.
(intermediate) 2.62 Let A ∈ Mn (F) satisfy the equation
A3 − 2A + I = 0.
Show that A is invertible and express its inverse in terms of A.

Solution 2.62: Since A(2I − A2 ) = I, it follows that A (and 2I − A2 ) are invertible, and
A−1 = 2I − A2 .
(intermediate) 2.63 Show that if A, B ∈ Mn (F) satisfy that AB 2 − A is
invertible, then BA − A is invertible.
Solution 2.63: Since AB 2 − A = A(B + I)(B − I) is invertible, it follows that A and
B − I are invertible, hence so is (B − I)A = BA − A.
(intermediate) 2.64 Let A ∈ Mn (F), B ∈ Mk (F) and D ∈ Mn×k (F). Con-
sider the block matrix C ∈ Mn+k (F) given by
    ⎡A  D⎤
C = ⎣0  B⎦ .

Show that if A and B are invertible, then so is C. What about the converse?
Solution 2.64: Suppose that A and B are invertible. Consider the equation CX = 0.
If we partition the n + k rows of a solution into n rows (which we denote x) and k rows
(which we denote y), we obtain two equations,
Ax + Dy = 0 and By = 0.
Since B is invertible, y = 0, and the first equation reduces to Ax = 0. Since A is invertible,
then x = 0 as well; since the equation CX = 0 only has the trivial solution, C is invertible.
(harder) 2.65 Show that if A, B, A + B ∈ GLn (F), then
A−1 + B −1 ∈ GLn (F).
Solution 2.65: We have
A−1 + B −1 = B −1 BA−1 + B −1 AA−1 = B −1 (B + A)A−1 ,
which is a product of invertible matrices.
(harder) 2.66 Let A be an m × m matrix. Prove that if A is not invertible
then there exists a non-zero m × m matrix B such that AB = 0.
Solution 2.66: If A is not invertible, then there exists a non-zero m × 1 matrix X such
that AX = 0. Take the matrix B whose first column is X and the other columns are zero.
Then, AB = 0.
(harder) 2.67 Let A be an m × n matrix with n < m, and let B be an n × m
matrix. Show that AB is not invertible. Hint: what can you say about the
homogeneous system BX = 0?
Solution 2.67: The equation BX = 0 has non-trivial solutions, since the row-reduced ech-
elon matrix which is row-equivalent to B has at least m − n free variables. It follows that
(AB)X = 0 has non-trivial solutions, proving that AB is not invertible.
2.6 The structure of the set of solutions


2.6.1 The homogeneous case
Let A ∈ Mm×n (F). Consider the set S[A∣0] of all solutions x ∈ Mn×1 (F) = Fncol
to the homogeneous system
AX = 0.
This set turns out to have interesting properties, which will play a central
role throughout this course:
Theorem 2.40 Let A ∈ Mm×n (F). Then,
(a) If u, v ∈ S[A∣0] , then u + v ∈ S[A∣0] .
(b) If u ∈ S[A∣0] and λ ∈ F, then λ u ∈ S[A∣0] .
(In other words, the set of solutions of a homogeneous system is closed under
addition and under scalar multiplication.)
Proof : For the first statement, if u, v ∈ S[A∣0] , then
Au = 0 and Av = 0,
from which it follows by distributivity that
A(u + v) = Au + Av = 0,
namely u + v ∈ S[A∣0] . For the second statement, if Au = 0, then
A(λu) = λ(Au) = 0,
namely λu ∈ S[A∣0] . Note that we used here the fact that for every i = 1, . . . , m,
(A(λu))i = ∑_{k=1}^{n} aik (λu)k = ∑_{k=1}^{n} λ aik uk = λ ∑_{k=1}^{n} aik uk = (λ(Au))i .
n
Example: Consider the case of m = 1 and n = 2,
X 1 + X 2 = 0.
The set of solutions of this “system” of equations is
S[1,1∣0] = {[t, −t]T ∶ t ∈ F} .
Take any two elements, [t, −t]T , [s, −s]T ∈ S[1,1∣0] ; their sum
[t, −t]T + [s, −s]T = [t + s, −(t + s)]T
is also an element of S[1,1∣0] . Likewise, for every λ ∈ F,
λ[t, −t]T = [λt, −λt]T
is an element of S[1,1∣0] . ▲▲▲
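A tiny numerical illustration of Theorem 2.40 for this example over R (the Python/numpy rendering is our own sketch, with arbitrary sample values):

```python
import numpy as np

A = np.array([[1.0, 1.0]])          # the "system" X1 + X2 = 0
t, s, lam = 3.0, -7.0, 2.5          # arbitrary sample values
u = np.array([t, -t])               # a solution of AX = 0
v = np.array([s, -s])               # another solution

print(A @ (u + v))    # [0.] : the sum is again a solution
print(A @ (lam * u))  # [0.] : so is any scalar multiple
```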
2.6.2 The inhomogeneous case
Consider next an inhomogeneous system,
AX = b,
where A ∈ Mm×n (F) and b ∈ Fm col . Do we get here the same phenomenon?
Is it true that u, v ∈ S[A∣b] implies that u + v ∈ S[A∣b] ? Let’s verify it. If
u, v ∈ S[A∣b] , then
Au = b and Av = b,
and
A(u + v) = Au + Av = b + b
which differs from b unless b = 0 (note that we don’t write b + b = 2b as we
are in a general field). Thus, unless the system is homogeneous, u+v ∈/ S[A∣b] .
The following theorem shows that the set of solution of an inhomogeneous
system has its own algebraic structure:
Theorem 2.41 Let A ∈ Mm×n (F) and b ∈ Fm col . If u ∈ S[A∣b] and v ∈ S[A∣0] ,
then u + v ∈ S[A∣b] .
Proof : Let u ∈ S[A∣b] and v ∈ S[A∣0] , namely,
Au = b and Av = 0.
Then,
A(u + v) = Au + Av = b + 0 = b,
which means that u + v ∈ S[A∣b] . n
In fact, we can prove something even stronger.
Theorem 2.42 Let A ∈ Mm×n (F) and b ∈ Fm col . Suppose that the inhomoge-
neous system is consistent, namely, that there exists x ∈ Fncol satisfying
Ax = b.
Then, every u ∈ S[A∣b] can be represented as
u = x + v,
for some v ∈ S[A∣0] .
Proof : Let u ∈ S[A∣b] , and write
u = x + (u − x),
where we denote v = u − x. Now,
Av = A(u − x) = Au − Ax = b − b = 0,
i.e., v ∈ S[A∣0] . n
In other words, if an inhomogeneous system is consistent, every solution can
be represented as the sum of one particular solution and a solution of the
corresponding homogeneous system.
Example: Consider the inhomogeneous system with m = 1 and n = 2,
X 1 + X 2 = 5.
The set of solutions of this “system” of equations is
S[1,1∣5] = {[t, 5 − t]T ∶ t ∈ F} .
Note that
x = [0, 5]T
is a particular solution of this system, and that the set of solutions can be
written as
S[1,1∣5] = {[0, 5]T + [t, −t]T ∶ t ∈ F} .
That is, every solution can be represented as the sum of one particular solu-
tion and a solution of the homogeneous system. ▲▲▲
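The same decomposition is easy to watch numerically; a minimal sketch over R (our own illustration, with sample values of t):

```python
import numpy as np

# The system X1 + X2 = 5: a particular solution plus homogeneous solutions.
A = np.array([[1.0, 1.0]])
x_part = np.array([0.0, 5.0])       # one particular solution

for t in (-2.0, 0.0, 3.5):          # sample homogeneous solutions [t, -t]
    u = x_part + np.array([t, -t])
    print(u, A @ u)                 # every such u satisfies A @ u == [5.]
```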
2.7 The geometry of solutions

2.7.1 Affine spaces
We end this chapter on linear systems of equations by presenting a geometric
interpretation of the set of solutions of systems AX = b. To this end we
introduce an algebraic construct called an affine space (!‫ )מרחב אפיני‬over a
field F. The introduction will be somewhat less formal than the standard of
this course, because the goal here is mainly to develop some intuition.
An affine space over a field F encompasses two (non-empty) sets: a set of
so-called points (!‫ )נקודות‬and a set of so-called translations (!‫)הזזות‬. To
distinguish between the two, we denote the points by uppercase roman char-
acters, e.g., P, Q, . . . , and we denote the translations by lowercase roman
characters, e.g., u, v, . . . .
It is useful to think of the points as actual points (say on a plane) and of
the translations as arrows on that same plane. Translations act on points by
translating them into other points. We denote this action using the addition
sign: a translation v acting on a point P yields a point which we denote by
P + v.
In other words there exists a function of the type
+ ∶ points × translations → points.
This is the image one should have in mind:
(figure: the translation v carries the point P to the point P + v)
The rule is that for every two points P, Q there exists a unique translation
v such that Q = P + v; we denote this unique translation by Q − P , which is
sometimes also denoted by P⃗Q.
Comments:
(a) There is no meaning to adding two points! Thus far, the addition
operation represents only the action of a translation on a point.
(b) An affine space does not come equipped with a special point, such as
an origin.
Translations can be composed. One can act on a point P by a translation v
and then act on the resulting point by a translation w, as depicted below:
(figure: the translation v followed by the translation w, together with the single composite translation v + w from P )
In an affine space, an addition of translations is defined, satisfying the rule


that v + w is the translation equivalent to a translation by v followed by a
translation by w, that is,

P + (v + w) = (P + v) + w.
Note the difference between the types of addition on both sides of the equa-
tion. The addition on the left-hand side is a function
+ ∶ translations × translations → translations.
The addition of translations is assumed to be associative and commutative.


Also, there exists a zero translation, which we denote by 0, satisfying for
every point P ,
P + 0 = P.

This last point merits some elaboration. By assumption, there exists for a
point P a unique translation v satisfying
P + v = P,
and there exists for a point Q a unique translation w satisfying
Q + w = Q.
The claim is that v = w, so that there exists a single translation which leaves
all points unaffected. Why is this? Because if Q − P = u, i.e., Q = P + u, then
Q + v = (P + u) + v = P + (u + v) = P + (v + u) = (P + v) + u = P + u = Q,
i.e., both Q + w = Q and Q + v = Q, and by the uniqueness assumption, v = w.
Also, given a point P and a translation v, there exists a unique translation
w, such that
(P + v) + w = P,
i.e.,
P + (v + w) = P,
from which we deduce that v + w = 0, i.e., every translation has an additive
inverse.
Thus far, the field F has played no role. An affine space is endowed with an
additional operation, which is the scalar multiplication of a translation by a
scalar (think of it as a scaling of the translation): for a translation v and a
scalar λ ∈ F, one forms a product λv, which is a translation. That is,
⋅ ∶ scalars × translations → translations.
Scalar multiplication is associative, in the sense that
α(βv) = (αβ)v,
has a neutral element,
1F ⋅ v = v,
and is distributive both over scalar addition,
(α + β)v = αv + βv,
and over the addition of translations,
α(u + v) = αu + αv.

2.7.2 The affine space An (F)
Thus far, the discussion was completely general. We now consider a particu-
lar instance of an affine space. Let n ∈ N be any natural number. The affine
space An (F) is defined as follows: the points belong to the set
An (F) = {(p1 , . . . , pn ) ∶ p1 , . . . , pn ∈ F} ,
whereas the translations belong to the set
V n (F) = {[v 1 , . . . , v n ] ∶ v 1 , . . . , v n ∈ F} .
Note that these two sets are essentially “the same”, and we use different
parentheses to distinguish between the two.
The action of a translation on a point is defined by
(p1 , . . . , pn ) + [v 1 , . . . , v n ] = (p1 + v 1 , . . . , pn + v n ).
For the operations on translations, we exploit the fact that we use a matrix
notation so that addition and scalar multiplication have already been defined.
And now we connect this geometric construct to the set of solutions to linear
system. Let A ∈ Mm×n (F). We interpret the solutions of the system AX = b
(which are n-tuples of field elements) as points in the affine space An (F). In
contrast, we interpret the set of solutions of the homogeneous system AX = 0
(which are also n-tuples of field elements) as the space of translations V n (F).
Theorem 2.42 can then be interpreted as follows. Suppose that the system
AX = b is consistent, i.e., it has at least one solution P (a point in the affine
space). Then, its set of solutions is all the points Q obtained by translating
P by a solution of the homogeneous equation (which is indeed a translation).

2.7.3 Lines in affine spaces


Let A be the set of points in an affine space over F and let V be the set of
translations. A set of points L ⊂ A is called a line (!‫)ישר‬, if there exists a
point P ∈ A and a translation 0 ≠ v ∈ V , such that
L = {P + tv ∶ t ∈ F}.
That is, a line is a set of points obtained by translating a point P ∈ A by all


translations which are multiples of a translation v ∈ V .
(figure: the line through P in the direction of the translation v)
Proposition 2.43 Let a, b ∈ F, which are not both zero and let c ∈ F. Then,
the set of solutions to the equation
aX + bY = c
is a line in the affine space A2 (F).
Proof : We already have a technique for finding the space of solutions S[a,b∣c] .
Suppose first that a ≠ 0F . Then, the extended matrix [a, b∣c] is row-equivalent
to a matrix of the form [1, d∣e]; the corresponding linear systems have the
same solutions. The set of solutions is the set of points
S[1,d∣e] = {(e − dt, t) ∶ t ∈ F} ,
which we can rewrite as
S[1,d∣e] = {(e, 0) + t [−d, 1] ∶ t ∈ F} .
If a = 0 and b ≠ 0, then
S[0,b∣c] = {(t, c/b) ∶ t ∈ F} ,
which we can rewrite as
S[0,b∣c] = {(0, c/b) + t [1, 0] ∶ t ∈ F} ,
which is also a line. n
In fact, the converse is also true:
Proposition 2.44 Let L ⊂ A2 (F) be a line. Then, there exist a, b ∈ F, which
are not both zero and a c ∈ F, such that L is the set of solutions to the linear
equation
aX + bY = c.
Proof : Let
L = {(p1 , p2 ) + t [v 1 , v 2 ] ∶ t ∈ F}
be a line in A2 (F). Let [x1 , x2 ]T ∈ L. Then, there exists a t ∈ F, such that
x1 = p1 + tv 1 and x2 = p2 + tv 2 .
Since the translation is non-zero, either v 1 ≠ 0 or v 2 ≠ 0. If v 1 ≠ 0, then
t = (x1 − p1 )/v 1 ,
so that
x2 = p2 + v 2 (x1 − p1 )/v 1 ,
which we may rewrite as
v 2 x1 − v 1 x2 = v 2 p 1 − v 1 p 2 .
That is, all points in L are solutions of the equation
v 2 X 1 − v 1 X 2 = v 2 p 1 − v 1 p2 .
We can also think of it differently: we are trying to solve a system of two
equations in one unknown,
v 1 t = x1 − p 1
v 2 t = x2 − p 2 .
The extended matrix is
⎡v 1   x1 − p1 ⎤
⎣v 2   x2 − p2 ⎦ .
Suppose that v 1 ≠ 0. Then, the corresponding row-reduced echelon matrix is
⎡1   (x1 − p1 )/v 1 ⎤
⎣0   (x2 − p2 ) − v 2 (x1 − p1 )/v 1 ⎦ .
This equation is consistent if and only if
(x2 − p2 ) − v 2 (x1 − p1 )/v 1 = 0,
which is the same as we obtained before. n
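The computation in this proof is easy to mechanize. Below is a small sketch over R; the helper name line_to_equation is an illustrative choice of ours, not notation from the text.

```python
def line_to_equation(P, v):
    """Given a point P = (p1, p2) and a non-zero direction v = (v1, v2),
    return (a, b, c) with a*X + b*Y = c describing the line {P + t v}."""
    p1, p2 = P
    v1, v2 = v
    # From the proof: v2*x1 - v1*x2 = v2*p1 - v1*p2 for every point on the line.
    return v2, -v1, v2 * p1 - v1 * p2

a, b, c = line_to_equation((1.0, 2.0), (3.0, 4.0))
for t in (-1.0, 0.0, 2.0):          # sample points P + t v all satisfy the equation
    x1, x2 = 1.0 + 3.0 * t, 2.0 + 4.0 * t
    print(a * x1 + b * x2 - c)      # 0.0 for every t
```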
2.7.4 Planes in affine spaces
Let A be the set of points in an affine space over F and let V be the set of
translations. A set of points M ⊂ A is called a plane (!‫)מישור‬, if there exists
a point P ∈ A and two non-zero translations u, v ∈ V , neither of which is a multiple
of the other, such that
M = {P + su + tv ∶ s, t ∈ F}.
That is, a plane is a set of points obtained by translating a point P ∈ A by
all translations which are linear combinations of the two translations u, v ∈ V .
Proposition 2.45 Let a, b, c ∈ F, which are not all zero and let d ∈ F. Then,
the set of solutions to the equation
aX + bY + cZ = d
is a plane in the affine space A3 (F).
Proof : We leave this as an exercise. Separate the cases a ≠ 0; a = 0 but b ≠ 0;
and a = b = 0. n
And conversely,
Proposition 2.46 Let M ⊂ A3 (F) be a plane. Then, there exist a, b, c ∈ F,
which are not all zero and a d ∈ F, such that M is the set of solutions to the
linear equation
aX + bY + cZ = d.
Chapter 3
Vector Spaces
The subject of this course is a theory of sets for which there is a notion of
linear combinations of elements. We have already encountered linear combi-
nations of equations and linear combinations of matrices; we are now going
to formalize axiomatically such sets, which we call vector spaces. Vector
spaces are abundant in mathematics (and its applications in all branches
of science), and their theory is foundational to that branch of mathematics
called algebra.
3.1 Definitions and examples
Definition 3.1 Let F be a field. A vector space (!‫ )מרחב וקטורי‬over F is
a non-empty set V (whose elements we call vectors) on which are defined
two operations: vector addition (!‫)חיבור וקטורי‬,
+ ∶ V × V → V,
taking every u, v ∈ V to an element u + v ∈ V , and scalar multiplication
(!‫)כפל בסקלר‬,
⋅ ∶ F × V → V,
taking every a ∈ F and u ∈ V to an element au ∈ V .
Vector addition satisfies the following properties:

(a) Commutativity: for every u, v ∈ V , u + v = v + u.
(b) Associativity: for every u, v, w ∈ V , (u + v) + w = u + (v + w).
(c) Neutral element: there exists a vector 0V ∈ V (or just 0 in short) such
that for all u ∈ V , u + 0V = u.
(d) Additive inverse: every u ∈ V has an element (−u), such that u + (−u) =
0V .
Scalar multiplication satisfies the following properties:
(e) Identity element: For every u ∈ V , 1F ⋅ u = u.
(f) Associativity: for every a, b ∈ F and every u ∈ V , a(bu) = (ab)u (note
the distinction between the products ⋅ ∶ F × F → F and ⋅ ∶ F × V → V ).
Finally, the two operations satisfy the distributive laws:
(g) For every a ∈ F and u, v ∈ V , a(u + v) = au + av.
(h) For every a, b ∈ F and u ∈ V , (a + b)u = au + bu.
(Note the distinction between the sums + ∶ F × F → F and + ∶ V × V → V .)
Comments:
(a) A vector space hinges on two structures, a set of vectors and a field.
Formally, a vector space is a four-tuple, (V, +, F, ⋅).
(b) Vector spaces are also called linear spaces (!M‫ לינאריי‬M‫)מרחבי‬.
(c) Be careful not to confuse 0F ∈ F and 0V ∈ V , although we often denote
them by the same symbol, 0.
(d) There is no meaning to a product ua, with u ∈ V and a ∈ F (even
though we could have defined it by commutativity).
(e) Vector spaces don’t have a canonical notion of products of vectors.
For those who are acquainted with scalar and vector products, these
products assume additional structure.
(f) Inductively, a vector space is closed under any finite linear combination
of vectors. That is, for every v1 , . . . , vn ∈ V and a1 , . . . , an ∈ F,

a1 v1 + ⋅ ⋅ ⋅ + an vn ∈ V.
We will often write such sums using our notation for matrix multipli-
cation,
a1 v1 + ⋅ ⋅ ⋅ + an vn = (v1 . . . vn ) [a1 , . . . , an ]T .
The interpretation is that the column of scalars “acts” on the row
of vectors to produce a linear combination. At this stage, the role
of matrices enclosed by square bracket becomes “operators” forming
linear combinations. Note that we obtain products such as v1 a1 , which
we interpret as a1 v1 .
(g) Physicists often describe vectors as entities having a “magnitude” and
a “direction”; at this stage (and throughout this course) vectors have
neither magnitudes nor directions.

Example: Let F be any field. A set comprising just one element, V = {0V },
is a vector space with vector addition and scalar multiplication defined the
only possible way, namely
0V + 0V = 0V and a 0V = 0V .
Such a vector space is called the zero space (!‫)מרחב האפס‬, even though strictly
speaking, the vector space ({0V }, +, F, ⋅) is a different space for each field F.
▲▲▲

Example: For any field F and every n ∈ N, the set V = Fn is a vector space
over F with respect to vector addition,
(u1 , . . . , un ) + (v1 , . . . , vn ) = (u1 + v1 , . . . , un + vn ),
and scalar multiplication
a(u1 , . . . , un ) = (au1 , . . . , aun ).
The zero vector of this space is
0Fn = (0F , . . . , 0F ),
and the additive inverse of a vector is given by
−(v1 , . . . , vn ) = (−v1 , . . . , −vn ).
All the vector space axioms follow from the properties of the field F (which
you should verify). Thus, (Fn , +, F, ⋅) is a vector space. The same applies if
we rather consider Fnrow or Fncol . ▲▲▲
Example: In particular, setting n = 1, F is a vector space over itself! That
is, for every field F, (F, +, F, ⋅) is a vector space. This is quite confusing as
the same set plays two different roles. ▲▲▲

Example: Consider the vector space (F2 , +, F, ⋅) and let v1 = (2, 3) and
v2 = (4, 5). The linear combination 8v1 + 9v2 is written using the action of a
matrix,
((2, 3) (4, 5)) [8, 9]T .
▲▲▲
Example: The space of m × n matrices with entries in F is a vector space
over F with respect to vector addition
(A + B)ij = aij + bij
and scalar multiplication
(λ A)ij = λ aij .
The zero element of this space is 0m×n . And don’t be confused: in this vector
space, the vectors are matrices. ▲▲▲
Example: Let S be any non-empty set and let V = Func(S, F) be the space
of functions f ∶ S → F (you will learn about functions in depth in the calculus
course, but let’s just think of a function as a “machine” which when fed with
an element in S, returns an element in F). Then, V is a vector space over F
with respect to vector addition
(f + g)(s) = f (s) + g(s)
and scalar multiplication
(a f )(s) = a f (s).
The zero element of this space is the function returning 0 ∈ F for all s ∈ S.
The additive inverse (−f ) of a function f is the function
(−f )(s) = −f (s).
Thus, (Func(S, F), +, F, ⋅) is a vector space. Once again don’t be confused:
in this vector space, the vectors are functions. ▲▲▲
Example: Another example is that of polynomial spaces (!M‫)מרחבי פולינומי‬.
Let F be a field and let X be a symbol. We denote by F[X] the set of
expressions of the form
P = p0 + p1 X + p2 X 2 + ⋅ ⋅ ⋅ + pn X n ,
where p0 , . . . , pn are scalars and pn ≠ 0. We call pn the leading coefficient
(!‫ מוביל‬M‫ )מקד‬and we call pn X n the leading term (!‫)איבר מוביל‬. To this set
we also add the scalar 0F . If pn is the leading coefficient, we say that P is of
degree (!‫ )דרגה‬n, and write
deg P = n.
The degree of P = 0F is set to be −∞.
Let
P = ∑_{i=0}^{n} pi X i and Q = ∑_{i=0}^{m} qi X i ,
where without loss of generality, m ≤ n. Then, we define
P + Q = ∑_{i=0}^{m} (pi + qi )X i + ∑_{i=m+1}^{n} pi X i ,
and P + 0F = P . Likewise, we define scalar multiplication by
cP = ∑_{i=0}^{n} (cpi )X i .
It is readily checked that F[X] forms a vector space over F with respect to
these operations. ▲▲▲
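As an aside, these operations are easy to model on a computer by storing a polynomial as its list of coefficients; the sketch below (over Q, with our own helper names and representation) implements exactly the addition and scalar multiplication just defined.

```python
from fractions import Fraction

# A polynomial p0 + p1*X + ... + pn*X^n is stored as the list [p0, p1, ..., pn].
def poly_add(P, Q):
    n = max(len(P), len(Q))
    P = P + [Fraction(0)] * (n - len(P))   # pad the shorter polynomial with zeros
    Q = Q + [Fraction(0)] * (n - len(Q))
    return [p + q for p, q in zip(P, Q)]

def poly_scale(c, P):
    return [c * p for p in P]

P = [Fraction(1), Fraction(2)]               # 1 + 2X
Q = [Fraction(0), Fraction(1), Fraction(3)]  # X + 3X^2
print(poly_add(P, Q))                        # 1 + 3X + 3X^2
print(poly_scale(Fraction(2), P))            # 2 + 4X
```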

Example: The complex numbers C are a field, hence C is a vector space over
C under the natural operations of addition and multiplication by scalars. On
the other hand, C is also a vector space over R, which is a totally different
vector space, despite the fact that the elements of the space (i.e., the vectors)
are the same. More generally, C is a vector space over any subfield of C (e.g.,
the complex rationals). ▲▲▲
3.2 Basic properties


Like fields, vector spaces satisfy a number of generic properties:
Proposition 3.2 Let V be a vector space over F. Then,
(a) Every vector v ∈ V has a unique additive inverse.
(b) For every a ∈ F, a 0V = 0V .
(c) For every u ∈ V , 0F u = 0V .
(d) If a ∈ F and u ∈ V satisfy au = 0V , then either a = 0F or u = 0V .
(e) For every u ∈ V , (−1F )u = −u.
Proof :
(a) Suppose that u + v = 0V and w + v = 0V . It follows from the first three
properties of vector addition that
u = 0V + u = (w + v) + u = w + (v + u) = w + (u + v) = w + 0V = w.
(b) By the properties of 0V and distributivity,
a 0V = a(0V + 0V ) = a 0V + a 0V .
Adding −(a 0V ) to both sides and using the properties of vector addi-
tion,

0V = a 0V + (−(a 0V ))
= (a 0V + a 0V ) + (−(a 0V ))
= a 0V + (a 0V + (−(a 0V )))
= a 0V + 0V
= a 0V ,

proving that a 0V = 0V .
(c) Similarly,
0F u = (0F + 0F )u = 0F u + 0F u.
Adding −(0F u) to both sides,

0V = 0F u + (−(0F u))
= (0F u + 0F u) + (−(0F u))
= 0F u + (0F u + (−(0F u)))
= 0F u + 0V
= 0F u,
proving that 0F u = 0V .
(d) Suppose that au = 0V . If a ≠ 0F , then using the fact that a has a
multiplicative inverse,
u = 1F ⋅ u = (a−1 a)u = a−1 (au) = a−1 0V = 0V ,
i.e., either a = 0F or u = 0V .
(e) We have
0V = 0F u = (1F + (−1F ))u = u + (−1F )u,
and it follows from the uniqueness of the inverse that (−1F )u = −u.

Comment: The fourth item has an important consequence: suppose that


v ∈ V is non-zero and there exist a, b ∈ F, such that av = bv. Then, (a − b)v =
0, from which we deduce that a = b.
We now come to the raison d’être of vector spaces—the formation of linear
combinations:

Definition 3.3 Let V be a vector space over a field F and let (u1 , . . . , un ) ⊂
V be a sequence of n vectors. A vector v ∈ V is said to be a linear combi-
nation of (u1 , . . . , un ), if there exists a sequence of scalars (a1 , . . . , an ) ∈ Fn ,
such that
v = a1 u1 + ⋅ ⋅ ⋅ + an un ,
or in matrix form, if there exists an a ∈ Fncol , such that

v = (u1 . . . un ) a.
Some notations: Let (V, +, F, ⋅) be a vector space. For every v ∈ V , we denote
by
F v = {a v ∶ a ∈ F}
the set of scalar multiples of v. For S, T ⊂ V we denote by
S + T = {u + v ∶ u ∈ S, v ∈ T }
the set of vectors obtained by sums of elements of S and T .
Exercises
(easy) 3.1 What is a vector? Let S be any non-empty set and let x ∈ S.
How can we tell whether x is a vector?

Solution 3.1: This is an almost senseless question. An element of a set is a vector if the
set is endowed with an algebraic structure (relying in particular on another structure—a
field), with two binary operations satisfying all axioms.

(easy) 3.2 In what sense is every field a vector space? Is it true that every
vector space is a field?

Solution 3.2: Every field is a vector space over itself in the following sense: every
two field elements can be added yielding another field element (here we think of field
elements as vectors). Every field element can be multiplied by a field element, yielding a
field element (here we think of it as a scalar times a vector resulting in a vector). Every
field element (viewed as a vector) has an additive inverse. Under this perspective, one
can verify that all the axioms of vector spaces are satisfied. On the other hand, not every
vector space is a field. For example, in a general vector space there is no product taking
two vectors and returning a vector.

(easy) 3.3 Let S be any non-empty set and let V = Func(S, F). Prove that
it is indeed a vector space with respect to the vector addition and scalar
multiplication defined above.

Solution 3.3: For f, g ∈ Func(S, F) and a ∈ F we defined the functions f + g and a f by


(f + g)(s) = f (s) + g(s) and (a f )(s) = a f (s).
The zero element of this vector space is the function ζ ∈ Func(S, F) defined by ζ(s) = 0F
for all s ∈ S. You just have to take each of the axioms one-by-one and check that they are
satisfied. For example, for every s ∈ S.
(f + ζ)(s) = f (s) + ζ(s) = f (s) + 0F = f (s),
hence f + ζ = f .

(easy) 3.4 Let V = R2 be the set of pairs of real number and let F = R.
Define
(x, y) + (w, z) = (x + w, 0)
a(x, y) = (ax, 0).
Is V a vector space over R under these operations?
Solution 3.4: No. The unit property of 1F is not satisfied, for example,
1F (2, 3) = (2, 0) ≠ (2, 3).

(easy) 3.5 What is the smallest vector space containing more than one vec-
tor?
Solution 3.5: The field F2 , viewed as a vector space over itself, contains exactly two
elements.

(easy) 3.6 Show that any vector space over R is either the zero space, or
contains infinitely-many vectors.
Solution 3.6: Let V be a vector space over R. If V is not the zero space, it contains at
least one non-zero element v. The set of vectors
Rv = {av ∶ a ∈ R}
is infinite, because a ≠ b implies that av ≠ bv (see comment above).

(intermediate) 3.7 Let V be a vector space over F. Prove that for every
v, w ∈ V and 0 ≠ a ∈ F there exists a unique u ∈ V satisfying
au + v = w.
Hint: you’ve done something very similar in the context of fields.
Solution 3.7: Set u = a−1 (w − v). Then,
au + v = a(a−1 (w − v)) + v = (a a−1 )(w − v) + v = 1F (w − v) + v = (w − v) + v = w + (−v + v) = w + 0V = w,
showing that a solution exists. For uniqueness, suppose that u1 and u2 are both solutions.
Then,
au1 + v = w = au2 + v,
from which we deduce that au1 + v = au2 + v. Adding (−v) to both sides and then
multiplying by a−1 , we obtain that u1 = u2 .

(intermediate) 3.8 Use the result of Exercise 3.7 to deduce the uniqueness
of the additive inverse.
Solution 3.8: Consider the equation
1F u + v = 0V .
By the previous exercise, there exists a unique u ∈ V satisfying u + v = 0V , which is by
definition (−v).

(intermediate) 3.9 Let


V = {x ∈ R ∶ x > 0}.
For x, y ∈ V and a ∈ R define
x ⊕ y = xy and a ⊙ x = xa .
Prove that (V, ⊕, R, ⊙) is a vector space.
Solution 3.9: As surprising as it is, this is indeed a vector space. The zero element is
1 (which is an element in V ), as for all x ∈ V ,
x ⊕ 1 = x ⋅ 1 = x.
The additive inverse of x is 1/x. Let’s verify for example the distributive laws: for y, z ∈ V
and a ∈ R,
a ⊙ (y ⊕ z) = a ⊙ (yz) = (yz)^a = y^a z^a = y^a ⊕ z^a = (a ⊙ y) ⊕ (a ⊙ z),
and for a, b ∈ R and x ∈ V ,
(a + b) ⊙ x = x^(a+b) = x^a x^b = x^a ⊕ x^b = (a ⊙ x) ⊕ (b ⊙ x).
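One can also spot-check these axioms numerically; the following sketch (our own, with randomly sampled values and Python names of our choosing) tests the identities above on the positive reals.

```python
import math, random

# V = (0, inf) with x (+) y = x*y and a (.) x = x**a, as in Exercise 3.9.
vadd = lambda x, y: x * y
smul = lambda a, x: x ** a

random.seed(0)
for _ in range(5):
    x, y = random.uniform(0.1, 10), random.uniform(0.1, 10)
    a, b = random.uniform(-3, 3), random.uniform(-3, 3)
    assert math.isclose(vadd(x, 1.0), x)   # 1 plays the role of the zero vector
    assert math.isclose(smul(a, vadd(x, y)), vadd(smul(a, x), smul(a, y)))
    assert math.isclose(smul(a + b, x), vadd(smul(a, x), smul(b, x)))
print("axioms hold on random samples")
```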
(intermediate) 3.10 Consider the vector space (R2 , +, R, ⋅). Let


w = (2, −1) ∈ R2 ,
and define on R2 the following two operations,
u ⊞ v = (u + v) + w and a ⊡ v = av + (a − 1)w.
(a) Is there an element in R2 neutral to ⊞? If yes, what is it? (b) Does any
element in R2 have an additive-inverse with respect to ⊞? If yes, what is it?
(c) Are the operations distributive, namely,
a ⊡ (u ⊞ v) = a ⊡ u ⊞ a ⊡ v ?

Solution 3.10: (a) Yes, the element −w is neutral for ⊞:
u ⊞ (−w) = (u + (−w)) + w = u.
(b) Yes, the ⊞-inverse of u is −u − 2w:
u ⊞ (−u − 2w) = (u + (−u − 2w)) + w = −w,
which is the neutral element.
(c) Yes.
a ⊡ (u ⊞ v) = a(u + v + w) + (a − 1)w,
and
a ⊡ u ⊞ a ⊡ v = (au + (a − 1)w + av + (a − 1)w) + w = a(u + v + w) + (a − 1)w.

(intermediate) 3.11 Consider the vector space (C3 , +, C, ⋅). Which vectors
are linear combinations of the vectors (1, 0, −1), (0, 1, 1) and (1, 1, 1)?
Solution 3.11: By definition, all the vectors of the form
a(1, 0, −1) + b(0, 1, 1) + c(1, 1, 1) = (a + c, b + c, b + c − a),
with a, b, c ∈ C. But this is not an explicit solution. Are there vectors in C3 which are not
linear combinations of these three vectors? In other words, can every (x, y, z) be expressed
as such a linear combination? This amounts to asking whether the linear system
⎡ 1 0 1⎤ ⎡a⎤ ⎡x⎤
⎢ ⎥⎢ ⎥ ⎢ ⎥
⎢ ⎥⎢ ⎥ ⎢ ⎥
⎢ 0 1 1⎥ ⎢ b ⎥ = ⎢ y ⎥
⎢ ⎥⎢ ⎥ ⎢ ⎥
⎢−1 1 1⎥ ⎢ c ⎥ ⎢ z ⎥
⎣ ⎦⎣ ⎦ ⎣ ⎦
is always consistent. It is readily verified that the matrix of coefficients is row-equivalent
to the unit matrix, hence the answer is positive: every vector in C3 is a linear combination
of these three vectors.
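A quick numerical check of this conclusion, assuming numpy is available (the sample right-hand side is our own):

```python
import numpy as np

# The coefficient matrix from Solution 3.11: its columns are the three vectors.
M = np.array([[1, 0, 1],
              [0, 1, 1],
              [-1, 1, 1]], dtype=complex)

# A non-zero determinant means M is row-equivalent to the identity, so
# M [a, b, c]^T = (x, y, z) is consistent for every right-hand side.
print(np.linalg.det(M))                     # non-zero (equals 1)
x = np.array([2 + 1j, -1, 3j])              # an arbitrary vector in C^3
print(np.linalg.solve(M, x))                # the coefficients a, b, c
```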
3.3 Subspaces
3.3.1 Definitions and examples
A recurring theme in mathematics is to consider a subset of a structure,
which inherits the properties of the structure it is part of. This leads us to
the definition of a linear subspace of a vector space.
Let W ⊆ V . Since every vector in W is a vector in V , we can add together
vectors in W . The restriction of the operation + ∶ V × V → V to pairs of
vectors in W is denoted by
+∣W ×W ∶ W × W → V.
The sum of two vectors in W is not necessarily a vector in W , but it is
necessarily a vector in V . Likewise, the restriction of the operation ⋅ ∶ F × V →
V to pairs of a scalar and a vector in W is denoted by
⋅∣F×W ∶ F × W → V.

A scalar multiple of a vector in W is not necessarily a vector in W , but it is
necessarily a vector in V .
Definition 3.4 Let V be a vector space over F. A subspace (or linear
subspace) (!‫ )תת מרחב וקטורי‬of V is a non-empty subset W ⊆ V , which is
closed under vector addition and scalar multiplication, namely, for all u, v ∈
W and a ∈ F,
u + v ∈ W and a v ∈ W.
We denote the relation of W being a linear subspace of V by W ≤ V .
The following proposition asserts that a linear subspace of a vector space is
a vector space in its own right:
Proposition 3.5 Let V be a vector space over a field F and let W ≤ V .
Then, (W, +∣W ×W , F, ⋅∣F×W ) is a vector space.
Proof : Since W is not empty, it contains at least one element w. Then,
(−1)w ∈ W and (−1)w + w ∈ W,
i.e., W includes 0V . Likewise, for every w ∈ W ,
(−1)w + 0V ∈ W,
i.e., every element of W has its additive inverse in W . It remains to show
that all eight axioms are satisfied, but this follows from the axioms in V . For
example, for every u, v ∈ W , since u, v ∈ V , it follows that u + v = v + u. n
Example: Let V be a vector space over F. The subset {0V } ⊂ V is a linear
subspace of V . It is called the zero subspace of V : {0V } ≤ V . ▲▲▲
Example: Every vector space is a subspace of itself, V ≤ V . We will refer to
a proper subspace to emphasize that W is a strict subset of V , i.e., that
V ∖ W is not empty. We denote this relation by W < V . ▲▲▲

Example: Consider the vector space (Mn (F), +, F, ⋅). A matrix A ∈ Mn (F)
is called symmetric if aij = aji for all i, j ∈ 1, . . . , n. It is easy to see that the
subset of symmetric matrices is a linear subspace of Mn (F). ▲▲▲

Example: Consider the vector space V = (Fncol , +, F, ⋅). Let A ∈ Mm×n (F)
and let
W = {x ∈ Fncol ∶ Ax = 0}
be the set of solutions of the corresponding homogeneous system of equations.
By Theorem 2.40, W ≤ V . ▲▲▲
Example: Let V be a vector space and let w ∈ V . Consider the subset of V ,
W = Fw.
We claim that W is not just a subset of V ; it is a linear subspace. Why?
It is non-empty as it includes 1F ⋅ w = w. Moreover, let u, v ∈ W . By the
definition of W , there exist a, b ∈ F, such that

u = aw and v = b w.
Then,
u + v = a w + b w = (a + b) w ∈ W.
Let c ∈ F, then
c u = c (a w) = (ca) w ∈ W,
proving W is a linear subspace of V . ▲▲▲
Exercises
(easy) 3.12 Let V be a vector space over F. Prove that W ≤ V and U ≤ W


implies that U ≤ V .

Solution 3.12: By definition, since U ≤ W , then U is not empty and for every u, v ∈ U
and a ∈ F,
u+v ∈U and a v ∈ U.
By definition U ≤ V . (Yes, there is almost nothing to prove...)
(easy) 3.13 Consider the vector space (V, ⊕, R, ⊙) in Exercise 3.9. Is it a
linear subspace of the vector space (R, +, R, ⋅)?

Solution 3.13: No, because the operations in V are not restrictions of the operations
in R. It is not sufficient that V be a subset of R to qualify as a linear subspace.

(intermediate) 3.14 Consider the vector space (R2 , +, R, ⋅).

(a) Find a subset W ⊂ R2 including the zero vector, which is closed under
scalar multiplication but not closed under vector addition.
(b) Find a subset U ⊂ R2 including the zero vector, which is closed under
vector addition but not closed under scalar multiplication.
(c) Does there exist a non-empty subset V ⊂ R2 which does not include
the zero vector, which is closed under scalar multiplication?
Solution 3.14: (a) The set of vectors
W = {(a, a) ∶ a ∈ R} ∪ {(a, −a) ∶ a ∈ R}.
(b) The set of vectors


U = {(m, n) ∶ m, n ∈ Z}.
(c) No. Let S ⊂ R2 be non-empty and closed under scalar multiplication. Then, for every
v ∈ S (which by assumption exists),
0V = 0F v ∈ S,
i.e., 0V ∈ S.
(intermediate) 3.15 In each of the following items is given a subset W of
a vector space (V, +, F, ⋅). Determine whether W ≤ V .
(a) V = (C2 , +, C, ⋅) and
W = {(z, w) ∶ 2z = 3w} .
(b) V = (M2×2 (R), +, R, ⋅) and
W = {[ a b ; c d ] ∶ ad = 0} .
(c) V = (R[X], +, R, ⋅) and
W = {p(X) ∈ R[X] ∶ p(0) = p(2)}.
(d) V = (R3 , +, R, ⋅) and
W = {(x, y, z) ∶ 2x − y + z = 0, y − 2z = 0} .
(e) V = (R3 , +, R, ⋅) and
W = {(x, y, z) ∶ xy = z} .
(f) V = Func(R, R) over R, and
W = {f ∶ R → R ∶ f (2) = f (3)}.
(g) V = Func(R, R) over R, and
W = {f ∶ R → R ∶ f (0) = f 2 (1)}.
Solution 3.15: (a) yes, (b) no, (c) yes, (d) yes, (e) no, (f) yes, (g) no.
(intermediate) 3.16 Let V = Func(R, R) over R. Which of the following
subsets is a linear subspace?
(a) The functions f satisfying f (−1) + f (1) = 0.
(b) The functions f satisfying f (0) + f (1) = 1.
(c) The functions f satisfying f (0) ⋅ f (1) = 0.
(d) The functions f satisfying f (−x) + f (x) = 0 for all x ∈ R.
Solution 3.16: (a) and (d).
(intermediate) 3.17 Consider the vector space (Cn , +, C, ⋅). Let W ≤ Cn
and consider the set U ⊆ Cn ,
U = {(z̄ 1 , . . . , z̄ n ) ∶ (z 1 , . . . , z n ) ∈ W } ,
where z̄ is the complex conjugate of z. Show that U ≤ Cn .
Solution 3.17: U is not empty as
(0̄, . . . , 0̄) = (0, . . . , 0) ∈ W,
i.e., (0, . . . , 0) ∈ U . Let
(z̄ 1 , . . . , z̄ n ), (w̄1 , . . . , w̄n ) ∈ U, i.e., (z 1 , . . . , z n ), (w1 , . . . , wn ) ∈ W.
Then,
(z̄ 1 , . . . , z̄ n ) + (w̄1 , . . . , w̄n ) = (z̄ 1 + w̄1 , . . . , z̄ n + w̄n ) ∈ U,
since z̄ i + w̄i is the conjugate of (z + w)i and
((z + w)1 , . . . , (z + w)n ) ∈ W.
It follows that U is closed under vector addition. We proceed similarly to show the closure
of U under scalar multiplication.
(intermediate) 3.18 Consider the vector space (Rn , +, R, ⋅) for some n ≥ 3.
Which of the following subsets of Rn is a linear subspace?
(a) All x = (x1 , . . . , xn ) satisfying x1 ≥ 0.


(b) All x = (x1 , . . . , xn ) satisfying x1 + 3x2 = x3 .
(c) All x = (x1 , . . . , xn ) satisfying x2 = (x1 )2 (here the superscript 2 is a square).
(d) All x = (x1 , . . . , xn ) satisfying x1 x2 = 0.
(e) All x = (x1 , . . . , xn ) such that x2 is rational.

Solution 3.18: Only (b). (a), (c) and (e) are not closed under scalar multiplication;
(d) is not closed under vector addition.
(harder) 3.19 Let V be a vector space over a field F and let
S = {vα ∶ α ∈ I}
be a non-empty subset of V ; here I is an index set (which could be infinite).
Consider the subset U ⊂ V comprising all linear combinations of vectors in
S,
U = { ∑_{α∈J} aα vα ∶ J ⊂ I is finite, aα ∈ F, vα ∈ S} .
Prove that U ≤ V .
Solution 3.19: This is proved in Section 3.3.3.
3.3.2 The subspace generated by a set
A vector space may have many linear subspaces. The following proposition
asserts that the intersection of any collection of linear subspaces is again a
linear subspace:

Proposition 3.6 Let V be a vector space over F. Let C be a (possibly


infinite) collection of linear subspaces of V (i.e., C is a set whose elements
are linear subspaces of V ). Then,
W = ⋂C
is a linear subspace of V .
Proof : First, let’s interpret the statement of this proposition. There is a


collection of linear subspaces of V ; this collection could be finite (e.g., seven
subspaces, which we could denote by W1 , . . . , W7 ); this collection could be
countable (!‫)בת מנייה‬, i.e., form a sequence (!‫( )סדרה‬which we could denote
by W1 , W2 , . . . ); this collection could also be uncountably infinite. The set
W = ⋂C
comprises all those elements in V which are elements in U for every U ∈ C ,


i.e., w ∈ W if and only if w ∈ U for all U ∈ C . The claim is that this set is a
linear subspace of V .
By definition, we need to show that W is not empty, and that for every
u, v ∈ W and a ∈ F,
u + v ∈ W and a u ∈ W.
Each of the U ∈ C is a linear subspace of V , hence
0V ∈ U for every U ∈ C ,
from which follows that 0V ∈ W .
Let u, v ∈ W . By the very definition of W ,
u, v ∈ U for every U ∈ C .
Since every such U is a linear subspace,
u + v ∈ U for every U ∈ C ,
from which follows that u + v ∈ W .
Likewise for a ∈ F and u ∈ W ,
a u ∈ U for every U ∈ C ,
from which follows that a u ∈ W . This concludes the proof. n


This proposition has an important consequence, whose likes are recurring
in many branches of mathematics. Let S ⊂ V be a collection of vectors,
which could be finite, countably infinite, uncountable infinite or even empty.
Consider the collection of all linear subspaces of V which contain all those
vectors, namely,
C = {W ≤ V ∶ S ⊆ W }.
This collection is not empty, because V itself is a linear subspace of V con-
taining all vectors in S, i.e.,
V ∈ C.
Whatever this collection of linear subspaces is, its intersection is a linear
subspace of V . We call it the linear subspace generated (!‫ )תת מרחב נוצר‬by
the vectors in S, and denote it by

⟨S⟩ = ⋂{W ≤ V ∶ S ⊆ W }. (3.1)
The following two lemmas provide a useful characterization of the generated
subspace:
Lemma 3.7 Let V be a vector space over F, and let S ⊆ V . If S ⊆ W ≤ V ,
then ⟨S⟩ ≤ W .

Proof : This is really a direct consequence of the definition (3.1). If W ≤ V


contains S, i.e.,
W ∈ {W̃ ≤ V ∶ S ⊂ W̃ },
then,
⋂{W̃ ≤ V ∶ S ⊂ W̃ } ⊂ W,
as an intersection of any collection of sets is contained in any set in that
intersection, but this is exactly what we have to prove. n
Lemma 3.8 Let V be a vector space over F, and let S ⊆ V . If T ⊆ V
satisfies that T ⊆ W for every W ≤ V containing S, then
T ⊆ ⟨S⟩ .
Proof : Once again, this is a direct consequence of the definition of the gen-
erated subspace. If
T ⊆ W for all W ∈ {W̃ ≤ V ∶ S ⊂ W̃ },
then
T ⊆ ⋂{W̃ ≤ V ∶ S ⊂ W̃ }.
n

Example: Let V be a vector space over F. Let w ∈ V and let S = {w}. As


a matter of convenience, we write ⟨w⟩ rather than ⟨{w}⟩. We will show that
⟨w⟩ = Fw,
that is, the linear subspace generated by a single vector is the subspace
obtained by all multiples of that vector by scalars. If w = 0V , then this
subspace is the zero subspace. Otherwise, it is a line.
We have already seen that
Fw ≤ V.
Since {w} ⊂ Fw ≤ V , it follows from Lemma 3.7 that
⟨w⟩ ⊆ Fw.
Conversely, every vector in Fw must be included in any linear subspace
containing w, namely
Fw ⊆ W for all W ∈ {W̃ ≤ V ∶ {w} ⊂ W̃ },
and by Lemma 3.8,
Fw ⊆ ⟨w⟩ .
▲▲▲
The following properties of the generated subspace follow almost directly
from the definition.
Proposition 3.9 In every vector space
⟨∅⟩ = {0V }.
Proof : Since {0V } ≤ V contains the empty set, it follows from Lemma 3.7
that
⟨∅⟩ ⊆ {0V }.
Conversely, since {0V } is contained in every linear subspace of V , it follows
that
{0V } ⊆ W for all W ∈ {W̃ ≤ V ∶ ∅ ⊂ W̃ },
and by Lemma 3.8,
{0V } ⊆ ⟨∅⟩ .
n
Proposition 3.10 Let V be a vector space over F and let S ⊆ T ⊆ V . Then,
⟨S⟩ ≤ ⟨T ⟩ .
(Note that we write ⟨S⟩ ≤ ⟨T ⟩ rather than ⟨S⟩ ⊆ ⟨T ⟩ because these are linear
subspaces.)
Proof : Since S ⊆ T , it follows that
{W ≤ V ∶ T ⊆ W } ⊆ {W ≤ V ∶ S ⊆ W }.
Intersecting over more sets can only reduce the intersection, hence
⟨T ⟩ = ⋂{W ≤ V ∶ T ⊂ W } ⊇ ⋂{W ≤ V ∶ S ⊆ W } = ⟨S⟩ .
n
More properties of generated subspaces are derived in the exercise section.
Exercises
(easy) 3.20 Let W1 , W2 ≤ V . Prove directly (i.e., without recurring to the


general theorem proved above) that W1 ∩ W2 ≤ V .
Solution 3.20: Since W1 and W2 are both linear subspaces, 0V ∈ W1 and 0V ∈ W2 , from
which follows that 0V ∈ W1 ∩ W2 . Let u, v ∈ W1 ∩ W2 and let a ∈ F. Since in particular
u, v ∈ W1 which is a subspace, it follows that u + v ∈ W1 and av ∈ W1 . Since in particular
u, v ∈ W2 which is a subspace, it follows that u + v ∈ W2 and av ∈ W2 . Thus,
u + v ∈ W1 ∩ W2 and a v ∈ W1 ∩ W2 ,
proving that W1 ∩ W2 is a linear subspace.
(intermediate) 3.21 Let V be a vector space over a field F and let W ≤ V .
Show that
⟨W ⟩ = W.
Solution 3.21: By definition,
⟨W ⟩ = ⋂{U ≤ V ∶ W ⊆ U }.
Since W ∈ {U ≤ V ∶ W ⊆ U } (it is a subspace containing W ) it follows that ⟨W ⟩ ⊆ W (⟨W ⟩
is an intersection of sets, one of which is W ). On the other hand, since every subspace
containing W contains W , it follows that the intersection of all subspaces containing W
contains W , i.e., W ⊆ ⟨W ⟩, which completes the proof.
(intermediate) 3.22 Let V be a vector space over a field F and let S ⊆ V .
Show that
⟨⟨S⟩⟩ = ⟨S⟩ .

Solution 3.22: This is an immediate corollary of the previous exercise as ⟨S⟩ ≤ V .
(intermediate) 3.23 Let V be a vector space over F. Let S1 , S2 ⊆ V be
non-empty subsets. Suppose that
S1 ⊆ ⟨S2 ⟩ and S2 ⊆ ⟨S1 ⟩ .
Show that
⟨S1 ⟩ = ⟨S2 ⟩ .
Solution 3.23: From Proposition 3.10 and the previous exercise,
S1 ⊆ ⟨S2 ⟩ implies ⟨S1 ⟩ ⊆ ⟨⟨S2 ⟩⟩ = ⟨S2 ⟩ ,
and
S2 ⊆ ⟨S1 ⟩ implies ⟨S2 ⟩ ⊆ ⟨⟨S1 ⟩⟩ = ⟨S1 ⟩ .
(intermediate) 3.24 Let V be a vector space over F. Let S1 , S2 ⊆ V be
non-empty subsets. For each of the following statements, determine whether
it is true or not:
(a) If ⟨S1 ⟩ ⊆ ⟨S2 ⟩ then S1 ⊆ S2 .
(b) If S2 ⊆ S1 and ⟨S1 ⟩ ⊆ ⟨S2 ⟩, then ⟨S1 ⟩ = ⟨S2 ⟩.
(c) If S2 ⊆ S1 and ⟨S2 ⟩ ≠ ⟨S1 ⟩, then for every v ∈ S1 ∖ S2 we have v ∈/ ⟨S2 ⟩.
(d) If S2 ⊆ S1 and ⟨S2 ⟩ ≠ ⟨S1 ⟩, then there exists v ∈ S1 ∖ S2 such that
v ∈/ ⟨S2 ⟩.
(e) If S1 ∩ S2 = ∅, then ⟨S1 ⟩ ∩ ⟨S2 ⟩ = {0}.

Solution 3.24:
(a) False. Take the vector space R with S1 = {1, 2} and S2 = {1}. Then, ⟨S1 ⟩ = ⟨S2 ⟩
(i.e., ⟨S1 ⟩ ⊆ ⟨S2 ⟩) but S1 ⊆/ S2 .
(b) True. by Proposition 3.10, S2 ⊆ S1 implies that ⟨S2 ⟩ ⊆ ⟨S1 ⟩, which together with
⟨S1 ⟩ ⊆ ⟨S2 ⟩ yields an equality.
(c) False. Let V = R2 , S1 = {(1, 0), (0, 1), (2, 0)} and S2 = {(1, 0)}. Then ⟨S1 ⟩ = R2 and
⟨S2 ⟩ = R(1, 0) i.e., ⟨S2 ⟩ ≠ ⟨S1 ⟩. Then, (2, 0) ∈ S1 ∖ S2 and yet (2, 0) ∈ ⟨S2 ⟩.
(d) True. Suppose by contradiction that every v ∈ S1 ∖ S2 satisfies v ∈ ⟨S2 ⟩. Then,

S1 = S2 ∪ (S1 ∖ S2 ) ⊆ ⟨S2 ⟩ ,

hence
⟨S1 ⟩ ⊆ ⟨S2 ⟩ .
On the other hand, since S2 ⊆ S1 it follows that ⟨S2 ⟩ ⊆ ⟨S1 ⟩, which contradicts
⟨S2 ⟩ ≠ ⟨S1 ⟩.
(e) False. Let V = R, S1 = {1} and S2 = {2}. Then, S1 ∩ S2 = ∅ however ⟨S1 ⟩ ∩ ⟨S2 ⟩ = R.
3.3.3 The linear span of a set of vectors


The definition of the linear subspace generated by a collection of vectors is
quite implicit. We will now provide a more explicit characterization.
Definition 3.11 Let V be a vector space over a field F and let S ⊆ V be a
non-empty collection of vectors. Then, the linear span (!‫ )הפרוס הלינארי‬of
S is the set of all (finite) linear combinations of elements of S,
Span S = {a1 v1 + ⋅ ⋅ ⋅ + an vn ∶ n ∈ N, ai ∈ F, vi ∈ S} . (3.2)
In the particular case where S = ∅ we define Span S = {0V }.

Example: Let w ∈ V . Then, the only linear combinations of {w} are scalar
multiples of w,
Span{w} = Fw.
Note that Span{w} = ⟨w⟩. We will shortly see that this is a general identity
(note also that we defined the span such that Span ∅ = {0V } = ⟨∅⟩). ▲ ▲ ▲

Example: Let V = (R2 , +, R, ⋅) and let
S = {(1, 1), (−1, 1)}.
Then,
Span S = {a(1, 1) + b(−1, 1) ∶ a, b ∈ R} = {(a − b, a + b) ∶ a, b ∈ R}.
It is not hard to see that for every (x, y) ∈ R2 ,
(x, y) = (a − b, a + b),
where
a = (x + y)/2 and b = (y − x)/2,
proving that Span S = R2 . ▲▲▲
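This pair of coefficients can also be recovered by solving the corresponding 2 × 2 system; a minimal numerical sketch (numpy, sample values of our own):

```python
import numpy as np

# Solving a*(1,1) + b*(-1,1) = (x,y): the columns of M are the spanning vectors.
M = np.array([[1.0, -1.0],
              [1.0,  1.0]])
x, y = 4.0, 10.0
a, b = np.linalg.solve(M, np.array([x, y]))
print(a, b)  # 7.0 3.0, matching a = (x+y)/2 and b = (y-x)/2
print(a * np.array([1.0, 1.0]) + b * np.array([-1.0, 1.0]))  # [ 4. 10.]
```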
Proposition 3.12 Let V be a vector space over a field F and let S ⊆ V .
Then Span S ≤ V . (Note that this was already mentioned in Exercise 3.19.)
Proof : If S = ∅, then Span S = {0V } ≤ V by convention. Otherwise, it suffices
to show that for every u, v ∈ Span S and c ∈ F,
u + v ∈ Span S and c u ∈ Span S.
If u, v ∈ Span S, then there exist vectors u1 , . . . , un , v1 , . . . , vm ∈ S and scalars
a1 , . . . , an , b1 , . . . , bm ∈ F, such that
u = a1 u1 + ⋅ ⋅ ⋅ + an un and v = b1 v1 + ⋅ ⋅ ⋅ + bm vm .
Then
u + v = a1 u1 + ⋅ ⋅ ⋅ + an un + b1 v1 + ⋅ ⋅ ⋅ + bm vm ∈ Span S.
Likewise,
c u = c a1 u1 + ⋅ ⋅ ⋅ + c an un ∈ Span S,
proving that Span S ≤ V . n
Theorem 3.13 Let V be a vector space over a field F and let S ⊂ V . Then,
Span S = ⟨S⟩ .
Proof : If S = ∅, then this holds by definition. Otherwise, since
Span S ∈ {W ≤ V ∶ S ⊆ W },
it follows from Lemma 3.7 that
⟨S⟩ ⊆ Span S.
Conversely, since every W ≤ V containing S must contain every vector in
Span S, i.e.,
Span S ⊆ W for all W ∈ {W̃ ≤ V ∶ S ⊆ W̃ },
it follows by Lemma 3.8 that
Span S ⊆ ⟨S⟩ ,
which completes the proof. n
Corollary 3.14 Let W ≤ V . Then,

Span W = W.

Proof : This corollary asserts that linear subspaces are closed under linear
combinations. We can prove it directly, but we can get this as a consequence
of the last theorem, recalling that ⟨W ⟩ = W (see Exercise 3.21). n
Example: Let V = R5 and let
v1 = (1, 2, 0, 3, 0)
v2 = (0, 0, 1, 4, 0)
v3 = (0, 0, 0, 0, 1).
A vector v ∈ R5 is in the linear span of v1 , v2 , v3 if and only if there exist
scalars a, b, c, such that
v = av1 + bv2 + cv3 ,
i.e., if there exist such scalars such that

v = (a, 2a, b, 3a + 4b, c).

We can relate this to linear systems: Span{v1 , v2 , v3 } is the set of all x ∈ R5 ,
such that
x2 = 2x1 and x4 = 3x1 + 4x3 .
▲▲▲
Exercises
(easy) 3.25 Let V be a vector space over the field F2 and let v ∈ V be a
non-zero vector. Write explicitly all the vectors in Span{v}.

Solution 3.25:
Span{v} = {a v ∶ a ∈ F2 } = {0F v, 1F v} = {0V , v}.
(easy) 3.26 Consider the vector space (F3 , +, F, ⋅). Find two vectors u, v ∈
F3 , such that
Span{u, v} = {(0F , a, b) ∶ a, b ∈ F} .

Solution 3.26: For example u = (0F , 1F , 0F ) and v = (0F , 0F , 1F ). Then for every a, b ∈ F,
a u + b v = (0F , a, b),

i.e.,
Span{u, v} = {a u + b v ∶ a, b ∈ F} = {(0F , a, b) ∶ a, b ∈ F} .

(easy) 3.27 Consider the vector space (R4 , +, R, ⋅). Find two different sets
S, T ⊂ R4 , such that

Span S = Span T = {(a, a − b, b, a + b) ∶ a, b ∈ R} .

Solution 3.27: For example,
S = {(1, 1, 0, 1), (0, −1, 1, 1)}
and
T = {(π, π, 0, π), (0, −eπ , eπ , eπ )}.

(Yes, I could have worked harder to make the second example more ”interesting”, but why
work harder?)

(intermediate) 3.28 Consider the vector space (R4 , +, R, ⋅), and let

v1 = (2, −1, 3, 2)
v2 = (−1, 1, 1, −3)
v3 = (1, 1, 9, −5).

Is
(3, −1, 0, −1) ∈ Span{v1 , v2 , v3 }?
Solution 3.28: The question is whether there exist real numbers x1 , x2 , x3 , such that
x1 (2, −1, 3, 2) + x2 (−1, 1, 1, −3) + x3 (1, 1, 9, −5) = (3, −1, 0, −1).
We can turn this into whether the non-homogeneous system
2x1 − x2 + x3 = 3, −x1 + x2 + x3 = −1, 3x1 + x2 + 9x3 = 0, 2x1 − 3x2 − 5x3 = −1
is consistent. In matrix form
⎡ 2 −1  1 ⎤          ⎡ 3 ⎤
⎢−1  1  1 ⎥ ⎡x1 ⎤    ⎢−1 ⎥
⎢ 3  1  9 ⎥ ⎢x2 ⎥ =  ⎢ 0 ⎥ .
⎣ 2 −3 −5 ⎦ ⎣x3 ⎦    ⎣−1 ⎦
At this stage we know how to proceed by reducing the system. This yields,
⎡  2 −1  1    3 ⎤     ⎡ 1 0 0... wait
This system is not consistent, hence the answer is negative.
(intermediate) 3.29 Let V be a vector space over R and let u, v, w ∈ V .
(a) Is Span{u − v, v − w, w} = Span{u, v, w}?
(b) Is Span{u − v, v − w, w − u} = Span{u, v, w}?
(c) Is it possible that u, v, w are distinct and Span{u − v, v − w, w − u} =
Span{u, v, w}?

Solution 3.29:
(a) Yes. Set S1 = {u−v, v−w, w} and S2 = {u, v, w}. Clearly, S1 ⊂ Span S2 . Conversely,
u = (u − v) + (v − w) + w and v = (v − w) + w,
hence S2 ⊂ Span S1 , from which we conclude that Span S1 = Span S2 .
(b) No, and we show it by finding a counter example. Let V = R3 with u = (1, 0, 0),
v = (0, 1, 0) and w = (0, 0, 1). On the one hand, Span{u, v, w} = R3 . On the other
hand
{u − v, v − w, w − u} = {(1, −1, 0), (0, 1, −1), (−1, 0, 1)},
whose span is
Span{u − v, v − w, w − u} = {(a − c, −a + b, c − b) ∶ a, b, c ∈ R}
This span contains only vectors x ∈ R3 for which x1 + x2 + x3 = 0, hence this span is
a strict subset of R3 .
(c) Yes. Let V = R with u = 1, v = 2 and w = 3. Then,
Span{u − v, v − w, w − u} = Span{u, v, w} = R.
(harder) 3.30 Let W ⊂ R5 be the set of all solutions to the linear system
2X 1 − X 2 + (4/3)X 3 − X 4 = 0
X 1 + (2/3)X 3 − X 5 = 0
9X 1 − 3X 2 + 6X 3 − 3X 4 − 3X 5 = 0.
Find a set of three vectors spanning W .
Solution 3.30: We first reduce the system to find the subspace of solutions,
⎡ 2 −1 4/3 −1  0 ⎤     ⎡ 1 0 2/3 0 −1 ⎤
⎢ 1  0 2/3  0 −1 ⎥  →  ⎢ 0 1  0  1 −2 ⎥ .
⎣ 9 −3  6  −3 −3 ⎦     ⎣ 0 0  0  0  0 ⎦
Thus, there are three free variables, and the space of solutions is
{(u − (2/3)s, 2u − t, s, t, u) ∶ s, t, u ∈ R}.
A set of vectors spanning this subspace is
{(−2/3, 0, 1, 0, 0), (0, −1, 0, 1, 0), (1, 2, 0, 0, 1)}.

(intermediate) 3.31 Prove that the only linear subspaces of R (the field R
as a vector space over itself) are R and {0}.

Solution 3.31: Let W ≤ R. If W is not the zero subspace, then there exists a non-zero
element a ∈ W . Since {a} ⊂ W , it follows that
Span{a} ≤ Span W = W,
however Span{a} = R, hence W = R.
(intermediate) 3.32 Let V be a vector space over F and let W ≤ V . Let
S ⊆ V satisfying Span S = V . Prove or disprove: there exists a subset T ⊆ S,
such that Span T = W .
Solution 3.32: Absolutely not! Let F = R, V = R2 and W = Span{(1, 0)}. Take,
S = {(1, 1), (1, −1)}.
Then Span S = V but there is no subset of S spanning W .
(intermediate) 3.33 Let V be a vector space over F. Let W ≤ V and let
u, v ∈ V ∖ W . Show that
u ∈ Span(W ∪ {v}) if and only if v ∈ Span(W ∪ {u}).

Solution 3.33: It suffices to show one direction, since the relation is symmetric with
respect to u and v. So suppose that
u ∈ Span(W ∪ {v}).
This means that there exist w1 , . . . , wn ∈ W and a1 , . . . , an , c ∈ F, such that
u = a1 w1 + ⋅ ⋅ ⋅ + an wn + c v.
It must be that c ≠ 0F , otherwise u ∈ W , in contradiction to u ∈ V ∖ W . Thus,
v = c−1 u − (c−1 a1 )w1 − ⋅ ⋅ ⋅ − (c−1 an )wn ,

i.e., v ∈ Span(W ∪ {u}).

(harder) 3.34 Prove that the only linear subspaces of R2 are R2 , {0} or
sets of the form
Rv
for some v ∈ R2 .

Solution 3.34: Clearly, R2 , {0} and sets of the form Rv are linear subspaces of R2 .
We need to show that there are no other. If all the vectors in W ≤ V are multiple of one
vector v, then W = Span{v} = Rv. Otherwise, there exists in W a non-zero vector v and
a vector not in its span, say w. We need to show that Span{v, w} = R2 .
Write v = (a, b) and w = (c, d). We are asking whether every (e, f ) ∈ R2 can be written as
(e, f ) = x1 (a, b) + x2 (c, d),
which is to ask whether the non-homogeneous system
⎡a c⎤ ⎡x1 ⎤   ⎡e⎤
⎣b d⎦ ⎣x2 ⎦ = ⎣f ⎦
is consistent for all e, f ∈ R. We know that the answer is positive if and only if ad − bc ≠ 0,
i.e., if and only if (a, b) and (c, d) are not proportional to each other. Thus, Span{v, w} =
R2 .

(harder) 3.35 What are all the linear subspaces of (C, +, R, ⋅)?

Solution 3.35: The vector space (C, +, R, ⋅) can be viewed as (R2 , +, R, ⋅) with the
identification a + ıb → (a, b). Thus, the answer is the zero subspace, C and all subspaces
of the form Rz, with z ∈ C.
(harder) 3.36 Let W1 , W2 ≤ V . Suppose that W1 ∪ W2 ≤ V . Prove that
either W1 ⊆ W2 or W2 ⊆ W1 .
Solution 3.36: Suppose by contradiction the existence of
u ∈ W1 ∖ W2 and v ∈ W2 ∖ W1 .
Since W1 ∪ W2 is a linear subspace, it is closed under vector addition, hence
u + v ∈ W1 ∪ W2 ,
but this can’t be. For suppose that u + v ∈ W1 , then
v = (u + v) − u ∈ W1 ,
which is a contradiction. And if u + v ∈ W2 , then
u = (u + v) − v ∈ W2 ,
which is also a contradiction.
3.3.4 The row space of a matrix
Let A ∈ Mm×n (F). The rows of A,
{Rowi (A) ∶ i = 1, . . . , m}
are a subset of Fnrow , which is a vector space over F. Their linear span is
called the row space (!‫ )מרחב השורות‬of A, denoted by
R(A) = Span {Rowi (A) ∶ i = 1, . . . , m} .
We can express linear combinations of the rows of A using matrix notation,
                                      ⎡ Row1 (A) ⎤
∑_{i=1}^{m} ci Rowi (A) = [c1 ⋯ cm ]  ⎢     ⋮    ⎥ .
                                      ⎣ Rowm (A) ⎦
Namely,
R(A) = {cA ∶ c ∈ Fm row } .

Example: Consider the case where m < n, and
A = [ Im ∣ 0m×(n−m) ] ,
that is, the m × m identity matrix followed by n − m columns of zeros.
Then
R(A) = {x ∈ Fnrow ∶ xm+1 = ⋯ = xn = 0} .
▲▲▲
Example: Consider the case where m > n, and
    ⎡    In    ⎤
A = ⎣ 0(m−n)×n ⎦ ,
that is, the n × n identity matrix on top of m − n rows of zeros.
Then,
R(A) = Fnrow .
▲▲▲
Lemma 3.15 Let A ∈ Mm×n (F) and B ∈ Mn×k (F). Then,
R(AB) ≤ R(B).
Proof : We can think of the product AB as
     ⎡ Row1 (A) ⎤     ⎡ Row1 (A)B ⎤
AB = ⎢     ⋮    ⎥ B = ⎢     ⋮     ⎥ ,
     ⎣ Rowm (A) ⎦     ⎣ Rowm (A)B ⎦
so that each row of AB is a linear combination of the rows of B. That is,
{Rowi (AB) ∶ i = 1, . . . , m} ⊂ R(B),
from which follows that
R(AB) ≤ R(B).
n
The following theorem connects the notion of row-equivalence to the row
spaces of matrices:
Theorem 3.16 Two matrices A, B ∈ Mm×n (F) are row-equivalent if and
only if R(A) = R(B). In particular, the row space of every matrix is equal
to the row space of its row-reduced form.

Proof : Recall that A and B are row-equivalent if and only if there exist
matrices P, Q ∈ Mm (F), such that
B = P A and A = QB.
By Lemma 3.15, if A and B are row-equivalent, then
R(B) ≤ R(A) and R(A) ≤ R(B),
hence R(A) = R(B).


Conversely, if R(A) = R(B), then every row of A is a linear combination of
the rows of B and vice-versa, i.e., there exist matrices P, Q ∈ Mm (F) such
that
B = PA and A = QB,
hence they are row-equivalent. ∎

Exercises

(intermediate) 3.37 Consider the vector space (R3 , +, R, ⋅) and the sets
S = {(1, 2, 3), (2, 2, 1)} and T = {(2, 3, −1), (3, 0, −2)}.
Is Span S = Span T ?
Hint: find matrices A, B such that Span S = R(A) and Span T = R(B). Re-
duce these matrices and base your answer on those reduced representations.
Solution 3.37: Consider the matrices
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 2 & 1 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 2 & 3 & -1 \\ 3 & 0 & -2 \end{bmatrix}.$$

Clearly, Span S = R(A) and Span T = R(B). Two matrices of the same size have the same
row space if and only if they are row-equivalent. Denoting by RA and RB the
row-reduced forms of A and B, a direct calculation gives
$$R_A = \begin{bmatrix} 1 & 0 & -2 \\ 0 & 1 & 5/2 \end{bmatrix} \quad\text{and}\quad R_B = \begin{bmatrix} 1 & 0 & -2/3 \\ 0 & 1 & 1/9 \end{bmatrix}.$$
These two row-reduced matrices are different, hence A and B are not row-equivalent, and therefore Span S ≠ Span T .
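As a quick computational check (an added sketch, not part of the original notes), sympy's rref() carries out the same reduction:

```python
from sympy import Matrix

A = Matrix([[1, 2, 3], [2, 2, 1]])
B = Matrix([[2, 3, -1], [3, 0, -2]])

# rref() returns the row-reduced echelon form and the tuple of pivot columns
RA, _ = A.rref()
RB, _ = B.rref()
print(RA)        # Matrix([[1, 0, -2], [0, 1, 5/2]])
print(RB)        # Matrix([[1, 0, -2/3], [0, 1, 1/9]])
print(RA == RB)  # False: distinct reduced forms, hence distinct row spaces
```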

3.3.5 The column space of a matrix


Let A ∈ Mm×n (F). The columns of A,
{Coli (A) ∶ i = 1, . . . , n}
are a subset of Fmcol , which is a vector space over F. Their linear span is called
the column space (!‫ )מרחב העמודות‬of A, denoted by
C (A) = Span{Coli (A) ∶ i = 1, . . . , n}.

We can express linear combinations of the columns of A using matrix nota-


tion,
$$\sum_{i=1}^{n} c^i\, \mathrm{Col}_i(A) = [\mathrm{Col}_1(A)\ \dots\ \mathrm{Col}_n(A)] \begin{bmatrix} c^1 \\ \vdots \\ c^n \end{bmatrix}.$$
Namely,
C (A) = {Ac ∶ c ∈ Fncol } .

Example: Consider the case where m < n, and


$$A = \begin{bmatrix} 1 & & & 0 & \cdots & 0 \\ & \ddots & & \vdots & & \vdots \\ & & 1 & 0 & \cdots & 0 \end{bmatrix}.$$

Then

C (A) = Fmcol .

▲▲▲

Example: Consider the case where m > n, and


$$A = \begin{bmatrix} 1 & & \\ & \ddots & \\ & & 1 \\ 0 & \cdots & 0 \\ \vdots & & \vdots \\ 0 & \cdots & 0 \end{bmatrix}.$$

Then,

C (A) = {x ∈ Fmcol ∶ xn+1 = ⋯ = xm = 0} .
▲▲▲

Example: Let A ∈ M2×2 (R) be given by

$$A = \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}.$$

Then,

R(A) = Span{[1, 1]} = {[c  c] ∶ c ∈ R} ,

whereas

$$C(A) = \operatorname{Span}\left\{ \begin{bmatrix} 1 \\ 0 \end{bmatrix} \right\} = \left\{ \begin{bmatrix} c \\ 0 \end{bmatrix} : c \in \mathbb{R} \right\}.$$
▲▲▲

Lemma 3.17 Let A ∈ Mm×n (F) and B ∈ Mn×k (F). Then,

C (AB) ≤ C (A).

Proof : We can think of the product AB as

AB = A [Col1 (B) . . . Coln (B)] = [A Col1 (B) . . . A Coln (B)] ,

so that each column of AB is a linear combination of the columns of A, from


which follows that
C (AB) ≤ C (A).
∎
The following theorem connects the column space of a matrix with the con-
sistency of associated non-homogeneous systems:

Theorem 3.18 Let A ∈ Mm×n (F). Then, the non-homogeneous system AX =


b is consistent if and only if

b ∈ C (A).

Proof : If you think about it, there is almost nothing to prove: b ∈ C (A) if and only if
there exists an x ∈ Fncol such that Ax = b, which by definition amounts to
the system AX = b being consistent. ∎
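As an illustration (an added sketch, not part of the original notes), consistency can be tested mechanically with sympy's linsolve, which returns EmptySet precisely when b ∉ C (A); the matrix below is the 2 × 2 example from the previous subsection:

```python
from sympy import Matrix, linsolve, symbols

x, y = symbols('x y')
A = Matrix([[1, 1], [0, 0]])                # C(A) = {[c, 0]^T}
print(linsolve((A, Matrix([2, 0])), x, y))  # {(2 - y, y)}: consistent
print(linsolve((A, Matrix([2, 1])), x, y))  # EmptySet: (2, 1) is not in C(A)
```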

3.3.6 The sum of linear subspaces


Definition 3.19 Let V be a vector space over a field F and let S1 , S2 , . . . , Sn
be non-empty subsets of V (not necessarily linear subspaces). We define their
sum
S1 + S2 + ⋅ ⋅ ⋅ + Sn = {v1 + v2 + ⋅ ⋅ ⋅ + vn ∶ vi ∈ Si ∀i}.

Be careful not to confuse S1 + S2 and S1 ∪ S2 .



Example: Let V = (F2col , +, F, ⋅) and let

$$S_1 = \left\{ \begin{bmatrix} 2 \\ 3 \end{bmatrix}, \begin{bmatrix} 4 \\ 5 \end{bmatrix} \right\} \subset V \quad\text{and}\quad S_2 = \left\{ \begin{bmatrix} 2 \\ 3 \end{bmatrix}, \begin{bmatrix} 7 \\ 8 \end{bmatrix} \right\} \subset V.$$

Then,

$$S_1 \cup S_2 = \left\{ \begin{bmatrix} 2 \\ 3 \end{bmatrix}, \begin{bmatrix} 4 \\ 5 \end{bmatrix}, \begin{bmatrix} 7 \\ 8 \end{bmatrix} \right\},$$

whereas

$$S_1 + S_2 = \left\{ \begin{bmatrix} 4 \\ 6 \end{bmatrix}, \begin{bmatrix} 9 \\ 11 \end{bmatrix}, \begin{bmatrix} 6 \\ 8 \end{bmatrix}, \begin{bmatrix} 11 \\ 13 \end{bmatrix} \right\}.$$
▲▲▲
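For finite sets the sum can be computed by brute force; here is a small Python illustration (added for concreteness), with vectors represented as tuples:

```python
# The two finite subsets of F^2 from the example above
S1 = {(2, 3), (4, 5)}
S2 = {(2, 3), (7, 8)}
# S1 + S2 collects all pairwise sums (duplicates merge, as in any set)
S_sum = {(u[0] + v[0], u[1] + v[1]) for u in S1 for v in S2}
print(sorted(S_sum))  # [(4, 6), (6, 8), (9, 11), (11, 13)]
```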

Proposition 3.20 Let V be a vector space over a field F and let


W1 , W2 , . . . , Wn be linear subspaces of V . Then,

W = W1 + W2 + ⋅ ⋅ ⋅ + Wn

is a linear subspace of V . Furthermore,


$$W = \operatorname{Span}\left( \bigcup_{i=1}^{n} W_i \right).$$

Proof : We need to show that W is closed under linear combinations. Let


u, v ∈ W . By definition, they can be written in the form

u = a1 u1 + ⋅ ⋅ ⋅ + an un and v = b1 v 1 + ⋅ ⋅ ⋅ + bn v n ,

where ui , vi ∈ Wi for every i = 1, . . . , n. Then,

u + v = (a1 u1 + b1 v1 ) + ⋅ ⋅ ⋅ + (an un + bn vn ) ∈ W,

and for every c ∈ F,

c u = c a1 u1 + ⋅ ⋅ ⋅ + c an un ∈ W,

thus proving that W ≤ V .


For the second part of the proposition, we observe that on the one hand,
every w ∈ W is of the form
w = w1 + ⋅ ⋅ ⋅ + wn ,
with wi ∈ Wi , proving that
$$W \le \operatorname{Span}\left( \bigcup_{i=1}^{n} W_i \right).$$

On the other hand, since W ≤ V contains the union of the Wi ’s, it follows by Lemma 3.7 that

$$\left\langle \bigcup_{i=1}^{n} W_i \right\rangle \le W,$$

and by Theorem 3.13,

$$\operatorname{Span}\left( \bigcup_{i=1}^{n} W_i \right) \le W,$$

which completes the proof. ∎

Exercises
(harder) 3.38 Let W1 , W2 be linear subspaces of a vector space V , such
that
W1 + W2 = V and W1 ∩ W2 = {0V }.
Prove that for every vector v ∈ V there exist unique vectors w1 ∈ W1 and
w2 ∈ W2 , such that
v = w1 + w2 .
Solution 3.38: Since V = W1 + W2 , by definition every v ∈ V can be represented
as
v = w1 + w2 ,
where w1 ∈ W1 and w2 ∈ W2 . Suppose that
v = u1 + u2 ,
where u1 ∈ W1 and u2 ∈ W2 . Then,
w1 − u1 = u2 − w2 .
The left-hand side is an element of W1 and the right-hand side is an element of W2 . Since
the only element belonging to both subspaces is 0V , it follows that w1 − u1 = u2 − w2 = 0V ,
i.e., u1 = w1 and u2 = w2 , thus proving the uniqueness of the representation.

3.4 Bases and dimensions


In the previous section we considered the linear subspace generated (or equiv-
alently, spanned) by a set of vectors. Clearly, the whole space spans itself.
A question of interest is to characterize minimal sets of vectors spanning the
whole space, where minimality is in the sense that if any vector in the set is
removed, then the set no longer spans the entire space. As we will see, if there
exists a finite set of vectors spanning the space, then it is possible to define a
dimension for that space, in the same sense as a line is one-dimensional and
a plane is two-dimensional.

3.4.1 Linear dependence


Definition 3.21 Let V be a vector space over F. Let S ⊆ V . We say that
a vector v ∈ V is linearly-dependent (!‫ )תלוי לינארית‬on S if
v ∈ Span S,
i.e., if we can compose v as a linear combination of vectors in S.

Example: It is always true that 0V is linearly dependent on any set S (even


empty), as 0V is in the span of every subset. ▲▲▲

Example: Let V = (R3 , +, R, ⋅). Then, the vector v = (1, 1, 0) is linearly-


dependent on S = {(1, 0, 0), (0, 1, 0)}; it is also linearly-dependent on S =
{(1, 0, 0), (0, 1, 0), (0, 0, 1)}, but it is not linearly-dependent on {(1, 0, 0), (0, 0, 1)}.
Note also that (1, 0, 0) is dependent on S = {(0, 1, 0), v}. ▲▲▲

Example: Let V = (R2 , +, R, ⋅) and consider the vectors


u = (1, 0),  v = (−1/2, √3/2)  and  w = (−1/2, −√3/2).

[Figure: the three unit vectors u, v, w, at angles 0°, 120° and 240°.]

Then, each vector is linearly-dependent on the other two. Furthermore, we note


that
1 ⋅ u + 1 ⋅ v + 1 ⋅ w = 0.
▲▲▲
This last example motivates the following definition:

Definition 3.22 Let V be a vector space over F. A set (possibly infinite)


S ⊆ V is called linearly-dependent if there exists an n ∈ N, a sequence of
distinct vectors (v1 , . . . , vn ) ⊂ S and a sequence of scalars (a1 , . . . , an ) ⊂ F
not all of which are 0F , such that

a1 v1 + ⋅ ⋅ ⋅ + an vn = 0V .

The set is called linearly-independent (!‫ )בלתי תלוי לינארית‬if it is not


linearly-dependent.

Example: Let V = (R2 , +, R, ⋅) and let

S = {(1, 0), (0, 1), (1, 1)}.

This set is linearly-dependent because

1 ⋅ (1, 0) + 1 ⋅ (0, 1) + (−1) ⋅ (1, 1) = (0, 0) = 0V .

On the other hand, the set {(1, 0), (0, 1)} is linearly-independent because

a1 (1, 0) + a2 (0, 1) = (a1 , a2 )

equals 0V only if a1 = a2 = 0. ▲▲▲

Comment: Be aware of the difference between a set (!‫ )קבוצה‬and a se-


quence (!‫)סדרה‬. A set is a collection of elements with no notion of order
among them; moreover, every element only counts once, e.g., {1} ∪ {1} = {1}.
A sequence, on the other hand, is an assignment of an element of a set to
ordinal positions (first, second etc.). In particular, the same element may
appear repeatedly in different positions of that sequence.

Comment: We can reformulate the properties of linear-dependence and


linear-independence using matrix notation. Given a sequence of vectors
(v1 , . . . , vn ), we may express linear combinations of that sequence via multi-
plication by a column vector c ∈ Fncol ,

c1 v1 + ⋅ ⋅ ⋅ + cn vn = (v1 . . . vn ) c.

Then, a set of vectors S is linearly-independent if for every sequence of dis-


tinct elements (v1 , . . . , vn ) in S,

(v1 . . . vn ) c = 0

if and only if c ∈ Fncol is the zero vector.

Proposition 3.23 Let V be a vector space over F. Let S ⊆ V . Then, the


following statements are equivalent:

(a) S is linearly-dependent.
(b) There exists a vector v ∈ S which is dependent on S ∖ {v}.

Proof : (a) ⇒ (b): Suppose that S is linearly-dependent. By definition, there


exists a sequence of distinct vectors (v1 , . . . , vn ) in S and a sequence of scalars
(a1 , . . . , an ), not all zero, such that

a1 v1 + ⋅ ⋅ ⋅ + an vn = 0.

Let j ∈ {1, . . . , n} be such that aj ≠ 0 (at least one such j exists). Then,

$$v_j = -\sum_{i \ne j} (a_i / a_j)\, v_i,$$

proving that vj ∈ Span(S ∖ {vj }).


(b) ⇒ (a): Suppose that there exists a v ∈ S, such that v ∈ Span S ∖ {v}.
That is, there exists a sequence of distinct vectors (v1 , . . . , vn ) ⊂ S ∖ {v} and
a sequence of scalars (a1 , . . . , an ), such that

v = a1 v 1 + ⋅ ⋅ ⋅ + an v n .

Setting vn+1 = v and an+1 = (−1), we obtain that (v1 , . . . , vn , vn+1 ) are distinct
vectors in S satisfying

a1 v1 + ⋅ ⋅ ⋅ + an vn + an+1 vn+1 = 0,

i.e., S is linearly-dependent. ∎
What makes a set S ⊆ V linearly-independent? If for every sequence (v1 , . . . , vn ) ⊂
S of distinct vectors and every sequence (a1 , . . . , an ) of scalars,

a1 v 1 + ⋅ ⋅ ⋅ + an v n = 0

if and only if a1 = ⋅ ⋅ ⋅ = an = 0.

Proposition 3.24 Let V be a vector space over F. Let S ⊂ V . Then, the


following statements are equivalent:

(a) S is linearly-independent.
(b) Every v ∈ S is linearly-independent of S ∖ {v}.

Proof : This is just a reformulation of the previous proposition in negated


form. ∎
The following claims are quite immediate:

Proposition 3.25 Let V be a vector space over F.

(a) A set containing a linearly-dependent subset is linearly-dependent.


(b) A subset of a linearly-independent set is linearly-independent.
(c) Any set containing 0V is linearly-dependent.
(d) A set S is linearly-independent if and only if every finite subset of S is
linearly-independent.

Proof :

(a) Suppose that S ⊂ V contains a subset T ⊂ S which is linearly-dependent.


By definition, there exist distinct vectors (v1 , . . . , vn ) ⊂ T and scalars
(a1 , . . . , an ) ⊂ F, not all of which are zero, such that
a1 v1 + ⋅ ⋅ ⋅ + an vn = 0.
Since (v1 , . . . , vn ) ⊂ S, it follows by definition that S is linearly-dependent.
(b) Let S be linearly-independent and let T ⊂ S be a non-empty subset. If
T were linearly-dependent, it would follow from the first item that S is
linearly-dependent, which is a contradiction. Hence every non-empty
subset of S is linearly-independent.
(c) Suppose that 0V ∈ S. Then, taking n = 1, v1 = 0V and a1 = 1F , we
obtain that
a1 v1 = 1F ⋅ 0V = 0V ,
hence S is linearly-dependent.
(d) By the second item, if S is linearly-independent, then any of its subsets,
whether finite or not, is linearly-independent. Conversely, suppose that
every subset of S is linearly independent. If S is linearly-dependent,
then there exist distinct (v1 , . . . , vn ) ⊂ S and scalars (a1 , . . . , an ) ⊂ F,
not all of which are zero, such that
a1 v1 + ⋅ ⋅ ⋅ + an vn = 0.
This implies that the finite subset {v1 , . . . , vn } of S is linearly-dependent,
in contradiction. Hence, S is linearly-independent.

Exercises

(easy) 3.39 Why did we insist in Definition 3.22 that the vectors vi be
distinct?
Solution 3.39: If we allow vectors to repeat, e.g., S = {v}, and v1 = v2 = v, then
1F ⋅ v1 + (−1F ) v2 = 0.
Without this restriction, every non-empty set of non-zero vectors would be linearly-
dependent.

(easy) 3.40 Let v ∈ V . Show that the set {v} is linearly-dependent if and
only if v = 0V .

Solution 3.40: If v = 0V , then


1F ⋅ v = 0,
so by definition {v} is linearly-dependent. If v ≠ 0V , then the only non-trivial linear
combinations of elements of {v} are
av
with a ≠ 0F , and such a linear combination cannot be the zero vector.

(easy) 3.41 Show that if two vectors are linearly-dependent, then one is a
(scalar) multiple of the other.

Solution 3.41: Suppose that the set {u, v} is linearly-dependent. This means that
there exists a non-trivial linear combination that vanishes,

a u + b v = 0V .

If a ≠ 0F then u = −a−1 b v; if b ≠ 0F then v = −b−1 a u. In either case, one is a scalar multiple


of the other.

(intermediate) 3.42 Find three vectors in (R3 , +, R, ⋅) that are linearly-dependent, but every pair of them is linearly-independent.

Solution 3.42: Just restrict the vectors to a plane. For instance,


u = (1, 0, 0) v = (0, 1, 0) and w = (1, 1, 0).

(intermediate) 3.43 Let V = (R4 , +, R, ⋅). Are the vectors

v1 = (1, 1, 2, 4) v2 = (2, −1, −5, 2)


v3 = (1, −1, −4, 0) v4 = (2, 1, 1, 6)

linearly-independent?

Solution 3.43: This amounts to asking whether the linear system

$$\begin{bmatrix} 1 & 2 & 1 & 2 \\ 1 & -1 & -1 & 1 \\ 2 & -5 & -4 & 1 \\ 4 & 2 & 0 & 6 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}$$

(the columns of the coefficient matrix are v1 , v2 , v3 , v4 ) has only the trivial solution. We know that this is the case if and only if the matrix
of coefficients is row-equivalent to the unit matrix. Reducing the matrix,

$$\begin{bmatrix} 1 & 2 & 1 & 2 \\ 1 & -1 & -1 & 1 \\ 2 & -5 & -4 & 1 \\ 4 & 2 & 0 & 6 \end{bmatrix} \to \begin{bmatrix} 1 & 0 & -1/3 & 4/3 \\ 0 & 1 & 2/3 & 1/3 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}.$$
⎣ ⎣
Hence they are linearly-dependent. Indeed, for example
v1 − 2v2 + 3v3 = (0, 0, 0, 0).
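A sympy sketch verifying this computation (added for checking, not part of the original solution):

```python
from sympy import Matrix

# the columns of A are v1, v2, v3, v4
A = Matrix([[1, 2, 1, 2],
            [1, -1, -1, 1],
            [2, -5, -4, 1],
            [4, 2, 0, 6]])
R, pivots = A.rref()
print(R)       # the reduced form above, with two zero rows
print(pivots)  # (0, 1): only two pivot columns, so the columns are dependent
print(A.col(0) - 2*A.col(1) + 3*A.col(2))  # the zero vector
```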

(intermediate) 3.44 Consider the set C2 and let


u = (1 − ı, 3 + ı) and v = (1, 1 + 2ı).

(a) Is {u, v} linearly-dependent in (C2 , +, C, ⋅)?


(b) Is {u, v} linearly-dependent in (C2 , +, R, ⋅)?

Solution 3.44:
(a) Suppose that a, b ∈ C satisfy
a(1 − ı, 3 + ı) + b(1, 1 + 2ı) = (0, 0).
That is,
(1 − ı)a + b = 0 and (3 + ı)a + (1 + 2ı)b = 0.
The homogeneous linear system
$$\begin{bmatrix} 1-\imath & 1 \\ 3+\imath & 1+2\imath \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

has non-trivial solutions (check that the determinant of the matrix vanishes), hence
the vectors u and v are linearly-dependent.

(b) For a, b ∈ R, we get separate equations for the real and imaginary parts,
$$\begin{bmatrix} 1 & 1 \\ -1 & 0 \\ 3 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix},$$
And you can check that this system only has trivial solutions, hence the vectors u
and v are linearly-independent.
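Both parts can be checked with sympy (an added sketch; sympy's I denotes the imaginary unit ı):

```python
from sympy import Matrix, I, simplify

u = Matrix([1 - I, 3 + I])
v = Matrix([1, 1 + 2*I])

# Over C: the matrix with columns u, v is singular; in fact u = (1 - I) v
print(u.row_join(v).det())      # 0
print(simplify(u - (1 - I)*v))  # the zero vector

# Over R: the 4x2 real system obtained from real and imaginary parts
M = Matrix([[1, 1], [-1, 0], [3, 1], [1, 2]])
print(M.nullspace())  # []: only the trivial real combination vanishes
```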

(intermediate) 3.45 Let V be a vector space over F and let U1 , U2 ≤ V


such that U1 ∩ U2 = {0V }. Let L1 ⊆ U1 and L2 ⊆ U2 be linearly-independent
sets. Show that L1 ∪ L2 is linearly-independent.
Solution 3.45: Let u1 , . . . , un ∈ L1 , v1 , . . . , vm ∈ L2 and a1 , . . . , an , b1 , . . . , bm ∈ F satisfy
(a1 u1 + ⋅ ⋅ ⋅ + an un ) + (b1 v1 + ⋅ ⋅ ⋅ + bm vm ) = 0V .
Then,
a1 u1 + ⋅ ⋅ ⋅ + an un = −(b1 v1 + ⋅ ⋅ ⋅ + bm vm ).
The left-hand side is an element of U1 and the right-hand side is an element of U2 . Since
their intersection contains only the zero vector, it follows that
(a1 u1 + ⋅ ⋅ ⋅ + an un ) = (b1 v1 + ⋅ ⋅ ⋅ + bm vm ) = 0V ,
but since L1 and L2 are both linearly-independent, it follows that all the coefficients vanish,
i.e., L1 ∪ L2 is linearly-independent.

(intermediate) 3.46 In conjunction with the previous exercise, could we


omit the condition that U1 ∩ U2 = {0V }?
Solution 3.46: No. Take, for example, V = R, U1 = U2 = V , L1 = {1} and L2 = {2}.
Then, L1 and L2 are each linearly-independent, however L1 ∪ L2 = {1, 2} is linearly-dependent.

(intermediate) 3.47 Consider the vector space (F2col , +, F, ⋅). Show that
the set
$$S = \left\{ \begin{bmatrix} a \\ b \end{bmatrix}, \begin{bmatrix} c \\ d \end{bmatrix} \right\}$$
is linearly-independent if and only if ad − bc ≠ 0.

Solution 3.47: The question is under what conditions on a, b, c, d, the equation


$$x \begin{bmatrix} a \\ b \end{bmatrix} + y \begin{bmatrix} c \\ d \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

only has trivial solutions. That is, under what conditions

$$\begin{bmatrix} a & c \\ b & d \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

only has trivial solutions. We know that the answer is if and only if the determinant ad−bc
does not vanish.

3.4.2 Bases
Definition 3.26 Let V be a vector space over F. A subset S ⊆ V is called
a generating set (!‫ )קבוצה יוצרת‬if

Span S = V.

It is called a basis (!‫ )בסיס‬for V if it is a linearly-independent generating set.


A vector space having a finite basis is called finite-dimensional (‫ממימד‬
!‫)סופי‬, or finitely-generated (!‫)נוצר סופי‬. Otherwise, it is called infinite-
dimensional.

Example: Let V = (R2 , +, R, ⋅). Then,

S = {(1, 0), (0, 1), (1, 1)} ⊂ V

is not a basis because it is linearly-dependent; on the other hand, it spans


V , i.e., it is a generating set. Likewise,

T = {(1, 0)} ⊂ V

is not a basis because it does not span V ; for example,

(1, 1) ∉ Span T.

▲▲▲

Example: Let V = (Rn , +, R, ⋅) for some n ∈ N. The set of vectors

e1 = (1, 0, 0, . . . , 0, 0)
e2 = (0, 1, 0, . . . , 0, 0)
⋮
en = (0, 0, 0, . . . , 0, 1)

is a basis called the standard basis (!‫)הבסיס הסטנדרטי‬. We leave it as an


exercise to prove that this is indeed a basis. ▲▲▲

Example: Consider the vector space (C, +, C, ⋅). Then,

S = {1}

is a basis. Why? A singleton containing a non-zero vector is always linearly-


independent. On the other hand, Span S = C, as every z ∈ C can be written
as
z = z ⋅ 1,
where z on the left-hand side is viewed as a vector, whereas z on the right-
hand side is viewed as a scalar. In fact, {1} is a basis for every field viewed
as a vector space over itself. ▲▲▲

Example: Consider the vector space (C, +, R, ⋅). Then,

S = {1}

is not a basis because, for example, ı ∉ Span{1}. On the other hand,

T = {1, ı}

is a basis. I recommend thinking again about the difference between the last
two examples. ▲▲▲

Example: Let A ∈ GLn (F) and consider

S = {Coli (A) ∶ i = 1, . . . , n} ⊂ Fncol .

We claim that S is a basis for V = Fncol . There are two things to show: that
S is a linearly-independent set and that S spans Fncol (i.e., that the column
space of A is Fncol ).

Let x ∈ Fncol be such that

$$\sum_{i=1}^{n} x^i\, \mathrm{Col}_i(A) = 0_V.$$

Noting that for every j = 1, . . . , n,

$$\left( \sum_{i=1}^{n} x^i\, \mathrm{Col}_i(A) \right)^{\!j} = \sum_{i=1}^{n} a^j_i\, x^i = (Ax)^j,$$

it follows that Ax = 0V . Since A is invertible, it follows that x = 0V , proving


that S is linearly-independent.
It remains to show that S spans V . Let c ∈ V . Since A is invertible, the
linear system Ax = c is solvable, i.e., there exists a linear combination of the
columns of A equal to c, proving that S spans V . ▲▲▲
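A concrete sympy illustration of this argument (the 2 × 2 invertible matrix below is an arbitrary example):

```python
from sympy import Matrix

A = Matrix([[2, 1], [1, 1]])  # det(A) = 1, so A is invertible
print(A.nullspace())          # []: Ax = 0 only trivially, columns independent
c = Matrix([3, 5])
print(A.solve(c))             # (-2, 7)^T, i.e. c = -2 Col1(A) + 7 Col2(A)
```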

Example: Thus far, all the vector spaces in this section were finitely-
generated. Consider now the vector space of polynomials F[X]. This space
is infinite-dimensional. Why? Let P1 , . . . , Pn ∈ F[X] be a finite set of poly-
nomials; we will show that it cannot span F[X]. Let
$$N = \max_{i=1,\dots,n} \deg P_i.$$

Then, each Pi can be written as

$$P_i = \sum_{j=0}^{N} c_{ij} X^j,$$

where some of the cij may be zero. For any scalars a1 , . . . , an ,

$$\sum_{i=1}^{n} a_i P_i = \sum_{j=0}^{N} \left( \sum_{i=1}^{n} a_i c_{ij} \right) X^j.$$

It follows, for example, that X^{N+1} is not in the span of {P1 , . . . , Pn }. ▲▲▲


We next provide two additional characterizations of bases.

Definition 3.27 Let V be a vector space over F. A subset L ⊂ V is called


maximally linearly-independent (!‫ )בלתי תלויה מקסימלית‬if it is linearly-
independent, and for every v ∈ V ∖ L, L ∪ {v} is linearly-dependent.

Proposition 3.28 Every maximally linearly-independent set is a basis.

Proof : Let L ⊂ V be maximally linearly-independent. In order to show that


it is a basis, it only remains to show that it is a generating set. Suppose it
were not a generating set. By definition, there exists a v ∉ Span L. We claim
that L ∪ {v} is linearly-independent, in contradiction to L being maximally
linearly-independent. Indeed, if L ∪ {v} was linearly-dependent, there would
exist (v1 , . . . , vn ) ⊂ L, (a1 , . . . , an ) ⊂ F and a ∈ F, such that (a1 , . . . , an , a) are
not all zero, and
a1 v1 + ⋅ ⋅ ⋅ + an vn + av = 0V .
We argue that a ≠ 0, for otherwise, it would imply that L is not linearly-
independent. Thus,
$$v = \sum_{i=1}^{n} (-a_i/a)\, v_i,$$

in contradiction to v ∉ Span L. We conclude that V = Span L, hence L is a basis. ∎

Definition 3.29 Let V be a vector space over F. A subset G ⊂ V is called


minimally-generating (!‫ )יוצרת מינימלית‬if it is generating, and for every
v ∈ G, G ∖ {v} is not generating.

Proposition 3.30 Every minimally-generating set is a basis.

Proof : Let G ⊂ V be minimally-generating. Since it is a generating set, it


remains to prove that it is linearly-independent. Suppose it weren’t linearly-
independent. This implies the existence of a v ∈ G, such that

v = a1 v 1 + ⋅ ⋅ ⋅ + an v n

for some (v1 , . . . , vn ) ⊂ G ∖ {v} and (a1 , . . . , an ) ⊂ F. We claim that G ∖ {v}


is a generating set in contradiction to the minimality of G. Indeed, since G
is a generating set, every u ∈ V can be written in the form

u = b1 u1 + ⋅ ⋅ ⋅ + bm um ,

for some (u1 , . . . , um ) ⊂ G and (b1 , . . . , bm ) ⊂ F. Now either the ui do not include v, in which case

u ∈ Span(G ∖ {v}),

or, if one of the ui equals v, we can substitute for v its expression as a linear combination of elements of G ∖ {v}; in either case

u ∈ Span(G ∖ {v}),

showing that G ∖ {v} is a generating set, a contradiction. ∎

Proposition 3.31 Let V be a vector space over F. Let G ⊂ V be a finite


generating set and let L ⊆ G be linearly-independent. Then, there exists a
basis B for V , such that
L ⊆ B ⊆ G.
In other words, every linearly-independent set contained in a generating
set can be expanded into a basis contained in that generating set. Alternatively,
every generating set containing a linearly-independent set can be reduced to
a basis containing that linearly-independent set.

Proof : Start with L, and add to it vectors in G, as long as the set remains
linearly-independent. This process must terminate, as G is a finite set. Con-
sider the resulting set L ⊂ B ⊂ G. By construction, B is linearly-independent,
and for every v ∈ G ∖ B we obtain that B ∪ {v} is linearly-dependent. It
follows that every such v is in the span of B, i.e.,

G ∖ B ⊂ Span B,

from which follows at once that

G ⊂ Span B,

hence
V = Span G ⊂ Span B ⊂ V,
i.e., B is a generating set, hence a basis. ∎

Corollary 3.32 Every finitely-generated vector space has a basis (which in


particular is finite).

Proof : Apply the previous proposition with L = ∅. ∎


As it turns out, every vector space has bases; to show it in the general case
is much more involved, and relies on a fundamental axiom of set theory
called the axiom of choice (!‫)אקסיומת הבחירה‬. You are welcome to read in
Wikipedia about Hamel bases.

Exercises

(easy) 3.48 Prove that for every field F, {1F } is a basis for (F, +, F, ⋅).

Solution 3.48: Every non-zero singleton (a set containing only one vector) is linearly-
independent. It is generating because to every a ∈ F (viewed as a vector) corresponds a ∈ F
(viewed as a scalar), such that a = a ⋅ 1F .

(easy) 3.49 Find a basis for (C2 , +, C, ⋅).

Solution 3.49: The standard basis {(1, 0), (0, 1)} is a basis for C2 over C.

(easy) 3.50 Find a basis for (C2 , +, R, ⋅).

Solution 3.50: The set {(1, 0), (ı, 0), (0, 1), (0, ı)} is a basis for C2 over R.

(intermediate) 3.51 Find a basis for the subspace of R4 spanned by the


four vectors of Exercise 3.43.

Solution 3.51: Recall that


v1 = (1, 1, 2, 4) v2 = (2, −1, −5, 2)
v3 = (1, −1, −4, 0) v4 = (2, 1, 1, 6).

The vectors v1 and v2 are linearly-independent (one is not a scalar multiple of the other).
To show that they form a basis, it suffices to show that v3 and v4 are linear combinations

of v1 and v2 (for then any vector in the span of all four is in the span of the first two).
Indeed,
$v_3 = -\tfrac{1}{3} v_1 + \tfrac{2}{3} v_2$ and $v_4 = \tfrac{4}{3} v_1 + \tfrac{1}{3} v_2$.
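A sympy check of these coefficients (an added sketch, not part of the original solution):

```python
from sympy import Matrix, linsolve, symbols

a, b = symbols('a b')
v1 = Matrix([1, 1, 2, 4]);   v2 = Matrix([2, -1, -5, 2])
v3 = Matrix([1, -1, -4, 0]); v4 = Matrix([2, 1, 1, 6])
M = v1.row_join(v2)             # 4x2 matrix whose columns are v1, v2
print(linsolve((M, v3), a, b))  # {(-1/3, 2/3)}
print(linsolve((M, v4), a, b))  # {(4/3, 1/3)}
```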

(intermediate) 3.52 Show that the vectors

v1 = (1, 0, −1) v2 = (1, 2, 1) v3 = (0, −3, 2)

form a basis for (R3 , +, R, ⋅). Write each of the standard basis vectors as a
linear combination of v1 , v2 , v3 .

Solution 3.52: We need to show that the linear system


xv1 + yv2 + zv3 = (a, b, c)

has a unique solution for every a, b, c ∈ R. Writing it in matrix form


$$\begin{bmatrix} 1 & 1 & 0 \\ 0 & 2 & -3 \\ -1 & 1 & 2 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} a \\ b \\ c \end{bmatrix}.$$

A direct calculation shows that the matrix of coefficients is row-equivalent to the unit matrix, hence v1 , v2 , v3 are both linearly-independent and generating. To write, for example, the basis vector e1 as a linear combination of v1 , v2 , v3 , solve the linear system

$$\begin{bmatrix} 1 & 1 & 0 \\ 0 & 2 & -3 \\ -1 & 1 & 2 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}.$$
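Since the matrix of coefficients is invertible, all three coordinate vectors can be read off its inverse; a short sympy sketch (added for illustration):

```python
from sympy import Matrix

V = Matrix([[1, 1, 0], [0, 2, -3], [-1, 1, 2]])  # columns are v1, v2, v3
print(V.det())  # 10: non-zero, so (v1, v2, v3) is a basis
print(V.inv())  # column j holds the coordinates of e_j; for example,
                # e1 = (7 v1 + 3 v2 + 2 v3) / 10
```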

(intermediate) 3.53 Let V = (M2×2 (F), +, F, ⋅) and consider the subsets

$$W_1 = \left\{ \begin{bmatrix} a & -a \\ b & c \end{bmatrix} : a, b, c \in F \right\},$$

and

$$W_2 = \left\{ \begin{bmatrix} a & b \\ -a & c \end{bmatrix} : a, b, c \in F \right\}.$$

(a) Prove that W1 , W2 ≤ V .



(b) Prove that W1 + W2 ≤ V (repeat the proof which was given for the
general case).
(c) Find bases for W1 , W2 , W1 + W2 and W1 ∩ W2 .

Solution 3.53:
(a) Just check that W1 , W2 are not empty and closed under vector addition and scalar
multiplication.
(b) Let u, v ∈ W1 + W2 . By definition, there exist u1 , v1 ∈ W1 and u2 , v2 ∈ W2 , such
that
u = u1 + u2 and v = v1 + v2 .
Thus,

u + v = (u1 + u2 ) + (v1 + v2 ) = (u1 + v1 ) + (u2 + v2 ) ∈ W1 + W2 .

Likewise, for a ∈ F,

a v = a (v1 + v2 ) = a v1 + a v2 ∈ W1 + W2 ,

proving that W1 + W2 is a linear subspace of V .


(c)

$$S_1 = \left\{ \begin{bmatrix} 1 & -1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \right\} \quad\text{and}\quad S_2 = \left\{ \begin{bmatrix} 1 & 0 \\ -1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \right\}$$

are bases for W1 and W2 . Since any 2×2 matrix can be written as a sum of elements in W1 and W2 ,

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} a/2 & -a/2 \\ c + a/2 & d/2 \end{bmatrix} + \begin{bmatrix} a/2 & b + a/2 \\ -a/2 & d/2 \end{bmatrix},$$

it follows that W1 + W2 = M2×2 (F), and we may take for basis

$$\left\{ \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \right\}.$$

Finally,

$$\left\{ \begin{bmatrix} 1 & -1 \\ -1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \right\}$$

is a basis for W1 ∩ W2 .

(intermediate) 3.54 Consider the matrix


$$A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix} \in M_3(\mathbb{R}).$$
This matrix defines two subspaces of R3col : the column space and the space
of solutions to AX = 0. Find a basis for each subspace. Show that those
subspaces are different.

Solution 3.54: Since the first and third columns are identical, it is easy to see that the
first two columns are a basis for the column space of A. On the other hand, the space
of solutions of AX = 0 is obtained by reducing the matrix,

$$\begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix} \to \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}.$$
The space of solutions is all vectors of the form [−a, 0, a]T , for which a basis is the singleton
{[1, 0, −1]T }.
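A sympy sketch computing both subspaces (an added check; columnspace() and nullspace() return bases as lists of column vectors):

```python
from sympy import Matrix

A = Matrix([[1, 0, 1], [0, 1, 0], [1, 1, 1]])
print(A.columnspace())  # two basis columns for C(A): (1,0,1)^T and (0,1,1)^T
print(A.nullspace())    # one basis column for the solution space: (-1,0,1)^T
```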

(intermediate) 3.55 Let S be a non-empty finite set and let V = Func(S, F)


(we saw that this is a vector space over F). For each t ∈ S, denote by ft ∶ S → F
the function defined by

$$f_t(s) = \begin{cases} 1 & s = t \\ 0 & \text{otherwise.} \end{cases}$$

Show that
{ft ∶ t ∈ S}
is a basis for V .

Solution 3.55: Let S = {s1 , . . . , sn } and let f ∶ S → F be given by


f (si ) = ai

for every i = 1, . . . , n. Then,


f = a1 fs1 + ⋅ ⋅ ⋅ + an fsn ,
proving that {ft ∶ t ∈ S} generates Func(S, F). To show that this set is independent, let

a1 fs1 + ⋅ ⋅ ⋅ + an fsn = 0,

where the right-hand side is the function returning zero for every input. Substituting sj
on both sides, we obtain that aj = 0, proving that only a trivial combination of the ft ’s
can yield the zero function.

(harder) 3.56 Let V be a vector space over F. Let G ⊆ V be a generating


set and let L ⊆ V be linearly-independent. Show that for every u ∈ L ∖ G
there exists a v ∈ G ∖ L such that

(G ∖ {v}) ∪ {u}

is generating, and
(L ∖ {u}) ∪ {v}
is linearly-independent. This fact is known as the exchange lemma.

Solution 3.56: Let u ∈ L ∖ G. Since G is a generating set, there exist v1 , . . . , vn ∈ G


and a1 , . . . , an ∈ F, such that
u = a1 v1 + ⋅ ⋅ ⋅ + an vn .
Let’s take the minimal n for which such an identity holds, so that in particular, all the ai
are non-zero (note that u ≠ 0V as L is linearly-independent). Furthermore, at least one of
the vk ’s is not in Span(L ∖ {u}); otherwise, we would have u ∈ Span(L ∖ {u}), contradicting
the linear independence of L.
Without loss of generality, let vn be an element not in Span(L ∖ {u}). Then,

(L ∖ {u}) ∪ {vn }

is linearly-independent. On the other hand, the set (G ∖ {vn }) ∪ {u} is generating because
any vector can be written as a linear combination of vectors in G. If this combination
includes vn , then we can replace this term by a linear combination of u and elements in
G, showing that every vector can be written as a linear combination of u and vectors in
G, excluding vn .

(harder) 3.57 Let V be a vector space over F. A proper subspace W < V


is called a hyperplane (!‫ )על מישור‬if for every subspace W ≤ U ≤ V either
U = W or U = V . Show that if V is finitely-generated, dimF V ≥ 2, then there
exists a maximal hyperplane 0 < W < V . Show that for every v ∈ V ∖ W ,

V = Fv + W.

Solution 3.57: Let


B = (v1 , . . . , vn )
be a basis for V . Let
W = Span{v1 , . . . , vn−1 }.
We will show that W is a hyperplane. Let W ≤ U ≤ V . If U ≠ W , then there exists a
u ∈ U ∖ W . Since B is a basis for V , there exist scalars such that

u = a1 v1 + ⋅ ⋅ ⋅ + an vn .

We claim that an ≠ 0, otherwise u would be in W . It follows that

vn = (an )−1 u − (a1 /an )v1 − ⋅ ⋅ ⋅ − (an−1 /an )vn−1 .

We have just shown that vn ∈ U . It follows that

{v1 , . . . , vn } ⊂ U,

hence
V = Span{v1 , . . . , vn } ≤ U,
but this necessarily implies that U = V , proving that W is a hyperplane. Finally, every

v = b1 v1 + ⋅ ⋅ ⋅ + bn vn

can be written as

v = b1 v1 + ⋅ ⋅ ⋅ + bn−1 vn−1 + bn ((an )−1 u − (a1 /an )v1 − ⋅ ⋅ ⋅ − (an−1 /an )vn−1 )
= (b1 − bn a1 /an )v1 + ⋅ ⋅ ⋅ + (bn−1 − bn an−1 /an )vn−1 + (bn /an ) u,

which is an element of Fu + W , proving that V = Fu + W .

3.4.3 The dimension of a vector space


We are now in a position to define the dimension of a finitely-generated vector
space:

Proposition 3.33 Let V be a finitely-generated vector space over F. Let


G = {v1 , . . . , vn } ⊂ V be a (finite) generating set for V . Then, any linearly-
independent set of vectors has no more than n elements.

Proof : Let S ⊂ V have more than n elements, and let {u1 , . . . , un+1 } ⊂ S be
distinct vectors. By the definition of a generating set, each of the vectors ui
is in the span of that G, hence there exist (n + 1) × n scalars aij such that

uj = a1j v1 + ⋅ ⋅ ⋅ + anj vn for every j = 1, . . . , n + 1.

(Since G is finite, we may always consider vectors in its span as a linear


combination of all vi , with some coefficients being possibly zero.) For any
sequence of scalars (c1 , . . . , cn+1 ),
$$\sum_{j=1}^{n+1} c_j u_j = \sum_{j=1}^{n+1} c_j \left( \sum_{i=1}^{n} a_{ij} v_i \right) = \sum_{i=1}^{n} \left( \sum_{j=1}^{n+1} a_{ij} c_j \right) v_i.$$

Consider the n × (n + 1) matrix A whose entries are aij . It represents a system


of equations having more variables than equations. We know that for such a
system, the homogeneous equation has non-trivial solutions. That is, there
exists a non-zero $c \in F^{n+1}_{\mathrm{col}}$, such that

$$a_{i1} c_1 + \dots + a_{i,n+1} c_{n+1} = 0 \qquad \text{for all } i = 1, \dots, n.$$

For that c,
c1 u1 + ⋅ ⋅ ⋅ + cn+1 un+1 = 0V ,
proving that the set {u1 , . . . , un+1 } ⊂ S is linearly-dependent, hence so is
S (which contains a linearly-dependent set). It follows that any linearly-
independent set of vectors contains at most n vectors.
We may rewrite this proof in matrix form. Let the sequence (v1 , . . . , vn ) be
an ordered n-tuple containing the vectors in G, and let (u1 , . . . , um ), m > n,
be any sequence of m vectors. Since the {vi } are a generating set, there
exists an n × m matrix A, such that

(u1 . . . um ) = (v1 . . . vn ) A.

Since A has more columns than rows, there exists a non-zero c ∈ Fmcol such that Ac = 0, i.e.,

(u1 . . . um ) c = (v1 . . . vn ) Ac = 0V ,

proving that the vectors {ui } are linearly-dependent. ∎



Corollary 3.34 Let V be a finitely-generated vector space over F. Then,


every two bases have the same number of elements.

Proof : Let {v1 , . . . , vn } and {u1 , . . . , um } be two bases for V . By the previous proposition, since bases are by definition generating sets and linearly-independent, m ≤ n and n ≤ m, which completes the proof. ∎
This last corollary implies that we can associate with every non-zero finitely-
generated vector space a natural number which is the cardinality of any of
its bases. This number is called the dimension (!‫ )מימד‬of V , and is denoted

dimF V.

Note the explicit mention of the field F, as the same set of vectors may con-
stitute a vector space of different dimension depending on the field. The zero
space (which contains no independent sets of vectors) is assigned dimension
zero,
dimF {0V } = 0.
It follows from the last proposition that if dim V = n, then every set of vectors
containing more than n elements is linearly-dependent, and that no set of
vectors containing fewer than n elements can span V (see exercises).

Example: The vector space (C, +, C, ⋅) has dimension 1 (because {1} is a


basis),
dimC C = 1.
The vector space (C, +, R, ⋅) has dimension 2 (because {1, ı} is a basis),

dimR C = 2.

Generally, for any field F, the vector space (Fn , +, F, ⋅) has dimension n,

dimF Fn = n,

because the standard basis has n elements. ▲▲▲



Lemma 3.35 Let V be a vector space over F. Let S ⊂ V be linearly-


independent and let v ∈/ Span S. Then, S ∪ {v} is linearly-independent.

Proof : This was essentially proved in Proposition 3.28, but for the sake of
completeness, we repeat the proof. Suppose, by contradiction that S ∪ {v}
is linearly-dependent. Then, there exist distinct vectors v1 , . . . , vn ∈ S and
scalars c1 , . . . , cn , c, not all zero, such that

c1 v1 + ⋅ ⋅ ⋅ + cn vn + cv = 0V .

If c = 0, then this contradicts the linear-independence of S. If on the other


hand c ≠ 0, then
$$v = \sum_{i=1}^{n} (-c_i/c)\, v_i,$$

in contradiction to v not being in the span of S. ∎

Proposition 3.36 Let V be a finitely-generated vector space over F. Let


W ≤ V be a linear subspace. Then, every linearly-independent subset S ⊆ W
is part of a basis for W . In particular, since V ≤ V , every basis for W can
be extended to a basis for V .

Proof : If S spans W then it is a basis for W and we are done. Otherwise,


there exists a vector
v1 ∈ W ∖ Span S.
By the previous lemma, S ∪ {v1 } is linearly-independent. If it spans W then
it is a basis for W and we are done. Otherwise, there exists a vector

v2 ∈ W ∖ Span(S ∪ {v1 }).

By the previous lemma, S ∪ {v1 , v2 } is linearly-independent. We proceed


inductively. Eventually, after no more than dimF V steps (because there
exist at most dimF V linearly-independent vectors), we obtain a basis for W
containing S. ∎

Corollary 3.37 Let V be a finitely-generated vector space over F. Let W <


V be a proper linear subspace. Then,

dimF W < dimF V.

(In particular, W is finitely-generated.)

Proof : If W = {0V } then dimF W = 0 and we are done. Otherwise, there


exists a non-zero w ∈ W . By the previous proposition and its proof, there
exists a basis S for W containing w and having no more than dim V elements,
hence
dimF W ≤ dimF V.
Since W < V , there exists a vector v ∈ V ∖ W , hence not in the span of S.
It follows that S ∪ {v} is linearly-independent (as a collection of vectors in
V ), hence a basis for V contains at least dimF W + 1 vectors, from which we
conclude that dimF W < dimF V . ∎
In fact, the following holds:

Corollary 3.38 Let V be a finitely-generated vector space of dimension n


and let W ≤ V . Then,

dimF W = dimF V if and only if W = V.

Proof : One direction is immediate. For the other direction, suppose that
dimF W = dimF V . If W < V , then there exists a v ∈ V ∖ W . Let L be a
maximally-independent set for W ; then L ∪ {v} is independent, proving that
L is not maximally-independent for V , hence dimF W < dimF V , which is a
contradiction. ∎
Finally, a statement reminiscent of the inclusion-exclusion principle:

Proposition 3.39 Let W1 , W2 be finitely-generated linear subspaces of a


vector space V . Then, the linear subspace W1 + W2 is finitely-generated and

dimF (W1 + W2 ) = dimF W1 + dimF W2 − dimF (W1 ∩ W2 ).

Comment: The example you should have in mind is V = R3 , W1 being the


xy-plane and W2 being the xz-plane. Then, W1 + W2 = R3 and W1 ∩ W2 is
the x-axis. In this case,
$$\underbrace{\dim_{\mathbb{R}}(W_1 + W_2)}_{=3} = \underbrace{\dim_{\mathbb{R}} W_1}_{=2} + \underbrace{\dim_{\mathbb{R}} W_2}_{=2} - \underbrace{\dim_{\mathbb{R}}(W_1 \cap W_2)}_{=1}.$$
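The numbers in this example are easily verified with sympy (an added sketch; each subspace is represented by a spanning list of vectors):

```python
from sympy import Matrix

W1 = [Matrix([1, 0, 0]), Matrix([0, 1, 0])]  # spans the xy-plane
W2 = [Matrix([1, 0, 0]), Matrix([0, 0, 1])]  # spans the xz-plane

def dim_span(vecs):
    # dim Span = rank of the matrix whose columns are the given vectors
    return Matrix.hstack(*vecs).rank()

# W1 + W2 is spanned by the union of the two spanning lists
print(dim_span(W1 + W2), dim_span(W1), dim_span(W2))  # 3 2 2
# hence dim(W1 ∩ W2) = 2 + 2 - 3 = 1 (the x-axis)
```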

Proof : Note that V may not be finitely-generated, but since W1 ∩ W2 ≤


W1 , W2 , it follows from Corollary 3.37 that W1 ∩ W2 is finitely-generated. Let
dimF (W1 ∩ W2 ) = k,  dimF W1 = k + n,  and  dimF W2 = k + m
(a priori, k, m and n may be zero). Let {u1 , . . . , uk } be a basis for W1 ∩ W2 .
By Proposition 3.36, it is part of a basis
{u1 , . . . , uk } ∪ {v1 , . . . , vn }
for W1 and it is part of a basis
{u1 , . . . , uk } ∪ {w1 , . . . , wm }
for W2 . Clearly, the set
S = {u1 , . . . , uk } ∪ {v1 , . . . , vn } ∪ {w1 , . . . , wm }
spans W1 + W2 (convince yourself that this is true). If we show that S is also
linearly-independent then, by definition, it is a basis for W1 + W2 , in which
case dimF (W1 + W2 ) = k + m + n, proving the claim.
Suppose, by contradiction, that S is dependent. This implies that there exist
scalars a1 , . . . , ak , b1 , . . . , bn and c1 , . . . , cm , not all of which are zero, such that
$$\sum_{i=1}^{k} a_i u_i + \sum_{i=1}^{n} b_i v_i + \sum_{i=1}^{m} c_i w_i = 0_V.$$

Thus,
$$\sum_{i=1}^{m} c_i w_i = -\sum_{i=1}^{k} a_i u_i - \sum_{i=1}^{n} b_i v_i.$$

The left-hand side is in W2 , whereas the right-hand side is in W1 . Thus, both


sides are in W1 ∩ W2 . Moreover, they can’t be zero, otherwise either the {wi }
or the {ui } ∪ {vi } are not linearly-independent (recall that at least one of the
coefficients is non-zero). Thus, we conclude that at least one of the {ci } is
non-zero.
Since the vectors {u1 , . . . , uk } form a basis for W1 ∩ W2 , it follows that there
exist scalars d1 , . . . , dk , such that

$$\sum_{i=1}^{m} c_i w_i = \sum_{i=1}^{k} d_i u_i,$$

or
$$\sum_{i=1}^{m} c_i w_i - \sum_{i=1}^{k} d_i u_i = 0,$$

contradicting the fact that the vectors {u1 , . . . , uk }∪{w1 , . . . , wm } are linearly-
independent. Hence, S is linearly-independent, and therefore a basis. ∎
We end this section with a very important theorem:

Theorem 3.40 Let A ∈ Mn (F). Then, A is invertible if and only if its rows
form a linearly-independent set in Fnrow .

Proof : Suppose that the rows of A form a linearly-independent set in Fnrow .


Since dimF Fnrow = n, it follows that the rows of A form a basis, and in par-
ticular are a generating set. Thus, there exists for every i = 1, . . . , n a vector
[xi1 , . . . , xin ], such that

$$[x^i_1\ \cdots\ x^i_n]\, \underbrace{\begin{bmatrix} \mathrm{Row}^1(A) \\ \vdots \\ \mathrm{Row}^n(A) \end{bmatrix}}_{A} = [0\ \cdots\ 1\ \cdots\ 0],$$

where the right-hand side is a vector of zeros except for a 1 in the i-th column. Then,

$$\begin{bmatrix} x^1_1 & \cdots & x^1_n \\ \vdots & & \vdots \\ x^n_1 & \cdots & x^n_n \end{bmatrix} \begin{bmatrix} \mathrm{Row}^1(A) \\ \vdots \\ \mathrm{Row}^n(A) \end{bmatrix} = \begin{bmatrix} 1 & & \\ & \ddots & \\ & & 1 \end{bmatrix},$$

proving that A is invertible. Conversely, if A is invertible, let B = A−1 .


Suppose that c ∈ Fnrow satisfies cA = 0Fnrow . Then,

c = c I = c (AB) = (c A)B = 0Fnrow ,

i.e., the only vanishing linear combination of the rows of A is the trivial one,
proving that the rows of A are linearly-independent. ∎
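An illustration with sympy (the matrices are arbitrary examples): a vanishing combination of the rows of A is a row vector c with cA = 0, i.e., a transposed element of the null space of Aᵀ:

```python
from sympy import Matrix

A = Matrix([[1, 2], [3, 4]])   # rows are linearly-independent
B = Matrix([[1, 2], [2, 4]])   # Row2(B) = 2 Row1(B)
print(A.det(), A.T.nullspace())  # -2 []           : invertible
print(B.det(), B.T.nullspace())  # 0  [(-2, 1)^T]  : -2 Row1(B) + Row2(B) = 0
```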

Exercises

(easy) 3.58 Show that the vector space (M2×2 (F), +, F, ⋅) has dimension
four. More generally, show that the vector space (Mm×n (F), +, F, ⋅) has di-
mension mn.

Solution 3.58: The set


1 0 0 1 0 0 0 0
{[ ],[ ],[ ],[ ]}
0 0 0 0 1 0 0 1

is generating and independent. More generally, the mn matrices having a single entry equal to 1F and all other entries 0F form a basis for Mm×n (F).

(easy) 3.59 Let V be a vector space of dimension 3. Show that if U, W ≤ V


with dimF U = dimF W = 2, then U ∩ W ≠ {0V }.

Solution 3.59: Since


$$\underbrace{\dim_F(U + W)}_{\le 3} = \underbrace{\dim_F U}_{=2} + \underbrace{\dim_F W}_{=2} - \dim_F(U \cap W),$$

it follows that dimF (U ∩ W ) ≥ 1, hence U ∩ W ≠ {0V }.

(intermediate) 3.60 Let A ∈ Mn (F). Show that A is invertible if and only


if its columns form a linearly-independent set in Fncol .

Solution 3.60: We know that A is invertible if and only if the homogeneous system
AX = 0 has only trivial solutions, and if and only if the non-homogeneous system AX = b
is consistent for every b ∈ Fncol . Suppose that the columns of A are linearly-independent;
then they form a basis, so that there exists an x ∈ Fncol for which
$$[\mathrm{Col}_1(A)\ \cdots\ \mathrm{Col}_n(A)] \begin{bmatrix} x^1_1 \\ x^2_1 \\ \vdots \\ x^n_1 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}.$$

Repeating this n times, we obtain n such columns of x’s, such that

$$[\mathrm{Col}_1(A)\ \cdots\ \mathrm{Col}_n(A)] \begin{bmatrix} x^1_1 & \dots & x^1_n \\ \vdots & & \vdots \\ x^n_1 & \cdots & x^n_n \end{bmatrix} = \begin{bmatrix} 1 & & \\ & \ddots & \\ & & 1 \end{bmatrix},$$

from which we deduce that A is invertible. Conversely, suppose that A is invertible. Then,
to every b ∈ Fncol corresponds an x ∈ Fncol such that Ax = b, proving that the columns of A
generate Fncol . Since dimF Fncol = n, any generating set of size n is linearly-independent.

(intermediate) 3.61 Find a basis for the vector subspace




$$W = \left\{ x \in \mathbb{R}^5 : \begin{array}{l} x_1 + x_2 + x_3 + x_4 = 0 \\ x_3 + x_4 + x_5 = 0 \\ x_1 + x_2 + x_5 = 0 \\ x_1 + x_3 = 0 \end{array} \right\}.$$

What is its dimension?
Solution 3.61: We need to find all solutions of the homogeneous system
$$\begin{bmatrix} 1 & 1 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 & 1 \\ 1 & 1 & 0 & 0 & 1 \\ 1 & 0 & 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}.$$

Reducing the matrix of coefficients,

$$\begin{bmatrix} 1 & 1 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 & 1 \\ 1 & 1 & 0 & 0 & 1 \\ 1 & 0 & 1 & 0 & 0 \end{bmatrix} \to \begin{bmatrix} 1 & 0 & 0 & -1 & 0 \\ 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix},$$
we find that the space of solutions is

W = {(s, −s, −s, s, 0) ∶ s ∈ R} = Span {(1, −1, −1, 1, 0)} ,



and its dimension is one.
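A one-line sympy check (added; nullspace() returns a basis for the space of solutions of the homogeneous system):

```python
from sympy import Matrix

M = Matrix([[1, 1, 1, 1, 0],
            [0, 0, 1, 1, 1],
            [1, 1, 0, 0, 1],
            [1, 0, 1, 0, 0]])
print(M.nullspace())  # one basis vector, (1, -1, -1, 1, 0)^T, so dim W = 1
```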

(intermediate) 3.62 Let V be a finitely-generated vector space and let


U, W ≤ V . Which of the following assertions is true? Prove or find a counter
example.

(a) If 2 + dimF V ≤ dimF U + dimF W then V = U + W .


(b) If 2 dimF V ≤ dimF U + dimF W then V = U + W .
(c) If dimF V > dimF U + dimF W then V ≠ U + W .
(d) If dimF V > dimF U + dimF W then U ∩ W = {0}.

Solution 3.62:
(a) False. Let V = R4 and
U = W = Span{e1 , e2 , e3 }.
Then, dimR V = 4 and dimR U = dimR W = 3, so that 2 + dimR V ≤ dimR U + dimR W .
On the other hand, U + W = U = W ≠ V .
(b) True. In fact, this implies that dimF U = dimF W = dimF V , i.e., U = W = V , and in
particular U + W = V .
(c) True, for if V = U + W , then

dimF V = dimF U + dimF W − dimF (U ∩ W ) ≤ dimF U + dimF W.

(d) False. Take V = R3 and U = W = Rv for some v ∈ R3 . Then, dimR V = 3 and


dimR U = dimR W = 1, i.e., dimR V > dimR U + dimR W and yet U ∩ W = Rv ≠ {0}.

(intermediate) 3.63 Consider the linear subspaces of R4 ,

U = Span {(1, 0, −1, −2), (−1, −1, 0, 2), (1, 2, 1, −1)}

and
$$W = \left\{ x \in \mathbb{R}^4 : \begin{array}{l} x_1 + 3x_2 + x_3 - x_4 = 0 \\ x_2 - 3x_3 + 2x_4 = 0 \end{array} \right\}.$$
What is dimR (U + W )?

Solution 3.63: We start by identifying W . It is the space of solutions of


$$\begin{bmatrix} 1 & 3 & 1 & -1 \\ 0 & 1 & -3 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \quad\to\quad \begin{bmatrix} 1 & 0 & 10 & -7 \\ 0 & 1 & -3 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix},$$
so that
W = Span{(−10, 3, 1, 0), (7, −2, 0, 1)}.
Now,

Span{(1, 0, −1, −2), (−1, −1, 0, 2), (1, 2, 1, −1), (−10, 3, 1, 0), (7, −2, 0, 1)} = U + W,

so let's look at it. If those five vectors are a generating set for R4 , then U + W = R4 . So
the question is whether the system of equations, whose coefficient matrix has the five
spanning vectors as its columns,

$$\begin{bmatrix} 1 & -1 & 1 & -10 & 7 \\ 0 & -1 & 2 & 3 & -2 \\ -1 & 0 & 1 & 1 & 0 \\ -2 & 2 & -1 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} = \begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix}$$
is always solvable. You know how to proceed from here.

(harder) 3.64 Let V be a vector space over F, such that

dimF V = n.

Show that any set of vectors containing fewer than n vectors does not span V .

(harder) 3.65 Consider the vector space (R, +, Q, ⋅) (i.e. the vectors are
real numbers, the scalars are rational numbers, with the operations of vector
addition and scalar multiplication defined as usual in R). Prove that this
vector space is not finitely-generated. (Hint: start by convincing yourself
that {1} is not a basis for this space; the argument is based on the fact that
Q is countable, whereas R is not.)

Solution 3.65: Suppose that (R, +, Q, ⋅) were finitely-generated. Then there would exist
finitely many real numbers a1 , . . . , an generating R. Consider Span{a1 , . . . , an }: since Q is
countable, it is a countable set, whereas R is uncountable, a contradiction.

(harder) 3.66 Let V be a vector space of dimension n. What is the maximal


m for which there exist linear subspaces

W0 < W1 < ⋅ ⋅ ⋅ < Wm ?

Solution 3.66: The answer is n, since W0 < W1 < ⋅ ⋅ ⋅ < Wm implies that
0 ≤ dimF W0 < dimF W1 < ⋅ ⋅ ⋅ < dimF Wm ≤ n.

Such a sequence of subspaces always exists: take a basis

B = (v1 ... vn ) ,

and then
W0 = ⟨∅⟩,  W1 = ⟨v1 ⟩,  W2 = ⟨v1 , v2 ⟩,  etc.

3.4.4 The rank of a matrix


Let A ∈ Mm×n (F). We have defined two vector spaces associated with a matrix,

R(A) = Span{Rowi (A) ∶ i = 1, . . . , m} ⊆ Fnrow


C (A) = Span{Colj (A) ∶ j = 1, . . . , n} ⊆ Fmcol .

The row-rank (!‫ )דרגה לפי שורות‬of a matrix is the dimension of its row space,
whereas its column-rank (!‫ )דרגה לפי עמודות‬is the dimension of its column
space. Even though these two spaces are seemingly unrelated, it turns out
that
dimF R(A) = dimF C (A).
This joint dimension is called the rank (!‫ )דרגה‬of the matrix A.
We start with the row space:

Proposition 3.41 Let R be the row-reduced form of A. Then,

dimF R(A)

equals the number of non-zero rows in R.



Proof : Lemma 3.15 shows that


R(A) = R(R).
Let p be the number of non-zero rows in R. Then, the row space of R
(equivalently, of A) is spanned by the first p rows of R. These p rows are
linearly-independent because row j has an entry r_{j,k_j} = 1 in its pivot column k_j ,
while r_{i,k_j} = 0 for all i ≠ j. It follows that

dimF R(R) = p. ∎
We proceed with the column space: let R = P A with P ∈ GLm (F). We
have seen that the non-homogeneous system AX = b is solvable if and only
if b ∈ C (A). But this system is solvable if and only if the system
RX = P AX = P b
is solvable, i.e., if and only if P b ∈ C (R). That is,
b ∈ C (A) if and only if P b ∈ C (R).

Proposition 3.42 The column-rank of a matrix equals that of its row-


reduced form.

Proof : Let (v1 , . . . , vp ) be an ordered basis for C (A). If we show that


(P v1 , . . . , P vp ) is an ordered basis for C (R) then we are done. Let w ∈ C (R).
Then, P −1 w ∈ C (A), and there exist scalars (a1 , . . . , ap ) such that
P −1 w = a1 v1 + ⋅ ⋅ ⋅ + ap vp ,
from which we deduce that
w = P (a1 v1 + ⋅ ⋅ ⋅ + ap vp ) = a1 P v1 + ⋅ ⋅ ⋅ + ap P vp ,
proving that (P v1 , . . . , P vp ) generates C (R).
Suppose next that

a1 P v1 + ⋅ ⋅ ⋅ + ap P vp = 0Fmcol .

It follows that

P −1 (a1 P v1 + ⋅ ⋅ ⋅ + ap P vp ) = a1 v1 + ⋅ ⋅ ⋅ + ap vp = 0Fmcol ,

proving that (P v1 , . . . , P vp ) is independent, hence a basis for C (R). ∎

Proposition 3.43 Let R be the row-reduced form of A. Then,

dimF C (A)

equals the number of non-zero rows in R.

Proof : Let p be the number of non-zero rows in R. By the previous propo-


sition, it suffices to show that dimF C (R) = p. Take the p pivot columns Colkj (R),
j = 1, . . . , p. They are linearly-independent, and they span C (R), because they are the
first p standard basis columns of Fmcol and every column of R vanishes below its p-th entry. ∎

Example: Consider the matrix


$$A = \begin{bmatrix} 0 & 0 & 1 & 4 \\ 2 & 4 & 2 & 6 \\ 3 & 6 & 2 & 5 \end{bmatrix}.$$
You may verify that
$$\underbrace{\begin{bmatrix} 1 & 2 & 0 & -1 \\ 0 & 0 & 1 & 4 \\ 0 & 0 & 0 & 0 \end{bmatrix}}_{R} = \underbrace{\begin{bmatrix} 0 & -1 & 1 \\ 1 & 0 & 0 \\ -2 & 3 & -2 \end{bmatrix}}_{P} \underbrace{\begin{bmatrix} 0 & 0 & 1 & 4 \\ 2 & 4 & 2 & 6 \\ 3 & 6 & 2 & 5 \end{bmatrix}}_{A},$$

and that
$$\underbrace{\begin{bmatrix} 0 & 1 & 0 \\ 2 & 2 & 1 \\ 3 & 2 & 1 \end{bmatrix}}_{Q = P^{-1}} \underbrace{\begin{bmatrix} 1 & 2 & 0 & -1 \\ 0 & 0 & 1 & 4 \\ 0 & 0 & 0 & 0 \end{bmatrix}}_{R} = \underbrace{\begin{bmatrix} 0 & 0 & 1 & 4 \\ 2 & 4 & 2 & 6 \\ 3 & 6 & 2 & 5 \end{bmatrix}}_{A}.$$
Q=P −1 R A

Consider first the row space of R. It is spanned by two non-zero rows, hence
its dimension is at most 2; it is in fact equal to 2, because

a [1 2 0 −1] + b [0 0 1 4] = [0 0 0 0]

if and only if a = b = 0. Consider then the column space of R. It consists of


column vectors of length 3 whose last entry is zero; this space has dimension
at most 2. Its dimension is 2 because the first and third columns are linearly-
independent. Thus,
dimR R(R) = dimR C (R) = 2.

The question is why these are also the dimensions of the row and column
spaces of A. The easier part to see is the row space. The rows of A are
linear combinations of the rows of R and vice-versa, hence,
{Rowi (A) ∶ i = 1, 2, 3} ⊂ R(R) and {Rowi (R) ∶ i = 1, 2, 3} ⊂ R(A),
from which we deduce that R(A) = R(R), hence
dimR R(A) = 2.
The more surprising fact is that the column space of A has the same dimen-
sion as the column space of R, even though the two spaces are not identical.
The second column of R equals twice its first column,
Col2 (R) = 2 Col1 (R),
and the same holds for the column of A,
Col2 (A) = 2 Col1 (A).
Likewise,
Col4 (R) = 4 Col3 (R) − Col1 (R),
but also,
Col4 (A) = 4 Col3 (A) − Col1 (A).
In other words, the relations between the column of A are the same as the
relations between the columns of R.
Look again at the identity
$$\underbrace{\begin{bmatrix} 0 & 1 & 0 \\ 2 & 2 & 1 \\ 3 & 2 & 1 \end{bmatrix}}_{Q} \underbrace{\begin{bmatrix} 1 & 2 & 0 & -1 \\ 0 & 0 & 1 & 4 \\ 0 & 0 & 0 & 0 \end{bmatrix}}_{R} = \underbrace{\begin{bmatrix} 0 & 0 & 1 & 4 \\ 2 & 4 & 2 & 6 \\ 3 & 6 & 2 & 5 \end{bmatrix}}_{A}.$$

It states that the first and third columns of A are the first and third columns
of R, and that the other columns of A are linear combinations of those same
two columns of R. Hence dimR C (A) = 2. ▲▲▲
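The numbers of this example can be reproduced with sympy (an added sketch, not part of the original notes):

```python
from sympy import Matrix

A = Matrix([[0, 0, 1, 4], [2, 4, 2, 6], [3, 6, 2, 5]])
R, pivots = A.rref()
print(R)         # the row-reduced form R shown above
print(pivots)    # (0, 2): the first and third columns are pivot columns
print(A.rank())  # 2 = dim R(A) = dim C(A)
```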

3.5 Coordinates
3.5.1 Motivation
Consider the vector space (R2 , +, R, ⋅). It is not hard to verify that the set
S = {(1, 2), (2, 1)}
is a basis for R2 . The fact that S spans R2 implies that every vector (x, y) ∈ R2
can be written as a linear combination
(x, y) = a(1, 2) + b(2, 1)
for some a, b ∈ R. The fact that the vectors in S are independent, implies
that a and b are determined uniquely, as if the pairs of scalars a, b and c, d
satisfy
a(1, 2) + b(2, 1) = c(1, 2) + d(2, 1),
then
(a − c)(1, 2) + (b − d)(2, 1) = (0, 0),
which implies that a = c and b = d. This means that given the basis S, every
element in R2 can be identified with a pair of scalars, which are coefficients
of the basis vectors. For example,
(8, 7) = 2(1, 2) + 3(2, 1).
This is shown in the following plot, where the vector (8, 7) is shown to be
twice the vector (1, 2) plus three times the vector (2, 1). Note also how the
two basis vectors define a grid.

[Figure: the vector (8, 7) drawn as 2 · (1, 2) + 3 · (2, 1), on the grid generated by the basis vectors (1, 2) and (2, 1).]

Given the choice of a basis, every point in R2 can be characterized in a unique


way as a pair of scalars representing coefficients of the two basis vectors. In
other words, after having chosen a basis S, we may identify the point (8, 7)
with the pair of scalars 2 and 3. Note however, that these coefficients cannot
be viewed as an ordered pair unless we impose an order on the basis vectors.
Thus, for example, if we decided that the basis vector (1, 2) is “first” and
the basis vector (2, 1) is “second”, then we could have identified the points
(8, 7) ∈ R2 as the ordered pair of numbers [2, 3]T , via
$$(8, 7) = ((1, 2), (2, 1)) \begin{bmatrix} 2 \\ 3 \end{bmatrix}.$$
The column vector [2, 3]T is called the coordinate matrix of (8, 7) with re-
spect to the ordered basis ((1, 2), (2, 1)).

3.5.2 Ordered bases and coordinates


We defined a basis for a vector space as a set of vectors that are both gen-
erating and linearly-independent. We already mentioned the fact that a set
is not endowed with an order among its elements. If we want elements in a
set to be ordered, this requires an additional structure. This leads us to the
following definition:

Definition 3.44 Let V be a finitely-generated vector space. An ordered


basis (!‫ )בסיס סדור‬for V is a finite sequence (v1 , . . . , vn ) of vectors, which is
linearly-independent and spans V .

Note that the only difference between an ordered basis and any old basis
is that its elements are ordered... also, a priori, not all the elements of a
sequence have to be distinct, but linear-independence implies at once that
all the elements in the sequence are distinct.

Proposition 3.45 Let V be a finitely-generated vector space, and let B =


(v1 , . . . , vn ) be an ordered basis for V . Then, to every v ∈ V there corresponds
a unique a ∈ Fncol , such that
$$v = a^1 v_1 + \dots + a^n v_n = (v_1\ \dots\ v_n) \begin{bmatrix} a^1 \\ \vdots \\ a^n \end{bmatrix}.$$

Proof : Since a basis spans V , the existence of such scalars is guaranteed.


For uniqueness, let a, b ∈ Fncol be such that
v = a1 v 1 + ⋅ ⋅ ⋅ + an v n
v = b1 v 1 + ⋅ ⋅ ⋅ + bn v n .
Thus,
(a1 − b1 )v1 + ⋅ ⋅ ⋅ + (an − bn )vn = 0V ,
but since the vectors in B are independent, it follows that ai = bi for every
i = 1, . . . , n, proving the uniqueness of the representation. ∎
Since, on the other hand, every a ∈ Fncol defines a vector in V via linear
combinations of the vectors in the ordered basis, we have just discovered the
following fact:
Given an ordered basis B = (v1 , . . . , vn ) for a finitely-generated vector space,
there exists a one-to-one correspondence between the elements of V and
elements of Fncol : every element in V can be identified with a unique a ∈ Fncol ,
such that
$$v = a^1 v_1 + \dots + a^n v_n = (v_1\ \cdots\ v_n) \begin{bmatrix} a^1 \\ \vdots \\ a^n \end{bmatrix},$$
and vice-versa, every a ∈ Fncol can be identified with a unique v ∈ V . The
vector a ∈ Fncol is called the coordinate matrix (!‫ )מטריצת הקואורדינטות‬of v
relative to the basis B. We will denote by
[v]B ∈ Fncol
the coordinates of v relative to the basis B, namely, for every basis B =
(v1 , . . . , vn ), the column vector
$$[v]_B = \begin{bmatrix} a^1 \\ \vdots \\ a^n \end{bmatrix}$$

is the unique vector satisfying

$$v = (v_1\ \dots\ v_n) \begin{bmatrix} a^1 \\ \vdots \\ a^n \end{bmatrix} = B\,[v]_B.$$

Example: Let V = (Fn , +, F, ⋅) and let E = (e1 . . . en ) be the standard


basis. Every x = (x1 , . . . , xn ) ∈ Fn can be represented as
x = x1 e1 + ⋅ ⋅ ⋅ + xn en ,
i.e., by definition
$$[x]_E = \begin{bmatrix} x^1 \\ \vdots \\ x^n \end{bmatrix}.$$
I.e., the i-th coordinate of x is xi , which is really what we would expect. In
other words, when we write the entries of a vector x ∈ Fn as a column vector,
we really write its coordinate matrix. ▲▲▲

Example: Let V = (R2 , +, R, ⋅) and let


B = (v1 , v2 ),
with v1 = (1, 1) and v2 = (1, −1). B is an ordered basis for R2 . Consider now
the vector
v = (3, 5).
A direct calculation shows that
$$(3, 5) = ((1, 1)\ (1, -1)) \begin{bmatrix} 4 \\ -1 \end{bmatrix},$$

i.e., v = B [v]B , where

$$[v]_B = \begin{bmatrix} 4 \\ -1 \end{bmatrix}.$$
See diagram below. ▲▲▲

[Figure: the vector (3, 5) drawn as 4 · (1, 1) − 1 · (1, −1).]

The following proposition shows that operations on vectors correspond to


analogous operations on their coordinates:

Proposition 3.46 Let (V, +, F, ⋅) be a finitely-generated vector space, and


let B = (v1 , . . . , vn ) be an ordered basis for V . Then, for every u, v ∈ V and
c ∈ F,
[u + v]B = [u]B + [v]B ,
and
[c u]B = c [u]B .

Comment: Note that u + v and c u are operations in (V, +, F, ⋅), whereas


[u]B + [v]B and c [u]B are operations in (Fncol , +, F, ⋅).

Proof : By definition,
$$[u]_B = \begin{bmatrix} a^1 \\ \vdots \\ a^n \end{bmatrix} \quad\text{and}\quad [v]_B = \begin{bmatrix} b^1 \\ \vdots \\ b^n \end{bmatrix},$$

are the unique matrices satisfying

u = B [u]B and v = B [v]B .

That is,

$$u = a^1 v_1 + \dots + a^n v_n = (v_1\ \dots\ v_n) \begin{bmatrix} a^1 \\ \vdots \\ a^n \end{bmatrix},$$

and

$$v = b^1 v_1 + \dots + b^n v_n = (v_1\ \dots\ v_n) \begin{bmatrix} b^1 \\ \vdots \\ b^n \end{bmatrix}.$$

Hence,

$$u + v = (a^1 + b^1) v_1 + \dots + (a^n + b^n) v_n = (v_1\ \dots\ v_n) \begin{bmatrix} a^1 + b^1 \\ \vdots \\ a^n + b^n \end{bmatrix},$$

which we may also write as

u + v = B ([u]B + [v]B ),

proving that [u + v]B = [u]B + [v]B . Likewise,

c u = c(a^1 v_1 + ⋅ ⋅ ⋅ + a^n v_n) = (c a^1)v_1 + ⋅ ⋅ ⋅ + (c a^n)v_n ,

which we may write as

$$c\,u = (v_1\ \dots\ v_n) \begin{bmatrix} c\,a^1 \\ \vdots \\ c\,a^n \end{bmatrix} = B\,(c\,[u]_B),$$

proving that [c u]B = c [u]B . ∎

Example: Consider once again the vector space V = (R2 , +, R, ⋅) with the
ordered basis
B = ((1, 1), (1, −1)).
Let
u = (−1, 0) and v = (3, 5) hence u + v = (2, 5).
We proceed to calculate the coordinates,

$$u = ((1, 1)\ (1, -1)) \begin{bmatrix} -1/2 \\ -1/2 \end{bmatrix}, \qquad v = ((1, 1)\ (1, -1)) \begin{bmatrix} 4 \\ -1 \end{bmatrix}, \qquad u + v = ((1, 1)\ (1, -1)) \begin{bmatrix} 7/2 \\ -3/2 \end{bmatrix},$$
so that indeed
[u + v]B = [u]B + [v]B .
▲▲▲
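Since the matrix B of basis vectors is invertible, coordinates are obtained by solving B [v]B = v; a sympy sketch of the computation above (added for illustration):

```python
from sympy import Matrix

B = Matrix([[1, 1], [1, -1]])  # columns are the basis vectors (1,1), (1,-1)
u = Matrix([-1, 0]); v = Matrix([3, 5])
print(B.solve(u))      # (-1/2, -1/2)^T
print(B.solve(v))      # (4, -1)^T
print(B.solve(u + v))  # (7/2, -3/2)^T, the sum of the two coordinate vectors
```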
Before we end this section, we emphasize its main result. The choice of an
ordered basis allows us to view vectors in V as matrices of coordinates. Both
V and Fncol are vector spaces over the same field F, but they are different
spaces. What we have is an identification (which really is a one-to-one and
onto function) of elements in V with elements in Fncol . What we proved is
that vector addition and scalar multiplication “respect” this identification:
for example, the column vector representing the sum of two vectors is the
sum of the column vectors representing each vector.

Exercises

(easy) 3.67 Let V = R2 and let

B = ((1, 0), (1, 1)) and C = ((1, 2), (2, 1))

be ordered bases.

(a) Find [v]B and [v]C for v = (3, 3).


(b) Find v, w ∈ R2 for which [v]B = [w]C = [3, 3]T .
(c) What are the coordinate matrices of (1, 2) and (2, 1) relative to the
basis C?

Solution 3.67:
(a) Since
(3, 3) = 0(1, 0) + 3(1, 1) and (3, 3) = 1(1, 2) + 1(2, 1),
it follows that
[v]B = [0, 3]T and [v]C = [1, 1]T .

(b) The solution is v = (6, 3) and w = (9, 9).


(c) The solution is

[(1, 2)]C = [1, 0]T and [(2, 1)]C = [0, 1]T .

(easy) 3.68 Denote by R2 [X] the space of polynomials of degree up to 2


with real coefficients and let

B = (1, X, X 2 ) C = (X 2 , X, 1) and D = (X + 1, X 2 , X − 1)

be ordered bases.

(a) Write [p]B , [p]C and [p]D for p = 4 + 2X − 6X 2 .


(b) Find polynomials p1 , p2 , p3 such that [p1 ]B = [p2 ]C = [p3 ]D = [1, 1, 1]T .

(intermediate) 3.69 Show that the vectors


v1 = (1, 1, 0, 0) v2 = (0, 0, 1, 1)
v3 = (1, 0, 0, 4) v4 = (0, 0, 0, 2)
form a basis for (R4 , +, R, ⋅). What are the coordinate matrices of each of the
standard basis vectors e1 , e2 , e3 , e4 in the ordered basis B = (v1 , v2 , v3 , v4 )?
Solution 3.69: To show that they are a basis, we have to show that the matrix
⎡1 0 1 0⎤
⎢1 0 0 0⎥
⎢0 1 0 0⎥
⎣0 1 4 2⎦

is row-equivalent to the unit matrix (remind yourself why!). This is easy to see.
To find for example [e1 ]B , we have to solve the linear system
⎡1 0 1 0⎤ ⎡x1 ⎤   ⎡1⎤
⎢1 0 0 0⎥ ⎢x2 ⎥ = ⎢0⎥ ,
⎢0 1 0 0⎥ ⎢x3 ⎥   ⎢0⎥
⎣0 1 4 2⎦ ⎣x4 ⎦   ⎣0⎦
obtaining
[e1 ]B = [0, 0, 1, −2]T .
Do the same for the three other standard basis vectors.
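Comment: since A [ei ]B = ei , the coordinate matrices of all four standard basis vectors are exactly the columns of A⁻¹. A quick check (Python with NumPy; an illustration, not part of the original solution):

    import numpy as np

    # columns of A are v1, v2, v3, v4
    A = np.array([[1.0, 0.0, 1.0, 0.0],
                  [1.0, 0.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0],
                  [0.0, 1.0, 4.0, 2.0]])
    Ainv = np.linalg.inv(A)
    print(Ainv[:, 0])   # [e1]_B = [0, 0, 1, -2]^T, as computed above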

(intermediate) 3.70 Let V = (C3 , +, C, ⋅). What are the coordinates of the
vector (1, 0, 1) in the ordered basis
B = ((2ı, 1, 0) (2, −1, 1) (0, 1 + ı, 1 − ı)) ?

Solution 3.70: Solve the linear system


⎡2ı  2    0  ⎤ ⎡z1 ⎤   ⎡1⎤
⎢ 1 −1  1 + ı⎥ ⎢z2 ⎥ = ⎢0⎥ ,
⎣ 0  1  1 − ı⎦ ⎣z3 ⎦   ⎣1⎦
and find that
[(1, 0, 1)]B = [−(1 + ı)/2, ı/2, (3 + ı)/4]T .

(intermediate) 3.71 Let

B = ((1, 0, −1) (1, 1, 1) (1, 0, 0))

be an ordered basis for R3 . Calculate

[(a, b, c)]B

for arbitrary a, b, c ∈ R.

Solution 3.71: The solution is obtained by solving the linear system


⎡ 1 1 1⎤ ⎡x1 ⎤   ⎡a⎤
⎢ 0 1 0⎥ ⎢x2 ⎥ = ⎢b⎥ ,
⎣−1 1 0⎦ ⎣x3 ⎦   ⎣c⎦
yielding
[(a, b, c)]B = [b − c, b, c + a − 2b]T .

(intermediate) 3.72 Let W ≤ C3 be the subspace generated by the vectors

v1 = (1, 0, ı) and v2 = (1 + ı, 1, −1).

(a) Show that (v1 , v2 ) form an ordered basis for W .


(b) Show that

u1 = (1, 1, 0) and u2 = (1, ı, 1 + ı)

form another basis for W .


(c) What are the coordinate matrices of v1 , v2 in the ordered basis (u1 , u2 )?

Solution 3.72:
(a) By definition, every set of vectors generates their span, so we only need to show that
v1 and v2 are linearly-independent, which is the case if they are not proportional
to each other, and they are not (just look at the second entry).

(b) To show that u1 and u2 form a basis for the two-dimensional space W it suffices
to show that they are an independent set in W . That they are independent is
immediate (look at the third entry). To show that they are in W , we note (after a
direct calculation) that

u1 = −ı v1 + v2 and u2 = (2 − ı) v1 + ı v2 .

(c) For this, we invert the relations we’ve just found. Let’s be clever, and write this in
matrix form,

(u1 u2 ) = (v1 v2 ) ⎡−ı  2 − ı⎤
                   ⎣ 1    ı  ⎦ .

Since we know how to invert a 2 × 2 matrix, we find that

(u1 u2 ) (1/(ı − 1)) ⎡ ı  ı − 2⎤
                    ⎣−1   −ı  ⎦ = (v1 v2 ) .

Thus,

[v1 ](u1 ,u2 ) = (1/(ı − 1)) [ı, −1]T and [v2 ](u1 ,u2 ) = (1/(ı − 1)) [ı − 2, −ı]T .

(intermediate) 3.73 Let u = (u1 , u2 ) and v = (v1 , v2 ) be vectors in R2 such


that
u1² + u2² = v1² + v2² = 1 and u1 v1 + u2 v2 = 0.

(a) Interpret the properties of those vectors geometrically.


(b) Show that {u, v} is a basis for R2 .
(c) Find the coordinates of (x, y) in the ordered basis (u, v).

3.5.3 Transitions between bases


An ordered basis of n vectors enables us to view vectors (which are abstract
entities) as n-tuples of scalars, which are more concrete entities. But bear in
mind that we cannot say that a vector in a general finitely-generated vector
space is an n-tuple of scalars. This identification relies on the choice of a
basis. The same vector may have different coordinate matrices depending on
the ordered basis relative to which they are defined. A natural question is
the relation between coordinates of vectors relative to different bases.

Consider now a finitely-generated vector space, and let

B = (u1 . . . un ) and C = (v1 . . . vn )

be two ordered bases. What can be said about the relation between
coordinates relative to both bases? In other words, for v ∈ V , what is the relation
between [v]B and [v]C ?
Since B is a basis, each of the vectors vi in the basis C has a unique
representation as a linear combination of the basis vectors ui . In other words, for
every i = 1, . . . , n, there exist n scalars p1i , . . . , pni , such that

vi = p1i u1 + ⋅ ⋅ ⋅ + pni un ,

i.e.,
vi = (u1 . . . un ) [p1i , . . . , pni ]T .
In fact, that column vector is nothing but the coordinate matrix of vi relative
to the basis B,
Coli (P ) = [vi ]B ,
where P is the n × n matrix whose entries are pij . Since this holds for every
i = 1, . . . , n,
                                ⎡p11 . . . p1n ⎤
(v1 . . . vn ) = (u1 . . . un ) ⎢ ⋮         ⋮  ⎥ ,
                                ⎣pn1 . . . pnn ⎦
namely
C = B P.

Symmetrically, denoting by Q the n×n matrix such that for every i = 1, . . . , n,

ui = qi1 v1 + ⋅ ⋅ ⋅ + qin vn ,

namely,
Coli (Q) = [ui ]C ,
we obtain that
                                ⎡q11 . . . q1n ⎤
(u1 . . . un ) = (v1 . . . vn ) ⎢ ⋮         ⋮  ⎥ ,
                                ⎣qn1 . . . qnn ⎦

or
B = C Q.
Combining the two, for every i = 1, . . . , n,
vi = ∑_{j=1}^{n} pji ( ∑_{k=1}^{n} qjk vk ) = ∑_{k=1}^{n} ( ∑_{j=1}^{n} pji qjk ) vk ,

namely,
C = C ⋅ QP,
or
C ⋅ (QP − I) = 0.
Since the basis vectors in C are all independent, and since multiplication by
(QP − I) yields n linear combinations of the basis vectors C, these combina-
tions vanish only if each column of QP − I is identically zero, from which we
deduce that
QP = I,
i.e., P ∈ GLn (F) and Q = P −1 . That is, the transition between bases is
through a right-multiplication by an invertible n × n matrix. The matrices P
and Q are called transition matrices (!‫)מטריצות מעבר‬.
Let now v ∈ V . By definition,
v = B [v]B and v = C [v]C .
Since C = B P , it follows that
v = (B P ) [v]C = B (P [v]C ),
which implies that
[v]B = P [v]C .
Likewise, since B = C Q,
v = (C Q) [v]B = C (Q [v]B ),
from which we deduce that
[v]C = Q [v]B .

Let’s summarize this as a theorem:



Theorem 3.47 Let V be an n-dimensional vector space over F. Let B =


(u1 . . . un ) and C = (v1 . . . vn ) be two ordered bases for V . Then the
matrix P ∈ Mn (F) given by

Coli (P ) = [vi ]B

is invertible and Q = P −1 is given by

Coli (Q) = [ui ]C .

Furthermore,
BP = C and C Q = B,
and for every v ∈ V ,

[v]B = P [v]C and [v]C = Q[v]B .

Example: Let V = R2 and consider two bases

B = ((1, 2) (2, 1)) and C = ((1, 1) (1, −1)) .

[Diagram: the basis vectors (1, 2), (2, 1) of B and (1, 1), (1, −1) of C.]

We verify that

[(1, 1)]B = [1/3, 1/3]T and [(1, −1)]B = [−1, 1]T ,

so that

((1, 1) (1, −1)) = ((1, 2) (2, 1)) ⎡1/3 −1⎤
                                   ⎣1/3  1⎦ ,

i.e., C = B P ,

and
[(1, 2)]C = [3/2, −1/2]T and [(2, 1)]C = [3/2, 1/2]T ,
so that
((1, 2) (2, 1)) = ((1, 1) (1, −1)) ⎡ 3/2 3/2⎤
                                   ⎣−1/2 1/2⎦ ,

i.e., B = C Q.

Indeed,
⎡1/3 −1⎤ ⁻¹   ⎡ 3/2 3/2⎤
⎣1/3  1⎦    = ⎣−1/2 1/2⎦ .
Let v = (3, 4). A direct calculation shows that

[v]B = [5/3, 2/3]T and [v]C = [7/2, −1/2]T .

You may verify that


⎡5/3⎤   ⎡1/3 −1⎤ ⎡ 7/2⎤
⎣2/3⎦ = ⎣1/3  1⎦ ⎣−1/2⎦ ,

i.e., [v]B = P [v]C .

▲▲▲
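Comment: the whole example can be verified numerically. A sketch (Python with NumPy, assumed available; an editorial illustration):

    import numpy as np

    B = np.array([[1.0, 2.0], [2.0, 1.0]])   # columns (1,2), (2,1)
    C = np.array([[1.0, 1.0], [1.0, -1.0]])  # columns (1,1), (1,-1)
    P = np.linalg.solve(B, C)                # C = B P, so P = B^{-1} C
    Q = np.linalg.solve(C, B)                # B = C Q
    assert np.allclose(P @ Q, np.eye(2))     # Q = P^{-1}
    v = np.array([3.0, 4.0])
    vB = np.linalg.solve(B, v)               # [v]_B = [5/3, 2/3]^T
    vC = np.linalg.solve(C, v)               # [v]_C = [7/2, -1/2]^T
    assert np.allclose(vB, P @ vC)           # [v]_B = P [v]_C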

Example: Consider the vector space (R2 , +, R, ⋅) and the ordered bases

B = (u1 u2 ) = ((cos α, sin α) (− sin α, cos α)) ,

and
C = (v1 v2 ) = ((cos β, sin β) (− sin β, cos β)) ,
for some α, β ∈ R (convince yourself geometrically that these are ordered
bases). You may verify that for every (x, y) ∈ R2 ,

[(x, y)]B = [x cos α + y sin α, −x sin α + y cos α]T .

In particular,

[v1 ]B = [cos β cos α + sin β sin α, − cos β sin α + sin β cos α]T = [cos(β − α), sin(β − α)]T ,

and

[v2 ]B = [− sin β cos α + cos β sin α, sin β sin α + cos β cos α]T = [− sin(β − α), cos(β − α)]T .
That is, C = B ⋅ P , where

P = ⎡cos(β − α)  − sin(β − α)⎤
    ⎣sin(β − α)    cos(β − α)⎦ .

We know how to invert a 2 × 2 matrix,

P −1 = ⎡  cos(β − α)  sin(β − α)⎤
       ⎣− sin(β − α)  cos(β − α)⎦ .

It follows that for every (x, y) ∈ R2 ,

[(x, y)]C = ⎡  cos(β − α)  sin(β − α)⎤
            ⎣− sin(β − α)  cos(β − α)⎦ [(x, y)]B .

▲▲▲
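Comment: the claim that the transition matrix is itself a rotation by β − α can be tested numerically for sample angles (a sketch in Python with NumPy; the particular angles are arbitrary choices for illustration):

    import numpy as np

    def rot(theta):
        # columns are (cos t, sin t) and (-sin t, cos t)
        return np.array([[np.cos(theta), -np.sin(theta)],
                         [np.sin(theta),  np.cos(theta)]])

    alpha, beta = 0.3, 1.1                       # arbitrary sample angles
    P = np.linalg.solve(rot(alpha), rot(beta))   # C = B P
    assert np.allclose(P, rot(beta - alpha))     # P is rotation by beta - alpha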

Exercises

(easy) 3.74 Consider Exercise 3.67.

(a) Find the matrix P whose columns are the coordinates of the vectors in
C relative to the basis B.
(b) Show directly that P is invertible and find its inverse.
(c) Find the matrix whose columns are the coordinates of the vectors in B
relative to the basis C.

(intermediate) 3.75 Let V be a vector space over F and let B = (v1 , v2 , v3 )


be a sequence of linearly-independent vectors.

(a) Explain why B is an ordered basis for W = Span B.



(b) Show that


C = (v1 + v2 , v2 − v3 , v1 + v2 + v3 )
is also an ordered basis for W .
(c) Find the matrix P such that B = CP .

Solution 3.75:
(a) Every set of vectors generates, by definition, its own span. If they are linearly-
independent in V they are in particular linearly-independent within their own span,
hence form a basis for their span.
(b) Since the dimension of W is three, it suffices to show that those three vectors are
linearly-independent. Suppose that

a(v1 + v2 ) + b(v2 − v3 ) + c(v1 + v2 + v3 ) = 0,

i.e.,
(a + c)v1 + (a + b + c)v2 + (c − b)v3 = 0.
Since v1 , v2 , v3 are linearly-independent, it follows that

a+c=0 a+b+c=0 and c − b = 0,

from which we obtain at once that a = b = c = 0, proving that they are indeed
linearly-independent.
(c) The solution is
                                                ⎡ 2 −1 −1⎤
(v1 v2 v3 ) = (v1 + v2  v2 − v3  v1 + v2 + v3 ) ⎢−1  1  0⎥ .
                                                ⎣−1  1  1⎦
Chapter 4

Linear Forms

4.1 Definition and examples


Let V be a vector space over F. Often, we want to assign vectors numerical
values (think of measurements). In the context of a vector space over a field
F, the “number” we associate with each vector is a scalar; in other words, a
“measurement” of vectors is a function V → F. However, a vector space is
not just any old set of points; this set is endowed with an algebraic structure,
and therefore, we may be interested in functions on V that “communicate”
with this algebraic structure. This leads us to the following definition:

Definition 4.1 Let V be a vector space over F. A linear form (‫תבנית‬


!‫ )לינארית‬or a linear functional (!‫ )פונקציונל לינארי‬over V is a function ` ∶
V → F (i.e., a scalar-valued function with domain V ) satisfying the following
conditions: for every u, v ∈ V ,

`(u + v) = `(u) + `(v),

and for every v ∈ V and a ∈ F,

`(a v) = a `(v).

In other words, a linear form on a vector space is a scalar-valued function


over that space that “respects” linear operations. Note (once again) the
distinction between operations in V and operations in F.

Example: The function ` ∶ V → F assigning to every vector v ∈ V the value


`(v) = 0F is a linear form. Why? Because for every u, v ∈ V and a ∈ F,

`(u + v) = 0F = 0F + 0F = `(u) + `(v),

and
`(a v) = 0F = a `(v).
This linear form is called the zero form (!‫)תבנית האפס‬. ▲▲▲

Example: Let V be an n-dimensional vector space and let

B = (v1 , . . . , vn )

be an ordered basis. For every i = 1, . . . , n, we denote by `i ∶ V → F the


function returning the i-th coordinate of a vector relative to the basis B.
That is,
`i (v) = ([v]B )i .
More explicitly, if
⎡ a1 ⎤
⎢ ⎥
⎢ ⎥
v = (v1 . . . vn ) ⎢ ⋮ ⎥ ,
⎢ n⎥
⎢a ⎥
⎣ ⎦
i i
then ` (v) = a . Why is this a linear form? Because for every u, v ∈ V ,

`i (u + v) = ([u + v]B )i = ([u]B + [v]B )i = ([u]B )i + ([v]B )i = `i (u) + `i (v),

where we used here Proposition 3.46. Note the different types of addition:
in the first two terms it is addition in V , in the third term it is addition in
Fncol , and in the last two terms it is addition in F.
Likewise, using once again Proposition 3.46, for u ∈ V and c ∈ F,

`i (c u) = ([c u]B )i = (c [u]B )i = c ([u]B )i = c `i (u),

Note that for every i, j = 1, . . . , n,



`i (vj ) = ([vj ]B )i = {1 if i = j; 0 if i ≠ j},

i.e., `i (vj ) = δji . This particular set of linear forms will have an important
role shortly. ▲▲▲

Example: Let V = (Fncol , +, F, ⋅) and let a ∈ Fnrow . We define the function


`a ∶ V → F by
`a (v) = a v = [a1 . . . an ] [v1 , . . . , vn ]T = a1 v1 + ⋅ ⋅ ⋅ + an vn .
The function `a is a linear form because matrix multiplication is distributive,
namely, for u, v ∈ V and c ∈ F,
`a (u + v) = a (u + v) = a u + a v = `a (u) + `a (v),
and
`a (c u) = a (cu) = c a u = c `a (u).
Note how we view the row vector a as “constant” whereas the linear form
`a operates on all v ∈ V . To summarize: every vector a ∈ Fnrow defines via
matrix multiplication a linear form on Fncol . ▲▲▲

Example: Take n = 1 and F = R in the previous example; then V = R, and


for every a ∈ R we define the function
`a (x) = a x.
Thus, linear forms coincide in this case with the good old notion of linear
functions R → R. ▲▲▲

Example: Let V = (Mn (F), +, F, ⋅) and define the function known as the
trace (!‫ )עקבה‬of the matrix.
tr(A) = ∑_{i=1}^{n} aii .

It is readily verified that the trace is also a linear form. ▲▲▲

Example: Let S be a non-empty set (it doesn’t need to have any other
structure than being a set) and consider the set V = FS of all functions
f ∶ S → F. We have seen that V is a vector space over F with respect to
the natural operations of addition and scalar multiplication of field-valued
functions (make sure you remember the vectorial structure of FS ). Let s ∈ S,
and define the function Evals ∶ V → F,
Evals (f ) = f (s).

(Given a function f ∈ FS , the function Evals returns the value of f at s.)


Then, Evals is a linear form, because for every f, g ∈ FS and c ∈ F,

Evals (f + g) = (f + g)(s) = f (s) + g(s) = Evals (f ) + Evals (g),

and
Evals (c f ) = (c f )(s) = c f (s) = c Evals (f ).
▲▲▲

4.2 Properties of linear forms


In this section we review some important properties of linear forms.
The following is readily proved inductively:

Proposition 4.2 Let ` be a linear form on a vector space (V, +, F, ⋅). Then
for every v1 , . . . , vn ∈ V and a1 , . . . , an ∈ F,

` (a1 v1 + ⋅ ⋅ ⋅ + an vn ) = a1 `(v1 ) + ⋯ + an `(vn ).

Proof : This is left as an exercise. ∎

Proposition 4.3 Let ` be a linear form on a vector space (V, +, F, ⋅). Then

`(0V ) = 0F .

Proof : Let v ∈ V be arbitrary. Then, using the fact that 0F v = 0V and the
properties of `,
`(0V ) = `(0F v) = 0F `(v) = 0F .
∎
An important fact about linear forms (in finitely-generated vector spaces) is
that they are completely determined by their action on basis vectors. We
establish this in two separate propositions:

Proposition 4.4 Let V be a finitely-generated vector space, and let

B = (v1 . . . vn )

be an ordered basis for V . Then, for every set c1 , . . . , cn of scalars there exists
a linear form `, such that

`(vi ) = ci for every i = 1, . . . , n.

Proof : There really is only one way to define such a functional. Since every
v ∈ V has a unique representation as

v = a1 v 1 + ⋅ ⋅ ⋅ + an v n ,

then `(v) must be given by

`(v) = a1 `(v1 ) + ⋅ ⋅ ⋅ + an `(vn ) = a1 c1 + ⋅ ⋅ ⋅ + an cn .

To complete the proof, we have to verify that ` is a linear form. Let v, w ∈ V


be given by
v = a1 v1 + ⋅ ⋅ ⋅ + an vn
w = b1 v 1 + ⋅ ⋅ ⋅ + bn v n .
Then,
v + w = (a1 + b1 ) v1 + ⋅ ⋅ ⋅ + (an + bn ) vn .
By the way we defined `,
`(v) = a1 c1 + ⋅ ⋅ ⋅ + an cn
`(w) = b1 c1 + ⋅ ⋅ ⋅ + bn cn ,
and
`(v + w) = (a1 + b1 ) c1 + ⋅ ⋅ ⋅ + (an + bn ) cn ,
so that indeed `(v + w) = `(v) + `(w). We proceed similarly to show that
`(k v) = k `(v) for k ∈ F. ∎
The following complementing proposition asserts that there really was no
other way to define `:

Proposition 4.5 Let V be a finitely-generated vector space. Let

B = (v1 . . . vn )

be an ordered basis for V . If two linear forms `, `′ satisfy

`(vi ) = `′ (vi ) for all i = 1, . . . , n,

then ` = `′ .

Proof : By the property of a basis in a finitely-generated vector space, every


v ∈ V can be represented uniquely as
v = a1 v 1 + ⋅ ⋅ ⋅ + an v n
for some scalars a1 , . . . , an . Then, by the linearity of `, `′ ,
`(v) = a1 `(v1 ) + ⋅ ⋅ ⋅ + an `(vn ) = a1 `′ (v1 ) + ⋅ ⋅ ⋅ + an `′ (vn ) = `′ (v).
∎
Note how we defined the functional `: given c ∈ Fnrow ,

`(v) = [c1 . . . cn ] [([v]B )1 , . . . , ([v]B )n ]T = c [v]B .
The two last propositions have a very important implication: every linear
form can be defined using n scalars. It is difficult not to make a connection
with the notion of coordinates. However, at this stage we haven’t identified
the set of linear forms as a vector space, hence there is as yet no meaning to
assign them coordinates. This will be rectified in the next section.
Take the particular example where V = Fn along with the standard basis,
E = (e1 . . . en ) .
Then every vector v = (v 1 , . . . , v n ) ∈ V “coincides with its coordinates”, i.e.,
v i = ([v]E )i . We have just shown that to every linear form ` corresponds a
unique c ∈ Fnrow , such that
`(v) = c [v]E = c1 v 1 + ⋅ ⋅ ⋅ + cn v n .

Exercises

(easy) 4.1 Prove using induction that for a linear form ` on a vector space
V,
`(a1 v1 + ⋅ ⋅ ⋅ + an vn ) = a1 `(v1 ) + ⋅ ⋅ ⋅ + an `(vn )
for every a1 , . . . , an ∈ F and v1 , . . . , vn ∈ V .

Solution 4.1: For n = 1 the assertion is that `(a v) = a `(v), which holds by definition.
Assume that the assertion holds for n = k; then

`(a1 v1 + ⋅ ⋅ ⋅ + ak+1 vk+1 ) = `((a1 v1 + ⋅ ⋅ ⋅ + ak vk ) + ak+1 vk+1 )
= `(a1 v1 + ⋅ ⋅ ⋅ + ak vk ) + `(ak+1 vk+1 )
= (a1 `(v1 ) + ⋅ ⋅ ⋅ + ak `(vk )) + ak+1 `(vk+1 ).

(intermediate) 4.2 Let V = (R3 , +, R, ⋅) and let

v1 = (1, 0, 1) v2 = (0, 1, −2) and v3 = (−1, −1, 0).

(a) Find the linear form ` on R3 satisfying

`(v1 ) = 1 `(v2 ) = −2 and `(v3 ) = 3.

That is, what is `(x, y, z)?


(b) Characterize all linear forms satisfying `(v1 ) = `(v2 ) = 0 and `(v3 ) ≠ 0.
(c) Show that for a linear form such as in the previous item, `(2, 3, −1) ≠
0.

Solution 4.2:
(a) The vectors (v1 , v2 , v3 ) form a basis. Every (x, y, z) ∈ R3 can be written as

(x, y, z) = (2x − 2y − z)v1 + (x − y − z)v2 + (x − 2y − z)v3 ,

hence
`(x, y, z) = (2x − 2y − z)`(v1 ) + (x − y − z)`(v2 ) + (x − 2y − z)`(v3 )
= (2x − 2y − z) − 2(x − y − z) + 3(x − 2y − z)
= 3x − 6y − 2z.

(b) Using the result of the previous item,

`(x, y, z) = (2x − 2y − z)`(v1 ) + (x − y − z)`(v2 ) + (x − 2y − z)`(v3 ),

if `(v1 ) = `(v2 ) = 0 and `(v3 ) = a ≠ 0, then

`(x, y, z) = a(x − 2y − z).

(c) `(2, 3, −1) = −3a ≠ 0.

(intermediate) 4.3 Let (V, +, F, ⋅) be a finitely-generated vector space and


let v ∈ V be a non-zero vector, v ≠ 0V . Prove that there exists a linear form
` ∈ V ∨ , such that `(v) ≠ 0F .

Solution 4.3: Suppose that dimF V = n. Complete v into a basis


B = (v, u1 , . . . , un−1 ),

and define ` ∈ V ∨ by

`(v) = 1 and `(ui ) = 0 for i = 1, . . . , n − 1.

(intermediate) 4.4 Let (V, +, F, ⋅) be a finitely-generated vector space and


let u, v ∈ V be distinct vectors, u ≠ v. Prove that there exists a linear form
` ∈ V ∨ , such that `(u) ≠ `(v).

Solution 4.4: By the previous exercise (with v replaced by v − u), there exists a linear
functional ` ∈ V ∨ satisfying `(u − v) ≠ 0, and since ` is linear `(u) ≠ `(v).

(intermediate) 4.5 Let (V, +, F, ⋅) be a vector space and let `, m ∈ V ∨ be


linear forms satisfying that

`(v) = 0F if and only if m(v) = 0F .

Prove that there exists an a ∈ F such that m = a `.



Solution 4.5: If ` = 0V ∨ then m = 0V ∨ (and vice-versa) and the claim holds. Otherwise,
there exists a vector u ∈ V such that `(u) ≠ 0F . Let v ∈ V be arbitrary. Then,

` (v − (`(v)/`(u)) u) = 0F .
It follows that
m (v − (`(v)/`(u)) u) = 0F ,
which precisely means that
m(v) = (`(v)/`(u)) m(u),
namely,
m = (m(u)/`(u)) `.

(intermediate) 4.6 Consider the infinite-dimensional vector space R[X].


Let a, b ∈ R such that a < b. For
P = ∑_{i=0}^{n} pi X i ∈ R[X]

we define

∫_a^b P (x) dx = ∑_{i=0}^{n} pi (b^{i+1} − a^{i+1})/(i + 1).

Let Q ∈ R[X]. Prove that the function ` ∶ R[X] → R defined by


`(P ) = ∫_a^b P (x)Q(x) dx

is a linear form. Note: you are not expected to know anything about
integrals—just follow the definitions.

4.3 The dual space


Let V be a vector space over F. In the previous section we defined the notion
of linear forms over (V, +, F, ⋅). We denote the set of all linear forms over V
by
V ∨ = {` ∶ V → F ∶ ` is a linear form}.

It is a subset of Func(V, F), which comprises all (i.e., not necessarily
linear) functions f ∶ V → F. Recall that Func(V, F) is itself a vector space
over F with respect to the function addition
(f + g)(v) = f (v) + g(v)
and the scalar multiplication
(c f )(v) = c f (v).

Proposition 4.6 The set of linear forms V ∨ is a linear subspace of the


vector space Func(V, F) (hence, V ∨ is a vector space in its own right).

Proof : By definition, in order to prove that a set of vectors is a linear sub-


space, we need to prove that it is non-empty, and that it is closed under
addition and scalar multiplication.
The set V ∨ is non-empty, because it contains at least the zero form, which
we now denote by 0V ∨ . Let `1 , `2 ∈ V ∨ . The sum `1 + `2 is well-defined as a
sum in Func(V, F); we need to show that `1 + `2 ∈ V ∨ , i.e., that it is a linear
form. For all u, v ∈ V and c ∈ F,
(`1 + `2 )(u + v) = `1 (u + v) + `2 (u + v)
= (`1 (u) + `1 (v)) + (`2 (u) + `2 (v))
= (`1 (u) + `2 (u)) + (`1 (v) + `2 (v))
= (`1 + `2 )(u) + (`1 + `2 )(v),
and
(`1 + `2 )(c u) = `1 (c u) + `2 (c u)
= c `1 (u) + c `2 (u)
= c (`1 (u) + `2 (u))
= c (`1 + `2 )(u),
proving that `1 + `2 ∈ V ∨ . Likewise, let ` ∈ V ∨ and a ∈ F; we need to show
that a ` ∈ V ∨ , i.e., that it is a linear form. For all u, v ∈ V and c ∈ F,
(a `)(u + v) = a `(u + v)
= a (`(u) + `(v))
= a `(u) + a `(v)
= (a `)(u) + (a `)(v),

and

(a `)(c u) = a `(c u)
= a (c `(u))
= c (a `(u))
= c (a `)(u),

proving that a ` ∈ V ∨ . This completes the proof. ∎


Thus, every vector space (V, +, F, ⋅) induces another vector space (V ∨ , +, F, ⋅)
over the same field. The vector space V ∨ is called the space dual (!‫ )דואלי‬to
V . You should internalize the fact that elements of V ∨ are also vectors, but
they are at the same time functions over a vector space, V . Elements of V
and elements of V ∨ are both vectors, albeit belonging to different spaces. In
particular, there is no meaning to adding an element of V and an element of
V ∨ . On the other hand, the elements of V ∨ “act” on elements of V to yield
scalars.
The action `(v) of a linear form ` on a vector v can be viewed as a function
taking an element of V ∨ and an element of V and returning a scalar. We
often denote this pairing by

⟨⋅, ⋅⟩ ∶ V ∨ × V → F,

where
⟨`, v⟩ = `(v).

Example: For V = Fncol we have seen that V ∨ can be identified with Fnrow :
every a ∈ Fnrow defines a unique `a ∈ V ∨ given by

`a (v) = a ⋅ v.

It is customary to write
(Fncol )∨ ≃ Fnrow ,

where the ≃ sign means that the two spaces can be identified (more on that
later). ▲▲▲

4.4 Dual bases


Let V be a finitely-generated vector space. What can be said about its dual
space? Is it also finitely-generated? If it is, is there a relation between dimF V
and dimF V ∨ ? The theorem below answers this question affirmatively.

Theorem 4.7 Let V be a finitely-generated vector space. Let

B = (v1 . . . vn )

be an ordered basis for V . Then,


      ⎛`1 ⎞
B∨ = ⎜ ⋮ ⎟
      ⎝`n ⎠

is an ordered basis for V ∨ , called the dual basis (!‫ )בסיס דואלי‬of B, where
`i is the unique linear form satisfying

`i (vj ) = δji for all i, j = 1, . . . , n,

or equivalently
`i (v) = ([v]B )i .
As a result,
dimF V ∨ = dimF V.

Proof : We need to show that B∨ is spanning and independent. Suppose that


a1 , . . . , an are scalars satisfying

a1 `1 + ⋅ ⋅ ⋅ + an `n = 0V ∨

(this is an equality between elements in V ∨ ). In particular, applying both


sides on vj ,
a1 `1 (vj ) + ⋅ ⋅ ⋅ + an `n (vj ) = 0V ∨ (vj ) = 0F ,
i.e.,
aj = 0F .

Since this holds for every j = 1, . . . , n, it follows that the linear combination
of the `i ’s is trivial, namely, the linear forms `i are linearly-independent.
It remains to show that B∨ is spanning. We will show that any ` ∈ V ∨ can
be represented as
` = `(v1 ) `1 + ⋅ ⋅ ⋅ + `(vn ) `n ,
i.e., it is a linear combination of the linear forms `i (note that `(vi ) are
scalars). By Proposition 4.5 it suffices to verify that both sides yield the
same scalar when acting on basis vectors vj . Indeed,

(`(v1 ) `1 + ⋅ ⋅ ⋅ + `(vn ) `n )(vj ) = `(v1 ) `1 (vj ) + ⋅ ⋅ ⋅ + `(vn ) `n (vj ) = `(vj ),

which completes the proof. ∎

Example: Let V = (Fn , +, F, ⋅) and let

E = (e1 . . . en )

be the standard basis. We denote the basis dual to E by


      ⎛e1 ⎞
E∨ = ⎜ ⋮ ⎟ .
      ⎝en ⎠

As we have seen, for v = (x1 , . . . , xn ) we have

ei (v) = ([v]E )i = xi ,

that is the i-th linear form in the dual standard basis extracts the i-th coor-
dinate of a vector. ▲▲▲
Since V ∨ is a vector space and since B∨ is a basis for V ∨ , every linear form
in V ∨ can be represented using coordinates. Every ` ∈ V ∨ has a unique
representation

` = c1 `1 + ⋅ ⋅ ⋅ + cn `n = [`]B∨ B∨ ,

where [`]B∨ = [c1 . . . cn ] ∈ Fnrow is the coordinate matrix. We have just proved that

[`]B∨ = [`(v1 ) . . . `(vn )] .



Consider now the following question: given a basis B on a finitely-generated


vector space V , and its dual basis, every vector v and every linear form ` can
be written using coordinates,

v = B [v]B and ` = [`]B∨ B∨ .

Can we express the scalar `(v) obtained by the action of the linear form on
the vector using their respective coordinates?
Let us denote the coordinates of v and ` as
v = a1 v 1 + ⋅ ⋅ ⋅ + an v n
` = b1 `1 + ⋅ ⋅ ⋅ + bn `n ,

namely,
[v]B = [a1 , . . . , an ]T and [`]B∨ = [b1 . . . bn ] .
Then,
`(v) = ∑_{i=1}^{n} bi `i ( ∑_{j=1}^{n} aj vj )
= ∑_{i=1}^{n} ∑_{j=1}^{n} bi aj `i (vj )
= ∑_{i=1}^{n} ∑_{j=1}^{n} bi aj δji
= ∑_{i=1}^{n} bi ai .

Consider the right-hand side; it is the product of the row vector [`]B∨ and
the column vector [v]B .
We have just proved the following:

Proposition 4.8 Let V be a finitely-generated vector space. Let

B = (v1 . . . vn )

be an ordered basis for V and let


1
⎛` ⎞
B =⎜ ⋮ ⎟

⎝`n ⎠

be its dual basis. Then, for every ` ∈ V ∨ and v ∈ V ,

`(v) = [`]B∨ [v]B .

We have seen that given an ordered basis B = (v1 , . . . , vn ) and its dual
B∨ = (`1 , . . . , `n ) in a finitely-generated vector space, every linear form ` ∈ V ∨
can be represented as
` = ∑_{i=1}^{n} `(vi ) `i .
This representation has an analog for vectors: every vector v ∈ V is given by
v = ∑_{i=1}^{n} `i (v) vi ,

because by definition, `i (v) = ([v]B )i .


We end this section by addressing the transition between dual bases:

Theorem 4.9 Let V be a finitely-generated vector space. Let


B = (v1 . . . vn ) and C = (w1 . . . wn )
be ordered bases for V , related by a transition matrix P ∈ GLn (F),
C = B P.
Denote the corresponding dual bases by
      ⎛`1 ⎞            ⎛m1 ⎞
B∨ = ⎜ ⋮ ⎟ and C∨ = ⎜ ⋮ ⎟ .
      ⎝`n ⎠            ⎝mn ⎠
Then, the transition matrix from B∨ to C∨ is Q = P −1 ,
C∨ = QB∨ .

Proof : By definition of the dual basis,


mj (wi ) = δij for all i, j = 1, . . . , n.
It is given that
wi = ∑_{k=1}^{n} pki vk ,

and we need to show that

mj = ∑_{s=1}^{n} qsj `s .
This is an identity between linear forms; both sides are equal if they yield
the same set of scalars when acting on the basis vectors wi . Indeed, for every
i, j = 1, . . . , n,
∑_{s=1}^{n} qsj `s (wi ) = ∑_{s=1}^{n} qsj `s ( ∑_{k=1}^{n} pki vk )
= ∑_{s=1}^{n} qsj ∑_{k=1}^{n} pki `s (vk )
= ∑_{s=1}^{n} qsj ∑_{k=1}^{n} pki δks
= ∑_{k=1}^{n} qkj pki
= (QP )ji = δij .
This completes the proof. ∎

Example: Consider once again the vector space (R2 , +, R, ⋅) endowed with
the two bases
B = ((1, 2) (2, 1)) and C = ((1, 1) (1, −1)) .

[Diagram: the basis vectors (1, 2), (2, 1) of B and (1, 1), (1, −1) of C.]

We have seen that


((1, 1) (1, −1)) = ((1, 2) (2, 1)) ⎡1/3 −1⎤
                                   ⎣1/3  1⎦ ,

i.e., C = B P ,

and
((1, 2) (2, 1)) = ((1, 1) (1, −1)) ⎡ 3/2 3/2⎤
                                   ⎣−1/2 1/2⎦ ,

i.e., B = C Q.

We now calculate the dual bases


B∨ = ⎛`1 ⎞ and C∨ = ⎛m1 ⎞ .
     ⎝`2 ⎠           ⎝m2 ⎠
Since
`i (v) = ([v]B )i ,
we have to find the coordinates of every vector v ∈ R2 relative to the basis
B. Write v = (x, y), then
(x, y) = `1 (v)(1, 2) + `2 (v)(2, 1),
from which we obtain that
`1 (x, y) = (1/3)(2y − x) and `2 (x, y) = (1/3)(2x − y).
Similarly,
(x, y) = m1 (v)(1, 1) + m2 (v)(1, −1),
from which we obtain that
m1 (x, y) = (1/2)(x + y) and m2 (x, y) = (1/2)(x − y).
Since C = BP we expect that C∨ = Q B∨ , i.e.,
⎛m1 ⎞   ⎡ 3/2 3/2⎤ ⎛`1 ⎞
⎝m2 ⎠ = ⎣−1/2 1/2⎦ ⎝`2 ⎠ .
Indeed, for every v = (x, y),
((3/2) `1 + (3/2) `2 )(v) = (3/2) ⋅ (1/3)(2y − x) + (3/2) ⋅ (1/3)(2x − y) = (1/2)(x + y) = m1 (v),
and
(−(1/2) `1 + (1/2) `2 )(v) = −(1/2) ⋅ (1/3)(2y − x) + (1/2) ⋅ (1/3)(2x − y) = (1/2)(x − y) = m2 (v).
▲▲▲
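Comment: there is a compact way to compute dual bases numerically: if M is the matrix whose columns are the basis vectors, then [v]B = M⁻¹ v, so `i is given by the i-th row of M⁻¹. A sketch (Python with NumPy, an editorial illustration):

    import numpy as np

    M = np.array([[1.0, 2.0], [2.0, 1.0]])  # columns: the basis B = ((1,2), (2,1))
    Minv = np.linalg.inv(M)                 # row i of M^{-1} is the form ell^i
    x, y = 5.0, 7.0                         # an arbitrary test point
    ell = Minv @ np.array([x, y])
    assert np.isclose(ell[0], (2*y - x) / 3)   # ell^1(x, y) = (2y - x)/3
    assert np.isclose(ell[1], (2*x - y) / 3)   # ell^2(x, y) = (2x - y)/3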

Exercises

(easy) 4.7 Consider the vector space (R2 , +, R, ⋅). Find the ordered basis
dual to the ordered basis
B = ((3, 4) (5, 7)) .

Solution 4.7: Denote v1 = (3, 4) and v2 = (5, 7). Every (x, y) ∈ R2 can be written as
(x, y) = (7x − 5y)v1 + (3y − 4x)v2 .
The linear forms `1 and `2 are determined by
`1 (x, y) = (7x − 5y) `1 (v1 ) + (3y − 4x)`1 (v2 ) = 7x − 5y
`2 (x, y) = (7x − 5y) `2 (v1 ) + (3y − 4x)`2 (v2 ) = 3y − 4x.

(intermediate) 4.8 Let (V, +, F, ⋅) be a finitely-generated vector space. Prove


that

(a) v = 0V if and only if `(v) = 0 for all ` ∈ V ∨ .


(b) ` = 0V ∨ if and only if `(v) = 0 for all v ∈ V .

Solution 4.8:
(a) Let (v1 , . . . , vn ) be a basis for V and let (`1 , . . . , `n ) be the dual basis. One direction
is immediate: if v = 0V then `(0V ) = 0F for every ` ∈ V ∨ . Conversely, suppose that
`(v) = 0F for all ` ∈ V ∨ , and write v as
v = a1 v1 + ⋅ ⋅ ⋅ + an vn .
For every j = 1, . . . , n,
0F = `j (v) = a1 `j (v1 ) + ⋅ ⋅ ⋅ + an `j (vn ) = aj ,
i.e., aj = 0F for every j = 1, . . . , n, which proves that v = 0V .
(b) Once again, one direction is trivial (in fact, by definition of the zero form). Suppose
that `(v) = 0F for all v ∈ V . Expanding ` in the dual basis,
` = b1 `1 + ⋅ ⋅ ⋅ + bn `n ,
and substituting vj ,
0F = `(vj ) = b1 `1 (vj ) + ⋅ ⋅ ⋅ + bn `n (vj ) = bj ,
i.e., bj = 0F for every j = 1, . . . , n, which proves that ` = 0V ∨ .

(intermediate) 4.9 Consider the vector space (C3 , +, C, ⋅). Find the basis
dual to the ordered basis

B = ((1, 0, −1) (1, 1, 1) (2, 2, 0)) .

Solution 4.9: Denote


v1 = (1, 0, −1) v2 = (1, 1, 1) and v3 = (2, 2, 0).

Every (x, y, z) ∈ C3 can be written as

(x, y, z) = (x − y)v1 + (x − y + z)v2 + (y − x/2 − z/2)v3 ,

from which we obtain that

`1 (x, y, z) = x − y, `2 (x, y, z) = x − y + z and `3 (x, y, z) = y − x/2 − z/2.

(intermediate) 4.10 Let V = (Q3 , +, Q, ⋅) and consider the ordered basis

B = ((1, 0, −1), (1, 1, 1), (2, 2, 0)) .

(a) Find the basis B∨ dual to B.


(b) Let E = (e1 , e2 , e3 ) be the standard basis for V . Find the basis E ∨ dual
to E
(c) Find the transition matrix P satisfying B = EP .
(d) Find the transition matrix Q satisfying E ∨ = QB∨ (write the bases E ∨
and B∨ as columns of linear forms).
(e) Find the transition matrix P satisfying E = BP .
(f) Find the transition matrix Q satisfying B∨ = QE ∨ .

Solution 4.10:
(a) We really solved it in the previous exercise (and the fact that the field was different
doesn’t matter for the sake of the calculation):

`1 (x, y, z) = x − y, `2 (x, y, z) = x − y + z and `3 (x, y, z) = y − x/2 − z/2.



(b) The ordered basis dual to the standard basis is

e1 (x, y, z) = x e2 (x, y, z) = y and e3 (x, y, z) = z.

(c) We are looking for a matrix satisfying

((1, 0, −1) (1, 1, 1) (2, 2, 0)) = ((1, 0, 0) (0, 1, 0) (0, 0, 1)) P.

The solution is
    ⎡ 1 1 2⎤
P = ⎢ 0 1 2⎥ .
    ⎣−1 1 0⎦
(d) Now we need to write
⎛e1 ⎞     ⎛`1 ⎞
⎜e2 ⎟ = Q ⎜`2 ⎟ .
⎝e3 ⎠     ⎝`3 ⎠

Substituting (x, y, z) on both sides,


⎡x⎤     ⎡    x − y     ⎤
⎢y⎥ = Q ⎢  x − y + z   ⎥ .
⎣z⎦     ⎣y − x/2 − z/2 ⎦

The solution is
    ⎡ 1 1 2⎤
Q = ⎢ 0 1 2⎥ .
    ⎣−1 1 0⎦
(e) You may verify that
                                                                   ⎡  1   −1    0 ⎤
((1, 0, 0) (0, 1, 0) (0, 0, 1)) = ((1, 0, −1) (1, 1, 1) (2, 2, 0)) ⎢  1   −1    1 ⎥ .
                                                                   ⎣−1/2   1  −1/2⎦

(f) You may verify that indeed


⎛`1 ⎞   ⎡  1   −1    0 ⎤ ⎛e1 ⎞
⎜`2 ⎟ = ⎢  1   −1    1 ⎥ ⎜e2 ⎟ .
⎝`3 ⎠   ⎣−1/2   1  −1/2⎦ ⎝e3 ⎠

(intermediate) 4.11 Repeat the previous question with E replaced by

C = ((1, 1, 0), (1, 0, 1), (0, 1, 1)) .



(intermediate) 4.12 Based on the last two questions, formulate a general


statement and prove it.

(intermediate) 4.13 Let (V, +, F, ⋅) be a vector space of dimension at least


n. Let A ∈ GLn (F) (an invertible square matrix) and let

(v1 . . . vn )

be an independent sequence of vectors. Define the linear forms


⎛ϕ1 ⎞
⎜ ⋮ ⎟
⎝ϕn ⎠

via
ϕi (vj ) = aij for all i, j = 1, . . . , n.
(Recall that this defines the linear forms uniquely.) Show that the linear
forms ϕ1 , . . . , ϕn are linearly-independent. Try to relate this question to the
last three.

Solution 4.13: Suppose that the linear form


` = b1 ϕ1 + ⋅ ⋅ ⋅ + bn ϕn

is the zero form. Then, for every j = 1, . . . , n,

0F = `(vj ) = b1 a1j + ⋅ ⋅ ⋅ + bn anj .

We have a homogeneous linear system with coefficient matrix aij ; since this matrix is
invertible, the only solution is the trivial one, proving that the linear forms ϕi are linearly-
independent.

(harder) 4.14 Let B = (v1 , v2 , . . . ) be an infinite (but countable) basis for


a vector space V over a field F. Define a sequence of linear forms B∨ =
(`1 , `2 , . . . ) by
`i (vj ) = δji .

(a) Show that the functions `i are indeed well-defined for all v ∈ V , and
are linear forms.
(b) Show that the sequence B∨ is linearly-independent.

(c) Show that B∨ is not a basis for V ∨ . I.e., there exists an ` ∈ V ∨ which
is not in the span of B∨ . Hint: set `(vi ) = 1 for all i ∈ N.

Solution 4.14:
(a) One way to think of a countable basis is that every v ∈ V has a representation

v = ∑_{i=1}^{∞} ai vi ,

where this sum is not really infinite, as the coefficients ai vanish starting from some
i > n. Since the basis vectors are linearly-independent, this representation is unique.
Then, the `i are well-defined as for such a v

`i (v) = `i (a1 v1 + a2 v2 + . . . ) = a1 `i (v1 ) + a2 `i (v2 ) + ⋅ ⋅ ⋅ = ai .

They are linear forms, because for


v = ∑_{i=1}^{∞} ai vi and w = ∑_{i=1}^{∞} bi vi ,

we have

v + w = ∑_{i=1}^{∞} (ai + bi ) vi ,

so that
`i (v + w) = ai + bi = `i (v) + `i (w),
and likewise for c ∈ F,

c v = ∑_{i=1}^{∞} (c ai ) vi ,

hence
`i (c v) = c ai = c `i (v).

(b) Let
a1 `1 + ⋅ ⋅ ⋅ + an `n = 0V ∨ .
Then for every j = 1, . . . , n,

a1 `1 (vj ) + ⋅ ⋅ ⋅ + an `n (vj ) = aj = 0F ,

proving that the linear forms `i are linearly-independent. Note that we used here
the fact that any linear combination of `i ’s can be written as a linear combination of
all of them up to some i = n.

(c) To show that the `i ’s do not span V ∨ we consider the linear form defined by its
action on all basis vectors, `(vi ) = 1. Then, for

v = ∑_{i=1}^{∞} ai vi ,

`(v) is the sum of all coefficients. It is easy to see that this is a linear form. It is,
however, not spanned by the `i ’s, as any (finite!) linear combination of `i ’s fails to
“see” the coefficients beyond a certain value of i.

4.5 Null space and annihilator


4.5.1 The annihilator of a set of vectors
Definition 4.10 Let V be a vector space over F and let S ⊆ V be a subset
(not necessarily a subspace). The annihilator (!M‫ )קבוצת המאפסי‬of S is the
set S 0 ⊆ V ∨ of linear forms that vanish on all elements in S,

S 0 = {` ∈ V ∨ ∶ `(v) = 0F for all v ∈ S} ⊆ V ∨ .

(In some places the notation is Ann(S).)

Example: Let S = {0V }, then the set of linear forms ` ∈ V ∨ satisfying that
`(v) = 0F for all v ∈ S, i.e., `(0V ) = 0F is the entirety of V ∨ , i.e.,

{0V }0 = V ∨ .

▲▲▲

Example: Let V = (R2 , +, R, ⋅) and let S = {(1, 0)}. Then,

S 0 = {` ∈ V ∨ ∶ `(1, 0) = 0F }.

Take the standard basis for V ∨ ,

e1 (x, y) = x and e2 (x, y) = y.



Writing ` = a e1 + b e2 , we have that


`(1, 0) = 0F if and only if a = 0F ,
so that
S 0 = {b e2 ∶ b ∈ F} = F e2 .
▲▲▲

Example: Let V = (R2 , +, R, ⋅) and let S = {(1, 0), (0, 1)}. Then,
S 0 = {` ∈ V ∨ ∶ `(1, 0) = 0F and `(0, 1) = 0F }.
Using the same basis for V ∨ , we obtain that both a and b vanish, i.e.,
S 0 = {0V ∨ }.
▲▲▲
Look at the above three examples: first notice that the larger S is, the smaller
S 0 is. Second, in all instances S 0 turned out to be a linear subspace of V ∨ .
The next two propositions show that this is always the case:

Proposition 4.11 Let V be a vector space over F and let S ⊆ V be a subset.


Then,
S 0 ≤ V ∨.

Proof : We need to show that S 0 is non-empty and that it is closed under


addition and scalar multiplication. The set S 0 is non-empty because 0V ∨ ∈ S 0 .
Let `, `′ ∈ S 0 , i.e.,
`(v) = `′ (v) = 0F for all v ∈ S.
Then,
(` + `′ )(v) = `(v) + `′ (v) = 0F for all v ∈ S,
proving that ` + `′ ∈ S 0 . Likewise, let ` ∈ S 0 and a ∈ F, then
(a `)(v) = a `(v) = 0F for all v ∈ S,
proving that a ` ∈ S 0 . By definition, S 0 ≤ V ∨ . ∎

Proposition 4.12 Let (V, +, F, ⋅) be a vector space and let S, T ⊆ V . Then,

(a) If S ⊆ T then T 0 ≤ S 0 .
(b) S 0 = (Span S)0

Proof : For the first item, let ` ∈ T 0 , i.e.,

`(v) = 0F for all v ∈ T .

Since S ⊆ T , it follows that

`(v) = 0F for all v ∈ S,

i.e., ` ∈ S 0 , proving that T 0 ⊆ S 0 .


For the second item, let ` ∈ S 0 , i.e.,

`(v) = 0F for all v ∈ S.

Every v ∈ Span S is of the form

v = a1 v 1 + ⋅ ⋅ ⋅ + an v n

for some v1 , . . . , vn ∈ S, hence

`(v) = a1 `(v1 ) + ⋅ ⋅ ⋅ + an `(vn ) = 0F ,

proving that ` ∈ (Span S)0 , i.e.,

S 0 ⊆ (Span S)0 .

Conversely, since S ⊆ Span S, it follows from the first item that (Span S)0 ⊆
S 0 , proving that (Span S)0 = S 0 . ∎
Thus far, S was just any old set; consider now the case where S = W is a
subspace of V , in which case we have two subspaces, W and W 0 , of the spaces
V and V ∨ , which have the same dimension. As we now show, the dimensions of W
and W 0 are inter-related:

Proposition 4.13 Let (V, +, F, ⋅) be a finitely-generated vector space and let


W ≤ V . Then,
dimF W + dimF W 0 = dimF V.

Proof : Suppose that

dimF W = n and dimF V = n + k.

Let (w1 . . . wn ) be an ordered basis for W , which we complete (using


Proposition 3.36) into an ordered basis

B = (w1 , . . . , wn , v1 , . . . , vk )

for V . We partition its dual basis accordingly

B∨ = (`1 , . . . , `n , m1 , . . . , mk ),

such that

`i (wj ) = δji , `i (vj ) = 0, mi (wj ) = 0 and mi (vj ) = δji .

We will be done if we prove that (m1 , . . . , mk ) is an ordered basis for W 0 ,


for then dimF W 0 = k.
By the definition of a basis, every ` ∈ W 0 ≤ V ∨ can be written as

` = (a1 `1 + ⋅ ⋅ ⋅ + an `n ) + (b1 m1 + ⋅ ⋅ ⋅ + bk mk ).

For every j = 1, . . . , n,

0F = `(wj ) = (a1 `1 + ⋅ ⋅ ⋅ + an `n )(wj ) + (b1 m1 + ⋅ ⋅ ⋅ + bk mk )(wj ) = aj ,

proving that
` = b1 m1 + ⋅ ⋅ ⋅ + bk mk ,
i.e., (m1 , . . . , mk ) is a generating set for W 0 ; since it is also independent, it
is a basis for W 0 . ∎

4.5.2 The null space of a set of linear forms


The notion of an annihilating set has a dual version:

Definition 4.14 Let V be a vector space and let L ⊆ V ∨ . The null space
(!M‫ )קבוצת האפסי‬of L is the set of vectors

L0 = {v ∈ V ∶ `(v) = 0F for all ` ∈ L} ⊆ V.

Example: Let V be any vector space and L = {0V ∨ }. Then,

L0 = {v ∈ V ∶ 0V ∨ (v) = 0F } = V.

▲▲▲

Example: Let V = F3col and let L = {`} for

`([x, y, z]T ) = x + y + z.

Then,
L0 = {([x, y, z]T ) ∈ F3col ∶ x + y + z = 0},
which we know how to express explicitly. In fact, we know that
L0 = {[−s − t, s, t]T ∶ s, t ∈ F} = Span {[−1, 1, 0]T , [−1, 0, 1]T } .
This example shows that the left-hand side of a linear equation of the type
we started this course with is really a linear form, and the solution of a
homogeneous equation is nothing but its null space. ▲▲▲

Example: Let V = M2 (F) and let ` = tr, i.e.,

` ([a b; c d]) = a + d.

It is easy to see that

{`}0 = {[a b; c −a] ∶ a, b, c ∈ F} ,

or
{`}0 = Span {[1 0; 0 −1], [0 1; 0 0], [0 0; 1 0]} .
▲▲▲
The following three propositions are the analogs of Propositions 4.11–4.13:

Proposition 4.15 The null space of a set of linear forms is a vector sub-
space: let V be a vector space and let L ⊆ V ∨ , then

L0 ≤ V.

Proof : The set L0 is non-empty because it contains 0V . Let u, v ∈ L0 , i.e.,

`(u) = `(v) = 0F for all ` ∈ L.

Then,
`(u + v) = `(u) + `(v) = 0F for all ` ∈ L,
which implies that u + v ∈ L0 . For u ∈ L0 and a ∈ F,

`(a u) = a `(u) = 0F for all ` ∈ L,

which implies that a u ∈ L0 . By definition, L0 is a linear subspace of V . ∎

Proposition 4.16 Let (V, +, F, ⋅) be a vector space and let L, M ⊆ V ∨ .


Then,

(a) If L ⊆ M then M0 ≤ L0 .
(b) L0 = (Span L)0

Proof : Before we prove it formally, two observations: (i) the larger a set
of linear forms is, the more constraints are imposed on its null space, hence
its null space should be smaller. (ii) Think of L as a set of homogeneous
linear equations on Fncol (just as an example—we haven’t even required V to

be finitely-generated). The span of L is the set of all linear equations that


are linear combinations of the equations in L; we know that the space of
solutions doesn’t change, which explains the second item.
And now to the formal proof. For the first item, let v ∈ M0 , i.e.,

`(v) = 0F for all ` ∈ M .

Since L ⊆ M , it follows that

`(v) = 0F for all ` ∈ L,

i.e., v ∈ L0 , proving that M0 ⊆ L0 .


For the second item, let v ∈ L0 , i.e.,

`(v) = 0F for all ` ∈ L.

Every ` ∈ Span L is of the form

` = a1 `1 + ⋅ ⋅ ⋅ + an `n

for some `1 , . . . , `n ∈ L, hence

`(v) = (a1 `1 + ⋅ ⋅ ⋅ + an `n ) (v) = a1 `1 (v) + ⋅ ⋅ ⋅ + an `n (v) = 0F ,

proving that v ∈ (Span L)0 , i.e.,

L0 ⊆ (Span L)0 .

Conversely, since L ⊆ Span L, it follows from the first item that (Span L)0 ⊆
L0 , proving that (Span L)0 = L0 . ∎

Proposition 4.17 Let (V, +, F, ⋅) be a finitely-generated vector space and let


L ≤ V ∨ . Then,
dimF L + dimF L0 = dimF V.

Proof : This is left as an exercise; start with a basis for L0 . ∎


We now combine the notions of null sets and annihilators to prove the fol-
lowing:

Proposition 4.18 Let V be a finitely-generated vector space. Let W ≤ V


and let L ≤ V ∨ . Then,

(W 0 )0 = W and (L0 )0 = L. (4.1)

Proof : By Proposition 4.17 and Proposition 4.13,

dimF W 0 + dim(W 0 )0 = dimF V

and
dimF W + dimF W 0 = dimF V,
from which we conclude that W and (W 0 )0 have the same dimension.
It suffices then to show every vector in W is also in (W 0 )0 (actually, justify
this assertion formally).
By definition,

(W 0 )0 = {v ∈ V ∶ `(v) = 0F for all ` ∈ W 0 },

whereas
W 0 = {` ∈ V ∨ ∶ `(w) = 0F for all w ∈ W }.
So let w ∈ W . For every ` ∈ W 0

`(w) = 0F ,

from which it follows that w ∈ (W 0 )0 , proving that W ⊆ (W 0 )0 , which
completes the proof. The second part is left as an exercise. ∎

Corollary 4.19 Let V be a finitely-generated vector space and let U, W ≤ V .


Then,
U =W if and only if U 0 = W 0.
Likewise, let L, M ≤ V ∨ . Then,

L=M if and only if L0 = M0 .



Proof : We prove the first item. One direction is obvious, U = W implies that
U 0 = W 0 . The other direction follows from the fact that U 0 = W 0 implies
that (U 0 )0 = (W 0 )0 , along with (4.1). The second item is left as an exercise.
∎

Exercises

(intermediate) 4.15 Let (V, +, F, ⋅) be a vector space and let W ≤ V .
Define
U = {` ∈ V ∨ ∶ W ≤ {`}0 }.
Show that U ≤ V ∨ .

Solution 4.15: The set U is not empty because {0V ∨ }0 = V , hence 0V ∨ ∈ U . Let
`, m ∈ U . By definition,

W ≤ {v ∈ V ∶ `(v) = 0F } and W ≤ {v ∈ V ∶ m(v) = 0F }.

Now,

{v ∈ V ∶ `(v) + m(v) = 0F } ⊇ {v ∈ V ∶ `(v) = 0F } ∩ {v ∈ V ∶ m(v) = 0F },

from which follows that

W ≤ {v ∈ V ∶ `(v) + m(v) = 0F } = {` + m}0 ,

i.e., ` + m ∈ U . Similarly, let ` ∈ U and let a ∈ F. Then,

{v ∈ V ∶ `(v) = 0F } ⊆ {v ∈ V ∶ a `(v) = 0F }

(this is an equality unless a = 0F ), from which we deduce that

W ≤ {v ∈ V ∶ a `(v) = 0F },

i.e., a ` ∈ U . This completes the proof that U is a linear subspace of V ∨ .

(easy) 4.16 Let


w = (1, 1) ∈ R2 .
Calculate {w}0 .

Solution 4.16: By definition,


{w}0 = {` ∈ V ∨ ∶ `(1, 1) = 0}.

Denote by `1 and `2 the dual basis forms,

`1 (x, y) = x and `2 (x, y) = y,

so that every ` ∈ V ∨ is of the form ` = a `1 + b `2 . Then,

{w}0 = {a `1 − a `2 ∶ a ∈ R}.

(intermediate) 4.17 Let (V, +, F, ⋅) be a finitely-generated vector space, let


W1 , W2 ≤ V and let L1 , L2 ≤ V ∨ . Show that

(a) (W1 ∩ W2 )0 = (W1 )0 + (W2 )0 .


(b) (W1 + W2 )0 = (W1 )0 ∩ (W2 )0 .
(c) (L1 ∩ L2 )0 = (L1 )0 + (L2 )0 .
(d) (L1 + L2 )0 = (L1 )0 ∩ (L2 )0 .

Solution 4.17: We will solve only the first item. By definition,


W10 = {` ∈ V ∨ ∶ `(w) = 0 for all w ∈ W1 }
W20 = {` ∈ V ∨ ∶ `(w) = 0 for all w ∈ W2 }
(W1 ∩ W2 )0 = {` ∈ V ∨ ∶ `(w) = 0 for all w ∈ W1 ∩ W2 }.

One direction is easy: if ` ∈ (W1 )0 + (W2 )0 , then there exist m1 ∈ (W1 )0 and m2 ∈ (W2 )0 ,
such that ` = m1 + m2 . Then for all w ∈ W1 ∩ W2 ,

`(w) = m1 (w) + m2 (w) = 0,

i.e., ` ∈ (W1 ∩ W2 )0 , proving that

(W1 )0 + (W2 )0 ⊆ (W1 ∩ W2 )0 .

The other direction is harder. Let (w1 , . . . , wk ) be a basis for W1 ∩ W2 . We complete it


into a basis
(w1 , . . . , wk , u1 , . . . , up )
for W1 , and then to a basis
(w1 , . . . , wk , v1 , . . . , vq )
for W2 . We saw that
(w1 , . . . , wk , u1 , . . . , up , v1 , . . . , vq )

is a basis for W1 + W2 . Finally, we complete it into a basis

(w1 , . . . , wk , u1 , . . . , up , v1 , . . . , vq , x1 , . . . , xr )

for V . We partition the dual basis correspondingly as follows

(`1 , . . . , `k , m1 , . . . , mp , s1 , . . . , sq , t1 , . . . , tr ).

The proof is completed by showing that

(s1 , . . . , sq , t1 , . . . , tr )

is a basis for (W1 )0 ,


(m1 , . . . , mp , t1 , . . . , tr )
is a basis for (W2 )0 , and

(m1 , . . . , mp , s1 , . . . , sq , t1 , . . . , tr )

is a basis for both (W1 ∩ W2 )0 and (W1 )0 + (W2 )0 .

(intermediate) 4.18 Find a basis for the annihilator of

W = Span ((1, 2, −3, 4), (0, 1, 4, −1)) ≤ R4 .

Solution 4.18: The annihilator of W is the set of all linear forms ` ∈ V ∨ satisfying
`(w) = 0 for all w ∈ W . It suffices to require that `(w) = 0 for all w in a generating set,
i.e.,
`(1, 2, −3, 4) = `(0, 1, 4, −1) = 0.
Let’s use the basis dual to the standard basis,

`1 (a, b, c, d) = a `2 (a, b, c, d) = b, etc.

Writing ` = a `1 + b `2 + c `3 + d `4 , we obtain that

a + 2b − 3c + 4d = 0 and b + 4c − d = 0.

A straightforward calculation gives that

a = −6b − 13c and d = b + 4c,

where we treat here b and c as free variables. Thus, a basis for W 0 is

{−6 `1 + `2 + `4 , −13 `1 + `3 + 4 `4 }.

Note that there is a lot of freedom in the answer!
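Comment: the answer is easy to sanity-check: each of the two forms must vanish on both generators of W. A sketch (Python with NumPy, assumed available; the forms are written as their coefficient rows relative to the dual standard basis):

    import numpy as np

    W = np.array([[1.0, 2.0, -3.0, 4.0],        # generators of W, as rows
                  [0.0, 1.0,  4.0, -1.0]])
    forms = np.array([[-6.0, 1.0, 0.0, 1.0],    # -6 l1 + l2 + l4
                      [-13.0, 0.0, 1.0, 4.0]])  # -13 l1 + l3 + 4 l4
    assert np.allclose(forms @ W.T, 0)          # every form kills every generator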



(intermediate) 4.19 Let V = (R4 , +, R, ⋅), and let

`1 (x) = x1 + 2x2 + 2x3 + x4 `2 (x) = 2x1 + x4

`3 (x) = −2x1 − 3x3 + 3x4 .


Find a subspace W ≤ R4 such that

W 0 = Span({`1 , `2 , `3 }).

Solution 4.19: Let’s use the fact that W = (W 0 )0 . Thus,


W = {w ∈ V ∶ `(w) = 0 for all ` ∈ Span({`1 , `2 , `3 }}.

We are looking for vectors w ∈ V such that

`1 (w) = `2 (w) = `3 (w) = 0,

i.e.,
⎡ 1 2  2 1⎤ ⎡w1 ⎤   ⎡0⎤
⎢ 2 0  0 1⎥ ⎢w2 ⎥ = ⎢0⎥ .
⎣−2 0 −3 3⎦ ⎢w3 ⎥   ⎣0⎦
            ⎣w4 ⎦

Using the Gauss-Jordan algorithm, we obtain the reduced matrix


⎡1 0 0   1/2 ⎤
⎢0 1 0  19/12⎥ .
⎣0 0 1  −4/3 ⎦

Thus,
W = Span{(−1/2, −19/12, 4/3, 1)} = Span{(−6, −19, 16, 12)}.

(intermediate) 4.20 Let V be a finitely-generated vector space and let


L ≤ V ∨ . Show that
(L0 )0 = L.
Conclude that for L, M ≤ V ∨ ,

L=M if and only if L0 = M0 .



Solution 4.20: We have


dimF L0 + dimF (L0 )0 = dimF V,

and
dimF L + dimF L0 = dimF V,
from which we conclude that dimF L = dimF (L0 )0 . We will be done if we prove that
L ≤ (L0 )0 , as a proper subspace has lower dimension than the space it is a subspace of.
Let ` ∈ L; by definition,
`(v) = 0 for all v ∈ L0 .
But this means, by definition, that ` ∈ (L0 )0 , which completes the first part.
The second part now follows. Clearly, L = M implies that L0 = M0 . Conversely, if L0 = M0
then (L0 )0 = (M0 )0 , i.e., L = M .

(harder) 4.21 Prove Proposition 4.17.

4.5.3 Linear systems and linear forms


Let A ∈ Mm×n (F). We consider the space of solutions

SA = {v ∈ Fncol ∶ Av = 0Fmcol }

of the homogeneous linear system. Each of the m rows of A can be viewed


as a linear form acting on an element of Fncol ; Thus the set of solutions SA
equals,

SA = {v ∈ Fncol ∶ Rowi (A)v = 0, i = 1, . . . , m} = {Rowi (A) ∶ i = 1, . . . , m}0 .

By Proposition 4.16,

SA = (Span{Rowi (A) ∶ i = 1, . . . , m})0 = (R(A))0 ,

i.e., the set of solutions is the null space of the row space of A. Proposi-
tion 4.17 asserts that

dimF R(A) + dimF SA = dimF Fncol = n.

Recall that the dimension of the row space equals the dimension of the column
space, and that this dimension is called the rank of the matrix. Thus,

dimF SA = n − rankA.

In other words, for a homogeneous linear system of m equations in n
unknowns, the space of solutions is a linear subspace of Fncol , whose dimension
is n minus the rank of A, which we recall is the number of non-zero rows in
its row-reduced form (make sure that this makes sense to you).

Example: Consider once again the matrix


⎡0 0 1 4⎤
⎢2 4 2 6⎥ ,
⎣3 6 2 5⎦
whose row-reduced form is
⎡1 2 0 −1⎤
⎢0 0 1  4⎥ .
⎣0 0 0  0⎦
In this case, n = 4 and rankA = 2. As for the space of solutions, its dimension
is 2,

SA = {[−2s + t, s, −4t, t]T ∶ s, t ∈ R} = Span {[−2, 1, 0, 0]T , [1, 0, −4, 1]T } .
▲▲▲
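Comment: the identity dimF SA = n − rank A is easy to check numerically. A sketch (Python with NumPy; matrix_rank computes a numerical rank, adequate for this small integer matrix):

    import numpy as np

    A = np.array([[0.0, 0.0, 1.0, 4.0],
                  [2.0, 4.0, 2.0, 6.0],
                  [3.0, 6.0, 2.0, 5.0]])
    r = np.linalg.matrix_rank(A)
    print(r, A.shape[1] - r)              # rank 2, so dim S_A = 4 - 2 = 2
    S = np.array([[-2.0, 1.0, 0.0, 0.0],  # the two spanning solutions above
                  [1.0, 0.0, -4.0, 1.0]])
    assert np.allclose(A @ S.T, 0)        # both indeed solve A v = 0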

Example: Let’s have a different look on the relation between equations and
solutions. Let V = F3col ; then V ∨ = F3row under the action through row-column
multiplication. We use the standard bases for V and V ∨ . Consider the linear
form
`(x) = [1 1 1] [x1 , x2 , x3 ]T = x1 + x2 + x3 .
The space of solutions, which is the null space of {`} is

{`}0 = {[−s − t, s, t]T ∶ s, t ∈ F} = Span {[−1, 1, 0]T , [−1, 0, 1]T } ≤ F3col .
The equation represented by the linear form whose coordinates (relative to
the standard dual basis) are [1, 1, 1], induces a space of solutions, which is

a two-dimensional subspace of F3col . As we know, the space of solutions does


not change if we multiply ` by any non-zero scalar: the space of solutions is
in fact the null space of the one-dimensional subspace of linear forms, whose
coordinate representation is

F[1, 1, 1] = {[a, a, a] ∶ a ∈ F}.

Denote the space of solutions by W . We may ask the opposite question:


does the space of solutions determine the equation whose solutions they are? This
is really asking: what are all the linear forms ` satisfying `(w) = 0F for all
w ∈ W . Write such a linear form as

` = a1 e 1 + a2 e 2 + a3 e 3 ,

we require that ` ∈ W 0 , which is the case if and only if


` ([−1, 1, 0]T ) = −a1 + a2 = 0F and ` ([−1, 0, 1]T ) = −a1 + a3 = 0F ,
from which we obtain that a1 = a2 = a3 , i.e., ` must be of the form
` = a (e1 + e2 + e3 ) = a [1 1 1] [e1 , e2 , e3 ]T ,
which is what we expected. This example shows once again the relations
between equations and solutions as linear subspaces of vectors and linear
forms. ▲▲▲
Chapter 5

Linear Transformations

5.1 Definition and examples


Mathematics features all kinds of “categories”, which are sets endowed with
a structure. This course is concerned with the category of vector spaces over
a field F, which are sets endowed with a notion of linear combinations. A
major reason for defining vector spaces is that they are abundant—there are
many vector spaces of interest in mathematics and its applications; indeed,
it wouldn’t make sense to define a class of objects if there was only one such
object in this class. Thus, we often encounter situations in which there are
multiple vector spaces (over the same field). In such cases, we might be
interested in looking at functions between two such objects.
Let (V, +, F, ⋅) and (W, +, F, ⋅) be two vector spaces over the same field. The
set
Func(V, W ) = {f ∶ V → W }

is the space of functions with domain (!M‫ )תחו‬V and codomain (!‫ )טווח‬W . But
just as with the linear forms on V , which form a subset of FV , we delineate a
subset of all functions that “respect” the vector space structure:

Definition 5.1 Let (V, +, F, ⋅) and (W, +, F, ⋅) be vector spaces. A linear


transformation (!‫ )העתקה לינארית‬from V to W is a function f ∶ V → W ,
satisfying
f (u + v) = f (u) + f (v)

and
f (a v) = a f (v)
for all u, v ∈ V and a ∈ F. The set of all linear transformations from V to
W is denoted by HomF (V, W ).

Comments:

(a) Note once again how addition and scalar multiplication on both sides
of an equation are operations on different spaces.
(b) Setting W = F, HomF (V, F) = V ∨ .

The following properties of linear transformations are easy to prove (cf. with
their analogs for linear forms):

Proposition 5.2 Let (V, +, F, ⋅) and (W, +, F, ⋅) be vector spaces and let f ∈
HomF (V, W ). Then,

(a) f (0V ) = 0W .
(b) For every v ∈ V , f (−v) = −f (v).
(c) For every v1 , . . . , vn ∈ V and a1 , . . . , an ∈ F,

f (a1 v1 + ⋅ ⋅ ⋅ + an vn ) = a1 f (v1 ) + ⋅ ⋅ ⋅ + an f (vn ).

Proof : For the first item, for every v ∈ V ,


f (0V ) = f (0F v) = 0F f (v) = 0W .
For the second item,
f (−v) = f ((−1F )v) = (−1F ) f (v) = −f (v).
The third item follows by induction. Note that we can write it in matrix
form,
f ((v1 . . . vn ) [a1 , . . . , an ]T ) = (f (v1 ) . . . f (vn )) [a1 , . . . , an ]T .
∎

Example: The zero transformation f ∶ V → W defined by


f (v) = 0W for all v ∈ V
is a linear transformation. ▲▲▲

Example: The identity map (!‫ )העתקת הזהות‬f ∶ V → V defined by


f (v) = v for all v ∈ V
is a linear transformation. ▲▲▲

Example: The inverse map f ∶ V → V defined by


f (v) = −v for all v ∈ V
is a linear transformation. ▲▲▲

Example: Linear forms are linear transformations HomF (V, F). ▲▲▲

Example: Maps f ∶ V → V defined by


f (v) = a v for all v ∈ V
for some a ∈ F are linear transformations. They are called homotheties
(!‫)הומותטיות‬. ▲▲▲

Example: Let V be a finitely-generated vector space and let B = (v1 . . . vn )


be an ordered basis. The coordinate map
f ∶ V → Fncol
defined by
f (v) = [v]B
is a linear transformation. This was in fact proved in Proposition 3.46. ▲▲▲

Example: Let A ∈ Mm×n (F). Consider the transformations


f ∶ Fncol → Fmcol and g ∶ Fmrow → Fnrow

defined by
f (v) = Av and g(w) = wA.
Both are linear transformations. ▲▲▲

5.2 Properties of linear transformations


Like for linear forms, we consider the case where V is finitely-generated.
The following two propositions are analogous to Proposition 4.4 and Propo-
sition 4.5.

Proposition 5.3 Let V be a finitely-generated vector space and let W be a


vector space over the same field. Let

B = (v1 . . . vn )

be an ordered basis for V . Then, for every sequence w1 , . . . , wn ∈ W there


exists a linear transformation f ∈ HomF (V, W ), such that

f (vi ) = wi for every i = 1, . . . , n.

Proof : There really is only one way to define such a transformation. Since
every v ∈ V has a unique representation as

v = a1 v 1 + ⋅ ⋅ ⋅ + an v n ,

then f (v) must be given by

f (v) = a1 f (v1 ) + ⋅ ⋅ ⋅ + an f (vn ) = a1 w1 + ⋅ ⋅ ⋅ + an wn .

To complete the proof, we have to verify that f is a linear transformation.


Let u, v ∈ V be given by

u = a1 v 1 + ⋅ ⋅ ⋅ + an v n
v = b1 v 1 + ⋅ ⋅ ⋅ + bn v n .

Then,
u + v = (a1 + b1 ) v1 + ⋅ ⋅ ⋅ + (an + bn ) vn .
By the way we defined f ,

f (u) = a1 w1 + ⋅ ⋅ ⋅ + an wn
f (v) = b1 w1 + ⋅ ⋅ ⋅ + bn wn ,

and
f (u + v) = (a1 + b1 ) w1 + ⋅ ⋅ ⋅ + (an + bn ) wn ,
so that indeed f (u + v) = f (u) + f (v). We proceed similarly to show that
f (k v) = k f (v) for k ∈ F. ∎
The following complementing proposition asserts that there really was no
other way to define f :

Proposition 5.4 Let V be a finitely-generated vector space and let W be a


vector space over the same field. Let

B = (v1 . . . vn )

be an ordered basis for V . If two linear transformations g, f ∈ HomF (V, W )


satisfy
g(vi ) = f (vi ) for all i = 1, . . . , n,
then g = f .

Proof : By the property of a basis in a finitely-generated vector space, every


v ∈ V can be represented uniquely as

v = a1 v 1 + ⋅ ⋅ ⋅ + an v n

for some scalars a1 , . . . , an . Then, by the linearity of g, f ,

g(v) = a1 g(v1 ) + ⋅ ⋅ ⋅ + an g(vn ) = a1 f (v1 ) + ⋅ ⋅ ⋅ + an f (vn ) = f (v).

Example: Let V be a finitely-generated vector space. Let

B = (v1 . . . vn )

be an ordered basis for V . Let λ1 , . . . , λn ∈ F. Then the linear transformation


f ∈ HomF (V, V ) satisfying
f (vi ) = λi vi
is defined uniquely. ▲▲▲

Example: Let V = Fncol and W = Fmcol . We will show that to every f ∈


HomF (V, W ) corresponds an A ∈ Mm×n (F) such that

f (v) = Av.

Take the standard basis E = (e1 . . . en ). Every v ∈ Fncol has a unique


representation
v = v 1 e1 + ⋅ ⋅ ⋅ + v n en ,
hence
f (v) = v 1 f (e1 ) + ⋅ ⋅ ⋅ + v n f (en ) = Av,
where for every i = 1, . . . , n,

Coli (A) = f (ei ).

▲▲▲

Example: Let V = R2 and W = R3 . Consider the basis for R2 , B = (v1 , v2 ),


where
v1 = (1, 2) and v2 = (3, 4).
By the above propositions, there exists a unique linear transformations f ∶
R2 → R3 satisfying

f (v1 ) = (3, 2, 1) and f (v2 ) = (6, 5, 4).

How do we find it? A direct calculation shows that


$$[(x, y)]_B = \begin{bmatrix} \tfrac{1}{2}(3y - 4x) \\ \tfrac{1}{2}(2x - y) \end{bmatrix}.$$

Hence,
$$f(x, y) = \tfrac{1}{2}(3y - 4x)\,f(v_1) + \tfrac{1}{2}(2x - y)\,f(v_2) = \tfrac{1}{2}(3y - 4x)(3, 2, 1) + \tfrac{1}{2}(2x - y)(6, 5, 4).$$
2 2
For example,
f (1, 0) = −2(3, 2, 1) + (6, 5, 4) = (0, 1, 2).
▲▲▲
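A numerical sketch of the construction in this example (an aside; Python with numpy, all names ours): the coordinates of (x, y) relative to B are found by solving a linear system, and f is assembled from the prescribed images.

    import numpy as np

    V = np.array([[1.0, 3.0],
                  [2.0, 4.0]])      # columns are v1 = (1,2) and v2 = (3,4)
    W = np.array([[3.0, 6.0],
                  [2.0, 5.0],
                  [1.0, 4.0]])      # columns are f(v1) and f(v2)

    def f(x, y):
        a = np.linalg.solve(V, np.array([x, y]))   # [(x, y)]_B
        return W @ a                               # a1 f(v1) + a2 f(v2)

    print(f(1.0, 0.0))   # -> [0. 1. 2.], as computed above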

Exercises

(easy) 5.1 Which of the following functions f ∶ R2 → R2 is a linear trans-


formation?

(a) f (x, y) = (1 + x, y).


(b) f (x, y) = (y, x).
(c) f (x, y) = (x2 , y).
(d) f (x, y) = (sin x, y).
(e) f (x, y) = (y − x, 0).

Solution 5.1: No, yes, no, no, yes.

(easy) 5.2 Let V = R2 and W = R3 . Write in explicit form the linear


transformation f ∈ HomR (V, W ) satisfying

f (1, 2) = (3, 2, 1) and f (3, 4) = (6, 5, 4).

Solution 5.2: Since for every (x, y) ∈ R2


(x, y) = (−2x + 3y/2)(1, 2) + (x − y/2)(3, 4),

it follows that

f (x, y) = (−2x + 3y/2)f (1, 2) + (x − y/2)f (3, 4) = (−2x + 3y/2)(3, 2, 1) + (x − y/2)(6, 5, 4).

(easy) 5.3 Let

V = R<2 [X] = {p ∈ R[X] ∶ deg p < 2},

and let W = M3 (R). Define the function f ∶ V → W ,


$$f(a + bX) = \begin{bmatrix} a & & \\ & a+b & \\ & & b \end{bmatrix}.$$

(a) Show that f is a linear transformation.


(b) Does there exist a p ∈ V such that
$$f(p) = \begin{bmatrix} 1 & & \\ & 0 & \\ & & 1 \end{bmatrix}\,?$$

(c) Does there exist a non-zero p ∈ V such that f (p) = 0W ?

Solution 5.3: The answer to (b) is negative, because f only yields diagonal matrices
in which the middle diagonal entry is the sum of the other two. The answer to (c) is also
negative, because f (a + bX) = 0W if and only if a = b = 0.

(intermediate) 5.4 Let f ∶ C2 → C be defined by

f (z, w) = z + w̄,

where w̄ is the complex-conjugate of w. Is f a linear transformation when

(a) C2 and C are vector spaces over C?


(b) C2 and C are vector spaces over R?

Solution 5.4: The answer to (a) is negative, because for (w, z), (s, t) ∈ C2 and a ∈ C ∖ R
(viewed as a scalar),

$$f(a(w, z) + (s, t)) = f(aw + s,\, az + t) = aw + s + \overline{az + t} = (aw + \bar{a}\bar{z}) + (s + \bar{t}) \neq a\,f(w, z) + f(s, t).$$

The answer to (b) is positive, because for (w, z), (s, t) ∈ C2 and a ∈ R (viewed as a scalar),

$$f(a(w, z) + (s, t)) = f(aw + s,\, az + t) = aw + s + \overline{az + t} = a(w + \bar{z}) + (s + \bar{t}) = a\,f(w, z) + f(s, t).$$

(intermediate) 5.5 Does there exist a linear transformation f ∶ R3 → R3 ,


which is not the zero transformation, satisfying

f (v1 ) = f (v2 ) = f (v3 ) = f (v4 ),



where
v1 = (1, 0, 1), v2 = (1, 2, 1), v3 = (0, 1, 1) and v4 = (2, 3, 3)?
If it does, write it explicitly; otherwise explain why not.
Solution 5.5: Let
u1 = v2 − v1 = (0, 2, 0), u2 = v3 − v1 = (−1, 1, 0) and u3 = v4 − v1 = (1, 3, 2).

By linearity, f (u1 ) = f (u2 ) = f (u3 ) = 0. Now,

e2 = (1/2) u1 , e1 = e2 − u2 and e3 = (1/2)(u3 − e1 − 3e2 ).

By linearity
f (e1 ) = f (e2 ) = f (e3 ) = 0,
3
hence for every (x, y, z) ∈ R ,

f (x, y, z) = f (x e1 + y e2 + z e3 ) = x f (e1 ) + y f (e2 ) + z f (e3 ) = 0.

(intermediate) 5.6 Consider a linear transformation f ∶ R3 → R2 satisfying


f (0, 1, 2) = (1, 0) and f (0, 0, 1) = (1, 1).
Based on this, is it possible to find
f (0, 2, 3) and f (1, 2, 3)?

Solution 5.6: The answer to the first item is positive as


(0, 2, 3) = 2(0, 1, 2) − (0, 0, 1),

so that
f (0, 2, 3) = 2 f (0, 1, 2) − f (0, 0, 1) = 2(1, 0) − (1, 1) = (1, −1).
The answer to the second item is negative, since f (1, 0, 0) cannot be recovered from the
data.

(intermediate) 5.7 Let V, W be vector spaces and let U ≤ W . Let f ∶ V →


W be a linear transformation. Show that
S = {v ∈ V ∶ f (v) ∈ U } ≤ V.

Solution 5.7: We need to show that the set of vectors v ∈ V satisfying that f (v) ∈ U
is not just a subset of V , but a vector subspace. First, we claim that S is not empty as
0V ∈ S; indeed f (0V ) = 0W ∈ U as U is a vector subspace of W . Second, if u, v ∈ S, then
f (u), f (v) ∈ U , hence
f (u + v) = f (u) + f (v) ∈ U,
proving that u + v ∈ S. Finally, if u ∈ S and a ∈ F, then
f (a u) = a f (u) ∈ U,
i.e., a u ∈ S.

(easy) 5.8 Let V be a vector space over F and let ℓ1 , . . . , ℓn ∈ V ∨ . Define


f ∶ V → Fncol by
$$f(v) = \begin{bmatrix} \ell^1(v) \\ \vdots \\ \ell^n(v) \end{bmatrix}.$$
Show that f is a linear transformation.
Solution 5.8: We have

$$f(u + v) = \begin{bmatrix} \ell^1(u+v) \\ \vdots \\ \ell^n(u+v) \end{bmatrix} = \begin{bmatrix} \ell^1(u) + \ell^1(v) \\ \vdots \\ \ell^n(u) + \ell^n(v) \end{bmatrix} = \begin{bmatrix} \ell^1(u) \\ \vdots \\ \ell^n(u) \end{bmatrix} + \begin{bmatrix} \ell^1(v) \\ \vdots \\ \ell^n(v) \end{bmatrix},$$

and

$$f(a\,u) = \begin{bmatrix} \ell^1(a\,u) \\ \vdots \\ \ell^n(a\,u) \end{bmatrix} = \begin{bmatrix} a\,\ell^1(u) \\ \vdots \\ a\,\ell^n(u) \end{bmatrix} = a \begin{bmatrix} \ell^1(u) \\ \vdots \\ \ell^n(u) \end{bmatrix}.$$

(harder) 5.9 Consider V = Func(R, R) as a vector space over R. Show


that a function h ∶ V → V (it is a function mapping functions to functions!)
defined for every f ∈ V by
(h(f ))(x) = f (x + 1)
is a linear transformation.
Solution 5.9: Let f, g ∈ Func(R, R) and a ∈ R. Then, for every x ∈ R
(h(f + g))(x) = (f + g)(x + 1) = f (x + 1) + g(x + 1) = (h(f ))(x) + (h(g))(x),
and
(h(a f ))(x) = (a f )(x + 1) = a f (x + 1) = a (h(f ))(x),
i.e., h(f + g) = h(f ) + h(g) and h(a f ) = a h(f ).

5.3 The space HomF(V, W )


Given what we learned about linear forms, you will not be surprised to know
that the set of linear transformations HomF (V, W ) can be given a structure of
a vector space. After all, it is a subset of the space of functions Func(V, W ),
which is a vector space with respect to the addition of functions,

(f + g)(v) = f (v) + g(v),

and scalar multiplication,

(a f )(v) = a f (v).

Proposition 5.5 Let V and W be vector spaces over a field F. The set
HomF (V, W ) is a linear subspace of Func(V, W ).

Proof : The set HomF (V, W ) is non-empty because it contains the zero map.
Let f, g ∈ HomF (V, W ) and b ∈ F; we need to show that f + g ∈ HomF (V, W )
and that b f ∈ HomF (V, W ). For all u, v ∈ V ,

(f + g)(u + v) = f (u + v) + g(u + v) = (f (u) + f (v)) + (g(u) + g(v))


= (f (u) + g(u)) + (f (v) + g(v)) = (f + g)(u) + (f + g)(v),
and for every v ∈ V and a ∈ F,
(f + g)(a v) = f (a v) + g(a v) = a f (v) + a g(v)
= a (f (v) + g(v)) = a((f + g)(v)),

proving that f + g ∈ HomF (V, W ).


Likewise, for all u, v ∈ V ,
(b f )(u + v) = b (f (u + v)) = b (f (u) + f (v))
= b (f (u)) + b (f (v)) = (b f )(u) + (b f )(v),
and for every v ∈ V and a ∈ F,

(b f )(a v) = b (f (a v)) = b (a f (v)) = a (b f (v)) = a((b f )(v)),

proving that b f ∈ HomF (V, W ). ∎



Example: Let V = Fncol and W = Fmcol . For A, B ∈ Mm×n (F), we define
fA , fB ∈ HomF (V, W ) by
fA (v) = Av and fB (v) = Bv.
Then, fA + fB ∈ HomF (V, W ) is given by
(fA + fB )(v) = fA (v) + fB (v) = Av + Bv = (A + B)v,
where in the last equality we used the distributivity of matrix multiplication.
Thus, fA + fB = fA+B . We conclude that the addition of matrices of the same
dimensions is really the addition of two linear transformations. ▲▲▲
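A quick numerical confirmation of fA + fB = fA+B (an aside, with arbitrarily chosen matrices):

    import numpy as np

    A = np.array([[1.0, 2.0], [0.0, 1.0]])
    B = np.array([[3.0, 0.0], [1.0, 1.0]])
    v = np.array([2.0, -1.0])

    # (f_A + f_B)(v) = Av + Bv coincides with f_{A+B}(v) = (A + B)v
    assert np.allclose(A @ v + B @ v, (A + B) @ v)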

5.4 Projections and reflections


In this section we will see two interesting examples of linear transformations.

Definition 5.6 Let V be a vector space over F and let U, W ≤ V . We say


that U and W are complementary (משלימים) if

(a) U + W = V .
(b) To every v ∈ V correspond unique u ∈ U and w ∈ W , such that
v = u + w.

In such case we write


V = U ⊕ W,
and such a sum is called a direct sum (סכום ישר).

We have already seen earlier in this course that these two conditions are
equivalent to the conditions that U + W = V and U ∩ W = {0V }.

Example: Let V = R3 , then


U = {(v 1 , v 2 , v 3 ) ∈ R3 ∶ v 3 = 0} and W = {(v 1 , v 2 , v 3 ) ∈ R3 ∶ v 1 = v 2 = 0}
are complementary, because every v ∈ R3 is a sum of a vector in U and a
vector in W , and this decomposition,
(v 1 , v 2 , v 3 ) = (v 1 , v 2 , 0) + (0, 0, v 3 ),
is unique. ▲▲▲

Example: Let V = R[X] be the space of polynomials in X with real-valued


coefficients. Then, V = U ⊕ W , where
U = {p ∈ R[X] ∶ p = ∑i≥0 pi X 2i }
W = {p ∈ R[X] ∶ p = ∑i≥0 pi X 2i+1 }.

I.e., the polynomials of odd and even powers are complementary in the space
of all polynomials. ▲▲▲

Definition 5.7 Let V be a vector space over F such that V = U1 ⊕ U2 . We


define two projection operators (אופרטורי הטלה)

p1 ∶ V → V and p2 ∶ V → V,

by
p1 (v) = u1 and p2 (v) = u2 ,
where v = u1 + u2 is the unique decomposition of v as a sum of elements in
U1 , U2 . The operator p1 is called the projection on U1 parallel to U2 ; the
operator p2 is called the projection on U2 parallel to U1 .

Comments:

(a) We could have defined p1 ∶ V → U1 and p2 ∶ V → U2 .


(b) For every v ∈ V ,

(p1 + p2 )(v) = p1 (v) + p2 (v) = u1 + u2 = v,

so that p1 + p2 is the identity V → V .

Example: Let V = R3 ,

U1 = Span{(1, 0, 0), (0, 1, 0)} and U2 = Span{(1, 1, 1)}.

Then, every (x, y, z) ∈ R3 has a unique decomposition

(x, y, z) = (x − z, y − z, 0) + (z, z, z),

so that

p1 (x, y, z) = (x − z, y − z, 0) and p2 (x, y, z) = (z, z, z).



[Figure: a vector v decomposed as v = p1 (v) + p2 (v), with p1 (v) ∈ U1 and p2 (v) ∈ U2 ; the reflection S2 (v) = p2 (v) − p1 (v) flips the U1 -component.]
▲▲▲
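As an aside, the decomposition defining p1 and p2 can be computed by solving a linear system whose columns hold bases of U1 and U2 ; a minimal sketch in Python/numpy for the example above:

    import numpy as np

    # Columns: a basis of U1 followed by a basis of U2; together a basis of R^3.
    M = np.array([[1.0, 0.0, 1.0],
                  [0.0, 1.0, 1.0],
                  [0.0, 0.0, 1.0]])

    def projections(v):
        c = np.linalg.solve(M, v)              # v = c1 u1 + c2 u2 + c3 w
        p1 = c[0] * M[:, 0] + c[1] * M[:, 1]   # projection on U1 parallel to U2
        p2 = c[2] * M[:, 2]                    # projection on U2 parallel to U1
        return p1, p2

    p1, p2 = projections(np.array([4.0, 5.0, 6.0]))
    print(p1, p2)   # -> [-2. -1.  0.] [6. 6. 6.], i.e., (x-z, y-z, 0) and (z, z, z)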

Definition 5.8 Let V be a vector space over F such that V = U1 ⊕ U2 . We


define two reflection operators (אופרטורי שיקוף)

S1 ∶ V → V and S2 ∶ V → V,

by
S1 (v) = u1 − u2 and S2 (v) = u2 − u1 ,
where v = u1 + u2 is the unique decomposition of v as a sum of elements in
U1 , U2 .

Proposition 5.9 Let V be a vector space over F such that V = U1 ⊕ U2 .


Then the projection and the reflection operators are linear transformations.

Proof : The key is to observe that if u = u1 + u2 and v = v1 + v2 , where


u1 , v1 ∈ U1 and u2 , v2 ∈ U2 , then

u + v = (u1 + v1 ) + (u2 + v2 ) and a v = a v1 + a v2 ,
where u1 + v1 , a v1 ∈ U1 and u2 + v2 , a v2 ∈ U2 ,

hence by definition

p1 (u + v) = u1 + v1 = p1 (u) + p1 (v) and p1 (a v) = a v1 = a p1 (v),

and similarly for the three other operators. ∎



Exercises

(easy) 5.10 Prove (possibly for the second time) that V = U ⊕ W if and
only if V = U + W and U ∩ W = {0V }.

Solution 5.10: Suppose first that V = U + W and U ∩ W = {0V }. We need to show that
every v ∈ V has a unique representation as u + w with u ∈ U and w ∈ W . Suppose that

v = u1 + w1 = u2 + w2 ,

where u1 , u2 ∈ U and w1 , w2 ∈ W . Then,

u1 − u2 = w2 − w1 .

Since the left-hand side is in U and the right-hand side is in W , both vanish, i.e., u1 = u2
and w1 = w2 . Conversely, suppose that V = U ⊕ W . It follows at once that in particular
V = W + U . It remains to show that U ∩ W = {0V }. Suppose that there exists a v ∈ U ∩ W ,
v ≠ 0. Then,
v = v + 0V = 0V + v,
i.e., there are two different ways to write v as the sum of an element in U and an element
in W .

(easy) 5.11 Let V = R2 and consider the linear subspaces

U = Span{(1, 0)} and W = Span{(0, 1)}.

(a) Show that V = U ⊕ W .


(b) Write explicitly the linear transformations pi and Si .

Solution 5.11: For the first part, R2 = U + W as every (x, y) ∈ R2 can be written as
(x, y) = x(1, 0) + y(0, 1),
where x(1, 0) ∈ U and y(0, 1) ∈ W .
Also, since all the elements in U are of the form (x, 0), x ∈ R, and all the elements in W are
of the form (0, y), y ∈ R, U ∩ W = {0}. For the second part,

p1 (x, y) = (x, 0) p2 (x, y) = (0, y) S1 (x, y) = (x, −y) and S2 (x, y) = (−x, y).

(easy) 5.12 Let V = R2 and consider the linear subspaces

U = Span{(1, 2)} and W = Span{(1, 1)}.

(a) Show that V = U ⊕ W .


(b) Write explicitly the linear transformations pi and Si .

Solution 5.12: Every (x, y) ∈ R2 has a unique representation as


(x, y) = (y − x)(1, 2) + (2x − y)(1, 1).

Hence
p1 (x, y) = (y − x)(1, 2) p2 (x, y) = (2x − y)(1, 1)
S1 (x, y) = (y − x)(1, 2) − (2x − y)(1, 1) and S2 (x, y) = (2x − y)(1, 1) − (y − x)(1, 2).

(intermediate) 5.13 Let V = R3 and consider the linear subspaces

U = Span{(1, 0, 0), (1, 1, 0)} and W = Span{(1, 1, 1)}.

(a) Show that V = U ⊕ W .


(b) Write explicitly the linear transformations pi and Si .

Solution 5.13: Every (x, y, z) ∈ R3 has a unique representation as


(x, y, z) = (x − y)(1, 0, 0) + (y − z)(1, 1, 0) + z(1, 1, 1).

Hence,

p1 (x, y, z) = (x − y)(1, 0, 0) + (y − z)(1, 1, 0) and p2 (x, y, z) = z(1, 1, 1).

The reflections S1 , S2 are then readily obtained.

(intermediate) 5.14 Let f ∈ HomR (R3 , R3 ) be the linear transformation


defined by
f (x, y, z) = (x, y, −z).
Show that

(a) f (u) = u if and only if u ∈ Span(e1 , e2 ) = U .


(b) f (w) = −w if and only if w ∈ Span(e3 ) = W .
(c) f is the reflection through U parallel to W .

Solution 5.14: (a) We have f (x, y, z) = (x, y, z) if and only if z = 0, i.e., for all vectors
of the form
(x, y, 0) = x e1 + y e2 .
(b) We have f (x, y, z) = −(x, y, z) if and only if x = y = 0, i.e., for all vectors of the form

(0, 0, z) = z e3 .

(c) Since p1 (x, y, z) = (x, y, 0) and p2 (x, y, z) = (0, 0, z), it follows that

f (x, y, z) = (x, y, −z) = p1 (x, y, z) − p2 (x, y, z) = S1 (x, y, z).

(intermediate) 5.15 Let V = M2 (R),

$$U = \left\{\begin{bmatrix} a & b \\ 0 & 0 \end{bmatrix} : a, b \in \mathbb{R}\right\} \quad\text{and}\quad W = \left\{\begin{bmatrix} -c & 0 \\ c & d \end{bmatrix} : c, d \in \mathbb{R}\right\}.$$

(a) Show that U, W ≤ V and V = U ⊕ W .


(b) Write explicitly the projection and reflection operators.

Solution 5.15: It is easy to see that U and W are not empty (both contain the zero
matrix) and are closed under vector addition and scalar multiplication, from which we
deduce that U, W ≤ V . Moreover, every element in V has a unique representation as

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} a+c & b \\ 0 & 0 \end{bmatrix} + \begin{bmatrix} -c & 0 \\ c & d \end{bmatrix}.$$

Hence

$$p_1\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = \begin{bmatrix} a+c & b \\ 0 & 0 \end{bmatrix} \quad\text{and}\quad p_2\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = \begin{bmatrix} -c & 0 \\ c & d \end{bmatrix}.$$
The reflections S1 , S2 are then readily obtained.

(harder) 5.16 Let V = Func(R, R) and consider the linear subspaces

U = {f ∈ Func(R, R) ∶ f (x) = f (−x) for all x ∈ R}

and
W = {f ∈ Func(R, R) ∶ f (x) = −f (−x) for all x ∈ R}.

(a) Show that V = U ⊕ W .


(b) Write explicitly the linear transformations pi and Si .

Solution 5.16: Every f ∈ Func(R, R) has a decomposition


f (x) = (1/2)(f (x) + f (−x)) + (1/2)(f (x) − f (−x)),

i.e., f = g + h, where g ∈ U and h ∈ W . This shows that V = U + W . This is a direct sum


because the only function that is both in U and in W is the zero function. Thus,

(p1 (f ))(x) = (1/2)(f (x) + f (−x)) and (p2 (f ))(x) = (1/2)(f (x) − f (−x)).

The reflections S1 , S2 are then readily obtained,

(S1 (f ))(x) = f (−x) and (S2 (f ))(x) = −f (−x).
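As an aside, this even/odd decomposition is easy to realize in code; a minimal Python sketch (for f = exp, the two parts are cosh and sinh):

    import numpy as np

    def even_part(f):
        return lambda x: 0.5 * (f(x) + f(-x))    # p1(f)

    def odd_part(f):
        return lambda x: 0.5 * (f(x) - f(-x))    # p2(f)

    f, x = np.exp, 1.3
    assert np.isclose(even_part(f)(x), np.cosh(x))
    assert np.isclose(odd_part(f)(x), np.sinh(x))
    assert np.isclose(even_part(f)(x) + odd_part(f)(x), f(x))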

(harder) 5.17 Let V be a vector space over Q. Let B = (v1 , v2 , v3 ) be


an ordered basis for V . Let f ∈ HomQ (V, V ) be the linear transformation
satisfying
f (v1 ) = (5/6) v1 − (1/3) v2 − (1/2) v3
f (v2 ) = −(1/6) v1 + (2/3) v2 − (1/2) v3
f (v3 ) = −(1/6) v1 − (1/3) v2 + (1/2) v3 .
Find subspaces U, W ≤ V such that f is the projection on U parallel to W .

Solution 5.17: Every vector v ∈ V has a unique representation as

v = a v1 + b v2 + c v3 .

The function f is given by

f (v) = ((5/6)a − (1/6)b − (1/6)c) v1 + (−(1/3)a + (2/3)b − (1/3)c) v2 + (−(1/2)a − (1/2)b + (1/2)c) v3 .

A direct computation shows that f (f (v)) = f (v), so f is indeed a projection: the projection on its image parallel to its kernel. Solving f (v) = v yields the condition a + b + c = 0, whereas f (v) = 0V forces b = 2a and c = 3a. It follows that f is the projection on U = Span{v1 − v2 , v2 − v3 } parallel to W = Span{v1 + 2v2 + 3v3 }.



5.5 Kernel and image


Definition 5.10 Let V and W be vector spaces over a field F. Let f ∈
HomF (V, W ). The kernel (גרעין) of f is the set of vectors in V that are
mapped by f to the zero vector in W ,

ker f = {v ∈ V ∶ f (v) = 0W }.

The image (תמונה) of f is the set of those vectors w ∈ W for which there exists a
vector v ∈ V , such that w = f (v),

Image f = {f (v) ∶ v ∈ V } = {w ∈ W ∶ ∃v ∈ V, w = f (v)}.

Note that ker f is a subset of V whereas Image f is a subset of W . The


following proposition asserts that they are more than just subsets—they are
linear subspaces. Furthermore, in the case where W = F, so that HomF (V, W ) =
V ∨ , we have ker f = {f }0 (the null space of f ).

Proposition 5.11 Let V and W be vector spaces over a field F. Let f ∈


HomF (V, W ). Then,

ker f ≤ V and Image f ≤ W.

Proof : The set ker f is not empty, because it contains 0V . Let u, v ∈ ker f
and a ∈ F, i.e.,
f (u) = f (v) = 0W .
Then,

f (u + v) = f (u) + f (v) = 0W and f (a v) = a f (v) = 0W ,

i.e., u + v ∈ ker f and a v ∈ ker f , proving that ker f is a linear subspace of V .


Likewise, Image f is not empty because it contains 0W . Let w1 , w2 ∈ Image f
and let a ∈ F. By definition of the image, there exist v1 , v2 ∈ V such that

w1 = f (v1 ) and w2 = f (v2 ).



By the linearity of f ,
f (v1 + v2 ) = f (v1 ) + f (v2 ) = w1 + w2 ,
i.e., w1 + w2 ∈ Image f , and
f (a v1 ) = a f (v1 ) = a w1 ,
i.e., a w1 ∈ Image f , thus proving that Image f is a linear subspace of W . ∎

Example: Let f ∈ HomF (V, W ) be the zero transformation. Then


ker f = V and Image f = {0W }.
▲▲▲

Example: Let V = U1 ⊕ U2 and let p1 , p2 ∶ V → V be the projections on the


components of the direct sum. Then,
Image p1 = U1 ,
as, by definition, for every v ∈ V , p1 (v) = u1 where v = u1 + u2 for u1 ∈ U1
and u2 ∈ U2 . This shows that
Image p1 ≤ U1 .
On the other hand, for every u1 ∈ U1 , p1 (u1 ) = u1 , proving that
U1 ≤ Image p1 .

Likewise,
ker p1 = U2 ,
because if u2 ∈ U2 , then p1 (u2 ) = 0V , proving that
U2 ≤ ker p1 .
Conversely, if u ∈ ker p1 , then p1 (u) = 0V , proving that u ∈ U2 , i.e.,
ker p1 ≤ U2 .
▲▲▲
Recall that a function f ∶ V → W is called one-to-one (חד חד ערכית) (or
injective), if f (u) = f (v) implies that u = v. The following proposition
relates the kernel of a linear transformation to its injectivity.

Proposition 5.12 Let f ∈ HomF (V, W ). Then, f is one-to-one if and only


if ker f = {0V }.

Proof : Let f be one-to-one. Since f (0V ) = 0W , it follows that f (v) = 0W only


if v = 0V , proving that ker f = {0V }. Conversely, suppose that ker f = {0V }.
Let u, v ∈ V satisfy f (u) = f (v). Then,

f (u − v) = f (u) − f (v) = 0W ,

i.e., u − v ∈ ker f , and by assumption u − v = 0V , namely, u = v. ∎

Exercises

(easy) 5.18 Let A ∈ M2×2 (R) be given by

$$A = \begin{bmatrix} 1 & 2 \\ 3 & 6 \end{bmatrix},$$

and consider the linear transformation f ∶ R2col → R2col ,

f (v) = Av.

Find ker f and Image f .

Solution 5.18: By definition,

$$\ker f = \left\{\begin{bmatrix} x \\ y \end{bmatrix} : \begin{bmatrix} 1 & 2 \\ 3 & 6 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}\right\}.$$

The answer is

$$\ker f = \left\{\begin{bmatrix} -2t \\ t \end{bmatrix} : t \in \mathbb{R}\right\}.$$

For the second part,

$$\operatorname{Image} f = \left\{\begin{bmatrix} 1 & 2 \\ 3 & 6 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} : x, y \in \mathbb{R}\right\} = \left\{\begin{bmatrix} x + 2y \\ 3(x + 2y) \end{bmatrix} : x, y \in \mathbb{R}\right\} = \left\{\begin{bmatrix} t \\ 3t \end{bmatrix} : t \in \mathbb{R}\right\}.$$
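As an aside, kernels and images of matrix transformations can also be found numerically. One standard tool (our choice here, not a method introduced in the notes) is the singular value decomposition: the rows of Vt beyond the rank span ker f , and the leading columns of U span Image f .

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [3.0, 6.0]])

    U, s, Vt = np.linalg.svd(A)
    rank = int(np.sum(s > 1e-12))

    kernel_basis = Vt[rank:]     # spans ker f; proportional to (-2, 1) up to sign
    image_basis = U[:, :rank]    # spans Image f; proportional to (1, 3) up to sign
    print(rank)                  # -> 1
    print(kernel_basis, image_basis)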

(easy) 5.19 Let

V = R<3 [X] = {p ∈ R[X] ∶ deg p < 3},

and let W = M2×2 (R). Consider the linear transformation f ∈ HomR (V, W ),

$$f(a + bX + cX^2) = \begin{bmatrix} a+b & 0 \\ b+c & c-a \end{bmatrix}.$$

Find ker f and Image f .

Solution 5.19: We have

ker f = Span{1 − X + X 2 },

and

$$\operatorname{Image} f = \left\{\begin{bmatrix} a+b & 0 \\ b+c & c-a \end{bmatrix} : a, b, c \in \mathbb{R}\right\} = \left\{\begin{bmatrix} x & 0 \\ y & y-x \end{bmatrix} : x, y \in \mathbb{R}\right\}.$$

5.6 Linear transformations and subspaces


The content of the previous section is in fact a particular case of the more
general interaction between linear transformations and subspaces. Since lin-
ear transformations “communicate” with the linear structure of vector spaces
and so do linear subspaces, it turns out that linear transformations map sub-
spaces of V into subspaces of W , and conversely, the set of vectors whose
image under f lie in a subspace of W constitute a subspace of V .
We give here two useful definitions pertinent to functions between any pair
of sets:

Definition 5.13 Let f ∈ Func(D, C) be a function with domain (תחום) D


and codomain (טווח) C (note that D and C need not have any algebraic
structure). Let S ⊆ D. The image (תמונה) of S under f is the subset of
C,
f (S) = {f (x) ∶ x ∈ S}.
That is, y ∈ f (S) if and only if there exists an x ∈ S such that y = f (x).

In the particular case where S = D, f (D) is the image of f ,

f (D) = Image f.

Definition 5.14 Let f ∈ Func(D, C). Let T ⊆ C. The pre-image (תמונה


הפוכה) of T under f is the subset of D,

f −1 (T ) = {x ∈ D ∶ f (x) ∈ T }.

That is, x ∈ f −1 (T ) if and only if f (x) ∈ T .

It is important to emphasize that the pre-image is always well-defined, re-


gardless of whether f is invertible! For invertible functions, the pre-image of
every singleton in C is a singleton in D. In the particular case where T = C,

f −1 (C) = {x ∈ D ∶ f (x) ∈ C} = D.

Indeed, by definition, every element in D is mapped by f into an element in


C.
Thus far, we dealt with general functions between sets. Next, these sets will
be vector spaces, or linear subspaces (which by definitions are vector spaces
on their own).

Proposition 5.15 Let V and W be vector spaces over a field F. Let f ∈


HomF (V, W ). For every U ≤ V ,

f (U ) ≤ W,

and for every Z ≤ W ,


f −1 (Z) ≤ V.

Comment: For U = V this proposition asserts that the image of f ,

Image f = f (V )

is a linear subspace of W ; for Z = {0W }, this proposition asserts that ker f ,

ker f = f −1 ({0W })

is a linear subspace of V .

Proof : The proof follows the same lines as the proof of Proposition 5.11. The
set f (U ) is not empty because 0V ∈ U , hence 0W ∈ f (U ). Let w1 , w2 ∈ f (U )
and let a ∈ F. By definition, there exist u1 , u2 ∈ U such that

w1 = f (u1 ) and w2 = f (u2 ).

By the linearity of f ,

f (u1 + u2 ) = f (u1 ) + f (u2 ) = w1 + w2 ,

and since u1 + u2 ∈ U , it follows that w1 + w2 ∈ f (U ). Similarly,

f (a u1 ) = a f (u1 ) = a w1 ,

and since a u1 ∈ U , it follows that a w1 ∈ f (U ), thus proving that f (U ) is a


linear subspace of W .
Conversely, the set f −1 (Z) is not empty, because 0W ∈ Z hence 0V ∈ f −1 (Z).
Let u, v ∈ f −1 (Z) and a ∈ F. By definition,

f (u) ∈ Z and f (v) ∈ Z.

Since Z is a linear subspace of W ,

f (u + v) = f (u) + f (v) ∈ Z and f (a v) = a f (v) ∈ Z,

i.e.,
u + v ∈ f −1 (Z) and a v ∈ f −1 (Z),
proving that f −1 (Z) is a linear subspace of V . ∎

Exercises

(easy) 5.20 Let f ∶ R3 → R3 be the linear transformation given by

f (x, y, z) = (x + 2y, y − z, x + 2z).

Let

U = Span{(1, 1, 1)} and W = Span{(1, 0, 1), (0, 1, 0)}.

Find (a) ker f , (b) Image f , (c) f (U ), (d) f (W ), (e) f −1 (U ), (f) f −1 (W ).



Solution 5.20:
(a) ker f = Span{(−2, 1, 1)}.
(b) Image f = {(t + 2s, s, t) ∶ s, t ∈ R}.
(c) f (U ) = Span{(1, 0, 1)}.
(d) f (W ) = Span{(1, −1, 3), (2, 1, 0)}.
(e) We need to find the solutions to f (x, y, z) = (c, c, c), i.e.,

x + 2y = c, y − z = c and x + 2z = c.

This system is only solvable for c = 0, i.e., f −1 (U ) = ker f (note that it is always the
case that ker f ≤ f −1 (U )).
(f) We need to find the solutions to f (x, y, z) = (a, b, a), i.e.,

x + 2y = a, y − z = b and x + 2z = a.

This implies y = z and b = 0. It follows that

f −1 (W ) = {(x, y, y) ∶ x, y ∈ R}.

5.7 Nullity and Rank


The kernel and the image of a linear transformation are defined for transfor-
mations between any pair of vector spaces. We now examine the case where
V is finitely-generated. First a lemma:

Lemma 5.16 Let V and W be vector spaces over a field F, with V finitely-
generated. Let f ∈ HomF (V, W ). Then, both ker f and Image f are finitely-
generated.

Proof : Since ker f ≤ V and V is finitely-generated, then ker f is also finitely-


generated. More surprising perhaps is the fact that Image f is finitely-
generated, even though W may not be. Let

B = (v1 . . . vn )

be a generating set for V . We will show that

C = (f (v1 ) . . . f (vn ))

is a generating set for Image f , hence Image f is of dimension at most n.


Let w ∈ Image f . By definition, there exists a v ∈ V , such that f (v) = w.
Since B is a generating set for V , there exist n scalars a1 , . . . , an ∈ F, such
that
v = a1 v 1 + ⋅ ⋅ ⋅ + an v n .
By the linearity of f ,

w = f (v) = a1 f (v1 ) + ⋅ ⋅ ⋅ + an f (vn ),

proving that w ∈ Span C, i.e.,

Image f ≤ Span C. Since each f (vi ) belongs to Image f , which is a linear subspace, also Span C ≤ Image f , hence C is indeed a generating set for Image f . ∎

Definition 5.17 Let V and W be vector spaces over a field F, with V


finitely-generated. Let f ∈ HomF (V, W ). The nullity (אפסות) of f is

ν(f ) = dimF ker f.

The rank (דרגה) of f is

ϱ(f ) = dimF Image f.

Intuitively, the larger the nullity of a linear transformation, the more vectors
in V are mapped to the zero vector in W . The larger the rank of f , the
more vectors in W are obtained by applying f on vectors in V .

Example: Let f ∈ HomF (V, W ) be the zero transformation. Then

ν(f ) = dimF V and ϱ(f ) = 0.

▲▲▲

Theorem 5.18 (Rank-nullity theorem (משפט הממדים)) Let V and W


be vector spaces over a field F, with V finitely-generated. Let f ∈
HomF (V, W ). Then,
ν(f ) + ϱ(f ) = dimF V.
(In other words, there is a “tradeoff” between “how many” vectors in V are
mapped to the zero vector and “how many” vectors in W can be obtained as
the output of f .)

Proof : The idea of the proof is quite similar in essence to the proof of the
theorems relating the annihilators of subspaces for linear forms. Denote by
n the dimension of V . Let
(u1 . . . uk )
be a basis for ker f (which is of dimension at most n) and let

B = (u1 , . . . , uk , v1 , . . . , vn−k )

be its completion to a basis for V (recall that a basis for any subspace can be
completed into a basis for the entire space). Since, by definition, ν(f ) = k,
it remains to prove that ϱ(f ) = n − k. Consider the set

C = (f (v1 ) . . . f (vn−k )) .

If we show that C is a basis for Image f , then we are done.


Let w ∈ Image f . By definition there exists a v ∈ V such that w = f (v).
Since B is a basis for V ,

v = a1 u1 + ⋅ ⋅ ⋅ + ak uk + b1 v1 + ⋅ ⋅ ⋅ + bn−k vn−k

for some scalars a1 , . . . , ak , b1 , . . . , bn−k ∈ F. Applying f on both sides, using


its linearity,

w = f (v) = a1 f (u1 ) + ⋅ ⋅ ⋅ + ak f (uk ) + b1 f (v1 ) + ⋅ ⋅ ⋅ + bn−k f (vn−k ).

However, ui ∈ ker f , namely, f (ui ) = 0, from which we obtain that

w = b1 f (v1 ) + ⋅ ⋅ ⋅ + bn−k f (vn−k ),



i.e., w ∈ Span C, hence the latter is a generating set for Image f .


It remains to show that the sequence C is independent. Suppose that

b1 f (v1 ) + ⋅ ⋅ ⋅ + bn−k f (vn−k ) = 0W

for some scalars b1 , . . . , bn−k ∈ F. We need to show that bi = 0 for all i =


1, . . . , n − k.
Using the linearity of f in the “reverse direction”,

f (b1 v1 + ⋅ ⋅ ⋅ + bn−k vn−k ) = 0W .

This implies that


b1 v1 + ⋅ ⋅ ⋅ + bn−k vn−k ∈ ker f.
Since the ui ’s form a basis for ker f , there exist scalars a1 , . . . , ak ∈ F, such
that
b1 v1 + ⋅ ⋅ ⋅ + bn−k vn−k = a1 u1 + ⋅ ⋅ ⋅ + ak uk .
However, B is a basis for V (i.e., the set comprising both ui ’s and vi ’s is
linearly-independent). It follows that ai = 0 and bj = 0 for all i and j, proving
that the sequence C is linearly-independent. This completes the proof. ∎

Comment: An implication of the rank-nullity theorem is that the dimension


of the image of a linear transformation cannot exceed the dimension of its
domain. For example, if V = R and W = R27 , then the image of a linear
transformation f ∈ HomF (V, W ) is at most one-dimensional. In a sense, a
linear transformation cannot “create a space from nothing”.

Example: Let A ∈ Mm×n (F) and consider the linear map f ∈ HomF (Fncol , Fmcol )
given by
f (v) = Av.
In this case, the image of f is the column space of A,

Image f = C (A),

whereas the kernel of f is the set of zeros of the rows of A, viewed as linear
forms, i.e.,
ker f = (R(A))0 .

Then, by the rank-nullity theorem,

dimF C (A) + dimF (R(A))0 = dimF Fncol = n.

Recall that dimF C (A) is the column-rank of A, whereas n − dimF (R(A))0 is


the row-rank of A (the number of non-zero rows in the reduced form). Thus,
we have discovered once again that the row-rank of a matrix equals to its
column-rank. ▲▲▲
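The rank-nullity theorem is easy to verify numerically for a concrete matrix. A sketch (an aside; the random integer matrix is an arbitrary choice), using the fact that the rows of Vt annihilated by A form an orthonormal basis of ker f :

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.integers(-3, 4, size=(4, 6)).astype(float)
    n = A.shape[1]

    rank = np.linalg.matrix_rank(A)                     # dim Image f
    _, _, Vt = np.linalg.svd(A)
    nullity = sum(np.allclose(A @ v, 0.0) for v in Vt)  # dim ker f

    assert rank + nullity == n                          # rank-nullity theorem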

Exercises

(intermediate) 5.21 Let V be a vector space and let

B = {v1 , v2 , v3 , v4 }

be a basis for V . Let f ∈ HomF (V, V ) such that {v1 , v2 } is a basis for ker f .
Show that the set {f (v3 ), f (v4 )} is linearly-independent.

Solution 5.21: Suppose that


a f (v3 ) + b f (v4 ) = 0W .

This implies that


f (a v3 + b v4 ) = 0W ,
i.e., that a v3 + b v4 ∈ ker f . However, since {v1 , v2 } is a basis for ker f , there exist c, d ∈ F
such that
a v3 + b v4 = c v1 + d v2 ,
i.e.,
c v1 + d v2 − a v3 − b v4 = 0V .
Since {v1 , v2 , v3 , v4 } is a basis for V it follows that a = b = c = d = 0, proving that
{f (v3 ), f (v4 )} is linearly-independent.

(harder) 5.22 Find a linear transformation f ∶ R<4 [X] → M2×3 (R), such
that
ker f = Span{X 3 − 2X + 1, X 3 + X 2 − X + 3}
and
$$\operatorname{Span}\left\{\begin{bmatrix} -1 & 2 & 1 \\ 3 & -1 & 0 \end{bmatrix}\right\} \subseteq \operatorname{Image} f.$$

Solution 5.22: We are looking for a linear transformation

$$x_1 + x_2 X + x_3 X^2 + x_4 X^3 \mapsto \begin{bmatrix} * & * & * \\ * & * & * \end{bmatrix},$$

such that each entry of the matrix is a linear combination of x1 , x2 , x3 , x4 . Each entry of
the image of f is a linear form in x1 , . . . , x4 . All must vanish at (x1 , x2 , x3 , x4 ) = (1, −2, 0, 1) and
(x1 , x2 , x3 , x4 ) = (3, −1, 1, 1), i.e., each entry is of the form

aij (x1 , x2 , x3 , x4 ) = s x1 + t x2 − (2s + t) x3 + (2t − s) x4 .

For the kernel to be exactly the given span (and not larger), the entries must include two independent such forms. It remains to decide, for example, that

$$f(1) = \begin{bmatrix} -1 & 2 & 1 \\ 3 & -1 & 0 \end{bmatrix},$$

which fixes s in each entry, and to set t = 0 in every entry except the (2, 3) entry, where we take s = 0 and t = 1. This yields

$$f(x_1 + x_2 X + x_3 X^2 + x_4 X^3) = \begin{bmatrix} -x_1 + 2x_3 + x_4 & 2x_1 - 4x_3 - 2x_4 & x_1 - 2x_3 - x_4 \\ 3x_1 - 6x_3 - 3x_4 & -x_1 + 2x_3 + x_4 & x_2 - x_3 + 2x_4 \end{bmatrix},$$

whose kernel is the common zero set of x1 − 2x3 − x4 and x2 − x3 + 2x4 , which is exactly the required span.

(intermediate) 5.23 Consider a linear transformation g ∈ HomR (R4 , R3 )


satisfying

g(1, 3, −1, 0) = (1, 0, −4) and g(2, 1, 2, 1) = (2, 0, −8).

(a) Can g be one-to-one? Find an example or argue why not.


(b) Can g be onto? Find an example or argue why not.

Solution 5.23: (a) No.


g(2, 6, −2, 0) = 2 g(1, 3, −1, 0) = (2, 0, −8) = g(2, 1, 2, 1).

This implies that (0, 5, −4, −1) ∈ ker g. (b) Yes: there exists such a g with ker g = Span{(0, 5, −4, −1)}, and then dimR ker g = 1, so that
dimR Image g = 3, i.e., g is onto.

(intermediate) 5.24 Let f ∈ HomR (R4 , R3 ). Let v1 , v2 ∈ R4 be indepen-


dent vectors satisfying f (v1 ) = f (v2 ) = 0R3 . Show that f is not onto.

Solution 5.24: Since v1 , v2 ∈ ker f are independent, it follows that dimF ker f ≥ 2, hence
dimF Image f ≤ 4 − 2 < 3, from which follows that f is not onto.

(intermediate) 5.25 Which of the following assertions is true? Provide an


example or disprove:

(a) There exists a linear transformation f ∈ HomR (R2 , R2 ) satisfying ker f =


Image f .
(b) There exists a linear transformation f ∈ HomR (R3 , R3 ) satisfying ker f =
Image f .

Solution 5.25: (a) Yes. For example, take


f (x, y) = (y, 0).

Then,
ker f = Image f = Span{(1, 0)}.
(b) No: ker f = Image f would force dimF ker f + dimF Image f = 2 dimF ker f to be even, whereas it equals dimR R3 = 3.

(intermediate) 5.26 Let V and W be vector spaces over F. Let f ∈


HomF (V, W ) and let
(v1 , . . . , vn )
be a sequence of vectors in V .

(a) Suppose that f is one-to-one. Show that (v1 , . . . , vn ) are linearly-


independent if and only if (f (v1 ), . . . , f (vn )) are linearly-independent.
(b) Suppose that f is onto. Show that if (v1 , . . . , vn ) is a generating set
for V , then (f (v1 ), . . . , f (vn )) is a generating set for W .
(c) Show that it is not generally true that if (f (v1 ), . . . , f (vn )) is a gener-
ating set for W , then (v1 , . . . , vn ) is a generating set for V .
(d) Suppose that f is one-to-one and onto. Show that (v1 , . . . , vn ) is a
basis for V if and only if (f (v1 ), . . . , f (vn )) is a basis for W .

Solution 5.26: (a) Let f be one-to-one and let (v1 , . . . , vn ) be linearly-independent.


Then, if
a1 f (v1 ) + ⋯ + an f (vn ) = 0W ,
it follows that f (a1 v1 + ⋅ ⋅ ⋅ + an vn ) = 0W , i.e., a1 v1 + ⋅ ⋅ ⋅ + an vn ∈ ker f . Since ker f = {0V } it
follows that a1 v1 + ⋅ ⋅ ⋅ + an vn = 0V , from which we deduce that all the ai are zero, proving
that (f (v1 ), . . . , f (vn )) are linearly-independent. Conversely, if (f (v1 ), . . . , f (vn )) are
linearly-independent and a1 v1 + ⋅ ⋅ ⋅ + an vn = 0V , then

a1 f (v1 ) + ⋯ + an f (vn ) = 0W ,

hence all the ai are zero, proving that the (v1 , . . . , vn ) are linearly-independent.
(b) Let f be onto and let (v1 , . . . , vn ) be a generating set. Let w ∈ W . Since f is onto,
there exists a v such that w = f (v). Since we can write v as

v = a1 v1 + ⋅ ⋅ ⋅ + an vn ,

it follows that
w = a1 f (v1 ) + ⋯ + an f (vn ),
proving that (f (v1 ), . . . , f (vn )) is a generating set.
(c) Let V = R2 , W = R and f (x, y) = x. Then, {f (1, 0)} = {1} is a generating set for W ,
but {(1, 0)} is not a generating set for V .
(d) If f is one-to-one and onto, then by the rank-nullity theorem dimF V = dimF W . If (v1 , . . . , vn )
is a basis for V then it is in particular linearly-independent, hence (f (v1 ), . . . , f (vn )) is
linearly-independent, hence a basis for W . By the same argument, if (f (v1 ), . . . , f (vn ))
is a basis for W , then (f −1 (f (v1 )), . . . , f −1 (f (vn ))) is linearly-independent, hence a basis
for V .

5.8 Composition of linear transformations


Let U, V, W be vector spaces over a field F. For f ∈ HomF (U, V ) and
g ∈ HomF (V, W ), we can compose (להרכיב) the two functions, yielding
a function
g ○ f ∶ U → W,
given for all u ∈ U by
(g ○ f )(u) = g(f (u)).
Note that the composition of functions is a notion pertinent to sets; there is
nothing “linear” about it. The following proposition asserts that the compo-
sition of linear transformations is a linear transformation:

Proposition 5.19 Let U, V, W be vector spaces over a field F. If f ∈


HomF (U, V ) and g ∈ HomF (V, W ), then g ○ f ∈ HomF (U, W ).

Proof : For every u1 , u2 ∈ U , since f and g are both linear transformations,


(g ○ f )(u1 + u2 ) = g(f (u1 + u2 ))
= g(f (u1 ) + f (u2 ))
= g(f (u1 )) + g(f (u2 ))
= (g ○ f )(u1 ) + (g ○ f )(u2 ),
and for every u ∈ U and a ∈ F,
(g ○ f )(a u) = g(f (a u)) = g(a f (u)) = a g(f (u)) = a (g ○ f )(u),
proving that g ○ f is indeed a linear transformation. ∎

Example: Consider the case of U = Fncol , V = Fmcol and W = Fkcol . Let
A ∈ Mm×n (F) and B ∈ Mk×m (F). Define the linear transformations fA ∈
HomF (U, V ) and fB ∈ HomF (V, W ) by
fA (u) = Au and fB (v) = Bv.
Then, fB ○ fA ∈ HomF (U, W ) is given by
(fB ○ fA )(u) = fB (fA (u)) = B(fA (u)) = B(Au) = (BA)u,
where in the last step we used the associativity of matrix multiplication.
Thus, fB ○ fA = fBA , showing that matrix multiplication is in fact a compo-
sition of linear transformations. ▲▲▲
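A one-line numerical confirmation that composition corresponds to matrix multiplication (an aside, with arbitrary matrices):

    import numpy as np

    A = np.array([[1.0, 2.0], [0.0, 1.0], [3.0, 0.0]])   # A in M_{3x2}(R)
    B = np.array([[1.0, -1.0, 2.0], [0.0, 4.0, 1.0]])    # B in M_{2x3}(R)
    u = np.array([2.0, -1.0])

    # (f_B ∘ f_A)(u) = B(Au) equals f_{BA}(u) = (BA)u
    assert np.allclose(B @ (A @ u), (B @ A) @ u)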
The following properties of composition are easy to verify:

Proposition 5.20 Let f, f1 , f2 ∈ HomF (U, V ) and g, g1 , g2 ∈ HomF (V, W ).


Then,
(g1 + g2 ) ○ f = g1 ○ f + g2 ○ f,
and
g ○ (f1 + f2 ) = g ○ f1 + g ○ f2 .

We next relate the composition of linear transformations to the notions of


kernel and image:

Proposition 5.21 Let U, V, W be vector spaces over the same field F. Let
f ∈ HomF (U, V ) and g ∈ HomF (V, W ). Then,

ker f ≤ ker(g ○ f ),

and
Image(g ○ f ) ≤ Image g.

In other words, if f maps a vector to zero then the further application of


g cannot yield a non-zero vector. Also, g ○ f cannot return a vector that g
cannot return.

Proof : Let u ∈ ker f , i.e., f (u) = 0V , then

(g ○ f )(u) = g(f (u)) = g(0V ) = 0W ,

which means that u ∈ ker g ○ f , i.e., ker f ⊆ ker g ○ f .


Let w ∈ Image(g ○ f ). By definition, there exists a u ∈ U such that

w = (g ○ f )(u),

but this means that


w = g(f (u)),
proving that w ∈ Image g, i.e., Image(g ○ f ) ⊆ Image g. ∎

Exercises

(easy) 5.27 Prove Proposition 5.20.

Solution 5.27: For example,


((g1 + g2 ) ○ f )(v) = (g1 + g2 )(f (v)) = g1 (f (v)) + g2 (f (v)) = (g1 ○ f + g2 ○ f )(v).

(easy) 5.28 Let f, g, h ∈ HomR (R3 , R3 ) be defined by


f (x, y, z) = (x, y, −z) g(x, y, z) = (y + z, x, x + z)
h(x, y, z) = (x + 2y, 2x + y, 0).
Write explicitly the linear transformations

2f − g f + 2h f ○g g○f h ○ f + 2g.

Solution 5.28: For example,


(h ○ f + 2g)(x, y, z) = h(f (x, y, z)) + 2g(x, y, z)
= h(x, y, −z) + 2(y + z, x, x + z)
= (x + 2y, 2x + y, 0) + 2(y + z, x, x + z)
= (x + 4y + 2z, 4x + y, 2x + 2z).

(easy) 5.29 Let f ∈ HomF (V, V ). Show that

ker f ≤ ker(f ○ f ) and Image(f ○ f ) ≤ Image f.

Solution 5.29: If v ∈ ker f then f (v) = 0V , hence f (f (v)) = 0V , i.e., v ∈ ker f ○ f . If


w ∈ Image f ○ f , then there exists a v ∈ V such that w = f (f (v)) i.e., w ∈ Image f .

(intermediate) 5.30 Let f ∈ HomF (V, V ). Show that

ker f = ker(f ○ f )

if and only if ker f ∩ Image f = {0V }.

Solution 5.30: In the previous exercise we saw that it is always the case that ker f ≤
ker(f ○f ). Thus, we need to show that ker(f ○f ) ≤ ker f if and only if ker f ∩Image f = {0V }.
Suppose that ker(f ○ f ) = ker f . Let v ∈ ker f ∩ Image f . Then, there exists a w such that
v = f (w) and
f (v) = f (f (w)) = 0V .
Thus w ∈ ker(f ○ f ) = ker f , hence v = f (w) = 0V , i.e., ker f ∩ Image f = {0V }.
Conversely, suppose that ker f ∩ Image f = {0V } and let f (f (v)) = 0V . Since f (v) ∈
ker f ∩ Image f , it follows that f (v) = 0V , i.e., ker(f ○ f ) ≤ ker f .

(intermediate) 5.31 Let V be a finitely-generated vector space over R. Let


f ∈ HomR (V, V ) satisfy
f ○ f = 2f.
Show that ker f ∩ Image f = {0V } and that
V = Image f ⊕ ker f.

Solution 5.31: Let w ∈ ker f ∩ Image f , then w = f (v) for some v ∈ V and
w = f (v) = (1/2) f (f (v)) = (1/2) f (w) = 0V ,
proving that ker f ∩ Image f = {0V }. It remains to prove that V = Image f ⊕ ker f , but
dimF (ker f + Image f ) = dimF ker f + dimF Image f − dimF (ker f ∩ Image f ) = dimF V − 0 = dimF V,

hence ker f + Image f = V .

(intermediate) 5.32 Let V be a finitely-generated vector space over R. Let


f ∈ HomR (V, V ). Show that
ker f = ker(f ○ f ) implies Image f = Image(f ○ f ).

Solution 5.32: If ker f = ker(f ○ f ), then


dimF Image f = dimF V − dimF ker f = dimF V − dimF ker(f ○ f ) = dimF Image(f ○ f ).
Since Image(f ○f ) ≤ Image f both must be equal. The other direction is proved similarly.

(intermediate) 5.33 Find vector spaces U, V, W and linear transformations


f ∈ HomF (U, V ) and g ∈ HomF (V, W ), such that
ker f < ker g ○ f,
and
Image g ○ f < Image g.

Solution 5.33: Take U = R2 , V = R2 and W = R. Take,


f (y, z) = (y, 0) and g(y, z) = z.
Then, ker f = Span{(0, 1)}, ker(g ○ f ) = R2 , Image g = R and Image(g ○ f ) = {0}.

5.9 Rotations of the plane

In this section we introduce yet another family of linear transformations—


this time transformations from the plane to itself, R2 → R2 . Recall that a
point (x, y) ∈ R2 can be represented by its distance from the origin, r, and
the angle α formed between the arrow pointing to it from the origin and the
x axis:

[Figure: a point at distance r from the origin, at angle α from the x-axis, with coordinates (r cos α, r sin α).]

We define a family of linear transformations,

Rotθ ∶ R2 → R2 ,

where θ ∈ R, by

Rotθ (r cos α, r sin α) = (r cos(α + θ), r sin(α + θ)).

That is, this transformation rotates vectors about the origin by an angle θ.

[Figure: the point (r cos α, r sin α) and its image (r cos(α + θ), r sin(α + θ)) under rotation by θ.]

On the face of it, this transformation doesn’t seem linear; the trigonometric
functions are nonlinear. However, using the trigonometric identities,
cos(α + θ) = cos α cos θ − sin α sin θ
sin(α + θ) = sin α cos θ + cos α sin θ,
setting x = r cos α and y = r sin α, we find that
Rotθ (x, y) = (cos θ x − sin θ y, sin θ x + cos θ y),
that is Rotθ ∈ HomR (R2 , R2 ).
If we rather write the components of vectors relative to the standard basis,
$$\mathrm{Rot}_\theta\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \begin{bmatrix} \cos\theta\, x - \sin\theta\, y \\ \sin\theta\, x + \cos\theta\, y \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}.$$

Thus, a rotation in the plane by an angle θ is represented (in standard coor-


dinates) by multiplication by a rotation matrix (מטריצת סיבוב)

$$R_\theta = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}.$$

What happens if we compose two rotations? What happens if we rotate a


vector by an angle θ and then rotate the result by an additional angle ϕ?
Clearly, we expect
Rotϕ ○ Rotθ = Rotϕ+θ .

A straightforward calculation shows that indeed

$$\begin{bmatrix} \cos\varphi & -\sin\varphi \\ \sin\varphi & \cos\varphi \end{bmatrix}\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} = \begin{bmatrix} \cos(\varphi+\theta) & -\sin(\varphi+\theta) \\ \sin(\varphi+\theta) & \cos(\varphi+\theta) \end{bmatrix},$$

i.e.,
Rϕ Rθ = Rϕ+θ .

Note that
R2π = R0 = I,
and
Rθ R−θ = I.
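These identities are easily confirmed numerically; a short Python sketch (an aside; the angles are arbitrary):

    import numpy as np

    def R(theta):
        # the rotation matrix R_theta
        return np.array([[np.cos(theta), -np.sin(theta)],
                         [np.sin(theta),  np.cos(theta)]])

    phi, theta = 0.7, 1.9
    assert np.allclose(R(phi) @ R(theta), R(phi + theta))   # R_phi R_theta = R_{phi+theta}
    assert np.allclose(R(2 * np.pi), np.eye(2))             # R_{2 pi} = I
    assert np.allclose(R(theta) @ R(-theta), np.eye(2))     # R_theta R_{-theta} = I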

5.10 The dimension of HomF(V, W )


Since the set of linear transformations HomF (V, W ) is a vector space in its
own right, a number of questions arise right away: under what conditions is
it finitely-generated? What would be a natural basis for it?
The following lemma is the key to answering these questions:

Lemma 5.22 Let V and W be finitely-generated vector spaces. Let

B = (v1 . . . vn ) and C = (w1 . . . wm )

be ordered bases for V and W . Then, there exists for every i = 1, . . . , n and
j = 1, . . . , m a unique linear transformation fji ∈ HomF (V, W ), such that

$$f_j^i(v_k) = \begin{cases} w_j & k = i \\ 0_W & k \neq i \end{cases} \tag{5.1}$$

Proof : This is an immediate consequence of Proposition 5.3 and Propo-


sition 5.4, whereby a linear transformation is uniquely determined by its
action on basis vectors. It is worthwhile, though, to examine this in more detail.

Let f ∈ HomF (V, W ) and consider the vector f (v1 ): it has a unique represen-
tation as a linear combination of the basis vectors wi , which we may write
as
$$f(v_1) = (w_1 \ \dots \ w_m)\begin{bmatrix} a_1^1 \\ \vdots \\ a_m^1 \end{bmatrix}.$$
Repeating this for each of the n vectors f (vj ), we obtain that f is uniquely
determined by an m × n matrix
$$(f(v_1) \ \dots \ f(v_n)) = (w_1 \ \dots \ w_m)\begin{bmatrix} a_1^1 & \cdots & a_1^n \\ \vdots & & \vdots \\ a_m^1 & \cdots & a_m^n \end{bmatrix}.$$
The function fji corresponds to the matrix A which is zero everywhere, except
for the element in the i-th column and j-th row, which is equal to one. ∎

Example: Let n = 3 and m = 5, then, for example,

f42 (v1 ) = 0W f42 (v2 ) = w4 and f42 (v3 ) = 0W ,

namely,
$$(f_4^2(v_1) \ \ f_4^2(v_2) \ \ f_4^2(v_3)) = (w_1 \ w_2 \ w_3 \ w_4 \ w_5)\begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}.$$
▲▲▲
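In the standard-basis picture the fji are represented by the “matrix units”; a small sketch (an aside) constructing them:

    import numpy as np

    n, m = 3, 5

    def E(j, i):
        # matrix of f_j^i: zero everywhere except a single 1 in row j, column i
        M = np.zeros((m, n))
        M[j, i] = 1.0
        return M

    # With 0-based indices, E(3, 1) plays the role of f_4^2 above:
    e2 = np.array([0.0, 1.0, 0.0])   # the second standard basis vector of R^3
    print(E(3, 1) @ e2)              # -> w_4, i.e., [0. 0. 0. 1. 0.]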

Theorem 5.23 Let V and W be finitely-generated vector spaces. Let

B = (v1 . . . vn ) and C = (w1 . . . wm )

be ordered bases for V and W . Then, the linear transformations fji defined
by (5.1) are a basis for HomF (V, W ). In particular,

dimF HomF (V, W ) = dimF V × dimF W.



Corollary 5.24 In the particular case where W = F, Theorem 5.23 asserts


that
dimF HomF (V, F) = dimF V × dimF F = dimF V,
where HomF (V, F) = V ∨ and dimF F = 1,

which we already know.

Proof : We need to show that the set of linear transformations

{fji ∶ i = 1, . . . , n, j = 1, . . . , m}

generates HomF (V, W ) and is independent.


Let f ∈ HomF (V, W ). We want to show that it can be represented as
$$f = \sum_{i=1}^n \sum_{j=1}^m a_i^j f_j^i,$$

where the coefficients $a_i^j \in F$ are to be determined. As we know, a linear transformation is


uniquely determined by its action on basis vectors: substituting vk , k =
1, . . . , n on both sides,
$$f(v_k) = \sum_{i=1}^n \sum_{j=1}^m a_i^j f_j^i(v_k) = \sum_{i=1}^n \sum_{j=1}^m a_i^j \delta_k^i\, w_j = \sum_{j=1}^m a_k^j w_j.$$

For every fixed k = 1, . . . , n, the coefficients $a_k^j$ are the coordinates of f (vk ) ∈
W relative to the basis C,
$$a_k^j = ([f(v_k)]_C)_j.$$
In other words, every f ∈ HomF (V, W ) can be represented as
$$f = \sum_{i=1}^n \sum_{j=1}^m ([f(v_i)]_C)_j\, f_j^i,$$

thus proving that the linear transformations fji form a generating set for HomF (V, W ).
It remains to show that they are also independent. Let $a_i^j$ be scalars and
suppose that
$$\sum_{i=1}^n \sum_{j=1}^m a_i^j f_j^i = 0_{\mathrm{Hom}_F(V,W)}.$$

Substituting vk on both sides,


$$\sum_{i=1}^n \sum_{j=1}^m a_i^j f_j^i(v_k) = \sum_{j=1}^m a_k^j w_j = 0_W.$$

Since C is a basis for W , it follows that $a_k^j = 0_F$ for all j = 1, . . . , m (and for
all k = 1, . . . , n), which completes the proof. ∎

5.11 Isomorphisms
The notion of isomorphism (איזומורפיזם) is fundamental in mathematics:
loosely speaking, two sets are said to be isomorphic if they are “the same”
up to a renaming of their elements. The most basic notion of isomorphism
is between plain sets: two sets S and T are isomorphic if there exists a func-
tion f ∶ S → T that is one-to-one (חד חד ערכית) and onto (על); then,
the function f induces a relation where every element in S can be identified
with an element in T and vice-versa, so that we could say that f (x) ∈ T is
a “renaming” of x ∈ S. We then say that f is an isomorphism and that S
and T are isomorphic (איזומורפיים). An alternative way of stating that two
sets are isomorphic is that there exist two functions f ∶ S → T and g ∶ T → S,
such that
g(f (x)) = x for every x ∈ S,
and
f (g(y)) = y for every y ∈ T .
In other words, g ○ f = IdS and f ○ g = IdT .
Vector spaces are not just plain sets; they are endowed with a linear structure,
so for two vector spaces to be considered isomorphic, we require more than
being equivalent as sets. The function identifying an element in one space
with an element in the other space has to “respect” linear operations. This
leads us to the following definition:

Definition 5.25 Let V and W be vector spaces over a field F. The spaces
are called isomorphic if there exists a linear transformation f ∈ HomF (V, W )
and a linear transformation g ∈ HomF (W, V ) such that g ○ f = IdV and f ○ g =
IdW . The function f is called an isomorphism from V to W and g is called
an isomorphism from W to V .

Note that f and g are both necessarily one-to-one and onto. Take f for example:
for every w ∈ W ,
w = f (g(w)),
showing that f is onto. Likewise, if

f (v1 ) = f (v2 ),

then
v1 = g(f (v1 )) = g(f (v2 )) = v2 ,
showing that f is one-to-one.
The next proposition provides a sufficient condition for two vector spaces to
be isomorphic.

Proposition 5.26 Let V and W be vector spaces over a field F. Let f ∈


HomF (V, W ) be one-to-one and onto. Then, f is an isomorphism from V to
W (implying that V and W are isomorphic).

Comment: This proposition states that if a linear transformation is invert-


ible, then its inverse is necessarily also a linear transformation.

Proof : Since f is one-to-one and onto, it has an inverse, which we denote by


g. It remains to prove that g is a linear transformation. Let w1 , w2 ∈ W . By
definition, there exist unique v1 , v2 ∈ V , such that

w1 = f (v1 ) and w2 = f (v2 ).

Reciprocally,
v1 = g(w1 ) and v2 = g(w2 ).
By the linearity of f ,

w1 + w2 = f (v1 ) + f (v2 ) = f (v1 + v2 ),

and reciprocally,

g(w1 + w2 ) = v1 + v2 = g(w1 ) + g(w2 ),

thus proving the first condition of linearity for g.



Likewise, let w ∈ W and a ∈ F. There exists a unique v ∈ V , such that

w = f (v) and v = g(w).

By the linearity of f ,
a w = a f (v) = f (a v),
and reciprocally,
g(a w) = a v = a g(w),
thus proving the second condition of linearity for g. ∎

Proposition 5.27 Isomorphism of vector spaces is an equivalence


relation.

Proof : Recall that an equivalence relation has three criteria: reflexivity,


symmetry and transitivity. Every vector space is isomorphic to itself. Why?
Take the identity f ∶ V → V , defined by

f (v) = v for all v ∈ V .

It is invertible (its inverse being also the identity) and linear, proving that V is
isomorphic to itself. Next, if V is isomorphic to W then W is isomorphic to V ,
because an isomorphism is symmetric by construction. There remains transitivity:
suppose that U and V are isomorphic and that V and W are isomorphic. By
definition, there exist
f ∈ HomF (U, V ) g ∈ HomF (V, U )
h ∈ HomF (V, W ) k ∈ HomF (W, V ),

such that

g ○ f = IdU f ○ g = IdV k ○ h = IdV and h ○ k = IdW .


[Diagram: U ⇄ V via the pair f , g, and V ⇄ W via the pair h, k.]

Consider the functions

h○f ∶U →W and g ○ k ∶ W → U.

Since they are compositions of linear transformations, they are linear trans-
formations, i.e.,

h ○ f ∈ HomF (U, W ) and g ○ k ∈ HomF (W, U ).

Finally, for every u ∈ U ,

(g ○ k) ○ (h ○ f )(u) = g(k(h(f (u)))) = g(f (u)) = u,

and for every w ∈ W ,

(h ○ f ) ○ (g ○ k)(w) = h(f (g(k(w)))) = h(k(w)) = w,

proving that U and W are isomorphic. ∎


Now that we know what isomorphic vector spaces are, we will see examples,
and in particular, discover that we have already encountered isomorphisms
without being aware of it...

Example: Let V be finitely-generated vector space over F and let

B = (v1 . . . vn )

be an ordered basis. The mapping f ∶ V → Fncol ,

f ∶ v ↦ [v]B ,

mapping every vector to its coordinate matrix relative to B is an isomor-


phism. We know that this is a linear transformation; it is also one-to-one—
every vector has a unique coordinate representation—and onto—every col-
umn of n scalars is the coordinate matrix of some vector. ▲▲▲

Lemma 5.28 Let V and W be finitely-generated vector spaces over F having


the same dimension. Let f ∈ HomF (V, W ). Then, f is one-to-one if and only
if f is onto.

Proof : This is an immediate consequence of the rank-nullity theorem (The-


orem 5.18), whereby

dimF ker f + dimF Image f = dimF V = dimF W.



Recall that f is one-to-one if and only if dimF ker f = 0 (Proposition 5.12); f


is onto if and only if Image f = W . Thus, if f is one-to-one, then

dimF Image f = dimF W,

which implies that Image f = W , proving that f is onto. Conversely, if f is


onto, namely, Image f = W , then

dimF ker f + dimF W = dimF W,

i.e., ker f = {0V }, hence f is one-to-one. ∎

Proposition 5.29 Every two finitely-generated vector spaces over the same
field having the same dimension are isomorphic.

Proof : Let V and W be vector spaces of dimension n over F. Let

B = (v1 . . . vn ) and C = (w1 . . . wn )

be ordered bases for V and W . Define f ∈ HomF (V, W ) as the unique linear
transformation satisfying

f (vi ) = wi for all i = 1, . . . , n.

If we show that f is one-to-one and onto, then we are done, but from
Lemma 5.28 it suffices to show just one of them. Let w ∈ W ; by the definition
of a basis, there exist scalars a1 , . . . , an ∈ F, such that

w = a1 w1 + ⋅ ⋅ ⋅ + an wn .

Then,
w = a1 f (v1 ) + ⋅ ⋅ ⋅ + an f (vn ) = f (a1 v1 + ⋅ ⋅ ⋅ + an vn ),
i.e., w ∈ Image f , which proves that f is onto. ∎

Corollary 5.30 Every finitely-generated vector space is isomorphic to its


dual.

Comment: Isomorphisms are commonly split into two categories: natural


isomorphisms and “unnatural” ones. An isomorphism is called natural if its
definition does not rely on arbitrary choices. When we say that two vector
spaces of the same dimension are isomorphic, the isomorphism depends on a
choice of bases, therefore it is not considered natural.

Example: Let V be a finitely-generated vector space over F. It is isomorphic


to its dual, and its dual is isomorphic to its own dual (the so called double-
dual). By transitivity, V is isomorphic to (V ∨ )∨ . In this case, there exists a
natural isomorphism. Consider the map

f ∶ V → (V ∨ )∨

assigning to every v ∈ V a linear form (on linear forms...) f (v) defined by

(f (v))(ℓ) = ℓ(v) for every ℓ ∈ V ∨ .

We claim that f is an isomorphism, i.e., it is a linear transformation, one-to-


one and onto.
To show that it is linear, we note that for every u, v ∈ V and every ℓ ∈ V ∨ ,

(f (u + v))(ℓ) = ℓ(u + v) = ℓ(u) + ℓ(v) = (f (u))(ℓ) + (f (v))(ℓ),

and since this holds for every ℓ ∈ V ∨ ,

f (u + v) = f (u) + f (v).

Similarly, for v ∈ V , a ∈ F and ℓ ∈ V ∨ ,

(f (a v))(ℓ) = ℓ(a v) = a ℓ(v) = a (f (v))(ℓ),

and since this holds for every ℓ ∈ V ∨ ,

f (a v) = a f (v).

This completes the proof that f is a linear transformation.


Since V and (V ∨ )∨ are of the same dimension, it suffices to show that f is
one-to-one, and equivalently, that its kernel is trivial. Let v ∈ ker f . This
means that
(f (v))(ℓ) = ℓ(v) = 0

for all ℓ ∈ V ∨ . We have seen that if v ≠ 0 then there exists a linear form
ℓ ∈ V ∨ such that ℓ(v) ≠ 0. We conclude that v = 0V , i.e.,
ker f = {0V },
completing the proof that f is an isomorphism. This isomorphism is consid-
ered natural because it does not hinge on any arbitrary construct. ▲▲▲
We end this section with one more manifestation of isomorphisms respecting
the linear structure of vector spaces:

Proposition 5.31 Let V, W be finitely-generated vector spaces over F and


let f ∈ HomF (V, W ) be an isomorphism. If

B = (v1 . . . vn )

is a basis for V , then


C = (f (v1 ) . . . f (vn ))
is a basis for W . In particular, two finitely-generated vector spaces are iso-
morphic if and only if they are of the same dimension.

Proof : We need to prove that C is generating W and independent. Denote


by g ∈ HomF (W, V ) the map inverse to f . Let w ∈ W and let v = g(w). Since
B is a basis for V , we can write
v = a1 v 1 + ⋅ ⋅ ⋅ + an v n
for some scalars a1 , . . . , an ∈ F. Then,
w = f (v) = a1 f (v1 ) + ⋅ ⋅ ⋅ + an f (vn ),
proving that C is a generating set for W .
Let
a1 f (v1 ) + ⋅ ⋅ ⋅ + an f (vn ) = 0W .
Applying g on both sides, using its linearity and the fact that g ○ f = Id, we
obtain that
a1 v1 + ⋅ ⋅ ⋅ + an vn = 0V .
Since B is a basis, ai = 0 for all i = 1, . . . , n, proving that C is an independent
set. This completes the proof. ∎

Exercises

(intermediate) 5.34 Complete the proof of Proposition 5.31: deduce that


if dimF V = m and dimF W = n, where m ≠ n, then V and W are not isomor-
phic.

(intermediate) 5.35 Let A ∈ Mn (F). Prove that the linear transformation


f ∶ Fncol → Fncol ,
f (v) = Av
is an isomorphism if and only if A ∈ GLn (F).

Solution 5.35: If f is an isomorphism, then ker f = {0}, i.e., Av = 0 if and only if v = 0,


which is equivalent to A being invertible. Conversely, if A is invertible, ker f = {0}, hence
f is also onto, hence an isomorphism.

5.12 Matrix representation


Recall that in finitely-generated vector spaces, the introduction of ordered
bases enables us to encode vectors as coordinate matrices. In a similar way, if
V and W are finitely-generated vector spaces, we can encode linear transfor-
mations in HomF (V, W ) as matrices acting on the coordinate representation
of v ∈ V , returning the coordinate representation of f (v) ∈ W .
Consider the following diagram:

          f
    V ---------> W
    |            |
  B |            | C
    v            v
  Fncol ------> Fmcol
          Af

In this diagram there are four vector spaces: V , W , Fncol and Fmcol . The arrows
represent linear transformations between the tail of the arrow and the head
of the arrow. Thus, f is a linear transformation from V to W . Assume that
dimF V = n and dimF W = m. The introduction of ordered bases, B for V and
C for W , induces two linear transformations, one from V to the space of its

coordinate matrices Fncol , and one from W to the space of its coordinate matri-
ces Fmcol . In this section, we show that to every f ∈ HomF (V, W ) corresponds
a unique matrix Af ∈ Mm×n (F), which we view as a linear transformation in
HomF (Fncol , Fmcol ), such that this diagram “commutes”. To explain what this
means, consider the same diagram through its action on a vector v ∈ V :

          f
    v |---------> f (v)
    |               |
  B |               | C
    v               v
  [v]B |--------> [f (v)]C
           Af

Take the vector v ∈ V ; if we apply f on it we obtain a vector f (v) ∈ W . If we


apply on the latter the linear transformation returning its coordinate matrix
relative to C, we obtain [f (v)]C ∈ Fm col . Alternatively, apply on v first the
linear transformation returning its coordinate matrix [v]B ∈ Fncol relative to
B. Multiply it then by the matrix Af , yielding a matrix Af [v]B . When we
say that the diagram commutes, we mean that either path yields the same
outcome, namely,
[f (v)]C = Af [v]B .

The matrix Af is called the matrix representing (!‫ )המטריצה המיצגת‬the


linear transformation f relative to the ordered bases B and C; we denote it
by [f]^B_C , i.e.,

[f (v)]_C = [f]^B_C [v]_B .

Theorem 5.32 Let V, W be finitely-generated vector spaces over F. Let

B = (v1 . . . vn ) and C = (w1 . . . wm )

be ordered bases for V and W . Every f ∈ HomF (V, W ) has a unique A ∈ Mm×n (F), such that

[f (v)]C = A [v]B for every v ∈ V .



Proof : Consider the transformation taking v ∈ V and returning [f (v)]C .


This is a mapping V → F^m_col , which is a composition of two linear transforma-
tions, hence it is a linear transformation. For every i = 1, . . . , n, substituting
vi , we obtain a coordinate matrix

[f (vi )]_C ∈ F^m_col .

We then define the matrix A to be the unique linear transformation in HomF (F^n_col , F^m_col ) satisfying

A[vi ]_B = [f (vi )]_C .

Here we used several facts: first, by Proposition 5.3 and Proposition 5.4, a linear transformation is uniquely determined by its action on basis vectors. Second, we used the fact that ([v1 ]_B , . . . , [vn ]_B) is a basis for F^n_col ; this follows from the fact that the mapping from a vector to its coordinate matrix is an isomorphism (Proposition 5.31).
We claim that A has the desired property: for every v ∈ V , which we write
as
v = a1 v 1 + ⋅ ⋅ ⋅ + an v n ,
we have
A[v]B = A[a1 v1 + ⋅ ⋅ ⋅ + an vn ]B
= A (a1 [v1 ]B + ⋅ ⋅ ⋅ + an [vn ]B )
= a1 A[v1 ]B + ⋯ + an A[vn ]B
= a1 [f (v1 )]C + ⋅ ⋅ ⋅ + an [f (vn )]C
= [a1 f (v1 ) + ⋅ ⋅ ⋅ + an f (vn )]C
= [f (a1 v1 + ⋅ ⋅ ⋅ + an vn )]C
= [f (v)]C ,

which completes the proof. n


We’ve already seen this matrix. Recall that there is a matrix A ∈ Mm×n (F),
such that
(f (v1 ) . . . f (vn )) = (w1 . . . wm ) A,
The entries of A are precisely ([f (vi )]C )j . That is,

(f (v1 ) . . . f (vn )) = (w1 . . . wm ) [f ]B


C.
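This recipe is easy to carry out numerically: the j-th column of the representing matrix is the coordinate vector of f (vj ) relative to C. The following Python sketch (numpy-based; the map and the bases are choices made for illustration, and reappear in Exercises 5.38–5.39 below) computes [f]^B_C this way and checks the commuting relation on a sample vector.

```python
import numpy as np

# f : R^2 -> R^3 (the map of Exercise 5.38 below), with illustrative bases B, C.
def f(v):
    x, y = v
    return np.array([2*x - y, x + y, -x + 3*y])

B_mat = np.column_stack([[1., 1.], [1., -1.]])                        # columns: basis B of R^2
C_mat = np.column_stack([[1., 0., 0.], [1., 1., 0.], [1., 1., 1.]])   # columns: basis C of R^3

def coords(v, basis_mat):
    # coordinate matrix of v relative to the basis: solve basis_mat @ a = v
    return np.linalg.solve(basis_mat, v)

# The j-th column of the representing matrix is [f(v_j)]_C.
A = np.column_stack([coords(f(B_mat[:, j]), C_mat) for j in range(2)])

v = np.array([3., -2.])
assert np.allclose(A @ coords(v, B_mat), coords(f(v), C_mat))   # the diagram commutes
print(A)   # [[-1.  3.], [ 0.  4.], [ 2. -4.]]
```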

Example: The zero transformation is represented by the zero matrix. ▲▲▲

Example: Let dimF V = n and consider the identity function Id ∈ HomF (V, V ),
i.e.,
Id(v) = v for all v ∈ V .
Let B = (v1 . . . vn ) be a basis for V . Then,
(Id(v1 ) . . . Id(vn )) = (v1 . . . vn ) In ,

i.e., the matrix representing the identity map of a vector space is the identity matrix,

[Id]^B_B = In .

In this very special case it does not depend on the choice of basis, as long as
we use the same basis both for the domain and the codomain. ▲▲▲

Example: Let dimF V = n and for a ∈ F consider the homothety f ∈


HomF (V, V ), given by

f (v) = av for all v ∈ V .

Let B = (v1 . . . vn ) be a basis for V . Then,


(f (v1 ) . . . f (vn )) = (v1 . . . vn ) (a In ),

i.e., the matrix representing a homothety is a multiple of the identity matrix, and for every v ∈ V ,

[f (v)]_B = (a In ) [v]_B = a [v]_B .
▲▲▲

Example: Let B and C be two ordered bases for V (which is finitely-


generated). Recall that the two bases are connected via transition matrices,
P, Q ∈ GLn (F),
C = BP and B = CQ,

where Q = P −1 . Furthermore, for every v ∈ V ,


[v]B = P [v]C and [v]C = Q[v]B .
Since we can equivalently write
[Id(v)]B = P [v]C ,
it follows that the transition matrix P is the matrix representing the identity
map Id ∈ HomF (V, V ) relative to the bases B and C, namely,
P = [Id]CB .

For example, let V = R2 , with


B = ((1, 2), (2, 1)) and C = ((1, 1), (1, −1)).
You may verify once again that

((1, 1) (1, −1)) = ((1, 2) (2, 1)) P,        P = [1/3 −1; 1/3 1],

where the tuple on the left is C, the tuple on the right is B, and C = BP.

Then,

[Id]^C_B = [1/3 −1; 1/3 1],

implying that for all v ∈ R2 ,

[v]_B = [1/3 −1; 1/3 1] [v]_C .
▲▲▲
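A quick numerical sketch of this example (numpy-based; a sanity check, not part of the development in the text): solving BP = C for P recovers the transition matrix, and one can then verify [v]_B = [Id]^C_B [v]_C on a sample vector.

```python
import numpy as np

B = np.column_stack([[1., 2.], [2., 1.]])    # columns (1,2), (2,1)
C = np.column_stack([[1., 1.], [1., -1.]])   # columns (1,1), (1,-1)

P = np.linalg.solve(B, C)                    # solves B @ P = C, i.e. C = BP
print(P)                                     # [[ 1/3, -1 ], [ 1/3, 1 ]]

v = np.array([4., 1.])
v_B = np.linalg.solve(B, v)                  # [v]_B
v_C = np.linalg.solve(C, v)                  # [v]_C
assert np.allclose(P @ v_C, v_B)             # [v]_B = [Id]^C_B [v]_C
```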

Exercises

(easy) 5.36 Let f ∈ HomR (R2 , R2 ) be given by


f (x, y) = (2x, x + y).
Calculate [f ]EB and [f ]B
E for

E = ((1, 0) (0, 1)) and B = ((1, 1) (0, 1)) .



Solution 5.36: By definition,

[f (x, y)]_B = [f]^E_B [(x, y)]_E   and   [f (x, y)]_E = [f]^B_E [(x, y)]_B .

Now,

[(x, y)]_E = (x, y)^t   and   [(x, y)]_B = (x, y − x)^t ,

hence

[f (x, y)]_E = (2x, x + y)^t   and   [f (x, y)]_B = (2x, y − x)^t .

We deduce that

[f (x, y)]_E = [2 0; 2 1] (x, y − x)^t ,   so that   [f]^B_E = [2 0; 2 1],

and

[f (x, y)]_B = [2 0; −1 1] (x, y)^t ,   so that   [f]^E_B = [2 0; −1 1].

Alternatively, we may proceed as follows,

(f (1, 0), f (0, 1)) = ((2, 1), (0, 1)) = ((1, 1), (0, 1)) [2 0; −1 1],

where the left-hand side is f (E), the basis on the right is B, and the matrix is [f]^E_B .
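Both representing matrices in Solution 5.36 can be checked numerically; a minimal sketch (the helper names are ad hoc):

```python
import numpy as np

def f(v):
    x, y = v
    return np.array([2*x, x + y])

E = np.eye(2)
B = np.column_stack([[1., 1.], [0., 1.]])

# Columns of [f]^E_B are [f(e_j)]_B; columns of [f]^B_E are [f(b_j)]_E = f(b_j).
f_E_B = np.column_stack([np.linalg.solve(B, f(E[:, j])) for j in range(2)])
f_B_E = np.column_stack([f(B[:, j]) for j in range(2)])

assert np.allclose(f_E_B, [[2., 0.], [-1., 1.]])
assert np.allclose(f_B_E, [[2., 0.], [2., 1.]])
```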

(easy) 5.37 Let A ∈ M2 (F) and let f ∶ M2 (F) → M2 (F) be given by f (B) =
AB.

(a) Show that f is a linear transformation.


(b) Does there exist a basis B for M2 (F), such that [f ]B
B = A?

Solution 5.37: f is linear because

f (B + C) = A(B + C) = AB + AC = f (B) + f (C)   and   f (c B) = A(c B) = c AB = c f (B).

The answer to the second question is negative: M2 (F) is a 4-dimensional space, hence representing matrices of linear maps M2 (F) → M2 (F) are elements of M4 (F), whereas A ∈ M2 (F).

(easy) 5.38 Let En denote the standard ordered basis for Rn . Let f ∈
HomR (R2 , R3 ) be given by

f (x, y) = (2x − y, x + y, −x + 3y).



(a) Write the matrix [f]^{E2}_{E3} .


(b) Find the linear transformation g ∈ HomR (R2 , R3 ) for which

[g]^{E2}_{E3} = [1 0; 0 2; −1 2].

Solution 5.38: The answer to (a) is

[f]^{E2}_{E3} = [2 −1; 1 1; −1 3].

The answer to (b) is

g(x, y) = (x, 2y, −x + 2y).

(easy) 5.39 Repeat the previous exercise, this time using the ordered bases
B = ((1, 1) (1, −1)) and C = ((1, 0, 0) (1, 1, 0) (1, 1, 1)) .

Solution 5.39: For part (a),

(f (1, 1), f (1, −1)) = ((1, 2, 2), (3, 0, −4)) = ((1, 0, 0), (1, 1, 0), (1, 1, 1)) [−1 3; 0 4; 2 −4],

where the left-hand side is f (B), the basis on the right is C, and the matrix is [f]^B_C .

For part (b),

(g(1, 1), g(1, −1)) = ((1, 0, 0), (1, 1, 0), (1, 1, 1)) [1 0; 0 2; −1 2] = ((0, −1, −1), (4, 4, 2)),

where the matrix is [g]^B_C . It follows that

g(1, 0) = (1/2)(g(1, 1) + g(1, −1)) = (2, 3/2, 1/2)
g(0, 1) = (1/2)(g(1, 1) − g(1, −1)) = (−2, −5/2, −3/2),

i.e.,

g(x, y) = (2x, (3/2)x, (1/2)x) + (−2y, −(5/2)y, −(3/2)y).

(easy) 5.40 Find the linear transformation f ∈ HomR (R3 , R3 ) satisfying [f]^B_C = I3 relative to the ordered bases

B = ((1, 0, −1) (1, −1, 0) (0, 1, 1))

C = ((1, 0, 0) (1, 1, 0) (1, 1, 1)) .

Solution 5.40: By definition,

(f (1, 0, −1), f (1, −1, 0), f (0, 1, 1)) = ((1, 0, 0), (1, 1, 0), (1, 1, 1)) I3 ,

where the left-hand side is f (B), the basis on the right is C, and the matrix is [f]^B_C = I3 . It follows that

f (1, 0, 0) = (1/2)(f (1, 0, −1) + f (1, −1, 0) + f (0, 1, 1)) = (1/2)(3, 2, 1)
f (0, 1, 0) = (1/2)(f (1, 0, −1) − f (1, −1, 0) + f (0, 1, 1)) = (1/2)(1, 0, 1)
f (0, 0, 1) = (1/2)(−f (1, 0, −1) + f (1, −1, 0) + f (0, 1, 1)) = (1/2)(1, 2, 1),

hence

f (x, y, z) = (1/2)(3x + y + z, 2x + 2z, x + y + z).

(intermediate) 5.41 Let V = M2 (R),

U = {[a b; 0 0] ∶ a, b ∈ R}   and   W = {[−c 0; c d] ∶ c, d ∈ R} .

In Exercise 5.15 you showed that V = U ⊕ W and wrote explicitly the pro-
jections pi and reflections Si .

(a) Find an ordered basis B = {u1 , u2 , w1 , w2 } for V , such that {u1 , u2 } is


a basis for U and {w1 , w2 } is a basis for W .
(b) Find the matrices [p1]^B_B and [S1]^B_B .

(intermediate) 5.42 Let V = (C, +, R, ⋅) and consider the linear transfor-


mation f ∶ C → C defined by
f (z) = z̄.
Find [f]^B_B for B = (1, −ı).

Solution 5.42:

(f (1), f (−ı)) = (1, ı) = (1, −ı) [1 0; 0 −1],

where the tuple on the left is f (B) and the matrix is [f]^B_B .

(intermediate) 5.43 Let

V = R<4 [X] = {p ∈ R[X] ∶ deg p < 4}.

Let f ∶ V → V be defined by

(f (p))(X) = X p′ (X),

where p′ is the derivative of p, viewed as a function, e.g., f (3X − X 2 ) =


X(3 − 2X) = 3X − 2X 2 .

(a) Show that f is a linear transformation.


(b) Find [f]^B_B for B = (1, X, X 2 , X 3 ).

(c) Find the kernel and the image of f .

Solution 5.43: For (b),

(0, X, 2X 2 , 3X 3 ) = (1, X, X 2 , X 3 ) [0 0 0 0; 0 1 0 0; 0 0 2 0; 0 0 0 3],

where the left-hand side is f (B) and the matrix is [f]^B_B .

For (c), ker f = Span{1} and Image f = Span{X, X 2 , X 3 }.
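For concreteness, the representing matrix in (b) can also be computed by encoding polynomials as coefficient vectors; a small sketch (the coefficient-vector encoding is an assumption made for the illustration, not part of the exercise):

```python
import numpy as np

# p is stored as (a_0, a_1, a_2, a_3), the coefficients of 1, X, X^2, X^3.
def f(coeffs):
    # X * p'(X): the coefficient of X^k in the result is k * a_k
    return np.array([k * a for k, a in enumerate(coeffs)], dtype=float)

basis = np.eye(4)                                      # coordinate vectors of B
A = np.column_stack([f(basis[:, j]) for j in range(4)])
print(A)                                               # diag(0, 1, 2, 3), as above
```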

(intermediate) 5.44 Let V = M2 (R) and let W = R<3 [X]. Let f ∶ V → W


be the linear transformation defined by

a b
f ([ ]) = (a + 2b + c) + (3a − d)X + (a − 4b − 2c − d)X 2 .
c d

(a) Find [f]^B_C for C = (1, X, X 2 ) and

B = ([1 0; 0 0] [0 1; 0 0] [0 0; 1 0] [0 0; 0 1]) .

(b) Find the kernel and the image of f .

Solution 5.44: By definition,

(1 + 3X + X 2 , 2 − 4X 2 , 1 − 2X 2 , −X − X 2 ) = (1, X, X 2 ) [1 2 1 0; 3 0 0 −1; 1 −4 −2 −1],

where the left-hand side is f (B), the basis on the right is C, and the matrix is [f]^B_C . Then,

ker f = Span{[1 −1; 1 3], [0 −1; 2 0]},

and

Image f = Span{1 − 2X 2 , 1 + 2X}.

For the image, once we know that the kernel is two-dimensional, it suffices to find two independent vectors in the image of f .

(intermediate) 5.45 Let f ∶ R<3 [X] → R<3 [X] be the linear transformation
represented by the matrix

[f]^B_B = [1 2 5; −1 0 −1; 0 1 2]

relative to the ordered basis B = (1, 1 + X, 1 − X + X 2 ). Find Image f .

Solution 5.45: It is given that

(f (1), f (1 + X), f (1 − X + X 2 )) = (1, 1 + X, 1 − X + X 2 ) [1 2 5; −1 0 −1; 0 1 2].

That is,

f (1) = −X,   f (1 + X) = 3 − X + X 2   and   f (1 − X + X 2 ) = 6 − 3X + 2X 2 .

We note that
f (2 + 3X − X 2 ) = 0,
hence
ker f = Span{2 + 3X − X 2 }.
By the rank theorem, the image of f is the span of any two linearly-independent vectors
in the image of f , e.g.,
Image f = Span{−X, 3 − X + X 2 }.

(intermediate) 5.46 Let

B = ((1, 3) (3, 0))

be an ordered basis for R2 . Let f ∈ HomR (R3 , R2 ) satisfy

[f]^E_B = [−2 5 0; 1 0 −1],

where E is the standard basis. Find a matrix A ∈ M3×2 (R) such that

f (x, y, z) = (x, y, z)A.

Solution 5.46: We have

(f (1, 0, 0), f (0, 1, 0), f (0, 0, 1)) = ((1, 3), (3, 0)) [−2 5 0; 1 0 −1] = ((1, −6), (5, 15), (−3, 0)).

Hence,

f (x, y, z) = (x + 5y − 3z, −6x + 15y) = (x, y, z) [1 −6; 5 15; −3 0].

(intermediate) 5.47 Let V be a vector space over R and let B = (v1 , v2 , v3 )


be an ordered basis for V . Let f ∈ HomR (V, R2 ) be the linear transformation
satisfying
[f]^B_C = [5 −3 4; −1 6 2]

where
C = ((1, 2) (0, −1)) .
Let v ∈ V satisfy
[v]_B = (1, 0, −1)^t .
Find f (v).

Solution 5.47: We have

[f (v)]_C = [f]^B_C [v]_B = [5 −3 4; −1 6 2] (1, 0, −1)^t = (1, −3)^t ,

hence

f (v) = C [f (v)]_C = ((1, 2) (0, −1)) (1, −3)^t = (1, 5).

5.13 Algebra of transformations and matrix algebra
We now connect the composition of linear transformations to their matrix
representation. Let U, V, W be vector spaces over F, let f ∈ HomF (U, V ) and
let g ∈ HomF (V, W ). Let B, C and D be ordered bases for U , V and W . The
following diagram is useful:

              f            g
        U ---------> V ---------> W
        |            |            |
      B |          C |            | D
        v            v            v
    F^n_col ---> F^m_col ---> F^k_col
          [f]^B_C      [g]^C_D

Proposition 5.33 The above diagram commutes, namely,

[g ○ f]^B_D = [g]^C_D [f]^B_C .

In other words, the matrix representation of a composition is the product of


the matrix representations.

Proof : By definition, for every u ∈ U ,

[f (u)]_C = [f]^B_C [u]_B ,

and for every v ∈ V ,

[g(v)]_D = [g]^C_D [v]_C .

Combining the two,

[g(f (u))]_D = [g]^C_D [f (u)]_C = [g]^C_D [f]^B_C [u]_B ,

and by definition,

[g]^C_D [f]^B_C = [g ○ f]^B_D .

n
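A numerical sketch of the proposition (numpy-based; g is a hypothetical map chosen for illustration, f is the map of Exercise 5.38, and all bases are standard, so representing matrices act directly on the coordinates):

```python
import numpy as np

def f(v):                       # f : R^2 -> R^3
    x, y = v
    return np.array([2*x - y, x + y, -x + 3*y])

def g(w):                       # g : R^3 -> R^2 (hypothetical)
    x, y, z = w
    return np.array([x + z, y - z])

F = np.column_stack([f(e) for e in np.eye(2)])        # [f] in standard bases
G = np.column_stack([g(e) for e in np.eye(3)])        # [g] in standard bases
GF = np.column_stack([g(f(e)) for e in np.eye(2)])    # [g ∘ f] computed directly

assert np.allclose(GF, G @ F)   # the representation of a composition is the product
```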
Linear transformations from V to W can also be added. The addition of linear transformations is represented by the addition of the corresponding representing matrices:

Proposition 5.34 Let V, W be vector spaces over F and let f, g ∈


HomF (V, W ). Let B and C be ordered bases for V and W . Then,

[f + g]^B_C = [f]^B_C + [g]^B_C .

Proof : By definition, for every v ∈ V ,

[f (v)]_C = [f]^B_C [v]_B   and   [g(v)]_C = [g]^B_C [v]_B .

Combining the two,

[(f + g)(v)]_C = [f (v) + g(v)]_C
             = [f (v)]_C + [g(v)]_C
             = [f]^B_C [v]_B + [g]^B_C [v]_B
             = ([f]^B_C + [g]^B_C ) [v]_B ,

and by definition

[f]^B_C + [g]^B_C = [f + g]^B_C .

n
Similarly, we can prove:

Proposition 5.35 Let V, W be vector spaces over F; let f ∈ HomF (V, W )


and a ∈ F. Let B and C be ordered bases for V and W . Then,

[a f]^B_C = a [f]^B_C .

Proof : We leave this as an exercise. n

Example: Let V = U1 ⊕ U2 , where


dimF U1 = dimF U2 = 1.
Recall that every v ∈ V has a unique representation as v = u1 + u2 , and we
defined the projection operators p1 , p2 ∶ V → V by
p1 (v) = u1 and p2 (v) = u2 ,
and the reflection operators S1 , S2 ∶ V → V by
S1 (v) = u1 − u2 and S2 (v) = u2 − u1 .
These linear transformations satisfy the following additive relations,
p1 + p2 = IdV
p1 − p2 = S1
p2 − p1 = S2
S1 + S2 = 0HomF (V,V ) .

In the present case there exist u, w ∈ V , such that

U1 = Span{u} and U2 = Span{w}.

Take B = (u, w) as an ordered basis for V . We have

[u]_B = (1, 0)^t   and   [w]_B = (0, 1)^t .

Since p1 (u) = u and p1 (w) = 0V , the matrix representation of p1 relative to


the basis B is

[p1]^B_B = [1 0; 0 0].

Likewise,

[p2]^B_B = [0 0; 0 1].

Note that [p1]^B_B + [p2]^B_B = I2 = [IdV]^B_B . Further,

[S1]^B_B = [p1]^B_B − [p2]^B_B = [1 0; 0 −1],

and

[S2]^B_B = [p2]^B_B − [p1]^B_B = [−1 0; 0 1],

so that [S1]^B_B + [S2]^B_B = 0M2 (F) , as expected.
Consider now compositions of these operators, for example,

p1 ○ p1 = p1   and   p1 ○ p2 = 0HomF (V,V ) .

Indeed,

[p1]^B_B [p1]^B_B = [1 0; 0 0] [1 0; 0 0] = [1 0; 0 0] = [p1 ○ p1]^B_B ,

and

[p1]^B_B [p2]^B_B = [1 0; 0 0] [0 0; 0 1] = [0 0; 0 0] = [p1 ○ p2]^B_B .
▲▲▲

Exercises

(easy) 5.48 Show explicitly in the last example that p1 ○ S2 = −p1 and

[p1]^B_B [S1]^B_B = [p1 ○ S1]^B_B .

(harder) 5.49 Let V be a three-dimensional vector space over a field F


and let f ∈ HomF (V, V ) be a linear transformation, which is not the zero
transformation, satisfying

f ○ f = 0HomF (V,V ) .

Show that there exists an ordered basis B for V , such that


[f]^B_B = [0 0 1; 0 0 0; 0 0 0].
Hint: start by finding the dimensions of ker f and Image f . Is one of those
subspaces contained in the other?

Solution 5.49: Since f ○ f = 0, we have Image f ≤ ker f , and since f ≠ 0, dimF ker f < 3. By the rank theorem,

dimF ker f + dimF Image f = 3,

but since

dimF Image f ≤ dimF ker f,

we conclude that dimF ker f = 2 and dimF Image f = 1. Let {u1 } be a basis for Image f . Let u2 complete u1 into a basis for ker f and let u3 ∈/ ker f complete u1 , u2 into a basis for V . We choose u3 such that

f (u3 ) = u1 ,

which we can always do by an appropriate scalar multiplication. Hence,

(f (u1 ), f (u2 ), f (u3 )) = (0V , 0V , u1 ) = (u1 , u2 , u3 ) [0 0 1; 0 0 0; 0 0 0].

(harder) 5.50 Let V be a three-dimensional vector space over a field F and


let f ∈ HomF (V, V ) be a linear transformation satisfying

f ○ f ≠ 0HomF (V,V ) and f ○ f ○ f = 0HomF (V,V ) .



Show that there exists an ordered basis B for V , such that


[f]^B_B = [0 1 0; 0 0 1; 0 0 0].
Hint: start by finding the dimensions of ker f and Image f . Is one of those
subspaces contained in the other? What is the implication of f ○ f not being
the zero transformation?

Solution 5.50: Since f ○ f ○ f = 0, we have Image(f ○ f ) ≤ ker f ≤ ker(f ○ f ), and since f ○ f ≠ 0, dimF ker(f ○ f ) < 3. By the rank theorem,

dimF ker(f ○ f ) + dimF Image(f ○ f ) = 3,

from which we conclude that dimF ker(f ○ f ) = 2 and dimF Image(f ○ f ) = 1. A priori, there are two possibilities,

dimF ker f = 1 and dimF Image f = 2,

or

dimF ker f = 2 and dimF Image f = 1.

Suppose that dimF ker f = 2 and dimF Image f = 1. Since f ○ f ≠ 0, it follows that Image f ∩ ker f = {0V } (figure out why!). Then f maps the one-dimensional subspace Image f into itself injectively, i.e., there exist a non-zero u ∈ Image f and a non-zero scalar λ such that f (u) = λu; but then f (f (f (u))) = λ^3 u ≠ 0V , which is a contradiction. We thus conclude that

dimF ker f = 1 and dimF Image f = 2.

Let {u1 } be a basis for ker f . Let u2 complete u1 into a basis for ker(f ○ f ), and let u3 complete the two others into a basis for V . We can choose u2 such that f (u2 ) = u1 and we can choose u3 such that f (u3 ) = u2 (figure out why!). Hence,

(f (u1 ), f (u2 ), f (u3 )) = (0V , u1 , u2 ) = (u1 , u2 , u3 ) [0 1 0; 0 0 1; 0 0 0].

5.14 Change of basis


The change of an ordered basis induces a change in the coordinate matrices
of vectors. Likewise, it also induces a change in the matrix representation
of linear transformations. The following theorem provides a formula for the
change of the matrix representation.

Theorem 5.36 Let V be a finitely-generated vector space over F, dimF V = n. Let B and C be ordered bases for V , such that

C = BP,

for some P ∈ GLn (F). Then, for f ∈ HomF (V, V ),

[f]^C_C = P^{−1} [f]^B_B P.

Proof : By definition of the representing matrix, for every v ∈ V ,

[f (v)]_B = [f]^B_B [v]_B   and   [f (v)]_C = [f]^C_C [v]_C .

Moreover,

P [v]_C = [v]_B   and   P [f (v)]_C = [f (v)]_B ,

from which we obtain,

[f]^C_C [v]_C = [f (v)]_C = P^{−1} [f (v)]_B = P^{−1} [f]^B_B [v]_B = P^{−1} [f]^B_B P [v]_C .

This holds for every v ∈ V , hence

[f]^C_C = P^{−1} [f]^B_B P.

Example: Let V = R2 and let f ∈ HomR (R2 , R2 ) be given by

f (x, y) = (3x + 7y, 2x − 5y).

With respect to the standard basis E,

[(x, y)]_E = (x, y)^t ,

and

[f (x, y)]_E = (3x + 7y, 2x − 5y)^t = [3 7; 2 −5] (x, y)^t ,

namely

[f]^E_E = [3 7; 2 −5].

Let now

B = ((1, 2) (2, 1))

be another ordered basis for R2 . Then,

((1, 2) (2, 1)) = ((1, 0) (0, 1)) [1 2; 2 1],

and for (x, y) ∈ R2 ,

(x, y)^t = [1 2; 2 1] [(x, y)]_B   and   [(x, y)]_B = [−1/3 2/3; 2/3 −1/3] (x, y)^t .

Now,

[f (x, y)]_B = [(3x + 7y, 2x − 5y)]_B
           = [−1/3 2/3; 2/3 −1/3] (3x + 7y, 2x − 5y)^t
           = [−1/3 2/3; 2/3 −1/3] [3 7; 2 −5] (x, y)^t
           = [−1/3 2/3; 2/3 −1/3] [3 7; 2 −5] [1 2; 2 1] [(x, y)]_B .

We conclude that

[f]^B_B = [−1/3 2/3; 2/3 −1/3] [3 7; 2 −5] [1 2; 2 1].
▲▲▲
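The conclusion of this example is easy to verify numerically; a sketch (numpy-based) computing [f]^B_B = P^{−1} [f]^E_E P and checking it against a direct coordinate computation:

```python
import numpy as np

fE = np.array([[3., 7.], [2., -5.]])        # [f]^E_E
P = np.column_stack([[1., 2.], [2., 1.]])   # B = EP
fB = np.linalg.inv(P) @ fE @ P              # Theorem 5.36

v = np.array([1., -4.])                     # here [v]_E = v
v_B = np.linalg.solve(P, v)                 # [v]_B
fv_B = np.linalg.solve(P, fE @ v)           # [f(v)]_B
assert np.allclose(fB @ v_B, fv_B)
```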

Definition 5.37 Square matrices A, B ∈ Mn (F) are called similar (!‫)דומות‬,


if there exists an invertible matrix P ∈ GLn (F) such that
B = P −1 AP.

Thus, we have proved that matrices representing the same linear transformation f ∈ HomF (V, V ) relative to different bases are similar. The converse is also true: two similar matrices represent the same linear transformation relative to two appropriately chosen bases.

Proposition 5.38 Similarity between matrices is an equivalence relation.

Proof : This is left as an exercise. n

Exercises

(intermediate) 5.51 Let

B = ((2, 1) (3, 2)) and C = ((1, −1) (−1, 2))

be ordered bases for R2 . Let f ∈ HomR (R2 , R2 ) be the linear transformation


satisfying

[f]^B_B = [1 2; −1 1].

Find [f]^C_C .

(easy) 5.52 Prove that similarity between matrices is an equivalence relation (first, remind yourself what it takes to be an equivalence relation).

(easy) 5.53 Prove that similar matrices represent the same linear transfor-
mation relative to different bases.

(easy) 5.54 Show that for any scalar a, the matrix a In is similar only to
itself. Interpret this result in terms of the matrix representation of linear
transformations.

(intermediate) 5.55 Let A, B ∈ Mn (R). Prove or disprove each of the


following statements:

(a) If A and B are row-equivalent, then they are similar.


(b) If A and B are similar, then they are row-equivalent.
(c) If A and B are similar and A is invertible, then B is invertible.
(d) If A is not invertible, then it is similar to a matrix having a row of
zeros.

(intermediate) 5.56 Let A ∈ M2 (R) be similar to the matrix

D = [3 0; 0 2].

Prove that

(A − 3I2 )(A − 2I2 ) = 0.
Chapter 6

Volume Forms and Determinants

6.1 Motivation

Consider the vector space V = R2 and let u, v ∈ R2 , as shown in the figure


below:

[Figure: two vectors u, v ∈ R2 and the parallelogram they span.]
We want to define a function ω, which given two vectors u, v ∈ V , returns the


area of the parallelogram formed by those two vectors. Thus, we are looking
for a function ω ∶ V 2 → R, where V n denotes the space of n-tuples of vectors.
What are the properties we would like ω to satisfy? First, if one of the vectors
is multiplied by a scalar, the area should be magnified by the same scalar,
as depicted below, where v has been magnified by a factor of 2:

[Figure: the parallelogram spanned by u and v, and the parallelogram spanned by u and 2v.]

That is, for every u, v ∈ R2 and a ∈ R,


ω(a u, v) = a ω(u, v),
and likewise,
ω(u, a v) = a ω(u, v).
Note that if we require this property to hold for every a ∈ R we may obtain negative areas; the notion we are looking for is that of a signed area (שטח מסומן), which is negative or positive depending on the orientation (מגמה) of the parallelogram. In the case of R2 , ω(u, v) is positive whenever the shortest rotation from u to v occurs inside the parallelogram (in the above figures the signed area is negative).
The second property we expect ω to satisfy is that if we translate u along v
(and vice-versa v along u), then the area doesn’t change, as depicted below:

[Figure: the parallelograms spanned by (u, v) and by (u + 0.5v, v) have the same area.]

In other words, for every u, v ∈ R2 and a ∈ R,


ω(u + a v, v) = ω(u, v) and ω(u, v + a u) = ω(u, v).

As we will see, these two properties determine the area function almost uniquely; there always remains a choice of “units”, which assigns an area to a reference shape. To understand why, just observe a sequence of transformations of the above two kinds, which do not change the area of the parallelogram.

[Figure: a sequence of area-preserving transformations of a parallelogram.]

6.2 Volume forms


Definition 6.1 Let V be an n-dimensional vector space over a field F. A
function
ω∶Vn →F
is called a volume form (!‫ )תבנית נפח‬on V if

(a) For every (v1 , . . . , vn ) ∈ V n and for every i ≠ j,


ω(v1 , . . . , vi + vj , . . . , vn ) = ω(v1 , . . . , vn ). (6.1)

(b) For every a1 , . . . , an ∈ F,


ω(a1 v1 , . . . , an vn ) = a1 ⋯an ω(v1 , . . . , vn ). (6.2)

Note that the function ω returning 0F for every (v1 , . . . , vn ) ∈ V n satisfies


those conditions, i.e., it is a volume form. Such a volume form is called
degenerate (!N‫)מנוו‬.
The following theorem is the central one in this section:

Theorem 6.2 Let V be an n-dimensional vector space over a field F. For


every ordered basis B = (v1 , . . . , vn ), there exists a unique volume form ω
satisfying
ω(v1 , . . . , vn ) = 1F .

We will not prove this theorem right away; for the time being, we will assume
that such volume forms exist and examine their properties.

Lemma 6.3 Let ω ∶ V n → F be a volume form on an n-dimensional vector


space V . Then, for all (v1 , . . . , vn ) ∈ V n and all 1 ≤ i ≤ n,

ω(v1 , . . . , vi−1 , 0V , vi+1 , . . . , vn ) = 0F .

Proof : It follows from Property (6.2) that

ω(v1 , . . . , vi−1 , 0V , vi+1 , . . . , vn ) = ω(v1 , . . . , vi−1 , 0F vi , vi+1 , . . . , vn )


= 0F ω(v1 , . . . , vn )
= 0F .

Lemma 6.4 Let ω ∶ V n → F be a volume form on an n-dimensional vector


space V . Then, for all i ≠ j and a ∈ F,

ω(v1 , . . . , vi + a vj , . . . , vn ) = ω(v1 , . . . , vn ).

Proof : If a = 0F then there is nothing to prove. Otherwise,

a ω(v1 , . . . , vn ) = ω(v1 , . . . , vi , . . . , a vj , . . . , vn )
= ω(v1 , . . . , vi + a vj , . . . , a vj , . . . , vn )
= a ω(v1 , . . . , vi + a vj , . . . , vj , . . . , vn ),

where the first and third equalities follow from (6.2) and the second equality
follows from (6.1). Dividing both sides by a, we obtain the required result.
n

Corollary 6.5 Let ω ∶ V n → F be a volume form on an n-dimensional vector


space V . Then, for all a1 , . . . , an ∈ F and 1 ≤ i ≤ n,

ω (v1 , . . . , vi + ∑_{j≠i} aj vj , . . . , vn ) = ω(v1 , . . . , vn ).

Proof : This follows from (n − 1) applications of the previous lemma. n

Corollary 6.6 Let ω ∶ V n → F be a volume form on an n-dimensional vector


space V . If the sequence of vectors (v1 , . . . , vn ) is linearly-dependent, then

ω(v1 , . . . , vn ) = 0F .

Proof : If the vectors are linearly-dependent, then one of the vectors, say vi , can be written as a linear combination of all the others,

vi = ∑_{j≠i} aj vj .

Then, writing the i-th entry as 0V + ∑_{j≠i} aj vj and using Corollary 6.5 and Lemma 6.3,

ω(v1 , . . . , vn ) = ω (v1 , . . . , 0V + ∑_{j≠i} aj vj , . . . , vn ) = ω (v1 , . . . , 0V , . . . , vn ) = 0F .
n

6.3 Volume forms and elementary matrices


Both operations of multiplying one of the vectors by a scalar and adding to
one vector a multiple of another vector can be realized by multiplication by

an elementary matrix. If we define the elementary matrices

(D_k^k(a))_i^j = { 0F if i ≠ j;  a if i = j = k;  1F if i = j ≠ k },        (6.3)

then

(v1 , . . . , vn ) D_k^k(a) = (v1 , . . . , a vk , . . . , vn ).

Similarly, if we define the elementary matrices

(T_k^ℓ(a))_i^j = { 1F if i = j;  a if i = k, j = ℓ;  0F otherwise },        (6.4)

then

(v1 , . . . , vn ) T_k^ℓ(a) = (v1 , . . . , vk + a vℓ , . . . , vn ).
It follows that for every volume form ω,
ω((v1 , . . . , vn )Dkk (a)) = a ω(v1 , . . . , vn ), (6.5)
and
ω((v1 , . . . , vn )Tk` (a)) = ω(v1 , . . . , vn ). (6.6)
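These column operations are easy to experiment with; a small numpy sketch (with 0-based indices, and with the tuple (v1 , . . . , vn ) stored as the columns of a matrix). Note that for the right-multiplication to modify slot k, the entry a of T_k^ℓ(a) must sit in row ℓ and column k:

```python
import numpy as np

def D(n, k, a):                # D_k^k(a), 0-based index k
    M = np.eye(n)
    M[k, k] = a
    return M

def T(n, k, l, a):             # T_k^l(a): entry a in row l, column k
    M = np.eye(n)
    M[l, k] = a
    return M

V = np.column_stack([[1., 3.], [2., 4.]])   # columns v_1, v_2
assert np.allclose(V @ D(2, 0, 5.), np.column_stack([5 * V[:, 0], V[:, 1]]))
assert np.allclose(V @ T(2, 0, 1, 2.), np.column_stack([V[:, 0] + 2 * V[:, 1], V[:, 1]]))
```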

Example: Let V = R2 and let

B = ((1, 2), (2, 1))   and   C = ((1, 1), (1, −1))

be two ordered bases. We have seen that

((1, 1), (1, −1)) = ((1, 2), (2, 1)) [1/3 −1; 1/3 1].

You may verify that

[1/3 −1; 1/3 1] = [1 0; 0 1/3] [1/3 0; 0 1] [1 0; 1 1] [1 0; 0 6] [1 −3; 0 1],

hence

ω((1, 1), (1, −1)) = ω (((1, 2), (2, 1)) [1/3 −1; 1/3 1]) = (1/3) ⋅ (1/3) ⋅ 6 ⋅ ω((1, 2), (2, 1)).

Thus, the ratio between the volumes associated with these two bases is completely determined by the structure of the transition matrix between those bases. The ratio 2/3 is a property of the transition matrix, which we will identify below as its determinant. ▲▲▲

6.4 Multilinearity and alternation


In this section we are going to examine volume forms from another perspec-
tive.

Definition 6.7 Let V be an n-dimensional vector space over F. A function


f ∶ V n → F is called multilinear (!‫ )מולטילינארית‬if for every (v1 , . . . , vn ) ∈ V n ,
w ∈ V and a ∈ F,

f (v1 , . . . , vi + w, . . . , vn ) = f (v1 , . . . , vn ) + f (v1 , . . . , w, . . . , vn ),

and
f (v1 , . . . , a vi , . . . , vn ) = a f (v1 , . . . , vn ).

Example: Let `1 , . . . , `n ∈ V ∨ be a sequence of linear forms, then the function


f ∶ V n → F defined by

f (v1 , . . . , vn ) = `1 (v1 )⋯`n (vn )

is multilinear. ▲▲▲

Example: Let V = F^n_col ; then the function f ∶ V^n → F defined by

f ((a^1_1 , . . . , a^n_1 )^t , . . . , (a^1_n , . . . , a^n_n )^t ) = a^1_1 a^2_2 ⋯ a^n_n

is multilinear. ▲▲▲

Definition 6.8 Let V be an n-dimensional vector space over F. A function


f ∶ V n → F is called alternating (!‫ )חילופית‬if for every (v1 , . . . , vn ) ∈ V n for
which vi = vj with i ≠ j,
f (v1 , . . . , vn ) = 0.

Theorem 6.9 Let V be an n-dimensional vector space over F. Let f ∶ V n →


F. Then, f is a volume form if and only if it is multilinear and alternating.

Proof : Suppose first that ω is a volume form. We need to show that it is


multilinear and alternating. We will show that it is linear in its first entry,
as we can repeat the same argument for all other entries. We need to show
that for every u, v, v2 , . . . , vn ∈ V ,

ω(u + v, v2 , . . . , vn ) = ω(u, v2 , . . . , vn ) + ω(v, v2 , . . . , vn ).

By Corollary 6.6, if the vectors v2 , . . . , vn are linearly-dependent, then both


sides of this equation are zero. Otherwise, let v1 be the completion of
v2 , . . . , vn into a basis for V , and write

u = a1 v 1 + ⋅ ⋅ ⋅ + an v n and v = b 1 v 1 + ⋅ ⋅ ⋅ + bn v n .

By Properties (6.1),(6.2) of volume forms,

ω(u, v2 , . . . , vn ) = ω(a1 v1 , v2 , . . . , vn ) = a1 ω(v1 , v2 , . . . , vn )


ω(v, v2 , . . . , vn ) = ω(b1 v1 , v2 , . . . , vn ) = b1 ω(v1 , v2 , . . . , vn )
ω(u + v, v2 , . . . , vn ) = ω((a1 + b1 ) v1 , v2 , . . . , vn ) = (a1 + b1 ) ω(v1 , v2 , . . . , vn ),

which proves the additive property. The multiplicative property of multilin-


earity,
ω(a v1 , v2 , . . . , vn ) = a ω(v1 , v2 , . . . , vn )
is a particular case of (6.2).
The alternating property follows from the fact that if vi = vj for i ≠ j, then

ω(v1 , . . . , vi , . . . , vj , . . . , vn ) = ω(v1 , . . . , vi − vj , . . . , vj , . . . , vn )
= ω(v1 , . . . , 0V , . . . , vj , . . . , vn ) = 0F ,

where in the last step we used Lemma 6.3.


Conversely, suppose that ω is multilinear and alternating. Property (6.2) is
automatically satisfied. It only remains to prove that

ω(v1 , . . . , vi , . . . , vj , . . . , vn ) = ω(v1 , . . . , vi + vj , . . . , vj , . . . , vn ),

but this is immediate as, by multilinearity and alternation,

ω(v1 , . . . , vi + vj , . . . , vj , . . . , vn ) = ω(v1 , . . . , vi , . . . , vj , . . . , vn )
+ ω(v1 , . . . , vj , . . . , vj , . . . , vn )
= ω(v1 , . . . , vi , . . . , vj , . . . , vn ) + 0F .

n
In practice, volume forms are more natural to think of geometrically, and al-
ternating multilinear functions are more convenient to think of algebraically.
We have just shown that they are the same.

Proposition 6.10 Let V be an n-dimensional vector space over F and let ω


be a volume form on V . Then ω is anti-symmetric, namely, for every i ≠ j,

ω(v1 , . . . , vi , . . . , vj , . . . , vn ) = −ω(v1 , . . . , vj , . . . , vi , . . . , vn ).

Proof : Consider

ω(v1 , . . . , vi + vj , . . . , vi + vj , . . . , vn ) = 0F ,

which vanishes by the alternating property of the volume form. Expanding by multilinearity into four terms, two of them vanish by the alternating property, and we are left with

ω(v1 , . . . , vi , . . . , vj , . . . , vn ) + ω(v1 , . . . , vj , . . . , vi , . . . , vn ) = 0F .

Example: Let dimF V = 2 and let

B = (v1 , v2 ) and B∨ = (`1 , `2 )

be an ordered basis and its dual. Then, the function ω ∶ V 2 → F defined by

ω(u, v) = `1 (u)`2 (v) − `1 (v)`2 (u)

is multilinear and alternating (check it!). In addition,

ω(v1 , v2 ) = `1 (v1 )`2 (v2 ) − `1 (v2 )`2 (v1 ) = 1F .

Note that we proved in fact Theorem 6.2 (the existence part) for n = 2.
Suppose now that

(u, v) = (v1 , v2 ) [a b; c d],

namely,

u = a v1 + c v2   and   v = b v1 + d v2 .

Then,

ω(u, v) = ad − bc,

which should ring a bell. This is what we called the determinant of the matrix. ▲▲▲
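The defining properties of this two-dimensional volume form are quickly verified numerically; a sketch (with B the standard basis of R2 , so that ℓ1 , ℓ2 are the coordinate projections):

```python
import numpy as np

def omega(u, v):                     # l1(u)l2(v) - l1(v)l2(u) = ad - bc
    return u[0] * v[1] - v[0] * u[1]

u, v = np.array([2., 1.]), np.array([1., 3.])
a = 4.0
assert np.isclose(omega(a * u, v), a * omega(u, v))   # property (6.2)
assert np.isclose(omega(u + v, v), omega(u, v))       # property (6.1)
assert np.isclose(omega(u, v), -omega(v, u))          # anti-symmetry
assert np.isclose(omega(u, u), 0.0)                   # alternation
```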
In view of Theorem 6.9, we can replace Theorem 6.2 by the equivalent:

Theorem 6.11 Let V be an n-dimensional vector space over a field F. For


every ordered basis B = (v1 , . . . , vn ), there exists a unique multilinear alter-
nating function ω ∶ V n → F satisfying

ω(v1 , . . . , vn ) = 1F .

Proof : The proof is by induction on n = dimF V . Take first n = 1 and let B = (v1 ) be a basis for V . The linear form ω ∶ V → F satisfying ω(v1 ) = 1F is (multi)linear, alternating (in an empty sense) and normalized. It is unique, as there exists a unique linear form that is normalized (the linear forms form a 1-dimensional vector space, hence are all proportional to ω).
Assume that the statement holds for dimF V = n − 1 and let dimF V = n. Let
B = (v1 , . . . , vn )
be a basis for V and define the linear subspaces

L = Span{v1 } and H = Span{v2 , . . . , vn }

of V . We note that V = L ⊕ H, with dimF L = 1 and
dimF H = n−1. By the inductive assumption, there exists a unique multilinear
alternating function ωH ∶ H n−1 → F satisfying
ωH (v2 , . . . , vn ) = 1F .

Denote by pL ∶ V → V and pH ∶ V → V the projections on L and H parallel


to H and L, respectively. Every vector u ∈ V has a unique decomposition
u = λ(u) v1 + pH (u),

where λ ∶ V → F is the function satisfying

pL (u) = λ(u) v1 .

Note also that we can think of pH as a linear transformation V → H. Both


functions λ and pH are linear transformations (λ is a linear form).
We now define a function ω ∶ V^n → F as follows,

ω(u1 , . . . , un ) = ∑_{j=1}^{n} (−1)^{j+1} λ(uj ) ωH (pH (u1 ), . . . , p̂H (uj ), . . . , pH (un )),

where the “hat” over the j-th term means that this term has been omitted.
We now show that ω is a normalized, alternating multilinear function. Let’s
write it more explicitly.
ω(u1 , . . . , un ) = λ(u1 ) ωH (pH (u2 ), . . . , pH (un ))
− λ(u2 ) ωH (pH (u1 ), pH (u3 ), . . . , pH (un ))
+ λ(u3 ) ωH (pH (u1 ), pH (u2 ), pH (u4 ), . . . , pH (un ))
∓ ...
+ (−1)n+1 λ(un ) ωH (pH (u1 ), pH (u2 ), . . . , pH (un−1 )).
The function ω is multilinear: each of the terms in the sum is linear in each
of the uj ’s, either because λ is linear, or because pH is linear and ωH is
multilinear. The function ω is also alternating. Suppose, for example, that
u1 = u2 . In all of the summands but two, u1 and u2 are arguments of ωH ,
which is alternating, hence these terms vanish. Remain two terms, which in
this case are
λ(u1 ) ωH (pH (u2 ), pH (u3 ), . . . , pH (un ))
− λ(u2 ) ωH (pH (u1 ), pH (u3 ), . . . , pH (un )) = 0F .
You may convince yourself that this would happen whenever ui = uj for i ≠ j.
As for the normalization, since λ(vi ) = 0F and pH (vi ) = vi for all i ≥ 2,

ω(v1 , . . . , vn ) = λ(v1 ) ωH (pH (v2 ), . . . , pH (vn )) = 1F ⋅ 1F = 1F .

We have thus proved that ω is a volume form on V .


It remains to prove the uniqueness. Let η ∶ V n → F be a volume form on V
satisfying
η(v1 , . . . , vn ) = 1F .

Let (u1 , . . . , un ) ∈ V n . If this sequence of vectors is linearly-dependent, then


η(u1 , . . . , un ) = 0F = ω(u1 , . . . , un ).
Otherwise, (u1 , . . . , un ) is a basis, and there exists an invertible matrix P ∈
GLn (F) such that
(u1 , . . . , un ) = (v1 , . . . , vn )P.
Such a P can be written as a product of elementary matrices of type Dkk (a)
and Tk` (a). By (6.5) and (6.6),
ω((v1 , . . . , vn )Dkk (a)) = a ω(v1 , . . . , vn )
=a
= a η(v1 , . . . , vn )
= η((v1 , . . . , vn )Dkk (a)),
and
ω((v1 , . . . , vn )Tk` (a)) = ω(v1 , . . . , vn )
= 1F
= η(v1 , . . . , vn )
= η((v1 , . . . , vn )Tk` (a)).
Proceeding inductively
ω((v1 , . . . , vn )P ) = η((v1 , . . . , vn )P ),
proving that ω = η for all entries (u1 , . . . , un ). n

Exercises

(easy) 6.1 Let V be a vector space over F and let k ∈ N (not necessarily
the dimension of V ). We denote by Mult(k, V, F) the set of functions f ∶
V k → F that are multilinear (it is a subspace of Func(V k , F)). Show that
Mult(k, V, F) is a vector space over F.
Solution 6.1: Let f, g ∈ Mult(k, V, F), and let u1 , . . . , uk ∈ V , v ∈ V and a ∈ F. Then,
(f + g)(u1 + av, u2 , . . . , uk ) = f (u1 + av, u2 , . . . , uk ) + g(u1 + av, u2 , . . . , uk )
= f (u1 , u2 , . . . , uk ) + g(u1 , u2 , . . . , uk )
+ a f (v, u2 , . . . , uk ) + a g(v, u2 , . . . , uk )
= (f + g)(u1 , u2 , . . . , uk ) + a(f + g)(v, u2 , . . . , uk ),

thus showing that f + g is linear in its first argument. We proceed similarly to show that it is linear in each argument, and likewise for a f .

(intermediate) 6.2 Let V be a three-dimensional vector space over a field


F. Let
B = (v1 , v2 , v3 ) and B∨ = (`1 , `2 , `3 )
be a basis for V and its dual. Consider the function f ∶ V 3 → F defined by

f (u, v, w) = `1 (u)(`2 (v)`3 (w) − `3 (v)`2 (w))


− `2 (u)(`1 (v)`3 (w) − `3 (v)`1 (w))
+ `3 (u)(`1 (v)`2 (w) − `2 (v)`1 (w)).

Show that f is a normalized volume form on V .

Solution 6.2: f is clearly multilinear. It is easy to see that it is also alternating.


Finally, it is immediate that f (v1 , v2 , v3 ) = 1F .

(intermediate) 6.3 Let V be a four-dimensional vector space over a field


F. Let
B = (v1 , v2 , v3 , v4 ) and B∨ = (`1 , `2 , `3 , `4 )
be a basis for V and its dual. Write using the dual basis a normalized volume
form on V .

Solution 6.3: Just follow the idea of the previous example: start with
f (u, v, w, x) = `1 (u)`2 (v)`3 (w)`4 (x),

and then “anti-symmetrize” it, by adding or subtracting similar terms with the `i ’s inter-
changed. This yields,

f (u, v, w, x) = `1 (u)`2 (v)`3 (w)`4 (x) − `1 (u)`2 (v)`4 (w)`3 (x)
− `1 (u)`3 (v)`2 (w)`4 (x) + `1 (u)`3 (v)`4 (w)`2 (x)
+ `1 (u)`4 (v)`2 (w)`3 (x) − `1 (u)`4 (v)`3 (w)`2 (x)
+ 18 more terms, each carrying the sign of the corresponding permutation.

6.5 Determinants
Let V be an n-dimensional vector space over a field F. Let B = (v1 , . . . , vn )
be an ordered basis and let ω be a volume form on V . By the definition of a
basis, every (u1 , . . . , un ) ∈ V n has a unique representation as

(u1 , . . . , un ) = (v1 , . . . , vn )A

for some matrix A ∈ Mn (F). That is,

ω(u1 , . . . , un ) = ω((v1 , . . . , vn )A).

The right-hand side only depends on the matrix A. Consider then the function

f (A) = ω((v1 , . . . , vn )A) / ω(v1 , . . . , vn ).
We note that it satisfies the following properties:

(a) If A = In , then f (A) = 1F .


(b) If
(u1 , . . . , un ) = (v1 , . . . , vn )A,
and A has two columns that are identical, say,

Coli (A) = Colj (A),

then ui = uj , hence ω(u1 , . . . , un ) = 0F . It follows that f (A) = 0F .


(c) By the distributivity of matrix multiplication, suppose that A, A′ and A″ are identical except for their i-th columns, where

Coli (A) = Coli (A′ ) + Coli (A″ ).

Then the n-tuples

(v1 , . . . , vn )A,   (v1 , . . . , vn )A′   and   (v1 , . . . , vn )A″

coincide except for their i-th entries, and the i-th entry of the first is the sum of the i-th entries of the other two. By the multilinearity of volume forms,

ω((v1 , . . . , vn )A) = ω((v1 , . . . , vn )A′ ) + ω((v1 , . . . , vn )A″ ).

Dividing both sides by ω(v1 , . . . , vn ), we obtain

f (A) = f (A′ ) + f (A″ ).

(d) Similarly, if A′ is obtained from A by multiplying the i-th column by c,

Coli (A′ ) = c Coli (A),

then (v1 , . . . , vn )A′ differs from (v1 , . . . , vn )A only in the i-th entry, which is c times larger. By the homogeneity of the volume form,

ω((v1 , . . . , vn )A′ ) = c ω((v1 , . . . , vn )A).

Dividing both sides by ω(v1 , . . . , vn ), we obtain

f (A′ ) = c f (A).

Let’s summarize: given an n-tuple of vectors (v1 , . . . , vn ),

ω((v1 , . . . , vn )A) = f (A) ω(v1 , . . . , vn ),

where the function function f ∶ Mn (F) → F satisfies the following properties:

(a) It is column-wise multilinear.


(b) It is column-wise alternating, i.e., f (A) = 0F if A has two identical
columns.
(c) f (In ) = 1F .

Definition 6.12 A function f ∶ Mn (F) → F is called a determinant


(!‫ )דטרמיננטה‬if

(a) It is column-wise multilinear.


(b) It is column-wise alternating, i.e., f (A) = 0F if A has two identical
columns.
(c) It is normalized, i.e., f (In ) = 1F .

Proposition 6.13 For every field F and n ∈ N there exists a unique deter-
minant function f ∶ Mn (F) → F. We denote this function either by A ↦ det A
or by A ↦ ∣A∣.

Proof : If we view a matrix A ∈ Mn (F) as a sequence of column-vectors,

A = (Col1 (A), . . . , Coln (A)) ,

then the determinant can be viewed as a function

f ∶Vn →F where V = Fncol .

The requirements on f are precisely that it is a volume form normalized such


that
f (e1 , . . . , en ) = 1F .
By Theorem 6.2 such a function exists and is unique. n

Corollary 6.14 Let ω ∶ V n → F be a volume form on a vector space V . For


every n-tuple (v1 , . . . , vn ) ∈ V n and matrix A ∈ Mn (F),

ω((v1 , . . . , vn )A) = det(A) ω(v1 , . . . , vn ).

We have thus obtained a means for calculating the volume of every n-tuple
of vectors given its value for a basis, assuming we know how to calculate the
determinant of a matrix.
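Corollary 6.14 can be sanity-checked numerically in R2 , where the signed area ω(u, v) = u^1 v^2 − v^1 u^2 of the previous section serves as the volume form (a sketch with randomly chosen data):

```python
import numpy as np

def omega(u, v):
    return u[0] * v[1] - v[0] * u[1]

rng = np.random.default_rng(1)
V = rng.standard_normal((2, 2))      # columns (v_1, v_2)
A = rng.standard_normal((2, 2))
W = V @ A                            # columns of (v_1, v_2) A

assert np.isclose(omega(W[:, 0], W[:, 1]),
                  np.linalg.det(A) * omega(V[:, 0], V[:, 1]))
```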

Exercises

(easy) 6.4 Let A ∈ Mn (F) and λ ∈ F. Show that

det(λA) = λn det(A).

Solution 6.4: This is an immediate consequence of the multilinearity of the determinant: each of the n columns contributes a factor of λ.

(easy) 6.5 Let A ∈ Mn (F) be such that its n-th column is a linear combi-
nation of the other columns. Show that

det A = 0F .

Solution 6.5: Suppose that

Coli (A) = ∑_{j≠i} cj Colj (A).

By multilinearity, det A is a sum of determinants in each of which the i-th column has been replaced by cj times the j-th column. Once again by multilinearity, the cj factors out, and we remain with the determinant of a matrix having two equal columns, hence zero.

(easy) 6.6 Let A ∈ Mn (F) be such that a^j_i = 0F for all i < j. Show that det A = a^1_1 a^2_2 ⋯ a^n_n .

Solution 6.6: Since the determinant is invariant under adding to a column a multiple of another column, one can inductively eliminate all the off-diagonal entries of the matrix without modifying the determinant. One is then left with a diagonal matrix having the same diagonal entries as the original matrix, whose determinant is the product of those entries.

(intermediate) 6.7 In each of the following items is given a function f ∶ M3 (R) → R. Determine whether it is (a) column-wise multilinear, (b) linear, or (c) neither:

(a) f (A) = 1F .
(b) f (A) = 0F .
(c) f (A) = a11 + a22 + a33 .
(d) f (A) = a11 a11 + 2a11 a22 .
(e) f (A) = −a11 a12 a33 .
(f) f (A) = a12 a23 a31 + a13 a21 a32 .

Solution 6.7: (a) Neither (b) Both linear and multilinear (c) Linear (d) Neither (e)
Multilinear (f) Multilinear.

(easy) 6.8 Show that


⎡s1 a + t1 s2 a + t2 s3 a + t3 ⎤
⎢ ⎥
⎢ ⎥
det ⎢ s1 b + t1 s2 b + t2 s3 b + t3 ⎥ = 0F
⎢ ⎥
⎢ s1 c + t1 s2 c + t2 s3 c + t3 ⎥
⎣ ⎦
for all a, b, c, s1 , s2 , s3 , t1 , t2 , t3 ∈ F.

Solution 6.8: The columns are linearly-dependent as


α Col1 +β Col2 +γ Col3 = 0,

for some α, β, γ ∈ F, not all zero, satisfying

αs1 + βs2 + γs3 = 0 and αt1 + βt2 + γt3 = 0.

A system of two equations in three unknowns always has non-trivial solutions.

(easy) 6.9 Let A, B ∈ M3 (R) be given by

A = [a^1 b^1 c^1; a^2 b^2 c^2; a^3 b^3 c^3]

and

B = [a^1 − 4b^1 + 9c^1  2b^1  3c^1; a^2 − 4b^2 + 9c^2  2b^2  3c^2; a^3 − 4b^3 + 9c^3  2b^3  3c^3].

What is det(B) if det(A) = 3/2?

Solution 6.9: Adding suitable multiples of the second and third columns to the first,

det B = det [a^1 2b^1 3c^1; a^2 2b^2 3c^2; a^3 2b^3 3c^3] = 6 det A,

hence the answer is 9.

(intermediate) 6.10 (a) Find a function f ∶ M3 (R) → R which is multilinear with respect to its columns and alternating, but not normalized.
(b) Find a function f ∶ M3 (R) → R which is alternating and normalized, but not multilinear with respect to its columns.
(c) Find a function f ∶ M3 (R) → R which is multilinear with respect to its columns and normalized, but not alternating.

6.6 Calculating determinants


The determinant of a matrix is defined via three properties: column-wise
multilinearity, column-wise alternation, and normalization. In this section we
turn these defining properties into an algorithm for calculating determinants.

Proposition 6.15 Let Dkk (a) and Tk` (a) be the elementary matrices defined
by (6.3) and (6.4). Then, for all A ∈ Mn (F),

det(A Dkk (a)) = a det(A) and det(A Tk` (a)) = det(A).

In particular,

det(Dkk (a)) = a and det(Tk` (a)) = 1F ,

so that for every elementary matrix E

det(A E) = det(E) det(A). (6.7)

Proof : This is an immediate consequence of the multilinearity and alterna-


tion of the determinant. But we can also look at it differently. Let ω be a
volume form on an n-dimensional vector space V over F. By (6.5) and (6.6),
with (v1 , . . . , vn ) replaced by (v1 , . . . , vn )A,

ω((v1 , . . . , vn )ADkk (a)) = a ω((v1 , . . . , vn )A),

and
ω((v1 , . . . , vn )ATk` (a)) = ω((v1 , . . . , vn )A).
Dividing by ω(v1 , . . . , vn ), we obtain the desired result. n

Corollary 6.16 Let E1 , . . . , En be a sequence of elementary matrices.


Then,
det(E1 . . . En ) = det(E1 )⋯ det(En ).

Proof : Apply (6.7) (n − 1) times. n

Proposition 6.17 Let A ∈ Mn (F). Then, A ∈ GLn (F) if and only if det A ≠
0.

Proof : If A ∈ GLn (F), then it is a product of elementary matrices,

A = E1 ⋯En ,

and since det Ei ≠ 0 for all i, it follows from the previous corollary that

det A = det(E1 )⋯ det(En ) ≠ 0F .

Conversely, if A is not invertible, then it has a column linearly-dependent on


the other columns, hence det A = 0F . n

Proposition 6.18 Let A, B ∈ Mn (F). Then,

det(AB) = det(A) det(B).

Proof : If either A or B are not invertible, then AB is not invertible and both
sides of the equation vanish. Otherwise, both A and B can be written as
products of elementary matrices,

A = E1 ⋯En and B = F1 ⋯Fk .

Then,

det(AB) = det(E1 )⋯ det(En ) ⋅ det(F1 )⋯ det(Fk ) = det(A) det(B).

Example: You may verify that

[3 1; 2 −1] = [1 1; 0 1] [1 0; 2 1] [1 0; 0 −5] [1 2; 0 1].

Hence

det [3 1; 2 −1] = 1 ⋅ 1 ⋅ (−5) ⋅ 1 = −5.
▲▲▲
Such a means for calculating determinants is not very convenient. A more
systematic way hinges on the proof of Theorem 6.11, which we remind, was

inductive on n. The determinant of a matrix can be viewed as a column-wise


multilinear, column-wise alternating function ω ∶ (Fncol )n → F, normalized
such that ω(e1 , . . . , en ) = 1F .
Define

L = Span{e1 }   and   H = Span{e2 , . . . , en }.

Then,

pL ((a^1 , . . . , a^n )^t ) = (a^1 , 0, . . . , 0)^t   and   pH ((a^1 , . . . , a^n )^t ) = (0, a^2 , . . . , a^n )^t ,

so that

λ((a^1 , . . . , a^n )^t ) = a^1 ,

and we view pH as a function F^n_col → F^{n−1}_col (dropping the first entry). By the construction in the proof of Theorem 6.11, the volume form ωH is the unique normalized volume form on F^{n−1}_col , which is nothing but the determinant for (n − 1) × (n − 1) matrices.
Thus, we obtain the following formula for the determinant,

det A = ∑_{j=1}^{n} (−1)^{j+1} λ(Colj (A)) det(pH (Col1 (A)), . . . , p̂H (Colj (A)), . . . , pH (Coln (A)))
      = ∑_{j=1}^{n} (−1)^{j+1} a^1_j det(Col^1_1 (A), . . . , Ĉol^1_j (A), . . . , Col^1_n (A)),        (6.8)

where Col^i_j (A) is the j-th column of A from which the i-th entry has been deleted, and the hat denotes omission.

Example: For n = 2,

|a b; c d| = a |d| − b |c| = ad − bc.

For n = 3,

|a b c; d e f; g h i| = a |e f; h i| − b |d f; g i| + c |d e; g h|.

▲▲▲
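The recursion translates directly into code. The sketch below (numpy-based) implements the first-row expansion literally; note that its cost grows like n!, so it illustrates the definition rather than a practical algorithm:

```python
import numpy as np

def det_rec(A):
    # Cofactor expansion along the first row, as in (6.8).
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)  # drop row 1, column j+1
        total += (-1) ** j * A[0, j] * det_rec(minor)
    return total

A = np.array([[1., 3., 4.], [7., 2., 1.], [9., 3., 2.]])
assert np.isclose(det_rec(A), np.linalg.det(A))   # both give -2
```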

Exercises

(intermediate) 6.11 Let D ∶ Mn (F) → F satisfy

D(AB) = D(A) D(B)

for all A, B ∈ Mn (F).

(a) Show that if D(In ) = 0F then D is the zero function.


(b) Show that if D(In ) ≠ 0F then D(In ) = 1F and D(A) ≠ 0F if A is invert-
ible.

Solution 6.11: (a) For every A ∈ Mn (F),


D(A) = D(A In ) = D(A) D(In ) = 0F .

(b) If D(In ) ≠ 0F , then

(D(In ))^2 = D(In In ) = D(In ),
i.e., D(In ) = 1F . If A is invertible, then

1F = D(In ) = D(A A−1 ) = D(A) D(A−1 ),

from which we conclude that D(A) ≠ 0F .

(easy) 6.12 Let A ∈ M2 (F) and let λ ∈ F. Show that

det(λI2 − A) = λ2 − λ tr A + det A,

where tr A is the sum of its diagonal terms.

Solution 6.12: For


a b
A=[ ],
c d
we have
λ−a −b
det(λI2 − A) = ∣ ∣ = λ2 − (a + d)λ + (ad − bc),
−c λ−d
which is the desired result.

(intermediate) 6.13 Let A ∈ M2 (F) such that A2 = 0.



(a) Show that det A = 0F .


(b) Show that λI2 − A is invertible for every λ ≠ 0F .
(c) Show that for every λ ∈ F, det(λI2 − A) = λ2 .

Solution 6.13: (a) We have


0F = det(A2 ) = (det A)2 ,
hence det A = 0F .
(b+c) We observe that for

A = [a b; c d]

we have

(tr A)^2 = (a + d)^2 = (a^2 + 2bc + d^2 ) + 2(ad − bc) = tr(A^2 ) + 2 det A = 0F ,

hence tr A = 0F , and by the previous exercise, det(λI2 − A) = λ^2 , which in particular implies that λI2 − A is invertible for all λ ≠ 0F .

6.7 Determinants and transposition


Determinants are invariant under certain column operations; what about invariance under row operations?

Definition 6.19 Let A ∈ Mm×n (F). We denote by A^t ∈ Mn×m (F) its transpose (המטריצה המשוחלפת), given by

(A^t)^j_i = A^i_j .

In the next semester you will see why such an operation makes sense.

Example: If

A = [1 2 3; 4 5 6],

then

A^t = [1 4; 2 5; 3 6]. ▲▲▲

Lemma 6.20 Let A ∈ Mm×n (F) and let B ∈ Mn×k (F). Then

(AB)t = B t At .

Proof : Just follow the definitions:

(AB)^i_j = ∑_{k=1}^{n} a^i_k b^k_j ,   hence   ((AB)^t )^i_j = ∑_{k=1}^{n} a^j_k b^k_i .

On the other hand,

(B^t A^t )^i_j = ∑_{k=1}^{n} (B^t )^i_k (A^t )^k_j = ∑_{k=1}^{n} b^k_i a^j_k .

Lemma 6.21 Let E ∈ Mn (F) is an elementary matrix, then

det E t = det E.

Proof : This follows from the fact that

(Dkk (a))t = Dkk (a) and (Tk` (a))t = T`k (a).

Lemma 6.22 A ∈ Mn (F) is invertible if and only if At is invertible.

Proof : A is invertible if and only if its columns are linearly-independent and


if and only if its rows are linearly-independent. The claim follows by noting
that the rows of A are the columns of At and vice-versa. n

Corollary 6.23 Let A ∈ Mn (F). Then,

det At = det A.

Proof : If A ∈ GLn (F), then it can be written as a product of elementary


matrices,
A = E1 ⋯Ek .
By Lemma 6.20,
At = Ekt ⋯E1t .
Combining with Proposition 6.18 and Lemma 6.21,
det At = det Ekt ⋯ det E1t = det E1 ⋯ det Ek = det A.
If A ∈/ GLn (F), then At ∈/ GLn (F), and
det A = 0F = det At .
n
The implication of this last proposition is that you can evaluate determinants
using row operations; for example, the determinant does not change if a
multiple of one row is added to another row.
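A quick numerical check of both facts, invariance under transposition and under adding a multiple of one row to another (a sketch with random data):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))
assert np.isclose(np.linalg.det(A.T), np.linalg.det(A))   # Corollary 6.23

B = A.copy()
B[2] += 3.0 * B[0]                      # add a multiple of row 0 to row 2
assert np.isclose(np.linalg.det(B), np.linalg.det(A))
```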

Exercises

(intermediate) 6.14 Find the determinant of the following matrix,

[a b 0 0; c d 0 0; ∗ ∗ e f; ∗ ∗ g h],

where the asterisks can represent any scalar.

Solution 6.14: In the case of a lower-triangular block matrix, the determinant is the product of the determinants of the diagonal blocks,

(ad − bc)(eh − f g).

(intermediate) 6.15 Let A ∈ F^n_col and let B ∈ F^n_row for n > 1. What can be said about

det(AB) ?

Solution 6.15: It is zero, because AB is not invertible (see Exercise 2.58).

(intermediate) 6.16 Find

det [a b c d e; f g h i j; k l 0 0 0; m n 0 0 0; p q 0 0 0].

Solution 6.16: It is zero, because the right-most three columns are not linearly-independent: their non-zero entries all lie in the first two rows, hence they span a space of dimension at most two.

(easy) 6.17 Calculate the determinants of the following matrices:

[9 5 6 4; 7 0 3 0; 2 0 0 0; 8 6 4 7],

[1 0 0 0 3; 13 5 14 6 17; 0 0 2 0 0; 11 8 15 10 19; 7 0 0 0 9],

[1 1 1 1 1 1; 0 1 1 1 1 1; 1 0 1 1 1 1; 1 1 0 1 1 1; 1 1 1 0 1 1; 1 1 1 1 0 1].

Solution 6.17: For the left-most one, expanding along the third row,

det = 2 |5 6 4; 0 3 0; 6 4 7| = 2 ⋅ 3 |5 4; 6 7| = 6 ⋅ 11 = 66.

For the middle one, expanding along the third row, then subtracting three times the first row from the last row and expanding along that row,

det = 2 |1 0 0 3; 13 5 6 17; 11 8 10 19; 7 0 0 9| = 2 |1 0 0 3; 13 5 6 17; 11 8 10 19; 4 0 0 0| = −8 |0 0 3; 5 6 17; 8 10 19| = −24(50 − 48) = −48.

For the one on the right, by subtracting from each column the one to its left, we end up with a lower-triangular matrix having ones on its diagonal, hence its determinant is 1F .

(intermediate) 6.18 Calculate the determinants of the following n × n matrices, n > 2:

[0 1 1 … 1; 1 1 1 … 1; 1 1 2 … 1; ⋮ ⋱ ⋮; 1 1 1 … n−1],

[1 2 3 … n; 2 3 4 … n+1; 3 4 5 … n+2; ⋮ ⋱ ⋮; n n+1 n+2 … 2n−1],

[a b b … b; b a b … b; b b a … b; ⋮ ⋱ ⋮; b b b … a].
(intermediate) 6.19 Let a1 , . . . , an ∈ F. Calculate

det [1 1 1 … 1; a1 a2 a3 … an; (a1)^2 (a2)^2 (a3)^2 … (an)^2; ⋮ ⋯ ⋮; (a1)^{n−1} (a2)^{n−1} (a3)^{n−1} … (an)^{n−1}].

(intermediate) 6.20 Let A, B ∈ M3 (R). Find det(2A^2 B^{−1}), given that det A = 5 and det B = 10.

Solution 6.20: We have

det(2A^2 B^{−1}) = 2^3 (det A)^2 (det B)^{−1} = 8 ⋅ 25 ⋅ 10^{−1} = 20.

(intermediate) 6.21 Let A, B ∈ M3 (R). Find det(5AB^3 A^{−1} B^{−1}), given that det A ≠ 0 and det B = 2.

Solution 6.21: We have

det(5AB^3 A^{−1} B^{−1}) = 5^3 det A (det B)^3 (det A)^{−1} (det B)^{−1} = 500.

(intermediate) 6.22 For each of the following matrices, calculate the determinant and determine for which values of the parameters the matrix is invertible:

[1 a−2 −a+1; 0 2 a−1; a a^2 a^2−1],   [b−3 −2; 1 b−6],   [c−1 4; 2 c−3],   [1 d d^2; d d^2 1; d^2 1 d].

Solution 6.22: Take the left-most one. Performing column operations that do not change the determinant,

|1 a−2 −a+1; 0 2 a−1; a a^2 a^2−1| = |1 −2 −a+1; 0 2 a−1; a 0 a^2−1| = |1 0 0; 0 2 a−1; a 0 a^2−1| = 2(a^2 − 1),

hence the matrix is invertible unless a = ±1.

(intermediate) 6.23 Let A ∈ Mn (R) satisfy A2 = −A − In .

(a) Show that A is invertible.


(b) Show that A3 = In .
(c) Find det A.

Solution 6.23: (a) Since


In = −A(A + In ),
it follows that A is invertible. Moreover, A−1 = −(A + In ) = A2 .
(b) Thus, A3 = A2 A = A−1 A = In .
(c) We have
1 = det(A3 ) = (det A)3 ,
hence det A = 1.

(intermediate) 6.24 Let A, B ∈ Mn (F).

(a) Show that if AB + B is invertible then so is BA + B.


(b) Show that if A2 B − A2 is invertible then so is BA − A.
(c) Show that if AB 2 − A is invertible then so is BA − A.
(d) Show that if A2 −B 2 is invertible and AB = BA, then A+B is invertible.

(intermediate) 6.25 Calculate the determinant of the matrix


⎡ a2 b2 c2 d2 ⎤⎥

⎢(a + 1)2 (b + 1)2 (c + 1)2 (d + 1)2 ⎥
⎢ ⎥
⎢ ⎥.
⎢(a + 2)2 (b + 2)2 (c + 2)2 (d + 2)2 ⎥
⎢ ⎥
⎢(a + 3)2 (b + 3)2 (c + 3)2 (d + 3)2 ⎥
⎣ ⎦
326 Chapter 6

Solution 6.25: Subtracting the first row from each of the other rows, and then suitable multiples of the second row from the third and fourth, we get

      ∣  a²    b²    c²    d² ∣   ∣  a²    b²    c²    d² ∣
det = ∣2a+1  2b+1  2c+1  2d+1∣ = ∣2a+1  2b+1  2c+1  2d+1∣ = 0,
      ∣4a+4  4b+4  4c+4  4d+4∣   ∣  2     2     2     2  ∣
      ∣6a+9  6b+9  6c+9  6d+9∣   ∣  6     6     6     6  ∣

since the last two rows of the right-hand determinant are proportional.

6.8 Cramer’s formula


Let’s look back on how we calculated the determinant of a matrix. The construction was inductive, based on the proof of the existence of a normalized volume form. Let’s write the formula in a more compact form. First, let’s denote by

A_{i̸},   A_{j̸}   and   A_{i̸j̸}

the matrix A with the i-th row removed, with the j-th column removed, and with both the i-th row and the j-th column removed, respectively (in the double subscript, the first index always refers to the row and the second to the column). Formula (6.8) for the determinant of a matrix can be written as

det A = ∑_{j=1}^{n} (−1)^{j+1} a_{1j} det A_{1̸j̸}.   (6.9)

This is of course a recursive formula. It is worth recalling why it is correct. The determinant is the unique function Mn(F) → F which is column-wise multilinear, column-wise alternating, and satisfies det In = 1. (Since it is invariant under transposition, it is also the unique function which is row-wise multilinear, row-wise alternating and satisfies det In = 1.)
The fact that the inductive definition (6.9) satisfies these requirements is proved by induction on n. For example, assume that the determinant is alternating on M_{n−1}(F), and let A ∈ Mn(F) have its k-th column equal to its ℓ-th column. Then A_{1̸j̸} has two identical columns unless j = k or j = ℓ; that is, unless j = k or j = ℓ, the inductive assumption gives det A_{1̸j̸} = 0F. Thus,

det A = (−1)^{k+1} a_{1k} det A_{1̸k̸} + (−1)^{ℓ+1} a_{1ℓ} det A_{1̸ℓ̸}.

Now a_{1k} = a_{1ℓ}, and the matrices A_{1̸k̸} and A_{1̸ℓ̸} are almost identical; they may only differ in the ordering of their columns. If, for example, ℓ = k + 1, then A_{1̸k̸} = A_{1̸ℓ̸}, hence

det A = (−1)^{k+1} a_{1k} det A_{1̸k̸} + (−1)^{k+2} a_{1k} det A_{1̸k̸} = 0F.

If, for example, ℓ = k + 2, then A_{1̸k̸} and A_{1̸ℓ̸} differ by the interchange of two adjacent columns, which implies that their determinants differ by a sign, i.e.,

det A = (−1)^{k+1} a_{1k} det A_{1̸k̸} + (−1)^{k+3} a_{1k} (−det A_{1̸k̸}) = 0F.

Keep “playing with this” to convince yourself that it does not matter how far apart k and ℓ are; in every case, the determinant for n × n matrices is alternating.
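
The recursion (6.9) also translates directly into a short program; here is a minimal Python sketch (the function name and the list-of-lists representation are our own illustrative choices):

    def cofactor_det(A):
        # Determinant by expansion along the first row, as in (6.9).
        # Works over any type supporting +, - and * (ints, Fractions, ...).
        n = len(A)
        if n == 1:
            return A[0][0]
        total = 0
        for j in range(n):
            # minor: remove the first row and the (j+1)-th column
            minor = [row[:j] + row[j + 1:] for row in A[1:]]
            # (-1) ** j here is (-1) ** (j+1) in the 1-indexed notation of (6.9)
            total += (-1) ** j * A[0][j] * cofactor_det(minor)
        return total

    print(cofactor_det([[1, 3, 4], [7, 2, 1], [9, 3, 2]]))  # -2

The same 3 × 3 matrix is expanded by hand in the examples below.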
Formula (6.9) delineates the first row as “special”; there is of course nothing special about it. We could have selected any row i, and written instead

det A = ∑_{j=1}^{n} (−1)^{i+j} a_{ij} det A_{i̸j̸}.   (6.10)

Example: Take the 3 × 3 matrix


A = ⎡1 3 4⎤
    ⎢7 2 1⎥
    ⎣9 3 2⎦

The term (−1)^{i+j} yields the following pattern of signs:

⎡+ − +⎤
⎢− + −⎥
⎣+ − +⎦
Take for example the second row, i = 2. The relevant minors are

A_{2̸1̸} = ⎡3 4⎤   A_{2̸2̸} = ⎡1 4⎤   A_{2̸3̸} = ⎡1 3⎤
          ⎣3 2⎦             ⎣9 2⎦             ⎣9 3⎦

so that

det A = −7 ∣3 4; 3 2∣ + 2 ∣1 4; 9 2∣ − 1 ∣1 3; 9 3∣ = (−7)(−6) + 2(−34) + (−1)(−24) = −2.

▲▲▲

Since the determinant is invariant under transposition, we could just as well have chosen a distinguished column, say the j-th column, and summed over all rows:

det A = ∑_{i=1}^{n} (−1)^{i+j} a_{ij} det A_{i̸j̸}.   (6.11)

Example: Take the same matrix as in the previous example, and take, say, the third column, j = 3. The relevant minors are

A_{1̸3̸} = ⎡7 2⎤   A_{2̸3̸} = ⎡1 3⎤   A_{3̸3̸} = ⎡1 3⎤
          ⎣9 3⎦             ⎣9 3⎦             ⎣7 2⎦

so that

det A = 4 ∣7 2; 9 3∣ − 1 ∣1 3; 9 3∣ + 2 ∣1 3; 7 2∣ = 4 ⋅ 3 − (−24) + 2(−19) = −2.
▲▲▲
Next, we relate determinants to the first subject of this course, the solution of linear systems; we focus on the case where the number of equations equals the number of unknowns. Let A ∈ Mn(F) and b ∈ Fⁿcol, and denote by

A_{j→b}

the matrix in which the j-th column of A has been replaced by the column vector b.

Theorem 6.24 (Cramer’s formula) Let A and b be as above, and suppose that x ∈ Fⁿcol satisfies the equation

Ax = b.

Then, for every j = 1, …, n,

det A_{j→b} = x_j det A.

In particular, if A is invertible, then

x_j = det A_{j→b} / det A.

Example: Consider the linear system

⎡5 3⎤ ⎡x₁⎤   ⎡4⎤
⎣2 1⎦ ⎣x₂⎦ = ⎣0⎦ .

Then,

x₁ = ∣4 3; 0 1∣ / ∣5 3; 2 1∣ = 4/(−1) = −4   and   x₂ = ∣5 4; 2 0∣ / ∣5 3; 2 1∣ = (−8)/(−1) = 8,

and indeed,

⎡5 3⎤ ⎡−4⎤   ⎡4⎤
⎣2 1⎦ ⎣ 8⎦ = ⎣0⎦ .
But let’s actually go through the steps of the proof below. We have

⎡4⎤   ⎡5x₁ + 3x₂⎤
⎣0⎦ = ⎣2x₁ + x₂ ⎦ .

Hence,

∣4 3; 0 1∣ = ∣5x₁+3x₂ 3; 2x₁+x₂ 1∣ = ∣5x₁ 3; 2x₁ 1∣ = x₁ ∣5 3; 2 1∣,

and

∣5 4; 2 0∣ = ∣5 5x₁+3x₂; 2 2x₁+x₂∣ = ∣5 3x₂; 2 x₂∣ = x₂ ∣5 3; 2 1∣.
▲▲▲

Proof : The column b satisfies

b = ∑_{i=1}^{n} x_i Colᵢ(A).

Take b and place it instead of the j-th column of A; then

A_{j→b} = [Col₁(A) … ∑ᵢ₌₁ⁿ xᵢ Colᵢ(A) … Colₙ(A)],

where the sum is at the j-th column. By the multilinearity of the determinant,

det A_{j→b} = ∑_{i=1}^{n} x_i ∣Col₁(A) … Colᵢ(A) … Colₙ(A)∣,

where in the i-th summand Colᵢ(A) occupies the j-th position. By the alternation of the determinant, all the summands vanish except for the j-th one (for i ≠ j the i-th column appears twice), i.e.,

det A_{j→b} = x_j det A.   ∎
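
Cramer’s formula yields an immediate, if computationally expensive, solver for invertible systems; a minimal numpy sketch (the function name is ours; each determinant costs as much as a full elimination, so this is for insight rather than efficiency):

    import numpy as np

    def cramer_solve(A, b):
        # Solve Ax = b for invertible A via x_j = det A_{j->b} / det A.
        A = np.asarray(A, dtype=float)
        b = np.asarray(b, dtype=float)
        d = np.linalg.det(A)
        x = np.empty(len(b))
        for j in range(len(b)):
            Aj = A.copy()
            Aj[:, j] = b                 # replace the j-th column by b
            x[j] = np.linalg.det(Aj) / d
        return x

    print(cramer_solve([[5, 3], [2, 1]], [4, 0]))   # [-4.  8.]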
And with this, we obtain Cramer’s formula for the inverse matrix:

Theorem 6.25 (Cramer’s formula) Let A ∈ GLn(F) and denote B = A⁻¹. Then the entries of B are given by

b_{ij} = (−1)^{i+j} det A_{j̸i̸} / det A

(note the index transposition: b_{ij} involves the minor with the j-th row and the i-th column removed).

Example: Let’s first verify that this coincides with the known formula for 2 × 2 matrices. For A = [a b; c d],

(A⁻¹)₁₁ = (−1)² det A_{1̸1̸}/det A = d/(ad − bc),     (A⁻¹)₁₂ = (−1)³ det A_{2̸1̸}/det A = −b/(ad − bc),

(A⁻¹)₂₁ = (−1)³ det A_{1̸2̸}/det A = −c/(ad − bc),     (A⁻¹)₂₂ = (−1)⁴ det A_{2̸2̸}/det A = a/(ad − bc),

i.e.,

[a b; c d]⁻¹ = 1/(ad − bc) ⋅ [d −b; −c a].

▲▲▲

Proof : The matrix B satisfies the equation AB = In, which we may rewrite as

A [Col₁(B) … Colₙ(B)] = [e₁ … eₙ].

That is,

A Colⱼ(B) = eⱼ.

By Cramer’s formula (Theorem 6.24),

b_{ij} = det A_{i→eⱼ} / det A.

Consider the numerator. The i-th column of the matrix A_{i→eⱼ} consists of zeros, except for a 1 at the j-th row. Hence, expanding along the i-th column as in (6.11),

det A_{i→eⱼ} = (−1)^{i+j} det A_{j̸i̸}.   ∎
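
As a sketch, Theorem 6.25 becomes a slow but transparent matrix-inversion routine; note the index transposition when forming the minor (the function name is ours):

    import numpy as np

    def cramer_inverse(A):
        # B[i, j] = (-1)^(i+j) det(A with row j and column i removed) / det A
        A = np.asarray(A, dtype=float)
        n = A.shape[0]
        d = np.linalg.det(A)
        B = np.empty((n, n))
        for i in range(n):
            for j in range(n):
                minor = np.delete(np.delete(A, j, axis=0), i, axis=1)
                B[i, j] = (-1) ** (i + j) * np.linalg.det(minor) / d
        return B

    A = np.array([[1, 3, 4], [7, 2, 1], [9, 3, 2]])
    print(np.allclose(cramer_inverse(A) @ A, np.eye(3)))   # True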

Exercises

(intermediate) 6.26 Solve the following linear systems over R using Cramer’s
formula.

(a)
X +2Y +3Z =6
4X +5Y +6Z = 15
7X +8Y +10Z = 25.

(b)
X +Y +Z = 11
2X −6Y −Z = 0
3X +4Y +2Z = 0.

(c)
3X −2Y =7
3Y −2Z = 6
−2X +3Z = −1.

(intermediate) 6.27 Invert the following matrices using Cramer’s formula,


⎡1 2 3 ⎤ ⎡−2 3 2 ⎤ ⎡cos θ 0 − sin θ⎤
⎢ ⎥ ⎢ ⎥ ⎢ ⎥
⎢ ⎥ ⎢ ⎥ ⎢ ⎥
A = ⎢4 5 6 ⎥ B=⎢6 0 3⎥ C=⎢ 0 1 0 ⎥.
⎢ ⎥ ⎢ ⎥ ⎢ ⎥
⎢7 8 10⎥ ⎢ 4 1 −1⎥ ⎢ sin θ 0 cos θ ⎥
⎣ ⎦ ⎣ ⎦ ⎣ ⎦
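
Hand computations for 6.26 and 6.27 can be checked numerically; a brief sketch (the value of θ is an arbitrary choice of ours):

    import numpy as np

    A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 10]], dtype=float)
    print(np.linalg.inv(A))        # compare with the answer via Cramer

    t = 0.3
    C = np.array([[np.cos(t), 0, -np.sin(t)],
                  [0,         1,          0],
                  [np.sin(t), 0,  np.cos(t)]])
    print(np.allclose(np.linalg.inv(C), C.T))   # True: C is orthogonal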

6.9 The determinant of a linear transformation

In our introduction of determinants we essentially proved the following theorem (Corollary 6.14):

Theorem 6.26 Let V be an n-dimensional vector space over F and let ω ∶


V n → F be a volume form on V . Then, for every ordered basis B and any
matrix A ∈ Mn (F), writing

(u1 . . . un ) = BA,

we have
ω (u1 . . . un ) = ω(B) det A.

This leads to the perhaps surprising corollary:

Corollary 6.27 For every non-degenerate volume form ω, every basis B


and every matrix A, the ratio

ω(BA)
ω(B)

depends neither on the volume form, nor on the basis.

And further,

Corollary 6.28 Let ω and η be two non-degenerate volume forms on V ,


then there exists a constant c ∈ F such that

ω = c η,

i.e., for every (u1 . . . un ),

ω (u1 . . . un ) = c η (u1 . . . un ) .

Proof : Let B be any ordered basis on V and let A ∈ Mn (F) be the unique
matrix satisfying
(u1 . . . un ) = BA.

Then,

ω(u₁ … uₙ) = ω(BA) = ω(B) ⋅ ω(BA)/ω(B) = ω(B) ⋅ η(BA)/η(B) = c η(u₁ … uₙ),

using Corollary 6.27 for the middle equality, where

c = ω(B)/η(B).   ∎
Since all non-degenerate volume forms are multiples of each other, they are essentially the same; they differ only by a choice of units. This observation implies that an operator on a vector space can be characterized by how much it magnifies volumes:

Theorem 6.29 Let ω be a non-degenerate volume form on a finitely-generated vector space V. Let f ∈ HomF(V, V) be a linear transformation. Let B = (v₁ … vₙ) be an ordered basis on V; we denote

f(B) = (f(v₁), …, f(vₙ)).

Then, the ratio

ω(f(B)) / ω(B)

depends neither on ω nor on B; it is a property of the linear transformation f alone, which we call the determinant of f.

Proof : Let A = [f]^B_B, i.e., f(B) = BA. Then,

ω(f(B))/ω(B) = ω(BA)/ω(B) = det A,

and the right-hand side depends neither on ω nor on B.   ∎
Thus, the determinant of f coincides with the determinant of its representing matrix, but this identity does not depend on the basis relative to which we represent f. This should perhaps not come as a surprise: if C is some other basis, then there exists an invertible matrix P such that

[f]^C_C = P⁻¹ [f]^B_B P.

By the properties of the determinant,

det[f]^C_C = det P⁻¹ ⋅ det[f]^B_B ⋅ det P.

Since det P⁻¹ det P = det In = 1F, we obtain

det[f]^C_C = det[f]^B_B,

i.e., the determinant of the representing matrix is independent of the representation; it is an intrinsic property of the transformation.
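
A quick numerical illustration of this basis-independence (a sketch; random similarity transform, numpy):

    import numpy as np

    rng = np.random.default_rng(1)
    M = rng.random((4, 4))            # a representing matrix [f]^B_B
    P = rng.random((4, 4))            # a change of basis, invertible a.s.
    M_C = np.linalg.inv(P) @ M @ P    # the representing matrix [f]^C_C
    print(np.isclose(np.linalg.det(M), np.linalg.det(M_C)))   # True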

Exercises

(intermediate) 6.28 Let V be a finitely-generated vector space and let B be an ordered basis for V. Let S, T ∈ HomF(V, V). Show that

det[T ○ S]^B_B = det[T]^B_B ⋅ det[S]^B_B.

Solution 6.28: This follows from the multiplicativity of the determinant, together with the fact that [T ○ S]^B_B = [T]^B_B [S]^B_B.

(intermediate) 6.29 Let V be a finitely-generated vector space and let B, C, D be ordered bases for V. Show that

(a) det[IdV]^B_C ≠ 0.
(b) (det[IdV]^B_C)⁻¹ = det[IdV]^C_B.
(c) det[IdV]^B_D = det[IdV]^C_D ⋅ det[IdV]^B_C.

(intermediate) 6.30 Let A ∈ Mn (F) and define a linear transformation


g ∶ Mn (F) → Mn (F),
g(X) = XA − AX.
Show that det g = 0F .

Solution 6.30: If A is the zero matrix then g is the zero function and its determinant
is clearly zero. Otherwise, since g(A) = 0, it follows that g is not invertible, and neither is
any of its representing matrices.

(intermediate) 6.31 Let A ∈ Mn (F) and define two linear transformations


L, R ∶ Mn (F) → Mn (F),

L(X) = AX and R(X) = XA.

Show that
det L = det R = (det A)n .
Hint: Separate the cases A ∈ GLn (F) and A ∈/ GLn (F).
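
For a numerical sanity check (not a proof): with column-stacking vec, vec(AX) = (In ⊗ A) vec(X) and vec(XA) = (Aᵀ ⊗ In) vec(X), so L and R are represented by Kronecker products; a numpy sketch:

    import numpy as np

    rng = np.random.default_rng(2)
    n = 3
    A = rng.random((n, n))
    L = np.kron(np.eye(n), A)      # represents X -> AX
    R = np.kron(A.T, np.eye(n))    # represents X -> XA
    dA = np.linalg.det(A)
    print(np.isclose(np.linalg.det(L), dA ** n),
          np.isclose(np.linalg.det(R), dA ** n))   # True True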
