
DEPARTMENT OF MATHEMATICS & STATISTICS

MATH1012
MATHEMATICAL THEORY AND METHODS

Acknowledgements: The following members of the Department of Mathematics and Statistics, UWA, have contributed in one way or another to the production of these Lecture Notes: A. Bassom, E. Cripps, A. Devillers, M. Giudici, L. Jennings, K. Judd, D. Hill, J. Hopwood, M. Matthews, A. Niemeyer, G. Royle, T. Stemler, L. Stoyanov.
Contents

1 Systems of linear equations
1.1 Systems of linear equations
1.1.1 Solutions to systems of linear equations
1.2 Solving linear equations
1.2.1 Elementary row operations
1.2.2 The augmented matrix
1.3 Gaussian elimination
1.4 Back substitution
1.5 A more advanced method: Gauss-Jordan elimination
1.6 Reasoning about systems of linear equations

2 Vector spaces and subspaces
2.1 The vector space Rn
2.2 Subspaces
2.2.1 Subspace proofs
2.2.2 Exercises
2.3 Spans and spanning sets
2.3.1 Spanning sets
2.4 Linear independence
2.5 Bases
2.5.1 Dimension
2.5.2 Coordinates

3 Matrices and determinants
3.1 Matrix algebra
3.1.1 Basic operations
3.2 Subspaces from matrices
3.2.1 The row space and column space
3.2.2 The null space
3.3 Solving systems of linear equations
3.4 Matrix inversion
3.4.1 Finding inverses
3.4.2 Characterising invertible matrices
3.5 Determinants
3.5.1 Calculating determinants
3.5.2 Properties of the determinant

4 Linear transformations
4.1 Introduction
4.2 Linear transformations and bases
4.3 Linear transformations and matrices
4.4 Rank-nullity theorem revisited
4.5 Composition
4.6 Inverses

5 Change of basis
5.1 Change of basis for vectors
5.2 Change of bases for linear transformations

6 Eigenvalues and eigenvectors
6.1 Introduction
6.2 Finding eigenvalues and eigenvectors
6.3 Some properties of eigenvalues and eigenvectors
6.4 Diagonalisation

7 Improper integrals
7.1 Improper integrals over infinite intervals
7.2 Improper integrals of unbounded functions over finite intervals
7.3 More complicated improper integrals

8 Sequences and series
8.1 Sequences
8.1.1 Bounded sequences
8.2 Infinite series
8.2.1 The integral test
8.2.2 More convergence tests for series
8.2.3 Alternating series
8.2.4 Absolute convergence and the ratio test
8.3 Power series
8.3.1 Taylor and MacLaurin series

9 Fourier series
9.1 Calculation of the Fourier coefficients
9.2 Functions of an arbitrary period
9.3 Convergence of Fourier series
9.4 Functions defined over a finite interval
9.5 Even and odd functions
9.6 Fourier cosine series for even functions
9.7 Fourier sine series for odd functions
9.8 Half-range expansions
9.9 Parseval's theorem (not for assessment)
9.10 Differentiation of Fourier series
9.11 Integration of Fourier series

10 Differential equations
10.1 Introduction
10.1.1 Solutions of differential equations
10.1.2 Verification of solutions of differential equations
10.2 Mathematical modelling with ordinary differential equations
10.3 First-order ordinary differential equations
10.3.1 Direction fields
10.3.2 Separation of variables
10.3.3 The integrating factor method
10.3.4 Initial conditions
10.4 Second-order ordinary differential equations
10.5 Linear homogeneous second-order ordinary differential equations with constant coefficients
10.6 Linear nonhomogeneous second-order ordinary differential equations with constant coefficients
10.6.1 Method of undetermined coefficients
10.6.2 Variation of parameters
10.7 Initial and boundary conditions

11 Laplace transforms
11.1 The Laplace transform and its inverse
11.1.1 Linearity of the Laplace transform
11.1.2 Existence of Laplace transforms
11.2 Inverse Laplace transforms of rational functions
11.3 The Laplace transform of derivatives and integrals of f(t)
11.4 Solving differential equations
11.5 Shift theorems
11.6 Derivatives of transforms
11.7 Convolution
11.8 Laplace transforms table

12 Appendix - Useful formulas

13 Index
1 Systems of linear equations

This chapter covers the systematic solution of systems of linear equations using Gaussian elimination and back-substitution, and the description, both algebraic and geometric, of their solution space.

Before commencing this chapter, students should be able to:

• Plot linear equations in 2 variables, and

• Add and multiply matrices.

After completing this chapter, students will be able to:

• Systematically solve systems of linear equations with many variables, and

• Identify when a system of linear equations has 0, 1 or infinitely many solutions, and

• Give the solution set of a system of linear equations in parametric form.

1.1 Systems of linear equations

A linear equation is an equation of the form

x + 2y = 4

where each term in the equation is either a number[1] (e.g. "6") or a numerical multiple of a variable (e.g. "2x", "4y"). If an equation involves powers or products of variables (x^2, xy, etc.) or any other functions (sin x, e^x, etc.) then it is not linear.

[1] You will often see the word scalar used to refer to a number, and scalar multiple to describe a numerical multiple of a variable.

A system of linear equations is a set of one or more linear equations considered together, such as

x + 2y = 4
x − y = 1

which is a system of two equations in the two variables[2] x and y.

[2] Often the variables are called "unknowns", emphasizing that solving a system of linear equations is a process of finding the unknowns.

A solution to a system of linear equations is an assignment of values to the variables such that all of the equations in the system are satisfied. For example, there is a unique solution to the system given above, which is

x = 2, y = 1.

Particularly when there are more variables, it will often be useful to give the solutions as vectors like (x, y) = (2, 1).

Margin note: In this case we could also just give the solution as (2, 1) where, by convention, the first component of the vector is the x-coordinate. Usually the variables will have names like x, y, z or x1, x2, . . . , xn and so we can just specify the vector alone and it will be clear which component corresponds to which variable.

If the system of linear equations involves just two variables, then we can visualise the system geometrically by plotting the solutions to each equation separately on the xy-plane, as illustrated in Figure 1.1. The solution to the system of linear equations is the point where the two plots intersect, which in this case is the point (2, 1).

[Figure 1.1: The two linear equations x + 2y = 4 and x − y = 1 plotted as intersecting lines in the xy-plane.]

It is easy to visualise systems of linear equations in two variables, but it is more difficult in three variables, where we need 3-dimensional plots. In three dimensions the solutions to a single linear equation such as

x + 2y − z = 4

form a plane in 3-dimensional space. While computer algebra systems can produce somewhat reasonable plots of surfaces in three dimensions, it is hard to interpret plots showing two or more intersecting surfaces.

Margin note: Recall from MATH1011 that this particular equation describes the plane with normal vector (1, 2, −1) containing the point (4, 0, 0).

With four or more variables any sort of visualisation is essentially impossible, and so to reason about systems of linear equations with many variables we need to develop algebraic tools rather than geometric ones.

Margin note: It is still very useful to use geometric intuition to think about systems of linear equations with many variables, provided you are careful about where it no longer applies.

1.1.1 Solutions to systems of linear equations

The system of linear equations shown in Figure 1.1 has a unique solution. In other words there is just one (x, y) pair that satisfies both equations, and this is represented by the unique point of intersection of the two lines. Some systems of linear equations have no solutions at all. For example, there are no possible values for (x, y) that satisfy both of the following equations

x + 2y = 4
2x + 4y = 3.

Margin note: If a system of linear equations has at least one solution, then it is called consistent, and otherwise it is called inconsistent.

Geometrically, the two equations determine parallel but different lines, and so they do not meet. This is illustrated in Figure 1.2.

[Figure 1.2: The inconsistent system of equations x + 2y = 4 and 2x + 4y = 2 plotted as lines in the xy-plane.]

There is another possibility for the number of solutions to a system of linear equations, which is that a system may have infinitely many solutions. For example, consider the system

x + 2y + z = 4        (1.1)
y + z = 1.

Each of the two equations determines a plane in three dimensions, and as the two planes are not parallel[3], they meet in a line and so every point on the line is a solution to this system of linear equations.

[3] The two planes are not parallel because the normal vectors to the two planes, that is n1 = (1, 2, 1) and n2 = (0, 1, 1), are not parallel.

Remark 1.1. How can we describe the solution set to a system of linear equations with infinitely many solutions?

One way of describing an infinite solution set is in terms of free parameters, where one (or more) of the variables is left unspecified, with the values assigned to the other variables being expressed as formulas that depend on the free variables.

Let's see how this works with the system of linear equations given by Equation (1.1): here we can choose z to be the "free variable", but then to satisfy the second equation it will be necessary to have y = 1 − z. Then the first equation can only be satisfied by taking

x = 4 − 2y − z
  = 4 − 2(1 − z) − z    (using y = 1 − z)
  = 2 + z.

Thus the complete solution set S of system (1.1) is

S = {(2 + z, 1 − z, z) | z ∈ R}.

To find a particular solution to the linear system, you can pick any desired value for z and then the values for x and y are determined. For example, if we take z = 1 then we get (3, 0, 1) as a solution, and if we take z = 0 then we get (2, 1, 0), and so on. For this particular system of linear equations it would have been possible to choose one of the other variables to be the "free variable" and we would then get a different expression for the same solution set.

Example 1.2. (Different free variable) To rewrite the solution set S = {(2 + z, 1 − z, z) | z ∈ R} so that the y-coordinate is the free variable, just notice that as y = 1 − z, this implies that z = 1 − y and so the solution set becomes S = {(3 − y, y, 1 − y) | y ∈ R}.

A system of linear equations can also be expressed as a single matrix equation involving the product of a matrix and a vector of variables. So the system of linear equations

x + 2y + z = 5
y − z = −1
2x + 3y − z = 3

can equally well be expressed as

$$\begin{pmatrix} 1 & 2 & 1 \\ 0 & 1 & -1 \\ 2 & 3 & -1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 5 \\ -1 \\ 3 \end{pmatrix}$$

just using the usual rules for multiplying matrices.[4] In general, a system of linear equations with m equations in n variables has the form

Ax = b

where A is an m × n coefficient matrix, x is an n × 1 vector of variables, and b is an m × 1 vector of scalars.

[4] Matrix algebra is discussed in detail in Chapter 3, but for this representation as a system of linear equations, just the definition of the product of two matrices is needed.
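
As a quick numerical illustration of this matrix form, the following is a minimal sketch (assuming Python with NumPy is available; the variable names are illustrative only). It builds A and b for the system above, solves Ax = b with a library routine, and checks that the result satisfies every equation.

```python
import numpy as np

# Coefficient matrix and right-hand side of
#   x + 2y + z = 5,   y - z = -1,   2x + 3y - z = 3
A = np.array([[1.0, 2.0, 1.0],
              [0.0, 1.0, -1.0],
              [2.0, 3.0, -1.0]])
b = np.array([5.0, -1.0, 3.0])

x = np.linalg.solve(A, b)      # numerical solution of Ax = b
print(x)                       # approximately [1. 1. 2.]
print(np.allclose(A @ x, b))   # True: the vector satisfies all three equations
```

The same solution (x, y, z) = (1, 1, 2) is derived by hand in the next section.
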

1.2 Solving linear equations

In this section, we consider a systematic method of solving systems of linear equations. The method consists of two steps, first using Gaussian elimination to reduce the system to a simpler system of linear equations, and then back-substitution to find the solutions to the simpler system.

Margin note: In high school, systems of linear equations are often solved with ad hoc methods that are quite suitable for small systems, but which are not sufficiently systematic to tackle larger systems. It is very important to thoroughly learn the systematic method, as solving systems of linear equations is a fundamental part of many of the questions that arise in linear algebra. In fact, almost every question in linear algebra ultimately depends on setting up and solving a suitable system of linear equations!

1.2.1 Elementary row operations


An elementary row operation is an operation that transforms a system
of linear equations into a different, but equivalent system of linear
equations, where “equivalent” means that the two systems have
identical solutions. The answer to the obvious question — “ Why
bother transforming one system of linear equations into another?” —
is that the new system might be simpler to solve than the original
system. In fact there are some systems of linear equations that are
extremely simple to solve, and it turns out that by systematically
applying a sequence of elementary row operations, we can transform
any system of linear equations into an equivalent system whose
solutions are very simple to find.

Definition 1.3. (Elementary row operations)


An elementary row operation is one of the following three types of
transformation applied to a system of linear equations:

Type 1 Interchanging two equations.

Type 2 Multiplying an equation by a non-zero scalar.

Type 3 Adding a multiple of one equation to another equation.

In a system of linear equations, we let Ri denote the i-th equation, and so we can express an elementary row operation symbolically as follows:

Ri ↔ Rj        Exchange equations Ri and Rj
Ri ← αRi       Multiply equation Ri by α
Ri ← Ri + αRj  Add α times Rj to Ri

We will illustrate elementary row operations on a simple system of linear equations:

x + 2y + z = 5
y − z = −1        (1.2)
2x + 3y − z = 3

Example 1.4. (Type 1 Elementary Row Operation) Applying the Type 1 elementary row operation R1 ↔ R2 (in words, "interchange equations 1 and 2") to the original system Equation (1.2) yields the system of linear equations:

y − z = −1
x + 2y + z = 5
2x + 3y − z = 3

It is obvious that this new system of linear equations has exactly the same solutions as the original system, because each individual equation is unchanged and listing them in a different order does not alter which vectors satisfy them all.

Example 1.5. (Type 2 Elementary Row Operation) Applying the Type 2 elementary row operation R2 ← 3R2 (in words, "multiply the second equation by 3") to the original system Equation (1.2) gives a new system of linear equations:

x + 2y + z = 5
3y − 3z = −3
2x + 3y − z = 3

Again it is obvious that this system of linear equations has exactly the same solutions as the original system. While the second equation is changed, the solutions to this individual equation are not changed.[5]

[5] This relies on the equation being multiplied by a non-zero scalar.
Example 1.6. (Type 3 Elementary Row Operation) Applying the Type 3 elementary row operation R3 ← R3 − 2R1 (in words, "add −2 times the first equation to the third equation") to the original system Equation (1.2) gives a new system of linear equations:

x + 2y + z = 5
y − z = −1
− y − 3z = −7

In this case, it is not obvious that the system of linear equations has the same solutions as the original. In fact, the system is actually different from the original, but it happens to have the exact same set of solutions. This is so important that it needs to be proved.

As foreshadowed in the last example, in order to use elementary row operations with confidence, we must be sure that the set of solutions to a system of linear equations is not changed when the system is altered by an elementary row operation. To convince ourselves of this, we need to prove[6] that applying an elementary row operation to a system of linear equations neither destroys existing solutions nor creates new ones.

[6] A proof in mathematics is a careful explanation of why some mathematical fact is true. A proof normally consists of a sequence of statements, each following logically from the previous statements, where each individual logical step is sufficiently simple that it can easily be checked. The word "proof" often alarms students, but really it is nothing more than a very simple line-by-line explanation of a mathematical statement. Creating proofs of interesting or useful new facts is the raison d'être of a professional mathematician.

Theorem 1.7. Suppose that S is a system of linear equations, and that T is the system of linear equations that results by applying an elementary row operation to S. Then the set of solutions to S is equal to the set of solutions to T.

Proof. As discussed in the examples, this is obvious for Type 1 and Type 2 elementary row operations. So suppose that T arises from S by performing the Type 3 elementary row operation Ri ← Ri + αRj. Then S consists of m equations

S = {R1, R2, . . . , Rm}

while T only differs in the i-th equation

T = {R1, R2, . . . , Ri−1, Ri + αRj, Ri+1, . . . , Rm}.

It is easy to check that if a vector satisfies two equations Ri and Rj, then it also satisfies Ri + αRj and so any solution to S is also a solution to T. What remains to be checked is that any solution to T is a solution to S. However if a vector satisfies all the equations in T, then it satisfies Ri + αRj and Rj, and so it satisfies the equation

(Ri + αRj) + (−αRj)

which is just Ri. Thus any solution to T also satisfies S.

Now let's consider applying an entire sequence of elementary row operations to our example system of linear equations Equation (1.2) to reduce it to a much simpler form. So, starting with

x + 2y + z = 5
y − z = −1
2x + 3y − z = 3

apply the Type 3 elementary row operation R3 ← R3 − 2R1 to get

x + 2y + z = 5
y − z = −1
− y − 3z = −7

followed by the Type 3 elementary row operation R3 ← R3 + R2, obtaining

x + 2y + z = 5
y − z = −1
− 4z = −8

Now notice that the third equation only involves the variable z, and so it can now be solved, obtaining z = 2. The second equation involves just y, z and as z is now known, it really only involves y, and we get y = 1. Finally, with both y and z known, the first equation only involves x and by substituting the values that we know into this equation we discover that x = 1. Therefore, this system of linear equations has the unique solution (x, y, z) = (1, 1, 2).

Notice that the final system was essentially trivial to solve, and so the elementary row operations converted the original system into one whose solution was trivial.
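
The same sequence of row operations can be carried out numerically on the augmented matrix of the system. The following is a minimal sketch (assuming Python with NumPy; an illustration rather than part of the method's formal statement) that applies the two Type 3 operations above and then reads off the solution from the resulting triangular system.

```python
import numpy as np

# Augmented matrix [A | b] of system (1.2)
M = np.array([[1.0, 2.0, 1.0, 5.0],
              [0.0, 1.0, -1.0, -1.0],
              [2.0, 3.0, -1.0, 3.0]])

M[2] = M[2] - 2 * M[0]   # R3 <- R3 - 2 R1
M[2] = M[2] + M[1]       # R3 <- R3 + R2
print(M)                 # third row is now [0, 0, -4, -8]

z = M[2, 3] / M[2, 2]                      # -8 / -4 = 2
y = M[1, 3] - M[1, 2] * z                  # -1 - (-1)*2 = 1
x = M[0, 3] - M[0, 1] * y - M[0, 2] * z    # 5 - 2*1 - 1*2 = 1
print(x, y, z)                             # 1.0 1.0 2.0
```
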

1.2.2 The augmented matrix


The names of the variables in a system of linear equations are es-
sentially irrelevant — whether we call three variables x, y and z or
x1 , x2 and x3 makes no fundamental difference to the equations or
their solution. Thus writing out each equation in full when writing
down a sequence of systems of linear equations related by elemen-
tary row operations involves a lot of unnecessary repetition of the
variable names. Provided each equation has the variables in the
same order, all the information is contained solely in the coefficients,
and so these are all we need. Therefore we normally represent a
system of linear equations by a matrix known as the augmented
matrix of the system of linear equations; each row of the matrix
represents a single equation, with the coefficients of the variables
to the left of the bar, and the constant term to the right of the bar. Each column to the left of the bar contains all of the coefficients for a single variable. For our example system Equation (1.2),

x + 2y + z = 5
y − z = −1
2x + 3y − z = 3

the augmented matrix is

$$\left[\begin{array}{ccc|c} 1 & 2 & 1 & 5 \\ 0 & 1 & -1 & -1 \\ 2 & 3 & -1 & 3 \end{array}\right].$$

Example 1.8. (From matrix to linear system) What system of linear equations has the following augmented matrix?

$$\left[\begin{array}{ccc|c} 0 & -1 & 2 & 3 \\ 1 & 0 & -2 & 4 \\ 3 & 4 & 1 & 0 \end{array}\right]$$

The form of the matrix tells us that there are three variables, which we can name arbitrarily, say x1, x2 and x3. Then the first row of the matrix corresponds to the equation 0x1 − 1x2 + 2x3 = 3, and interpreting the other two rows analogously, the entire system is

− x2 + 2x3 = 3
x1 − 2x3 = 4
3x1 + 4x2 + x3 = 0.

We could also have chosen any other three names for the variables.

In other words, the augmented matrix for the system of linear equations Ax = b is just the matrix [A | b].

1.3 Gaussian elimination

When solving a system of linear equations using the augmented matrix, the elementary row operations[7] are performed directly on the augmented matrix.

[7] This is why they are called elementary row operations, rather than elementary equation operations, because they are always viewed as operating on the rows of the augmented matrix.

As explained earlier, the aim of the elementary row operations is to put the matrix into a simple form from which it is easy to "read off" the solutions; to be precise we need to define exactly the simple form that we are trying to achieve.

Definition 1.9. (Row echelon form)


A matrix is in row echelon form if

1. Any rows of the matrix consisting entirely of zeros occur as the last
rows of the matrix, and
2. The first non-zero entry of each row is in a column strictly to the right
of the first non-zero entry in any of the earlier rows.

This definition is slightly awkward to read, but very easy to grasp by example. Consider the two matrices

$$\begin{pmatrix} 1 & 1 & -1 & 2 & 0 \\ 0 & 0 & -2 & 1 & 3 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & -1 \end{pmatrix} \qquad \begin{pmatrix} 1 & 1 & -1 & 2 & 0 \\ 0 & 0 & -2 & 1 & 3 \\ 0 & 2 & 0 & 1 & 0 \\ 0 & 0 & 1 & 2 & -1 \end{pmatrix}$$

Neither matrix has any all-zero rows so the first condition is automatically satisfied. To check the second condition we need to identify the first non-zero entry in each row — this is called the leading entry.

In the first matrix, the leading entries in rows 1, 2, 3 and 4 occur in columns 1, 3, 4 and 5 respectively and so the leading entry for each row always occurs strictly further to the right than the leading entry in any earlier row. So this first matrix is in row-echelon form. However for the second matrix, the leading entries in rows 2 and 3 occur in columns 3 and 2 respectively, and so the leading entry in row 3 actually occurs to the left of the leading entry in row 2; hence this matrix is not in row-echelon form.

Example 1.10. (Row-echelon form) The following matrices are all in row-echelon form:

$$\begin{pmatrix} 1 & 0 & 2 & 1 \\ 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 0 \end{pmatrix} \qquad \begin{pmatrix} 1 & 1 & 2 & 3 \\ 0 & 2 & 1 & -1 \\ 0 & 0 & 3 & 0 \end{pmatrix} \qquad \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 2 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

Example 1.11. (Not row-echelon form) None of the following matrices are in row-echelon form:

$$\begin{pmatrix} 1 & 0 & 2 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & -1 \end{pmatrix} \qquad \begin{pmatrix} 1 & 1 & 2 & 3 \\ 0 & 2 & 1 & -1 \\ 0 & 1 & 3 & 0 \end{pmatrix} \qquad \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 2 & 1 \end{pmatrix}$$
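
The two conditions of Definition 1.9 can also be checked mechanically. The following is a minimal sketch (assuming Python with NumPy; the helper name is illustrative only) that tests whether a matrix is in row-echelon form and agrees with Examples 1.10 and 1.11.

```python
import numpy as np

def is_row_echelon(A, tol=1e-12):
    """Return True if A satisfies Definition 1.9 (row-echelon form)."""
    leading_cols = []
    seen_zero_row = False
    for row in np.asarray(A, dtype=float):
        nonzero = np.nonzero(np.abs(row) > tol)[0]
        if nonzero.size == 0:
            seen_zero_row = True          # condition 1: zero rows must come last
            continue
        if seen_zero_row:
            return False                  # a non-zero row appears after a zero row
        lead = nonzero[0]
        if leading_cols and lead <= leading_cols[-1]:
            return False                  # condition 2: leading entry not strictly further right
        leading_cols.append(lead)
    return True

print(is_row_echelon([[1, 0, 2, 1], [0, 0, 1, -1], [0, 0, 0, 0]]))  # True  (Example 1.10)
print(is_row_echelon([[1, 0, 2, 1], [0, 0, 0, 0], [0, 0, 1, -1]]))  # False (Example 1.11)
```
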

Gaussian Elimination (sometimes called row-reduction) is a systematic method for applying elementary row operations to a matrix until it is in row-echelon form. We'll see in the next section that a technique called back substitution, which involves processing the equations in reverse order, can easily determine the set of solutions to a system of linear equations whose augmented matrix is in row-echelon form.

Without further ado, here is the algorithm for Gaussian Elimination, first informally in words, and then more formally in symbols. The algorithm is defined for any matrices, not just the augmented matrices arising from a system of linear equations, because it has many applications.

Definition 1.12. (Gaussian elimination — in words)


Let A be an m × n matrix. At each stage in the algorithm, a particular
position in the matrix, called the pivot position, is being processed. Ini-
tially the pivot position is at the top-left of the matrix. What happens at
each stage depends on whether the pivot entry (that is, the number in the
pivot position) is zero or not.

1. If the pivot entry is zero then, if possible, interchange the pivot row
with one of the rows below it, in order to ensure that the pivot entry is
non-zero. This will be possible unless the pivot entry and every entry
below it are zero, in which case simply move the pivot position one
column to the right.
2. If the pivot entry is non-zero then, by adding a suitable multiple of the
pivot row to every row below the pivot row, ensure that every entry
below the pivot entry is zero. Then move the pivot position one column
to the right and one row down.

When the pivot position is moved off the matrix, then the process finishes
and the matrix will be in row-echelon form.

The process of "adding a multiple of the pivot row to every row below it in order to zero out the column below the pivot entry" is called pivoting on the pivot entry for short.

Example 1.13. (Gaussian elimination) Consider the following matrix, with the initial pivot position marked:

$$\begin{pmatrix} \boxed{2} & 1 & 2 & 4 \\ 2 & 1 & 1 & 0 \\ 4 & 3 & 2 & 4 \end{pmatrix}$$

The initial pivot position is the (1, 1) position in the matrix, and the pivot entry is therefore 2. Pivoting on the (1, 1)-entry is accomplished by performing the two elementary operations R2 ← R2 − R1 and R3 ← R3 − 2R1, leaving the matrix:

$$\begin{pmatrix} 2 & 1 & 2 & 4 \\ 0 & 0 & -1 & -4 \\ 0 & 1 & -2 & -4 \end{pmatrix}$$

(The elementary row operations used are noted down as the row reduction proceeds, to indicate how it is progressing.) The new pivot entry is 0, but as the entry immediately under the pivot position is non-zero, interchanging the two rows (R2 ↔ R3) moves a non-zero entry to the pivot position.

$$\begin{pmatrix} 2 & 1 & 2 & 4 \\ 0 & 1 & -2 & -4 \\ 0 & 0 & -1 & -4 \end{pmatrix}$$

The next step is to pivot on this entry in order to zero out all the entries below it and then move the pivot position. As the only entry below the pivot is already zero, no elementary row operations need be performed, and the only action required is to move the pivot. Once the pivot position reaches the bottom row, there are no further operations to be performed (regardless of whether the pivot entry is zero or not) and so the process terminates, leaving the matrix in row-echelon form

$$\begin{pmatrix} 2 & 1 & 2 & 4 \\ 0 & 1 & -2 & -4 \\ 0 & 0 & -1 & -4 \end{pmatrix}$$

as required.

For completeness, and to provide a description more suitable for implementing Gaussian elimination on a computer, we give the same algorithm more formally – in a sort of pseudo-code.[8]

[8] Pseudo-code is a way of expressing a computer program precisely, but without using the syntax of any particular programming language. In pseudo-code, assignments, loops, conditionals and other features that vary from language to language are expressed in natural language.

Definition 1.14. (Gaussian elimination — in symbols)

Let A = (a_{ij}) be an m × n matrix and set two variables r ← 1, c ← 1. (Here r stands for "row" and c for "column" and they store the pivot position.) Then repeatedly perform whichever one of the following operations is possible (only one will be possible at each stage) until either r > m or c > n, at which point the algorithm terminates.

1. If a_{rc} = 0 and there exists x > r such that a_{xc} ≠ 0, then perform the elementary row operation Rr ↔ Rx.

2. If a_{rc} = 0 and a_{xc} = 0 for all x > r, then set c ← c + 1.

3. If a_{rc} ≠ 0 then, for each x > r, perform the elementary row operation

Rx ← Rx − (a_{xc}/a_{rc}) Rr,

and then set r ← r + 1 and c ← c + 1.

When this algorithm terminates, the matrix will be in row-echelon form.
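
A direct translation of Definition 1.14 into working code might look like the following sketch (Python with NumPy; note that the indices are 0-based here, whereas the definition counts rows and columns from 1, and the helper name is illustrative only).

```python
import numpy as np

def gaussian_elimination(A):
    """Reduce a copy of A to row-echelon form, following Definition 1.14."""
    A = np.array(A, dtype=float)
    m, n = A.shape
    r, c = 0, 0                          # pivot position (0-indexed)
    while r < m and c < n:
        if A[r, c] == 0:                 # exact zero test; fine for this illustration
            below = np.nonzero(A[r:, c])[0]
            if below.size == 0:
                c += 1                   # case 2: whole column is zero from row r down
            else:
                x = r + below[0]
                A[[r, x]] = A[[x, r]]    # case 1: interchange R_r and R_x
        else:
            for x in range(r + 1, m):    # case 3: zero out the entries below the pivot
                A[x] -= (A[x, c] / A[r, c]) * A[r]
            r += 1
            c += 1
    return A

M = [[2.0, 1, 2, 4], [2, 1, 1, 0], [4, 3, 2, 4]]
print(gaussian_elimination(M))
# [[ 2.  1.  2.  4.]
#  [ 0.  1. -2. -4.]
#  [ 0.  0. -1. -4.]]   -- the row-echelon form found in Example 1.13
```
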

1.4 Back substitution

Recall that the whole point of elementary row operations is to transform a system of linear equations into a simpler system with the same solutions; in other words to change the problem to an easier problem with the same answer. So after reducing the augmented matrix of a system of linear equations to row-echelon form, we now need a way to read off the solution set.

The first step is to determine whether the system is consistent or otherwise, and this involves identifying the leading entries in each row of the augmented matrix in row-echelon form — these are the first non-zero entries in each row. In other words, run a finger along each row of the matrix, stopping at the first non-zero entry and noting which column it is in.

Example 1.15. (Leading entries) The following two augmented matrices in row-echelon form have their leading entries highlighted.

$$\left[\begin{array}{cccc|c} \mathbf{1} & 0 & -1 & 2 & 3 \\ 0 & 0 & \mathbf{2} & 1 & 0 \\ 0 & 0 & 0 & 0 & \mathbf{2} \\ 0 & 0 & 0 & 0 & 0 \end{array}\right] \qquad \left[\begin{array}{cccc|c} \mathbf{1} & 2 & -1 & 2 & 3 \\ 0 & 0 & \mathbf{2} & 1 & 0 \\ 0 & 0 & 0 & \mathbf{-1} & 2 \end{array}\right]$$

The left-hand matrix of Example 1.15 has the property that one of the leading entries is on the right-hand side of the augmenting bar. If we "unpack" what this means for the system of linear equations, then we see that the third row corresponds to the linear equation

0x1 + 0x2 + 0x3 + 0x4 = 2,

which can never be satisfied. Therefore this system of linear equations has no solutions, or in other words, is inconsistent. This is in fact a defining feature of an inconsistent system of linear equations, a fact that is important enough to warrant stating separately.

Margin note: You may wonder why we keep saying "to the right of the augmenting bar" rather than "in the last column". The answer is that if we have more than one linear equation with the same coefficient matrix, say Ax = b1, Ax = b2, then we can form a "super-augmented" matrix [A | b1 b2] and solve both systems with one application of Gaussian elimination. So there may be more than one column to the right of the augmenting bar.

Theorem 1.16. A system of linear equations is inconsistent if and only if one of the leading entries in the row-echelon form of the augmented matrix is to the right of the augmenting bar.

Proof. Left to the reader.

The right-hand matrix of Example 1.15 has no such problem, and so we immediately conclude that the system is consistent — it has at least one solution. Every column to the left of the augmenting bar corresponds to one of the variables in the system of linear equations (the columns correspond to x1, x2, x3, x4 from left to right):

$$\left[\begin{array}{cccc|c} 1 & 2 & -1 & 2 & 3 \\ 0 & 0 & 2 & 1 & 0 \\ 0 & 0 & 0 & -1 & 2 \end{array}\right] \qquad (1.3)$$

and so the leading entries identify some of the variables. In this case, the leading entries are in columns 1, 3 and 4 and so the identified variables are x1, x3 and x4. The variables identified in this fashion are called the basic variables (also known as leading variables) of the system of linear equations. The following remark is the key to understanding solving systems of linear equations by back substitution:

Remark 1.17. Every non-basic variable of a system of linear equations is a free variable or free parameter of the system of linear equations, while every basic variable can be expressed uniquely as a combination of the free parameters and/or constants.

The process of back-substitution refers to examining the equations in reverse order, and for each equation finding the unique expression for the basic variable corresponding to the leading entry of that row. Let's continue our examination of the right-hand matrix of Example 1.15, also shown with the columns identified in Equation (1.3).
The third row of the matrix, when written out as an equation,
says that −1x4 = 2, and so x4 = −2, which is an expression for
the basic variable x4 as a constant. The second row of the matrix
corresponds to the equation 2x3 + x4 = 0, but as we know now that
x4 = −2, this can be substituted in to give 2x3 − 2 = 0 or x3 = 1.
The first row of this matrix corresponds to the equation

x1 + 2x2 − x3 + 2x4 = 3

and after substituting in x3 = 1 and x4 = −2 this reduces to

x1 + 2x2 = 8. (1.4)

This equation involves one basic variable (that is, x1 ) together with
a non-basic variable (that is, x2 ) and a constant (that is, 8). The rules
of back-substitution say that this should be manipulated to give an
expression for the basic variable in terms of the other things. So we
get
x1 = 8 − 2x2

and the entire solution set for this system of linear equations is
given by
S = {(8 − 2x2 , x2 , 1, −2) | x2 ∈ R}.

Therefore we conclude that this system of linear equations has infinitely many solutions that can be described by one free parameter.
The astute reader will notice that (1.4) could equally well be
written x2 = 4 − x1 /2 and so we could use x1 as the free parameter,
rather than x2 — so why does back-substitution need to specify
which variable should be chosen as the free parameter? The answer
is that there is always an expression for the solution set that uses
the non-basic variables as the free parameters. In other words, the
process as described will always work.

Example 1.18. (Back substitution) Find the solutions to the system of linear equations whose augmented matrix in row-echelon form is

$$\left[\begin{array}{cccccc|c} 0 & 2 & -1 & 0 & 2 & 3 & 1 \\ 0 & 0 & 1 & 3 & -1 & 0 & 2 \\ 0 & 0 & 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 5 \end{array}\right]$$

First identify the leading entries in the matrix, and therefore the basic and non-basic variables. The leading entries in each row are highlighted below:

$$\left[\begin{array}{cccccc|c} 0 & \mathbf{2} & -1 & 0 & 2 & 3 & 1 \\ 0 & 0 & \mathbf{1} & 3 & -1 & 0 & 2 \\ 0 & 0 & 0 & 0 & \mathbf{1} & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & \mathbf{1} & 5 \end{array}\right]$$

Therefore the basic variables are {x2, x3, x5, x6} while the free variables are {x1, x4} and so this system has infinitely many solutions that can be described with two free parameters. Now back-substitute starting from the last equation. The fourth equation is simply that x6 = 5, while the third equation gives x5 + x6 = 0, which after substituting the known value for x6 gives us x5 = −5. The second equation is

x3 + 3x4 − x5 = 2

and so it involves the basic variable x3 along with the free variable x4 and the already-determined variable x5. Substituting the known value for x5 and rearranging to give an expression for x3, we get

x3 = −3 − 3x4.

Finally the first equation is

2x2 − x3 + 2x5 + 3x6 = 1

and so substituting all that we have already determined we get

2x2 − (−3 − 3x4) + 2(−5) + 3(5) = 1

which simplifies to

x2 = (−7 − 3x4)/2.

What about x1? It is a variable in the system of linear equations, but it did not actually occur in any of the equations. So if it does not appear in any of the equations, then there are no restrictions on its values and so it can take any value — therefore it is a free variable. Fortunately, the rules for back-substitution have already identified it as a non-basic variable as it should be. Therefore the final solution set for this system of linear equations is

S = {(x1, (−7 − 3x4)/2, −3 − 3x4, x4, −5, 5) | x1, x4 ∈ R}

and therefore we have found an expression with two free parameters as expected.

Key Concept 1.19. (Solving systems of linear equations)

To solve a system of linear equations of the form Ax = b, perform the following steps:

1. Form the augmented matrix [A | b].

2. Use Gaussian elimination to put the augmented matrix into row-echelon form.

3. Use back-substitution to express each of the basic variables as a combination of the free variables and constants.
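
To make the back-substitution step concrete, here is a minimal sketch (assuming Python with NumPy; the helper name and the restriction to systems with no free parameters are the sketch's own simplifications). It takes an augmented matrix that is already in row-echelon form, detects inconsistency, identifies the basic variables, and solves the equations in reverse order.

```python
import numpy as np

def back_substitution(M, tol=1e-12):
    """Solve an augmented matrix [A | b] already in row-echelon form.

    Returns None if the system is inconsistent; otherwise returns the
    unique solution when every variable is basic (no free parameters).
    """
    M = np.asarray(M, dtype=float)
    nvars = M.shape[1] - 1
    # leading (pivot) column of each non-zero row
    lead = [np.nonzero(np.abs(row) > tol)[0][0]
            for row in M if np.any(np.abs(row) > tol)]
    if any(c == nvars for c in lead):
        return None                      # leading entry right of the bar: inconsistent
    free = [j for j in range(nvars) if j not in set(lead)]
    if free:
        raise ValueError(f"free variables {free}: infinitely many solutions")
    x = np.zeros(nvars)
    for i in reversed(range(len(lead))):     # process the equations in reverse order
        j = lead[i]
        x[j] = (M[i, -1] - M[i, j + 1:nvars] @ x[j + 1:]) / M[i, j]
    return x

# Row-echelon form of system (1.2), as computed in Section 1.2
R = [[1, 2, 1, 5],
     [0, 1, -1, -1],
     [0, 0, -4, -8]]
print(back_substitution(R))   # [1. 1. 2.]
```
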

1.5 A more advanced method: Gauss-Jordan elimination

We now explain a method that allows us to do both Gaussian elimination and back-substitution at the same time, both in matrix form.

In Gaussian elimination, when we pivot on an entry in the matrix, we use the pivot row in order to zero out the rest of the column below the pivot entry. However there is nothing stopping us from zeroing out the rest of the column above the pivot entry as well. We will now do more elementary row operations in order to make the system of linear equations even simpler than before. Let's do this on the system with the following augmented matrix and see how useful it is:

$$\left[\begin{array}{ccc|c} -1 & 0 & 1 & 1 \\ 0 & 1 & -1 & 0 \\ 2 & 0 & -1 & 0 \end{array}\right].$$

After pivoting on the (1, 1)-entry (R3 ← R3 + 2R1) we get

$$\left[\begin{array}{ccc|c} -1 & 0 & 1 & 1 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 1 & 2 \end{array}\right]$$

which is now in row-echelon form. We can now use the last pivot to zero out the rest of the third column (R1 ← R1 − R3 and R2 ← R2 + R3):

$$\left[\begin{array}{ccc|c} -1 & 0 & 0 & -1 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 1 & 2 \end{array}\right]$$

One final elementary row operation, R1 ← (−1)R1, puts the augmented matrix into an especially nice form.

$$\left[\begin{array}{ccc|c} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 1 & 2 \end{array}\right]$$

In this form not even any back substitution is needed to find the solution; the system of equations has solution x1 = 1, x2 = 2 and x3 = 2.

In this example, we've jumped ahead without using the formal terminology or precisely defining the "especially nice form" of the final matrix. We remedy this immediately.

Definition 1.20. (Reduced row-echelon form)


A matrix is in reduced row-echelon form if it is in row echelon form,
and

1. The leading entry of each row is equal to one, and


2. The leading entry of each row is the only non-zero entry in its column.

Example 1.21. (Reduced row-echelon form) The following matrices are both in reduced row echelon form:

$$\begin{pmatrix} 1 & 0 & 0 & 2 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 3 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix} \qquad \text{and} \qquad \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}.$$

Example 1.22. (Not in reduced row-echelon form) The following matrices are NOT in reduced row echelon form:

$$\begin{pmatrix} 1 & 0 & 0 & 2 & 0 \\ 0 & -1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 3 & 0 \\ 0 & 0 & 0 & 0 & 2 \end{pmatrix} \qquad \text{and} \qquad \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{pmatrix}.$$

A simple modification to the algorithm for Gaussian elimination yields an algorithm for reducing a matrix to reduced row echelon form; this algorithm is known as Gauss-Jordan elimination. It is presented below, with the differences between Gaussian elimination and Gauss-Jordan elimination highlighted in boldface in the original notes.

Definition 1.23. (Gauss-Jordan elimination — in words)


Let A be an m × n matrix. At each stage in the algorithm, a particular
position in the matrix, called the pivot position, is being processed. Ini-
tially the pivot position is at the top-left of the matrix. What happens at
each stage depends on whether the pivot entry (that is, the number in the
pivot position) is zero or not.

1. If the pivot entry is zero then, if possible, interchange the pivot row
with one of the rows below it, in order to ensure that the pivot entry is
non-zero. This will be possible unless the pivot entry and every entry
below it are zero, in which case simply move the pivot position one
column to the right.
2. If the pivot entry is non-zero, multiply the pivot row to ensure
that the pivot entry is 1 and then, by adding a suitable multiple of
the pivot row to every row above and below the pivot row, ensure that
every entry above and below the pivot entry is zero. Then move the
pivot position one column to the right and one row down.

When the pivot position is moved off the matrix, then the process finishes
and the matrix will be in reduced row echelon form.

This method has the advantage that now the solutions can simply be read off the augmented matrix.

Key Concept 1.24. (Solving systems of linear equations, advanced method)

To solve a system of linear equations of the form Ax = b, perform the following steps:

1. Form the augmented matrix [A | b].

2. Use Gauss-Jordan elimination to put the augmented matrix into reduced row-echelon form.

3. Identify the leading entries (which are all equal to 1) to identify the basic variables; the other variables will be free parameters.

4. Read from each row of the reduced row-echelon form matrix what each basic variable is equal to as a combination of the free variables and constants.

We will now apply this method to an example to illustrate the differences in the method and show how easy it is to get the solution from the reduced row-echelon form. Compare with Example 1.13.

Example 1.25. (Gauss-Jordan elimination) Consider the system corresponding to the following augmented matrix, with the initial pivot position marked:

$$\left[\begin{array}{cccc|c} \boxed{2} & 1 & 2 & 4 & -2 \\ 2 & 1 & 1 & 0 & 1 \\ 4 & 3 & 2 & 4 & 3 \end{array}\right]$$

The initial pivot position is the (1, 1) position in the matrix, and the pivot entry is therefore 2. Our first step is to multiply the first row by 1/2 (R1 ← (1/2)R1) so that the pivot entry is 1.

$$\left[\begin{array}{cccc|c} 1 & 1/2 & 1 & 2 & -1 \\ 2 & 1 & 1 & 0 & 1 \\ 4 & 3 & 2 & 4 & 3 \end{array}\right]$$

Pivoting on the (1, 1)-entry is then accomplished by performing the two elementary operations R2 ← R2 − 2R1 and R3 ← R3 − 4R1, leaving the matrix:

$$\left[\begin{array}{cccc|c} 1 & 1/2 & 1 & 2 & -1 \\ 0 & 0 & -1 & -4 & 3 \\ 0 & 1 & -2 & -4 & 7 \end{array}\right]$$

The new pivot entry is 0, but as the entry immediately under the pivot position is non-zero, interchanging the two rows (R2 ↔ R3) moves a non-zero entry to the pivot position.

$$\left[\begin{array}{cccc|c} 1 & 1/2 & 1 & 2 & -1 \\ 0 & 1 & -2 & -4 & 7 \\ 0 & 0 & -1 & -4 & 3 \end{array}\right]$$

This pivot is already equal to 1 so the next step is to pivot on this entry in order to zero out all the entries above and below it (R1 ← R1 − (1/2)R2) and then move the pivot position.

$$\left[\begin{array}{cccc|c} 1 & 0 & 2 & 4 & -9/2 \\ 0 & 1 & -2 & -4 & 7 \\ 0 & 0 & -1 & -4 & 3 \end{array}\right]$$

Now we multiply Row 3 by −1 (R3 ← −R3) to make the pivot equal to 1.

$$\left[\begin{array}{cccc|c} 1 & 0 & 2 & 4 & -9/2 \\ 0 & 1 & -2 & -4 & 7 \\ 0 & 0 & 1 & 4 & -3 \end{array}\right]$$

Finally we pivot off that entry (R1 ← R1 − 2R3 and R2 ← R2 + 2R3) to get zeros in all other entries in that column.

$$\left[\begin{array}{cccc|c} 1 & 0 & 0 & -4 & 3/2 \\ 0 & 1 & 0 & 4 & 1 \\ 0 & 0 & 1 & 4 & -3 \end{array}\right]$$

This matrix is now in reduced row-echelon form, and the leading entries are exactly the positions we used as pivots. Since no leading entry is to the right of the augmenting bar, this system is consistent. Moreover, we see that the basic variables are x1, x2, and x3, and there is one free parameter: x4.

The first row, written as an equation, is x1 − 4x4 = 3/2, thus we immediately get x1 = 4x4 + 3/2. From the second row and third row, we immediately get x2 = −4x4 + 1 and x3 = −4x4 − 3, respectively. Therefore the final solution set for this system of linear equations is

S = {(4x4 + 3/2, −4x4 + 1, −4x4 − 3, x4) | x4 ∈ R}.

As you can see, there are more steps with matrices, but then no
back-substitution is required at all.
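
For completeness, here is a sketch of the Gauss-Jordan process in code (Python, using exact fractions so that entries like 3/2 are not blurred by rounding; the helper name is illustrative and the pivot choice is a compact variant of Definition 1.23). Applied to the augmented matrix of Example 1.25 it reproduces the reduced row-echelon form found above.

```python
from fractions import Fraction

def gauss_jordan(rows):
    """Return the reduced row-echelon form of a matrix, in the spirit of Definition 1.23."""
    A = [[Fraction(v) for v in row] for row in rows]
    m, n = len(A), len(A[0])
    r = 0
    for c in range(n):
        if r == m:
            break
        # find a row at or below r with a non-zero entry in column c
        pivot = next((i for i in range(r, m) if A[i][c] != 0), None)
        if pivot is None:
            continue                              # move one column to the right
        A[r], A[pivot] = A[pivot], A[r]           # interchange rows if necessary
        A[r] = [v / A[r][c] for v in A[r]]        # make the pivot entry equal to 1
        for i in range(m):
            if i != r and A[i][c] != 0:           # zero out above and below the pivot
                A[i] = [a - A[i][c] * b for a, b in zip(A[i], A[r])]
        r += 1
    return A

rref = gauss_jordan([[2, 1, 2, 4, -2],
                     [2, 1, 1, 0, 1],
                     [4, 3, 2, 4, 3]])
for row in rref:
    print([str(v) for v in row])
# ['1', '0', '0', '-4', '3/2']
# ['0', '1', '0', '4', '1']
# ['0', '0', '1', '4', '-3']
```
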

Remark 1.26. As you saw in the very first step of the example, having a
pivot entry not equal to 1 or −1 introduces fractions, which can be annoy-
ing. If there is an entry in that column which is a 1 or −1, interchanging
the two rows before applying Gauss-Jordan elimination allows us to avoid
introducing fractions, and so makes calculations easier.

1.6 Reasoning about systems of linear equations

Understanding the process of Gaussian elimination and back-substitution and Gauss-Jordan elimination also allows us to reason about systems of linear equations, even if they are not explicitly defined, and make general statements about the number of solutions to systems of linear equations. One of the most important is the following result, which says that any consistent system of linear equations with more unknowns than equations has infinitely many solutions.

Theorem 1.27. Suppose that Ax = b is a system of m linear equations in n variables. If m < n, then the system is either inconsistent or has infinitely many solutions.

Proof. Consider the row-echelon form of the augmented matrix [A | b]. If the last column (on the right of the augmenting bar) contains the leading entry of some row, then the system is inconsistent. Otherwise, each leading entry is in a column corresponding to a variable, and so there are at most m basic variables. As there are n variables altogether, this leaves at least n − m > 0 free parameters in the solution set and so there are infinitely many solutions.

A homogeneous system of linear equations is one of the form Ax = 0, and these systems are always consistent.[9] Thus Theorem 1.27 has the important corollary that "every homogeneous system of linear equations with more unknowns than equations has infinitely many solutions".

[9] Why is this true?

A second example of reasoning about a system of linear equations, rather than just solving an explicit system, is when the system is not fully determined. For example, suppose that a and b are unknown values. What can be said about the number of solutions of the following system of linear equations?

$$\begin{pmatrix} 1 & 2 & a \\ 0 & 1 & 2 \\ 1 & 3 & 3 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} -3 \\ b \\ 0 \end{pmatrix}$$

In particular, for which values of a and b will this system have 0, 1 or infinitely many solutions?

To answer this, start performing Gaussian elimination as usual, treating a and b symbolically as their values are not known.[10] The row reduction proceeds in the following steps: the initial augmented matrix is

$$\left[\begin{array}{ccc|c} 1 & 2 & a & -3 \\ 0 & 1 & 2 & b \\ 1 & 3 & 3 & 0 \end{array}\right]$$

and so after pivoting on the top-left position (R3 ← R3 − R1) we get

$$\left[\begin{array}{ccc|c} 1 & 2 & a & -3 \\ 0 & 1 & 2 & b \\ 0 & 1 & 3-a & 3 \end{array}\right]$$

and then (R3 ← R3 − R2)

$$\left[\begin{array}{ccc|c} 1 & 2 & a & -3 \\ 0 & 1 & 2 & b \\ 0 & 0 & 1-a & 3-b \end{array}\right]$$

[10] Of course, it is necessary to make sure that you never compute anything that might be undefined, such as 1/a. If you need to use 1/a during the Gaussian elimination, then you need to separate out the cases a = 0 and a ≠ 0 and do them separately.

From this matrix, we can immediately see that if a ≠ 1 then 1 − a ≠ 0 and every variable is basic, which means that the system has a unique solution (regardless of the value of b). On the other hand, if a = 1 then either b ≠ 3, in which case the system is inconsistent, or b = 3, in which case there are infinitely many solutions. We can summarise this outcome:

a ≠ 1 : Unique solution
a = 1 and b ≠ 3 : No solution
a = 1 and b = 3 : Infinitely many solutions

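This case analysis can be spot-checked numerically at sample values of a and b. The sketch below (assuming Python with NumPy; the helper name is illustrative only) compares how many independent rows the coefficient matrix and the augmented matrix have, using NumPy's matrix_rank.

```python
import numpy as np

def count_solutions(a, b):
    """Classify the number of solutions of the system above for given a and b."""
    A = np.array([[1.0, 2, a], [0, 1, 2], [1, 3, 3]])
    rhs = np.array([[-3.0], [b], [0]])
    rank_A = np.linalg.matrix_rank(A)
    rank_Ab = np.linalg.matrix_rank(np.hstack([A, rhs]))
    if rank_A < rank_Ab:
        return "no solution"
    return "unique solution" if rank_A == 3 else "infinitely many solutions"

print(count_solutions(a=2, b=7))   # unique solution            (a != 1)
print(count_solutions(a=1, b=0))   # no solution                (a = 1, b != 3)
print(count_solutions(a=1, b=3))   # infinitely many solutions  (a = 1, b = 3)
```
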
2 Vector spaces and subspaces

This chapter takes the first steps away from the geometric interpretation of vectors in familiar 2- or 3-dimensional space by introducing n-dimensional vectors and the vector space Rn, which must necessarily be described and manipulated algebraically.

Before commencing this chapter, students should be able to:

• Solve systems of linear equations.

After completing this chapter, students will be able to:

• Determine when a set of vectors is a subspace, and


• Determine when a set of vectors is linearly independent, and
• Find a basis for a subspace, and hence determine its dimension.

2.1 The vector space Rn

The vector space Rn consists of all the n-tuples of real numbers, which henceforth we call vectors; formally we say that

Rn = {(x1, x2, . . . , xn) | x1, x2, . . . , xn ∈ R}.

Thus R2 is just the familiar collection of pairs of real numbers that we usually visualise by identifying each pair (x, y) with the point (x, y) on the Cartesian plane, and R3 the collection of triples of real numbers that we usually identify with 3-space.

A vector u = (u1, . . . , un) ∈ Rn may have different meanings:

• when n = 2 or 3 it could represent a geometric vector in Rn that has both a magnitude and a direction;

• when n = 2 or 3 it could represent the coordinates of a point in the Cartesian plane or in 3-space;

• it could represent certain quantities, e.g. u1 apples, u2 pears, u3 oranges, u4 bananas, . . . ;

• it may simply represent a string of real numbers.



The vector space Rn also has two operations that can be per-
formed on vectors, namely vector addition and scalar multiplication.
Although their definitions are intuitively obvious, we give them
anyway:

Definition 2.1. (Vector addition)


If u = (u1 , u2 , . . . , un ) and v = (v1 , v2 , . . . , vn ) are vectors in Rn then
their sum u + v is defined by

u + v = ( u1 + v1 , u2 + v2 , . . . , u n + v n ).

In other words, two vectors are added coordinate-by-coordinate.

Definition 2.2. (Scalar multiplication)


If v = (v1 , v2 , . . . , vn ) and α ∈ R then the product αv is defined by

αv = (αv1 , αv2 , . . . , αvn ) .

In other words, each coordinate of the vector is multiplied by the scalar.

Example 2.3. Here are some vector operations:

(1, 2, −1, 3) + (4, 0, 1, 2) = (5, 2, 0, 5)


(3, 1, 2) + (6, −1, −4) = (9, 0, −2)
5(1, 0, −1, 2) = (5, 0, −5, 10)
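
These coordinate-wise definitions translate directly into code. A minimal sketch (assuming Python with NumPy) reproducing the operations of Example 2.3:

```python
import numpy as np

u = np.array([1, 2, -1, 3])
v = np.array([4, 0, 1, 2])

print(u + v)                         # [5 2 0 5]   -- coordinate-by-coordinate addition
print(5 * np.array([1, 0, -1, 2]))   # [ 5  0 -5 10] -- scalar multiplication
```
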

Row or column vectors?

A vector in Rn is simply an ordered n-tuple of real numbers and for many purposes all that matters is that we write it down in such a way that it is clear which is the first coordinate, the second coordinate and so on.

However a vector can also be viewed as a matrix, which is very useful when we use matrix algebra (the subject of Chapter 3) to manipulate equations involving vectors, and then a choice has to be made whether to use a 1 × n matrix, i.e. a matrix with one row and n columns, or an n × 1 matrix, i.e. a matrix with n rows and 1 column, to represent the vector. Thus a vector in R4 can be represented either as a row vector such as

(1, 2, 3, 4)

or as a column vector such as

$$\begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix}.$$

Margin note: For instance when we write Ax = b for a system of linear equations in Chapter 1, the vectors x and b are column vectors.

For various reasons, mostly to do with the conventional notation we use for functions (that is, we usually write f(x) rather than (x)f), it is more convenient mathematically to assume that vectors are represented as column vectors most of the time. Unfortunately, in writing about mathematics, trying to typeset a row vector such as [1, 2, 3, 4] is much more convenient than typesetting a column vector such as

$$\begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix}$$

which, as you can see, leads to ugly and difficult to read paragraphs.

Some authors try to be very formal and use the notation for a matrix transpose (see Chapter 3) to allow them to elegantly typeset a column vector: so their text would read something like: Let v = (1, 2, 3, 4)^T, in which case everyone is clear that the vector v is really a column vector.

In practice however, either the distinction between a row- and column-vector is not important (e.g. adding two vectors together) or it is obvious from the context; in either case there is never any actual confusion caused by the difference. So to reduce the notational overload of adding a slew of transpose symbols that are almost never needed, in these notes we've decided to write all vectors just as rows, but with the understanding that when it matters (in matrix equations), they are really to be viewed as column vectors. In this latter case, it will always be obvious from the context that the vectors must be column vectors anyway!

One vector plays a special role in linear algebra; the vector in Rn with all components equal to zero is called the zero-vector and denoted

0 = (0, 0, . . . , 0).

It has the obvious properties that v + 0 = 0 + v = v; this means that it is an additive identity.[1]

[1] This is just formal mathematical terminology for saying that you can add it to any vector without altering that vector.

2.2 Subspaces

In the study of 2- and 3-dimensional geometry, figures such as lines and planes play a particularly important role and occur in many different contexts. In higher-dimensional and more general vector spaces, a similar role is played by a vector subspace, or just subspace, which is a set of vectors that has three special additional properties.

Definition 2.4. (Vector subspace)

Let S ⊆ Rn be a set of vectors. Then S is called a subspace of Rn if

(S1) 0 ∈ S, and
(S2) u + v ∈ S for all vectors u, v ∈ S, and
(S3) αv ∈ S for all scalars α ∈ R and vectors v ∈ S.

First we'll go through these three conditions in turn and see what they are saying. The first condition (S1) simply says that a subspace must contain the zero vector 0; when this condition does not hold, it is an easy way to show that a given set of vectors is not a subspace.

Example 2.5. The set of vectors S = {(x, y) | x + y = 1} is not a subspace of R2 because the vector 0 = (0, 0) does not belong to S.

The second condition[2] (S2) says that in order to be a subspace, a set S of vectors must be closed under vector addition. This means that if two vectors that are both in S are added together, then their sum must remain in S.

[2] Condition (S2) does not restrict what happens to the sum of two vectors that are not in S, or the sum of a vector in S and one not in S. It is only concerned with the sum of two vectors that are both in S.

Example 2.6. (Closed under vector addition)[3] In R3, the xy-plane is the set of all vectors of the form (x, y, 0) (where x and y can be anything). The xy-plane is closed under vector addition because if we add any two vectors in the xy-plane together, then the resulting vector also lies in the xy-plane.

[3] In the next section, we'll see how to present a formal proof that a set of vectors is closed under vector addition, but for these examples, geometric intuition is enough to see that what is being claimed is true.

Example 2.7. (Not closed under vector addition) In R2, the unit disk is the set of vectors

{(x, y) | x^2 + y^2 ≤ 1}.

This set of vectors is not closed under vector addition because if we take u = (1, 0) and v = (0, 1), then both u and v are in the unit disk, but their sum u + v = (1, 1) is not in the unit disk.

The third condition (S3) says that in order to qualify as a subspace, a set S of vectors must be closed under scalar multiplication, meaning that if a vector is contained in S, then all of its scalar multiples must also be contained in S.

Example 2.8. (Closed under scalar multiplication) In R2, the set of vectors on the two axes, namely

S = {(x, y) | xy = 0}

is closed under scalar multiplication, because it is clear that any multiple of a vector on the x-axis remains on the x-axis, and any multiple of a vector on the y-axis remains on the y-axis. Algebraically: if (x, y) satisfies xy = 0 then (αx, αy) satisfies (αx)(αy) = α^2(xy) = 0.

Example 2.9. (Not closed under scalar multiplication) In R2, the unit disk, which was defined in Example 2.7, is not closed under scalar multiplication because if we take u = (1, 0) and α = 2, then αu = (2, 0), which is not in the unit disk.

One of the fundamental skills needed in linear algebra is the ability to identify whether a given set of vectors in Rn forms a subspace or not. Usually a set of vectors will be described in some way, and you will need to be able to tell whether this set of vectors is a subspace. To prove that a given set of vectors is a subspace, it is necessary to show that all three conditions (S1), (S2) and (S3) are satisfied, while to show that a set of vectors is not a subspace, it is only necessary to show that one of the three conditions is not satisfied.

2.2.1 Subspace proofs

In this subsection, we consider in more detail how to show whether or not a given set of vectors is a subspace. It is much easier to show that a set of vectors is not a subspace than to show that a set of vectors is a subspace. The reason for this is that condition (S2) applies to every pair of vectors in the given set. To show that this condition fails, we only need to give a single example where the condition does not hold, but to show that it is true, we need to find a general argument that applies to every pair of vectors. The same concept applies to condition (S3). This asymmetry is so important that we give it a name.[4]

[4] The "black swan" name comes from the famous notion that in order to prove or disprove the logical statement "all swans are white", it would only be necessary to find one single black swan to disprove it, but it would be necessary to check every possible swan in order to prove it. Subspaces are the same — if a set of vectors is not a subspace, then it is only necessary to find one "black swan" showing that one of the conditions does not hold, but if it is a subspace then it is necessary to "check every swan" by proving that the conditions hold for every pair of vectors and scalars.

Key Concept 2.10. (The "Black Swan" concept)

1. To show that a set of vectors S is not closed under vector addition, it is sufficient to find a single explicit example of two vectors u, v that are contained in S, but whose sum u + v is not contained in S.

However, to show that a set of vectors S is closed under vector addition, it is necessary to give a formal symbolic proof that applies to every pair of vectors in S.

2. To show that a set of vectors S is not closed under scalar multiplication, it is sufficient to find a single explicit example of one vector v contained in S and one scalar α such that αv is not contained in S.

However, to show that a set of vectors S is closed under scalar multiplication, it is necessary to give a formal symbolic proof that applies to every pair of one vector in S and one scalar.

Example 2.11. (Not a subspace) The set S = {(w, x, y, z) | wx = yz} in


R4 is not a subspace because if u = (1, 0, 2, 0) and v = (0, 1, 0, 2), then
u, v ∈ S but u + v = (1, 1, 2, 2) ∉ S so condition (S2) does not hold.
Examples 2.7 and 2.9 also use this black swan concept for (S2)
and (S3) respectively.
One of the hardest techniques for first-time students of linear al-
gebra is to understand how to structure a proof that a set of vectors

is a subspace, so we’ll go slowly. Let

S = {( x, y, z) | x − y = 2z}

be a set of vectors in R3 . We wish to check whether or not it is a


subspace. Here is a model proof, interleaved with some discussion
about the proof.5 5
When you do your proofs it may
help to structure them like this model
proof, but don’t include the discussion
(S1) It is obvious that — this is to help you understand why
0 − 0 = 2(0) the model proof looks like it does, but
it is not part of the proof itself.
and so 0 ∈ S.
Discussion: To check that 0 is in S, it is necessary to verify that the zero
vector satisfies the “defining condition” that determines S. In this case, the
defining condition is that the difference of the first two coordinates (that
is, x − y) is equal to twice the third coordinate (that is, 2z). For the vector
0 = (0, 0, 0) we have all coordinates equal to 0, and so the condition is true.

(S2) Let u = (u1 , u2 , u3 ) ∈ S and v = (v1 , v2 , v3 ) ∈ S. Then

u1 − u2 = 2u3 (2.1)
v1 − v2 = 2v3 . (2.2)

Discussion: To prove that S is closed under addition, we need to check


every possible pair of vectors, which can only be done symbolically. We give
symbolic names u and v to two vectors in S and write down the only facts
that we currently know — namely that they satisfy the defining condition
for S. These equations are given labels — in this case, Equations (2.1) and
(2.2), because the proof must refer to these equations later.

Now consider the sum

u + v = ( u1 + v1 , u2 + v2 , u3 + v3 )

and test it for membership in S. As

( u1 + v1 ) − ( u2 + v2 ) = u1 + v1 − u2 − v2 (rearranging)
= ( u1 − u2 ) + ( v1 − v2 ) (rearranging)
= 2u3 + 2v3 (by Eqs. (2.1) and (2.2))
= 2( u3 + v3 ) (rearranging terms)

it follows that u + v ∈ S.
Discussion: To show that u + v is in S, we need to show that the difference
of its first two coordinates is equal to twice its third coordinate. So the
sequence of calculations starts with the difference of the first two coordinates
and then carefully manipulates this expression in order to show that it
is equal to twice the third coordinate. Every stage of the manipulation is
justified either just as a rearrangement of the terms or by reference to some
previously known fact. At some stage in the manipulation, the proof must
use Equations (2.1) and (2.2), because the result must depend on the two
original vectors being vectors in S.

(S3) Let u = (u1 , u2 , u3 ) ∈ S and α ∈ R. Then

u1 − u2 = 2u3 . (2.3)

Discussion: To prove that S is closed under scalar multiplication, we need


to check every vector in S and scalar in R. We give the symbolic name u
to the vector in S and α to the scalar, and note down the only fact that we
currently know — namely that u satisfies the defining condition for S. We’ll
need this fact later, and so give it a name, in this case Equation (2.3).

Now consider the vector

αu = (αu1 , αu2 , αu3 )

and test it for membership in S. As

αu1 − αu2 = α(u1 − u2 ) (rearranging)


= α(2u3 ) (by Equation (2.3))
= 2(αu3 ) (rearranging)

it follows that αu ∈ S.
Discussion: To show that αu is in S, we need to show that the difference
of its first two coordinates is equal to twice its third coordinate. So the
sequence of calculations starts with the difference of the first two coordinates
and then carefully manipulates it in order to show that it is equal to twice
the third coordinate. At some stage in the manipulation, the proof must use
the equations Equation (2.3) because the result must depend on the original
vector being a member of S.
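The proof above is the only thing that settles the question, since no computer can test every pair of vectors. Still, a quick numerical spot check can catch a slip before you commit to writing a proof. The sketch below is an optional aside, assuming Python with the numpy package; the helper names in_S and random_S_vector are purely illustrative.

import numpy as np

def in_S(v, tol=1e-12):
    # Test the defining condition x - y = 2z.
    x, y, z = v
    return abs((x - y) - 2 * z) < tol

rng = np.random.default_rng(0)

def random_S_vector():
    # Vectors of S can be written in the form (y + 2z, y, z).
    y, z = rng.standard_normal(2)
    return np.array([y + 2 * z, y, z])

assert in_S(np.zeros(3))                  # (S1): the zero vector is in S
for _ in range(1000):
    u, v = random_S_vector(), random_S_vector()
    alpha = rng.standard_normal()
    assert in_S(u + v)                    # (S2): sums stay in S
    assert in_S(alpha * u)                # (S3): scalar multiples stay in S
print("Spot checks passed (this supports, but does not prove, the claim).")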

It will take quite a bit of practice to be able to write this sort of


proof correctly, so do not get discouraged if you find it difficult at
first. Here are some examples to try out.

Example 2.12. These sets of vectors are subspaces:

1. The set of vectors {(w, x, y, z) | w + x + y + z = 0} in R4 .


2. The xy-plane in R3 .
3. The line x = y in R2 .
4. The set of vectors {( x1 , x2 , . . . , xn ) | x1 + x2 + . . . + xn−1 = xn } in
Rn .
5. The set consisting of the unique vector 0.

These sets of vectors are not subspaces:

1. The set of vectors {(w, x, y, z) | w + x + y + z = 1} in R4 .


2. The plane normal to n = (1, 2, 1) passing through the point (1, 1, 1).
3. The line x = y − 1 in R2 .
4. The set of vectors {( x1 , x2 , . . . , xn ) | x1 + x2 + . . . + xn−1 ≥ xn } in
Rn for n ≥ 2.

2.2.2 Exercises
1. Show that a line in R2 is a subspace if and only if it passes
through the origin (0, 0).

2. Find a set of vectors in R2 that is closed under vector addition,


but not closed under scalar multiplication.

3. Find a set of vectors in R2 that is closed under scalar multiplica-


tion, but not closed under vector addition.

2.3 Spans and spanning sets

We start this section by considering a simple question:

What is the smallest subspace S of R2 containing the vector v =


(1, 2)?

If S is a subspace, then by condition (S1) of Definition 2.4 it must


also contain the zero vector, so it follows that S must contain at least
the two vectors {0, v}. But by condition (S2), the subspace S must
also contain the sum of any two vectors in S, and so therefore S
must also contain all the vectors

(2, 4), (3, 6), (4, 8), (5, 10), . . .

But then, by condition (S3), it follows that S must also contain all
the multiples of v, such as

(−1, −2), (1/2, 1), (1/4, 1/2), . . .

Therefore S must contain at least the set of vectors6

{(α, 2α) | α ∈ R}

and in fact this set contains enough vectors to satisfy the three
conditions (S1), (S2) and (S3), and so it is the smallest subspace
containing v.

Footnote 6: So if a subspace contains a vector v then it must contain every
scalar multiple of v. In R2 and R3 , this means that if a subspace contains a
point, then it contains the line containing the origin and that point.
Now let’s extend this result by considering the same ques-
tion but with a bigger starting set of vectors: Suppose that A =
{v1 , v2 , . . . , vk } is a set of vectors in Rn — what is the smallest sub-
space of Rn that contains A?
To answer this, we need a couple more definitions:

Definition 2.13. (Linear combination)


Let A = {v1 , v2 , . . . , vk } be a set of vectors in Rn . Then a linear combi-
nation of the vectors in A is any vector of the form

v = α1 v1 + α2 v2 + · · · + αk vk

where α1 , α2 , . . ., αk ∈ R are arbitrary scalars.



By slightly modifying the argument of the last paragraph, it


should be clear that if a subspace contains the vectors v1 , v2 , . . .,vk ,
then it also contains every linear combination of those vectors. As
we will frequently need to refer to the “set of all possible linear
combinations” of a set of vectors, we should give it a name:

Definition 2.14. (Span)


The span of A = {v1 , v2 , . . . , vk } is the set of all possible linear
combinations of the vectors in A, and is denoted span( A). In symbols,

span( A) = {α1 v1 + α2 v2 + · · · + αk vk | αi ∈ R, 1 ≤ i ≤ k} .

Side note: When A is given by a short list of elements, we will sometimes
commit a small abuse of notation, writing span(v1 , v2 , . . . , vk ) instead of
the correct span({v1 , v2 , . . . , vk }).

Therefore, if a subspace S contains a subset A ⊆ S then it also


contains the span of A. To answer the original question (“what is
the smallest subspace containing A”) it is enough to notice that the
span of A is always a subspace itself and so no further vectors need
to be added. This is sufficiently important to write out formally as a
theorem and to give a formal proof.

Theorem 2.15. (Span of anything is a subspace) Let A = {v1 , v2 , . . . , vk }


be a set of vectors in Rn . Then span( A) is a subspace of Rn , and is the
smallest subspace of Rn containing A.

Proof. We must show that the three conditions of Definition 2.4


hold.

(S1) It is clear that


0 = 0v1 + 0v2 + · · · + 0vk
and so 0 is a linear combination of the vectors in A and thus
0 ∈ span( A).
(S2) Let u, v ∈ span( A). Then there are scalars αi , β i (1 ≤ i ≤ k)
such that

u = α1 v1 + α2 v2 + · · · + α k v k (2.4)
v = β 1 v1 + β 2 v2 + · · · + β k v k . (2.5)

Now7 consider the sum u + v:

u + v = (α1 v1 + α2 v2 + · · · + αk vk ) + ( β 1 v1 + β 2 v2 + · · · + β k vk )    (by Eqs. (2.4) and (2.5))
      = (α1 + β 1 )v1 + (α2 + β 2 )v2 + · · · + (αk + β k )vk

and so u + v ∈ span( A) since αi + β i ∈ R for all i.

Footnote 7: This seems like a lot of work just to say something that is
almost obvious: if you take two linear combinations of a set of vectors and
add them together, then the resulting vector is also a linear combination of
the original set of vectors!


(S3) Let v ∈ span( A) and α ∈ R. Then there are scalars β i (1 ≤ i ≤
k) such that

v = β 1 v1 + β 2 v2 + · · · + β k v k . (2.6)

It is clear that

αv = α ( β 1 v1 + β 2 v2 + · · · + β k vk ) (by Equation (2.6))


= (αβ 1 )v1 + (αβ 2 )v2 + · · · + (αβ k )vk (rearranging)

and so αv ∈ span( A) since αβ i ∈ R for all i.

The arguments earlier in this section showed that any subspace


containing A also contains span( A) and as span( A) is a subspace
itself, it must be the smallest subspace containing A.

Remark 2.16. In Rn , the smallest subspace containing the empty set ∅ is


{0} since any subspace must contain 0 and {0} is closed under addition
and scalar multiplication. Therefore, by convention, we set

span(∅) = {0},

so that Theorem 2.15 also holds for A being the empty set.

The span of a set of vectors gives us an easy way to find sub-


spaces — start with any old set of vectors, take their span and we
get a subspace. If we do have a subspace given in this way, then
what can we say about it? Is this a useful way to create, or work
with, a subspace?
For example, suppose we start with

A = {(1, 0, 1), (3, 2, 3)}

as a set of vectors in R3 . Then span( A) is a subspace of R3 — what


can we say about this subspace? The first thing to notice is that
we can easily check whether or not a particular vector is contained
in span( A), because it simply involves solving a system of linear
equations.8

Footnote 8: This shows that having a subspace of the form span( A) is a good
representation of the subspace, because we can easily test membership of the
subspace — that is, we “know” which vectors are contained in the subspace.

Continuing our example, to decide whether a vector v is contained in
span( A), we try to solve the vector equation

v = λ1 (1, 0, 1) + λ2 (3, 2, 3)

for the two “unknowns” λ1 , λ2 ; this is a system of 3 linear equa-


tions in two unknowns and, as discussed in Chapter 1, can easily be
solved.

Example 2.17. (Vector not in span) If A = {(1, 0, 1), (3, 2, 3)}, then
v = (2, 4, 5) is not in span( A). This follows because the equation

(2, 4, 5) = λ1 (1, 0, 1) + λ2 (3, 2, 3)

yields the system of linear equations

λ1 + 3λ2 = 2
2λ2 = 4
λ1 + 3λ2 = 5

which is obviously inconsistent. Thus there is no linear combination of


the vectors in A that is equal to (2, 4, 5).

Example 2.18. (Vector in span) If A = {(1, 0, 1), (3, 2, 3)}, then v =


(5, −2, 5) is in span( A). This follows because the equation

(5, −2, 5) = λ1 (1, 0, 1) + λ2 (3, 2, 3)

yields the system of linear equations

λ1 + 3λ2 = 5
2λ2 = −2
λ1 + 3λ2 = 5

which has the unique solution λ2 = −1 and λ1 = 8. Therefore v ∈


span( A) because we have now found the particular linear combination
required.
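Membership questions like the two examples above reduce to the linear system already described, so they are easy to check by machine. The following sketch is an optional aside, assuming Python with numpy; it places the vectors of A as the columns of a matrix and tests whether the system is consistent.

import numpy as np

M = np.column_stack([(1, 0, 1), (3, 2, 3)])   # the vectors of A as columns

def in_span(v):
    # Least squares always returns some lam; the system is consistent
    # (and v is in span(A)) exactly when M @ lam reproduces v.
    lam, *_ = np.linalg.lstsq(M, np.asarray(v, dtype=float), rcond=None)
    return np.allclose(M @ lam, v), lam

print(in_span((2, 4, 5)))     # (False, ...)           as in Example 2.17
print(in_span((5, -2, 5)))    # (True, [ 8., -1.])     as in Example 2.18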

Continuing with A = {(1, 0, 1), (3, 2, 3)}, is there another descrip-


tion of the subspace span( A)? It is easy to see that every vector in
span( A) must have its first and third coordinates equal, and by try-
ing a few examples, it seems likely that every vector with first and
third coordinates equal is in span( A). To prove this, we would need
to demonstrate that a suitable linear combination of the two vectors
can be found for any such vector. In other words, we need to show
that the equation

( x, y, x ) = λ1 (1, 0, 1) + λ2 (3, 2, 3)

has a solution for all values of x and y. Fortunately this system of


linear equations can easily be solved symbolically with the result
that the system is always consistent with solution
λ1 = x − (3/2)y        λ2 = (1/2)y.
Therefore we have the fact that

span((1, 0, 1), (3, 2, 3)) = {( x, y, x ) | x, y ∈ R}

2.3.1 Spanning sets


So far in this section, we have started with a small collection of vectors
(that is, the set A), and then built a subspace (that is, span( A))
from that set of vectors.

Side note: Why do we want to find a spanning set of vectors for a subspace?
The answer is that a spanning set is an effective way of describing a
subspace. Every subspace has a spanning set, and so it is also a universal
way of describing a subspace. Basically, once you know a spanning set for a
subspace, you can easily calculate everything about that subspace.

Now we consider the situation where we start with an arbitrary subspace V
and try to find a set — hopefully a small set — of vectors A such that
V = span( A). This concept is sufficiently important to warrant a formal
definition:
Definition 2.19. (Spanning set)
Let V ⊆ Rn be a subspace. Then a set A = {v1 , v2 , . . . , vk } of vectors,
each contained in V, is called a spanning set for V if

V = span( A).

Example 2.20. (Spanning set) If V = {( x, y, 0) | x, y ∈ R}, then


V is a subspace of R3 . The set A = {(1, 0, 0), (0, 1, 0)} is a spanning
set for V, because every vector in V is a linear combination of the vectors
in A, and every linear combination of the vectors in A is in V. There are
other spanning sets for V — for example, the set {(1, 1, 0), (1, −1, 0)} is
another spanning set for V.

Example 2.21. (Spanning set) If V = {(w, x, y, z) | w + x + y + z = 0},


then V is a subspace of R4 . The set

A = {(1, −1, 0, 0), (1, 0, −1, 0), (1, 0, 0, −1)}

is a spanning set for V because every vector in V is a linear combination


of the vectors in A, and every linear combination of the vectors in A is in
V. There are many other spanning sets for V — for example, the set

{(2, −1, −1, 0), (1, 0, −1, 0), (1, 0, 0, −1)}

is a different spanning set for the same subspace.

One critical point that often causes difficulty for students begin-
ning linear algebra is understanding the difference between “span”
and “spanning set”; the similarity in the phrases seems to cause
confusion. To help overcome this, we emphasise the difference.9

Footnote 9: Another way to think of it is that a “spanning set” is like a
list of LEGO® shapes that you can use to build a model while the “span” is
the completed model (the subspace). Finding a spanning set for a subspace is
like starting with the completed model and asking “What shapes do I need to
build this model?”.

Key Concept 2.22. (Difference between span and spanning set) To
remember the difference between span and spanning set, make sure
you understand that:
• The span of a set A of vectors is the entire subspace that can be
“built” from the vectors in A by taking linear combinations in all
possible ways.

• A spanning set of a subspace V is a set of vectors that are


needed in order to “build” V.

In the previous examples, the spanning sets were just “given”


with no explanation of how they were found, and no proof that
they were the correct spanning sets. In order to show that a particu-
lar set A actually is a spanning set for a subspace V, it is necessary
to check two things:

1. Check that the vectors in A are actually contained in V — this


guarantees that span( A) ⊆ V.
2. Check that every vector in V can be made as a linear combination
of the vectors in A — this shows that span( A) = V.

The first of these steps is easy and, by now, you will not be sur-
prised to discover that the second step can be accomplished by
solving a system of linear equations10 .

Footnote 10: In fact, almost everything in linear algebra ultimately
involves nothing more than solving a system of linear equations!

Example 2.23. (Spanning set with proof) We show that the set A =
{(1, 1, −1), (2, 1, 1)} is a spanning set for the subspace

V = {( x, y, z) | z = 2x − 3y} ⊂ R3 .

First notice that both (1, 1, −1) and (2, 1, 1) satisfy the condition that
z = 2x − 3y and so are actually in V. Now we need to show that every
vector in V is a linear combination of these two vectors. Any vector in
V has the form ( x, y, 2x − 3y) and so we need to show that the vector
equation
( x, y, 2x − 3y) = λ1 (1, 1, −1) + λ2 (2, 1, 1)
in the two unknowns λ1 and λ2 is consistent, regardless of the values of x
and y. Writing this out as a system of linear equations we get

λ1 + 2λ2 = x
λ1 + λ2 = y
−λ1 + λ2 = 2x − 3y.

Solving this system of linear equations using the techniques of the previ-
ous chapter shows that this system always has a unique solution, namely

λ1 = 2y − x λ2 = x − y.

Hence every vector in V can be expressed as a linear combination of the


two vectors, namely

( x, y, 2x − 3y) = (2y − x )(1, 1, −1) + ( x − y)(2, 1, 1).

This shows that these two vectors are a spanning set for V.
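The consistency claim in Example 2.23 can also be verified with a computer algebra system, because x and y can be left as symbols. This is an optional aside, written as a sketch assuming Python with sympy.

import sympy as sp

x, y, l1, l2 = sp.symbols('x y lambda1 lambda2')
A = sp.Matrix([[1, 2], [1, 1], [-1, 1]])   # coefficient matrix of the system
b = sp.Matrix([x, y, 2*x - 3*y])           # right-hand side, kept symbolic
print(sp.linsolve((A, b), [l1, l2]))
# {(-x + 2*y, x - y)}, i.e. lambda1 = 2y - x and lambda2 = x - y for every x, y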

Actually finding a spanning set for a subspace is not so difficult,


because it can just be built up vector-by-vector. Suppose that a
subspace V is given in some form (perhaps by a formula) and you
need to find a spanning set for V. Start by just picking any non-
zero vector v1 ∈ V, and examine span(v1 ) — if this is equal to V,
then you have finished, otherwise there are some vectors in V that
cannot yet be built just from v1 . Choose one of these “unreachable”
vectors, say v2 , add it to the set you are creating, and then examine
span(v1 , v2 ) to see if this is equal to V. If it is, then you are finished
and otherwise there is another “unreachable” vector, which you call
v3 and add to the set, and so on. After some finite number of steps
(say k steps), this process will eventually terminate11 when there
are no more unreachable vectors in V, in which case

V = span(v1 , v2 , . . . , vk )

and you have found a spanning set for V.

Footnote 11: The reason that this process must terminate (i.e. not go on for
ever) will become clear over the next few sections.

Example 2.24. (Finding Spanning Set) Let

V = {( x, y, z) | z = 2x − 3y}

which is a subspace of R3 . To find a spanning set for V, we start by choos-


ing any non-zero vector that lies in V, say v1 = (1, 0, 2). It is clear that

span(v1 ) is strictly smaller than V, because every vector in span(v1 ) has


a zero second coordinate, whereas there are vectors in V that do not have
this property. So we choose any one of these — say, v2 = (0, 1, −3) —
and now consider span(v1 , v2 ), which we can now prove is actually equal
to V. Thus, a suitable spanning set for V is the set

A = {(1, 0, 2), (0, 1, −3)}.

2.4 Linear independence

In the last section, we learned how a subspace can always be de-


scribed by giving a spanning set for that subspace. There are many
spanning sets for any given subspace, and in this section we con-
sider when a spanning set is efficient — in the sense that it is as
small as it can be. For example, here are three different spanning
sets for the xy-plane in R3 (remember that the xy-plane is the set of
vectors {( x, y, 0) | x, y ∈ R}).

A1 = {(1, 0, 0), (0, 1, 0)}


A2 = {(1, 1, 0), (1, −1, 0), (1, 3, 0)}
A3 = {(2, 2, 0), (1, 2, 0)}.

Which of these is the “best” spanning set to use? There is perhaps


nothing much to choose between A1 and A3 , because each of them
contain two vectors12 , but it is clear that A2 is unnecessarily big — if
we throw out any of the three vectors in A2 , then the remaining two
vectors still span the same subspace. On the other hand, both of the
spanning sets A1 and A3 are minimal spanning sets for the xy-plane
in that if we discard any of the vectors, then the remaining set no
longer spans the whole xy-plane.

Footnote 12: But maybe A1 looks “more natural” because the vectors have such
a simple form; later we will see that for many subspaces there is a natural
spanning set, although this is not always the case.
The reason that A2 is not a smallest-possible spanning set for the
xy-plane is that the third vector is redundant — it is already a linear
combination of the first two vectors: (1, 3, 0) = 2(1, 1, 0) − (1, −1, 0)
and therefore any vector in span( A2 ) can be produced as a linear
combination only of the first two vectors. More precisely any linear
combination

α1 (1, 1, 0) + α2 (1, −1, 0) + α3 (1, 3, 0)

of all three of the vectors in A2 can be rewritten as

α1 (1, 1, 0) + α2 (1, −1, 0) + α3 (2(1, 1, 0) − (1, −1, 0))

which is equal to

(α1 + 2α3 )(1, 1, 0) + (α2 − α3 )(1, −1, 0)

which is just a linear combination of the first two vectors with al-
tered scalars.
Therefore a spanning set for a subspace is an efficient way to
represent a subspace if none of the vectors in the spanning set is a
linear combination of the other vectors. While this condition is easy

to state, it is hard to work with directly, and so we use a condition


that means exactly the same thing but is easier to use.

Definition 2.25. (Linear independence)


Let A = {v1 , v2 , . . . , vk } be a set of vectors in Rn . Then A is called
linearly independent (or just independent) if the only solution to the
vector equation
λ1 v1 + λ2 v2 + · · · + λk vk = 0
in the unknowns λ1 , λ2 , . . ., λk is the trivial solution λ1 = λ2 = · · · =
λk = 0.

Before seeing why this somewhat strange definition means ex-


actly the same as having no one of the vectors being a linear com-
bination of the others, we’ll see a couple of examples. The astute
reader will not be surprised to learn that testing a set of vectors for
linear independence involves solving a system of linear equations.
Example 2.26. (Independent set) In order to decide whether the set of
vectors A = {(1, 1, 2, 2), (1, 0, −1, 2), (2, 1, 3, 1)} in R4 is linearly
independent, we need to solve the vector equation

λ1 (1, 1, 2, 2) + λ2 (1, 0, −1, 2) + λ3 (2, 1, 3, 1) = (0, 0, 0, 0).

This definitely has at least one solution, namely the trivial solution
λ1 = 0, λ2 = 0, λ3 = 0, and so the only question is whether it has
more solutions. The vector equation is equivalent to the system of linear
equations
λ1 + λ2 + 2λ3 = 0
λ1 + λ3 = 0
2λ1 − λ2 + 3λ3 = 0
2λ1 + 2λ2 + λ3 = 0

which can easily be shown, by the techniques of Chapter 1, to have a unique
solution.

Side note: There is an asymmetry here similar to the asymmetry in subspace
proofs. To show that a set of vectors is dependent only requires one
non-trivial linear combination, whereas to show that a set of vectors is
independent it is necessary in principle to show that every non-trivial
linear combination of the vectors is non-zero. Of course in practice this is
done by solving the relevant system of linear equations and showing that it
has a unique solution, which must therefore be the trivial solution.

A set of vectors that is not linearly independent is called dependent.
To show that a set of vectors is dependent, it is only necessary
to find an explicit non-trivial linear combination of the vectors
equal to 0.
Example 2.27. (Dependent set) Is the set

A = {(1, 3, −1), (2, 1, 2), (4, 7, 0)}

in R3 linearly independent? To decide this, set up the vector equation

λ1 (1, 3, −1) + λ2 (2, 1, 2) + λ3 (4, 7, 0) = (0, 0, 0)

and check how many solutions it has. This is equivalent to the system of
linear equations
λ1 + 2λ2 + 4λ3 = 0
3λ1 + λ2 + 7λ3 = 0
−λ1 + 2λ2 = 0

After Gaussian elimination, the augmented matrix for this system of linear
equations is

[ 1  2  4 | 0 ]
[ 0 −5 −5 | 0 ]
[ 0  0  0 | 0 ]
and so has infinitely many solutions, because λ3 is a free parameter.
While this is already enough to prove that A is dependent, it is always
useful to find an explicit solution which can then be used to double-check
the conclusion. As λ3 is free, we can find a solution by putting λ3 = 1, in
which case the second row gives λ2 = −1 and the first row λ1 = −2. And
indeed we can check that

−2(1, 3, −1) − 1(2, 1, 2) + 1(4, 7, 0) = (0, 0, 0)

as required to prove dependence.
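Checks like those in Examples 2.26 and 2.27 can be automated: a set of vectors is linearly independent exactly when the matrix having those vectors as its columns has rank equal to the number of vectors. The sketch below is an optional aside, assuming Python with numpy.

import numpy as np

def independent(vectors):
    # The rank equals the number of columns exactly when the only solution
    # of the homogeneous system is the trivial one.
    M = np.column_stack(vectors)
    return np.linalg.matrix_rank(M) == len(vectors)

print(independent([(1, 1, 2, 2), (1, 0, -1, 2), (2, 1, 3, 1)]))  # True  (Example 2.26)
print(independent([(1, 3, -1), (2, 1, 2), (4, 7, 0)]))           # False (Example 2.27)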

Sometimes a non-trivial linear combination is easy to see, so that


there is no need to try to solve a system.

Example 2.28. (Dependent set) Is the set

A = {(1, 0, 0), (0, 1, 0), (4, 7, 0)}

in R3 linearly independent? We immediately see that

−4(1, 0, 0) − 7(0, 1, 0) + 1(4, 7, 0) = (0, 0, 0)

and this proves that A is a dependent set.

Here is a surprising example:

Example 2.29. ({0} is dependent) Consider the set A = {0} in Rn . It is


a bit counter-intuitive that a single vector can be dependent but the vector
equation α1 0 = 0 has a non-trivial solution α1 = 1, which proves that A
is a dependent set.

Now we’ll give a rigorous proof of the earlier claim that the
definition of linear independence (Definition 2.25) is just a way of
saying that none of the vectors is a linear combination of the others.

Theorem 2.30. Let A = {v1 , v2 , . . . , vk } be a set of vectors in Rn .


Then A is linearly independent if and only if none of the vectors in A are a
linear combination of the others.

Proof. We will actually prove the contrapositive13 statement, namely
that A is linearly dependent if and only if one of the vectors is
a linear combination of the others.

Footnote 13: Given the statement “A implies B”, recall that the
contrapositive statement is “not B implies not A”. If a statement is true
then so is its contrapositive, and vice versa.

First suppose that one of the vectors, say vi , is a linear combination
of the others: then there are
scalars α1 , α2 , . . ., αi−1 , αi+1 , . . ., αk such that

vi = α1 v1 + α2 v2 + · · · + αi−1 vi−1 + αi+1 vi+1 + · · · + αk vk

and so there is a non-trivial linear combination of the vectors of A


equal to 0, namely:

α1 v1 + α2 v2 + · · · + αi−1 vi−1 − 1vi + αi+1 vi+1 + · · · + αk vk = 0.



(This linear combination is not all-zero because the coefficient of vi


is equal to −1 which is definitely non-zero.)
Next suppose that the set A is linearly dependent. Then there is
some non-trivial linear combination of the vectors equal to 0:

α1 v1 + α2 v2 + · · · + αk vk = 0.

Because this linear combination is non-trivial, not all of the coeffi-


cients are equal to 0, and so we can pick one of them, say αi , that is
non-zero. But then

αi vi = −α1 v1 − α2 v2 − · · · − αi−1 vi−1 − αi+1 vi+1 − · · · − αk vk

and so because αi ≠ 0 we can divide by αi and get

vi = −(α1 /αi )v1 − (α2 /αi )v2 − · · · − (αi−1 /αi )vi−1 − (αi+1 /αi )vi+1 − · · · − (αk /αi )vk
so one of the vectors is a linear combination of the others.

Remark 2.31. If A is the empty set ∅, we can consider that none of


the vectors in A are a linear combination of the others (which is a bit
of a vacuous condition). For this reason, we will consider that ∅ is an
independent set.

Remark 2.32. The condition


“none of the vectors in A are a linear combination of the others”
can also be written formally as:
“for every vector v in A, v ∉ span( A \ {v})14 ”.

Footnote 14: The set notation A \ B means the set of vectors that are in A
but not in B. So A \ {v} consists of all the vectors in A except for v.

Using this condition and remembering that span(∅) = {0}, we see
that Theorem 2.30 also applies to the set A = {0} (which is dependent, see
Example 2.29).

There are two key facts about dependency that are intuitively
clear, but useful enough to state formally:

1. If A is a linearly independent set of vectors in Rn , then any sub-


set of A is also linearly independent.
2. If B is a linearly dependent set of vectors in Rn then any superset
of B is also linearly dependent.

In other words, you can remove vectors from an independent set


and it remains independent and you can add vectors to a depen-
dent set and it remains dependent. In particular, any set containing
the vector 0 is dependent.

2.5 Bases

In the last few sections, we have learned that giving a spanning set
for a subspace is an effective way of describing a subspace and that
a spanning set is efficient if it is linearly independent. Therefore an
excellent way to describe or transmit, for example by computer, a
subspace is to give a linearly independent spanning set for the sub-
space. This concept is so important that it has a special name:

Definition 2.33. (Basis)


Let V be a subspace of Rn . Then a basis for V is a linearly independent
spanning set for V. In other words, a basis is a set of vectors A ⊂ Rn
such that

• V = span( A), and


• A is linearly independent.

Example 2.34. (Basis) The set A = {(1, 0, 0), (0, 1, 0)} is a basis for
the xy-plane in R3 , because it is a linearly independent set of vectors and
any vector in the xy-plane can be expressed as a linear combination of the
vectors of A.

Example 2.35. (Basis proof) Let V be the subspace of R3 defined by


V = {( x, y, z) | x − y + 2z = 0}. Then we shall show that A =
{(2, 0, −1), (1, 1, 0)} is a basis for V. We check three separate things: that
the vectors are actually in V, that they are linearly independent, and that
they are a spanning set for V.

1. Check that both vectors are actually in V.


This is true because

2 − (0) + 2(−1) = 0
1 − 1 + 2(0) = 0.

2. Check that A is linearly independent.


Theorem 2.30 shows that two vectors are linearly dependent only if
one of them is a multiple of the other. As this is not the case here, we
conclude that the set is linearly independent.
3. Check that A is a spanning set for V.
Every vector in V can be expressed as a linear combination of the two
vectors in A, because all the vectors in V are of the form
{( x, y, (y − x )/2) | x, y ∈ R} and using the techniques from Chapter 1 we
see that

( x, y, (y − x )/2) = (( x − y)/2)(2, 0, −1) + y(1, 1, 0).

Side note: Note here that we can also write V in the form
{(y − 2z, y, z) | y, z ∈ R} and then see that
(y − 2z, y, z) = −z(2, 0, −1) + y(1, 1, 0) to prove that A is a spanning set
for V. This makes the computations slightly easier.

Therefore A is a basis for V.

A subspace of Rn can have more than one basis — in fact, a


subspace usually has infinitely many different bases.15

Footnote 15: The word “bases” is the plural of “basis”.
Example 2.36. (Two different bases) Let V be the subspace of R3 defined
by V = {( x, y, z) | x − y + 2z = 0}. Then

A = {(2, 0, −1), (1, 1, 0)}


B = {(1, 3, 1), (3, 1, −1)}

are both bases for V. Proving this is left as an exercise.



The vector space R3 is itself a subspace and so has a basis. In


this case, there is one basis that stands out as being particularly
natural. It is called the standard basis and contains the three vectors

e1 = (1, 0, 0), e2 = (0, 1, 0), e3 = (0, 0, 1)

where the standard basis vectors are given the special names e1 , e2
and e3 .16 More generally, the vector space Rn has a basis consisting
of the n vectors {e1 , e2 , . . . , en } where the i-th basis vector ei is
all-zero except for a single 1 in the i-th position.

Footnote 16: In Engineering, the standard basis vectors for R3 are also
known as i, j and k respectively.
Finding a basis from scratch is straightforward, because the
technique described before Example 2.24 (and illustrated in the
example) for finding spanning sets by adding vectors one-by-one to
an independent set will automatically find a linearly independent
spanning set — in other words, a basis. In fact, the same argument
shows that you can start with any linearly independent set and
augment it vector-by-vector to obtain a basis containing the original
linearly independent set of vectors17 .

Footnote 17: We still have not yet shown that this process will actually
terminate, but will do so in the next section.

Another approach to finding a basis of a subspace is to start with
a spanning set that is linearly dependent and to remove vectors from
it one-by-one. If the set is linearly dependent then one of the vec-
tors is a linear combination of the others, and so it can be removed
from the set without altering the span of the set of vectors. This
process can be repeated until the remaining vectors are linearly
independent in which case they form a basis.

Example 2.37. (Basis from a spanning set) Let

A = {(1, 1, −2), (−2, −2, 4), (−1, −2, 3), (5, −5, 0)}

be a set of vectors in R3 , and let V = span( A). What is a basis for V


contained in A? We start by testing whether A is linearly independent by
solving the system of linear equations

λ1 (1, 1, −2) + λ2 (−2, −2, 4) + λ3 (−1, −2, 3) + λ4 (5, −5, 0) = (0, 0, 0)

to see if it has any non-trivial solutions. If so, then one of the vectors can
be expressed as a linear combination of the others and discarded. In this
case, we discover that (−2, −2, 4) = −2(1, 1, −2) and so we can throw
out (−2, −2, 4). Now we are left with

{(1, 1, −2), (−1, −2, 3), (5, −5, 0)}

and test whether this set is linearly independent. By solving

λ1 (1, 1, −2) + λ2 (−1, −2, 3) + λ3 (5, −5, 0) = (0, 0, 0)

we discover that (5, −5, 0) = 15(1, 1, −2) + 10(−1, −2, 3) and so we


can discard (5, −5, 0). Finally the remaining two vectors are linearly
independent and so the set

{(1, 1, −2), (−1, −2, 3)}

is a basis for V.
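The whittling-down process in Example 2.37 can also be carried out by row reduction: place the vectors as the columns of a matrix and keep those that correspond to pivot columns. The sketch below is an optional aside, assuming Python with sympy; the fact that the pivot columns of the original matrix span the same subspace is taken on trust here and is closely related to the column space material of Chapter 3.

import sympy as sp

vectors = [(1, 1, -2), (-2, -2, 4), (-1, -2, 3), (5, -5, 0)]
M = sp.Matrix(vectors).T          # one column per vector
_, pivots = M.rref()              # indices of the pivot columns
basis = [vectors[i] for i in pivots]
print(basis)                      # [(1, 1, -2), (-1, -2, 3)], as in Example 2.37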

2.5.1 Dimension

As previously mentioned, a subspace of Rn will usually have in-


finitely many bases. However, these bases will all share one feature
— every basis for a subspace V contains the same number of vectors.
This fact is not at all obvious, and so we will give a proof for it.
Actually we will prove a slightly more technical result that has the
result about bases, along with a number of other useful results, as
simple consequences.18 While this is a result of fundamental importance,
it uses some fiddly notation with lots of subscripts, so do not
feel alarmed if you do not understand it first time through.

Footnote 18: A result that is a straightforward consequence of a theorem is
called a “corollary”.

Theorem 2.38. Let A = {v1 , v2 , . . . , vk } be a linearly independent set


of vectors. Then any set of ℓ > k vectors contained in V = span( A) is
dependent.

Proof. Let {w1 , w2 , . . . , wℓ } be a set of ℓ > k vectors in V. Then
each of these can be expressed as a linear combination of the vectors
in A, and so there are scalars αij ∈ R, where 1 ≤ i ≤ ℓ and
1 ≤ j ≤ k such that:

w1 = α11 v1 + α12 v2 + · · · + α1k vk
w2 = α21 v1 + α22 v2 + · · · + α2k vk
...
wℓ = αℓ1 v1 + αℓ2 v2 + · · · + αℓk vk .

Now consider what happens when we test {w1 , w2 , . . . , wℓ } for
linear dependence. We try to solve the system of linear equations

β 1 w1 + β 2 w2 + · · · + β ℓ wℓ = 0        (2.7)

and determine if there are any non-trivial solutions to this system.


By replacing each wi in Equation (2.7) with the corresponding
expression as a linear combination of the vectors in A, we get a
huge equation:

0 = β 1 (α11 v1 + α12 v2 + · · · + α1k vk )
  + β 2 (α21 v1 + α22 v2 + · · · + α2k vk )
  + · · ·
  + β ℓ (αℓ1 v1 + αℓ2 v2 + · · · + αℓk vk ).        (2.8)

However, this is a linear combination of the vectors in A that is


equal to the zero vector. Because A is a linearly independent set
of vectors, this happens if and only if the coefficients of the vec-
tors vi in Equation (2.8) are all zero. In other words, the scalars
{ β 1 , β 2 , . . . , β ℓ } must satisfy the following system of linear equa-

tions:

α11 β 1 + α21 β 2 + · · · + αℓ1 β ℓ = 0
α12 β 1 + α22 β 2 + · · · + αℓ2 β ℓ = 0
...
α1k β 1 + α2k β 2 + · · · + αℓk β ℓ = 0.

This is a homogeneous19 system of linear equations, and so it is
consistent.

Footnote 19: The constant term in each equation is zero.

As we discussed in Chapter 1, Theorem 1.27 has the

important corollary that every homogeneous system of linear equa-


tions with more unknowns than equations has infinitely many
solutions. Hence there is at least one non-trivial choice of scalars
{ β 1 , β 2 , . . . , β ℓ } satisfying Equation (2.7) (and indeed there are
infinitely many such choices), thereby showing that {w1 , w2 , . . . , wℓ }
is linearly dependent.

Corollary 2.39. Every basis for a subspace V of Rn contains the same


number of vectors.

Proof. Suppose that A = {v1 , v2 , . . . , vk } and B = {w1 , w2 , . . . , wℓ }


are two bases for V. Then both A and B are linearly independent
sets of vectors and both have the same span. By Theorem 2.38, if we
had ℓ > k then B would be dependent, thus ℓ ≤ k. Now swapping
the roles of A and B (and hence swapping ℓ and k), Theorem 2.38
also implies that k ≤ ℓ. Therefore ℓ = k.

Definition 2.40. (Dimension)


The dimension of the subspace V, denoted by dim(V ), is the number
of vectors in a basis for V. This definition is not ambiguous, thanks to
Corollary 2.39.

Example 2.41. (Dimension of Rn ) The standard basis for R2 contains


two vectors, the standard basis for R3 contains three vectors and the stan-
dard basis for Rn contains n vectors, so we conclude that the dimension
of Rn is equal to n.

Example 2.42. (Dimension of a line) A line through the origin in Rn is


a subspace consisting of all the multiples of a given non-zero vector:

L = {λv | λ ∈ R}.

The set {v} containing the single vector v is a basis for L and so a line is
1-dimensional.

Exercise 2.5.1. What is the dimension of the subspace {0} of Rn ?

This shows that the formal algebraic definition of dimension


coincides with our intuitive geometric understanding of the word
dimension, which is reassuring.20

Footnote 20: Of course, we would expect this to be the case, because the
algebraic notion of “dimension” was developed as an extension of the
familiar geometric concept.

Another important corollary of Theorem 2.38 is that we are fi-


nally in a position to show that the process of finding a basis by
extending21 a linearly independent set will definitely finish.

Footnote 21: “extending” just means “adding vectors to”.
Corollary 2.43. If V is a subspace of Rn , then any linearly independent
set A of vectors in V is contained in a basis for V.

Proof. If span( A) ≠ V, then adding a vector in V \ span( A) to


A creates a strictly larger independent set of vectors. As no set
of n + 1 vectors in Rn is linearly independent, this process must
terminate in at most n steps, and when it terminates, the set is a
basis for V.

The next result seems intuitively obvious, because it just says


that dimension behaves as you would expect, in that a subspace can
only contain other subspaces if they have lower dimension.

Theorem 2.44. Suppose that S, T are subspaces of Rn and that S ⊊ T.


Then dim(S) < dim( T ).

Proof. Let BS be a basis for S. Then BS is a linearly independent set


of vectors contained in T, and so it can be extended to a basis for T.
As span( BS ) ≠ T the basis for T is strictly larger than the basis for
S and so dim(S) < dim( T ).

Corollary 2.39 has a number of important consequences. If A is a


set of vectors contained in a subspace V, then normally there is no
particular relationship between the properties “A is linearly inde-
pendent” and “A is a spanning set for V” in that A can have none,
either or both of these properties. However if A has the right size to
be a basis, then it must either have none or both of the properties.

Corollary 2.45. Let V be a k-dimensional subspace of Rn . Then

1. Any linearly independent set of k vectors of V is a basis for V.


2. Any spanning set of k vectors of V is a basis for V.

Proof. 1. Let A be a linearly independent set of k vectors in V.


By Corollary 2.43, A can be extended to a basis but by Corol-
lary 2.39 this basis contains k vectors and so no vectors can be
added to A. Therefore A is already a basis for V.
2. Let B be a spanning set of k vectors. Suppose B is not a basis,
that is, B is linearly dependent. As explained above Example
2.37, we can obtain a basis by removing vectors from B. But
then this basis would have less than k vectors, contradicting
Corollary 2.39. Therefore B is linearly independent.

Example 2.46. (Dimension of a specific plane) The subspace V =


{( x, y, z) | x + y + z = 0} is a plane through the origin in R3 . What
is its dimension? We can quickly find two vectors in V, namely (1, −1, 0)
and (1, 0, −1), and as they are not multiples of each other, the set

{(1, −1, 0), (1, 0, −1)}



is linearly independent. As R3 is 3-dimensional, any proper subspace of


R3 has dimension at most 2 by Theorem 2.44, and so this set of vectors is
a basis and V has dimension 2. Similarly any plane through 0 in R3 has
dimension 2.
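As an optional computational cross-check of Example 2.46 (assuming Python with numpy), stacking the two vectors into a matrix and computing its rank confirms that they are independent, and a quick product confirms that both satisfy x + y + z = 0.

import numpy as np

B = np.array([(1, -1, 0), (1, 0, -1)], dtype=float)
print(np.linalg.matrix_rank(B))   # 2: the two vectors are linearly independent
print(B @ np.ones(3))             # [0. 0.]: both vectors lie in the plane V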

2.5.2 Coordinates
The most important property of a basis for a subspace V is that
every vector in V can be expressed as a linear combination of the
basis vectors in exactly one way; we prove this in the next result:
Theorem 2.47. Let B = {v1 , v2 , . . . , vk } be a basis for the sub-
space V. Then for any vector v ∈ V, there is a unique choice of scalars
α1 , α2 , . . . , αk such that

v = α1 v1 + α2 v2 + · · · + αk vk .

Proof. As B is a spanning set for V, there is at least one way of ex-


pressing v as a linear combination of the basis vectors. So we just
need to show that there cannot be two different linear combinations
each equal to v. So suppose that there are scalars α1 , α2 , . . . , αk and
β 1 , β 2 , . . . , β k such that

v = α1 v1 + α2 v2 + · · · + αk vk
v = β 1 v1 + β 2 v2 + · · · + β k vk .

Subtracting these expressions and rearranging, we discover that

0 = (α1 − β 1 )v1 + (α2 − β 2 )v2 + · · · + (αk − β k )vk .

As B is a linearly independent set of vectors, the only linear com-


bination equal to 0 is the trivial linear combination with all coeffi-
cients equal to 0, and so α1 − β 1 = 0, α2 − β 2 = 0, . . ., αk − β k = 0
and so αi = β i for all i. Therefore the two linear combinations for v
are actually the same.

Definition 2.48. (Coordinates)


Let B = {v1 , v2 , . . . , vk } be a basis for the subspace V. If

v = α1 v1 + α2 v2 + · · · + αk vk ,

we call the scalars α1 , . . . , αk the coordinates of v in the basis B, and we
write

(v) B = (α1 , α2 , . . . , αk ).

Side note: This often applies to V being the full vector space Rn .

Remark 2.49. Note that when we write a vector v as ( x, y, z), this


just means that x, y, z are the coordinates in the standard basis S =
{e1 , e2 , e3 } since
( x, y, z) = xe1 + ye2 + ze3 ,
and similarly in higher dimensions. If we wish to emphasise that fact, we
sometimes write (v)S = ( x, y, z).

Example 2.50. Let V = {( x, y, z) | x + y + z = 0} be a plane through


the origin in R3 . A basis for this subspace is

B = {(1, −1, 0), (1, 0, −1)},

as seen in Example 2.46.


Therefore any vector v in V can be written in a unique way as a linear
combination of the basis vectors, for example,

v = (1, −3, 2) = 3(1, −1, 0) − 2(1, 0, −1),

hence
(v) B = (1, −3, 2) B = (3, −2).

In general, the task of finding the coordinates of a target vector


v with respect to a basis B = {v1 , v2 , . . . , vk } is that of solving a
system of linear equations:

Find α1 , α2 , . . . , αk such that v = α1 v1 + α2 v2 + · · · + αk vk .

Example 2.51. (Example 2.50 continued) Express w = (1, 4, −5) in


terms of the basis B.
Solution. We must solve: (1, 4, −5) = α1 (1, −1, 0) + α2 (1, 0, −1),
which gives us the system

α1 + α2 = 1
− α1 = 4
− α2 = −5
which has the unique solution α1 = −4 and α2 = 5.
So we found the coordinates of w: (w) B = (1, 4, −5) B = (−4, 5). The
fact that we have found a solution means that w lies in the plane V. If it
did not, then the system of equations for α1 , α2 would be inconsistent.
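Finding coordinates is again just solving a linear system, so examples like 2.50 and 2.51 are easy to check by machine. The sketch below is an optional aside, assuming Python with numpy; the helper name coords is purely illustrative.

import numpy as np

B = np.column_stack([(1, -1, 0), (1, 0, -1)])   # basis vectors of B as columns

def coords(v):
    alpha, *_ = np.linalg.lstsq(B, np.asarray(v, dtype=float), rcond=None)
    if not np.allclose(B @ alpha, v):
        raise ValueError("v does not lie in the plane V")
    return alpha

print(coords((1, -3, 2)))    # [ 3. -2.]  (Example 2.50)
print(coords((1, 4, -5)))    # [-4.  5.]  (Example 2.51)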

It is often useful to change the basis that one is working with.

Example 2.52. (Example 2.50 continued, different basis) Suppose that


instead of the basis B in the above example we chose to use another basis,
say
C = {(0, 1, −1), (1, −2, 1)}.
(As was the case for basis B, it is easy to verify that this is a basis for V).
After some calculation we can show that

(v)C = (−1, 1) , (w)C = (6, 1).

In order for us to exploit a judiciously chosen basis we would


like to have a simple way of converting from coordinates in one
basis to coordinates in another. For example, given (v) B can we find
(v)C without having to work out what v itself is? We will answer
this question in Chapter 5.
3 Matrices and determinants

This chapter introduces matrix algebra and explains the


fundamental relationships between matrices and their properties,
and the various subspaces associated with a matrix.

Before commencing this chapter, students should be able to:

• Solve systems of linear equations,


• Confidently identify and manipulate subspaces, including
rapidly determining spanning sets and bases for subspaces,
and
• Add and multiply matrices.

After completing this chapter, students will be able to:

• Understand the operations of matrix algebra and identify the


similarities and differences between matrix algebra and the alge-
bra of real numbers,
• Describe, and find bases for, the row space, column space and
null space of a matrix,
• Find the rank and nullity of a matrix and understand how they
are related by the rank-nullity theorem, and
• Compute determinants and understand the relationship between
determinants, rank and invertibility of matrices.

An m × n matrix is a rectangular array of numbers with m rows


and n columns.

3.1 Matrix algebra

In this section we consider the algebra of matrices — that is, the sys-
tem of mathematical operations such as addition, multiplication,
inverses and so on, where the operands1 are matrices, rather than
numbers.

Footnote 1: This is mathematical terminology for “the objects being operated
on”.

In isolation, the basic operations are all familiar from

high school — in other words, adding two matrices or multiplying


two matrices should be familiar to everyone— but matrix algebra is
primarily concerned with the relationships between the operations.

3.1.1 Basic operations


The basic operations for matrix algebra are matrix addition, matrix
multiplication , matrix transposition and scalar multiplication. For com-
pleteness, we give the formal definitions of these operations:

Definition 3.1. (Matrix operations)


The basic matrix operations are matrix addition, matrix multiplication,
matrix transposition and scalar multiplication, which are defined as
follows:
Matrix addition: Let A = ( aij ) and B = (bij ) be two m × n matrices.
Then their sum C = A + B is the m × n matrix defined by

cij = aij + bij .

Side note: A = ( aij ) is a notation which means that the entry in the i-th
row and j-th column (the (i, j)-entry, for short) of the matrix A is the
real number aij .

Matrix multiplication: Let A = ( aij ) be an m × p matrix, and B =


(bij ) be a p × n matrix. Then their product C = AB is the m × n matrix
defined by
cij = ∑_{k=1}^{p} aik bkj .

Matrix transposition: Let A = ( aij ) be an m × n matrix, Then the


transpose C = A T of A is the n × m matrix defined by

cij = a ji .

Scalar multiplication: Let A = ( aij ) be an m × n matrix, and α ∈ R be


a scalar. Then the scalar multiple C = αA is the m × n matrix defined by

cij = αaij .
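All four operations are built into numerical software, so they are easy to experiment with. The following sketch is an optional aside, assuming Python with numpy, applied to two small matrices chosen purely for illustration.

import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])

print(A + B)      # matrix addition: entrywise sums
print(A @ B)      # matrix multiplication: row-by-column sums
print(A.T)        # matrix transposition: rows and columns swapped
print(3 * A)      # scalar multiplication: every entry multiplied by 3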

The properties of these operations are mostly obvious but,


again for completeness, we list them all and give them their for-
mal names.
Theorem 3.2. If A, B and C are matrices and α, β are scalars then,
whenever the relevant operations are defined, the following properties hold:
1. A + B = B + A (matrix addition is commutative)
2. ( A + B) + C = A + ( B + C ) (matrix addition is associative)
3. α( A + B) = αA + αB
4. (α + β) A = αA + βA
5. (αβ) A = α( βA)
6. A( BC ) = ( AB)C (matrix multiplication is associative)
7. (αA) B = α( AB) and A(αB) = α( AB)
8. A( B + C ) = AB + AC (multiplication is left-distributive over
addition)
9. ( A + B)C = AC + BC (multiplication is right-distributive over
addition)

10. ( A T ) T = A
11. ( A + B) T = A T + B T
12. ( AB) T = B T A T

Proof. All of these can be proved by elementary algebraic manip-


ulation of the expressions for (i, j)-entry of the matrices on both
sides of each equation. We omit the proofs because they are slightly
tedious and not very illuminating.2

Footnote 2: However it may be worth your while working through one of them,
say the proof that matrix multiplication is associative, to convince
yourself that you can do it.

Almost all of the properties in Theorem 3.2 are unsurprising and
essentially mirror the properties of the algebra of real numbers.3

Footnote 3: A 1 × 1 matrix can be viewed as essentially identical to a real
number, and so this is also not surprising.

Probably the only property in the list that is not immediately “obvious”
is Property (12) stating that the transpose of a matrix product
is the product of the matrix transposes in reverse order:

( AB)T = B T A T .

This property extends to longer products of matrices, for example


we can find the transpose of ABC as follows:4 4
A cautious reader might — correctly
— object that ABC is not a legiti-
mate expression in matrix algebra,
( ABC )T = (( AB)C )T = C T ( AB)T = C T ( B T A T ) = C T B T A T .
because we have only defined ma-
trix multiplication to be a product
It is a nice test of your ability to structure a proof by induction to of two matrices. To make this a legal
prove formally that expression in matrix algebra, it really
needs to be parenthesised, and so we
should use either A( BC ) or ( AB)C.
( A1 A2 · · · An−1 An )T = AnT AnT−1 · · · A2T A1T . However because matrix multiplication
is associative, these evaluate to the
However, rather than considering the obvious properties that same matrix and so, by convention,
as it does not matter which way the
are in the list, it is more instructive to consider the most obvious product is parenthesised, we omit the
omission from the list; in other words, an important property that parentheses altogether.
matrix algebra does not share with the algebra of real numbers. This
is the property of commutativity of multiplication because, while
the multiplication of real numbers is commutative, it is easy to
check by example that matrix multiplication is not commutative. For
example,
" #" # " #
1 1 3 −1 3 3
=
2 1 0 4 6 2

but " #" # " #


3 −1 1 1 1 2
= .
0 4 2 1 8 4

If A and B are two specific matrices, then it might be the case that
AB = BA, in which case the two matrices are said to commute, but
usually it will be the case that AB ≠ BA. This is a key difference
between matrix algebra and the algebra of real numbers.
There are some other key differences worth delving into: in real
algebra, the numbers 0 and 1 play special roles, being the additive
identity and multiplicative identity respectively. In other words, for
any real number x ∈ R we have

x+0 = 0+x = x and 1.x = x.1 = x



and
0.x = x.0 = 0. (3.1)
In the algebra of square matrices (that is, n × n matrices for some
n) we can analogously find an additive identity and a multiplicative
identity. The additive identity is the matrix On with every entry equal
to zero, and it is obvious that for any n × n matrix A,
A + On = On + A = A.

The multiplicative identity is the matrix In where every entry on


the main diagonal5 is equal to one, and every entry off the main
diagonal is equal to zero. Then for any n × n matrix A,

AIn = In A = A.

Footnote 5: The main diagonal consists of the (1, 1), (2, 2), . . ., (n, n)
positions. As an example,

     [ 1 0 0 ]
I3 = [ 0 1 0 ] .
     [ 0 0 1 ]

When the size of the matrices is unspecified, or irrelevant, we will
often drop the subscript and just use O and I respectively. As the
terms “additive/multiplicative identity" are rather cumbersome, the
matrix O is usually called the zero matrix and the matrix I is usually
called the identity matrix or just the identity.
The property Equation (3.1) relating multiplication and zero also
holds in matrix algebra, because it is clear that
AO = OA = O
for any square matrix A. However, there are other important prop-
erties of real algebra that are not shared by matrix algebra. In par-
ticular, in real algebra there are no non-zero zero divisors, so that if
xy = 0 then at least one of x and y is equal to zero. However this is
not true for matrices — there are products equal to the zero matrix
even if neither matrix is zero. For example,
" #" # " #
1 1 2 −1 0 0
= .
2 2 −2 1 0 0
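Both differences are easy to see numerically. The sketch below is an optional aside, assuming Python with numpy; it reproduces the two products displayed above and the zero-divisor example.

import numpy as np

A = np.array([[1, 1], [2, 1]])
B = np.array([[3, -1], [0, 4]])
print(A @ B)          # [[3 3] [6 2]]
print(B @ A)          # [[1 2] [8 4]] -- different, so A and B do not commute

C = np.array([[1, 1], [2, 2]])
D = np.array([[2, -1], [-2, 1]])
print(C @ D)          # [[0 0] [0 0]] -- a product of non-zero matrices equal to O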

Definition 3.3. (A menagerie of square matrices)

Suppose that A = ( aij ) is an n × n matrix. Then


• A is the zero matrix if aij = 0 for all i, j.
• A is the identity matrix if aij = 1 if i = j and 0 otherwise.
• A is a symmetric matrix if A T = A.
• A is a skew-symmetric matrix if A T = − A.
• A is a diagonal matrix if aij = 0 for all i ≠ j.
• A is an upper-triangular matrix if aij = 0 for all i > j.
• A is a lower-triangular matrix if aij = 0 for all i < j.
• A is an idempotent matrix if A2 = A, where A2 = AA is the
product of A with itself.
• A is a nilpotent matrix if Ak = O for some k, where Ak = AA · · · A (the
product of k copies of A).

Example 3.4. (Matrices of various types) Consider the following 3 × 3


matrices:

    [ 1 0 −1 ]        [ −1 0 0 ]        [ −1 0 1 ]
A = [ 0 2  3 ]    B = [  0 2 0 ]    C = [  0 2 1 ] .
    [ 0 0  1 ]        [  0 0 0 ]        [  1 1 3 ]

Then A is an upper-triangular matrix, B is a diagonal matrix and C is


a symmetric matrix.

Example 3.5. (Nilpotent matrix) The matrix


" #
2 4
A=
−1 −2

is nilpotent because
" #" # " #
2 4 2 4 0 0
A2 = = .
−1 −2 −1 −2 0 0
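Properties like these are straightforward to test numerically. The sketch below is an optional aside, assuming Python with numpy; it checks the symmetric matrix C of Example 3.4 and the nilpotent matrix A of Example 3.5.

import numpy as np

C = np.array([[-1, 0, 1], [0, 2, 1], [1, 1, 3]])
print(np.array_equal(C, C.T))                     # True: C is symmetric

A = np.array([[2, 4], [-1, -2]])
print(np.array_equal(A @ A, np.zeros((2, 2))))    # True: A^2 = O, so A is nilpotent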

3.2 Subspaces from matrices

There are various vector subspaces associated with a matrix, and


there are useful relationships between the properties of the matrices
and subspaces. The three principal subspaces associated with a
matrix are the row space, the column space and the null space of a
matrix. The first two of these are defined analogously to each other,
while the third is somewhat different. If A is an m × n matrix,
then each of the rows of the matrix can be viewed as a vector in Rn ,
while each of the columns of the matrix can be viewed as a vector in
Rm .

3.2.1 The row space and column space

Definition 3.6. (Row and column space)


Let A be an m × n matrix. Then the row space and column space are
defined as follows:
Row space The row space of A is the subspace of Rn that is spanned
by the rows of A. In other words, the row space is the subspace con-
sisting of all the linear combinations of the rows of A. We denote it by
rowsp( A).
Column space The column space of A is the subspace of Rm that is
spanned by the columns of A. In other words, the column space is the
subspace consisting of all the linear combinations of the columns of A. We
denote it by colsp( A).

Example 3.7. (Row space and column space) Let A be the 3 × 4 matrix

    [ 1  0 1 −1 ]
A = [ 2 −1 2  0 ] .        (3.2)
    [ 1  0 2  1 ]

Then the row space of A is the subspace of R4 defined by

rowsp( A) = span({(1, 0, 1, −1), (2, −1, 2, 0), (1, 0, 2, 1)}),

while the column space of A is the subspace of R3 defined by

colsp( A) = span({(1, 2, 1), (0, −1, 0), (1, 2, 2), (−1, 0, 1)}).

(Remember the conventions regarding row and column vectors described


in Chapter 2.)

As described in Chapter 2, it is easy to answer any particular


question about a subspace if you know a spanning set for that
subspace. In particular, it is easy to determine whether a given
vector is in the row or column space of a matrix just by setting up
the appropriate system of linear equations.

Example 3.8. (Vector in row space) Is the vector (2, 1, −1, 3) in the row
space of the matrix A shown in Equation (3.2) above? This question is
equivalent to asking whether there are scalars λ1 , λ2 and λ3 such that

λ1 (1, 0, 1, −1) + λ2 (2, −1, 2, 0) + λ3 (1, 0, 2, 1) = (2, 1, −1, 3).

By considering each of the four coordinates in turn, this corresponds to the


following system of four linear equations in the three variables:

λ1 + 2λ2 + λ3 = 2
− λ2 = 1
λ1 + 2λ2 + 2λ3 = −1
− λ1 + λ3 = 3.

The augmented matrix for this system is

1 2 1 2
 
 0 −1 0 1 
 
 1 2 2 −1
 

−1 0 1 3

which, after row-reduction, becomes

[ 1  2  1 |  2 ]
[ 0 −1  0 |  1 ]
[ 0  0  1 | −3 ]
[ 0  0  0 | 13 ]

and so the system is inconsistent and we conclude that (2, 1, −1, 3) ∉


rowsp( A). Notice that the part on the left of the augmenting bar in the
augmented matrix of the system of linear equations is the transpose of the
original matrix.

Example 3.9. (Vector in column space) Is the vector (1, −1, 2) in the col-
umn space of the matrix A shown in Equation (3.2) above? This question
is equivalent to asking whether there are scalars λ1 , λ2 , λ3 and λ4 such
that

λ1 (1, 2, 1) + λ2 (0, −1, 0) + λ3 (1, 2, 2) + λ4 (−1, 0, 1) = (1, −1, 2).

By considering each of the three coordinate positions in turn, this corre-


sponds to a system of three equations in the four variables:

λ1 + λ3 − λ4 = 1
2λ1 − λ2 + 2λ3 = −1
λ1 + 2λ3 + λ4 = 2.
The augmented matrix for this system is
$$\left[\begin{array}{cccc|c} 1 & 0 & 1 & -1 & 1 \\ 2 & -1 & 2 & 0 & -1 \\ 1 & 0 & 2 & 1 & 2 \end{array}\right]$$
which, after row reduction, becomes
$$\left[\begin{array}{cccc|c} 1 & 0 & 1 & -1 & 1 \\ 0 & -1 & 0 & 2 & -3 \\ 0 & 0 & 1 & 2 & 1 \end{array}\right]$$

and so this system of linear equations has three basic variables, one free
parameter and therefore infinitely many solutions. So we conclude that
(1, −1, 2) ∈ colsp( A). We could, if necessary, or just to check, find a
particular solution to this system of equations. For example, if we set the
free parameter λ4 = 1 then the corresponding solution is λ1 = 3, λ2 = 5,
λ3 = −1 and λ4 = 1 and we can check that

3(1, 2, 1) + 5(0, −1, 0) − (1, 2, 2) + (−1, 0, 1) = (1, −1, 2).

Notice that in this case, the part on the left of the augmenting bar in the
augmented matrix of the system of linear equations is just the original
matrix itself.

In the previous two examples, the original question led to systems of linear equations in which the matrix on the left of the augmenting bar was either the transpose of the original matrix or the original matrix itself.
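The consistency check behind Examples 3.8 and 3.9 can be automated: a system is consistent exactly when the coefficient matrix and the augmented matrix have the same rank. The sketch below (Python with NumPy; the helper name in_span is ours, not standard) tests membership in the row space and the column space this way.

```python
import numpy as np

A = np.array([[1, 0, 1, -1],
              [2, -1, 2, 0],
              [1, 0, 2, 1]])      # the matrix from Equation (3.2)

def in_span(vectors, target):
    """Is target a linear combination of the rows of `vectors`?
    Equivalent to asking whether the system (vectors^T) lambda = target is consistent."""
    M = vectors.T
    aug = np.column_stack([M, target])
    return np.linalg.matrix_rank(M) == np.linalg.matrix_rank(aug)

print(in_span(A, np.array([2, 1, -1, 3])))    # False: (2, 1, -1, 3) is not in rowsp(A)
print(in_span(A.T, np.array([1, -1, 2])))     # True:  (1, -1, 2) is in colsp(A)
```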
In addition to being able to identify whether particular vectors
are in the row space or column space of a matrix, we would also
like to be able to find a basis for the subspace and thereby deter-
mine its dimension. In Chapter 2 we described a technique where
any spanning set for a subspace can be reduced to a basis by suc-
cessively throwing out vectors that are linear combinations of the
others. While this technique works perfectly well for determining
the dimension of the row space or column space of a matrix, there
is an alternative approach based on two simple observations:

1. Performing elementary row operations on a matrix does not change its row space. (However, elementary row operations do change the column space!)

2. The non-zero rows of a matrix in row-echelon form are linearly independent, and therefore form a basis for the row space of that matrix. (Reduced row-echelon form has the same property but will yield a different basis.)

The consequence of these two facts is that it is very easy to find a basis for the row space of a matrix: simply put it into row-echelon form using Gaussian elimination and then write down the non-zero rows that are found. However the basis that is found by this process will not usually be a subset of the original rows of the matrix. If it is necessary to find a basis for the row space whose vectors are all original rows of the matrix, then the technique described above can be used.

Example 3.10. (Basis of row space) Consider the problem of finding a basis for the row space of the 4 × 5 matrix
$$A = \begin{bmatrix} 1 & 2 & -1 & -1 & 4 \\ 0 & 0 & 0 & 0 & 0 \\ 1 & -1 & 1 & 0 & 1 \\ 3 & 0 & 1 & -1 & 6 \end{bmatrix}.$$

After performing the elementary row operations R3 ← R3 − R1 , R4 ← R4 − 3R1 , then R2 ↔ R3 and finally R4 ← R4 − 2R2 , we end up with the following matrix, which we denote A′, which is in row-echelon form.
$$A' = \begin{bmatrix} 1 & 2 & -1 & -1 & 4 \\ 0 & -3 & 2 & 1 & -3 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}.$$
The key point is that the elementary row operations have not changed the row space of the matrix in any way and so rowsp(A) = rowsp(A′). However it is obvious that the two non-zero rows

{(1, 2, −1, −1, 4), (0, −3, 2, 1, −3)}

are a basis for the row space of A′, and so they are also a basis for the row space of A.

To find a basis for the column space of the matrix A, we cannot


do elementary row operations because they alter the column space.
However it is clear that colsp( A) = rowsp( A T ), and so just trans-
posing the matrix and then performing the same procedure will
find a basis for the column space of A.

Example 3.11. (Basis of column space) What is a basis for the column space of the matrix A of Example 3.10? We first transpose the matrix, getting
$$A^T = \begin{bmatrix} 1 & 0 & 1 & 3 \\ 2 & 0 & -1 & 0 \\ -1 & 0 & 1 & 1 \\ -1 & 0 & 0 & -1 \\ 4 & 0 & 1 & 6 \end{bmatrix}$$

and then perform Gaussian elimination to obtain the row-echelon matrix
$$\begin{bmatrix} 1 & 0 & 1 & 3 \\ 0 & 0 & -3 & -6 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$

whose row space has basis {(1, 0, 1, 3), (0, 0, −3, −6)}. Therefore these
two vectors are a basis for the column space of A.
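If you want to check such a calculation, a computer algebra system will do the row reduction exactly; here is a small sketch using Python with SymPy (our choice of tool; the bases it returns may differ from the ones found by hand, but they span the same subspaces).

```python
from sympy import Matrix

A = Matrix([[1,  2, -1, -1, 4],
            [0,  0,  0,  0, 0],
            [1, -1,  1,  0, 1],
            [3,  0,  1, -1, 6]])   # the matrix of Examples 3.10 and 3.11

print(A.echelon_form())   # its non-zero rows give a basis for rowsp(A)
print(A.rowspace())       # a basis for the row space, returned directly
print(A.columnspace())    # a basis for the column space; both bases have
                          # two vectors, so rank(A) = 2
```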

What can be said about the dimension of the row space and col-
umn space of a matrix? In the previous two examples, we found
that the row space of the matrix A is a 2-dimensional subspace of
R5 , and the column space of A is a 2-dimensional subspace of R4 .
In particular, even though they are subspaces of different ambient
vector spaces, the dimensions of the row space and column space
turn out to be equal. This is not an accident, and in fact we have the
following surprising result:

Theorem 3.12. Let A be an m × n matrix. Then the dimension of its row


space is equal to the dimension of its column space.

Proof. Suppose that {v1 , v2 , . . . , vk } is a basis for the column space


of A. Then each column of A can be expressed as a linear combi-
nation of these vectors; suppose that the j-th column c j is given
by
c j = γ1j v1 + γ2j v2 + · · · + γkj vk .

Now form two matrices as follows: B is an m × k matrix whose


columns are the basis vectors vj , while C = (γij ) is a k × n matrix
whose j-th column contains the coefficients γ1j , γ2j , . . ., γkj . It then
follows that A = BC. (You may have to try this out with a few small matrices first to see why this is true. It is not difficult when you see a small example, but it is not immediately obvious either.)
However we can also view the product A = BC as expressing the rows of A as linear combinations of the rows of C, with the i-th row of B giving the coefficients for the linear combination that determines the i-th row of A. Therefore the rows of C are a spanning
set for the row space of A, and so the dimension of the row space of
A is at most k. We conclude that

dim(rowsp( A)) ≤ dim(colsp( A)).

Applying the same argument to A T we also conclude that

dim(colsp( A)) ≤ dim(rowsp( A))

and hence these values are equal.

This number — the common dimension of the row space and


column space of a matrix — is an important property of a matrix
and has a special name:

Definition 3.13. (Matrix Rank)


The dimension of the row space (and column space) of a matrix is called
the rank of the matrix. If an n × n matrix has rank n, then the matrix is
said to be full rank.

There is a useful characterisation of the row and column spaces


of a matrix that is sufficiently important to state separately.

Theorem 3.14. If A is an m × n matrix, then the set of vectors

{ Ax | x ∈ Rn }

is equal to the column space of A, while the set of vectors

{ yA | y ∈ Rm }

is equal to the row space of A. (Here x is a vector seen as a column vector, that is, as an (n × 1)-matrix, and y is a vector seen as a row vector, that is, as a (1 × m)-matrix.)

Proof. Suppose that {c1 , c2 , . . . , cn } ∈ Rm are the n columns of


A. Then if x = ( x1 , x2 , . . . , xn ), it is easy to see that Ax = x1 c1 +
x2 c2 + . . . + xn cn . Therefore every vector of the form Ax is a linear
combination of the columns of A, and every linear combination of
the columns of A can be obtained by multiplying A by a suitable
vector. A similar argument applies for the row space of A.
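The identity used in this proof, that Ax is the linear combination x1 c1 + x2 c2 + · · · + xn cn of the columns of A, is easy to test numerically. A small sketch (Python with NumPy; the particular x is the solution found in Example 3.9):

```python
import numpy as np

A = np.array([[1, 0, 1, -1],
              [2, -1, 2, 0],
              [1, 0, 2, 1]])
x = np.array([3.0, 5.0, -1.0, 1.0])

# x1*c1 + x2*c2 + ... + xn*cn, built column by column
combo = sum(x[j] * A[:, j] for j in range(A.shape[1]))

print(A @ x)                       # (1, -1, 2), as in Example 3.9
print(np.allclose(A @ x, combo))   # True: A x is that combination of the columns
```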

3.2.2 The null space


Suppose that A is an m × n matrix. Obviously, for 0 ∈ Rn , A0 = 0 (note that this zero vector belongs to Rm ).
Moreover if two vectors v1 , v2 of Rn have the property that Av1 = 0
and Av2 = 0, then simple manipulation shows that

A(v1 + v2 ) = Av1 + Av2 = 0 + 0 = 0

and for any λ ∈ R,

A(λv1 ) = λAv1 = λ0 = 0.

Therefore the set of vectors v with the property that Av = 0 con-


tains the zero vector, is closed under vector addition and scalar
multiplication, and therefore it satisfies the requirements to be a
subspace of Rn .

Definition 3.15. (Null space)


Let A be an m × n matrix. The set of vectors

{v ∈ Rn | Av = 0}

is a subspace of Rn called the null space of A and denoted by nullsp( A).



Example 3.16. (Null space) Is the vector v = (0, 1, −1, 2) in the null space of the matrix
$$A = \begin{bmatrix} 1 & 2 & 2 & 0 \\ 3 & 0 & 2 & 1 \end{bmatrix}?$$
All that is needed is to check Av and see what arises. As
$$\begin{bmatrix} 1 & 2 & 2 & 0 \\ 3 & 0 & 2 & 1 \end{bmatrix}\begin{bmatrix} 0 \\ 1 \\ -1 \\ 2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$
it follows that v ∈ nullsp(A).
This shows that testing membership of the null space of a matrix
is a very easy task. What about finding a basis for the null space of
a matrix? This turns out to be intimately related to the techniques we used in Chapter 1 to solve systems of linear equations. (No surprise here!)
So, suppose we wish to find a basis for the nullspace of the matrix
$$A = \begin{bmatrix} 1 & 2 & 2 & 0 \\ 3 & 0 & 2 & 1 \end{bmatrix}$$
from Example 3.16. The matrix equation Ax = 0 yields the follow-
ing system of linear equations
x1 + 2x2 + 2x3 = 0
3x1 + 2x3 + x4 = 0
which has augmented matrix
$$\left[\begin{array}{cccc|c} 1 & 2 & 2 & 0 & 0 \\ 3 & 0 & 2 & 1 & 0 \end{array}\right].$$
Applying the Gauss-Jordan algorithm, we perform the elementary row operations R2 ← R2 − 3R1 , R2 ← −(1/6) R2 , R1 ← R1 − 2R2 :
$$\left[\begin{array}{cccc|c} 1 & 2 & 2 & 0 & 0 \\ 0 & -6 & -4 & 1 & 0 \end{array}\right], \quad \left[\begin{array}{cccc|c} 1 & 2 & 2 & 0 & 0 \\ 0 & 1 & 2/3 & -1/6 & 0 \end{array}\right], \quad \left[\begin{array}{cccc|c} 1 & 0 & 2/3 & 1/3 & 0 \\ 0 & 1 & 2/3 & -1/6 & 0 \end{array}\right].$$
The last matrix is in reduced row echelon form.
Therefore x3 and x4 are free parameters and we directly get x1 = −(1/3)(2x3 + x4 ) and x2 = (1/6)( x4 − 4x3 ). Thus, following the techniques of Chapter 1 we can describe the solution set as
$$S = \left\{ \left( -\tfrac{1}{3}(2x_3 + x_4),\ \tfrac{1}{6}(x_4 - 4x_3),\ x_3,\ x_4 \right) \ \middle|\ x_3, x_4 \in \mathbb{R} \right\}.$$
In order to find a basis for S notice that we can rewrite the solution as a linear combination of vectors by separating out the terms involving x3 from the terms involving x4 :
$$\left( -\tfrac{1}{3}(2x_3 + x_4),\ \tfrac{1}{6}(x_4 - 4x_3),\ x_3,\ x_4 \right) = \left( -\tfrac{2}{3}x_3, -\tfrac{4}{6}x_3, x_3, 0 \right) + \left( -\tfrac{1}{3}x_4, \tfrac{1}{6}x_4, 0, x_4 \right) = x_3\left( -\tfrac{2}{3}, -\tfrac{2}{3}, 1, 0 \right) + x_4\left( -\tfrac{1}{3}, \tfrac{1}{6}, 0, 1 \right).$$

Therefore we can express the solution set S as follows:
$$S = \left\{ x_3\left( -\tfrac{2}{3}, -\tfrac{2}{3}, 1, 0 \right) + x_4\left( -\tfrac{1}{3}, \tfrac{1}{6}, 0, 1 \right) \ \middle|\ x_3, x_4 \in \mathbb{R} \right\}.$$
However this immediately tells us that S just consists of all the linear combinations of the two vectors (−2/3, −2/3, 1, 0) and (−1/3, 1/6, 0, 1),
and therefore we have found, almost by accident, a spanning set for
the subspace S. It is immediate that these two vectors are linearly
independent and therefore they form a basis for the null space of A.

Remark 3.17. The astute student will notice that after just one elemen-
tary row operation, the matrix would be in reduced row echelon form if the
fourth column was the second column. Therefore we can stop calculations
there and take x1 and x4 as the basic variables. We immediately get

S = {(−2x2 − 2x3 , x2 , x3 , 6x2 + 4x3 )| x2 , x3 ∈ R} .

Therefore we get that (−2, 1, 0, 6) and (−2, 0, 1, 4) also form a (simpler) basis for the null space of A. The lesson to take from this is that we can sometimes find the nullspace quicker by not following the Gauss-Jordan algorithm blindly, and remembering that we can take variables other than the ones given by the leading entries in the (reduced) row echelon form as the basic ones. (In this example, you could also perform the operation R1 ← (1/2) R1 and then take x2 , x4 as the basic variables.)

In general, this process will always find a basis for the null space
of a matrix. If the set of solutions to the system of linear equations
has s free parameters, then it can be expressed as a linear combina-
tion of s vectors. These s vectors will always be linearly independent
because in each of the s coordinate positions corresponding to the
free parameters, just one of the s vectors will have a non-zero entry.
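As a check on the hand computation, SymPy will return a null space basis directly, one basis vector per free parameter; a brief sketch (our own check, not part of the worked example):

```python
from sympy import Matrix

A = Matrix([[1, 2, 2, 0],
            [3, 0, 2, 1]])        # the matrix from Example 3.16

for v in A.nullspace():           # one basis vector per free parameter
    print(v.T)

# The simpler basis from Remark 3.17 also lies in nullsp(A):
for v in (Matrix([-2, 1, 0, 6]), Matrix([-2, 0, 1, 4])):
    print((A * v).T)              # both products are the zero vector
```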

Definition 3.18. (Nullity)


The dimension of the null space of a matrix A is called the nullity of A.

We close this section with one of the most important results in elementary linear algebra, universally called the Rank-Nullity Theorem; it has a surprisingly simple proof.

Theorem 3.19. Suppose that A is an m × n matrix. Then

rank( A) + nullity( A) = n.

Proof. Consider the system of linear equations Ax = 0, which is


a system of m equations in n unknowns. This system is solved by
applying Gaussian elimination to the augmented matrix [ A | 0],
thereby obtaining the matrix [ A′ | 0] in row-echelon form. The rank of A is equal to the number of non-zero rows of A′, which is equal
to the number of basic variables in the system of linear equations.
The nullity of A is the number of free parameters in the solution

set to the system of linear equations and so it is equal to the num-


ber of non-basic variables. So the rank of A plus the nullity of A is
equal to the number of basic variables plus the number of non-basic
variables. As each of the n variables is either basic or non-basic, the
result follows.

Given a matrix, it is important to be able to put all the tech-


niques together and to determine the rank, the nullity, a basis for
the null space and a basis for the row space of a given matrix. This
is demonstrated in the next example.

Example 3.20. (Rank and nullity) Find the rank, nullity and bases for the row space and null space for the following 4 × 4 matrix:
$$A = \begin{bmatrix} 1 & 0 & 2 & 1 \\ 3 & 1 & 3 & 3 \\ 2 & 1 & 1 & 0 \\ 2 & 1 & 1 & 2 \end{bmatrix}.$$
All of the questions can be answered once the matrix is in reduced row-echelon form, and so the first task is to apply Gauss-Jordan elimination, which will result in the following matrix:
$$A' = \begin{bmatrix} 1 & 0 & 2 & 0 \\ 0 & 1 & -3 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}.$$

The matrix has 3 non-zero rows and so the rank of A is equal to 3. These
non-zero rows form a basis for the rowspace of A and so a basis for the
rowspace of A is

{(1, 0, 2, 0), (0, 1, −3, 0), (0, 0, 0, 1)}.

By the Rank-Nullity theorem, we immediately know that the nullity of


A is the difference between the number of columns (4) and the rank (3) so
is equal to 1. We now determine a basis for the null space.
The null space is the set of solutions to the matrix equation Ax = 0,
and solving this equation by performing Gauss-Jordan elimination on the
augmented matrix [ A | 0] would yield the augmented matrix [ A′ | 0]. (In other words, the Gauss-Jordan elimination part only needs to be done once, because everything depends only on the form of the matrix A′. However it is important to remember that although it is the same matrix, we are using it in two quite distinct ways. This distinction is often missed by students studying linear algebra for the first time.)
So given the augmented matrix
$$\left[\begin{array}{cccc|c} 1 & 0 & 2 & 0 & 0 \\ 0 & 1 & -3 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right]$$
we see that x1 , x2 and x4 are basic variables, while the solution space has
x3 as its only free parameter. Expressing the basic variables in terms of the
free parameter, we determine that the solution set is

S = {(−2x3 , 3x3 , x3 , 0) | x3 ∈ R}

and it is clear that a basis for this is {(−2, 3, 1, 0)}, which confirms that
the nullity of A is equal to 1.
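The whole of Example 3.20 can be confirmed in a few lines of SymPy (a sketch; it reproduces the rank, the nullity, the reduced row-echelon form and the null space basis):

```python
from sympy import Matrix

A = Matrix([[1, 0, 2, 1],
            [3, 1, 3, 3],
            [2, 1, 1, 0],
            [2, 1, 1, 2]])        # the matrix from Example 3.20

rank = A.rank()
nullity = len(A.nullspace())
print(rank, nullity, rank + nullity == A.cols)   # 3 1 True  (Rank-Nullity)
print(A.rref()[0])                               # the reduced row-echelon form A'
print(A.nullspace())                             # the basis vector (-2, 3, 1, 0)
```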

3.3 Solving systems of linear equations

We have seen that the set of all solutions to the system of linear
equations Ax = 0 is the nullspace of A. What can we say about the
set of solutions of
Ax = b (3.3)

when b ≠ 0?
Suppose that we know one solution x1 and that v lies in the
nullspace of A. Then

A( x1 + v) = Ax1 + Av = b + 0 = b.

Hence if we are given one solution we can create many more by


simply adding elements of the nullspace of A.
Moreover, given any two solutions x1 and x2 of Equation (3.3) we
have that
A( x2 − x1 ) = Ax2 − Ax1 = b − b = 0

and so x2 − x1 lies in the nullspace of A. In particular, every solu-


tion of Ax = b is of the form x1 + v for some v ∈ nullsp( A). This is
so important that we state it as a theorem.

Theorem 3.21. Let Ax = b be a system of linear equations and let x1 be


one solution. Then the set of all solutions is

S = { x1 + v | v ∈ nullsp( A)}.

With a slight abuse of notation, we can write the solution set S as


x1 + nullsp( A). If we know a basis {v1 , . . . , vk } for the null space of
A, we can even write S = x1 + span(v1 , . . . , vk ).
Consider for instance the solution set for Example 1.25:

S = {(4x4 + 3/2, −4x4 + 1, −4x4 − 3, x4 )| x4 ∈ R} .

Here our solution x1 is (3/2, 1, −3, 0) and the null space is

{(4x4 , −4x4 , −4x4 , x4 ) | x4 ∈ R} ,

so we can write S = (3/2, 1, −3, 0) + span({(4, −4, −4, 1)}). This is


a convenient and efficient way to write the solution set to a system.
This corollary follows from the theorem.

Corollary 3.22. The number of free parameters required for the set of
solutions of Ax = b is the nullity of A.
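Theorem 3.21 is easy to see numerically: find one particular solution, add any null space vector, and the result still solves the system. The sketch below uses the matrix A of Example 3.16 together with a right-hand side b = (1, 2) that we have made up purely for illustration.

```python
import numpy as np

A = np.array([[1.0, 2.0, 2.0, 0.0],
              [3.0, 0.0, 2.0, 1.0]])
b = np.array([1.0, 2.0])                     # hypothetical right-hand side

x1, *_ = np.linalg.lstsq(A, b, rcond=None)   # one particular solution of Ax = b
v = np.array([-2.0, 1.0, 0.0, 6.0])          # a null space vector (Remark 3.17)

for t in (0.0, 1.0, -3.5):
    print(np.allclose(A @ (x1 + t * v), b))  # True for every t
```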

3.4 Matrix inversion

In this section we consider the inverse of a matrix. The theory of


inverses in matrix algebra is far more subtle and interesting than
that of inverses in real algebra, and it is intimately related to the
ideas of independence and rank that we have explored so far.

In real algebra, every non-zero element has a multiplicative inverse; in other words, for every x ≠ 0 we can find another number, x′, such that

xx′ = x′x = 1.

Rather than calling it x′, we use the notation x−1 to mean “the number that you need to multiply x by in order to get 1”, thus getting the familiar 2−1 = 1/2.
In matrix algebra, the concept of a multiplicative identity and
hence an inverse only makes sense for square matrices. However,
even then, there are some complicating factors. In particular, be-
cause multiplication is not commutative, it is conceivable that a
product of two square matrices might be equal to the identity only
if they are multiplied in a particular order. Fortunately, this does
not actually happen:
Theorem 3.23. Suppose that A and B are square n × n matrices such
that AB = In . Then BA = In .
Proof. First we observe that B has rank equal to n. Indeed, suppose
that Bv = 0 for some vector v; then ABv = 0. Since AB = In , we get
In v = v = 0. So the null space of B contains only the zero vector,
so B has nullity 0 and therefore, by the Rank-Nullity theorem, B has
rank n.
Secondly we do some simple manipulation using the properties
of matrix algebra that were outlined in Theorem 3.2:

On = AB − In (because AB = In )
= B( AB − In ) (because BOn = On )
= BAB − B (distributivity)
= ( BA − In ) B. (distributivity)

This manipulation shows that the matrix ( BA − In ) B = On .


However because the rank of B is n, it follows that the column
space of B is the whole of Rn , and so any vector v ∈ Rn can be
expressed in the form Bx for some x by Theorem 3.14. Therefore

( BA − In )v = ( BA − In ) Bx (because B has rank n)


= On x (because ( BA − In ) B = On )
= 0. (properties of zero matrix)

Applying Theorem 3.14 to the matrix BA − In , this means the col-


umn space of BA − In is just {0}, so BA − In = On or BA = In as
required.

This theorem shows that when defining the inverse of a matrix,


we don’t need to worry about the order in which the multiplication
occurs. (In some text-books, the authors introduce the idea that B is the “left-inverse” of A if BA = I and the “right-inverse” of A if AB = I, and then immediately prove Theorem 3.23 showing that a left-inverse is a right-inverse and vice versa.)
Another property of inverses in the algebra of real numbers is that a non-zero real number has a unique inverse. Fortunately, this property also holds for matrices:

Theorem 3.24. If A, B and C are square matrices such that

AB = In and AC = In

then B = C.

Proof. The proof just proceeds by manipulation using the properties


of matrix algebra outlined in Theorem 3.2.

B = BIn (identity matrix property)


= B( AC ) (hypothesis of theorem)
= ( BA)C (associativity)
= In C (by Theorem 3.23)
= C. (identity matrix property)

Therefore a matrix has at most one inverse.

Definition 3.25. (Matrix inverse)


Let A be an n × n matrix. If there is a matrix B such that

AB = In

then B is called the inverse of A, and is denoted A−1 . From The-


orems 3.23 and 3.24, it follows that B is uniquely determined, that
BA = In , and that B−1 = A.
A matrix is called invertible if it has an inverse, and non-invertible
otherwise.

Example 3.26. (Matrix inverse) Suppose that


 
$$A = \begin{bmatrix} 1 & 0 & 1 \\ 2 & 1 & 1 \\ 2 & 0 & 1 \end{bmatrix}.$$
Then if we take
$$B = \begin{bmatrix} -1 & 0 & 1 \\ 0 & 1 & -1 \\ 2 & 0 & -1 \end{bmatrix}$$
then it is easy to check that
AB = I3 .
Therefore we conclude that A−1 exists and is equal to B and, naturally,
B−1 exists and is equal to A.

In real algebra, every non-zero number has an inverse, but this is


not the case for matrices:

Example 3.27. (Non-zero matrix with no inverse) Suppose that
$$A = \begin{bmatrix} 1 & 1 \\ 2 & 2 \end{bmatrix}.$$

Then there is no possible matrix B such that AB = I2 . Why is this? If the matrix B existed, then it would necessarily satisfy
$$\begin{bmatrix} 1 & 1 \\ 2 & 2 \end{bmatrix}\begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.$$
In order to satisfy this matrix equation, b11 + b21 must equal 1, while 2b11 + 2b21 = 2(b11 + b21 ) must equal 0, which is clearly impossible. So the matrix A has no inverse.

One of the common mistakes made by students of elementary


linear algebra is to assume that every matrix has an inverse. If an
argument or proof about a generic matrix A ever uses A−1 as part
of the manipulation, then it is necessary to first demonstrate that A
is actually invertible. Alternatively, the proof can be broken down
into two separate cases, one covering the situation where A is as-
sumed to be invertible and a separate one for where it is assumed
to be non-invertible.

3.4.1 Finding inverses


This last example of the previous section (Example 3.27) essentially
shows us how to find the inverse of a matrix because, as usual, it all
boils down to solving systems of linear equations. If A is an n × n
matrix then finding its inverse, if it exists, is just a matter of finding
a matrix B such that AB = In . To find the first column of B, it is
sufficient to solve the equation Ax = e1 , then the second column
is the solution to Ax = e2 , and so on. If any of these equations has
no solutions then A does not have an inverse.14 Therefore, finding 14
In fact, if any of them have infinitely
many solutions, then it also follows
the inverse of an n × n matrix involves solving n separate systems
that A has no inverse because if the
of linear equations. However because each of the n systems has the inverse exists it must be unique. Thus
same coefficient matrix on the left of the augmenting bar (that is, if one of the equations has infinitely
many solutions, then one of the other
A), there are shortcuts that make this procedure easier. equations must have no solutions.
To illustrate this, we do a full example for a 3 × 3 matrix, al-
though the principle is the same for any matrix. Suppose we want
to find the inverse of the matrix
 
−1 0 1
A =  0 1 −1 .
 
2 0 −1

The results above show that we just need to find a matrix B = (bij ) such that
$$\begin{bmatrix} -1 & 0 & 1 \\ 0 & 1 & -1 \\ 2 & 0 & -1 \end{bmatrix}\begin{bmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \\ b_{31} & b_{32} & b_{33} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$
This can be done by solving three separate systems of linear equations, one to determine each column of B:
$$A \begin{bmatrix} b_{11} \\ b_{21} \\ b_{31} \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad A \begin{bmatrix} b_{12} \\ b_{22} \\ b_{32} \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \quad \text{and} \quad A \begin{bmatrix} b_{13} \\ b_{23} \\ b_{33} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}.$$

Then the matrix A has an inverse if and only if all three of these
systems of linear equations have a solution, and in fact, each of
them must have a unique solution. If any one of the three equations
is inconsistent, then A is one of the matrices that just doesn’t have
an inverse.
Consider how solving these systems of linear equations will
proceed: for the first column, we get the augmented matrix
$$\left[\begin{array}{ccc|c} -1 & 0 & 1 & 1 \\ 0 & 1 & -1 & 0 \\ 2 & 0 & -1 & 0 \end{array}\right]$$
which we solved in Section 1.5 by doing the elementary row operations R3 ← R3 + 2R1 , R1 ← R1 − R3 , R2 ← R2 + R3 , R1 ← −R1 : it has solution b11 = 1, b21 = 2 and b31 = 2.
Now we solve for the second column of B; this time the augmented matrix is
$$\left[\begin{array}{ccc|c} -1 & 0 & 1 & 0 \\ 0 & 1 & -1 & 1 \\ 2 & 0 & -1 & 0 \end{array}\right]$$
and after pivoting on the (1, 1)-entry we get
$$\left[\begin{array}{ccc|c} -1 & 0 & 1 & 0 \\ 0 & 1 & -1 & 1 \\ 0 & 0 & 1 & 0 \end{array}\right] \qquad R_3 \leftarrow R_3 + 2R_1.$$
We can now use the last pivot to zero-out the rest of the third column:
$$\left[\begin{array}{ccc|c} -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{array}\right] \qquad \begin{matrix} R_1 \leftarrow R_1 - R_3 \\ R_2 \leftarrow R_2 + R_3 \end{matrix}$$
and we finish with
$$\left[\begin{array}{ccc|c} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{array}\right] \qquad R_1 \leftarrow -R_1.$$

It is immediately apparent that we used the exact same elementary


row operations as we did for the previous system of linear equations
because, naturally enough, the coefficient matrix is the same matrix.
And obviously, we’ll do the same elementary row operations again
when we solve the third system of linear equations! So to avoid
repeating work unnecessarily, it is better to solve all three systems
simultaneously. This is done by using a sort of “super-augmented”
matrix that has three columns to the right of the augmenting bar,
representing the right-hand sides of the three separate equations:
$$\left[\begin{array}{ccc|ccc} -1 & 0 & 1 & 1 & 0 & 0 \\ 0 & 1 & -1 & 0 & 1 & 0 \\ 2 & 0 & -1 & 0 & 0 & 1 \end{array}\right].$$

Then performing an elementary row operation on this bigger ma-


trix has exactly the same effect as doing it on each of the three
systems separately. We will apply Gauss-Jordan elimination (see
Section 1.5) to this “super-augmented” matrix.
 
$$\left[\begin{array}{ccc|ccc} 1 & 0 & -1 & -1 & 0 & 0 \\ 0 & 1 & -1 & 0 & 1 & 0 \\ 2 & 0 & -1 & 0 & 0 & 1 \end{array}\right] \qquad R_1 \leftarrow -R_1$$
$$\left[\begin{array}{ccc|ccc} 1 & 0 & -1 & -1 & 0 & 0 \\ 0 & 1 & -1 & 0 & 1 & 0 \\ 0 & 0 & 1 & 2 & 0 & 1 \end{array}\right] \qquad R_3 \leftarrow R_3 - 2R_1$$
$$\left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & 2 & 1 & 1 \\ 0 & 0 & 1 & 2 & 0 & 1 \end{array}\right] \qquad \begin{matrix} R_1 \leftarrow R_1 + R_3 \\ R_2 \leftarrow R_2 + R_3 \end{matrix}$$
The first system of equations has solution b11 = 1, b21 = 2 and b31 = 2, while the second has solution b12 = 0, b22 = 1 and b32 = 0, and the final system has solution b13 = 1, b23 = 1 and b33 = 1. Thus the inverse of the matrix A is given by
$$A^{-1} = \begin{bmatrix} 1 & 0 & 1 \\ 2 & 1 & 1 \\ 2 & 0 & 1 \end{bmatrix}$$
which is just exactly the matrix that was found to the right of the
augmenting bar!
Formally, the procedure for finding the inverse of a matrix is as
follows. Remember, however, that this is simply a way of organising
the calculations efficiently, and that there is nothing more sophisti-
cated occurring than solving systems of linear equations.

Key Concept 3.28. (Finding the inverse of a matrix) In order to find


the inverse of an n × n matrix A, proceed as follows:

1. Form the “super-augmented” matrix

[ A | In ].

2. Apply Gauss-Jordan elimination to this matrix to place it into


reduced row-echelon form

3. If the resulting reduced row echelon matrix has an identity matrix


to the left of the augmenting bar, then it must have the form

[ In | A−1 ]

and so A−1 will be the matrix on the right of the augmenting bar.

4. If the reduced row echelon matrix does not have an identity ma-
trix to the left of the augmenting bar, then the matrix A is not
invertible.
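Key Concept 3.28 translates almost directly into a few lines of SymPy; the following is a sketch of the procedure, not the only way to organise it.

```python
from sympy import Matrix, eye

A = Matrix([[-1, 0,  1],
            [ 0, 1, -1],
            [ 2, 0, -1]])              # the matrix inverted in the worked example

aug = A.row_join(eye(3))               # the "super-augmented" matrix [A | I3]
R, pivots = aug.rref()                 # Gauss-Jordan elimination

if R[:, :3] == eye(3):                 # identity on the left of the augmenting bar?
    A_inv = R[:, 3:]
    print(A_inv)                       # [[1, 0, 1], [2, 1, 1], [2, 0, 1]]
else:
    print("A is not invertible")
```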

It is interesting to note that, while it is important to understand


what a matrix inverse is and how to calculate a matrix inverse, it is
almost never necessary to actually find an explicit matrix inverse
in practice. An explicit problem for which a matrix inverse might
be useful can almost always be solved directly (by some form of
Gaussian elimination) without actually computing the inverse.
However, as we shall see in the next section, understanding
the procedure for calculating an inverse is useful in developing
theoretical results.

3.4.2 Characterising invertible matrices


In the last two subsections we have defined the inverse of a matrix,
demonstrated that some matrices have inverses and others don’t
and given a procedure that will either find the inverse of a matrix
or demonstrate that it does not exist. In this subsection, we consider
some of the special properties of invertible matrices focussing on
what makes them invertible, and what particular properties are
enjoyed by invertible matrices.

Theorem 3.29. An n × n matrix is invertible if and only if it has rank


equal to n.

Proof. This is so important that we give a couple of proofs in slightly different language, though the fundamental concept is the same in both proofs. (Note that in the proof of Theorem 3.23, we already saw that if B is the inverse of A, then B has full rank.)
Proof 1: Applying elementary row operations to a matrix does not
alter its row space, and hence its rank. If a matrix A is invertible,
then Gauss-Jordan elimination applied to A will yield the iden-
tity matrix, which has rank n. If A is not invertible, then applying
Gauss-Jordan elimination to A yields a matrix with at least one row
of zeros, and so it does not have rank n.
Proof 2: If a matrix A is invertible then there is always a solution
to the matrix equation
Ax = v

for every v. Indeed we can just take x = A−1 v. Thus the column
space of A, which is { Ax| x ∈ Rn } is equal to the whole of Rn , and
so the rank of A is n. Conversely, assume A has full rank. Then
the column space of A, which is { Ax| x ∈ Rn }, has dimension
n so is equal to Rn . Therefore there exist x1 , x2 , . . . , xn such that
Ax j = e j for each j. Now construct the matrix B whose j-th column
is the vector x j . Then it can be checked that AB = I and so A is
invertible.

There are some other characterisations of invertible matrices that


may be useful, but they are all really just elementary restatements
of Theorem 3.29.

Theorem 3.30. Let A be an n × n matrix. Then

1. A is invertible if and only if its rows are linearly independent.



2. A is invertible if and only if its columns are linearly independent.


3. A is invertible if and only if its row space is Rn .
4. A is invertible if and only if its column space is Rn .

Proof. These are all ways of saying “the rank of A is n”.

Example 3.31. (Non-invertible matrix) The matrix


 
$$A = \begin{bmatrix} 0 & 1 & 2 \\ 1 & 2 & -1 \\ 1 & 3 & 1 \end{bmatrix}$$

is not invertible because

(0, 1, 2) + (1, 2, −1) = (1, 3, 1)

is a dependency among the rows, and so the rows are not linearly indepen-
dent.
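This is also easy to see by computing the rank (Theorem 3.29); a one-line check in NumPy, purely as our own verification:

```python
import numpy as np

A = np.array([[0, 1, 2],
              [1, 2, -1],
              [1, 3, 1]])
print(np.linalg.matrix_rank(A))   # 2, not 3, so A is not invertible
```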

Now let’s consider some of the properties of invertible matrices.

Theorem 3.32. Suppose that A and B are invertible n × n matrices, and


k is a positive integer. Then

1. The matrix AB is invertible, and

( AB)−1 = B−1 A−1 .

2. The matrix $A^k$ is invertible, and $\left(A^k\right)^{-1} = \left(A^{-1}\right)^k$. (Recall that $A^k = \underbrace{AA\cdots A}_{k\ \text{times}}$.)

3. The matrix $A^T$ is invertible, and $\left(A^T\right)^{-1} = \left(A^{-1}\right)^T$.

Proof. To show that a matrix is invertible, it is sufficient to demon-


strate the existence of some matrix whose product with the given
matrix is the identity. Thus to show that AB is invertible, we must
find something that we can multiply AB by in order to end up with
the identity.

( AB)( B−1 A−1 ) = A( BB−1 ) A−1 (associativity)


−1
= AIn A (properties of inverses)
−1
= AA (properties of identity)
= In . (properties of inverses)

This shows that AB is invertible, and that its inverse is B−1 A−1
as required. The remaining two statements are straightforward
to prove using matrix properties (and induction for the second
property).
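All three identities can be spot-checked numerically on random matrices (which are invertible with probability 1); a brief sketch in NumPy, for reassurance rather than proof:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

print(np.allclose(np.linalg.inv(A @ B), np.linalg.inv(B) @ np.linalg.inv(A)))
print(np.allclose(np.linalg.inv(np.linalg.matrix_power(A, 3)),
                  np.linalg.matrix_power(np.linalg.inv(A), 3)))
print(np.allclose(np.linalg.inv(A.T), np.linalg.inv(A).T))   # all three print True
```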

This theorem shows that the collection of invertible n × n ma-


trices is closed under matrix multiplication. In addition, there is a
multiplicative identity (the matrix In ) and every matrix has an in-
verse (obviously!). These turn out to be the conditions that define
an algebraic structure called a group. The group of invertible n × n
matrices plays a fundamental role in the mathematical subject of
group theory which is an important topic in higher-level Pure Mathe-
matics.

3.5 Determinants

From high-school we are all familiar with the formula for the inverse of a 2 × 2 matrix:
$$\begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} = \frac{1}{ad - bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix} \qquad \text{if } ad - bc \neq 0,$$
where the inverse does not exist if ad − bc = 0. In other words, a 2 × 2 matrix has an inverse if and only if ad − bc ≠ 0. This num-
ber is called the determinant of the matrix, and it is either denoted
det( A) or just | A|.
Example 3.33. (Determinant notation) If
$$A = \begin{bmatrix} 3 & 5 \\ 2 & 4 \end{bmatrix}$$
then we say either
$$\det(A) = 2 \qquad \text{or} \qquad \begin{vmatrix} 3 & 5 \\ 2 & 4 \end{vmatrix} = 2$$
because 3 · 4 − 2 · 5 = 2.
In this section, we’ll extend the concept of determinant to n × n
matrices and show that it characterises invertible matrices in the
same way — a matrix is invertible if and only if its determinant is
non-zero.
The determinant of a square matrix is a scalar value (i.e. a num-
ber) associated with that matrix that can be recursively defined as
follows:

Definition 3.34. (Determinant)


If A = ( aij ) is an n × n matrix, then the determinant of A is a real
number, denoted det( A) or | A|, that is defined as follows:

1. If n = 1, then | A| = a11 .
2. If n > 1, then
$$|A| = \sum_{j=1}^{n} (-1)^{1+j} a_{1j} \, |A[1, j]| \qquad (3.4)$$
where A[i, j] is the (n − 1) × (n − 1) matrix obtained from A by deleting the i-th row and the j-th column.

Notice that when n > 1, this expresses an n × n determinant as an


alternating sum of n terms, each of which is a real number multiplied by
an (n − 1) × (n − 1) determinant.

Exercise 3.5.1. Check that this method yields the formula you know for
2 × 2 matrices.
Example 3.35. (A 3 × 3 determinant) What is the determinant of the
matrix
$$A = \begin{bmatrix} 2 & 5 & 3 \\ 4 & 3 & 6 \\ 1 & 0 & 2 \end{bmatrix}?$$
First let’s identify the matrices A[1, 1], A[1, 2] and A[1, 3]; recall these are obtained by deleting one row and column from A. For example, A[1, 2] is obtained by deleting the first row and second column from A, thus
$$A[1, 2] = \begin{bmatrix} 4 & 6 \\ 1 & 2 \end{bmatrix}.$$
The term $(-1)^{1+j}$ simply alternates between +1 and −1 and so the first term is added because $(-1)^2 = 1$, the second subtracted because $(-1)^3 = -1$, the third added, and so on. Using the formula we get
$$|A| = 2 \cdot \begin{vmatrix} 3 & 6 \\ 0 & 2 \end{vmatrix} - 5 \cdot \begin{vmatrix} 4 & 6 \\ 1 & 2 \end{vmatrix} + 3 \cdot \begin{vmatrix} 4 & 3 \\ 1 & 0 \end{vmatrix} = 2 \cdot 6 - 5 \cdot 2 + 3 \cdot (-3) = -7$$

where the three 2 × 2 determinants have just been calculated using the
usual rule.
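The recursive definition translates almost line for line into code. The sketch below (plain Python, our own illustration) expands along the first row exactly as in Definition 3.34; as Section 3.5.1 explains, this approach is hopelessly slow for large matrices, but it is fine for small examples.

```python
def det(A):
    """Determinant by cofactor expansion along the first row."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        # A[1, j]: delete the first row and the (j+1)-th column
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        total += (-1) ** j * A[0][j] * det(minor)
    return total

print(det([[2, 5, 3],
           [4, 3, 6],
           [1, 0, 2]]))   # -7, as in Example 3.35
```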
This procedure for calculating the determinant is called expand-
ing along the first row, because each of the terms a1j A[1, j] is associ-
ated with an entry in the first row. However it turns out, although
we shall not prove it16 , that it is possible to do the expansion along 16
Proving this is not difficult but
any row or indeed, any column. So in fact we have the following it involves a lot of manipulation of
subscripts and nested sums, which is
result: probably not the best use of your time.

Theorem 3.36. Let A = ( aij ) be an n × n matrix. Then for any fixed row index i we have
$$|A| = \sum_{j=1}^{n} (-1)^{i+j} a_{ij} \, |A[i, j]|$$
and for any fixed column index j, we have
$$|A| = \sum_{i=1}^{n} (-1)^{i+j} a_{ij} \, |A[i, j]|.$$

(Notice that the first of these two sums involves terms obtained from the i-
th row of the matrix, while the second involves terms from the j-th column
of the matrix.)

Example 3.37. (Expanding down the second column) Determine the


determinant of
$$A = \begin{bmatrix} 2 & 5 & 3 \\ 4 & 3 & 6 \\ 1 & 0 & 2 \end{bmatrix}$$
by expanding down the second column. Notice that because we are using the second column, the signs given by the $(-1)^{i+j}$ terms alternate −1, +1, −1, starting with a negative, not a positive. So the calculation gives
$$|A| = (-1) \cdot 5 \cdot \begin{vmatrix} 4 & 6 \\ 1 & 2 \end{vmatrix} + 3 \cdot \begin{vmatrix} 2 & 3 \\ 1 & 2 \end{vmatrix} + (-1) \cdot 0 \cdot (\text{don't care}) = -5 \cdot 2 + 3 \cdot 1 + 0 = -7.$$

Also notice that because a32 = 0, the term (−1)3+2 a32 | A[3, 2]| is forced
to be zero, and so there is no need to actually calculate | A[3, 2]|.

In general, you should choose the row or column of the matrix


that has lots of zeros in it, in order to make the calculation as easy
as possible!

Example 3.38. (Easy if you choose right) To determine the determinant of
$$A = \begin{bmatrix} 2 & 5 & 0 & 3 \\ 4 & 3 & 0 & 6 \\ 1 & 0 & 0 & 2 \\ 1 & 1 & 3 & 2 \end{bmatrix}$$
use the third column, which has only one non-zero entry, and get
$$|A| = (+1) \cdot 0 + (-1) \cdot 0 + (+1) \cdot 0 + (-1) \cdot 3 \cdot \begin{vmatrix} 2 & 5 & 3 \\ 4 & 3 & 6 \\ 1 & 0 & 2 \end{vmatrix} = (-3) \cdot (-7) = 21$$

rather than getting an expression with three or four 3 × 3 determinants to


evaluate!

From this we can immediately deduce some theoretical results:

Theorem 3.39. Let A be an n × n matrix. Then

1. | A T | = | A|,

2. |αA| = αn | A|, and

3. If A has a row of zeros, then | A| = 0.

4. If A is an upper (or lower) triangular matrix, then | A| is the product of


the entries on the diagonal of the matrix.

Proof. To prove the first statement, we use induction on n. Certainly


the statement is true for 1 × 1 matrices. So now suppose that it is
true for all matrices of size n − 1. Notice that expanding along the

first row of A gives a sum with the same coefficients as expanding


down the first column of A T , the signs of each term are the same be-
cause (−1)i+ j = (−1) j+i , and all the (n − 1) × (n − 1) determinants
in the first sum are just the transposes of those in the second sum,
and so are equal by the inductive hypothesis.
For the second statement we again use induction. Certainly the
statement is true for 1 × 1 matrices. So now suppose that it is true
for all matrices of size up to n − 1. Then

$$|\alpha A| = \sum_{j=1}^{n} (-1)^{i+j} (\alpha a_{ij}) \, |\alpha A[i, j]| = \sum_{j=1}^{n} (-1)^{i+j} (\alpha a_{ij}) \, \alpha^{n-1} |A[i, j]| = \alpha\,\alpha^{n-1} \sum_{j=1}^{n} (-1)^{i+j} a_{ij} \, |A[i, j]| = \alpha^n |A|,$$
using the inductive hypothesis in the second step and rearranging in the third.

The third statement is immediate because if we expand along the


row of zeros, then every term in the sum is zero.
The fourth statement again follows from an easy induction argu-
ment. Intuitively, for an upper triangular matrix, keep expanding along the first column at each step.

Example 3.40. (Determinant of matrix in row echelon form) A matrix in


row echelon form is necessarily upper triangular, and so its determinant
can easily be calculated. For example, the matrix

$$A = \begin{bmatrix} 2 & 0 & 1 & -1 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & -3 & 2 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
which is in row echelon form has determinant equal to 2 · 1 · (−3) · 1 = −6 because this is the product of the diagonal entries. We can verify this easily from the formula by repeatedly expanding down the first column. So
$$\begin{vmatrix} 2 & 0 & 1 & -1 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & -3 & 2 \\ 0 & 0 & 0 & 1 \end{vmatrix} = 2 \cdot \begin{vmatrix} 1 & 2 & 1 \\ 0 & -3 & 2 \\ 0 & 0 & 1 \end{vmatrix} = 2 \cdot 1 \cdot \begin{vmatrix} -3 & 2 \\ 0 & 1 \end{vmatrix}.$$

3.5.1 Calculating determinants


The recursive definition of a determinant expresses an n × n de-
terminant as a linear combination of n terms each involving an
(n − 1) × (n − 1) determinant. So to find a 10 × 10 determinant
like this involves computing ten 9 × 9 determinants, each of which
involves nine 8 × 8 determinants, each of which involves eight 7 × 7

determinants, each of which involves seven 6 × 6 determinants,


each of which involves six 5 × 5 determinants, each of which in-
volves five 4 × 4 determinants, each of which involves four 3 × 3
determinants, each of which involves three 2 × 2 determinants.
While this is possible (by computer) for a 10 × 10 matrix, even the
fastest supercomputer would not complete a 100 × 100 matrix in the
lifetime of the universe.
However, in practice, a computer can easily find a 100 × 100
determinant, so there must be another more efficient way. Once
again, this way is based on elementary row operations.

Theorem 3.41. Suppose A is an n × n matrix, and that A′ is obtained from A by performing a single elementary row operation.

1. If the elementary row operation is of Type 1 (Ri ↔ Rj ), then |A′| = −|A|. In other words, a Type 1 elementary row operation multiplies the determinant by −1.

2. If the elementary row operation is of Type 2, say Ri ← αRi , then |A′| = α|A|. In other words, multiplying a row by the scalar α multiplies the determinant by α.

3. If the elementary row operation is of Type 3 (Ri ← Ri + αRk ), then |A| = |A′|. In other words, adding a multiple of one row to another does not change the determinant.

Proof. 1. (This proof is very technical so we will only give a sketch.) Consider the elementary row operation of Type 1, Ri ↔ Rj where i < j. Expand along row i then row j (which has now become row j − 1 after row i was deleted) for A, and along row j then row i for A′. You will get two linear combinations of n(n − 1) terms involving (n − 2) × (n − 2) determinants, and you will notice the two combinations are exactly the opposite of each other. Therefore |A′| = −|A|.
2. Consider the elementary row operation of Type 2, Ri ← αRi . (Note this is an elementary row operation only if α ≠ 0, but this result holds even if α = 0.) Expand both A and A′ along row i and notice that, if we remove row i, A and A′ are the same matrix, so |A[i, j]| = |A′[i, j]| for each j. Therefore
$$|A'| = \sum_{j=1}^{n} (-1)^{i+j} a'_{ij} \, |A'[i, j]| = \sum_{j=1}^{n} (-1)^{i+j} \alpha a_{ij} \, |A[i, j]| = \alpha \sum_{j=1}^{n} (-1)^{i+j} a_{ij} \, |A[i, j]| = \alpha |A|.$$

3. Consider the elementary row operation of Type 3, Ri ← Ri + αRk . Expand A′ along row i and notice that |A[i, j]| = |A′[i, j]| for each j. Then
$$|A'| = \sum_{j=1}^{n} (-1)^{i+j} a'_{ij} \, |A'[i, j]| = \sum_{j=1}^{n} (-1)^{i+j} (a_{ij} + \alpha a_{kj}) \, |A[i, j]| = \sum_{j=1}^{n} (-1)^{i+j} a_{ij} \, |A[i, j]| + \alpha \sum_{j=1}^{n} (-1)^{i+j} a_{kj} \, |A[i, j]| = |A| + \alpha |B|,$$
where B is obtained from A by replacing row i by row k. Therefore B has two identical rows: rows i and k are the same. Now apply the row operation Ri ↔ Rk to B: it does not change the matrix! Thus, by part 1, |B| = −|B|, and so |B| = 0.

Previously we have used elementary row operations to find solutions to systems of linear equations, and to find the basis and dimension of the row space of a matrix. In both these applications, the elementary row operations did not change the answer that was being sought. For finding determinants however, elementary row operations do change the determinant of the matrix, but they change it in a controlled fashion and so the process is still useful. (Another common mistake for beginning students of linear algebra is to assume that row-reduction preserves every interesting property of a matrix. Instead, row-reduction preserves some properties, alters others in a controlled fashion, and destroys others. It is important to always know why the row-reduction is being done.)

Example 3.42. (Finding determinant by row-reduction) We return to an


earlier example, of finding the determinant of
$$A = \begin{bmatrix} 2 & 5 & 3 \\ 4 & 3 & 6 \\ 1 & 0 & 2 \end{bmatrix}.$$
Suppose that this unknown value is denoted d. Then after doing the Type 1 elementary row operation R1 ↔ R3 we get the matrix
$$\begin{bmatrix} 1 & 0 & 2 \\ 4 & 3 & 6 \\ 2 & 5 & 3 \end{bmatrix}$$
which has determinant −d, because Type 1 elementary row operations multiply the determinant by −1. If we now perform the Type 3 elementary row operations R2 ← R2 − 4R1 and R3 ← R3 − 2R1 , then the resulting matrix
$$\begin{bmatrix} 1 & 0 & 2 \\ 0 & 3 & -2 \\ 0 & 5 & -1 \end{bmatrix}$$
still has determinant −d because Type 3 elementary row operations do not alter the determinant. Finally, the elementary row operation R3 ← R3 − (5/3) R2 yields the matrix
$$\begin{bmatrix} 1 & 0 & 2 \\ 0 & 3 & -2 \\ 0 & 0 & 7/3 \end{bmatrix}$$
which still has determinant −d. Using property 4 of Theorem 3.39, we get that the determinant of this final matrix is 7, and so −d = 7, which immediately tells us that d = −7, confirming the results of Examples 3.35 and 3.37.
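Theorem 3.41 is precisely what makes the efficient method work: row reduce, keep track of how each operation scales the determinant, and multiply the diagonal entries of the resulting triangular matrix. A sketch in NumPy (our own implementation, with partial pivoting for numerical stability):

```python
import numpy as np

def det_by_elimination(A):
    """Determinant via Gaussian elimination, tracking each row operation."""
    U = np.array(A, dtype=float)
    n = U.shape[0]
    det = 1.0
    for i in range(n):
        p = i + np.argmax(np.abs(U[i:, i]))     # choose a pivot row
        if np.isclose(U[p, i], 0.0):
            return 0.0                          # no pivot available: determinant is 0
        if p != i:
            U[[i, p]] = U[[p, i]]               # Type 1 swap multiplies det by -1
            det = -det
        for r in range(i + 1, n):
            U[r] -= (U[r, i] / U[i, i]) * U[i]  # Type 3 operations change nothing
        det *= U[i, i]                          # accumulate the diagonal entries
    return det

print(det_by_elimination([[2, 5, 3], [4, 3, 6], [1, 0, 2]]))        # -7.0
print(np.linalg.det(np.array([[2, 5, 3], [4, 3, 6], [1, 0, 2]])))   # same value
```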

Thinking about this process in another way shows us that if a matrix A has determinant |A| and the matrix A′ is the row-echelon matrix obtained by performing Gaussian elimination on A, then

|A′| = β|A| for some β ≠ 0.

Combining this with the fourth property of Theorem 3.39 allows us


to state the single most important property of determinants:

Theorem 3.43. A matrix A is invertible if and only if its determinant is


non-zero.

Proof. Consider the row echelon matrix A′ obtained by applying Gaussian elimination to A. If A is invertible, then A′ has no zero rows and so every diagonal entry is non-zero and thus |A′| ≠ 0, while if A is not invertible, A′ has at least one zero row and thus |A′| = 0. As the determinant of A is a non-zero multiple of the determinant of A′, it follows that A has non-zero determinant if and only if it is invertible.

3.5.2 Properties of the determinant


We finish this chapter with some of the properties of determinants,
most of which follow immediately from the following theorem,
which shows that the determinant function is multiplicative.

Theorem 3.44. If A and B are two n × n matrices, then

| AB| = | A| · | B|.

Proof. There are several proofs of this result, none of which are
very nice. We give a sketch outline of the most illuminating proof. (This is a very brief outline of the proof so do not worry if you cannot follow it without some guidance on how to fill in the gaps.) First note that if either A or B (or both) is not invertible, then AB is not invertible and so the result is true if any of the determinants is zero.
Then proceed in the following steps:

1. Define an elementary matrix to be a matrix obtained by perform-


ing a single elementary row operation on the identity matrix.
2. Note that premultiplying a matrix A by an elementary matrix E,
thereby forming the matrix EA, is exactly the same as perform-
ing the same elementary row operation on A.
3. Show that elementary matrices of Type 1, 2 and 3 have deter-
minant −1, α and 1 respectively (where α is the non-zero scalar
associated with an elementary row operation of Type 2).
4. Conclude that the result is true if A is an elementary matrix or a
product of elementary matrices.

5. Finish by proving that a matrix is invertible if and only if it is the


product of elementary matrices, because Gauss-Jordan elimina-
tion will reduce any invertible matrix to the identity matrix.

The other proofs of this result use a different description of the


determinant as the weighted sum of n! products of matrix entries,
together with extensive algebraic manipulation.

Just for fun, we’ll demonstrate this result for 2 × 2 matrices


purely algebraically, in order to give a flavour of the alternative
proofs. Suppose that
$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}, \qquad B = \begin{bmatrix} a' & b' \\ c' & d' \end{bmatrix}.$$
Then
$$AB = \begin{bmatrix} aa' + bc' & ab' + bd' \\ a'c + c'd & b'c + dd' \end{bmatrix}.$$
Therefore
$$\begin{aligned} |AB| &= (aa' + bc')(b'c + dd') - (ab' + bd')(a'c + c'd) \\ &= aa'b'c + aa'dd' + bc'b'c + bc'dd' - ab'a'c - ab'c'd - bd'a'c - bd'c'd \\ &= (aa'b'c - ab'a'c) + aa'dd' + bc'b'c + (bc'dd' - bd'c'd) - ab'c'd - bd'a'c \\ &= 0 + (ad)(a'd') + (bc)(b'c') + 0 - (ad)(b'c') - (a'd')(bc) \\ &= (ad - bc)(a'd' - b'c') \\ &= |A||B|. \end{aligned}$$

The multiplicativity of the determinant immediately gives the


main properties.

Theorem 3.45. Suppose that A and B are n × n matrices and k is a


positive integer. Then

1. | AB| = | BA|

2. | Ak | = | A|k

3. If A is invertible, then | A−1 | = 1/| A|

Proof. The first two are immediate, and the third follows from the
fact that AA−1 = In and so | A|| A−1 | = 1.
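These properties are easy to spot-check numerically; a short sketch in NumPy using random matrices (reassurance, not proof):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
B = rng.standard_normal((5, 5))

print(np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B)))  # True
print(np.isclose(np.linalg.det(A @ B), np.linalg.det(B @ A)))                 # True
print(np.isclose(np.linalg.det(np.linalg.inv(A)), 1 / np.linalg.det(A)))      # True
```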
4
Linear transformations

4.1 Introduction

First we will give the general definition of a function.

Definition 4.1. (Function)


Given two sets A and B, a function f : A → B is a rule that assigns to
each element of A a unique element of B. We often write

f : A −→ B
a ↦ f ( a)

where f ( a) is the element of B assigned to a, called the image of a un-


der f . The set A is called the domain of f and is sometimes denoted
dom( f ). The set B is called the codomain of f . The range of f , (some-
times denoted range( f )) is the set of all elements of B that are the image
of some element of A.

Often f ( a) is defined by some equation involving a (or whatever


variable is being used to represent elements of A), for example
f ( a) = a2 . However, sometimes you may see f defined by listing
f ( a) for each a ∈ A. For example, if A = {1, 2, 3} we could define f
by
f : A −→ R
1 ↦ 10
2 ↦ 10
3 ↦ 102.
If f ( x ) is defined by some rule and the domain of f is not explicitly
given then we assume that the domain of f is the set of all values
on which f ( x ) is defined.
Note that the range of f need not be all of the codomain. For
example, if f : R → R is the function defined by f ( x ) = x2 then the
codomain of f is R while the range of f is the set { x ∈ R | x ≥ 0}.
A linear transformation is a function from one vector space to an-
other preserving the structure of vector spaces, that is, it preserves
vector addition and scalar multiplication.

More precisely:

Definition 4.2. (Linear transformation)


A function f from Rn to Rm is a linear transformation if:

1. f (u + v) = f (u) + f (v) for all u, v in Rn ;

2. f (αv) = α f (v) for all v in Rn and all α in R.

(An interesting case is when n = m, in which case the domain and codomain are the same vector space.)

Example 4.3. (Linear transformation) In R3 , the orthogonal projection onto the xy-plane is a linear transformation. This maps the vector ( x, y, z) to ( x, y, 0). (Check the two conditions to convince yourself.)

Example 4.4. (Not a linear transformation) The function from R3 to R given by f ( x, y, z) = x² + y² + z² is not a linear transformation. Indeed for v = (1, 0, 0) and α = 2, f (αv) = f (2, 0, 0) = 4 while α f (v) = 2 · 1 = 2.

Example 4.5. (Not a linear transformation) Let f : R → R be defined by f ( x ) = ax + b. Note that f (1) = a + b while f (2) = 2a + b ≠ 2( a + b) when b ≠ 0. Thus when b ≠ 0, the function f is not a linear transformation of the vector space R. We call f an affine function. (These two examples illustrate again the black swan concept: we only need to find one concrete counter-example to prove that a function is not a linear transformation.)
Example 4.6. Let A be an m × n matrix. Then the function f from Rn to
Rm such that f ( x) = Ax is a linear transformation (where we see x as an
n × 1 column vector, as described in Chapter 2). Indeed

f (u + v) = A(u + v) = Au + Av = f (u) + f (v)

(using Property (8) of Theorem 3.2), and

f (αv) = A(αv) = αAv = α f (v)

(using Property (7) of Theorem 3.2).
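The two defining conditions can also be sanity-checked numerically for a matrix map; a minimal sketch (NumPy, with randomly chosen data of our own):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 4))          # any m x n matrix defines f(x) = A x
f = lambda x: A @ x

u, v = rng.standard_normal(4), rng.standard_normal(4)
alpha = 2.5

print(np.allclose(f(u + v), f(u) + f(v)))       # condition 1 of Definition 4.2
print(np.allclose(f(alpha * v), alpha * f(v)))  # condition 2 of Definition 4.2
```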


Theorem 4.7. Let f : Rn −→ Rm be a linear transformation.
(i) f (0) = 0.

(ii) The range of f is a subspace of Rm .


Proof. (i) Since 0 is an additive identity, we have that 0 = 0 + 0.
Applying f to this identity yields: f (0) = f (0 + 0) = f (0) + f (0).
The result follows from subtracting f (0) on each side.

(ii) We need to prove the three subspace conditions for range( f ) (see
Definition 2.4).

(S1) 0 ∈ range( f ) since 0 = f (0) by Part (i).


(S2) Let u, v ∈ range( f ), that is, u = f (u′ ), v = f (v′ ) for some u′ , v′ ∈ Rn . Then u + v = f (u′ ) + f (v′ ) = f (u′ + v′ ) and so u + v ∈ range( f ).

(S3) Let α ∈ R and v ∈ range( f ), that is, v = f (v′ ) for some v′ ∈ Rn . Then αv = α f (v′ ) = f (αv′ ) and so αv ∈ range( f ).

4.2 Linear transformations and bases

A linear transformation can be given by a formula, but there are


other ways to describe it. In fact, if we know the images under a
linear transformation of each of the vectors in a basis, then the rest
of the linear transformation is completely determined.

Theorem 4.8. Let {u1 , u2 , . . . , un } be a basis for Rn and let t1 , t2 , . . ., tn be n vectors of Rm . Then there exists a unique linear transformation f from Rn to Rm such that f (u1 ) = t1 , f (u2 ) = t2 , . . ., f (un ) = tn . (For instance, the basis of Rn can be the standard basis.)

Proof. We know by Theorem 2.47 that any vector v ∈ Rn can be


written in a unique way as v = α1 u1 + α2 u2 + . . . + αn un (where the
αi ’s are real numbers). Define f (v) = α1 t1 + α2 t2 + . . . + αn tn . Then
f satisfies f (ui ) = ti for all i between 1 and n and we can easily
check that f is linear. So we have that f exists.
Now suppose g is also a linear transformation satisfying g(ui ) =
ti for all i between 1 and n. Then

g ( v ) = g ( α1 u1 + α2 u2 + · · · + α n u n )
= g ( α1 u1 ) + g ( α2 u2 ) + · · · + g ( α n u n )
(by the first condition for a linear function)
= α1 g ( u1 ) + α2 g ( u2 ) + · · · + α n g ( u n )
(by the second condition for a linear function)
= α 1 t1 + α 2 t2 + · · · + α n t n
= f ( v ).

Thus g(v) = f (v) for all v ∈ Rn so they are the same linear trans-
formation, that is, f is unique.

Exercise 4.2.1. Let f be a linear transformation from R2 to R3 with


f (1, 0) = (1, 2, 3) and f (0, 1) = (0, −1, 2). Determine f ( x, y).

4.3 Linear transformations and matrices

We have seen that it is useful to choose a basis of the domain,


say B = {u1 , u2 , . . . , un }. Now we will also take a basis for the
codomain: C = {v1 , v2 , . . . , vm }. For now we can think of both the
bases as the standard ones, but later we will need the general case.
By Theorem 2.47, each vector f (u j ) (1 ≤ j ≤ n) has unique
coordinates in the basis C of Rm . More precisely:

f (u j ) = a1j v1 + a2j v2 + · · · + amj vm , that is ( f (u j ))C = ( a1j , a2j , . . . , amj ).

For short we write
$$f(u_j) = \sum_{i=1}^{m} a_{ij} v_i.$$

Now we can determine the image of any vector x in Rn . If x =


x1 u1 + x2 u2 + · · · + xn un (so ( x) B = ( x1 , x2 , . . . , xn ) ), then we have:

$$\begin{aligned} f(x) &= f(x_1 u_1 + x_2 u_2 + \cdots + x_n u_n) = f\Big(\sum_{j=1}^{n} x_j u_j\Big) \\ &= \sum_{j=1}^{n} x_j f(u_j) \qquad \text{(by linearity)} \\ &= \sum_{j=1}^{n} x_j \Big(\sum_{i=1}^{m} a_{ij} v_i\Big) = \sum_{i=1}^{m} \Big(\sum_{j=1}^{n} a_{ij} x_j\Big) v_i. \end{aligned}$$
Notice that $\sum_{j=1}^{n} a_{ij} x_j$ is exactly the i-th element of the m × 1 matrix A( x) B , where A = ( aij ) is the m × n matrix defined by $f(u_j) = \sum_{i=1}^{m} a_{ij} v_i$. This is saying that the coordinates with respect to basis C of f ( x) are just A( x) B .
This gives us a very convenient way to express a linear transfor-
mation (as the matrix A) and to calculate the image of any vector.

Definition 4.9. (Matrix of a linear transformation)


The matrix of a linear transformation f , with respect to the basis B of
the domain and the basis C of the codomain, is the matrix A whose j-th
column contains the coordinates in the basis C of the image under f of the
j-th basis vector of B.
If we want to emphasise the choice of bases we write A = ACB .
When both B and C are the standard bases then we refer to the matrix
A as the standard matrix of f .

Whenever m = n we usually take B = C.


The argument above and this definition yield the following theo-
rem.

Theorem 4.10. Let f be a linear transformation, B a basis of its domain, C a basis of its codomain, and ACB as above. Then

( f ( x))C = ACB ( x) B .

In the case where B = C = S (where S is the standard basis) then we can just write f ( x) = Ax, where A = ASS is the standard matrix for f . (Together with Example 4.6, this tells us that linear transformations are essentially the same as matrices, after you have chosen a basis of the domain and a basis of the codomain.)

Key Concept 4.11. Let A be the matrix of a linear transformation f .

• The number of rows of A is the dimension of the codomain of f .



• The number of columns of A is the dimension of the domain of


f.

In other words, if f : Rn → Rm then A is an m × n matrix, whatever


bases we choose for the domain and codomain.
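For instance, the standard matrix of the projection in Example 4.3 can be assembled column by column from the images of the standard basis vectors, exactly as Definition 4.9 prescribes. A sketch in NumPy (the projection is from the notes; the code is our own):

```python
import numpy as np

def f(v):
    """Orthogonal projection of R^3 onto the xy-plane (Example 4.3)."""
    x, y, z = v
    return np.array([x, y, 0.0])

# j-th column of the standard matrix = image of the j-th standard basis vector
A = np.column_stack([f(e) for e in np.eye(3)])
print(A)                          # rows (1,0,0), (0,1,0), (0,0,0): a 3 x 3 matrix

v = np.array([2.0, -1.0, 5.0])
print(np.allclose(A @ v, f(v)))   # True: f(x) = A x in the standard basis
```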

Example 4.12. (identity) The identity matrix In corresponds to the


linear transformation that fixes every basis vector, and hence fixes every
vector in Rn .

Example 4.13. (dilation) The linear transformation with matrix $\begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}$ (with respect to the standard basis, for both the domain and codomain) maps e1 to 2e1 and e2 to 2e2 : it is a dilation of ratio 2 in R2 . (A dilation is a function that maps every vector to a fixed multiple of itself: x ↦ λx, where λ is called the ratio of the dilation.)

Example 4.14. (rotation) The linear transformation with matrix $\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$ (with respect to the standard basis) maps e1 to e2 and e2 to −e1 : it is the anticlockwise rotation of the plane by an angle of 90 degrees (or π/2) around the origin.

Exercise 4.3.1. In R2 , an anticlockwise rotation of angle θ around the


origin is a linear transformation. What is its matrix with respect to the
standard basis?

4.4 Rank-nullity theorem revisited

Remember the Rank-Nullity Theorem (Theorem 3.19): rank( A) +


nullity( A) = n for an m × n matrix A. We now know that A rep-
resents a linear transformation f : x → Ax, so we are going to
interpret what the rank and the nullity are in terms of f .
The rank of A is the dimension of the column space of A. We
have seen that the columns represent the images f (u j ) for each
basis vector u j of Rn .

Theorem 4.15. Let f : Rn −→ Rm be a linear transformation and let


{u1 , u2 , . . . , un } be a basis for Rn . Then

range( f ) = span({ f (u1 ), f (u2 ), . . . , f (un )}).

Proof. We first show that any element in the range of f is in the


span, that is can be written as a linear combination of the vectors.
Let y ∈ Rm be in the range of f , that is y = f ( x) for some x in
Rn . By Theorem 2.47, x can be written in a unique way as x =
α1 u1 + α2 u2 + · · · + αn un (where the αi ’s are real numbers). Using
the linearity of f , it immediately follows that f ( x) = α1 f (u1 ) +
α2 f (u2 ) + · · · + αn f (un ). Hence y ∈ span({ f (u1 ), f (u2 ), . . . , f (un )}).
Now we need to prove the converse: that every element in the span must be in the range of f . Let v be in span({ f (u1 ), f (u2 ), . . . , f (un )}).
By definition v = α1 f (u1 ) + α2 f (u2 ) + · · · + αn f (un ) for some

scalars α1 , α2 , . . . , αn . Using the linearity of f , it immediately follows


that v = f (α1 u1 + α2 u2 + · · · + αn un ), and so v is in the range of
f since it is the image of some vector α1 u1 + α2 u2 + · · · + αn un of
Rn .

Therefore the column space corresponds exactly to the range of


f . By Theorem 4.7, we know that the range of f is a subspace, and
so has a dimension: the rank of A corresponds to the dimension of
the range of f .
The nullity of A is the dimension of the null space of A. Recall
that the null space of A is the set of vectors x of Rn such that Ax =
0. In terms of f , it corresponds to the vectors x of Rn such that
f ( x) = 0. This set is called the kernel of f .

Definition 4.16. (Kernel)


The kernel of a linear transformation f : Rn −→ Rm is the set

Ker( f ) = { x ∈ Rn | f ( x) = 0}.

The kernel of f is a subspace of Rn (try proving it!) and so has a dimension: the nullity of A corresponds to the dimension of the kernel of f .
We can now rewrite the Rank-Nullity Theorem as follows:

Theorem 4.17. Let f be a linear transformation. Then

dim(range( f )) + dim(Ker( f )) = dim(dom( f )).

We immediately get:

Corollary 4.18. The dimension of the range of a linear transformation


is at most the dimension of its domain.

4.5 Composition

Whenever we have two linear transformations such that the codomain of the first one is the same vector space as the domain of the second one, we can apply the first one followed by the second one.

Definition 4.19. (Composition)


Let f : A → B and g : B → C be functions. Then the function
g ◦ f : A → C defined by
( g ◦ f )( a) = g( f ( a)) for all a ∈ A

is the composition of f by g. (Notice that we read composition from right to left.)

Theorem 4.20. If f : Rn → Rm and g : Rm → R p are linear


transformations, then g ◦ f is also a linear transformation, from Rn to R p .

Proof. We need to prove the two conditions for a linear transforma-


tion.
For all u, v in Rn :

( g ◦ f )(u + v) = g( f (u + v)) (definition of composition)


= g( f (u) + f (v)) ( f is a linear transformation)
= g( f (u)) + g( f (v)) (g is a linear transformation)
= ( g ◦ f )(u) + ( g ◦ f )(v). (composition definition)

For all v in Rn and all α in R:

( g ◦ f )(αv) = g( f (αv)) (definition of composition)


= g(α f (v)) ( f is a linear transformation)
= αg( f (v)) (g is a linear transformation)
= α( g ◦ f )(v). (definition of composition)

Let B = {u1 , u2 , . . . , un }, C = {v1 , v2 , . . . , vm }, D = {w1 , w2 , . . . , w p }


be bases of Rn , Rm , and R p respectively. Let F = FCB = ( f ij ) be the
matrix corresponding to f with respect to the bases B of the domain
and C of the codomain. Let G = GDC = ( gij ) be the matrix corre-
sponding to g with respect to the bases C of the domain and D of
the codomain. So F is an m × n matrix and G is a p × m matrix. Let
us look at the image of u1 under g ◦ f .
We first apply f , so the image f (u1 ) corresponds to the first column of F: f (u1 ) = f_11 v1 + f_21 v2 + · · · + f_m1 vm = ∑_{i=1}^m f_i1 vi . Then we apply g to f (u1 ):

( g ◦ f )(u1 ) = g( ∑_{i=1}^m f_i1 vi )
             = ∑_{i=1}^m f_i1 g(vi )                 (g is a linear transformation)
             = ∑_{i=1}^m f_i1 ( ∑_{j=1}^p g_ji wj )
             = ∑_{j=1}^p ( ∑_{i=1}^m f_i1 g_ji ) wj  (rearranging terms)
             = ∑_{j=1}^p ( ∑_{i=1}^m g_ji f_i1 ) wj
             = ∑_{j=1}^p ( GF )_j1 wj .

This says that the first column of the matrix GF yields the coordi-
nates of ( g ◦ f )(u1 ) with respect to the basis D. We can do the same
calculation with any u j (1 ≤ j ≤ n) to see that the image ( g ◦ f )(u j )
corresponds exactly to the j-th column of the matrix GF. Hence
the matrix corresponding to g ◦ f with respect to the basis B of the
domain and the basis D of the codomain is GF = GDC FCB .

Key Concept 4.21. Composition of linear transformations is the


same thing as multiplication of the corresponding matrices, where we
order the matrices from right to left, just as composition.

You may have thought that matrix multiplication was defined in


a strange way: it was defined precisely so that it corresponds with
composition of linear transformations.
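As an optional illustration, here is a short Python/NumPy sketch (the matrices are arbitrarily chosen, purely for illustration) showing that applying f and then g gives the same result as multiplying by the single matrix GF:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative standard matrices: f maps R^3 to R^2, g maps R^2 to R^4.
F = rng.standard_normal((2, 3))   # matrix of f (2 x 3)
G = rng.standard_normal((4, 2))   # matrix of g (4 x 2)

x = rng.standard_normal(3)

# g(f(x)) agrees with (GF)x, and GF is the (4 x 3) matrix of g o f.
print(np.allclose(G @ (F @ x), (G @ F) @ x))   # True
```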

4.6 Inverses

An inverse function is a function that “undoes” another function: if


f ( x ) = y, the inverse function g maps y to x.

Definition 4.22. (Inverse function)


Let f : A −→ B be a function. We say that f is invertible if there exists
a function g : B −→ A such that g( f ( x )) = x, meaning that g ◦ f is the
identity function. The inverse function g is then uniquely determined by f
and is denoted by f −1 . We have

( f −1 ◦ f )( a) = a for all a ∈ A and ( f ◦ f −1 )(b) = b for all b ∈ B

It is routine to show that f is invertible if and only if it is bijective.

Definition 4.23. (Bijective function)


Let f : A −→ B be a function.

• We say that f is one-to-one if no two elements of A have the same


image. More formally:

f ( a1 ) = f ( a2 ) ⇒ a1 = a2

• We say that f is onto if the range of f is equal to the codomain B.

• We say that f is bijective if f is both one-to-one and onto.

We now look at the particular case where f is a linear transfor-


mation.

Theorem 4.24. Let f : Rn −→ Rm be a linear transformation.

1. If f is invertible then f −1 is a linear transformation.

2. f is invertible if and only if n = m and range( f ) = Rn .



Proof. 1. Suppose first that f is invertible. We need to show that


f −1 : Rm −→ Rn satisfies the two properties of a linear transfor-
mation. Take u, v ∈ Rm . Then:
 
f ( f −1 (u + v)) = ( f ◦ f −1 )(u + v) = u + v = ( f ◦ f −1 )(u) + ( f ◦ f −1 )(v) = f ( f −1 (u) + f −1 (v)) .

Since f is one-to-one, it follows that f −1 (u + v) = f −1 (u) +


f −1 (v). The other property is proved in a similar fashion.

2. Suppose first that f is invertible. Since f is onto, range( f ) = Rm ,


which has dimension m. By Corollary 4.18, we get that m ≤ n.
Now the inverse function f −1 is also onto, so that range( f −1 ) =
Rn , which has dimension n. Since f −1 is a linear transformation,
we can apply Corollary 4.18 to f −1 , and so n ≤ m. Therefore
m = n and range( f ) = Rm = Rn .
Conversely, suppose that n = m and range( f ) = Rn . It follows
immediately that f is onto so we only need to prove that f is
one-to-one. Suppose f (u) = f (v) for some u, v ∈ Rn , we want
to show that u = v. By linearity f (u − v) = 0, that is u −
v ∈ Ker( f ). By Theorem 4.17, dim(Ker( f )) = dim(dom( f )) −
dim(range( f )) = n − n = 0. The only subspace of dimension
0 is the trivial subspace {0}, so Ker( f ) = {0}. It follows that
u − v = 0, and so u = v.

Consider an invertible linear transformation f . By Theorem


4.24, the domain and codomain of f are the same, say Rn , and
f −1 is also a linear transformation with domain and codomain
Rn . Let A be the matrix corresponding to f and B be the matrix
corresponding to f −1 (all with respect to the standard basis, say);
both are n × n matrices. We have seen in Example 4.12 that the
matrix corresponding to the identity function is the identity matrix
In . By Key Concept 4.21, we have that BA = In . In other words,
B is the inverse matrix of A.
Hence we have:

Theorem 4.25. The matrix corresponding to the inverse of an invertible


linear transformation f is the inverse of the matrix corresponding to f
(with respect to a chosen basis).
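Numerically, this correspondence is easy to check. A minimal Python/NumPy sketch, with an arbitrarily chosen invertible matrix:

```python
import numpy as np

# An illustrative invertible standard matrix of some linear transformation f on R^2.
A = np.array([[2.0, 1.0],
              [1.0, 1.0]])

B = np.linalg.inv(A)   # matrix of the inverse transformation f^{-1}

print(np.allclose(B @ A, np.eye(2)))   # True: f^{-1} o f is the identity
print(np.allclose(A @ B, np.eye(2)))   # True: f o f^{-1} is the identity
```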
5
Change of basis

In Chapter 2, we saw that choosing a basis for a subspace (for in-


stance the whole vector space Rn ) determines coordinates. In Chap-
ter 4, we saw how a choice of bases for the domain and for the
codomain allows us to write a linear transformation as a matrix. In
this Chapter, we will study ways to change bases in these two cases.

5.1 Change of basis for vectors

A change of coordinates from one basis to another can be achieved


by multiplication of the given coordinate vector by a so-called
change of coordinates matrix. Consider a subspace V of dimension
n of Rm and two different bases for V:

B = { u1 , u2 , · · · , u n } and C = { w 1 , w 2 , · · · , w n }.

A given vector v ∈ V will have coordinates in each of these bases:

( v ) B = ( α1 , α2 , . . . , α n ) , ( v )C = ( β 1 , β 2 , . . . , β n ) ,
that is, v = ∑_{k=1}^n αk uk = ∑_{i=1}^n βi wi . Our task is to find an invertible n × n matrix PCB for which

(v)C = PCB (v) B and (v) B = PBC (v)C


where PBC = PCB^{−1}. That is, pre-multiplication by PCB will convert coordinates in basis B to coordinates in basis C and pre-multiplication by PBC will convert those in C to those in B.
Let

PCB = [ p11  p12  · · ·  p1n ]
      [ p21  p22  · · ·  p2n ]
      [  ⋮    ⋮           ⋮  ]
      [ pn1  pn2  · · ·  pnn ]
We compute PCB (v) B by computing the (i, 1)-entry for each i
(recall the formula for matrix multiplication in Definition 3.1 and
that PCB (v) B is an n × 1 matrix):
( PCB (v) B )i1 = ∑_{k=1}^n pik αk .

Therefore βi = ∑_{k=1}^n pik αk for each i.

It follows that

v = ∑_{i=1}^n βi wi = ∑_{i=1}^n ( ∑_{k=1}^n pik αk ) wi = ∑_{k=1}^n αk ( ∑_{i=1}^n pik wi ).

On the other hand,

v = ∑_{k=1}^n αk uk ,

so

uk = ∑_{i=1}^n pik wi .

In other words, the coefficients of the matrix PCB satisfy:

(u1 )C = ( p11 , p21 , . . . , pn1 )


(u2 )C = ( p12 , p22 , . . . , pn2 )
⋮
(un )C = ( p1n , p2n , . . . , pnn ).

That is, each column corresponds to the coordinates of the vectors in the basis B with respect to basis C. For this reason, we might prefer to write the vectors (ui )C as column vectors in this case:

(u1 )C = [ p11 ]    (u2 )C = [ p12 ]    · · ·    (un )C = [ p1n ]
         [  ⋮  ]             [  ⋮  ]                      [  ⋮  ]
         [ pn1 ]             [ pn2 ]                      [ pnn ]

Key Concept 5.1. To convert coordinates of the vector v from basis


B to basis C we perform the matrix multiplication

(v)C = PCB (v) B ,

where PCB is the matrix whose i-th column is the coordinates with
respect to basis C of the i-th basis vector in B.
Moreover PBC = PCB^{−1}.

The matrix PCB will be invertible because the elements of basis


B are linearly independent and the elements of basis C are linearly
independent.

Example 5.2. (Example 2.50 revisited again) Recall we considered B =


{(1, −1, 0), (1, 0, −1)} and C = {(0, 1, −1), (1, −2, 1)}, two bases for
the vector space V. To find PCB we need to determine the coordinates of the
basis elements of B with respect to basis C.

The required coordinates are

(1, −1, 0)C = (1, 1)  and  (1, 0, −1)C = (2, 1)

and hence

PCB = [ 1  2 ]    and    PBC = PCB^{−1} = [ −1   2 ]
      [ 1  1 ]                            [  1  −1 ]

We can verify these. Recall that (v) B = (3, −2) and (w)C = (6, 1). Then

(v)C = PCB (v) B = [ 1  2 ] [  3 ] = [ −1 ]
                   [ 1  1 ] [ −2 ]   [  1 ]

which agrees with what we got before, and

(w) B = PBC (w)C = [ −1   2 ] [ 6 ] = [ −4 ]
                   [  1  −1 ] [ 1 ]   [  5 ]

which also agrees with what we got before.
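These coordinates can also be found numerically. The following Python/NumPy sketch recomputes PCB for Example 5.2 by expressing each vector of B in terms of the basis C (a least-squares solve is exact here because the vectors of B lie in the span of C):

```python
import numpy as np

# Basis vectors of Example 5.2 stored as the columns of 3 x 2 matrices.
B = np.array([[ 1.0,  1.0],
              [-1.0,  0.0],
              [ 0.0, -1.0]])
C = np.array([[ 0.0,  1.0],
              [ 1.0, -2.0],
              [-1.0,  1.0]])

# Coordinates of each column of B with respect to the basis C.
P_CB, *_ = np.linalg.lstsq(C, B, rcond=None)
print(np.round(P_CB))        # [[1. 2.]
                             #  [1. 1.]]  -- the matrix PCB found above

v_B = np.array([3.0, -2.0])
print(P_CB @ v_B)            # [-1.  1.] = (v)_C, as computed above
```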

5.2 Change of bases for linear transformations

Recall that linear transformations f can be represented using matrices:
f ( x) = ASS x.
For example, counterclockwise rotation about the origin through
an angle θ in R2 using the standard basis for the original and trans-
formed vectors has transformation matrix

ASS = [ cos θ   − sin θ ]
      [ sin θ     cos θ ] .

We have used the subscript SS to indicate that the standard basis is


being used to represent the original vectors (domain of the linear
transformation) and also the rotated vectors (codomain of the linear
transformation). For example, under a rotation of π/2 the vector e2 = (0, 1) becomes

[ 0  −1 ] [ 0 ] = [ −1 ] = −e1 , as expected.
[ 1   0 ] [ 1 ]   [  0 ]

If we desire to use different bases to represent the coordinates of


the vectors, say basis B for the domain and basis C for codomain,
then recall from Definition 4.9 that we label the transformation
matrix ACB and the linear transformation will be

( f ( x))C = ACB ( x) B . (5.1)

We need a way of deducing ACB . This can be achieved by employ-


ing the change of basis matrices PBS and PCS where

( x) B = PBS ( x)S and ( f ( x))C = PCS ( f ( x))S .



Substitution of these formulae in Equation (5.1) gives

PCS ( f ( x))S = ACB PBS ( x)S ⇒ ( f ( x))S = PSC ACB PBS ( x)S
(recalling that PCS = PSC^{−1}). Using the standard basis for the trans-
formation would be ( f ( x))S = ASS ( x)S and hence we must have
ASS = PSC ACB PBS , which we can rearrange to get the linear trans-
formation change of basis formula

ACB = PCS ASS PSB .

Note that if we use the same basis B for both the domain and
codomain then we have

A BB = PBS ASS PSB . (5.2)

Two matrices M and N are similar if there exists an invertible


matrix Q such that N = Q−1 MQ. Equation (5.2) tells us that all
linear transformation matrices in which the same basis is used for
the domain and codomain are similar.

Example 5.3. We determine the change of basis matrix from a basis


B = {u1 , u2 , · · · , un } to the standard basis S. We note that (ui )S = ui
for all i = 1, · · · , n and hence we can immediately write

PSB = [u1 u2 · · · un ]

where the i-th column is the vector ui (written as usual as an n-tuple).

Example 5.4. We determine the transformation matrix for counterclock-


wise rotation through an angle θ in R2 using the basis

B = {(1, −1), (1, 1)}

to represent both the original vectors and the transformed vectors.


We need to calculate PSB and PBS . We can write down immediately
that
" # " #
1 1 −1 1/2 −1/2
PSB = and that PBS = PSB =
−1 1 1/2 1/2

and so the desired transformation matrix is

A BB = PBS ASS PSB
     = [ 1/2  −1/2 ] [ cos θ  − sin θ ] [  1  1 ]
       [ 1/2   1/2 ] [ sin θ    cos θ ] [ −1  1 ]
     = [ 1/2  −1/2 ] [ cos θ + sin θ   cos θ − sin θ ]
       [ 1/2   1/2 ] [ sin θ − cos θ   sin θ + cos θ ]
     = [ cos θ   − sin θ ]
       [ sin θ     cos θ ] .

That is, A BB = ASS , but if we think about the geometry then this makes
sense.

Example 5.5. We determine the transformation matrix for counterclock-


wise rotation through an angle θ in R2 using the basis

C = {(1, 0), (1, 1)}

to represent both the original vectors and the transformed vectors.


We need to calculate PSC and PCS . We can write down immediately
that

PSC = [ 1  1 ]    and    PCS = PSC^{−1} = [ 1  −1 ]
      [ 0  1 ]                            [ 0   1 ]
and so the desired transformation matrix is

ACC = PCS ASS PSC
    = [ 1  −1 ] [ cos θ  − sin θ ] [ 1  1 ]
      [ 0   1 ] [ sin θ    cos θ ] [ 0  1 ]
    = [ 1  −1 ] [ cos θ   cos θ − sin θ ]
      [ 0   1 ] [ sin θ   sin θ + cos θ ]
    = [ cos θ − sin θ    −2 sin θ       ]
      [ sin θ            sin θ + cos θ  ] .

Example 5.6. The matrix of a particular linear transformation in R3 in


which the standard basis has been used for both the domain and codomain
is

ASS = [  1   3  4 ]
      [  2  −1  1 ]
      [ −3   5  1 ] .
Determine the matrix of the linear transformation if basis B is used for the
domain and basis C is used for the codomain, where

B = {(0, 0, 1), (0, 1, 0), (1, 0, 0)} , C = {(1, 1, 1), (0, 1, 1), (0, 0, 1)}.

Solution. First we need to calculate PCS and PSB . We can immediately write down PSB and PSC , and use the fact that PCS = PSC^{−1}. We have

PSB = [ 0  0  1 ]    PSC = [ 1  0  0 ]    ⇒    PCS = PSC^{−1} = [  1   0  0 ]
      [ 0  1  0 ]          [ 1  1  0 ]                          [ −1   1  0 ]
      [ 1  0  0 ]          [ 1  1  1 ]                          [  0  −1  1 ]

and hence

ACB = PCS ASS PSB = [  4   3   1 ]
                    [ −3  −4   1 ]
                    [  0   6  −5 ] .
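The calculation in Example 5.6 is easy to reproduce numerically (a Python/NumPy sketch):

```python
import numpy as np

A_SS = np.array([[ 1.0,  3.0, 4.0],
                 [ 2.0, -1.0, 1.0],
                 [-3.0,  5.0, 1.0]])

# Columns are the basis vectors of B and of C from Example 5.6.
P_SB = np.array([[0.0, 0.0, 1.0],
                 [0.0, 1.0, 0.0],
                 [1.0, 0.0, 0.0]])
P_SC = np.array([[1.0, 0.0, 0.0],
                 [1.0, 1.0, 0.0],
                 [1.0, 1.0, 1.0]])

A_CB = np.linalg.inv(P_SC) @ A_SS @ P_SB
print(np.round(A_CB))
# [[ 4.  3.  1.]
#  [-3. -4.  1.]
#  [ 0.  6. -5.]]
```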

Of course we would like to make the matrix of the linear trans-


formation simpler by choosing an appropriate basis B (most often,
if the matrix is square, we choose the same basis for the domain and
codomain).

Example 5.7. Consider the linear transformation with matrix

ASS = [ 2  6 ]
      [ 3  5 ] .

Find A BB where B = {(1, 1), (−2, 1)}.


Solution. We need to calculate PSB and PBS . We can write down immediately that

PSB = [ 1  −2 ]    and    PBS = PSB^{−1} = [  1/3  2/3 ]
      [ 1   1 ]                            [ −1/3  1/3 ]

and so the desired transformation matrix is

A BB = PBS ASS PSB
     = [  1/3  2/3 ] [ 2  6 ] [ 1  −2 ]
       [ −1/3  1/3 ] [ 3  5 ] [ 1   1 ]
     = [  1/3  2/3 ] [ 8   2 ]
       [ −1/3  1/3 ] [ 8  −1 ]
     = [ 8   0 ]
       [ 0  −1 ] .

The matrix A BB has a very simple form, which is nice for calculations.
It also makes it easy to visualise. The first vector in the basis B is stretched
8 times, and the second vector is mapped onto its opposite.
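Again, the result of Example 5.7 is easy to verify numerically (a Python/NumPy sketch):

```python
import numpy as np

A_SS = np.array([[2.0, 6.0],
                 [3.0, 5.0]])

# Columns of P_SB are the vectors of the basis B = {(1, 1), (-2, 1)}.
P_SB = np.array([[1.0, -2.0],
                 [1.0,  1.0]])

A_BB = np.linalg.inv(P_SB) @ A_SS @ P_SB
print(np.round(A_BB))
# [[ 8.  0.]
#  [ 0. -1.]]
```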

In the next chapter, we will learn how to determine a nice basis B


for a given linear transformation and its corresponding matrix.
6
Eigenvalues and eigenvectors

6.1 Introduction

Matrix multiplication usually results in a change of direction, for


example,

[ 2  0 ] [ 1 ] = [  2 ]    which is not parallel to  [ 1 ] .
[ 1  3 ] [ 4 ]   [ 13 ]                              [ 4 ]

The eigenvectors of a given (square) matrix A are those special non-


zero vectors v that map to multiples of themselves under multipli-
cation by the matrix A, and the eigenvalues of A are the correspond-
ing scale factors.

Definition 6.1. (Eigenvectors and eigenvalues)


Let A be a square matrix. An eigenvector of A is a vector v 6= 0 such
that
Av = λv for some scalar λ.

An eigenvalue of A is a scalar λ such that

Av = λv for some vector v 6= 0.

Geometrically, the eigenvectors of a matrix A are stretched (or


shrunk) on multiplication by A, whereas any other vector is rotated
as well as being stretched or shrunk.
Note that by definition 0 is not an eigenvector but we do allow 0 to
be an eigenvalue.

Example 6.2. Recall Example 5.7, and we compute Av for each v in the basis B = {(1, 1), (−2, 1)}.

[ 2  6 ] [ 1 ] = [ 8 ] = 8 [ 1 ] ,
[ 3  5 ] [ 1 ]   [ 8 ]     [ 1 ]

[ 2  6 ] [ −2 ] = [  2 ] = −1 [ −2 ] .
[ 3  5 ] [  1 ]   [ −1 ]      [  1 ]
(Figure: a typical vector x is not mapped to a multiple of itself, Ax ≠ λx, whereas an eigenvector v satisfies Av = λv and stays on the line through the origin that it spans.)

Hence 8 and −1 are eigenvalues of the matrix

[ 2  6 ]
[ 3  5 ]

with λ = 8 having corresponding eigenvector (1, 1) and λ = −1 having corresponding eigenvector (−2, 1).

Example 5.7 illustrates that if we change the basis to eigenvectors


we get a diagonal matrix. In other words, solving the eigenvalue-
eigenvector problem is equivalent to finding a basis in which the
linear transformation has a particularly simple (in this case, diago-
nal) matrix representation.

Definition 6.3. (Eigenspace)


Let λ be an eigenvalue for A. Then the eigenspace corresponding to the
eigenvalue λ is the set

Eλ = {v | Av = λv},

that is, the set of eigenvectors corresponding to λ together with the zero
vector.

Theorem 6.4. Let λ be an eigenvalue for the n × n matrix A. Then the


eigenspace Eλ is a subspace of Rn of dimension at least 1.

Proof. We need to show the three subspace conditions.

(S1) A0 = λ0 so 0 ∈ Eλ .
(S2) Let u, v ∈ Eλ . Then Au = λu and Av = λv. We want to show
that u + v ∈ Eλ so we test the membership condition:

A(u + v) = Au + Av = λu + λv = λ(u + v).

(S3) Let α ∈ R and v ∈ Eλ . Then Av = λv. We want to show that


αv ∈ Eλ so we test the membership condition:

A(αv) = α( Av) = α(λv) = λ(αv).

By definition of eigenvalue, there exists a non-zero vector v such


that Av = λv, so we can construct a basis for Eλ containing at least
the vector v. Thus the dimension of Eλ is at least 1.

Geometrically, for n = 3, eigenspaces are lines or planes through


the origin in R3 (or sometimes even the whole of R3 ).
In Example 6.2, we have at least two eigenvalues 8 and −1 and
each eigenspace contains at least all the scalar multiples of the
eigenvectors we found. In this case, there are only two eigenspaces,
both of dimension 1. The eigenspaces for Example 6.2 are shown in Figure 6.1.

(Figure 6.1: Eigenspaces for Example 6.2 — the lines E8 and E−1 through the origin in the (x1, x2)-plane.)

Theorem 6.5. Consider a square matrix A and its eigenvalues/eigenspaces.

1. If 0 is an eigenvalue, then the eigenspace E0 is exactly the null space of


A.

2. 0 is an eigenvalue if and only if the null space of A has dimension at


least 1.

3. For each eigenvalue λ 6= 0, Eλ is a subspace of the column space of A.

Proof. Suppose 0 is an eigenvalue. Then E0 = {v | Av = 0} is the


null space and has dimension at least 1 by Theorem 6.4. If 0 is not
an eigenvalue, then the only vector v such that Av = 0 is the zero
vector, so the null space of A has dimension 0. This proves the first
two statements.
Recall that the column space of A is equal to { Ax| x ∈ Rn }. For
an eigenvalue λ ≠ 0, each eigenvector v satisfies v = (1/λ) Av = A((1/λ)v), and so v belongs to the column space of A. It follows that Eλ is a subspace
of the column space of A.

6.2 Finding eigenvalues and eigenvectors

Let A be a given n × n matrix. Recall that the algebraic definition of


an eigenvalue-eigenvector pair is

Av = λv

where λ is a scalar and v is a nonzero column vector of length n.


We begin by rearranging and regrouping, and noting that v = Iv

for any vector v, where I is the n × n identity matrix, as follows:

Av − λv = 0

Av − λIv = 0
( A − λI )v = 0.
This is a homogeneous system of linear equations for the compo-
nents of v, with augmented matrix [ A − λI |0]. If the matrix A − λI
were invertible then the solution would simply be v = 0 but this is
not allowed by definition and in any case would be of no practical
use in applications. We hence require that A − λI be not invertible
and, by Theorem 3.43, this will be the case if

det( A − λI ) = 0. (6.1)

When we evaluate this determinant we will have a polynomial


equation of degree n in the unknown λ. The solutions of this equa-
tion will be the required eigenvalues. Equation (6.1) is called the
characteristic equation of the matrix A. The polynomial det( A − λI )
is called the characteristic polynomial of A.

Example 6.6. (Example 6.2 revisited) We start by forming

A − λI = [ 2  6 ] − λ [ 1  0 ] = [ 2 − λ     6     ]
         [ 3  5 ]     [ 0  1 ]   [   3      5 − λ  ] .

The determinant of this matrix is

det( A − λI ) = (2 − λ)(5 − λ) − 18 = λ^2 − 7λ − 8

and hence the characteristic equation is

λ^2 − 7λ − 8 = 0 ⇒ (λ − 8)(λ + 1) = 0 ⇒ λ = 8, −1.

That is, the eigenvalues of the given matrix A are λ = 8 and λ = −1.
For each eigenvalue λ we must solve the system

( A − λI )v = 0

to determine the corresponding eigenspace. In other words we must solve


the system with augmented matrix [ A − λI |0], using the techniques
learned in Chapter 1. When λ = 8 we have

[ −6   6 | 0 ]    ⇒    E8 = span((1, 1)).
[  3  −3 | 0 ]

When λ = −1 we have

[ 3  6 | 0 ]    ⇒    E−1 = span((−2, 1)).
[ 3  6 | 0 ]

In solving these systems of equations we end up with the complete eigenspace


in each case. For reasons that will become clear shortly, it is useful to de-
termine a basis for each eigenspace.
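In practice, eigenvalues and eigenvectors are usually computed numerically. A Python/NumPy sketch for the matrix of Example 6.6 (NumPy returns unit-length eigenvectors, which are scalar multiples of the ones found by hand):

```python
import numpy as np

A = np.array([[2.0, 6.0],
              [3.0, 5.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)                 # [ 8. -1.]  (order may vary)

# Each column of `eigenvectors` is an eigenvector for the matching eigenvalue.
for lam, v in zip(eigenvalues, eigenvectors.T):
    print(np.allclose(A @ v, lam * v))   # True, True
```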

It is important to note that the reduced row echelon form of


( A − λI ) will always have at least one row of zeros when λ is an eigenvalue. The number of zero rows equals the number of free parameters, which is the dimension of the eigenspace.

Example 6.7. Consider the upper triangular matrix

A = [ 1  2  6 ]
    [ 0  3  5 ]
    [ 0  0  4 ] .

Since A is upper triangular, so is A − λI and hence its determinant is just


the product of the diagonal. Thus the characteristic equation is

(1 − λ)(3 − λ)(4 − λ) = 0 ⇒ λ = 1, 3, 4.

Note that these are in fact the diagonal elements of A. The respective
eigenspaces are

E1 = span((1, 0, 0)), E3 = span((1, 1, 0)), E4 = span((16, 15, 3)).

The eigenvalues, and corresponding eigenvectors, could be


complex-valued. If the matrix A is real-valued then the eigenval-
ues, that is, the roots of the characteristic polynomial, will occur in
complex conjugate pairs.

Example 6.8. Consider

A = [  2  1 ]
    [ −5  4 ] .
The characteristic equation is λ^2 − 6λ + 13 = 0. The eigenvalues are
λ = 3 + 2i and λ = 3 − 2i. When we solve ( A − λI )v = 0 we will get
solutions containing complex numbers. Although we can’t interpret them
as vectors in R2 there are many applications (particularly in Engineering)
in which there is a natural interpretation in terms of the problem under
investigation. The corresponding eigenvectors are

[ 1 − 2i ]    and    [ 1 + 2i ] .
[   5    ]           [   5    ]

The characteristic polynomial may have repeated roots. If it


factors into the form

(λ1 − λ)^{m1} · · · (λj − λ)^{mj} · · · (λp − λ)^{mp}

we say that the algebraic multiplicity of the eigenvalue λ j is m j . For


example, if the characteristic polynomial were (λ + 3)(λ − 2)^4 (λ − 5) then the algebraic multiplicity of the eigenvalue 2 would be 4.

Example 6.9. Consider

A = [ −2   2  −3 ]
    [  2   1  −6 ]
    [ −1  −2   0 ] .

The characteristic equation is

−λ^3 − λ^2 + 21λ + 45 = 0

and so
(3 + λ)^2 (5 − λ) = 0 ⇒ λ = −3, −3, 5.
Repeating the root −3 reflects the fact that λ = −3 has algebraic multi-
plicity 2.
To find the eigenvectors corresponding to λ = 5 we solve

( A − 5I )v = 0   ⇒   [ −7   2  −3 | 0 ]
                      [  2  −4  −6 | 0 ]
                      [ −1  −2  −5 | 0 ] .

After some work we arrive at the reduced row echelon form

[ 1  0  1 | 0 ]
[ 0  1  2 | 0 ]
[ 0  0  0 | 0 ] .

Note that we have one row of zeros. The solution will therefore involve
one free parameter, namely v3 . We readily get the solution v1 = −v3 and
v2 = −2v3 . Hence the eigenspace corresponding to λ = 5 is

E5 = {(−v3 , −2v3 , v3 )} = span((−1, −2, 1)).

Similarly, to find the eigenvectors corresponding to λ = −3 we solve

( A + 3I )v = 0   ⇒   [  1   2  −3 | 0 ]
                      [  2   4  −6 | 0 ]
                      [ −1  −2   3 | 0 ] .

The reduced row echelon form is

[ 1  2  −3 | 0 ]
[ 0  0   0 | 0 ]
[ 0  0   0 | 0 ] ,

which has two rows of zeros and hence the solution will involve two free
parameters. The eigenspace corresponding to λ = −3 is

E−3 = {(−2v2 + 3v3 , v2 , v3 )} = span((−2, 1, 0), (3, 0, 1)).

The dimension of the eigenspace of an eigenvalue is called its


geometric multiplicity. In the above example the geometric multiplic-
ity of λ = −3 is 2 and that of λ = 5 is 1. Note that the eigenspace
corresponding to λ = −3 is a plane through the origin and the
eigenspace corresponding to λ = 5 is a line through the origin, as
displayed in Figure 6.2. Moreover {(−1, −2, 1), (−2, 1, 0), (3, 0, 1)} is
a basis of R3 all of whose elements are eigenvectors.
We summarise the two definitions of multiplicity.
(Figure 6.2: The eigenspaces for Example 6.9 — E−3 is a plane and E5 is a line through the origin in R3 .)

Definition 6.10. (Multiplicity of eigenvalue)


Let λi be an eigenvalue of the matrix A. The geometric multiplicity of
λi is dim( Eλi ), while the algebraic multiplicity of λi is the number of
factors (λi − λ) in the factorisation of the characteristic polynomial of A.

It can be proved that the geometric multiplicity of an eigenvalue


is always at most its algebraic multiplicity.

6.3 Some properties of eigenvalues and eigenvectors

Let A be an n × n matrix with eigenvalues λ1 , λ2 , · · · , λn , where we


include all complex and repeated eigenvalues. Then:

• The determinant of the matrix A equals the product of the eigenvalues:
det( A) = λ1 λ2 . . . λn .
(This and the trace property below are checked numerically in the sketch following this list.)

• The trace of a square matrix is the sum of its diagonal entries.


The trace of the matrix A equals the sum of the eigenvalues:

trace( A) = a11 + a22 + · · · + ann = λ1 + λ2 + · · · + λn .

Note that in both of these formulae all n eigenvalues must be


counted.

• The eigenvalues of A^{−1} (if it exists) are 1/λ1 , 1/λ2 , · · · , 1/λn .

• The eigenvalues of A T (that is, the transpose of A) are the same


as for the matrix A:

λ1 , λ2 ,··· , λn .

• If k is a scalar then the eigenvalues of the matrix kA are

kλ1 , kλ2 ,··· , kλn .

• If k is a scalar and I the identity matrix then the eigenvalues of


the matrix A + kI are

λ1 + k, λ2 + k ,··· , λn + k.

• If k is a positive integer then the eigenvalues of A^k are λ1^k , λ2^k , · · · , λn^k .

• Any matrix polynomial in A:

A^n + α_{n−1} A^{n−1} + · · · + α_1 A + α_0 I

has eigenvalues

λ^n + α_{n−1} λ^{n−1} + · · · + α_1 λ + α_0    for λ = λ1 , λ2 , . . . , λn .

• The Cayley-Hamilton Theorem: A matrix A satisfies its own characteristic equation, that is, if the characteristic equation is

(−1)^n λ^n + c_{n−1} λ^{n−1} + · · · + c_1 λ + c_0 = 0

where c_0 , c_1 , · · · , c_{n−1} are constants then

(−1)^n A^n + c_{n−1} A^{n−1} + · · · + c_1 A + c_0 I = 0.

• It can be shown that any set of ℓ vectors from ℓ different eigenspaces, that is, corresponding to different eigenvalues, is a linearly independent set.
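A minimal numerical check of the determinant and trace properties, using an arbitrarily chosen matrix (Python/NumPy sketch):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))   # an arbitrary (illustrative) 4 x 4 matrix

lam = np.linalg.eigvals(A)        # all four eigenvalues (possibly complex)

print(np.allclose(np.prod(lam), np.linalg.det(A)))   # True: det(A) = product of eigenvalues
print(np.allclose(np.sum(lam), np.trace(A)))         # True: trace(A) = sum of eigenvalues
```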

6.4 Diagonalisation

Suppose that the n × n matrix A = ASS has enough eigenvectors


so that we can construct a basis B = {v1 , v2 , . . . , vn } only consisting
of eigenvectors. Then if we perform a change of basis on the matrix
to determine A BB , then that matrix will be diagonal! Indeed the
i-th column of A BB represents Avi with coordinates in terms of the
basis B. Since Avi = λi vi , we get all zero coordinates except for the
i-th one equal to λi .
We know from Chapter 5 that A BB = PBS ASS PSB . To reflect that
A BB is a diagonal matrix, we write D = A BB , and for short we
will write P = PSB . Then we also have PBS = P^{−1}. Hence we can rewrite the diagonalisation formula. (Recall from Example 5.3 that the matrix PSB is particularly easy to determine: just take as columns the vectors of B.)

Let A be an n × n matrix and suppose there exists a basis B


of Rn only consisting of eigenvectors of A, then there exists a
diagonal matrix D and an invertible matrix P such that

D = P−1 AP.

More precisely, P has for columns the eigenvectors in B, and D


is a diagonal matrix with entries the eigenvalues corresponding
to those eigenvectors (in the same order).

Two matrices M and N are called similar matrices if there exists


an invertible matrix Q such that

N = Q−1 MQ.

Clearly A and the diagonal matrix D constructed from the eigenval-


ues of A are similar.
Example 6.11. (Example 6.9 revisited) Consider

A = [ −2   2  −3 ]
    [  2   1  −6 ]
    [ −1  −2   0 ] .

We found a basis of eigenvectors B = {(−1, −2, 1), (−2, 1, 0), (3, 0, 1)}. Thus we take

P = [ −1  −2  3 ]        D = [ 5   0   0 ]
    [ −2   1  0 ]            [ 0  −3   0 ]
    [  1   0  1 ]            [ 0   0  −3 ] .
Note that the columns of D and P must correspond, e.g. λ = 5 is in
column 1 of D so the corresponding eigenvector must be in column 1 of P.
We easily check that the diagonalisation formula holds.
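The check can also be done numerically (Python/NumPy sketch):

```python
import numpy as np

A = np.array([[-2.0,  2.0, -3.0],
              [ 2.0,  1.0, -6.0],
              [-1.0, -2.0,  0.0]])

# Columns of P are the eigenvectors found in Example 6.9, in the chosen order.
P = np.array([[-1.0, -2.0, 3.0],
              [-2.0,  1.0, 0.0],
              [ 1.0,  0.0, 1.0]])

D = np.linalg.inv(P) @ A @ P
print(np.round(D))
# [[ 5.  0.  0.]
#  [ 0. -3.  0.]
#  [ 0.  0. -3.]]
```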
Example 6.12. (An engineering example) In a number of branches of
engineering one encounters stress (and strain) tensors. These are in fact
matrix representations of (linearised) mechanical considerations. For ex-
ample, the stress tensor is used to calculate the stress in a given direction
at any point of interest (T (n) = σn in one of the standard notations). The
eigenvalues are referred to as principal stresses and the eigenvectors as
principal directions. The well-known (to these branches of engineering)
transformation rule for the stress tensor is essentially the diagonalisation
formula.
Diagonalisation is the process of determining a matrix P such that P^{−1} AP is diagonal. All we need to do is to find the eigenvalues and
eigenvectors of A and form P as described above.
Note, however, that not every matrix is diagonalisable.
Example 6.13. Consider

A = [ 1  −1   0 ]
    [ 0   1   1 ]
    [ 0   0  −2 ] .

The eigenvalues of A are λ = −2, 1, 1 (that is, λ = 1 has algebraic


multiplicity 2).
The eigenspace corresponding to λ = −2 is

E−2 = span((1, 3, −9)).

The eigenspace corresponding to λ = 1 is

E1 = span((1, 0, 0)).

Both λ = −2 and λ = 1 have geometric multiplicity 1. This means that for


λ = 1 the geometric multiplicity is less than the algebraic multiplicity. In
order to get the matrix P we need a basis of eigenvectors. Unfortunately,
there are not enough linearly independent eigenvectors to enable us to
build matrix P. Hence matrix A is not diagonalisable.

In order for a matrix to be diagonalisable, the characteristic poly-


nomial must be factorisable fully into linear factors, and all geo-
metric multiplicities must be equal to the algebraic multiplicities.
Otherwise we won’t have enough linearly independent eigenvec-
tors.

Remark 6.14. If an n × n matrix has n distinct eigenvalues then it


will be diagonalisable because each eigenvalue will give a representative
eigenvector and these will be linearly independent because they correspond
to different eigenvalues.

Remark 6.15. Recall from Definition 3.3 that a matrix A is called a


symmetric matrix if it equals its transpose, that is

A = AT .

It can be shown that the eigenvalues of a real symmetric n × n matrix


A are all real and that we can always find enough linearly independent
eigenvectors to form matrix P, even if there are less than n distinct eigen-
values. That is, a real symmetric matrix can always be diagonalised.
7
Improper integrals

Recall that definite integrals were defined in MATH1011 for bounded


functions on finite intervals [ a, b]. In this chapter we describe how
to generalise this to unbounded functions and unbounded domains,
using limits. These are called improper integrals.

7.1 Improper integrals over infinite intervals

Definition 7.1. (Type I improper integrals)


(a) Let the function f be defined on [ a, ∞) for some a ∈ R and integrable
over [ a, t] for any t > a. The improper integral of f over [ a, ∞) is defined
to be
∫_a^∞ f ( x ) dx = lim_{t→∞} ∫_a^t f ( x ) dx.

If the limit exists then the improper integral is called convergent. If the
limit does not exist then the improper integral is called divergent.
(b) Similarly
∫_{−∞}^b f ( x ) dx = lim_{t→−∞} ∫_t^b f ( x ) dx.

(c) Finally, suppose f ( x ) is defined for all x ∈ R. Consider an arbitrary


c ∈ R and define

∫_{−∞}^∞ f ( x ) dx = ∫_{−∞}^c f ( x ) dx + ∫_c^∞ f ( x ) dx.

The improper integral is convergent if and only if for some c ∈ R both


integrals on the right-hand-side are convergent.

Remark 7.2. It can be shown that the choice of c is not important; i.e.
if for one particular choice of c the integrals on the right-hand-side are
convergent, then the same is true for any other choice of c and the sum of
the two integrals is always the same. (Try to prove this.)

Example 7.3. Find the improper integral ∫_1^∞ (1/x^3) dx if it is convergent or show that it is divergent.
Solution: ∫_1^∞ (1/x^3) dx = lim_{t→∞} ∫_1^t (1/x^3) dx by definition. For any t > 1,

∫_1^t (1/x^3) dx = [ −1/(2x^2) ]_1^t = −1/(2t^2) + 1/2

which converges to 1/2 when t → ∞. Hence the improper integral is convergent and its value is 1/2 (cf. Figure 7.1).

(Figure 7.1: The area A under the graph of f ( x ) = 1/x^3 and above the interval [1, ∞) is finite even though the ‘boundaries’ of the area are infinitely long.)

Example 7.4. Find the improper integral ∫_1^∞ (1/x) dx if it is convergent or show that it is divergent.
Solution:

∫_1^∞ (1/x) dx = lim_{t→∞} ∫_1^t (1/x) dx = lim_{t→∞} [ln x]_1^t = lim_{t→∞} ln t

which does not exist. Hence the integral ∫_1^∞ (1/x) dx is divergent (cf. Figure 7.2).

Example 7.5. Find all constants p ∈ R such that the improper integral ∫_1^∞ (1/x^p) dx is convergent.

(Figure 7.2: The area under the graph of f ( x ) = 1/x and above the interval [1, ∞) is unbounded.)

Solution: For p = 1 the previous example shows that the integral is divergent. Suppose p ≠ 1. Then

∫_1^t (1/x^p) dx = [ x^{−p+1}/(−p + 1) ]_1^t = (t^{1−p} − 1)/(1 − p).

When 1 − p < 0, lim_{t→∞} t^{1−p} = 0 so the integral is convergent (to 1/(p − 1)). When 1 − p > 0, lim_{t→∞} t^{1−p} = ∞ and therefore the integral is divergent. Hence the integral is divergent for p ≤ 1 and otherwise convergent.
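These computations can also be reproduced symbolically. The following sketch uses the SymPy library (assumed available) to evaluate the improper integral for a few values of p:

```python
import sympy as sp

x = sp.symbols('x', positive=True)

print(sp.integrate(1 / x**3, (x, 1, sp.oo)))        # 1/2  (convergent, p = 3)
print(sp.integrate(1 / x, (x, 1, sp.oo)))           # oo   (divergent,  p = 1)
print(sp.integrate(1 / sp.sqrt(x), (x, 1, sp.oo)))  # oo   (divergent,  p = 1/2)
```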

7.2 Improper integrals of unbounded functions over finite intervals

Sometimes we want to integrate a function over an interval, even


though the function is not defined at some points in the interval.

Definition 7.6. (Type II improper integrals)


(a) Assume that for some a < b the function f is defined and continuous
on [ a, b) however it has some kind of singularity at b, e.g. f ( x ) → ∞ or
−∞ as x → b− . Define the improper integral of f over [ a, b] by
∫_a^b f ( x ) dx = lim_{t→b−} ∫_a^t f ( x ) dx.

(Recall from MATH1011 that the notations lim_{t→b−} and lim_{t→a+} represent the left-hand and right-hand limits respectively.)

If the limit exists, then the improper integral is called convergent, other-
wise it is divergent.
(b) In a similar way, if f is continuous on ( a, b] but has some kind of singularity at a, e.g. f ( x ) → ∞ or −∞ as x → a+ , we define

∫_a^b f ( x ) dx = lim_{t→a+} ∫_t^b f ( x ) dx.

(c) If for some c ∈ ( a, b), f is continuous on each of the intervals [ a, c)


and (c, b], however it has some kind of singularity at c, define

∫_a^b f ( x ) dx = ∫_a^c f ( x ) dx + ∫_c^b f ( x ) dx = lim_{t→c−} ∫_a^t f ( x ) dx + lim_{t→c+} ∫_t^b f ( x ) dx.

The improper integral on the left-hand side is convergent if and only if


both improper integrals on the right-hand-side are convergent.

Example 7.7. Consider the integral ∫_0^1 (1/√x) dx. This is an improper integral, since 1/√x is not defined at 0 and 1/√x → ∞ as x → 0.
By definition, ∫_0^1 (1/√x) dx = lim_{t→0+} ∫_t^1 (1/√x) dx.
For 0 < t < 1, we have

∫_t^1 (1/√x) dx = [2√x]_t^1 = 2 − 2√t

which converges to 2 as t → 0+ . So the improper integral is convergent and its value is 2. (This example shows that the area A under the graph of f ( x ) = 1/√x and above the interval (0, 1] is finite even though the ‘boundaries’ of the area are infinitely long.)

Example 7.8. Consider the integral ∫_0^2 1/( x − 1) dx. It is improper, since f ( x ) = 1/( x − 1) is not defined at x = 1 and is unbounded near x = 1. However, f ( x ) is continuous on [0, 1) ∪ (1, 2].
Therefore

∫_0^2 1/( x − 1) dx = ∫_0^1 1/( x − 1) dx + ∫_1^2 1/( x − 1) dx

and the integral on the left-hand-side is convergent if and only if both integrals on the right-hand-side are convergent.
Consider

∫_0^1 1/( x − 1) dx = lim_{t→1−} ∫_0^t 1/( x − 1) dx.

Given 0 < t < 1, we have

∫_0^t 1/( x − 1) dx = [ln | x − 1|]_0^t = ln |t − 1| − 0 = ln(1 − t)

which does not exist as t → 1− .


Hence this integral is divergent and therefore the original integral is also divergent. (The area under the graph of f ( x ) = 1/( x − 1) and above the interval [0, 2] is unbounded.) Note that if one integral is shown to be divergent we do not need to check if the other one is convergent or not; we already know the answer: the original improper integral is divergent.

Exercise 7.2.1. For what values of p is ∫_0^1 x^p dx improper, and in that case for what values of p is the integral divergent? Compare with Example 7.5.

7.3 More complicated improper integrals

Sometimes you can have a combination of problems: multiple


points with singularities, perhaps also infinite intervals. The method
in this case is to split the domain of integration to get a sum of im-
proper integrals which can each be solved independently. They
all need to be convergent in order for the original integral to be
convergent.

Key Concept 7.9. (Splitting the domain of integration for improper


integrals)

1. Identify all the problems: ∞, −∞, singularities contained in the


domain of integration.
2. Split the domain of integration into subintervals such that each
subinterval has a singularity or ∞/ − ∞ at exactly one end.
3. Solve the improper integral (type I or II) over each of these subin-
tervals as in the previous sections.
4. If at any point you find a divergent integral, the original integral
is divergent, so you don’t need to solve further. Otherwise, the
original integral is convergent, and its value is the sum of all the
values of all the improper integrals you computed at the previous
step.

Example 7.10. Consider the integral I = ∫_0^∞ 1/( x − 1) dx. The function f ( x ) = 1/( x − 1) has a singularity at x = 1 (which is in the domain of integration) and ∞ is part of the domain of integration.
Therefore we split the domain as follows:

I = ∫_0^1 1/( x − 1) dx + ∫_1^2 1/( x − 1) dx + ∫_2^∞ 1/( x − 1) dx

and the integral on the left-hand-side is convergent if and only if all three integrals on the right-hand-side are convergent. (We cannot compute ∫_1^∞ 1/( x − 1) dx directly as this integral has two problems, one at each end. Therefore we need to split up (1, ∞) into two subintervals; we arbitrarily chose to split at 2, but one could have split at any number larger than 1.)

We saw in Example 7.8 that the first improper integral ∫_0^1 1/( x − 1) dx is divergent, hence the integral I is divergent.
8
Sequences and series

8.1 Sequences

By a sequence we mean an infinite sequence of real numbers:

a1 , a2 , a3 , . . . , a n , . . .

We denote such a sequence by ( an ) or ( an )_{n=1}^∞ . Sometimes our sequences will start with am for some m ≠ 1.

Example 8.1. (Sequences)

1. 1, 1/2, 1/3, . . . , 1/n, . . .
Here an = 1/n for all integers n ≥ 1.

2. bn = (−1)^n n^3 for n ≥ 1, defines the sequence

−1, 2^3 , −3^3 , 4^3 , −5^3 , 6^3 , . . .

3. For any integer n ≥ 1, define an = 1 when n is odd, and an = 0 when


n is even. This gives the sequence

1, 0, 1, 0, 1, 0, 1, 0, . . .

4. The sequence of the so called Fibonacci numbers is defined recur-


sively as follows: a1 = a2 = 1, an+2 = an + an+1 for n ≥ 1. This is
then the sequence
1, 1, 2, 3, 5, 8, 13, . . .

In the same way that we could define the limit as x → ∞ of a


function f ( x ) we can also define the limit of a sequence. This is not
surprising since a sequence can be regarded as a function with the
domain being the set of positive integers.

Definition 8.2. (Intuitive definition of the limit of a sequence)



Let ( an ) be a sequence and L be a real number. We say that ( an ) has a limit L if we can make an arbitrarily close to L by taking n to be sufficiently large. We denote this situation by

lim_{n→∞} an = L.

We say that ( an ) is convergent if lim_{n→∞} an exists; otherwise we say that ( an ) is divergent.
(This definition can be made more precise in the following manner: we say that ( an ) has a limit L if for every ε > 0 there exists a positive integer N such that | an − L| < ε for all n ≥ N. Most proofs in this chapter use this definition, and hence are a bit technical; some proofs will only be sketched or omitted.)

Example 8.3. (Limits of sequences)

1. Let b be a real number and consider the constant sequence ( an ) where an = b for all n ≥ 1. Then lim_{n→∞} an = b.

2. Consider the sequence an = 1/n (n ≥ 1). Then lim_{n→∞} an = 0.

3. If α > 0 is a constant (that is, does not depend on n) and an = 1/n^α for any n ≥ 1, then lim_{n→∞} an = 0. For instance, taking α = 1/2 gives

1, 1/√2, 1/√3, . . . , 1/√n, . . . −→ 0

4. Consider the sequence ( an ) from Example 8.1(3) above, that is

1, 0, 1, 0, 1, 0, 1, 0, . . .

This sequence is divergent.

Just as for limits of functions of a real variable we have Limit


Laws and a Squeeze Theorem:

Theorem 8.4 (Limit laws). Let ( an ) and (bn ) be convergent sequences


with lim_{n→∞} an = a and lim_{n→∞} bn = b. Then:

1. lim_{n→∞} ( an ± bn ) = a ± b.

2. lim_{n→∞} (c an ) = c a for any constant c ∈ R.

3. lim_{n→∞} ( an bn ) = a b.

4. If b ≠ 0 and bn ≠ 0 for all n, then lim_{n→∞} an /bn = a/b.
Theorem 8.5 (The squeeze theorem or the sandwich theorem). Let
( an ), (bn ) and (cn ) be sequences such that lim_{n→∞} an = lim_{n→∞} cn = a and

an ≤ bn ≤ cn

for all sufficiently large n. Then the sequence (bn ) is also convergent and lim_{n→∞} bn = a.

We can use Theorems 8.4 and 8.5 to calculate limits of various


sequences.

Example 8.6. (Using Theorem 8.4 several times)

lim_{n→∞} (n^2 − n + 1)/(3n^2 + 2n − 1)
  = lim_{n→∞} [ n^2 (1 − 1/n + 1/n^2) ] / [ n^2 (3 + 2/n − 1/n^2) ]
  = lim_{n→∞} (1 − 1/n + 1/n^2) / lim_{n→∞} (3 + 2/n − 1/n^2)
  = ( lim_{n→∞} 1 − lim_{n→∞} 1/n + lim_{n→∞} 1/n^2 ) / ( lim_{n→∞} 3 + 2 lim_{n→∞} 1/n − lim_{n→∞} 1/n^2 )
  = 1/3
Example 8.7. (Using the squeeze theorem) Find lim_{n→∞} (cos n)/n if it exists.
Solution. (Note: Theorem 8.4 is not applicable here, since lim_{n→∞} cos n does not exist.) Since −1 ≤ cos n ≤ 1 for all n, we have

−1/n ≤ (cos n)/n ≤ 1/n

for all n ≥ 1. Using the Squeeze Theorem and the fact that lim_{n→∞} (−1/n) = lim_{n→∞} 1/n = 0, it follows that lim_{n→∞} (cos n)/n = 0.

A particular way for a sequence to be divergent is if it diverges


to ∞ or −∞.

Definition 8.8. (Diverging to infinity)


We say that the sequence ( an ) diverges to ∞ if given any positive num-
ber M we can always find a point in the sequence after which all terms are
greater than M. We denote this by lim_{n→∞} an = ∞ or an → ∞.
Similarly, we say that ( an ) diverges to −∞ if given any negative number M we can always find a point in the sequence after which all terms are less than M. We denote this by lim_{n→∞} an = −∞ or an → −∞.

Note that it follows from the definitions that if an → −∞, then


| an | → ∞.
Example 8.9. (Sequences diverging to ∞ or −∞)

1. Let an = n and bn = −n^2 for all n ≥ 1. Then an → ∞, while bn → −∞, and |bn | = n^2 → ∞.

2. an = (−1)^n n does not diverge to ∞ and an does not diverge to −∞, either. However, | an | = n → ∞.

3. Let r > 1 be a constant. Then r^n → ∞. If r < −1, then r^n does not diverge to ∞ or −∞. However, |r^n | = |r|^n → ∞.

Note as in the case of the limit of a function, when we write


lim an = ∞ we do not mean that the limit exists and is equal
n→∞
to some special number ∞. We are just using a combination of
symbols which has been agreed will be taken to mean "the limit
does not exist and the reason it does not exist is that the terms in
the sequence increase without bound’. In particular, ∞ IS NOT a
number and so do not try to do arithmetic with it, and do not use
Theorem 8.4 nor Theorem 8.5 if any of the sequences diverges to ∞
or −∞. If you do you will almost certainly end up with incorrect
results, sooner or later, and you will always be writing nonsense.

Example 8.10. (Problems with adding diverging sequences) The sequence


an = n diverges to ∞. The following sequences all diverge to −∞ but have
different behaviours when added to the sequence ( an ).

1. bn = −n. Then an + bn = 0 → 0

2. bn = 1 − n. Then an + bn = 1 → 1

3. bn = −n^2 . Then an + bn = n − n^2 = n(1 − n) → −∞

4. bn = −√n. Then an + bn = n − √n = √n(√n − 1) → ∞

The following properties can be easily derived from the defini-


tions.
Theorem 8.11. If an ≠ 0 for all n, then an → 0 if and only if 1/| an | → ∞. Similarly, if an > 0 for all n ≥ 1, then an → ∞ if and only if 1/an → 0.
It follows from Example 8.9(3) and this theorem that for any constant r with |r| < 1 we have r^n → 0.

8.1.1 Bounded sequences

Definition 8.12. (Bounded set)


A non-empty subset A of R is called bounded above if there exists
N ∈ R such that x ≤ N for every x ∈ A. Any such N is called an
upper bound of A. Similarly, A is called bounded below if there exists
M ∈ R such that x ≥ M for every x ∈ A. Any such M is called a lower
bound of A.
If A is bounded both below and above, then A is called bounded.

Clearly, the set A is bounded if and only if there exists a (finite)


interval [ M, N ] containing A.

Now we can apply this definition to a sequence by considering


the set of elements in the sequence, which we denote by { an }.

Definition 8.13. (Bounded sequence)


A sequence ( an )∞n=1 is called bounded above if the set { an } is bounded
above, that is if an is less than or equal to some number (an upper bound)
for all n. Similarly, it is called bounded below if the set { an } is bounded
below (by a lower bound).
We say that a sequence is bounded if it has both an upper and lower
bound.

In Examples 8.1, the sequences in part (1) and (3) are bounded,
the one in part (2) is bounded neither above nor below, while the
sequence in part (4) is bounded below but not above.

Theorem 8.14. Every convergent sequence is bounded.

It is important to note that the converse statement in Theorem


8.14 is not true; there exist bounded sequences that are divergent,
for instance Example 8.1(3) above. So this theorem is not used to
prove that sequences are convergent but to prove that they are not,
as in the example below.

Example 8.15. (Unbounded sequence is divergent) The sequence an = (−1)^n n^3 is not bounded, so by Theorem 8.14 it is divergent.

An upper bound of a sequence is not unique. For example, both


1 and 2 are upper bounds for the sequence an = 1/n. This motivates the following definition.

Definition 8.16. (Supremum and infimum)


Let A ⊂ R. The least upper bound of A (whenever A is bounded above) is
called the supremum of A and is denoted sup A.
Similarly, the greatest lower bound of A (whenever A is bounded be-
low) is called the infimum of A and is denoted inf A.

(Note that finite sets always have a maximum and a minimum. The notions of infimum and supremum are most useful for infinite sets.)
Notice that if the set A has a maximum (that is, a largest element), then sup A is the largest element of A. Similarly, if A has a
minimum (a smallest element), then inf A is the smallest element
of A. However, sup A always exists when A is bounded above even
when A has no maximal element. For example, if A is the open
interval (0, 1) then A does not have a maximal element (for any
a ∈ A there is always b ∈ A such that a < b). However, it has a
supremum and sup A = 1. Similarly, inf A always exists when A
is bounded below, regardless of whether A has a minimum or not.
For example, if A = (0, 1) then inf A = 0.

Definition 8.17. (Monotone)


A sequence ( an ) is called monotone if it is non-decreasing or non-increasing, that is, if either an ≤ an+1 for all n or an ≥ an+1 for all n. (Note that a constant sequence (see Example 8.3(1)) is both monotone non-decreasing and monotone non-increasing.)

We can now state one important property of sequences.

Theorem 8.18 (The monotone sequences theorem). If the sequence


( an )_{n=1}^∞ is non-decreasing and bounded above for all sufficiently large n, then the sequence is convergent and

lim_{n→∞} an = sup({ an }).

If ( an )_{n=1}^∞ is non-increasing and bounded below for all sufficiently large n, then ( an ) is convergent and

lim_{n→∞} an = inf({ an }).

That is, every monotone bounded sequence is convergent.

Theorem 8.18 and the definition of divergence to ±∞ imply the


following:

Key Concept 8.19. For any non-decreasing sequence

a1 ≤ a2 ≤ a3 ≤ · · · ≤ an ≤ an+1 ≤ · · ·

there are only two options:

Option 1. The sequence is bounded above. Then by the Monotone Sequences Theorem the sequence is convergent, that is, lim_{n→∞} an exists.
Option 2. The sequence is not bounded above. It then follows from the definition of divergence to ∞ that lim_{n→∞} an = ∞.
Similarly, for any non-increasing sequence

b1 ≥ b2 ≥ b3 ≥ · · · ≥ bn ≥ bn+1 ≥ · · ·

either the sequence is bounded below and then lim_{n→∞} bn exists or the sequence is not bounded below and then lim_{n→∞} bn = −∞.

Example 8.20. Let an = 1/n for all n ≥ 1. As we know,

lim_{n→∞} an = 0 = inf { 1/n | n = 1, 2, . . . }.

This just confirms Theorem 8.18.

The following limits of sequences can sometimes be useful (do


not memorise them though). The first three sequences are non-
increasing for n sufficiently large and the last one is non-decreasing
for n sufficiently large (these facts are hard to prove).

Theorem 8.21. 1. For every real constant α > 0,

lim_{n→∞} (ln n)/n^α = 0 .

2. lim_{n→∞} n^{1/n} = 1 (equivalently, the n-th root of n tends to 1).

3. For every constant a ∈ R,

lim_{n→∞} a^n /n! = 0 .

4. For every constant a ∈ R,

lim_{n→∞} (1 + a/n)^n = e^a .

8.2 Infinite series

Definition 8.22. (Infinite series)


An infinite series is, by definition, an expression of the form

∑_{n=1}^∞ an = a1 + a2 + · · · + an + · · ·        (8.1)

where a1 , a2 , . . . , an , . . . is a sequence of real numbers. (It is important to be clear about the difference between a sequence and a series. These terms are often used interchangeably in ordinary English but in mathematics they have distinctly different and precise meanings.)

As for sequences, the series will sometimes start with am with m 6=


1 (for instance m = 0).

Example 8.23. (Infinite series)

1. The geometric series with common ratio r is

∑_{n=0}^∞ r^n = 1 + r + r^2 + · · · + r^{n−1} + . . .

2. The harmonic series is

1 + 1/2 + 1/3 + · · · + 1/n + . . .

3. The p-series is

∑_{n=1}^∞ 1/n^p

where p ∈ R is an arbitrary constant. Note that the case p = 1 is the harmonic series.

Since an infinite series involves the sum of infinitely many terms


this raises the issue of what the sum actually is. We can deal with
this in a precise manner by the following definition.

Definition 8.24. (Convergent and divergent series)


For every n ≥ 1,
s n = a1 + a2 + · · · + a n

is called the nth partial sum of the series in Equation (8.1). Then (sn ) is
a sequence. If lim_{n→∞} sn = s we say that the infinite series is convergent and write

∑_{n=1}^∞ an = s.

The number s is then called the sum of the series. If lim_{n→∞} sn does not exist, we say that the infinite series (8.1) is divergent.

In other words, when a series is convergent we have



∑ an = nlim
→∞
( a1 + a2 + · · · + a n ).
n =1

Example 8.25. (Convergent series) The interval (0, 1] can be covered


by subintervals as follows: first take the subinterval [1/2, 1] (of length
1/2), then the subinterval [1/4, 1/2] (of length 1/4); then the subinterval
[1/8, 1/4] (of length 1/8), etc. The total length of these subintervals is 1,
which implies the geometric series with common ratio 1/2 converges to 2:

1 = 1/2 + 1/4 + 1/8 + · · · + 1/2^n + · · ·
We now generalise this more formally.

Example 8.26. (Convergence of geometric series) Consider the geometric


series from Example 8.23(1):

∑_{n=0}^∞ r^n = 1 + r + r^2 + · · · + r^{n−1} + · · ·

For the nth partial sum of the above series we have

sn = 1 + r + r^2 + · · · + r^{n−1} = (1 − r^n)/(1 − r)    when r ≠ 1.

1. Let |r| < 1. Then lim_{n→∞} sn = lim_{n→∞} (1 − r^n)/(1 − r) = 1/(1 − r), since lim_{n→∞} r^n = 0 whenever |r| < 1. Thus the geometric series is convergent in this case and we write

1 + r + r^2 + · · · + r^{n−1} + · · · = 1/(1 − r).

Going back to Example 8.25, now rigorously we have

∑_{n=1}^∞ 1/2^n = −1 + ∑_{n=0}^∞ 1/2^n = −1 + 1/(1 − 1/2) = 1.

2. If |r| > 1, then lim_{n→∞} r^n does not exist (in fact |r^n| → ∞ as n → ∞), so sn = (1 − r^n)/(1 − r) does not have a limit; that is, the geometric series is divergent.

3. If r = 1, then sn = n → ∞ as n → ∞, so the geometric series is again


divergent.

4. Let r = −1. Then sn = 1 for odd n and sn = 0 for even n. Thus


the sequence of partial sums is 1, 0, 1, 0, 1, 0, . . . , which is a divergent
sequence. Hence the geometric series is again divergent.

In conclusion we get the following theorem.

Theorem 8.27 (Convergence of the geometric series). The geometric



series ∑ rn is convergent if and only if |r| < 1, in which case its sum is
n =0
1
.
1−r
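A quick numerical illustration of this convergence (plain Python): the partial sums for r = 1/2 approach 1/(1 − r) = 2.

```python
# Partial sums s_n = 1 + r + ... + r^(n-1) of the geometric series with r = 1/2.
r = 0.5
s, term = 0.0, 1.0
for n in range(1, 21):
    s += term
    term *= r
    if n % 5 == 0:
        print(n, s)   # the values approach 1/(1 - 0.5) = 2
```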

If the series ∑_{n=1}^∞ an is convergent, then by definition lim_{n→∞} sn = s. Of course we also have that lim_{n→∞} sn−1 = s. Since an = sn − sn−1 , we get that

lim_{n→∞} an = lim_{n→∞} (sn − sn−1 ) = lim_{n→∞} sn − lim_{n→∞} sn−1 = s − s = 0

(we used the limit laws Theorem 8.4(1)). Therefore we get the fol-
lowing theorem.

Theorem 8.28. If the series ∑_{n=1}^∞ an is convergent, then lim_{n→∞} an = 0.

The above theorem says that lim_{n→∞} an = 0 is necessary for convergence. The theorem is most useful in the following equivalent form.

Theorem 8.29 (Test for Divergence). If the sequence ( an ) does not converge to 0, then the series ∑_{n=1}^∞ an is divergent.

Example 8.30. (Using the Test for Divergence) The series ∑_{n=1}^∞ (n^2 + 2n + 1)/(−3n^2 + 4) is divergent by the Test for Divergence, since

lim_{n→∞} an = lim_{n→∞} (n^2 + 2n + 1)/(−3n^2 + 4) = lim_{n→∞} (1 + 2/n + 1/n^2)/(−3 + 4/n^2) = −1/3 ≠ 0 .
Note that Theorem 8.28 does not say that lim_{n→∞} an = 0 implies that the series is convergent. For instance the harmonic series is divergent even though lim_{n→∞} 1/n = 0 (see Theorem 8.35 below).
Convergent series behave well with regard to addition, subtraction and multiplication by a constant (but not multiplication or division of series).

Theorem 8.31 (Series laws). If the infinite series ∑_{n=1}^∞ an and ∑_{n=1}^∞ bn are convergent, then the series ∑_{n=1}^∞ ( an ± bn ) and ∑_{n=1}^∞ c an (for any constant c ∈ R) are also convergent with

∑_{n=1}^∞ ( an ± bn ) = ∑_{n=1}^∞ an ± ∑_{n=1}^∞ bn    and    ∑_{n=1}^∞ c an = c ∑_{n=1}^∞ an .

Example 8.32. Using Theorem 8.31 and the formula for the sum of a
(convergent) geometric series (see Theorem 8.27), we get

∑_{n=0}^∞ [ (−2/3)^n + 3/4^{n+1} ] = ∑_{n=0}^∞ (−2/3)^n + (3/4) ∑_{n=0}^∞ (1/4)^n
                                   = 1/(1 + 2/3) + (3/4) · 1/(1 − 1/4) = 8/5 .

8.2.1 The integral test



Recall that we associate any series ∑ an with two sequences,
n =1
namely the sequence (sn ) of its partial sums and the sequence ( an )
of its terms. The sequence ( an ) can be seen as a function whose
domain is the set of positive integers

f : N −→ R : n → an .

In other words an = f (n) for each positive integer n.



Theorem 8.33 (The integral test). Suppose that ∑_{n=1}^∞ an is an infinite series such that an > 0 for all n and f is a continuous, positive, decreasing function for x ≥ 1. If f (n) = an for all integers n ≥ 1, then the series and improper integral

∑_{n=1}^∞ an    and    ∫_1^∞ f ( x ) dx

either both converge or both diverge.
(Recall that a function f is decreasing if f ( x ) > f (y) whenever x < y. If f is differentiable, then f ′( x ) < 0 for all x is a sufficient condition.)



Note that this theorem also holds for a series ∑ an if all the
n=m
conditions on f hold for all x ≥ m.

Proof. Because $f$ is a decreasing function, the rectangular polygon with area
$$s_n = a_1 + a_2 + a_3 + \cdots + a_n$$
shown in Figure 8.1 contains the region under $y = f(x)$ from $x = 1$ to $x = n+1$. Hence
$$\int_{1}^{n+1} f(x)\,dx \leq s_n. \tag{8.2}$$

[Figure 8.1: Underestimating the partial sums with an integral.]

Similarly, the rectangular polygon with area
$$s_n - a_1 = a_2 + a_3 + \cdots + a_n$$
shown in Figure 8.2 is contained in the region under $y = f(x)$ from $x = 1$ to $x = n$. Hence
$$s_n - a_1 \leq \int_{1}^{n} f(x)\,dx. \tag{8.3}$$

[Figure 8.2: Overestimating the partial sums with an integral.]

Notice that since $f(x) > 0$ for all $x \geq 1$, $\int_{1}^{t} f(x)\,dx$ is an increasing function of $t$. Either that function is bounded and the improper integral $\int_{1}^{\infty} f(x)\,dx$ is convergent, or it is unbounded and the integral diverges to $\infty$ (this is a similar idea to Key Concept 8.19).
Suppose first that the improper integral $\int_{1}^{\infty} f(x)\,dx$ diverges. Then
$$\lim_{t\to\infty} \int_{1}^{t} f(x)\,dx = \infty,$$
so it follows from Equation (8.2) that $\lim_{n\to\infty} s_n = \infty$ as well, and hence the infinite series $\sum_{n=1}^{\infty} a_n$ likewise diverges.
Now suppose instead that the improper integral $\int_{1}^{\infty} f(x)\,dx$ converges to a finite value $I$. Then Equation (8.3) implies that
$$s_n \leq a_1 + \int_{1}^{n} f(x)\,dx \leq a_1 + I,$$
so the increasing sequence $(s_n)$ is bounded, and so converges by the monotone sequences theorem (Theorem 8.18). Thus the infinite series
$$\sum_{n=1}^{\infty} a_n = \lim_{n\to\infty} s_n$$
converges as well.
Hence we have shown that the infinite series and the improper integral either both converge or both diverge.

Remark 8.34. Unfortunately this theorem does not tell us what the sum of the series is, whenever it is convergent. In particular, the sum is not equal to $\int_{1}^{\infty} f(x)\,dx$.

Consider again the $p$-series from Example 8.23(3), that is,
$$\sum_{n=1}^{\infty} \frac{1}{n^p}$$
where $p \in \mathbb{R}$ is an arbitrary constant.


Theorem 8.35. The $p$-series is convergent if and only if $p > 1$.

In particular, the harmonic series (a $p$-series with $p = 1$) diverges.

Proof. If $p \leq 0$ then $\lim_{n\to\infty} \frac{1}{n^p} = \lim_{n\to\infty} n^{-p} \neq 0$, hence by the Test for Divergence the $p$-series diverges.
If $p > 0$, the function $f(x) = \frac{1}{x^p}$ is continuous, positive and decreasing for $x \geq 1$, since $f'(x) = -\frac{p}{x^{p+1}} < 0$ for $x \geq 1$. Thus we can apply the integral test.
Recall from Example 7.5 that the improper integral $\int_{1}^{\infty} \frac{1}{x^p}\,dx$ converges if $p > 1$ and diverges if $p \leq 1$.
It therefore follows from the integral test that the $p$-series converges if $p > 1$ and diverges if $p \leq 1$.

This result does not tell us what the sum of the series is.
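A numerical illustration of Theorem 8.35 (a sketch, not from the notes): for $p = 2$ the partial sums stabilise, while for $p = 1$ they keep growing, roughly like $\ln n$.

import math

def p_series_partial_sum(p, n):
    """Return the n-th partial sum of sum_{k>=1} 1/k**p."""
    return sum(1 / k**p for k in range(1, n + 1))

for n in (10, 100, 1000, 10000):
    s2 = p_series_partial_sum(2, n)       # converges (to pi**2 / 6, as shown later)
    s1 = p_series_partial_sum(1, n)       # diverges; grows like ln(n)
    print(n, round(s2, 6), round(s1, 3), round(math.log(n), 3))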

8.2.2 More convergence tests for series


There are several other results and tests that allow you to determine
if a series is convergent or not. We outline some of them in the
theorems below.
An easy way to determine if a series converges is to compare it
to a series whose behaviour we know. The first way to do this is in
an analogous way to the Squeeze Theorem.
Theorem 8.36 (The Comparison Test). Let $\sum_{n=1}^{\infty} a_n$ and $\sum_{n=1}^{\infty} b_n$ be infinite series such that
$$0 \leq a_n \leq b_n$$
holds for all sufficiently large $n$.

1. If $\sum_{n=1}^{\infty} b_n$ is convergent then $\sum_{n=1}^{\infty} a_n$ is convergent.

2. If $\sum_{n=1}^{\infty} a_n$ is divergent then $\sum_{n=1}^{\infty} b_n$ is divergent.

Proof. (Sketch) Compare the two sequences of partial sums, notice


they are non-decreasing and use Key Concept 8.19.

Since we know the behaviour of a p-series (Theorem 8.35), we


usually use one in our comparison.

Example 8.37. (Using the Comparison Test)

1. Consider the series
$$\sum_{n=1}^{\infty} \frac{1 + \sin n}{n^2}.$$
Since $-1 \leq \sin n \leq 1$, we have $0 \leq 1 + \sin n \leq 2$ for all $n$. Thus
$$0 \leq \frac{1 + \sin n}{n^2} \leq \frac{2}{n^2} \tag{8.4}$$
for all integers $n \geq 1$.
The series $\sum_{n=1}^{\infty} \frac{1}{n^2}$ is convergent, since it is a $p$-series with $p = 2 > 1$. Thus $\sum_{n=1}^{\infty} \frac{2}{n^2}$ is convergent by the series laws, and now Equation (8.4) and the Comparison Test show that the series $\sum_{n=1}^{\infty} \frac{1 + \sin n}{n^2}$ is also convergent.

2. Consider the series $\displaystyle\sum_{n=1}^{\infty} \frac{\ln(n)}{n}$. Since
$$0 < \frac{1}{n} \leq \frac{\ln(n)}{n}$$
for all integers $n \geq 3$, and the series $\sum_{n=1}^{\infty} \frac{1}{n}$ is divergent (it is the harmonic series), the Comparison Test implies that the series $\sum_{n=1}^{\infty} \frac{\ln(n)}{n}$ is also divergent.

Another way to compare two series is to take the limit of the


ratio of terms from the corresponding sequences. This allows us to
compare the rates at which the two sequences go to 0.
Theorem 8.38 (The Limit Comparison Test). Let $\sum_{n=1}^{\infty} a_n$ and $\sum_{n=1}^{\infty} b_n$ be infinite series such that $a_n \geq 0$ and $b_n > 0$ for sufficiently large $n$, and let
$$c = \lim_{n\to\infty} \frac{a_n}{b_n} \geq 0.$$

(a) If $0 < c < \infty$, then $\sum_{n=1}^{\infty} a_n$ is convergent if and only if $\sum_{n=1}^{\infty} b_n$ is convergent.

(b) If $c = 0$ and $\sum_{n=1}^{\infty} b_n$ is convergent then $\sum_{n=1}^{\infty} a_n$ is convergent.

(c) If $\lim_{n\to\infty} \frac{a_n}{b_n} = \infty$ and $\sum_{n=1}^{\infty} a_n$ is convergent then $\sum_{n=1}^{\infty} b_n$ is convergent.

The proof uses the formal definition of limits and the Compari-
son Test.

Remark 8.39. 1. Clearly in case (a) above we have that $\sum_{n=1}^{\infty} a_n$ is divergent whenever $\sum_{n=1}^{\infty} b_n$ is divergent. In case (b), if $\sum_{n=1}^{\infty} a_n$ is divergent, then $\sum_{n=1}^{\infty} b_n$ must also be divergent. And in case (c), if $\sum_{n=1}^{\infty} b_n$ is divergent, then $\sum_{n=1}^{\infty} a_n$ is divergent.

2. Notice that in cases (b) and (c) we have implications (not equivalences). For example, in case (b) if we know that $\sum_{n=1}^{\infty} a_n$ is convergent, we cannot claim the same for $\sum_{n=1}^{\infty} b_n$. Similarly, in case (c) if $\sum_{n=1}^{\infty} b_n$ is convergent, we cannot conclude the same about $\sum_{n=1}^{\infty} a_n$.

Again we will often compare with a p-series.

Example 8.40. (Using the Limit Comparison Test)

1. Consider the series $\displaystyle\sum_{n=1}^{\infty} \frac{\sin^2 n + n}{2n^2 - 1}$. To check whether the series is convergent or divergent we will compare it with the series $\sum_{n=1}^{\infty} b_n$, where $b_n = \frac{1}{n}$. For $a_n = \frac{\sin^2 n + n}{2n^2 - 1}$, we have
$$\frac{a_n}{b_n} = \frac{n(\sin^2 n + n)}{2n^2 - 1} = \frac{\frac{\sin^2 n}{n} + 1}{2 - \frac{1}{n^2}} \to \frac{1}{2}$$
as $n \to \infty$, since $0 \leq \frac{\sin^2 n}{n} \leq \frac{1}{n}$, so by the Squeeze Theorem $\lim_{n\to\infty} \frac{\sin^2 n}{n} = 0$.
Now the series $\sum_{n=1}^{\infty} b_n = \sum_{n=1}^{\infty} \frac{1}{n}$ is divergent (it is the harmonic series), so part (a) of the Limit Comparison Test implies that $\sum_{n=1}^{\infty} a_n$ is also divergent.

2. Consider the series $\displaystyle\sum_{n=1}^{\infty} \frac{2\sqrt{n} + 3}{3n^2 - 1}$. To check whether the series is convergent or divergent we will compare it with the series $\sum_{n=1}^{\infty} b_n$, where $b_n = \frac{1}{n^{3/2}}$. For $a_n = \frac{2\sqrt{n} + 3}{3n^2 - 1}$, we have
$$\frac{a_n}{b_n} = \frac{n^{3/2}(2\sqrt{n} + 3)}{3n^2 - 1} = \frac{2 + \frac{3}{\sqrt{n}}}{3 - \frac{1}{n^2}} \to \frac{2}{3}$$
as $n \to \infty$.
Since the series $\sum_{n=1}^{\infty} b_n = \sum_{n=1}^{\infty} \frac{1}{n^{3/2}}$ is convergent (it is a $p$-series with $p = 3/2 > 1$), part (a) of the Limit Comparison Test shows that $\sum_{n=1}^{\infty} a_n$ is also convergent.

8.2.3 Alternating series

Alternating series are infinite series of the form
$$\sum_{n=1}^{\infty} (-1)^{n-1} a_n = a_1 - a_2 + a_3 - a_4 + a_5 - a_6 + \cdots$$
where $a_1, a_2, \ldots$ is a sequence with $a_n \geq 0$ for all $n$.

Theorem 8.41 (The Alternating Series Test). Let $(a_n)$ be a non-increasing sequence such that $\lim_{n\to\infty} a_n = 0$ (such a sequence must satisfy $a_n \geq 0$ for all $n$). Then the alternating series $\sum_{n=1}^{\infty} (-1)^{n-1} a_n$ is convergent. Moreover, if $s$ is the sum of the alternating series and $s_n$ its $n$th partial sum, then $|s - s_n| \leq a_{n+1}$ for all $n \geq 1$.

The proof is quite technical and involves showing that the se-
quence of partial sums with even indices is non-decreasing and
bounded above.

Remark 8.42. The conclusion of Theorem 8.41 about the convergence of


an alternating series remains true if we assume that ( an ) is non-increasing
for sufficiently large n and of course that lim an = 0. In other words,
n→∞
the sequence can increase for a finite number of terms before becoming
non-increasing.
Example 8.43. (Using the Alternating Series Test) The series
$$\sum_{n=1}^{\infty} (-1)^{n-1} \frac{1}{n} = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \frac{1}{6} + \cdots$$
is convergent according to the Alternating Series Test, since for $a_n = \frac{1}{n}$, the sequence $(a_n)$ is decreasing and converges to 0.
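The error bound in Theorem 8.41 can be seen numerically. The following sketch (not part of the notes) uses the known value $\ln 2$ for the sum of the alternating harmonic series and checks that $|s - s_n| \leq a_{n+1} = \frac{1}{n+1}$.

import math

s = math.log(2)                        # sum of the alternating harmonic series
s_n = 0.0
for n in range(1, 21):
    s_n += (-1)**(n - 1) / n           # add the n-th term
    error = abs(s - s_n)
    assert error <= 1 / (n + 1) + 1e-12   # bound from the Alternating Series Test
    print(n, round(s_n, 6), round(error, 6), round(1 / (n + 1), 6))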

8.2.4 Absolute convergence and the ratio test

We begin with a theorem.

Theorem 8.44. If the infinite series $\sum_{n=1}^{\infty} |a_n|$ is convergent, then the series $\sum_{n=1}^{\infty} a_n$ is also convergent.

Proof. We apply the Comparison Test in an ingenious way. Assume the infinite series $\sum_{n=1}^{\infty} |a_n|$ is convergent. Let $b_n = a_n + |a_n|$. Since $a_n = \pm|a_n|$, we have that $0 \leq b_n \leq 2|a_n|$. By the series laws, $\sum_{n=1}^{\infty} 2|a_n|$ is convergent, and so $\sum_{n=1}^{\infty} b_n$ is also convergent by the Comparison Test. Finally we use the series laws again:
$$\sum_{n=1}^{\infty} (b_n - |a_n|) = \sum_{n=1}^{\infty} b_n - \sum_{n=1}^{\infty} |a_n| = \sum_{n=1}^{\infty} a_n$$
is convergent.

This motivates the following definition.

Definition 8.45. (Absolute convergence)
An infinite series $\sum_{n=1}^{\infty} a_n$ is called absolutely convergent if the series $\sum_{n=1}^{\infty} |a_n|$ is convergent. If $\sum_{n=1}^{\infty} a_n$ is convergent but $\sum_{n=1}^{\infty} |a_n|$ is divergent then we say that $\sum_{n=1}^{\infty} a_n$ is conditionally convergent.

As Theorem 8.44 shows, every absolutely convergent series is convergent. The next example shows that the converse is not true.

Example 8.46. (Conditionally convergent) The series $\sum_{n=1}^{\infty} (-1)^{n-1} \frac{1}{n}$ is convergent (see Example 8.43). However,
$$\sum_{n=1}^{\infty} \left| (-1)^{n-1} \frac{1}{n} \right| = \sum_{n=1}^{\infty} \frac{1}{n}$$
is divergent (it is the harmonic series). Thus $\sum_{n=1}^{\infty} (-1)^{n-1} \frac{1}{n}$ is conditionally convergent.

Example 8.47. (Absolutely convergent) Consider the series $\displaystyle\sum_{n=1}^{\infty} \frac{\sin n}{n^2}$. Since
$$0 \leq \left| \frac{\sin n}{n^2} \right| \leq \frac{1}{n^2}$$
and the series $\sum_{n=1}^{\infty} \frac{1}{n^2}$ is convergent (a $p$-series with $p = 2 > 1$), the Comparison Test implies that $\sum_{n=1}^{\infty} \left| \frac{\sin n}{n^2} \right|$ is convergent. Hence $\sum_{n=1}^{\infty} \frac{\sin n}{n^2}$ is absolutely convergent and therefore convergent.

The following test is very useful.

Theorem 8.48 (The Ratio Test). Let $\sum_{n=1}^{\infty} a_n$ be such that
$$\lim_{n\to\infty} \left| \frac{a_{n+1}}{a_n} \right| = L.$$

1. If $L < 1$, then $\sum_{n=1}^{\infty} a_n$ is absolutely convergent.

2. If $L > 1$, then $\sum_{n=1}^{\infty} a_n$ is divergent.

Note that when $L = 1$ the Ratio Test gives no information.

Proof. 1. Suppose $L < 1$. Choose a number $r$ such that $L < r < 1$. For $n$ large enough (say for $n \geq N$), the ratio $\left|\frac{a_{n+1}}{a_n}\right|$ will eventually be less than $r$ (this follows from the formal definition of a limit). Therefore $|a_{n+1}| < r|a_n|$ for all $n \geq N$. In particular
$$|a_{N+1}| < r|a_N| = r^1 |a_N|,$$
$$|a_{N+2}| < r|a_{N+1}| < r^2 |a_N|,$$
$$|a_{N+3}| < r|a_{N+2}| < r^3 |a_N|, \quad \text{etc.}$$
In general $|a_{N+t}| < r^t |a_N|$ for every positive integer $t$.
We are now going to use the Comparison Test with the two series
$$\sum_{n=1}^{\infty} |a_{N+n-1}| = |a_N| + |a_{N+1}| + |a_{N+2}| + \cdots \qquad\text{and}$$
$$\sum_{n=1}^{\infty} r^{n-1}|a_N| = |a_N| + r|a_N| + r^2|a_N| + \cdots = |a_N|(1 + r + r^2 + \cdots) = |a_N|\,\frac{1}{1-r}.$$
We have $0 \leq |a_{N+n-1}| \leq r^{n-1}|a_N|$ for all $n$ as seen above, and the second series converges by Theorem 8.27 and the series laws (it is a multiple of a geometric series with $|r| < 1$). Hence the first series converges. Now $\sum_{n=1}^{\infty} |a_n|$ is just obtained by adding a finite number of terms to $\sum_{n=1}^{\infty} |a_{N+n-1}|$ (namely adding $|a_1| + |a_2| + \cdots + |a_{N-1}|$), so it is also convergent. Therefore, our series $\sum_{n=1}^{\infty} a_n$ is absolutely convergent (and therefore convergent).

2. If $L > 1$, then for $n$ sufficiently large $\left|\frac{a_{n+1}}{a_n}\right| > 1$, that is, the sequence $(|a_n|)$ is eventually increasing (this follows from the formal definition of a limit), so we cannot possibly have $\lim_{n\to\infty} a_n = 0$. By the Test for Divergence (Theorem 8.29) the series $\sum_{n=1}^{\infty} a_n$ is divergent.

Example 8.49. (Using the Ratio Test with L < 1)

1. Consider the series $\displaystyle\sum_{n=1}^{\infty} \frac{n^2}{2^n}$. We have
$$\left| \frac{a_{n+1}}{a_n} \right| = a_{n+1} \times \frac{1}{a_n} = \frac{(n+1)^2}{2^{n+1}} \times \frac{2^n}{n^2} = \frac{1}{2}\left(\frac{n+1}{n}\right)^2 = \frac{1}{2}\left(1 + \frac{1}{n}\right)^2 \to \frac{1}{2}$$
as $n \to \infty$. Since $L = \frac{1}{2} < 1$, the Ratio Test implies that $\sum_{n=1}^{\infty} \frac{n^2}{2^n}$ is absolutely convergent and hence convergent.

2. For any constant $b \in \mathbb{R}$, consider the infinite series $\displaystyle\sum_{n=0}^{\infty} \frac{b^n}{n!}$. Then $a_n = \frac{b^n}{n!}$. We have
$$\left| \frac{a_{n+1}}{a_n} \right| = \left| \frac{b^{n+1}}{(n+1)!} \times \frac{n!}{b^n} \right| = \frac{|b|}{n+1} \to 0$$
as $n \to \infty$. So, by the Ratio Test, the series $\sum_{n=0}^{\infty} \frac{b^n}{n!}$ is absolutely convergent. Note that by the Test for Divergence this implies Theorem 8.21(3).
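A numerical sketch (not from the notes) of the ratios in Example 8.49: for $a_n = n^2/2^n$ the ratios $|a_{n+1}/a_n|$ approach $1/2$, while for $a_n = b^n/n!$ they approach $0$.

from math import factorial

def ratios(a, n_max):
    """Return |a(n+1)/a(n)| for n = 1, ..., n_max."""
    return [abs(a(n + 1) / a(n)) for n in range(1, n_max + 1)]

print(ratios(lambda n: n**2 / 2**n, 5))            # tends to 0.5
b = 3.0
print(ratios(lambda n: b**n / factorial(n), 5))    # tends to 0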

Example 8.50. (Using the Ratio Test with L > 1) Consider the series $\displaystyle\sum_{n=1}^{\infty} \frac{n\,4^n}{(-3)^n}$. We have
$$\left| \frac{a_{n+1}}{a_n} \right| = \frac{(n+1)\,4^{n+1}}{3^{n+1}} \times \frac{3^n}{n\,4^n} = \frac{4}{3}\left(1 + \frac{1}{n}\right) \to \frac{4}{3} > 1.$$
So, by the Ratio Test, the series $\sum_{n=1}^{\infty} \frac{n\,4^n}{(-3)^n}$ is divergent.

8.3 Power series

Definition 8.51. (Power series)
Let $a \in \mathbb{R}$ be a given number, $(b_n)_{n=0}^{\infty}$ a given sequence of real numbers and $x \in \mathbb{R}$ a variable (parameter). A series of the form
$$\sum_{n=0}^{\infty} b_n (x-a)^n = b_0 + b_1(x-a) + b_2(x-a)^2 + \cdots + b_n(x-a)^n + \cdots$$
is called a power series centred at $a$. When $a = 0$, the series is simply called a power series.

Clearly a power series can be regarded as a function of $x$ defined for all $x \in \mathbb{R}$ for which the infinite series is convergent.
Let us apply the Ratio Test to this series. The terms of this series are of the form $a_n = b_n(x-a)^n$, so
$$L = \lim_{n\to\infty} \left| \frac{a_{n+1}}{a_n} \right| = \lim_{n\to\infty} \frac{|b_{n+1}||x-a|^{n+1}}{|b_n||x-a|^{n}} = |x-a| \times \lim_{n\to\infty} \left| \frac{b_{n+1}}{b_n} \right|.$$
There are three cases, according to $\lim_{n\to\infty} \left| \frac{b_{n+1}}{b_n} \right|$.

(a) Suppose $\lim_{n\to\infty} \left| \frac{b_{n+1}}{b_n} \right|$ is a positive real number, which we denote by $\frac{1}{R}$. Then $L = \frac{|x-a|}{R}$. By the Ratio Test, the series diverges when $L > 1$, that is when $|x-a| > R$, and is absolutely convergent when $L < 1$, that is when $|x-a| < R$. Note that
$$R = \left( \lim_{n\to\infty} \left| \frac{b_{n+1}}{b_n} \right| \right)^{-1} = \lim_{n\to\infty} \left| \frac{b_n}{b_{n+1}} \right|.$$

(b) If $\lim_{n\to\infty} \left| \frac{b_{n+1}}{b_n} \right| = 0$, then $L = 0$ and so by the Ratio Test the power series is absolutely convergent for all $x$.

(c) If $\lim_{n\to\infty} \left| \frac{b_{n+1}}{b_n} \right| = \infty$, then $L = \infty$ and so by the Ratio Test the power series diverges EXCEPT if $x = a$, in which case $L = 0$ and the series converges. We easily see that the series reduces to just $b_0$ when $x = a$.

Therefore we have proved the following theorem.



Theorem 8.52. For a power series $\sum_{n=0}^{\infty} b_n(x-a)^n$, let $R = \lim_{n\to\infty} \left| \frac{b_n}{b_{n+1}} \right|$. Then one of the following three possibilities occurs:

(a) $R$ is a positive real number, and the series is absolutely convergent for $|x-a| < R$ and divergent for $|x-a| > R$.

(b) $R = \infty$ and the series is absolutely convergent for all $x \in \mathbb{R}$.

(c) $R = 0$ and the series is absolutely convergent for $x = a$ and divergent for all $x \neq a$.

In other words, a power series is convergent at only one point, or everywhere, or on an interval $(a-R, a+R)$ centred at $a$. It is not possible for it to be convergent only at several separated points or on several separate intervals. In case (a), when $|x-a| = R$ the series may or may not be convergent. It is even possible for a series to be convergent at $a+R$ but divergent at $a-R$, or vice versa.

Definition 8.53. (Radius of convergence)
The number $R = \lim_{n\to\infty} \left| \frac{b_n}{b_{n+1}} \right|$ is called the radius of convergence of the power series $\sum_{n=0}^{\infty} b_n(x-a)^n$.

Example 8.54. (Convergence of power series) Find all $x \in \mathbb{R}$ for which the series
$$\sum_{n=1}^{\infty} \frac{(-1)^n}{\sqrt{n}}(x+3)^n \tag{8.5}$$
is absolutely convergent, conditionally convergent or divergent.

Solution. Here we have $b_n = \frac{(-1)^n}{\sqrt{n}}$ and $a = -3$. We compute
$$R = \lim_{n\to\infty} \left| \frac{b_n}{b_{n+1}} \right| = \lim_{n\to\infty} \frac{1}{\sqrt{n}} \times \frac{\sqrt{n+1}}{1} = \lim_{n\to\infty} \sqrt{1 + \frac{1}{n}} = 1.$$
Therefore we are in case (a) and so the power series is absolutely convergent for $x \in (-3-1, -3+1) = (-4, -2)$ and is divergent for $x < -4$ and $x > -2$. It remains to check the points $x = -4$ and $x = -2$.
Substituting $x = -4$ in Equation (8.5) gives the series $\sum_{n=1}^{\infty} \frac{(-1)^n}{\sqrt{n}}(-1)^n = \sum_{n=1}^{\infty} \frac{1}{n^{1/2}}$, which is divergent (it is a $p$-series with $p = 1/2 < 1$).
When $x = -2$, the series (8.5) becomes $\sum_{n=1}^{\infty} \frac{(-1)^n}{\sqrt{n}}$, which is convergent by the Alternating Series Test since $\left(\frac{1}{\sqrt{n}}\right)$ is non-increasing with limit 0. However, $\sum_{n=1}^{\infty} \left| \frac{(-1)^n}{\sqrt{n}} \right| = \sum_{n=1}^{\infty} \frac{1}{\sqrt{n}}$ is divergent (as we mentioned above), so $\sum_{n=1}^{\infty} \frac{(-1)^n}{\sqrt{n}}$ is conditionally convergent.
Conclusion: The series (8.5) is

1. absolutely convergent for $-4 < x < -2$;

2. conditionally convergent for $x = -2$;

3. divergent for $x \leq -4$ and $x > -2$.



Power series have the useful property that they can be differentiated term-by-term.

Theorem 8.55 (Term-by-term differentiation of a power series). Assume that the power series $\sum_{n=0}^{\infty} a_n(x-a)^n$ has a radius of convergence $R > 0$ and let $f(x)$ be defined by
$$f(x) = a_0 + a_1(x-a) + a_2(x-a)^2 + \cdots + a_n(x-a)^n + \cdots$$
for $|x-a| < R$. Then $f(x)$ is differentiable (and so continuous) for $|x-a| < R$ and
$$f'(x) = a_1 + 2a_2(x-a) + 3a_3(x-a)^2 + \cdots + n\,a_n(x-a)^{n-1} + \cdots$$
for $|x-a| < R$. Moreover, the radius of convergence of the power series representation for $f'(x)$ is $R$.

8.3.1 Taylor and MacLaurin series

Definition 8.56. (Power series representation)
If for a function $f(x)$ we have
$$f(x) = a_0 + a_1(x-a) + a_2(x-a)^2 + \cdots + a_n(x-a)^n + \cdots$$
for all $x$ in some interval $I$ containing $a$, we say that the above is a power series representation for $f$ about $a$ on $I$. When $a = 0$ this is simply called a power series representation for $f$ on $I$.

For example, the formula for the sum of a geometric series gives
$$\frac{1}{1-x} = 1 + x + x^2 + \cdots + x^n + \cdots \qquad \text{for all } |x| < 1,$$
which provides a power series representation for $f(x) = \frac{1}{1-x}$ on $(-1, 1)$.
Suppose that a function $f(x)$ has a power series representation
$$f(x) = a_0 + a_1(x-a) + a_2(x-a)^2 + \cdots + a_n(x-a)^n + \cdots \tag{8.6}$$
for those $x$ such that $|x-a| < R$ for some positive real number $R$. Substituting $x = a$ in Equation (8.6) implies that $f(a) = a_0$. Next, differentiating (8.6) using Theorem 8.55 implies
$$f'(x) = a_1 + 2a_2(x-a) + 3a_3(x-a)^2 + \cdots + n\,a_n(x-a)^{n-1} + \cdots \tag{8.7}$$
for all $|x-a| < R$. Then substituting $x = a$ in Equation (8.7) gives $f'(a) = a_1$.
Similarly, differentiating (8.7) yields
$$f''(x) = 2a_2 + 6a_3(x-a) + \cdots + n(n-1)\,a_n(x-a)^{n-2} + \cdots$$


for all $|x-a| < R$, and substituting $x = a$ in this equality gives $f''(a) = 2a_2$, that is, $a_2 = \frac{f''(a)}{2!}$.
Continuing in this fashion we must have
$$a_n = \frac{f^{(n)}(a)}{n!} \qquad \text{for each } n.$$

Definition 8.57. (Taylor series)
Assume that $f(x)$ has derivatives of all orders on some interval $I$ containing the point $a$ in its interior. Then the power series
$$\sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!}(x-a)^n = f(a) + f'(a)(x-a) + \frac{f''(a)}{2!}(x-a)^2 + \cdots$$
is called the Taylor series of $f$ about $a$. When $a = 0$ this series is called the MacLaurin series of $f$.

We can find the radius of convergence of a Taylor series by using the formula
$$R = \lim_{n\to\infty} \left| \frac{b_n}{b_{n+1}} \right| = \lim_{n\to\infty} \left| \frac{f^{(n)}(a)}{n!} \times \frac{(n+1)!}{f^{(n+1)}(a)} \right| = \lim_{n\to\infty} (n+1)\left| \frac{f^{(n)}(a)}{f^{(n+1)}(a)} \right|$$
if this limit exists. Within the radius of convergence, we know by Theorem 8.52 that the power series is absolutely convergent. Note however that it is not guaranteed that it converges to $f(x)$; it could converge to something else (though in practice that is very rare). To prove rigorously that the series converges to the function, we need to examine the error term for each partial sum. We denote the $(n+1)$st partial sum of the Taylor series of $f$ at $a$ by $T_{n,a}(x)$:
$$T_{n,a}(x) = f(a) + f'(a)(x-a) + \frac{f''(a)}{2!}(x-a)^2 + \frac{f'''(a)}{3!}(x-a)^3 + \cdots + \frac{f^{(n)}(a)}{n!}(x-a)^n.$$

Theorem 8.58. For any given $x \in I$ we have
$$f(x) = T_{n,a}(x) + R_{n,a}(x)$$
where $R_{n,a}(x)$ is the remainder (or error term) given by
$$R_{n,a}(x) = \frac{f^{(n+1)}(z)}{(n+1)!}(x-a)^{n+1}$$
for some $z$ between $a$ and $x$.

So, if for some $x \in I$ we have $R_{n,a}(x) \to 0$ as $n \to \infty$, then
$$f(x) = \lim_{n\to\infty} T_{n,a}(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!}(x-a)^n.$$
That is, when $R_{n,a}(x) \to 0$ as $n \to \infty$, the Taylor series is convergent at $x$ and its sum is equal to $f(x)$.
To show that $R_{n,a}(x) \to 0$ we determine an upper bound $S_{n,a}(x)$ for $|R_{n,a}(x)|$. If the limit of the upper bound is 0, then $0 \leq |R_{n,a}(x)| \leq S_{n,a}(x) \to 0$, so we can use the Squeeze Theorem to conclude that $R_{n,a}(x) \to 0$.

Example 8.59. (MacLaurin series)

1. Consider the function $f(x) = e^x$. Then $f^{(n)}(x) = e^x$ for all integers $n \geq 1$, so $f^{(n)}(0) = 1$ for all $n$. Thus, the Taylor series for $e^x$ about 0 has the form
$$\sum_{n=0}^{\infty} \frac{x^n}{n!} = 1 + \frac{x}{1!} + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots + \frac{x^n}{n!} + \cdots. \tag{8.8}$$
We first determine the radius of convergence:
$$R = \lim_{n\to\infty} \left| \frac{b_n}{b_{n+1}} \right| = \lim_{n\to\infty} \frac{(n+1)!}{n!} = \lim_{n\to\infty} (n+1) = \infty.$$
Therefore the series is absolutely convergent for all $x$, that is, $I = \mathbb{R}$. We will now show that the sum of this series is equal to $e^x$ for all $x \in \mathbb{R}$.
By Theorem 8.58, we have $f(x) = T_{n,0}(x) + R_{n,0}(x)$ for all $x$, where $R_{n,0}(x) = \frac{f^{(n+1)}(z)}{(n+1)!}x^{n+1} = \frac{e^z}{(n+1)!}x^{n+1}$ for some $z$ between 0 and $x$. We now split the analysis into two cases.
If $x \geq 0$, then $0 \leq z \leq x$ so $e^z \leq e^x$. Therefore
$$0 \leq |R_{n,0}(x)| \leq e^x\,\frac{|x|^{n+1}}{(n+1)!} \to 0$$
as $n \to \infty$ by Theorem 8.21(3) (we consider $x$ as a constant here).
Now if $x < 0$ then $x \leq z < 0$ so $e^z \leq 1$. Therefore
$$0 \leq |R_{n,0}(x)| \leq \frac{|x|^{n+1}}{(n+1)!} \to 0$$
as $n \to \infty$, again by Theorem 8.21(3).
In both cases, by the Squeeze Theorem, $\lim_{n\to\infty} |R_{n,0}(x)| = 0$. Thus the MacLaurin series (8.8) is convergent and its sum is $e^x$ for every $x \in \mathbb{R}$.

2. We easily compute the MacLaurin series of $f(x) = \sin x$ to be
$$\sum_{n=0}^{\infty} (-1)^n \frac{x^{2n+1}}{(2n+1)!} = \frac{x}{1!} - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots + (-1)^n \frac{x^{2n+1}}{(2n+1)!} + \cdots$$
In this case $\lim_{n\to\infty} \left| \frac{b_n}{b_{n+1}} \right|$ does not exist (the sequence alternates between 0 and $\infty$, because every second coefficient is zero), but we can use the Ratio Test directly, with $a_n = (-1)^n \frac{x^{2n+1}}{(2n+1)!}$. Let
$$L = \lim_{n\to\infty} \left| \frac{a_{n+1}}{a_n} \right| = \lim_{n\to\infty} \frac{|x|^{2n+3}}{(2n+3)!} \times \frac{(2n+1)!}{|x|^{2n+1}} = \lim_{n\to\infty} \frac{x^2}{(2n+2)(2n+3)} = 0$$
for all $x$. Thus the series is absolutely convergent for all $x$.
By Theorem 8.58, we have $f(x) = T_{n,0}(x) + R_{n,0}(x)$ for all $x$, where $R_{n,0}(x) = \frac{f^{(n+1)}(z)}{(n+1)!}x^{n+1}$ for some $z$ between 0 and $x$. Now $f^{(n+1)}(z)$ is one of $\sin z$, $\cos z$, $-\sin z$, $-\cos z$, so $|f^{(n+1)}(z)| \leq 1$. Therefore
$$0 \leq |R_{n,0}(x)| \leq \frac{|x|^{n+1}}{(n+1)!} \to 0$$
as $n \to \infty$ by Theorem 8.21(3), and we conclude as in the previous part.

3. Similarly,
$$\cos x = \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n}}{(2n)!} = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \cdots + (-1)^n \frac{x^{2n}}{(2n)!} + \cdots$$
for all $x \in \mathbb{R}$. Here the right-hand side is the Taylor series of $\cos x$ about 0.

4. We easily compute the MacLaurin series of $f(x) = \ln(1+x)$ to be
$$x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots + (-1)^{n-1}\frac{x^n}{n} + \cdots$$
We can compute the radius of convergence:
$$R = \lim_{n\to\infty} \left| \frac{b_n}{b_{n+1}} \right| = \lim_{n\to\infty} \frac{n+1}{n} = \lim_{n\to\infty} \left(1 + \frac{1}{n}\right) = 1.$$
So the series converges absolutely for $x \in (-1, 1)$ and diverges for $x < -1$ and $x > 1$. We now examine the two points $x = \pm 1$. For $x = 1$ the series becomes
$$1 - \frac{1}{2} + \frac{1}{3} - \cdots + (-1)^{n-1}\frac{1}{n} + \cdots,$$
which is conditionally convergent by Example 8.46. For $x = -1$ the series becomes the negative of the harmonic series, which is divergent. We conclude that the series converges for $x \in (-1, 1]$.
In this case it is much harder to prove that the sum of the series is equal to $\ln(1+x)$ (that is, that the error term $R_{n,0}(x) \to 0$ as $n \to \infty$) and we omit that proof here.
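A numerical sketch (not part of the notes) of Example 8.59: the partial sums $T_{n,0}(x)$ of the MacLaurin series for $e^x$ approach $\exp(x)$, and the remainder shrinks factorially.

import math

def exp_taylor_partial_sum(x, n):
    """Return T_{n,0}(x) = sum_{k=0}^{n} x**k / k! for f(x) = exp(x)."""
    return sum(x**k / math.factorial(k) for k in range(n + 1))

x = 2.5
for n in (2, 5, 10, 15):
    t_n = exp_taylor_partial_sum(x, n)
    print(n, round(t_n, 8), abs(math.exp(x) - t_n))   # the error goes to 0 as n grows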
9
Fourier series

Because waves and vibrations tend to have a periodic structure,


that is, they repeat their basic shape in time or space, it is often
convenient to approximate a periodic function by a linear combi-
nation of perfect waves (the sine and cosine functions, which arise
in simple harmonic motion). Decomposing a general periodic func-
tion into a sum of trigonometrical functions is sometimes called
harmonic analysis. This process is often necessary because most
physical waves and vibrations do not have a perfect single sine or
cosine form but rather they are made up of a number of different
harmonics of the underlying system.
In order to develop the ideas required we need to formalise the
process. For convenience we shall assume for now that the underly-
ing periodicity (whether it be in space or time) is of length 2π. This
eases the subsequent manipulation somewhat but turns out not to
be unduly restrictive; if in practice our function has a periodicity of
length different to 2π it is relatively easy to scale our results so as
to account for this change. However, it is worth quickly reminding
ourselves of the definition of the period of a function.

Definition 9.1. (Period of a function)


We say that a function f from R to R has period P if f (t + P) = f (t)
for all t in R. In this case, we also say that f is P-periodic.

Graphically it is generally not difficult to identify a periodic


function for its sketch takes the form of a curve that clearly repeats
itself after an interval of length P. The rather peculiar function in
Figure 9.1 possesses discontinuities at t = (2n + 1)π for n ∈ Z but
nevertheless has a period 2π.
Recall that both sin(nt) and cos(nt) are 2π-periodic functions for
positive integer values of n. Note that the smallest period of sin(nt)
and cos(nt) is actually 2π/n, n ≥ 1.
We will define an infinite series using the functions sin(nt) and
cos(nt) and a constant term, which will hopefully converge to our
2π-periodic function. In other words, our hope is to approximate
[Figure 9.1: A periodic function with period 2π.]

[Figure 9.2: Graphs of cos(nt) and sin(nt) for n = 0, 1, 2, 3.]

$f(t)$ in the form
$$S_N f(t) = \frac{a_0}{2} + a_1\cos t + b_1\sin t + a_2\cos(2t) + b_2\sin(2t) + \cdots + a_N\cos(Nt) + b_N\sin(Nt)$$
$$= \frac{a_0}{2} + \sum_{n=1}^{N} \left( a_n\cos(nt) + b_n\sin(nt) \right).$$
If our approximation is well-behaved we should expect that it improves as $N$ goes to infinity.
(There is no need to consider $n$ negative because $\sin(-nt) = -\sin(nt)$ and $\cos(-nt) = \cos(nt)$.)

Definition 9.2. (Fourier series)
The infinite series
$$\mathrm{FS}\,f(t) = \lim_{N\to\infty} S_N f(t) = \frac{a_0}{2} + \sum_{n=1}^{\infty} \left( a_n\cos(nt) + b_n\sin(nt) \right) \tag{9.1}$$
is called the Fourier series expansion of $f$, where the constants $a_0$, $a_n$, $b_n$ ($n = 1, 2, 3, \ldots$) are known as the Fourier coefficients. The various $S_N f(t)$ functions are the partial sums of the Fourier series expansion.

We will explain in the next section how to compute the Fourier


coefficients. The following example illustrates how the approxima-
tions get better when N increases.

Example 9.3. Consider the piecewise 2π-periodic function
$$f(t) = \begin{cases} 1, & -\pi < t \leq 0, \\ -1, & 0 < t \leq \pi, \end{cases} \qquad\text{and}\qquad f(t + 2\pi) = f(t).$$

[Figure 9.3: Graph of f for Example 9.3.]

We will see in Example 9.13 that
$$S_N f(t) = -\frac{4}{\pi} \sum_{\substack{n=1 \\ n \text{ odd}}}^{N} \frac{\sin(nt)}{n}.$$
Figure 9.4 illustrates the Fourier sums for different values of $N$.

[Figure 9.4: Graph of S_N f for N = 11 (left), N = 21 (middle) and N = 101 (right), on [−π, π].]

Notice how the approximation improves as we increase $N$ (so we are retaining more terms in the partial sum $S_N f$). We know that the true function is equal to $-1$ for $0 < t \leq \pi$, and the approximation $S_N f$ oscillates about this value. When $N = 11$ this oscillation is quite noticeable and relatively large; by the time $N = 101$ it is far less pronounced. Notice also how the approximation $S_N f$ is relatively good away from discontinuities in the function $f(t)$ but poorer as these points are approached. As an example of this, look at the $N = 11$ result; it is clear that oscillations in the approximating function are small around $t = \pi/2$ but increase as either $t \to 0$ or $t \to \pi$. This well-known behaviour is called the Gibbs phenomenon, and it tends to occur when the function $f(t)$ has points of discontinuity.
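The partial sums in Example 9.3 are easy to evaluate numerically; the sketch below (not part of the notes) evaluates $S_N f(t)$ at a point away from the jump and at a point close to it, illustrating the behaviour just described.

import math

def S_N(t, N):
    """Partial sum of the Fourier series of the square wave in Example 9.3."""
    return -4 / math.pi * sum(math.sin(n * t) / n for n in range(1, N + 1, 2))

for N in (11, 21, 101):
    print(N, round(S_N(math.pi / 2, N), 4),   # converges quickly to -1 away from the jump
             round(S_N(0.05, N), 4))          # convergence is slower and oscillatory near the jump at t = 0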

9.1 Calculation of the Fourier coefficients

Recall that the Fourier series representation of $f(t)$ is
$$\mathrm{FS}\,f(t) = \frac{a_0}{2} + a_1\cos t + b_1\sin t + a_2\cos(2t) + b_2\sin(2t) + \cdots \tag{9.1}$$
(At this juncture it may seem a little strange to define the constant term as $\frac{a_0}{2}$ rather than simply $a_0$; we will see why this is done presently.)
We need a method of determining the values of the coefficients $a_0$, $a_n$, $b_n$, for $n = 1, 2, 3, \ldots$, so that $\mathrm{FS}\,f(t)$ converges (if possible) to $f(t)$. To find $a_0$ we simply integrate both sides from $-\pi$ to $\pi$ to get
$$\int_{-\pi}^{\pi} \mathrm{FS}\,f(t)\,dt = \int_{-\pi}^{\pi} \frac{a_0}{2}\,dt = \pi a_0$$
because
$$\int_{-\pi}^{\pi} \cos(nt)\,dt = 0 \qquad\text{and}\qquad \int_{-\pi}^{\pi} \sin(nt)\,dt = 0.$$
So we set
$$a_0 = \frac{1}{\pi}\int_{-\pi}^{\pi} f(t)\,dt. \tag{9.1a}$$
This says that $a_0$ is twice the average value of $f(t)$, or equivalently, the zeroth-order term $a_0/2$ is the average value of the function $f(t)$.
We can show by direct evaluation of the integrals that
$$\int_{-\pi}^{\pi} \sin(mt)\sin(nt)\,dt = 0 \quad\text{for any integers } m, n \text{ with } m \neq n,$$
$$\int_{-\pi}^{\pi} \sin(mt)\cos(nt)\,dt = 0 \quad\text{for any integers } m, n,$$
$$\int_{-\pi}^{\pi} \cos(mt)\cos(nt)\,dt = 0 \quad\text{for any integers } m, n \text{ with } m \neq n,$$
$$\int_{-\pi}^{\pi} \sin^2(nt)\,dt = \pi, \qquad \int_{-\pi}^{\pi} \cos^2(nt)\,dt = \pi \quad\text{for any integer } n \geq 1.$$
It is an exercise to verify these statements, using the trigonometric formulae in the Appendix, or using integration by parts (twice).
This allows us to calculate the other Fourier coefficients. To obtain $a_n$ we multiply Equation (9.1) by $\cos(nt)$ and integrate from $-\pi$ to $\pi$:
$$\int_{-\pi}^{\pi} \mathrm{FS}\,f(t)\cos(nt)\,dt = \pi a_n,$$
so we set
$$a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(t)\cos(nt)\,dt. \tag{9.1b}$$
Note that only one term on the right-hand side survives the integration. To obtain $b_n$ we multiply Equation (9.1) by $\sin(nt)$ and integrate from $-\pi$ to $\pi$:
$$\int_{-\pi}^{\pi} \mathrm{FS}\,f(t)\sin(nt)\,dt = \pi b_n,$$
so we set
$$b_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(t)\sin(nt)\,dt. \tag{9.1c}$$
Again, only one term on the right-hand side survives. The above process is called expanding the function $f$ as an infinite sum of orthogonal functions. (We can think of functions as vectors, and define a dot product $f_1 \cdot f_2 = \int_{-\pi}^{\pi} f_1(t)f_2(t)\,dt$, so that the functions in the Fourier series are mutually orthogonal, i.e. have dot product equal to 0.)
Notice we need to assume here that $f$ is sufficiently regular for all these integrals to be defined; for instance it is sufficient for $f$ to be piecewise continuous on $[-\pi, \pi]$. (A function $f(x)$ is called piecewise continuous on a given interval $[a, b]$ if $f$ has only finitely many points of discontinuity in $[a, b]$.)

The expressions (9.1a, b, c) are called Euler's formulae:
$$a_0 = \frac{1}{\pi}\int_{-\pi}^{\pi} f(t)\,dt,$$
$$a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(t)\cos(nt)\,dt,$$
$$b_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(t)\sin(nt)\,dt.$$

It is at this point that we can appreciate the reason we defined the constant term in the Fourier series to be $\frac{a_0}{2}$ and not simply $a_0$: if we put $n = 0$ in Euler's formula for $a_n$, the result collapses to the expression defining $a_0$. Had the factor $\frac{1}{2}$ not been inserted in the definition of the Fourier series, then the universal formula that defines $a_n$ for all values of $n$ would not apply.

Example 9.4. Define the 2π-periodic function
$$f(t) = \begin{cases} 0, & -\pi < t \leq 0, \\ \pi - t, & 0 < t \leq \pi, \end{cases} \qquad\text{and}\qquad f(t + 2\pi) = f(t).$$
What is its Fourier series?

Solution: A simple calculation yields
$$a_0 = \frac{1}{\pi}\int_{-\pi}^{0} 0\,dt + \frac{1}{\pi}\int_{0}^{\pi} (\pi - t)\,dt = \frac{1}{\pi}\int_{0}^{\pi} (\pi - t)\,dt = \frac{\pi}{2}.$$
[Figure 9.5: Graph of f for Example 9.4.]

A more complicated calculation yields, for $n > 0$,
$$a_n = \frac{1}{\pi}\int_{0}^{\pi} (\pi - t)\cos(nt)\,dt = \frac{1 - \cos(n\pi)}{\pi n^2}.$$
(Useful anti-derivative formulas, derived via integration by parts:
$$\int t\cos(nt)\,dt = \frac{t}{n}\sin(nt) + \frac{1}{n^2}\cos(nt) + C \quad\text{and}\quad \int t\sin(nt)\,dt = -\frac{t}{n}\cos(nt) + \frac{1}{n^2}\sin(nt) + C.)$$
Hence we can evaluate $a_n$ as
$$a_n = \begin{cases} 0, & n > 0 \text{ even}, \\ \dfrac{2}{\pi n^2}, & n \text{ odd}, \end{cases}$$
or, for $k = 1, 2, \ldots$,
$$a_{2k-1} = \frac{2}{\pi(2k-1)^2}, \qquad a_{2k} = 0.$$
A similarly complicated calculation yields
$$b_n = \frac{1}{\pi}\int_{0}^{\pi} (\pi - t)\sin(nt)\,dt = \frac{1}{n}.$$
Hence the Fourier series of the above function is
$$\mathrm{FS}\,f(t) = \frac{\pi}{4} + \frac{2}{\pi}\sum_{\substack{n=1 \\ n \text{ odd}}}^{\infty} \frac{\cos(nt)}{n^2} + \sum_{n=1}^{\infty} \frac{\sin(nt)}{n}$$
$$= \frac{\pi}{4} + \frac{2}{\pi}\left( \cos t + \frac{\cos(3t)}{9} + \frac{\cos(5t)}{25} + \cdots \right) + \left( \sin t + \frac{\sin(2t)}{2} + \frac{\sin(3t)}{3} + \cdots \right).$$

9.2 Functions of an arbitrary period

The above analysis extends to functions of arbitrary period $2L$ rather than the special value $2\pi$ used above. In this case it turns out that
$$\mathrm{FS}\,f(t) = \frac{a_0}{2} + \sum_{n=1}^{\infty} a_n\cos\!\left(\frac{n\pi t}{L}\right) + \sum_{n=1}^{\infty} b_n\sin\!\left(\frac{n\pi t}{L}\right)$$
where
$$a_0 = \frac{1}{L}\int_{-L}^{L} f(t)\,dt,$$
$$a_n = \frac{1}{L}\int_{-L}^{L} f(t)\cos\!\left(\frac{n\pi t}{L}\right)dt,$$
$$b_n = \frac{1}{L}\int_{-L}^{L} f(t)\sin\!\left(\frac{n\pi t}{L}\right)dt.$$
Notice that these results revert to the expressions (9.1a, b, c) when $L = \pi$, as indeed they should.

9.3 Convergence of Fourier series

It is of importance to consider the convergence properties of Fourier series, in other words what happens to the partial sums $S_N f$ as $N \to \infty$. We would hope that in this limit the partial sums would approach $f(t)$, so that we could approximate the true value of $f(t)$ at any given point to a given accuracy just by taking enough terms in the requisite partial sum. Unfortunately, this cannot be guaranteed. Rather, instead of being able to prove that the partial sums converge to $f(t)$ at every point, it is only possible to be assured that the integral of the squared difference of the partial sums and the function goes to zero.

Theorem 9.5. Assume $\displaystyle\int_{-L}^{L} f(t)^2\,dt < \infty$. Then
$$\lim_{N\to\infty} \int_{-L}^{L} \left( S_N f(t) - f(t) \right)^2 dt = 0.$$
In other words, while we cannot be sure of the behaviour of the partial sums at any one single given point, we do know that the integral of the square of the difference does approach zero as $N \to \infty$. The consequence of this is that while $\mathrm{FS}\,f(t) = f(t)$ at almost all points, there could be a finite number of locations in the interval $[-L, L]$ where $\mathrm{FS}\,f(t) \neq f(t)$.
An additional issue arises for functions which possess a discontinuity. The function in Example 9.3 has a jump at $t = 0$; for $-\pi < t \leq 0$ we have $f(t) = 1$ but for $0 < t \leq \pi$ we have $f(t) = -1$. What does the Fourier series converge to at $t = 0$? This issue is settled by the following theorem.

Theorem 9.6. Provided that $f(t)$ and $f'(t)$ are bounded and piecewise continuous on $[-L, L]$, the Fourier series will converge to (be equal to) $f(t)$ except at points of discontinuity, where it will converge to the average of the right- and left-hand limits of $f(t)$ at that point, i.e.
$$\frac{f(t^+) + f(t^-)}{2}$$
where $f(t^+)$ is the right-hand limit and $f(t^-)$ is the left-hand limit.

Example 9.7. (Example 9.4 revisited) We see that $f(t)$ and $f'(t)$ are bounded and piecewise continuous on $[-\pi, \pi]$ (with the only discontinuity point being 0; note that $f'(t)$ considered on its full domain also has discontinuity points at the odd multiples of $\pi$). Since for this function we have $f(0^-) = 0$ and $f(0^+) = \pi$, by Theorem 9.6 the Fourier series $\mathrm{FS}\,f(t)$ converges to the average of these values, i.e. $\pi/2$, at $t = \ldots, -2\pi, 0, 2\pi, \ldots$. The graph of the Fourier series is then as shown in Figure 9.6.

[Figure 9.6: Fourier series function of Example 9.4.]

Note that it is identical to the graph of $f(t)$ except that it takes the value $\frac{\pi}{2}$ at integer multiples of $2\pi$, whereas the function itself is 0 at these points.

9.4 Functions defined over a finite interval

Writing a function in terms of a Fourier series is convenient for


many calculations. Up to now we have only discussed strictly peri-
odic functions but the ideas of Fourier series can be extended and
applied to many functions that are defined on a finite interval but
which appear to have no intrinsic periodic properties.

Suppose we have f (t) defined on some finite interval of length


2L given by − L < t ≤ L. (It might seem restrictive to assume
that the interval is centred on t = 0. However if the interval is not
centred on the origin it is straightforward to apply a translation and
consider the function in terms of a new co-ordinate t0 for which the
centre is at t0 = 0.)
We can now extend f (t) to all real values of t by defining the
periodic extension of f (t).

Definition 9.8. (Periodic extension)
Let $f(t)$ be a function defined on the interval $(-L, L]$. The periodic extension of $f(t)$ is the function $\varphi(t)$ defined by
$$\varphi(t) = f(t), \quad -L < t \leq L, \qquad\text{and}\qquad \varphi(t + 2L) = \varphi(t) \ \text{ for all } t.$$
Now $\varphi(t)$ is defined for all values of $t$ and is naturally a periodic function of period $2L$. Hence we are able to apply the theory of Fourier series to $\varphi(t)$.

Example 9.9. The graph of the Fourier series of the periodic extension of $e^t$, $-1 < t \leq 1$, is illustrated in Figure 9.7.

[Figure 9.7: Fourier series of the periodic extension of $e^t$.]

Note that the periodic extension is discontinuous at $t = \ldots, -5, -3, -1, 1, 3, 5, \ldots$.
9.5 Even and odd functions

Definition 9.10. (Even functions)
A function $f(t)$ is even if and only if $f(-t) = f(t)$ for all $t$. The graph of an even function is symmetrical about the vertical axis.

Simple examples of even functions include
$$f(t) = 1, \qquad f(t) = t^2, \qquad f(t) = \cos t.$$
In particular all the even power functions $f(t) = t^{2n}$ are even.

Definition 9.11. (Odd functions)
A function $f(t)$ is odd if and only if $f(-t) = -f(t)$ for all $t$. The graph of an odd function is $180^\circ$ rotationally symmetric around the origin.

Elementary examples of odd functions include
$$f(t) = t, \qquad f(t) = t^3, \qquad f(t) = \sin t,$$
together with the function in Example 9.3. In particular all the odd power functions $f(t) = t^{2n+1}$ are odd.

[Figure 9.8: Graphs of the even functions cos t and t², and the odd functions sin t and t³.]
Properties of odd and even functions (try proving them):
$$(\text{even}) + (\text{even}) = (\text{even}), \qquad (\text{odd}) + (\text{odd}) = (\text{odd}),$$
$$(\text{even}) \cdot (\text{even}) = (\text{even}), \qquad (\text{odd}) \cdot (\text{odd}) = (\text{even}),$$
$$(\text{odd}) \cdot (\text{even}) = (\text{odd}), \qquad (\text{even}) \cdot (\text{odd}) = (\text{odd}),$$
$$\int_{-L}^{L} (\text{odd})\,dt = 0,$$
$$\text{if } f(t) \text{ is even:} \quad \int_{-L}^{L} f(t)\,dt = 2\int_{0}^{L} f(t)\,dt.$$
In words, this tells us that the sum of two even (odd) functions is itself even (odd). The product of two even or two odd functions is even, while the product of an odd and an even function is odd. The integral results are particularly important, for they facilitate some great simplifications in the calculation of Fourier series.

9.6 Fourier cosine series for even functions

Even functions must have even Fourier series and hence $b_n = 0$ for all $n$, giving a Fourier cosine series. There is nothing particularly special about a Fourier cosine series; really it is little more than a standard Fourier series with the property that all its sine terms are absent because the coefficients $b_n$ all happen to vanish. We can use the integration properties of odd and even functions to verify that $b_n = 0$ when $f(t)$ is an even function. From its definition,
$$b_n = \frac{1}{L}\int_{-L}^{L} f(t)\sin\!\left(\frac{n\pi t}{L}\right)dt = \frac{1}{L}\int_{-L}^{L} (\text{even})\cdot(\text{odd})\,dt = \frac{1}{L}\int_{-L}^{L} (\text{odd})\,dt = 0.$$

The Fourier series of an even function $f(t)$ is the cosine series
$$\frac{a_0}{2} + \sum_{n=1}^{\infty} a_n\cos\!\left(\frac{n\pi t}{L}\right),$$
where
$$a_0 = \frac{2}{L}\int_{0}^{L} f(t)\,dt \qquad\text{and}\qquad a_n = \frac{2}{L}\int_{0}^{L} f(t)\cos\!\left(\frac{n\pi t}{L}\right)dt.$$

Example 9.12. Determine the Fourier series of the even ('Hats') function
$$f(t) = \begin{cases} \pi + t, & -\pi < t \leq 0, \\ \pi - t, & 0 < t \leq \pi, \end{cases} \qquad\text{and}\qquad f(t + 2\pi) = f(t).$$
(The name 'Hats' derives from the form of its graph, sketched in Figure 9.9.)

[Figure 9.9: Fourier cosine series for Example 9.12.]

Solution: Since $f$ is even, its Fourier series is a cosine series. We can compute
$$a_0 = \frac{2}{\pi}\int_{0}^{\pi} (\pi - t)\,dt = \pi$$
and
$$a_n = \frac{2}{\pi}\int_{0}^{\pi} (\pi - t)\cos(nt)\,dt = \frac{2(1 - (-1)^n)}{n^2\pi} = \begin{cases} 0, & n \text{ even}, \\ \dfrac{4}{n^2\pi}, & n \text{ odd}, \end{cases}$$
and hence the Fourier cosine series is
$$\mathrm{FS}\,f(t) = \frac{\pi}{2} + \frac{4}{\pi}\sum_{\substack{n=1 \\ n \text{ odd}}}^{\infty} \frac{\cos(nt)}{n^2} = \frac{\pi}{2} + \frac{4}{\pi}\left( \cos t + \frac{\cos(3t)}{9} + \frac{\cos(5t)}{25} + \cdots \right).$$

9.7 Fourier sine series for odd functions

Odd functions must have odd Fourier series and hence $a_n = 0$ for all $n$, leading to a Fourier sine series. Again it is relatively straightforward to check that the $a_n = 0$ because, from the definition,
$$a_n = \frac{1}{L}\int_{-L}^{L} f(t)\cos\!\left(\frac{n\pi t}{L}\right)dt = \frac{1}{L}\int_{-L}^{L} (\text{odd})\cdot(\text{even})\,dt = \frac{1}{L}\int_{-L}^{L} (\text{odd})\,dt = 0.$$

The Fourier series of an odd function $f(t)$ is the sine series
$$\sum_{n=1}^{\infty} b_n\sin\!\left(\frac{n\pi t}{L}\right),$$
where
$$b_n = \frac{2}{L}\int_{0}^{L} f(t)\sin\!\left(\frac{n\pi t}{L}\right)dt.$$

Example 9.13. Determine the Fourier series of the function in Example 9.3.

Solution: Since $f(t)$ is an odd function, its Fourier series is a sine series. We compute
$$b_n = \frac{2}{\pi}\int_{0}^{\pi} (-1)\sin(nt)\,dt = \frac{2(\cos(n\pi) - 1)}{n\pi} = \begin{cases} 0, & n \text{ even}, \\ -\dfrac{4}{n\pi}, & n \text{ odd}, \end{cases}$$
and hence the Fourier sine series is
$$\mathrm{FS}\,f(t) = -\frac{4}{\pi}\sum_{\substack{n=1 \\ n \text{ odd}}}^{\infty} \frac{\sin(nt)}{n}.$$
By Theorem 9.6 we know this converges to the function shown in Figure 9.10.

[Figure 9.10: Fourier sine series for Example 9.13.]

9.8 Half-range expansions

Suppose that a function $f(t)$ is only defined on $[0, L]$. We could use the ideas described above to create a periodic function and hence derive a Fourier series representation. However, with the function defined on $[0, L]$ it is possible to extend the function in such a way that the resulting series contains only cosine terms or, if the extension is made in another way, such that the series contains only sine terms.
To see how to do this, we extend the domain of definition to $[-L, L]$, called a half-range expansion, in two ways. We accomplish this by defining two new functions $g(t)$ and $h(t)$ according to the following recipes.
Even expansion:
$$g(t) = \begin{cases} f(t) & \text{if } 0 \leq t \leq L, \\ f(-t) & \text{if } -L \leq t \leq 0. \end{cases}$$
Now $g(t)$ is an even function by construction; therefore the series for $g(t)$ (or more precisely for its periodic extension) will be a Fourier cosine series.
Odd expansion:
$$h(t) = \begin{cases} f(t) & \text{if } 0 < t \leq L, \\ 0 & \text{if } t = 0, \\ -f(-t) & \text{if } -L \leq t < 0. \end{cases}$$
This time our function is an odd one so will be given by a Fourier sine series.
As an example, look at the sketches in Figure 9.11. Here a function $f(t)$ is defined for $0 < t < 2$ (left panel). In the centre is shown the even expansion of $f(t)$, that is the function $g(t)$ given above. This function is now defined on $-2 < t < 2$ and is clearly even (its graph is symmetric about the vertical axis). On the other hand, the right diagram illustrates the odd expansion $h(t)$. This time the graph possesses the characteristic $180^\circ$ rotational symmetry about the origin indicative of an odd function. The two extended functions $g(t)$ and $h(t)$ clearly must be given by Fourier cosine and Fourier sine series respectively.

[Figure 9.11: Original function and its even and odd expansions.]

Notice that these two (different) series converge to the same $f(t)$ for $0 < t < 2$, as both $g(t)$ and $h(t)$ equal $f(t)$ here, but will naturally converge to different values for $t < 0$.

Example 9.14. Find the Fourier series of the even and odd expansions of $f(t) = t^2$, $0 \leq t \leq 1$.

Solution:
Even expansion: The even expansion is just $g(t) = t^2$, $-1 \leq t \leq 1$. We find the Fourier coefficients (as $g(t)$ is even, $b_n = 0$):
$$a_0 = 2\int_{0}^{1} t^2\,dt = \frac{2}{3}, \qquad a_n = 2\int_{0}^{1} t^2\cos(n\pi t)\,dt = \frac{4(-1)^n}{\pi^2 n^2}.$$
Then the Fourier cosine series is
$$\frac{1}{3} + \frac{4}{\pi^2}\sum_{n=1}^{\infty} \frac{(-1)^n}{n^2}\cos(n\pi t).$$
This series converges to $g(t)$ on $[-1, 1]$, so in particular it converges to $f(t)$ for $0 \leq t \leq 1$.

Odd expansion: The odd expansion is
$$h(t) = \begin{cases} t^2 & \text{if } 0 \leq t \leq 1, \\ -t^2 & \text{if } -1 \leq t < 0. \end{cases}$$
We find the Fourier coefficients (as $h(t)$ is odd, $a_0 = a_n = 0$):
$$b_n = 2\int_{0}^{1} t^2\sin(n\pi t)\,dt = \frac{-4 - 2(n^2\pi^2 - 2)(-1)^n}{\pi^3 n^3}.$$
Hence the Fourier sine series is
$$-\frac{2}{\pi^3}\sum_{n=1}^{\infty} \frac{2 + (n^2\pi^2 - 2)(-1)^n}{n^3}\sin(n\pi t).$$
This series converges to $h(t)$ on $(-1, 1)$, so in particular it converges to $f(t)$ for $0 \leq t < 1$.
Notice these two series look very different, but they converge to the same value $f(t) = t^2$ for $0 \leq t < 1$!
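As a numerical check (a sketch, not part of the notes), truncating both half-range series of Example 9.14 at a modest number of terms gives values close to $t^2$ at an interior point.

import math

def cosine_series(t, N):
    """Partial sum of the half-range cosine series of t**2 on [0, 1]."""
    return 1/3 + 4 / math.pi**2 * sum(
        (-1)**n / n**2 * math.cos(n * math.pi * t) for n in range(1, N + 1))

def sine_series(t, N):
    """Partial sum of the half-range sine series of t**2 on [0, 1]."""
    return -2 / math.pi**3 * sum(
        (2 + (n**2 * math.pi**2 - 2) * (-1)**n) / n**3 * math.sin(n * math.pi * t)
        for n in range(1, N + 1))

t = 0.4
print(t**2, cosine_series(t, 50), sine_series(t, 50))   # all three values are close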

9.9 Parseval’s theorem (not for assessment)

There is a relationship between the sum of the squares of all of


the Fourier coefficients of a function and the integral of the square
of the function itself over one period. This relationship turns out
to be very useful in Engineering, Physics and other branches of
Mathematics.

Theorem 9.15 (Parseval's theorem). If a 2π-periodic, bounded function $f(t)$, piecewise continuous on $[-\pi, \pi]$, has a Fourier series given by
$$\mathrm{FS}\,f(t) = \frac{a_0}{2} + \sum_{n=1}^{\infty} \left( a_n\cos(nt) + b_n\sin(nt) \right),$$
then
$$\frac{1}{\pi}\int_{-\pi}^{\pi} [f(t)]^2\,dt = \frac{a_0^2}{2} + \sum_{n=1}^{\infty} \left( a_n^2 + b_n^2 \right).$$

The proof of this result is omitted here. What is more important


is to see how the theorem enables us to derive results concerning
the sums of infinite series of terms. We do this via a few examples.

Example 9.16. We shall apply Parseval's theorem to Example 9.13. Recall the function
$$f(t) = \begin{cases} 1, & -\pi < t \leq 0, \\ -1, & 0 < t \leq \pi, \end{cases} \qquad\text{and}\qquad f(t + 2\pi) = f(t).$$
We found that its Fourier series is
$$\mathrm{FS}\,f(t) = \sum_{\substack{n=1 \\ n \text{ odd}}}^{\infty} \left( -\frac{4}{n\pi} \right)\sin(nt).$$
Parseval's theorem says that
$$\frac{1}{\pi}\int_{-\pi}^{0} 1^2\,dt + \frac{1}{\pi}\int_{0}^{\pi} (-1)^2\,dt = \sum_{\substack{n=1 \\ n \text{ odd}}}^{\infty} \left( -\frac{4}{n\pi} \right)^2$$
$$\Rightarrow\quad 1 + 1 = \frac{16}{\pi^2}\sum_{\substack{n=1 \\ n \text{ odd}}}^{\infty} \frac{1}{n^2} \quad\Rightarrow\quad \sum_{\substack{n=1 \\ n \text{ odd}}}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{8}.$$

We can use the result of the previous example to find the value of $\sum_{n=1}^{\infty} \frac{1}{n^2}$ by noting that
$$\sum_{n=1}^{\infty} \frac{1}{n^2} = \sum_{\substack{n=1 \\ n \text{ odd}}}^{\infty} \frac{1}{n^2} + \sum_{\substack{n=1 \\ n \text{ even}}}^{\infty} \frac{1}{n^2}$$
and realising that
$$\sum_{\substack{n=1 \\ n \text{ even}}}^{\infty} \frac{1}{n^2} = \sum_{k=1}^{\infty} \frac{1}{(2k)^2} = \frac{1}{4}\sum_{k=1}^{\infty} \frac{1}{k^2} = \frac{1}{4}\sum_{n=1}^{\infty} \frac{1}{n^2}.$$
Making use of the result in the previous example gives
$$\sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{8} + \frac{1}{4}\sum_{n=1}^{\infty} \frac{1}{n^2} \quad\Rightarrow\quad \sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6}.$$
Leonhard Euler first proved this equality in 1741 (by an entirely different method, though).
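A small numerical confirmation (a sketch, not part of the notes) of the two sums just obtained:

import math

odd_sum = sum(1 / n**2 for n in range(1, 100001, 2))
all_sum = sum(1 / n**2 for n in range(1, 100001))
print(odd_sum, math.pi**2 / 8)    # these agree to several decimal places
print(all_sum, math.pi**2 / 6)    # likewise; the tail of the series is tiny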
Example 9.17. We shall apply Parseval's theorem to Example 9.4. Recall that the function is
$$f(t) = \begin{cases} 0, & -\pi < t \leq 0, \\ \pi - t, & 0 < t \leq \pi, \end{cases} \qquad\text{and}\qquad f(t + 2\pi) = f(t),$$
with Fourier series
$$\mathrm{FS}\,f(t) = \frac{1}{2}\cdot\frac{\pi}{2} + \sum_{\substack{n=1 \\ n \text{ odd}}}^{\infty} \frac{2}{\pi n^2}\cos(nt) + \sum_{n=1}^{\infty} \frac{1}{n}\sin(nt).$$
Then Parseval's theorem tells us that
$$\frac{1}{\pi}\int_{0}^{\pi} (\pi - t)^2\,dt = \frac{\pi^2}{8} + \sum_{\substack{n=1 \\ n \text{ odd}}}^{\infty} \left(\frac{2}{\pi n^2}\right)^2 + \sum_{n=1}^{\infty} \left(\frac{1}{n}\right)^2$$
$$\Rightarrow\quad \frac{\pi^2}{3} = \frac{\pi^2}{8} + \frac{4}{\pi^2}\sum_{\substack{n=1 \\ n \text{ odd}}}^{\infty} \frac{1}{n^4} + \sum_{n=1}^{\infty} \frac{1}{n^2}$$
$$\Rightarrow\quad \frac{5\pi^2}{24} = \frac{4}{\pi^2}\sum_{\substack{n=1 \\ n \text{ odd}}}^{\infty} \frac{1}{n^4} + \frac{\pi^2}{6} \quad\Rightarrow\quad \sum_{\substack{n=1 \\ n \text{ odd}}}^{\infty} \frac{1}{n^4} = \frac{\pi^4}{96}.$$

An application of Parseval's theorem to a suitably chosen function can often yield results for other infinite sums that are difficult to evaluate by other means.

Exercise 9.9.1. Use $\displaystyle\sum_{\substack{n=1 \\ n \text{ odd}}}^{\infty} \frac{1}{n^4} = \frac{\pi^4}{96}$ to determine the sum of the $p$-series with $p = 4$.

9.10 Differentiation of Fourier series

If we wish to differentiate a function expressed as a Fourier series


it is tempting to simply differentiate each term in the infinite series
one by one. Very often in mathematics we need to be ultra-careful
when dealing with infinite series because results that look as if they
ought to be reasonable and sensible are not always true! Therefore
it is not an obvious result that the differentiation of a Fourier se-
ries of a function is possible term by term but, fortunately, it can be
proved to yield something that is useful. In particular, for a 2π-
periodic function, if
$$\mathrm{FS}\,f(t) = \frac{a_0}{2} + \sum_{n=1}^{\infty} a_n\cos(nt) + \sum_{n=1}^{\infty} b_n\sin(nt),$$
then
$$(\mathrm{FS}\,f)'(t) = -\sum_{n=1}^{\infty} n\,a_n\sin(nt) + \sum_{n=1}^{\infty} n\,b_n\cos(nt).$$
Moreover, the following theorem tells us what $f'(t)$ is, if $f$ is continuous.

Theorem 9.18. If $f$ is continuous, then $(\mathrm{FS}\,f)'(t) = f'(t)$ at points $t$ where $f(t)$ is differentiable.

In particular, note the requirement that the function f (t) be con-


tinuous; if it is not continuous then the result does not necessarily
follow.

Example 9.19. Recall the 'Hats' function in Example 9.12; this function is continuous. It is differentiable except at integer multiples of $\pi$. Previously we showed that
$$\mathrm{FS}\,f(t) = \frac{\pi}{2} + \frac{4}{\pi}\sum_{\substack{n=1 \\ n \text{ odd}}}^{\infty} \frac{\cos(nt)}{n^2}.$$
By Theorem 9.18, we have
$$f'(t) = (\mathrm{FS}\,f)'(t) = -\frac{4}{\pi}\sum_{\substack{n=1 \\ n \text{ odd}}}^{\infty} \frac{\sin(nt)}{n},$$
for all $t$ not a multiple of $\pi$.

[Figure 9.12: Derivative of the Fourier series of the Hats function.]

The graph of the original function (and of its Fourier series) appeared in Example 9.12. We recognise from Example 9.13 that $(\mathrm{FS}\,f)'(t)$ is the Fourier series of the function from Example 9.3, so the graph of $(\mathrm{FS}\,f)'(t)$ is shown in Figure 9.12. Note that it has the value 0 at multiples of $\pi$, the average of the left and right limits, but that $f'(t)$ is not defined at multiples of $\pi$.
Recall that a function cannot be differentiated at points of discontinuity. There are also problems with the convergence of the differentiated series if $f(t)$ is not continuous, as illustrated by the following example.

Example 9.20. Let $f(t)$ be the 'Slopes' function:
$$f(t) = \frac{t}{2}, \quad -\pi < t \leq \pi, \qquad f(t + 2\pi) = f(t).$$
The graph of its Fourier series is shown in Figure 9.13. Notice it is discontinuous at odd multiples of $\pi$.

[Figure 9.13: Fourier series of the Slopes function (Example 9.20).]

Since $f$ is an odd function, it has a sine Fourier series and we can show (exercise) that
$$\mathrm{FS}\,f(t) = \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n}\sin(nt).$$
If we naively find the derivative of the Fourier series term-by-term we deduce that
$$(\mathrm{FS}\,f)'(t) = \sum_{n=1}^{\infty} (-1)^{n+1}\cos(nt) = \cos t - \cos(2t) + \cos(3t) - \cdots. \tag{9.2}$$
We have an obvious problem here. Recall from Theorem 8.28 that if an infinite series is to converge then necessarily the $n$th term in the series must go to zero as $n \to \infty$. For example, if we try to evaluate Equation (9.2) at $t = 0$ we have
$$1 - 1 + 1 - 1 + \cdots$$
and this series clearly does not converge. Moreover, note that $t = 0$ is not a problem point for $f$, and $f'(0) = 1/2$, so it is not the case that the differentiated series only fails at points where $f(t)$ is not differentiable or has some other problem. If we evaluate Equation (9.2) at $t = \pi$ we have $-1 - 1 - 1 - \cdots$, which is also clearly nonsense. The actual derivative function is shown in Figure 9.14 and it is not defined at odd multiples of $\pi$.
[Figure 9.14: Actual derivative of the Slopes function.]

The conclusion is that we must not differentiate the Fourier series of a non-continuous function and expect to obtain results with any meaning.

9.11 Integration of Fourier series

It turns out that the integration of Fourier series is more stable than differentiation, in the sense that fewer potential problems tend to arise.

Theorem 9.21. Let $f(t)$ be a 2π-periodic, bounded function, piecewise continuous on $[-\pi, \pi]$. If $a_0 = 0$ then
$$\int_{-\pi}^{t} f(\alpha)\,d\alpha = \sum_{n=1}^{\infty} \frac{a_n}{n}\sin(nt) - \sum_{n=1}^{\infty} \frac{b_n}{n}\left(\cos(nt) - \cos(n\pi)\right).$$
Recall that $\sin(n\pi) = 0$ for integer values of $n$, so the first term on the right-hand side is simpler than the second. We must have $a_0 = 0$ because
$$\int_{-\pi}^{t} \frac{a_0}{2}\,d\alpha = \frac{a_0}{2}(t + \pi),$$
which is not a Fourier series component.

Example 9.22. Recall the Slopes function from Example 9.20 and its Fourier series
$$\mathrm{FS}\,f(t) = \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n}\sin(nt).$$
Notice $a_0 = 0$ and the function is bounded and piecewise continuous on $[-\pi, \pi]$. Thus by Theorem 9.21:
$$\int_{-\pi}^{t} \frac{\alpha}{2}\,d\alpha = \sum_{n=1}^{\infty} (-1)^n\,\frac{\cos(nt) - \cos(n\pi)}{n^2} = \sum_{n=1}^{\infty} (-1)^n\,\frac{\cos(nt) - (-1)^n}{n^2}.$$
From this we get
$$\frac{t^2 - \pi^2}{4} = \sum_{n=1}^{\infty} \frac{(-1)^n\cos(nt)}{n^2} - \sum_{n=1}^{\infty} \frac{1}{n^2} = -\frac{\pi^2}{6} + \sum_{n=1}^{\infty} \frac{(-1)^n\cos(nt)}{n^2}.$$
Note that the average value of the function $\dfrac{t^2 - \pi^2}{4}$ is $-\dfrac{\pi^2}{6}$, and that the integral function is continuous.

[Figure 9.15: The integral of the Fourier series of the Slopes function.]
10
Differential equations

10.1 Introduction

In a vast number of situations a mathematical model of a system


or process will result in an equation (or set of equations) involving
not only functions of the dependent variables but also derivatives
of some or all of those functions with respect to one or more of the
variables. Such equations are called differential equations.
The simplest situation is that of a single function of a single
independent variable, in which case the equation is referred to
as an ordinary differential equation. A situation in which there is
more than one independent variable will involve a function of
those variables and an equation involving partial derivatives of that
function is called a partial differential equation.
Notationally, it is easy to tell the difference. For example, the equation
$$\frac{\partial f}{\partial x} + \frac{\partial f}{\partial y} = f^2 \tag{10.1}$$
is a partial differential equation to be solved for $f(x, y)$, whereas
$$\frac{d^2 f}{dx^2} + 3\frac{df}{dx} + 2f = x^4 \tag{10.2}$$
is an ordinary differential equation to be solved for $f(x)$.
The order of a differential equation is the degree of the highest derivative that occurs in it. The partial differential equation (10.1) is first-order and the ordinary differential equation (10.2) is second-order. For partial differential equations the degree of a mixed derivative is the total number of derivatives taken. For example, the following partial differential equation for $f(x, t)$ has order five:
$$\frac{\partial^5 f}{\partial x^3\,\partial t^2} + \frac{\partial^2 f}{\partial x^2} + \frac{\partial f}{\partial t} = 0. \tag{10.3}$$
An important class of differential equations are those referred
to as linear. Roughly speaking, linear differential equations are
those in which neither the function nor its derivatives occur in
products, powers or nonlinear functions. Differential equations that
are not linear are referred to as nonlinear. Equation 10.1 is nonlinear,
whereas Equations 10.2 and 10.3 are both linear.
Example 10.1. Classify the following differential equations with respect to (i) their nature (ordinary or partial), (ii) their order, and (iii) linear or nonlinear:
$$(a)\ \frac{\partial f}{\partial x} - \frac{\partial f}{\partial y} = 1. \qquad\qquad (d)\ \frac{\partial u}{\partial x}\,\frac{\partial u}{\partial t} = x + t.$$
$$(b)\ \frac{\partial^2 g}{\partial t^2} + g = \sin t. \qquad\qquad (e)\ P^2\frac{d^2 P}{dx^2} = x^5 + 1.$$
$$(c)\ \frac{d^3 y}{dx^3} + 8y = x\sin x. \qquad\qquad (f)\ \frac{\partial^4 F}{\partial x\,\partial y^3} = t^2 F.$$

Solution:

1. Equations ( a), (b), (d) and ( f ) involve partial derivatives and are
hence partial differential equations, whereas equations (c) and (e) in-
volve ordinary derivatives and are hence ordinary differential equations.

2. Recall that the order of a differential equation is the degree of the high-
est derivative that occurs in it. The orders of the differential equations
are as follows:

( a) First-order. (d) First-order.


(b) Second-order. (e) Second-order.
(c) Third-order. ( f ) Fourth-order.

3. Recall that linear differential equations are those in which neither the
function nor its derivatives occur in products, powers or nonlinear
functions. It doesn’t matter how the independent variables appear.
We observe that equations ( a), (b), (c) and ( f ) are linear whereas
equations (d) and (e) are nonlinear.

10.1.1 Solutions of differential equations

When asked to solve an algebraic equation, for example $x^2 - 3x + 2 = 0$, we expect the answers to be numbers. The situation with differential equations is much more difficult because we are being asked to find functions that will satisfy the given equation. For example, in Example 10.1(a) we are asked for a function $f(x, y)$ that will satisfy the partial differential equation $\frac{\partial f}{\partial x} - \frac{\partial f}{\partial y} = 1$, and in Example 10.1(c) we are asked to find a function $y(x)$ that will satisfy $\frac{d^3 y}{dx^3} + 8y = x\sin x$.
Unlike algebraic equations, which only have a discrete set of solutions (for example $x^2 - 3x + 2 = 0$ only has the solutions $x = 1$ or $x = 2$), differential equations can have whole families of solutions. For example, $y = Ce^{3x}$ satisfies the ordinary differential equation $\frac{dy}{dx} = 3y$ for any value of $C$.

If a differential equation is linear then there is a well-established


procedure for finding solutions and we shall cover this in detail for
ordinary differential equations. If an ordinary differential equation
is nonlinear but is of first-order then we may also be able to find
solutions.
The theory of partial differential equations is outside the scope of
this unit.

10.1.2 Verification of solutions of differential equations

To get a feel for things (and to practice our algebra) we'll have a quick look at the relatively simple procedure of verifying solutions of differential equations by way of a few examples.

Example 10.2. Verify that
$$y(x) = C_1 e^{2x} + C_2 e^{-2x} - 2\cos x - 5x\sin x \tag{10.4}$$
is a solution of the ordinary differential equation
$$\frac{d^2 y}{dx^2} - 4y = 25x\sin x \tag{10.5}$$
for any value of the constants $C_1$ and $C_2$.

Solution: We need to calculate $\frac{d^2 y}{dx^2}$. In order to do this we need the product rule to differentiate $x\sin x$. It gives
$$\frac{d}{dx}(x\sin x) = \sin x + x\cos x$$
and
$$\frac{d^2}{dx^2}(x\sin x) = \frac{d}{dx}(\sin x + x\cos x) = 2\cos x - x\sin x.$$
Hence
$$\frac{d^2 y}{dx^2} = 4C_1 e^{2x} + 4C_2 e^{-2x} - 8\cos x + 5x\sin x,$$
and substitution of this and Equation (10.4) into Equation (10.5) quickly yields the required verification.
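The verification in Example 10.2 can also be done symbolically. The sketch below (not from the notes) uses the SymPy library; the symbol names are arbitrary.

import sympy as sp

x, C1, C2 = sp.symbols('x C1 C2')
y = C1*sp.exp(2*x) + C2*sp.exp(-2*x) - 2*sp.cos(x) - 5*x*sp.sin(x)

residual = sp.diff(y, x, 2) - 4*y - 25*x*sp.sin(x)   # LHS minus RHS of (10.5)
print(sp.simplify(residual))                          # prints 0, confirming the solution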

Example 10.3. Verify that both
$$f(x, y) = xy - \frac{1}{2}y^2 \qquad\text{and}\qquad f(x, y) = \sin(y - x) + \frac{1}{2}x^2$$
are solutions of the partial differential equation
$$\frac{\partial f}{\partial x} + \frac{\partial f}{\partial y} = x.$$

Solution: In each case we need to calculate $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$. For $f(x, y) = xy - \frac{1}{2}y^2$ we have
$$\frac{\partial f}{\partial x} = y \quad\text{and}\quad \frac{\partial f}{\partial y} = x - y \qquad\Rightarrow\qquad \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y} = y + x - y = x.$$
For $f(x, y) = \sin(y - x) + \frac{1}{2}x^2$ we have
$$\frac{\partial f}{\partial x} = -\cos(y - x) + x \quad\text{and}\quad \frac{\partial f}{\partial y} = \cos(y - x)$$
$$\Rightarrow\qquad \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y} = -\cos(y - x) + x + \cos(y - x) = x.$$
In both cases we have verified the solution of the partial differential equation.

10.2 Mathematical modelling with ordinary differential equa-


tions

For real-world systems changing continuously in time we can use


derivatives to model the rates of change of quantities. Our mathe-
matical models are thus differential equations.

Example 10.4. Modelling population growth.


The simplest model of population growth is to assume that the rate of
change of population is proportional to the population at that time. Let
P(t) represent the population at time t. Then the mathematical model is

    dP/dt = rP   for some constant r > 0.
It can be shown (using a method called separation of variables, which
we shall learn shortly) that the function P(t) that satisfies this differential
equation is

P(t) = P0 ert where P0 is the population at t = 0.

This model is clearly inadequate in that it predicts that the population will
increase without bound if r > 0. A more realistic model is the logistic
growth model

    dP/dt = rP(C − P)   where r > 0 and C > 0 are constants.
The method of separation of variables can be used to show that the solution
of this differential equation is

    P(t) = CP0 / (P0 + (C − P0)e^{−rCt})   where P0 = P(0).

This model predicts that as time goes on, the population will tend towards
the constant value C, called the carrying capacity.
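
As a quick sanity check of the logistic formula, one could compare it against a direct numerical integration; the sketch below (not part of the notes, and using illustrative parameter values r, C and P0) does this with scipy.

```python
# Sketch (illustrative parameters only): compare the closed-form logistic
# solution with a numerical integration of dP/dt = r*P*(C - P).
import numpy as np
from scipy.integrate import solve_ivp

r, C, P0 = 0.02, 100.0, 5.0   # assumed example values

t = np.linspace(0.0, 10.0, 50)
numerical = solve_ivp(lambda t, P: r*P[0]*(C - P[0]), (0.0, 10.0), [P0],
                      t_eval=t, rtol=1e-9, atol=1e-9)
closed_form = C*P0 / (P0 + (C - P0)*np.exp(-r*C*t))

print(np.max(np.abs(numerical.y[0] - closed_form)))  # should be tiny
```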

Example 10.5. Newton’s law of cooling


The rate at which heat is lost from an object is proportional to the differ-
ence between the temperature of the object and the ambient temperature.

Let H (t) be the temperature of the object (in ◦ C) at time t and suppose
the fixed ambient temperature is A◦ C. Newton’s law of cooling says that

    dH/dt = α(A − H)   for some constant α > 0.
The method of separation of variables can be used to show that the solution
of this differential equation is

H (t) = A + ( H0 − A)e−αt where H0 = H (0).

This model predicts that as time goes on, the temperature of the object will
approach that of its surroundings, which agrees with our intuition.

Example 10.6. One tank mixing process


Suppose we have a tank of salt water and we allow fresh water into the
tank at a rate of F m3 /sec, and allow salt water out of the tank at the same
rate, as illustrated in Figure 10.1. Note this is a volume rate, and that the
volume V of the tank is maintained constant. We assume instantaneous
mixing so that the tank has a uniform concentration.

Figure 10.1: A mixing tank. Fresh water flows in at rate F, the tank holds a constant volume V, and salt water flows out at rate F.

Let y(t) represent the salt concentration of the water (kg/m³) in the tank at time t and a(t) represent the amount of salt (kg). We have y(t) = a(t)/V. The tank starts with an amount of salt a0 kg.
The rate at which salt is being removed from the tank at time t is given by

    da/dt = −y(t) × (flow rate) = −Fy(t) = −(F/V) a(t) = −αa(t)

where α = F/V is a positive constant. This equation has the solution a(t) = a0 e^{−αt}, which approaches zero as t → ∞ (as expected).
Consider the same tank which is now filled with fresh water. Water pol-
luted with q kg/m3 of some chemical enters the tank at a rate of F m3 /sec,
and polluted water exits the tank at the same rate. We again assume in-
stantaneous mixing so that the tank has a uniform concentration.

Let y(t) represent the concentration of pollutant (kg/m³) in the water in the tank at time t and a(t) represent the amount of pollutant (kg). We again have y(t) = a(t)/V. The rate at which pollutant is being added to the tank at time t is given by

    da/dt = (amount of pollutant added per second) − (amount of pollutant removed per second).

That is,

    da/dt = qF − Fy(t) = qF − (F/V) a(t).

Alternatively, we can obtain a differential equation for the concentration y(t) by dividing the above equation through by V to give

    dy/dt = (F/V)(q − y)   ⇒   dy/dt = α(q − y)

where α = F/V is a positive constant. Notice that this is essentially the same as the differential equation that we obtained for Newton's law of cooling.

10.3 First-order ordinary differential equations

Most first-order ordinary differential equations can be expressed


(by algebraic re-arrangement if necessary) in the form
    dy/dx = f(x, y)        (10.6)
where the function f ( x, y) is known, and we are asked to find the
solution y( x ).

10.3.1 Direction fields


Equation 10.6 means that for any point in the xy-plane (for which f is defined) we can evaluate the gradient dy/dx and represent this graphically by means of a small arrow representing the vector (1, dy/dx). If we do this for a whole grid of points in the xy-plane and place all of the arrows on the same plot we produce what is called a direction field or slope field. Figure 10.2 displays the direction field in the case where f(x, y) = y² − x².
A solution of Equation 10.6 is a function relating y and x which
geometrically is a curve in the xy−plane. Since this solution satis-
fies the differential equation, the curve is such that its gradient is
the same as the direction field vector at any point on the curve.
That is, the direction field is a collection of arrows that are
tangential to the solution curves. This observation enables us to
roughly sketch solution curves without actually solving the differ-
ential equation, as long as we have a device to plot the direction

Figure 10.2: The direction field of dy/dx = y² − x².

field. We can indeed sketch many such curves (called a family of


solution curves) superimposed on the same direction field.

Example 10.7. The direction field of dy/dx = y² − x² along with three (disjoint) solution curves through the points (x, y) = (0, 1), (0, 0) and (0, −2) is shown in Figure 10.3.

Figure 10.3: Three solution curves for dy/dx = y² − x².

Remark 10.8. Note that we will not be able to solve the differential equation in Example 10.7 using the techniques we will cover in this unit – this differential equation is known as a Riccati differential equation, and equations of this type are notoriously difficult to solve.
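
Although we cannot solve this equation exactly, a direction field like Figure 10.2 is easy to produce numerically. The sketch below (not part of the notes) uses numpy and matplotlib; the grid spacing is an arbitrary choice.

```python
# Sketch: plot the direction field of dy/dx = y**2 - x**2 (cf. Figure 10.2).
import numpy as np
import matplotlib.pyplot as plt

X, Y = np.meshgrid(np.linspace(-3, 3, 25), np.linspace(-3, 3, 25))
slope = Y**2 - X**2

# Each arrow represents the vector (1, dy/dx), normalised to unit length so
# that only the direction is displayed.
length = np.sqrt(1 + slope**2)
plt.quiver(X, Y, 1/length, slope/length, angles='xy')
plt.xlabel('x'); plt.ylabel('y')
plt.title('Direction field of dy/dx = y^2 - x^2')
plt.show()
```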

Example 10.9. The direction field of dy/dx = 3y + e^x along with three solution curves is shown in Figure 10.4. The top curve is the solution that goes through (x, y) = (0, 1), the middle curve is the solution that goes through (x, y) = (0, 0) and the bottom curve is the solution that goes through (x, y) = (0, −1).

Figure 10.4: Three solution curves for dy/dx = 3y + e^x.

10.3.2 Separation of variables


A first-order differential equation is called separable provided that the function f(x, y) may be written as the product of a function of x and a function of y, that is f(x, y) = F(x)G(y).
Thus the variables x and y can be "separated" and placed on opposite sides of the equation; that is, given

    dy/dx = F(x)G(y),

then by thinking of the derivative dy/dx as a fraction (just like we do with the chain rule) we have

    (1/G(y)) dy = F(x) dx,

and then each side can be integrated, so that

    ∫ 1/G(y) dy = ∫ F(x) dx + C,

where the arbitrary integration constant C includes the constants from both integrals.
We then solve this equation (if possible) for y, which yields the
general solution of the differential equation.
If we can uniquely solve for y, then the solution is called the ex-
plicit solution of the differential equation, but if we cannot uniquely

solve for y, then the solution is called the implicit solution of the
differential equation.

Example 10.10. Solve the first-order differential equation dy/dx = y² sin x.

Solution: The differential equation is separable. The solution is given by

    ∫ y^{−2} dy = ∫ sin x dx   ⇒   −1/y = −cos x + C        (10.7)

which is the implicit solution, where C is a constant of integration. We can re-arrange Equation 10.7 to get the explicit solution

    y(x) = 1/(cos x + C),        (10.8)

where we have arbitrarily re-named the integration constant from −C to +C. Note that the derivation of Equation 10.7 does not hold if y = 0, since we divided by y². In such situations we have to investigate the original differential equation dy/dx = y² sin x. In this case it turns out that y(x) = 0 is in fact a solution, but not of the form of Equation 10.8. Special situations like this are something that we should be aware of.
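
The same separable equation can be handed to a computer algebra system; a sketch (assuming sympy) follows. The integration constant may appear with a different name or sign than in Equation 10.8, but the families of solutions agree.

```python
# Sketch (assumes sympy): solve dy/dx = y**2*sin(x) symbolically.
import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')

sol = sp.dsolve(sp.Eq(y(x).diff(x), y(x)**2*sp.sin(x)), y(x))
print(sol)  # expected form equivalent to Eq(y(x), 1/(C1 + cos(x))), cf. Equation 10.8
```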

10.3.3 The integrating factor method


A first-order linear differential equation is one that may be written in the following standard form

    dy/dx + f(x)y = g(x),

where f(x) and g(x) are arbitrary functions of x only. Note that if g(x) ≠ 0, the differential equation is not separable.
To solve such a differential equation, we multiply both sides by a function I(x) such that the left-hand side may be written

    I (dy/dx + f y) = d/dx (Iy),

thus allowing the left-hand side to be integrated – hence the function I(x) is called an integrating factor.
If an integrating factor I(x) can be found, then the general solution is

    d/dx (Iy) = Ig   ⇒   Iy = ∫ Ig dx + C

which implies

    y(x) = (1/I(x)) ∫ I(x)g(x) dx + C/I(x).        (10.9)

How do we find the function I(x)? Since

    I (dy/dx + f y) = d/dx (Iy),

we have, by expanding the left-hand side and using the product rule on the right-hand side, that

    I dy/dx + I f y = y dI/dx + I dy/dx   ⇒   I f y = y dI/dx   ⇒   dI/dx = I f.

This is a separable differential equation for I(x), with solution

    ∫ (1/I) dI = ∫ f dx   ⇒   ln(I) = ∫ f dx + C   ⇒   I = exp(∫ f dx + C).

We want the simplest possible solution for I(x), so we set C = 0. Hence the integrating factor is

    I(x) = exp(∫ f(x) dx).

(Note that exp x is just another way of writing e^x but has the advantage that the "power" x is not a small superscript.)

Example 10.11. Solve the first-order linear differential equation

    dy/dx − 3y = e^x.        (10.10)

As a guide to the shape of the solutions, the direction field for this differential equation along with a number of solution curves appears in Figure 10.4.

Solution: The integrating factor is

    I(x) = exp(∫ −3 dx) = e^{−3x}.

Multiplying the differential equation through by I(x) gives

    e^{−3x} dy/dx − 3e^{−3x} y = e^{−2x}.

We know that now the left-hand side of this can be rewritten in product form and so we obtain:

    d/dx (e^{−3x} y) = e^{−2x},

which can be integrated to give

    e^{−3x} y = ∫ e^{−2x} dx = −(1/2)e^{−2x} + C

hence

    y(x) = −(1/2)e^x + Ce^{3x}.

Remark 10.12. Note that we could have written down the solution
immediately by appealing to Equation 10.9 but when learning the
method it is instructive to follow through each step in the process in
order to gain a better understanding of how it works.
However, the general solution strategy is as follows:

1. Write the linear first-order differential equation in standard form dy/dx + f(x)y = g(x) and identify the functions f(x) and g(x).

2. Find the integrating factor I(x) = exp(∫ f(x) dx), omitting the integration constant.

3. Find ∫ I(x)g(x) dx, omitting the integration constant.

4. The general solution is then y(x) = (1/I(x)) ∫ I(x)g(x) dx + C/I(x).

Example 10.13. Solve the first-order linear differential equation

    dy/dx − (1/x)y = xe^x.

Solution: Here we have f(x) = −1/x and g(x) = xe^x. Then

    ∫ f(x) dx = ∫ −(1/x) dx = −ln x = ln(x^{−1}),

and hence

    I(x) = exp(∫ f(x) dx) = e^{ln(x^{−1})} = x^{−1}.

Then

    ∫ I(x)g(x) dx = ∫ x^{−1}(xe^x) dx = ∫ e^x dx = e^x,

and the general solution is therefore

    y(x) = (1/I(x)) ∫ I(x)g(x) dx + C/I(x)
         = (e^x)/(x^{−1}) + C/(x^{−1})
         = xe^x + Cx.
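
The four-step recipe above is mechanical enough to script. A sketch (assuming sympy) applied to Example 10.13:

```python
# Sketch (assumes sympy): the integrating factor recipe for
# dy/dx - (1/x)*y = x*exp(x).
import sympy as sp

x = sp.symbols('x', positive=True)
C = sp.symbols('C')
f = -1/x             # step 1: coefficient of y in standard form
g = x*sp.exp(x)      # step 1: right-hand side

I = sp.simplify(sp.exp(sp.integrate(f, x)))   # step 2: I(x) = exp(-ln x) = 1/x
Ig = sp.integrate(I*g, x)                      # step 3: integral of I*g = exp(x)
y = sp.expand(Ig/I + C/I)                      # step 4: general solution
print(I, Ig, y)                                # expect 1/x, exp(x), x*exp(x) + C*x
```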

10.3.4 Initial conditions


The values of constants of integration that arise when we solve
differential equations can be determined by making use of other
conditions (or restrictions) placed on the problem. For first-order
differential equations, these conditions are called initial conditions
and the combined differential equation plus initial condition is
called an initial value problem.

Example 10.14. Solve dy/dx − 3y = e^x subject to y(0) = 1, that is, y = 1 when x = 0.

Solution: We have already seen this differential equation. It is Equation 10.10 and we have determined that its (most general) solution is given by

    y(x) = −(1/2)e^x + Ce^{3x}.

All we have to do is substitute y = 1 and x = 0 and solve the resulting algebraic equation for C. We have

    1 = −(1/2)e^0 + Ce^0   ⇒   C = 3/2,

so the required solution is

    y(x) = (3e^{3x} − e^x)/2.        (10.11)

The solution curve of Equation 10.11 appears in Figure 10.4.

Example 10.15. Solve the initial value problem

    dy/dx = x²/y,   y(1) = 4.

Solution: We observe that the differential equation is separable. The solution is:

    ∫ y dy = ∫ x² dx   ⇒   (1/2)y² = (1/3)x³ + C

which implies

    y(x) = ±√((2/3)x³ + C),

where we have arbitrarily re-named the integration constant.
Notice that we have two different solutions to the differential equation, one positive and one negative. The initial condition y(1) = 4 allows us to eliminate the negative solution, so we are left with

    y(x) = √((2/3)x³ + C),

and substituting into this y = 4 and x = 1 gives

    4 = √(2/3 + C)   ⇒   C = 46/3   ⇒   y(x) = √((2x³ + 46)/3).
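
Initial value problems of this kind can also be checked with sympy's dsolve, which accepts the initial condition directly; a sketch for Example 10.14 (assuming sympy):

```python
# Sketch (assumes sympy): solve dy/dx - 3y = exp(x) with y(0) = 1.
import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')

sol = sp.dsolve(sp.Eq(y(x).diff(x) - 3*y(x), sp.exp(x)), y(x), ics={y(0): 1})
print(sp.simplify(sol.rhs))  # expected: 3*exp(3*x)/2 - exp(x)/2, i.e. Equation 10.11
```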

10.4 Second-order ordinary differential equations

A general second-order differential equation may be written in the form

    d²y/dx² = f(x, y, dy/dx),

where f(x, y, y′) is an arbitrary (but known!) function of x, y and y′, and we wish to find a solution y(x) that satisfies the given differential equation.
A second-order linear differential equation is one that may be written

    d²y/dx² + p(x) dy/dx + q(x)y = g(x),

where p(x), q(x) and g(x) are arbitrary functions of x. A second-order linear differential equation is said to be homogeneous if g(x) = 0, otherwise the differential equation is nonhomogeneous and g(x) is called the nonhomogeneous term.
If p(x) = p and q(x) = q are constant functions, then we have a second-order linear differential equation with constant coefficients

    d²y/dx² + p dy/dx + qy = g(x),

otherwise the differential equation is said to have variable coefficients.
Since two integrations are required to find a solution of a second-order differential equation and each integration produces an arbitrary integration constant, the general solution y(x) will contain two integration constants, C1 and C2.

Theorem 10.16. (Principle of Superposition)

If y1 and y2 are two solutions of the second-order linear homogeneous differential equation

    d²y/dx² + p(x) dy/dx + q(x)y = 0,

then the linear combination C1y1 + C2y2 is also a solution for any values of the constants C1 and C2.

Proof. If y1 and y2 are both solutions of d²y/dx² + p(x) dy/dx + q(x)y = 0, then

    d²y1/dx² + p(x) dy1/dx + q(x)y1 = 0   and   d²y2/dx² + p(x) dy2/dx + q(x)y2 = 0.

Now,

    d²/dx²(C1y1 + C2y2) + p(x) d/dx(C1y1 + C2y2) + q(x)(C1y1 + C2y2)
    = C1 d²y1/dx² + C2 d²y2/dx² + C1 p(x) dy1/dx + C2 p(x) dy2/dx + C1 q(x)y1 + C2 q(x)y2
    = C1 [d²y1/dx² + p(x) dy1/dx + q(x)y1] + C2 [d²y2/dx² + p(x) dy2/dx + q(x)y2]
    = C1 · 0 + C2 · 0 = 0.

Definition 10.17. (Linear dependence)


Let y1 ( x ), y2 ( x ) be a set of functions. The set y1 ( x ), y2 ( x ) is linearly
dependent on an interval I if there are constants C1 and C2 , not both
zero, so that
C1 y1 ( x ) + C2 y2 ( x ) = 0
for every value of x in I.
The set y1 ( x ), y2 ( x ) is linearly independent if it is not linearly
dependent (that is, the only possible way the above equation is satisfied is
if C1 = 0 = C2 ).

A simple way to check if two solutions y1 and y2 are linearly


independent is to calculate a function called the Wronskian of y1 and
y2 , denoted by W [y1 , y2 ]( x ), which is defined below.

Definition 10.18. (Wronskian of a set of two functions)


Let y1 ( x ), y2 ( x ) be a set of differentiable functions. The Wronskian of the
set y1 ( x ), y2 ( x ), denoted by W [y1 , y2 ]( x ), is

    W[y1, y2](x) = det([y1, y2; y1′, y2′]) = y1 dy2/dx − y2 dy1/dx.

Theorem 10.19. (Wronskian and Linear Dependence)


Let y1 ( x ), y2 ( x ) be a set of differentiable functions. If W [y1 , y2 ]( x ) 6= 0
for all x in some interval I, then y1 and y2 are linearly independent on I.
If W [y1 , y2 ]( x ) = 0 for every x in some interval I, then y1 and y2 are
linearly dependent on I.

Remark 10.20. To prove Theorem 10.19 and the next Theorem, we need
some concepts of linear algebra which will not be covered until the next
unit.

Theorem 10.21. (General Solution)


Consider the second-order linear homogeneous differential equation

d2 y dy
+ p( x ) + q( x )y = 0,
dx2 dx

and suppose that y1(x) and y2(x) are linearly independent solutions of this


differential equation.
Then the general solution

y( x ) = C1 y1 ( x ) + C2 y2 ( x )

with arbitrary constants C1 and C2 includes every possible solution of the


differential equation.

Key Concept 10.22. The conclusion from all this is: given a second-order linear homogeneous differential equation

    d²y/dx² + p(x) dy/dx + q(x)y = 0,

the general solution is

    y(x) = C1y1(x) + C2y2(x)

for arbitrary constants C1 and C2, where both y1 and y2 are solutions of the differential equation and the functions y1 and y2 are linearly independent, that is the Wronskian

    W[y1, y2](x) = det([y1, y2; y1′, y2′]) = y1 dy2/dx − y2 dy1/dx ≠ 0.

10.5 Linear homogeneous second-order ordinary differential equations with constant coefficients

For the remainder of this Chapter, we will only consider linear second-order ordinary differential equations with constant coefficients, which when homogeneous have the general form

    d²y/dx² + p dy/dx + qy = 0,

where p and q are constants.
In seeking a solution technique, consider the first-order equation

    p dy/dx + qy = 0,

which is a separable first-order differential equation with solution

    dy/dx = −qy/p   ⇒   ∫ (1/y) dy = ∫ −(q/p) dx
    ⇒   ln y = −(q/p)x + C
    ⇒   y(x) = Ce^{mx}

where m = −q/p and C is the constant of integration that we have arbitrarily re-named from e^C.
By analogy we attempt to find a solution to the second-order differential equation by assuming a solution of the form y = e^{mx}, and the differential equation becomes

    e^{mx}(m² + pm + q) = 0   ⇒   m² + pm + q = 0,

which is the characteristic equation or auxiliary equation of the differential equation. Since it is a quadratic in m, it has two roots

    m1 = (−p + √(p² − 4q))/2,   m2 = (−p − √(p² − 4q))/2.

Hence there are three cases to consider, depending on whether


the discriminant p2 − 4q is positive, negative or zero.

Case 1. Two real roots


In this case the discriminant is positive and we have two real distinct roots m1 and m2. Then y1 = e^{m1 x} and y2 = e^{m2 x} are two solutions of the differential equation, and the Wronskian is

    W[y1, y2](x) = det([e^{m1 x}, e^{m2 x}; m1 e^{m1 x}, m2 e^{m2 x}]) = (m2 − m1)e^{(m1+m2)x},

which is never zero since m1 ≠ m2. Hence the general solution of the differential equation is

    y(x) = C1 e^{m1 x} + C2 e^{m2 x}.

Example 10.23. Solve the differential equation

    d²y/dx² − 5 dy/dx + 4y = 0.

Solution: The characteristic equation is m² − 5m + 4 = 0 which factorizes into (m − 1)(m − 4) = 0 and hence the required solutions are m1 = 1 and m2 = 4. Then the general solution is

    y(x) = C1 e^x + C2 e^{4x}.

Case 2. Complex conjugate roots

In this case the discriminant is negative and we have two complex roots m1 = a + ib and m2 = a − ib that are complex conjugates of each other, where a = −p/2 and b = (1/2)√(4q − p²). Then y1 = e^{m1 x} and y2 = e^{m2 x} are two solutions of the differential equation, and the Wronskian is again never zero since m1 ≠ m2. The general solution of the differential equation is then

    y(x) = C1 e^{(a+ib)x} + C2 e^{(a−ib)x}.

Recalling Euler's formula e^{ix} = cos x + i sin x, we have

    y(x) = C1 e^{ax}e^{ibx} + C2 e^{ax}e^{−ibx}
         = C1 e^{ax}[cos(bx) + i sin(bx)] + C2 e^{ax}[cos(bx) − i sin(bx)]
         = (C1 + C2)e^{ax} cos(bx) + i(C1 − C2)e^{ax} sin(bx)
         = C1 e^{ax} cos(bx) + C2 e^{ax} sin(bx),

where we have arbitrarily re-named the two integration constants. Hence the general solution of the differential equation is

    y(x) = C1 e^{ax} cos(bx) + C2 e^{ax} sin(bx).

Example 10.24. Solve the differential equation

    d²y/dx² − 4 dy/dx + 13y = 0.

Solution: The characteristic equation is m² − 4m + 13 = 0. The quadratic formula gives the roots as m1,2 = 2 ± 3i and hence the general solution is

    y(x) = C1 e^{2x} cos(3x) + C2 e^{2x} sin(3x).

Case 3. Equal roots

In this case the discriminant is zero and we have one repeated root m = −p/2, so we only know "half" of the general solution. How do we find the other "half" of the solution, namely y2? If we let y(x) = v(x)y1(x) = v(x)e^{−px/2} for some function v(x) to be found, then using the product rule we have

    dy/dx = e^{−px/2} dv/dx + v(−(p/2)e^{−px/2}) = e^{−px/2}(dv/dx − (1/2)pv),

and using the product rule again we find that

    d²y/dx² = e^{−px/2}(d²v/dx² − p dv/dx + (1/4)p²v).

Then the differential equation becomes

    e^{−px/2}(d²v/dx² − p dv/dx + (1/4)p²v) + p e^{−px/2}(dv/dx − (1/2)pv) + qve^{−px/2} = 0,

which simplifies to

    d²v/dx² + (−(1/4)p² + q)v = 0.

Since p² − 4q = 0, the coefficient of v in the above equation is zero, so we have

    d²v/dx² = 0,

which can be integrated twice to give

    v(x) = C1 + C2 x.

Therefore

    y(x) = v(x)y1(x) = (C1 + C2 x)e^{−px/2} = C1 e^{−px/2} + C2 xe^{−px/2},

and the second linearly independent solution of the differential equation is therefore y2 = xe^{−px/2}.

Remark 10.25. This is an example of a process called reduction of order, which is a way to "build" the general solution of a second-order linear differential equation provided we can find a single solution y1.

So, if the characteristic equation has only one root m, then the general solution of the differential equation is

    y(x) = C1 e^{mx} + C2 xe^{mx}.

Notice that the Wronskian is

    W[y1, y2](x) = det([e^{mx}, xe^{mx}; me^{mx}, (1 + mx)e^{mx}]) = e^{2mx},

which is never zero.

Example 10.26. Solve the differential equation

    d²y/dx² + 6 dy/dx + 9y = 0.

Solution: The characteristic equation is m² + 6m + 9 = 0, which factorizes into (m + 3)² = 0 and hence the required component solutions are e^{−3x} and xe^{−3x}. Hence the general solution of the differential equation is

    y(x) = C1 e^{−3x} + C2 xe^{−3x}.

Key Concept 10.27. In summary, to find the general solution of a linear homogeneous second-order ordinary differential equation with constant coefficients of general form

    d²y/dx² + p dy/dx + qy = 0

where p and q are constants, find the roots of the characteristic equation

    m² + pm + q = 0.

1. If the roots m1 and m2 are real and unequal, then the general solution is

    y(x) = C1 e^{m1 x} + C2 e^{m2 x}.

2. If the roots are complex conjugates a ± ib, then the general solution is

    y(x) = C1 e^{ax} cos(bx) + C2 e^{ax} sin(bx).

3. If there is a single (or repeated) root m, then the general solution is

    y(x) = C1 e^{mx} + C2 xe^{mx}.
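
A sketch (assuming sympy) that reproduces the three cases for the worked examples of this section, both from the characteristic equation and from dsolve:

```python
# Sketch (assumes sympy): characteristic roots and general solutions for
# Examples 10.23, 10.24 and 10.26.
import sympy as sp

x, m = sp.symbols('x m')
y = sp.Function('y')

for p, q in [(-5, 4), (-4, 13), (6, 9)]:
    roots = sp.roots(m**2 + p*m + q, m)   # dict {root: multiplicity}
    ode = y(x).diff(x, 2) + p*y(x).diff(x) + q*y(x)
    print(p, q, roots, sp.dsolve(ode, y(x)))
```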

10.6 Linear nonhomogeneous second-order ordinary differential equations with constant coefficients

Consider a linear nonhomogeneous second-order ordinary differential equation with constant coefficients

    d²y/dx² + p dy/dx + qy = g(x),

where p and q are constants and the nonhomogeneous term g(x) is an arbitrary function of x. For this differential equation we also consider the corresponding homogeneous differential equation

    d²y/dx² + p dy/dx + qy = 0,

with general solution yc, which we call the complementary solution.

Definition 10.28. (Particular solution)

A particular solution yp of the nonhomogeneous differential equation

    d²y/dx² + p dy/dx + qy = g(x)

is a specific function that contains no arbitrary constants and satisfies the differential equation.

Example 10.29. Recall Example 10.2, where we showed that

    y(x) = C1 e^{2x} + C2 e^{−2x} − 2 cos x − 5x sin x

was a solution of the ordinary differential equation

    d²y/dx² − 4y = 25x sin x

for any value of the constants C1 and C2. With C1 = 0 = C2, a particular solution of this differential equation is simply

    yp(x) = −2 cos x − 5x sin x.

Theorem 10.30. (General Solution of a Nonhomogeneous Differential Equation)
The general solution of a linear nonhomogeneous second-order ordinary differential equation with constant coefficients

    d²y/dx² + p dy/dx + qy = g(x)

is

    y(x) = yc(x) + yp(x),

where yp is a particular solution of the nonhomogeneous differential equation and yc is the general solution of the corresponding homogeneous differential equation

    d²y/dx² + p dy/dx + qy = 0.

Proof.

    d²y/dx² + p dy/dx + qy = (d²yc/dx² + d²yp/dx²) + p(dyc/dx + dyp/dx) + q(yc + yp)
    = (d²yc/dx² + p dyc/dx + qyc) + (d²yp/dx² + p dyp/dx + qyp)
    = 0 + g(x) = g(x).

There are two methods to find y p ( x ), either the method of unde-


termined coefficients (which is a very specific method) or variation of
parameters (which is a much more general method).

10.6.1 Method of undetermined coefficients


The method of undetermined coefficients can be applied when the
nonhomogeneous term g( x ) is:

• A polynomial;

• A linear combination of sines and cosines;

• An exponential function; or

• A combination of sums, differences and products of the above


functions.

The idea behind this method is that the derivative of a polyno-


mial is a polynomial, that of a trigonometric function is a trigono-
metric function, and that of an exponential function is an exponen-
tial function, meaning that we can make an intelligent guess for the
form of y p ( x ).

Nonhomogeneous term g(x)                                        Form of trial particular solution yp(x)
an(x) = an x^n + a_{n−1} x^{n−1} + · · · + a1 x + a0            An(x) = An x^n + A_{n−1} x^{n−1} + · · · + A1 x + A0
an(x) e^{αx}                                                    An(x) e^{αx}
an(x) sin(βx) or an(x) cos(βx)                                  An(x) sin(βx) + Bn(x) cos(βx)
an(x) e^{αx} sin(βx) or an(x) e^{αx} cos(βx)                    e^{αx}[An(x) sin(βx) + Bn(x) cos(βx)]

We formulate a guess for y p using the above table and the fol-
lowing rules:

• Basic rule: If g( x ) is one of the functions listed in the first col-


umn, substitute the corresponding function from the second
column and determine the unknown constants by equating coef-
ficients.

• Modification rule: If a term in the choice for yp is a solution of the homogeneous equation, then multiply this term by x. (This should come as no surprise – remember from Section 10.5 that the second solution for the case of an equal root of the characteristic equation was just x times the first solution.)

• Sum rule: If g(x) is a sum of functions listed in the first column, then substitute the corresponding sum of functions from the second column and solve for the unknown coefficients by equating coefficients.

Example 10.31. Solve the differential equation d²y/dx² + 6 dy/dx + 9y = 4x² + 5.

Solution: In Example 10.26, we found the complementary solution was yc(x) = C1 e^{−3x} + C2 xe^{−3x}. We now need to determine a particular solution. Based on the Table of guesses we try

    yp(x) = A2 x² + A1 x + A0,        (10.12)

and we note that yp is not included already in yc. Substitution of Equation 10.12 into the differential equation gives

    2A2 + 6(2A2 x + A1) + 9(A2 x² + A1 x + A0) = 4x² + 5
    ⇒   9A2 x² + (9A1 + 12A2)x + (9A0 + 6A1 + 2A2) = 4x² + 5

and equating the coefficients of the powers of x on each side of this equation leads to a set of algebraic equations to solve for the unknowns A0, A1 and A2:

    9A2 = 4,   9A1 + 12A2 = 0,   9A0 + 6A1 + 2A2 = 5.

The solution of this set of equations is

    A0 = 23/27,   A1 = −16/27,   A2 = 4/9.

Hence the particular solution is

    yp(x) = (4/9)x² − (16/27)x + 23/27

and finally, the general solution of the nonhomogeneous differential equation is

    y(x) = C1 e^{−3x} + C2 xe^{−3x} + (4/9)x² − (16/27)x + 23/27.

Example 10.32. Solve the differential equation

    d²y/dx² − 5 dy/dx + 4y = 7 cos(3x).

Solution: In Example 10.23 we found the complementary solution was yc(x) = C1 e^x + C2 e^{4x}. Based on the Table of guesses we try

    yp(x) = A cos(3x) + B sin(3x)        (10.13)

as our particular solution yp, and we note that yp is not included already in yc. Substitution of Equation 10.13 into the differential equation gives

    −9A cos(3x) − 9B sin(3x) − 5(−3A sin(3x) + 3B cos(3x)) + 4(A cos(3x) + B sin(3x)) = 7 cos(3x),

and equating coefficients of cos(3x) and sin(3x) on both sides of this equation leads to a set of algebraic equations to solve for the unknowns A and B:

    −9A − 15B + 4A = 7,   −9B + 15A + 4B = 0.

The solution of this set of equations is

    A = −7/50,   B = −21/50

and so

    yp(x) = −(7/50) cos(3x) − (21/50) sin(3x),

and finally, the general solution of the nonhomogeneous differential equation is

    y(x) = C1 e^x + C2 e^{4x} − (7/50) cos(3x) − (21/50) sin(3x).

Example 10.33. Solve the differential equation

    d²y/dx² − 4 dy/dx + 13y = 8e^{−3x}.

Solution: In Example 10.24 we found the complementary solution was yc(x) = C1 e^{2x} cos(3x) + C2 e^{2x} sin(3x). Based on the Table of guesses we try

    yp(x) = Ae^{−3x}        (10.14)

as our particular solution yp, and we note that yp is not included already in yc. Substitution of Equation 10.14 into the differential equation gives

    9Ae^{−3x} + 12Ae^{−3x} + 13Ae^{−3x} = 8e^{−3x}

and dividing through by e^{−3x}

    34A = 8   ⇒   A = 4/17   ⇒   yp(x) = (4/17)e^{−3x},

and hence the general solution of the nonhomogeneous differential equation is

    y(x) = C1 e^{2x} cos(3x) + C2 e^{2x} sin(3x) + (4/17)e^{−3x}.

Example 10.34. Solve the differential equation

    d²y/dx² + 5 dy/dx + 6y = 3e^{−2x}.

Solution: The corresponding homogeneous differential equation

    d²y/dx² + 5 dy/dx + 6y = 0

has characteristic equation

    m² + 5m + 6 = (m + 2)(m + 3) = 0   ⇒   m = −2, −3

so the complementary solution is

    yc(x) = C1 e^{−2x} + C2 e^{−3x}.

Based on the Table of guesses we try

    yp(x) = Ae^{−2x}

as our particular solution yp, but in this case we note that yp is included already in yc, so instead we try

    yp(x) = Axe^{−2x}        (10.15)

as our particular solution yp. Substitution of Equation 10.15 into the differential equation gives

    4A(x − 1)e^{−2x} + 5A(1 − 2x)e^{−2x} + 6Axe^{−2x} = 3e^{−2x}

and dividing through by e^{−2x} and expanding yields

    4Ax − 4A + 5A − 10Ax + 6Ax = 3   ⇒   A = 3   ⇒   yp(x) = 3xe^{−2x},

and hence the general solution of the nonhomogeneous differential equation is

    y(x) = C1 e^{−2x} + C2 e^{−3x} + 3xe^{−2x}.

Remark 10.35. You should try using yp(x) = Ae^{−2x} as the particular solution yp in Example 10.34, and investigate what happens when you attempt to find the value of the constant A.

10.6.2 Variation of parameters


In cases where the nonhomogeneous term is not of the right type
and the method of undetermined coefficients cannot be applied, a
more general method called variation of parameters may be used to
find y p .
Consider the complementary solution yc = C1y1 + C2y2 of the homogeneous differential equation

    d²y/dx² + p(x) dy/dx + q(x)y = 0.

To find a particular solution yp of the corresponding nonhomogeneous differential equation

    d²y/dx² + p(x) dy/dx + q(x)y = g(x),

we replace the integration constants C1 and C2 in the complementary solution with unknown functions u1(x) and u2(x) and suppose that this is yp; that is, we set

    yp(x) = u1(x)y1(x) + u2(x)y2(x).

From the product rule we have

    dyp/dx = y1 du1/dx + u1 dy1/dx + y2 du2/dx + u2 dy2/dx
           = (u1 dy1/dx + u2 dy2/dx) + (y1 du1/dx + y2 du2/dx).

This is a nasty-looking expression, so let's set the term in brackets equal to zero, that is

    y1 du1/dx + y2 du2/dx = 0,

then the first derivative of yp is the not-so-nasty looking

    dyp/dx = u1 dy1/dx + u2 dy2/dx.

Differentiating again using the product rule we have

    d²yp/dx² = (dy1/dx)(du1/dx) + u1 d²y1/dx² + (dy2/dx)(du2/dx) + u2 d²y2/dx²,

and hence the nonhomogeneous differential equation becomes

    (dy1/dx)(du1/dx) + u1 d²y1/dx² + (dy2/dx)(du2/dx) + u2 d²y2/dx²
    + p(u1 dy1/dx + u2 dy2/dx) + q(u1y1 + u2y2) = g,

which may be written

    u1 [d²y1/dx² + p dy1/dx + qy1] + u2 [d²y2/dx² + p dy2/dx + qy2]
    + (dy1/dx)(du1/dx) + (dy2/dx)(du2/dx) = g,

where the two bracketed terms vanish because y1 and y2 solve the homogeneous equation. Hence we have two equations for the derivatives of the unknown functions u1 and u2, namely

    y1 du1/dx + y2 du2/dx = 0   and   (dy1/dx)(du1/dx) + (dy2/dx)(du2/dx) = g.

Solving these equations for u1′ and u2′ we find

    du1/dx = −y2 g / W[y1, y2]   and   du2/dx = y1 g / W[y1, y2],

where W[y1, y2] is the Wronskian of the homogeneous solutions y1 and y2.
By integrating these equations (omitting the integration constants) we can obtain the particular solution yp = u1y1 + u2y2, and therefore the general solution y = yc + yp. (Although we have said this is a general method, there is no guarantee that these two equations can be integrated to find u1(x) and/or u2(x).)

Remark 10.36. 1. If one of the terms in y p is already in yc , we can “ab-


sorb” it into the integration constants contained in yc .

2. Of course, the Examples that were solved in the previous section by the method of undetermined coefficients can also be solved by variation of parameters. However, in the vast majority of cases, if a nonhomogeneous differential equation can be solved by the method of undetermined coefficients, it will be much easier to use that method than to solve the same problem using variation of parameters.

Key Concept 10.37. In summary, to find the general solution of a linear nonhomogeneous second-order ordinary differential equation with constant coefficients of general form

    d²y/dx² + p dy/dx + qy = g(x)

where p and q are constants and g(x) is an arbitrary function of x, by the method of variation of parameters:

1. Find the general solution yc(x) = C1y1(x) + C2y2(x) of the corresponding homogeneous differential equation

    d²y/dx² + p dy/dx + qy = 0.

2. Calculate the Wronskian

    W[y1, y2](x) = det([y1, y2; y1′, y2′]) = y1 dy2/dx − y2 dy1/dx.

3. Let du1/dx = −y2(x)g(x)/W[y1, y2] and du2/dx = y1(x)g(x)/W[y1, y2].

4. Integrate these two equations to find u1(x) and u2(x), omitting the integration constants.

5. A particular solution of the nonhomogeneous differential equation is then

    yp(x) = u1(x)y1(x) + u2(x)y2(x).

6. The general solution of the nonhomogeneous differential equation is then y(x) = yc(x) + yp(x).

Example 10.38. Solve the differential equation

    d²y/dx² − 2 dy/dx + y = e^x ln x.

Solution: Note that the nonhomogeneous term g(x) = e^x ln x is not of a form to which we can apply the method of undetermined coefficients, because we cannot make an intelligent guess for yp(x). The corresponding homogeneous differential equation

    d²y/dx² − 2 dy/dx + y = 0

has characteristic equation

    m² − 2m + 1 = (m − 1)² = 0   ⇒   m = 1,

so the complementary solution is

    yc(x) = C1 e^x + C2 xe^x.

With y1(x) = e^x and y2(x) = xe^x we find

    W[y1, y2](x) = det([e^x, xe^x; e^x, e^x + xe^x]) = e^{2x}.

Then

    du1/dx = −y2(x)g(x)/W[y1, y2] = −(xe^x)(e^x ln x)/e^{2x} = −x ln x,

and using integration by parts we find that

    u1(x) = −∫ x ln x dx = (1/4)x² − (1/2)x² ln x.

Similarly

    du2/dx = y1(x)g(x)/W[y1, y2] = (e^x)(e^x ln x)/e^{2x} = ln x,

hence

    u2(x) = ∫ ln x dx = x ln x − x.

Then the particular solution is

    yp(x) = u1(x)y1(x) + u2(x)y2(x)
          = ((1/4)x² − (1/2)x² ln x)(e^x) + (x ln x − x)(xe^x)
          = (1/2)x² e^x ln x − (3/4)x² e^x,

and hence the general solution of the nonhomogeneous differential equation is

    y(x) = C1 e^x + C2 xe^x + (1/2)x² e^x ln x − (3/4)x² e^x.
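
The steps of Key Concept 10.37 can be followed verbatim in a computer algebra system; a sketch for Example 10.38 (assuming sympy):

```python
# Sketch (assumes sympy): variation of parameters for y'' - 2y' + y = exp(x)*ln(x).
import sympy as sp

x = sp.symbols('x', positive=True)
y1, y2 = sp.exp(x), x*sp.exp(x)      # step 1: from the complementary solution
g = sp.exp(x)*sp.log(x)

W = sp.simplify(y1*y2.diff(x) - y2*y1.diff(x))     # step 2: Wronskian = exp(2x)
u1 = sp.integrate(sp.simplify(-y2*g/W), x)         # steps 3-4 (constants omitted)
u2 = sp.integrate(sp.simplify(y1*g/W), x)
yp = sp.simplify(u1*y1 + u2*y2)                    # step 5
print(W, yp)  # expect yp equivalent to x**2*exp(x)*(2*log(x) - 3)/4
```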

Remark 10.39. It should be obvious from this example that it would


have been extremely difficult (if not impossible) to “guess” the form of the
particular solution y p .

10.7 Initial and boundary conditions

As mentioned earlier, the values of the integration constants that


arise when we solve differential equations can be determined by
making use of other conditions (or restrictions) placed on the prob-
lem. If there are n unknown constants then we will need n extra
conditions.
If all of the extra conditions are given at one value of the inde-
pendent variable then the extra conditions are called initial condi-
tions and the combined differential equation plus initial conditions
is called an initial value problem.
If the extra conditions are given at different values of the inde-
pendent variable then they are called boundary conditions and the
combined differential equation plus boundary conditions is called a
boundary value problem.

Example 10.40. Solve the initial value problem

    d²y/dx² − 5 dy/dx + 4y = 7 cos(3x),   y(0) = 1,   y′(0) = 2.

Solution: We have already seen this differential equation in Example 10.32 and determined that its general solution is given by

    y(x) = C1 e^x + C2 e^{4x} − (7/50) cos(3x) − (21/50) sin(3x).

The initial conditions will give two equations to solve for the unknowns, C1 and C2. Firstly,

    y(0) = 1   ⇒   1 = C1 + C2 − 7/50.        (10.16)

The second initial condition involves the derivative, so:

    dy/dx = C1 e^x + 4C2 e^{4x} + (21/50) sin(3x) − (63/50) cos(3x),

and the second initial condition then gives

    y′(0) = 2   ⇒   2 = C1 + 4C2 − 63/50.        (10.17)

Solving the pair of algebraic equations 10.16 and 10.17 gives

    C1 = 13/30   and   C2 = 53/75,

so the required solution is

    y(x) = (13/30)e^x + (53/75)e^{4x} − (7/50) cos(3x) − (21/50) sin(3x).

Example 10.41. Solve the boundary value problem

    d²y/dx² + 6 dy/dx + 9y = 4x² + 5,   y(0) = 7,   y(1) = −3.

Solution: We have already seen this differential equation in Example 10.31 and determined that its general solution is given by

    y(x) = C1 e^{−3x} + C2 xe^{−3x} + (4/9)x² − (16/27)x + 23/27.

The boundary conditions give two equations to solve for the unknowns, C1 and C2:

    y(0) = 7   ⇒   7 = C1 + 23/27,
    y(1) = −3   ⇒   −3 = C1 e^{−3} + C2 e^{−3} + 4/9 − 16/27 + 23/27.

Solving this pair of algebraic equations gives

    C1 = 166/27   and   C2 = −(166 + 100e³)/27,

so the required solution is

    y(x) = (166/27)e^{−3x} − ((166 + 100e³)/27)xe^{−3x} + (4/9)x² − (16/27)x + 23/27.
11 Laplace transforms

Laplace transforms represent a powerful method for tackling vari-


ous problems that arise in engineering and physical sciences. Most
often they are used for solving differential equations that cannot
be solved via standard methods. An introduction to the concepts
of and the language relating to Laplace transforms is our plan for
this chapter. More advanced theory and uses of the transform are
postponed until later units.

11.1 The Laplace transform and its inverse

We begin with a definition of the Laplace transform of a scalar func-


tion f (t) defined for t ≥ 0.

Definition 11.1. (Laplace transform)


Given a function f (t) defined for all t ≥ 0, the Laplace transform (LT)
of f (t) is the function

    F(s) = ∫_0^∞ e^{−st} f(t) dt

defined for all s ∈ R for which the above improper integral is convergent.
We often write F (s) as L( f ), or, more precisely L( f )(s).

It is worth remarking that here we are following traditional nota-


tion and denoting the variable of the initial function t (this is moti-
vated by regarding the function f (t) as defined for all ‘time’ t ≥ 0).
Performing the transformation will of course yield a function F (s)
and the usual designation of this Laplace transform variable is s
(although some texts might use p instead). Lastly, we point out
that the Laplace transforms of functions f (t), g(t), h(t), etc. are
normally denoted by their corresponding capital letters F (s), G (s),
H (s), etc.
If F = L( f ) is the Laplace transform of f (t), we say that f (t) is
the inverse Laplace transform (ILT) of F (s), written as f = L−1 ( F ). In
slightly cumbersome terms, this is saying that the inverse transform
of F (s) is that function f (t) whose Laplace transform is F (s).

We now determine the Laplace transforms of some simple func-


tions.

Example 11.2. If f(t) = 1 for t ≥ 0, then for s > 0

    F(s) = ∫_0^∞ e^{−st} dt = [−e^{−st}/s]_0^∞ = −(1/s) lim_{t→∞} e^{−st} + 1/s = 1/s.

(Throughout this chapter we use the following notational convention: if for a function f(t) the improper integral ∫_c^∞ f(t) dt exists and g(t) is an antiderivative for f(t), then we write [g(t)]_c^∞ for lim_{t→∞} (g(t) − g(c)).)

This is so since lim_{t→∞} e^{−st} = 0 for s > 0 (notice this limit does not exist if s ≤ 0). Thus, the integral exists for s > 0, giving that

    L(1) = 1/s.

Notice that here F(s) is not defined for all real values of s, just for s > 0. The definition of the ILT now implies that

    L^{−1}(1/s) = 1.

Example 11.3. For f(t) = t^n for some integer n ≥ 0 we have

    L(t^n) = ∫_0^∞ e^{−st} t^n dt.

Substituting u = ts gives

    L(t^n) = ∫_0^∞ e^{−u} (u/s)^n du/s = (1/s^{n+1}) ∫_0^∞ u^n e^{−u} du = n!/s^{n+1}   (for s > 0)

where the integral can be evaluated using the principle of mathematical induction and integration by parts.

Example 11.4. Consider f(t) = e^{at} for t ≥ 0, where a is a constant. Then for s > a we have

    F(s) = ∫_0^∞ e^{−st} e^{at} dt = [e^{(a−s)t}/(a − s)]_0^∞
         = −1/(a − s) + (1/(a − s)) lim_{t→∞} e^{(a−s)t} = 1/(s − a).

Thus, the integral exists for s > a (note F(s) does not exist for s ≤ a) and

    L(e^{at}) = 1/(s − a)   (s > a).

Hence we can deduce that

    L^{−1}(1/(s − a)) = e^{at}   (t ≥ 0).

For a = 0 this result is consistent with Example 11.2.



Example 11.5. Let f(t) = sin(at) for some a ≠ 0, and let F = L(f). Notice that for s > 0 we have lim_{t→∞} e^{−st} sin(at) = 0 (by the Squeeze Theorem); similarly, lim_{t→∞} e^{−st} cos(at) = 0. Using this and two integrations by parts, we get

    F(s) = ∫_0^∞ e^{−st} sin(at) dt = −(1/s) ∫_0^∞ (e^{−st})′ sin(at) dt
         = −(1/s)[e^{−st} sin(at)]_0^∞ + (a/s) ∫_0^∞ e^{−st} cos(at) dt
         = 0 − (a/s²) ∫_0^∞ (e^{−st})′ cos(at) dt
         = −(a/s²)[e^{−st} cos(at)]_0^∞ − (a²/s²) ∫_0^∞ e^{−st} sin(at) dt
         = a/s² − (a²/s²) F(s).

This gives an equation for F(s):

    F(s) = a/s² − (a²/s²) F(s).

It is a matter of simple algebra to rearrange to deduce that

    L(sin(at)) = F(s) = a/(s² + a²)   (s > 0).

It is then immediately obvious that

    L^{−1}(a/(s² + a²)) = sin(at).

Exercise 11.1.1. Use similar methods to show that

    L(cos(at)) = s/(s² + a²)   and   L^{−1}(s/(s² + a²)) = cos(at)   for s > 0.
These are just a few of the more straightforward examples of
the Laplace transform. To obtain others we can use some of the
properties of the Laplace transform operation.
Exercise 11.1.2. Use integration by parts to show that for any constants a > 0 and ω ∈ R we have

(a) L(e^{at} sin(ωt)) = ω/((s − a)² + ω²), for s > a

(b) L(e^{at} cos(ωt)) = (s − a)/((s − a)² + ω²), for s > a

(Hint: Write down the definition of the Laplace transform in each case. A suitable substitution will reduce the integrals to those in Example 11.5, thereby circumventing the need to do pages of laborious calculation.)
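
These transforms can be reproduced by sympy's laplace_transform; the sketch below (assuming sympy, with noconds=True suppressing the convergence conditions) matches Examples 11.2–11.5, though the results may be printed in an algebraically equivalent form.

```python
# Sketch (assumes sympy): basic Laplace transforms.
import sympy as sp

t, s, a = sp.symbols('t s a', positive=True)

print(sp.laplace_transform(sp.S(1), t, s, noconds=True))       # 1/s
print(sp.laplace_transform(t**3, t, s, noconds=True))          # 6/s**4
print(sp.laplace_transform(sp.exp(a*t), t, s, noconds=True))   # 1/(s - a), for s > a
print(sp.laplace_transform(sp.sin(a*t), t, s, noconds=True))   # a/(a**2 + s**2)
```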

11.1.1 Linearity of the Laplace transform


The first part of the theorem below is an immediate consequence
of the definition of the Laplace transform. The second part follows
immediately from the first one.

Theorem 11.6. 1. If the Laplace transforms L(f)(s) and L(g)(s) of two functions f(t) and g(t) exist for s ≥ a for some a ∈ R, then for any constants α ∈ R and β ∈ R we have

    L(αf + βg)(s) = αL(f)(s) + βL(g)(s)

for s ≥ a.

2. Let F(s) and G(s) be functions. If the inverse Laplace transforms f(t) = L^{−1}(F)(t) and g(t) = L^{−1}(G)(t) exist, then for any constants α ∈ R and β ∈ R we have

    L^{−1}(αF + βG)(t) = αL^{−1}(F)(t) + βL^{−1}(G)(t).

(This theorem says that both the Laplace transform and the inverse Laplace transform act as linear transformations on the space of functions.)

Example 11.7. Find L^{−1}(1/(s(s − 1))).

Solution: We decompose in partial fractions: 1/(s(s − 1)) = 1/(s − 1) − 1/s. Hence

    L^{−1}(1/(s(s − 1))) = L^{−1}(1/(s − 1) − 1/s) = L^{−1}(1/(s − 1)) − L^{−1}(1/s) = e^t − 1.

Exercise 11.1.3. Use the linearity of the Laplace transform and some of the above examples to find the Laplace transforms of:
(a) f(t) = cos t − sin t
(b) f(t) = t² − 3t + 5
(c) f(t) = 3e^{−t} + sin(6t)

Exercise 11.1.4. Use the linearity of the inverse Laplace transform and some of the above examples to find the inverse Laplace transforms of:
(a) F(s) = −2/(s + 16), s > −16
(b) F(s) = 4s/(s² − 9), s > 3
(c) F(s) = 3/(s − 7) + 1/s², s > 7

11.1.2 Existence of Laplace transforms


Recall that a function f ( x ) is called piecewise continuous on a given
interval [ a, b] if f has only finitely many points of discontinuity in
[ a, b]. Piecewise continuous functions possess a Laplace transform if
they are of exponential order:

Definition 11.8. (Exponential order)


A function f (t), t ≥ 0, is of exponential order if f (t) is piecewise
continuous and bounded on every interval [0, T ] with T > 0 and there
exist constants M > 0 and γ ∈ R such that

| f (t)| ≤ Meγt for all t ≥ 0.

When this holds we will say that the exponential order of f is ≤ γ.



Given a function is of exponential order ≤ γ we can then deduce


for what values of s its Laplace transform is defined:

Theorem 11.9. If f (t) is of exponential order ≤ γ, then the Laplace


transform F (s) = L( f )(s) exists for all s > γ.

The Laplace transform of a given function is unique. Conversely,


if two functions have the same Laplace transform then they can
differ only at isolated points.

Example 11.10. For f (t) = e at , we saw in Example 11.4 that the trans-
form exists for s > a. This is consistent with Theorem 11.9 since f (t)
is of exponential order ≤ a: taking M = 1 and γ = a we see that
| f (t)| ≤ Me at for all t ≥ 0.
Example 11.11. For f(t) = e^{t²} there are no M and γ for which e^{t²} ≤ Me^{γt} for all t ≥ 0. In very informal terms, e^{t²} grows more quickly than e^{γt} for any γ. The Laplace transform L(e^{t²}) does not exist in this case. This example proves that not every well-defined function necessarily has a Laplace transform.

11.2 Inverse Laplace transforms of rational functions

If the Laplace transform F(s) = L(f) of some function f(t) has the special form

    F(s) = P(s)/Q(s),

where P(s) and Q(s) are polynomials with deg(P(s)) < deg(Q(s)), then we can find the inverse Laplace transform f(t) = L^{−1}(F) using partial fractions, which were covered in MATH1011 and are summarised in the Appendix.
Notice that at this stage we know the inverse Laplace transforms of the following basic rational functions (see Examples 11.4, 11.5 and Exercise 11.1.1)

    L^{−1}(1/(s − a)) = e^{at}   for s > a

    L^{−1}(1/(s² + a²)) = (1/a) sin(at),   for s > 0

    L^{−1}(s/(s² + a²)) = cos(at),   for s > 0.

To recall the method of partial fractions and demonstrate how it


applies to problems involving inverse Laplace transforms, we look
at two examples.
Example 11.12. Suppose F(s) = (2s − 1)/((s² − 1)(s + 3)), s > 1, and we want to find f(t) = L^{−1}(F). First, using partial fractions, we write

    F(s) = (2s − 1)/((s − 1)(s + 1)(s + 3)) = A/(s − 1) + B/(s + 1) + C/(s + 3).

This is equivalent to

    2s − 1 = A(s + 1)(s + 3) + B(s − 1)(s + 3) + C(s − 1)(s + 1).

From this with s = 1 we get A = 1/8. Similarly, s = −1 gives B = 3/4, while s = −3 implies C = −7/8. Thus,

    F(s) = 1/(8(s − 1)) + 3/(4(s + 1)) − 7/(8(s + 3))

and therefore

    f(t) = L^{−1}(F(s))
         = (1/8)L^{−1}(1/(s − 1)) + (3/4)L^{−1}(1/(s + 1)) − (7/8)L^{−1}(1/(s + 3))
         = (1/8)e^t + (3/4)e^{−t} − (7/8)e^{−3t}.

Example 11.13. Suppose F(s) = (2s² − s + 4)/(s³ + 4s), for s ≥ 0. To find f(t) = L^{−1}(F), we first use partial fractions:

    F(s) = (2s² − s + 4)/(s(s² + 4)) = A/s + (Bs + C)/(s² + 4).

This is equivalent to

    2s² − s + 4 = A(s² + 4) + (Bs + C)s = (A + B)s² + Cs + 4A,

so we must have A + B = 2, C = −1 and 4A = 4. This gives A = 1, B = 1, C = −1.
Thus,

    F(s) = 1/s + (s − 1)/(s² + 4) = 1/s + s/(s² + 4) − 1/(s² + 4)

and therefore

    f(t) = L^{−1}(F) = L^{−1}(1/s) + L^{−1}(s/(s² + 4)) − L^{−1}(1/(s² + 4))
         = 1 + cos(2t) − (1/2) sin(2t).

Exercise 11.2.1. Use partial fractions to find the inverse Laplace transforms of:
(a) F(s) = −2s/((s + 1)(s² + 1))
(b) F(s) = 1/(s⁴ − 16)
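
A sketch (assuming sympy) of Example 11.12 done by machine: apart() performs the partial fraction decomposition and inverse_laplace_transform inverts it (sympy attaches a Heaviside(t) factor, which equals 1 for t > 0).

```python
# Sketch (assumes sympy): inverse Laplace transform via partial fractions.
import sympy as sp

t, s = sp.symbols('t s', positive=True)
F = (2*s - 1)/((s**2 - 1)*(s + 3))

print(sp.apart(F, s))
# expected: 1/(8*(s - 1)) + 3/(4*(s + 1)) - 7/(8*(s + 3))
print(sp.inverse_laplace_transform(F, s, t))
# expected: exp(t)/8 + 3*exp(-t)/4 - 7*exp(-3*t)/8 (possibly times Heaviside(t))
```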

11.3 The Laplace transform of derivatives and integrals of f (t)

Later we shall show that some of the most important applications


of Laplace transforms are to the solutions of differential equa-
tions. To that end it is important to know the forms of the Laplace
transforms of the derivatives and integral of f (t). The form of the
Laplace transform of f 0 (t) is given in the following theorem:

Theorem 11.14. If f(t) is continuous and of exponential order ≤ γ and if f′(t) exists and is piecewise continuous and bounded over [0, T] for all T ≥ 0, then the Laplace transform of f′(t) exists for s > γ and

    L(f′)(s) = sL(f)(s) − f(0).

Proof. This is easy to verify using integration by parts. Indeed, if we denote by F(s) and G(s) the Laplace transforms of f(t) and f′(t), respectively, then

    G(s) = ∫_0^∞ e^{−st} f′(t) dt = [e^{−st} f(t)]_0^∞ + s ∫_0^∞ e^{−st} f(t) dt
         = −f(0) + lim_{t→∞} e^{−st} f(t) + sF(s) = sF(s) − f(0)

since for s > γ we have |e^{−st} f(t)| ≤ Me^{(γ−s)t} → 0 as t → ∞.

It is worth pausing at this juncture and spending a few moments


reflecting on this result. There are a few very important proper-
ties that need to be appreciated before we proceed. First, note how
G (s) = L( f 0 )(s) is a multiple of F (s) minus a constant. Before
working through Theorem 11.14 we might well have expected that
the Laplace transform of f 0 (t) would have involved the derivative
dF
of the Laplace transform of f (t), in other words . But that clearly
ds
is not the case; G (s) is related to F (s) by simple algebraic multipli-
cation and this is the first clue as to the usefulness of the Laplace
transform in solving differential equations. Laplace transforms of
derivatives of y(t) are changed to algebraic multiples of its Laplace
transform Y (s) and, as we shall see, this means that ultimately the
solution of the problem reduces to a task of solving an algebraic
equation which is normally a much easier prospect than analysing
the original differential equation.
The second aspect of note in the result of Theorem 11.14 is the
presence of the value f (0). This too is not expected – normally
when dealing with the derivative of a function we are not con-
cerned with the value of the function itself at any given point. But
that is not the case for the Laplace transform; in order to find the
Laplace transform of f 0 (t) completely some knowledge of f (0) is
required.

Exercise 11.3.1. Use the formula for the Laplace transform of a derivative
to find:
(a) L(te at )
(b) L(tn e at )

Given the technique used to deduce the form of G (s) we can re-
peat the process to obtain the Laplace transforms of higher deriva-
tives of f (t) (provided some regularity conditions are satisfied). For
example

L( f 00 )(s) = sL( f 0 )(s) − f 0 (0) = s[sL( f )(s) − f (0)] − f 0 (0)



so that
L( f 00 )(s) = s2 L( f )(s) − s f (0) − f 0 (0).
Similarly,

L( f 000 )(s) = s3 L( f )(s) − s2 f (0) − s f 0 (0) − f 00 (0)

and, more generally (this can be proved using the principle of


mathematical induction),

L( f (n) )(s) = sn L( f )(s) − sn−1 f (0) − · · · − s f (n−2) (0) − f (n−1) (0).

Once again we remark that L( f (n) )(s) involves no derivatives of


F (s) at all; it is given by a multiple sn of F (s) plus a polynomial of
degree n − 1 in s. The coefficients of this polynomial involve the
values of the first n − 1 derivatives of f (t) at t = 0.
Example 11.15. The above can be used to find L(sin(at)) by an alternative route than that taken in Example 11.5. Let f(t) = sin(at). Then f(t) is continuous and of exponential order ≤ 0 (since |f(t)| ≤ 1·e^{0t}) and f′(t) is continuous, so we can apply Theorem 11.14. We have f″(t) = −a²f(t) and so

    −a²L(f) = L(f″) = s²L(f) − sf(0) − f′(0), for s > 0.

Collecting the two terms involving L(f),

    (a² + s²)L(f) = sf(0) + f′(0)

and, using that f(0) = 0 and f′(0) = a, we obtain

    L(f) = a/(s² + a²), for s > 0.

Using all of the techniques above (and some more to come


later) we can construct a table of Laplace transforms of frequently-
encountered functions. Such a table is provided on page 207.
Exercise 11.3.2. Use the formula for the Laplace transform of a double derivative to show that
(a) L(t sin(ωt)) = 2ωs/(s² + ω²)²
(b) L(t cos(ωt)) = (s² − ω²)/(s² + ω²)²
Next we consider how one can derive a formula for the Laplace
transform of an integral.
Theorem 11.16. If f(t) is of exponential order (so that L(f) exists) then g(t) = ∫_0^t f(u) du is of exponential order ≤ γ for some γ. Moreover, for s > γ and s ≠ 0 we have

    L(g)(s) = (1/s)L(f)(s).

In other words, if the Laplace transform of f(t) is F(s), then

    L(g)(s) = F(s)/s   and   L^{−1}(F(s)/s) = ∫_0^t L^{−1}(F)(u) du.

Sketch of Proof. Denote the Laplace transform of g(t) by G(s). By definition of g(t), it is continuous and it can be proved that it is of exponential order, say ≤ γ. Since g′(t) = f(t) by the Fundamental Theorem of Calculus and since g(0) = 0, Theorem 11.14 implies (for s > γ)

    F(s) = L(f)(s) = L(g′)(s) = sL(g)(s) − g(0) = sG(s).

Thus, for s > γ, s ≠ 0, we have G(s) = F(s)/s.


This can be particularly useful in helping to determine the in-
verse transform of functions which have a factor s appearing in the
denominator.

Example 11.17. Find the inverse Laplace transform g(t) of G(s) = 1/(s(s² + ω²)), using Theorem 11.16. (We could also use the partial fraction method.)

Solution: Notice that G(s) = F(s)/s, where for F(s) = 1/(s² + ω²) we know

    f(t) = L^{−1}(F) = (1/ω) sin(ωt).

Then Theorem 11.16 yields

    g(t) = L^{−1}(G(s)) = ∫_0^t f(u) du = (1/ω) ∫_0^t sin(ωu) du = (1/ω²)[1 − cos(ωt)].

Exercise 11.3.3. Use the formula for the Laplace transform of an integral to find L^{−1}(1/(s(s + 3))) and L^{−1}(1/(s²(s + 3))) (no partial fractions required).

11.4 Solving differential equations

Laplace transforms can be applied to initial-value problems for


linear ordinary differential equations by reducing them to the task
of solving an algebraic equation.
However it should be realised that Laplace transform meth-
ods will only be able to detect solutions of ordinary differential
equations that have Laplace transforms. While most solutions will
satisfy this requirement, not all will. We saw in Example 11.11 that f(t) = e^{t²} does not have a Laplace transform but this is a solution of the differential equation

    dy/dt − 2ty = 0;
we could not therefore expect to derive a meaningful solution of
this equation using the Laplace transform. We need to bear in mind

that although the Laplace transform will find most solutions of


differential equations there are isolated cases when it will fail.
Despite this caution, it can be shown that all solutions of constant coefficient differential equations are of exponential order so we can use Laplace transform methods to seek a solution y(t) of

    y″(t) + ay′(t) + by(t) = r(t),   t ≥ 0

where a, b are constants and r(t) is a given function, such that y(t) satisfies the initial conditions

    y(0) = K0,   y′(0) = K1.

To solve this, first transform the differential equation, writing

    L(y″) + aL(y′) + bL(y) = R(s)

where R(s) = L(r)(s). In terms of Y(s) = L(y), this gives the equation

    [s²Y(s) − sy(0) − y′(0)] + a[sY(s) − y(0)] + bY(s) = R(s).

This can be written in the form

    (s² + as + b)Y(s) = R(s) + (s + a)y(0) + y′(0) = R(s) + (s + a)K0 + K1

so we have

    Y(s) = (R(s) + (s + a)K0 + K1)/(s² + as + b).

Therefore

    y(t) = L^{−1}(Y) = L^{−1}((R(s) + (s + a)K0 + K1)/(s² + as + b)).
This method will become clearer with some examples.
Example 11.18. Solve the initial value problem

    y″(t) − y(t) = t,   y(0) = 1,   y′(0) = 1.

Solution: Applying the Laplace transform and denoting Y(s) = L(y), we get

    [s²Y(s) − sy(0) − y′(0)] − Y(s) = L(t) = 1/s².

Using the initial conditions y(0) = y′(0) = 1, we write it in the form

    (s² − 1)Y = s + 1 + 1/s².

Solving for Y(s) gives

    Y(s) = (s + 1)/(s² − 1) + 1/(s²(s² − 1)) = (s³ + s² + 1)/(s²(s − 1)(s + 1))
         = −1/s² + (3/2)·1/(s − 1) − (1/2)·1/(s + 1)

using partial fractions. So, from the table of Laplace transforms,

    y(t) = L^{−1}(Y)(t) = −L^{−1}(1/s²) + (3/2)L^{−1}(1/(s − 1)) − (1/2)L^{−1}(1/(s + 1))
         = −t + (3/2)e^t − (1/2)e^{−t}.
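
The same steps can be scripted; the sketch below (assuming sympy) transforms the equation of Example 11.18, solves the resulting algebraic equation for Y(s), and inverts.

```python
# Sketch (assumes sympy): Laplace-transform solution of y'' - y = t,
# y(0) = 1, y'(0) = 1.
import sympy as sp

t, s = sp.symbols('t s', positive=True)
Y = sp.symbols('Y')

R = sp.laplace_transform(t, t, s, noconds=True)   # L(t) = 1/s**2
transformed = sp.Eq((s**2*Y - s*1 - 1) - Y, R)    # uses y(0) = 1, y'(0) = 1
Ysol = sp.solve(transformed, Y)[0]
y = sp.inverse_laplace_transform(Ysol, s, t)
print(sp.simplify(y))  # expected: -t + 3*exp(t)/2 - exp(-t)/2 (times Heaviside(t))
```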

Example 11.19. Solve the initial value problem
$$y^{(4)}(t) - y(t) = 0, \qquad y(0) = 0, \quad y'(0) = 1, \quad y''(0) = y'''(0) = 0.$$

Solution: Let $Y(s) = \mathcal{L}(y)(s)$. Applying the Laplace transform to the DE, we get
$$[s^4 Y(s) - s^3 y(0) - s^2 y'(0) - sy''(0) - y'''(0)] - Y(s) = 0.$$
Using the initial conditions, this gives $s^4 Y(s) - s^2 - Y(s) = 0$, and therefore
$$Y(s) = \frac{s^2}{s^4 - 1}.$$
To find $y(t) = \mathcal{L}^{-1}(Y)$ we need to find a convenient partial fraction expansion for $Y(s)$. The following will be adequate:
$$Y(s) = \frac{s^2}{(s-1)(s+1)(s^2+1)} = \frac{1}{4}\cdot\frac{1}{s-1} - \frac{1}{4}\cdot\frac{1}{s+1} + \frac{1}{2}\cdot\frac{1}{s^2+1}.$$
Hence
$$y(t) = \mathcal{L}^{-1}(Y) = \frac{1}{4}\mathcal{L}^{-1}\left(\frac{1}{s-1}\right) - \frac{1}{4}\mathcal{L}^{-1}\left(\frac{1}{s+1}\right) + \frac{1}{2}\mathcal{L}^{-1}\left(\frac{1}{s^2+1}\right) = \frac{1}{4}e^t - \frac{1}{4}e^{-t} + \frac{1}{2}\sin t.$$
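As a sanity check of Example 11.19 (again a SymPy sketch of my own, not from the notes), the answer can be substituted back into the fourth-order equation and the initial conditions.

import sympy as sp

t = sp.symbols('t')
y = sp.exp(t)/4 - sp.exp(-t)/4 + sp.sin(t)/2

print(sp.simplify(sp.diff(y, t, 4) - y))                   # 0: satisfies y'''' - y = 0
print([sp.diff(y, t, k).subs(t, 0) for k in range(4)])     # [0, 1, 0, 0]: matches the initial data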

An advantage of this technique is that it is not necessary to solve


for the general solution of the homogeneous differential equation
and then determine the arbitrary constants in that solution. Apart
from this it can also be used for higher order differential equations,
as we saw in Example 11.19.

Exercise 11.4.1. Solve the initial value problems using the Laplace transform:
(a) $y'(t) - 9y(t) = t$, $y(0) = 5$
(b) $y''(t) - 4y'(t) + 4y(t) = \cos t$, $y(0) = 1$, $y'(0) = -1$
(c) $y''(t) - 5y'(t) + 6y(t) = e^{-t}$, $y(0) = 0$, $y'(0) = 2$
(d) $y^{(4)}(t) - 4y(t) = 0$, $y(0) = 1$, $y'(0) = 0$, $y''(0) = -2$, $y'''(0) = 0$

11.5 Shift theorems

We have now seen the general strategy for solving differential equa-
tions using Laplace transforms; we transform the differential prob-
lem to an algebraic one for Y (s) and then, given our knowledge of
inverse Laplace transforms, we attempt to reconstruct the form of
y(t). It is this last step that is potentially the tricky one for there is
always the possibility that Y (s) is of a form we do not recognise.
The situation gets worse. It is relatively straightforward to find
the Laplace transform of a function, inasmuch as given an $f(t)$
we can, at least in principle, compute $F(s)$ using the definition of
the Laplace transform; unfortunately, there is no equally easy
definition for going in the reverse direction (i.e. given $F(s)$, deduce

f (t)). Thus it is of importance to expand our repertoire of easily


identifiable inverse functions and this is facilitated using two so-
called shift theorems.

Theorem 11.20. If $F(s)$ is the Laplace transform of $f(t)$ for $s > b$, then the Laplace transform of $e^{at} f(t)$ is
$$\mathcal{L}(e^{at} f(t)) = F(s - a)$$
for $s - a > b$. Equivalently,
$$\mathcal{L}^{-1}(F(s - a)) = e^{at} f(t).$$
(Note that there is no restriction on $a$ in this theorem: $a$ can be positive or negative.)

Proof. We have
$$\mathcal{L}\left(e^{at} f(t)\right) = \int_0^\infty e^{-st} e^{at} f(t)\,dt = \int_0^\infty e^{-(s-a)t} f(t)\,dt = F(s - a)$$
which proves the statement.

This is called s-shifting, as the graph of the function F (s − a) is


obtained from that of F (s) by shifting a units (to the right if a > 0
and to the left if a < 0) on the s-axis. Putting this result in words it
tells us that if the Laplace transform of f (t) is F (s), then the shifted
function $F(s - a)$ is the transform of $e^{at} f(t)$.

Example 11.21. Find the Laplace transform of $e^{at} t^n$.

Solution: Recall Example 11.3: $\mathcal{L}(t^n)(s) = \dfrac{n!}{s^{n+1}}$ for $s > 0$. Using this and Theorem 11.20 we get
$$\mathcal{L}(e^{at} t^n)(s) = \mathcal{L}(t^n)(s - a) = \frac{n!}{(s-a)^{n+1}} \qquad (s > a).$$
For example,
$$\mathcal{L}(e^{2t} t^4)(s) = \mathcal{L}(t^4)(s - 2) = \frac{4!}{(s-2)^5} \qquad (s > 2).$$
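Shift rules like this are easy to sanity-check with a computer algebra system. A small SymPy sketch (my own illustration, not from the notes) confirming the last transform:

import sympy as sp

t, s = sp.symbols('t s', positive=True)

F = sp.laplace_transform(sp.exp(2*t)*t**4, t, s, noconds=True)
print(sp.simplify(F - 24/(s - 2)**5))   # 0, i.e. F(s) = 4!/(s-2)^5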

Example 11.22. Find the Laplace transform of $e^{at} \cos(\omega t)$.

Solution: The Laplace transform of $f(t) = \cos(\omega t)$ is $F(s) = \dfrac{s}{s^2 + \omega^2}$ $(s > 0)$. Hence
$$\mathcal{L}\left(e^{at} \cos(\omega t)\right) = \mathcal{L}(\cos(\omega t))(s - a) = \frac{s - a}{(s-a)^2 + \omega^2} \qquad (s > a).$$

Exercise 11.5.1. Find the Laplace transform of the functions
(a) $(t^3 - 3t + 2)e^{-2t}$
(b) $e^{4t}(t - \cos t)$
Example 11.23. Find the inverse Laplace transform of $\dfrac{1}{(s-a)^n}$.

Solution: Notice $\dfrac{1}{(s-a)^n} = F(s - a)$ for $F(s) = \dfrac{1}{s^n}$. We know that
$$f(t) = \mathcal{L}^{-1}(F)(t) = \frac{t^{n-1}}{(n-1)!}.$$
It follows that, for any integer $n \ge 1$, we have
$$\mathcal{L}^{-1}\left(\frac{1}{(s-a)^n}\right)(t) = e^{at} f(t) = \frac{e^{at} t^{n-1}}{(n-1)!}.$$
For example,
$$\mathcal{L}^{-1}\left(\frac{1}{(s+2)^3}\right)(t) = \frac{e^{-2t} t^2}{2!} = \frac{t^2 e^{-2t}}{2}.$$
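A quick forward check of that last inverse (a SymPy sketch, not part of the notes): transforming $t^2 e^{-2t}/2$ should give back $1/(s+2)^3$.

import sympy as sp

t, s = sp.symbols('t s', positive=True)

F = sp.laplace_transform(t**2*sp.exp(-2*t)/2, t, s, noconds=True)
print(sp.simplify(F - 1/(s + 2)**3))   # 0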

Using s-shifting, we can find the inverse Laplace transform of any function of the form
$$G(s) = \frac{as + b}{ps^2 + qs + r},$$
where $ps^2 + qs + r = 0$ has no real roots.

Example 11.24. Find the inverse Laplace transform of
$$G(s) = \frac{1}{s^2 - 4s + 7} = \frac{1}{(s-2)^2 + 3}.$$
Solution: We have $G(s) = F(s - 2)$, where $F(s) = \dfrac{1}{s^2 + 3}$, so
$$\mathcal{L}^{-1}(F)(t) = \frac{1}{\sqrt{3}}\sin(\sqrt{3}\,t).$$
By Theorem 11.20, $\mathcal{L}^{-1}(G)(t) = \dfrac{1}{\sqrt{3}}\,e^{2t}\sin(\sqrt{3}\,t)$.
Here is a more complicated example.

Example 11.25. Find the inverse Laplace transform of
$$G(s) = \frac{2s}{s^2 + 2s + 5} = \frac{2s}{(s+1)^2 + 4}.$$
Solution: We use a similar method to the previous example.
$$\mathcal{L}^{-1}(G)(t) = \mathcal{L}^{-1}\left(\frac{2(s+1) - 2}{(s+1)^2 + 4}\right)
= e^{-t}\,\mathcal{L}^{-1}\left(\frac{2s - 2}{s^2 + 4}\right)
= e^{-t}\,\mathcal{L}^{-1}\left(2\,\frac{s}{s^2 + 2^2} - \frac{2}{s^2 + 2^2}\right)
= e^{-t}\left[2\cos(2t) - \sin(2t)\right].$$
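Again this can be checked in SymPy by transforming the answer forwards (my own sketch, not from the notes):

import sympy as sp

t, s = sp.symbols('t s', positive=True)

g = sp.exp(-t)*(2*sp.cos(2*t) - sp.sin(2*t))
G = sp.laplace_transform(g, t, s, noconds=True)
print(sp.simplify(G - 2*s/(s**2 + 2*s + 5)))   # 0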

The second shifting theorem is related to the so-called Heaviside function $H(t)$ defined by
$$H(t) = \begin{cases} 0, & t < 0 \\ 1, & t \ge 0 \end{cases}$$

This is also called the unit step function.


Notice that for any a ∈ R the graph of H (t − a) is obtained from
the graph of H (t) by shifting a units (to the right if a > 0 and to the
left if a < 0), that is:

$$H(t - a) = \begin{cases} 0, & t < a \\ 1, & t \ge a \end{cases}$$

We point out that multiplying a given function g(t) by H (t − a)


has the effect of turning the function off until time t = a and then
activating it. More precisely, we have

$$g(t)\,H(t - a) = \begin{cases} 0, & t < a \\ g(t), & t \ge a \end{cases}$$

Multiplication by the pulse function H (t − a) − H (t − b), where


a < b, has the effect of a switch. This function has value one for
a ≤ t < b and is zero for times t < a and t ≥ b. Thus the application
of this function is equivalent to turning on a switch at t = a then
turning it off again at a later t = b.


$$g(t)\left[H(t - a) - H(t - b)\right] = \begin{cases} 0, & t < a \\ g(t), & a \le t < b \\ 0, & t \ge b \end{cases}$$

Because the Heaviside function is so important in real problems, it is helpful to note the result of the following theorem, called the t-shifting theorem.

Theorem 11.26. If the Laplace transform of $f(t)$ is $F(s)$ for $s > b$, then for any $a \ge 0$ we have
$$\mathcal{L}\left[f(t - a) H(t - a)\right] = e^{-as} F(s)$$
for $s > b$. Consequently,
$$\mathcal{L}^{-1}\left(e^{-as} F(s)\right) = f(t - a) H(t - a).$$
(There is a restriction on $a$ in this theorem: these results are not valid if $a$ is negative.)




You can try to prove this result (it involves the definition of the
Laplace transform of H and one change of coordinate).

Example 11.27. Find $\mathcal{L}(H(t - a))$ where $a \ge 0$.

Solution: We take $f(t) = 1$ and apply the theorem. We saw in Example 11.2 that $F(s) = \mathcal{L}(f) = \dfrac{1}{s}$ for $s > 0$. Thus
$$\mathcal{L}(H(t - a)) = \mathcal{L}(f(t - a) H(t - a)) = e^{-as} F(s) = \frac{e^{-as}}{s}, \qquad \text{for } s > 0.$$
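For a concrete value of $a$ (here $a = 3$ is just an illustrative choice) this is easy to confirm with SymPy; the sketch below is mine, not part of the notes.

import sympy as sp

t, s = sp.symbols('t s', positive=True)

F = sp.laplace_transform(sp.Heaviside(t - 3), t, s, noconds=True)
print(sp.simplify(F - sp.exp(-3*s)/s))   # 0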

Example 11.28. Find $\mathcal{L}(g(t))$ where
$$g(t) = \begin{cases} t & \text{if } 0 \le t < 3 \\ 1 - 3t & \text{if } t \ge 3 \end{cases}$$

Solution: We can express $g(t)$ with Heaviside functions:
$$g(t) = t\left[H(t) - H(t - 3)\right] + (1 - 3t) H(t - 3) = tH(t) + (1 - 4t) H(t - 3) = t + (1 - 4t) H(t - 3).$$
(Recall we are only concerned with functions defined on $[0, \infty)$. On that interval, $H(t)$ is nothing else than the constant function 1.)

This is still not in the form required in order to use Theorem 11.26: in the second term we have to write $1 - 4t$ as a function of $t - 3$. Since $t = (t - 3) + 3$, we have $1 - 4t = 1 - 4(t - 3) - 12 = -4(t - 3) - 11$. Thus
$$g(t) = t - \left[4(t - 3) + 11\right] H(t - 3).$$
We now apply Theorem 11.26 with $f(t) = 4t + 11$:
$$\mathcal{L}\left(\left[4(t - 3) + 11\right] H(t - 3)\right) = \mathcal{L}(f(t - 3) H(t - 3))(s) = e^{-3s} F(s) = e^{-3s}\left(\frac{4}{s^2} + \frac{11}{s}\right).$$
Thus
$$\mathcal{L}(g)(s) = \frac{1}{s^2} - e^{-3s}\left(\frac{4}{s^2} + \frac{11}{s}\right).$$
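Piecewise forcing terms are exactly where algebra slips happen, so a numerical check against the definition of the transform is reassuring. The sketch below is my own (it assumes NumPy and SciPy are available; it is not from the notes): it integrates $e^{-st} g(t)$ numerically and compares the result with the formula just derived at a few sample values of $s$.

import numpy as np
from scipy.integrate import quad

def g(t):
    # the piecewise function of Example 11.28
    return t if t < 3 else 1 - 3*t

def numeric_transform(s):
    # split the integral at the jump t = 3; the tail converges for s > 0
    part1, _ = quad(lambda t: np.exp(-s*t)*g(t), 0, 3)
    part2, _ = quad(lambda t: np.exp(-s*t)*g(t), 3, np.inf)
    return part1 + part2

def formula(s):
    return 1/s**2 - np.exp(-3*s)*(4/s**2 + 11/s)

for s in (0.5, 1.0, 2.0):
    print(s, numeric_transform(s), formula(s))   # the two computed columns should agree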

We now solve a differential equation where the right-hand side uses Heaviside functions.

Exercise 11.5.2. Find the Laplace transform of the function
$$f(t) = \begin{cases} 2t + 1 & \text{if } 0 \le t < 2 \\ 2 - 3t & \text{if } t \ge 2. \end{cases}$$

Example 11.29. Find $\mathcal{L}^{-1}\left(\dfrac{e^{-4s}}{s^3}\right)$.

Solution: We apply the theorem with $a = 4$ and $F(s) = \dfrac{1}{s^3}$, so that
$\mathcal{L}^{-1}\left(\dfrac{e^{-4s}}{s^3}\right) = f(t - 4) H(t - 4)$. All we have to do is determine $f(t)$.
From the table we get that $f(t) = \dfrac{1}{2}t^2$, so that
$$\mathcal{L}^{-1}\left(\frac{e^{-4s}}{s^3}\right) = \frac{(t-4)^2}{2}\, H(t - 4) = \begin{cases} 0 & \text{if } t < 4 \\ \dfrac{(t-4)^2}{2} & \text{if } t \ge 4 \end{cases}$$

Exercise 11.5.3. Find the inverse Laplace transforms of
(a) $\dfrac{e^{-s}}{(s-5)^3}$
(b) $\dfrac{se^{-2s}}{s^2 + 9}$

Example 11.30. Solve the initial value problem
$$y'' + y = H(t - 1) - H(t - 2)$$
with initial conditions $y(0) = 0$ and $y'(0) = 1$.

Solution: Taking transforms, $Y(s) = \mathcal{L}(y)$ satisfies
$$s^2 Y(s) - sy(0) - y'(0) + Y(s) = \frac{e^{-s}}{s} - \frac{e^{-2s}}{s}$$
and solving for $Y$ yields that
$$Y(s) = \frac{1}{s^2 + 1} + (e^{-s} - e^{-2s})\frac{1}{s(s^2 + 1)}
= \frac{1}{s^2 + 1} + (e^{-s} - e^{-2s})\left(\frac{1}{s} - \frac{s}{s^2 + 1}\right)
= \frac{1}{s^2 + 1} + e^{-s}\left(\frac{1}{s} - \frac{s}{s^2 + 1}\right) - e^{-2s}\left(\frac{1}{s} - \frac{s}{s^2 + 1}\right).$$
(We used partial fractions here.)

We now need to find the inverse Laplace transform of $Y(s)$. For the last two terms, we will apply Theorem 11.26 with $F(s) = \dfrac{1}{s} - \dfrac{s}{s^2 + 1}$, so that $f(t) = 1 - \cos t$, and with $a = 1$ or $2$. We get
$$y(t) = \sin t + H(t - 1)\left[1 - \cos(t - 1)\right] - H(t - 2)\left[1 - \cos(t - 2)\right].$$
Hence
$$y(t) = \begin{cases} \sin t, & 0 \le t < 1 \\ \sin t + 1 - \cos(t - 1), & 1 \le t < 2 \\ \sin t - \cos(t - 1) + \cos(t - 2), & 2 \le t \end{cases}$$
Note that in this solution both $y$ and $y'$ are continuous at $t = 1$ and $t = 2$.
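Because the forcing switches on and off, it is worth cross-checking the closed form against a direct numerical integration. The following sketch is my own (it assumes NumPy and SciPy; it is not from the notes): it solves the same initial value problem with SciPy's solve_ivp and compares it with the piecewise formula.

import numpy as np
from scipy.integrate import solve_ivp

def rhs(t, z):
    # z = [y, y']; the forcing is the pulse H(t-1) - H(t-2)
    pulse = 1.0 if 1 <= t < 2 else 0.0
    return [z[1], pulse - z[0]]

def exact(t):
    y = np.sin(t)
    if t >= 1:
        y += 1 - np.cos(t - 1)
    if t >= 2:
        y -= 1 - np.cos(t - 2)
    return y

sol = solve_ivp(rhs, (0, 5), [0, 1], max_step=0.01, dense_output=True)
for t in (0.5, 1.5, 3.0, 5.0):
    print(t, float(sol.sol(t)[0]), exact(t))   # numerical and exact values should agree closely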

Exercise 11.5.4. Solve the initial value problem
$$y''(t) - 2y'(t) - 3y(t) = f(t),$$
where
$$f(t) = \begin{cases} 0 & \text{if } 0 \le t < 4 \\ 12 & \text{if } t \ge 4, \end{cases}$$
such that $y(0) = 1$ and $y'(0) = 0$.

11.6 Derivatives of transforms

If $f(t)$ is of exponential order $\le \gamma$ and piecewise continuous and bounded on $[0, T]$ for any $T > 0$, then by Theorem 11.9,
$$F(s) = \mathcal{L}(f)(s) = \int_0^\infty e^{-st} f(t)\,dt$$
exists for $s > \gamma$. Moreover we have the following:



Theorem 11.31. (Derivative of transform) Under the above assumptions, $F'(s)$ exists for all $s > \gamma$, and
$$-F'(s) = \mathcal{L}(t f(t)). \tag{11.1}$$
Consequently,
$$\mathcal{L}^{-1}\left(F'(s)\right) = -t f(t) \qquad (t \ge 0).$$


The proof of this result is omitted here as our focus is on how we


might use Theorem 11.31 to find more function-transform pairs.
Example 11.32. In Exercise 11.3.2, we found the transform of $g(t) = t\sin(\omega t)$ by differentiating twice. We now have an easier method: taking $f(t) = \sin(\omega t)$ in Equation (11.1), so that $F(s) = \mathcal{L}(\sin(\omega t)) = \dfrac{\omega}{s^2 + \omega^2}$, we get
$$\mathcal{L}(t\sin(\omega t)) = -F'(s) = -\frac{d}{ds}\left(\frac{\omega}{s^2 + \omega^2}\right) = \frac{2\omega s}{(s^2 + \omega^2)^2}.$$
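Equation (11.1) is also easy to confirm symbolically. The SymPy sketch below is my own illustration (not from the notes): it compares $-F'(s)$ with the direct transform of $t\sin(\omega t)$.

import sympy as sp

t, s = sp.symbols('t s', positive=True)
w = sp.symbols('omega', positive=True)

F = sp.laplace_transform(sp.sin(w*t), t, s, noconds=True)        # omega/(s**2 + omega**2)
lhs = -sp.diff(F, s)                                             # -F'(s)
rhs = sp.laplace_transform(t*sp.sin(w*t), t, s, noconds=True)    # direct transform of t*sin(omega*t)
print(sp.simplify(lhs - rhs))                                    # 0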

Exercise 11.6.1. Use Theorem 11.31 twice to find the Laplace transform of $f(t) = t^2\cos(\omega t)$.

11.7 Convolution

One last idea relevant to the theory of Laplace transforms is that


of the convolution of two functions. Very often the solution of the
transformed problem can be written in the form Y (s) = F (s) G (s);
it is extremely tempting to suppose that y(t) = f (t) g(t) which
would say that the Laplace transform of a product of two func-
tions is the product of the Laplace transforms of the two functions.
Unfortunately things are not that simple, and instead we require the introduction of a concept known as the convolution.

Definition 11.33. (Convolution)


Given two functions f (t) and g(t), both of them being piecewise continu-
ous and bounded on every finite interval [0, T ], the convolution f ∗ g of f
and g is defined by
$$(f * g)(t) = \int_0^t f(u)\, g(t - u)\,du.$$

Main properties of the convolution:

$f * g = g * f$  (commutative)
$f * (g_1 + g_2) = f * g_1 + f * g_2$  (distributive)
$(f * g) * h = f * (g * h)$  (associative)
$f * 0 = 0 * f = 0$
However, note that f ∗ 1 is not equal to f in general and that
f ∗ f can be negative.
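The warning that $f * 1 \neq f$ is easy to see concretely: convolving with the constant 1 just integrates the function. A one-line SymPy check (my own sketch, not from the notes):

import sympy as sp

t, u = sp.symbols('t u', nonnegative=True)

f = sp.cos(u)
conv_with_one = sp.integrate(f*1, (u, 0, t))   # (f * 1)(t) = integral of f(u) from 0 to t
print(conv_with_one)                            # sin(t), which is not cos(t)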

Theorem 11.34. (The Convolution Theorem) Let f (t) and g(t) be as


above and let F (s) = L( f ) and G (s) = L( g) be defined for s > γ. Then

L( f ∗ g)(s) = F (s) G (s) ( s > γ ).

Equivalently, L−1 ( F (s) G (s)) = ( f ∗ g)(t).

This result tells us that if f (t) and g(t) have Laplace transforms
F (s) and G (s) respectively then the function ( f ∗ g)(t) has Laplace
transform F (s) G (s).

Example 11.35. Find the inverse Laplace transform of
$$K(s) = \frac{1}{(s-1)^2(s-3)^2}$$
using the Convolution Theorem. (Another method would be to use partial fractions as in Section 11.2.)

Solution: Notice that, by Example 11.23,
$$\mathcal{L}^{-1}\left(\frac{1}{(s-1)^2}\right) = te^t = f(t), \qquad \mathcal{L}^{-1}\left(\frac{1}{(s-3)^2}\right) = te^{3t} = g(t).$$
Therefore by the Convolution Theorem,
$$\mathcal{L}^{-1}(K) = \mathcal{L}^{-1}\left(\frac{1}{(s-1)^2}\cdot\frac{1}{(s-3)^2}\right) = f(t) * g(t)
= \int_0^t f(u)\,g(t-u)\,du = \int_0^t ue^u (t-u)e^{3(t-u)}\,du = e^{3t}\int_0^t (tu - u^2)e^{-2u}\,du.$$
To evaluate the latter integral we have to use several integrations by parts, which shows that
$$\int_0^t (tu - u^2)e^{-2u}\,du = \frac{t - 1 + (t+1)e^{-2t}}{4}.$$
Thus,
$$\mathcal{L}^{-1}(K) = e^{3t}\,\frac{t - 1 + (t+1)e^{-2t}}{4} = \frac{(t-1)e^{3t} + (t+1)e^t}{4}.$$

Example 11.36. Find the inverse Laplace transform of $F(s) = \dfrac{1}{(s^2+1)^2}$ using the Convolution Theorem.

Solution: Since $F(s) = \dfrac{1}{s^2+1}\cdot\dfrac{1}{s^2+1}$ and $\mathcal{L}^{-1}\left(\dfrac{1}{s^2+1}\right) = \sin(t)$,

by the Convolution Theorem,
$$f(t) = \mathcal{L}^{-1}(F) = \sin(t) * \sin(t) = \int_0^t \sin(u)\sin(t-u)\,du$$
$$= \frac{1}{2}\int_0^t \left[\cos(u - (t-u)) - \cos(u + (t-u))\right]du \quad \text{(using the cosine sum formula)}$$
$$= \frac{1}{2}\int_0^t \left[\cos(2u - t) - \cos t\right]du
= \frac{1}{2}\left[\frac{1}{2}\sin(2u - t) - u\cos t\right]_0^t
= \frac{1}{2}\sin t - \frac{1}{2}t\cos t.$$
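The convolution integral itself can be handed to SymPy directly as a check (a sketch of mine, not from the notes):

import sympy as sp

t, u = sp.symbols('t u', nonnegative=True)

conv = sp.integrate(sp.sin(u)*sp.sin(t - u), (u, 0, t))
print(sp.simplify(conv - (sp.sin(t)/2 - t*sp.cos(t)/2)))   # 0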

Exercise 11.7.1. In the same manner as the previous example, show that
$$\mathcal{L}^{-1}\left(\frac{s^2}{(s^2+1)^2}\right) = \frac{1}{2}\sin t + \frac{1}{2}t\cos t.$$

Exercise 11.7.2. Find the inverse Laplace transform of the functions using the Convolution Theorem:
(a) $\dfrac{1}{(s^2+4)(s^2-4)}$
(b) $\dfrac{s}{(s^2+a^2)(s^2+b^2)}$
(c) $\dfrac{e^{-2s}}{s^2+16}$
We will use an example to demonstrate how the Convolution Theorem can be applied to solve differential equations.

Example 11.37. Solve the initial value problem
$$y''(t) + y(t) = f(t) = \begin{cases} 1 & \text{if } 0 \le t < 1 \\ 0 & \text{if } t \ge 1 \end{cases}$$
with initial conditions $y(0) = 0$, $y'(0) = 1$.

Solution: First we write the right-hand side using Heaviside functions:
$f(t) = H(t) - H(t - 1)$. Applying the Laplace transform to the DE, we get
$$s^2 Y(s) - 1 + Y(s) = \frac{1 - e^{-s}}{s}$$
where $Y(s) = \mathcal{L}(y)$. This gives (using partial fractions)
$$Y(s) = \frac{1}{s^2 + 1} + (1 - e^{-s})\frac{1}{s(s^2 + 1)}
= \frac{1}{s^2 + 1} + \frac{1}{s} - \frac{s}{s^2 + 1} - \frac{e^{-s}}{s}\cdot\frac{1}{s^2 + 1}.$$

Taking the inverse Laplace transform of this and using the Convolution Theorem for the last term, we get
$$y(t) = \mathcal{L}^{-1}(Y) = \sin t + 1 - \cos t - H(t - 1) * (\sin t).$$
To evaluate the convolution integral, notice that when $0 \le u \le t < 1$ we have $H(u - 1) = 0$. Thus,
$$H(t - 1) * (\sin t) = \int_0^t H(u - 1)\sin(t - u)\,du = 0, \qquad \text{for } t < 1$$
while for $t \ge 1$ we have:
$$H(t - 1) * (\sin t) = \int_0^t H(u - 1)\sin(t - u)\,du = \int_1^t \sin(t - u)\,du = \big[\cos(t - u)\big]_1^t = 1 - \cos(t - 1).$$
Hence $H(t - 1) * (\sin t) = \left[1 - \cos(t - 1)\right] H(t - 1)$.

Finally we get the solution to the initial value problem:
$$y(t) = \sin t + 1 - \cos t - H(t - 1)\left[1 - \cos(t - 1)\right].$$

Exercise 11.7.3. Solve the following initial value problems using Laplace transforms:
(a) $y''(t) + 4y'(t) + 13y(t) = f(t)$, $y(0) = y'(0) = 0$, where $f(t) = 1$ for $0 \le t < \pi$ and $f(t) = 0$ for $t \ge \pi$.
(b) $y''(t) + 2y'(t) + 2y(t) = \sin t$, $y(0) = y'(0) = 0$

11.8 Laplace transforms table

$$\mathcal{L}(f(t)) = F(s) = \int_0^\infty f(t)\,e^{-st}\,dt$$

SPECIFIC FUNCTIONS

  F(s)                                    f(t)
  $1/s$                                   $1$
  $1/s^n$, $n \in \mathbb{Z}^+$           $t^{n-1}/(n-1)!$
  $1/(s-a)$                               $e^{at}$
  $1/(s-a)^n$, $n \in \mathbb{Z}^+$       $e^{at}\,t^{n-1}/(n-1)!$
  $1/(s^2+\omega^2)$                      $\sin(\omega t)/\omega$
  $s/(s^2+\omega^2)$                      $\cos(\omega t)$
  $1/((s-a)^2+\omega^2)$                  $e^{at}\sin(\omega t)/\omega$
  $(s-a)/((s-a)^2+\omega^2)$              $e^{at}\cos(\omega t)$
  $1/(s^2+\omega^2)^2$                    $[\sin(\omega t)-\omega t\cos(\omega t)]/(2\omega^3)$
  $s/(s^2+\omega^2)^2$                    $t\sin(\omega t)/(2\omega)$

GENERAL RULES

  F(s)                                    f(t)
  $e^{-as}/s$                             $H(t-a)$
  $e^{-as}F(s)$                           $f(t-a)H(t-a)$
  $F(s-a)$                                $e^{at}f(t)$
  $sF(s)-f(0)$                            $f'(t)$
  $s^2F(s)-sf(0)-f'(0)$                   $f''(t)$
  $F'(s)$                                 $-t\,f(t)$
  $F^{(n)}(s)$                            $(-t)^n f(t)$
  $F(s)/s$                                $\int_0^t f(u)\,du$
  $F(s)\,G(s)$                            $(f*g)(t)$

Higher derivatives:
$$\mathcal{L}\left(f^{(n)}(t)\right) = s^n F(s) - s^{n-1}f(0) - s^{n-2}f'(0) - \cdots - s f^{(n-2)}(0) - f^{(n-1)}(0)$$

The Convolution Theorem:
$$\mathcal{L}(f*g) = \mathcal{L}(f)\,\mathcal{L}(g) \qquad \text{where} \qquad (f*g)(t) = \int_0^t f(u)\,g(t-u)\,du$$
12 Appendix - Useful formulas

Exponential and logarithmic functions

(Natural) exponential function: $y = e^x$, Domain $\mathbb{R}$, Range $(0, \infty)$.

(Natural) logarithmic function: $y = \ln x$, Domain $(0, \infty)$, Range $\mathbb{R}$. (Note that $\ln x$ is shorthand for $\log_e x$.)

[Figure: graphs of $y = e^x$ and $y = \ln x$, together with the line $y = x$.]

Cancellation equations:
$$\ln(e^x) = x \qquad \text{and} \qquad e^{\ln x} = x.$$

Index and log laws:
$$e^x e^y = e^{x+y} \qquad \frac{e^x}{e^y} = e^{x-y} \qquad (e^x)^y = e^{xy}$$
$$\ln(xy) = \ln x + \ln y \qquad \ln\left(\frac{x}{y}\right) = \ln x - \ln y \qquad \ln(x^y) = y\ln x$$

Trigonometry

For a right-angled triangle with angle $\theta$, opposite side $O$, adjacent side $A$ and hypotenuse $H$:
$$\sin\theta = \frac{O}{H}; \qquad \operatorname{cosec}\theta = \frac{1}{\sin\theta} = \frac{H}{O};$$
$$\cos\theta = \frac{A}{H}; \qquad \sec\theta = \frac{1}{\cos\theta} = \frac{H}{A};$$
$$\tan\theta = \frac{O}{A} = \frac{O/H}{A/H} = \frac{\sin\theta}{\cos\theta}; \qquad \cot\theta = \frac{1}{\tan\theta} = \frac{A}{O}.$$
(Note that $\operatorname{cosec}\theta$ is also known as just $\csc\theta$.)

Reference triangles for common angles:
[Figure: a right triangle with legs 1, 1 and hypotenuse $\sqrt{2}$ (angles $\pi/4$, $\pi/4$), and a right triangle with legs 1, $\sqrt{3}$ and hypotenuse 2 (angles $\pi/3$, $\pi/6$).]

Trigonometric functions:

(a) $y = \sin x$: Domain $\mathbb{R}$, Range $[-1, 1]$
(b) $y = \cos x$: Domain $\mathbb{R}$, Range $[-1, 1]$
(c) $y = \tan x$: Domain $x \neq \frac{\pi}{2} + n\pi$, Range $\mathbb{R}$
(d) $y = \operatorname{cosec} x$: Domain $x \neq n\pi$, Range $(-\infty, -1] \cup [1, \infty)$
(e) $y = \sec x$: Domain $x \neq \frac{\pi}{2} + n\pi$, Range $(-\infty, -1] \cup [1, \infty)$
(f) $y = \cot x$: Domain $x \neq n\pi$, Range $\mathbb{R}$

[Figure: graphs of the six trigonometric functions (a)-(f).]

Trigonometric properties:

Fundamental properties: $\sin^2 x + \cos^2 x = 1$, $\tan^2 x + 1 = \sec^2 x$, $1 + \cot^2 x = \operatorname{cosec}^2 x$.

Odd/even properties: $\sin(-x) = -\sin x$, $\cos(-x) = \cos x$.

Addition formulae: $\sin(x + y) = \sin x\cos y + \cos x\sin y$,
$\cos(x + y) = \cos x\cos y - \sin x\sin y$,
$\tan(x + y) = \dfrac{\tan x + \tan y}{1 - \tan x\tan y}$.

Double-angle formulae: $\sin(2x) = 2\sin x\cos x$,
$\cos(2x) = \cos^2 x - \sin^2 x = 2\cos^2 x - 1 = 1 - 2\sin^2 x$,
$\tan(2x) = \dfrac{2\tan x}{1 - \tan^2 x}$.

Product formulae: $\sin x\cos y = \frac{1}{2}\left[\sin(x + y) + \sin(x - y)\right]$,
$\sin x\sin y = \frac{1}{2}\left[\cos(x - y) - \cos(x + y)\right]$,
$\cos x\cos y = \frac{1}{2}\left[\cos(x + y) + \cos(x - y)\right]$.

Inverse trigonometric functions:

(g) $y = \sin^{-1} x$: Domain $[-1, 1]$, Range $\left[-\frac{\pi}{2}, \frac{\pi}{2}\right]$
(h) $y = \cos^{-1} x$: Domain $[-1, 1]$, Range $[0, \pi]$
(i) $y = \tan^{-1} x$: Domain $\mathbb{R}$, Range $\left(-\frac{\pi}{2}, \frac{\pi}{2}\right)$

[Figure: graphs of the three inverse trigonometric functions (g)-(i).]

Differentiation

The product rule: If $y = uv$ then $\dfrac{dy}{dx} = v\dfrac{du}{dx} + u\dfrac{dv}{dx}$.

The quotient rule: If $y = \dfrac{u}{v}$ then $\dfrac{dy}{dx} = \dfrac{v\dfrac{du}{dx} - u\dfrac{dv}{dx}}{v^2}$.

The chain rule: If $y = f(u)$ and $u = g(x)$ then $\dfrac{dy}{dx} = \dfrac{dy}{du} \times \dfrac{du}{dx}$.

Integration

Integration by inverse trigonometric substitution:

  Integral involves      Then substitute    Restriction on u                            Use the identity
  $\sqrt{a^2 - x^2}$     $x = a\sin u$      $-\frac{\pi}{2} \le u \le \frac{\pi}{2}$    $1 - \sin^2 u = \cos^2 u$
  $\sqrt{a^2 + x^2}$     $x = a\tan u$      $-\frac{\pi}{2} < u < \frac{\pi}{2}$        $1 + \tan^2 u = \sec^2 u$
  $\sqrt{x^2 - a^2}$     $x = a\sec u$      $0 \le u < \frac{\pi}{2}$                   $\sec^2 u - 1 = \tan^2 u$

To return to the original variable x use the reference triangles illustrated below.

[Figure: (j) reference triangle for $x = a\sin u$ (hypotenuse $a$, opposite $x$, adjacent $\sqrt{a^2 - x^2}$); (k) reference triangle for $x = a\tan u$ (hypotenuse $\sqrt{x^2 + a^2}$, opposite $x$, adjacent $a$); (l) reference triangle for $x = a\sec u$ (hypotenuse $x$, opposite $\sqrt{x^2 - a^2}$, adjacent $a$).]

Integration by half-angle substitution:

The substitution $u = \tan\dfrac{x}{2}$, i.e. $x = 2\tan^{-1} u$, turns an integral with a quotient involving $\sin x$ and/or $\cos x$ into an integral of a rational function of $u$, where
$$\sin x = \frac{2u}{1 + u^2} \qquad \text{and} \qquad \cos x = \frac{1 - u^2}{1 + u^2}.$$
[Figure: reference triangle for the half-angle substitution, with angle $x/2$, opposite side $u$, adjacent side 1, hypotenuse $\sqrt{1 + u^2}$.]

Integration by partial fractions:

A rational function $f(x) = \dfrac{P(x)}{Q(x)}$ with $\deg(P(x)) < \deg(Q(x))$ can be decomposed into partial fractions as follows:

Case 1: Denominator has distinct linear factors
$$f(x) = \frac{P(x)}{(x - a_1)\cdots(x - a_k)} = \frac{A_1}{x - a_1} + \cdots + \frac{A_k}{x - a_k},$$
where $a_1, \ldots, a_k$ are pairwise distinct.

Case 2: Denominator has repeated linear factors
$$f(x) = \frac{P(x)}{(x - a)^c} = \frac{B_1}{x - a} + \frac{B_2}{(x - a)^2} + \cdots + \frac{B_{c-1}}{(x - a)^{c-1}} + \frac{B_c}{(x - a)^c}.$$

Case 3: Denominator has an irreducible factor of degree 2
$$f(x) = \frac{P(x)}{(x - a)(x^2 + bx + c)} = \frac{A_1}{x - a} + \frac{C_1 x + C_2}{x^2 + bx + c}.$$

If $\deg(P(x)) \ge \deg(Q(x))$ use polynomial division on the rational function before decomposing into partial fractions.

Integration by parts:
$$\int u\,dv = uv - \int v\,du.$$

Use the following table as a guide for choosing u and dv:

  u                                      dv
  Polynomial                             Exponential, Trigonometric
  Logarithmic, Inverse trigonometric     Polynomial

Differentiation and integration formulas

  dy/dx                              y                          ∫ y dx
  $0$                                $a$ (constant)             $ax + C$
  $nx^{n-1}$                         $x^n$ ($n \neq -1$)        $\frac{x^{n+1}}{n+1} + C$
  $-\frac{1}{x^2}$ or $-x^{-2}$      $\frac{1}{x}$ or $x^{-1}$  $\ln x + C$
  $e^x$                              $e^x$                      $e^x + C$
  $\frac{1}{x}$                      $\ln x$                    $x\ln x - x + C$
  $\cos x$                           $\sin x$                   $-\cos x + C$
  $-\sin x$                          $\cos x$                   $\sin x + C$
  $\sec^2 x$                         $\tan x$                   $\ln(\sec x) + C$
  $-\operatorname{cosec} x\cot x$    $\operatorname{cosec} x$   $\ln(\operatorname{cosec} x - \cot x) + C$
  $\sec x\tan x$                     $\sec x$                   $\ln(\sec x + \tan x) + C$
  $-\operatorname{cosec}^2 x$        $\cot x$                   $\ln(\sin x) + C$
  $\frac{1}{\sqrt{1-x^2}}$           $\sin^{-1} x$              $x\sin^{-1} x + \sqrt{1-x^2} + C$
  $-\frac{1}{\sqrt{1-x^2}}$          $\cos^{-1} x$              $x\cos^{-1} x - \sqrt{1-x^2} + C$
  $\frac{1}{1+x^2}$                  $\tan^{-1} x$              $x\tan^{-1} x - \frac{1}{2}\ln(1+x^2) + C$

Laplace transforms table

$$\mathcal{L}(f(t)) = F(s) = \int_0^\infty f(t)\,e^{-st}\,dt$$

SPECIFIC FUNCTIONS

  F(s)                                    f(t)
  $1/s$                                   $1$
  $1/s^n$, $n \in \mathbb{Z}^+$           $t^{n-1}/(n-1)!$
  $1/(s-a)$                               $e^{at}$
  $1/(s-a)^n$, $n \in \mathbb{Z}^+$       $e^{at}\,t^{n-1}/(n-1)!$
  $1/(s^2+\omega^2)$                      $\sin(\omega t)/\omega$
  $s/(s^2+\omega^2)$                      $\cos(\omega t)$
  $1/((s-a)^2+\omega^2)$                  $e^{at}\sin(\omega t)/\omega$
  $(s-a)/((s-a)^2+\omega^2)$              $e^{at}\cos(\omega t)$
  $1/(s^2+\omega^2)^2$                    $[\sin(\omega t)-\omega t\cos(\omega t)]/(2\omega^3)$
  $s/(s^2+\omega^2)^2$                    $t\sin(\omega t)/(2\omega)$

GENERAL RULES

  F(s)                                    f(t)
  $e^{-as}/s$                             $H(t-a)$
  $e^{-as}F(s)$                           $f(t-a)H(t-a)$
  $F(s-a)$                                $e^{at}f(t)$
  $sF(s)-f(0)$                            $f'(t)$
  $s^2F(s)-sf(0)-f'(0)$                   $f''(t)$
  $F'(s)$                                 $-t\,f(t)$
  $F^{(n)}(s)$                            $(-t)^n f(t)$
  $F(s)/s$                                $\int_0^t f(u)\,du$
  $F(s)\,G(s)$                            $(f*g)(t)$

Higher derivatives:
$$\mathcal{L}\left(f^{(n)}(t)\right) = s^n F(s) - s^{n-1}f(0) - s^{n-2}f'(0) - \cdots - s f^{(n-2)}(0) - f^{(n-1)}(0)$$

The Convolution Theorem:
$$\mathcal{L}(f*g) = \mathcal{L}(f)\,\mathcal{L}(g) \qquad \text{where} \qquad (f*g)(t) = \int_0^t f(u)\,g(t-u)\,du$$
13 Index

p-series, 119

absolutely convergent, 128; additive identity, 53; Alternating series, 127; associative, 52; augmented matrix, 13; auxiliary equation, 173

back substitution, 15; basic variables, 18; basis, 44; boundary conditions, 185; boundary value problem, 185

carrying capacity, 162; change of coordinates matrix, 91; characteristic equation, 173; codomain, 81; coefficient matrix, 10; column space, 55; column vector, 28; commutative, 52; commute, 53; complimentary solution, 177; conditionally convergent, 128; consistent, 9; constant coefficients, 171; contrapositive, 42; convergent, 114, 120; convolution, 203; coordinates, 49

Derivative of transform, 203; determinant, 72; diagonal matrix, 54; differential equations, 159; dilation, 85; dimension, 47; direction field, 164; discriminant, 174; divergent, 114, 120; diverges to ∞, 115; domain, 81

eigenspace, 98; eigenvalues, 97; eigenvectors, 97; elementary matrix, 78; elementary row operation, 11; Euler's formula, 174; Euler's formulae, 141; Even expansion, 150; even function, 146; explicit solution, 166; exponential order, 190

family of solution curves, 165; Fourier coefficients, 138; Fourier cosine series, 147; Fourier series expansion, 138; Fourier sine series, 149; free parameter, 19; free variable, 19; full rank, 60; function, 81

Gaussian Elimination, 15; general solution, 166, 172; geometric series, 119, 120; group, 72

half-range expansion, 150; harmonic series, 119; Heaviside function, 199; homogeneous, 25, 171

idempotent matrix, 54; identity, 54; identity matrix, 54, 85; image, 81; implicit solution, 167; improper integrals, 107; inconsistent, 9; independent, 41; infimum, 117; infinite series, 119; initial conditions, 169, 185; initial value problem, 169, 185; integrating factor, 167; inverse, 64, 66; inverse function, 88; inverse Laplace transform, 187, 191; invertible, 66, 88

kernel, 86

Laplace transform, 187; leading entries, 18; leading entry, 15; leading variables, 18; left-distributive, 52; linear, 159, 167, 171; linear combination, 34; linear transformation, 81; linearly dependent, 172; linearly independent, 41, 172; logistic growth model, 162; lower-triangular matrix, 54

MacLaurin Series, 134; main diagonal, 54; matrix, 51; matrix addition, 52; matrix multiplication, 52; matrix transposition, 52; method of undetermined coefficients, 178; monotone, 118; multiplicative identity, 53

nilpotent matrix, 54; non-basic variable, 19; non-invertible, 66; nonhomogeneous, 171; nonhomogeneous term, 171; nonlinear, 159; null space, 60; nullity, 62

Odd expansion, 150; odd function, 146; order, 159; ordinary differential equation, 159; orthogonal projection, 82

partial differential equation, 159; particular solution, 177; period of a function, 137; periodic extension, 145; piecewise continuous, 141; pivot entry, 16; pivot position, 16; power series, 133; product, 28; proof, 12; pulse function, 200

radius of convergence, 132; rank, 60; Rank-Nullity Theorem, 62; ratio, 85; reduced row-echelon form, 22; reduction of order, 175; Ricatti differential equation, 165; right-distributive, 52; row echelon form, 14; row space, 55; row vector, 28; row-reduction, 15

scalar, 7; scalar multiple, 7; scalar multiplication, 28, 52; separable, 166; separation of variables, 162; sequence, 113; similar, 94; skew-symmetric matrix, 54; slope field, 164; span, 35; spanning set, 37; standard basis, 45; standard basis vectors, 45; standard form, 167; standard matrix, 84; subspace, 29; sum, 28; supremum, 117; symmetric matrix, 54; systems of linear equations, 7

Taylor series, 134; transpose, 52; Type I improper integrals, 107; Type II improper integrals, 109

upper-triangular matrix, 54

variable coefficients, 171; variation of parameters, 178; vector addition, 28; vector space, 27; vector subspace, 29; vectors, 27

Wronskian, 172

zero divisors, 54; zero matrix, 54; zero-vector, 29
