Math Methods
Department of Computing
Imperial College of Science, Technology and Medicine
Instructors:
Marc Deisenroth, Mahdi Cheraghchi
Contents

1 Sequences
1.1 The Convergence Definition
1.2 Illustration of Convergence
1.3 Common Converging Sequences
1.4 Combinations of Sequences
1.5 Sandwich Theorem
1.6 Ratio Tests for Sequences
1.7 Proof of Ratio Tests
1.8 Useful Techniques for Manipulating Absolute Values
1.9 Properties of Real Numbers
2 Series
2.1 Geometric Series
2.2 Harmonic Series
2.3 Series of Inverse Squares
2.4 Common Series and Convergence
2.5 Convergence Tests
2.6 Absolute Convergence
2.7 Power Series and the Radius of Convergence
2.8 Proofs of Ratio Tests
3 Power Series
3.1 Basics of Power Series
3.2 Maclaurin Series
3.3 Taylor Series
3.4 Taylor Series Error Term
3.5 Deriving the Cauchy Error Term
3.6 Power Series Solution of ODEs
4 Differential Equations and Calculus
4.1 Differential Equations
4.2 Partial Derivatives
5 Complex Numbers
5.1 Introduction
5.1.1 Applications
5.1.2 Imaginary Number
5.1.3 Complex Numbers as Elements of R2
5.1.4 Closure under Arithmetic Operators
5.2 Representations of Complex Numbers
5.2.1 Cartesian Coordinates
5.2.2 Polar Coordinates
5.2.3 Euler Representation
5.2.4 Transformation between Polar and Cartesian Coordinates
5.2.5 Geometric Interpretation of the Product of Complex Numbers
5.2.6 Powers of Complex Numbers
5.3 Complex Conjugate
5.3.1 Absolute Value of a Complex Number
5.3.2 Inverse and Division
5.4 De Moivre's Theorem
5.4.1 Integer Extension to De Moivre's Theorem
5.4.2 Rational Extension to De Moivre's Theorem
5.5 Triangle Inequality for Complex Numbers
5.6 Fundamental Theorem of Algebra
5.6.1 nth Roots of Unity
5.6.2 Solution of zn = a + ib
5.7 Complex Sequences and Series*
5.7.1 Limits of a Complex Sequence
5.8 Complex Power Series
5.8.1 A Generalized Euler Formula
5.9 Applications of Complex Numbers*
5.9.1 Trigonometric Multiple Angle Formulae
5.9.2 Summation of Series
5.9.3 Integrals
6 Linear Algebra
6.1 Linear Equation Systems
6.1.1 Example
6.2 Groups
6.2.1 Definitions
6.2.2 Examples
6.3 Matrices
6.3.1 Matrix Multiplication
6.3.2 Inverse and Transpose
6.3.3 Multiplication by a Scalar
Chapter 1
Sequences
For all ε > 0, we can always find a positive integer N, such that, for all n > N:
|an − l| < ε
This statement is best understood by taking the last part first and working backwards.
The inequality |an − l| < ε is called the limit inequality and says that the distance from the nth term in the sequence to the limit should be less than ε.
For all n > N . . . There should be some point N after which all the terms an in the sequence are within ε of the limit.
For all ε > 0 . . . No matter what value of ε we pick, however tiny, we will still be able to find some value of N such that all the terms to the right of aN are within ε of the limit.
How do we prove convergence?  The definition also tells us how to prove that a sequence converges: we need to find the N for any given value of ε. If we can find this quantity then we have a rigorous proof of convergence for the given sequence. We work through an example below for an = 1/n.
[Figure 1.1: the sequence an = 1/n plotted against the term number n, with the band |an − l| < ε around the limit and the index N beyond which all terms lie inside it.]
Applying the limit inequality  In the example in Figure 1.1 we have taken a particular value of ε to demonstrate the concepts involved. Taking ε = 0.28 (picked arbitrarily), we apply the |an − l| < ε inequality, to get:
|1/n| < 0.28
1/n < 0.28
n > 1/0.28
n > 3.57
This states that for n > 3.57 the continuous function 1/n will be within 0.28 of the limit. However, we are dealing with a sequence an which is only defined at integer points on this function. Hence we need to find the Nth term, aN, which is the first point in the sequence to also be within this value of ε. In this case we can see that the next term above 3.57, that is N = 4, satisfies this requirement.
For all n > N We are further required to ensure that all other points to the right of
aN in the sequence an for n > N are also within 0.28 of the limit. We know this to be
true because the condition on n we originally obtained was n > 3.57, so we can be
sure that aN , aN +1 , aN +2 , . . . are all closer to the limit than 0.28.
For all ε > 0  The final point to note is that it is not sufficient to find N for just a single value of ε. We need to find N for every value of ε > 0. Since N will vary with ε as mentioned above, we are clearly required to find a function N(ε).
In the case of an = 1/n, this is straightforward. We apply the limit inequality in general and derive a condition for n:
1/n < ε  ⇒  n > 1/ε
We are looking for a greater-than inequality. So long as we get one, we can select the next largest integer value; we use the ceiling function to do this, giving:
N(ε) = ⌈1/ε⌉
Now whatever value of ε we are given, we can find a value of N using this function and we are done.
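As a quick numerical illustration, the short Python sketch below checks this choice of N(ε) for an = 1/n (the helper names a and N_eps are ours, not part of the derivation):

import math

def a(n):
    # the sequence a_n = 1/n, which converges to the limit l = 0
    return 1 / n

def N_eps(eps):
    # N(eps) = ceil(1/eps): beyond this index every term is within eps of the limit
    return math.ceil(1 / eps)

l = 0
for eps in [0.28, 0.1, 0.01, 0.001]:
    n0 = N_eps(eps)
    # check the limit inequality |a_n - l| < eps for a sample of n > N(eps)
    ok = all(abs(a(n) - l) < eps for n in range(n0 + 1, n0 + 1000))
    print(eps, n0, ok)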
1. an = 1/n → 0, also an = 1/n² → 0 and an = 1/√n → 0
2. In general, an = 1/n^c → 0 for some positive real constant c > 0
3. an = 1/2ⁿ → 0, also an = 1/3ⁿ → 0 and an = 1/eⁿ → 0
4. In general, an = 1/cⁿ → 0 for some real constant |c| > 1; or equivalently an = cⁿ → 0 for some real constant |c| < 1
5. an = 1/n! → 0 (hard to prove directly; easier to use a ratio test, see Section 1.6)
6. an = 1/log n → 0 for n > 1
2. lim_{n→∞} (an + bn) = a + b
3. lim_{n→∞} (an − bn) = a − b
4. lim_{n→∞} (an bn) = ab
5. lim_{n→∞} (an / bn) = a/b, provided that b ≠ 0
We proved rule (2) in lectures; you can find proofs of the other results in Analysis: An Introduction to Proof by Stephen Lay.
Example  These sequence combination results are useful for understanding the convergence properties of combined sequences. As an example, take the sequence:
an = (3n² + 2n) / (6n² + 7n + 1)
We can transform this sequence into a combination of sequences that we know how to deal with, by dividing numerator and denominator by n²:
an = (3 + 2/n) / (6 + 7/n + 1/n²) = bn / cn
where bn = 3 + 2/n and cn = 6 + 7/n + 1/n². We know from rule (5) above that if bn → b and cn → c, then bn/cn → b/c. This means that we can investigate the convergence of bn and cn separately.
Similarly we can rewrite bn as 3 + dn, for dn = 2/n. We know that dn = 2/n → 0 as it is one of the common sequence convergence results, thus bn → 3 by rule (2) above. By a similar argument, we can see that cn → 6 using composition rule (2) and common results.
Finally, we get the result that an → 1/2.
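A quick numerical check of this result (our own illustration in Python, not part of the notes):

def a(n):
    # a_n = (3n^2 + 2n) / (6n^2 + 7n + 1)
    return (3 * n**2 + 2 * n) / (6 * n**2 + 7 * n + 1)

for n in [10, 100, 10_000, 1_000_000]:
    # the distance from the claimed limit 1/2 shrinks as n grows
    print(n, a(n), abs(a(n) - 0.5))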
you wish to reason about. The two bounding sequences should be given as known results, or proved to converge prior to use, and they should be easier to reason about than the sequence of interest.
[Figure: a sequence an sandwiched between a lower bounding sequence ln and an upper bounding sequence un, all plotted against the term number n and converging to the same limit.]
If ln ≤ an ≤ un for all n > N, and ln → l and un → l as n → ∞, then an → l as n → ∞.
Proof of sandwich theorem  We know that ln and un tend to the same limit l. We use the definition of convergence for both these sequences.
Given some ε > 0, we can find an N1 and N2 such that for all n > N1, |ln − l| < ε and for all n > N2, |un − l| < ε. If we want both sequences to be simultaneously within ε of the limit l for a single value of N′, then we have to pick N′ = max(N1, N2).
So for N′ = max(N1, N2), for all n > N′, we know that both:
|ln − l| < ε   and   |un − l| < ε
Therefore, removing the modulus signs and using the fact that ln ≤ un:
l − ε < ln   and   un < l + ε
We also have, as a condition of the sandwich theorem, that ln ≤ an ≤ un for all n > N. Hence, for all n > max(N, N′):
l − ε < ln ≤ an ≤ un < l + ε
l − ε < an < l + ε
−ε < an − l < ε
⇒ |an − l| < ε
Ratio convergence test  If |an+1 / an| ≤ c < 1 for some c ∈ R and for all sufficiently large n (i.e., for all n ≥ N for some integer N), then an → 0 as n → ∞.
Ratio divergence test  If |an+1 / an| ≥ c > 1 for some c ∈ R and for all sufficiently large n (i.e., for all n ≥ N for some integer N), then an diverges as n → ∞.
6
Chapter 1. Sequences 1.7. Proof of Ratio Tests
Limit ratio test The limit ratio test is a direct corollary of the standard ratio tests.
Instead of looking at the ratio of consecutive terms in the sequence, we look at the
limit of that ratio. This is useful when the standard test is not conclusive and leaves
terms involving n in the ratio expression. To use this test, we compute:
r = lim_{n→∞} |an+1 / an|
Thus: if r < 1 then an → 0; if r > 1 then an diverges; if r = 1 the test is inconclusive.
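For example, the sequence an = 1/n! mentioned in Section 1.6 has ratio an+1/an = 1/(n+1) → 0 < 1, so it converges to 0. A small Python sketch (our own illustration) makes this concrete:

import math

def a(n):
    # a_n = 1/n!
    return 1 / math.factorial(n)

for n in [1, 5, 10, 50]:
    ratio = a(n + 1) / a(n)    # equals 1/(n+1), tending to r = 0 < 1
    print(n, ratio, a(n))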
Proof of limit ratio test  Justifying that this works demonstrates an interesting (and new) use of the limit definition. We define a new sequence, bn = |an+1 / an|, and we can consider the limit of this new sequence, r.
As with all sequences, we can say that for all ε > 0, we can find an N such that for all n > N, |bn − r| < ε.
As before we can expand the modulus to give:
r − ε < bn < r + ε    (∗)
Now we take the definition of the limit literally and pick a carefully chosen value for ε > 0 in both cases.
In the case where r < 1 (remember that this corresponds to a sequence that we wish to show converges), we choose ε = (1 − r)/2 > 0. Equation (∗) becomes:
r − (1 − r)/2 < bn < r + (1 − r)/2
(3r − 1)/2 < bn < (r + 1)/2
Taking the right hand side of the inequality, bn < (r + 1)/2, and since r < 1 we have (r + 1)/2 < 1. If we now take c = (r + 1)/2 < 1, then by the original ratio convergence test, the original sequence an → 0.
In the case where r > 1 (corresponding to a sequence we wish to show diverges), we choose ε = (r − 1)/2 > 0. Equation (∗) becomes:
r − (r − 1)/2 < bn < r + (r − 1)/2
(r + 1)/2 < bn < (3r − 1)/2
In this case, we take the left hand side of the inequality, bn > (r + 1)/2, and since r > 1, we can show that (r + 1)/2 > 1 also. If we now take c = (r + 1)/2 > 1, then by the original ratio test, we can say that the original sequence an must diverge.
Where we are combining absolute values of expressions, the following are useful, where x and y are real numbers:
1. |xy| = |x| |y|
2. |x/y| = |x| / |y|
3. |x + y| ≤ |x| + |y|, the triangle inequality
4. |x − y| ≥ ||x| − |y||
A set S can have many upper or lower bounds (or none), so we single out the least upper bound, sup(S), and the greatest lower bound, inf(S).
By convention, if a set S has no upper bound, e.g., S = {x | x > 0}, then sup(S) = ∞.
Similarly if S has no lower bound then inf(S) = −∞.
The fundamental axiom of analysis states that every increasing sequence of real
numbers that is bounded above, must converge.
Chapter 2
Series
Series are sums of sequences. Whether they converge or not is an important factor
in determining whether iterative numerical algorithms terminate.
An infinite series is a summation of the form:
S = Σ_{n=1}^{∞} an
Partial sums One easy way of determining the convergence of series is to construct
the partial sum – the sum up to the nth term of the series:
Sn = Σ_{i=1}^{n} ai
Bounded above When ai > 0 for all i, then the partial sum Sn is an increasing
sequence, that is:
S1 < S2 < S3 < · · ·
If you can show that the sequence is bounded above, then as with other increasing
sequences, it must converge to some limit.
The nth partial sum of the geometric series, Gn = x + x² + · · · + xⁿ, satisfies Gn = x + x(Gn − xⁿ), so:
Gn = (x − x^{n+1}) / (1 − x)
Gn is a sequence, so we can determine if Gn converges or diverges in the limit, taking
care to take into account behaviour that may depend on x. The xn+1 term is the only
varying term in the sequence, so from our knowledge of sequences we can say that
this and thus Gn will converge if |x| < 1.
Gn → x / (1 − x)   as n → ∞, for |x| < 1
by rules of convergence for sequences.
Similarly, for |x| > 1, Gn diverges as n → ∞. For x = 1, Gn = n which also diverges.
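Numerically (a short Python sketch of our own), the partial sums Gn settle on x/(1 − x) when |x| < 1 and blow up otherwise:

def G(n, x):
    # partial sum G_n = x + x^2 + ... + x^n of the geometric series
    return sum(x**k for k in range(1, n + 1))

x = 0.5
for n in [5, 10, 20, 50]:
    print(n, G(n, x), x / (1 - x))   # G_n approaches x/(1-x) = 1.0

print(G(50, 1.5))                    # for |x| > 1 the partial sums grow without bound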
S = 1 + 1/2 + (1/3 + 1/4) + (1/5 + 1/6 + 1/7 + 1/8) + · · · > 1 + 1/2 + 1/2 + 1/2 + · · ·
since 1/3 + 1/4 > 1/4 + 1/4 = 1/2 and 1/5 + 1/6 + 1/7 + 1/8 > 1/8 + 1/8 + 1/8 + 1/8 = 1/2.
Clearly 1 + n/2 diverges, and since Sn is greater than this sequence, it must also diverge as n → ∞, and so does S.
Using partial fractions, we can rewrite 1/(i(i+1)) as 1/i − 1/(i+1). The nth partial sum of T can therefore be written as:
Tn = Σ_{i=1}^{n} (1/i − 1/(i+1)) = 1 − 1/(n+1)    (2.1)
(To see why the right hand side is true, try writing out the first and last few terms
of the sum and observing which cancel). From this, we see that Tn converges giving
T = 1.
We will use the partial sum result to give us a handle on understanding the convergence or divergence of:
S = Σ_{n=1}^{∞} 1/n²
We start by considering terms of the partial sum Sn = Σ_{i=1}^{n} 1/i², and we notice that:
1/(i(i+1)) < 1/i² < 1/((i−1)i)   for i ≥ 2
We sum from i = 2 (to avoid a 0 denominator on the right hand side) to n to get:
1/(2·3) + 1/(3·4) + 1/(4·5) + · · · + 1/(n(n+1)) < Σ_{i=2}^{n} 1/i² < 1/(2·1) + 1/(3·2) + 1/(4·3) + · · · + 1/(n(n−1))
We note that the left hand and right hand series above differ only in the first and last terms, so the inequality can be rewritten as:
Σ_{i=1}^{n} 1/(i(i+1)) − 1/2 < Σ_{i=2}^{n} 1/i² < Σ_{i=1}^{n−1} 1/(i(i+1))
Now we add 1 across the whole inequality to allow us to sum 1/i² from i = 1 and get the required partial sum Sn as our middle expression:
1/2 + Σ_{i=1}^{n} 1/(i(i+1)) < Sn < 1 + Σ_{i=1}^{n−1} 1/(i(i+1))
This effectively bounds the sequence of partial sums, Sn. We use our result of (2.1) to get:
3/2 − 1/(n+1) < Sn < 2 − 1/n
We see that the upper bound on Sn (and thus also S) is 2 and the lower bound is 3/2. Since the partial sum sequence is increasing, the existence of an upper bound proves that the series converges. The value of the limit itself must lie between 3/2 and 2.
3. Geometric series: S = Σ_{n=1}^{∞} xⁿ diverges for |x| ≥ 1.
Converging series  The limit for the inverse squares series is given for interest only; the essential thing is to know that the series converges.
1. Geometric series: S = Σ_{n=1}^{∞} xⁿ converges to x/(1 − x) so long as |x| < 1.
2. Inverse squares series: S = Σ_{n=1}^{∞} 1/n² converges to π²/6.
3. 1/n^c series: S = Σ_{n=1}^{∞} 1/n^c converges for c > 1.
Limit comparison test  Sometimes the following form of comparison test is easier, as it does not involve having to find a constant λ to make the first form of the test work. Taking Σ ci to be a series known to converge and Σ di a series known to diverge:
1. if lim_{i→∞} ai/ci exists, then Σ_{i=1}^{∞} ai converges
2. if lim_{i→∞} di/ai exists, then Σ_{i=1}^{∞} ai diverges
D’Alembert’s ratio test The D’Alembert ratio test is a very useful test for quickly
discerning the convergence or divergence of a series. It exists in many subtly differ-
ent forms.
From some point N ∈ N in the series, for all i ≥ N:
1. If ai+1/ai ≥ 1, then Σ_{i=1}^{∞} ai diverges
2. If there exists a k such that ai+1/ai ≤ k < 1, then Σ_{i=1}^{∞} ai converges
D’Alembert’s limit ratio test From this definition we can obtain (Section 2.8) the
easier-to-use and more often quoted version of the test which looks at the limit of
the ratio of successive terms. This version states:
1. if lim_{i→∞} ai+1/ai > 1 then Σ_{i=1}^{∞} ai diverges
2. if lim_{i→∞} ai+1/ai = 1 then Σ_{i=1}^{∞} ai may converge or diverge
3. if lim_{i→∞} ai+1/ai < 1 then Σ_{i=1}^{∞} ai converges
Integral test  This test is based on the idea¹ that for a series S = Σ_{n=1}^{∞} an, where an = f(n) is a decreasing function, e.g., 1/n for n ≥ 1:
Σ_{n=1}^{∞} an+1 < ∫_1^∞ f(x) dx < Σ_{n=1}^{∞} an
In practice, you evaluate the integral and if it diverges you know that the series diverges, by the right hand inequality. If the integral converges, you know the series does also, this time by the left hand inequality.
Formally stated: suppose that f(x) is a continuous, positive and decreasing function on the interval [N, ∞) and that an = f(n); then:
1. If ∫_N^∞ f(x) dx converges, so does Σ_{n=N}^{∞} an
¹ If you have come across Riemann sums before, we are essentially using Σ_{n=1}^{∞} an and Σ_{n=1}^{∞} an+1 as left and right Riemann sums which bound the integral ∫_1^∞ f(x) dx, where an = f(n).
[Figure 2.1: plot of f(x) = 1/x together with unit-width rectangles of height an = 1/n.]
Figure 2.1: A divergence test: the series Σ_{n=1}^{∞} 1/n as compared to ∫_1^∞ 1/x dx, showing that the series is bounded below by the integral.
2. If ∫_N^∞ f(x) dx diverges, so does Σ_{n=N}^{∞} an
As usual we do not have to concern ourselves with the first N − 1 terms as long as
they are finite. If we can show that the tail converges or diverges, i.e., an for n ≥ N ,
then the whole series will follow suit.
Example: Integral test divergence  We can apply the integral test to show that Σ_{n=1}^{∞} 1/n diverges. Figure 2.1 shows Σ_{n=1}^{∞} an, where an = 1/n, as a sum of 1 × an rectangular areas for n ≥ 1. When displayed like this, and only because f(x) = 1/x is a decreasing function, we can see that the series Σ_{n=1}^{∞} an is strictly greater than the corresponding integral ∫_1^∞ f(x) dx. If we evaluate this integral:
∫_1^∞ f(x) dx = lim_{b→∞} ∫_1^b (1/x) dx = lim_{b→∞} [ln x]_1^b = lim_{b→∞} (ln b − ln 1) = ∞
we get a diverging result. Hence the original series Σ_{n=1}^{∞} an must also diverge.
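The same comparison can be seen numerically (a small Python sketch of our own): the harmonic partial sums exceed ln n, which grows without bound:

import math

def H(n):
    # partial sum of the harmonic series, sum_{i=1}^{n} 1/i
    return sum(1 / i for i in range(1, n + 1))

for n in [10, 1_000, 100_000]:
    # H(n) stays above the integral from 1 to n of 1/x dx = ln(n)
    print(n, H(n), math.log(n))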
Example: Integral test convergence  We can apply the integral test to show that Σ_{n=1}^{∞} 1/n² converges. Figure 2.2 shows Σ_{n=1}^{∞} an+1, where an = 1/n², as a sum of 1 × an+1 rectangular areas, which is bounded above by the integral ∫_1^∞ 1/x² dx.
[Figure 2.2: plot of f(x) = 1/x² together with unit-width rectangles of height an = 1/n².]
Figure 2.2: A convergence test: the series Σ_{n=2}^{∞} 1/n² as compared to ∫_1^∞ 1/x² dx, showing that the series is bounded above by the integral.
Tests for absolute convergence  Where we have a series with a mixture of positive and negative terms, an, we can test for absolute convergence by applying any of the above tests to the absolute value of the terms |an|. For example the limit ratio test becomes:
1. if lim_{n→∞} |an+1 / an| > 1 then Σ_{n=1}^{∞} an diverges
2. if lim_{n→∞} |an+1 / an| = 1 then Σ_{n=1}^{∞} an may converge or diverge
3. if lim_{n→∞} |an+1 / an| < 1 then Σ_{n=1}^{∞} an converges absolutely (and thus also converges)
However this limit, x/(1 − x), and therefore also the power series expansion, are only equivalent if |x| < 1, as we have seen.
The radius of convergence represents the size of the set of x values for which a power series converges. If a power series converges for |x| < R, then the radius of convergence is said to be R. Sometimes we consider series which only converge within some distance R of a point a; in this case we get the condition |x − a| < R, and the radius of convergence is again R. If the series converges for all values of x, then the radius is given as infinite.
We can apply the D’Alembert ratio test to this as with any other series, however
we have to leave the x variable free, and we have to cope with possible negative x
values.
We apply the version of the D'Alembert ratio test which tests for absolute convergence (and thus also convergence) since, depending on the value of x, we may have negative terms in the series.
lim_{n→∞} |an+1 / an| = lim_{n→∞} |(n+1)² x^{n+1} / (n² xⁿ)| = lim_{n→∞} |x| (1 + 1/n)² = |x|
If we look at the ratio test, it says that if the result is < 1 the series will converge. So our general convergence condition on this series is that |x| < 1. Therefore the radius of convergence is 1.
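The series whose ratio is computed above has terms an = n² xⁿ (its opening statement falls outside the extracted text, so the sketch below assumes that form). A quick Python check of our own shows the partial sums stabilising inside |x| < 1 and growing outside:

def partial_sum(x, n_terms):
    # partial sums of the assumed series sum_{n>=1} n^2 * x^n
    return sum(n**2 * x**n for n in range(1, n_terms + 1))

for x in [0.5, 0.9, -0.9]:
    print(x, partial_sum(x, 200), partial_sum(x, 400))   # stable inside |x| < 1
print(1.1, partial_sum(1.1, 200), partial_sum(1.1, 400)) # diverges outside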
aN+1 ≤ k aN
aN+2 ≤ k aN+1 ≤ k² aN
aN+3 ≤ k aN+2 ≤ k² aN+1 ≤ k³ aN
...
aN+m ≤ k^m aN
Proof of D'Alembert limit ratio test  Taking the two cases in turn.
Case lim_{i→∞} ai+1/ai = l > 1: We are looking to show that Σ_{i=1}^{∞} ai diverges. Using the definition of the limit, we know that from some point N, for all i > N:
|ai+1/ai − l| < ε
l − ε < ai+1/ai < l + ε
Since we are trying to show that this series diverges, we would like to use the left hand inequality to show that the ratio ai+1/ai > 1 from some point onwards, for example i > N. To do this we can pick ε = l − 1, recalling that l > 1, so this is allowed as ε > 0. We now get that for all i > N:
1 < ai+1/ai
and so by the non-limit D'Alembert ratio test, the series Σ_{i=1}^{∞} ai diverges.
Case lim_{i→∞} ai+1/ai = l < 1: We are looking to show that Σ_{i=1}^{∞} ai converges in this case. Again using the limit definition we get, for all i > N:
|ai+1/ai − l| < ε
l − ε < ai+1/ai < l + ε
Picking ε = (1 − l)/2 > 0 gives ai+1/ai < (l + 1)/2 < 1 for all i > N, so by the non-limit D'Alembert ratio test the series converges.
Chapter 3
Power Series
where f⁽ⁿ⁾(a) represents the function f(x) differentiated n times and then evaluated at x = a. Now if we differentiate f(x) to get rid of the constant term:
f′(x) = a1 + 2a2 x + 3a3 x² + 4a4 x³ + · · ·
and again set x = 0, we get a1 = f′(0).
a2 = f⁽²⁾(0) / 2!
In general, for the nth term, we get:
an = f⁽ⁿ⁾(0) / n!   for n ≥ 0
Maclaurin series  Suppose f(x) is differentiable infinitely many times and that it has a power series representation (series expansion) of the form
f(x) = Σ_{i=0}^{∞} ai xⁱ
as above. Differentiating n times gives
f⁽ⁿ⁾(x) = Σ_{i=n}^{∞} ai i(i−1) · · · (i−n+1) x^{i−n}
Setting x = 0, we have f⁽ⁿ⁾(0) = n! an because all terms but the first have x as a factor. Hence we obtain Maclaurin's series:
f(x) = Σ_{n=0}^{∞} f⁽ⁿ⁾(0) xⁿ / n!
As we have seen in the Series notes, it is important to check the radius of convergence
(set of valid values for x) that ensures that this series converges.
The addition of each successive term in the Maclaurin series creates a closer approx-
imation to the original function around the point x = 0, as we will demonstrate.
Example 1: f(x) = (1 + x)³
1. f(0) = 1, so a0 = 1
2. f′(x) = 3(1 + x)², so f′(0) = 3 and a1 = 3/1! = 3
3. f″(x) = 3·2(1 + x), so f″(0) = 6 and a2 = 6/2! = 3
4. f‴(x) = 3·2·1, so f‴(0) = 6 and a3 = 6/3! = 1
(1 + x)³ = 1 + 3x + 3x² + x³
In this case, since the series expansion is finite we know that the Maclaurin series will be accurate (and certainly converge) for all x; therefore the radius of convergence is infinite.
We now consider the partial sums of the Maclaurin expansion, fn(x) = Σ_{i=0}^{n} f⁽ⁱ⁾(0) xⁱ / i!. For f(x) = (1 + x)³ these would be:
f0(x) = 1
f1(x) = 1 + 3x
f2(x) = 1 + 3x + 3x²
f3(x) = 1 + 3x + 3x² + x³
We can plot these along with f (x) in Figure 3.1, and we see that each partial sum is
a successively better approximation to f (x) around the point x = 0.
[Figure 3.1: plot of f(x) = (1 + x)³ and the partial sums f0(x), f1(x), f2(x) over −2 ≤ x ≤ 1.]
Figure 3.1: Successive approximations from the Maclaurin series expansion of f(x) = (1 + x)³, where f0(x) = 1, f1(x) = 1 + 3x and f2(x) = 1 + 3x + 3x².
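Numerically, the partial sums behave exactly as the figure suggests; a small Python sketch (our own illustration) evaluates them at a sample point:

def maclaurin_partial_sums(x):
    # coefficients a_i = f^(i)(0)/i! for f(x) = (1+x)^3 are 1, 3, 3, 1
    coeffs = [1, 3, 3, 1]
    sums, total = [], 0.0
    for i, c in enumerate(coeffs):
        total += c * x**i
        sums.append(total)          # f_0(x), f_1(x), f_2(x), f_3(x)
    return sums

x = 0.3
print(maclaurin_partial_sums(x))    # successive approximations ...
print((1 + x) ** 3)                 # ... and the exact value; f_3 matches it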
Example 2: f(x) = (1 − x)⁻¹  We probably know that the power series for this function is the geometric series in x, in which all ai = 1.
1. f (0) = 1, so far so good!
Differentiating repeatedly, we find f⁽ⁿ⁾(x) = n! (1 − x)^{−(n+1)}, so
an = f⁽ⁿ⁾(0) / n! = n! (1)^{−(n+1)} / n! = 1
Thus (1 − x)⁻¹ = Σ_{i=0}^{∞} xⁱ, unsurprisingly, since this is the sum of an infinite geometric series in x.
We can check convergence by using the absolute convergence version of D'Alembert's ratio test, where bn = xⁿ are the series terms:
lim_{n→∞} |bn+1 / bn| = lim_{n→∞} |x^{n+1} / xⁿ| = |x|
Thus convergence is given by |x| < 1 and a radius of convergence of 1 for this power series.
ln(1 + x) = x/1 − x²/2 + x³/3 − x⁴/4 + · · ·
Taking series terms of bn = (−1)^{n−1} xⁿ / n, we get a ratio test of lim_{n→∞} |bn+1/bn| = |x| and a convergence condition of |x| < 1.
Indeed we see that if we were to set x = −1, we would get ln 0 = −∞ and a corresponding power series of:
ln 0 = −Σ_{n=1}^{∞} 1/n
On the other hand, setting x = 1 gives the alternating harmonic series and a nice convergence result:
ln 2 = Σ_{n=1}^{∞} (−1)^{n−1} / n
which goes to show that you cannot be sure whether a ratio test of 1 will give a
converging or diverging series.
Example  For a function like f(x) = ln x, we would not be able to create a Maclaurin series since ln 0 is not defined (is singular). So this is an example where an expansion around another point is required. We show a Taylor expansion about x = 2:
1. f(x) = ln x
2. f′(x) = 1/x
3. f″(x) = −1/x²
4. f‴(x) = 2/x³
5. f⁽ⁿ⁾(x) = (−1)^{n−1} (n−1)! / xⁿ for n > 0
We can now show how the Taylor series approximates the function f (x) around the
point x = a in the same way as the Maclaurin series does around the point x = 0.
[Figure 3.2: plot of f(x) = ln x and the partial sums f0(x), f1(x), f2(x), f3(x) over 0 < x ≤ 4.]
Figure 3.2: Successive approximations from the Taylor series expansion of f(x) = ln x around the point x = 2.
Figure 3.2 shows the first four partial sums fn(x) = Σ_{i=0}^{n} f⁽ⁱ⁾(a) (x − a)ⁱ / i! from the Taylor series for f(x) = ln x around the point x = 2, where:
f0(x) = ln 2
f1(x) = ln 2 + (x − 2)/2
f2(x) = ln 2 + (x − 2)/2 − (x − 2)²/8
f3(x) = ln 2 + (x − 2)/2 − (x − 2)²/8 + (x − 2)³/24
As for Maclaurin series we still need to be aware of the radius of convergence for
this series. The techniques for calculating it are identical for Taylor series.
Taking the absolute ratio test for terms bn = (−1)^{n−1} (x − 2)ⁿ / (n 2ⁿ), where we can ignore the ln 2 term since we are looking in the limit of n → ∞:
lim_{n→∞} |bn+1 / bn| = lim_{n→∞} |(x − 2)^{n+1} / ((n+1) 2^{n+1})| / |(x − 2)ⁿ / (n 2ⁿ)| = lim_{n→∞} (|x − 2| / 2) · n/(n+1) = |x − 2| / 2
This gives us a convergence condition of |x − 2| < 2 and a radius of convergence of 2 for this power series. In general the radius of convergence is limited by the nearest singularity (such as x = 0 in this case) for a real series, or pole if we extend to complex series in the complex plane (we may or may not have time to investigate poles!).
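As a numerical check (our own Python sketch), truncating the Taylor series above reproduces ln x well inside |x − 2| < 2:

import math

def taylor_ln(x, n_terms):
    # partial sum of the Taylor series of ln x about a = 2:
    # ln 2 + sum_{n=1}^{N} (-1)^(n-1) / (n * 2^n) * (x - 2)^n
    total = math.log(2)
    for n in range(1, n_terms + 1):
        total += (-1) ** (n - 1) / (n * 2**n) * (x - 2) ** n
    return total

for x in [1.0, 2.5, 3.5]:           # all inside the radius of convergence
    print(x, taylor_ln(x, 30), math.log(x))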
where c is a constant that lies between x and a. The term f⁽ᵏ⁺¹⁾(c) (x − a)^{k+1} / (k+1)! is known as the Lagrange error term and it replaces the tail of the infinite series from the (k+1)th term onwards. Although it is precise, there is no easy way to find c. So in practice the bound a < c < x or x < c < a is used to generate a worst-case error for the term.
[Figure 3.3: a continuous, differentiable function f(x), with the chord joining (a, f(a)) and (x, f(x)) and a point c in between where the tangent is parallel to that chord.]
Figure 3.3 shows a continuous and differentiable function f(x). The mean value theorem states that between two points x and a, there is a point c such that f′(c) is equal to the gradient between the points (a, f(a)) and (x, f(x)).
Given that we can compute the gradient between those two points, we get:
f′(c) = (f(x) − f(a)) / (x − a)
f(x) = f(a) + (x − a) f′(c)    (3.2)
for some c between x and a. Taylor's theorem can be thought of as a general version of the mean value theorem, but expressed in terms of the (k+1)th derivative instead of the 1st.
In fact if you set k = 0 in (3.1), you get the mean value theorem of (3.2).
Fk(t) = Σ_{n=0}^{k} f⁽ⁿ⁾(t) (x − t)ⁿ / n!
Now Fk(a) is the standard form of the kth partial sum and Fk(x) = f(x) for all k. Thus:
Fk(x) − Fk(a) = f(x) − Σ_{n=0}^{k} f⁽ⁿ⁾(a) (x − a)ⁿ / n! = Rk(x)
dy/dx = k y
for constant k, given that y = 1 when x = 0.
Try the series solution
y = Σ_{i=0}^{∞} ai xⁱ
Find the coefficients ai by differentiating term by term, to obtain the identity, for i ≥ 0:
Σ_{i=1}^{∞} ai i x^{i−1} ≡ Σ_{i=0}^{∞} k ai xⁱ ≡ Σ_{i=1}^{∞} k a_{i−1} x^{i−1}
Matching coefficients of x^{i−1}:
ai = (k/i) a_{i−1} = (k/i) · (k/(i−1)) a_{i−2} = · · · = (kⁱ/i!) a0
When x = 0, y = a0, so a0 = 1 by the boundary condition. Thus
y = Σ_{i=0}^{∞} (kx)ⁱ / i! = e^{kx}
Chapter 4
Differential Equations and Calculus
• Terminology:
– Ordinary differential equations (ODEs) are first order if they contain a dy/dx term but no higher derivatives
– ODEs are second order if they contain a d²y/dx² term but no higher derivatives
– For example, 2 dy/dx + y = 0 (∗)
– Try: y = e^{mx}
⇒ 2m e^{mx} + e^{mx} = 0
⇒ e^{mx} (2m + 1) = 0
⇒ e^{mx} = 0 or m = −1/2
– e^{mx} ≠ 0 for any x, m. Therefore m = −1/2
– General solution to (∗): y = A e^{−x/2}
– For example, d²y/dx² − 6 dy/dx + 9y = 0 (∗)
– Try: y = e^{mx}
⇒ m² e^{mx} − 6m e^{mx} + 9 e^{mx} = 0
⇒ e^{mx} (m² − 6m + 9) = 0
⇒ e^{mx} (m − 3)² = 0
– m = 3 (twice)
– General solution to (∗) for repeated roots:
y = (Ax + B) e^{3x}
• Coupled ODEs are used to model massive state-space physical and computer systems
• If we let y = (y1, y2)ᵀ, we can rewrite the coupled pair dy1/dx = a y1 + b y2 and dy2/dx = c y1 + d y2 as:
(dy1/dx, dy2/dx)ᵀ = [a b; c d] (y1, y2)ᵀ,   or   dy/dx = [a b; c d] y
where [a b; c d] denotes the 2 × 2 coefficient matrix.
Optimisation
• Example: look to find the best predicted gain in a portfolio given different possible share holdings in the portfolio
Differentiation
δy/δx = (f(x + δx) − f(x)) / δx
df/dx = lim_{δx→0} (f(x + δx) − f(x)) / δx
• Take f(x) = xⁿ
Derivative of xⁿ
• df/dx = lim_{δx→0} (f(x + δx) − f(x)) / δx
= lim_{δx→0} ((x + δx)ⁿ − xⁿ) / δx
= lim_{δx→0} (Σ_{i=0}^{n} (n choose i) x^{n−i} δxⁱ − xⁿ) / δx
= lim_{δx→0} (Σ_{i=1}^{n} (n choose i) x^{n−i} δxⁱ) / δx
= lim_{δx→0} Σ_{i=1}^{n} (n choose i) x^{n−i} δx^{i−1}
= lim_{δx→0} ( (n choose 1) x^{n−1} + Σ_{i=2}^{n} (n choose i) x^{n−i} δx^{i−1} )   [the second term → 0 as δx → 0]
= n! / (1!(n−1)!) x^{n−1} = n x^{n−1}
• Finding the derivative involves finding the gradient of the function by varying one variable and keeping the others constant
– For example, for f(x, y) = x²y + xy³: ∂f/∂x = 2xy + y³ and ∂f/∂y = x² + 3xy²
[Figure: surface plot of f(x, y) = x² + y².]
Further Examples
• f(x, y) = (x + 2y³)² ⇒ ∂f/∂x = 2(x + 2y³) · ∂/∂x(x + 2y³) = 2(x + 2y³)
df/dt = (∂f/∂x)(dx/dt) + (∂f/∂y)(dy/dt)
4.2.4 Jacobian
• The modulus (determinant) of this matrix is called the Jacobian:
J = | ∂x/∂s  ∂y/∂s |
    | ∂x/∂t  ∂y/∂t |
we would use: du ≡ (df(x)/dx) dx
Formal Definition
∂f/∂x = lim_{δx→0} (f(x + δx, y) − f(x, y)) / δx
Further Notation
Chapter 5
Complex Numbers
5.1 Introduction
We can see the need for complex numbers by looking at the shortcomings of all the simpler (more obvious) number systems that preceded them. In each case the next number system in some sense fixes a perceived problem or omission with the previous one:
Z Integers, the natural numbers with 0 and negative numbers, not closed under
division
Q Rational numbers, closed under arithmetic operations but cannot represent the
solution of all non-linear equations, e.g., x2 = 2
R Real numbers, solutions to some quadratic equations with real roots and some
higher-order equations, but not all, e.g., x2 + 1 = 0
C Complex numbers, we require these to represent all the roots of all polynomial
equations.1
Another important use of complex numbers is that often a real problem can be solved
by mapping it into complex space, deriving a solution, and mapping back again: a
direct solution may not be possible or would be much harder to derive in real space,
e.g., finding solutions to integration or summation problems, such as
I = ∫_0^x e^{aθ} cos bθ dθ    or    S = Σ_{k=0}^{n} aᵏ cos kθ .    (5.1)
5.1.1 Applications
Complex numbers are important in many areas. Here are some:
1 Complex numbers form an algebraically closed field, where any polynomial equation has a root.
• Machine learning: Using a pair of uniformly distributed random numbers (x, y),
we can generate random numbers in polar form (r cos(θ), r sin(θ)). This can
lead to efficient sampling methods like the Box-Muller transform (Box and
Muller, 1958).2 The variant of the Box-Muller transform using complex num-
bers was proposed by Knop (1969).
x² + 1 = 0,    (5.2)
There is no way of squeezing this into R; it cannot be compared with a real number (in contrast to √2 or π, which we can compare with rationals and get arbitrarily accurate approximations in the rationals). We call i the imaginary number/unit, orthogonal to the reals.
4. In general, i^{−2n} = 1/i^{2n} = 1/(−1)ⁿ = (−1)ⁿ, and i^{−(2n+1)} = i^{−2n} i^{−1} = (−1)^{n+1} i for all n ∈ N
5. i⁰ = 1
2 This is a pseudo-random number sampling method, e.g., for generating pairs of independent,
standard, normally distributed (zero mean, unit variance) random numbers, given a source of uni-
formly distributed random numbers.
[Figure 5.1: the complex plane with a point z = (x, y) = x + iy; horizontal axis Re, vertical axis Im.]
Figure 5.1: Complex plane (Argand diagram). A complex number can be represented in a two-dimensional Cartesian coordinate system with coordinates x and y. x is the real part and y is the imaginary part of a complex number z = x + iy.
C := {a + ib : a, b ∈ R, i² = −1}    (5.4)
as the set of tuples (a, b) ∈ R² with the following definition of addition and multiplication:
(a, b) + (c, d) := (a + c, b + d) ,    (5.5)
(a, b) · (c, d) := (ac − bd, ad + bc) .    (5.6)
In this context, the element i := (0, 1) is the imaginary number/unit. With the complex multiplication defined in (5.6), we immediately obtain
i² = (0, 1) · (0, 1) = (−1, 0) ,
which corresponds to the real number −1.
[Figure 5.2: two complex numbers z1, z2 and their sum z1 + z2 in the Argand diagram.]
Figure 5.2: Visualization of complex addition. As known from geometry, we simply add the two vectors representing complex numbers.
[Figure: polar representation of a complex number: a point at distance r from the origin and angle φ, with Cartesian components r cos φ and r sin φ.]
z = r exp(iφ) (5.12)
where r and φ are the polar coordinates. We already know that z = r(cos φ + i sin φ),
i.e., it must also hold that r exp(iφ) = r(cos φ+i sin φ). This can be proved by looking
at the power series expansions of exp, sin, and cos:
exp(iφ) = Σ_{k=0}^{∞} (iφ)ᵏ / k! = 1 + iφ + (iφ)²/2! + (iφ)³/3! + (iφ)⁴/4! + (iφ)⁵/5! + · · ·    (5.13)
= 1 + iφ − φ²/2! − iφ³/3! + φ⁴/4! + iφ⁵/5! ∓ · · ·    (5.14)
= (1 − φ²/2! + φ⁴/4! ∓ · · ·) + i (φ − φ³/3! + φ⁵/5! ∓ · · ·)    (5.15)
= Σ_{k=0}^{∞} (−1)ᵏ φ²ᵏ / (2k)! + i Σ_{k=0}^{∞} (−1)ᵏ φ^{2k+1} / (2k+1)! = cos φ + i sin φ .    (5.16)
Therefore, z = exp(iφ) is a complex number, which lives on the unit circle (|z| = 1)
and traces out the unit circle in the complex plane as φ ranges through the real
numbers.
x = r cos φ, y = r sin φ;    r = √(x² + y²), tan φ = y/x (plus quadrant);    z = x + iy = r(cos φ + i sin φ)
Figure 5.5: Transformation between Cartesian and polar coordinate representations of complex numbers.
Figure 5.5 summarizes the transformation between Cartesian and polar coordinate
representations of complex numbers z. We have to pay some attention when com-
puting Arg (z) when transforming Cartesian coordinates into polar coordinates.
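Python's standard cmath module performs exactly this transformation; the sketch below (our own, using the example z = 2 − 2i from the list that follows) converts to polar form and back:

import cmath, math

z = complex(2, -2)
r, phi = cmath.polar(z)          # r = |z|, phi = Arg(z) in (-pi, pi]
if phi < 0:
    phi += 2 * math.pi           # shift into [0, 2*pi) to match the convention used here
print(r, 2 * math.sqrt(2))       # r = 2*sqrt(2)
print(phi, 7 * math.pi / 4)      # phi = 7*pi/4
print(cmath.rect(r, phi), z)     # converting back recovers x + iy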
[Figure 5.6: four complex numbers in the Argand diagram: (a) (r, φ) = (2, 2π/3); (b) (x, y) = (2, −2), i.e., z = 2 − 2i; (c) z = −1 + i; (d) z = −(3/2)i.]
that y/x has two possible angles, which differ by π, see Figure 5.7. By looking at the
quadrant in which the complex number z lives we can resolve this ambiguity. Let us
have a look at some examples:
1. z = 2 − 2i. We immediately obtain r = √(2² + 2²) = 2√2. For the argument, we obtain tan φ = −2/2 = −1. Therefore, φ ∈ {3π/4, 7π/4}. We identify the correct argument by plotting the complex number and identifying the quadrant. Figure 5.6(b) shows that z lies in the fourth quadrant. Therefore, φ = 7π/4.
2. z = −1 + i.
r = √(1 + 1) = √2    (5.19)
tan φ = 1/(−1) = −1 ⇒ φ ∈ {3π/4, 7π/4} .    (5.20)
Figure 5.6(c) shows that z lies in the second quadrant. Therefore, φ = 3π/4.
3. z = −(3/2) i.
r = 3/2    (5.21)
tan φ = (−3/2)/0, undefined ⇒ φ ∈ {π/2, 3π/2}    (5.22)
Figure 5.6(d) shows that z lies between the third and fourth quadrant (and not between the first and second). Therefore, φ = 3π/2.
[Figure 5.7: the tangent function, showing two angles φ1 and φ2 = φ1 + π with the same tangent value.]
Figure 5.7: Tangent. Since the tangent has a period of π, there are two solutions for the argument 0 ≤ φ < 2π of a complex number, which differ by π.
[Figure 5.8: two complex numbers z1, z2 and their product z1 z2 in the Argand diagram.]
Figure 5.8: Complex multiplication. When we multiply two complex numbers z1, z2, the corresponding distances r1 and r2 are multiplied while the corresponding arguments φ1, φ2 are summed up.
[Figure 5.9: z = x + iy and its complex conjugate z̄ = x − iy in the Argand diagram.]
Figure 5.9: The complex conjugate z̄ is a reflection of z about the real axis.
can be computed efficiently: The distance r to the origin is simply raised to the
power of n and the argument is scaled/multiplied by n. This also immediately gives
us the result
2. ℑ(z̄) = −ℑ(z)
3. z + z̄ = 2x = 2ℜ(z) ∈ R
5. The conjugate of z1 + z2 is z̄1 + z̄2
Notice that the term 'absolute value' is the same as defined for real numbers when ℑ(z) = 0. In this case, |z| = |x|.
The absolute value of the product has the following nice property that matches the product result for real numbers:
|z1 z2| = |z1| |z2| .
The inverse of a non-zero complex number z = x + iy is
1/z = z̄ / (z z̄) = z̄ / |z|² = (x − iy) / (x² + y²) .    (5.31)
This can be written z⁻¹ = |z|⁻² z̄, using only the complex operators multiply and add, see (5.5) and (5.6), but also real division, which we already know. Complex division is now defined by z1/z2 = z1 z2⁻¹. In practice, we compute the division z1/z2 by expanding the fraction by the complex conjugate of the denominator. This ensures that the denominator's imaginary part is 0 (only the real part remains), and the overall fraction can be written as
z1/z2 = (z1 z̄2) / (z2 z̄2) = (z1 z̄2) / |z2|²    (5.32)
Example: Bring the following fraction into the form x + iy:
z = x + iy = (3 + 2i) / (7 − 3i)    (5.34)
Solution:
(3 + 2i)/(7 − 3i) = (3 + 2i)(7 + 3i) / ((7 − 3i)(7 + 3i)) = (15 + 23i) / (49 + 9) = 15/58 + i 23/58    (5.35)
Now, the fraction can be written as z = x + iy with x = 15/58 and y = 23/58.
The proof is done by induction (which you will see in detail in the course Reasoning
about Programs). A proof by induction allows you to prove that a property is true for
all values of a natural number n. To construct an induction proof, you have to prove
that the property, P (n), is true for some base value (say, n = 1). A further proof is
required to show that if it is true for the parameter n = k, then that implies it is also
true for the parameter n = k + 1: that is P (k) ⇒ P (k + 1) for all k ≥ 1. The two proofs
combined allow us to build an arbitrary chain of implication up to some value n = m:
P(1) ⇒ P(2) ⇒ · · · ⇒ P(m)
Proof 1
We start the induction proof by checking whether de Moivre's theorem holds for n = 1:
(cos φ + i sin φ)¹ = cos φ + i sin φ
is trivially true, and we can now make the induction step: We assume that (5.36) is true for k and show that it also holds for k + 1.
Assuming
(cos φ + i sin φ)ᵏ = cos kφ + i sin kφ ,
we can write
(cos φ + i sin φ)^{k+1} = (cos φ + i sin φ)ᵏ (cos φ + i sin φ)
= (cos kφ + i sin kφ)(cos φ + i sin φ)
= (cos kφ cos φ − sin kφ sin φ) + i (sin kφ cos φ + cos kφ sin φ)
= cos(k + 1)φ + i sin(k + 1)φ
using the trigonometric addition formulae.
We have tackled the case n > 0 already, and n = 0 can be shown individually. So we take the case n < 0. We let n = −m for m > 0.
(cos φ + i sin φ)ⁿ = 1 / (cos φ + i sin φ)^m
= 1 / (cos mφ + i sin mφ)    by de Moivre's theorem
= (cos mφ − i sin mφ) / (cos² mφ + sin² mφ)
= cos(−mφ) + i sin(−mφ)    using the identity cos² mφ + sin² mφ = 1
= cos nφ + i sin nφ
Hence cos(pφ/q) + i sin(pφ/q) is one of the qth roots of (cos φ + i sin φ)ᵖ.
The qth roots of cos φ + i sin φ are easily obtained. We need to use the fact that (repeatedly) adding 2π to the argument of a complex number does not change the complex number.
(cos φ + i sin φ)^{1/q} = (cos(φ + 2nπ) + i sin(φ + 2nπ))^{1/q}    (5.41)
= cos((φ + 2nπ)/q) + i sin((φ + 2nπ)/q)   for 0 ≤ n < q    (5.42)
We will use this later to calculate roots of complex numbers.
Finally, the full set of values for (cos φ + i sin φ)ⁿ for n = p/q ∈ Q is:
cos((pφ + 2nπ)/q) + i sin((pφ + 2nπ)/q)   for 0 ≤ n < q    (5.43)
Proof 2
Let z1 = x1 + iy1 and z2 = x2 + iy2 . Squaring the left-hand side of (5.45) yields
The geometrical argument via the Argand diagram is a good way to understand the
triangle inequality.
A root z∗ of p(z) satisfies p(z∗) = 0. Bear in mind that complex roots include all real roots, as the real numbers are a subset of the complex numbers. Also some of the roots might be coincident, e.g., for z² = 0. Finally, we also know that if ω is a root and ω ∈ C\R, then its conjugate ω̄ is also a root. So all truly complex roots occur in complex conjugate pairs.
zn = 1 , n ∈ N, (5.53)
for which we want to determine the roots. The fundamental theorem of algebra tells
us that there exist exactly n roots, one of which is z = 1.
[Figure 5.10: the 8th roots of unity on the unit circle in the Argand diagram.]
Figure 5.10: The nth roots of zⁿ = 1 lie on the unit circle and form a regular polygon. Here, we show this for n = 8.
To find the other solutions, we write (5.53) in a slightly different form using the
Euler representation:
zn = 1 = eik2π , ∀k ∈ Z . (5.54)
The 3rd roots of 1 are z = e^{2kπi/3} for k = 0, 1, 2, i.e., 1, e^{2πi/3}, e^{4πi/3}. These are often referred to as ω1, ω2 and ω3, and simplify to
ω1 = 1 ,
ω2 = cos 2π/3 + i sin 2π/3 = (−1 + i√3)/2 ,
ω3 = cos 4π/3 + i sin 4π/3 = (−1 − i√3)/2 .
Try cubing each solution directly to validate that they are indeed cubic roots.
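A two-line Python check (our own) computes the three roots and cubes them:

import cmath

roots = [cmath.exp(2j * cmath.pi * k / 3) for k in range(3)]   # e^{2k*pi*i/3}, k = 0, 1, 2
for w in roots:
    print(w, w**3)    # each cube equals 1, up to floating-point rounding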
5.6.2 Solution of zn = a + ib
Finding the n roots of zⁿ = a + ib is similar to the approach discussed above: Let a + ib = r e^{iφ} in polar form. Then, for k = 0, 1, . . . , n − 1, the roots are
z = r^{1/n} e^{i(φ + 2kπ)/n} .
Example
Determine the cube roots of 1 − i.
1. The polar coordinates of 1 − i are r = √2, φ = 7π/4, and the corresponding Euler representation is
z = √2 exp(i 7π/4) .    (5.57)
The only distinction here is the meaning of |zn − l|, which refers to the complex
absolute value and not the absolute real value.
⇒ n > √(1/ε² − 1)    for ε ≤ 1    (5.66)
We have to be a tiny bit careful, as N(ε) needs to be defined for all ε > 0 and the penultimate line of the limit inequality is true for all n > 0 if ε > 1. In essence this was no different in structure from the normal sequence convergence proof. The only difference was how we treated the absolute value.
Absolute Convergence
Similarly, a complex series Σ_{n=1}^{∞} zn is absolutely convergent if Σ_{n=1}^{∞} |zn| converges. Again the |zn| refers to the complex absolute value.
and diverges if
lim_{n→∞} |zn+1 / zn| > 1 .    (5.69)
We can prove that this will converge for some values of z ∈ C in the same way we could for the real-valued series. Applying the complex ratio test, we get lim_{n→∞} |a zⁿ / (a z^{n−1})| = |z|. We apply the standard condition and get that |z| < 1 for this series to converge. The radius of convergence is still 1 (and is an actual radius of a circle in the complex plane). What is different here is that now any z-point taken from within the circle centred on the origin with radius 1 will make the series converge, not just on the real interval (−1, 1).
For your information, the limit of this series is a/(1 − z), which you can show using Maclaurin as usual, discussed below.
∀z ∈ C, n ∈ Z : z = r e^{i(φ + 2nπ)}    (5.74)
since e^{i2nπ} = cos 2nπ + i sin 2nπ = 1. This is the same general form we used in the rational extension to De Moivre's theorem to access the many roots of a complex number.
In terms of the Argand diagram, the points e^{i(φ + 2nπ)} for n ≥ 1 lie on top of each other, each corresponding to one more revolution (through 2π).
The complex conjugate of e^{iφ} is e^{−iφ} = cos φ − i sin φ. This allows us to get useful expressions for sin φ and cos φ:
cos φ = (e^{iφ} + e^{−iφ}) / 2 ,    sin φ = (e^{iφ} − e^{−iφ}) / (2i)
and, therefore,
cos⁶ φ = (1/32)(cos 6φ + 6 cos 4φ + 15 cos 2φ + 10) .    (5.84)
Let C = Σ_{k=0}^{n} aᵏ cos kφ and S = Σ_{k=1}^{n} aᵏ sin kφ. Then,
C + iS = Σ_{k=0}^{n} aᵏ e^{ikφ} = (1 − (a e^{iφ})^{n+1}) / (1 − a e^{iφ}) .    (5.86)
Hence,
5.9.3 Integrals
We can determine the integrals
C = ∫_0^x e^{aφ} cos bφ dφ ,    (5.91)
S = ∫_0^x e^{aφ} sin bφ dφ    (5.92)
by considering the combination C + iS = ∫_0^x e^{(a+ib)φ} dφ.
Chapter 6
Linear Algebra
This chapter is largely based on the lecture notes and books by Drumm and Weil
(2001); Strang (2003); Hogben (2013) as well as Pavel Grinfeld’s Linear Algebra
series1 . Another excellent source is Gilbert Strang’s Linear Algebra lecture at MIT2 .
Linear algebra is the study of vectors. Generally, vectors are special objects that can
be added together and multiplied by scalars to produce another object of the same
kind. Any object that satisfies these two properties can be considered a vector. Here
are three examples of such vectors:
1. Geometric vectors. This example of a vector may be familiar from High School.
Geometric vectors are directed segments, which can be drawn, see Fig. 6.1.
Two vectors x~, y~ can be added, such that x~ + y~ = ~z is another geometric vector.
Furthermore, λx~, λ ∈ R is also a geometric vector. In fact, it is the original
vector scaled by λ. Therefore, geometric vectors are instances of the vector
concepts introduced above.
2. Polynomials are also vectors: Two polynomials can be added together, which results in another polynomial; and they can be multiplied by a scalar λ ∈ R, and the result is a polynomial as well. Therefore, polynomials are (rather unusual) instances of vectors. Note that polynomials are very different from geometric vectors. While geometric vectors are concrete "drawings", polynomials are abstract concepts. However, they are both vectors.
3. Rⁿ is a set of numbers, and its elements are n-tuples. Rⁿ is even more abstract than polynomials, and the most general concept we consider in this course. For example,
a = (1, 2, 3)ᵀ ∈ R³    (6.1)
[Figure 6.1: two geometric vectors x⃗ and y⃗ drawn as arrows in the plane.]
Figure 6.1: Example of two geometric vectors in two dimensions.
Linear algebra focuses on the similarities between these vector concepts: We can
add them together and multiply them by scalars. We will largely focus on the third
kind of vectors since most algorithms in linear algebra are formulated in Rn . There
is a 1:1 correspondence between any kind of vector and Rn . By studying Rn , we
implicitly study all other vectors. Although Rn is rather abstract, it is most useful.
• Data visualization
• State estimation and optimal control (e.g., in robotics and dynamical systems)
a11 x1 + · · · + a1n xn = b1
⋮
am1 x1 + · · · + amn xn = bm ,    (6.3)
where aij ∈ R and bi ∈ R. Equation (6.3) is the general form of a linear equation
system, and x1 , . . . , xn are the unknowns of this linear equation system. Every n-
tuple (x1 , . . . , xn ) ∈ Rn that satisfies (6.3) is a solution of the linear equation system.
The linear equation system
x1 + x2 + x3 = 3 (1)
x1 − x2 + 2x3 = 2 (2) (6.4)
2x1 + 3x3 = 1 (3)
has no solution: Adding the first two equations yields (1)+(2) = 2x1 +3x3 = 5, which
contradicts the third equation (3).
Let us have a look at the linear equation system
x1 + x2 + x3 = 3 (1)
x1 − x2 + 2x3 = 2 (2) . (6.5)
x2 + x3 = 2 (3)
From the first and third equation it follows that x1 = 1. From (1)+(2) we get 2+3x3 =
5, i.e., x3 = 1. From (3), we then get that x2 = 1. Therefore, (1, 1, 1) is the only
possible and unique solution (verify by plugging in).
As a third example, we consider
x1 + x2 + x3 = 3 (1)
x1 − x2 + 2x3 = 2 (2) . (6.6)
2x1 + 3x3 = 5 (3)
Since (1)+(2)=(3), we can omit the third equation (redundancy). From (1) and
(2), we get 2x1 = 5 − 3x3 and 2x2 = 1 + x3 . We define x3 = a ∈ R as a free variable,
such that any triplet
(5/2 − (3/2)a, 1/2 + (1/2)a, a) ,    a ∈ R    (6.7)
is a solution to the linear equation system, i.e., we obtain a solution set that contains
infinitely many solutions.
In general, for a real-valued linear equation system we obtain either no, exactly one
or infinitely many solutions.
For a systematic approach to solving linear equation systems, we will introduce a
useful compact notation. We will write the linear equation system from (6.3) in the
following form:
x1 (a11, . . . , am1)ᵀ + x2 (a12, . . . , am2)ᵀ + · · · + xn (a1n, . . . , amn)ᵀ = (b1, . . . , bm)ᵀ
⇔ A x = b ,   where A is the m × n matrix of coefficients aij, x = (x1, . . . , xn)ᵀ and b = (b1, . . . , bm)ᵀ.    (6.8)
In order to work with these matrices, we need to have a close look at the underlying
algebraic structures and define computation rules.
6.2 Groups
Groups play an important role in computer science. Besides providing a fundamental
framework for operations on sets, they are heavily used in cryptography, coding
theory and graphics.
6.2.1 Definitions
Consider a set G and an operation ⊗ : G → G defined on G. For example, ⊗ could be
+, · defined on R, N, Z or ∪, ∩, \ defined on P (B), the power set of B.
Then (G, ⊗) is called a group if
• Associativity: ∀x, y, z ∈ G : (x ⊗ y) ⊗ z = x ⊗ (y ⊗ z)
6.2.2 Examples
(Z, +) is a group, whereas (N0, +)⁵ is not: Although (N0, +) possesses a neutral element (0), the inverse elements are missing.
(Z, ·) is not a group: Although (Z, ·) contains a neutral element (1), the inverse elements for any z ∈ Z, z ≠ ±1, are missing.
(R, ·) is not a group since 0 does not possess an inverse element. However, (R\{0}, ·) is an Abelian group.
(Rⁿ, +), (Zⁿ, +), n ∈ N, are Abelian if + is defined componentwise, i.e.,
(x1, · · · , xn) + (y1, · · · , yn) = (x1 + y1, · · · , xn + yn) .    (6.9)
Then, e = (0, · · · , 0) is the neutral element and (x1, · · · , xn)⁻¹ := (−x1, · · · , −xn) is the inverse element.
6.3 Matrices
Definition 1 (Matrix)
With m, n ∈ N a real-valued (m, n) matrix A is an m·n-tuple of elements aij , i = 1, . . . , m,
j = 1, . . . , n, which is ordered according to a rectangular scheme consisting of m rows
and n columns:
A =
[ a11  a12  · · ·  a1n ]
[ a21  a22  · · ·  a2n ]
[  ⋮     ⋮            ⋮  ]
[ am1  am2  · · ·  amn ] ,    aij ∈ R .    (6.10)
(1, n)-matrices are called rows, (m, 1)-matrices are called columns. These special matrices are also called row/column vectors.
R^{m×n} is the set of all real-valued (m, n)-matrices. A ∈ R^{m×n} can be equivalently represented as a ∈ R^{mn}. Therefore, (R^{m×n}, +) is an Abelian group (with componentwise addition as defined in (6.9)).
For A ∈ R^{n×k} and B ∈ R^{k×m}, the elements cij of the product C = AB ∈ R^{n×m} are
cij = Σ_{l=1}^{k} ail blj ,   i = 1, . . . , n,  j = 1, . . . , m.    (6.11)
This means, to compute element cij we multiply the elements of the ith row of A with the jth column of B⁶ and sum them up.⁷
⁵ N0 = N ∪ {0}
⁶ They are both of length k, such that we can compute ail blj for l = 1, . . . , k.
⁷ Later, we will call this the scalar product or dot product of the corresponding row and column.
Remark 1
Matrices can only be multiplied if their "neighboring" dimensions match. For instance, an n × k-matrix A can be multiplied with a k × m-matrix B, but only from the left side:
A B = C ,    where A is n × k, B is k × m and C is n × m.    (6.12)
The product BA is not defined if m ≠ n since the neighboring dimensions do not match.
Remark 2
Note that matrix multiplication is not defined as an element-wise operation on matrix elements, i.e., cij ≠ aij bij (even if the size of A, B was chosen appropriately).⁸
Example
For
A =
[ 1  2  3 ]
[ 3  2  1 ]  ∈ R^{2×3} ,    B =
[ 0   2 ]
[ 1  −1 ]
[ 0   1 ]  ∈ R^{3×2} ,
we obtain
AB =
[ 2  3 ]
[ 2  5 ]  ∈ R^{2×2} ,    (6.13)
BA =
[  6  4  2 ]
[ −2  0  2 ]
[  3  2  1 ]  ∈ R^{3×3} .    (6.14)
From this example, we can already see that matrix multiplication is not commutative, i.e., AB ≠ BA.
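Using numpy (a sketch of our own, not part of the notes), the two products of this example can be reproduced directly:

import numpy as np

A = np.array([[1, 2, 3],
              [3, 2, 1]])        # 2 x 3
B = np.array([[0, 2],
              [1, -1],
              [0, 1]])           # 3 x 2

print(A @ B)    # the 2 x 2 product AB from (6.13)
print(B @ A)    # the 3 x 3 product BA from (6.14); clearly AB != BA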
In R^{n×n}, we define the identity matrix as
I_n =
[ 1  0  · · ·  0 ]
[ 0  1  · · ·  0 ]
[ ⋮       ⋱    ⋮ ]
[ 0  0  · · ·  1 ]  ∈ R^{n×n} .    (6.15)
With this, A · I_n = A = I_n A for all A ∈ R^{n×n}. Therefore, the identity matrix is the neutral element with respect to matrix multiplication "·" in (R^{n×n}, ·).⁹
⁸ This kind of element-wise multiplication appears often in computer science where we multiply (multi-dimensional) arrays with each other.
⁹ If A ∈ R^{m×n} then I_n is only a right neutral element, such that A I_n = A. The corresponding left-neutral element would be I_m since I_m A = A.
Properties
• Associativity: ∀A ∈ R^{m×n}, B ∈ R^{n×p}, C ∈ R^{p×q} : (AB)C = A(BC)
• Distributivity: ∀A1, A2 ∈ R^{m×n}, B ∈ R^{n×p} : (A1 + A2)B = A1B + A2B and A(B + C) = AB + AC
• ∀A ∈ R^{m×n} : I_m A = A I_n = A. Note that I_m ≠ I_n for m ≠ n.
• (AB)⁻¹ = B⁻¹ A⁻¹
• (Aᵀ)ᵀ = A
• (A + B)ᵀ = Aᵀ + Bᵀ
• (AB)ᵀ = Bᵀ Aᵀ
• If A is invertible, (A⁻¹)ᵀ = (Aᵀ)⁻¹
• Note: (A + B)⁻¹ ≠ A⁻¹ + B⁻¹. Example: in the scalar case, 1/(2 + 4) = 1/6 ≠ 1/2 + 1/4.
A is symmetric if A = Aᵀ. Note that this can only hold for (n, n)-matrices (quadratic matrices¹⁰). The sum of symmetric matrices is symmetric, but this does not hold for the product in general (although it is always defined). A counterexample is
[ 1  0 ] [ 1  1 ]   [ 1  1 ]
[ 0  0 ] [ 1  1 ] = [ 0  0 ] .    (6.16)
¹⁰ The number of columns equals the number of rows.
¹¹ The main diagonal (sometimes principal diagonal, primary diagonal, leading diagonal, or major diagonal) of a matrix A is the collection of entries Aij where i = j.
• Distributivity:
(λ + ψ)C = λC + ψC, C ∈ Rm×n
λ(B + C) = λB + λC, B, C ∈ Rm×n
• Associativity:
(λψ)C = λ(ψC), C ∈ Rm×n
λ(BC) = (λB)C = B(λC), B ∈ Rm×n , C ∈ Rn×k .
Note that this allows us to move scalar values around.
and use the rules for matrix multiplication, we can write this equation system in a
more compact form as
Note that x1 scales the first column, x2 the second one, and x3 the third one.
Generally, linear equation systems can be compactly represented in their matrix form
as Ax = b, see (6.3), and the product Ax is a (linear) combination of the columns of
A.12
a11 x1 + · · · + a1n xn = b1
⋮
am1 x1 + · · · + amn xn = bm ,    (6.18)
¹² We will discuss linear combinations in Section 6.5.
Consider the linear equation system
[ 1  0  8  −4 ]       [ 42 ]
[ 0  1  2  12 ]  x  = [  8 ] .    (6.19)
This equation system is in a particularly easy form, where the first two columns consist of a 1 and a 0.¹³ Remember that we want to find scalars x1, . . . , x4, such that Σ_{i=1}^{4} xi ci = b, where we define ci to be the ith column of the matrix and b the right-hand side of (6.19). A solution to the problem in (6.19) can be found immediately by taking 42 times the first column and 8 times the second column, i.e.,
b = [42, 8]ᵀ = 42 [1, 0]ᵀ + 8 [0, 1]ᵀ .    (6.20)
Therefore, one solution vector is [42, 8, 0, 0]ᵀ. This solution is called a particular solution or special solution. However, this is not the only solution of this linear equation system. To capture all the other solutions, we need to be creative in generating 0 in a non-trivial way using the columns of the matrix: adding a couple of 0s to our special solution does not change the special solution. To do so, we express the third column using the first two columns (which are of this very simple form):
[8, 2]ᵀ = 8 [1, 0]ᵀ + 2 [0, 1]ᵀ ,    (6.21)
such that 0 = 8c1 + 2c2 − 1c3 + 0c4. In fact, any scaling of this solution produces the 0 vector:
λ1 [8, 2, −1, 0]ᵀ ,    λ1 ∈ R.    (6.22)
¹³ Later, we will say that this matrix is in reduced row echelon form.
Following the same line of reasoning, we express the fourth column of the matrix in (6.19) using the first two columns and generate another set of non-trivial versions of 0 as
λ2 [−4, 12, 0, −1]ᵀ ,    λ2 ∈ R.    (6.23)
Putting everything together, we obtain all solutions of the linear equation system in (6.19), which is called the general solution, as
[42, 8, 0, 0]ᵀ + λ1 [8, 2, −1, 0]ᵀ + λ2 [−4, 12, 0, −1]ᵀ ,    λ1, λ2 ∈ R.    (6.24)
Remark 5
• The general approach we followed consisted of the following three steps:
1. Find a particular solution to Ax = b
2. Find all solutions to Ax = 0
3. Combine the solutions from 1. and 2. to the general solution.
• Neither the general nor the particular solution is unique.
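As a numerical sanity check of these three steps (an added sketch, not part of the original notes), we can assemble the matrix whose columns c1, . . . , c4 were used in (6.20)–(6.23), namely c1 = [1, 0]^⊤, c2 = [0, 1]^⊤, c3 = [8, 2]^⊤, c4 = [−4, 12]^⊤, and verify the particular solution and the kernel vectors from (6.24) in Python/NumPy:

    import numpy as np

    A = np.array([[1.0, 0.0, 8.0, -4.0],
                  [0.0, 1.0, 2.0, 12.0]])
    b = np.array([42.0, 8.0])

    x_particular = np.array([42.0, 8.0, 0.0, 0.0])      # step 1: solves Ax = b
    kernel = [np.array([8.0, 2.0, -1.0, 0.0]),          # step 2: solutions of Ax = 0
              np.array([-4.0, 12.0, 0.0, -1.0])]

    assert np.allclose(A @ x_particular, b)
    for v in kernel:
        assert np.allclose(A @ v, 0.0)

    # step 3: every member of the general solution (6.24) solves Ax = b
    lam1, lam2 = 3.0, -1.5
    assert np.allclose(A @ (x_particular + lam1 * kernel[0] + lam2 * kernel[1]), b)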
The linear equation system in the example above was easy to solve because the
matrix in (6.19) has this particularly convenient form, which allowed us to find
the particular and the general solution by inspection. However, general equation
systems are not of this simple form. Fortunately, there exists a constructive way of
transforming any linear equation system into this particularly simple form: Gaussian
elimination.
The rest of this section will introduce Gaussian elimination, which will allow us to
solve all kinds of linear equation systems by first bringing them into a simple form
and then applying the three steps to the simple form that we just discussed in the
context of the example in (6.19), see Remark 5.
Example
If we now multiply the second equation with (−1) and the third equation with −1/3,
we obtain the row echelon form
x1 − 2x2 + x3 − x4 + x5 = 0
x3 − x4 + 3x5 = −2
x4 − 2x5 = 1
0 = a + 1    (6.30)
Only for a = −1, this equation system can be solved. A particular solution is given
by
[x1 , x2 , x3 , x4 , x5 ]^⊤ = [2, 0, −1, 1, 0]^⊤    (6.31)
and the general solution, which captures the set of all possible solutions, is given
as
[x1 , x2 , x3 , x4 , x5 ]^⊤ = [2, 0, −1, 1, 0]^⊤ + λ1 [2, 1, 0, 0, 0]^⊤ + λ2 [2, 0, −1, 2, 1]^⊤ ,   λ1 , λ2 ∈ R    (6.32)
From here, we find relatively directly that λ3 = 1, λ2 = −1, λ1 = 2. When we put every-
thing together, we must not forget the non-pivot columns for which we set the coefficients
implicitly to 0. Therefore, we get the particular solution
x = [2, 0, −1, 1, 0]^⊤ .    (6.34)
Example 2
In the following, we will go through solving a linear equation system in matrix form.
Consider the problem of finding x = [x1 , x2 , x3 ]> , such that Ax = b, where
the augmented matrix [A | b] is given as
1 2 3 | 4
4 5 6 | 6
7 8 9 | 8 ,
which we now transform into row echelon form using the elementary row opera-
tions:
1 2 3 | 4            1  2   3 |   4
4 5 6 | 6  −4R1  ⇝  0 −3  −6 | −10  ·(−1/3)
7 8 9 | 8  −7R1      0 −6 −12 | −20  −2R2

     1 2 3 |  4
⇝   0 1 2 | 10/3
     0 0 0 |  0
From the row echelon form, we see that x3 is a free variable. To find a particular
solution, we can set x3 to any real number. For convenience, we choose x3 = 0, but
any other number would have worked. With x3 = 0, we obtain a particular solution
[x1 , x2 , x3 ]^⊤ = [−8/3, 10/3, 0]^⊤ .    (6.36)
To find the general solution, we combine the particular solution with the solution of
the homogeneous equation system Ax = 0. There are two ways of getting there: the
matrix view and the equation system view. Looking at it from the matrix perspective,
we need to express the third column of the row-echelon form in terms of the first
two columns. This can be done by seeing that
[3, 2, 0]^⊤ = −1 · [1, 0, 0]^⊤ + 2 · [2, 1, 0]^⊤   ⇔   −1 · [1, 0, 0]^⊤ + 2 · [2, 1, 0]^⊤ − [3, 2, 0]^⊤ = 0 .    (6.37)
We now take the coefficients [−1, 2, −1]^⊤ of these columns, which are a non-trivial representation of 0, as the solution (and any multiple of it) to the homogeneous equation system Ax = 0.
An alternative and equivalent way is to work with the equations directly: we find the solution to the homogeneous equation system by expressing x3 in terms of x1 , x2 . From the row echelon form, we see that x2 + 2x3 = 0 ⇒ x2 = −2x3 . Substituting this into the first equation, we obtain x1 + 2x2 + 3x3 = 0 ⇒ x1 − x3 = 0 ⇒ x3 = x1 .
Independent of whether we use the matrix or equation system view, we arrive at the
general solution
[−8/3, 10/3, 0]^⊤ + λ [1, −2, 1]^⊤ ,   λ ∈ R.    (6.38)
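The result can be verified numerically; the following Python/NumPy sketch (an added check, not part of the original notes) confirms the particular solution (6.36) and the general solution (6.38):

    import numpy as np

    A = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0],
                  [7.0, 8.0, 9.0]])
    b = np.array([4.0, 6.0, 8.0])

    x_particular = np.array([-8.0 / 3.0, 10.0 / 3.0, 0.0])   # from (6.36)
    v = np.array([1.0, -2.0, 1.0])                           # kernel direction from (6.38)

    assert np.allclose(A @ x_particular, b)
    assert np.allclose(A @ v, 0.0)
    for lam in (-2.0, 0.0, 5.0):                             # any lambda works
        assert np.allclose(A @ (x_particular + lam * v), b)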
• Every pivot must be 1 and is the only non-zero entry in its column.
The reduced row echelon form will play an important role in later sections because it
allows us to determine the general solution of a linear equation system in a straight-
forward way.
Example: Reduced Row Echelon Form Verify that the following matrix is in re-
duced row echelon form:
1 3 0 0 3
A = 0 0 1 0 9 (6.39)
0 0 0 1 −4
To start, we assume that A is in reduced row echelon form without any rows that
just contain zeros (e.g., after applying Gaussian elimination), i.e.,
A =
0 · · · 0  1  ∗ · · · ∗  0  ∗ · · · ∗  · · ·  0  ∗ · · · ∗
0 · · · 0  0  0 · · · 0  1  ∗ · · · ∗  · · ·  0  ∗ · · · ∗
⋮          ⋮             ⋮             ⋱      ⋮
0 · · · 0  0  0 · · · 0  0  0 · · · 0  · · ·  1  ∗ · · · ∗    (6.41)
Note that the columns j1 , . . . , jk with the pivots (marked red) are the standard unit
vectors e1 , . . . , ek ∈ Rk .
We now extend this matrix to an n × n-matrix à by adding n − k rows of the form
[ 0 · · · 0 −1 0 · · · 0 ] ,    (6.42)
such that the diagonal of the augmented matrix à contains only 1 or −1. Then,
the columns of Ã, which contain the −1 as pivots are solutions of the homogeneous
equation system Ax = 0.16 To be more precise, these columns form a basis (Sec-
tion 6.7) of the solution space of Ax = 0, which we will later call the kernel or null
space (Section 6.9.1).
Example
Let us revisit the matrix in (6.39), which is already in reduced row echelon form:
1 3 0 0 3
A = 0 0 1 0 9 . (6.43)
0 0 0 1 −4
We now augment this matrix to a 5 × 5 matrix by adding rows of the form (6.42) at
the places where the pivots on the diagonal are missing and obtain
1 3 0 0 3
0 −1 0 0 0
à = 0 0 1 0 9 (6.44)
0 0 0 1 −4
0 0 0 0 −1
From this form, we can immediately read out the solutions of Ax = 0 by taking the
columns of Ã, which contain −1 on the diagonal:
λ1 [3, −1, 0, 0, 0]^⊤ + λ2 [3, 0, 9, −4, −1]^⊤ ,   λ1 , λ2 ∈ R,    (6.45)
This means that if we bring the augmented equation system into reduced row eche-
lon form, we can read off the inverse on the right-hand side of the equation system.
"
#
1 2
Example 1 For A = , we determine its inverse by solving the following linear
3 4
equation system:
" #
1 2 1 0
3 4 0 1
The right-hand side of this augmented equation system then contains the inverse
A^{−1} =
−2     1
 3/2  −1/2 .    (6.46)
Similarly, we determine the inverse of
A =
1 0 2 0
1 1 0 0
1 2 0 1
1 1 1 1    (6.47)
by bringing the augmented equation system [A | I4 ] into reduced row echelon form:
1 0 2 0 | 1 0 0 0
1 1 0 0 | 0 1 0 0
1 2 0 1 | 0 0 1 0
1 1 1 1 | 0 0 0 1

⇝ · · · ⇝

1 0 0 0 | −1  2 −2  2
0 1 0 0 |  1 −1  2 −2
0 0 1 0 |  1 −1  1 −1
0 0 0 1 | −1  0 −1  2 ,

such that

A^{−1} =
−1  2 −2  2
 1 −1  2 −2
 1 −1  1 −1
−1  0 −1  2 .    (6.48)
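NumPy does not carry out the augmented Gaussian elimination shown above, but its built-in inverse can be used to confirm (6.48) (an added check, not part of the original notes):

    import numpy as np

    A = np.array([[1.0, 0.0, 2.0, 0.0],
                  [1.0, 1.0, 0.0, 0.0],
                  [1.0, 2.0, 0.0, 1.0],
                  [1.0, 1.0, 1.0, 1.0]])
    A_inv = np.linalg.inv(A)
    print(np.round(A_inv))               # matches (6.48)
    assert np.allclose(A @ A_inv, np.eye(4))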
Remark 9
You may have encountered a way of computing the inverse of a matrix using cofactors and/or cross products. The cross-product construction only works in three dimensions, and computing inverses via cofactors does not scale and is not used in practice.
+ : V ×V → V (6.49)
· : R×V → V (6.50)
where
1. (V , +) is an Abelian group
2. Distributivity:
(a) λ · (x + y) = λ · x + λ · y ∀λ ∈ R, x, y ∈ V
(b) (λ + ψ) · x = λ · x + ψ · x ∀λ, ψ ∈ R, x ∈ V
Remark 10
When we started the course, we defined vectors as special objects that can be added
together and multiplied by scalars to yield another element of the same kind (see p. 60).
Examples were geometric vectors, polynomials and Rn . Definition 5 gives now the
corresponding formal definition and applies to all kinds of vectors. We will continue
focusing on vectors as elements of Rn because it is the most general formulation, and
most algorithms are formulated in Rn .
Remark 11
Note that a “vector multiplication” ab, a, b ∈ Rn , is not defined. Theoretically, we
could define it in two ways: (a) We could define an element-wise multiplication, such
that c = a · b with cj = aj bj . This “array multiplication” is common in many programming languages but makes only limited sense mathematically; (b) By treating vectors as
n × 1 matrices (which we usually do), we can use the matrix multiplication as defined
in (6.11). However, then the dimensions of the vectors do not match. Only the fol-
lowing multiplications for vectors are defined: ab> (outer product), a> b (inner/scalar
product).
6.5.1 Examples
• V = Rn , n ∈ N is a vector space with operations defined as follows:
Remark 12 (Notation)
The three vector spaces Rn , Rn×1 , R1×n are only different with respect to the way of
writing. In the following, we will not make a distinction between Rn and Rn×1 , which
allows us to write n-tuples as column vectors
x = [x1 , . . . , xn ]^⊤ .    (6.51)
This will simplify the notation regarding vector space operations. However, we will
distinguish between Rn×1 and R1×n (the row vectors) to avoid confusion with matrix
multiplication. By default we write x to denote a column vector, and a row vector is
denoted by x> , the transpose of x.
1. U ≠ ∅, in particular: 0 ∈ U
2. Closure of U :
(a) with respect to the outer operation: ∀λ ∈ R ∀x ∈ U : λx ∈ U ;
(b) with respect to the inner operation: ∀x, y ∈ U : x + y ∈ U .
Examples
• For every vector space V the trivial subspaces are V itself and {0}.
Remark 14
Every subspace U ⊂ Rn is the solution space of a homogeneous linear equation system
Ax = 0.
• A practical way of checking whether the column vectors are linearly independent is
to use Gaussian elimination: Write all vectors as columns of a matrix A. Gaussian
elimination yields a matrix in (reduced) row echelon form. The pivot columns
indicate the vectors, which are linearly independent of the previous18 vectors (note
that there is an ordering of vectors when the matrix is built). If all columns are
pivot columns, the column vectors are linearly independent.
first and third column are pivot columns. The second column is a non-pivot col-
umn because it is 3 times the first column. If there is at least one non-pivot
column, the columns are linearly dependent.
6.6.1 Examples
• Consider R4 with
x1 = [2, −3, 1, 4]^⊤ ,   x2 = [1, 0, 1, 2]^⊤ ,   x3 = [−2, 1, −1, 1]^⊤ .    (6.59)
To check whether they are linearly dependent, we follow the general approach
and solve
λ1 x1 + λ2 x2 + λ3 x3 = λ1 [2, −3, 1, 4]^⊤ + λ2 [1, 0, 1, 2]^⊤ + λ3 [−2, 1, −1, 1]^⊤ = 0    (6.60)
 1  1 −1 | 0                  1  1 −1 | 0
 2  1 −2 | 0   −2R1     ⇝    0 −1  0 | 0
−3  0  1 | 0   +3R1           0  3 −2 | 0   +3R2
 4  2  1 | 0   −4R1           0 −2  5 | 0   −2R2

      1  1 −1 | 0                    1  1 −1 | 0
      0 −1  0 | 0   ·(−1)            0  1  0 | 0
⇝    0  0 −2 | 0   ·(−1/2)    ⇝    0  0  1 | 0
      0  0  5 | 0   +(5/2)R3         0  0  0 | 0
Here, every column of the matrix is a pivot column19 , i.e., every column is
linearly independent of the columns on its left. Therefore, there is no non-
trivial solution, and we require λ1 = 0, λ2 = 0, λ3 = 0. Hence, the vectors
x1 , x2 , x3 are linearly independent.
• Consider a set of linearly independent vectors b1 , b2 , b3 , b4 ∈ Rn and
x1 = b1 − 2b2 + b3 − b4
x2 = −4b1 − 2b2 + 4b4
(6.61)
x3 = 2b1 + 3b2 − b3 − 3b4
x4 = 17b1 − 10b2 + 11b3 + b4
Are the vectors x1 , . . . , x4 ∈ Rn linearly independent? To answer this question,
we investigate whether the column vectors
[1, −2, 1, −1]^⊤ ,   [−4, −2, 0, 4]^⊤ ,   [2, 3, −1, −3]^⊤ ,   [17, −10, 11, 1]^⊤    (6.62)
are linearly independent. The reduced row echelon form of the corresponding
linear equation system with coefficient matrix
1 −4 2 17
−2 −2 3 −10
A = (6.63)
1 0 −1 11
−1 4 −3 1
is given as
1 0 0 −7
0 1 0 −15
. (6.64)
0 0 1 −18
0 0 0 0
From the reduced row echelon form, we see that the corresponding linear
equation system is non-trivially solvable: The last column is not a pivot column,
and x4 = −7x1 − 15x2 − 18x3 . Therefore, x1 , . . . , x4 are linearly dependent as x4
lies in the span of x1 , . . . , x3 .
19 Note that the matrix is not in reduced row echelon form; it also does not need to be.
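Both examples can be checked numerically via the rank of the matrix whose columns are the vectors in question (an added sketch, not part of the original notes): the columns are linearly independent exactly if the rank equals the number of columns.

    import numpy as np

    # First example: x1, x2, x3 from (6.59), written as columns
    X = np.column_stack([[2.0, -3.0, 1.0, 4.0],
                         [1.0, 0.0, 1.0, 2.0],
                         [-2.0, 1.0, -1.0, 1.0]])
    print(np.linalg.matrix_rank(X))      # 3 -> all columns are pivot columns

    # Second example: coefficient matrix A from (6.63)
    A = np.array([[1.0, -4.0, 2.0, 17.0],
                  [-2.0, -2.0, 3.0, -10.0],
                  [1.0, 0.0, -1.0, 11.0],
                  [-1.0, 4.0, -3.0, 1.0]])
    print(np.linalg.matrix_rank(A))      # 3 < 4 -> the columns are linearly dependent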
Let V be a real vector space and B ⊂ V , B , ∅. Then, the following statements are
equivalent:
• B is basis of V
• Every x ∈ V can be written as a linear combination
x = ∑_{i=1}^{k} λi bi = ∑_{i=1}^{k} ψi bi    (6.65)
of basis vectors b1 , . . . , bk ∈ B, and this representation is unique, i.e., λi = ψi for i = 1, . . . , k.
6.7.1 Examples
• In R3 , the canonical/standard basis is
B = { [1, 0, 0]^⊤ , [0, 1, 0]^⊤ , [0, 0, 1]^⊤ } .    (6.66)
• The set
A = { [1, 2, 3, 4]^⊤ , [2, −1, 0, 2]^⊤ , [1, 1, 0, −4]^⊤ }    (6.68)
is linearly independent, but not a generating set (and no basis): For instance,
the vector [1, 0, 0, 0]> cannot be obtained by a linear combination of elements
in A.
Remark 17
• Every vector space V possesses a basis B.
• The examples above show that there can be many bases of a vector space V , i.e.,
there is no unique basis. However, all bases possess the same number of elements,
the basis vectors.
we are interested in finding out which vectors x1 , . . . , x4 are a basis for U . For
this, we need to check whether x1 , . . . , x4 are linearly independent. Therefore,
we need to solve
∑_{i=1}^{4} λi xi = 0 ,    (6.70)
From this reduced-row echelon form we see that x1 , x2 , x4 are linearly inde-
pendent (because the linear equation system λ1 x1 + λ2 x2 + λ4 x4 = 0 can only
be solved with λ1 = λ2 = λ4 = 0). Therefore, {x1 , x2 , x4 } is a basis of U .
• Let us now consider a slightly different problem: Instead of finding out which
vectors x1 , . . . , x4 of the span of U form a basis, we are interested in finding
a “simple” basis for U . Here, “simple” means that we are interested in basis
vectors with many coordinates equal to 0.
To solve this problem we replace the vectors x1 , . . . , x4 with suitable linear com-
binations. In practice, we write x1 , . . . , x4 as row vectors in a matrix and per-
form Gaussian elimination:
 1  2 −1 −1 −1          1   2  −1 −1 −1          1  2 −1    −1   −1
 2 −1  1  2 −2          0  −5   3  4  0          0  1 −3/5 −4/5   0
 3 −4  3  5 −3   ⇝     0 −10   6  8  0   ⇝     0  0 −1     1    0
−1  8 −6 −6  1          0  10  −7 −7  0          0  0  0     0    0

      1  2 −1  −1   −1          1  2  0  −2   −1          1  0  0  4/5  −1
      0  1  0 −7/5   0          0  1  0 −7/5   0          0  1  0 −7/5   0
⇝    0  0  1  −1    0   ⇝     0  0  1  −1    0   ⇝     0  0  1  −1    0
      0  0  0   0    0          0  0  0   0    0          0  0  0   0    0
From the reduced row echelon form, the simple basis vectors are the rows with
the leading 1s (the “steps”).
U = [ b1 , b2 , b3 ]    (6.72)
with
b1 = [1, 0, 0, 4/5, −1]^⊤ ,   b2 = [0, 1, 0, −7/5, 0]^⊤ ,   b3 = [0, 0, 1, −1, 0]^⊤ ,
and B = {b1 , b2 , b3 } is a (simple) basis of U (check that they are linearly inde-
pendent!).
6.7.3 Rank
• The number of linearly independent columns of a matrix A ∈ Rm×n equals the
number of linearly independent rows and is called rank of A and is denoted
by rk(A).
• rk(A) = rk(A> ), i.e., the column rank equals the row rank.
• For all A ∈ Rm×n and all b ∈ Rm : The linear equation system Ax = b can be
solved if and only if rk(A) = rk(A|b), where A|b denotes the “extended” system.
• A matrix A ∈ Rm×n has full rank if its rank equals the largest possible for a
matrix of the same dimensions, which is the lesser of the number of rows and
columns, i.e., rk(A) = min(m, n). A matrix is said to be rank deficient if it does
not have full rank.
Examples
1 0 1
• A = 0 1 1. A possesses two linearly independent rows (and columns).
0 0 0
Therefore, rk(A) = 2.
" #
1 2 3
• A= . We see that the second row is a multiple of the first row, such
4 8 12
" #
1 2 3
that the row-echelon form of A is , and rk(A) = 1.
0 0 0
1 2 1
• A = −2 −3 1 We use Gaussian elimination to determine the rank:
3 5 0
 1  2 1                        1  2 1
−2 −3 1             ⇝        −2 −3 1   +2R1
 3  5 0   −R1 + R2            0  0 0

      1 2 1
⇝    0 1 3
      0 0 0
Here, we see that the number of linearly independent rows and columns is 2,
such that rk(A) = 2.
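The same ranks can be obtained with NumPy (an added illustration, not part of the original notes):

    import numpy as np

    A1 = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [0.0, 0.0, 0.0]])
    A2 = np.array([[1.0, 2.0, 3.0], [4.0, 8.0, 12.0]])
    A3 = np.array([[1.0, 2.0, 1.0], [-2.0, -3.0, 1.0], [3.0, 5.0, 0.0]])

    for A in (A1, A2, A3):
        print(np.linalg.matrix_rank(A))   # 2, 1, 2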
6.8.1 Approach 1
Consider U1 = [b1 , . . . , bk ] ⊂ V and U2 = [c1 , . . . , cl ] ⊂ V . We know that any x ∈ U1 can be represented as a linear combination ∑_{i=1}^{k} λi bi of the basis vectors (or spanning vectors) b1 , . . . , bk . Equivalently, x = ∑_{j=1}^{l} ψj cj . Therefore, the approach is to find scalars λ1 , . . . , λk , ψ1 , . . . , ψl such that
A [λ1 , . . . , λk , ψ1 , . . . , ψl ]^⊤ = 0    (6.76)
Example
We consider
∑_{i=1}^{3} λi bi = x = ∑_{j=1}^{2} ψj cj ,    (6.78)
where bi and cj are the basis vectors of U1 and U2 , respectively. The matrix A =
[b1 |b2 |b3 | − c1 | − c2 ] from (6.75) is given as
1 0 0 1 0
1 1 0 −1 −1
A = . (6.79)
0 1 1 −2 0
0 0 1 0 0
1 0 0 1 0
0 1 0 −2 0
. (6.80)
0 0 1 0 0
0 0 0 0 1
U1 ∩ U2 = ψ1 c1 = ψ1 [−1, 1, 2, 0]^⊤ ,   ψ1 ∈ R .    (6.82)
Remark 18
Alternatively, we could have used λ1 = −ψ1 , λ2 = 2ψ1 , λ3 = 0 and determined the (same)
solution via the basis vectors of U1 as
ψ1 ( − [1, 1, 0, 0]^⊤ + 2 [0, 1, 1, 0]^⊤ ) = ψ1 [−1, 1, 2, 0]^⊤ ,   ψ1 ∈ R .    (6.83)
6.8.2 Approach 2
In the second approach, we exploit Remark 14, which says that any subspace is the
solution of a homogeneous linear equation system, to determine the intersection
U1 ∩ U2 of two subspaces U1 , U2 ⊂ Rn .
First, we show how to determine the linear equation system that generates a sub-
space; second, we exploit these insights to find U1 ∩ U2 .
Lemma 1
Consider U = [x1 , . . . , xm ] ⊂ Rn and dim(U ) = r. We write the vectors x1 , . . . , xm as rows
of a matrix
>
x1
A = ... ∈ Rm×n
(6.84)
>
xm
and investigate the homogeneous linear equation system Ay = 0. First, the solution
space Ũ possesses dimension k = n − rk(A) = n − dim(U ) = n − r. Second, we choose a
basis (b1 , . . . , bk ) in Ũ and again write these basis vectors as the rows of a matrix
>
b1
B = ... ∈ Rk×n
(6.85)
>
bk
Proof 4
Define Sh as the solution space of By = 0. It holds that dim(Sh ) = n − rk(B) = n − k = r.
Therefore, dim(Sh ) = dim(U ). From Abj = 0, j = 1, . . . , k it follows that x> i bj = 0 for
i = 1, . . . , m and j = 1, . . . , k (remember how matrix-vector multiplication works), and at
the same time b> j xi = 0. Therefore, Bxi = 0, i = 1, . . . , m and, hence, U ⊂ Sh . However,
since dim(Sh ) = dim(U ) it follows that Sh = U .
Practical Algorithm
Let us summarize the main steps to determine U1 ∩ U2 :
1. Write U1 , U2 as solution spaces of two linear equation systems B 1 x = 0 and
B 2 x = 0:
(a) Write spanning vectors of U1 , U2 as the rows of two matrices A1 , A2 ,
respectively.
(b) Determine S1 as the solution of A1 x = 0 and S2 as the solution of A2 x = 0
(c) Write spanning vectors of S1 and S2 as the rows of the matrices B 1 and
B 2 , respectively.
" #
B
2. U1 ∩ U2 is the solution space of Cx = 0, where C = 1 , which we find by
B2
means of Gaussian elimination.
Example 1
To determine the intersection of two subspaces U1 , U2 ⊂ Rn , we use the above
method. We consider again the subspaces U1 , U2 ⊂ R4 from the example above
(and hopefully, we end up with the same solution):
A1 =
1 1 0 0
0 1 1 0
0 0 1 1 ,
A2 =
−1 1 2 0
 0 1 0 0 ,    (6.87)
respectively.
20 It also holds that Rn = U ∪ Ũ = U ⊕ Ũ , where ⊕ is the direct sum.
Ã1 =
1 0 0  1
0 1 0 −1
0 0 1  1 ,
Ã2 =
1 0 −2 0
0 1  0 0 .    (6.88)
S1 = [ [1, −1, 1, −1]^⊤ ] ,   S2 = [ [2, 0, 1, 0]^⊤ , [0, 0, 0, 1]^⊤ ] .    (6.89)
(c) U1 is now the solution space of the linear equation system B 1 x = 0 with
B1 = [ 1 −1 1 −1 ] ,    (6.90)
C = [B1 ; B2 ] =
1 −1 1 −1
2  0 1  0
0  0 0  1 .    (6.92)
To determine this solution space, we follow the standard procedure of (a) computing the reduced row echelon form
1 0  1/2 0
0 1 −1/2 0
0 0  0   1    (6.93)
using Gaussian elimination and (b) finding the (general) solution using the
Minus-1 Trick from Section 6.4.3 as
U1 ∩ U2 = [ [−1, 1, 2, 0]^⊤ ] ,    (6.94)
Example 2
−1 0 −4 −5 1
1 −1 −1 −2 1
−5 −1 2 2 −6
A1 = 0 3 3 3 0 , A2 = , (6.96)
1 2 −1 3 2
1 −3 1 −2 4
3 1 0 3 3
respectively.
(b) We use Gaussian elimination to determine the corresponding reduced row echelon forms
1 0 0 0 12
13
1 0 0 −1 1
6
0 1 0 0
Ã1 = 0 1 0 21 − 34 , Ã2 = 13 . (6.97)
5
0 0 1 0 − 13
1 3
0 0 1 2
4 1
0 0 0 1 − 13
To determine this solution space, we follow the standard procedure of (a) computing the reduced row echelon form
1 0 0  0 −1
0 1 0 −1 −1
0 0 1 −1 −1    (6.102)
using Gaussian elimination and (b) finding the (general) solution using the
Minus-1 Trick from Section 6.4.3 as
U1 ∩ U2 = [ [0, 1, 1, 1, 0]^⊤ , [1, 1, 1, 0, 1]^⊤ ] .    (6.103)
• Endomorphism: Φ : V → V linear
[Figure: a linear mapping Φ : V → W with kernel Ker(Φ) ⊆ V and image Im(Φ) ⊆ W ; both subspaces contain 0.]
Example: Homomorphism
The mapping Φ : R2 → C, Φ(x) = x1 + ix2 , is a homomorphism:
" # " #! " #! " #!
x1 y1 x1 y1
Φ + = (x1 + y1 ) + i(x2 + y2 ) = x1 + ix2 + y1 + iy2 = Φ +Φ
x2 x2 x2 y2
" #! " #!
x x1
Φ λ 1 = λx1 + λix2 = λ(x1 + ix2 ) = λΦ
x2 x2
(6.107)
We have already discussed the representation of complex numbers as tuples in R2 , but now we know why we can do this: There is a bijective linear mapping (we only showed linearity, but not the bijection) that relates the elementwise addition of tuples in R2 to the corresponding addition in the set of complex numbers.
Theorem 3
Finite-dimensional R-vector spaces V and W are isomorphic if and only if dim(V ) = dim(W ).
94
Chapter 6. Linear Algebra 6.9. Linear Mappings
i.e., the image is the span of the columns of A, also called the column space. Therefore, the column space (image) is a subspace of Rm , where m is the “height” of the matrix.
• The kernel/null space ker(Φ) is the general solution to the linear homogeneous
equation system Ax = 0 and captures all possible linear combinations of the ele-
ments in Rn that produce 0 ∈ Rm .
• The kernel (null space) is a subspace of Rn , where n is the “width” of the matrix.
• The null space focuses on the relationship among the columns, and we can use it
to determine whether/how we can express a column as a linear combination of
other columns.
• The purpose of the null space is to determine whether a solution of the linear
equation system is unique and, if not, to capture all possible solutions.
is linear. To determine Im(Φ) we can simply take the span of the columns of the
transformation matrix and obtain
" # " # " # " #
1 2 −1 0
Im(Φ) = [ , , , ]. (6.113)
1 0 0 1
This matrix is now in reduced row echelon form, and we can now use the Minus-1
Trick to compute a basis of the kernel (see Section 6.4.3).21 This gives us now the
kernel (null space) as
ker(Φ) = [ [0, 1/2, 1, 0]^⊤ , [−1, 1/2, 0, 1]^⊤ ] .    (6.114)
Remark 21
Consider R-vector spaces V , W , X. Then:
• For linear mappings Φ : V → W and Ψ : W → X the mapping Ψ ◦ Φ : V → X is
also linear.
• If Φ : V → W is an isomorphism then Φ −1 : W → V is an isomorphism as well.
• If Φ : V → W , Ψ : V → W are linear then Φ + Ψ and λΦ, λ ∈ R are linear, too.
B = (b1 , . . . , bn ) (6.116)
x = α1 b1 + . . . + αn bn (6.117)
21 Alternatively, we can express the non-pivot columns (columns 3 and 4) as linear combinations of the pivot columns (columns 1 and 2). The third column a3 is equivalent to −1/2 times the second column a2 . Therefore, 0 = a3 + (1/2)a2 . In the same way, we see that a4 = a1 − (1/2)a2 and, therefore, 0 = a1 − (1/2)a2 − a4 .
Remark 22
For an n-dimensional R-vector space V and a basis B of V , the mapping Φ : Rn → V ,
Φ(ei ) = bi , i = 1, . . . , n is linear (and because of Theorem 3 an isomorphism), where
(e1 , . . . , en ) is the standard basis of Rn .
Now we are ready to make a connection between linear mappings between finite-
dimensional vector spaces and matrices.
is the unique representation of Φ(bj ) with respect to C. Then, we call the m × n-matrix
AΦ := ((aij )) (6.120)
Remark 23
• The coordinates of Φ(bj ) are the j-th column of AΦ .
• rk(AΦ ) = dim(Im(Φ))
ŷ = AΦ x̂ . (6.121)
This means that the transformation matrix can be used to map coordinates with
respect to an ordered basis in V to coordinates with respect to an ordered basis in
W.
Φ(b1 ) = c1 − c2 + 3c3 − c4
Φ(b2 ) = 2c1 + c2 + 7c3 + 2c4 (6.122)
Φ(b3 ) = 3c2 + c3 + 4c4
the transformation matrix AΦ with respect to B and C satisfies Φ(bk ) = ∑_{i=1}^{4} αik ci for k = 1, . . . , 3 and is given as
AΦ = (α1 | α2 | α3 ) =
 1 2 0
−1 1 3
 3 7 1
−1 2 4 ,    (6.123)
Similarly, we write the new basis vectors C̃ of W as a linear combination of the basis
vectors of C, which yields
We define S = ((sij )) ∈ Rn×n and T = ((tij )) ∈ Rm×m . In particular, the jth column of S
are the coordinate representations of b̃j with respect to B and the jth columns of T
is the coordinate representation of c̃j with respect to C. Note that both S and T are
regular.
For all j = 1, . . . , n, we get
Φ(b̃j ) = ∑_{k=1}^{m} ãkj c̃k = ∑_{k=1}^{m} ãkj ∑_{i=1}^{m} tik ci = ∑_{i=1}^{m} ( ∑_{k=1}^{m} tik ãkj ) ci ,    (6.128)
where we expressed the new basis vectors c̃k ∈ W as linear combinations of the basis
vectors ci ∈ W . When we express the b̃k ∈ V as linear combinations of bi ∈ V , we
arrive at
Φ(b̃j ) = Φ( ∑_{k=1}^{n} skj bk ) = ∑_{k=1}^{n} skj Φ(bk ) = ∑_{k=1}^{n} skj ∑_{i=1}^{m} aik ci = ∑_{i=1}^{m} ( ∑_{k=1}^{n} aik skj ) ci    (6.129)
and, therefore,
T Ã = AS, (6.131)
such that
à = T −1 AS. (6.132)
Hence, with a basis change in V (B is replaced with B̃) and W (C is replaced with
C̃) the transformation matrix AΦ of a linear mapping Φ : V → W is replaced by an
equivalent matrix ÃΦ with
ÃΦ = T −1 AΦ S. (6.133)
Definition 15 (Equivalence)
Two matrices A, Ã ∈ Rm×n are equivalent if there exist regular matrices S ∈ Rn×n and
T ∈ Rm×m , such that à = T −1 AS.
Definition 16 (Similarity)
Two matrices A, Ã ∈ Rn×n are similar if there exists a regular matrix S ∈ Rn×n with
à = S −1 AS
Remark 24
Similar matrices are always equivalent. However, equivalent matrices are not necessar-
ily similar.
Remark 25
Consider R-vector spaces V , W , X. From Remark 21 we already know that for linear
mappings Φ : V → W and Ψ : W → X the mapping Ψ ◦ Φ : V → X is also linear.
With transformation matrices AΦ and AΨ of the corresponding mappings, the overall
transformation matrix AΨ ◦Φ is given by AΨ ◦Φ = AΨ AΦ .
In light of this remark, we can look at basis changes from the perspective of concate-
nating linear mappings:
• AΦ : B → C
• ÃΦ : B̃ → C̃
• S : B̃ → B
• T : C̃ → C and T −1 : C → C̃
and
B̃ → C̃ = B̃ → B→ C → C̃ (6.134)
ÃΦ = T −1 AΦ S . (6.135)
Note that the execution order in (6.135) is from right to left because vectors are
multiplied at the right-hand side.
Example
Consider a linear mapping Φ : R3 → R4 whose transformation matrix is
1 2 0
−1 1 3
AΦ = (6.136)
3 7 1
−1 2 4
Then,
S =
1 0 1
1 1 0
0 1 1 ,
T =
1 1 0 1
1 0 1 0
0 1 1 0
0 0 0 1 ,    (6.139)
where the ith column of S is the coordinate representation of b̃i in terms of the basis
vectors of B.22 Similarly, the jth column of T is the coordinate representation of c̃j
in terms of the basis vectors of C.
Therefore, we obtain the transformation matrix with respect to the bases B̃ and C̃ as ÃΦ = T^{−1} AΦ S.
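Since the explicit result is not reproduced here, the following Python/NumPy sketch (an added illustration, not part of the original notes) computes ÃΦ = T^{−1} AΦ S from (6.136) and (6.139):

    import numpy as np

    A_phi = np.array([[1.0, 2.0, 0.0],
                      [-1.0, 1.0, 3.0],
                      [3.0, 7.0, 1.0],
                      [-1.0, 2.0, 4.0]])
    S = np.array([[1.0, 0.0, 1.0],
                  [1.0, 1.0, 0.0],
                  [0.0, 1.0, 1.0]])
    T = np.array([[1.0, 1.0, 0.0, 1.0],
                  [1.0, 0.0, 1.0, 0.0],
                  [0.0, 1.0, 1.0, 0.0],
                  [0.0, 0.0, 0.0, 1.0]])

    A_tilde = np.linalg.inv(T) @ A_phi @ S   # transformation matrix w.r.t. B~ and C~
    print(A_tilde)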
Soon, we will be able to exploit the concept of a basis change to find a basis with
respect to which the transformation matrix of an endomorphism has a particularly
simple (diagonal) form.
6.10 Determinants
Determinants are important concepts in linear algebra. For instance, they indicate
whether a matrix can be inverted or we can use them to check for linear indepen-
dence. A geometric intuition is that the absolute value of the determinant of real
vectors is equal to the volume of the parallelepiped spanned by those vectors. Deter-
minants will play a very important role for determining eigenvalues and eigenvectors
(Section 6.11).
Determinants are only defined for square matrices A ∈ Rn×n , and we write det(A) or
|A|.
Remark 26
• For n = 1, det(A) = det(a11 ) = a11
• For n = 2,
det(A) = | a11 a12 ; a21 a22 | = a11 a22 − a12 a21    (6.141)
22 Since B is the standard basis, this representation is straightforward to find. For a general basis B we would need to solve a linear equation system to find the λi such that ∑_{i=1}^{3} λi bi = b̃j , j = 1, . . . , 3.
• det(A) = det(A> )
• Similar matrices possess the same determinant. Therefore, for a linear mapping
Φ : V → V all transformation matrices AΦ of Φ have the same determinant.
Theorem 5
For A ∈ Rn×n :
1. Adding a multiple of a column/row to another one does not change det(A).
Because of this theorem, we can use Gaussian elimination to compute det(A). How-
ever, we need to pay attention to swapping the sign when swapping rows.
Example
2 0 1 2 0 2 0 1 2 0 2 0 1 2 0
2 −1 0 1 1 0 −1 −1 −1 1 0 −1 −1 −1 1
0 1 2 1 2 = 0 1 2 1 2 = 0 0 1 0 3
(6.143)
−2 0 2 −1 2 0 0 3 1 2 0 0 3 1 2
2 0 0 1 1 0 0 −1 −1 1 0 0 −1 −1 1
2 0 1 2 0 2 0 1 2 0
0 −1 −1 −1 1 0 −1 −1 −1 1
= 0 0 1 0 3 = 0 0 1 0 3 = 6 (6.144)
0 0 0 1 −7 0 0 0 1 −7
0 0 0 −1 4 0 0 0 0 −3
We first used Gaussian elimination to bring A into triangular form, and then ex-
ploited the fact that the determinant of a triangular matrix is the product of its
diagonal elements.
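The value det(A) = 6 can also be confirmed numerically (an added check, not part of the original notes):

    import numpy as np

    A = np.array([[2.0, 0.0, 1.0, 2.0, 0.0],
                  [2.0, -1.0, 0.0, 1.0, 1.0],
                  [0.0, 1.0, 2.0, 1.0, 2.0],
                  [-2.0, 0.0, 2.0, -1.0, 2.0],
                  [2.0, 0.0, 0.0, 1.0, 1.0]])
    print(np.linalg.det(A))   # approximately 6, in agreement with (6.144)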
Theorem 6 (Laplace Expansion)
Consider a matrix A = ((aij )) ∈ Rn×n . We define Ai,j to be the matrix that remains if we
delete the ith row and the jth column from A. Then, for all j = 1, . . . , n:
1. det(A) = ∑_{k=1}^{n} (−1)^{k+j} akj det(Ak,j )   (“development about column j”)
Example 2
Let us re-compute the example in (6.143)
det
| 2  0 1  2 0 |
| 2 −1 0  1 1 |
| 0  1 2  1 2 |
|−2  0 2 −1 2 |
| 2  0 0  1 1 |
= det
| 2  0  1  2 0 |
| 0 −1 −1 −1 1 |
| 0  1  2  1 2 |
| 0  0  3  1 2 |
| 0  0 −1 −1 1 |
= (−1)^{1+1} · 2 · det
| −1 −1 −1 1 |
|  1  2  1 2 |
|  0  3  1 2 |
|  0 −1 −1 1 |   (development about the 1st column)    (6.148)
If we now subtract the fourth row from the first row and add (−2) times the third column to the fourth column, we obtain
2 · det
| −1  0  0 0 |
|  1  2  1 0 |
|  0  3  1 0 |
|  0 −1 −1 3 |
= 2 · (−1) · det
|  2  1 0 |
|  3  1 0 |
| −1 −1 3 |   (development about the 1st row)
= (−2) · 3 · (−1)^{3+3} · det
| 2 1 |
| 3 1 |   (development about the 3rd column)
= 6    (6.149)
6.11 Eigenvalues
Definition 17 (Eigenvalue, Eigenvector)
For an R-vector space V and a linear map Φ : V → V , the scalar λ ∈ R is called an eigenvalue if there exists an x ∈ V , x ≠ 0, with Φ(x) = λx. The vector x is then called an eigenvector corresponding to the eigenvalue λ.
Remark 29
• Apparently, Eλ = ker(Φ − λidV ) since Φ(x) = λx ⇔ Φ(x) − λx = 0 ⇔ (Φ − λidV )(x) = 0.
Theorem 7
Consider an R-vector space V and a linear map Φ : V → V with pairwise differ-
ent eigenvalues λ1 , . . . , λk and corresponding eigenvectors x1 , . . . , xk . Then the vectors
x1 , . . . , xk are linearly independent.
• rk(A − λI n ) < n
• det(A − λI n ) = 0
Note that A and A> possess the same eigenvalues, but not the same eigenvectors.
a0 = det(A) ,    (6.153)
a_{n−1} = (−1)^{n−1} tr(A) ,    (6.154)
where tr(A) = ∑_{i=1}^{n} aii is the trace of A, defined as the sum of the diagonal elements of A.
Theorem 8
λ ∈ R is eigenvalue of A ∈ Rn×n if and only if λ is a root of the characteristic polynomial
p(λ) of A.
Remark 30
1. If λ is an eigenvalue of A ∈ Rn×n then the corresponding eigenspace Eλ is the
solution space of the homogeneous linear equation system (A − λI n )x = 0.
0 −1 1 1
−1 1 −2 3
• A =
2 −1 0 0
1 −1 1 0
1. Characteristic polynomial:
p(λ) = det
| −λ   −1    1    1 |
| −1  1−λ   −2    3 |
|  2   −1   −λ    0 |
|  1   −1    1   −λ |
= det
| −λ   −1    1     1  |
|  0   −λ   −1   3−λ  |
|  0    1  −2−λ   2λ  |
|  1   −1    1    −λ  |    (6.158)
= det
| −λ  −1−λ    0     1  |
|  0   −λ   −1−λ  3−λ  |
|  0    1   −1−λ   2λ  |
|  1    0     0    −λ  |    (6.159)
= (−λ) · det
| −λ  −1−λ  3−λ |
|  1  −1−λ   2λ |
|  0    0   −λ  |
− det
| −1−λ    0    1  |
|  −λ   −1−λ  3−λ |
|   1   −1−λ   2λ |    (6.160)
= (−λ)² · det
| −λ  −1−λ |
|  1  −1−λ |
− det
| −1−λ    0    1  |
|  −λ   −1−λ  3−λ |
|   1   −1−λ   2λ |    (6.161)
6.11.4 Applications
Figure 6.4: (a) Two-dimensional data set (red) with corresponding eigenvectors (blue), scaled by the magnitude of the corresponding eigenvalues. The longer the eigenvector, the higher the variability (spread) of the data along this axis. (b) For optimal (linear) dimensionality reduction, we would project the data onto the subspace spanned by the eigenvector associated with the largest eigenvalue. (c) Projection of the data set onto the subspace spanned by the eigenvector associated with the smaller eigenvalue leads to a larger projection error.
6.12 Diagonalization
Diagonal matrices possess a very simple structure, and they allow for a very fast computation of determinants and inverses, for instance. In this section, we will have a closer look at how to transform matrices into diagonal form. More specifically, we will look at endomorphisms that can be represented by a diagonal transformation matrix if we choose a suitable basis.
25 Tobe more precise, PCA can be considered a method for (1) performing a basis change from the
standard basis toward the eigenbasis in Rd , d < n, (2) projecting the data in Rn onto the subspace
spanned by the d eigenvectors corresponding to the d largest eigenvalues (which are called the
principal components), (3) moving back to the standard basis.
26 Developed at Stanford University by Larry Page and Sergey Brin in 1996.
27 When normalizing x∗ , such that kx∗ k = 1 we can interpret the entries as probabilities.
c1   0  · · ·  · · ·   0
 0  c2   0    · · ·    0
 ⋮        ⋱            ⋮
 0  · · ·  0  c_{n−1}  0
 0  · · ·  · · ·  0   cn    (6.164)
Theorem 9
For an endomorphism Φ of an n-dimensional R-vector space V the following statements
are equivalent:
1. Φ is diagonalizable.
Theorem 10
For an n-dimensional R-vector space V and a linear mapping Φ : V → V the following
holds: Φ is diagonalizable if and only if
1. the characteristic polynomial of Φ decomposes into linear factors, see (6.165), and
2. for i = 1, . . . , k the dimension of the eigenspace Eci equals the multiplicity ri of the eigenvalue ci in the characteristic polynomial, see (6.166).
In particular, an endomorphism with n pairwise distinct eigenvalues is diagonalizable.
In (6.165) we say that the characteristic polynomial decomposes into linear factors.
The second requirement in (6.166) says that the dimension of the eigenspace Eci
must correspond to the (algebraic) multiplicity ri of the eigenvalues in the character-
istic polynomial, i = 1, . . . , k.31 The dimension of the eigenspace Eci is the dimension
of the kernel/null space of Φ − ci idV .
Theorem 10 holds equivalently if we replace Φ with A ∈ Rn×n and idV with I n .
If Φ is diagonalizable it possesses a transformation matrix of the form
AΦ = diag( c1 , . . . , c1 , . . . , ck , . . . , ck ) ,    (6.168)
where each eigenvalue ci appears ri times (its multiplicity in the characteristic poly-
nomial and the dimension of the corresponding eigenspace) on the diagonal.
6.12.1 Examples
"
#
1 0
• A= .
1 1
0 −1 1 1
−1 1 −2 3
• A = .
2 −1 0 0
1 −1 1 0
31 More explicitly, the algebraic multiplicity of ci is the number of times ri it is repeated as a root
of the characteristic polynomial.
3 2 −1
• A = 2 6 −2.
0 0 2
Therefore, A is diagonalizable.
Remark 31
If A ∈ Rn×n is diagonalizable and (b1 , . . . , bn ) is an ordered basis of eigenvectors of A with Abi = ci bi , i = 1, . . . , n, then for the regular matrix S = (b1 | . . . |bn ) the matrix S^{−1} AS is diagonal with the eigenvalues c1 , . . . , cn on its diagonal.
Coming back to the above example, where we wanted to determine the diagonal form of
A =
3 2 −1
2 6 −2
0 0  2 .
We already know that A is diagonalizable. We now determine the eigenbasis of R3 that allows us to transform A into a similar matrix in diagonal form via S −1 AS:
1 −2 1
S = (b1 |b2 |b3 ) = 0 1 2 (6.171)
1 0 0
such that
2 0 0
S −1 AS = 0 2 0 . (6.172)
0 0 7
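This diagonalization can be verified numerically (an added check, not part of the original notes):

    import numpy as np

    A = np.array([[3.0, 2.0, -1.0],
                  [2.0, 6.0, -2.0],
                  [0.0, 0.0, 2.0]])
    S = np.array([[1.0, -2.0, 1.0],
                  [0.0, 1.0, 2.0],
                  [1.0, 0.0, 0.0]])

    D = np.linalg.inv(S) @ A @ S
    print(np.round(D, 10))          # diag(2, 2, 7), as in (6.172)
    print(np.linalg.eigvals(A))     # eigenvalues 2, 2, 7 (up to ordering)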
Remark 32
The dimension of the eigenspace Eλ cannot be greater than the algebraic multiplicity of
the corresponding eigenvalue λ.
Remark 33
So far, we computed diagonal matrices as D = S −1 AS. However, we can equally write
A = SDS −1 . Here, we can interpret the transformation matrix A as follows: S −1 per-
forms a basis change from the standard basis into the eigenbasis. Then D scales
the vector along the axes of the eigenbasis, and S transforms the scaled vectors back
into the standard/canonical coordinates.
6.12.2 Applications
Diagonal matrices D = S −1 AS exhibit the nice property that they can be easily raised to a power. Since A = SDS −1 , we obtain
A^k = (S D S^{−1})^k = S D^k S^{−1} .    (6.173)
p(Φ) = 0 (6.174)
Remark 34
• Note that the right hand side of (6.174) is the zero mapping (or the 0-matrix
when we use the transformation matrix AΦ ).
Applications
# "
−1 2 n−1 1 −1
• Find an expression for A in terms of I , A, A , . . . , A . Example: A =
2 1
2
has the characteristic polynomial p(λ) = λ − 2λ + 3. Then Theorem 11 states
that A2 − 2A + 3I = 0 and, therefore, −A2 + 2A = 3I ⇔ A−1 = 13 (2I − A)
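A short Python/NumPy check of this example (an added illustration, not part of the original notes):

    import numpy as np

    A = np.array([[1.0, -1.0],
                  [2.0, 1.0]])
    I = np.eye(2)

    # Cayley-Hamilton: p(A) = A^2 - 2A + 3I = 0
    assert np.allclose(A @ A - 2 * A + 3 * I, 0.0)

    A_inv = (2 * I - A) / 3.0        # A^{-1} = (1/3)(2I - A)
    assert np.allclose(A @ A_inv, I)
    print(A_inv)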
• The pair (V , h·, ·i) is called Euclidean vector space or (real) vector space with
scalar product.
6.13.1 Examples
• For V = Rn we define the standard scalar product ⟨x, y⟩ := x^⊤ y = ∑_{i=1}^{n} xi yi .
In a Euclidean vector space, the scalar product allows us to introduce concepts, such
as lengths, distances and orthogonality.
k·k : V → R (6.175)
x 7→ kxk (6.176)
is called norm.
6.13.3 Example
In geometry, we are often interested in lengths of vectors. We can now use the scalar
product to compute them. For instance, in a Euclidean vector space with the standard scalar product, if x = [1, 2]^⊤ then its norm/length is ∥x∥ = √(1² + 2²) = √5.
Remark 35
The norm k · k possesses the following properties:
d : V ×V → R (6.177)
(x, y) 7→ d(x, y) (6.178)
is called metric.
A metric d satisfies:
Theorem 12
Let V be a Euclidean vector space (V , h·, ·i) and x, y, z ∈ V . Then:
6.13.4 Applications
Scalar products allow us to compute angles between vectors or distances. A major
purpose of scalar products is to determine whether vectors are orthogonal to each
other; in this case hx, yi = 0. This will play an important role when we discuss projec-
tions in Section 6.16. The scalar product also allows us to determine specific bases of
vector (sub)spaces, where each vector is orthogonal to all others (orthogonal bases)
using the Gram-Schmidt method32 . These bases are important optimization and
numerical algorithms for solving linear equation systems. For instance, Krylov sub-
space methods33 , such as Conjugate Gradients or GMRES, minimize residual errors
that are orthogonal to each other (Stoer and Burlirsch, 2002).
In machine learning, scalar products are important in the context of kernel meth-
ods (Schölkopf and Smola, 2002). Kernel methods exploit the fact that many linear
algorithms can be expressed purely by scalar product computations.34 Then, the
“kernel trick” allows us to compute these scalar products implicitly in a (potentially infinite-dimensional) feature space without even knowing this feature space explicitly. This al-
lowed the “non-linearization” of many algorithms used in machine learning, such
as kernel-PCA (Schölkopf et al., 1998) for dimensionality reduction. Gaussian pro-
cesses (Rasmussen and Williams, 2006) also fall into the category of kernel methods
and are the current state-of-the-art in probabilistic regression (fitting curves to data
points).
32 not discussed in this course
33 The basis for the Krylov subspace is derived from the Cayley-Hamilton theorem, see Sec-
tion 6.12.3, which allows us to compute the inverse of a matrix in terms of a linear combination
of its powers.
34 Matrix-vector multiplication Ax = b falls into this category since bi is the scalar product of the ith row of A with x.
Figure 6.5: Points y on a line lie in an affine subspace L with support point x0 and
direction u.
Note that the definition of an affine subspace excludes 0 if the support point x0 ∉ U . Therefore, an affine subspace is not a (linear) subspace of V for x0 ∉ U .
Examples of affine subspaces are points, lines and planes in R2 , which do not (nec-
essarily) go through the origin.
Remark 36
• Consider two affine subspaces L = x0 + U and L̃ = x̃0 + Ũ of a vector space V .
L ⊂ L̃ if and only if U ⊂ Ũ and x0 − x̃0 ∈ Ũ .
x = x 0 + λ1 b 1 + . . . + λk b k , (6.182)
where λ1 , . . . , λk ∈ R.
This representation is called parametric equation of L with directional vectors
b1 , . . . , bk and parameters λ1 , . . . , λk .
Examples
• One-dimensional affine subspaces are called lines and can be written as y = x0 + λx1 , λ ∈ R, where U = [x1 ] ⊂ Rn is a one-dimensional subspace of Rn .
35 L is also called linear manifold.
From this, we obtain the following inhomogeneous equation system with unknowns
λ1 , λ 2 , λ 3 , ψ 1 , ψ 2 :
1 0 0 1 0 1
1 1 0 −1 −1 1
0 1 1 −2 0 0
0 0 1 0 0 −1
36 Recall that for homogeneous equation systems Ax = 0 the solution was a vector subspace (not
affine).
Figure 6.6: Parallel lines. The affine subspaces L1 and L2 are parallel with L1 ∩ L2 = ∅.
Using Gaussian elimination, we quickly obtain the reduced row echelon form
1 0 0 1 0 1
0 1 0 −2 −1 0
,
0 0 1 0 1 0
0 0 0 0 1 1
Parallelism
Two affine subspaces L1 and L2 are parallel (L1 ||L2 ) if the following holds for the
corresponding direction spaces U1 , U2 : U1 ⊂ U2 or U2 ⊂ U1 .37
Parallel affine subspaces that do not contain each other (i.e., neither L1 ⊆ L2 nor L2 ⊆ L1 ) have no points in common, i.e., L1 ∩ L2 = ∅. For instance,
" # " # " # " #
1 1 2 1
L1 = + [ ] , L2 = +[ ] (6.186)
−1 0 1 0
are parallel (U1 = U2 in this case) and L1 ∩ L2 = ∅ because the lines are offset as
illustrated in Figure 6.6.
We talk about skew lines if they are neither parallel nor have an intersection point.
Imagine non-parallel lines in 3D that “miss” each other.
37 Note that this definition of parallel allows for L1 ⊂ L2 or L2 ⊂ L1 .
Examples
1. Consider two lines g = x1 + U1 , h = x2 + U2 in R2 , where U1 , U2 ⊂ R2 are one-dimensional subspaces.
(a) If g ∩ h , ∅:
• For dim(U1 ∩ U2 ) = 0, we get a single point as the intersection.
• For dim(U1 ∩ U2 ) = 1 it follows that U1 = U2 and g = h.
(b) If g ∩ h = ∅:
• For dim(U1 ∩ U2 ) = 1 it follows again that U1 = U2 , and g||h, g , h
(because they do not intersect).
• The case dim(U1 ∩ U2 ) = 0 cannot happen in R2 .
2. Consider two lines g = x1 + U1 , h = x2 + U2 in Rn , n ≥ 3
(a) For g ∩ h , ∅ we obtain (as in R2 ) that either g and h intersect in a single
point or g = h.
(b) If g ∩ h = ∅:
• For dim(U1 ∩ U2 ) = 0, we obtain that g and h are skew lines (this
cannot happen in R2 ). This means, there exists no plane that contains
both g and h.
• If dim(U1 ∩ U2 ) = 1 it follows that g||h.
3. Consider two hyperplanes L1 = x1 + U1 and L2 = x2 + U2 in Rn , n = 3, where U1 , U2 ⊂ R3 are two-dimensional subspaces.
(a) If L1 ∩ L2 , ∅ the intersection is an affine subspace. The kind of subspace
depends on the dimension dim(U1 ∩ U2 ) of the intersection of the corre-
sponding direction spaces.
• dim(U1 ∩ U2 ) = 2: Then U1 = U2 and, therefore, L1 = L2 .
• dim(U1 ∩ U2 ) = 1: The intersection is a line.
• dim(U1 ∩ U2 ) = 0: Cannot happen in R3 .
(b) If L1 ∩ L2 = ∅, then dim(U1 ∩ U2 ) = 2 (no other option possible in R3 ) and
L1 ||L2 .
4. Consider two planes L1 = x1 + U1 and L2 = x2 + U2 in Rn , n = 4, where U1 , U2 ⊂ R4 are two-dimensional subspaces.
(a) For L1 ∩ L2 , ∅ the additional case is possible that the planes intersect in a
point.
(b) For L1 ∩ L2 = ∅ the additional case is possible that dim(U1 ∩ U2 ) = 1. This
means that the planes are not parallel, they have no point in common, but
there is a line g such that g||L1 and g||L2 .
5. For two planes L1 = x1 + U1 and L2 = x2 + U2 in Rn , n > 4, where U1 , U2 ⊂ Rn are two-dimensional subspaces, all kinds of intersections are possible.
φ :V → W (6.187)
x 7→ a + Φ(x) (6.188)
• Affine mappings keep the geometric structure invariant, and preserve the di-
mension and parallelism.
Theorem 13
Let V , W be finite-dimensional R-vector spaces and φ : V → W an affine mapping.
Then:
6.16 Projections
Projections are an important class of linear transformations (besides rotations and
reflections38 ). Projections play an important role in graphics (see e.g., Figure 6.7),
coding theory, statistics and machine learning. We often deal with data that is very
high-dimensional. However, often only a few dimensions are important. In this case,
we can project the original very high-dimensional data into a lower-dimensional
feature space39 and work in this lower-dimensional space to learn more about the
data set and extract patterns. For example, machine learning tools such as Principal
Components Analysis (PCA) by Hotelling (1933) and Deep Neural Networks (e.g.,
deep auto-encoders, first applied by Deng et al. (2010)) exploit this idea heavily.
In the following, we will focus on linear orthogonal projections.
Definition 27 (Projection)
Let V be an R-vector space and W ⊂ V a subspace of V . A linear mapping π : V → W
is called a projection if π2 = π ◦ π = π.
38 Note that translations are not linear. Why?
39 “Feature” is just a commonly used word for “data representation”.
Figure 6.7: The shade is the projection of the ball onto a plane (table) with the center
being the light source. Adapted from http://tinyurl.com/ka4t28t
In the following, we will derive projections from points in the Euclidean vector space
(Rn , h·, ·i) onto subspaces. We will start with one-dimensional subspaces, the lines. If
not mentioned otherwise, we assume the standard scalar product hx, yi = x> y.
Figure 6.8: Projection of x onto a one-dimensional subspace U with basis vector b: (a) general case, where p = πU (x); (b) special case: unit circle with ∥x∥ = 1, where the projection onto the horizontal axis has length cos ω.
In the following three steps, we determine λ, the projection point πU (x) = p ∈ U and
the projection matrix P π that maps arbitrary x ∈ Rn onto U .
x − p = x − λb ⊥ b    (6.189)
⇔ b^⊤ (x − λb) = b^⊤ x − λ b^⊤ b = 0    (6.190)
⇔ λ = b^⊤ x / (b^⊤ b) = b^⊤ x / ∥b∥²    (6.191)
p = λb = (b^⊤ x / ∥b∥²) b .    (6.192)
kbk2
We can also compute the length of p (i.e., the distance from 0) by means of Definition 22:
∥p∥ = ∥λb∥ = |λ| ∥b∥ = (|b^⊤ x| / ∥b∥²) ∥b∥ = |cos ω| ∥x∥ ∥b∥ ∥b∥ / ∥b∥² = |cos ω| ∥x∥ .    (6.193)
Here, ω is the angle between x and b. This equation should be familiar from
trigonometry: If kxk = 1 it lies on the unit circle. Then the projection onto the
horizontal axis b is exactly cos ω. An illustration is given in Figure 6.8(b)
b> x bb>
p = λb = b = x (6.194)
kbk2 kbk2
bb>
Pπ= . (6.195)
kbk2
Note that bb> is a matrix (with rank 1) and kbk2 = hb, bi is a scalar.
The projection matrix P π projects any vector x onto the line through the origin with
direction b (equivalently, the subspace U spanned by b).
Example
h i>
Find the projection matrix P π onto the line through the origin and b = 1 2 2 .
b is a direction and a basis of the one-dimensional subspace (line through origin).
With (6.195), we obtain
Pπ = b b^⊤ / (b^⊤ b) = (1/9) ·
1 2 2
2 4 4
2 4 4 .    (6.196)
Let us now choose a particular x and see whether it lies in the subspace spanned by b. For x = [1, 1, 1]^⊤ , the projected point is p = Pπ x = (1/9) [5, 10, 10]^⊤ .
Note that the application of P π to p does not change anything, i.e., P π p = p.41 This
is expected because according to Definition 27 we know that a projection matrix P π
satisfies P 2π x = P π x.
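The same computation in Python/NumPy (an added illustration, not part of the original notes):

    import numpy as np

    b = np.array([1.0, 2.0, 2.0])
    P = np.outer(b, b) / (b @ b)     # projection matrix from (6.195)/(6.196)
    x = np.array([1.0, 1.0, 1.0])
    p = P @ x

    print(P)                         # (1/9) * [[1,2,2],[2,4,4],[2,4,4]]
    print(p)                         # projection of x onto the line spanned by b
    assert np.allclose(P @ p, p)     # projecting again changes nothing (P^2 = P)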
hb1 , x − pi = b>
1 (x − p) = 0 (6.198)
..
. (6.199)
>
hbm , x − pi = bm (x − p) = 0 (6.200)
b>
1 (x − Bλ) = 0 (6.201)
..
. (6.202)
>
bm (x − Bλ) = 0 (6.203)
Factorizing yields
B^⊤ (x − Bλ) = 0 ⇔ B^⊤ B λ = B^⊤ x ,
and the expression on the right-hand side is called normal equation.43 Since the vectors b1 , . . . , bm are a basis and, therefore, linearly independent, B^⊤ B is regular and can be inverted. This allows us to solve for the optimal coefficients
λ = (B^⊤ B)^{−1} B^⊤ x .
The matrix (B > B)−1 B > is often called the pseudo-inverse of B, which can be
computed for non-square matrices B. It only requires that B > B is positive
definite. In practical applications (e.g., linear regression), we often add I to
B > B to guarantee positive definiteness or increase numerical stability. This
“ridge” can be rigorously derived using Bayesian inference.44
3. Find the projection matrix P π . From (6.207) we can immediately see that the
projection matrix that solves P π x = p must be
P π = B(B > B)−1 B > . (6.208)
Remark 38
Comparing the solutions for projecting onto a line (1D subspace) and the general case,
we see that the general case includes the line as a special case: If dim(U ) = 1 then B > B
is just a scalar and we can rewrite the projection matrix in (6.208), Pπ = B(B^⊤ B)^{−1} B^⊤ , as Pπ = B B^⊤ / (B^⊤ B) , which is exactly the projection matrix in (6.195).
Example
To verify the results, we can (a) check whether the error vector p − x is orthogonal
to all basis vectors of U , (b) verify that P π = P 2π (see Definition 27).
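The following Python/NumPy sketch illustrates the general case (an added example with a made-up basis B, since the concrete numbers of the example are not reproduced in this extract): it solves the normal equation, forms the projection matrix (6.208), and performs exactly the two verification checks mentioned above.

    import numpy as np

    # Hypothetical basis of a two-dimensional subspace U of R^3 (columns of B)
    B = np.array([[1.0, 0.0],
                  [1.0, 1.0],
                  [0.0, 2.0]])
    x = np.array([6.0, 0.0, 0.0])

    lam = np.linalg.solve(B.T @ B, B.T @ x)    # normal equation: (B^T B) lambda = B^T x
    p = B @ lam                                # projection of x onto U
    P = B @ np.linalg.solve(B.T @ B, B.T)      # projection matrix B (B^T B)^{-1} B^T

    assert np.allclose(B.T @ (x - p), 0.0)     # (a) error vector orthogonal to U
    assert np.allclose(P @ P, P)               # (b) P is idempotent
    print(p)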
Remark 39
In vector spaces with non-standard scalar products, we have to pay attention when
computing angles and distances, which are defined by means of the scalar product.
Figure 6.9: The robotic arm needs to rotate its joints in order to pick up objects or to
place them correctly. Figure taken from (Deisenroth et al., 2015).
6.16.3 Applications
Projections are often used in computer graphics, e.g., to generate shadows, see Fig-
ure 6.7. In optimization, orthogonal projections are often used to (iteratively) min-
imize residual errors. This also has applications in machine learning, e.g., in linear
regression where we want to find a (linear) function that minimizes the residual er-
rors, i.e., the lengths of the orthogonal projections of the data onto the line (Bishop,
2006). PCA (Hotelling, 1933) also uses projections to reduce the dimensionality
of high-dimensional data: First, PCA determines an orthogonal basis of the data
space. It turns out that this basis is the eigenbasis of the data matrix. The impor-
tance of each individual dimension of the data is proportional to the correspond-
ing eigenvalue. Finally, we can select the eigenvectors corresponding to the largest
eigenvalues to reduce the dimensionality of the data, and this selection results in
the minimal residual error of the data points projected onto the subspace spanned
by these principal components. Figure 6.4 illustrates the projection onto the first
principal component for a two-dimensional data set.
6.17 Rotations
An important category of linear mappings are the rotations. A rotation is acting
to rotate an object (counterclockwise) by an angle θ about the origin. Important
application areas of rotations include computer graphics and robotics. For example,
it is often important to know how to rotate the joints of a robotic arm in order to
pick up or place an object, see Figure 6.9.
( " # " #)
2 1 0
Consider the standard basis in R given by e1 = , e2 = . Assume we want
0 1
to rotate this coordinate system by an angle θ as illustrated in Figure 6.10. Since
rotations Φ are linear mappings, we can express them by a transformation matrix
RΦ (θ). Trigonometry allows us to determine the coordinates of the rotated axes with respect to the standard basis (see Figure 6.10).
Figure 6.10: Rotation of the standard basis in R2 by an angle θ; the rotated basis vectors have coordinates (cos θ, sin θ) and (− sin θ, cos θ).
Figure 6.11: Rotation of a general vector (black) in R3 by an angle θ about the e2 -axis.
The rotated vector is shown in blue.
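The explicit rotation matrix falls into a gap of this extract; the following Python/NumPy sketch (an added illustration, not part of the original notes) uses the standard counterclockwise rotation matrix in R2, R(θ) = [cos θ, −sin θ; sin θ, cos θ], which is consistent with Figure 6.10 and with the properties listed in Remark 40 below.

    import numpy as np

    def rotation_2d(theta):
        """Counterclockwise rotation about the origin in R^2 by angle theta."""
        c, s = np.cos(theta), np.sin(theta)
        return np.array([[c, -s],
                         [s, c]])

    theta, phi = 0.3, 1.1
    R1, R2 = rotation_2d(theta), rotation_2d(phi)
    x = np.array([2.0, 1.0])

    assert np.isclose(np.linalg.norm(R1 @ x), np.linalg.norm(x))  # lengths preserved
    assert np.isclose(np.linalg.det(R1), 1.0)                     # det(R) = 1
    assert np.allclose(R1 @ R2, rotation_2d(theta + phi))         # angles add up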
Remark 40
• Composition of rotations sums their angles:
Rφ Rθ = Rφ+θ (6.219)
• kxk = kRΦ xk, i.e., the original vector and the rotated vector have the same length.
• This implies that det(RΦ ) = 1
• x and x0 are separated by an angle θ.
• Only in two dimensions vector rotations are commutative (R1 R2 = R2 R1 ) and
form an Abelian group (with multiplication) if they rotate about the same point
(e.g., the origin).
• Rotations in three dimensions are generally not commutative. Therefore, the order
in which rotations are applied is important, even if they rotate about the same
point.
• Rotations in two dimensions by an angle that is not a multiple of π have no real eigenvalues.
Figure 6.12: Vector product. The vector product x×y is orthogonal to the plane spanned
by x, y, and its length corresponds to the area of the parallelogram having x and y as
sides.
Remark 41
• The vector product x × y of two linearly independent vectors x and y is a vector
that is orthogonal to both and, therefore, normal to the plane spanned by them
(see Fig. 6.12).
• The length of the vector product x × y corresponds to the (positive) area of the
parallelogram having x and y as sides (see Fig. 6.12), which is given as kx × yk =
kxk kyk sin θ, where θ is the angle between x, y.
• Since the magnitude of the cross product goes by the sine of the angle between its
arguments, the cross product can be thought of as a measure of orthogonality in
the same way that the scalar product is a measure of parallelism (which goes by
the cosine of the angle between its arguments).
Remark 42
The vector/cross product possesses the following properties:
• Anti-commutative: x × y = −y × x
• Jacobi identity: x × (y × z) + y × (z × x) + z × (x × y) = 0
45 All vectors are orthogonal and have unit length k · k = 1.
129
6.18. Vector Product (Cross Product) Chapter 6. Linear Algebra
• hx × y, xi = hx × y, yi = 0
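A brief Python/NumPy illustration of these properties (an added sketch, not part of the original notes):

    import numpy as np

    x = np.array([1.0, 2.0, 3.0])
    y = np.array([-1.0, 0.0, 2.0])
    z = np.cross(x, y)

    # orthogonal to both arguments
    assert np.isclose(z @ x, 0.0) and np.isclose(z @ y, 0.0)
    # length equals the area of the parallelogram: ||x|| ||y|| sin(theta)
    cos_t = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
    sin_t = np.sqrt(1.0 - cos_t**2)
    assert np.isclose(np.linalg.norm(z), np.linalg.norm(x) * np.linalg.norm(y) * sin_t)
    # anti-commutativity
    assert np.allclose(np.cross(y, x), -z)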
Bibliography
Deisenroth, M. P., Fox, D., and Rasmussen, C. E. (2015). Gaussian Processes for Data-Efficient Learning in Robotics and Control. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(2):408–423.
Deng, L., Seltzer, M. L., Yu, D., Acero, A., Mohamed, A.-r., and Hinton, G. E. (2010). Binary Coding of Speech Spectrograms using a Deep Auto-Encoder. In Interspeech, pages 1692–1695.
Drumm, V. and Weil, W. (2001). Lineare Algebra und Analytische Geometrie. Lecture Notes, Universität Karlsruhe.
Hogben, L., editor (2013). Handbook of Linear Algebra. Discrete Mathematics and Its Applications. Chapman and Hall, 2nd edition.
Knop, R. (1969). Remark on Algorithm 334 [G5]: Normal Random Deviates. Communications of the ACM, 12(5):281.
Schölkopf, B., Smola, A. J., and Müller, K.-R. (1998). Nonlinear Component Analysis as a Kernel Eigenvalue Problem. Neural Computation, 10(5):1299–1319.