Copyright © University of Bristol 2010 & 2014. This material is copyright of the University.
It is provided exclusively for educational purposes at the University
and is to be downloaded or copied for your private study only.
Chapter 2
Topological Dynamics
and Symbolic Dynamics
Examples of metric spaces and distances are the following. The first three are classical examples, while
the following two are useful in dynamical systems.
1. X = R with the distance d(x, y) = |x − y|.
2. X = R2 or X = [0, 1] × [0, 1] with the Euclidean distance: if x = (x1 , x2 ) and y = (y1 , y2 ) are points in R2 , their distance is
d(x, y) = √( (x1 − y1)^2 + (x2 − y2)^2 ).
5. Let (X, d) be any metric space and f : X → X. Then for each n ∈ N+ we can define a new distance, dn , given by
dn (x, y) = max_(k=0,...,n−1) d(f^k (x), f^k (y)).
Two points x, y are close in the dn metric if their orbits up to time n stay close. We will use this distance to define topological entropy in §2.4.
Exercise 2.1.1. Check that the distances in the previous Examples satisfy the properties of distance in Definition 2.1.1. For each, describe the ball of radius ε at a point.
In a metric space one can talk about convergence and continuity as in Rn . Let (X, d) be a metric space. Given x ∈ X and ε > 0, let Bd (x, ε) be the ball of radius ε around the point x defined using the distance d, that is
Bd (x, ε) = {y ∈ X such that d(x, y) < ε}.
If there is no ambiguity about the distance, we will often write simply B(x, ε), dropping the subscript d. We can use balls to define convergence:
Definition 2.1.2. A sequence (xn )n∈N ⊂ X converges to x, and we write limn→∞ xn = x, if for any ε > 0 there exists N > 0 such that xn ∈ Bd (x, ε) for all n ≥ N .
We can use the distance to define the notion of open and closed sets.
Definition 2.1.3. A set U ⊂ X of a metric space (X, d) is open if for any x ∈ U there exists an ε > 0 such that
Bd (x, ε) ⊂ U.
A set C ⊂ X is closed if its complement X\C is open.
Example 2.1.2. If X = R and d(x, y) = |x − y|, the intervals (a, b) are open sets and the intervals [a, b] are closed sets. Also intervals of the form (a, ∞) or (−∞, b) are open and intervals of the form [a, ∞) or (−∞, b] are closed. Intervals of the form [a, b) or (a, b] are neither open nor closed.
Exercise 2.1.2. Prove that a ball Bd (x, ε) is open (use the triangle inequality).
Open and closed sets in a metric space enjoy the following properties:1
Lemma 2.1.1. (1) Countable unions of open sets are open: if U1 , U2 , . . . , Un , . . . are open sets, then ∪k∈N Uk is an open set;
(2) Finite intersections of open sets are open: if U1 , U2 , . . . , UN are open sets, then ∩_(k=1)^N Uk is an open set.
Exercise 2.1.3. Prove the lemma using the Definition 2.1.3 above.
Exercise 2.1.4. Give an example in X = R of a countable collection of open sets whose intersection is not open.
By using De Morgan’s Laws, it follows that closed sets have the following properties (note that the role of
intersections and unions is reversed):
Corollary 2.1.1. (1) Countable intersections of closed sets are closed: if C1 , C2 , . . . , Cn , . . . are closed sets, then ∩k∈N Ck is a closed set;
(2) Finite unions of closed sets are closed: if C1 , C2 , . . . , CN are closed sets, then ∪_(k=1)^N Ck is a closed set.
1 Property (1) in the Lemma is taken as an axiom: a collection U of subsets of X such that ∅, X ∈ U and Property (1) is satisfied is called a topology. In this case, sets in U are called open sets and complements of sets in U are called closed sets. A topological space (X, U ) is a space X with a topology U , see Extra.
Definition 2.1.4. A subset Y ⊂ X is dense if for any non-empty open set U ⊂ X there is a point y ∈ Y such
that y ∈ U .
One can check that this definition of dense set reduces to the usual definition of dense set for a subset Y ⊂ R, that is: for each x ∈ R and ε > 0 there exists y ∈ Y such that |x − y| < ε.
Definition 2.1.5. A metric space (X, d) is called separable if it contains a countable dense subset.
Example 2.1.3. If X = Rn with the Euclidean distance, X is separable since the set Qn given by all points
(x1 , . . . , xn ) ∈ Rn whose coordinates xi are rational numbers is dense and it is countable.
Let (X, dX ) and (Y, dY ) be metric spaces. We will now consider properties of functions f : X → Y .
Definition 2.1.6. A function f : X → Y is an isometry if it preserves the distances, that is
dY (f (x), f (y)) = dX (x, y) for all x, y ∈ X.
The last metric space notion that we will use is the notion of compact sets. Let (X, d) be a metric space.
2 The following Lemma can be taken as definition of a continuous function when (X, U ) is a topological space, see Extra.
Definition 2.1.8. [Sequentially compact] A subset K ⊂ X is (sequentially) compact if for any sequence (xn )n∈N ⊂ K there exists a convergent subsequence (xnk )k∈N whose limit limk→∞ xnk = x belongs to K.
This property is called sequential compactness since the definition involves sequences. There are other notions of compactness (see compactness by covers below) which are equivalent in a metric space, so we will simply say that a set is compact and use the term sequentially compact only when we specifically want to use the above property of compact sets.
Example 2.1.5. Closed bounded intervals [a, b] ⊂ R are sequentially compact (this is known as the Heine-Borel theorem).
Conversely, in R, if a set is not bounded or not closed, it is not compact. The following two are non-examples,
that is examples of spaces that are not compact.
Example 2.1.6. The unbounded closed interval [0, ∞) is not sequentially compact: consider for example the
sequence (xn )n∈N given by xn = n. The sequence has no convergent subsequence.
The open interval (0, 1) is not sequentially compact: consider for example the sequence (xn )n∈N given by
xn = 1/n. We have limn→∞ xn = 0, but 0 ∉ (0, 1).
In addition to the definition of sequential compactness, there is another definition of compactness, compactness by open covers, which turns out to be equivalent in a metric space. Compactness by open covers is a more general definition of compactness and can be used as a definition of compactness in any topological space (see the Extra on topological spaces if interested).
Definition 2.1.9. An open cover of K ⊂ X is a collection {Uα }α of open sets of X such that
K ⊂ ∪α Uα
(this is why we say that they cover K). A finite subcover is a finite subset {Uα1 , Uα2 , . . . , UαN } ⊂ {Uα }α which still covers, that is such that K ⊂ ∪_(i=1)^N Uαi .
Definition 2.1.10. [Compact by covers] A subset K ⊂ X is compact by covers if for any open cover {Uα }α there exists a finite subcover {Uα1 , Uα2 , . . . , UαN } ⊂ {Uα }α , that is such that K ⊂ ∪_(i=1)^N Uαi .
Example 2.1.7. The open interval (0, 1) is not compact by covers: consider for example the collection
U = { (1/(n + 2), 1/n), n ∈ N }.
It is an open cover, but U does not admit a finite subcover. Indeed, a finite subset of intervals in U is of the form
{ (1/(n1 + 2), 1/n1 ), (1/(n2 + 2), 1/n2 ), . . . , (1/(nk + 2), 1/nk ) },
so that if n = max_(i=1,...,k) ni , no point in (0, 1/(n + 2)] is covered by the finite collection.
Theorem 2.1.1. In a metric space (X, d), a subset K ⊂ X is sequentially compact if and only if it is compact
by covers.
Since we will work only with metric spaces, we will simply say that a set is compact and use equivalently either Definition 2.1.8 or 2.1.10.
Remark 2.1.1. In Rn , any subset C ⊂ Rn which is closed and bounded, that is such that supx,y∈C d(x, y) < +∞, is compact.
Example 2.2.1. For example, the doubling map is topologically transitive, since we constructed a dense orbit
(see Theorem 1.3.2 in §1.3.). Another example is given by the rotation Rα by an irrational number α: as we
proved in Theorem 1.2.1 in §1.2, for example the orbit of x = 0 is dense.
Definition 2.2.3. A topological dynamical system is called minimal if all orbits are dense, that is, for all x ∈ X the set O_f^+(x) is dense.
Example 2.2.2. For example, the rotation Rα by an irrational number α is minimal, as we proved in
Theorem 1.2.1 in § 1.2. On the other hand, the doubling map is not minimal, since there are periodic points
which lead to non-dense orbits.
Remark 2.2.1. Minimality implies topological transitivity, since if all orbits are dense, there is in particular one dense orbit. On the other hand, we have just seen that the converse is not true, since there are systems that are topologically transitive but not minimal, such as the doubling map.
A useful alternative characterisation of topological transitivity is the following. We say a point x ∈ X is isolated if the singleton {x} is an open set in X, or, equivalently, if there is ε > 0 such that Bd (x, ε) = {x}.
Proposition 1. Let X be compact. A topological dynamical system f : X → X is topologically transitive if
for each pair U, V of non-empty open sets there exists n ∈ N such that
f^n (U ) ∩ V ≠ ∅.
The reverse implication holds under the additional assumption that X has no isolated points.
The reverse implication requires the following lemma:
3 More generally, it is enough to use a topological space, for which the notion of continuous map and of homeomorphism is
well defined.
4 If f is invertible and f −1 is continuous (that is, f is a homeomorphism), one can require that there exists x0 ∈ X such that the full orbit Of (x0 ) is dense. In some books, this stronger definition of topologically transitive is used for homeomorphisms. Definition 2.2.2 is sometimes referred to as forward topologically transitive. If there exists x0 ∈ X such that the full orbit Of (x0 ) is dense, one can prove that there exists x such that O_f^+(x) is dense, but x could be different from x0 .
Lemma 2.2.1. Assume that X has no isolated points. If O_f^+(x0 ) is dense, then for any n ∈ N the orbit O_f^+(f^n (x0 )) is also dense.
Proof of Lemma. Since X has no isolated points, every open non-empty set U ⊂ X contains infinitely many distinct open subsets Uk . [For x ∈ U and K sufficiently large, take for instance Uk = B(x, 1/k) \ B̄(x, 1/(k + 1)), k ≥ K, where B̄(x, ε) = {y ∈ X : d(x, y) ≤ ε} denotes the closed ball at x.] This implies that, since O_f^+(x0 ) is dense, there are infinitely many integers mk such that f^mk (x0 ) ∈ Uk ⊂ U . Given n ∈ N, choose k such that mk ≥ n. Then U ∋ f^mk (x0 ) = f^(mk −n) (f^n (x0 )) and thus there is an integer m (namely m = mk − n) such that f^m (f^n (x0 )) ∈ U .
Proof of Proposition 1. We first prove the reverse implication. Assume that f is topologically transitive and X has no isolated points. Let x0 be such that O_f^+(x0 ) is dense. Given U, V open sets, by density there exists n such that f^n (x0 ) ∈ U . Since by Lemma 2.2.1 also O_f^+(f^n (x0 )) is dense, there exists m such that f^m (f^n (x0 )) ∈ V . So
f^(m+n) (x0 ) ∈ f^m (U ) ∩ V,
which shows that f^m (U ) ∩ V ≠ ∅.
[For Level M:] Let us now prove the first implication. Assume that for each pair U, V of non-empty open sets there exists n ∈ N such that f^n (U ) ∩ V ≠ ∅. Since X is compact, one can prove that X has a countable dense subset, see the Exercise in §2.1 (for example, in the unit square [0, 1]2 , which is a compact set in R2 , the set Q2 ∩ [0, 1]2 of points whose coordinates are rational is a dense subset and is countable).
Let {xn }n∈N be the points of this countable dense subset. To show that the orbit of a point x ∈ X is dense, it is enough to show that for each k ∈ N and n ∈ N there exists an m such that f^m (x) ∈ B(xn , 1/k). This is because any open set U contains a point xn for some n (by density) and hence, since it is open, it contains the ball B(xn , 1/k) for some k ∈ N, so if f^m (x) ∈ B(xn , 1/k) ⊂ U , then O_f^+(x) ∩ U ≠ ∅.
Since the balls B(xn , 1/k) for n ∈ N and k ∈ N are countable, we can relabel them and enumerate them as U1 , U2 , . . . , Un , . . ..
Let us now use transitivity to construct an orbit which visits all these balls in the order in which we listed them5 . Let B0 = B(x, ε) be any ball and let B̄0 be the closed ball B̄0 = {y : d(x, y) ≤ ε}. By assumption, there exists N1 such that f^N1 (B0 ) ∩ U1 ≠ ∅. Thus, we can pick a ball inside the non-empty open set B0 ∩ f^(−N1) (U1 ). Up to reducing the radius, we can assume that there is a smaller ball, that we call B1 , such that the closed ball
B̄1 ⊂ B0 ∩ f^(−N1) (U1 ).
By assumption there exists N2 such that f^N2 (B1 ) ∩ U2 ≠ ∅. Thus, as before we can find a smaller ball B2 such that
B̄2 ⊂ B1 ∩ f^(−N2) (U2 ).
Repeating by induction, we can construct a sequence of balls Bn such that
B̄n ⊂ Bn−1 ∩ f^(−Nn) (Un ) for every n ≥ 1.      (2.1)
Since the closed balls are nested, that is B̄n+1 ⊂ Bn , and we are in a compact space, their intersection is non-empty6 . If x ∈ ∩n B̄n , then f^Nn (x) ∈ Un for any n by (2.1), thus the orbit of x is dense.
A dynamical property stronger than topological transitivity is the following, which is the first mathematical
definition of the intuitive idea of mixing (we will see in Chapter 4 another definition of mixing in the context
of ergodic theory).
Definition 2.2.4. A topological dynamical system f : X → X is called topologically mixing if for any pair
U, V of non-empty open sets there exists N ∈ N such that for all n ≥ N we have f^n (U ) ∩ V ≠ ∅.
5 Compare this with the proof that we gave that the doubling map has a dense orbit. For the doubling map, we used the collection of binary intervals of the form I(a0 , . . . , an ), which is countable and plays the same role as the Ui in this proof, and we then constructed an orbit which visits them all.
6 This is a consequence of compactness that you might have seen in Metric Spaces: in a compact set, a countable collection of nested non-empty closed sets has non-empty intersection.
Topological mixing conveys the idea that each set U , after iterations of f , becomes spread everywhere: for each V , for all n sufficiently large, f^n (U ) intersects V .
If f is topologically mixing, in particular it is topologically transitive. This follows from the characterization of topological transitivity in Proposition 1: if f^n (U ) ∩ V ≠ ∅ for all n ≥ N , in particular there is an n such that f^n (U ) ∩ V ≠ ∅. Topological mixing, though, requires that the sets intersect for all large enough n.
Let us start by giving a non-example, that is an example of a map which is topologically transitive but not
topologically mixing.
Example 2.2.3. Rotations Rα : S 1 → S 1 are not topologically mixing. For simplicity take α < 1/2. Take for example U, V to be two arcs, each of sufficiently small arc length. Then one can see that there are infinitely many k such that the image Rα^k (U ) does not intersect V : in every block of [1/α] consecutive iterates of Rα (here [1/α], the integer part of 1/α, is roughly the number of iterates needed to turn once around the circle), there is at most one iterate k such that Rα^k (U ) intersects V (drawing a picture of the iterates of Rα will help you understand it).
On the other hand, if α is irrational, Rα is minimal (by Theorem 1.2.1 in § 1.2) and in particular topolog-
ically transitive. So irrational rotations are topologically transitive but not topologically mixing.
Let us now give an example of a topologically mixing dynamical system. In the following we make [0, 1]2 a metric space by using the standard Euclidean metric (we could also use the maximum distance d(x, y) = max(|x1 − y1 |, |x2 − y2 |); see what this changes in the discussion below).
Proposition 2. The baker map F : [0, 1]2 → [0, 1]2 is topologically mixing.
Proof. Let U, V be any two non-empty open sets in X. Since U contains a small ball, it also contains a small dyadic square Q, that is, a square with sides of length 1/2^n which has the form
Q = [ i/2^n , (i + 1)/2^n ] × [ j/2^n , (j + 1)/2^n ],    0 ≤ i, j ≤ 2^n − 1.
Figure 2.1: Here n = 1, i = j = 0. The figures show the 2^k horizontal strips in F^(n+k) (Q), with spacing 1/2^k , for k = 1 and k = 2.
For each k = 1, 2, . . . , n, F acts on Q by doubling the horizontal width and halving the vertical height, so that F^k (Q) is a dyadic rectangle of width 2^k (1/2^n ) = 1/2^(n−k) and height (1/2)^k (1/2^n ) = 1/2^(n+k) . In particular, for k = n, F^n (Q) is a thin horizontal rectangle of full width equal to 1. The image by F of a full horizontal rectangle consists of two full horizontal rectangles, whose vertical distance is at most 1/2 (recall how F acts geometrically and draw a picture to understand it). Since each iterate of F splits a full horizontal rectangle into two, F^(k+n) (Q) (for k ∈ N) consists of 2^k horizontal rectangles of width 1, whose vertical spacing is no more than 1/2^k (where by vertical spacing we mean the distance between, for example, the centers of two consecutive strips), see for example Figure 2.1. Now let B(y, ε) be a ball contained in V , with ε > 0 sufficiently small. Then, for all k such that 1/2^k < ε, we have F^(n+k) (Q) ∩ B(y, ε) ≠ ∅ and hence F^(n+k) (U ) ∩ V ≠ ∅.
The doubling map and the cat map are also topologically mixing dynamical systems (see Exercises below) and provide examples in which one can prove that a small set (for example a dyadic rectangle for the baker map or a rectangle whose sides are in the eigenvector directions for the cat map) is spread under the dynamics.
Exercise 2.2.1. Let X = R/Z and let f (x) = 2x mod 1 be the doubling map. You can use that if d(x, y) <
1/4, then d(f (x), f (y)) = 2d(x, y).
(a) Let I be a dyadic interval, that is an interval of the form
I = [ i/2^N , (i + 1)/2^N ),    N ∈ N+ , 0 ≤ i ≤ 2^N − 1.
Describe the iterates f n (I) for n ∈ N: how many dyadic intervals do they consist of? of which size?
[Hint: You will need to consider separately what happens if n ≤ N and if n > N .]
(b) Show that for any non-empty open set U there exists N ∈ N such that f^N (U ) = X.
(c) Show that the doubling map is topologically mixing.
Exercise 2.2.2. Let fA : T2 → T2 be the cat map. Let λ1 > 1, λ2 < 1 be the eigenvalues of A and let
v1 = ( (1 + √5)/2 , 1 ),    v2 = ( (1 − √5)/2 , 1 ).
(a) Let Q be a small rectangle whose sides have direction v1 and v2 respectively. Describe what the iterates f_A^n (Q) look like.
(b) Show that the cat map is topologically mixing.
[Hint: In both Part (a) and (b) you can use that the directions of v1 and v2 are irrational and that if π(L) is the projection via π : R2 → T2 of a line L with irrational slope (see figure below), then π(L) is dense in T2 , that is for any non-empty open set U ⊂ T2 there is a point of π(L) inside U .]
Figure 2.2: The tent map g and the logistic map f are topologically conjugate.
The logistic map f and the tent map g are topologically conjugate. Let us show that the topological
conjugacy is the map ψ : [0, 1] → [0, 1] given by
ψ(x) = sin^2 (πx/2).      (2.2)
Let us first show that the following diagram commutes:
             g
   [0, 1] −−−−→ [0, 1]
     |ψ            |ψ
     ↓             ↓
   [0, 1] −−−−→ [0, 1]
             f
that is, ψ ◦ g = f ◦ ψ.
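The commutation relation can also be checked numerically. The following sketch (added for illustration, not part of the original notes) assumes the standard formulas f (x) = 4x(1 − x) for the logistic map and g(x) = 2x for x ≤ 1/2, g(x) = 2(1 − x) for x > 1/2 for the tent map, which are not repeated in this excerpt.

    # Sketch: numerical check that psi(g(x)) = f(psi(x)) on a grid of points.
    import math

    def g(x): return 2 * x if x <= 0.5 else 2 * (1 - x)   # tent map (assumed formula)
    def f(x): return 4 * x * (1 - x)                      # logistic map (assumed formula)
    def psi(x): return math.sin(math.pi * x / 2) ** 2     # the conjugacy (2.2)

    err = max(abs(psi(g(i / 1000)) - f(psi(i / 1000))) for i in range(1001))
    print(err)    # of the order of the floating-point rounding error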
Proof of Lemma 2.3.1. Assume that O_g^+(y) is dense and let us show that O_f^+(ψ(y)) is dense. For any non-empty open set U ⊂ X, ψ^(−1)(U ) is an open set in Y , since ψ is continuous (because ψ is a homeomorphism), and it is non-empty, since ψ is surjective. By density of O_g^+(y), there exists k ∈ N such that g^k (y) ∈ ψ^(−1)(U ), and hence f^k (ψ(y)) = ψ(g^k (y)) ∈ U , which shows that O_f^+(ψ(y)) intersects U .
Exercise 2.3.1. Check that if the topological dynamical systems f : X → X and g : Y → Y are topologically semi-conjugate, one can still prove that if the orbit O_g^+(y) of y ∈ Y is dense, then the orbit O_f^+(ψ(y)) of ψ(y) is dense in X.
Proof of Proposition 3. Parts (1) and (2) follow from the definitions as a consequence of Lemma 2.3.1, since dense orbits are mapped to dense orbits by the conjugacy. Indeed, if g is topologically transitive, there exists y ∈ Y so that O_g^+(y) is dense and by Lemma 2.3.1 O_f^+(ψ(y)) is dense in X, so also f is topologically transitive. Similarly, if g is minimal, for any x ∈ X, let y = ψ^(−1)(x) ∈ Y (which is well defined since ψ is invertible). By minimality of g, O_g^+(y) is dense and by Lemma 2.3.1 O_f^+(x) is dense in X, so also f is minimal. The converse implications follow by reversing the roles of f and g and noting that if ψ : Y → X is a topological conjugacy, also ψ^(−1) : X → Y is a topological conjugacy.
Let us now prove Part (3). Assume that g is topologically mixing and let us deduce that f is also topologically mixing. Given U, V open and non-empty, ψ^(−1)(U ) and ψ^(−1)(V ) are also open, since ψ is continuous, and non-empty, since ψ is surjective. Thus there exists N such that for any n ≥ N we have that g^n (ψ^(−1)(U )) ∩ ψ^(−1)(V ) ≠ ∅. Let y ∈ g^n (ψ^(−1)(U )) ∩ ψ^(−1)(V ). Recall that by definition of conjugacy, since ψ is invertible, we have ψ ◦ g ◦ ψ^(−1) = f and hence by induction ψ ◦ g^n ◦ ψ^(−1) = f^n . Thus, if we consider ψ(y) we have that
ψ(y) ∈ ψ( g^n (ψ^(−1)(U )) ) ∩ V = f^n (U ) ∩ V.
Hence for any n ≥ N we have f^n (U ) ∩ V ≠ ∅, which shows that f is topologically mixing. The other implication follows again by reversing the roles of f and g.
The next Exercise follows from Exercise 2.3.1 as Proposition 3 follows from Lemma 2.3.1:
Exercise 2.3.2. If f : X → X and g : Y → Y are topologically semi-conjugated by ψ : Y → X, then if g is
topologically transitive, f is topologically transitive and if g is minimal then f is minimal.
Exercise 2.3.3. Prove that if f : X → X and g : Y → Y are topologically semi-conjugated by ψ : Y → X
then if g is topologically mixing then f is topologically mixing.
Sensitive dependence means that arbitrarily close to each point of the space there are points whose future iterates will become ∆-apart from the iterates of the given point. This phenomenon causes high unpredictability: if for example one tries to use a computer to understand the dynamics of a system, one needs to approximate the initial condition by rounding it off, and if the system has sensitive dependence on initial conditions, this might cause a huge difference: the simulated orbit might be completely different from the real evolution.
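The following small sketch (added here for illustration, not part of the original notes) shows this round-off effect for the doubling map: two initial conditions differing by 10^(−12), such as an exact value and its rounded version, have essentially unrelated simulated orbits after about forty iterations.

    # Sketch: sensitive dependence and round-off for the doubling map.
    def doubling(x):
        return (2 * x) % 1.0

    def dist(x, y):                      # distance on the circle R/Z
        t = abs(x - y) % 1.0
        return min(t, 1.0 - t)

    x, y = 0.123456789, 0.123456789 + 1e-12
    for n in range(50):
        if n % 10 == 0:
            print(n, dist(x, y))         # the error roughly doubles at each step
        x, y = doubling(x), doubling(y)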
Example 2.3.2. Show that if ∆ > 0 is a sensitivity constant for f : X → X, any constant 0 < ∆0 < ∆ is
also a sensitivity constant.
7 A popular image for this phenomenon is the so-called butterfly effect: "a butterfly in China could cause a hurricane in Mexico". A small difference in initial conditions, like the presence of a butterfly flapping its wings, in a chaotic system could create a huge difference in the long-term evolution, such as the creation of a hurricane.
Note that we do not require that all points in B(x, δ) have different evolution, but only that in each ball
there is at least one point with that property. A stronger requirement, which implies sensitive dependence, is
the following:
Definition 2.3.5. A topological dynamical system f : X → X on a metric space (X, d) is called expansive if there exists ν > 0, called the expansive constant, such that for all x, y ∈ X with x ≠ y there exists n ∈ N such that
d(f^n (x), f^n (y)) ≥ ν.
If f is expansive, the orbit of any point near a given point x ∈ X has some iterate which eventually becomes far apart from the corresponding iterate of x. Thus, if f is expansive with constant ν, in particular it has sensitive dependence with constant ∆ = ν.
Example 2.3.3. The doubling map is expansive. We can take ν = 1/4. We have already seen in §1.3 that if d(x, y) < 1/4, then d(f (x), f (y)) = 2d(x, y). Let x ≠ y be distinct points. If d(x, y) ≥ 1/4, then the definition of expansive holds with n = 0 for x and y. Otherwise, by the above relation d(f (x), f (y)) = 2d(x, y). If d(f (x), f (y)) ≥ 1/4, we are again done since the definition holds with n = 1. Otherwise, d(f^2 (x), f^2 (y)) = 2d(f (x), f (y)) = 2^2 d(x, y). Thus, continuing in this way, if d(f^k (x), f^k (y)) < 1/4 for all 0 ≤ k ≤ n − 1, then d(f^n (x), f^n (y)) = 2^n d(x, y).
Note that d(x, y) ≠ 0 by the properties of a distance if x ≠ y. Thus, for given x ≠ y there exists n ∈ N such that 2^n d(x, y) ≥ 1/4 (namely any n ≥ log( 1/(4 d(x, y)) )/ log 2), and therefore d(f^n (x), f^n (y)) ≥ 1/4. We conclude that the doubling map is expansive with expansive constant ν = 1/4 (in particular, it has sensitive dependence on initial conditions with constant 1/4).
Exercise 2.3.4. The linear maps Em : R/Z → R/Z given by Em (x) = mx mod 1 (where m ∈ N, m > 1) are expansive with expansive constant ν = 1/(2m).
Let us now give an example of a map which has sensitive dependence but is not expansive.
Example 2.3.4. Let fA : T2 → T2 be the cat map. Let us first show that fA has sensitive dependence on initial conditions. Take for example ∆ = 1/2. For any x ∈ T2 and any δ > 0, consider all y ∈ Bd (x, δ) on the line through x in direction of the eigenvector v1 with eigenvalue λ1 > 1. Since fA expands points in direction v1 with factor λ1 , we have that
d(f_A^n x, f_A^n y) = λ1^n d(x, y),
at least for all n such that λ1^n d(x, y) ≤ 1/2. Since λ1 > 1, there exists n0 such that 1/(2λ1^n0 ) < δ. We can now choose y ∈ Bd (x, δ) so that d(y, x) = 1/(2λ1^n0 ). Then d(f_A^n0 x, f_A^n0 y) = 1/2. This shows that ∆ = 1/2 is a sensitivity constant.
Let us now show that fA is not expansive. Fix any ν ∈ (0, 1/2]. Then fix any x ∈ T2 and consider y ∈ Bd (x, ν) on the line through x in direction of the eigenvector v2 with eigenvalue λ2 < 1. Since fA contracts distances in direction v2 with factor λ2 , we have that
d(f_A^n x, f_A^n y) = λ2^n d(x, y) ≤ d(x, y) < ν
for all n ∈ N. We have thus shown that for any ν ∈ (0, 1/2] there exist x ≠ y such that d(f_A^n x, f_A^n y) < ν for all n ∈ N. This implies fA is not expansive.
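Numerically, the different behaviour in the two eigendirections can be seen as follows (a sketch added for illustration; it assumes the usual cat map matrix A = [2 1; 1 1], which is not repeated in this excerpt).

    # Sketch: perturbations of a point along the stable direction v2 stay small
    # for all n, while perturbations along the unstable direction v1 grow.
    import numpy as np

    A = np.array([[2, 1], [1, 1]])                  # assumed cat map matrix
    eigvals, eigvecs = np.linalg.eig(A)
    v1 = eigvecs[:, int(np.argmax(eigvals))]        # expanding direction
    v2 = eigvecs[:, int(np.argmin(eigvals))]        # contracting direction

    def torus_dist(p, q):
        d = np.abs((p - q) % 1.0)
        return np.linalg.norm(np.minimum(d, 1.0 - d))

    x = np.array([0.2, 0.7])
    y_stable = (x + 1e-3 * v2) % 1.0
    y_unstable = (x + 1e-3 * v1) % 1.0
    for n in range(0, 25, 6):
        An = np.linalg.matrix_power(A, n)
        print(n, torus_dist(An @ x % 1.0, An @ y_stable % 1.0),
                 torus_dist(An @ x % 1.0, An @ y_unstable % 1.0))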
Exercise 2.3.5. Show that any hyperbolic toral automorphism fA : T2 → T2 has sensitive dependence but is not expansive.
Exercise 2.3.6. Let F : [0, 1]2 → [0, 1]2 be the Baker map.
(a) Show that the baker map has sensitive dependence on initial conditions with ∆ = 1/4;
(b) Show that it is not expansive, that is, that there is no ν > 0 such that for every two points (x1 , y1 ), (x2 , y2 ) ∈ [0, 1]2 there is n such that d(F^n (x1 , y1 ), F^n (x2 , y2 )) ≥ ν.
[Hint: Look at nearby points on the same horizontal line for (a) and at nearby points on the same vertical
line for (b)]
Example 2.3.5. The doubling map is chaotic. We have already seen that it is expansive (Eg 2.3.3), that it
is topologically transitive (Theorem 1.3.1 in §1.3) and that periodic points are dense (Exercise 1.3.4 in §1.3).
Example 2.3.6. Rotations Rα of the circle are not chaotic according to this definition. If α is irrational,
there are no periodic points. If α is rational, there are no dense orbits. Moreover, for any α, Rα does not have
sensitive dependence on initial conditions since Rα is an isometry and the iterates of nearby points remain
close by.
Other maps that can be proved to be chaotic are the baker map, the cat map and the logistic map. For the latter, one can use the fact that it is topologically conjugate to the tent map. Indeed, let us remark that if f : X → X and g : Y → Y are topologically conjugate one can prove, similarly to what we did for the other properties, that f is chaotic if and only if g is chaotic.
* Exercise 2.3.7. Consider the linear twist T : T2 → T2 , that is the map given by
(b) Show that if y is rational then (x, y) is periodic. Conclude that P er(T ) is dense.
(c) Is T chaotic?
* Exercise 2.3.8. Consider the tent map f and the logistic map g defined above in this lecture.
• Show that the tent map f is topologically mixing;
8 The first to adopt this as the definition of chaos was Devaney, in An Introduction to Chaotic Dynamical Systems. Sometimes this definition is also called Devaney chaotic.
Thus, two points are ε-close with respect to the distance dn if their iterates under f stay ε-close until time n. Note that the definition of dn depends on the transformation f . Thus, the balls with respect to this metric,
B_dn (x, ε) = {y ∈ X such that d(f^k (x), f^k (y)) < ε for all 0 ≤ k < n},
consist of all points whose trajectories up to time n stay ε-close to the finite orbit segment {x, f (x), . . . , f^(n−1)(x)}. Another way to express it is
B_dn (x, ε) = ∩_(k=0)^(n−1) f^(−k) ( Bd (f^k (x), ε) ).
Indeed, the points y in the intersection are exactly the points such that f^k (y) ∈ Bd (f^k (x), ε) for all 0 ≤ k ≤ n − 1.
Definition 2.4.1. Let ε > 0, n ∈ N. A set S ⊂ X is (n, ε)-separated if for all distinct points x, y ∈ S, x ≠ y, we have dn (x, y) ≥ ε.
Points in an (n, ε)-separated set have trajectories that, with a finite scale resolution ε, can be recognized as different within time n. Let us give an example of an (n, ε)-separated set.
9 The definition of metric entropy, often called Kolmogorov-Sinai entropy, was introduced for the first time by Kolmogorov, one of the fathers of ergodic theory, in a paper in 1958 and was subsequently developed by Sinai, another crucial figure in ergodic theory, who at the time was his graduate student (the entry on entropy on Scholarpedia was actually written by Sinai himself). Entropy in information theory, usually called Shannon entropy, was introduced by Claude Shannon in his 1948 paper A Mathematical Theory of Communication.
10 In metric spaces this definition of entropy was introduced by Bowen in 1971 and independently by Dinaburg in 1970. The
definition of entropy via covers for any topological space already existed before. Equivalence between these two notions was
proved by Bowen in 1971.
11 Historically the definition of topological entropy via covers came first and was introduced in 1965 by Adler, Konheim and McAndrew. Their definition for topological dynamical systems is modelled on the definition of metric entropy by Kolmogorov and Sinai.
Example 2.4.1. [Separated sets for the doubling map] Let f : R/Z → R/Z be the doubling map, f (x) = 2x mod 1. Let ε > 0 and assume that ε < 1/4. Find k such that 1/2^(k+1) < ε ≤ 1/2^k . By the assumption on ε, k ≥ 2. Consider the set Sn of dyadic fractions with denominator 2^n , that is
Sn = { i/2^n , 0 ≤ i ≤ 2^n − 1 }.      (2.4)
Let us prove that the set S_(n−1+k) is (n, ε)-separated. Let x, y ∈ S_(n−1+k) , with x ≠ y. We need to show that dn (x, y) ≥ ε, that means that there exists 0 ≤ l ≤ n − 1 such that d(f^l (x), f^l (y)) ≥ ε. In an exercise, we saw that for any u, v ∈ R/Z
d(u, v) ≤ 1/4   ⇒   d(f (u), f (v)) = 2d(u, v).      (2.5)
If there exists 0 ≤ l ≤ n − 1 such that d(f^l (x), f^l (y)) ≥ 1/4, we are done (since k ≥ 2, 1/4 ≥ 1/2^k ≥ ε). Otherwise, we can apply (2.5) repeatedly n − 1 times and get
d(f^(n−1)(x), f^(n−1)(y)) = 2^(n−1) d(x, y).
Since x ≠ y and x, y ∈ S_(n−1+k) , we have d(x, y) ≥ 1/2^(n−1+k) , so that we get
d(f^(n−1)(x), f^(n−1)(y)) = 2^(n−1) d(x, y) ≥ 2^(n−1)/2^(n−1+k) = 1/2^k ≥ ε.
This proves that S_(n−1+k) is (n, ε)-separated. Note that the cardinality of S_(n−1+k) is 2^(n−1+k) .
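As a quick numerical sanity check (a sketch added here, not part of the original notes), one can verify directly that S_(n−1+k) is (n, ε)-separated for small values of n and k.

    # Sketch: brute-force check that S_{n-1+k} is (n, eps)-separated for the
    # doubling map, with k = 2 and eps = 1/2^k = 1/4 (the borderline value).
    def dist(x, y):
        t = abs(x - y) % 1.0
        return min(t, 1.0 - t)

    def d_n(x, y, n):
        m = 0.0
        for _ in range(n):
            m = max(m, dist(x, y))
            x, y = (2 * x) % 1.0, (2 * y) % 1.0
        return m

    n, k = 5, 2
    eps = 1 / 2 ** k
    S = [i / 2 ** (n - 1 + k) for i in range(2 ** (n - 1 + k))]
    print(min(d_n(x, y, n) for x in S for y in S if x != y) >= eps)   # True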
Remark 2.4.1. One can show, using the assumption that X is compact, that (n, ε)-separated sets exist and are finite (see Extra 1 in the Extras on Topological Entropy linked after the following class).
To determine how many different orbits can be recognized at resolution ε, one can maximize the size of an (n, ε)-separated set.
Definition 2.4.2. Let Sep(f, n, ε) (or simply Sep(n, ε) when there is no ambiguity on f ) be the maximal (that is, the largest) cardinality of an (n, ε)-separated set in X.
Clearly as n grows, the maximum number of (n, ε)-separated points will grow. In our Example 2.4.1 with the doubling map, the cardinality of the (n, ε)-separated set S_(n−1+k) that we exhibited grows as 2^(k+n−1), thus exponentially in n.
Note that if a quantity grows exponentially, for example if an = e^(κn), the exponential rate of growth, or the exponent κ, can be obtained as
κ = lim_(n→∞) log( e^(κn) )/n = lim_(n→∞) log( an )/n.
This is still true if the growth is not purely exponential, but there are other subexponential factors, for example if an = n^2 e^(κn):
lim_(n→∞) log( an )/n = lim_(n→∞) log( n^2 e^(κn) )/n = lim_(n→∞) 2 log(n)/n + lim_(n→∞) log( e^(κn) )/n = 0 + κ = κ.
More generally, if an = f (n) e^(κn) where f (n) is subexponential, that is, limn→∞ log f (n)/n = 0, the limit limn→∞ log an /n still gives κ.
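As a quick numerical check of this fact (a sketch added here, not part of the original notes), the quantity log(an )/n indeed approaches κ even in the presence of the subexponential factor n^2 considered above.

    # Sketch: log(a_n)/n tends to kappa for a_n = n^2 e^{kappa n}, kappa = log 2.
    import math

    kappa = math.log(2)
    for n in (10, 100, 500, 1000):
        a_n = n ** 2 * math.exp(kappa * n)
        print(n, math.log(a_n) / n)      # approaches kappa = 0.6931...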
Thus, to consider the exponential growth rate of Sep(f, n, ε), for any ε > 0, consider the quantity
h_top (f, ε) = lim sup_(n→∞) log( Sep(f, n, ε) )/n.
We need to consider the lim sup because we do not know a priori if the limit exists. In all the examples that we will compute, the limit actually exists. If we now change resolution, thus let ε → 0, it is again clear that Sep(f, n, ε) cannot decrease as ε tends to zero (as the resolution becomes finer and finer, one can distinguish at least the same orbits, and probably more). Moreover one can show that the growth rate of distinguishable orbits will stay the same or increase, thus also h_top (f, ε) is non-decreasing as ε decreases. Since a monotone function has a limit, the limit of h_top (f, ε) as ε tends to 0 exists.
Definition 2.4.3. [Topological entropy] The topological entropy h_top (f ) of a topological dynamical system f : X → X is given by
h_top (f ) = lim_(ε→0) h_top (f, ε) = lim_(ε→0) lim sup_(n→∞) log( Sep(f, n, ε) )/n.
Thus, h_top (f, ε) is the exponential growth rate of the maximum number of orbits of length n which are distinguishable with finite precision ε, and h_top (f ), which is the limit for arbitrarily small ε, is the exponential growth rate of the maximum number of orbits of length n which are distinguishable with finite but arbitrary precision.
Example 2.4.2. Let f be again the doubling map. We showed that if 1/2^(k+1) < ε ≤ 1/2^k then the set S_(n−1+k) is an (n, ε)-separated set. Thus the maximal cardinality Sep(n, ε) of an (n, ε)-separated set is at least the cardinality of S_(n−1+k) , which is 2^(n−1+k) . Thus, remarking that here ε, and hence k, are fixed and the limit is taken only in n, we get
h_top (f, ε) ≥ lim sup_(n→∞) log( 2^(n−1+k) )/n = log 2,
and hence h_top (f ) ≥ log 2.
Definition 2.4.4. Let ε > 0, n ∈ N. A set S ⊂ X is (n, ε)-spanning if for all x ∈ X there is a y ∈ S such that dn (x, y) < ε.
In other words, a set S is (n, ε)-spanning if any point of the space can be approximated by a point of S whose orbit is indistinguishable from it up to time n with finite resolution ε. Equivalently, S is (n, ε)-spanning if and only if
X ⊂ ∪_(y∈S) B_dn (y, ε),    where B_dn (y, ε) = {x ∈ X such that dn (x, y) < ε}.
One can show that in any compact metric space (X, d), for any ε > 0 and n ∈ N there exist (n, ε)-spanning sets (see Extra 1 in the Extras on Topological Entropy linked after the following class).
Example 2.4.3. [Spanning set for the doubling map] Let f be again the doubling map and Sk the set of dyadic fractions with denominator 2^k , see (2.4). Let 1/2^(k+1) < ε ≤ 1/2^k . Then S_(n+k) is (n, ε)-spanning. Indeed, if x ∈ [0, 1], there exists i such that
x ∈ [ i/2^(n+k) , (i + 1)/2^(n+k) ),    where 0 ≤ i ≤ 2^(n+k) − 1.
If y = i/2^(n+k) ∈ S_(n+k) , then
d(x, y) ≤ 1/2^(n+k)   ⇒   d(f^j (x), f^j (y)) ≤ 2^j/2^(n+k) ≤ 2^(n−1)/2^(n+k) = 1/2^(k+1) < ε,    for all 0 ≤ j < n.
Thus, dn (x, y) < ε.
Note that the cardinality of S_(n+k) is 2^(n+k) , so it grows exponentially in n.
This time it makes sense to see what is the smallest possible (n, ε)-spanning set.
Definition 2.4.5. Let Span(f, n, ε) (or Span(n, ε) if there is no ambiguity on f ) be the minimal (that is, the smallest) cardinality of an (n, ε)-spanning set in X.
In other words, Span(f, n, ε) is the minimum number of initial conditions needed to approximate with resolution ε all orbits in the space up to time n. We can consider the exponential growth rate in n of Span(f, n, ε) for fixed ε and then take the limit as ε tends to zero. It turns out that this gives the same result as using Sep(f, n, ε) and yields again the topological entropy:
Theorem 2.4.1. For any topological dynamical system f : X → X we have
h_top (f ) = lim_(ε→0) lim sup_(n→∞) log( Span(f, n, ε) )/n.
Note that we have already defined h_top (f ) in terms of (n, ε)-separated sets, so by definition
h_top (f ) = lim_(ε→0) lim sup_(n→∞) log( Sep(f, n, ε) )/n.
Thus, the content of the theorem is that the two limits, obtained using separated and spanning sets respectively, are the same, that is:
lim_(ε→0) lim sup_(n→∞) log( Sep(f, n, ε) )/n = lim_(ε→0) lim sup_(n→∞) log( Span(f, n, ε) )/n.
The proof of the theorem is given below. Let us first remark that the theorem states that to compute topological entropy, we can either use maximal separated sets or minimal spanning sets. While the expression with Sep(n, ε) is useful to give lower bounds on h_top (f ), the expression with Span(n, ε) is useful to give upper bounds on h_top (f ). Indeed, once we find an (n, ε)-separated set, we have a lower bound on the maximal cardinality of (n, ε)-separated sets, that is on Sep(n, ε). Conversely, once we find an (n, ε)-spanning set, we have an upper bound on the minimal cardinality of (n, ε)-spanning sets, that is on Span(n, ε). When the upper bound and the lower bound coincide, the common value is the topological entropy.
Let us show two applications of Theorem 2.4.1.
Example 2.4.4. [Entropy of the doubling map] Consider the doubling map f . Let ε < 1/4 and let k be such that 1/2^(k+1) < ε ≤ 1/2^k . Since S_(n+k) is an (n, ε)-spanning set of cardinality 2^(n+k) , the minimal cardinality Span(f, n, ε) of an (n, ε)-spanning set is at most 2^(n+k) , so
lim sup_(n→∞) log( Span(f, n, ε) )/n ≤ lim sup_(n→∞) log( 2^(n+k) )/n = log 2.
Note that k is fixed when taking the limit in n. Taking now the limit in ε of a quantity which is independent of ε, by Theorem 2.4.1 we have h_top (f ) ≤ log 2. Since we have already shown using (n, ε)-separated sets that h_top (f ) ≥ log 2, we conclude that h_top (f ) = log 2.
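One can also estimate this entropy numerically. The following sketch (added for illustration, not part of the original notes) greedily builds an (n, ε)-separated set for the doubling map out of a fine grid of candidate points; the quantity log(cardinality)/n slowly decreases towards log 2 ≈ 0.693 (the prefactor coming from ε disappears only in the limit n → ∞).

    # Sketch: crude numerical lower bound for h_top of the doubling map via a
    # greedily constructed (n, eps)-separated set taken from a grid of points.
    import math

    def dist(x, y):
        t = abs(x - y) % 1.0
        return min(t, 1.0 - t)

    def d_n(x, y, n):
        m = 0.0
        for _ in range(n):
            m = max(m, dist(x, y))
            x, y = (2 * x) % 1.0, (2 * y) % 1.0
        return m

    eps, grid = 0.2, [i / 2048 for i in range(2048)]
    for n in (2, 4, 6):
        S = []
        for x in grid:                                   # keep x only if it stays
            if all(d_n(x, y, n) >= eps for y in S):      # eps-separated from S
                S.append(x)
        print(n, len(S), math.log(len(S)) / n)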
Example 2.4.5. [Entropy of rotations] Let Rα : S 1 → S 1 be a rotation. Fix ε and N such that 1/N ≤ ε. Let us consider N equispaced points on S 1 :
S = { e^(2πi m/N) , m = 0, 1, . . . , N − 1 },
so that the arc length distance between two successive points in S is 1/N . Clearly S is (1, ε)-spanning, since for any z ∈ S 1 there exists z′ ∈ S such that d(z, z′ ) ≤ 1/N ≤ ε. Let us show that the same set S is also (n, ε)-spanning for any n ∈ N. It is enough to remark that for any j ∈ N, since Rα preserves the arc length distance,
d(Rα^j (z), Rα^j (z′ )) = d(z, z′ ) ≤ 1/N ≤ ε.      (2.6)
Thus, for any n ∈ N, dn (z, z′ ) ≤ ε. This shows that S is also (n, ε)-spanning for any n ∈ N.
Hence, the minimal cardinality Span(Rα , n, ε) is at most the cardinality of S, which is N , independently of n. Thus, for any ε, if we choose 1/N ≤ ε and keep N fixed as n → ∞, we get
lim sup_(n→∞) log( Span(Rα , n, ε) )/n ≤ lim sup_(n→∞) log( N )/n = 0.
Notice that the growth rate is non-negative, so since it is at most 0, it is indeed 0. By Theorem 2.4.1,
0 ≤ h_top (Rα ) = lim_(ε→0) lim sup_(n→∞) log( Span(Rα , n, ε) )/n = 0.
Let us now prove (2). Let S be an (n, ε)-spanning set of minimal cardinality, so that Span(n, ε) = Card(S). By definition of (n, ε)-spanning, we know that
X ⊂ ∪_(y∈S) B_dn (y, ε).
Let S′ be any (n, 2ε)-separated set. We claim that no two distinct points x1 ≠ x2 of S′ can belong to the same dn -ball. Indeed, if both x1 , x2 ∈ B_dn (y, ε) for some y ∈ S, by the triangle inequality
dn (x1 , x2 ) ≤ dn (x1 , y) + dn (y, x2 ) < 2ε,
which contradicts the fact that S′ is (n, 2ε)-separated. So the number of points in S′ cannot be more than the number of balls, that is the cardinality of S. Let us now choose S′ to be an (n, 2ε)-separated set of maximal cardinality. Then
Sep(n, 2ε) = Card(S′ ) ≤ Card(S) = Span(n, ε).
Proof of Theorem 2.4.1. By the Lemma, for any ε > 0 and n ∈ N, we have
Sep(n, 2ε) ≤ Span(n, ε) ≤ Sep(n, ε),
where the first inequality follows from (2) and the second inequality follows from (1). By the sandwich theorem (also called the pinching theorem or two policemen theorem), if we now take the limit as ε tends to zero, both the left and the right hand side converge by definition to h_top (f ), so
h_top (f ) ≤ lim_(ε→0) lim sup_(n→∞) log( Span(f, n, ε) )/n ≤ h_top (f ).
Thus, we conclude as desired that the limit as ε tends to zero of the exponential growth rate of Span(f, n, ε) is equal to h_top (f ).
1. If for any fixed ε > 0 you can construct sets Sn , for each n ∈ N, which are (n, ε)-separated and whose exponential growth rate is h1 , then, since Sep(n, ε) ≥ Card(Sn ), you can conclude that h_top (f ) ≥ h1 ;
2. If for any fixed ε > 0 you can construct sets Sn , for each n ∈ N, which are (n, ε)-spanning and whose exponential growth rate is h2 , then, since Span(n, ε) ≤ Card(Sn ), you can conclude that h_top (f ) ≤ h2 ;
3. If the exponential growth rates h1 and h2 in the two previous points are the same, then you can conclude that h_top (f ) = h1 = h2 .
We have already seen examples of this principle in the previous section. We will now compute the topological entropy of hyperbolic toral automorphisms using this strategy.
A v 1 = λ1 v 1 , A v 2 = λ2 v 2 ,
and let us assume that they are renormalized so that they have unit length: ||v 1 || = ||v 2 || = 1.
Proof. Let us first construct (n, ε)-spanning sets for fA . Fix ε > 0 and choose N ∈ N such that 1/N ≤ ε/2. Draw inside the unit square [0, 1) × [0, 1) segments of lines in the direction of v1 which cross the square fully and have spacing 1/N both on the horizontal and on the vertical side, as in Figure 2.3(a). Note that the distance between two successive lines is less than 1/N ≤ ε/2, since the distance between lines is the side of a right triangle whose hypotenuse is the spacing 1/N between lines.
On each line, consider points whose spacing is ε/(2λ1^(n−1)). The reason for this choice will be clear later. Let S ⊂ T2 be the set which consists of the union over the lines of these points (see Figure 2.3(b)). Let us prove that S is (n, ε)-spanning.
Figure 2.4: (a) the points x and y; (b) their images A^k (x) and A^k (y).
Let x ∈ T2 (note that a point on T2 has two coordinates (x1 , x2 ) and we can think of it as a vector, that is why we write x). Let y ∈ S be the closest point to x among points in S. We can write x − y as a sum of a vector proportional to v2 and a vector proportional to v1 (see Figure 2.4(a)). Remark that the distance between x and y along the v1 direction is less than the spacing of points on each line, which is ε/(2λ1^(n−1)) (see Figure 2.4(a)); the distance between x and y in direction v2 is less than the distance between x and a line, which is less than the distance ε/2 between lines. Thus we can write
x = y + a v1 + b v2 ,    where |a| ≤ ε/(2λ1^(n−1)),  |b| ≤ ε/2.      (2.7)
Let us now compute the distance between the orbits of x and y under fA . Recall that f_A^k is obtained by first acting linearly by the matrix A^k and then taking the result modulo 1 (which corresponds to cutting and pasting the affine image of [0, 1]2 under A^k to map it back again to a unit square). Since A acts linearly, for each iterate k ∈ N
A^k (x − y) = A^k (a v1 + b v2 ) = a A^k v1 + b A^k v2 = a λ1^k v1 + b λ2^k v2 ,
where in the latter equality we used that v1 , v2 are eigenvectors. Thus, since the operation of cutting and pasting does not increase the distances (actually, if the distance is less than 1, it is preserved when taking the result modulo Z2 ), by setting z = y + a v1 = x − b v2 (see again Figure 2.4(a)), we have by the triangle inequality that
d(f_A^k (x), f_A^k (y)) ≤ d(f_A^k (x), f_A^k (z)) + d(f_A^k (z), f_A^k (y)) ≤ |a| λ1^k + |b| λ2^k .
Using now the bounds on |a| and |b| from (2.7) and then recalling that k ≤ n − 1 and that λ2 < 1, we get that
|a| λ1^k + |b| λ2^k ≤ ( ε/(2λ1^(n−1)) ) λ1^k + (ε/2) λ2^k ≤ ε/2 + ε/2 = ε.
Thus d(f_A^k (x), f_A^k (y)) ≤ ε for each 0 ≤ k ≤ n − 1, which means that dn (x, y) ≤ ε and concludes the proof that S is (n, ε)-spanning.
Let us bound the cardinality of S. Let L be the length of the longest line segment. Then, we can bound the number of points in each line by L divided by the spacing, and since there are at most 2N lines, we get
Card(S) ≤ (number of lines)(points on each line) ≤ 2N ( L/( ε/(2λ1^(n−1)) ) ) = (4N L/ε) λ1^(n−1) .
Let us compute the exponential growth rate of Span(n, ε), recalling that N and ε are fixed and only n grows:
lim sup_(n→∞) log( Span(n, ε) )/n ≤ lim sup_(n→∞) log( (4N L/ε) λ1^(n−1) )/n = log λ1 .
Thus, using the Theorem which expresses the topological entropy using the growth rate of Span(n, ε), we have h_top (fA ) ≤ log λ1 .
Let us now construct (n, ε)-separated sets for fA . Fix an ε < 1/2. To construct an (n, ε)-separated set S, it is enough to consider now only one of the lines, for example the line of maximal length, whose length will be denoted by L, and let S ⊂ T2 be the set which consists of points on the line whose spacing is ε/λ1^(n−1), as in Figure 2.3(c). Let us prove that S is (n, ε)-separated, that is, that, given two distinct points x, y in S, there exists 0 ≤ l ≤ n − 1 such that d(f_A^l (x), f_A^l (y)) ≥ ε, i.e. the two points can be distinguished with resolution ε in time n. Let us check that the closest points, that is two consecutive points on the line, can be distinguished.
If x and y are consecutive points, they can be written as
y = x + ( ε/λ1^(n−1) ) v1 .
Since A^(n−1)(y − x) = ( ε/λ1^(n−1) ) λ1^(n−1) v1 = ε v1 , and the operation of cutting and pasting to consider the result mod Z2 does not change distances until the distance is at least 1, we have, for k = n − 1,
d(f_A^(n−1)(x), f_A^(n−1)(y)) = ( ε/λ1^(n−1) ) λ1^(n−1) = ε.
Thus, dn (x, y) ≥ ε. If one considers another pair of distinct points x′ , y′ on the line, d(x′ , y′ ) ≥ d(x, y) and moreover, since A^k acts linearly, d(A^k x′ , A^k y′ ) ≥ d(A^k x, A^k y) for any k. Hence there will also be a 0 ≤ k < n such that
d(f_A^k (x′ ), f_A^k (y′ )) = d(A^k (x′ ), A^k (y′ )) ≥ ε.
Thus we conclude that dn (x′ , y′ ) ≥ ε and that S is (n, ε)-separated.
The cardinality of S, which is the number of points on the line, is at least
Card(S) ≥ L/( ε/λ1^(n−1) ) = (L/ε) λ1^(n−1) .
Using this time the definition of h_top (fA ) with Sep(n, ε) we get
h_top (fA , ε) ≥ lim sup_(n→∞) log( (L/ε) λ1^(n−1) )/n = log λ1 .
Thus, h_top (fA ) ≥ log λ1 . Combining with h_top (fA ) ≤ log λ1 , we have proved that h_top (fA ) = log λ1 .
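For the standard cat map matrix A = [2 1; 1 1] (an assumed choice, since the matrix is not repeated in this excerpt), the value log λ1 can be computed numerically as follows (a minimal sketch, not part of the original notes).

    # Sketch: the largest eigenvalue of A and its logarithm, which equals
    # h_top(f_A) by the computation above.
    import math
    import numpy as np

    A = np.array([[2, 1], [1, 1]])
    lam1 = max(np.linalg.eigvals(A).real)
    print(lam1, math.log(lam1))   # (3 + sqrt(5))/2 = 2.618...,  log lam1 = 0.9624...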
Remark 2.5.1. If f and g are topologically conjugate, then h_top (f ) = h_top (g): topological entropy is an invariant of topological conjugacy.
For example, the ball Bd (x, ε) has diameter 2ε. Let dn be as usual the metric
dn (x, y) = max_(k=0,...,n−1) d(f^k (x), f^k (y)).
Let U = {U1 , U2 , . . . , UN } be a cover of X with open sets of dn -diameter at most ε, that is diam_dn (Ui ) ≤ ε for any 1 ≤ i ≤ N . Note that since X is compact we can assume that the cover consists of finitely many open sets (since from any open cover we can extract a finite subcover).
Example 2.5.1. If S is (n, ε/2)-spanning, then (see the definition of (n, ε/2)-spanning)
X ⊂ ∪_(y∈S) B_dn (y, ε/2),
so the collection {B_dn (y, ε/2), y ∈ S} is an open cover of X whose sets have dn -diameter at most ε and whose cardinality is Card(S).
This collection is a cover, since given any x ∈ X, since U is a cover, x ∈ U_(l0) for some 1 ≤ l0 ≤ N , f (x) ∈ U_(l1) for some 1 ≤ l1 ≤ N and so on up to f^(n−1)(x) ∈ U_(l_{n−1}) for some 1 ≤ l_{n−1} ≤ N , so that
x ∈ U_(l0) ∩ f^(−1)(U_(l1)) ∩ · · · ∩ f^(−(n−1))(U_(l_{n−1})).
(Note that here l0 , l1 , . . . , l_{n−1} , . . . is a symbolic coding of the itinerary of the orbit O_f^+(x) with respect to the cover {U1 , U2 , . . . , UN }.) Moreover, the dn -diameter of each set is at most ε, since if x, y ∈ U_(k0) ∩ f^(−1)(U_(k1)) ∩ . . . ∩ f^(−(n−1))(U_(k_{n−1})), then
x, y ∈ U_(k0) ,   f (x), f (y) ∈ U_(k1) ,   . . . ,   f^(n−1)(x), f^(n−1)(y) ∈ U_(k_{n−1}) ,
so that
d(x, y) ≤ ε,  d(f (x), f (y)) ≤ ε,  . . . ,  d(f^(n−1)(x), f^(n−1)(y)) ≤ ε   ⇒   dn (x, y) ≤ ε.
Definition 2.5.1. Let Cov(f, n, ε) (or simply Cov(n, ε) if the map f is clear from the context) be the minimal cardinality of an open cover U = {U1 , U2 , . . . , UN } by sets whose diameter in the dn metric satisfies diam_dn (Ui ) ≤ ε.
Remark 2.5.2. One can show that the exponential growth rate of Cov(n, ε), that is
lim_(n→∞) log( Cov(f, n, ε) )/n,
exists (see Extra), so there is no need to use lim inf to define it.
Theorem 2.5.2. Let (X, d) be a compact metric space and f : X → X a topological dynamical system. The topological entropy h_top (f ) can be computed using Cov(f, n, ε) and is given by
h_top (f ) = lim_(ε→0) lim_(n→∞) log( Cov(f, n, ε) )/n.
We will see an example of computation of topological entropy using this definition in the next sections.
Let us now give the proof of Theorem 2.5.2, which follows immediately from the following Lemma (which is
similar to Lemma 2.3.1).
Lemma 2.5.1. For any topological dynamical system f : X → X on a metric space (X, d) we have:
Span(n, ε) ≤ Cov(n, ε) ≤ Span(n, ε/2).
Proof. In Example 2.5.1, we showed that if S is (n, ε/2)-spanning, then it gives a cover with balls of dn -diameter at most ε and of cardinality Card(S). Thus, if S is such that Span(n, ε/2) = Card(S), this shows that the minimal cardinality Cov(n, ε) is at most the cardinality of S, thus Cov(n, ε) ≤ Span(n, ε/2), which is the second inequality.
For the other inequality, note that if U is a cover with open sets of dn -diameter at most ε, then each open set U is contained in a ball B_dn (x, ε) centered at any point x ∈ U . Indeed, if x ∈ U , any other point y ∈ U is such that dn (x, y) ≤ ε by definition of diameter. Thus, we can find a cover with dn -balls of radius ε of cardinality Card(U ), and the collection S of the centers of the balls gives an (n, ε)-spanning set of cardinality Card(U ). If we choose U such that Card(U ) = Cov(n, ε), this shows that the minimal cardinality Span(n, ε) is at most Card(S) = Card(U ). So we have Span(n, ε) ≤ Cov(n, ε).
Proof of Theorem 2.5.2. For each ε > 0 and each n ∈ N, by Lemma 2.5.1, we have that
log( Span(n, ε) )/n ≤ log( Cov(n, ε) )/n ≤ log( Span(n, ε/2) )/n.
Since, as ε → 0, by Theorem 2.4.1 in the previous section both the right and the left hand side tend to h_top (f ), we have, by the sandwich theorem, that
h_top (f ) = lim_(ε→0) lim_(n→∞) log( Cov(n, ε) )/n.
See the Extras for this section for the proof of the existence of the exponential growth rate of Cov(f, n, ε) and see Exercise .0.6 in the Extras for the proof of Remark 2.5.1 on topological entropy as a conjugacy invariant.
Σ+_N = {1, . . . , N }^∞ = { a = (ai )_(i=0)^∞ , 1 ≤ ai ≤ N },
Σ_N = {1, . . . , N }^Z = { a = (ai )_(i=−∞)^∞ , 1 ≤ ai ≤ N },
σ+ ( (ai )_(i=0)^(+∞) ) = (ai+1 )_(i=0)^(+∞) ,   or, when f is invertible,   σ( (ai )_(i=−∞)^(+∞) ) = (ai+1 )_(i=−∞)^(+∞) .
The maps
σ+ : Σ+_N → Σ+_N ,    σ : Σ_N → Σ_N ,
are known as the full (one-sided) shift on N symbols and the full (bi-sided) shift on N symbols, respectively.
If ψ : X → Σ+_N (or ψ : X → Σ_N in the invertible case) is the coding map which assigns to each point its itinerary, the previous relation shows that for all x ∈ X
ψ(f (x)) = σ+ (ψ(x)),   that is,   ψ ◦ f = σ+ ◦ ψ.
In order to give a conjugacy, though, the coding map ψ should be both injective and surjective. Thus, it is
natural to ask:
(Q1) Is the coding unique?
(Q2) Do all sequences in Σ+_N (or in Σ_N ) occur as possible itineraries?
The answer to both these questions is in general NO. In all the cases that we saw so far (doubling map, baker map, Gauss map), all possible finite12 sequences (in Σ+_2 for the doubling map, in Σ_2 for the baker map and in the countably many digits {1, 2, . . . , n, . . .} for the Gauss map) do occur, but as Example 2.6.1 below shows, this is often not the case.
12 Also in these examples there are countably many infinite sequences that do not occur as itineraries: for example, for the doubling map, if the coding partition is [0, 1/2) and [1/2, 1], all sequences which end with a tail of 1s do not appear as the itinerary of any point.
Example 2.6.1. Consider the map f : [0, 1) → [0, 1) given by
f (x) = 2x          if 0 ≤ x < 1/2,
f (x) = x − 1/2     if 1/2 ≤ x < 1,
whose graph is shown in Figure 2.5. Let I1 = [0, 1/2) and I2 = [1/2, 1]. It is clear that if x ∈ I2 , then f (x) ∈ I1 .
On the other hand, if x ∈ I1 , one could have either f (x) ∈ I1 (if x < 1/4) or f (x) ∈ I2 (if 1/4 ≤ x < 1/2). Thus, one will never see two consecutive digits 2, 2 in the itinerary, while all the combinations 1, 1 and 1, 2 and 2, 1 can occur.
Being able to describe the subset of the shift space consisting of itineraries of this form is one of the reasons to study subshifts of finite type of the following form.
Definition 2.6.1. An N × N matrix A is called a transition matrix (also called incidence matrix ) if all entries Aij , 1 ≤ i, j ≤ N , are either 0 or 1.
One can use a transition matrix A to encode the information of which pairs of consecutive digits can appear in an itinerary: the digit i can be followed by the digit j if and only if the entry Aij is equal to 1. More formally, we can consider the following subspaces Σ+_A ⊂ Σ+_N and Σ_A ⊂ Σ_N of sequences:
Σ+_A = { (ai )_(i=0)^(+∞) ∈ Σ+_N ,  A_(ai ai+1) = 1 for all i ∈ N },
Σ_A = { (ai )_(i=−∞)^(+∞) ∈ Σ_N ,  A_(ai ai+1) = 1 for all i ∈ Z }.
For example, if A is the 2 × 2 transition matrix with A11 = A12 = A21 = 1 and A22 = 0, then, since the only zero entry is A22 = 0, the digit 2 cannot be followed by another digit 2, while all the other pairs of successive digits 12, 11 and 21 are allowed. Thus the sequences in Σ+_A (respectively Σ_A ) are all the sequences (respectively all the bi-sided sequences) in the digits 1, 2 without any pair of consecutive digits 2.
If (ai )_(i=0)^(+∞) ∈ Σ+_A , also the shifted sequence σ+ ((ai )_(i=0)^(+∞)) belongs to Σ+_A , since if A_(ai ai+1) = 1 for all i ∈ N, clearly also A_(ai+1 ai+2) = 1 for all i ∈ N (in other words, if a pair of consecutive digits does not occur in a, it clearly does not occur in the shifted sequence either). The same is true for bi-sided sequences: if (ai )_(i=−∞)^(+∞) ∈ Σ_A , also the shifted sequence σ((ai )_(i=−∞)^(+∞)) ∈ Σ_A . Thus, the spaces Σ+_A and Σ_A are invariant under the shift and we can consider the restrictions of σ+ and σ to these subspaces.
Definition 2.6.3. The restrictions of the shift maps
σ+ : Σ+_A → Σ+_A ,    σ : Σ_A → Σ_A ,
are called topological Markov chains13 (or also subshifts of finite type) associated to the matrix A.
These are special examples of subshifts, that is restrictions of the shift to closed invariant subspaces of Σ+_N (or Σ_N ). In a topological Markov chain, the only type of restriction on the sequences is of the form "i cannot be followed by j", and thus depends only on the previous digit14 .
It is very convenient to visualize sequences in Σ_A as paths on a graph.
13 In probability, one studies Markov chains, which consist of a topological Markov chain together with a measure. We will define a measure which is invariant under the shift in one of the next lectures.
14 More generally, one can define invariant spaces where certain combinations of digits, also called words in the digits, are not allowed (for example the forbidden words could be 2212 and 111, so that there can be occurrences of 11, but not of three consecutive digits 1). A subshift can be equivalently defined in terms of countably many forbidden words: no sequence in the subshift contains a forbidden word and any sequence in the complement does contain a forbidden word. If only a finite number of words are forbidden, we have a subshift of finite type. If the maximal length of forbidden words is k + 1, the subshift is called a k-step subshift of finite type. Thus, topological Markov chains are 1-step subshifts of finite type.
Definition 2.6.4. The graph GA associated to the N × N transition matrix A is the graph with vertices v1 , . . . , vN , in which vi is connected to vj by an arrow from vi to vj if and only if Aij = 1.
Then the following fact is immediate:
Lemma 2.6.1. A sequence (ai )_(i=0)^(+∞) ∈ Σ+_N belongs to Σ+_A if and only if it describes an infinite path on GA . Similarly a sequence (ai )_(i=−∞)^(+∞) ∈ Σ_N belongs to Σ_A if and only if it describes a bi-infinite path on GA .
Drawing the associated graphs, one obtains the graphs GA , GB and GC in Figure 2.6. A path on GA can never go through v2 and then immediately through v2 again. Since there are no infinite paths on GB , we see that Σ_B = ∅.
To avoid trivial cases like the above, where Σ_B = ∅, but also to guarantee interesting dynamical properties, as we will see in the next lecture, one can impose conditions on the matrix such as the following.
Definition 2.6.5. A transition matrix A is called irreducible if for any 1 ≤ i, j ≤ N there exists an n ∈ N (possibly dependent on i, j) such that the entry (An )ij of the matrix An , obtained by multiplying A by itself n times, is positive ((An )ij > 0).
A transition matrix A is called aperiodic (or also, in some books, transitive) if there exists an n ∈ N such that for all 1 ≤ i, j ≤ N we have (An )ij > 0.
A matrix A such that all entries Aij > 0 is called positive (and we write A > 0). Thus, A is aperiodic if
there exists a power n ∈ N such that An is positive.
Example 2.6.4. Consider for example the matrices A = [ 1 1 ; 1 0 ] and D = [ 1 1 ; 0 1 ] (rows listed between semicolons), for which
A^2 = [ 2 1 ; 1 1 ],    D^n = [ 1 n ; 0 1 ],      (2.10)
so A is irreducible and aperiodic (with n = 2) since all entries of A^2 are positive, while D is not irreducible, since for any n the entry (D^n )21 = 0.
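The computation in (2.10) is easy to reproduce with a few lines of code (a sketch added for illustration, not part of the original notes).

    # Sketch: verifying (2.10) and the irreducibility/aperiodicity claims by
    # computing matrix powers directly.
    import numpy as np

    A = np.array([[1, 1], [1, 0]])
    D = np.array([[1, 1], [0, 1]])
    print(np.linalg.matrix_power(A, 2))               # [[2 1] [1 1]]: all entries positive
    print(np.linalg.matrix_power(D, 5))               # [[1 5] [0 1]]: (D^n)_{21} stays 0
    print((np.linalg.matrix_power(A, 2) > 0).all())   # True: A is aperiodic (hence irreducible)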
Remark 2.6.1. Irreducible and aperiodic matrices can be easily recognized using the associated graph:
(1) A is irreducible if and only if for any two vertices vi and vj on GA there exists a path connecting vi to
vj ;
(2) A is aperiodic if and only if there exists an n such that any two vertices vi and vj on GA can be connected
by a path of the same length n.
The proof of this remark will be given in the next section.
Example 2.6.5. The graph GA in Figure 2.6 shows that A is irreducible, since all vertices can be connected: for example, v2 can be connected to itself going through v1 . Moreover, A is aperiodic with n = 2, since the path from v2 back to v2 through v1 has length two, and one can get paths of length 2 from v1 to itself, from v2 to v1 and from v1 to v2 by repeating the loop around v1 . The graph GB in Figure 2.6 shows that B is not irreducible, since there are no paths connecting, for example, v1 to itself (nor paths connecting v2 to v1 or v2 to itself). The graph GC in Figure 2.6 shows that C is irreducible, since one sees immediately that all vertices can be connected to each other.
Draw the corresponding graphs GAi , i = 1, 2, associated to them. For each i = 1, 2, is Ai irreducible? Is Ai aperiodic?
d+_ρ (x, y) = Σ_(k=0)^∞ |xk − yk |/ρ^k ,    where x = (xk )_(k=0)^∞ , y = (yk )_(k=0)^∞ .      (2.13)
If two points belong to the same cylinder, they share a common central block of digits. Thus, it is clear that the distances in (2.12) and (2.13) are small. More is true: if ρ is chosen sufficiently large, then symmetric cylinders are exactly balls with respect to the distance dρ .
Lemma 2.7.1. If ρ > 2N − 1, then for any ε = 1/ρ^n we have
C_(−n,n) (x−n , . . . , xn ) = B_dρ ( x, 1/ρ^n ),
where x = (xi )_(i=−∞)^∞ ∈ Σ_N is any sequence which contains the central block x−n , . . . , xn .
Proof. Let C_(−n,n) (x−n , . . . , xn ) be a symmetric cylinder in Σ_N . Since x = (xi )_(i=−∞)^∞ ∈ Σ_N contains the central block x−n , . . . , xn , the point x ∈ C_(−n,n) (x−n , . . . , xn ).
If also y ∈ C_(−n,n) (x−n , . . . , xn ), then, since |xk − yk | = 0 for all k ∈ Z with |k| ≤ n, we have
dρ (x, y) = Σ_(k=−∞)^(−n−1) |xk − yk |/ρ^|k| + Σ_(k=n+1)^∞ |xk − yk |/ρ^|k| .
Note that, since both xi , yi ∈ {1, . . . , N }, for any i we have |xi − yi | ≤ N − 1. Thus, using also the formula for the sum of the geometric progression, which gives us
Σ_(j=0)^∞ 1/ρ^j = 1/(1 − 1/ρ) = ρ/(ρ − 1),
we get
dρ (x, y) ≤ 2 Σ_(k=n+1)^∞ (N − 1)/ρ^k ≤ ( 2(N − 1)/ρ^(n+1) ) Σ_(j=0)^∞ 1/ρ^j = ( 2(N − 1)/ρ^(n+1) ) ( ρ/(ρ − 1) ) = (1/ρ^n ) ( 2(N − 1)/(ρ − 1) ).
Thus, since
(1/ρ^n ) ( 2(N − 1)/(ρ − 1) ) < 1/ρ^n   ⇔   2(N − 1)/(ρ − 1) < 1   ⇔   ρ > 2N − 1,
if ρ > 2N − 1, we have
dρ (x, y) < 1/ρ^n ,   that is   y ∈ B_dρ ( x, 1/ρ^n ).
This proves that if ρ > 2N − 1 we have the inclusion
C_(−n,n) (x−n , . . . , xn ) ⊂ B_dρ ( x, 1/ρ^n ).
Let us check the reverse inclusion. Assume that y ∈ B_{d_ρ}(x, 1/ρ^n). If, by contradiction, y ∉ C_{-n,n}(x_{-n}, . . . , x_n),
there exists j ∈ Z with |j| ≤ n such that x_j ≠ y_j, so that |x_j − y_j| ≥ 1. But then
$$d_\rho(x, y) = \sum_{k=-\infty}^{\infty} \frac{|x_k - y_k|}{\rho^{|k|}} \ge \frac{|x_j - y_j|}{\rho^{|j|}} \ge \frac{1}{\rho^{|j|}} \ge \frac{1}{\rho^{n}}, \qquad \text{since } 0 \le |j| \le n.$$
Thus, y ∉ B_{d_ρ}(x, 1/ρ^n), which is a contradiction. Thus, we also have (without any additional assumption on
ρ) the opposite inclusion
$$B_{d_\rho}\left(x, \frac{1}{\rho^{n}}\right) \subset C_{-n,n}(x_{-n}, \dots, x_n).$$
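To get a concrete feeling for Lemma 2.7.1, one can approximate d_ρ numerically for two sequences which agree on a central block. In the sketch below (Python; the truncation length K and the particular sequences are chosen only for illustration) the two sequences share the digits with |k| ≤ n and differ as much as possible outside, and their distance still falls below 1/ρ^n as soon as ρ > 2N − 1:

    import random

    def d_rho(x, y, rho, K=60):
        """Truncated value of d_rho for two-sided sequences given as functions k -> digit."""
        return sum(abs(x(k) - y(k)) / rho ** abs(k) for k in range(-K, K + 1))

    N, n = 3, 4
    rho = 2 * N                                        # any rho > 2N - 1 = 5 will do
    random.seed(0)
    central = {k: random.randint(1, N) for k in range(-n, n + 1)}

    x = lambda k: central[k] if abs(k) <= n else 1     # same central block,
    y = lambda k: central[k] if abs(k) <= n else N     # maximally different tails

    print(d_rho(x, y, rho) < 1 / rho ** n)             # True: y is in the ball of radius 1/rho^n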
Exercise 2.7.1. Find a condition on ρ which guarantees that for any ε = 1/ρ^n we have
$$C_{0,n}(x_0, \dots, x_n) = B_{d^+_\rho}\left(x, \frac{1}{\rho^{n}}\right).$$
Consider now a subshift Σ_A ⊂ Σ_N determined by the transition matrix A (or the one-sided subshift
Σ_A^+ ⊂ Σ_N^+).
Remark 2.7.1. If A is irreducible, a cylinder is admissible if and only if it is not empty. Indeed, the condition
A_{a_i, a_{i+1}} = 1 for all −m ≤ i < n guarantees that there is a path on G_A described by a_{-m}, . . . , a_n (that is,
passing in order through the vertices v_{a_{-m}}, . . . , v_{a_n}) and, since A is irreducible, one can continue this path to
a bi-infinite path in G_A (adding any admissible forward tail starting from a_n and any admissible backward tail
before a_{-m}). This path belongs to the cylinder and shows that it is not empty.
where in the latter equation we simply used the definition of the (i, j) entry of the product matrix A^n A.
The Lemma has the following immediate Corollary on the number of periodic points of a topological
Markov chain.
Corollary 2.7.1. The cardinality of the set of periodic points of period n for σ : Σ_A → Σ_A is exactly the trace Tr(A^n).
(Recall that the trace of a matrix, Tr(A) = \sum_i A_{ii}, is the sum of the diagonal entries of A.)
Proof. If x is a periodic point of period n for σ : Σ_A → Σ_A, then σ^n(x) = x, which implies that the digits of
the sequence x = (x_i)_{i=-\infty}^{+\infty} have period n, that is x_{n+i} = x_i for all i ∈ Z. Thus, the path described by x on
G_A is a periodic path, which repeats periodically a path of length n starting from some v_i and coming back to the same
v_i. Since the number of paths of length n connecting v_i to v_i is A^n_{ii} by Lemma 2.7.2, summing over all possible starting vertices we have
$$\mathrm{Card}\{x \in \Sigma_A \ \text{such that}\ \sigma^n(x) = x\} = \sum_{i=1}^{N} A^n_{ii} = \mathrm{Tr}(A^n).$$
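Corollary 2.7.1 can be verified directly for small examples: periodic points of period n correspond to words of length n which are admissible when read cyclically, and their number should equal Tr(A^n). A brute-force sketch (Python with numpy; only sensible for very small n and N) for the matrix A of Example 2.6.4:

    import itertools
    import numpy as np

    A = np.array([[1, 1], [1, 0]])

    def count_periodic(A, n):
        """Number of words (x_0, ..., x_{n-1}) admissible when read cyclically,
        i.e. the periodic points of period n of the subshift on Sigma_A."""
        N = A.shape[0]
        return sum(
            all(A[w[i], w[(i + 1) % n]] == 1 for i in range(n))
            for w in itertools.product(range(N), repeat=n)
        )

    for n in range(1, 8):
        print(n, count_periodic(A, n), int(np.trace(np.linalg.matrix_power(A, n))))
    # the last two columns coincide: 1, 3, 4, 7, 11, 18, 29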
Lemma 2.7.3. If A is a transition matrix such that A^n > 0 for some n ∈ N, then A^m > 0 for all m ≥ n.
Proof. Remark first that if A^n > 0 for some n > 0, then for each j there exists a k_j such that A_{k_j j} = 1.
Otherwise, if A_{kj} = 0 for all 1 ≤ k ≤ N, then the vertex v_j cannot be reached from any other
vertex v_k, so there cannot exist any path of length n reaching v_j, in contradiction with the fact that A^n_{ij} > 0.
Let us now prove by induction on m that A^m > 0 for all m ≥ n. For m = n it is true by assumption. Assume now
that it holds for m and take any 1 ≤ i, j ≤ N. By the first remark, there exists k_j such that A_{k_j j} = 1;
moreover, for all the other k we have A_{kj} ≥ 0. Hence, we get
$$A^{m+1}_{ij} = \sum_{k=1}^{N} A^m_{ik} A_{kj} \ge A^m_{i k_j} A_{k_j j} = A^m_{i k_j},$$
and A^m_{i k_j} > 0 since A^m > 0 by the inductive assumption. This shows that A^{m+1} > 0 and concludes the proof.
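The content of this lemma is also easy to observe numerically (of course this is only a check, not a proof): once one power of A is positive, all higher powers remain positive. A small verification (Python with numpy) for the matrix A of Example 2.6.4, for which A^2 > 0:

    import numpy as np

    A = np.array([[1, 1], [1, 0]])
    P = np.linalg.matrix_power(A, 2)            # A^2 > 0
    for m in range(2, 30):
        assert (P > 0).all(), f"A^{m} has a zero entry"
        P = P @ A
    print("A^m > 0 for m = 2, ..., 29")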
Proof of Theorem 2.7.1. Let us prove (1). Assume that A is irreducible. We want to show that for each pair U, V of
non-empty open sets there exists M > 0 such that σ^M(U) ∩ V ≠ ∅. Fix ρ > 2N − 1. Since each open set
contains an open ball of the form B_{d_ρ}(x, ρ^{-k}) for some large k and, by Lemma 2.7.1, each
ball of this form is a symmetric cylinder, there exist two admissible symmetric cylinders
$$C_{-k,k}(a_{-k}, \dots, a_k) \subset U, \qquad C_{-l,l}(b_{-l}, \dots, b_l) \subset V. \qquad (2.14)$$
Let us now construct a point x which contains both blocks of digits a_{-k}, . . . , a_k and b_{-l}, . . . , b_l. By definition
of irreducibility, taking i = a_k and j = b_{-l}, there exists n > 0 such that A^n_{a_k, b_{-l}} > 0. This means that there
exists a path of length n which connects v_{a_k} to v_{b_{-l}}. Let us denote by
$$y_0 = a_k,\ y_1,\ y_2,\ \dots,\ y_{n-1},\ y_n = b_{-l},$$
the digits which describe this path. Clearly A_{y_i y_{i+1}} = 1 for all 0 ≤ i ≤ n − 1. Consider a point x ∈ Σ_A such
that
$$x = \dots\, a_{-k}, \dots, \underbrace{a_0}_{i=0}, \dots, a_k,\ y_1, \dots, y_{n-1},\ b_{-l}, \dots, b_l, \dots$$
(such a point exists since, by irreducibility, we can choose a backward and a forward tail, by choosing any path on
G_A which starts from b_l, for the forward tail, or ends in a_{-k}, for the backward tail). Clearly, since x contains
as central block of digits a_{-k}, . . . , a_k, we have x ∈ C_{-k,k}(a_{-k}, . . . , a_k) ⊂ U. Moreover, if we set M = n + k + l,
shifting the sequence k + n + l times to the left, since x_{k+n+l} = b_0, we get
$$\sigma^M(x) = \dots\, b_{-l}, \dots, \underbrace{b_0}_{i=0}, \dots, b_l, \dots,$$
so that σ^M(x) ∈ C_{-l,l}(b_{-l}, . . . , b_l) ⊂ V. Thus
$$x \in U \cap \sigma^{-M}(V) \neq \emptyset \iff \sigma^M(U) \cap V \neq \emptyset.$$
Conversely, assume that σ : Σ_A → Σ_A is topologically transitive and fix 1 ≤ i, j ≤ N. Take U and V to be the open cylinders of sequences in Σ_A whose 0th digit is i and j respectively. By transitivity there exist n > 0 and x ∈ U with σ^n(x) ∈ V. Since (σ^n(x))_0 = x_n, we have σ^n(x) ∈ V ⇔ x_n = j.
Thus, we found an element x ∈ Σ_A, which by definition describes a bi-infinite path on G_A, such that x_0 = i and
x_n = j. This gives a path of length n connecting v_i to v_j, showing that A^n_{ij} > 0. Thus A is irreducible.
Let us now prove (2). Assume that A^n > 0. We want to show that σ : Σ_A → Σ_A is topologically mixing.
Let U, V be non-empty open sets. We seek M_0 such that for any M ≥ M_0 we have σ^M(U) ∩ V ≠ ∅. We
can reason very similarly to part (1). Both U, V contain admissible symmetric cylinders of the form (2.14).
Let M_0 = n + k + l. If M ≥ M_0, then M = m + k + l with m ≥ n. Then also A^m > 0 by Lemma 2.7.3, so
A^m_{a_k, b_{-l}} > 0. Thus, there exists a path of length m from v_{a_k} to v_{b_{-l}}, so we can construct a point in Σ_A of
the form
$$x = \dots\, a_{-k}, \dots, \underbrace{a_0}_{i=0}, \dots, a_k,\ y_1, \dots, y_{m-1},\ b_{-l}, \dots, b_l, \dots$$
Reasoning as in part (1), x ∈ U ∩ σ^{-M}(V), so that σ^M(U) ∩ V ≠ ∅. This can be repeated for any M ≥ M_0,
showing that σ : Σ_A → Σ_A is topologically mixing.
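The constructive step used twice in this proof, producing the digits y_1, . . . , y_{m−1} of a path on G_A which joins two prescribed symbols, can be made completely explicit. A possible sketch (Python with numpy; the function name and the reconstruction strategy are our own, and other methods such as a breadth-first search would work equally well) finds the smallest n with (A^n)_{ij} > 0 and then recovers a path of that length digit by digit:

    import numpy as np

    def connecting_block(A, i, j, max_len=None):
        """Digits y_0 = i, y_1, ..., y_n = j of a path on G_A, or None if there is none.
        Uses that (A^n)_{ij} = sum_k A_{ik} (A^{n-1})_{kj} to rebuild the path greedily."""
        A = np.array(A)
        N = A.shape[0]
        if max_len is None:
            max_len = N                       # for an irreducible matrix this is enough
        powers = [np.eye(N, dtype=int), A]
        for _ in range(2, max_len + 1):
            powers.append(powers[-1] @ A)
        for n in range(1, max_len + 1):
            if powers[n][i, j] > 0:
                path, v = [i], i
                for step in range(n, 0, -1):
                    v = next(k for k in range(N)
                             if A[v, k] == 1 and powers[step - 1][k, j] > 0)
                    path.append(v)
                return path
        return None

    A = [[1, 1], [1, 0]]
    print(connecting_block(A, 1, 1))          # [1, 0, 1]: the loop v_2 -> v_1 -> v_2 (0-based labels)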
Theorem 2.7.2 (Entropy of topological Markov chains). Let A be an N × N aperiodic transition matrix
and σ : Σ_A → Σ_A be the associated topological Markov chain. The following limit exists and the topological
entropy is given by
$$h_{top}(\sigma) = \lim_{n\to\infty} \frac{\log \|A^n\|}{n} = \log |\lambda^{A}_{max}|, \qquad (2.15)$$
where ‖A^n‖ = \sum_{i,j} (A^n)_{ij} and |λ^{A}_{max}| is the maximum modulus of the eigenvalues of A.
Combining the Remark and the Theorem we have the following Corollary, which is more useful for the
computation of entropy:
Corollary 2.7.2. The topological entropy of σ : Σ_A → Σ_A, where A is irreducible, is given by
$$h_{top}(\sigma) = \log |\lambda^{A}_{max}|,$$
where |λ^{A}_{max}| is the maximum modulus of the eigenvalues of A.
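As a concrete illustration, both expressions can be evaluated numerically for the matrix A of Example 2.6.4, whose topological Markov chain is the golden mean shift. In the sketch below (Python with numpy; not part of the notes) ‖A^n‖ denotes, as in the proof below, the sum of all the entries of A^n:

    import numpy as np

    A = np.array([[1, 1], [1, 0]], dtype=float)

    for n in (10, 30, 60):
        norm = np.linalg.matrix_power(A, n).sum()       # ||A^n|| = sum of all entries
        print(n, np.log(norm) / n)                      # approaches log((1+sqrt(5))/2) ≈ 0.4812

    lam_max = max(abs(np.linalg.eigvals(A)))            # the golden mean (1+sqrt(5))/2
    print(np.log(lam_max))                              # 0.4812...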
In the proof of the Theorem, we will use the following Lemma, whose proof is omitted.
Lemma 2.7.4. Let A be a matrix with non-negative entries. Then, for any k > 0, the following limits exist
and we have
$$\lim_{n\to\infty} \frac{\log \|A^{n+k}\|}{n} = \lim_{n\to\infty} \frac{\log \|A^{n}\|}{n}.$$
Proof of Theorem 2.7.2. Let ρ > 2N − 1 and consider the distance d = d_ρ, so that by Lemma 2.7.1 balls of
radius 1/ρ^n in Σ_N are exactly symmetric cylinders of the form C_{-n,n}(a_{-n}, . . . , a_n). Let ε = 1/ρ^k. Consider
the collection U_n of cylinders which are admissible and of the following form:
$$C_{-k,\, n-1+k}(a_{-k}, \dots, a_{n-1+k}). \qquad (2.16)$$
Since (a_{-k}, . . . , a_{n-1+k}) varies over all possible admissible values, U_n is a cover: for any x ∈ Σ_A, x ∈
C_{-k,n-1+k}(x_{-k}, . . . , x_{n-1+k}) ∈ U_n. Moreover, since cylinders are open balls, U_n is an open cover. Let us
check that the d_n-diameter of each C ∈ U_n is less than ε.
Let C = C_{-k,n-1+k}(a_{-k}, . . . , a_{n-1+k}). If x, y ∈ C, by definition x_i = y_i = a_i for all −k ≤ i ≤ n − 1 + k. Thus,
for each 0 ≤ j ≤ n − 1, the shifted sequences σ^j(x) and σ^j(y) have the same digits in all positions −k ≤ i ≤ k,
so that both σ^j(x) and σ^j(y) belong to the same symmetric cylinder, which is a ball of radius 1/ρ^k. Hence
d(σ^j(x), σ^j(y)) < 1/ρ^k = ε for all 0 ≤ j ≤ n − 1, so that d_n(x, y) < ε. Thus Cov(n, ε) ≤ Card(U_n).
The cardinality of Un is the number of admissible cylinders of the form (2.16), thus the number of admissible
paths of length n + 2k from any vertex vi to any other vertex vj . Thus, by Lemma 2.7.2, we have
$$\mathrm{Card}(\mathcal{U}_n) = \sum_{i,j=1}^{N} \sharp\{\text{paths of length } n + 2k \text{ from } v_i \text{ to } v_j\} = \sum_{i,j=1}^{N} A^{n+2k}_{ij}.$$
Moreover, one can see that U_n is a minimal cover with sets of d_n-diameter less than ε. Note that all cylinders
in U_n are disjoint balls in the d_n-metric. A cover has in particular to cover a point in each cylinder C_i ∈ U_n
with some open set U_i, but since diam_{d_n}(U_i) ≤ ε, U_i cannot contain any point outside C_i. Thus, the
cardinality of any cover with open sets of d_n-diameter less than ε is at least the cardinality of U_n. Thus
$$\mathrm{Cov}(n, \varepsilon) = \mathrm{Card}(\mathcal{U}_n) = \sum_{i,j=1}^{N} A^{n+2k}_{ij} = \|A^{n+2k}\|.$$
Using the definition of entropy via covers and Lemma 2.7.4, we get
$$\lim_{n\to\infty} \frac{\log \mathrm{Cov}(n, \varepsilon)}{n} = \lim_{n\to\infty} \frac{\log \|A^{n+2k}\|}{n} = \lim_{n\to\infty} \frac{\log \|A^{n}\|}{n}.$$
Since the right hand side does not depend on ε = 1/ρ^k, letting ε → 0 gives h_{top}(σ) = lim_{n→∞} (log ‖A^n‖)/n, which proves the first equality in (2.15).
Let v_1 = ((1 + √5)/2, 1) and v_2 = ((1 − √5)/2, 1) be the corresponding eigenvectors, which are orthogonal since their scalar product is
$$\langle v_1, v_2 \rangle = \frac{1 + \sqrt{5}}{2} \cdot \frac{1 - \sqrt{5}}{2} + 1 \cdot 1 = \frac{1 - 5}{4} + 1 = -1 + 1 = 0.$$
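These computations are easy to confirm numerically. In the sketch below (Python with numpy) we take A = [[2, 1], [1, 1]], which we assume to be the cat map matrix used earlier in the notes; its eigenvectors, rescaled so that the second coordinate is 1, come out as ((1 ± √5)/2, 1) and are orthogonal:

    import numpy as np

    A = np.array([[2, 1], [1, 1]])              # assumed cat-map matrix

    eigvals, eigvecs = np.linalg.eig(A)
    print(eigvals)                              # (3 + sqrt(5))/2 ≈ 2.618 and (3 - sqrt(5))/2 ≈ 0.382

    v1 = eigvecs[:, 0] / eigvecs[1, 0]          # rescale so the second coordinate is 1
    v2 = eigvecs[:, 1] / eigvecs[1, 1]
    print(v1, v2)                               # first coordinates ≈ 1.618 and ≈ -0.618 (in some order)
    print(np.dot(v1, v2))                       # ≈ 0: the eigendirections are orthogonal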
In Figure 2.7 you can see a partition of T2 into two sets, R1 and R2 (R1 is the union of the white pieces
and R2 is the union of the dark pieces).
Each of them is a rectangle on the surface of T², whose sides are parallel to the orthogonal directions v_1
and v_2. In order to see that, it is convenient to cut and paste the triangles as in Figure 2.7 (sets filled with the
same shade are translations of each other). When you cut and paste a triangle by moving it by an integer
vector (k, l) ∈ Z², you get a different set in R², which represents the same set on the torus T² (recall that
T² = R²/Z² is the set of equivalence classes of points in R², where (x, y) ∼ (x′, y′) if (x′, y′) = (x, y) + (k, l)).
Thus, if instead of the standard copy which lies in the unit square [0, 1)², we use the cut and pasted copies
translated as in Figure 2.7, we can see that both R1 and R2 are rectangles.
[Instead of representing T² as a unit square with opposite sides identified, we could choose to represent
T² as the union of the two copies of the rectangles R1 and R2, where again opposite sides are identified.]
We could try to use the partition P = {R1 , R2 } of T2 to code fA . This partition has the nice property
that rectangles are mapped to rectangles. Indeed, since the sides of R1 and R2 are parallel to eigenvectors,
the images of R1 and R2 under A still have sides parallel to v_1 and v_2, that is, they are still rectangles.
Let us describe the images fA (R1 ), fA (R2 ) under the cat map, referring to Figure 2.8.
Recall that f_A is obtained by first acting linearly by A and then taking the result modulo one, which
corresponds to projecting R² to T² by the projection π : R² → T² given by
$$\pi(x, y) = (x \bmod 1,\ y \bmod 1).$$
Since the sides of R1 and R2 are parallel to the eigenvectors, the images of R1 and R2 are still rectangles, but
since all directions parallel to v_1 are expanded by a factor λ_1 and all directions parallel to v_2 are contracted
by a factor λ_2, the image rectangles under A are thinner and longer, as shown in Figure 2.8 (the image of R1
under A is the lighter shade rectangle, the image of R2 under A is the darker shade rectangle). The projection
π consists of cutting and pasting corresponding pieces of these rectangles back to the unit square, as shown
again in Figure 2.8 (sets with the same name are translated copies of each other).
Note that the images of R1 and R2 under A cross different parts of the translated copies of each original
rectangle R1, R2. In order to describe these intersections precisely, let us write
$$R_1 = P_1 \cup P_2 \cup P_3, \qquad R_2 = P_4 \cup P_5,$$
where P_1, . . . , P_5 are the sets in Figure 2.8 (note that we give the same name P_i to all the sets in R² that represent
a translated copy by an integer vector of P_i ⊂ [0, 1)², since they correspond to the same set on T²).
Looking at Figure 2.8, you can see that
$$f_A(R_1) = P_1 \cup P_3 \cup P_4, \qquad f_A(R_2) = P_2 \cup P_5. \qquad (2.17)$$
Since the image of R1 crosses R1 more than once, it is not a good idea to use the partition P = {R1, R2} of T² to code
f_A. Let us use instead the finer partition
$$\mathcal{P} = \{P_1, P_2, P_3, P_4, P_5\}, \qquad \mathbb{T}^2 = R_1 \cup R_2 = (P_1 \cup P_2 \cup P_3) \cup (P_4 \cup P_5) = \bigcup_{k=1}^{5} P_k.$$
It is clear that if we code the orbit of the point (x, y) using a sequence (a_i)_{i∈Z} ∈ Σ_5 = {1, . . . , 5}^Z in the usual
way, so that
$$f_A^k((x, y)) \in P_{a_k}, \qquad \text{for all } k \in \mathbb{Z},$$
not all sequences of Σ_5 will describe itineraries, because of the relations (2.17). For example, if a_k = 1, it
means that f_A^k((x, y)) ∈ P_1 ⊂ R_1. Thus, f_A^{k+1}((x, y)) ∈ f_A(R_1) = P_1 ∪ P_3 ∪ P_4, so a_{k+1} can only be 1, 3 or 4.
Reasoning in a similar way, since if x ∈ P_2 ⊂ R_1 or x ∈ P_3 ⊂ R_1 then f_A(x) ∈ f_A(R_1) = P_1 ∪ P_3 ∪ P_4, the digits
2, 3 can be followed only by 1, 3 or 4. If x ∈ P_4 ⊂ R_2 or x ∈ P_5 ⊂ R_2, then f_A(x) ∈ f_A(R_2) = P_2 ∪ P_5, so we see that
4, 5 can be followed only by 2 or 5.
Let us encode this information in a 5 × 5 transition matrix B, by setting B_{ij} = 1 if and only if the digit i can be
followed by the digit j, and 0 otherwise. We get
$$B = \begin{pmatrix} 1 & 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 & 1 \end{pmatrix}. \qquad (2.18)$$
These considerations lead to the following:
Remark 2.8.1. Full itineraries of orbits O^+_{f_A}(x) of the cat map f_A : T² → T² belong to the shift space Σ_B.
Conversely, all the transitions that we described can actually occur. Thus, finite sequences in Σ_B all
represent possible itineraries of orbits of the cat map with respect to the partition P. Moreover, by a general
property of the coding map, the image of a point under f_A is coded by the shifted sequence in Σ_B. Thus, one
can use the shift space Σ_B to construct a (semi-)conjugacy of the cat map with the topological Markov chain
σ : Σ_B → Σ_B. More details can be found in the references quoted in the Extra.
Finding a good partition for coding is not always easy (as in this case), but once one has constructed
a semi-conjugacy with a shift space, it is much easier to prove some dynamical properties, as for example
topological transitivity (which, for a subshift given by a transition matrix, follows simply by verifying that
the matrix is irreducible). See for example the exercise Ex. 2.8.1 below.
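For instance, for the matrix B of (2.18) irreducibility (and in fact aperiodicity) is a finite computation, which by the previous paragraph yields topological transitivity of σ : Σ_B → Σ_B; the largest eigenvalue then gives the entropy of this subshift as in Corollary 2.7.2. A self-contained sketch (Python with numpy; not part of the notes):

    import numpy as np

    B = np.array([[1, 0, 1, 1, 0],
                  [1, 0, 1, 1, 0],
                  [1, 0, 1, 1, 0],
                  [0, 1, 0, 0, 1],
                  [0, 1, 0, 0, 1]])

    N = B.shape[0]
    powers = [np.linalg.matrix_power(B, n) for n in range(1, N + 1)]
    print((sum(powers) > 0).all())                    # True: B is irreducible
    print((powers[1] > 0).all())                      # True: already B^2 > 0, so B is aperiodic

    print(np.log(max(abs(np.linalg.eigvals(B)))))     # ≈ 0.9624 = log((3 + sqrt(5))/2)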
Remark 2.8.2. The partition used to code the cat map is an example of a Markov partition. Markov partitions
can be constructed more generally for hyperbolic dynamical systems, that is, systems that have contracting
and expanding directions. Coding via a Markov partition allows one to reduce the study of the dynamical system
to a symbolic space. This is often a powerful technique to prove dynamical properties of the original system.
In particular, Markov partitions can be constructed for all hyperbolic toral automorphisms similarly to the
example we saw for the cat map. We give another example as an Exercise.
* Exercise 2.8.1. Check that the toral automorphism fA : T2 → T2 given by the matrix
$$A = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}$$
is hyperbolic. Figure 2.9 shows a partition of T2 into rectangles and their images under fA .
(a) Using itineraries with respect to the partition {R1, R2, R3} to code a point, find a transition matrix A
such that the shift space Σ_A describes all possible itineraries coding orbits O^+_{f_A}((x, y)).
Exercise .0.2. If (X, d) is a metric space, the collection T of all sets which are open in the metric space
according to Definition 2.1.3, that is all the sets U ⊂ X such that for each x ∈ U there exists ε > 0 such
that B_d(x, ε) ⊂ U, forms a topology. Indeed, X and ∅ satisfy Definition 2.1.3 trivially and hence belong to
T, proving (T1). The second property (T2) follows from Lemma 2.1.1. Thus, the collection of open sets in a metric
space gives a topology to the metric space X.
Example .0.2. [Trivial topology] Consider a space X and let Ttr = {∅, X}. One can check that Ttr satisfies
(T 1), (T 2), (T 3). This topology is known as trivial topology. Thus, (X, Ttr ) is a topological space.
Example .0.3. [Point topology] Consider a space X and let Tpt = P(X) be the collection of all subsets of
X. One can check that also Tpt satisfies (T 1), (T 2), (T 3). This topology is known as point topology. Thus,
(X, Tpt ) is a topological space.
In a topological space one can define the notion of convergence or density in the same way we did with
metric spaces, just using open sets instead of balls:
Definition .0.3. A sequence {xn }n∈N ⊂ X converges to x and we write limn→∞ xn = x if for any open set
U containing x there exists N > 0 such that xn ∈ U for all n ≥ N .
Similarly, one can define what it means for a function to be continuous, taking as definition of continuity
the equivalent characterization given by Lemma 2.1.2.
Lemma .0.1. A function f : X → Y between two topological spaces (X, TX ) and (Y, TY ) is continuous if
and only if for each open set V ∈ TY the preimage f −1 (V ) is an open set of X, that is f −1 (V ) ∈ TX .
Finally, the notion of compactness via covers can be defined in any topological space:
15 The notation P(X) denotes the parts of X, that is the collection of all subsets of X.
Definition .0.5. Let (X, T_X) be a topological space. A subset K ⊂ X is compact by covers if for any open
cover {U_α}_α of K there exists a finite subcover {U_{α_1}, U_{α_2}, . . . , U_{α_N}} ⊂ {U_α}_α such that K ⊂ ∪_{i=1}^{N} U_{α_i}.
In the next sections we will define, in the context of metric spaces, dynamical properties such as topological
transitivity, topological minimality and topological mixing. All these properties can be defined more generally
for topological spaces. This is why they are called topological properties and why we talk of topological
dynamics.
Let U be a cover with open sets of d_n-diameter less than ε and Card(U) = Cov(n, ε), and let V be a cover
with open sets of d_m-diameter less than ε and Card(V) = Cov(m, ε). If U ∈ U and V ∈ V, then the set
$$U \cap f^{-n}(V)$$
is open and has d_{n+m}-diameter less than ε: if x, y ∈ U ∩ f^{-n}(V), then max_{0≤i≤n−1} d(f^i(x), f^i(y)) ≤ ε since x, y ∈ U, and, since f^n(x)
and f^n(y) are both in V, also
$$\max_{0 \le i \le m-1} d(f^{n+i}(x), f^{n+i}(y)) \le \varepsilon,$$
so that d_{n+m}(x, y) ≤ ε. The sets of this form cover X, which shows that
$$\mathrm{Cov}(n + m, \varepsilon) \le \mathrm{Cov}(n, \varepsilon)\, \mathrm{Cov}(m, \varepsilon). \qquad (19)$$
If we consider now the sequence a_n = log Cov(n, ε), property (19) becomes
$$a_{n+m} = \log \mathrm{Cov}(n + m, \varepsilon) \le \log \big(\mathrm{Cov}(n, \varepsilon)\, \mathrm{Cov}(m, \varepsilon)\big) = \log \mathrm{Cov}(n, \varepsilon) + \log \mathrm{Cov}(m, \varepsilon) = a_n + a_m$$
for any n, m ∈ N. Thus (a_n) is subadditive and, by Exercise .0.4, the limit
$$\lim_{n\to\infty} \frac{\log \mathrm{Cov}(n, \varepsilon)}{n}$$
exists.
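The subadditivity argument used here is Fekete's lemma (the content of Exercise .0.4): for a subadditive sequence (a_n), the quotients a_n/n converge to inf_n a_n/n. A tiny numerical illustration (Python; the sequence below is an arbitrary subadditive example, unrelated to any particular dynamical system):

    import math

    C, lam = 3.0, 1.618                      # a_n = log(C * lam^n) is subadditive since C >= 1

    def a(n):
        return math.log(C * lam ** n)

    assert all(a(n + m) <= a(n) + a(m) for n in range(1, 30) for m in range(1, 30))
    for n in (1, 10, 100, 1000):
        print(n, a(n) / n)                   # decreases towards log(lam) ≈ 0.4813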